Precision and recall
Overview
When you build a taxonomy by annotating data, you are creating a model. This model will use the labels you have applied to a set of data to identify similar concepts and intents in other messages and predict which labels apply to them.
In doing so, each label will have its own set of precision and recall scores.
Suppose a taxonomy includes a label called ‘Request for information’. Precision and recall relate to this label as follows:
- Precision: Of all the messages the model predicted as ‘Request for information’, the percentage for which that prediction was correct. A precision of 95% means that for every 100 messages predicted as ‘Request for information’, 95 were predicted correctly and 5 were predicted wrongly (i.e. the label should not have been applied to them).
- Recall: Of all the messages that should have been predicted as ‘Request for information’, the percentage that the platform found. A recall of 77% means that for every 100 messages that should have had the label, the platform found 77 and missed 23.
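The two definitions above can be expressed as simple ratios. The following is a minimal sketch in Python that reproduces the 95% and 77% figures from the example. The function names and counts are illustrative only and are not part of the platform.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of predictions of a label that were correct."""
    predicted = true_positives + false_positives
    return true_positives / predicted if predicted else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of messages that should have the label that were found."""
    actual = true_positives + false_negatives
    return true_positives / actual if actual else 0.0


# 100 messages predicted as 'Request for information': 95 correct, 5 wrong.
example_precision = precision(true_positives=95, false_positives=5)

# 100 messages that should have the label: 77 found, 23 missed.
example_recall = recall(true_positives=77, false_negatives=23)

print(f"precision = {example_precision:.0%}, recall = {example_recall:.0%}")
```

Note that the two scores are calculated over different sets of messages: precision over the messages that were predicted to have the label, and recall over the messages that should have it.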
Recall across all labels is directly related to the coverage of your model.
If you are confident that your taxonomy covers all of the relevant concepts within your dataset, and your labels have adequate precision, then the recall of those labels will determine how well covered your dataset is by label predictions. If all of your labels have high recall, then your model will have high coverage.
Precision versus recall
We also need to understand the trade-off between precision and recall within a particular model version.
The precision and recall statistics for each label in a particular model version depend on the confidence threshold applied, i.e. how confident the model must be that the label applies before it counts as a prediction.
The platform publishes live precision and recall statistics on the Validation page, where users can use the adjustable slider to see how different confidence thresholds affect the precision and recall scores.
As you increase the confidence threshold, the model must be more certain that a label applies before predicting it, so precision will typically increase. At the same time, because the model needs to be more confident to make a prediction, it will make fewer predictions and recall will typically decrease. The opposite is typically true as you decrease the confidence threshold.
So, as a rule of thumb, when you adjust the confidence threshold and precision improves, recall will typically decrease, and vice versa.
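To illustrate this trade-off, here is a small sketch in Python that applies three different confidence thresholds to the same set of predictions for one label. The confidence scores, the thresholds, and the function name are made-up assumptions for illustration. They do not come from the platform, which calculates these statistics for you on the Validation page.

```python
# Each pair is (model confidence for the label, whether the label truly applies).
# These values are invented purely for illustration.
EXAMPLES = [
    (0.95, True), (0.90, True), (0.85, True), (0.80, False), (0.70, True),
    (0.60, False), (0.50, True), (0.40, False), (0.30, True), (0.20, False),
]


def precision_recall_at(threshold: float, examples) -> tuple[float, float]:
    """Precision and recall if the label is predicted at or above `threshold`."""
    predicted = [applies for confidence, applies in examples if confidence >= threshold]
    true_positives = sum(predicted)
    should_have_label = sum(applies for _, applies in examples)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / should_have_label if should_have_label else 0.0
    return precision, recall


results = {t: precision_recall_at(t, EXAMPLES) for t in (0.25, 0.65, 0.82)}
for threshold, (p, r) in results.items():
    print(f"threshold {threshold:.2f}: precision {p:.2f}, recall {r:.2f}")
```

In this toy data, each increase in the threshold raises precision and lowers recall. On real data the relationship is typical rather than guaranteed, which is why it is described as a rule of thumb.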
It’s important to understand this trade-off and what it means when setting up automations using the platform. Users will have to set a confidence threshold for each label that they want to form part of their automation, and this threshold needs to be adjusted until the precision and recall statistics are acceptable for that process.
Certain processes may value high recall (catching as many instances of an event as possible), whilst others will value high precision (ensuring that the instances identified are correct).
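As a rough illustration of choosing a threshold for a process that values precision, the Python sketch below searches a set of candidate thresholds and keeps the one with the highest recall among those that meet a minimum precision. The data, the function name `choose_threshold`, and the selection rule are all hypothetical. In practice, the threshold is chosen using the slider on the Validation page.

```python
# (model confidence for the label, whether the label truly applies); invented values.
EXAMPLES = [
    (0.95, True), (0.90, True), (0.85, True), (0.80, False), (0.70, True),
    (0.60, False), (0.50, True), (0.40, False), (0.30, True), (0.20, False),
]


def choose_threshold(examples, min_precision: float):
    """Return (threshold, precision, recall) with the best recall that still
    meets `min_precision`, or None if no candidate threshold qualifies."""
    should_have_label = sum(applies for _, applies in examples)
    best = None
    # Try each observed confidence value as a candidate threshold, lowest first.
    for threshold in sorted({confidence for confidence, _ in examples}):
        predicted = [applies for confidence, applies in examples if confidence >= threshold]
        true_positives = sum(predicted)
        precision = true_positives / len(predicted)
        recall = true_positives / should_have_label if should_have_label else 0.0
        # Strict '>' keeps the lowest threshold when recall is tied.
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (threshold, precision, recall)
    return best


# A process that needs at least 80% precision, and a stricter one needing 90%.
moderate = choose_threshold(EXAMPLES, min_precision=0.8)
strict = choose_threshold(EXAMPLES, min_precision=0.9)
print(moderate, strict)
```

A process that values recall instead could use the mirror image of this rule: require a minimum recall and pick the qualifying threshold with the highest precision.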