Selecting label confidence thresholds
The platform is typically used in one of the first steps of an automated process: ingesting, interpreting and structuring an inbound communication, such as a customer email, much like a human would do when that email arrived in their inbox.
When the platform predicts which labels (or tags) apply to a communication, it assigns each prediction a confidence score (%) to show how confident it is that the label applies.
If these predictions are to be used to automatically classify the communication, however, there needs to be a binary decision - i.e. does this label apply or not? This is where confidence thresholds come in.
A confidence threshold is the confidence score (%) at or above which an RPA bot or other automation service will take the prediction from the platform as a binary 'Yes, this label does apply' and below which it will take the prediction as a binary 'No, this label does not apply'.
It's therefore very important to understand confidence thresholds and how to select the appropriate one, in order to achieve the right balance of precision and recall for that label.
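The binary decision described above can be sketched in a few lines. This is a minimal illustration, not the platform's API: the function name `label_applies` and the example values are hypothetical, and confidence is expressed as a fraction between 0 and 1 rather than a percentage.

```python
def label_applies(confidence: float, threshold: float) -> bool:
    """Return True when the confidence is at or above the threshold."""
    return confidence >= threshold


# Hypothetical prediction for a single label, checked against a
# hypothetical threshold of 39.8%.
threshold = 0.398
print(label_applies(0.72, threshold))  # confidence above the threshold
print(label_applies(0.31, threshold))  # confidence below the threshold
```

A prediction exactly at the threshold counts as 'Yes', matching the "at or above" wording of the definition.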
- To select a threshold for a label, navigate to the Validation page and select the label from the label filter bar
- Then simply drag the threshold slider, or type a % figure into the box (as shown below), to see the different precision and recall statistics that would be achieved for that threshold
- The precision vs recall chart gives you a visual indication of the confidence thresholds that would maximise precision or recall, or provide a balance between the two:
- In the first image below, the confidence threshold selected (68.7%) would maximise precision (100%) - i.e. the platform should typically get no predictions wrong at this threshold - but would have a lower recall value (85%) as a result
- In the second image, the confidence threshold selected (39.8%) provides a good balance between precision and recall (both 92%)
- In the third image, the confidence threshold selected (17%) would maximise recall (100%) - i.e. the platform should identify every instance where this label should apply - but would have a lower precision value (84%) as a result
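To make the trade-off concrete, the sketch below computes precision and recall for one label at several thresholds. It is an illustration only: the Validation page calculates these statistics for you, the `scored` data is invented, and the function name `precision_recall` is hypothetical.

```python
def precision_recall(scored, threshold):
    """Compute (precision, recall) for one label at a given threshold.

    scored: list of (confidence, label_truly_applies) pairs, e.g. taken
    from reviewed messages. Returns 1.0 for a ratio whose denominator is
    zero, which is a convention chosen for this sketch.
    """
    tp = sum(1 for c, truth in scored if c >= threshold and truth)
    fp = sum(1 for c, truth in scored if c >= threshold and not truth)
    fn = sum(1 for c, truth in scored if c < threshold and truth)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Invented example data: (confidence, whether the label truly applies)
scored = [(0.95, True), (0.80, True), (0.60, True),
          (0.55, False), (0.30, True), (0.10, False)]

for t in (0.70, 0.58, 0.20):
    p, r = precision_recall(scored, t)
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

With this invented data, raising the threshold removes the false positive but misses more true instances, while lowering it captures every true instance at the cost of one incorrect prediction.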
So how do you choose the threshold that is right for you? The simple answer is: it depends.
Depending on your use case and the specific label in question, you might want to maximise either precision or recall, or find the threshold that gives the best possible balance of both.
When thinking about what threshold is required, it's helpful to think about potential outcomes - what is the potential cost or consequence to your business if a label is incorrectly applied? What about if it is missed?
For each label, choose the threshold according to which kind of error is less costly to the business when something goes wrong - i.e. something is incorrectly classified (a false positive), or something is missed (a false negative).
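One way to reason about this is to assign a relative cost to each kind of error and compare candidate thresholds. The sketch below is an assumption-laden illustration rather than a platform feature: the cost weights, the candidate thresholds, the `scored` data and the function name `pick_threshold` are all hypothetical.

```python
def pick_threshold(scored, candidates, cost_fp, cost_fn):
    """Return the candidate threshold with the lowest total error cost.

    scored: list of (confidence, label_truly_applies) pairs.
    cost_fp / cost_fn: relative cost of one false positive / false negative.
    """
    def total_cost(threshold):
        fp = sum(1 for c, truth in scored if c >= threshold and not truth)
        fn = sum(1 for c, truth in scored if c < threshold and truth)
        return fp * cost_fp + fn * cost_fn

    return min(candidates, key=total_cost)


# Invented example data: (confidence, whether the label truly applies)
scored = [(0.95, True), (0.80, True), (0.60, True),
          (0.55, False), (0.30, True), (0.10, False)]
candidates = [0.20, 0.58, 0.70]

# Missing the label is ten times worse than applying it wrongly.
print(pick_threshold(scored, candidates, cost_fp=1, cost_fn=10))
# Applying the label wrongly is ten times worse than missing it.
print(pick_threshold(scored, candidates, cost_fp=10, cost_fn=1))
```

When a missed label is the more expensive error, the lowest candidate wins; when a wrong prediction is more expensive, a higher candidate wins.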
For example, if you wanted to automatically classify inbound communications into different categories, but also had a label for 'Urgent' that routed requests to a high-priority work queue, you might want to maximise the recall for this label to ensure that no urgent requests are missed, and accept a lower precision as a result. This is because it may not be very detrimental to the business to have some less urgent requests put into the priority queue, but it could be very detrimental to the business to miss an urgent request that is time sensitive.
As another example, if you were automating end-to-end a type of request that involved a monetary transaction or was otherwise of high value, you would likely choose a threshold that maximised precision, so as to automate only the transactions the platform was most confident about. Predictions with confidences below the threshold would then be manually reviewed. This is because the cost of a wrong prediction (a false positive) is potentially very high if a transaction is then processed incorrectly.
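The two examples above can be combined into a simple routing sketch. Everything here is hypothetical: the label names, the threshold values, the queue names and the `route` function are illustrative assumptions, not part of the platform or of any automation tool.

```python
# Hypothetical per-label thresholds: low for 'Urgent' (favours recall),
# high for 'Payment Request' (favours precision).
THRESHOLDS = {"Urgent": 0.17, "Payment Request": 0.90}


def route(predictions):
    """Decide where a message goes from its label confidences.

    predictions: dict mapping label name to confidence (0.0 to 1.0).
    Labels absent from the dict are treated as confidence 0.0.
    """
    if predictions.get("Urgent", 0.0) >= THRESHOLDS["Urgent"]:
        return "priority_queue"
    if predictions.get("Payment Request", 0.0) >= THRESHOLDS["Payment Request"]:
        return "automate"
    # Anything below the relevant threshold is left for a person to review.
    return "manual_review"


print(route({"Urgent": 0.25, "Payment Request": 0.95}))
print(route({"Payment Request": 0.95}))
print(route({"Payment Request": 0.60}))
```

In this sketch a fairly weak 'Urgent' prediction is still enough to reach the priority queue, whereas a payment request is only automated when the confidence is very high.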