- Getting started
- Balance
- Clusters
- Concept drift
- Coverage
- Datasets
- General fields (previously entities)
- Labels (predictions, confidence levels, hierarchy, etc.)
- Models
- Streams
- Model Rating
- Projects
- Precision
- Recall
- Annotated and unannotated messages
- Extraction Fields
- Sources
- Taxonomies
- Training
- True and false positive and negative predictions
- Validation
- Messages
- Administration
- Manage sources and datasets
- Understanding the data structure and permissions
- Create or delete a data source in the GUI
- Uploading a CSV file into a source
- Preparing data for .CSV upload
- Create a new dataset
- Multilingual sources and datasets
- Enabling sentiment on a dataset
- Amend dataset settings
- Delete messages via the UI
- Delete a dataset
- Export a dataset
- Using Exchange Integrations
- Model training and maintenance
- Understanding labels, general fields, and metadata
- Label hierarchy and best practices
- Defining your taxonomy objectives
- Analytics vs. automation use cases
- Turning your objectives into labels
- Building your taxonomy structure
- Taxonomy design best practice
- Importing your taxonomy
- Overview of the model training process
- Generative Annotation (NEW)
- Dataset status
- Model training and annotating best practice
- Training with label sentiment analysis enabled
- Understanding data requirements
- Train
- Introduction to Refine
- Precision and recall explained
- Precision and recall
- How does Validation work?
- Understanding and improving model performance
- Why might a label have low average precision?
- Training using Check label and Missed label
- Training using Teach label (Refine)
- Training using Search (Refine)
- Understanding and increasing coverage
- Improving Balance and using Rebalance
- When to stop training your model
- Using general fields
- Generative extraction
- Using analytics and monitoring
- Automations and Communications Mining
- Licensing information
- FAQs and more

Communications Mining User Guide
Label hierarchy and best practices
To meet your business objectives, it is important to understand how to create your taxonomy before you begin to train your model. This includes how to name and structure your labels, and what they should consist of. For more details, check Building your taxonomy structure.
The generative annotation feature uses label names and descriptions as training input. As a result, it is important to use clear, distinct, and descriptive label names. Label names and descriptions provide the model with the best training inputs when it automatically generates predictions.
You can rename labels and add levels of hierarchy at any time. This enables you to refine labels and label descriptions to improve the automatically-generated predictions before you annotate messages with labels.
Use the > separator in label names to capture when a label concept is a subset of a broader parent concept.
Label structure examples:
- [Parent Label]
- [Parent Label] > [Child Label]
- [Parent Label] > [Branch Label] > [Child Label]
You can add more than three levels of hierarchy, but we do not recommend that you do this often, as it becomes complex to train the model. To add additional levels of hierarchy, you can rename your labels later in the model training process.
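As a way to reason about the "Parent Label > Branch Label > Child Label" format, here is a minimal sketch that splits a ">"-separated label name into its hierarchy levels and lists the parent paths a child label implies. The function names are illustrative only, not part of the Communications Mining API:

```python
# Hypothetical sketch of the ">"-separated label name format described above.
# These helpers are invented for illustration; they are not a product API.

def parse_label(name: str) -> list[str]:
    """Split a label name into its hierarchy levels."""
    return [level.strip() for level in name.split(">")]

def ancestors(name: str) -> list[str]:
    """Return the broader parent paths implied by a label name."""
    levels = parse_label(name)
    return [" > ".join(levels[:i]) for i in range(1, len(levels))]

label = "Cancellation > Confirmation > Termination"
print(parse_label(label))  # ['Cancellation', 'Confirmation', 'Termination']
print(ancestors(label))    # ['Cancellation', 'Cancellation > Confirmation']
```

A label with no ">" separator has no ancestors, which corresponds to a top-level parent label.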
To add a level of hierarchy, include the > separator when you specify the label name.
To understand how hierarchies work, consider the Child Label X from the diagram in the Label hierarchies section.
When the model predicts that Child Label X applies to a message, it also predicts Branch Label C and Parent Label 1 at the same time. This is because Child Label X is a subset of the two.
Each level of hierarchy adds an increasing level of specificity. However, the model is often more confident in assigning a parent or branch label than a more specific child label. This means the model can assign different probabilities to different label predictions within the same hierarchy.
To exemplify, for a particular message, the model could be:
- 99% confident that Parent Label 1 applies.
- 88% confident that Branch Label C applies.
- 75% confident that Child Label X applies.
The model predicts each label independently, so it is important that parent labels represent genuine topics or concepts instead of abstract ones.
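The confidence example above can be sketched as follows: each label in the hierarchy receives its own independent score, and a child prediction implies its branch and parent labels, whose confidences are typically at least as high. This is an illustrative model of the behavior only; the label names, scores, and functions are invented, not the platform's real API:

```python
# Illustrative only: independent per-label confidences, with the rule that a
# child label implies its parent and branch labels. Numbers mirror the
# example in the text above.

predictions = {
    "Parent Label 1": 0.99,
    "Parent Label 1 > Branch Label C": 0.88,
    "Parent Label 1 > Branch Label C > Child Label X": 0.75,
}

def implied_labels(label: str, scores: dict) -> dict:
    """Collect a label and every ancestor it implies, with their scores."""
    levels = [part.strip() for part in label.split(">")]
    paths = [" > ".join(levels[:i]) for i in range(1, len(levels) + 1)]
    return {path: scores[path] for path in paths}

child = "Parent Label 1 > Branch Label C > Child Label X"
for path, confidence in implied_labels(child, predictions).items():
    print(f"{confidence:.0%}  {path}")
```

Note that confidence usually decreases as specificity increases, since the model is more certain about the broader concept than the narrower one.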
For example, it is ineffective to use Process as a parent label if it groups specific processes, as it is too abstract for the model to predict. Instead, a specific process name from the message text works better as a parent label, with branch and child labels that represent relevant sub-processes.
Sometimes, you may need to make difficult choices regarding the structure of your taxonomy. For example, it could be difficult to choose whether a label should be a parent label or a child label. This can happen because the label could logically serve as a broad parent category with its own sub-categories, or it could be a specific sub-category of another broader parent category.
For example, imagine a dataset of hotel reviews. Many reviews might include the pricing of various aspects of the holiday and hotel such as the restaurant, the bar, the rooms, the activities, and so on.
You might consider the following choices:
- You could have Pricing as a parent label, and each specific aspect of pricing (for example, restaurant pricing) as a child label.
- However, you could also have parent labels that relate to the specific aspects such as Restaurants and Rooms, and have Pricing as a child label under each one.
When you decide, make sure you consider the following:
- Will there be a significant number of other concepts related to this broader topic you would like to include? If yes, then it should be a parent label.
- What is the most important thing to track from a Management Information or reporting perspective? Considering our example, is it useful to clearly view in the Communications Mining analytics exactly how many people are talking about pricing and its sub-categories? Or is it more helpful to see overall statistics on the feedback on rooms, restaurants, activities, etc., with pricing being just one aspect of those?
There is not always a clear right or wrong answer in these situations – it ultimately depends on what matters most to you and your business.
So far, we have discussed how to name labels and structure them in hierarchies. However, you might still be wondering what exactly a label should capture.
It is important to remember that Communications Mining is a natural language processing (NLP) tool. The platform reads and interprets each message that is assigned a label, and begins to form an understanding of how to identify that label concept based predominantly on the text within it.
As you add more varied and consistent examples for each label, the model improves its understanding of that label concept. Once a label performs well, adding further examples for it yields diminishing returns. Also, avoid accepting a large number of high-confidence predictions for a label, as this does not provide the model with new information.
Since Communications Mining uses the language of the message to understand and identify what constitutes label concepts, the label must be clearly identifiable from the text of the messages to which it is applied. For an email message, this includes both the subject and the body of the email.
For example, consider a message with the label Cancellation > Confirmation > Termination applied: you can clearly infer the label name from the email subject and body.
While the model can consider certain metadata properties, such as NPS scores, when it trains on customer feedback datasets to help understand sentiment, the text of the message remains the most important data for Communications Mining models.
This means that each label must be specific in what it aims to capture. Otherwise, the model will struggle to identify the trends and patterns in the language necessary to predict the label concept accurately.
Extremely broad labels such as General query or Everything else can be unhelpful if you use them to group together multiple different topics and there is no clear pattern or commonality between the examples provided to the model.
For the model to predict a label accurately, it requires multiple similar examples of the various expressions of each concept captured by the label. Therefore, extremely broad labels need a very large number of examples to be predicted effectively.
It is better practice to split broad labels out into distinct labels, even if that means ending up with a structure such as Everything else > [Various child labels].
If the model can identify a child label more easily because it is more specific and clearly identifiable than a very broad parent category, this can significantly enhance its ability to predict the parent label as well.
The Generative Annotation feature uses label descriptions and label names to automatically train a specialized model. Therefore, it is important to add descriptive, clear, and informative descriptions to each label so the model can generate accurate predictions.
The Generative AI model uses descriptions as inputs to pre-annotate messages in the background. This reduces the time and effort you spend to manually annotate examples.
You can add label descriptions once the Create Dataset process creates the labels, or you can add or edit them from the Taxonomy page in Dataset Settings.
One effective way to maintain label consistency throughout the model-building process is to add descriptions to each label. This is useful if multiple users train your model because it ensures all users have the same understanding of a given label and its associated concept. Another benefit of maintaining label consistency is that it makes the handover process more efficient if you need to transfer the model to another user.
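To picture the kind of name-and-description pairs that make good generative annotation inputs, here is a hypothetical sketch. The labels, descriptions, and data shape are invented for illustration; they are not the platform's actual format or API:

```python
# Invented examples of label name + description pairs. Clear, distinct names
# and informative descriptions give the model the best inputs for
# automatically generated predictions.

taxonomy = {
    "Cancellation > Confirmation": (
        "Customer confirms that they want to cancel their policy or booking."
    ),
    "Pricing": (
        "Message discusses the cost, fees, or price of a product or service."
    ),
}

for name, description in taxonomy.items():
    print(f"{name}: {description}")
```

Writing descriptions in this shared, explicit form also keeps multiple annotators aligned on what each label means.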