
Communications Mining User Guide

Last updated Feb 10, 2025

Label hierarchy and best practices

Introduction

To meet your business objectives, it is important to understand how to create your taxonomy before you begin to train your model. This includes how to name and structure your labels, and what they should consist of. For more details, see Building your taxonomy structure.

Naming labels

The generative annotation feature uses label names and descriptions as training input when it automatically generates predictions. As a result, it is important to give each label a clear, distinct, and descriptive name, so the model has the best possible training inputs.

You can rename labels and add levels of hierarchy at any time. This enables you to refine label names and descriptions to improve the automatically generated predictions before you annotate messages with labels.

Label hierarchies

When you name labels, you must also determine their hierarchy within your taxonomy. They can have multiple levels of hierarchy, separated by a >, to capture when a label concept is a subset of a broader parent concept.

Label structure examples:

  • [Parent Label]
  • [Parent Label] > [Child Label]
  • [Parent Label] > [Branch Label] > [Child Label]

You can add more than three levels of hierarchy, but we do not recommend doing so often, as deeper hierarchies make the model more complex to train. To add levels of hierarchy later, you can rename your labels during the model training process.

Conceptually, each label nested under another should represent a subset of the label above it. This nesting represents the level of hierarchy, and is established by the > when you specify the label name.
Illustration of how label hierarchies work conceptually
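The nesting implied by the > separator can be sketched in a few lines of Python. This is an illustrative sketch, not part of the platform, and the label names are hypothetical examples:

```python
# Illustrative sketch: each ">"-separated prefix of a label name is itself
# a label in the hierarchy (hypothetical example, not a platform API).

def ancestors(label: str) -> list[str]:
    """Return every implied ancestor of a hierarchical label, broadest first."""
    parts = [p.strip() for p in label.split(">")]
    return [" > ".join(parts[:i]) for i in range(1, len(parts))]

print(ancestors("Parent Label > Branch Label > Child Label"))
# ['Parent Label', 'Parent Label > Branch Label']
```

Applying a three-level label therefore always implies its branch and parent labels as well.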

Practical examples of hierarchies

To understand how hierarchies work, consider the Child Label X from the diagram in the Label hierarchies section.

When the model predicts that Child Label X applies to a message, it also predicts Branch Label C and Parent Label 1 at the same time. This is because Child Label X is a subset of the two.

Each level of hierarchy adds an increasing level of specificity. However, the model is often more confident in assigning a parent or branch label than a more specific child label. This means the model can assign different probabilities to different label predictions within the same hierarchy.

For example, for a particular message, the model could be:

  • 99% confident that Parent Label 1 applies.
  • 88% confident that Branch Label C applies.
  • 75% confident that Child Label X applies.
Note: If the model predicts a child label for a message, it should always predict the parent label (and branch label, where applicable) with at least the same confidence as the child label, if not greater.
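Assuming predictions come back as label-name-to-confidence pairs (a hypothetical representation for illustration, not the platform's actual API), this invariant can be checked as follows:

```python
# Hypothetical sketch: verify that every ancestor of a predicted label
# carries at least the child's confidence (not a real platform API).

def check_consistency(predictions: dict[str, float]) -> list[str]:
    """Return labels whose confidence exceeds that of one of their ancestors."""
    violations = []
    for label, conf in predictions.items():
        parts = [p.strip() for p in label.split(">")]
        for i in range(1, len(parts)):
            ancestor = " > ".join(parts[:i])
            if predictions.get(ancestor, 0.0) < conf:
                violations.append(label)
                break
    return violations

preds = {
    "Parent Label 1": 0.99,
    "Parent Label 1 > Branch Label C": 0.88,
    "Parent Label 1 > Branch Label C > Child Label X": 0.75,
}
print(check_consistency(preds))
# [] — confidences decrease down the hierarchy, so nothing is flagged
```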

The model predicts each label independently, so it is important that parent labels represent genuine topics or concepts instead of abstract ones.

For example, it is ineffective to use Process as a parent label if it groups specific processes, as it is too abstract for the model to predict. Instead, a specific process name from the message text works better as a parent label, with branch and child labels that represent relevant sub-processes.

Deciding between parent labels and child labels

Sometimes, you may need to make difficult choices regarding the structure of your taxonomy. For example, it could be difficult to choose whether a label should be a parent label or a child label. This can happen because the label could logically serve as a broad parent category with its own sub-categories, or it could be a specific sub-category of another broader parent category.

For example, imagine a dataset of hotel reviews. Many reviews might include the pricing of various aspects of the holiday and hotel such as the restaurant, the bar, the rooms, the activities, and so on.

You might consider the following choices:

  • You could have Pricing as a parent label, and each specific aspect of pricing (for example, Restaurant) as a child label.
  • However, you could also have parent labels that relate to the specific aspects such as Restaurants and Rooms, and have Pricing as a child label under each one.

When you decide, make sure you consider the following:

  • Will there be a significant number of other concepts related to this broader topic you would like to include? If yes, then it should be a parent label.
  • What is the most important thing to track from a Management Information or reporting perspective? Considering our example, is it useful to clearly view in the Communications Mining analytics exactly how many people are talking about pricing and its sub-categories? Or is it more helpful to see overall statistics on the feedback on rooms, restaurants, activities, etc., with pricing being just one aspect of those?

There is not always a clear right or wrong answer in these situations; it ultimately depends on what matters most to you and your business.

Capturing data with labels

So far, we have discussed how to name labels and structure them in hierarchies. However, you might still be wondering what exactly a label should capture.

It is important to remember that Communications Mining is a natural language processing (NLP) tool. The platform reads and interprets each message that is assigned a label, and begins to form an understanding of how to identify that label concept based predominantly on the text within it.

As you add more varied and consistent examples for each label, the model improves its understanding of that label concept. Once a label performs well, avoid annotating many more examples for it, as this yields diminishing returns. Similarly, avoid accepting a large number of high-confidence predictions for a label, as this does not provide the model with new information.

Since Communications Mining uses the language of the message to understand and identify what constitutes label concepts, the label must be clearly identifiable from the text of the messages to which it is applied. For an email message, this includes both the subject and the body of the email.

The following email example has the label Cancellation > Confirmation > Termination applied:
Example email message highlighting the text that the model takes into account when making predictions

You can clearly infer the label name from the email subject and body.

While the model can consider certain metadata properties, such as NPS scores, when it trains on customer feedback datasets to help understand sentiment, the text of the message remains the most important data for Communications Mining models.

Note: The model does not consider the specific sender or recipient address of an email. Therefore, these addresses should not be used at all when you determine which label to apply to an email message.

This means that each label must be specific in what it aims to capture. Otherwise, the model will struggle to identify the trends and patterns in the language necessary to predict the label concept accurately.

Why you should avoid using very broad labels

Extremely broad labels such as General query or Everything else can be unhelpful if you use them to group together multiple different topics and there is no clear pattern or commonality between the examples provided to the model.

For the model to predict a label accurately, it requires multiple similar examples of the various expressions of each concept captured by the label. Therefore, extremely broad labels need a very large number of examples to be predicted effectively.

It is better practice to split broad labels into distinct labels, even if that means structuring them as Everything else > [Various child labels].

Because a child label is more specific and clearly identifiable than a very broad parent category, the model can learn it more easily, which in turn significantly improves its ability to predict the parent label as well.

Label descriptions

The Generative Annotation feature uses label descriptions and label names to automatically train a specialized model. Therefore, it is important to add descriptive, clear, and informative descriptions to each label so the model can generate accurate predictions.

The Generative AI model uses descriptions as inputs to pre-annotate messages in the background. This reduces the time and effort you spend to manually annotate examples.

You can add label descriptions when labels are created as part of the Create Dataset process, or add and edit them later from the Taxonomy page in Dataset Settings.

One effective way to maintain label consistency throughout the model-building process is to add descriptions to each label. This is useful if multiple users train your model because it ensures all users have the same understanding of a given label and its associated concept. Another benefit of maintaining label consistency is that it makes the handover process more efficient if you need to transfer the model to another user.

Example label descriptions within the Dataset Settings page
