Communications Mining User Guide
Last updated Nov 7, 2024

Training using clusters

User permissions required: ‘View Sources’ AND ‘Review and annotate’.

Note: Users with the ‘View Sources’ permission can see messages in Discover, and users with the ‘View labels’ permission can see labels, but applying labels in Discover requires the ‘Review and annotate’ permission.

Overview

Once your data is in the platform, it groups the communications (messages) that it believes share concepts or similar intents and displays them as 30 clusters. The aim of this part of the training process is to go through each of these clusters and annotate the data presented in each of them.

This process makes the initial training of the model easier and faster, because you can add labels to multiple similar messages at once, as well as add or remove labels on individual messages as required.

Helpful tips for annotating clusters:

  • Don’t spend too long thinking about the name of the label. You can rename a label at any point during the training process.
  • Be as specific as possible when naming a label, and keep the taxonomy as flat as possible initially (don’t add too many child labels). It is better to be specific with your label name at the outset, as you can always change and restructure the hierarchy later. At this stage you should add as many labels as possible to a message, as you can always go back and delete them later, which is quicker and easier than expanding an existing label.
  • Remember it’s often easier to create a more specific, finer-grained taxonomy in the first instance. If it’s too detailed, it’s easy to edit and ‘prune’ your taxonomy later. In practice, this means adding more rather than fewer labels and sub-labels.
  • It’s good to start with labels in a flat hierarchy (not adding too many sub-labels) – you can always restructure the taxonomy to a more hierarchical structure later.
  • Each message can have multiple labels assigned to it – make sure to apply all relevant labels, otherwise you teach the model not to associate the message with the label that you have omitted.
  • It is better to take the time to annotate carefully now, so that the machine can rapidly and precisely predict labels in future.
  • Not all clusters will have obviously similar intents, and it’s OK to move on if the messages are all different.
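
The tip about applying every relevant label can be illustrated with a small sketch. It assumes, purely for illustration, that each annotated message is treated as a yes/no example for every label in the taxonomy. The label names and the `to_targets` helper are hypothetical and are not part of the platform.

```python
# Hypothetical taxonomy; the label names are illustrative only.
TAXONOMY = ["Bed comfort", "Room cleanliness", "Staff friendliness"]


def to_targets(applied_labels, taxonomy=TAXONOMY):
    """Map the labels applied to one message to a yes/no value per label.

    Assumption: a label left off an annotated message counts as "no".
    """
    applied = set(applied_labels)
    unknown = applied - set(taxonomy)
    if unknown:
        raise ValueError(f"Labels not in taxonomy: {sorted(unknown)}")
    return {label: label in applied for label in taxonomy}


# A message about a comfortable bed in a spotless room.
complete = to_targets(["Bed comfort", "Room cleanliness"])
# The same message with one relevant label omitted.
incomplete = to_targets(["Bed comfort"])

print(complete)
print(incomplete)
```

Under this assumption, the second annotation records the message as a "no" for "Room cleanliness", which is the effect the tip warns about.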

Help, my Discover is empty!

When you first create a new Dataset you may find that Discover is empty as shown below. Don’t worry, this is simply because the platform's algorithms are busy working in the background to group your messages into clusters. Depending on the number of messages in the data source this could take up to a few hours to process.

Empty Discover page whilst clusters are being generated

Layout

The layout of Discover and an example cluster are shown below. In this example, the platform has detected that these messages share the common theme of the comfort of the hotel beds:

Discover page in 'cluster' mode

Layout explained:

  • A - Toggle button to switch between 'Cluster' and 'Search' mode
  • B - Drop-down menu that lets you switch between different clusters
  • C - Button to apply a label to all of the messages shown on the page
  • D - One of six messages shown from cluster #7 (each cluster contains 12 messages)
  • E - Button to apply a label to an individual message
  • F - Drop-down menu to adjust the number of messages shown on the page (between 6 and 12)
  • G - Buttons to adjust and invert the selection of messages on the page
  • H - Button to de-select a message to exclude it from labels added in bulk

Discover highlights common themes

As shown in the image below, Discover highlights the parts of a message that contribute most to that message being included in the cluster, helping you identify the common themes more quickly:

Discover highlighting common themes

  • The darker lines indicate more important parts of the span (this is explained when you hover over it)
  • The lighter coloured lines indicate a medium or slightly weaker contribution to the cluster

Key steps

Note: The following guide describes the process for annotating a dataset that does not have sentiment analysis enabled. If you do have sentiment analysis enabled, the process is very similar: you also select a positive or negative sentiment when applying each label, and you can use neutral label names, where the sentiment denotes whether it’s the positive or negative version of that concept. See here for more details on annotating with sentiment analysis.

1. Review each message in the cluster

2. If you think there is a label that applies to all messages on the page, select ‘Add label’



3. Type in the name of the label and press Enter or click the pin button that appears. You can add several labels at once this way: just type in another label and click the pin button again.

Please note: this does not apply the label yet.

4. Click the ‘Apply labels’ button to assign the label(s) to the messages. The assigned labels will now appear underneath every message on the page.



Alternatively, you can add a label to an individual message by clicking the ‘Add label +’ button highlighted underneath it.

Adding labels to individual messages in Discover

If you want to add a label to a group of messages on the page, but wish to exclude one or several, you can de-select them using the toggle button highlighted (A). You can then invert the selection or de-select / reselect all using the buttons highlighted at the top (B).

Excluding individual messages from bulk selection in Discover
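
The selection behaviour described above can be sketched in a few lines. This is only an illustration of the logic, not the platform's implementation: the `Message` class and the `invert_selection` and `apply_labels` helpers are hypothetical, and the sketch assumes that every message on the page starts out selected.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    text: str
    selected: bool = True  # assumption: messages start out selected
    labels: set = field(default_factory=set)


def invert_selection(page):
    """Flip the selection state of every message on the page."""
    for message in page:
        message.selected = not message.selected


def apply_labels(page, labels):
    """Add the labels to the selected messages only."""
    for message in page:
        if message.selected:
            message.labels.update(labels)


page = [
    Message("The bed was very comfortable"),
    Message("Best mattress I have slept on"),
    Message("Breakfast was cold"),
]

# De-select the message that does not fit, then label the rest in bulk.
page[2].selected = False
apply_labels(page, {"Bed comfort"})

for message in page:
    print(message.text, sorted(message.labels))
```

Calling `invert_selection(page)` afterwards would leave only the third message selected, ready for a different label.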

You can view different pages of the same cluster (A) and adjust the number of messages per page (B) using the buttons highlighted. Once the cluster is annotated, you can move on to a new cluster using the drop-down list (C).

The model will present you with 30 clusters and it’s important to work your way through them to create a solid basis for the Explore phase. If a cluster isn’t relevant to you, however, just skip over it.

Navigating between clusters and cluster pages in Discover

Note:

Discover begins to retrain after a significant amount of training is completed. After 180 messages have been annotated (the equivalent of half of the clusters), Discover will retrain and update the clusters. Don't be put off; just carry on working through them until you've reviewed at least 30.
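
The numbers quoted on this page fit together as simple arithmetic, sketched below. The constants come from the text above (30 clusters, 12 messages per cluster, a retrain after 180 annotated messages, and 6 to 12 messages per page). The `pages_per_cluster` helper is hypothetical, and the sketch assumes every cluster holds exactly 12 messages.

```python
import math

CLUSTERS = 30
MESSAGES_PER_CLUSTER = 12
RETRAIN_AFTER = 180  # annotated messages before Discover retrains

# Total messages presented across all clusters.
total_messages = CLUSTERS * MESSAGES_PER_CLUSTER  # 360

# Number of fully annotated clusters that reaches the retrain point.
clusters_before_retrain = RETRAIN_AFTER // MESSAGES_PER_CLUSTER  # 15


def pages_per_cluster(page_size):
    """Pages needed to show one cluster at the given page size (6 to 12)."""
    if not 6 <= page_size <= 12:
        raise ValueError("page size must be between 6 and 12")
    return math.ceil(MESSAGES_PER_CLUSTER / page_size)


print(total_messages, clusters_before_retrain)
print(pages_per_cluster(6), pages_per_cluster(12))
```

In other words, annotating 15 full clusters (half of the 30) reaches the 180-message point at which the clusters are updated.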
