Communications Mining User Guide
Last updated Sep 4, 2024

Using general fields

A guide to setting up and training General Fields in the platform.

Defining and setting up your fields

It is important to define the key data points (i.e. fields) that you want to extract from your Communications Mining data. These typically facilitate downstream automation, but can also be useful for analytics, particularly in assessing the potential success rate and benefit of automation opportunities.

The definitions below help you understand the difference between general and extraction fields:
  • General fields are fields that you may want to extract and that can appear across many different topics or labels in a dataset.
  • Extraction fields are fields that are created for, and conditioned on, a specific label. In other words, each one is tied to a specific label that you want to automate.
Note: If Generative Extraction is available in your region, it is recommended to use general fields as a backup to extraction fields, in case there are no confident label predictions for a message. Use extraction fields, linked to specific labels, to facilitate end-to-end automation, and general fields for automated triage.

Check out the official documentation to find out more about Generative Extraction and about general versus extraction fields. If Generative Extraction is not available in your region, continue to use general fields as normal. The rest of this section provides guidance on how to use general fields.

Ultimately, general field predictions, combined with labels, can facilitate automation by providing the structured data points needed to complete a specific task or process. It’s much more time-efficient to train general fields in your dataset in conjunction with labels, rather than focusing on one and then the other (i.e., training general fields after training a full taxonomy of labels).

Note: If you want to automate Address Change requests, a label would be used to capture the request type, whilst general fields would capture the various components of the address (e.g. Address Line, City, Postcode / Zip Code). Each prediction is made available via the API, enabling every message to be acted upon.
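To show how a label and general fields might work together downstream, here is a minimal Python sketch of an address change handler. The prediction dictionary shape, the field names (address-line, city, postcode) and the build_address_change function are all hypothetical; they are not the platform's API response format, which you should take from the API documentation.

```python
from typing import Optional

# Hypothetical shape of one message's predictions; the real API response differs.
predictions = {
    "labels": ["Address Change"],
    "general_fields": [
        {"name": "address-line", "value": "12 High Street"},
        {"name": "city", "value": "Leeds"},
        {"name": "postcode", "value": "LS1 4AP"},
    ],
}

# Address components this (hypothetical) downstream process needs.
REQUIRED_FIELDS = ("address-line", "city", "postcode")


def build_address_change(preds: dict, label: str = "Address Change") -> Optional[dict]:
    """Return a payload for the downstream system, or None to route the message to a person."""
    if label not in preds.get("labels", []):
        return None  # the label identifies the request type
    values = {f["name"]: f["value"] for f in preds.get("general_fields", [])}
    if any(name not in values for name in REQUIRED_FIELDS):
        return None  # an address component is missing
    return {name: values[name] for name in REQUIRED_FIELDS}


print(build_address_change(predictions))
```

The label decides whether the message is an address change at all; the general fields supply the structured values the downstream system needs.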

Understanding general fields

What are general fields?

General Fields are additional elements of structured data which can be extracted from within the messages in your dataset. General Fields include data points such as monetary quantities, dates, currency codes, email addresses, URLs, as well as many other industry specific categories (see below for an example).

Example email message with address line, city name, and policy number general fields predicted

The platform is able to predict most general fields (except those trained from scratch) as soon as they are enabled, as it can identify them based on their typical, or in some instances very specific, format and a training set of similar general fields.

Like labels, users are able to accept or reject general fields that are correctly or incorrectly predicted, enhancing the model’s ability to identify them in future.

Types of general fields

There are currently two main types of general fields:

  • Pre-trained general fields that are typically based on a set of standard or custom-defined rules - e.g. Monetary Quantity, URL, and Date
  • General fields trained from scratch by a user (like they would train labels) that are machine learning based

Trainable versus non-trainable general fields

All general fields are either 'trainable' by nature (general fields trained from scratch), or can be made trainable when they're enabled (all other general field kinds).

Trainable general fields are those that update live in the platform based on training provided by users. For more detail on training general fields, see Reviewing and applying general fields.

If you enable training on a pre-trained general field that is typically based on a set of standard or custom-defined rules, you can refine the platform's understanding of that general field within the parameters of those rules. Essentially, further training on these will reduce the scope of what the platform can consider that general field, but not increase it.
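To make the "narrow but never widen" behaviour concrete, here is a minimal Python sketch. The URL rule and the learned_keep function are hypothetical stand-ins (the platform's actual rules and models are not described here); the point is only that training acts as a filter over what the rule already accepts.

```python
import re

# Hypothetical rule for a URL-like field; real platform rules are not documented here.
URL_RULE = re.compile(r"https?://\S+")


def rule_candidates(text: str) -> list[str]:
    """Spans the rule itself would accept."""
    return URL_RULE.findall(text)


def learned_keep(span: str) -> bool:
    """Stand-in for what user feedback taught: unsubscribe links are not wanted."""
    return "unsub" not in span


def refined_predictions(text: str) -> list[str]:
    # Training only filters the rule's candidates; it can never add a span the rule missed.
    return [span for span in rule_candidates(text) if learned_keep(span)]


text = "Portal: https://portal.example.com Unsubscribe: https://mail.example.com/unsub"
print(refined_predictions(text))  # ['https://portal.example.com']
```

Text such as "www.example.com" would never be predicted here, however much training is given, because the rule never proposes it.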

This is because many of these general fields, like dates (e.g. 'tomorrow') and monetary quantities (e.g. £20), need to be normalised into a structured data format for downstream systems. Similarly, general fields like ISINs or CUSIPs must follow a set format, so the platform should not be taught to predict anything that does not conform to it.
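As a rough illustration of what normalisation means, the Python sketch below turns '£20' and 'tomorrow' into structured values. The symbol table, the regular expression, and the choice to resolve relative dates against the date the message was sent are all assumptions; the platform's own normalisation logic is not described in this guide.

```python
import datetime as dt
import re
from decimal import Decimal
from typing import Optional

# Assumed mappings for illustration; a production normaliser would cover far more forms.
CURRENCY_SYMBOLS = {"£": "GBP", "$": "USD", "€": "EUR"}
MONEY = re.compile(r"([£$€])\s?(\d+(?:,\d{3})*(?:\.\d+)?)")
RELATIVE_DAYS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def normalise_money(span: str) -> Optional[dict]:
    """Turn a span such as '£1,250.50' into an amount plus an ISO 4217 currency code."""
    match = MONEY.fullmatch(span.strip())
    if match is None:
        return None
    symbol, amount = match.groups()
    return {"amount": Decimal(amount.replace(",", "")), "currency": CURRENCY_SYMBOLS[symbol]}


def normalise_relative_date(span: str, sent_on: dt.date) -> Optional[str]:
    """Resolve words like 'tomorrow' against the date the message was sent."""
    offset = RELATIVE_DAYS.get(span.strip().lower())
    if offset is None:
        return None
    return (sent_on + dt.timedelta(days=offset)).isoformat()


print(normalise_money("£20"))
print(normalise_relative_date("tomorrow", dt.date(2024, 9, 4)))  # 2024-09-05
```

Spans that cannot be normalised return None, which is the kind of value that training should not teach the platform to predict for these fields.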

When any trainable general fields are assigned, the platform looks at both the text of the general field and its context within the rest of the communication, i.e. what comes before and after the general field value (in the same paragraph, and in the paragraphs directly above and below). It learns to better predict the general field based on the values themselves, as well as how those values appear within the context of the communication.
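A minimal Python sketch of that context window follows. It assumes paragraphs are split on new lines with blank lines ignored; the function names are hypothetical and do not describe how the platform is implemented internally.

```python
def split_paragraphs(text: str) -> list[str]:
    """Split on new lines, dropping blank lines (an assumption about how paragraphs are counted)."""
    return [line for line in text.split("\n") if line.strip()]


def context_window(paragraphs: list[str], index: int) -> list[str]:
    """The paragraph holding the field value plus the paragraphs directly above and below it."""
    return paragraphs[max(index - 1, 0) : index + 2]


body = "Hi team,\nPlease update my address.\nNew address: 12 High Street, Leeds\nThanks"
paragraphs = split_paragraphs(body)
print(context_window(paragraphs, 2))
```

For a value in the third paragraph, the window covers "Please update my address.", the value's own paragraph, and "Thanks"; for a value in the first paragraph there is no paragraph above, so only two paragraphs are returned.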

If a pre-trained general field is not set as trainable, you can still accept or reject the general field predictions you see in your dataset. These are updated and refined offline using this in-platform feedback provided by users. It is helpful for you to accept or reject these general fields when reviewing messages. To learn more about how to enable general fields on a dataset, see the Enabling, disabling, updating and creating general fields section.

What pre-built templates are available for general fields?

Note: You can enable all the general fields as trainable, to refine the platform's understanding of them through training, and reduce the scope of what the platform considers to be a general field of that kind.

Standard template field types for general fields

When configuring general field types, you can select one of the following pre-built options using the template option when choosing the data type for the field type.

  • Email: An email address.
  • Currency: A currency code, e.g. GBP, CHF, or USD.
  • URL: A uniform resource locator (i.e. a web address).
  • SEDOL: A financial security identifier, short for Stock Exchange Daily Official List, which is 7 characters in length.
  • BIC Code: A Business Identifier Code (BIC) is an international standard under ISO 9362 for routing business transactions and identifying business parties. The BIC code is 8 or 11 characters in length.
  • LEI: A Legal Entity Identifier (LEI) is a unique global identifier of legal entities participating in financial transactions. LEI is formatted as a 20-character alpha-numeric code.
  • ISIN: An International Securities Identification Number (ISIN) uniquely identifies a financial security. ISIN is a 12-character alpha-numeric code.
  • Mark-to-market (MTM or M2M): Mark-to-market refers to the fair value of an asset or liability, based on the current market price, the price of similar assets and liabilities, or another objectively assessed fair value.
  • CUSIP: A CUSIP is a 9-digit number or a 9-character alpha-numeric code that identifies a North American financial security for the purposes of facilitating clearing and settlement of trades.
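Several of these templates have fixed, checkable layouts, which is why training can only narrow them. The Python sketch below applies the publicly documented ISIN (Luhn) and CUSIP check-digit rules and the ISO 9362 BIC layout. It is an illustration of those public conventions, not the platform's internal rules, and the function names are hypothetical.

```python
import re

# Layout checks following the published ISIN, CUSIP and ISO 9362 (BIC) conventions,
# not the platform's internal rules, which are not documented here.
BIC = re.compile(r"[A-Z]{4}[A-Z]{2}[A-Z0-9]{2}(?:[A-Z0-9]{3})?")
ISIN = re.compile(r"[A-Z]{2}[A-Z0-9]{9}[0-9]")
CUSIP = re.compile(r"[A-Z0-9*@#]{8}[0-9]")


def luhn_ok(digits: str) -> bool:
    """Luhn check: double every second digit from the right and test the sum mod 10."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def is_isin(code: str) -> bool:
    if not ISIN.fullmatch(code):
        return False
    # Letters expand to two digits (A=10 ... Z=35) before the Luhn check.
    return luhn_ok("".join(str(int(char, 36)) for char in code))


def is_cusip(code: str) -> bool:
    if not CUSIP.fullmatch(code):
        return False
    special = {"*": 36, "@": 37, "#": 38}
    total = 0
    for position, char in enumerate(code[:8]):
        value = special[char] if char in special else int(char, 36)
        if position % 2 == 1:  # double every second character
            value *= 2
        total += value // 10 + value % 10
    return (10 - total % 10) % 10 == int(code[8])


def is_bic(code: str) -> bool:
    return BIC.fullmatch(code) is not None


print(is_isin("US0378331005"), is_cusip("037833100"), is_bic("DEUTDEFF500"))  # True True True
```

A span that fails these checks is not a valid value for the field, however it is labelled, so training that tried to add it would be counterproductive.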

Enabling, disabling, updating and creating general fields

User permissions required: 'View Sources' AND either 'Modify Datasets' or 'Datasets Admin'.

Note: You have a default quota of 25 general fields per dataset. If you need more than 25 general fields, request a quota increase via the Account team.

Enabling general fields on a new dataset

To enable general fields on a new dataset that you want to create, you simply need to select them during the setup process.

Click the + button in the box shown below and you will be presented with a drop-down menu of all of the general fields that you are able to enable for that dataset. Simply click all of the general fields you want to enable before creating the dataset. If you add any in error, you can click the ‘X’ icon next to the general field name to remove it.

To understand more about how to create a new dataset, see the documentation on creating a dataset.

Create new dataset modal

Enabling, updating, and disabling general fields on an existing dataset

If you want to enable, update or disable general fields for an existing dataset, open the Settings tab in the top navigation bar and then select the Labels and extraction fields tab.

Settings > Labels and extraction fields tab

Enabling general fields:

To enable existing general fields, click inside the General Fields box and select the general fields you want to enable from the drop-down menu. Once you're happy with your selections, select Update General Fields (as shown below).

These general fields will have their settings pre-selected for you. You can then update them, including making them trainable, as shown below.

General fields tab

Updating general fields:

To update an enabled general field, click the general field in the general field box as shown in the above images and the 'Edit general field' modal (below) will appear.

Here you can update the base general field, the title of the general field and the API name (these concepts are described in detail below), as well as making the general field 'trainable'.

If you have previously reviewed general fields for a general field kind that was not set to 'trainable', this information is still stored.

Edit general field modal

Disabling general fields:

To remove any selected general fields, simply click the 'X' icon next to the general field name, and then click Update General Fields.

Note:

If you remove a general field and click Update General Fields, this will also remove the training data for that general field for this dataset. If you choose to re-enable the general field, you will need to train it again.

If you make a mistake while updating the general fields, click 'Reset' before you click Update General Fields and your changes will not be applied.

Creating new general fields

The above sections covered how to enable and update existing pre-trained general fields for both new and existing datasets. In each instance, for either a new or existing dataset, you can also create new general fields.

Newly created general fields can be based on an existing pre-trained general field or can be trained from scratch (like a new label).

You can do this by clicking the '+' icon in the general field box, either in the 'Create dataset' flow or in the dataset settings page (as shown above).

This will bring up the 'Add a new general field' modal as shown below.

Here you can set the field type, title, and API name, as well as select whether the general field is trainable or not (these can be updated later as shown above).

When you've filled in each of the fields (explained below), simply click 'Create'.

Create new general field modal

Field types

  • This will serve as the initial state for your new general field, and the dropdown will contain a list of all the pre-trained general fields available to you
    • For example, if you select 'Date' as your base general field, all of the general fields predicted for this kind will be dates, and you could then train the platform to only recognise specific dates
  • If you want to train a general field entirely from scratch, you can select 'None - Train from scratch', and then you essentially start with a blank canvas when training the general field. The platform's predictions for this general field will be entirely based on the training examples you provide

General field title

  • The general field title is the name of the general field that will appear in the UI of the platform

API name

  • The API name of the general field is what will be returned via the API when it provides predictions for messages
  • The API name cannot contain any spaces or punctuation except for dash ( - ) and underscore ( _ )
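If you script the creation of many general fields, a client-side check of the naming rule above can catch mistakes early. This Python sketch assumes that ASCII letters and digits are allowed alongside dash and underscore (the guide only lists what is excluded); suggest_api_name is a hypothetical helper, not a platform feature.

```python
import re

# Assumption: ASCII letters and digits are allowed; dash and underscore are the only permitted punctuation.
API_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_api_name(name: str) -> bool:
    return API_NAME.fullmatch(name) is not None


def suggest_api_name(title: str) -> str:
    """Hypothetical helper: derive a candidate API name from a general field title."""
    return re.sub(r"[^A-Za-z0-9_-]+", "-", title.strip().lower()).strip("-")


print(suggest_api_name("Postcode / Zip Code"))  # postcode-zip-code
```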

General field filtering

User permissions required: View Sources AND View general fields.

Just as you can for labels, you can filter messages by whether they have general fields predicted or assigned, both in Explore and Reports.

You can apply any combination of AND, ANY OF and NOT when applying more than one general field filter. These filters can give you much greater flexibility when training and interpreting your data, and can provide much deeper insights on what's happening in your communication channels.

Here are some of the things you can do when filtering by general field predictions:

  • Apply multiple general field filters at once, in both Explore and Reports
  • Filter to messages that have any one of a number of selected general fields predicted (i.e., ANY OF general field X, general field Y, ...)
  • Filter to messages that have multiple different general fields predicted (i.e., general field X AND general field Y AND ...)
  • Filter to messages that do not have certain general fields predicted (i.e., NOT General field Y)
  • Search for general fields containing specific search terms, whilst having general field filters applied

All of the general fields you have enabled on your dataset will appear in the filter bar, as shown below. Assigning general fields is covered in detail in the Reviewing and applying general fields section.

Applying advanced prediction filters

There are now two ways to apply general field filters, and you can use them in combination with each other to create the right type of query.

This is what the general field filter bar looks like:


The default state is shown above, whereby no filter is applied and all messages will be shown (unless another filter is applied).

To update the general field filter, use the buttons explained below. They change colour when selected:

  • Show messages containing any annotated general fields.
  • Show messages predicted to contain a general field.

If you want to filter to messages that have any annotated general fields, or that are predicted to contain a general field, use the buttons at the top (as shown above). If you want to filter to messages with specific annotated or predicted general fields, hover over the general field in question and the same two buttons will appear to the right.

If you want to filter to either an assigned or predicted general field, select the name of the general field, and it shows messages with either one of them.

To remove your selection, select the button again, and to remove multiple selections, select All. You can also select Clear All at the top of the filter bar, but this will clear every filter you have selected, not just general field filters.

General field Bar

The taxonomy of general fields functions as a normal filter bar, and allows you to select multiple general fields at once with a single click for each.

Selecting multiple general fields from the list creates an ANY OF type query.

If you selected General field A, General field B and General field C in the General field bar, this creates a Show me messages with General field A, General field B, or General field C predicted query.

When filtering to specific general fields, you can make multiple selections. For instance, you could filter to see messages that have the Invoice ID general field predicted OR the Product ID general field assigned (as shown below).

General field filter bar with predicted invoice id or assigned product id selected general fields

Add general field filter

The second filter option is the + Add General field filter button above the general field bar.

This opens a drop-down general field bar that lets you build more complex filters, such as excluding certain general fields from consideration.

From this dropdown, you can select multiple general fields to include or exclude by clicking the name of the general field (for assigned and predicted), or the individual buttons (including minus for where this general field is neither assigned nor predicted).

The result looks like the following example, which returns messages predicted to have the Invoice ID general field, but not the Prod ID general field assigned or predicted:



You can select + Add General field filter multiple times to add additional layers to your query. Two separate general field filters create an AND type query, whilst multiple general fields selected in the same general field filter create an ANY OF type query.

In the example below, multiple general field filters have been applied individually. This creates a filter that will return messages predicted to have any of the three general fields in the first filter, but that also have the Policy Number general field predicted, and do not have the UK Postcode general field predicted or assigned.

An example of complex general field query combining ANY OF, AND and NOT general field filters

A helpful tip is that by selecting the & sign in an individual filter containing multiple general fields, you can automatically split them out into individual filters. This would change the query from ANY OF (i.e. any of these general fields predicted) to AND (i.e. all of these general fields predicted).

Combining general field bar filters and added general field filters

It's possible to combine filters from both the general field bar, and individually added general field filters. Filters applied in the general field bar are treated as an AND query with any individually applied general field filters.
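The combination rules above (ANY OF within a filter, AND between filters, the minus button as NOT) can be modelled in a few lines. This Python sketch is a conceptual model only, not the platform's query engine; the Message and Condition types and the field names in the first filter are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    assigned: set[str] = field(default_factory=set)
    predicted: set[str] = field(default_factory=set)


@dataclass(frozen=True)
class Condition:
    name: str
    mode: str  # "assigned", "predicted", "either" (click the name) or "neither" (minus)

    def matches(self, msg: Message) -> bool:
        assigned = self.name in msg.assigned
        predicted = self.name in msg.predicted
        return {
            "assigned": assigned,
            "predicted": predicted,
            "either": assigned or predicted,
            "neither": not (assigned or predicted),
        }[self.mode]


def matches_query(msg: Message, filters: list[list[Condition]]) -> bool:
    """Each inner list is one filter (ANY OF); the filters are combined with AND."""
    return all(any(cond.matches(msg) for cond in group) for group in filters)


def split_filter(group: list[Condition]) -> list[list[Condition]]:
    """Mimic the '&' action: one ANY OF filter becomes one AND-ed filter per field."""
    return [[cond] for cond in group]


query = [
    [Condition("Invoice ID", "predicted"), Condition("Order ID", "predicted"), Condition("Prod ID", "predicted")],
    [Condition("Policy Number", "predicted")],
    [Condition("UK Postcode", "neither")],
]
print(matches_query(Message(predicted={"Order ID", "Policy Number"}), query))  # True
```

In this model, a selection made in the general field bar is simply one more inner list, which is why it combines with individually added filters as AND.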

For example, in the image below, this combined query would return any messages that had either ORDER ID or PROD ID predicted.

Combine general field filter using general field bar and individually added general field filters.

Combining general field filters and sorting by general field for training

These filters also mean that you can apply general field filters while sorting by a specific general field in a training mode.

Example of the Explore page showing Check general field mode for a specific general field, with an additional general field exclusion filter applied:


Reviewing and applying general fields

User permissions required: 'View Sources' AND 'Review and label'.

Identifying general field predictions

Predicted general fields appear as colour highlighted text, such as in the first line of the message below, with a different colour appearing for each different general field type. Once a general field has been confirmed by a user, by either manually applying it or accepting a prediction, the general field will appear as highlighted text with a bold, darker outline as shown below.

If a paragraph has had general fields assigned, dismissed, or applied, it will appear highlighted in grey, as shown in the body of the message below.

General field format example

How does the platform make general field predictions for trainable general fields?

When reviewing trainable general fields, it's important to remember that the platform will learn from both the general field values that you assign, as well as the context of where they appear within the communications, i.e. the other language that's used around the values themselves.

The platform will consider the context of the language in the same paragraph as the general field value, as well as the single paragraphs (each separated by a new line) directly before and after the paragraph that the general field sits in.

Please Note: For general fields that are not set to 'trainable', the platform's predictions are based entirely on the rules defined within the platform for that general field. This can be beneficial when a general field absolutely has to follow a set format for a downstream automation, where any incorrect value would cause a failure or exception.

General field confidence scores

When the platform predicts which general fields apply to a communication, it assigns each prediction a confidence score (%) to show how confident it is that the general field applies to the highlighted span of text. You can view a general field’s confidence score by hovering over the general field.

This confidence score is also made available via the API so that it can inform automated actions taken downstream.
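One common way to use these scores downstream is per-field thresholding. In this Python sketch the record layout, the 0 to 1 score scale and the threshold values are assumptions chosen for illustration; they are not platform defaults or the API's actual field names.

```python
# Hypothetical prediction records; the real API field names and score scale may differ.
predictions = [
    {"name": "policy-number", "value": "PN-123456", "confidence": 0.97},
    {"name": "city", "value": "Leeds", "confidence": 0.58},
]

# Assumed per-field thresholds set by the team running the automation, not platform defaults.
THRESHOLDS = {"policy-number": 0.90, "city": 0.80}
DEFAULT_THRESHOLD = 0.95


def route(preds: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split predictions into those confident enough to automate and those needing review."""
    automate, review = [], []
    for pred in preds:
        threshold = THRESHOLDS.get(pred["name"], DEFAULT_THRESHOLD)
        if pred["confidence"] >= threshold:
            automate.append(pred)
        else:
            review.append(pred)
    return automate, review


automate, review = route(predictions)
print([p["name"] for p in automate], [p["name"] for p in review])  # ['policy-number'] ['city']
```

Stricter thresholds suit fields where a wrong value would cause a failure downstream; fields without an explicit threshold fall back to a conservative default.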

Example of a General field’s confidence score

Accepting and rejecting general field predictions

Once general fields are enabled (see Enabling, disabling, updating and creating general fields), the platform will automatically start predicting them within the messages throughout your dataset. Users can then accept the predictions that are correct or reject them where they are incorrect. Each of these actions sends training signals that will be used to improve the platform's understanding of that general field.

For the pre-trained general fields that are trained offline (e.g. Monetary quantity, URL, etc.), it is more important from an improvement perspective for users to reject or correct wrong predictions than it is for them to accept correct predictions.

For the general fields that train live in the platform, it is equally important to accept correct predictions as well as reject incorrect predictions. You do not, however, need to keep accepting many correct examples of each unique general field for these kinds (e.g. Example Bank Ltd. is a unique organisation general field) if you aren't finding incorrectly predicted ones.

The key caveat is that if you review any general field in a paragraph, you need to review all of the other general fields in that paragraph.

To review a general field prediction, hover the mouse over the prediction and the general field review modal will appear, as shown in the example below. To accept it, click 'Confirm', to reject it, click 'Dismiss'.

General fields and labels can be trained independently of each other. Reviewing labels for a message does not mean you have to review the general fields in that same message. It is, however, good practice to do both at the same time, as this is the most efficient use of your time when training the model.

Please Note: It's very important when training general fields to follow the best practices explained below - particularly regarding not partially annotating paragraphs.

To understand how well the platform is able to predict each general field enabled for a dataset (particularly the trainable ones), see Validation for general fields.

Example message with both assigned and predicted general fields

Note:

It's important that you reject incorrect general field predictions. If the highlighted text was in fact a different general field (which is more common for date-related general fields), apply the correct one afterwards (see below on how to apply general fields).

Applying general fields

To apply a general field to some text where the platform has not predicted it, simply highlight that section of text as you would if you were going to copy it.

A dropdown menu will appear, as shown below, containing all of the general fields that you have enabled for your dataset. Simply click the correct one to apply it, or press the corresponding keyboard shortcut.

The default keyboard shortcut for each general field is the letter it starts with. If more than one general field starts with the same letter, a different letter will be assigned at random to the others.
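The shortcut rule can be sketched in Python as follows. Which field keeps its first letter when two clash (here, whichever comes first in the list) and how the random letter is picked are assumptions, since the guide does not specify them; assign_shortcuts is a hypothetical function.

```python
import random
import string


def assign_shortcuts(names: list[str], rng: random.Random) -> dict[str, str]:
    """Give each field its first letter; clashing fields get a random unused letter.

    Which field keeps the first letter (here: the earliest in the list) is an assumption.
    """
    shortcuts: dict[str, str] = {}
    clashes: list[str] = []
    for name in names:
        letter = name[:1].lower()
        if letter and letter in string.ascii_lowercase and letter not in shortcuts.values():
            shortcuts[name] = letter
        else:
            clashes.append(name)
    for name in clashes:
        free = [c for c in string.ascii_lowercase if c not in shortcuts.values()]
        if free:  # with more than 26 fields, some are left without a shortcut
            shortcuts[name] = rng.choice(free)
    return shortcuts


print(assign_shortcuts(["Policy Number", "Postcode", "City"], random.Random(0)))
```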

An example message showing general field application modal

Once a general field has been applied, it will be highlighted in colour with a bold outline (see below). Each general field type will have its own specific colour.

An example message showing an applied ‘Policy Number’ general field

Note:

A value for a given general field type cannot be split across multiple paragraphs. The full value must be contained within a paragraph for it to be extracted as one general field value.

Best Practice

There are two very important best practices to remember when accepting, rejecting or applying general fields within messages:

1. Don't split words

It’s important not to split words – the highlighted general field should cover the entire word (or several) in question, not just part of it (see the incorrect example on the left below, and the correct application on the right)
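If you prepare highlights programmatically (for example, when checking exported annotations), a simple guard is to widen any span until it sits on word boundaries. This Python sketch treats letters and digits as word characters, which is a simplifying assumption; both functions are hypothetical helpers.

```python
def expand_to_word_boundaries(text: str, start: int, end: int) -> tuple[int, int]:
    """Widen a [start, end) highlight so it never cuts a word in half."""
    while start > 0 and text[start - 1].isalnum():
        start -= 1
    while end < len(text) and text[end].isalnum():
        end += 1
    return start, end


def splits_a_word(text: str, start: int, end: int) -> bool:
    return expand_to_word_boundaries(text, start, end) != (start, end)


text = "Address: 12 High Street, Leeds"
start, end = expand_to_word_boundaries(text, 10, 21)  # the user highlighted "2 High Stre"
print(text[start:end])  # 12 High Street
```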

Incorrect example of the ‘Address Line’ general field being applied

Correct example of the ‘Address Line’ general field being applied

2. Don't partially annotate paragraphs

When annotating, if a user assigns one label to a message, they should apply ALL the labels that could apply to that message, otherwise you teach the model that those other labels should not apply. For general fields, the same is true, except general fields are reviewed or applied at the paragraph level, rather than the whole message.

Paragraphs in a message are separated by new lines. The subject line of an email message is considered its own single paragraph.

Make sure to review or apply all of the general fields within a paragraph across all general field kinds if you review or apply one of them. Applying, accepting or rejecting general fields in a paragraph means that the paragraph is treated as ‘reviewed’ by the platform from a general field perspective. Therefore, it’s important to accept or reject ALL of the predictions in that paragraph.
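The paragraph rule can be checked mechanically. This Python sketch treats the subject line as its own paragraph and splits the body on new lines, as described above; the way predictions and reviews are keyed by paragraph index is an assumption made for illustration.

```python
def paragraphs_of(subject: str, body: str) -> list[str]:
    """The subject line is its own paragraph; the body is split on new lines."""
    return [subject] + [line for line in body.split("\n") if line.strip()]


def partially_reviewed(predicted: dict[int, set[str]], reviewed: dict[int, set[str]]) -> list[int]:
    """Paragraph indexes where some predictions were confirmed or dismissed but others were left untouched."""
    flagged = []
    for index, fields in predicted.items():
        done = reviewed.get(index, set())
        if done and not fields <= done:
            flagged.append(index)
    return sorted(flagged)


paragraphs = paragraphs_of("Policy update", "Hi,\nPolicy PN-123456 now costs £20 a month.\nThanks")
predicted = {2: {"Policy Number: PN-123456", "Monetary Quantity: £20"}}
reviewed = {2: {"Policy Number: PN-123456"}}  # the £20 prediction was never accepted or rejected
print(partially_reviewed(predicted, reviewed))  # [2]
```

Paragraph 2 is flagged because, once any field in it is reviewed, the untouched £20 prediction would be treated as wrong.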

The example below shows the different paragraphs that have been reviewed within the email message.

Example email message showing correctly reviewed general fields across multiple paragraphs

The message shown below shows the same example where the user has not accepted or rejected all of the general field predictions in a single paragraph. This is incorrect, as the model will falsely treat the monetary quantity general field as an incorrect prediction.

Example email message that has not been properly reviewed

Validation for general fields

Introduction

The platform displays validation statistics, warnings and recommended actions for enabled general fields in the Validation page, much like it does for every label in your taxonomy.

To see these, navigate to the Validation page and select the General fields tab at the top, as shown in the image below.

How to access General fields Validation page

How does general field validation work?

The process by which the platform validates its ability to correctly predict general fields is very similar to the process it uses for labels.

Messages are split (80:20) into a training set and a test set (determined randomly by the message ID of each message) when they are first added to the dataset. Any general fields that have been assigned (predictions that were accepted or corrected) will fall into the training set or the test set, based on whichever set the message that they're in was assigned to originally.
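A minimal Python sketch of a split that is random but fixed by message ID follows. The guide does not say how the platform derives the split from the ID, so hashing with SHA-256 here is purely an assumption; the point is that every assigned general field inherits its message's set.

```python
import hashlib


def in_training_set(message_id: str, train_percent: int = 80) -> bool:
    """Deterministically place a message in the training or test set from its ID.

    The hashing scheme is an assumption; the guide only says the split is random
    and determined by message ID.
    """
    digest = hashlib.sha256(message_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100 < train_percent


def split_fields(fields_by_message: dict[str, list[str]]) -> tuple[list[str], list[str]]:
    """Assigned general fields follow whichever set their message was placed in."""
    train: list[str] = []
    test: list[str] = []
    for message_id, fields in fields_by_message.items():
        (train if in_training_set(message_id) else test).extend(fields)
    return train, test
```

Because all fields in a message move together, one message with many assigned fields can noticeably skew the train and test counts, as the next paragraph explains.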

As there can sometimes be a very large number of general fields in one message and no guarantee whether a message is in the training set or the test set, you may see a large disparity between the number of general fields in each set.

There may also be instances where all of the assigned general fields fall into the training set. As at least one example is required in the test set to calculate the validation scores, such a general field would need more assigned examples until some are present in the test set.

How are the scores calculated?

The individual precision and recall statistics for each general field with sufficient training data are calculated in a very similar way to that of labels:

Precision = No. of matching general fields / No. of predicted general fields

Recall = No. of matching general fields / No. of actual general fields

A 'matching general field' is where the platform has predicted the general field exactly (i.e. no partial matches)

The F1 Score is simply the harmonic mean of both precision and recall.
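The formulas above, with exact matching, can be written as a short Python function. Representing each general field as a (name, start offset, end offset) tuple is an assumption made for this sketch.

```python
Span = tuple[str, int, int]  # (general field name, start offset, end offset)


def field_scores(predicted: set[Span], actual: set[Span], name: str) -> dict[str, float]:
    """Precision, recall and F1 for one general field, counting exact span matches only."""
    pred = {span for span in predicted if span[0] == name}
    gold = {span for span in actual if span[0] == name}
    matching = len(pred & gold)  # a span that is off by one character does not match
    precision = matching / len(pred) if pred else 0.0
    recall = matching / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


predicted = {("Date", 0, 10), ("Date", 20, 30), ("Date", 40, 45)}
actual = {("Date", 0, 10), ("Date", 20, 31)}
print(field_scores(predicted, actual, "Date"))
```

Here only one of three predictions matches exactly, so precision is 1/3, recall is 1/2, and F1 is 2 × (1/3 × 1/2) / (1/3 + 1/2) = 0.4. The near miss at offsets 20 to 30 counts against both.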

Trainable general fields

It's worth noting that the precision and recall stats shown in this page are most useful for the general fields that are trainable live in the platform (shown in the second column above), as all of the general fields reviewed for these general field kinds will directly impact the platform's ability to predict those general fields.

Hence accepting correct general fields and correcting or rejecting wrong general fields should be done wherever possible.

Pre-trained general fields

For general fields that are pre-trained via template field types, in order for the validation statistics to provide an accurate reflection of performance, users would need to ensure they accept a considerable amount of correct predictions, as well as correcting wrong ones.

If they were only to correct wrong predictions, the train and test sets would be artificially full of only the instances where the platform has struggled to predict a general field, and not those where it is better able to predict them. As correcting wrong predictions for these general fields does not lead to a real-time update of these general fields (they are updated periodically offline), the validation statistics may not change for some time and could be artificially low.

Accepting lots of correct predictions may not always be convenient, as these general fields are predicted correctly far more often than not. But if the majority of the predictions are correct for these general fields, you likely do not need to worry about their precision and recall stats in the Validation page.

What do the summary statistics mean?

The summary stats (average precision, average recall and average F1 score) are simply averages of each of the individual general field scores.

Like with labels, only general fields that have sufficient training data are included in the average scores. Those that do not have sufficient training data to be included have a warning icon next to their name.

Note: The summary stats incorporate all of the general fields with sufficient training data, both those that are trainable live and those that are pre-trained. The predictions for general fields that are pre-trained are often only corrected when they are wrong, and not always accepted when they are right. This means their precision and recall stats can often be artificially low, which would lower the average scores.
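The summary stats can be sketched in Python as plain averages of per-field scores, skipping fields without sufficient data. The numbers and the "sufficient" flag below are invented for illustration; the platform's own sufficiency check is not described in this guide.

```python
from statistics import mean

# Hypothetical validation results; "sufficient" stands in for the platform's own data check.
results = {
    "Policy Number": {"precision": 0.9, "recall": 0.9, "sufficient": True},
    "Monetary Quantity": {"precision": 0.6, "recall": 0.6, "sufficient": True},
    "UK Postcode": {"precision": 0.1, "recall": 0.1, "sufficient": False},
}


def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def summary(per_field: dict[str, dict]) -> dict[str, float]:
    """Average each field's own scores, skipping fields without sufficient data."""
    eligible = [r for r in per_field.values() if r["sufficient"]]
    if not eligible:
        return {}
    return {
        "average_precision": mean(r["precision"] for r in eligible),
        "average_recall": mean(r["recall"] for r in eligible),
        "average_f1": mean(f1(r["precision"], r["recall"]) for r in eligible),
    }


print(summary(results))
```

In this example, a pre-trained field such as Monetary Quantity whose correct predictions were rarely accepted would pull every average down to 0.75, even though the trainable field scores 0.9. The field without sufficient data is ignored entirely.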

Metrics

The General fields Validation page shows the average general field performance statistics, as well as a chart showing the average F1 score of each general field versus their training set size. The chart also flags general fields that have amber or red performance warnings.



The general field performance statistics shown are:

  • Average F1 Score: Average of F1 scores across all general fields with sufficient data to accurately estimate performance. This score weighs recall and precision equally. A model with a high F1 score produces fewer false positives and negatives.
  • Average Precision: Average of precision scores across all general fields with sufficient data to accurately estimate performance. A model with high precision produces fewer false positives.
  • Average Recall: Average of recall scores across all general fields with sufficient data to accurately estimate performance. A model with high recall produces fewer false negatives.

Understanding general field performance

The general field performance chart shown in the Metrics tab of the Validation page (see above) gives an immediate visual indication of how each individual general field is performing.

For a general field to appear on this chart, it must have at least 20 pinned examples present in the training set used by the platform during validation. To ensure that this happens, users should make sure they provide a minimum of 25 (often more) pinned examples per general field from 25 different messages.
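A quick calculation shows why 25 examples is a minimum rather than a guarantee. Assuming each example sits in a different message and each message independently lands in the training set with probability 0.8 (the 80:20 split), the number of training examples follows a binomial distribution, which this Python sketch evaluates.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.8) -> float:
    """Chance that at least k of n examples land in the training set.

    Assumes each example sits in a different message and each message goes to the
    training set independently with probability p (the 80:20 split).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


print(round(prob_at_least(20, 25), 3))  # 0.617
```

With exactly 25 examples, only about 62% of the time will 20 or more end up in the training set. Providing more examples raises that probability, and spreading them across different messages keeps them from all moving into the same set together.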

Each general field will be plotted as one of three colours, based on the model's understanding of how the general field is performing. Below, we explain what these mean:

General field performance indicators


  • General fields plotted in blue have a satisfactory performance level. This is based on numerous contributing factors, including the number and variety of examples and the average precision for that general field.
  • General fields plotted in amber have slightly less than satisfactory performance. They may have relatively low average precision or not quite enough training examples. These general fields require a bit of training or correction to improve their performance.
  • General fields plotted in red are performing poorly. They may have very low average precision or not enough training examples. These general fields may require considerably more training or correction to bring their performance up to a satisfactory level.
Note: You will see the amber and red performance indicators appear in the general field filter bars in Explore, Reports and Validation. This helps to quickly notify you which general fields need some help, and also which general fields' predictions should not be relied upon (without some work to improve them) when using the analytics features.

Individual general field performance

Users can select individual general fields from the general field filter bar (or by clicking the general field's plot on the All general fields chart) in order to see the general field's performance statistics.

The specific general field view will also show any performance warnings and recommended next best action suggestions to help improve its performance.

The general field view will show the average F1 score for the general field, as well as its precision and recall.
Example general field card with recommended actions

Improving general field performance

User permissions required: Review and annotate.

Overview

As with labels, training general fields is the process by which a user teaches the platform which general fields apply to a given message, using various training modes.

The ‘Teach’, ‘Check’, and ‘Missed’ modes are also available to help train and improve the performance of general fields. They can be accessed either 1) on the Explore page using the training dropdown, or 2) by following the recommended actions on the General fields tab of the Validation page.

This is what the dropdown menu containing the general field training modes in Explore looks like:


General field recommended actions

If a specific general field has a performance warning, the platform recommends the next best actions that it thinks will help address that warning. These are shown when you select a specific general field from the taxonomy or the All general fields chart.

The next best action suggestions act as links that take you directly to the training view the platform suggests for improving the general field's performance. The suggestions are ordered by priority, with the action most likely to improve the general field listed first.

This is the most important tool to help you understand the performance of your general fields, and should regularly be used as a guide when trying to improve general field performance.

Check this example of a general field card with recommended actions:


General field training modes

The following table summarises when the platform recommends each general field training mode:

Teach General field
  • Shows predictions for a general field where the model is most unsure whether it applies
  • For training general fields on unreviewed messages

Check General field
  • Shows messages where the platform thinks the general field may have been misapplied
  • For training general fields on reviewed messages, to try to find and correct any inconsistencies

Missed General field
  • Shows messages that the platform thinks may be missing the selected general field
  • For training general fields on reviewed messages, to try to find and correct any inconsistencies

Using Teach General field

Using Teach General field boosts general field performance, because the model is being given new information on messages it is unsure about, as opposed to ones that it already has highly confident predictions for.



The platform recommends Teach General Fields when:

  • There is a performance warning next to a general field (for example, when the minimum of 25 examples has not been provided)
  • The F1 score on a given general field is low
  • The text does not always contain obvious context for a general field, or there is a lot of variation in the values of a given general field
This is an example of training a general field in Teach General Fields mode:


Using Check General Fields

Using Check General Fields helps identify inconsistencies in the reviewed set. It also improves the model's understanding of the general field by ensuring that the model has correct and consistent examples to make predictions from. This will improve the recall of a general field.

The platform recommends Check General Fields when:

  • There is low recall, but high precision
  • The platform's predictions are very accurate, but it misses many of the messages where the general field has been applied
This is an example of training a general field in Check general fields mode:


For more details on how general field validation scores are calculated, check the Validation for general fields page.

Using Missed General Field

Using Missed General Field helps find examples in the reviewed set that should have the selected general field but do not. It also helps identify partially annotated messages, which can be detrimental to the model's ability to predict a general field. This will improve the precision of a general field and ensure the model has correct and consistent examples to make predictions from.

The platform recommends Missed General Field when:

  • There is high recall, but low precision
  • The platform often predicts the general field incorrectly, but it catches most of the examples where the general field should apply
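The recommendation rules for the three modes can be condensed into a rough rule of thumb. The Python sketch below is only an illustration. The function name `suggest_training_mode` and the thresholds (`min_examples=25`, `low=0.6`, `high=0.8`) are hypothetical, and the platform's real recommendations also weigh factors such as the variety of examples.

```python
def suggest_training_mode(
    precision: float,
    recall: float,
    pinned_examples: int,
    min_examples: int = 25,
    low: float = 0.6,
    high: float = 0.8,
) -> str:
    """Hypothetical rule of thumb; the thresholds are illustrative only."""
    # Not enough examples yet: give the model new, informative examples.
    if pinned_examples < min_examples:
        return "Teach General field"
    # Accurate when it predicts, but misses many annotated examples.
    if recall < low and precision >= high:
        return "Check General field"
    # Catches most examples, but predicts the field incorrectly too often.
    if precision < low and recall >= high:
        return "Missed General field"
    # Otherwise keep teaching while the F1 score is still low.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return "Teach General field" if f1 < high else "No action suggested"


print(suggest_training_mode(precision=0.95, recall=0.4, pinned_examples=60))  # Check General field
```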
This is an example of training a general field in Missed General Field mode:


For more details on calculations for general field validation, check the Validation for general fields page.

Building custom regex general fields

Permissions required: Modify Datasets.

Note: You can build custom Regex general fields through the Dataset settings or through the Manage general fields option in the Generative Extraction field annotation experience, as explained in detail on the Generative extraction page.

What are custom Regex general fields?

Use custom Regex general fields to extract and format spans of text that have a known repetitive structure, such as IDs or reference numbers.

This is a useful option for simple, structured general fields with little variation. For general fields with significant variation, where the context has a big influence on predictions, a machine-learning based general field is the right choice. You can use a combination of the two in any dataset within Communications Mining.

A broader Regex (i.e., a set of rules defining the general field) can also be used as the base of a custom general field. This combines the rules with contextual, machine-learning based refinement through training within Communications Mining to create sophisticated custom general fields. This provides the best performance as well as the necessary restrictions on the values extracted for automation.

Custom Regex Template

A Custom Regex general field is made up of a field type with the Regex data type, which in turn has one or more custom Regex Templates. Each template expresses one way to extract (and format) the general field.

Combined together, these templates offer a flexible and powerful way to cover multiple representations of the same general field type.

A template is made of two parts:

  1. The regex (regular expression), which describes the constraints that need to be met by a span of text to be extracted as a general field.
  2. The formatting, which expresses how to normalise the extracted string into a more standard format.
For instance, if your customer IDs are either the word ID followed by 7 digits, or an alphanumeric string of 9 characters, this is what your two templates will look like:
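As a rough analogue outside the platform, the Python sketch below treats one field type as a list of templates and collects the spans matched by any of them. The two patterns are hypothetical readings of the customer ID rules above. In particular, the optional space after "ID" is an assumption. Formatting is left out here.

```python
import re

# Hypothetical templates for one "customer ID" field type.
TEMPLATES = [
    re.compile(r"\bID ?\d{7}\b"),         # the word ID, an optional space, then 7 digits
    re.compile(r"\b[A-Za-z0-9]{9}\b"),    # any standalone 9-character alphanumeric string
]


def extract_customer_ids(text: str) -> list[str]:
    """Return the distinct spans matched by any template, ordered by position."""
    spans = {}
    for template in TEMPLATES:
        for match in template.finditer(text):
            # Keying on (start, end) stops a span matched by two templates being counted twice.
            spans[(match.start(), match.end())] = match.group(0)
    return [spans[key] for key in sorted(spans)]


print(extract_customer_ids("Customer ID 1234567 and ref AB12CD34E"))
```

Note that a value such as ID1234567 satisfies both templates. This is why the sketch de-duplicates on span position.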




Type-ahead validation

When you type into the text box for either the Regex or the Formatting, the interface provides immediate feedback on the validity of the input. For instance, the invalid regex ID\d{} will show:

Extraction preview

The Custom Regex Template can be tested on text to ensure that it behaves as expected. Any general field that would be extracted with the Template will be shown in a list, with its value, as well as the position of the start and end characters.

For instance, if the Regex is \d{4} and the formatting ID-{$}, the following test string will show one extraction:
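The Python sketch below imitates the extraction preview. It lists each formatted value together with its start and end character positions. It handles only the {$} placeholder, and the test sentence is made up. Python's `end` is exclusive (one past the last character), and the platform's preview may count positions differently.

```python
import re


def preview(regex: str, formatting: str, text: str) -> list[tuple[str, int, int]]:
    """List (formatted value, start, end) for every match of regex in text.

    Only the "{$}" placeholder (the full match) is supported in this sketch.
    """
    results = []
    for match in re.finditer(regex, text):
        value = formatting.replace("{$}", match.group(0))
        results.append((value, match.start(), match.end()))
    return results


print(preview(r"\d{4}", "ID-{$}", "Please update account 5831 today"))
# [('ID-5831', 22, 26)]
```

In Python's re, \d{4} also matches the first four digits of a longer number. Wrapping it as \b\d{4}\b avoids that if it is not the intended behaviour.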


Regex

The regex is the pattern used to extract general fields in the text. Check the syntax documentation.

Named capture groups can be used to identify a specific section of the extracted string for subsequent formatting. The names of the capture groups should be unique across all templates, and should only contain lowercase letters or digits.
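The naming rules can be checked mechanically. The Python sketch below assumes the (?P<name>...) group syntax used in the examples on this page. It is a hypothetical pre-check, not a platform feature. It reports names that contain anything other than lowercase letters or digits, and names reused across a field's templates.

```python
import re

# Opening of a named group in the (?P<name>...) syntax.
NAMED_GROUP = re.compile(r"\(\?P<([^>]+)>")
VALID_NAME = re.compile(r"[a-z0-9]+")


def check_group_names(templates: list[str]) -> list[str]:
    """Return naming problems for the named groups across all templates of a field."""
    problems = []
    seen = set()
    for template in templates:
        for name in NAMED_GROUP.findall(template):
            if not VALID_NAME.fullmatch(name):
                problems.append(f"invalid name: {name}")
            if name in seen:
                problems.append(f"duplicate name: {name}")
            seen.add(name)
    return problems


print(check_group_names([r"(?P<id>\d{3})", r"(?P<id>\d{4})"]))  # ['duplicate name: id']
```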

Formatting

Formatting can be provided to post-process the extracted general field.

By default, no formatting is applied and the string returned by the platform will be the string extracted by the regex. However, if needed, more complex transformations can be defined, using the following rules.

Variables

Any named capture group defined in the regex will be available to use in the formatting logic as a variable, prefixed with the $ symbol. Note that the $ symbol by itself represents the full regex match.
Variables can then be used in the formatting string to insert the corresponding extracted span into the value returned by the platform; the variable name needs to be surrounded by { and } braces.
For instance, to extract seven digits as an ID and return them prefixed with ID-, the regex and the formatting would be:


Or, using a named capture group:


Given the text My identification number is 1234567, the platform will then return one general field: ID-1234567.
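The variable rules can be mimicked in a few lines of Python. In the sketch below, {$} is replaced with the full match and {$name} with a named group. The regex/formatting pairs are plausible choices for the seven-digit example, and the exact ones used by the platform may differ. Function expressions and the & operator are not handled.

```python
import re

# "{$}" (full match) or "{$name}" (named capture group) in a formatting string.
PLACEHOLDER = re.compile(r"\{\$([a-z0-9]*)\}")


def format_match(match: re.Match, formatting: str) -> str:
    """Substitute each placeholder in formatting with text from the regex match."""
    def substitute(placeholder: re.Match) -> str:
        name = placeholder.group(1)
        value = match.group(0) if name == "" else match.group(name)
        return value or ""  # an unmatched optional group contributes nothing
    return PLACEHOLDER.sub(substitute, formatting)


def extract(regex: str, formatting: str, text: str) -> list[str]:
    return [format_match(m, formatting) for m in re.finditer(regex, text)]


text = "My identification number is 1234567"
print(extract(r"\d{7}", "ID-{$}", text))            # ['ID-1234567']
print(extract(r"(?P<id>\d{7})", "ID-{$id}", text))  # ['ID-1234567']
```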

String Operations

String literals can be used by enclosing them in double quotes, and strings can be concatenated using the & symbol.
Regex: (?P<id1>\b\d{3}\b)|(?P<id2>\b\d{4}\b)
Formatting: {$id1 & "-" & $id2}
Text: The first id is 123 and the second one is 4567
General field returned by the platform: 123-4567

Functions

Some functions can also be used in the formatting to transform the extracted string. The names of the functions and their signatures are inspired by Excel.

Upper

Converts all characters in the extracted span to uppercase:

Regex: \w{3}
Formatting: {upper($)}
Text: abc
General field returned by the platform: ABC

Lower

Converts all characters in the extracted span to lowercase:

Regex: \w{3}
Formatting: {lower($)}
Text: AbC
General field returned by the platform: abc

Proper

Capitalises the first letter of each word in the extracted span and lowercases the other letters:

Regex: \w+\s\w+
Formatting: {proper($)}
Text: albert EINSTEIN
General field returned by the platform: Albert Einstein
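To test expected outputs outside the platform, the case functions map closely onto Python's built-in string methods. The sketch below approximates Upper, Lower and Proper with str.upper, str.lower and str.title applied to the first regex match. It is an analogue, not the platform's implementation.

```python
import re
from typing import Optional

# Closest Python built-ins to the case functions above (an approximation).
CASE_FUNCTIONS = {
    "upper": str.upper,
    "lower": str.lower,
    "proper": str.title,  # first letter of each word upper-cased, the rest lower-cased
}


def extract_with_case(regex: str, function: str, text: str) -> Optional[str]:
    """Apply a case function to the first regex match, or return None if nothing matches."""
    match = re.search(regex, text)
    return CASE_FUNCTIONS[function](match.group(0)) if match else None


print(extract_with_case(r"\w+\s\w+", "proper", "albert EINSTEIN"))  # Albert Einstein
```

str.title treats any non-letter as a word boundary, so "o'neil" becomes "O'Neil". The platform's proper function may behave differently on such edge cases.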

Pad

Pads the extracted span up to a given size with a given character.

Function arguments:

  1. The text containing the characters to be padded
  2. Size of the padded string
  3. Character to be used for padding
Regex: \d{2,5}
Formatting: {pad($, 5, "0")}
Text: 123
General field returned by the platform: 00123

Substitute

Replaces characters with other characters.

Function arguments:

  1. The text containing the characters to be substituted
  2. What characters to replace
  3. What the old characters should be replaced with
Regex: ab
Formatting: {substitute($, "a", "12")}
Text: ab
General field returned by the platform: 12b
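Pad and Substitute have direct Python counterparts in str.rjust and str.replace. The sketch below relies on two assumptions. The first is left padding, which is inferred from the 123 to 00123 example. The second is that every occurrence is replaced, as in Excel's SUBSTITUTE when no instance number is given. The platform may differ on inputs not covered by the examples.

```python
import re


def pad(text: str, size: int, char: str) -> str:
    """Left-pad text with a single padding character up to the given size.

    Text that is already at least `size` characters long is returned unchanged.
    """
    return text.rjust(size, char)  # rjust requires char to be exactly one character


def substitute(text: str, old: str, new: str) -> str:
    """Replace every occurrence of `old` with `new` (assumed platform behaviour)."""
    return text.replace(old, new)


print(pad(re.search(r"\d{2,5}", "123").group(0), 5, "0"))     # 00123
print(substitute(re.search(r"ab", "ab").group(0), "a", "12"))  # 12b
```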

Left

Returns the first n characters from the span.

Function arguments:

  1. The text containing the characters to be extracted
  2. The number of characters to return
Regex: \w{4}
Formatting: {left($, 2)}
Text: ABCD
General field returned by the platform: AB

Right

Returns the last n characters from the span.

Function arguments:

  1. The text containing the characters to be extracted
  2. The number of characters to return
Regex: \w{4}
Formatting: {right($, 2)}
Text: ABCD
General field returned by the platform: CD

Mid

Returns n characters from the span, starting at the specified position (the first character is position 1).

Function arguments:

  1. The text containing the characters to be extracted
  2. The position of the first character to return
  3. The number of characters to return
Regex: \w{5}
Formatting: {mid($, 2, 3)}
Text: ABCDE
General field returned by the platform: BCD
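Left, Right and Mid can be expressed as Python slices. The main pitfall is that Mid counts positions from 1, as the ABCDE example shows, whereas Python indexes from 0. The sketch below is an analogue for checking expected outputs. Its handling of edge cases such as n = 0, or a start position below 1, is an assumption.

```python
def left(text: str, n: int) -> str:
    """First n characters."""
    return text[:n]


def right(text: str, n: int) -> str:
    """Last n characters; n == 0 is guarded because text[-0:] would return everything."""
    return text[-n:] if n > 0 else ""


def mid(text: str, start: int, n: int) -> str:
    """n characters starting at the 1-based position `start`."""
    begin = max(start - 1, 0)  # convert to Python's 0-based indexing
    return text[begin:begin + n]


print(left("ABCD", 2), right("ABCD", 2), mid("ABCDE", 2, 3))  # AB CD BCD
```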

© 2005-2024 UiPath. All rights reserved.