Document Understanding Modern Projects User Guide

Last updated Jun 19, 2025

Build

This section provides the following experiences:

Upload documents and classify them automatically.
Upload documents straight into document types.
Manage files in the project (add or remove files, and add or change tags).
Annotate documents.
Add or remove fields.
Follow a guided experience for training classification and extraction models using the recommendations.

Annotate documents

After you create your project and upload documents to a specific document type, the documents are automatically pre-annotated. This is done using a combination of generative and specialized models, based on the document type's schema. The schema defines the fields you want to extract from a particular document type. To find the document type's schema, go to the Annotation page and check the Fields section.

For more in-depth information on how to annotate your documents, check the Annotate documents how-to page.

Exceptions for review

You can use documents that were validated in Validation Station to further improve the performance of your models.

If there are any changes after the validation step, the Exceptions for review button is displayed for the impacted document type.

Figure 1. Exceptions for review button

For more in-depth information on how to retrain your models, check the Retrain extractors how-to page.

Tag documents

Once you have uploaded your documents, you can add tags to them.

You can add one tag with a maximum of 100 characters for each document.

To add a tag to your documents, select the documents you want to tag and then select the Tags button from the menu above the document types list.

You can search through your documents more easily if you filter by tag. You can also check the results per tag in the advanced configuration file when a model is trained.

Document type manager

You can edit the settings for multiple fields from Document type manager.

To get there, select the three-dot icon ⋮ next to the document type you want to edit and select Document type manager from the menu.

Figure 2. Select Document type manager

Extraction fields

Editing or adding new fields

To add a new field, select Add field and fill in the needed information. You can add or edit the following options for each field:

Field name: the unique name for the field.
Content type: the content type of the field:
- String: used for company names or addresses, as well as payment terms, or for any other field where you want to build the parsing or formatting logic manually, in the RPA workflow.
- Number: used for amounts or quantities, with intelligent parsing of the decimal/thousands separators.
- Date: used for dates. The output is parsed, formatted, and unified using the YYYY-MM-DD format.
- Phone: used for phone numbers. Formatting removes letters and parentheses, and replaces spaces with dashes.
- ID Number: used for alphanumeric codes and ID numbers. It's similar to the String content type, but removes any characters coming before the : character. If the ID number you need to extract can contain : characters, use the String content type instead to avoid data loss.
Shortcut: the shortcut key for the field. One key or a combination of two keys is allowed.
Advanced settings: the available options differ depending on the Content type of the selected field. Select the Advanced settings button for the desired field to edit:
Figure 3. Document type advanced settings
- Field ID: the unique id for the field.
- Post processing:
  - first_span: if the model predicts more than one instance of a field in a document, make it return the first one.
  - longest_value: if the model predicts more than one instance of a field in a document, make it return the value consisting of the largest number of characters.
  - highest_confidence: if the model predicts more than one instance of a field in a document, make it return the value with the highest confidence.
- Scoring: the measure used to determine accuracy when evaluating model predictions. This option is only available for fields with content type String:
  - exact_match: the prediction is deemed correct (score of 1) only if it exactly matches the true value. If it differs by even a single character, it is deemed incorrect (score of 0). This is the default setting for all fields except for String fields.
  - levenshtein: the prediction is deemed partially correct according to the Levenshtein distance between the prediction and the true value. For example, if a 10-letter value is predicted correctly except for the last 2 characters, then the score of that prediction is 0.8.
- Date format: this field is only available for fields with content type Date and it indicates how ambiguous dates are parsed and returned:
  - Auto
  - US style: YYYY-DD-MM
  - Non-US style: YYYY-MM-DD
- Multi-line: fields that span multiple text lines (addresses or descriptions) need to have this checked. Otherwise, only the first line is returned.
- Multi-value: field returns a list with all the values detected in the document.
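The Phone and ID Number formatting rules described above can be illustrated with a short Python sketch. This is not the product's implementation. The function names `format_phone` and `format_id_number` are hypothetical, and the choices to trim whitespace and to cut at the last : character are assumptions made for illustration only.

```python
import re


def format_phone(raw: str) -> str:
    """Approximate the Phone rule: drop letters and parentheses, turn spaces into dashes."""
    # Remove letters and parentheses, as the content type description states.
    cleaned = re.sub(r"[A-Za-z()]", "", raw)
    # Assumption: collapse whitespace runs and trim the ends before adding dashes.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned.replace(" ", "-")


def format_id_number(raw: str) -> str:
    """Approximate the ID Number rule: discard everything before the ':' character."""
    # Assumption: cut at the last ':' and drop the colon itself.
    # rpartition returns the whole string as the tail when no ':' is present.
    return raw.rpartition(":")[2].strip()


print(format_phone("Tel (555) 123 4567"))        # 555-123-4567
print(format_id_number("Invoice No: INV-0042"))  # INV-0042
print(format_id_number("AB:12:34"))              # 34
```

The last call shows the kind of data loss that can occur when an ID itself contains : characters, which is why the String content type is the safer choice in that case.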

You can also reorder the fields from this view.
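The Post processing and Scoring options listed under Advanced settings can be sketched in Python as follows. This is a minimal illustration, not the product's code. The `Prediction` class and its `position` attribute are hypothetical, and normalizing the Levenshtein distance by the length of the longer string is an assumption chosen because it reproduces the 0.8 example given for the levenshtein option.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    value: str         # predicted text for the field
    confidence: float  # model confidence (assumed to lie between 0 and 1)
    position: int      # order of appearance in the document (hypothetical)


def first_span(preds: list[Prediction]) -> Prediction:
    # Return the instance that appears first in the document.
    return min(preds, key=lambda p: p.position)


def longest_value(preds: list[Prediction]) -> Prediction:
    # Return the instance whose value has the most characters.
    return max(preds, key=lambda p: len(p.value))


def highest_confidence(preds: list[Prediction]) -> Prediction:
    # Return the instance with the highest confidence.
    return max(preds, key=lambda p: p.confidence)


def exact_match(prediction: str, truth: str) -> float:
    # Score 1 only when the strings are identical, otherwise 0.
    return 1.0 if prediction == truth else 0.0


def levenshtein_distance(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, keeping one row at a time.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_score(prediction: str, truth: str) -> float:
    # Assumption: score = 1 - distance / length of the longer string.
    longest = max(len(prediction), len(truth))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(prediction, truth) / longest


preds = [
    Prediction("INV-1", 0.70, 0),
    Prediction("INV-1001", 0.65, 1),
    Prediction("INV-10", 0.90, 2),
]
print(first_span(preds).value)          # INV-1
print(longest_value(preds).value)       # INV-1001
print(highest_confidence(preds).value)  # INV-10
print(round(levenshtein_score("ABCDEFGHXY", "ABCDEFGHIJ"), 2))  # 0.8
```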

Changes in document type settings are not reflected in the new project version if you publish a new project version before re-triggering a training.

Workaround: To avoid this, retrain the document type after making modifications to the document type fields. You can do this by tagging or confirming additional documents for that type before publishing a new version.

Search field names

You can search through the available field names. To do so, use the search bar in the top-left corner of the Document type manager interface. For a more efficient search, use the Filter feature to filter by Content type.

Figure 4. Search field names

Delete fields

Select the Delete button next to the field you want to delete.

Figure 5. Delete a field

You can also select several (or all) fields and delete them at once. To do so, select the check mark next to the fields you want to delete and then select Delete.

Figure 6. Delete several fields at once

Classification fields

Note: Classification fields are currently in public preview.

Classification fields are data points that refer to a document as a whole. For example, the expense type of a receipt (food, hotel, airline, or transportation) or the currency of an invoice (USD, EUR, JPY) are classification fields.

Note:

The following limitations currently apply for the Classification fields feature:

When using the Extract Document Data activity, classification fields are supported for modern project extractors and out-of-the-box models, but not for classic project extractors.
Classification fields are extracted for custom document types only after a successful training.

Editing or adding classification fields

To add a new classification field, select Add field and type in a name for the new field.

You can also reorder the fields from this view.

Figure 7. Add a new classification field

To check the classification field ID, select Advanced settings next to the needed classification field.

Figure 8. Classification fields advanced settings

Editing or adding classes

To add a new class for a classification field, select Add class and type in a class name and an optional description.

Note: Each classification field must contain at least two classes. Each class needs to be annotated at least five times to be included in training.
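The two thresholds in the note above can be checked with a small Python sketch before you start a training. The function names and the way annotations are represented (one class label per annotated document) are hypothetical and are not part of the product.

```python
from collections import Counter

MIN_CLASSES = 2                # each classification field needs at least two classes
MIN_ANNOTATIONS_PER_CLASS = 5  # each class needs at least five annotations


def has_enough_classes(classes: list[str]) -> bool:
    # A classification field is valid only with two or more distinct classes.
    return len(set(classes)) >= MIN_CLASSES


def trainable_classes(classes: list[str], annotations: list[str]) -> list[str]:
    # Count how often each class was annotated and keep those that reach the minimum.
    counts = Counter(annotations)
    return [c for c in classes if counts[c] >= MIN_ANNOTATIONS_PER_CLASS]


classes = ["food", "hotel", "airline"]
annotations = ["food"] * 6 + ["hotel"] * 5 + ["airline"] * 2

print(has_enough_classes(classes))              # True
print(trainable_classes(classes, annotations))  # ['food', 'hotel']
```

In this example, the airline class has only two annotations, so it would not be included in training until at least three more documents are annotated with it.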

Figure 9. Add a new class

You can edit the name and description for each class.

You can also reorder the classes from this view.

To remove a class, select Delete next to the class you want to remove.

Figure 10. Delete a class

Settings

You can change the document type settings from the Settings tab.

Figure 11. Model settings

You can change the following settings:

Base model: Dataset size estimations used in the Recommended Actions depend on the base model used to train. Using the most similar base model to your Document Type will reduce the amount of annotation work required.
Number of languages: Dataset size estimations used in the Recommended Actions depend on the number of languages in the dataset. More languages generally require annotating more data.

Search documents

You can search uploaded documents by document name. To do so, use the search bar from the left corner of the Build section. For a more efficient search, use the Filter feature to filter by:

Document type: choose the desired document type from the drop-down list.
Upload date: choose a date interval when the document was uploaded.
Status: choose the status of the document.
Tag: choose the tags you want to filter.

Figure 12. Filter documents

Project and model score

You can check your project's overall score from the top right corner. This score factors in the classifier and extractor scores for all document types. Click Project score to display the Measure section. You can check more in-depth performance measurements in that section.

You can check the score for each document type separately from the Document type section. This score factors in the overall performance of the model, as well as the size and quality of the dataset.

Note: You need to upload at least 10 documents to get a project score. For a document type score, you need at least 10 documents under the same document type.
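The document thresholds in the note above can be expressed as a short Python sketch. The function name `score_availability` and the input format (one document type name per uploaded document) are hypothetical and serve only to illustrate the rule.

```python
from collections import Counter

MIN_DOCUMENTS = 10  # minimum stated in the note for both score kinds


def score_availability(document_types: list[str]) -> tuple[bool, dict[str, bool]]:
    # The project score needs at least 10 uploaded documents in total.
    project_ready = len(document_types) >= MIN_DOCUMENTS
    # A document type score needs at least 10 documents under that same type.
    counts = Counter(document_types)
    per_type = {doc_type: n >= MIN_DOCUMENTS for doc_type, n in counts.items()}
    return project_ready, per_type


uploaded = ["invoice"] * 10 + ["receipt"] * 4
print(score_availability(uploaded))
# (True, {'invoice': True, 'receipt': False})
```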

You can check the model rating of your models if you select the score tag. The model rating is a functionality intended to help you visualize the performance of a classification model. It is expressed as a model score from 0 to 100 as follows: