Document Understanding User Guide

Last updated Feb 4, 2025

Machine Learning Extractor Trainer

The Machine Learning Extractor Trainer collects human feedback in a directory of your choice. Once you have collected data and want to retrain an ML model, zip the contents of that directory and upload the archive to Data Manager for curation.
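The zipping step can be scripted. The following is a minimal sketch in Python, not part of the product: the function name `zip_training_folder` and both of its parameters are hypothetical, and it only assumes that the feedback sits in a local folder. Uploading the resulting archive to Data Manager remains a manual step.

```python
import shutil
from pathlib import Path


def zip_training_folder(output_folder: str, archive_base: str) -> Path:
    """Zip the contents of output_folder into '<archive_base>.zip'.

    Both names are illustrative assumptions, not UiPath APIs:
    output_folder is the directory where the trainer wrote its feedback,
    archive_base is the target archive path without the '.zip' extension.
    Keep archive_base outside output_folder so the archive does not
    end up inside the directory being zipped.
    """
    folder = Path(output_folder)
    if not folder.is_dir():
        raise NotADirectoryError(f"Not a directory: {folder}")

    # root_dir=folder stores the folder's contents at the top level of the
    # archive, rather than nesting them under the folder name.
    archive_path = shutil.make_archive(archive_base, "zip", root_dir=folder)
    return Path(archive_path)
```

For example, `zip_training_folder("C:/feedback", "C:/exports/feedback")` would produce `C:/exports/feedback.zip`; both paths are placeholders.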

Endpoint & ML Skill Options

The same rules that apply to the Machine Learning Extractor also apply to the Machine Learning Extractor Trainer. See the Machine Learning Extractor documentation for details.

How to Use

Follow these steps to use the Machine Learning Extractor Trainer activity.

Use the Taxonomy Manager Wizard to define your document types and fields.
Drag a Machine Learning Extractor Trainer activity into a Train Extractors Scope activity.
In the Machine Learning Extractor wizard that automatically opens, add the Endpoint information.
Select the Update activity arguments checkbox if you also want to use the entered values as input arguments for the activity, specifically for the Endpoint.
Click the Get Capabilities button. The wizard closes after this operation.
Enter a value for Output Folder.
Select the Configure Extractors option in the Train Extractors Scope. A wizard is displayed.
The Machine Learning Extractor Trainer is now ready for configuration. Expand the document type you want to apply it to, then select the checkboxes next to the fields you want to train.
Fill in the textboxes either manually or by selecting, from the available drop-down list, the correct data you wish to map to each field. The drop-down list contains all fields that the Machine Learning Extractor Trainer, using the endpoint entered in the Machine Learning Extractor wizard, declares as extraction capability.
Note: If you select the checkbox but leave the textbox empty, the textbox is automatically filled in with the Document Type ID from the local taxonomy. The changes apply after saving. To avoid using a long string for the field ID, we recommend entering a value manually if you do not have access to the internal taxonomy of the extractor.
To check whether you are using the latest capabilities of the extractor, click the Get or refresh extractor capabilities option, which opens the Machine Learning Extractor wizard.
Selecting one of the options from a drop-down list automatically confirms that field.
To train an extractor based on its extraction result, set the Framework Alias field to the exact alphanumeric value previously used for that extractor.
Select the Save button once all fields are configured properly.
Important: You cannot choose the same option for two distinct fields.
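The two mapping rules from the steps above (an empty textbox falls back to an ID from the local taxonomy, and no option may be used for two distinct fields) can be pictured with a small sketch. This is purely illustrative Python and does not reflect how the wizard is implemented: `resolve_field_mapping`, its parameters, and the sample field names are all hypothetical.

```python
from collections import Counter
from typing import Dict


def resolve_field_mapping(
    entered: Dict[str, str], taxonomy_ids: Dict[str, str]
) -> Dict[str, str]:
    """Model the wizard's mapping rules for the selected fields.

    entered: selected taxonomy field -> text typed or picked in the textbox
             (an empty string means the textbox was left blank).
    taxonomy_ids: selected taxonomy field -> ID from the local taxonomy,
                  used as the fallback for blank textboxes.
    Both dictionaries are assumptions made for this illustration.
    """
    resolved = {}
    for field, value in entered.items():
        value = value.strip()
        # Rule 1: a blank textbox is filled in from the local taxonomy.
        resolved[field] = value if value else taxonomy_ids[field]

    # Rule 2: the same option cannot be mapped to two distinct fields.
    counts = Counter(resolved.values())
    duplicates = sorted(v for v, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"Options mapped to more than one field: {duplicates}")
    return resolved
```

With `entered = {"Invoice Number": "invoice-no", "Total": ""}`, the blank `Total` entry would resolve to whatever ID `taxonomy_ids["Total"]` holds, while mapping both fields to `"invoice-no"` would raise a `ValueError`.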