Document classification training overview

What is document classification training

Document Classification Training is a component of the Document Understanding™ Framework that closes the feedback loop for classifiers capable of learning from human feedback.

When classification training should be used

You can build Document Understanding processes that do not contain any training component. Common reasons for doing so include:

the classifiers you are using do not support retraining,
you do not want to perform retraining because you would rather have the process always use the same training,
you want to update the classifier training offline and manage its updates outside of your Document Understanding process.

In most cases, though, training your classifiers as part of regular process usage is highly beneficial. The classifiers gather their own training data and update themselves by ingesting the human validation information, without requiring any changes to your existing workflows. They become, in effect, self-learning algorithms that improve over time based on the data that humans have validated as correct.

How to use the document classification training component

Classification training is done through the Train Classifiers Scope activity. The scope configures and executes one or more classification training algorithms in one go, so you can train one or more classifiers at the same time.

Classification training is usually run after Document Classification Validation. Only human-confirmed feedback should be sent back to the classifiers for training, so that the training data the algorithms receive is accurate.

Classification training should be run both after a failed classification (no automatic classification, or an automatic classification that the knowledge worker corrected) and after a successful one (all automatic results confirmed, with no corrections made during validation). The algorithms can learn from both cases.
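The rule above can be sketched in Python. This is an illustration of the logic only, not UiPath code: the ValidatedDocument record, the to_training_sample helper, and the sample documents are all hypothetical, and in a real process the work is done by the Train Classifiers Scope activity inside a workflow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ValidatedDocument:
    """Hypothetical record of one document after human validation."""
    text: str
    predicted_type: Optional[str]  # None when automatic classification failed
    validated_type: str            # document type confirmed by the human


def to_training_sample(doc: ValidatedDocument) -> dict:
    """Turn a validated document into a training sample.

    Every validated document yields a sample, whether the automatic
    prediction was confirmed, corrected, or missing altogether.
    """
    if doc.predicted_type is None:
        outcome = "unclassified"
    elif doc.predicted_type == doc.validated_type:
        outcome = "confirmed"
    else:
        outcome = "corrected"
    # The label always comes from the human-validated type, never the prediction.
    return {"text": doc.text, "label": doc.validated_type, "outcome": outcome}


# Made-up documents covering the three cases described in the text.
documents = [
    ValidatedDocument("sample invoice text", "Invoice", "Invoice"),
    ValidatedDocument("sample receipt text", "Invoice", "Receipt"),
    ValidatedDocument("sample purchase order text", None, "PurchaseOrder"),
]
samples = [to_training_sample(d) for d in documents]
```

Note that the sketch never drops a document: the confirmed case reinforces a correct prediction, while the corrected and unclassified cases supply the examples the classifier was missing.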

You can train both classifiers that were used in the Document Classification phase and classifiers that were not used for classification prediction. The latter approach lets you collect training data and train a classifier from scratch, with the intent of later adding it to your Document Understanding workflows.

In short, this is what the Train Classifiers Scope does:

Provides all classifier trainers (training algorithms) with the configuration they need to run.
Accepts one or more classifier trainers.
Allows for document type filtering and taxonomy mapping between the project taxonomy and any internal classifier taxonomies.

You can configure the Train Classifiers Scope by using the Configure Classifiers wizard. The wizard lets you customize:

which document types are sent for training to which classifier trainer,
the taxonomy mapping, at document type level, between the project taxonomy and the classifier's internal taxonomy (if any).
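To make these two settings concrete, here is a minimal Python sketch of document type filtering and taxonomy mapping. It does not reflect the wizard's actual data format or any UiPath API: TrainerConfig, route_for_training, and the trainer and document type names are hypothetical. Falling back to the project type name when no mapping is defined is also an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class TrainerConfig:
    """Hypothetical per-trainer settings mirroring the two wizard options."""
    name: str
    accepted_types: Set[str]  # project document types sent to this trainer
    # Maps a project document type to the classifier's internal type.
    taxonomy_mapping: Dict[str, str] = field(default_factory=dict)


def route_for_training(
    document_type: str, trainers: List[TrainerConfig]
) -> List[Tuple[str, str]]:
    """Return (trainer name, internal type) for each trainer that gets the document."""
    routed = []
    for trainer in trainers:
        if document_type not in trainer.accepted_types:
            continue  # document type filtered out for this trainer
        # Assumption: with no explicit mapping, the project type name is reused.
        internal_type = trainer.taxonomy_mapping.get(document_type, document_type)
        routed.append((trainer.name, internal_type))
    return routed


trainers = [
    TrainerConfig("trainer_a", {"Invoice", "Receipt"}, {"Invoice": "invoices"}),
    TrainerConfig("trainer_b", {"Invoice"}),
]

print(route_for_training("Invoice", trainers))
print(route_for_training("Receipt", trainers))
```

In this sketch a validated Invoice is sent to both trainers, under a different internal name for the first one, while a Receipt reaches only the first trainer.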

Available classifier trainers

Classifiers and their respective trainer activities can be found in the UiPath.IntelligentOCR.Activities and UiPath.DocumentUnderstanding.ML.Activities packages.

The available classifier trainers are: