Document Understanding User Guide
Document classification training overview
Document Classification Training is a component of the Document Understanding™ Framework that helps close the feedback loop for classifiers capable of learning from human feedback.
You can build Document Understanding processes that do not contain any training component. You might do this for several reasons, including:
- the classifiers you are using do not support retraining
- you don't want to retrain, because you want the process to always use the same trained version of the classifier
- you update the classifier offline and manage its updates outside your DU process
In most cases, however, training your classifiers as part of regular process usage is highly beneficial: the classifiers gather their own training data and update themselves by ingesting the human validation information, without requiring you to change your existing workflows in any way. In effect, they become self-learning algorithms that improve over time based on what humans have validated as correct data.
Classification training is performed with the Train Classifiers Scope activity. Because the scope activity configures and executes one or more classification training algorithms in one go, you can train one or more classifiers at once.
Classification training is usually run after Document Classification Validation: only human-confirmed feedback should be sent back to the classifiers for training, to ensure that the training data the algorithms receive is accurate.
Run classification training both after a failed classification (no automatic classification, or an automatic classification that the knowledge worker corrected) and after a successful one (no corrections made by the user in the validation stage, all automatic results confirmed). Both cases give the algorithms useful information to learn from.
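The selection rule described above can be illustrated with a short sketch. This is a minimal Python illustration, not UiPath code: `ValidatedDocument`, its fields, and `select_training_samples` are hypothetical names. It shows that the human-validated document type becomes the training label whether the validator confirmed or corrected the automatic result, and that documents not yet validated by a human are never sent for training.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ValidatedDocument:
    """Hypothetical record of one document after classification validation."""
    document_id: str
    predicted_type: Optional[str]  # None if no automatic classification was made
    validated_type: Optional[str]  # None if a human has not validated the document


def select_training_samples(
    documents: List[ValidatedDocument],
) -> List[Tuple[str, str, bool]]:
    """Return (document_id, label, was_corrected) for human-validated documents only."""
    samples = []
    for doc in documents:
        if doc.validated_type is None:
            # Not confirmed by a human: never send it to training.
            continue
        # Corrected, missing, and confirmed predictions are all kept;
        # the flag only records which case this sample came from.
        was_corrected = doc.predicted_type != doc.validated_type
        samples.append((doc.document_id, doc.validated_type, was_corrected))
    return samples


if __name__ == "__main__":
    docs = [
        ValidatedDocument("a", "invoice", "invoice"),       # confirmed
        ValidatedDocument("b", "receipt", "invoice"),       # corrected
        ValidatedDocument("c", None, "bank_statement"),     # no automatic result
        ValidatedDocument("d", "invoice", None),            # not validated yet
    ]
    print(select_training_samples(docs))
    # [('a', 'invoice', False), ('b', 'invoice', True), ('c', 'bank_statement', True)]
```

Keeping confirmed samples alongside corrected ones reflects the point above: a classifier trained only on its mistakes would see a skewed picture of the document stream.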
You can train both classifiers that were used in the Document Classification phase and classifiers that were not used for classification prediction. The latter approach lets you collect training data and train a classifier from scratch, with the intent of later putting it to use by adding it to your Document Understanding workflows.
In short, this is what the Train Classifiers Scope does:
- Provides all Classifier Trainers (training algorithms) the necessary configurations for them to run.
- Accepts one or more classifier trainers.
- Allows for document type filtering and taxonomy mapping between the project taxonomy and any internal classifier taxonomies.
You can configure the Train Classifiers Scope using the Configure Classifiers wizard, which lets you customize:
- which document types are sent to which classifier trainer for training,
- the taxonomy mapping, at document type level, between the project taxonomy and each classifier's internal taxonomy (if any).
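To make the two wizard settings concrete, here is a minimal Python sketch of per-trainer document type filtering combined with taxonomy mapping. It is not the UiPath implementation or API: `TrainerConfig`, `route_for_training`, and the trainer and document type names are hypothetical. Each trainer receives only the document types it is configured for, renamed to its internal taxonomy where a mapping exists.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class TrainerConfig:
    """Hypothetical per-trainer settings mirroring the two wizard options."""
    name: str
    # Project document types this trainer should receive for training.
    enabled_types: Set[str]
    # Optional mapping: project document type -> classifier's internal type name.
    taxonomy_map: Dict[str, str] = field(default_factory=dict)


def route_for_training(
    samples: List[Tuple[str, str]], trainers: List[TrainerConfig]
) -> Dict[str, List[Tuple[str, str]]]:
    """Send each (document_id, project_type) sample to every trainer configured for it."""
    batches: Dict[str, List[Tuple[str, str]]] = {t.name: [] for t in trainers}
    for document_id, project_type in samples:
        for trainer in trainers:
            if project_type not in trainer.enabled_types:
                continue  # document type filtered out for this trainer
            # Fall back to the project type name when no mapping is defined.
            internal_type = trainer.taxonomy_map.get(project_type, project_type)
            batches[trainer.name].append((document_id, internal_type))
    return batches


if __name__ == "__main__":
    trainers = [
        TrainerConfig("keyword", {"invoice", "receipt"}),
        TrainerConfig("ml", {"invoice"}, {"invoice": "Invoice_v2"}),
    ]
    samples = [("a", "invoice"), ("b", "receipt"), ("c", "bank_statement")]
    print(route_for_training(samples, trainers))
    # {'keyword': [('a', 'invoice'), ('b', 'receipt')], 'ml': [('a', 'Invoice_v2')]}
```

In this sketch a single pass over the samples feeds every configured trainer, which echoes how one scope can drive several trainers at once; the fallback to the project type name is an assumption for the case where no mapping is defined.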
Classifiers and their respective trainer activities are available in the UiPath.IntelligentOCR.Activities and UiPath.DocumentUnderstanding.ML.Activities packages.
The available classifier trainers are:
- Keyword Based Classifier Trainer: trainer activity for the Keyword Based Classifier
- Intelligent Keyword Classifier Trainer: trainer activity for the Intelligent Keyword Classifier
- Machine Learning Classifier Trainer: trainer activity for the Machine Learning Classifier