Checkboxes and signatures
There are two types of multiple choice fields that use checkboxes:
- mutually exclusive checkboxes, where you can select only one option
- non-mutually exclusive checkboxes, where you can select more than one option.
Another important aspect is the number of options available for a given multiple choice field. Some fields have a single option, whose checkbox is either checked or not. Others have 10, 20, or more options arranged in a grid or table, as on many health forms.
There are two major ways in which you may label these kinds of multiple choice fields.
For example, a form can offer the mutually exclusive options Project and Policy. One approach is to define a single field and label only the selected word: label the word Project if the checkbox next to it is checked, or the word Policy if the checkbox next to it is checked. If neither is checked, label neither. Both options being checked should not be possible, so remove any such documents from the training set.
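The single-field labeling rule can be sketched in a few lines of Python. This is an illustration only, not a UiPath API: the per-document dictionary mapping each option word to its checkbox state, and the names `single_field_label` and `build_training_labels`, are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def single_field_label(checked: Dict[str, bool]) -> Optional[str]:
    """Return the option word to label for one mutually exclusive field.

    `checked` maps each option word (e.g. "Project", "Policy") to whether its
    checkbox is checked. This dict shape is a hypothetical representation.
    """
    selected = [option for option, is_checked in checked.items() if is_checked]
    if len(selected) > 1:
        # Mutually exclusive options should never both be checked.
        raise ValueError(f"more than one option checked: {selected}")
    # Label the selected word, or nothing at all when no box is checked.
    return selected[0] if selected else None


def build_training_labels(
    documents: List[Tuple[str, Dict[str, bool]]],
) -> List[Tuple[str, Optional[str]]]:
    """Keep (doc_id, label) pairs for valid documents and drop the rest."""
    labels = []
    for doc_id, checked in documents:
        try:
            labels.append((doc_id, single_field_label(checked)))
        except ValueError:
            continue  # several options checked: exclude from the training set
    return labels
```

For instance, a document with only Project checked yields the label `"Project"`, a document with nothing checked yields `None` (no label), and a document with both boxes checked is dropped.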
The single-field approach has the advantage that it uses only one field, which requires less training data. It also does not depend on checkboxes being detected successfully: if a checkbox is detected as the letter X, the model can still learn that this means the option next to it is selected.
The disadvantage is that you need to make sure all options are roughly equally represented in the training set, which is not always the case. For example, 90% of the documents in your training set might have Project checked. In that case, the model is unlikely to perform well and this approach fails. The problem gets worse as the number of options grows, because some options are almost always rare. In these cases, you may need to create fake documents with the rare options checked to balance the training set.
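Before training, a quick way to spot this imbalance is to count how often each option was labeled. The sketch below is a hypothetical helper, not part of any UiPath tooling; it assumes one label per document (`None` when nothing was checked), and the `min_share` threshold is an arbitrary value you would pick yourself.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def option_shares(labels: Iterable[Optional[str]]) -> Dict[str, float]:
    """Share of each option among documents where some option was labeled.

    Documents with no option checked (label None) are left out of the count.
    """
    counts = Counter(label for label in labels if label is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {option: count / total for option, count in counts.items()}


def underrepresented(
    labels: List[Optional[str]], options: List[str], min_share: float
) -> List[str]:
    """Options whose share falls below `min_share` (a hypothetical threshold).

    Options that never appear in the labels get a share of 0.0.
    """
    shares = option_shares(labels)
    return [option for option in options if shares.get(option, 0.0) < min_share]
```

With 90 documents labeled Project and 10 labeled Policy, `option_shares` reports shares of 0.9 and 0.1, and `underrepresented(labels, ["Project", "Policy"], min_share=0.3)` flags `["Policy"]` as a candidate for additional documents. For fields with many options, a threshold relative to `1 / len(options)` is one reasonable choice.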
Starting with the 2022.4 LTS Enterprise release, the UiPath Document OCR can detect signatures, so Machine Learning models can detect signatures directly.
Label a signature the same way you label any other field in your document. Once the UiPath Document OCR detects the signature, the Machine Learning model learns to recognize that field as a signature.