Data Extraction Validation Overview
After automatic data extraction, an optional (but highly recommended) step is validating the extracted data.
This is a human review step, in which knowledge workers review the automatically extracted results and correct them where necessary.
Using Data Extraction Validation ensures that the structured data now available is 100% correct.
It is strongly recommended to use the Data Extraction Validation components when:
- you need 100% accuracy on the data,
- you have no other way to double-check the automatically extracted information against other sources of truth
  - e.g., you can check that a certain Name or Address equals a Name or Address already confirmed and existing in a database, etc.
- you do not have sufficient synthetic checks you can use on data consistency
  - e.g., you can check that line items add up to a total; you can check that an ID number checksum is correct, etc.
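The synthetic consistency checks mentioned in the list above can be sketched in a few lines. The following Python example is only an illustration, not part of the product: the function names are hypothetical, the rounding tolerance is an assumption, and the Luhn algorithm stands in for whatever checksum scheme your ID numbers actually use.

```python
from decimal import Decimal


def line_items_match_total(line_amounts, total, tolerance=Decimal("0.01")):
    """Return True when the line item amounts add up to the stated total.

    Amounts are passed as strings and parsed as Decimal to avoid
    floating-point rounding errors. The tolerance is an assumed value.
    """
    computed = sum((Decimal(a) for a in line_amounts), Decimal("0"))
    return abs(computed - Decimal(total)) <= tolerance


def luhn_checksum_ok(number):
    """Return True when the digits of `number` pass the Luhn check.

    Luhn is used here only as an example checksum; the scheme that
    applies to a real ID number depends on the document type.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

A document whose extracted values pass checks like these may be a candidate for skipping human review; one that fails them should go to validation.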
Note:
We strongly recommend adding the Validation step wherever possible if you need 100% accuracy.
If this is not an option for all documents, then:
- try to double-check as much of the information as possible
- try to decide on specific confidence thresholds that the business use case can accept for certain fields
- make sure to always check both Extraction Confidence as well as OCR Confidence for a given value before making your decision.
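As a rough illustration of the threshold approach described in the note, the sketch below sends a field to human review when either of its two confidence values falls below the threshold chosen for that field. Everything here is an assumption made for the example: the `ExtractedField` type, the field names, the threshold values, and the 0 to 1 confidence scale are hypothetical and do not reflect the actual data structures returned by the extraction activities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedField:
    """Hypothetical container for one extracted value and its confidences."""
    name: str
    value: str
    extraction_confidence: float  # assumed to be in the range 0..1
    ocr_confidence: float         # assumed to be in the range 0..1


def needs_human_review(field, thresholds, default_threshold=0.9):
    """Return True when the lower of the two confidences is under the threshold.

    Both Extraction Confidence and OCR Confidence are checked, so a value
    that was extracted confidently from poorly recognized text is still flagged.
    """
    threshold = thresholds.get(field.name, default_threshold)
    return min(field.extraction_confidence, field.ocr_confidence) < threshold


def fields_to_review(fields, thresholds):
    """Return the names of all fields that should be validated by a human."""
    return [f.name for f in fields if needs_human_review(f, thresholds)]


# Example usage with made-up values and thresholds.
fields = [
    ExtractedField("invoice_total", "15.50", 0.98, 0.97),
    ExtractedField("vendor_name", "ACME", 0.95, 0.60),
]
thresholds = {"invoice_total": 0.95}
review_list = fields_to_review(fields, thresholds)
```

Taking the minimum of the two values is one simple design choice; a business use case could equally apply separate thresholds to each confidence.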
The automatically extracted data can be validated by a human using Validation Station.
The Validation Station is available both
- as an attended activity, through the use of the Present Validation Station activity, or
- as Action Center tasks, through the use of the Create Document Validation Action and Wait for Document Validation Action and Resume activities.