Document Understanding Activities
Last updated Feb 14, 2025
Manual validation for digitize documents
The example below shows how to extract data from an image, validate it manually, and log the output. It uses activities such as Digitize Document and Present Validation Station, which you can find in the UiPath.IntelligentOCR.Activities package.
Note: This workflow uses an older version of the UiPath.IntelligentOCR.Activities package.
Steps:
- Open Studio and create a new Process. Its main workflow is named Main by default.
  Note: Make sure to add all the needed files (the .json files and all the images) inside the project folder.
- Add a Sequence container in the Workflow Designer and create the variables shown in the following table:
  Table 1. Variables to be created

  | Variable | Type | Default Value |
  |---|---|---|
  | Text | String | |
  | DOM | UiPath.DocumentProcessing.Contracts.Dom.Document | |
  | Data | UiPath.DocumentProcessing.Contracts.Taxonomy.DocumentTaxonomy | |
  | DocumentTaxonomy | UiPath.DocumentProcessing.Contracts.Taxonomy.DocumentTaxonomy | |
  | TaxonomyJSON | String | |
  | HumanValidated | UiPath.DocumentProcessing.Contracts.Results.ExtractionResult | |
- Add a Read Text File activity inside the sequence.
  - In the Properties panel, add the name of the file, in this case "taxonomy.json", in the FileName field.
  - Add the variable TaxonomyJSON in the Content field.
- Add an Assign activity after the Read Text File activity.
  - Add the variable Data in the To field and the expression DocumentTaxonomy.Deserialize(TaxonomyJSON) in the Value field. This activity deserializes the JSON text into the taxonomy used for extraction.
- Add a Digitize Document activity after the Assign activity.
  - In the Properties panel, add the value 1 in the DegreeOfParallelism field.
  - Add the expression "Input\Invoice01.tif" in the DocumentPath field.
  - Add the variable DOM in the DocumentObjectModel field.
  - Add the variable Text in the DocumentText field.
- Add a Google OCR engine inside the Digitize Document activity.
  - In the Properties panel, add the variable Image in the Image field.
  - Select the check box for the ExtractWords option. This option extracts the on-screen position of all detected words.
  - Add the expression "eng" in the Language field.
  - Select the option Legacy from the Profile drop-down list.
  - Add the value 2 in the Scale field.
- Add a Present Validation Station activity after the Digitize Document activity.
  - In the Properties panel, add the variable DOM in the DocumentObjectModel field.
  - Add the expression "Input\Invoice01.tif" in the DocumentPath field.
  - Add the variable Text in the DocumentText field.
  - Add the variable Data in the Taxonomy field.
  - Add the variable HumanValidated in the ValidatedExtractionResults field.
- Add a For Each activity under the Present Validation Station activity.
  - In the Properties panel, select the option UiPath.DocumentProcessing.Contracts.Results.ResultsDataPoint from the TypeArgument drop-down list.
  - Add the expression HumanValidated.ResultsDocument.Fields in the Values field.
- Add a Log Message activity inside the Body of the For Each activity.
  - Select the option Info from the Level drop-down list.
  - Add the expression item.FieldName in the Message field.
- Add a Log Message activity below the first Log Message activity.
  - Select the option Info from the Level drop-down list.
  - Add the expression item.Values(0).Value.ToString in the Message field.
- Add a Write Line activity under the Log Message activities.
  - Add the value "" in the Text field.
- Run the process. The robot digitizes the invoice and opens Validation Station so that you can review and correct the extracted data manually. It then logs each field name together with its first value.
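The Read Text File and Assign steps load taxonomy.json as one string and then deserialize it. The following is a rough Python analog of the file-reading half. It is not UiPath code, and it assumes only that taxonomy.json sits in the project folder and contains JSON. The helper names `read_taxonomy_json` and `is_valid_json` are hypothetical. The JSON check is an optional syntax pre-check and does not validate the taxonomy schema, which in the workflow is left to `DocumentTaxonomy.Deserialize`.

```python
import json
import tempfile
from pathlib import Path


def read_taxonomy_json(project_dir: Path, file_name: str = "taxonomy.json") -> str:
    # Hypothetical analog of the Read Text File step: load the whole file
    # as a single string (the TaxonomyJSON variable in the workflow).
    path = project_dir / file_name
    if not path.is_file():
        # The workflow assumes taxonomy.json is inside the project folder.
        raise FileNotFoundError(f"{file_name} not found in {project_dir}")
    return path.read_text(encoding="utf-8")


def is_valid_json(text: str) -> bool:
    # Syntax pre-check only: this does NOT validate the UiPath taxonomy
    # schema, which DocumentTaxonomy.Deserialize handles in the Assign step.
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


if __name__ == "__main__":
    # Demo with a throwaway project folder and a placeholder JSON body
    # (not a real taxonomy).
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)
        (project / "taxonomy.json").write_text('{"example": true}', encoding="utf-8")
        taxonomy_json = read_taxonomy_json(project)
        print(is_valid_json(taxonomy_json))  # True
```

Failing early on a missing or malformed file gives a clearer error than a failure later in deserialization. This is a design suggestion, not part of the original workflow.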
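The For Each and Log Message steps walk through HumanValidated.ResultsDocument.Fields. For each field, they log the field name and then the first value. The sketch below mirrors that loop in Python. `DataPointSketch` and `ValueSketch` are hypothetical stand-ins for ResultsDataPoint and its values, not the real UiPath classes. It also assumes that a field can come back with no values. In the workflow, item.Values(0) would then fail, so the sketch logs an empty string instead. The Write Line placement inside the loop body is also an assumption.

```python
import logging
from dataclasses import dataclass, field
from typing import List

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("validated_fields")


@dataclass
class ValueSketch:
    # Hypothetical stand-in for one entry of ResultsDataPoint.Values.
    value: str


@dataclass
class DataPointSketch:
    # Hypothetical stand-in for ResultsDataPoint: a field name plus its values.
    field_name: str
    values: List[ValueSketch] = field(default_factory=list)


def log_fields(fields: List[DataPointSketch]) -> List[str]:
    """Mirror the For Each body and return the messages that were logged."""
    logged: List[str] = []
    for item in fields:
        # First Log Message (Info): item.FieldName
        logged.append(item.field_name)
        # Second Log Message (Info): item.Values(0).Value.ToString,
        # guarded so that a field with no values logs "" instead of raising.
        logged.append(item.values[0].value if item.values else "")
        for message in logged[-2:]:
            logger.info(message)
        # Write Line with "" (assumed to sit inside the loop body).
        print("")
    return logged
```

For example, hypothetical fields "Invoice Number" (value "INV-001") and "Total" (no values) produce the messages "Invoice Number", "INV-001", "Total", and "". In Studio, an If activity that checks the number of values before the second Log Message could provide the same guard.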
Visit the following link to download the example as a ZIP file: Example.