- Overview
- Document Processing Contracts
- Release notes
- About the Document Processing Contracts
- Box Class
- IPersistedActivity interface
- PrettyBoxConverter Class
- IClassifierActivity Interface
- IClassifierCapabilitiesProvider Interface
- ClassifierDocumentType Class
- ClassifierResult Class
- ClassifierCodeActivity Class
- ClassifierNativeActivity Class
- ClassifierAsyncCodeActivity Class
- ClassifierDocumentTypeCapability Class
- ExtractorAsyncCodeActivity Class
- ExtractorCodeActivity Class
- ExtractorDocumentType Class
- ExtractorDocumentTypeCapabilities Class
- ExtractorFieldCapability Class
- ExtractorNativeActivity Class
- ExtractorResult Class
- ICapabilitiesProvider Interface
- IExtractorActivity Interface
- ExtractorPayload Class
- DocumentActionPriority Enum
- DocumentActionData Class
- DocumentActionStatus Enum
- DocumentActionType Enum
- DocumentClassificationActionData Class
- DocumentValidationActionData Class
- UserData Class
- Document Class
- DocumentSplittingResult Class
- DomExtensions Class
- Page Class
- PageSection Class
- Polygon Class
- PolygonConverter Class
- Metadata Class
- WordGroup Class
- Word Class
- ProcessingSource Enum
- ResultsTableCell Class
- ResultsTableValue Class
- ResultsTableColumnInfo Class
- ResultsTable Class
- Rotation Enum
- SectionType Enum
- WordGroupType Enum
- IDocumentTextProjection Interface
- ClassificationResult Class
- ExtractionResult Class
- ResultsDocument Class
- ResultsDocumentBounds Class
- ResultsDataPoint Class
- ResultsValue Class
- ResultsContentReference Class
- ResultsValueTokens Class
- ResultsDerivedField Class
- ResultsDataSource Enum
- ResultConstants Class
- SimpleFieldValue Class
- TableFieldValue Class
- DocumentGroup Class
- DocumentTaxonomy Class
- DocumentType Class
- Field Class
- FieldType Enum
- LanguageInfo Class
- MetadataEntry Class
- TextType Enum
- TypeField Class
- ITrackingActivity Interface
- ITrainableActivity Interface
- ITrainableClassifierActivity Interface
- ITrainableExtractorActivity Interface
- TrainableClassifierAsyncCodeActivity Class
- TrainableClassifierCodeActivity Class
- TrainableClassifierNativeActivity Class
- TrainableExtractorAsyncCodeActivity Class
- TrainableExtractorCodeActivity Class
- TrainableExtractorNativeActivity Class
- Document Understanding Digitizer
- Document Understanding ML
- Document Understanding OCR Local Server
- Document Understanding
- Release notes
- About the Document Understanding activity package
- Project compatibility
- Set PDF Password
- Merge PDFs
- Get PDF Page Count
- Extract PDF Text
- Extract PDF Images
- Extract PDF Page Range
- Create Validation Task and Wait
- Wait for Validation Task and Resume
- Create Validation Task
- Create Classification Validation Task
- Create Classification Validation Task and Wait
- Wait for Classification Validation Task and Resume
- Intelligent OCR
- Release notes
- About the IntelligentOCR activity package
- Project compatibility
- Configuring Authentication
- Load Taxonomy
- Digitize Document
- Classify Document Scope
- Keyword Based Classifier
- Document Understanding Project Classifier
- Intelligent Keyword Classifier
- Create Document Classification Action
- Wait For Document Classification Action And Resume
- Train Classifiers Scope
- Keyword Based Classifier Trainer
- Intelligent Keyword Classifier Trainer
- Data Extraction Scope
- Document Understanding Project Extractor
- RegEx Based Extractor
- Form Extractor
- Intelligent Form Extractor
- Present Validation Station
- Create Document Validation Action
- Wait For Document Validation Action And Resume
- Train Extractors Scope
- Export Extraction Results
- ML Services
- OCR
- OCR Contracts
- Release notes
- About the OCR Contracts
- Project compatibility
- IOCRActivity Interface
- OCRAsyncCodeActivity Class
- OCRCodeActivity Class
- OCRNativeActivity Class
- Character Class
- OCRResult Class
- Word Class
- FontStyles Enum
- OCRRotation Enum
- OCRCapabilities Class
- OCRScrapeBase Class
- OCRScrapeFactory Class
- ScrapeControlBase Class
- ScrapeEngineUsages Enum
- ScrapeEngineBase
- ScrapeEngineFactory Class
- ScrapeEngineProvider Class
- OmniPage
- PDF
- [Unlisted] Abbyy
- [Unlisted] Abbyy Embedded

Document Understanding Activities
Extract Document Data
The Extract Document Data activity extracts data from an input file or a Document Data object, resulted as output from the Classify Document activity. The Extract Document Data activity also stores the extracted results into a Document Data object. You can use the result of the Extract Document Data activity to perform actions on the extracted results or send them for validation in Action Center.
Prerequisites
The Extract Document Data activity requires input objects of type Document Data or File. A possible use case for using this activity is to precede it with a Classify Document activity, that generates an object of type Document Data.
Input options
- Document Data - from the Classify Document activity
- File - from Get File/Folder or Get Newest Email activities
Supported languages for generative models
The supported languages for the generative models are the same as the OCR engine used, which depends on the project. For the Generative Predefined and Predefined projects, the OCR Engine used is UiPath Document OCR. For more information, visit the OCR Supported languages page.
Models used by the activity
The Extract Document Data activity uses the following:- Pre-trained specialized models, available out of the box, based on DocPath.
- Custom pre-trained models deployed in Document Understanding modern and classic projects.
- Generative extraction models.
Start reading about the configurations available for the Extract Document Data activity. The configuration information is split into two separate pages, each corresponding to their own experience:
- Extract Document Data – Describes the configuration information for the latest activity package version released as GA.
- Extract Document Data - Preview – Describes the configuration information for the latest activity package version released in preview.