Classify Document
UiPath.IntelligentOCR.StudioWeb.Activities.ClassifyDocument
Use this activity to classify a document by selecting a classifier and the document you want to classify.
The generative models support the same languages as the OCR engine in use. For more information, check the OCR Supported languages page.
Use Document Data as the input unless this activity is the first Document Understanding activity in a Studio workflow. Use File as the input only when this activity is the first Document Understanding activity in the workflow.
Properties
- Project - Requires you to select your Document Understanding project from the drop-down menu. The available options are:
- Predefined - The default project type
- You can create a new project by clicking the + icon.
Note: If you have created more than 500 projects on your tenant and use the Classify Document activity, UiPath Studio or Studio Web displays only the first 500 projects. Projects beyond that limit cannot be used.
- Classifier - Requires you to select your Document Understanding classifier from the drop-down menu.
Note: The data sent to the Generative Classifier goes to an LLM model instance that is not publicly available. The data does not leave that instance and, once processed, is not stored or used for training.
- Predefined - The default project type
- Generative Classifier - The generative classifier type
Important: This feature is currently part of an audit process and is not to be considered part of the FedRAMP Authorization until the review is finalized. See the full list of features currently under review.
- Prompt - The prompt used to identify Document Types, provided as key-value pairs, where the key is the name of the Document Type and the value is a description that helps the classifier identify such documents.
- Document Type - Provide the name of the document type to be used as classification result (30-character limit).
- Generative prompt - Requires you to provide the prompt as input for the Generative Classifier. The maximum number of characters allowed is 1000.
- Input - Provide the input file or the Document Data object.
Important: The maximum number of pages a file can have is 500. Files exceeding this limit fail to be classified.
Tip: When your files aren't stored as an IResource type variable, you can convert them. Use LocalResource.FromPath(<reference_to_the_file>) in the Input property field for this. Consider a scenario where you are iterating through a list of files using a For Each activity. Suppose currentItem is your iterating variable. To convert currentItem into IResource, paste LocalResource.FromPath(currentItem) into the Input field.
Advanced Options
- Minimum confidence - Specify the minimum confidence threshold based on which a document type is assigned during classification. If a document's confidence score falls below this threshold, its Document Type is reported as "unknown".
Tip: Most document types generate a prediction with a confidence level. Setting this property helps reduce false positives by only considering the predictions with a confidence level above the threshold. You can identify an optimal confidence level by testing various documents within your workflow, recording the results (for example, in an Excel spreadsheet), and then analyzing which threshold value is the most accurate.
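The threshold behavior and the tuning approach described in the tip can be sketched in Python. This is an illustrative sketch only and not part of the activity: the function names, the record layout (a confidence score paired with whether the prediction was correct), the sample data, and the choice to accept a score exactly equal to the threshold are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def assign_document_type(predicted_type: str, confidence: float,
                         min_confidence: float) -> str:
    """Keep the predicted type only if its confidence reaches the threshold.

    Assumption: a score equal to the threshold is accepted.
    """
    return predicted_type if confidence >= min_confidence else "unknown"


def threshold_report(records: List[Tuple[float, bool]],
                     thresholds: List[float]) -> List[Dict[str, float]]:
    """Compute precision and coverage for each candidate threshold.

    records: (confidence, prediction_was_correct) pairs from test runs.
    precision: share of accepted predictions that were correct.
    coverage: share of all documents that were accepted (not "unknown").
    """
    report = []
    for threshold in thresholds:
        accepted = [correct for confidence, correct in records
                    if confidence >= threshold]
        precision = sum(accepted) / len(accepted) if accepted else 0.0
        coverage = len(accepted) / len(records) if records else 0.0
        report.append({"threshold": threshold,
                       "precision": precision,
                       "coverage": coverage})
    return report


# Hypothetical test results, such as rows recorded in a spreadsheet.
sample_records = [(0.95, True), (0.90, True), (0.80, False),
                  (0.70, True), (0.55, False), (0.40, False)]

if __name__ == "__main__":
    for row in threshold_report(sample_records, [0.5, 0.75, 0.85]):
        print(row)
```

A higher threshold typically trades coverage for precision, so the most suitable value depends on how costly a wrong classification is in your process.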
Input
- Timeout (seconds) - Maximum execution time (in seconds) for the call to the generative model. If the operation exceeds this timeout, it is automatically terminated to prevent delays or hangs. This property is only displayed if the Generative Classifier is selected as a classifier.
Output
- Document Data - All the validated extracted field data from the file.
To quickly get started with the generative capabilities of the Classify Document activity, perform the following steps:
- Add a Classify Document activity.
- From the Project drop-down list, select Predefined.
- For Classifier, select Generative Classifier. The Prompt property appears in the body of the activity.
- In the Prompt field, provide your instructions as Dictionary key-value pairs, where:
- Key represents the Document Type (example: CV).
- Value represents the Generative prompt: the description used by the generative classifier to identify the document types.
For example, check the following table for a sample of key-value pairs:
Table 1. Key-value pairs used as a prompt for the generative classifier

| Document type | Generative prompt |
| --- | --- |
| CV | "Find common CV keywords such as "Education", "Skills", and "Experience"." |
| Invoice | "Find common field names such as "Invoice number" "Bill to" or "Total Amount"." |

Figure 1. Key-value pairs used as a prompt for the generative classifier
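
The key-value pairs from Table 1 can be modeled as a plain dictionary and checked against the documented limits before you enter them in the Prompt field. This is a minimal sketch in Python, not the activity's own API: the helper name validate_prompts is hypothetical, and applying the 1000-character limit to each value separately is an assumption.

```python
from typing import Dict, List

MAX_DOCUMENT_TYPE_LENGTH = 30   # Document Type name limit stated above
MAX_PROMPT_LENGTH = 1000        # Generative prompt limit stated above


def validate_prompts(prompts: Dict[str, str]) -> List[str]:
    """Return a list of problems found in the key-value pairs.

    Assumption: the 1000-character limit applies to each value separately.
    """
    problems = []
    for document_type, description in prompts.items():
        if not document_type.strip():
            problems.append("Document Type name is empty.")
        if len(document_type) > MAX_DOCUMENT_TYPE_LENGTH:
            problems.append(
                f"Document Type '{document_type}' exceeds "
                f"{MAX_DOCUMENT_TYPE_LENGTH} characters.")
        if len(description) > MAX_PROMPT_LENGTH:
            problems.append(
                f"Prompt for '{document_type}' exceeds "
                f"{MAX_PROMPT_LENGTH} characters.")
    return problems


# The sample pairs from Table 1.
sample_prompts = {
    "CV": 'Find common CV keywords such as "Education", "Skills", '
          'and "Experience".',
    "Invoice": 'Find common field names such as "Invoice number" '
               '"Bill to" or "Total Amount".',
}

if __name__ == "__main__":
    issues = validate_prompts(sample_prompts)
    print(issues if issues else "All prompts are within the limits.")
```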