Import Documents
The Import data dialog box enables you to easily import new documents to be labeled or revised.
Click the Import button from the management bar.
The dialog box contains the following controls:
- Batch name text field - it is mandatory to enter a name for your batch; otherwise, the Browse or drop files section is disabled. A valid name can have up to 24 characters and should not contain special characters.
- Make this an evaluation set checkbox - if selected, the dataset is used for evaluation purposes.
- Enable large documents checkbox - if selected, you can upload documents with more than 150 pages.
- Browse or drop files section - click Browse files to upload to navigate through your directory, or simply drag and drop the files inside the frame.
- Status section - click (load previous import log) to check the status of the latest import. When uploading data, the Status section gives you an overview of your files and prompts you to proceed with the import by clicking YES or to abort it by clicking CANCEL.
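The batch name rule from the list above can be checked before opening the dialog. The following is a minimal sketch, not Document Manager's own validation: the function name `is_valid_batch_name` is hypothetical, and the set of allowed characters is an assumption, since the page does not define which characters count as special.

```python
import re

MAX_BATCH_NAME_LENGTH = 24  # limit stated for the Batch name field

# Assumption: "special characters" means anything other than letters,
# digits, spaces, underscores, and hyphens. The page does not define this.
_ALLOWED_NAME = re.compile(r"[A-Za-z0-9 _-]+")


def is_valid_batch_name(name: str) -> bool:
    """Return True if the name is non-empty, at most 24 characters long,
    and made up only of the characters assumed to be allowed."""
    if not name or len(name) > MAX_BATCH_NAME_LENGTH:
        return False
    return _ALLOWED_NAME.fullmatch(name) is not None
```

For example, `is_valid_batch_name("invoices_2021-10")` passes under these assumptions, while a 25-character name or a name containing `#` does not.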
Important: The 2021.10 release of Document Manager supports labeling multi-page documents. This is a major change from previous releases, where each page was labeled separately. Labeling and exporting multi-page documents assumes each document represents a single logical document. For instance, a six-page document may contain a single six-page invoice, but it should not contain three different invoices of two pages each. This is particularly important for evaluation sets.
This requirement is not relevant for Backwards-compatible exports.
Document Manager supports four types of import:
- Schema import
- Raw documents import (max 2000 pages or 1GB per import)
- Document Manager dataset import (max 2000 pages or 1GB per import)
- Validation Station dataset import (max 2000 pages or 1GB per import)
If you would like to launch a new Document Manager session using the same schema as in an existing session, you can follow these steps:
- Click the Export button from the management bar.
- In the Export files dialog box, check the Schema option.
- Click the Export button inside the dialog box. A .zip file is exported.
- Click the Import button from the management bar.
- Upload or drag & drop the .zip file directly into the new Document Manager session (do not unzip). In this step, you can also upload a .
- Click YES in the Status section to proceed with the import. The schema is imported.
You could also use one of the predefined schemas provided in the page.
Schema import can also be applied for multi-value fields.
Supported file types: .pdf, .tiff, .png, .jpg.
.zip files are not supported for raw documents import.
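The file-type restriction for raw documents import can be checked locally before uploading. This is a minimal sketch under stated assumptions: the helper name `split_by_support` is hypothetical, and treating extensions as case-insensitive is an assumption the page does not confirm.

```python
from pathlib import Path

# Extensions listed on this page for raw documents import.
SUPPORTED_EXTENSIONS = {".pdf", ".tiff", ".png", ".jpg"}


def split_by_support(paths):
    """Split paths into (supported, rejected) lists based on file extension.

    Assumption: extensions are compared case-insensitively.
    """
    supported, rejected = [], []
    for path in map(Path, paths):
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            rejected.append(path)
    return supported, rejected
```

Anything that lands in the rejected list, including .zip archives, would need to be converted or left out of a raw documents import.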
OCR settings need to be configured before import.
Follow the steps below:
- Click the Import button. The Import data dialog box is displayed.
- Provide a batch name in the Batch name field. This enables you to easily filter and find these documents using the Search drop-down later on.
- If you want to use this document batch for training an ML model, leave the Make this an evaluation set checkbox unselected.
- If you want to use this document batch for evaluating an ML model (i.e. measuring its performance), select the Make this an evaluation set checkbox. This ensures the data is ignored by the Training Pipelines.
- If you have documents with more than 150 pages, select the Enable large documents checkbox. Otherwise, leave the checkbox unselected.
- Upload or drag & drop a file or set of files into the Browse or drop files section.
- Click YES. The file or set of files is imported.
Take the .zip file that was originally exported and import it directly into the new Document Manager instance.
If your new Document Manager instance is completely empty (no data and no fields defined), then both the documents with labels and the schema are imported.
If your new Document Manager instance already has fields defined, then the newly imported dataset needs to have the same fields, or a subset of those fields. Otherwise, the import is rejected.
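The acceptance rule described above amounts to a subset check on field names. The sketch below only illustrates that logic; it is not how Document Manager implements the check, and the function name `can_import` and the example field names are hypothetical.

```python
def can_import(existing_fields, imported_fields):
    """Return (accepted, unknown_fields) for a dataset import.

    An instance with no fields defined accepts any dataset. Otherwise the
    imported fields must be the same as, or a subset of, the existing fields.
    """
    existing = set(existing_fields)
    imported = set(imported_fields)
    if not existing:
        # No fields defined yet: labels and schema are both imported.
        return True, set()
    unknown = imported - existing
    return not unknown, unknown
```

For instance, with existing fields `{"total", "date", "vendor"}`, a dataset that uses only `{"total", "date"}` is accepted, while one that also uses `"iban"` is rejected and `"iban"` is reported as the unknown field.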
Split large dataset .zip files into multiple .zip files that are smaller than 1GB and contain fewer than 1500 files.
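One way to plan such a split is to group files greedily so that every group stays under both limits. The sketch below is only a planning aid with a hypothetical function name, `plan_batches`. It assumes 1GB means 2**30 bytes and uses uncompressed file sizes as a rough proxy for archive size. It does not address how documents and their label files must be kept together inside a dataset, which this page does not describe.

```python
MAX_BYTES = 1024 ** 3  # assumption: 1GB interpreted as 2**30 bytes
MAX_FILES = 1500


def plan_batches(files_with_sizes, max_bytes=MAX_BYTES, max_files=MAX_FILES):
    """Group (name, size_in_bytes) pairs into batches that each hold fewer
    than max_files files and fewer than max_bytes bytes in total.

    Input order is preserved; a new batch starts whenever adding the next
    file would reach either limit.
    """
    batches, current, current_size = [], [], 0
    for name, size in files_with_sizes:
        if size >= max_bytes:
            raise ValueError(f"{name} alone reaches the size limit")
        reaches_count = len(current) + 1 >= max_files
        reaches_size = current_size + size >= max_bytes
        if current and (reaches_count or reaches_size):
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each returned batch can then be zipped and imported separately.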
As your RPA workflow processes documents using an existing ML model, some documents may require human validation using the Validation Station activity (available on attended bots or in the browser using Orchestrator Action Center).
The validated data generated in Validation Station can be exported using the Machine Learning Extractor Trainer activity and can be used to train ML models using the feature described below.
Follow the steps below:
- Configure the Machine Learning Extractor Trainer to output data into a folder with path <Trainer/Output/Folder> (use any empty folder path).
- Run an RPA workflow including Validation Station and Machine Learning Extractor Trainer.
- Machine Learning Extractor Trainer creates three subfolders: documents, metadata, and predictions inside of the output folder.
- Zip the <Trainer/Output/Folder> to obtain a .zip file, for instance TrainerOutputFolder.zip.
- Import the .zip file into Document Manager, which detects that the import contains data produced by Machine Learning Extractor Trainer and imports the data accordingly.
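The zipping step above can be scripted with Python's standard library. This is a minimal sketch with a hypothetical function name, `zip_trainer_output`. It assumes the three subfolders should sit at the root of the archive, which the page does not state explicitly. The archive should be written outside the folder being zipped.

```python
import shutil
from pathlib import Path

# Subfolders the Machine Learning Extractor Trainer is described as creating.
EXPECTED_SUBFOLDERS = ("documents", "metadata", "predictions")


def zip_trainer_output(output_folder, archive_base):
    """Zip the Trainer output folder and return the path of the .zip file.

    archive_base is the target path without the .zip extension.
    Assumption: the subfolders are placed at the root of the archive.
    """
    folder = Path(output_folder)
    missing = [n for n in EXPECTED_SUBFOLDERS if not (folder / n).is_dir()]
    if missing:
        raise FileNotFoundError(f"Missing subfolders: {', '.join(missing)}")
    archive = shutil.make_archive(str(archive_base), "zip", root_dir=str(folder))
    return Path(archive)
```

Calling `zip_trainer_output("path/to/output", "TrainerOutputFolder")` would produce TrainerOutputFolder.zip in the current working directory; the input path here is a placeholder.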
If there are missing fields required by the dataset, an error message is displayed in the import dialog box.