Release notes
Release date: November 26, 2024
Updated the explanation text in the Form Extractor Template Editor to reflect the process of defining field anchors.
Release date: October 22, 2024
We've improved product stability by updating our common dependencies to the most recent versions. This upgrade is automatic and doesn't require any action from your side.
Release date: October 3, 2024
You can now use the following activities within the Classify Document Scope and Data Extraction Scope, even if the robot is connected to a local Orchestrator:
- Generative Classifier
- Generative Extractor
- Document Understanding Project Classifier
- Document Understanding Project Extractor
For the Data Extraction Scope activity, specifically, we have made it possible to use auto-validation features from a different organization or tenant.
We've added the RuntimeTenantURL and RuntimeCredentialsAsset properties to the activities listed above. With these properties, you can use credentials from external applications, stored in Orchestrator, to access Document Understanding resources directly at runtime. To do this, ensure that your selected tenant has Document Understanding enabled and AI Units allocated.
Also, in the Get Capabilities wizard of the Document Understanding Project Classifier and Document Understanding Project Extractor activities, we've added the App Id, App Secret, and TenantUrl properties. These properties let you access resources from different organizations and tenants at design time.
IntelligentOCR.Activities now supports consumption of modern Document Understanding projects, through a new set of activities targeting modern project usage. You can now use the following activities for consuming your modern Document Understanding projects and versions:
- Document Understanding Project Classifier, used within a Classify Document Scope activity.
- Document Understanding Project Extractor, used within a Data Extraction Scope activity.
The UiPath Extended Languages OCR is now generally available (GA). Use it to enhance your projects' document processing capabilities.
- Resolved an issue where the "Text length is zero" error was encountered when both Keyword Based Classifier and Intelligent Keyword Classifier were enabled for the same document type.
- The ML extractor returned empty values, causing validation tasks to fail due to null value properties.
- Manually added field values in the Validation Station within Action Center weren't formatted according to the taxonomy, causing exported results to show incorrect data.
- Resolved an issue that prevented you from marking tables within image files in the Validation Station.
The UiPath Chinese, Japanese, Korean OCR will be deprecated starting in January 2025. We recommend using the UiPath Extended Languages OCR instead. Check the deprecation timeline for more information about upcoming deprecations and removals.
Release date: 13 August 2024
We've upgraded some internal dependencies for enhanced performance.
Release date: 31 July 2024
- Digitizing certain file content with the Digitize Document activity led to a "System.InvalidOperation" exception with a "Fullness" message.
- When using the Generative Classifier and Generative Extractor activities, a prompt that started or ended with whitespace led to a "KeyNotFoundException" error with the "The given key was not present in the dictionary" message.
Release date: July 29, 2024
We've improved product stability by updating our common dependencies to the most recent versions. This upgrade is automatic and doesn't require any action from your side.
Release date: 20 June 2024
We are constantly working to improve your UiPath Document Understanding experience. Although there are no major changes in this release, we've made minor improvements and accessibility fixes to our product.
Release date: 5 June 2024
We've improved product stability by updating our common dependencies to the most recent versions. This upgrade is automatic and doesn't require any action from your side.
Release date: 27 May 2024
Release date: 29 April 2024
These release notes contain all the updates made between November 2023 and March 2024.
Validator Notes
You can now enable Validator Notes for each field in Taxonomy Manager. When enabled, you can set notes on these fields, and the notes are displayed to the human validator. If the notes are set as editable, the validator can edit them and communicate information back to the automation through a new ExtractionResult object property.
Generative Validation for Data Extraction Scope
You can use Generative Validation for the Data Extraction Scope activity to adjust confidence using generative extraction cross-checking. Check out the ApplyAutoValidation and AutoValidationConfidenceThreshold properties in the Data Extraction Scope activity.
- Installing the UiPath.IntelligentOCR.Activities package automatically installs the UiPath.DocumentUnderstanding.ML.Activities package. You do not need to install it separately.
- Fixed an issue where Japanese font was not recognized when converting to JPG.
- Fixed an issue where the order of numbers in Hebrew was reversed in Validation Station.
- Fixed an issue related to the extraction of bidirectional (left-to-right and right-to-left) text values, which caused wrong order for punctuation symbols.
A known issue exists when using the Document Understanding Process Template version 2022.10.2 within Studio 2023.4.4 on a Windows project. Opening the Taxonomy Manager results in an error stating that you must install missing .NET frameworks. Regardless of whether you choose to install .NET or not, another error message follows: "Communication between UiPath Studio and Taxonomy Manager ended unexpectedly."
Workaround: Manually install the .NET 6.0 Runtime.
Release date: 24 October 2023
- Present Validation Station
- Create Document Validation Action
- Form Extractor
- Intelligent Keyword Classifier
The Digitize Document activity can now detect native PDF radio buttons.
The content type detection capabilities of the Digitize Document activity have been improved.
The Taxonomy Manager now allows you to define multiple math expressions in business rules.
Release date: 19 September 2023
Fixed the "You are not authorized" error that occurred when resuming a job after document validation was completed from Action Center.
Release date: 28 August 2023
We've fixed a bug that slowed down the Validation Station when documents contained large tables.
Release date: 8 June 2023
We've fixed a bug that was causing inconsistencies for the formatted values when the amount was negative.
Release date: 7 June 2023
A new option, Send documents for algorithm improvements, is available for the Form Extractor activity. You can enable or disable it before running the workflow; it is enabled by default.
We've made minor bug fixes and accessibility fixes throughout the entire UiPath.IntelligentOCR.Activities package.
Release date: 19 September 2023
Fixed the "You are not authorized" error that occurred when resuming a job after document validation was completed from Action Center.
Release date: 7 June 2023
We are constantly working to improve your UiPath Document Understanding experience. Although there are no major changes in this release, we've made minor improvements and accessibility fixes to our product.
Release date: 2 May 2023
We fixed a bug causing the Data Extraction Scope activity to crash when the extraction is completed on all but the first sub-document. This was happening when a classifier was used to perform document splitting and multiple classification results were returned from Classify Document Scope.
Release date: 26 April 2023
- UiPath Document OCR is the new default OCR engine for the following activities: Intelligent Keyword Classifier, Intelligent Keyword Classifier Trainer, and Form Extractor.
- We've added retry functionality to the Wait for Document Validation Action and Wait for Document Classification Action activities. The new Retry option can be set to Enabled or Disabled and defaults to Enabled. When enabled, failed HTTP calls are retried.
- The UiPath.IntelligentOCR.Activities package can now be used with right-to-left languages.
- The extraction accuracy of the Form Extractor activity has been improved by including page matching information in the extraction algorithm.
- We've updated the design of the field rules that can be set in Taxonomy Manager.
- You can now apply mathematical expressions to field rules by using the Taxonomy Manager wizard of the Load Taxonomy activity.
- The Validation Station wizard has been updated, allowing you to see the rules applied on fields. Also, when a field is manually updated, the field rule automatically updates as well.
- The Digitize Document activity has been improved and now consumes less system memory.
- Stamp widgets are now digitized in native PDFs.
- PDF file support was improved for the Digitize Document activity.
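The notes above say only that, with Retry set to Enabled, failed HTTP calls are retried; they do not document how many attempts are made or how long the activities wait between them. As a rough illustration of the idea, here is a minimal Python sketch of a bounded retry with exponential backoff. The function `call_with_retry` and its `max_attempts`, `base_delay`, and `retry_on` parameters are hypothetical and are not the activities' actual retry policy.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    call: Callable[[], T],
    retry_enabled: bool = True,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Hypothetical retry wrapper for an HTTP-style call.

    With retry_enabled=False the call is made exactly once, mirroring the
    Disabled setting; otherwise up to max_attempts calls are made.
    """
    attempts = max(1, max_attempts) if retry_enabled else 1
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last failure
            # Exponential backoff between attempts: base_delay, 2x, 4x, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Only transient error types are retried in this sketch, so that a genuine failure (for example, a bad request) still surfaces immediately.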
Erratum September 2023: Advanced splitting features are available for the Intelligent Keyword Classifier activity.
- We've updated the package dependencies and fixed the Method not found error thrown when an Invoke Code activity was added to the workflow.
- We've fixed a bug that prevented the API key from being auto-populated when your authentication token for the Orchestrator instance in use had expired. The error occurred for both the UiPath Document OCR and OCR for Chinese, Japanese, Korean API key fields.
We recommend that you regularly check the deprecation timeline for any updates regarding features that will be deprecated and removed.
Release date: 15 December 2022
- The Document Understanding API key is now pre-populated for the following activities: UiPath Document OCR, OCR for Chinese, Japanese, and Korean, Machine Learning Extractor, Machine Learning Classifier, and the Template Manager of the Form Extractor activity.
- The Studio user interface is now available in Traditional Chinese.
- The API Key field is now pre-populated for the following activities in the UiPath.IntelligentOCR.Activities package: Intelligent Keyword Classifier and Form Extractor.
- The Keyboard shortcuts menu now includes new hotkeys, added in a separate Accessibility section. They are available for the Present Validation Station and Present Classification Station activities.
Release date: 24 October 2022
- New action objects are available for the Wait for Document Classification Action and Resume and Wait for Document Validation Action and Resume activities.
- The Digitize Document activity has been upgraded and now comes with a default preselected OCR engine, the UiPath Document OCR engine. As a consequence of this change, the UiPath.OCR.Activities package has become a dependency of the UiPath.IntelligentOCR.Activities package.
- The Digitize Document activity received a new parameter, Detect Checkboxes, that enables checkbox detection while the document is digitized.
- The OCR confidence level can be individually updated for a selected field in Validation Station.
- The confidence filter design has been updated and confidence scores have been added at table level, for each entry, for both OCR and extraction. You can now check the original confidence level of a field that was manually validated. Both values are available by clicking the displayed confidence level.
- Updates have been made to the Validation Station wizard. You can now set a threshold for the confidence levels and sort them depending on the set limit.
- The Taxonomy Manager wizard interface was updated, making it even easier to use. New features include a Delete option for all groups, fields, and categories, and a Toggle keyboard shortcuts option.
- Checkbox detection is now applied on native PDF pages which do not have embedded native checkbox characters or controls.
- PDF processing capabilities have received a major update, including the ability to process vector-based text, capabilities to ignore invisible text objects, improvements to word detection, improvements to logo processing, fixes for character duplication issues, and other improvements.
- Text extraction from PDF files has been upgraded, resulting in an optimized extraction process, where both native and scanned text is retrieved at the same time, with the OCR being applied only on the images identified in the PDF file. This improvement is available only when the ApplyOCROnPDF option is set to Auto.
- The Document Understanding Process Studio template has been upgraded to a new version. The UiPath.IntelligentOCR.Activities package is a dependency for this template.
- Fixed a bug that caused extraction errors when Digitizer was used. The fix upgrades the PDF library and uses hybrid OCR features.
- Fixed a bug in the Digitize Document activity that caused checkboxes to be extracted from some PDFs even when the DetectCheckboxes option was set to False.
- Fixed a bug in the Classify Document Scope activity that threw an empty error for the documentText parameter when two classifiers were used in the scope to process a certain document.
- When a field was manually validated in Validation Station, its confidence level didn't update to 100%. This bug has been fixed, and the confidence level now updates automatically when a user manually validates a field.
- Fixed a bug that occurred in Classification Station wizard and Taxonomy Manager when the mouse cursor was moved to the Document View section. Now, everything works as expected.
- An error occurred when Validation Station was used in text view with documents that included special characters. The bug was fixed and now you can view documents with special characters in text view as well.
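The Validation Station change above lets you set a confidence threshold and sort fields against it, with separate OCR and extraction scores. The following minimal Python sketch shows that filter-and-sort logic on an assumed data model; `FieldConfidence` and `fields_below_threshold` are hypothetical names, and scores are assumed to be in the 0.0 to 1.0 range.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class FieldConfidence:
    """Hypothetical per-field record with both confidence scores (0.0 to 1.0)."""
    name: str
    ocr_confidence: float
    extraction_confidence: float


# Which score is compared against the threshold: OCR or extraction.
_SCORE: Dict[str, Callable[[FieldConfidence], float]] = {
    "ocr": lambda f: f.ocr_confidence,
    "extraction": lambda f: f.extraction_confidence,
}


def fields_below_threshold(
    fields: Iterable[FieldConfidence], threshold: float, source: str = "extraction"
) -> List[FieldConfidence]:
    """Return fields whose selected score is below threshold, lowest score first."""
    score = _SCORE[source]
    return sorted((f for f in fields if score(f) < threshold), key=score)
```

Sorting the flagged fields lowest-first puts the values most likely to need human attention at the top of the list.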
Release Date: 9 May 2022
- We've given some of our wizards a new look: Validation Station, Classification Station, and Taxonomy Manager now all have a brand-new, user-friendly interface with multiple new features.
- Notable Validation Station features include the confidence scores shown for each extracted field. You can sort them by OCR or Extractor to see the exact confidence score for each extracted field. Use confidence scores for guidance only; you can always improve a score by manually validating the data.
- The Validation Station wizard also has a restyled header in the PDF viewer, from which you can switch the document view between the left and right sides, hide the extracted tokens for a clean view of the document, or display the keyboard shortcuts.
- The Classification Station wizard presents itself with the same restyled header in the PDF viewer as its peer, the Validation Station. Here, you can also choose to display the document on the left or right side of the screen, or you can check the available keyboard shortcuts. The rotate option is also available in the new header, making document manipulation easier than ever.
- Among other improvements, the Taxonomy Manager wizard has a new Delete option in the document type header, which also enables bulk deletion.
- The UiPath.IntelligentOCR.Activities package has been upgraded to .NET5 portable, allowing you to run its activities on Linux robots.
- The UiPath Studio MSI size has been optimized, and the UiPath.IntelligentOCR.Activities package is no longer a core package of the UiPath Studio MSI, but an optional one. All functionality remains the same; the only change is that you need to install the package manually in UiPath Studio.
- In the Digitize Document and Intelligent Keyword Classifier activities, the ForceApplyOCR option has been replaced by the ApplyOcrOnPdf option. Apply OCR on PDF has three values in its dropdown list: True, False, and Auto. If set to True, OCR is applied to all PDF pages of the document; if set to False, only digitally typed text is extracted. The default value, Auto, determines from the input document whether OCR needs to be applied.
- In Classification Station wizard, the value of the Not Classified groups is now set as N/A.
- The Intelligent Form Extractor activity deprecation is planned for October 2022. We recommend using the Form Extractor activity.
- The Form Extractor activity can now process documents with detected signatures on them.
- Fixed a bug occurring on the Validation Station wizard. Certain Asian fonts were not correctly displayed in the PDF Viewer of the Validation Station.
- Fixed a bug occurring while using the Digitize Document activity with UiPath Studio v19.10 and v20.10. An error was thrown when trying to process .tiff files. Now, everything works as expected.
- Fixed a bug occurring on the Validation Station while using the TAB shortcut key. Instead of saving the changes, the TAB key was reverting the field to the previous value. Now, everything works as expected.
- Fixed a bug occurring on the Form Extractor activity. The wrong error message was displayed when a template was imported.
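The three ApplyOcrOnPdf values described above can be read as a per-page decision. Below is a minimal Python sketch of that decision. The `ApplyOcrOnPdf` enum mirrors the documented values, but the `has_native_text` flag and the Auto rule (OCR a page only when it has no extractable native text) are assumptions for illustration. The notes say only that Auto decides based on the input document.

```python
from enum import Enum


class ApplyOcrOnPdf(Enum):
    TRUE = "True"
    FALSE = "False"
    AUTO = "Auto"


def should_ocr_page(option: ApplyOcrOnPdf, has_native_text: bool) -> bool:
    """Decide whether OCR runs on one PDF page (Auto rule is an assumed heuristic)."""
    if option is ApplyOcrOnPdf.TRUE:
        return True   # OCR is applied to every PDF page
    if option is ApplyOcrOnPdf.FALSE:
        return False  # only digitally typed (native) text is extracted
    # AUTO (assumption): OCR only pages that carry no native text, such as scans.
    return not has_native_text
```

For a document whose first page is native text and second page is a scan, Auto in this sketch would OCR only the second page.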
Release Date: 19 October 2021
- The Taxonomy Manager received a complete overhaul, with an improved UI and user experience. You can now add document types without group or category, configure colors and hotkeys for fields, and more. Consult the documentation for a complete description.
- The UiPath.IntelligentOCR.Activities package has been upgraded to .NET5. While both .NET versions continue to be supported, the .NET5 projects can only work on 64-bit architectures.
- The digitization process has been improved for the entire framework throughout Document Understanding and Data Manager.
- Improvements have been made to optimize the OCR results on scanned documents. Best results are obtained by keeping the skew angle between -20 and +20 degrees.
- Image processing dimensions have been improved for better results. For an image to be successfully digitized or processed, its width and height must each be between 50 and 10000 pixels. Any image outside this range is rejected with an exception message. An image within this range whose total size exceeds 14MP is scaled down to 14MP while preserving its aspect ratio (width/height ratio).
- The Validation Station error message system has been improved: if the user rejects a document, an exception of type DocumentRejectedByUserException is thrown and the process is stopped.
- Improved the load time of Validation Station for document types with large taxonomies.
- For derived parts in Validation Station, numbers with more than two decimals are no longer rounded up.
- Due to improvements to image processing algorithms, changes might appear in the digitization of certain documents.
- A new check box has been added to the Template Manager wizard, allowing you to choose if the added synonyms are case sensitive or not.
- A design update has been made to the Template Manager wizard accessible from the Intelligent Form Extractor and Form Extractor activities.
- If you experience timeouts due to long processing time, you can now use the newly added Timeout parameter for Form Extractor and Intelligent Form Extractor to increase the service call timeout.
- If a field is checked in both Signature and Handwritten boxes in the Template Manager wizard of the Intelligent Form Extractor activity, then a popup message appears informing you that a field can be added only in one box, not both.
- The wizard available for the Intelligent Keyword Classifier activity received an update, meaning that clicking the OK button of the vector(s) exported message now returns the user to the wizard instead of closing the wizard.
- The Create Document Validation Action and Present Validation Station activities received a new parameter, ShowOnlyRelevantPageRange, which configures the activity to show only the page range captured in the classification part of the extraction result.
- Performance and memory improvements in the Digitize Document activity.
- CefSharp reference was updated to version 92.0.260.
- Fixed a bug that occurred when OCR was run on different OS region formats. Now, OCR runs as expected and all results are generated correctly, regardless of the OS region format.
- Fixed a bug in the Export Extraction Results activity that was deleting the extracted table when a field was marked as handwritten. Now, the entire extraction result is exported as expected.
- Fixed an issue related to Validation Station that was causing unexpected number formatting when reading the derived parts value.
- Fixed a bug in the Wait For Validation Action activity that was returning an error when the Automatic Extraction Result parameter was set as empty. Now, the activity runs as expected, without any errors.
- Fixed an issue that threw a runtime error when no extraction results were served to the Present Validation Station activity.
- Fixed an issue in Digitize Document activity, that caused the activity to crash when the ForceApplyOCR parameter was set to False.
- Fixed an issue in the Template Manager wizard that caused data not to be extracted when using the table selection with Form Extractor.
- Fixed an issue that caused derived parts to not be extracted for a date field when processing a specific document.
- Fixed an issue in the Template Manager wizard that caused anchors not to be highlighted after marking a table.
- Fixed an issue that caused the Data Extraction Scope activity to throw an error stating that fields from the extractors configuration could not be found in the taxonomy, even though the extractors had been removed from the scope.
- Fixed an issue that caused the Template Manager wizard to throw an error when trying to save a template with certain words added as page evidence.
- Fixed an issue which prevented the display of an empty Validation Station with full manual processing for data entry when the AutomaticExtractionResults parameter was null.
- Fixed a bug that was occurring when special characters were included in the file or bucket name for any of the following activities: Create Document Classification Action, Wait for Document Classification Action, Create Document Validation Action, Wait for Document Validation Action. Now, all special characters from the file/bucket names are encoded as expected.
- Fixed an issue that caused signature and handwritten fields not to be extracted due to the background contrast. Now, all fields are correctly extracted regardless of the background color.
- Fixed a bug that was causing the OCR engine to return an error on certain air-gapped systems.
- Fixed a bug that was merging the extracted content when using the Digitize Document activity with the UiPath Document OCR engine. Now, each item is extracted separately.
- If you want to use any OCR activity from this package in Studio v2019.10, please install the UiPath.CoreIPC package, version 2.0.1 or higher.
- If you install the UiPath.IntelligentOCR.Activities package, v5.0.0 on a machine using Windows N/KN as an operating system, then the Media Features package is also required. Visit Media Feature Pack list for Windows N editions for installation instructions for the Media Features package.
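The image size rules above (each side between 50 and 10000 pixels; images over 14MP scaled down with the aspect ratio preserved) can be checked before a file is sent for digitization. Here is a minimal Python sketch of those rules. `normalize_image_size` is a hypothetical helper, and the sketch assumes that the bounds are inclusive and that 14MP means 14,000,000 pixels.

```python
import math
from typing import Tuple

MIN_SIDE_PX = 50
MAX_SIDE_PX = 10_000
MAX_PIXELS = 14_000_000  # assumption: "14MP" read as 14 million pixels


def normalize_image_size(width: int, height: int) -> Tuple[int, int]:
    """Validate image dimensions and return the (possibly downscaled) size."""
    # Reject images whose width or height falls outside 50-10000 px (inclusive, assumed).
    for side in (width, height):
        if not MIN_SIDE_PX <= side <= MAX_SIDE_PX:
            raise ValueError(
                f"Image size {width}x{height} is outside the supported "
                f"{MIN_SIDE_PX}-{MAX_SIDE_PX} px range"
            )
    total = width * height
    if total <= MAX_PIXELS:
        return width, height
    # Scaling both sides by sqrt(MAX_PIXELS / total) meets the pixel budget while
    # keeping width/height approximately unchanged; flooring keeps the result under it.
    scale = math.sqrt(MAX_PIXELS / total)
    return math.floor(width * scale), math.floor(height * scale)
```

Both sides are scaled by the same factor, the square root of the pixel ratio, because the area shrinks by the square of the linear scale.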
Release Date: 29 March 2021
- Extended the Form Extractor and Intelligent Form Extractor capabilities by adding field-level anchor-based extraction rules. Besides page-level anchors, field-level anchors can now be defined in Template Editor - a new option of defining the bounds of a custom area from which data is to be extracted. As opposed to page-level configurations, which define data positions with respect to the entire page content, anchor-based configurations now allow for targeting data extraction based on field-level configurations, thus allowing for more flexibility.
- Performance improvements on Validation Station.
- Updated Validation Station and Classification Station design system for a better user experience.
- The Validation Station, Classification Station, and Template Manager now have a three-state button in the Document View side that lets users choose between different document interaction modes: Tokens (word selections), Custom area (area selection), and Choice on selection (users can choose between Tokens and Custom Area at each selection).
- The user interfaces, Validation Station, Classification Station, and Template Manager, have been improved with a new selection mode in text view, now allowing users to perform selections from the text version of a document in the same way they interact with the original version. A new hotkey, d+s, was also added, to assist in switching between the original document view and the text view modes.
- The Validation Station now displays a "crop" from the original document, when you assign a value to a data field, under the reported text value selected. This helps with locating and verifying a specific field value against the value area in the document.
- Changed confidence calculation for Intelligent Keyword Classifier to be scalable with the length of the word vectors.
- Added the IncludeOCRConfidence checkbox to the properties panel of the Export Extraction Results activity. If selected, the exported information will contain OCR Confidence for each value as well.
- Improved letter and word processing algorithms to avoid reporting duplicate characters or words in certain situations.
- Classify Document Scope and Train Classifiers Scope now support classifier capabilities.
- Classify Document Scope has been optimized to perform sequential calls to the classifiers in its scope, passing only the page ranges that have not already been classified by a previous classifier.
- Fixed an issue that threw a runtime error in specific cases when a Form Extractor activity and an Intelligent Form Extractor activity were in the same Data Extraction Scope.
- Fixed an issue that, in specific cases, prevented classifier errors from being thrown, so classification failed silently.
- Fixed an issue that caused derived parts not to be extracted for a number field when processing a specific document.
- Fixed an issue in Digitize Document, that caused the activity to process document pages even after an exception was reported, thus increasing the overall execution time for cases of failure.
- Fixed a bug that did not allow for the correct configuration of Regex expressions in Regex Based Extractor, in C# projects, and other very specific situations.
- Fixed a performance issue that appeared in Validation Station and Template Editor, when a document type contained more than 200 fields.
- Fixed a bug in which, in certain situations, numbers were merged into a single reported numeric value.
- Fixed an issue through which, in certain situations, the Wait for Document Validation Action and Resume activity would throw an exception when communicating with storage buckets.
Release Date: 12 November 2020
- CefSharp reference updated to version 84.4.10.
- Updated endpoints as follows:
  - Form Extractor: from https://formextractor.uipath.com to https://du.uipath.com/svc/formextractor
  - Intelligent Form Extractor: from https://intelligentforms.uipath.com to https://du.uipath.com/svc/intelligentforms
  - Intelligent Keyword Classifier: from https://intelligentkeywords.uipath.com to https://du.uipath.com/svc/intelligentkeywords
- Made improvements to Validation Station while in mark table mode.
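For workflows or configuration files that store these endpoints explicitly, the mapping listed for this release can be applied mechanically. The following Python sketch is a hypothetical migration helper, not part of any UiPath package: `ENDPOINT_MIGRATIONS` and `migrate_endpoint` are illustrative names, and the sketch assumes that any path after the old base URL carries over unchanged to the new one, which the release notes do not state.

```python
# Old-to-new base URLs, as listed in the 12 November 2020 release notes.
ENDPOINT_MIGRATIONS = {
    "https://formextractor.uipath.com": "https://du.uipath.com/svc/formextractor",
    "https://intelligentforms.uipath.com": "https://du.uipath.com/svc/intelligentforms",
    "https://intelligentkeywords.uipath.com": "https://du.uipath.com/svc/intelligentkeywords",
}


def migrate_endpoint(url: str) -> str:
    """Return url with a deprecated base URL swapped for its replacement.

    Hypothetical helper. ASSUMPTION: whatever follows the old base URL is
    appended unchanged to the new one. URLs that do not start with a known
    old base are returned as-is.
    """
    stripped = url.rstrip("/")
    for old, new in ENDPOINT_MIGRATIONS.items():
        # Accept the exact base, or the base followed by "/", so that a host
        # such as "https://formextractor.uipath.com.example" is not rewritten.
        if stripped == old or url.startswith(old + "/"):
            return new + url[len(old):]
    return url
```

Matching on the full base URL followed by a path separator, rather than a plain prefix check, keeps the helper from rewriting unrelated hosts that merely share the same leading characters.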
Release Date: 20 October 2020
Added more detailed error logging for Form Extractor, Intelligent Form Extractor, and Intelligent Keyword Classifier.
Release Date: 5 October 2020
New activities
The following activities have been included in the package:
- Present Classification Station - designed for classifying and separating files based on the document type.
- Create Document Classification Action & Wait for Document Classification and Resume - designed for integration with Orchestrator.
- Intelligent Keyword Classifier & Intelligent Keyword Classifier Trainer - designed for classifying, splitting, and training document packages into individual document types.
Validation Station
To make information in Validation Station easier to identify, color codes were added to field cards and to tokens or custom areas. Each field card has a default color code, and tokens or custom areas take the color code of the field card they are assigned to.
New shortcuts have been added to Validation Station that allow the user to move a selected line in a table up, down, left, or right. In addition, selections made in Validation Station can be assigned to a specific field using field-level shortcuts: each field card has a key associated with it. When nothing is selected, you can use field-level shortcuts to jump from one field card to another.
For Validation Station table fields, a row-level checkmark was added. You can now check all the fields in a row by selecting the checkmark, and the checkmark is also selected automatically once you have visited all the fields in the row.
The appearance of tokens in Validation Station has been updated: highlighted tokens have a red bottom border, and selected tokens have a dashed border.
Field values with no reference are now supported in Validation Station, so users can assign values to fields that do not have a reference in the document. To enable this, clear the Requires Reference checkbox when creating the field in Taxonomy Manager.
Classification Station
New shortcuts were created for Classification Station that allow the user to navigate through document types; add, change, remove, or highlight a reference; move all pages up or down; split after the selected page; discard changes; save; and report as exception.
Besides using the document type menu, a reference can now be removed at page level as well by hovering over a page and clicking the blue icon in the bottom right corner. The icon also allows the user to highlight the reference.
PDF Viewer in Classification Station and Validation Station
The Rotate button was added to the PDF Viewer. By clicking the button, the current document page will rotate clockwise.
Selection mode is enabled by default in PDF Viewer.
Other activities
The Intelligent Form Extractor and Form Extractor activities can now import templates that have the same name as existing templates but different content. Each such template is analyzed, and a warning message is displayed for each case.
The ActionPriority property from the Create Document Validation Action activity now supports expressions and variables.
The BucketFolderPath property of the Create Document Validation Action activity was renamed to BucketDirectoryPath, and the DirectoryFolderPath property of the Wait for Document Validation Action and Resume activity was renamed to DownloadDirectoryPath. The new names clearly separate these properties from the Orchestrator concept of “Folder”.
Release Date: 24 August 2020
- Fixed an issue that, in some cases, returned a 407 ProxyAuthenticationRequired error message for Kerberos or NTLM authentication requests. This applies to Form Extractor, Intelligent Form Extractor, and Intelligent Keyword Classifier.
- Fixed an issue that caused the Intelligent Form Extractor not to display a timeout error properly.
- Missing translations were added for certain Validation Station strings.
- Fixed an issue that caused the Data Extraction Scope to throw an error when a table field was deselected.
Release Date: 24 June 2020
Release Date: 4 May 2020
- This release introduces the Create Document Validation Action and Wait for Document Validation Action and Resume activities, which can be used to create, suspend, and resume orchestration workflows in UiPath Action Center.
- Two new extractors are available: Form Extractor and Intelligent Form Extractor. Both activities can extract information from fixed-form documents based on predefined templates; the difference is that Intelligent Form Extractor can also be configured to interpret fields that are signed or handwritten. You can extract information from any type of field, including tables, and create custom table extraction rules by using the Template Manager wizard.
- When using the Intelligent Form Extractor activity, if the allowed number of handwritten fields might have been exceeded, a warning is displayed directly in the workflow. The warning does not prevent the user from running the workflow.
- The Regex Based Extractor activity received a new option named UseVisualAlignment, which can be used for complex layouts where it is easier to write regular expressions based on how words are visually organized on lines, ignoring any sentence, paragraph, or layout group otherwise identified in the document.
- You can define a regular expression for identifying the table area, a regular expression for identifying a table row in that area, and regular expressions for identifying specific columns in the table rows.
- The Present Validation Station and its wizard come with many new and improved features.
- The Validation Station wizard now has a new button named Discard changes, which you can use to confirm or dismiss any changes made in Validation Station. It can be used on each document type individually.
- The wizard also has a new option named Show Suggestions that allows you to select one value from multiple candidates if the extractors used report multiple possible values.
- The list of shortcuts available in Validation Station has been enriched with a new one, f+a, which allows you to add a new value to a multiple-value field.
- Improvements have been made to the Digitize Document activity, which can now better identify check boxes in a document.
- The Digitize Document activity also has a new option named ForceApplyOCR. When selected, it applies the OCR engine to all pages of the document, including native PDF pages.
- The Data Extraction Scope activity can now automatically read Extractor capabilities (internal taxonomies) if the Extractor declares them. This simplifies the configuration step by exposing the extractor's known fields. The Machine Learning Extractor now supports this new functionality, making it very easy to use and configure.
- The Export Extraction Results activity received a new option named IncludeConfidence. If selected, the confidence level is provided.
- The extraction and configuration wizards now support bulk field selection for document types and table fields.
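The three-level table definition described for the Regex Based Extractor in this release (a pattern for the table area, a pattern for rows within that area, and a pattern per column within each row) can be illustrated with Python's `re` module. This is a generic sketch of the nesting idea only: it does not reproduce how the activity is configured or how it matches text, it does not model UseVisualAlignment, and `extract_table` and the sample patterns are hypothetical.

```python
import re
from typing import Dict, List, Optional


def extract_table(
    text: str,
    area_pattern: str,
    row_pattern: str,
    column_patterns: Dict[str, str],
) -> List[Dict[str, Optional[str]]]:
    """Extract table rows with nested regular expressions (illustrative only).

    1. area_pattern locates the table area (DOTALL, so it can span lines).
    2. row_pattern finds each row inside that area (MULTILINE, so ^ and $
       anchor at line boundaries).
    3. Each column pattern is searched within a row; capture group 1 is the
       cell value, or the whole match if the pattern has no groups.
    """
    area = re.search(area_pattern, text, re.DOTALL)
    if area is None:
        return []  # no table area found

    rows: List[Dict[str, Optional[str]]] = []
    for row_match in re.finditer(row_pattern, area.group(0), re.MULTILINE):
        row_text = row_match.group(0)
        row: Dict[str, Optional[str]] = {}
        for name, pattern in column_patterns.items():
            m = re.search(pattern, row_text)
            if m is None:
                row[name] = None  # column not found in this row
            else:
                row[name] = m.group(1) if m.re.groups else m.group(0)
        rows.append(row)
    return rows
```

For example, with the area pattern `Item Qty Price\n.*?Total`, the row pattern `^[A-Za-z]+ \d+ \d+\.\d{2}$`, and one capturing pattern per column, the function returns one dictionary per item line, skipping the header and the total line because they do not match the row pattern.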
Release Date: 14 January 2020
- Fixed an issue that caused the Validation Station wizard to display the table preferences incorrectly when using the Extract new table option.
- Fixed an issue that returned an error when the Validation Station wizard was run with a Callout activity chained before or after it. Now, the activity runs as expected.
- Fixed an issue that was causing the Data Extraction Scope activity to throw an error when it was run with a customized machine culture and the FormatValuesIfPossible option selected. Now, the activity runs as expected.
- Fixed performance issues that occurred when large amounts of text were selected in the Text View option of the Validation Station wizard. Now, the Text View option displays the text as expected.
- Fixed an issue that was causing the Data Extraction Scope activity to throw an error when it was run with an extractor without an internal taxonomy set and a new field was added in the project’s taxonomy. Now, the activity runs as expected.
- On certain machines, rotated documents were not displayed properly when using the Validation Station.
Release Date: 6 December 2019
- Major updates occurred for the UiPath.IntelligentOCR.Activities package. All activities used for working with FineReader and FlexiCapture Abbyy product families were moved into a separate package named UiPath.Abbyy.Activities. This has led to a breaking change for the UiPath.IntelligentOCR.Activities package, which caused the version to skip ahead from v3.1.0 to v4.0.0. The following list shows the activities that were moved from the UiPath.IntelligentOCR.Activities package into the UiPath.Abbyy.Activities:
- The UiPath.Abbyy.Activities package cannot be used with versions lower than v19.11 for the UiPath.UIAutomation.Activities package and lower than v4.0.0 for the UiPath.IntelligentOCR.Activities package.
- If you encounter runtime validation errors after updating a workflow to UiPath.IntelligentOCR.Activities v4.0.0 and UiPath.Abbyy.Activities v1.0.0, force a new save of the .xaml file by making a small change and then reverting it. This might occur for workflows using FlexiCapture activities.
- Workflows created with or upgraded to UiPath.IntelligentOCR.Activities v4.0.0 cannot be downgraded to a lower UiPath.IntelligentOCR.Activities version.
Release Date: 8 November 2019
- A new activity meant to help you better organize and manage your trainable classifiers is available: Keyword Based Classifier Trainer. This activity can be used only together with the Train Classifiers Scope activity.
- The Validation Station wizard received a major upgrade. The wizard is available only when the Present Validation Station activity is used in a workflow. The upgraded version offers a new user-friendly interface, keyboard shortcuts for navigating through the document, and the ability to select one or multiple words or a custom area. You can easily mark a field as missing, extract new data, edit a table, or extract a new table, and all of this can be done while using a dark theme.
- The Keyword Based Classifier activity received a new parameter named LearningData. Besides specifying the location of the learning data file, you can now also provide the serialized classifier data as a string. The activity was also enhanced with a wizard named Manage Keyword Based Classifier Learning, which can be used to configure and manage the keywords used for identifying specific document types.
- Both the Keyword Based Classifier and Keyword Based Classifier Trainer activities are now able to manage multiple keywords. After the keyword sets are selected, the extraction is based on a full match of the selected words.
- The DocumentObjectModel output of the Digitize Document activity now supports word polygons in addition to horizontal word boxes.
- The Taxonomy Manager wizard received a new scroll bar that incorporates all UI elements, providing a better user experience.
- Data Extraction Scope, Train Extractors Scope, Train Classifier Scope, and Classify Document Scope activities now arrange their extractors and classifiers horizontally instead of vertically.
- The Regex Based Extractor activity has been improved and can now process and return multiple values. The output is visible only when the activity is used together with Validation Station.
- Four new languages, Turkish (TR), Portuguese (PT), Spanish (ES), and Spanish-Mexico (ES-MX) are available for the UiPath.IntelligentOCR.Activities package.
.xaml file. If no files are open when you access the Taxonomy Manager, a recording window is shown, and Taxonomy Manager is displayed only after the recording window is closed.
- An exception was thrown when using the Data Extraction Scope activity together with a Try Catch activity. The issue was fixed and now the activity is executed as expected.
- When a Boolean field was set to No in Validation Station, the output file showed the result as missing instead of No. The issue was fixed and now the output file shows the correct result.
- Fixed incorrect number parsing that occurred when the Data Extraction Scope was trying to parse numbers in documents using a different number format than the document's culture.
- When using multiple Validation Stations, the order of the derived parts was not respected in the validated results. The issue was fixed and now the results are displaying the derived parts in the same order they were introduced.
- Boxes created with custom selection changed when the results of a Validation Station were run through a second Validation Station. The issue was fixed and now boxes with custom selection are preserved.
- When the Digitize Document activity was used together with the Microsoft Azure Computer Vision OCR engine, rotation did not work when the HandwritingRecognition parameter was set to True. The issue was fixed and now the information is processed correctly.
- When using Digitize Document activity, an error occurred when trying to process images with a lot of text. The bug was fixed by improving the scaling process.
- Fixed an issue that caused an error to be thrown when training the Keyword Based Classifier activity in the training scope and the extraction was run without a classification reference. Now, the missing learning information is only logged, not thrown as an error.
- An error was thrown when using the FlexiCapture Extractor activity and the same name was given to both a table column and a field. The issue was fixed and the .fcdot file is now processed as expected.
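The word polygon support added to the DocumentObjectModel output in this release matters most for rotated or skewed words: a horizontal box around such a word also covers surrounding space, while a polygon follows the word's actual outline. A horizontal box can always be recovered from a polygon, as in the following Python sketch, but not the other way around. The `(x, y)` point list and the `(left, top, width, height)` return layout are assumptions made for illustration; they are not the actual DocumentObjectModel schema, and `polygon_to_box` is a hypothetical helper.

```python
from typing import List, Tuple

# ASSUMPTION: a polygon is a list of (x, y) vertices in image coordinates,
# with y growing downward. This is not the DocumentObjectModel format.
Point = Tuple[float, float]


def polygon_to_box(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Return (left, top, width, height) of the smallest axis-aligned
    (horizontal) box that encloses all vertices of the polygon."""
    if not polygon:
        raise ValueError("polygon must contain at least one point")
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    left, top = min(xs), min(ys)
    return (left, top, max(xs) - left, max(ys) - top)
```

For a slightly rotated word with vertices `(10, 20), (50, 10), (55, 30), (15, 40)`, the enclosing horizontal box starts at `(10, 10)` and is 45 wide and 30 high, which includes corner areas the word itself does not occupy.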