Document Understanding User Guide

Last updated Apr 28, 2025

Overview

There are several ways to consume Document Understanding™ capabilities:

Using the DocumentUnderstanding.Activities package, which is available in Studio Web, StudioX, and Studio Desktop. It is pre-configured for you when you create a new automation starting from a file, or when you continue your journey after publishing a project version.
Using the IntelligentOCR package, which is designed for Windows and Windows Legacy projects and is pre-configured in the Document Understanding process template.
Using cloud API calls to consume Document Understanding as a service from the programming language of your choice.

DocumentUnderstanding activities

If you're an RPA developer, you can use DocumentUnderstanding.Activities in your cloud projects. These activities let you handle all data about a document within a single input/output object named Document Data. They also don't require you to set a taxonomy of Document Types, so you can easily use out-of-the-box models.

You can easily set up an automation with these activities through the Extraction Automation Builder available in Document Understanding, the Marketplace, and Studio Web.

Keep in mind that Document Understanding activities don't yet support the following capabilities: splitting, training (fine-tuning of models), production/developer tenant support, on-premises support, and multiple extraction methods per Document Type.

If you start a new automation that uses a modern project (created with the Active Learning experience), you can use DocumentUnderstanding.Activities.

Intelligent OCR

As an RPA developer who wants to try the IntelligentOCR package, you can choose among different extraction and classification models based on your needs. If one model doesn't suit your needs, you can use other extractors or classifiers as a fallback. You can also modify the taxonomy, the Document Object Model (DOM), and the extraction results with RPA code at runtime.
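The fallback idea above can be pictured as a per-field chain of extractors. The following Python sketch is a conceptual model only, not the IntelligentOCR API: it assumes each extractor returns, per field, a value with a confidence score between 0 and 1, and that a hypothetical threshold `min_confidence` decides whether a result is accepted. The names `FieldValue`, `extract_with_fallback`, `primary`, and `backup` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class FieldValue:
    value: str
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0


# Hypothetical extractor shape: document content in, field name -> FieldValue out.
Extractor = Callable[[str], Dict[str, FieldValue]]


def extract_with_fallback(
    document: str,
    extractors: List[Extractor],
    fields: List[str],
    min_confidence: float = 0.8,
) -> Dict[str, Optional[FieldValue]]:
    """For each field, keep the first result (in list order) that meets the threshold."""
    results: Dict[str, Optional[FieldValue]] = {name: None for name in fields}
    for extractor in extractors:
        pending = [name for name in fields if results[name] is None]
        if not pending:
            break  # every field already has an acceptable value
        output = extractor(document)
        for name in pending:
            candidate = output.get(name)
            if candidate is not None and candidate.confidence >= min_confidence:
                results[name] = candidate
    return results


# Usage with two hypothetical extractors.
def primary(doc: str) -> Dict[str, FieldValue]:
    return {"total": FieldValue("120.00", 0.95), "vendor": FieldValue("Acme", 0.40)}


def backup(doc: str) -> Dict[str, FieldValue]:
    return {"vendor": FieldValue("Acme Corp", 0.90)}


result = extract_with_fallback("invoice text", [primary, backup], ["total", "vendor", "date"])
print(result["vendor"].value)  # Acme Corp
```

List order sets priority: a later extractor is consulted only for fields that earlier ones left empty or below the threshold, so here `total` comes from `primary`, `vendor` from `backup`, and `date` stays unresolved.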

However, IntelligentOCR has a steeper learning curve: its flexibility comes with added complexity, because you work with multiple activities and data types.

With IntelligentOCR, you can also integrate your own classifier, extractor, or OCR engine. For implementation examples, see Document Processing Code Samples.

API calls

You can use API calls as an alternative to the robotic process automation (RPA) approach. With API calls, you can retrieve detailed information about your project, including the extractors and classifiers it uses; digitize documents; classify documents and extract data from them using both specialized and generative models; and validate previously digitized, classified, and extracted information.

Because the calls are made over HTTP, you can consume the APIs from any programming or scripting language, including from an RPA workflow.

You can explore the APIs through Swagger: in the toolbar of the Document Understanding service, open the REST API dropdown list and select Framework.
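Because the APIs are plain HTTP, a client needs little more than a standard HTTP library. The following Python sketch uses only the standard library to build and send a JSON request. It is an illustration under stated assumptions, not a documented client: `BASE_URL`, the `projects` and `extract` paths, the payload fields, and bearer-token authentication are all hypothetical placeholders. Take the real URLs, parameters, and authentication scheme from the Swagger page.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical placeholder: replace with the real base URL shown in Swagger.
BASE_URL = "https://example.invalid/du/api/framework"


def build_request(path: str, token: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated request; POST with a JSON body if payload is given."""
    url = f"{BASE_URL}/{path.lstrip('/')}"
    # Assumption: the service accepts an OAuth-style bearer token.
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "POST" if data is not None else "GET"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage: build a request for a hypothetical "projects" endpoint without sending it.
req = build_request("projects", token="<access-token>")
print(req.get_method(), req.full_url)  # GET https://example.invalid/du/api/framework/projects
```

Keeping request construction separate from `send` lets you inspect or log the request before any network call; the same split works in any language with an HTTP client.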