Document Understanding Activities

Last updated May 15, 2025

Using OmniPage with an extended language

Follow these steps to build the example process:

Open Studio and create a new Process, which is named Main by default.
Note: Add your files to the project directory so that you can run the entire process from the same place.

Add a Sequence container in the Workflow Designer.

Create the variables shown in the following table:

Table 1. Variables to be created
Variable Name	Variable Type	Default Value
`textFile`	Image	N/A
`extractedText`	String	N/A

Add a Digitize Document activity inside the Sequence container.
- In the Properties panel, enter the path of the file you want to digitize in the DocumentPath field. A sample file is included in the downloadable example.
Add an OmniPage OCR engine inside the Digitize Document activity.
- In the Properties panel, add the value Image in the Image field.
- Select the Extended option from the EnginePack drop-down list.
- Select the check box for the ExtractWords option. This extracts the on-screen position of each detected word.
- Enter the value "qct" in the Language field. This is the language code for Traditional Chinese.
- Add the variable extractedText in the Text field to store all the text extracted from the document.
Add a Write Line activity after the Digitize Document activity.
- Add the variable extractedText in the Text field.
Run the process. The activities analyze the provided file and extract all detected words written in Traditional Chinese.
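To recap how the configured activities nest and which properties they share, the following is a minimal Python sketch that models the workflow as plain data. It assumes hypothetical classes (`Sequence`, `DigitizeDocument`, `OmniPageOCR`, `WriteLine`) and a hypothetical helper `write_line_reads_ocr_output`; none of these are part of any UiPath API, and the file name `sample.png` is a placeholder for the path you enter in DocumentPath. The sketch only mirrors the Studio settings from the steps above.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical data model: these classes only mirror the Studio settings
# described in the steps above. They are NOT part of any UiPath API.


@dataclass
class OmniPageOCR:
    image: str = "Image"           # value entered in the Image field
    engine_pack: str = "Extended"  # EnginePack drop-down selection
    extract_words: bool = True     # ExtractWords check box
    language: str = "qct"          # language code for Traditional Chinese
    text: str = "extractedText"    # variable that receives the recognized text


@dataclass
class DigitizeDocument:
    document_path: str             # DocumentPath field
    ocr_engine: OmniPageOCR        # OCR engine placed inside the activity


@dataclass
class WriteLine:
    text: str                      # Text field (the variable to print)


@dataclass
class Sequence:
    activities: List[Union[DigitizeDocument, WriteLine]] = field(default_factory=list)


def outline(seq: Sequence) -> List[str]:
    """Render the workflow as an indented outline, one line per activity."""
    lines = ["Sequence"]
    for act in seq.activities:
        if isinstance(act, DigitizeDocument):
            ocr = act.ocr_engine
            lines.append(f"  Digitize Document (DocumentPath={act.document_path!r})")
            lines.append(
                f"    OmniPage OCR (Image={ocr.image}, EnginePack={ocr.engine_pack}, "
                f"ExtractWords={ocr.extract_words}, Language={ocr.language!r}, Text={ocr.text})"
            )
        elif isinstance(act, WriteLine):
            lines.append(f"  Write Line (Text={act.text})")
    return lines


def write_line_reads_ocr_output(seq: Sequence) -> bool:
    """Hypothetical check: every Write Line prints a variable that an OCR engine fills."""
    ocr_targets = {a.ocr_engine.text for a in seq.activities if isinstance(a, DigitizeDocument)}
    return all(a.text in ocr_targets for a in seq.activities if isinstance(a, WriteLine))


# "sample.png" is a placeholder; use the path of your own file.
main = Sequence([
    DigitizeDocument("sample.png", OmniPageOCR()),
    WriteLine("extractedText"),
])

for line in outline(main):
    print(line)
```

The consistency check reflects the key link in the process: the OmniPage OCR engine writes to extractedText, and the Write Line activity reads that same variable.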