Intelligent Keyword Classifier

What Is the Intelligent Keyword Classifier

The Intelligent Keyword Classifier classifies documents using the word vectors it learns from files of each document type.

The algorithm relies on content that repeats within a document type. It starts from the premise that each document type has a set of words that usually occur in it, which makes it possible to compute a vector similarity between a file and each document type.

When classifying a file into a document type, the Intelligent Keyword Classifier:

finds the word vector that the file is most similar to,
reports the highest-scoring document type, together with the main words that matched.
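
The two steps above can be illustrated with a minimal sketch. It is not the classifier's actual implementation: it assumes plain word counts as the word vector and cosine similarity as the score, and the names `word_vector`, `cosine_similarity`, `classify`, and the sample document types are hypothetical.

```python
import math
import re
from collections import Counter


def word_vector(text):
    """Build a simple word-count vector (assumption: plain word frequencies)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, learned_vectors, top_words=5):
    """Return the best-scoring document type, its score, and the matched words."""
    file_vector = word_vector(text)
    scores = {
        doc_type: cosine_similarity(file_vector, vector)
        for doc_type, vector in learned_vectors.items()
    }
    best = max(scores, key=scores.get)
    # Words shared with the winning type, most frequent in that type first.
    matched = sorted(
        file_vector.keys() & learned_vectors[best].keys(),
        key=lambda w: learned_vectors[best][w],
        reverse=True,
    )
    return best, scores[best], matched[:top_words]


# Hypothetical "learned" vectors, one per document type.
learned_vectors = {
    "invoice": word_vector("invoice total amount due payment invoice number"),
    "receipt": word_vector("receipt cash change thank you store receipt"),
}

best_type, score, keywords = classify(
    "Invoice number 123, total amount due: 40", learned_vectors
)
print(best_type, round(score, 3), keywords)
```

In this sketch, the matched words are simply the words that the file shares with the winning document type, which is one plausible reading of "main words".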

The Intelligent Keyword Classifier also has file splitting capabilities, meaning that it can report more than one class for a given file, for separate page ranges.
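
As a rough illustration of what reporting classes per page range can look like, the sketch below groups consecutive pages that share a label into ranges. How the classifier actually decides where to split is not described here, so the per-page labels and the `page_ranges` helper are assumptions made for the example.

```python
from itertools import groupby


def page_ranges(page_labels):
    """Group consecutive pages with the same label into (label, first, last) ranges.

    Pages are numbered from 1. The per-page labels are assumed to be available
    already; this helper only merges neighbouring pages of the same class.
    """
    ranges = []
    page = 1
    for label, group in groupby(page_labels):
        count = len(list(group))
        ranges.append((label, page, page + count - 1))
        page += count
    return ranges


# Hypothetical four-page file: two invoice pages, one receipt, one invoice.
result = page_ranges(["invoice", "invoice", "receipt", "invoice"])
print(result)
```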

When To Use

You should consider using this classifier if:

a single file can contain one or more document types,
your document types are relatively easy to tell apart based on their content.

Special Requirements

To use this classifier, you need either your Automation Cloud Document Understanding API key or your own instance of the Intelligent Keyword Classifier hosted in AI Center on-prem.

How To Train

Place the Intelligent Keyword Classifier Trainer activity in a Train Classifiers Scope, and configure it accordingly.

The activity itself cannot guarantee that the training file stays consistent when several training runs execute in parallel. The Document Understanding Process provides two possible solutions for this issue, both based on traffic control:

lock files (implemented by default in the process): rename the file by adding the .lock extension, modify and save the file, then rename it again to remove the .lock extension
manual setup of a special queue: create an empty queue in Orchestrator and integrate your two activities from the project.
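
The lock-file approach can be sketched as follows. This is only an illustration of the rename, modify, rename sequence described above, not the code used by the Document Understanding Process. The helper `update_with_lock`, the retry settings, and the file name are hypothetical, and the sketch assumes that renaming a file is atomic on the file system in use.

```python
import os
import tempfile
import time


def update_with_lock(path, update, retries=50, delay=0.1):
    """Apply `update` to the file's text while holding a rename-based lock.

    Returns True on success, or False if the file could not be claimed.
    """
    lock_path = path + ".lock"
    for _ in range(retries):
        try:
            # Claim the file: only one run can rename it successfully.
            os.rename(path, lock_path)
        except FileNotFoundError:
            # Another run currently holds the file; wait and try again.
            time.sleep(delay)
            continue
        try:
            with open(lock_path, "r", encoding="utf-8") as f:
                content = f.read()
            with open(lock_path, "w", encoding="utf-8") as f:
                f.write(update(content))
        finally:
            # Release the file by removing the .lock extension.
            os.rename(lock_path, path)
        return True
    return False


# Demonstration on a temporary, hypothetical training file.
with tempfile.TemporaryDirectory() as folder:
    training_file = os.path.join(folder, "learning_data.txt")
    with open(training_file, "w", encoding="utf-8") as f:
        f.write("invoice")
    updated = update_with_lock(training_file, lambda text: text + " receipt")
    with open(training_file, "r", encoding="utf-8") as f:
        final_content = f.read()
    lock_left_behind = os.path.exists(training_file + ".lock")

print(updated, final_content, lock_left_behind)
```

A run that finds the file missing assumes another run has claimed it and retries after a short delay, which is what serializes the parallel trainings in this sketch.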

For more information on how to train a classifier, see the page that describes how to use the Manage Learning wizard.