Document Understanding User Guide
Traffic limitations
The Extraction and Classification ML Packages require a significant amount of compute resources, which imposes limitations as document size and throughput (the number of documents processed per minute) grow.
Documents larger than 100 pages are expected to run into compute or latency limitations, causing ML Skills to become unstable or to return HTTP errors. An exact upper limit is hard to define because the text density and image resolution of documents vary widely, and the text density (number of words per page) affects the compute and RAM resources required, as well as the latency. Additionally, the capacity of an ML Skill depends on the size of the hardware used to deploy it, which is controlled by AI Center. For instance, ML Skills can be deployed on GPU or on CPU, and this choice has a large impact on the capacity and speed of the ML Skill.
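As a rough illustration of the guidance above, a workflow could flag documents that exceed the 100-page mark before sending them to an ML Skill. This is a minimal sketch: the names `SOFT_PAGE_LIMIT` and `is_at_risk` are hypothetical, and the threshold is a soft guideline taken from the text, not a limit enforced by the service.

```python
# Soft threshold from the guidance above (assumption: treated here as a
# warning level, not a hard limit enforced by the ML Skill).
SOFT_PAGE_LIMIT = 100


def is_at_risk(page_count: int, soft_limit: int = SOFT_PAGE_LIMIT) -> bool:
    """Return True when a document has more pages than the soft limit."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    return page_count > soft_limit
```

A document flagged this way is not guaranteed to fail, since text density, image resolution, and the deployment hardware also matter; the check only marks candidates for splitting or closer monitoring.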
Regarding throughput, an ML Skill can process only one document at a time, so you need to wait for one document to finish before sending the next one. The larger the documents, the fewer you can process per unit of time.
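The one-document-at-a-time constraint can be sketched as a simple blocking loop. In this sketch, `call_skill` is a hypothetical callable standing in for whatever client code sends a document to the ML Skill and waits for its response; it is not a UiPath API. The helper `max_docs_per_minute` only shows the arithmetic that follows from strictly sequential processing.

```python
from typing import Callable, Iterable, List, TypeVar

D = TypeVar("D")  # document type (hypothetical, e.g. a file path or bytes)
R = TypeVar("R")  # result type returned by the skill call


def process_sequentially(
    documents: Iterable[D], call_skill: Callable[[D], R]
) -> List[R]:
    """Send documents one by one, waiting for each result before the next."""
    results: List[R] = []
    for document in documents:
        # Blocking call: the next document is not sent until this returns.
        results.append(call_skill(document))
    return results


def max_docs_per_minute(seconds_per_document: float) -> float:
    """Upper bound on throughput when documents are processed one at a time."""
    if seconds_per_document <= 0:
        raise ValueError("seconds_per_document must be positive")
    return 60.0 / seconds_per_document
```

For example, if a document takes an average of 15 seconds to process, a single ML Skill handles at most 4 documents per minute, and larger documents lower that bound further.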
To mitigate these issues when you need to process very large documents, keep in mind that in many cases the relevant data is found on a small subset of pages, and this subset can be split out using the Intelligent Keyword Classifier. This can be an effective strategy because it eliminates ML Skill errors, failures, and timeouts; increases throughput and responsiveness; increases extraction accuracy by reducing false positives; and reduces costs by eliminating unnecessary consumption of AI units.
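The idea of keeping only the pages that carry relevant data can be illustrated with a plain keyword match over per-page text. This is a simplified stand-in, not the Intelligent Keyword Classifier itself: the function `select_relevant_pages` is hypothetical and assumes the text of each page is already available (for example, from an OCR step).

```python
from typing import List, Sequence


def select_relevant_pages(
    page_texts: Sequence[str], keywords: Sequence[str]
) -> List[int]:
    """Return zero-based indices of pages containing any keyword.

    Matching is a case-insensitive substring check, which is an
    assumption made for this sketch only.
    """
    lowered = [keyword.lower() for keyword in keywords]
    return [
        index
        for index, text in enumerate(page_texts)
        if any(keyword in text.lower() for keyword in lowered)
    ]
```

Usage, with made-up page contents:

```python
pages = ["Cover letter", "Invoice number 123, total due", "Terms and conditions"]
relevant = select_relevant_pages(pages, ["invoice", "total"])
# Only the pages listed in `relevant` would then be sent for extraction.
```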