Migrating classic projects
- Export the dataset from the project based on AI Center.
- Import the dataset into the modern project.
- Importing datasets larger than 3000 pages is currently not supported. Only the first 3000 pages are imported, and any pages beyond that limit fail to import. For example, if your dataset contains 2999 pages and you try to import a 4-page document, the import does not succeed.
- Batch names and the corresponding batch results are not currently available. If your data is organized into batches, this information is saved but not yet displayed.
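The page limit above can be checked before you start an import. The following is a minimal sketch, not part of the product: the names `PAGE_LIMIT`, `total_pages`, and `fits_page_limit` are hypothetical, and the page counts are assumed to come from your own inventory of the exported documents.

```python
PAGE_LIMIT = 3000  # import limit stated in the note above


def total_pages(page_counts):
    """Return the total number of pages across all documents."""
    return sum(page_counts)


def fits_page_limit(current_pages, document_pages, limit=PAGE_LIMIT):
    """Return True if adding a document keeps the dataset within the limit.

    Assumption: a dataset of exactly `limit` pages is still accepted,
    because only datasets larger than the limit are described as unsupported.
    """
    return current_pages + document_pages <= limit


# The example from the note: 2999 pages already present, 4-page document.
print(fits_page_limit(2999, 4))  # False
print(fits_page_limit(2999, 1))  # True
```

Running a check like this on the page counts of an exported dataset shows in advance whether the dataset needs to be trimmed before the import.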
Once the dataset is imported, the model training starts. After the training is complete, the model score is displayed. To check detailed model scores, select the score, and then Detailed model scores.
This action takes you to the Measure page where you can access detailed model metrics.
When the same dataset is used to train an ML model twice, you may observe slightly different model metrics. This can occur for a few reasons:
- Initialization: Machine learning uses optimization methods that start from initial guesses of the model parameters. Different initial guesses in each training run can lead to different outcomes, because the result of the optimization depends on its starting point.
- Random state: Some algorithms use randomness in their operations. For instance, when training a neural network, procedures like stochastic gradient descent and mini-batch gradient descent introduce randomness. Therefore, even with identical initial model parameters and datasets, the performance of models may vary between runs.
- Regularization: Certain algorithms include a penalty term that encourages the model to maintain smaller weights. Combined with the randomness described above, the model can end up with a different set of weights each time.
However, these minor differences don't necessarily imply that one model is superior or inferior to another. Even with slightly varying metrics, the models' ability to comprehend data essentially remains the same, provided the differences are not large. Moreover, if you repeat the training several times and average the metrics, the averaged results should be similar.
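The effect described above can be reproduced on a small scale. The following is an illustrative sketch only, using a tiny logistic regression trained with stochastic gradient descent on synthetic data. It is not the training procedure used by the product, and all names in it (`make_dataset`, `train_and_score`) are hypothetical. The seed controls both the random initialization and the order of the updates, while the dataset stays the same for every run.

```python
import math
import random
from statistics import mean


def make_dataset(n=200, seed=0):
    """Build a fixed, noisy two-class dataset (identical for every training run)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        # The label follows a linear rule plus noise, so the classes overlap.
        label = 1 if x1 + x2 + rng.gauss(0, 0.3) > 0 else 0
        data.append(((x1, x2), label))
    return data


def train_and_score(train, valid, seed, epochs=5, lr=0.1):
    """Train with SGD and return validation accuracy; the seed drives the randomness."""
    rng = random.Random(seed)
    # Random initialization: a different starting point for each seed.
    w = [rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)]
    b = rng.uniform(-0.5, 0.5)
    samples = list(train)
    for _ in range(epochs):
        rng.shuffle(samples)  # random order of the stochastic updates
        for (x1, x2), y in samples:
            z = w[0] * x1 + w[1] * x2 + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y
            w[0] -= lr * err * x1
            w[1] -= lr * err * x2
            b -= lr * err
    correct = sum(
        1 for (x1, x2), y in valid
        if (w[0] * x1 + w[1] * x2 + b > 0) == (y == 1)
    )
    return correct / len(valid)


data = make_dataset()
train, valid = data[:150], data[150:]

# Same data, five training runs that differ only in their random seed.
scores = [train_and_score(train, valid, seed) for seed in range(5)]
print("per-run accuracy:", scores)
print("mean accuracy:", round(mean(scores), 3))
```

Runs with the same seed return the same score, while runs with different seeds may differ slightly. Averaging over several runs gives a more stable figure than any single run.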
For classic projects, there are various methods for exporting data. Not all types of exported data can be imported into modern projects. To compare the model results across both project types, filter documents by Training and validation set and select Choose search results to export the dataset. For more information on each option, check the following table.
Type of export | Exported data | What happens to imported data |
---|---|---|
Current search results | Exports the current filtered dataset. Use it together with the Training and validation set filter. | Documents tagged as training are used to train the model. Documents tagged as validation are used to measure the model performance. Tip: To compare model results between two project types, always export and import the dataset as Train and validation. |
All labeled | Exports all annotated documents from the dataset. | |
Schema | Exports the list of fields and their respective settings. | The schema is imported if none is defined. If a schema is already defined, the import fails. |
All | Exports all annotated and unannotated documents. | |
- Create a custom document type in the Build section.
- Import the zip file that holds the schema.
- Schema imports are limited to custom document types with no pre-existing schemas.
- If you import a schema into a document type that already contains a schema, the import will fail.
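The schema import rule above amounts to a simple precondition check. The following sketch models it with plain Python dictionaries. It does not call any product API: the `import_schema` function, the `SchemaAlreadyDefinedError` exception, and the dictionary layout are all hypothetical and serve only to illustrate the rule.

```python
class SchemaAlreadyDefinedError(Exception):
    """Raised when the target document type already has a schema."""


def import_schema(document_type, schema):
    """Attach a schema to a document type only if it has none yet.

    `document_type` is a hypothetical dict with an optional "schema" key.
    """
    if document_type.get("schema"):
        raise SchemaAlreadyDefinedError(
            f"Document type {document_type['name']!r} already has a schema."
        )
    document_type["schema"] = schema
    return document_type


# A new custom document type has no schema, so the first import succeeds.
custom_type = {"name": "My custom type", "schema": None}
import_schema(custom_type, {"fields": ["invoice_no", "total"]})

# A second import into the same document type fails.
try:
    import_schema(custom_type, {"fields": ["other_field"]})
except SchemaAlreadyDefinedError as error:
    print("Import failed:", error)
```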