Document Understanding User Guide
Last updated Nov 11, 2024
OCR
Each OCR engine is
tailored to deliver efficient and effective optical character recognition, regardless of
your specific needs or deployment. This page provides information on the supported
languages for UiPath® OCR engines:
- UiPath Document OCR: default UiPath OCR, which receives regular updates and improvements. You can use it on either GPU or CPU, delivering the same level of accuracy in both cases.
- UiPath Document OCR_CPU: specially optimized to run on CPU.
- UiPath Extended Languages OCR: capable of processing documents in over 200 languages, especially Chinese, Korean, Vietnamese, Thai, major Indian languages, and languages that use the Cyrillic or Greek alphabets.
- Chinese, Japanese, Korean OCR: available as an endpoint and only for CPU deployments.
Tip: Choosing the right
OCR engine for your documents is simple. By default, use the UiPath Document OCR,
which benefits from regular updates and improvements. If this doesn't support your
document language or it's not performing as expected, switch to one of our other OCR
engines, like the UiPath Extended Languages OCR.
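The selection rule in the tip can be sketched as a small lookup. This is only an illustration, not part of any UiPath API: the function name `choose_engine`, the `support` mapping, and the `default_performs_well` flag are hypothetical, and the language codes `XX` and `YY` are placeholders rather than entries from the table below. The fallback shown is the UiPath Extended Languages OCR because the tip names it as one example of an alternative engine.

```python
from typing import Mapping, Set

DEFAULT_ENGINE = "UiPath Document OCR"
FALLBACK_ENGINE = "UiPath Extended Languages OCR"


def choose_engine(language_code: str,
                  support: Mapping[str, Set[str]],
                  default_performs_well: bool = True) -> str:
    """Return the default engine when it supports the language and performs
    as expected; otherwise return the fallback engine (hypothetical helper)."""
    # Language codes in the table are upper case, so normalize the input.
    engines = support.get(language_code.upper(), set())
    if DEFAULT_ENGINE in engines and default_performs_well:
        return DEFAULT_ENGINE
    return FALLBACK_ENGINE


# Placeholder support data: "XX" and "YY" are NOT real language codes.
# Fill this mapping from the support table for your own languages.
example_support = {
    "XX": {DEFAULT_ENGINE, FALLBACK_ENGINE},
    "YY": {FALLBACK_ENGINE},
}

print(choose_engine("xx", example_support))
print(choose_engine("yy", example_support))
print(choose_engine("xx", example_support, default_performs_well=False))
```

Passing the support data in as an argument keeps the sketch independent of any particular release of the table.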
Language (Language Code) | UiPath Document OCR and UiPath Document OCR_CPU | UiPath Extended Languages OCR | Chinese, Japanese, Korean OCR |
---|---|---|---|
Adyghe (ADY) | |||
Afar (AA) | |||
Afrikaans (AFR) | |||
Akan (AK) | |||
Albanian (SQI) | |||
Algonquin (ALQ) | |||
Angika (Devanagari) (ANP) | |||
Arabic (ARA) | (Preview) | ||
Asturian (AST) | |||
Asu (ASA) | |||
Avaric (AV) | |||
Awadhi-Hindi (Devanagari) (AWA) | |||
Aymara (AYM) | |||
Azerbaijani (Latin) (AZ) | |||
Bafia (KSF) | |||
Bagheli (BFY) | |||
Bambara (BM) | |||
Bashkir (BA) | |||
Basque (EU) | |||
Belarusian (Cyrillic) (BE, BE-CYRL) | |||
Belarusian (Latin) (BE, BE-LATN) | |||
Bemba (BEM) | |||
Bena (BEZ) | |||
Bhojpuri-Hindi (Devanagari) (BHO) | |||
Bikol (BIK) | |||
Bislama (BI) | |||
Bodo (Devanagari) (BRX) | |||
Bosnian (Latin) (BS) | |||
Brajbha (BRA) | |||
Breton (BR) | |||
Bulgarian (BG) | |||
Bundeli (BNS) | |||
Buryat (Cyrillic) (BUA) | |||
Catalan (CA) | |||
Cebuano (CEB) | |||
Chamling (RAB) | |||
Chamorro (CH) | |||
Chechen (CE) | |||
Chhattisgarhi (Devanagari) (HNE) | |||
Chiga (CGG) | |||
Chinese - Simplified (ZH-Hans) | |||
Chinese - Traditional (ZH-Hant) | |||
Choctaw (CHO) | |||
Chukot (CKT) | |||
Chuvash (CV) | |||
Cornish (KW) | |||
Corsican (CO) | |||
Cree (CR) | |||
Creek (MUS) | |||
Crimean Tatar (Latin) (CRH) | |||
Croatian (HR) | |||
Crow (CRO) | |||
Czech (CS) | |||
Danish (DA) | |||
Dargwa (DAR) | |||
Dari (PRS) | |||
Dhimal (Devanagari) (DHI) | |||
Dogri (Devanagari) (DOI) | |||
Duala (DUA) | |||
Dungan (DNG) | |||
Dutch (NL) | |||
Efik (EFI) | |||
English (EN) | |||
Erzya (Cyrillic) (MYV) | |||
Estonian (ET) | |||
Faroese (FO) | |||
Fijian (FJ) | |||
Filipino (FIL) | |||
Finnish (FI) | |||
Fon (FON) | |||
French (FR) | |||
Friulian (FUR) | |||
Ga (GAA) | |||
Gaelic - Irish (GA) | |||
Gaelic - Scottish (GD) | |||
Gagauz (Latin) (GAG) | |||
Galician (GL) | |||
Ganda (LG) | |||
Gayo (GAY) | |||
German (DE) | |||
Gilbertese (GIL) | |||
Gondi (Devanagari) (GON) | |||
Greek (EL) | |||
Greenlandic (KL) | |||
Guarani (GN) | |||
Gurung (Devanagari) | |||
Gusii (GUZ) | |||
Haitian Creole (HT) | |||
Halbi (Devanagari) (HLB) | |||
Hani (HNI) | |||
Haryanvi (BGC) | |||
Hawaiian (HAW) | |||
Hebrew (HE) | |||
Herero (HZ) | |||
Hiligaynon (HIL) | |||
Hindi (HI) | |||
Hmong Daw (Latin) (MWW) | |||
Ho (Devanagari) (HOC) | |||
Hungarian (HU) | |||
Iban (IBA) | |||
Icelandic (IS) | |||
Igbo (IG) | |||
Iloko (ILO) | |||
Inari Sami (SMN) | |||
Indonesian (ID) | |||
Ingush (INH) | |||
Interlingua (IA) | |||
Inuktitut (Latin) (IU) | |||
Italian (IT) | |||
Japanese (JA) | |||
Jaunsari (Devanagari) (JNS) | |||
Javanese (JV) | |||
Jola-Fonyi (DYO) | |||
Kabardian (KBD) | |||
Kabuverdianu (KEA) | |||
Kachin (Latin) (KAC) | |||
Kalenjin (KLN) | |||
Kalmyk (XAL) | |||
Kangri (Devanagari) (XNR) | |||
Kanuri (KR) | |||
Karachay-Balkar (KRC) | |||
Kara-Kalpak (Cyrillic) (KAA-CYR) | |||
Kara-Kalpak (Latin) (KAA) | |||
Kashubian (CSB) | |||
Kazakh (Cyrillic) (KK-CYR) | |||
Kazakh (Latin) (KK-LATN) | |||
Khakas (KJH) | |||
Khaling (KLR) | |||
Khasi (KHA) | |||
K'iche' (QUC) | |||
Kikuyu (KI) | |||
Kildin Sami (SJD) | |||
Kinyarwanda (RW) | |||
Komi (KV) | |||
Kongo (KN) | |||
Korean (KO) | |||
Korku (KFQ) | |||
Koryak (KPY) | |||
Kosraean (KOS) | |||
Kpelle (KPE) | |||
Kuanyama (KJ) | |||
Kumyk (Cyrillic) (KUM) | |||
Kurdish (Arabic) (KU-ARAB) | |||
Kurdish (Latin) (KU-LATN) | |||
Kurukh (Devanagari) (KRU) | |||
Kyrgyz (Cyrillic) (KY) | |||
Lak (LBE) | |||
Lakota (LKT) | |||
Latin (LA) | |||
Latvian (LV) | |||
Lezghian (LEX) | |||
Lingala (LN) | |||
Lithuanian (LT) | |||
Lower Sorbian (DSB) | |||
Lozi (LOZ) | |||
Lule Sami (SMJ) | |||
Luo (Kenya and Tanzania) (LUO) | |||
Luxembourgish (LB) | |||
Luyia (LUY) | |||
Macedonian (MK) | |||
Machame (JMC) | |||
Madurese (MAD) | |||
Mahasu Pahari (Devanagari) (BFZ) | |||
Makhuwa-Meetto (MGH) | |||
Makonde (KDE) | |||
Malagasy (MG) | |||
Malay (Latin) (MS) | |||
Maltese (MT) | |||
Malto (Devanagari) (KMJ) | |||
Mandinka (MNK) | |||
Manx (GV) | |||
Maori (MI) | |||
Mapudungun (ARN) | |||
Marathi (MR) | |||
Mari (Russia) (CHM) | |||
Masai (MAS) | |||
Mende (Sierra Leone) (MEN) | |||
Meru (MER) | |||
Meta' (MGO) | |||
Minangkabau (MIN) | |||
Mohawk (MOH) | |||
Mongolian (Cyrillic) (MN) | |||
Mongondow (MOG) | |||
Montenegrin (Cyrillic) (CNR-CYRL) | |||
Montenegrin (Latin) (CNR-LATN) | |||
Morisyen (MFE) | |||
Mundang (MUA) | |||
Nahuatl (NAH) | |||
Navajo (NV) | |||
Ndonga (NG) | |||
Neapolitan (NAP) | |||
Nepali (NE) | |||
Ngomba (JGO) | |||
Niuean (NIU) | |||
Nogay (NOG) | |||
North Ndebele (ND) | |||
Northern Sami (Latin) (SME) | |||
Norwegian (NO) | |||
Nyanja (NY) | |||
Nyankole (NYN) | |||
Nzima (NZI) | |||
Occitan (OC) | |||
Ojibway (OJ) | |||
Oromo (OM) | |||
Ossetic (OS) | |||
Pampanga (PAM) | |||
Pangasinan (PAG) | |||
Papiamento (PAP) | |||
Pashto (PS) | |||
Pedi (NSO) | |||
Persian (FA) | |||
Polish (PL) | |||
Portuguese (PT) | |||
Punjabi (Arabic) (PA) | |||
Quechua (QU) | |||
Ripuarian (KSH) | |||
Romanian (RO) | |||
Romansh (RM) | |||
Rundi (RN) | |||
Russian (RU) | |||
Rwa (RWK) | |||
Sadri (Devanagari) (SCK) | |||
Sakha (SAH) | |||
Samburu (SAQ) | |||
Samoan (Latin) (SM) | |||
Sango (SG) | |||
Sangu (Gabon) | |||
Sanskrit (Devanagari) (SA) | |||
Santali (Devanagari) (SAT) | |||
Scots (SCO) | |||
Sena (SEH) | |||
Serbian (Cyrillic) (SR-CYRL) | |||
Serbian (Latin) (SR, SR-LATN) | |||
Shambala (KSB) | |||
Shona (SN) | |||
Siksika (BLA) | |||
Sirmauri (Devanagari) (SRX) | |||
Skolt Sami (SMS) | |||
Slovak (SK) | |||
Slovenian (SL) | |||
Soga (XOG) | |||
Somali (Arabic) (SO) | |||
Somali (Latin) (SO-LATN) | |||
Songhai (SON) | |||
South Ndebele (NR) | |||
Southern Altai (ALT) | |||
Southern Sami (SMA) | |||
Southern Sotho (ST) | |||
Spanish (ES) | |||
Sundanese (SU) | |||
Swahili (Latin) (SW) | |||
Swati (SS) | |||
Swedish (SV) | |||
Tabassaran (TAB) | |||
Tachelhit (SHI) | |||
Tahitian (TY) | |||
Taita (DAV) | |||
Tajik (Cyrillic) (TG) | |||
Tamil (TA) | |||
Tatar (Cyrillic) (TT-CYRL) | |||
Tatar (Latin) (TT) | |||
Teso (TEO) | |||
Tetum (TET) | |||
Thai (TH) | |||
Thangmi (THF) | |||
Tok Pisin (TPI) | |||
Tongan (TO) | |||
Tsonga (TS) | |||
Tswana (TN) | |||
Turkish (TR) | |||
Turkmen (Latin) (TK) | |||
Tuvan (TYV) | |||
Udmurt (UDM) | |||
Uighur (Cyrillic) (UG-CYRL) | |||
Ukrainian (UK) | |||
Upper Sorbian (HSB) | |||
Urdu (UR) | |||
Uyghur (Arabic) (UG) | |||
Uzbek (Arabic) (UZ-ARAB) | |||
Uzbek (Cyrillic) (UZ-CYRL) | |||
Uzbek (Latin) (UZ) | |||
Vietnamese (VI) | |||
Volapuk (VO) | |||
Vunjo (VUN) | |||
Walser (WAE) | |||
Welsh (CY) | |||
Western Frisian (FY) | |||
Wolof (WO) | |||
Xhosa (XH) | |||
Yucatec Maya (YUA) | |||
Zapotec (ZAP) | |||
Zarma (DJE) | |||
Zhuang (ZA) | |||
Zulu (ZU) | |||
Arabic characters | 'ا','ب','ة','ت','ث','ج','ح','خ','د','ذ','ر','ز','س','ش','ص','ض','ط','ظ','ع','غ','ـ','ف','ق','ك','ل','م','ن','ه','و','ى','ي','ٓ','ٔ','ٕ','٠','١','٢','٣','٤','٥','٦','٧','٨','٩','٪','٫','٬','٭','ٱ','۔','ً','ٌ','ٍ','َ','ُ','ِ','ّ','ْ','ٰ','ۥ','ۦ','آ','،','؛','؟','ء','أ','ؤ','إ','ئ' |
Supported OCR characters | ! " # $ % & \ ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ \ ] ^ _ a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ £ ¥ § © ® ° ¿ À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý ß à á â ã ä å æ ç è é ê ë ì í î ï ñ ò ó ô õ ö ø ù ú û ü ý Ā ā Ă ă Ą ą Ć ć Ċ ċ Č č Ď ď Đ đ Ē ē Ė ė Ę ę Ě ě Ğ ğ Ġ ġ Ħ ħ Ī ī Ĭ ĭ Į į İ ı Ĺ ĺ Ľ ľ Ł ł Ń ń Ň ň Ŋ ŋ Ō ō Ő ő Œ œ Ŕ ŕ Ř ř Ś ś Š š Ť ť Ŧ ŧ Ū ū Ŭ ŭ Ů ů Ų ų Ź ź Ż ż Ž ž Ə Ǵ ǵ Ș ș Ț ț ə μ א ב ג ד ה ו ז ח ט י ך כ ל ם מ ן נ ס ע ף פ ץ צ ק ר ש ת ₪ € ≤ ≥ |