Multilingual sources and datasets
Communications Mining™ supports multilingual sources and datasets. This means that models can understand sources containing several supported languages without having to translate them.
The languages that are currently in General Availability within multilingual sources and datasets are: English, French, German, Spanish, Italian, Portuguese, Dutch, and Japanese.
If users work and do business in several languages that are supported by the platform, they can train on messages in those languages, rather than translating everything into a single language.
Important considerations when looking to use multilingual sources and datasets:
- If a dataset is multilingual, users will not be able to see translations of any messages (as provided for translated datasets), so they will need to be able to understand all of the languages in the dataset to effectively train their model.
- Understanding multiple languages is a more complex machine-learning problem than understanding a single language, so these datasets may perform slightly worse than datasets in a single language.
- The platform supports the following languages: English, French, German, Spanish, Italian, Portuguese, Dutch, and Japanese. If the dataset contains other languages, applying the labels used for supported languages to those messages may cause confusion. Instead, annotate those messages with language-specific labels. Note that the platform will not process or understand the content of unsupported languages.
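As a rough illustration of the last consideration, the sketch below splits the languages found in a set of messages into those that are generally available and those that are not. It is a minimal sketch, not part of the platform: the function name `split_by_support` and the idea of passing in language names are assumptions for illustration only, and how the language of each message is determined is left outside the example.

```python
# Languages listed on this page as generally available for
# multilingual sources and datasets.
GA_LANGUAGES = frozenset({
    "English", "French", "German", "Spanish",
    "Italian", "Portuguese", "Dutch", "Japanese",
})


def split_by_support(languages):
    """Split language names into (generally available, other), each sorted.

    Hypothetical helper: `languages` is any iterable of language names
    that you have already identified in your data.
    """
    found = set(languages)
    return sorted(found & GA_LANGUAGES), sorted(found - GA_LANGUAGES)


supported, other = split_by_support(["English", "German", "Polish", "English"])
print("Generally available:", supported)
print("Not generally available:", other)
```

Messages in languages that fall into the second group are the ones to annotate with language-specific labels, as described above.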
How do you create multilingual sources and datasets?
For both sources and datasets, the language family is selected at creation and cannot be changed afterwards.
Select multilingual from the language family drop-down list on the create source or create dataset modal (it is typically the last setting to select).
For more detail on creating a source in the UI, check the Create a data source in the GUI page.
For more detail on creating a dataset, check the Create a new dataset page.
Register on the Insider Portal to provide feedback or raise issues.
We currently support a wide range of additional languages in Preview mode, listed below. This means our team will continue to refine them based on your usage. Many of these languages are expected to perform very well and may require little or no fine-tuning to achieve optimal performance.
- Afrikaans
- Albanian
- Amharic
- Arabic
- Armenian
- Assamese
- Azerbaijani
- Basque
- Belarusian
- Bengali
- Bengali (Romanized)
- Bosnian
- Breton
- Bulgarian
- Burmese
- Catalan
- Chinese (Simplified)
- Chinese (Traditional)
- Croatian
- Czech
- Danish
- Esperanto
- Estonian
- Filipino
- Finnish
- Galician
- Georgian
- Greek
- Gujarati
- Hausa
- Hebrew
- Hindi
- Hindi (Romanized)
- Hungarian
- Icelandic
- Indonesian
- Irish
- Javanese
- Kannada
- Kazakh
- Khmer
- Korean
- Kurdish (Kurmanji)
- Kyrgyz
- Lao
- Latin
- Latvian
- Lithuanian
- Macedonian
- Malagasy
- Malay
- Malayalam
- Marathi
- Mongolian
- Nepali
- Norwegian
- Oriya
- Oromo
- Pashto
- Persian
- Polish
- Punjabi
- Romanian
- Russian
- Sanskrit
- Scottish Gaelic
- Serbian
- Sindhi
- Sinhala
- Slovak
- Slovenian
- Somali
- Sundanese
- Swahili
- Swedish
- Swiss German
- Tamil
- Tamil (Romanized)
- Telugu
- Telugu (Romanized)
- Thai
- Turkish
- Ukrainian
- Urdu
- Urdu (Romanized)
- Uyghur
- Uzbek
- Vietnamese
- Welsh
- Western Frisian
- Xhosa
- Yiddish