Create a new dataset

Communications Mining User Guide
Last updated Feb 10, 2025
User permissions required: ‘Modify datasets’.
To create a new dataset:
Go to the Datasets page and select New Dataset. A modal opens in which you create the new dataset.
New Dataset modal

Complete the form with all the relevant information, then select Next to progress through each step:
- Enter a title in the Dataset Name field. The title describes the dataset that you are creating.
- Give the dataset a descriptive name in the API Name field, using hyphens instead of spaces, for example zendesk-cs-chats.
- From the drop-down menu, select the Project that the dataset should be in. You can assign the dataset to any of the projects that you are a member of.
- Use the toggle to enable or disable the Use generative AI features option. These features provide design-time and run-time capabilities that use third-party generative AI models. They significantly improve time to value, for example through Generative annotation.
If you use the preview generative extraction model, it relies on a third-party LLM, so you must also turn on the Use preview generative extraction model toggle.
Note: To disable third-party LLMs, make sure you turn off the Use generative AI features toggle for the dataset.
- Select an Existing source from the drop-down list or add a new one. To add a new data source:
- Tick the New source radio button.
- Enter the Source Name and API Name. You can't change the API name once added.
- Note: You can add a new source only if you have the Source Admin permission.
- Note: If you have the Tenant Admin permission, you can create a new project. Select the Create new option from the Project drop-down list:
- Add the new project's Title and API name, then select Save.
Note:
- Once you add a new project, you are automatically set as the project founding user, with all the permissions in that project. The term founding user will soon change to project owner.
- Set the Model language(s)
- Confirm that the model language (for example, English) matches the language of your data. If you select a Multilingual model, see the Multilingual sources and datasets page for more details.
- Define labels
- Choose a dataset by selecting the Import from a dataset option from the Import labels drop-down list. This copies only the labels and descriptions from an existing dataset. To copy an entire dataset, select Duplicate from the Datasets page.
- Add Additional settings
- Add any pre-trained labels to your dataset. Examples include chaser, urgent, and out of office. You do not have to enable any during dataset creation; you can enable them later on the dataset settings page.
- Set the sentiment and language(s) of the dataset:
- Enable or disable sentiment analysis. With sentiment analysis enabled, every label in the taxonomy has an associated positive or negative sentiment. To understand why you would or wouldn't enable it, see the Enabling sentiment on a dataset page.
Select Create to create the dataset.
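If you derive API names from dataset titles, the hyphen convention described for the API Name field can be sketched as a small helper. This is only an illustration: the function name `to_api_name` is hypothetical, and lowercasing and stripping punctuation are assumptions, since this page only states that hyphens should replace spaces.

```python
import re


def to_api_name(title: str) -> str:
    """Turn a human-readable title into a hyphenated name (hypothetical helper)."""
    # Assumption: lowercase the title and collapse every run of characters
    # other than letters and digits into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Drop any hyphens left over at the start or end.
    return slug.strip("-")


if __name__ == "__main__":
    print(to_api_name("Zendesk CS Chats"))  # zendesk-cs-chats
```

The result is only a suggestion for what to type in the form; the platform itself decides which names it accepts.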
Note:
- You can add up to 20 individual sources to a dataset in the GUI.
- Sources can sit in a different project to a dataset. As long as users have the appropriate permissions in each project, they will be able to see the messages and annotate as usual.
- If there are multiple sources in a dataset, they should share a similar intended purpose for your analysis.
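Before filling in the form, it can help to check a planned dataset against the constraints mentioned on this page. The sketch below is a local planning aid only, not part of the product: `DatasetPlan`, its field names, and `check_plan` are hypothetical, and the only rules it encodes are the 20-source GUI limit and the hyphens-instead-of-spaces convention for API names.

```python
from dataclasses import dataclass, field

MAX_GUI_SOURCES = 20  # limit for adding sources in the GUI, as stated in the note above


@dataclass
class DatasetPlan:
    # Hypothetical planning record; these field names are illustrative only.
    title: str
    api_name: str
    project: str
    sources: list[str] = field(default_factory=list)
    sentiment_enabled: bool = False


def check_plan(plan: DatasetPlan) -> list[str]:
    """Return a list of human-readable problems found in the plan."""
    problems = []
    if " " in plan.api_name:
        problems.append("API name contains spaces; use hyphens instead")
    if len(plan.sources) > MAX_GUI_SOURCES:
        problems.append(
            f"{len(plan.sources)} sources planned; the GUI allows up to {MAX_GUI_SOURCES}"
        )
    return problems


if __name__ == "__main__":
    plan = DatasetPlan(
        title="Zendesk CS Chats",
        api_name="zendesk-cs-chats",
        project="customer-service",
        sources=["zendesk-chats"],
    )
    print(check_plan(plan))  # an empty list means no problems were found
```

An empty result does not guarantee that the form will accept the values; it only confirms that the plan respects the two rules encoded here.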