AI Center User Guide


Last updated Jun 6, 2024

Use Custom NER With Continuous Learning

Background information

This example shows how to extract the chemicals mentioned in research papers and classify them by category. By following the procedure below, you extract the chemicals and categorize them as ABBREVIATION, FAMILY, FORMULA, IDENTIFIER, MULTIPLE, SYSTEMATIC, TRIVIAL, and NO_CLASS.

Prerequisites

This procedure uses the Custom Named Entity Recognition package. For more information on how this package works and what it can be used for, see Custom Named Entity Recognition.

For this procedure, the following sample files are provided:

Pre-labeled training dataset in CoNLL format. You can download it from here.
Pre-labeled test dataset. You can download it from here.
Sample workflow for extracting categories of chemicals mentioned in research papers. You can download it from here.
Note: Make sure that the following variables are set in the sample workflow:
- in_emailAdress - the email address of the user to whom the Action Center task is assigned
- in_MLSkillEndpoint - the public endpoint of the ML Skill
- in_MLSkillAPIKey - the API key of the ML Skill
- in_labelStudioEndpoint - optional; to enable continuous labeling, provide the import URL of a Label Studio project
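Before uploading the datasets, it can help to check how many entities of each category they contain. The following POSIX shell sketch does this with awk. It assumes one token per line with the tag in the last column (for example "NaCl -X- _ B-FORMULA"), BIO-style "B-"/"I-" tag prefixes, blank lines between sentences, and "-DOCSTART-" document headers. The function name count_entities is hypothetical and is not part of the sample files.

```
# count_entities FILE: print "CATEGORY COUNT" for each entity category in a
# CoNLL file (hypothetical helper; assumes BIO tags in the last column).
count_entities() {
  awk '
    # Blank lines separate sentences; -DOCSTART- lines separate documents.
    NF == 0 || $1 == "-DOCSTART-" { prev = ""; next }
    {
      tag = $NF                       # assumption: the tag is the last column
      if (tag == "O") { prev = ""; next }
      label = substr(tag, 3)          # strip the B- or I- prefix
      # A new entity starts at a B- tag or when the category changes.
      if (substr(tag, 1, 2) == "B-" || label != prev) count[label]++
      prev = label
    }
    END { for (label in count) print label, count[label] }
  ' "$1" | sort
}
```

For example, count_entities train.txt prints one line per category, where train.txt stands for whichever CoNLL file you downloaded. A category that is missing or very rare in the output is a hint that the model will see few examples of it during training.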

Procedure

Follow the procedure below to extract chemicals by their category from research papers.

Import the sample dataset into AI Center: go to the Datasets menu and upload the train and test folders from the sample.
Select the Custom NER package from ML Packages > Out of the Box Packages > UiPath Language Analysis and create the package.
Go to the Pipelines menu and create a new full pipeline run for the package created in the previous step. Point it to the training and test datasets from the sample.
Create a new ML Skill using the package generated by the pipeline run from the previous step and deploy it.
Once the skill is deployed, use it in the provided UiPath Studio workflow. To capture data with weak (low-confidence) predictions, deploy a Label Studio instance and provide the instance URL and API key in the Label Studio activity of the workflow.
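The sample workflow decides which predictions count as weak. Purely to illustrate the idea, the following POSIX shell sketch keeps the texts whose confidence falls below a threshold and turns them into a Label Studio import payload. The input layout (one "confidence<TAB>text" line per prediction), the function name low_confidence_tasks, and the threshold value are all assumptions; they are not the ML Skill's response format or the workflow's actual logic. Texts are assumed to contain no tabs or line breaks.

```
# low_confidence_tasks THRESHOLD: read hypothetical "confidence<TAB>text" lines
# on stdin and print a Label Studio import payload with the texts whose
# confidence is below THRESHOLD.
low_confidence_tasks() {
  threshold=$1
  # Keep the text column of every line scored below the threshold.
  awk -F '\t' -v t="$threshold" '($1 + 0) < (t + 0) { print $2 }' |
    # Escape backslashes and double quotes so each text is a valid JSON string.
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
    # Wrap each text as {"data": {"text": ...}} and join the tasks into an array.
    awk 'BEGIN { printf "[" }
         { printf "%s{\"data\": {\"text\": \"%s\"}}", (NR > 1 ? ", " : ""), $0 }
         END { print "]" }'
}
```

For example, low_confidence_tasks 0.5 < predictions.tsv > payload.json writes the payload (both file names are placeholders), which can then be sent to the project's import URL with curl --data-binary @payload.json instead of --data-raw. The payload has the same shape as the cURL request example in the Label Studio steps.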

Getting started with Label Studio

To get started with Label Studio and export data to AI Center, follow the instructions below.

Install Label Studio on your local machine or cloud instance. To do so, follow the instructions from here.
Create a new project from the Named Entity Recognition template and define your label names.
Make sure that the label names have no special characters or spaces. For example, instead of Set Date, use SetDate.
Make sure that the value attribute of the <Text> tag is "$text".
Upload the data using the Label Studio import API. For details, see here.
cURL request example:
```
curl --location --request POST 'https://<label-studio-instance>/api/projects/<id>/import' \
--header 'Content-Type: application/json' \
--header 'Authorization: Token <Token>' \
--data-raw '[
    {
      "data": {
        "text": "<Text1>"
      }
    },
    {
      "data": {
        "text": "<Text2>"
      }
    }
]'
```
Annotate your data.
Export the data in CoNLL 2003 format and upload it to AI Center.
Provide the Label Studio instance URL and API key in the sample workflow to capture incorrect and low-confidence predictions.
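Instead of editing the template labels by hand, the labeling configuration for the eight chemical categories can be generated. The following POSIX shell sketch prints a configuration modeled on the Label Studio Named Entity Recognition template, with the <Text> value set to "$text" as the steps above require. It rejects any name containing characters other than letters, digits, and underscores (underscores are allowed because of NO_CLASS). The function name ner_label_config is hypothetical.

```
# ner_label_config LABEL...: print a Label Studio NER labeling configuration
# for the given label names (hypothetical helper).
ner_label_config() {
  # Validate every name first so nothing is printed for an invalid set.
  for label in "$@"; do
    case $label in
      ''|*[!A-Za-z0-9_]*)
        echo "invalid label name: '$label'" >&2
        return 1 ;;
    esac
  done
  echo '<View>'
  echo '  <Labels name="label" toName="text">'
  for label in "$@"; do
    printf '    <Label value="%s"/>\n' "$label"
  done
  echo '  </Labels>'
  # Single quotes keep $text literal, as Label Studio expects.
  echo '  <Text name="text" value="$text"/>'
  echo '</View>'
}
```

For example, ner_label_config ABBREVIATION FAMILY FORMULA IDENTIFIER MULTIPLE SYSTEMATIC TRIVIAL NO_CLASS prints a configuration that can be pasted into the project's labeling interface, while ner_label_config "Set Date" fails with an error instead of producing a label that contains a space.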