Process Mining
Performance characteristics
The response time of Process Mining apps is determined by many factors. However, in general the following principle holds:
-
Less data equals faster execution
In Process Mining, there are two areas that have different performance characteristics: data runs to load the data, and dashboards to view the data.
In Process Mining, each process app has a development stage and a published stage. If your intended app requires a large dataset, then it is advised to use a smaller dataset (<10M records) for developing the data transformations and dashboards.
The development dataset is used for testing the data transformations. It does not affect the data displayed in the dashboards of the published process app. Once your app is ready to be used by business users, you can publish the app and ingest new data for use in the published process app.
A common scenario is to use a dataset with a shorter timeframe for development, for example, only 100k events in a time window of 2 weeks. While publishing, a larger dataset, for example, spanning 12 months, can be used.
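As a minimal sketch of the shorter-timeframe approach, the following Python snippet uses the built-in sqlite3 module as a stand-in database to cut a development dataset down to a two-week window. The table `event_log`, its columns, and the function `development_window` are hypothetical and not part of Process Mining; in a real project the same restriction would usually be applied during data extraction.

```python
import sqlite3

def development_window(conn, start, end):
    """Return only the events whose timestamp falls in [start, end)."""
    query = """
        SELECT case_id, activity, event_end
        FROM event_log
        WHERE event_end >= ? AND event_end < ?
        ORDER BY event_end
    """
    return conn.execute(query, (start, end)).fetchall()

# Hypothetical sample data. ISO-8601 strings compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_log (case_id TEXT, activity TEXT, event_end TEXT)")
conn.executemany(
    "INSERT INTO event_log VALUES (?, ?, ?)",
    [
        ("C1", "Create invoice", "2023-01-05T09:00:00"),
        ("C1", "Approve invoice", "2023-06-02T10:30:00"),
        ("C2", "Create invoice", "2023-06-10T08:15:00"),
        ("C2", "Pay invoice", "2023-07-20T16:45:00"),
    ],
)

dev_events = development_window(conn, "2023-06-01", "2023-06-15")
print(len(dev_events))  # 2 of the 4 events fall inside the two-week window
```

Widening the `start` and `end` arguments to cover 12 months would give the larger dataset used for the published app, without changing the query itself.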
Data runs in Process Mining are triggered in the following use cases:
-
Creating an app
-
Uploading data
-
Triggering Apply to dashboards, Run all, or Run file in the Data transformations editor.
-
Publishing an app that has changes in data transformations.
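A data run typically consists of the following steps, each with different performance characteristics: uploading the data, running the data transformations, building the data model, and pre-computing the process graph.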
When uploading data, the overall size of the uploaded data on disk is the most important factor for the speed. Check out Loading data. The factors that affect performance are:
-
The number of tables;
-
The number of records in the tables;
-
The number of columns in the tables;
-
The data in the tables. For example, a multi-line description column is slower than a simple Boolean column.
Data transformations change the input data into the data model that is required for the dashboards. Check out Data transformations.
Each .sql file in the transformations runs an additional SQL query. The following factors affect the speed of data transformations:
-
The number of .sql files;
-
The number of records in each table;
-
The number of columns in each table;
-
The complexity of the SQL query: join conditions, number of Common Table Expressions (CTEs), and expressions in the SQL query.
The data model determines the set of tables that is exposed to the dashboards. During a data run, tests are run to validate the structure of these tables in the data model. However, the most time-consuming part is the pre-computations that are done to speed up viewing dashboards later.
The overall speed of this step is determined by:
-
The number of tables in the data model;
-
The relation between the output tables;
-
The number of columns in the output tables;
-
The number of records in the output tables.
The last part of a data run is running pre-computations to speed up the process graph. The speed of this step is determined by:
-
The number of variants;
-
The number of events.
If you use an Import BPMN model to display the process, then the complexity of the BPMN model also affects performance. The more activities and edges there are, the slower the computations.
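To get a feel for these two factors before a data run, a rough count can be taken from an event log. The sketch below assumes the common process mining definition of a variant as a distinct ordered sequence of activities within a case. The function name `graph_size_factors` and the tuple layout are hypothetical, and the counts that Process Mining computes internally may differ.

```python
from collections import defaultdict

def graph_size_factors(events):
    """Return (number of events, number of variants) for an event log.

    events: list of (case_id, activity, timestamp) tuples.
    """
    traces = defaultdict(list)
    # Order events per case by timestamp so each trace is in execution order.
    for case_id, activity, _timestamp in sorted(events, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)
    # A variant is one distinct sequence of activities (assumed definition).
    variants = {tuple(trace) for trace in traces.values()}
    return len(events), len(variants)

sample = [
    ("C1", "Create", "2023-06-01T09:00"),
    ("C1", "Approve", "2023-06-01T11:00"),
    ("C2", "Create", "2023-06-02T09:00"),
    ("C2", "Approve", "2023-06-02T10:00"),
    ("C3", "Create", "2023-06-03T09:00"),
    ("C3", "Reject", "2023-06-03T12:00"),
]
print(graph_size_factors(sample))  # (6, 2)
```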
Reduce the data volume
To improve the speed of uploading data, reduce the size of your data to the minimum that is needed. This advice holds for all stages of the data:
-
Only extract the input data required;
-
Only transform data that is needed;
-
Only add tables to the data model if they are required for data analysis.
The easiest way to do this is usually to decrease the time window used for data extraction, as that reduces the number of records for most data tables from input to transformation to output.
The earlier you reduce the data size, the more efficient the data run:
-
Filter data in your .sql files as early as possible in your data transformations or, if possible, in your data extraction.
-
For development, a smaller dataset is typically used to speed up testing queries. Refer to Development versus production data.
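The idea of filtering early can be sketched as follows, again using Python's built-in sqlite3 module as a stand-in database. The tables `cases_input` and `events_input`, their columns, and the cut-off date are hypothetical. The filter sits in the first CTE, so every later step only handles the recent cases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cases_input (case_id TEXT, created_at TEXT, description TEXT);
    CREATE TABLE events_input (case_id TEXT, activity TEXT);
    INSERT INTO cases_input VALUES
        ('C1', '2022-03-01', 'old case'),
        ('C2', '2023-06-05', 'recent case'),
        ('C3', '2023-06-09', 'recent case');
    INSERT INTO events_input VALUES
        ('C1', 'Create'), ('C1', 'Close'),
        ('C2', 'Create'), ('C3', 'Create'), ('C3', 'Close');
""")

# Filter in the first CTE so the join only sees the recent cases.
FILTER_EARLY = """
    WITH recent_cases AS (
        SELECT case_id
        FROM cases_input
        WHERE created_at >= '2023-06-01'
    )
    SELECT e.case_id, e.activity
    FROM events_input AS e
    INNER JOIN recent_cases AS c ON c.case_id = e.case_id
    ORDER BY e.case_id, e.activity
"""

rows = conn.execute(FILTER_EARLY).fetchall()
print(rows)  # events of C2 and C3 only; C1 is dropped before the join
```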
Reduce data tables and columns
Additionally, take care to only load columns that are actually used. The earlier in the process they can be left out, the better.
-
Reduce the set of extracted data columns to what is needed.
-
Remove any .sql file that is not required for the output data model.
-
Remove any unnecessary data columns in the queries.
-
Remove any unnecessary activities from the set of events.
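A minimal sketch of the last two points, assuming a hypothetical `events_input` table with two unused columns and a hypothetical list of activities that are irrelevant for the analysis. The query names its columns explicitly instead of using `SELECT *`, and it drops the unwanted activities in the same step.

```python
import sqlite3

# Hypothetical activities that are not needed for the analysis.
EXCLUDED_ACTIVITIES = ("Add comment", "Open attachment")

def slim_events(conn):
    """Return (column names, rows) with only the needed columns and activities."""
    placeholders = ", ".join("?" for _ in EXCLUDED_ACTIVITIES)
    query = f"""
        SELECT case_id, activity, event_end
        FROM events_input
        WHERE activity NOT IN ({placeholders})
        ORDER BY event_end
    """
    cursor = conn.execute(query, EXCLUDED_ACTIVITIES)
    columns = [col[0] for col in cursor.description]
    return columns, cursor.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events_input ("
    "case_id TEXT, activity TEXT, event_end TEXT, "
    "long_description TEXT, internal_flag INTEGER)"
)
conn.executemany(
    "INSERT INTO events_input VALUES (?, ?, ?, ?, ?)",
    [
        ("C1", "Create", "2023-06-01T09:00", "multi-line text", 0),
        ("C1", "Add comment", "2023-06-01T09:30", "multi-line text", 1),
        ("C1", "Close", "2023-06-02T14:00", "multi-line text", 0),
    ],
)

columns, rows = slim_events(conn)
print(columns)    # ['case_id', 'activity', 'event_end']
print(len(rows))  # 2
```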
Reduce complexity
The more complicated the calculations in the data transformations and the data model are, the slower data runs will be. Reducing complexity can be a challenge, but it can have a major impact on the data run time.
-
Reduce the complexity of the SQL statements where possible. Check out Tips for writing SQL.
-
Reduce the data in the data model to the data required for data analysis. Any tables or columns that are not needed for data analysis should be removed.
-
If you are using an Import BPMN model for displaying the process, then keeping the number of activities and edges low will improve the performance.
In general, dashboard loading times are affected by the amount of data used by the charts and the metrics that are calculated.
Every time a dashboard is loaded in Process Mining, each chart is computed in parallel. The speed of loading a chart is affected by the following factors:
-
The number of metrics shown in the chart.
-
For each metric, the join size required to calculate the metric matters. This is determined by the table used for grouping a chart, combined with the table of the metric.
-
The complexity of the relation between those two tables.
-
The distance between these two tables in the data model.
-
The data type of the fields used. Numerical fields are faster than text fields.
-
The complexity of the metric itself. Metrics can be based on multiple fields.
Removing any metric that is not required for a chart will speed up loading time.
-
Consider the KPIs shown in the top bar;
-
Consider the metrics shown in your charts. If a chart shows multiple metrics, each of them adds calculation time.
Simplifying the definition of metrics can also speed up the chart loading time.
-
Consider whether you can make the metric definition simpler;
-
Consider pre-computing parts of the metric in the data transformations. Any static calculation that is done beforehand does not need to be repeated at runtime.
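Pre-computing can be illustrated with a case duration metric. The sketch below, which uses Python's built-in sqlite3 module as a stand-in database, calculates the duration once in a transformation step, so the runtime metric becomes a plain average over a numeric column. The tables `cases_input` and `cases_output` and the column `duration_days` are hypothetical, and `julianday` is SQLite's date function; other databases use different date functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cases_input (case_id TEXT, started_on TEXT, ended_on TEXT);
    INSERT INTO cases_input VALUES
        ('C1', '2023-06-01', '2023-06-04'),
        ('C2', '2023-06-02', '2023-06-09');

    -- Transformation step: compute the static part once per data run.
    CREATE TABLE cases_output AS
    SELECT
        case_id,
        CAST(julianday(ended_on) - julianday(started_on) AS INTEGER) AS duration_days
    FROM cases_input;
""")

# The runtime metric is now a simple aggregate over a numeric column.
average_duration = conn.execute(
    "SELECT AVG(duration_days) FROM cases_output"
).fetchone()[0]
print(average_duration)  # 5.0
```

The date arithmetic runs once per case during the data run instead of every time a chart is loaded. The stored column is also numerical, which matches the earlier point that numerical fields are faster than text fields.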