How do "Max Concurrent Threads" (Data Collection) and "Threads per Trigger execution" (Data Source) work together?

Question

How do the settings Max Concurrent Threads (Data Collection Trigger setting) and Threads per Trigger execution (Data Source setting) work together?

Answer

Data collection for an element against a data source is executed via threads. A collection run may use a single thread or several threads, which execute simultaneously and independently of one another. As seen in the images above, the maximum number of threads that get executed can be configured in both the Data Collection Trigger and Data Source Editors. To understand how the two settings work in tandem, let's first go over each one.

 

Max Concurrent Threads

By default, the Max Concurrent Threads field is left blank for Data Collection Triggers. The system treats a blank value as "unlimited number of threads allowed to be executed upon run." Generally you can leave this setting as is. However, if many elements collect data in the same time period, you may need to limit the number of threads to lessen the load on the system.

For example, setting the option to 4 means the trigger can collect data for only 4 elements at a time (one thread for each element). If there are 12 elements to be processed, the trigger collects data for them 4 at a time (4, 4, 4). By comparison, if you set this option to 1 for the same trigger, it collects data for 1 element at a time and takes longer to complete.
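The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the function name `collection_rounds` is hypothetical and not part of Metric Insights, and it assumes every element takes the same time to collect, so that elements finish in neat rounds.

```python
from typing import List, Optional


def collection_rounds(element_count: int,
                      max_concurrent_threads: Optional[int] = None) -> List[int]:
    """Return how many elements are collected in each round.

    Hypothetical helper for illustration only. None models a blank
    Max Concurrent Threads field (unlimited threads).
    """
    if element_count <= 0:
        return []
    if max_concurrent_threads is None:
        # Blank setting: every element gets its own thread in one round.
        return [element_count]
    if max_concurrent_threads < 1:
        raise ValueError("max_concurrent_threads must be at least 1")
    full_rounds, remainder = divmod(element_count, max_concurrent_threads)
    rounds = [max_concurrent_threads] * full_rounds
    if remainder:
        rounds.append(remainder)
    return rounds


print(collection_rounds(12, 4))       # [4, 4, 4]
print(len(collection_rounds(12, 1)))  # 12
print(collection_rounds(12))          # [12]
```

In practice elements rarely finish in lockstep, so a free thread picks up the next element as soon as it becomes available rather than waiting for a whole round to end.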

Setting this option to 1 is usually worthwhile only when each element returns a large amount of data (thousands of rows) during data collection; otherwise it mainly makes the run take longer. In short, this setting requires some fine-tuning and an understanding of what the elements are collecting.

 

Threads per Trigger execution

The Threads per Trigger execution setting at the Data Source level functions the same way: it allows some number of threads to be executed at any given time for that trigger. However, whereas a blank setting for a Data Collection Trigger means "unlimited number of threads," a blank setting for a Data Source means "1 thread only."

 

So, what happens when both settings have different values? (see Diagram 1 below)

Let's assume you have a data collection trigger T which collects data for a set of elements. All these elements source their data from either Data Source 1 (DS1) or Data Source 2 (DS2). For this example, DS1 is set to use 3 Threads per Trigger execution, while DS2 is set to use 4. That means T can spawn up to 3 + 4 = 7 threads during a data collection run.

Now assume T is set to use 2 Max Concurrent Threads. That means only 2 (out of the 7 possible) threads can be active at any given time. While 2 threads are running, the other 5 are on standby. When one of the two active threads completes, a slot opens and the next thread in line is executed. This pattern continues until data has been collected for all elements.
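The "2 active, 5 on standby" behavior resembles a fixed-size worker pool. The sketch below models it with Python's standard `ThreadPoolExecutor`; it is not the Metric Insights thread manager, and `run_with_limit` and its parameters are hypothetical names chosen for illustration. Each job simply sleeps to stand in for a data collection call.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_limit(job_count, max_concurrent_threads, job_seconds=0.01):
    """Run job_count dummy jobs, never more than max_concurrent_threads at once.

    Returns (completed_jobs, peak_concurrency). Illustration only.
    """
    lock = threading.Lock()
    active = 0     # jobs running right now
    peak = 0       # highest number of jobs seen running together
    completed = 0  # jobs that have finished

    def collect(job_id):
        nonlocal active, peak, completed
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(job_seconds)  # stand-in for the real collection work
        with lock:
            active -= 1
            completed += 1

    # The pool size plays the role of Max Concurrent Threads: queued jobs
    # wait until one of the workers becomes free.
    with ThreadPoolExecutor(max_workers=max_concurrent_threads) as pool:
        for job_id in range(job_count):
            pool.submit(collect, job_id)
    # Leaving the "with" block waits for every submitted job to finish.
    return completed, peak


completed, peak = run_with_limit(job_count=7, max_concurrent_threads=2)
print(completed, peak)
```

All 7 jobs complete, but the recorded peak never exceeds 2, which mirrors the standby queue described above.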

Note that these collection jobs are all managed by a thread manager. This manager makes sure that:

  1. All items (elements, calendars, datasets, etc.) are grouped and updated in a specific order.
  2. The number of threads allowed at the Data Source level does not exceed the maximum number of threads allowed by the Data Collection Trigger. This matters because the defaults differ: if left blank, the Data Source setting means 1 thread, while the Trigger setting means unlimited threads.

The thread manager compares the sum of allowed threads for all data sources involved (Total) with the maximum number of threads allowed by the data collection trigger (Max). If the Total exceeds the Max, the thread manager allows only the Max number of threads to run concurrently. If the Total is less than or equal to the Max, the thread manager allows the Total number of threads to run.

To illustrate with another example (see Diagram 2 below), let's say the thread settings for both the Data Sources and the Data Collection Trigger are left blank (default settings). Now, let's assume you have 6 elements which use the same trigger but get data from 3 different data sources (DS1, DS2, DS3). With the default thread settings, each data source allows a maximum of 1 thread per run, while the trigger allows unlimited threads.

The thread manager groups the 6 elements, prioritizes them by type, then checks how many threads are allowed for each data source. With the default settings and 3 data sources, 3 threads are allowed to run concurrently (1 thread x 3 data sources = 3 total threads). As a result, only 3 elements are updated at any given time during the collection, with the other elements waiting in a queue.
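The Total versus Max comparison, including the blank defaults, can be condensed into a small Python sketch. The function `effective_threads` is a hypothetical name used only for illustration; `None` stands for a field left blank.

```python
from typing import Iterable, Optional


def effective_threads(data_source_threads: Iterable[Optional[int]],
                      trigger_max: Optional[int] = None) -> int:
    """Return how many threads may run concurrently for one trigger.

    Hypothetical helper for illustration only.
    data_source_threads: Threads per Trigger execution for each data
        source involved; None models a blank field (1 thread).
    trigger_max: Max Concurrent Threads on the trigger; None models a
        blank field (unlimited).
    """
    # Total: sum over data sources, with blank counted as 1 thread.
    total = sum(1 if threads is None else threads
                for threads in data_source_threads)
    if trigger_max is None:
        # Unlimited trigger: only the data source settings apply.
        return total
    # Otherwise the smaller of Total and Max wins.
    return min(total, trigger_max)


print(effective_threads([3, 4], trigger_max=2))  # 2 (Diagram 1 example)
print(effective_threads([None, None, None]))     # 3 (Diagram 2 example)
print(effective_threads([3, 4], trigger_max=10)) # 7
```

The first two calls reproduce the examples above: DS1 and DS2 with 3 and 4 threads under a Max of 2, and three data sources with every field left blank.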

 

Metric Insights recommends leaving these fields empty to start with. Then, based on system specs, the number of elements to collect data for, and dataset size, the number of threads allowed per run can be adjusted over time.