How To Use The Data Profiler
Profiling data helps identify data quality issues that will affect reporting and analysis. It can also reveal gaps in the data, the cardinality (number of distinct values) of a column, and other details that give a better understanding of the data at each stage of the workflow.
This article shows how to use the Data Profiling tools available on each table in a workflow.
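To make the terms above concrete, here is a minimal Python sketch of what a column profile summarises: row count, gaps, distinct values, and how often each value occurs. It is an illustration only and not the tool's implementation; the function name `profile_column`, the sample data, and the choice to treat `None` and empty strings as missing are all assumptions.

```python
from collections import Counter


def profile_column(values):
    """Summarise one column: row count, gaps, distinct values, frequencies."""
    # Assumption for this sketch: None and empty strings count as missing.
    present = [v for v in values if v is not None and v != ""]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "counts": dict(counts),
    }


# Hypothetical sample column with two gaps and two distinct values.
source_medium = ["google / organic", "direct / none", None, "google / organic", ""]
print(profile_column(source_medium))
```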
1. Clicking any node (input, output, or other data node) will populate the area below the canvas with information from the table at that node. Select the “Data profiler” tab and then click on “Add Column.”
2. Select the columns you want to profile, then click “OK”.
3. A profile of each of the columns selected is now shown.
4. You can close the column's overview by clicking on the "Properties" title. If a column has many values, pagination controls appear at the bottom. Use them to page through the data.
5. Under the properties table, on the right side, there's a dropdown menu with the value "Count Desc." Click on it to find four options: ascending or descending count, and ascending or descending value. The selected option changes the order in which the data is shown in that area.
For String data types, the profiler is not case sensitive by default: “Elephant” and “elephant” are treated as the same value. Select the “Case Sensitive” checkbox to have them profiled separately.
6. Date columns can be profiled by date intervals. By default, columns are profiled by Days. Select the label on the left to change the interval, which ranges from seconds to years.
7. To change the order of the boxes, click on a box's title ("sourceMedium", for example) and drag and drop it to another slot.
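The sort options in step 5 and the “Case Sensitive” checkbox can be pictured with a short Python sketch. This is a rough model of the behaviour described above, not the profiler's own code: the function names `value_counts` and `sort_profile` are hypothetical, and folding values to lowercase is just one way to treat “Elephant” and “elephant” as the same value.

```python
from collections import Counter


def value_counts(values, case_sensitive=False):
    """Count string values, folding case unless case_sensitive is True."""
    keys = values if case_sensitive else [v.casefold() for v in values]
    return Counter(keys)


def sort_profile(counts, by="count", descending=True):
    """Order (value, count) pairs by count or by value, ascending or descending."""
    index = 1 if by == "count" else 0
    return sorted(counts.items(), key=lambda pair: pair[index], reverse=descending)


animals = ["Elephant", "elephant", "Zebra", "elephant"]

# Default behaviour: case is ignored, so both spellings share one count.
folded = value_counts(animals)
print(sort_profile(folded))                               # like "Count Desc"
print(sort_profile(folded, by="value", descending=False)) # like ascending value

# With case sensitivity on, "Elephant" and "elephant" are counted separately.
print(sort_profile(value_counts(animals, case_sensitive=True)))
```

The two `by` choices combined with the two directions give the four orderings offered in the dropdown.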
Once configured, the profiled columns remain available at the selected node until the data at that node changes.
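As an illustration of the date intervals in step 6, the sketch below groups the same timestamps by day, month, or year. It is a simplified model under stated assumptions: the interval names in `INTERVAL_FORMATS` and the function `profile_dates` are hypothetical, and the exact set of intervals the profiler offers between seconds and years is not listed in this article.

```python
from collections import Counter
from datetime import datetime

# Assumed interval names, each mapped to a strftime pattern that
# truncates a timestamp to that interval.
INTERVAL_FORMATS = {
    "seconds": "%Y-%m-%d %H:%M:%S",
    "minutes": "%Y-%m-%d %H:%M",
    "hours": "%Y-%m-%d %H",
    "days": "%Y-%m-%d",
    "months": "%Y-%m",
    "years": "%Y",
}


def profile_dates(timestamps, interval="days"):
    """Count timestamps per interval bucket; defaults to days, as in step 6."""
    fmt = INTERVAL_FORMATS[interval]
    return Counter(ts.strftime(fmt) for ts in timestamps)


# Hypothetical sample data.
stamps = [
    datetime(2023, 1, 5, 9, 30),
    datetime(2023, 1, 5, 17, 0),
    datetime(2023, 2, 1, 8, 0),
    datetime(2024, 2, 1, 8, 0),
]
print(profile_dates(stamps))            # one bucket per calendar day
print(profile_dates(stamps, "months"))  # coarser buckets
print(profile_dates(stamps, "years"))   # coarsest buckets
```

Coarser intervals merge more rows into each bucket, which is why switching from Days to Years usually shortens the profile.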