Data Profiler Block – tutorial transcript
Welcome to the Data Profiler Block tutorial.
This tutorial aims to give you an overview of the Data Profiler Block in Analytics Canvas, and to touch on some high-level concepts, such as how to view the output of the Data Profiler Block and how to create custom rules.
The Data Profiler Block can be found in the Block Library. Simply drag and drop it onto the Main Canvas, and connect it to the input data to start the analysis.
With the Data Profiler Block, you can sample and profile the data on your Canvas, and use the results just as you would the output of any other block. Use this block to produce information about your data, and to monitor data profile trends over time.
The top output of the Data Profiler Block contains the following:
- Profile Set Name – the name of the profiler block; many blocks can be used within one canvas.
- Capture Date – the time stamp of the profiling run.
- Column Name – the name of the column being profiled.
- Null rows – the count of rows that contain the Null value.
- Missing rows – for string types, the count of rows containing the empty string.
- Populated rows – the count of rows that are not Null or missing.
- Completeness – the percentage of rows that are Populated.
- Cardinality – the number of unique values contained in the column.
- Uniqueness – the cardinality divided by the number of populated rows.
- Minimum – the minimum value within the column for numeric columns.
- Maximum – the maximum value within the column for numeric columns.
- Average – the average of all values for numeric columns.
- Earliest Date – the earliest date for date columns.
- Latest Date – the latest date for date columns.
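To make the definitions above concrete, here is a minimal Python sketch that computes the same measures for a single column. It is an illustration only and not how Analytics Canvas works internally. The function name `profile_column` and the sample values are hypothetical, and the sketch assumes that Cardinality is counted over populated values only, which the list above does not state explicitly.

```python
from statistics import mean


def profile_column(name, values):
    """Compute profile measures for one column (hypothetical helper)."""
    total = len(values)
    # Null rows: values that are None.
    null_rows = sum(1 for v in values if v is None)
    # Missing rows: empty strings (only applies to string values).
    missing_rows = sum(1 for v in values if isinstance(v, str) and v == "")
    # Populated rows: everything that is neither Null nor missing.
    populated = [v for v in values if v is not None and v != ""]
    # Assumption: cardinality counts distinct populated values only.
    cardinality = len(set(populated))

    profile = {
        "Column Name": name,
        "Null rows": null_rows,
        "Missing rows": missing_rows,
        "Populated rows": len(populated),
        "Completeness": 100.0 * len(populated) / total if total else 0.0,
        "Cardinality": cardinality,
        "Uniqueness": cardinality / len(populated) if populated else 0.0,
    }

    # Minimum, Maximum and Average are only reported for numeric columns.
    is_numeric = bool(populated) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in populated
    )
    if is_numeric:
        profile["Minimum"] = min(populated)
        profile["Maximum"] = max(populated)
        profile["Average"] = mean(populated)
    return profile


if __name__ == "__main__":
    # Made-up sample data: one Null, one empty string, one repeated value.
    print(profile_column("Code", ["A1", "", None, "B2", "A1"]))
    print(profile_column("Amount", [10, 20, None, 30]))
```

In the first sample column, three of five rows are populated, so Completeness is 60 percent, and two distinct values over three populated rows gives a Uniqueness of about 0.67.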
The Data Profiler Block is a great tool with which to explore your data and drill deeper into data quality issues. Custom rules can be created to address specific issues with data. For example, we can explore the format of the column “Transaction Number” to see if it contains letters. We set the rule, and click the “Run” or “Refresh” button.
As you can see, there are no letters in the column “Transaction Number”.
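The transcript does not show how the rule itself is written in Analytics Canvas, so the following Python sketch only illustrates the idea behind such a check: flag any value that contains a letter. The function name `rows_with_letters` and the sample transaction numbers are made up for this example.

```python
import re

# Matches any ASCII letter; a rule for other alphabets would need a wider pattern.
LETTER = re.compile(r"[A-Za-z]")


def rows_with_letters(values):
    """Return the string values that contain at least one letter (hypothetical helper)."""
    return [v for v in values if isinstance(v, str) and LETTER.search(v)]


if __name__ == "__main__":
    # Made-up transaction numbers, stored as strings; one contains a letter.
    sample = ["100234", "10A235", None, "100236"]
    offenders = rows_with_letters(sample)
    print(f"{len(offenders)} value(s) contain letters: {offenders}")
```

An empty result corresponds to the outcome in the video, where no letters are found in the column.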
You can download a free trial of Analytics Canvas to follow along with the video.