How To Use The Summarize Block
The Summarize block lets you aggregate your data, and the aggregations available for each column depend on that column's data type. More than just finding the sum of a column, this block can be used to:
- remove duplicates
- find max and min values in a column
- calculate the sum or average value in a column
- count or count distinct values in a column
- aggregate by date
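For readers who think in code, the operations listed above correspond roughly to the following pandas calls. This is only an illustrative sketch, not how the Summarize block is implemented; the `orders` table and its column names are made up for the example.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
orders = pd.DataFrame({
    "customer": ["Ann", "Ann", "Bob", "Bob", "Bob"],
    "amount": [10.0, 10.0, 5.0, 7.5, 5.0],
})

# Remove duplicates: keep one copy of each distinct row.
deduplicated = orders.drop_duplicates().reset_index(drop=True)

# Column-level aggregations comparable to the block's summary methods.
amount_summary = {
    "min": orders["amount"].min(),
    "max": orders["amount"].max(),
    "sum": orders["amount"].sum(),
    "average": orders["amount"].mean(),
    "count": int(orders["customer"].count()),
    "count_distinct": int(orders["customer"].nunique()),
}

print(deduplicated)
print(amount_summary)
```

In the block itself, you pick these operations from a dropdown instead of writing code, as described in the steps below.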
1. Drag and drop the Summarize block onto the Canvas.
2. Connect to the data you want to summarize by clicking the output node from the source block, then clicking and dragging to the input node on the Summarize block.
3. Select the block by clicking on the center of it. At the bottom of the screen, you will see the “Configure Rollup and Grouping for Select Columns” area, where you can configure the block. Click “Add Column” to select the columns you want to work with from the data source.
4. Select the columns that you want to add and click “OK.” If you want to add the same column several times, click “Add Column” again and repeat the process.
5. You can now see the columns added to the table. Under the “Summary Methods” dropdown, which has the value “NoRollup” by default, you will see multiple options based on the data type of that column:
- For string columns, there are five available options: No Roll-up, Count, CountDistinct, Min, Max;
- For numeric columns, those same five, plus Sum and Average;
- For date columns, the same five as for string columns, plus Year, Month, Year_month, Week, Day, Hour, and Minute.
Select the type of roll-up you want for each column.
6. Under “Output Column Name,” the Canvas automatically generates a name created from the column’s name and the default summary method. If you want to change it, click on the field.
7. A pop-up will appear where you can write the new name. Click “OK” to make the change.
8. Each row in the table represents one column. Drag and drop the rows to change the order of the columns.
9. To swap a column for a different one, click its name and select another column from the list.
10. To remove a column from the summary table, click “Delete.”
11. Click on the output node to see the resulting summary table.
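As a rough mental model of the resulting summary table, the configuration above can be compared to a group-by in pandas. The sketch below assumes that columns left on No Roll-up act as grouping keys, which the “Rollup and Grouping” label suggests but this guide does not state outright. The `sales` table, its column names, and the output names such as `amount_sum` are hypothetical and are not the names the Canvas generates.

```python
import pandas as pd

# Hypothetical source data; all names here are assumptions for illustration.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "sold_at": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-01-11", "2024-02-02"]
    ),
    "amount": [100.0, 50.0, 80.0, 20.0],
})

# A Year_month style roll-up on a date column: truncate each date to "YYYY-MM".
with_month = sales.assign(sold_at_year_month=sales["sold_at"].dt.strftime("%Y-%m"))

# Group by the non-rolled-up column and the truncated date,
# then apply Sum, Average, and Count to the numeric column.
summary = with_month.groupby(
    ["region", "sold_at_year_month"], as_index=False
).agg(
    amount_sum=("amount", "sum"),
    amount_average=("amount", "mean"),
    amount_count=("amount", "count"),
)

print(summary)
```

Adding the same column several times in step 4 plays the same role as the three `amount_*` entries here: each one applies a different summary method to the same source column.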