KNIME Data Preparation Short Course
Many of the techniques and processes of data analytics have been automated into mechanical processes and algorithms that work over raw data
for human consumption.
Data analytics techniques can reveal trends and metrics that would otherwise be lost in the mass of information.
This information can then be used to optimize processes to increase the overall efficiency of a business or system.
Building data analytics into the business model can help companies reduce costs by identifying more efficient ways of doing business and by
storing large amounts of data.
A company can also use data analytics to make better business decisions and to analyze customer trends and satisfaction, which can lead to
new and better products and services.
Data Collection
Data Cleaning
Data Analysis
Communication
Data is often stored in various locations, may not be structured, and may
contain irrelevant information.
In this step, you’ll begin to slice and dice your data to extract
meaningful insights from it.
Using the techniques and methods of data analysis, you’ll look for
hidden patterns and relationships, and find insights and predictions.
The biggest obstacle in data analytics is getting clean and correct data.
Data analysts and data scientists spend up to 80% of their time cleaning data.
How to access all of them and prepare the data for visual analytics.
Build Data Science Workflows
One single, open-source data analytics tool
Blend Data from Any Source
● Open and Combine
● Connect to a host
● Access and Retrieve Data
Leverage Machine Learning & AI
● Build machine learning models
● Optimize model performance
● Validate Models
● Explain machine learning models
Shape Your Data
● Derive Statistics
● Aggregate, sort, filter, and join
● Cleaning
● Extract and Select Features
Discover and Share Data Insights
● Visualize your Data
● Display Summary Statistics
● Export Reports
● Store Processed Data
Scale Execution with Demands
● Build workflow prototypes
● Scale workflow performance
● Exercise the power of in-database processing
Download the archive and extract the file, or download the installer
package and run it.
● The Explorer toolbar on the top has a search box and buttons to:
○ select the workflow displayed in the active editor
○ refresh the view
Dedicated file extensions for workflows and workflow groups associated with
KNIME Analytics Platform.
It can also be set to use personal and local group usage statistics.
Console view prints out error and warning messages about what is going
on under the hood.
Data is transferred over a connection from an out-port to the in-port(s) of other nodes.
Nodes can be added by dragging and dropping them from the Node Repository
to the Workflow Editor.
The result of the node’s operation on the data is provided at the out-port to successor nodes.
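The flow of data between ports can be pictured with a small Python sketch. This is only an illustration of the idea, not KNIME's API: the `Node` class, its `connect` and `execute` methods, and the sample rows are all hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List

Table = List[Dict[str, object]]  # a table is modelled as a list of rows


class Node:
    """Hypothetical stand-in for a node with one in-port and one out-port."""

    def __init__(self, name: str, operation: Callable[[Table], Table]):
        self.name = name
        self.operation = operation
        self.successors: List["Node"] = []
        self.out_port: Table = []

    def connect(self, successor: "Node") -> "Node":
        # A connection links this node's out-port to the successor's in-port.
        self.successors.append(successor)
        return successor

    def execute(self, in_port: Table) -> None:
        # The result of the operation is placed at the out-port,
        # then handed to every connected successor node.
        self.out_port = self.operation(in_port)
        for node in self.successors:
            node.execute(self.out_port)


# Illustrative data only: a "reader" that produces two rows, and a filter node.
reader = Node("reader", lambda _: [{"city": "Oslo", "temp": 4},
                                   {"city": "Rome", "temp": 18}])
warm_only = Node("filter", lambda rows: [r for r in rows if r["temp"] > 10])

reader.connect(warm_only)
reader.execute([])
print(warm_only.out_port)  # [{'city': 'Rome', 'temp': 18}]
```

One out-port can feed several successors, which is why `successors` is a list rather than a single reference.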
Examples of paths, each written as (file system type, specifier, path):
● (LOCAL, , C:\Users\username\Desktop)
● (RELATIVE, knime.workflow, file1.csv)
● (MOUNTPOINT, MOUNTPOINT_NAME, /path/to/file1.csv)
● (CONNECTED, amazon-s3:eu-west-1, /mybucket/file1.csv)
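To make the three-part structure of these path strings concrete, the following Python sketch splits each example into its parts. It is a minimal illustration under the assumption that the string is always a parenthesized triple; `FSLocation` and `parse_location` are hypothetical names, not part of KNIME.

```python
from typing import NamedTuple


class FSLocation(NamedTuple):
    """Hypothetical container for the three parts of a path string."""
    fs_type: str
    specifier: str
    path: str


def parse_location(text: str) -> FSLocation:
    inner = text.strip()
    if not (inner.startswith("(") and inner.endswith(")")):
        raise ValueError(f"not a parenthesized triple: {text!r}")
    # Split into at most three parts so commas inside the path are preserved.
    parts = [part.strip() for part in inner[1:-1].split(",", 2)]
    if len(parts) != 3:
        raise ValueError(f"expected three parts: {text!r}")
    return FSLocation(*parts)


examples = [
    r"(LOCAL, , C:\Users\username\Desktop)",
    "(RELATIVE, knime.workflow, file1.csv)",
    "(MOUNTPOINT, MOUNTPOINT_NAME, /path/to/file1.csv)",
    "(CONNECTED, amazon-s3:eu-west-1, /mybucket/file1.csv)",
]
for text in examples:
    print(parse_location(text))
```

In the LOCAL example the specifier is empty, so the middle field parses to an empty string.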
● Files in a folder
Supported operations
● Column filtering
● Column sorting
● Column renaming
● Column type mapping
● Select either the union or the intersection of columns (when
reading many files)
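The difference between the union and the intersection of columns can be sketched in plain Python. This is an illustration of the concept only, not how the reader node is implemented; `combine` and the sample tables are hypothetical, and the columns are sorted alphabetically purely to keep the output predictable.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def combine(tables: List[List[Row]], mode: str = "union") -> List[Row]:
    """Concatenate tables, keeping the union or the intersection of columns."""
    column_sets = [{col for row in table for col in row} for table in tables]
    if mode == "union":
        keep = set().union(*column_sets)
    elif mode == "intersection":
        keep = set.intersection(*column_sets) if column_sets else set()
    else:
        raise ValueError("mode must be 'union' or 'intersection'")
    columns = sorted(keep)
    # Under "union", a column absent from one file becomes a missing value (None).
    return [{c: row.get(c) for c in columns} for table in tables for row in table]


# Two hypothetical files with slightly different columns.
jan = [{"id": "1", "sales": "10"}]
feb = [{"id": "2", "sales": "12", "region": "EU"}]

print(combine([jan, feb], "union"))
print(combine([jan, feb], "intersection"))
```

With the union, the `region` column is kept and filled with a missing value for the January row; with the intersection, it is dropped because it does not appear in every file.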
Clicking the header of a data column sorts the data rows in ascending or descending order.
Right-clicking the header of a data column lets you choose a specific renderer to visualize the data.
For Double/Integer data, for example, the “Bars” renderer displays each value as a bar whose length is proportional to the value, colored on a red/green
heatmap.
Add and remove node ports based on your needs, e.g. in order to concatenate
three or more tables.
Allows for:
● Concatenation of multiple files/tables
● Column filtering
● Column sorting
● Column renaming
● Column type mapping
Outputs:
● Top port: Resulting joined table
● Middle port: Unmatched rows from the left input table (top input port)
● Bottom port: Unmatched rows from the right input table (bottom input port)
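The three outputs listed above can be mimicked with a short Python sketch. It assumes an inner join on a single key column; `join_with_unmatched` and the sample tables are hypothetical and only illustrate which rows end up at which port.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def join_with_unmatched(
    left: List[Row], right: List[Row], key: str
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Inner join on `key`, also returning the unmatched rows of each input."""
    right_by_key: Dict[object, List[Row]] = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)

    joined: List[Row] = []
    left_unmatched: List[Row] = []
    matched_keys = set()
    for l_row in left:
        matches = right_by_key.get(l_row[key])
        if matches:
            matched_keys.add(l_row[key])
            # Left values win if a column name appears in both tables.
            joined.extend({**r_row, **l_row} for r_row in matches)
        else:
            left_unmatched.append(l_row)

    right_unmatched = [r for r in right if r[key] not in matched_keys]
    return joined, left_unmatched, right_unmatched


customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bo"}]
orders = [{"id": 1, "total": 30}, {"id": 3, "total": 5}]

top, middle, bottom = join_with_unmatched(customers, orders, "id")
print(top)     # [{'id': 1, 'total': 30, 'name': 'Ada'}]
print(middle)  # [{'id': 2, 'name': 'Bo'}]
print(bottom)  # [{'id': 3, 'total': 5}]
```

Keeping the unmatched rows as separate outputs makes it easy to check what a join silently dropped.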
Row filtering with include and exclude options according to certain criteria:
● Select rows by attribute value (pattern matching)
○ Value matching: column value matching some predefined
pattern value
○ Range checking for numerical columns: column value above or
below a given value
○ Missing Value Matching
● Select rows by row number
● Select rows by RowID (pattern matching on RowID)
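The filter criteria above can be sketched as small predicate functions in Python. This is a rough illustration, not the node's implementation: all function names are hypothetical, wildcard matching uses Python's `fnmatch`, row numbers are assumed to start at 1, and the RowID is modelled as an ordinary column named `RowID`.

```python
import fnmatch
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[object]]
Predicate = Callable[[int, Row], bool]


def value_matches(column: str, pattern: str) -> Predicate:
    # Wildcard pattern matching on a column value, e.g. "A*".
    def check(number: int, row: Row) -> bool:
        value = row[column]
        return value is not None and fnmatch.fnmatchcase(str(value), pattern)
    return check


def in_range(column: str, lower=None, upper=None) -> Predicate:
    # Range check for numerical columns; missing values never match.
    def check(number: int, row: Row) -> bool:
        value = row[column]
        if value is None:
            return False
        return (lower is None or value >= lower) and (upper is None or value <= upper)
    return check


def is_missing(column: str) -> Predicate:
    return lambda number, row: row[column] is None


def row_number_between(first: int, last: int) -> Predicate:
    return lambda number, row: first <= number <= last


def row_filter(rows: List[Row], predicate: Predicate, include: bool = True) -> List[Row]:
    """Keep matching rows (include=True) or drop them (include=False)."""
    return [row for number, row in enumerate(rows, start=1)
            if predicate(number, row) == include]


rows = [
    {"RowID": "Row0", "name": "Anna", "age": 34},
    {"RowID": "Row1", "name": "Ben", "age": None},
    {"RowID": "Row2", "name": "Alice", "age": 19},
]

by_pattern = row_filter(rows, value_matches("name", "A*"))
by_range = row_filter(rows, in_range("age", lower=20))
no_missing = row_filter(rows, is_missing("age"), include=False)
by_number = row_filter(rows, row_number_between(2, 3))
by_row_id = row_filter(rows, value_matches("RowID", "Row[01]"))
```

The single `include` flag covers both modes: the same criterion either selects the matching rows or removes them.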
Row-wise calculations.
Some column-wise statistics.
Many mathematical functions.
Double-click a function, then click a column to select it.
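As a rough Python analogy for these capabilities, the sketch below evaluates a formula once per row and shows how a column-wise statistic can be used inside such a formula. The helper names `apply_formula` and `col_mean` and the sample columns are hypothetical; they do not reproduce the node's own expression syntax.

```python
import math
from statistics import mean
from typing import Callable, Dict, List

Row = Dict[str, float]


def col_mean(rows: List[Row], column: str) -> float:
    """Column-wise statistic: the mean over all rows of one column."""
    return mean(row[column] for row in rows)


def apply_formula(rows: List[Row], new_column: str,
                  formula: Callable[[Row], float]) -> List[Row]:
    """Evaluate `formula` once per row and store the result in `new_column`."""
    return [{**row, new_column: formula(row)} for row in rows]


rows = [{"width": 3.0, "height": 4.0}, {"width": 6.0, "height": 8.0}]

# Row-wise calculation using a mathematical function.
with_diagonal = apply_formula(
    rows, "diagonal", lambda r: math.hypot(r["width"], r["height"]))

# A column-wise statistic used inside a row-wise formula.
mean_width = col_mean(rows, "width")
centered = apply_formula(
    rows, "width_centered", lambda r: r["width"] - mean_width)

print([r["diagonal"] for r in with_diagonal])   # [5.0, 10.0]
print([r["width_centered"] for r in centered])  # [-1.5, 1.5]
```

The statistic is computed once over the whole column and then reused for every row, which is the pattern behind expressions such as "value minus column mean".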