KNIME
KNIME stands for Konstanz Information Miner, pronounced "naim"
Developed by KNIME AG, located in Zurich, and the group of Michael Berthold at the University of Konstanz, Chair for Bioinformatics and Information Mining
WHAT IS KNIME?
A tool for data analysis, manipulation, visualization and reporting
Based on a graphical user interface (GUI)
Popular for its flexibility and ability to integrate with various data sources and tools, including:
Databases
R
Python
Key Features and Benefits of KNIME
Linux
Extract the downloaded tarball to a location of your choice.
Run the knime executable to start KNIME Analytics Platform.
Mac
Double-click the downloaded dmg file and wait for the verification to finish.
Then move the KNIME icon to Applications.
Double-click the KNIME icon in the list of applications to launch KNIME Analytics Platform.
KNIME Workbench
Components
Welcome page
KNIME Explorer
Workflow Editor & Nodes
Workflow Coach
Node Repository
KNIME Hub Search
Description
Outline
Console
KNIME Workbench
Workflow Editor (Workspace) – Central space where the workflow is designed
Node Repository – Panel where nodes are available
Console – Debugging tool; gives feedback on the workflow status and any error messages
Outline – Overview of the workflow structure
Node Description – Gives a summary of the node selected in the "Workflow Editor" or "Node Repository"
Explorer – Panel that shows the list of workflows available in the selected workspace
A Basic Tour of the KNIME Workbench
KNIME Components and Terminology
Node
Building blocks of a KNIME workflow.
Represents a specific operation or analysis step.
Port
Nodes have input and output ports.
An input port receives data and an output port sends data to other nodes.
Data flows between nodes through these ports.
Workflow
A sequence of nodes connected to each other.
Represents the entire data analysis program.
Data Table
Data in KNIME is represented as a tabular data structure.
Each row is a data point.
Each column is a feature or attribute.
Connectors
Lines that link the output port of one node to the input port of another, defining the flow of data within a workflow.
Meta Node
A container that allows you to group nodes and create reusable sub-workflows; simplifies the visualization of complex workflows.
Variable
Used to store and manage data or values within a workflow.
Can be created, modified, and used in various nodes.
Workflow Variable
Variables specific to a workflow; can be used to pass data or values between nodes within the same workflow.
Dataflow in KNIME
Dataflow defines how data
is processed and
transformed as it moves
through the workflow.
Key points about dataflow in KNIME include:
Input and Output Ports
Connectors
Data Table
Data Transformation
Workspace
The folder where all current workflows and preferences are saved for the next KNIME session.
By default, the workspace folder is "…\knime-workspace".
It can be changed by changing the path in the "Workspace Launcher" window before starting the KNIME working session.
Exercise 1: Create a Workspace
Launch KNIME
In the Workspace Launcher window, click "Browse"
Select the path for the new workspace
Create "Test Workspace"
KNIME Workflow
KNIME Analytics Platform does not work with scripts, but with graphical workflows.
Each step of the data analysis is implemented and executed through a little box called a "node".
A sequence of nodes makes a workflow.
An analysis flow in graphics might have the following steps (sketched in code below):
Step 1: Read data
Step 2: Clean data
Step 3: Filter data
Step 4: Train a model
Workflows in KNIME are graphs.
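For readers who think in code, the same four steps can be sketched in Python. This is a minimal, hypothetical analogy using pandas and scikit-learn, not what KNIME runs internally; the file name, column names, and model choice are all assumptions:

```python
# Hypothetical sketch of the four-step flow above.
# Assumes a CSV file "data.csv" with numeric feature columns,
# an "age" column, and a target column "label" (all assumed names).
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("data.csv")              # Step 1: Read data
df = df.dropna()                          # Step 2: Clean data (drop missing values)
df = df[df["age"] >= 18]                  # Step 3: Filter data (assumed criterion)

X, y = df.drop(columns=["label"]), df["label"]
model = LogisticRegression().fit(X, y)    # Step 4: Train a model
```

In KNIME, each of these lines would instead be a node on the canvas, connected left to right.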
File Extensions: .knwf and .knar Files
KNIME workflows can be packaged and exported as ".knwf" or ".knar" files.
A ".knwf" file contains only one workflow.
A ".knar" file contains a group of workflows.
A double click opens the workflow inside KNIME Analytics Platform.
Workflow Configuration and Execution
1. Node Configuration
2. Variable Assignment
3. Execution Control
4. Monitoring Execution
5. Workflow Results
Building a Basic Workflow
Launching KNIME
Creating New Workflow
Go to the "File" menu
Select "New Workflow" – creates a new canvas to design the workflow
Adding Nodes – Workflows in KNIME are built by adding nodes, dragged and dropped onto the canvas from the Node Repository.
Connecting Nodes – Nodes are connected using connectors. The output of one node is connected to the input of the next node.
Configuring Nodes – Double-click or right-click a node to open its configuration dialog.
Running the Workflow – To execute the workflow, click the "Run" button on the toolbar.
What are KNIME Extensions?
KNIME Extensions are a fast, flexible way to extend your data science platform.
Open-source extensions provide additional functionality, such as access to and processing of complex data types, as well as the addition of advanced machine learning algorithms.
Install Extensions
From the top menu, select "File" → "Install KNIME Extensions"
Select:
- KNIME Math Expression extension (JEP)
- KNIME External Tool Node
- KNIME Report Designer
Click "Next"
Exercise 2
From the options menu in the top-right corner, select "Install Extensions"
In the Space Explorer, click the black button with three dots
Click "Create Workflow"
Save a Workflow
Saving the workflow saves the workflow architecture, the nodes' configurations, and the data produced at the output of each node.
Click the disk icon on the top menu.
To save a copy of the currently selected workflow, click "Save as…".
To save ALL open workflows, click the "Save All" (stack of disks) icon.
Delete a Workflow
- Right-click the workflow in the "KNIME Explorer"
- Select "Delete"
- Confirm the deletion
Import/Export Workflow
Data Exploration
Steps:
Read Data
Math Formula
Evaluates a mathematical expression based on the values in a row.
Computed results can either be appended as a new column or be used to replace an input column.
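As an analogy only, here is what the two output options look like in pandas (a minimal sketch; the data frame and column names are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({"height_cm": [170, 182], "weight_kg": [65.0, 80.0]})

# Option 1: append the computed result as a new column
df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2

# Option 2: replace an input column with the computed result
df["weight_kg"] = df["weight_kg"] * 2.2046  # kilograms -> pounds
```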
Data Transformation
String Manipulation Node
Manipulates strings: search and replace, capitalize, or remove leading and trailing whitespace.
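A rough pandas equivalent of these string operations (a sketch with invented sample values, not the node's actual implementation):

```python
import pandas as pd

s = pd.Series(["  iris-setosa ", "iris-versicolor"])

s = s.str.strip()                    # remove leading/trailing whitespace
s = s.str.replace("iris", "flower")  # search and replace
s = s.str.capitalize()               # capitalize
```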
Data Aggregation
Nodes like "GroupBy" perform aggregation operations on the data, such as calculating sums, averages, or counts.
GroupBy groups the rows of a table by the unique values in the selected group columns.
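The same idea in pandas, for comparison (a minimal sketch with an invented table):

```python
import pandas as pd

df = pd.DataFrame({"class": ["a", "a", "b"], "value": [1, 2, 5]})

# Group rows by the unique values of "class", then aggregate "value"
agg = df.groupby("class")["value"].agg(["sum", "mean", "count"])
```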
Data Imputation
Rule Engine:
Takes a list of user-defined rules and
tries to match them to each row in the
input table. If a rule matches, its
outcome value is added into a new
column.
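A hypothetical Python analogy of first-match rules writing into a new column (column names and rules are assumptions; KNIME's Rule Engine uses its own rule syntax):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [15, 40, 70]})

# Rules are checked in order; the first match determines the outcome,
# which is written into a new column.
conditions = [df["age"] < 18, df["age"] >= 65]
outcomes = ["minor", "senior"]
df["category"] = np.select(conditions, outcomes, default="adult")
```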
Exercise 6
Extract details of persons born outside "United-States", in CSV form (a rough pandas sketch follows the steps):
Read the adult data file
Rename column "fnlwgt" to "Final Weight"
Remove column "Final Weight"
Remove rows containing "United-States"
Write the data to a CSV file named "Born outside US"
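For orientation, the whole exercise could be sketched in pandas as follows (assuming the Adult dataset is available as "adult.csv" with columns "fnlwgt" and "native-country"; all names here are assumptions):

```python
import pandas as pd

df = pd.read_csv("adult.csv")                       # read the adult data

df = df.rename(columns={"fnlwgt": "Final Weight"})  # rename the column
df = df.drop(columns=["Final Weight"])              # remove the column
df = df[df["native-country"] != "United-States"]    # keep rows born outside the US

df.to_csv("Born outside US.csv", index=False)       # write the CSV output
```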
DATA INTEGRATION AND TRANSFORMATION
Involves combining data from various sources, reshaping it, and preparing it for analysis. Some nodes used for this purpose are (see the sketch after the list):
Joining Data: Combine data from multiple tables based on common keys or criteria
Pivoting and Unpivoting: Help to reshape data from wide to long format or vice versa
Data Sampling: Used to select a subset of data for analysis
Data Normalization and Scaling: Normalize and scale the data to prepare it for machine learning algorithms
Text Mining and NLP: Supports text data processing and Natural Language Processing
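As a sketch of what joining and pivoting mean, here is a hypothetical pandas version (invented tables; KNIME's Joiner and Pivoting nodes are configured graphically instead):

```python
import pandas as pd

people = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Bob"]})
scores = pd.DataFrame({"id": [1, 1, 2],
                       "subject": ["math", "bio", "math"],
                       "score": [90, 85, 75]})

# Joining: combine tables on a common key
joined = people.merge(scores, on="id")

# Pivoting: reshape from long to wide format
wide = joined.pivot(index="name", columns="subject", values="score")
```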
EXPORTING DATA FROM KNIME
Data Export Nodes: To export data, nodes like "CSV Writer", "Excel Writer" or "Database Writer" are used, depending on the nature of the output.
Data Visualisation: Nodes like "Bar Chart", "Pie Chart", "Heatmap" etc. are used to create charts and plots.
Model Deployment: Machine learning models in KNIME can be exported for deployment in a production environment.
Data Reports: Reports with customised layouts can be exported in various formats such as PDF or HTML.
Exercise 7
Objective: To do data visualization (a rough code sketch follows the steps)
Create a workflow "Exercise 7" under the workflow group "Exercises"
Read the Iris data
Name the columns: Sepal Length, Sepal Width, Petal Length, Petal Width, Class
Map the flower types "Iris-setosa", "Iris-versicolor" and "Iris-virginica" to Class 1, Class 2 and Class 3 respectively
Split the contents of "Class" into three columns
Join two columns
Convert the contents of a column to upper case
Replace the word "Iris" with "Flower"
Create "Bins" based on Sepal Length
Group the "Class" based on Bins
Create a Bar Chart and Scatter Plot
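A rough code sketch of the binning, grouping, and charting steps (assumes an "iris.csv" file without a header row and that matplotlib is installed; all names are assumptions):

```python
import pandas as pd

cols = ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width", "Class"]
df = pd.read_csv("iris.csv", header=None, names=cols)

# Create bins based on Sepal Length, then count classes per bin
df["Bin"] = pd.cut(df["Sepal Length"], bins=3, labels=["short", "mid", "long"])
counts = df.groupby(["Bin", "Class"], observed=True).size().unstack()

counts.plot(kind="bar")  # bar chart; a scatter plot would use df.plot.scatter
```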
Connecting to Big Data Sources
KNIME provides various connectors and integrations to connect to Big Data platforms:
Big Data Platform – Nodes Used
Hadoop Distributed File System (HDFS) – HDFS Connector; HDFS File Picker
Apache Spark – Spark reader nodes (to load data from Spark data frames); Spark SQL (querying and manipulating data; see the PySpark sketch below)
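For comparison, the Spark steps can be sketched in PySpark (a minimal sketch; the HDFS path and table name are assumptions, and KNIME's Spark nodes wrap this functionality graphically):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Load data into a Spark data frame (compare: KNIME's Spark reader nodes)
df = spark.read.csv("hdfs:///data/adult.csv", header=True, inferSchema=True)

# Query and manipulate the data with Spark SQL (compare: Spark SQL node)
df.createOrReplaceTempView("adult")
result = spark.sql("SELECT workclass, COUNT(*) AS n FROM adult GROUP BY workclass")
```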
Apache Hive – Hive Connector; Hive Table Selector
Big databases like HBase, Cassandra
Other sources (Amazon S3, Google Cloud Storage, Azure Blob Storage) – Amazon S3 Connector; Google Cloud Storage Connector
Handling Big Data in
KNIME
Data Sampling
Use KNIME's sampling capabilities to work with a representative subset of the data for initial exploration and modeling.
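In code terms, sampling amounts to something like this pandas sketch (the file name and sample fraction are assumptions):

```python
import pandas as pd

df = pd.read_csv("big_data.csv")  # assumed large input file

# Draw a representative 1% random sample for initial exploration
sample = df.sample(frac=0.01, random_state=42)
```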
Distributed Computing
Processes data in parallel, which improves processing speed by utilizing multiple processing nodes.
Data Chunking
To prevent memory constraints, KNIME can process data in smaller,
manageable chunks.
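A hypothetical pandas illustration of chunked processing (file and column names are assumptions):

```python
import pandas as pd

# Stream the file in 100,000-row chunks instead of loading it whole
total = 0.0
for chunk in pd.read_csv("big_data.csv", chunksize=100_000):
    total += chunk["amount"].sum()  # "amount" is an assumed column
```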
Data Compression
Data compression techniques are employed to reduce storage requirements and optimize data transfer between nodes and across the network.
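As one concrete illustration (not KNIME's internal mechanism), compressed columnar storage in pandas, assuming pyarrow is installed:

```python
import pandas as pd

df = pd.read_csv("big_data.csv")  # assumed input file

# Compressed columnar output reduces disk usage and transfer time
df.to_parquet("big_data.parquet", compression="gzip")
```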
In-Database Processing
Data remains within the database for analysis, which minimizes data movement and enhances performance.
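The idea in miniature, using SQLite as a stand-in database (table and column names are assumptions): the aggregation runs inside the database, and only the small result set crosses into Python:

```python
import sqlite3
import pandas as pd

con = sqlite3.connect("warehouse.db")  # assumed database file

# The database computes the aggregate; only the result moves out
query = "SELECT country, AVG(income) AS avg_income FROM people GROUP BY country"
result = pd.read_sql_query(query, con)
con.close()
```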
Example workflow figures: Data Sampling workflow; Data Chunking node in a workflow; Data Compression node in a workflow; In-Database Processing workflow
DISTRIBUTED DATA
PROCESSING
Distributed data processing is a key capability in
KNIME for efficiently analysing and processing
Big Data: