
UNIT-3

KNIME
 KNIME stands for Konstanz Information MinEr, pronounced "naim"
 Developed by KNIME AG, located in Zurich, and the group of Michael Berthold at the University of Konstanz, Chair for Bioinformatics and Information Mining
WHAT IS KNIME
 A tool for data analysis, manipulation, visualization and reporting
 Based on a graphical user interface
 Popular for its flexibility and ability to integrate with various data sources and tools, including:
  Databases
  R
  Python
Key Features and Benefits of KNIME
 Open source
 Modular
 Rich node repository
 Integration
 Scalability
 Automation
 Workflow repository

KNIME Server vs KNIME Analytics Platform
KNIME Installation
and Set up
 Visit the KNIME Website: https://www.knime.com
 Choose the Edition
 KNIME Analytics Platform – Free Desktop Version for individual
users and smaller teams
 KNIME Server – Commercial offering for larger organisations
 Download KNIME Analytics Platform – Latest Version
 Select your operating system – Windows, macOS, or Linux
 Complete Download
 Install KNIME
 Launch KNIME
Download KNIME
Install KNIME
 Windows
  Run the downloaded installer or self-extracting archive
  If a zip archive was downloaded, unpack it to the desired location
  Run knime.exe to start KNIME Analytics Platform

 Linux
  Extract the downloaded tarball to a location of your choice
  Run the knime executable to start KNIME Analytics Platform

 Mac
  Double-click the downloaded dmg file and wait for the verification to finish
  Then move the KNIME icon to Applications
  Double-click the KNIME icon in the list of applications to launch KNIME Analytics Platform
KNIME Workbench
Components

 Welcome page
 KNIME Explorer
 Workflow Editor & Nodes
 Workflow Coach
 Node Repository
 KNIME Hub Search
 Description
 Outline
 Console
KNIME Workbench
 Workflow Editor (Workspace) – Central space where the workflow is designed
 Node Repository – Panel where nodes are available
 Console – Debugging tool; gives feedback on the workflow status and any error messages
 Outline – Overview of the workflow structure
 Node Description – Gives a summary of the node selected in the Workflow Editor or Node Repository
 Explorer – Panel that shows the list of workflows available in the selected workspace
A Basic Walkthrough of the KNIME Workbench
KNIME Components and Terminology
 Node
  Building block of a KNIME workflow
  Represents a specific operation or analysis step
 Port
  Nodes have input and output ports
  An input port receives data; an output port sends data to other nodes
  Data flows between nodes through these ports
 Workflow
  Sequence of nodes connected to each other
  Represents the entire data analysis process
KNIME Components and Terminology
 Data Table
  Data in KNIME is represented as a tabular data structure
  Each row is a data point
  Each column is a feature or attribute
KNIME Components and Terminology
 Connectors
  Lines that link the output port of one node to the input port of another, defining the flow of data within a workflow
 Meta Node
  A container that allows nodes to be grouped into reusable sub-workflows
  Simplifies the visualization of complex workflows
KNIME Components and Terminology
 Variable
  Used to store and manage data or values within a workflow
  Can be created, modified, and used in various nodes
 Workflow Variable
  Variables specific to a workflow; can be used to pass data or values between nodes within the same workflow
Dataflow in KNIME
 Dataflow defines how data is processed and transformed as it moves through the workflow
 Key points about data flow in KNIME include:
  Input and Output Ports
  Connectors
  Data Table
  Data Transformation
Workspace
 The folder where all current workflows and preferences are saved for the next KNIME session
 By default, the workspace folder is "…\knime-workspace"
 Can be changed by changing the path in the "Workspace Launcher" window before starting the KNIME working session
Exercise-1 Create Workspace

 Launch KNIME
 In the Workspace
Launcher Window, Click
“Browse”
 Select the path for the
new workspace
 Create “Test Workspace”
KNIME Workflow
 The KNIME Analytics Platform does not work with scripts, but with graphical workflows
 Each step of the data analysis is implemented and executed through a little box called a "Node"
 A sequence of nodes makes a workflow
 An analysis flow in graphics, having the following process:
  Step 1: Read data
  Step 2: Clean data
  Step 3: Filter data
  Step 4: Train a model
 Workflows in KNIME are graphs
File Extensions: .knwf and .knar Files
 KNIME workflows can be packaged and exported as ".knwf" or ".knar" files
 A ".knwf" file contains only one workflow
 A ".knar" file contains a group of workflows
 A double click opens the workflow inside KNIME Analytics Platform
Workflow Configuration
And Execution
1. Node Configuration
2. Variable Assignment
3. Execution Control
4. Monitoring Execution
5. Workflow Results
Building a Basic Workflow
 Launching KNIME
 Creating a New Workflow
  Go to the "File" menu
  Select "New Workflow" – creates a new canvas to design the workflow
Building a Basic Workflow
 Adding Nodes – Workflows in KNIME are built by adding nodes, dragging and dropping them onto the canvas from the Node Repository
 Connecting Nodes – Nodes are connected using connectors. The output of one node is connected to the input of the next node.
 Configuring Nodes – Double-click or right-click a node to open its configuration dialog
 Running the Workflow – To execute the workflow, click the "Run" button on the toolbar
Visual KNIME Workflows
What are KNIME Extensions?
 KNIME Extensions are a fast, flexible way to extend your data science platform. Open-source extensions provide additional functionality such as access to and processing of complex data types, as well as additional advanced machine learning algorithms.
Install Extensions
 From the top menu, select "File → Install KNIME Extensions"
 Select:
  - KNIME Math Expression extension (JEP)
  - KNIME External Tool Node
  - KNIME Report Designer
 Click "Next"
Exercise 2
 01 – Installation of additional extensions
 02 – Lab extensions
Exercise 3
 Install the following Extensions
 KNIME Database
 KNIME JavaScript Views
 KNIME Report Designer
Solution: Exercise 3
 Select – From the top right corner options menu, select "Install Extensions"
 Search – Search for the required extensions
 Click – Click "Next"
 Follow – Follow the instructions


Data Access
 Files
 CSV, txt, Excel, Word, PDF
 XML, JSON, PMML
 Images, Texts, Networks
 Databases
 MySQL, PostgreSQL, Oracle
 Theobald
 Any JDBC (DB2, MS SQL Server)
 Other
 Twitter
 Google
 Sharepoint
Transformation
 Preprocessing
 Row, column
 Data Blending
 Join, concatenate, append
 Aggregation
 Grouping, Pivoting, Binning
 Feature Creation and Selection
Create a Node
 Drag and drop the node from the "Node Repository" panel into the workflow editor, or
 Double-click the node in the "Node Repository" panel

To connect a node with existing nodes:
 Select a node in the workflow and double-click a node in the repository, or
 Click the output port of the first node and release the mouse at the input port of the second node
View a Processed Node
If the execution is successful, a green light is shown and the processed data can be viewed:
 01 – Right-click the node
 02 – Select the last option in the context menu
 03 – The data table with the processed data will appear
Create a Workflow Group
 Click on the Home button:
  Click on the "Local Workspace"
  Select "Create Folder"
 In the "Create Folder" dialog:
  Enter the name of the workflow group
Create a Workflow
 In the Local Space:
 Click on the “+”
symbol
 In the “Create a new
workflow” dialog:
 Enter the name of
Workflow
 Click Create

 In Space Explorer
 Click on the black
button with three dots
 Click “Create
Workflow”
Save a Workflow
 Saving the workflow saves the
workflow architecture, the
node’s configuration, and the
data produced at the output of
each node.
 Click the disk icon on the Top
Menu
 To save a copy of the currently selected workflow, click "Save As…"
 To save ALL open workflows, click the "Save All" (stack of disks) icon
Delete a Workflow
 Right-click the workflow in the "KNIME Explorer"
 Select "Delete"
 Confirm the deletion
Import/Export Workflow
Steps:
 To import a workflow, right-click anywhere in the local workspace in the KNIME Explorer
 To export a workflow or workflow group, first select the workflow (or group) to export
 Next, write the path to the destination folder and the file name. When exporting a workflow group, select the elements you want to include.
Exercise 4
 Create an empty workflow:
  Click "New" in the toolbar panel, or right-click a folder in the local workspace in the KNIME Explorer
  Enter the name of the workflow
  Browse to the destination folder
  Click "Finish"
Data Importing & Blending – Node Operations
 Read data from the file
 Understanding different data structures & data types
 Data Exploration
Read Data
Steps:
 Add a File Reader node, by double-clicking it or by drag & drop
 In the configuration dialog, click "Browse" to select the path of the file
 In most cases, the File Reader automatically detects the file structure
 If not, enable/disable the required checkboxes according to the data structure
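KNIME does this through the node's dialog rather than code, but for comparison, a minimal Python sketch of the same read step (e.g., inside a KNIME Python Script node) might look like this; the file name "data1.txt" is taken from the exercise below, and the separator-sniffing settings are assumptions:

import pandas as pd

# Read a delimited text file, letting pandas sniff the separator, similar to
# File Reader's automatic structure detection. File name is an assumption.
df = pd.read_csv("data1.txt", sep=None, engine="python")
print(df.head())    # preview the rows, like inspecting the node's output table
print(df.dtypes)    # column types detected from the data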
Excel Reader
 Reads .xls and .xlsx files from Microsoft Excel
 Supports reading from multiple sheets
Exercise 5
Steps:
 Create workflow "Exercise 1"
 Read file "data1.txt"
 Change the name of column "ranking" to "marks"
 Remove column "Class"
 Write the final data to a file in CSV format
Database Connector Node
 Creates a connection to an arbitrary JDBC database
 Select an appropriate driver and provide the JDBC URL of the database
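For illustration, here is a minimal Python sketch of the same idea — connecting through a JDBC driver class and URL — using the jaydebeapi package; the driver class, URL, credentials, and jar path shown are all placeholder assumptions, not values from this document:

import jaydebeapi

# Connect via an arbitrary JDBC driver, the way the Database Connector node
# does. Driver class, JDBC URL, credentials, and jar path are placeholders.
conn = jaydebeapi.connect(
    "org.postgresql.Driver",                  # JDBC driver class
    "jdbc:postgresql://localhost:5432/mydb",  # JDBC URL of the database
    ["db_user", "db_password"],               # credentials
    "postgresql-42.7.3.jar",                  # path to the driver jar
)
cur = conn.cursor()
cur.execute("SELECT 1")
print(cur.fetchall())
conn.close()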
Data Preparation and Cleaning
Essential steps to ensure the data is accurate and ready for analysis.
Some nodes for this task:
 Data Exploration
  Initial overview of the data
  Generates summary statistics and visualizations
Data Explorer
Install “KNIME JavaScript Views (Lab)”
Supports CSS styling
Data Cleaning
 Nodes like "Missing Value" and "Duplicate Row Filter" help to handle missing data and remove duplicates
 Missing Value – Helps to handle missing values in the data
 Duplicate Row Filter – Identifies duplicate rows; can either remove all duplicate rows from the input table, keeping only unique and chosen rows, or mark the rows with additional information about their duplication status
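A minimal Python sketch of the same two cleaning steps, for readers who want to see the logic in code; the toy data and the fill strategies (mean for numbers, a fixed value for strings) are assumptions:

import pandas as pd

# Handle missing values and drop duplicate rows, mirroring the "Missing
# Value" and "Duplicate Row Filter" nodes.
df = pd.DataFrame({
    "age":  [25, None, 32, 25],
    "city": ["Zurich", "Konstanz", None, "Zurich"],
})
df["age"] = df["age"].fillna(df["age"].mean())  # impute numeric column with the mean
df["city"] = df["city"].fillna("unknown")       # fixed value for the string column
df = df.drop_duplicates()                       # keep only unique rows
print(df)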
Data Cleaning
 Column Splitter – Splits the columns of the input table into two output tables
Data Transformation
Data can be transformed using nodes like "Column Filter", "Math Formula", "String Manipulation", etc.
Data Transformation
 Column Filter Node
  Allows columns to be filtered from the input table; only the remaining columns are passed to the output table
Data Transformation
 Math Formula
  Evaluates a mathematical expression based on the values in a row
  Computed results can either be appended as a new column or be used to replace an input column
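The same append-or-replace behaviour, sketched minimally in Python; column names and formulas are assumptions:

import pandas as pd

# Evaluate a row-wise expression and either append the result as a new
# column or replace an input column, as the Math Formula node does.
df = pd.DataFrame({"price": [10.0, 20.0], "qty": [3, 5]})
df["total"] = df["price"] * df["qty"]   # result appended as a new column
df["price"] = df["price"] * 1.08        # result replacing an input column
print(df)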
Data Transformation
 String Manipulation Node
  Manipulates strings: search and replace, capitalize, or remove leading and trailing white space
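A minimal sketch of these string operations in Python; the column name and values are assumptions:

import pandas as pd

# Typical String Manipulation operations: trim whitespace, change case,
# and search-and-replace.
df = pd.DataFrame({"name": ["  iris-setosa ", "IRIS-VERSICOLOR"]})
df["name"] = (
    df["name"]
    .str.strip()                      # remove leading/trailing white space
    .str.lower()                      # normalize case
    .str.replace("iris", "flower")    # search and replace
)
print(df)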
Data Aggregation
 Nodes like "GroupBy" perform aggregation operations on the data, such as calculating sums, averages, or counts
 Groups the rows of a table by the unique values in the selected group columns
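The grouping-and-aggregating idea, sketched minimally in Python; the column names are assumptions:

import pandas as pd

# Group rows by the unique values of a column and compute sums, averages,
# and counts, as the GroupBy node does.
df = pd.DataFrame({
    "class": ["A", "A", "B", "B"],
    "score": [80, 90, 70, 60],
})
print(df.groupby("class")["score"].agg(["sum", "mean", "count"]))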
Data Imputation
 Rule Engine
  Takes a list of user-defined rules and tries to match them to each row in the input table
  If a rule matches, its outcome value is added into a new column
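A minimal sketch of the rule-matching logic in Python — ordered rules are checked per row and the first match determines the new column's value; the rules and column names are assumptions:

import pandas as pd
import numpy as np

# Match ordered, user-defined rules against each row and write the first
# matching outcome to a new column, like the Rule Engine node.
df = pd.DataFrame({"age": [15, 42, 70]})
conditions = [df["age"] < 18, df["age"] < 65]   # checked in order
outcomes = ["minor", "adult"]
df["group"] = np.select(conditions, outcomes, default="senior")
print(df)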
Exercise 6
 Extract details of persons born outside "United-States" in CSV form
 Read the adult data file
 Rename column "fnlwgt" to "Final Weight"
 Remove column "Final Weight"
 Remove rows containing "United-States"
 Write the data to a CSV file named "Born outside US"
DATA INTEGRATION AND
TRANSFORMATION
Involves combining data from various sources,
reshaping it and preparing it for analysis. Some nodes
used for the purpose are:
 Joining Data : Combine data from multiple tables
based on common keys or criteria
 Pivoting and Unpivoting : Help to reshape data
from wide to long format or vice versa
 Data Sampling : Used to select a subset of data for
analysis
 Data Normalization and Scaling : Normalize and
Scale the data to prepare it for machine learning
algorithms.
 Text Mining and NLP: Supports text data processing
and Natural Language Processing
EXPORTING DATA FROM KNIME
 Data Export Nodes: To export data, nodes like "CSV Writer", "Excel Writer" or "Database Writer" are used, depending on the nature of the output
 Data Visualisation: Nodes like "Bar Chart", "Pie Chart", "Heatmap" etc. are used to create charts and plots
 Model Deployment: Machine learning models in KNIME can be exported for deployment in production environments
 Data Reports: Reports with customised layouts can be exported in various formats such as PDF or HTML
Exercise 7
 Objective: To do data visualization
 Create a workflow "Exercise 7" under the workflow group "Exercises"
 Read the Iris data
 Name the columns: Sepal Length, Sepal Width, Petal Length, Petal Width, Class
 Map the flower types "Iris-setosa", "Iris-versicolor" and "Iris-virginica" to Class 1, Class 2, Class 3 respectively
 Split the contents of "Class" into three columns
 Join two columns
 Convert the contents of a column to upper case
 Replace the word "Iris" with "Flower"
 Create "Bins" based on Sepal Length
 Group the "Class" based on Bins
 Create a Bar Chart and Scatter Plot
Connecting to Big Data Sources
KNIME provides various connectors and integrations to connect to big data platforms:
 Hadoop Distributed File System (HDFS) – HDFS Connector, HDFS File Picker
 Apache Spark – Spark Reader (to load data from Spark data frames), Spark SQL (querying and manipulating data); see the sketch after this list
 Apache Hive – Hive Connector, Hive Table Selector
 Big databases like HBase, Cassandra
 Other sources: Amazon S3, Google Cloud Storage, Azure Blob Storage – Amazon S3 Connector, Google Cloud Storage Connector
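For orientation, a minimal PySpark sketch of the kind of work the Spark and HDFS nodes delegate to a cluster — reading from HDFS and querying with Spark SQL; the path, file layout, and column names are assumptions:

from pyspark.sql import SparkSession

# Read a CSV from HDFS and run a Spark SQL aggregation over it.
spark = SparkSession.builder.appName("knime-bigdata-sketch").getOrCreate()
df = spark.read.csv("hdfs:///data/adult.csv", header=True, inferSchema=True)
df.createOrReplaceTempView("adult")
spark.sql("SELECT education, COUNT(*) AS n FROM adult GROUP BY education").show()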
Handling Big Data in KNIME
 Data Sampling
  KNIME's sampling capabilities allow working with a representative subset of the data for initial exploration and modeling
 Distributed Computing
  Processes data in parallel, which improves processing by utilizing multiple processing nodes
 Data Chunking
  To prevent memory constraints, KNIME can process data in smaller, manageable chunks (see the sketch after this list)
 Data Compression
  Data compression techniques are employed to reduce storage requirements and optimize data transfer between nodes and across the network
 In-Database Processing
  Data remains within the database for analysis, which minimizes data movement and enhances performance
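The data-chunking idea from the list above, sketched minimally in Python; the file name and chunk size are assumptions:

import pandas as pd

# Process a large file in smaller, manageable chunks to avoid memory
# constraints, instead of loading it all at once.
rows = 0
for chunk in pd.read_csv("big_file.csv", chunksize=100_000):
    rows += len(chunk)   # replace with real per-chunk processing
print(f"processed {rows} rows")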
Big Data’s Data Sampling
workflow
Data Chunking Node with
workflow
Data Compression Node with
workflow

In-Database Processing
DISTRIBUTED DATA PROCESSING
Distributed data processing is a key capability in KNIME for efficiently analysing and processing big data:
 Parallel Execution – Distributes data processing tasks across multiple nodes or worker machines, thereby improving performance and reducing processing time
 Data Partitioning – Automatically partitions data into smaller chunks, which can be processed in parallel by different nodes or workers, ensuring efficient resource utilisation
 Load Balancing – Distributes tasks evenly across available resources to maximise their utilisation
 Scalability – Scale horizontally by adding more compute resources to handle larger datasets and more complex analyses
BIG DATA ANALYTICS WITH KNIME
 Data Exploration
  Tools to gain insights into big data, generate summary statistics and create visualisations
 Data Preprocessing
  Clean, transform and prepare big data for analysis
 Machine Learning
  Offers a wide range of machine learning algorithms which can be applied to big data, including classification, regression, clustering and anomaly detection
  Models can be built for predictive analysis
 Text and Image Analysis
  Allows valuable information to be extracted from unstructured big data sources
 Advanced Analytics
  Conduct advanced analytics like time series analysis, network analysis and geospatial analysis on big data
BIG DATA ANALYTICS WITH KNIME
 Data Visualization and Reporting
  Create visualizations and reports to communicate the findings effectively
 Deployment and Integration
  Deploy big data analytics workflows for production use
  Export models and predictions for integration with other applications and data pipelines
 Monitoring and Maintenance
  Continuously monitor and maintain big data analytics workflows
  Optimize workflows for performance
  Keep KNIME and its extensions up to date
 Collaboration and Sharing
  Collaborate with team members and stakeholders by sharing KNIME workflows, reports and results
  Use KNIME Server for collaborative work and scheduled execution
DATA TRANSFORMATION AND MANIPULATION
 Data Cleaning
  Nodes like "Missing Value", "String Manipulation", and "Rule Engine" to handle missing data, correct errors, and clean the dataset
 Data Transformation
  For transforming data, use nodes such as "Column Filter", "Math Formula", and "Pivoting" to reshape data and create new features
 Data Aggregation
  Aggregate data using the "GroupBy" node to calculate summary statistics or create aggregated datasets
 Data Joining
  Combine datasets using nodes like "Joiner" or "Concatenate" to merge data from different sources or tables (see the sketch after this list)
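The join-versus-concatenate distinction from the list above, sketched minimally in Python; the frames and key column are assumptions:

import pandas as pd

# Merge tables on a common key ("Joiner") and stack tables row-wise
# ("Concatenate").
people = pd.DataFrame({"id": [1, 2], "name": ["Ada", "Grace"]})
scores = pd.DataFrame({"id": [1, 2], "score": [95, 88]})
print(people.merge(scores, on="id", how="inner"))       # join on common key
print(pd.concat([people, people], ignore_index=True))   # append rows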
DATA TRANSFORMATION AND MANIPULATION
 Data Splitting
  Split data into training and testing sets using nodes like "Partitioning" to facilitate model evaluation
 Data Normalization and Scaling
  Normalize or scale features to bring them to a common scale using nodes like "Normalizer" (see the sketch after this list)
 Text and String Processing
  Nodes are available for text processing such as tokenization, stemming, and sentiment analysis, making KNIME suitable for text data manipulation
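A minimal Python sketch of the splitting and scaling steps above, using scikit-learn; the toy data and the 75/25 split ratio are assumptions:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Split data into training and testing sets ("Partitioning") and scale
# features to a common range ("Normalizer").
X = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0], [4.0, 800.0]])
y = np.array([0, 0, 1, 1])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
scaler = MinMaxScaler().fit(X_train)     # fit on training data only
print(scaler.transform(X_train))
print(scaler.transform(X_test))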
Data Transformation
and Manipulation
related workflow
STATISTICAL ANALYSIS IN KNIME
 Descriptive Analysis
  Nodes like "Statistics" and "Data Explorer" are used to compute descriptive statistics, including measures of central tendency, dispersion, and frequency distributions
Hypothesis Testing
Nodes like "Group Comparison" for comparing means, "Chi-Square Test" for
categorical data, and others.
Correlation Analysis
Determine relationships between variables using correlation analysis
nodes.
ANOVA and Regression Analysis
Perform analysis of variance (ANOVA) and regression analysis to explore
relationships between dependent and independent variables.
Time Series Analysis
Analyse time series data with specialized nodes for forecasting, trend
analysis, and seasonal decomposition
Statistical Analysis Related
Workflow
MACHINE LEARNING WORKFLOWS
Model Selection
Use nodes for model selection, such as "Variable Selection" and "Feature Selection," to identify
the most relevant features for modeling.
Model Training
Train machine learning models using nodes for various algorithms, including decision trees,
random forests, support vector machines, and neural networks
Model Evaluation
Evaluate models with nodes like "Scorer" to assess their performance using metrics like accuracy,
precision, recall, F1-score, and ROC curves
Cross Validation
Implement cross-validation techniques using nodes like "Cross-Validation Loop" to assess model
generalization
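A minimal scikit-learn sketch of the evaluation-plus-cross-validation idea above; the model choice, fold count, and metric are assumptions:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# 5-fold cross-validation of a decision tree: the logic behind a
# "Cross-Validation Loop" plus "Scorer" combination.
X, y = load_iris(return_X_y=True)
scores = cross_val_score(
    DecisionTreeClassifier(random_state=0), X, y, cv=5, scoring="accuracy"
)
print(scores.mean(), scores.std())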
MACHINE LEARNING WORKFLOWS
Ensemble Learning
Build ensemble models using nodes like "Ensemble Learner" to combine multiple
models for improved predictive performance
Hyperparameter Tuning
Optimize model hyperparameters with nodes like "Parameter Optimization" to
achieve the best model performance.
Machine Learning
Related Node and
Workflow
PREDICTIVE ANALYSIS AND DATA MINING
 Classification
  Build classification models to predict categorical outcomes
  KNIME supports various algorithms like logistic regression, decision trees, and k-nearest neighbours
 Regression
  Perform regression analysis to predict numeric outcomes, using regression algorithms like linear regression and support vector regression
 Clustering
  Use clustering algorithms to group similar data points together (see the sketch after this list)
  KNIME offers k-means clustering, hierarchical clustering, and DBSCAN, among others
 Anomaly Detection
  Detect anomalies or outliers in data using specialized nodes like "Local Outlier Factor"
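A minimal Python sketch of the k-means clustering step referenced in the workflow slides below; k=3 and the iris data are assumptions:

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Group similar data points with k-means.
X, _ = load_iris(return_X_y=True)
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(model.labels_[:10])        # cluster assignment per data point
print(model.cluster_centers_)    # one centroid per cluster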
PREDICTIVE ANALYSIS
AND DATA MINING
Association Rule Mining
Discover Patterns and associations in data using association rule
mining nodes
Time Series Forecasting
Forecast future values of time series data using dedicated nodes
for time series analysis and prediction.
Text Mining and NLP
Analyse and extract insights from unstructured text data
Geospatial Analysis
Perform geospatial analysis and visualization using nodes for
geographic information system (GIS) data
Classification Models to Predict
Categorical Outcomes Workflow
K-means Clustering Workflow
Geographic Information
System (GIS) data
workflow
DATA VISUALIZATION IN
KNIME
 Data visualization is a powerful way to communicate insights and patterns
in data
 Visualization Nodes
 Nodes for creating visualizations, including scatter plots, bar charts,
line charts, heatmaps, and more
 Interactive Plots
  Supports interactive plots that allow data to be explored dynamically, including zooming, panning, tooltips and filtering
 Customization
 Customize the appearance of the visualizations, such as color
schemes, labels, and axis scaling, to ensure clarity and relevance
 Automated Visualization
 Automate the generation of visualizations using
data-driven approaches.
Interactive Plots Output
Automated Visualization
Workflow
CREATING INTERACTIVE DASHBOARDS
 Dashboard Components – KNIME offers a range of dashboard components like tables, charts, filters, and input widgets that can be combined to create interactive dashboards
 Drag and Drop Design – Build dashboards using a user-friendly, drag-and-drop interface; arrange components and link them to control each other dynamically
 Real-Time Updates – Dashboards can be designed to update in real time as data changes, enabling users to see the latest information instantly
 Parameterization – Create dashboards with parameterized inputs, allowing users to customize views based on their preferences or specific analysis requirements
Interactive Output Example
MODEL DEPLOYMENT AND INTEGRATION
Deploying machine learning models and integrating them into production systems is crucial for making data-driven decisions:
 Export Models – KNIME allows trained machine learning models to be exported in standard formats (e.g., PMML) for deployment in various environments, such as web applications or databases
 RESTful Web Services – Deploy models as RESTful web services using KNIME Server; enables real-time predictions and integration with other applications (see the sketch after this list)
 Batch Processing – Automate model deployment by integrating KNIME workflows into batch processing pipelines to generate predictions on new data regularly
 Database Integration – Integrate models with databases to perform in-database scoring, making predictions directly within the database engine
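From the client side, calling a model exposed as a RESTful web service can be sketched minimally in Python; the endpoint URL, credentials, and payload shape here are hypothetical placeholders, not a description of the actual KNIME Server API:

import requests

# Send one record to a hypothetical REST endpoint and print the prediction.
resp = requests.post(
    "https://knime-server.example.com/rest/my-model",  # hypothetical endpoint
    json={"input": [{"sepal_length": 5.1, "sepal_width": 3.5}]},
    auth=("user", "password"),
    timeout=30,
)
resp.raise_for_status()
print(resp.json())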
Model Deployment and Integration
Workflow
AUTOMATION AND
REPORTING IN KNIME
 Automation and reporting are essential for streamlining
workflows and sharing insights
 Workflow Automation
 Automate repetitive tasks and data processing steps using
KNIME’s workflow automation capabilities
 Schedule workflows to run at specific times or events
 Report Generation
 Create customizable reports in KNIME with text, tables, charts,
and visualizations.
 Can be generated automatically as part of a workflow or on-
demand.
AUTOMATION AND
REPORTING IN KNIME
 Data Export
 Export data, results, and reports in various formats, including
PDF, Excel, CSV, and more, to share insights with
stakeholders
 Integration with External Systems
 KNIME can integrate with external systems and databases to
import data, trigger workflows, and export results seamlessly
 Notifications
 Configure notifications and alerts to inform users or
administrators about workflow status, errors, or specific
events.
Workflow Automation
Configure Notifications and
Alerts Template
THANK YOU
