
KNIME Data Preparation Short Course

The document discusses an introduction to data analytics course using KNIME Analytics Platform. The course agenda covers introductions to data analytics and KNIME, application terminology, use cases, connecting to and saving data, and data transformation techniques. KNIME is presented as an open-source visual data analytics tool that can be used across industries for tasks like data blending, machine learning, and sharing insights. The document promotes KNIME's ability to build workflows, access different data sources, shape and prepare data, and scale processing for data analytics projects.


2022

Get Prepared in Data Analytics with KNIME Analytics Platform
Jeni Sudirman

CONFIDENTIAL AND PROPRIETARY © 2022 PT CENDIKIA DATA ANDALAN


Copying in whole or in part is strictly forbidden without prior written approval
Course Agenda
1. Intro to Data Analytics
2. Get Started with KNIME Analytics Platform
3. Application Terminology
4. Use Case
5. Connecting to Data and Saving Files
6. Data Transformation: Clean, Blend and Aggregate Data

Intro to Data Analytics

What is Data Analytics?
Data analytics is the science of analyzing raw data in order to draw conclusions about that information.

Many of the techniques and processes of data analytics have been automated into mechanical processes and algorithms that work over raw data for human consumption.

Data analytics techniques can reveal trends and metrics that would otherwise be lost in the mass of information.

This information can then be used to optimize processes to increase the overall efficiency of a business or system.

Why Is Data Analytics Important?
Because it helps businesses optimize their performance.

Implementing it into the business model means companies can help reduce costs by identifying more efficient ways of doing business and by storing large amounts of data.

A company can also use data analytics to make better business decisions and help analyze customer trends and satisfaction, which can lead to new and better products and services.

The analytical problem-solving process
1. Data Requirements
2. Data Collection
3. Data Cleaning
4. Data Analysis
5. Communication

Data Requirements Specification

The data required for analysis is based on a question or an experiment.

Specific variables regarding a population (e.g., Age and Income) may be specified and obtained.

Data Collection

Data Collection is the process of gathering information on targeted variables identified as data requirements.

Data is often stored in various locations, may not be structured, and may contain irrelevant information.

Data Cleaning

The processed and organized data may be incomplete, contain duplicates, or contain errors.

Data Cleaning is the process of preventing and correcting these errors.

There are several types of Data Cleaning that depend on the type of data.

Data Analysis

Data that has been processed, organized, and cleaned is ready for analysis.

In this step, you’ll begin to slice and dice your data to extract meaningful insights from it.

Using the techniques and methods of data analysis, you’ll look for hidden patterns and relationships, and find insights and predictions.

Communicate the Result

After you’ve interpreted the results and drawn meaningful insights from them, the next step is to create visualizations by selecting the most appropriate charts and graphs.

Analysis tools let you highlight the required information with color codes and formatting in tables and charts.

https://www.anaconda.com/state-of-data-science-2021

Our Issues
Data Analytics is the key competitive edge for companies and therefore in high demand across all industries.

The biggest obstacle in data analytics is getting clean and correct data.

Data analysts and data scientists spend up to 80% of their time cleaning data.

“Trash in, trash out”: clean data is key!

Data is often stored in various locations.

How can we access all of them and prepare the data for visual analytics?

Data Analytics Tools

Get Started with KNIME Analytics Platform
What is KNIME Analytics Platform?

Open, open-source, modular data science platform.

A tool for data analysis, manipulation, visualization, and reporting.

Based on the visual programming paradigm (GUI based).

Provides a diverse array of extensions, such as Text Mining, Network Mining, Cheminformatics, and Deep Learning.

Offers many integrations, such as Java, R, Python, Weka, Keras, Plotly, H2O, and more.

KNIME Popularity

How KNIME is Used in Different Industries

How can KNIME support in your Data Analytics?

KNIME Product Line

KNIME Analytics Platform

Build Data Science Workflows
● One single, open-source data analytics tool

Blend Data from Any Source
● Open and Combine
● Connect to a host
● Access and Retrieve Data

Leverage Machine Learning & AI
● Build machine learning models
● Optimize model performance
● Validate Models
● Explain machine learning models

KNIME Analytics Platform

Shape Your Data
● Derive Statistics
● Aggregate, sort, filter, and join
● Cleaning
● Extract and Select Features

Discover and Share Data Insights
● Visualize your Data
● Display Summary Statistics
● Export Reports
● Store Processed Data

Scale Execution with Demands
● Build workflow prototypes
● Scale workflow performance
● Exercise the power of in-database processing

KNIME Server
Data Science Team Collaboration
● Share expertise and best practices by sharing data, components, and workflows across your
team and company.
● Comply with data protection policies through the control of access management at the data,
workflow, and application levels.
● Reproduce data science by recording workflow revisions along with the data, enabling
debugging, tracking, and auditing.

Data Science Automation


● Schedule workflows to run automatically and give yourself more time to focus on data
science - and specify a number of retries for failed jobs.
● Control workflows to automate model management.
● Scale and pin workflow execution with well provisioned, high performance server architecture
which is configured to your specifications.
● Design, edit, and execute workflows on KNIME Server using the Remote Workflow Editor and
take advantage of well provisioned hardware in a secure environment.

KNIME Server
Manage and Monitor Workflows
● Host KNIME Server in your data center or in the cloud via Microsoft Azure, Amazon AWS, or the
cloud provider of your choice.
● Integrate authentication with corporate LDAP / Active Directory setups and manage permissions.
● Monitor server activity and manage ongoing services in the AdminPortal.
● Control IT operations via central management settings for multiple installations.
● Use Metadata Mapping with workflow summary to completely map all aspects of the workflow.

Deploy Data Apps and Services


● Bring complex data science and machine learning to business analysts with Guided
Analytics. Data scientists build and deploy a workflow to KNIME Server. End users interact with
the workflow in the web in a controlled way and view results.
● Build and publish detailed reports which can be sent via email or accessed on demand from
the KNIME WebPortal.
● Deploy workflows as industry standard web services seamlessly from KNIME workflows via
REST API, and build out your data science infrastructure.

Download and Install KNIME
https://www.knime.com/downloads/download-knime

Select the KNIME Analytics Platform version for your computer:


● Mac
● Windows – 32 or 64 bit
● Linux

Download the archive and extract the file, or download the installer
package and run it.

Application Terminology

The KNIME Workspace
The workspace is the folder/directory in which workflows (and potentially data files) are stored for the current session.

Workspaces are portable (just like KNIME Analytics Platform)

The KNIME Analytics Platform Workbench

KNIME Explorer
● In LOCAL you can access your own workflow projects.

● Other mountpoints allow you to connect to:


○ EXAMPLE Server
○ KNIME Hub
○ KNIME Server

● The Explorer toolbar on the top has a search box and buttons to:
○ select the workflow displayed in the active editor
○ refresh the view

● The KNIME Explorer can contain 4 types of content:


○ Workflows
○ Workflow groups
○ Data files
○ Shared Components

Workflow
A workflow is a pipeline of nodes, each configurable to perform a specific task. Data flows through the nodes from left to right.

Workflow Description
When selecting the workflow, the Description window gives information
about the workflow’s:
● Title
● Description
● Associated Tags and Links
● Creation Date
● Author

Creating a new workflow
Right-click anywhere in the KNIME Explorer to create a new workflow or workflow group.

KNIME File Extensions

Workflows and workflow groups have dedicated file extensions associated with KNIME Analytics Platform: .knwf for an exported workflow and .knar for an exported workflow group.

Workflow Coach
Node Recommendation engine.

It gives hints about which node to use next in the workflow.

It is based on world-wide KNIME community usage statistics.

It can also be set to use personal and local group usage statistics.

Console and Other Views

Console view prints out error and warning messages about what is going
on under the hood.

Click on View and select Other… to add additional views

Node
Nodes are the basic processing units of a workflow.

Each node has a number of input and/or output ports.

Data is transferred over a connection from an out-port to the in-port(s) of other nodes.

Under each node, a light shows its status.

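Node States
A node can have 4 states: not configured (red light), configured (yellow light), executed (green light), and error (red light with a cross).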

How many nodes?
Over 4500 native and embedded nodes included:

Node Repository
The Node Repository contains all KNIME nodes - ordered by
category with further subcategories.

Installing extensions can significantly increase the number of nodes.

Two search methods:


Crisp Search
Fuzzy Search

Nodes can be added by drag and drop from the Node Repository
to the Workflow Editor

Node Description

The Description window gives information about:


● Node Functionality
● Input & Output
● Node Settings
● Ports
● References to literature

Data Port Types
A pipeline of such nodes makes a workflow.

The result of the node’s operation on the data is provided at the out-port to successor nodes.

Only ports of the same type can be connected.

Node Configuration
Most nodes require configuration.

To access a node configuration window:


● Double-click the node
OR
● Right-click > Configure

Node Execution
Right-click node
Select Execute in context menu

Node Views

Frequently Used Nodes

Annotations
Annotations are coloured editable boxes that you can add to your workflow.
They help make it more readable and visually pleasing.

Use case

Use Case

We received the following files from the client:


● Transactional data aggregated to a state/product/day level in 12 files divided by month (Transactions mmyy.csv & Transactions mmyy.xlsx)
● Product data giving information about list prices and costs per country (Product Info.xlsx)
● List of sales managers for the regions and products (Sales Rep.xlsx)

Our Task
Consolidate the transactions data into one giant table.
1. Explore Import options in KNIME
2. Saving our files to disk
3. Cleaning the Sales Area in the Sales Rep Sheet
4. Joining data - start combining our results (additional cleaning involved)
5. Cleaning and analyzing the data:
a. Remove duplicate entries (if any)
b. Remove quantities below or equal to zero
c. Remove Rows by attribute value
d. Calculate the Sales and Profit
6. Aggregate the data with Groupby and Pivoting

Our Workflow

Connecting to Data and Saving Files
Read the dataset

Data Source Nodes

Typically characterized by:


● Orange color
● By default no input ports, 1-2 output ports

CSV Reader

Reads either one or multiple .csv and .txt files.

Further tabs to:


● limit the rows
● select encoding

Excel Reader (XLS)

Reads .xls and .xlsx files from Microsoft Excel.

Supports reading from multiple sheets

Common Settings: File Path
A path consists of three parts:
● Type: Specifies the file system type e.g. local, relative, mountpoint, custom_url or connected.
● Specifier: Optional string with additional file system specific information e.g. relative to which location (knime.workflow)
● Path: Specifies the location within the file system

Examples:
● (LOCAL, , C:\Users\username\Desktop)
● (RELATIVE, knime.workflow, file1.csv)
● (MOUNTPOINT, MOUNTPOINT_NAME, /path/to/file1.csv)
● (CONNECTED, amazon-s3:eu-west-1, /mybucket/file1.csv)
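
The three-part path structure can be mirrored in a small data type. This is only an illustrative Python sketch, not KNIME's own representation: the class name `FilePath`, its field names, and the `describe` helper are assumptions made for this example, while the values come from the examples above.

```python
from typing import NamedTuple


class FilePath(NamedTuple):
    """Hypothetical model of a three-part path: type, specifier, path."""
    fs_type: str    # file system type, e.g. LOCAL or RELATIVE
    specifier: str  # optional extra information; empty when not needed
    path: str       # location within the file system

    def describe(self):
        # Show the specifier only when one is present.
        spec = f" [{self.specifier}]" if self.specifier else ""
        return f"{self.fs_type}{spec}: {self.path}"


examples = [
    FilePath("LOCAL", "", r"C:\Users\username\Desktop"),
    FilePath("RELATIVE", "knime.workflow", "file1.csv"),
    FilePath("MOUNTPOINT", "MOUNTPOINT_NAME", "/path/to/file1.csv"),
    FilePath("CONNECTED", "amazon-s3:eu-west-1", "/mybucket/file1.csv"),
]
descriptions = [p.describe() for p in examples]
```

The LOCAL example has an empty specifier, which is why the specifier is described as optional.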

Common Settings: Read Single or Multiple Files
Importing Content from Multiple Files of the same Type to a Single Table
● Single file

● Files in a folder

● Option to include subfolder


● Option to define filter criteria
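
For readers who think in code, the "files in a folder" mode is roughly equivalent to the Python sketch below. It is not how the reader nodes are implemented; the function name `read_csv_folder`, the file names, and the data are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path


def read_csv_folder(folder, pattern="*.csv", include_subfolders=False):
    """Read every matching CSV file in a folder into one list of row dicts."""
    folder = Path(folder)
    # rglob descends into subfolders; glob stays in the top-level folder.
    paths = folder.rglob(pattern) if include_subfolders else folder.glob(pattern)
    rows = []
    for path in sorted(paths):
        with path.open(newline="", encoding="utf-8") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


# Build two tiny monthly files (made-up data) to demonstrate the call.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "Transactions 0121.csv").write_text("State,Quantity\nTexas,5\n", encoding="utf-8")
(demo_dir / "Transactions 0221.csv").write_text("State,Quantity\nOhio,3\n", encoding="utf-8")
(demo_dir / "notes.txt").write_text("not a table\n", encoding="utf-8")

# The pattern plays the role of the filter criteria; notes.txt is skipped.
all_rows = read_csv_folder(demo_dir, pattern="Transactions *.csv")
```

All values stay strings here; converting them to numbers is a separate step.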

Common Settings: Transformation Tab

Supported operations
● Column filtering
● Column sorting
● Column renaming
● Column type mapping
● Select between union or intersection of columns (in case of reading many files)

KNIME Data Structure
Data in KNIME are organized as a table with a fixed number of columns.
Each row is identified by a Row ID.

Columns are identified by column headers.


Each column represents a data type:
● Double (“D”)
● Integer (“I”)
● String (“S”)
● Date & Time (calendar + clock icon)
● Unknown (“?”)
● Other domain related types

Clicking the header of a data column sorts the data rows in ascending or descending order.
Right-clicking the header of a data column lets you visualize the data using specific renderers.

For Double/Integer data, for example, the “Bars” renderer displays the data as bars with a length proportional to their value and on a red/green heatmap.

Concatenate

Combine rows from two tables with shared columns


● Handles duplicate row keys gracefully
● The order of the columns doesn’t have to be the same.
● Take the union or intersection of columns
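
The difference between taking the union and the intersection of columns can be sketched in plain Python. This is an analogy only: the function name `concatenate` and the sample tables are assumptions, and the sketch ignores row keys entirely.

```python
def concatenate(top, bottom, mode="union"):
    """Stack two lists of row dicts, keeping the union or intersection of columns."""
    top_cols = list(top[0]) if top else []
    bottom_cols = list(bottom[0]) if bottom else []
    if mode == "union":
        columns = top_cols + [c for c in bottom_cols if c not in top_cols]
    elif mode == "intersection":
        columns = [c for c in top_cols if c in bottom_cols]
    else:
        raise ValueError("mode must be 'union' or 'intersection'")
    # Columns absent from a row are filled with None, standing in for a missing value.
    return [{c: row.get(c) for c in columns} for row in top + bottom]


# Made-up tables; the second has a different column order and one extra column.
january = [{"State": "Texas", "Quantity": 5}]
february = [{"Quantity": 3, "State": "Ohio", "Channel": "Web"}]

union_rows = concatenate(january, february, mode="union")
common_rows = concatenate(january, february, mode="intersection")
```

With the union, the January row gets an empty `Channel` cell; with the intersection, the `Channel` column is dropped.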

Concatenate : Dynamic Ports
Importing Content from Multiple Tables of Different File Types to a Single Table.

Add and remove node ports based on your needs, e.g. in order to concatenate
three or more tables.

Excel Writer

Writes the input table into a spreadsheet of an Excel file.

Select the overwrite option to replace a sheet in an existing Excel file, and define the name of the new sheet.

Workflow: Connecting and Saving Data

Data Transformation: Clean, Blend and Aggregate Data

Cell Replacer
Replaces the content of a column based on a lookup
● The top port references the table you want to search
● Bottom port holds the lookup table (search keys and replacement values)
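
The lookup idea behind the node can be pictured with a Python dictionary. This is a minimal sketch under assumptions: the function name `replace_cells`, the `keep_unmatched` flag, and the sample Sales Area values are invented for illustration.

```python
def replace_cells(rows, column, lookup, keep_unmatched=True):
    """Replace values in one column using a lookup dict (search key -> replacement)."""
    replaced = []
    for row in rows:
        new_row = dict(row)  # copy so the input table is left untouched
        value = row[column]
        if value in lookup:
            new_row[column] = lookup[value]
        elif not keep_unmatched:
            new_row[column] = None  # stand-in for a missing value
        replaced.append(new_row)
    return replaced


# Made-up rows and lookup table for illustration.
sales_reps = [
    {"Sales Area": "TX", "Manager": "A. Example"},
    {"Sales Area": "Ohio", "Manager": "B. Example"},
]
area_lookup = {"TX": "Texas", "OH": "Ohio"}

cleaned = replace_cells(sales_reps, "Sales Area", area_lookup)
```

The first list plays the role of the top input port and the dictionary plays the role of the lookup table on the bottom port.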

Type Conversion

String to Number

Converts all cells of a column from type “String” to type “Double” or “Integer”.

Settings:
● The final column type: Double or Int
● The decimal separator and the thousands separator (if any)
● The names of the columns to be converted to the selected type. These columns are listed in the frame “Include”. All other columns are listed in the frame “Exclude”.
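
To see why the separators matter, here is a small Python sketch of the same conversion. It is an analogy rather than the node's implementation; the function names, parameter names, and sample values are assumptions.

```python
def string_to_number(text, target=float, decimal_sep=".", thousands_sep=","):
    """Parse a numeric string using the given separators; target is float or int."""
    cleaned = text.strip()
    if thousands_sep:
        cleaned = cleaned.replace(thousands_sep, "")
    if decimal_sep != ".":
        cleaned = cleaned.replace(decimal_sep, ".")
    return target(cleaned)


def convert_columns(rows, include, **options):
    """Convert only the columns named in `include`; all others pass through."""
    return [
        {col: string_to_number(val, **options) if col in include else val
         for col, val in row.items()}
        for row in rows
    ]


# Made-up row with a European-style price (dot for thousands, comma for decimals).
raw = [{"Product": "Widget", "List Price": "1.234,50", "Quantity": "12"}]
prices = convert_columns(raw, include={"List Price"}, decimal_sep=",", thousands_sep=".")
counts = convert_columns(raw, include={"Quantity"}, target=int)
```

The `include` set corresponds to the “Include” frame; every other column is left as a string.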

Table Manipulator

Allows for:
● Concatenation of multiple files/tables
● Column filtering
● Column sorting
● Column renaming
● Column type mapping

Value Counter
Counts the number of occurrences of all values in the selected column.
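
In Python terms, the same count can be sketched with `collections.Counter`. The function name `count_values` and the sample rows are made up for illustration.

```python
from collections import Counter


def count_values(rows, column):
    """Count how often each value occurs in the selected column."""
    return Counter(row[column] for row in rows)


# Made-up rows for illustration.
transactions = [{"State": "Texas"}, {"State": "Ohio"}, {"State": "Texas"}]
state_counts = count_values(transactions, "State")
```

A quick count like this is a handy way to spot misspelled or unexpected category values before joining tables.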

Missing Value
Replaces missing values in a data set everywhere or only in selected columns with a value of your choice.
In tab “Default”, replacement values are defined separately for numerical and string type columns and applied to all data columns of the same type.
In tab “Column Settings”, a replacement value is defined specifically for each selected data column and applied only to that column.
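
The interplay of the two tabs can be sketched as "column-specific setting first, type default otherwise". The Python below is only an analogy; that precedence, the function name `fill_missing`, its parameters, and the sample table are assumptions for this example, with `None` standing in for a missing value.

```python
def fill_missing(rows, type_defaults, column_overrides=None):
    """Fill None cells: per-column overrides win, then per-type defaults apply."""
    column_overrides = column_overrides or {}
    # Infer each column's type from its first non-missing value.
    column_types = {}
    for row in rows:
        for col, val in row.items():
            if val is not None and col not in column_types:
                column_types[col] = type(val)
    filled = []
    for row in rows:
        new_row = {}
        for col, val in row.items():
            if val is None:
                if col in column_overrides:
                    val = column_overrides[col]
                else:
                    val = type_defaults.get(column_types.get(col))
            new_row[col] = val
        filled.append(new_row)
    return filled


# Made-up table for illustration.
table = [
    {"State": "Texas", "Quantity": 5, "Discount": None},
    {"State": None, "Quantity": None, "Discount": 0.1},
]
result = fill_missing(
    table,
    type_defaults={int: 0, float: 0.0, str: "unknown"},
    column_overrides={"Quantity": 1},
)
```

Here `type_defaults` plays the role of the “Default” tab and `column_overrides` the role of the “Column Settings” tab.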

Joining Columns of Data

Joiner
Combines columns from two different tables
● Top input port: “Left” data table
● Bottom input port: “Right” data table

Outputs:
● Top port: Resulting joined table
● Middle port: Unmatched rows from the left input table (top input port)
● Bottom port: Unmatched rows from the right input table (bottom input port)

By default the two bottom output ports are deactivated
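
The three outputs can be pictured with a short Python sketch of an inner join that also collects the unmatched rows from each side. It is an analogy under assumptions: the function name `join`, the column names, and the sample data are invented, and only the inner join mode is shown.

```python
def join(left, right, left_key, right_key):
    """Inner-join two lists of row dicts; also return the unmatched rows of each side."""
    right_index = {}
    for row in right:
        right_index.setdefault(row[right_key], []).append(row)

    joined, left_unmatched, matched_keys = [], [], set()
    for lrow in left:
        matches = right_index.get(lrow[left_key], [])
        if not matches:
            left_unmatched.append(lrow)
            continue
        matched_keys.add(lrow[left_key])
        for rrow in matches:
            # On a column-name clash the left table's value wins.
            joined.append({**rrow, **lrow})
    right_unmatched = [r for r in right if r[right_key] not in matched_keys]
    return joined, left_unmatched, right_unmatched


# Made-up tables for illustration.
transactions = [{"Product": "P1", "Quantity": 5}, {"Product": "P9", "Quantity": 2}]
products = [{"Product": "P1", "List Price": 10.0}, {"Product": "P2", "List Price": 7.5}]

joined, only_left, only_right = join(transactions, products, "Product", "Product")
```

Inspecting `only_left` and `only_right` is a quick way to find keys that need cleaning before the join matches everything.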

Joiner – Join Mode

Joiner Configuration – Linking Rows

Duplicate Row Filter
Detects duplicate rows and applies a selected treatment
● First tab provides the option to select columns for duplicate detection
● Second tab provides options for treating duplicated values
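
One possible treatment, keeping the first row of each duplicate group, can be sketched in Python as follows. The function name `remove_duplicates`, the column names, and the sample rows are assumptions for illustration; other treatments are not shown.

```python
def remove_duplicates(rows, key_columns):
    """Keep the first row for each distinct combination of the key columns."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Made-up rows; the third repeats the State/Product combination of the first.
transactions = [
    {"State": "Texas", "Product": "P1", "Quantity": 5},
    {"State": "Ohio", "Product": "P1", "Quantity": 3},
    {"State": "Texas", "Product": "P1", "Quantity": 5},
]
deduplicated = remove_duplicates(transactions, ["State", "Product"])
```

Choosing `key_columns` corresponds to selecting the columns used for duplicate detection.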

Row Filter and Row Splitter

Row filtering with include and exclude options according to certain criteria:
● Select rows by attribute value (pattern matching)
○ Value matching: column value matching some predefined pattern value
○ Range checking for numerical columns: column value above or below a given value
○ Missing Value Matching
● Select rows by row number
● Select rows by RowID (pattern matching on RowID)

Each of these criteria can be used to include or to exclude rows.
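
The relationship between filtering and splitting can be sketched in Python: a splitter returns both the matching and the non-matching rows, while a filter keeps only one of the two. The function name `split_rows` and the sample rows are made up for this example.

```python
def split_rows(rows, predicate):
    """Return (matching, non_matching): a splitter keeps both, a filter only one."""
    matching = [row for row in rows if predicate(row)]
    non_matching = [row for row in rows if not predicate(row)]
    return matching, non_matching


# Made-up rows for illustration.
transactions = [
    {"State": "Texas", "Quantity": 5},
    {"State": "Ohio", "Quantity": 0},
    {"State": "Texas", "Quantity": -2},
]

# Range check on a numeric column: keep quantities above zero.
positive, rejected = split_rows(transactions, lambda row: row["Quantity"] > 0)

# Value matching on a string column, used here to exclude rows.
_, without_ohio = split_rows(transactions, lambda row: row["State"] == "Ohio")
```

The first call mirrors the task of removing quantities below or equal to zero; the second mirrors removing rows by attribute value.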

Math Formula

Row-wise calculations.
Some column-wise statistics.
Many mathematical functions.
Double-click function, then select column by click.
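
A row-wise calculation such as Sales and Profit can be sketched in Python as below. The formulas are assumptions made for this example (Sales = Quantity × List Price, Profit = Sales minus Quantity × Cost), as are the column names and numbers; the course data may define them differently.

```python
def add_sales_and_profit(rows):
    """Append Sales and Profit columns computed row by row (assumed formulas)."""
    result = []
    for row in rows:
        sales = row["Quantity"] * row["List Price"]
        profit = sales - row["Quantity"] * row["Cost"]
        result.append({**row, "Sales": sales, "Profit": profit})
    return result


# Made-up order line for illustration.
order_lines = [{"Quantity": 4, "List Price": 10.0, "Cost": 6.5}]
with_totals = add_sales_and_profit(order_lines)
```

Each output value depends only on the cells of its own row, which is what "row-wise" means here.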

Workflow: Combining and Filtering Data

Data Aggregation - GroupBy

Aggregated on Category (group) by Sum (aggregation method)

GroupBy
Aggregate rows to summarize data
● First tab provides grouping options
● Second tab provides control over aggregation details
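
Grouping by one or more columns and summing a value column can be sketched in Python like this. It is an analogy only; the function name `group_by_sum`, the output column naming, and the sample rows are assumptions.

```python
from collections import defaultdict


def group_by_sum(rows, group_columns, value_column):
    """Sum value_column for each distinct combination of the group columns."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[c] for c in group_columns)
        totals[key] += row[value_column]
    return [
        {**dict(zip(group_columns, key)), f"Sum({value_column})": total}
        for key, total in totals.items()
    ]


# Made-up rows for illustration.
sales = [
    {"Category": "Toys", "Sales": 40.0},
    {"Category": "Books", "Sales": 15.0},
    {"Category": "Toys", "Sales": 10.0},
]
by_category = group_by_sum(sales, ["Category"], "Sales")
```

The group columns correspond to the first tab and the choice of Sum as the aggregation method to the second.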

Workflow: GroupBy Aggregation

Data Aggregation - Pivoting

Pivoting

Performs pivoting on selected columns for grouping and pivoting

● Values of group columns become unique rows
● Values of the pivot columns become unique columns: one for each combination of pivot values, paired with each aggregation
● Many aggregation methods are provided (similar to GroupBy)
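
A pivot with one group column, one pivot column, and Sum as the aggregation can be sketched in Python as follows. This is a simplified analogy: the function name `pivot_sum` and the sample data are assumptions, and empty cells are filled with 0.0 here rather than left missing.

```python
from collections import defaultdict


def pivot_sum(rows, group_column, pivot_column, value_column):
    """One output row per group value, one output column per pivot value (summed)."""
    pivot_values = sorted({row[pivot_column] for row in rows})
    # Every group starts with a 0.0 cell for each pivot value.
    table = defaultdict(lambda: dict.fromkeys(pivot_values, 0.0))
    for row in rows:
        table[row[group_column]][row[pivot_column]] += row[value_column]
    return [{group_column: group, **cells} for group, cells in table.items()]


# Made-up rows for illustration.
sales = [
    {"State": "Texas", "Month": "Jan", "Sales": 40.0},
    {"State": "Texas", "Month": "Feb", "Sales": 10.0},
    {"State": "Ohio", "Month": "Jan", "Sales": 15.0},
    {"State": "Texas", "Month": "Jan", "Sales": 5.0},
]
pivoted = pivot_sum(sales, "State", "Month", "Sales")
```

Each state becomes one row and each month becomes one column, which is the reshaping described in the bullets above.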

Workflow: Pivoting

Additional Resources

● KNIME Pages (https://www.knime.com)


● RESOURCES/LEARNING HUB (https://www.knime.com/learning-hub)
● RESOURCES/NODE GUIDE (https://www.knime.com/nodeguide)
● Search Engine for Nodes (https://nodepit.com)
● FORUM for questions and answers (https://forum.knime.com)

Thank You
Contact Us

Get In Touch With Us

Office
The Manhattan Square
TB Simatupang Road
South Jakarta 12560 – Indonesia

Call Us
+6221 7822 473
+62 877-8863-6985 (Corporate & Public Sales)
+62 852-1022-9595 (Education Sales)

Send Us
contact@dataacademy.co.id
@cybertrend_data_academy
Cybertrend Data Academy

