From Alteryx To KNIME: Written By: Corey Weisinger Edited By: Someone, I Hope
This book was written for people who are familiar with Alteryx and now interested in finding out how to transition to KNIME Analytics Platform. Consider this
book a bit like a foreign language dictionary: we look at how the most commonly used tasks are spoken in “Alteryx” and then translate them into “KNIME”.
Find out, for example, how to import and manipulate data and how to perform modeling and machine learning, with sections on regressions,
clustering, neural networks, and components, to name just a few. The appendix contains a useful quick tool-to-node reference.
KNIME Interface
Node Repository
This is the equivalent of the Alteryx Tool Palette. In KNIME we call the
tools “nodes” and they can be searched for from here and dragged into
the workflow editor.
Configuration Dialog
To configure a node in KNIME, you right click the node you wish to
configure and select Configure. Unlike Alteryx, in KNIME the node
configuration window is not always open.
Workflow Editor
This is the equivalent of the Canvas in Alteryx; it’s where you drag & drop
your nodes to build your workflow.
Figure 2 Where to find the Node Monitor
Node Monitor
This optional view can be enabled by going to View > Other > Node Monitor and selecting
open. You can see where in Figure 2 to the left. Next, if you click the arrow in the Node Monitor view, you’ll see a few different options here. Feel free to
play around and see what each view displays but for now let’s use the Show Output Table option (see Figure 3). This will give you an easy-to-see view of
the output table of whichever node you have selected in your workflow, just like the normal results window in Alteryx.
The other views available allow you to see configuration settings, run time information, and Flow Variables that exist after the selected node. We’ll cover
what flow variables are later in this book but just keep the Node Monitor in mind if you’re ever getting deep into their uses!
Unconfigured node:
If the traffic light below the node is red, the node has not yet been configured and it is not ready to be executed. A yellow triangle may
show, detailing the error. In this case the node simply has not yet been configured.
Configured node:
Once the node is configured, the traffic light turns yellow. This means the node is ready to be run and just needs to be executed. Some nodes
may look like this when inserted into a workflow if they don’t need specific configuration.
Executed node:
After a node has been executed, its light turns green. At this point the data are available at the output port for viewing or further processing.
Local Files
Figure 4 The Alteryx Input Data tool and the KNIME Reader nodes
Local files, like Excel files, CSVs, PDFs, JSON, text files, and many others, are those typical files that just hang out on your hard drive. Similar to Alteryx, you
can simply drag and drop the file you want to import into the Workflow Editor; KNIME automatically inserts the correct node needed to read it in.
Let’s look at each of the KNIME nodes one at a time, see what makes each one special. I’ll give you a hint, it’s the kind of files they can read and how they
can be configured!
The File Reader can read just about any ASCII data. It automatically detects common formats.
CSV files can be read by the File Reader node, but the CSV Reader gives you more specific options.
The File Reader can handle Excel files, but the Excel Reader node lets you read specific sheets, rows, or columns.
This node uses the Apache Tika library and can read a lot of data types! Try it with Emails or …
This node, as the name suggests, is for reading JSON files. KNIME can also convert these to …
This node is for reading XML files; an XPath Query can optionally be used in configuration.
In Alteryx this is like using the Connect In-DB tool and Data Stream Out tool:
So how do we connect to the database in KNIME? This is done with a database connector node, whether for a traditional format like MySQL or a Hadoop-based
one like Impala or Hive. Once that connection is established, we can select a table in the DB Table Selector node. The DB Connector node at the far left (of
the KNIME workflow in Figure 5) is a generic connector: it references a JDBC driver and connects to anything that supports one.
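If you think in code, the connect-then-select pattern looks roughly like the sketch below. This is only an analogy, not how KNIME works internally: it uses Python's built-in sqlite3 module with an in-memory database standing in for a real server, and the `customers` table and its rows are made up for illustration.

```python
import sqlite3

# "Connector" step: open a connection. An in-memory SQLite database stands in
# for a real MySQL, Impala, or Hive server here (assumption for the demo).
conn = sqlite3.connect(":memory:")

# Hypothetical demo table so there is something to select.
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

# "Table Selector" step: choose which table to work with.
cursor = conn.execute("SELECT id, name FROM customers ORDER BY id")

# Only now are the rows pulled out of the database into local memory.
rows = cursor.fetchall()
conn.close()
```

The point of the separation is the same in both worlds: the connection details live in one place, and everything downstream only needs to know which table it is working with.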
Figure 7 Node Repository with the Query folder expanded showing the DB Query nodes
Figure 8 KNIME Nodes from the Social Media section of the Node Repository
Figure 9 The configuration dialogs for the Twitter Search and the Google Sheets Reader nodes
Keep in mind that the remote connections also work while writing data! Simply click the 3-dot icon and select add connection port. Then you’ll be free to
connect whichever remote connection you need! See page 8 for info on remote connections.
Local Files
Figure 10 Alteryx tools for writing data and the equivalent KNIME nodes
The nodes listed here are for writing local files, both standard data storage formats like CSV, Excel, and JSON, which, in Alteryx, you would write with the
Output Data tool, and images and PDFs, which you’d write with the Image and Render tools in Alteryx. Again, the main difference here is that in Alteryx your
output tool can be configured differently to perform different tasks and in KNIME we have separate nodes for these separate tasks.
Writes to a CSV, allows for delimiter, missing value pattern, quote configuration, and more.
Allows you to quickly export to XLS. For advanced options we’ll look at more nodes on …
Write values to a JSON file with this node. Optionally automatically compress the output.
Table cells can hold an XML data type. This node writes those cells out as separate files.
Some graphing nodes output images; connect them to this node to export them!
Creates a PDF version of your data table. Combine with graphs to include a snapshot of the actual …
Above you see a string of XLS Formatter nodes. These are linked together, and each changes bits of the formatting in an Excel file you’re preparing to write.
The modular nature makes it easy to customize your formatting as much as you like, as well as add or remove parts. There is a variety of nodes for this
purpose; if this is a major use case for you, check out the linked guide below for a full introduction to formatting Excel files in KNIME.
• https://www.knime.com/community/continental-nodes-for-knime-xls-formatter
Note: This exact same node can also be used to feed the DB Table Selector node when reading from a database, and, by swapping the database info in the
connector you can easily transfer from a development to a production environment.
• Table Creator simply represents the data you want to write. This could be a file you’ve read into KNIME or the output of an entire workflow
• DB Connector supplies the information for connecting to the database, e.g. login credentials
• DB Writer is where you specify the name of the table you want to write to as well as which columns you want to write to it
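The three roles listed above can be mirrored in a few lines of code. The sketch below is an analogy only, using Python's built-in sqlite3 module; the `write_table` helper, the `scores` table, and the sample rows are all hypothetical names invented for this example.

```python
import sqlite3

# "Table Creator": the data to write (hypothetical rows).
table = [
    {"name": "Ann", "city": "Austin", "score": 0.95},
    {"name": "Bob", "city": "Berlin", "score": 0.40},
]


def write_table(conn, table_name, rows, columns):
    """Create table_name if needed and insert only the chosen columns.

    Table and column names are placed into the SQL text directly, so this
    helper is only suitable for trusted, hard-coded names.
    """
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table_name} ({col_list})")
    conn.executemany(
        f"INSERT INTO {table_name} ({col_list}) VALUES ({placeholders})",
        [tuple(row[c] for c in columns) for row in rows],
    )
    conn.commit()


# "DB Connector": connection details live in one place, so pointing the
# workflow at a different database means changing only this line.
conn = sqlite3.connect(":memory:")

# "DB Writer": name the target table and pick the columns to write.
write_table(conn, "scores", table, columns=["name", "score"])

written = conn.execute("SELECT name, score FROM scores ORDER BY name").fetchall()
conn.close()
```

Note that the `city` column never reaches the database, which is the code equivalent of leaving a column out of the write configuration.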
Figure 13 Alteryx Select tool is the same as the Row Filter, Row Splitter, and Rule-based Row Filter nodes in KNIME
Allows for quick filtering by way of string pattern matching, a numeric range, or missing values. This filtering can be
performed on a column or the row ID itself and can be set as an include or exclude filter.
This node works just like the Row Filter above except that it exports both the included and excluded rows. You’ll notice
that it has two output data ports (represented by the black triangles on the right of the node). The top port is for included
rows and the lower port is for excluded rows.
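To make the include/exclude idea concrete, here is a minimal Python sketch of a filter that keeps both halves, the way the two output ports do. The function names and the sample rows are assumptions made for illustration, and `None` stands in for a missing value.

```python
def row_splitter(rows, keep):
    """Send rows where keep(row) is true to the first list, the rest to the second."""
    included, excluded = [], []
    for row in rows:
        (included if keep(row) else excluded).append(row)
    return included, excluded


def score_in_range(row, low=0.5, high=1.0):
    """Numeric range check; a missing value (None) never matches."""
    score = row["Score"]
    return score is not None and low <= score <= high


rows = [
    {"Name": "Jon Smith", "Score": 0.95},
    {"Name": "Ana Lopez", "Score": 0.42},
    {"Name": "Li Wei", "Score": None},
]

# Top port: included rows. Bottom port: excluded rows.
top_port, bottom_port = row_splitter(rows, score_in_range)
```

A plain row filter is the same operation with the second list thrown away.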
The Rule-based Row Filter node is akin to the custom filter option in Alteryx’s Filter tool. You enter a set of rules, which are
run through one by one. The first to match the row ends the process, e.g.:
$Score$ > 0.9 => TRUE
$Name$ = "Jon Smith" => TRUE
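The "first match wins" behavior is easy to see in code. The sketch below mimics the two rules above in Python; the sample rows are invented, and treating a row that matches no rule as FALSE is an assumption of this sketch rather than something stated in this book.

```python
# Ordered rules as (condition, outcome) pairs, mirroring:
#   $Score$ > 0.9 => TRUE
#   $Name$ = "Jon Smith" => TRUE
rules = [
    (lambda row: row["Score"] > 0.9, True),
    (lambda row: row["Name"] == "Jon Smith", True),
]


def first_match(row, rules, default=False):
    """Return the outcome of the first rule whose condition holds.

    `default` is used when no rule matches (assumed to be FALSE here).
    """
    for condition, outcome in rules:
        if condition(row):
            return outcome  # later rules are never evaluated for this row
    return default


rows = [
    {"Name": "Jon Smith", "Score": 0.30},  # caught by the second rule
    {"Name": "Ana Lopez", "Score": 0.95},  # caught by the first rule
    {"Name": "Li Wei", "Score": 0.50},     # matches nothing
]

kept = [row for row in rows if first_match(row, rules)]
```

Because evaluation stops at the first hit, rule order matters whenever two rules could match the same row with different outcomes.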
Figure 14 Alteryx Sort tool or the KNIME Sorter node and the KNIME Sorter node configuration dialog
You can also sort columns in KNIME; to do this, simply use the Column Resorter node. You can sort alphabetically or
manually. This may be helpful when vertically combining tables with different column names or when combining multiple
columns into a list data type with the Column Aggregator node.
Figure 15 The Alteryx Summarize tool and the KNIME GroupBy node
In this case the configuration window looks quite a bit different in KNIME. To quickly summarize, in one tab you set the columns to use for creating groups:
Figure 18 Alteryx Formula, Text to Columns & Find Replace tools and the KNIME String Manipulation, Rule Engine & Cell Splitter nodes
In this section, we touch on a few options for manipulating string data, namely the KNIME equivalents to the Formula, Text to Columns, and Find Replace
tools in Alteryx. The Formula tool is most like the String Manipulation node, which is for writing basic string alteration instructions. The Rule Engine node is
similar as well but can be used in more complicated ways, as it allows for ‘if then’ type functionality. The Text to Columns tool can be replaced by the Cell
Splitter node.
Use this node for things such as removing white space, removing punctuation, regular expressions, substring creation, capitalizing,
and more.
The Rule Engine has a lot of the same functionality as the String Manipulation node. You can use this for more control. For example, use
this to reformat strings differently based on which source they are from.
This node will take one string column and split it into multiple columns based on a specified delimiter, a comma for example. Unlike
the Alteryx equivalent, you do not need to specify the number of expected output columns. Rows with fewer than the max will simply
have missing values in the rightmost columns.
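The padding behavior described above is the part that tends to surprise people, so here is a small Python sketch of it. The `cell_splitter` function is a hypothetical stand-in written for this example, with `None` playing the role of a missing value.

```python
def cell_splitter(values, delimiter=","):
    """Split each string on the delimiter and pad short rows with None."""
    parts = [value.split(delimiter) for value in values]
    # The widest row decides how many output columns there are.
    width = max((len(p) for p in parts), default=0)
    return [p + [None] * (width - len(p)) for p in parts]


split = cell_splitter(["a,b,c", "d,e", "f"])
```

Nothing has to be declared up front: the column count falls out of the data, and the shorter rows end up with missing values on the right.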
Use the String Replacer for quick replacements or even removals inside strings. For example, configure this node to replace all
instances of “two” with 2. This node also supports regular expressions.
You can direct this node to a text file formatted as detailed in the Node Description window. There’s a little more setup here but with
it you can easily replace a large set of strings. Otherwise, it functions as the String Replacer node above.
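As a rough illustration of the two replacement styles, the Python sketch below shows a single pattern replacement (plain or regular expression) and a lookup-table replacement. Both helper functions are hypothetical; the dictionary version takes a Python dict instead of the text file the node expects, and matching whole cell values is a simplifying assumption.

```python
import re


def string_replacer(values, pattern, replacement, use_regex=False):
    """Replace every occurrence of pattern inside each string."""
    if use_regex:
        return [re.sub(pattern, replacement, value) for value in values]
    return [value.replace(pattern, replacement) for value in values]


def dictionary_replacer(values, mapping):
    """Swap whole values found in mapping; leave everything else untouched."""
    return [mapping.get(value, value) for value in values]


numbers = string_replacer(["two apples", "two by two"], "two", "2")
tidy = string_replacer(["a  b", "c   d"], r"\s+", " ", use_regex=True)
states = dictionary_replacer(
    ["NY", "CA", "TX"], {"NY": "New York", "CA": "California"}
)
```

The lookup-table version pays off once the list of replacements grows beyond what you would want to configure by hand.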
Figure 19 The different Alteryx tools and KNIME nodes for handling numeric data
There is a near endless variety of ways to manipulate numbers while preparing, analyzing, and modeling data. We’ll touch on a few common examples and
discuss how to get started with these manipulations in KNIME.
Like the Formula tool, the Math Formula node will allow you to alter numeric data with common math functions.
The Normalizer will stretch or compress your data to be within a given range, commonly 0 to 1.
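Stretching or compressing data into a range is usually done with min-max scaling. The Python sketch below shows that arithmetic; the function name is made up for this example, and mapping a constant column to the lower bound is an assumption of the sketch.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    low, high = min(values), max(values)
    if high == low:
        # Constant column: there is no spread to stretch (assumed handling).
        return [new_min for _ in values]
    return [
        (value - low) / (high - low) * (new_max - new_min) + new_min
        for value in values
    ]


normalized = min_max_normalize([10, 20, 30, 50])
```

Here 10 becomes 0, 50 becomes 1, and the values in between keep their relative spacing.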
Figure 20 Lag Column and Math Formula KNIME nodes for recreating Alteryx’ Multi-Row Formula tool
To use the Lag Column node, you first select the column to lag in the drop down menu shown on the left of Figure 21. Next, you select the Lag and Lag
Interval: you specify the number of lagged columns to create (Lag) and the number of rows to lag each time (Lag Interval). I chose Lag = 3 and
Lag Interval = 1, so I have created three columns, each lagged one from the last.
Figure 21 The Lag Column configuration dialog and the Lag Column output
If you need to lag multiple original columns, simply apply a second Lag Column node to your workflow. After you’ve created the lagged values you need for
your calculation, you can call them just like you would any other value in your formula node of choice.
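Here is what Lag = 3 with Lag Interval = 1 amounts to, sketched in Python. The `lag_column` function and the output column names are illustrative assumptions; the sketch keeps the table the same length and leaves `None` at the top, and does not model any of the node's options for incomplete rows.

```python
def lag_column(values, lag=3, lag_interval=1, name="value"):
    """Create `lag` shifted copies; copy k is shifted down by k * lag_interval rows."""
    lagged = {}
    for k in range(1, lag + 1):
        shift = k * lag_interval
        # Pad the top with missing values, then trim to the original length.
        lagged[f"{name}(-{shift})"] = ([None] * shift + values)[: len(values)]
    return lagged


values = [10, 20, 30, 40, 50]
lagged = lag_column(values, lag=3, lag_interval=1)

# A "multi-row formula": change from the previous row, using the lagged column
# like any other column.
change = [
    None if previous is None else current - previous
    for current, previous in zip(values, lagged["value(-1)"])
]
```

Once the lagged columns exist, row-to-row calculations turn into ordinary column arithmetic.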
Fig. 21: How missing data is handled in KNIME in comparison with Alteryx
Sampling Data
Whether you want to sample data to reduce execution time for analytics or to construct training sets for machine learning and modeling, there are many
options available in KNIME.
Figure 22 Sampling data with the Partitioning, Bootstrap Sampling, Equal Size Sampling, Row Sampling & Database Sampling nodes in KNIME
The Partitioning node allows you to split your data into two sets based on either a percentage or a number of records. There are a few options for how
these partitions are drawn: from the top, linear sampling, random sampling, and stratified sampling. The node description defines these terms well, so
don’t forget to look them up on the KNIME Hub, if you’re unsure. The Bootstrap Sampling node allows for the use of the bootstrapping technique for
oversampling your data artificially, creating a larger dataset. Equal size sampling requires that you pick a nominal column to define the different classes; it
then creates a sampled set with an equal number of records for each class. This can be helpful when training models based on counting algorithms like
decision trees. Finally - remember there is a Database Sampling node. Performing sampling on the database end will save time when transferring data to
KNIME for analysis.
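The sampling strategies above differ mainly in how rows are drawn. The Python sketch below shows stratified partitioning, bootstrap sampling, and equal size sampling on a toy table; every function name, the fixed seeds, and the data are assumptions made for illustration and say nothing about how the KNIME nodes are implemented.

```python
import random
from collections import defaultdict


def group_by_class(rows, class_of):
    """Bucket rows by their class label."""
    groups = defaultdict(list)
    for row in rows:
        groups[class_of(row)].append(row)
    return groups


def partition_random(rows, fraction, seed=0):
    """Random split: about `fraction` of the rows go to the first partition."""
    rng = random.Random(seed)
    order = list(range(len(rows)))
    rng.shuffle(order)
    chosen = set(order[: round(len(rows) * fraction)])
    first = [row for i, row in enumerate(rows) if i in chosen]
    second = [row for i, row in enumerate(rows) if i not in chosen]
    return first, second


def partition_stratified(rows, fraction, class_of, seed=0):
    """Split each class separately so both partitions keep the class mix."""
    first, second = [], []
    for group in group_by_class(rows, class_of).values():
        a, b = partition_random(group, fraction, seed)
        first.extend(a)
        second.extend(b)
    return first, second


def bootstrap_sample(rows, size, seed=0):
    """Draw with replacement, so the sample can be larger than the input."""
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in range(size)]


def equal_size_sample(rows, class_of, seed=0):
    """Downsample every class to the size of the smallest one."""
    rng = random.Random(seed)
    groups = group_by_class(rows, class_of)
    size = min(len(group) for group in groups.values())
    sample = []
    for group in groups.values():
        sample.extend(rng.sample(group, size))
    return sample


def label(row):
    return row[0]


# Toy table: 8 "yes" rows and 2 "no" rows.
rows = [("yes", i) for i in range(8)] + [("no", i) for i in range(2)]

train, test = partition_stratified(rows, 0.5, label)
boot = bootstrap_sample(rows, 15)
balanced = equal_size_sample(rows, label)
```

With this toy table, the stratified split keeps the 4:1 class ratio in both halves, while equal size sampling throws away "yes" rows until the two classes match.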
Figure 23 Alteryx tools vs KNIME nodes for table manipulation: join, concatenate, pivot & unpivot
Use the Concatenate node to vertically combine tables. This node will match fields by name and can
be configured to retain either a union or intersection of the columns in the two input tables.
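The union versus intersection choice is easiest to see on two tiny tables. The Python sketch below represents rows as dicts; the `concatenate` function is a hypothetical stand-in, it assumes all rows within one table share the same columns, and `None` marks a missing value.

```python
def concatenate(top, bottom, mode="union"):
    """Stack two tables, matching columns by name."""
    top_cols = list(top[0]) if top else []
    bottom_cols = list(bottom[0]) if bottom else []
    if mode == "union":
        # Keep every column seen in either table.
        columns = top_cols + [c for c in bottom_cols if c not in top_cols]
    else:
        # Keep only the columns both tables share.
        columns = [c for c in top_cols if c in bottom_cols]
    return [{c: row.get(c) for c in columns} for row in top + bottom]


a = [{"id": 1, "name": "Ann"}]
b = [{"id": 2, "city": "Berlin"}]

union = concatenate(a, b, mode="union")
intersection = concatenate(a, b, mode="intersection")
```

Union keeps everything and fills the holes with missing values; intersection keeps the table dense by dropping columns that do not appear on both sides.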
The Joiner node in KNIME is going to replace your Join tool in Alteryx. There shouldn’t be too much to
get used to here. Simply select the fields you wish to match and the type of join: inner, left outer,
right outer, or full outer.
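If the four join types blur together, the Python sketch below may help. It joins two small tables of dicts on a single key; the `join` function and the data are invented for this example, and it ignores name clashes between non-key columns.

```python
def join(left, right, key, how="inner"):
    """Join two tables on `key`; how is 'inner', 'left', 'right', or 'full'."""
    left_cols = list(left[0]) if left else []
    right_cols = list(right[0]) if right else []

    # Index the right table by key so matches are quick to look up.
    right_index = {}
    for row in right:
        right_index.setdefault(row[key], []).append(row)

    result, matched_keys = [], set()
    for l_row in left:
        matches = right_index.get(l_row[key], [])
        if matches:
            matched_keys.add(l_row[key])
            result.extend({**l_row, **r_row} for r_row in matches)
        elif how in ("left", "full"):
            # Unmatched left row: keep it, with missing values on the right.
            blanks = {c: None for c in right_cols if c != key}
            result.append({**l_row, **blanks})

    if how in ("right", "full"):
        for r_row in right:
            if r_row[key] not in matched_keys:
                # Unmatched right row: keep it, with missing values on the left.
                blanks = {c: None for c in left_cols if c != key}
                result.append({**blanks, **r_row})
    return result


people = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
cities = [{"id": 2, "city": "Berlin"}, {"id": 3, "city": "Cairo"}]

inner = join(people, cities, "id", how="inner")
left_outer = join(people, cities, "id", how="left")
right_outer = join(people, cities, "id", how="right")
full_outer = join(people, cities, "id", how="full")
```

Matched rows appear in every join type; the join type only decides what happens to the rows that found no partner.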
Configuring this node will be straightforward if you’re familiar with pivot tables; just choose three things:
the columns to be used as pivots, the contents of which will become new columns; the columns to be
used as groups, which let you aggregate the rows as you pivot; and the aggregation methods for the
fields you wish to retain.
Setting up the Unpivoting node is easy as well. Just select the columns you wish to rotate back down
into distinct rows (the value columns) and the columns with values you wish to retain (the retained
columns).
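Pivoting and unpivoting are mirror images, which the Python sketch below tries to show on a toy sales table. The functions, the data, and the output column names `ColumnName` and `ColumnValue` are assumptions made for this illustration.

```python
from collections import defaultdict


def pivot(rows, group_col, pivot_col, value_col, aggregate=sum):
    """One output row per group, one new column per distinct pivot value."""
    cells = defaultdict(list)
    groups, pivots = [], []
    for row in rows:
        g, p = row[group_col], row[pivot_col]
        if g not in groups:
            groups.append(g)
        if p not in pivots:
            pivots.append(p)
        cells[(g, p)].append(row[value_col])
    return [
        {
            group_col: g,
            **{
                p: aggregate(cells[(g, p)]) if (g, p) in cells else None
                for p in pivots
            },
        }
        for g in groups
    ]


def unpivot(rows, value_cols, retained_cols):
    """Rotate the value columns back down into one row per (row, column) pair."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            retained = {c: row[c] for c in retained_cols}
            long_rows.append({**retained, "ColumnName": col, "ColumnValue": row[col]})
    return long_rows


sales = [
    {"region": "East", "quarter": "Q1", "amount": 10},
    {"region": "East", "quarter": "Q1", "amount": 5},
    {"region": "East", "quarter": "Q2", "amount": 7},
    {"region": "West", "quarter": "Q1", "amount": 3},
]

wide = pivot(sales, group_col="region", pivot_col="quarter", value_col="amount")
long = unpivot(wide, value_cols=["Q1", "Q2"], retained_cols=["region"])
```

Note that the round trip is not lossless: pivoting aggregated the two East/Q1 rows into one total, and unpivoting cannot split them apart again.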
Figure 24 The different options in KNIME to document your workflow and keep it organized
Node Comments
By double clicking the text underneath a node you can edit the comment. Use this to note changes or just to give more detail on exactly what the node is
doing in your workflow. You can comment nodes, metanodes, and components.
Workflow Annotations
Workflow annotations are colored boxes you can place over your workflow, as can be seen in many KNIME examples. A common use is to clearly separate
sections of your workflow into data loading, ETL, modeling, and predicting. This makes it easy for colleagues to quickly identify the area they’re looking for.
You can customize the color of the border, the background, and text font / size.
Metanodes
Metanodes are like a subfolder inside a workflow. They are a container around a selection of nodes. To create a metanode simply highlight all the nodes
you want to put inside and right click to select Collapse into Metanode. This won’t affect how your workflow runs at all, it simply helps to structure the view
visually. When you collapse your nodes into a metanode you can select what to name the metanode: this is the text that appears above the node. You can
also comment your metanodes just like normal nodes by double clicking beneath them.
Now, of course, to generate a successful model for deployment, you’ll want to make sure you’ve cleaned up your data and completed any feature
engineering you might want to do first, but this is how training a model will look in KNIME. Pretty straightforward right?
Figure 25 Part of a KNIME workflow in which a model is built using the Learner, Predictor, and Scorer nodes
Figure 26 Nodes that support the training and deployment of tree-based models in KNIME
Both regression and classification trees are supported, as well as their ensembles, such as random or boosted forests. In KNIME you can use KNIME-
specific implementations of these algorithms as well as those from several other popular open source tools, such as H2O, XGBoost, and Spark.
Customization of these models is also quite robust, with the ability to set minimal tree node sizes, maximal tree depth, and more. The two primary
learning methods supported are Information Gain and Gini Index.
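Both measures score how much a candidate split cleans up the class labels. The Python sketch below computes them with the standard textbook formulas; it illustrates the idea only and is not taken from any KNIME implementation. The labels are made-up toy data.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum(
        (count / n) * math.log2(count / n) for count in Counter(labels).values()
    )


def split_gain(parent, children, impurity):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


parent = ["yes"] * 4 + ["no"] * 4
perfect_split = [["yes"] * 4, ["no"] * 4]
mixed_split = [["yes", "yes", "yes", "no"], ["yes", "no", "no", "no"]]

info_gain_perfect = split_gain(parent, perfect_split, entropy)
gini_gain_perfect = split_gain(parent, perfect_split, gini)
gini_gain_mixed = split_gain(parent, mixed_split, gini)
```

Whichever measure you pick, the tree learner prefers the split with the larger gain, so the perfect split beats the mixed one under both.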
Let’s look at how to set up hierarchical clustering. Unlike other techniques where you use a learner and a predictor, we’ll require three steps here. First, we
need to calculate distances using a distance node. Note that there’s a separate node for string and numeric distances: pick whichever suits your data.
Second, we’ll use those distances in the Hierarchical Clustering (DistMatrix) node to create the cluster tree. Then finally, the Hierarchical Cluster Assigner
node assigns the actual cluster values to each row based on either a number of clusters or a maximum distance, which you can set in the configuration
dialog, as shown in Figure 28.
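The three steps map onto three small functions in the Python sketch below. It is a toy version under stated assumptions: Euclidean distance, single linkage (merge the two clusters whose closest members are nearest), and a cut by number of clusters only, not by maximum distance. All function names and the points are invented for illustration.

```python
import math
from itertools import combinations


def distance_matrix(points):
    """Step 1: pairwise Euclidean distances, keyed by (i, j) with i < j."""
    return {
        (i, j): math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }


def build_tree(n, dist):
    """Step 2: record merges, always joining the two closest clusters."""

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(dist[(min(i, j), max(i, j))] for i in a for j in b)

    clusters = [frozenset([i]) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


def assign_clusters(n, merges, k):
    """Step 3: replay merges until k clusters remain, then label each row."""
    clusters = [frozenset([i]) for i in range(n)]
    for a, b, _ in merges:
        if len(clusters) <= k:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    labels = [None] * n
    # Number clusters by their smallest row index so labels are stable.
    for label, members in enumerate(sorted(clusters, key=min)):
        for i in members:
            labels[i] = label
    return labels


points = [(0, 0), (0, 1), (5, 5), (5, 6.5), (10, 0)]
dist = distance_matrix(points)
tree = build_tree(len(points), dist)
three_clusters = assign_clusters(len(points), tree, k=3)
two_clusters = assign_clusters(len(points), tree, k=2)
```

The useful part of the three-step design shows up in the last two lines: the tree is built once, and trying a different number of clusters only repeats the cheap assignment step.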
Figure 28 Hierarchical Clustering Example workflow and Cluster Assigner Configuration Dialog
Figure 30 KNIME nodes for evaluating the success of your models, including the model interpretability tools, such as LIME and Shapley
Figure 31 Parameter Optimization Loop Start Configuration Dialog (left) and Parameter Optimization Loop End Configuration Dialog (right)
Figure 32 Parameter Optimization example workflow: retrains a Random Forest model with different numbers of trees and different maximum tree depths
using a brute force method, maximizing accuracy.
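A brute force search simply tries every combination of parameter values and keeps the best one. The Python sketch below shows that loop; `toy_accuracy` is a made-up stand-in for "train a Random Forest and score it", and the parameter names and grid values are assumptions for this example.

```python
from itertools import product


def brute_force_search(objective, grid):
    """Evaluate every parameter combination and return the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)  # one "loop iteration": train and score
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_accuracy(n_trees, max_depth):
    """Hypothetical accuracy surface that peaks at 100 trees and depth 6."""
    return 0.9 - abs(n_trees - 100) / 1000 - abs(max_depth - 6) / 100


grid = {"n_trees": [50, 100, 150], "max_depth": [2, 4, 6, 8]}
best_params, best_score = brute_force_search(toy_accuracy, grid)
```

With 3 tree counts and 4 depths this runs 12 trainings; the cost multiplies with every parameter you add, which is the price of the brute force approach.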
KNIME has very similar functionality through the KNIME WebPortal. The WebPortal is
part of KNIME Server and is accessed through a web browser. Building these
WebPortal-enabled workflows is easy, and I’ll summarize the steps, but first let’s talk
about Widgets and Components.
In Figure 33 you’ll see one page of a WebPortal example called Guided Visualization,
this is available on the KNIME Hub for you to try out. The view you see is a single
component, and, as you continue to move through the WebPortal each new page is
based on a new component in your workflow. In this way, the WebPortal user can
move through your workflow at specific interaction points you have defined. This
example allows a user to upload a data file of their choice and then create custom
visualizations taken from the WebPortal. This is perfect for speeding up presentation
design for a marketing or sales team!
These WebPortal applications are built in just the same way as you would build a
workflow in KNIME Analytics Platform and then deployed to KNIME Server for use.
Macros in Alteryx are your way to create what are, in effect, custom tools. You’ll build a
section of your workflow using special tools, which allow for interaction, and then wrap
them up into a macro that can be used in another workflow.
These components can be saved either locally or to a Server for repeated use and sharing. Do this by right clicking on your component, expanding the
component line, and then choosing the Share option.
To create a component intended for use locally, the only major addition is a Configuration node, found in the Workflow Abstraction folder. These
configuration nodes behave a lot like some of their widget counterparts, with one exception: instead of displaying in the WebPortal, they display in the
configuration dialog when you right click the finished component, which then behaves just like a regular KNIME node. See Figure 34 below for an example.
Figure 34 Right click and select Create Component… to condense a set of nodes into a Component.
Widget nodes:
Widgets can be found under Workflow Abstraction > Widgets in the node repository.
They come in a few different categories represented by the subfolders you see to the
right: input, selection, filter, and output. Input enables users to provide Flow Variables
for use in the workflow; this could be in the form of a dropdown selection for strings, a
slider for numeric values, a calendar selector for Date & Time, etc. Input also contains
the file upload node to allow the user to supply their own data. Selection allows the user
to set things such as filter parameters, or column(s) for processing. Filter includes more
interactive options for filtering, these can be tied to graphical displays for dynamic
views. Finally, output allows for end outputs such as downloadable files, images, or text.
Figure 37 Counting Loop example and configuration window of Count Loop Start node
The image above uses the Counting Loop Start node. This loop start variant simply loops a given number of times, which you can set in its configuration
window. Let’s look at a couple of other types of loops in KNIME to get you more familiar with what’s possible. These are only three types; there are several
more you can explore as well.
The Group Loop works a lot like the GroupBy node, which we looked at earlier in this booklet. You
select a set of columns to use to group your data, but instead of setting a manual aggregation method
for the data in that group you gain access to the groups one by one as you iterate through the Group
The Recursive Loop is special in that it is the only type of loop that can pass data back to the start to
be used in the next iteration. It must be paired with a Recursive Loop End node where you’ll declare
what gets sent back to the next iteration.
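The defining feature, feeding one iteration's output into the next, looks like this in a Python sketch. The `recursive_loop` function, the stopping rule, and the halving step are all invented for illustration; the iteration cap is an assumption added so the sketch always terminates.

```python
def recursive_loop(table, body, stop_when, max_iterations=100):
    """Run body repeatedly, feeding each result back in as the next input."""
    iterations = 0
    while iterations < max_iterations:
        table = body(table)  # this pass's output becomes the next pass's input
        iterations += 1
        if stop_when(table):
            break
    return table, iterations


def halve(table):
    """Loop body for the demo: halve every value."""
    return [value / 2 for value in table]


result, iterations = recursive_loop(
    [8.0, 4.0], halve, stop_when=lambda table: max(table) < 1
)
```

Compare this with a counting loop, where every iteration starts from the same input: here the data itself evolves from pass to pass.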
The Table Row to Variable Loop doesn’t supply data to each iteration like the others. It iterates over
each row of the table providing the values inside that row as flow variables. A popular use of this node
is to combine it with the List Files node and the Excel Reader node to easily read and concatenate an
entire directory of files.
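The read-a-whole-folder pattern can be sketched in Python as follows. To stay within the standard library, the sketch reads CSV files rather than Excel files; the helper names, the `Location` variable name, and the two demo files are assumptions made for this example.

```python
import csv
import tempfile
from pathlib import Path


def list_files(directory, pattern="*.csv"):
    """Stand-in for a file-listing step: one row per file found."""
    return [{"Location": str(path)} for path in sorted(Path(directory).glob(pattern))]


def read_all(file_table):
    """Loop over the rows; each row's values act as that iteration's variables."""
    combined = []
    for variables in file_table:
        # The reader is "configured" by the variable instead of by hand.
        with open(variables["Location"], newline="") as handle:
            combined.extend(csv.DictReader(handle))
    return combined  # all iterations concatenated into one table


with tempfile.TemporaryDirectory() as folder:
    # Create two small demo files so the sketch is self-contained.
    demo_files = {"a.csv": [("1", "Ann")], "b.csv": [("2", "Bob")]}
    for file_name, rows in demo_files.items():
        with open(Path(folder) / file_name, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["id", "name"])
            writer.writerows(rows)

    combined = read_all(list_files(folder))
```

The loop body never mentions a specific file name, so dropping a third file into the folder needs no change to the workflow.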
Figure 38 Example workflow using a flow variable to name an Excel file to be written
Figure 39 Example workflow using flow variables to control the number of clusters in a k-Means clustering node
You see in this workflow that instead of creating a Flow Variable with a configuration node manually the red variable line starts with a Table Row to
Variable Loop Start node. We briefly touched on this node in the loop section as well but basically what it will do is convert each column in a table to a
variable and iterate through them one row at a time allowing you to build a workflow that performs many similar tasks quickly and easily. In this case we’re
passing a variable into the K-Means clustering node to change how many clusters it creates and collecting that information, along with some info from the
Entropy Scorer node at the end of the Loop to help us decide how to cluster our data.
Bool X X
INT X X
Decimal X X Both having multiple options for precision
Complex Number X
String X X Alteryx having multiple options for storage efficiency
Nominal X
Data and/or Time X X Dates, Times, or Date / Times
Spatial Objects X Points, Lines, and Polygons
Network / Graph X X
Audio X .wav format
Image X
Document X Includes text and meta data for text mining
Collection X List of values in single table cell
https://www.knime.com/faq
KNIME Hub
The perfect place to search for nodes or example workflows when you’re not quite sure what you need yet.
https://hub.knime.com/
Forum
Come here to engage in community discussion, submit feature requests, ask for help, or help others yourself!
https://forum.knime.com/
Blogs
A collection of blog posts covering data science with KNIME, a great space to learn what KNIME can really do.
https://www.knime.com/blog
Learning Hub
A central spot to access education material to get you started with KNIME.
https://www.knime.com/learning-hub
KNIME TV
Our very own YouTube channel with everything from community news, to webinars, to mini lessons.
https://www.youtube.com/user/KNIMETV
KNIME Press
Information on all our available books, like this one!
https://www.knime.com/knimepress
https://www.knime.com/learning/events