
Talend Quick Book

Talend is a software integration platform that provides solutions for data integration, quality, management, preparation and big data. It offers various commercial products and Talend Open Studio, which is a free and open source ETL tool used widely for data integration and big data tasks. The document discusses Talend Open Studio's system requirements, installation process, benefits, and how it can be used to create ETL jobs using various components for tasks like database integration, file processing and data transformation.

Uploaded by

kailash yadav
Copyright © All Rights Reserved

Talend is a software integration platform that provides solutions for Data
Integration, Data Quality, Data Management, Data Preparation and Big Data. Demand
for ETL professionals with knowledge of Talend is high. Talend also offers plugins
that make it easy to integrate with the Big Data ecosystem.
According to Gartner, Talend is positioned in the Leaders quadrant of the Magic
Quadrant for Data Integration Tools.
Talend offers various commercial products as listed below −

• Talend Data Quality
• Talend Data Integration
• Talend Data Preparation
• Talend Cloud
• Talend Big Data
• Talend MDM (Master Data Management) Platform
• Talend Data Services Platform
• Talend Metadata Manager
• Talend Data Fabric
Talend also offers Talend Open Studio, a free, open source tool widely used for
Data Integration and Big Data.

Talend - System Requirements


The following are the system requirements to download and work with Talend Open
Studio −

Recommended Operating Systems
• Microsoft Windows 10
• Ubuntu 16.04 LTS
• Apple macOS 10.13/High Sierra
Memory and Storage Requirements
• Memory − Minimum 4 GB, Recommended 8 GB
• Storage Space − 30 GB
Besides, you also need an up-and-running Hadoop cluster (preferably Cloudera).
Note − Java 8 must be installed, with its environment variables already set.
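Before installing, you can confirm the Java requirement by asking the JVM for its version. The following is a minimal Java sketch, not part of Talend. It assumes that `JAVA_HOME` is the environment variable you set, which is a common convention rather than something this guide specifies.

```java
// A minimal sketch that checks whether the running JVM is Java 8 and whether
// JAVA_HOME is set. JAVA_HOME is an assumed convention; Talend does not ship this class.
class JavaVersionCheck {

    // Extracts the major version from a "java.version" string.
    // Java 8 and earlier use the legacy "1.x" scheme (e.g. "1.8.0_292"),
    // while Java 9+ start with the major number (e.g. "11.0.2" or "17").
    static int majorVersion(String version) {
        String[] parts = version.split("[._-]");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        String javaHome = System.getenv("JAVA_HOME");
        System.out.println("java.version = " + version);
        System.out.println("JAVA_HOME    = " + (javaHome == null ? "(not set)" : javaHome));
        if (majorVersion(version) != 8) {
            System.out.println("Warning: this guide assumes Java 8.");
        }
    }
}
```

Run it with the same `java` executable that Talend Open Studio will use. This way you check that JVM, not some other JVM installed on the machine.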

Talend - Installation
To download and install Talend Open Studio for Big Data and Data Integration, follow
the steps given below −
Step 1 − Go to the page https://www.talend.com/products/big-data/big-data-open-studio/
and click the download button. The TOS_BD_xxxxxxx.zip file starts downloading.
Step 2 − After the download finishes, extract the contents of the zip file. This creates
a folder with all the Talend files in it.
Step 3 − Open the Talend folder and double-click the executable file
TOS_BD-win-x86_64.exe. Accept the User License Agreement.

Step 4 − Create a new project and click Finish.

Step 5 − Click Allow Access if you get a Windows Security Alert.
Step 6 − The Talend Open Studio welcome page opens.
Step 7 − Click Finish to install the required third-party libraries.

Step 8 − Accept the terms and click Finish.

Step 9 − Click Yes.


Talend Open Studio is now ready, with the necessary libraries installed.

Talend Open Studio


Talend Open Studio is a free, open source ETL tool for Data Integration and Big Data.
It is an Eclipse-based developer tool and job designer. You just need to drag and drop
components and connect them to create and run ETL or ELT jobs. The tool creates
the Java code for the job automatically, so you do not need to write a single line of code.
There are multiple options to connect with data sources such as RDBMS, Excel, SaaS,
the Big Data ecosystem, as well as apps and technologies like SAP, CRM, Dropbox and
many more.
Some important benefits that Talend Open Studio offers are as follows −
• Provides all the features needed for data integration and synchronization, with 900 components, built-in connectors, automatic conversion of jobs to Java code and much more.
• The tool is completely free, so the cost savings are significant.
• Over the last 12 years, multiple large organizations have adopted TOS for data integration, which shows a high level of trust in this tool.
• The Talend community for Data Integration is very active.
• Talend keeps adding features to these tools, and the documentation is well structured and easy to follow.

Talend - Data Integration


Most organizations get data from multiple places and store it separately. When the
organization needs to make a decision, it has to take data from the different sources,
put it into a unified view and then analyze it to get a result. This process is called
Data Integration.
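You can sketch the idea of a unified view in a few lines of Java. This is a hypothetical illustration, not Talend code. The two sources (a CRM export and a billing system), the field names and the use of a lowercased e-mail address as the matching key are all assumptions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// A hypothetical data integration sketch: two sources describe the same
// customers, and rows are merged into one record per normalized e-mail key.
class UnifiedView {

    static class Customer {
        final String email;
        String name;   // taken from the (assumed) CRM source
        String city;   // taken from the (assumed) billing source

        Customer(String email) { this.email = email; }

        @Override
        public String toString() {
            return email + "," + name + "," + city;
        }
    }

    // Normalizes the matching key so " Ann@Example.com" and "ann@example.com" match.
    static String key(String email) {
        return email.trim().toLowerCase(Locale.ROOT);
    }

    // crmRows hold {email, name}; billingRows hold {email, city}.
    static Map<String, Customer> integrate(List<String[]> crmRows, List<String[]> billingRows) {
        Map<String, Customer> unified = new LinkedHashMap<>();
        for (String[] row : crmRows) {
            unified.computeIfAbsent(key(row[0]), Customer::new).name = row[1];
        }
        for (String[] row : billingRows) {
            unified.computeIfAbsent(key(row[0]), Customer::new).city = row[1];
        }
        return unified;
    }

    public static void main(String[] args) {
        List<String[]> crm = Arrays.asList(
                new String[] {"ann@example.com", "Ann"},
                new String[] {"bob@example.com", "Bob"});
        List<String[]> billing = Arrays.asList(
                new String[] {"ANN@example.com ", "Pune"},
                new String[] {"cara@example.com", "Delhi"});
        for (Customer c : integrate(crm, billing).values()) {
            System.out.println(c);
        }
    }
}
```

Customers that appear in only one source keep `null` for the fields the other source would have supplied. In a real integration project, how to handle such gaps is a design decision of its own.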

Benefits
Data Integration offers many benefits as described below −
• Improves collaboration between different teams in the organization
trying to access organization data.
• Saves time and eases data analysis, as the data is integrated effectively.
• Automated data integration process synchronizes the data and eases
real time and periodic reporting, which otherwise is time consuming if
done manually.
• Data that is integrated from several sources matures and improves over time, which eventually leads to better data quality.
Working with Projects
In this section, let us understand how to work on Talend projects −

Creating a Project
Double-click the TOS Big Data executable file to open the project window.
Select the Create a new project option, enter the name of the project and click
Create.

Select the project you created and click Finish.


Importing a Project
Double-click the TOS Big Data executable file to open the project window. Select
the Import a demo project option and click Select.

Choose one of the available demo projects. Here we are choosing Data
Integration Demos. Now, click Finish.
Next, enter the project name and description. Click Finish.

Your imported project appears in the list of existing projects.
Now, let us understand how to import an existing Talend project.
Select the Import an existing project option and click Select.

Enter the project name and select the “Select root directory” option.
Browse to your existing Talend project home directory and click Finish.
Your existing Talend project will be imported.

Opening a Project
Select a project from the list of existing projects and click Finish. This opens that
Talend project.

Deleting a Project
To delete a project, click Manage Connections.
Click Delete Existing Project(s).

Select the project you want to delete and click OK.


Click OK again.

Exporting a Project
Click the Export project option.
Select the project you want to export and enter the path where it should be exported.
Click Finish.
Talend - Model Basics
A Business Model is a graphical representation of a data integration project. It is a
non-technical representation of the business workflow.

Why Do You Need a Business Model?


A business model shows higher management what you are doing, and it also helps
your team understand what you are trying to accomplish. Designing a business model
is considered one of the best practices that organizations adopt at the beginning of
a data integration project. Besides helping to reduce costs, it helps you find and
resolve the bottlenecks in your project. The model can be modified during and after
the implementation of the project, if required.

Creating Business Model in Talend Open Studio


Talend Open Studio provides multiple shapes and connectors to create and design a
business model. Each module in a business model can have documentation attached
to it.
Talend Open Studio offers the following shapes and connector options for creating a
business model −
• Decision − This shape is used to put an if condition in the model.
• Action − This shape is used to show any transformation, translation or formatting.
• Terminal − This shape shows the output terminal type.
• Data − This shape is used to show the data type.
• Document − This shape is used to insert a document object, which can be used for the input/output of the processed data.
• Input − This shape is used to insert an input object through which the user can pass data manually.
• List − This shape contains the extracted data, and it can be defined to hold only certain kinds of data.
• Database − This shape is used to hold the input/output data.
• Actor − This shape symbolizes the individuals involved in decision making and technical processes.
• Ellipse − Inserts an Ellipse shape.
• Gear − This shape shows the manual programs that have to be replaced by Talend jobs.

Talend - Components for Data Integration


All the operations in Talend are performed by connectors and components. Talend
offers 800+ connectors and components to perform various operations. These
components are available in the Palette and are grouped into 21 main categories.
You can choose a connector and simply drag and drop it into the designer pane. Talend
then creates the Java code automatically, and this code gets compiled when you save the job.
The main categories that contain components are shown in the Palette.
The following is a list of widely used connectors and components for data
integration in Talend Open Studio −
• tMysqlConnection − Connects to the MySQL database defined in the component.
• tMysqlInput − Runs a database query to read a database and extract fields (tables, views etc.) depending on the query.
• tMysqlOutput − Used to write, update or modify data in a MySQL database.
• tFileInputDelimited − Reads a delimited file row by row, splits each row into separate fields and passes them to the next component.
• tFileInputExcel − Reads an Excel file row by row, splits each row into separate fields and passes them to the next component.
• tFileList − Gets all the files and directories matching a given file mask pattern.
• tFileArchive − Compresses a set of files or folders into a zip, gzip or tar.gz archive file.
• tRowGenerator − Provides an editor where you can write functions or choose expressions to generate your sample data.
• tMsgBox − Displays a dialog box with the specified message and an OK button.
• tLogRow − Monitors the data being processed. It displays the data/output in the run console.
• tPreJob − Defines the subjobs that run before your actual job starts.
• tMap − Works as a plugin within Talend Studio. It takes data from one or more sources, transforms it, and then sends the transformed data to one or more destinations.
• tJoin − Joins two tables by performing inner and outer joins between the main flow and the lookup flow.
• tJava − Lets you use your own Java code in a Talend job.
• tRunJob − Manages complex job systems by running one Talend job from within another.
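Here is a conceptual Java sketch of the difference between the inner and outer joins that tJoin performs. It is not Talend's generated code. The `LookupJoin` class, its `innerJoin` flag and the sample rows are hypothetical. The sketch loads the lookup flow into a hash map and then checks each main-flow row against it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A conceptual sketch of joining a main flow with a lookup flow.
// The innerJoin flag stands in for the choice between inner and outer join.
class LookupJoin {

    // mainRows hold {id, name}; lookupRows hold {id, country}.
    // Each output row is {id, name, country}.
    static List<String[]> join(List<String[]> mainRows, List<String[]> lookupRows, boolean innerJoin) {
        Map<String, String> lookup = new HashMap<>();
        for (String[] row : lookupRows) {
            lookup.put(row[0], row[1]);   // in this sketch, a later duplicate key overwrites an earlier one
        }
        List<String[]> out = new ArrayList<>();
        for (String[] row : mainRows) {
            String match = lookup.get(row[0]);
            if (match == null && innerJoin) {
                continue;                 // inner join: drop main rows without a match
            }
            out.add(new String[] {row[0], row[1], match}); // outer join: null when unmatched
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> customers = Arrays.asList(
                new String[] {"1", "Ann"}, new String[] {"2", "Bob"});
        List<String[]> countries = Arrays.asList(new String[] {"1", "India"});
        for (boolean inner : new boolean[] {true, false}) {
            System.out.println(inner ? "Inner join:" : "Left outer join:");
            for (String[] r : join(customers, countries, inner)) {
                System.out.println(String.join(",", r));
            }
        }
    }
}
```

The inner join keeps only Ann, because Bob has no lookup match. The left outer join keeps both rows and leaves Bob's country empty.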

Talend - Job Design


A job design is the technical implementation, and the graphical representation, of the
business model. In a job design, one or more components are connected to each other
to run a data integration process. When you drag and drop components into the design
pane and connect them with connectors, the job design converts everything into code
and creates a complete runnable program that forms the data flow.
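You can picture the data flow that a job design describes as plain Java method calls. The sketch below is only a conceptual model of the Excel-to-Excel job built in this chapter, not the code Talend actually generates. The `MiniJob` class and its methods are hypothetical stand-ins for tFileInputExcel, tLogRow and tFileOutputExcel, and in-memory rows replace the Excel files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// A conceptual model of a job: input -> log -> output, connected by "Main" rows.
// All names here are hypothetical; this is not Talend-generated code.
class MiniJob {

    // Stand-in for an input component such as tFileInputExcel.
    static List<String[]> readInput() {
        return Arrays.asList(new String[] {"1", "Ann"}, new String[] {"2", "Bob"});
    }

    // Stand-in for tLogRow in Basic mode: prints the row using the given
    // field separator and passes the row on unchanged.
    static String[] logRow(String[] row, String separator) {
        System.out.println(String.join(separator, row));
        return row;
    }

    // Stand-in for an output component such as tFileOutputExcel.
    static void writeOutput(String[] row, List<String[]> sink) {
        sink.add(row);
    }

    // Every row flows through the connected components in order.
    static List<String[]> run() {
        List<String[]> sink = new ArrayList<>();
        for (String[] row : readInput()) {
            writeOutput(logRow(row, ","), sink);
        }
        return sink;
    }

    public static void main(String[] args) {
        System.out.println("Rows written: " + run().size());
    }
}
```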

Creating a Job
In the Repository window, right-click Job Designs and click Create Job.
Provide the name, purpose and description of the job and click Finish.

Your new job now appears under Job Designs.
Now, let us use this job to add, connect and configure components. Here, we
will take an Excel file as input and produce an Excel file with the same data as
output.

Adding Components to a Job


The Palette offers several components to choose from. You can also use the search
option to enter the name of a component and select it.
Since we are taking an Excel file as input, drag and drop the
tFileInputExcel component from the Palette to the Designer window.
Now, if you click anywhere in the Designer window, a search box appears. Find
tLogRow and select it to bring it into the Designer window.

Finally, select the tFileOutputExcel component from the Palette and drag and drop it into
the Designer window.
All the components are now added.

Connecting the Components


After adding the components, you must connect them. Right-click the first component,
tFileInputExcel, and draw a Main line to tLogRow.

Similarly, right-click tLogRow and draw a Main line to tFileOutputExcel. Your
components are now connected.
Configuring the Components
After adding and connecting the components in the job, you need to configure them.
To do this, double-click the first component, tFileInputExcel. Enter the path
of your input file in File name/stream.
If the first row of your Excel file contains the column names, put 1 in the Header option.
Click Edit schema and add the columns and their types according to your input Excel file.
Click OK after adding the schema.

Click Yes.
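The following sketch shows how the Header and Edit schema settings affect the incoming rows. This is a hypothetical Java illustration, assuming the sheet has already been read into string cells. `SchemaReader`, `Column` and `ColumnType` are not Talend APIs. The sketch treats Header as the number of leading rows to skip, and the schema maps each column, in order, to a name and a type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// A hypothetical sketch of applying a header-row count and a typed schema
// to rows that were already read as strings.
class SchemaReader {

    enum ColumnType { INTEGER, STRING, DOUBLE }

    static class Column {
        final String name;
        final ColumnType type;
        Column(String name, ColumnType type) { this.name = name; this.type = type; }
    }

    // Converts one cell according to the column type declared in the schema.
    static Object convert(String cell, ColumnType type) {
        switch (type) {
            case INTEGER:
                return Integer.valueOf(cell.trim());
            case DOUBLE:
                return Double.valueOf(cell.trim());
            default:
                return cell;
        }
    }

    // Skips `header` rows, then turns each remaining row into a named, typed record.
    static List<Map<String, Object>> parse(List<String[]> rows, int header, List<Column> schema) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (String[] row : rows.subList(Math.min(header, rows.size()), rows.size())) {
            Map<String, Object> record = new LinkedHashMap<>();
            for (int i = 0; i < schema.size(); i++) {
                Column col = schema.get(i);
                record.put(col.name, convert(row[i], col.type));
            }
            out.add(record);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> sheet = Arrays.asList(
                new String[] {"id", "name", "salary"},   // column names in the first row
                new String[] {"1", "Ann", "5000.5"},
                new String[] {"2", "Bob", "4200"});
        List<Column> schema = Arrays.asList(
                new Column("id", ColumnType.INTEGER),
                new Column("name", ColumnType.STRING),
                new Column("salary", ColumnType.DOUBLE));
        // Header = 1 because the first row holds the column names.
        for (Map<String, Object> record : parse(sheet, 1, schema)) {
            System.out.println(record);
        }
    }
}
```

If Header were left at 0 here, the sketch would try to convert the text "id" to an integer and fail. A header row that is not skipped causes type errors of this kind.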

In the tLogRow component, click Sync columns and select the mode in which you want
to display the rows from your input. Here we have selected Basic mode with “,” as the
field separator.

Finally, in the tFileOutputExcel component, enter the path and file name where you want to
store your output Excel file, along with the sheet name. Click Sync columns.

Executing the Job


Once you are done adding, connecting and configuring your components, you
are ready to execute your Talend job. Click the Run button to begin the execution.

The output is displayed in Basic mode with the “,” separator.
Your output is also saved as an Excel file at the output path you
specified.

Talend - Metadata
Metadata basically means data about data. It describes the what, when, why, who,
where, which and how of data. In Talend, metadata holds all the information about
the data present in Talend Studio. The Metadata option is located in the
Repository pane of Talend Open Studio.
Talend Metadata covers various sources, such as DB connections, different kinds of files,
LDAP, Azure, Salesforce, Web Services, FTP, Hadoop Cluster and many more.
The main use of metadata in Talend Open Studio is that you can reuse these data
sources in several jobs with a simple drag and drop from Metadata in the Repository
panel.

Talend - Context Variables


Context variables are variables that can have different values in different
environments. You can create a context group that holds multiple context
variables. Instead of adding each context variable to a job one by one, you can simply
add the context group to the job.
These variables make the code production ready. By using context variables, you can
move the code between development, test and production environments, and it will
run in all of them.
In any job, you can go to the Contexts tab and add context variables.
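You can sketch the idea behind a context group in Java as a lookup from environment to variable values. This is a hypothetical illustration, not Talend's implementation. The `ContextGroup` class, the environment names and the `db_host` variable are assumptions. Inside a Talend job, context variables are typically referenced in component fields as `context.<name>`, for example `context.db_host`.

```java
import java.util.HashMap;
import java.util.Map;

// A hypothetical sketch of a context group: the job refers to variables by
// name, and the selected environment decides their values.
class ContextGroup {

    private final Map<String, Map<String, String>> environments = new HashMap<>();

    // Defines the value of one variable for one environment.
    void set(String environment, String variable, String value) {
        environments.computeIfAbsent(environment, e -> new HashMap<>()).put(variable, value);
    }

    // Returns the value of `variable` in `environment`, failing loudly if it is missing.
    String get(String environment, String variable) {
        Map<String, String> values = environments.get(environment);
        if (values == null || !values.containsKey(variable)) {
            throw new IllegalArgumentException(variable + " is not defined for " + environment);
        }
        return values.get(variable);
    }

    public static void main(String[] args) {
        ContextGroup db = new ContextGroup();
        db.set("Development", "db_host", "localhost");
        db.set("Production", "db_host", "db.prod.example.com");

        // The job logic stays the same; only the selected environment changes.
        String active = args.length > 0 ? args[0] : "Development";
        System.out.println("Connecting to " + db.get(active, "db_host"));
    }
}
```

The sketch fails fast on a missing variable. This is a deliberate choice: a silently empty host name is harder to diagnose than an explicit error.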
Talend - Managing Jobs
In this chapter, let us look into managing jobs and the corresponding functionalities
included in Talend.

Activating/Deactivating a Component
Activating or deactivating a component is very simple. Select the component,
right-click it, and choose the option to activate or deactivate that
component.

Importing/Exporting Items and Building Jobs


To export items from a job, right-click the job in Job Designs and click Export
items.
Enter the path where you want to export the items and click Finish.
To import items into a job, right-click the job in Job Designs and click
Import items.

Browse to the root directory from which you want to import the items.
Select all the checkboxes and click Finish.

Talend - Handling Job Execution


In this chapter, let us understand how to handle job execution in Talend.
To build a job, right-click the job and select the Build Job option.
Specify the path where you want to archive the job, select the job version and build type,
then click Finish.
How to Run a Job in Normal Mode
To run a job in normal mode, select “Basic Run” and click the Run button
to begin the execution.
How to Run a Job in Debug Mode
To run a job in debug mode, add breakpoints to the components you want to debug:
select the component, right-click it and click the Add Breakpoint option. Here we
have added breakpoints to the tFileInputExcel and tLogRow components.
Then, go to Debug Run and click the Java Debug button.

The job now executes in debug mode and pauses at the breakpoints we have set.
Advanced Settings
In Advanced settings, you can choose from Statistics, Exec Time, Save Job before
Execution, Clear before Run and JVM Settings. Each option works as
explained here −
• Statistics − Displays the performance rate of the processing.
• Exec Time − Displays the time taken to execute the job.
• Save Job before Execution − Automatically saves the job before the execution begins.
• Clear before Run − Clears everything from the output console.
• JVM Settings − Lets you configure your own Java arguments.
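The figures behind Exec Time and Statistics come down to elapsed time and a rows-per-second rate. The Java sketch below computes both for a hypothetical stand-in loop. It is not how Talend measures them internally. It also prints the JVM's maximum heap, which is the value that an `-Xmx` argument in JVM Settings would change.

```java
import java.util.concurrent.TimeUnit;

// A hypothetical sketch of execution statistics: elapsed time, a
// rows-per-second rate, and the JVM's maximum heap size.
class RunStats {

    // Rows per second, guarding against a zero or negative measurement.
    static double rowsPerSecond(long rows, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            return 0.0;
        }
        return rows / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long rows = 0;
        for (int i = 0; i < 1_000_000; i++) {
            rows++;                      // stand-in for processing one row
        }
        long elapsed = System.nanoTime() - start;

        System.out.printf("Exec time: %d ms%n", TimeUnit.NANOSECONDS.toMillis(elapsed));
        System.out.printf("Rate: %.0f rows/s%n", rowsPerSecond(rows, elapsed));
        System.out.printf("Max heap: %d MB%n", Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}
```

`System.nanoTime` is used rather than the wall clock. It is intended for measuring elapsed intervals and is not affected by system clock adjustments.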
