Talend Quick Book
integration, Data quality, Data management, Data Preparation and Big Data. The
demand for ETL professionals with knowledge of Talend is high. Also, it is the only
ETL tool with all the plugins needed to integrate easily with the Big Data ecosystem.
Gartner places Talend in the Leaders quadrant of its Magic Quadrant for Data
Integration Tools.
Talend offers various commercial products as listed below −
Talend - Installation
To download Talend Open Studio for Big Data and Data Integration, please follow
the steps given below −
Step 1 − Go to the page:
https://www.talend.com/products/big-data/big-data-open-studio/
and click the download button. The file TOS_BD_xxxxxxx.zip starts downloading.
Step 2 − After the download finishes, extract the contents of the zip file. This
creates a folder with all the Talend files in it.
Step 3 − Open the Talend folder and double-click the executable file:
TOS_BD-win-x86_64.exe. Accept the User License Agreement.
Step 5 − Click Allow Access in case you get Windows Security Alert.
Step 6 − The Talend Open Studio welcome page opens.
Step 7 − Click Finish to install the required third-party libraries.
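Step 2 can also be scripted. The following is a minimal Python sketch, not part of
Talend itself, that extracts a downloaded archive with the standard zipfile module.
The function name extract_archive and both path arguments are hypothetical; pass the
actual name of the archive you downloaded.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, target_dir):
    """Extract every entry of zip_path into target_dir.

    Returns the sorted names of the entries found directly under target_dir.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # List what now sits at the top level of the target folder.
    return sorted(p.name for p in target.iterdir())
```

Calling extract_archive with the downloaded zip and an empty destination folder
should return the name of the Talend folder that Step 3 asks you to open.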
Benefits
Data Integration offers many benefits as described below −
• Improves collaboration between different teams in the organization
trying to access organization data.
• Saves time and eases data analysis, as the data is integrated effectively.
• An automated data integration process synchronizes the data and eases
real-time and periodic reporting, which is otherwise time-consuming when
done manually.
• Data integrated from several sources matures and improves over time,
which eventually leads to better data quality.
Working with Projects
In this section, let us understand how to work on Talend projects −
Creating a Project
Double-click the TOS Big Data executable file. The project window opens.
Select the Create a new project option, enter the name of the project and click
Create.
You can choose from the options offered. Here we choose Data Integration Demos.
Now, click Finish.
Now, give the project name and description. Click Finish.
You can see your imported project in the existing projects list.
Now, let us understand how to import an existing Talend project.
Select the Import an existing project option and click Select.
Enter a project name and select the “Select root directory” option.
Browse to your existing Talend project home directory and click Finish.
Your existing Talend project is imported.
Opening a Project
Select a project from the existing projects list and click Finish. This opens that
Talend project.
Deleting a Project
To delete a project, click Manage Connections.
Click Delete Existing Project(s).
Exporting a Project
Click the Export project option.
Select the project you want to export and give the path to which it should be exported.
Click Finish.
Talend - Model Basics
A Business Model is a graphical representation of a data integration project. It is a
non-technical representation of the workflow of the business.
Creating a Job
In the repository window, right-click Job Design and click Create Job.
Provide the name, purpose and description of the job and click Finish.
You can see your job has been created under Job Design.
Now, let us use this job to add components, then connect and configure them. Here, we
will take an Excel file as input and produce an Excel file as output with the same
data.
Finally, select the tFileOutputExcel component from the palette and drag and drop it
into the designer window.
The components have now been added.
Similarly, right-click tLogRow and draw a Main line to tFileOutputExcel. Now, your
components are connected.
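To picture what the Main links do, here is a small Python sketch, offered as an
analogy and not as the code Talend generates. Each function stands in for one
component, and each nested call plays the role of one Main link that hands rows from
one component to the next. The function names and the sample rows are hypothetical.

```python
def file_input(rows):
    """Source stage: emit rows one at a time (stand-in for tFileInputExcel)."""
    for row in rows:
        yield row


def log_row(rows, log):
    """Middle stage: record each row, then pass it on unchanged (stand-in for tLogRow)."""
    for row in rows:
        log.append(row)
        yield row


def file_output(rows):
    """Sink stage: collect everything that arrives (stand-in for tFileOutputExcel)."""
    return list(rows)


data = [["1", "Asha"], ["2", "Ravi"]]
log = []
# Each nested call plays the role of one Main link between two components.
written = file_output(log_row(file_input(data), log))
print(written == data)  # True
```

Because the middle stage only observes the rows, the data that reaches the output
stage is the same data that left the input stage, which matches the goal of this job.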
Configuring the components
After adding and connecting the components in the job, you need to configure them.
For this, double-click the first component, tFileInputExcel, to configure it. Give the
path of your input file in File name/stream.
If the first row of the Excel file contains the column names, enter 1 in the Header
option.
Click Edit schema and add the columns and their types according to your input Excel
file. Click Ok after adding the schema.
Click Yes.
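To make the Header and Edit schema settings concrete, here is a small Python sketch
of the idea, not Talend's own implementation: skip a given number of header rows,
then convert each remaining row according to a list of (column name, type) pairs.
The function apply_schema, the sample columns and the in-memory rows are all
hypothetical.

```python
def apply_schema(rows, schema, header=0):
    """Skip `header` leading rows, then map each row to typed, named columns.

    rows   -- list of lists holding raw cell values (strings here)
    schema -- list of (column_name, converter) pairs, in column order
    """
    typed = []
    for raw in rows[header:]:
        record = {}
        for (name, convert), value in zip(schema, raw):
            record[name] = convert(value)
        typed.append(record)
    return typed


# Hypothetical sheet content: the first row holds the column names.
sheet = [
    ["id", "name", "amount"],
    ["1", "Asha", "10.5"],
    ["2", "Ravi", "7"],
]
schema = [("id", int), ("name", str), ("amount", float)]

records = apply_schema(sheet, schema, header=1)
print(records[0])  # {'id': 1, 'name': 'Asha', 'amount': 10.5}
```

With header=0 the first row would be treated as data, and the int conversion of the
text "id" would fail, which is why the Header value has to match the file.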
In the tLogRow component, click sync columns and select the mode in which you want
to display the rows from your input. Here we have selected Basic mode with “,” as
the field separator.
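As a rough illustration of what a basic mode with a field separator produces, the
Python sketch below joins the values of each row into one line. It is an
approximation for explanation only; the exact console layout printed by tLogRow may
differ. The function format_basic and the sample rows are hypothetical.

```python
def format_basic(row, separator=","):
    """Join the values of one row into a single line using the given separator."""
    return separator.join(str(value) for value in row)


rows = [[1, "Asha", 10.5], [2, "Ravi", 7.0]]
for row in rows:
    print(format_basic(row))
# 1,Asha,10.5
# 2,Ravi,7.0
```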
Finally, in the tFileOutputExcel component, give the path and file name where you want
to store your output Excel file, along with the sheet name. Click sync columns.
You will see the output in Basic mode with the “,” separator.
You can also see that your output is saved as an Excel file at the output path you
specified.
Talend - Metadata
Metadata basically means data about data. It describes the what, when, why, who,
where, which, and how of data. In Talend, metadata holds all the information about
the data which is present in Talend Studio. The metadata option is present inside the
Repository pane of Talend Open Studio.
Various sources like DB Connections, different kinds of files, LDAP, Azure, Salesforce,
Web Services, FTP, Hadoop Cluster and many more options are present under Talend
Metadata.
The main use of metadata in Talend Open Studio is that you can use these data
sources in several jobs by a simple drag and drop from the Metadata node in the
Repository panel.
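The reuse idea can be pictured with a short Python sketch. This is an analogy only,
not a description of how Talend stores metadata: one connection definition is kept in
a central place and two jobs refer to it instead of repeating the details. The names
REPOSITORY, crm_db, the host db.example.com and both job functions are hypothetical.

```python
# Central store of connection definitions, analogous to the Metadata node.
REPOSITORY = {
    "crm_db": {"type": "database", "host": "db.example.com", "port": 5432},
}


def describe_source(name):
    """Build a readable label from the shared definition called `name`."""
    meta = REPOSITORY[name]
    return f"{meta['type']}://{meta['host']}:{meta['port']}"


def job_load_customers():
    """First hypothetical job that relies on the shared definition."""
    return "load customers from " + describe_source("crm_db")


def job_export_orders():
    """Second hypothetical job that relies on the same definition."""
    return "export orders from " + describe_source("crm_db")


# Changing the shared definition once is picked up by both jobs.
REPOSITORY["crm_db"]["port"] = 5433
print(job_load_customers())  # load customers from database://db.example.com:5433
print(job_export_orders())   # export orders from database://db.example.com:5433
```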
Activating/Deactivating a Component
Activating or deactivating a component is very simple. Select the component,
right-click it, and choose the option to deactivate or activate that component.
Browse the root directory from where you want to import the items.
Select all the checkboxes and click Finish.
The job now executes in debug mode, according to the breakpoints that we have set.
Advanced Settings
In Advanced settings, you can select from Statistics, Exec Time, Save Job before
Execution, Clear before Run and JVM Settings. Each of these options works as
explained here −
• Statistics − Displays the performance rate of the processing.
• Exec Time − Displays the time taken to execute the job.
• Save Job before Execution − Automatically saves the job before the
execution begins.
• Clear before Run − Removes everything from the output console.
• JVM Settings − Lets you configure your own Java arguments.
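The Statistics and Exec Time options report figures that are easy to reason about
with a small example. The Python sketch below, which is illustrative and unrelated to
Talend's internals, measures how long a row-processing loop takes and derives a
rows-per-second rate from it. The function run_with_stats and the sample workload are
hypothetical.

```python
import time


def run_with_stats(rows, process):
    """Run `process` on every row; return (row_count, elapsed_seconds, rows_per_second)."""
    start = time.perf_counter()
    count = 0
    for row in rows:
        process(row)
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very fast runs.
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, elapsed, rate


count, elapsed, rate = run_with_stats(range(1000), lambda row: row * 2)
print(f"{count} rows in {elapsed:.6f} s")
```

The elapsed value corresponds to the idea behind Exec Time, and the rate corresponds
to the kind of throughput figure that Statistics is meant to convey.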