
MODERN DATA ARCHITECTURE
WITH TALEND AND MICROSOFT AZURE

White Paper

Brought to you by:
Table of Contents
Introduction
The Reference Architecture for Talend Cloud with Azure
Modern Data Integration Architecture with Talend on Azure
Data Management Principles for Modern Data Architecture
Agile Delivery Principles for Scalable Enterprise Analytics
Adopting Cloud-Native Architecture Principles
Conclusion



Introduction
As companies face the mandate to quickly migrate their data and analytics
systems into the cloud, many of them miss their target expectations of
increased analytics speed and agility. Modernization efforts that begin with a
“lift-and-shift” strategy of merely taking on-premises systems and moving them
to cloud-based platforms will quickly reveal that a more holistic cloud strategy
including new cloud-native principles is the key to unlocking the potential
of the cloud. For existing data and analytics programs, the modern data
architecture is the next-generation mindset that challenges organizations to
unlearn old habits and pioneer new best practices. Companies need to take the
next steps in what was always intended to be an integration journey – rather
than a one-time migration activity – to fulfill the promises of the cloud with a
modern data architecture.

The goal of a modern data architecture in the cloud is to leverage the benefits
that allow agile teams to build and deliver high-quality data and analytics
environments faster and more efficiently to respond to business needs. Within
this context, modern data management principles and best practices are
emerging and, when combined with cloud-native principles, help fulfill the
cloud promises. This white paper explains these principles and establishes a
reference architecture based on Radiant Advisors’ research of Talend Cloud
customers on the Microsoft Azure platform. Additionally, this paper distills key
success factors customers discovered when using Talend Cloud and modern
data integration.



The Reference Architecture for Talend Cloud
with Azure
The modern data architecture is designed for data architects and data
engineers to enable agile analytic delivery teams, business analysts, and
data scientists to deliver three primary analytics capabilities to the business
in a faster and more agile manner (see Figure 1). The combination of data
architecture and integration is based on data management principles and the
pipelines that support it are designed to deliver cloud-native capabilities. The
reference architecture incorporates technologies and techniques that accelerate
analytics delivery.

Figure 1: Conceptual enterprise analytics ecosystem based on a modern data architecture

[Figure 1 shows a layered ecosystem: data sources (operational systems, 2nd- and 3rd-party systems, public external data, and user data) feed data ingestion (streaming and batch) and engineering data pipelines into an enterprise data lake (raw, curated, processed). A data management layer (data warehouse, labs and sandboxes, data science engines, analytic databases, Spark cluster, data integration, data prep) and a data unification layer (data catalog, governance, collaboration, semantic layer) support the three analytics capabilities at the top: business intelligence and reporting, enterprise self-service data analytics, and data science and AI, served by data platform, support, self-service, and data science teams.]



Two separate – but related – architecture principles are represented in the
reference architecture shown in Figure 2. The data integration architecture
supports batch-oriented and streaming data processing with DataOps principles
for continuous integration and continuous deployment (CI/CD) through open-
source languages and orchestrated deployment options for data pipelines.
Additionally, the data architecture is based on the data management principle
of polyglot persistence, which dictates that the best-suited database technology
is selected for the data classification and workload.

Figure 2: Reference architecture for Talend Cloud deployments with Microsoft Azure

[Figure 2 traces the flow from data sources (on-premises operational systems, websites, cloud apps, Dynamics, Office apps, eCommerce, external and public data, social media, IoT devices) through data ingestion (Replicate, Apache Kafka on HDInsight, Talend Remote Engines, Azure IoT Hub) and data pipelines (streaming processing on Azure Databricks, batch processing on Talend Remote Engines, IoT processing on a Talend Remote Cluster) into the data platform (Azure Synapse Analytics, Azure SQL Database, Azure Cosmos DB, Azure Data Lake Storage), then through analytics services (Azure Machine Learning, Azure Analysis Services, Azure Data Lake Analytics, Talend Data Prep, Jupyter Notebooks, Azure API Manager) to the user experience (Power BI Embedded for consumers, Power BI for executives, Power BI Pro for report developers, business analysts, and data scientists). Talend Cloud provides orchestration, data quality, and governance across the flow.]

This reference architecture illustrates at a high level how data moves from
data sources to analytics end-users through separate data ingestion and data
pipeline stages before arriving in the multi-tiered data platform analytics. These
architecture components will be discussed in more detail throughout this paper.
As we will show, Talend plays a key role in data pipeline deployments and
orchestration, along with data quality and governance.

Talend customers working on the Azure platform shared that
the goals of having "enterprise trusted data" and "more accurate
information" were among the primary reasons
for choosing Talend.



Modern Data Integration Architecture with
Talend on Azure
Following the well-established agile development principle of polyglot
programming, data engineers select the programming language and
deployment option that is best suited for their pipeline development project.
Further, these pipelines are then developed and deployed based on the cloud-
native principle of high accessibility with managed APIs. In this way, a data
engineer can deploy a corresponding API to access the data pipeline or function
from any application. Both Talend Cloud and Azure have the ability to manage
APIs, and interviewed Talend customers tend to leverage both.
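As a minimal sketch of this pattern, an application could invoke a published data pipeline through its managed API as shown below. The endpoint URL, payload fields, and token handling are illustrative assumptions, not the actual Talend Cloud or Azure API Management contract.

    # Minimal sketch: calling a managed API that fronts a data pipeline.
    # The URL, payload fields, and auth scheme are illustrative assumptions,
    # not the actual Talend Cloud or Azure API Management contract.
    import requests

    API_URL = "https://api.example.com/pipelines/customer-cleansing/run"  # hypothetical endpoint
    API_TOKEN = "..."  # issued by the API gateway

    def trigger_pipeline(source_table: str) -> dict:
        """Request an execution of the pipeline for one source table."""
        response = requests.post(
            API_URL,
            json={"sourceTable": source_table},
            headers={"Authorization": f"Bearer {API_TOKEN}"},
            timeout=30,
        )
        response.raise_for_status()
        return response.json()  # e.g., an execution id and status to poll

    if __name__ == "__main__":
        print(trigger_pipeline("sales.orders"))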

Figure 3: Deployment options with Talend Cloud

[Figure 3 shows data ingestion and data pipeline stages under Talend Cloud orchestration: streaming data processing with Apache Kafka on HDInsight and Azure Databricks, batch data processing with Talend Remote Engines, and IoT data processing with Azure IoT Hub and a Talend Remote Cluster.]

As an example, a data engineer can use Talend Cloud to orchestrate their


data pipeline in Spark and deploy it to a Spark cluster, while another data
engineer may choose to generate and orchestrate their data pipeline as a Java
program running on a Linux virtual machine. One Talend customer cited a
preference for deploying to Windows virtual machines because they can run
both .NET and Java data pipelines.



The flexible deployment options in Talend Cloud include remote engines on
Windows or Linux virtual machines (single or clustered) and Spark clusters such as Spark on
Azure HDInsight and Azure Databricks. Figure 3 illustrates how teams can
configure and orchestrate multiple deployment options for data ingestion and
data processing for batch, streaming, and IoT streaming data pipelines.

Talend Cloud customers we interviewed cited the "speed and
flexibility of deploying servers" for development teams with
their own deployment options as one of the top benefits of
selecting Talend Cloud on Azure.

Based on orchestration tips from Talend customer interviews, Figure 4


represents the best practice of having an on-premises remote engine for local
processing and pairs of clustered Linux virtual machines for deployments,
with plans to move to a Spark cluster or Azure Databricks in the future. Further,
one Talend customer attributed part of their success to creating a deployment
server in each of their Dev, QA, and Prod environments and naming each
server accordingly; without this naming convention, they struggled to keep
track of where they were deploying code. They also highly recommended
others should “follow the Talend model for configuration” with the “use
of generic context variables” and “avoid hardcoding anything” to prevent
additional complexity and potential rework in the future. In other tips for
success, another customer found benefits of using Talend Cloud orchestration
rather than Windows Task Scheduler in Windows virtual machine deployments
because Talend facilitates overall centralized management (whereas Windows
Task Scheduler manages only that particular machine). The benefit of
centralized orchestration was also cited by a customer who noted that the way
Talend Cloud “retained the orchestration for Databricks jobs was very handy.”

Talend customers shared that the Talend development environment


speeds pipeline development time because it offers so many components
that minimize manual coding – as opposed to sorting through third-party
component tools. One Talend Cloud customer we interviewed shared the
additional benefit of using the development environment “for maintenance
activities, such as versioning, branching, and promoting pipelines”
to production.



Many Talend customers are still focused on batch-oriented data warehousing
refreshes with an orchestrated series of data reads and writes from data
lakes and staging areas before loading into the data warehouse (see Figure
4). These Talend customers on Azure stated that they intend to adopt
streaming data processing in the future and that they will consider Talend's
streaming capability. This facilitates the data requirements for analytic
applications to deliver answers in near real-time scenarios, such as with
predictions and recommendations.

Figure 4: Data integration principle for separating ingestion and processing

[Figure 4 shows data sources (on-premises operational systems, websites, cloud apps, Dynamics, Office apps, eCommerce) flowing through data ingestion (Replicate, Talend Remote Engine) and batch data processing on a Talend Remote Engine into the data platform (Azure Synapse Analytics, Azure SQL Database, Azure Cosmos DB, Azure Data Lake Storage), with Talend Cloud providing orchestration, data quality, and governance.]

When modernizing traditional ETL to modern data integration architectures,


a recommended data integration principle is to create smaller data pipeline
segments for improved delivery speed and management. (Note: This is
similar to the cloud-native principle to employ microservices and functions, as
discussed later in the paper.) As an initial step, a best practice recommendation
is to isolate the data ingestion processes in order to serve all data consumers
and agile development product teams (as represented in Figure 4).



Figure 5 illustrates the value of having a streaming data hub, such as Apache
Kafka, in the reference architecture to improve data pipeline development
speed by leveraging the work of other data pipelines’ functionality already
deployed upstream (i.e., reusability). Azure offers the Apache Kafka on HDInsight
service, Azure Event Hubs, and Azure IoT Hub as options for streaming
data hubs. A data ingestion pipeline or streaming application is developed to
continuously receive source data and publish it to a Kafka topic dedicated to
that data source, such as a database table. For scheduled data acquisitions, a
Talend data pipeline is developed to connect and acquire a set of records for
the data source. Traditionally, the acquired data is written to a data warehouse
staging area or the data lake, but, ideally, the records are also published to
their own Kafka topic to become a stream of data records.

Figure 5: Integration architecture pattern for streaming data with Talend

[Figure 5 shows Apache Kafka on HDInsight as the streaming data hub: a Talend Remote Cluster acts as a producer, while Azure Databricks, Azure Synapse Analytics, Azure SQL Database, Azure Cosmos DB, and Azure Data Lake Storage subscribe to its topics, with Talend Cloud providing orchestration, data quality, and governance.]

The streaming data hub is the most influential aspect of a modern data
integration architecture, moving it away from the batch-oriented ETL paradigm of
extracting from a source, transforming data, and loading into a data warehouse. This
architecture follows a publish-subscribe paradigm where there can be many



independent and asynchronous subscribers of the same data. Therefore, what
used to be data targets are now subscribers to the streaming data hub. This
isolates the traditional extraction (ingestion) process, transformation processes,
and loading process for more reusability and fault tolerance in an enterprise
environment. (See Figure 5.)

To further break down this process, a data pipeline can be dedicated to data
cleansing or data masking so that other data pipelines don’t have to duplicate
that process. As an example, a traditional ETL job is broken down into several
data pipeline segments. First, a data pipeline is created that uses SQL to acquire
changed data from a source database; it is deployed to an on-premises Talend
remote engine, where Talend Cloud is scheduled to execute the job every 15
minutes. This data ingestion pipeline publishes its data to a topic in Apache
Kafka on Azure HDInsight. The Kafka connector for Azure Data Lake Storage
subscribes to the topic and automatically receives data every 15 minutes.
Another data pipeline is developed to integrate data from several Kafka topics
and writes its output to a different Kafka topic named “integrated data” where
other downstream data pipelines or databases can subscribe to it.
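A minimal sketch of that first ingestion segment is shown below. The real pipeline would be a Talend job deployed to the remote engine; this Python illustration of the same pattern assumes the kafka-python and pyodbc packages, and the connection string, table, and column names are hypothetical.

    # Sketch of the first pipeline segment: acquire changed rows with SQL and
    # publish them to a per-source Kafka topic on HDInsight. Names and
    # connection details are hypothetical.
    import json
    import pyodbc
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="hdinsight-kafka:9092",            # Kafka on HDInsight brokers
        value_serializer=lambda v: json.dumps(v, default=str).encode("utf-8"),
    )

    def publish_changes(last_run_ts: str) -> None:
        """Read rows changed since the last run and publish them to the source topic."""
        conn = pyodbc.connect("DSN=SourceDB")                 # hypothetical DSN
        cursor = conn.cursor()
        cursor.execute(
            "SELECT order_id, customer_id, amount, updated_at "
            "FROM sales.orders WHERE updated_at > ?", last_run_ts)
        columns = [col[0] for col in cursor.description]
        for row in cursor:
            record = dict(zip(columns, row))
            producer.send("sales.orders", value=record)       # one topic per source table
        producer.flush()
        conn.close()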

Talend customer interviews revealed that developing


and deploying a data pipeline that executes in-database
transformations with SQL is a performant way to leverage the
database compute and keep the resource load on the Talend
remote engine low.

This type of data pipeline that uses SQL statements to transform data inside
of the database is often referred to as extract, load, transform (ELT), or in-
database processing.
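As a minimal illustration of the ELT pattern, the sketch below issues a single set-based SQL statement so the transformation runs inside the database engine rather than on the remote engine. The connection string, tables, and SQL are hypothetical.

    # Sketch of ELT / in-database processing: the pipeline submits a set-based
    # SQL statement and the database does the transformation work, keeping the
    # resource load on the Talend remote engine low. Names are hypothetical.
    import pyodbc

    ELT_SQL = """
    INSERT INTO dw.fact_orders (order_id, customer_key, order_amount, order_date)
    SELECT s.order_id, d.customer_key, s.amount, CAST(s.updated_at AS DATE)
    FROM staging.orders AS s
    JOIN dw.dim_customer AS d ON d.customer_id = s.customer_id
    WHERE s.load_batch_id = ?;
    """

    def run_elt(batch_id: int) -> None:
        """Push the transformation down to the database as a single statement."""
        conn = pyodbc.connect("DSN=AzureSynapse", autocommit=False)
        conn.execute(ELT_SQL, batch_id)   # work happens in the database engine
        conn.commit()
        conn.close()

    if __name__ == "__main__":
        run_elt(batch_id=42)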



Data Management Principles for
Modern Data Architecture
The data management principle of polyglot persistence states that data should
be persisted (or stored) in the database technology that is the most optimal for
its workload. In a modern data architecture, the most common methods for
working with data is file-based, SQL access, or a REST API data service that
decouples the database. When applied to a data architecture for enterprise
analytics, Radiant Advisors specifies three classifications of data technologies
for analytics: a flexible class, an analytics-optimized class, and a data
management class.

Figure 6: Data management with polyglot persistence principle

[Figure 6 shows the data platform composed of Azure Synapse Analytics, Azure SQL Database, Azure Cosmos DB, and Azure Data Lake Storage.]

The flexible class serves as the important data architecture foundation and
repository of all enterprise data assets. Most commonly referred to as the data
lake, the data technology best-suited for flexible access is an object store such
as Azure Blob Storage or Hadoop Distributed File System (HDFS). A Talend
Cloud customer shared that a successful technique for them is to use Talend data
pipelines to write files to Azure Blob Storage in the columnar Parquet format.
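A minimal sketch of that technique is shown below. In practice the write is performed by a Talend pipeline component; this illustration assumes the pandas, pyarrow, and azure-storage-blob packages, and the account, container, and path names are hypothetical.

    # Sketch: land a dataset in Azure Blob Storage / ADLS as a Parquet file.
    # Account, container, and blob paths are hypothetical.
    import io
    import pandas as pd
    from azure.storage.blob import BlobServiceClient

    def write_parquet_to_blob(df: pd.DataFrame, conn_str: str) -> None:
        """Serialize a DataFrame to Parquet in memory and upload it to the data lake."""
        buffer = io.BytesIO()
        df.to_parquet(buffer, engine="pyarrow", index=False)   # columnar, analytics-friendly
        buffer.seek(0)

        service = BlobServiceClient.from_connection_string(conn_str)
        blob = service.get_blob_client(container="datalake-raw",
                                       blob="sales/orders/2020/11/orders.parquet")
        blob.upload_blob(buffer, overwrite=True)

    if __name__ == "__main__":
        sample = pd.DataFrame({"order_id": [1, 2], "amount": [10.5, 20.0]})
        write_parquet_to_blob(sample, conn_str="<storage-connection-string>")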



The analytics-optimized class includes SQL database engines that leverage
techniques such as massively parallel processing or shared-nothing architecture.
Other analytics databases include columnar data storage, in-memory data
storage, and OLAP cubes through Azure Analysis Services. The analytics-
optimized class also includes NoSQL document stores and graph databases,
such as Azure’s popular Cosmos DB.

The data management class refers to SQL databases that retain row-based storage
(similar to OLTP databases), such as Azure Database for PostgreSQL and Azure
SQL Database, and in-memory persistence with streaming ingestion for high-
performance data loading and updates.

The organization of these technology classes means the modern data


architecture is fundamentally a two-tiered data architecture of a scalable
data lake of all enterprise data assets and an optimized database layer for
analytics workloads. Over time, we anticipate that cloud-native architectures
for databases will evolve and improve to compete with the current optimized
databases, and therefore more data will be persisted in object stores with
analytic engines that leverage elastic compute resources.

The data lake serves as a single repository of all enterprise data, including the
data sources’ raw data formats (structured, semi-structured, and unstructured).
The default technology is an object store such as the Azure Blob Storage or
Azure Data Lake Storage (ADLS). In the past, it was common for on-premises
Hadoop clusters and Azure HDInsight to facilitate a distributed file system to
meet this data architecture role. In a modern data architecture, a data lake
allows for the data to be well-organized, managed, and cataloged, in addition
to being secure and governed.

The data warehouse and data marts continue to provide business


performance analytics with reporting and OLAP over historical trends.
MPP databases, columnar databases, and OLAP cubes are proven database
technologies that have been optimized for analytics and can support faster
query response time for large amounts of event data or high-performance



slicing and dicing of data in dimensional data models. Azure Synapse
Analytics represents the cloud-native evolution of the Azure SQL Data
Warehouse, while Azure SQL and Azure Analysis Services combine with

Power BI for delivering the reports, data visualizations, and dashboards that
organizations need to run their businesses.

Figure 7: Talend data pipelines on Azure Databricks for optimized analytics

[Figure 7 shows enterprise data hubs (Apache Kafka on HDInsight) feeding Spark Streaming pipelines on Azure Databricks, which deliver data to Azure Synapse Analytics and Azure Machine Learning; analytics access goes through Azure API Manager and Azure Analysis Services to Power BI Embedded for consumers and Power BI for internal users.]

Figure 7 illustrates a data integration pattern for Talend Cloud deploying


data pipelines to Azure Databricks that delivers transformed data to Azure
Synapse Analytics. This combines the benefits of a SQL data warehouse,
Spark-like compute, and Jupyter-like notebooks (with Python and R)
in order to minimize the number of components involved and streamline the
data scientists' data flow. Azure Synapse also integrates easily with Power
BI and Azure Machine Learning services. Azure Cosmos DB is also
easily integrated to act as the analytics-optimized NoSQL database in the
modern data architecture. Azure Databricks is a popular choice on the Azure
platform for developers and data engineers who prefer to work in the latest
Spark environments.
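To make the pattern in Figure 7 concrete, the sketch below shows a PySpark structured-streaming job on Databricks reading a Kafka topic and writing micro-batches to Azure Synapse. The broker address, topic, schema, JDBC URL, and staging path are hypothetical, and the Synapse connector options follow the commonly documented Databricks pattern but should be verified for your runtime.

    # Sketch of a Databricks structured-streaming pipeline: consume a Kafka topic
    # and write each micro-batch to Azure Synapse Analytics. All endpoints and
    # names are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StringType, DoubleType

    spark = SparkSession.builder.appName("orders-stream").getOrCreate()

    schema = (StructType()
              .add("order_id", StringType())
              .add("customer_id", StringType())
              .add("amount", DoubleType()))

    orders = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "hdinsight-kafka:9092")
              .option("subscribe", "sales.orders")
              .load()
              .select(from_json(col("value").cast("string"), schema).alias("r"))
              .select("r.*"))

    def write_to_synapse(batch_df, batch_id):
        """Write one micro-batch to a Synapse table via the Databricks connector."""
        (batch_df.write.format("com.databricks.spark.sqldw")
            .option("url", "jdbc:sqlserver://mysynapse.sql.azuresynapse.net;database=dw")
            .option("tempDir", "abfss://staging@mystorage.dfs.core.windows.net/tmp")
            .option("forwardSparkAzureStorageCredentials", "true")
            .option("dbTable", "dw.fact_orders")
            .mode("append")
            .save())

    orders.writeStream.foreachBatch(write_to_synapse).start().awaitTermination()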



Agile Delivery Principles
for Scalable Enterprise Analytics
Many organizations have embraced an agile methodology for delivering
analytics products and features rather than project-oriented development.
These agile delivery teams can now work independently for their product
owners and customers while leveraging the variety of Azure services, options,
and resources available to them. The independent delivery teams work in a
federated organizational model with centralized support in IT Cloud Ops and
Platform Architecture teams.

Speed and agility for delivering analytics are mostly derived from having an
agile methodology for delivering data pipelines and data preparation. Agile
delivery teams engage in sprints that focus on incremental products and features
for engineering data pipelines and analytics model development. Having a
data pipeline platform that allows data engineers to minimize the amount
of time needed for packaging, deploying, and monitoring code increases the
amount of time available for product development and business value within a
given sprint. This agile methodology can be amplified with DataOps (the data
engineering equivalent to DevOps) and CI/CD processes that can speed release
cycles if updates are needed to respond to production changes.

It is the responsibility of the Platform Architecture team to recommend best


practices, architecture patterns, and standards for the modern data architecture
intended to enable the agile delivery teams with fewer architectural and
standards decisions. Every agile delivery team has the option to develop their
data pipelines in any language they choose and operate them on various Azure
services. This includes Python, Java, Scala, or Julia running on Windows or
Linux virtual machines, Apache Spark clusters, or Azure Databricks services.
The Platform Architecture team also recommends a data pipeline development
tool, such as Talend, which gives the agile delivery teams faster development
times from a component-based development environment, the flexibility of
embedding custom code and exporting to open source languages such as
Python and Java, and the independence to deploy the data pipelines to
remote engines or Spark clusters.



Even with DataOps principles, monitoring data pipelines in production can
be a challenge due to inconsistency and volume if there is not a dedicated
data management console such as Talend Management Console. The Talend
Management Console monitors all data pipelines in remote engines and
Spark clusters – both in Azure and on-premises – and includes configurable
alert notifications for jobs and impacted dependencies. Azure Monitor
offers a similar ability to monitor application logs and server metrics
universally for analysis and notification but does not have the specifics for
data pipeline operations.

Orchestration is one of the most challenging aspects of cloud data pipeline


execution. This is where data pipeline job scheduling, environments, and
notifications allow users to set up and orchestrate jobs. This also promotes a
consistent understanding and terminology for all agile delivery teams when
working with each other and with the Platform Architecture and Cloud Ops
teams. Deploying custom code into the cloud environments has proven to be
challenging and time-consuming for many agile delivery teams when trying to
sort out the many options in cloud services that need to be leveraged together,
but the Talend Management Console can create users, projects, environments,
and workspaces for every agile delivery team and their corresponding remote
execution engines. Agile delivery teams can log in to the console to manage
and monitor all tasks (both scheduled and running in production) and
configure their alerts and notifications.

One of the Talend Cloud apps is Talend Pipeline Designer, used for agile
delivery to build data pipelines directly in the browser, while Talend Data
Inventory centralizes connectivity to enterprise data sets for agile delivery
teams to share. The Talend Cloud API Designer can be used to support the
deployment of APIs for data services.



Adopting Cloud-Native Architecture Principles
Migrating to the cloud with Microsoft Azure and Talend presents the potential
for speed and agility to create value with enterprise analytics. In order to
realize that potential, this journey must be guided by cloud-native principles
and an organization’s ability to adopt them. Cloud-native principles are
primarily focused on application development through DevOps, CI/CD,
microservices, and containers, and these principles can be adapted to facilitate
analytics delivery speed (as previously discussed), development scalability, and
portability of data engineering and enterprise analytics.

Scalability in analytics delivery will come from the combination of adopting


microservices, serverless functions, and automation. Data pipelines need
to evolve away from large ETL packages and, rather, be designed for
microservices and serverless functions. We have already discussed how the
larger sub-modules dedicated to extracting and loading can be isolated and
fully automated with streaming data topics.

Further, Radiant Advisors recommends as a best practice to


isolate the cleansing, integrating, and calculating sub-modules
of data transformations. Talend customers agreed that this is a
best practice they plan to adopt.

For example, a data pipeline can subscribe to a Kafka topic representing a data
source table, be dedicated to cleansing each column of data, leverage API calls
for conversions or validations if needed, and then publish cleansed data back
to a new Kafka topic (for cleansed data only) that can be further transformed
by multiple consumer applications and analytics. A data pipeline can then
subscribe to the cleansed data Kafka topic and be dedicated to enrichment
of the data with additional information from an external service API for geo-
spatial information, demographics, or economic data for data scientists to
leverage in analytic models. Serverless functions can be called when needed
as part of a data process, and execution is measured and billed in hundreds of
milliseconds of use without the need to provision computing resources.
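A minimal sketch of such a dedicated cleansing segment is shown below, again assuming kafka-python; the topic names and the cleansing rules are deliberately simple, hypothetical illustrations of the consume-cleanse-republish pattern.

    # Sketch of a dedicated cleansing segment: subscribe to the raw topic,
    # standardize each record, and republish to a cleansed-data topic that
    # downstream pipelines and analytics consume. Names are hypothetical.
    import json
    from kafka import KafkaConsumer, KafkaProducer

    consumer = KafkaConsumer(
        "sales.orders",                                  # raw source topic
        bootstrap_servers="hdinsight-kafka:9092",
        group_id="orders-cleansing",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    producer = KafkaProducer(
        bootstrap_servers="hdinsight-kafka:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    def cleanse(record: dict) -> dict:
        """Apply simple, reusable cleansing rules to one record."""
        record["customer_id"] = str(record.get("customer_id", "")).strip().upper()
        record["amount"] = round(float(record.get("amount", 0.0)), 2)
        return record

    for message in consumer:
        producer.send("sales.orders.cleansed", value=cleanse(message.value))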

For analytics portability, data pipelines and analytics model work can be
deployed in lightweight containers, such as Docker containers orchestrated by Kubernetes, with APIs



and URL designations for data persistence. With an orchestration service, these
containers can be deployed across on-premises data centers, Microsoft Azure,
or other public clouds when needed for execution. Containers empower data
engineers and data scientists to write-once and deploy-anywhere without
being concerned about which computing resources are available. The use of
containers is a mature cloud-native capability that we recommend if there is an
appropriate specific use case or when agile delivery teams have the experience
and expertise to increase their speed and agility with containers. Still, the
most significant initial benefits will come from adopting the CI/CD process and
DataOps, followed by microservices and serverless functions.

Conclusion
For companies modernizing their data architecture for analytics delivery on
Microsoft Azure, the journey requires that they embrace these modern data and
cloud principles found in this reference architecture. As the journey advances,
this cloud strategy can be holistically characterized as “rehost, replatform,
then rearchitect,” which requires tools that share these principles and
comprehensive functionality, such as Talend Cloud. A modern data architecture
designed with the principles and best practices distilled within this paper can
fully leverage the potential of the cloud to enable analytics delivery speed,
agility, and scalability.

While every attempt has been made to ensure that the information in this
document is accurate and complete, some typographical errors or technical
inaccuracies may exist. Radiant Advisors does not accept responsibility for any
kind of loss resulting from the use of information contained in this document.
The information contained in this document is subject to change without notice.

All brands and their products are trademarks or registered trademarks of their
respective holders and should be noted as such.

This edition published November 2020.



About the Author

John O’Brien is Principal Advisor and CEO of Radiant


Advisors. A recognized thought leader in data strategy and
analytics, John provides research, strategic advisory services
and mentoring that guide companies in data strategy,
architecture, analytics and emerging technologies.

This research report sponsored by:

About Talend

Talend (NASDAQ: TLND), a leader in cloud data integration


and data integrity, enables companies to transform by
delivering trusted data at the speed of business. Learn more at
www.Talend.com.

About Radiant Advisors

Radiant Advisors is an independent research and advisory firm that


delivers innovative, cutting-edge research and thought-leadership
to transform today’s organizations into tomorrow’s data-centric
industry leaders. To learn more, visit www.RadiantAdvisors.com.

© 2020 Radiant Advisors. All Rights Reserved.


Radiant Advisors
Boulder, CO USA
Email: info@radiantadvisors.com
