The document outlines a training schedule for a Data Engineer from May to November 2024, covering essential skills such as programming in Python, SQL, cloud fundamentals, ETL processes, and big data engineering using tools like Spark and Apache Kafka. It includes learning objectives for each month, focusing on data transformation, data warehousing, and data visualization with tools like Tableau and Power BI. The curriculum emphasizes hands-on experience with various data processing libraries and cloud platforms.
Data Engineer - Copie
DATA ENGINEER

May 2024
• Learn Programming
  1. Basic Python
  2. Working with Data/Files
• Learn Basics of Relational Database
  1. SQL Server/MySQL/PostgreSQL

June 2024
• Learn SQL Programming in detail
• Rank Window Functions
• Aggregations
• Data wrangling and Analysis
• Data warehouse concepts
• Data Modeling for warehouse

June 2024
• Cloud Fundamentals (Azure / AWS / GCP)
• Hadoop Ecosystem (HDFS, MapReduce, YARN, Sqoop, Hive etc.)

July 2024
• ETL using Python/Spark
• Data Processing Libraries / Constructs (NumPy, Pandas, RDD, Spark, Dataframe)

July 2024
• Big Data Engineering Using Spark
• Optimization in Spark
• Workflow Schedules (Airflow)

August 2024
• Data Engineering (AWS / GCP / Azure)
• Cloud Data Warehouse (Snowflake / Databricks / Redshift)

August 2024
• Handling Data streaming
• Processing Streaming Data
• Apache Kafka Links

September 2024
• Data Transformation Tools
• DBT

September 2024
• Dashboarding & Visualization
• Tableau / Power BI / Looker

October 2024
• Docker, DataOps, Azure DevOps

October 2024
• Data Lakehouse, Data Mesh, Data Fabrics

November 2024
• Data Quality, Data Observability, Data Governance