Ankitkumar Pranaykumar Sinha
Data Engineer
Mobile: +91-8626-064676
Email-ID: sinhaankit.mca2014@gmail.com
Career Objective
To be involved in work where I can utilize my skills, gain professional satisfaction, and
contribute effectively to the growth of the organization.
Technical Skills
Backend: SQL, Teradata, PySpark, Sqoop, Pig, Python, Scala, Spark
Scripts: Shell Scripts, JavaScript, Python
Web Technology: HTML, CSS, Bootstrap
Tools: Control-M, Airflow, Jira, Git, Talend
Google Cloud Platform: BigQuery, Cloud Storage, Dataproc, Dataflow, Airflow
An Overview
• 4.10 years of professional experience in data engineering projects, working with Big Data
technologies such as Hadoop, Spark, Sqoop, Pig, Scala, Hive, and Python, and with Google
Cloud Platform (GCP) services such as BigQuery and Dataproc, along with Talend.
• Experience with data transformation, ingestion, ETL logic, and data warehousing
solutions across platforms including Teradata and Google Cloud Platform.
• Experience migrating a project from an on-premises Hadoop server to the cloud,
with scripts scheduled through Airflow on Cloud Composer.
• Experience loading data into Hive tables using Spark (see the sketch after this list).
• Experience building end-to-end pipelines (file-based loads and Kafka loads).
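A minimal PySpark sketch of such a Hive load, assuming a hypothetical CSV source path and
table name (the actual project code differs):

    # Minimal PySpark sketch: load a raw file into a Hive table.
    # The path and table names below are illustrative assumptions.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive_load_sketch")
        .enableHiveSupport()  # needed so saveAsTable targets the Hive metastore
        .getOrCreate()
    )

    # Read the raw file (header row assumed for brevity).
    df = spark.read.option("header", True).csv("/data/landing/sales.csv")

    # Append the data into a managed Hive table.
    df.write.mode("append").saveAsTable("staging_db.sales_raw")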
Organizational Scan
• April 2021 to present: GSPANN Technology as Senior Software Engineer
• March 2020 to April 2021: Deloitte as Consultant
• April 2018 to March 2020: Sears Holdings India as Associate Engineer
• August 2017 to April 2018: Sears Holdings India as Intern
• Jan 2017 to August 2017 : Vision Media Entertainment as Intern
Project Overview
• At GSPANN
I am working for a pharma client. I was involved in POCs in my early days and created a generic
end-to-end pipeline for file-based and API ingestion. Later I worked on cost optimisation for the
project: we migrated 18 different Cloud Composer instances to a single common Composer instance to
reduce costs, moving almost 150 DAGs to the new environment. A POC is in progress to set up a CI/CD
pipeline for DAG deployment.
I also created an end-to-end pipeline to set up an ELN system. The source is Oracle and the
destination data warehouse is BigQuery. We use Talend as the ETL tool for transforming the data;
all the jobs are created in Talend Studio, and the Windows scheduler is used to trigger them.
I created a pipeline to connect to SAP and Salesforce data sources, pulling the raw data directly
from the source and loading it into the BigQuery staging layer. All the business ETL is written in
BigQuery SQL, and Airflow is used to schedule the DAGs (a minimal sketch follows).
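As an illustration of how such a DAG can be wired, here is a minimal Airflow sketch; the DAG id,
schedule, SQL, and table names are assumptions, not the actual project code:

    # Minimal Airflow sketch: schedule a BigQuery SQL ETL step.
    # DAG id, schedule, SQL, and table names are illustrative assumptions.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

    with DAG(
        dag_id="staging_to_curated_etl",   # hypothetical DAG name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Run a BigQuery SQL transformation from the staging layer to a curated table.
        transform = BigQueryInsertJobOperator(
            task_id="transform_staging",
            configuration={
                "query": {
                    "query": "SELECT * FROM staging.sales_raw",  # placeholder SQL
                    "destinationTable": {
                        "projectId": "my-project",  # hypothetical project id
                        "datasetId": "curated",
                        "tableId": "sales",
                    },
                    "writeDisposition": "WRITE_TRUNCATE",
                    "useLegacySql": False,
                }
            },
        )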
Recently, in the current project, we have created connections to PostgreSQL, SharePoint, and MS
Project Online to integrate these various sources and pull all the data into a single data
warehouse. On top of that data, we are developing various models to get better insights into sales.
I also worked on a project that uses a web-scraping mechanism: we target 10 different websites,
scrape the data on a weekly basis, and load it into BigQuery, where all the data cleaning is
handled in the BigQuery ETL. We used Python for scraping and created a DAG to load the scraped
data into BigQuery on a regular schedule (a sketch follows).
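A minimal sketch of the scrape-and-load flow; the URL, CSS selectors, field names, and table id
are illustrative assumptions:

    # Minimal sketch: scrape one site and load the rows into BigQuery.
    # URL, selectors, field names, and table id are illustrative assumptions.
    import requests
    from bs4 import BeautifulSoup
    from google.cloud import bigquery

    def scrape_products(url):
        """Scrape product rows from one target site (selectors are hypothetical)."""
        html = requests.get(url, timeout=30).text
        soup = BeautifulSoup(html, "html.parser")
        rows = []
        for item in soup.select("div.product"):  # placeholder CSS selector
            rows.append({
                "name": item.select_one("h2").get_text(strip=True),
                "price": item.select_one("span.price").get_text(strip=True),
            })
        return rows

    def load_to_bigquery(rows, table_id="my-project.raw.scraped_products"):
        """Load scraped rows; cleaning is left to the downstream BigQuery SQL ETL."""
        client = bigquery.Client()
        job_config = bigquery.LoadJobConfig(autodetect=True)
        client.load_table_from_json(rows, table_id, job_config=job_config).result()

    if __name__ == "__main__":
        load_to_bigquery(scrape_products("https://example.com/products"))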
Academic Credentials
2017: Master of Computer Applications, Vishvakarma Institute of Technology,
Pune (secured 73.27%)
Personal Dossier
Date of birth: 23rd February, 1994
Residential: R-1004, Jade Residency, Whagoli, Pune