Data Engineering
VIRTUAL INTERNSHIP
BY
PALLAPOTHU TEJASWINI
(21JD1A0584)
ELURU COLLEGE OF ENGINEERING AND TECHNOLOGY (JNTUK)
DUGGIRALA (V), PEDAVEGI (M), ELURU - 534004
Affiliated to JNTUK, Kakinada & Approved by AICTE - New Delhi
Department of Computer Science and Engineering
CERTIFICATE
This is to certify that the Summer Internship work entitled “Alteryx Sparked Data
Analytics Process Automation Virtual Internship” is a bona fide record of internship work
done by Pallapothu Tejaswini (21JD1A0584) for the award of the Summer Internship in
Computer Science and Engineering by Jawaharlal Nehru Technological University, Kakinada,
during the academic year 2024-2025.
Signature of HOD
Department of CSE
ELURU COLLEGE OF ENGINEERING & TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
PROGRAM OUTCOMES (POs)
1 Engineering Knowledge: Apply the knowledge of mathematics, science, engineering fundamentals and an engineering
specialization to the solution of complex engineering problems.
2 Problem analysis: Identify, formulate, review research literature, and analyze complex engineering problems
reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering
sciences.
3 Design/development of solutions: Design solutions for complex engineering problems and design system
components or processes that meet the specified needs with appropriate consideration for the public health and
safety, and the cultural, societal, and environmental considerations.
4 Conduct investigations of complex problems: Use research-based knowledge and research methods including design
of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.
5 Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT
tools including prediction and modeling to complex engineering activities with an understanding of the limitations.
6 The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal, health,
safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering
practice.
7 Environment and sustainability: Understand the impact of the professional engineering solutions in societal and
environmental contexts, and demonstrate the knowledge of, and need for sustainable development.
8 Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the
engineering practice.
9 Individual and team work: Function effectively as an individual, and as a member or leader in diverse teams, and in
multidisciplinary settings.
10 Communication: Communicate effectively on complex engineering activities with the engineering community and
with society at large, such as, being able to comprehend and write effective reports and design documentation,
make effective presentations, and give and receive clear instructions.
11 Project management and finance: Demonstrate knowledge and understanding of the engineering and management
principles and apply these to one’s own work, as a member and leader in a team, to manage projects and in
multidisciplinary environments.
12 Life-long learning: Recognize the need for, and have the preparation and ability to engage in independent and
life-long learning in the broadest context of technological change.
DAY TO DAY EVALUATION
Module 1: Introduction to Data Analytics
Data analytics is the process of using data to solve problems and find insights. It involves
collecting, organizing, and transforming data to make predictions, draw conclusions, and
inform decisions. Data analytics can be used to improve business processes, foster growth,
and improve decision-making.
Here are some things to know about data analytics:
What it involves
Data analytics uses a variety of tools, technologies, and processes to analyze data. It can
include math, statistics, computer science, and other techniques.
What it's used for
Data analytics can help businesses understand their performance, customer behavior, and
market trends. It can also help companies make better decisions by using the data they
generate from log files, web servers, social media, and more.
Soft skills
Some soft skills that are useful for data analytics include:
Analytical thinking and problem-solving
Strong communication and presentation skills
Attention to detail
Critical thinking
Adaptability
Data analytics focuses on analyzing past data to derive insights and make decisions based
on historical trends. On the other hand, data science encompasses a broader scope,
including data analysis, machine learning, predictive modelling, and more, to solve complex
problems and uncover new insights from data. The four main types of analytics are:
Descriptive analytics
Diagnostic analytics
Predictive analytics
Prescriptive analytics
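As a small illustration of the descriptive type, the Python sketch below summarizes a made-up sales table with pandas; the column names and figures are invented for this example:
import pandas as pd

# Hypothetical monthly sales figures (invented for illustration)
sales = pd.DataFrame({
    'month': ['Jan', 'Feb', 'Mar', 'Apr'],
    'revenue': [12000, 15500, 9800, 14200],
})

# Descriptive analytics: summarize what has already happened
print(sales['revenue'].describe())
print('Total revenue:', sales['revenue'].sum())
print('Best month:', sales.loc[sales['revenue'].idxmax(), 'month'])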
We need data analytics because business data analytics collects, processes, and
analyzes data to help make smart decisions. Smart decision-making: by looking at past data,
businesses can predict what's coming next, helping them act before problems pop up.
4. Marketing: Marketers use data analytics to understand consumer preferences, measure
campaign success, and target specific audiences. This leads to more effective marketing
strategies and better returns on investment.
5. Supply Chain Management: Data analytics helps optimize supply chain operations by
predicting demand, managing inventory, and improving logistics. This helps reduce costs and
increase efficiency.
6. Education: In education, data analytics tracks student performance, personalizes learning,
and improves outcomes. It also aids in decision-making and resource management for
educational institutions.
7. Sports: Teams and coaches use data analytics to improve performance, plan strategies,
and monitor player statistics. This helps in gaining a competitive edge and improving
results.
Data analysis tools are software programs, applications, and other aids that professionals
use to analyze data sets in ways that characterize the big picture of the information and
provide usable input for meaningful insights, predictions, and decision-making.
1. RapidMiner
Primary use: Data mining
2. Orange
Primary use: Data mining
o Orange is a package renowned for data visualization and analysis, especially
appreciated for its user-friendly, color-coordinated interface. You can find a
comprehensive selection of color-coded widgets for functions like data input,
cleaning, visualization, regression, and clustering, which makes it a good
choice for beginners or smaller projects.
3. KNIME
Primary use: Data mining
o KNIME, short for KoNstanz Information MinEr, is a free and open-source data
cleaning and analysis tool that makes data mining accessible even if you are a
beginner. Along with data cleaning and analysis software, KNIME has
specialized algorithms for areas like sentiment analysis and social network
analysis.
4. Tableau
Primary use: Data visualization and business intelligence
o Tableau stands out as a leading data visualization software, widely utilized in
business analytics and intelligence.
Tableau is a popular data visualization tool, known for its easy-to-use interface and powerful
capabilities. Its software can connect to hundreds of different data sources and present
the information in many different visualization types.
5. Google Charts
Primary use: Data visualization
o Google Charts is a free online tool that excels at producing a wide array of
interactive and engaging data visualizations. Its design caters to
user-friendliness, offering a comprehensive selection of pre-set chart types
that can be embedded into web pages or applications.
6. Datawrapper
Primary use: Data visualization
o Datawrapper is a tool primarily designed for creating online visuals, such as
charts and maps. Initially conceived for journalists reporting news stories, it is
versatile enough to suit any professional in charge of website
management.
8. Qlik
Primary use: Business intelligence
o Qlik is a global company that helps businesses utilize data for
decision-making and problem-solving. It provides comprehensive, real-time data
integration and analytics solutions to turn data into valuable insights.
9. Google Analytics
Primary use: Business intelligence
o Google Analytics is a tool that helps businesses understand how people
interact with their websites and apps. To use it, you add a
special JavaScript code snippet to your web pages. This code collects information
when someone visits your website, like which pages they see, what device
they're using, and how they found your site.
10. Spotfire
Primary use: Business intelligence
o TIBCO Spotfire is a user-friendly platform that transforms data into
actionable insights. It allows you to analyze historical and real-time data,
predict trends, and visualize results in a single, scalable platform.
APA isn’t Robotic Process Automation (RPA) or Business Process Automation (BPA), and it isn’t a data tool, either. It’s an
automated, self-service data analytics platform that focuses on business outcomes first
while empowering everyone in your organization to adopt a culture of analytics. It allows
you and anyone else in your organization to perform advanced analytics whether or not they
know any code and whether or not they’re trained in data science. Analytic Process
Automation (APA) describes a unified platform for self-service data analytics that makes
data easily available and accessible to everyone in your organization, optimizes and
automates data analytics and data science processes, and empowers your entire
organization to develop skills and make informed decisions using machine learning (ML),
artificial intelligence (AI), and predictive and prescriptive analytics.
Think of Analytic Process Automation as an all-in-one supercharged data analytics machine.
If all you want to do is clean up some data, you can do that. If you want to merge multiple
different data types, you can do that, too. If you want to automate month-long tedious and
complex data tasks and push those outputs to decision-makers and business processes, you’ve
got it.
If you want advanced analytics that provide forward-looking insights you can use to
produce better business outcomes, improve revenue, reduce spending, and transform your
organization from the ground up, then you need it.
Top Benefits of Automated Data Analytics
Increase speed of onboarding and processing data
Identify data observability issues
Accelerate time-to-insights
Save time and costs
Improve efficiency in decision-making
Reduce the potential for errors
Improve productivity and innovation
Improve data quality and integrity
Automated data analytics also has some limitations:
Adaptability: Automated tools may struggle with dynamic websites and may not be
able to adapt to changing requirements.
Contextual understanding: Automated tools may miss nuanced information.
Monitoring: Continuous monitoring is needed to ensure data accuracy.
Other disadvantages of automation include job displacement and unemployment, reduced
human interaction and customer experience, and dependency on technology and loss of
human skills.
Data analytics automation is a technique that uses computer systems and processes to
perform analytical tasks with little or no human intervention. It can be useful for many
reasons, including:
Time and money savings: Automation can save time and money by eliminating manual,
repetitive tasks.
Improved accuracy: Automation can improve the accuracy of data.
Faster insights: Automation can provide insights faster.
Frees up time: Automation can free up time for employees to focus on higher-level
tasks, such as interpreting automated data.
Better scalability: Automation can improve scalability.
Competitive edge: Automation can help an organization gain a competitive edge by
leading to new products and tweaks to existing ones.
Automation can be used for a variety of tasks, including data discovery, preparation,
replication, and warehouse maintenance. It can be especially useful for big data.
Automation can range from simple scripts to full-service tools that can perform
exploratory data analysis, statistical analysis, and model selection.
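On the "simple scripts" end of that range, a few lines of Python can already automate a first-pass exploratory summary. The sketch below is a minimal, hypothetical example; the folder pattern and report file name are assumptions:
import glob
import pandas as pd

# Hypothetical locations; adjust to the actual project layout
INPUT_PATTERN = 'data/*.csv'
REPORT_PATH = 'eda_report.txt'

with open(REPORT_PATH, 'w') as report:
    for path in glob.glob(INPUT_PATTERN):
        df = pd.read_csv(path)
        report.write(f"=== {path} ===\n")
        report.write(f"rows: {len(df)}, columns: {len(df.columns)}\n")
        # isna().sum() counts missing values per column
        report.write(f"missing values:\n{df.isna().sum()}\n")
        # describe() gives count/mean/std/min/quartiles/max for numeric columns
        report.write(f"{df.describe()}\n\n")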
Module 2: Alteryx Foundation Micro Credential Badges
2.1 Alteryx Micro Credential - Design Core: General Knowledge
2.2 Alteryx Micro Credential - Design Core: Data Preparation
2.3 Alteryx Micro Credential - Design Core: Data Manipulation
2.4 Alteryx Micro Credential - Design Core: Data Transformation
2.5 Alteryx Design Core
Machine learning (ML) can automate many steps in the data analytics process, including:
Data cleansing: ML algorithms can automatically detect and fix errors,
inconsistencies, or missing data.
Data transformation: ML models can automatically transform raw data into a more
usable format.
Feature engineering: ML can automate feature selection and engineering, which are
essential for building predictive models.
Predictive analytics: ML models can identify patterns, trends, and correlations in
data to help predict future trends or events.
Reducing bias: ML models can help reduce unintended bias.
Network architecture search: Some automation of network architecture searches is
possible, such as with the neural architecture search (NAS) method.
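A minimal sketch of the first three items above, assuming scikit-learn is available: a Pipeline that imputes missing values (cleansing), scales features (transformation), and feeds them to a model. The toy arrays are invented for illustration:
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Toy data with a deliberate missing value (np.nan)
X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0], [6.0, 1.0]])
y = np.array([0, 0, 1, 1])

pipeline = Pipeline([
    ('impute', SimpleImputer(strategy='mean')),  # automated data cleansing
    ('scale', StandardScaler()),                 # automated transformation
    ('model', LogisticRegression()),             # predictive model
])

pipeline.fit(X, y)
print(pipeline.predict([[2.0, 2.5]]))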
Data analytics automation can help businesses make better, more informed decisions by
analyzing large sets of data quickly and efficiently. Here are some other benefits of data
analytics automation:
Faster work: Employees can use their time on high-value work.
Improved customer satisfaction: Data analytics automation can help improve
customer satisfaction.
Free up time: Businesses can focus on creating and selling products and services.
Some of the most common scripting languages for process automation are Python, Ruby,
and PowerShell. Python is a versatile and easy-to-learn language that has many libraries
and frameworks for web development, data analysis, and machine learning.
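As a hedged example of what such a script can look like in Python, the sketch below automates one repetitive task, moving processed report files into an archive folder; the folder names are assumptions:
import shutil
from pathlib import Path

# Hypothetical folders; replace with real paths in an actual workflow
INBOX = Path('inbox')
ARCHIVE = Path('archive')
INBOX.mkdir(exist_ok=True)
ARCHIVE.mkdir(exist_ok=True)

# Move every CSV report out of the inbox once it has been processed
for report in INBOX.glob('*.csv'):
    shutil.move(str(report), str(ARCHIVE / report.name))
    print(f"Archived {report.name}")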
Developing Automation Scripts: Depending on the internship, you might write scripts
or code to automate more complex processes. This could involve using Python, Java,
or other programming languages.
Real-World Applications: You might work on projects related to customer service,
finance, human resources, or supply chain management, applying your skills to solve
real business problems.
Where to find these internships:
Company Websites: Check the career pages of companies known for data analytics
and automation (e.g., Accenture, Deloitte, IBM, UiPath, Automation Anywhere).
Job Boards: Search for "data analytics process automation internship" or related
terms on sites like Indeed, LinkedIn, Glassdoor, and Monster.
Internship Platforms: Explore platforms like Internshala, LetsIntern (India-specific),
WayUp, and Chegg Internships.
Networking: Attend online industry events, webinars, and connect with professionals
on LinkedIn to learn about potential opportunities.
Tips for your search:
Develop Relevant Skills: Build a foundation in data analysis (Excel, SQL, Python, R)
and familiarize yourself with basic automation concepts.
Highlight Your Interest: In your resume and cover letter, clearly express your
enthusiasm for data analytics and process automation.
Portfolio Projects: Create personal projects that demonstrate your abilities (e.g.,
automate a task in your daily life, analyze a public dataset).
Practice Your Skills: Use online resources and practice platforms (like HackerRank
or LeetCode) to improve your data analysis and coding skills.
A simple example of a Python automation script that writes a message to a file:
# Example values (hypothetical); any file name and message will do
file_name = 'output.txt'
text = 'Hello from the automation script.'
# Open the file in write mode (this will create the file if it doesn't exist)
with open(file_name, 'w') as file:
    file.write(text)
print(f"File '{file_name}' has been created and the message has been written.")
Python code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# 1. Data Ingestion
# Load data from a CSV file
data = pd.read_csv('your_data.csv')
# Exploratory visualization: one feature against the target
plt.scatter(data['feature1'], data['target'])
plt.xlabel('Feature 1')
plt.ylabel('Target')
plt.show()
# Train/test split (needed before the training step below)
X = data[['feature1']]  # feature column chosen for illustration
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# 5. Model Training
# Create and train a model (e.g., Linear Regression)
model = LinearRegression()
model.fit(X_train, y_train)
# 6. Model Evaluation
# Make predictions on the test set
y_pred = model.predict(X_test)
print('Mean Squared Error:', mean_squared_error(y_test, y_pred))
print('R-squared:', r2_score(y_test, y_pred))
# Save the trained model for later reuse
import joblib
joblib.dump(model, 'trained_model.pkl')
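For completeness, the persisted model can later be reloaded and reused without retraining; a brief sketch using the same file name:
import joblib

# Reload the saved model and predict on new data of the same shape
model = joblib.load('trained_model.pkl')
print(model.predict(X_test))  # X_test as defined in the workflow above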
1. *Data Ingestion:*
- Load data from a CSV file using pandas.read_csv().
5. *Model Training:*
- Create an instance of the chosen machine learning model (e.g., LinearRegression).
- Train the model on the training data using the fit() method.
6. *Model Evaluation:*
- Make predictions on the test data using the predict() method.
- Evaluate model performance using appropriate metrics (e.g., Mean Squared Error,
R-squared).
*Key Considerations:*
- *Data Quality:* The quality of your data significantly impacts the performance of your
analysis. Ensure data is clean, accurate, and relevant to your goals.
- *Feature Engineering:* Careful feature engineering can significantly improve model
performance. Experiment with different transformations and combinations of features.
- *Model Selection:* Choose the appropriate machine learning model based on the nature of
your data and the problem you're trying to solve.
- *Hyperparameter Tuning:* Fine-tune the model's hyperparameters to optimize its
performance.
- *Automation:* Use tools like pandas, scikit-learn, and joblib to automate repetitive tasks
and streamline the analysis process.
This code provides a basic framework for data analytics process automation. You can adapt
and extend it based on your specific needs and the complexity of your analysis.
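As one way to act on the hyperparameter-tuning consideration above, scikit-learn's GridSearchCV automates the search over a parameter grid. Since plain LinearRegression has little to tune, this sketch swaps in Ridge regression; the grid values are illustrative:
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Try a few regularization strengths with 5-fold cross-validation
search = GridSearchCV(Ridge(), param_grid={'alpha': [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)  # X_train/y_train from the workflow above
print(search.best_params_, search.best_score_)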
HTML CODE:
<!DOCTYPE html>
<html>
<head>
<title>Data Analytics Process Automation</title>
<style>
body {
font-family: sans-serif;
margin: 20px;
}
h1 {
text-align: center;
}
ul {
list-style-type: square;
padding-left: 20px;
}
li {
margin-bottom: 10px;
}
code {
background-color: #f0f0f0;
padding: 2px 5px;
}
</style>
</head>
<body>
<h1>Data Analytics Process Automation</h1>
<p>Here's a basic outline of a data analytics process that can be partially automated
using HTML, JavaScript, and potentially other tools:</p>
<ol>
<li>*Data Collection:*
<ul>
<li>*HTML Forms:* Create forms for users to input data manually.
<br>
<code>&lt;form&gt;</code>
<code>&lt;input type="text" name="name"&gt;</code>
<code>&lt;input type="number" name="age"&gt;</code>
<code>&lt;textarea name="comments"&gt;&lt;/textarea&gt;</code>
<code>&lt;/form&gt;</code>
</li>
<li>*Data Import:* Use HTML and JavaScript to handle file uploads (e.g., CSV,
Excel) for importing data.
<br>
<code>&lt;input type="file" name="dataFile"&gt;</code>
<script>
// JavaScript code to handle file upload and processing
</script>
</li>
</ul>
</li>
<li>*Data Cleaning and Transformation:*
<ul>
<li>*Basic Data Validation:* Use JavaScript to perform simple data validation
(e.g., check for empty fields, correct data types).
</li>
<li>*Data Transformation:* While limited, you can use JavaScript for basic data
transformations (e.g., string manipulation, simple calculations).
</li>
</ul>
</li>
<li>*Data Analysis:*
<ul>
<li>*Basic Statistics:* Use JavaScript libraries like D3.js for basic
visualizations (charts, graphs) to explore data.
</li>
<li>*Data Summarization:* Use JavaScript to calculate basic statistics (e.g.,
mean, median, standard deviation) and display results.
</li>
</ul>
</li>
<li>*Data Visualization:*
<ul>
<li>*Charts and Graphs:* Utilize libraries like Chart.js or D3.js to create
interactive visualizations within your HTML page.
</li>
</ul>
</li>
<li>*Reporting:*
<ul>
<li>*HTML Reports:* Generate basic reports in HTML format, including tables,
charts, and summary statistics.
</li>
</ul>
</li>
</ol>
<p>*Important Notes:*</p>
<ul>
<li>HTML is primarily for presentation and user interaction. For more complex data
analysis and automation, you'll need to integrate with other tools and languages like Python
(with libraries like Pandas, NumPy, and scikit-learn) or R.</li>
<li>JavaScript can handle some data processing and visualization tasks, but its
capabilities are limited compared to dedicated data analysis tools.</li>
<li>This example provides a simplified overview. Real-world data analytics automation
often involves more intricate workflows and requires more powerful tools and
techniques.</li>
</ul>
</body>
</html>
*Explanation:*
1. *Data Collection:*
- HTML forms allow users to input data manually.
- File upload functionality enables importing data from external sources.
3. *Data Analysis:*
- JavaScript libraries like D3.js can be used for basic data exploration and visualization.
- Simple statistical calculations can be performed using JavaScript.
4. *Data Visualization:*
- Libraries like Chart.js or D3.js can be used to create interactive charts and graphs
within the HTML page.
5. *Reporting:*
- HTML can be used to generate basic reports, including tables, charts, and summary
statistics.
*Key Points:*