DAV - Module 1

The document outlines the course structure for Data Analytics and Visualization (DAV) under the supervision of Mrs. Aditi Malkar, covering topics such as the data analytics lifecycle, program outcomes, and various analytics techniques. It details the teaching methodology, examination scheme, and course outcomes, emphasizing the importance of data visualization and its tools. Additionally, it highlights the significance of data privacy, security, and governance in the analytics process.


Data Analytics and Visualization (DAV)

CSC601

Subject Incharge
Mrs. Aditi Malkar
Assistant Professor

Google Classroom Code:


email: aditi.malkar@fragnel.edu.in

Jan to Jun 2025

Data Analytics and Visualization


Department of AI & DS Mrs. Aditi Malkar 1
Module 1:
Introduction to Data analytics
and life cycle

Content:
Data Analytics Lifecycle overview: Key Roles for a
Successful Analytics Project, Background and Overview of Data
Analytics Lifecycle Project
Phase 1: Discovery, Phase 2: Data Preparation, Phase 3:
Model Planning, Phase 4: Model Building, Phase 5:
Communicate Results and Phase 6: Operationalize

Program Outcomes (POs)
• POs are consistent with the Graduate Attributes (GAs) defined in the
Washington Accord.
• These form a set of individually assessable outcomes that are the
components indicative of the graduate’s potential to acquire competence
to practice at the appropriate level.
• GAs are exemplars of the attributes expected of a graduate of an
accredited program.

These Program Outcomes (POs) are -


1. Engineering knowledge: Apply the knowledge of mathematics, science,
engineering fundamentals, and an engineering specialization for the solution of
complex engineering problems.
2. Problem analysis: Identify, formulate, research literature, and analyse complex
engineering problems reaching substantiated conclusions using first principles of
mathematics, natural sciences, and engineering sciences.
3. Design/development of solutions: Design solutions for complex engineering
problems and design system components or processes that meet the specified
needs with appropriate consideration for public health and safety, and cultural,
societal, and environmental considerations.

4. Conduct investigations of complex problems: Use research-based
knowledge and research methods including design of experiments,
analysis and interpretation of data, and synthesis of the information to
provide valid conclusions.
5. Modern tool usage: Create, select, and apply appropriate techniques,
resources, and modern engineering and IT tools, including prediction
and modelling to complex engineering activities, with an understanding
of the limitations.
6. The engineer and society: Apply reasoning informed by the
contextual knowledge to assess societal, health, safety, legal and
cultural issues and the consequent responsibilities relevant to the
professional engineering practice.
7. Environment and sustainability: Understand the impact of the
professional engineering solutions in societal and environmental
contexts, and demonstrate the knowledge of, and need for sustainable
development.
8. Ethics: Apply ethical principles and commit to professional ethics and
responsibilities and norms of the engineering practice.

9. Individual and team work: Function effectively as an individual,
and as a member or leader in diverse teams, and in multi-disciplinary
settings.

10. Communication: Communicate effectively on complex engineering
activities with the engineering community and with the society at
large, such as being able to comprehend and write effective reports
and design documentation, make effective presentations, and give and
receive clear instructions.

11. Project management and finance: Demonstrate knowledge and
understanding of the engineering and management principles and
apply these to one’s own work, as a member and leader in a team, to
manage projects and in multidisciplinary environments.

12. Life-long learning: Recognize the need for, and have the preparation
and ability to engage in independent and life-long learning in the
broadest context of technological change.

Course Outcomes (COs) of CSC601 - DAV

CSC601.1 Select protocols or technologies required for various web applications. (Module 1)
CSC601.2 Apply JavaScript to add functionality to web pages. (Module 2)
CSC601.3 Design front end application using basic React. (Module 3)
CSC601.4 Construct web based Node.js applications using Express. (Module 4)
CSC601.5 Design front end applications using functional components of React. (Module 5)
CSC601.6 Design back-end applications using Node.js. (Module 6)

Books

▪ Teaching Scheme
▪ Theory: 3 hours/week
▪ No. of Credits: 3

▪ Examination Scheme
▪ ISE1 & ISE2: 20 marks each
▪ MSE: 30 marks
▪ End Semester Exam: 100 marks (30% weightage)

▪ Quiz / Presentation / Case Study

https://r.fossee.in/data-analysis-using-r-and-python

Teaching Methodology
▪ Each topic in the syllabus will include
▪ A lecture delivered with a PPT or a live teaching setup
▪ MCQs during the lecture/practical
▪ After the session, the following will be uploaded on Google
Classroom for reference:
▪ PPT / saved on-screen explanation / board screenshots
▪ Additional reading material / web references
▪ Whenever applicable, the video recording of the lecture, with
restricted access, for a quick recap or to complete your
notes

▪ Doubts can be posted in the WhatsApp group
▪ A doubt-clearing session will be held every week for half an hour
▪ A graded quiz will be conducted at the end of each module

❖ Introduction to Data analytics and Life Cycle

Data Analytics Lifecycle Overview:


❏ Key Roles for a Successful Analytics Project
❏ Background
❏ Overview of Data Analytics Lifecycle Project
Phase 1: Discovery
Phase 2: Data Preparation
Phase 3: Model Planning
Phase 4: Model Building
Phase 5: Communicate Results
Phase 6: Operationalize

❖ Data Analytics and Visualization (DAV)

➢ Background

❖ Data

● In computing, data is information that has been translated into
a form that is efficient for movement or processing.
● Relative to today's computers and transmission media, data is
information converted into binary digital form.

❖ Data

● Types of Data:
○ Structured Data: Organized and formatted data typically
found in databases.
○ Unstructured Data: Information that doesn't have a
predefined data model (e.g., text, images).
○ Semi-structured Data: Falls between structured and
unstructured data and includes elements like XML or JSON.

● Data Sources:
○ Primary Data: Collected firsthand for a specific purpose.
○ Secondary Data: Existing data collected for another purpose
but used for a different inquiry.
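To make the structured versus semi-structured distinction concrete, here is a minimal Python sketch (standard library only) that turns semi-structured JSON records, whose fields vary from record to record, into structured rows with a fixed set of columns. The records, the column names, and the helpers `to_row` and `flatten` are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical semi-structured input: each record has a different set of fields.
RAW = """
[
  {"id": 1, "name": "Asha", "contact": {"email": "asha@example.com"}},
  {"id": 2, "name": "Ravi", "tags": ["new", "mobile"]},
  {"id": 3, "contact": {"email": "meera@example.com", "phone": "555-0100"}}
]
"""

# Fixed schema for the structured (tabular) version of the data.
COLUMNS = ["id", "name", "email", "phone", "tag_count"]


def to_row(record: dict) -> dict:
    """Flatten one JSON object into a row that has exactly the COLUMNS keys."""
    contact = record.get("contact", {})
    return {
        "id": record.get("id"),
        "name": record.get("name"),  # None when the field is absent
        "email": contact.get("email"),
        "phone": contact.get("phone"),
        "tag_count": len(record.get("tags", [])),  # nested list -> single number
    }


def flatten(raw_json: str) -> list:
    """Parse a JSON array and convert every record to a structured row."""
    return [to_row(record) for record in json.loads(raw_json)]


if __name__ == "__main__":
    for row in flatten(RAW):
        print([row[column] for column in COLUMNS])
```

Once every record has the same columns, the data can be stored in a relational table; missing fields simply become empty (None) values.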

❖ Data

● Data Storage and Databases:
○ Relational Databases: Organize data into tables with
predefined relationships.
○ NoSQL Databases: Designed to handle unstructured and
varied data types.

● Data Processing:
○ Batch Processing: Processing data in chunks at scheduled
intervals.
○ Real-time Processing: Analyzing and processing data as it is
generated.
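The difference between batch and real-time processing can be sketched in a few lines of Python. This is an illustrative assumption, not a production design: `batch_mean` waits for the complete chunk, while the generator `streaming_means` (a hypothetical name) updates its result as each reading arrives, keeping only a running count and mean.

```python
from typing import Iterable, Iterator, List


def batch_mean(values: List[float]) -> float:
    """Batch processing: the whole chunk is collected first, then processed at once."""
    return sum(values) / len(values)


def streaming_means(stream: Iterable[float]) -> Iterator[float]:
    """Real-time processing: update the result as each value arrives.

    Only a running count and mean are kept, not the full history of values.
    """
    count, mean = 0, 0.0
    for value in stream:
        count += 1
        mean += (value - mean) / count  # incremental update of the mean
        yield mean


if __name__ == "__main__":
    readings = [10.0, 20.0, 30.0, 40.0]  # hypothetical sensor readings
    print("batch:", batch_mean(readings))
    for latest in streaming_means(readings):
        print("running mean so far:", latest)
```

Both approaches reach the same final answer; the streaming version simply has an up-to-date value after every reading without storing past readings.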

❖ Data
● Data Privacy and Security:
Protecting sensitive and personally identifiable information (PII)
from unauthorized access and misuse.

● Data Analytics Techniques:
○ Descriptive Analytics: Summarizing historical data to
understand past trends.
○ Predictive Analytics: Using statistical algorithms and
machine learning to make predictions.
○ Prescriptive Analytics: Recommending actions based on
analysis.

● Data Governance:
Policies and practices to ensure high data quality, availability,
and security.
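One common way to protect personally identifiable information during analysis is pseudonymization: replacing identifiers such as email addresses with keyed hashes. The sketch below uses Python's standard `hmac` and `hashlib` modules; the key, the records, and the `pseudonymize` helper are illustrative assumptions, and a real deployment would keep the key outside the code and follow the organization's governance policies.

```python
import hashlib
import hmac

# Assumption: in practice the key is stored separately from the data (e.g. in a
# secrets vault); this hard-coded value is for illustration only.
SECRET_KEY = b"example-secret-key"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace a PII value with a keyed hash (HMAC-SHA256, hex encoded).

    The same input always maps to the same token, so records can still be
    joined or counted per person, but the original value is not stored.
    """
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


# Hypothetical transaction records containing an email address (PII).
records = [
    {"email": "Asha@Example.com", "purchase": 120},
    {"email": "asha@example.com", "purchase": 80},
    {"email": "ravi@example.com", "purchase": 50},
]

# Analysts work with the pseudonymized copy, which no longer holds emails.
safe = [{"user": pseudonymize(r["email"]), "purchase": r["purchase"]} for r in records]
```

A keyed hash is used instead of a plain hash so that someone without the key cannot rebuild the mapping by hashing a list of known email addresses.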
❖ Data

● Big Data: Refers to extremely large and complex datasets that
traditional data processing applications are unable to handle
efficiently.
● This term is characterized by the three V’s:
○ Volume: Big Data involves large amounts of data, typically ranging from
terabytes to petabytes and beyond, drawn from sources such as social
media, sensors, and business transactions.
○ Velocity: Big Data is often generated at high speed and requires
real-time or near-real-time processing, for example streaming data from
social media, financial transactions, or IoT (Internet of Things) devices.
○ Variety: Big Data includes structured, semi-structured, and unstructured data.

Handling Big Data requires specialized tools and technologies, including:

● Distributed Computing, NoSQL Databases, Data Lakes, Machine
Learning and Analytics
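Distributed computing frameworks such as Hadoop popularized the MapReduce pattern: each worker processes its own chunk of data (map), and the partial results are then merged (reduce). The following single-machine Python sketch only imitates that pattern with a word count; it does not distribute anything, and the chunks and function names are hypothetical.

```python
from collections import Counter
from functools import reduce
from typing import List


def map_phase(chunk: str) -> Counter:
    """Map: a worker counts the words in its own chunk of the data."""
    return Counter(chunk.lower().split())


def reduce_phase(partials: List[Counter]) -> Counter:
    """Reduce: merge the partial counts produced by all workers."""
    return reduce(lambda total, part: total + part, partials, Counter())


# Hypothetical chunks; on a cluster each would live on a different machine.
chunks = [
    "big data needs distributed computing",
    "distributed computing splits big jobs",
    "data lakes store raw data",
]

partial_counts = [map_phase(chunk) for chunk in chunks]  # would run in parallel
totals = reduce_phase(partial_counts)
```

Because each map call needs only its own chunk, the map step can be spread across many machines; only the much smaller partial counts have to be combined.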

❖ Analytics
● Analytics is the systematic computational analysis of data or
statistics. It involves the discovery, interpretation, and
communication of meaningful patterns in data.
● The goal of analytics is to gain insights, make informed
decisions, and optimize processes based on data-driven
evidence.

❖ Analytics

Key Types of Analytics:

● Descriptive Analytics:
○ Describes what has happened in the past by summarizing
historical data.
○ Involves reporting, data visualization, and dashboards.

● Diagnostic Analytics:
○ Focuses on understanding why something happened by
identifying patterns and trends.
○ Seeks to answer questions about the root causes of events.

❖ Analytics

● Predictive Analytics:
○ Involves using statistical algorithms and machine learning
techniques to make predictions about future outcomes.
○ Utilizes historical data to build models that can forecast
future trends.
● Prescriptive Analytics:
○ Recommends actions to optimize or improve outcomes.
○ Combines insights from descriptive, diagnostic, and
predictive analytics to provide actionable recommendations.
● Text Analytics (Text Mining):
○ Involves extracting valuable information and insights from
unstructured text data, such as emails, social media, and
documents.
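As a small illustration of the difference between descriptive and predictive analytics, the sketch below first summarizes a hypothetical six-month sales series and then fits a straight-line trend by ordinary least squares to forecast the next month. The data are invented and a linear trend is an assumption; real predictive models are usually more elaborate.

```python
from statistics import mean, median

# Hypothetical monthly sales (units sold) for months 1..6.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 125, 130, 145, 150]

# Descriptive analytics: summarize what has already happened.
summary = {
    "mean": mean(sales),
    "median": median(sales),
    "min": min(sales),
    "max": max(sales),
}


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    return slope, y_bar - slope * x_bar


# Predictive analytics: use the historical pattern to forecast month 7.
slope, intercept = fit_line(months, sales)
forecast_month_7 = slope * 7 + intercept

if __name__ == "__main__":
    print("summary:", summary)
    print(f"forecast for month 7: {forecast_month_7:.1f} units")
```

The summary only describes the past; the fitted line turns the same history into an estimate about the future, which is the step from descriptive to predictive analytics.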

❖ Analytics

● Business Intelligence (BI):
○ Encompasses a set of tools, processes, and technologies that
help organizations transform raw data into actionable
business insights.
● Spatial Analytics:
○ Focuses on analyzing spatial data, often in the form of maps,
to uncover geographical patterns and trends.
● Customer Analytics:
○ Analyzes customer data to understand behavior, preferences,
and trends.
○ Aids in customer segmentation, targeting, and personalized
marketing.
● Web Analytics:
○ Examines web data to understand website performance, user
behavior, and online marketing effectiveness.
❖ Analytics

● Healthcare Analytics:
○ Applies analytics to healthcare data to improve patient
outcomes, optimize operations, and reduce costs.
● Social Media Analytics:
○ Analyzes data from social media platforms to understand
trends, sentiment, and user engagement.
● Fraud Analytics:
○ Utilizes analytics to detect and prevent fraudulent activities
by identifying irregular patterns in data.
● Supply Chain Analytics:
○ Applies analytics to optimize supply chain operations,
inventory management, and logistics.

❖ Data Analytics
❏ Data analytics is the process of examining, cleaning,
transforming, and modeling data with the goal of discovering
useful information, drawing conclusions, and supporting
decision-making.
❏ Key components and concepts in data analytics include:
● Data Collection
● Data Cleaning and Preprocessing
● Exploratory Data Analysis (EDA)
● Descriptive Analytics
● Predictive Analytics
● Prescriptive Analytics
● Data Visualization
● Machine Learning
● Big Data Analytics
● Business Intelligence (BI)
● Data Mining
● Optimization
❖ Visualization

● Data visualization is the representation of data in graphical or
pictorial formats to make it more accessible, understandable,
and interpretable.
● Visualizing data helps to identify patterns, trends, correlations,
and insights that might be challenging to discern from raw data
alone.

❖ Visualization

Key Aspects of Data Visualization:

● Types of Visualizations:
○ Charts and Graphs: Bar charts, line charts, scatter plots, pie
charts, etc.
○ Maps: Geospatial visualizations to represent data on maps.
○ Dashboards: Integrated displays of multiple visualizations
and key metrics.
○ Infographics: Visual representations combining text and
images to convey information.

❖ Visualization
● Benefits of Data Visualization:
○ Clarity: Complex datasets become more understandable and
accessible.
○ Insights: Patterns and trends can be quickly identified.
○ Communication: Facilitates effective communication of data-driven
insights.
○ Decision-Making: Supports informed decision-making processes.

● Tools for Data Visualization:
○ Tableau: A popular data visualization tool with a user-friendly
interface.
○ Power BI: Microsoft's business analytics tool for creating interactive
visualizations.
○ Matplotlib and Seaborn: Python libraries for creating static and
interactive visualizations; Seaborn is built on Matplotlib and focuses
on statistical graphics.
○ D3.js: A JavaScript library for creating dynamic and interactive
visualizations in web browsers.
❖ Visualization
● Design Principles:
○ Simplicity: Keep visualizations clear and uncluttered.
○ Relevance: Ensure that visualizations directly relate to the
data and the objectives.
○ Consistency: Use consistent colors, scales, and labeling for
better comprehension.
○ Interactivity: When appropriate, add interactive elements for
exploration.
● Common Visualization Techniques:
○ Heatmaps: Representing data values in a matrix using
colors.
○ Bubble Charts: Visualizing three variables at once, with the x
position, y position, and bubble size each encoding one variable.
○ Treemaps: Hierarchical data represented as nested
rectangles.
❖ Visualization
● Storytelling with Data:
○ Combine visualizations into a coherent narrative to tell a
compelling story.
○ Use annotations and commentary to guide the audience
through the data story.

● Challenges in Data Visualization:
○ Misinterpretation: Poorly designed visualizations can lead to
misinterpretation.
○ Overcomplexity: Overloading visualizations with unnecessary
details.
○ Biases: Unintentional biases introduced through design
choices.

➢ Data Analytics Life Cycle Overview

❏ Key Roles for a Successful Analytics Project

➢ Data Analytics Life Cycle Overview

https://www.linkedin.com/pulse/data-analytics-lifecycle-aniket-shukla

https://slideplayer.com/slide/7903200/

➢ Data Analytics Life Cycle Overview

● It is designed for Big Data problems and data science projects.

● The Data Analytics Lifecycle defines the analytics process and best
practices, from discovery to project completion.

● It defines the roadmap of how data is generated, collected,
processed, used, and analyzed to achieve business goals.

● It offers a systematic way to manage data and convert it into
information that can be used to fulfill organizational and project
goals.
➢ Overview of Data Analytics Lifecycle Project
● Phase 1: Discovery
● Phase 2: Data Preparation
● Phase 3: Model Planning
● Phase 4: Model Building
● Phase 5: Communicate Results
● Phase 6: Operationalize

❏ Data Analytics Life Cycle Overview

● Phase 1: Discovery
➔ Learning the Business Domain
➔ Resources
➔ Framing the Problem
➔ Identifying Key Stakeholders
➔ Interviewing the Analytics Sponsor
➔ Developing Initial Hypotheses
➔ Identifying Potential Data Sources

Phase 1: Discovery

❏ The first stage of the data analytics lifecycle is discovery.
❏ In this stage, organizations identify business problems or
opportunities that can be addressed through data analysis.
❏ This stage involves understanding the business context, defining the
problem statement, and setting clear objectives and goals for the
data analytics project.
❏ It also involves identifying the relevant data sources and
stakeholders who will be involved in the data analytics process.

Phase 1: Discovery

➔ Learning the Business Domain
The team learns the business domain, including relevant history, such as
whether the organization or business unit has attempted similar projects
in the past from which it can learn.

➔ Resources
The team assesses the resources available to support the project in
terms of people, technology, time, and data.

➔ Framing the Problem
The team states the analytics problem to be solved as a clear problem
statement.

➔ Identifying Key Stakeholders (People)

➔ Interviewing the Analytics Sponsor

➔ Developing Initial Hypotheses
The team formulates initial hypotheses that can later be tested with data.

➔ Identifying Potential Data Sources

Phase 1: Discovery

The following people are involved in this phase:

● Business user, project sponsor, project manager – Vice presidents from
the Office of the CTO
● BI Analyst – a person from IT
● Data engineer, DBA – people from IT
● Data Scientist – Distinguished Engineer

● Phase 2: Data Preparation
➔ Preparing the Analytic Sandbox
➔ Performing ETLT
➔ Learning About the Data
➔ Data Conditioning
➔ Survey and visualize
➔ Common Tools for the Data Preparation Phase

Phase 2: Data Preparation

❏ Once the problem statement is defined, the next stage is data
preparation.
❏ Data preparation is a critical stage in the data analytics lifecycle
as the quality of the data used for analysis has a direct impact on
the accuracy and reliability of the results.
❏ In this stage, data is collected, cleaned, and transformed into a
format that is suitable for analysis.
❏ This may involve data integration, data cleansing, data
enrichment, and data transformation activities.
❏ Data visualization techniques may also be used to gain a better
understanding of the data and identify any data quality issues.

Phase 2: Data Preparation
➔ Preparing the Analytic Sandbox
The first subphase of data preparation requires the team to obtain
an analytic sandbox (also commonly referred to as a workspace), in
which the team can explore the data without interfering with live
production databases.

Phase 2: Data Preparation

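➔ Performing ETLT
In ETL (extract, transform, load), data is transformed before it is
loaded into the sandbox; in ELT (extract, load, transform), raw data is
loaded first and transformed inside the sandbox.
The team may want both clean and aggregated data, and may also need
to keep a copy of the original data to compare against or to look for
hidden patterns that may have existed in the data before the
cleaning stage.
This process can be summarized as ETLT to reflect the fact that a
team may choose to perform ETL in one case and ELT in another.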
➔ Learning About the Data
A critical aspect of a data science project is to become familiar with
the data itself.
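A minimal sketch of the two orderings behind ETLT, using Python's built-in `sqlite3` module with an in-memory database as a stand-in for the analytic sandbox. The table names, columns, and rows are hypothetical. The `etl` function cleans rows before they would be loaded; the ELT part loads the raw rows unchanged (so a copy of the original data is preserved) and then transforms them with SQL inside the sandbox.

```python
import sqlite3

# Hypothetical raw extract: amounts arrive as text and one row is malformed.
raw_rows = [("T1", "Mumbai", "250.0"), ("T2", "pune", "100"), ("T3", "Mumbai", "n/a")]


def etl(rows):
    """ETL: Transform in application code first, so only clean rows get Loaded."""
    cleaned = []
    for txn_id, city, amount in rows:
        try:
            cleaned.append((txn_id, city.title(), float(amount)))
        except ValueError:
            continue  # drop rows whose amount is not a number
    return cleaned


con = sqlite3.connect(":memory:")  # stand-in for the analytic sandbox

# ELT step 1, Load: keep the raw data exactly as extracted.
con.execute("CREATE TABLE raw_sales (txn_id TEXT, city TEXT, amount TEXT)")
con.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", raw_rows)

# ELT step 2, Transform: build a clean table inside the sandbox with SQL.
con.execute(
    """
    CREATE TABLE clean_sales AS
    SELECT UPPER(SUBSTR(city, 1, 1)) || LOWER(SUBSTR(city, 2)) AS city,
           CAST(amount AS REAL) AS amount
    FROM raw_sales
    WHERE amount GLOB '[0-9]*'
    """
)

# Aggregated view for analysis; raw_sales is still available for comparison.
totals = con.execute(
    "SELECT city, SUM(amount) FROM clean_sales GROUP BY city ORDER BY city"
).fetchall()
raw_count = con.execute("SELECT COUNT(*) FROM raw_sales").fetchone()[0]
```

Both paths end with the same clean data; the ELT path additionally keeps `raw_sales`, which is what lets the team compare against the original data later.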

Phase 2: Data Preparation

➔ Data Conditioning

Data conditioning refers to the process of cleaning data, normalizing
datasets, and performing transformations on the data.

It can involve many complex steps to join or merge datasets or
otherwise get datasets into a state that enables analysis in
further phases.

Data conditioning is often viewed as a preprocessing step for the
data analysis because it involves many operations on the dataset
before developing models to process or analyze the data.
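The three kinds of conditioning named above (cleaning, joining, and normalizing) can be sketched in plain Python. The datasets and the `condition` function are hypothetical, and min-max scaling to the range [0, 1] is just one possible normalization choice.

```python
# Hypothetical datasets: customer ages (one value missing) and total spend.
customers = [
    {"cust_id": 1, "age": 25},
    {"cust_id": 2, "age": None},
    {"cust_id": 3, "age": 45},
    {"cust_id": 4, "age": 35},
]
spend = {1: 1200.0, 3: 300.0, 4: 800.0}  # cust_id -> total spend


def condition(customers, spend):
    """Clean, join, and normalize the two datasets into analysis-ready rows."""
    # 1. Clean: drop records with a missing age.
    clean = [c for c in customers if c["age"] is not None]
    # 2. Join: attach spend by key, keeping only customers present in both sets.
    joined = [dict(c, spend=spend[c["cust_id"]]) for c in clean if c["cust_id"] in spend]
    # 3. Normalize: min-max scale each numeric column to the range [0, 1].
    for col in ("age", "spend"):
        lo = min(r[col] for r in joined)
        hi = max(r[col] for r in joined)
        for r in joined:
            r[col + "_scaled"] = (r[col] - lo) / (hi - lo) if hi > lo else 0.0
    return joined


if __name__ == "__main__":
    for row in condition(customers, spend):
        print(row)
```

Scaling puts age (tens) and spend (hundreds or thousands) on the same footing, which matters for models that compare variables by magnitude.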

Phase 2: Data Preparation

➔ Survey and visualize

After the team has collected and obtained at least some of the
datasets needed for the subsequent analysis, a useful step is to
leverage data visualization tools to gain an overview of the data.
Seeing high-level patterns in the data enables one to understand
characteristics about the data very quickly.
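Alongside charts, a quick numeric profile gives a similar first overview of a dataset. The `profile` helper below is a hypothetical sketch that reports, for each column, how many values are missing, how many distinct values occur, and the range of the numeric entries.

```python
def profile(rows, columns):
    """Return a per-column overview: missing count, distinct count, numeric range."""
    report = {}
    for col in columns:
        values = [r.get(col) for r in rows]
        present = [v for v in values if v is not None]
        numeric = [v for v in present if isinstance(v, (int, float))]
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "min": min(numeric) if numeric else None,  # None for text-only columns
            "max": max(numeric) if numeric else None,
        }
    return report


if __name__ == "__main__":
    # Hypothetical sample of a dataset under survey.
    sample = [
        {"age": 30, "city": "Pune"},
        {"age": None, "city": "Pune"},
        {"age": 50, "city": "Mumbai"},
    ]
    for column, stats in profile(sample, ["age", "city"]).items():
        print(column, stats)
```

A profile like this quickly shows data quality issues (such as missing values) that a plot of the same column might hide.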

Phase 2: Data Preparation
➔ Common Tools for the Data Preparation Phase:
Apache Hadoop enables distributed processing of large datasets across
clusters of computers using simple programming models.
Alpine Miner provides a graphical user interface (GUI) for creating
analytic workflows, including data manipulations and a series of analytic
events such as staged data-mining techniques (for example, first select
the top 100 customers, and then run descriptive statistics and clustering)
on PostgreSQL and other Big Data sources.
OpenRefine (formerly called Google Refine) is “a free, open source,
powerful tool for working with messy data.” It is a popular GUI-based tool
for performing data transformations, and it’s one of the most robust free
tools currently available.
Data Wrangler is an interactive tool for data cleaning and
transformation. It was developed at Stanford University and can be used
to perform many transformations on a given dataset. Data transformation
outputs can be put into Java or Python. The advantage of this feature is
that a subset of the data can be manipulated in Wrangler via its GUI, and
then the same operations can be written out as Java or Python code to be
executed against the full, larger dataset offline in a local analytic sandbox.
● Phase 3: Model Planning
➔ Data Exploration and Variable Selection
➔ Model Selection
➔ Common Tools for the Model Planning Phase

Phase 3: Model Planning

➔ Data Exploration and Variable Selection

◆ Data exploration aims to understand the relationships between
variables and identify key features for analysis.
◆ Variable selection involves determining which variables
(features) are most relevant to the problem at hand.
Techniques such as correlation analysis can help identify them.

➔ Model Selection
◆ Choosing a statistical or machine learning model that aligns
with the nature of the problem.
◆ The selection may depend on factors such as the type of
analysis (classification, regression, clustering), the structure of
the data, and the goals of the project.
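Correlation analysis for variable selection can be sketched as follows: compute the Pearson correlation between each candidate variable and the target, then rank the candidates by its absolute value. The data and the names `pearson` and `rank_features` are hypothetical; correlation only captures linear relationships, so it is a screening aid rather than a final selection method.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_features(features, target):
    """Order candidate variables by the strength |r| of their linear link to the target."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data: which variables move together with monthly sales?
sales = [20, 24, 30, 33, 41]
features = {
    "ad_spend": [2, 3, 4, 5, 6],  # rises with sales
    "discount": [5, 5, 4, 3, 2],  # falls as sales rise (negative correlation)
    "store_id": [3, 1, 4, 1, 5],  # arbitrary code; any correlation is coincidental
}
ranking = rank_features(features, sales)

if __name__ == "__main__":
    for name, r in ranking:
        print(f"{name}: r = {r:+.2f}")
```

Ranking by absolute value keeps strongly negative relationships (like `discount`) near the top, since they are just as informative as positive ones.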

Phase 3: Model Planning

➔ Common Tools for the Model Planning Phase

● Statistical Software (e.g., R, Python with libraries like
Scikit-Learn): Enables statistical analysis and model building.

● Machine Learning Frameworks (e.g., TensorFlow, PyTorch): For
building and training machine learning models.

● Data Visualization Tools (e.g., Tableau, Power BI): Assist in
visually exploring relationships between variables.

● Domain-Specific Software: Industry-specific tools for
specialized analytics tasks.

● Phase 4: Model Building
➔ Common Tools for the Model Building Phase

Phase 4: Model Building

➔ Model Training:
Build and train the selected models using the prepared data.

➔ Model Evaluation:
The model's performance is evaluated using a separate set of data that it hasn't seen
before. Common evaluation metrics depend on the type of analysis:

● Regression: Mean Squared Error, R-squared.
● Classification: Accuracy, Precision, Recall, F1 Score, ROC-AUC.
● Clustering: Silhouette Score.
➔ Hyperparameter Tuning:
Hyperparameters are configuration settings that are not learned during training;
they must be set before training, either manually or through a systematic search
such as grid search.

➔ Cross-Validation:
Splitting the dataset into multiple subsets (folds), training the model on all but
one of them, and evaluating it on the held-out fold, repeating until every fold has
been used for evaluation. A common technique is k-fold cross-validation.
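For reference, the evaluation metrics and the k-fold splitting described above can be written out directly in Python. This is a plain-Python sketch for understanding only; the function names are hypothetical, and libraries such as Scikit-Learn provide tested implementations. The splitter assigns consecutive indices to folds without shuffling, which assumes the data are already in random order.

```python
def mse(y_true, y_pred):
    """Regression: mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Regression: share of the target's variance explained by the predictions."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def classification_scores(y_true, y_pred, positive=1):
    """Classification: accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def k_fold_indices(n, k):
    """Yield (train, test) index lists: each fold is held out exactly once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size
```

For example, `list(k_fold_indices(5, 2))` gives two folds: the first holds out indices 0 to 2 and trains on 3 and 4, and the second does the reverse. Averaging a metric over all folds gives a more stable estimate than a single train/test split.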
Phase 4: Model Building

➔ Creating robust models that are suitable to a specific situation.

Questions to consider include these:

● Does the model appear valid and accurate on the test data?
● Does the model output/behavior make sense to the domain
experts?
● Do the parameter values of the fitted model make sense in the
context of the domain?
● Is the model sufficiently accurate to meet the goal?
● Does the model avoid intolerable mistakes?
● Are more data or more inputs needed?
● Do any of the inputs need to be transformed or eliminated?
● Will the kind of model chosen support the runtime
requirements?
● Is a different form of the model required to address the
business problem? If so, go back to the model planning phase
and revise the modeling approach.
Phase 4: Model Building

➔ Common Tools for the Model Building Phase

● Scikit-Learn (Python): Provides a wide range of tools for machine
learning, including model building, evaluation, and
hyperparameter tuning.
● TensorFlow and PyTorch: Deep learning frameworks for building
neural networks and advanced models.
● RapidMiner: An integrated data science platform with tools for
building, evaluating, and deploying models.
● AutoML Tools (e.g., H2O.ai, DataRobot): Automate aspects of the
model building process, including feature engineering and
hyperparameter tuning.
● SAS Enterprise Miner
● WEKA
● SPSS Modeler
● MATLAB
● Phase 5: Communicate Results

Phase 5: Communicate Results

➔ Interpretation and Analysis:

After building and evaluating the model, it's essential to interpret the
results and analyze the findings. This involves understanding the
implications of the model's predictions or insights and how they align
with the initial hypotheses and business objectives.

➔ Visualization of Results:
Create clear and compelling visualizations to communicate the
results effectively. Visual representations such as charts, graphs, and
dashboards can help stakeholders, including non-technical
audiences, understand the key findings and insights derived from the
analysis.

Phase 5: Communicate Results

➔ Preparation of Reports and Documentation:

Document the entire analytics process, including data sources,
methodologies, and key decisions made during the project.
Prepare comprehensive reports that highlight the main results,
insights, and any recommendations for decision-makers.
Well-documented reports are essential for knowledge transfer,
replication of analyses, and future reference.

Phase 5: Communicate Results

➔ Tailoring Communication to Stakeholders:

Tailor communication of results to different stakeholders, considering
their backgrounds and information needs. Executives may require
high-level summaries and strategic implications, while technical teams
may seek more detailed information about the modeling approach and
results.

➔ Feedback and Validation:

Engage with stakeholders to gather feedback on the results and
validate the findings. This iterative process helps ensure that the
analysis is aligned with the business context and that any additional
insights or clarifications are addressed.

Phase 5: Communicate Results

➔ Common Tools for Communicating Results:

● Data Visualization Tools (e.g., Tableau, Power BI): Create
interactive and visually appealing dashboards to present results.

● Reporting Tools (e.g., Jupyter Notebooks, R Markdown): Prepare
detailed reports with a combination of text, code, and visualizations.

● Presentation Software (e.g., Microsoft PowerPoint): Create slide
decks for delivering presentations to stakeholders.

● Phase 6: Operationalize

● Develop a plan for the deployment of analytical models.
● Integrate solutions into existing business processes or systems.
● Monitor and evaluate the performance of deployed models.
● Provide documentation and training for end-users.
● Establish maintenance procedures for continuous improvement.
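Monitoring a deployed model often comes down to comparing its predictions with ground truth once that truth becomes known, and flagging the model when quality drops. The sketch below assumes labels eventually arrive for each prediction; the class name `RollingAccuracyMonitor` and its `window` and `threshold` parameters are hypothetical choices, not a specific tool's API.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Track a deployed model's accuracy over its most recent predictions.

    Flags the model when accuracy over the last `window` labeled
    predictions falls below `threshold`.
    """

    def __init__(self, window: int = 100, threshold: float = 0.8) -> None:
        self.window = window
        self.threshold = threshold
        # A deque with maxlen discards the oldest outcome automatically.
        self.outcomes: deque[bool] = deque(maxlen=window)

    def record(self, prediction, actual) -> None:
        """Store whether a prediction matched the later-observed ground truth."""
        self.outcomes.append(prediction == actual)

    def accuracy(self) -> float:
        """Fraction of correct predictions in the window (NaN if empty)."""
        if not self.outcomes:
            return float("nan")
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self) -> bool:
        """True once the window is full and accuracy is below the threshold."""
        return len(self.outcomes) == self.window and self.accuracy() < self.threshold


monitor = RollingAccuracyMonitor(window=5, threshold=0.8)
for pred, actual in [(1, 1), (0, 1), (1, 1), (0, 0), (1, 0)]:
    monitor.record(pred, actual)
print(monitor.accuracy())         # 0.6
print(monitor.needs_attention())  # True
```

Waiting for a full window avoids raising alerts on the first handful of predictions; in production, the computed metric would typically be exported to a monitoring system rather than printed.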

➔ Common Tools for Operationalization:

● Model Deployment Platforms (e.g., TensorFlow Serving, Flask): Tools for deploying machine learning models into production environments. TensorFlow Serving is a dedicated model server; Flask is a lightweight Python web framework often used to expose a model through an HTTP API.

● Monitoring Tools (e.g., Prometheus, Grafana): Monitor the performance of deployed models and track key metrics.

● Collaboration Platforms (e.g., Confluence, SharePoint): Facilitate documentation and knowledge sharing among team members.
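Whatever the deployment platform, serving a model boils down to a request handler: parse the input, validate it, score it, and return a response. The framework-independent sketch below shows that logic. The model function `predict_churn`, its coefficients, and the feature names in `REQUIRED_FEATURES` are hypothetical placeholders; a real deployment would load a trained model artifact instead.

```python
import json
import math

# Hypothetical feature names the model expects in each request.
REQUIRED_FEATURES = ("months_inactive", "tenure_months")


def predict_churn(features: dict) -> float:
    """Hypothetical stand-in for a trained model: a fixed logistic score."""
    z = -1.5 + 0.04 * features["months_inactive"] - 0.02 * features["tenure_months"]
    return 1 / (1 + math.exp(-z))


def handle_predict(request_body: str) -> tuple[int, str]:
    """Validate a JSON request, score it, and return (HTTP status, JSON body)."""
    try:
        payload = json.loads(request_body)
        # Reject requests with missing or non-numeric features.
        features = {name: float(payload[name]) for name in REQUIRED_FEATURES}
    except (KeyError, TypeError, ValueError) as exc:
        # json.JSONDecodeError is a subclass of ValueError, so bad JSON lands here too.
        return 400, json.dumps({"error": f"invalid request: {exc}"})
    score = predict_churn(features)
    return 200, json.dumps({"churn_probability": round(score, 4)})


print(handle_predict('{"months_inactive": 3, "tenure_months": 24}'))
print(handle_predict("not json"))
```

Keeping the scoring logic separate from the web layer makes it testable on its own; a Flask view or another server would only need to pass the request body in and send the returned status and body back.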

❏ Example of Big Data Analytics Lifecycle

● Having described the emerging Big Data ecosystem and the new roles needed to support its growth, we now look at how Big Data analytics is applied in practice.
● Big Data analytics can be illustrated with examples from three different areas: retail, IT infrastructure, and social media.
● As mentioned earlier, Big Data presents many opportunities to improve sales and marketing analytics. One example is the U.S. retailer Target. Charles Duhigg’s book The Power of Habit [4] discusses how Target used Big Data and advanced analytical methods to drive new revenue.
● After analyzing consumer purchasing behavior, Target’s statisticians determined that the retailer made a great deal of money from three main life-event situations:

➔ Marriage, when people tend to buy many new products
➔ Divorce, when people buy new products and change their spending habits
➔ Pregnancy, when people have many new things to buy and an urgency to buy them

Target determined that the most lucrative of these life events is the third situation: pregnancy.
Using purchase data collected from shoppers, Target was able to predict which of its shoppers were likely to be pregnant.
In one case, Target knew a female shopper was pregnant even before her family knew.
❏ Overview of a Data Analytics Lifecycle Project

❏ Questionnaires
1. Define the Data Analytics Lifecycle and briefly explain the significance of each phase.

2. Discuss the roles and responsibilities of key stakeholders for a successful data analytics project.

3. Explain the activities involved in Phase 1: Discovery of the Data Analytics Lifecycle.

4. Describe the process and importance of data preparation in Phase 2 of the Data Analytics Lifecycle.

5. What are the key steps involved in Phase 3: Model Planning of the Data Analytics Lifecycle? Discuss the tools
commonly used in this phase.

6. Discuss the activities performed in Phase 4: Model Building and list some common tools used during this phase.

7. Why is it important to effectively communicate results in Phase 5 of the Data Analytics Lifecycle? Provide examples of
tools or techniques used in this phase.

8. Explain the importance of Phase 6: Operationalize in the Data Analytics Lifecycle. How does it impact the overall
success of an analytics project?

9. Describe the process of ETLT (Extract, Transform, Load, Transform) during the data preparation phase and explain its
importance in analytics.

10. How does understanding the business domain and interviewing key stakeholders help in framing the problem during
the Discovery phase?

11. Explain the key outputs of a successful data analytics project.
Thank you …
