DAV - Module 1
CSC601
Subject Incharge
Mrs. Aditi Malkar
Assistant Professor
Content:
Data Analytics Lifecycle overview: Key Roles for a
Successful Analytics Project, Background and Overview of the
Data Analytics Lifecycle
Phase 1: Discovery, Phase 2: Data Preparation, Phase 3:
Model Planning, Phase 4: Model Building, Phase 5:
Communicate Results and Phase 6: Operationalize
10. Life-long learning: Recognize the need for, and have the preparation
and ability to engage in independent and life-long learning in the
broadest context of technological change.
▪ Examination Scheme
▪ ISE1 & ISE2 : 20 marks each
▪ MSE : 30 marks
▪ End Semester Exam : 100 marks (30% weightage)
https://r.fossee.in/data-analysis-using-r-and-python
● Types of Data:
○ Structured Data: Organized and formatted data typically
found in databases.
○ Unstructured Data: Information that doesn't have a
predefined data model (e.g., text, images).
○ Semi-structured Data: Falls between structured and
unstructured data and includes elements like XML or JSON.
● Data Sources:
○ Primary Data: Collected first hand for a specific purpose.
○ Secondary Data: Existing data collected for another purpose
but used for a different inquiry.
● Data Processing:
○ Batch Processing: Processing data in chunks at scheduled
intervals.
○ Real-time Processing: Analyzing and processing data as it is
generated.
● Data Governance:
● Descriptive Analytics:
○ Describes what has happened in the past by summarizing
historical data.
○ Involves reporting, data visualization, and dashboards.
● Diagnostic Analytics:
○ Focuses on understanding why something happened by
identifying patterns and trends.
○ Seeks to answer questions about the root causes of events.
● Predictive Analytics:
○ Involves using statistical algorithms and machine learning
techniques to make predictions about future outcomes.
○ Utilizes historical data to build models that can forecast
future trends.
● Prescriptive Analytics:
○ Recommends actions to optimize or improve outcomes.
○ Combines insights from descriptive, diagnostic, and
predictive analytics to provide actionable recommendations.
● Text Analytics (Text Mining):
○ Involves extracting valuable information and insights from
unstructured text data, such as emails, social media, and
documents.
● Healthcare Analytics:
○ Applies analytics to healthcare data to improve patient
outcomes, optimize operations, and reduce costs.
● Social Media Analytics:
○ Analyzes data from social media platforms to understand
trends, sentiment, and user engagement.
● Fraud Analytics:
○ Utilizes analytics to detect and prevent fraudulent activities
by identifying irregular patterns in data.
● Supply Chain Analytics:
○ Applies analytics to optimize supply chain operations,
inventory management, and logistics.
● Types of Visualizations:
○ Charts and Graphs: Bar charts, line charts, scatter plots, pie
charts, etc.
○ Maps: Geospatial visualizations to represent data on maps.
○ Dashboards: Integrated displays of multiple visualizations
and key metrics.
○ Infographics: Visual representations combining text and
images to convey information.
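The data types and analytics types listed above can be made concrete with a small sketch. It is illustrative only: the JSON records, the field names (`month`, `sales`, `note`) and the helper functions `describe` and `fit_line` are assumptions invented for this example, and the "predictive" step is just a least-squares straight line, not a recommended forecasting method.

```python
import json
from statistics import mean

# Hypothetical semi-structured input: JSON records, one with an extra field.
raw = """
[
  {"month": 1, "sales": 100},
  {"month": 2, "sales": 120, "note": "promo"},
  {"month": 3, "sales": 140},
  {"month": 4, "sales": 160}
]
"""

# Semi-structured -> structured: keep a fixed set of columns for every record.
records = json.loads(raw)
rows = [(r["month"], r["sales"]) for r in records]


def describe(rows):
    """Descriptive analytics: summarize what has already happened."""
    sales = [s for _, s in rows]
    return {"count": len(sales), "mean": mean(sales),
            "min": min(sales), "max": max(sales)}


def fit_line(rows):
    """Predictive analytics (toy): least-squares line through (month, sales)."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in rows)
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return slope, intercept


summary = describe(rows)
slope, intercept = fit_line(rows)
forecast_month_5 = slope * 5 + intercept  # extrapolate one month ahead

print(summary)
print(round(forecast_month_5, 2))
```

The summary answers "what happened", while the fitted line uses the same historical rows to estimate a future value. A prescriptive step would go further and recommend an action based on that estimate.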
https://www.linkedin.com/pulse/data-analytics-lifecycle-aniket-shukla
https://slideplayer.com/slide/7903200/
➔ Performing ETLT
The team may want clean data and aggregated data and may need
to keep a copy of the original data to compare against or look for
hidden patterns that may have existed in the data before the
cleaning stage.
This process can be summarized as ETLT to reflect the fact that a
team may choose to perform ETL in one case and ELT in another.
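A minimal sketch of the difference, assuming a tiny in-memory dataset: the rows, the field names and the functions `clean`, `etl`, `elt` and `aggregate` are hypothetical and stand in for whatever extraction and loading tools a real team would use.

```python
import copy

# Hypothetical raw extract; field names and values are made up for illustration.
source_rows = [
    {"id": 1, "region": " north ", "amount": "100"},
    {"id": 2, "region": "South", "amount": None},
    {"id": 3, "region": "north", "amount": "50"},
]


def clean(row):
    """Transform step: normalize the region text and convert amount to float."""
    return {
        "id": row["id"],
        "region": row["region"].strip().lower(),
        "amount": float(row["amount"]),
    }


def etl(rows):
    """ETL: transform before loading, so only clean rows reach the store."""
    return [clean(r) for r in rows if r["amount"] is not None]


def elt(rows):
    """ELT: load the raw rows untouched first, then transform from that copy."""
    raw_store = copy.deepcopy(rows)  # original data kept for later comparison
    clean_store = [clean(r) for r in raw_store if r["amount"] is not None]
    return raw_store, clean_store


def aggregate(rows):
    """Aggregated view: total amount per region."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals


warehouse = etl(source_rows)            # clean data only
raw_copy, cleaned = elt(source_rows)    # raw copy plus clean data
totals = aggregate(cleaned)
print(len(raw_copy), len(cleaned), totals)
```

With ELT the row that has a missing amount is still available in `raw_copy`, so the team can inspect what cleaning removed. With ETL alone that row never reaches the store.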
➔ Learning About the Data
A critical aspect of a data science project is to become familiar with
the data itself.
➔ Data Conditioning
➔ Model Selection
◆ Choosing a statistical or machine learning model that aligns
with the nature of the problem.
◆ The selection may depend on factors such as the type of
analysis (classification, regression, clustering), the structure of
the data, and the goals of the project.
➔ Model Training:
Build and train the selected models using the prepared data.
➔ Model Evaluation:
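The model's performance is evaluated on a separate set of data that it has not seen
before. The appropriate evaluation metrics depend on the type of analysis, for example
accuracy, precision, and recall for classification, or mean squared error for regression.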
➔ Cross-Validation:
Splitting the dataset into multiple subsets, training the model on different subsets,
and evaluating its performance across them. Common techniques include k-fold
cross-validation.
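The training, evaluation and k-fold steps above can be sketched in plain Python. This is an illustration under stated assumptions, not a prescribed workflow: the synthetic data, the baseline "model" that always predicts the training mean, and the helper names `k_fold_indices`, `train_mean_model`, `mse` and `cross_validate` are all invented for the example. In practice a library routine would normally be used instead.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_mean_model(ys):
    """Toy model: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m


def mse(model, xs, ys):
    """Mean squared error of the model on the given points."""
    return mean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])


def cross_validate(xs, ys, k=5):
    """Train on k-1 folds, evaluate on the held-out fold, repeat k times."""
    folds = k_fold_indices(len(xs), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = train_mean_model([ys[j] for j in train_idx])
        scores.append(mse(model,
                          [xs[j] for j in test_idx],
                          [ys[j] for j in test_idx]))
    return scores


# Synthetic data for illustration only.
xs = list(range(20))
ys = [2.0 * x + 1.0 for x in xs]

folds = k_fold_indices(len(xs), 5)
scores = cross_validate(xs, ys, k=5)
print([round(s, 2) for s in scores], round(mean(scores), 2))
```

Each observation is used for evaluation exactly once and for training in the other k-1 rounds, so the averaged score is less sensitive to one lucky or unlucky split than a single hold-out set.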
Data Analytics and Visualization
Department of AI & DS Mrs. Aditi Malkar 55
Phase 4: Model Building
● Does the model appear valid and accurate on the test data?
● Does the model output/behavior make sense to the domain
experts?
● Do the parameter values of the fitted model make sense in the
context of the domain?
● Is the model sufficiently accurate to meet the goal?
● Does the model avoid intolerable mistakes?
● Are more data or more inputs needed?
● Do any of the inputs need to be transformed or eliminated?
● Will the kind of model chosen support the runtime
requirements?
● Is a different form of the model required to address the
business problem? If so, go back to the model planning phase
and revise the modeling approach.
Phase 4: Model Building
➔ Visualization of Results:
Create clear and compelling visualizations to communicate the
results effectively. Visual representations such as charts, graphs, and
dashboards can help stakeholders, including non-technical
audiences, understand the key findings and insights derived from the
analysis.
● After describing the emerging Big Data ecosystem and the new roles needed to
support its growth, this section explores three examples of Big Data Analytics
in different areas: retail, IT infrastructure, and social media.
● As mentioned earlier, Big Data presents many opportunities to improve
sales and marketing analytics. An example of this is the U.S. retailer
Target. Charles Duhigg’s book The Power of Habit [4] discusses how
Target used Big Data and advanced analytical methods to drive new
revenue.
● After analyzing consumer-purchasing behavior, Target’s statisticians determined
that the retailer made a great deal of money from three main
life-event situations.
2. Discuss the roles and responsibilities of key stakeholders for a successful data analytics project.
3. Explain the activities involved in Phase 1: Discovery of the Data Analytics Lifecycle.
4. Describe the process and importance of data preparation in Phase 2 of the Data Analytics Lifecycle.
5. What are the key steps involved in Phase 3: Model Planning of the Data Analytics Lifecycle? Discuss the tools
commonly used in this phase.
6. Discuss the activities performed in Phase 4: Model Building and list some common tools used during this phase.
7. Why is it important to effectively communicate results in Phase 5 of the Data Analytics Lifecycle? Provide examples of
tools or techniques used in this phase.
8. Explain the importance of Phase 6: Operationalize in the Data Analytics Lifecycle. How does it impact the overall
success of an analytics project?
9. Describe the process of ETLT (Extract, Transform, Load, Transform) during the data preparation phase and explain its
importance in analytics.
10. How does understanding the business domain and interviewing key stakeholders help in framing the problem during
the Discovery phase?
11. Explain the key outputs of a successful data analytics project.
Thank you …