Tasbi Ul Hasan-20023247
TASBI UL HASAN
ID: 20023247
Answer to Question No. 01
1. Web Data:
o Description: Data collected from the company’s website, capturing user interactions,
click-through rates, session durations, and navigation paths.
2. Legacy Systems:
o Description: These are older systems that store historical data across different company
operations, potentially containing records on product versions, customer service logs, or
past transactions.
o Relevance: Legacy data is essential for the Design Department to track the
development and performance of products over time and understand long-term trends.
It may also support Marketing with historical customer data.
Mixed data types, including text fields for product descriptions and service logs.
4. Sales Transactions:
o Relevance: This data is crucial for the Marketing Department to analyze sales
performance, identify purchasing patterns, and track high-performing products.
o Relevance: The Operations Department requires this data to ensure employee well-
being, maintain a safe working environment, and monitor compliance with safety
regulations.
1. Extract:
Web Data: Retrieved from web logs or API endpoints, capturing session data
and user interactions.
Legacy Systems: Extracted from databases or file systems where historical data
is stored, often requiring specialized connectors due to legacy formats.
2. Transform:
Integration: Link related data across sources. For example, employee data from
Health and Safety records might be connected to Plant Operations to track
safety incidents by location.
Schema Alignment: Ensure data fields are consistently named and structured to
fit the data warehouse’s schema.
3. Load:
o The transformed data is loaded into the central data warehouse, typically organized in
tables and schemas that match business domains (e.g., Sales, Operations, Marketing).
o Data Partitions: The data might be partitioned by time or department for faster access
and analysis.
o Data Marts: Department-specific views or “marts” are created based on the central
warehouse to allow departments to access only relevant data.
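The three ETL steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the company's actual pipeline: the sample records, the field names (`EmpID`, `employee_id`, `plant`), and the `safety_incidents` table are all hypothetical, and an in-memory SQLite database stands in for the warehouse. It follows the integration example given above, linking Health and Safety records to Plant Operations so incidents can be counted by location.

```python
import sqlite3

# Hypothetical raw records standing in for two of the source systems.
HS_ROWS = [
    {"IncidentID": "i1", "EmpID": " e01 ", "Date": "2024-01-05"},
    {"IncidentID": "i2", "EmpID": "E02", "Date": "2024-01-09"},
]
PLANT_ROWS = [
    {"employee_id": "E01", "plant": "Plant A"},
    {"employee_id": "E02", "plant": "Plant B"},
]


def extract():
    """Extract: a real pipeline would read logs, APIs, or legacy databases here."""
    return HS_ROWS, PLANT_ROWS


def transform(hs_rows, plant_rows):
    """Transform: align field names and link incidents to plant locations."""
    plant_by_employee = {r["employee_id"].strip().upper(): r["plant"] for r in plant_rows}
    out = []
    for r in hs_rows:
        # Schema alignment: normalise the employee ID to one format.
        employee_id = r["EmpID"].strip().upper()
        # Integration: attach the plant location (None if no match is found).
        out.append((r["IncidentID"], employee_id, r["Date"], plant_by_employee.get(employee_id)))
    return out


def load(conn, rows):
    """Load: write the transformed rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS safety_incidents ("
        "incident_id TEXT PRIMARY KEY, employee_id TEXT, "
        "incident_date TEXT, plant TEXT)"
    )
    conn.executemany("INSERT INTO safety_incidents VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the central warehouse
load(conn, transform(*extract()))
incidents_by_plant = conn.execute(
    "SELECT plant, COUNT(*) FROM safety_incidents GROUP BY plant ORDER BY plant"
).fetchall()
```

Keeping each step in its own function means a source can be swapped (for example, replacing the hard-coded lists with a legacy database connector) without touching the other steps.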
o Database Structure:
Fact Tables: Store transactional data, such as sales transactions, manufacturing outputs, or
safety incidents.
Dimension Tables: Store reference data (e.g., customer details, product info, employee records)
to allow efficient joining with fact tables.
Data is organized by business domains (e.g., Sales, Operations, Health and Safety).
Implement indexes on frequently queried fields (e.g., date, transaction ID) to improve
performance.
Historical Data Retention: Data from legacy systems can be retained in a separate archival
schema if needed for long-term reference.
Implement role-based access controls so that each department can only view and access
relevant data.
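As an illustration of the fact/dimension layout and the indexing advice above, the following sketch builds a very small star schema in an in-memory SQLite database. The table names (`fact_sales`, `dim_product`, `dim_customer`), the columns, and the sample rows are assumptions made for the example; a production warehouse would use its own schema and database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: reference data.
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL
    );
    CREATE TABLE dim_customer (
        customer_id INTEGER PRIMARY KEY,
        region      TEXT NOT NULL
    );
    -- Fact table: one row per sales transaction.
    CREATE TABLE fact_sales (
        transaction_id INTEGER PRIMARY KEY,
        sale_date      TEXT NOT NULL,
        product_id     INTEGER NOT NULL REFERENCES dim_product(product_id),
        customer_id    INTEGER NOT NULL REFERENCES dim_customer(customer_id),
        amount         REAL NOT NULL
    );
    -- Index on a frequently queried field (the transaction date).
    CREATE INDEX idx_fact_sales_date ON fact_sales(sale_date);
    """
)
conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Widget"), (2, "Gadget")])
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", [(10, "North"), (11, "South")])
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
    [
        (100, "2024-01-15", 1, 10, 50.0),
        (101, "2024-01-20", 2, 11, 30.0),
        (102, "2024-02-02", 1, 11, 20.0),
    ],
)

# Join the fact table to a dimension table to report sales per product.
sales_by_product = conn.execute(
    """
    SELECT p.product_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
    """
).fetchall()
```

The fact table holds only keys and measures, so descriptive attributes such as the product name are stored once in the dimension table and joined in when needed.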
Data marts provide department-specific access to the data stored in the central warehouse. Here’s
how each data mart will be structured for the company’s departments:
o Structure: Fact tables for production metrics, dimension tables for machine details and
operators.
o Structure: Fact tables for incident reports, dimension tables for employee and machine
information.
o Purpose: Review historical data on product versions and customer engagement trends.
o Structure: Fact tables for product data and website interactions, dimension tables for
product features.
o Structure: Fact tables for sales and web interactions, dimension tables for customer
demographics and product details.
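One common way to build a department-specific mart on top of the central warehouse is a database view. The sketch below assumes this approach; the view name `mart_marketing_sales`, the simplified tables, and the sample rows are hypothetical, and SQLite is used only so the example runs without any setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Central warehouse tables (hypothetical and heavily simplified).
    CREATE TABLE fact_sales (product TEXT, amount REAL);
    CREATE TABLE fact_incidents (employee TEXT, severity TEXT);
    INSERT INTO fact_sales VALUES ('Widget', 50.0), ('Gadget', 30.0), ('Widget', 20.0);
    INSERT INTO fact_incidents VALUES ('E01', 'minor');

    -- Data mart: a view that exposes only sales data, summarised per product.
    CREATE VIEW mart_marketing_sales AS
        SELECT product, SUM(amount) AS total_sales, COUNT(*) AS n_transactions
        FROM fact_sales
        GROUP BY product;
    """
)

mart_rows = conn.execute(
    "SELECT product, total_sales, n_transactions FROM mart_marketing_sales ORDER BY product"
).fetchall()
```

Because the view reads from the warehouse tables at query time, the mart stays in step with the warehouse; a materialised copy would be an alternative when query speed matters more than freshness.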
The company uses Python and JMP Pro as API tools to enable users to access the data warehouse:
1. Python API:
o Can be used for creating automated scripts that pull data for specific analyses or
visualizations.
o Departments can use JMP Pro to directly access and analyze their data mart, creating
customized reports and visual insights.
These APIs will facilitate easy and secure data access, enabling each department to use the data
warehouse efficiently without manual data handling.
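A Python access script of the kind described above might look like the following sketch. Everything named here is an assumption for illustration: the `fetch_mart` helper, the `MART_ACCESS` mapping, and the mart names are hypothetical, and an in-memory SQLite table stands in for a real data mart. The role check is done in application code only to keep the example short; in practice the database's own permissions would normally enforce it.

```python
import sqlite3

# Hypothetical mapping of departments to the marts they are allowed to read.
MART_ACCESS = {
    "marketing": {"mart_marketing_sales"},
    "operations": {"mart_operations_safety"},
}


def fetch_mart(conn, department, mart):
    """Return all rows of a mart, refusing marts outside the department's role."""
    if mart not in MART_ACCESS.get(department, set()):
        raise PermissionError(f"{department!r} may not read {mart!r}")
    # The mart name has been checked against the fixed allow-list above,
    # so it is safe to place it in the SQL text.
    return conn.execute(f"SELECT * FROM {mart}").fetchall()


# Stand-in warehouse with one small mart.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mart_marketing_sales (product TEXT, total_sales REAL)")
conn.execute("INSERT INTO mart_marketing_sales VALUES ('Widget', 70.0)")

marketing_rows = fetch_mart(conn, "marketing", "mart_marketing_sales")
```

A script like this can be scheduled to run automatically, which is how it removes the manual data handling mentioned above.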
Type of Analytics | Description | Example
Descriptive Analytics | Focuses on summarizing and describing historical data to understand what has happened. | A retail store analyzing monthly sales reports to identify popular products.
Purpose: Looks at past data to identify trends and understand historical
performance.
Techniques Used: Reporting, data aggregation, and data visualization (e.g.,
dashboards).
Benefit: Helps businesses get a clear picture of what has happened, allowing
them to recognize successes or spot issues.
Predictive Analytics | Uses historical data and statistical techniques to forecast what might happen in the future. | Using past sales data to predict customer demand for the upcoming holiday season.
Purpose: Applies statistical models and machine learning algorithms to
historical data to make predictions about future events.
Techniques Used: Regression analysis, time-series forecasting, and
classification.
Benefit: Allows businesses to anticipate future trends and adjust their strategies
accordingly.
Prescriptive Analytics | Recommends actions by analyzing possible outcomes, aiming to suggest the best course of action to achieve desired results. | An inventory system that suggests optimal stock levels based on predicted demand and supplier delivery times.
Purpose: Builds on predictive insights to recommend specific actions that can
help optimize outcomes.
Techniques Used: Optimization algorithms, decision analysis, and simulation.
Benefit: Provides actionable recommendations, helping businesses to make
decisions that align with their goals.
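The difference between the three types can be seen by applying each to the same small data set. The sketch below is illustrative only: the sales figures are invented, the predictive step is a plain least-squares trend line (one of many possible forecasting methods), and the prescriptive step is a simple reorder-point rule rather than a full optimization algorithm. The function names and parameters are hypothetical.

```python
from collections import defaultdict

# Hypothetical monthly unit sales: (month index, product, units sold).
SALES = [
    (1, "Widget", 100), (1, "Gadget", 40),
    (2, "Widget", 120), (2, "Gadget", 35),
    (3, "Widget", 140), (3, "Gadget", 45),
]


def units_by_product(sales):
    """Descriptive: total historical units per product (what has happened)."""
    totals = defaultdict(int)
    for _month, product, units in sales:
        totals[product] += units
    return dict(totals)


def forecast_next(history):
    """Predictive: fit a least-squares line to (month, units), extrapolate one month."""
    n = len(history)
    mean_x = sum(x for x, _ in history) / n
    mean_y = sum(y for _, y in history) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in history)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in history)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    next_x = max(x for x, _ in history) + 1
    return intercept + slope * next_x


def suggested_stock(monthly_demand, lead_time_months, safety_stock):
    """Prescriptive: stock to hold = demand during supplier lead time + a buffer."""
    return monthly_demand * lead_time_months + safety_stock


totals = units_by_product(SALES)
widget_history = [(month, units) for month, product, units in SALES if product == "Widget"]
widget_forecast = forecast_next(widget_history)
widget_stock = suggested_stock(widget_forecast, lead_time_months=0.5, safety_stock=20)
```

Each step feeds the next: the descriptive totals show which product matters most, the forecast estimates its future demand, and the stock rule turns that estimate into a recommended action.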
The ETL process (Extract, Transform, Load) is a crucial part of data warehousing, enabling data from
various sources to be integrated, standardized, and stored for analysis.
Step Description