Data in Enterprise End Term Cheat Sheet

Compiled by: Tanisha Khandelwal, Aryan Patel, Suzy Paladiya, Yashvi Patel, Neha Bhansali & Vanshika Modi

Module 1

1. What is data? Explain structured and unstructured data types with examples
Answer - Data refers to information in the form of facts, statistics, or raw observations. There are
two types of data: structured and unstructured. Structured data has been predefined and
formatted to a set structure before being placed in data storage, while unstructured data is
stored in its native format and not processed until it is used. An example of structured data is a
relational database; an example of unstructured data is media and entertainment content.

2. Explain briefly about Flat file formats and their limitations


Answer- Flat file formats are a type of data storage format that stores data in a plain text file,
where each line represents a single record and each field is separated by a delimiter, such as a
comma or tab. Limitations include difficulty working with large amounts of data or with
complex data structures.
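As a minimal sketch (the fields here are illustrative), a comma-delimited flat file can be read record by record with Python's standard `csv` module:

```python
import csv
import io

# A flat file: each line is one record, fields separated by a comma delimiter.
# io.StringIO stands in for an actual file on disk.
flat_file = io.StringIO("id,name,age\n1,Alice,30\n2,Bob,25\n")

reader = csv.DictReader(flat_file)   # first line is treated as the header
records = list(reader)

print(records[0]["name"])  # Alice
```

Every field comes back as a string, and there is no indexing or schema enforcement, which illustrates why flat files scale poorly to large or complex data.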

3. Differentiate between Data Warehouse and Data Lake


Answer- A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile
collection of data in support of management’s decision-making process. A data lake is a central
location that holds a large amount of data in its native, raw format.

4. What is cloud computing? Explain Elasticity and Scalability in cloud computing


Answer-Cloud computing is the delivery of computing services—including servers, storage,
databases, networking, software, analytics, and intelligence—over the Internet to offer faster
innovation, flexible resources, and economies of scale.
ELASTICITY: As your workload changes, resources can be changed to compensate (up or
down). Example: seasonal demand for a retail website on Black Friday.
SCALABILITY: Increase or decrease resources based on workload demand. There are two types
of scalability: vertical and horizontal.

5. Short notes on i) Big data ii) NoSQL Databases


Answer i) Big Data- Big data refers to the massive amounts of structured and unstructured data
that are generated every day. This data comes from a variety of sources, including social media,
sensors, and other digital devices. Example: the New York Stock Exchange generates 1 terabyte
of new trade data per day.
ii) NoSQL Databases- It refers to distributed databases with dynamic schema. It is horizontally
scalable and is highly available.

6. Explain different types of Machine Learning Techniques


Answer - Machine learning encompasses various techniques and algorithms to train models and
make predictions or decisions based on data. Here are some different types of machine learning
techniques:

Supervised Learning:

Involves training a model on labeled data to make predictions or classifications.


Types include regression (predicting continuous values) and classification (categorizing data into
classes).
Unsupervised Learning:

Deals with unlabeled data, aiming to find patterns, structures, or groupings within the data.
Clustering and dimensionality reduction are common techniques.

7. Explain any 3 Key challenges in context with data enterprise


● Data Quality and Consistency: Enterprises often deal with data from multiple sources and
systems, leading to inconsistencies and inaccuracies. Ensuring data quality and
consistency across the organization is challenging.
● Data Security and Privacy: Enterprises handle sensitive customer and business data.
Protecting this data from breaches and ensuring compliance with data privacy regulations
(e.g., GDPR, CCPA) is a significant challenge.
● Data Integration: Enterprises use a variety of applications and systems, resulting in data
silos. Integrating these disparate data sources to get a unified view can be complex and
time-consuming.
● Scalability: With the growth of data volume, enterprises need to scale their infrastructure
to handle and process large amounts of data effectively.

8. Explain any 3 Key opportunities in context with data enterprise


● Improved Decision-Making: Data-driven insights enable more informed and accurate
decision-making, helping organizations stay competitive and responsive to market
changes.
● Innovation: Analyzing data can uncover new opportunities for innovation, product
development, and process improvement.
● Personalization: Enterprises can use data to understand customer preferences and
behaviors, allowing them to provide tailored products and services.
● Operational Efficiency: Data can be used to optimize processes, reduce inefficiencies,
and enhance overall operational performance.
Module 2 & 5

1. List the different types of Data preprocessing techniques


● Data Cleaning
● Data Integration
● Data Reduction
● Data Transformation and Data Discretization

2. Explain Equal width Discretization

Discretization is used to divide the range of a continuous attribute into intervals; interval labels
can then be used to replace actual data values.

● Equal-width (distance) partitioning


● It divides the range into N intervals of equal size (a uniform grid)
● If A and B are the lowest and highest values of the attribute, the width of the intervals will
be: W = (B − A)/N
● It is the most straightforward method
● The drawback is that outliers may dominate the presentation, and skewed data is not handled
well
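A minimal sketch of equal-width binning in plain Python (the `equal_width_bins` helper and sample data are illustrative, not from a library):

```python
def equal_width_bins(values, n):
    """Split [min, max] into n intervals of width W = (B - A) / n
    and return each value's bin index (0 .. n-1)."""
    a, b = min(values), max(values)
    w = (b - a) / n
    # The maximum value belongs to the last bin, not a new one.
    return [min(int((v - a) / w), n - 1) for v in values]

data = [1, 3, 5, 7, 9, 100]  # the outlier 100 dominates the binning
print(equal_width_bins(data, 4))  # [0, 0, 0, 0, 0, 3]
```

Note how the single outlier pushes all the other values into bin 0, which is exactly the drawback described above.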

3. Explain Equal depth Discretization

Discretization is used to divide the range of a continuous attribute into intervals; interval labels
can then be used to replace actual data values.

● Equal-depth (frequency) partitioning


● Divides the range into N intervals, each containing approximately the same number of
samples
● It has good data scaling
● Managing categorical attributes can be tricky
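A minimal sketch of equal-depth (frequency) binning in plain Python (the `equal_depth_bins` helper and sample data are illustrative):

```python
def equal_depth_bins(values, n):
    """Partition the sorted values into n bins with (approximately)
    equal numbers of samples."""
    s = sorted(values)
    size, rem = divmod(len(s), n)
    bins, start = [], 0
    for i in range(n):
        # The first `rem` bins absorb one extra sample each.
        end = start + size + (1 if i < rem else 0)
        bins.append(s[start:end])
        start = end
    return bins

data = [4, 8, 15, 16, 23, 42]
print(equal_depth_bins(data, 3))  # [[4, 8], [15, 16], [23, 42]]
```

Each bin holds the same number of samples regardless of how wide its value range is, which is why this method handles skewed data better than equal-width partitioning.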

4. Explain any 2 data manipulation techniques

● Data manipulation involves making changes or transformations to data to extract useful
information, clean the data, or prepare it for analysis. Here are some common examples
of data manipulation tasks:
● Filtering Data:
Example: Removing rows from a dataset where the age of individuals is less than 18.
● Sorting Data:
Example: Sorting a list of sales transactions by date.
● Merging Data:
Example: Combining data from two different datasets based on a common key, such as
merging customer data with order data using a customer ID, e.g. in pandas:
pd.merge(customer_data, order_data, on='customer_id')

● Cleaning Data:
Example: Replacing missing values with a default value or removing rows with missing
data.
● Transforming Data:
Example: Converting data types, such as converting a string date to a datetime object.
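The filtering, sorting, and merging tasks above can be sketched in plain Python (pandas would be the usual tool in practice; the records and field names here are illustrative):

```python
# Filtering: remove records where age is less than 18.
people = [{"name": "Ana", "age": 17}, {"name": "Raj", "age": 34}]
adults = [p for p in people if p["age"] >= 18]

# Sorting: order sales transactions by date (ISO dates sort lexically).
sales = [{"date": "2024-03-01", "amount": 50},
         {"date": "2024-01-15", "amount": 80}]
sales.sort(key=lambda s: s["date"])

# Merging: join customer and order records on a common customer_id key.
customers = {1: {"name": "Ana"}, 2: {"name": "Raj"}}
orders = [{"customer_id": 2, "total": 99}]
merged = [{**customers[o["customer_id"]], **o} for o in orders]
print(merged)  # [{'name': 'Raj', 'customer_id': 2, 'total': 99}]
```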

5. Explain any 2 data cleaning techniques

● Handling Missing Data:

Identify and handle missing values, which can be represented as NaN, NULL, or other
placeholders.

Options include removing rows with missing data, imputing missing values with means,
medians, or modes (e.g., replacing each missing value with the column mean), or using more
advanced imputation techniques.

● Removing Duplicates:

Identify and remove duplicate rows from the dataset.

● Dealing with Outliers:

Detect and handle outliers in the data, either by removing them or transforming them.

● Standardizing Data:

Ensure consistency in data format, like converting text to lowercase or uppercase.
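Two of the techniques above, mean imputation and duplicate removal, can be sketched in plain Python (the sample values are illustrative):

```python
from statistics import mean

raw = [23.0, None, 25.0, 25.0, None, 400.0]  # None marks missing values

# 1. Handling missing data: impute with the mean of the observed values.
observed = [v for v in raw if v is not None]
filled = [v if v is not None else mean(observed) for v in raw]

# 2. Removing duplicates while preserving the original order.
deduped = list(dict.fromkeys(filled))
print(deduped)  # [23.0, 118.25, 25.0, 400.0]
```

Note that the outlier 400.0 drags the imputed mean up to 118.25, which is why outlier handling is usually done alongside imputation.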

6. Explain about Data Transformation in detail


Data transformation is the process of converting data from one format, such as a database file,
XML document or Excel spreadsheet, into another.

Transformations typically involve converting a raw data source into a cleansed, validated and
ready-to-use format. Data transformation is crucial to data management processes that include
data integration, data migration, data warehousing and data preparation.
It is a critical component for any organization seeking to leverage its data to generate timely
business insights.
As the volume of data has proliferated, organizations must have an efficient way to harness data
to effectively put it to business use. Data transformation is one element of harnessing this data
because, when done properly, it ensures data is easy to access, consistent, secure and
ultimately trusted by the intended business users.
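A small sketch of such a transformation, converting a raw CSV source into a cleansed, ready-to-use JSON format (the field names and values are illustrative):

```python
import csv
import io
import json

# Raw source with stray whitespace; io.StringIO stands in for a real file.
csv_source = io.StringIO("product, price \nWidget, 9.99 \n")

# Cleanse: strip whitespace from headers and values.
rows = [{k.strip(): v.strip() for k, v in row.items()}
        for row in csv.DictReader(csv_source)]

# Validate/convert: prices become numbers, not strings.
for row in rows:
    row["price"] = float(row["price"])

print(json.dumps(rows))  # [{"product": "Widget", "price": 9.99}]
```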

7. Draw the block diagram to show the data hierarchy in data governance

● MDM – Master Data Management

● RDM – Reference Data Management

● CM – Content Management

● RM – Record Management


8. List any 5 unique properties of data governance

9. Write the importance of Data governance and Data management

Data Governance:

● Ensures data accuracy, compliance, and accountability.


● Manages data policies, ownership, and risk.
● Improves decision-making and efficiency.

Data Management:

● Organizes, integrates, and secures data.


● Cleans and transforms data for accuracy.
● Supports data analytics and reporting.
● Enables effective data governance.
Module 3

1. What do you mean by business intelligence? Explain in brief the BI process


Business intelligence (BI) is a technology-driven process for analyzing data and delivering
actionable information that helps executives, managers and workers make informed business
decisions.
Process
Collect data from internal IT systems and external sources, prepare it for analysis, run queries
against the data
Create data visualizations, BI dashboards and reports to make the analytics results available to
business users for operational decision-making and strategic planning.

2. What are the benefits of BI


● Speed up and improve decision-making

● Optimize internal business processes

● Increase operational efficiency and productivity

● Spot business problems that need to be addressed


● BI enables C-suite executives and department managers to monitor business performance
on an ongoing basis so they can act quickly when issues or opportunities arise.
● Analyzing customer data helps make marketing, sales and customer service efforts more
effective.
● Supply chain, manufacturing and distribution bottlenecks can be detected before they
cause financial harm.
● HR managers are better able to monitor employee productivity, labor costs and other
workforce data.

3. How does the BI process work? Explain.


There are 5 steps to the BI process and they are:
1) Data from source systems is integrated and loaded into a data warehouse or other
analytics repository.
2) Data sets are organized into analysis data models or OLAP cubes to prepare them for
analysis.
3) BI analysts, other analytics professionals and business users run analytical queries against
the data.
4) The query results are built into data visualizations, dashboards, reports and online portals.
5) Business executives and workers use the information for decision-making and strategic
planning.
4. What is a Data Warehouse?
A data warehouse is a type of data repository used to store large amounts of structured data from
various data sources.

5. What is the primary purpose of a Data Warehouse


Data warehouses are designed to feed information into decision support systems, business
intelligence (BI) software, data dashboards, and other types of analytics and reporting tools.
Enables an organization to easily access and analyze relevant data to extract key business
insights and plan for the future.

6. What is a cloud data warehouse?


A cloud data warehouse is a type of data warehouse that is managed and hosted by a cloud
service provider (CSP).

7. What are the different components of Data warehouse architecture


● Central Database
● Data integration tools
● Metadata
● Data access tools

8. Compare ETL and ELT in detail

● ETL is Extract Transform Load


Takes raw data, transforms it into a predetermined format, then loads it into the target
data warehouse.
ETL is slower than ELT.

● ELT is Extract Load Transform


Takes raw data, loads it into the target data warehouse, then transforms it just before
analytics.
ELT is faster than ETL as it can use the internal resources of the data warehouse.
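The ETL sequence can be sketched as a toy pipeline in plain Python (the functions, records, and in-memory "warehouse" are illustrative; real pipelines use dedicated tools):

```python
def extract():
    """Pull raw, messy rows from a source system."""
    return ["  Alice,30 ", "Bob,25"]

def transform(rows):
    """Convert raw rows into the predetermined target format."""
    out = []
    for row in rows:
        name, age = row.strip().split(",")
        out.append({"name": name, "age": int(age)})
    return out

warehouse = []  # stands in for the target data warehouse

def load(records):
    warehouse.extend(records)

# ETL order: extract -> transform -> load.
load(transform(extract()))
print(warehouse)  # [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]
```

In ELT the `load` step would run before `transform`, with the transformation executed inside the warehouse itself, which is why ELT can exploit the warehouse's own compute resources.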
9. List down the different steps in data pipeline development process
The data pipeline development process starts by defining what data is needed and where it will
come from, then proceeds through the following steps:
● Data ingestion
● Data integration
● Data cleansing
● Data filtering
● Data transformation
● Data enrichment
● Data validation
● Data loading

10. Draw the data pipeline architecture.


11. Map each component of data pipeline architecture with a data pipeline process (Refer to the
block diagram)

Data ingestion → Data cleaning and integration → Data loading → Data visualization

Module 4
1. What is Data ethics in context with data in an enterprise?
Ans)
1. Responsible Data Use
- Adherence to ethical guidelines and principles governing the responsible collection, processing,
and use of data within the organization.
2. Privacy and Confidentiality
- Ensuring the protection of individuals' privacy rights and sensitive information through proper
data anonymization, encryption, and access controls.
3. Transparency and Accountability
- Promoting transparency by clearly communicating data practices, policies, and purposes to
stakeholders.
- Holding individuals and departments accountable for ethical data handling and
decision-making.
4. Fairness and Impartiality
- Avoiding biases in data collection, analysis, and decision-making to ensure fairness and
impartiality in outcomes.
- Mitigating algorithmic biases to prevent discrimination in automated decision systems.
5. Informed Consent and Control
- Obtaining informed consent from individuals regarding data collection, use, and sharing,
providing individuals control over their data.
6. Compliance with Regulations
- Abiding by legal and regulatory frameworks related to data protection, privacy, and security
(e.g., GDPR, HIPAA, CCPA).
7. Ethical AI and Machine Learning
- Integrating ethical considerations into the development and deployment of AI and machine
learning models, addressing issues of bias, fairness, and interpretability.
8. Continuous Education and Improvement
- Providing ongoing training and education to employees regarding ethical data practices and
evolving ethical standards.
- Regularly reviewing and updating policies and procedures to align with emerging ethical
challenges and changes in regulations.

2. What are the 5 principles of data ethics


Ans) 1. Transparency:
➢ Openness: Ensure clear and understandable communication about data practices, including
how data is collected, used, and shared within the organization.
➢ Disclosure: Inform stakeholders about the purposes, methods, and potential impacts of data
usage, fostering trust and accountability.
2. Fairness:
➢ Equality: Strive to eliminate biases in data collection, analysis, and decision-making
processes, ensuring fair treatment of individuals and groups.
➢ Avoidance of Discrimination: Mitigate algorithmic biases to prevent discriminatory outcomes
in automated decision systems or AI models.
3. Accountability:
➢ Responsibility: Hold individuals and departments accountable for ethical data handling,
ensuring compliance with regulations and internal policies.
➢ Oversight and Governance: Establish mechanisms for oversight, auditing, and governance to
monitor adherence to data ethics standards.
4. Privacy:
➢ Data Protection: Respect individuals' privacy rights by safeguarding their personal and
sensitive information through encryption, anonymization, and access controls.
➢ Informed Consent: Obtain informed consent from individuals regarding data collection,
usage, and sharing, granting individuals control over their data.
5. Beneficence:
➢ Positive Impact: Strive to use data for the benefit of individuals, society, and the organization,
considering the broader impact of data practices on stakeholders.
➢ Ethical Decision-Making: Prioritize ethical considerations in data-related decisions, aiming
for positive social outcomes and responsible data stewardship.

3. List any 5 data security best practices


Ans) Five data security best practices:
1) Regular data backups
2) Access control and user permissions
3) Data Encryption
4) Regular Software Updates and Patch Management
5) Employee Training and Awareness
