MGMT 134 C2 Notes
Mastering the data means identifying and obtaining the data needed for solving the problem.
It requires a firm understanding of what data are available to you and where they are stored, as well as
skill in the process of extracting, transforming, and loading (ETL) the data in preparation for data analysis.
The mastering the data step, which covers both requesting data for extraction and extracting it, can be
described via the ETL process:
Step 1: Determine the purpose and scope of the data request (extract).
Internal data sources (accounting information system, supply chain management system, customer
relationship management system, and human resource management system)
Enterprise Resource Planning (ERP) integrates applications from throughout the business (such as
manufacturing, accounting, finance, human resources, etc.) into one system.
Accounting information system: records, processes, reports, and communicates the results of business
transactions to provide financial and nonfinancial information for decision-making purposes
Supply chain management (SCM): includes information on active vendors (their contact info, where
payment should be made, how much should be paid), the orders made to date (how much, when the
orders are made), or demand schedules for what component of the final product is needed
Customer relationship management (CRM): overseeing all interactions with current and potential
customers with the goal of improving relationships
Human resource management (HRM): managing all interactions with current and potential employees
A relational database is used for data storage because it is more capable of ensuring data integrity and
maintaining “one version of the truth” across multiple processes. Relational database management
systems (RDBMS) include Microsoft SQL Server.
Storing data in a normalized, relational database ensures that data are complete and not redundant and
that business rules and internal controls are enforced. It also aids communication and integration across
business processes.
Completeness: ensures that all data required are included in the dataset.
No redundancy: redundancy is avoided because it takes up unnecessary space and requires unnecessary
processing to run reports. Avoiding it ensures that there aren’t multiple versions of the truth and
decreases the risk of data-entry errors. Storing data in normalized relational databases requires there to
be one version of the truth and for each element of data to be stored in only one place.
Business rules enforcement: relational databases can be designed to aid in the placement and
enforcement of internal controls and business rules in ways that flat files cannot.
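As a rough illustration of a business rule enforced by the database itself, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The payment table, its columns, and the rule that amounts must be positive are hypothetical and chosen only for this example.

```python
import sqlite3

# Hypothetical table, columns, and rule, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE payment (
        payment_id INTEGER,
        amount     REAL NOT NULL CHECK (amount > 0)  -- rule: amount is required and positive
    )
    """
)

def try_insert(payment_id, amount):
    """Return True if the database accepts the row, False if it rejects it."""
    try:
        conn.execute(
            "INSERT INTO payment (payment_id, amount) VALUES (?, ?)",
            (payment_id, amount),
        )
        return True
    except sqlite3.IntegrityError:
        # Raised by SQLite when a NOT NULL or CHECK constraint is violated.
        return False

valid_accepted = try_insert(1, 250.00)     # satisfies the rule
negative_accepted = try_insert(2, -40.00)  # violates the CHECK rule
missing_accepted = try_insert(3, None)     # violates NOT NULL
row_count = conn.execute("SELECT COUNT(*) FROM payment").fetchone()[0]
```

A flat file would store the negative or missing amount without complaint; here the rule lives in the table definition, so every process that writes to the table is subject to it.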
Each table must have a primary key, typically made up of one column, which ensures that each row in the
table is unique. It is often referred to as a “unique identifier”; a collection of letters or simply sequential
numbers is commonly used.
When you request your data in a flat file, you’ll receive one big table with a lot of redundancies. This is
ideal for analyzing data but not for storing it. In a relational database, each group of information is stored
in a separate table, and the tables that are related to one another are then identified.
A relationship is created by placing a foreign key in one of the two tables that are related. The foreign key
is another type of attribute, and its function is to create the relationship between the two tables.
Primary and foreign keys facilitate the structure of a relational database, and the descriptive attributes
provide actual business information.
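A minimal sketch of primary and foreign keys, using Python's built-in sqlite3 module. The supplier and purchase_order tables, their columns, and the sample rows are hypothetical. Note that SQLite only enforces foreign keys once the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: each table has a one-column primary key, and
# purchase_order carries a foreign key that points back to supplier.
conn.executescript(
    """
    CREATE TABLE supplier (
        supplier_id   INTEGER PRIMARY KEY,
        supplier_name TEXT NOT NULL          -- descriptive attribute
    );
    CREATE TABLE purchase_order (
        po_id       INTEGER PRIMARY KEY,
        po_date     TEXT NOT NULL,           -- descriptive attribute
        supplier_id INTEGER NOT NULL REFERENCES supplier (supplier_id)
    );
    """
)

conn.execute("INSERT INTO supplier VALUES (1, 'Acme Parts')")
conn.execute("INSERT INTO purchase_order VALUES (100, '2024-01-05', 1)")

# An order that references a supplier that does not exist is rejected.
try:
    conn.execute("INSERT INTO purchase_order VALUES (101, '2024-01-09', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM purchase_order").fetchone()[0]
```

The keys carry no business meaning of their own; they only hold the structure together, while supplier_name and po_date are the descriptive attributes.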
The five steps of the ETL process:
Step 1: Determining the purpose and scope of the data request.
Step 2: Obtaining the data.
Step 3: Validating the data for completeness and integrity.
Step 4: Cleaning the data.
Step 5: Loading the data for analysis.
Extract
Determine exactly what data you need
Requesting the data involves the first two steps of the ETL process
It is also necessary to specify the format in which you would like to receive the data
When you receive the data, make sure that you understand the data in each column (the data dictionary
should prove extremely helpful for this)
ADS (Audit Data Standards, developed by the AICPA): alleviate the headaches associated with data
requests by serving as a guide to standardize these requests and to specify the format an auditor desires
from the company being audited. The standards cover the Order-to-Cash subledger, Procure-to-Pay
subledger, Inventory subledger, and General Ledger.
A data request form template can make communication easier between the data requester and provider.
Once the data are received, you can move on to the transformation phase of the ETL process
Once you have identified the data you need, you can start gathering the information.
SQL (Structured Query Language): a computer language for interacting with data (tables, records, and
attributes) in a database by creating, updating, deleting, and extracting. SQL can combine data from one
or more tables and organize the data in a way that is more intuitive and useful for data analysis.
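As a hedged illustration of extracting and combining data with SQL, the sketch below runs a join from Python's built-in sqlite3 module against an in-memory database. The supplier and purchase_order tables, their columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables and sample rows, used only for illustration.
conn.executescript(
    """
    CREATE TABLE supplier (
        supplier_id   INTEGER PRIMARY KEY,
        supplier_name TEXT NOT NULL
    );
    CREATE TABLE purchase_order (
        po_id       INTEGER PRIMARY KEY,
        po_date     TEXT NOT NULL,
        supplier_id INTEGER NOT NULL REFERENCES supplier (supplier_id)
    );
    INSERT INTO supplier VALUES (1, 'Acme Parts'), (2, 'Bolt Supply');
    INSERT INTO purchase_order VALUES
        (100, '2024-01-05', 1),
        (101, '2024-01-09', 2),
        (102, '2024-01-12', 1);
    """
)

# Combine the two tables on the primary key/foreign key relationship so each
# order row carries the supplier name instead of just the supplier ID.
query = """
    SELECT po.po_id, po.po_date, s.supplier_name
    FROM purchase_order AS po
    JOIN supplier AS s ON s.supplier_id = po.supplier_id
    ORDER BY po.po_id
"""
rows = conn.execute(query).fetchall()

for row in rows:
    print(row)
```

The result is one table in which the supplier name repeats on every matching order, which is more convenient for analysis than the separate tables the data are stored in.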
Microsoft Excel or Power BI: When data are not stored in a relational database, or are not too large for
Excel, the entire table can be analyzed directly in a spreadsheet; simpler for doing exploratory analysis
Two of Excel’s most useful techniques for looking up data and matching them based on a
primary key/foreign key relationship are the VLOOKUP and INDEX/MATCH functions.
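The idea behind these lookups can be sketched outside Excel as well. The Python example below is not Excel's implementation; it only mimics an exact-match lookup, using a dictionary keyed on the primary key. The supplier and order data are hypothetical, and the "#N/A" marker simply imitates what Excel displays when no match is found.

```python
# Lookup table keyed on the primary key (supplier_id -> supplier_name).
suppliers = {1: "Acme Parts", 2: "Bolt Supply"}

# Each order holds the foreign key supplier_id.
orders = [
    {"po_id": 100, "supplier_id": 1},
    {"po_id": 101, "supplier_id": 2},
    {"po_id": 102, "supplier_id": 3},  # no matching supplier
]

# For every order, look up the supplier name by its foreign key.
matched = [
    {**order, "supplier_name": suppliers.get(order["supplier_id"], "#N/A")}
    for order in orders
]

for row in matched:
    print(row)
```

An unmatched key is worth investigating rather than ignoring, since it can point to incomplete data in one of the two tables.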
SQL will often be the best option for retrieving data, after which the data can be loaded into Excel or
another tool for further analysis. A SQL query can be saved and reproduced at will or at regular intervals,
which makes it easier and more efficient to re-create data requests.
When you are performing exploratory analysis, it can be beneficial to load entire tables into Excel and
bypass the SQL step
Transform
Step 3: Validating the Data for Completeness and Integrity
It is possible that some of the data could have been lost during the extraction
Ensure that the extracted data are complete and that the integrity of the data remains intact. To validate
the data after extraction:
Compare the number of records that were extracted to the number of records in the source: ensuring
that the record counts match
Compare descriptive statistics for numeric fields: ensure that the numeric data were extracted
completely.
Validate Date/Time fields: validate them in the same way as numeric fields, by converting them to numeric
values and running descriptive statistic comparisons.
Compare string limits for text fields: ensure that you haven’t cut off any characters.
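The four checks above can be sketched in Python as follows. This is a minimal illustration rather than a complete validation routine; the field names (invoice_date, amount, customer) and the sample rows are hypothetical, and dates are assumed to already be in ISO format (YYYY-MM-DD).

```python
from datetime import date

def profile(rows):
    """Summarize a dataset so the source and the extract can be compared."""
    amounts = [r["amount"] for r in rows]
    # Convert dates to numbers (days since year 1) so they can be summarized.
    days = [date.fromisoformat(r["invoice_date"]).toordinal() for r in rows]
    return {
        "record_count": len(rows),
        "amount_sum": round(sum(amounts), 2),
        "amount_min": min(amounts),
        "amount_max": max(amounts),
        "date_min": min(days),
        "date_max": max(days),
        "max_name_length": max(len(r["customer"]) for r in rows),
    }

def compare(source_rows, extracted_rows):
    """Return, for each check, whether the extract matches the source."""
    src, ext = profile(source_rows), profile(extracted_rows)
    return {name: src[name] == ext[name] for name in src}

source = [
    {"invoice_date": "2024-01-05", "amount": 120.50, "customer": "Northwind Traders"},
    {"invoice_date": "2024-02-17", "amount": 89.99, "customer": "Contoso"},
    {"invoice_date": "2024-03-02", "amount": 310.00, "customer": "Fabrikam Manufacturing"},
]

# A faulty extract: the last record is missing and one name was cut off.
bad_extract = [
    {"invoice_date": "2024-01-05", "amount": 120.50, "customer": "Northwind "},
    {"invoice_date": "2024-02-17", "amount": 89.99, "customer": "Contoso"},
]

results = compare(source, bad_extract)
failed_checks = [name for name, ok in results.items() if not ok]
print(failed_checks)
```

No single check is enough on its own. Here the minimum amount and the earliest date still match even though a record was lost, which is why the checks are run together.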
Step 4: Cleaning the Data
Pay close attention to the state of the data and clean them as necessary to improve the quality of the
data and subsequent analysis.
Five main data quality issues to consider: Dates, Numbers, International characters and encoding,
Languages and measures, Human error
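As a small illustration of the first two issues (dates and numbers), the Python sketch below standardizes text values into one date format and into plain numbers. The accepted input formats are assumptions made for this example: dates written with slashes are treated as month/day/year, and amounts in parentheses are treated as negative, following common accounting notation.

```python
from datetime import datetime

# Assumed input formats; slash dates are read as month/day/year.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")

def clean_date(text):
    """Return the date as an ISO string (YYYY-MM-DD), trying each known format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {text!r}")

def clean_number(text):
    """Strip currency symbols and thousands separators; parentheses mean negative."""
    s = text.strip()
    negative = s.startswith("(") and s.endswith(")")
    s = s.strip("()").replace("$", "").replace(",", "").strip()
    value = float(s)
    return -value if negative else value

cleaned_dates = [clean_date(d) for d in ["2024-03-07", "03/07/2024", "20240307"]]
cleaned_numbers = [clean_number(n) for n in ["$1,234.50", "(500.00)", " 42 "]]
print(cleaned_dates)
print(cleaned_numbers)
```

Raising an error on an unrecognized date, rather than guessing, keeps ambiguous values visible so they can be checked against the data dictionary or with the data provider.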
Load
Step 5: Loading the Data for Data Analysis
A variety of tools beyond Excel can be used for analyzing data, including Power BI, Tableau Prep, and
Tableau Desktop.
ETL or ELT
ETL has been in popular use.
However, the procedure is shifting toward ELT (extract, load, transform), particularly with tools such as
Microsoft’s Power BI suite. The most common method for mastering the data that we use is more in line
with ELT than ETL.
Potential ethical issues include an individual’s right to privacy and whether assurance is offered that
certain data are not misused
The Institute of Business Ethics suggests that companies consider the following six questions:
How does the company use data, and to what extent are they integrated into firm strategy?
Does the company send privacy notices to individuals when their personal data is collected?
Does the company assess the risks linked to the specific type of data the company uses?
Does the company have safeguards in place to mitigate the risks of data misuse?
Does the company have the appropriate tools to manage the risks of data misuse?
Does the company conduct appropriate due diligence when sharing with or acquiring data from third
parties?