Week2 - Master The Data
Accounting Analytics
MASTER THE DATA
Adapted from McGraw-Hill Education. Please do not distribute.
Learning Objective 1
Understand the data by looking at how it
is organized.
▪ Data can be found throughout
various systems.
▪ In most cases, you need to
know which tables and attributes
contain the relevant data.
Exhibit. Procure-to-Pay Database Schema (Simplified)
▪ Unified Modeling Language
(UML) is one way to understand
databases.
Learning Objective 2
Relational databases ensure that data:
▪ Are complete or include all data.
▪ Aren’t redundant, so they don’t take up too much space.
▪ Follow business rules and internal controls.
▪ Aid communication and integration of business processes.
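A minimal sketch of how a relational database can enforce a business rule, using Python's standard-library sqlite3 module. The table and column names (supplier, purchase_order, supplier_id) and the supplier row are hypothetical, chosen only for illustration. Note that SQLite checks foreign keys only after PRAGMA foreign_keys is switched on.

```python
import sqlite3

def try_insert_order(conn, po_number, supplier_id):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO purchase_order (po_number, supplier_id) VALUES (?, ?)",
                (po_number, supplier_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.execute(
    "CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """CREATE TABLE purchase_order (
           po_number   INTEGER PRIMARY KEY,
           supplier_id INTEGER NOT NULL REFERENCES supplier (supplier_id)
       )"""
)
conn.execute("INSERT INTO supplier VALUES (1, 'Example Supplier')")  # hypothetical row
conn.commit()

accepted_valid = try_insert_order(conn, 1787, 1)       # supplier 1 exists
accepted_orphan = try_insert_order(conn, 1788, 99)     # supplier 99 does not exist
accepted_duplicate = try_insert_order(conn, 1787, 1)   # primary key already used
print(accepted_valid, accepted_orphan, accepted_duplicate)
```

Only the first insert is accepted: the second points at a supplier that does not exist, and the third repeats a primary key.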
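There are four types of attributes.
• Primary keys are unique identifiers.
• Foreign keys point to a primary key in another table.
• Composite primary keys are made up of two foreign keys and are used for line items.
• Descriptive attributes are the remaining, non-key attributes.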
Examples of two tables, attributes, and data. Notice the PK-FK relationship.
Purchase Order Table
PO_Number  Date        Created By  Approved By  Supplier ID  Employee ID  Cash Disbursement ID
1787       11/1/2020   1001        1010         1            52           2001
1788       11/1/2020   1005        1010         2            52           2003
1789       11/8/2020   1002        1010         1            52           2004
1790       11/15/2020  1005        1010         1            52           2004
Purchase Order Detail
PO_Number  Item_Number  Quantity Purchased
1787       10           50
1787       25           50
1789       5            30
1790       5            100
Exhibit. Line Items Table: Purchase Order Detail Table
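The PK-FK relationship in the two exhibit tables can be sketched in plain Python. The rows below are copied from the exhibit (only PO_Number, Date, Item_Number, and Quantity Purchased); the dictionary keys and variable names are assumptions made for this illustration.

```python
# Parent table: one row per purchase order, keyed by po_number (primary key).
purchase_orders = [
    {"po_number": 1787, "date": "11/1/2020"},
    {"po_number": 1788, "date": "11/1/2020"},
    {"po_number": 1789, "date": "11/8/2020"},
    {"po_number": 1790, "date": "11/15/2020"},
]

# Child table: one row per line item; po_number is a foreign key here.
po_details = [
    {"po_number": 1787, "item_number": 10, "quantity": 50},
    {"po_number": 1787, "item_number": 25, "quantity": 50},
    {"po_number": 1789, "item_number": 5, "quantity": 30},
    {"po_number": 1790, "item_number": 5, "quantity": 100},
]

# Index the parent table by its primary key.
po_by_number = {po["po_number"]: po for po in purchase_orders}

# Follow each detail row's foreign key back to its parent purchase order.
line_items = [
    {**detail, "date": po_by_number[detail["po_number"]]["date"]}
    for detail in po_details
]

# The detail table's composite key (po_number, item_number) should be unique.
composite_keys = {(d["po_number"], d["item_number"]) for d in po_details}
composite_key_is_unique = len(composite_keys) == len(po_details)

# Purchase orders that have no line items in the detail table.
without_lines = sorted(set(po_by_number) - {d["po_number"] for d in po_details})

print(line_items[2])
print(composite_key_is_unique, without_lines)
```

In the exhibit data, PO 1788 has no rows in the detail table, which is the kind of gap worth questioning before analysis.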
Data dictionaries define what data are acceptable.
For each attribute, we learn:
▪ Primary or Foreign Key?
▪ Required
▪ Attribute Name
▪ Description
▪ Data Type
▪ Default Value
▪ Field Size
▪ Notes
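One way to picture a data dictionary is as a list of attribute specifications that records can be checked against. The sketch below is an assumption-laden illustration: the AttributeSpec class, the validate_record helper, and the three dictionary entries are hypothetical and not taken from any real system.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass(frozen=True)
class AttributeSpec:
    name: str
    description: str
    data_type: type
    required: bool = True
    key: Optional[str] = None          # "PK", "FK", or None for a non-key attribute
    field_size: Optional[int] = None   # maximum length when rendered as text
    default: Any = None

def validate_record(record, specs):
    """Return a list of problems found when checking record against specs."""
    errors = []
    for spec in specs:
        value = record.get(spec.name, spec.default)
        if value is None:
            if spec.required:
                errors.append(f"{spec.name}: required value is missing")
            continue
        if not isinstance(value, spec.data_type):
            errors.append(f"{spec.name}: expected {spec.data_type.__name__}")
        elif spec.field_size is not None and len(str(value)) > spec.field_size:
            errors.append(f"{spec.name}: longer than {spec.field_size} characters")
    return errors

# Hypothetical dictionary entries for a purchase order table.
PO_DICTIONARY = [
    AttributeSpec("po_number", "Purchase order identifier", int, key="PK", field_size=10),
    AttributeSpec("supplier_id", "Supplier on the order", int, key="FK"),
    AttributeSpec("notes", "Free-text comment", str, required=False, field_size=20),
]

good_record = {"po_number": 1787, "supplier_id": 1}
bad_record = {"po_number": "1787", "notes": "x" * 25}

print(validate_record(good_record, PO_DICTIONARY))
print(validate_record(bad_record, PO_DICTIONARY))
```

The second record fails three checks: the PO number is text rather than an integer, the required supplier is missing, and the note exceeds its field size.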
Q: What is the purpose of the primary key?
A foreign key? A non-key attribute?
Learning Objective 3
Requesting data is an iterative
practice involving five steps:
Step 1: Determine the purpose and scope of the data
request.
Step 2: Obtain the data.
Step 3: Validate the data for completeness and integrity.
Step 4: Clean the data.
Step 5: Load the data for data analysis.
Step 1: Determine the purpose and scope
of the data request
▪ Ask a few questions before beginning the process:
▪ What is the purpose of the data request?
▪ What do you need the data to solve?
▪ What business problem will it address?
▪ What risk exists in data integrity (for example, reliability, usefulness)?
▪ What is the mitigation plan?
▪ What other information will impact the nature, timing, and extent of the data analysis?
Step 2: Obtain the Data – Questions
▪ How will data be requested and/or obtained?
▪ Do you have access to the data yourself, or do you need to request a database
administrator or the information systems department to provide the data for
you?
▪ If you need to request the data, is there a standard data request form that you
should use?
▪ From whom do you request the data?
▪ Where are the data located in the financial or other related systems?
▪ What specific data are needed (tables and fields)?
▪ What tools will be used to perform data analytic tests or procedures and why?
Step 2: Obtain the Data – Methods
▪ There are a couple of options:
o Obtain data through a data request to the IT department.
o Obtain data yourself.
Example Standard Data Request Form – Header
Section 1: Request Details
Requestor Name:
Requestor Contact Number:
Requestor Email Address:
Please provide a description of the information needed (indicate which tables and which fields you require):
What will the information be used for?
Frequency (circle one): One-Off / Annually / Termly / Other: ___________
Format you wish the data to be delivered in (circle one): Spreadsheet / Word Document / Text File / Other: ____________
Request Date:
Required Date:
Intended Audience:
Customer (if not requestor):
EXHIBIT 2-7 Example Standard Data Request Form
Example Standard Data Request Form – Response
Section 2: To be Completed by Information Systems Department
Request Number:
Date Received:
Received by:
Assigned to:
Initial review comments (discussion with client: revisions required? agreement to proceed? etc.):
Revisions Required:
Section 3: Completion Details
Date Completed:
Date Provided:
Step 3: Validate the data for completeness
and integrity
▪ Chances are the data you request aren't complete. Before you begin, do a little
work to make sure your data are valid:
▪ Compare the number of records.
▪ Compare descriptive statistics for numeric fields.
▪ Validate Date/Time fields.
▪ Compare string limits for text fields.
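The four checks above can be sketched with the Python standard library. Everything specific here is an assumption for illustration: the field names (date, amount, memo), the sample rows, the 20-character limit, and the control figures that a source system would report.

```python
from datetime import datetime

def describe(rows):
    """Record count plus simple descriptive statistics for the numeric field."""
    amounts = [row["amount"] for row in rows]
    return {
        "count": len(amounts),
        "total": round(sum(amounts), 2),
        "min": min(amounts),
        "max": max(amounts),
    }

def invalid_dates(rows, date_format="%m/%d/%Y"):
    """Return the date strings that do not parse under the expected format."""
    bad = []
    for row in rows:
        try:
            datetime.strptime(row["date"], date_format)
        except ValueError:
            bad.append(row["date"])
    return bad

def over_limit(rows, field, max_len):
    """Return text values longer than the field size allows."""
    return [row[field] for row in rows if len(row[field]) > max_len]

# Hypothetical extract received from the data owner.
extract = [
    {"date": "11/1/2020", "amount": 100.00, "memo": "Office supplies"},
    {"date": "11/8/2020", "amount": 250.50, "memo": "Freight"},
    {"date": "13/45/2020", "amount": 75.25, "memo": "Consulting services, phase one"},
]
# Hypothetical control figures reported by the source system.
source_control = {"count": 4, "total": 500.75}

stats = describe(extract)
checks = {
    "record_count_matches": stats["count"] == source_control["count"],
    "total_matches": stats["total"] == source_control["total"],
    "bad_dates": invalid_dates(extract),
    "memos_over_20_chars": over_limit(extract, "memo", 20),
}
print(checks)
```

In this made-up extract a record is missing, so both the count and the total disagree with the source, one date is impossible, and one memo exceeds the assumed field size.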
Step 4: Clean the data
▪ Once you have valid data, there is still some work to do to
make sure they are consistent and ready for analysis:
▪ Remove headings or subtotals.
▪ Clean leading zeroes and nonprintable characters.
▪ Format negative numbers.
▪ Correct inconsistencies across data, in general.
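A minimal sketch of these cleaning steps in Python. The two-column layout, the sample rows, and the helper names are hypothetical. It also assumes that leading zeroes in the identifier carry no meaning, which should be confirmed against the data dictionary before stripping them.

```python
from decimal import Decimal

def strip_nonprintable(text):
    """Drop tabs, control characters and similar debris, then trim spaces."""
    return "".join(ch for ch in text if ch.isprintable()).strip()

def parse_amount(text):
    """Convert '(1,234.56)' or '75.00-' style negatives to a signed Decimal."""
    cleaned = strip_nonprintable(text).replace(",", "").replace("$", "")
    negative = False
    if cleaned.startswith("(") and cleaned.endswith(")"):
        negative, cleaned = True, cleaned[1:-1]
    elif cleaned.endswith("-"):
        negative, cleaned = True, cleaned[:-1]
    value = Decimal(cleaned)
    return -value if negative else value

def clean_rows(raw_rows, header):
    """Remove repeated headings and subtotal rows, then normalize each cell."""
    cleaned = []
    for row in raw_rows:
        cells = [strip_nonprintable(cell) for cell in row]
        if cells == header or cells[0].lower().startswith(("subtotal", "total")):
            continue
        cleaned.append({
            "invoice_id": cells[0].lstrip("0") or "0",  # keep a lone zero as "0"
            "amount": parse_amount(cells[1]),
        })
    return cleaned

# Hypothetical report export with headings, a subtotal and mixed formats.
raw = [
    ["Invoice", "Amount"],
    ["000123", "1,200.00"],
    ["00124\t", "(300.50)"],
    ["Subtotal", "899.50"],
    ["Invoice", "Amount"],
    ["0125", "75.00-"],
]
cleaned = clean_rows(raw, header=["Invoice", "Amount"])
total = sum(row["amount"] for row in cleaned)
print(cleaned, total)
```

Decimal is used instead of float so that currency amounts are not subject to binary rounding.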
Watch out for bad data quality.
Dates (e.g., 7/6/2023 or 6/7/2023 or 2023-07-06)
Numbers (e.g., 1 or I, 7 or seven)
International characters and encoding (e.g., * or “ or TAB)
Languages and measures (e.g., Arkansas or AR, $ or €)
Human error (e.g., 23 or 32)
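The date example can be made concrete with a short Python sketch that tries each convention and flags values that parse to different dates. The three formats and the function names are assumptions chosen for this illustration; real data may use others.

```python
from datetime import datetime

# Candidate conventions to try (an assumed, non-exhaustive list).
FORMATS = {
    "month_first": "%m/%d/%Y",
    "day_first": "%d/%m/%Y",
    "iso": "%Y-%m-%d",
}

def possible_dates(text):
    """Map each convention that accepts text to the ISO date it produces."""
    results = {}
    for name, fmt in FORMATS.items():
        try:
            results[name] = datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            pass  # this convention cannot read the value
    return results

def is_ambiguous(text):
    """True when two conventions read the same text as different dates."""
    return len(set(possible_dates(text).values())) > 1

print(possible_dates("7/6/2023"))
print(is_ambiguous("7/6/2023"), is_ambiguous("7/25/2023"), is_ambiguous("2023-07-06"))
```

A value such as 7/6/2023 is readable as either July 6 or June 7, so the convention has to come from the data owner or the data dictionary rather than from the value itself.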
Step 5: Load the data for data analysis
Finally, you can now import your data into the tool of your choice and expect the
functions to work properly.
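As one possible illustration of loading, the sketch below reads cleaned CSV text into an in-memory SQLite table with Python's standard library and runs a first query. The rows come from the Purchase Order Detail exhibit; the CSV layout, the table name po_detail, and the load_csv helper are assumptions, and any other analysis tool could take SQLite's place.

```python
import csv
import io
import sqlite3

# Cleaned data as CSV text (rows taken from the Purchase Order Detail exhibit).
CSV_TEXT = """po_number,item_number,quantity_purchased
1787,10,50
1787,25,50
1789,5,30
1790,5,100
"""

def load_csv(conn, csv_text):
    """Create the po_detail table and load every CSV row into it."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (int(r["po_number"]), int(r["item_number"]), int(r["quantity_purchased"]))
        for r in reader
    ]
    conn.execute(
        """CREATE TABLE po_detail (
               po_number          INTEGER,
               item_number        INTEGER,
               quantity_purchased INTEGER,
               PRIMARY KEY (po_number, item_number)
           )"""
    )
    with conn:  # commit the whole load, or roll it back on error
        conn.executemany("INSERT INTO po_detail VALUES (?, ?, ?)", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
loaded = load_csv(conn, CSV_TEXT)

# First analysis query: total quantity purchased per purchase order.
totals = conn.execute(
    "SELECT po_number, SUM(quantity_purchased) FROM po_detail "
    "GROUP BY po_number ORDER BY po_number"
).fetchall()
print(loaded, totals)
```

Comparing the number of rows loaded with the number of rows in the cleaned file is a quick check that nothing was dropped during the load.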
Q: What are four common issues with data
that must be fixed before analysis can take
place?
Learning Objective 4
WHAT ETHICAL ISSUES DO WE
ENCOUNTER IN DATA COLLECTION
AND USE?
Potential ethical issues surround how data are
collected and how they are shared.
1. How does the company use data, and to what extent are they integrated into
firm strategy?
2. Does the company send a privacy notice to individuals when their personal
data are collected?
3. Does the company assess the risks linked to the specific type of data the
company uses?
4. Does the company have safeguards in place to mitigate the risks of data
misuse?
5. Does the company have the appropriate tools to manage the risks of data
misuse?
6. Does the company conduct appropriate due diligence when sharing data with or
acquiring data from third parties?
Q: A firm purchases data from a third party
about customer preferences for laundry
detergent. How would you recommend that
this firm conduct appropriate due diligence
about whether the third-party data provider
follows ethical data practices?
Summary
The first step in the IMPACT cycle is to identify the questions that you intend to
answer through your data analysis project. Once a data analysis problem or question
has been identified, the next step in the IMPACT cycle is mastering the data, which
can be broken down to mean obtaining the data needed and preparing it for analysis.
In order to obtain the right data, it is important to have a firm grasp of what data
are available to you and how that information is stored.
◦ Data are often stored in a relational database, which helps to ensure that an
organization's data are complete and to avoid redundancy. Relational databases are
made up of tables with uniquely identified records (this is done through primary
keys) and are related through the usage of foreign keys.
To obtain the data, you will either have access to extract the data yourself or you
will need to request the data from a database administrator or the information
systems team. If the latter is the case, you will complete a data request form,
indicating exactly which data you need and why.
Once you have the data, they will need to be validated for completeness and
integrity; that is, you will need to ensure that all of the data you need were
extracted, and that all data are correct. Sometimes when data are extracted, some
formatting or sometimes even entire records will get lost, resulting in inaccuracies.
Correcting the errors and cleaning the data is an integral step in mastering the data.
Finally, after the data have been cleaned, there may be one last step of mastering
the data, which is to load them into the tool that will be used for analysis. Often,
the cleaning and correcting of data occur in Excel and the analysis will also be done
in Excel. In this case, there is no need to load the data elsewhere. However, if you
intend to do more rigorous statistical analysis than Excel provides, or if you intend
to do more robust data visualization than can be done in Excel, it may be necessary
to load the data into another tool following the transformation process.
Thank you!
Contact me at:
duanh@sacredheart.edu