Week2 - Master The Data

The document provides an introduction to accounting analytics, focusing on the organization, storage, and ethical considerations of data in accounting information systems. It outlines the process of data extraction, transformation, and loading (ETL), emphasizing the importance of data validation and cleaning for accurate analysis. Additionally, it addresses ethical issues related to data collection and usage, highlighting the need for companies to manage risks associated with data misuse.

Uploaded by

Jake Moynihan
Copyright
© All Rights Reserved

Introduction to Accounting Analytics
MASTER THE DATA

ADAPTED FROM MCGRAW-HILL EDUCATION. PLEASE DO NOT DISTRIBUTE.

Instructor: Huijue Kelly Duan


Learning Objectives
▪ Understand how data are organized in an accounting information system.

▪ Understand how data are stored in a relational database.

▪ Explain and apply extraction, transformation, and loading (ETL) techniques.

▪ Describe the ethical considerations of data collection and data use.

2
Learning Objective 1

HOW ARE DATA USED AND STORED IN THE ACCOUNTING CYCLE?

3
Understand the data by looking at how it is organized.
▪ Data can be found throughout various systems.
▪ In most cases, you need to know which tables and attributes contain the relevant data.
▪ Unified Modeling Language (UML) is one way to understand databases.
Exhibit. Procure-to-Pay Database Schema (Simplified)

4
Learning Objective 2

HOW ARE DATA STORED IN RELATIONAL DATABASES?

5
Relational databases ensure that data:
▪ Are complete or include all data.
▪ Aren’t redundant, so they don’t take up too much space.
▪ Follow business rules and internal controls.
▪ Aid communication and integration of business processes.

6
There are four types of attributes.
• Primary keys are unique identifiers.
• Foreign keys are attributes that point to a primary key in another table.
• Composite keys are a combination of two foreign keys used for line items.
• Descriptive attributes include everything else.

PO_Number  Date        Created By  Approved By  Supplier ID  Employee ID  Cash Disbursement ID
1787       11/1/2020   1001        1010         1            52           2001
1788       11/1/2020   1005        1010         2            52           2003
1789       11/8/2020   1002        1010         1            52           2004
1790       11/15/2020  1005        1010         1            52           2004

Exhibit. Purchase Order Table

7
Examples of two tables, attributes, and data. Notice the PK-FK relationship.

Purchase Order Table
PO_Number  Date        Created By  Approved By  Supplier ID  Employee ID  Cash Disbursement ID
1787       11/1/2020   1001        1010         1            52           2001
1788       11/1/2020   1005        1010         2            52           2003
1789       11/8/2020   1002        1010         1            52           2004
1790       11/15/2020  1005        1010         1            52           2004

Exhibit. Purchase Order Table

Purchase Order Detail
PO_Number  Item_Number  Quantity Purchased
1787       10           50
1787       25           50
1789       5            30
1790       5            100

Exhibit. Line Items Table: Purchase Order Detail Table
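The PK-FK relationship between the two tables can be sketched with Python's built-in sqlite3 module. This is only an illustration, not the system behind the exhibits: the snake_case table and column names are assumptions, only three of the Purchase Order columns are kept, and the dates are rewritten as ISO text on the assumption that the exhibit uses month/day/year. PO_Number is the primary key of the purchase order table, and the detail table uses PO_Number plus Item_Number as its composite key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Assumed, simplified versions of the two exhibit tables.
conn.executescript("""
CREATE TABLE purchase_order (
    po_number   INTEGER PRIMARY KEY,            -- primary key
    po_date     TEXT    NOT NULL,
    supplier_id INTEGER NOT NULL
);
CREATE TABLE purchase_order_detail (
    po_number          INTEGER NOT NULL REFERENCES purchase_order(po_number),  -- foreign key
    item_number        INTEGER NOT NULL,
    quantity_purchased INTEGER NOT NULL,
    PRIMARY KEY (po_number, item_number)        -- composite key for line items
);
""")

conn.executemany(
    "INSERT INTO purchase_order VALUES (?, ?, ?)",
    [(1787, "2020-11-01", 1), (1788, "2020-11-01", 2),
     (1789, "2020-11-08", 1), (1790, "2020-11-15", 1)],
)
conn.executemany(
    "INSERT INTO purchase_order_detail VALUES (?, ?, ?)",
    [(1787, 10, 50), (1787, 25, 50), (1789, 5, 30), (1790, 5, 100)],
)

# A line item that points at a purchase order that does not exist is rejected.
try:
    conn.execute("INSERT INTO purchase_order_detail VALUES (9999, 10, 1)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

# Follow the foreign key from each line item back to its purchase order.
rows = conn.execute("""
    SELECT po.po_number, po.po_date, d.item_number, d.quantity_purchased
    FROM purchase_order AS po
    JOIN purchase_order_detail AS d ON d.po_number = po.po_number
    ORDER BY po.po_number, d.item_number
""").fetchall()
```

Purchase order 1788 has no line items in the exhibit, so the inner join leaves it out of `rows`; a LEFT JOIN would keep it with empty detail columns.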

8
Data dictionaries define what data are acceptable.
For each attribute, we learn:
◦ What type of key it is.
◦ What data are required.
◦ What data can be stored in it.
◦ How much data is stored.

Primary or Foreign Key?  Required  Attribute Name  Description                                  Data Type   Default Value  Field Size  Notes
PK                       Y         Supplier ID     Unique Identifier for each Supplier          Number      n/a            10
                         N         Supplier Name   First and Last Name                          Short Text  n/a            30
FK                       N         Supplier Type   Type Code for Different Supplier Categories  Number      Null           10          1: Vendor; 2: Misc

Exhibit. Supplier Data Dictionary
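One way to see how a data dictionary constrains data is to encode it and check records against it. The sketch below is a hypothetical illustration: the `Attribute` class and `validate` function are invented names, the three entries mirror the Supplier exhibit, and "Field Size" is assumed to mean a maximum number of characters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attribute:
    name: str
    key: Optional[str]                   # "PK", "FK", or None for a descriptive attribute
    required: bool
    data_type: type
    field_size: int                      # assumed: maximum number of characters
    allowed: Optional[frozenset] = None  # permitted codes, if the Notes column lists any


# Entries transcribed from the Supplier data dictionary exhibit.
SUPPLIER_DICTIONARY = [
    Attribute("Supplier ID", "PK", True, int, 10),
    Attribute("Supplier Name", None, False, str, 30),
    Attribute("Supplier Type", "FK", False, int, 10, frozenset({1, 2})),
]


def validate(record: dict) -> list:
    """Return a list of problems found in one supplier record."""
    problems = []
    for attr in SUPPLIER_DICTIONARY:
        value = record.get(attr.name)
        if value is None:
            if attr.required:
                problems.append(f"{attr.name}: required value is missing")
            continue
        if not isinstance(value, attr.data_type):
            problems.append(f"{attr.name}: expected {attr.data_type.__name__}")
            continue
        if len(str(value)) > attr.field_size:
            problems.append(f"{attr.name}: longer than {attr.field_size} characters")
        if attr.allowed is not None and value not in attr.allowed:
            problems.append(f"{attr.name}: {value!r} is not an allowed code")
    return problems


good = validate({"Supplier ID": 1, "Supplier Name": "Ann Lee", "Supplier Type": 1})
bad = validate({"Supplier Name": "x" * 31, "Supplier Type": 3})
```

Here `good` is empty, while `bad` reports a missing Supplier ID, a name over 30 characters, and a type code outside 1 and 2.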

9
Q: What is the purpose of the primary key?
A foreign key? A non-key attribute?
Learning Objective 3

WHAT DOES IT MEAN TO EXTRACT, TRANSFORM, AND LOAD?

11
Requesting data is an iterative practice involving five steps:
Step 1: Determine the purpose and scope of the data request.
Step 2: Obtain the data.
Step 3: Validate the data for completeness and integrity.
Step 4: Clean the data.
Step 5: Load the data for data analysis.

12
Step 1: Determine the purpose and scope
of the data request
▪ Ask a few questions before beginning the process:
▪ What is the purpose of the data request?
▪ What do you need the data to solve?
▪ What business problem will it address?
▪ What risk exists in data integrity (for example, reliability, usefulness)?
▪ What is the mitigation plan?
▪ What other information will impact the nature, timing, and extent of the data analysis?

13
Step 2: Obtain the Data – Questions
▪ How will data be requested and/or obtained?
▪ Do you have access to the data yourself, or do you need to request a database
administrator or the information systems department to provide the data for
you?
▪ If you need to request the data, is there a standard data request form that you
should use?
▪ From whom do you request the data?
▪ Where are the data located in the financial or other related systems?
▪ What specific data are needed (tables and fields)?
▪ What tools will be used to perform data analytic tests or procedures and why?
14
Step 2: Obtain the Data – Methods
▪ There are a couple of options:
o Obtain data through a data request to the IT department.
o Obtain data yourself.

15
Example Standard Data Request Form – Header
Section 1: Request Details
Requestor Name:
Requestor Contact Number:
Requestor Email Address:
Please provide a description of the information needed (indicate which tables and which fields you require):
What will the information be used for?
Frequency (circle one): One-Off / Annually / Termly / Other: ___________
Format you wish the data to be delivered in (circle one): Spreadsheet / Word Document / Text File / Other: ____________
Request Date:
Required Date:
Intended Audience:
Customer (if not requestor):
EXHIBIT 2-7 Example Standard Data Request Form
16
Example Standard Data Request Form – Response
Section 2: To be Completed by Information Systems Department
Request Number:
Date Received:
Received by:
Assigned to:
Initial review comments (discussion with client: revisions required? agreement to proceed? etc.)
Feedback from client (if applicable)
Work in progress comments (additional notes and comments during production of data)
Section 3: Completion Details
Date Completed:
Date Provided:
Revisions Required:

EXHIBIT 2-7 Example Standard Data Request Form


17
Obtain the data yourself
▪ If you have direct access to a data warehouse, you can use SQL and other tools to pull the data
yourself.
▪ Identify the tables that contain the information you need. You can do this by looking through
the data dictionary or the relationship model.
▪ Identify which attributes, specifically, hold the information you need in each table.
▪ Identify how those tables are related to each other.
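The three identification steps above can be sketched against a small in-memory SQLite database standing in for the data warehouse. Everything here is an assumption for illustration: the table and column names, the supplier names, and the date filter. The catalog queries (`sqlite_master`, `PRAGMA table_info`, `PRAGMA foreign_key_list`) are specific to SQLite; other database systems expose the same information differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier (
    supplier_id   INTEGER PRIMARY KEY,
    supplier_name TEXT
);
CREATE TABLE purchase_order (
    po_number   INTEGER PRIMARY KEY,
    po_date     TEXT,
    supplier_id INTEGER REFERENCES supplier(supplier_id)
);
INSERT INTO supplier VALUES (1, 'Supplier A'), (2, 'Supplier B');  -- hypothetical names
INSERT INTO purchase_order VALUES
    (1787, '2020-11-01', 1), (1788, '2020-11-01', 2),
    (1789, '2020-11-08', 1), (1790, '2020-11-15', 1);
""")

# 1. Identify the tables that exist.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# 2. Identify the attributes in each table (column 1 of table_info is the name).
columns = {
    table: [col[1] for col in conn.execute(f"PRAGMA table_info({table})")]
    for table in tables
}

# 3. Identify how the tables are related (referenced table, from column, to column).
relationships = [
    (fk[2], fk[3], fk[4])
    for fk in conn.execute("PRAGMA foreign_key_list(purchase_order)")
]

# 4. Pull only the attributes needed, joining along the relationship found above.
extract = conn.execute("""
    SELECT po.po_number, po.po_date, s.supplier_name
    FROM purchase_order AS po
    JOIN supplier AS s ON s.supplier_id = po.supplier_id
    WHERE po.po_date >= '2020-11-08'
    ORDER BY po.po_number
""").fetchall()
```

Selecting named columns rather than `SELECT *` keeps the extract limited to the attributes the analysis needs.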

18
Step 3: Validate the data for completeness
and integrity
▪ Chances are the data you request aren’t complete. Before you begin, do a little
work to make sure your data are valid:
▪ Compare the number of records.
▪ Compare descriptive statistics for numeric fields.
▪ Validate Date/Time fields.
▪ Compare string limits for text fields.
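A minimal sketch of these four checks, using only the Python standard library. The extract contents, the control totals (4 records, total quantity 230), the month/day/year date format, and the 10-character limit are all assumptions made for the example; in practice the expected values would come from the source system or the person who ran the extract.

```python
import csv
import io
import statistics
from datetime import datetime

# Hypothetical extract delivered as CSV text.
EXTRACT = """PO_Number,Date,Quantity
1787,11/1/2020,50
1788,11/1/2020,50
1789,11/8/2020,30
1790,11/15/2020,100
"""


def validate_extract(text, expected_count, expected_total, max_po_length=10):
    """Compare an extract with control totals and return (stats, issues)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    quantities = [int(row["Quantity"]) for row in rows]
    issues = []

    # Compare the number of records.
    if len(rows) != expected_count:
        issues.append(f"record count {len(rows)} != expected {expected_count}")

    # Compare descriptive statistics for numeric fields (here, the total).
    if sum(quantities) != expected_total:
        issues.append(f"quantity total {sum(quantities)} != expected {expected_total}")

    for row in rows:
        # Validate date fields against the format the source is assumed to use.
        try:
            datetime.strptime(row["Date"], "%m/%d/%Y")
        except ValueError:
            issues.append(f"PO {row['PO_Number']}: invalid date {row['Date']!r}")
        # Compare string lengths with the field size limit.
        if len(row["PO_Number"]) > max_po_length:
            issues.append(f"PO {row['PO_Number']}: exceeds {max_po_length} characters")

    stats = {
        "count": len(rows),
        "min": min(quantities),
        "max": max(quantities),
        "mean": statistics.mean(quantities),
    }
    return stats, issues


stats, issues = validate_extract(EXTRACT, expected_count=4, expected_total=230)
```

If a record were lost during extraction, both the record count and the quantity total would disagree with the control totals and appear in `issues`.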

19
Step 4: Clean the data
▪ Once you have valid data, there is still some work that needs to be done to
make sure it is consistent and ready for analysis:
▪ Remove headings or subtotals.
▪ Clean leading zeroes and nonprintable characters.
▪ Format negative numbers.
▪ Correct inconsistencies across data, in general.
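The cleaning tasks above might look like the following in Python. The helper names are hypothetical, and several choices are assumptions: leading zeroes are removed rather than preserved, negative amounts are assumed to arrive either in parentheses or with a trailing minus sign, and heading or subtotal rows are recognized by a repeated header or a first cell starting with "Subtotal" or "Total".

```python
from decimal import Decimal


def strip_nonprintable(text: str) -> str:
    """Drop control and other nonprintable characters, then trim spaces."""
    return "".join(ch for ch in text if ch.isprintable()).strip()


def strip_leading_zeroes(text: str) -> str:
    """Remove leading zeroes from an identifier, keeping a lone zero."""
    return text.lstrip("0") or "0"


def parse_amount(text: str) -> Decimal:
    """Turn '(1,234.50)' or '75.00-' into a signed Decimal."""
    cleaned = strip_nonprintable(text).replace(",", "").replace("$", "")
    negative = False
    if cleaned.startswith("(") and cleaned.endswith(")"):
        negative, cleaned = True, cleaned[1:-1]
    elif cleaned.endswith("-"):
        negative, cleaned = True, cleaned[:-1]
    value = Decimal(cleaned)
    return -value if negative else value


def is_data_row(row: list, header: list) -> bool:
    """False for empty rows, repeated headings, and subtotal or total rows."""
    if not row or row == header:
        return False
    return not row[0].strip().lower().startswith(("subtotal", "total"))


# Hypothetical report export with a repeated heading and a subtotal line.
header = ["PO_Number", "Amount"]
raw = [
    header,
    ["001787", "1,200.00"],
    ["Subtotal", "1,200.00"],
    header,
    ["001788\t", "(75.00)"],
]

clean = [
    (strip_leading_zeroes(strip_nonprintable(row[0])), parse_amount(row[1]))
    for row in raw
    if is_data_row(row, header)
]
```

After cleaning, only the two purchase order rows remain, with identifiers "1787" and "1788" and the second amount stored as a negative number.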

20
Watch out for bad data quality.
Dates (e.g., 7/6/2023 or 6/7/2023 or 2023-07-06)
Numbers (e.g., 1 or I, 7 or seven)
International characters and encoding (e.g., * or “ or TAB)
Languages and measures (e.g., Arkansas or AR, $ or €)
Human error (e.g., 23 or 32)
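The date example shows why the source format has to be known rather than guessed: the same text can name two different days. A small sketch, assuming the format of each source is stated explicitly and every date is normalized to ISO 8601 (the function name `to_iso` is hypothetical):

```python
from datetime import datetime


def to_iso(text: str, source_format: str) -> str:
    """Parse a date using its declared source format and return YYYY-MM-DD."""
    return datetime.strptime(text.strip(), source_format).date().isoformat()


us_style = to_iso("7/6/2023", "%m/%d/%Y")       # read as month/day/year
day_first = to_iso("7/6/2023", "%d/%m/%Y")      # read as day/month/year
already_iso = to_iso("2023-07-06", "%Y-%m-%d")
```

Read as month/day/year, "7/6/2023" is July 6; read as day/month/year, it is June 7. Nothing in the string itself says which reading is right, so the format must come from the data provider or the data dictionary.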

21
Step 5: Load the data for data analysis
Finally, you can now import your data into the tool of your choice and expect the
functions to work properly.
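As one possible illustration of the load step, the sketch below reads a cleaned CSV into an in-memory SQLite table and runs a first query. The choice of SQLite as the analysis tool, the column names, and the sample rows are assumptions; the same idea applies to whichever tool is actually used.

```python
import csv
import io
import sqlite3

# Hypothetical cleaned extract, with dates already normalized to ISO format.
CLEANED = """po_number,po_date,supplier_id
1787,2020-11-01,1
1788,2020-11-01,2
1789,2020-11-08,1
1790,2020-11-15,1
"""

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE purchase_order (
        po_number   INTEGER PRIMARY KEY,
        po_date     TEXT    NOT NULL,
        supplier_id INTEGER NOT NULL
    )
""")

# Convert each CSV row to typed values before loading.
records = [
    (int(row["po_number"]), row["po_date"], int(row["supplier_id"]))
    for row in csv.DictReader(io.StringIO(CLEANED))
]

with conn:  # commit on success, roll back if any insert fails
    conn.executemany("INSERT INTO purchase_order VALUES (?, ?, ?)", records)

# Confirm the load, then start the analysis.
loaded = conn.execute("SELECT COUNT(*) FROM purchase_order").fetchone()[0]
by_supplier = conn.execute("""
    SELECT supplier_id, COUNT(*)
    FROM purchase_order
    GROUP BY supplier_id
    ORDER BY supplier_id
""").fetchall()
```

Checking the loaded row count against the number of cleaned records is a quick way to confirm nothing was dropped during the load.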

22
Q: What are four common issues with data
that must be fixed before analysis can take
place?
Learning Objective 4
WHAT ETHICAL ISSUES DO WE ENCOUNTER IN DATA COLLECTION AND USE?

24
Potential ethical issues surround how data are
collected and how they are shared.
1. How does the company use data, and to what extent are they integrated into
firm strategy?
2. Does the company send a privacy notice to individuals when their personal
data are collected?
3. Does the company assess the risks linked to the specific type of data the
company uses?
4. Does the company have safeguards in place to mitigate the risks of data
misuse?
5. Does the company have the appropriate tools to manage the risks of data
misuse?
6. Does the company conduct appropriate due diligence when sharing with or
acquiring data from third parties?
25
Q: A firm purchases data from a third party
about customer preferences for laundry
detergent. How would you recommend that
this firm conduct appropriate due diligence
about whether the third-party data provider
follows ethical data practices?
Summary
The first step in the IMPACT cycle is to identify the questions that you intend to answer through your data analysis project. Once a data analysis problem or question has been identified, the next step in the IMPACT cycle is mastering the data, which can be broken down to mean obtaining the data needed and preparing it for analysis.

In order to obtain the right data, it is important to have a firm grasp of what data are available to you and how that information is stored.
◦ Data are often stored in a relational database, which helps to ensure that an organization’s data are complete and to avoid redundancy. Relational databases are made up of tables with uniquely identified records (this is done through primary keys) and are related through the usage of foreign keys.

To obtain the data, you will either have access to extract the data yourself or you will need to request the data from a database administrator or the information systems team. If the latter is the case, you will complete a data request form, indicating exactly which data you need and why.

Once you have the data, they will need to be validated for completeness and integrity; that is, you will need to ensure that all of the data you need were extracted, and that all data are correct. Sometimes when data are extracted, some formatting or sometimes even entire records will get lost, resulting in inaccuracies. Correcting the errors and cleaning the data is an integral step in mastering the data.

Finally, after the data have been cleaned, there may be one last step of mastering the data, which is to load them into the tool that will be used for analysis. Often, the cleaning and correcting of data occur in Excel and the analysis will also be done in Excel. In this case, there is no need to load the data elsewhere. However, if you intend to do more rigorous statistical analysis than Excel provides, or if you intend to do more robust data visualization than can be done in Excel, it may be necessary to load the data into another tool following the transformation process.

27
Thank you!
Contact me at:
duanh@sacredheart.edu
