MGMT 134 C2 Notes

Mastering the Data

Identify and obtain the data needed for solving the problem.

Requires a firm understanding of what data are available to you and where they are stored, as well as skill in extracting, transforming, and loading (ETL) the data in preparation for data analysis.

The mastering-the-data step, from requesting the data for extraction through loading it for analysis, can be described via the ETL process:

Step 1 Determine the purpose and scope of the data request (extract).

Step 2 Obtain the data (extract).

Step 3 Validate the data for completeness and integrity (transform).

Step 4 Clean the data (transform).

Step 5 Load the data in preparation for data analysis (load).
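The five steps above can be sketched end to end in Python. This is a minimal illustration, not the textbook's own code; the file contents, column names, and cleaning rules are hypothetical assumptions.

```python
import csv
import io

# Hypothetical raw extract (Step 2: obtain). In practice this would come
# from a database query or a file provided by the database administrator.
# Step 1 (determine purpose and scope) happens before any code is written.
raw = io.StringIO("invoice_id,amount\n001,100.50\n002,-20.00\n002,-20.00\n")
rows = list(csv.DictReader(raw))                 # extract

assert len(rows) == 3                            # Step 3: validate record count

cleaned = []
seen = set()
for r in rows:                                   # Step 4: clean
    key = r["invoice_id"].lstrip("0")            # strip leading zeros
    if key in seen:                              # drop duplicate rows
        continue
    seen.add(key)
    cleaned.append({"invoice_id": key, "amount": float(r["amount"])})

# Step 5: load into the structure the analysis tool expects.
loaded = {r["invoice_id"]: r["amount"] for r in cleaned}
print(loaded)
```

The same sequence applies whatever the tooling: validate against the source before cleaning, and load only after the data are clean.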

HOW DATA ARE USED AND STORED IN THE ACCOUNTING CYCLE
Must have a comfortable grasp of what data are available to you and where such data are stored.

Internal and External Data Sources


Data may come from a number of different sources, either internal or external.

Internal data sources (accounting information system, supply chain management system, customer
relationship management system, and human resource management system)

Enterprise Resource Planning (ERP) integrates applications from throughout the business (such as
manufacturing, accounting, finance, human resources, etc.) into one system.

Accounting information system: records, processes, reports, and communicates the results of business
transactions to provide financial and nonfinancial information for decision-making purposes

Supply chain management (SCM): includes information on active vendors (their contact info, where
payment should be made, how much should be paid), the orders made to date (how much, when the
orders are made), or demand schedules for what component of the final product is needed

Customer relationship management (CRM): overseeing all interactions with current and potential
customers with the goal of improving relationships

Human resource management (HRM): managing all interactions with current and potential employees

Accounting Data and Accounting Information Systems


Data are stored in either flat files or a database (a flat file is a means of maintaining all of the data you need in one place; it is generally inefficient to store all of the data in one place).
The most common example of a flat file is a range of data in an Excel spreadsheet.

A relational database is used for data storage because it is more capable of ensuring data integrity and maintaining “one version of the truth” across multiple processes; such systems are called relational database management systems, or RDBMS (e.g., Microsoft SQL Server).

DATA AND RELATIONSHIPS IN A RELATIONAL DATABASE


Structured data should be stored in a normalized relational database

Storing data in a normalized, relational database ensures that data are complete, not redundant, and
that business rules and internal controls are enforced, aids communication and integration across
business processes.

Completeness: ensures that all data required are included in the dataset.

No redundancy: redundancy is avoided (it takes unnecessary space and requires unnecessary processing to run reports) to ensure that there aren’t multiple versions of the truth, and it decreases the risk of data-entry errors. Storing data in normalized relational databases requires there to be one version of the truth and each element of data to be stored in only one place.

Business rules enforcement: relational databases can be designed to aid in the placement and enforcement of internal controls and business rules in ways that flat files cannot.

Communication and integration of business processes: relational databases should be designed to support business processes across the organization, which results in improved communication across functional areas and more integrated business processes.

Columns in a Table: Primary Keys, Foreign Keys, and Descriptive Attributes


Three types of columns: primary keys, foreign keys, and descriptive attributes.

Each table must have a primary key (often made up of one column) to ensure that each row in the table is unique; it is often referred to as a “unique identifier,” and a collection of letters or simply sequential numbers is typically used.

When you request your data into a flat file, you’ll receive one big table with a lot of redundancies, which is ideal for analyzing data. In the database itself, each group of information is stored in a separate table, and the tables that are related to each other are joined through relationships.

A relationship is created by placing a foreign key in one of the two tables that are related (the foreign key is another type of attribute, and its function is to create the relationship between the two tables).

Other columns in a table are descriptive attributes.

Primary and foreign keys facilitate the structure of a relational database, and the descriptive attributes
provide actual business information.
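A minimal illustration of these three column types, using Python's built-in sqlite3 module and hypothetical Vendor and Purchase tables (the table and column names are assumptions for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # let the database enforce the rule

# VendorID is the primary key (unique identifier) of the Vendor table;
# VendorName is a descriptive attribute carrying business information.
conn.execute("""CREATE TABLE Vendor (
    VendorID   INTEGER PRIMARY KEY,
    VendorName TEXT)""")

# In Purchase, VendorID is a foreign key that creates the relationship
# back to the Vendor table.
conn.execute("""CREATE TABLE Purchase (
    PurchaseID INTEGER PRIMARY KEY,
    VendorID   INTEGER REFERENCES Vendor(VendorID),
    Amount     REAL)""")

conn.execute("INSERT INTO Vendor VALUES (1, 'Acme Supply')")
conn.execute("INSERT INTO Purchase VALUES (10, 1, 250.0)")

# The primary key/foreign key relationship lets us join the two
# tables back into one view for analysis.
row = conn.execute("""SELECT v.VendorName, p.Amount
                      FROM Purchase p JOIN Vendor v
                      ON p.VendorID = v.VendorID""").fetchone()
print(row)
```

With foreign-key enforcement on, an attempt to insert a purchase for a nonexistent vendor would be rejected, which is one way a relational database enforces business rules that a flat file cannot.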

EXTRACT, TRANSFORM, AND LOAD (ETL) THE DATA


Once you know what you need, you are prepared to request the data from the database manager or to extract the data yourself.
The ETL process begins with identifying which data you need and is complete when the clean data are loaded, in the appropriate format, into the tool to be used for analysis.

1. Determining the purpose and scope of the data request.
2. Obtaining the data.
3. Validating the data for completeness and integrity.
4. Cleaning the data.
5. Loading the data for analysis.

Extract
Determine exactly what data you need

Requesting the data involves the first two steps of the ETL process

Step 1: Determine the Purpose and Scope of the Data Request


The purpose and scope of the request should be determined, and any risks and assumptions documented.

Step 2: Obtain the Data


Determine whom to ask and specifically what is needed, what format is needed (Excel, PDF, database),
and by what deadline

Obtaining the Data via a Data Request


Determining what data are needed and which tool will be used to test and process the data will aid the database administrator in providing the data to you in the most accessible format.

It is also necessary to specify the format in which you would like to receive the data

When you receive the data, make sure that you understand the data in each column (the data dictionary
should prove extremely helpful for this)

Audit Data Standards (ADS, developed by the AICPA): alleviate the headaches associated with data requests by serving as a guide to standardize these requests and specify the format an auditor desires from the company being audited (Order-to-Cash subledger, Procure-to-Pay subledger, Inventory subledger, General Ledger).

A data request form template can make communication easier between the data requester and the provider.

Once the data are received, you can move on to the transformation phase of the ETL process.

Obtaining the Data Yourself


After identifying the goal of the data analysis project in the first step of the IMPACT cycle, follow a similar
process to how you would request the data

Identify the tables that contain the information you need.

Identify which attributes within those tables you need.

Identify how those tables are related to each other.

Once you have identified the data you need, you can start gathering the information.

SQL (Structured Query Language): a computer language used to interact with data (tables, records, and attributes) in a database by creating, updating, deleting, and extracting; it can combine data from one or more tables and organize the data in a way that is more intuitive and useful for data analysis.

Microsoft Excel or Power BI: When data are not stored in a relational database and are not too large for Excel, the entire table can be analyzed directly in a spreadsheet; this is simpler for doing exploratory analysis.

Two of Excel’s most useful techniques for looking up data and matching them based on a primary key/foreign key relationship are the VLOOKUP and INDEX/MATCH functions.

SQL will often be the best option for retrieving data, after which that data can be loaded into Excel or another tool for further analysis (SQL queries can be saved and reproduced at will or at regular intervals, making it easier and more efficient to re-create data requests).

When you are performing exploratory analysis, it can be beneficial to load entire tables into Excel and
bypass the SQL step

Transform
Step 3: Validating the Data for Completeness and Integrity
It is possible that some of the data could have been lost during the extraction.

To validate the data after extraction, ensure that the extracted data are complete and that their integrity remains:

Compare the number of records that were extracted to the number of records in the source: ensuring
that the record counts match

Compare descriptive statistics for numeric fields: ensure that the numeric data were extracted
completely.

Validate Date/Time fields: validated in the same way as numeric fields, by converting them to numeric values and running descriptive-statistic comparisons.

Compare string limits for text fields: ensure that you haven’t cut off any characters.
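The four validation checks above can be sketched in Python. The source-side figures (record count, totals, string lengths) are hypothetical values that would normally be supplied by the data provider:

```python
from datetime import date

# Hypothetical figures reported by the data provider (the "source").
source_count = 3
source_sum = 330.5
source_max_len = 12          # longest description string at the source

# The extracted records we actually received.
extracted = [
    {"amount": 100.5, "date": "2024-01-15", "desc": "Office chair"},
    {"amount": 200.0, "date": "2024-02-03", "desc": "Desk"},
    {"amount": 30.0,  "date": "2024-02-20", "desc": "Lamp"},
]

# 1. Record counts match.
assert len(extracted) == source_count

# 2. Descriptive statistics for numeric fields match (here, the sum).
assert abs(sum(r["amount"] for r in extracted) - source_sum) < 1e-9

# 3. Date/Time fields: convert to numeric (ordinal days) and compare.
ordinals = [date.fromisoformat(r["date"]).toordinal() for r in extracted]
assert min(ordinals) == date(2024, 1, 15).toordinal()

# 4. String limits: no text field was truncated below the source maximum.
assert max(len(r["desc"]) for r in extracted) == source_max_len

print("validation passed")
```

If any assertion fails, the extraction (not the analysis) is the place to go back and fix.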

Step 4: Cleaning the Data


If the dataset is large, or if the error is difficult to find, it may be easiest to go back to the extraction and
examine how the data were extracted, fix any errors in the SQL code, and re-run the extraction.

Pay close attention to the state of the data and clean them as necessary to improve the quality of the
data and subsequent analysis.

Remove headings or subtotals

Clean leading zeroes and nonprintable characters

Format negative numbers

Correct inconsistencies across data, in general
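Three of these cleaning tasks can be sketched in a few lines of Python. The raw values below (zero-padded IDs, accounting-style negatives, embedded nonprintable characters) are hypothetical examples:

```python
# Hypothetical raw values as they might arrive from an extract.
raw_ids = ["00042", "00107"]
raw_amounts = ["(250.00)", "1,300.50"]   # accounting negatives, commas
raw_text = "Desk\x00 lamp\x07"           # nonprintable characters embedded

# Clean leading zeros so IDs match across systems.
ids = [v.lstrip("0") for v in raw_ids]

# Format negative numbers: "(250.00)" means -250.00 in accounting notation.
def to_number(s):
    s = s.replace(",", "")
    if s.startswith("(") and s.endswith(")"):
        return -float(s[1:-1])
    return float(s)

amounts = [to_number(v) for v in raw_amounts]

# Strip nonprintable characters from text fields.
text = "".join(ch for ch in raw_text if ch.isprintable())

print(ids, amounts, text)
```

Headings and subtotals embedded in an extract are usually removed the same way: identify the rows that are not data and filter them out before loading.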

A Note about Data Quality


Low-quality data will often contain numerous errors, obsolete or incorrect data, or invalid data.

Five main data quality issues to consider: Dates, Numbers, International characters and encoding,
Languages and measures, Human error
Load
Step 5: Loading the Data for Data Analysis
There is a variety of different tools to use for analyzing data, including Excel, Power BI, Tableau Prep, and Tableau Desktop.

ETL or ELT
ETL has long been the popular approach.

However, the procedure is shifting toward ELT (extract, load, transform). Particularly with tools such as Microsoft’s Power BI suite, the most common method for mastering the data that we use is more in line with ELT than ETL.

ETHICAL CONSIDERATIONS OF DATA COLLECTION AND USE
The scope of digital risk used to be limited to cybersecurity threats, to make sure the data were secure; increasingly, however, the concern is the risk of lacking ethical data practices.

Potential ethical issues include an individual’s right to privacy and whether assurance is offered that certain data are not misused.

The Institute of Business Ethics suggests that companies consider the following six questions:

How does the company use data, and to what extent are they integrated into firm strategy?

Does the company send privacy notices to individuals when their personal data is collected?

Does the company assess the risks linked to the specific type of data the company uses?

Does the company have safeguards in place to mitigate the risks of data misuse?

Does the company have the appropriate tools to manage the risks of data misuse?

Does our company conduct appropriate due diligence when sharing with or acquiring data from third
parties?
