DAFD Unit-2
Phase 6: Operationalize –
The team communicates the benefits of the project more broadly and sets up a pilot
project to deploy the work in a controlled way before broadening it to the full
enterprise of users.
This approach enables the team to learn about the performance and related
constraints of the model in a production environment on a small scale, and to
make adjustments before full deployment.
The team delivers final reports, briefings, and code.
Free or open source tools – Octave, WEKA, SQL, MADlib.
Obtaining Data Files:
Data collection is the process of acquiring, collecting, extracting, and storing a
voluminous amount of data, which may be in structured or unstructured form,
such as text, video, audio, XML files, records, or other image files, and which is
used in later stages of data analysis.
In big data analysis, data collection is the initial step, carried out before
starting to analyze the patterns or useful information in the data. The data to be
analyzed must be collected from different valid sources.
The data collected at this stage is known as raw data and is not directly useful;
after the impure data is cleaned and used for further analysis it becomes
information, and the insight obtained from that information is known as
“knowledge”. Knowledge can take many forms, such as business knowledge about the
sales of enterprise products, disease treatment, etc. The main goal of data
collection is to collect information-rich data.
Data collection starts with asking some questions, such as what type of data is to
be collected and what the source of collection is. Most collected data falls into
two types: “qualitative data”, which is non-numerical data such as words and
sentences and mostly focuses on the behavior and actions of a group, and
“quantitative data”, which is in numerical form and can be measured and analyzed
using different scientific tools and sampling methods.
The collected data is further divided into two main types:
1. Primary data
2. Secondary data
1. Primary data:
Data that is raw, original, and extracted directly from official sources is known
as primary data. This type of data is collected directly through techniques such
as questionnaires, interviews, and surveys. The data collected must match the
demand and requirements of the target audience on which the analysis is
performed; otherwise it becomes a burden during data processing.
1. Interview method:
In this method, data is collected by interviewing the target audience. The person
who asks the questions is called the interviewer and the person who answers is
known as the interviewee. Some basic business or product-related questions are
asked and recorded in the form of notes, audio, or video, and this data is stored
for processing. Interviews can be both structured and unstructured, such as
personal interviews or formal interviews conducted by telephone, face to face,
email, etc.
2. Survey method:
The survey method is a research process in which a list of relevant questions is
asked and the answers are recorded in the form of text, audio, or video. A survey
can be conducted in both online and offline modes, for example through website
forms and email, and the responses are then stored for data analysis. Examples are
online surveys or surveys through social media polls.
3. Observation method:
In the observation method, data is collected by observing the behavior and actions
of the target audience in their natural setting, without direct interaction, and
the observations are recorded as notes, audio, or video for later analysis.
4. Experimental method:
The experimental method is the process of collecting data by performing
experiments, research, and investigation. The most frequently used experimental
designs are CRD (completely randomized design), RBD (randomized block design),
LSD (Latin square design), and FD (factorial design).
2. Secondary data:
Secondary data is data that has already been collected and is reused for some
valid purpose. This type of data is derived from previously recorded primary data
and comes from two types of sources: internal sources and external sources.
Internal source:
This type of data can easily be found within the organization, for example market
records, sales records, transactions, customer data, accounting resources, etc.
The cost and time required to obtain data from internal sources are low.
External source:
Data that cannot be found within the organization and has to be obtained through
external third-party resources is external source data. The cost and time required
are higher because these sources contain a huge amount of data. Examples of
external sources are government publications, news publications, the Registrar
General of India, the Planning Commission, the International Labour Bureau,
syndicate services, and other non-governmental publications.
Data analytics in the context of this write-up could be seen as the act
of sourcing, processing, analysing, interpreting and visualizing data
with the primary objective of extracting actionable insights from the
results of the analysis.
Collect data
Collecting data is the process of assembling all the data you need for ML. Data collection can be
tedious because data resides in many data sources, including on laptops, in data warehouses, in the
cloud, inside applications, and on devices. Finding ways to connect to different data sources can
be challenging. Data volumes are also increasing exponentially, so there is a lot of data to search
through. Additionally, data has vastly different formats and types depending on the source. For
example, video data and tabular data are not easy to use together.
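As a rough illustration of assembling data that lives in more than one place, the short Python/pandas sketch below reads the same kind of records from a CSV export and from a small SQLite database and combines them into one table. The file contents, table name, and columns are made up for illustration.

# A minimal sketch of combining two hypothetical data sources.
import io
import sqlite3
import pandas as pd

# Source 1: tabular data exported as CSV (inlined here so the sketch runs).
csv_text = "order_id,amount\n1,120.5\n2,87.0\n"
sales_csv = pd.read_csv(io.StringIO(csv_text))

# Source 2: the same kind of records stored in a relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(3, 45.2), (4, 210.0)])
sales_db = pd.read_sql_query("SELECT * FROM sales", conn)
conn.close()

# Combine both sources into one frame for cleaning and analysis later on.
sales = pd.concat([sales_csv, sales_db], ignore_index=True)
print(sales)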
Clean data
Cleaning data corrects errors and fills in missing data as a step to ensure data quality. After you
have clean data, you will need to transform it into a consistent, readable format. This process can
include changing field formats like dates and currency, modifying naming conventions, and
correcting values and units of measure so they are consistent.
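A minimal Python/pandas sketch of these cleaning steps, using a hypothetical sales table with inconsistent dates, currency strings, naming conventions, and a missing value:

import pandas as pd

# Hypothetical raw records with inconsistent formats and a missing value.
df = pd.DataFrame({
    "order_date": ["2023-01-05", "2023-02-05", None],
    "amount": ["$1,200.50", "950", "$87.25"],
    "country": ["USA", "U.S.A.", "usa"],
})

# Convert date strings into a proper datetime column; bad values become NaT.
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")

# Strip currency symbols and thousands separators, then convert to numbers.
df["amount"] = df["amount"].str.replace(r"[$,]", "", regex=True).astype(float)

# Standardize the naming convention so equivalent values match.
df["country"] = df["country"].str.upper().str.replace(".", "", regex=False)

# Fill in the missing date instead of dropping the record (one simple choice).
df["order_date"] = df["order_date"].fillna(pd.Timestamp("2023-01-01"))

print(df)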
Label data
Data labeling is the process of identifying raw data (images, text files, videos, and so on) and
adding one or more meaningful and informative labels to provide context so an ML model can
learn from it. For example, labels might indicate if a photo contains a bird or car, which words
were mentioned in an audio recording, or if an X-ray discovered an irregularity. Data labeling is
required for various use cases, including computer vision, natural language processing, and speech
recognition.
After data is cleaned and labeled, ML teams often explore the data to make sure it is correct and
ready for ML. Visualizations like histograms, scatter plots, box and whisker plots, line plots, and
bar charts are all useful tools to confirm data is correct. Visualizations also
help data science teams complete exploratory data analysis. This process uses
visualizations to discover
patterns, spot anomalies, test a hypothesis, or check assumptions. Exploratory data analysis does
not require formal modeling; instead, data science teams can use visualizations to decipher the
data.
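For example, a short matplotlib sketch of three of these exploratory plots on synthetic numeric data (the numbers are generated purely for illustration):

import matplotlib.pyplot as plt
import numpy as np

# Synthetic numeric data standing in for a real column under exploration.
rng = np.random.default_rng(42)
values = rng.normal(loc=50, scale=10, size=500)
other = values * 0.8 + rng.normal(scale=5, size=500)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Histogram: check the overall distribution and spot skew or gaps.
axes[0].hist(values, bins=30)
axes[0].set_title("Histogram")

# Scatter plot: look for a relationship between two variables.
axes[1].scatter(values, other, s=5)
axes[1].set_title("Scatter plot")

# Box plot: highlight the median, spread, and potential outliers.
axes[2].boxplot(values)
axes[2].set_title("Box plot")

plt.tight_layout()
plt.show()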
Data organization is the process of putting data into groups and categories to
make it easier to use so that it can be accessed, processed, and analyzed
more quickly.
You’ll need to organize your data in the most logical and orderly way
possible, similar to how we collect critical papers in file folders, so you and
anybody who has access to it can quickly find what they’re searching for.
In a world where data sets are among the most valuable assets owned by businesses
across many different sectors, companies use this method to make better use of
their data assets.
Executives and other professionals may put a lot of effort into organizing
data as part of a larger plan to streamline business processes, get better
business intelligence, and improve a business model in general.
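A small pandas sketch of this idea, grouping hypothetical order records by category and bucketing a numeric column into labelled bands (the columns and values are made up for illustration):

import pandas as pd

# Hypothetical order records.
orders = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South"],
    "product": ["A", "A", "B", "B", "A"],
    "revenue": [120, 80, 200, 150, 60],
})

# Group related records together so they can be summarized and accessed quickly.
by_region = orders.groupby("region")["revenue"].sum()
print(by_region)

# Categorize a continuous value into labelled buckets.
orders["revenue_band"] = pd.cut(orders["revenue"], bins=[0, 100, 150, 250],
                                labels=["low", "medium", "high"])
print(orders)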
Data sampling is an effective approach for data analysis that comes with
various benefits and also a few challenges.
Time savings: Sampling can be particularly useful with data sets that are
too large to efficiently analyze in full -- for example, in big data
analytics applications or surveys. Identifying and analyzing a
representative sample is more efficient and less time-consuming than
surveying the entirety of the data or population.
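A brief Python sketch of the idea, drawing a 1% simple random sample from a synthetic population and comparing summary values (the data is generated only for illustration):

import numpy as np
import pandas as pd

# A synthetic "population" of 100,000 transaction amounts.
rng = np.random.default_rng(0)
population = pd.DataFrame({"amount": rng.exponential(scale=100, size=100_000)})

# A 1% simple random sample; random_state makes the draw reproducible.
sample = population.sample(frac=0.01, random_state=0)

print("population mean:", round(population["amount"].mean(), 2))
print("sample mean:    ", round(sample["amount"].mean(), 2))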
Measure of Central Tendency
A measure of central tendency represents the whole data set by a single value and
gives us the location of its central point. There are three main measures of
central tendency:
Mean
Mode
Median
Mean
It is the sum of the observations divided by the total number of observations; in
other words, it is the average.
Mode
It is the value that has the highest frequency in the given data set. The data set
may have no mode if the frequency of all data points is the same. Also, we can
have more than one mode if we encounter two or more data points having the
same frequency.
Median
It is the middle value of the data set. It splits the data into two halves. If the
number of elements in the data set is odd then the center element is the median
and if it is even then the median would be the average of two central elements.
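All three measures can be computed with Python's standard statistics module; the small data set below is made up for illustration:

from statistics import mean, median, mode

data = [4, 8, 8, 5, 3, 10, 8, 5]

print("Mean:  ", mean(data))    # 6.375: sum of observations / number of observations
print("Median:", median(data))  # 6.5: average of the two central values (even count)
print("Mode:  ", mode(data))    # 8: the most frequent value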
Measure of Variability
Unit - 3
Given a large set of numerical data, Benford’s Law asserts that the first digit of these
numbers is more likely to be small. If the data follows Benford’s Law, then
approximately 30% of the time the first digit would be a 1, whilst 9 would only be the
first digit around 5% of the time. If the distribution of the first digit was uniform, then
they would all occur equally often (around 11% of the time). It also proposes a
distribution of the second digit, third digit, combinations of digits, and so
on. According to Benford’s Law, the probability that the first digit in a dataset is d is
given by P(d) = log10(1 + 1/d).
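Evaluating this formula for each leading digit reproduces the percentages mentioned above; a short Python check:

import math

# P(d) = log10(1 + 1/d) for the leading digits 1..9.
for d in range(1, 10):
    print(f"digit {d}: {math.log10(1 + 1 / d):.1%}")
# digit 1 comes out near 30.1%, digit 9 near 4.6%.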
Why is it useful?
There are plenty of data sets that have proven to have followed Benford’s Law,
including stock prices, population numbers, and electricity bills. Due to the large
availability of data known to follow Benford’s Law, checking a data set to see if it
follows Benford’s Law can be a good indicator as to whether the data has been
manipulated. While this is not definitive proof that the data is erroneous or fraudulent,
it can provide a good indication of problematic trends in your data.
In the context of fraud, Benford’s Law can be used to detect anomalies and
irregularities in financial data, for example within large datasets such as
invoices, sales records, expense reports, and other financial statements. If the
data has been
fabricated, then the person tampering with it would probably have done so
“randomly”. This means the first digits would be uniformly distributed and thus, not
follow Benford’s Law.
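As a rough sketch of what such a check might look like, the Python snippet below compares the observed leading-digit shares of a small, made-up list of invoice amounts with the shares Benford’s Law predicts; a real audit would use far more data and a formal test such as chi-squared.

import math
from collections import Counter

# Made-up invoice amounts used only to illustrate the check.
amounts = [1234.50, 187.20, 96.10, 1509.00, 23.75, 412.00, 1820.99,
           134.60, 2750.00, 19.99, 310.45, 1675.30]

# Extract the first significant digit of each amount.
first_digits = [int(str(abs(a)).lstrip("0.")[0]) for a in amounts]
observed = Counter(first_digits)
n = len(first_digits)

for d in range(1, 10):
    expected = math.log10(1 + 1 / d)
    print(f"digit {d}: observed {observed[d] / n:.2f}, expected {expected:.2f}")
# Large, systematic gaps between observed and expected shares are a signal to
# investigate further, not proof of fraud.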
Below are some real-world examples where Benford’s Law has been applied:
Detecting election fraud – Benford’s Law was used as evidence of fraud in the 2009
Iranian elections and was also used for auditing data from the 2009 German federal
elections. Benford’s Law has also been used in multiple US presidential elections.
Analysis of price digits – When the euro was introduced, all the different exchange
rates meant that, while the “real” price of goods stayed the same, the “nominal” price
(the monetary value) of goods was distorted. Research carried out across Europe
showed that the first digits of nominal prices followed Benford’s Law. However,
deviation from this occurred for the second and third digits. Here, trends more
commonly associated with psychological pricing could be observed.