Analytical Approaches and Tools To Analyze Data

Uploaded by

Domakonda Neha

Analytical approaches and tools to analyze data
U-3 P-2
Analytical approaches
• The methods adopted to sort, analyze, or solve
problems are known as analytical approaches.
• As the size of the data to be analyzed grew, newer
analytical approaches were adopted.
• Commonly used approaches are:
1. Ensemble methods
2. Text data analysis
Ensemble methods
• An ensemble method is a process of generating multiple
models and combining them to solve a specific
problem.
• The main aim is to minimize the probability of
selecting a poor model and to improve
performance.
• Ex: Bagging, Boosting, Random forests
Bagging
• Bagging stands for bootstrap aggregation.
• The idea behind bagging is to combine the
results of multiple models (for instance, all
decision trees) to get a generalized result.
• Subsets of observations are drawn from the
original dataset, with replacement.
• A base model (weak model) is created on each
of these subsets.
• The models run in parallel and are independent
of each other.
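
The bagging steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: the weak base model is a one-split regression "stump" on a single feature, and the names bootstrap_sample, fit_stump, bagging_fit and bagging_predict, as well as the toy dataset, are hypothetical and not from the slides.

```python
import random
from statistics import mean

def bootstrap_sample(data, rng):
    """Draw len(data) observations from data, with replacement."""
    return [rng.choice(data) for _ in data]

def fit_stump(sample):
    """Weak base model: a one-split regression stump on (x, y) pairs.

    Tries every observed x as a threshold and keeps the split with the
    lowest squared error. Returns (threshold, left_mean, right_mean).
    """
    best = None
    for threshold, _ in sample:
        left = [y for x, y in sample if x <= threshold]
        right = [y for x, y in sample if x > threshold]
        if not left or not right:
            continue
        lm, rm = mean(left), mean(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, threshold, lm, rm)
    if best is None:  # every x is identical: always predict the overall mean
        m = mean(y for _, y in sample)
        return (float("inf"), m, m)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def bagging_fit(data, n_models=25, seed=0):
    """Fit one independent base model per bootstrap subset."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]

def bagging_predict(models, x):
    """Aggregate by averaging the predictions of all base models."""
    return mean(predict_stump(m, x) for m in models)

# Toy data (assumed for illustration): y jumps from 0 to 10 at x = 5.
data = [(x, 0.0 if x < 5 else 10.0) for x in range(10)]
models = bagging_fit(data)
print(bagging_predict(models, 2), bagging_predict(models, 8))
```

Because each base model depends only on its own bootstrap subset, the list comprehension in bagging_fit could be run in parallel without changing the result.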
Boosting
• Boosting is a sequential process, where each
subsequent model attempts to correct the
errors of the previous model.
• A subset is created from the original dataset.
• Initially, all data points are given equal
weights.
• A base model is created on this subset.
• This model is used to make predictions on the
whole dataset.
• Errors are calculated using the actual values and
predicted values.
• The observations that are incorrectly predicted are
given higher weights.
• Another model is created and predictions are made
on the dataset.
• Similarly, multiple models are created, each
correcting the errors of the previous model.
• The final model (strong learner) is the weighted
mean of all the models (weak learners).
• Thus, the boosting algorithm combines a number of
weak learners to form a strong learner.
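
The boosting procedure above can be sketched as follows. The slides do not name a specific algorithm, so this example assumes AdaBoost-style weight updates with decision stumps as weak learners and labels of +1 or -1. As a simplification, each stump is fitted on the whole weighted dataset rather than on a sampled subset. All function names and the toy data are hypothetical.

```python
import math

def fit_weighted_stump(xs, ys, weights):
    """Pick the threshold and polarity with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x <= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def stump_predict(threshold, polarity, x):
    return polarity if x <= threshold else -polarity

def adaboost_fit(xs, ys, n_rounds=10):
    n = len(xs)
    weights = [1.0 / n] * n            # initially, all points get equal weight
    ensemble = []                      # list of (alpha, threshold, polarity)
    for _ in range(n_rounds):
        err, threshold, polarity = fit_weighted_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)     # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)   # weight of this weak learner
        ensemble.append((alpha, threshold, polarity))
        # Misclassified points (y * prediction == -1) get higher weights.
        weights = [w * math.exp(-alpha * y * stump_predict(threshold, polarity, x))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, x):
    """Strong learner: sign of the weighted sum of weak-learner votes."""
    score = sum(alpha * stump_predict(t, p, x) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data (assumed): no single threshold separates the two classes.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, 1]
ensemble = adaboost_fit(xs, ys)
print([adaboost_predict(ensemble, x) for x in xs])
```

Unlike bagging, the rounds cannot run in parallel: each round needs the weights produced by the previous one.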
Random Forests
• In a random forest, random samples are drawn from the
data, multiple decision trees are constructed, and a
random subset of the inputs (called predictors) is
evaluated for each tree.
• The final prediction is calculated by averaging
the predictions from all decision trees (or, for
classification, by majority vote).
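
A minimal sketch of the random forest idea, under simplifying assumptions: each "tree" is only one split deep, and the random subset of predictors is chosen once per tree, as the slide describes (common implementations grow deeper trees and re-draw the predictor subset at every split). The function names and the toy data are hypothetical.

```python
import random
from statistics import mean

def fit_tree(rows, targets, feature_ids):
    """Depth-one regression tree restricted to the given predictor columns."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [t for row, t in zip(rows, targets) if row[f] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[f] > threshold]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            err = (sum((t - lm) ** 2 for t in left)
                   + sum((t - rm) ** 2 for t in right))
            if best is None or err < best[0]:
                best = (err, f, threshold, lm, rm)
    if best is None:  # no valid split: always predict the mean target
        m = mean(targets)
        return (feature_ids[0], float("inf"), m, m)
    return best[1:]

def predict_tree(tree, row):
    f, threshold, left_mean, right_mean = tree
    return left_mean if row[f] <= threshold else right_mean

def forest_fit(rows, targets, n_trees=30, n_features=1, seed=0):
    rng = random.Random(seed)
    n, n_cols = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]            # random sample of rows
        feature_ids = rng.sample(range(n_cols), n_features)   # random predictors
        forest.append(fit_tree([rows[i] for i in idx],
                               [targets[i] for i in idx], feature_ids))
    return forest

def forest_predict(forest, row):
    """Final prediction: the average over all trees."""
    return mean(predict_tree(tree, row) for tree in forest)

# Toy data (assumed): the target depends on column 0; column 1 is noise.
rows = [(x, (x * 7) % 10) for x in range(10)]
targets = [0.0 if x < 5 else 10.0 for x in range(10)]
forest = forest_fit(rows, targets)
print(forest_predict(forest, (1, 5)), forest_predict(forest, (8, 5)))
```

Restricting each tree to a random subset of predictors makes the trees less similar to one another, which is what distinguishes a random forest from plain bagging of trees.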
Text data analysis
• The processing and modeling of textual data to
gain useful business insights is called text data
analysis.
• An essential part of text data analysis is text
mining, which extracts high-quality information.
• This information is derived by finding relationships
and patterns in massive collections of text.
• It takes text as input, where the text can be e-mail,
media data, etc.
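
As a small illustration of finding patterns and relationships in text, the sketch below counts term frequencies and term co-occurrences across a few short messages. The stop-word list, the function names, and the sample e-mails are assumptions made for this example; real text mining pipelines are considerably more elaborate.

```python
import re
from collections import Counter
from itertools import combinations

# A tiny, hand-picked stop-word list (assumed for illustration).
STOPWORDS = {"the", "a", "is", "to", "and", "of", "for", "my", "was", "on", "in"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def term_frequencies(documents):
    """Pattern: how often each term appears across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts

def cooccurrences(documents):
    """Relationship: how many documents contain both terms of a pair."""
    pairs = Counter()
    for doc in documents:
        terms = sorted(set(tokenize(doc)))
        pairs.update(combinations(terms, 2))
    return pairs

emails = [
    "The refund for my order was delayed",
    "Refund delayed again, order still missing",
    "Great service and fast delivery",
]

print(term_frequencies(emails).most_common(3))
print(cooccurrences(emails)[("delayed", "refund")])  # appears together in 2 e-mails
```

Here the pair ("delayed", "refund") co-occurring in two of three messages is the kind of relationship a business might act on.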
Precautions against Fraudulent practices
Advanced analytics can be used to safeguard companies from
falling into the traps of imposters and fraudsters. Financial
organizations usually take the following precautions.
• Record information such as the customer's contact number,
password, username, and email.
• Determine the IP addresses of customers from the mails sent or
received by them.
• Identify whether the email ID is fake or real.
• Match the customer's IP address, contact number, email, and
address, and determine whether all of this information comes
from the same place or from different places.
• Search for the given contact number and check whether it has
been reported for any kind of abuse or scam on the internet.
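
The checks listed above could be combined into a simple rule-based screen, sketched below. Everything here is hypothetical: the CustomerRecord fields, the fraud_flags function, and the reported_numbers set are invented for illustration, and the country codes are assumed to have been resolved elsewhere (for example by an IP geolocation service). The regular expression only checks that an email is well formed; it cannot tell whether the address is genuine.

```python
import re
from dataclasses import dataclass

# Syntactic check only: something@domain.tld with no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

@dataclass
class CustomerRecord:
    username: str
    email: str
    phone: str
    phone_country: str     # country implied by the contact number
    ip_country: str        # country resolved from the IP address
    address_country: str   # country in the postal address

def fraud_flags(record, reported_numbers):
    """Return a list of warning flags raised by the precaution checks."""
    flags = []
    if not EMAIL_PATTERN.match(record.email):
        flags.append("malformed email")
    # Do IP address, contact number, and address all point to the same place?
    countries = {record.phone_country, record.ip_country, record.address_country}
    if len(countries) > 1:
        flags.append("location mismatch")
    # Has the contact number been reported for abuse or scams?
    if record.phone in reported_numbers:
        flags.append("phone reported for abuse")
    return flags

reported = {"+1-5550100"}  # hypothetical list of reported numbers
clean = CustomerRecord("asha", "asha@example.com", "+91-0000000000", "IN", "IN", "IN")
suspicious = CustomerRecord("x", "not-an-email", "+1-5550100", "US", "RU", "IN")

print(fraud_flags(clean, reported))
print(fraud_flags(suspicious, reported))
```

A flag is a prompt for further review rather than proof of fraud; a legitimate customer who is travelling would also trigger the location mismatch.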
History of analytical tools
• In the late 1980s, Job Control Language (JCL),
a scripting language, was used for analytics.
• By the late 1990s, all commercial analytical
tools offered GUIs.
• Later, data visualization tools were
introduced.
GUI
• A GUI generates predefined code to
perform a particular task.
• It helps analytics professionals focus on
analysis methods rather than on writing code.
• The generated code is free of errors and bugs.
Analytic point solutions
• Analytic point solutions are software packages that
solve a specific group of problems.
• Ex: price optimization, fraud detection, and
demand forecasting applications.
• They are built on tool suites such as SAS.
• Implementing a point solution instead of
creating a custom solution for each
problem can help organizations save
money, effort, and time.
Data visualization tools
• Results are displayed in a user-friendly
manner for easy understanding.
• Data is presented in the form of charts, graphs, and
tables.
• Advanced visualization tools analyze and present
data in new ways.
• Tools such as Tableau, QlikView, JMP, Advizor, and
Spotfire offer enhanced graphics.
• They also allow users to link multiple tabs of graphs
and charts to each other and to the underlying data.
Popular analytical tools
• Some open source analytical tools are as follows:
1. GridGain
2. HPCC
3. Storm
• Popular tools are as follows:
1. R project for statistical computing
2. IBM SPSS
3. SAS
R project for statistical computing
• R is a free, open-source package that is widely used in
academic, research, and development
environments.
• Features of R:
1. R is object-oriented.
2. It can be embedded in different applications.
3. It can be called from commercial analytical
tools.
4. It is an extensible language.
• Limitations of R:
• Lack of scalability.
• Limited memory: R holds its working data in RAM, so large datasets may not fit.
• Programming in R is a fairly difficult process.
IBM SPSS
• SPSS (Statistical Package for the Social
Sciences) was introduced in 1968; in 2009,
its name changed to IBM SPSS.
• Its functionality can be accessed through a
proprietary 4GL known as the syntax
language or through a graphical interface
offering menus.
• It is very user-friendly and quite simple to use.
Features of IBM SPSS
• SPSS commands are executed one line at a time
to update tables and add results to the output
editor window.
• Executed syntax can also be stored in the window
along with its execution time.
• Can read data from and write to ASCII files,
databases, and tables of other statistical software.
• Provides basic data management functions, such
as sorting, aggregation, and table merge.
• Can send output directly to a file.
• The file can be in .txt, .html, or .xml format.
• The Output Management System (OMS) helps
store outputs in a single file by creating
a loop using a macro.
• IBM SPSS Statistics can be installed on
different platforms such as Windows, Mac OS
X, and Unix.
SAS
• SAS (Statistical Analysis System) is an information
delivery system: an integrated and
hardware-independent computing package.
• It is based on a 4GL programming language.
• It provides well-organized and timely information
delivery.
• SAS products, commonly known as modules,
are mostly used by social and behavioral
scientists.
Features of SAS
• Statistics
• Data and text mining
• Data visualization
• Forecasting
• Optimization
• Model management and deployment
• Quality improvement
Comparing various analytical tools
R installation
• https://www.javatpoint.com/r-installation
• https://www.javatpoint.com/rstudio-ide
