Data Analytics and Visualization Unit-I
Big Data Platforms are designed to handle and analyse vast amounts of data efficiently. Typically, they blend various Data Management tools to handle data on a large scale, usually leveraging cloud storage. Let’s explore some key features you can expect from these platforms:
1) Scalability: They can scale horizontally to manage increasing volumes of data without compromising
performance.
2) Distributed Processing: They use distributed computing to process large datasets across multiple nodes,
ensuring faster data processing.
3) Real-time Stream Computing: Capable of processing data in real time, which is crucial for applications
requiring immediate insights.
4) Machine Learning and Advanced Analytics: They offer built-in tools for Machine Learning and
Advanced Analytics to derive actionable insights from data.
5) Data Analytics and Visualisation: Provide tools for Data Analysis and visualisation to help users make
sense of complex data.
Big Data Platforms are complex systems designed to handle vast volumes of data, process it efficiently, and
turn it into valuable insights. These platforms consist of several essential components, each playing a critical
role in overall functionality.
1) Data Ingestion
This is the first step in the Big Data journey. Data can come from various sources, including sensors,
applications, social media, and databases. The data ingestion component is responsible for gathering this
diverse data and making it ready for processing. It involves data connectors, adapters, and protocols to
ensure data from different sources can be efficiently brought into the platform.
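As a rough illustration of this step, the Python sketch below pulls records from a hypothetical REST endpoint and lands them in a local staging file; the URL, file name, and JSON layout are all assumptions for the example, not part of any particular platform.

```python
# Hedged sketch of data ingestion: pull JSON records from a REST endpoint
# and append them to a local staging file as JSON lines.
# The URL and file name are hypothetical, not from any specific platform.
import json
import urllib.request

SOURCE_URL = "https://example.com/api/events"  # hypothetical data source

def ingest(url: str, staging_path: str) -> int:
    """Fetch a JSON array of records from `url`, append to `staging_path`."""
    with urllib.request.urlopen(url) as resp:
        records = json.load(resp)
    with open(staging_path, "a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return len(records)

if __name__ == "__main__":
    count = ingest(SOURCE_URL, "events_staging.jsonl")
    print(f"Ingested {count} records")
```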
2) Data Storage
Once data is ingested, it needs a place to reside. Big Data Platforms employ a variety of storage solutions
designed to handle large datasets. Common storage systems include distributed file systems (e.g., Hadoop
HDFS, Amazon S3) and NoSQL databases (e.g., Apache Cassandra, MongoDB). These storage systems are
optimised for scalability, fault tolerance, and high availability, ensuring data remains accessible and reliable
even as it grows.
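As a minimal example of handing ingested data to one such store, the sketch below uploads the staged file to Amazon S3 using the boto3 client; the bucket and key names are invented, and the snippet assumes boto3 is installed and AWS credentials are configured.

```python
# Hedged sketch: land the staged file in Amazon S3 with boto3.
# The bucket and key are invented; assumes boto3 is installed and
# AWS credentials are configured in the environment.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="events_staging.jsonl",      # local file from the ingestion step
    Bucket="my-data-lake",                # hypothetical bucket name
    Key="raw/events/2024-01-01.jsonl",    # date-partitioned key layout
)
```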
3) Data Processing
This is the heart of Big Data Platforms, where data is transformed, processed, and analysed to extract
meaningful insights. Processing engines and frameworks like Apache Spark, Apache Flink, and Hadoop
MapReduce play a vital role in this component. They distribute and parallelise computations across clusters
of machines, enabling the platform to handle massive workloads efficiently.
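To make the idea concrete, here is a minimal PySpark sketch that reads a hypothetical orders dataset and computes a distributed aggregation; the path and column names are illustrative, and Spark parallelises the work across the cluster automatically.

```python
# Minimal PySpark sketch of distributed processing: read a large dataset
# and aggregate it across the cluster. Path and column names are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-revenue").getOrCreate()

# Spark splits the input into partitions and processes them in parallel.
orders = spark.read.csv("s3a://my-data-lake/raw/orders/",
                        header=True, inferSchema=True)

daily = (orders
         .groupBy("order_date")
         .agg(F.sum("amount").alias("revenue"),
              F.count(F.lit(1)).alias("orders")))

daily.show()
spark.stop()
```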
4) Data Management and Orchestration
Managing and orchestrating data processing tasks across a distributed infrastructure is a complex task. The
management layer includes components for resource allocation, job scheduling, and workflow orchestration.
This layer ensures that data processing tasks run smoothly and efficiently, optimising resource utilisation.
5) Data Visualisation and Reporting
The insights derived from Big Data analysis are only valuable if they can be understood and acted upon. This
layer includes tools and technologies for data visualisation and reporting, allowing users to create
interactive dashboards, generate reports, and visualise trends and patterns in the data.
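As a small illustration of the reporting end of the pipeline, the sketch below turns an aggregated result into a chart with pandas and matplotlib; the figures are made up.

```python
# Small illustration of the reporting end: chart an aggregated result
# with pandas and matplotlib. The figures below are made up.
import pandas as pd
import matplotlib.pyplot as plt

daily = pd.DataFrame({
    "order_date": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "revenue": [1250.0, 1810.5, 1640.0],
})

daily.plot(x="order_date", y="revenue", kind="bar", legend=False)
plt.ylabel("Revenue")
plt.title("Daily revenue")
plt.tight_layout()
plt.savefig("daily_revenue.png")  # image can be embedded in a report
```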
6) Data Security and Governance
Data security and governance are paramount in Big Data Platforms, especially when dealing with sensitive
information. This layer includes components for authentication, authorisation, encryption, and auditing. It
ensures that data is protected from unauthorised access and maintains compliance with regulatory
requirements.
Big Data Platforms follow a structured process to enable companies to harness data for informed decision-
making. This process involves several key steps:
a) Data Collection: This initial step systematically gathers data from various sources such as databases,
social media, and sensors. Methods like web scraping, data feeds, APIs, and data integration tools are used to
collect data, which is then stored in a central repository, often a data lake or warehouse, for easy access and
further analysis.
b) Data Storage: After collection, data must be stored efficiently for retrieval and processing. Big Data
Platforms typically use distributed storage systems like Hadoop Distributed File System (HDFS), Google
Cloud Storage, or Amazon S3. This architecture ensures high availability, fault tolerance, and scalability.
c) Data Processing: Collected data is processed to extract valuable insights through operations such as
cleaning, transforming, and aggregating. Platforms like Apache Hadoop and Apache Spark enable rapid
computations and complex data transformations (a small pandas sketch of these operations follows this list).
d) Data Analysis: This step involves examining and interpreting large data volumes to extract meaningful
insights and patterns using machine learning algorithms, data mining techniques, or visualisation tools. The
results inform data-driven decisions, optimise processes, and identify opportunities.
e) Data Quality Assurance: Ensuring data accuracy, consistency, integrity, relevance, and security is crucial.
Techniques like data quality management, lineage tracking, and cataloguing help maintain robust data
quality, giving organisations confidence in their decision-making data.
f) Data Management: This involves organising, storing, and retrieving large data volumes. Techniques such
as data backup, recovery, and archiving ensure fault tolerance and optimised data retrieval for various use
cases.
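The pandas sketch below illustrates step (c) on a toy dataset: cleaning (dropping missing keys, coercing types), transforming, and aggregating. All column names and values are invented for the example.

```python
# Toy pandas sketch of step (c): clean, transform, and aggregate raw records.
# All column names and values are invented for the example.
import pandas as pd

raw = pd.DataFrame({
    "customer": ["a", "a", "b", None, "c"],
    "amount": ["10.5", "3.0", "7.25", "1.0", "bad"],
})

# Cleaning: drop rows with a missing key, coerce types, discard bad values.
clean = raw.dropna(subset=["customer"]).copy()
clean["amount"] = pd.to_numeric(clean["amount"], errors="coerce")
clean = clean.dropna(subset=["amount"])

# Aggregating: total spend per customer.
totals = clean.groupby("customer")["amount"].sum().reset_index()
print(totals)
```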
***************************************************************************************
Benefits of Using Big Data Platforms
There are various benefits of using Big Data Platforms, which are discussed below:
a) Big Data Integration Platforms help organisations make smarter decisions by providing insights from
vast datasets, ensuring that choices depend on facts rather than guesswork.
b) These platforms streamline data storage and processing, reducing infrastructure costs and making Data
Management more affordable.
c) Big Data Platforms enable real-time Data Analysis, allowing companies to respond quickly to changing
situations and seize opportunities as they arise.
d) They help integrate data from various sources, creating a unified view of information and facilitating
comprehensive analysis.
e) With better insights, organisations can tailor their services, products and strategies to meet customer needs
more effectively, increasing satisfaction and loyalty.
f) Big Data Platforms can expand effortlessly to accommodate growing data volumes, ensuring they remain
effective as organisations evolve.
g) Those who harness Big Data gain an edge by staying ahead of the competition and providing superior
products and services.
h) These platforms spark innovation by revealing trends, gaps, and opportunities, driving the development of
new products and services.
i) Big Data Platforms offer robust security features to protect sensitive data, mitigating risks in an
increasingly complex Cyber Security landscape.
j) They improve operations across sectors, from manufacturing to healthcare, increasing efficiency and
reducing waste.
***************************************************************************************
Popular Big Data Platforms
Big Data Platforms are capable of handling massive amounts of data and turning it into valuable
information. Here, we'll introduce you to a list of those platforms:
**************************************************************************************
a) Apache Hadoop: Apache Hadoop is an excellent platform for storing and processing large volumes of
data. It's like a robust storage and data processing system that companies use to handle and manage massive
datasets.
b) Apache Spark: Apache Spark is known for its speed and efficiency in analysing data. It's like a powerful
tool that helps organisations quickly make sense of their data and extract valuable insights from it.
c) Apache Flink: Apache Flink is another data processing platform, similar to Spark, that specialises in real-
time Data Analysis. It's used for tasks where speed and low latency are critical, like monitoring online
activities or financial transactions.
d) Amazon Web Services (AWS) Big Data services: AWS offers a suite of Big Data services that run in the
cloud. These services make it easier for companies to store, process, and analyse data without the need for
extensive infrastructure management.
e) Google Cloud Platform (GCP) Big Data services: Similar to AWS, Google Cloud Platform provides a
range of Big Data services in the cloud. These services help organisations leverage Google's computing
power and data analytics capabilities.
f) Microsoft Azure Big Data services: Microsoft Azure offers various Big Data services, including data
storage, processing, and analytics tools. These services are designed to help businesses work with their data
efficiently and effectively.
https://www.studocu.com/in/document/velammal-engineering-college/big-data/evolution-of-analytic-
scalability/22574245
TOOLS
Discover the right tool for your needs and stay ahead in the competitive world of data analytics.
1. Tableau
Tableau is an easy-to-use Data Analytics tool. Its drag-and-drop interface helps users create
interactive visuals and dashboards. Organizations can use it to quickly develop visuals that give
context and meaning to raw data, making the data very easy to understand. Thanks to the simple
interface, anyone can use the tool regardless of technical ability. Furthermore, Tableau comes with a
wide range of features and tools that help you create clear, easy-to-understand visuals.
Tableau's standout advantage is its high-quality visuals embedded with interactive information. But
this doesn't mean Tableau is perfect: it is only meant for Data Visualisation, so you can't preprocess
data with it. It also has a bit of a learning curve and is known for its high cost.
Features:
• Easy Drag and Drop Interface
• Mobile support for both iOS and Android
• The Data Discovery feature allows you to find hidden data
• You can use various Data sources like SQL Server, Oracle, etc
2. Power BI
Power BI is Microsoft's Data Analysis tool. It provides enhanced Interactive Visualisation and
Business Intelligence capabilities, all while providing a simple and intuitive User Interface.
Being a product of Microsoft, you can expect seamless
integration with various Microsoft products. It allows you to connect with Excel
spreadsheets, cloud-based data sources and on-premises data sources.
Power BI is known and loved for its groundbreaking features like Natural Language
queries, Power Query Editor Support, and intuitive User Interface. But Power BI does have
its downsides. It cannot handle records bigger than 250 MB in size. Besides, it has limited sharing
capabilities, and you would need to pay extra to scale as per your needs.
Features:
• Great connectivity with Microsoft products
• Powerful Semantic Models
• Can meet both Personal and Enterprise needs
• Ability to create beautiful paginated reports
3. Apache Spark
Apache Spark is a Data Analysis tool known for its speed in Data Processing. Spark uses in-memory
processing, which makes it incredibly fast. It is also open source, which builds trust and
interoperability. The ability to handle enormous amounts of data distinguishes Spark, and it is quite
easy and straightforward to learn thanks to its API. It also has support for Distributed Computing
frameworks.
But Apache Spark does have some drawbacks. It doesn't have an integrated File Management System and
has fewer algorithms than its competitors. It also struggles when files are tiny.
Features:
• Incredible Speed and Efficiency
• Great connectivity with support of Python, Scala, R, and SQL shells
• Ability to handle and manipulate data in real-time
• Can run on many platforms like Hadoop, Kubernetes, Cloud, and also standalone
4. TensorFlow
TensorFlow is a Machine Learning library and is counted among data analysis tools. This open-source
library was developed by Google and is a popular choice for many businesses looking to add Machine
Learning capabilities to their Data Analytics workflow, as TensorFlow can build and train Machine
Learning models. TensorFlow is the first choice of many due to its wide recognition, which means an
adequate supply of tutorials, and its support for many Programming Languages. TensorFlow can also run
on GPUs and TPUs, making tasks much faster.
But TensorFlow can be very hard for beginners to use: you need coding knowledge to use it standalone,
and it has a steep learning curve. TensorFlow can also be quite tricky to install and configure,
depending on your system.
Features:
• Supports a lot of programming languages like Python, C++, JavaScript, and Java
• Can scale as needed with support for multiple CPUs, GPUs, or TPUs
• Offers a large community to solve problems and issues
• Features a built-in visualization tool for you to see how the model is performing
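As a taste of the workflow, the hedged sketch below builds, trains, and evaluates a tiny Keras model on synthetic data; the architecture, data, and task are arbitrary, chosen only to keep the example self-contained.

```python
# Hedged sketch: build, train, and evaluate a tiny Keras model on synthetic
# data. The architecture and data are arbitrary, chosen to stay self-contained.
import numpy as np
import tensorflow as tf

# Toy task: predict whether the sum of four features is positive.
X = np.random.randn(512, 4).astype("float32")
y = (X.sum(axis=1) > 0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

loss, acc = model.evaluate(X, y, verbose=0)
print(f"accuracy: {acc:.2f}")
```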
5. Hadoop
Hadoop by Apache is a Distributed Processing and Storage solution that is also used as a data
analysis tool. It is an open-source framework that stores and processes Big Data with
the help of the MapReduce Model. Hadoop is known for its scalability. It is also fault-
tolerant and can continue even after one or more nodes fail. Being Open Source, it can be
used freely and customized to suit specific needs, and Hadoop also supports various Data
Formats.
But Hadoop does have some drawbacks. Hadoop requires powerful hardware for it to run
effectively. In addition, it features a steep learning curve, making it hard for some users. This is
partly because some users find the MapReduce Model hard to grasp (the plain-Python sketch after the
feature list below illustrates the idea).
Features:
• Free to use as it is Open Source
• Can run on commodity hardware
• Built with fault-tolerance as it can operate even when some node fails
• Highly scalable with the ability to distribute data into multiple nodes
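The MapReduce model itself is easy to demonstrate without a cluster. The plain-Python sketch below mimics the three stages (map, shuffle, reduce) for a word count; real Hadoop distributes each stage across many nodes, which this toy version does not attempt.

```python
# Plain-Python illustration of the MapReduce model (not Hadoop itself):
# map each record to (key, value) pairs, shuffle by key, then reduce.
from collections import defaultdict

documents = ["big data platforms", "data analytics", "big data"]

# Map: emit (word, 1) for every word in every document.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group the emitted values by key.
groups = defaultdict(list)
for word, one in mapped:
    groups[word].append(one)

# Reduce: sum the counts for each word.
word_counts = {word: sum(ones) for word, ones in groups.items()}
print(word_counts)  # {'big': 2, 'data': 3, 'platforms': 1, 'analytics': 1}
```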
6. R
R is an open-source programming language widely used for Statistical Computing and Data Analysis, and
it can be considered a data analysis tool. It is known for handling large datasets and for its
flexibility. R's package library contains a wide variety of packages that let the user manipulate and
visualize data. Besides, R also has packages for things like Data Cleaning, Machine Learning, and
Natural Language Processing. These features make R very capable.
Despite these features, R isn’t perfect. For example, R is significantly slower than
languages like C++ and Java. Besides, R is known to have a steep learning curve,
especially if you are unfamiliar with Programming.
Features:
• Ability to handle large Datasets
• Flexibility to be used in many areas like Data Visualisation, Data Processing
• Features built-in graphics capabilities for amazing visuals
• Offers an active community to answer questions and help in problem-solving
7. Python
Python is another programming language popular for Data Analysis and Machine Learning, and it is used
extensively in data analysis tools. Python is widely recognized for its easy syntax, which makes it
easy to learn. Along with the easy syntax, Python's package ecosystem features a lot of important
packages and libraries. This makes it suitable for Data Analysis and Machine Learning. Another reason
to use Python is its scalability.
This doesn’t mean Python is flawless. It is quite slow when we compare it to languages
like Java or C++; this is because Python is an interpreted language while the others are
compiled. Besides, Python is also infamous for its high memory consumption.
Features:
• Easy to learn and user-friendly
• Scalable with the ability to handle large datasets
• Extensive packages and libraries that increase the functionality
• Open Source and widely adopted which ensures problems can be fixed easily.
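As a small demonstration of that easy syntax, the sketch below computes summary statistics using only Python's standard library; the sales figures are invented.

```python
# A taste of Python's easy syntax: summary statistics with only the
# standard library. The sales figures are invented.
import statistics

sales = [120, 135, 128, 150, 142, 138]

print("mean:  ", statistics.mean(sales))
print("median:", statistics.median(sales))
print("stdev: ", round(statistics.stdev(sales), 2))
```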
8. SAS
SAS stands for Statistical Analysis System. The SAS Software was developed by the SAS
Institute, and it is widely used for Business Analytics nowadays. SAS has both a Graphical
User Interface and a Terminal Interface. So, depending on the user’s skillsets, they can
choose either one. It also has the ability to handle large datasets. In addition, SAS is
equipped with a lot of analytical tools, which makes it suitable for a lot of applications.
Although SAS is very powerful, it has a big price tag and a steep learning curve, so it is
quite hard for beginners.
Features:
• Ability to handle large datasets
• Support for graphical and non-graphical interface
• Features tools to create high-quality visualizations
• Wide range of tools for predictive and statistical analysis
9. QlikSense
QlikSense is a business and data analysis tool that provides support for Data Visualisation and Data
Analysis. QlikSense supports various data sources, from spreadsheets and databases to cloud services.
You can create amazing dashboards and visualisations. It comes with Machine Learning features and uses
AI to help the user understand the data. Furthermore, QlikSense also has features like Instant Search
and Natural Language Processing.
But QlikSense does have some drawbacks. The data extraction of QlikSense is quite inflexible. The
pricing model is quite complicated, and it is quite sluggish when it comes to large datasets.
Features:
• Tools for stunning and interactive Data Visualisation
• Conversational AI-powered analytics with Qlik Insight Bot
• Features tools to create high-quality visualizations
• Provides Qlik Big Data Index which is a Data Indexing Engine
10. KNIME
KNIME is an analytics platform and a data analysis tool. It is open source and features an intuitive
user interface. KNIME is built for scalability and also offers extensibility via a well-defined plugin
API. You can also automate spreadsheets, do Machine Learning, and much more using KNIME. The best part
is you don't even need to code to do all this.
But KNIME does have its issues. The abundance of features can be overwhelming to some users. Also, the
Data Visualisation of KNIME is not the best and could be improved.
Features:
• Intuitive User Interface with drag and drop function
• Support for extensive analytics tools like Machine Learning, Data Mining, Big Data
Processing
• Provides tools to create high-quality visualizations
**************************************************************************************
Steps for Data Analysis Process
1. Define the Problem or Research Question
2. Collect Data
3. Data Cleaning
4. Analyzing the Data
5. Data Visualization
6. Presenting Data
A compact code walk-through of these steps follows below.
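As promised above, here is a compact, hedged walk-through of the six steps on synthetic data, using pandas and matplotlib; every value and column name is made up for illustration.

```python
# Compact, hedged walk-through of the six steps on synthetic data.
# Assumes pandas and matplotlib; every value and column name is made up.
import pandas as pd
import matplotlib.pyplot as plt

# 1. Define the question: which product category sells best?
# 2. Collect data: here, an in-memory stand-in for a real source.
df = pd.DataFrame({
    "category": ["toys", "toys", "books", "books", None],
    "units": [5, 7, 3, "4", 2],
})

# 3. Clean: drop missing categories, coerce numeric types.
df = df.dropna(subset=["category"])
df["units"] = pd.to_numeric(df["units"])

# 4. Analyze: aggregate units sold per category.
summary = df.groupby("category")["units"].sum().sort_values(ascending=False)

# 5./6. Visualize and present.
summary.plot(kind="bar")
plt.title("Units sold by category")
plt.tight_layout()
plt.savefig("units_by_category.png")
print(summary)
```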
***************************************************************************************
Analytics is the technique of examining data and reports to obtain actionable insights
that can be used to comprehend and improve business performance. With analytics, business
users may gain insights from data, recognize trends, and make better decisions.
On the one hand, analytics is about finding value or making new data to help you
decide. This can be performed either manually or automatically. Next-generation
analytics uses new technologies like AI or machine learning to make predictions about
the future based on past and present data.
Knowing the differences between analytics and reporting can significantly benefit your
business. If you want to use both to their full potential and not miss out on essential
parts of either one, knowing the difference between the two is important. Some key
differences are:
Analytics: the method of examining and analyzing summarized data to make business decisions.
Reporting: an action that includes all the needed information and data, put together in an organized way.
Analytics examines report data to determine why organizational problems occur and how to fix them.
Analysts begin by asking questions that may arise as they
examine how the data in the reports has been structured. A qualified analyst can make
recommendations to improve business performance once the data analysis is
complete.
Data Analytics and Data Analysis are related yet distinct processes within data
interpretation. Data Analysis focuses on dissecting information from raw data,
identifying trends, patterns, and relationships. It involves cleaning, organizing, and
summarizing data to extract insights. On the other hand, Data Analytics goes beyond
surface-level exploration. It uses advanced techniques to model, predict, and prescribe
outcomes based on historical data, enabling businesses to make informed decisions
for the future.
Analytics and reporting go hand in hand, and you can’t have one without the other.
The raw data is the first step in the whole process. The data then needs to be put
together into accurate, readable information. Reports can be comprehensive and
employ a range of technologies. Still, their main objective is always to make it simpler
for analysts to understand what is actually happening within the organization.
********************************************************************************
Data analytics finds applications across various industries and sectors, transforming
the way organizations operate and make decisions. Here are some examples of how
data analytics is applied in different domains:
Healthcare
In healthcare, data analytics helps providers improve patient outcomes, predict disease risk, and
optimize hospital operations by analyzing patient records, treatment results, and operational data.
Finance
In the financial sector, data analytics plays a crucial role in fraud detection, risk
assessment, and investment strategies. Banks and financial institutions analyze large
volumes of data to identify suspicious transactions, predict creditworthiness, and
optimize investment portfolios. Data analytics also enables personalized financial
advice and the development of creative financial products and services.
E-commerce
E-commerce companies use data analytics to recommend products, personalize the shopping experience,
and optimize pricing by analyzing browsing behavior, purchase history, and customer reviews.
Cybersecurity
Data analytics plays a vital role in cybersecurity by detecting and preventing cyber
threats and attacks. Security systems analyze network traffic, user behavior, and
system logs to identify anomalies and potential security breaches. By leveraging data
analytics, organizations can proactively strengthen their security measures, detect and
respond to threats in real-time, and safeguard sensitive information.
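One simple (and deliberately naive) anomaly-detection idea is to flag values far from the mean. The sketch below applies a z-score rule to invented transaction amounts; production systems use far richer features and models.

```python
# Deliberately naive anomaly-detection sketch: flag transaction amounts far
# from the mean with a z-score rule. Real systems use much richer models.
import statistics

amounts = [42.0, 39.5, 41.2, 40.8, 38.9, 420.0, 41.7]  # one suspicious spike

mean = statistics.mean(amounts)
stdev = statistics.stdev(amounts)

for amount in amounts:
    z = (amount - mean) / stdev
    if abs(z) > 2:  # more than two standard deviations from the mean
        print(f"possible anomaly: {amount} (z = {z:.1f})")
```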
Banking
Banks use data analytics to gain insights into customer behavior, manage risks, and
personalize financial services. Banks can tailor their offerings, identify potential fraud,
and assess creditworthiness by analyzing transaction data, customer demographics,
and credit histories. Data analytics also helps banks detect money laundering
activities and improve regulatory compliance.
Logistics
In the logistics industry, data analytics plays a crucial role in optimizing transportation
routes, managing fleet operations, and improving overall supply chain efficiency.
Logistics companies can minimize costs, reduce delivery times, and enhance customer
satisfaction by analyzing data on routes, delivery times, and vehicle performance.
Data analytics also enables better demand forecasting and inventory management.
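As a toy illustration of demand forecasting, the sketch below produces a naive next-day forecast from a rolling average in pandas; the demand series is invented, and real forecasting models are considerably more sophisticated.

```python
# Toy illustration of demand forecasting: a naive next-day forecast from a
# three-day rolling average in pandas. The demand series is invented.
import pandas as pd

demand = pd.Series([100, 120, 110, 130, 125, 140],
                   index=pd.date_range("2024-01-01", periods=6, freq="D"))

# Forecast tomorrow as the mean of the last three observed days.
forecast = demand.rolling(window=3).mean().iloc[-1]
print(f"next-day forecast: {forecast:.1f}")  # (130 + 125 + 140) / 3
```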
Retail
Data analytics transforms the retail industry by providing insights into customer
preferences, optimizing pricing strategies, and improving inventory management.
Retailers analyze sales data, customer feedback, and market trends to identify popular
products, personalize offers, and forecast demand. Data analytics also helps retailers
enhance their marketing efforts, improve customer loyalty, and optimize store layouts.
Manufacturing
Manufacturers apply data analytics to predictive maintenance, quality control, and production
optimization by analyzing sensor data from equipment and production lines.
Internet Searching
Data analytics powers internet search engines, enabling users to find relevant
information quickly and accurately. Search engines analyze vast amounts of data,
including web pages, user queries, and click-through rates, to deliver the most
relevant search results. Data analytics algorithms continuously learn and adapt to user
behavior, providing increasingly accurate and personalized search results.
Risk Management
Data analytics plays a crucial role in risk management across various industries,
including insurance, finance, and project management. Organizations can assess risks,
develop mitigation strategies, and make informed decisions by analyzing historical
data, market trends, and external factors. Data analytics helps organizations identify
potential risks, quantify their impact, and implement risk mitigation measures.
**************************************************************************************
The Data analytic lifecycle is designed for Big Data problems and data science projects.
The cycle is iterative, to represent a real project. To address the distinct requirements for
performing analysis on Big Data, a step-by-step methodology is needed to organize the
activities and tasks involved with acquiring, processing, analyzing, and repurposing data.
•Phase 1: Discovery –
•The data science team learns and investigates the problem.
•Develop context and understanding.
•Identify the data sources needed and available for the project.
•The team formulates the initial hypothesis that can be later tested with data.
•Phase 2: Data Preparation –
•Steps to explore, preprocess, and condition data before modeling and analysis.
•It requires the presence of an analytic sandbox; the team extracts, loads, and transforms data to get
it into the sandbox.
•Data preparation tasks are likely to be performed multiple times and not in predefined
order.
•Several tools commonly used for this phase are – Hadoop, Alpine Miner, Open Refine, etc.
•Phase 3: Model Planning –
•The team explores data to learn about the relationships between variables and subsequently selects
key variables and the most suitable models.
•The team also decides on the data sets to be used for training and testing the models in the next
phase.
•Several tools commonly used for this phase are – R, SQL Analysis services, and SAS/ACCESS.
•Phase 4: Model Building –
•The team develops datasets for testing, training, and production purposes.
•The team also considers whether its existing tools will suffice for running the models or if it needs
a more robust environment for executing them.
•Free or open-source tools – R and PL/R, Octave, WEKA.
•Commercial tools – Matlab and STATISTICA.
•(A small scikit-learn sketch of Phases 3-5 follows this list.)
•Phase 5: Communicate Results –
•After executing the model, the team needs to compare the outcomes of modeling to the criteria
established for success and failure.
•The team considers how best to articulate findings and outcomes to various team members and
stakeholders, taking into account warnings and assumptions.
•The team should identify key findings, quantify the business value, and develop a narrative to
summarize and convey findings to stakeholders.
•Phase 6: Operationalize –
•The team communicates the benefits of the project more broadly and sets up a pilot project to deploy
the work in a controlled way before broadening it to a full enterprise of users.
•This approach enables the team to learn about the performance and related constraints of the model in
a production environment on a small scale and make adjustments before full deployment.
•The team delivers final reports, briefings, and code.
•Free or open-source tools – Octave, WEKA, SQL, MADlib.
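To ground Phases 3-5, here is a hedged scikit-learn sketch that splits prepared data into training and testing sets, builds a model, and compares the outcome to a success criterion; the dataset is synthetic and the 0.80 threshold is an invented example of "criteria established for success".

```python
# Hedged scikit-learn sketch of Phases 3-5: split prepared data into training
# and testing sets, build a model, and compare the outcome to a success
# criterion. The data is synthetic and the 0.80 threshold is invented.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in for conditioned data from the analytic sandbox (Phase 2).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Phase 3/4: training and testing sets, then model building.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Phase 5: compare modeling outcomes to the success criterion.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.2f} (success criterion: >= 0.80)")
```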
**************************************************************************************