Big Data Analytics Unit Test-I Answers Bank


Big Data Analytics

Unit test Question bank

Q1) What is structured data? Explain the sources of structured data.


Ans: Structured data is data which conforms to a data model, has a well-defined structure,
follows a consistent order, and can be easily accessed and used by a person or a computer
program.
Structured data is usually stored in well-defined schemas such as databases. It is generally
tabular, with columns and rows that clearly define its attributes.
SQL (Structured Query Language) is often used to manage structured data stored in databases.
Characteristics of Structured Data:
• Data conforms to a data model and has an easily identifiable structure
• Data is stored in the form of rows and columns (example: a database table)
• Data is well organised, so the definition, format and meaning of the data are explicitly known
• Data resides in fixed fields within a record or file
• Similar entities are grouped together to form relations or classes
• Entities in the same group have the same attributes
• Easy to access and query, so the data can be easily used by other programs
• Data elements are addressable, so it is efficient to analyse and process them
Sources of Structured Data:
• SQL Databases
• Spreadsheets such as Excel
• OLTP Systems
• Online forms
• Sensors such as GPS or RFID tags
• Network and Web server logs
• Medical devices
Advantages of Structured Data:
• Structured data has a well-defined structure, which makes storage and access easy
• Data can be indexed on text strings as well as attributes, which makes search operations hassle-free
• Data mining is easy, i.e. knowledge can be easily extracted from the data
• Operations such as updating and deleting are easy due to the well-structured form of the data
• Business intelligence operations such as data warehousing can be easily undertaken
• Easily scalable when the volume of data grows
• Securing the data is easy
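The points above can be illustrated with a minimal sketch: structured data held in a relational table with a fixed schema, then queried with SQL. The table and column names (`employees`, `name`, `salary`) and the values are invented purely for illustration.

```python
import sqlite3

# Structured data: a fixed schema with typed columns (rows and columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees (id, name, salary) VALUES (?, ?, ?)",
    [(1, "Asha", 52000.0), (2, "Ravi", 61000.0)],
)

# Because every field is addressable, querying by attribute is straightforward.
rows = conn.execute("SELECT name FROM employees WHERE salary > 55000").fetchall()
print(rows)  # [('Ravi',)]
```

Because the schema defines every field up front, the same query works unchanged on any number of rows, which is exactly what makes structured data easy to index, search and process.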

Q2) What is unstructured data? Explain the sources of unstructured data.


Unstructured data is data which does not conform to a data model and has no easily
identifiable structure, so it cannot be used by a computer program easily. Unstructured
data is not organised in a pre-defined manner and does not have a pre-defined data model, so it
is not a good fit for a mainstream relational database.
Characteristics of Unstructured Data:
• Data neither conforms to a data model nor has any structure
• Data cannot be stored in the form of rows and columns as in databases
• Data does not follow any semantics or rules
• Data lacks any particular format or sequence
• Data has no easily identifiable structure
• Due to the lack of identifiable structure, it cannot be used by computer programs easily
Sources of Unstructured Data:
• Web pages
• Images (JPEG, GIF, PNG, etc.)
• Videos
• Memos
• Reports
• Word documents and PowerPoint presentations
• Surveys
Advantages of Unstructured Data:
• It supports data which lacks a proper format or sequence
• The data is not constrained by a fixed schema
• Very flexible due to the absence of a schema
• Data is portable
• It is very scalable
• It can deal easily with heterogeneous sources
• This type of data has a variety of business intelligence and analytics applications
Disadvantages of Unstructured Data:
• It is difficult to store and manage unstructured data due to the lack of schema and structure
• Indexing the data is difficult and error-prone because the structure is unclear and there are no pre-defined attributes; as a result, search results are not very accurate
• Securing the data is a difficult task
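As a small illustration of why the lack of fields matters, the sketch below runs an ad-hoc keyword count over a free-text memo. With no schema or addressable columns, every "query" has to be invented per task; the memo text and the keyword are made up for the example.

```python
import re

# Unstructured data: free text with no fields, schema, or pre-defined attributes.
memo = "Meeting moved to Friday. Please review the Friday sales report before the meeting."

# There is no column to query, so we fall back on ad-hoc text processing:
words = re.findall(r"[a-z]+", memo.lower())
count = words.count("friday")
print(count)  # 2
```

Contrast this with the structured case: here nothing tells the program which words matter or what they mean, which is why indexing and searching unstructured data is error-prone.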

Q3) Define big data and explain the different challenges of big data.
Big Data is a collection of data that is huge in volume, yet grows exponentially with time. It is
data of such large size and complexity that no traditional data management tool can store
or process it efficiently. In short, big data is simply data of very large size.
OR
Big data is a term for data sets that are so large or complex that traditional data processing
application software is inadequate to deal with them.

Some of the Big Data challenges are:


1. Sharing and Accessing Data:
-Perhaps the most frequent challenge in big data efforts is the inaccessibility of data sets from
external sources.
-Sharing data can cause substantial challenges, including the need for inter- and intra-institutional
legal documents.
-Accessing data from public repositories leads to multiple difficulties.
-The data must be available in an accurate, complete and timely manner: if the data in a
company's information system is to be used to make accurate decisions in time, it has to be
available in this form.
2. Privacy and Security:
-This is another important challenge of big data. It has sensitive, conceptual, technical as well
as legal significance.
-Most organizations are unable to maintain regular checks due to the large amounts of data
generated. However, security checks and observation should be performed in real time,
because that is when they are most beneficial.
-Some information about a person, when combined with large external data sets, may reveal
facts that the person considers private and would not want others to know.
-Some organizations collect information about people in order to add value to their business,
by deriving insights into people's lives that they themselves are unaware of.
3. Analytical Challenges:
-Big data raises some major analytical questions: how to deal with the problem if the data
volume gets too large? How to find the important data points? How to use the data to best
advantage?
-The large amounts of data on which this analysis is done can be structured (organized data),
semi-structured (semi-organized data) or unstructured (unorganized data).
There are two approaches through which decision making can be done:
--Either incorporate massive data volumes in the analysis,
--Or determine upfront which big data is relevant.
4. Technical challenges:
Quality of data:
-Collecting and storing a large amount of data comes at a cost. Big companies, business
leaders and IT leaders always want large data storage.
-For better results and conclusions, big data focuses on storing quality data rather than
irrelevant data.
-This raises further questions: how can it be ensured that the data is relevant, how much data
is enough for decision making, and whether the stored data is accurate or not.
Fault tolerance:
-Fault tolerance is another technical challenge; fault-tolerant computing is extremely hard,
involving intricate algorithms.
-New technologies such as cloud computing and big data are intended to ensure that whenever
a failure occurs, the damage stays within an acceptable threshold, i.e. the whole task does not
have to begin from scratch.
Scalability:
-Big data projects can grow and evolve rapidly. The scalability issue of big data has led
towards cloud computing.
-This leads to challenges such as how to run and execute the various jobs so that the goal of
each workload is achieved cost-effectively.
-It also requires dealing with system failures in an efficient manner, which again raises the big
question of what kinds of storage devices should be used.

Q4) What are analytics 1.0, 2.0 and 3.0?


ANS: The evolution of analytics
Analytics is about how data is used and what kind of value it can generate. Analytics can be
put into three different categories/sections named analytics 1.0, analytics 2.0 and analytics 3.0 [1].

Analytics 1.0 - the era of Business Intelligence (BI)


In this era the approach is to be data-informed. Data is used primarily for history writing and
reporting purposes. The most common example is monthly financial reports summarizing what
happened in the last month. Often Excel sheets are used to present and deliver the information.

This was the era of the Enterprise Data Warehouse, used to capture information, and of Business
Intelligence software, used to present and report it.

Statements:

• Decisions were based primarily on experience and intuition
• Data sources were relatively small and from internal systems
• Most of the time was spent gathering data, not putting it to use
• Data was not used as a strategic asset in the decision process

Analytics 2.0 - the era of Big Data


In this era the amount of data is growing, and the sources are shifting from purely internal to
a combination of internal and external sources. The number of data sources puts new requirements
on how to process all the information, increasing the need to use both internal and external
processing capacity to handle it at the speed needed. A lot of the data that appeared in this era
was unstructured, and it required new technologies to put it to proper use, such as machine
learning (ML), computer vision systems and artificial intelligence (AI). The main focus in
this era is to start using data to predict what will happen in the future, instead of only using
data for "history writing".
Working with data in this new way requires a new type of employee, and data analysts or
data scientists appear in companies to handle this new business area.

Statements:

• Complex, large, unstructured data sources
• New analytical and computational capabilities
• Data stored and analyzed in public or cloud computing environments
• Machine learning methods increase the speed of analysis
• Visual analytics offers predictive and prescriptive techniques
• Online companies start to build businesses on data
Analytics 3.0 - the era of Data-enriched offerings
This era is characterized by the fact that all businesses can create data-based services and products.
Data is not merely supplied, but used for decision making both on the supplier side and on the
customer side. Another term often used is that a company is "data centric". Data is quite often
embedded in production and decision-making processes, which makes it much harder for managers
to "avoid" using data. Data and analytics will be all around in all processes in companies.

Statements:

• Create data- and analytics-based products
• Not just supplying data - helping/guiding customers in decision making
• Rapid and agile insight delivery
• All businesses can create data-based products and services
• Focus will shift from software development to data analysis
• Heavy reliance on machine learning
• Strict structures in place to communicate data science findings to decision makers
• High speed and agility needed

Q5) What is big data? Explain a few characteristics.


ANS: Big Data is a collection of data that is huge in volume, yet grows exponentially with
time. It is data of such large size and complexity that no traditional data management tool
can store or process it efficiently. In short, big data is simply data of very large size.
Characteristics Of Big Data
Big data can be described by the following characteristics:

• Volume
• Variety
• Velocity
• Variability

(i) Volume – The name Big Data itself is related to a size which is enormous. The size of data
plays a very crucial role in determining its value. Also, whether particular data can actually be
considered Big Data or not depends on the volume of data. Hence, 'Volume' is one
characteristic which needs to be considered while dealing with Big Data solutions.

(ii) Variety – The next aspect of Big Data is its variety.

Variety refers to heterogeneous sources and the nature of data, both structured and unstructured.
In earlier days, spreadsheets and databases were the only sources of data considered by
most applications. Nowadays, data in the form of emails, photos, videos, monitoring
devices, PDFs, audio, etc. is also considered in analysis applications. This variety of
unstructured data poses certain issues for storing, mining and analyzing the data.
(iii) Velocity – The term 'velocity' refers to the speed of generation of data. How fast the data is
generated and processed to meet demand determines the real potential of the data.

Big Data velocity deals with the speed at which data flows in from sources like business
processes, application logs, networks, social media sites, sensors, mobile devices, etc. The
flow of data is massive and continuous.

(iv) Variability – This refers to the inconsistency which can be shown by the data at times, thus
hampering the ability to handle and manage the data effectively.

Q6) What is Big Data analytics? Explain the classification of Big Data analytics.


Big Data analytics is the process of examining large data sets to extract meaningful insights,
such as hidden patterns, unknown correlations, market trends, and customer preferences. It
provides various advantages: it can be used for better decision making, preventing fraudulent
activities, and more. Big Data analytics is commonly classified into four types: descriptive,
diagnostic, predictive and prescriptive analytics.
Q7) What is data science? Explain the responsibilities of a data scientist.
ANS: Data science is an interdisciplinary field of scientific methods, processes, algorithms and
systems to extract knowledge or insights from data in various forms, either structured or
unstructured, similar to data mining.
Roles & Responsibilities of a Data Scientist
• Management: The Data Scientist plays an insignificant managerial role where he
supports the construction of the base of futuristic and technical abilities within the
Data and Analytics field in order to assist various planned and continuing data
analytics projects.
• Analytics: The Data Scientist represents a scientific role where he plans, implements,
and assesses high-level statistical models and strategies for application in the
business’s most complex issues. The Data Scientist develops econometric and
statistical models for various problems including projections, classification,
clustering, pattern analysis, sampling, simulations, and so forth.
• Strategy/Design: The Data Scientist performs a vital role in developing innovative
strategies to understand the business's consumer trends and management, as well as
ways to solve difficult business problems, for instance the optimization of product
fulfillment and overall profit.
• Collaboration: The role of the Data Scientist is not a solitary one; in this position,
he collaborates with senior data scientists to communicate obstacles and findings
to relevant stakeholders in an effort to drive business performance and
decision-making.
• Knowledge: The Data Scientist also takes the lead in exploring different
technologies and tools with the vision of creating innovative data-driven insights for
the business at the most agile pace feasible. In this role, the Data Scientist also takes
the initiative in assessing and adopting new and improved data science methods for
the business, which he delivers to senior management for approval.
• Other Duties: A Data Scientist also performs related tasks and duties as assigned by
the Senior Data Scientist, Head of Data Science, Chief Data Officer, or the employer.
Q8) Write the differences between NoSQL and RDBMS

1) RDBMS is a relational system that stores data in a structured, tabular manner; NoSQL is a
non-relational database system that can store data in an unstructured manner.
2) RDBMS supports multiple users; NoSQL also supports multiple users.
3) RDBMS uses tabular structures to store data, where the table headers are the column names
and the rows contain the corresponding values; NoSQL stores data in structured,
semi-structured and unstructured forms.
4) RDBMS databases are harder to construct and obey ACID (Atomicity, Consistency,
Isolation, Durability), which helps create a consistent database; NoSQL may or may not
support ACID.
5) RDBMS supports normalization and joining of tables; NoSQL does not use a table form, so
it does not support normalization.
6) The relational database supports integrity constraints at the schema level, so data values
beyond a defined range cannot be stored in a column; NoSQL databases generally do not
enforce schema-level integrity constraints.
7) Open-source implementations exist for both RDBMS and NoSQL.
8) RDBMS was developed in the 1970s to deal with the issues of flat-file storage; NoSQL was
developed in the late 2000s to overcome the issues and limitations of SQL databases.
9) RDBMS supports distributed databases; NoSQL also supports distributed databases.
10) RDBMS deals with large quantities of data; NoSQL is designed mainly for big data and
real-time web applications.
11) RDBMS programs support client-server architecture; NoSQL storage systems support
multi-server as well as client-server architectures.
12) In RDBMS, data is related with the help of foreign keys; in NoSQL, related data can be
stored together in a single document file.
13) RDBMS may require expensive software and specialized database hardware (Oracle
Exadata, etc.); NoSQL runs on commodity hardware.
14) In RDBMS, data fetching is rapid because of the relational approach; in NoSQL, data
fetching is easy and flexible.
15) Examples of RDBMS: MySQL, Oracle, SQL Server, etc.; examples of NoSQL: Apache
HBase, IBM Domino, Oracle NoSQL Database, etc.
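The row-versus-document distinction can be made concrete with a small sketch: the same order is stored first as relational rows linked by a foreign key, then as a single self-contained JSON document (the style NoSQL document stores use). The table names, field names and values here are all invented for illustration.

```python
import json
import sqlite3

# Relational style: the order and its items live in separate tables,
# linked by a foreign key, and are combined with a JOIN.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
db.execute("CREATE TABLE items (order_id INTEGER, product TEXT, qty INTEGER)")
db.execute("INSERT INTO orders VALUES (1, 'Vishal')")
db.executemany("INSERT INTO items VALUES (?, ?, ?)", [(1, "pen", 3), (1, "notebook", 1)])
joined = db.execute(
    "SELECT o.customer, i.product FROM orders o JOIN items i ON o.id = i.order_id"
).fetchall()

# Document style: the whole order is one self-contained document, no join needed.
doc = json.dumps({
    "id": 1,
    "customer": "Vishal",
    "items": [{"product": "pen", "qty": 3}, {"product": "notebook", "qty": 1}],
})
order = json.loads(doc)
products = [item["product"] for item in order["items"]]
print(joined)
print(products)  # ['pen', 'notebook']
```

The relational form enforces the schema and supports joins and constraints; the document form keeps related data together in one file-like unit, which is why document stores scale easily on commodity hardware but give up normalization.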
Q9) What is semi-structured data? Explain the sources of semi-structured data.
ANS: Semi-structured data is data that does not conform to a data model but has some
structure. It lacks a fixed or rigid schema. It is data that does not reside in a relational database
but has some organizational properties that make it easier to analyze. With some processing,
it can be stored in a relational database.
Characteristics of Semi-structured Data:
• Data does not conform to a data model but has some structure
• Data cannot be stored in the form of rows and columns as in databases
• Semi-structured data contains tags and elements (metadata) which are used to group data and describe how the data is stored
• Similar entities are grouped together and organized in a hierarchy
• Entities in the same group may or may not have the same attributes or properties
• It does not contain sufficient metadata, which makes automation and management of the data difficult
• The size and type of the same attribute in a group may differ
• Due to the lack of a well-defined structure, it cannot be used by computer programs easily
Sources of semi-structured Data:
• E-mails
• XML and other markup languages
• Binary executables
• TCP/IP packets
• Zipped files
• Integration of data from different sources
• Web pages
Advantages of Semi-structured Data:
• The data is not constrained by a fixed schema
• Flexible, i.e. the schema can be easily changed
• Data is portable
• It is possible to view structured data as semi-structured data
• It supports users who cannot express their needs in SQL
Disadvantages of Semi-structured Data:
• The lack of a fixed, rigid schema makes the data difficult to store
• Interpreting the relationships between data is difficult, as there is no separation of the schema and the data
• Queries are less efficient than on structured data
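The characteristics above show up clearly in a small JSON sketch: the records share a hierarchy and self-describing tags (keys), yet entities in the same group need not have the same attributes. The field names and values are made up for the example.

```python
import json

# Semi-structured data: keys (tags) describe the values, but there is no fixed schema.
raw = """
[
  {"name": "Asha", "email": "asha@example.com", "age": 29},
  {"name": "Ravi", "email": "ravi@example.com", "phones": ["123", "456"]}
]
"""
people = json.loads(raw)

# Entities in the same group may or may not share attributes:
keys = [sorted(p.keys()) for p in people]
print(keys)  # [['age', 'email', 'name'], ['email', 'name', 'phones']]
```

Each record carries its own metadata, so the data is portable and flexible; the price is that a program cannot assume any field exists, which is why queries over semi-structured data are less efficient than over a fixed schema.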

Q10) Explain the three 'V's of Big Data.


1. Volume:
• The name 'Big Data' itself is related to a size which is enormous.
• Volume refers to a huge amount of data.
• The size of data plays a very crucial role in determining its value. If the volume of data is very large, it can actually be considered 'Big Data'; that is, whether particular data counts as Big Data or not depends on its volume.
• Hence, while dealing with Big Data it is necessary to consider the characteristic 'Volume'.
2. Velocity:
• Velocity refers to the high speed of accumulation of data.
• In Big Data, velocity means data flows in from sources like machines, networks, social media, mobile phones, etc.
• There is a massive and continuous flow of data. How fast the data is generated and processed to meet demand determines its real potential.
• Sampling data can help in dealing with issues like velocity.
3. Variety:
• It refers to the nature of data: structured, semi-structured and unstructured.
• It also refers to heterogeneous sources.
• Variety is basically the arrival of data from new sources, both inside and outside an enterprise. It can be structured, semi-structured or unstructured.
4. Value:
• After taking the three Vs into account, there comes one more V, which stands for Value. A bulk of data having no value is of no good to the company unless it is turned into something useful.
• Data in itself is of no use or importance; it needs to be converted into something valuable in order to extract information. Hence, Value can be considered the most important of all the Vs.

Q11) Explain big data analytics.

Q12) Explain other characteristics of data which are not definitional traits of big data.

Q13) Explain traditional business intelligence versus big data.

Q14) How is the traditional BI environment different from the big data environment?

Q15) Share your experience as a customer on an e-commerce site. Comment on the big data
that gets created on a typical e-commerce site.

Q16) Define: 1. Descriptive analytics 2. Diagnostic analytics 3. Predictive analytics 4. Prescriptive analytics
