Big-Data-Analytics Notes For Ug
Unit Structure
2.0 Objectives
2.1 Introduction to big data analytics
2.2 Classification of Analytics
2.3 Challenges of Big Data
2.4 Importance of Big Data
2.5 Big Data Technologies
2.6 Data Science
2.7 Responsibilities
2.8 Soft state eventual consistency
2.9 Data Analytics Life Cycle
Summary
Review Questions
2.0 OBJECTIVES
3. About gaining a competitive edge over your competitors by enabling you
with findings that allow quicker and better decision-making.
5. Working with datasets whose volume and variety exceed the current
storage and processing capabilities and infrastructure of your
enterprise.
About moving code to data. This makes perfect sense as the program for
distributed processing is tiny (just a few KBs) compared to the data
(Terabytes or Petabytes today and likely to be Exabytes or Zettabytes in
the near future).
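The idea of moving code to data can be sketched in a few lines. This is a minimal illustration, not a real distributed framework: the three lists stand in for data blocks held on different nodes, and `run_on_all_nodes` is a hypothetical stand-in for a scheduler that ships a small function to each node.

```python
# Hypothetical illustration: three in-memory lists stand in for data blocks
# stored on three different cluster nodes.
partitions = [
    [4, 8, 15],
    [16, 23],
    [42, 7, 1, 9],
]

def local_summary(block):
    """Runs 'on the node' that holds the block and returns a tiny result."""
    return (sum(block), len(block))

def run_on_all_nodes(func, blocks):
    """Assumed stand-in for a scheduler that ships func to every node."""
    return [func(block) for block in blocks]

# Only the small (sum, count) pairs travel back, never the blocks themselves.
partials = run_on_all_nodes(local_summary, partitions)
total = sum(s for s, _ in partials)
count = sum(n for _, n in partials)
mean = total / count
print(partials, mean)
```

The function that travels is a few lines long, while each block could in practice be many gigabytes; only the per-block summaries are sent back and combined.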
Basic analytics: This primarily is slicing and dicing of data to help with
basic business insights. This is about reporting on historical data, basic
visualization, etc.
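As a rough sketch of what slicing and dicing means in practice, the example below filters and aggregates a handful of historical sales records. The records, field order, and variable names are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical historical sales records: (region, product, quarter, revenue).
sales = [
    ("North", "Laptop", "Q1", 1200),
    ("North", "Phone", "Q1", 800),
    ("South", "Laptop", "Q1", 950),
    ("South", "Phone", "Q2", 700),
    ("North", "Laptop", "Q2", 1100),
]

# Slice: fix a single dimension (quarter == "Q1").
q1_slice = [row for row in sales if row[2] == "Q1"]

# Dice: restrict several dimensions at once (region and product).
north_laptops = [row for row in sales if row[0] == "North" and row[1] == "Laptop"]

# Basic report: total revenue per region.
revenue_by_region = defaultdict(int)
for region, _product, _quarter, revenue in sales:
    revenue_by_region[region] += revenue

print(dict(revenue_by_region))
```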
Operationalized analytics: Analytics becomes operationalized when it gets
woven into the enterprise's business processes.
Advanced analytics: This largely is about forecasting for the future by
way of predictive and prescriptive modelling.
Monetized analytics: This is analytics in use to derive direct business
revenue.
Analytics 1.0: Relational databases.
Analytics 2.0: Database appliances, Hadoop clusters, SQL to Hadoop environments, etc.
Analytics 3.0: In-memory analytics, in-database processing, agile analytical methods, machine learning techniques, etc.
Table 2.1 Analytics 1.0, 2.0 and 3.0 (Big Data and Analytics)
Security: Most of the NoSQL big data platforms have poor security
mechanisms (lack of proper authentication and authorization
mechanisms) when it comes to safeguarding big data. This is a weak spot
that cannot be ignored, given that big data carries credit card
information, personal information and other sensitive data.
activities, global economic impacts, sensor logs, social media analytics,
customer churn, collaborative filtering (predicting users' interests),
regression analysis, etc. Data science is multi-disciplinary. Refer to
Figure 2.2.
To play the role well, a data scientist should have the following
abilities:
• Understanding of domain
• Business strategy
• Problem solving
• Communication
• Presentation
• Keenness
2.6.3 Mathematics Expertise:
The following are the key skills that a data scientist needs in order to
comprehend, interpret and analyze data.
• Mathematics.
• Statistics.
• Artificial Intelligence (AI).
• Algorithms.
• Machine learning.
• Pattern recognition.
• Natural Language Processing.
To sum it up, the data science process is:
• Collecting raw data from multiple different data sources.
• Processing the data.
• Integrating the data and preparing clean datasets.
• Engaging in exploratory data analysis using models and algorithms.
• Preparing presentations using data visualizations.
• Communicating the findings to all stakeholders.
• Making faster and better decisions.
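The steps of the data science process can be sketched as a small pipeline. This is only an illustrative outline under stated assumptions: the two sources, the field names (`customer`, `spend`), and every function name below are hypothetical, and a text summary stands in for a real visualization.

```python
from statistics import mean

# Hypothetical raw data from two sources (for example a CRM export and a web log).
source_a = [{"customer": "c1", "spend": "120.5"}, {"customer": "c2", "spend": ""}]
source_b = [{"customer": "c3", "spend": "80"}, {"customer": "c1", "spend": "120.5"}]

def collect(*sources):
    """Collect raw records from multiple sources."""
    return [rec for src in sources for rec in src]

def process(records):
    """Convert text to numbers; mark missing values as None."""
    return [
        {"customer": r["customer"], "spend": float(r["spend"]) if r["spend"] else None}
        for r in records
    ]

def integrate(records):
    """Drop missing values and duplicate customers to get a clean dataset."""
    clean = {}
    for r in records:
        if r["spend"] is not None:
            clean[r["customer"]] = r["spend"]
    return clean

def explore(dataset):
    """A very small exploratory summary."""
    values = list(dataset.values())
    return {"customers": len(values), "mean_spend": mean(values), "max_spend": max(values)}

def present(summary):
    """Plain-text stand-in for a data visualization."""
    return ", ".join(f"{key}={value}" for key, value in summary.items())

dataset = integrate(process(collect(source_a, source_b)))
summary = explore(dataset)
report = present(summary)
print(report)
```

Each function maps to one bullet in the list; communicating the findings and making decisions remain human steps that sit outside the code.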
2.7 RESPONSIBILITIES
Figure 2.3 Data scientist: your new best friend!!!
(Data Science & Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data)
Soft state: The state of the system could change over time, so even
during times without input there may be changes going on due to
‘eventual consistency,’ thus the state of the system is always ‘soft.’
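A toy simulation can make soft state concrete. The sketch below assumes a hypothetical three-replica key-value store in which each replica keeps a (version, value) pair and a background round copies the highest version everywhere; real systems use more elaborate reconciliation, so treat this only as an illustration.

```python
# Hypothetical three-replica key-value store; each replica holds (version, value).
replicas = [
    {"x": (1, "old")},
    {"x": (1, "old")},
    {"x": (1, "old")},
]

def write(replica, key, value):
    """Accept a write on a single replica and bump its version."""
    version, _ = replica.get(key, (0, None))
    replica[key] = (version + 1, value)

def anti_entropy_round(nodes, key):
    """Background sync: every replica adopts the highest-versioned value."""
    newest = max(node[key] for node in nodes)
    for node in nodes:
        node[key] = newest

write(replicas[0], "x", "new")        # the only client input
before = [node["x"][1] for node in replicas]

anti_entropy_round(replicas, "x")     # no client input happens here
after = [node["x"][1] for node in replicas]

print(before, after)
```

Between `before` and `after` no client wrote anything, yet two replicas changed state. That background change is what makes the state 'soft', and the replicas agreeing afterwards is the 'eventual consistency'.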
Phase 4-Model building: In Phase 4, the team develops data sets for
testing, training, and production purposes. In addition, in this phase the
team builds and executes models based on the work done in the model
planning phase. The team also considers whether its existing tools will
suffice for running the models, or if it will need a more robust
environment for executing models and workflows (for example, fast
hardware and parallel processing, if applicable).
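A minimal sketch of Phase 4, assuming a synthetic dataset and a simple least-squares line as the model chosen during model planning. The 60/20/20 split, the data, and the helper names `fit_line` and `mean_abs_error` are illustrative assumptions, not part of the life cycle definition.

```python
import random

# Hypothetical data: y is roughly 3x + 2 with a little noise.
rng = random.Random(0)
data = [(x, 3 * x + 2 + rng.uniform(-0.5, 0.5)) for x in range(100)]
rng.shuffle(data)

# Split into training, test, and held-back production-like sets (60/20/20).
train, test, production = data[:60], data[60:80], data[80:]

def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_abs_error(model, points):
    """Average absolute prediction error of the fitted line on points."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)

model = fit_line(train)                      # build the model on training data
test_error = mean_abs_error(model, test)     # check it on unseen test data
print(model, test_error)
```

The model is fitted only on the training set and judged on the test set; the production-like set stays untouched until the model is ready to be run on it. For data this small a laptop suffices, which is the kind of tooling question the phase asks the team to settle.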