Bda Unit 1
Structured data
Semi-structured data
Unstructured data
Structured Data
Data conforms to a pre-defined schema/structure.
Structured data has a data model.
The data model describes the type of business data that we intend
to store, process and access.
Cardinality of a relation – the number of
rows/records/tuples in a relation.
Degree of a relation – the number of fields/columns.
Ex: Data stored in databases
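The two terms above can be checked on a real table. The sketch below is a minimal example, assuming Python's built-in sqlite3 module as a stand-in for a database such as Oracle or MySQL. The `employee` table and its rows are hypothetical. `CREATE TABLE` fixes the schema up front, `COUNT(*)` gives the cardinality, and the cursor's `description` (one entry per column) gives the degree.

```python
import sqlite3

# In-memory SQLite database; the "employee" table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")

# The pre-defined schema: every row must fit these three columns.
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(101, "Asha", "Sales"), (102, "Ravi", "HR"),
     (103, "Meena", "Sales"), (104, "John", "IT")],
)

# Cardinality: number of rows/records/tuples in the relation.
cardinality = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

# Degree: number of fields/columns; cursor.description has one entry per column.
cur = conn.execute("SELECT * FROM employee")
degree = len(cur.description)

print("Cardinality:", cardinality)  # 4
print("Degree:", degree)            # 3
conn.close()
```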
Sources of structured data
Databases such as Oracle, DB2, Teradata, MySQL, PostgreSQL
Spreadsheets
OLTP systems
Ease of working with structured data
Semi-structured data
Self describing structure.
Does not conform to a formal data model.
Uses tags to segregate semantic elements.
Sources of semi-structured data
XML (Extensible Markup Language): hugely
popularized by web services
Other markup languages
JSON (JavaScript Object Notation)
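To see what "self-describing" means in practice, the sketch below holds one hypothetical customer record in both JSON and XML and reads it with Python's standard `json` and `xml.etree.ElementTree` modules. The record and its field names are made up for illustration. In both formats the keys/tags travel with the values, so a field is located by its name rather than by a fixed column position in a schema.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical customer record as JSON and as XML.
json_text = '{"customer": {"id": 7, "name": "Asha", "phones": ["98450", "98860"]}}'
xml_text = """
<customer>
  <id>7</id>
  <name>Asha</name>
  <phones><phone>98450</phone><phone>98860</phone></phones>
</customer>
"""

record = json.loads(json_text)["customer"]
root = ET.fromstring(xml_text.strip())

# Fields are found by their tag/key names, not by column position.
json_name = record["name"]
xml_name = root.find("name").text
xml_phones = [p.text for p in root.iter("phone")]

print(json_name, xml_name)            # Asha Asha
print(record["phones"], xml_phones)   # ['98450', '98860'] ['98450', '98860']
```

The nested `phones` list shows the other trait of semi-structured data: a record can hold a variable number of sub-elements without a table redesign.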
BIG DATA ANALYTICS
Contents
Overview of business intelligence
Data science and Analytics
Meaning and Characteristics of big data analytics
Need of big data analytics
Classification of analytics
Challenges to big data analytics
Importance of big data analytics
Basic terminologies in big data environment
What is Business Intelligence?
Business Intelligence enables the business to make
intelligent, fact-based decisions
Figure: from data to decisions
Database, Data Mart, Data Warehouse, ETL Tools, Integration Tools
→ Reporting Tools, Dashboards, Static Reports, Mobile Reporting, OLAP Cubes
→ Add Context to Create Information: Descriptive Statistics, Benchmarks, Variance to Plan
→ Decisions are Fact-based and Data-driven
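The "add context" step can be illustrated with a few lines of arithmetic. This is a minimal sketch on hypothetical monthly sales and targets, in arbitrary units. Here "variance to plan" is taken in its usual budgeting sense, actual minus plan (also shown as a percentage of plan), which is different from statistical variance. Mean and sample standard deviation stand in for the descriptive statistics.

```python
import statistics

# Hypothetical monthly sales (actual) and targets (plan), in arbitrary units.
actual = {"Jan": 120.0, "Feb": 95.0, "Mar": 130.0}
plan = {"Jan": 100.0, "Feb": 110.0, "Mar": 125.0}

# Descriptive statistics summarise the raw figures.
values = list(actual.values())
mean_sales = statistics.mean(values)
stdev_sales = statistics.stdev(values)  # sample standard deviation

# Variance to plan (budgeting term, not statistical variance):
# actual minus plan, and the same gap as a percentage of plan.
variance = {m: actual[m] - plan[m] for m in actual}
variance_pct = {m: 100.0 * variance[m] / plan[m] for m in actual}

for m in actual:
    print(f"{m}: actual={actual[m]:.0f} plan={plan[m]:.0f} "
          f"variance={variance[m]:+.0f} ({variance_pct[m]:+.1f}%)")
print(f"mean={mean_sales:.1f} stdev={stdev_sales:.1f}")
```

A negative variance (February here) is the kind of figure a dashboard would highlight for a fact-based follow-up decision.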
Business Intelligence (BI) Tools
Data Sources
Data Warehousing
OLAP (Online Analytical Processing)
Data Mining
Regression
Predicting Customer Behavior
Cloud technology
Mobile BI
Visual analytics
Market Basket Analytics
Text Analytics
Customer Segmentation/Clustering
Amazon.com and Netflix
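Market basket analysis looks for items that are bought together. The sketch below computes the two standard measures for pair rules A → B on a handful of hypothetical baskets: support (fraction of baskets containing both A and B) and confidence (baskets with both divided by baskets with A). It only enumerates item pairs; full association-rule algorithms such as Apriori extend the same idea to larger itemsets. The thresholds are arbitrary choices for the example, and nothing here models how Amazon.com or Netflix actually build their systems.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (one set of items per transaction).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
    {"bread", "butter", "milk", "tea"},
]

n = len(baskets)
item_counts = Counter(item for b in baskets for item in b)
pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))

def rule_stats(a, b):
    """Return (support, confidence) of the rule a -> b."""
    together = pair_counts[tuple(sorted((a, b)))]
    return together / n, together / item_counts[a]

# Keep rules that meet both thresholds (values chosen arbitrarily).
MIN_SUPPORT, MIN_CONFIDENCE = 0.4, 0.6
rules = []
for x, y in pair_counts:
    for a, b in ((x, y), (y, x)):
        s, c = rule_stats(a, b)
        if s >= MIN_SUPPORT and c >= MIN_CONFIDENCE:
            rules.append((a, b, s, c))

for a, b, s, c in sorted(rules):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

Note that the measure is directional: butter → bread has confidence 1.00, while bread → butter has only 0.75.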
Unstructured Text Processing
Figure: text from a Facebook Page, Twitter Page, Customer Satisfaction Survey Comments, Call Center Notes, Voice, Competitors' Facebook Pages, Public Web Sites, Discussion Boards, Email, Product Reviews and Blogs is processed for Services, Quality, Cost and Friendliness, leading to Alerts, Adhoc Feedback and Real-time Action.
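A very small version of this pipeline can be written with keyword matching. The sketch below tags free-text comments with the four themes from the figure and raises an alert when a negative word appears. The keyword lists, the negative-word list and the sample comments are all hypothetical. Real text analytics would use proper tokenisation, sentiment models and much larger vocabularies; this only shows the flow from raw text to topic tags and alerts.

```python
import re

# Hypothetical keyword lists for the four themes in the figure.
TOPIC_KEYWORDS = {
    "Services": {"delivery", "support", "service", "refund"},
    "Quality": {"quality", "broken", "defective", "durable"},
    "Cost": {"price", "expensive", "cheap", "cost"},
    "Friendliness": {"rude", "friendly", "polite", "helpful"},
}
# Hypothetical words that trigger an alert.
NEGATIVE_WORDS = {"broken", "defective", "expensive", "rude", "late", "bad"}

def analyse(comment):
    """Tag a comment with matching themes and flag it if it looks negative."""
    words = set(re.findall(r"[a-z']+", comment.lower()))
    topics = sorted(t for t, kws in TOPIC_KEYWORDS.items() if words & kws)
    alert = bool(words & NEGATIVE_WORDS)
    return topics, alert

comments = [
    "Delivery was late and the support agent was rude.",
    "Great quality for the price!",
]
for c in comments:
    topics, alert = analyse(c)
    print(topics, "ALERT" if alert else "ok", "-", c)
```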
Data Science
Data science is the science of extracting knowledge
from data
(Or)
It is the science of drawing out hidden patterns in
data using statistical techniques and information
technology (machine learning, data engineering,
probability models and pattern recognition).
Figure: Data Science at the intersection of Business Acumen, Technology Expertise and Mathematics Expertise
Data Science use cases
Figure: CAP (Consistency, Availability, Partition Tolerance)
Examples of databases that follow one of the three possible
combinations:
Availability and partition tolerance (AP)
Consistency and partition tolerance (CP)
Consistency and availability (CA)
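The AP/CP trade-off can be seen in a toy simulation. The sketch below models two replicas of a tiny key-value store, with a hypothetical `ToyCluster` class and a `partitioned` flag standing in for a network split. In CP mode the cluster refuses requests during the split rather than risk an inconsistent answer. In AP mode it keeps answering from the local replica, which may be stale. CA is not modelled because, in the usual textbook framing, a CA system simply assumes partitions do not happen (for example, a single-site database). This is an illustration only; real systems use quorums, consensus and conflict resolution.

```python
class ToyCluster:
    """Two replicas ("a" and "b") of a tiny key-value store.

    mode="CP": while partitioned, reject requests instead of risking
               inconsistent answers (availability is given up).
    mode="AP": while partitioned, keep serving from the local replica,
               even though it may be stale (consistency is given up).
    """

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = {"a": {}, "b": {}}
        self.partitioned = False  # True simulates a network split between a and b

    def _check_available(self):
        # With only two replicas, neither side of a split holds a majority,
        # so this CP sketch refuses every request during a partition.
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable during partition")

    def write(self, node, key, value):
        self._check_available()
        self.replicas[node][key] = value
        if not self.partitioned:
            # Normal operation: copy the write to the other replica immediately.
            other = "b" if node == "a" else "a"
            self.replicas[other][key] = value

    def read(self, node, key):
        self._check_available()
        return self.replicas[node].get(key)


for mode in ("CP", "AP"):
    cluster = ToyCluster(mode)
    cluster.write("a", "x", 1)   # replicated to both a and b
    cluster.partitioned = True   # network partition between a and b
    try:
        cluster.write("a", "x", 2)
        # AP: b never received the update, so it returns the stale value 1.
        print(mode, "read x from b:", cluster.read("b", "x"))
    except RuntimeError as err:
        print(mode, "->", err)
```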