Bda Bi Jit Chapter-3
JIT #CMIT-5125
Unit-3
Big Data Analytics
Admas Abtew
Faculty of Computing and Informatics
Jimma Technology Institute, Jimma University
Admas.abtew@ju.edu.et
+251-912499102
Outline
• An overview of Big Data Analytics
• Big Data Analytics Techniques
• Graph Database and Analytics
• Big Data Taxonomy
Chapter Three: Big Data Analytics
3.1 An overview of Big Data Analytics
Big Data Analytics refers to the process of extracting valuable insights, patterns, and trends
from large and complex datasets, known as Big Data. It involves using advanced analytical
techniques and technologies to analyze vast volumes of structured, semi-structured, and
unstructured data to uncover hidden patterns, make predictions, and drive informed
decision-making.
At its core, Big Data Analytics converts large amounts of unstructured raw data, retrieved from
different sources, into a data product that is useful to organizations.
3.2 Big Data Analytics Techniques
Big Data Analytics applies a range of techniques and methodologies to large and complex datasets.
These techniques help organizations uncover patterns, trends, and relationships within the data,
enabling data-driven decision-making. Here are some key Big Data Analytics techniques:
1. Data Mining: Data mining is the process of discovering patterns, relationships, and trends in large
datasets. It involves applying statistical and machine learning algorithms to identify hidden patterns
and make predictions. Data mining techniques include clustering, classification, association rule
mining, and anomaly detection.
2. Machine Learning: Machine Learning (ML) algorithms enable computers to learn from data and
make predictions or take actions without being explicitly programmed. ML techniques are used in Big
Data Analytics to identify patterns, build predictive models, and automate decision-making.
#(BDA) Unit:3 –Big Data Analytics 29
3. Natural Language Processing (NLP): Natural Language Processing techniques are
used to analyze and understand human language data, such as text documents, social
media posts, and customer reviews. NLP algorithms enable sentiment analysis, topic
modeling, text classification, entity recognition, and language translation. These
techniques help organizations gain insights from unstructured text data.
4. Sentiment Analysis: Sentiment analysis, also known as opinion mining, involves
determining the sentiment or opinion expressed in text data. It helps organizations
understand customer feedback, public perception, and brand reputation. Sentiment
analysis techniques use NLP algorithms to classify text as positive, negative, or neutral,
and quantify sentiment scores.
5. Social Network Analysis: Social Network Analysis (SNA) examines the relationships and
interactions within social networks. It helps uncover influential nodes, identify communities, and
analyze patterns of information flow. SNA techniques include measuring centrality, detecting
communities, and visualizing network structures. Organizations use SNA to understand social
dynamics, map networks of influence, and target influential individuals.
6. Time Series Analysis: Time Series Analysis involves analyzing data collected at regular intervals
over time. It helps identify trends, seasonality, and patterns in time-dependent data. Time series
analysis techniques include moving averages, exponential smoothing, autoregressive integrated
moving average (ARIMA) models, and forecasting methods. Time series analysis is commonly used
in demand forecasting, financial analysis, and predictive maintenance.
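Two of the techniques named under Time Series Analysis, moving averages and exponential smoothing, can be illustrated with a minimal sketch. The function names, the window size, the smoothing factor alpha, and the demand figures below are all assumptions made for illustration; they do not come from the course material, and a production workflow would more likely rely on a dedicated statistics library.

```python
from typing import List


def moving_average(values: List[float], window: int) -> List[float]:
    """Simple moving average: one output value per full window."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def exponential_smoothing(values: List[float], alpha: float) -> List[float]:
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    if not values:
        return []
    smoothed = [values[0]]  # the first observation seeds the series
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical monthly demand figures, invented for illustration only.
demand = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

ma = moving_average(demand, window=3)
es = exponential_smoothing(demand, alpha=0.5)

print(ma)  # [12.0, 14.0, 16.0, 18.0]
print(es)  # [10.0, 11.0, 12.5, 14.25, 16.125, 18.0625]
```

A larger window or a smaller alpha gives a smoother series that reacts more slowly to recent changes. With simple exponential smoothing, the last smoothed value (here 18.0625) is commonly used as the one-step-ahead forecast.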
3.3 Graph Database and Analytics
A graph database is a specialized database designed to store and manage data using graph
structures. It represents data as nodes (vertices) and relationships (edges) between those
nodes. Graph databases excel at capturing and representing complex relationships, making
them ideal for scenarios where the connections between entities are crucial. Here are some
key points about graph databases and their use in analytics:
1. Graph Data Model: In a graph database, data is modeled using nodes, edges, and
properties. Nodes represent entities, such as people, products, or locations, while edges
define the relationships between those entities. Properties provide additional information
about nodes and edges. This flexible and intuitive data model allows for the representation
of complex, interconnected data structures.
2. Relationship-Focused: Graph databases excel at managing and analyzing relationships
between entities. Relationships can have properties, directionality, and various types,
enabling rich and expressive modeling. This makes graph databases particularly
well-suited for scenarios such as social networks, recommendation systems, fraud detection,
network analysis, and knowledge graphs.
3. Traversal and Querying: Graph databases provide powerful traversal and querying
capabilities to navigate and analyze the data. Traversal means following paths through the
graph, exploring relationships and identifying patterns. Query languages, such as
Cypher (used in Neo4j) and Gremlin, enable users to query and retrieve data based on
relationships, properties, and patterns.
4. Graph Analytics: Graph analytics is the process of analyzing graph-structured data to extract
meaningful insights. It involves applying algorithms and techniques to uncover patterns, identify
communities, calculate centrality measures, detect anomalies, and perform pathfinding. Graph
analytics helps reveal hidden relationships and influence networks, and it provides contextual
insights that are not easily discovered with traditional database models.
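The four points above (the property graph model, typed relationships, traversal, and graph analytics) can be sketched with a small in-memory stand-in. This is not how Neo4j or any other graph database is implemented, and it does not use Cypher or Gremlin; the PropertyGraph class, its method names, and the sample "FOLLOWS" data are hypothetical and exist only to make the concepts concrete.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


class PropertyGraph:
    """Toy directed property graph (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, dict] = {}  # node id -> node properties
        # node id -> list of (relationship type, target id, edge properties)
        self.edges: Dict[str, List[Tuple[str, str, dict]]] = {}

    def add_node(self, node_id: str, **props: object) -> None:
        self.nodes[node_id] = props
        self.edges.setdefault(node_id, [])

    def add_edge(self, src: str, rel_type: str, dst: str, **props: object) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[src].append((rel_type, dst, props))

    def neighbors(self, node_id: str, rel_type: Optional[str] = None) -> List[str]:
        """Targets of outgoing edges, optionally filtered by relationship type."""
        return [dst for rel, dst, _ in self.edges[node_id]
                if rel_type is None or rel == rel_type]

    def shortest_path(self, start: str, goal: str) -> Optional[List[str]]:
        """Pathfinding by breadth-first traversal; returns None if unreachable."""
        if start not in self.nodes or goal not in self.nodes:
            raise KeyError("start and goal must be nodes in the graph")
        parents: Dict[str, Optional[str]] = {start: None}
        queue = deque([start])
        while queue:
            current: Optional[str] = queue.popleft()
            if current == goal:
                path: List[str] = []
                while current is not None:  # walk parent links back to start
                    path.append(current)
                    current = parents[current]
                return path[::-1]
            for nxt in self.neighbors(current):
                if nxt not in parents:
                    parents[nxt] = current
                    queue.append(nxt)
        return None

    def in_degree(self, rel_type: Optional[str] = None) -> Dict[str, int]:
        """Count incoming edges per node, a very simple centrality measure."""
        counts = {node_id: 0 for node_id in self.nodes}
        for out_edges in self.edges.values():
            for rel, dst, _ in out_edges:
                if rel_type is None or rel == rel_type:
                    counts[dst] += 1
        return counts


# Hypothetical social network, invented for illustration only.
social = PropertyGraph()
for person in ("alice", "bob", "carol", "dave"):
    social.add_node(person, label="Person")
social.add_edge("alice", "FOLLOWS", "bob", since=2021)
social.add_edge("alice", "FOLLOWS", "carol", since=2022)
social.add_edge("bob", "FOLLOWS", "carol", since=2020)
social.add_edge("carol", "FOLLOWS", "dave", since=2023)

print(social.neighbors("alice", "FOLLOWS"))    # ['bob', 'carol']
print(social.shortest_path("alice", "dave"))   # ['alice', 'carol', 'dave']
print(social.in_degree("FOLLOWS"))  # {'alice': 0, 'bob': 1, 'carol': 2, 'dave': 1}
```

In this sample, carol has the highest in-degree, which is the kind of "influential node" signal that graph analytics looks for. A real graph database would express the same lookups declaratively in a query language rather than through hand-written traversal code.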