
Big Data Technologies

Text Books:
1. Jiawei Han, Micheline Kamber, Jian Pei, Data Mining: Concepts and
Techniques, Third Edition, Elsevier, Morgan Kaufmann, 2011.
2. Tom White, "Hadoop: The Definitive Guide", 3rd Edition, O'Reilly,
2012.
3. Brett Lantz, Machine Learning with R: Deliver Data Insights with R
and Predictive Analytics, 2nd Revised Edition, 2015.

By
Prakash N
Assistant Professor
Department of CST
UNIT I: DATA MINING & BIG DATA

Introduction to Data Mining, KDD process, Data Mining Techniques: Mining
Frequent Patterns, Association Rules, Cluster Analysis, Classification and
Regression. Introduction to Big Data – What is Big Data? Explosion in
Quantity of Data, Big Data Characteristics, Types of Data, Common Big Data
Customer Scenarios, BIG DATA vs. HADOOP, A Holistic View of a Data
System, Limitations of Existing Data Analytics Architecture.
Why Data Mining
• Credit ratings/targeted marketing:
– Given a database of 100,000 names, which persons are the least likely
to default on their credit cards?
– Identify likely responders to sales promotions
• Fraud detection
– Which types of transactions are likely to be fraudulent, given the
demographics and transactional history of a particular customer?
• Customer relationship management:
– Which of my customers are likely to be the most loyal, and which are
most likely to leave for a competitor?

Data mining helps extract such information.
Data mining

• It is the process of discovering interesting patterns and knowledge
from large amounts of data.
• Data sources include databases, data warehouses, the web, or other
information repositories.
• Also known as Knowledge Discovery from Data (KDD).
Applications
• Banking: loan/credit card approval
– predict good customers based on old customers
• Customer relationship management:
– identify those who are likely to leave for a competitor.
• Targeted marketing:
– identify likely responders to promotions
• Fraud detection: telecommunications, financial
transactions
– from an online stream of events, identify fraudulent events
• Manufacturing and production:
– automatically adjust control knobs when process parameters change
Evolution of Database Technology
• 1960s and earlier: primitive file processing.
• 1970s to early 1980s (Database Management Systems): relational
database systems, ER models, query languages, user interfaces,
forms, query processing, transactions, concurrency and recovery.
• Mid-1980s to present (Advanced Database Systems): advanced data
models, advanced queries, parallel data processing.
• Late 1980s to present (Advanced Data Analysis): data warehousing
and data mining.
KDD Process

1. Data Cleaning – Remove noise and inconsistent data.

2. Data Integration – Multiple data sources may be combined.

3. Data Selection – Data relevant to the analysis task are retrieved from
the database.

4. Data Transformation – Data are transformed and consolidated into forms
appropriate for mining.

5. Data Mining – Intelligent methods are applied to extract data patterns.

6. Pattern Evaluation – Identify the truly interesting patterns that
represent knowledge.

7. Knowledge Presentation – Present the mined knowledge to the user.


Data Mining: A KDD Process

[Figure: the KDD process – Databases → Data Cleaning / Data Integration →
Data Warehouse → Data Selection / Data Transformation → Task-relevant Data →
Data Mining → Pattern Evaluation]
– Data mining is the core of the knowledge discovery process.
What kind of data can be mined?

The most basic forms of data for mining applications are database
data, data warehouse data and transactional data.
• Database Data (DBMS)
- Consists of a collection of interrelated data.
- The data are accessed and managed by a set of software programs.
• Data Warehouse – a repository of information collected from
multiple sources.
• Transactional Data – information recorded from transactions; each
record carries a unique transaction number (trans. ID).
Data Mining Task

• Prediction Tasks
– Perform induction on the current data in order to make predictions

• Description Tasks
– Find human-interpretable patterns that describe the data.
Data Mining Techniques

– Mining of Frequent Patterns

– Associations

– Correlations

– Classification and Regression

– Clustering Analysis
Mining of Frequent Patterns

• Patterns that occur frequently in data.


• Many kinds of Frequent Patterns
- Frequent Item sets
- Frequent Subsequence (Sequential Patterns)
• It leads to the discovery of interesting associations and correlations within
data.
Frequent Item Sets
• Sets of items that often appear together in a transactional data set.
Example – milk and bread are frequently bought together in a grocery
store.

Frequent Subsequence (Sequential Patterns)

• The pattern that customers tend to purchase first a laptop, followed by a
digital camera, and then a memory card.
Association Rule / Analysis

• For example, you want to know which items are frequently purchased together
within the same transaction.
Such a rule is,
buys(X, "computer") => buys(X, "software") [support = 1%, confidence = 50%]
*X – variable representing a customer.
*buys – attribute (predicate).

*A confidence of 50% means that if a customer buys a computer, there is a
50% chance that the customer will buy software as well.
*A support of 1% means that 1% of all the transactions under analysis show
that computer and software are purchased together.

This rule contains a single predicate and is referred to as a
Single-Dimensional Association Rule.

2nd Rule..

age(X, "20..29") ˄ income(X, "40K..49K") => buys(X, "laptop")
[support = 2%, confidence = 60%]

A rule involving more than one attribute or predicate (i.e., age, income and
buys) is referred to as a Multidimensional Association Rule.
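
As a minimal illustration (not from the slides), the sketch below computes support and confidence for the rule buys(computer) => buys(software) over a made-up list of transactions:

import java.util.List;
import java.util.Set;

public class AssociationRuleDemo {
    // support(A => B)    = count(A and B together) / total transactions
    // confidence(A => B) = count(A and B together) / count(A)
    public static void main(String[] args) {
        // Toy transaction data set; the items are illustrative only.
        List<Set<String>> transactions = List.of(
                Set.of("computer", "software", "mouse"),
                Set.of("computer", "printer"),
                Set.of("milk", "bread"),
                Set.of("computer", "software"),
                Set.of("bread", "butter"));

        Set<String> antecedent = Set.of("computer");
        Set<String> consequent = Set.of("software");

        long both = transactions.stream()
                .filter(t -> t.containsAll(antecedent) && t.containsAll(consequent))
                .count();
        long antecedentCount = transactions.stream()
                .filter(t -> t.containsAll(antecedent))
                .count();

        double support = (double) both / transactions.size();
        double confidence = (double) both / antecedentCount;

        System.out.printf("support = %.0f%%, confidence = %.0f%%%n",
                support * 100, confidence * 100);
    }
}

For these five toy transactions, the rule holds in 2 of 5 transactions (support = 40%) and in 2 of the 3 transactions that contain a computer (confidence ≈ 67%).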
Classification and Regression

Classification
• It is the process of finding a model that describes and distinguishes data
classes and concepts.
• The model is derived based on the analysis of a set of training data
(data objects whose class label is known).
• It is used to predict the class label of objects for which the class label is
unknown.
• A classification model can be represented in various forms: (i) IF-THEN
rules, (ii) a decision tree, and (iii) a neural network.
For example, classify countries based on climate, or classify cars based
on gas mileage.
Classification Rules (IF-THEN rules)
A Decision Tree Algorithm

Node – attribute tested; Branch – outcome of the test;
Tree leaves – classes.
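
A minimal sketch of what IF-THEN classification rules look like in code (the attributes, thresholds and class labels below are invented for illustration; a real model would be learned from training data):

public class RuleClassifier {
    // IF-THEN rules of the kind a decision tree can be converted into.
    static String classify(int age, double income) {
        if (age <= 25 && income < 30_000) return "decline";
        if (age > 25 && income >= 50_000) return "approve";
        return "manual-review";
    }

    public static void main(String[] args) {
        System.out.println(classify(30, 60_000)); // approve
        System.out.println(classify(22, 20_000)); // decline
    }
}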
A Neural Network

[Figure: a feed-forward neural network with input units for attributes such as
age (f1) and income (f2), hidden units (f3, f4, f5), and output units (f6, f7, f8)
for classes A, B and C.]

It is typically a collection of neuron-like processing units with weighted
connections between the units.
Regression

• Regression is a data mining function that predicts a number
(unavailable numerical data values).
• Age, weight, distance, temperature, income, or sales could all
be predicted using regression techniques.
• For example, a regression model could be used to predict
children's height, given their age, weight, and other factors.
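
As an illustrative sketch (the age/height numbers below are made up, not real training data), a least-squares regression line can be fitted and then used to predict a numeric value:

public class SimpleRegression {
    public static void main(String[] args) {
        double[] age    = {2, 4, 6, 8, 10};          // years
        double[] height = {86, 102, 115, 128, 139};  // centimetres

        int n = age.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
        for (int i = 0; i < n; i++) {
            sumX += age[i];
            sumY += height[i];
            sumXY += age[i] * height[i];
            sumXX += age[i] * age[i];
        }
        // Slope b and intercept a of the best-fit line y = a + b*x.
        double b = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
        double a = (sumY - b * sumX) / n;

        double newAge = 7;
        System.out.printf("predicted height at age 7 ≈ %.1f cm%n", a + b * newAge);
    }
}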
Cluster Analysis

• It is the process of partitioning a set of data objects into subsets.
• Each subset is a cluster.
• Objects in a cluster are similar to one another, yet dissimilar to
objects in other clusters.
• The set of clusters resulting from a cluster analysis can be
referred to as a clustering.
• Clustering analyzes data objects without consulting class
labels (no training data).
• Clusters are formed on the principle of maximizing the intra-class
similarity and minimizing the inter-class similarity.
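
A minimal k-means sketch on one-dimensional toy data (the points and k = 2 are assumptions for illustration), showing the assign-then-update loop that maximizes intra-cluster similarity:

import java.util.Arrays;

public class KMeansDemo {
    public static void main(String[] args) {
        double[] points = {1.0, 1.5, 2.0, 8.0, 8.5, 9.0};
        double[] centroids = {points[0], points[points.length - 1]}; // k = 2

        for (int iter = 0; iter < 10; iter++) {
            double[] sum = new double[centroids.length];
            int[] count = new int[centroids.length];
            // Assignment step: each point joins its nearest centroid.
            for (double p : points) {
                int best = 0;
                for (int c = 1; c < centroids.length; c++)
                    if (Math.abs(p - centroids[c]) < Math.abs(p - centroids[best])) best = c;
                sum[best] += p;
                count[best]++;
            }
            // Update step: move each centroid to the mean of its cluster.
            for (int c = 0; c < centroids.length; c++)
                if (count[c] > 0) centroids[c] = sum[c] / count[c];
        }
        System.out.println("centroids = " + Arrays.toString(centroids));
    }
}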
Cluster Analysis
What is Big Data?
• "A massive volume of both structured and unstructured data
that is so large that it is difficult to process using traditional,
on-hand database management tools."
• 'Big Data' is similar to 'small data', but bigger in size.
• Because the data is bigger, it requires different approaches,
techniques, tools and architectures,
• with the aim of solving new problems, or old problems in a better way.
Big Data Analytics
Categories of BIG Data
• Structured
• Written in a format that’s easy for machines to
understand.
• Structured data is easily searchable by basic algorithms.
• Examples : Fields/ Tables/ Columns/
RDBMS/Spreadsheet

• Semi-structured
• Markers/Tags to separate elements
• XML/HTML
• Unstructured
• No fields/attributes
• More like Human Language
• Free form text (E-mail body, notes, articles,…)
• Audio, video, and image
Examples Of Structured Data

• An 'Employee' table in a database is an example of Structured Data.

Employee_ID | Employee_Name   | Gender | Department | Salary_In_lacs
2365        | Rajesh Kulkarni | Male   | Finance    | 650000
3398        | Pratibha Joshi  | Female | Admin      | 650000
7465        | Shushil Roy     | Male   | Admin      | 500000
7500        | Shubhojit Das   | Male   | Finance    | 500000
7699        | Priya Sane      | Female | Finance    | 550000
Examples Of Un-Structured Data
Examples Of Semi-Structured Data

• Personal data stored in an XML file


Explosion in Quantity of Data
• Every minute
– Facebook users share nearly 2.5 million pieces of
content
– Twitter users tweet nearly 300,000 times
– Instagram users post nearly 220,000 new photos
– YouTube users upload 72 hours of new video content
– Apple users download nearly 50,000 apps
– Email users send over 200 million messages
– Amazon generates over $80,000 in online sales
Explosion in Quantity of Data

• The Data Explosion in 2014 Minute by Minute


– In 2012, Google received over 2 million search
queries per minute
– Today, Google receives over 4 million search
queries per minute from the 2.4 billion strong
global internet population
Explosion in Quantity of Data
• Science
– Data bases from astronomy, genomics, environmental data,
transportation data, …
• Humanities and Social Sciences
– Scanned books, historical documents, social interactions data, new
technology like GPS …
• Business & Commerce
– Corporate sales, stock market transactions, census, airline traffic,

• Entertainment
– Internet images, movies, MP3 files, …
• Medicine
– MRI & CT scans, patient health records, …
Big Data Analytics
Why Big Data?

• Increase of storage capacities


• Increase of processing power
• Availability of data.
– Manage data – extract relevant data
– Perform analytics on data – gain insights and use
algorithms
– Make decisions
Why Big Data?

Big Data can further benefit organisations in the
five areas mentioned below.
Comprehend market conditions:
through big data, organisations can predict what future customer
behaviour will be – purchasing patterns, choices, product preferences.
Know your customer better:
through big data analysis, companies come to know the general
thought process and feedback in advance and can make course
corrections.
Why Big Data?

Control online reputation:
sentiment analysis can be done with Big Data tools.
Cost saving:
there may be an initial cost of adopting big data tools, but in the
long run the benefits will outweigh the cost.
 Availability of data – through Big Data tools, relevant data can be
made available, in an accurate and structured format, in real time.
Big Data Characteristics

4Vs’
• Volume
• Velocity
• Variety
• Veracity
Big Data Characteristics

• Volume  refers to the amount of data
• Terabytes to Zettabytes of records, transactions,
tables and files.
• Volume is the size of the data set.
• We are not talking Terabytes but Zettabytes, and the same
amount of data will soon be generated every minute.
• New big data tools use distributed systems so that we
can store and analyse data across databases that are
dotted around anywhere in the world.
• Velocity  Data in motion
• Velocity refers to the speed
• at which new data is generated and the speed at which data
moves around;
• at which the data is created, stored, analyzed and visualized.
• Machine-to-machine processes exchange data between billions
of devices.
• Infrastructure generates massive log data in real time.
• Variety  Data in many forms
• Different data types such as audio, video and image data
(mostly unstructured data).
• In the past we focused only on structured data that
neatly fitted into tables or relational databases, such as
financial data.
• In fact, 80% of the world's data is unstructured.
• With big data technology we can now analyse and bring
together data of different types (messages, social
media, conversations, photos, ...).
• Veracity  Data in doubt
• Refers to the messiness of the data:
• inconsistent and missing data.
• With many forms of big data, quality and accuracy are
less controllable.
5 Vs of Big Data
Volume, Veracity, Velocity, Variety, and Value

Having access to big data is no good unless we can turn it into value.

Big Data Analytics


Big Data Characteristics – 4V’s
Common Big data Customer Scenarios
• Web and E-Tailing
- Recommendation Engines
- Search Quality
- Abuse and click Fraud Detection
• Telecommunications
- Network Performance Optimization
- Analyzing the network to predict failures
Common Big data Customer Scenarios
• Government
- Fraud Detection and Cyber Security
- Welfare schemes

• Health Care and Life Sciences


- Health Information Exchange
- Drug Safety
- Health care Service quality Improvements
Common Big data Customer Scenarios
• Banks and Financial Services
- Fraud Detection and Cyber Security
- Credit Scoring Analysis

• Retail
- Sale Transaction Analysis
BIG DATA vs. HADOOP
• Understand and navigate federated big data sources – Federated Discovery and Navigation
• Manage & store huge volumes of any data – Hadoop File System, MapReduce
• Structure and control data – Data Warehousing
• Manage streaming data – Stream Computing
• Analyze unstructured data – Text Analytics Engine
• Integrate all data sources – Extract Transform Load, Integration, Data Quality, Security, Data Life Cycle
Big Data Analytics
1. Analyzes multiple data streams from many sources live.
2. Stream computing uses software algorithms that analyze the
data in real time.
3. This increases speed and accuracy when dealing with data
handling and analysis.
ETL (Extract, Transformation and Load)
• ETL did originate in enterprise IT
- data from online databases is Extracted,
- then Transformed to normalize it and
- finally Loaded into enterprise data
warehouses for analysis
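
A minimal ETL sketch (the CSV rows and the in-memory "warehouse" list are made-up stand-ins for a real source system and enterprise data warehouse):

import java.util.ArrayList;
import java.util.List;

public class EtlDemo {
    record Sale(String region, double amountUsd) {}

    public static void main(String[] args) {
        // Extract: rows pulled from an (imaginary) online source system.
        List<String> extracted = List.of("north, 1200.50", "SOUTH , 800", "north,455.25");

        // Transform: normalize case, trim whitespace, parse numbers.
        List<Sale> transformed = new ArrayList<>();
        for (String row : extracted) {
            String[] parts = row.split(",");
            transformed.add(new Sale(parts[0].trim().toLowerCase(),
                                     Double.parseDouble(parts[1].trim())));
        }

        // Load: write the cleaned records into the target store (here just a list).
        List<Sale> warehouse = new ArrayList<>(transformed);
        warehouse.forEach(System.out::println);
    }
}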
Data Life Cycle
HDFS (Hadoop Distributed File System)
• The Hadoop Distributed File System was developed using a distributed
file-system design.

• Highly fault tolerant and designed to run on low-cost hardware.

• Holds very large amounts of data and provides easier access.

• Files are stored across multiple machines.

• Provides file permissions and authentication.

• Supports big data analytics applications.

• High-performance access to data across Hadoop clusters.


HDFS (Hadoop File System)
• Developer – Apache Software Foundation

• Written in Java

• The core of Apache Hadoop consists of a storage part
(HDFS) and MapReduce.

Benefits
• Computing power – a distributed computing model
ideal for big data.

• Flexibility – store any amount of any kind of data.

• Fault tolerance – if a node goes down, jobs are
automatically redirected to other nodes, and the system
automatically stores multiple copies of all data.
HDFS (Hadoop File System)
Benefits
• Low cost – open source framework is free

• Scalability – System can be grown easily by adding


more nodes

HDFS Goals
• Detection of faults and automatic recovery

• High throughput of data access rather than low


latency

• Provide high aggregate bandwidth and scale to hundreds of
nodes in a single cluster
HDFS (Hadoop File System)
• Write-once, read-many access model for files

• Applications move computation closer to where the
data is located

• Easily portable

• Every block is replicated three times (by default)

• Default block size is 64 MB

• DataNodes send a heartbeat ("alive") message every 3 seconds


Storing a file in HDFS
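
A minimal sketch of storing a local file in HDFS using the standard Hadoop FileSystem Java API (the local and HDFS paths are made-up examples; the cluster address is assumed to come from the usual core-site.xml configuration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);

        Path local = new Path("/tmp/sales.csv");         // file on the local disk
        Path remote = new Path("/user/demo/sales.csv");  // destination in HDFS

        // HDFS splits the file into blocks (64 MB by default) and replicates
        // each block (3 copies by default) across DataNodes.
        fs.copyFromLocalFile(local, remote);

        System.out.println("Stored " + remote + ", exists = " + fs.exists(remote));
        fs.close();
    }
}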
MapReduce

• The MapReduce algorithm contains two important tasks: Map
and Reduce.
• Map takes a set of data and converts it into another set of
data, where individual elements are broken down into tuples
(key/value pairs).
• The Reduce task takes the output from a map as
an input and combines those data tuples into a smaller set of
tuples.
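
The classic word-count example illustrates the two tasks. The sketch below uses the standard Hadoop MapReduce Java API (org.apache.hadoop.mapreduce); the job-driver boilerplate is omitted for brevity:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Map: break each input line into words and emit (word, 1) pairs.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce: sum the counts for each word into a single (word, total) tuple.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }
}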
MapReduce

• It is a powerful paradigm for parallel computation.

• Hadoop uses MapReduce to execute jobs on files in HDFS.
• Hadoop intelligently distributes computation over the cluster.
• It takes the computation to the data.
MapReduce
Hadoop Ecosystem
A Holistic View of a Big Data System

[Figure: the main capabilities of a big data system –
• Discover, understand, search and navigate federated sources of big data
• Analyze streaming data and large data bursts for real-time insights
• Analyze petabytes of unstructured and structured data
• Deliver deep insight with advanced in-database analytics and operational analytics
• Govern data quality and manage the information life cycle]
Holistic View of Hadoop Ecosystem

Hadoop System
• It is an open-source distributed processing framework that
manages data processing and
• storage for big data applications running on clustered systems.
• It is used for advanced analytics initiatives, including predictive
analytics and data mining.
• It handles various forms of structured and unstructured data,
giving users more flexibility for collecting, processing and
analyzing data than relational databases and data warehouses
provide.
• Hadoop runs on clusters of commodity servers.
Holistic View of Hadoop Ecosystem

Stream Computing
Holistic View of Hadoop Ecosystem

Stream Computing
• Analyzes multiple data streams from many sources live, i.e.,
pulling in streams of data, processing the data and streaming it
back out as a single flow.

• Uses software algorithms that analyze the data in real time.

• Streaming increases speed and accuracy when dealing with
data handling and analysis.
Holistic View of Hadoop Ecosystem

Stream Computing
• In June 2007, IBM announced its stream computing system,
called System S.
– This system runs on 800 microprocessors and the System S software
enables software applications to split up tasks and then reassemble
the data into an answer.
Data Warehouse
• Data warehouses (DWs) are central repositories of integrated data from one or
more disparate sources.
• They store current and historical data in one single place that
are used for creating analytical reports for workers
throughout the enterprise.
• Used for reporting and data analysis.
Holistic View of Hadoop Ecosystem

Data Warehouse
• The data is processed, transformed, and ingested so that users
can access the processed data in the Data Warehouse through
Business Intelligence tools, SQL clients, and spreadsheets.
• Three main types of Data Warehouses are:
– Enterprise Data Warehouse
1. Classifies the data according to subject and gives
access according to those divisions.
– Operational Data Store
1. A data store required when neither the data warehouse nor the OLTP
systems support the organization's reporting needs.
– Data Mart
1. A subset of the data warehouse, designed for a particular line of business
such as sales or finance.
Holistic View of Hadoop Ecosystem

Information Integration and Governance

• Integrate the data, cleanse data, master data, protect
sensitive data and govern the meaning of data.
• Get the information needed to make important decisions.
• Governance and integration platform solutions let you know your
data is correct and available to every data user, trust your
data to deliver efficiency and protection, and use your data
to drive business transformation and innovation.

Data Discovery
• Data discovery is the process of breaking complex data
collections into information that users can understand and
manage.
Holistic View of Hadoop Ecosystem

Data Visualization
• Representing data in visual form. This can be particularly
useful when data need to be evaluated and decisions made
quickly.
Big Data System Management
• Monitoring and ensuring the availability of all big data
resources through a centralized interface/dashboard.
• Performing database maintenance for better results.
• Ensuring the security of big data repositories and control
access.
• Ensuring that data are captured and stored from all resources
as desired
Testing you?

Q1 – As compared to an RDBMS, Hadoop

A – Has higher data integrity.
B – Does ACID transactions.
C – Is suitable for reading and writing many times.
D – Works better on unstructured and semi-structured data.

Q2. Which of the following is true about metadata?

A – FsImage & EditLogs are metadata files.
B – Metadata contains information like the number of blocks, their
locations, replicas.
C – Metadata shows the structure of HDFS directories/files.
D – All of the above.
Testing you?

Q3. HDFS performs replication, although it results in data


redundancy?
A-True
B-False

Q4. Why do we need Hadoop?


A-Storage,
B-Security
C-Analytics
D-All the above
Testing you?

Q5. The components of an RDBMS are HDFS and MapReduce?

A – True
B – False

Q6. Expand the following abbreviations:

A. OLTP
B. OLAP
C. HDFS
D. BDT
Testing you?

Q7.All of the following accurately describe Hadoop, EXCEPT:


A) Open source
B) Real-time
C) Java-based
D) Distributed computing approach

Q8. Which of the following genres does Hadoop produce ?


A) Distributed file system
B) JAX-RS
C) Java Message Service
D) Relational Database Management System
Testing you?

Q9. What was Hadoop written in ?


A) Java (software platform)
B) Perl
C) Java (programming language)
D) Lua (programming language)

Q10. What is the default block size?

A) 64 MB
B) 64 KB
C) 64 GB
D) 64 PB
Testing you?

Q11. Hadoop is a framework that works with a variety of


related tools. Common cohorts include:
a) MapReduce, Hive and HBase
b) MapReduce, MySQL and Google Apps
c) MapReduce, Hummer and Iguana
d) MapReduce, Heron and Trumpet

Q12. …………………. is an essential process where intelligent


methods are applied to extract data patterns.
A) Data warehousing
B) Data mining
C) Text mining
D) Data selection
Testing you?

13. Data mining can also be applied to other forms such as …………….
i) Data streams
ii) Sequence data
iii) Networked data
iv) Text data
v) All

14. Write full form of KDD-----------------------

15. The output of KDD is ………….


A) Data
B) Information
C) Query
D) Useful information
Testing you?

16. The data is stored, retrieved and updated in ………….


A) OLTP
B) OLAP
C) SMTP
D) FTP

17. What is metadata?

18. The Data Warehouse is --------------


A) Read only
B) Write only
C) Read and Write
D) None
Limitations of Existing Data Analytics

EDW – Enterprise Data Warehouse


MPP – Massively Parallel Processing Database
Schema – blueprint of how the database is constructed
(divided into database tables in the case of relational
databases)
ACID Properties
• A – Atomicity: Each transaction is considered as one unit and
either runs to completion or is not executed at all.
– Abort: If a transaction aborts, changes made to the
database are not visible.
– Commit: If a transaction commits, the changes made are
visible.
• C – Consistency: The database is consistent
before and after the transaction.
Example: T transfers 100 from account X (balance 500) to account Y (balance 200).
Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.
Therefore, the database is consistent. Inconsistency occurs if T1 (the debit)
completes but T2 (the credit) fails; as a result, T is incomplete.
ACID Properties
• I-Isolation : multiple transactions can occur concurrently.
• Transactions occur independently without interference.

• D – Durability:
This property ensures that once the transaction has completed
execution, the updates and modifications to the database are
stored on and written to disk, and they persist even if a system
failure occurs.
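
A minimal JDBC sketch of atomicity for the transfer above (the JDBC URL, table and column names are made-up examples, and an embedded H2 database is assumed only so the snippet is self-contained): both updates are committed together, or rolled back together.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

public class AtomicTransfer {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:bank")) {
            // Made-up schema and balances, matching the consistency example.
            try (Statement s = conn.createStatement()) {
                s.execute("CREATE TABLE account(id VARCHAR(10) PRIMARY KEY, balance INT)");
                s.execute("INSERT INTO account VALUES ('X', 500), ('Y', 200)");
            }

            conn.setAutoCommit(false);                 // begin the transaction T
            try (PreparedStatement debit = conn.prepareStatement(
                         "UPDATE account SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = conn.prepareStatement(
                         "UPDATE account SET balance = balance + ? WHERE id = ?")) {
                debit.setInt(1, 100);  debit.setString(2, "X");  debit.executeUpdate();   // T1
                credit.setInt(1, 100); credit.setString(2, "Y"); credit.executeUpdate();  // T2
                conn.commit();                         // atomic: both changes become visible
            } catch (SQLException e) {
                conn.rollback();                       // atomic: neither change is visible
                throw e;
            }
        }
    }
}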
Limitations of Hadoop for Big Data Analytics
Issue with small files
• Hadoop is not suited for small data.

• HDFS lacks the ability to efficiently support random reads of
small files because of its high-capacity design.

• If we store a huge number of small files, HDFS cannot
handle them efficiently.

• HDFS was designed to work with a small number of
large files for storing large data sets, rather than a large
number of small files.
Limitations of Hadoop for Big Data Analytics
Slow Processing Speed
• In Hadoop, MapReduce processes large data sets with a
parallel, distributed algorithm.

• MapReduce requires a lot of time to perform these tasks,
thereby increasing latency.

• Data is distributed and processed over the cluster in
MapReduce, which increases the time and reduces processing
speed.
Limitations of Hadoop for Big Data Analytics
Support for Batch Processing Only
• Hadoop supports batch processing only.
• the execution of a series of programs each on a set or "batch"
of inputs, rather than a single input.
• Hadoop MapReduce is the best framework for processing
data in batches.
• MapReduce framework of Hadoop does not leverage the
memory of the Hadoop cluster to the maximum.

No Delta Iteration
• Hadoop is not so efficient for iterative processing.
• Hadoop does not support cyclic data flow.
Limitations of Hadoop for Big Data Analytics
Latency
• In Hadoop, MapReduce framework is comparatively slower,
since it is designed to support different format, structure and
huge volume of data.
• MapReduce requires a lot of time to perform these tasks
thereby increasing latency.

Not Easy to Use

• In Hadoop, MapReduce developers need to hand-code each
and every operation, which makes it very difficult to
work with.
Limitations of Hadoop for Big Data Analytics
Security
• Hadoop can be challenging in managing complex
applications. If the user managing the platform does not know
how to enable security, the data could be at huge
risk.
• At the storage and network levels, Hadoop is missing encryption,
which is a major point of concern.
• Hadoop supports Kerberos authentication, which is hard to
manage.

No Abstraction
• Hadoop does not have any type of abstraction,
• so MapReduce developers need to hand-code each and
every operation, which makes it very difficult to work with.
Limitations of Hadoop for Big Data Analytics
Vulnerable by Nature
• Hadoop is written entirely in Java, one of the most widely used
languages, and hence Java has been heavily exploited by cyber criminals.

No Caching
• Hadoop is not efficient for caching.
• In Hadoop, MapReduce cannot cache the intermediate data in
memory for further requirements, which diminishes the
performance of Hadoop.

Lengthy Lines of Code

• Hadoop has about 1,20,000 lines of code; more lines mean
more potential bugs, and it takes more time to
execute the program.
Limitations of Hadoop for Big Data Analytics
Uncertainty
• Hadoop only ensures that data job is complete, but it’s unable
to guarantee when the job will be complete.
Limitations of Existing Data Analytics Architecture

Big Data Analytics


Solution: A Combined Storage Compute Layer

Big Data Analytics
