
Chapter 2

Data Science

2.1. An Overview of Data Science


Activity 1:
➢ What is data science? Can you describe the role of data in emerging technology?
➢ What are data and information?
➢ What is big data?

Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms,
and systems to extract knowledge and insights from structured, semi-structured and
unstructured data. Data science is much more than simply analysing data. It offers a range of
roles and requires a range of skills.
Let’s consider this idea by thinking about some of the data involved in buying a box of cereal
from the store or supermarket:
• Whatever your cereal preference, teff, wheat, or barley, you prepare for the purchase by writing “cereal” in your notebook. This planned purchase is a piece of data, even though it is written in pencil and only you can read it.
• When you get to the store, you use your data as a reminder to grab the item and put it in
your cart. At the checkout line, the cashier scans the barcode on your container, and the
cash register logs the price. Back in the warehouse, a computer tells the stock manager that
it is time to request another order from the distributor because your purchase was one of
the last boxes in the store.
• You also have a coupon for your big box, and the cashier scans that, giving you a
predetermined discount. At the end of the week, a report of all the scanned manufacturer
coupons gets uploaded to the cereal company so they can issue a reimbursement to the
grocery store for all of the coupon discounts they have handed out to customers. Finally,
at the end of the month, a store manager looks at a colourful collection of pie charts showing
all the different kinds of cereal that were sold and, on the basis of strong sales of cereals,
decides to offer more varieties of these on the store’s limited shelf space next month.
• So, the small piece of data that began as a scribble in your notebook ended up in many different places, most notably on the desk of a manager as an aid to decision making.



On the trip from your pencil to the manager’s desk, the data went through many transformations. In addition to the computers where the data may have paused briefly or been stored for the long term, many other pieces of hardware, such as the barcode scanner, were involved in collecting, manipulating, transmitting, and storing the data. In
addition, many different pieces of software were used to organize, aggregate, visualize, and
present the data.
Finally, many different human systems were involved in working with the data. People
decided which systems to buy and install, who should get access to what kinds of data, and
what would happen to the data after its immediate purpose was fulfilled.
As an academic discipline and profession, data science continues to evolve as one of the most
promising and in-demand career paths for skilled professionals. Today, successful data
professionals understand that they must advance past the traditional skills of analysing large amounts of data, data mining, and programming. In order to uncover useful intelligence
for their organizations, data scientists must master the full spectrum of the data science life
cycle and possess a level of flexibility and understanding to maximize returns at each phase
of the process.
Data scientists need to be curious and result-oriented, with exceptional industry-specific
knowledge and communication skills that allow them to explain highly technical results to
their non-technical counterparts. They possess a strong quantitative background in statistics
and linear algebra as well as programming knowledge, with a focus on data warehousing, mining, and modelling, to build and analyse algorithms.
2.1.1. Data and Information
Data can be defined as a representation of facts, concepts, or instructions in a formalized
manner, which should be suitable for communication, interpretation, or processing, by human
or electronic machines. It can be described as unprocessed facts and figures. It is represented
with the help of characters such as letters (A-Z, a-z), digits (0-9), or special characters (+, -, /, *, <, >, =, etc.). Information, on the other hand, is processed data on which decisions and actions are based. It is data that has been processed into a form that is meaningful to the recipient and is of real or perceived value in the current or prospective actions or decisions of the recipient. Furthermore, information is interpreted data, created from organized, structured, and processed data in a particular context.
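As an illustration (a minimal sketch in Python, with made-up figures), a list of daily sales amounts is data; the weekly total and daily average computed from it are information that a manager can act on:

# Raw, unprocessed facts (data): daily sales figures (illustrative values)
daily_sales = [1200, 950, 1340, 1100, 1275]

# Processing turns the data into information a decision-maker can use
total_sales = sum(daily_sales)
average_sales = total_sales / len(daily_sales)

print("Total weekly sales:", total_sales)     # 5865
print("Average daily sales:", average_sales)  # 1173.0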



2.1.2. Data Processing Cycle
Data processing is the restructuring or reordering of data by people or machines to increase its usefulness and add value for a particular purpose. Data processing consists of the following basic steps: input, processing, and output. These three steps constitute the data processing cycle.

Figure 2.1 Data Processing Cycle


• Input - in this step, the input data is prepared in some convenient form for processing. The form depends on the processing machine. For example, when electronic computers are used, the input data can be recorded on any of several types of storage media, such as a hard disk, CD, or flash disk.
• Processing - in this step, the input data is changed to produce data in a more useful form. For example, interest can be calculated on a deposit to a bank, or a summary of sales for the month can be calculated from the sales orders.
• Output - at this stage, the result of the preceding processing step is collected. The particular form of the output data depends on the use of the data. For example, output data may be the payroll for employees.
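The following sketch (illustrative Python, with made-up sales orders) walks through the three stages of the cycle in one small program:

# Input: data prepared in a convenient form for processing (a list of sales orders)
sales_orders = [
    {"item": "cereal", "quantity": 2, "unit_price": 55.0},
    {"item": "milk", "quantity": 1, "unit_price": 30.0},
    {"item": "cereal", "quantity": 3, "unit_price": 55.0},
]

# Processing: the input data is transformed into a more useful form (a sales summary)
summary = {}
for order in sales_orders:
    amount = order["quantity"] * order["unit_price"]
    summary[order["item"]] = summary.get(order["item"], 0) + amount

# Output: the result of processing is presented in the form the user needs
for item, amount in summary.items():
    print(f"Total sales of {item}: {amount:.2f}")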
Activity 2:
➢ Discuss the main differences between data and information with examples.
➢ Can we process data manually using a pencil and paper? Discuss the
differences compared with data processing using a computer.

2.3 Data types and their representation


Data types can be described from diverse perspectives. In computer science and computer
programming, for instance, a data type is simply an attribute of data that tells the compiler or
interpreter how the programmer intends to use the data.

2.3.1. Data types from Computer programming perspective


Almost all programming languages explicitly include the notion of data type, though different
languages may use different terminology. Common data types include:



• Integers (int) - used to store whole numbers, mathematically known as integers
• Booleans (bool) - used to represent values restricted to one of two states: true or false
• Characters (char) - used to store a single character
• Floating-point numbers (float) - used to store real numbers
• Alphanumeric strings (string) - used to store a combination of characters and numbers
A data type constrains the values that an expression, such as a variable or a function, might take. The data type defines the operations that can be done on the data, the meaning of the data, and the way values of that type can be stored.
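A short sketch in Python illustrates these common types (note that Python has no separate char type; a single character is simply a string of length one):

age = 25                 # integer (int): a whole number
is_student = True        # boolean (bool): one of two values, True or False
grade = "A"              # character: in Python, a string of length one
height = 1.72            # floating-point number (float): a real number
student_id = "ETS0123"   # alphanumeric string (str): letters and digits combined

# The data type defines which operations are valid on a value
print(age + 5)             # arithmetic works on integers
print(student_id.lower())  # string operations work on strings, not on numbers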
2.3.2. Data types from Data Analytics perspective
From a data analytics point of view, it is important to understand that there are three common data types or structures: structured, semi-structured, and unstructured data.
Structured Data
Structured data is data that adheres to a pre-defined data model and is therefore
straightforward to analyse. Structured data conforms to a tabular format with a relationship
between the different rows and columns. Common examples of structured data are Excel files
or SQL databases. Each of these has structured rows and columns that can be sorted.
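As a small illustration (a sketch in Python with invented records), structured data follows a fixed set of columns, which makes sorting and filtering straightforward:

import csv
import io

# A tiny table of structured data: every row follows the same pre-defined columns
csv_text = """name,department,salary
Abebe,Sales,9500
Hana,Finance,12000
Kebede,Sales,8700
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

# Because the structure is fixed, rows can be sorted and filtered by column
for row in sorted(rows, key=lambda r: int(r["salary"]), reverse=True):
    print(row["name"], row["department"], row["salary"])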
Semi-structured Data
Semi-structured data is a form of structured data that does not conform with the formal
structure of data models associated with relational databases or other forms of data tables, but
nonetheless, contains tags or other markers to separate semantic elements and enforce
hierarchies of records and fields within the data. Therefore, it is also known as a self-describing structure. JSON and XML are common examples of semi-structured data.
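A brief sketch (illustrative Python with an invented JSON record) shows how the tags travel with the data itself, and how records need not all share the same fields:

import json

# Semi-structured data: field names (tags) are embedded in the data,
# and individual records may carry different optional fields
record = json.loads("""
{
  "name": "Abebe",
  "email": "abebe@example.com",
  "orders": [
    {"item": "cereal", "quantity": 2},
    {"item": "milk", "quantity": 1, "discount": 0.1}
  ]
}
""")

print(record["name"])
for order in record["orders"]:
    print(order["item"], order.get("discount", 0))  # "discount" is optional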
Unstructured Data
Unstructured data is information that either does not have a predefined data model or is not
organized in a pre-defined manner. Unstructured information is typically text-heavy but may
contain data such as dates, numbers, and facts as well. This results in irregularities and
ambiguities that make it difficult to understand using traditional programs as compared to
data stored in structured databases. Common examples of unstructured data include audio files, video files, and NoSQL databases.
Metadata – Data about Data
The last category of data type is metadata. From a technical point of view, this is not a
separate data structure, but it is one of the most important elements for Big Data analysis and
big data solutions. Metadata is data about data. It provides additional information about a

Data Science Page 4


specific set of data.
In a set of photographs, for example, metadata could describe when and where the photos
were taken. The metadata then provides fields for dates and locations which, by themselves,
can be considered structured data. For this reason, metadata is frequently used by Big Data solutions for initial analysis.
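The sketch below (plain Python, standard library only) creates a small text file and then reads its file-system metadata; the file's contents are the data, while its size and modification time are data about that data:

import os
import time

# The data itself: the contents of a small text file
with open("report.txt", "w") as f:
    f.write("Monthly cereal sales were strong.")

# Metadata: information about the data, not the data itself
info = os.stat("report.txt")
print("File size in bytes:", info.st_size)
print("Last modified:", time.ctime(info.st_mtime))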
Activity 3:
➢ Discuss data types from programming and analytics perspectives.
➢ Compare metadata with structured, unstructured and semi-structured data
➢ Give at least one example of structured, unstructured and semi-structured data types

2.4. Data value Chain


The Data Value Chain is introduced to describe the information flow within a big data system
as a series of steps needed to generate value and useful insights from data. The Big Data
Value Chain identifies the following key high-level activities:
2.4.1. Data Acquisition
It is the process of gathering, filtering, and cleaning data before it is put in a data warehouse
or any other storage solution on which data analysis can be carried out. Data acquisition is
one of the major big data challenges in terms of infrastructure requirements. The
infrastructure required to support the acquisition of big data must deliver low, predictable
latency in both capturing data and in executing queries; be able to handle very high
transaction volumes, often in a distributed environment; and support flexible and dynamic
data structures.
2.4.2. Data Analysis
It is concerned with making the raw data acquired amenable to use in decision-making as
well as domain-specific usage. Data analysis involves exploring, transforming, and modelling
data with the goal of highlighting relevant data, synthesizing and extracting useful hidden
information with high potential from a business point of view. Related areas include data
mining, business intelligence, and machine learning.
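As a minimal sketch of this step (assuming the pandas library is available, with invented sales figures), exploring and summarizing raw records can highlight the patterns that matter from a business point of view:

import pandas as pd

# Raw transaction records (illustrative values)
sales = pd.DataFrame({
    "product": ["cereal", "milk", "cereal", "bread", "cereal"],
    "amount": [110.0, 30.0, 165.0, 25.0, 55.0],
})

# Transform and summarize: total revenue per product, highest first
revenue = sales.groupby("product")["amount"].sum().sort_values(ascending=False)
print(revenue)  # shows that cereal drives most of the revenue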

2.4.3. Data Curation


It is the active management of data over its life cycle to ensure it meets the necessary data
quality requirements for its effective usage. Data curation processes can be categorized into
different activities such as content creation, selection, classification, transformation,
validation, and preservation. Data curation is performed by expert curators who are responsible for improving the accessibility and quality of data. Data curators (also known as
scientific curators or data annotators) hold the responsibility of ensuring that data are
trustworthy, discoverable, accessible, reusable, and fit for their purpose. A key trend in the curation of big data is the use of community and crowdsourcing approaches.
2.4.4. Data Storage
It is the persistence and management of data in a scalable way that satisfies the needs of
applications that require fast access to the data. Relational Database Management Systems
(RDBMS) have been the main, and almost unique, solution to the storage paradigm for nearly 40 years. However, the ACID (Atomicity, Consistency, Isolation, and Durability) properties that guarantee database transactions lack flexibility with regard to schema changes, and their performance and fault tolerance suffer when data volumes and complexity grow, making them unsuitable for big data scenarios. NoSQL technologies have been designed with the
scalability goal in mind and present a wide range of solutions based on alternative data
models.
2.4.5. Data Usage
It covers the data-driven business activities that need access to data, its analysis, and the tools
needed to integrate the data analysis within the business activity. Data usage in business
decision making can enhance competitiveness through the reduction of costs, increased added
value, or any other parameter that can be measured against existing performance criteria.

2.5. Basic concepts of big data


Big data is a blanket term for the non-traditional strategies and technologies needed to gather, organize, process, and derive insights from large datasets. While the problem of working with
data that exceeds the computing power or storage of a single computer is not new, the
pervasiveness, scale, and value of this type of computing have greatly expanded in recent
years.



2.5.1. What Is Big Data?
Big data is the term for a collection of data sets so large and complex that it becomes
difficult to process using on-hand database management tools or traditional data processing
applications.
In this context, a “large dataset” means a dataset too large to reasonably process or store with
traditional tooling or on a single computer. This means that the common scale of big datasets
is constantly shifting and may vary significantly from organization to organization. Big data
is characterized by 4Vs:
• Volume: large amounts of data (zettabytes, massive datasets)
• Velocity: Data is live streaming or in motion
• Variety: data comes in many different forms from diverse sources
• Veracity: can we trust the data? How accurate is it? etc.

Figure 2.4 Characteristics of big data

2.5.2. Clustered Computing and Hadoop Ecosystem


2.5.2.1. Clustered Computing
Because of the qualities of big data, individual computers are often inadequate for handling
the data at most stages. To better address the high storage and computational needs of big
data, computer clusters are a better fit.
Big data clustering software combines the resources of many smaller machines, seeking to
provide a number of benefits:



• Resource Pooling: Combining the available storage space to hold data is a clear benefit,
but CPU and memory pooling are also extremely important. Processing large datasets
requires large amounts of all three of these resources.
• High Availability: Clusters can provide varying levels of fault tolerance and availability
guarantees to prevent hardware or software failures from affecting access to data and
processing. This becomes increasingly important as we continue to emphasize the
importance of real-time analytics.
• Easy Scalability: Clusters make it easy to scale horizontally by adding additional machines
to the group. This means the system can react to changes in resource requirements without
expanding the physical resources on a machine.
Using clusters requires a solution for managing cluster membership, coordinating resource
sharing, and scheduling actual work on individual nodes. Cluster membership and resource
allocation can be handled by software like Hadoop’s YARN (which stands for Yet Another
Resource Negotiator).
The assembled computing cluster often acts as a foundation that other software interfaces
with to process the data. The machines involved in the computing cluster are also typically
involved with the management of a distributed storage system.

Activity 4:
➢ List and discuss the characteristics of big data
➢ Describe the big data life cycle. Which step do you think is most useful, and why?
➢ List and describe each technology or tool used in the big data life cycle.
➢ Discuss the three methods of computing over a large dataset.

2.5.2.2. Hadoop and its Ecosystem


Hadoop is an open-source framework intended to make interaction with big data easier. It is a
framework that allows for the distributed processing of large datasets across clusters of
computers using simple programming models. It is inspired by a technical document
published by Google.
The four key characteristics of Hadoop are:
• Economical: Its systems are highly economical, as ordinary computers can be used for data processing.
• Reliable: It is reliable as it stores copies of the data on different machines and is resistant
to hardware failure.
• Scalable: It is easily scalable, both horizontally and vertically. A few extra nodes help in scaling up the framework.
• Flexible: It is flexible, and you can store as much structured and unstructured data as you need and decide how to use it later.
Hadoop has an ecosystem that has evolved from its four core components: data management,
access, processing, and storage. It is continuously growing to meet the needs of Big Data. It
comprises the following components and many others:
• HDFS: Hadoop Distributed File System
• YARN: Yet Another Resource Negotiator
• MapReduce: Programming based Data Processing
• Spark: In-Memory data processing
• PIG, HIVE: Query-based processing of data services
• HBase: NoSQL Database
• Mahout, Spark MLLib: Machine Learning algorithm libraries
• Solr, Lucene: Searching and Indexing
• Zookeeper: Managing cluster
• Oozie: Job Scheduling

2.5.3. Big Data Life Cycle with Hadoop

2.5.3.1. Ingesting data into the system

The first stage of Big Data processing is Ingest. The data is ingested or transferred to Hadoop
from various sources such as relational databases, systems, or local files. Sqoop transfers data
from RDBMS to HDFS, whereas Flume transfers event data.



2.5.3.2. Processing the data in storage

The second stage is Processing. In this stage, the data is stored and processed. The data is
stored in the distributed file system, HDFS, and in the NoSQL distributed database, HBase. Spark and MapReduce perform the data processing.
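To give a feel for the MapReduce programming model, the following simplified, single-machine sketch in Python (not actual Hadoop code) counts words: a map phase emits (word, 1) pairs and a reduce phase sums the counts per word:

from collections import defaultdict

lines = [
    "big data needs big clusters",
    "hadoop processes big data",
]

# Map phase: each input line is turned into (word, 1) pairs
mapped = []
for line in lines:
    for word in line.split():
        mapped.append((word, 1))

# Shuffle and reduce phase: pairs are grouped by key and the counts are summed per word
counts = defaultdict(int)
for word, count in mapped:
    counts[word] += count

print(dict(counts))  # e.g. {'big': 3, 'data': 2, ...}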

2.5.3.3. Computing and analyzing data

The third stage is to Analyze. Here, the data is analyzed by processing frameworks such as
Pig, Hive, and Impala. Pig converts the data using map and reduce operations and then analyzes it. Hive is also based on MapReduce programming and is most suitable for structured data.

2.5.3.4. Visualizing the results

The fourth stage is Access, which is performed by tools such as Hue and Cloudera Search. In
this stage, the analysed data can be accessed by users.

