Unit 1-Part2


Qualitative data are further classified into two types:

Nominal Data
Nominal data are used to label variables without any order or quantitative
value. Hair color is an example of nominal data: one color can't be ranked
above or below another.
The name “nominal” comes from the Latin word “nomen,” which means
“name.” Nominal data can't be used in arithmetic and can't be sorted in any
meaningful order; the values simply fall into distinct categories. Counting
how often each category occurs is the main operation that makes sense.
Examples of Nominal Data:
Hair color (Blonde, Red, Brown, Black, etc.)
Marital status (Single, Widowed, Married)
Nationality (Indian, German, American)
Gender (Male, Female, Others)
Eye color (Black, Brown, etc.)
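
As a minimal sketch of what can be done with nominal data, the snippet below counts categories and finds the most frequent one. The sample values in `hair_colors` are made up for illustration and do not come from any real dataset.

```python
from collections import Counter

# Hypothetical sample of hair colors (nominal: labels with no inherent order).
hair_colors = ["Brown", "Black", "Blonde", "Brown", "Red", "Black", "Brown"]

# Counting how often each category occurs is meaningful for nominal data.
counts = Counter(hair_colors)

# The mode (most frequent category) is a sensible summary;
# a mean or median is not, because the labels cannot be ordered or added.
mode_color, mode_count = counts.most_common(1)[0]

print(counts)
print(mode_color, mode_count)
```

Sorting these labels alphabetically would be possible, but that order says nothing about the data itself.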
Ordinal Data
Ordinal data have a natural ordering: each value occupies a position on a
scale. They are used to record observations such as customer satisfaction or
happiness, but arithmetic on them (adding or averaging the values) is not
meaningful, because the gaps between levels are not known to be equal.
Ordinal data are qualitative data whose values have a relative position, so
they can be considered “in-between” qualitative and quantitative data. They
show sequence only, which limits statistical analysis to order-based
measures such as the median, the mode, and rank comparisons. Compared with
nominal data, ordinal data carry an order that nominal data lack.
Examples of Ordinal Data:
Feedback, experience, or satisfaction ratings on a scale of 1 to 10
Letter grades in an exam (A, B, C, D, etc.)
Ranking of people in a competition (First, Second, Third, etc.)
Economic Status (High, Medium, and Low)
Education Level (Higher, Secondary, Primary)
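
The following sketch shows one way to work with ordinal data in Python: attach an explicit order to the categories, then sort and take a median by rank. The `GRADE_ORDER` list and the sample `grades` are assumptions chosen for illustration.

```python
from statistics import median_low

# Assumed ordering of letter grades, from lowest to highest.
GRADE_ORDER = ["D", "C", "B", "A"]
rank = {grade: position for position, grade in enumerate(GRADE_ORDER)}

# Hypothetical exam results.
grades = ["B", "A", "D", "C", "B", "A", "B"]

# Sorting by rank is meaningful because the categories are ordered.
sorted_grades = sorted(grades, key=lambda g: rank[g])

# median_low always returns an existing data point, so no averaging
# of ranks is needed (averaging would assume equal gaps between grades).
median_grade = GRADE_ORDER[median_low([rank[g] for g in grades])]

print(sorted_grades)
print(median_grade)
```

The ranks are used only for comparison; the code never adds or averages them.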
Continuous Data
Continuous data can take fractional values. Examples include
the height of a person, the length of an object, and
temperature. Continuous data represent information that can
be divided into ever finer levels, and a continuous variable
can take any value within a range.
The key difference between discrete and continuous data is
that discrete data contain only integers or whole numbers,
whereas continuous data can hold fractional values, which
suits measurements such as temperature, height, width,
time, and speed.
Examples of Continuous Data:
Height of a person
Speed of a vehicle
Time taken to finish a task
Wi-Fi frequency
Share price on the stock market
Discrete Data
The term discrete means distinct or separate. Discrete
data contain values that are integers or whole numbers.
The total number of students in a class is an example of
discrete data. These data can't be broken into decimal
or fractional values.
Discrete data are countable and take separate, individual
values that cannot be meaningfully subdivided. They are
typically represented by a bar graph, number line, or
frequency table.
Examples of Discrete Data:
Total number of students present in a class
Number of employees in a company
Total number of players who participated in a competition
Days in a week
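
To contrast the two quantitative types, here is a small sketch: discrete values go straight into a frequency table, while continuous values are first grouped into ranges. The sample numbers and the helper `bin_label` are hypothetical.

```python
from collections import Counter

# Hypothetical discrete data: students present in each class session.
students_present = [28, 30, 28, 27, 30, 30]

# Discrete values can be counted directly in a frequency table.
frequency_table = Counter(students_present)

# Hypothetical continuous data: heights in centimetres.
heights_cm = [158.2, 171.5, 165.0, 180.3, 169.9]

def bin_label(value, width=10):
    """Return the half-open range [lower, lower + width) containing value."""
    lower = int(value // width) * width
    return f"{lower}-{lower + width}"

# Continuous values rarely repeat exactly, so they are grouped into bins.
height_bins = Counter(bin_label(h) for h in heights_cm)

print(frequency_table)
print(height_bins)
```

The bin width of 10 cm is an arbitrary choice; a different width gives a different grouping of the same measurements.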
Graph Data
Data that represent relationships between entities, often modeled as nodes
and edges (e.g., social networks, network graphs).
Graph Data: Examples
Social Networks:
Facebook, Twitter, LinkedIn, and other social media platforms represent individuals
(nodes) and their connections or friendships (edges).
Network Graphs:
In computer networks, nodes can represent devices (such as computers or routers),
and edges represent the connections or links between them.
Citation Networks:
In academic research, nodes can represent papers or authors, and edges represent
citations or collaborations between them.
Recommendation Systems:
Nodes can represent users or items, and edges can represent user interactions or
preferences. The graph is then used to recommend items to users based on their
preferences and connections.
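
A minimal sketch of the node-and-edge model, using a plain dictionary as an adjacency list. The names in `edges` and the function `degrees_of_separation` are invented for illustration; real projects often use a dedicated graph library instead.

```python
from collections import deque

# Hypothetical friendship graph: nodes are people, edges are undirected.
edges = [("Asha", "Ben"), ("Ben", "Chen"), ("Chen", "Dana"), ("Asha", "Eli")]

# Build an adjacency list: each node maps to the set of its neighbours.
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

def degrees_of_separation(graph, start, goal):
    """Breadth-first search: fewest edges between two nodes, or None."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None

print(len(adjacency["Asha"]))
print(degrees_of_separation(adjacency, "Asha", "Dana"))
```

The same structure works for the other examples above: swap people for routers, papers, or users and items.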
High-Dimensional Data:

• Refers to data sets with a large number of variables or features.

• This type of data is often encountered in fields like genomics, image processing, and
other complex systems where measurements are taken across numerous dimensions.
High-Dimensional Data: Examples
Genomic Data:
Gene expression data can involve thousands of genes for each sample.

Image Data:
Each pixel in an image can be considered a dimension, so high-resolution images result in
high-dimensional datasets.

Text Data:
In natural language processing, representing text with techniques like
TF-IDF or word embeddings can result in high-dimensional feature spaces.

Social Media Data:
User behavior on social media platforms can be represented as high-dimensional data,
where each user's actions (likes, comments, shares) on different types of content serve as
features.

Customer Transaction Data:
E-commerce companies may collect high-dimensional data on customer transactions,
including purchase history, time of purchase, product categories, and more.
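
To make the dimension counts concrete, the sketch below works out the number of features for one image and for a tiny text corpus. The image size and the two `documents` are assumptions, and plain word counts are used here as a simpler stand-in for TF-IDF.

```python
# Image data: each pixel in each color channel is one dimension.
# Assumed size: a 1920 x 1080 RGB image.
width, height, channels = 1920, 1080, 3
image_dimensions = width * height * channels

# Text data: a bag-of-words vector has one dimension per distinct word.
# Hypothetical two-document corpus.
documents = [
    "data science uses data",
    "graph data links nodes",
]
vocabulary = sorted({word for doc in documents for word in doc.split()})

# Each document becomes a vector of word counts over the vocabulary.
vectors = [[doc.split().count(word) for word in vocabulary] for doc in documents]

print(image_dimensions)
print(vocabulary)
print(vectors)
```

Even this toy corpus needs six dimensions; a realistic vocabulary of tens of thousands of words gives vectors of that length, most of whose entries are zero.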
Classification of Digital Data
1. Structured Data
Definition: Well-organized data with a fixed schema, often stored in
relational databases.
Examples:
Relational Database Table:
• Attributes (Columns): ID, Name, Age, Address
• Records (Rows):
• 1, John Doe, 30, 123 Main St
• 2, Jane Smith, 25, 456 Oak St
Excel Spreadsheet:
• Columns represent different attributes (e.g., Date, Sales, Product).
• Rows represent individual entries for each date.
SQL Database:
• Tables with predefined columns and data types.
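
The relational table above can be sketched with Python's built-in `sqlite3` module. The table name `people` and the column types are assumptions; the rows are the two records listed in the example.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Fixed schema: every row must fit these predefined columns and types.
conn.execute(
    "CREATE TABLE people ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER, address TEXT)"
)

# Insert the two example records using parameter placeholders.
conn.executemany(
    "INSERT INTO people (id, name, age, address) VALUES (?, ?, ?, ?)",
    [(1, "John Doe", 30, "123 Main St"), (2, "Jane Smith", 25, "456 Oak St")],
)

# Because the structure is known in advance, queries can filter by column.
older = conn.execute(
    "SELECT name FROM people WHERE age > ? ORDER BY id", (26,)
).fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]

print(older)
print(row_count)
```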
2. Semi-Structured Data
Definition: Falls between structured and unstructured data, having some
organizational elements but not adhering to a strict schema.

Examples:

1. JSON (JavaScript Object Notation):
    { "employee": { "id": "123", "name": "Alice", "department": "IT" } }

2. XML (eXtensible Markup Language):
    <book>
      <title>Harry Potter</title>
      <author>J.K. Rowling</author>
      <genre>Fantasy</genre>
    </book>

3. NoSQL Databases (e.g., MongoDB):
    Documents with flexible structures.
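
As a rough sketch, the JSON and XML examples above can be parsed with Python's standard library. The lookups for `email` and `isbn` are hypothetical fields added to show that a missing element is not an error when there is no strict schema.

```python
import json
import xml.etree.ElementTree as ET

employee_json = '{ "employee": { "id": "123", "name": "Alice", "department": "IT" } }'
book_xml = (
    "<book> <title>Harry Potter</title> <author>J.K. Rowling</author> "
    "<genre>Fantasy</genre> </book>"
)

# JSON parses into nested dictionaries; keys act as self-describing tags.
employee = json.loads(employee_json)["employee"]
name = employee["name"]

# No fixed schema: a field may simply be absent, so use .get() for safety.
email = employee.get("email")

# XML parses into a tree of elements that can be looked up by tag name.
book = ET.fromstring(book_xml)
title = book.findtext("title")
isbn = book.findtext("isbn")

print(name, email)
print(title, isbn)
```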
3. Unstructured Data
Definition: Lacks a predefined data model or structure; often in
the form of text, images, audio, or video.
Examples:
Text Documents:
• Word documents, PDFs, plain text files.
Images:
• JPEG, PNG, GIF files.
Audio:
• MP3, WAV, AAC files.
Video:
• MP4, AVI, MKV files.
Social Media Posts:
• Tweets, Facebook posts, Instagram updates.
Emails:
• Content of email messages.
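
Because unstructured text has no predefined fields, any structure has to be extracted from the content itself. The sketch below pulls hashtags and mentions out of a made-up social media post using regular expressions; the `post` text is hypothetical.

```python
import re

# Hypothetical social media post: free text with no predefined fields.
post = "Loving the new #DataScience course! Thanks @prof_rao #learning"

# Patterns impose a little structure on the raw text.
hashtags = re.findall(r"#(\w+)", post)
mentions = re.findall(r"@(\w+)", post)

# A crude whitespace split gives a word count.
word_count = len(post.split())

print(hashtags)
print(mentions)
print(word_count)
```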
Sources of Data
Sources of data in data science
1. Databases:

• Relational Databases: Structured data stored in relational database
management systems (RDBMS) such as MySQL, PostgreSQL, or SQL Server.

• NoSQL Databases: Data stored in non-relational databases such as
MongoDB, Cassandra, or Redis, which can handle unstructured
or semi-structured data.

2. APIs (Application Programming Interfaces):
• Access data from web APIs that provide a
structured way to interact with web services.
• Examples include the Twitter API, the Google Maps
API, and financial market APIs.
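
A typical API call builds a URL with query parameters, sends an HTTP request, and parses the JSON reply. The sketch below shows that pattern with the standard library only. The endpoint `api.example.com`, its parameters, and the `sample_body` payload are all hypothetical and do not describe any real service; `fetch_json` is defined but not called, so the snippet runs without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint, used only to illustrate the request pattern.
BASE_URL = "https://api.example.com/v1/quotes"

def build_url(base, **params):
    """Append URL-encoded query parameters to a base URL."""
    return f"{base}?{urlencode(params)}"

def fetch_json(url, timeout=10):
    """Send a GET request and decode the JSON body (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)

url = build_url(BASE_URL, symbol="ABC", limit=5)

# Stand-in for the body such a service might return.
sample_body = '{"symbol": "ABC", "prices": [101.5, 102.0, 100.75]}'
data = json.loads(sample_body)
latest_price = data["prices"][-1]

print(url)
print(latest_price)
```

Real APIs usually also require an access key and enforce rate limits, which their own documentation describes.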
3. Web Scraping:

• Extracting data from websites by parsing HTML and other
web page formats.
• Web scraping tools and libraries
(e.g., BeautifulSoup, Scrapy) are often used for this
purpose.
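
The parsing step of web scraping can be sketched as follows. To stay self-contained, this example uses Python's built-in `html.parser` rather than BeautifulSoup or Scrapy, and it parses an inline HTML string instead of downloading a page. The class `LinkCollector` and the `page` markup are made up for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside a link.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((text, self._current_href))
            self._current_href = None

# Hypothetical page content standing in for a downloaded web page.
page = """
<html><body>
  <h1>Courses</h1>
  <a href="/unit1">Unit 1</a>
  <a href="/unit2">Unit <b>2</b></a>
</body></html>
"""

collector = LinkCollector()
collector.feed(page)
collector.close()

print(collector.links)
```

Before scraping a real site, check its terms of use and its robots.txt file.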
