FDS Most Imp Question
Define Primary data- Primary data is data collected directly from original sources for a specific research purpose or analysis. It is firsthand data gathered through methods such as surveys, interviews, experiments, or observations, ensuring that it is original and not previously analyzed.
Define Interquartile range- The interquartile range (IQR) is a statistical measure of the
spread of a dataset. It is the range between the first quartile (Q1) and the third quartile (Q3)
and represents the middle 50% of the data. The formula is: IQR = Q3 − Q1. It helps identify the variability in the data and is used to detect outliers.
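As a sketch of the definition above, the IQR can be computed with NumPy (the dataset here is purely illustrative):

```python
import numpy as np

# Hypothetical sample dataset (illustrative values only)
data = [4, 7, 9, 11, 12, 20]

q1 = np.percentile(data, 25)   # first quartile
q3 = np.percentile(data, 75)   # third quartile
iqr = q3 - q1                  # interquartile range, Q3 - Q1

# A common outlier rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
print(iqr)
```

The 1.5 × IQR fences are the usual convention for marking outliers, as mentioned in the definition.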
What are uses of zip files- ZIP files are compressed archive files used to reduce the size of
files or collections of files for storage and transfer. The main uses include: Space-saving: Compressing files to take up less disk space. Ease of transfer: Reducing file size makes it faster and more efficient to share over the internet. Organization: Bundling multiple files or
folders into a single archive. Security: Optionally, ZIP files can be encrypted with a password
for added protection.
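The space-saving, transfer, and bundling uses above can be sketched with Python's standard `zipfile` module (file names are hypothetical):

```python
import os
import tempfile
import zipfile

# Work in a temporary directory so the sketch leaves no files behind
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "notes.txt")
with open(src, "w") as f:
    f.write("FDS revision notes")

# Bundle and compress the file into a single archive
archive = os.path.join(workdir, "archive.zip")
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.write(src, arcname="notes.txt")

# List the archive's contents and extract them back out
with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
    zf.extractall(os.path.join(workdir, "unzipped"))
print(names)
```

Password protection is also possible with third-party tools; the standard library can read, but not create, encrypted ZIP entries.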
What do you mean by XML Files data format-XML (eXtensible Markup Language) is a text-
based format used to store and transport structured data. It is both human-readable and
machine-readable, making it widely used for data representation and sharing between
systems. XML files are made up of elements, attributes, and nested structures that define
data hierarchies.
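A minimal sketch of elements, attributes, and nesting, parsed with the standard library (the XML document itself is invented for illustration):

```python
import xml.etree.ElementTree as ET

# A small, hypothetical XML document: <book> elements nested inside
# <library>, each carrying an "id" attribute and a <title> child
xml_text = """
<library>
    <book id="1">
        <title>Data Science Basics</title>
    </book>
    <book id="2">
        <title>Statistics 101</title>
    </book>
</library>
"""

root = ET.fromstring(xml_text)
titles = [book.find("title").text for book in root.findall("book")]
ids = [book.get("id") for book in root.findall("book")]
print(titles, ids)
```

Both element text and attribute values are recovered as plain strings, which is what makes XML easy to exchange between systems.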
What is visual encoding- Visual encoding refers to the process of representing data through
visual means such as charts, graphs, and diagrams. It involves translating data attributes
(e.g., values, categories) into visual elements (e.g., shapes, colors, positions) to facilitate
understanding and pattern recognition.
What is Data science- Data science is an interdisciplinary field that combines scientific
methods, processes, algorithms, and systems to extract insights and knowledge from
structured and unstructured data. It involves areas such as statistics, machine learning, data
analysis, and big data technologies to make informed decisions.
Define Data source- A data source is any origin from which data is obtained or collected for
analysis. Data sources can be primary (e.g., surveys, interviews, experiments) or secondary
(e.g., databases, research papers, and public datasets). Data sources can also include APIs,
sensors, or data dumps from various software.
Define Data cleaning- Data cleaning is the process of identifying and rectifying errors,
inconsistencies, inaccuracies, or missing values in a dataset to improve its quality. This
process can include tasks such as: Removing duplicates ,Correcting typos or formatting
issues , Handling missing values by imputation or deletion , Ensuring data consistency across
the dataset
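The cleaning tasks listed above (removing duplicates, correcting formatting, handling missing values) can be sketched with pandas, assuming a small invented dataset:

```python
import pandas as pd

# Hypothetical raw data with a formatting typo ("pune "), a missing
# value, and a duplicate row after cleaning
df = pd.DataFrame({
    "city": ["Pune", "pune ", "Delhi", "Delhi"],
    "sales": [100.0, 100.0, None, 250.0],
})

# Correct formatting issues: trim whitespace and normalize case
df["city"] = df["city"].str.strip().str.title()

# Handle missing values by imputing the column mean
df["sales"] = df["sales"].fillna(df["sales"].mean())

# Remove exact duplicate rows for consistency
df = df.drop_duplicates().reset_index(drop=True)
print(df)
```

After cleaning, the two "Pune" rows collapse into one and the missing sales figure is replaced by the mean of the observed values.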
What is data transformation- Data transformation is the process of converting data from
one format, structure, or value into another. This process is essential for cleaning, enriching,
and formatting data before analysis or integration with other data systems. It may involve
tasks such as normalization, aggregation, or data type conversion.
What is data discretization- Data discretization techniques reduce the number of values of a given continuous attribute by dividing the attribute's range into intervals. Interval labels can then be used in place of the actual data values. Replacing many values of a continuous attribute with a small number of interval labels reduces and simplifies the original data.
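Binning a continuous attribute into labelled intervals, as described above, can be sketched with `pandas.cut` (ages and interval labels are illustrative):

```python
import pandas as pd

# Hypothetical continuous attribute: ages of five people
ages = pd.Series([5, 17, 25, 42, 68])

# Divide the attribute's range into intervals and replace each value
# with its interval label
labels = pd.cut(ages, bins=[0, 18, 60, 100],
                labels=["young", "adult", "senior"])
print(list(labels))
```

Five distinct ages are reduced to just three interval labels, which is the simplification discretization aims for.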
What are missing values- Missing values are data values that are absent from a dataset, or values explicitly designated as user-missing. For example, you might want to distinguish between data that are missing because a respondent refused to answer and data that are missing because the question didn't apply to that respondent.
What is data quality-Data quality refers to the condition of a dataset, determined by factors
such as accuracy, completeness, consistency, reliability, and relevance. High-quality data is
essential for effective analysis and decision-making.
What is tag cloud-A tag cloud (also known as a word cloud or weighted list in visual design) is
a visual representation of text data which is often used to depict keyword metadata on
websites, or to visualize free form text. Tags are usually single words, and the importance of
each tag is shown with font size or color. When used as website navigation aids, the terms are
hyperlinked to items associated with the tag.
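The weighting step behind a tag cloud is just word-frequency counting; a minimal sketch with the standard library (the text is invented, and an actual cloud renderer would map these counts to font sizes):

```python
from collections import Counter

# Hypothetical free-form text to be turned into a tag cloud
text = "data science data analysis data visualization science"

# Count how often each tag (single word) appears
counts = Counter(text.split())

# The most frequent tags would be rendered with the largest fonts
tags = counts.most_common(2)
print(tags)
```

Here "data" (3 occurrences) would be drawn largest, "science" (2) next, and the remaining tags smallest.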
What are the different methods for measuring the data dispersion
The measures of dispersion that are measured and expressed in the units of the data themselves are called Absolute Measures of Dispersion, for example meters, dollars, kg, etc. Some absolute measures of dispersion are:
Range: It is defined as the difference between the largest and the smallest value in the distribution.
Mean Deviation: It is the arithmetic mean of the difference between the values and their
mean.
Standard Deviation: It is the square root of the arithmetic average of the square of the
deviations measured from the mean.
Variance: It is defined as the average of the square deviation from the mean of the given data
set.
Quartile Deviation: It is defined as half of the difference between the third quartile and the
first quartile in a given data set.
Interquartile Range: The difference between the upper (Q3) and lower (Q1) quartiles is called the Interquartile Range. Its formula is given as Q3 − Q1.
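The absolute measures above can be computed side by side with the standard library, assuming a small illustrative dataset:

```python
import statistics

# Hypothetical dataset whose mean is exactly 5
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)                       # Range
mean = statistics.mean(data)
mean_dev = sum(abs(x - mean) for x in data) / len(data)  # Mean deviation
variance = statistics.pvariance(data)                    # Population variance
std_dev = statistics.pstdev(data)                        # Standard deviation
print(data_range, mean_dev, variance, std_dev)
```

Note that the standard deviation (2.0) is the square root of the variance (4.0), exactly as the definitions state.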
What do you mean by Data attribute .Explain types of attributes with example
Data attributes refer to the specific characteristics or properties that describe individual data objects within a dataset. These attributes provide meaningful information about the objects and are used to analyze, classify, or manipulate the data. Understanding and analyzing data attributes is fundamental in various fields such as statistics, machine learning, and data analysis, as they form the basis for deriving insights and making informed decisions from the data.
Nominal Attributes: Nominal attributes, as related to names, refer to categorical data where the values represent different categories or labels without any inherent order or ranking.
Binary Attributes: Binary attributes are a type of qualitative attribute where the data can take
on only two distinct values or states.
Symmetric: In a symmetric attribute, both values or states are considered equally important
or interchangeable.
Asymmetric: An asymmetric attribute indicates that the two values or states are not equally
important or interchangeable.
Ordinal Attributes : Ordinal attributes are a type of qualitative attribute where the values
possess a meaningful order or ranking, but the magnitude between values is not precisely
quantified.
Explain 3V’s of data science-The 3 V's (volume, velocity and variety) are three defining
properties or dimensions of big data. Volume refers to the amount of data, velocity refers to
the speed of data processing, and variety refers to the number of types of data.
What is meant by Noisy data- Noisy data are data with a large amount of additional meaningless information in them, called noise. This includes data corruption, and the term is often used as a synonym for corrupt data. It also includes any data that a user system cannot understand and interpret correctly. Many systems, for example, cannot use unstructured text. Noisy data can adversely affect the results of any data analysis and skew conclusions if not handled properly. Statistical analysis is sometimes used to weed the noise out of noisy data.
Give the purpose of data preprocessing? Data preprocessing is a crucial step in data
mining and machine learning. It involves cleaning, transforming, and preparing raw data to
improve its quality and suitability for analysis. The main purposes of data preprocessing
include: Handling missing values: Imputing missing values or removing records with missing
data. Noise reduction: Identifying and removing noise or outliers. Data integration:
Combining data from multiple sources. Data transformation: Normalization, standardization, and feature engineering.
Data reduction: Dimensionality reduction and feature selection.
In a Venn diagram, the overlapping region represents the intersection of the two sets (in the example illustrated, {3, 4}). Venn diagrams are useful for understanding and visualizing set operations like union, intersection, and difference.
What do you mean by Data transformation? Explain strategies of data transformation.
Data transformation refers to the process of converting data from its original format or
structure into a different format or structure that is more suitable for analysis, reporting, or
further processing. This is a key step in data preprocessing and can involve several operations
that change the data's format, structure, or values to meet specific analytical needs. Data
transformation is often performed during the Extract, Transform, Load (ETL) process in data
engineering, especially when data from disparate sources needs to be integrated into a
centralized data warehouse or used for machine learning.
Data Cleaning (or Data Wrangling): Data cleaning is an essential part of data transformation.
Normalization: The process of adjusting the values in a dataset to a common scale, typically in a range like 0 to 1.
Aggregation: Aggregation is the process of combining data from multiple sources or records into a single summary value. This is often used in reporting or time-series data analysis.
Data Type Conversion: Data often needs to be converted to appropriate types to match the requirements of downstream processes.
Feature Engineering: Feature engineering is the process of creating new variables (features)
or modifying existing ones to improve the performance of a machine learning model or
analysis.
What is a quartile?
A quartile is a statistical measure that divides a dataset into four equal parts. There are three quartiles, and they are used to understand the distribution and variability of data.
First Quartile (Q1): The point below which the lowest 25% of the data fall.
Second Quartile (Q2): Also known as the median; the point below which 50% of the data fall.
Third Quartile (Q3): The point below which 75% of the data fall.
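Assuming a small illustrative dataset, the three quartiles can be computed directly with the standard library:

```python
import statistics

# Hypothetical dataset of seven ordered values
data = [1, 2, 3, 4, 5, 6, 7]

# quantiles() with n=4 returns the three cut points Q1, Q2, Q3
# (default "exclusive" method)
q1, q2, q3 = statistics.quantiles(data, n=4)
print(q1, q2, q3)
```

Q2 equals the median of the data, and Q3 − Q1 gives the interquartile range defined earlier in these notes.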
Types of Data (with Examples):
Structured Data: Organized into rows and columns (e.g., spreadsheets, databases).
Unstructured Data: Lacks a predefined format (e.g., text, images, videos, social media posts).
Semi-Structured Data: Contains some organizational properties (e.g., JSON, XML).
Qualitative Data: Non-numerical information (e.g., opinions, descriptions).
Quantitative Data: Numerical information (e.g., measurements, counts).
Metadata: Data about data (e.g., file size, author name).
Transformation Strategies
Log Transformation: Reduces skewness and handles large-scale differences in data.
Square Root Transformation: Reduces skewness for positive data.
Z-Score Transformation: Standardizes data using z-scores.
Scaling: Ensures all data is on the same scale for analysis.
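The four strategies above can be sketched in a few lines of standard-library Python (the skewed dataset is invented for illustration):

```python
import math
import statistics

# Hypothetical positively skewed data spanning several orders of magnitude
data = [1, 10, 100, 1000]

# Log transformation: compresses large-scale differences
log_t = [math.log10(x) for x in data]

# Square root transformation: milder skew reduction for positive data
sqrt_t = [math.sqrt(x) for x in data]

# Z-score transformation: standardizes to mean 0, unit variance
mu = statistics.mean(data)
sigma = statistics.pstdev(data)
z_t = [(x - mu) / sigma for x in data]

# Min-max scaling: puts all values on a common 0-1 scale
lo, hi = min(data), max(data)
scaled = [(x - lo) / (hi - lo) for x in data]
print(log_t, scaled)
```

After the log transform the values are evenly spaced (0 to 3), showing how skew is removed; after scaling they all lie in [0, 1].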
List different types of attributes. Attributes can be broadly categorized into two types:
Categorical Attributes:
Nominal: No inherent order (e.g., color, gender).
Ordinal: Has a natural order (e.g., low, medium, high).
Numerical Attributes:
Discrete: Countable values (e.g., number of children).
Continuous: Infinitely many possible values (e.g., height, weight).
State the methods of feature selection.There are several methods for feature selection in
machine learning:
-Filter Methods: Statistical measures like correlation, chi-square test, and information gain
are used to rank features.
-Wrapper Methods: Algorithms like forward selection, backward elimination, and recursive
feature elimination evaluate subsets of features.
-Embedded Methods: Feature selection is integrated into the model building process, such as regularization techniques like L1 and L2 regularization.
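A minimal sketch of a filter method: rank features by their absolute correlation with the target. The data is synthetic (the target is built to depend mainly on `x1`), and a real pipeline would typically use a library such as scikit-learn:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)                      # informative feature
x2 = rng.normal(size=n)                      # irrelevant feature
y = 3 * x1 + rng.normal(scale=0.1, size=n)   # target depends on x1

# Filter method: score each feature by |Pearson correlation| with y
features = {"x1": x1, "x2": x2}
scores = {name: abs(np.corrcoef(col, y)[0, 1])
          for name, col in features.items()}

# Keep the highest-scoring feature(s)
best = max(scores, key=scores.get)
print(scores, best)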
Structured data vs. unstructured data, on the basis of Nature:
Structured data is quantitative, i.e., it consists of hard numbers or things that can be counted.
Unstructured data is qualitative, as it cannot be processed and analyzed using conventional tools.
What is nominal attribute?-Nominal attributes are categorical data where the values
represent different categories or labels without any inherent order or ranking. Examples
include gender, color, or country.
Explain two methods of data cleaning for missing values.
Deletion:
Listwise deletion: Removes entire records with missing values.
Pairwise deletion: Excludes cases with missing values only for specific analyses.
Simple to implement but can lead to loss of information.
Imputation:
Mean/Median/Mode Imputation: Replaces missing values with the mean, median, or
mode of the respective variable.
Regression Imputation: Predicts missing values using regression models.
Hot Deck Imputation: Replaces missing values with values from similar records.
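Deletion and mean imputation, the two simplest strategies above, can be sketched with pandas (the series is illustrative):

```python
import pandas as pd

# Hypothetical column with two missing values
s = pd.Series([10.0, None, 30.0, None, 50.0])

# Deletion: drop records with missing values (loses information)
deleted = s.dropna()

# Mean imputation: replace missing values with the column mean
imputed = s.fillna(s.mean())
print(list(imputed))
```

Deletion shrinks the dataset from five values to three, while imputation keeps all five rows at the cost of reduced variance, which is the trade-off between the two methods.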