Data Science Sample

The document contains a comprehensive set of questions and tasks related to data science, covering topics such as data classification, statistical analysis, data manipulation using NumPy and Pandas, and the creation of frequency tables. It also includes practical exercises on data handling, including joins, aggregations, and handling missing values. The questions are designed to assess understanding and application of data science concepts in real-world scenarios.

Uploaded by

santhoshkrishnan.r.it.28

DATA SCIENCE

TWO MARKS:

1. Define Data Science and explain its benefits and uses in real-world applications.

2. What are Frequency Tables and Contingency Tables?

3. A dataset contains information on employee Salaries and Years of Experience. Use a NumPy
array to classify the data based on experience level (e.g., 0-5 years, 6-10 years, 11+ years). Also,
show how to cross-classify data by experience level and salary range.

4. You have a NumPy array of daily sales data for a company over a week: [100, 200, 150, 175,
125, 210, 180]. Calculate the mean, median, and standard deviation of the sales.
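
One possible way to approach question 4 is sketched below. The question does not say whether the population or the sample standard deviation is wanted, so both are computed; the variable names are illustrative only.

```python
import numpy as np

# Daily sales for one week, taken from question 4
daily_sales = np.array([100, 200, 150, 175, 125, 210, 180])

mean_sales = daily_sales.mean()          # arithmetic mean
median_sales = np.median(daily_sales)    # middle value of the sorted data

# NumPy defaults to the population formula (ddof=0);
# pass ddof=1 for the sample standard deviation.
std_population = daily_sales.std()
std_sample = daily_sales.std(ddof=1)

print(f"mean={mean_sales:.2f}, median={median_sales}, "
      f"std(pop)={std_population:.2f}, std(sample)={std_sample:.2f}")
```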

5. Given the following NumPy array representing the temperature data for a week in two cities:
[22.5, 24.3, 21.8, 19.6], [25.3, 27.1, 26.9, 24.8]. Use reshape, flatten, and ravel on this array.
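
A minimal sketch for question 5, assuming the two lists form the rows of a 2 x 4 array. The target shape (4, 2) is an arbitrary choice for illustration.

```python
import numpy as np

# Rows are the two cities, columns are the recorded days (question 5)
temps = np.array([[22.5, 24.3, 21.8, 19.6],
                  [25.3, 27.1, 26.9, 24.8]])

reshaped = temps.reshape(4, 2)   # same 8 values arranged as 4 rows x 2 columns
flat_copy = temps.flatten()      # always returns a new 1D copy
flat_view = temps.ravel()        # returns a view when the memory layout allows it

# For a contiguous array such as temps, ravel shares memory and flatten does not
print(np.shares_memory(flat_view, temps))
print(np.shares_memory(flat_copy, temps))
```

The practical consequence is that writing into the result of ravel() can change the original array, while writing into the result of flatten() cannot.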

6. Given the following dataset:

Name | Age | Gender
John | 25 | Male
Alice| 30 | Female
Bob | 22 | Male
Eve | 35 | Female
➢ Create a DataFrame from this data.
➢ Demonstrate how to set a custom index based on the "Name" column.
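
The two tasks in question 6 could be sketched as follows. The names people and people_by_name are illustrative.

```python
import pandas as pd

# Data from the table in question 6
people = pd.DataFrame({
    "Name": ["John", "Alice", "Bob", "Eve"],
    "Age": [25, 30, 22, 35],
    "Gender": ["Male", "Female", "Male", "Female"],
})

# set_index returns a new DataFrame whose row labels come from "Name"
people_by_name = people.set_index("Name")

# Rows can now be looked up by name
print(people_by_name.loc["Alice"])
```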

7. Explain the concept of broadcasting in element-wise operations with an example.
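
As an illustration for question 7, the sketch below uses small made-up arrays (not part of the question paper) to show how NumPy stretches a lower-dimensional operand to match the other one.

```python
import numpy as np

# Hypothetical example values chosen only to make the result easy to read
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])        # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,)
col = np.array([[100], [200]])        # shape (2, 1)

row_sum = matrix + row    # row is applied to every row of matrix
col_sum = matrix + col    # col is applied to every column of matrix
doubled = matrix * 2      # a scalar is broadcast to every element

print(row_sum)
print(col_sum)
```

Shapes are compared from the trailing dimension backwards; two dimensions are compatible when they are equal or when one of them is 1.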

9. Outline the difference between DataFrames and Series in Pandas. How do these two
structures facilitate data analysis?

10. A DataFrame df has an index of EmployeeID and a column Salary. The data for the
DataFrame is as follows: EmployeeID: [102, 101, 104, 103], Salary: [50000, 60000, 45000,
55000]. The task is to create a DataFrame with this data and then sort the DataFrame by the
EmployeeID index in ascending order.
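
A short sketch for question 10, assuming EmployeeID is used directly as the row index.

```python
import pandas as pd

# Data from question 10, with EmployeeID as the index
df = pd.DataFrame(
    {"Salary": [50000, 60000, 45000, 55000]},
    index=pd.Index([102, 101, 104, 103], name="EmployeeID"),
)

# sort_index sorts by the row labels; ascending order is the default
sorted_df = df.sort_index()
print(sorted_df)
```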

THIRTEEN MARKS:

11. a) An e-commerce platform collects data on its customers, their shopping behavior, and
feedback to improve user experience. The data includes factors such as product preferences,
purchase time, and customer satisfaction ratings. Below is a snapshot of the collected data from a
week of transactions:
Order ID | Customer Age | Gender | Product Category | Order Time | Price (₹) | Payment Method | Customer Rating (Out of 5) | Delivery Time (mins)
101 | 25 | Male | Electronics | 14:30 | 15,000 | Credit Card | 4.5 | 45
102 | 30 | Female | Clothing | 18:15 | 2,500 | UPI | 3.8 | 30
103 | 22 | Male | Groceries | 09:45 | 800 | Cash | 4.2 | 25
104 | 27 | Female | Home Decor | 12:00 | 4,200 | Debit Card | 4 | 40

I. Identify the types of variables present in the dataset. (3M)

II. Provide examples of two qualitative and two quantitative variables from the dataset and explain
how they differ. (4M)

b) Differentiate between Data Science, Machine Learning, and Artificial Intelligence. (3m)

List and explain the major applications of Data Science in different sectors.(3m)

OR
12. A supermarket chain wants to analyze the daily sales volume (number of items sold) across its
stores over 30 days. The data collected represents the number of items sold per day at a specific
store.
Tasks:
➢ Create a frequency table for the given daily sales data by grouping it into appropriate class
intervals (e.g., 0-50, 51-100, etc.). (3m)
➢ Determine the modal class (the class interval with the highest frequency). (3m)
➢ Calculate the cumulative frequency for each class interval. (3m)
➢ Find the relative frequency (percentage of total days falling into each class interval). (2m)
➢ Interpret the results: What do the frequency patterns suggest about sales performance? (2m)
➢ Given Data (Number of Items Sold per Day in 30 Days): 45, 78, 120, 60, 55, 89, 100, 130, 40,
75, 150, 200, 110, 95, 125, 140, 50, 85, 160, 175, 95, 105, 115, 70, 90, 180, 135, 145, 80, 155
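
The tasks in question 12 could be worked through with Pandas as sketched below. The class width of 50 follows the example intervals in the question; other groupings are equally valid and would give different counts.

```python
import pandas as pd

# Items sold per day over 30 days (question 12)
sales = [45, 78, 120, 60, 55, 89, 100, 130, 40, 75,
         150, 200, 110, 95, 125, 140, 50, 85, 160, 175,
         95, 105, 115, 70, 90, 180, 135, 145, 80, 155]

# Assumed class intervals; each bin includes its upper edge (0-50, 51-100, ...)
bins = [0, 50, 100, 150, 200]
labels = ["0-50", "51-100", "101-150", "151-200"]

classes = pd.cut(pd.Series(sales), bins=bins, labels=labels)
freq = classes.astype(str).value_counts().reindex(labels)

table = pd.DataFrame({"Frequency": freq})
table["Cumulative"] = table["Frequency"].cumsum()
table["Relative (%)"] = (table["Frequency"] / table["Frequency"].sum() * 100).round(2)

modal_class = table["Frequency"].idxmax()   # interval with the highest frequency
print(table)
print("Modal class:", modal_class)
```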

13. a) Explain the process of creating a frequency table for qualitative data. (5m)
b) A university conducted a survey among students to understand their preferred mode of learning.
The students were asked:
"Which learning method do you prefer the most?"
The responses were categorized as: Online Classes, In-Person Classes, Hybrid Learning (Both
Online & In-Person)
Tasks:
➢ Construct a frequency table displaying the number of students choosing each mode. (2M)
➢ Find the relative frequency (percentage) for each category. (2M)
➢ Determine the most preferred and least preferred learning mode. (2M)
➢ Recommend strategies for the university based on the findings. (2M)
OR
14. a) Discuss the various measures of dispersion and explain their significance. The following are
the test scores of 10 students in a class: 55, 60, 65, 70, 75, 80, 85, 90, 95, 100.
Calculate the range, variance, and standard deviation of the scores. Explain what these measures
reveal about the variability in the students' performance.(7m)
b) Explain the different measures of central tendency. A company records the number of products
sold in a week: 100, 200, 150, 300, 250, 100, 200. Calculate the mean, median, and mode of the
sales data. Interpret what these values suggest about the sales pattern. (6m)
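
The calculations in question 14 can be checked with a sketch like the one below. The question does not state whether population or sample formulas are expected, so both variances are shown.

```python
import statistics
import numpy as np

# Part a): test scores of 10 students
scores = np.array([55, 60, 65, 70, 75, 80, 85, 90, 95, 100])

score_range = scores.max() - scores.min()
var_population = scores.var()          # divides by n
var_sample = scores.var(ddof=1)        # divides by n - 1
std_population = scores.std()

# Part b): products sold in a week
weekly_sales = [100, 200, 150, 300, 250, 100, 200]

mean_sales = statistics.mean(weekly_sales)
median_sales = statistics.median(weekly_sales)
# multimode returns every value that shares the highest count,
# which matters here because the data has more than one mode
modes = statistics.multimode(weekly_sales)

print(score_range, var_population, var_sample, std_population)
print(mean_sales, median_sales, modes)
```
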
15. a) A store records monthly sales data for different products across 3 months. The data is
structured as follows:
Product 1: [1500, 1600, 1700]
Product 2: [2000, 2100, 2200]
Product 3: [1800, 1900, 2000]
(a) Create a 2D NumPy array from the given sales data.(2 Marks)
(b) Using aggregation functions, calculate:
The total sales for each month (column-wise sum).
The average sales for each product (row-wise mean).
The standard deviation of sales for each product.(3 Marks)
(c) Use np.ravel() and np.flatten() on the reshaped array to convert it into a 1D array. Explain the
difference between the two functions.(2 Marks)
b) What are structured arrays in NumPy? How are they defined and used? Discuss how to access
and modify elements in a structured array. Provide examples of creating structured arrays and
accessing specific fields.(6m)
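
For part b) of question 15, a structured array could be created and accessed as sketched below. The field names and the employee records are hypothetical and serve only as an illustration.

```python
import numpy as np

# Each record has a name (up to 10 characters), an integer age and a float salary
employee_dtype = np.dtype([("name", "U10"), ("age", "i4"), ("salary", "f8")])

# Hypothetical records, not taken from the question paper
employees = np.array(
    [("Alice", 30, 60000.0), ("Bob", 25, 50000.0), ("Eve", 35, 70000.0)],
    dtype=employee_dtype,
)

ages = employees["age"]            # access one field across all records
first = employees[0]               # access one whole record
employees["salary"][1] = 52000.0   # modify a single element of a field
employees["age"] += 1              # modify a whole field at once

print(employees)
print(first["name"], ages)
```
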
OR
16. i. A company records the salaries of its employees across 5 departments. The salary data for
each department (in thousands) is given as:
Department 1: [400, 450, 500, 550]
Department 2: [420, 480, 520, 560]
Department 3: [390, 430, 470, 510]
Department 4: [410, 460, 510, 550]
Department 5: [430, 480, 530, 570]
(a) Create a 2D NumPy array for the salary data of the employees in all 5 departments.(2 M)
(b) Perform the following:
➢ Calculate the average salary for each department (row-wise mean).
➢ Find the highest salary in each department (row-wise max).
➢ Calculate the total salary of all employees in each department.(3 M)
(c) Use fancy indexing to extract the salary data of employees from Department 2 and Department
4. (2 M)
(d) Use logical operations to find employees in any department who earn more than 500.
(2 M)
ii. Explain element-wise comparison operations in NumPy and provide an example. (4m)
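
Question 16 could be approached as sketched below. Rows are departments and columns are employees; the comparison in part (d) is also an instance of the element-wise comparison asked about in part ii.

```python
import numpy as np

# Salaries in thousands; one row per department (question 16)
salaries = np.array([
    [400, 450, 500, 550],
    [420, 480, 520, 560],
    [390, 430, 470, 510],
    [410, 460, 510, 550],
    [430, 480, 530, 570],
])

dept_means = salaries.mean(axis=1)    # row-wise mean
dept_max = salaries.max(axis=1)       # row-wise maximum
dept_totals = salaries.sum(axis=1)    # row-wise total

# Fancy indexing: Department 2 and Department 4 are rows 1 and 3 (zero-based)
dept_2_and_4 = salaries[[1, 3]]

# Element-wise comparison yields a boolean array of the same shape
above_500 = salaries > 500
high_salaries = salaries[above_500]       # the matching salary values
positions = np.argwhere(above_500)        # (department row, employee column) pairs

print(dept_means, dept_max, dept_totals)
print(dept_2_and_4)
print(high_salaries)
```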

17. a) Explain the different ways of creating DataFrames. Discuss the primary syntax and functions
used to manipulate data in Pandas, such as df.head(), df.describe(), and df.info(). (7M)
b) You are working with product sales data for an online store. The dataset contains the following
columns:
{"Product Name": ["Product A", "Product B", "Product C", "Product D"], "Category":
["Electronics", "Furniture", "Electronics", "Furniture"], "Units Sold": [100, 200, 150, 300], "Price":
[200, 150, 180, 250]}
(a) Create a Pandas DataFrame from the given data and add a new column called Total Sales. Use
vectorized operations to calculate the total sales.(2 M)
(b) Select and display the products in the Electronics category using label-based indexing.(2M)
(c) Use position-based indexing to select the second and third products (based on their row
positions). (2 M)
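
A possible sketch for part b) of question 17. Here "label-based indexing" is interpreted as .loc with a boolean mask, which is one of several reasonable readings.

```python
import pandas as pd

# Data from question 17 b)
df = pd.DataFrame({
    "Product Name": ["Product A", "Product B", "Product C", "Product D"],
    "Category": ["Electronics", "Furniture", "Electronics", "Furniture"],
    "Units Sold": [100, 200, 150, 300],
    "Price": [200, 150, 180, 250],
})

# (a) Vectorized multiplication of two whole columns, no explicit loop
df["Total Sales"] = df["Units Sold"] * df["Price"]

# (b) Label-based selection with .loc and a boolean condition
electronics = df.loc[df["Category"] == "Electronics"]

# (c) Position-based selection: rows at positions 1 and 2 (second and third)
second_and_third = df.iloc[1:3]

print(df)
print(electronics)
print(second_and_third)
```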

OR
18. a) What is hierarchical indexing in Pandas? How do you create a multi-level index in a Pandas
DataFrame? Provide an example. (7m)
b) Consider a company that tracks employee salaries based on their departments and job
positions. The dataset has a multi-level index where the first level is the department and the
second level is the job position. The dataset includes employee names and their salaries.
"Employee Name": ["Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Hank"],
"Salary": [60000, 50000, 70000, 80000, 55000, 75000, 85000, 65000],
"Department": ["HR", "IT", "IT", "HR", "Finance", "Finance", "IT", "HR"],
"Job Position": ["Manager", "Developer", "Developer", "Assistant", "Manager", "Assistant",
"Developer", "Manager"]
i. Create a DataFrame with the employee salary data and hierarchical indexing based on the
Department and Job Position.
ii. Using the hierarchical index, find the salary of the employee working in the IT department as
a Developer.
iii. Extract all employee data from the HR department and show the salary of all job positions in
that department.
iv. Calculate the average salary for each Job Position across all departments.
v. Reindex the DataFrame so that Job Position is the primary index and Department is the
secondary index. Provide the new DataFrame. (6M)
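
The five tasks in question 18 b) could be sketched as follows. Note that three employees in the data are IT Developers, so task ii returns several rows rather than a single salary.

```python
import pandas as pd

# Data from question 18 b)
data = {
    "Employee Name": ["Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Hank"],
    "Salary": [60000, 50000, 70000, 80000, 55000, 75000, 85000, 65000],
    "Department": ["HR", "IT", "IT", "HR", "Finance", "Finance", "IT", "HR"],
    "Job Position": ["Manager", "Developer", "Developer", "Assistant",
                     "Manager", "Assistant", "Developer", "Manager"],
}

# i. Two-level index; sorting it keeps lookups efficient
df = pd.DataFrame(data).set_index(["Department", "Job Position"]).sort_index()

# ii. All rows for IT / Developer
it_developers = df.loc[("IT", "Developer"), :]

# iii. Everything in HR (the remaining index level is Job Position)
hr = df.loc["HR"]

# iv. Average salary per job position across departments
avg_by_position = df.groupby(level="Job Position")["Salary"].mean()

# v. Job Position as the primary level, Department as the secondary level
swapped = df.swaplevel().sort_index()

print(it_developers)
print(hr)
print(avg_by_position)
print(swapped)
```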

19. a) Describe the different types of joins available in Pandas. Explain each type with an
example. (8m)
b) Two datasets are provided containing product sales data for January and February 2024. The
first dataset includes the product details for January, such as Product ID, Product Name, Quantity
Sold, Price, and Order Date. Similarly, the second dataset contains the same information for
February. The task is to concatenate these two datasets into a single DataFrame using Pandas'
concat() function. After concatenating, it is important to address how the index is handled—
whether it resets automatically or needs manual adjustment. Use sample data to implement the
same. (5m)
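
Since question 19 b) supplies no data, the sketch below uses small hypothetical January and February tables with the columns named in the question. It shows that concat() keeps the original row labels unless ignore_index=True is passed.

```python
import pandas as pd

# Hypothetical sample data; the question paper does not provide actual rows
january = pd.DataFrame({
    "Product ID": [1, 2],
    "Product Name": ["Laptop", "Phone"],
    "Quantity Sold": [3, 5],
    "Price": [1000, 800],
    "Order Date": ["2024-01-05", "2024-01-20"],
})
february = pd.DataFrame({
    "Product ID": [1, 3],
    "Product Name": ["Laptop", "Table"],
    "Quantity Sold": [2, 4],
    "Price": [1000, 200],
    "Order Date": ["2024-02-03", "2024-02-14"],
})

# Default behaviour: each frame keeps its own labels, so 0 and 1 appear twice
kept_index = pd.concat([january, february])

# ignore_index=True builds a fresh 0..n-1 index
fresh_index = pd.concat([january, february], ignore_index=True)

# Equivalent manual adjustment after the fact
reset_later = kept_index.reset_index(drop=True)

print(kept_index.index.tolist())
print(fresh_index.index.tolist())
```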

20. Explain the different set operations using pandas. A company maintains employee records
with skillsets and training sessions attended across two departments: Sales and Marketing. The
data includes employee IDs, names, skills, and training programs. HR wants to analyze the skills
distribution and identify training gaps.
Dataset 1: Sales Department Skills
Employee ID | Employee Name | Skills | Training Attended
1001 | John | Sales, Negotiation, Excel | Negotiation Skills Workshop
1002 | Alice | Sales, CRM, Excel | CRM Basics Training
1003 | Steve | Sales, Communication, Excel | Sales Excellence Program
1004 | Nancy | Sales, Presentation, Excel | Presentation Skills Training
1005 | Mark | Sales, Negotiation | Sales Mastery Workshop
Dataset 2: Marketing Department Skills
Employee ID | Employee Name | Skills | Training Attended
2001 | Sarah | Marketing, SEO, Excel | SEO Optimization Training
2002 | Michael | Marketing, Content Creation, Excel | Content Marketing Workshop
2003 | Linda | Marketing, Social Media, Excel | Social Media Strategy Course
2004 | Greg | Marketing, SEO | SEO Mastery Program
2005 | Emma | Marketing, Content Creation | Content Marketing Workshop

➢ Use the union operation to combine skills from both departments.

➢ Identify common skills using the intersection operation.
➢ Find department-specific skills using set difference.
➢ Analyze training gaps by comparing skills and training attended.
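
The first three tasks of question 20 could be sketched with pandas Index set methods as below. The helper unique_skills is a hypothetical name, and only the Skills column is reproduced; the training-gap analysis is not covered here because the question does not define how skills map to training programs.

```python
import pandas as pd

# Skills columns from the two tables in question 20
sales = pd.DataFrame({"Skills": [
    "Sales, Negotiation, Excel",
    "Sales, CRM, Excel",
    "Sales, Communication, Excel",
    "Sales, Presentation, Excel",
    "Sales, Negotiation",
]})
marketing = pd.DataFrame({"Skills": [
    "Marketing, SEO, Excel",
    "Marketing, Content Creation, Excel",
    "Marketing, Social Media, Excel",
    "Marketing, SEO",
    "Marketing, Content Creation",
]})

def unique_skills(frame):
    """Split the comma-separated skills and return the distinct values as an Index."""
    return pd.Index(frame["Skills"].str.split(", ").explode().unique())

sales_skills = unique_skills(sales)
marketing_skills = unique_skills(marketing)

all_skills = sales_skills.union(marketing_skills)             # union
common_skills = sales_skills.intersection(marketing_skills)   # intersection
sales_only = sales_skills.difference(marketing_skills)        # set difference
marketing_only = marketing_skills.difference(sales_skills)

print(list(all_skills))
print(list(common_skills))
print(list(sales_only))
print(list(marketing_only))
```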

FIFTEEN MARKS:
21. a) Discuss the various strategies to handle missing values in Pandas. How do the methods
.fillna(), .dropna(), and .isna() help in this context? (5M)
b) The following dataset provides sales information from an e-commerce platform. The goal is to
analyze the data and perform several data operations using Pandas.
Order ID | Product Name | Category | Price | Quantity Sold | Discount Applied | Order Date | Customer Age | Order Status
1 | Laptop | Electronics | 1000 | 1 | 10 | 01-02-2024 | 25 | Completed
2 | Table | Furniture | 200 | 2 | 15 | 02-02-2024 | 34 | Cancelled
3 | Shirt | Apparel | 50 | 5 | 0 | 03-02-2024 | 45 | Completed
4 | Phone | Electronics | 800 | 1 | 5 | 04-02-2024 | 23 | Pending
5 | Shoes | Apparel | 100 | 3 | 20 | 05-02-2024 | 30 | Completed

(a) Handle missing data by filling missing Customer Age with the mean age of customers.
(b) Drop any rows where Order Status is missing.
(c) Calculate the Total Sales for each product, considering the discount applied.
(d) Create a new column called Discounted Price after applying the discount to the original price.
(e) Calculate the Total Revenue of the platform by summing up all Total Sales values.
(f) Group the data by Category and calculate the total sales and profit for each category. (10M)
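
A sketch for question 21 b) follows, under two assumptions that the question does not state: Discount Applied is a percentage, and every order counts towards revenue regardless of its status. The table as given has no missing values, so the fillna and dropna steps leave it unchanged; profit is omitted because no cost data is provided.

```python
import pandas as pd

# Data from the table in question 21 b)
df = pd.DataFrame({
    "Order ID": [1, 2, 3, 4, 5],
    "Product Name": ["Laptop", "Table", "Shirt", "Phone", "Shoes"],
    "Category": ["Electronics", "Furniture", "Apparel", "Electronics", "Apparel"],
    "Price": [1000, 200, 50, 800, 100],
    "Quantity Sold": [1, 2, 5, 1, 3],
    "Discount Applied": [10, 15, 0, 5, 20],   # assumed to be a percentage
    "Order Date": ["01-02-2024", "02-02-2024", "03-02-2024", "04-02-2024", "05-02-2024"],
    "Customer Age": [25, 34, 45, 23, 30],
    "Order Status": ["Completed", "Cancelled", "Completed", "Pending", "Completed"],
})

# (a) Fill missing ages with the mean age
df["Customer Age"] = df["Customer Age"].fillna(df["Customer Age"].mean())

# (b) Drop rows where Order Status is missing
df = df.dropna(subset=["Order Status"])

# (d) Price after applying the percentage discount
df["Discounted Price"] = df["Price"] * (100 - df["Discount Applied"]) / 100

# (c) Total sales per order line, taking the discount into account
df["Total Sales"] = df["Discounted Price"] * df["Quantity Sold"]

# (e) Total revenue across all orders
total_revenue = df["Total Sales"].sum()

# (f) Total sales per category
sales_by_category = df.groupby("Category")["Total Sales"].sum()

print(df[["Product Name", "Discounted Price", "Total Sales"]])
print(total_revenue)
print(sales_by_category)
```
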
OR
22. a) Explain the concept of GroupBy in Pandas. How does it help in data aggregation and
transformation?
b) What is a pivot table? How can it be used to summarize data by rearranging and aggregating it?
Provide an example where a pivot table would be useful.
c) A company has sales data for the past year, including the region, product category, and monthly
sales figures. The management wants to analyze the total sales, average sales, and monthly growth
for different regions and product categories.

Dataset:
Region | Product Category | Month | Sales Amount
North | Electronics | Jan | 15000
South | Furniture | Jan | 20000
North | Electronics | Feb | 17000
South | Furniture | Feb | 19000
East | Clothing | Jan | 10000
East | Clothing | Feb | 15000
West | Electronics | Jan | 12000
West | Clothing | Jan | 11000
Tasks:
➢ GroupBy: Group the data by Region and calculate the total sales and average sales for
each region.
➢ Pivot Table: Create a pivot table that summarizes sales amount by Region and Product
Category. Show the total sales for each combination of Region and Product Category.
➢ Monthly Growth: Calculate the monthly growth of sales in each region (from January to
February) using the mean sales value.
➢ Based on the analysis, identify the top-performing region and product category in terms of
total sales.
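
The tasks in question 22 c) could be sketched as below. Monthly growth is interpreted here as the percentage change in mean sales from January to February; West has no February rows in the data, so its growth comes out as missing.

```python
import pandas as pd

# Data from the table in question 22 c)
df = pd.DataFrame({
    "Region": ["North", "South", "North", "South", "East", "East", "West", "West"],
    "Product Category": ["Electronics", "Furniture", "Electronics", "Furniture",
                         "Clothing", "Clothing", "Electronics", "Clothing"],
    "Month": ["Jan", "Jan", "Feb", "Feb", "Jan", "Feb", "Jan", "Jan"],
    "Sales Amount": [15000, 20000, 17000, 19000, 10000, 15000, 12000, 11000],
})

# GroupBy: total and average sales per region
region_stats = df.groupby("Region")["Sales Amount"].agg(["sum", "mean"])

# Pivot table: total sales for each Region / Product Category combination
pivot = df.pivot_table(index="Region", columns="Product Category",
                       values="Sales Amount", aggfunc="sum", fill_value=0)

# Monthly growth: mean sales per region and month, then Jan -> Feb change in percent
monthly_mean = df.pivot_table(index="Region", columns="Month",
                              values="Sales Amount", aggfunc="mean")
growth_pct = (monthly_mean["Feb"] - monthly_mean["Jan"]) / monthly_mean["Jan"] * 100

# Top performers by total sales
top_region = region_stats["sum"].idxmax()
top_category = df.groupby("Product Category")["Sales Amount"].sum().idxmax()

print(region_stats)
print(pivot)
print(growth_pct)
print(top_region, top_category)
```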
