Data Science Sample

The document contains a comprehensive set of questions and tasks related to data science, covering topics such as data classification, statistical analysis, data manipulation using NumPy and Pandas, and the creation of frequency tables. It also includes practical exercises on data handling, including joins, aggregations, and handling missing values. The questions are designed to assess understanding and application of data science concepts in real-world scenarios.

DATA SCIENCE

TWO MARKS:

1. Define Data Science and explain its benefits and uses in real-world applications.

2. What are Frequency Tables and Contingency Tables?

3. A dataset contains information on employee Salaries and Years of Experience. Use a NumPy
array to classify the data based on experience level (e.g., 0-5 years, 6-10 years, 11+ years). Also,
show how to cross-classify data by experience level and salary range.

4. You have a NumPy array of daily sales data for a company over a week: [100, 200, 150, 175,
125, 210, 180]. Calculate the mean, median, and standard deviation of the sales.
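One possible approach to Question 4, as a minimal sketch. It assumes the population standard deviation is wanted (NumPy's default, ddof=0); pass ddof=1 to std() for the sample standard deviation instead. The variable names are illustrative.

```python
import numpy as np

# Daily sales for one week, as given in the question
sales = np.array([100, 200, 150, 175, 125, 210, 180])

mean_sales = sales.mean()        # arithmetic mean
median_sales = np.median(sales)  # middle value of the sorted data
std_sales = sales.std()          # population standard deviation (ddof=0)

print(f"Mean: {mean_sales:.2f}, Median: {median_sales}, Std: {std_sales:.2f}")
```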

5. Given the following NumPy array representing the temperature data for a week in two cities:
[22.5, 24.3, 21.8, 19.6], [25.3, 27.1, 26.9, 24.8]. Use reshape, flatten, and ravel on this array.
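A short sketch for Question 5, assuming the two lists are the rows of one 2 x 4 array. The target shape (4, 2) is an arbitrary choice for illustration; any shape with 8 elements would work.

```python
import numpy as np

# One row per city, one column per recorded day
temps = np.array([[22.5, 24.3, 21.8, 19.6],
                  [25.3, 27.1, 26.9, 24.8]])   # shape (2, 4)

reshaped = temps.reshape(4, 2)   # same 8 values laid out as 4 rows x 2 columns
flat_copy = temps.flatten()      # always returns a new 1D copy
flat_view = temps.ravel()        # returns a view when possible (it is here)

flat_copy[0] = 0.0               # changes only the copy
flat_view[1] = 0.0               # writes through to temps[0, 1]

print(temps)
```

The last two assignments show the practical difference: writing to the flatten() result leaves the original untouched, while writing to the ravel() result changes it.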

6. Given the following dataset:


Name | Age | Gender
John | 25 | Male
Alice | 30 | Female
Bob | 22 | Male
Eve | 35 | Female
➢ Create a DataFrame from this data.
➢ Demonstrate how to set a custom index based on the "Name" column.
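A possible answer to Question 6, sketched with Pandas. The variable names df and df_by_name are illustrative.

```python
import pandas as pd

data = {
    "Name": ["John", "Alice", "Bob", "Eve"],
    "Age": [25, 30, 22, 35],
    "Gender": ["Male", "Female", "Male", "Female"],
}

df = pd.DataFrame(data)             # default integer index 0..3
df_by_name = df.set_index("Name")   # returns a new DataFrame indexed by Name

# Rows can now be looked up by name
print(df_by_name.loc["Alice"])
```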

7. Explain the concept of broadcasting in element-wise operations with an example.
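For Question 7, a small illustration of broadcasting. The arrays are made-up values chosen only to make the stretching visible.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])     # shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,)
col = np.array([[100], [200]])     # shape (2, 1)

plus_row = matrix + row   # row is reused for each of the 2 rows
plus_col = matrix + col   # col is reused for each of the 3 columns

print(plus_row)
print(plus_col)
```

In both additions NumPy compares shapes from the trailing dimension backwards and stretches any dimension of size 1 (or a missing dimension) to match, without copying the data.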


8. Given the following datasets:
Dataset 1:
ID | Name | Age
1 | John | 25
2 | Alice | 30
3 | Bob | 22
Dataset 2:
ID | Department
1 | Sales
2 | Marketing
3 | HR
➢ Perform an inner join on both datasets using the ID column.
➢ Perform a left join to include all records from Dataset 1 and matching records from Dataset 2.
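A sketch of both joins for Question 8 using pd.merge. The DataFrame names people and departments are illustrative.

```python
import pandas as pd

people = pd.DataFrame({
    "ID": [1, 2, 3],
    "Name": ["John", "Alice", "Bob"],
    "Age": [25, 30, 22],
})
departments = pd.DataFrame({
    "ID": [1, 2, 3],
    "Department": ["Sales", "Marketing", "HR"],
})

# Inner join: keep only IDs present in both tables
inner = pd.merge(people, departments, on="ID", how="inner")

# Left join: keep every row of people, fill Department with NaN if no match
left = pd.merge(people, departments, on="ID", how="left")

print(inner)
print(left)
```

With this particular data every ID appears in both tables, so the two results contain the same rows; the joins would differ only if one table had an ID the other lacked.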

9. Outline the difference between DataFrames and Series in Pandas. How do these two
structures facilitate data analysis?

10. A DataFrame df has an index of EmployeeID and a column Salary. The data for the
DataFrame is as follows: EmployeeID: [102, 101, 104, 103], Salary: [50000, 60000, 45000,
55000]. The task is to create a DataFrame with this data and then sort the DataFrame by the
EmployeeID index in ascending order.
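One way to set up and sort the DataFrame in Question 10, as a minimal sketch:

```python
import pandas as pd

df = pd.DataFrame(
    {"Salary": [50000, 60000, 45000, 55000]},
    index=pd.Index([102, 101, 104, 103], name="EmployeeID"),
)

# sort_index() sorts by the index labels, ascending by default,
# and returns a new DataFrame
df_sorted = df.sort_index()

print(df_sorted)
```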

THIRTEEN MARKS:

11. a) An e-commerce platform collects data on its customers, their shopping behavior, and
feedback to improve user experience. The data includes factors such as product preferences,
purchase time, and customer satisfaction ratings. Below is a snapshot of the collected data from a
week of transactions:
Order ID | Customer Age | Gender | Product Category | Order Time | Price (₹) | Payment Method | Customer Rating (Out of 5) | Delivery Time (mins)
101 | 25 | Male | Electronics | 14:30 | 15,000 | Credit Card | 4.5 | 45
102 | 30 | Female | Clothing | 18:15 | 2,500 | UPI | 3.8 | 30
103 | 22 | Male | Groceries | 09:45 | 800 | Cash | 4.2 | 25
104 | 27 | Female | Home Decor | 12:00 | 4,200 | Debit Card | 4 | 40

I. Identify the types of variables present in the dataset. (3M)

II. Provide examples of two qualitative and two quantitative variables from the dataset and explain
how they differ. (4M)

b) Differentiate between Data Science, Machine Learning, and Artificial Intelligence. (3m)

List and explain the major applications of Data Science in different sectors.(3m)

OR
12. A supermarket chain wants to analyze the daily sales volume (number of items sold) across its
stores over 30 days. The data collected represents the number of items sold per day at a specific
store.
Tasks:
➢ Create a frequency table for the given daily sales data by grouping it into appropriate class
intervals (e.g., 0-50, 51-100, etc.). (3m)
➢ Determine the modal class (the class interval with the highest frequency). (3m)
➢ Calculate the cumulative frequency for each class interval. (3m)
➢ Find the relative frequency (percentage of total days falling into each class interval). (2m)
➢ Interpret the results: What do the frequency patterns suggest about sales performance? (2m)
➢ Given Data (Number of Items Sold per Day in 30 Days): 45, 78, 120, 60, 55, 89, 100, 130, 40,
75, 150, 200, 110, 95, 125, 140, 50, 85, 160, 175, 95, 105, 115, 70, 90, 180, 135, 145, 80, 155
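A sketch of the frequency table for Question 12, assuming the class intervals 0-50, 51-100, 101-150 and 151-200 suggested in the first task. Because np.histogram treats bins as half-open, the edges are shifted by one (0, 51, 101, 151, 201), which is valid only because the data are whole numbers.

```python
import numpy as np

items_sold = np.array([
    45, 78, 120, 60, 55, 89, 100, 130, 40, 75,
    150, 200, 110, 95, 125, 140, 50, 85, 160, 175,
    95, 105, 115, 70, 90, 180, 135, 145, 80, 155,
])

# Bins are [0, 51), [51, 101), [101, 151), [151, 201]
edges = [0, 51, 101, 151, 201]
labels = ["0-50", "51-100", "101-150", "151-200"]

counts, _ = np.histogram(items_sold, bins=edges)   # frequency per class
cumulative = np.cumsum(counts)                     # running total
relative_pct = counts / counts.sum() * 100         # share of the 30 days
modal_class = labels[int(np.argmax(counts))]       # class with highest count

for label, f, cf, rf in zip(labels, counts, cumulative, relative_pct):
    print(f"{label:>8} | freq {f:>2} | cum {cf:>2} | {rf:5.1f}%")
print("Modal class:", modal_class)
```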

13. a) Explain the process of creating a frequency table for qualitative data. (5m)
b) A university conducted a survey among students to understand their preferred mode of learning.
The students were asked:
"Which learning method do you prefer the most?"
The responses were categorized as: Online Classes, In-Person Classes, Hybrid Learning (Both
Online & In-Person)
Tasks:
➢ Construct a frequency table displaying the number of students choosing each mode. (2M)
➢ Find the relative frequency (percentage) for each category. (2M)
➢ Determine the most preferred and least preferred learning mode. (2M)
➢ Recommend strategies for the university based on the findings. (2M)
OR
14. a) Discuss the various measures of dispersion and explain their significance. The following are
the test scores of 10 students in a class: 55, 60, 65, 70, 75, 80, 85, 90, 95, 100.
Calculate the range, variance, and standard deviation of the scores. Explain what these measures
reveal about the variability in the students' performance.(7m)
b) Explain the different measures of central tendency. A company records the number of products
sold in a week: 100, 200, 150, 300, 250, 100, 200. Calculate the mean, median, and mode of the
sales data. Interpret what these values suggest about the sales pattern. (6m)
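The calculations in Question 14 can be checked with a short sketch. It assumes the ten scores and seven sales figures are treated as a full population (ddof=0); use ddof=1 in var() and std() if they are meant as a sample.

```python
import statistics
import numpy as np

# Part a) measures of dispersion
scores = np.array([55, 60, 65, 70, 75, 80, 85, 90, 95, 100])
score_range = scores.max() - scores.min()
variance = scores.var()    # population variance (ddof=0)
std_dev = scores.std()     # population standard deviation

# Part b) measures of central tendency
weekly_sales = [100, 200, 150, 300, 250, 100, 200]
mean_sales = statistics.mean(weekly_sales)
median_sales = statistics.median(weekly_sales)
modes = statistics.multimode(weekly_sales)   # lists every most-frequent value

print(score_range, variance, round(std_dev, 2))
print(round(mean_sales, 2), median_sales, modes)
```

Note that the sales data has two values tied for most frequent, so multimode() is used rather than mode(), which would report only one of them.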
15. a) A store records monthly sales data for different products across 3 months. The data is
structured as follows:
Product 1: [1500, 1600, 1700]
Product 2: [2000, 2100, 2200]
Product 3: [1800, 1900, 2000]
(a) Create a 2D NumPy array from the given sales data.(2 Marks)
(b) Using aggregation functions, calculate:
The total sales for each month (column-wise sum).
The average sales for each product (row-wise mean).
The standard deviation of sales for each product.(3 Marks)
(c) Use np.ravel() and the array's flatten() method on the 2D array to convert it into a 1D array. Explain the
difference between the two. (2 Marks)
b) What are structured arrays in NumPy? How are they defined and used? Discuss how to access
and modify elements in a structured array. Provide examples of creating structured arrays and
accessing specific fields.(6m)
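A compact sketch covering both parts of Question 15. The standard deviation uses NumPy's default (population, ddof=0). The structured array and its field names name and first_month are illustrative assumptions; it reuses the first-month figures from the question rather than new data, and the updated value 1550 is made up to show modification.

```python
import numpy as np

# Part a) rows are products, columns are months
sales = np.array([[1500, 1600, 1700],
                  [2000, 2100, 2200],
                  [1800, 1900, 2000]])

monthly_totals = sales.sum(axis=0)    # column-wise sum, one value per month
product_means = sales.mean(axis=1)    # row-wise mean, one value per product
product_stds = sales.std(axis=1)      # row-wise population std

flat_view = sales.ravel()     # 1D view of the same memory where possible
flat_copy = sales.flatten()   # 1D independent copy

# Part b) a structured array: each element is a record with named fields
products = np.array(
    [("Product 1", 1500), ("Product 2", 2000), ("Product 3", 1800)],
    dtype=[("name", "U10"), ("first_month", "i4")],
)

names = products["name"]              # access a whole field by name
second_record = products[1]           # access one record by position
products["first_month"][0] = 1550     # modify one field of one record

print(monthly_totals, product_means, product_stds)
print(products)
```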
OR
16. i. A company records the salaries of its employees across 5 departments. The salary data for
each department (in thousands) is given as:
Department 1: [400, 450, 500, 550]
Department 2: [420, 480, 520, 560]
Department 3: [390, 430, 470, 510]
Department 4: [410, 460, 510, 550]
Department 5: [430, 480, 530, 570]
(a) Create a 2D NumPy array for the salary data of the employees in all 5 departments.(2 M)
(b) Perform the following:
➢ Calculate the average salary for each department (row-wise mean).
➢ Find the highest salary in each department (row-wise max).
➢ Calculate the total salary of all employees in each department.(3 M)
(c) Use fancy indexing to extract the salary data of employees from Department 2 and Department
4. (2 M)
(d) Use logical operations to find employees in any department who earn more than 500.
(2 M)
ii. Explain element-wise comparison operations in NumPy and provide an example. (4m)
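A possible working for Question 16, as a sketch. Rows are departments 1 to 5, so Department 2 and Department 4 sit at row positions 1 and 3. The comparison salaries > 500 is also an instance of the element-wise comparison asked about in part ii.

```python
import numpy as np

# Salaries in thousands; one row per department
salaries = np.array([
    [400, 450, 500, 550],
    [420, 480, 520, 560],
    [390, 430, 470, 510],
    [410, 460, 510, 550],
    [430, 480, 530, 570],
])

dept_means = salaries.mean(axis=1)    # average salary per department
dept_max = salaries.max(axis=1)       # highest salary per department
dept_totals = salaries.sum(axis=1)    # total salary per department

# Fancy indexing: a list of row positions selects Departments 2 and 4
depts_2_and_4 = salaries[[1, 3]]

# Element-wise comparison gives a boolean array of the same shape
above_500 = salaries > 500
high_salaries = salaries[above_500]       # 1D array of matching values
count_per_dept = above_500.sum(axis=1)    # True counts as 1

print(dept_means, dept_max, dept_totals)
print(depts_2_and_4)
print(high_salaries, count_per_dept)
```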

17. a) Explain the different ways of creating DataFrames. Discuss the primary syntax and functions
used to inspect and manipulate data in Pandas, such as df.head(), df.describe() and df.info(). (7M)
b) You are working with product sales data for an online store. The dataset contains the following
columns:
{"Product Name": ["Product A", "Product B", "Product C", "Product D"], "Category":
["Electronics", "Furniture", "Electronics", "Furniture"], "Units Sold": [100, 200, 150, 300], "Price":
[200, 150, 180, 250]}
(a) Create a Pandas DataFrame from the given data and add a new column called Total Sales. Use
vectorized operations to calculate the total sales.(2 M)
(b) Select and display the products in the Electronics category using label-based indexing.(2M)
(c) Use position-based indexing to select the second and third products (based on their row
positions). (2 M)
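A sketch for Question 17 b). It reads "label-based indexing" as using .loc with a boolean condition on the Category column, which is one common interpretation.

```python
import pandas as pd

df = pd.DataFrame({
    "Product Name": ["Product A", "Product B", "Product C", "Product D"],
    "Category": ["Electronics", "Furniture", "Electronics", "Furniture"],
    "Units Sold": [100, 200, 150, 300],
    "Price": [200, 150, 180, 250],
})

# (a) Vectorized: multiplies the two columns row by row, no explicit loop
df["Total Sales"] = df["Units Sold"] * df["Price"]

# (b) Label-based selection with .loc and a boolean mask
electronics = df.loc[df["Category"] == "Electronics"]

# (c) Position-based selection: rows at positions 1 and 2 (end is exclusive)
second_and_third = df.iloc[1:3]

print(df)
print(electronics)
print(second_and_third)
```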

OR
18. a) What is hierarchical indexing in Pandas? How do you create a multi-level index in a Pandas
DataFrame? Provide an example. (7m)
b) Consider a company that tracks employee salaries based on their departments and job
positions. The dataset has a multi-level index where the first level is the department and the
second level is the job position. The dataset includes employee names and their salaries.
{"Employee Name": ["Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Hank"],
"Salary": [60000, 50000, 70000, 80000, 55000, 75000, 85000, 65000],
"Department": ["HR", "IT", "IT", "HR", "Finance", "Finance", "IT", "HR"],
"Job Position": ["Manager", "Developer", "Developer", "Assistant", "Manager", "Assistant",
"Developer", "Manager"]}
i. Create a DataFrame with the employee salary data and hierarchical indexing based on the
Department and Job Position.
ii. Using the hierarchical index, find the salary of the employee working in the IT department as
a Developer.
iii. Extract all employee data from the HR department and show the salary of all job positions in
that department.
iv. Calculate the average salary for each Job Position across all departments.
v. Reindex the DataFrame so that Job Position is the primary index and Department is the
secondary index. Provide the new DataFrame. (6M)
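One way to work through Question 18 b), as a sketch. The index is sorted after it is set, since lookups on an unsorted MultiIndex can be slow or raise warnings.

```python
import pandas as pd

df = pd.DataFrame({
    "Employee Name": ["Alice", "Bob", "Charlie", "David",
                      "Eve", "Frank", "Grace", "Hank"],
    "Salary": [60000, 50000, 70000, 80000, 55000, 75000, 85000, 65000],
    "Department": ["HR", "IT", "IT", "HR", "Finance", "Finance", "IT", "HR"],
    "Job Position": ["Manager", "Developer", "Developer", "Assistant",
                     "Manager", "Assistant", "Developer", "Manager"],
})

# i. Two-level index: Department (outer), Job Position (inner)
df = df.set_index(["Department", "Job Position"]).sort_index()

# ii. All rows for IT developers (there are several, so this is a DataFrame)
it_developers = df.loc[("IT", "Developer"), :]

# iii. Everything in HR; the result is indexed by Job Position
hr_rows = df.loc["HR"]

# iv. Average salary per job position, across departments
avg_by_position = df.groupby(level="Job Position")["Salary"].mean()

# v. Make Job Position the outer level and Department the inner level
swapped = df.swaplevel().sort_index()

print(it_developers)
print(hr_rows)
print(avg_by_position)
print(swapped)
```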

19. a) Describe the different types of joins available in Pandas and give an example of each
type. (8m)
b) Two datasets are provided containing product sales data for January and February 2024. The
first dataset includes the product details for January, such as Product ID, Product Name, Quantity
Sold, Price, and Order Date. Similarly, the second dataset contains the same information for
February. The task is to concatenate these two datasets into a single DataFrame using Pandas'
concat() function. After concatenating, it is important to address how the index is handled—
whether it resets automatically or needs manual adjustment. Use sample data to implement the
same. (5m)
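A sketch for Question 19 b). The question supplies no data, so the products, quantities, prices and dates below are hypothetical sample values invented for illustration.

```python
import pandas as pd

# Hypothetical sample data (not from the question)
january = pd.DataFrame({
    "Product ID": [101, 102],
    "Product Name": ["Keyboard", "Mouse"],
    "Quantity Sold": [10, 25],
    "Price": [1200, 500],
    "Order Date": pd.to_datetime(["2024-01-05", "2024-01-18"]),
})
february = pd.DataFrame({
    "Product ID": [101, 102],
    "Product Name": ["Keyboard", "Mouse"],
    "Quantity Sold": [14, 20],
    "Price": [1200, 500],
    "Order Date": pd.to_datetime(["2024-02-03", "2024-02-21"]),
})

# Default: each frame keeps its own index, so labels repeat (0, 1, 0, 1)
kept = pd.concat([january, february])

# ignore_index=True discards the old labels and numbers rows 0..n-1
renumbered = pd.concat([january, february], ignore_index=True)

print(kept)
print(renumbered)
```

So concat() does not reset the index on its own. Either pass ignore_index=True, or call reset_index(drop=True) on the result afterwards.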

OR

20. Explain the different set operations using pandas. A company maintains employee records
with skillsets and training sessions attended across two departments: Sales and Marketing. The
data includes employee IDs, names, skills, and training programs. HR wants to analyze the skills
distribution and identify training gaps.
Dataset 1: Sales Department Skills
Employee ID | Employee Name | Skills | Training Attended
1001 | John | Sales, Negotiation, Excel | Negotiation Skills Workshop
1002 | Alice | Sales, CRM, Excel | CRM Basics Training
1003 | Steve | Sales, Communication, Excel | Sales Excellence Program
1004 | Nancy | Sales, Presentation, Excel | Presentation Skills Training
1005 | Mark | Sales, Negotiation | Sales Mastery Workshop
Dataset 2: Marketing Department Skills
Employee ID | Employee Name | Skills | Training Attended
2001 | Sarah | Marketing, SEO, Excel | SEO Optimization Training
2002 | Michael | Marketing, Content Creation, Excel | Content Marketing Workshop
2003 | Linda | Marketing, Social Media, Excel | Social Media Strategy Course
2004 | Greg | Marketing, SEO | SEO Mastery Program
2005 | Emma | Marketing, Content Creation | Content Marketing Workshop

➢ Use the union operation to combine skills from both departments.
➢ Identify common skills using the intersection operation.
➢ Find department-specific skills using set difference.
➢ Analyze training gaps by comparing skills and training attended.
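A sketch of the first three tasks in Question 20 using the set methods on pandas Index objects. The helper skill_index is a hypothetical name introduced here; it assumes skills within a cell are separated by a comma and a space. The training-gap analysis is left out because it depends on how skills are matched to training titles.

```python
import pandas as pd

sales = pd.DataFrame({
    "Employee Name": ["John", "Alice", "Steve", "Nancy", "Mark"],
    "Skills": ["Sales, Negotiation, Excel", "Sales, CRM, Excel",
               "Sales, Communication, Excel", "Sales, Presentation, Excel",
               "Sales, Negotiation"],
})
marketing = pd.DataFrame({
    "Employee Name": ["Sarah", "Michael", "Linda", "Greg", "Emma"],
    "Skills": ["Marketing, SEO, Excel", "Marketing, Content Creation, Excel",
               "Marketing, Social Media, Excel", "Marketing, SEO",
               "Marketing, Content Creation"],
})

def skill_index(df):
    """Split each Skills cell into single skills and return the unique ones."""
    return pd.Index(df["Skills"].str.split(", ").explode().unique())

sales_skills = skill_index(sales)
marketing_skills = skill_index(marketing)

all_skills = sales_skills.union(marketing_skills)            # in either
common_skills = sales_skills.intersection(marketing_skills)  # in both
sales_only = sales_skills.difference(marketing_skills)       # Sales only
marketing_only = marketing_skills.difference(sales_skills)   # Marketing only

print(list(all_skills))
print(list(common_skills))
print(list(sales_only))
print(list(marketing_only))
```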

FIFTEEN MARKS:
21. a) Discuss the various strategies to handle missing values in Pandas. How do the methods
.fillna(), .dropna(), and .isna() help in this context? (5M)
b) The following dataset provides sales information from an e-commerce platform. The goal is to
analyze the data and perform several data operations using Pandas.
Order ID | Product Name | Category | Price | Quantity Sold | Discount Applied | Order Date | Customer Age | Order Status
1 | Laptop | Electronics | 1000 | 1 | 10 | 01-02-2024 | 25 | Completed
2 | Table | Furniture | 200 | 2 | 15 | 02-02-2024 | 34 | Cancelled
3 | Shirt | Apparel | 50 | 5 | 0 | 03-02-2024 | 45 | Completed
4 | Phone | Electronics | 800 | 1 | 5 | 04-02-2024 | 23 | Pending
5 | Shoes | Apparel | 100 | 3 | 20 | 05-02-2024 | 30 | Completed

(a) Handle missing data by filling missing Customer Age with the mean age of customers.
(b) Drop any rows where Order Status is missing.
(c) Calculate the Total Sales for each product, considering the discount applied.
(d) Create a new column called Discounted Price after applying the discount to the original price.
(e) Calculate the Total Revenue of the platform by summing up all Total Sales values.
(f) Group the data by Category and calculate the total sales and profit for each category. (10M)
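A sketch of tasks (a) to (f) in Question 21 b). It assumes Discount Applied is a percentage of the price, which the question does not state. The sample rows contain no missing values, so the fillna() and dropna() calls leave the data unchanged here and only show the pattern. Profit in (f) is not computed because the dataset has no cost column.

```python
import pandas as pd

df = pd.DataFrame({
    "Order ID": [1, 2, 3, 4, 5],
    "Product Name": ["Laptop", "Table", "Shirt", "Phone", "Shoes"],
    "Category": ["Electronics", "Furniture", "Apparel", "Electronics", "Apparel"],
    "Price": [1000, 200, 50, 800, 100],
    "Quantity Sold": [1, 2, 5, 1, 3],
    "Discount Applied": [10, 15, 0, 5, 20],   # assumed to be percent
    "Order Date": ["01-02-2024", "02-02-2024", "03-02-2024",
                   "04-02-2024", "05-02-2024"],
    "Customer Age": [25, 34, 45, 23, 30],
    "Order Status": ["Completed", "Cancelled", "Completed", "Pending", "Completed"],
})

# (a) Replace missing ages with the mean age
df["Customer Age"] = df["Customer Age"].fillna(df["Customer Age"].mean())

# (b) Drop rows whose Order Status is missing
df = df.dropna(subset=["Order Status"])

# (d) Unit price after the percentage discount
df["Discounted Price"] = df["Price"] * (1 - df["Discount Applied"] / 100)

# (c) Total sales per order line, net of discount
df["Total Sales"] = df["Discounted Price"] * df["Quantity Sold"]

# (e) Platform revenue
total_revenue = df["Total Sales"].sum()

# (f) Total sales per category
sales_by_category = df.groupby("Category")["Total Sales"].sum()

print(df[["Product Name", "Discounted Price", "Total Sales"]])
print(total_revenue)
print(sales_by_category)
```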
OR
22. a) Explain the concept of GroupBy in Pandas. How does it help in data aggregation and
transformation?
b) What is a pivot table? How can it be used to summarize data by rearranging and aggregating it?
Provide an example where a pivot table would be useful.
c) A company has sales data for the past year, including the region, product category, and monthly
sales figures. The management wants to analyze the total sales, average sales, and monthly growth
for different regions and product categories.

Dataset:
Region | Product Category | Month | Sales Amount
North | Electronics | Jan | 15000
South | Furniture | Jan | 20000
North | Electronics | Feb | 17000
South | Furniture | Feb | 19000
East | Clothing | Jan | 10000
East | Clothing | Feb | 15000
West | Electronics | Jan | 12000
West | Clothing | Jan | 11000
Tasks:
➢ GroupBy: Group the data by Region and calculate the total sales and average sales for
each region.
➢ Pivot Table: Create a pivot table that summarizes sales amount by Region and Product
Category. Show the total sales for each combination of Region and Product Category.
➢ Monthly Growth: Calculate the monthly growth of sales in each region (from January to
February) using the mean sales value.
➢ Based on the analysis, identify the top-performing region and product category in terms of
total sales.
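A sketch of the four tasks in Question 22 c). Growth is computed as the percentage change in mean sales from January to February, which is one reading of the task. West has no February rows, so its growth comes out as NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "South", "North", "South", "East", "East", "West", "West"],
    "Product Category": ["Electronics", "Furniture", "Electronics", "Furniture",
                         "Clothing", "Clothing", "Electronics", "Clothing"],
    "Month": ["Jan", "Jan", "Feb", "Feb", "Jan", "Feb", "Jan", "Jan"],
    "Sales Amount": [15000, 20000, 17000, 19000, 10000, 15000, 12000, 11000],
})

# GroupBy: total and average sales per region
region_summary = df.groupby("Region")["Sales Amount"].agg(["sum", "mean"])

# Pivot table: total sales for each Region x Product Category combination
pivot = df.pivot_table(values="Sales Amount", index="Region",
                       columns="Product Category", aggfunc="sum", fill_value=0)

# Monthly growth: mean sales per region and month, then Jan -> Feb change
monthly_mean = df.pivot_table(values="Sales Amount", index="Region",
                              columns="Month", aggfunc="mean")
growth_pct = (monthly_mean["Feb"] - monthly_mean["Jan"]) / monthly_mean["Jan"] * 100

# Top performers by total sales
top_region = region_summary["sum"].idxmax()
top_category = df.groupby("Product Category")["Sales Amount"].sum().idxmax()

print(region_summary)
print(pivot)
print(growth_pct)
print(top_region, top_category)
```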
