Assignment 1-SS-2023-24


The Hashemite University

Prince Al-Hussein bin Abdullah II Faculty for IT


Department of Information Technology
Advance Programming for DS and AI (2010042351)

Assignment 1: String Manipulation, Files and Exceptions
Due Date: 28-4-2024 11:59 PM
Max score: 100 points

Instructions:
1- Submit one Python notebook file ({filename}.ipynb).
2- File naming: For a team of two students, name your Python script file using a
combination of both team members' university IDs. For instance, if the team members
have university IDs 123456 and 789012, the Python script file should be named
Team_123456_789012.ipynb.
3- Send your code file to your instructor through the designated assignment.
4- Please keep in mind that late submissions will result in a ZERO score.
5- You must be able to discuss the details of your solution with your instructor.

The Dataset
Amazon is one of the biggest online retailers in the USA, selling over 12 million products.
With the Amazon Products Dataset 2023, you can get an in-depth idea of which products sell
best, the best price range for a product in a given category, and much more.
Files
1. amazon_categories.csv with the following features:
• ID: Category identifier.
• Category_name: Name of the category.

2. amazon_products.csv with the following features:
• ASIN: Unique identifier assigned by Amazon to each product.
• Title: Name or description of the product.
• ImgUrl: URL of the product's image.
• ProductURL: URL of the product's page on Amazon.
• Stars: Average rating given to the product.
• Reviews: Total number of customer reviews for the product.
• Price: Current selling price of the product.
• ListPrice: Manufacturer's suggested retail price.
• Category_ID: Category identifier.
• IsBestSeller: Indicates if the product is a best-seller.
• BoughtInLastMonth: Indicates the quantity of items bought in the last month.

The amazon_products.csv and amazon_categories.csv files are linked through a foreign key
relationship: the 'Category_ID' column in 'amazon_products' references the 'id' column in
'amazon_categories', allowing us to connect each product to its corresponding category.
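
The foreign key lookup described above can be sketched as follows. This is only an illustration on made-up rows, not the required solution; it assumes the category header is spelled `ID` and that a product with no matching category should be labelled "Unknown" (both are assumptions, not part of the assignment text).

```python
import csv
import io

# Made-up sample rows; the real files are much larger.
products_csv = "ASIN,Title,Category_ID\nX1,Demo kettle,2\nX2,Demo lamp,9\n"
categories_csv = "ID,Category_name\n1,Demo toys\n2,Demo kitchen\n"

# Build a lookup table: category ID -> category name.
category_names = {
    row["ID"]: row["Category_name"]
    for row in csv.DictReader(io.StringIO(categories_csv))
}

# Resolve each product's Category_ID through the lookup table.
joined = []
for product in csv.DictReader(io.StringIO(products_csv)):
    # "Unknown" is an assumed fallback for IDs missing from the categories file.
    name = category_names.get(product["Category_ID"], "Unknown")
    joined.append((product["Title"], name))
```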

Objective:
Develop a Python program to analyze Amazon product data, categorize products, extract relevant
information, and generate summary reports.

Tasks:
A. Read Data from Files: (3 marks)
• Use the csv module to read the "amazon_products.csv" file line by line.
• Use the json module to read the "amazon_categories.json" file and load its contents into
a Python dictionary.
Ensure that the data is properly formatted and ready for analysis and reporting tasks.
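
One possible shape for Task A is sketched below. The helper names `iter_products` and `load_categories` are hypothetical, and the JSON layout (a mapping from category ID to category name) is an assumption, since the handout does not show the file's structure. The sketch writes two tiny made-up files to a temporary folder so it can run without the real dataset.

```python
import csv
import json
import os
import tempfile


def iter_products(path):
    """Yield one product row (as a dict) at a time from a CSV file."""
    # newline="" is the setting the csv module documentation recommends.
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            yield row


def load_categories(path):
    """Load the whole JSON file into a Python object."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Tiny made-up files so the sketch runs on its own.
with tempfile.TemporaryDirectory() as folder:
    products_path = os.path.join(folder, "amazon_products.csv")
    categories_path = os.path.join(folder, "amazon_categories.json")
    with open(products_path, "w", newline="", encoding="utf-8") as handle:
        handle.write("ASIN,Title,Stars,Price,Category_ID\n")
        handle.write("X1,Demo item,4.7,19.99,1\n")
    with open(categories_path, "w", encoding="utf-8") as handle:
        json.dump({"1": "Demo category"}, handle)  # assumed layout

    products = list(iter_products(products_path))
    categories = load_categories(categories_path)
```

Note that `csv.DictReader` returns every field as a string, so numeric columns such as Stars and Price still need converting before any comparison or arithmetic.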
B. Product Title Cleaning: (3 marks)
• Use the regular expression (re) package to clean up the product titles.
(1) In the Title column, replace the words within each pair of parentheses with the
first letter of each word.
(2) Remove any special characters or symbols from the product titles, leaving only
alphanumeric characters and parentheses.
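
The two cleaning steps could look roughly like this. The function name `clean_title` is hypothetical, and the sketch makes three assumptions that the task wording leaves open: the parentheses themselves are kept around the initials, spaces between words are preserved, and runs of spaces left behind by removed symbols are collapsed.

```python
import re


def clean_title(title):
    """Apply steps (1) and (2) to a single product title."""

    def to_initials(match):
        # Keep the first character of each whitespace-separated word.
        words = match.group(1).split()
        return "(" + "".join(word[0] for word in words) + ")"

    # Step (1): "(Stainless Steel)" becomes "(SS)".
    title = re.sub(r"\(([^()]*)\)", to_initials, title)
    # Step (2): drop everything except letters, digits, parentheses, spaces.
    title = re.sub(r"[^A-Za-z0-9() ]", "", title)
    # Assumed tidy-up: collapse repeated spaces and trim the ends.
    return re.sub(r" +", " ", title).strip()


cleaned = clean_title("Water Bottle (Stainless Steel) - 32oz!")
# cleaned is "Water Bottle (SS) 32oz"
```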
C. Product Categorization: (3 marks)
• Write Python code to classify the products into two separate files based on their
star ratings:
(1) File name: High_Rated_Products.csv: Products with star ratings greater than 4.5.
(2) File name: Standard_Rated_Products.csv: Products with star ratings less than or
equal to 4.5.
• Each file contains all columns except the Stars column.
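
A minimal sketch of the split, assuming the rows were read with `csv.DictReader` so that Stars arrives as a string. The helper names `split_by_rating` and `write_rows` are hypothetical, and the sample rows are made up. The demo writes to in-memory buffers; for the real task the same writer would be pointed at files opened with `open(name, "w", newline="")`.

```python
import csv
import io


def split_by_rating(rows, threshold=4.5):
    """Return (high, standard) lists of rows with the Stars column removed."""
    high, standard = [], []
    for row in rows:
        stars = float(row["Stars"])
        trimmed = {key: value for key, value in row.items() if key != "Stars"}
        if stars > threshold:
            high.append(trimmed)
        else:
            standard.append(trimmed)
    return high, standard


def write_rows(rows, handle):
    """Write rows (a list of dicts) as CSV to an open text handle."""
    if not rows:
        return
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample rows for illustration only.
sample = [
    {"ASIN": "X1", "Stars": "4.7", "Price": "19.99"},
    {"ASIN": "X2", "Stars": "4.5", "Price": "5.00"},
]
high, standard = split_by_rating(sample)

high_buffer = io.StringIO()  # stands in for High_Rated_Products.csv
write_rows(high, high_buffer)
```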

D. Summary Reports: (3 marks)
• Provide summary reports for each product category (High_Rated_Products,
Standard_Rated_Products).
• Include the count of products in each category and the average price.
• Write this information into a CSV file named "product_summary.csv".
product_summary.csv
Product Category, Number of Products, Average Price
High_Rated_Products, 2, $262.74
Standard_Rated_Products, 3, $246.41
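
The summary rows could be produced along these lines. The function name `summarize` is hypothetical and the prices are made-up values, not taken from the dataset. The sketch assumes an empty category should report an average of $0.00. It writes to an in-memory buffer that stands in for product_summary.csv.

```python
import csv
import io


def summarize(label, rows):
    """Return [label, product count, average price formatted like $12.34]."""
    prices = [float(row["Price"]) for row in rows]
    # Assumed behaviour: an empty category reports an average of 0.
    average = sum(prices) / len(prices) if prices else 0.0
    return [label, len(rows), f"${average:.2f}"]


# Made-up rows for illustration only.
high_rows = [{"Price": "10"}, {"Price": "20"}]
standard_rows = [{"Price": "7.5"}]

summary_buffer = io.StringIO()  # stands in for product_summary.csv
writer = csv.writer(summary_buffer)
writer.writerow(["Product Category", "Number of Products", "Average Price"])
writer.writerow(summarize("High_Rated_Products", high_rows))
writer.writerow(summarize("Standard_Rated_Products", standard_rows))
```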
E. Exception Handling: (3 marks)
• Implement proper exception handling for file reading and writing operations.
• Display informative error messages for better understanding in case of failures, such as
file not found or permission issues.
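
One way to wrap a file read with the error handling described above is sketched here. The function name `read_rows_safely` and the wording of the messages are assumptions. Returning an empty list on failure is one design choice among several; re-raising after printing would be equally valid.

```python
import csv


def read_rows_safely(path):
    """Read a CSV file into a list of dicts, reporting common failures."""
    try:
        with open(path, newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    except FileNotFoundError:
        print(f"Error: the file '{path}' was not found.")
    except PermissionError:
        print(f"Error: no permission to read '{path}'.")
    except (OSError, csv.Error) as error:
        # Catch-all for other I/O problems and malformed CSV content.
        print(f"Error: could not read '{path}': {error}")
    return []


rows = read_rows_safely("no_such_file_for_demo.csv")
```

The specific exceptions are listed before `OSError` because `FileNotFoundError` and `PermissionError` are subclasses of it, so the more general handler has to come last.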

-- Good Luck --
