Assignment 1-SS-2023-24
Instructions:
1- Submit one Python script file ({filename}.ipynb).
2- File naming: For a team of two students, name your Python script file using a
combination of both team members' university IDs. For instance, if the team members
have university IDs 123456 and 789012, the Python script file should be named
Team_123456_789012.ipynb.
3- Send your code file to your instructor through the designated assignment.
4- Please keep in mind that late submissions will result in a ZERO score.
5- You must be able to discuss the details of your solution with your instructor.
The Dataset
Amazon Products Dataset 2023: Amazon is one of the biggest online retailers in the USA and
sells over 12 million products. With this dataset, you can get an in-depth idea of which
products sell best, the best price range for a product in a given category, and much more.
Files
1. amazon_products.csv
2. amazon_categories.json
ID: Category identifier.
Category_name: Name of the category.
Objective:
Develop a Python program to analyze Amazon product data, categorize products, extract relevant
information, and generate summary reports.
Tasks:
A. Read Data from Files (3 marks)
Use the csv module to read the "amazon_products.csv" file. Read the file line by line.
Use the JSON module to read the "amazon_categories.json" file and load its contents into
a Python dictionary.
Ensure that the data is properly formatted and ready for analysis and reporting tasks.
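The reading step in Task A could be sketched as follows. This is a minimal illustration, not a reference solution: the function names read_products and read_categories are hypothetical, and the sketch assumes that the CSV file has a header row and that both files are UTF-8 encoded.

```python
import csv
import json


def read_products(csv_path):
    """Read a CSV file one row at a time and return a list of dicts.

    Assumes the first line of the file is a header row (not stated in the
    assignment); csv.DictReader uses it to key every later row.
    """
    rows = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            rows.append(row)
    return rows


def read_categories(json_path):
    """Load a JSON file; the result is a dict when the top level is a JSON object."""
    with open(json_path, encoding="utf-8") as handle:
        return json.load(handle)
```

Calling read_products("amazon_products.csv") and read_categories("amazon_categories.json") would then give the rows and the category data. Note that csv.DictReader returns every value as a string, so numeric columns still need converting before analysis.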
B. Product Title Cleaning: (3 marks)
Use the regular expression (re) package to clean up the product titles.
(1) In the title column, replace any words within parentheses with the first letter of
each word.
(2) Remove any special characters or symbols from the product titles, leaving only
alphanumeric characters and parentheses.
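One possible reading of the two cleaning rules in Task B is sketched below. The function names are hypothetical, and the sketch makes three assumptions the assignment does not state: the parentheses themselves are kept around the initials, whitespace between words is preserved, and "alphanumeric" means ASCII letters and digits.

```python
import re

# Text inside one pair of parentheses (no nesting handled).
PAREN_RE = re.compile(r"\(([^()]*)\)")
# Anything that is not an ASCII letter, digit, parenthesis, or whitespace.
SPECIAL_RE = re.compile(r"[^A-Za-z0-9()\s]")


def abbreviate_parentheses(title):
    """Replace the words inside each (...) group with their first letters."""
    def to_initials(match):
        words = re.findall(r"[A-Za-z0-9]+", match.group(1))
        return "(" + "".join(word[0] for word in words) + ")"
    return PAREN_RE.sub(to_initials, title)


def clean_title(title):
    """Apply rule (1), then rule (2), then tidy up leftover whitespace."""
    title = abbreviate_parentheses(title)
    title = SPECIAL_RE.sub("", title)
    # Removing symbols can leave double spaces behind; collapse them.
    return re.sub(r"\s+", " ", title).strip()


print(clean_title("Wireless Speaker (Black Edition) - 20W!"))
# prints: Wireless Speaker (BE) 20W
```

Rule (1) runs before rule (2) so that punctuation inside the parentheses cannot merge two words before their initials are taken.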
C. Product Categorization: (3 marks)
Write Python code to classify the products into two separate files based on their
star ratings:
(1) File name: High_Rated_Products.csv: Products with star ratings greater than 4.5.
(2) File name: Standard_Rated_Products.csv: Products with star ratings less than or
equal to 4.5.
Each file contains all columns except the Stars column.
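The split in Task C could be sketched as below. The function name split_by_rating is hypothetical, and the column name "stars" is an assumption, so adjust it to match the actual header in the data. The sketch takes the products as a list of dicts, one per row, and calls float() on the rating, so a non-numeric rating raises ValueError rather than being silently skipped.

```python
import csv


def split_by_rating(rows, stars_column="stars", threshold=4.5,
                    high_path="High_Rated_Products.csv",
                    standard_path="Standard_Rated_Products.csv"):
    """Write rows to two CSV files by star rating, omitting the stars column.

    Returns (high_count, standard_count). Writes nothing if rows is empty.
    """
    rows = list(rows)
    if not rows:
        return 0, 0
    # Keep every column from the first row except the rating column.
    fieldnames = [name for name in rows[0] if name != stars_column]
    high_count = standard_count = 0
    with open(high_path, "w", newline="", encoding="utf-8") as high_file, \
         open(standard_path, "w", newline="", encoding="utf-8") as std_file:
        # extrasaction="ignore" drops the stars value when a row is written.
        high_writer = csv.DictWriter(high_file, fieldnames=fieldnames,
                                     extrasaction="ignore")
        std_writer = csv.DictWriter(std_file, fieldnames=fieldnames,
                                    extrasaction="ignore")
        high_writer.writeheader()
        std_writer.writeheader()
        for row in rows:
            if float(row[stars_column]) > threshold:
                high_writer.writerow(row)
                high_count += 1
            else:
                std_writer.writerow(row)
                standard_count += 1
    return high_count, standard_count
```

A product rated exactly 4.5 goes to the standard file, matching the "less than or equal to 4.5" rule.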
-- Good Luck --