2024-2025 F20DL Assessed Coursework, Heriot-Watt University Dubai
Areej Ahmed: H00385913
Muhammad Imaad: H00362645
Ziyaan Mir: H00382769
Syed Ammar: H00364774
Mufaddal Abizar Ezzi: H00360993
Coursework Overview
This group project involves choosing one application topic, selecting and analyzing 1–3 unique datasets, and dividing the work into five distinct phases. The overarching goal is to explore, analyze, and apply machine learning techniques to these datasets. The project emphasizes teamwork, data exploration, and the application of various machine learning methods, including clustering, decision trees, and neural networks.
Week 3: Finalize topic, finalize datasets and lay out the objectives.
Week 4: Carry out data visualization on datasets. Finish PowerPoint for D1. (R1)
Week 5: Change PowerPoint slides as per feedback. Start preprocessing.
Week 6: Complete preprocessing. Visualize data after preprocessing and analyze. (R2)
Week 7: Experiment with clustering (R3)
Week 8: Start training models. Evaluate their performance. (R4)
Week 9: Experiment with Neural Networks (R5)
Week 10: Work on the report
Week 11: Finalize report
Link for the dataset: https://www.kaggle.com/datasets/arunavakrchakraborty/australia-weather-data
Original source of the data: Australian Bureau of Meteorology at http://www.bom.gov.au/climate/data/
License: CC0: Public Domain
Link for the dataset: https://www.kaggle.com/datasets/emmanuelfwerr/london-weather-data
Original source of the data: European Climate Assessment (ECA) at https://www.ecad.eu/dailydata/index.php
License: CC0: Public Domain
Link for the dataset: https://www.kaggle.com/datasets/jehanbhathena/weather-dataset
Original source of the data: Harvard Dataverse
License: CC0: Public Domain
Some images of this dataset include:
- Tools/Software needed: Jupyter Notebook or Google Colab
- Libraries Required: NumPy, Pandas, matplotlib.pyplot, seaborn, imbalanced-learn, scikit-learn, xgboost, lightgbm, catboost.
- Load raw dataset from the specified directory in the header of the notebook.
- Clean the data by removing null values and handling outliers.
- Balance the dataset.
- Normalize numerical features and encode categorical features.
- Split the data into training and testing sets.
- Save the preprocessed dataset with a change in name for ease of use.
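The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual scripts. The function name `preprocess`, the 1.5 × IQR clipping rule, one-hot encoding, the 80/20 split, and random oversampling with `sklearn.utils.resample` (used here in place of imbalanced-learn) are all assumptions. The sketch also splits before balancing and scaling so that test rows do not influence either step, which may differ from the order used in the notebooks.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample


def preprocess(df: pd.DataFrame, target: str, seed: int = 0):
    """Hypothetical helper: clean, encode, split, balance, and scale a dataset."""
    # Clean: drop rows with nulls, then clip numeric outliers (assumed 1.5 * IQR rule).
    df = df.dropna().copy()
    numeric = list(df.select_dtypes(include="number").columns.drop(target, errors="ignore"))
    for col in numeric:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        df[col] = df[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

    # Encode: one-hot encode every non-numeric feature (the target is left as-is).
    categorical = list(df.select_dtypes(exclude="number").columns.drop(target, errors="ignore"))
    df = pd.get_dummies(df, columns=categorical)

    # Split: assumed 80/20 split, stratified on the target.
    X_train, X_test, y_train, y_test = train_test_split(
        df.drop(columns=target), df[target],
        test_size=0.2, stratify=df[target], random_state=seed,
    )
    X_test = X_test.copy()

    # Balance (training set only): resample every class with replacement
    # up to the size of the largest class.
    train = X_train.assign(**{target: y_train})
    largest = train[target].value_counts().max()
    train = pd.concat(
        [resample(group, replace=True, n_samples=largest, random_state=seed)
         for _, group in train.groupby(target)],
        ignore_index=True,
    )
    X_train, y_train = train.drop(columns=target), train[target]

    # Normalize: fit the scaler on training rows, then apply it to both sets.
    scaler = StandardScaler()
    X_train[numeric] = scaler.fit_transform(X_train[numeric])
    X_test[numeric] = scaler.transform(X_test[numeric])
    return X_train, X_test, y_train, y_test
```

Calling `preprocess(df, "RainTomorrow")` on a loaded DataFrame would return the four splits, which can then be saved under a new file name.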
With this dataset, we classify, based on chosen attributes, whether or not it will rain the next day. The attribute we are predicting is "RainTomorrow".
- Data Preprocessing and Analysis (R2)
Process
- Clustering (R3)
Cluster
- Model Training and Evaluation (R4)
Model A: which inputs does the model use for prediction?
Figure showing results using permanent links
Model B: which inputs does the model use for prediction?
Figure showing results using permanent links
Model C: which inputs does the model use for prediction?
Figure showing results using permanent links
- Neural Networks (R5)
CNN
MLP
Table of results (Markdown tables)
With this dataset, we predict, based on chosen attributes, the amount of sunshine the next day. The attribute we are predicting is "sunshine".
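One possible way to frame the next-day target is to shift the sunshine column back by one row after sorting by date. The sketch below is only an illustration under assumptions: the helper `add_next_day_target`, the `date` column name, and the new `sunshine_next_day` column are hypothetical, and it assumes one row per consecutive day.

```python
import pandas as pd


def add_next_day_target(df: pd.DataFrame,
                        date_col: str = "date",
                        value_col: str = "sunshine") -> pd.DataFrame:
    """Hypothetical helper: attach tomorrow's sunshine value to each row."""
    # Sort chronologically so that shift(-1) refers to the following day's row.
    out = df.sort_values(date_col).reset_index(drop=True)
    out["sunshine_next_day"] = out[value_col].shift(-1)
    # The last day has no "tomorrow", so its target is NaN and the row is dropped.
    return out.dropna(subset=["sunshine_next_day"])
```

The remaining columns can then serve as model inputs, with `sunshine_next_day` as the regression target.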
- Data Preprocessing and Analysis (R2)
Process
- Model Training and Evaluation (R4)
Model A: which inputs does the model use for prediction?
Figure showing results using permanent links
Model B: which inputs does the model use for prediction?
Figure showing results using permanent links
Model C: which inputs does the model use for prediction?
Figure showing results using permanent links
- Neural Networks (R5)
CNN
MLP
Table of results (Markdown tables)
With this dataset, we are classifying pictures into certain classes using neural networks.
- Data Preprocessing and Analysis (R2)
Process
- Clustering (R3)
Cluster
- Model Training and Evaluation (R4)
Model A: which inputs does the model use for prediction?
Figure showing results using permanent links
Model B: which inputs does the model use for prediction?
Figure showing results using permanent links
Model C: which inputs does the model use for prediction?
Figure showing results using permanent links
- Neural Networks (R5)
CNN
MLP
Table of results (Markdown tables)
This folder stores our chosen datasets as well as their preprocessed versions.
This folder stores our notebooks for our analysis and model running.
This folder stores our scripts for preprocessing our datasets.
This folder stores our project documents as well as the weekly updates of our work.