
Intra College Datathon Competition 2.0

Problem Statement:

Predicting the Winner of the ICC Champions Trophy 2025

In this datathon, participants are tasked with analyzing past data of ICC Champions Trophy
tournaments to predict the winner of the upcoming ICC Champions Trophy 2025. A machine
learning model should be created and deployed to predict the outcome based on the given dataset.

Dataset Overview:

The dataset contains records from the ICC Champions Trophy, spanning tournaments held in 1998,
2000, 2004, 2006, 2009, 2013, and 2017. These records consist of 56 rows, with each row
representing a team’s performance in a specific tournament year. The data includes the performance
metrics for eight teams that usually participate in the ICC Champions Trophy.

The dataset has columns representing the teams’ key performance statistics such as matches played,
matches won, average runs per match, strike rate, top scorers, number of centuries and fifties,
wickets taken, top wicket-takers, bowling average, economy rate, fielding metrics, and many others.
These statistics will be used to train the model.

For the competition, participants are also provided with data for the 2025 ICC Champions Trophy.
This data is used to test the machine learning model and to generate predictions for the outcome
of the 2025 tournament.

Dataset Columns:

1. Year: Year of the tournament.

2. Team: The team participating in the tournament (e.g., India, Pakistan).

3. Group: Group classification (A or B).

4. Matches Played: Total matches played by the team.

5. Matches Won: Total matches won by the team.

6. Avg Runs Per Match: Average runs scored by the team per match.

7. Strike Rate: Team’s batting strike rate.

8. Team’s Top Scorer: The top run scorer from the team.

9. Number of Centuries: Total centuries scored by the team.

10. Number of Fifties: Total fifties scored by the team.

11. Highest Team Total: Highest total scored by the team.

12. Wickets Taken: Total wickets taken by the team.


13. Top Wicket-Taker: The top wicket-taker from the team.

14. Bowling Average: Average runs conceded per wicket by the team.

15. Bowling Economy Rate: Team’s bowling economy rate.

16. Five-Wicket Hauls: Total five-wicket hauls by the team.

17. Catches Taken: Total catches taken by the team.

18. Run Outs: Total run-outs by the team.

19. Stumpings: Total stumpings by the team.

20. Maiden Overs: Total maiden overs bowled by the team.

21. Net Run Rate (NRR): The team’s net run rate.

22. Total Fours: Total number of boundaries (fours) hit by the team.

23. Total Sixes: Total number of sixes hit by the team.

24. Host Advantage: Whether the team is the host nation (1 for yes, 0 for no).

25. Outcome: The target variable, indicating whether the team was the winner (1 for
winner, 0 otherwise).
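
As an illustration, the sketch below loads and sanity-checks the data with pandas. The
filenames are assumptions, so adjust them to the files actually provided for the competition.

    import pandas as pd

    # Filenames are placeholders; use the files shared for the competition.
    train_df = pd.read_csv("champions_trophy_1998_2017.csv")
    test_df = pd.read_csv("champions_trophy_2025.csv")

    # Sanity checks: 56 rows expected (7 tournaments x 8 teams).
    print(train_df.shape)
    print(train_df.dtypes)
    print(train_df["Outcome"].value_counts())  # 7 winners vs. 49 others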

Expected Outcomes:

Participants are expected to:

1. Model Creation:

o Use the given historical data (1998-2017) to train a machine learning model that can
predict the outcome (winner or otherwise) of each team based on the team’s
statistics.

o Identify and engineer relevant features from the dataset that contribute to the
model's accuracy.

o Evaluate different machine learning algorithms (e.g., logistic regression, random
forest, XGBoost, etc.) and select the best-performing model based on metrics like
accuracy, precision, recall, and F1 score.

2. Model Deployment:

o Deploy the model so that it can be tested with the 2025 dataset. Participants
should create a system where, upon entering the performance statistics of the
teams from the 2025 dataset, the model predicts whether a particular team will
win or not (1 for winner, 0 otherwise).

o Ensure the deployed model is user-friendly and can easily take in new data for
prediction.

Steps for Participation:

1. Data Preprocessing:

o Clean the data: Handle missing values and ensure all numeric data types are
correctly formatted.

o Normalize or standardize data if necessary to improve model performance.

o Perform feature selection based on correlation or importance metrics.
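
A minimal preprocessing sketch for this step, assuming pandas and scikit-learn and the
column names listed in the dataset description (adjust names to the actual headers):

    import pandas as pd
    from sklearn.preprocessing import StandardScaler

    df = pd.read_csv("champions_trophy_1998_2017.csv")  # filename is an assumption

    # Separate the target and drop identifier/text columns the model cannot use directly.
    text_cols = ["Team", "Group", "Team's Top Scorer", "Top Wicket-Taker"]
    y = df["Outcome"]
    X = df.drop(columns=["Outcome"] + text_cols)

    # Handle missing values; median imputation is a simple, robust default.
    X = X.fillna(X.median(numeric_only=True))

    # Standardize so scale-sensitive models (e.g., logistic regression) are not skewed.
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)

    # Feature selection: rank features by absolute correlation with the target.
    print(X.corrwith(y).abs().sort_values(ascending=False).head(10))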

2. Feature Engineering:

o Use domain knowledge to create new features, such as combinations of batting and
bowling statistics, or derived metrics like win ratio, run differential, etc.
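
For example, the derived metrics mentioned above might look like this, continuing from the
preprocessing sketch (column names are assumed from the dataset description):

    # Derived features suggested in the brief, engineered from existing columns.
    df["Win Ratio"] = df["Matches Won"] / df["Matches Played"]
    df["Boundary Rate"] = (df["Total Fours"] + df["Total Sixes"]) / df["Matches Played"]
    df["Wickets Per Match"] = df["Wickets Taken"] / df["Matches Played"]
    # A crude batting-vs-bowling balance: runs scored per match relative to economy.
    df["Bat Bowl Balance"] = df["Avg Runs Per Match"] / df["Bowling Economy Rate"]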

3. Model Training:

o Train various machine learning models (e.g., logistic regression, decision trees,
random forest, gradient boosting, or deep learning) to classify whether a team will
win or not.

o Tune hyperparameters using techniques such as cross-validation and grid search to
optimize model performance.
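
A sketch of this step with scikit-learn, reusing X_scaled and y from the preprocessing
sketch above; the candidate models and parameter grids are illustrative, not prescribed:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, StratifiedKFold

    # Stratified folds keep the rare winner class represented in every split.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

    candidates = {
        "logistic_regression": (LogisticRegression(max_iter=1000),
                                {"C": [0.01, 0.1, 1, 10]}),
        "random_forest": (RandomForestClassifier(random_state=42),
                          {"n_estimators": [100, 300], "max_depth": [3, 5, None]}),
    }

    searches = {}
    for name, (model, grid) in candidates.items():
        search = GridSearchCV(model, grid, cv=cv, scoring="f1")
        search.fit(X_scaled, y)
        searches[name] = search
        print(name, search.best_params_, round(search.best_score_, 3))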

4. Model Evaluation:

o Evaluate the model using classification metrics such as accuracy, precision, recall, F1
score, and AUC-ROC curve.

o Compare models and select the one that performs the best on the validation set.
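
One way to compute these metrics on a held-out validation split, continuing from the
sketches above (the 80/20 split and the chosen model are assumptions):

    from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                                 recall_score, roc_auc_score)
    from sklearn.model_selection import train_test_split

    # Stratify so the small winner class appears in both splits.
    X_tr, X_val, y_tr, y_val = train_test_split(
        X_scaled, y, test_size=0.2, stratify=y, random_state=42)

    model = searches["random_forest"].best_estimator_.fit(X_tr, y_tr)
    pred = model.predict(X_val)
    proba = model.predict_proba(X_val)[:, 1]

    print("accuracy :", accuracy_score(y_val, pred))
    print("precision:", precision_score(y_val, pred, zero_division=0))
    print("recall   :", recall_score(y_val, pred, zero_division=0))
    print("F1       :", f1_score(y_val, pred, zero_division=0))
    print("AUC-ROC  :", roc_auc_score(y_val, proba))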

5. Model Deployment:

o Deploy the trained model using a web-based interface or a command-line tool.

o Ensure the deployed system can predict the outcome for new data provided (i.e., the
2025 data).

o Implement error handling and ensure the deployment is robust.
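
As one possible command-line deployment, the sketch below assumes the final model was saved
with joblib as a scikit-learn Pipeline that bundles the scaler and classifier; file and
column names are hypothetical:

    import argparse

    import joblib
    import pandas as pd

    TEXT_COLS = ["Team", "Group", "Team's Top Scorer", "Top Wicket-Taker"]

    def main():
        parser = argparse.ArgumentParser(
            description="Predict the 2025 Champions Trophy outcome per team")
        parser.add_argument("csv", help="path to the 2025 dataset")
        parser.add_argument("--model", default="final_model.joblib")
        args = parser.parse_args()

        # Basic error handling so bad paths or malformed files fail with a clear message.
        try:
            model = joblib.load(args.model)
            data = pd.read_csv(args.csv)
        except (FileNotFoundError, pd.errors.ParserError) as exc:
            raise SystemExit(f"Could not load inputs: {exc}")

        features = data.drop(columns=TEXT_COLS, errors="ignore")
        data["Predicted Winner"] = model.predict(features)  # 1 = winner, 0 = otherwise
        print(data[["Team", "Predicted Winner"]].to_string(index=False))

    if __name__ == "__main__":
        main()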

Submission Guidelines:

 Code: Submit the code used for data preprocessing, model training, and deployment as an
HTML file that includes the following:

a. Model: Provide a brief description of the model and the reasoning behind the
selection of the final model.
b. Documentation: Include a report that explains the methodology, feature selection,
model evaluation, and deployment process.
c. Deployment: Describe how the model will be deployed (e.g., an Excel file or Python
code for deployment).
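
If the work is done in a Jupyter notebook, one convenient way to produce the required HTML
file is the command "jupyter nbconvert --to html your_notebook.ipynb" (notebook name is a
placeholder); nbconvert preserves both the code and its output in the exported file.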

Additional Notes:

 You are free to explore advanced techniques such as ensemble methods, stacking models, or
neural networks if deemed appropriate.

 Domain knowledge of cricket and key factors that affect match outcomes (e.g., home
advantage, player form, etc.) will be helpful in improving model accuracy.

 Feel free to add more columns to the data if you feel any other columns are relevant.

Evaluation Criteria:

The results of the ICC Champions Trophy will be announced on March 9. For evaluation, only those
teams that have accurately predicted the winning team will be considered. This accuracy must be
achieved through rigorous, data-driven analysis. Predictions that fail to identify the winning team will
not be reviewed, and any attempt to rely on speculation, guesswork, or personal bias instead of
analytical methods will disqualify the team from further evaluation. It is essential that the prediction
process is grounded solely in data and statistical insights.

1. Accuracy of Predictions: How well the model performs in predicting the winners. (15%)

2. Innovative Feature Engineering: How well new features are derived and contribute to the
model’s performance. (20%)

3. Model Selection and Justification: The rationale behind selecting a particular model and the
performance metrics achieved. (20%)

4. Model Deployment: The usability and functionality of the deployed model. (20%)

5. Documentation: Clarity and completeness of the documentation provided. (25%)

This datathon is a great opportunity for participants to showcase their skills in predictive modeling,
feature engineering, and deployment. Good luck!

Attractive Prizes
Attractive cash prizes and trophies will be awarded to the winners and runners-up of the
competition. We will recognize the top three teams, with prizes for first, second, and third positions.
Make sure your predictions are data-driven for a chance to win these prestigious rewards!
Rules for Team Formation
1. Team Formation: Each team can have 2 to 5 members.

2. Eligibility: Team members must be current students of TSM, enrolled in MBA, PGDM, or
PGDDSBA programs.

3. Outsourcing: Teams must complete all work internally. Outsourcing any part of the work to
external agencies or individuals is strictly prohibited.

4. Collaboration: Collaboration between teams is not allowed. Each team must work
independently.

5. Originality: All submissions must be original and based on the team’s work. Plagiarism will
result in disqualification.
