Cits2402 Assignment

The assignment for CITS2402 requires students to compare demographic data from the 2021 Australian Census and the 2023 New Zealand Census, focusing on a topic of their choice that allows for in-depth analysis. Students must submit a Python notebook and a PDF version, ensuring their code runs in Google Colab without errors, and include appropriate visualizations and explanations of their findings. The assignment emphasizes the importance of clarity, professionalism, and a well-structured report that adheres to the data science lifecycle.

Uploaded by

maggiechowwwww

CITS2402 Introduction to Data Science

Semester 2, 2024
Assignment
Assessed, worth 20%. Due: 11:59pm, Friday 4th October 2024

1 Aim
This assignment aims to investigate the similarities and differences between Australia and New
Zealand regarding a demographic feature of your choice by comparing data from the latest
available Australian Census (2021) and New Zealand Census (2023)¹.

You may choose what census topic you are most interested in from the data, provided it is not
one we addressed in the lecture or lab case studies (such as age, number of children, or travel
to work). For best results, you should also choose a topic with multiple categories (not, for
example, binary categories). You may focus on particular categories of interest. You should use
appropriate ways of visualising the data that best demonstrate the similarities and differences in
your conclusions.

To focus the assignment, it may be helpful to frame it as a question you seek to answer. You
should clearly state in your opening paragraphs what it is that you are seeking to answer.

When selecting your topic, choose data that lends itself well to in-depth analysis and graphical
comparison. Simple comparisons, such as population size, are unlikely to achieve high marks.
Review several alternatives before settling on your topic to ensure it offers rich data for analysis
and meaningful insights. Choose a topic that interests you and allows for a comprehensive and
engaging exploration using the available census data.

Here are some context examples to inspire your choice:

• Comparing urbanisation trends in both countries.

• Examining the impact of migration on cultural diversity in both countries.

• Analysing the relationship between education levels and life satisfaction in both countries.

• Comparing housing or health indicators and their impact on overall well-being in both
countries.

It is important to understand from the beginning that, while programming is important, this
assignment is not merely a “coding exercise”. Equally significant are the context and rationale
behind your work, the sound and replicable use of data, and the clear and compelling
presentation of your results.
¹ You may choose a country other than NZ if there is one you are particularly interested in, provided the census
data is publicly available and you link the source.

2 Learning outcomes
This assignment demonstrates competencies in:

• sourcing information from (authoritative) public data repositories.

• extracting and cleaning information needed to answer a question about the data.

• analysing and interpreting the data.

• visualising data to aid understanding and communicate results.

• writing comprehensive and informative scientific reports to communicate your findings.

3 Authorship
The assignment may be done individually or in groups of up to three students.

Each student’s name and student number must be provided in the declaration at the top of the
assignment template CITS2402-Assignment-template.ipynb, whether the assignment is
completed by one student or by a group.

Where the assignment has been completed by more than one student, only one copy should be
submitted, so that it is clear which version to mark.

The suffix “-template” should be replaced with the corresponding student numbers. For
instance, if you are doing your assignment with another person, you should rename your file as
CITS2402-Assignment-STDNO1-STDNO2.ipynb, where ‘STDNO1’ and ‘STDNO2’
are the student numbers of the students involved in the submission.

The submission must be the student’s (or students’) own work. Any material used in the
assignment from other sources must be clearly stated and referenced.

Your report, including all explanations and code, must be provided in a single notebook. The
notebook should contain headings and explanations in markdown cells and executable code in
Python code cells (as is done in the lab sheets).

It is recommended that you download a backup of your final completed submission directory
for your own records.

4 Data
Finding the appropriate data is part of the exercise. You cannot expect that the two countries
will provide the data in the same way. It is recommended that you begin searching for your
data early on, starting with the Australian Bureau of Statistics (ABS) and Stats NZ - Census
data. The metadata in the spreadsheets should be used to identify the relevant tables for your
investigation. Only the individual tables in CSV format (not all the data) should be used in your
submission.

Your report should clearly explain how you located the relevant data. This explanation should
be detailed enough to allow the reader to replicate your steps and obtain their own raw data to
test your code.

Your code should only need to access the file system for reading. You should avoid saving
images, writing files, etc.
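
As a minimal sketch of read-only data access, the snippet below loads a single table with pandas. The column names and the file name mentioned in the comment are hypothetical placeholders, not actual ABS or Stats NZ names, and the example reads from an in-memory string so that it runs without any file present.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of a small census CSV extract.
# In the notebook you would pass a relative path instead, for example
# load_table("abs_2021_table.csv"), with the file in the same directory.
SAMPLE_CSV = """category,persons
Category A,1200
Category B,800
Category C,500
"""


def load_table(source):
    """Read one census table; `source` is a relative path or a file-like object."""
    return pd.read_csv(source)


table = load_table(io.StringIO(SAMPLE_CSV))
print(table.shape)
```

Using a relative path keeps the notebook portable: the marker can place the same CSV next to the notebook and rerun it unchanged.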

5 Submission
Your submission consists of:

1. The Python notebook CITS2402-Assignment-STDNO1-STDNO2.ipynb, including
all explanations and code. It should contain headings and explanations in markdown
cells and executable code in Python cells.

2. The PDF version of the Python notebook. Both files should be named following the
instructions above.

3. Any data files that you use to run the code. Data files should not be more than 1MB. If
you wish to include any images (not required), they should be no larger than 200KB.

Submit your files to LMS as a ZIP file before the due date and time. You can submit them
multiple times; only the latest version will be marked. When making a new submission, check
that it includes all requested files. Your submission must follow the rules provided
in LMS.

Before submitting your assignment, you must ensure it runs without errors in the Google Colab
environment. This avoids markers running into local problems with libraries you may choose
to use. Your code should therefore run in Google Colab without installing any additional
packages. The Colab environment comes with the essential data science libraries (numpy,
pandas, matplotlib, etc.). Your mark will be zero if markers cannot run your code in Google Colab.

Important:

• You must submit your assignment as .IPYNB *and* as an electronic file in PDF format
(do not send DOCX or any other file format). Only the PDF and .IPYNB formats are accepted;
any other file format will receive a zero mark.

• Failing to submit any of the required files will result in a zero mark. You should include
all data files you are using.

• The data files you are using should be clearly specified. The marker should be able to
download the same data files from the website(s) and run your analysis.

• You should provide comments on your code.

• By submitting your assignment, you acknowledge you have read all instructions provided
in this document and LMS.

• There is a section in your LMS, Assignment - Updates and Clarifications, where you will
find updates or clarifications about the tasks when necessary. It is your responsibility to
check this page regularly.

• You will be assessed on your thinking and process, not only on your results. You should
demonstrate you understand the concepts involved.

• Your answer must be concise. You will be graded on thoughtfulness. If you are writing
long answers, rethink what you are doing; you are probably on the wrong path.

• You can ask in the lab or during consultation if you need clarification about the
assignment.

• You should be aware that some algorithms can take a while to run. A good way to speed
up Python code is to use the vectorised forms discussed in class. It is also strongly
recommended that you start your assignment early to allow for the
computational time.
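
To illustrate what a vectorised form looks like, the sketch below computes each category's percentage share twice: once with an explicit loop and once with a single column expression. The numbers are invented for illustration and the column names are assumptions, not taken from any census table.

```python
import numpy as np
import pandas as pd

# Hypothetical counts for illustration only; not real census figures.
counts = pd.DataFrame({
    "category": ["A", "B", "C"],
    "persons": [1200, 800, 500],
})

# Loop version: computes each share one row at a time.
total = counts["persons"].sum()
shares_loop = []
for value in counts["persons"]:
    shares_loop.append(100 * value / total)

# Vectorised version: one expression applied to the whole column.
counts["share_pct"] = 100 * counts["persons"] / counts["persons"].sum()

# Both approaches give the same numbers.
assert np.allclose(shares_loop, counts["share_pct"])
```

On a three-row table the difference in speed is negligible; the benefit appears on larger tables, and the vectorised line is also shorter and easier to read.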

6 Code
The code will be executed with a fresh kernel for marking, so (as usual) you should ensure it
runs with a clean kernel before submission.

Any supporting data must be in the same directory. Data files should not be more than 1MB.

It is recommended that development is done in the notebooks - in the past students have had
code fail due to pasting from other environments.

7 Rubric
The assignment questions involve analysing data. You should present a well-organised report
and aim to write concisely.

The assignment will be marked for clarity and professionalism of both the exposition and the
coding.

Please read the assignment instructions and rubric carefully when preparing your code and
report.

It is recommended that you structure the report in a way that is consistent with the data science
lifecycle. You should use headings to help structure your report.

Consider these aspects as examples of what is expected (see rubric for more details):

• Plots and figures: Your report should include appropriate and well-presented
visualisations that are meaningful to the analysis. All your diagrams/plots should have proper
titles, axis labels, values, etc., to help the reader understand what you are plotting.

• Your report should include the correct use of the data science lifecycle steps and the
interpretation of the results obtained from these steps.

• Presentation of results: Describe the results of your analysis and their interpretations.
Raw software output is not a valid result. You must format and present your answer
appropriately (tables, graphs, etc.). You should not add irrelevant information when presenting
the results.

• Discussion: Based on your results, describe the conclusions of your analysis.

• Overall: Do not report a long list of numbers without explaining what they are
and what they mean. Do not generate plot after plot without explaining what each one
shows or its purpose. Instead, think about the best way to display/illustrate your
approach. Pay attention to all details, including the graph labels, presentation quality, and
clarity.
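
As one possible reading of the "Plots and figures" point above, the sketch below draws a grouped bar chart with a title, axis labels, tick labels and a legend. The function name, the category labels and the values are all hypothetical; the choice of a grouped bar chart is an assumption and may not suit every topic.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_grouped_shares(categories, shares_a, shares_b, label_a, label_b, title):
    """Draw a grouped bar chart of percentage shares for two countries."""
    positions = np.arange(len(categories))
    width = 0.4
    fig, ax = plt.subplots(figsize=(7, 4))
    ax.bar(positions - width / 2, shares_a, width, label=label_a)
    ax.bar(positions + width / 2, shares_b, width, label=label_b)
    ax.set_xticks(positions)
    ax.set_xticklabels(categories)
    ax.set_xlabel("Category")
    ax.set_ylabel("Share of population (%)")
    ax.set_title(title)
    ax.legend()
    return fig, ax


# Hypothetical values for illustration only; not real census figures.
fig, ax = plot_grouped_shares(
    ["A", "B", "C"], [48, 32, 20], [40, 35, 25],
    "Australia (2021)", "New Zealand (2023)",
    "Share of population by category (illustrative data)",
)
```

Plotting shares rather than raw counts is a deliberate choice here, since the two countries differ greatly in population size. In a notebook the figure is displayed when the cell runs, so nothing needs to be written to disk.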

Markers will pay attention to the following components (roughly equal weighting).

Context and data:

• Adequate context has been provided to understand the question and why it is important.

• The question you seek to understand is clearly stated.

• It is clear what data is used and its provenance. Instructions allow the reader to easily
source the data (to make the work replicable).

• Complete context and information about the data (e.g. the unit of measure, description of
the categories in the topic, etc) are provided.

• Relevant differences between the data from different sources and assumptions you have
to make for comparison are clearly described.

• If you are extracting only part of the data, your code should be accompanied by a brief
description of what you are extracting and why.
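
Differences between sources often show up as mismatched category labels. The sketch below shows one way to map each source's labels onto a common scheme and sum the counts that then coincide. Every label, the mapping itself and the `harmonise` helper are hypothetical; any real mapping is an assumption that should be stated and justified in the report.

```python
import pandas as pd

# Hypothetical labels: each source names its categories differently.
TO_COMMON = {
    "Cat. A (AU wording)": "A",
    "Cat. A (NZ wording)": "A",
    "Cat. B (AU wording)": "B",
    "Cat. B1 (NZ wording)": "B",  # this source splits B into two finer categories
    "Cat. B2 (NZ wording)": "B",
}


def harmonise(table, label_column, count_column, mapping):
    """Relabel categories with `mapping` and sum the counts that now coincide."""
    relabelled = table.assign(common=table[label_column].map(mapping))
    return relabelled.groupby("common")[count_column].sum()


# Hypothetical counts for illustration only.
nz = pd.DataFrame({
    "label": ["Cat. A (NZ wording)", "Cat. B1 (NZ wording)", "Cat. B2 (NZ wording)"],
    "count": [30, 50, 20],
})
nz_common = harmonise(nz, "label", "count", TO_COMMON)
print(nz_common)
```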

Data lifecycle, structure, and presentation:

Your data cleaning steps should be accompanied by a brief description of any steps you took to
transform the data from its raw form into usable form.

• The route from the data to the results is clearly set out, and steps are explained.

• It is clear what format the raw data took, what is extracted and why.

• Any data cleaning and conversion is clearly and concisely outlined.

• The processing or analysis necessary to extract and compile the results is clearly
explained.
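
A cleaning step of the kind described above might look like the following sketch. The raw layout (padded labels, thousands separators, a ".." marker for a suppressed cell) is invented for illustration, and dropping the suppressed row is an assumption that would need to be explained in a real report.

```python
import io

import pandas as pd

# Hypothetical raw extract: padded labels, thousands separators and a
# suppressed cell marked "..". Real census files will differ.
RAW_CSV = '''category,persons
" Category A ","1,200"
"Category B","800"
"Category C",".."
'''


def clean_counts(raw):
    """Return a tidy copy with stripped labels and numeric counts."""
    tidy = raw.copy()
    tidy["category"] = tidy["category"].str.strip()
    # Remove thousands separators, then coerce non-numeric cells to NaN.
    tidy["persons"] = pd.to_numeric(
        tidy["persons"].str.replace(",", "", regex=False), errors="coerce"
    )
    # Assumption: rows without a usable count are dropped.
    return tidy.dropna(subset=["persons"]).reset_index(drop=True)


raw = pd.read_csv(io.StringIO(RAW_CSV), dtype=str)
tidy = clean_counts(raw)
print(tidy)
```

Reading every column as text first (`dtype=str`) keeps the raw form visible, so each conversion is an explicit, documented step.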

Results, visualisation and conclusion:

• The results are clearly stated and connected back to the original data and assumptions.

• Appropriate and informative choices are made for visualisation(s) (plots).

• The visualisations are clearly and professionally presented.

• Conclusions are connected to relevant features of the visualisations.

Coding:

There is no single “right” way to write the code. However, the following should be considered:

• Code is clear and easy to read and comprehend. Considerations should include the use of
meaningful variable names, the use of comments and/or docstrings for key steps/blocks
(you do not need to comment every line; this tends to obscure the key steps), and the use
of functions.

• Code is appropriately concise. Code should not be pared down to a bare minimum at
the expense of clarity and readability. However, you should try to avoid unnecessary
extraneous code.

• Code is reasonably efficient. (It is not necessary to achieve ultimate efficiency at the
expense of writing clear, logical code. However, you should avoid obvious unnecessary
inefficiencies.)

• Code is well structured. Functional decomposition is used to separate tasks into
meaningful components.

• Ensure you avoid repeating unnecessary blocks of code by using functions. Additionally,
refrain from hard-coding values directly within functions; instead, use function arguments
to pass these values.
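
The last point can be sketched as follows: the column name and the number of rows to keep are passed as arguments, so the same function serves tables from both sources. The function name, tables and values are hypothetical.

```python
import pandas as pd


def top_categories(table, count_column, n):
    """Return the `n` rows with the largest values in `count_column`."""
    return table.nlargest(n, count_column).reset_index(drop=True)


# Hypothetical tables; the column names differ to mimic two sources.
australia = pd.DataFrame({"category": ["A", "B", "C"], "persons": [600, 150, 250]})
new_zealand = pd.DataFrame({"category": ["A", "B", "C"], "count": [20, 50, 30]})

# One function, two calls: nothing is hard-coded inside the function body.
top_au = top_categories(australia, "persons", 2)
top_nz = top_categories(new_zealand, "count", 2)
```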

Professionalism and Challenge:

• Overall, the report forms a compelling and illuminating narrative.

• The report is not unnecessarily long or repetitive and provides all the information
completely but concisely.

• The report reveals aspects of the data that are not trivially obvious.

• The report is of a quality that an employer would be comfortable showing to a client.

8 Plagiarism and penalty on late submissions
See the URL below about late submission of assignments:

https://ipoint.uwa.edu.au/app/answers/detail/a_id/2711/~/consequences-for-late-assignment-submission

Plagiarism: In accordance with University Policy, you certify that all work submitted for this
assignment is your own and that all material drawn from other sources has been fully
acknowledged.
