Data Engineering Syllabus Spring 2024 (3)

IEOR E2000 is a Spring 2024 course focused on data engineering with Python, covering data organization, storage, and management. The course includes lectures, assessments, and collaborative learning through EdStem, with a grading policy based on homework, exams, and participation. Academic honesty is emphasized, and students are encouraged to seek help while ensuring their work is original.


IEOR E2000: Data Engineering with Python

Spring 2024

Class Schedule
● Classroom: Mudd 303
● Meeting time: Mondays and Wednesdays 2:40-3:55 PM

Course Staff
Instructor
● Yi Zhang
○ Email: yz3558@columbia.edu
○ Office: Mudd 340
○ Office Hours: Fridays 2:45-4:15 PM

Course Assistant
● Sahil Bhave
○ Email: sb4865@columbia.edu
○ Office Hours
■ Time: 11 AM to 12:30 PM
■ Location: Table 3, Mudd 301
● Chris Lee
○ Email: csl2183@columbia.edu
○ Office Hours
■ Time: 11 AM to 12:30 PM
■ Location: Table 3, Mudd 301

Course Description
This comprehensive course is designed to equip students with the essential knowledge and skills
for working effectively with data. Students will learn how data is organized, stored, and managed in
both programming applications and large-scale databases. This course serves as a crucial
stepping stone for students interested in technology-driven fields that deal with information
handling and data analytics.

Learning Objectives
● Understand fundamental data building blocks, such as lists, tuples, arrays, linked lists,
stacks, queues, deques, priority queues, dictionaries, sets, trees, and graphs, and their
applications.
● Implement and analyze algorithms that use data structures for efficient data manipulation
and for solving Operations Research problems.
● Comprehend the principles of database design and develop SQL/NoSQL skills to
retrieve, update, and manage data in databases.
● Perform basic data manipulation and wrangling using Pandas.
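
To make the first objective concrete, here is a minimal sketch (an illustration, not course material) of how several of the listed building blocks map onto Python's standard library: a `list` used as a stack, `collections.deque` used as a queue and as a double-ended queue, and `heapq` used as a priority queue. The task names are hypothetical.

```python
from collections import deque
import heapq

# Stack (LIFO): a Python list, pushing and popping at the end.
stack = []
stack.append(1)
stack.append(2)
stack.append(3)
top = stack.pop()  # removes and returns 3

# Queue (FIFO): deque supports O(1) appends and pops at both ends.
queue = deque()
queue.append("a")
queue.append("b")
first = queue.popleft()  # removes and returns "a"

# Deque: insert and remove at either end.
dq = deque([2, 3])
dq.appendleft(1)
dq.append(4)  # dq is now deque([1, 2, 3, 4])

# Priority queue: heapq maintains a plain list as a binary min-heap,
# so the smallest (priority, item) tuple is popped first.
tasks = []
heapq.heappush(tasks, (2, "write report"))  # hypothetical tasks
heapq.heappush(tasks, (1, "fix bug"))
heapq.heappush(tasks, (3, "review code"))
next_task = heapq.heappop(tasks)  # (1, "fix bug")
```

A plain list could also serve as a queue via `list.pop(0)`, but that call shifts every remaining element (O(n)), which is why `deque` is the usual choice.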

Course Materials

Course website
We will use EdStem to post lecture materials and host the discussion board. Please check
the course website periodically for updates.

We will use the EdStem discussion board for Q&A. All questions should be posted on the
EdStem discussion board. The goal is to create a collaborative space for learning
and communication. Please do not use public posts to share solutions. For personal matters or
content that might reveal a solution, please mark your post "private" so that your peers cannot
see it.

Textbook
There is no required textbook. All materials will be posted on EdStem. The following two
textbooks might be useful if you are interested in doing some in-depth reading for certain topics.

● Goodrich, Michael T., Roberto Tamassia, and Michael H. Goldwasser. Data Structures and
Algorithms in Python. John Wiley & Sons Ltd, 2013.
● Silberschatz, Abraham, Henry F. Korth, and Shashank Sudarshan. Database System Concepts.
McGraw Hill, 2019.

Software
We will use Python for this course. We expect you to have completed ENGI E1006 or to be
proficient in Python, including knowing how to:

● Define and manipulate built-in Python objects
● Define Python functions
● Write loops and conditional statements
● Use object-oriented programming in Python, such as defining Python classes and
performing class inheritance
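
As a rough self-check of the expected background, the following hypothetical sketch (not taken from the course) exercises each skill listed above: built-in objects, a function, a loop with a conditional, and a small class hierarchy that uses inheritance. The class and function names are illustrative only.

```python
class Shape:
    """Base class; subclasses override area()."""

    def __init__(self, name):
        self.name = name

    def area(self):
        raise NotImplementedError


class Rectangle(Shape):
    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    # Inheritance: a square is a rectangle with equal sides.
    def __init__(self, side):
        super().__init__(side, side)
        self.name = "square"


def total_area(shapes, min_area=0):
    """Sum the areas of the shapes whose area is at least min_area."""
    total = 0
    for shape in shapes:  # loop
        if shape.area() >= min_area:  # conditional
            total += shape.area()
    return total


shapes = [Rectangle(2, 3), Square(4), Square(1)]  # built-in list of objects
print([s.area() for s in shapes])  # [6, 16, 1]
print(total_area(shapes))  # 23
print(total_area(shapes, min_area=5))  # 22
```

If any line of this sketch is unfamiliar, reviewing the corresponding topic from ENGI E1006 before the course starts may help.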

Assessments
Assessment of learning objectives will consist of six regular problem sets, a midterm exam, and
a final exam. All assessments will contain both theoretical problems and application
problems in Python. The problems aim to help students practice and assess their skills in
applying methods learned in class to work with various data components and solve real-world
problems.

Grading Policy
Homework (35%), Midterm (20%), Final (40%), Class Participation (5%)

Homework
We will have six homework assignments in total. The lowest assignment grade will be dropped.
Homework questions focus on the application component of the course.
You may collaborate on the homework assignments; however, you MUST complete the write-up
independently. You may not share questions with, or solicit help from, people who are not
taking this course.
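
The grading policy weights (Homework 35%, Midterm 20%, Final 40%, Class Participation 5%) combine with the rule that the lowest homework grade is dropped. The syllabus does not say how the remaining homework grades are aggregated, so this hypothetical sketch assumes every component is scored out of 100 and the five kept homework grades are averaged with equal weight. It does not model the curve used for letter grades.

```python
# Weights from the grading policy; the 0-100 scale is an assumption.
WEIGHTS = {"homework": 0.35, "midterm": 0.20, "final": 0.40, "participation": 0.05}


def homework_average(scores):
    """Average homework scores after dropping the single lowest one."""
    if len(scores) < 2:
        raise ValueError("need at least two homework scores to drop one")
    kept = sorted(scores)[1:]  # drop the lowest score
    return sum(kept) / len(kept)


def course_score(hw_scores, midterm, final, participation):
    """Weighted course score on a 0-100 scale (assumes each input is 0-100)."""
    components = {
        "homework": homework_average(hw_scores),
        "midterm": midterm,
        "final": final,
        "participation": participation,
    }
    return sum(WEIGHTS[name] * value for name, value in components.items())


hw = [90, 85, 60, 95, 100, 80]  # hypothetical scores for the six assignments
print(homework_average(hw))  # 90.0 (the 60 is dropped)
print(round(course_score(hw, midterm=80, final=85, participation=100), 2))  # 86.5
```

Here 0.35 × 90 + 0.20 × 80 + 0.40 × 85 + 0.05 × 100 = 31.5 + 16 + 34 + 5 = 86.5; rounding guards against small floating-point error in the weighted sum.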

Exams
The exams will be computer-based, open-book, and open-notes. You are required to
finish the questions independently. The use of AI tools is forbidden during the exams.

Class Participation
Students are expected to attend the lectures and contribute to an active learning environment.
Ways to increase your class participation include, but are not limited to:

● Attend lectures in person
● Actively ask and answer questions during the lecture
● Actively participate in discussions on EdStem

Letter Grade
Letter grades will be assigned on a curve. When assigning letter grades, we will
consider your standing among your peers and the overall class performance.

Assignment Policies
Late Policy
The deadline for all homework assignments is midnight EST. In addition, I will give each
person a leeway of 1 hour for each assignment. You may also submit up to 2 homework
assignments up to 24 hours late; no permission is needed and no questions will be asked. Apart
from these cases, submissions received after the leeway will receive a 0.
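
The late policy combines three rules: a 1-hour leeway on every assignment, at most 2 submissions accepted up to 24 hours late, and a 0 otherwise. This hypothetical sketch encodes one reading of those rules. It assumes the 24-hour window is measured from the deadline, that a late allowance is used only when a submission falls after the leeway, and that all times are naive datetimes in the same (Eastern) time zone; the function name and status strings are illustrative.

```python
from datetime import datetime, timedelta

LEEWAY = timedelta(hours=1)        # grace period on every assignment
LATE_WINDOW = timedelta(hours=24)  # maximum lateness for a late submission
MAX_LATE_SUBMISSIONS = 2           # late submissions allowed per student


def submission_status(deadline, submitted, late_used):
    """Classify a submission; returns (status, late submissions used so far)."""
    if submitted <= deadline + LEEWAY:
        return "on time", late_used
    if submitted <= deadline + LATE_WINDOW and late_used < MAX_LATE_SUBMISSIONS:
        return "late (allowance used)", late_used + 1
    return "zero", late_used


# Assumption: "midnight" means 12:00 AM at the end of the due date (HW1, Feb 17).
deadline = datetime(2024, 2, 18, 0, 0)
print(submission_status(deadline, datetime(2024, 2, 18, 0, 45), 0))  # ('on time', 0)
print(submission_status(deadline, datetime(2024, 2, 18, 10, 0), 0))  # ('late (allowance used)', 1)
print(submission_status(deadline, datetime(2024, 2, 18, 10, 0), 2))  # ('zero', 2)
```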

Re-grading Policy
For re-grading of homework, please leave a private post on the EdStem discussion board
within seven days of receiving your grade. Since I will post the solution to each homework
assignment, you are expected to compare the solution with your own write-up before sending
the request. In your request, explain why you believe the grading contains a mistake.

Tentative Deadlines
Assignments are released 7 days before the deadline.

Assessment   Due            Note
HW1          Feb 17, 2024
HW2          Mar 2, 2024
HW3          Mar 17, 2024
Midterm      Mar 20, 2024   In class
HW4          Apr 6, 2024
HW5          Apr 20, 2024
HW6          May 4, 2024
Final        May 8, 2024    In person; projected by the registrar's office

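
Because assignments are released 7 days before their deadlines, the tentative release dates follow directly from the table. This small sketch computes them with `datetime.date` arithmetic; the dates remain tentative, as the schedule above is.

```python
from datetime import date, timedelta

RELEASE_LEAD = timedelta(days=7)  # assignments are released 7 days before the deadline

due_dates = {
    "HW1": date(2024, 2, 17),
    "HW2": date(2024, 3, 2),
    "HW3": date(2024, 3, 17),
    "HW4": date(2024, 4, 6),
    "HW5": date(2024, 4, 20),
    "HW6": date(2024, 5, 4),
}

# Subtracting a timedelta from a date handles month boundaries and leap years.
release_dates = {hw: due - RELEASE_LEAD for hw, due in due_dates.items()}

for hw, released in release_dates.items():
    print(f"{hw}: released {released:%b %d, %Y}, due {due_dates[hw]:%b %d, %Y}")
```

For example, HW2 (due Mar 2, 2024) would be released on Feb 24, 2024, since February 2024 has 29 days.
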
Academic Honesty
I highly encourage a collaborative environment. You are encouraged to help each other in the
learning process. Students are also highly encouraged to come to the instructor/TA for help.
However, your homework assignments and quizzes must be entirely your own write-up. Sharing
homework solution files and copying others' work are strictly prohibited and will be penalized
according to the severity of the case.

For the exams, you need to finish the questions by yourself. No discussion or collaboration is
allowed. Your work cannot be copied from another person or any other source. Submissions
that are identical or nearly identical, either to a peer's work or to another source,
will be regarded as cheating. Sanctions may extend to termination of your enrollment
at Columbia University. All suspected incidents will be recorded with SEAS administration at the
same time the student is notified.

Potential Topics
● Data Building Blocks
○ Numeric, Boolean, and String
○ List, Tuple, and Arrays
○ Stacks, Queues, Priority Queue, Deque, and Linked List
○ Map, Hash Tables, Dictionary, and Set
○ Trees
○ Graphs
○ Search and Sort
● Data Repository
○ Relational Database Management System
○ SQL
○ NoSQL
● Data Collection
○ Web Scraping using XML, HTML, and JSON
● Data Manipulation with Pandas
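
For the Data Repository topics (relational database management systems and SQL), here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `students` table, its columns, and the rows are hypothetical, chosen only to show creating a table, inserting, updating, and retrieving data with parameterized queries.

```python
import sqlite3

# In-memory database, so the example leaves no file behind.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema for illustration.
cur.execute(
    """
    CREATE TABLE students (
        uni   TEXT PRIMARY KEY,
        name  TEXT NOT NULL,
        score REAL
    )
    """
)

# Parameterized inserts (the ? placeholders) avoid SQL injection.
rows = [("ab1234", "Ada", 91.5), ("cd5678", "Grace", 78.0), ("ef9012", "Alan", 85.0)]
cur.executemany("INSERT INTO students (uni, name, score) VALUES (?, ?, ?)", rows)

# Update one row, then retrieve students at or above a threshold.
cur.execute("UPDATE students SET score = ? WHERE uni = ?", (88.0, "cd5678"))
conn.commit()

cur.execute(
    "SELECT name, score FROM students WHERE score >= ? ORDER BY score DESC", (85.0,)
)
result = cur.fetchall()
print(result)  # [('Ada', 91.5), ('Grace', 88.0), ('Alan', 85.0)]

conn.close()
```

SQLite is used here only because it ships with Python; the same SQL statements carry over, with minor dialect differences, to server-based systems.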

Additional Support
Studying at Columbia University can be competitive and stressful. We are here to make sure
everyone stays physically and mentally healthy. If you need any help with your work or life,
please do not hesitate to approach us. We are always here to help. Columbia Counseling and
Psychological Services is also a good option for anonymous consultation.
