
Project

This is the final project for the course. You can work in teams of 1-3 to complete a data science project of your choosing.

There are multiple graded components to the project that sum to a final project grade. The project is worth 200 points total. Here is the point breakdown for each component.


  1. Proposal: 25 pts

  2. Project Goal, Scope, Approach, Evaluations, Algorithmic Results, In-class Final Presentation and Demo/Visualization: 25 pts

  3. Interim Check-In: 25 pts

  4. Code/GitHub contents: 40 pts

  5. Project Report: 60 pts

  6. 3-5 minute presentation: 25 pts


| Deadline | Deliverable | Points | Submission Window Opens |
|---|---|---|---|
| Tuesday, 3 March | Opt-in | 0 | Tuesday, 3 March |
| Thursday, 19 March | Proposal | 50 | Tuesday, 3 March |
| Tuesday, 5 May | Interim Check-In | 25 | Tuesday, 28 April |
| Thursday, 12 May | Complete Project | 100 | Tuesday, 5 May |
| Thursday, 12 May | Presentation Slides | 25 | Tuesday, 5 May |
| Total Points | | 200 | |
The final project submission should include:

  1. Source code

    1. Share the GitHub repository link if the code is public; otherwise, include all code files in the zip.

    2. A README file with instructions on how to run the code.

  2. Poster or slides

  3. Demo (if any)

  4. Project report

    1. A 2-4 page report summarizing the project, with a font size no smaller than 11 pt.

    2. LaTeX, Word, or PDF formats are accepted.

    3. DO NOT include any code in the report unless necessary, but do include relevant figures, tables, and results.

Project Proposal (50 points)

Due AoE, Thursday, 19 March. No more than 2 pages, single-spaced, with 11-point font (PDF, .docx, or .txt format).

The window for submitting proposals opens March 10. If you would like feedback and the opportunity to resubmit for full credit, submit early in the window. Feel free to resubmit as many times as you like, up until the proposal deadline.

The proposal is split into the following sections:

Overview:

Think of the overview section as the equivalent of an abstract in a research paper or an elevator pitch for the project. The following questions will help you frame your thoughts if you ever have to succinctly describe your project in an interview:

  • Title: should capture the topic/theme of your project.

  • Objective: In 1 to 2 sentences, succinctly describe what you are hoping to accomplish in this project in simple, non-technical English.

  • Importance: In 1 to 2 sentences, describe why this project has personal significance to you.

  • Originality: In 1 to 2 sentences, describe why you believe this project idea is unique and original.

Background Research:

In this section, please show us that you have already researched the project you are proposing by answering the questions below.

  • Key Term Definitions: What are some terms specific to your project that someone else might not know? List and define these terms here.

  • Existing Solutions: What existing solutions (if any) are already available for your problem? What are the drawbacks to these solutions?

Data:

In order to write a successful proposal, you must already have obtained the data and done basic exploratory analysis on it, enough that you feel confident you have enough data to answer the questions you wish to explore. We cannot stress this enough: you must use data that is publicly available. Avoid datasets that commonly appear in tutorials, such as Iris for classification. If your data does not meet these criteria, your proposal will be rejected.

The following questions will guide you through some criteria you should be using to assess if the data you have is enough for a successful project.

  • Data Source: Include a list of your planned data source(s), complete with URL(s) for downloading. All data must be publicly available.

  • Data Volume: How many columns in your dataset? How many rows? If you are joining multiple datasets together, please tell us how many rows and columns remain after the data has been merged into a single dataset.

  • Data Richness: What type of data is in your dataset? You don’t need to describe every column. A generalized overview is fine. (e.g. “My data contains 311 complaint types, the date the complaints are created and closed, as well as a description of the complaint”). If you found a data dictionary, feel free to link us to that as well.
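For the Data Volume and Data Richness questions, a few lines of pandas are usually enough to report row and column counts after a join and to give a generalized overview of column types. The sketch below assumes pandas; the two small tables, their column names (`complaint_id`, `borough`, `n_inspectors`, and so on), and their values are hypothetical stand-ins for your own data, which you would normally load with something like `pd.read_csv`.

```python
import pandas as pd

# Hypothetical main table; with real data you might use pd.read_csv(path_to_your_file).
complaints = pd.DataFrame(
    {
        "complaint_id": [1, 2, 3, 4],
        "borough": ["BRONX", "QUEENS", "BRONX", "BROOKLYN"],
        "complaint_type": ["Noise", "Heat", "Noise", "Parking"],
        "created_date": pd.to_datetime(
            ["2021-01-03", "2021-01-05", "2021-02-10", "2021-03-01"]
        ),
    }
)

# Hypothetical lookup table that shares the "borough" key column.
boroughs = pd.DataFrame(
    {"borough": ["BRONX", "QUEENS", "BROOKLYN"], "n_inspectors": [5, 8, 7]}
)

# how="inner" keeps only rows whose key appears in both tables,
# so unmatched rows are dropped; compare counts before and after.
merged = complaints.merge(boroughs, on="borough", how="inner")

# Data Volume: rows and columns that remain after the merge.
n_rows, n_cols = merged.shape
print(f"Before merge: {len(complaints)} rows; after merge: {n_rows} rows, {n_cols} columns")

# Data Richness: a generalized overview of each column's type.
print(merged.dtypes)
```

Comparing the row count before and after the merge is a quick way to catch keys that failed to match.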

The Predictive Model:

A strong data science project should demonstrate your knowledge of predictive modeling. We will be covering models extensively in the latter half of the course. At this stage of the proposal writing, we will not have covered all the modeling techniques yet, so it’s okay to be a bit vague here.

Hint: Look ahead in the textbook at the chapters on “Linear Modeling” and “Multiple Linear Modeling” for the running examples of models.

  • The Predictors (X’s): Which column(s) in the dataset will be used to predict your target column?

  • Python Dependencies: What Python libraries and dependencies will you be using?

  • Security and Privacy Considerations: Will you be working with personally identifiable information (PII)? Could your model be misused for evil rather than good? If so, how do you plan to mitigate that?
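Since the proposal points ahead to linear and multiple linear modeling, here is a minimal sketch of how chosen predictors (the X’s) can be used to predict a target column with ordinary least squares. It assumes NumPy and a numeric predictor matrix; the helper names `fit_linear_model` and `predict` are hypothetical, the data is synthetic, and the course textbook may use a different library (for example scikit-learn's `LinearRegression`).

```python
import numpy as np


def fit_linear_model(X, y):
    """Fit y ~ b0 + X @ b by ordinary least squares; return (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first fitted parameter is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    return params[0], params[1:]


def predict(intercept, coefs, X):
    """Apply a fitted linear model to new rows of predictors."""
    return intercept + np.asarray(X, dtype=float) @ coefs


# Synthetic example: two predictors and a target generated as
# y = 1 + 2*x1 - 3*x2 plus a little Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1 + 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.1, size=200)

intercept, coefs = fit_linear_model(X, y)
print("intercept:", round(intercept, 2), "coefficients:", np.round(coefs, 2))
```

With real data, non-numeric columns would need to be encoded or dropped before fitting.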

The Visualization:

A key part of a great data science portfolio is its visualizations. They are a quick and elegant way of showcasing your work during the job hunt, even to a non-technical audience.

Thus, a major part of this final project will center on making the following three types of visualizations with the data you choose. If your data cannot support all three types of visualizations, please consider choosing another dataset.

  • Summary Statistics Plots: Write out in detail at least 3 types of summary statistics graphs you plan to make with your data (e.g. “I plan to make a histogram using the column X”).

  • Map Graphs: Write out in detail how you plan to make at least 1 map data visualization using your data (e.g. “I plan to create a choropleth map to visualize the volume of 311 service requests in NYC in 2021”).

  • Model Performance Plots: At the time you write this proposal, we will not yet have covered how to visualize model accuracy, so don't worry if this part is still confusing to you. Give it your best shot at explaining what kind of visualization you think will best show that your model is “successful” and “accurate”.
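One common way to show that a regression model is accurate is a predicted-versus-actual scatter plot, with a dashed y = x reference line and summary metrics such as R² and RMSE in the title. The sketch below assumes NumPy and Matplotlib, treats the target as continuous (a classifier would call for something like a confusion matrix instead), and uses hypothetical function names and synthetic values in place of your model's real predictions.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE) for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean(residuals ** 2))
    return r2, rmse


def plot_predicted_vs_actual(y_true, y_pred, path):
    """Save a predicted-vs-actual scatter plot to `path` and return (R^2, RMSE)."""
    r2, rmse = regression_metrics(y_true, y_pred)
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.scatter(y_true, y_pred, alpha=0.6)
    # Dashed y = x line: a perfect model would put every point on it.
    lo = min(np.min(y_true), np.min(y_pred))
    hi = max(np.max(y_true), np.max(y_pred))
    ax.plot([lo, hi], [lo, hi], linestyle="--", color="gray")
    ax.set_xlabel("Actual value")
    ax.set_ylabel("Predicted value")
    ax.set_title(f"R² = {r2:.2f}, RMSE = {rmse:.2f}")
    fig.savefig(path, dpi=100, bbox_inches="tight")
    plt.close(fig)
    return r2, rmse


# Synthetic stand-ins for actual targets and a model's predictions.
rng = np.random.default_rng(1)
y_true = rng.normal(loc=50, scale=10, size=100)
y_pred = y_true + rng.normal(scale=3, size=100)

out_path = os.path.join(tempfile.gettempdir(), "pred_vs_actual.png")
r2, rmse = plot_predicted_vs_actual(y_true, y_pred, out_path)
print(f"R^2 = {r2:.3f}, RMSE = {rmse:.3f}, saved to {out_path}")
```

Points that hug the dashed line, together with a high R² and a small RMSE relative to the spread of the target, are one reasonable way to argue that a model is "successful".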


Project Check-In (25 points)

Due AoE, Tuesday, 5 May

Describe where you currently are with the project. Make sure to describe your progress on:

  • The acquisition of the data

  • Visualization of the data

  • Preliminary Analysis

  • What needs to be done next?

Submit on Gradescope.