Applied Data Analytics (IE 2064) Spring 2024

Description

This is an introduction to applied data analytics. The first part of the course focuses on practical skills: data wrangling, visualization, data processing, exploratory data analysis, and scoping projects. The second and main part of the course focuses on building predictive models for regression and classification: linear models, support vector machines, kernel methods, nearest neighbor, and tree-based models. The primary assessment is a project in which students apply their acquired skills to a real dataset. All course work will be done using R.

Prerequisites:

Note: it is suggested (but not mandatory) that students start working on module 1, "software tools", before the start date of this course; the lectures for this module are recorded. Two weeks after the course starts, students will be expected to have completed all of module 1 and the associated learning checkpoints. Module 1 provides an introduction to the software tools that will be used in this course.

Modules

Title                                        Reference

Part 0 - Introduction to software tools      Chapters 1-4 of ModernDive

Part 1 - Overview and general strategies     Chapters 1-5 of Applied Predictive Modeling

Part 2 - Regression models                   Chapters 6-8 of Applied Predictive Modeling

Part 3 - Classification models               Chapters 11-16 of Applied Predictive Modeling

Textbooks:

(APM) Max Kuhn and Kjell Johnson (2013), Applied Predictive Modeling, Springer, ISBN: 978-1-4614-6850-9

This book is available online through the Pitt library.

Supplements:

Coding

Modelling and theory

Soft skills

Software tools

An introduction to these tools is provided in the software tools module. Installation instructions appear in install-software-tools.pdf on Canvas.

Assessment

Learning checkpoints, equally weighted          5% of grade

Three homeworks & quizzes, equally weighted    30% of grade

Competition & quiz                             10% of grade

Midterm                                        15% of grade

Project & oral exam                            40% of grade
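As a worked illustration of how the weights above combine into a final grade, here is a minimal sketch. It is written in Python purely for illustration (course work itself is done in R). The dictionary keys, the function name, and the example scores are hypothetical, and the sketch assumes each component is scored on a 0-100 scale.

```python
# Component weights taken from the assessment list above (they sum to 100%).
WEIGHTS = {
    "learning_checkpoints": 0.05,
    "homeworks_and_quizzes": 0.30,
    "competition_and_quiz": 0.10,
    "midterm": 0.15,
    "project_and_oral_exam": 0.40,
}


def final_grade(scores):
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one student, for illustration only.
example_scores = {
    "learning_checkpoints": 100,
    "homeworks_and_quizzes": 85,
    "competition_and_quiz": 70,
    "midterm": 80,
    "project_and_oral_exam": 90,
}

print(round(final_grade(example_scores), 2))  # 85.5
```

Because the project and oral exam carry 40% of the grade, a ten-point change there moves the final grade by four points, while the same change in the learning checkpoints moves it by half a point.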

Late penalties: less than 1 hour late, 2% penalty; less than two days late, 5% penalty. Anything later receives no points except in extraordinary circumstances.
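The late-penalty tiers can be sketched as a small function. This is only one reading of the policy, written in Python for illustration: it assumes the penalty is a percentage reduction of the score earned, that "two days" means 48 hours, and that the tier boundaries are strict. The function names are hypothetical, and extraordinary circumstances are left to the instructor rather than modeled.

```python
def late_penalty_fraction(hours_late):
    """Fraction of the earned score lost, under the assumptions stated above."""
    if hours_late <= 0:
        return 0.0   # on time: no penalty
    if hours_late < 1:
        return 0.02  # less than 1 hour late: 2% penalty
    if hours_late < 48:
        return 0.05  # less than two days late: 5% penalty
    return 1.0       # any later: no points


def adjusted_score(score, hours_late):
    """Score after the late penalty, assuming a proportional reduction."""
    return score * (1.0 - late_penalty_fraction(hours_late))


print(round(adjusted_score(90, 0.5), 2))  # 88.2
print(round(adjusted_score(90, 24), 2))   # 85.5
print(round(adjusted_score(90, 72), 2))   # 0.0
```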

Learning checkpoints

Learning checkpoints are due one week after the corresponding lecture (with the exception of the software tools module). Unlike homeworks, the goal is not to assess students but to give them an opportunity to practice skills and check that they are following the lectures. They are also generally much shorter than homeworks. It is acceptable to look at the answers before you submit. Generally, brief feedback will be provided for learning checkpoints. A good-faith effort (more than 50% of questions attempted and most of the answers correct) will be considered complete and receive full points. Otherwise, zero points will be awarded.

Competition

I will provide a dataset from an unknown source; your goal is to predict the outcome as well as possible. You will be scored on the quality of your code and on whether you meet certain prediction performance thresholds.

Project

The project will involve taking a real dataset and applying the skills that you have learnt in this course to solve a problem for a "stakeholder". You will be assessed through a series of presentations and an oral exam.

Collaboration and academic integrity

Collaboration and discussion between students are generally encouraged, with some restrictions. For the competition assignment, students will be assigned to teams. For the homeworks, students may discuss with each other, but final answers should be written independently. For projects, students should submit independent reports, but it is acceptable for students to use the same datasets and to discuss their work with one another. Remember that you are strongly encouraged to discuss homeworks and projects at office hours, where I am happy to offer advice.

AI usage: in this course I will allow AI usage, with the caveat that the majority of your grade will be assessed through quizzes, presentations, and the oral exam. In the oral exam for the project, you are expected to defend the claims made in your report and discuss the decisions made in your code. Overreliance on AI or Stack Overflow may therefore substantially hurt your grade.

Please contact me if you have any questions and make sure you have read the academic integrity section below.