Applied Data Analytics (IE 2064) Spring 2024

Description

This is an introduction to applied data analytics. The first part of the course focuses on practical skills: data wrangling, visualization, data processing, exploratory data analysis, and scoping projects. The second and main part of the course focuses on building predictive models for regression and classification: linear models, support vector machines, kernel methods, nearest neighbor, and tree-based models. The primary assessment is a project in which students apply their acquired skills to a real dataset. All course work will be done using R.

Prerequisites:

Note: it is suggested (but not mandatory) that students start working on module 1, 'software tools', prior to the starting date of this course; the lectures for this module are recorded. Two weeks after the course starts, students will be expected to have completed all of module 1 and the associated learning checkpoints. Module 1 provides an introduction to the software tools that will be used in this course.

Modules

Title | Reference

Part 0 - Introduction to software tools | Chapters 1-4 of Modern Dive

Part 1 - Overview and general strategies | Chapters 1-5 of Applied Predictive Modeling

Part 2 - Regression models | Chapters 6-8 of Applied Predictive Modeling

Part 3 - Classification models | Chapters 11-16 of Applied Predictive Modeling

Textbooks:

(APM) Max Kuhn, Kjell Johnson, (2013), Applied Predictive Modeling, Springer, ISBN: 978-1-4614-6850-9

This book is available online through the Pitt library.

Supplements:

Coding

Modelling and theory

Soft skills

Software tools

An introduction to these tools is provided in the software tools module. Installation instructions appear in install-software-tools.pdf on Canvas.

Assessment

Learning checkpoints, equally weighted: 5% of grade

Three homeworks, equally weighted: 30% of grade

Competition: 10% of grade

Midterm: 15% of grade

Project: 40% of grade
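
As a rough illustration of how the weights above combine, the following Python sketch computes an overall grade. The function name `final_grade`, the component keys, and the 0-100 score scale are assumptions made for this example rather than the official grading procedure. It also assumes each component score has already been averaged over its equally weighted items (for example, the three homeworks).

```python
# Weights taken from the assessment breakdown above.
WEIGHTS = {
    "checkpoints": 0.05,
    "homeworks": 0.30,
    "competition": 0.10,
    "midterm": 0.15,
    "project": 0.40,
}


def final_grade(scores):
    """Return the weighted course grade.

    `scores` maps each component name in WEIGHTS to a score on an
    assumed 0-100 scale (hypothetical convention for this sketch).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {
    "checkpoints": 100,
    "homeworks": 80,
    "competition": 70,
    "midterm": 90,
    "project": 85,
}
print(round(final_grade(example), 2))
```

Because the project carries 40% of the grade, a ten-point change in the project score moves the overall grade by four points, whereas the same change in the learning checkpoints moves it by half a point.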

Late penalties: less than 1 hour late, 2% penalty; less than two days late, 5% penalty. Anything later receives no points, except in extraordinary circumstances.
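
One possible reading of the late-penalty tiers is sketched below in Python. The function name `late_penalty` is hypothetical, and the sketch assumes that each penalty is a fraction of the assignment's points and that a submission exactly one hour late falls into the second tier. Extraordinary circumstances are handled case by case and are deliberately not modeled.

```python
from datetime import timedelta


def late_penalty(lateness):
    """Return the fraction of points deducted for a late submission.

    Assumed interpretation of the policy (not an official rule):
      on time            -> 0.00
      under 1 hour late  -> 0.02
      under 2 days late  -> 0.05
      anything later     -> 1.00 (no points)
    """
    if lateness <= timedelta(0):
        return 0.0
    if lateness < timedelta(hours=1):
        return 0.02
    if lateness < timedelta(days=2):
        return 0.05
    return 1.0


# Hypothetical submissions, for illustration only.
for delay in (timedelta(minutes=30), timedelta(hours=5), timedelta(days=3)):
    print(delay, late_penalty(delay))
```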

Learning checkpoints

Learning checkpoints are due one week after the corresponding lecture (with the exception of the software tools module). Unlike homeworks, the goal is not to assess students but to give them an opportunity to practice skills and check that they are following the lectures. They are also generally much shorter than homeworks. It is acceptable to look at the answers before you submit. Generally, brief feedback will be provided for learning checkpoints. A good-faith effort on a learning checkpoint (more than 50% of questions attempted and most of the answers correct) will be considered complete and receive full points. Otherwise, zero points will be awarded.

Competition

I will provide a dataset from an unknown source; your goal is to predict the outcome as accurately as possible. You will be scored on the quality of your code and on whether you meet certain prediction performance thresholds.

Project

The projects will involve taking a real dataset and applying the skills that you have learnt in this course to solve a problem for a 'stakeholder'.

Collaboration and academic integrity

Collaboration and discussion between students are generally encouraged, with some restrictions. For the competition assignment, students will be assigned to teams. For the homeworks, students may discuss with each other, but final answers should be written independently. For projects, students should submit independent reports, but it is acceptable for students to use the same datasets and to discuss their projects with each other. Remember that you are strongly encouraged to discuss homeworks and projects at office hours, where I am happy to offer advice.

If you copy code in any homework or project (e.g., from Stack Overflow), please make this clear and cite where you copied it from; otherwise, there is a risk of being accused of plagiarism. Furthermore, please keep in mind that a correctly cited chunk of code will not be considered plagiarism, but it could detract from demonstrating that you understand the course material, causing you to lose points on an assignment. If you are unsure what to do for a particular assignment, I recommend discussing it with me at office hours.

Please contact me if you have any questions and make sure you have read the academic integrity section below.