Optimization For Machine Learning (IE 1187/IE 2187) Spring 2025

Modern machine learning involves fitting predictive models to huge data sets using optimization methods. The choice of optimization method is critical in these problems. For example, traditional (factorization-based) methods will fail on a regression problem with ten thousand data points and features, even though this is a tiny dataset by modern standards. Moreover, modern machine learning methods such as stochastic gradient descent are not plug-and-play: they require user expertise to select tuning parameters and interpret results. The goal of this course is to teach students how to use modern first-order methods to solve large-scale machine learning problems. Coding will be done in Python using PyTorch.
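To make "first-order method" concrete, here is a minimal sketch of gradient descent on a small least-squares regression problem. It is an illustration only, not course material: it uses plain Python rather than PyTorch so that it runs without dependencies, and the function name, synthetic data, step size, and number of steps are all assumptions chosen for the example.

```python
import random

def gradient_descent_least_squares(xs, ys, step_size=0.1, num_steps=500):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(num_steps):
        # Gradient of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # First-order update: only gradients are used, no matrix factorization.
        w -= step_size * grad_w
        b -= step_size * grad_b
    return w, b

# Hypothetical synthetic data: true slope 3.0, true intercept 1.0, small noise.
random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(200)]
ys = [3.0 * x + 1.0 + random.gauss(0, 0.1) for x in xs]

w, b = gradient_descent_least_squares(xs, ys)
print(f"estimated slope {w:.3f}, estimated intercept {b:.3f}")
```

Each step costs one pass over the data, which is why methods of this kind scale to problems where factorizing a large matrix is impractical.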

Topics covered: Convexity, nonconvexity, critical points and saddle points. Gradient descent. First-order methods vs second-order methods. Training vs test error. Stochastic gradient descent. Hyperparameter tuning. Explicit and implicit regularization. Batch sizes, parallelization, and GPUs. Fine-tuning. Large language models.

Requirements: Multivariate calculus (e.g., MATH 0240), linear algebra (e.g., MATH 0280), probability (e.g., IE 1070), and programming experience (e.g., IE 0015).

Learning objectives

Students should be able to explain how optimization is used in machine learning
Students should be able to explain the difference between train and test error. They should be able to avoid overfitting.
Students should be able to explain at a high level how stochastic gradient descent works and why it is popular for machine learning
Students should be able to train machine learning models, including:
- Choosing appropriate loss functions
- Understanding and debugging possible failure cases
- Tuning hyperparameters, including step size routines and batch sizes
- Understanding how to reduce training times
- Fine-tuning models
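Several of these objectives (training vs test error, stochastic gradient descent, step sizes, and batch sizes) can be seen together in one small example. The following is a hedged sketch in plain Python, not the course's PyTorch code; the helper names `mse` and `sgd`, the synthetic data, and the hyperparameter values are assumptions made for illustration.

```python
import random

def mse(w, b, data):
    """Mean squared error of the model y = w * x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def sgd(train, step_size=0.05, batch_size=16, num_epochs=50, seed=0):
    """Mini-batch stochastic gradient descent on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(train)
    for _ in range(num_epochs):
        rng.shuffle(data)  # visit the training points in a new random order
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient estimated from the mini-batch only, not the full data set.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= step_size * grad_w
            b -= step_size * grad_b
    return w, b

# Hypothetical synthetic data: true slope 2.0, true intercept -0.5, small noise.
data_rng = random.Random(1)
points = []
for _ in range(500):
    x = data_rng.uniform(-1, 1)
    points.append((x, 2.0 * x - 0.5 + data_rng.gauss(0, 0.1)))

# Hold out the last 100 points: they are never used to compute gradients.
train, test = points[:400], points[400:]

w, b = sgd(train)
train_error = mse(w, b, train)
test_error = mse(w, b, test)
print(f"train error {train_error:.4f}, test error {test_error:.4f}")
```

Changing `step_size` and `batch_size` and rerunning is a quick way to see why these hyperparameters need tuning: a step size that is too large makes the iterates diverge, while one that is too small makes training slow.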

ABET outcomes

(1) Identify, formulate, and solve complex engineering problems by applying principles of engineering, science, and mathematics

(2) Apply engineering design to produce solutions that meet specified needs with consideration of public health, safety, and welfare, as well as global, cultural, social, environmental, and economic factors

(5) Function effectively on a team whose members together provide leadership, create a collaborative and inclusive environment, establish goals, plan tasks, and meet objectives

(6) Develop and conduct appropriate experimentation, analyze and interpret data, and use engineering judgment to draw conclusions

Assessment:

In-class group exercises (5% of grade). Roughly one in three lectures will be devoted to solving exercises in teams of 2-3, with guidance from the professor. Exercises are due at the end of lecture. Full points are given for good-faith effort (i.e., there is no need to complete all questions, but try your best). Team members who do not show up to class receive no credit. The lowest-scoring group exercise will be dropped.
Live in-class questions using Tophat (5% of grade). The ten lowest-scoring questions will be dropped.
Five HWs (15% of grade). The lowest-scoring HW will be dropped (only the four best-scoring HWs will be counted). HWs are equally weighted.
Midterm exam (35% of grade).
Final exam (40% of grade).
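
As a worked example of how these weights combine, here is a small sketch of the grade arithmetic. It is an illustration under stated assumptions, not an official grade calculator: the function name `course_grade` is hypothetical, all scores are assumed to be on a 0-100 scale, and the group exercise and Tophat inputs are assumed to be averages computed after their own drops.

```python
def course_grade(group, tophat, homeworks, midterm, final):
    """Weighted course grade on a 0-100 scale.

    group and tophat are assumed to be averages after dropped items;
    homeworks is a list of the five HW scores.
    """
    if len(homeworks) != 5:
        raise ValueError("expected exactly five homework scores")
    # Drop the lowest-scoring HW and average the remaining four equally.
    best_four = sorted(homeworks)[1:]
    hw_average = sum(best_four) / len(best_four)
    return (0.05 * group + 0.05 * tophat + 0.15 * hw_average
            + 0.35 * midterm + 0.40 * final)

# Hypothetical scores for one student.
grade = course_grade(group=100, tophat=90,
                     homeworks=[80, 95, 60, 100, 85],
                     midterm=78, final=88)
print(round(grade, 2))
```

In this example the HW score of 60 is dropped, the HW average is 90, and the weighted total works out to 85.5.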

The final exam and midterm will test:

Conceptual understanding of the material and how to use it in practice
Coding, through comprehension questions about sample code or requests to write short pseudocode

Late HW policy

Late penalties: 2% penalty if less than one hour late; 5% penalty if less than two days late. Later submissions receive no points except in extraordinary circumstances.

Supplementary material

There is no textbook for this course (slides and colabs will contain all material that needs to be known) but useful supplementary references include:

HW Collaboration and ChatGPT Policies

Students may collaborate on homeworks but should understand their answers and write them up themselves. The use of large language models is not prohibited, but I recommend that students use them sparingly. The most important thing is that students use HWs to learn. Overreliance on tools like ChatGPT or on friends may lead to poor performance on the midterm or final exam.

Learning tools

Canvas for posting course content and submitting HW
Tophat for in-class questions
Google Colab for coding

Tentative lecture schedule

lecture-schedule-OptML

Standard University Policies

Academic integrity

Students in this course will be expected to comply with the University of Pittsburgh’s Policy on Academic Integrity. Any student suspected of violating this obligation for any reason during the semester will be required to participate in the procedural process, initiated at the instructor level, as outlined in the University Guidelines on Academic Integrity. This may include, but is not limited to, the confiscation of the examination of any individual suspected of violating University Policy. Furthermore, no student may bring any unauthorized materials to an exam, including dictionaries and programmable calculators.

To learn more about Academic Integrity, visit the Academic Integrity Guide for an overview of the topic. For hands-on practice, complete the Academic Integrity Modules.

The Swanson School’s Academic Integrity Guide can be found here: SSOE_AI_Policy.pdf

Disability services

If you have a disability for which you are or may be requesting an accommodation, you are encouraged to contact both your instructor and Disability Resources and Services (DRS), 140 William Pitt Union, (412) 648-7890, drsrecep@pitt.edu, (412) 228-5347 for P3 ASL users, as early as possible in the term. DRS will verify your disability and determine reasonable accommodation for this course.

Students must contact DRS each term to initiate their accommodations.