Optimization For Machine Learning (IE 1187/IE 2187) Spring 2025
Modern machine learning involves fitting predictive models to huge data sets using optimization methods. The choice of optimization method is critical in these problems. For example, traditional (factorization-based) methods will fail on a regression problem with ten thousand data points and features, which is a tiny dataset by modern standards. Moreover, modern machine learning methods such as stochastic gradient descent are not plug-and-play: they require user expertise to select tuning parameters and interpret results. The goal of this course is to teach students how to use modern first-order methods to solve large-scale machine learning problems. Coding will be done in Python using PyTorch.
Topics covered: Convexity, nonconvexity, critical points and saddle points. Gradient descent. First-order methods vs second-order methods. Training vs test error. Stochastic gradient descent. Hyperparameter tuning. Explicit and implicit regularization. Batch sizes, parallelization, and GPUs. Fine-tuning. Large language models.
Requirements: Multivariate calculus (e.g., MATH 240), linear algebra (e.g., MATH 0280), probability (e.g., IE 1070), and programming experience (e.g., IE 0015).
Learning objectives
Students should be able to explain how optimization is used in machine learning
Students should be able to explain the difference between train and test error. They should be able to avoid overfitting.
Students should be able to explain at a high level how stochastic gradient descent works and why it is popular for machine learning
Students should be able to train machine learning models, including how to:
Choose appropriate loss functions
Understand and debug possible failure cases
Tune hyperparameters, including step size routines and batch sizes
Reduce training times
Fine-tune models
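As a rough illustration of the stochastic gradient descent objective above, here is a minimal sketch in plain Python (not course material; the course itself uses PyTorch). The function name, the synthetic data, and the default step size and batch size are all assumptions chosen for illustration. Each step estimates the gradient of the mean squared error from a small random batch rather than the full data set.

```python
import random


def sgd_linear_regression(xs, ys, step_size=0.05, batch_size=4, epochs=200, seed=0):
    """Fit y ~ w * x + b by minibatch SGD on the mean squared error.

    Hypothetical helper for illustration; step_size and batch_size are the
    hyperparameters a user would normally have to tune.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the data in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = 0.0
            grad_b = 0.0
            for i in batch:
                residual = w * xs[i] + b - ys[i]
                # Gradient of the batch-averaged squared error
                grad_w += 2.0 * residual * xs[i] / len(batch)
                grad_b += 2.0 * residual / len(batch)
            # First-order update: move against the estimated gradient
            w -= step_size * grad_w
            b -= step_size * grad_b
    return w, b


# Synthetic, noise-free data generated from y = 3x + 1 (an assumption for the demo)
xs = [i / 10 for i in range(20)]
ys = [3.0 * x + 1.0 for x in xs]
w, b = sgd_linear_regression(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

Because each update touches only a few data points, the cost per step does not grow with the size of the data set, which is one reason the method is popular for large problems. Changing `step_size` or `batch_size` in this sketch is a quick way to see how sensitive the result is to tuning.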
ABET outcomes
(1) Identify, formulate, and solve complex engineering problems by applying principles of engineering, science, and mathematics
(2) Apply engineering design to produce solutions that meet specified needs with consideration of public health, safety, and welfare, as well as global, cultural, social, environmental, and economic factors
(5) Function effectively on a team whose members together provide leadership, create a collaborative and inclusive environment, establish goals, plan tasks, and meet objectives
(6) Develop and conduct appropriate experimentation, analyze and interpret data, and use engineering judgment to draw conclusions
Assessment:
In-class group exercises (5% of grade). Roughly one in three lectures will be devoted to solving exercises in teams of 2-3 with guidance from the professor. Exercises are due at the end of lecture. Full points are given for good-faith effort (i.e., there is no need to complete all questions, but try your best). No credit is given to team members who do not show up to class. The worst-scoring group exercise can be dropped.
Live in-class questions using Tophat (5% of grade). The ten worst-scoring questions can be dropped.
Five HWs (15% of grade). The lowest-scoring HW will be dropped (only the four best-scoring HWs will be counted). HWs are equally weighted.
Midterm exam (35% of grade).
Final exam (40% of grade).
The final exam and midterm will test:
Conceptual understanding of the material and how to use it in practice
Coding, through comprehension questions about sample code or requests to write short pseudocode
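The weighting above can be sketched as a short calculation. This is an unofficial illustration, assuming every component is already expressed as a percentage from 0 to 100 and that the drops for group exercises and Tophat questions have been applied before the scores are passed in. The function names and the sample scores are hypothetical.

```python
# Component weights as stated in the Assessment section (they sum to 1.0)
WEIGHTS = {
    "group_exercises": 0.05,
    "tophat": 0.05,
    "homework": 0.15,
    "midterm": 0.35,
    "final": 0.40,
}


def homework_average(hw_scores):
    """Average the HW scores after dropping the single lowest one."""
    kept = sorted(hw_scores)[1:]  # equal weights, lowest score removed
    return sum(kept) / len(kept)


def course_grade(group_exercises, tophat, hw_scores, midterm, final):
    """Weighted course percentage; all inputs are percentages in [0, 100]."""
    components = {
        "group_exercises": group_exercises,
        "tophat": tophat,
        "homework": homework_average(hw_scores),
        "midterm": midterm,
        "final": final,
    }
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Hypothetical student: the HW score of 60 is dropped, leaving an average of 85
grade = course_grade(
    group_exercises=100,
    tophat=90,
    hw_scores=[80, 90, 100, 70, 60],
    midterm=80,
    final=85,
)
print(f"Course grade: {grade:.2f}%")
```

Late penalties are not modeled here; they would be applied to individual HW scores before averaging.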
Late HW policy
Late penalties: 2% if less than one hour late; 5% if less than two days late. Anything later receives no points except in extraordinary circumstances.
Supplementary material
There is no textbook for this course (slides and colabs will contain all material that needs to be known) but useful supplementary references include:
https://www.cambridge.org/core/books/optimization-for-data-analysis/C02C3708905D236AA354D1CE1739A6A2
HW Collaboration and ChatGPT Policies
Students may collaborate on homeworks but should understand their answers and write them up themselves. The use of large language models is not prohibited, but I recommend that students use them sparingly. The most important thing is that students use HWs to learn. Overreliance on tools like ChatGPT or on friends may lead to poor performance on the midterm or final exam.
Learning tools
Canvas for posting course content and submitting HW
Tophat for in-class questions
Google Colab for coding