Teaching

Data-Driven Security/Privacy/Fairness (CS 4390/5390/6390), Fall 2022

Course Description

Data-driven software is growing fast and is pervasively applied in critical domains such as self-driving cars, malware detection, medical software, hiring, and criminal justice. Because these applications learn their decision logic from passive historical data, they are prone to security vulnerabilities, privacy leaks, and fairness bugs when deployed in security-critical and socially critical settings.

The primary goal of this course is to study the opportunities and challenges in deploying data-driven solutions. After covering the basics of machine learning (ML), such as decision trees and deep neural networks, students explore topics related to the robustness and fairness of ML algorithms. They then study how data-driven solutions can be applied to improve the security and reliability of traditional software.

Course Objectives

Upon completion of this course, students will be able to:

Explain the computations performed by data-driven and ML models,
Evaluate the security, privacy, and fairness of prevalent ML models such as deep neural networks,
Identify the limitations of data-driven software and describe alternative cutting-edge techniques such as causality,
Apply data-driven techniques to computational problems such as software design, testing, and debugging, and
Draw on the necessary background when applying for jobs in domains such as algorithmic fairness, data privacy, and adversarial machine learning.

Course Topics

Overview of Machine Learning (ML): KNN, Linear Classification, Logistic Regression, and Neural Networks.
Adversarial Machine Learning: Adversarial Example Attacks and Data Poisoning Attacks.
Challenges in Detecting and Defending Against Adversarial Examples and Data Poisoning.
Data Privacy and Reconstruction Attacks.
Differential Privacy.
Membership Inference Attacks and Model Inversion.
Introduction to Algorithmic Fairness.
Fairness Testing, Debugging, and Mitigation.
Causality, Intervention, and Counterfactuals.
White-Box and Gray-Box Testing for Machine Learning Systems.
Challenges and Opportunities in Deploying ML-based Software Systems.
Machine Learning for Code Interpretability (Guest Lecture by Prof. Reyhaneh Jabbarvand-Behrouz).
Where Machine Learning and Software Engineering Meet? (Guest Lecture by Prof. Baishakhi Ray).
Traditional and Modern Software Testing and Debugging.
The Application of ML for Security Fuzzing and Software Testing.
Explainable Machine Learning.

Prerequisites

This course requires no prior experience in security and privacy but assumes a willingness to seek out and read background material as needed. Although not required, knowledge of core machine learning topics and familiarity with Python and NumPy are a significant advantage.

Course Structure

This is a research-oriented, discussion-based course that also includes hands-on exercises and programming assignments. Students are required to write a review of each assigned paper before class so that they can participate in class discussions. Every student presents a major paper listed in the course syllabus and leads the discussion of it. Students also work on a major project in groups of one to three and deliver write-ups, code, and presentations in phases.

See Presentation Schedule