Hello privateers,
I hope October is treating you well. Since our last newsletter, I presented at GOTO Copenhagen and InfoQ Dev Summit München, took a dip in the Berlin Spree (it was 13 Celsius!) and released my Red Teaming AI/ML Mini-Course (YouTube).
In this issue, we're wrapping up our investigation of unlearning. The past few newsletters on AI/ML memorization have explored unlearning definitions and how to actually do unlearning. As a reminder, machine unlearning promises a way to select and remove data from models, which would help with the privacy problems caused by memorization.
But, does unlearning produce artifacts that attackers could use to their advantage? Let's explore!
Unlearned models are different from their learned counterparts, and those differences open up new attack vectors. Let's investigate two major ones:
Gradient Reconstruction via Model Differencing: If an attacker has both the prior model and the unlearned model, they could potentially reconstruct the unlearned example(s) by treating the difference between the two models as a gradient approximation. (See Bertran et al. (2024))
Membership Inference via Model Comparison: If an attacker has access to the previous model, or saved its responses on forget set examples, they can compare the shift in responses and infer which examples were unlearned. (See Chen et al. (2021)) Both attacks are sketched in code below.
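To make these concrete, here's a minimal sketch in Python (NumPy, toy logistic regression) of both attacks. It assumes "exact" unlearning via retraining without the forgotten point, and the mislabeled outlier it forgets is an illustrative assumption, not the setup from either paper:

```python
# Minimal sketch of two attacks that exploit access to both the original and
# the unlearned model. Toy logistic regression; "unlearning" = retraining
# without the forgotten point. Everything here is illustrative.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, lr=0.5, steps=3000):
    """Full-batch gradient descent on logistic loss; returns the weight vector."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def losses(w, X, y):
    p = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Toy data: label = sign of the first feature; the last point is a mislabeled
# outlier the model partially memorizes -- this is the point we "unlearn".
X = rng.normal(size=(60, 5))
y = (X[:, 0] > 0).astype(float)
X[-1] = np.array([3.0, 0.0, 0.0, 0.0, 0.0])
y[-1] = 0.0
forget_idx = len(y) - 1

w_original  = train_logreg(X, y)             # trained on everything
w_unlearned = train_logreg(X[:-1], y[:-1])   # retrained without the outlier

# Attack 1: model differencing as a gradient approximation. The parameter shift
# from original to unlearned model looks like an "un-fitting" step on the
# forgotten example, so its direction leaks that example's features.
delta = w_unlearned - w_original
cos = delta @ X[forget_idx] / (np.linalg.norm(delta) * np.linalg.norm(X[forget_idx]))
print(f"cosine(model delta, forgotten example): {cos:+.2f}")

# Attack 2: membership inference by comparing the two models' per-example losses.
# The forgotten point's loss typically jumps the most after unlearning.
shift = losses(w_unlearned, X, y) - losses(w_original, X, y)
print(f"largest loss shift at index {shift.argmax()} (forgotten index: {forget_idx})")
```

In this toy setup, the parameter delta ends up pointing almost directly at the forgotten outlier, and that outlier shows the largest loss shift between the two models, which is exactly the signal an attacker needs.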
Given that the unlearned data likely stems from deletion requests or legal trouble, this isn't a great start. And unfortunately, that's not all.
In addition to differencing attacks, Hayes and colleagues from Google DeepMind (2024) showed that many unlearning papers don't provide representative or worst-case testing in their privacy auditing. Many papers used weak membership inference attacks (MIAs) and evaluated them only on the forget set rather than on a wider sample of forget and retain points. The authors also found that forget sets were not always representative, which creates a false sense of privacy against worst-case scenarios.
Why do you need to test more than just the forget set for potential privacy leakage? Carlini et al. exposed "The Privacy Onion Effect" in 2022, showing that removing memorized data exposes new, different data points that were previously sheltered by those memorized points.
They define the effect as:
Removing the “layer” of outlier points that are most vulnerable to a privacy attack exposes a new layer of previously-safe points to the same attack
To demonstrate unlearning's privacy benefits, you need robust testing of both the forget set examples and the additional examples that might become overexposed by their removal. That kind of testing requires significant investment in most current MLOps setups.
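As a rough illustration of what that broader testing could look like, here's a sketch that scores a simple loss-threshold MIA against the unlearned model on forget, retain, and held-out points. The loss-threshold attack and the placeholder loss values are assumptions; in practice you'd plug in real per-example losses and a stronger attack (for example, a shadow-model-based one):

```python
# A rough sketch of broader unlearning auditing: score a membership inference
# attack not just on the forget set, but against retain and held-out points too.
# The loss-threshold attack and the gamma-distributed placeholder losses are
# stand-ins for a stronger attack and real per-example losses.
import numpy as np

def mia_auc(member_losses, nonmember_losses):
    """AUC of the rule 'lower loss => member', computed over all pairs."""
    m = np.asarray(member_losses)[:, None]
    n = np.asarray(nonmember_losses)[None, :]
    return (m < n).mean() + 0.5 * (m == n).mean()

rng = np.random.default_rng(0)
# Placeholder per-example losses measured on the *unlearned* model.
loss_retain  = rng.gamma(2.0, 0.2, size=500)   # data the model still trained on
loss_forget  = rng.gamma(2.0, 0.5, size=50)    # data that was "unlearned"
loss_heldout = rng.gamma(2.0, 0.6, size=500)   # data never trained on

# 1) Did unlearning work? Forgotten points should be indistinguishable from
#    held-out points, i.e., AUC close to 0.5.
print("forget vs held-out AUC:", round(mia_auc(loss_forget, loss_heldout), 3))
# 2) Onion-effect check: run the same score on retained points and compare it
#    against the same measurement on the original model, to see whether
#    previously safe points became more exposed after unlearning.
print("retain vs held-out AUC:", round(mia_auc(loss_retain, loss_heldout), 3))
```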
Which raises the question: what are we really improving with unlearning? The promise is that unlearning is a cheap and efficient way to remove problematic memorization, and yet it introduces new risks and increased testing costs. Is it really the right solution for this problem?
To make unlearning a practical and viable solution to memorization, AI vendors and infrastructure/tool providers could:
What are your thoughts? You can also dive deeper into the attacks via the full article and accompanying video.
In upcoming articles and newsletters this year, we'll continue exploring potential solutions to the memorization problem. Up next is differential privacy 😎
If you follow my YouTube channel, you already know that I've been releasing a mini-course on red teaming local AI/ML systems.
The motivation came from the many questions I've been getting about where to start with privacy and security testing. Upon further investigation, I realized many teams hadn't done an initial round of red teaming or security testing. Frequently, security and privacy are not part of model selection decisions (outside of a possible legal review).
Since this is basically all I've done for many years (joke, but... kind of true), I started putting some content out there so people can find an initial starting point with step-by-step instructions. The course notebooks are reworked from the really fun Hacking LLMs for feminism workshop I ran at the PyData Germany Feminist AI LAN Party.
If you've had a look at the Jupyter notebooks for the course or the videos themselves, I'd be curious about other questions, topics and attacks you want to see. I'll be adding a defending against attacks minicourse early next year.
I'll be in Australia in December for YOW! Conference and offering my Private by Design, Secure by Default AI/ML Product Masterclasses:
If you know anyone who would want to learn with me, please send them a note (there's a minimum class size and the class won't happen if it doesn't reach that size).
The day will be packed with pairing exercises, hands-on experimentation, real-world privacy and security testing and mitigations, and discussions about how to engage interdisciplinary stakeholders on AI/ML privacy risks and mitigations. The technical requirements are a laptop with ollama and a few llamafiles installed, and a willingness to learn. If you want to write some Python, you can, or you can pair with someone who does.
Questions on the course are very welcome. I also would love suggestions on what you want in an online version, which I hope to offer next year.
Input Wanted: do you work in sovereign cloud, infrastructure, AI/ML? If so, I'd love to set up some time to chat! I have a few upcoming engagements and talks in the space and want to exchange ideas, get input and meet other folks interested in these topics.
I'm always happy to hear from you, whether by a quick email or a postcard or letter to my PO Box:
Postfach 2 12 67
10124 Berlin
Germany
Until next time!
With Love and Privacy, kjam