PROBABLY PRIVATE

Issue #18: How does machine unlearning work?

How does machine unlearning work?

Hello privateers,

I'm back in Europe after an amazing PyCon India 2025. My keynote linked information theory and propaganda to how we can bring real information back to AI/ML narratives. I also got a chance to meet some really cool humans, help pack bags, attend a PyLadies lunch and drink (almost) too much tea.

In this issue, we're exploring how to actually implement machine unlearning. How can you selectively remove information from AI/ML models?

Implementing Machine Unlearning

Unlearning definitions vary (which you learned about in the last Probably Private), but once you have one that works for you, you can begin implementation. Like the definitions, today's unlearning methods are dependent on your model architecture and data choices.

If you are using simpler or smaller machine learning models, you can either build model ensembles or leverage statistical query algorithms, which query the underlying features/data to assemble aggregate inputs. With these approaches, you're essentially "cutting out" data points or contributions for deletion.

In ensemble learning, this happens by rolling one of the models back to a checkpoint taken before it was trained on the example in question, or by retraining that model entirely. In statistical query algorithms, you can rerun a query, presuming your model doesn't significantly depend on the data you are deleting. Check out Sharded, Isolated, Sliced, and Aggregated (SISA) learning for an ensemble example and one of the first unlearning papers (2015) for statistical query unlearning.
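
To make the ensemble idea concrete, here's a minimal sketch of the sharding part of SISA (without the slicing and checkpointing components), assuming scikit-learn, non-negative integer class labels, and a toy `ShardedEnsemble` class of my own naming. Unlearning a point means retraining only the shard that held it:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class ShardedEnsemble:
    """Toy sharded ensemble: each example lives in exactly one shard/model."""

    def __init__(self, n_shards=5):
        self.n_shards = n_shards
        self.models = [None] * n_shards
        self.shards = [None] * n_shards  # per-shard (X, y, global indices)

    def fit(self, X, y):
        # Assign each example to a single shard; only that shard's model sees it.
        self.assignments = np.arange(len(X)) % self.n_shards
        for s in range(self.n_shards):
            idx = np.where(self.assignments == s)[0]
            self.shards[s] = (X[idx], y[idx], idx)
            self.models[s] = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
        return self

    def unlearn(self, delete_idx):
        # "Cutting out" a data point: retrain only the shard that contained it.
        s = self.assignments[delete_idx]
        X_s, y_s, idx = self.shards[s]
        keep = idx != delete_idx
        self.shards[s] = (X_s[keep], y_s[keep], idx[keep])
        self.models[s] = LogisticRegression(max_iter=1000).fit(X_s[keep], y_s[keep])

    def predict(self, X):
        # Aggregate shard predictions by majority vote (assumes integer labels).
        votes = np.stack([m.predict(X) for m in self.models])
        return np.apply_along_axis(
            lambda v: np.bincount(v.astype(int)).argmax(), 0, votes
        )
```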

To unlearn with today's large scale deep learning models (LLMs, multi-modal models, diffusion, computer vision, etc.) without moving to an ensemble, you'll basically continue finetuning the network, but this time to forget/unlearn examples. I'll call these approaches "deep unlearning".

To do this, you'll assemble a "forget set" of examples you want the model to forget, and ideally also a "retain set" of similar examples you want to keep (i.e. of the same class and with similar qualities, but data you are allowed to continue using).
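
As a small sketch of that assembly step (assuming a pandas DataFrame with hypothetical `user_id` and `label` columns, and that deletion requests arrive as user IDs):

```python
import pandas as pd

def build_forget_retain(train_df: pd.DataFrame, deleted_user_ids: set,
                        retain_per_class: int = 1000, seed: int = 0):
    """Split training data into a forget set and a same-class retain set."""
    is_deleted = train_df["user_id"].isin(deleted_user_ids)
    forget = train_df[is_deleted]
    remaining = train_df[~is_deleted]
    # Keep retain examples from the same classes as the forget set,
    # so utility on those classes can be checked after unlearning.
    same_class = remaining[remaining["label"].isin(forget["label"].unique())]
    retain = same_class.groupby("label", group_keys=False).apply(
        lambda g: g.sample(min(len(g), retain_per_class), random_state=seed)
    )
    return forget, retain
```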

Deep unlearning methods include (a minimal sketch of the first two follows the list):

  • ascending the loss gradient for your forget set
  • maximizing forget set loss while minimizing retain set loss
  • training on the forget set toward an alternative label/output (i.e. targeting a new generic answer/label)
  • some mixture of the above combined with classic deep learning methods (like additional finetuning, freezing weights, bounding loss contributions to avoid huge gradients, etc.)
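
Here's the promised sketch of the first two bullets, assuming a PyTorch classifier and hypothetical forget/retain batches. It's illustrative, not a production recipe:

```python
import torch
import torch.nn.functional as F

def unlearn_step(model, optimizer, forget_batch, retain_batch, forget_weight=1.0):
    """One update that ascends loss on the forget set and descends it on the retain set."""
    model.train()
    optimizer.zero_grad()

    forget_x, forget_y = forget_batch
    retain_x, retain_y = retain_batch

    forget_loss = F.cross_entropy(model(forget_x), forget_y)
    retain_loss = F.cross_entropy(model(retain_x), retain_y)

    # Minimizing (retain_loss - forget_loss) maximizes forget loss while
    # minimizing retain loss; drop the retain term for pure gradient ascent.
    loss = retain_loss - forget_weight * forget_loss
    loss.backward()

    # Bound the update to avoid the huge gradients the ascent term can cause.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return forget_loss.item(), retain_loss.item()
```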

Sounds doable in most deep learning setups. But the hard part is really in the details.

For many setups, defining forget and retain sets is going to be difficult, because data expiration or deletion requests are often separate from how training data is collected and prepared. Data duplication and murky data lineage also introduce challenges, especially if your training data isn't properly documented.

In addition, scaling unlearning to larger forget sets without breaking model utility is challenging. Unlearning 1% of the training data is significantly different from trying to unlearn 10%. This also depends on data and task complexity (i.e. more complexity = harder to unlearn without breaking the model entirely).

Aside from that, you also need to:

  • define which metric you are using to prove unlearning and test it (one possible check is sketched after this list)
  • choose which deep unlearning methods to test on your model setup/compute
  • experiment with unlearning combinations to find ideal compute expense for forgetting quality
  • address unlearning model or training performance issues that are likely to arise, such as utility loss, unstable unlearning and the aforementioned scaling issues
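
For the metric bullet, one common style of check (a sketch of a membership-inference-flavored test, not *the* standard metric) compares per-example losses on the forget set with losses on data the model never saw; if an attacker can't tell them apart, forgetting is at least plausible:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def forgetting_auc(forget_losses, unseen_losses):
    """AUC of a naive membership test: lower loss => guessed "was trained on".

    forget_losses / unseen_losses: 1-D arrays of per-example losses from the
    unlearned model on the forget set and on truly held-out data.
    """
    scores = np.concatenate([-np.asarray(forget_losses), -np.asarray(unseen_losses)])
    labels = np.concatenate([np.ones(len(forget_losses)), np.zeros(len(unseen_losses))])
    return roc_auc_score(labels, scores)

# ~0.5: forget-set examples look like unseen data (a good sign).
# Well above 0.5: the model still "remembers" them.
# Always check retain/test accuracy separately so utility hasn't collapsed.
```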

This is a lot of work! Given the field of unlearning is still young, I hope that choosing a metric and finding the right deep unlearning method combination will significantly improve in the next few years; but it's also possible that model retraining will happen more often than unlearning.

If large AI vendors are going to pretrain and train/finetune a new set of large models every 4 months, does it make sense to dedicate time and resources to unlearning or would it be better to build model governance to more easily segregate data for deletion from training datasets? Ideally there's budget for both so unlearning continues to progress from research into reality.

Sounds interesting? In the longer article on how unlearning is done, I uncover additional approaches that I think could unlock new insights to improve unlearning, including ideas inspired by the same patterns as LoRA fine-tuning and new architectures informed by information theory and differential privacy.

If you're an audio/visual learner, I posted a new Probably Private YouTube summary of this article.

Online Masterclasses

I've been preparing content for my masterclasses and there have been requests to attend the class in an online setting. I'm curious to get your feedback. Would an online version be interesting for you?

Core themes and activities:

  • Real-World Threat Modeling: Identify vulnerabilities in AI systems
  • Hands-On Red Teaming: Execute and evaluate attacks on models
  • Meta Prompt Engineering & Guardrails: Create useful and more privacy-aware meta prompts. Use guardrails to identify insecure prompts or questionable AI output
  • Data Flow Analysis, Risk Assessment, Privacy Controls: Map and mitigate privacy and confidentiality risks in data workflows. Choose appropriate protections for identification, sanitization and pseudonymization.
  • Practical Model Evaluation Strategies: Build evaluation datasets and integrate security & privacy testing into your deployment workflow.

If any part of this sounds interesting, can you take a minute to reply to this email with:

  • price range you'd find reasonable
  • time suggestions (i.e. evenings, weekends, which timezones)
  • burning questions you'd hope to have answered or topics you wouldn't want to miss

Looking forward to reading your thoughts!

Conference season is starting, so here are some upcoming speaking engagements where you can catch me:

If you enjoyed this newsletter, consider forwarding it to someone so they can subscribe.

With Love and Privacy, kjam