Happy End-of-Gregorian-Year!
I hope this newsletter finds you well, and that you're taking some rest, restoration and relaxation at the end of 2024. Give yourself some space for quality time with yourself and folks you care about, you (always) deserve it!
This issue continues the AI/deep learning memorization investigation, looking at:
and I'll review some thoughts on how we might be better able to test or regulate these problems during model development and/or auditing.
As mentioned in the last newsletter, data collection and preparation steps contribute significantly to memorization. But what do models memorize and how exactly do they memorize?
The most obvious way memorization happens is when highly repeated examples are memorized. You can explore this by prompting any commercial AI system for a famous quote or song lyric, or, with image-based generative AI, by using a famous person's name to render a picture of their face. In a way, this is "expected" and even desired memorization: it is the behavior we expect when training on repeated examples, and it is what many of today's consumer-facing AI chatbots are built to deliver. We can, however, question whether the famous people, authors, artists and other creators would have wanted their work or likeness to be reproduced this way.
The second major way memorization happens is when novel (rare or unique) examples are memorized. This phenomenon has been explored both in theory, using probability and statistics, and in practice. In practice, researchers have been able to extract uncommon training images, pull personal contact information out of ChatGPT (the "poem" attack, which asked the model to repeat a single word forever) and identify specific memorized training examples by comparing performance between models that have and have not been trained on a given example.
These attacks leverage the same types of data extraction explored in Membership Inference and Model Inversion attacks, and they exploit a core bug/feature of how deep learning models "learn". Interestingly, when a deep learning model is trained and evaluated on a long-tailed dataset, it must memorize even singleton examples (examples that appear only once) in order to perform well. This means outliers, who are already at higher privacy risk than others, are exactly the examples that end up being useful to memorize.
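To make "comparing performance between models that have and have not been trained on a given example" a bit more concrete, here is a minimal sketch of a subsampled memorization score, similar in spirit to the estimator Feldman and Zhang used to study long-tail memorization. It assumes you have already trained many models on random subsets of the data and recorded, for each model, which examples it saw (`included`) and whether it predicted each example correctly (`correct`). The function name and data layout are my own illustrative choices, not a standard API, and the toy data is made up.

```python
from typing import Sequence


def memorization_scores(included: Sequence[Sequence[bool]],
                        correct: Sequence[Sequence[bool]]) -> list[float]:
    """Estimate per-example memorization from many models trained on random subsets.

    included[r][i] is True if model r saw example i during training.
    correct[r][i] is True if model r predicts example i correctly.
    Score for example i = accuracy on i across models trained WITH i
                          minus accuracy on i across models trained WITHOUT i.
    """
    n_models = len(included)
    n_examples = len(included[0])
    scores = []
    for i in range(n_examples):
        hits_in = [correct[r][i] for r in range(n_models) if included[r][i]]
        hits_out = [correct[r][i] for r in range(n_models) if not included[r][i]]
        if not hits_in or not hits_out:
            raise ValueError(f"example {i} needs at least one 'in' and one 'out' model")
        acc_in = sum(hits_in) / len(hits_in)
        acc_out = sum(hits_out) / len(hits_out)
        scores.append(acc_in - acc_out)
    return scores


# Toy data: 4 models, 3 examples (hypothetical values for illustration).
# Example 0 is "common": every model gets it right, seen or not.
# Example 1 behaves like a memorized singleton: right only when it was in training.
included = [[True, True, False],
            [True, False, True],
            [False, True, True],
            [False, False, False]]
correct = [[True, True, False],
           [True, False, True],
           [True, True, False],
           [True, False, False]]

print(memorization_scores(included, correct))  # [0.0, 1.0, 0.5]
```

A score near 0 means the model does just as well without the example (it generalizes), while a score near 1 means it only gets the example right when it has seen it, which is the signature of the memorized outliers described above. Real estimates need many more models per example to be statistically meaningful.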
But if models memorize, doesn't that mean they can't generalize? In the third article on how memorization happens, you can explore how overparameterization and growing model sizes have created enough parameter space both to memorize outliers (which helps when similar outliers or errors surface again) and to generalize for other classes, groups and examples. Today's biggest models have been shown to do both!
So... what should an AI/ML practitioner who also cares about privacy do?
In 2025, I'll be exploring the solution space with articles on AI guardrails, unlearning, differential privacy during model training and inference -- as well as some "wild" new ideas, like guiding general models to personal and private use, building communal consensual training datasets and finding new ways to audit memorization in AI/ML systems.
If you haven't had a chance to read the series or watch the videos, maybe now's a good time? 😉
Last week, the EDPB announced its opinion on the use of personal data in AI models, which outlines the following:
I also had the pleasure of reading Marit Hansen and Benjamin Walczak's article "Die KI zaubert nicht" (translated: the AI doesn't conjure, or more loosely, AI isn't witchery/wizardry). Their take on Hamburg's discussion paper points out that AI systems can and do memorize data that can be exfiltrated, and that more should be done both to ensure privacy during model development and to counter other privacy problems such as AI hallucinations and errors.
My take: most research on measuring LLM memorization works best when the underlying base model is accessible before fine-tuning. If that base model and its original training data could be audited for memorization, it would be much easier to give clear estimates of training data memorization and to measure the influence of fine-tuning and guardrails. The same holds for other generative model types (e.g., diffusion models).
Today, these base models are almost never released (even when the fine-tuned model is), and the training data certainly isn't. This leads to what I call "whack-a-mole" memorization research, where researchers must either have enough money, data and resources to approximate the systems, or make an educated guess at the model architecture and training data in order to demonstrate memorization and extraction.
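If auditors did have the base model, the fine-tuned model and the training data, one simple starting point would be an exact-match extraction check: feed each training text's prefix to the model and see whether it reproduces the true suffix verbatim, then compare the rate before and after fine-tuning. Below is a minimal sketch under loud assumptions: `generate` is a hypothetical greedy-decoding callable (real model APIs differ), whitespace splitting stands in for a real tokenizer, and the "models" in the usage example are toy lookup functions, not language models.

```python
from typing import Callable, Sequence

# Hypothetical interface: given prompt tokens and n, return the model's next n tokens.
Generate = Callable[[list[str], int], list[str]]


def extraction_rate(generate: Generate, training_texts: Sequence[str],
                    prefix_len: int = 8, suffix_len: int = 8) -> float:
    """Fraction of training texts whose suffix the model reproduces verbatim from their prefix."""
    tested = extracted = 0
    for text in training_texts:
        tokens = text.split()  # whitespace tokens stand in for a real tokenizer
        if len(tokens) < prefix_len + suffix_len:
            continue  # too short to split into prefix and suffix
        prefix = tokens[:prefix_len]
        target = tokens[prefix_len:prefix_len + suffix_len]
        tested += 1
        if generate(prefix, suffix_len) == target:
            extracted += 1
    return extracted / tested if tested else 0.0


def make_toy_model(memorized_texts: Sequence[str]) -> Generate:
    """Toy stand-in for a model that has perfectly memorized the given texts."""
    docs = [t.split() for t in memorized_texts]

    def generate(prefix: list[str], n: int) -> list[str]:
        for toks in docs:
            if toks[:len(prefix)] == prefix:
                return toks[len(prefix):len(prefix) + n]
        return ["<unk>"] * n  # nothing memorized for this prefix

    return generate


training = ["a b c d", "e f g h", "i j k l", "m n"]
base = make_toy_model(training)           # pretend the base model memorized everything
finetuned = make_toy_model(training[:1])  # pretend fine-tuning hid all but one example

print(extraction_rate(base, training, prefix_len=2, suffix_len=2))       # 1.0
print(extraction_rate(finetuned, training, prefix_len=2, suffix_len=2))  # 0.333...
```

A drop like this would not mean the data was forgotten; the fine-tuned model (or its guardrails) may simply stop emitting it under this one prompting strategy. That gap between "not extracted this way" and "not memorized" is exactly why access to the base model matters for auditing.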
The coming year and the years that follow should be an interesting time to review how we train AI models, how we manage training data and model governance, and how we begin to reason about anonymization and consent in AI systems. I'm excited! And I'd love to hear what questions are on your mind that I can help explore next year.
I usually do some sort of end-of-the-year workshop for myself, which always involves some element of "Futurespective".
Here are a few things that I'm cooking up next year:
If any of those sound particularly interesting to you and you'd like to be informed when they happen, feel free to hit reply! I'll also be posting updates on LinkedIn and here once things are more concrete.
My goal for my sabbatical was to write accessible introductions to the problem of AI memorization. To help my work and writing next year, I would love to hear:
Until next time, enjoy dead week!
With Love and Privacy, kjam