Hello privateers,
I'm writing to you after many weeks of traveling: an amazing YOW! Conference in Australia and then a van trip around New Zealand's South Island. It was wonderful and humbling to swim in the South Pacific Ocean.

[Image: Māori art and the South Sea]
I hope you've had a chance to take some time "between the years" to relax, unwind, connect with nature, people or animals. Or really do whatever it is you enjoy doing.
Last issue we looked at how differential privacy can be applied to deep learning and shared some tips and tricks to get started.
But why doesn't everyone do this? The easy answer is that most organizations don't really know how and aren't willing to invest in figuring it out.
In this issue, you'll explore a few of the harder parts of applying differential privacy to machine learning problems once you've overcome the initial barriers related to investment and prioritization.
What data needs to be protected in the ML lifecycle?
Often companies use models already trained on publicly available data and then fine-tune them using differential privacy, but Tramèr et al. (2024) released a position paper calling for a more nuanced approach to what data is considered public. In their paper, they enumerate several cases where the use of publicly available data violated individual privacy.
When applying differential privacy, you need to think through not just fine-tuning but data exposure across the entire model lifecycle: collection, training, evaluation, and inference. If sensitive data appears at any of those stages, the same privacy questions apply.
This doesn't mean you can't or shouldn't fine-tune with differential privacy (I walk through several examples in my book). It just means that you might still have memorization and overexposure, especially in overparameterized models.
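To make that concrete, here's a minimal sketch of differentially private fine-tuning with DP-SGD, assuming PyTorch and the Opacus library. The toy classifier head, random data, and hyperparameters are placeholders I've chosen for illustration, not a recipe from the paper or the book.

```python
# Minimal DP fine-tuning sketch (assumes PyTorch and Opacus are installed).
# The model, data, and hyperparameters are illustrative placeholders.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Toy data standing in for your sensitive fine-tuning dataset.
features = torch.randn(1000, 128)
labels = torch.randint(0, 2, (1000,))
train_loader = DataLoader(TensorDataset(features, labels), batch_size=64)

# Pretend this small head sits on top of a frozen, publicly pretrained encoder.
model = nn.Linear(128, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# Wrap model, optimizer, and loader so per-example gradients are clipped and noised.
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.0,  # more noise -> stronger privacy, lower utility
    max_grad_norm=1.0,     # per-example gradient clipping bound
)

for epoch in range(3):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

# Privacy budget spent so far, for a chosen delta (1e-5 here, a common heuristic).
print("epsilon:", privacy_engine.get_epsilon(delta=1e-5))
```

Notice that even in a sketch like this, the privacy accounting only covers the fine-tuning data; whatever the pretrained encoder has already memorized sits outside the guarantee, which is exactly the point above.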
Is your data representative enough to learn privately?
Tramèr and Boneh (2021) found that, to reach the same accuracy as a normally trained deep learning model, a differentially private model might need an "order of magnitude" more data.
How come? Well, if the model cannot memorize or learn from just one example, it must process many examples of that concept to learn it privately.
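Here's a small numerical illustration of that intuition, using the Gaussian mechanism on a simple mean query rather than a full training run. This is my own hedged sketch, not an example from the paper: at the same privacy budget, the noise shrinks relative to the signal as the number of records grows.

```python
# Why DP favors larger datasets: a differentially private mean gets noisier,
# relative to the signal, as the number of records shrinks. (Illustrative sketch.)
import math
import numpy as np

rng = np.random.default_rng(0)

def dp_mean(values, lo, hi, epsilon, delta):
    """(epsilon, delta)-DP mean via the Gaussian mechanism; values clipped to [lo, hi]."""
    clipped = np.clip(values, lo, hi)
    n = len(clipped)
    sensitivity = (hi - lo) / n  # one person can shift the mean by at most this much
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    return clipped.mean() + rng.normal(0, sigma)

for n in (100, 1_000, 10_000):
    sample = rng.normal(loc=0.3, scale=0.1, size=n)  # a "concept" we want to learn
    estimate = dp_mean(sample, lo=0.0, hi=1.0, epsilon=0.5, delta=1e-5)
    print(f"n={n:>6}: true mean={sample.mean():.3f}, private estimate={estimate:.3f}")
```

With only 100 records the noise can swamp the true value; with 10,000 it barely matters, which is the same pressure a DP-trained model faces when a concept appears in only a handful of examples.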
There's been significant evolution in the overlap between learning theory and differential privacy. For example, there are PAC-style bounds on what can be learned with differential privacy over certain complex distributions. There's also mathematical proof that all distributions are differentially privately learnable, although not necessarily efficiently.
In an applied setting: if you don't have enough diverse data, or if you don't have a well-defined problem space and can't find data that adequately represents that problem, you probably won't be able to learn privately.
My advice: think through your problem and task deeply. Figure out what you actually need to learn and what is superfluous. Then determine whether you can simulate, collect, or produce data that matches that requirement. This will help you learn not only privately, but also more efficiently.
Can some tasks ever be private?
Brown et al. (2022) asked this question about today's language models. The authors argue that Nissenbaum's contextual integrity should apply to language data.
Nissenbaum's theory says each user should have autonomy over how, where, and in what context their data appears. The authors argue that the only data matching how LLMs are used today is data intended to be freely available to the general public.
Text origin and ownership are often difficult to define, yet defining them is a key decision for applying differential privacy appropriately. For example, to do proper privacy accounting, you must define how much one person can contribute to training. This is surprisingly difficult for text data: someone may be quoting, paraphrasing, or referencing another person, or the same person may write under several accounts or handles. How can you define authorship well enough to apply differential privacy?
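One practical (and admittedly partial) response is to bound contributions at whatever author granularity you can define, so the "one person" in your accounting has a capped influence. Below is a hedged sketch with a hypothetical bound_author_contributions helper; resolving quotes, paraphrases, and multiple handles into a single author_id remains the hard part.

```python
# Hypothetical helper: cap how many training examples any single author contributes,
# a step toward user-level (rather than example-level) privacy accounting.
from collections import defaultdict
from typing import Iterable

def bound_author_contributions(examples: Iterable[dict], max_per_author: int = 8) -> list[dict]:
    """Keep at most `max_per_author` examples per author id.

    Each example is assumed to look like {"author_id": ..., "text": ...}.
    How you resolve quotes, paraphrases, and multiple handles into one
    author_id is exactly the open question discussed above.
    """
    seen = defaultdict(int)
    bounded = []
    for example in examples:
        author = example["author_id"]
        if seen[author] < max_per_author:
            seen[author] += 1
            bounded.append(example)
    return bounded
```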
For those of us working on this problem: which large language models can truly be "privacy preserving"? When can you ensure the guarantees match the real-world concerns and context? Is example-level privacy good enough for the problem at hand? Do we need to think through attribution at a higher level?
How can you work across disciplines to develop real use cases?
Applying differential privacy so that it actually fits the organization's understanding of privacy and risk can be a challenge. Handoffs almost always lead to confusion, which can result in missed product expectations, poor performance, and privacy and security mistakes. Interdisciplinary processes help build the knowledge, understanding, and communication that smooth handoffs and clarify expectations.
Ideally the AI/ML product lifecycle includes product, privacy, security, machine learning and risk stakeholders from the beginning. It could look something like this:
I'm curious: does your model lifecycle look like this?
If your organization is ready to explore AI privacy and security, you can work with me on the harder parts of applying differential privacy to today's AI.
I've summarized these points from a longer article, in case you'd like to read more.
If you're a resolutions, goals, or new-practices-in-the-new-year kind of person, I thought I'd share a few things I'd recommend if you're looking to learn and grow.
In 2026, I'll be sharing my local-first personal AI use cases, developing materials and videos for multidisciplinary thinking in AI privacy and security, and showing you how to apply my book directly to your AI product lifecycle. Requests, open questions, and suggestions are very welcome!
I'm presenting in Brussels at FOSDEM this year on Sovereign AI.
Here are a few questions I'm noodling on. If you have feedback, articles, or thoughts you're willing to share, send them my way!
If you'll be at FOSDEM and want to meet up or chat, please drop me a line! It's always a pleasure to hear from you, whether by a quick reply or a postcard or letter to my PO Box:
Postfach 2 12 67, 10124 Berlin, Germany
Until next time!
With Love and Privacy, kjam