PROBABLY PRIVATE

Issue # 14: Common AI Product Privacy Mistakes, Masterclasses and Trainings
logo

Common AI Product Privacy Mistakes, Masterclasses and Trainings

Common AI Product Privacy Mistakes, Masterclasses and Trainings

Hello privateers, and a happy welcome to Spring!

I've been busy writing the second part of the longer series on memorization in AI/ML systems... expect those articles and accompanying videos this summer.

In case you missed it on social media, we had a great second Feminist AI LAN party at PyCon DE. I hope some of you might be inspired to host your own! I'll be publishing some enhancements to the kits soon, so you can build your own computer and LAN.

I've officially ended my search for a full-time position in the EU/DE public sector and started working on freelance, training and speaking engagements. So far, these are all focused on practical data privacy in AI/ML/data science. If you want to work with me in this capacity, please feel free to reply to this email or schedule a free chat.

During my open office hours, I received a lot of interest around building AI products with privacy-by-design and security by default. What would that even mean?

Common AI Product Privacy Oopsies

I've singled out some common privacy anti-patterns I see when teams are first building AI-supported products and systems. This is advice I would offer any person or team looking into the initial architecture and deployment -- even if it's a PoC!

  1. No consideration or comparison between API-based AI and local-first AI: Although it's surely the case that no local model can compare with the state-of-the-art largest API-based models, it doesn't mean that it cannot perform well enough for your use case. Too often I see teams paying lots of money for a fancy API-based model and using it for a task that could just as easily be done with a smaller, local-first model. This has the added benefit that the company can run it on-prem, keep the data local, further develop the system and switch out new models for evaluation whenever they want. On a related note...

  2. No evaluation criteria that considers privacy concerns: You do have an evaluation or testing harness, right? Since you already have use-case specific evaluation data that you use to compare models, then why not also incorporate a few privacy tests? This can be as simple as slipping in a request for extra information from the document or model, or seeing what responses are triggered when providing personal details in the prompt. Ideally these tests are built with input from privacy and security experts at your organization and alongside additional security and responsible AI testing you are working on. When evaluating your models, you can then generate a "trustworthy AI" score alongside the other performance scores.

  3. Connecting data without understanding its origin (or secrecy): For businesses with sensitive data, often the initial use case is a RAG on some sort of internal data. Too often this data is of unknown privacy/confidentiality or mixed (i.e. sensitive data mixed in with public data). The problem with not having a grip on document and data provenance is that the RAG should probably avoid returning the sensitive data for just any query. There should be some sort of user- and use-case-level understanding of who should have access to what documents and how. An easier starting point is to just make sure that everyone who has access to a RAG should also have access to all of the underlying documents. That way you don't have an accidental data release you weren't prepared for, like the levels oopsie.

  4. Lack of transparency on AI and/or data use: if you are giving users a blank slate to write in, you gotta communicate to them where the data is going and let them decide if they want to say everything the same way. I see this often in internal deployments where teams release PoCs and then don't fully explain where the data goes and how it will be used. This leads to people uploading confidential documents to cloud-based AI models which can breach business agreements. If you are uploading data/text/photos/etc to a third-party service, please state that clearly on the page, even if it's a PoC. If you are putting something in front of customers as a beta release, and they don't know they are talking with Claude, they might be peeved to learn this later. Please just be transparent and experiment with your design/UI/UX team to find a way that appropriately communicates what is going on.

Recommended Reading: Isabel Barberá's in-depth report on managing privacy risks in AI systems is a great place to get started on setting up better assessment and reviews of your own AI systems.

I'm curious, what are some common oopsies you've seen?

New Masterclass: Privacy by Design, Secure by Default AI Systems

I developed a new masterclass to communicate what I've learned from 7 years working on privacy and security in ML into ways for non-ML-experts to get started. I'll be presenting it at GOTO Copenhagen on 30 September, in Melbourne on December 3 and in Sydney on December 10. If you know anyone in those cities who might want to attend, there's a 20% discount for the next few weeks.

In the class, you build a one-day AI product from scratch, walking through how to:

  • Discover privacy and security antipatterns in AI Product design
  • Identify and evaluate privacy risk in AI systems
  • Map data and user flows to identify potential privacy issues
  • Evaluate AI-specific privacy and security threats/attacks
  • Design and review architectures, informed by risk and threat analysis
  • Evaluate and integrate use case specific guardrails and other potential technological solutions (i.e. privacy technologies)
  • Build evaluation datasets and pipelines
  • Define and measure success

Sounds interesting, but you're not in Denmark or Australia? I'm offering in-house and customized trainings and workshops for organizations who want to hire me to work with their teams or advise on developing more private, more secure AI systems. If you want to see if I can help, I'm offering free drop-in consultations on Mondays.

Stay tuned for more open materials from this class as it progresses and hopefully a few webinars to share the materials to a broader audience. Let me know which bits you want to see first!

Talking about privacy is hard right now

I'm noticing in my own work and conversations that talking about privacy with folks is becoming more difficult - and I wanted to write about what I'm noticing in case it's useful for someone else.

Too often I'm noticing responses that show me:

  • a disdain for privacy as a blocker for "progress"
  • an eyeroll on having to think about privacy because it "slows things down"
  • a political statement claiming a certain group or groups don't deserve privacy
  • a lack of patience to understand how privacy might apply (brush off - "legal said we could do it")

What all of this tells me is that privacy is becoming a political term (ahem: it always was) but that the techno-love for AI (or whatever tech you're talking about) is drowning out the critical need for conversations around privacy. The religion of AI has stated that "we shall not do privacy", and the evangelists are spreading "the good word".

It's not all bad news. When talking with actual scientists and engineers building AI systems from the ground up (i.e. training their own) there is still intellectual curiosity and interest in the field. Those who have already seen the privacy problems haven't forgotten about them. Perhaps the influx of hype and interest has created an uneven playing field -- and allowed misinformation to be rife and profitable.

My tips?

  • Curate your conversations: Have in-depth and useful conversations with those who do understand and care, rather than trying to "correct" others who clearly aren't open to the conversation.
  • Embrace misinformation or disdain with curiosity: Ah, can you explain that thought? That's interesting! Can you tell me where you learned that or what made you say that?
  • Promote voices that challenge these narratives: Find people whose work and opinions you admire on these topics and promote their reach and work.
  • Don't let the triggers/"losses" leave you feeling powerless: Take time to recharge your batteries, speak about your feelings in safe spaces and be kind to yourself and others. I can also suggest hosting a Feminist AI LAN Party as a good way to recharge. :)

Hoping this email has left you a little more seen, heard and maybe elicited a chuckle.

I'll be in Brussels next week, so drop me a line if you are at CPDP and want to say hi.

With Love and Privacy, kjam