PROBABLY PRIVATE

Issue # 6: All your text are belong to ChatGPT
logo

All your text are belong to ChatGPT

All your text are belong to ChatGPT

Everyone seems ChatGPT crazy right now, so I decided to jump on the wagon and look at privacy and security of the service ChatGPT. Some of you might not know, I started in NLP many years ago now (technically: 2003 was my first exposure!), and I've been following the advances (and hype) excitedly for the past 5 years, despite my focus on privacy and security.

In this newsletter, I'll cover:

  • A known and disclosed ChatGPT privacy leak
  • Security vulnerabilities of ChatGPT plugins
  • Ethical implications of open vs closed datasets
  • My new PO Box (please write me letters :) )

Announcing the Probably Private PO Box

A photo of an open post office box at a Deutsche Post office.

First things first, I have a PO Box! 📬 As I've been investigating issues in email and internet privacy, I wondered if there wasn't a better way to protect reader privacy, and decided that Deutsche Post would be a good candidate for adding a new way of interacting with me and privacy topics.

Many years ago I used to write zines, subscribe to catalogs and diligently check mailboxes. Now most of my mail is just bills, paperwork and the occasional direct mail (yes, we still have that in Germany!).

I would love to send a receive real physical mail, and from what I know so far about the postal systems, they don't yet track most of it on an individual basis unless you ask them to, meaning you can choose your level of privacy and consent -- a wonderful gift!

If you want to write me, send me a postcard, a zine, a funny photo, please do! I will write back if you include a return address. I'm at:

Katharine Jarmul
Postfach 2 12 67
10124 Berlin

ChatGPT Privacy Leak

You may have missed the leak last week where OpenAI's ChatGPT erroneously leaked user chat histories to other users. It hasn't yet been disclosed how many users were affected and what potentially sensitive conversations were leaked.

Users began noticing problems and posting them to Twitter and other social media on March 20, with either questions: Whose histories are these, they aren't mine... or comments about the clear privacy violations. Although OpenAI directly tells users to not reveal any sensitive data to the service, they also don't explicitly validate or remove conversations with sensitive data. This leads the question as to who was able to see and exfiltrate data from these conversations before OpenAI rolled back the error.

What most users are not aware of is that it's quite likely their chats are part of active training, and can also leak in chat outputs to themselves or other users as Carl Bergstrom notes. OpenAI is already aware that corporate secrets have been exposed due to Amazon's recent legal actions after finding sensitive internal data in ChatGPT's responses.

It also brings to light the nature of handling and managing sensitive data for these AI service providers. There likely should have been tests in place to ensure this didn't happen. They could also design chats to be inherently ephemeral, only turning history on via opt-in. And finally, when security and privacy breaches happen, like last week, they can certainly do better than posting a status update -- a serious investigation and incident report is what is expected and required to rebuild trust with users and organizations using the service as well as to understand the scope of the problem and users affected. This is unlikely to be the last incident we hear about, given the response thus far.

ChatGPT Security and Closed Datasets

ChatGPT-4 was launched this month, sparking new debates on AI hype and "Sparks of AGI", AI security issues and principles of open research. I won't touch on the AI/AGI debate in this newsletter, but the other two topics are near and dear to my work.

As Florian Tramèr aptly noted on Twitter, the security aspects of connecting a Reinforcement Learning with Human Feedback (RLHF) + Large Language Model (LLM) to your private accounts opens all sorts of security vulnerabilities. You can imagine a million plugins, such as email autorespond, spam filter, calendar invite organizer, iCloud backup filtering and search, that sound, on the surface, like great ideas! How cool to never have to sort or respond to your own email except for a select few?

No one is certain how vulnerable these models are to attack, but they certainly can and will be attacked. Just like the field of prompt engineering is growing, so is the field of prompt hacking. I've been following Arvind Narayanan's explorations on website prompt injections, which are both clever and hilarious. There's also been plenty of examples of successfully jumping guardrails and getting ChatGPT to teach you how to carjack and other supposedly blocked behaviors.

Now, imagine someone sends an email to sort through all your emails and forward any that say "love" to a different email address. Imagine someone exporting your entire calendar via one calendar invite. Imagine someone being able to hide a message in something you take a photo of that gives instructions to the new "multi-modal" models.

Some of these might seem like science fiction, but OpenAI and many other research groups aren't releasing information about the guardrails or security testing that they've done to adequately assess the risks. And this links directly to the issue of not releasing any open training datasets.

Releasing chat data as open data is a complex issue. On one hand, if OpenAI, as suspected, are using all previous chat histories, there are serious privacy risks with releasing that data publicly. If the training data includes other private data sources, or even powerful crawled datasets, these can also contain sensitive information and expose people in new ways to their words and ideas being taken out of context or used against them.

On the other hand, there are massive ethical risks in not releasing datasets for public inspection. I'm reminded of the amazing work of Kate Crawford and Trevor Paglen on "Excavating AI", where they investigated large image datasets, exposing sexism, racism, hatred, bigotry and cultural biases. As they dove deeper, they found murky parts of ImageNet, as described in their article (some words removed for brevity):

As we go further into the depths of ImageNet’s Person categories, the classifications of humans within it take a sharp and dark turn. There are categories for Bad Person, Call Girl, Drug Addict, Closet Queen, Convict, Crazy, ... Jezebel, Kleptomaniac, Pervert, Prima Donna, Schizophrenic, Tosser, Unskilled Person, Wanton, Waverer, and Wimp. There are many racist slurs and misogynistic terms.

Reading their research reminded me of my own investigation and discoveries into *isms in Word Vectors, in which I discovered similar words (and worse) in Google's News Vectors, released in 2016. Equally depressing for me at the time, was that the vector output of computer programmer - man + woman was homemaker.

How do researchers investigate these biases if the training data or embeddings aren't released? I don't see an easy way for this to happen, particularly with the RLHF interface between users and the language model itself. One would hope that OpenAI is willing to release the language model for open science and future discovery into its ethical issues and encoded societal and cultural biases which can and will cause harm. On the other hand, I am not so sure that they have properly removed sensitive, personal content, or even properly anonymized any of the input -- so the privacy risks, especially for outliers, would be extreme.

Crawford and Paglen end their article with some fairly powerful words, that I'll leave here for your contemplation:

There is much at stake in the architecture and contents of the training sets used in AI. They can promote or discriminate, approve or reject, render visible or invisible, judge or enforce. And so we need to examine them—because they are already used to examine us.

Ask: I'm curious about jailbreaks you have seen, problematic output or sketchy ChatGPT behavior. Hit reply if you have some examples you're willing to share. 😊

I'd be excited to hear from you what you'd like more (or less) of in this newsletter. Please hit reply (or mail me something!) and tell me how you liked it, and what you'd like to hear in future issues!

With Love and Privacy, kjam