Generative A.I. without the Privacy Risks (with Prof. Raluca Ada Popa)

Data Science

Consumers and enterprises worry that Generative A.I. tools like ChatGPT breach privacy by using conversations as training data, storing personally identifiable information (PII), and potentially surfacing confidential data in their responses. Prof. Raluca Ada Popa has solutions.

Raluca:
• Is an Associate Professor of Computer Science at the University of California, Berkeley.
• Specializes in computer security and applied cryptography.
• Her papers have been cited over 10,000 times.
• Is Co-Founder and President of Opaque Systems, a confidential computing platform that has raised over $31m in venture capital to enable collaborative analytics and A.I., including allowing you to securely interact with Generative A.I.
• Previously co-founded PreVeil, a now well-established company that provides end-to-end document and message encryption to over 500 clients.
• Holds a PhD in Computer Science from MIT.

Despite being such a deep expert, Raluca does a stellar job of communicating complex concepts simply, so this episode should appeal to anyone who wants to dig into the thorny issues around data privacy and security associated with Large Language Models (LLMs) and how to resolve them.

In the episode, Raluca details:
• What confidential computing is and how to do it without sacrificing performance.
• How you can perform inference with an LLM (or even train an LLM!) without anyone — including the LLM developer! — being able to access your data.
• How you can use commercial generative models like OpenAI’s GPT-4 without OpenAI being able to see sensitive or personally identifiable information you include in your API queries.
• The pros and cons of open-source versus closed-source A.I. development.
• How and why you might want to seamlessly run your compute pipelines across multiple cloud providers.
• Why you should consider a career that blends academia and entrepreneurship.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

 

Getting Value From A.I.

In February 2023, our Chief Data Scientist, Jon Krohn, delivered this keynote on “Getting Value from A.I.” to open the second day of Hg Capital’s “Digital Forum” in London.

read full post

The Chinchilla Scaling Laws

The Chinchilla Scaling Laws dictate the amount of training data needed to optimally train a Large Language Model (LLM) of a given size. For Five-Minute Friday, our Chief Data Scientist, Jon Krohn, covers this ratio and the LLMs that have arisen from it.
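
As a rough, back-of-the-envelope illustration (not taken from the episode itself): the Chinchilla paper's headline heuristic works out to roughly 20 training tokens per model parameter. The short Python sketch below simply applies that ratio; the 20:1 constant is the commonly quoted approximation, and the function name is an illustrative choice rather than anything from the post.

```python
# Rough sketch of the Chinchilla heuristic: compute-optimal training uses
# roughly 20 training tokens per model parameter (Hoffmann et al., 2022).
TOKENS_PER_PARAMETER = 20  # approximate ratio; the exact optimum varies

def chinchilla_optimal_tokens(n_parameters: int) -> int:
    """Estimate the compute-optimal number of training tokens for a model
    of a given parameter count, under the ~20 tokens/parameter heuristic."""
    return n_parameters * TOKENS_PER_PARAMETER

# Example: a 70-billion-parameter model (Chinchilla's own size) would want
# roughly 1.4 trillion training tokens under this heuristic.
print(f"{chinchilla_optimal_tokens(70_000_000_000):,}")  # 1,400,000,000,000
```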

read full post

StableLM: Open-Source “ChatGPT”-Like LLMs You Can Fit on One GPU

The folks who open-sourced Stable Diffusion have now released “StableLM”, their first language models. Pre-trained on an unprecedented amount of data for single-GPU LLMs (1.5 trillion tokens!), these models are small but mighty.

read full post