How Data Happened: A History, with Columbia Prof. Chris Wiggins


Chris Wiggins, Chief Data Scientist at The New York Times and faculty at Columbia University, talks to our Chief Data Scientist, Jon Krohn, to provide an enthralling, witty, and rich history of data science.

Chris:
• Is an Associate Professor of Applied Math at Columbia University.
• Has been Chief Data Scientist at The New York Times for nearly a decade.
• Co-authored two fascinating, recently published books: “How Data Happened: A History from the Age of Reason to the Age of Algorithms” and “Data Science in Context: Foundations, Challenges, Opportunities”.

The vast majority of this episode will be accessible to anyone. Only a couple of questions near the end, on tools and programming languages, are aimed primarily at hands-on practitioners.

In the episode, Chris magnificently details:
• The history of data and statistics, from their infancy centuries ago to the present.
• Why it’s a problem that most data scientists have limited exposure to the humanities.
• How and when Bayesian statistics became controversial.
• What we can do to address the key issues facing data science and ML today.
• His computational biology research at Columbia.
• The tech stack used for data science at the globally revered New York Times.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

Getting Value From A.I.

In February 2023, our Chief Data Scientist, Jon Krohn, delivered this keynote on “Getting Value from A.I.” to open the second day of Hg Capital’s “Digital Forum” in London.


The Chinchilla Scaling Laws

The Chinchilla Scaling Laws dictate the amount of training data needed to optimally train a Large Language Model (LLM) of a given size. For Five-Minute Friday, our Chief Data Scientist, Jon Krohn, covers this ratio and the LLMs that have arisen from it.
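The headline finding of the Chinchilla paper (Hoffmann et al., 2022) is commonly summarized as a compute-optimal ratio of roughly 20 training tokens per model parameter. As a minimal sketch of that heuristic (the helper function and its name are illustrative, not from the post):

```python
# Illustrative sketch of the widely cited Chinchilla heuristic:
# compute-optimal training uses roughly 20 tokens per model parameter.
def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal number of training tokens for a model of n_params parameters."""
    return n_params * tokens_per_param

# Chinchilla itself: a 70B-parameter model trained on ~1.4T tokens.
print(f"{chinchilla_optimal_tokens(70e9) / 1e12:.1f} trillion tokens")  # -> 1.4 trillion tokens
```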


StableLM: Open-Source “ChatGPT”-Like LLMs You Can Fit on One GPU

Stability AI, the folks who open-sourced Stable Diffusion, have now released “StableLM”, their first suite of language models. Pre-trained on an unprecedented amount of data for single-GPU LLMs (1.5 trillion tokens!), these models are small but mighty.
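For practitioners who want to try one of the models on a single GPU, here is a minimal sketch using Hugging Face’s transformers library; the checkpoint name, half-precision setting, and prompt are assumptions for illustration, not details from the post:

```python
# Minimal sketch: loading a StableLM checkpoint on one GPU in half precision.
# The checkpoint ID below is an assumption; device_map="auto" requires the
# accelerate package to be installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "stabilityai/stablelm-base-alpha-7b"  # assumed Hub checkpoint ID

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision to fit on a single GPU
    device_map="auto",          # place weights on the available device(s)
)

inputs = tokenizer("Data science is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```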
