Data Science

94 Stories
Practical AI #142

Building a data team

Inspired by a recent article from Erik Bernhardsson titled “Building a data team at a mid-stage startup: a short story”, Chris and Daniel discuss all things AI/data team building. They share some stories from their experiences kick-starting AI efforts at various organizations and weigh the pros and cons of things like centralized data management, prototype development, and a focus on engineering skills.

Practical AI #139

Vector databases for machine learning

Pinecone is the first vector database for machine learning. Edo Liberty explains to Chris how vector similarity search works, and its advantages over traditional database approaches for machine learning. It enables one to search through billions of vector embeddings for similar matches, in milliseconds, and Pinecone is a managed service that puts this capability at the fingertips of machine learning practitioners.
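For readers who want a feel for the core idea before listening, here is a minimal sketch of vector similarity search. It is a brute-force cosine-similarity scan in plain Python, not Pinecone's API or implementation; the function names and the toy three-dimensional "embeddings" are made up for illustration. Searching billions of vectors in milliseconds generally calls for approximate nearest-neighbor indexes rather than a full scan like this one.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Score every stored vector against the query and return the k best (key, score) pairs."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(key, score) for score, key in scored[:k]]


# Hypothetical toy embeddings; real ones have hundreds of dimensions.
index = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

results = top_k([1.0, 0.0, 0.0], index, k=2)
for key, score in results:
    print(f"{key}: {score:.3f}")
```

The scan costs one similarity computation per stored vector for every query, which is the scaling problem that a dedicated vector database is meant to take off your hands.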

Practical AI #138

Multi-GPU training is hard (without PyTorch Lightning)

William Falcon wants AI practitioners to spend more time on model development, and less time on engineering. PyTorch Lightning is a lightweight PyTorch wrapper for high-performance AI research that lets you train on multiple GPUs, TPUs, CPUs and even in 16-bit precision without changing your code! In this episode, we dig deep into Lightning, how it works, and what it is enabling. William also discusses the Grid AI platform (built on top of PyTorch Lightning). This platform lets you seamlessly train 100s of machine learning models in the cloud from your laptop.

Practical AI #137

Learning to learn deep learning 📖

Chris and Daniel sit down to chat about some exciting new AI developments including wav2vec-u (an unsupervised speech recognition model) and meta-learning (a new book about “How To Learn Deep Learning And Thrive In The Digital World”). Along the way they discuss engineering skills for AI developers and strategies for launching AI initiatives in established companies.

Lj Miranda ljvmiranda921.github.io

How to improve software engineering skills as a researcher

In which Lj Miranda proposes an exercise that data scientists can do to learn relevant software skills (with a tangible output in the end).

Create a machine learning application that receives HTTP requests, then deploy it as a containerized app.

I’m willing to wager that this is a worthy goal even if you’re coming from the software engineering side of the spectrum. Don’t worry, he’ll walk you through the steps.

Practical AI #127

Women in Data Science (WiDS)

Chris has the privilege of talking with Stanford Professor Margot Gerritsen, who co-leads the Women in Data Science (WiDS) Worldwide Initiative. This is a conversation that everyone should listen to. Professor Gerritsen’s profound insights into how we can all help the women in our lives succeed, in data science and in life, make this a ‘must listen’ episode for everyone, regardless of gender.

Practical AI #122

The AI doc will see you now

Elad Walach of Aidoc joins Chris to talk about the use of AI for medical imaging interpretation. Starting with the world’s largest annotated training data set of medical images, Aidoc is the radiologist’s best friend, helping the doctor interpret imagery faster and more accurately, and improving the imaging workflow along the way. Elad’s vision for the transformative future of AI in medicine clearly soothes Chris’s concern about managing his aging body in the years to come. ;-)

Career mihaileric.com

We don't need data scientists, we need data engineers

TLDR:

There are 70% more open roles at companies in data engineering as compared to data science. As we train the next generation of data and machine learning practitioners, let’s place more emphasis on engineering skills.

This vibes with what I’ve been hearing on Practical AI lately. Organizations are facing big challenges when it comes to deploying, maintaining, and improving data processing tools and platforms in production settings. Big challenges produce big opportunities. And what does a data engineer do? According to this article:

Develops a robust and scalable set of data processing tools/platforms. Must be comfortable with SQL/NoSQL database wrangling and building/maintaining ETL pipelines.

If you have that skillset, you are in high demand today. And if you can adapt that skillset and be considered an ML engineer, you will be in high demand for a long, long time.

Practical AI #116

Engaging with governments on AI for good

At this year’s Government & Public Sector R Conference (or R|Gov) our very own Daniel Whitenack moderated a panel on how AI practitioners can engage with governments on AI for good projects. That discussion is being republished in this episode for all our listeners to enjoy!

The panelists were Danya Murali from Arcadia Power and Emily Martinez from the NYC Department of Health and Mental Hygiene. Danya and Emily gave some great perspectives on sources of government data, ethical uses of data, and privacy.

Practical AI #115

From research to product at Azure AI

Bharat Sandhu, Director of Azure AI and Mixed Reality at Microsoft, joins Chris and Daniel to talk about how Microsoft is making AI accessible and productive for users, and how AI solutions can address real world challenges that customers face. He also shares Microsoft’s research-to-product process, along with the advances they have made in computer vision, image captioning, and how researchers were able to make AI that can describe images as well as people do.

Practical AI #114

The world's largest open library dataset

Unsplash has released the world’s largest open library dataset, which includes 2M+ high-quality Unsplash photos, 5M keywords, and over 250M searches. They have big ideas about how the dataset might be used by ML/AI folks, and there have already been some interesting applications. In this episode, Luke and Tim discuss why they released this data and what it takes to maintain a dataset of this size.

Practical AI #113

A casual conversation concerning causal inference

Lucy D’Agostino McGowan, cohost of the Casual Inference Podcast and a professor at Wake Forest University, joins Daniel and Chris for a deep dive into causal inference. Referring to current events (e.g. misreporting of COVID-19 data in Georgia) as examples, they explore how we interact with, analyze, trust, and interpret data - addressing underlying assumptions, counterfactual frameworks, and unmeasured confounders (Chris’s next Halloween costume).

Peter Wang anaconda.com

Anaconda's dividend program helps sustain the open source DS/ML community

Anaconda CEO (and Practical AI guest) Peter Wang:

I am excited to announce the Anaconda Dividend Program, which formalizes our commitment to direct a portion of our revenue to open-source projects that help advance innovation in data science. We are launching the program in partnership with NumFOCUS, and will kick off with a seed donation of $10,000, as well as an additional 10% of single-user Commercial Edition subscription revenue through the end of this year. Going forward, we will fund the dividend with at least 1% of our revenue in 2021, with a minimum of $25,000 committed for the year.

We’ve been beating the successful-businesses-that-thrive-in-large-part-due-to-open-source-software-should-set-aside-revenues-to-support-those-projects drum for years now, so it’s exciting to see forward-looking companies like Anaconda step up and do just that. More like this! 🙏

Practical AI #109

When data leakage turns into a flood of trouble

Rajiv Shah teaches Daniel and Chris about data leakage, and its major impact upon machine learning models. It’s the kind of topic that we don’t often think about, but which can ruin our results. Raj discusses how to use activation maps and image embedding to find leakage, so that leaking information in our test set does not find its way into our training set.
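To make the idea concrete, here is a minimal sketch of one simple leakage check: flagging test records that also appear in the training set. This is a swapped-in technique (hashing normalized text), not the activation-map or embedding approach Raj describes in the episode, and the function names and sample records are hypothetical. It only catches exact duplicates after whitespace and case normalization.

```python
import hashlib


def fingerprint(record):
    """Hash a text record after collapsing whitespace and lowercasing it."""
    normalized = " ".join(record.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_leaked(train, test):
    """Return the test records whose fingerprint also occurs in the training set."""
    train_prints = {fingerprint(r) for r in train}
    return [r for r in test if fingerprint(r) in train_prints]


# Hypothetical toy data: the first test record duplicates a training record
# apart from casing and spacing.
train = ["The cat sat.", "Dogs bark loudly."]
test = ["the  cat sat.", "Birds sing."]

leaked = find_leaked(train, test)
print(f"{len(leaked)} of {len(test)} test records also appear in train: {leaked}")
```

Near-duplicates, such as a cropped or re-encoded copy of the same image, slip past an exact-match check like this, which is where comparing embeddings becomes useful.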
