Building an AI Training Data Pipeline with VAST Data

Model training puts serious stress on data infrastructure, but preparing that data for use is a much harder challenge. This episode of Utilizing Tech features Subramanian Kartik of VAST Data discussing the broad data pipeline with Jeniece Wnorowski of Solidigm and Stephen Foskett. The first step in building an AI model is collecting, organizing, tagging, and transforming data. Yet this data is spread around the organization in databases, data lakes, and unstructured repositories. The challenge of building a data pipeline is familiar to most businesses, since a similar process is required in analytics, business intelligence, observability, and simulation, but generative AI applications have an insatiable appetite for data. These applications also demand extreme levels of storage performance, and only flash SSDs can meet this demand. A side benefit is lower power consumption and cooling requirements compared with hard disk drives, and this is especially true as massive SSDs come to market. Ultimately, the success of generative AI will drive greater collection and processing of data on the inferencing side, perhaps at the edge, and this will drive AI data infrastructure further.

Podcast Information:

Stephen Foskett is the Organizer of the Tech Field Day Event Series, part of The Futurum Group. Find Stephen’s writing at GestaltIT.com, on Twitter at @SFoskett, or on Mastodon at @[email protected].

Jeniece Wnorowski is the Datacenter Product Marketing Manager at Solidigm. You can connect with Jeniece on LinkedIn and learn more about Solidigm and their AI efforts on their dedicated AI landing page or watch their AI Field Day presentations from the recent event.

Subramanian Kartik, Ph.D., is the Global Systems Engineering Lead at VAST Data. You can connect with Subramanian on LinkedIn and learn more about VAST Data on their website or watch the videos from their recent Tech Field Day Showcase.

Thank you for listening to Utilizing Tech, with Season 7 focusing on AI Data Infrastructure. If you enjoyed this discussion, please subscribe in your favorite podcast application and consider leaving us a rating and a nice review on Apple Podcasts or Spotify. This podcast was brought to you by Solidigm and by Tech Field Day, now part of The Futurum Group. For show notes and more episodes, head to our dedicated Utilizing Tech website or find us on X/Twitter and Mastodon at Utilizing Tech.