AI training is a uniquely data-hungry application, and it requires a special data pipeline to keep expensive GPUs fed. This episode of Utilizing Tech focuses on the data platform for machine learning, featuring Molly Presley of Hammerspace along with Frederic Van Haren and Stephen Foskett. Nothing is worse than idle hardware, especially when it comes to expensive GPUs intended for ML training. Performance is important, but parallel access and access across multiple systems are just as important. Building an AI training environment requires identifying and eliminating bottlenecks at every layer, yet many systems simply cannot scale to the extent required by the largest GPU clusters. But a data pipeline goes well beyond storage: training requires checkpoints, metadata, and access to many different data points, and different models have unique requirements as well. Ultimately, AI applications require a flexible data pipeline, not just high-performance storage.
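To make these pipeline concerns concrete, here is a minimal sketch in PyTorch of two ideas from the episode: a parallel data loader that keeps the accelerator fed, and periodic checkpointing. The dataset, file names, and hyperparameters are placeholders for illustration, not anything specific to Hammerspace or the discussion.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class SyntheticDataset(Dataset):
    """Stand-in dataset; a real pipeline would stream from shared storage."""

    def __len__(self):
        return 10_000

    def __getitem__(self, idx):
        # A 128-feature sample and a class label in [0, 10)
        return torch.randn(128), torch.randint(0, 10, (1,)).item()


def main():
    model = torch.nn.Linear(128, 10)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    # num_workers and prefetch_factor parallelize data loading so the
    # accelerator is not left idle waiting on I/O -- the "keep expensive
    # GPUs fed" point from the episode.
    loader = DataLoader(SyntheticDataset(), batch_size=256,
                        num_workers=4, prefetch_factor=2, pin_memory=True)

    for step, (x, y) in enumerate(loader):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

        # Periodic checkpoints let a long run resume after a failure;
        # writing them quickly is one of the storage demands that a
        # training pipeline places on the data platform.
        if step % 100 == 0:
            torch.save({"step": step,
                        "model": model.state_dict(),
                        "optimizer": optimizer.state_dict()},
                       f"checkpoint_{step}.pt")


if __name__ == "__main__":
    main()
```

At large scale, the same pattern repeats across thousands of GPUs, which is why parallel access to shared data and fast checkpoint writes become the bottlenecks the episode describes.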
Podcast Information:
Stephen Foskett, Organizer of the Tech Field Day Event Series, part of The Futurum Group. Find Stephen’s writing at GestaltIT.com, on X/Twitter at @SFoskett, or on Mastodon at @[email protected].
Frederic Van Haren is the CTO and Founder at HighFens Inc., Consultancy & Services. Connect with Frederic on LinkedIn or on X/Twitter and check out the HighFens website.
Molly Presley, Head of Global Marketing at Hammerspace. Connect with Molly on LinkedIn or learn more on the Hammerspace website and watch their presentations from AI Field Day 4.
Thank you for listening to Utilizing AI, part of the Utilizing Tech podcast series. If you enjoyed this discussion, please subscribe in your favorite podcast application and consider leaving us a rating and a nice review on Apple Podcasts or Spotify. This podcast was brought to you by Tech Field Day, now part of The Futurum Group. For show notes and more episodes, head to our dedicated Utilizing Tech website or find us on X/Twitter and Mastodon at Utilizing Tech.