Machine learning is unlike any other enterprise application, demanding massive datasets from distributed sources. In this episode, Bin Fan of Alluxio joins Frederic Van Haren and Stephen Foskett to discuss the unique challenges of serving distributed, heterogeneous data to ML workloads. The systems that support AI training are themselves unusual: GPUs and other AI accelerators are spread across multiple machines, and each one accesses the same massive set of small files. Conventional storage solutions are not designed to serve parallel access to so many small files, so they often become a performance bottleneck in machine learning training. Another challenge is moving data across silos, storage systems, and protocols, which most solutions cannot do.
Three Questions:
- Frederic: What areas are blocking us today to further improve and accelerate AI?
- Stephen: How big can ML models get? Will today's hundred-billion-parameter models look small tomorrow, or have we reached the limit?
- Sara E. Berger: With all of the AI that we have in our day-to-day lives, where should the limits be? Where should we have it, where shouldn't we have it, and where should the boundaries be?
Guests and Hosts
Bin Fan, Founding Member of Alluxio. Connect with Bin on LinkedIn and on Twitter @BinFan.
Frederic Van Haren is the CTO and Founder at HighFens Inc., Consultancy & Services. Connect with Frederic on LinkedIn or on X/Twitter and check out the HighFens website.
Stephen Foskett, Organizer of the Tech Field Day Event Series, part of The Futurum Group. Find Stephen's writing at GestaltIT.com, on Twitter at @SFoskett, or on Mastodon.