Charles Frye shares insights on understanding GPUs for data science workloads, covering the most important facts and intuitions about GPU architecture that affect system performance. For data scientists running neural networks on GPU accelerators, the talk examines the features and phenomena that determine latency and throughput, from the silicon layer up to high-level frameworks like PyTorch and vLLM.

https://www.youtube.com/watch?v=ch2ODgbJjlA

https://youtu.be/wE1ZoMGIZHM