⛽ How GPU Memory Hierarchy Fuels the Idea Behind FlashAttention
FlashAttention Part Three: Understanding the GPU memory hierarchy and how each component can be used to optimize performance.

Dimitris Poulopoulos
May 8th

GPU Memory Hierarchy: Faster than your company’s organization chart!

In our last chapter, we delved into the attention mechanism, today's superstar in the world of Deep Learning. We now have a basic understanding of how attention works, so before we explore the various types of attention mechanisms, let's circle back to this month's topic:...