Pallas TPU

TPU-specific documentation.

Guides

Writing TPU kernels with Pallas
  What is a TPU?
  Noteworthy properties and restrictions
  Supported operations
Pipelining
  TPU and its memory spaces
  Constraints of using VMEM/SMEM
  Primer: Pipelining
  Pipelining in Pallas
  Handling reductions
  TPUs in Megacore configuration
  Conclusion
Matrix Multiplication
  Background
  Your first matrix multiplication kernel
  Matrix multiplication performance
  Performance of pipelined kernels
  Templating the matrix multiplication
  Conclusion
Scalar Prefetch and Block-Sparse Computation
  Dynamic Block Indexing with Scalar Prefetch
  Example: Block Dynamic Slice with Scalar Prefetch
  Sparse Kernels: Representing Sparse Data
  Example: Sparse @ Dense Matrix Multiplication
  Sparse Access Patterns on Dense Data
  Example: Dense @ Dense Matrix Multiplication with a Block-Sparse Output Mask
Distributed Computing in Pallas for TPUs
  TPU Topologies
  Remote Direct Memory Access (RDMA) Model
  Advanced Techniques
  Final Notes