Pallas TPU

TPU-specific documentation.

Guides

- Writing TPU kernels with Pallas
  - What is a TPU?
  - Noteworthy properties and restrictions
  - Supported operations
- TPU Pipelining
  - TPU and its memory spaces
  - TPU-specific Pipelining Features
- Matrix Multiplication
  - Background
  - Your first matrix multiplication kernel
  - Matrix multiplication performance
  - Performance of pipelined kernels
  - Templating the matrix multiplication
  - Conclusion
- Scalar Prefetch and Block-Sparse Computation
  - Dynamic Block Indexing with Scalar Prefetch
  - Example: Block Dynamic Slice with Scalar Prefetch
  - Sparse Kernels: Representing Sparse Data
  - Example: Sparse @ Dense Matrix Multiplication
  - Sparse Access Patterns on Dense Data
  - Example: Dense @ Dense Matrix Multiplication with a Block-Sparse Output Mask
- Distributed Computing in Pallas for TPUs
  - TPU Topologies
  - Remote Direct Memory Access (RDMA) Model
  - Advanced Techniques
  - Final Notes