Programming Massively Parallel Processors: A Hands-on Approach is not really good in my opinion: it has many small mistakes and confusing sentences (even when you already know CUDA).
CUDA by Example: An Introduction to General-Purpose GPU Programming is too simple and abstracts away too much of the architecture.
Next year I'm planning to start writing a CUDA book that starts by engineering the hardware and goes up to optimization on that hardware (which is basically an NVIDIA card), including all the main algorithms (except for graphs).
I'm already teaching the course this way at uni, and it is quite successful among students.
So tl;dr, you have at least one person who would pay for a better book :-)
A curated list of every major book on CUDA programming: beginner to advanced, C++/Python, architecture, optimization, and the latest 2024-2026 releases.
Focused on practical, high-quality resources for NVIDIA GPU parallel computing.
Contributions welcome! See Contributing.
CUDA by Example: An Introduction to General-Purpose GPU Programming
Jason Sanders & Edward Kandrot (2010, Addison-Wesley)
The timeless classic. Short, example-driven, perfect first book.
Learn CUDA Programming
Jaegeun Han & Bharatkumar Sharma (2019, Packt)
Modern beginner-to-intermediate with CUDA 10+ examples and GitHub repo.
CUDA for Engineers: An Introduction to High-Performance Parallel Computing
Mete Yurtoglu & Duane Storti (2016, Addison-Wesley)
Engineer-focused, hands-on projects for scientists and non-CS folks.
Programming in Parallel with CUDA: A Practical Guide
Richard Ansorge (2022, Cambridge University Press)
Real-world scientific examples (stencils, Monte Carlo, imaging). Excellent modern C++ coverage.
Professional CUDA C Programming
John Cheng, Max Grossman & Ty McKercher (2014, Wrox)
Production-level: multi-GPU, streams, libraries, and performance pitfalls.
GPU Parallel Program Development Using CUDA
Tolga Soyata (2018, Chapman & Hall/CRC)
Strong on libraries (cuBLAS, cuFFT, Thrust, NPP) and OpenCL comparison.
CUDA for Deep Learning
Elliot Arledge (2025, Manning)
From first kernels to Flash Attention: teaches hands-on CUDA optimization for deep learning with Nsight Compute profiling.
The CUDA Handbook: A Comprehensive Guide to GPU Programming
Nicholas Wilt (2013, Addison-Wesley)
The deep-dive reference. Every API detail and low-level trick.
CUDA Programming: A Developer's Guide to Parallel Computing with GPUs
Shane Cook (2013, Morgan Kaufmann)
Parallel algorithms, optimization patterns, and best practices.
CUDA Application Design and Development
Rob Farber (2011, Morgan Kaufmann)
Real research applications and scalable design.
Hands-On GPU Programming with Python and CUDA
Brian Tuomanen (2018, Packt)
Best for Python users: Numba, CuPy, and raw bindings.
GPU Programming with C++ and CUDA (or 9781805128823 variant)
Paulo Motta (2024, Packt)
Modern C++20 + Python interop (pybind11).
Programming in Parallel with CUDA (Ansorge, 2022): see above
Programming Massively Parallel Processors (4th Ed.) (Kirk & Hwu, 2022): see above
GPU Programming with C++ and CUDA (Motta, 2024): see above
CUDA for Deep Learning (Arledge, 2025, Manning): see above
Notable 2024-2026 titles (mostly specialized or self-published, but frequently appearing in searches):
Pro tip: CUDA changes fast. Always pair books with the free official CUDA C++ Programming Guide (v13.x, 2026).
Contributions are welcome! See contributing.md for the full guide.
Quick version:
Star the repo if this helps you write faster kernels!
Inspired by GoBooks.
MIT © Dariush Abbasi & Altern