GPU Development Logs & CUDA Examples

👋 Welcome. I’m documenting key learnings and experiments from my ongoing work in GPU computing.
🧩 This site collects focused examples of CUDA implementations, ranging from memory management and performance profiling to real-time rendering techniques and system-level optimization.
🛠️ Each post is backed by runnable code, shared openly on GitHub, and written to be useful for both learning and collaboration.

Follow-Up: Building libcudf on WSL2

Continuing from the previous post, this article explains how I reproduced a successful libcudf build by cloning WSL2 environments and installing the correct CUDA Toolkit version.

Failure Log: How to Build libcudf from Source?

This article describes how to build libcudf, the backend of the RAPIDS library cuDF, from source.

CUDA: Performance Comparison Between Shared and Global Memory

This article explores the performance difference between shared memory and global memory in CUDA, and explains how to use them effectively while avoiding common pitfalls.

CUDA Streaming and Overlap Visualization with Nsight

This post explains how to implement asynchronous parallel processing using CUDA streams and how to visualize GPU execution overlap with Nsight Systems.

[Debugging] Image Loading and CUDA Processing with Nsight Profiling and OpenCV

This post investigates and resolves a bug encountered while transferring 2D image data to device memory and processing it with CUDA kernels.