Image Loading and CUDA Image Processing with Nsight Profiling and OpenCV

This post explains how to use Nsight to profile CUDA performance, how to load images with OpenCV, and how to transfer 2D images to device memory for processing with CUDA kernels.

May 21, 2025 · 5 min · yaikeda

CUDA Memory Transfer Timing: malloc, managed, and zero-copy

I measured how long it takes to transfer 2 GB of memory between the CPU and GPU using cudaMalloc, cudaMallocManaged, and cudaHostAlloc with the cudaHostAllocMapped flag (zero-copy).

May 20, 2025 · 2 min · yaikeda

Three CUDA Memory Allocation Methods and Zero-Copy Mapping

In this post, I explore and visualize three memory allocation strategies in CUDA: cudaMalloc, cudaMallocManaged, and cudaHostAlloc, with an additional test of zero-copy memory mapping.

May 19, 2025 · 3 min · yaikeda

Comparing CPU and GPU Performance in Vector Addition with CUDA

A first benchmark comparing CPU and GPU performance on vector addition with CUDA. Includes code samples, timing analysis, and lessons learned.

May 18, 2025 · 2 min · yaikeda