CUDA Memory Transfer Timing: malloc, managed, and zero-copy

I measured the time to transfer 2 GB of memory between the CPU and GPU using cudaMalloc, cudaMallocManaged, and cudaHostAlloc with the cudaHostAllocMapped flag (zero-copy).

May 20, 2025 · 2 min · yaikeda

Comparing CPU and GPU Performance in Vector Addition with CUDA

A first benchmark comparing CPU and GPU performance in vector addition using CUDA. Includes code samples, timing analysis, and lessons learned.

May 18, 2025 · 2 min · yaikeda