Mining/miner/cuda/TODO.md

16 KiB

Code Review Findings - CUDA Mining Plugin Enterprise Audit

Generated: 2025-12-31 Reviewed by: 6 Parallel Opus Code Reviewers Target: XMRig-CUDA Plugin (76 source files)


Summary

Domain Critical High Medium Total
Plugin API & Entry Point 2 4 2 8
CUDA Core Kernels 2 3 2 7
CryptoNight-R NVRTC 1 3 3 7
RandomX CUDA 2 3 1 6
KawPow CUDA 2 5 1 8
Crypto Common 3 3 0 6
TOTAL 12 21 9 42

Critical Issues

CRIT-001: Memory Leak in DatasetHost::release()

  • File: src/xmrig-cuda.cpp:62-72
  • Domain: Plugin API & Entry Point
  • Confidence: 95%

The m_ptr is set to nullptr unconditionally after decrementing m_refs, even when m_refs > 0. This loses the pointer while other contexts may still be using it.

inline void release()
{
    --m_refs;
    if (m_refs == 0) {
        cudaHostUnregister(m_ptr);
    }
    m_ptr = nullptr;  // BUG: Always sets to nullptr, even when refs > 0
}

Fix: Only set m_ptr = nullptr inside the if (m_refs == 0) block.


CRIT-002: Unchecked cudaHostUnregister Error

  • File: src/xmrig-cuda.cpp:68-69
  • Domain: Plugin API & Entry Point
  • Confidence: 90%

cudaHostUnregister() is called without error checking, unlike all other CUDA API calls in the codebase which use CUDA_CHECK macro.

Fix: Add error checking or at minimum log errors from cleanup path.


CRIT-003: Race Condition with atomicExch in MUL_SUM_XOR_DST

  • File: src/cuda_extra.h:98
  • Domain: CUDA Core Kernels
  • Confidence: 85%

The macro reads dst0 non-atomically at the start, but another thread may be writing to the same location via atomicExch. This creates a read-write race.

Fix: Add __threadfence() and ensure all accesses to shared dst are atomic.


CRIT-004: Shared Memory Array Index Bounds Overflow

  • File: src/cuda_core.cu:335
  • Domain: CUDA Core Kernels
  • Confidence: 82%

Expression (MASK - 0x30) could underflow if MASK < 0x30 for certain algorithm variants, leading to out-of-bounds scratchpad access.

Fix: Add bounds validation: const uint32_t safe_mask = (MASK >= 0x30) ? (MASK - 0x30) : 0;


CRIT-005: Memory Leak in Module Cleanup Path

  • File: src/cuda_core.cu:834-836
  • Domain: CryptoNight-R NVRTC
  • Confidence: 95%

When changing block height, the old module is unloaded before checking if compilation succeeds. If CryptonightR_get_program() throws, ctx->module is left in invalid state.

Fix: Get new module first, then unload old module only after successful compilation.


CRIT-006: Dataset Buffer Overflow in execute_vm Kernel

  • File: src/RandomX/randomx_cuda.hpp:2175
  • Domain: RandomX CUDA
  • Confidence: 95%

Dataset access at ma + sub * 8 can reach maximum offset of 2,181,038,008 bytes with only 8 bytes safety margin. Under-allocation by even a few bytes causes out-of-bounds read.

Fix: Add explicit bounds checking before dataset access.


CRIT-007: Scratchpad Buffer Overflow Risk

  • File: src/RandomX/randomx_cuda.hpp:2132-2133
  • Domain: RandomX CUDA
  • Confidence: 90%

Scratchpad pointer arithmetic spAddr + sub * 8 relies on exact allocation sizes. If RANDOMX_SCRATCHPAD_L3 constant is misconfigured in any coin variant, buffer overflow occurs.

Fix: Add compile-time static_assert and runtime bounds checks.


CRIT-008: Out-of-Bounds Array Access in dag_sizes

  • File: src/KawPow/raven/CudaKawPow_gen.cpp:432
  • Domain: KawPow CUDA
  • Confidence: 95%

dag_sizes[epoch] accessed without validating that epoch is within array bounds. The dag_sizes parameter is a raw pointer with no size information.

Fix: Add size parameter to function signature and validate bounds before access.


CRIT-009: Background Thread Never Joins/Terminates

  • File: src/KawPow/raven/CudaKawPow_gen.cpp:51-82
  • Domain: KawPow CUDA
  • Confidence: 100%

Background thread runs infinite for(;;) loop with no exit condition. Thread is never joined or deleted on shutdown.

Fix: Add std::atomic<bool> shutdown flag, check in loop, and add cleanup function.


CRIT-010: Integer Overflow in Algorithm L3 Memory Calculation

  • File: src/crypto/common/Algorithm.h:89
  • Domain: Crypto Common
  • Confidence: 85%

The l3() function uses 1ULL << ((id >> 16) & 0xff). If bits 16-23 are >= 64, this causes undefined behavior due to shift overflow.

Fix: Add bounds checking: return (shift < 64) ? (1ULL << shift) : 0;


CRIT-011: Division by Zero in VARIANT2_INTEGER_MATH

  • File: src/crypto/cn/CryptoNight_monero.h:79,81
  • Domain: Crypto Common
  • Confidence: 90%

While divisor d is OR'd with 0x80000001UL, the ARM version (line 125-127) has different logic that may not guarantee non-zero divisor.

Fix: Add explicit validation if (d == 0) { /* handle error */ }.


CRIT-012: Missing Function Definition for int_sqrt_v2

  • File: src/crypto/cn/CryptoNight_monero.h:83
  • Domain: Crypto Common
  • Confidence: 95%

CUDA plugin calls int_sqrt_v2() but function is not defined in CUDA codebase. It exists in core project at separate compilation unit.

Fix: Include proper header or provide implementation in CUDA plugin headers.


High Priority Issues

HIGH-001: Integer Overflow in Scratchpad Size Calculation

  • File: src/RandomX/randomx.cu:33
  • Domain: Plugin API & Entry Point
  • Confidence: 85%

batch_size * (ctx->algorithm.l3() + 64) - both operands may be uint32_t, causing overflow before assignment to uint64_t.

Fix: Cast to uint64_t first: static_cast<uint64_t>(batch_size) * ...


HIGH-002: Incomplete Error Cleanup in cryptonight_extra_cpu_init()

  • File: src/cuda_extra.cu:313-389
  • Domain: Plugin API & Entry Point
  • Confidence: 90%

Multiple GPU allocations with CUDA_CHECK (which throws on error) but no try-catch cleanup. Mid-function failures leak prior allocations.

Fix: Add RAII wrapper or manual cleanup in catch block.


HIGH-003: Missing Null Check in deviceName()

  • File: src/xmrig-cuda.cpp:369-372
  • Domain: Plugin API & Entry Point
  • Confidence: 85%

deviceName() dereferences ctx without null check, while deviceInt() does check. Inconsistent defensive programming.

Fix: Add if (ctx == nullptr) { return nullptr; }.


HIGH-004: Potential Double-Free in release() with d_ctx_state2

  • File: src/xmrig-cuda.cpp:537-538
  • Domain: Plugin API & Entry Point
  • Confidence: 80%

In non-HEAVY algorithms, d_ctx_state2 aliases d_ctx_state. But release() unconditionally calls cudaFree() on both.

Fix: Check if pointers are equal before freeing second.


HIGH-005: Missing CUDA Error Check After Kernel Launches

  • File: src/cuda_core.cu:741-805
  • Domain: CUDA Core Kernels
  • Confidence: 95%

Dynamic shared memory size calculation could exceed device limits (48KB), but no runtime validation before launch.

Fix: Add cudaDeviceGetAttribute check for max shared memory before launch.


HIGH-006: Integer Overflow in Index Calculation (CN_HEAVY)

  • File: src/cuda_core.cu:621-622
  • Domain: CUDA Core Kernels
  • Confidence: 90%

Expression ((idx0 & MASK) >> 3) + 1u can overflow for large idx0, exceeding allocated long_state buffer size.

Fix: Add overflow check before calculation.


HIGH-007: Uncoalesced Memory Access Pattern in Phase 2

  • File: src/cuda_core.cu:337-372
  • Domain: CUDA Core Kernels
  • Confidence: 88%

Adjacent threads access non-adjacent memory locations, causing ~50% reduction in memory bandwidth utilization.

Fix: Restructure to use shared memory tiling with coalesced loads.


HIGH-008: Unvalidated Height Parameter in Code Generation

  • File: src/CudaCryptonightR_gen.cpp:261-262
  • Domain: CryptoNight-R NVRTC
  • Confidence: 85%

The height parameter seeds Blake hash directly without validation. Extreme height values near UINT64_MAX are not rejected.

Fix: Add validation for reasonable blockchain height limits.


HIGH-009: Missing PTX Data Validation Before Module Load

  • File: src/cuda_core.cu:840-843
  • Domain: CryptoNight-R NVRTC
  • Confidence: 90%

If CryptonightR_get_program() returns early, ptx vector may be empty. Loading empty PTX causes undefined behavior.

Fix: Check if (ptx.empty() || lowered_name.empty()) before cuModuleLoadDataEx.


HIGH-010: Missing Memory Deallocation for RandomX Buffers

  • File: src/RandomX/randomx.cu:30-48
  • Domain: RandomX CUDA
  • Confidence: 100%

randomx_prepare allocates 5 device memory buffers but no cleanup function exists for re-calling with different batch sizes.

Fix: Add randomx_cleanup() function and proper lifecycle management.


HIGH-011: Uninitialized VM State Memory

  • File: src/RandomX/randomx_cuda.hpp:448-449
  • Domain: RandomX CUDA
  • Confidence: 85%

init_vm kernel only zeros R[0-7], but imm_buf and compiled_program sections left with uninitialized data.

Fix: Add memset(R, 0, VM_STATE_SIZE) for full initialization.


HIGH-012: Integer Overflow in Scratchpad Size Calculation

  • File: src/RandomX/randomx.cu:33
  • Domain: RandomX CUDA
  • Confidence: 80%

Same as HIGH-001: batch_size * (ctx->algorithm.l3() + 64) could overflow if operands are uint32_t.

Fix: Use static_cast<uint64_t>(batch_size) * ....


HIGH-013: Missing Null Pointer Validation for dag_sizes

  • File: src/KawPow/raven/CudaKawPow_gen.cpp:405-432
  • Domain: KawPow CUDA
  • Confidence: 90%

dag_sizes pointer is dereferenced without null checking. Background lambda captures pointer by value - could be null.

Fix: Add if (!dag_sizes) { CUDA_THROW("dag_sizes cannot be null"); }.


HIGH-014: Integer Overflow in DAG Element Calculation

  • File: src/KawPow/raven/CudaKawPow_gen.cpp:432
  • Domain: KawPow CUDA
  • Confidence: 85%

dag_elements is uint64_t but implicitly converted to uint32_t in calculate_fast_mod_data(). Values exceeding 32-bit range silently truncate.

Fix: Validate dag_elements <= UINT32_MAX before conversion.


HIGH-015: Unchecked string::find in Code Generation

  • File: src/KawPow/raven/CudaKawPow_gen.cpp:423,426,448,455
  • Domain: KawPow CUDA
  • Confidence: 90%

All four source_code.replace() calls use find() without checking for std::string::npos. Missing template markers cause undefined behavior.

Fix: Create helper function that validates find() result before replace().


HIGH-016: Module Unload Without Null Check

  • File: src/KawPow/raven/KawPow.cu:99-101
  • Domain: KawPow CUDA
  • Confidence: 80%

Module is unloaded but NOT set to nullptr afterwards. Creates use-after-free if period changes multiple times.

Fix: Set ctx->kawpow_module = nullptr after cuModuleUnload().


HIGH-017: Missing Error Check on Large DAG Allocation

  • File: src/KawPow/raven/KawPow.cu:58-62
  • Domain: KawPow CUDA
  • Confidence: 85%

DAGs can be multi-GB. No pre-allocation validation that GPU has sufficient memory.

Fix: Use cudaMemGetInfo() to verify available memory before allocation.


HIGH-018: Unvalidated Array Index in v4_random_math Macro

  • File: src/crypto/cn/r/variant4_random_math.h:104-106
  • Domain: Crypto Common
  • Confidence: 85%

r[op->src_index] and r[op->dst_index] accessed without bounds checking against 9-element array.

Fix: Add if (op->src_index >= 9 || op->dst_index >= 9) { return; }.


HIGH-019: Integer Overflow in L3 Mask Calculation

  • File: src/crypto/cn/CnAlgo.h:111
  • Domain: Crypto Common
  • Confidence: 82%

If Algorithm::l3(algo) returns 0, then l3(algo) - 1 underflows to SIZE_MAX.

Fix: Add validation: if (l3_val == 0) return 0;.


HIGH-020: Potential Buffer Overflow in blake256_update

  • File: src/crypto/cn/c_blake256.c:150
  • Domain: Crypto Common
  • Confidence: 80%

memcpy((void *) (S->buf + left), ...) - if left + (datalen >> 3) exceeds 64 (buf size), overflow occurs.

Fix: Add explicit bounds check before memcpy.


Medium Priority Issues

MED-001: Missing Error Check on strdup()

  • File: src/cuda_extra.cu:550
  • Domain: Plugin API & Entry Point
  • Confidence: 85%

strdup() can return nullptr on allocation failure but this is not checked.


MED-002: Unchecked CUDA Error in init()

  • File: src/xmrig-cuda.cpp:513-518
  • Domain: Plugin API & Entry Point
  • Confidence: 80%

cuInit(0) called without checking return value.


MED-003: Missing Synchronization After Shuffle Operations

  • File: src/cuda_core.cu:354-357
  • Domain: CUDA Core Kernels
  • Confidence: 85%

__syncwarp() for CUDA 9+ only synchronizes within warp. If block2.x > 32, shared memory access across warps has race condition.


MED-004: Device Memory Leak on Algorithm Change

  • File: src/cuda_core.cu:834-836
  • Domain: CUDA Core Kernels
  • Confidence: 92%

Module unloaded for CN-R variant switching but associated device memory allocations not cleaned up.


MED-005: Race Condition in Cache Access (TOCTOU)

  • File: src/CudaCryptonightR_gen.cpp:158-171,268-280
  • Domain: CryptoNight-R NVRTC
  • Confidence: 85%

Check-then-act pattern allows duplicate NVRTC compilations under high concurrency.


MED-006: Background Thread Never Joins (CN-R)

  • File: src/CudaCryptonightR_gen.cpp:96,124-126
  • Domain: CryptoNight-R NVRTC
  • Confidence: 100%

Same issue as CRIT-009 but in CryptoNight-R code path.


MED-007: Insufficient Error Context in NVRTC Failures

  • File: src/CudaCryptonightR_gen.cpp:190-205
  • Domain: CryptoNight-R NVRTC
  • Confidence: 90%

Error log missing height/arch context, making production debugging difficult.


MED-008: Race Condition in find_shares Kernel

  • File: src/RandomX/hash.hpp:22-32
  • Domain: RandomX CUDA
  • Confidence: 82%

Multiple threads using atomicInc could theoretically get same idx before increment completes on all SMs.


MED-009: Race Condition in Background Compilation

  • File: src/KawPow/raven/CudaKawPow_gen.cpp:407-410
  • Domain: KawPow CUDA
  • Confidence: 80%

Lambda captures dag_sizes pointer by value. Use-after-free if caller deallocates before background execution.


Immediate (Security Critical)

  1. CRIT-006: Dataset buffer overflow in RandomX
  2. CRIT-007: Scratchpad buffer overflow in RandomX
  3. CRIT-008: Out-of-bounds dag_sizes access in KawPow
  4. CRIT-012: Missing int_sqrt_v2 function
  5. CRIT-011: Division by zero in VARIANT2

This Week (Data Integrity)

  1. CRIT-001: DatasetHost memory leak
  2. CRIT-003: Race condition in MUL_SUM_XOR_DST
  3. CRIT-009: Background thread never terminates
  4. HIGH-010: RandomX buffer cleanup
  5. HIGH-002: cryptonight_extra_cpu_init cleanup

Next Sprint (Stability)

  1. CRIT-005: Module cleanup path memory leak
  2. CRIT-010: Integer overflow in l3() calculation
  3. HIGH-005: Missing kernel launch error checks
  4. HIGH-015: Unchecked string::find in code gen
  5. HIGH-016: Module use-after-free

Backlog (Quality)

  • All Medium priority items
  • Performance optimization (HIGH-007)
  • Error message improvements (MED-007)

Review Completion Status

  • Plugin API & Entry Point - 8 issues found
  • CUDA Core Kernels - 7 issues found
  • CryptoNight-R NVRTC - 7 issues found
  • RandomX CUDA - 6 issues found
  • KawPow CUDA - 8 issues found
  • Crypto Common - 6 issues found

Total Issues Identified: 42