vllm.v1.attention.kv_dequant.hadamard ¶
Hadamard helpers for INT4 KV cache quantization.
_get_hadamard_matrix ¶
Return a cached normalized Hadamard matrix of shape (d, d).
The cached matrix follows the same H / sqrt(d) normalization convention as hadacore_transform, which makes it involutory: applying the transform twice recovers the original input.
Source code in vllm/v1/attention/kv_dequant/hadamard.py
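The involution property follows from the Sylvester construction: an unnormalized Hadamard matrix satisfies H Hᵀ = d·I, so dividing by sqrt(d) yields an orthogonal, self-inverse matrix. A minimal NumPy sketch of a cached helper in this spirit (the name `get_hadamard_matrix` and the `lru_cache` memoization are illustrative assumptions, not the actual vLLM implementation):

```python
from functools import lru_cache

import numpy as np


@lru_cache(maxsize=None)
def get_hadamard_matrix(d: int) -> np.ndarray:
    """Sketch: return a cached H / sqrt(d) Hadamard matrix of shape (d, d)."""
    assert d > 0 and (d & (d - 1)) == 0, "d must be a power of two"
    h = np.array([[1.0]])
    while h.shape[0] < d:
        # Sylvester construction: H_{2n} = [[H_n, H_n], [H_n, -H_n]]
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(d)


H = get_hadamard_matrix(8)
# Normalized Sylvester matrices are symmetric and orthogonal, so H @ H == I
# and applying the transform twice is the identity.
x = np.arange(8.0)
y = (x @ H) @ H  # round-trips back to x
```

Because the matrix is cached per dimension, repeated lookups for the same head size pay the construction cost only once.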
hadamard_transform ¶
Apply a normalized Hadamard transform.
Tier-1 uses hadacore_transform on CUDA. Tier-2 falls back to a cached dense matmul on non-CUDA platforms. The fallback is intended for the INT4 KV cache path and prioritizes correctness over performance.
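The Tier-2 fallback described above amounts to one dense matmul over the last dimension. A self-contained sketch under that assumption (the function body and shapes are illustrative, not the actual vLLM code):

```python
import numpy as np


def _normalized_hadamard(d: int) -> np.ndarray:
    # Sylvester construction normalized by sqrt(d); assumes power-of-two d.
    h = np.array([[1.0]])
    while h.shape[0] < d:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(d)


def hadamard_transform(x: np.ndarray) -> np.ndarray:
    """Tier-2-style fallback: apply the normalized Hadamard transform
    along the last dimension with a single dense matmul."""
    d = x.shape[-1]
    assert (d & (d - 1)) == 0, "last dim must be a power of two"
    return x @ _normalized_hadamard(d)


# Example on a KV-cache-shaped tensor (tokens, head_dim): the transform
# preserves norms (it is orthogonal) and is its own inverse.
kv = np.random.default_rng(0).normal(size=(4, 16))
roundtrip = hadamard_transform(hadamard_transform(kv))
```

The orthogonality is what makes the transform safe to insert before INT4 quantization: it redistributes outlier magnitudes across the head dimension without changing the tensor's overall energy.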