vllm.v1.attention.kv_dequant
KV dequantization dispatch scaffold for attention backends.
Modules:
| Name | Description |
|---|---|
| flashinfer_tile | FlashInfer KV dequantization tile helpers. |
| hadamard | Hadamard helpers for INT4 KV cache quantization. |
| triton_tile | Triton KV dequantization helpers used by unified attention kernels. |
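The table names a `hadamard` module for INT4 KV cache quantization but does not show its contents. As a hedged illustration of why a Hadamard rotation pairs with low-bit KV quantization, the pure-Python sketch below rotates a vector with an orthonormal fast Walsh-Hadamard transform, which spreads outlier magnitudes across all elements. It then quantizes the rotated vector symmetrically to INT4, dequantizes it, and rotates back. The names `fwht`, `quantize_int4` and `dequantize_int4`, and the per-vector scale scheme, are hypothetical and are not this module's API.

```python
import math
from typing import List, Tuple


def fwht(x: List[float]) -> List[float]:
    """Orthonormal fast Walsh-Hadamard transform (hypothetical helper).

    len(x) must be a power of two. H / sqrt(n) is symmetric and orthogonal,
    so applying this function twice returns the original vector.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = list(x)
    h = 1
    while h < n:
        # Butterfly step: combine pairs of elements that are h apart.
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    norm = 1.0 / math.sqrt(n)
    return [v * norm for v in y]


def quantize_int4(x: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-vector INT4 quantization to [-7, 7] (assumed scheme)."""
    amax = max(abs(v) for v in x)
    scale = amax / 7.0 if amax > 0 else 1.0
    q = [max(-7, min(7, round(v / scale))) for v in x]
    return q, scale


def dequantize_int4(q: List[int], scale: float) -> List[float]:
    """Map INT4 codes back to floats using the stored per-vector scale."""
    return [v * scale for v in q]


key = [0.1, -0.4, 2.5, 0.0, 0.3, -0.2, 0.05, 0.7]
rotated = fwht(key)                      # spread the 2.5 outlier across all lanes
q, scale = quantize_int4(rotated)        # store 4-bit codes plus one scale
restored = fwht(dequantize_int4(q, scale))  # dequantize, then undo the rotation
```

Because the transform is orthonormal, the L2 error after rotating back equals the L2 rounding error in the rotated domain, which is at most `sqrt(n) * scale / 2`.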
assert_backend_supports_kv_quant_mode
assert_backend_supports_kv_quant_mode(
backend_name: str, quant_mode: KVQuantMode
) -> None
Raise an error if the named backend has not declared support for the requested KV quantization mode.
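The docstring states only that the function raises when a backend has not declared support for a mode. A minimal sketch of one way such a check could work, assuming a per-backend registry of declared modes, is shown below. It mirrors the documented signature, but the `KVQuantMode` members, the `_SUPPORTED_KV_QUANT_MODES` registry, the example backend names, and the choice of `ValueError` are all hypothetical and are not taken from vLLM.

```python
from enum import Enum
from typing import Dict, FrozenSet


class KVQuantMode(Enum):
    # Hypothetical members for illustration; the real enum may differ.
    NONE = "none"
    FP8 = "fp8"
    INT4 = "int4"


# Hypothetical registry: backend name -> KV quant modes that backend declares.
_SUPPORTED_KV_QUANT_MODES: Dict[str, FrozenSet[KVQuantMode]] = {
    "example_backend_a": frozenset({KVQuantMode.NONE, KVQuantMode.FP8}),
    "example_backend_b": frozenset(
        {KVQuantMode.NONE, KVQuantMode.FP8, KVQuantMode.INT4}
    ),
}


def assert_backend_supports_kv_quant_mode(
    backend_name: str, quant_mode: KVQuantMode
) -> None:
    """Raise if backend_name has not declared support for quant_mode.

    Unknown backends are treated as declaring no modes. ValueError is an
    assumed exception type for this sketch.
    """
    supported = _SUPPORTED_KV_QUANT_MODES.get(backend_name, frozenset())
    if quant_mode not in supported:
        raise ValueError(
            f"Attention backend {backend_name!r} has not declared support "
            f"for KV quant mode {quant_mode.value!r}"
        )
```

Failing fast at configuration time, rather than inside a kernel, keeps an unsupported combination from surfacing as a low-level error during attention.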