# Code File Index (Code Map)

This document indexes the key files in the vLLM codebase to help readers quickly locate the code they are interested in.


## Directory Structure Overview

```
vllm/
├── entrypoints/           # Entry points
│   ├── llm.py             # Python API entry
│   ├── cli/               # Command-line entry
│   └── openai/            # OpenAI-compatible API
│
├── v1/                    # V1 core implementation
│   ├── engine/            # Engine
│   ├── core/              # Core scheduling and memory management
│   ├── worker/            # Worker execution
│   ├── attention/         # Attention implementations
│   ├── sample/            # Samplers
│   └── spec_decode/       # Speculative decoding
│
├── model_executor/        # Model execution
│   ├── models/            # Model implementations
│   └── layers/            # Layer implementations and quantization
│
├── distributed/           # Distributed communication
│
└── config/                # Configuration management

csrc/                      # CUDA kernels (repository root, beside vllm/)
└── attention/             # Attention CUDA kernels
```

## Entry Points

| File Path | Description | Key Classes/Functions |
|---|---|---|
| `vllm/entrypoints/llm.py` | Python API entry | `LLM`, `LLM.generate()`, `LLM.chat()` |
| `vllm/entrypoints/cli/main.py` | Command-line entry | `serve`, `bench` commands |
| `vllm/entrypoints/openai/api_server.py` | OpenAI-compatible API server | API endpoint definitions |
| `vllm/engine/arg_utils.py` | Argument parsing | `EngineArgs`, `create_engine_config()` |
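
For orientation, here is a minimal sketch of the Python API entry point in action; the model name is a placeholder, and the parameters shown are standard `SamplingParams` fields.

```python
from vllm import LLM, SamplingParams

# Offline inference through vllm/entrypoints/llm.py.
llm = LLM(model="facebook/opt-125m")  # placeholder model
outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.8, max_tokens=32),
)
for out in outputs:
    print(out.outputs[0].text)
```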

## V1 Engine

| File Path | Description | Key Classes/Functions |
|---|---|---|
| `vllm/v1/engine/llm_engine.py` | LLM engine entry | `LLMEngine` |
| `vllm/v1/engine/core.py` | Engine core | `EngineCore`, `EngineCore.step()` |
| `vllm/v1/engine/processor.py` | Input/output processing | `InputProcessor`, `OutputProcessor` |
| `vllm/v1/engine/async_llm.py` | Async engine | `AsyncLLM` |
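
A minimal sketch of driving the engine loop directly instead of going through `LLM.generate()`; this assumes the synchronous `LLMEngine` wrapper and a placeholder model.

```python
from vllm import EngineArgs, LLMEngine, SamplingParams

# Build the engine from parsed arguments (see vllm/engine/arg_utils.py).
engine = LLMEngine.from_engine_args(EngineArgs(model="facebook/opt-125m"))
engine.add_request("req-0", "The capital of France is",
                   SamplingParams(max_tokens=16))

# Each step() schedules, executes, and samples one iteration.
while engine.has_unfinished_requests():
    for output in engine.step():
        if output.finished:
            print(output.outputs[0].text)
```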

## Core Scheduling

| File Path | Description | Key Classes/Functions |
|---|---|---|
| `vllm/v1/core/sched/scheduler.py` | Scheduler | `Scheduler`, `Scheduler.schedule()` |
| `vllm/v1/core/sched/request_queue.py` | Request queues | `FCFSRequestQueue`, `PriorityRequestQueue` |
| `vllm/v1/core/kv_cache_manager.py` | KV cache management | `KVCacheManager`, `allocate_slots()`, `free()` |
| `vllm/v1/core/block_pool.py` | Block pool management | `BlockPool`, `FreeKVCacheBlockQueue` |
| `vllm/v1/core/kv_cache_utils.py` | KV cache utilities | `KVCacheBlock`, `BlockHashToBlockMap` |
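
The arithmetic behind `allocate_slots()` is block-granular: a request needs enough fixed-size blocks to cover all of its tokens, and a partially filled block still occupies a whole block. A conceptual sketch, not vLLM's actual code (16 is vLLM's default block size):

```python
def blocks_needed(num_tokens: int, block_size: int = 16) -> int:
    # Ceiling division: round up to whole blocks.
    return -(-num_tokens // block_size)

assert blocks_needed(1) == 1    # one token still takes a full block
assert blocks_needed(16) == 1
assert blocks_needed(17) == 2
```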

## Request Handling

| File Path | Description | Key Classes/Functions |
|---|---|---|
| `vllm/v1/request.py` | Request data structures | `Request`, `RequestStatus` |
| `vllm/sampling_params.py` | Sampling parameters | `SamplingParams` |
| `vllm/outputs.py` | Output data structures | `RequestOutput`, `CompletionOutput` |
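
`SamplingParams` controls per-request decoding behavior, and `RequestOutput` carries the results back to the caller. A small usage sketch with commonly used fields:

```python
from vllm import SamplingParams

params = SamplingParams(
    temperature=0.7,   # softmax temperature; 0 selects greedy decoding
    top_p=0.9,         # nucleus (top-p) sampling threshold
    top_k=50,          # sample only from the 50 most likely tokens
    max_tokens=128,    # cap on the number of generated tokens
    stop=["\n\n"],     # generation stops when a stop string appears
)
```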

## Worker Execution

| File Path | Description | Key Classes/Functions |
|---|---|---|
| `vllm/v1/worker/gpu_worker.py` | GPU worker | `GPUWorker` |
| `vllm/v1/worker/gpu_model_runner.py` | Model execution | `GPUModelRunner`, `execute_model()` |
| `vllm/v1/worker/gpu_input_batch.py` | Input batching | `InputBatch`, `CachedRequestState` |

## Executors

| File Path | Description | Key Classes/Functions |
|---|---|---|
| `vllm/v1/executor/abstract.py` | Executor base class | `Executor` |
| `vllm/v1/executor/uniproc_executor.py` | Single-process executor | `UniProcExecutor` |
| `vllm/v1/executor/multiproc_executor.py` | Multi-process executor | `MultiprocExecutor` |
| `vllm/v1/executor/ray_distributed.py` | Ray distributed executor | `RayDistributedExecutor` |

## Attention

| File Path | Description | Key Classes/Functions |
|---|---|---|
| `vllm/v1/attention/ops/paged_attn.py` | PagedAttention interface | `PagedAttention` |
| `vllm/v1/attention/backends/flash_attn.py` | FlashAttention backend | `FlashAttentionBackend` |
| `vllm/v1/attention/backends/triton_attn.py` | Triton attention backend | `TritonAttentionBackend` |
| `vllm/attention/layer.py` | Attention layer | `Attention` |
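
The central idea of PagedAttention is that a per-request block table maps logical token positions to physical KV cache blocks, so a request's KV cache need not be contiguous in GPU memory. A conceptual sketch of the lookup (hypothetical helper, not vLLM's kernel code):

```python
def locate_kv(block_table: list[int], token_pos: int,
              block_size: int = 16) -> tuple[int, int]:
    """Map a logical token position to (physical_block_id, offset_in_block)."""
    return block_table[token_pos // block_size], token_pos % block_size

# Tokens 0-15 live in physical block 7, tokens 16-31 in block 2, and so on.
assert locate_kv([7, 2, 9], 20) == (2, 4)
```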

## Sampling

| File Path | Description | Key Classes/Functions |
|---|---|---|
| `vllm/v1/sample/sampler.py` | Sampler | `Sampler`, `Sampler.forward()` |
| `vllm/v1/sample/metadata.py` | Sampling metadata | `SamplingMetadata` |
| `vllm/v1/sample/ops/penalties.py` | Penalty computation | `apply_penalties()` |
| `vllm/v1/sample/ops/topk_topp.py` | Top-k/top-p sampling | `apply_top_k_top_p()` |
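
To illustrate what `apply_top_k_top_p()` computes, here is a plain PyTorch version of the standard top-k/top-p filter; a simplified sketch, not vLLM's optimized implementation.

```python
import torch

def top_k_top_p_filter(logits: torch.Tensor, k: int, p: float) -> torch.Tensor:
    """Mask logits outside the top-k set and the top-p (nucleus) set with -inf."""
    # Top-k: drop everything below the k-th largest logit.
    kth = torch.topk(logits, k, dim=-1).values[..., -1, None]
    logits = logits.masked_fill(logits < kth, float("-inf"))

    # Top-p: in descending order, remove tokens once the cumulative
    # probability exceeds p, but always keep the most likely token.
    sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
    cum_probs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
    remove = cum_probs > p
    remove[..., 1:] = remove[..., :-1].clone()  # keep the token crossing p
    remove[..., 0] = False
    mask = torch.zeros_like(remove).scatter(-1, sorted_idx, remove)
    return logits.masked_fill(mask, float("-inf"))
```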

## Speculative Decoding

| File Path | Description | Key Classes/Functions |
|---|---|---|
| `vllm/v1/spec_decode/eagle.py` | EAGLE proposers | `SpecDecodeBaseProposer`, `EagleProposer` |
| `vllm/v1/spec_decode/draft_model.py` | Draft model | `DraftModelProposer` |
| `vllm/v1/spec_decode/medusa.py` | Medusa | `MedusaProposer` |
| `vllm/v1/worker/gpu/spec_decode/rejection_sample.py` | Rejection sampling | `rejection_sample()` |
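
The acceptance rule behind `rejection_sample()` is the standard speculative-decoding test: a draft token is accepted with probability min(1, p_target / p_draft), which preserves the target model's output distribution exactly. A single-token conceptual sketch (vLLM's implementation is batched and vectorized):

```python
import random

def accept_draft_token(p_target: float, p_draft: float) -> bool:
    """p_target / p_draft: probabilities that the target / draft models
    assign to the proposed token."""
    return random.random() < min(1.0, p_target / p_draft)
```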

## Model Implementations

| File Path | Description | Key Classes/Functions |
|---|---|---|
| `vllm/model_executor/models/llama.py` | LLaMA models | `LlamaForCausalLM` |
| `vllm/model_executor/models/qwen2.py` | Qwen2 models | `Qwen2ForCausalLM` |
| `vllm/model_executor/models/mixtral.py` | Mixtral MoE | `MixtralForCausalLM` |
| `vllm/model_executor/models/deepseek_v2.py` | DeepSeek V2 | `DeepseekV2ForCausalLM` |
| `vllm/model_executor/model_loader/loader.py` | Model loading | `get_model()` |

## Quantization

| File Path | Description | Key Classes/Functions |
|---|---|---|
| `vllm/model_executor/layers/quantization/__init__.py` | Quantization entry | `get_quantization_config()` |
| `vllm/model_executor/layers/quantization/base_config.py` | Quantization base class | `QuantizationConfig` |
| `vllm/model_executor/layers/quantization/fp8.py` | FP8 quantization | `Fp8Config` |
| `vllm/model_executor/layers/quantization/awq.py` | AWQ quantization | `AWQConfig` |
| `vllm/model_executor/layers/quantization/gptq.py` | GPTQ quantization | `GPTQConfig` |
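
These schemes share one basic structure: weights are stored in a low-precision format together with scales, then dequantized (or computed on directly) at runtime. A toy per-tensor symmetric quantization sketch to illustrate the scale concept (not any specific vLLM kernel):

```python
import torch

def quantize_per_tensor(w: torch.Tensor, n_bits: int = 8):
    """Symmetric per-tensor quantization: w ≈ w_q.float() * scale."""
    qmax = 2 ** (n_bits - 1) - 1              # 127 for int8
    scale = w.abs().max() / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax, qmax).to(torch.int8)
    return w_q, scale

w = torch.randn(4, 4)
w_q, scale = quantize_per_tensor(w)
w_hat = w_q.float() * scale                   # dequantized approximation
```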

## Distributed Communication

| File Path | Description | Key Classes/Functions |
|---|---|---|
| `vllm/distributed/parallel_state.py` | Parallel state management | `GroupCoordinator` |
| `vllm/distributed/communication_op.py` | Communication ops | `tensor_model_parallel_all_reduce()` |
| `vllm/distributed/device_communicators/pynccl.py` | NCCL communication | `PyNcclCommunicator` |
| `vllm/distributed/device_communicators/custom_all_reduce.py` | Custom all-reduce | `CustomAllReduce` |
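
The role of `tensor_model_parallel_all_reduce()` is easiest to see in a row-parallel linear layer: every rank computes a partial matmul over its weight shard, and an all-reduce sums the partial results into the full output. A conceptual sketch using plain `torch.distributed` rather than vLLM's `GroupCoordinator`:

```python
import torch
import torch.distributed as dist

def row_parallel_linear(x_shard: torch.Tensor,
                        w_shard: torch.Tensor) -> torch.Tensor:
    """Each rank holds a slice of the input features and the matching
    rows of the weight; summing partial products yields the full output."""
    partial = x_shard @ w_shard
    dist.all_reduce(partial, op=dist.ReduceOp.SUM)  # sum across TP ranks
    return partial
```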

## Configuration

| File Path | Description | Key Classes/Functions |
|---|---|---|
| `vllm/config/vllm.py` | Top-level config | `VllmConfig` |
| `vllm/config/model.py` | Model config | `ModelConfig` |
| `vllm/config/parallel.py` | Parallelism config | `ParallelConfig` |
| `vllm/config/scheduler.py` | Scheduler config | `SchedulerConfig` |
| `vllm/config/cache.py` | Cache config | `CacheConfig` |
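
These dataclasses are usually not constructed by hand; `EngineArgs.create_engine_config()` (listed under Entry Points above) resolves defaults and assembles them into a single `VllmConfig`. A minimal sketch, assuming a placeholder model:

```python
from vllm.engine.arg_utils import EngineArgs

vllm_config = EngineArgs(model="facebook/opt-125m").create_engine_config()
# The aggregate config nests the per-concern configs from the table above.
print(vllm_config.model_config.max_model_len)
print(vllm_config.cache_config.block_size)
```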

## CUDA Kernels

| File Path | Description |
|---|---|
| `csrc/attention/paged_attention_v1.cu` | PagedAttention V1 kernel |
| `csrc/attention/paged_attention_v2.cu` | PagedAttention V2 kernel |
| `csrc/quantization/` | Quantization kernels |
| `csrc/moe/` | MoE kernels |

## Key Function Quick Reference

### Request Processing Flow

```python
# 1. User call
LLM.generate()                           # vllm/entrypoints/llm.py

# 2. Engine processing
LLMEngine.add_request()                  # vllm/v1/engine/llm_engine.py
EngineCore.step()                        # vllm/v1/engine/core.py

# 3. Scheduling
Scheduler.schedule()                     # vllm/v1/core/sched/scheduler.py
KVCacheManager.allocate_slots()          # vllm/v1/core/kv_cache_manager.py

# 4. Execution
GPUModelRunner.execute_model()           # vllm/v1/worker/gpu_model_runner.py
model.forward()                          # vllm/model_executor/models/*.py

# 5. Sampling
Sampler.forward()                        # vllm/v1/sample/sampler.py

# 6. Output
OutputProcessor.process()                # vllm/v1/engine/processor.py
```

### KV Cache Management Flow

```python
# Allocation
KVCacheManager.allocate_slots()          # vllm/v1/core/kv_cache_manager.py
BlockPool.get_free_blocks()              # vllm/v1/core/block_pool.py

# Deallocation
KVCacheManager.free()                    # vllm/v1/core/kv_cache_manager.py
BlockPool.free_blocks()                  # vllm/v1/core/block_pool.py

# Prefix caching
KVCacheManager.get_computed_blocks()     # vllm/v1/core/kv_cache_manager.py
```
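
Prefix caching hinges on content-addressed block hashes: a block's hash chains the parent block's hash with the token IDs it contains, so two requests that share a prefix resolve to the same physical blocks. A simplified conceptual sketch (vLLM's real hash also covers extra keys such as multimodal inputs):

```python
def compute_block_hash(parent_hash: int | None,
                       block_token_ids: tuple[int, ...]) -> int:
    # Chaining the parent hash makes the result depend on the entire
    # prefix, not just this block's tokens.
    return hash((parent_hash, block_token_ids))

h0 = compute_block_hash(None, (1, 2, 3, 4))   # first block of the prompt
h1 = compute_block_hash(h0, (5, 6, 7, 8))     # second block chains h0
```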

## Debugging Tips

### Key Breakpoint Locations

| Purpose | Location (file:function) | Description |
|---|---|---|
| Request admission | `v1/engine/llm_engine.py:add_request` | Trace where requests enter |
| Scheduling decisions | `v1/core/sched/scheduler.py:schedule` | Understand the scheduling logic |
| KV allocation | `v1/core/kv_cache_manager.py:allocate_slots` | Memory allocation |
| Model execution | `v1/worker/gpu_model_runner.py:execute_model` | Forward pass |
| Sampling | `v1/sample/sampler.py:forward` | Token sampling |
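
To break at one of these locations without editing vLLM's source, one option is to monkey-patch the method before starting inference; a sketch using `Scheduler.schedule` as the target (the import path follows the table above):

```python
import pdb
from vllm.v1.core.sched.scheduler import Scheduler

_orig_schedule = Scheduler.schedule

def traced_schedule(self, *args, **kwargs):
    pdb.set_trace()  # drops into the debugger on every scheduling step
    return _orig_schedule(self, *args, **kwargs)

Scheduler.schedule = traced_schedule
```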

### Logging Configuration

```bash
# Verbose logging
export VLLM_LOGGING_LEVEL=DEBUG

# Function tracing
export VLLM_TRACE_FUNCTION=1

# Scheduler logging
export VLLM_LOG_SCHEDULER=1
```
