Pull requests: vllm-project/vllm


Pull requests list

[full_graph] Fix query_start_loc padding
Labels: v1
#19321 opened Jun 7, 2025 by yinghai
3 tasks done
[v1] Add fp32 support to v1 engine through flex attn
Labels: ready
#19319 opened Jun 7, 2025 by Isotr0py
3 tasks done
[Bugfix] Fix auto dtype casting for BatchFeature
Labels: ready
#19316 opened Jun 7, 2025 by Isotr0py
2 of 3 tasks
Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B moe
Labels: qwen
#19315 opened Jun 7, 2025 by Xu-Wenqing
3 tasks done
[Fix] Remove unused opentelemetry-semantic-conventions-ai dependency
Labels: ci/build, documentation
#19313 opened Jun 7, 2025 by conroy-cheers
[Misc]: refactor: ParallelConfig init func
#19310 opened Jun 7, 2025 by googs1025
3 tasks
[doc] improve ci doc
Labels: ci/build, documentation
#19307 opened Jun 7, 2025 by reidliu41
3 tasks
Use xla flag to improve the quantized model performance
Labels: ready, tpu, v1
#19303 opened Jun 6, 2025 by vanbasten23
3 tasks done
[Misc] Change tests/compile to use VLLM_V1 by default
Labels: ready
#19302 opened Jun 6, 2025 by zou3519
[Misc] Fixes and Optimizations for DeepEP + DeepGEMM combination.
Labels: ready, v1
#19298 opened Jun 6, 2025 by varun-sundar-rabindranath
[CI] Update FlashInfer to 0.2.6
Labels: ci/build
#19297 opened Jun 6, 2025 by mgoin
[Quantization] Bump compressed-tensors version
Labels: ci/build, ready
#19295 opened Jun 6, 2025 by kylesayrs
[V1] Add API docs for EncoderCacheManager
Labels: ready, v1
#19294 opened Jun 6, 2025 by russellb
[TPU] support fp8 kv cache quantization
Labels: tpu, v1
#19292 opened Jun 6, 2025 by yaochengji
[Metrics] Compute and log the serving FLOPs
Labels: documentation
#19290 opened Jun 6, 2025 by sysradium
[Misc] Add documentation update reminder to PR template
Labels: ci/build
#19289 opened Jun 6, 2025 by Isotr0py
1 of 3 tasks
[Core] Update error message for Whisper + num-scheduler-steps > 1
Labels: ready
#19286 opened Jun 6, 2025 by russellb
[V1][Kernel] Flashinfer HND KV cache layout
Labels: v1
#19280 opened Jun 6, 2025 by NickLucche
[Frontend] optimize beam_search code
#19267 opened Jun 6, 2025 by zhanggzh
Fix TorchAOConfig skip layers
#19265 opened Jun 6, 2025 by mobicham