Pull requests: vllm-project/vllm
Pull requests list
[CI] Introduce rules for llama auto-label
ci/build
#19323
opened Jun 8, 2025 by
houseroad
1 of 3 tasks
[full_graph] Fix query_start_loc padding
v1
#19321
opened Jun 7, 2025 by
yinghai
3 tasks done
[v1] Add fp32 support to v1 engine through flex attn
ready
ONLY add when PR is ready to merge/full CI is needed
#19319
opened Jun 7, 2025 by
Isotr0py
3 tasks done
[Bugfix] Fix auto dtype casting for BatchFeature
ready
ONLY add when PR is ready to merge/full CI is needed
#19316
opened Jun 7, 2025 by
Isotr0py
2 of 3 tasks
Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B
moe
qwen
Related to Qwen models
ready
ONLY add when PR is ready to merge/full CI is needed
#19315
opened Jun 7, 2025 by
Xu-Wenqing
3 tasks done
[Fix] Remove unused opentelemetry-semantic-conventions-ai dependency
ci/build
documentation
Improvements or additions to documentation
#19313
opened Jun 7, 2025 by
conroy-cheers
[Bugfix][V1] Fix memory profile to allow multiple servers to start on the same card
needs-rebase
v1
#19312
opened Jun 7, 2025 by
yeqcharlotte
[doc] improve ci doc
ci/build
documentation
Improvements or additions to documentation
#19307
opened Jun 7, 2025 by
reidliu41
3 tasks
Use xla flag to improve the quantized model performance
ready
ONLY add when PR is ready to merge/full CI is needed
tpu
Related to Google TPUs
v1
#19303
opened Jun 6, 2025 by
vanbasten23
3 tasks done
[Misc] Change tests/compile to use VLLM_V1 by default
ready
ONLY add when PR is ready to merge/full CI is needed
#19302
opened Jun 6, 2025 by
zou3519
Add optional token-level progress bar to LLM.beam_search using tqdm
frontend
#19301
opened Jun 6, 2025 by
NekoMimiUnagi
3 tasks done
[Misc] Fixes and Optimizations for DeepEP + DeepGEMM combination.
ready
ONLY add when PR is ready to merge/full CI is needed
v1
#19298
opened Jun 6, 2025 by
varun-sundar-rabindranath
[Quantization] Bump compressed-tensors version
ci/build
ready
ONLY add when PR is ready to merge/full CI is needed
#19295
opened Jun 6, 2025 by
kylesayrs
[V1] Add API docs for EncoderCacheManager
ready
ONLY add when PR is ready to merge/full CI is needed
v1
#19294
opened Jun 6, 2025 by
russellb
[TPU] support fp8 kv cache quantization
tpu
Related to Google TPUs
v1
#19292
opened Jun 6, 2025 by
yaochengji
[Metrics] Compute and log the serving FLOPs
documentation
Improvements or additions to documentation
#19290
opened Jun 6, 2025 by
sysradium
[Misc] Add documentation update reminder to PR template
ci/build
#19289
opened Jun 6, 2025 by
Isotr0py
1 of 3 tasks
[Frontend] Remove unreachable code from llm.py
frontend
#19288
opened Jun 6, 2025 by
KsuParkhamchuk
[Core] Update error message for Whisper + num-scheduler-steps > 1
ready
ONLY add when PR is ready to merge/full CI is needed
#19286
opened Jun 6, 2025 by
russellb
Convert kv_transfer_config from dict to KVTransferConfig to fix #19259
frontend
#19262
opened Jun 6, 2025 by
maobaolong