| Age | Commit message (Collapse) | Author |
|
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
|
The official Qwen org publishes only BF16 and FP8 variants of 3.6, none
of which fit comfortably on a 32 GB card (27B-FP8 is 28 GB before KV
cache). Community quants land within days of release and are the right
size for consumer hardware:
- qwen3.6_27b_awq_int4 (cyankiwi): ~14 GB, AWQ INT4
- qwen3.6_27b_gptq_int4 (groxaxo): ~14 GB, GPTQ-Pro 4bit
- qwen3.6_27b_autoround_int4 (Lorbus): ~14 GB, AutoRound INT4
- qwen3.6_35b_a3b_nvfp4 (RedHatAI): ~18 GB, NVFP4 (Blackwell-native;
sglang auto-detects SM120 and uses fp4-gemm-backend=flashinfer_cudnn)
All use --reasoning-parser qwen3 --tool-call-parser qwen3_coder.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
|
Arch ships two confusingly-named multipart packages:
python-multipart -> defnull/multipart (wrong)
python-python-multipart -> Kludex/python-multipart (correct, what FastAPI uses)
Selecting the wrong one means FastAPI raises at endpoint registration:
RuntimeError: Form data requires "python-multipart" to be installed.
It seems you installed "multipart" instead.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
|
Move proven-required modules (fastapi, starlette, openai, huggingface-hub,
pillow, packaging, psutil, scipy, sentencepiece, soundfile, pyzmq, multipart,
uvicorn, flashinfer) from optdepends to depends. They are imported
unconditionally during sglang.launch_server bootstrap, so the package
crash-loops at startup without them.
Add Qwen3.6-27B (BF16 + FP8), Qwen3.6-35B-A3B (BF16 + FP8) confs and a
gemma_4_31b_fp8 conf using RedHatAI/gemma-4-31B-it-FP8-Dynamic so 31B fits
on a 32 GB card.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
|
|
|
|
|
Prevents scheduler busy-wait burning a CPU core at idle.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|
|
|
- Replace sglang.service with sglang@.service template unit
- Add per-model config files for Gemma 4 and Qwen 3.5 variants
- Default to --sleep-on-idle to reduce CPU usage when idle
- Update sglang.conf as global config with SGLANG_OPTS/SGLANG_ARGS split
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|
Sync with sglang-git: add Qwen3.5 dense/MoE model sizes with VRAM
estimates, reasoning/tool-call parser options, and usage example.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
|
- Replace DynamicUser with static sglang user via sysusers.d/tmpfiles.d
(persistent HF_HOME at /var/lib/sglang survives restarts)
- Add sglang.env (mode 0600) for credentials, separate from sglang.conf
- Harden systemd service: NoNewPrivileges, PrivateTmp, ProtectSystem/Home
- Bind to 127.0.0.1:30000 instead of 0.0.0.0:8000
- Fix arch: any -> x86_64 (CUDA dependency)
- Fix python-python-multipart -> python-multipart
- Add provides/conflicts with sglang-git
- Move config from /etc/sglang.conf to /etc/sglang/
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
|
Move orjson, compressed-tensors, gguf, msgspec, einops, xgrammar
from optdepends to depends (required for launch_server).
Add sglang.service and /etc/sglang.conf for systemd integration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
|
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
|
Adds safetensors, triton, partial-json-parser, transformers,
and torchvision — all required for sglang.launch_server to import.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
|
Lists available Arch/AUR packages for the serving stack. The client
library works without these; they are needed for the CLI and server.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
|
These are eagerly imported via sglang.__init__ -> utils.py import chain.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
|
Prevents setuptools build/lib/ directory from leaking into the wheel.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
|
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
|
Uses pyproject_other.toml for minimal dependencies (client/gateway only).
GPU runtime deps available as optional extras.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|