Age | Commit message | Author
4 days | Bump to 0.5.12 and ship more Gemma 4 / Qwen 3.6 quants | Will Handley
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 | Ship Qwen 3.6 community quants that fit on 32 GB VRAM | Will Handley
The official Qwen org publishes only BF16 and FP8 variants of 3.6, none of which fit comfortably on a 32 GB card (27B-FP8 is 28 GB before KV cache). Community quants land within days of release and are the right size for consumer hardware:
- qwen3.6_27b_awq_int4 (cyankiwi): ~14 GB, AWQ INT4
- qwen3.6_27b_gptq_int4 (groxaxo): ~14 GB, GPTQ-Pro 4bit
- qwen3.6_27b_autoround_int4 (Lorbus): ~14 GB, AutoRound INT4
- qwen3.6_35b_a3b_nvfp4 (RedHatAI): ~18 GB, NVFP4 (Blackwell-native; sglang auto-detects SM120 and uses fp4-gemm-backend=flashinfer_cudnn)
All use --reasoning-parser qwen3 --tool-call-parser qwen3_coder.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
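The sizes quoted in this commit follow roughly from parameter count times bits per weight. The sketch below is a back-of-the-envelope estimate only: it ignores quantization scales, layers left unquantized, and the KV cache, and the 8 GB headroom figure is an illustrative assumption, not something taken from the package.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes).

    Billions of parameters times bits per weight, divided by 8 bits per byte.
    """
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float = 32.0, headroom_gb: float = 8.0) -> bool:
    """True if the weights leave at least `headroom_gb` free on the card.

    `headroom_gb` is a hypothetical allowance for KV cache and activations.
    """
    return weight_size_gb(params_billion, bits_per_weight) + headroom_gb <= vram_gb


if __name__ == "__main__":
    candidates = [
        ("27B BF16", 27, 16),
        ("27B FP8", 27, 8),
        ("27B INT4", 27, 4),
        ("35B-A3B NVFP4", 35, 4),
    ]
    for name, params, bits in candidates:
        size = weight_size_gb(params, bits)
        print(f"{name}: ~{size:.1f} GB weights, fits on 32 GB: {fits(params, bits)}")
```

Under these assumptions the 4-bit 27B variants come out near 13.5 GB and the 4-bit 35B variant near 17.5 GB, in line with the ~14 GB and ~18 GB figures listed above.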
2026-04-28 | Fix python-multipart dep: use python-python-multipart | Will Handley
Arch ships two confusingly-named multipart packages:
  python-multipart -> defnull/multipart (wrong)
  python-python-multipart -> Kludex/python-multipart (correct, what FastAPI uses)
Selecting the wrong one means FastAPI raises at endpoint registration:
  RuntimeError: Form data requires "python-multipart" to be installed. It seems you installed "multipart" instead.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
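One way to tell the two projects apart from inside Python is to look at the installed distribution metadata rather than the Arch package name. This is a minimal sketch, assuming the PyPI distribution names `python-multipart` (Kludex) and `multipart` (defnull); the `diagnose` helper and its return strings are hypothetical and not part of FastAPI or the package.

```python
from importlib import metadata
from typing import Callable, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def diagnose(lookup: Callable[[str], Optional[str]] = installed_version) -> str:
    """Classify the multipart situation (hypothetical helper).

    "ok"            -> Kludex/python-multipart is present
    "wrong-package" -> only defnull/multipart is present
    "missing"       -> neither distribution is installed
    """
    if lookup("python-multipart") is not None:
        return "ok"
    if lookup("multipart") is not None:
        return "wrong-package"
    return "missing"


if __name__ == "__main__":
    print(diagnose())
```

The `lookup` parameter exists only so the logic can be exercised without touching the real environment.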
2026-04-28 | Promote runtime imports to depends; ship Qwen3.6 + Gemma 4 31B FP8 | Will Handley
Move proven-required modules (fastapi, starlette, openai, huggingface-hub, pillow, packaging, psutil, scipy, sentencepiece, soundfile, pyzmq, multipart, uvicorn, flashinfer) from optdepends to depends. They are imported unconditionally during sglang.launch_server bootstrap, so the package crash-loops at startup without them.
Add Qwen3.6-27B (BF16 + FP8), Qwen3.6-35B-A3B (BF16 + FP8) confs and a gemma_4_31b_fp8 conf using RedHatAI/gemma-4-31B-it-FP8-Dynamic so 31B fits on a 32 GB card.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
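A quick way to see which of these modules are absent before starting the server is to ask the import system for each one without importing it. The sketch below is illustrative only: the import names are an assumed mapping from the package names in the commit (for example pillow to `PIL`, pyzmq to `zmq`), and `missing_modules` is a hypothetical helper, not part of sglang.

```python
from importlib.util import find_spec
from typing import Iterable, List

# Assumed top-level import names for the packages listed in the commit.
BOOTSTRAP_MODULES = [
    "fastapi", "starlette", "openai", "huggingface_hub", "PIL",
    "packaging", "psutil", "scipy", "sentencepiece", "soundfile",
    "zmq", "python_multipart", "uvicorn", "flashinfer",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be located.

    find_spec() on a top-level name only searches for the module; it does
    not execute it, so this is cheap and has no side effects.
    """
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(BOOTSTRAP_MODULES)
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all bootstrap modules found")
```

A module that is found can still fail at import time (for instance on a missing shared library), so an empty result is necessary but not sufficient for a clean startup.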
2026-04-27 | Add python-tilelang optdep for DeepSeek V4 | Will Handley
2026-04-27 | Bump to 0.5.10.post1 | Will Handley
2026-04-14 | Bake --sleep-on-idle into service file | Will Handley
Prevents scheduler busy-wait burning a CPU core at idle.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 | Update to 0.5.10: native Gemma 4 support | Will Handley
2026-04-07 | Template service, per-model configs, sleep-on-idle default | Will Handley
- Replace sglang.service with sglang@.service template unit
- Add per-model config files for Gemma 4 and Qwen 3.5 variants
- Default to --sleep-on-idle to reduce CPU usage when idle
- Update sglang.conf as global config with SGLANG_OPTS/SGLANG_ARGS split
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 | Update sglang.conf with model size guide and parser examples | Will Handley
Sync with sglang-git: add Qwen3.5 dense/MoE model sizes with VRAM estimates, reasoning/tool-call parser options, and usage example.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 | Add dedicated sglang user, systemd hardening, and packaging fixes | Will Handley
- Replace DynamicUser with static sglang user via sysusers.d/tmpfiles.d (persistent HF_HOME at /var/lib/sglang survives restarts)
- Add sglang.env (mode 0600) for credentials, separate from sglang.conf
- Harden systemd service: NoNewPrivileges, PrivateTmp, ProtectSystem/Home
- Bind to 127.0.0.1:30000 instead of 0.0.0.0:8000
- Fix arch: any -> x86_64 (CUDA dependency)
- Fix python-python-multipart -> python-multipart
- Add provides/conflicts with sglang-git
- Move config from /etc/sglang.conf to /etc/sglang/
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 | Add systemd service, config, and hard dependencies | Will Handley
Move orjson, compressed-tensors, gguf, msgspec, einops, xgrammar from optdepends to depends (required for launch_server). Add sglang.service and /etc/sglang.conf for systemd integration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 | Add python-sgl-kernel dep, use python-pytorch virtual package | Will Handley
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 | Add hard dependencies for launch_server | Will Handley
Adds safetensors, triton, partial-json-parser, transformers, and torchvision, all required for sglang.launch_server to import.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 | Add runtime_common dependencies as optdepends | Will Handley
Lists available Arch/AUR packages for the serving stack. The client library works without these; they are needed for the CLI and server.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 | Add missing hard dependencies: pybase64, pydantic | Will Handley
These are eagerly imported via sglang.__init__ -> utils.py import chain.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 | Clean build artifacts before wheel creation | Will Handley
Prevents setuptools build/lib/ directory from leaking into the wheel.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 | Add .gitignore for build artifacts | Will Handley
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 | Initial PKGBUILD for sglang 0.5.9 | Will Handley
Uses pyproject_other.toml for minimal dependencies (client/gateway only). GPU runtime deps available as optional extras.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>