vLLM


vLLM is an easy way to spin up large language models locally. It handles a lot of things automatically, including downloading models from Hugging Face.

Embedding server

You can start an embedding server like this:

uv run --python cpython-3.12.11-linux-x86_64-gnu --with vllm -- vllm serve --task embedding Qwen/Qwen3-Embedding-0.6B
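
Once the server is up, it exposes an OpenAI-compatible HTTP API, on port 8000 by default. Here is a minimal sketch of requesting an embedding with curl, assuming the default host and port:

# Request an embedding for one string (assumes the server's default port 8000)
curl http://localhost:8000/v1/embeddings \
  -H 'Content-Type: application/json' \
  -d '{"model": "Qwen/Qwen3-Embedding-0.6B", "input": "Hello, world!"}'

The response is a JSON object whose data array holds one embedding vector per input string.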

Citation

If you find this work useful, please cite it as:
@article{yaltirakli,
  title   = "vLLM",
  author  = "Yaltirakli, Gokberk",
  journal = "gkbrk.com",
  year    = "2025",
  url     = "https://www.gkbrk.com/vllm"
}
