vLLM is an easy way to spin up Large Language Models locally. It handles a lot of things automatically, including downloading models from HuggingFace.
Embedding server
You can start an embedding server like this:
uv run --python cpython-3.12.11-linux-x86_64-gnu --with vllm -- vllm serve --task embedding Qwen/Qwen3-Embedding-0.6B
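Once the model finishes loading, the server exposes an OpenAI-compatible HTTP API. As a quick sanity check, and assuming the default host and port (localhost:8000), you can request embeddings with curl:

curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-Embedding-0.6B", "input": "Hello, world"}'

The response is a JSON object whose data array holds one embedding vector per input string.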
Citation
If you find this work useful, please cite it as:
@article{yaltirakli,
  title   = "vLLM",
  author  = "Yaltirakli, Gokberk",
  journal = "gkbrk.com",
  year    = "2025",
  url     = "https://www.gkbrk.com/vllm"
}