vLLM + Qwen2.5-14B on Hetzner RTX 4000 Ada: Making Tool Calling Work

The complete journey of getting vLLM + Qwen2.5-14B-AWQ running on an RTX 4000 Ada with working tool calls for OpenCode: CUDA driver setup, throughput debugging, AWQ Marlin kernels for sm_89, writing a custom Qwen tool parser plugin, and debugging model refusals caused by a `reasoning: true` misconfiguration.

February 27, 2026 · 10 min · Ashish Jaiswal

Setting Up a Local LLM at Obmondo: From Zero to Qwen2.5-Coder

How we decided to run a local LLM at Obmondo: hardware selection on a budget, understanding quantization and model parameters, comparing Ollama vs. vLLM, and landing on Qwen2.5-Coder for coding assistance.

February 24, 2026 · 13 min · Ashish Jaiswal