Qwen | Ashish Jaiswal

vLLM + Qwen2.5-14B on Hetzner RTX 4000 Ada: Making Tool Calling Work

A complete walkthrough of getting vLLM + Qwen2.5-14B-AWQ running on an RTX 4000 Ada with working tool calls for OpenCode: CUDA driver setup, throughput debugging, AWQ Marlin kernels for sm_89, a custom Qwen tool parser plugin, and model refusals traced to a reasoning:true misconfiguration.

DeepCoder-14B vs Qwen3-Coder: Which Coding Model Should You Run Locally?

Since we committed to Qwen2.5-Coder-14B on our Hetzner GEX44, two serious challengers appeared: DeepCoder-14B from Agentica/Together AI, and Qwen3-Coder from Alibaba. Here’s an honest comparison for teams running local LLMs on real, budget hardware.