vLLM + Qwen2.5-14B on Hetzner RTX 4000 Ada: Making Tool Calling Work

A complete walkthrough of getting vLLM + Qwen2.5-14B-AWQ running on an RTX 4000 Ada with working tool calls for OpenCode: CUDA driver setup, throughput debugging, AWQ Marlin kernels for sm_89, writing a custom Qwen tool parser plugin, and debugging model refusals caused by a reasoning:true misconfiguration.

February 27, 2026 · 10 min · Ashish Jaiswal

DeepCoder-14B vs Qwen3-Coder: Which Coding Model Should You Run Locally?

Since we committed to Qwen2.5-Coder-14B on our Hetzner GEX44, two serious challengers appeared: DeepCoder-14B from Agentica/Together AI, and Qwen3-Coder from Alibaba. Here’s an honest comparison for teams running local LLMs on real, budget hardware.

February 26, 2026 · 9 min · Ashish Jaiswal

Setting Up a Local LLM at Obmondo: From Zero to Qwen2.5-Coder

How we decided to run a local LLM at Obmondo: hardware selection on a budget, understanding quantization and model parameters, comparing Ollama vs vLLM, and landing on Qwen2.5-Coder for coding assistance.

February 24, 2026 · 13 min · Ashish Jaiswal

Setting Up LXD on Ubuntu 24.04 and Running OpenVox Agent with LinuxAid

A hands-on guide to installing LXD on Ubuntu 24.04, spinning up containers, and connecting them to LinuxAid using the OpenVox agent for configuration management.

June 15, 2025 · 5 min · Ashish Jaiswal

What the Heck Is Solana? — From a Clawrouter Perspective

A deep dive into Solana — what makes it fast, why it matters, and how it connects to the Clawrouter project. Written by someone who builds infrastructure for a living.

February 8, 2025 · 6 min · Ashish Jaiswal