Posts

vLLM + Qwen2.5-14B on Hetzner RTX 4000 Ada: Making Tool Calling Work

The full story of getting vLLM and Qwen2.5-14B-AWQ running on an RTX 4000 Ada with working tool calls for OpenCode: CUDA driver setup, throughput debugging, AWQ Marlin kernels for sm_89, writing a custom Qwen tool parser plugin, and debugging model refusals caused by a reasoning:true misconfiguration.

DeepCoder-14B vs Qwen3-Coder: Which Coding Model Should You Run Locally?

Since we committed to Qwen2.5-Coder-14B on our Hetzner GEX44, two serious challengers appeared: DeepCoder-14B from Agentica/Together AI, and Qwen3-Coder from Alibaba. Here’s an honest comparison for teams running local LLMs on real, budget hardware.

Setting Up a Local LLM at Obmondo: From Zero to Qwen2.5-Coder

How we decided to run a local LLM at Obmondo: selecting hardware on a budget, understanding quantization and model parameters, comparing Ollama with vLLM, and landing on Qwen2.5-Coder for coding assistance.

Setting Up LXD on Ubuntu 24.04 and Running OpenVox Agent with LinuxAid

A hands-on guide to installing LXD on Ubuntu 24.04, spinning up containers, and connecting them to LinuxAid using the OpenVox agent for configuration management.

What the Heck Is Solana? — From a Clawrouter Perspective

A deep dive into Solana: what makes it fast, why it matters, and how it connects to the Clawrouter project. Written by someone who builds infrastructure for a living.