Qwen | Ashish Jaiswal

Obmondo Code: An AI-Powered SRE Alert Diagnosis CLI

A walkthrough of Obmondo Code, an SRE co-pilot CLI built in Go that connects an AI agent to 290+ diagnostic runbooks, SSH command execution, Gitea issue parsing, and time registration, with a strict safety model baked into every layer.

Building a PR Review Bot Under 32K Context: Go AST Diffing, YAML Key Diffing, and Smart Truncation

Artoo, Obmondo’s Mattermost bot, got a PR review engine that runs on a self-hosted Qwen3-14B with a ~32K context window. The key problem: a single Kubernetes PR can contain 10,000+ lines of diff. The solution: Go pre-processing that compresses diffs to ~2KB before the LLM ever sees them, using go/ast for Go files, YAML key-level diffing for Helm/K8s configs, and priority-based file truncation. This post covers why each decision was made and what the LLM still cannot do.

Upgrading vLLM to v0.17.0 with Qwen on RTX 4000 Ada: Every Breaking Change You Will Hit

A hands-on record of every error encountered upgrading from vLLM v0.9.2 to v0.17.0 with Qwen models on an RTX 4000 Ada (20GB VRAM): deprecated CLI flags, CUDA runtime changes, entrypoint conflicts, tool-call parser requirements, unsupported transformers, and how to pick the right quantized model for 20GB VRAM.

vLLM + Qwen2.5-14B on Hetzner RTX 4000 Ada: Making Tool Calling Work

A complete walkthrough of getting vLLM + Qwen2.5-14B-AWQ running on an RTX 4000 Ada with working tool calls for OpenCode: CUDA driver setup, throughput debugging, AWQ Marlin kernels for sm_89, writing a custom Qwen tool parser plugin, and debugging model refusals caused by a reasoning:true misconfiguration.

DeepCoder-14B vs Qwen3-Coder: Which Coding Model Should You Run Locally?

Since we committed to Qwen2.5-Coder-14B on our Hetzner GEX44, two serious challengers appeared: DeepCoder-14B from Agentica/Together AI, and Qwen3-Coder from Alibaba. Here’s an honest comparison for teams running local LLMs on real, budget hardware.