I Ran pi on Qwen3.6 35B A3B via llama-server. It Built the Deck and QA'd Itself

On April 16, 2026, I replaced my earlier local Gemma run with a heavier stack: `llama-server -hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q8_K_XL --jinja`. The result was not "a slightly better chat model"; it was a qualitatively different local agent loop. This time the agent did not stop at repo reconnaissance, rough planning, or code scaffolding. It wrote a research narrative, generated a slide deck module by module, rebuilt after failures, converted the deck to PDF, rasterized the slides for visual inspection, read the resulting PNGs, ran text extraction against the .pptx, checked for placeholder residue, and then closed the loop with targeted repairs. That is not AGI, but it is no longer a toy local demo either. ...
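One of the QA steps above, text extraction against the .pptx plus a placeholder-residue check, can be sketched with the Python standard library alone, since a .pptx is a zip archive of DrawingML XML parts. This is my own minimal sketch, not the post's code: the function name and the residue patterns in `PLACEHOLDER_RE` are assumptions you would tune to your deck templates.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML text runs live in <a:t> elements inside each slide's XML part.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"

# Hypothetical residue patterns -- adjust for your own templates.
PLACEHOLDER_RE = re.compile(r"\{\{.*?\}\}|TODO|TBD|Lorem ipsum", re.IGNORECASE)

def find_placeholder_residue(pptx):
    """Return {slide_part: [text runs]} for runs that look unfinished.

    `pptx` may be a path or a file-like object, as zipfile accepts both.
    """
    hits = {}
    with zipfile.ZipFile(pptx) as zf:
        for name in zf.namelist():
            if not (name.startswith("ppt/slides/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(zf.read(name))
            runs = [t.text for t in root.iter(f"{A_NS}t") if t.text]
            bad = [r for r in runs if PLACEHOLDER_RE.search(r)]
            if bad:
                hits[name] = bad
    return hits
```

An empty result is the pass condition; a non-empty mapping tells the agent exactly which slide parts still need targeted repairs.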

April 16, 2026 · 15 min · 3066 words · Pavel Nasovich

I Ran pi on Gemma 4 26B A4B via llama.cpp. Here Is What Broke First

On April 4, 2026, I ran pi against a local llama.cpp endpoint serving unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL. I wanted a clear answer to one question: has a quantized open model crossed the line from “cute local demo” to “serious enough to matter” for agentic coding? My answer is yes, but only if you are honest about where it breaks. This stack is already good enough to read local instructions, load skills, use tools, follow a plan, write files, run commands, and recover from some failures. It is not good enough to be trusted on precision-heavy work without hard validation. The first things to degrade were not general fluency or vibe. The first things to degrade were exactness, path resolution, layout arithmetic, and long-context responsiveness. ...
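For anyone reproducing the setup, the stack above comes down to pointing a client at llama.cpp's llama-server, which serves an OpenAI-compatible chat endpoint (port 8080 by default). A minimal stdlib sketch, assuming the default port and no API key; `build_request` and `chat` are hypothetical helper names, not part of pi or llama.cpp:

```python
import json
import urllib.request

# llama-server's OpenAI-compatible chat endpoint, default host/port assumed.
ENDPOINT = "http://127.0.0.1:8080/v1/chat/completions"

def build_request(prompt, temperature=0.2):
    """Build the JSON payload for /v1/chat/completions."""
    return {
        # With a single loaded GGUF, llama-server does not route on this field.
        "model": "local",
        "temperature": temperature,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt):
    """Send one user turn and return the assistant's reply text."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the endpoint speaks the OpenAI wire format, the same client works whether the backend is this quantized Gemma or any other GGUF you load.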

April 4, 2026 · 11 min · 2333 words · Pavel Nasovich