I Ran pi on Gemma 4 26B A4B via llama.cpp. Here Is What Broke First

On April 4, 2026, I ran pi against a local llama.cpp endpoint serving unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL. I wanted a clear answer to one question: has a quantized open model crossed the line from “cute local demo” to “serious enough to matter” for agentic coding? My answer is yes, but only if you are honest about where it breaks. This stack is already good enough to read local instructions, load skills, use tools, follow a plan, write files, run commands, and recover from some failures. It is not good enough to be trusted on precision-heavy work without hard validation. The first things to degrade were not general fluency or vibe. The first things to degrade were exactness, path resolution, layout arithmetic, and long-context responsiveness. ...

April 4, 2026 · 11 min · 2333 words · Pavel Nasovich

TurboQuant Under the Hood: Google's 3-Bit Attack on the LLM Memory Wall

Most AI efficiency launches are either smaller weights, benchmark theater, or a kernel trick dressed up as a new paradigm. TurboQuant is more interesting than that. On March 24, 2026, Google Research published TurboQuant as a practical compression stack for KV caches and vector search. The public claim was blunt: at least 6x KV-cache reduction, up to 8x attention-logit speedup on H100, and no training or fine-tuning required. Underneath the marketing, the real contribution is cleaner and more important: Google found a way to make extreme low-bit vector quantization behave like a systems primitive instead of a fragile research demo. S1 ...

March 26, 2026 · 15 min · 3033 words · Pavel Nasovich

Microsoft's Agentic Modernization Stack: Azure Copilot, GitHub Copilot, and the Control Plane Nobody Is Talking About

Microsoft’s biggest AI play in 2026 is not a new model, a new IDE, or a new assistant. It is an emerging connected modernization control plane: Azure Copilot owns migration and operational intelligence, GitHub Copilot owns application transformation execution, and Operations Center gives enterprises a single surface to observe, steer, and govern the resulting cloud estate. Each layer is individually useful. Together they describe something more interesting: a vertical stack that can convert a multi-year legacy migration programme into a continuous, agentic workflow — with humans kept in the decision seat rather than removed from it. ...

March 24, 2026 · 15 min · 3106 words · Pavel Nasovich

The Real GitHub Copilot Publishing Factory: How I Turned a Hugo Blog into a Repo-Aware Content System

Most “Copilot for blogging” setups are fake. They give the model a nicer prompt, maybe a scaffold script, and then act surprised when the output breaks the repo. That approach fails the moment the repository has real structure: Hugo page bundles instead of one flat posts/ folder; local images and downloadable assets; theme overrides on top of a vendored submodule; deploy config and build rules; old posts with inconsistent front matter styles; and companion materials like quizzes, flashcards, or social copy. I wanted something stricter: a repo where GitHub Copilot could take a scoped topic, research it, scaffold the right bundle, write into the right files, validate the result, and stop before touching generated output. ...

March 24, 2026 · 12 min · 2380 words · Pavel Nasovich

When the Scanner Turned: Inside the Trivy Supply Chain Attack and the Rise of CanisterWorm

In March 2026, attackers turned Aqua Security’s Trivy ecosystem into a credential-harvesting distribution channel. This was not one bug, one poisoned package, or one bad release. It was a chained failure across GitHub Actions trust, secret rotation, mutable tags, runner memory, registry publishing, and npm’s default willingness to execute third-party code. On February 27 and February 28, 2026, the Trivy story started the way a lot of modern software compromises start: not with a zero-day in the scanner, but with automation glued together too loosely around trust. An autonomous agent dubbed hackerbot-claw found a dangerous pull_request_target pattern in Aqua Security’s Trivy repository, exploited it, and stole a privileged aqua-bot token. That first breach was bad enough on its own. The real disaster came after the first incident was supposedly contained. ...

March 24, 2026 · 17 min · 3422 words · Pavel Nasovich

GitHub-Native Autonomous Intake for Copilot: From Structured Issues to Draft PRs

Most autonomous content demos are fake. They show a model taking a prompt and emitting a draft, but they skip the part that actually matters in a working repository: intake structure, validation, repo rules, PR flow, and failure handling. For this blog, I wanted a GitHub-native pipeline where an idea could start as a structured issue, get normalized into a deterministic brief, be assigned to GitHub Copilot, and come back as a draft PR that still respected the repo. ...

March 24, 2026 · 9 min · 1891 words · Pavel Nasovich

The Multi-Call Latency Trap: Why Your Voice Bot Is Probably a Gateway Problem First

Reading time: ~24 min | Audience: staff engineers, AI architects, platform leads, senior ICs inheriting a slow conversational system | Primary goal: stop treating a 10 to 13 second LLM workflow like a prompt problem when it is really a systems problem
Preface: Voice Did Not Create the Problem
Here is the version of this story that gets told too often: “The text bot was fine. Then voice arrived. Now latency matters.” ...

March 23, 2026 · 15 min · 3185 words · Pavel Nasovich

RunAnywhere (YC W26): The Real Bet Behind Fast AI Inference on Apple Silicon

Preface: How I Read This Research Pack
The local research bundle on RunAnywhere is broad, but it is not uniform. Some files are direct performance summaries, some are opinionated strategy memos, and some are clearly derivative study aids built from the same underlying source set. After reading the full bundle, then re-checking the public web evidence on March 12, 2026, my conclusion is narrower and more useful: RunAnywhere is not just a “fastest inference on Apple Silicon” demo. It is trying to become the runtime, packaging, and fleet-management layer for on-device AI, with MetalRT acting as the Apple-Silicon flagship proof point. S1 S2 S3 S4 ...

March 11, 2026 · 16 min · 3211 words · Pavel Nasovich

The Great Immich Migration: From v1.113.0 to v2.5.6

How a “simple” photo library upgrade turned into a deep dive through PostgreSQL version migrations, deprecated vector extensions, and the kind of database surgery you hope to never need.
The Starting Point
My Immich instance had been happily humming along at v1.113.0 for months on my Unraid server. I was running the community all-in-one imagegenius/immich container variant, which bundles the server, microservices, machine learning, and Redis into one image. It used an NVIDIA GPU for CUDA-accelerated ML and was backed by a shared PostgreSQL 14 instance that also served a pile of other workloads. ...

March 9, 2026 · 7 min · 1380 words · Pavel Nasovich

FinOps Toolkit Framework Playbook: Secure Hubs, AI Agents, and a 90-Day Execution Model

Preface: Why This Version Exists
Most FinOps programs fail in the same place: they build good dashboards and still ship bad decisions. The root cause is rarely tooling. It is usually one of these: ingestion is not trustworthy (missing prices, missing months, duplicates after scope changes); ownership is fuzzy (nobody is on the hook for a recommendation becoming a change); or the loop is discontinuous (big cost projects twice a year instead of an operating rhythm). This playbook focuses on the parts that create trust: data contracts, scope design, versioning, and operational gates. ...

March 2, 2026 · 13 min · 2693 words · Pavel Nasovich