Copilot Cowork Under the Hood: Frontier, Work IQ, and the OneDrive Skills Model

On March 9, 2026, Microsoft introduced Copilot Cowork as the move from “Copilot can answer” to “Copilot can carry work forward.” On March 30, 2026, Microsoft announced that Cowork was available through the Frontier program. As of the Microsoft Learn and Support documentation updated in late March and early April 2026, Cowork is still explicitly documented as a preview/prerelease capability, gated through Frontier and evolving. S1 S2 S3 S5 S8 That date sequence matters, because Cowork is not just another prompt box. It is Microsoft 365 Copilot’s first serious “plan to action” surface for long-running work: you describe an outcome, Cowork turns it into a plan, grounds it in your tenant context, loads skills, asks for approvals on sensitive steps, and keeps state in a visible task view while it works. S1 S2 S4 S8 S10 ...

April 8, 2026 · 23 min · 4754 words · Pavel Nasovich

PlugMem Under the Hood: Why Knowledge-Centric Memory Changes LLM Agents

Most agent-memory systems still do the lazy thing: store raw interaction history, retrieve a few chunks, and hope the base model compresses the mess at inference time. PlugMem starts from a much stronger assumption. The useful part of experience is sparse, structured, and should be compiled before retrieval. That is why this paper matters. PlugMem was submitted to arXiv on February 6, 2026, published on the Microsoft Research site on March 6, 2026, and the PDF metadata marks it as an ICML 2026 proceedings paper. As of April 5, 2026, the code and benchmark artifacts are public. The claim is ambitious but concrete: a single task-agnostic memory module, attached unchanged to very different agents, can beat both raw-memory baselines and several task-specific memory systems while using much less agent-side context. S1 S2 S3 S4 ...

April 5, 2026 · 16 min · 3405 words · Pavel Nasovich

I Ran pi on Gemma 4 26B A4B via llama.cpp. Here Is What Broke First

On April 4, 2026, I ran pi against a local llama.cpp endpoint serving unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL. I wanted a clear answer to one question: has a quantized open model crossed the line from “cute local demo” to “serious enough to matter” for agentic coding? My answer is yes, but only if you are honest about where it breaks. This stack is already good enough to read local instructions, load skills, use tools, follow a plan, write files, run commands, and recover from some failures. It is not good enough to be trusted on precision-heavy work without hard validation. The first things to degrade were not general fluency or vibe. The first things to degrade were exactness, path resolution, layout arithmetic, and long-context responsiveness. ...

April 4, 2026 · 11 min · 2333 words · Pavel Nasovich

TurboQuant Under the Hood: Google's 3-Bit Attack on the LLM Memory Wall

Most AI efficiency launches are either smaller weights, benchmark theater, or a kernel trick dressed up as a new paradigm. TurboQuant is more interesting than that. On March 24, 2026, Google Research published TurboQuant as a practical compression stack for KV caches and vector search. The public claim was blunt: at least 6x KV-cache reduction, up to 8x attention-logit speedup on H100, and no training or fine-tuning required. Underneath the marketing, the real contribution is cleaner and more important: Google found a way to make extreme low-bit vector quantization behave like a systems primitive instead of a fragile research demo. S1 ...

March 26, 2026 · 15 min · 3033 words · Pavel Nasovich

Microsoft's Agentic Modernization Stack: Azure Copilot, GitHub Copilot, and the Control Plane Nobody Is Talking About

Microsoft’s biggest AI play in 2026 is not a new model, a new IDE, or a new assistant. It is an emerging connected modernization control plane: Azure Copilot owns migration and operational intelligence, GitHub Copilot owns application transformation execution, and Operations Center gives enterprises a single surface to observe, steer, and govern the resulting cloud estate. Each layer is individually useful. Together they describe something more interesting: a vertical stack that can convert a multi-year legacy migration programme into a continuous, agentic workflow — with humans kept in the decision seat rather than removed from it. ...

March 24, 2026 · 15 min · 3106 words · Pavel Nasovich

The Real GitHub Copilot Publishing Factory: How I Turned a Hugo Blog into a Repo-Aware Content System

Most “Copilot for blogging” setups are fake. They give the model a nicer prompt, maybe a scaffold script, and then act surprised when the output breaks the repo. That approach fails the moment the repository has real structure: Hugo page bundles instead of one flat posts/ folder, local images and downloadable assets, theme overrides on top of a vendored submodule, deploy config and build rules, old posts with inconsistent front matter styles, and companion materials like quizzes, flashcards, or social copy. I wanted something stricter: a repo where GitHub Copilot could take a scoped topic, research it, scaffold the right bundle, write into the right files, validate the result, and stop before touching generated output. ...

March 24, 2026 · 12 min · 2380 words · Pavel Nasovich

When the Scanner Turned: Inside the Trivy Supply Chain Attack and the Rise of CanisterWorm

In March 2026, attackers turned Aqua Security’s Trivy ecosystem into a credential-harvesting distribution channel. This was not one bug, one poisoned package, or one bad release. It was a chained failure across GitHub Actions trust, secret rotation, mutable tags, runner memory, registry publishing, and npm’s default willingness to execute third-party code. On February 27 and February 28, 2026, the Trivy story started the way a lot of modern software compromises start: not with a zero-day in the scanner, but with automation glued together too loosely around trust. An autonomous agent dubbed hackerbot-claw found a dangerous pull_request_target pattern in Aqua Security’s Trivy repository, exploited it, and stole a privileged aqua-bot token. That first breach was bad enough on its own. The real disaster came after the first incident was supposedly contained. ...

March 24, 2026 · 17 min · 3422 words · Pavel Nasovich

GitHub-Native Autonomous Intake for Copilot: From Structured Issues to Draft PRs

Most autonomous content demos are fake. They show a model taking a prompt and emitting a draft, but they skip the part that actually matters in a working repository: intake structure, validation, repo rules, PR flow, and failure handling. For this blog, I wanted a GitHub-native pipeline where an idea could start as a structured issue, get normalized into a deterministic brief, be assigned to GitHub Copilot, and come back as a draft PR that still respected the repo. ...

March 24, 2026 · 9 min · 1891 words · Pavel Nasovich

The Multi-Call Latency Trap: Why Your Voice Bot Is Probably a Gateway Problem First

Reading time: ~24 min | Audience: staff engineers, AI architects, platform leads, senior ICs inheriting a slow conversational system | Primary goal: stop treating a 10 to 13 second LLM workflow like a prompt problem when it is really a systems problem

Preface: Voice Did Not Create the Problem
Here is the version of this story that gets told too often: “The text bot was fine. Then voice arrived. Now latency matters.” ...

March 23, 2026 · 15 min · 3185 words · Pavel Nasovich

RunAnywhere (YC W26): The Real Bet Behind Fast AI Inference on Apple Silicon

Preface: How I Read This Research Pack
The local research bundle on RunAnywhere is broad, but it is not uniform. Some files are direct performance summaries, some are opinionated strategy memos, and some are clearly derivative study aids built from the same underlying source set. After reading the full bundle, then re-checking the public web evidence on March 12, 2026, my conclusion is narrower and more useful: RunAnywhere is not just a “fastest inference on Apple Silicon” demo. It is trying to become the runtime, packaging, and fleet-management layer for on-device AI, with MetalRT acting as the Apple-Silicon flagship proof point. S1 S2 S3 S4 ...

March 11, 2026 · 16 min · 3211 words · Pavel Nasovich