II-Search-4B: A Love Letter to Small Models (Or How I Learned to Stop Worrying and Embrace 4B Parameters)

Okay so. Picture this. It’s August 2025, I’m drowning in API costs from o3, my idea’s runway is… let’s not talk about it… and I stumble across this random Hugging Face model called II-Search-4B. Four billion parameters. FOUR. In an era where everyone’s flexing their 405B models like it’s a #### measuring contest. My first thought? “This is gonna s*ck.” My second thought, three weeks later? “Holy crap this actually works.” ...

August 8, 2025 · 8 min · 1574 words · Pavel Nasovich

Qwen3-30B-A3B Deep Dive: How 128 Experts Achieve Frontier Performance at 10% Active Parameters

Qwen3-30B-A3B represents a paradigm shift in large language model efficiency, achieving flagship-level performance with only 3.3 billion active parameters from a 30.5 billion total parameter pool. This Mixture-of-Experts (MoE) model, released by Alibaba’s Qwen team, demonstrates that intelligent parameter activation can outperform brute-force scaling, scoring 91.0 on ArenaHard while using 10x fewer active parameters than comparable dense models. The model’s hybrid thinking architecture enables controllable reasoning depth, supporting both rapid responses and deep analytical tasks through dynamic computational allocation. ...
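To make the teaser’s “intelligent parameter activation” concrete, here is a minimal top-k expert-routing sketch in PyTorch. It assumes 8-of-128 expert routing and an illustrative hidden size; the function names and dimensions are hypothetical, not Qwen’s actual implementation, but the mechanism is why only ~10% of parameters are active per token.

```python
import torch
import torch.nn.functional as F

# Illustrative numbers: 128 experts with top-8 routing (assumed),
# and a made-up hidden size. Not Qwen3-30B-A3B's real config.
NUM_EXPERTS = 128
TOP_K = 8
HIDDEN = 2048


def route(hidden_states: torch.Tensor, gate: torch.nn.Linear):
    """Pick TOP_K experts per token. Only the selected experts run,
    so the vast majority of expert parameters stay idle per token."""
    logits = gate(hidden_states)                          # [tokens, NUM_EXPERTS]
    probs = F.softmax(logits, dim=-1)
    weights, expert_ids = probs.topk(TOP_K, dim=-1)       # [tokens, TOP_K]
    weights = weights / weights.sum(dim=-1, keepdim=True) # renormalize gate weights
    return weights, expert_ids


tokens = torch.randn(4, HIDDEN)
gate = torch.nn.Linear(HIDDEN, NUM_EXPERTS, bias=False)
w, ids = route(tokens, gate)
print(ids.shape)  # torch.Size([4, 8]) -> 8 of 128 experts active per token
```

With 8 of 128 experts firing per token, the active-parameter fraction lands in the same ballpark as the post’s 3.3B-of-30.5B figure.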

August 7, 2025 · 7 min · 1403 words · Pavel Nasovich