Voxtral, FluidAudio, and Parakeet: A Deep Technical Map of the Modern Local Speech Stack

The speech stack has split into three very different shapes. One shape is a model family: Voxtral. It is Mistral’s audio line, with text-to-speech, speech-to-text, realtime transcription, and API-centered voice workflows. Another shape is a native Apple SDK: FluidAudio. It is not one model. It is a Swift/CoreML pipeline for local transcription, voice activity detection, diarization, and TTS on macOS and iOS. The third shape is a recognition engine: Parakeet. It is NVIDIA’s ASR family, built around FastConformer/TDT variants, optimized for very fast and accurate speech-to-text. ...

May 29, 2026 · 25 min · 5175 words · Pavel Nasovich

RunAnywhere (YC W26): The Real Bet Behind Fast AI Inference on Apple Silicon

Preface: How I Read This Research Pack. The local research bundle on RunAnywhere is broad, but it is not uniform. Some files are direct performance summaries, some are opinionated strategy memos, and some are clearly derivative study aids built from the same underlying source set. After reading the full bundle, then re-checking the public web evidence on March 12, 2026, my conclusion is narrower and more useful: RunAnywhere is not just a “fastest inference on Apple Silicon” demo. It is trying to become the runtime, packaging, and fleet-management layer for on-device AI, with MetalRT acting as the Apple-Silicon flagship proof point. S1 S2 S3 S4 ...

March 11, 2026 · 16 min · 3211 words · Pavel Nasovich