May 1st 2025 · 2 min read · #ai #cloning #speech #stt #synthesis #tts #voice #whisper

AI Speech Technologies

This page is a collection of notes and links related to AI speech technologies, including Text-to-Speech (TTS), Speech-to-Text (STT), voice synthesis, voice cloning, and other related frippery in the modern AI space.

Resources

Field	Category	Date	Link	Notes
Generative Audio	Models	2023	bark	a text-prompted generative audio model
Speech Agents	Tools	2026	RCLI	an on-device Apple Silicon voice agent for macOS that combines STT, local LLM inference, TTS, 38 local actions, and document RAG in a low-latency TUI/CLI.
			dograh	Open source voice agent platform.
			speech-to-speech	an open-source local voice agent pipeline combining VAD, STT, LLM, and TTS
		2025	Asterisk-AI-Voice-Agent	An Asterisk-based AI voice agent project integrating telephony with voice AI workflows.
Speech Recognition	Libraries	2025	WhisperKit	a Swift package that integrates Whisper with Apple’s CoreML
	Models	2026	transcribe.cpp	ggml speech-to-text inference for 16+ model families
		2026	parakeet.cpp	a ggml-based C++ inference port of NVIDIA NeMo Parakeet ASR models with streaming support
		2024	WhisperLive	a real-time speech-to-text system based on Whisper
		2024	moonshine	a family of models optimized for fast and accurate automatic speech recognition on resource-constrained devices. The `micro` subdirectory contains the very-low-latency speech stack.
		2023	distil-whisper	a distilled version of whisper that is 6 times faster
		2022	whisper.cpp	a C++ implementation of whisper that can run on consumer hardware
		2022	whisper	a general purpose speech recognition model
	Tools	2026	TypeWhisper	A macOS dictation and transcription app that can use Apple’s speech stack as well as other local on-device engines, with optional prompt-driven post-processing.
			Handy	An offline, cross-platform speech-to-text app built with Tauri that transcribes locally. Uses Whisper and Parakeet models without sending audio to the cloud.
			Bootlegger	FastAPI server for Moonshine STT
			Ghost Pepper	a 100% local macOS hold-to-talk speech-to-text menu bar app with WhisperKit transcription and local LLM cleanup.
			dictate	a Go-based local voice-to-text tool for Linux terminals that streams whisper.cpp transcription to stdout, files, or keystroke injection for dictating into focused terminal apps.
		2024	audapolis	an editor for spoken-word audio with automatic transcription
		2023	insanely-fast-whisper	An opinionated CLI for audio transcription
	Transcription Tools	2026	buzz	A cross-platform Whisper desktop app that works quite well on the Mac (including speaker diarization)
	Transcription Tools	2025	OpenTranscribe	an open-source all-in-one recording transcription and diarization stack
Speech Synthesis	Implementations	2026	pocket-tts.c	A minimal, dependency-free C scaffold for Pocket-TTS, aimed at CPU-only TTS. Includes a tiny CLI in the flux2.c style.
	Models		sopro	a lightweight text-to-speech model
			Inflect-Micro-v2	a lightweight text-to-speech model
		2025	csm	a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs.
			Orpheus-TTS	an open-source text-to-speech system built on Llama-3b
			chatterbox	a text-to-speech model that can generate expressive speech with a variety of styles and emotions.
		2024	ChatTTS	a text-to-speech model designed specifically for dialogue scenarios, with decent prosody
			Real-Time-Voice-Cloning	a PyTorch implementation of a voice cloning model
			WhisperSpeech	a text-to-speech system built by inverting Whisper
		2023	StyleTTS2	A text-to-speech model that supports style diffusion
			silero-models	Multi-language neural text-to-speech models
	Resources		Training a voice for piper TTS	a detailed walkthrough of how to customize a voice model
	Tools	2026	Voicebox	an open-source voice cloning studio with DAW-like features, local-first voice synthesis powered by Qwen3-TTS, multi-track timeline editor, and REST API
			pocket-tts	A lightweight text-to-speech (TTS) application with voice cloning support, designed to run efficiently on CPUs
			inflect-speechd	local text-to-speech server around Inflect Nano v2 with Speech Dispatcher integration
		2025	voice-pro	a toolkit for doing speech processing and voice cloning
			abogen	a tool for generating audiobooks from text using the Kokoro open weights model
			edge-tts	a text-to-speech module that leverages the Microsoft Edge TTS API
			podcastfy	a tool for generating podcasts from text
		2024	OpenVoice	a tool that enables accurate voice cloning with multi-lingual support and flexible style control.

