AI Speech Technologies

This page is a collection of notes and links related to speech technologies, including Text-to-Speech (TTS), Speech-to-Text (STT), voice synthesis, voice cloning, and other related frippery in the modern space.

Resources

Field Category Date Link Notes
Generative Audio Models 2023 bark

a text-prompted generative audio model

Speech Agents Tools 2025 Asterisk-AI-Voice-Agent

An Asterisk-based AI voice agent project integrating telephony with voice AI workflows.

Speech Recognition Libraries WhisperKit

a Swift package that integrates Whisper with Apple’s Core ML

Models 2024 WhisperLive

a real-time speech-to-text (transcription) system based on Whisper

moonshine

a family of models optimized for fast, accurate automatic speech recognition on resource-constrained devices.

2023 distil-whisper

a distilled version of Whisper that is 6 times faster

2022 whisper.cpp

a C++ implementation of Whisper that can run on consumer hardware

whisper

a general-purpose speech recognition model

Tools 2026 Handy

An offline, cross-platform speech-to-text app built with Tauri that transcribes locally. Uses Whisper and Parakeet models without sending audio to the cloud.

2024 audapolis

an editor for spoken-word audio with automatic transcription

2023 insanely-fast-whisper

An opinionated CLI for audio transcription

Transcription Tools 2026 buzz

A cross-platform Whisper desktop app that works quite well on the Mac (including speaker diarization)

2025 OpenTranscribe

an open-source all-in-one recording transcription and diarization stack

Speech Synthesis Implementations 2026 pocket-tts.c

A minimal, dependency-free C scaffold for Pocket-TTS, aimed at CPU-only TTS. Includes a tiny CLI in the flux2.c style.

Models sopro

a lightweight text-to-speech model

2025 csm

a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs.

Orpheus-TTS

an open-source text-to-speech system built on Llama-3b

chatterbox

a text-to-speech model that can generate expressive speech with a variety of styles and emotions.

2024 ChatTTS

a text-to-speech model designed specifically for dialogue scenarios, with decent prosody

Real-Time-Voice-Cloning

a PyTorch implementation of a voice cloning model

WhisperSpeech

a text-to-speech system built by inverting Whisper

2023 StyleTTS2

A text-to-speech model that supports style diffusion

Resources Training a voice for piper TTS

a detailed walkthrough of how to customize a voice model

Tools 2026 pocket-tts

A lightweight text-to-speech (TTS) application with voice cloning support, designed to run efficiently on CPUs

2025 voice-pro

a toolkit for speech processing and voice cloning

abogen

a tool for generating audiobooks from text using the Kokoro open weights model

edge-tts

a text-to-speech module that leverages the Microsoft Edge TTS API

podcastfy

a tool for generating podcasts from text

2024 OpenVoice

a tool that enables accurate voice cloning with multilingual support and flexible style control.
