There’s a new generative AI kid on the block — and this one’s mimicking the human voice.
ElevenLabs wants to offer text-to-speech and audio-to-audio conversion in any language and any voice, with the full range of emotions. It could be used for everything from creating audiobooks to dubbing movies.
Today it’s announcing a $2m pre-seed round led by Credo, a Czech VC, at a time when public interest in generative AI tools like OpenAI’s ChatGPT and Stability AI’s Stable Diffusion is going through the roof.
“There’s OpenAI and several copycat companies that do similar things with text and images, but an audio solution is missing,” says Mati Staniszewski, ElevenLabs’ CEO and cofounder.
What does ElevenLabs do?
ElevenLabs is an AI voice technology startup, cofounded by two Polish engineers, Mati Staniszewski and Piotr Dabkowski. The solution it’s developed, a deep-learning model for speech synthesis, can convert text to speech in any voice and with any emotion, and currently works in English and Polish. It works on short and long-form narratives, and so could be used by book publishers, journalists or content creators.
ElevenLabs produces artificial voices, but it can also clone existing ones. A book could, for example, be read out in the voice of a celebrity without much involvement from them: the tech can clone a voice from samples as short as five seconds.
In the short term, the startup hopes to get its solution working for all languages. In the longer term, it hopes to instantly convert spoken audio between languages, a capability that could be used, for example, in cinema dubbing, live TV and real-time communication.
The solution has been tested by 250 clients so far — and will be more widely available from February.
🎧 LISTEN: To give Sifted a preview, ElevenLabs has converted this article into audio.
What’s the market like?
There are several startups operating in voice AI, including Ukrainian Respeecher, Canadian Resemble AI and American WellSaid Labs, to name just a few. But none of them can handle longer written forms the way ElevenLabs can, says Staniszewski.
For Staniszewski, the most serious competition is posed by big tech companies and scaleups with their own research departments. “We’re the most afraid of the companies that focus on research — like OpenAI. They’ll do a lot of research in voice and develop a model,” he says.
The big tech firms are moving fast: in the first three weeks of 2023, Microsoft announced a new tool that can clone a person’s voice and tone from a three-second snippet of audio. Apple has also announced a new digital narration project to create audiobooks.
Who has invested?
- Credo, a Czech VC
- Concept Ventures, a UK pre-seed fund
- Angel investors, including Peter Czaban, an ex-founder at blockchain platform and cryptocurrency Polkadot; Bartek Pucek, author of a Polish tech newsletter; and Carles Reina, former VP of Sonantic, a voice tech startup.
What’s the future for ElevenLabs?
With the fresh funding, ElevenLabs wants to scale up its solution globally, so it’s available in all languages. It also wants to start research into automatic dubbing from one language to another with the same voice. It aims to release this AI dubbing tool later this year.
In the next few months, Staniszewski plans to double his team, which is currently five people strong.
For ElevenLabs, the timing of its first round is a blessing and a curse.
On the one hand, generative AI is undoubtedly going to be the hottest tech area in 2023 — and it’ll attract the attention of clients, business partners and investors.
On the other hand, big tech companies, with their endless resources, are already putting a lot of time and money into developing their own generative AI solutions or acquiring existing ventures (see Microsoft’s reported bid to invest $10bn in OpenAI). If startups like ElevenLabs want to break through in this crowded market, they will have to move fast.
The pitch deck the founders used
Sifted’s been given access to ElevenLabs’ pitch deck since its recent raise.
Zosia Wanat is Sifted’s central and eastern Europe reporter, based in Warsaw. She tweets from @zosiawanat.