Sonusahani.com
KugelAudio Open: European Open-Source TTS That Surpasses ElevenLabs



European viewers have had every right to complain. While English TTS has surged ahead with commercial giants like ElevenLabs, anyone wanting high-quality voice synthesis in German, Polish, or dozens of other European languages has faced slim pickings in the open-source world. KugelAudio Open is a new entry that tries to close that gap, and it is worth paying attention to.

In human preference testing, it outperformed ElevenLabs with a 78% win rate, which suggests open-source models can now compete with the best proprietary systems. The model is built on Microsoft's VibeVoice architecture and trained on 200,000 hours of speech data covering 23 European languages. The language list is already comprehensive for Europe, and I expect it to grow.

Installing and Running KugelAudio Open

I used an Ubuntu system with an Nvidia RTX A6000 GPU and 48 GB of VRAM. During generation, it consumed close to 19 GB of VRAM. The model is a bit large, but the quality is quite good, so make sure you have enough VRAM to run it.
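
Before launching anything, it can help to check that the GPU has enough free memory. The sketch below is not part of KugelAudio Open; the helper names are hypothetical, and the 19 GiB threshold is simply the peak usage I observed above, so treat it as a rough guide rather than an official requirement. It parses the CSV output of nvidia-smi, which is assumed to be installed alongside the NVIDIA driver.

```python
import subprocess

# Peak usage observed in this article; an assumption, not an official requirement.
REQUIRED_GIB = 19.0


def parse_gpu_memory(csv_text):
    """Parse 'total, used' MiB rows (one per GPU) into (total_mib, used_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        total, used = (int(field.strip()) for field in line.split(","))
        rows.append((total, used))
    return rows


def has_enough_vram(total_mib, used_mib, required_gib=REQUIRED_GIB):
    """Return True if the free memory on one GPU covers the requirement."""
    free_gib = (total_mib - used_mib) / 1024
    return free_gib >= required_gib


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory in MiB; needs the NVIDIA driver tools."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)
```

On a machine with an NVIDIA GPU, calling `query_gpu_memory()` and passing each tuple to `has_enough_vram` gives a quick yes or no per card. The parsing is kept separate from the subprocess call so it can be exercised without a GPU.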


Step-by-step:

  • Clone the KugelAudio Open repo.
  • Install the dependencies.
  • From the root of the repo, start the Gradio demo.
  • Access the local web interface at port 7860 in your browser.
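
Once the demo is started, a small check like the one below can confirm that something is listening on port 7860 before you open the browser. This is a generic TCP probe using only the Python standard library, not anything provided by the KugelAudio Open repo; the function name is hypothetical, and the host is assumed to be the local machine.

```python
import socket


def is_port_open(host="127.0.0.1", port=7860, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (refused, timed out) when nothing answers.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `is_port_open()` returns False right after launch, the model may still be loading or downloading, so it is worth retrying after a short wait.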


First Run and Basic TTS

The first run downloads the model. I started with a Polish tongue twister to stress pronunciation and rhythm. The system synthesized it cleanly as a simple text-to-speech test.


Voice Cloning Results

I uploaded a good-quality reference audio clip, between 5 and 30 seconds long, and generated a clone on a new sentence:

  • The cloning sounded perfect to my ear.
  • I also tried verifying the watermark on generated audio. The checker reported no watermark detected and could not confirm generation. That feature was not working for me.
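
To avoid uploading a reference clip that is too short or too long, the duration can be checked up front. The following is a minimal sketch, assuming the clip is an uncompressed WAV file; it uses Python's standard wave module, the function names are hypothetical, and the 5 to 30 second bounds are the range mentioned above rather than limits read from the model itself.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path, min_seconds=5.0, max_seconds=30.0):
    """Check that a reference clip falls inside the 5 to 30 second range."""
    return min_seconds <= wav_duration_seconds(path) <= max_seconds
```

Compressed formats such as MP3 would need a different reader; this check only covers WAV files that the wave module can open.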


Multilingual Trials in KugelAudio Open

Bulgarian and Reference Voice


I generated Bulgarian from a short text about rose oil and yogurt, using my own voice as the reference. Even though the reference audio quality was not great, it did quite a good job. The claim that it is on par with or even better than ElevenLabs does not seem far-fetched to me, but I will let you be the judge.

Danish, French, Spanish, Italian, and German


  • Danish: sounds good.
  • French: solid.
  • Spanish: pretty good.
  • Italian: not bad at all.
  • German: not bad.

As per the model card, quality varies significantly by language. Spanish, French, English, and German have the strongest representation in the training data. Other languages may have reduced quality, prosody, or vocabulary depending on data availability.

Portuguese With Voice Cloning

I supplied a Portuguese reference audio and generated new Portuguese speech. Voice cloning here is very good, among the best I have seen this year.


Dutch, Russian, Ukrainian, and Czech

I ran Dutch, Russian, Ukrainian, and Czech samples. Results were consistently good and natural across these tests.


Hungarian and Romanian

I initially mixed up a Romanian prompt with Hungarian, then corrected it:

  • Hungarian: sounds great.
  • Romanian: generated cleanly after the correction.


Turkish

I generated Turkish on a short sample. The output held up well.


Emotions, Singing, and Styles

Because it is based on VibeVoice from Microsoft, it can do emotion:

  • Angry line: good.
  • Singing: I tried "Twinkle, Twinkle Little Star." There was some melody in the style of VibeVoice. Not there yet, but not bad at all.
  • Shouting: not bad.
  • Radio announcer or podcaster style: really good.


Final Thoughts

KugelAudio Open brings strong open-source text-to-speech to European languages. In my tests, it delivered high-quality multilingual synthesis and excellent voice cloning, with human preference results indicating a 78% win rate over ElevenLabs. It does need significant GPU memory, and quality varies by language based on training data coverage. Emotions and styles are already effective, while singing shows promise but needs work. Overall, this is one of the best voice cloning experiences I have seen this year.

sonuai.dev

Sonu Sahani

AI Engineer & Full Stack Developer. Passionate about building AI-powered solutions.
