Table of Contents
- HeartMuLa: AI Music Generator Outshining Suno in Languages - What sets it apart
- Architecture of HeartMuLa: AI Music Generator Outshining Suno in Languages
- Install HeartMuLa: AI Music Generator Outshining Suno in Languages Locally
- System and prerequisites
- Step-by-step setup
- Performance
- Multilingual results with HeartMuLa: AI Music Generator Outshining Suno in Languages
- Change voice and genre in HeartMuLa: AI Music Generator Outshining Suno in Languages
- Final Thoughts

HeartMuLa: AI Music Generator Outshining Suno in Languages
There is a new music model making serious waves. HeartMuLa is open source under the Apache 2.0 license. Unlike closed-source competitors such as Suno, HeartMuLa gives you the full package: an LLM-based song generator that accepts multimodal inputs, including text descriptions, lyrics, and reference audio. I installed it locally and generated a few songs in different languages.
HeartMuLa: AI Music Generator Outshining Suno in Languages - What sets it apart
- Fine-grained control: specify different styles for different song sections, such as intro, verse, and chorus, using natural-language prompts.
- Benchmarks: according to the model card, it outperforms both Suno v5 and Udio v1.5 on lyric clarity.
- Quality claims: an internal 7B-parameter version reportedly matches Suno's overall quality in musicality, fidelity, and controllability.
- Multilingual support: English, Chinese, Japanese, Korean, and Spanish.
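Per-section control presumably relies on lyrics that are organized into labeled sections. The sketch below assembles such a lyrics text in Python. The bracketed [Intro]/[Verse]/[Chorus] header convention, the Section class, and render_lyrics are my own illustrative assumptions, not HeartMuLa's documented format, and the syntax for attaching a style prompt to each section is deliberately left out because it should come from the repo's documentation.

```python
# Hypothetical sketch: the "[Section]" header convention is an assumption,
# not HeartMuLa's documented lyrics format; check the repo's assets/ examples.
from dataclasses import dataclass, field


@dataclass
class Section:
    name: str  # section label, e.g. "Intro", "Verse", "Chorus"
    lines: list[str] = field(default_factory=list)  # lyric lines (may be empty)


def render_lyrics(sections: list[Section]) -> str:
    """Join sections into one text block, each introduced by a bracketed header."""
    blocks = []
    for section in sections:
        blocks.append("\n".join([f"[{section.name}]", *section.lines]))
    # A blank line separates consecutive sections; end with a trailing newline.
    return "\n\n".join(blocks) + "\n"


song = [
    Section("Intro"),
    Section("Verse", ["City lights are fading", "I keep walking on"]),
    Section("Chorus", ["Sing it loud tonight"]),
]
print(render_lyrics(song), end="")
```

Keeping sections as data rather than a hand-edited string makes it easy to reorder verses or swap a chorus before writing the file the generation script reads.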

Architecture of HeartMuLa: AI Music Generator Outshining Suno in Languages
The architecture is built around four core components that work together:
- HeartCodec: a 12.5 Hz music tokenizer that achieves high-fidelity reconstruction while operating at an extremely low frame rate. This is crucial for efficient autoregressive generation of long-form music.
- HeartCLAP: handles audio-text alignment and creates a unified embedding space for cross-modal understanding.
- HeartTranscriptor: a Whisper-based model fine-tuned specifically for music lyric recognition in real-world scenarios.
- HeartMuLa generator: the LLM-based generator that ties everything together with a three-tier architecture:
  - A global backbone processes text tokens and audio encodings.
  - A local decoder handles music generation using both audio tokens and hidden-state tokens.
  - A detokenizer converts the generated tokens back into waveform audio.
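Why the 12.5 Hz frame rate matters: assuming the backbone advances one step per codec frame, the frame count sets how many autoregressive steps a song needs. The quick calculation below is my own sketch, not code from the repo. It counts frames for an example four-minute song at 12.5 Hz and at a hypothetical 50 Hz tokenizer for comparison. If the codec emits several codebook tokens per frame, the total token count is frames times codebooks; only frames are counted here.

```python
import math


def frames_for(duration_s: float, frame_rate_hz: float) -> int:
    """Codec frames needed to cover duration_s seconds at frame_rate_hz."""
    return math.ceil(duration_s * frame_rate_hz)


song_seconds = 4 * 60  # example: a four-minute song
low = frames_for(song_seconds, 12.5)   # HeartCodec's stated frame rate
high = frames_for(song_seconds, 50.0)  # hypothetical higher-rate tokenizer
print(f"12.5 Hz: {low} frames")  # 12.5 Hz: 3000 frames
print(f"50 Hz:   {high} frames")  # 50 Hz:   12000 frames
print(f"{high // low}x more steps at 50 Hz")  # 4x more steps at 50 Hz
```

A 4x shorter sequence means the backbone attends over far fewer positions, which is what makes long-form generation tractable.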

The model supports multilingual lyrics as noted above and is released under the Apache 2 license in their GitHub repo.

Install HeartMuLa: AI Music Generator Outshining Suno in Languages Locally
System and prerequisites
- OS: Ubuntu
- GPU: one NVIDIA RTX 6000 with 48 GB of VRAM (the card used in my test)
- Python: 3.10 recommended
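Before installing, it can help to confirm the Python version and the available VRAM. The script below is a sketch under two assumptions: the NVIDIA driver provides nvidia-smi, and 22,000 MiB is a reasonable floor given the roughly 20 GB peak I saw (a 24 GB card reports a little over 24,000 MiB). The threshold and function names are mine, not part of HeartMuLa.

```python
# Preflight check before installing HeartMuLa. Assumptions: NVIDIA drivers
# provide nvidia-smi, and ~22,000 MiB is a reasonable floor given the ~20 GB
# peak I observed (a 24 GB card reports roughly 24,000+ MiB).
import shutil
import subprocess
import sys

REQUIRED_PYTHON = (3, 10)
MIN_VRAM_MIB = 22_000


def parse_vram_mib(csv_text: str) -> list[int]:
    """Parse `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits` output."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def query_vram_mib() -> list[int]:
    """Total VRAM per GPU in MiB, or [] if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
        return parse_vram_mib(result.stdout)
    except (OSError, subprocess.CalledProcessError, ValueError):
        return []


def main() -> None:
    major, minor = sys.version_info[:2]
    py_ok = (major, minor) == REQUIRED_PYTHON
    print(f"Python {major}.{minor}: {'ok' if py_ok else '3.10 recommended'}")
    gpus = query_vram_mib()
    if not gpus:
        print("No NVIDIA GPU detected via nvidia-smi")
    for index, mib in enumerate(gpus):
        verdict = "ok" if mib >= MIN_VRAM_MIB else "probably too small"
        print(f"GPU {index}: {mib} MiB ({verdict})")


if __name__ == "__main__":
    main()
```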

Step-by-step setup
- Create a Python 3.10 virtual environment.
- Clone the repo.
- Install the requirements.
- Create a ckpt directory inside the repo.
- Download all three required models from Hugging Face into the ckpt directory. Double-check the repository IDs and the syntax of the Hugging Face download tool, since this step is easy to get wrong.
- Run the music generation script provided in the repo.
- Ensure your lyrics file in the assets directory matches the expected format, even for other languages.
- The script saves the output locally so you can play it back.
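The download step can be scripted with huggingface_hub's snapshot_download, which fetches a whole model repo into a local folder. This is a sketch only: the folder names and repo IDs in CHECKPOINTS are placeholders I made up, not HeartMuLa's real repositories, so copy the exact values from the project README. The shell equivalent for a single model is `huggingface-cli download <repo-id> --local-dir <dir>`.

```python
# Sketch of the checkpoint step. The folder names and repo IDs are placeholders,
# not HeartMuLa's real repositories; copy the exact values from the README.
from pathlib import Path

# Hypothetical mapping: subfolder under ckpt/ -> Hugging Face repo ID.
CHECKPOINTS = {
    "model_1": "<org>/<first-model-repo>",
    "model_2": "<org>/<second-model-repo>",
    "model_3": "<org>/<third-model-repo>",
}


def download_checkpoints(repo_root: Path, dry_run: bool = True) -> list[Path]:
    """Fetch each checkpoint into repo_root/ckpt/<folder>; dry_run only prints."""
    ckpt_dir = repo_root / "ckpt"
    targets = [ckpt_dir / folder for folder in CHECKPOINTS]
    if dry_run:
        for repo_id, target in zip(CHECKPOINTS.values(), targets):
            print(f"would download {repo_id} -> {target}")
        return targets
    # Imported lazily so a dry run works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    ckpt_dir.mkdir(parents=True, exist_ok=True)
    for repo_id, target in zip(CHECKPOINTS.values(), targets):
        snapshot_download(repo_id=repo_id, local_dir=target)
    return targets


if __name__ == "__main__":
    download_checkpoints(Path("."), dry_run=True)  # set dry_run=False to download
```

The dry run prints the plan without touching the filesystem, which is a cheap way to confirm the IDs before pulling several gigabytes.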

Performance
- Generating a full song took about 4 minutes in my run.
- VRAM usage stayed close to 20 GB, so a 24 GB GPU should be enough.
- It generated a complete song with music that matched the lyrics and style.
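To reproduce these numbers on your own hardware, you can wrap the generation call in a small profiler. The sketch below times any zero-argument callable and, when PyTorch sees a CUDA GPU, reports the peak memory reserved by PyTorch's caching allocator. fake_generate is a hypothetical stand-in for whatever the repo's script actually calls. Expect nvidia-smi to show somewhat more than this figure, because it also counts the CUDA context.

```python
# Measure wall time and, when PyTorch sees a CUDA GPU, peak memory reserved by
# PyTorch's allocator. `fake_generate` is a hypothetical stand-in for the call
# the repo's generation script makes.
import time
from typing import Any, Callable, Optional, Tuple


def profile(fn: Callable[[], Any]) -> Tuple[Any, float, Optional[float]]:
    """Run fn(); return (result, elapsed seconds, peak GiB reserved or None)."""
    try:
        import torch
        use_cuda = torch.cuda.is_available()
    except ImportError:
        use_cuda = False
    if use_cuda:
        torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    peak_gib = torch.cuda.max_memory_reserved() / 1024**3 if use_cuda else None
    return result, elapsed, peak_gib


def fake_generate() -> str:
    time.sleep(0.1)  # placeholder for real generation work
    return "output.wav"


if __name__ == "__main__":
    out, seconds, peak = profile(fake_generate)
    peak_text = "n/a (no CUDA)" if peak is None else f"{peak:.1f} GiB"
    print(f"{out}: {seconds:.1f} s, peak {peak_text}")
```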

Multilingual results with HeartMuLa: AI Music Generator Outshining Suno in Languages
I generated songs in English and Spanish, plus a short Chinese example. The first (English) song stood out, and the Spanish and Chinese outputs were also pretty nice.

Change voice and genre in HeartMuLa: AI Music Generator Outshining Suno in Languages
You can change the voice and the genre by editing text.txt. I passed descriptors such as opera, male, vocals, classical, dramatic, and orchestral. The available options are documented on the model card and in the repo. The list is not very long right now, but I expect it to grow. In my opinion, my quick test in this style needs more work.
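If you switch styles often, a small helper keeps the tag file tidy. This sketch assumes the tags go into the text.txt file mentioned above as a single comma-separated line; the separator, the lowercasing, and the assets/text.txt path are my assumptions, so compare against the repo's example file before relying on them.

```python
# Assumptions: tags go in the text.txt file mentioned above as one
# comma-separated line; verify the separator against the repo's example file.
from pathlib import Path


def format_tags(tags: list[str]) -> str:
    """Trim and lowercase tags, drop empties and duplicates, keep first-seen order."""
    cleaned: list[str] = []
    for tag in tags:
        normalized = tag.strip().lower()
        if normalized and normalized not in cleaned:
            cleaned.append(normalized)
    return ",".join(cleaned)


def write_tags(path: Path, tags: list[str]) -> None:
    """Overwrite path with the formatted tag line."""
    path.write_text(format_tags(tags) + "\n", encoding="utf-8")


opera_style = ["Opera", "male", "vocals", "classical", "dramatic", "orchestral"]
print(format_tags(opera_style))  # opera,male,vocals,classical,dramatic,orchestral
# write_tags(Path("assets/text.txt"), opera_style)  # hypothetical path; adjust to the repo
```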

Final Thoughts
HeartMuLa gives you fine-grained control over song sections, strong lyric clarity in benchmarks, multilingual support, and a straightforward local setup. VRAM use was reasonable in my tests, and the provided script made it easy to produce complete songs that follow the lyrics and style prompts. I plan to explore more aspects of this model in detail soon.