Table of Contents
- HeartMuLa: AI Music Generator Outshining Suno in Languages - What sets it apart
- Architecture of HeartMuLa: AI Music Generator Outshining Suno in Languages
- Install HeartMuLa: AI Music Generator Outshining Suno in Languages Locally
- System and prerequisites
- Step-by-step setup
- Performance
- Multilingual results with HeartMuLa: AI Music Generator Outshining Suno in Languages
- Change voice and genre in HeartMuLa: AI Music Generator Outshining Suno in Languages
- Final Thoughts

HeartMuLa: AI Music Generator Outshining Suno in Languages
HeartMuLa is a new open-source music generation model released under the Apache 2.0 license. Unlike closed-source competitors such as Suno, it is an LLM-based song generator that accepts multimodal inputs, including text descriptions, lyrics, and reference audio. I installed it locally and generated a few songs in different languages.
HeartMuLa: AI Music Generator Outshining Suno in Languages - What sets it apart
- Fine-grained control: specify different styles for different song sections, such as intro, verse, and chorus, using natural language prompts.
- Benchmarks: according to the model card, it outperforms both Suno version 5 and Udio version 1.5 on lyric clarity.
- Quality claims: the internal 7-billion-parameter version reportedly matches Suno's overall quality in musicality, fidelity, and controllability.
- Multilingual support: English, Chinese, Japanese, Korean, and Spanish.

Architecture of HeartMuLa: AI Music Generator Outshining Suno in Languages
The architecture is designed around four core components working in harmony.
- HeartCodec: a 12.5 Hz music tokenizer that achieves high-fidelity reconstruction while operating at an extremely low frame rate. This is crucial for efficient autoregressive generation of long-form music.
- HeartCLAP: handles audio-text alignment and creates a unified embedding space for cross-modal understanding.
- HeartTranscriptor: a Whisper-based model fine-tuned specifically for music lyric recognition in real-world scenarios.
- HeartMuLa: the LLM-based generator that ties everything together with a three-tier architecture:
  - A global backbone processes text tokens and audio encodings.
  - A local decoder handles the music generation with both audio and hidden state tokens.
  - A detokenizer converts everything back to waveform audio.
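
To see why the low frame rate matters for autoregressive generation, the back-of-the-envelope sketch below counts how many codec frames a song of a given length occupies. The 12.5 Hz figure comes from the description above; the 50 Hz comparison rate and the function name are illustrative assumptions, not values from the model card, and the count ignores how many tokens the codec emits per frame.

```python
import math

def codec_frames(duration_s: float, frame_rate_hz: float = 12.5) -> int:
    """Number of codec frames needed to cover duration_s seconds of audio."""
    return math.ceil(duration_s * frame_rate_hz)

song_seconds = 4 * 60  # a four-minute song

low_rate = codec_frames(song_seconds)         # 12.5 Hz, as described above
high_rate = codec_frames(song_seconds, 50.0)  # hypothetical 50 Hz tokenizer

print(f"12.5 Hz: {low_rate} frames, 50 Hz: {high_rate} frames")
```

A lower frame rate means fewer sequential steps per song, which is what makes long-form generation more tractable.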

As noted above, the model supports multilingual lyrics, and it is released under the Apache 2.0 license in the project's GitHub repo.

Install HeartMuLa: AI Music Generator Outshining Suno in Languages Locally
System and prerequisites
- OS: Ubuntu
- GPU: a single NVIDIA RTX 6000 with 48 GB of VRAM in my test
- Python: 3.10 recommended

Step-by-step setup
- Create a Python 3.10 virtual environment.
- Clone the repo.
- Install the requirements.
- Create a ckpt directory inside the repo.
- Download all three required models from Hugging Face into that directory, taking care to use the correct syntax for the Hugging Face download tool.
- Run the music generation script provided in the repo.
- Make sure the lyrics file in the assets directory follows the expected format, including when the lyrics are in a language other than English.
- The script saves the output locally so you can play it.
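
Before launching the generation script, it can help to confirm that the pieces from the steps above are in place. The following is a minimal preflight sketch, not part of the repo: the function name and the default lyrics path `assets/lyrics.txt` are assumptions for illustration, so adjust them to match what the repo actually ships.

```python
import sys
from pathlib import Path

def preflight(repo_dir, lyrics_rel="assets/lyrics.txt", python_version=None):
    """Return a list of problems found; an empty list means the setup looks ready."""
    problems = []

    # Python 3.10 is the recommended version for this setup.
    version = tuple(python_version or sys.version_info[:2])
    if version != (3, 10):
        problems.append(f"Python {version[0]}.{version[1]} found; 3.10 is recommended")

    # The downloaded models are expected inside <repo>/ckpt.
    repo = Path(repo_dir)
    ckpt = repo / "ckpt"
    if not ckpt.is_dir():
        problems.append(f"missing checkpoint directory: {ckpt}")
    elif not any(ckpt.iterdir()):
        problems.append(f"checkpoint directory is empty: {ckpt}")

    # The lyrics file name is hypothetical; only its location (assets) is from the article.
    lyrics = repo / lyrics_rel
    if not lyrics.is_file():
        problems.append(f"missing lyrics file: {lyrics}")
    elif not lyrics.read_text(encoding="utf-8").strip():
        problems.append(f"lyrics file is empty: {lyrics}")

    return problems

if __name__ == "__main__":
    for problem in preflight("."):
        print("warning:", problem)
```

Run it from the repo root; no output means none of these checks failed. It does not validate the lyrics format itself.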

Performance
- The estimated generation time was around 4 minutes for a full song in my run.
- VRAM consumption stayed close to 20 GB, so a 24 GB GPU should be sufficient.
- It generated a complete song with music matching the lyrics and style.
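
As a rough guide to which cards leave room for this workload, the sketch below subtracts the roughly 20 GB peak observed in my run from a card's total VRAM. The function name and the list of card sizes are illustrative assumptions, and actual usage will vary with song length and settings.

```python
def vram_headroom_gb(total_gb: float, peak_gb: float = 20.0) -> float:
    """Remaining VRAM in GB after the observed peak usage."""
    return total_gb - peak_gb

for card_gb in (16, 24, 48):
    headroom = vram_headroom_gb(card_gb)
    status = "fits" if headroom > 0 else "too small"
    print(f"{card_gb} GB card: {headroom:+.0f} GB headroom ({status})")
```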

Multilingual results with HeartMuLa: AI Music Generator Outshining Suno in Languages
I generated songs in English and Spanish, plus a short example in Chinese. The first song stood out, and the Spanish and Chinese outputs were also quite good.

Change voice and genre in HeartMuLa: AI Music Generator Outshining Suno in Languages
You can change the voice and the genre by editing text.txt. I passed descriptors like opera, male, vocals, classical, dramatic, orchestral. The available options are documented on the model card and in the repo. The list is not very long yet, though it will likely grow. In my opinion, the result of my quick test in this style still needs work.
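
If you switch styles often, a small helper can build the descriptor line for you. This is a sketch under one assumption: that the file holds a single comma-separated line of descriptors, as in my example above. Check the repo for the exact format it expects; the helper names here are hypothetical.

```python
from pathlib import Path

def format_tags(tags):
    """Join descriptors into one comma-separated line, dropping blanks."""
    cleaned = [tag.strip() for tag in tags if tag.strip()]
    return ",".join(cleaned)

def write_tags(path, tags):
    """Write the descriptor line to path and return the line that was written."""
    line = format_tags(tags)
    Path(path).write_text(line + "\n", encoding="utf-8")
    return line

descriptors = ["opera", "male", "vocals", "classical", "dramatic", "orchestral"]
print(format_tags(descriptors))

# To update the file used in this walkthrough:
# write_tags("text.txt", descriptors)
```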

Final Thoughts
HeartMuLa gives you fine-grained control over sections, strong lyric clarity in benchmarks, multilingual support, and a straightforward local setup. VRAM use was reasonable in my tests, and the provided script made it easy to produce complete songs that follow the lyrics and style prompts. I plan to explore more aspects of this model in detail soon.




