HeartMuLa: AI Music Generator Outshining Suno in Languages

There is a new music model making serious waves. HeartMuLa is open source under the Apache 2.0 license. Unlike closed-source competitors such as Suno, HeartMuLa gives you the full package: an LLM-based song generator that accepts multimodal inputs, including text descriptions, lyrics, and reference audio. I installed it locally and generated a few songs in different languages.

What sets it apart

Fine-grained control: specify different styles for different song sections, such as intro, verse, and chorus, using natural language prompts.
Benchmarks: according to the model card, it outperforms both Suno v5 and Udio v1.5 on lyric clarity.
Quality claims: the internal 7B-parameter version reportedly matches Suno's overall quality in musicality, fidelity, and controllability.
Multilingual support: English, Chinese, Japanese, Korean, and Spanish.

Architecture

The architecture is designed around four core components working in harmony.

HeartCodec: a 12.5 Hz music tokenizer that achieves high-fidelity reconstruction while operating at an extremely low frame rate. This is crucial for efficient autoregressive generation of long-form music.
HeartCLAP: handles audio-text alignment and creates a unified embedding space for cross-modal understanding.
HeartTranscriptor: a Whisper-based model fine-tuned specifically for music lyric recognition in real-world scenarios.
HeartMuLa: the LLM-based generator that ties everything together with a three-tier architecture:
- A global backbone processes text tokens and audio encodings.
- A local decoder handles the music generation with both audio and hidden state tokens.
- A detokenizer converts everything back to waveform audio.
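
To see why the low frame rate matters for autoregressive generation, here is a rough back-of-the-envelope sketch. Only the 12.5 Hz figure comes from the description above; the 4-minute song length and the 50 Hz comparison rate are assumptions picked for illustration, and each frame may carry more than one token depending on the codec design.

```python
# Rough sequence-length arithmetic for a low-frame-rate music tokenizer.
# Only the 12.5 Hz rate comes from the article; the 50 Hz comparison rate
# and the 240-second song length are illustrative assumptions.

def frames_for(duration_s: float, frame_rate_hz: float) -> int:
    """Number of tokenizer frames needed to cover duration_s seconds."""
    return round(duration_s * frame_rate_hz)

SONG_SECONDS = 240          # assumed 4-minute song
LOW_RATE_HZ = 12.5          # frame rate stated for the codec
COMPARISON_RATE_HZ = 50.0   # hypothetical higher-rate codec

low = frames_for(SONG_SECONDS, LOW_RATE_HZ)
high = frames_for(SONG_SECONDS, COMPARISON_RATE_HZ)

print(f"{LOW_RATE_HZ} Hz: {low} frames")
print(f"{COMPARISON_RATE_HZ} Hz: {high} frames")
print(f"sequence is {high / low:.0f}x shorter at the lower rate")
```

Under these assumptions the generator has to predict a few thousand frames for a full song instead of well over ten thousand, which is what keeps long-form generation tractable.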

The model supports multilingual lyrics, as noted above, and is released under the Apache 2.0 license in the project's GitHub repo.

Install HeartMuLa locally

System and prerequisites

Ubuntu system
GPU used in my test: a single NVIDIA RTX 6000 with 48 GB of VRAM
Python 3.10 recommended

Step-by-step setup

Create a Python 3.10 virtual environment.
Clone the repo.
Install the requirements.
Create a ckpt directory inside the repo.
Download all three required models from Hugging Face into the ckpt directory. Check the repo's README for the exact download command, since the Hugging Face CLI syntax has changed between versions.
Run the music generation script provided in the repo.
Ensure your lyrics file in the assets directory matches the expected format, even for other languages.
The script saves the output locally so you can play it.
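
As a sketch of the download step, the snippet below uses `snapshot_download` from the `huggingface_hub` package instead of the command-line tool. The repo ids and target folder names are assumptions on my part and should be checked against the project's Hugging Face page and README before running; `download_all` and `DOWNLOAD_PLAN` are illustrative names, not part of the project.

```python
from pathlib import Path

# ASSUMPTION: these repo ids and target folders are illustrative; confirm
# the exact names on the project's Hugging Face page before running.
CKPT_DIR = Path("ckpt")
DOWNLOAD_PLAN = {
    "HeartMuLa/HeartMuLaGen": CKPT_DIR,
    "HeartMuLa/HeartMuLa-oss-3B": CKPT_DIR / "HeartMuLa-oss-3B",
    "HeartMuLa/HeartCodec-oss": CKPT_DIR / "HeartCodec-oss",
}

def download_all(plan: dict[str, Path]) -> None:
    """Fetch every repo in the plan into its local directory."""
    # Imported lazily so the plan can be inspected without the package installed.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in plan.items():
        local_dir.mkdir(parents=True, exist_ok=True)
        snapshot_download(repo_id=repo_id, local_dir=str(local_dir))

# Nothing is downloaded on import; call download_all(DOWNLOAD_PLAN)
# from inside the cloned repo once the ids have been verified.
for repo_id, local_dir in DOWNLOAD_PLAN.items():
    print(f"{repo_id} -> {local_dir}")
```

Keeping the plan as plain data makes it easy to review where each model will land before any large files are pulled.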

Performance

Generation ETA was around 4 minutes for a full song in my run.
VRAM consumption sat close to 20 GB. A 24 GB GPU should be fine.
It generated a complete song with music matching the lyrics and style.
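
If you want to check whether your own card clears that bar, here is a minimal sketch. It parses the kind of output produced by `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`, which reports MiB per GPU. The 20 GB figure is the usage I observed; the helper names and the 10% safety margin are assumptions for illustration.

```python
# Checks whether a GPU has enough memory for the usage observed in the article.
# The sample output, helper names, and 10% margin are illustrative assumptions.

OBSERVED_USAGE_GB = 20.0

def parse_total_mib(nvidia_smi_output: str) -> list[int]:
    """Parse one memory.total value (MiB) per GPU from nvidia-smi output."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]

def fits(total_mib: int, needed_gb: float = OBSERVED_USAGE_GB, margin: float = 0.10) -> bool:
    """True if the card has the needed memory plus a safety margin."""
    total_gb = total_mib / 1024  # MiB -> GiB
    return total_gb >= needed_gb * (1 + margin)

# Sample text standing in for real nvidia-smi output on a 24 GB card.
sample_output = "24576\n"
for total in parse_total_mib(sample_output):
    print(total, "MiB:", "ok" if fits(total) else "too small")
```

On a real machine you would feed the function the captured output of the `nvidia-smi` command instead of the sample string.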

Multilingual results

I generated songs in English, Spanish, and a short Chinese example. The first song stood out. Spanish and Chinese outputs were also pretty nice.

Change voice and genre

You can change the voice and the genre by editing text.txt. I passed descriptors like opera, male, vocals, classical, dramatic, orchestral. The available options are documented on the model card and in the repo. The list is not that large right now, but I am sure it will grow. In my quick test, the output in this style still needed work, in my opinion.
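
A small sketch of how the descriptor line could be prepared before saving it to the file. The comma-separated, lowercase format is an assumption on my part, so compare it with the sample file shipped in the repo; `build_tag_line` is a hypothetical helper, not something the project provides.

```python
def build_tag_line(tags: list[str]) -> str:
    """Join descriptors into one comma-separated line, dropping blanks and duplicates."""
    seen: list[str] = []
    for tag in tags:
        cleaned = tag.strip().lower()
        if cleaned and cleaned not in seen:
            seen.append(cleaned)
    return ",".join(seen)

# Descriptors taken from the test described above.
tags = ["opera", "male", "vocals", "classical", "dramatic", "orchestral"]
tag_line = build_tag_line(tags)

# ASSUMPTION: one comma-separated line is the expected format.
# Write tag_line to the descriptor file once you have confirmed that.
print(tag_line)
```

Normalizing the descriptors first avoids accidental duplicates or stray whitespace when you iterate on styles.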

Final Thoughts

HeartMuLa gives you fine-grained control over sections, strong lyric clarity in benchmarks, multilingual support, and a straightforward local setup. VRAM use was reasonable in my tests, and the provided script made it easy to produce complete songs that follow the lyrics and style prompts. I plan to explore more aspects of this model in detail soon.