HeartMuLa: AI Music Generator Outshining Suno in Languages

A new music model is making serious waves. HeartMuLa is open source under the Apache 2.0 license. Unlike closed-source competitors such as Suno, it gives you the full package: an LLM-based song generator that accepts multimodal inputs, including text descriptions, lyrics, and reference audio. I installed it locally and generated a few songs in different languages.

What sets HeartMuLa apart

  • Fine-grained control: specify different styles for different song sections, such as intro, verse, and chorus, using natural language prompts.
  • Benchmarks: according to the model card, it outperforms both Suno v5 and Udio v1.5 on lyric clarity.
  • Quality claims: the internal 7B-parameter version reportedly matches Suno's overall quality in musicality, fidelity, and controllability.
  • Multilingual support: English, Chinese, Japanese, Korean, and Spanish.
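
Section-level control presumably depends on the lyrics being split into labeled sections. The sketch below assembles such a lyrics text in Python. The bracketed markers like [Verse] and the sample lines are my own assumptions for illustration, so check the sample lyrics file in the repo for the format the model actually expects.

```python
# Sketch only: the "[Section]" marker format is an assumption, not taken
# from the model card. Compare with the repo's sample lyrics file.

def build_lyrics(sections):
    """Join (section_name, lines) pairs into one lyrics string."""
    blocks = []
    for name, lines in sections:
        header = f"[{name}]"
        blocks.append("\n".join([header, *lines]))
    # Blank line between sections, single trailing newline at the end.
    return "\n\n".join(blocks) + "\n"


# Made-up example song with an instrumental intro (no lyric lines).
song = [
    ("Intro", []),
    ("Verse", ["City lights are fading slow", "I keep walking through the snow"]),
    ("Chorus", ["Hold on, hold on", "The morning will come"]),
]

lyrics_text = build_lyrics(song)
print(lyrics_text)
```

Keeping the sections as data makes it easy to reorder or repeat a chorus before writing the file.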

Architecture of HeartMuLa

The architecture is designed around four core components working in harmony.

  • HeartCodec: a 12.5 Hz music tokenizer that achieves high-fidelity reconstruction while operating at an extremely low frame rate. This is crucial for efficient autoregressive generation of long-form music.
  • HeartCLAP: handles audio-text alignment and creates a unified embedding space for cross-modal understanding.
  • HeartTranscriptor: a Whisper-based model fine-tuned specifically for music lyric recognition in real-world scenarios.
  • HeartMuLa generator: the LLM-based generator that ties everything together with a three-tier architecture:
    • A global backbone processes text tokens and audio encodings.
    • A local decoder handles the music generation with both audio and hidden state tokens.
    • A detokenizer converts everything back to waveform audio.
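
To see why the low frame rate matters for autoregressive generation, here is a quick back-of-the-envelope calculation in Python. The 12.5 Hz figure comes from the tokenizer description above. The 4-minute song length and the 50 Hz comparison rate are hypothetical values I picked for illustration.

```python
import math


def frames_for(duration_s: float, frame_rate_hz: float) -> int:
    """Number of codec frames needed to cover duration_s seconds of audio."""
    return math.ceil(duration_s * frame_rate_hz)


song_seconds = 240  # hypothetical 4-minute song

low_rate = frames_for(song_seconds, 12.5)   # frame rate quoted for the tokenizer
high_rate = frames_for(song_seconds, 50.0)  # hypothetical higher-rate codec

print(f"12.5 Hz: {low_rate} frames")
print(f"50 Hz:   {high_rate} frames")
print(f"ratio:   {high_rate / low_rate:.1f}x fewer steps at 12.5 Hz")
```

Under these assumptions the generator has to step through 3000 frames instead of 12000, which is the kind of saving that makes long-form songs practical.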

The model supports multilingual lyrics, as noted above, and is released under the Apache 2.0 license in the project's GitHub repo.

Install HeartMuLa Locally

System and prerequisites

  • Ubuntu
  • GPU used in my test: one NVIDIA RTX 6000 with 48 GB of VRAM
  • Python 3.10 recommended
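
Before installing anything, a quick environment check can save time. This is a minimal sketch that uses only the Python standard library. The helper names are my own, and it only checks that the interpreter is 3.10 and that the nvidia-smi tool is on the PATH, nothing specific to HeartMuLa.

```python
import shutil
import sys


def check_python(version_info=sys.version_info, wanted=(3, 10)):
    """Return True when the interpreter's major.minor matches the wanted version."""
    return tuple(version_info[:2]) == wanted


def has_nvidia_smi():
    """Return True when the nvidia-smi executable can be found on PATH."""
    return shutil.which("nvidia-smi") is not None


print("Python 3.10:", check_python())
print("nvidia-smi found:", has_nvidia_smi())
```

If the first check prints False, create the virtual environment with a Python 3.10 interpreter before installing the requirements.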

Step-by-step setup

  • Create a Python 3.10 virtual environment.
  • Clone the repo.
  • Install the requirements.
  • Create a ckpt directory inside the repo.
  • Download all three required models from Hugging Face into the ckpt directory, using the correct Hugging Face download syntax.
  • Run the music generation script provided in the repo.
  • Make sure the lyrics file in the assets directory follows the expected format, including for lyrics in other languages.
  • The script saves the output locally so you can play it.
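
The download step can also be scripted from Python. The sketch below plans where each model would land under ckpt and then, optionally, fetches it with snapshot_download from the huggingface_hub package. The repo ids are placeholders and the one-folder-per-repo layout is an assumption on my part, so take the real ids and the expected layout from the model card and the repo README.

```python
from pathlib import Path

# Placeholder ids: replace with the three repo ids listed on the model card.
REPO_IDS = [
    "your-org/generator-config",
    "your-org/generator-weights",
    "your-org/codec-weights",
]


def plan_downloads(repo_ids, ckpt_dir="ckpt"):
    """Map each repo id to ckpt/<repo-name> (assumed layout, verify in the README)."""
    root = Path(ckpt_dir)
    return {repo_id: root / repo_id.split("/")[-1] for repo_id in repo_ids}


def download_all(plan):
    """Download every planned repo into its target folder."""
    # Imported lazily so the planning step works without the package installed.
    from huggingface_hub import snapshot_download

    for repo_id, target in plan.items():
        target.mkdir(parents=True, exist_ok=True)
        snapshot_download(repo_id=repo_id, local_dir=str(target))


plan = plan_downloads(REPO_IDS)
for repo_id, target in plan.items():
    print(f"{repo_id} -> {target}")

# download_all(plan)  # uncomment once REPO_IDS holds the real ids
```

Printing the plan first is a cheap way to confirm the folder names before pulling several gigabytes of weights.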

Performance

  • The estimated generation time was around 4 minutes for a full song in my run.
  • VRAM consumption sat close to 20 GB. A 24 GB GPU should be fine.
  • It generated a complete song with music matching the lyrics and style.
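
If you want to check VRAM headroom on your own card, you can query nvidia-smi and parse the result. This is a minimal sketch: the parsing helpers are my own, and the sample reading is a made-up value shaped like my observation (about 20 GB used) on a hypothetical 24 GB card, not a captured log.

```python
import subprocess

# Standard nvidia-smi query: used and total memory in MiB, one CSV row per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(csv_text):
    """Parse 'used, total' rows (MiB) into a list of (used, total) integer pairs."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def headroom_gib(used_mib, total_mib):
    """Free memory in GiB given used and total memory in MiB."""
    return (total_mib - used_mib) / 1024


def read_gpu_memory():
    """Run nvidia-smi and return parsed (used, total) pairs for each GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory(result.stdout)


# Made-up sample: 20 GiB used on a 24 GiB card.
sample = "20480, 24576\n"
for used, total in parse_memory(sample):
    print(f"used {used} MiB of {total} MiB, headroom {headroom_gib(used, total):.1f} GiB")
```

Call read_gpu_memory() while a song is generating to see the real numbers on your machine.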

Multilingual results with HeartMuLa

I generated songs in English and Spanish, plus a short Chinese example. The first song stood out. The Spanish and Chinese outputs were also pretty nice.

Change voice and genre in HeartMuLa

You can change the voice and the genre by editing text.txt. I passed descriptors like opera, male, vocals, classical, dramatic, orchestral. The available options are documented on the model card and in the repo. The list is not that long right now, but I am sure it will grow. In my opinion, the output from my quick test in this style still needs work.
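
If you switch styles often, a tiny helper keeps the descriptor line tidy. The sketch below normalizes a list of descriptors and joins them with commas. The comma-separated, lowercase format and the helper names are my assumptions, so compare the result with the sample file shipped in the repo before relying on it.

```python
from pathlib import Path


def format_tags(descriptors):
    """Trim, lowercase, drop blanks and duplicates, then join with commas."""
    cleaned = []
    for descriptor in descriptors:
        tag = descriptor.strip().lower()
        if tag and tag not in cleaned:
            cleaned.append(tag)
    return ",".join(cleaned)


def write_tags(path, descriptors):
    """Write the formatted descriptor line to the given file."""
    Path(path).write_text(format_tags(descriptors) + "\n", encoding="utf-8")


# Descriptors from my opera test, with a duplicate to show the cleanup.
tags_line = format_tags(
    ["Opera", "male", "vocals", "classical", "dramatic", "orchestral", "opera "]
)
print(tags_line)

# write_tags("text.txt", [...])  # adjust the path to where the repo expects the file
```

The write step is left commented out so that running the sketch does not overwrite an existing file.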

Final Thoughts

HeartMuLa gives you fine-grained control over sections, strong lyric clarity in benchmarks, multilingual support, and a straightforward local setup. VRAM use was reasonable in my tests, and the provided script made it easy to produce complete songs that follow the lyrics and style prompts. I plan to explore more aspects of this model in detail soon.

Sonu Sahani

AI Engineer & Full Stack Developer. Passionate about building AI-powered solutions.