August 15th, 2024

YouTube Video to Tabs and Lyrics

Fish is an AI-powered multimodal model for music information retrieval, generating musical elements like chords and lyrics, featuring advanced audio processing and a specialized architecture for enhanced functionality.


The GitHub repository for the project named Fish presents an AI-powered multimodal model designed for music information retrieval. Its primary function is to generate various musical elements such as chords, beats, lyrics, melody, and tabs for any song using a transformer-based hybrid model. Key features include chord detection, which identifies different chord types and song keys; beat detection for tracking tempo; pitch tracking for vocal melodies; and music structure analysis to label song segments. Additionally, it employs automatic speech recognition (ASR) for lyrics recognition, aligning them with audio, and generates playable sheet music with editing capabilities. The project incorporates advanced audio processing features like source separation, speed adjustment, and pitch shifting. The model architecture consists of several specialized models, including U-Net, Pitch-Net, Beat-Net, Chord-Net, and Segment-Net, with a core model called CombineNet that utilizes an encoder-decoder structure for audio processing. The repository also showcases training results for a speech sample, demonstrating the model's capabilities. For further exploration, users can visit the project's website or access the code in the repository.

- Fish is an AI model for music information retrieval, generating chords, beats, lyrics, and more.

- Key features include chord detection, beat tracking, pitch tracking for vocal melodies, and music structure analysis.

- The project uses various specialized models for audio processing, culminating in the CombineNet architecture.

- It offers functionalities like lyrics recognition and the generation of editable sheet music.

- A demo is available showcasing the model's capabilities with training results.
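The summary lists chord detection as a key feature but does not say how it works. According to the repository description, Fish uses a neural model (Chord-Net) for this. The sketch below is not that model. It shows a much simpler, classic baseline, template matching on a 12-bin pitch-class (chroma) vector, to illustrate what "identifying a chord from audio features" means. All names here (`chord_templates`, `detect_chord`, the example chroma values) are hypothetical and chosen for illustration.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Intervals (in semitones above the root) for two basic chord qualities.
# Assumption: only major ("") and minor ("m") triads are modeled.
CHORD_INTERVALS = {"": (0, 4, 7), "m": (0, 3, 7)}


def chord_templates():
    """Build a binary 12-bin pitch-class template for every major and minor triad."""
    templates = {}
    for root in range(12):
        for suffix, intervals in CHORD_INTERVALS.items():
            vec = [0.0] * 12
            for step in intervals:
                vec[(root + step) % 12] = 1.0
            templates[NOTE_NAMES[root] + suffix] = vec
    return templates


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def detect_chord(chroma):
    """Return the triad whose template is most similar to a 12-bin chroma vector."""
    templates = chord_templates()
    return max(templates, key=lambda name: cosine(chroma, templates[name]))


# Hypothetical chroma frame with most of its energy on C, E, and G.
example = [1.0, 0, 0, 0, 0.8, 0, 0, 0.9, 0, 0, 0, 0]
print(detect_chord(example))  # prints "C"
```

Because the template set contains only triads, this baseline cannot name seventh or altered chords at all. That is one reason practical systems use larger chord vocabularies or learned models.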

24 comments
By @rwl4 - 6 months
So I'm trying to understand. Is this spam for the Lamucal service? I saw this same code posted on Reddit the other day under a different name. Here are a few repos with the exact same code under different names:

- https://github.com/DoMusic/Hybrid-Net

- https://github.com/TuneMusic/NiceMusic

- https://github.com/JoinMusic/fish

- https://github.com/Famuse/CombineNet

- https://github.com/AIAudioLab/AITabs

- https://github.com/AIMusicLab/MicroMuisc

I'm pretty sure there are more, but I'll stop there. Especially suspicious considering all the usernames.

Here's a post from yesterday on Reddit:

- https://www.reddit.com/r/coolaitools/comments/1ervthn/found_...

I'm guessing the general process here is:

- Push novelty (but unusable to most people) code to new Github repo

- Submit that code to Reddit/Hacker News

- People see it and are impressed by the novelty code, despite not running it due to missing the models themselves, etc. They upvote and subscribe ($$$) to actually try it.

- Repeat

I understand the desire to promote one's new service, and the product seems like it could be interesting, but this is not the way to get the word out. Reputation matters.

Edit:

Check out the user deeplover's post/comment history. One submission with the MicroMusic (see above) repo, and one comment, see below.

Also, the post by user liwei0517 is almost exactly like BigOrange688 on Reddit. See: https://www.reddit.com/r/MachineLearning/comments/1es0deh/co...

By @rrherr - 6 months
Here are the most impressive results I've seen for automated guitar transcription:

High-resolution guitar transcription via domain adaptation

Demo Videos: https://xavriley.github.io/HighResolutionGuitarTranscription...

Paper: https://arxiv.org/abs/2402.15258

> We propose the use of a high-resolution piano transcription model to train a new guitar transcription model. The resulting model obtains state-of-the-art transcription results on GuitarSet in a zero-shot context, improving on previously published methods.

By @kranner - 6 months
The "tabs" seem to be arpeggiations of the chords, which might have been of some use if the chord detection had worked well, which doesn't seem to be the case. I see chords and tabs being generated from sections which have only spoken audio, while actual guitar parts are not notated at all. The arpeggios are not consistent either, and switch arbitrarily to upstrokes/downstrokes and back to arpeggios.

edit: removed a reference to a competing product

By @criddell - 6 months
I tried to get it to generate tabs for Where is my Mind by the Pixies. I see the chords, but get the NO icon (red circle with diagonal bar) when I try to click on tabs. Am I doing something wrong?

A couple of weeks ago I asked one of the AIs to teach me this song. It responded that it can't teach specifics or tell me strumming patterns because it would be a copyright violation. I told it that if I went to a human teacher, they would have no problem teaching me how to play along to the song. That was a good enough argument to get the AI to change its mind (whatever that means) and produce a chord chart and strumming pattern (which was wrong).

By @authorfly - 6 months
Nice idea, but gives the wrong chords for jazz music: https://lamucal.com/chords/emmet-cohen/after-youve-gone-patr...

E.g. Bb instead of Ebmaj7

Bb7 instead of Bm7b5

By @neilyio - 6 months
Very happy to see more tools like this. There is so much potential for interactive tabs and sheet music with YouTube videos.

I only found out about https://www.soundslice.com recently. I'm not sure how it managed to evade me for years of searching for music resources on the internet... but for anyone interested in sheet music, I can't recommend it enough.

The design of the whole platform is so minimal and beautiful, and having notation synchronized with YouTube is simply brilliant. Built by one of the co-creators of Django, too!

By @dadver - 6 months
I played around for a few minutes with the various features. The voice removal was kind of impressive, though I don't know how novel that is.

I tried making some AI covers, too, which was kind of fun. For one of my tries, I submitted Nirvana's Smells Like Teen Spirit for AI voice generation to make a cover of Carola's song "Främling" (Sweden's ESC song 1983 which came in third, a very non-Nirvana-type of song). At first, I thought the voice sounded pretty much like a Swedish Kurt Cobain, then the more I listened to it, all I could hear was the Swedish artist Nordman, and it dawned upon me that they have similar voice styles. I tried lowering the pitch, and then I was certain I recognised the voice from another artist but couldn't place it. So I'm leaning towards the AI voices being trained on some not-so-unfamiliar artists rather than there being some cool AI magic happening, though I'm out of my depth here.

By @sixall - 6 months
I am an entrepreneur, and my startup team is about to fail. In response to the questions raised by rwl4, I am very sorry. All the features on our website are completely free for everyone to use. If we can help some people, it will be a small consolation for the team before the failure. I apologize again.
By @buildsjets - 6 months
Pretty cool! Is there a way to either detect or enforce alternate tunings? There is a world beyond EADGBe... I put in a few songs that I know of which have trivial chord fingerings in drop D, and it comes up with some correct but convoluted chord fingerings in standard tuning.
By @SoftMachine - 6 months
I don't often read Hacker News for Lamucal AI spam, but when I do, it's always nice.
By @ksr - 6 months
I'm looking to do the opposite: Given a melody in MusicXML / MIDI, generate an accompaniment "in the style of". Any pointers?
By @riiii - 6 months
Very interesting. I don't suppose this would work with instrumental music?

Anyone know of a thing that does?

By @smrtinsert - 6 months
This is going to save me a lot of time for when I actually have time to play music. Much better than whipping out Audacity and its chord estimation, manually grabbing a video etc.
By @TrackerFF - 6 months
Seems like a cool concept, but the tab function was more or less useless. Tried a bunch of different songs of varying complexity, couldn't get anything convincing.
By @ndriscoll - 6 months
Neat, it would also be awesome to package it into something like a Clone Hero/Rocksmith tool/plugin to generate charts just like Audiosurf did.
By @zerop - 6 months
Off topic: what's the best way to generate good-quality videos from a transcript? Only automation, no manual work. I can code.
By @CMLab - 6 months
How should AI cover song platforms address copyright, legal, and ethical concerns?
By @guitarlimeo - 6 months
I was expecting something like this https://www.youtube.com/watch?v=nLtlyzWuoqM

but was somewhat disappointed. The site is cool and can give you a head start when transposing a song to practice, but the chords were quite off in a few examples I tried.

By @afpx - 6 months
Wow - nice job!
By @liwei0517 - 6 months
I just trained a Taylor Swift voice model using Lamucal ( https://lamucal.com/ai-cover/share/66be2087bc3fdb000baf3cac ).

The mid-range is eerily close to Taylor's voice - there were moments when I almost thought it was really her singing. But if you listen closely, you can still catch a tiny hint of that robotic sound.

A Bar Song (Taylor Swift Cover): https://lamucal.com/ai-cover/song-share/66be244cbc3fdb000e76...

Original: A Bar Song (Tipsy) https://www.youtube.com/watch?v=t7bQwwqW-Hc