Modifying Sound Input to Determine Frequency

I'm working on a project and I've hit a snag that is beyond my understanding. My goal is to create an artificial neural network that takes input from a sound file and outputs a label for the chord being played. I'm hoping this will help with music transcription -- not to do the transcription itself, but to help with the harmonization aspect. I digress.

I've read as much as I can on the Goertzel algorithm and the FFT, but I'm unsure whether they are what I'm looking for. I'm not looking for any one particular frequency in the sound sample; rather, I'm hoping to find the high, middle, and low range frequencies of the sample.

I know the Goertzel algorithm returns a high number if a particular frequency is found, but it seems computationally wasteful to run the algorithm for all possible tones in a given sample. Any ideas on what to use?

Or, if this is impossible, I'd love to know that too before spending too much time on this one project.

Thank you for your time!

There are 3 answers

0
th3falc0n On

FFT is the right solution. Basically, when you have the FFT of an input signal that consists only of sine waves, you can determine the chord by mapping the frequencies that are present to specific tones in whichever musical temperament you want to use, then looking up the chord specified by those tones. If your input isn't made of pure sine waves, then using a neural network is a valid approach to solving the problem, provided that you have enough samples to train it.
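As a rough illustration of the mapping described above, here is a minimal sketch. It assumes 12-tone equal temperament with A4 = 440 Hz and that the peak frequencies have already been extracted. The names `freq_to_note`, `name_chord`, and the tiny `CHORD_SHAPES` table are hypothetical, made up for this example.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Hypothetical lookup table: intervals above the root (in semitones) -> chord quality.
CHORD_SHAPES = {(0, 4, 7): "major", (0, 3, 7): "minor"}


def freq_to_midi(freq_hz, a4_hz=440.0):
    """Nearest MIDI note number, assuming equal temperament (A4 = MIDI 69)."""
    return round(69 + 12 * math.log2(freq_hz / a4_hz))


def freq_to_note(freq_hz, a4_hz=440.0):
    """Nearest note name with octave, e.g. 110 Hz -> 'A2'."""
    midi = freq_to_midi(freq_hz, a4_hz)
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def name_chord(freqs_hz, a4_hz=440.0):
    """Try each detected pitch class as the root and look up the interval shape."""
    pcs = sorted({freq_to_midi(f, a4_hz) % 12 for f in freqs_hz})
    for root in pcs:
        shape = tuple(sorted((pc - root) % 12 for pc in pcs))
        if shape in CHORD_SHAPES:
            return f"{NOTE_NAMES[root]} {CHORD_SHAPES[shape]}"
    return None  # shape not in the (deliberately tiny) table


print(freq_to_note(110.0))
print(name_chord([261.63, 329.63, 392.00]))
print(name_chord([220.0, 261.63, 329.63]))
```

A real table would need more chord qualities, and real recordings would need peak picking before this step.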

1
P i On

Probably better suited to DSP StackExchange.

Suppose you FFT a single 110 Hz note (as played on an instrument, not a pure sine wave) to get a spectrum; you'll see evenly spaced peaks at 110, 220, 330 Hz, etc. -- the harmonics. 110 Hz is the fundamental.

Suppose you have 3 tones. Already it's going to look quite messy in the frequency domain, especially if you have a chord containing e.g. A110 and A220.
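To see why A110 together with A220 is awkward, here is a small sketch that lists idealized harmonic frequencies (exact integer multiples, which real instruments only approximate) and checks where the two notes overlap. The helper name `harmonics` is hypothetical.

```python
def harmonics(fundamental_hz, count):
    """Idealized harmonic series: integer multiples of the fundamental."""
    return [fundamental_hz * k for k in range(1, count + 1)]


a110 = harmonics(110, 8)  # 110, 220, ..., 880
a220 = harmonics(220, 4)  # 220, 440, 660, 880

# Every peak of A220 in this range coincides with a peak of A110.
shared = sorted(set(a110) & set(a220))
print(shared)
```

In this idealized picture the higher note adds no new peak positions, only extra energy at existing ones, so its presence has to be inferred from peak heights rather than peak locations.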

On account of this, I think a neural network is a good approach.

Feed in FFT output.

It would be a good idea to use a neural network that accepts complex-valued inputs, as the FFT outputs a complex number for each frequency bin.

http://www.eagle.tamut.edu/faculty/igor/PRESENTATIONS/IJCNN-0813_Tutorial.pdf

It may seem computationally wasteful to extract so many frequencies with FFT, but FFT algorithms are extremely efficient nowadays. You should probably use an FFT size of 2^10 = 1024 input samples, which gives 2^9 = 512 complex bins below the Nyquist frequency.
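For a sense of what those bins look like, here is a minimal pure-Python radix-2 FFT sketch. It is only illustrative; in practice a library routine such as `numpy.fft.rfft` would be far faster. The sample rate of 8192 Hz and the 440 Hz test tone are assumptions chosen so that the tone falls exactly on a bin, and `peak_frequency` is a hypothetical helper.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def peak_frequency(samples, sample_rate):
    """Frequency (Hz) of the strongest bin below Nyquist, skipping the DC bin."""
    n = len(samples)
    half = fft(samples)[: n // 2]  # 512 complex bins when n = 1024
    k = max(range(1, len(half)), key=lambda i: abs(half[i]))
    return k * sample_rate / n  # bin k sits at k * sample_rate / n Hz


SAMPLE_RATE = 8192  # assumed, gives 8 Hz bin spacing with 1024 samples
N = 2 ** 10
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(N)]
print(peak_frequency(tone, SAMPLE_RATE))
```

The bin spacing is the sample rate divided by the FFT size, so at a typical audio rate of 44100 Hz a 1024-point FFT resolves only about 43 Hz per bin. That is coarse for low notes, which may be a reason to use a larger size.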

0
Jacques de Hooge On

FFT is the right way. Harmonics don't bother you: since they are integer multiples of the fundamental frequency, they're just higher 'octaves' of the same note. And to recognize a chord, transpositions of notes over whole octaves don't matter.
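Ignoring octave transpositions can be sketched as folding each frequency into one of 12 pitch classes. This assumes equal temperament with A4 = 440 Hz, and `pitch_class` is a hypothetical helper. Strictly speaking, only the power-of-two multiples of a fundamental (2x, 4x, 8x) fold back onto the same note name; the 3x harmonic, for example, lands a fifth higher, so some harmonics may still show up as other pitch classes.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(freq_hz, a4_hz=440.0):
    """Note name without octave, assuming equal temperament (A4 = MIDI 69)."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return NOTE_NAMES[midi % 12]


# Octave-related frequencies collapse to the same pitch class.
print([pitch_class(f) for f in (110, 220, 440, 880)])

# The third harmonic of 110 Hz folds to a different class.
print(pitch_class(330))
```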