I need to build software that recognizes a small audio sample (A) inside other audio samples (B) and outputs how many times A appears inside B (if there is a match).
What I have: a database with hundreds of audio files
Input: a new audio clip
Expected output: a boolean indicating whether the input matches a sample from the database, and how many times the input appears inside the matched audio (from the database).
Any code, open-source projects, guides, books, videos, tutorials, etc. would be useful! Thanks, everyone!
This is a very broad question, but let me try to back up and describe a little bit about how audio recognition works generally, and how you might perform this yourself.
I'm going to assume the audio comes from an audio file and not a stream, but it should be relatively easy to understand either way.
The Basics of Digital Audio
An audio file is a series of samples which are recorded into a device through a process called sampling. Sampling is the process by which a continuous analog signal (for instance, the electrical signal from a microphone or an electric guitar) is turned into a discrete, digital signal.
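To make sampling concrete, here is a small Python sketch (Python because the libraries recommended later are Python libraries) that samples a mathematically defined sine wave, which stands in for the continuous analog signal. The function name `sample_sine` and the 440 Hz / 8 kHz values are illustrative choices, not part of any library.

```python
import math
from typing import List


def sample_sine(freq_hz: float, sample_rate_hz: int, duration_s: float) -> List[float]:
    """Sample the continuous signal sin(2*pi*f*t) at discrete times t = n / sample_rate_hz."""
    num_samples = int(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(num_samples)]


# One second of a 440 Hz tone sampled at 8 kHz yields 8000 discrete samples.
samples = sample_sine(440.0, 8000, 1.0)
print(len(samples))  # 8000
```

Each value in `samples` is one discrete reading of the signal's amplitude; a higher sampling rate simply takes more readings per second.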
With audio signals, sampling is almost always done at a single, fixed sampling rate, generally somewhere between 8 kHz and 192 kHz. The only things about sampling that are particularly important for you to know are:
Audio Recognition
General-purpose audio recognition algorithms are complex and, for many use cases, needlessly inefficient, so it matters what exactly you are trying to detect. For instance, are you trying to determine whether an audio file exactly matches another audio file, or whether the two would sound nearly identical? To see why this matters, let's look at the simplest audio comparison algorithm (at least the simplest I can come up with).
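As a hypothetical Python sketch, assuming each file has already been decoded into a list of float samples, the simplest exact comparison might look like this. The name `compare_audio_files` is illustrative, modeled on the `compareAudioFiles` name used further down.

```python
from typing import Sequence


def compare_audio_files(samples_a: Sequence[float], samples_b: Sequence[float]) -> bool:
    """Naive comparison: two clips match only if every sample is exactly equal."""
    # Clips of different lengths (e.g. one has extra zero padding) can never match.
    if len(samples_a) != len(samples_b):
        return False
    # Exact float equality, sample by sample.
    for a, b in zip(samples_a, samples_b):
        if a != b:
            return False
    return True


print(compare_audio_files([0.0, 0.5, -0.25], [0.0, 0.5, -0.25]))  # True
print(compare_audio_files([0.0, 0.5, -0.25], [0.0, 0.5]))         # False
```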
This works **only under specific circumstances**: if the audio files are even slightly different, they won't be matched as identical. Let's talk about a few ways that this could fail:
The comparison uses `==` between floats, and floats are compared so precisely that tiny changes to the samples cause them to register as different. For instance, even an imperceptible change to a single value in `SamplesB` is enough for `compareAudioFiles` to report a mismatch.

There are tons of other reasons this wouldn't work, like phase mismatch, DC bias, and inaudible low- or high-frequency content that has been filtered out of one file but not the other.
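The float-equality problem is easy to reproduce. In this hypothetical Python sketch, `samples_b` differs from `samples_a` by one millionth in a single sample. Exact `==` comparison rejects the pair, while a comparison with an absolute tolerance (`math.isclose` with `abs_tol`, using an assumed threshold of `1e-4`) accepts it.

```python
import math
from typing import Sequence


def compare_exact(samples_a: Sequence[float], samples_b: Sequence[float]) -> bool:
    # Naive rule: equal length and exactly equal floats.
    return len(samples_a) == len(samples_b) and all(a == b for a, b in zip(samples_a, samples_b))


def compare_with_tolerance(samples_a: Sequence[float], samples_b: Sequence[float],
                           abs_tol: float = 1e-4) -> bool:
    # Treat samples as equal if they differ by at most abs_tol (an assumed, tunable threshold).
    return len(samples_a) == len(samples_b) and all(
        math.isclose(a, b, rel_tol=0.0, abs_tol=abs_tol) for a, b in zip(samples_a, samples_b)
    )


samples_a = [0.0, 0.5, -0.25, 0.125]
# Nudge one sample by one millionth: far below anything audible.
samples_b = list(samples_a)
samples_b[1] += 1e-6

print(compare_exact(samples_a, samples_b))           # False
print(compare_with_tolerance(samples_a, samples_b))  # True
```

A tolerance only fixes float imprecision; phase mismatch, bias, and filtering would still make this comparison fail.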
You could keep improving this algorithm to account for issues like these, but it would probably still never work well enough to match sounds the way a listener perceives them. In short, if you want to compare how audio sounds, you need to use an acoustic fingerprinting library. One such library is pyacoustid. Otherwise, if you want to compare the raw samples from the files themselves, you can probably come up with a relatively stable algorithm that measures the difference between sounds in the time domain, taking into account zero padding, imprecision, bias, and other noise.
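As one possible starting point for the time-domain approach (and for the original goal of counting how many times A occurs in B), here is a hypothetical pure-Python sketch. It trims near-silent zero padding from clip A, removes DC bias by subtracting each window's mean, and slides A across B computing a normalized correlation, so differences in gain and small imprecision don't prevent a match. The function names, the silence threshold of `1e-3`, and the match threshold `min_corr=0.9` are all assumptions to tune on real data; it also assumes both clips share the same sample rate and are already decoded to float samples.

```python
import math
from typing import List, Sequence


def trim_silence(samples: Sequence[float], threshold: float = 1e-3) -> List[float]:
    """Strip leading/trailing samples quieter than threshold (e.g. zero padding)."""
    start, end = 0, len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return list(samples[start:end])


def normalized_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Correlation of two equal-length windows after removing each one's mean (DC bias).
    Returns a value in [-1, 1]; 1 means the same shape regardless of gain and offset."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den if den > 0 else 0.0


def count_occurrences(needle: Sequence[float], haystack: Sequence[float],
                      min_corr: float = 0.9) -> int:
    """Count non-overlapping positions where needle (clip A) appears in haystack (clip B)."""
    a = trim_silence(needle)
    m = len(a)
    if m == 0 or m > len(haystack):
        return 0
    count, i = 0, 0
    while i + m <= len(haystack):
        if normalized_correlation(a, haystack[i:i + m]) >= min_corr:
            count += 1
            i += m  # skip past this match so it isn't counted twice
        else:
            i += 1
    return count


clip_a = [0.0, 0.9, -0.4, 0.7, -0.8, 0.3, 0.0]            # zero-padded clip A
louder_copy = [0.9, -0.4, 0.7, -0.8, 0.3]
quieter_biased_copy = [0.5 * s + 0.1 for s in louder_copy]  # different gain plus a DC offset
silence = [0.0] * 3
clip_b = silence + louder_copy + silence + quieter_biased_copy + silence
print(count_occurrences(clip_a, clip_b))  # 2
```

This is O(len(A) × len(B)), so for real recordings you would likely compute the correlation with an FFT (for example with SciPy's `scipy.signal.correlate`) or switch to a fingerprinting library; it also won't cope with heavily filtered or time-stretched copies.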
For general-purpose audio operations in Python, I'd recommend LibROSA.
Good luck!