Magenta "tone transfer" colab: what is the time interval between audio slices?


The Magenta "Tone Transfer" colab takes audio of one instrument and re-synthesizes it so it sounds like a different instrument. Cool.

Along the way, it analyzes the features of audio you upload: audio_features = ddsp.training.metrics.compute_audio_features(audio)

Printing this yields various arrays, such as frequency (in Hertz): 220.2, 221.0, 300.5..., each element of which represents the analysis of a given time slice.

My primary question is: what is the time interval between slices/samples in the audio_features arrays? (If it's even constant...) Secondarily, is that interval modifiable? Tertiarily (?), can one get a time stamp for each array element?

I did not see any API docs on this.

I'm coming from Javascript; total beginner at Python, so if you figured this out from the colab notebook or repo itself, it would also help to understand how you figured it out!

In case you're curious, I want to be able to use the analysis tools for my own custom audio manipulation, resynthesis, and analysis, so I'm trying to understand ddsp.training.metrics.compute_audio_features(audio) more precisely. Thanks!

1 Answer

Answered by Thatcher Thornberry

The somewhat simple answer is: it depends on the audio clip that you upload.

The first intuition you need is that when an audio clip is stored digitally, it must be broken into individual samples. If there are too few of these samples per second, the audio will sound choppy to us; with a sufficient number of samples per second, we can't perceive the difference between the digital clip and the original sound. This samples-per-second attribute is known as the sampling rate: the number of samples taken per second.
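As a quick sanity check, the spacing between consecutive raw samples is just the reciprocal of the sampling rate:

```python
# A sampling rate of 22,050 Hz means 22,050 samples per second,
# so the time between consecutive raw samples is its reciprocal.
sample_rate = 22050
sample_interval = 1.0 / sample_rate  # seconds per sample

print(f"{sample_interval * 1e6:.2f} microseconds between samples")
```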

So, every audio clip you upload will have a particular sampling rate. At each sample, you can get values such as amplitude of the audio. Let me give an example with the Python package Librosa.

import librosa

# Load an example file in .ogg format
fileName = librosa.util.example_audio_file()
audioData, sampleRate = librosa.load(fileName)  # librosa resamples to 22,050 Hz by default

print(audioData)
>>> [ -4.756e-06,  -6.020e-06, ...,  -1.040e-06,   0.000e+00]

print(audioData.shape)
>>> (1355168,)

print(sampleRate)
>>> 22050

# We can use the number of samples and the sampling rate to get the duration of the audio

librosa.get_duration(y=audioData, sr=sampleRate)  # output is in seconds

>>> 61.4

what is the time interval between slices/samples in the audio_features arrays?

For the raw waveform, the interval depends on the sampling rate of the audio: at 22,050 samples per second, consecutive samples are 1/22,050 seconds apart. Feature arrays like the frequencies you printed, however, are usually computed once per analysis frame rather than once per sample, so the interval between array elements is hop_length / sample_rate, where the hop length is the number of samples between the starts of consecutive frames. (In DDSP specifically, the feature frame rate is a parameter of the analysis — 250 frames per second by default, at the time of writing — so checking the compute_audio_features signature in the ddsp source will tell you the exact spacing.)
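To make that concrete, here is the frame-interval arithmetic using librosa's default hop of 512 samples (the hop value is an assumption; other tools, including DDSP, use different frame spacings):

```python
# Feature arrays typically hold one value per *frame*, not per sample.
# With librosa's default hop of 512 samples at a 22,050 Hz sampling rate:
sample_rate = 22050
hop_length = 512  # samples between the starts of consecutive frames

frame_interval = hop_length / sample_rate  # seconds between feature values
print(f"{frame_interval:.5f} seconds between feature frames")
```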

can one get a time stamp for each array element?

With a quick search, I found a librosa function called samples_to_time, which converts sample indices to timestamps given a sampling rate:

import numpy as np
import librosa

librosa.samples_to_time(np.arange(0, 22050, 512))
>>> array([ 0.   ,  0.023,  0.046,  0.07 ,  0.093,  0.116,  0.139,
        0.163,  0.186,  0.209,  0.232,  0.255,  0.279,  0.302,
        0.325,  0.348,  0.372,  0.395,  0.418,  0.441,  0.464,
        0.488,  0.511,  0.534,  0.557,  0.58 ,  0.604,  0.627,
        0.65 ,  0.673,  0.697,  0.72 ,  0.743,  0.766,  0.789,
        0.813,  0.836,  0.859,  0.882,  0.906,  0.929,  0.952,
        0.975,  0.998])

My parting advice is to check out some YouTube videos about audio processing if you're interested in doing a project. The librosa library is also very handy, and if you're set on using Python (a great choice), it's worth reading through its docs.