When I create a series of spectrograms from a long audio file, the colour intensities vary noticeably


I'm using Python, librosa, and NumPy. The audio is stereo, 44.1 kHz, about 2 GB, roughly 3 hours long. First I convert the audio to mono, then in 60-second chunks I calculate the maximum power. Next I use the highest per-chunk max power as the global max power reference to generate the spectrograms. For every chunk I use the same f_high and f_low, the same db_high and db_low, the same n_fft, hop_length and n_mels, the same sample rate, and the same max power reference.

Basically, this is how I get the global max power reference:

    # y, sr, samples_per_chunk, n_fft, hop_length, n_mels,
    # f_low and f_high are all defined earlier in the script
    global_max_power = 0.0
    for i in tqdm(range(0, len(y), samples_per_chunk)):
        y_chunk = y[i : i + samples_per_chunk]
        S = librosa.feature.melspectrogram(
            y=y_chunk,
            sr=sr,
            n_fft=n_fft,
            hop_length=hop_length,
            n_mels=n_mels,
            fmin=f_low,
            fmax=f_high,
        )
        max_power = np.max(S)
        if max_power > global_max_power:
            global_max_power = max_power

Then I use that max power as the reference when converting each chunk to dB:

    S_dB = librosa.power_to_db(
        S,
        ref=global_max_power,
        amin=10 ** (db_low / 10.0),
        top_db=db_high - db_low,
    )

Then I render the image:

    plt.figure(figsize=(image_width / 100, video.get("height", 100) / 100))
    librosa.display.specshow(
        S_dB,
        sr=sr,
        cmap=audiovis.get("cmap", "magma"),
        hop_length=hop_length,
        fmin=f_low,
        fmax=f_high,
    )

If I make the image very wide so that no chunking is required (using a smaller file), I do not see the colour intensity variation. When I process the audio in chunks, it appears, yet the parameters controlling the process are the same.

I'm not sure how to diagnose further.

