I have a problem of the following nature: I am writing to HDF5 files through pandas (PyTables). The stores are created like this:
import numpy as np
import pandas as pd

# mode='w' creates a fresh HDF5 file; this replaces the earlier
# open()/close() trick (an empty file is not valid HDF5, so opening
# it with mode='r+' would fail).
vector = pd.HDFStore('vector.h5', mode='w')
compare = pd.HDFStore('compare.h5', mode='w')
Into compare.h5 I iteratively write a Series whose length shrinks from 20000 down to 1 (one row of a triangular comparison matrix per key):
for idx_row in range(length):
    array = []
    array_title = []
    # compare the current key against all remaining keys (upper triangle)
    for idx_column in range(idx_row, length):
        array.append(compare_vector_cosine(vector[keys[idx_row]][:512],
                                           vector[keys[idx_column]][:512]))
        array_title.append(keys[idx_column])
    compare[keys[idx_row]] = pd.Series(data=array, index=array_title)
For some reason, writing to this file is slow: the file gets created and then grows in size only very slowly. I also cannot rule out that the slow part is actually the repeated reads from the other file, vector.h5.
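To separate the two, I can time one read and one write in isolation; a minimal probe (the key 'probe' is just a throwaway name) looks like this:

import time

t0 = time.perf_counter()
row = vector[keys[0]][:512]            # one read from vector.h5
t1 = time.perf_counter()
compare['probe'] = pd.Series([0.0])    # one small write to compare.h5
t2 = time.perf_counter()
print(f'read: {t1 - t0:.4f}s  write: {t2 - t1:.4f}s')

As far as I understand, every store[key] = ... assignment creates its own node in the HDF5 file, so with 20000 keys there is a lot of per-node overhead, but I am not sure this is the cause.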
Why is this happening? What parameters should I add or change?
However, writing to another file, where every vector has a fixed length, happens quickly:
for idx, img in enumerate(IMAGE_TITLE):
    vector[img] = pd.Series(np.concatenate([embedding_main['embedding'],
                                            embedding_second,
                                            embedding_main['pose'],
                                            np.array([embedding_main['gender']])],
                                           axis=0))
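For completeness, here is a self-contained sketch that reproduces the slow pattern with dummy data (the sizes, key names, and the plain cosine function are assumptions standing in for my real code):

import numpy as np
import pandas as pd

length = 200                                   # reduced from 20000 so the demo finishes
keys = [f'img_{i:05d}' for i in range(length)]

def compare_vector_cosine(a, b):
    # stand-in for my real cosine-similarity function
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vector = pd.HDFStore('vector_demo.h5', mode='w')
compare = pd.HDFStore('compare_demo.h5', mode='w')

# fill vector_demo.h5 with dummy 512-dim embeddings
for k in keys:
    vector[k] = pd.Series(np.random.rand(512).astype('float32'))

# the slow triangular write, same shape as the real loop above
for idx_row in range(length):
    array = []
    array_title = []
    for idx_column in range(idx_row, length):
        array.append(compare_vector_cosine(vector[keys[idx_row]][:512],
                                           vector[keys[idx_column]][:512]))
        array_title.append(keys[idx_column])
    compare[keys[idx_row]] = pd.Series(data=array, index=array_title)

vector.close()
compare.close()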