I have a function that appends a new element to a dataset array in an HDF5 file every second.

    from time import time, sleep

    import h5py
    import numpy as np

    i = 100

    def update_array():
        global i
        hf = h5py.File('task1.h5', 'r+')
        old_rec = np.array(hf.get('array'))
        i = i + 1
        new_rec = np.append(old_rec, i)
        # deleting old record and replacing with updated record
        del hf['array']
        new_data = hf.create_dataset('array', data=new_rec)
        print(new_rec)
        hf.close()

    while True:
        # wake up at the start of each whole second
        sleep(1 - time() % 1)
        update_array()

The output of the print line is below. It only shows the updated array in memory; we do not know whether it is actually being saved to the file:
[101.]
[101. 102.]
[101. 102. 103.]
[101. 102. 103. 104.]
[101. 102. 103. 104. 105.]
[101. 102. 103. 104. 105. 106.]
[101. 102. 103. 104. 105. 106. 107.]
[101. 102. 103. 104. 105. 106. 107. 108.]
I want a separate notebook that tracks the changes made by the above function and displays the updated contents of this dataset in the HDF5 file.
I want a separate function for this task because I want to confirm that the updated content is saved in the HDF5 file, and to perform further on-the-fly operations on the data as it keeps arriving.
Here is a potential solution that attaches attributes to the 'array' dataset. Adding attributes to an HDF5 data object is easy with .attrs. It has a dictionary-like syntax: h5obj.attrs[attr_name] = attr_value. Attribute value types can be ints, strings, floats, and arrays. You can add 2 attributes to your dataset with 2 lines of code.

To demonstrate, I added these lines to your code, along with several other modifications to address the following issues:

1. Added create_array() to initially create the file and dataset. I created it as a resizable dataset to simplify the logic in update_array().
2. Changed the update_array() code to enlarge the dataset and append the new value. This is much cleaner (and faster) than your 4-step process.
3. Used the with / as: context manager to open the file. This eliminates the need to close it and (more importantly) ensures it is closed cleanly if the program exits abnormally.
4. Used hf['array'][:] instead of np.array(hf.get('array')).
5. Ideally, move the with / as: lines into the main and pass the resulting hf object to the create_array() and update_array() functions. If you do that, you can easily consolidate the 2 functions. (You will need logic to test if the 'array' dataset exists.)

Code below:
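Separately from the full listing, here is a minimal sketch of the .attrs syntax and the resize-and-append step described above. It is an illustration under assumptions, not the exact code from this answer: the attribute names last_value and modified, the helper names, and the file name attrs_demo.h5 are placeholders chosen for this example.

```python
import os
import tempfile
from datetime import datetime

import h5py
import numpy as np


def write_with_attrs(path, values):
    """Create a resizable 'array' dataset and tag it with two attributes."""
    with h5py.File(path, "w") as hf:
        dset = hf.create_dataset(
            "array", data=np.asarray(values, dtype=float), maxshape=(None,)
        )
        # Attribute names are illustrative placeholders, not a fixed convention.
        dset.attrs["last_value"] = values[-1]
        dset.attrs["modified"] = datetime.now().isoformat()


def append_value(path, value):
    """Grow the dataset by one element and refresh both attributes."""
    with h5py.File(path, "r+") as hf:
        dset = hf["array"]
        n = dset.shape[0]
        dset.resize(n + 1, axis=0)
        dset[n] = value
        dset.attrs["last_value"] = value
        dset.attrs["modified"] = datetime.now().isoformat()


def read_with_attrs(path):
    """Return the dataset contents and its attributes as a plain dict."""
    with h5py.File(path, "r") as hf:
        dset = hf["array"]
        return dset[:], dict(dset.attrs)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "attrs_demo.h5")
    write_with_attrs(demo_path, [101.0, 102.0])
    append_value(demo_path, 103.0)
    data, attrs = read_with_attrs(demo_path)
    print(data, attrs["last_value"], attrs["modified"])
```

A reader in a separate notebook could call something like read_with_attrs() periodically and compare the modified attribute with the value it saw last time to decide whether new data has arrived.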