There is a tree-like data structure, the nodes of which contain scalar values (integers), lists and dictionaries, np.array. I need to write a function (in Python) that applies an aggregation function to the lower levels of the tree and replaces the lower level with the result of the aggregation. The result of the function is a new data structure.
Example.
Initial data structure: {“a”: [1, [2, 3, 4], [5, 6, 7]], “b”: [{“c”:8, “d”:9}, {“ e”:3, “f”:4}, 8]}
Aggregation function: sum
First use: {“a”: [1, 9, 18]], “b”: [17, 7, 8]}
Second use: {“a”: 28, “b”: 32}
Third use: 60
Fourth use: 60
sumis a particularly simple example because you can sum in any order and you'll always get the same result.So for instance you could first collapse the tree into a linear collection, using depth-first-search or breadth-first-search, and then sum everything, and you'd get the same result.
But with a different aggregation function, the way things are grouped might matter.
I suggest two different functions,
collapse_then_aggregateandaggregate_recursively, which return the same result for a simple aggregating function likesumbut can return different results for more complex aggregating functions.Note how
depth_first_searchandaggregate_recursivelyuse the same logic to walk the tree recursively, by making a recursive call on each element of the iterable. Dictionaries are treated separately withif isinstance(tree, dict)because you care about the values and not the keys. Iterables are treated using try/except. Strings are iterables too, but presumably you don't want to iterate on their individual characters, so I wrote a special caseif isinstance(tree, (str, bytes))so that strings are not treated like iterables.Examples of applications:
Results:
Note how
sumandprodgive the same result for our two aggregation methods, butmean, geometric_meanandlistgive different results.