I want to apply a function via flatMap to each group produced by DataSet.groupBy. Trying to call flatMap I get the compiler error:
error: value flatMap is not a member of org.apache.flink.api.scala.GroupedDataSet
My code:
var mapped = env.fromCollection(Array[(Int, Int)]())
var groups = mapped.groupBy("myGroupField")
groups.flatMap( myFunction: (Int, Array[Int]) => Array[(Int, Array[(Int, Int)])] )  // error: GroupedDataSet has no member flatMap
Indeed, in the documentation of flink-scala 0.9-SNAPSHOT no map or similar is listed. Is there a similar method to work with? How to achieve the desired distributed mapping over each group individually on a node?
                        
You can use
reduceGroup(GroupReduceFunction f)to process all elements a group. AGroupReduceFunctiongives you anIterableover all elements of a group and anCollectorto emit an arbitrary number of elements.Flink's
groupBy()function does not group multiple elements into a single element, i.e., it does not convert a group of(Int, Int)elements (that all share the same_1tuple field) into one(Int, Array[Int]). Instead, aDataSet[(Int, Int)]is logically grouped such that all elements that have the same key can be processed together. When you apply aGroupReduceFunctionon aGroupedDataSet, the function will be called once for each group. In each call all elements of a group are handed together to the function. The function can then process all elements of the group and also convert a group of(Int, Int)elements into a single(Int, Array[Int])element.