pyspark.streaming.DStream.reduceByKeyAndWindow
DStream.reduceByKeyAndWindow(func, invFunc, windowDuration, slideDuration=None, numPartitions=None, filterFunc=None)
Return a new DStream by applying incremental reduceByKey over a sliding window. The reduced value of a new window is calculated using the old window's reduced value by:
- reducing the new values that entered the window (e.g., adding new counts)
- "inverse reducing" the old values that left the window (e.g., subtracting old counts)
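The two steps above can be sketched in plain Python. This is a simulation of the semantics only, not PySpark code; the values and the choice of addition/subtraction (a running count) are hypothetical:

```python
# Hypothetical running count: func adds, invFunc subtracts.
func = lambda a, b: a + b      # reduce newly entered values
inv_func = lambda a, b: a - b  # "inverse reduce" values that left

old_window_value = 10  # reduced value of the previous window
entered = [3, 2]       # values that slid into the window
left = [4]             # values that slid out of the window

# Start from the old window's reduced value and apply both steps.
new_window_value = old_window_value
for v in entered:
    new_window_value = func(new_window_value, v)
for v in left:
    new_window_value = inv_func(new_window_value, v)

print(new_window_value)  # 10 + 3 + 2 - 4 = 11
```

Because only the values entering and leaving the window are touched, each slide costs work proportional to the slide interval rather than the full window width.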
 
invFunc can be None; in that case the window is recomputed by reducing all the RDDs in it, which can be slower than providing an invFunc.

Parameters
func : function
    associative and commutative reduce function
invFunc : function
    inverse of the reduce function func
windowDuration : int
    width of the window; must be a multiple of this DStream's batching interval
slideDuration : int, optional
    sliding interval of the window (i.e., the interval after which the new DStream will generate RDDs); must be a multiple of this DStream's batching interval
numPartitions : int, optional
    number of partitions of each RDD in the new DStream
filterFunc : function, optional
    function to filter expired key-value pairs; only pairs that satisfy the function are retained. Set this to None if you do not want to filter.