As discussed in the Sprint 5 planning meeting, we would like to rework the MapReduce algorithm itself so that it follows Apache Hadoop's flow more closely.
The proposed order of operations is as follows (a rough code sketch follows the list):
- Mappers read their input data from the external data source (FS).
- When a mapper completes all of its assigned tasks, it performs a Combine step followed by a Partition step on the intermediate data saved to its local disk.
- When all mappers have completed their tasks, the Shuffle step begins, which is simply the movement of local intermediate files to the FS for further processing.
- Then the Merge step combines the values of all duplicate keys.
- Finally, the intermediate data is sent to the reducers for final processing.
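Below is a minimal, in-memory sketch of the proposed order of operations, using word count as a stand-in workload. All function and variable names (`map_phase`, `combine`, `partition`, `shuffle`, `merge`, `reduce_phase`, `NUM_REDUCERS`) are illustrative assumptions, not names from our codebase, and the local-disk and FS hand-offs are simulated with plain Python data structures rather than real file movement.

```python
# Sketch of the proposed flow: map -> combine -> partition -> shuffle -> merge -> reduce.
# Word count is used as a placeholder workload; names are illustrative only.
from collections import defaultdict

NUM_REDUCERS = 2  # assumption: the number of reducers is known up front


def map_phase(record):
    # Mapper: emit (key, value) pairs from one input record.
    return [(word, 1) for word in record.split()]


def combine(pairs):
    # Combine: local pre-aggregation on the mapper before anything leaves its disk.
    combined = defaultdict(int)
    for key, value in pairs:
        combined[key] += value
    return dict(combined)


def partition(combined, num_reducers=NUM_REDUCERS):
    # Partition: bucket keys by hash so each bucket targets exactly one reducer.
    buckets = [dict() for _ in range(num_reducers)]
    for key, value in combined.items():
        buckets[hash(key) % num_reducers][key] = value
    return buckets


def shuffle(all_mapper_buckets):
    # Shuffle: in the real system this is just moving the partitioned local
    # files to the FS; here we only regroup the buckets by reducer index.
    return [
        [buckets_for_mapper[r] for buckets_for_mapper in all_mapper_buckets]
        for r in range(NUM_REDUCERS)
    ]


def merge(reducer_inputs):
    # Merge: collect the values of duplicate keys across all mappers' output.
    merged = defaultdict(list)
    for bucket in reducer_inputs:
        for key, value in bucket.items():
            merged[key].append(value)
    return dict(merged)


def reduce_phase(merged):
    # Reduce: final per-key aggregation.
    return {key: sum(values) for key, values in merged.items()}


if __name__ == "__main__":
    records = ["the quick brown fox", "the lazy dog", "quick quick dog"]

    # Each mapper handles its assigned records end-to-end: map -> combine -> partition.
    mapper_buckets = [partition(combine(map_phase(r))) for r in records]

    # Shuffle runs only after every mapper has finished.
    per_reducer = shuffle(mapper_buckets)

    # Each reducer merges duplicate keys, then produces the final output.
    results = [reduce_phase(merge(inputs)) for inputs in per_reducer]
    print(results)
```

The point of the sketch is the ordering and the ownership of each step: Combine and Partition happen on the mapper against its local intermediate data, Shuffle is purely a data movement to the FS gated on all mappers finishing, and Merge groups duplicate keys before the reducers do the final processing.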