Hierarchical Scheduling for Diverse Datacenter Workloads
In this post we’ll cover the paper that introduced HDRF (Hierarchical Dominant Resource Fairness) which builds upon the team’s existing work DRF (Dominant Resource Fairness), but looking to also provide hierarchical scheduling.
Prior work DRF, was an algorithm that was able to decide how to allocate multi-dimensional resources to multiple frameworks, which it described how it can enforce fairness when scheduling multiple resource types with a flat hierarchy:
| —— | —— | —— – |
dev test staging prod
10 10 30 50
However, in most organizations it’s important to be able to describe resource allocation weights in a hierarchy that reflects its organizational intent:
| —— | —— | —— – |
fe ads spam mail
30 20 25 25
/\ /\ /\ /\
d p d p d p d p (d = dev, p = prod)
50 50 20 80 30 70 40 60
The key difference with hierarchical scheduling is that when a node is not using its resources, it’s redistributed among the sibling nodes as opposed to all leaf nodes. For example, when dev environment in FE is not using its resources, it’s allocated to prod in FE instead.
Naive implementations of hierarchical and multi-resource scheduling (such as collapsing the hierarchy into a flat hierarchy, or simply running DRF from root to leaf node) can lead to starvation, where in our example certain dev and prod environment never receiving any or their fair share of resources. This is referred as hierarchical share guarantee.
To avoid the problem of starvation, H-DRF incorporates two ideas when considering dominant share in the leaf nodes. The first idea is rescaling the leaf node’s resource consumption to the minimum node. The second idea is to ignore rescaling blocked nodes, where a node is blocked if one of the resources request is saturated or when it has no more tasks to launch. The actual proof and steps of the implementation is covered in the paper, and I won’t go over here in details.
The interesting piece that was highlighted in this paper was that Hadoop implemented a naive version of HDRF and therefore has bugs where it can cause starvation in the tasks. Therefore, it’s not straightforward when attempting to modify how DRF works without proofing it’s starvation free and also provides fairness (unless it’s not the primary goal for your change).
That said, there are more papers that continues to extend and modify DRF and also shown ways that can continue to show blindspots that HDRF didn’t cover, which I’ll try to cover more in the future.