Hierarchical Scheduling for Diverse Datacenter Workloads

Hierarchical Scheduling for Diverse Datacenter Workloads

In this post we’ll cover the paper that introduced HDRF (Hierarchical Dominant Resource Fairness) which builds upon the team’s existing work DRF (Dominant Resource Fairness), but looking to also provide hierarchical scheduling.

Background

Prior work DRF, was an algorithm that was able to decide how to allocate multi-dimensional resources to multiple frameworks, which it described how it can enforce fairness when scheduling multiple resource types with a flat hierarchy:

                 DRF  

    | —— |  —— | —— – |

   dev   test staging prod

    10     10      30       50

However, in most organizations it’s important to be able to describe resource allocation weights in a hierarchy that reflects its organizational intent:

                 H-DRF  

    | —— |  —— | —— – |

   fe      ads     spam   mail

   30      20        25       25

   /\        /\          /\         /\

 d  p     d p       d p     d  p       (d = dev, p = prod)

 50 50 20 80    30 70  40 60

The key difference with hierarchical scheduling is that when a node is not using its resources, it’s redistributed among the sibling nodes as opposed to all leaf nodes. For example, when dev environment in FE is not using its resources, it’s allocated to prod in FE instead. 

Naive implementations of hierarchical and multi-resource scheduling (such as collapsing the hierarchy into a flat hierarchy, or simply running DRF from root to leaf node) can lead to starvation, where in our example certain dev and prod environment never receiving any or their fair share of resources. This is referred as hierarchical share guarantee.

Screen Shot 2017-10-12 at 2.23.30 AM

H-DRF

To avoid the problem of starvation, H-DRF incorporates two ideas when considering dominant share in the leaf nodes. The first idea is rescaling the leaf node’s resource consumption to the minimum node. The second idea is to ignore rescaling blocked nodes, where a node is blocked if one of the resources request is saturated or when it has no more tasks to launch. The actual proof and steps of the implementation is covered in the paper, and I won’t go over here in details. 

Notes

The interesting piece that was highlighted in this paper was that Hadoop implemented a naive version of HDRF and therefore has bugs where it can cause starvation in the tasks. Therefore, it’s not straightforward when attempting to modify how DRF works without proofing it’s starvation free and also provides fairness (unless it’s not the primary goal for your change). 

That said, there are more papers that continues to extend and modify DRF and also shown ways that can continue to show blindspots that HDRF didn’t cover, which I’ll try to cover more in the future.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s