What is a Workload Manager

  • Commonly called a Workload Manager. May also be referred to (sometimes loosely) as:

    • Batch system
    • Batch scheduler
    • Workload scheduler
    • Job scheduler
    • Resource manager (usually considered a component of a Workload Manager)
  • Tasks commonly performed by a Workload Manager:

    • Provide a means for users to specify and submit work as “jobs”
    • Evaluate, prioritize, schedule and run jobs
    • Provide a means for users to monitor, modify and interact with jobs
    • Manage, allocate and provide access to available machine resources
    • Manage pending work in job queues
    • Monitor and troubleshoot jobs and machine resources
    • Provide accounting and reporting facilities for jobs and machine resources
    • Efficiently balance work over machine resources; minimize wasted resources
  • Generalized architecture and workflow of a Workload Manager:

WorkloadMgrWorkflow

User

  • Logs into cluster
  • Creates job script and submits it to workload manager
  • Monitors and interacts with job via workload manager
  • Queries workload manager for job and cluster information

Workload Manager

  • Typically runs on a separate server as multiple processes
  • Receives job submissions, commands, queries from user
  • Matches job requirements to available machine resources
  • Evaluates, prioritizes and queues jobs
  • Schedules jobs for execution on cluster
  • Tracks job and cluster information
  • Sends jobs to compute node daemons for actual execution

Cluster

  • Workload Manager daemons run on compute nodes
  • Daemons manage compute resources and job execution
  • Daemons communicate with Workload Manager server processes

  • Some popular Workload Managers include:
    • Slurm from SchedMD
    • Spectrum LSF from IBM
    • Tivoli Workload Scheduler (LoadLeveler) from IBM
    • PBS from Altair Engineering
    • TORQUE, Maui, Moab from Adaptive Computing
    • Univa Grid Engine
    • OpenLava

Workload Manager

本文采用 知识共享署名 4.0 国际许可协议(CC-BY 4.0)进行许可。转载请注明来源: https://snowfrs.com/2020/11/20/Workload-Manager.html 欢迎对文中引用进行考证,欢迎指出任何不准确和模糊之处。