Workload Manager
What is a Workload Manager?
Commonly called a Workload Manager, it may also be referred to (sometimes loosely) as:
- Batch system
- Batch scheduler
- Workload scheduler
- Job scheduler
- Resource manager (usually considered a component of a Workload Manager)
Tasks commonly performed by a Workload Manager:
- Provide a means for users to specify and submit work as “jobs”
- Evaluate, prioritize, schedule and run jobs
- Provide a means for users to monitor, modify and interact with jobs
- Manage, allocate and provide access to available machine resources
- Manage pending work in job queues
- Monitor and troubleshoot jobs and machine resources
- Provide accounting and reporting facilities for jobs and machine resources
- Efficiently balance work over machine resources; minimize wasted resources
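Several of the tasks above can be illustrated with a toy scheduler: queuing pending work, prioritizing jobs, allocating resources and avoiding idle capacity. This is a minimal sketch that assumes a single pool of interchangeable cores and jobs described only by a core count and an integer priority. `Job`, `ToyScheduler` and their methods are hypothetical names for illustration, not the API of any real workload manager.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass
class Job:
    # Hypothetical job description; real workload managers track far more.
    name: str
    cores: int      # number of cores requested
    priority: int   # higher value = more urgent


class ToyScheduler:
    """Minimal priority scheduler for a single pool of cores (illustrative only)."""

    def __init__(self, total_cores: int):
        self.free_cores = total_cores
        self._queue = []        # heap of (-priority, submit_order, job)
        self._order = count()   # tie-breaker: earlier submission wins
        self.running = []

    def submit(self, job: Job) -> None:
        # Pending work waits in the job queue until resources are available.
        heapq.heappush(self._queue, (-job.priority, next(self._order), job))

    def schedule(self) -> list:
        """Start jobs in priority order; jobs that do not fit stay pending."""
        started, still_pending = [], []
        while self._queue:
            entry = heapq.heappop(self._queue)
            job = entry[2]
            if job.cores <= self.free_cores:
                self.free_cores -= job.cores
                self.running.append(job)
                started.append(job.name)
            else:
                still_pending.append(entry)
        # Put the jobs that did not fit back into the queue.
        for entry in still_pending:
            heapq.heappush(self._queue, entry)
        return started

    def finish(self, name: str) -> None:
        # Release the cores held by a completed job.
        for job in self.running:
            if job.name == name:
                self.running.remove(job)
                self.free_cores += job.cores
                return
        raise KeyError(name)

    def pending(self) -> list:
        # Names of queued jobs, highest priority first.
        return [entry[2].name for entry in sorted(self._queue)]


if __name__ == "__main__":
    s = ToyScheduler(total_cores=8)
    s.submit(Job("a", cores=4, priority=1))
    s.submit(Job("b", cores=6, priority=5))
    s.submit(Job("c", cores=2, priority=3))
    print(s.schedule(), s.pending())  # b and c start; a waits for cores
```

This greedy pass lets a small, lower-priority job start while a larger, higher-priority job waits. Production schedulers usually refine this; Slurm's backfill scheduler, for example, starts lower-priority jobs only if doing so does not delay the expected start of higher-priority jobs.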
Generalized architecture and workflow of a Workload Manager:
User
- Logs into cluster
- Creates job script and submits it to workload manager
- Monitors and interacts with job via workload manager
- Queries workload manager for job and cluster information
Workload Manager
- Typically runs on a separate server as multiple processes
- Receives job submissions, commands, queries from user
- Matches job requirements to available machine resources
- Evaluates, prioritizes and queues jobs
- Schedules jobs for execution on cluster
- Tracks job and cluster information
- Sends jobs to compute node daemons for actual execution
Cluster
- Workload Manager daemons run on compute nodes
- Daemons manage compute resources and job execution
- Daemons communicate with Workload Manager server processes
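The server/daemon split described above can be modeled as a toy, in-process sketch. It assumes single-node jobs that request only cores and memory, and it uses a simple first-fit placement policy. `NodeDaemon`, `ToyServer`, `dispatch` and `job_finished` are hypothetical names chosen for illustration. In a real system the daemons are separate processes on the compute nodes that talk to the server over the network; Slurm, for example, uses the `slurmctld` controller and per-node `slurmd` daemons.

```python
from dataclasses import dataclass, field


@dataclass
class NodeDaemon:
    """Stand-in for the daemon running on one compute node (hypothetical)."""
    name: str
    free_cores: int
    free_mem_gb: int
    jobs: dict = field(default_factory=dict)  # job_id -> (cores, mem_gb)

    def launch(self, job_id: str, cores: int, mem_gb: int) -> None:
        # A real daemon would start the job's processes; here we only record the allocation.
        self.free_cores -= cores
        self.free_mem_gb -= mem_gb
        self.jobs[job_id] = (cores, mem_gb)

    def complete(self, job_id: str) -> None:
        # Return the job's resources to the node.
        cores, mem_gb = self.jobs.pop(job_id)
        self.free_cores += cores
        self.free_mem_gb += mem_gb


class ToyServer:
    """Central server: matches job requirements to nodes and tracks where jobs run."""

    def __init__(self, daemons):
        self.daemons = {d.name: d for d in daemons}
        self.placement = {}  # job_id -> node name

    def dispatch(self, job_id: str, cores: int, mem_gb: int):
        # First fit: the first node with enough free cores and memory gets the job.
        for daemon in self.daemons.values():
            if daemon.free_cores >= cores and daemon.free_mem_gb >= mem_gb:
                daemon.launch(job_id, cores, mem_gb)
                self.placement[job_id] = daemon.name
                return daemon.name
        return None  # no node fits; the job would stay queued

    def job_finished(self, job_id: str) -> None:
        # The node daemon reports completion; the server updates its cluster view.
        node = self.placement.pop(job_id)
        self.daemons[node].complete(job_id)

    def cluster_status(self) -> dict:
        # Free (cores, memory in GB) per node, as the server currently sees it.
        return {n: (d.free_cores, d.free_mem_gb) for n, d in self.daemons.items()}


if __name__ == "__main__":
    server = ToyServer([NodeDaemon("node01", 16, 64), NodeDaemon("node02", 32, 128)])
    print(server.dispatch("job1", cores=8, mem_gb=32))   # node01
    print(server.dispatch("job2", cores=16, mem_gb=64))  # node02
    print(server.dispatch("job3", cores=64, mem_gb=8))   # None: no node has 64 cores
    print(server.cluster_status())
```

First fit is only one possible matching policy; real workload managers also weigh factors such as node features, partitions and fair-share when choosing where a job runs.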
Some popular Workload Managers include:
- Slurm from SchedMD
- Spectrum LSF from IBM
- Tivoli Workload Scheduler (LoadLeveler) from IBM
- PBS from Altair Engineering
- TORQUE, Maui, Moab from Adaptive Computing
- Univa Grid Engine
- OpenLava
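This article is licensed under the Creative Commons Attribution 4.0 International License (CC-BY 4.0). Please credit the source when reposting: https://snowfrs.com/2020/11/20/Workload-Manager.html. You are welcome to verify the references cited in this article and to point out anything inaccurate or unclear.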