Job Manager

From COSSAN Wiki

A job manager is software used for resource management in heterogeneous distributed computing environments, such as computer clusters and computer grids. Workload management means controlling the use of shared resources to best achieve goals such as productivity, timeliness, and level of service. It is accomplished by managing resources and administering policies. Sites configure the system to maximize usage and throughput, while the system supports varying levels of timeliness and importance. Job deadlines are an instance of timeliness; job priority and user share are instances of importance.
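As a rough illustration of how timeliness and importance might be combined into a dispatch order, the following Python sketch ranks pending jobs by priority, then by user share, then by deadline. The `Job` fields, the `dispatch_order` function, and the ordering rule itself are assumptions made for this example; they are not taken from any particular job manager.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int      # higher value = more important (hypothetical scale)
    user_share: float  # fraction of the resources the owner is entitled to
    deadline: float    # seconds from now by which the job should start


def dispatch_order(jobs):
    """Return jobs sorted by importance first, then by timeliness.

    Illustrative rule only: higher priority wins, ties go to the user
    with the larger share, remaining ties go to the earlier deadline.
    """
    return sorted(jobs, key=lambda j: (-j.priority, -j.user_share, j.deadline))


pending = [
    Job("nightly-report", priority=1, user_share=0.2, deadline=3600.0),
    Job("urgent-analysis", priority=5, user_share=0.1, deadline=600.0),
    Job("simulation-a", priority=1, user_share=0.5, deadline=7200.0),
]

ordered_names = [job.name for job in dispatch_order(pending)]
print(ordered_names)
```

A real scheduler would typically weight these factors against each other and against current system load rather than apply a strict lexicographic order; the sketch only shows that deadlines, priorities, and shares are separate inputs to one decision.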

Job management software provides advanced resource management and policy administration for heterogeneous environments (machines running different operating systems, such as Windows and Unix variants) that are composed of multiple shared resources. These interconnected machines, which share computational power, form a structure called a grid. A state-of-the-art job manager used in a grid offers the following features:

  • Dynamic scheduling and resource management that allows the grid engine software to enforce site-specific management policies.
  • Dynamic collection of performance data to provide the scheduler with up-to-the-minute job level resource consumption and system load information.
  • Availability of enhanced security by way of Certificate Security Protocol (CSP)-based encryption. Instead of transferring messages in clear text, the messages in this more secure system are encrypted with a secret key.
  • High-level policy administration for the definition and implementation of enterprise goals such as productivity, timeliness, and level-of-service.

The job management software provides users with the means to submit computationally demanding tasks to the grid for transparent distribution of the associated workload. Users can submit batch jobs, interactive jobs, and parallel jobs to the grid.

The software also supports checkpointing programs. Checkpointing jobs migrate from workstation to workstation on load demand, without user intervention.
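To make the idea of checkpointing concrete, here is a minimal sketch of an application-level checkpoint in Python. It assumes the program itself saves its state to a file and reloads it on restart, which is only one possible approach; the function name `run_with_checkpoint`, the JSON file format, and the `stop_after` parameter (used to simulate an interruption before migration) are all hypothetical.

```python
import json
import os
import tempfile


def run_with_checkpoint(path, total_steps, stop_after=None):
    """Sum the integers 0..total_steps-1, saving progress after each step.

    If a checkpoint file exists at `path`, the computation resumes from it.
    `stop_after` simulates the job being interrupted (for example, before
    a migration) after that many steps in the current run.
    """
    state = {"next_step": 0, "partial_sum": 0}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)

    steps_done = 0
    while state["next_step"] < total_steps:
        if stop_after is not None and steps_done >= stop_after:
            return None  # interrupted; the latest state is already on disk
        state["partial_sum"] += state["next_step"]
        state["next_step"] += 1
        steps_done += 1
        with open(path, "w") as f:
            json.dump(state, f)
    return state["partial_sum"]


checkpoint_file = os.path.join(tempfile.mkdtemp(), "job.ckpt")
first_run = run_with_checkpoint(checkpoint_file, total_steps=10, stop_after=4)
second_run = run_with_checkpoint(checkpoint_file, total_steps=10)
print(first_run, second_run)
```

The second call stands in for the job restarting on another workstation that can read the same checkpoint file: it continues from step 4 instead of starting over.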

For the administrator, the software provides comprehensive tools for monitoring and controlling jobs.

The job manager does all of the following:

  • Accepts jobs. Jobs are users' requests for computer resources. Each job includes a description of what to do and a set of property definitions that describe how the job should be run. Users can submit jobs via the command-line interface or QMON, Grid Engine's graphical user interface. Users can also use the optional Distributed Resource Management Application API (DRMAA) to automate grid engine functions by writing scripts to submit and control jobs.
  • Holds jobs. The Sun Grid Engine master daemon holds jobs until the needed compute resources become available.
  • Sends jobs. When the compute resources become available, the master daemon sends the job to the appropriate execution host. The execution daemon on that host then executes the job.
  • Manages running jobs. The master daemon manages running jobs. At a fixed interval, the master daemon receives reports from each execution daemon.
  • Logs the record of job execution when the jobs are finished. The master daemon stores raw data. Users can also use the Accounting and Reporting Console (ARCo) to gather live reporting data from the Grid Engine system and to store the data for historical analysis in the reporting database, which is a standard SQL database.
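
The steps listed above can be sketched as a small in-memory simulation. This is a toy model written for illustration, not Grid Engine code: the `MasterDaemon` class, its method names, and the first-free-slot placement rule are assumptions, and execution daemons are reduced to a simple "job finished" report.

```python
from collections import deque


class MasterDaemon:
    """Toy stand-in for a master daemon; not a real Grid Engine API."""

    def __init__(self, host_slots):
        self.free_slots = dict(host_slots)   # host name -> free slots
        self.pending = deque()               # held jobs, in submission order
        self.running = {}                    # job name -> host name
        self.accounting = []                 # records of finished jobs

    def submit(self, job_name):
        """Accept a job and hold it until resources are available."""
        self.pending.append(job_name)

    def schedule(self):
        """Send held jobs to hosts that have a free slot."""
        while self.pending:
            host = next((h for h, n in self.free_slots.items() if n > 0), None)
            if host is None:
                break  # no resources: remaining jobs stay held
            job_name = self.pending.popleft()
            self.free_slots[host] -= 1
            self.running[job_name] = host

    def report_finished(self, job_name, exit_status):
        """Handle an execution host's report that a job has ended."""
        host = self.running.pop(job_name)
        self.free_slots[host] += 1
        self.accounting.append(
            {"job": job_name, "host": host, "exit_status": exit_status}
        )


master = MasterDaemon({"node1": 1, "node2": 1})
for name in ("job-a", "job-b", "job-c"):
    master.submit(name)

master.schedule()                      # two slots, so job-c stays held
held_after_first_pass = list(master.pending)
master.report_finished("job-a", 0)     # node1 frees a slot
master.schedule()                      # job-c is now sent to node1
print(held_after_first_pass, master.running, master.accounting)
```

In this sketch the accounting list plays the role of the raw execution record; a production system would persist it and, as described above, could feed it to a reporting database for historical analysis.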

See also