Schedulers

Bioluigi extends the scope of Luigi by interfacing with external schedulers such as Slurm to dispatch the execution of external programs.

The simplest way to specify a scheduler is via the configuration, so that it becomes the default for any scheduled external program.

[bioluigi]
scheduler=slurm
scheduler_extra_args=

The second option is to set it explicitly when invoking a task.

bcftools.Annotate(scheduler='slurm')
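
The precedence between the two options can be sketched in plain Python. This is only an illustration under stated assumptions, not Bioluigi's implementation: the helper resolve_scheduler is hypothetical, and the name 'local' for the default scheduler is assumed.

```python
import configparser


# Hypothetical helper, not part of Bioluigi: illustrates precedence only.
def resolve_scheduler(config_text, explicit=None):
    """Return the scheduler name a task would use.

    An explicit argument wins, then the [bioluigi] scheduler option,
    then 'local' (assumed name of the default scheduler).
    """
    if explicit:
        return explicit
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    configured = parser.get("bioluigi", "scheduler", fallback="").strip()
    return configured or "local"


CONFIG = """
[bioluigi]
scheduler=slurm
scheduler_extra_args=
"""

print(resolve_scheduler(CONFIG))                # default taken from the configuration
print(resolve_scheduler("", explicit="slurm"))  # set explicitly on the task
print(resolve_scheduler(""))                    # nothing configured
```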

Local

The local scheduler performs the execution within the context of a Luigi worker and consumes two kinds of resources: cpus and memory. These resources must be specified in luigi.cfg.

[resources]
cpus=16
memory=32

This scheduler is used by default.
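
To make the effect of these limits concrete, the sketch below simulates how a resource budget gates concurrent tasks: a task starts only if its requests still fit within the configured totals. It is a simplified model rather than Luigi's actual scheduling code, and the task names and per-task requests are invented for illustration.

```python
# Limits as configured in the [resources] section of luigi.cfg.
LIMITS = {"cpus": 16, "memory": 32}


def admit(pending, limits):
    """Greedily admit tasks in order while their combined requests fit."""
    used = {name: 0 for name in limits}
    running, waiting = [], []
    for task_name, request in pending:
        fits = all(used[r] + request.get(r, 0) <= limits[r] for r in limits)
        if fits:
            for r in limits:
                used[r] += request.get(r, 0)
            running.append(task_name)
        else:
            waiting.append(task_name)
    return running, waiting


# Hypothetical tasks and requests, for illustration only.
PENDING = [
    ("align", {"cpus": 8, "memory": 16}),
    ("sort", {"cpus": 4, "memory": 8}),
    ("call_variants", {"cpus": 8, "memory": 16}),
    ("index", {"cpus": 1, "memory": 1}),
]

running, waiting = admit(PENDING, LIMITS)
print("running:", running)
print("waiting:", waiting)
```

In this model, call_variants has to wait because it would push CPU usage past the limit of 16, while the smaller index task still fits.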

SSH

The SSH scheduler runs external programs on a remote host.

The SSH scheduler consumes two resources, ssh_cpus and ssh_memory, which should be set according to the capabilities of the remote host.

[resources]
ssh_cpus=16
ssh_memory=32

[bioluigi.schedulers.ssh]
ssh_bin=ssh
remote=
port=
user=
identity_file=
extra_args=[]
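
As a rough illustration of how such options could map onto an ssh invocation, the sketch below assembles an argument list using the standard ssh flags -p, -l and -i. How Bioluigi actually builds its command is not documented here, so the helper build_ssh_command and all values (host, user, key path) are hypothetical.

```python
import shlex

# Hypothetical settings mirroring the [bioluigi.schedulers.ssh] options;
# the host, user and key path are placeholders.
SETTINGS = {
    "ssh_bin": "ssh",
    "remote": "compute.example.org",
    "port": "2222",
    "user": "alice",
    "identity_file": "/home/alice/.ssh/id_ed25519",
    "extra_args": ["-o", "BatchMode=yes"],
}


def build_ssh_command(settings, program_args):
    """Assemble an ssh argument list that runs program_args on the remote host."""
    cmd = [settings["ssh_bin"]]
    if settings.get("port"):
        cmd += ["-p", str(settings["port"])]
    if settings.get("user"):
        cmd += ["-l", settings["user"]]
    if settings.get("identity_file"):
        cmd += ["-i", settings["identity_file"]]
    cmd += list(settings.get("extra_args", []))
    cmd.append(settings["remote"])
    # ssh hands the remote command to a shell, so quote each argument.
    cmd.append(shlex.join(program_args))
    return cmd


command = build_ssh_command(SETTINGS, ["bcftools", "annotate", "input file.vcf"])
print(command)
```

Empty options are simply skipped, so leaving port, user or identity_file blank falls back to whatever ssh itself would use.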

Slurm

The Slurm scheduler dispatches external programs on a Slurm cluster.

Unlike the local scheduler, CPU and memory allocation is not handled through Luigi's resource mechanism. Instead, two resources are consumed, slurm_jobs and slurm_cpus, which respectively limit how many jobs and CPUs can be allocated on the cluster.

Note

Each scheduled external program occupies a mostly idle Luigi worker, so to achieve concurrency, enough workers must be requested with the --workers flag.

luigi --module tasks --workers 32 <task> <task_args>

[resources]
slurm_jobs=32
slurm_cpus=256
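
The interplay between the --workers flag and these two limits can be estimated with a little arithmetic. The sketch below assumes that each job consumes one unit of slurm_jobs and as many units of slurm_cpus as it requests CPUs; the function max_concurrent_jobs is hypothetical and only illustrates the bound.

```python
def max_concurrent_jobs(workers, slurm_jobs, slurm_cpus, cpus_per_job):
    """Upper bound on simultaneously running jobs.

    Assumes every job occupies one worker, one slurm_jobs unit and
    cpus_per_job units of slurm_cpus.
    """
    return min(workers, slurm_jobs, slurm_cpus // cpus_per_job)


print(max_concurrent_jobs(workers=32, slurm_jobs=32, slurm_cpus=256, cpus_per_job=8))
print(max_concurrent_jobs(workers=4, slurm_jobs=32, slurm_cpus=256, cpus_per_job=8))
print(max_concurrent_jobs(workers=32, slurm_jobs=32, slurm_cpus=256, cpus_per_job=16))
```

With too few workers, the worker count becomes the bottleneck regardless of how generous the Slurm limits are.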

[bioluigi.schedulers.slurm]
srun_bin=srun
squeue_bin=squeue
partition=
extra_args=
track_job_status=False
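
For orientation, the sketch below shows how these options might translate into an srun command line using the standard srun options --partition, --cpus-per-task and --mem. The exact options Bioluigi passes may differ; build_srun_command, its parameters and the gigabyte unit for memory are assumptions made for this example.

```python
def build_srun_command(program_args, cpus, memory_gb,
                       partition="", extra_args=(), srun_bin="srun"):
    """Assemble an srun argument list for one external program (hypothetical)."""
    cmd = [srun_bin]
    if partition:
        cmd.append(f"--partition={partition}")
    cmd.append(f"--cpus-per-task={cpus}")
    cmd.append(f"--mem={memory_gb}G")  # memory unit is an assumption
    cmd += list(extra_args)
    cmd += list(program_args)
    return cmd


command = build_srun_command(
    ["bcftools", "annotate", "input.vcf"],
    cpus=4,
    memory_gb=8,
    partition="short",  # placeholder partition name
)
print(" ".join(command))
```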

Job Status Tracking

The Slurm scheduler supports job status tracking by polling squeue --json. This feature is disabled by default because it appears to cause high CPU utilization.

[bioluigi.schedulers.slurm]
track_job_status=True
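
To give an idea of what polling involves, the sketch below extracts job states from JSON text shaped like squeue --json output. The sample document is made up, and the field names "jobs", "job_id" and "job_state" are assumptions that may not match every Slurm version; the code accepts the state either as a string or as a list for that reason.

```python
import json

# Trimmed, made-up sample standing in for `squeue --json` output.
# Field names are assumptions and may differ between Slurm versions.
SAMPLE = """
{"jobs": [
  {"job_id": 101, "job_state": "RUNNING"},
  {"job_id": 102, "job_state": ["PENDING"]}
]}
"""


def job_states(squeue_json):
    """Map job IDs to their state from squeue-style JSON text."""
    states = {}
    for job in json.loads(squeue_json).get("jobs", []):
        state = job.get("job_state")
        if isinstance(state, list):
            # Some versions appear to report the state as a list of flags.
            state = state[0] if state else None
        states[job["job_id"]] = state
    return states


print(job_states(SAMPLE))
```

In practice the JSON text would come from running squeue, for example with subprocess.run(["squeue", "--json"], capture_output=True, text=True).stdout, and parsing the full queue on every poll is one plausible source of the CPU cost mentioned above.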