Schedulers
Bioluigi extends the scope of Luigi by interfacing with external schedulers such as Slurm to dispatch the execution of external programs.
The simplest way to specify a scheduler is via the configuration (luigi.cfg), so that it becomes the default for any scheduled external program.
[bioluigi]
scheduler=slurm
scheduler_extra_args=
The second option is to set it explicitly when instantiating a task:
bcftools.Annotate(scheduler='slurm')
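The precedence between these two options can be modelled roughly as follows. This is a hypothetical sketch, not Bioluigi's implementation: the helper `resolve_scheduler` is invented for illustration, and it assumes that an explicit argument wins over the configuration, that an empty `scheduler=` counts as unset, and that the fallback is the local scheduler.

```python
import configparser
from typing import Optional


def resolve_scheduler(cfg_text: str, explicit: Optional[str] = None) -> str:
    """Hypothetical helper: pick the scheduler for one external program."""
    # An explicit argument, e.g. bcftools.Annotate(scheduler='slurm'), takes precedence.
    if explicit:
        return explicit
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # Fall back to [bioluigi] scheduler=..., treating a missing or empty value as unset.
    configured = parser.get("bioluigi", "scheduler", fallback="").strip()
    # Assumption: with nothing configured, the local scheduler is used.
    return configured or "local"


CFG = """
[bioluigi]
scheduler=slurm
scheduler_extra_args=
"""

print(resolve_scheduler(CFG))             # slurm
print(resolve_scheduler(CFG, "local"))    # local
print(resolve_scheduler("[bioluigi]\n"))  # local
```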
Local
The local scheduler performs the execution within the context of a Luigi
worker and consumes two kinds of resources: cpus and memory. These
resources must be specified in luigi.cfg.
[resources]
cpus=16
memory=32
This scheduler is used by default.
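Luigi only starts a task when the resources it requests, added to those held by tasks already running, stay within the [resources] limits. The stdlib-only model below illustrates that accounting for the values above; it is a simplified sketch, not Luigi's or Bioluigi's code, and the per-program request of 4 CPUs and 10 memory units is hypothetical.

```python
from typing import Dict, List

Resources = Dict[str, int]

# Capacities mirroring the [resources] section above.
CAPACITY: Resources = {"cpus": 16, "memory": 32}


def fits(request: Resources, running: List[Resources], capacity: Resources) -> bool:
    """Return True if `request` can start alongside the `running` tasks."""
    for name, amount in request.items():
        used = sum(task.get(name, 0) for task in running)
        # Simplification for this sketch: a resource absent from `capacity` has zero capacity.
        if used + amount > capacity.get(name, 0):
            return False
    return True


def admit(requests: List[Resources], capacity: Resources) -> List[Resources]:
    """Greedily admit pending requests, in order, while resources remain."""
    running: List[Resources] = []
    for request in requests:
        if fits(request, running, capacity):
            running.append(request)
    return running


# Ten hypothetical external programs, each asking for 4 CPUs and 10 memory units.
pending = [{"cpus": 4, "memory": 10} for _ in range(10)]
print(len(admit(pending, CAPACITY)))  # 3: memory (3 x 10 <= 32) binds before cpus (4 x 4 <= 16)
```

With these numbers, memory rather than CPUs is the limiting resource, so raising cpus alone would not increase concurrency.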
SSH
The SSH scheduler runs external programs on a remote host.
The SSH scheduler consumes two resources, ssh_cpus and ssh_memory, which
should be set according to the capabilities of the remote host.
[resources]
ssh_cpus=16
ssh_memory=32
[bioluigi.schedulers.ssh]
ssh_bin=ssh
remote=
port=
user=
identity_file=
extra_args=[]
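The settings above map naturally onto standard OpenSSH options. The sketch below shows one plausible way to assemble the command line from them; the function `build_ssh_command` is hypothetical, the mapping of port, user and identity_file to `-p`, `-l` and `-i` is an assumption rather than Bioluigi's documented behaviour, and extra_args is assumed to hold a JSON list, as its `[]` default suggests.

```python
import json
import shlex
from typing import Dict, List


def build_ssh_command(cfg: Dict[str, str], program: List[str]) -> List[str]:
    """Hypothetical: assemble an ssh argv from [bioluigi.schedulers.ssh]-style settings."""
    if not cfg.get("remote"):
        raise ValueError("a remote host must be configured")
    argv = [cfg.get("ssh_bin") or "ssh"]
    # Standard OpenSSH options; empty settings are simply skipped.
    if cfg.get("port"):
        argv += ["-p", cfg["port"]]
    if cfg.get("user"):
        argv += ["-l", cfg["user"]]
    if cfg.get("identity_file"):
        argv += ["-i", cfg["identity_file"]]
    # Assumption: extra_args holds a JSON list, as suggested by the `[]` default.
    argv += json.loads(cfg.get("extra_args") or "[]")
    # Options must precede the destination host.
    argv.append(cfg["remote"])
    # ssh passes the remote command to the remote shell as one string, so quote it.
    argv.append(shlex.join(program))
    return argv
```

For example, a port of 2222 and a program argument containing a space yield `['ssh', '-p', '2222', ..., "bcftools annotate 'my file.vcf'"]`; quoting on the local side keeps such arguments intact once the remote shell parses them.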
Slurm
The Slurm scheduler dispatches external programs on a Slurm cluster.
Unlike with the local scheduler, resources are not allocated through Luigi's
resource mechanism; the allocation is left to Slurm. Two Luigi resources are
consumed instead, slurm_jobs and slurm_cpus, which respectively limit how many
jobs and how many CPUs can be allocated on the cluster.
Note
Each scheduled external program occupies a mostly idle Luigi worker, so
achieving concurrency requires requesting many workers with the --workers
flag:
luigi --module tasks --workers 32 <task> <task_args>
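Taken together, the worker count and the two Slurm resources bound how many jobs can run at once. The arithmetic below is a sketch under stated assumptions: each job is assumed to hold one Luigi worker, one unit of slurm_jobs and `cpus_per_job` units of slurm_cpus for its whole duration, where `cpus_per_job` is a hypothetical per-program request.

```python
def max_concurrent_jobs(workers: int, slurm_jobs: int, slurm_cpus: int, cpus_per_job: int) -> int:
    """Upper bound on simultaneously running Slurm jobs under the assumptions above."""
    if cpus_per_job <= 0:
        raise ValueError("cpus_per_job must be positive")
    return min(workers, slurm_jobs, slurm_cpus // cpus_per_job)


# With --workers 32, slurm_jobs=32 and slurm_cpus=256:
print(max_concurrent_jobs(32, 32, 256, 8))   # 32: all limits coincide at 8 CPUs per job
print(max_concurrent_jobs(32, 32, 256, 16))  # 16: slurm_cpus becomes the binding limit
print(max_concurrent_jobs(4, 32, 256, 8))    # 4: too few workers, as the note warns
```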
[resources]
slurm_jobs=32
slurm_cpus=256
[bioluigi.schedulers.slurm]
srun_bin=srun
squeue_bin=squeue
partition=
extra_args=
track_job_status=False
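To make these settings concrete, the sketch below assembles an srun invocation from them using standard srun options (`--cpus-per-task`, `--mem`, `--partition`, `--job-name`). It is hypothetical and not Bioluigi's code: the function name, the per-program CPU and memory arguments, and the interpretation of memory as gigabytes are assumptions, as is treating extra_args as a shell-style string.

```python
import shlex
from typing import List, Optional


def build_srun_command(
    program: List[str],
    cpus: int,
    memory_gb: int,
    partition: str = "",
    extra_args: str = "",
    srun_bin: str = "srun",
    job_name: Optional[str] = None,
) -> List[str]:
    """Hypothetical: build an srun argv for one external program."""
    # Assumption: memory is expressed in gigabytes, hence the G suffix.
    argv = [srun_bin, f"--cpus-per-task={cpus}", f"--mem={memory_gb}G"]
    if partition:
        argv.append(f"--partition={partition}")
    if job_name:
        argv.append(f"--job-name={job_name}")
    # Assumption: extra_args is a shell-style string such as "--time=01:00:00".
    argv += shlex.split(extra_args)
    # srun runs everything after its own options as the job step.
    argv += program
    return argv
```

Passing the result to `subprocess.run` without a shell avoids a second round of quoting.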
Job Status Tracking
The Slurm scheduler supports job status tracking by polling squeue --json.
This feature is disabled by default because it appears to cause high CPU
utilization.
[bioluigi.schedulers.slurm]
track_job_status=True
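Polling of this kind amounts to repeatedly parsing the JSON that squeue emits and checking each job's state. The sketch below is a hedged illustration, not Bioluigi's implementation: it assumes the payload has a top-level "jobs" list whose entries carry "job_id" and "job_state", with "job_state" either a string or (in newer Slurm releases) a list of state flags, and the set of terminal states is a partial, assumed subset.

```python
import json
import time
from typing import Callable, Dict, Optional

# Assumption: a subset of Slurm's terminal job states, sufficient for this sketch.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMEOUT", "OUT_OF_MEMORY", "NODE_FAIL"}


def job_states(squeue_json: str) -> Dict[int, str]:
    """Map job IDs to their state in `squeue --json` output."""
    states: Dict[int, str] = {}
    for job in json.loads(squeue_json).get("jobs", []):
        state = job.get("job_state", "UNKNOWN")
        if isinstance(state, list):  # newer Slurm releases report a list of state flags
            state = state[0] if state else "UNKNOWN"
        states[int(job["job_id"])] = state
    return states


def wait_for(job_id: int, fetch: Callable[[], str], interval: float = 30.0) -> Optional[str]:
    """Poll until the job reaches a terminal state or disappears from the queue."""
    while True:
        state = job_states(fetch()).get(job_id)
        if state is None or state in TERMINAL_STATES:
            return state  # None means squeue no longer lists the job
        time.sleep(interval)
```

In practice, `fetch` could wrap `subprocess.run(["squeue", "--json"], capture_output=True, text=True, check=True).stdout`; it is injected here so the logic can be exercised without a cluster. Each poll runs and parses a full squeue query, so the interval directly controls how much work polling generates.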