Getting started with Bioluigi
The first step is to install the Bioluigi package in your current environment with pip.
pip install bioluigi
Then, we need to create a luigi.cfg file to indicate the location of the tools that will be invoked. You can omit this step if they are available on your $PATH.
[bioluigi]
cutadapt_bin=cutadapt
star_bin=STAR
rsem_dir=
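Before launching a pipeline, it can help to confirm that the configured executables actually resolve. The following is a minimal sketch using only the Python standard library; `find_missing_tools` is a hypothetical helper written for this example and is not part of Bioluigi. It assumes that options ending in `_bin` name an executable, as in the luigi.cfg shown above.

```python
import configparser
import shutil

# Same content as the luigi.cfg shown above.
CONFIG_TEXT = """
[bioluigi]
cutadapt_bin=cutadapt
star_bin=STAR
rsem_dir=
"""


def find_missing_tools(config_text, section="bioluigi"):
    """Return the *_bin options whose executable cannot be found.

    Hypothetical helper for illustration; not part of Bioluigi.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    missing = []
    for option, value in parser.items(section):
        # Only *_bin options name an executable; *_dir options name a directory.
        if option.endswith("_bin") and shutil.which(value) is None:
            missing.append(option)
    return missing


if __name__ == "__main__":
    # Prints the options that do not resolve in the current environment.
    print(find_missing_tools(CONFIG_TEXT))
```

An empty list means every configured binary was found, either as an absolute path or through $PATH.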
Now, let’s set up a simple single-end RNA-Seq pipeline. For this use case, we will first trim our single-end reads and then align them to a human reference genome.
Note that rsem.CalculateExpression depends on rsem.PrepareReference, so we’ll also need to pass the arguments necessary to generate the reference genome index. The index will be generated once and reused for subsequent tasks.
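The "generated once and reused" behaviour follows from how Luigi decides whether a task needs to run: by default, a task counts as complete when its output already exists. The sketch below imitates that idea without Luigi, using only the standard library. The `prepare_reference` function and the `index.done` marker file are hypothetical stand-ins for this example, not what rsem.PrepareReference actually writes.

```python
import os
import tempfile


def prepare_reference(index_dir):
    """Build the index only if its marker file is missing.

    Returns True if work was done, False if an existing index was reused.
    Hypothetical stand-in for an expensive indexing task.
    """
    marker = os.path.join(index_dir, "index.done")  # hypothetical marker file
    if os.path.exists(marker):
        return False  # output already exists, so skip the work
    os.makedirs(index_dir, exist_ok=True)
    # The expensive indexing step would run here.
    with open(marker, "w") as handle:
        handle.write("ok\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        index_dir = os.path.join(workdir, "references", "hg38_ensembl98")
        print(prepare_reference(index_dir))  # True: first call builds the index
        print(prepare_reference(index_dir))  # False: second call reuses it
```

In practice this means every sample processed against the same reference directory shares a single index build.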
import datetime
import os

import luigi
from bioluigi.tasks import cutadapt, rsem


class QuantifySample(luigi.Task):
    sample_id = luigi.Parameter(description='Sample identifier')

    def input(self):
        # Raw single-end reads for this sample
        return luigi.LocalTarget('{}.fastq.gz'.format(self.sample_id))

    def run(self):
        sample = self.input()

        # Trim the 3' adapter; yielding a task makes it a dynamic dependency
        trimmed_reads = yield cutadapt.TrimReads(sample.path,
                                                 adapter_3prime='ACGTAGCGAGA...')

        # Quantify expression; the reference index is built on first use
        isoform_expr, genes_expr = yield rsem.CalculateExpression('genomes/hg38_ensembl98/annotation.gtf',
                                                                  ['genomes/hg38_ensembl98/primary_assembly.fa'],
                                                                  [trimmed_reads.path],
                                                                  'references/hg38_ensembl98',
                                                                  self.sample_id,
                                                                  aligner='star',
                                                                  walltime=datetime.timedelta(hours=4),
                                                                  cpus=8,
                                                                  memory=32)

    def output(self):
        return luigi.LocalTarget('{}.genes.results'.format(self.sample_id))
The important features to notice here are the walltime, cpus and memory parameters. When run with a supporting scheduler, all the tasks will be dispatched on the cluster with allocated resources.
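To make the idea of "allocated resources" concrete, here is a rough sketch of how such values could be turned into submission flags for a cluster scheduler. It assumes Slurm's sbatch as the scheduler and assumes memory is expressed in gigabytes. Both are assumptions made for this example, and `sbatch_flags` is a hypothetical helper, not Bioluigi's own implementation.

```python
import datetime


def sbatch_flags(walltime, cpus, memory_gb):
    """Translate resource requests into sbatch command-line flags.

    Hypothetical helper; assumes Slurm and memory given in gigabytes.
    """
    total_seconds = int(walltime.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return [
        # Slurm accepts time limits in hours:minutes:seconds form
        "--time={:d}:{:02d}:{:02d}".format(hours, minutes, seconds),
        "--cpus-per-task={:d}".format(cpus),
        "--mem={:d}G".format(memory_gb),
    ]


if __name__ == "__main__":
    # Same resource values as the CalculateExpression call above.
    print(sbatch_flags(datetime.timedelta(hours=4), cpus=8, memory_gb=32))
    # ['--time=4:00:00', '--cpus-per-task=8', '--mem=32G']
```

Declaring resources on the task rather than in a submission script keeps each step's requirements next to the code that needs them.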
To run our task on a given sample (this assumes the code above is saved as tasks.py, which is what --module tasks refers to):
luigi --module tasks QuantifySample --sample-id SRR...
To see more advanced usage of Bioluigi, take a look at our RNA-Seq pipeline.