Getting started with Bioluigi

The first step is to install Bioluigi package in your current environment with pip.

pip install bioluigi

Then, we need to create a luigi.cfg file to indicate the location of the tools that will be invoked. You can omit this step if they are defined in your $PATH.

[bioluigi]
cutadapt_bin=cutadapt
star_bin=STAR
rsem_dir=

Now, let’s setup a simple single read RNA-Seq pipeline. For this use case, we will first trim our single-end reads and then align them on a human reference genome.

Note that rsem.CalculateExpression depends on rsem.PrepareReference so we’ll also need to pass the arguments necessary to generate the reference genome index. The index will be generated once and reused for subsequent tasks.

import datetime
import os

import luigi
from import bioluigi.tasks import cutadapt, rsem

def QuantifySample(luigi.Task):
    sample_id = luigi.Parameter(description='Sample identifier')

    def input(self):
        return luigi.LocalTarget('{}.fastq.gz'.format(self.sample_id))

    def run(self):
        sample = self.input()

        trimmed_reads = yield cutadapt.TrimReads(sample.path,
            adapter_3prime='ACGTAGCGAGA...')

        isoform_expr, genes_expr = yield rsem.CalculateExpression('genomes/hg38_ensembl98/annotation.gtf',
            ['genomes/hg38_ensembl98/primary_assembly.fa'],
            [trimmed_reads.path],
            'references/hg38_ensembl98',
            self.sample_id,
            aligner='star',
            walltime=datetime.timedelta(hours=4),
            cpus=8,
            memory=32)

    def output(self):
        return luigi.LocalTarget('{}.genes.results'.format(self.sample_id))

The important features to notice here are walltime, cpus and memory parameters. When run with a supporting scheduler, all the tasks will be dispatched on the cluster with allocated resources.

To run our task on a given sample:

luigi --module tasks QuantifySample --sample-id SRR...

To see more advanced usage of Bioluigi, take a look at our RNA-Seq pipeline.