Introduction

Aozan is a tool that automatically handles the data generated by Illumina sequencers, from the end of sequencing to demultiplexing, while also performing quality control. One of the greatest strengths of Aozan is that it does not require any user action to process data.

Each step of the post-sequencing data processing is rather easy to perform. However, each step (data transfer, demultiplexing and quality control) takes a long time, and before the data are ready for analysis, the user must watch for the end of each step to avoid wasting time. Executing these tasks after each sequencing run is laborious. Aozan saves time by automating all of them. In addition, Aozan provides a Bcl2fastq CSV samplesheet generator that takes an XLS or XLSX file as input, to avoid common syntax errors in the CSV file and to allow the use of aliases for the index sequences. This online tool is available here.

Principles

Aozan is not an interactive tool: it communicates with users through emails. It is launched regularly (usually every hour) through a cron job. There are 6 steps in Aozan. Once the end of a run has been discovered, synchronization, demultiplexing, recompression and quality control are executed automatically. However, if the end of another run is discovered at the end of one of these last 4 steps, the synchronization of the new run is launched before the analysis of the previous run resumes.

The 6 native steps of Aozan are:

  1. New run discovering step
  2. End run discovering step
  3. Synchronization step (Optional)
  4. Demultiplexing step
  5. Recompression step (Optional)
  6. Quality control step

Aozan demo installation script

To simplify the installation and configuration of Aozan, we provide a shell script that creates all the directories required by Aozan and a valid configuration file for your system. This script can also download all the files required by the demo (Aozan, raw data and reference data). However, you still need to install the Aozan requirements.

The script is available in the example data section of the documentation.

Requirements

To run Aozan, you need to install the following software:

  • Java 7 or above (tested with Oracle JRE and OpenJDK)
  • bcl2fastq 2 (tested with bcl2fastq 2.17.1.14 and 2.18.0.12)
  • rsync 3.0.x or later

On Debian/Ubuntu, you can install the requirements (except Bcl2fastq) with the 'apt-get' command. Here is an example:

$ sudo apt-get install openjdk-7-jre-headless rsync
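
Before going further, it can be handy to check that the required programs are reachable from the PATH. The following lines are only a sketch: the function name missing_tools is made up for this example, and the check only looks for the presence of each program, not for its version.

```shell
#!/bin/bash

# Print the name of every command given as argument that is not found in the PATH.
# The function name is illustrative, it is not part of Aozan.
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Programs listed in the requirements above
MISSING=$(missing_tools java bcl2fastq rsync)

if [ -n "$MISSING" ]; then
  echo "Missing requirements:" $MISSING
else
  echo "All requirements found"
fi
```

Since bcl2fastq is installed separately (see the next section), it is expected to be reported as missing until that installation is done.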

Installing bcl2fastq2

The Bcl2FastQ conversion software is a tool that handles BCL conversion and demultiplexing of both unzipped and zipped BCL files. bcl2fastq 2 can be downloaded from the Illumina website here.

On CentOS, you can install Bcl2fastq using the following commands:

$ cd /tmp
$ wget http://support.illumina.com/content/dam/illumina-support/documents/downloads/software/bcl2fastq/bcl2fastq2-v2-18-0-12-linux-x86-64.zip

# Install
$ unzip bcl2fastq2-*.zip
$ yum -y --nogpgcheck localinstall /tmp/bcl2fastq2-*.rpm

# Work around an occasional error when searching for the CSS file used to create the final HTML report
$ cd /usr/local/bin
$ ln -s ../share/

# Install required dependencies
$ yum install -y zip.x86_64

As Bcl2fastq 2 is a static binary, you can also use the RPM package on Debian/Ubuntu using the following commands:

$ cd /tmp
$ wget http://support.illumina.com/content/dam/illumina-support/documents/downloads/software/bcl2fastq/bcl2fastq2-v2-18-0-12-linux-x86-64.zip
$ unzip bcl2fastq2-*.zip
$ alien -i bcl2fastq2-*.rpm

Installing ncbi-blast+ (optional requirement for quality control)

In Aozan, the output of the "Overrepresented Sequences" module from FastQC has been improved. For sequences labelled as "No hit", we launch a blast on the NR databank and report its best hit. This greatly helps for the discovery of contaminating sequences.

Aozan can use Blast2 or Blast+ to perform the blast.

To install ncbi-blast+ on your system (Debian or Ubuntu), use the following command line:

$ sudo apt-get install ncbi-blast+

Now download the required "nt" database from NCBI:

$ wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/nt.??.tar.gz*

Extract all the archives. The first one, nt.00.tar.gz, contains the file nt.nal.

To keep the database up to date, use the Perl script provided by NCBI; see the NCBI documentation for details.
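
The extraction of the downloaded archives can be scripted. Here is a minimal sketch, assuming that all the archives were downloaded in a single directory and that tar with gzip support is available. The function name extract_nt_archives and the path in the usage comment are made up for this example.

```shell
#!/bin/bash

# Extract every nt.??.tar.gz archive found in the directory given as argument.
# Archives are extracted in place, next to the downloaded files.
extract_nt_archives() {
  local db_dir="$1"
  local archive
  for archive in "$db_dir"/nt.??.tar.gz; do
    # Skip the unexpanded pattern when no archive matches
    [ -e "$archive" ] || continue
    tar -xzf "$archive" -C "$db_dir" || return 1
  done
}

# Usage (the path is an assumption, adapt it to your system):
# extract_nt_archives /mnt/storage/blast/nt
```

The pattern only matches the archives themselves, so any other file fetched by the wget command above (whose pattern ends with a wildcard) is left untouched.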

Installation

Installing Aozan is very easy, you just have to uncompress the archive:

$ tar xzf aozan-2.0.tar.gz

Aozan is written in Python and Java. It uses the Java implementation of Python (Jython) that is bundled in Aozan.

Installing using Docker

Aozan and its dependencies are available through Docker images. You can:

  • Use a Docker image with Aozan and all its optional dependencies
  • Use Docker only for Bcl2fastq or blast executables

To see how to install Docker on your system, go to the Docker website. Even if Docker can run in virtual machines on Windows or macOS, we recommend running Aozan only on a Linux host.

Aozan Docker image

You can use a Docker image with Aozan and all its optional dependencies (Bcl2fastq and Blast) instead of installing Aozan manually. This image is named genomicpariscentre/aozan:2.0. When you use this Docker image, you need to mount all the directories required by Aozan in the Docker container.
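
Mounting the directories is done with the -v option of docker run. The sketch below only builds and prints a candidate command line so that it can be reviewed first. It makes several assumptions: the directories are the sample values used in this documentation, each one is mounted at the same path inside the container, and the function name docker_mount_options is made up. The command to run inside the container and the location of the configuration file are not covered here.

```shell
#!/bin/bash

# Build "-v" bind mount options so that each host directory is visible
# at the same path inside the container.
docker_mount_options() {
  local dir
  for dir in "$@"; do
    printf -- '-v %s:%s ' "$dir" "$dir"
  done
}

# Directories taken from the sample values of this documentation
MOUNTS=$(docker_mount_options /var/lib/aozan /var/log/aozan \
  /mnt/hiseq01_f /mnt/hiseq01_g /mnt/storage/bcl /mnt/storage/fastq \
  /mnt/storage/reports /mnt/storage/samplesheet)

# Only print the command line, nothing is started
echo "docker run --rm $MOUNTS genomicpariscentre/aozan:2.0"
```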

Bcl2fastq and Blast Docker images

If you have installed Aozan manually, you can launch bcl2fastq and/or blast inside a Docker container. To do this, you only need to set the Aozan bcl2fastq.use.docker configuration property to True for bcl2fastq and qc.conf.fastqc.blast.use.docker to True for Blast. If you do not use the /var/run/docker.sock socket to communicate with the Docker daemon, you must change the value of the docker.uri setting in the Aozan configuration.
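
As an illustration, the two properties can be written as follows. This is a sketch that assumes one key=value pair per line and '#' comment lines in the configuration file. The settings are written to a scratch file to be reviewed and merged by hand into the real configuration. The docker.uri setting is left out because its value depends on your Docker setup.

```shell
#!/bin/bash

# Scratch file, not the real Aozan configuration
DOCKER_CONF=$(mktemp)

cat > "$DOCKER_CONF" <<'EOF'
# Run bcl2fastq inside a Docker container
bcl2fastq.use.docker=True
# Run Blast inside a Docker container
qc.conf.fastqc.blast.use.docker=True
EOF

cat "$DOCKER_CONF"
```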

Launching Aozan

Aozan is usually launched regularly as a cron job. However, Aozan can also be launched manually. In the following examples, Aozan is installed in /usr/local/aozan and the configuration file is /etc/aozan.conf. Note that it is better to configure your aozan.conf file before running Aozan.

The configuration file is a text file and parameters are key-value pairs. See the pages about steps for more details.

Launching Aozan manually

In this case, we can launch Aozan with the following command:

$ /usr/local/aozan/aozan.sh /etc/aozan.conf

Launching Aozan as cron job

In the following lines, we configure our system to launch Aozan every hour using a script named /etc/cron.hourly/aozan (on a Debian/Ubuntu GNU/Linux distribution).

#!/bin/bash

# User to use to launch Aozan
AOZAN_USER=nobody

# Path to Aozan base directory
AOZAN_DIR=/usr/local/aozan

# Path to Aozan configuration
AOZAN_CONF=/etc/aozan.conf

su $AOZAN_USER -c "$AOZAN_DIR/aozan.sh --quiet $AOZAN_CONF"

The --quiet option prevents a message from being displayed if another Aozan instance is currently running.

Then we set the permissions and the owner of the Aozan cron script:

$ sudo chmod 755 /etc/cron.hourly/aozan && sudo chown root:root /etc/cron.hourly/aozan
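
The case handled by --quiet, a previous instance still running when cron starts a new one, relies on a lock (see the lock.file setting in the general configuration). The following sketch is not the way Aozan implements its lock. It only illustrates the general idea with a directory-based lock, and all the names in it are made up for the example.

```shell
#!/bin/bash

# Try to take the lock: mkdir fails if the directory already exists,
# so only one caller can succeed.
acquire_lock() {
  mkdir "$1" 2>/dev/null
}

# Release the lock so that the next launch can proceed
release_lock() {
  rmdir "$1"
}

# Scratch location used for the demonstration
LOCK_DIR="$(mktemp -d)/demo.lock"

if acquire_lock "$LOCK_DIR"; then
  echo "Lock taken, the job can run"
  # A second attempt fails while the lock is held
  if ! acquire_lock "$LOCK_DIR"; then
    echo "Another instance is running, nothing to do"
  fi
  release_lock "$LOCK_DIR"
fi
```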

Accessing Sequencer Data

Aozan can handle several sequencers. For each instrument, you must give the Aozan computer access to the HiSeq output directories. On a HiSeq 2000/2500, 2 hard drives are dedicated to the flow cell slots (one per slot), so you must share each hard drive with the Aozan computer.

You can also choose to make the sequencer write its data directly to network storage such as a NAS. In this case, you must mount this network storage (preferably using a Unix network file system like NFS) on the computer where Aozan is installed.

Enable sharing on HiSeq computer

First, on the sequencer computer, share the hard drives that contain the generated data (usually F: and G:). To do this, open the file explorer, right-click on each hard drive and choose "Share...". The shares can be in read-only mode (recommended).

Security issues: we recommend sharing the sequencer output directories in read-only mode and restricting access to the shares to the Aozan computer. To do this, you can configure the Windows firewall.

Mount Windows shares on Linux

  • First install the tools for mounting Windows shares (CIFS):

    $ sudo apt-get install cifs-utils smbclient

  • Then, test if you can connect to the share:

    $ smbclient -U sbsuser '//hiseq01.example.com/F$'

  • And now you can add the following lines in /etc/fstab:

    //hiseq01.example.com/F$   /mnt/hiseq01_f    cifs    username=sbsuser,password=hiseqpassword       0       0
    //hiseq01.example.com/G$   /mnt/hiseq01_g    cifs    username=sbsuser,password=hiseqpassword       0       0

  • Now create the mount points and mount the shares:

    $ sudo mkdir -p /mnt/hiseq01_f /mnt/hiseq01_g && \
       sudo mount /mnt/hiseq01_f && \
       sudo mount /mnt/hiseq01_g

You can also use autofs to mount the share.
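
Once the shares are configured, a quick check that they are really mounted can avoid surprises. This is a sketch only: it assumes that the mountpoint command is installed, and the function name unmounted_dirs is made up for this example.

```shell
#!/bin/bash

# Print every directory given as argument that is not currently a mount point.
# A directory that does not exist is reported as well.
unmounted_dirs() {
  local dir
  for dir in "$@"; do
    if ! mountpoint -q "$dir" 2>/dev/null; then
      echo "$dir"
    fi
  done
}

# Mount points used in the fstab example of this documentation
NOT_MOUNTED=$(unmounted_dirs /mnt/hiseq01_f /mnt/hiseq01_g)

if [ -n "$NOT_MOUNTED" ]; then
  echo "Not mounted:" $NOT_MOUNTED
fi
```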

Directories required by Aozan

To work, Aozan needs the following directories. The path of these directories must be set in the Aozan configuration file.

An example of an Aozan configuration file can be found here.

| Aozan property | Sample value | Description |
|---|---|---|
| aozan.var.path (*) | /var/lib/aozan | Aozan internal data directory. It contains log files and the history of processed runs |
| aozan.log.path | /var/log/aozan | Path to the Aozan log file |
| hiseq.data.path | /mnt/hiseq01_f:/mnt/hiseq01_g | HiSeq output directories. Multiple values are allowed if there are several sequencers or 2 output directories for each flow cell of a HiSeq 2000 (paths separated by ':') |
| bcl.data.path | /mnt/storage/bcl | Sequencer output data after synchronization. Usually cif files are not copied in this directory |
| fastq.data.path | /mnt/storage/fastq | Directory for the output of demultiplexing with Bcl2fastq |
| reports.data.path | /mnt/storage/reports | Directory for the QC report |
| bcl2fastq.samplesheet.path | /mnt/storage/samplesheet | Directory with Bcl2fastq sample sheets for demultiplexing (with files named like samplesheet_INSTRUMENT-SN_RUN-NUMBER.xls where INSTRUMENT-SN is the instrument serial number and RUN-NUMBER is the run number, e.g. samplesheet_SNL125_0067.xls). If a custom script is used to generate CSV samplesheet files, this directory will not be used |
| tmp.path | /tmp | Temporary directory |
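
The sample values of the table above can be turned into a directory tree and a configuration fragment as sketched below. Several assumptions are made: the configuration file holds one key=value pair per line, everything is created under a scratch directory (ROOT) so that the sketch can be run without touching the real system, and the HiSeq output directories are mount points that are not created here.

```shell
#!/bin/bash

# Scratch root standing for "/" on the real system
ROOT=$(mktemp -d)

# Directories that Aozan writes to. aozan.log.path is described as a file path,
# so only its parent directory (/var/log) is created.
for dir in /var/lib/aozan /var/log /mnt/storage/bcl /mnt/storage/fastq \
           /mnt/storage/reports /mnt/storage/samplesheet; do
  mkdir -p "$ROOT$dir"
done

# Sample configuration fragment
cat > "$ROOT/aozan.conf" <<EOF
aozan.var.path=$ROOT/var/lib/aozan
aozan.log.path=$ROOT/var/log/aozan
hiseq.data.path=/mnt/hiseq01_f:/mnt/hiseq01_g
bcl.data.path=$ROOT/mnt/storage/bcl
fastq.data.path=$ROOT/mnt/storage/fastq
reports.data.path=$ROOT/mnt/storage/reports
bcl2fastq.samplesheet.path=$ROOT/mnt/storage/samplesheet
tmp.path=/tmp
EOF

cat "$ROOT/aozan.conf"
```

On a real installation, ROOT would be empty and the directories would have to be writable by the user that launches Aozan.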

(*) The directory specified in aozan.var.path contains the following files. Aozan can process several runs at the same time. At the end of a step, it adds the id of the run that has just been processed to the log file of that step.

  • first_base_report.done : list of the run ids processed by the first base report step;
  • hiseq.done : list of the run ids processed by the end run discovering step;
  • hiseq.deny : list of the run ids to not process (user created file);
  • sync.done : list of the run ids processed by the synchronization step;
  • sync.deny : list of the run ids that the synchronization step must not process;
  • recompress.done : list of the run ids processed by the recompression step;
  • demux.done : list of the run ids processed by the demultiplexing step;
  • qc.done : list of the run ids processed by the quality control step;
The following files can also be created to set priority or disable specific runs.
  • runs.priority : list of the run ids processed in priority by aozan when available;
  • [step].deny : list of the run ids that won't be processed by a step;
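
These files can be inspected or edited with standard tools. The sketch below assumes that each file contains one run id per line, which should be checked against the files of a real installation. The function names and the run id are made up for the example, and a scratch directory stands for aozan.var.path.

```shell
#!/bin/bash

# Return success when the run id is listed in the given file.
run_is_listed() {
  local run_id="$1" list_file="$2"
  [ -f "$list_file" ] && grep -qxF "$run_id" "$list_file"
}

# Add a run id to a list file (for example hiseq.deny) unless already present.
list_run() {
  local run_id="$1" list_file="$2"
  run_is_listed "$run_id" "$list_file" || echo "$run_id" >> "$list_file"
}

# Demonstration in a scratch directory standing for aozan.var.path
VAR_PATH=$(mktemp -d)
list_run 150101_SNL125_0067_AC0EXAMPLE "$VAR_PATH/hiseq.deny"
list_run 150101_SNL125_0067_AC0EXAMPLE "$VAR_PATH/hiseq.deny"

if run_is_listed 150101_SNL125_0067_AC0EXAMPLE "$VAR_PATH/hiseq.deny"; then
  echo "Run listed in hiseq.deny"
fi
```

Calling list_run twice with the same run id leaves a single entry in the file.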

Aozan general configuration

This section describes the Aozan global configuration settings. For the settings of each step, see the documentation of the steps.

An example of an Aozan configuration file is available here.

General configuration

| Aozan property | Type | Default value | Description |
|---|---|---|---|
| include | string | Not set | Load the configuration entries from another configuration file path. The values loaded from this new configuration file override existing values |
| aozan.enable | boolean | False | Enable Aozan |
| aozan.log.level | string | INFO | Log level (ALL, FINEST, FINER, FINE, CONFIG, INFO, WARNING, SEVERE, OFF) |
| aozan.debug | boolean | False | Enable debug mode |
| lock.file | string | /var/lock/aozan.lock | Aozan lock file path. This file prevents two instances of Aozan from running at the same time |
| index.html.template | string | Not set | HTML page template that describes a run. If not set, the default template included in the Aozan jar file will be used |
| reports.url | string | Not set | Run reports URL |
| hiseq.critical.min.space | integer | 1099511627776 | Threshold below which an email is sent at each Aozan start because not enough space is available on the HiSeq output disk. The default value corresponds to 1 TiB (1024^4 bytes) |
| read.only.output.files | boolean | True | Set rights of output files to read only |
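
The default value of hiseq.critical.min.space is 1024 x 1024 x 1024 x 1024 bytes. The sketch below shows how such a value can be computed and compared with the space available on a file system. It is not the check performed by Aozan itself: the function name available_bytes is made up, and /tmp is used only because it exists on most systems.

```shell
#!/bin/bash

# Default threshold: 1024^4 bytes
DEFAULT_THRESHOLD=$((1024 * 1024 * 1024 * 1024))
echo "$DEFAULT_THRESHOLD"

# Available space in bytes on the file system that holds the given path.
# df -Pk reports the available space in 1024-byte blocks in the 4th column.
available_bytes() {
  local kb
  kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
  echo $((kb * 1024))
}

# Example: is less than the threshold available on /tmp?
if [ "$(available_bytes /tmp)" -lt "$DEFAULT_THRESHOLD" ]; then
  echo "Less than the threshold is available on /tmp"
fi
```

To use another threshold, for example 2 TiB, multiply the default value by 2 and set the result in hiseq.critical.min.space.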

Mail configuration

Email is the only means for Aozan to inform users. This section shows how to configure Aozan email sending. Aozan currently only supports sending email using SMTP without authentication and encryption.

| Aozan property | Type | Default value | Description |
|---|---|---|---|
| send.mail | boolean | False | Enable sending email |
| smtp.server | string | Not set | SMTP server address |
| smtp.port | integer | 25 (465 if SSL is enabled) | SMTP server port |
| smtp.use.starttls | boolean | False | Use StartTLS to connect to the SMTP server |
| smtp.use.ssl | boolean | False | Use SSL to connect to the SMTP server |
| smtp.login | string | Not set | Login to use for the connection to the SMTP server |
| smtp.password | string | Not set | Password to use for the connection to the SMTP server |
| mail.from | string | Not set | Email of the sender |
| mail.to | string | Not set | Email recipient |
| mail.error.to | string | Not set | Email recipient when an error occurs during Aozan |
| mail.header | string | THIS IS AN AUTOMATED MESSAGE.\n\n | Email header |
| mail.footer | string | \n\nThe Aozan team.\n | Email footer |
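
A minimal mail configuration could look like the fragment written by the sketch below. It assumes one key=value pair per line. The server name and the email addresses are placeholders, and the fragment is written to a scratch file to be reviewed and merged by hand into the real configuration.

```shell
#!/bin/bash

# Scratch file, not the real Aozan configuration
MAIL_CONF=$(mktemp)

cat > "$MAIL_CONF" <<'EOF'
send.mail=True
smtp.server=smtp.example.com
smtp.port=25
mail.from=aozan@example.com
mail.to=sequencing-team@example.com
mail.error.to=admin@example.com
EOF

cat "$MAIL_CONF"
```

The properties that are not listed (header, footer, login and so on) keep the default values given in the table.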