Writing an Aozan plug-in

Requirements

To develop an Aozan plugin, you need:

  • A 64-bit Linux (x86-64) distribution (e.g. Ubuntu, Debian, Fedora...). You can develop on other operating systems, but you will not be able to run Aozan on them
  • A Java 7 SDK (Oracle JDK or OpenJDK are recommended)
  • Maven 3.2.x
  • An IDE like Eclipse or IntelliJ IDEA (optional)
  • An Internet connection

If you use Ubuntu 14.04 (Trusty Tahr), you can install all the requirements with the following command:

$ sudo apt-get install openjdk-7-jdk maven eclipse-jdt

Creation of the project

Maven simplifies the management of project dependencies, which is why this example uses Maven to build the project. Using Maven is not mandatory, but building the plug-in is considerably harder without it.

  • First we generate the skeleton of our plugin with Maven. NB: pom.xml should not exist in the current folder.
  • $ mvn archetype:generate \
      -DarchetypeGroupId=org.apache.maven.archetypes \
      -DarchetypeArtifactId=maven-archetype-quickstart \
      -DgroupId=com.example  \
      -DartifactId=myaozanplugin \
      -Dversion=0.1-alpha-1 \
      -Durl=http://example.com/aozanplugin \
      -DinteractiveMode=false
    
  • You will obtain the following files. The sample App.java and AppTest.java files are not used in your plug-in. You can remove them, but keep the com.example package folders.
  • myaozanplugin
    |-- pom.xml
    `-- src
        |-- main
        |   `-- java
        |       `-- com
        |           `-- example
        |               `-- App.java
        `-- test
            `-- java
                `-- com
                    `-- example
                        `-- AppTest.java
     
  • Next, edit the pom.xml at the root of the project to add the Aozan dependency and the ENS repository where the Aozan dependency is available:
  •   <repositories>
        <repository>
          <snapshots>
            <enabled>true</enabled>
          </snapshots>
          <id>ens</id>
          <name>ENS repository</name>
          <url>http://outils.genomique.biologie.ens.fr/maven2</url>
        </repository>
      </repositories>
    
      <dependencies>
        <dependency>
          <groupId>fr.ens.biologie.genomique</groupId>
          <artifactId>aozan</artifactId>
          <version>2.0</version>
          <scope>compile</scope>
        </dependency>
        <dependency>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
          <version>3.8.1</version>
          <scope>test</scope>
        </dependency>
    
        <!-- Add specific library needed for plugin -->
        <dependency>
          <!-- jsoup HTML parser library @ http://jsoup.org/ -->
          <groupId>org.jsoup</groupId>
          <artifactId>jsoup</artifactId>
          <version>1.7.3</version>
        </dependency>
      </dependencies>
    
  • Also add a build section to the pom.xml to set the compilation level to Java 1.7 and to declare the paths of the Java resources. The src/main/java/files folder is where you put resource files, and the src/main/java/META-INF directory holds the metadata of your plug-in.
  •   <build>
        <resources>
          <resource>
            <directory>src/main/java/files</directory>
          </resource>
          <resource>
            <directory>src/main/java/META-INF</directory>
            <targetPath>META-INF</targetPath>
          </resource>
        </resources>
        <plugins>
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
              <source>1.7</source>
              <target>1.7</target>
            </configuration>
          </plugin>
        </plugins>
      </build>
    
  • Now you can generate an Eclipse project with:
  • $ mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true
    
  • A warning message may appear if some source or javadoc dependencies cannot be found.
  • To import the project into Eclipse, go to File > Import... > General > Existing Projects into Workspace and select the root directory of the myaozanplugin project. Click the Finish button to import myaozanplugin into the Eclipse workspace.

Coding the plug-in

As an example, we show below a plug-in that extracts the focus score value for each base (A, T, G, C) from the HTML first base report file and checks whether the collected values fall within an interval defined by the user in the Aozan configuration file.

Warning: This plug-in will not work with the latest Illumina sequencers (e.g. HiSeq 3000/4000, NextSeq 500...) as the First_Base_Report.htm file is no longer generated by these sequencers.

  • In the package com.example, create a class named FocusScoreCollector that implements the Collector interface. This class collects the focus score data from the first base report.
  • package com.example;
    
    import java.io.File;
    import java.io.IOException;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    
    import fr.ens.biologie.genomique.aozan.AozanException;
    import fr.ens.biologie.genomique.aozan.Globals;
    import fr.ens.biologie.genomique.aozan.QC;
    import fr.ens.biologie.genomique.aozan.RunData;
    import fr.ens.biologie.genomique.aozan.collectors.Collector;
    import fr.ens.biologie.genomique.aozan.collectors.CollectorConfiguration;
    import fr.ens.biologie.genomique.aozan.collectors.RunInfoCollector;
    
    public class FocusScoreCollector implements Collector {
    
      /** The collector name. */
      public static final String COLLECTOR_NAME = "focusscore";
      public static final String PREFIX_DATA = "focus.score.firstbasereport";
    
      /** Bases authorized. */
      private static final List<String> BASES = Arrays.asList("A", "T", "G",
          "C");
    
      private File firstBaseReport;
    
      private String qcReportRunPath;
    
      @Override
      public String getName() {
    
        return COLLECTOR_NAME;
      }
    
      @Override
      public List<String> getCollectorsNamesRequiered() {
        return Collections.singletonList(RunInfoCollector.COLLECTOR_NAME);
      }
    
      @Override
      public void configure(QC qc, CollectorConfiguration conf) {
    
        this.qcReportRunPath = qc.getQcDir().getPath();
    
      }
    
      @Override
      public void collect(RunData data) throws AozanException {
        final String runId = data.get("run.info.run.id");
    
        // Path to the first base report file
        this.firstBaseReport =
            new File(this.qcReportRunPath + "/../report_" + runId,
                "First_Base_Report.htm");
    
        if (!this.firstBaseReport.exists()) {
          throw new AozanException("Fail of collector "
              + getName() + ": First base report not found at "
              + this.firstBaseReport.getAbsolutePath());
        }
    
        // Collect data in first base report
        Document doc = null;
        try {
          doc = Jsoup.parse(firstBaseReport, Globals.DEFAULT_FILE_ENCODING.name());
          parse(doc, data);
    
        } catch (IOException e1) {
          throw new AozanException(e1);
        }
      }
    
      @Override
      public void clear() {
    
      }
    
      //
      // Private methods
      //
    
      /**
       * Parse first base report html file
       * @param doc document represent html file
       * @param data result data object
       * @throws AozanException
       */
      private void parse(final Document doc, final RunData data)
          throws AozanException {
    
        final String endFirstColumnName = "Focus Score";
        final int laneCount = data.getInt("run.info.flow.cell.lane.count");
    
        // Summary focus score per lane
        final Map<Integer, Double> scoresPerLane =
            new HashMap<>(laneCount);
    
        // Init map, all values at 0
        for (int i = 1; i <= laneCount; i++)
          scoresPerLane.put(i, 0.0);
    
        // Summary focus score per base for all lanes
        final Map<String, Double> scoresPerBase =
            new HashMap<>(4);
    
        // Init map all values at 0
        for (String b : BASES)
          scoresPerBase.put(b, 0.0);
    
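        // Parse the two tables (bottom and top)
        for (Element table : doc.select("table")) {
          for (Element row : table.select("tr")) {
            // A row without any <td> cell (e.g. a header row) yields null
            final Element firstCell = row.select("td").first();
            if (firstCell != null
                && firstCell.text().endsWith(endFirstColumnName)) {
              parseLane(row, scoresPerLane, scoresPerBase);
            }
          }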
        }
    
        // Save in run data
        writeRundata(data, scoresPerLane, scoresPerBase);
      }
    
      /**
       * Parse a row table for focus score and save scores for each lane and each
       * base
       * @param focusRow contains row table from first base report html file
       * @param scoresPerLane map to save sum scores per lane
       * @param scoresPerBase map to save sum scores per base
       * @throws AozanException occurs if base is unknown or if the conversion score
       *           in double fails.
       */
      private void parseLane(final Element focusRow,
          final Map<Integer, Double> scoresPerLane,
          final Map<String, Double> scoresPerBase) throws AozanException {
    
        int currentLaneNumber = 0;
        boolean first = true;
        String base = "";
    
        // Parse elements of a row
        for (Element col : focusRow.select("td")) {
          // Skip name line
          if (first) {
            base =
                Character.toString(col.text().charAt(0)).toUpperCase(
                    Globals.DEFAULT_LOCALE);
    
            first = false;
    
            if (!BASES.contains(base))
              throw new AozanException("Collector "
                  + getName() + ": focus base unknown " + base);
    
          } else {
    
            // Retrieve focus score
            final Double value = Double.parseDouble(col.text());
    
            if (value < 0.0)
              throw new AozanException("Collector "
                  + getName() + ": focus score invalid " + value);
    
            // Update map
            currentLaneNumber++;
            scoresPerLane.put(currentLaneNumber,
                scoresPerLane.get(currentLaneNumber) + value);
    
            scoresPerBase.put(base, scoresPerBase.get(base) + value);
          }
        }
      }
    
      /**
       * Update run data
       * @param data result data object
       * @param scoresPerLane map to save sum scores per lane
       * @param scoresPerBase map to save sum scores per base
       */
      private void writeRundata(RunData data, Map<Integer, Double> scoresPerLane,
          Map<String, Double> scoresPerBase) {
    
        // Save the mean focus score of each lane: a lane sum covers the
        // 4 bases in each of the 2 tables (top and bottom)
        for (Map.Entry<Integer, Double> entry : scoresPerLane.entrySet()) {
          double val = entry.getValue().doubleValue() / (4.0 * 2);
          data.put(PREFIX_DATA + ".lane" + entry.getKey(),
              String.format("%.2f", val));
        }
    
        // Save the mean focus score of each base: a base sum covers all
        // lanes in each of the 2 tables
        for (Map.Entry<String, Double> entry : scoresPerBase.entrySet()) {
          double val = entry.getValue().doubleValue() / (scoresPerLane.size() * 2);
          data.put(PREFIX_DATA + ".run.base" + entry.getKey(),
              String.format("%.2f", val));
        }
    
      }
    }
    
  • The second class to create in the package com.example is FocusScoreLaneTest, which extends AbstractSimpleLaneTest. This class defines a test that checks whether the focus values fall within the interval defined by the user in the Aozan configuration file. The last section of this page shows how to enable this test.
  • package com.example;
    
    import java.util.Arrays;
    import java.util.List;
    
    import fr.ens.biologie.genomique.aozan.collectors.RunInfoCollector;
    import fr.ens.biologie.genomique.aozan.tests.lane.AbstractSimpleLaneTest;
    import fr.ens.biologie.genomique.aozan.util.ScoreInterval;
    
    public class FocusScoreLaneTest extends AbstractSimpleLaneTest {
    
      private final ScoreInterval interval = new ScoreInterval();
    
      @Override
      protected String getKey(final int read, final boolean indexedRead, final int lane) {
        return FocusScoreCollector.PREFIX_DATA + ".lane" + lane;
      }
    
      @Override
      protected Class<?> getValueType() {
    
        return Double.class;
      }
    
      @Override
      public List<String> getCollectorsNamesRequiered() {
        return Arrays.asList(RunInfoCollector.COLLECTOR_NAME,
            FocusScoreCollector.COLLECTOR_NAME);
      }
    
      /**
       * Public constructor.
       */
      public FocusScoreLaneTest() {
        super("lane.focus.score", "from first base report",
            "Focus Score (first estimation)");
      }
    }
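
The arithmetic performed by FocusScoreCollector can be reproduced without jsoup or Aozan. The following is a minimal, self-contained sketch and not the plug-in code itself: the class FocusScoreMeanSketch, its helper methods and the sample values are all hypothetical, and each table row is assumed to be already split into a label cell followed by one cell per lane. It illustrates why a lane sum is divided by 4 bases x 2 tables, and a base sum by the lane count x 2 tables.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-alone sketch of the averaging done by the collector. */
class FocusScoreMeanSketch {

  static final int BASE_COUNT = 4;  // A, T, G, C
  static final int TABLE_COUNT = 2; // top and bottom tables of the report

  /** Invented sample rows: label cell first, then one value per lane. */
  static List<String[]> sampleRows() {
    List<String[]> rows = new ArrayList<>();
    for (String base : new String[] {"A", "T", "G", "C"}) {
      rows.add(new String[] {base + " Focus Score", "80", "90"}); // top table
      rows.add(new String[] {base + " Focus Score", "84", "94"}); // bottom table
    }
    return rows;
  }

  /** Sums the values of every row per lane (lane numbers start at 1). */
  static Map<Integer, Double> sumPerLane(List<String[]> rows) {
    Map<Integer, Double> sums = new LinkedHashMap<>();
    for (String[] row : rows) {
      for (int lane = 1; lane < row.length; lane++) {
        sums.merge(lane, Double.parseDouble(row[lane]), Double::sum);
      }
    }
    return sums;
  }

  /** Sums the values of every row per base (first letter of the label). */
  static Map<String, Double> sumPerBase(List<String[]> rows) {
    Map<String, Double> sums = new LinkedHashMap<>();
    for (String[] row : rows) {
      String base = row[0].substring(0, 1).toUpperCase(Locale.ROOT);
      for (int lane = 1; lane < row.length; lane++) {
        sums.merge(base, Double.parseDouble(row[lane]), Double::sum);
      }
    }
    return sums;
  }

  /** Mean of one lane: its sum covers 4 bases in each of the 2 tables. */
  static double laneMean(double laneSum) {
    return laneSum / (BASE_COUNT * TABLE_COUNT);
  }

  /** Mean of one base: its sum covers every lane in each of the 2 tables. */
  static double baseMean(double baseSum, int laneCount) {
    return baseSum / (laneCount * TABLE_COUNT);
  }

  public static void main(String[] args) {
    List<String[]> rows = sampleRows();
    Map<Integer, Double> perLane = sumPerLane(rows);
    Map<String, Double> perBase = sumPerBase(rows);

    for (Map.Entry<Integer, Double> e : perLane.entrySet()) {
      System.out.println(String.format(Locale.ROOT, "lane%d=%.2f",
          e.getKey(), laneMean(e.getValue())));
    }
    for (Map.Entry<String, Double> e : perBase.entrySet()) {
      System.out.println(String.format(Locale.ROOT, "base%s=%.2f",
          e.getKey(), baseMean(e.getValue(), perLane.size())));
    }
  }
}
```

With these invented values, lane 1 averages 82.00, lane 2 averages 92.00, and every base averages 87.00. The sketch formats with Locale.ROOT so that the decimal separator is always a dot, whatever the system locale.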
    

Register the plugin

Like many Java components (JDBC, JCE, JNDI...), Aozan uses the Service Provider Interface (SPI) mechanism for its plug-in system. A functional SPI plug-in needs two things: a class that implements an interface (here FocusScoreCollector implements the Collector interface and FocusScoreLaneTest implements the AozanTest interface), and a declaration of that implementation in the metadata. To register your collector and your test in the metadata, use the following commands:

$ mkdir -p src/main/java/META-INF/services
$ echo com.example.FocusScoreCollector > src/main/java/META-INF/services/fr.ens.biologie.genomique.aozan.collectors.Collector
$ echo com.example.FocusScoreLaneTest > src/main/java/META-INF/services/fr.ens.biologie.genomique.aozan.tests.AozanTest
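
Files under META-INF/services, such as the two created above, are what the JDK's java.util.ServiceLoader reads to find implementations at run time. How Aozan performs this lookup internally is not shown on this page, so the following is only a sketch of the general SPI mechanism, assuming a ServiceLoader-based discovery. The SpiDiscoverySketch class and its NamedPlugin interface are hypothetical stand-ins for Aozan's own types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical sketch of SPI discovery with java.util.ServiceLoader. */
class SpiDiscoverySketch {

  /** Hypothetical plug-in interface, standing in for Aozan's Collector. */
  public interface NamedPlugin {
    String getName();
  }

  /**
   * Returns the names of all NamedPlugin implementations declared in
   * META-INF/services files visible on the classpath.
   */
  static List<String> discoverNames() {
    List<String> names = new ArrayList<>();
    // ServiceLoader instantiates each declared class lazily while iterating
    for (NamedPlugin plugin : ServiceLoader.load(NamedPlugin.class)) {
      names.add(plugin.getName());
    }
    return names;
  }

  public static void main(String[] args) {
    System.out.println("Discovered plug-ins: " + discoverNames());
  }
}
```

Run as is, with no provider file on the classpath, the list is empty. Adding a file named after the interface's binary name under META-INF/services, containing the fully qualified name of an implementing class with a public no-argument constructor, makes that class appear in the list. This is why the plug-in jar only has to be dropped next to the other libraries to be picked up.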

Compile the plugin

Compilation is simple. At the root of your project, run:

$ mvn clean install

This command cleans the target directory before launching the compilation. You will obtain a myaozanplugin-0.1-alpha-1.jar archive containing your plug-in in the target directory.

Install the plugin

To install an Aozan plug-in, copy the generated jar file from the target directory of your project to the lib directory of your Aozan installation, along with any other libraries the plug-in needs (jsoup-1.7.3.jar for this example). Your plug-in is then ready to use like the built-in collectors and tests of Aozan.

To enable your plug-in, you must update the Aozan configuration file and set the parameters of the required test. For this example, add one line to enable the focus score test and another line to set the expected interval:

qc.test.lane.focus.score.enable=True
qc.test.lane.focus.score.interval=[75.0, 100.0]
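
The interval value is an ordinary string that has to be turned into two numeric bounds before a lane score can be checked. In Aozan this is presumably handled by the ScoreInterval class, whose implementation is not shown on this page. The sketch below is a hypothetical stand-in (IntervalCheckSketch, parseInterval and isInInterval are invented names) that assumes both bounds are inclusive.

```java
/** Hypothetical sketch of parsing and applying an interval such as [75.0, 100.0]. */
class IntervalCheckSketch {

  /** Parses a string of the form "[min, max]" into a two-element array. */
  static double[] parseInterval(String text) {
    String s = text.trim();
    if (s.length() < 2 || s.charAt(0) != '['
        || s.charAt(s.length() - 1) != ']') {
      throw new IllegalArgumentException("Invalid interval: " + text);
    }
    String[] bounds = s.substring(1, s.length() - 1).split(",");
    if (bounds.length != 2) {
      throw new IllegalArgumentException("Two bounds expected: " + text);
    }
    double min = Double.parseDouble(bounds[0].trim());
    double max = Double.parseDouble(bounds[1].trim());
    if (min > max) {
      throw new IllegalArgumentException("Lower bound above upper: " + text);
    }
    return new double[] {min, max};
  }

  /** Assumption of this sketch: both bounds are inclusive. */
  static boolean isInInterval(double value, double[] interval) {
    return value >= interval[0] && value <= interval[1];
  }

  public static void main(String[] args) {
    double[] interval = parseInterval("[75.0, 100.0]");
    // Invented lane scores, for illustration only
    for (double score : new double[] {82.0, 70.5}) {
      System.out.println(score + " in interval: "
          + isInInterval(score, interval));
    }
  }
}
```

With the interval configured above, a lane whose mean focus score is 82.0 passes the check, while a lane at 70.5 does not.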

After running Aozan with this example plug-in, we obtain the following quality control report:

[Figure: quality control report after plugin execution]