<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1746-4811-5-8</ui>
   <ji>1746-4811</ji>
   <fm>
      <dochead>Software</dochead>
      <bibl>
         <title>
            <p>The DAWGPAWS pipeline for the annotation of genes and transposable elements in plant genomes</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Estill</snm>
               <mi>C</mi>
               <fnm>James</fnm>
               <insr iid="I1"/>
               <email>JamesEstill@gmail.com</email>
            </au>
            <au id="A2">
               <snm>Bennetzen</snm>
               <mi>L</mi>
               <fnm>Jeffrey</fnm>
               <insr iid="I2"/>
               <email>maize@uga.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Plant Biology, The University of Georgia, Athens, Georgia 30602-7271, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Genetics, The University of Georgia, Athens, Georgia 30602-7223, USA</p>
            </ins>
         </insg>
         <source>Plant Methods</source>
         <issn>1746-4811</issn>
         <pubdate>2009</pubdate>
         <volume>5</volume>
         <issue>1</issue>
         <fpage>8</fpage>
         <url>http://www.plantmethods.com/content/5/1/8</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">19545381</pubid>
               <pubid idtype="doi">10.1186/1746-4811-5-8</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>30</day>
               <month>4</month>
               <year>2009</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>19</day>
               <month>6</month>
               <year>2009</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>19</day>
               <month>6</month>
               <year>2009</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2009</year>
         <collab>Estill and Bennetzen; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>High-quality annotation of the genes and transposable elements in complex genomes requires a human-curated integration of multiple sources of computational evidence. These evidences include results from a diverse set of <it>ab initio </it>prediction programs as well as homology-based searches. Most of these programs operate on a single contiguous sequence at a time, and the results are generated in a diverse array of readable formats that must be translated to a standardized file format. These translated results must then be concatenated into a single source, and then presented in an integrated form for human curation.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We have designed, implemented, and assessed a Perl-based workflow named DAWGPAWS for the generation of computational results for human curation of the genes and transposable elements in plant genomes. The use of DAWGPAWS was found to accelerate annotation of 80&#8211;200 kb wheat DNA inserts in bacterial artificial chromosome (BAC) vectors by approximately twenty-fold and to also significantly improve the quality of the annotation in terms of completeness and accuracy.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The DAWGPAWS genome annotation pipeline fills an important need in the annotation of plant genomes by generating computational evidences in a high throughput manner, translating these results to a common file format, and facilitating the human curation of these computational results. We have verified the value of DAWGPAWS by using this pipeline to annotate the genes and transposable elements in 220 BAC insertions from the hexaploid wheat genome (<it>Triticum aestivum </it>L.). DAWGPAWS can be applied to annotation efforts in other plant genomes with minor modifications of program-specific configuration files, and the modular design of the workflow facilitates integration into existing pipelines.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Genomic sequence assemblies are rapidly being published for a great number of species <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. The sequence data used to produce genome assemblies are being generated at ever-increasing rates for reduced costs <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, indicating that the genomes of many more plant species will be <it>de novo </it>sequenced in coming years. The relative value of these sequencing efforts is a direct function of the accuracy of the annotation of the resultant sequence assemblies. Genome annotation seeks to delineate the sequence features that occur on the genome, thereby permitting definition of the biological processes responsible for these features <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. In plants, the sequence characteristics that are most critical to our interpretation of gene function and genome evolution include both genes and transposable elements (TEs) <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>.</p>
         <p>Identification of the genes that have been uncovered in assembled genome sequence data can utilize evidence from both <it>ab initio </it>gene annotation programs as well as sequence similarity searches against databases of previously identified proteins and expressed RNA <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. The <it>ab initio </it>gene finding programs derive full gene models from DNA sequence data based solely on knowledge of the sequence features associated with protein coding domains. Sequence alignments can refine the exon-intron boundaries of these models and provide evidence that computationally predicted genes are actually transcribed <it>in vivo</it>. Existing software can automatically synthesize these data to derive combined evidence gene models <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>.</p>
         <p>While this combination of <it>ab initio </it>and homology-based approaches has been used to accurately annotate genes in a number of eukaryotic genomes, plant genome annotation efforts cannot focus solely on the annotation of genes due to the risk of conflating genes with transposable elements <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. Many TEs contain open reading frames (ORFs) that generate the proteins required for TE transposition. The <it>ab initio </it>gene annotation programs will often annotate these TE ORFs as genes. Since most TE genes are expressed and represented in cDNA libraries, homology-based searches will indicate that these ORFs are transcribed and they thus may be considered legitimate gene predictions. Simply removing the high-copy-number candidate genes does not alleviate this problem because some true gene families are highly abundant while not all transposable elements are highly repetitive <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. These erroneous gene annotations are especially problematic in plant genomes where transposable elements make up the majority of sequenced genome space. Since these false positive gene predictions cannot be mitigated by gene prediction methods alone, plant genome annotation must directly annotate TEs in order to remove them from the gene candidate list.</p>
         <p>Similar to the prediction of genes, accurate identification of the TEs in genomic sequence data combines homology-based searches and <it>ab initio </it>results <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>. Tools for <it>ab initio </it>transposable element discovery can exploit the fact that many families of TEs occur in high copy number within a host genome <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>, or they can utilize diagnostic structural features such as terminal inverted repeats (TIRs) or long terminal repeats (LTRs) that delineate an individual TE insertion <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>. Homology-based searches of transposable elements are facilitated by specialized tools <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp> that make use of databases of previously identified TEs <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp> or leverage repetitive data from the sequenced genome <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>.</p>
         <p>The gold standard of genome annotation is the integration and curation of multiple computational results by a knowledgeable biologist <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. This approach has been advocated for the structural annotation of genes <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B11">11</abbr></abbrgrp>, as well as transposable elements <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. A limitation of the manually-curated multiple-evidences approach is that the process requires the combination of computational results from a disparate set of independent annotation programs. The output of this software has been designed to maximize readability by humans and not to facilitate integration of results across programs. Furthermore, these tools are often designed to work on a single contiguous sequence (contig) at a time, while many annotation efforts require the generation of computational results for thousands of assembled contigs. Computational workflow suites that seek to aid in plant genome annotation must therefore overcome these limitations while facilitating the human interpretation of the computational results contributing to a biological annotation.</p>
         <p>Here, we introduce an annotation suite that allows for computational evidences to be generated in an automated fashion, integrates the results from multiple programs and facilitates the human curation of these computational results. This suite was designed to assist a Distributed Annotation Working Group (DAWG) approach for a Pipeline to Annotate Wheat Sequences (PAWS), and we hereafter refer to this effort as DAWGPAWS.</p>
      </sec>
      <sec>
         <st>
            <p>Implementation</p>
         </st>
         <p>The DAWGPAWS workflow (Figure <figr fid="F1">1</figr>) is distributed as a suite of individual command line interface (CLI) programs written in the Perl programming language. Generally, each program is tailored for an individual step in the annotation process, and it can be used independently of all other programs in the package. This allows users to design an individualized annotation pipeline by selecting those computational components that are most appropriate to their annotation efforts. This modular design also facilitates using DAWGPAWS in a high throughput cluster-computing framework. Large-scale annotation jobs can be split across compute nodes both by the contig being annotated and by the annotation program being run.</p>
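         <p>The two-way split described above can be pictured as a grid of independent jobs, one per contig and annotation step. The following Python sketch is only an illustration of that idea and is not part of DAWGPAWS, which is written in Perl; the file names, step labels, and round-robin scheduling are assumptions made for the example.</p>

```python
from itertools import product


def build_jobs(contigs, programs):
    """Pair every contig with every annotation program: one independent job per pair."""
    return list(product(contigs, programs))


def assign_round_robin(jobs, n_nodes):
    """Distribute jobs over n_nodes compute nodes in round-robin order."""
    nodes = [[] for _ in range(n_nodes)]
    for i, job in enumerate(jobs):
        nodes[i % n_nodes].append(job)
    return nodes


contigs = ["bac_01.fasta", "bac_02.fasta", "bac_03.fasta"]  # hypothetical file names
programs = ["gene_ab_initio", "te_homology"]                # hypothetical step labels
jobs = build_jobs(contigs, programs)
for node_id, node_jobs in enumerate(assign_round_robin(jobs, n_nodes=4)):
    print(node_id, node_jobs)
```

         <p>Because each job depends only on its own contig and program, a real cluster scheduler could replace the round-robin assignment without changing the job list.</p>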
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>An overview of the workflow supported by the current version of the DAWGPAWS suite of programs</p>
            </caption>
            <text>
               <p><b>An overview of the workflow supported by the current version of the DAWGPAWS suite of programs</b>.</p>
            </text>
            <graphic file="1746-4811-5-8-1"/>
         </fig>
         <p>A common thread to each component of the DAWGPAWS package is that computational evidences are translated from the native annotation program output into the standard general feature format (GFF) <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. The GFF file format facilitates integration of multiple computational results. This format can be directly curated by any biologist using standard sequence curation and visualization tools such as Apollo <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, Artemis <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>, GBrowse <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>, the UCSC genome browser <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> or the Ensembl Genome Browser <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. The GFF files also provide a standard format for loading annotation results to relational database schemas such as BioSQL <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> or CHADO <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>.</p>
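         <p>As a minimal illustration of this translation step, the Python sketch below formats one predicted feature as a nine-column, tab-delimited GFF record. It is not taken from DAWGPAWS; the function name, the contig identifier, and the feature values are hypothetical.</p>

```python
def format_gff_line(seqid, source, feature, start, end,
                    score=".", strand=".", frame=".", attribute="."):
    """Return one tab-delimited, nine-column GFF record (1-based, inclusive coordinates)."""
    if start > end:
        # GFF expects start to be no greater than end; strand carries the orientation.
        start, end = end, start
    fields = [seqid, source, feature, str(start), str(end),
              str(score), strand, str(frame), attribute]
    return "\t".join(fields)


# Hypothetical exon prediction on a hypothetical contig.
line = format_gff_line("contig_001", "genscan", "exon", 1200, 1450,
                       score=12.5, strand="+", frame=0, attribute="gene_model_1")
print(line)
```

         <p>Once every program's output is reduced to records of this shape, results from different programs can be concatenated into a single file for one contig.</p>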
         <p>One of the main sets of scripts in the DAWGPAWS package is the batch run program set (Table <tblr tid="T1">1</tblr>). All of these scripts are designed to run individual annotation programs in a high throughput batch mode. They take as their input a directory of sequence files that are to be annotated and a configuration file describing the sets of parameters to use for each sequence file. The output of these batch scripts includes the original output from the annotation program as well as this output translated to the GFF format. The resulting files are stored in a predefined directory structure that allows users to quickly locate the original annotation results as well as the GFF copy. These batch programs exist for both gene and TE annotation results. The <it>ab initio </it>gene annotation programs supported by these scripts include EuG&#232;ne <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, GeneID <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>, GeneMark.hmm <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>, and Genscan <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. The <it>ab initio </it>TE annotation programs that can be run in batch mode are Find_LTR <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>, LTR_STRUC <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, LTR_FINDER <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, LTR_seq <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>, FINDMITE <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, and Tandem Repeats Finder <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. Batch mode scripts also support TE annotation using HMMER <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>, NCBI-BLAST <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>, RepeatMasker <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, and TEnest <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. The full set of gene and TE annotation programs that can be run in batch mode is summarized in Table <tblr tid="T1">1</tblr>.</p>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>DAWGPAWS annotation scripts for generating computational annotation results in batch mode.</p>
            </caption>
            <tblbdy cols="3">
               <r>
                  <c ca="left">
                     <p>
                        <b>Annotation Program</b>
                     </p>
                  </c>
                  <c ca="left">
                     <p>
                        <b>Result Type</b>
                     </p>
                  </c>
                  <c ca="left">
                     <p>
                        <b>DAWGPAWS Script</b>
                     </p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>EuG&#232;ne <abbrgrp><abbr bid="B9">9</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>Gene <it>ab initio </it>and automated combined evidence</p>
                  </c>
                  <c ca="left">
                     <p>batch_eugene.pl</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>GeneID <abbrgrp><abbr bid="B42">42</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>Gene <it>ab initio</it></p>
                  </c>
                  <c ca="left">
                     <p>batch_geneid.pl</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>GeneMark.hmm <abbrgrp><abbr bid="B43">43</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>Gene <it>ab initio</it></p>
                  </c>
                  <c ca="left">
                     <p>batch_genemark.pl</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Genscan <abbrgrp><abbr bid="B44">44</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>Gene <it>ab initio</it></p>
                  </c>
                  <c ca="left">
                     <p>batch_genescan.pl</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Find_LTR <abbrgrp><abbr bid="B45">45</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>TE <it>ab initio</it></p>
                  </c>
                  <c ca="left">
                     <p>batch_findltr.pl*</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>LTR_STRUC <abbrgrp><abbr bid="B20">20</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>TE <it>ab initio</it></p>
                  </c>
                  <c ca="left">
                     <p>batch_ltrstruc.vbs</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>LTR_FINDER <abbrgrp><abbr bid="B21">21</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>TE <it>ab initio</it></p>
                  </c>
                  <c ca="left">
                     <p>batch_ltrfinder.pl*</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>LTR_seq <abbrgrp><abbr bid="B46">46</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>TE <it>ab initio</it></p>
                  </c>
                  <c ca="left">
                     <p>batch_ltrseq.pl*</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>FINDMITE <abbrgrp><abbr bid="B19">19</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>TE <it>ab initio</it></p>
                  </c>
                  <c ca="left">
                     <p>batch_findmite.pl*</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Tandem Repeats Finder <abbrgrp><abbr bid="B47">47</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>Repeat <it>ab initio</it></p>
                  </c>
                  <c ca="left">
                     <p>batch_trf.pl</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>HMMER <abbrgrp><abbr bid="B48">48</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>TE homology</p>
                  </c>
                  <c ca="left">
                     <p>batch_hmmer.pl*</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>NCBI-BLAST <abbrgrp><abbr bid="B49">49</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>TE and gene homology</p>
                  </c>
                  <c ca="left">
                     <p>batch_blast.pl*</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>RepeatMasker <abbrgrp><abbr bid="B22">22</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>TE homology</p>
                  </c>
                  <c ca="left">
                     <p>batch_repmask.pl*</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>TEnest <abbrgrp><abbr bid="B24">24</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>TE homology</p>
                  </c>
                  <c ca="left">
                     <p>batch_tenest.pl</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>These scripts operate on a directory of FASTA files and generate both the native output of the annotation program and a GFF translation of that output. The exception is the batch_ltrstruc.vbs Visual Basic script, which must be used in conjunction with cnv_ltrstruc2gff.pl to generate results in GFF.</p>
               <p>* Indicates programs that make use of a configuration file. The nature and format of the configuration file are described in the help file for each program.</p>
            </tblfn>
         </tbl>
         <p>In addition to the batch run programs, scripts that convert an individual annotation program output to GFF are also available (Table <tblr tid="T2">2</tblr>). These programs allow an existing annotation result to be specified, or they can take advantage of UNIX standard streams. If an input file is not specified, the conversion scripts will expect input from the standard input stream. Likewise, if the output path is not specified, these programs will write the output to the standard output stream. Accepting standard input and output streams facilitates using these programs as supplements to an existing workflow. For example, data can be piped directly from the output stream of an annotation program to a DAWGPAWS converter, and then piped on to a parser that loads the GFF formatted result to a database. These conversion programs also support output from programs such as FGENESH <abbrgrp><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr></abbrgrp> and RepSeek <abbrgrp><abbr bid="B52">52</abbr></abbrgrp> that are not supported by batch scripts in DAWGPAWS.</p>
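         <p>The stream-handling behavior described above is a common pattern for command line filters. The Python sketch below shows one way to implement it under stated assumptions: the option names, the default sequence identifier, and the four-column native format (name, start, end, strand) are all hypothetical and do not correspond to any DAWGPAWS converter or to the output of any real annotation program.</p>

```python
import argparse
import io
import sys


def convert(lines, seqid, source):
    """Yield GFF records from a hypothetical whitespace-delimited format: name start end strand."""
    for raw in lines:
        raw = raw.strip()
        if not raw or raw.startswith("#"):
            continue  # skip blank lines and comments
        name, start, end, strand = raw.split()
        yield "\t".join([seqid, source, "exon", start, end, ".", strand, ".", name])


def main(argv=None, stdin=sys.stdin, stdout=sys.stdout):
    parser = argparse.ArgumentParser(description="Toy native-to-GFF converter")
    parser.add_argument("-i", "--infile", help="input path; defaults to standard input")
    parser.add_argument("-o", "--outfile", help="output path; defaults to standard output")
    parser.add_argument("--seqid", default="contig_001")
    parser.add_argument("--source", default="toy_predictor")
    args = parser.parse_args(argv)

    # Fall back to the standard streams when no path is given.
    src = open(args.infile) if args.infile else stdin
    dst = open(args.outfile, "w") if args.outfile else stdout
    try:
        for record in convert(src, args.seqid, args.source):
            dst.write(record + "\n")
    finally:
        if args.infile:
            src.close()
        if args.outfile:
            dst.close()


# Demonstration with in-memory streams standing in for a UNIX pipe.
demo_out = io.StringIO()
main([], stdin=io.StringIO("exon1 100 250 +\nexon2 400 520 +\n"), stdout=demo_out)
print(demo_out.getvalue(), end="")
```

         <p>Saved as a script that calls main() with no arguments, a filter of this kind could sit between an annotation program and a database loader in a shell pipeline, as the text describes.</p>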
         <tbl id="T2">
            <title>
               <p>Table 2</p>
            </title>
            <caption>
               <p>DAWGPAWS scripts for conversion of annotation results from native program output to GFF.</p>
            </caption>
            <tblbdy cols="3">
               <r>
                  <c ca="left">
                     <p>
                        <b>Annotation Program</b>
                     </p>
                  </c>
                  <c ca="left">
                     <p>
                        <b>Result Type</b>
                     </p>
                  </c>
                  <c ca="left">
                     <p>
                        <b>DAWGPAWS Script</b>
                     </p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>FGENESH <abbrgrp><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>Gene <it>ab initio</it></p>
                  </c>
                  <c ca="left">
                     <p>cnv_fgenesh2gff.pl</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>GeneMark.hmm <abbrgrp><abbr bid="B43">43</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>Gene <it>ab initio</it></p>
                  </c>
                  <c ca="left">
                     <p>cnv_genemark2gff.pl</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Find_LTR <abbrgrp><abbr bid="B45">45</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>TE <it>ab initio</it></p>
                  </c>
                  <c ca="left">
                     <p>cnv_findltr2gff.pl</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>LTR_FINDER <abbrgrp><abbr bid="B21">21</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>TE <it>ab initio</it></p>
                  </c>
                  <c ca="left">
                     <p>cnv_ltrfinder2gff.pl</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>LTR_seq <abbrgrp><abbr bid="B46">46</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>TE <it>ab initio</it></p>
                  </c>
                  <c ca="left">
                     <p>cnv_ltrseq2gff.pl</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>LTR_STRUC <abbrgrp><abbr bid="B20">20</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>TE <it>ab initio</it></p>
                  </c>
                  <c ca="left">
                     <p>cnv_ltrstruc2gff.pl</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>RepSeek <abbrgrp><abbr bid="B52">52</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>TE <it>ab initio</it></p>
                  </c>
                  <c ca="left">
                     <p>cnv_repseek2gff.pl</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>NCBI-BLAST <abbrgrp><abbr bid="B49">49</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>TE and gene homology</p>
                  </c>
                  <c ca="left">
                     <p>cnv_blast2gff.pl</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>RepeatMasker <abbrgrp><abbr bid="B22">22</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>TE homology</p>
                  </c>
                  <c ca="left">
                     <p>cnv_repmask2gff.pl</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>TEnest <abbrgrp><abbr bid="B24">24</abbr></abbrgrp></p>
                  </c>
                  <c ca="left">
                     <p>TE homology</p>
                  </c>
                  <c ca="left">
                     <p>cnv_tenest2gff.pl</p>
                  </c>
               </r>
            </tblbdy>
         </tbl>
         <p>The DAWGPAWS suite also includes specialized tools for TE annotation. For identification of the highly repetitive regions of a contig, the seq_oligocount.pl program can count the occurrence of oligomers in the query sequence against an index of random shotgun sequences. This program generates all oligomers of length k from the query sequence, and uses the vmatch program <abbrgrp><abbr bid="B53">53</abbr></abbrgrp> to determine how many times each of these k-mers occurs in an index of a random shotgun sequence data set built with mkvtree <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>. The output of this program is a GFF file indicating the count of these k-mers in the shotgun sequence dataset. These results may be used to identify the mathematically defined repeats in the query sequence, and they also provide a means to visualize low-copy-number runs in the query sequence <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>.</p>
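         <p>The oligomer-counting idea can be sketched in a few lines of Python. This is a simplified stand-in, not the seq_oligocount.pl implementation: an in-memory counter replaces the vmatch and mkvtree index, only the forward strand is counted, and the function names, feature type, and toy sequences are hypothetical.</p>

```python
from collections import Counter


def build_kmer_index(reads, k):
    """Count every k-mer in the shotgun reads (an in-memory stand-in for a persistent index)."""
    index = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            index[read[i:i + k]] += 1
    return index


def oligo_counts_as_gff(seqid, query, index, k, source="kmer_count"):
    """Return one GFF record per query k-mer, with its count in the shotgun data as the score."""
    query = query.upper()
    records = []
    for i in range(len(query) - k + 1):
        count = index[query[i:i + k]]  # Counter returns 0 for unseen k-mers
        # GFF coordinates are 1-based and inclusive.
        records.append("\t".join([seqid, source, "oligo", str(i + 1), str(i + k),
                                  str(count), "+", ".", "k=" + str(k)]))
    return records


reads = ["ACGTACGT", "CGTACG", "TTTTGG"]  # toy shotgun reads
index = build_kmer_index(reads, k=4)
for record in oligo_counts_as_gff("contig_001", "ACGTTTTG", index, k=4):
    print(record)
```

         <p>Positions with high scores mark candidate repeats, while runs of low scores mark candidate low-copy-number regions when the records are viewed as a track alongside other annotation results.</p>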
         <p>In addition to the gene and TE annotation-specific scripts included in the DAWGPAWS package, helper applications are also included (Table <tblr tid="T3">3</tblr>). These CLI programs fulfill needs that arise when generating annotation results. They allow for file conversion, such as the conversion of GFF to the game.xml format or the conversion of a lowercase masked sequence file to a hard masked sequence file. They also prepare the sequence files for annotation by shortening FASTA headers as required by some programs, or by splitting a single FASTA file containing multiple records into multiple FASTA files that each contain a single record. The ability to generate Euler diagrams is also supported via the vennseq.pl conversion script, which formats GFF file data for input into the VennMaster program <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>.</p>
         <tbl id="T3">
            <title>
               <p>Table 3</p>
            </title>
            <caption>
               <p>Additional helper scripts included in the DAWGPAWS package.</p>
            </caption>
            <tblbdy cols="2">
               <r>
                  <c ca="left">
                     <p>
                        <b>DAWGPAWS Script</b>
                     </p>
                  </c>
                  <c ca="left">
                     <p>
                        <b>Purpose</b>
                     </p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>cnv_gff2game.pl</p>
                  </c>
                  <c ca="left">
                     <p>Converts GFF files to the game.xml format.</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>cnv_game2gff3.pl</p>
                  </c>
                  <c ca="left">
                     <p>Converts game.xml files to the GFF3 format.</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>batch_hardmask.pl</p>
                  </c>
                  <c ca="left">
                     <p>Given a directory of lowercase masked sequence files, this will replace lowercase residues with an N or X to indicate masking.</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>dir_merge.pl</p>
                  </c>
                  <c ca="left">
                     <p>Given annotation results scattered across multiple directories, this program can merge the results into subdirectories in a single parent directory.</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>vennseq.pl</p>
                  </c>
                  <c ca="left">
                     <p>Given GFF annotation results from multiple methods, this program generates an Euler diagram of these features using the VennMaster program <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>.</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>batch_findgaps.pl</p>
                  </c>
                  <c ca="left">
                     <p>This program will annotate gaps in the query sequences in the input directory.</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>clust_write_shell.pl</p>
                  </c>
                  <c ca="left">
                     <p>This program writes shell scripts to run DAWGPAWS in a cluster environment running the Platform LSF queuing system.</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>cnv_seq2dir.pl</p>
                  </c>
                  <c ca="left">
                     <p>Given a FASTA file with multiple sequence records, this program generates a separate FASTA file for each sequence record. The sequence files produced are named using the sequence ID in the FASTA header in the input file.</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>fasta_merge.pl</p>
                  </c>
                  <c ca="left">
                     <p>This program merges all FASTA files in a directory into a single FASTA file.</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>fasta_shorten.pl</p>
                  </c>
                  <c ca="left">
                     <p>This program shortens the FASTA header by limiting the header length, or by splitting the header at a delimiting character. Some annotation programs limit the length of the FASTA header that is accepted, and this program allows input files to meet this limitation.</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>fetch_tenest.pl</p>
                  </c>
                  <c ca="left">
                     <p>Fetches multiple results from the PlantGDB TEnest server and converts the results to GFF.</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>gff_seg.pl</p>
                  </c>
                  <c ca="left">
                     <p>Given a GFF file that contains point or segment data, this will extract segments with score values that exceed a threshold value.</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>ltrstruc_prep.pl</p>
                  </c>
                  <c ca="left">
                     <p>Because the LTR_STRUC program runs only under the Windows environment, this program converts FASTA sequences from UNIX to DOS line endings and generates the file names and flist file required by LTR_STRUC.</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>seq_oligocount.pl</p>
                  </c>
                  <c ca="left">
                     <p>This program allows for the generation of a GFF file that counts the number of times an oligomer in the genomic contig occurs in a reference shotgun sequence database.</p>
                  </c>
               </r>
            </tblbdy>
         </tbl>
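         <p>Two of the helper tasks listed in Table 3, hard masking of lowercase masked sequence and splitting a multi-record FASTA file into single records, are simple enough to sketch in a few lines of Python. The functions below are illustrative only and are not taken from batch_hardmask.pl or cnv_seq2dir.pl; the function names, the in-memory dictionary output and the "unnamed" fallback ID are assumptions.</p>

```python
def hardmask(seq, mask_char="N"):
    """Replace lowercase (soft masked) residues with mask_char (N or X)."""
    return "".join(mask_char if base.islower() else base for base in seq)


def split_fasta(text):
    """Split multi-record FASTA text into a dict of {sequence ID: sequence}.

    The sequence ID is taken as the first whitespace-delimited token of
    each header line, mirroring the idea of naming output files by ID.
    """
    records = {}
    seq_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            seq_id = tokens[0] if tokens else "unnamed"  # hypothetical fallback
            records[seq_id] = []
        elif seq_id is not None:
            records[seq_id].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Toy input: two records, the second partly soft masked
fasta_text = ">bac1 wheat BAC\nACGT\nAC\n>bac2\nGGttgaCC\n"
records = split_fasta(fasta_text)
masked = {name: hardmask(seq) for name, seq in records.items()}
```

         <p>A real implementation would write each record to its own file and process whole directories; the sketch keeps everything in memory so that the two transformations are easy to inspect.</p>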
         <p>A CLI was selected for DAWGPAWS to facilitate the use of our applications in a cluster-computing environment, and to provide a stable program interface across multiple operating systems. While command line programs may be daunting to some users, every effort has been made to simplify their use. All of the CLI programs included in the DAWGPAWS suite follow consistent protocols for command line options (Table <tblr tid="T4">4</tblr>). Help files or full program manuals are available from the command line within all programs by invoking the --help or --man options. These application manuals are also available in HTML form on the DAWGPAWS website, along with a general program manual describing the installation and use of a local implementation of the DAWGPAWS package <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>. This documentation is also included in the downloadable release of DAWGPAWS.</p>
         <tbl id="T4">
            <title>
               <p>Table 4</p>
            </title>
            <caption>
               <p>Common command line options used throughout the DAWGPAWS suite of programs.</p>
            </caption>
            <tblbdy cols="2">
               <r>
                  <c ca="left">
                     <p>
                        <b>Option</b>
                     </p>
                  </c>
                  <c ca="left">
                     <p>
                        <b>Description</b>
                     </p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>--indir <it>or</it></p>
                     <p>--infile</p>
                  </c>
                  <c ca="left">
                     <p>For batch scripts, this indicates the input directory containing the FASTA files to annotate. For conversion scripts, this indicates the input file to convert from the native format to the GFF format.</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>--outdir <it>or</it></p>
                     <p>--outfile</p>
                  </c>
                  <c ca="left">
                     <p>For batch scripts, this indicates the output directory containing the annotation results for the program and the GFF results.</p>
                     <p>For conversion scripts, this indicates the path to the GFF output file.</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>--config</p>
                  </c>
                  <c ca="left">
                     <p>For programs that make use of a configuration file, this indicates the path to the configuration file to use.</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>--seqname</p>
                  </c>
                  <c ca="left">
                     <p>For conversion scripts, this indicates the sequence id to use in the GFF output file.</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>--param</p>
                  </c>
                  <c ca="left">
                     <p>For conversion scripts, this indicates the name of the parameter set used with the annotation program. This option allows the user to distinguish among multiple parameter sets for the same annotation program, and this parameter name is appended to the source column of the GFF output file.</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>--program</p>
                  </c>
                  <c ca="left">
                     <p>For conversion scripts, this indicates the name of the program used to generate the annotation result.</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>--version</p>
                  </c>
                  <c ca="left">
                     <p>Print the current version of the script.</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>--usage</p>
                  </c>
                  <c ca="left">
                     <p>Print a short program usage message.</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>--help</p>
                  </c>
                  <c ca="left">
                     <p>Print a short help message including the common usage and all program options available at the command line.</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>--man</p>
                  </c>
                  <c ca="left">
                     <p>Print the full program manual.</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>--verbose</p>
                  </c>
                  <c ca="left">
                     <p>This will run the program with maximum verbosity. This option will generate status updates while the program is running, and will maximize the error reporting functions of the script. All verbose statements are written to the standard error output stream.</p>
                  </c>
               </r>
            </tblbdy>
         </tbl>
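         <p>The option conventions in Table 4 can be mirrored in other languages when extending a pipeline. The following Python sketch shows how a hypothetical conversion script might accept the same options and combine the program and parameter set names into the GFF source column. The script name cnv_example2gff, the default values and the ":" separator are assumptions for illustration; the separator actually used by the DAWGPAWS scripts is not specified here.</p>

```python
import argparse


def build_parser():
    """Build a parser that follows the option names listed in Table 4."""
    parser = argparse.ArgumentParser(
        prog="cnv_example2gff",  # hypothetical script name
        description="Convert a native annotation result to GFF (sketch).")
    parser.add_argument("--infile", required=True,
                        help="input file in the native format")
    parser.add_argument("--outfile", default=None,
                        help="path to the GFF output file")
    parser.add_argument("--seqname", default="seq",
                        help="sequence ID to use in the GFF output")
    parser.add_argument("--program", default="example",
                        help="program that generated the annotation result")
    parser.add_argument("--param", default=None,
                        help="name of the parameter set that was used")
    parser.add_argument("--verbose", action="store_true",
                        help="write status messages to standard error")
    return parser


def gff_source(program, param=None, sep=":"):
    """Append the parameter set name to the program name for the source column.

    The separator is an assumption; only the appending behaviour is
    described in Table 4.
    """
    return program if not param else program + sep + param


# Example: the same program run with a named parameter set
args = build_parser().parse_args(
    ["--infile", "bac1.out", "--program", "repeatmasker", "--param", "wheat"])
source = gff_source(args.program, args.param)
```

         <p>Tagging the source column in this way lets results from several parameter sets of one program be loaded side by side and displayed as separate tracks.</p>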
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <p>The computational annotation results generated by DAWGPAWS can be directly imported into any genome annotation program that supports GFF. We have used the Apollo program <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> to visualize and curate our results for genes and transposable elements in the wheat genome (Figure <figr fid="F2">2</figr>). Since the game.xml file format is the most stable way to store annotation results in Apollo, it is generally useful to convert GFF files to the game.xml format before beginning curation of computational results. The visual display of computational results in Apollo is controlled by a tiers configuration file. This file determines how and where individual computational and annotation results are drawn on the annotation pane. The tiers file used in these annotation efforts is included in the DAWGPAWS download package, and it can serve as a starting point for generating individualized tiers files for other plant annotation efforts. As an alternative to Apollo, it is also possible to curate computational results using the Artemis sequence visualization program <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>.</p>
         <fig id="F2">
            <title>
               <p>Figure 2</p>
            </title>
            <caption>
               <p>Screen capture image of gene and TE annotation results visualized in the Apollo genome annotation program</p>
            </caption>
            <text>
               <p><b>Screen capture image of gene and TE annotation results visualized in the Apollo genome annotation program</b>. The example shown is for a wheat BAC that has been annotated and curated with the assistance of DAWGPAWS.</p>
            </text>
            <graphic file="1746-4811-5-8-2"/>
         </fig>
         <p>The GBrowse package <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> can also visualize GFF formatted annotations, and has proven to be a useful method for visualizing TE results. GBrowse makes use of core images called glyphs that are used to draw sequence features along a genome. The available glyphs in GBrowse can be supplemented by writing additional Perl modules, and we have generated TE glyphs that allow visualization of the biologically relevant features of TEs. GBrowse also has the capability to draw histograms along the sequence contigs. GBrowse can thus combine TE glyphs and histograms to provide an informative visualization of the distribution of mathematically defined repeats and the structural features of TEs (Figure <figr fid="F3">3</figr>). The current drawback to visualizations in GBrowse is that the program is intended to serve as a static visualization tool, and does not provide the means for the curation and combination of computational results. It would therefore be helpful if the current curation programs for gene annotation, such as Apollo or Artemis, directly addressed the needs of TE annotation curation and developed glyphs for the major classes of TEs.</p>
         <fig id="F3">
            <title>
               <p>Figure 3</p>
            </title>
            <caption>
               <p>Screen capture image of the TE annotation results and oligomer counts visualized in the GBrowse genome annotation visualization program</p>
            </caption>
            <text>
               <p><b>Screen capture image of the TE annotation results and oligomer counts visualized in the GBrowse genome annotation visualization program</b>. The example shown is for a 15 kb segment of a BAC with a wheat DNA insert.</p>
            </text>
            <graphic file="1746-4811-5-8-3"/>
         </fig>
         <p>In addition to visualization and curation of the annotated DNA, it is also possible to transfer the DAWGPAWS results into existing database schemas. For example, a CHADO database <abbrgrp><abbr bid="B41">41</abbr></abbrgrp> can be populated with the gmod_bulk_load_gff3.pl program <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>, which loads GFF3 format files. In the DAWGPAWS package, the GFF3 format files from curated results can be generated with the cnv_game2gff3.pl program. These curated results can then be stored in a local implementation of the CHADO database. The BioSQL database schema <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> also includes a bp_load_gff.pl script that can load GFF results into the database schema.</p>
         <p>The DAWGPAWS annotation framework has a number of features that make it a good choice to facilitate the workflow in plant genome annotation. The use of configuration files makes it fairly easy to modify the annotation workflow for the species of interest. The configuration files also make it easy to generate results with multiple parameter sets for an individual program. Using multiple parameter sets will be especially useful when working with a genome that has not been annotated before, and for which appropriate annotation parameters have not been identified. Also, while previous annotation pipelines have focused on gene annotation, the DAWGPAWS suite maximizes the quality of TE annotation results. Most plants contain genomes with sizes &gt; 5000 Mb <abbrgrp><abbr bid="B58">58</abbr></abbrgrp>, and are therefore expected to contain more than 80% TEs <abbrgrp><abbr bid="B59">59</abbr></abbrgrp>, so efficiently dealing with this large and diverse set of mobile DNAs is necessary for effective genome annotation.</p>
         <p>The current focus of DAWGPAWS in our laboratory is the structural annotation of the genes and TEs in a genome using methods and applications tuned to the Triticeae. In the annotation of 220 BACs from hexaploid bread wheat, we found that the DAWGPAWS pipeline increased the rate of individual BAC annotations twenty-fold. Due to the time required to manually generate annotation results, our previous annotation effort was limited to using the FGENESH annotation program combined with a BLAST search of predicted models against known transposable elements and protein databases <abbrgrp><abbr bid="B60">60</abbr></abbrgrp>. Using this method, annotators could annotate a single BAC in one to two days. The implementation of the DAWGPAWS pipeline increased the speed of annotation to ten to fifteen BACs per person per day. Furthermore, the quality of both TE and gene prediction was seen to improve with the use of DAWGPAWS. This was due, at least in part, to the larger number of complementary programs for TE and gene discovery that could be conveniently employed in each BAC annotation. Specifically, the inclusion of <it>ab initio </it>TE prediction programs allowed for the identification of new families of LTR retrotransposons that would have been missed in our previous annotation efforts. Predicted gene models that span these newly discovered families would not have been identified as TEs in the exclusively homology-based searches that were previously used.</p>
         <p>Future development of DAWGPAWS will incorporate tools for the functional annotation of the predicted genes. Currently, functional annotation can be done within the Apollo program by manually selecting individual gene models and BLASTing them against appropriate databases. Batch run support for additional local alignment search tools will also be added. The use of NCBI-BLAST is sufficient for most comparisons of sequence contigs against reference databases, but programs such as BLAT <abbrgrp><abbr bid="B61">61</abbr></abbrgrp> or sim4 <abbrgrp><abbr bid="B62">62</abbr></abbrgrp> are designed specifically to align ESTs and flcDNAs against assembled genomes. While output from these local alignment tools can be converted to GFF using the existing cnv_blast2gff.pl program in DAWGPAWS, it would be useful to run these packages in a batch run framework similar to the batch_blast.pl program.</p>
         <p>Support for additional <it>ab initio </it>gene annotation programs will also be added to future releases of DAWGPAWS. Augustus <abbrgrp><abbr bid="B63">63</abbr></abbrgrp> is an <it>ab initio </it>annotation program that will be useful for gene annotation that seeks to identify all transcripts derived from a single locus. Support for the GENEZILLA <abbrgrp><abbr bid="B64">64</abbr></abbrgrp> and GlimmerHMM <abbrgrp><abbr bid="B64">64</abbr></abbrgrp> gene annotation packages will also be added to future releases of DAWGPAWS. The SNAP program <abbrgrp><abbr bid="B65">65</abbr></abbrgrp> will be added to support the annotation of genomes that have been sequenced <it>de novo </it>and lack species-specific HMM model parameterizations. The addition of the PASA <abbrgrp><abbr bid="B66">66</abbr></abbrgrp> program would assist in the annotation of genomes for which large transcript databases are available. As additional fully sequenced genomes are added to the plant genomics literature, we can make use of syntenic comparisons and multiple alignments to aid in gene annotation <abbrgrp><abbr bid="B67">67</abbr></abbrgrp> as well as TE annotation <abbrgrp><abbr bid="B68">68</abbr></abbrgrp>. Future development of DAWGPAWS will incorporate syntenic alignment and prediction programs such as SGP2 <abbrgrp><abbr bid="B69">69</abbr></abbrgrp>, SLAM <abbrgrp><abbr bid="B70">70</abbr></abbrgrp>, and TWINSCAN <abbrgrp><abbr bid="B71">71</abbr></abbrgrp> as they become increasingly relevant to plant genome annotation.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>The DAWGPAWS annotation workflow provides a suite of command line interface programs that can generate computational evidence for human curation in a high-throughput fashion. We have used the DAWGPAWS pipeline to annotate 220 randomly selected BACs with wheat DNA inserts for both gene and TE content. Our curation efforts on the DAWGPAWS output are implemented in the Apollo program. The tiers file used for visualization of this curation is available as part of the DAWGPAWS package.</p>
         <p>DAWGPAWS represents an efficient tool for genome annotation in the Triticeae, and can be used in its current form to generate gene and TE computational results for other grass genomes. Minor modifications to the configuration files used by DAWGPAWS can make this program suitable for the generation of computational annotation results for any plant genome. The TE annotation capabilities of DAWGPAWS exceed those of any other current genome annotation suite, and make this package particularly valuable for the great majority of plant genomes, such as wheat or maize, that contain diverse arrays of TEs that comprise the majority of the nuclear genome.</p>
         <p>The DAWGPAWS program has been specifically designed to facilitate use of individual component scripts outside of the entire package. Each script can function independently of all other applications in the package, and programs make use of standard input and standard output streams when possible to facilitate integration into existing pipelines. Since this package is being released under the open source GPL (version 3), the suite and its individual components can be used and modified under the terms of the GPL. Template batch run and conversion scripts are provided in a boilerplate format to facilitate extending DAWGPAWS to additional annotation tools. Furthermore, since we have selected the Perl language for the implementation of our package, the addition of new annotation tools can leverage existing modules in the BioPerl toolkit <abbrgrp><abbr bid="B72">72</abbr></abbrgrp>. These modules include parsers for computational tools useful for predicting alternative splicing <abbrgrp><abbr bid="B62">62</abbr><abbr bid="B61">61</abbr></abbrgrp> as well as interfaces for transfer RNA prediction <abbrgrp><abbr bid="B73">73</abbr></abbrgrp>. We also formally invite collaboration in the development of additional DAWGPAWS applications under the auspices of the GNU GPL, as facilitated by the SourceForge subversion repository of the DAWGPAWS source code. Interested collaborators may contact the authors or become member developers of the DAWGPAWS SourceForge project <abbrgrp><abbr bid="B74">74</abbr></abbrgrp>.</p>
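         <p>The stream-based design described above can be illustrated with a small filter that reads GFF lines from one stream and writes a subset to another, so that it can be chained with other tools. The Python sketch below is not part of DAWGPAWS; the function name and the score-threshold behaviour are assumptions, loosely modelled on the description of gff_seg.pl in Table 3, and it filters individual lines rather than merging them into segments.</p>

```python
import io
import sys


def filter_gff(instream, outstream, threshold):
    """Copy GFF lines whose score (column 6) exceeds threshold.

    Comment lines, blank lines, and lines with a non-numeric score are
    skipped. Returns the number of lines written.
    """
    kept = 0
    for line in instream:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if not len(fields) >= 8:
            continue  # not a complete GFF line
        try:
            score = float(fields[5])
        except ValueError:
            continue  # score column is "." or otherwise non-numeric
        if score > threshold:
            outstream.write(line if line.endswith("\n") else line + "\n")
            kept += 1
    return kept


# In a pipeline the streams would be standard input and standard output:
#     filter_gff(sys.stdin, sys.stdout, 10.0)
# Here in-memory streams are used so the behaviour is easy to inspect.
demo_in = io.StringIO(
    "##gff-version 2\n"
    "contig1\toligocount\toligo\t1\t20\t5\t+\t.\toligo_1\n"
    "contig1\toligocount\toligo\t2\t21\t50\t+\t.\toligo_2\n"
    "contig1\tgenscan\texon\t30\t90\t.\t+\t0\tgene_1\n"
)
demo_out = io.StringIO()
kept = filter_gff(demo_in, demo_out, threshold=10.0)
```

         <p>Because the function only depends on file-like objects, the same code can read from a file, a pipe, or the output of another script in the workflow.</p>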
      </sec>
      <sec>
         <st>
            <p>Availability and requirements</p>
         </st>
         <p>Project Name: DAWGPAWS Plant Genome Annotation Pipeline</p>
         <p>Project Home Page: <url>http://dawgpaws.sourceforge.net/</url></p>
         <p>Operating System: Platform Independent</p>
         <p>Programming Language: Perl</p>
         <p>Other Requirements: BioPerl 1.4, as well as the annotation programs that scripts are dependent upon.</p>
         <p>License: GNU General Public License 3</p>
         <p>Any restrictions to use by non-academics: No restrictions</p>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>BAC: Bacterial Artificial Chromosome; cDNA: complementary DNA; CLI: Command Line Interface; EST: Expressed Sequence Tag; flcDNA: full-length complementary DNA; GFF: General Feature Format; GPL: General Public License; HMM: Hidden Markov Model; LTR: Long Terminal Repeat; ORF: Open Reading Frame; pHMM: Profile Hidden Markov Model; TE: Transposable Element</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>JE developed the pipeline, wrote the software, and drafted the manuscript. JB conceived the study, oversaw pipeline development, and helped draft the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors would like to thank Katrien Devos, Antonio Costa de Oliveira, Xiangyang Xu, Ansuya Jogi, and Jennifer Hawkins for their useful feedback that has been incorporated in the implementation of DAWGPAWS. Xiangyang Xu provided helpful comments on a draft version of this manuscript. The submitted version of this manuscript was refined with constructive comments from two anonymous peer reviewers. This work was supported by NSF grants DBI-0501814 and DBI-0607123.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata</p>
            </title>
            <aug>
               <au>
                  <snm>Liolios</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Mavromatis</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Tavernarakis</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Kyrpides</snm>
                  <fnm>NC</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2008</pubdate>
            <volume>36</volume>
            <fpage>D475</fpage>
            <lpage>479</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2238992</pubid>
                  <pubid idtype="pmpid" link="fulltext">17981842</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Database resources of the National Center for Biotechnology Information</p>
            </title>
            <aug>
               <au>
                  <snm>Wheeler</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Barrett</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Benson</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Bryant</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Canese</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Chetvernin</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>DiCuccio</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Edgar</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Federhen</snm>
                  <fnm>S</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2008</pubdate>
            <volume>36</volume>
            <fpage>D13</fpage>
            <lpage>21</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2238880</pubid>
                  <pubid idtype="pmpid" link="fulltext">18045790</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Bioinformatics challenges of new sequencing technology</p>
            </title>
            <aug>
               <au>
                  <snm>Pop</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2008</pubdate>
            <volume>24</volume>
            <fpage>142</fpage>
            <lpage>149</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2680276</pubid>
                  <pubid idtype="pmpid" link="fulltext">18262676</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Genome annotation: from sequence to biology</p>
            </title>
            <aug>
               <au>
                  <snm>Stein</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <fpage>493</fpage>
            <lpage>503</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11433356</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Transposable element contributions to plant gene and genome evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Bennetzen</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Plant Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>42</volume>
            <fpage>251</fpage>
            <lpage>269</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10688140</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Transposable elements, gene creation and genome rearrangement in flowering plants</p>
            </title>
            <aug>
               <au>
                  <snm>Bennetzen</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Curr Opin Genet Dev</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>621</fpage>
            <lpage>627</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16219458</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>A brief review of computational gene prediction methods</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Genomics Proteomics Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>2</volume>
            <fpage>216</fpage>
            <lpage>221</lpage>
            <xrefbib>
               <pubid idtype="pmpid">15901250</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Computational approaches to gene prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Do</snm>
                  <fnm>JH</fnm>
               </au>
               <au>
                  <snm>Choi</snm>
                  <fnm>DK</fnm>
               </au>
            </aug>
            <source>J Microbiol</source>
            <pubdate>2006</pubdate>
            <volume>44</volume>
            <fpage>137</fpage>
            <lpage>144</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16728949</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>EuGene: An Eucaryotic Gene Finder that combines several sources of evidence</p>
            </title>
            <aug>
               <au>
                  <snm>Schiex</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Moisan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Rouz&#233;</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Computational Biology</source>
            <editor>Gascuel O, Sagot M-F</editor>
            <pubdate>2001</pubdate>
            <fpage>111</fpage>
            <lpage>125</lpage>
         </bibl>
         <bibl id="B10">
            <title>
               <p>JIGSAW: integration of multiple sources of evidence for gene prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Allen</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>3596</fpage>
            <lpage>3603</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16076884</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Consistent over-estimation of gene number in complex plant genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Bennetzen</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Coleman</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ma</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ramakrishna</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Curr Opin Plant Biol</source>
            <pubdate>2004</pubdate>
            <volume>7</volume>
            <fpage>732</fpage>
            <lpage>736</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15491923</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Evidence that a recent increase in maize genome size was caused by the massive amplification of intergene retrotransposons</p>
            </title>
            <aug>
               <au>
                  <snm>SanMiguel</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bennetzen</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Annals of Botany</source>
            <pubdate>1998</pubdate>
            <volume>82</volume>
            <fpage>37</fpage>
            <lpage>44</lpage>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Discovering and detecting transposable elements in genome sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Bergman</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Quesneville</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Brief Bioinform</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>382</fpage>
            <lpage>392</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">17932080</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Computational analysis and paleogenomics of interspersed repeats in eukaryotes</p>
            </title>
            <aug>
               <au>
                  <snm>Feschotte</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Pritham</snm>
                  <fnm>EJ</fnm>
               </au>
            </aug>
            <source>Computational Genomics: Current Methods</source>
            <publisher>Taylor and Francis</publisher>
            <editor>Stojanovic N</editor>
            <pubdate>2007</pubdate>
            <fpage>31</fpage>
            <lpage>54</lpage>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Computational Approaches and Tools Used in Identification of Dispersed Repetitive DNA Sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Saha</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bridges</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Magbanua</snm>
                  <fnm>ZV</fnm>
               </au>
               <au>
                  <snm>Peterson</snm>
                  <fnm>DG</fnm>
               </au>
            </aug>
            <source>Tropical Plant Biology</source>
            <pubdate>2008</pubdate>
            <volume>1</volume>
            <fpage>85</fpage>
            <lpage>96</lpage>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Automated de novo identification of repeat sequence families in sequenced genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Bao</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>1269</fpage>
            <lpage>1276</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">186642</pubid>
                  <pubid idtype="pmpid" link="fulltext">12176934</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>PILER: identification and classification of genomic repeats</p>
            </title>
            <aug>
               <au>
                  <snm>Edgar</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>Suppl 1</issue>
            <fpage>i152</fpage>
            <lpage>158</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15961452</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>De novo identification of repeat families in large genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Price</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>NC</fnm>
               </au>
               <au>
                  <snm>Pevzner</snm>
                  <fnm>PA</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>Suppl 1</issue>
            <fpage>i351</fpage>
            <lpage>358</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15961478</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Eight novel families of miniature inverted repeat transposable elements in the African malaria mosquito, Anopheles gambiae</p>
            </title>
            <aug>
               <au>
                  <snm>Tu</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>1699</fpage>
            <lpage>1704</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">29320</pubid>
                  <pubid idtype="pmpid" link="fulltext">11172014</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>LTR_STRUC: a novel search and identification program for LTR retrotransposons</p>
            </title>
            <aug>
               <au>
                  <snm>McCarthy</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>McDonald</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>362</fpage>
            <lpage>367</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12584121</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons</p>
            </title>
            <aug>
               <au>
                  <snm>Xu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <fpage>W265</fpage>
            <lpage>268</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1933203</pubid>
                  <pubid idtype="pmpid" link="fulltext">17485477</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>RepeatMasker Open-3.0. 1996&#8211;2004</p>
            </title>
            <aug>
               <au>
                  <snm>Smit</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hubley</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <url>http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker</url>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor</p>
            </title>
            <aug>
               <au>
                  <snm>Kohany</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Gentles</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Hankus</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Jurka</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>474</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1634758</pubid>
                  <pubid idtype="pmpid" link="fulltext">17064419</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>TEnest: Automated chronological annotation and visualization of nested plant transposable elements</p>
            </title>
            <aug>
               <au>
                  <snm>Kronmiller</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Wise</snm>
                  <fnm>RP</fnm>
               </au>
            </aug>
            <source>Plant Physiol</source>
            <pubdate>2008</pubdate>
            <volume>146</volume>
            <fpage>45</fpage>
            <lpage>59</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2230558</pubid>
                  <pubid idtype="pmpid" link="fulltext">18032588</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Automated paleontology of repetitive DNA with REANNOTATE</p>
            </title>
            <aug>
               <au>
                  <snm>Pereira</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>614</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2672092</pubid>
                  <pubid idtype="pmpid" link="fulltext">19094224</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>TREP: a database for Triticeae repetitive elements</p>
            </title>
            <aug>
               <au>
                  <snm>Wicker</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Matthews</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Keller</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Trends in Plant Science</source>
            <pubdate>2002</pubdate>
            <volume>7</volume>
            <fpage>561</fpage>
            <lpage>562</lpage>
         </bibl>
         <bibl id="B27">
            <title>
               <p>The TIGR Plant Repeat Databases: a collective resource for the identification of repetitive sequences in plants</p>
            </title>
            <aug>
               <au>
                  <snm>Ouyang</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Buell</snm>
                  <fnm>CR</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>D360</fpage>
            <lpage>363</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308833</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681434</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Repbase Update, a database of eukaryotic repetitive elements</p>
            </title>
            <aug>
               <au>
                  <snm>Jurka</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kapitonov</snm>
                  <fnm>VV</fnm>
               </au>
               <au>
                  <snm>Pavlicek</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Klonowski</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kohany</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Walichiewicz</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Cytogenet Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>110</volume>
            <fpage>462</fpage>
            <lpage>467</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16093699</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>MIPSPlantsDB &#8211; plant database resource for integrative and comparative plant genome research</p>
            </title>
            <aug>
               <au>
                  <snm>Spannagl</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Noubibou</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Haase</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Gundlach</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hindemitt</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Klee</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Haberer</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Schoof</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Mayer</snm>
                  <fnm>KF</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <fpage>D834</fpage>
            <lpage>840</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1899105</pubid>
                  <pubid idtype="pmpid" link="fulltext">17202173</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>ReAS: Recovery of ancestral sequences for transposable elements from the unassembled reads of a whole genome shotgun</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ye</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Han</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Ye</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>GK</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2005</pubdate>
            <volume>1</volume>
            <fpage>e43</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1232128</pubid>
                  <pubid idtype="pmpid" link="fulltext">16184192</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Discovery and assembly of repeat family pseudomolecules from sparse genomic sequence data using the Assisted Automated Assembler of Repeat Families (AAARF) algorithm</p>
            </title>
            <aug>
               <au>
                  <snm>DeBarry</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Bennetzen</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>235</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2412881</pubid>
                  <pubid idtype="pmpid" link="fulltext">18474116</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Kurtz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Narechania</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Stein</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Ware</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>517</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2613927</pubid>
                  <pubid idtype="pmpid" link="fulltext">18976482</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Combined evidence annotation of transposable elements in genome sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Quesneville</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Bergman</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Andrieu</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Autard</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Nouaud</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ashburner</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Anxolabehere</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2005</pubdate>
            <volume>1</volume>
            <fpage>166</fpage>
            <lpage>175</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1185648</pubid>
                  <pubid idtype="pmpid" link="fulltext">16110336</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>GFF Format Specifications</p>
            </title>
            <url>http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml</url>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Apollo: a sequence annotation editor</p>
            </title>
            <aug>
               <au>
                  <snm>Lewis</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Searle</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Iyer</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Richter</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wiel</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Bayraktaroglu</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Crosby</snm>
                  <fnm>MA</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>RESEARCH0082</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">151184</pubid>
                  <pubid idtype="pmpid" link="fulltext">12537571</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Artemis: sequence visualization and annotation</p>
            </title>
            <aug>
               <au>
                  <snm>Rutherford</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Parkhill</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Crook</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Horsnell</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Rice</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rajandream</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Barrell</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>944</fpage>
            <lpage>945</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11120685</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Using the Generic Genome Browser (GBrowse)</p>
            </title>
            <aug>
               <au>
                  <snm>Donlin</snm>
                  <fnm>MJ</fnm>
               </au>
            </aug>
            <source>Curr Protoc Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>Chapter 9</volume>
            <issue>Unit 9</issue>
            <fpage>9</fpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">18428797</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>The UCSC Genome Browser Database: update 2009</p>
            </title>
            <aug>
               <au>
                  <snm>Kuhn</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Karolchik</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zweig</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Rosenbloom</snm>
                  <fnm>KR</fnm>
               </au>
               <au>
                  <snm>Rhead</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Raney</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Pohl</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Pheasant</snm>
                  <fnm>M</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2009</pubdate>
            <volume>37</volume>
            <fpage>D755</fpage>
            <lpage>761</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2686463</pubid>
                  <pubid idtype="pmpid" link="fulltext">18996895</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>The Ensembl Web site: mechanics of a genome browser</p>
            </title>
            <aug>
               <au>
                  <snm>Stalker</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gibbins</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Meidl</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Spooner</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Hotz</snm>
                  <fnm>HR</fnm>
               </au>
               <au>
                  <snm>Cox</snm>
                  <fnm>AV</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>951</fpage>
            <lpage>955</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">479125</pubid>
                  <pubid idtype="pmpid" link="fulltext">15123591</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>BioSQL</p>
            </title>
            <url>http://www.biosql.org</url>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Using Chado to store genome annotation data</p>
            </title>
            <aug>
               <au>
                  <snm>Zhou</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Emmert</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Curr Protoc Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>Chapter 9</volume>
            <issue>Unit 9</issue>
            <fpage>6</fpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">18428772</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>GeneID in Drosophila</p>
            </title>
            <aug>
               <au>
                  <snm>Parra</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Blanco</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Guigo</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>511</fpage>
            <lpage>515</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">310871</pubid>
                  <pubid idtype="pmpid" link="fulltext">10779490</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>GeneMark.hmm: new solutions for gene finding</p>
            </title>
            <aug>
               <au>
                  <snm>Lukashin</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Borodovsky</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1998</pubdate>
            <volume>26</volume>
            <fpage>1107</fpage>
            <lpage>1115</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">147337</pubid>
                  <pubid idtype="pmpid" link="fulltext">9461475</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Prediction of complete gene structures in human genomic DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Burge</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1997</pubdate>
            <volume>268</volume>
            <fpage>78</fpage>
            <lpage>94</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9149143</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>De novo identification of LTR retrotransposons in eukaryotic genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Rho</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Choi</snm>
                  <fnm>JH</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lynch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tang</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>90</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1858694</pubid>
                  <pubid idtype="pmpid" link="fulltext">17407597</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Efficient algorithms and software for detection of full-length LTR retrotransposons</p>
            </title>
            <aug>
               <au>
                  <snm>Kalyanaraman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Aluru</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>J Bioinform Comput Biol</source>
            <pubdate>2006</pubdate>
            <volume>4</volume>
            <fpage>197</fpage>
            <lpage>216</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16819780</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Tandem repeats finder: a program to analyze DNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Benson</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1999</pubdate>
            <volume>27</volume>
            <fpage>573</fpage>
            <lpage>580</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">148217</pubid>
                  <pubid idtype="pmpid" link="fulltext">9862982</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Profile hidden Markov models</p>
            </title>
            <aug>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1998</pubdate>
            <volume>14</volume>
            <fpage>755</fpage>
            <lpage>763</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9918945</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Basic local alignment search tool</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1990</pubdate>
            <volume>215</volume>
            <fpage>403</fpage>
            <lpage>410</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">2231712</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Identification of human gene structure using linear discriminant functions and dynamic programming</p>
            </title>
            <aug>
               <au>
                  <snm>Solovyev</snm>
                  <fnm>VV</fnm>
               </au>
               <au>
                  <snm>Salamov</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>CB</fnm>
               </au>
            </aug>
            <source>Proc Int Conf Intell Syst Mol Biol</source>
            <pubdate>1995</pubdate>
            <volume>3</volume>
            <fpage>367</fpage>
            <lpage>375</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7584460</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Ab initio gene finding in Drosophila genomic DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Salamov</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Solovyev</snm>
                  <fnm>VV</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>516</fpage>
            <lpage>522</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">310882</pubid>
                  <pubid idtype="pmpid" link="fulltext">10779491</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Repseek, a tool to retrieve approximate repeats from large DNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Achaz</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Boyer</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Rocha</snm>
                  <fnm>EP</fnm>
               </au>
               <au>
                  <snm>Viari</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Coissac</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <fpage>119</fpage>
            <lpage>121</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">17038345</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>vmatch</p>
            </title>
            <url>http://www.vmatch.de/</url>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Low-pass shotgun sequencing of the barley genome facilitates rapid identification of genes, conserved non-coding sequences and novel repeats</p>
            </title>
            <aug>
               <au>
                  <snm>Wicker</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Narechania</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sabot</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Stein</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Vu</snm>
                  <fnm>GT</fnm>
               </au>
               <au>
                  <snm>Graner</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ware</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Stein</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>518</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2584661</pubid>
                  <pubid idtype="pmpid" link="fulltext">18976483</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Generalized Venn diagrams: a new method of visualizing complex genetic set relations</p>
            </title>
            <aug>
               <au>
                  <snm>Kestler</snm>
                  <fnm>HA</fnm>
               </au>
               <au>
                  <snm>Muller</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gress</snm>
                  <fnm>TM</fnm>
               </au>
               <au>
                  <snm>Buchholz</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>1592</fpage>
            <lpage>1595</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15572472</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>DAWGPAWS User Manual</p>
            </title>
            <url>http://dawgpaws.sourceforge.net/man.html</url>
         </bibl>
         <bibl id="B57">
            <title>
               <p>How to Load GFF Into Chado</p>
            </title>
            <url>http://gmod.org/wiki/Load_GFF_Into_Chado</url>
         </bibl>
         <bibl id="B58">
            <title>
               <p>First nuclear DNA amounts in more than 300 angiosperms</p>
            </title>
            <aug>
               <au>
                  <snm>Zonneveld</snm>
                  <fnm>BJM</fnm>
               </au>
               <au>
                  <snm>Leitch</snm>
                  <fnm>IJ</fnm>
               </au>
               <au>
                  <snm>Bennett</snm>
                  <fnm>MD</fnm>
               </au>
            </aug>
            <source>Annals of Botany</source>
            <pubdate>2005</pubdate>
            <volume>96</volume>
            <fpage>229</fpage>
            <lpage>244</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15905300</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B59">
            <title>
               <p>Genome Size and Proportion of Repeated Nucleotide-Sequence DNA in Plants</p>
            </title>
            <aug>
               <au>
                  <snm>Flavell</snm>
                  <fnm>RB</fnm>
               </au>
               <au>
                  <snm>Bennett</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>DB</fnm>
               </au>
            </aug>
            <source>Biochemical Genetics</source>
            <pubdate>1974</pubdate>
            <volume>12</volume>
            <fpage>257</fpage>
            <lpage>269</lpage>
            <xrefbib>
               <pubid idtype="pmpid">4441361</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B60">
            <title>
               <p>Structure and organization of the wheat genome &#8211; the number of genes in the hexaploid wheat genome</p>
            </title>
            <aug>
               <au>
                  <snm>Devos</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Costa de Oliveira</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Estill</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Estep</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Jogi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Morales</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pinheiro</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>SanMiguel</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bennetzen</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>11th International Wheat Genetics Symposium 2008 Proceedings</source>
            <publisher>Sydney University Press</publisher>
            <editor>Rudi Appels, Russell Eastwood, Evans Lagudah, Peter Langridge, Michael Mackay, Lynne McIntyre, Peter Sharp</editor>
            <pubdate>2008</pubdate>
            <fpage>1</fpage>
            <lpage>5</lpage>
            <url>http://ses.library.usyd.edu.au/bitstream/2123/3389/1/O25.pdf</url>
         </bibl>
         <bibl id="B61">
            <title>
               <p>BLAT &#8211; the BLAST-like alignment tool</p>
            </title>
            <aug>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>656</fpage>
            <lpage>664</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">187518</pubid>
                  <pubid idtype="pmpid" link="fulltext">11932250</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B62">
            <title>
               <p>A computer program for aligning a cDNA sequence with a genomic DNA sequence</p>
            </title>
            <aug>
               <au>
                  <snm>Florea</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hartzell</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>1998</pubdate>
            <volume>8</volume>
            <fpage>967</fpage>
            <lpage>974</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">310774</pubid>
                  <pubid idtype="pmpid" link="fulltext">9750195</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B63">
            <title>
               <p>AUGUSTUS: ab initio prediction of alternative transcripts</p>
            </title>
            <aug>
               <au>
                  <snm>Stanke</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Keller</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Gunduz</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Hayes</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Waack</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Morgenstern</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>W435</fpage>
            <lpage>439</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1538822</pubid>
                  <pubid idtype="pmpid" link="fulltext">16845043</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B64">
            <title>
               <p>TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders</p>
            </title>
            <aug>
               <au>
                  <snm>Majoros</snm>
                  <fnm>WH</fnm>
               </au>
               <au>
                  <snm>Pertea</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>2878</fpage>
            <lpage>2879</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15145805</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B65">
            <title>
               <p>Gene finding in novel genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Korf</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>59</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">421630</pubid>
                  <pubid idtype="pmpid" link="fulltext">15144565</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B66">
            <title>
               <p>Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies</p>
            </title>
            <aug>
               <au>
                  <snm>Haas</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Delcher</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Mount</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Wortman</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>RK</fnm>
                  <suf>Jr</suf>
               </au>
               <au>
                  <snm>Hannick</snm>
                  <fnm>LI</fnm>
               </au>
               <au>
                  <snm>Maiti</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ronning</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Rusch</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Town</snm>
                  <fnm>CD</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>5654</fpage>
            <lpage>5666</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">206470</pubid>
                  <pubid idtype="pmpid" link="fulltext">14500829</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B67">
            <title>
               <p>Comparative analysis and visualization of genomic sequences using VISTA browser and associated computational tools</p>
            </title>
            <aug>
               <au>
                  <snm>Dubchak</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Methods Mol Biol</source>
            <pubdate>2007</pubdate>
            <volume>395</volume>
            <fpage>3</fpage>
            <lpage>16</lpage>
            <xrefbib>
               <pubid idtype="pmpid">17993664</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B68">
            <title>
               <p>Identification of transposable elements using multiple alignments of related genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Caspi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Pachter</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <fpage>260</fpage>
            <lpage>270</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1361722</pubid>
                  <pubid idtype="pmpid" link="fulltext">16354754</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B69">
            <title>
               <p>Comparative gene prediction in human and mouse</p>
            </title>
            <aug>
               <au>
                  <snm>Parra</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Agarwal</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Abril</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Wiehe</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Fickett</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Guigo</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>108</fpage>
            <lpage>117</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">430976</pubid>
                  <pubid idtype="pmpid" link="fulltext">12529313</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B70">
            <title>
               <p>SLAM: cross-species gene finding and alignment with a generalized pair hidden Markov model</p>
            </title>
            <aug>
               <au>
                  <snm>Alexandersson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Cawley</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Pachter</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>496</fpage>
            <lpage>502</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">430255</pubid>
                  <pubid idtype="pmpid" link="fulltext">12618381</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B71">
            <title>
               <p>Integrating genomic homology into gene structure prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Korf</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Flicek</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Duan</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Brent</snm>
                  <fnm>MR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <issue>Suppl 1</issue>
            <fpage>S140</fpage>
            <lpage>S148</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11473003</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B72">
            <title>
               <p>The Bioperl toolkit: Perl modules for the life sciences</p>
            </title>
            <aug>
               <au>
                  <snm>Stajich</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Block</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Boulez</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Chervitz</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Dagdigian</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Fuellen</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Gilbert</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Korf</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Lapp</snm>
                  <fnm>H</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>1611</fpage>
            <lpage>1618</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">187536</pubid>
                  <pubid idtype="pmpid" link="fulltext">12368254</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B73">
            <title>
               <p>tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence</p>
            </title>
            <aug>
               <au>
                  <snm>Lowe</snm>
                  <fnm>TM</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>955</fpage>
            <lpage>964</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146525</pubid>
                  <pubid idtype="pmpid" link="fulltext">9023104</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B74">
            <title>
               <p>DAWGPAWS SourceForge Project Page</p>
            </title>
            <url>http://sourceforge.net/projects/dawgpaws/</url>
         </bibl>
      </refgrp>
   </bm>
</art>
