

Open Access Review

Analysing complex Triticeae genomes – concepts and strategies

Manuel Spannagl*, Mihaela M Martis, Matthias Pfeifer, Thomas Nussbaumer and Klaus FX Mayer*

Author Affiliations

MIPS/IBIS, Helmholtz Center Munich, National Research Center for Environment and Health, Ingolstaedter Landstr. 1, Neuherberg, Germany


Plant Methods 2013, 9:35  doi:10.1186/1746-4811-9-35

Published: 6 September 2013

Abstract

The genomic sequences of many important Triticeae crop species are difficult to assemble and analyse because of their large genome sizes, polyploidy (in some species) and high repeat content. Recently, draft genomes of barley and bread wheat were reported, made possible by cost-efficient and fast next-generation sequencing (NGS) technologies. The barley genome is estimated to be 5 Gb in size, whereas the allo-hexaploid bread wheat genome comprises 17 Gb. Direct assembly of the sequence reads and access to the gene content are hampered by the high repeat content. As a consequence, novel strategies and data analysis concepts had to be developed to provide much-needed whole-genome sequence surveys and access to the gene repertoires. Here we describe several analytical strategies that now enable the structuring of the massive NGS data generated and pave the way towards structured, ordered sequence data and gene order. Specifically, we report on the GenomeZipper, a synteny-driven approach to order and structure NGS survey sequences of grass genomes that lack a physical map. In addition, to access and analyse the gene repertoire of allo-hexaploid bread wheat from the raw sequence reads, a reference-guided approach was developed that uses representative genes from rice, Brachypodium distachyon, sorghum and barley. Stringent sub-assembly on the reference genes prevented the collapse of homeologous wheat genes and allowed us to estimate the gene retention rate and determine gene family sizes. Genomic sequences from the progenitors of the wheat sub-genomes made it possible to assign a large number of sub-assemblies to the wheat A, B or D sub-genome using machine learning algorithms. Many of the concepts outlined here can readily be applied to other complex plant and non-plant genomes.
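The core idea behind a synteny-driven approach such as the GenomeZipper, ordering survey sequences by the conserved gene order of a sequenced relative, can be illustrated with a minimal Python sketch. It assumes that each survey sequence has already been assigned a best homology hit to a gene of a single sequenced reference grass, and that the order and orientation of syntenic blocks along the target chromosome are already known (for example from genetically mapped markers, which is an assumption here). The record type `RefGene`, the function `zipper_order` and all gene and read IDs are hypothetical; this is a simplified illustration of synteny-based ordering, not the published GenomeZipper implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class RefGene:
    """A gene annotated on a sequenced reference grass genome (hypothetical record)."""
    gene_id: str
    chrom: str
    start: int


# A syntenic block: (reference chromosome, start, end, reversed?).
# Hypothetical representation; blocks are listed in the order in which they
# occur along the target Triticeae chromosome.
Block = Tuple[str, int, int, bool]


def zipper_order(
    best_hits: Dict[str, Optional[str]],
    ref_genes: Dict[str, RefGene],
    blocks: List[Block],
) -> List[Tuple[str, str]]:
    """Order survey sequences by the position of their best-hit reference gene.

    best_hits maps each survey sequence ID to the ID of its best-hit reference
    gene, or None if it has no homology hit. Sequences without a hit, or whose
    hit falls outside every syntenic block, are left unplaced.
    """
    placed = []
    for block_index, (chrom, lo, hi, is_reversed) in enumerate(blocks):
        for survey_id, gene_id in best_hits.items():
            if gene_id is None:
                continue
            gene = ref_genes[gene_id]
            if gene.chrom == chrom and lo <= gene.start <= hi:
                # Inverted blocks are traversed from their end to their start.
                key = -gene.start if is_reversed else gene.start
                placed.append((block_index, key, survey_id, gene_id))
    # Sort first by block order, then by position within the block.
    placed.sort()
    return [(survey_id, gene_id) for _, _, survey_id, gene_id in placed]


if __name__ == "__main__":
    genes = {
        "refA_g1": RefGene("refA_g1", "refA", 1_000),
        "refA_g2": RefGene("refA_g2", "refA", 5_000),
        "refB_g1": RefGene("refB_g1", "refB", 20_000),
        "refB_g2": RefGene("refB_g2", "refB", 25_000),
    }
    hits = {"read_a": "refA_g2", "read_b": "refB_g1", "read_c": "refA_g1",
            "read_d": None, "read_e": "refB_g2"}
    blocks = [("refB", 15_000, 30_000, True), ("refA", 0, 10_000, False)]
    # Prints [('read_e', 'refB_g2'), ('read_b', 'refB_g1'),
    #         ('read_c', 'refA_g1'), ('read_a', 'refA_g2')]
    print(zipper_order(hits, genes, blocks))
```

In this toy example, `read_d` has no homology hit and therefore stays unplaced, reflecting the general limitation that synteny can only order sequences with detectable similarity to a reference gene. A real pipeline would additionally have to integrate several reference genomes and resolve conflicting or ambiguous hits.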

Keywords:
Triticeae genomes; Grass genomes; Wheat genome; Barley genome; GenomeZipper; Genome analysis