ID: 591489.0, MPI für Entwicklungsbiologie / Abteilung 6 - Molecular Biology (D. Weigel)
Reference-guided assembly of four diverse Arabidopsis thaliana genomes
Authors:Schneeberger, K.; Ossowski, S.; Ott, F.; Klein, J. D.; Wang, X.; Lanz, C.; Smith, L. M.; Cao, J.; Fitz, J.; Warthmann, N.; Henz, S. R.; Huson, D. H.; Weigel, D.
Date of Publication (YYYY-MM-DD):2011-06-21
Title of Journal:Proc. Natl. Acad. Sci. USA
Issue / Number:25
Start Page:10249
End Page:10254
Review Status:not specified
Audience:Not Specified
Abstract / Description:We present whole-genome assemblies of four divergent Arabidopsis thaliana strains that complement the 125-Mb reference genome sequence released a decade ago. Using a newly developed reference-guided approach, we assembled large contigs from 9 to 42 Gb of Illumina short-read data from the Landsberg erecta (Ler-1), C24, Bur-0, and Kro-0 strains, which have been sequenced as part of the 1,001 Genomes Project for this species. Using alignments against the reference sequence, we first reduced the complexity of the de novo assembly and later integrated reads without similarity to the reference sequence. As an example, half of the noncentromeric C24 genome was covered by scaffolds that are longer than 260 kb, with a maximum of 2.2 Mb. Moreover, over 96% of the reference genome was covered by the reference-guided assembly, compared with only 87% with a complete de novo assembly. Comparisons with 2 Mb of dideoxy sequence reveal that the per-base error rate of the reference-guided assemblies was below 1 in 10,000. Our assemblies provide a detailed, genomewide picture of large-scale differences between A. thaliana individuals, most of which are difficult to access with alignment-consensus methods only. We demonstrate their practical relevance in studying the expression differences of polymorphic genes and show how the analysis of sRNA sequencing data can lead to erroneous conclusions if aligned against the reference genome alone. Genome assemblies, raw reads, and further information are accessible through http://1001genomes.org/projects/assemblies.html.
Free Keywords:Algorithms; Arabidopsis/*genetics; Base Sequence; *Genome, Plant; Polymorphism, Genetic; Sequence Alignment; Sequence Analysis, DNA
External Publication Status:published
Document Type:Article
Affiliations:MPI für Entwicklungsbiologie/Abteilung 6 - Molekulare Biologie (Detlef Weigel)
Language:eng
Identifiers:ISSN:1091-6490 (Electronic) 0027-8424 (Linking) %R 1107... [ID No:1]
URL:http://www.ncbi.nlm.nih.gov/pubmed/21646520 [ID No:2]