Effective deployment of new technologies in a hierarchical WGS approach will yield reference sequences based on well-defined milestones. An early deliverable will be 21X WGS sequence and preliminary assemblies (gene-boosted and whole-genome) of the loblolly pine genome, based on >= 100 bp paired-end Illumina sequences from a mix of 500-bp, 5-kbp, and 40-kbp (fosmid-diTag) libraries. In less than two years, a 10x18 hierarchical WGS (180X total read depth), based on 18X read depth (from 500-bp, 5-kbp, and 40-kbp libraries) of many small pools of fosmids, will provide the fundamental data for two types of assemblies: a consensus based on all the data, and a second consensus based on hierarchical analysis of subassemblies of the haploid fosmid pools. Polishing will follow, using longer end reads from a 10X BAC library, deep fosmid-end sequencing, and existing or emerging long-read technologies that are deemed effective for improving assembly quality. A high-resolution (0.1 cM) genetic scaffold based on a new genotyping resource will incorporate all genotypable contigs and validate the contiguity of the larger ones. In later years, comparable reference sequences for sugar pine, slash pine, and Douglas fir will be created. Comparative genomic analysis of these four conifer genomes will provide a solid and rich annotation and further improve assembly quality and contiguity.
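The depth figures above can be checked with a short calculation. The sketch below is illustrative only: it assumes that "10x18" means roughly 10X fosmid clone coverage of the genome with each pool sequenced to 18X read depth, and it uses a placeholder genome size of about 22 Gbp that is not stated in this summary. The function names are hypothetical.

```python
def total_read_depth(clone_coverage: int, depth_per_pool: int) -> int:
    """Total read depth when clones cover the genome `clone_coverage` times
    and each clone is sequenced to `depth_per_pool` read depth."""
    return clone_coverage * depth_per_pool


def reads_required(genome_size_bp: int, depth: int, read_length_bp: int) -> int:
    """Minimum number of reads needed to reach `depth` (ceiling division)."""
    total_bases = genome_size_bp * depth
    return -(-total_bases // read_length_bp)


# Assumed values for illustration; only 10, 18, 21 and 100 bp come from the text.
ASSUMED_GENOME_SIZE_BP = 22_000_000_000  # placeholder, not from this summary
READ_LENGTH_BP = 100                     # lower bound given as ">= 100 bp"

hierarchical_depth = total_read_depth(10, 18)  # 180, matching "180X total read depth"
initial_reads = reads_required(ASSUMED_GENOME_SIZE_BP, 21, READ_LENGTH_BP)
hierarchical_reads = reads_required(ASSUMED_GENOME_SIZE_BP, hierarchical_depth, READ_LENGTH_BP)
```

Because the read length is a lower bound, the read counts computed this way are upper bounds for the assumed genome size.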
In parallel with the reference genome assembly and annotation, we will build transcriptome references using multiple sequencing approaches to maximize evidence-based gene discovery, and we will provide full transcript assemblies for functional genomics studies. Initially, RNAs from a large number of loblolly pine organs, stages of development, and tissues exposed to biotic and abiotic stresses will be sequenced using the long reads of Roche/454 GS-FLX Titanium technology. Subsequently, higher-depth RNA-Seq approaches will be employed using the Illumina platform, including the sequencing of various mRNA and noncoding RNA libraries. Data will be used first to add depth and detail to the transcriptome and to catalog transcribed polymorphisms. Transcriptome analysis will then profile gene expression differences of biological importance, including changes during the development of reproductive tissues, embryos and seedlings, and wood, and changes in response to biotic and abiotic stresses.
The transcriptome and genome sequences will be delivered to the community via TreeGenes as sequence becomes available. Collaboration with GDR will provide the primary annotation and will integrate GenSAS, a custom web-based tool from GDR, with GBrowse from Dendrome to facilitate community-level annotation. We will apply and expand existing pipelines to deliver a comprehensive SNP resource and distribute it through the existing DiversiTree interface. We will work continuously with existing projects such as Gene Ontology and Plant Ontology to implement conifer-specific ontologies that consistently describe gene products and phenotypes. All pipelines and tools developed in this project will be made freely available to the academic community.