Prominent use of distal 5′ transcription start sites and discovery of a large number of additional exons in ENCODE regions

Denoeud, F., Kapranov, P., Ucla, C., Frankish, A., Castelo, R., Drenkow, J., Lagarde, J., Alioto, T., Manzano, C., Chrast, J., Dike, S., Wyss, C., Henrichsen, C. N., Holroyd, N., Dickson, M. C., Taylor, R., Hance, Z., Foissac, S., Myers, R. M., Rogers, J., Hubbard, T., Harrow, J., Guigó, R., Gingeras, T. R., Antonarakis, S. E., Reymond, A. (2007) Prominent use of distal 5′ transcription start sites and discovery of a large number of additional exons in ENCODE regions. Genome Research, 17 (6). pp. 746-759. ISSN 1088-9051 (Public Dataset)

Abstract

This report presents systematic empirical annotation of transcript products from 399 annotated protein-coding loci across the 1% of the human genome targeted by the Encyclopedia of DNA Elements (ENCODE) pilot project using a combination of 5′ rapid amplification of cDNA ends (RACE) and high-density resolution tiling arrays. We identified previously unannotated and often tissue- or cell-line-specific transcribed fragments (RACEfrags), both 5′ distal to the annotated 5′ terminus and internal to the annotated gene bounds for the vast majority (81.5%) of the tested genes. Half of the distal RACEfrags span large segments of genomic sequences away from the main portion of the coding transcript and often overlap with the upstream-annotated gene(s). Notably, at least 20% of the resultant novel transcripts have changes in their open reading frames (ORFs), most of them fusing ORFs of adjacent transcripts. A significant fraction of distal RACEfrags show expression levels comparable to those of known exons of the same locus, suggesting that they are not part of very minority splice forms. These results have significant implications concerning (1) our current understanding of the architecture of protein-coding genes; (2) our views on locations of regulatory regions in the genome; and (3) the interpretation of sequence polymorphisms mapping to regions hitherto considered to be "noncoding," ultimately relating to the identification of disease-related sequence alterations. © 2007 by Cold Spring Harbor Laboratory Press.

Item Type: Paper
Uncontrolled Keywords: 5' transcription start site; analytic method; article; controlled study; DNA polymorphism; exon; gene amplification; gene locus; gene sequence; genetic transcription; human; human cell; human genome; nucleotide sequence; open reading frame; priority journal; RNA splicing; Chromosome Mapping; DNA, Complementary; Exons; Genome, Human; Human Genome Project; Humans; Open Reading Frames; Promoter Regions (Genetics); Quantitative Trait Loci; Transcription, Genetic
Subjects: bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > transcription
bioinformatics > genomics and proteomics > annotation > map annotation
Communities: CSHL labs > Gingeras lab
Depositing User: CSHL Librarian
Date: 2007
Date Deposited: 08 Mar 2012 15:56
Last Modified: 12 Jul 2013 20:05
PMCID: PMC1891335
URI: https://repository.cshl.edu/id/eprint/25315
