The Genome Sequence DataBase (GSDB): improving data quality and data access

Harger, C., Skupski, M., Bingham, J., Farmer, A., Hoisie, S., Hraber, P., Kiphart, D., Krakowski, L., McLeod, M., Schwertfeger, J., Seluja, G., Siepel, A., Singh, G., Stamper, D., Steadman, P., Thayer, N., Thompson, R., Wargo, P., Waugh, M., Zhuang, J. J., Schad, P. A. (January 1998) The Genome Sequence DataBase (GSDB): improving data quality and data access. Nucleic Acids Res, 26 (1). pp. 21-26. ISSN 0305-1048 (Print)

PDF (Paper): Siepel Nucleic Acid Res 1998.pdf - Published Version (161kB)
URL: http://www.ncbi.nlm.nih.gov/pubmed/9399793
DOI: 10.1093/nar/26.1.21

Abstract

In 1997, the primary focus of the Genome Sequence DataBase (GSDB; www.ncgr.org/gsdb), located at the National Center for Genome Resources, was to improve data quality and accessibility. Efforts to increase the quality of data within the database included two major projects: one to identify and remove all vector contamination from sequences in the database, and one to create premier sequence sets (including both alignments and discontiguous sequences). Data accessibility was improved during the course of the last year in several ways. First, a graphical database sequence viewer was made available to researchers. Second, an update process was implemented for the web-based query tool, Maestro. Third, a web-based tool, Excerpt, was developed to retrieve selected regions of any sequence in the database. Finally, a GSDB flatfile that contains annotation unique to GSDB (e.g., sequence analysis and alignment data) was developed. Additionally, the GSDB web site provides a tool for the detection of matrix attachment regions (MARs), which can be used to identify regions of high coding potential. The ultimate goal of this work is to make GSDB a more useful resource for genomic comparison studies and gene-level studies by improving data quality and by providing data access capabilities that are consistent with the needs of both types of studies.

Item Type: Paper
Uncontrolled Keywords: Base Sequence; Computer Communication Networks; *Databases, Factual; Forecasting; *Genome; Information Storage and Retrieval
Subjects: bioinformatics
bioinformatics > genomics and proteomics > databases
bioinformatics > genomics and proteomics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes
Communities: CSHL labs > Siepel lab
Depositing User: Matt Covey
Date: 1 January 1998
Date Deposited: 14 Jan 2015 17:55
Last Modified: 08 Nov 2017 15:53
PMCID: PMC147232
URI: https://repository.cshl.edu/id/eprint/31068
