Viral Bioinformatics Workshop - Adventures in sequence space: turning publicly available viral nucleotide sequences into value added data.


Rodney Brister

NIH/NLM/NCBI, Bethesda, United States


There are over 2.2 million viral nucleotide sequences available from the three public databases that together make up the International Nucleotide Sequence Database Collaboration (INSDC). As the number of publicly available sequences grows, so does the promise that this wealth of archival data can be an invaluable resource for research and public health policy. However, the sheer number of sequences, their diversity in origin and quality, and evolving experimental insight present significant challenges to the use of archival data. To counter these issues, NCBI embarked on the RefSeq project to provide non-redundant, high-quality reference genome sequences for each viral species. Now, 15 years later, the scope of this project has grown, and the NCBI Viral Genomes Group provides a number of products, including reference genome and protein records, curated sets of all “complete” viral genomes, taxonomy/classification tools, and curated metadata, such as host type, that place individual sequences within a larger context. These efforts have led to the development of the Virus Variation resource, where computational tools, database loading processes, modular annotation pipelines, and specialized human curation interfaces are used to dynamically transform archival, primary sequence data into standardized, high-quality datasets. Next-generation search and retrieval interfaces in turn provide powerful but user-friendly approaches to working with the data. All of this work depends on a comprehensive understanding of the viral sequence space, and current efforts are focussed on interrogating that space computationally. The goal is to build better reference and representational models that can support a variety of data transformation and validation operations and enhance the value of the data for users.






Reference:
Viral Bioinformatics 1-T14-KNA-01
Session:
Viral Bioinformatics Workshop - Adventures in viral sequence space: A public database perspective
Presenters:
Rodney Brister
Session:
Viral bioinformatics and annotation workshop
Presentation type:
Keynote address - 45 min
Room:
Main Auditorium
Date:
Thursday, 21 July 2016
Time:
17:30 - 18:15