Dr Guillermo Lopez-Campos, Biomedical Informatics Researcher at HBIRU, travelled to the US to attend an open meeting of the Genome in a Bottle Consortium. This project, organised by the National Institute of Standards and Technology (NIST), aims to establish reference materials, methods and data for a selected whole genome. This ‘example’ genome, together with its accompanying documentation and data, will provide a benchmark against which the validity of other methods can be measured.
The meeting was called to continue work towards a NIST standard for use in developing and setting up clinical applications of next-generation sequencing (NGS). The standard will be based on a well-characterized reference sample, which could be used to assess the quality of various NGS methods. The development of this reference material (RM) is also being monitored by the US Food and Drug Administration (FDA), and the consortium intends to submit it to the FDA for assessment and approval. This RM is intended to be only the first of several. Attendees formed a diverse community, including representatives of academia, industry and government, as well as interested members of the general public.
Day One of the meeting was devoted to a series of talks laying the groundwork for the project. These covered the clinical application of NGS; technical aspects, including the analysis of bias and sources of error associated with the different technologies; and, finally, an example of current work towards the development of well-characterized genomes.
Once these talks had set out the scope of the meeting, there was a series of five-minute presentations by people involved in similar or related initiatives for different applications. These presentations covered efforts to develop reference samples, proficiency assessment, the development of a consent model and the reporting of secondary findings.
After the presentations, participants joined one of four working groups to draw up work plans and schedule that work. The four groups were:
- Reference Material Selection and Design
- Measurements for Reference Material Characterization
- Bioinformatics, Data Integration and Data Representation
- Performance Metrics and Figures of Merit
Day Two comprised presentations from the working groups, which sought feedback and comments from the other attendees. Several presentations focused on using samples from the 1000 Genomes Project, specifically sample NA12878, which is a good candidate for the reference sample because of the large amount of data already available for it. Some concerns were raised about using this sample: the end product of this work is intended to be marketed, but the donor gave consent only for research, not for commercialization.
Regarding the samples to be developed, it was proposed to use samples from different ethnic backgrounds with a balance of genders, and to use low-passage cells to limit genomic drift. Participants stressed the benefits of using as many sequencing platforms as possible, and of keeping the door open both to new methods for sequencing technologies that are in development or may emerge in the future, and to the use of these techniques for error correction and validation. One important suggestion was to classify regions of the genome according to how difficult they are to sequence, so that this difficulty can be taken into account during assessment.
Another proposal was to develop, for each reference material, data visualization tools backed by a database of the associated metadata; although not all of this metadata would be displayed, it could be used for filtering.
Bioinformatics will play a key role in the process from start to finish, and during the meeting the bioinformatics group identified four tasks associated with that role: collecting all relevant data to create a data repository for sample NA12878; establishing a protocol that defines a minimum quality for data filtering; running different analysis pipelines; and, finally, integrating the results into consensus calls.
The collaborative work of the Genome in a Bottle Consortium will continue through teleconferences and face-to-face meetings of specific working groups.