Wednesday, May 14, 2008


Sequence Finishing and Validation

Introduction

Many sequencing centers have adopted the whole-genome shotgun approach to sequence entire genomes. However, this strategy has not been fully optimized to reduce the cost and time of finishing, especially when assembling regions that contain repeated DNA sequences. In fact, entire regions of a genome may be excluded from the "finished" sequence because of budget limitations and the lack of a physical map.

The Problem

At the end of the shotgun process, the investigator is left with a collection of sequence contigs. Among other things, finishing involves designing and conducting additional experiments to establish the overall order of the contigs and to close the gaps between them. These efforts require extensive human intervention. To illustrate, we will look at a collection of sequence contigs in OpGen's Map Viewer program. Each sequence contig is represented by its in silico XhoI digest.
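As a rough illustration of what an in silico digest involves, the sketch below cuts a linear sequence at every XhoI recognition site (CTCGAG, cut after the first C) and reports the ordered fragment lengths. The function name and the toy contig are hypothetical and are not taken from Map Viewer. Because the XhoI site is palindromic, scanning one strand is sufficient.

```python
import re

XHOI_SITE = "CTCGAG"   # XhoI recognition sequence
XHOI_CUT_OFFSET = 1    # XhoI cuts after the first C (C^TCGAG)


def in_silico_digest(sequence, site=XHOI_SITE, cut_offset=XHOI_CUT_OFFSET):
    """Return the ordered fragment lengths (bp) of a linear sequence cut at every site.

    Hypothetical helper for illustration only.
    """
    seq = sequence.upper()
    # A lookahead pattern reports every occurrence, including overlapping ones.
    pattern = "(?=" + re.escape(site) + ")"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Toy contig (made up): two XhoI sites, so three fragments.
contig = "AAAA" + "CTCGAG" + "TTTTTTTT" + "CTCGAG" + "GG"
print(in_silico_digest(contig))  # [5, 14, 7]
```

The first and last fragments end at the contig boundary rather than at a restriction site, so they are usually treated as partial fragments when compared with a map.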


Conventional finishing methods involve multiple steps and may still leave recalcitrant problems unresolved. Contig ordering is usually done by multiplex PCR experiments that answer the question, "Are contig 1 and contig 2 neighbors?" Contig ordering is closely tied to gap closure: the contigs must be in the proper order before the gaps between them can be addressed. Adjacent contigs may be separated either by gaps in coverage, which create contig islands, or by repeat sequences (pseudogaps).

The Solution



The addition of a whole-genome physical map provides a scaffold for ordering and orienting these contigs, which eliminates much of the work in finishing. Once the contigs are placed on the map, they can be oriented correctly with respect to each other. The placement also provides the sequence data flanking each gap, some sequence information within the gap (the restriction sites), and the overall size of the gap itself. The map verifies the contig assemblies as well. The depth of coverage of an optical map provides greater accuracy than typical shotgun sequence coverage. Therefore, discrepancies in marker sites may indicate misassembly of the contig in question or a possible DNA sequencing error at that site.
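The kind of marker-site check described above can be sketched as a comparison of two ordered lists of fragment sizes: one from the in silico digest of a contig and one from the matching stretch of the optical map. The greedy walk below is only an illustrative assumption, not the alignment method used by OpGen's software. The function name, the 10% size tolerance, and the example sizes are all hypothetical, and both lists are assumed to start at the same restriction site.

```python
def compare_fragments(contig_frags, map_frags, tol=0.10):
    """Walk two ordered fragment-size lists and report discrepancies.

    Returns a list of (kind, contig_index, map_index) tuples.
    Hypothetical sketch: greedy, forward orientation only.
    """
    def close(a, b):
        # Sizes agree if they differ by at most `tol` of the larger one.
        return abs(a - b) <= tol * max(a, b)

    issues = []
    i = j = 0
    while i < len(contig_frags) and j < len(map_frags):
        c, m = contig_frags[i], map_frags[j]
        if close(c, m):
            i, j = i + 1, j + 1
        elif j + 1 < len(map_frags) and close(c, m + map_frags[j + 1]):
            # One contig fragment spans two map fragments:
            # the contig lacks a site that the map shows.
            issues.append(("site_missing_in_contig", i, j))
            i, j = i + 1, j + 2
        elif i + 1 < len(contig_frags) and close(c + contig_frags[i + 1], m):
            # Two contig fragments span one map fragment:
            # the contig has a site that the map does not show.
            issues.append(("extra_site_in_contig", i, j))
            i, j = i + 2, j + 1
        else:
            issues.append(("size_mismatch", i, j))
            i, j = i + 1, j + 1
    return issues


# Made-up sizes in bp: the 30 kb contig fragment covers two map fragments.
contig_frags = [12000, 30000, 8000]
map_frags = [12100, 14000, 16500, 7900]
print(compare_fragments(contig_frags, map_frags))
# [('site_missing_in_contig', 1, 1)]
```

A flagged site is only a candidate for follow-up, since it could reflect either a misassembly or a sequencing error at that position.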

Some Examples

The screenshot at right demonstrates two problems resolved by the physical map. The gap between the two contigs (green circle) can now be sized, and its flanking sequences are apparent. In addition, a missing restriction site in the contig (red oval) indicates a sequence discrepancy, which suggests that resequencing this region would be appropriate.
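One way to picture how a gap can be sized from the map is simple coordinate arithmetic. The sketch below assumes (this is not stated in the source) that each contig has been anchored to the map at its outermost aligned restriction site, and that the unanchored sequence beyond that site is known from the contig itself. All names and numbers are hypothetical.

```python
def estimate_gap_bp(last_site_a, first_site_b, overhang_a, overhang_b):
    """Estimate the gap (bp) between two contigs placed on the same map.

    last_site_a  : map coordinate of the last aligned site of the left contig
    first_site_b : map coordinate of the first aligned site of the right contig
    overhang_a   : contig sequence (bp) extending past last_site_a into the gap
    overhang_b   : contig sequence (bp) extending before first_site_b into the gap
    Hypothetical helper; a negative result would suggest the contigs overlap.
    """
    map_span = first_site_b - last_site_a
    return map_span - overhang_a - overhang_b


# Made-up coordinates: 12 kb between anchoring sites, 4 kb already sequenced.
print(estimate_gap_bp(250_000, 262_000, 1_500, 2_500))  # 8000
```

The estimate inherits the sizing error of the map fragments that span the gap, so it is best read as an approximate target for gap-closure experiments.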

This example shows a contig that has been misassembled. The contig 1 sequence aligns to the Optical Map in two separate places, indicated by the lines connecting the ends of congruent fragments.



This is resolved by "breaking" contig 1 into two fragments, contigs 1A and 1B, which yields the correct orientation of contigs 1 and 8. The corrected assembly will help close the gap that now exists between the 3' end of contig 1A and the 5' end of contig 8.
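The "breaking" step can be sketched as splitting a contig wherever its fragments stop aligning to consecutive map fragments. The example below is a minimal illustration under stated assumptions: each contig fragment has already been assigned the index of the map fragment it matches, and the contig aligns in the forward orientation. The function name and the indices are hypothetical.

```python
def split_at_alignment_jumps(aligned_map_indices):
    """Group contig fragment positions into runs that hit consecutive map fragments.

    aligned_map_indices[k] is the map fragment index matched by contig fragment k.
    Each returned run is a list of contig fragment positions; more than one run
    suggests the contig should be broken between runs.
    Hypothetical helper: forward orientation only.
    """
    if not aligned_map_indices:
        return []
    runs = [[0]]
    for k in range(1, len(aligned_map_indices)):
        if aligned_map_indices[k] == aligned_map_indices[k - 1] + 1:
            runs[-1].append(k)   # still contiguous on the map
        else:
            runs.append([k])     # jump on the map: start a new piece
    return runs


# Made-up alignment: fragments 0-2 hit map fragments 10-12, fragments 3-4 hit 40-41.
print(split_at_alignment_jumps([10, 11, 12, 40, 41]))
# [[0, 1, 2], [3, 4]]
```

In this toy case the two runs would correspond to pieces such as contigs 1A and 1B, with the break placed between contig fragments 2 and 3.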





© 2007 OpGen, Inc. All Rights Reserved.
