Sequencing Technology

Sequencing on the Illumina HiSeq 2000 differs from traditional sequencing in three important ways.  First, the number of reads is extremely high: each lane can generate as many as 180 million reads, compared to about 36 reads per run for traditional sequencing.  Second, a run takes much longer: from 2.5 to 12 days for a HiSeq run, compared to just hours for a traditional run.  Third, each individual read is short, from 50 to 100 bases, compared to many hundreds of bases for traditional sequencing.  These differences combine to create substantial challenges for data handling (a long run may generate 7 Tb of data) and data analysis (traditional approaches for sequence assembly are not adequate for short-read sequencing).


The initial step in the sequencing process is the preparation of the library.  A library is a representation of the source nucleic acid (RNA or DNA) as a collection of short fragments generated randomly by chemical degradation or mechanical shearing.  The preparation of the library is a multi-step process.


For genomic DNA, the process of preparing the library can be visualized by the following.

After the library is constructed and sized via gel electrophoresis, the individual strands that will be the templates are amplified to increase the signal from the sequencing reaction.
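The fragmentation and sizing steps described above can be illustrated with a toy model.  This is only a sketch under stated assumptions: cut sites are drawn uniformly at random, size selection is a simple length window standing in for cutting a band from a gel, and the function names and example lengths are hypothetical rather than part of any protocol.

```python
import random


def shear(sequence, mean_fragment_length, rng):
    """Cut a sequence at uniformly random positions (toy model of random shearing).

    The number of cuts is chosen so that the average fragment length is
    roughly mean_fragment_length. This is an assumption for illustration.
    """
    n_cuts = max(0, len(sequence) // mean_fragment_length - 1)
    cut_sites = sorted(rng.sample(range(1, len(sequence)), n_cuts))
    bounds = [0] + cut_sites + [len(sequence)]
    return [sequence[start:end] for start, end in zip(bounds, bounds[1:])]


def size_select(fragments, min_len, max_len):
    """Keep only fragments inside a length window, like excising a gel band."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the example is repeatable
    genome = "".join(rng.choice("ACGT") for _ in range(10_000))
    fragments = shear(genome, mean_fragment_length=300, rng=rng)
    library = size_select(fragments, min_len=200, max_len=400)
    print(len(fragments), "fragments,", len(library), "kept after size selection")
```

Because the cuts are random, only a fraction of the fragments falls inside the selected window, which is one reason sizing is a separate step.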

The unique feature of sequencing with the Illumina HiSeq 2000 is that the amplification of individual templates is automated and performed on a flowcell.  A flowcell resembles a microscope slide with 8 channels (lanes) running lengthwise.
Each individual strand on the flowcell is amplified in place by a process called bridge amplification.  This results in about 1000 copies of each strand.  The amplified copies derived from a single template are in a focused location on the flowcell.  This location is called a cluster.


When the process is complete, each lane on the flowcell will have as many as 180 million individual readable clusters.  Each flowcell has 8 lanes, so the total number of reads for a single run can exceed 1.4 billion.
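The arithmetic behind that total can be checked with a short calculation.  The sketch below uses only the figures quoted in the text (180 million clusters per lane, 8 lanes, reads of 50 to 100 bases) and assumes one read per cluster; the function names are illustrative.

```python
# Upper-bound figures quoted in the text.
CLUSTERS_PER_LANE = 180_000_000
LANES_PER_FLOWCELL = 8


def reads_per_run(clusters_per_lane=CLUSTERS_PER_LANE, lanes=LANES_PER_FLOWCELL):
    """Maximum reads per run, assuming one read per readable cluster."""
    return clusters_per_lane * lanes


def bases_per_run(read_length, clusters_per_lane=CLUSTERS_PER_LANE,
                  lanes=LANES_PER_FLOWCELL):
    """Maximum sequenced bases per run for a given read length."""
    return reads_per_run(clusters_per_lane, lanes) * read_length


if __name__ == "__main__":
    print(f"reads per run: {reads_per_run():,}")  # 1,440,000,000
    for read_length in (50, 100):
        print(f"{read_length}-base reads: {bases_per_run(read_length):,} bases")
```

At the quoted maximum, a run of 100-base reads yields 144 billion bases, which gives a sense of why data handling becomes a challenge.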


Sequencing is performed by proprietary chemistry using reversible dye terminators.  Each sequencing pass incorporates one of the four nucleotides into the growing chain.  At the end of the pass, five images are captured: one for focusing and one each for the four nucleotides, which are distinguished by their fluorescence characteristics.  Software calculates the location of each cluster and makes a base call for the cluster.  Reads are from 50 to 100 bases long, and sequencing runs take from 3 to 12 days.  All libraries are constructed with barcodes, so multiple samples can be run in each lane.
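The idea of a base call can be sketched in a few lines.  This is a deliberately simplified illustration, not the instrument's actual algorithm: it assumes that each pass yields one intensity value per nucleotide for a cluster, in the order A, C, G, T, and it simply picks the brightest channel.  Real base-calling software also corrects for effects such as overlap between the dye signals, which this sketch ignores.  The function names are hypothetical.

```python
BASES = "ACGT"  # assumed channel order for this illustration


def call_base(intensities):
    """Return the base whose channel is brightest, or 'N' if there is no clear winner.

    intensities: four numbers, one per nucleotide image, in the order A, C, G, T.
    """
    best = max(range(4), key=lambda i: intensities[i])
    top = intensities[best]
    is_tie = sum(1 for value in intensities if value == top) > 1
    if top <= 0 or is_tie:
        return "N"
    return BASES[best]


def call_read(passes):
    """Build a read from the per-pass intensities of a single cluster."""
    return "".join(call_base(intensities) for intensities in passes)


if __name__ == "__main__":
    cluster_passes = [
        (0.9, 0.1, 0.0, 0.1),  # A is brightest
        (0.1, 0.1, 0.8, 0.2),  # G is brightest
        (0.2, 0.2, 0.2, 0.2),  # no clear signal
    ]
    print(call_read(cluster_passes))  # AGN
```

One pass contributes one base, so a 100-base read requires 100 passes over every cluster on the flowcell.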