Sequencing on the Illumina GAIIx differs from traditional sequencing in three important ways. First, the number of unique reads is extremely high: each run can generate as many as 150 million reads, compared to about 36 reads per run for traditional sequencing. Second, a run takes a long time, from 2.5 to 10 days on the GAIIx, compared to just hours for a traditional run. Third, each individual read is only 35 to 100 bases long, compared to many hundreds of bases for standard sequencing. Together, these differences create substantial challenges for data handling (a long run may generate 7 Tb of data) and for data analysis (traditional approaches to sequence assembly are not adequate for short-read sequencing).
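To put the read counts and read lengths above on a common scale, the short Python sketch below multiplies them to estimate the total number of bases sequenced per run. It is only an illustration: the function name run_yield_bases is hypothetical, and the calculation assumes that every one of the 150 million reads reaches the full read length.

```python
def run_yield_bases(reads: int, read_length: int) -> int:
    """Total bases in a run: number of reads times bases per read."""
    return reads * read_length


# Upper figure for reads per run quoted in the text.
READS_PER_RUN = 150_000_000

# Shortest and longest read lengths quoted in the text.
for read_length in (35, 100):
    gigabases = run_yield_bases(READS_PER_RUN, read_length) / 1e9
    print(f"{read_length}-base reads: {gigabases:.2f} gigabases per run")
```

Under these assumptions a full run yields between 5.25 and 15 gigabases of sequence, split across a very large number of short reads.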
The initial step in the sequencing process is the preparation of the library. A library is a representation of the source nucleic acid (RNA or DNA) in a collection of short fragments generated randomly by chemical degradation or mechanical shearing.
The preparation of the library is a multi-step process.
For genomic DNA, the process of preparing the library can be visualized by the following.
After the library is constructed and sized via gel electrophoresis, the individual strands that will be the templates are amplified to increase the signal from the sequencing reaction.
The distinctive feature of sequencing with the Illumina GAIIx is that the amplification of individual templates is automated and performed on a flowcell. A flowcell resembles a microscope slide with 8 channels (lanes) running lengthwise.
Each individual strand on the flowcell is amplified in place by a process called bridge amplification. This results in about 1000 copies of each strand. The amplified copies derived from a single template stay together in one small spot on the flowcell. This spot is called a cluster.
When the process is complete, each flowcell will have as many as 150 million individual readable clusters. Each flowcell has 8 lanes, so each sample (assuming one sample per lane) will generate about 20 million individual reads.
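The per-sample figure is simple division, and the sketch below makes it explicit. The function name reads_per_sample and its samples_per_lane parameter are hypothetical; the parameter covers the case where a lane is shared between samples. The calculation assumes that clusters are spread evenly across lanes and across the samples within a lane, which is an idealization.

```python
def reads_per_sample(clusters_per_flowcell: int,
                     lanes: int = 8,
                     samples_per_lane: int = 1) -> float:
    """Expected reads per sample, assuming an even split across lanes and samples."""
    return clusters_per_flowcell / (lanes * samples_per_lane)


# One sample per lane: 150 million clusters over 8 lanes.
print(f"{reads_per_sample(150_000_000) / 1e6:.2f} million reads per sample")

# Four samples sharing each lane (an illustrative number, not from the text).
shared = reads_per_sample(150_000_000, samples_per_lane=4)
print(f"{shared / 1e6:.2f} million reads per sample")
```

With one sample per lane this gives 18.75 million reads, which is where the figure of about 20 million comes from.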
Sequencing is performed by standard chemistry using reversible dye terminators. Each sequencing pass extends the growing chain by a single nucleotide, one of the four. At the end of the pass, five images are captured: one for focusing and one for each of the four nucleotides, which are distinguished by their fluorescence characteristics. Software calculates the location of each cluster and makes a base call for the cluster. Reads are from 35 to 100 bases long, and a run takes from 3 to 10 days. Multiple samples can be run per lane by indexing the sequence reads, but this reduces the number of unique reads per sample.
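The base call for a single cluster can be pictured with a deliberately simplified Python sketch: for each pass, pick the nucleotide whose image shows the brightest signal at the cluster's location. This is not the instrument's actual algorithm. The function name call_base and the intensity values are made up for illustration, and production base callers also correct for effects such as signal overlap between the dye channels, which this sketch ignores.

```python
from typing import Mapping


def call_base(intensities: Mapping[str, float]) -> str:
    """Return the base with the brightest signal, or 'N' if the top two tie."""
    ranked = sorted(intensities.items(), key=lambda item: item[1], reverse=True)
    (best_base, best), (_, runner_up) = ranked[0], ranked[1]
    return best_base if best > runner_up else "N"


# Hypothetical intensities for one cluster over four sequencing passes.
passes = [
    {"A": 910.0, "C": 40.0, "G": 55.0, "T": 30.0},
    {"A": 35.0, "C": 60.0, "G": 870.0, "T": 45.0},
    {"A": 50.0, "C": 30.0, "G": 40.0, "T": 790.0},
    {"A": 400.0, "C": 400.0, "G": 20.0, "T": 25.0},  # ambiguous pass
]

read = "".join(call_base(p) for p in passes)
print(read)  # AGTN
```

Repeating this for every cluster and every pass turns the stack of images into one short read per cluster.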