Affymetrix GeneChip Arrays
GeneChip Array Overview
GeneChip high-density oligonucleotide arrays are fabricated by light-directed, in-situ synthesis of short oligonucleotide sequences on a small glass chip. This technique allows for the precise construction of a highly ordered matrix of DNA oligomers on the chip.
3’ IVT GeneChip Array design overview:
In the GeneChip system, a known gene or potentially expressed sequence is represented on the chip by 11-20 unique oligomeric probes, each 25 bases in length. The group of probes corresponding to a given gene or small group of highly similar genes is known as the probe set and generally spans a region of about 600 bases, known as the target sequence. Many copies of each oligomer are synthesized in discrete features (or cells) on the GeneChip array. In addition, for each oligomer on the array there is a matched oligomer, synthesized in an adjacent cell, that is identical except for a mismatched base at the central position (i.e., base 13). These are designated Perfect Match (PM) and Mismatch (MM) probes, respectively. The MM probes serve as a control for non-specific hybridization.
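The PM/MM relationship described above can be sketched in a few lines of Python. This is only an illustration: the function name `mismatch_probe` and the example sequence are hypothetical, and the sketch assumes the mismatch is produced by swapping the central base for its complement (A<->T, G<->C).

```python
# Hypothetical helper: derive a Mismatch (MM) probe from a Perfect Match (PM)
# probe by complementing the central base (base 13 of a 25-mer).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(pm_probe):
    """Return the MM sequence for a 25-base PM sequence."""
    pm_probe = pm_probe.upper()
    if len(pm_probe) != 25:
        raise ValueError("expected a 25-base probe")
    center = len(pm_probe) // 2  # zero-based index 12 is base 13
    base = pm_probe[center]
    if base not in COMPLEMENT:
        raise ValueError("unexpected base at the central position: " + base)
    return pm_probe[:center] + COMPLEMENT[base] + pm_probe[center + 1:]


# Made-up example sequence, not a real probe.
pm = "ACGTACGTACGTACGTACGTACGTA"
mm = mismatch_probe(pm)
```

The two sequences differ only at the central position, which is what lets the MM cell estimate non-specific binding for its PM partner.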
See figure representing Affymetrix GeneChip technology overview
GeneChip Array Fabrication
"Probe arrays are manufactured by Affymetrix's proprietary, light-directed chemical synthesis process, which combines solid-phase chemical synthesis with photolithographic fabrication techniques employed in the semiconductor industry. Using a series of photolithographic masks to define chip exposure sites, followed by specific chemical synthesis steps, the process constructs high-density arrays of oligonucleotides, with each probe in a predefined position in the array. Multiple probe arrays are synthesized simultaneously on a large glass wafer. This parallel process enhances reproducibility and helps achieve economies of scale. The wafers are then diced, and individual probe arrays are packaged in injection-molded plastic cartridges, which protect them from the environment and serve as chambers for hybridization."
- Quoted from the Affymetrix Website
The GeneChip expression assay begins in the investigator’s lab. RNA is isolated from the chosen sample source. Samples can be prepared from a variety of biological sources, including tissue, cell culture, biopsy, etc. The single most important step in assuring a successful GeneChip experiment is the isolation of the RNA. Clean, intact RNA will generally ensure the generation of high quality microarray data.
After isolation, RNA samples are delivered to the AMC lab. The AMC performs a check of RNA quantity and quality. If the RNA sample passes the assessment, the RNA labeling process is initiated using the Affymetrix 3’ IVT Labeling or the NuGEN Ovation Standard labeling protocol.
Affymetrix 3' IVT One-Cycle target labeling
The first step of the labeling procedure is the synthesis of double-stranded cDNA from the RNA sample using reverse transcriptase and an oligo-dT primer. Next, the cDNA serves as a template in an in vitro transcription (IVT) reaction that produces amplified amounts of biotin-labeled antisense RNA. This biotinylated RNA is referred to as labeled aRNA or cRNA: the microarray target.
Prior to hybridization, the cRNA is fragmented using heat and Mg2+. This fragmentation reduces the cRNA to fragments of 25-200 bases. Proper fragmentation facilitates efficient and reproducible hybridization. The fragmented cRNA is added to a hybridization cocktail that contains salts, blocking agents and bacterial RNA spikes. The cocktail is injected into the GeneChip hybridization chamber and hybridized at 45 degrees Celsius for 18 hours.
Alternative Labeling Protocol
The Affymetrix Microarray Core (AMC) also offers alternative labeling methods to deal with various issues that investigators have encountered with RNA. For example, limited quantities and/or low concentrations of total RNA can be addressed by using two-cycle cDNA synthesis with Affymetrix IVT, which uses two rounds of cDNA amplification to generate labeled cRNA target for hybridization. The AMC also offers the NuGEN Ovation Standard Labeling protocol to address low concentrations and/or low total RNA inputs. This protocol utilizes proprietary technology from NuGEN Inc. to create a cDNA target for hybridization to Affymetrix arrays. For some tissues, such as blood, modified protocols can be used to improve sensitivity in the microarray assay.
GeneChip array processing
After hybridization, the chip is stained with a fluorescent molecule (streptavidin-phycoerythrin) that binds to biotin. The staining protocol includes a signal amplification step that employs anti-streptavidin antibody (goat) and biotinylated goat IgG antibody. The series of washes and stains with these reagents binds the biotin and provides an amplified fluor that emits light when the chip is scanned with a confocal laser. The distribution pattern of signal in the array is then recorded.
The GeneChip arrays are scanned and the images processed using Affymetrix software, GeneChip Operating Software (GCOS).
For more information on the GeneChip expression assay, please see the Affymetrix GeneChip Expression Analysis Technical Manual.
Affymetrix GeneChip experiments are managed using GCOS. GCOS interfaces with equipment to run a probe array experiment and is also used to generate preliminary analysis data from an experiment. This section is divided into four parts:
MAS File Types
The next section covers the basics of files generated by GCOS and also explains some of the most widely used variables generated by GCOS.
Experiment File *.EXP: This file contains the parameters of the experiment such as Probe Array Type, Experiment Name, Equipment parameters, Sample Description, and others. This file is not used for analysis, but is required to open other GCOS files for the designated chip experiment.
Image Data File *.DAT: This is an image file generated by the scanner from the Probe Array after processing on the Fluidics Station. This file can be viewed in GCOS to assess the quality of the scan, or exported as a *.TIFF image. It is used in GCOS to generate the *.CEL file (see below).
Cell Intensity File *.CEL: This binary file is the result of low level analysis performed from the *.DAT image file. It is exported from GCOS and is often used as the base file for further analysis.
Probe Array Results File *.CHP: This binary file is a gene-level summarization of the CEL file using Affymetrix's MAS 5.0 or PLIER algorithms. It is exported from GCOS. It can also be used as the base file for further analysis; however, one needs to know the settings of key parameters (alpha1, alpha2, tau, target signal, etc.). The community has adopted many algorithms other than MAS 5.0 and PLIER, which is why investigators often prefer CEL files over CHP files for analysis. The CHP file contains the data that is used by the AMC for quality control purposes.
Report File *.RPT: The report file is generated from the chip file. This expression report summarizes information about expression analysis settings and probe set hybridization intensity data.
MAGE-ML *.XML: This file contains information related to the microarray experiment (one file per experiment). This can include biologically relevant information (entered by the user manually or through an EXP file), array details, fluidics protocol details and the analysis settings. It also records the file hierarchy of an experiment, which matters when loading previously generated GCOS files from one instance of the application into another. This makes the MAGE-ML file essential for reloading GCOS files.
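When a delivery of GCOS output needs sorting, the file types above can be told apart by extension alone. The following is a minimal sketch, not part of GCOS: the names `GCOS_FILE_TYPES`, `classify_gcos_file` and `group_by_type` are hypothetical, and the file names in the example are made up.

```python
from pathlib import PurePath

# Extension-to-description table taken from the file types listed above.
GCOS_FILE_TYPES = {
    ".EXP": "Experiment File",
    ".DAT": "Image Data File",
    ".CEL": "Cell Intensity File",
    ".CHP": "Probe Array Results File",
    ".RPT": "Report File",
    ".XML": "MAGE-ML",
}


def classify_gcos_file(filename):
    """Return the file type for a name, matching the extension case-insensitively."""
    suffix = PurePath(filename).suffix.upper()
    return GCOS_FILE_TYPES.get(suffix, "unknown")


def group_by_type(filenames):
    """Group file names by their file type."""
    groups = {}
    for name in filenames:
        groups.setdefault(classify_gcos_file(name), []).append(name)
    return groups


# Made-up file names for illustration.
groups = group_by_type(["s01.CEL", "s01.CHP", "s02.cel", "notes.txt"])
```

Grouping this way makes it easy to hand only the CEL files to a downstream analysis package while keeping the CHP and RPT files for quality control.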
GCOS uses a statistical algorithm to calculate signals and make significance calls for the data. A description of the analysis algorithm is available. Below are the algorithm generated metrics the AMC provides to investigators as part of its data preparation service.
MAS Analysis Metrics
Signal: a measure of the abundance of transcript
Detection: the call that indicates whether the transcript is detected (P, present), undetected (A, absent), or at the limit of detection (M, marginal).
Detection p-value: p-value that indicates the significance of the detection call.
Signal Log Ratio: the change in expression level of a transcript between a baseline and an experiment array. This change is expressed as the log2 ratio. A log2 ratio of 1 is equal to a fold change of 2.
Change: the call that indicates the change in the transcript level between a baseline and an experiment array: increase (I), marginal increase (MI), no change (NC), marginal decrease (MD), or decrease (D).
Change p-value: p-value that indicates the significance of the change call.
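Because the Signal Log Ratio is a log2 value, it is often converted to a fold change for reporting. The sketch below shows only that conversion; it does not reproduce how GCOS computes the Signal Log Ratio itself. The function name `fold_change` is hypothetical, and the signed convention for decreases (a log2 ratio of -1 reported as -2) is an assumption about reporting style rather than something stated on this page.

```python
def fold_change(signal_log_ratio):
    """Convert a log2 ratio to a signed fold change.

    Increases are returned as 2**SLR; decreases are returned as a negative
    fold change, -(2**-SLR), so that -1 maps to -2 rather than 0.5.
    """
    if signal_log_ratio >= 0:
        return 2.0 ** signal_log_ratio
    return -(2.0 ** (-signal_log_ratio))


# A log2 ratio of 1 is a two-fold increase; -1 is a two-fold decrease.
up = fold_change(1)
down = fold_change(-1)
unchanged = fold_change(0)
```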
Each probe set on a GeneChip array has a unique name known as the probe set ID. Probe set IDs have different extensions that denote important information about how the probe set was designed. The nomenclature for the probe set extensions is below.
Probe Set Extension Nomenclature
All probe sets have one of the following two extensions:
_at : anti-sense target (most probe sets on the array)
_st : sense target (only some control probes are in sense orientation on the array)
_i : reduced number of pairs in the probe set.
Some probe sets represent more than one gene or EST:
_s_at : designates probe sets that share common probes among multiple transcripts from different genes.
_a_at : designates probe sets that recognize multiple alternative transcripts from the same gene (on HG-U133 these probe sets have an "_s" suffix).
_x_at : designates probe sets where it was not possible to select either a unique probe set or a probe set with identical probes among multiple transcripts. Rules for cross-hybridization were dropped. Therefore, these probe sets may cross-hybridize in an unpredictable manner with other sequences.
_g_at : similar genes, also unique probe sets elsewhere on the array.
_f_at : similarity rules dropped, probe set will recognize more than one gene.
_i_at : designates sequences for which there are fewer than the required numbers of unique probes specified in the design.
_b_at : all probe selection rules were ignored. Withdrawn from GenBank.
_l_at : sequence represented by more than 20 probe pairs.
_r_at : designates sequences for which it was not possible to pick a full set of unique probes using Affymetrix' probe selection rules. Probes were picked after dropping some of the selection rules.
Most of the descriptions for the probe set ID extensions above were taken from the Affymetrix GeneChip Expression Analysis Data Analysis Fundamentals.
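The extensions listed above can be read off a probe set ID programmatically. The following is a rough sketch under stated assumptions: it assumes the extension is an optional single lowercase letter followed by `_at` or `_st` at the end of the ID, the names `EXTENSION_NOTES` and `parse_probe_set_id` are hypothetical, and the notes are abbreviated from the descriptions above.

```python
import re

# Abbreviated notes for the single-letter modifiers described above.
EXTENSION_NOTES = {
    "s": "shares probes among transcripts from different genes",
    "a": "recognizes alternative transcripts from the same gene",
    "x": "may cross-hybridize in an unpredictable manner",
    "g": "similar genes; unique probe sets exist elsewhere on the array",
    "f": "similarity rules dropped",
    "i": "fewer unique probes than the design requires",
    "b": "all probe selection rules ignored",
    "l": "more than 20 probe pairs",
    "r": "some probe selection rules dropped",
}

# Optional "_<letter>" modifier, then the orientation suffix at the end.
_EXTENSION = re.compile(r"(?:_([a-z]))?_(at|st)$")


def parse_probe_set_id(probe_set_id):
    """Return orientation, modifier and note, or None if no extension is found."""
    match = _EXTENSION.search(probe_set_id)
    if match is None:
        return None
    modifier, orientation = match.groups()
    return {
        "orientation": "antisense" if orientation == "at" else "sense",
        "modifier": modifier,
        "note": EXTENSION_NOTES.get(modifier, "") if modifier else "",
    }


info = parse_probe_set_id("1007_s_at")
```

A filter built on this could, for example, set aside `_x_at` probe sets before interpreting a list of changed genes.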
Glossary of Analysis Terms
- Target:
- Fragmented, biotinylated anti-sense cRNA prepared from the mRNA to be analyzed. Target molecules are hybridized to the probe array and the levels of hybridization are measured with the GeneArray scanner after the array is stained with streptavidin-phycoerythrin (SAPE).
- Probe:
- Single-stranded DNA oligonucleotide synthesized directly on the surface of the GeneChip array using photolithography and combinatorial chemistry. The 25-base oligonucleotide is designed to be complementary to a specific gene transcript.
- Probe Cell:
- Single square-shaped feature on an array containing probes with a unique sequence. The size can vary depending on the array type, typically 20 µm or 18 µm. Each probe cell contains millions of probe molecules.
- Perfect Match (PM):
- Probes that are designed to be complementary to a reference sequence.
- Mismatch (MM):
- Probes that are designed to be complementary to a reference sequence except for a homomeric mismatch at the central position (e.g., the 13th position of a 25-base probe: A->T or G->C). Mismatch probes serve as a control for cross-hybridization.
- Probe Pair:
- Two probe cells, a PM and its corresponding MM. On the probe array, a probe pair is arranged with a PM cell directly above a MM cell.
- Probe set:
- A set of probes designed to detect one transcript. A probe set usually consists of 11-20 probe pairs. For example, an 11 probe pair set is made up of 11 PM probes and 11 MM probes for a total of 22 probe cells. Newer array designs from Affymetrix, e.g., HG-U133, contain probe sets with 11 probe pairs. Older designs have average probe set numbers of 16 or 20 probe pairs.
- Target Sequence:
- The portion of a transcript reference sequence that is interrogated by a probe set on the array. The target sequence extends from the first base of the most 5' probe to the last base of the most 3' probe.
- Absolute Analysis:
- This is an analysis of a single GeneChip array using Affymetrix Microarray Suite software. The software applies an algorithm developed by Affymetrix to determine the expression level for each gene represented on the array.
- Analysis Metrics:
- Probe set performance descriptors calculated by the software from measured probe cell intensities. Analysis metrics are used to determine biologically meaningful results, such as the presence or absence of gene transcripts.
- Analysis Parameters:
- Variables with user-defined values used in the expression analysis (default values in the software are empirically determined at Affymetrix).
*More extensive glossaries can be found in Statistical Algorithms Reference Guide and Data Analysis Fundamentals, available on the Affymetrix website.
The following table contains the codes for the chip types and the GCOS parameter values the AMC uses in data preparation. Scale Target Intensity is the target value for the global scaling of the chips within a dataset. Alpha1 and Alpha2 are two values that define the range of p-values that determines Detection Calls. The values are current as of May 2005.
| Array Description | Chip Type | Scale Target Intensity | Alpha1 | Alpha2 | Chip Code for DB |
|---|---|---|---|---|---|
| Human Genome U133 Plus 2.0 | HG-U133 Plus 2.0 | 300 | 0.05* | 0.065* | K |
| Human Genome U133A 2.0 | HG-U133A 2.0 | NA | 0.05* | 0.065* | J |
| Human Genome U95Av2 | HG-95Av2 | 200 | 0.1 | 0.15 | A |
| Human Genome U133A | HG-U133A | 325 | 0.1 | 0.15 | F |
| Human Genome U133B | HG-U133B | 105 | 0.1 | 0.15 | G |
| Human Genome Focus | HG-Focus | 275 | 0.1 | 0.15 | H |
| Mouse Genome 430 2.0 | MOE 430 2.0 | 350 | 0.05* | 0.065* | H |
| Mouse Expression 430A | MOE 430A | 350 | 0.1 | 0.15 | F |
| Mouse Expression 430B | MOE 430B | 70 | 0.1 | 0.15 | G |
| Mouse Genome U74Av2 | MG-U74Av2 | 150 | 0.1 | 0.15 | A |
| Rat Genome 230 2.0 | RAE 230 2.0 | 300 | 0.05* | 0.065* | H |
| Rat Expression 230A | RAE 230A | 300 | 0.1 | 0.15 | F |
| Rat Genome U34A | RG-U34A | 265 | 0.1 | 0.15 | A |
| Yeast Genome 2.0 | YG 2.0 | NA | 0.1 | 0.15 | A |
* Affymetrix default value
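To make the role of these parameters concrete, the sketch below shows how Alpha1 and Alpha2 could turn a detection p-value into a call, and how a Scale Target Intensity could yield a global scaling factor. It is a simplified illustration, not the GCOS implementation: the function names are hypothetical, the thresholds follow the usual reading (present below Alpha1, marginal between Alpha1 and Alpha2, absent otherwise), and the scaling step assumes a plain 2% trimmed mean of the signal values on a chip.

```python
def detection_call(p_value, alpha1=0.05, alpha2=0.065):
    """Map a detection p-value to P (present), M (marginal) or A (absent)."""
    if p_value < alpha1:
        return "P"
    if p_value < alpha2:
        return "M"
    return "A"


def scaling_factor(signals, target_intensity, trim=0.02):
    """Return target_intensity divided by the trimmed mean of the signals.

    Simplification: drops the lowest and highest `trim` fraction of values
    (rounded down to whole values) before averaging.
    """
    ordered = sorted(signals)
    if not ordered:
        raise ValueError("no signal values given")
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    return target_intensity / (sum(kept) / len(kept))


# The same p-value can receive different calls under different settings.
call_default = detection_call(0.06)           # Affymetrix default thresholds
call_relaxed = detection_call(0.06, 0.1, 0.15)

# Made-up signals with a mean of 100, scaled to a target intensity of 300.
factor = scaling_factor([100.0] * 10, 300)
```

Multiplying every signal on a chip by its factor brings the trimmed mean to the target, which is what makes chips within a dataset comparable.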