GeneChip high-density oligonucleotide arrays are fabricated by in-situ,
light-directed synthesis of short oligonucleotide sequences on a small glass
chip. This technique allows for the precise construction of a
highly ordered matrix of DNA oligomers on the chip.
3’ IVT GeneChip Array design overview:
In the GeneChip system, a known gene or potentially expressed sequence is represented
on the chip by 11-20 unique oligomeric probes, each 25 bases in length. The
group of probes corresponding to a given gene or small group of highly similar
genes is known as the probe set and generally spans a region of about 600 bases,
known as the target sequence. Many copies of each oligomer are synthesized
in discrete features (or cells) on the GeneChip array. In addition, for each
oligomer on the array there is a matched oligomer, synthesized in an adjacent
cell, that is identical except for a mismatched base at the central
position (i.e. base 13). These are designated Perfect Match (PM) and Mismatch
(MM) probes, respectively. The MM probes serve as a control for non-specific
hybridization.
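The PM/MM relationship described above can be sketched in a few lines of Python. This is only an illustration: the function name `make_mismatch` and the example sequence are made up, and the base swap (A<->T, G<->C) follows the homomeric mismatch described in the glossary further down this page.

```python
# Swap used for the homomeric mismatch at the central base (A<->T, G<->C).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def make_mismatch(pm_probe: str, center: int = 13) -> str:
    """Return the MM probe for a 25-base PM probe.

    `center` is the 1-based position of the mismatched base (13 for a 25-mer).
    """
    pm_probe = pm_probe.upper()
    if len(pm_probe) != 25:
        raise ValueError("expected a 25-base probe")
    idx = center - 1  # convert to a 0-based index
    return pm_probe[:idx] + COMPLEMENT[pm_probe[idx]] + pm_probe[idx + 1:]


# Hypothetical sequence for illustration only, not a real array probe.
pm = "ACGTACGTACGTACGTACGTACGTA"
mm = make_mismatch(pm)
```

The two probes differ only at base 13, so any signal on the MM cell is presumed to come from non-specific binding.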
See the figure representing the Affymetrix GeneChip technology overview.
| GeneChip Array Fabrication |
"Probe arrays are manufactured by Affymetrix's proprietary, light-directed
chemical synthesis process, which combines solid-phase chemical synthesis with
photolithographic fabrication techniques employed in the semiconductor industry.
Using a series of photolithographic masks to define chip exposure sites, followed
by specific chemical synthesis steps, the process constructs high-density arrays
of oligonucleotides, with each probe in a predefined position in the array. Multiple
probe arrays are synthesized simultaneously on a large glass wafer. This parallel
process enhances reproducibility and helps achieve economies of scale. The wafers
are then diced, and individual probe arrays are packaged in injection-molded
plastic cartridges, which protect them from the environment and serve as chambers
for hybridization."
- Quoted from the Affymetrix Website
|
The GeneChip expression assay begins in the investigator’s lab. RNA is
isolated from the chosen sample source. Samples can be prepared from a variety
of biological sources, including tissue, cell culture, biopsy, etc. The single
most important step in assuring a successful GeneChip experiment is the isolation
of the RNA. Clean, intact RNA will generally ensure the generation of high quality
microarray data.
After isolation, RNA samples are delivered to the AMC lab. The AMC performs
a check of RNA quantity and quality. If the RNA sample passes the assessment,
the RNA labeling process is initiated using the Affymetrix 3’ IVT Labeling
or the NuGEN Ovation Standard labeling protocol.
Affymetrix 3’ IVT One-Cycle target labeling
The first step of the labeling procedure is the synthesis of double-stranded
cDNA from the RNA sample using reverse transcriptase and an oligo-dT primer.
Next, the cDNA serves as a template in an in vitro transcription (IVT)
reaction that produces amplified amounts of biotin-labeled antisense RNA.
This biotinylated RNA is referred to as labeled aRNA or cRNA and is the
microarray target.
Prior to hybridization, the cRNA is fragmented using heat and Mg2+. This
fragmentation reduces the cRNA to fragments of 25-200 bases. Proper fragmentation
facilitates efficient and reproducible hybridization.
The fragmented cRNA is added to a hybridization cocktail. The cocktail contains salts,
blocking agents and bacterial RNA spikes. The cocktail is injected into
the GeneChip hybridization chamber and hybridized at 45 degrees Celsius for
18 hours.
Alternative Labeling Protocols
The Affymetrix Microarray Core (AMC) also offers alternative labeling
methods to deal with various issues that investigators have encountered
with RNA.
For example, limited quantities and/or low concentrations of total RNA
can be addressed by using two-cycle cDNA synthesis with Affymetrix
IVT, which uses two rounds of amplification
to generate labeled cRNA target for hybridization. The AMC also offers
the NuGEN Ovation Standard Labeling protocol to address low concentrations
and/or low total RNA inputs. This protocol uses
proprietary technology from NuGEN Inc. to create a cDNA target for hybridization
to Affymetrix arrays.
For some tissues, such as blood, modified protocols
can be used to improve sensitivity in the microarray assay.
GeneChip array processing
After hybridization, the chip is stained with a fluorescent molecule (streptavidin-phycoerythrin)
that binds to biotin. The staining protocol includes a signal amplification
step that employs anti-streptavidin antibody (goat) and biotinylated goat
IgG antibody. The series of washes and stains with these reagents
binds the biotin and provides an amplified fluor that emits light when
the chip is scanned with a confocal laser. The distribution pattern
of signal in the array is then recorded.

The GeneChip arrays are scanned and the images processed using Affymetrix
software, GeneChip Operating Software (GCOS).
For more information on the GeneChip expression assay, please see the
Affymetrix GeneChip Expression Analysis Technical Manual.
|
 |
Affymetrix GeneChip experiments are managed using GCOS. GCOS interfaces with
equipment to run a probe array experiment and is also used to generate preliminary
analysis data from an experiment. This section is divided into 4 sections:
The next section covers the basics of files generated by GCOS and also explains
some of the most widely used variables generated by GCOS.
| MAS File Types |

- Experiment File *.EXP: This file contains the parameters of
the experiment, such as Probe Array Type, Experiment Name, Equipment parameters,
Sample Description, and others. This file is not used for analysis, but
is required to open other GCOS files for the designated chip experiment.
- Image Data File *.DAT: This is an image file generated by the scanner
from the Probe Array after processing on the Fluidics Station. This file
can be viewed in GCOS to assess the quality of the scan or exported
as a *.TIFF image. It is used in GCOS to generate the *.CEL file (see below).
- Cell Intensity File *.CEL: This binary file is the result of the
low-level analysis performed on the *.DAT image file. It is exported from GCOS
and is often used as the base file for further analysis.
- Probe Array Results File *.CHP: This binary file is a gene-level
summarization of the CEL file using the Affymetrix MAS 5.0 or PLIER
algorithms. It is exported from GCOS. It can also be used as the base file
for further analysis; however, one needs to know the settings of key parameters
(alpha1, alpha2, tau, target signal, etc.). Because the community has adopted
many algorithms other than MAS 5.0 and PLIER, investigators often prefer
CEL files over CHP files for analysis. The CHP file contains the data that
is used by the AMC for quality control purposes.
- Report File *.RPT: The report file is generated from
the chip (*.CHP) file. This expression report summarizes information about
expression analysis settings and probe set hybridization intensity data.
- MAGE-ML *.XML: This file contains information related to the microarray
experiment (one per experiment). This information can include biologically
relevant information (entered by the user manually or through an EXP file),
array details, fluidics protocol details and the analysis settings. It also
records the file hierarchy of an experiment. This last point is important if
one is attempting to load previously generated GCOS files from one instance
of the application into another, which makes the MAGE-ML file essential for
the task of reloading GCOS files.
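When a delivered dataset contains a mix of these files, a small lookup keyed on file extension can help sort them. The sketch below is not part of GCOS; the names `GCOS_FILE_TYPES` and `describe_gcos_file` and the example file names are hypothetical, and the extensions are simply those listed above.

```python
from pathlib import Path

# Extension-to-description table taken from the list above.
GCOS_FILE_TYPES = {
    ".exp": "Experiment File",
    ".dat": "Image Data File",
    ".cel": "Cell Intensity File",
    ".chp": "Probe Array Results File",
    ".rpt": "Report File",
    ".xml": "MAGE-ML",
}


def describe_gcos_file(filename: str) -> str:
    """Return the file type for a name, matching the extension case-insensitively."""
    suffix = Path(filename).suffix.lower()
    return GCOS_FILE_TYPES.get(suffix, "unknown")


# Hypothetical file names for illustration.
for name in ["sample01.CEL", "sample01.CHP", "notes.txt"]:
    print(name, "->", describe_gcos_file(name))
```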
|
GCOS uses a statistical algorithm to calculate signals and make significance
calls for the data. A description of the analysis algorithm is available. Below are
the algorithm-generated metrics the AMC provides to investigators as part of
its data preparation service.
| MAS Analysis Metrics |
- Signal: a measure of the abundance of a transcript.
- Detection: the call that indicates whether the transcript is detected
(P, present), undetected (A, absent), or at the limit of detection (M, marginal).
- Detection p-value: p-value that indicates the significance of the detection call.
- Signal Log Ratio: the change in expression level of a transcript
between a baseline and an experiment array. This change is expressed as the
log2 ratio. A log2 ratio of 1 is equal to a fold change of 2.
- Change: the call that indicates the change in the transcript level
between a baseline and an experiment array (increase (I), marginal increase (MI),
no change (NC), marginal decrease (MD), decrease (D)).
- Change p-value: p-value that indicates the significance of the change call.
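To make the Signal Log Ratio arithmetic concrete, the sketch below converts a log2 ratio to a fold change. It assumes the common signed convention in which decreases are reported as negative fold changes (a log2 ratio of -1 becomes -2). The function names are hypothetical, and `simple_log_ratio` is only a plain log2 of two signal values, not the probe-level estimate GCOS itself computes.

```python
import math


def fold_change(signal_log_ratio: float) -> float:
    """Convert a log2 ratio to a signed fold change.

    Assumed convention: increases map to 2**SLR, decreases to -(2**-SLR).
    """
    if signal_log_ratio >= 0:
        return 2.0 ** signal_log_ratio
    return -(2.0 ** -signal_log_ratio)


def simple_log_ratio(baseline_signal: float, experiment_signal: float) -> float:
    """Plain log2(experiment / baseline); a simplification for illustration."""
    return math.log2(experiment_signal / baseline_signal)


print(fold_change(1.0))    # 2.0, as stated in the list above
print(fold_change(-1.0))   # -2.0 under the assumed signed convention
```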
|
Each probe set on a GeneChip array has a unique name known as the Probe set
ID. Probe set IDs have different extensions that denote important information
about how the probe set was designed. The nomenclature for the probe set extensions
is given below.
| Probe Set Extension Nomenclature |
All probe sets have one of the following two extensions:
A few probe sets are designated as follows:
Some probe sets represent more than one gene or EST:
- _s_at : designates probe sets that share common probes among multiple
transcripts from different genes.
- _a_at : designates probe sets that recognize multiple alternative
transcripts from the same gene (on HG-U133 these probe sets have an "_s" suffix).
- _x_at : designates probe sets where it was not possible to select
either a unique probe set or a probe set with identical probes among multiple
transcripts. Rules for cross-hybridization were dropped. Therefore, these
probe sets may cross-hybridize in an unpredictable manner with other sequences.
- _g_at : similar genes, also unique probe sets elsewhere on the array.
- _f_at : similarity rules dropped, probe set will recognize more than one gene.
- _i_at : designates sequences for which there are fewer than the required
number of unique probes specified in the design.
- _b_at : all probe selection rules were ignored. Withdrawn from GenBank.
- _l_at : sequence represented by more than 20 probe pairs.
- _r_ : designates sequences for which it was not possible to pick
a full set of unique probes using Affymetrix' probe selection
rules. Probes were picked after dropping some of the selection rules.
Most of the descriptions for the probe set ID extensions above were taken from the
Affymetrix GeneChip Expression Analysis Data Analysis Fundamentals.
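When filtering a result table, it can be useful to flag probe sets by extension. The sketch below is one way to do that with a regular expression. The function name, the dictionary, and the example IDs are hypothetical. The `_r_` entry is left out because the list above gives it without a trailing part.

```python
import re

# Short notes paraphrased from the extension list above.
EXTENSION_NOTES = {
    "_s_at": "probes shared among transcripts from different genes",
    "_a_at": "recognizes alternative transcripts from the same gene",
    "_x_at": "may cross-hybridize unpredictably",
    "_g_at": "similar genes; unique probe sets elsewhere on the array",
    "_f_at": "similarity rules dropped; recognizes more than one gene",
    "_i_at": "fewer unique probes than the design requires",
    "_b_at": "all probe selection rules ignored",
    "_l_at": "more than 20 probe pairs",
}

# Matches a single-letter extension followed by "_at" at the end of the ID.
_SUFFIX = re.compile(r"(_[a-z]_at)$")


def probe_set_extension(probe_set_id: str):
    """Return the listed extension of a probe set ID, or None if it has none."""
    match = _SUFFIX.search(probe_set_id)
    if match and match.group(1) in EXTENSION_NOTES:
        return match.group(1)
    return None


# Made-up IDs for illustration.
for psid in ["12345_s_at", "12345_x_at", "12345_at"]:
    ext = probe_set_extension(psid)
    print(psid, "->", ext, "-", EXTENSION_NOTES.get(ext, "no listed extension"))
```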
|
| Glossary of Analysis Terms |
- Target:
- Fragmented, biotinylated anti-sense cRNA prepared from mRNA to be analyzed.
Target molecules are hybridized to the probe array and the levels of hybridization
are measured with the GeneArray scanner after the array is stained with
streptavidin-phycoerythrin (SAPE).
- Probe:
- Single-stranded DNA oligonucleotide synthesized directly on the surface of the
GeneChip array using photolithography and combinatorial chemistry.
The 25 base oligonucleotide is designed to be complementary to a specific gene transcript.
- Probe Cell:
- Single square-shaped feature on an array containing probes with a unique sequence.
The size can vary depending on the array type, typically 20 µm or 18 µm.
Each probe cell contains millions of probe molecules.
- Perfect Match (PM):
- Probes that are designed to be complementary to a reference sequence.
- Mismatch (MM):
- Probes that are designed to be complementary to a reference sequence except
for a homomeric mismatch at the central position (e.g., the 13th position of a 25-base probe: A->T or G->C).
Mismatch probes serve as a control for cross-hybridization.
- Probe Pair:
- Two probe cells, a PM and its corresponding MM.
On the probe array, a probe pair is arranged with a PM cell
directly above a MM cell.
- Probe set:
- A set of probes designed to detect one transcript.
A probe set usually consists of 11-20 probe pairs.
For example, an 11 probe pair set is made up of 11 PM
probes and 11 MM probes for a total of 22 probe cells.
Newer array designs from Affymetrix, e.g., HG-U133, contain
probe sets with 11 probe pairs. Older designs have average
probe set numbers of 16 or 20 probe pairs.
- Target Sequence:
- The portion of a transcript reference sequence that is interrogated
by a probe set on the array. The target sequence extends from the first
base of the most 5' probe to the last base of the most 3' probe.
- Absolute Analysis:
- This is an analysis of a single GeneChip array using Affymetrix Microarray Suite
software. The software applies an algorithm developed by Affymetrix to determine
the expression level for each gene represented on the array.
- Analysis Metrics:
- Probe set performance descriptors calculated by the software from measured probe
cell intensities. Analysis metrics are used to determine biologically meaningful
results, such as the presence or absence of gene transcripts.
- Analysis Parameters:
- Variables with user-defined values used in the expression
analysis (default values in the software are empirically determined at Affymetrix).
*More extensive glossaries can be found in the Statistical Algorithms
Reference Guide and Data Analysis Fundamentals, available on the
Affymetrix website (www.affymetrix.com).
|
| Chip Codes |
The following table contains the codes for the chip types and the GCOS parameter values the AMC uses in data preparation.
Scale Target Intensity is the target value for the global scaling of the chips within a dataset. Alpha1 and Alpha2 are the two thresholds that define the ranges of p-values used to assign Detection calls.
The values are current as of May 2005.
| Array Description | Chip Type | Scale Target Intensity | Alpha1 | Alpha2 | Chip Code for DB |
| Human Genome U133 Plus 2.0 | HG-U133 Plus 2.0 | 300 | 0.05* | 0.065* | K |
| Human Genome U133A 2.0 | HG-U133A 2.0 | NA | 0.05* | 0.065* | J |
| Human Genome U95Av2 | HG-95Av2 | 200 | 0.1 | 0.15 | A |
| Human Genome U133A | HG-U133A | 325 | 0.1 | 0.15 | F |
| Human Genome U133B | HG-U133B | 105 | 0.1 | 0.15 | G |
| Human Genome Focus | HG-Focus | 275 | 0.1 | 0.15 | H |
| Mouse Genome 430 2.0 | MOE 430 2.0 | 350 | 0.05* | 0.065* | H |
| Mouse Expression 430A | MOE 430A | 350 | 0.1 | 0.15 | F |
| Mouse Expression 430B | MOE 430B | 70 | 0.1 | 0.15 | G |
| Mouse Genome U74Av2 | MG-U74Av2 | 150 | 0.1 | 0.15 | A |
| Rat Genome 230 2.0 | RAE 230 2.0 | 300 | 0.05* | 0.065* | H |
| Rat Expression 230A | RAE 230A | 300 | 0.1 | 0.15 | F |
| Rat Genome U34A | RG-U34A | 265 | 0.1 | 0.15 | A |
| Yeast Genome 2.0 | YG 2.0 | NA | 0.1 | 0.15 | A |
* Affymetrix default value
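The way these parameters are applied can be sketched as follows. The detection rule assumed here is the commonly described one: Present when the p-value is below Alpha1, Marginal when it falls between Alpha1 and Alpha2, and Absent otherwise. The scaling helper assumes global scaling divides the target intensity by a trimmed mean of the signals, with the lowest and highest 2% excluded. Both functions are simplified illustrations with hypothetical names, not the GCOS implementation.

```python
def detection_call(p_value: float, alpha1: float = 0.05, alpha2: float = 0.065) -> str:
    """Assign P / M / A from a detection p-value (assumed thresholding rule)."""
    if p_value < alpha1:
        return "P"  # present
    if p_value < alpha2:
        return "M"  # marginal
    return "A"      # absent


def scale_factor(signals, target: float, trim: float = 0.02) -> float:
    """Return target / trimmed mean of the signals.

    Assumption: the lowest and highest `trim` fraction of values are dropped
    before averaging; this is a simplified trimmed mean.
    """
    ordered = sorted(signals)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("no signals left after trimming")
    return target / (sum(kept) / len(kept))


# Example with the starred defaults from the table (0.05 and 0.065).
print(detection_call(0.01))   # P
print(detection_call(0.06))   # M
print(detection_call(0.20))   # A
```

For chip types listed with Alpha1 = 0.1 and Alpha2 = 0.15, the same function can be called as `detection_call(p, 0.1, 0.15)`.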
|
Written by BT/Updated by DMC 08/19/08
|
 |