arrayQualityMetrics report for MLL.A[, 1:5]
Section 2: Array intensity distributions: Boxplots, Density plots
Section 3: Variance mean dependence: Standard deviation versus rank of the mean
Section 4: Affymetrix specific plots: Relative Log Expression (RLE), Normalized Unscaled Standard Error (NUSE), RNA digestion plot, Perfect matches and mismatches
Section 5: Individual array quality: MA plots, Spatial distribution of M
Browser compatibility
This report uses recent features of HTML 5 which have not yet been implemented by all browsers. Thus, unfortunately, browser compatibility currently needs to be considered:
- Firefox 4 - tested, works well,
- Chrome 10 - tested, works well,
- Safari 5 - the interactive (SVG) plots will be missing, since this browser does not support the embedding of the <svg> tag in HTML.
- Array metadata and outlier detection overview
- Figure 1: Distances between arrays.
Figure 1 (PDF file) shows a false color heatmap of the distances between arrays. The color scale is chosen to cover the range of distances encountered in the dataset. Patterns in this plot can indicate clustering of the arrays either because of intended biological or unintended experimental factors (batch effects). The distance $d_{ab}$ between two arrays $a$ and $b$ is computed as the mean absolute difference ($L_1$-distance) between the data of the arrays (using the data from all probes without filtering). In formula, $d_{ab} = \operatorname{mean}_i |M_{ai} - M_{bi}|$, where $M_{ai}$ is the value of the $i$-th probe on the $a$-th array. Outlier detection was performed by looking for arrays for which the sum of the distances to all other arrays, $S_a = \sum_b d_{ab}$, was exceptionally large. No such arrays were detected.
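The two quantities defined above can be sketched in a few lines of Python. This is an illustrative sketch, not arrayQualityMetrics' own code. It assumes each array is given as a list of per-probe values on a common scale, and the function names `distance_matrix` and `distance_sums` are hypothetical.

```python
def distance_matrix(arrays):
    """d[a][b] = mean over probes i of |M[a][i] - M[b][i]| (an L1-type distance)."""
    n = len(arrays)
    d = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            diffs = [abs(x - y) for x, y in zip(arrays[a], arrays[b])]
            d[a][b] = d[b][a] = sum(diffs) / len(diffs)
    return d


def distance_sums(d):
    """S[a] = sum over b of d[a][b]; the diagonal d[a][a] is zero and adds nothing."""
    return [sum(row) for row in d]


# Made-up data: three arrays, four probes each.
arrays = [
    [1.0, 2.0, 3.0, 4.0],
    [1.5, 2.0, 3.5, 4.0],
    [3.0, 4.0, 5.0, 6.0],
]
d = distance_matrix(arrays)
print(distance_sums(d))  # → [2.25, 2.0, 3.75]
```

The third array is furthest from the others and so gets the largest $S_a$. Whether that sum counts as "exceptionally large" depends on the threshold rule applied afterwards.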
- Figure 2: Outlier detection for Distances between arrays.
Figure 2 (PDF file) shows a bar chart of the sum of distances to other arrays, $S_a$, the outlier detection criterion from the previous figure. The bars are shown in the original order of the arrays. Based on the distribution of the values across all arrays, a threshold of 1.44 was determined, which is indicated by the vertical line. None of the arrays exceeded the threshold, so none was considered an outlier.
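The report does not say how the threshold is derived from the distribution of the values. The sketch below therefore assumes, purely for illustration, a boxplot-style upper fence: the third quartile plus 1.5 times the interquartile range. This rule and the names `upper_fence` and `flag_outliers` are hypothetical, and the sketch need not reproduce the 1.44 shown above.

```python
import statistics


def upper_fence(values, coef=1.5):
    """HYPOTHETICAL threshold rule: Q3 + coef * (Q3 - Q1).

    An assumption for illustration only; the report does not state its rule.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 + coef * (q3 - q1)


def flag_outliers(scores, threshold):
    """Indices of arrays whose score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


# Made-up S_a values for seven arrays; array 3 is far from the rest.
scores = [2.1, 2.3, 2.0, 9.0, 2.2, 2.5, 2.4]
th = upper_fence(scores)
print(flag_outliers(scores, th))  # → [3]
```

The same flagging step applies to any of the per-array criteria in this report. Only the statistic and the threshold change.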
- Figure 3: Principal Component Analysis.
Figure 3 (PDF file) shows a scatterplot of the arrays along the first two principal components. You can use this plot to explore whether the arrays cluster, and whether this is according to an intended experimental factor (you can indicate such a factor by color using the 'intgroup' argument), or according to unintended causes such as batch effects. Move the mouse over the points to see the sample names.
Principal component analysis is a dimension reduction and visualisation technique that is used here to project the multivariate data vector of each array into a two-dimensional plot, such that the spatial arrangement of the points in the plot reflects the overall data (dis)similarity between the arrays.
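The projection described above can be sketched with NumPy's SVD. This is a minimal illustration, not the report's own code. Here each row is one array (a probes-by-arrays matrix would be transposed first). The function name `pca_2d` and the data are hypothetical.

```python
import numpy as np


def pca_2d(data):
    """Scores of each array on the first two principal components.

    `data` has one row per array and one column per probe. Columns are centred,
    the SVD of the centred matrix gives the principal axes, and the scores are U * S.
    The sign of each component is arbitrary.
    """
    x = np.asarray(data, dtype=float)
    centred = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :2] * s[:2]


# Two made-up groups of arrays, offset by about 2 units on every probe.
data = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.0, 3.1, 4.0, 5.0],
    [3.0, 4.0, 5.0, 6.0, 7.0],
    [3.1, 4.0, 5.1, 6.0, 7.0],
]
scores = pca_2d(data)
print(np.round(scores, 2))
```

The group offset dominates the variance, so the first component separates arrays 0 and 1 from arrays 2 and 3. This is the kind of clustering the plot is meant to reveal.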
- Figure 4: Boxplots.
Figure 4 (PDF file) shows boxplots representing summaries of the signal intensity distributions of the arrays. Each box corresponds to one array. Typically, one expects the boxes to have similar positions and widths. If the distribution of an array is very different from the others, this may indicate an experimental problem. Outlier detection was performed by computing the Kolmogorov-Smirnov statistic $K_a$ between each array's distribution and the distribution of the pooled data.
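The statistic $K_a$ can be sketched in plain Python, assuming each array is a list of intensity values. The function names are hypothetical. `scipy.stats.ks_2samp` computes the same two-sample statistic if SciPy is available.

```python
from bisect import bisect_right


def ecdf(sorted_values, x):
    """Empirical CDF: fraction of values <= x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def ks_statistic(sample, reference):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_sample(x) - F_reference(x)|.

    Both ECDFs are step functions that only change at observed values, so checking
    every point of both samples is enough to find the maximum.
    """
    s, r = sorted(sample), sorted(reference)
    return max(abs(ecdf(s, x) - ecdf(r, x)) for x in s + r)


def ks_per_array(arrays):
    """K_a: KS statistic between each array and the pooled data of all arrays."""
    pooled = [v for arr in arrays for v in arr]
    return [ks_statistic(arr, pooled) for arr in arrays]


# Made-up data: the third array is shifted relative to the first two.
arrays = [[1, 2, 3], [1, 2, 3], [4, 5, 6]]
print(ks_per_array(arrays))
```

The shifted array gets the largest $K_a$ (2/3 here, versus about 1/3 for the others).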
- Figure 5: Outlier detection for Boxplots.
Figure 5 (PDF file) shows a bar chart of the Kolmogorov-Smirnov statistic $K_a$, the outlier detection criterion from the previous figure. The bars are shown in the original order of the arrays. Based on the distribution of the values across all arrays, a threshold of 0.163 was determined, which is indicated by the vertical line. None of the arrays exceeded the threshold, so none was considered an outlier.
- Figure 6: Density plots.
Figure 6 (PDF file) shows density estimates (smoothed histograms) of the data. Typically, the distributions of the arrays should have similar shapes and ranges. Arrays whose distributions are very different from the others should be examined for possible problems. Various features of the distributions can be indicative of quality related phenomena. For instance, high levels of background will shift an array's distribution to the right. Lack of signal diminishes its right tail. A bulge at the upper end of the intensity range often indicates signal saturation.
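A "smoothed histogram" is a kernel density estimate. The sketch below assumes a Gaussian kernel with Silverman's rule-of-thumb bandwidth. The report does not state its kernel or bandwidth, and the function names are hypothetical. The example shows that adding a constant background moves the peak of the curve to the right.

```python
import math
import statistics


def gaussian_kde(values, grid, bandwidth=None):
    """Gaussian kernel density estimate of `values`, evaluated at each point of `grid`.

    Without an explicit bandwidth, Silverman's rule of thumb is used:
    h = 0.9 * min(sd, IQR / 1.34) * n ** (-1/5).
    """
    n = len(values)
    if bandwidth is None:
        sd = statistics.stdev(values)
        q1, _, q3 = statistics.quantiles(values, n=4)
        bandwidth = 0.9 * min(sd, (q3 - q1) / 1.34) * n ** (-0.2)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]


def mode_of(values, grid):
    """Grid point where the estimated density is highest."""
    dens = gaussian_kde(values, grid)
    return grid[dens.index(max(dens))]


# Made-up log intensities, and the same values with extra background added.
values = [5.0, 5.5, 6.0, 6.0, 6.5, 7.0]
shifted = [v + 1.0 for v in values]
grid = [i / 10 for i in range(30, 111)]  # 3.0, 3.1, ..., 11.0
print(mode_of(values, grid), mode_of(shifted, grid))
```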
- Figure 7: Standard deviation versus rank of the mean.
Figure 7 (PDF file) shows a density plot of the standard deviation of the intensities across arrays on the y-axis versus the rank of their mean on the x-axis. The red dots, connected by lines, show the running median of the standard deviation. After normalisation and transformation to a logarithm(-like) scale, one typically expects the red line to be approximately horizontal, that is, to show no substantial trend. In some cases, a hump can be observed at the right-hand end of the x-axis; this is symptomatic of saturation of the intensities.
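The quantities plotted above can be sketched as follows. The sketch assumes the arrays are lists of log-scale values for the same probes. The window size of the running median is a hypothetical choice, since the report does not state it, and the function name `sd_vs_rank` is illustrative.

```python
import statistics


def sd_vs_rank(arrays, window=3):
    """Per-probe standard deviation across arrays, ordered by the rank of the probe mean.

    Returns (ranks, sds_by_rank, running_medians). running_medians[k] is the median
    SD over a window of `window` consecutive ranks centred on rank k, truncated at
    the ends. The window size is a hypothetical choice.
    """
    n_probes = len(arrays[0])
    per_probe = [[arr[i] for arr in arrays] for i in range(n_probes)]
    means = [statistics.fmean(p) for p in per_probe]
    sds = [statistics.stdev(p) for p in per_probe]
    order = sorted(range(n_probes), key=lambda i: means[i])  # probes by increasing mean
    sds_by_rank = [sds[i] for i in order]
    half = window // 2
    running = [
        statistics.median(sds_by_rank[max(0, k - half): k + half + 1])
        for k in range(n_probes)
    ]
    return list(range(1, n_probes + 1)), sds_by_rank, running


# Made-up data: two arrays, four probes.
ranks, sds_by_rank, running = sd_vs_rank([[1, 5, 3, 9], [2, 6, 3, 8]])
print(ranks, [round(s, 3) for s in sds_by_rank], [round(r, 3) for r in running])
```

A roughly flat `running` sequence corresponds to the horizontal red line one hopes to see.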
- Figure 8: Relative Log Expression (RLE).
Figure 8 (PDF file) shows the Relative Log Expression (RLE) plot. Arrays whose boxes are centered away from 0 and/or are more spread out are potentially problematic. Outlier detection was performed by computing the Kolmogorov-Smirnov statistic $R_a$ between each array's RLE values and the pooled, overall distribution of RLE values.
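RLE values are commonly defined, for each probe set, as the array's log expression minus the median log expression of that probe set across all arrays. That is why a good array's box sits near 0. The sketch below computes them and then $R_a$ as described above. It assumes the input is already log-scale expression per probe set, and the function names and data are hypothetical.

```python
import statistics
from bisect import bisect_right


def rle(expr):
    """RLE values: expr[a][g] minus the median of probe set g across all arrays."""
    n_genes = len(expr[0])
    medians = [statistics.median(arr[g] for arr in expr) for g in range(n_genes)]
    return [[arr[g] - medians[g] for g in range(n_genes)] for arr in expr]


def ks_statistic(sample, reference):
    """Two-sample Kolmogorov-Smirnov statistic, checked at every observed value."""
    s, r = sorted(sample), sorted(reference)
    return max(
        abs(bisect_right(s, x) / len(s) - bisect_right(r, x) / len(r)) for x in s + r
    )


# Made-up log expression: rows are arrays, columns are probe sets.
expr = [
    [7.0, 9.0, 5.0, 11.0],
    [7.2, 8.8, 5.1, 11.0],
    [8.0, 10.0, 6.0, 12.0],  # about 1 unit higher on every probe set
]
rle_values = rle(expr)
pooled = [v for arr in rle_values for v in arr]
r_stats = [ks_statistic(arr, pooled) for arr in rle_values]
print([round(r, 3) for r in r_stats])
```

The third array's RLE values are centered near 1 rather than 0, and it receives the largest $R_a$.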
- Figure 9: Outlier detection for Relative Log Expression (RLE).
Figure 9 (PDF file) shows a bar chart of the Kolmogorov-Smirnov statistic $R_a$ of the RLE values, the outlier detection criterion from the previous figure. The bars are shown in the original order of the arrays. Based on the distribution of the values across all arrays, a threshold of 0.446 was determined, which is indicated by the vertical line. None of the arrays exceeded the threshold, so none was considered an outlier.
- Figure 10: Normalized Unscaled Standard Error (NUSE).
Figure 10 (PDF file) shows the Normalized Unscaled Standard Error (NUSE) plot. For each array, the box should be centered around 1. An array where the values are elevated relative to the other arrays is typically of lower quality. Outlier detection was performed by computing the 75% quantile $N_a$ of each array's NUSE values and looking for arrays with large $N_a$.
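NUSE values are each probe set's standard error divided by the median standard error of that probe set across arrays, so a typical array centers at 1. The sketch below assumes the unscaled standard errors from a probe-level model fit are already available, because that fit is not shown here. It then takes the 75% quantile as $N_a$. The function names and data are hypothetical.

```python
import statistics


def nuse(se):
    """NUSE: se[a][g] divided by the median of probe set g's standard errors across arrays.

    `se[a][g]` is assumed to be the unscaled standard error of probe set g on array a,
    taken from a probe-level model fit done elsewhere.
    """
    n_genes = len(se[0])
    medians = [statistics.median(arr[g] for arr in se) for g in range(n_genes)]
    return [[arr[g] / medians[g] for g in range(n_genes)] for arr in se]


def upper_quartile(values):
    """75% quantile, interpolating linearly between order statistics."""
    return statistics.quantiles(values, n=4, method="inclusive")[2]


# Made-up standard errors: the third array's are elevated on every probe set.
se = [
    [0.10, 0.20, 0.30, 0.40],
    [0.11, 0.19, 0.31, 0.40],
    [0.15, 0.30, 0.45, 0.60],
]
n_a = [upper_quartile(v) for v in nuse(se)]
print([round(x, 3) for x in n_a])
```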
- Figure 11: Outlier detection for Normalized Unscaled Standard Error (NUSE).
Figure 11 (PDF file) shows a bar chart of $N_a$, the outlier detection criterion from the previous figure. The bars are shown in the original order of the arrays. Based on the distribution of the values across all arrays, a threshold of 1.04 was determined, which is indicated by the vertical line. None of the arrays exceeded the threshold, so none was considered an outlier.
- Figure 12: RNA digestion plot.
Figure 12 (PDF file) shows the RNA digestion plot. The values shown are computed from the preprocessed data (after background correction and quantile normalisation). Each array is represented by a single line; move the mouse over the lines to see their corresponding sample names. The plot can be used to identify arrays whose slope is very different from the others. This could indicate that the RNA used for such an array was handled differently from the RNA used for the other arrays.
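The idea behind the plot can be sketched as follows. For each array, average the intensity at each probe position (ordered from the 5' to the 3' end) across probe sets, then fit a straight line and compare slopes between arrays. This is a simplified illustration under stated assumptions. It does not attempt to reproduce the exact computation behind the report's plot, and the function names and data are hypothetical.

```python
import statistics


def position_means(probe_sets):
    """Mean intensity at each probe position across probe sets, for one array.

    Each probe set is assumed to be a list of PM intensities of equal length,
    already ordered from the 5' end to the 3' end.
    """
    return [statistics.fmean(col) for col in zip(*probe_sets)]


def slope(ys):
    """Ordinary least-squares slope of ys against positions 0, 1, 2, ..."""
    xs = range(len(ys))
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


# Made-up data for two arrays, two probe sets each.
array_a = [[5.0, 5.1, 5.2, 5.3], [6.0, 6.1, 6.2, 6.3]]
array_b = [[5.0, 5.5, 6.0, 6.5], [6.0, 6.5, 7.0, 7.5]]  # much steeper 5'-to-3' trend
print(round(slope(position_means(array_a)), 3), round(slope(position_means(array_b)), 3))
```

An array whose slope stands apart from the rest, like `array_b` here, is the kind the plot is meant to reveal.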
- Figure 13: Perfect matches and mismatches.
Figure 13 shows the density distributions of the log$_2$ intensities grouped by the matching type of the probes. The blue line shows a density estimate (smoothed histogram) of the intensities of the perfect match (PM) probes, and the grey line shows one for the mismatch (MM) probes. We expect MM probes to have poorer hybridization than PM probes, and thus the PM curve to lie to the right of the MM curve.
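A simple numerical counterpart to this visual check compares the two groups of log$_2$ intensities directly. The sketch below returns the difference of their medians, which should be positive when the PM curve lies to the right, and the fraction of probe pairs with PM above MM. The function name and data are hypothetical, and these summaries are not part of the report itself.

```python
import math
import statistics


def pm_mm_summary(pm, mm):
    """Median log2 PM minus median log2 MM, and the fraction of pairs with PM > MM.

    `pm[i]` and `mm[i]` are assumed to be the raw intensities of the i-th probe pair.
    """
    log_pm = [math.log2(v) for v in pm]
    log_mm = [math.log2(v) for v in mm]
    median_shift = statistics.median(log_pm) - statistics.median(log_mm)
    frac_pm_higher = sum(p > m for p, m in zip(pm, mm)) / len(pm)
    return median_shift, frac_pm_higher


# Made-up intensities for four probe pairs.
print(pm_mm_summary([256, 512, 1024, 128], [128, 256, 256, 256]))  # → (0.5, 0.75)
```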
- Figure 14: MA plots.
Figure 14 (PDF file) shows MA plots. M and A are defined as:
$$M = \log_2(I_1) - \log_2(I_2)$$
$$A = \tfrac{1}{2}\left(\log_2(I_1) + \log_2(I_2)\right),$$
where $I_1$ is the intensity of the array studied, and $I_2$ is the intensity of a "pseudo"-array that consists of the median across arrays. Typically, we expect the mass of the distribution in an MA plot to be concentrated along the M = 0 axis, and there should be no trend in M as a function of A. If there is a trend in the lower range of A, this often indicates that the arrays have different background intensities; this may be addressed by background correction. A trend in the upper range of A can indicate saturation of the measurements; in mild cases, this may be addressed by non-linear normalisation (e.g. quantile normalisation).
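The definitions above translate directly into code. This minimal sketch assumes raw (unlogged) intensities and builds the pseudo-array as the per-probe median across all arrays, as the text describes. The function name `ma_values` and the data are hypothetical.

```python
import math
import statistics


def ma_values(arrays, a):
    """M and A values for array `a` against a median pseudo-array.

    The pseudo-array holds, for each probe, the median intensity across all arrays.
    M = log2(I1) - log2(I2) and A = (log2(I1) + log2(I2)) / 2.
    """
    n_probes = len(arrays[0])
    pseudo = [statistics.median(arr[i] for arr in arrays) for i in range(n_probes)]
    m_vals, a_vals = [], []
    for i1, i2 in zip(arrays[a], pseudo):
        l1, l2 = math.log2(i1), math.log2(i2)
        m_vals.append(l1 - l2)
        a_vals.append((l1 + l2) / 2)
    return m_vals, a_vals


# Made-up intensities: three arrays, three probes.
arrays = [[2, 8, 32], [4, 8, 16], [8, 8, 8]]
print(ma_values(arrays, 0))  # → ([-1.0, 0.0, 1.0], [1.5, 3.0, 4.5])
```

Here the first array shows M rising with A, which is exactly the kind of trend the MA plot is meant to expose.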
Outlier detection was performed by computing Hoeffding's statistic $D_a$ on the joint distribution of A and M for each array. The value of $D_a$ is shown in the panel headings. One array had $D_a > 0.15$ and was marked as an outlier. For more information on Hoeffding's D-statistic, please see the manual page of the function hoeffd in the Hmisc package.
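Hoeffding's D measures dependence between two variables, including non-monotonic dependence, so a trend of M with A raises $D_a$. Below is a minimal $O(n^2)$ pure-Python sketch of the classical rank-based formula. It is multiplied by 30, the scaling that the Hmisc documentation gives for hoeffd, so a perfectly monotone relation gives 1. The sketch has not been checked against hoeffd's numerical output, and the report may compute $D_a$ on a subsample of points. The function names and data are hypothetical.

```python
def average_ranks(values):
    """Ranks 1..n, where tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    k = 0
    while k < len(order):
        j = k
        while j + 1 < len(order) and values[order[j + 1]] == values[order[k]]:
            j += 1
        avg = (k + j) / 2 + 1  # positions k..j share ranks k+1..j+1
        for idx in order[k:j + 1]:
            ranks[idx] = avg
        k = j + 1
    return ranks


def hoeffding_d(x, y):
    """Hoeffding's D, scaled by 30. Needs at least 5 points.

    Q[i] counts points below point i in both coordinates; ties count 1/2 per tied
    coordinate (so 1/4 when both coordinates tie).
    """
    n = len(x)
    if n < 5:
        raise ValueError("Hoeffding's D needs at least 5 points")
    r, s = average_ranks(x), average_ranks(y)
    q = []
    for i in range(n):
        count = 1.0
        for j in range(n):
            if j != i:
                dx = (x[j] < x[i]) + 0.5 * (x[j] == x[i])
                dy = (y[j] < y[i]) + 0.5 * (y[j] == y[i])
                count += dx * dy
        q.append(count)
    d1 = sum((qi - 1) * (qi - 2) for qi in q)
    d2 = sum((ri - 1) * (ri - 2) * (si - 1) * (si - 2) for ri, si in zip(r, s))
    d3 = sum((ri - 2) * (si - 2) * (qi - 1) for ri, si, qi in zip(r, s, q))
    num = (n - 2) * (n - 3) * d1 + d2 - 2 * (n - 2) * d3
    den = n * (n - 1) * (n - 2) * (n - 3) * (n - 4)
    return 30.0 * num / den


# Made-up (A, M) pairs with a strong monotone trend of M in A.
a_vals = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
m_vals = [0.0, 0.1, 0.2, 0.4, 0.8, 1.6]
print(round(hoeffding_d(a_vals, m_vals), 6))  # → 1.0
```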
- Figure 15: Outlier detection for MA plots.
Figure 15 (PDF file) shows a bar chart of Hoeffding's statistic $D_a$, the outlier detection criterion from the previous figure. The bars are shown in the original order of the arrays. A threshold of 0.15 was used, which is indicated by the vertical line. One array exceeded the threshold and was considered an outlier.