gfp {yeastExpData} | R Documentation
This data frame contains data concerning the localization and abundance of various yeast proteins.
data(gfp)
A data frame with 6234 observations on the following 33 variables.
orfid
yORF: values such as YAL001C, YAL002W, etc. These are also the row names of the data frame.
gene_name: values such as AAC1, AAC3, etc.
GFP_tagged: values "not tagged" and "tagged", indicating whether or not the ORF was GFP tagged.
GFP_visualized: values "not visualized" and "visualized", indicating whether or not GFP fluorescence was visualized.
TAP_visualized: values "TAP visualized" and "not TAP visualized", indicating success of TAP tag.
abundance
error: refers to abundance (see details below).
localization_summary: values such as "", "ER", "ER to Golgi", "ER,ambiguous", "ER,ambiguous,bud", etc. Summarizes the information contained in the subsequent columns.
The following columns indicate whether or not the protein was localized in the specific region of the cell. A protein can be localized in more than one region.
ambiguous
mitochondrion
vacuole
spindle_pole
cell_periphery
punctate_composite
vacuolar_membrane
ER
nuclear_periphery
endosome
bud_neck
microtubule
Golgi
late_Golgi
peroxisome
actin
nucleolus
cytoplasm
ER_to_Golgi
early_Golgi
lipid_particle
nucleus
bud
Explanations for missing abundance values are given by missingAbundance, with values "low signal", "not visualized" and "technical problem".
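Because a protein can be flagged in more than one localization column, it can be useful to count the regions per protein. The following is only a sketch on a made-up stand-in for gfp: the data frame toy, its row names, and its values are hypothetical, and it assumes the localization columns are logical, which the text above does not state. The real columns may need converting first.

```r
# Hypothetical stand-in for three of the localization columns in gfp.
# Assumption: the columns are logical; the real encoding may differ.
toy <- data.frame(
  ER      = c(TRUE,  FALSE, TRUE),
  bud     = c(TRUE,  FALSE, FALSE),
  nucleus = c(FALSE, TRUE,  FALSE),
  row.names = c("toyA", "toyB", "toyC")
)

# Number of regions in which each protein was localized
n_regions <- rowSums(toy)

# Proteins localized in more than one region
multi <- names(n_regions)[n_regions > 1]

print(n_regions)
print(multi)
```

With the real data, the same idea would apply to the 23 localization columns listed above, after checking how they are encoded.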
The information on abundance is available in three columns. abundance gives (where available) absolute protein abundances determined by quantitative Western blot analysis of TAP-tagged strains. Abundances that have a non-NA error value were done in triplicate with serial dilutions of purified TAP-tagged standards included in each gel, which substantially reduces the measurement error. In addition, for these strains, the tagged genes were confirmed to rescue the loss of function phenotype of the corresponding deletion strain. For rows where abundance is missing (NA), the missingAbundance column gives the reason. Possible reasons are:
"not visualized"
"low signal"
"technical problem"
Replicate analysis for a subset of tagged strains found a linear correlation coefficient of R = 0.94, with the pairs of proteins having a median variation of a factor of 2.0. This error analysis does not account for potential alterations in the endogenous levels of the proteins caused by the fused tag, which may be particularly disruptive for small proteins.
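The relationship between the three abundance columns described above can be checked with a few lines of R. This is a minimal sketch on a made-up stand-in for gfp: the data frame toy, its identifiers, and its numbers are hypothetical, and it assumes missingAbundance is NA wherever abundance is present.

```r
# Hypothetical stand-in for the abundance-related columns of gfp;
# all identifiers and values are made up for illustration only.
toy <- data.frame(
  yORF             = c("toyA", "toyB", "toyC", "toyD"),
  abundance        = c(NA, 1200, NA, 5000),
  error            = c(NA, NA, NA, 300),
  missingAbundance = c("low signal", NA, "not visualized", NA),
  stringsAsFactors = FALSE
)

# Tabulate the stated reason for every row whose abundance is missing
reasons <- table(toy$missingAbundance[is.na(toy$abundance)], useNA = "ifany")
print(reasons)

# Rows with a non-NA error value are the ones measured in triplicate
triplicate <- subset(toy, !is.na(error))
print(triplicate$yORF)
```

On the real data frame, replacing toy with gfp should give the breakdown of missing abundances by reason, provided the columns behave as assumed here.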
The data were obtained from http://yeastgfp.ucsf.edu/, which contains a lot more information as well as raw image data. This data frame was specifically generated from http://yeastgfp.ucsf.edu/allOrfData.txt
For the Localization data: Huh, et al., Nature 425, 686-691 (2003) – http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=14562095&dopt=Abstract
For the Protein abundance data: Ghaemmaghami, et al., Nature 425, 737-741 (2003) – http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=14562106&dopt=Abstract
data(gfp)

# Keep localization categories that contain more than 50 proteins
keep <- names(which(table(gfp$localization_summary) > 50))

if (require(lattice)) {
    # Box-and-whisker plot of log2 abundance, categories ordered by median abundance
    bwplot(reorder(localization_summary, abundance, median, na.rm = TRUE) ~ log2(abundance),
           gfp, varwidth = TRUE,
           subset = localization_summary %in% keep)
} else {
    # Fallback using base graphics
    opar <- par(las = 2, mar = par("mar") + c(3.5, 0, 0, 0))
    gfp._sub <- subset(gfp, localization_summary %in% keep)
    # Drop unused levels so that empty categories are not plotted
    gfp._sub$localization_summary <- gfp._sub$localization_summary[, drop = TRUE]
    boxplot(log2(abundance) ~ reorder(localization_summary, abundance, median, na.rm = TRUE),
            data = gfp._sub, varwidth = TRUE)
    rm(gfp._sub)
    par(opar)
}