An alternative for Affymetrix genotyping based on principal component analysis and its application to array and SNP quality control, as well as using genotyping arrays for CGH analysis

Gerard te Meerman

Medical Genetics UMCG and Groningen Bioinformatics Centre

Current genotyping arrays from Affymetrix use a design in which many different probes per SNP are used to derive the genotype, with a non-informative (no-call) state as a possible outcome. The informativeness of these probes can be investigated by principal component analysis (PCA) of the data from one array, and of the data from a series of arrays for a specific SNP. The results show only one major difference between the probes: the matching (perfect-match) probes are more strongly correlated with the genotype than the mismatch probes. SNPs differ very little with regard to the informativeness of their spots. This indicates that the arrays can be substantially simplified and improved by omitting the mismatch spots. A histogram of the genotyping component for each array and for each spot gives a good indication of quality and can be used to obtain a reliable genotype. The SNPs vary considerably and systematically in sensitivity: about 80% of the variance over an array in the difference between the homozygous and heterozygous states is explained by SNP-specific properties. After correcting for array and individual SNP sensitivity, plots can be made along the genome that can be used for comparative genomic hybridization (CGH) analysis.
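The kind of analysis described above can be illustrated with a minimal sketch in Python. It is not the author's implementation. It runs on simulated intensities for a single SNP across a series of arrays, and everything in it is an assumption made for illustration: the function names `simulate_snp` and `first_principal_component`, the number of probe pairs, the noise level, and the weaker allele response assigned to the mismatch probes (`mm_response`). PCA is computed through a singular value decomposition of the column-centred data matrix.

```python
import numpy as np


def simulate_snp(n_arrays=90, n_pairs=10, mm_response=0.3, noise=0.2, seed=0):
    """Simulate probe intensities for one SNP (hypothetical toy model).

    Columns are ordered: matching A, matching B, mismatch A, mismatch B.
    Mismatch probes follow the allele dose with a weaker response.
    """
    rng = np.random.default_rng(seed)
    genotype = rng.integers(0, 3, size=n_arrays)   # number of B alleles: 0, 1 or 2
    dose_a = (2 - genotype)[:, None].astype(float)
    dose_b = genotype[:, None].astype(float)

    def block(dose, response):
        # Same allele-dose signal for every probe in the block, plus noise.
        return response * dose + noise * rng.normal(size=(n_arrays, n_pairs))

    X = np.hstack([
        block(dose_a, 1.0), block(dose_b, 1.0),                   # matching probes
        block(dose_a, mm_response), block(dose_b, mm_response),   # mismatch probes
    ])
    is_pm = np.arange(4 * n_pairs) < 2 * n_pairs   # True for matching probes
    return X, genotype, is_pm


def first_principal_component(X):
    """Return scores, loadings and explained variance fraction of the first PC."""
    Xc = X - X.mean(axis=0)                        # centre each probe (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, 0] * s[0]                        # one value per array
    loadings = Vt[0]                               # one weight per probe
    explained = s[0] ** 2 / np.sum(s ** 2)
    return scores, loadings, explained


if __name__ == "__main__":
    X, genotype, is_pm = simulate_snp()
    scores, loadings, explained = first_principal_component(X)
    print("fraction of variance in first component:", round(explained, 3))
    print("|correlation| of component with genotype:",
          round(abs(np.corrcoef(scores, genotype)[0, 1]), 3))
    print("mean |loading|, matching probes:", np.abs(loadings[is_pm]).mean())
    print("mean |loading|, mismatch probes:", np.abs(loadings[~is_pm]).mean())
    # Histogram of the genotyping component across arrays.
    counts, edges = np.histogram(scores, bins=15)
    print("histogram counts:", counts)
```

In this toy model the first component tracks the simulated genotype, and the matching probes carry larger loadings than the mismatch probes simply because the simulation gives them a stronger response. With real array data, the loadings and the histogram of scores would have to be inspected per SNP and per array before drawing any such conclusion.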