Complex SNP-related sequence variation in segmental genome duplications

Stefan White, David Fredman*, Susanna Potter*, Evan Eichler#, Johan den Dunnen, Anthony Brookes*

Human and Clinical Genetics, Leiden University Medical Center, Leiden; *Center for Genomics and Bioinformatics, Karolinska Institute, Stockholm, Sweden; #Department of Genetics, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, Ohio 44106, USA.

There is uncertainty about the true nature of predicted SNPs in segmental duplications (duplicons) and about whether these markers genuinely occur at the increased density indicated in public databases. We explored these issues by genotyping 157 predicted SNPs, located in duplicons and in control regions, in normal diploid genomes and in fully homozygous complete hydatidiform moles. Our data identified many true SNPs in duplicon regions and few paralogous sequence variants. Twenty-eight percent of the polymorphic duplicon sequences we tested involved multisite variation, a new type of polymorphism representing the sum of the signals from many individual duplicon copies that vary in sequence content because of duplication, deletion or gene conversion. When genotyped, multisite variations can masquerade as normal SNPs. Given that duplicons comprise at least 5% of the genome and that many are yet to be annotated in the draft genome sequence, effective strategies to identify multisite variation must be established and deployed.
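The idea that a genotyping assay reports the summed signal of every duplicon copy it detects, and that fully homozygous moles help separate the resulting patterns, can be illustrated with a toy model. The sketch below is a simplified illustration, not the authors' method: it assumes every detected copy contributes an equal signal, and the function names, the "A"/"B" allele labels and the 0.05 tolerance are hypothetical. In a homozygous mole a single-locus SNP should give only homozygous-looking calls, a fixed paralogous sequence variant (PSV) should give the same intermediate signal in every mole, and a multisite variation (MSV) should give intermediate signals that differ between moles.

```python
from collections import Counter


def allele_b_fraction(copies):
    """Fraction of the summed assay signal contributed by allele 'B'.

    `copies` lists the allele ('A' or 'B') carried at the assayed site by
    every copy the assay detects (all paralogs, both haplotypes).
    Assumption: each detected copy contributes an equal signal.
    """
    if not copies:
        raise ValueError("at least one detected copy is required")
    return Counter(copies)["B"] / len(copies)


def classify_in_moles(fractions, tol=0.05):
    """Hypothetical rule for signals measured in fully homozygous moles."""
    # Only near-0 or near-1 signals: behaves like a single-locus SNP.
    if all(f < tol or f > 1 - tol for f in fractions):
        return "SNP-like"
    intermediate = [f for f in fractions if tol <= f <= 1 - tol]
    # The same intermediate signal in every mole: a fixed paralog difference.
    if len(intermediate) == len(fractions) and max(fractions) - min(fractions) < tol:
        return "PSV-like"
    # Intermediate signals that vary between moles: copies differ in content.
    return "MSV-like"


# Toy moles for a two-paralog duplicon; each paralog is present twice
# because a mole genome is homozygous.
mole_1 = allele_b_fraction(["A", "A", "B", "B"])            # 0.5
mole_2 = allele_b_fraction(["A", "A", "A", "A"])            # 0.0, 'B' copy converted or lost
mole_3 = allele_b_fraction(["A", "A", "B", "B", "B", "B"])  # extra 'B' copies
print(classify_in_moles([mole_1, mole_2, mole_3]))
```

The same toy model shows why an MSV can look like a normal SNP in diploid samples: signals near 0, 0.5 and 1 are indistinguishable from the three genotype classes of a single-locus SNP unless homozygous material or copy-aware assays are used.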