Leiden Muscular Dystrophy pages©

Gene identity and function ?

(last modified April, 2008)

Introduction

You are performing a genome-wide research project using arrays, e.g.

a gene expression profiling study
a project to determine differential methylation using CpG-islands
a linkage or association study mapping a disease to a candidate gene region
a gene expression study following differentiating cells
a study searching for copy number variation (CNV) in relation to disease (e.g. patients with mental retardation)

Your data clearly point to a specific genomic region, or even a specific gene (gene-X), as potentially involved. However, gene-X has no assigned gene-ID (see HGNC-site) and little is known about it. To prioritise gene-X (and other genes in the region) for follow-up studies, we urgently need a summary of the database knowledge collected on gene-X.

In addition, we need to design primer pairs to confirm that the gene is transcribed, to verify the predicted transcript(s), and to perform quantitative RT-PCR measurements (where, i.e. in which tissues, when, and at what level is the gene transcribed).

Note

Besides the tools mentioned below, the Bioinformatics Tools section from the Center for Human and Clinical Genetics might contain helpful hints to guide your analysis.

Task

Gene structure

Go to a genome browser (UCSC, Ensembl, NCBI-Map Viewer), locate the gene and use the links provided to generate the data below.

RNA

what transcripts are annotated for this gene ?
using BLAST searches (see task Mutation analysis - gene structure), verify these transcripts and check whether other transcripts potentially derive from the same gene / genomic region. When no transcripts are annotated, assemble the transcripts yourself, again using BLAST searches.
what is the structure of the gene (how many exons / introns, their sizes, etc.) ?
using the transcript sequence you determined, design primers for (q)RT-PCR analysis of the gene's expression (for the design, use the Roche UPL method)
based on the BLAST hits (esp. dbEST) and other sources (e.g. SOURCE, Gene Expression Omnibus (GEO), ArrayExpress), in which tissues and at which stages of development is the gene expressed ?
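
The gene structure question above (number of exons / introns and their sizes) comes down to simple arithmetic once the exon coordinates have been read from the genome browser. The sketch below is only an illustration: the function name gene_structure and the exon coordinates are hypothetical, and it assumes 1-based inclusive coordinates for a single transcript on one chromosome.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end), assumed 1-based and inclusive


def gene_structure(exons: List[Exon]) -> Dict[str, object]:
    """Summarise exon and intron sizes for one transcript."""
    ordered = sorted(exons)
    exon_sizes = [end - start + 1 for start, end in ordered]
    # An intron covers the bases between the end of one exon
    # and the start of the next exon.
    intron_sizes = [nxt[0] - cur[1] - 1 for cur, nxt in zip(ordered, ordered[1:])]
    return {
        "n_exons": len(ordered),
        "n_introns": len(intron_sizes),
        "exon_sizes": exon_sizes,
        "intron_sizes": intron_sizes,
        "transcript_length": sum(exon_sizes),
        "genomic_span": ordered[-1][1] - ordered[0][0] + 1,
    }


# Hypothetical exon coordinates, for illustration only
example_exons = [(1000, 1199), (5000, 5149), (9000, 9499)]
summary = gene_structure(example_exons)
for key, value in summary.items():
    print(key, value)
```

The transcript length reported here is the sum of the exon sizes, which is the sequence length to expect when designing (q)RT-PCR primers on the spliced transcript.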

Protein

predict the encoded protein.
NOTE: two tools can be helpful for this: a) the ORF-finder (NCBI), which looks for large open reading frames, and b) translated BLAST (BlastX or tBlastX), which translates a DNA sequence into protein and uses this protein to search against (i) protein databases (BlastX) or (ii) translated DNA sequence databases (tBlastX). Any clear similarity you pick up makes it very likely that the corresponding part of the DNA sequence truly encodes part of the protein.
analyse the predicted protein sequence and use it to predict the function of the protein (see task Pathogenic or not ?).
Based on these predictions, what would be a proper name and Gene-ID for gene-X ? To give gene-X a proper name, submit your suggestion to the HUGO Gene Nomenclature Committee (HGNC).
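
To make the idea behind looking for large open reading frames concrete, the sketch below scans all six reading frames of a DNA sequence for the longest stretch that runs from an ATG to the first in-frame stop codon. It is not the algorithm of the NCBI ORF-finder. It assumes the standard stop codons (TAA, TAG, TGA), accepts only ATG as start codon, and the function names and the test sequence are made up for illustration.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code assumed
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_orf(dna: str) -> Tuple[str, int, int, int, str]:
    """Return (strand, frame, start, end, sequence) of the longest ORF.

    start and end are 0-based offsets on the scanned strand; end is
    exclusive and the returned sequence includes the stop codon.
    An empty sequence is returned when no complete ORF is found.
    """
    dna = dna.upper()
    best = ("+", 0, 0, 0, "")
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start > len(best[4]):
                        best = (strand, frame, start, i + 3, seq[start:i + 3])
                    start = None  # look for the next ORF in this frame
    return best


# Made-up sequence, for illustration only
result = longest_orf("CCATGAAATTTGGGTAACC")
print(result)
```

A long ORF on its own is only a hint; the translated BLAST similarity described in the note above remains the stronger evidence that a region is truly coding.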

Disease relations ?

when this gene is deleted, duplicated or otherwise mutated, does this have known (pathogenic) consequences ?
NOTE: check the genome browser (UCSC, Ensembl, NCBI-Map Viewer), OMIM (Gene Map and Morbid Map), the Database of Genomic Variants or other sites.
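
Once regions with reported deletions or duplications have been collected from these sites, checking whether gene-X lies within any of them is an interval comparison. The sketch below assumes that all coordinates refer to the same genome build and are 1-based and inclusive; the Region type, the overlapping function, the region names and all coordinates are hypothetical.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def overlapping(gene: Region, variants: List[Region]) -> List[Region]:
    """Return the variant regions that share at least one base with the gene."""
    return [
        v for v in variants
        if v.chrom == gene.chrom and v.start <= gene.end and v.end >= gene.start
    ]


# Hypothetical gene and variant regions, for illustration only
gene_x = Region("gene-X", "chr1", 1_000_000, 1_050_000)
reported = [
    Region("deletion-A", "chr1", 900_000, 1_010_000),      # partial overlap
    Region("duplication-B", "chr1", 1_200_000, 1_300_000),  # no overlap
    Region("deletion-C", "chr2", 1_000_000, 1_050_000),     # other chromosome
]
hits = overlapping(gene_x, reported)
print([h.name for h in hits])
```

An overlap only shows that a variant covers (part of) the gene; whether it is pathogenic still has to be judged from the source database entry.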

Result

What is your overall conclusion ?