Leiden Muscular Dystrophy pages©
Gene identity and function ?
(last modified April, 2008)
Introduction
You are performing a genome-wide research project using arrays, e.g.
- a gene expression profiling study
- a project to determine differential methylation using CpG-islands
- a linkage or association study mapping a disease to a candidate gene region
- a gene expression study following differentiating cells
- a study searching for copy number variation (CNV) in relation to disease (e.g. patients with mental retardation)
Your data clearly point to a specific genomic region, or even to a specific gene-X, as potentially involved. However, gene-X has no assigned gene-ID (see HGNC-site) and little is known about this gene. To give any priority to gene-X (and other genes in the region) for follow-up studies, we urgently need a summary of the collected database knowledge of gene-X.
In addition, we need to design primer pairs to confirm that the gene is transcribed, to verify the predicted transcript(s) and to perform quantitative RT-PCR measurements (where, i.e. in which tissues, when and at what level the gene is transcribed).
Note
Besides the tools mentioned below, the Bioinformatics Tools section from the Center for Human and Clinical Genetics might contain helpful hints to guide your analysis.
Task
Gene structure
Go to a genome browser (UCSC, Ensembl, NCBI-Map Viewer), locate the gene and use the links provided to generate the data below.
RNA
- what transcripts are annotated for this gene ?
- using BLAST searches (see task Mutation analysis - gene structure), verify these transcripts and check whether other transcripts potentially derive from the same gene / genomic region. When no transcripts are annotated, assemble the transcripts yourself (use BLAST searches; see task Mutation analysis - gene structure).
- what is the structure of the gene (how many exons / introns, their sizes, etc.) ?
- using the transcript sequence you determined, design primers for (q)RT-PCR analysis of the gene's expression (for the design use the Roche UPL method)
- based on the BLAST hits (esp. dbEST) and other sources (e.g. SOURCE, Gene Expression Omnibus (GEO), ArrayExpress), in which tissues and at which stages of development is the gene expressed ?
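The gene structure question above (number of exons / introns and their sizes) can be answered with a few lines of code once the exon coordinates have been copied from the genome browser. The following is a minimal sketch, assuming 1-based, inclusive exon coordinates on a single chromosome; the function name `gene_structure` and the example coordinates are hypothetical and do not describe a real gene.

```python
from typing import Dict, List, Tuple


def gene_structure(exons: List[Tuple[int, int]]) -> Dict[str, object]:
    """Summarise exon and intron sizes from 1-based, inclusive exon coordinates."""
    if not exons:
        raise ValueError("at least one exon is required")
    ordered = sorted(exons)
    exon_sizes = [end - start + 1 for start, end in ordered]
    # An intron covers the bases between the end of one exon
    # and the start of the next exon.
    intron_sizes = [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    return {
        "n_exons": len(ordered),
        "n_introns": len(intron_sizes),
        "exon_sizes": exon_sizes,
        "intron_sizes": intron_sizes,
        "transcript_length": sum(exon_sizes),
    }


# Hypothetical coordinates, for illustration only.
example_exons = [(1000, 1199), (5000, 5149), (9000, 9499)]
summary = gene_structure(example_exons)
print(summary)
```

The transcript length reported here is the sum of the exon sizes, which is the sequence length to expect when checking a predicted transcript by RT-PCR.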
Protein
- predict the encoded protein.
NOTE: two tools can be helpful for this: a) the ORF-finder (NCBI), which looks for large open reading frames, and b) translated BLAST (BlastX or tBlastX), which translates a DNA sequence into protein and uses this protein to search against (i) protein databases (BlastX) and (ii) translated DNA sequence databases (tBlastX). Any clear similarity you pick up makes it very likely that the respective part of the DNA sequence encodes a true part of the protein.
- analyse the predicted protein sequence and use it to predict the function of the protein (see task Pathogenic or not ?).
- based on these predictions, what would be a proper name and Gene-ID for gene-X ? To give gene-X a proper name, submit your suggestion to the HUGO Gene Nomenclature Committee (HGNC).
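The search for large open reading frames mentioned in the NOTE above can be illustrated with a short script. This is a simplified sketch and not the algorithm used by the NCBI ORF-finder: it assumes that an ORF runs from the first ATG in a frame to the next in-frame stop codon, it uses the standard stop codons only, and it scans all six reading frames. The function names and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orfs_in_strand(seq: str, min_codons: int = 1):
    """Yield (start, end) of ATG...stop ORFs on one strand.

    Positions are 0-based and the end is exclusive (it includes the stop codon).
    min_codons is the minimum number of codons before the stop codon.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None


def longest_orf(dna: str):
    """Return (strand, start, end, sequence) of the longest ORF, or None."""
    dna = dna.upper()
    best = None
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for start, end in orfs_in_strand(seq):
            if best is None or end - start > best[2] - best[1]:
                best = (strand, start, end, seq[start:end])
    return best


# Toy sequence, for illustration only.
toy = "CCATGAAATTTGGGTAACC"
print(longest_orf(toy))
```

Coordinates on the "-" strand refer to the reverse-complemented sequence. A long ORF on its own is only a hint; similarity found with BlastX or tBlastX remains the stronger evidence that the region is coding.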
Disease relations ?
- when this gene is deleted, duplicated or otherwise mutated, does this have known (pathogenic) consequences ?
NOTE: check the genome browser (UCSC, Ensembl, NCBI-Map Viewer), OMIM (Gene Map and Morbid Map), the Database of Genomic Variants or other sites.
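When a list of reported deletions or duplications has been exported from one of these sites, checking which of them cover gene-X is a simple interval comparison. The sketch below assumes 1-based, inclusive coordinates on the same genome build for the gene and the variants; the `Region` type, the function `overlapping` and all coordinates are hypothetical and serve as an illustration only.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    label: str


def overlapping(gene: Region, variants: List[Region]) -> List[Tuple[str, int, float]]:
    """Return (label, overlap in bp, fraction of gene covered) per overlapping variant."""
    gene_length = gene.end - gene.start + 1
    hits = []
    for variant in variants:
        if variant.chrom != gene.chrom:
            continue
        overlap = min(gene.end, variant.end) - max(gene.start, variant.start) + 1
        if overlap > 0:
            hits.append((variant.label, overlap, overlap / gene_length))
    return hits


# Hypothetical regions, for illustration only.
gene_x = Region("chr1", 1000, 2000, "gene-X")
reported = [
    Region("chr1", 500, 1500, "cnv_A"),   # partial overlap
    Region("chr1", 3000, 4000, "cnv_B"),  # same chromosome, no overlap
    Region("chr2", 1000, 2000, "cnv_C"),  # different chromosome
    Region("chr1", 900, 2100, "cnv_D"),   # covers the whole gene
]
hits = overlapping(gene_x, reported)
print(hits)
```

Whether an overlapping variant is pathogenic cannot be read from the coordinates; that still requires the annotation in the source database.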
Result
What is your overall conclusion ?