Dept. Human and Clinical Genetics - LUMC, “RESEARCH and the INTERNET”©

Tasks: working with DNA sequences

(J.T. den Dunnen)





General remarks


The tasks are in a direct logical order. Please browse through the tasks provided and select the one that fits your interest best, i.e. the one that extends your existing knowledge. For most tasks it is recommended to start from a link in the Course Bookmarks, although trying to locate the most appropriate start site through a general search engine (e.g. AltaVista, Excite, Lycos or another) is an instructive and worthwhile exercise. A specific sequence has been chosen for each task, but please feel free to use your favourite sequence instead. A start can also be made through the links provided in the "DNA analysis" page.


Sequence retrieval


1.  Retrieve a DNA sequence from GenBank
Go to the GenBank - Entrez server, select <Search Nucleotide> sequences, and search for the human dystrophin mRNA (cDNA) sequence (mutations in dystrophin cause Duchenne and Becker muscular dystrophy).

  1. save the sequence (GenBank report) on disk as dystrophin.gb (at the bottom of the Nucleotide QUERY page select <PC> and <Text>)
  2. also save the sequence in FASTA format (on disk as dystrophin.fas)
  3. suppose the NCBI computer cannot be accessed; try to retrieve the sequence from one of the other sequence databases, e.g. EMBL (Europe) or DDBJ (Japan)

NOTE: depending on the formulation of your query, large numbers of "hits" may appear. Play with your query by making it more specific, by restricting it to specific fields (<Search Field> drop-down menu) or by using the <Add Term(s) to Query> field on the results page. Alternatively, go to the LocusLink (NCBI) site and check whether dystrophin has been catalogued yet; if so, you get a direct link to a curated reference sequence from the RefSeq database (NCBI).
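
As an illustration of the FASTA format asked for in step 2, here is a minimal Python sketch that writes and reads a single FASTA record. The function names, the line width of 70 and the toy header are assumptions made for this example; they do not come from GenBank or any particular tool.

```python
def to_fasta(header, sequence, width=70):
    """Format one record as FASTA: a '>' header line, then wrapped sequence lines."""
    # Remove any whitespace and line breaks, and normalise to upper case.
    cleaned = "".join(sequence.split()).upper()
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


def read_fasta(text):
    """Return (header, sequence) for the first record found in FASTA text."""
    header = None
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; stop after the first
            header = line[1:]
        elif header is not None:
            parts.append(line)
    return header, "".join(parts)


# Hypothetical toy record, not the real dystrophin sequence.
record = to_fasta("example_id toy sequence", "acgt acgt\nacgt", width=5)
```

Writing `record` to a file named dystrophin.fas and reading it back with `read_fasta` should return the same header and the sequence without line breaks.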

2.  Other possibilities for retrieval from GenBank
Go to the GenBank - Entrez server and search for;

3.  Information in GenBank

4.  Sequences from other sources
Usually, general search engines cannot be used to find and retrieve DNA sequences. However, in exceptional cases, general searches may hit sites of researchers working on specific subjects, which provide more detailed descriptions and/or even unpublished sequences.


Structural analysis


I. General DNA analysis

NOTE: the Atelier BioInformatique (aBi) or BCM (section "Sequence Utilities") sites are a good starting point for the DNA-analysis tools required.

1.  Look for restriction sites
Take the dystrophin sequence retrieved and determine whether EcoRI, NotI and SfiI sites are present.
NOTE: network software for restriction mapping can be found, for example, at the aBi-site, under <Nucleic acids sequences>, <Map (restriction)>.
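
To check the output of a web tool by hand, the search can be sketched in a few lines of Python. The recognition sequences used are GAATTC (EcoRI), GCGGCCGC (NotI) and GGCCNNNNNGGCC (SfiI, where N is any base). The function name and the test sequence are assumptions for this example; only recognition sites are reported, not the exact cut positions.

```python
import re

# Recognition sequences; N stands for any base.
SITES = {
    "EcoRI": "GAATTC",
    "NotI": "GCGGCCGC",
    "SfiI": "GGCCNNNNNGGCC",
}


def find_sites(sequence, sites=SITES):
    """Return, per enzyme, the 1-based start positions of each recognition site."""
    seq = "".join(sequence.split()).upper()
    hits = {}
    for name, pattern in sites.items():
        # A lookahead is used so that overlapping sites are also reported.
        regex = re.compile("(?=(" + pattern.replace("N", "[ACGT]") + "))")
        hits[name] = [m.start() + 1 for m in regex.finditer(seq)]
    return hits


# Hypothetical toy sequence containing one site for each enzyme.
toy = "AAGAATTCTTGCGGCCGCAAGGCCATCGAGGCCTT"
found = find_sites(toy)
```

All three recognition sequences are palindromic, so scanning one strand is sufficient here; for a non-palindromic site the reverse complement would have to be searched as well.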

2.  Calculate a primer pair for PCR
Take the dystrophin sequence retrieved and try to design a primer pair for the analysis of RNA samples, i.e. to determine whether the gene is transcribed in specific tissues.

NOTE: use e.g. the Primer3 package (MIT). Other network software for primer selection can be found, for example, at the aBi-site, under <Nucleic acids sequences>, <PCR primer selection>.
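
Whatever tool proposes the primers, a rough hand check is useful. The Python sketch below computes the GC fraction, an approximate melting temperature using the Wallace rule (2 °C per A/T and 4 °C per G/C, a rule of thumb for short oligonucleotides only) and the expected product length on a template. This is not how Primer3 calculates its values; the function names and the toy template are assumptions for this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse and complement an upper- or lower-case DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Approximate Tm in degrees Celsius: 2 per A/T plus 4 per G/C."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def amplicon_length(template, forward, reverse):
    """Length of the product, or None if either primer does not match exactly."""
    t = template.upper()
    start = t.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the top strand as its reverse complement.
    target = reverse_complement(reverse)
    end = t.find(target, start + 1)
    if end == -1:
        return None
    return end + len(target) - start


# Hypothetical toy template and primers (far shorter than real primers).
template = "TTACGTGGATCCAAAGGGTTTCCCAA"
length = amplicon_length(template, "ACGTGG", "GAAACC")
```

For the analysis of RNA samples it is common practice to place the two primers in different exons, so that a product amplified from contaminating genomic DNA can be recognised by its larger size.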

3.  Look for open reading frames
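
As an indication of what an open reading frame search does, the following Python sketch scans the three forward frames for stretches that start with ATG and end at the first in-frame stop codon (TAA, TAG or TGA). The function name, the minimum length and the toy sequence are assumptions for this example; the reverse strand can be covered by running the same function on the reverse complement.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=3):
    """Return (start, end, frame) tuples, 1-based and inclusive of the stop codon."""
    seq = "".join(sequence.split()).upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens a candidate ORF
            elif start is not None and codon in STOPS:
                # Count the codons before the stop, including the ATG.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start + 1, i + 3, frame + 1))
                start = None
    return orfs


# Hypothetical toy sequence: ATG AAA TTT TAG in the third frame.
orfs = find_orfs("CCATGAAATTTTAGCC")
```

An ORF that runs off the end of the sequence without a stop codon is not reported by this sketch.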

4.  Turn around a sequence
Turn around, i.e. reverse and complement, the dystrophin sequence retrieved.
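
The operation itself is simple enough to sketch in Python, which is handy for checking the result of a web tool. The function name is an assumption for this example; only the bases A, C, G, T and N are handled, and case is preserved.

```python
# Translation table mapping each base to its complement; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Complement every base, then reverse the order of the sequence."""
    cleaned = "".join(sequence.split())  # drop spaces and line breaks
    return cleaned.translate(COMPLEMENT)[::-1]


result = reverse_complement("ATGC")
```

A palindromic restriction site such as GAATTC is a quick sanity check: its reverse complement is the same string.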


II. Homology searches


1.  Homologies in other organisms
Take the dystrophin sequence retrieved (dystrophin.gb) and select the 3' untranslated region. Perform a Blast-search against the non-redundant database.

2.  From EST-homologies to a consensus cDNA-sequence
Take the dystrophin sequence retrieved (dystrophin.gb) and select from the 3' untranslated region about 400 bp immediately upstream from the polyA-addition site. Perform a Blast-search against dbEST using the EST-extractor at TIGEM.
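
Selecting these regions comes down to slicing the mRNA sequence at the right coordinates. In the Python sketch below, the end of the coding region is assumed to be read from the CDS feature of the GenBank report and the position of the polyA-addition site is assumed to be known; both are passed in as 1-based positions. The function names and the toy sequence are assumptions for this example.

```python
def three_prime_utr(mrna, cds_end):
    """Return everything after cds_end, the 1-based last base of the stop codon."""
    return mrna[cds_end:]


def upstream_of_polya(mrna, polya_site, length=400):
    """Return up to `length` bases ending at polya_site (1-based, inclusive)."""
    start = max(0, polya_site - length)  # do not run past the 5' end
    return mrna[start:polya_site]


# Hypothetical toy mRNA: 10 bases of "coding region", then 20 bases of "3' UTR".
toy_mrna = "A" * 10 + "C" * 20
utr = three_prime_utr(toy_mrna, cds_end=10)
segment = upstream_of_polya(toy_mrna, polya_site=30, length=5)
```

The resulting segment can then be pasted into the BLAST form as the query sequence.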


III. From sequence to gene

NOTE: difficult task

  1. use task_sequence_1 to perform a BLAST search (against the non-redundant DNA database)
  2. verify whether task_seq2.fas contains repetitive DNA sequences
  3. use gene/exon prediction tools (at least three different packages) to calculate the presence of gene(s) / exon(s) in this DNA segment
  4. does the region contain potential promoters?
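
For step 2, interspersed repeats (e.g. Alu elements) need a dedicated tool such as RepeatMasker and a repeat library. As a much smaller illustration, the Python sketch below only detects simple tandem repeats, i.e. a short unit repeated several times in a row. The function name, the thresholds and the toy sequence are assumptions for this example.

```python
import re


def simple_repeats(sequence, min_copies=4, max_unit=6):
    """Return (start, unit, copies) for each tandem repeat; start is 1-based."""
    seq = "".join(sequence.split()).upper()
    # A unit of 1..max_unit bases, followed by at least min_copies - 1 more copies.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    repeats = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        repeats.append((m.start() + 1, unit, len(m.group(0)) // len(unit)))
    return repeats


# Hypothetical toy sequence containing a (CA)5 repeat.
repeats = simple_repeats("TTCACACACACAGG")
```

Regions reported this way are often masked before a database search, because they tend to produce many uninformative hits.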


IV. From EST hits to a potential gene

NOTE: difficult task

  1. use the seq2_clean.fas sequence (or task_seq2.fas, see above) to perform BLAST database searches against dbEST, the non-redundant database and the HTGS section
  2. take the best hit from dbEST and use this sequence to repeat the dbEST BLAST search using the EST Extractor (at TIGEM)


The human genome sequence


The human genome is currently being sequenced at an incredible rate. The current strategy is to determine a first draft sequence (finished spring 2000) and then focus on completing it (finished early 2003). As a result, sequences currently go to the high-throughput genomic sequences (HTGS) section of the database and not to the non-redundant (NR) section. This has several consequences;


Sequence submission


1.  A tool to prepare a sequence for submission

  1. find, download and install the SEQUIN software package (for sequence submission and simple sequence analysis)
  2. find and retrieve, in <.ASN1 format>, the human dystrophin mRNA sequence (see above)
    NOTE: SEQUIN loads all details when the .ASN1 format is used.
  3. open the human dystrophin sequence in SEQUIN and try tasks 1 and 3 of the section "I. General DNA analysis"

2.  Update a sequence directly through the WWW

Go to the sites of either BankIt (NCBI) or Submit (EBI) and look at the possibilities of updating/submitting sequences directly using the Internet.

