Human and Clinical Genetics - LUMC, "Research, the internet and mining data"©
Tasks: working with protein sequences
(Johan den Dunnen / Peter Taschner, last modification by Peter Taschner,
March 28, 2007)
Content
- General remarks
- Sequence retrieval
- Homology searches
- Structural analysis
- Basic: retrieve and analyse protein sequence
General remarks
The tasks are in a direct logical order. Please browse through the tasks provided and
select the set that best fits your interest, i.e. extends your existing knowledge. For
most tasks it is recommended to start with the Course Bookmarks,
although an attempt to locate the most appropriate start site through a general search
engine (e.g. AltaVista, Lycos or another) is an instructive and worthwhile exercise. For each
task a specific sequence has been chosen, but please feel free to use your favourite sequence. A
descriptive start can also be made through the links provided in the "DNA analysis" page.
Sequence retrieval
- search the WWW with the sequence:
- "MLWWEEVEDCY", what is it ?
- "GlnGlnGlnGlnGlnGlnGln" or "QQQQQQQQQQQQ"
- databases in GenBank
- which independent protein databases does GenBank keep ?
- what is the difference between the "Swiss Prot" and the
"non-redundant" protein databases ?
- find a protein sequence in GenBank using Entrez
- is it possible to use Entrez to find a sequence containing "MLWWEEVEDCY" ?
- was any sequence submitted with your contribution ?
- did J.T. den Dunnen submit any sequence ?
- find the (human) utrophin protein sequence
- retrieve and save it (in GenBank format) for later use
- which format do you need ?
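The first tasks above write the same peptide in three-letter and in single-letter code and ask whether a short sequence can be found inside a longer one. A minimal Python sketch of both steps follows. The function names (three_to_one, find_motif) are invented for this illustration and are not part of Entrez or any other tool; only the 20 standard amino acids are assumed.

```python
# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def three_to_one(seq3):
    """Translate a run of three-letter codes (e.g. 'GlnGlnGln') to one-letter code."""
    if len(seq3) % 3 != 0:
        raise ValueError("length must be a multiple of 3")
    # Normalise each triplet to 'Xxx' capitalisation before the lookup.
    return "".join(
        THREE_TO_ONE[seq3[i:i + 3].capitalize()] for i in range(0, len(seq3), 3)
    )


def find_motif(sequence, motif):
    """Return the 1-based start positions of all (possibly overlapping) exact matches."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = sequence.find(motif, start + 1)
    return positions


if __name__ == "__main__":
    print(three_to_one("GlnGlnGlnGlnGlnGlnGln"))
    # Made-up test sequence with the task peptide embedded in it.
    print(find_motif("AAMLWWEEVEDCYKK", "MLWWEEVEDCY"))
```

An exact string match like this only works on a sequence you have already saved locally; it is not a substitute for a homology search against a database.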
Homology searches
Note: use the BCM or aBi site as a good
WWW starting point. Also find, download and install the SEQUIN
software package (for sequence submission and sequence analysis).
- take the human dystrophin protein sequence and change its format into FastA
- which input format do you need ?
- is your name found in a protein ?
- which letters does the "single-letter amino acid code"
contain ?
- use the human dystrophin sequence to perform a BLAST search
- try Swiss Prot, the non-redundant DNA database and translations of the DNA sections
(e.g. HTGS and dbEST); what are the differences ?
- retrieve as many homologous sequences as possible
- how many dystrophin sequences are known ?
- perform a multiple alignment with at least 5 homologous sequences retrieved
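Two of the tasks above can be tried out locally: writing a sequence in FastA format and checking whether a name could occur in a protein at all. The Python sketch below assumes the usual FastA layout (a ">" header line followed by the residues on wrapped lines) and restricts the single-letter code to the 20 standard amino acids; ambiguity codes such as B, X and Z are deliberately left out. The function names are hypothetical.

```python
# The 20 standard amino acids in single-letter code (assumption: no
# ambiguity or rare-residue codes such as B, J, O, U, X, Z).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def to_fasta(identifier, sequence, width=60):
    """Format a sequence as one FastA record with lines of at most `width` residues."""
    # Drop spaces and line breaks, and use upper case throughout.
    cleaned = "".join(sequence.split()).upper()
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return "\n".join([">" + identifier] + lines) + "\n"


def letters_not_in_code(name):
    """Return the letters of a name that are not standard amino acid codes."""
    return sorted(
        {ch for ch in name.upper() if ch.isalpha() and ch not in AMINO_ACIDS}
    )


if __name__ == "__main__":
    print(to_fasta("demo_peptide", "mlww eevedcy", width=5), end="")
    for name in ("Peter", "Johan"):
        missing = letters_not_in_code(name)
        print(name, "could occur in a protein" if not missing else f"cannot: {missing}")
```

A name that passes this check can then be used as a query in a database search to see whether it really occurs in a known protein.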
Structural analysis
Note: use the BCM or aBi site as a good
WWW starting point. Also find, download and install the SEQUIN
software package (for sequence submission and sequence analysis).
- use task_sequence_2 to perform a BLAST search
- try Swiss Prot, the non-redundant DNA database and translations of the DNA sections
(e.g. dbEST); what are the differences ?
- which protein domains does the protein in task_sequence_2
contain ?
- do these domains overlap with the segments showing homology to other proteins as derived
from the BLAST search ?
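The last question comes down to comparing two sets of residue ranges: the positions of the predicted domains and the positions of the segments aligned in the BLAST hits. A minimal Python sketch of that comparison follows. All domain names and coordinates are hypothetical and for illustration only; they are not results for task_sequence_2. Ranges are assumed to be 1-based and inclusive, as in most database reports.

```python
def overlap(a, b):
    """Return the shared (start, end) of two inclusive ranges, or None if disjoint."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start <= end else None


def domains_hit(domains, hits):
    """Map each domain name to its overlaps with the aligned BLAST segments."""
    result = {}
    for name, region in domains.items():
        found = [overlap(region, hit) for hit in hits]
        result[name] = [o for o in found if o is not None]
    return result


if __name__ == "__main__":
    # Hypothetical coordinates, not taken from any real analysis.
    domains = {"domain_A": (10, 120), "domain_B": (300, 380)}
    hits = [(100, 250), (400, 450)]
    for name, shared in domains_hit(domains, hits).items():
        print(name, shared if shared else "no overlap with any hit")
```

In practice the domain ranges would be copied from the domain database output and the hit ranges from the query coordinates of each BLAST alignment.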