Links page for Dina Workshop on Bioinformatics
Links for Database search lecture and exercises
Links for Alignment lecture and exercises
Links for QTL lecture and exercises
Links for Phylogeny lecture and exercises
Other resources -- Background material
(material that the workshop organisers have found useful or interesting)
- Primer on Molecular Genetics from the U.S. Department of Energy
- The course Topics in Computational Biology at Aarhus University Department of Computer Science
- Many links relevant to Agricultural Biotechnology
- Durbin, R., Eddy, S.R., Krogh, A. & Mitchison, G. (1998): Biological Sequence
Analysis, Cambridge University Press.
- Waterman, M.S. (1995): Introduction to Computational Biology. Maps,
Sequences and Genomes. Chapman and Hall.
- Larry Gonick & Mark Wheelis: The Cartoon Guide to Genetics,
HarperPerennial 1991.
- Setubal & Meidanis: Introduction to computational molecular biology,
PWS 1997.
- Mary Sara McPeek: An introduction to recombination and linkage analysis. In T.
Speed and M.S. Waterman (eds.): Genetic Mapping and DNA Sequencing,
Springer-Verlag 1996, pp. 1-14.
Links from the articles in the Trends Guide to Bioinformatics 1998
Text-based database searching
- National Center for Biotechnology Information (NCBI)
- European Bioinformatics Institute (EBI)
- GenomeNet (Kyoto University and University of Tokyo)
Practical database searching
Computational genefinding
- Computational genefinding bibliographies
- Genefinding datasets
- Single genes
- Annotated contigs
- Some HMM-based genefinders
- Some further genefinders
Multiple-alignment & -sequence searches
- Software and databases used in example analysis
- Other profile and profile HMM software packages
- Web servers for multiple alignment
- Other lists of pointers
- One source for the Linux operating system
Protein classification & functional assignment
Phylogenetic analysis & comparative genomics
Databases of biological information
- Sequence annotations
- Sequence motifs
- Sequence classification
- 3-D fold classifications
- Specific molecules
- Biochemical pathways
- Orthologous genes
- Organism classifications
- Medical resources
Functional genomics
Data sets for Phylogeny exercises
Exercise 1
4 3
seq1      AAG
seq2      AAA
seq3      GGA
seq4      AGA
Exercise 2
5 846
chimpanzeeAAGCTTCACC GGCGCAATTA TCCTCATAAT CGCCCACGGA CTTACATCCT
gibbonxxxxAAGCTTTACA GGTGCAACCG TCCTCATAAT CGCCCACGGA CTAACCTCTT
gorillaexxAAGCTTCACC GGCGCAGTTG TTCTTATAAT TGCCCACGGA CTTACATCAT
homosapienAAGCTTCACC GGCGCAGTCA TTCTCATAAT CGCCCACGGA CTTACATCCT
orangutangAAGCTTCACC GGCGCAACCA CCCTCATGAT TGCCCATGGA CTCACATCCT
CATTATTATT CTGCCTAGCA AACTCAAATT ATGAACGCAC CCACAGTCGC
CCCTGCTATT CTGCCTTGCA AACTCAAACT ACGAACGAAC TCACAGCCGC
CATTATTATT CTGCCTAGCA AACTCAAACT ACGAACGAAC CCACAGCCGC
CATTACTATT CTGCCTAGCA AACTCAAACT ACGAACGCAC TCACAGTCGC
CCCTACTGTT CTGCCTAGCA AACTCAAACT ACGAACGAAC CCACAGCCGC
ATCATAATTC TCTCCCAAGG ACTTCAAACT CTACTCCCAC TAATAGCCTT
ATCATAATCC TATCTCGAGG GCTCCAAGCC TTACTCCCAC TGATAGCCTT
ATCATAATTC TCTCTCAAGG ACTCCAAACC CTACTCCCAC TAATAGCCCT
ATCATAATCC TCTCTCAAGG ACTTCAAACT CTACTCCCAC TAATAGCTTT
ATCATAATCC TCTCTCAAGG CCTTCAAACT CTACTCCCCC TAATAGCCCT
TTGATGACTC CTAGCAAGCC TCGCTAACCT CGCCCTACCC CCTACCATTA
CTGATGACTC GCAGCAAGCC TCGCTAACCT CGCCCTACCC CCCACTATTA
TTGATGACTT CTGGCAAGCC TCGCCAACCT CGCCTTACCC CCCACCATTA
TTGATGACTT CTAGCAAGCC TCGCTAACCT CGCCTTACCC CCCACTATTA
CTGATGACTT CTAGCAAGCC TCACTAACCT TGCCCTACCA CCCACCATCA
ATCTCCTAGG GGAACTCTCC GTGCTAGTAA CCTCATTCTC CTGATCAAAT
ACCTCCTAGG TGAACTCTTC GTACTAATGG CCTCCTTCTC CTGGGCAAAC
ACCTACTAGG AGAGCTCTCC GTACTAGTAA CCACATTCTC CTGATCAAAT
ACCTACTGGG AGAACTCTCT GTGCTAGTAA CCACATTCTC CTGATCAAAT
ACCTTCTAGG AGAACTCTCC GTACTAATAG CCATATTCTC TTGATCTAAC
ACCACTCTCC TACTCACAGG ATTCAACATA CTAATCACAG CCCTGTACTC
ACTACTATTA CACTCACCGG GCTCAACGTA CTAATCACGG CCCTATACTC
ACCACCCTTT TACTTACAGG ATCTAACATA CTAATCACAG CCCTGTACTC
ATCACTCTCC TACTTACAGG ACTCAACATA CTAGTCACAG CCCTATACTC
ATCACCATCC TACTAACAGG ACTCAACATA CTAATCACAA CCCTATACTC
CCTCTACATG TTTACCACAA CACAATGAGG CTCACTCACC CACCACATTA
CCTTTACATA TTTATCATAA CACAACGAGG CACACTTACA CACCACATTA
CCTTTATATA TTTACCACAA CACAATGAGG CCCACTCACA CACCACATCA
CCTCTACATA TTTACCACAA CACAATGAGG CTCACTCACC CACCACATTA
TCTCTATATA TTCACCACAA CACAACGAGG TACACCCACA CACCACATCA
ATAACATAAA GCCCTCATTC ACACGAGAAA ATACTCTCAT ATTTTTACAC
AAAACATAAA ACCCTCACTC ACACGAGAAA ACATATTAAT ACTTATGCAC
CCAACATAAA ACCCTCATTT ACACGAGAAA ACATCCTCAT ATTCATGCAC
ACAACATAAA ACCCTCATTC ACACGAGAAA ACACCCTCAT GTTCATACAC
ACAACATAAA ACCTTCTTTC ACACGCGAAA ATACCCTCAT GCTCATACAC
CTATCCCCCA TCCTCCTTCT ATCCCTCAAT CCTGATATCA TCACTGGATT
CTCTTCCCCC TCCTCCTCCT AACCCTCAAC CCTAACATCA TTACTGGCTT
CTATCCCCCA TCCTCCTCCT ATCCCTCAAC CCCGATATTA TCACCGGGTT
CTATCCCCCA TTCTCCTCCT ATCCCTCAAC CCCGACATCA TTACCGGGTT
CTATCCCCCA TCCTCCTCTT ATCCCTCAAC CCCAGCATCA TCGCTGGGTT
CACCTCCTGT AAATATAGTT TAACCAAAAC ATCAGATTGT GAATCTGACA
TACTCCCTGT AAACATAGTT TAATCAAAAC ATTAGATTGT GAATCTAACA
CACCTCCTGT AAATATAGTT TAACCAAAAC ATCAGATTGT GAATCTGATA
TTCCTCTTGT AAATATAGTT TAACCAAAAC ATCAGATTGT GAATCTGACA
CGCCTACTGT AAATATAGTT TAACCAAAAC ATTAGATTGT GAATCTAATA
ACAGAGGCTC ACGACCCCTT ATTTACCGAG AAAGCTTATA AGAACTGCTA
ATAGAGGCTC GAAACCTCTT GCTTACCGAG AAAGCCCACA AGAACTGCTA
ACAGAGGCTC ACAACCCCTT ATTTACCGAG AAAGCTCGTA AGAGCTGCTA
ACAGAGGCTT ACGACCCCTT ATTTACCGAG AAAGCTCACA AGAACTGCTA
ATAGGGCCCC ACAACCCCTT ATTTACCGAG AAAGCTCACA AGAACTGCTA
ATTCATATCC CCATGCCTGA CAACATGGCT TTCTCAACTT TTAAAGGATA
ACTCACTATC CCATGTATGA CAACATGGCT TTCTCAACTT TTAAAGGATA
ACTCATACCC CCGTGCTTGA CAACATGGCT TTCTCAACTT TTAAAGGATA
ACTCATGCCC CCATGTCTAA CAACATGGCT TTCTCAACTT TTAAAGGATA
ACTCNTCACT CCATGTGTGA CAACATGGCT TTCTCAGCTT TTAAAGGATA
ACAGCCATCC GTTGGTCTTA GGCCCCAAAA ATTTTGGTGC AACTCCAAAT
ACAGCTATCC ATTGGTCTTA GGACCCAAAA ATTTTGGTGC AACTCCAAAT
ACAGCTATCC ATTGGTCTTA GGACCCAAAA ATTTTGGTGC AACTCCAAAT
ACAGCTATCC ATTGGTCTTA GGCCCCAAAA ATTTTGGTGC AACTCCAAAT
ACAGCTATCC CTTGGTCTTA GGATCCAAAA ATTTTGGTGC AACTCCAAAT
AAAAGTAATA ACCATGTATA CTACCATAAC CACCTTAACC CTAACTCCCT
AAAAGTAATA GCAATGTACA CCACCATAGC CATTCTAACG CTAACCTCCC
AAAAGTAATA ACTATGTACG CTACCATAAC CACCTTAGCC CTAACTTCCT
AAAAGTAATA ACCATGCACA CTACTATAAC CACCCTAACC CTGACTTCCC
AAAAGTAACA GCCATGTTTA CCACCATAAC TGCCCTCACC TTAACTTCCC
TAATTCTCCC CATCCTCACC ACCCTCATTA ACCCTAACAA AAAAAACTCA
TAATTCCCCC CATTACAGCC ACCCTTATTA ACCCCAATAA AAAGAACTTA
TAATTCCCCC TATCCTTACC ACCTTCATCA ATCCTAACAA AAAAAGCTCA
TAATTCCCCC CATCCTTACC ACCCTCGTTA ACCCTAACAA AAAAAACTCA
TAATCCCCCC CATTACCGCT ACCCTCATTA ACCCCAACAA AAAAAACCCA
TATCCCCATT ATGTGAAATC CATTATCGCG TCCACCTTTA TCATTAGCCT
TACCCGCACT ACGTAAAAAT GACCATTGCC TCTACCTTTA TAATCAGCCT
TACCCCCATT ACGTAAAATC TATCGTCGCA TCCACCTTTA TCATCAGCCT
TACCCCCATT ATGTAAAATC CATTGTCGCA TCCACCTTTA TTATCAGTCT
TACCCCCACT ATGTAAAAAC GGCCATCGCA TCCGCCTTTA CTATCAGCCT
TTTCCCCACA ACAATATTCA TATGCCTAGA CCAAGAAGCT ATTATCTCAA
ATTTCCCACA ATAATATTCA TGTGCACAGA CCAAGAAACC ATTATTTCAA
CTTCCCCACA ACAATATTTC TATGCCTAGA CCAAGAAGCT ATTATCTCAA
CTTCCCCACA ACAATATTCA TGTGCCTAGA CCAAGAAGTT ATTATCTCGA
TATCCCAACA ACAATATTTA TCTGCCTAGG ACAAGAAACC ATCGTCACAA
ACTGGCACTG AGCAACAACC CAAACAACCC AGCTCTCCCT AAGCTT
ACTGACACTG AACTGCAACC CAAACGCTAG AACTCTCCCT AAGCTT
GCTGACACTG AGCAACAACC CAAACAATTC AACTCTCCCT AAGCTT
ACTGACACTG AGCCACAACC CAAACAACCC AGCTCTCCCT AAGCTT
ACTGATGCTG AACAACCACC CAGACACTAC AACTCTCACT AAGCTT
Exercise 4
5 149
chimpanzeeSFTGAIILIIAHGLTSSLLFCLANSNYERTHSRIIILSQGLQTLLPLIAF
gibbonxxxxSFTGATVLIIAHGLTSSLLFCLANSNYERTHSRIIILSRGLQALLPLIAF
gorillaexxSFTGAVVLIIAHGLTSSLLFCLANSNYERTHSRIIILSQGLQTLLPLIAL
homosapienSFTGAVILIIAHGLTSSLLFCLANSNYERTHSRIIILSQGLQTLLPLIAF
orangutangSFTGATTLMIAHGLTSSLLFCLANSNYERTHSRIIILSQGLQTLLPLIAL
LLASLANLALPPTINLLGELSVLVTSFSSNTTLLLTGFNILITALYS
LAASLANLALPPTINLLGELFVLMASFSANTTITLTGLNVLITALYS
LLASLANLALPPTINLLGELSVLVTTFSSNTTLLLTGSNILITALYS
LLASLANLALPPTINLLGELSVLVTTFSSNITLLLTGLNILVTALYS
LLASLTNLALPPTINLLGELSVLIAIFSSNITILLTGLNILITTLYS
LYMFTTTQGSLTHHINNIKPSFTRENTLIFLHLSPILLLSLNPDIITGF
LYIFIITQGTLTHHIKNIKPSLTRENILILMHLFPLLLLTLNPNIITGF
LYIFTTTQGPLTHHITNIKPSFTRENILIFMHLSPILLLSLNPDIITGF
LYIFTTTQGSLTHHINNIKPSFTRENTLMFIHLSPILLLSLNPDIITGF
LYIFTTTQGTPTHHINNIKPSFTRENTLMLIHLSPILLLSLNPSIIAGF
TSC
TPC
TSC
SSC
AYC
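The data sets above use the PHYLIP input layout: a header line giving the number of sequences and the number of sites, then one line per sequence whose first ten characters are the name. Any further blocks of sequence follow in the same species order (the interleaved form). As a rough illustration only, the Python sketch below reads that layout into a dictionary. The function name `parse_phylip` and its `name_width` parameter are made up for this example and are not part of PHYLIP. The sketch assumes that names are padded to exactly ten characters and that blank lines can be ignored.

```python
def parse_phylip(text, name_width=10):
    """Read interleaved (or single-block) PHYLIP text into {name: sequence}."""
    # Drop blank lines; blocks are then recognised purely by position.
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    n_seqs, n_sites = (int(token) for token in lines[0].split()[:2])

    names, seqs = [], []
    for index, line in enumerate(lines[1:]):
        if index < n_seqs:
            # First block: fixed-width name followed by sequence data.
            names.append(line[:name_width].strip())
            seqs.append(line[name_width:].replace(" ", ""))
        else:
            # Later blocks: sequence data only, in the same species order.
            seqs[index % n_seqs] += line.replace(" ", "")

    # Check every sequence against the site count announced in the header.
    for name, seq in zip(names, seqs):
        if len(seq) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, got {len(seq)}")
    return dict(zip(names, seqs))


# The Exercise 1 data set, with names padded to ten characters.
exercise1 = """4 3
seq1      AAG
seq2      AAA
seq3      GGA
seq4      AGA
"""
print(parse_phylip(exercise1))
```

The length check is there because a name that is not exactly ten characters wide shifts sequence characters into the name (or the reverse), which is an easy mistake to make when preparing such files by hand.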
Output files for phylogenetic exercises
Exercise 1
DNA parsimony algorithm, version 3.573c
Name          Sequences
----          ---------
seq1          AAG
seq2          ..A
seq3          GGA
seq4          .GA
One most parsimonious tree found:
          +--seq4
       +--3
    +--2  +--seq3
    !  !
  --1  +-----seq2
    !
    +--------seq1
remember: this is an unrooted tree!
requires a total of 3.000
steps in each site:
         0   1   2   3   4   5   6   7   8   9
     *-----------------------------------------
    0!       1   1   1
From    To     Any Steps?    State at upper node
                             ( . means same as in the node below it on tree)
        1                    AAR
1       2      maybe         ..A
2       3      yes           .G.
3       seq4   no            ...
3       seq3   yes           G..
2       seq2   no            ...
1       seq1   maybe         ..G
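The step counts in the Exercise 1 output above (one step at each of the three sites, three in total) can be checked by hand. As a hedged illustration, the Python sketch below counts steps with Fitch's small-parsimony rule on the tree shown. It is not the code used by the program that produced the output. The function name `fitch_steps` and the nested-tuple tree encoding are assumptions made for this example, and the tree is rooted at node 1 only for convenience, since the step count does not depend on where an unrooted tree is rooted.

```python
def fitch_steps(tree, seqs):
    """Return the minimum number of changes per site on a fixed binary tree."""
    n_sites = len(next(iter(seqs.values())))
    per_site = []
    for site in range(n_sites):
        steps = 0

        def state_set(node):
            nonlocal steps
            if isinstance(node, str):
                # Leaf: the observed base at this site.
                return {seqs[node][site]}
            left, right = (state_set(child) for child in node)
            common = left & right
            if common:
                return common
            # No shared state: one change is needed below this node.
            steps += 1
            return left | right

        state_set(tree)
        per_site.append(steps)
    return per_site


seqs = {"seq1": "AAG", "seq2": "AAA", "seq3": "GGA", "seq4": "AGA"}
# Node 1 joins seq1 and node 2; node 2 joins seq2 and node 3;
# node 3 joins seq3 and seq4 (as in the tree drawn above).
tree = ("seq1", ("seq2", ("seq3", "seq4")))

per_site = fitch_steps(tree, seqs)
print(per_site, sum(per_site))  # [1, 1, 1] 3
```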
Exercise 2a
DNA parsimony algorithm, version 3.573c
One most parsimonious tree found:
       +--------gorillaexx
    +--2
    !  !  +-----homosapien
    !  +--3
  --1     !  +--orangutang
    !     +--4
    !        +--gibbonxxxx
    !
    +-----------chimpanzee
remember: this is an unrooted tree!
requires a total of 330.000
Exercise 2b
Neighbor-Joining/UPGMA method version 3.573c
          +-------gibbonxxxx
       +--1
    +--2  +-----orangutang
    !  !
    !  +---gorillaexx
    !
  --3-homosapien
    !
    +--chimpanzee
remember: this is an unrooted tree!
Between   And          Length
-------   ---          ------
3         2            0.00318
2         1            0.03598
1         gibbonxxxx   0.12602
1         orangutang   0.09198
2         gorillaexx   0.05777
3         homosapien   0.04015
3         chimpanzee   0.05195
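A branch-length table like the one above describes the tree completely: the distance between two species along the tree is the sum of the branch lengths on the path joining them. The Python sketch below, which is only an illustration and not part of the exercise programs, adds up those lengths for the Exercise 2b table. The function name `tree_distance` is invented for this example.

```python
from collections import defaultdict

# Branch table from the Exercise 2b output: (node, node or species, length).
branches = [
    ("3", "2", 0.00318),
    ("2", "1", 0.03598),
    ("1", "gibbonxxxx", 0.12602),
    ("1", "orangutang", 0.09198),
    ("2", "gorillaexx", 0.05777),
    ("3", "homosapien", 0.04015),
    ("3", "chimpanzee", 0.05195),
]


def tree_distance(branches, start, goal):
    """Sum the branch lengths on the unique path from start to goal."""
    graph = defaultdict(dict)
    for a, b, length in branches:
        graph[a][b] = length
        graph[b][a] = length

    # Depth-first walk; a tree has no cycles, so remembering the
    # previous node is enough to avoid going backwards.
    stack = [(start, None, 0.0)]
    while stack:
        node, previous, distance = stack.pop()
        if node == goal:
            return distance
        for neighbour, length in graph[node].items():
            if neighbour != previous:
                stack.append((neighbour, node, distance + length))
    raise KeyError(f"no path from {start} to {goal}")


print(f"{tree_distance(branches, 'homosapien', 'chimpanzee'):.5f}")  # 0.09210
print(f"{tree_distance(branches, 'homosapien', 'gibbonxxxx'):.5f}")  # 0.20533
```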
Exercise 2c (dnaml program)
Nucleic acid sequence Maximum Likelihood method, version 3.573c
Empirical Base Frequencies:
A 0.30929
C 0.32750
G 0.10570
T(U) 0.25751
Transition/transversion ratio = 2.000000
(Transition/transversion parameter = 1.653039)
       +-homosapien
    +--2
    !  !  +-----orangutang
    !  +--3
    !     +-------gibbonxxxx
    !
  --1---gorillaexx
    !
    +--chimpanzee
remember: this is an unrooted tree!
Ln Likelihood = -2514.48557
Examined 17 trees
Between   And          Length    Approx. Confidence Limits
-------   ---          ------    ------- ---------- ------
1         2            0.01720   ( 0.00594, 0.02847) **
2         homosapien   0.02875   ( 0.01514, 0.04262) **
2         3            0.05455   ( 0.03466, 0.07469) **
3         orangutang   0.09121   ( 0.06758, 0.11573) **
3         gibbonxxxx   0.13271   ( 0.10424, 0.16187) **
1         gorillaexx   0.06191   ( 0.04344, 0.08064) **
1         chimpanzee   0.05097   ( 0.03402, 0.06806) **
* = significantly positive, P < 0.05
** = significantly positive, P < 0.01
Exercise 2d
(Bootstrapping of parsimony)
Majority-rule and strict consensus tree program, version 3.573c
Species in order:
gorillaexx
homosapien
orangutang
gibbonxxxx
chimpanzee
Sets included in the consensus tree
Set (species in order)   How many times out of 100.00
..**.                    100.00
.***.                     73.67
Sets NOT included in consensus tree:
Set (species in order)   How many times out of 100.00
..***                     20.17
.*..*                      6.17
CONSENSUS TREE:
the numbers at the forks indicate the number
of times the group consisting of the species
which are to the right of that fork occurred
among the trees, out of 100.00 trees
                 +----gibbonxxxx
            +-100.0
       +-73.7    +----orangutang
       !    !
  +-100.0   +---------homosapien
  !    !
  !    +--------------chimpanzee
  !
  +-------------------gorillaexx
remember: this is an unrooted tree!
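In the set listing of the Exercise 2d output, each pattern has one character per species, in the order given under "Species in order", and a `*` marks the species that belong to the group. The short Python sketch below spells the patterns out as species names. It is a reading aid only, and the function name `decode_set` is invented for this example.

```python
species = ["gorillaexx", "homosapien", "orangutang", "gibbonxxxx", "chimpanzee"]

# (pattern, times out of 100.00), copied from the output above.
sets = [
    ("..**.", 100.00),
    (".***.", 73.67),
    ("..***", 20.17),
    (".*..*", 6.17),
]


def decode_set(pattern, species):
    """List the species marked with '*' in a consensus set pattern."""
    marks = pattern.replace(" ", "")
    if len(marks) != len(species):
        raise ValueError("pattern length does not match the species list")
    return [name for mark, name in zip(marks, species) if mark == "*"]


for pattern, support in sets:
    print(f"{support:6.2f}  {', '.join(decode_set(pattern, species))}")
```

Read this way, the group of orangutang and gibbonxxxx occurs 100.00 times out of 100.00, and the group that also contains homosapien occurs 73.67 times, which matches the numbers at the forks of the consensus tree.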
Exercise 3 - ml method
Nucleic acid sequence Maximum Likelihood method, version 3.573c
Empirical Base Frequencies:
A 0.30864
C 0.32781
G 0.10580
T(U) 0.25775
Transition/transversion ratio = 2.000000
(Transition/transversion parameter = 1.650539)
       +-homosapien
    +--2
    !  !  +-----orangutang
    !  +--3
    !     +-------gibbonxxxx
    !
  --1---gorillaexx
    !
    +--chimpanzee
remember: this is an unrooted tree!
Ln Likelihood = -2513.07579
Examined 17 trees
Between   And          Length    Approx. Confidence Limits
-------   ---          ------    ------- ---------- ------
1         2            0.01723   ( 0.00595, 0.02850) **
2         homosapien   0.02879   ( 0.01516, 0.04268) **
2         3            0.05462   ( 0.03470, 0.07478) **
3         orangutang   0.09120   ( 0.06758, 0.11573) **
3         gibbonxxxx   0.13287   ( 0.10436, 0.16207) **
1         gorillaexx   0.06198   ( 0.04350, 0.08074) **
1         chimpanzee   0.05088   ( 0.03411, 0.06809) **
* = significantly positive, P < 0.05
** = significantly positive, P < 0.01
Exercise 4 (parsimony and neighbour joining methods)
Protein parsimony algorithm, version 3.573c
One most parsimonious tree found:
             +--homosapien
       +-----3
       !     +--gorillaexx
    +--2
    !  !     +--orangutang
  --1  +-----4
    !        +--gibbonxxxx
    !
    +-----------chimpanzee
remember: this is an unrooted tree!
requires a total of 57.000
Neighbor-Joining/UPGMA method version 3.573c
Neighbor-joining method
Negative branch lengths allowed
       +--------gibbonxxxx
    +--1
    !  +-----orangutang
    !
  --3-gorillaexx
    !
    !  +-chimpanzee
    +--2
       +-homosapien
remember: this is an unrooted tree!
Between   And          Length
-------   ---          ------
3         1            0.02121
1         gibbonxxxx   0.14023
1         orangutang   0.08913
3         gorillaexx   0.03381
3         2            0.00696
2         chimpanzee   0.02877
2         homosapien   0.03076
Henrik Stryhn
(hes@svs.dk) 2000-04-07