Links page for Dina Workshop on Bioinformatics

Links for Database search lecture and exercises

Links for Alignment lecture and exercises

Links for QTL lecture and exercises

Links for Phylogeny lecture and exercises

Links from the articles in the Trends Guide to Bioinformatics 1998

Text-based database searching
Practical database searching
Computational genefinding
Multiple-alignment & -sequence searches
Protein classification & functional assignment
Phylogenetic analysis & comparative genomics
Databases of biological information
Functional genomics

Other resources -- Background material

(material that the workshop organisers have found useful or interesting)


Data sets for Phylogeny exercises

Exercise 1
4 3
seq1      AAG
seq2      AAA
seq3      GGA
seq4      AGA
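
The data sets in this section are in the PHYLIP input format: a header line giving the number of species and the number of sites, then one line per species whose first ten characters are the name. As a minimal sketch (the function name parse_phylip and the name_width parameter are our own illustration, not part of PHYLIP), such a file could be read and checked against its header in Python roughly like this:

```python
def parse_phylip(text, name_width=10):
    """Parse PHYLIP text (one block, or interleaved blocks) into {name: sequence}."""
    # Blank lines only separate interleaved blocks, so drop them.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_species, n_sites = (int(tok) for tok in lines[0].split())
    names, seqs = [], []
    for i, line in enumerate(lines[1:]):
        if i < n_species:
            # First block: a fixed-width name followed by sequence data.
            names.append(line[:name_width].strip())
            seqs.append(line[name_width:].replace(" ", ""))
        else:
            # Later blocks: sequence data only, in the same species order.
            seqs[i % n_species] += line.replace(" ", "")
    for name, seq in zip(names, seqs):
        if len(seq) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, found {len(seq)}")
    return dict(zip(names, seqs))


EXERCISE_1 = "\n".join([
    "4 3",
    "seq1".ljust(10) + "AAG",
    "seq2".ljust(10) + "AAA",
    "seq3".ljust(10) + "GGA",
    "seq4".ljust(10) + "AGA",
])

alignment = parse_phylip(EXERCISE_1)
print(alignment)
```

The length check is there because a header that disagrees with the data is an easy mistake to make when editing these files by hand.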

Exercise 2
  5 896
chimpanzeeAAGCTTCACC GGCGCAATTA TCCTCATAAT CGCCCACGGA CTTACATCCT
gibbonxxxxAAGCTTTACA GGTGCAACCG TCCTCATAAT CGCCCACGGA CTAACCTCTT
gorillaexxAAGCTTCACC GGCGCAGTTG TTCTTATAAT TGCCCACGGA CTTACATCAT
homosapienAAGCTTCACC GGCGCAGTCA TTCTCATAAT CGCCCACGGA CTTACATCCT
orangutangAAGCTTCACC GGCGCAACCA CCCTCATGAT TGCCCATGGA CTCACATCCT

CATTATTATT CTGCCTAGCA AACTCAAATT ATGAACGCAC CCACAGTCGC
CCCTGCTATT CTGCCTTGCA AACTCAAACT ACGAACGAAC TCACAGCCGC
CATTATTATT CTGCCTAGCA AACTCAAACT ACGAACGAAC CCACAGCCGC
CATTACTATT CTGCCTAGCA AACTCAAACT ACGAACGCAC TCACAGTCGC
CCCTACTGTT CTGCCTAGCA AACTCAAACT ACGAACGAAC CCACAGCCGC

ATCATAATTC TCTCCCAAGG ACTTCAAACT CTACTCCCAC TAATAGCCTT
ATCATAATCC TATCTCGAGG GCTCCAAGCC TTACTCCCAC TGATAGCCTT
ATCATAATTC TCTCTCAAGG ACTCCAAACC CTACTCCCAC TAATAGCCCT
ATCATAATCC TCTCTCAAGG ACTTCAAACT CTACTCCCAC TAATAGCTTT
ATCATAATCC TCTCTCAAGG CCTTCAAACT CTACTCCCCC TAATAGCCCT

TTGATGACTC CTAGCAAGCC TCGCTAACCT CGCCCTACCC CCTACCATTA
CTGATGACTC GCAGCAAGCC TCGCTAACCT CGCCCTACCC CCCACTATTA
TTGATGACTT CTGGCAAGCC TCGCCAACCT CGCCTTACCC CCCACCATTA
TTGATGACTT CTAGCAAGCC TCGCTAACCT CGCCTTACCC CCCACTATTA
CTGATGACTT CTAGCAAGCC TCACTAACCT TGCCCTACCA CCCACCATCA

ATCTCCTAGG GGAACTCTCC GTGCTAGTAA CCTCATTCTC CTGATCAAAT
ACCTCCTAGG TGAACTCTTC GTACTAATGG CCTCCTTCTC CTGGGCAAAC
ACCTACTAGG AGAGCTCTCC GTACTAGTAA CCACATTCTC CTGATCAAAT
ACCTACTGGG AGAACTCTCT GTGCTAGTAA CCACATTCTC CTGATCAAAT
ACCTTCTAGG AGAACTCTCC GTACTAATAG CCATATTCTC TTGATCTAAC

ACCACTCTCC TACTCACAGG ATTCAACATA CTAATCACAG CCCTGTACTC
ACTACTATTA CACTCACCGG GCTCAACGTA CTAATCACGG CCCTATACTC
ACCACCCTTT TACTTACAGG ATCTAACATA CTAATCACAG CCCTGTACTC
ATCACTCTCC TACTTACAGG ACTCAACATA CTAGTCACAG CCCTATACTC
ATCACCATCC TACTAACAGG ACTCAACATA CTAATCACAA CCCTATACTC

CCTCTACATG TTTACCACAA CACAATGAGG CTCACTCACC CACCACATTA
CCTTTACATA TTTATCATAA CACAACGAGG CACACTTACA CACCACATTA
CCTTTATATA TTTACCACAA CACAATGAGG CCCACTCACA CACCACATCA
CCTCTACATA TTTACCACAA CACAATGAGG CTCACTCACC CACCACATTA
TCTCTATATA TTCACCACAA CACAACGAGG TACACCCACA CACCACATCA

ATAACATAAA GCCCTCATTC ACACGAGAAA ATACTCTCAT ATTTTTACAC
AAAACATAAA ACCCTCACTC ACACGAGAAA ACATATTAAT ACTTATGCAC
CCAACATAAA ACCCTCATTT ACACGAGAAA ACATCCTCAT ATTCATGCAC
ACAACATAAA ACCCTCATTC ACACGAGAAA ACACCCTCAT GTTCATACAC
ACAACATAAA ACCTTCTTTC ACACGCGAAA ATACCCTCAT GCTCATACAC

CTATCCCCCA TCCTCCTTCT ATCCCTCAAT CCTGATATCA TCACTGGATT
CTCTTCCCCC TCCTCCTCCT AACCCTCAAC CCTAACATCA TTACTGGCTT
CTATCCCCCA TCCTCCTCCT ATCCCTCAAC CCCGATATTA TCACCGGGTT
CTATCCCCCA TTCTCCTCCT ATCCCTCAAC CCCGACATCA TTACCGGGTT
CTATCCCCCA TCCTCCTCTT ATCCCTCAAC CCCAGCATCA TCGCTGGGTT

CACCTCCTGT AAATATAGTT TAACCAAAAC ATCAGATTGT GAATCTGACA
TACTCCCTGT AAACATAGTT TAATCAAAAC ATTAGATTGT GAATCTAACA
CACCTCCTGT AAATATAGTT TAACCAAAAC ATCAGATTGT GAATCTGATA
TTCCTCTTGT AAATATAGTT TAACCAAAAC ATCAGATTGT GAATCTGACA
CGCCTACTGT AAATATAGTT TAACCAAAAC ATTAGATTGT GAATCTAATA

ACAGAGGCTC ACGACCCCTT ATTTACCGAG AAAGCTTATA AGAACTGCTA
ATAGAGGCTC GAAACCTCTT GCTTACCGAG AAAGCCCACA AGAACTGCTA
ACAGAGGCTC ACAACCCCTT ATTTACCGAG AAAGCTCGTA AGAGCTGCTA
ACAGAGGCTT ACGACCCCTT ATTTACCGAG AAAGCTCACA AGAACTGCTA
ATAGGGCCCC ACAACCCCTT ATTTACCGAG AAAGCTCACA AGAACTGCTA

ATTCATATCC CCATGCCTGA CAACATGGCT TTCTCAACTT TTAAAGGATA
ACTCACTATC CCATGTATGA CAACATGGCT TTCTCAACTT TTAAAGGATA
ACTCATACCC CCGTGCTTGA CAACATGGCT TTCTCAACTT TTAAAGGATA
ACTCATGCCC CCATGTCTAA CAACATGGCT TTCTCAACTT TTAAAGGATA
ACTCNTCACT CCATGTGTGA CAACATGGCT TTCTCAGCTT TTAAAGGATA

ACAGCCATCC GTTGGTCTTA GGCCCCAAAA ATTTTGGTGC AACTCCAAAT
ACAGCTATCC ATTGGTCTTA GGACCCAAAA ATTTTGGTGC AACTCCAAAT
ACAGCTATCC ATTGGTCTTA GGACCCAAAA ATTTTGGTGC AACTCCAAAT
ACAGCTATCC ATTGGTCTTA GGCCCCAAAA ATTTTGGTGC AACTCCAAAT
ACAGCTATCC CTTGGTCTTA GGATCCAAAA ATTTTGGTGC AACTCCAAAT

AAAAGTAATA ACCATGTATA CTACCATAAC CACCTTAACC CTAACTCCCT
AAAAGTAATA GCAATGTACA CCACCATAGC CATTCTAACG CTAACCTCCC
AAAAGTAATA ACTATGTACG CTACCATAAC CACCTTAGCC CTAACTTCCT
AAAAGTAATA ACCATGCACA CTACTATAAC CACCCTAACC CTGACTTCCC
AAAAGTAACA GCCATGTTTA CCACCATAAC TGCCCTCACC TTAACTTCCC

TAATTCTCCC CATCCTCACC ACCCTCATTA ACCCTAACAA AAAAAACTCA
TAATTCCCCC CATTACAGCC ACCCTTATTA ACCCCAATAA AAAGAACTTA
TAATTCCCCC TATCCTTACC ACCTTCATCA ATCCTAACAA AAAAAGCTCA
TAATTCCCCC CATCCTTACC ACCCTCGTTA ACCCTAACAA AAAAAACTCA
TAATCCCCCC CATTACCGCT ACCCTCATTA ACCCCAACAA AAAAAACCCA

TATCCCCATT ATGTGAAATC CATTATCGCG TCCACCTTTA TCATTAGCCT
TACCCGCACT ACGTAAAAAT GACCATTGCC TCTACCTTTA TAATCAGCCT
TACCCCCATT ACGTAAAATC TATCGTCGCA TCCACCTTTA TCATCAGCCT
TACCCCCATT ATGTAAAATC CATTGTCGCA TCCACCTTTA TTATCAGTCT
TACCCCCACT ATGTAAAAAC GGCCATCGCA TCCGCCTTTA CTATCAGCCT

TTTCCCCACA ACAATATTCA TATGCCTAGA CCAAGAAGCT ATTATCTCAA
ATTTCCCACA ATAATATTCA TGTGCACAGA CCAAGAAACC ATTATTTCAA
CTTCCCCACA ACAATATTTC TATGCCTAGA CCAAGAAGCT ATTATCTCAA
CTTCCCCACA ACAATATTCA TGTGCCTAGA CCAAGAAGTT ATTATCTCGA
TATCCCAACA ACAATATTTA TCTGCCTAGG ACAAGAAACC ATCGTCACAA

ACTGGCACTG AGCAACAACC CAAACAACCC AGCTCTCCCT AAGCTT
ACTGACACTG AACTGCAACC CAAACGCTAG AACTCTCCCT AAGCTT
GCTGACACTG AGCAACAACC CAAACAATTC AACTCTCCCT AAGCTT
ACTGACACTG AGCCACAACC CAAACAACCC AGCTCTCCCT AAGCTT
ACTGATGCTG AACAACCACC CAGACACTAC AACTCTCACT AAGCTT

Exercise 4
  5  149
chimpanzeeSFTGAIILIIAHGLTSSLLFCLANSNYERTHSRIIILSQGLQTLLPLIAF
gibbonxxxxSFTGATVLIIAHGLTSSLLFCLANSNYERTHSRIIILSRGLQALLPLIAF
gorillaexxSFTGAVVLIIAHGLTSSLLFCLANSNYERTHSRIIILSQGLQTLLPLIAL
homosapienSFTGAVILIIAHGLTSSLLFCLANSNYERTHSRIIILSQGLQTLLPLIAF
orangutangSFTGATTLMIAHGLTSSLLFCLANSNYERTHSRIIILSQGLQTLLPLIAL

LLASLANLALPPTINLLGELSVLVTSFSSNTTLLLTGFNILITALYS
LAASLANLALPPTINLLGELFVLMASFSANTTITLTGLNVLITALYS
LLASLANLALPPTINLLGELSVLVTTFSSNTTLLLTGSNILITALYS
LLASLANLALPPTINLLGELSVLVTTFSSNITLLLTGLNILVTALYS
LLASLTNLALPPTINLLGELSVLIAIFSSNITILLTGLNILITTLYS

LYMFTTTQGSLTHHINNIKPSFTRENTLIFLHLSPILLLSLNPDIITGF
LYIFIITQGTLTHHIKNIKPSLTRENILILMHLFPLLLLTLNPNIITGF
LYIFTTTQGPLTHHITNIKPSFTRENILIFMHLSPILLLSLNPDIITGF
LYIFTTTQGSLTHHINNIKPSFTRENTLMFIHLSPILLLSLNPDIITGF
LYIFTTTQGTPTHHINNIKPSFTRENTLMLIHLSPILLLSLNPSIIAGF

TSC
TPC
TSC
SSC
AYC


Output files for phylogenetic exercises

Exercise 1
DNA parsimony algorithm, version 3.573c

Name         Sequences
----         ---------
seq1         AAG
seq2         ..A
seq3         GGA
seq4         .GA

One most parsimonious tree found:

        +--seq4      
     +--3  
  +--2  +--seq3      
  !  !  
--1  +-----seq2      
  !  
  +--------seq1      

  remember: this is an unrooted tree!

requires a total of      3.000
steps in each site:
         0   1   2   3   4   5   6   7   8   9
     *-----------------------------------------
    0!       1   1   1                        

From    To     Any Steps?    State at upper node
                             ( . means same as in the node below it on tree)

          1                AAR
   1      2        maybe   ..A
   2      3         yes    .G.
   3   seq4         no     ...
   3   seq3         yes    G..
   2   seq2         no     ...
   1   seq1        maybe   ..G
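
The step counts reported above (one step at each of the three sites, three in total) can be reproduced by hand or with a few lines of code. The following is a sketch of Fitch's counting rule for a fixed binary tree, not the parsimony program's own code; the nested-tuple tree encoding and the function name fitch_steps are our own assumptions. The tree is the one drawn above, read from node 1.

```python
def fitch_steps(tree, states):
    """Return (possible_states, steps) for one site on a nested-tuple binary tree."""
    if isinstance(tree, str):
        # A tip: its observed base, no changes needed.
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    common = left_set & right_set
    if common:
        # The children can agree on a base: no extra step at this node.
        return common, left_steps + right_steps
    # The children disagree: take the union and count one change.
    return left_set | right_set, left_steps + right_steps + 1


alignment = {"seq1": "AAG", "seq2": "AAA", "seq3": "GGA", "seq4": "AGA"}
tree = ((("seq4", "seq3"), "seq2"), "seq1")

per_site = []
for site in range(3):
    column = {name: seq[site] for name, seq in alignment.items()}
    per_site.append(fitch_steps(tree, column)[1])

print(per_site, sum(per_site))  # [1, 1, 1] 3
```

Because the tree is unrooted, the count does not depend on where the tree is rooted for the calculation.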

Exercise 2a
DNA parsimony algorithm, version 3.573c

One most parsimonious tree found:

     +--------gorillaexx
  +--2  
  !  !  +-----homosapien
  !  +--3  
--1     !  +--orangutang
  !     +--4  
  !        +--gibbonxxxx
  !  
  +-----------chimpanzee

  remember: this is an unrooted tree!
requires a total of    330.000

Exercise 2b
Neighbor-Joining/UPGMA method version 3.573c


        +-------gibbonxxxx
     +--1  
  +--2  +-----orangutang
  !  !  
  !  +---gorillaexx
  !  
--3-homosapien
  !  
  +--chimpanzee


remember: this is an unrooted tree!

Between        And            Length
-------        ---            ------
   3             2              0.00318
   2             1              0.03598
   1          gibbonxxxx        0.12602
   1          orangutang        0.09198
   2          gorillaexx        0.05777
   3          homosapien        0.04015
   3          chimpanzee        0.05195
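
The branch-length table above can be used to read off the distance between any two species along the tree: add up the lengths of the branches on the path joining them. A minimal sketch follows; the list name BRANCHES and the function path_length are our own, and the numbers are copied from the table.

```python
from collections import defaultdict

# (node, node, length) rows copied from the table above.
BRANCHES = [
    ("3", "2", 0.00318),
    ("2", "1", 0.03598),
    ("1", "gibbonxxxx", 0.12602),
    ("1", "orangutang", 0.09198),
    ("2", "gorillaexx", 0.05777),
    ("3", "homosapien", 0.04015),
    ("3", "chimpanzee", 0.05195),
]


def path_length(branches, start, goal):
    """Sum the branch lengths on the unique path between two nodes of a tree."""
    graph = defaultdict(list)
    for a, b, length in branches:
        graph[a].append((b, length))
        graph[b].append((a, length))
    # Depth-first walk; a tree has no cycles, so skipping the parent is enough.
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for neighbour, length in graph[node]:
            if neighbour != parent:
                stack.append((neighbour, node, dist + length))
    raise ValueError(f"no path from {start} to {goal}")


print(f"{path_length(BRANCHES, 'homosapien', 'chimpanzee'):.5f}")
print(f"{path_length(BRANCHES, 'homosapien', 'gibbonxxxx'):.5f}")
```

For example, homosapien and chimpanzee both attach to node 3, so their tree distance is 0.04015 + 0.05195 = 0.09210.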

Exercise 2c (dnaml program)
Nucleic acid sequence Maximum Likelihood method, version 3.573c

Empirical Base Frequencies:

   A       0.30929
   C       0.32750
   G       0.10570
  T(U)     0.25751

Transition/transversion ratio =   2.000000
(Transition/transversion parameter =   1.653039)

     +-homosapien
  +--2  
  !  !  +-----orangutang
  !  +--3  
  !     +-------gibbonxxxx
  !  
--1---gorillaexx
  !  
  +--chimpanzee

remember: this is an unrooted tree!
Ln Likelihood = -2514.48557
Examined   17 trees

 Between        And            Length      Approx. Confidence Limits
 -------        ---            ------      ------- ---------- ------

   1             2              0.01720     (  0.00594,     0.02847) **
   2          homosapien        0.02875     (  0.01514,     0.04262) **
   2             3              0.05455     (  0.03466,     0.07469) **
   3          orangutang        0.09121     (  0.06758,     0.11573) **
   3          gibbonxxxx        0.13271     (  0.10424,     0.16187) **
   1          gorillaexx        0.06191     (  0.04344,     0.08064) **
   1          chimpanzee        0.05097     (  0.03402,     0.06806) **

     *  = significantly positive, P < 0.05
     ** = significantly positive, P < 0.01
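
The "Empirical Base Frequencies" at the top of this output are estimated from the input sequences. As a rough sketch only (plain counting of A, C, G and T, ignoring ambiguity codes such as N; the program's own estimate may treat ambiguous bases differently, so small deviations are to be expected), the idea looks like this in Python. The function name base_frequencies is our own.

```python
from collections import Counter


def base_frequencies(sequences):
    """Return the fraction of A, C, G and T among the unambiguous bases."""
    counts = Counter()
    for seq in sequences:
        # Count only plain bases; skip gaps and ambiguity codes.
        counts.update(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return {base: counts[base] / total for base in "ACGT"}


# Toy input: the four sequences of the Exercise 1 data set.
freqs = base_frequencies(["AAG", "AAA", "GGA", "AGA"])
for base, freq in freqs.items():
    print(f"{base}  {freq:.5f}")
```

On the toy input there are 8 A and 4 G among 12 bases, so the frequencies are 2/3 for A, 1/3 for G and zero for C and T.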

Exercise 2d (bootstrapping of parsimony)

Majority-rule and strict consensus tree program, version 3.573c

Species in order: 

  gorillaexx
  homosapien
  orangutang
  gibbonxxxx
  chimpanzee

Sets included in the consensus tree
Set (species in order)     How many times out of 100.00

..**.                      100.00
.***.                      73.67

Sets NOT included in consensus tree:
Set (species in order)     How many times out of 100.00

..***                      20.17
.*..*                       6.17

CONSENSUS TREE:
the numbers at the forks indicate the number
of times the group consisting of the species
which are to the right of that fork occurred
among the trees, out of 100.00 trees

                 +----gibbonxxxx          
            +-100.0
       +-73.7    +----orangutang          
       !    !
  +-100.0    +---------homosapien          
  !    !
  !    +--------------chimpanzee          
  !
  +-------------------gorillaexx          

  remember: this is an unrooted tree!
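
The set notation in the consensus output above is compact: each character stands for one species, in the order listed under "Species in order", and a star marks membership of the group. A small sketch for decoding the sets and picking out those found in more than half of the trees is given below. The names decode_set and majority_sets are our own, and the plain "more than 50%" rule is a simplification of what a consensus program may do.

```python
SPECIES = ["gorillaexx", "homosapien", "orangutang", "gibbonxxxx", "chimpanzee"]

# Set pattern -> number of times found, copied from the output above.
SUPPORT = {
    "..**.": 100.00,
    ".***.": 73.67,
    "..***": 20.17,
    ".*..*": 6.17,
}


def decode_set(pattern, species):
    """Translate a pattern such as '..**.' into the list of species it marks."""
    return [name for mark, name in zip(pattern, species) if mark == "*"]


def majority_sets(support, species, n_trees=100.0):
    """Keep the groups that occur in more than half of the trees."""
    return {
        pattern: decode_set(pattern, species)
        for pattern, count in support.items()
        if count > n_trees / 2
    }


for pattern, members in majority_sets(SUPPORT, SPECIES).items():
    print(pattern, SUPPORT[pattern], members)
```

With the numbers above this keeps the orangutang + gibbonxxxx group (100.00) and the homosapien + orangutang + gibbonxxxx group (73.67), the same two sets the program lists as included in the consensus tree.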

Exercise 3 (maximum likelihood method)
Nucleic acid sequence Maximum Likelihood method, version 3.573c

Empirical Base Frequencies:

   A       0.30864
   C       0.32781
   G       0.10580
  T(U)     0.25775

Transition/transversion ratio =   2.000000
(Transition/transversion parameter =   1.650539)

     +-homosapien
  +--2  
  !  !  +-----orangutang
  !  +--3  
  !     +-------gibbonxxxx
  !  
--1---gorillaexx
  !  
  +--chimpanzee

remember: this is an unrooted tree!

Ln Likelihood = -2513.07579
Examined   17 trees

 Between        And            Length      Approx. Confidence Limits
 -------        ---            ------      ------- ---------- ------

   1             2              0.01723     (  0.00595,     0.02850) **
   2          homosapien        0.02879     (  0.01516,     0.04268) **
   2             3              0.05462     (  0.03470,     0.07478) **
   3          orangutang        0.09120     (  0.06758,     0.11573) **
   3          gibbonxxxx        0.13287     (  0.10436,     0.16207) **
   1          gorillaexx        0.06198     (  0.04350,     0.08074) **
   1          chimpanzee        0.05088     (  0.03411,     0.06809) **

     *  = significantly positive, P < 0.05
     ** = significantly positive, P < 0.01

Exercise 4 (parsimony and neighbour joining methods)
Protein parsimony algorithm, version 3.573c

One most parsimonious tree found:

           +--homosapien
     +-----3  
     !     +--gorillaexx
  +--2  
  !  !     +--orangutang
--1  +-----4  
  !        +--gibbonxxxx
  !  
  +-----------chimpanzee

  remember: this is an unrooted tree!
requires a total of     57.000


Neighbor-Joining/UPGMA method version 3.573c
 Neighbor-joining method
 Negative branch lengths allowed

     +--------gibbonxxxx
  +--1  
  !  +-----orangutang
  !  
--3-gorillaexx
  !  
  !  +-chimpanzee
  +--2  
     +-homosapien


remember: this is an unrooted tree!

Between        And            Length
-------        ---            ------
   3             1              0.02121
   1          gibbonxxxx        0.14023
   1          orangutang        0.08913
   3          gorillaexx        0.03381
   3             2              0.00696
   2          chimpanzee        0.02877
   2          homosapien        0.03076
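
Neighbor-joining starts from a matrix of pairwise distances between the sequences. The distance program and model used for the run above are not shown here, so the sketch below uses the simplest possible stand-in, the proportion of differing sites (p-distance), purely to illustrate what such a matrix contains. The function names are our own, and the toy input is just the last three columns of the Exercise 4 protein alignment.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def distance_matrix(alignment):
    """Return {(name_a, name_b): p-distance} for every pair of sequences."""
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    }


# Last three alignment columns only, for illustration.
tail = {
    "chimpanzee": "TSC",
    "gibbonxxxx": "TPC",
    "gorillaexx": "TSC",
    "homosapien": "SSC",
    "orangutang": "AYC",
}

for pair, dist in distance_matrix(tail).items():
    print(pair, f"{dist:.3f}")
```

Real analyses would normally correct these raw proportions for multiple substitutions at the same site before building the tree.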

Henrik Stryhn (hes@svs.dk) 2000-04-07