SWISS-PROT RELEASE 22.0 RELEASE NOTES
1. INTRODUCTION
1.1 Evolution
Release 22.0 of SWISS-PROT contains 25044 sequence entries, comprising
8'375'696 amino acids abstracted from 25613 references. This represents
an increase of 6.5% in amino acids (5.5% in entries) over release 21.
The recent growth of the data bank is summarized below.
Release   Date    Number of entries   Nb of amino acids
3.0       11/86                4160             969 641
4.0       04/87                4387           1 036 010
5.0       09/87                5205           1 327 683
6.0       01/88                6102           1 653 982
7.0       04/88                6821           1 885 771
8.0       08/88                7724           2 224 465
9.0       11/88                8702           2 498 140
10.0      03/89               10008           2 952 613
11.0      07/89               10856           3 265 966
12.0      10/89               12305           3 797 482
13.0      01/90               13837           4 347 336
14.0      04/90               15409           4 914 264
15.0      08/90               16941           5 486 399
16.0      11/90               18364           5 986 949
17.0      02/91               20024           6 524 504
18.0      05/91               20772           6 792 034
19.0      08/91               21795           7 173 785
20.0      11/91               22654           7 500 130
21.0      03/92               23742           7 866 596
22.0      05/92               25044           8 375 696
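The growth between two releases can be recomputed from the rows of the
table. The following is a minimal sketch using the figures for releases
21.0 and 22.0; the function name percent_increase is an illustrative
choice, not part of any SWISS-PROT tool.

```python
# Figures copied from the release table (releases 21.0 and 22.0).
prev_entries, prev_residues = 23742, 7866596
curr_entries, curr_residues = 25044, 8375696


def percent_increase(old, new):
    """Return the relative growth from old to new, in percent."""
    return 100.0 * (new - old) / old


entry_growth = percent_increase(prev_entries, curr_entries)
residue_growth = percent_increase(prev_residues, curr_residues)
print(f"entries: {entry_growth:.1f}%  amino acids: {residue_growth:.1f}%")
```

With these inputs, the amino acid count grows by about 6.5% and the
entry count by about 5.5%.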
1.2 Source of data
Release 22.0 has been updated using protein sequence data from release
31.0 of the PIR (Protein Identification Resource) protein data bank, as
well as translation of nucleotide sequence data from release 30.0 of the
EMBL Nucleotide Sequence Database.
As an indication of the source of the sequence data in the SWISS-PROT
data bank, we list here the statistics concerning the DR (Database
cross-references) pointer lines:
Entries with pointer(s) to only PIR entry(ies): 4309
Entries with pointer(s) to only EMBL entry(ies): 3327
Entries with pointer(s) to both EMBL and PIR entry(ies): 16868
Entries with no pointer lines: 540
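Statistics of this kind could be gathered by scanning the DR lines of
each entry. The sketch below is only an illustration: it assumes that a
DR line has the layout "DR   <database>; <identifier>; ...", that each
entry ends with a "//" line, and it uses a hypothetical three-entry
sample with invented identifiers. It also groups entries that point to
neither PIR nor EMBL under "neither", which may not match exactly how
the last figure above was derived.

```python
from collections import Counter


def classify_by_pointers(lines):
    """Count entries by which of PIR / EMBL their DR lines point to."""
    counts = Counter()
    databases = set()
    for line in lines:
        if line.startswith("DR"):
            # Assumed layout: "DR   <database>; <identifier>; ..."
            databases.add(line[2:].split(";")[0].strip())
        elif line.startswith("//"):
            # Assumed end-of-entry marker: tally the entry, then reset.
            has_pir = "PIR" in databases
            has_embl = "EMBL" in databases
            if has_pir and has_embl:
                counts["both"] += 1
            elif has_pir:
                counts["PIR only"] += 1
            elif has_embl:
                counts["EMBL only"] += 1
            else:
                counts["neither"] += 1
            databases = set()
    return counts


# Hypothetical sample; entry names and identifiers are invented.
sample = [
    "ID   EXAMPLE1",
    "DR   EMBL; X00001; EXAMPLE1.",
    "DR   PIR; A00001; EXAMPLE1.",
    "//",
    "ID   EXAMPLE2",
    "DR   PIR; A00002; EXAMPLE2.",
    "//",
    "ID   EXAMPLE3",
    "//",
]
result = classify_by_pointers(sample)
print(dict(result))

# The four published figures add up to the total number of entries.
published_total = 4309 + 3327 + 16868 + 540
print(published_total)
```

The final sum equals 25044, the number of entries in this release.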
2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 21
2.1 Sequences and annotations
About 1320 sequences have been added since release 21, the sequence data
of 180 existing entries has been updated, and the annotations of 3800
entries have been revised. In particular, we have used review articles
to update the annotations of the following groups or families of
proteins:
- Bacterial regulatory proteins, luxR family
- Bacterial regulatory proteins, ntrC family
- Band 4.1 proteins family
- Beta-amylases
- Beta-transducin family
- BetvI family pathogenesis proteins
- bglG-type antiterminator family
- Bromodomain proteins
- CBF/NF-Y subunits
- CDC48 / PAS1 / SEC18 / TBP-1 family
- Chaperonins cpn10
- Colipases
- Collagens
- D-amino acid oxidases
- D-isomer specific 2-hydroxyacid dehydrogenases
- Dihydrodipicolinate synthases
- DnaJ-like proteins
- Endoglucanases / exoglucanases / xylanases
- Fork head domain proteins
- G-protein coupled receptors family 2
- GMC oxidoreductases
- Herpes viruses proteins
- Inositol monophosphatase family
- LIM domain proteins
- Mevalonate and phosphomevalonate kinases
- Mitochondrial creatine kinases
- NADH-ubiquinone oxidoreductases peripheral subunits
- Orotidine 5'-phosphate decarboxylase
- papD family chaperonins
- Polygalacturonases
- Polyprenyl synthetases
- Regulator of chromosome condensation (RCC1)
- Respiratory-chain NADH dehydrogenase nuclear encoded subunits
- Ribonuclease P protein component
- Serine proteases, V8-type family
- Snake toxins
- Thiol (cysteine) proteases
- TNF receptors / low-affinity NGF receptor / CD40 family
- Vinculins and alpha-catenins
- ZP domain proteins
2.2 New topic for the comments (CC) line type
In this release we have introduced a new 'topic' for the comments (CC)
line type: PTM. This topic is used to describe post-translational
modification(s) whose position(s) on the sequence are not yet known and
which therefore cannot be shown in the feature table.
Examples of its usage:
CC   -!- PTM: PHOSPHORYLATED IN VITRO ON SERINE(S) AND THREONINE(S)
CC       BY PKC.
CC   -!- PTM: HEAVILY GLYCOSYLATED WITH SULFATED N-LINKED CARBOHYDRATES.
CC   -!- PTM: SIX DISULFIDE BONDS ARE PRESENT.
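Programs that read the comment lines can pick out the new topic by
looking for the "-!- PTM:" marker and joining any continuation lines.
The following is a minimal sketch under that assumption; the function
name extract_cc_topic is hypothetical, and the input is the set of
example lines shown above.

```python
def extract_cc_topic(lines, topic="PTM"):
    """Collect the text of every CC comment block with the given topic."""
    blocks = []
    current = None  # parts of the matching block being read, if any
    for line in lines:
        if not line.startswith("CC"):
            current = None
            continue
        text = line[2:].strip()
        if text.startswith("-!-"):
            # A new comment block starts here: "-!- TOPIC: text".
            name, _, rest = text[3:].partition(":")
            if name.strip() == topic:
                current = [rest.strip()]
                blocks.append(current)
            else:
                current = None
        elif current is not None:
            # Continuation line of the block being collected.
            current.append(text)
    return [" ".join(parts) for parts in blocks]


lines = [
    "CC   -!- PTM: PHOSPHORYLATED IN VITRO ON SERINE(S) AND THREONINE(S)",
    "CC       BY PKC.",
    "CC   -!- PTM: HEAVILY GLYCOSYLATED WITH SULFATED N-LINKED CARBOHYDRATES.",
    "CC   -!- PTM: SIX DISULFIDE BONDS ARE PRESENT.",
]
ptm_comments = extract_cc_topic(lines)
for comment in ptm_comments:
    print(comment)
```

Because the line is stripped after the two-letter line code, the sketch
does not depend on the exact number of blanks that follow "CC".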
2.3 New feature key
The new UNSURE key is used to describe region(s) of a sequence for which
the authors are unsure about the sequence assignment.
3. ENZYME AND PROSITE
3.1 The ENZYME data bank
Release 9.0 of the ENZYME data bank is distributed along with release 22
of SWISS-PROT. ENZYME release 9.0 contains information on 3179
enzymes. The data bank is complete and up to date. Until new enzyme
nomenclature data are published, we plan only to update the SWISS-PROT
pointers at each release of the protein sequence data bank, correct
any errors, and complete the information concerning synonyms and
cofactors using the literature.
3.2 The PROSITE data bank
Release 9.00 of the PROSITE data bank is distributed along with release
22 of SWISS-PROT. Release 9.00 contains 580 documentation chapters that
describe 689 different patterns.
Since the last major release of PROSITE (release 8.00 of December 1991),
50 new chapters have been added and about 200 chapters have been
updated. The new chapters are listed below.
- Acid phosphatases signature
- Bacterial protein export pilT protein family signature
- Bacterial regulatory proteins, luxR family signature
- Bacterial Ribonuclease P protein component signature
- Band 4.1 family domain signatures
- Beta-transducin family Trp-Asp repeats signature
- Bromodomain
- CBF/NF-Y subunits signatures
- CDC48 / PAS1 / SEC18 family signature
- Chaperonins cpn10 signature
- C-type lectin domain signature
- Dihydrodipicolinate synthetase signatures
- dnaJ domains signatures
- D-amino acid oxidase signature
- Fork head domain signatures
- GHMP kinases putative ATP-binding domain
- Glycosyl hydrolases family 2 active site
- Glycosyl hydrolases family 32 active site
- Glycosyl hydrolases family 5 signature
- Glycosyl hydrolases family 6 signatures
- GMC oxidoreductases signatures
- Gram-negative pili assembly chaperone signature
- G-protein coupled receptors family 2 signatures
- Histidinol dehydrogenase active site
- Indole-3-glycerol phosphate synthase signature
- Inositol monophosphatase family signatures
- Leucine aminopeptidase signature
- Methionine aminopeptidase signature
- Neurotransmitters transporters signature
- NifH/frxC family signature
- Osteonectin domain signatures
- PTN/MK heparin-binding protein family signatures
- RecF protein signatures
- Regulator of chromosome condensation signatures
- Respiratory-chain NADH dehydrogenase subunit 1 signatures
- Respiratory-chain NADH dehydrogenase 51 Kd subunit signatures
- Respiratory-chain NADH dehydrogenase 75 Kd subunit signatures
- Ribosomal protein L30 signature
- Ribosomal protein L9 signature
- Ribosomal protein S13 signature
- Ribosomal protein S19e signature
- Ribosomal protein S4 signature
- Serine proteases, V8 family, active sites
- Sigma-54 interaction domain signatures
- Thymidine phosphorylase signature
- Tissue factor signature
- TNFR/NGFR family cysteine-rich region signature
- Transcriptional antiterminators bglG family signature
- Vinculin family signatures
- ZP domain signature
4. WE NEED YOUR HELP !
We welcome feedback from our users. We would especially appreciate being
notified if you find that sequences belonging to your field of
expertise are missing from the data bank. We would also like to be
notified about annotations that need updating, for example if the
function of a protein has been clarified or if new post-translational
information has become available.
APPENDIX A: SOME STATISTICS
A.1 Amino acid composition
A.1.1 Composition in percent for the complete data bank
Ala (A) 7.64   Gln (Q) 4.07   Leu (L) 9.14   Ser (S) 7.10
Arg (R) 5.24   Glu (E) 6.26   Lys (K) 5.82   Thr (T) 5.84
Asn (N) 4.47   Gly (G) 7.09   Met (M) 2.33   Trp (W) 1.30
Asp (D) 5.25   His (H) 2.27   Phe (F) 3.97   Tyr (Y) 3.21
Cys (C) 1.81   Ile (I) 5.49   Pro (P) 5.06   Val (V) 6.49
Asx (B) 0.01   Glx (Z) 0.01   Xaa (X) 0.03
A.1.2 Classification of the amino acids by their frequency
Leu, Ala, Ser, Gly, Val, Glu, Thr, Lys, Ile, Asp, Arg, Pro, Asn, Gln,
Phe, Tyr, Met, His, Cys, Trp
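The classification above follows directly from the percentages in
A.1.1. As a minimal sketch, sorting the twenty standard amino acids by
decreasing percentage reproduces the same order; the ambiguity codes
(Asx, Glx, Xaa) are left out, and the variable names are illustrative.

```python
# Percentages copied from section A.1.1 (standard amino acids only).
composition = {
    "Ala": 7.64, "Gln": 4.07, "Leu": 9.14, "Ser": 7.10,
    "Arg": 5.24, "Glu": 6.26, "Lys": 5.82, "Thr": 5.84,
    "Asn": 4.47, "Gly": 7.09, "Met": 2.33, "Trp": 1.30,
    "Asp": 5.25, "His": 2.27, "Phe": 3.97, "Tyr": 3.21,
    "Cys": 1.81, "Ile": 5.49, "Pro": 5.06, "Val": 6.49,
}

# Most frequent first; no two values are equal, so the order is unique.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```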
A.2 Distribution of the sequences by their organism of origin
Total number of species represented in this release of SWISS-PROT: 3325
A.2.1 Table of the frequency of occurrence of species
Species represented 1x: 1465
2x: 599
3x: 323
4x: 197
5x: 143
6x: 111
7x: 74
8x: 51
9x: 65
10x: 35
11- 20x: 140
21- 50x: 68
51-100x: 24
>100x: 30
A.2.2 Table of the most represented species
Number Frequency Species
1 1943 Human
2 1785 Escherichia coli
3 1152 Mouse
4 1099 Rat
5 1004 Baker's yeast (Saccharomyces cerevisiae)
6 532 Bovine
7 474 Fruit fly (Drosophila melanogaster)
8 406 Chicken
9 377 Bacillus subtilis
10 290 African clawed frog (Xenopus laevis)
11 282 Rabbit
12 258 Pig
13 251 Vaccinia virus (strain Copenhagen)
14 245 Salmonella typhimurium
15 193 Human cytomegalovirus (strain AD169)
16 183 Maize
17 168 Bacteriophage T4
18 153 Vaccinia virus (strain WR)
19 142 Rice
20 129 Tobacco
21 128 Wheat
22 121 Pea
23 113 Staphylococcus aureus
24 112 Pseudomonas aeruginosa
25 111 Barley
26 109 Slime mold (Dictyostelium discoideum)
27 106 Arabidopsis thaliana (Mouse-ear cress)
28 104 Fission yeast (Schizosaccharomyces pombe)
29 102 Sheep
102 Spinach
31 100 Pseudomonas putida
100 Soybean
33 97 Dog
34 96 Caenorhabditis elegans
35 95 Neurospora crassa
A.3 Distribution of the sequences by size
From   To   Number        From   To   Number
   1-  50     1597        1001-1100      239
  51- 100     2698        1101-1200      140
 101- 150     3791        1201-1300      117
 151- 200     2402        1301-1400       77
 201- 250     2024        1401-1500       62
 251- 300     1838        1501-1600       33
 301- 350     1683        1601-1700       29
 351- 400     1640        1701-1800       29
 401- 450     1257        1801-1900       34
 451- 500     1394        1901-2000       27
 501- 550      980        2001-2100       10
 551- 600      700        2101-2200       28
 601- 650      488        2201-2300       31
 651- 700      338        2301-2400       11
 701- 750      329        2401-2500       12
 751- 800      275            >2500       61
 801- 850      204
 851- 900      209
 901- 950      127
 951-1000      130
Currently the ten largest sequences are:
RYNR_RABIT 5037 a.a.
RYNR_HUMAN 5032 a.a.
APB_HUMAN 4563 a.a.
APOA_HUMAN 4548 a.a.
DYHC_TRIGR 4466 a.a.
POLG_BVDV 3988 a.a.
POLG_HCVA 3898 a.a.
POLG_HCVB 3898 a.a.
ACVT_PENCH 3791 a.a.
TRX_DROME 3759 a.a.
APPENDIX B: ON-LINE EXPERTS
B.1 List of on-line experts for PROSITE and SWISS-PROT
Field of expertise             Name                Email address
-----------------------------  ------------------  ----------------------------
African swine fever virus      Yanez R.J.          ryanez@cbm2.uam.es
Alcohol dehydrogenases         Joernvall H.        hans.jornvall@k1m.ki.se
                               Persson B.          bengt@medfys.ki.se
Aldehyde dehydrogenases        Joernvall H.        hans.jornvall@k1m.ki.se
                               Persson B.          bengt@medfys.ki.se
Alpha-crystallins/HSP-20       Leunissen J.A.M.    jackl@caos.caos.kun.nl
                               de Jong W.          u629000@hnykun11.bitnet
Alpha-2-macroglobulins         Van Leuven F.       fred@blekul13.bitnet
AA-tRNA synthetases class II   Leberman R.         leberman@frembl51.bitnet
Apolipoproteins                Boguski M.S.        boguski@ncbi.nlm.nih.gov
Arrestins                      Kolakowski L.F.Jr.  kolakowski@helix.mgh.harvard.edu
Band 4.1 family proteins       Rees J.             jrees@vax.oxford.ac.uk
Beta-lactamases                Brannigan J.        jab5@vaxa.york.ac.uk
Beta-transducin family         Boguski M.S.        boguski@ncbi.nlm.nih.gov
Chalcone/stilbene synthases    Schroeder J.        raf@sun1.ruf.uni-freiburg.de
Chitinases                     Henrissat B.        cermav@frgren81.bitnet
Clusterin                      Peitsch M.C.        peitsch@ulbio1.unil.ch
CTF/NF-I                       Mermod N.           nmermod@ulys.unil.ch
Cytochromes P450               Holsztynska E.J.    ela@netcom.uucp
                                                   netcom!ela@apple.com
DEAD-box helicases             Linder P.           linder@urz.unibas.ch
EF-hand calcium-binding        Cox J.A.            cox@sc2a.unige.ch
                               Kretsinger R.H.     rhk5i@virginia.bitnet
Enoyl-CoA hydratase            Hofmann K.O.        khofmann@cipvax.biolan.uni-koeln.de
fruR/lacI family HTH proteins  Reizer J.           jreizer@ucsd.edu
GATA-type zinc-fingers         Boguski M.S.        boguski@ncbi.nlm.nih.gov
Glucanases                     Henrissat B.        cermav@frgren81.bitnet
                               Beguin P.           phycel@pasteur.bitnet
G-protein coupled receptors    Chollet A.          chollet@clients.switch.ch
                               Attwood T.K.        bph6tka@biovax.leeds.ac.uk
GTPase-activating proteins     Boguski M.S.        boguski@ncbi.nlm.nih.gov
HMG1/2 and HMG-14/17           Landsman D.         landsman@ncbi.nlm.nih.gov
Inorganic pyrophosphatases     Kolakowski L.F.Jr.  kolakowski@helix.mgh.harvard.edu
Integrases                     Roy P.H.            2020000@lavalvx1.bitnet
Lipocalins                     Boguski M.S.        boguski@ncbi.nlm.nih.gov
                               Peitsch M.C.        peitsch@ulbio1.unil.ch
lysR family HTH proteins       Henikoff S.         henikoff@sparky.fhcrc.org
MAC components / perforin      Peitsch M.C.        peitsch@ulbio1.unil.ch
Myelin proteolipid protein     Hofmann K.O.        khofmann@cipvax.biolan.uni-koeln.de
PEP requiring enzymes          Reizer J.           jreizer@ucsd.edu
pfkB carbohydrate kinases      Reizer J.           jreizer@ucsd.edu
Phytochromes                   Partis M.D.         partis@gcri.afrc.ac.uk
Protein kinases                Hanks S.            hanks@vuctrvax.bitnet
                               Hunter T.           hunter@salk.bitnet
PTS proteins                   Reizer J.           jreizer@ucsd.edu
Restriction-modification       Bickle T.           bickle@urz.unibas.ch
enzymes                        Roberts R.J.        roberts@cshl.org
Ribosomal protein S3           Hallick R.          hallick%biotec@arizona.edu
Ribosomal protein S15          Ellis S.R.          srelli01@ulkyvm.bitnet
Ring-cleavage dioxygenases     Harayama S.         harayama@cmu.unige.ch
Sodium symporters              Reizer J.           jreizer@ucsd.edu
Subtilases                     Brannigan J.        jab5@vaxa.york.ac.uk
Thiol proteases                Turk B.             turk@ijs.ac.mail.yu
Thiol proteases inhibitors     Turk B.             turk@ijs.ac.mail.yu
TPR repeats                    Boguski M.S.        boguski@ncbi.nlm.nih.gov
Transit peptides               von Heijne G.       gunnar@cbts.sunet.se
Type-II membrane antigens      Levy S.             levy@cellbio.stanford.edu
Uracil-DNA glycosylase         Aasland R.          aasland@bio.uib.no
Xylose isomerase               Jenkins J.          jenkins@frira.afrc.ac.uk
WAP-type domain                Claverie J.-M.      jmc@ncbi.nlm.nih.gov
ZP domain                      Bork P.             bork@embl-heidelberg.de
Bacteriophage P4               Halling C.          chh9@midway.uchicago.edu
Drosophila                     Ashburner M.        ma11@phx.cam.ac.uk
Escherichia coli               Rudd K.             rudd@ncbi.nlm.nih.gov
Salmonella typhimurium         Rudd K.             rudd@ncbi.nlm.nih.gov
Snakes                         Stocklin R.         stocklin@cmu.unige.ch
Yeast chromosome I             Ouellette F.        francis@monod.biol.mcgill.ca
B.2 Requirements for becoming an on-line expert
An expert should be a scientist working with specific family(ies) of
proteins (or specific domains) who would:
a) Review the protein sequences in SWISS-PROT and the patterns/matrices
in PROSITE relevant to their field of research.
b) Agree to be contacted by people who have obtained new sequence(s)
which seem to belong to "their" family(ies) of proteins.
c) Have access to electronic mail and be willing to use it to send and
receive data.
If you are willing to be part of this scheme please contact Amos Bairoch
at one of the following electronic mail addresses:
bairoch@cmu.unige.ch
bairoch@cgecmu51.bitnet
APPENDIX C: RELATIONSHIPS BETWEEN BIOMOLECULAR DATABASES
The current status of the relationships (cross-references) between some
biomolecular databases is shown in the following schematic:
============================================
Relationships between biomolecular databases
============================================
Last updated: March, 1992.
[Schematic: boxes for the following databases, connected by arrows
indicating cross-references. The arrow layout is not recoverable here.
EPD (eukaryotic promoters); ECD (E. coli map); TFD (transcription
factors); EMBL Nucleotide Sequence Data Library; FLYBASE (Drosophila
genetic maps); REBASE (restriction enzymes); SWISS-PROT Protein
Sequence Data Bank; ENZYME (nomenclature); OMIM (diseases); PROSITE
(patterns); E. coli 2D gels; EcoGene/EcoSeq; PDB (3D structures).]