SWISS-PROT RELEASE 7.0 RELEASE NOTES
Date: April 7, 1988
Author: A. Bairoch
1. INTRODUCTION
1.1 Evolution
Release 7.0 of SWISS-PROT contains 6821 sequence entries,
comprising 1'885'771 amino acids abstracted from 7128
references. This represents an increase of 11% over release
6.0. The recent growth of the data bank is summarised
below:
Release   Date    Number of entries   Number of amino acids
3.0       11/86   4160                  969 641
4.0       04/87   4387                1 036 010
5.0       09/87   5205                1 327 683
6.0       01/88   6102                1 653 982
7.0       04/88   6821                1 885 771
1.2 Source of data
Release 7.0 has been updated using protein sequence data
from release 15.0 of the PIR (Protein Identification
Resource) protein data bank, as well as translation of
nucleotide sequence data from release 14.0 of the EMBL
nucleotide sequence Data Library.
As an indication of the source of the sequence data in the
SWISS-PROT data bank, we list here the statistics concerning
the DR (Databank Reference) pointer lines:
Entries with pointer(s) to only PIR entries:            3487
Entries with pointer(s) to only EMBL entries:           1541
Entries with pointer(s) to both EMBL and PIR entries:   1683
Entries with no pointer lines (entered in house):        110
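The four counts above can be reproduced by checking which data banks the DR lines of each entry name. The following is a minimal sketch, not the procedure used to prepare this release. It assumes that each entry is terminated by a "//" line and that a DR line has the layout "DR   BANK; IDENTIFIER; ..."; neither detail is stated in these notes. The function names and the sample entries are hypothetical.

```python
from collections import Counter


def split_entries(text):
    """Yield each entry as a list of lines (assumes '//' ends an entry)."""
    entry = []
    for line in text.splitlines():
        if line.startswith("//"):
            if entry:
                yield entry
            entry = []
        elif line.strip():
            entry.append(line)
    if entry:
        yield entry


def classify_entry(lines):
    """Return which data banks the DR lines of one entry point to."""
    banks = set()
    for line in lines:
        if line[:2] == "DR":
            # Assumed layout: 'DR   BANK; IDENTIFIER; ...'
            banks.add(line[2:].split(";")[0].strip().upper())
    if "PIR" in banks and "EMBL" in banks:
        return "both EMBL and PIR"
    if "PIR" in banks:
        return "only PIR"
    if "EMBL" in banks:
        return "only EMBL"
    # Entries without any PIR or EMBL pointer end up here.
    return "no pointer lines"


def dr_statistics(text):
    """Count entries in each of the four categories."""
    return Counter(classify_entry(entry) for entry in split_entries(text))


# Hypothetical miniature data bank, for illustration only.
SAMPLE = """\
ID   AAAA$HYPOT
DR   PIR; A00001; XXXX.
//
ID   BBBB$HYPOT
DR   EMBL; X00001; YYYY.
//
ID   CCCC$HYPOT
DR   EMBL; X00002; ZZZZ.
DR   PIR; A00002; WWWW.
//
ID   DDDD$HYPOT
//
"""

if __name__ == "__main__":
    for category, count in sorted(dr_statistics(SAMPLE).items()):
        print(f"{category}: {count}")
```

Applied to the full release, the four categories should add up to the total number of entries (3487 + 1541 + 1683 + 110 = 6821).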
2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE
RELEASE 6
2.1 Sequences and annotations
Some 700 new sequences have been added since the last
release, the sequence data of some 100 existing entries has
been updated, and the annotations of almost 1000 entries
have been revised.
In particular, we have used review articles to update the
annotations of the following groups or families of
proteins: alcohol dehydrogenases, acylphosphatases, beta
lactamases, cholinesterases, elastases, lysozymes,
nitrogenases, peroxidases, superoxide dismutases,
prokaryotic lipoproteins, trypanosome variant surface
glycoproteins, thioredoxins, heat-stable enterotoxins,
pentraxins, azurins, metallothioneins, plastocyanins,
Bowman-Birk serine protease inhibitors, snake neurotoxins
and cytotoxins, natriuretic peptides, growth hormones and
prolactins.
2.2 Introduction of the Organism Classification (OC) line
type
The OC (Organism Classification) lines contain the
taxonomic classification of the source organism. The
classification is listed top-down as nodes in a taxonomic
tree in which the most general grouping is given first. The
individual items are separated by semi-colons and the list
is terminated by a period. The general format for the OC
line is:
OC NODE[; NODE...].
As an example, the classification lines for a human (Homo
sapiens) sequence entry should be:
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA;
OC MAMMALIA; EUTHERIA; PRIMATES.
Currently, in release 7.0, the OC lines only contain the
first node of the taxonomic tree. That node can be one of:
EUKARYOTA, PROKARYOTA or VIRIDAE. This will already help
users who want to extract from the data bank the entries
containing sequences from eukaryotes, prokaryotes or
viruses. We plan, in future releases, to implement a full
set of OC lines for at least the most common species.
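The OC format described above is simple to parse: join the OC lines of an entry, drop the terminating period and split on semi-colons. The following is a minimal sketch in Python under that reading of the format. The function names are hypothetical, and the spacing after the OC line code is an assumption.

```python
def parse_oc(lines):
    """Return the taxonomic nodes of an entry, most general first."""
    # Keep only OC lines and drop the two-letter line code.
    parts = [line[2:].strip() for line in lines if line[:2] == "OC"]
    text = " ".join(parts)
    # The list of nodes is terminated by a period.
    if text.endswith("."):
        text = text[:-1]
    return [node.strip() for node in text.split(";") if node.strip()]


def top_level_group(lines):
    """Return the first (most general) node, or None if there is no OC line."""
    nodes = parse_oc(lines)
    return nodes[0] if nodes else None


# The human example given above, and a release 7.0 style single-node line.
HUMAN = [
    "OC   EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA;",
    "OC   MAMMALIA; EUTHERIA; PRIMATES.",
]
RELEASE_7_STYLE = ["OC   EUKARYOTA."]

if __name__ == "__main__":
    print(parse_oc(HUMAN))
    print(top_level_group(RELEASE_7_STYLE))
```

Because only the first node is present in release 7.0, selecting all eukaryotic, prokaryotic or viral entries amounts to comparing the value returned by top_level_group with EUKARYOTA, PROKARYOTA or VIRIDAE.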
2.3 Documentation changes
Starting with this release, we have introduced two new
indices: an index of enzyme sequences classified by their
EC number (ECINDEX) and an index of EMBL Data Library
sequences referenced in SWISS-PROT (EMBLTOSP).
3. THE NEXT RELEASE
SWISS-PROT release 8.0 will probably be ready at the
beginning of August 1988.
4. WE NEED YOUR HELP!
We welcome any feedback from our users. We would especially
appreciate being notified if you find that sequences
belonging to your field of expertise are missing from the
data bank. We would also like to be notified about
annotations that need updating, for example if the function
of a protein has been clarified or if new post-translational
information has become available.
APPENDIX A: DISKS FOR SWISS-PROT
IBM PC/AT 1.2 Mb disks
SWISS-PROT is stored on seven 1.2 Mb disks. Each of these
disks contains a single bulk file (PROT7_01.BLK to
PROT7_07.BLK):
Disk   First sequence   Last sequence
 1     16K$TRVPS        COAT$CAMVC
 2     COAT$CAMVD       FIBA$CANFA
 3     FIBA$HUMAN       IACA$PIG
 4     IATP$BOVIN       MUCB$HUMAN
 5     MUCM$MOUSE       PSBD$CHLRE
 6     PSBD$MARPO       TRPE$SALTY
 7     TRPE$SERMA       ZEIG$MAIZE
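The table above can be used to work out which disk, and therefore which bulk file, holds a given entry. The following is a minimal sketch in Python, assuming that entries are stored in plain character-code order of their names (the names in the tables are consistent with this, but these notes do not state it). The function names are hypothetical.

```python
from bisect import bisect_right

# First and last sequence on each 1.2 Mb disk, copied from the table above.
DISKS_1_2_MB = [
    ("16K$TRVPS", "COAT$CAMVC"),
    ("COAT$CAMVD", "FIBA$CANFA"),
    ("FIBA$HUMAN", "IACA$PIG"),
    ("IATP$BOVIN", "MUCB$HUMAN"),
    ("MUCM$MOUSE", "PSBD$CHLRE"),
    ("PSBD$MARPO", "TRPE$SALTY"),
    ("TRPE$SERMA", "ZEIG$MAIZE"),
]


def find_disk(entry_name, disks=DISKS_1_2_MB):
    """Return the 1-based disk number whose range covers entry_name, or None."""
    firsts = [first for first, _ in disks]
    # Number of disks whose first sequence sorts at or before entry_name.
    index = bisect_right(firsts, entry_name)
    if index == 0:
        return None  # sorts before the first sequence of disk 1
    _, last = disks[index - 1]
    if entry_name > last:
        return None  # sorts after the last sequence of the candidate disk
    return index


def bulk_file_name(disk_number):
    """Return the bulk file name for a disk, e.g. PROT7_03.BLK."""
    return f"PROT7_{disk_number:02d}.BLK"


if __name__ == "__main__":
    disk = find_disk("IACA$PIG")
    print(disk, bulk_file_name(disk))
```

A name that falls inside a disk's range is not necessarily present in the data bank; the function only reports where such an entry would be stored. The same function can be used for the 360 Kb disks by passing the corresponding list of first and last sequences.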
IBM PC 360 Kb disks
SWISS-PROT is stored on twenty-four (24) 360 Kb disks. Each
of these disks contains a single bulk file (PROT7_01.BLK to
PROT7_24.BLK):
Disk   First sequence   Last sequence
 1     16K$TRVPS        ANP3$PSEAM
 2     ANP4$PSEAM       CA11$BOVIN
 3     CA11$CHICK       CGPG$BOVIN
 4     CH15$DROME       CRAA$PIG
 5     CRAA$PROCA       CYCP$RHOSH
 6     CYCP$RHOSP       ENOA$RAT
 7     ENOB$CHICK       FTSZ$ECOLI
 8     FTZ$DROME        H2A1$PSAMI
 9     H2A1$SCHPO       HBB4$SALIR
10     HBBA$BOSJA       HXKA$YEAST
11     HXKB$YEAST       KABL$HUMAN
12     KABL$MLVAB       KV5J$MOUSE
13     KV5K$MOUSE       MERB$SERMA
14     MERB$STAAU       NFL$BOVIN
15     NFL$HUMAN        PA2$RAT
16     PA2$TRIOK        POL2$HPAV
17     POLG$ENMYV       RAS$SCHPO
18     RAS1$DROME       RPOD$BACSU
19     RPOD$ECOLI       TALA$MPOV3
20     TALA$MPOVA       TRFE$CHICK
21     TRFE$HUMAN       VE59$LAMBD
22     VE5A$HPV11       VNCS$PAVBO
23     VNCY$MUMIM       YG12$BPT3
24     YG13$BPT3        ZEIG$MAIZE
IBM PS/2 720 Kb disks
SWISS-PROT is stored on twelve (12) 720 Kb disks. Each of
these disks contains two of the 360 Kb format bulk files.