SWISS-PROT RELEASE 5.0 RELEASE NOTES
Date: September 8, 1987
Author: A. Bairoch
INTRODUCTION
Release 5.0 of SWISS-PROT contains 5205 sequence entries,
comprising 1'327'683 amino acids abstracted from 5673
references. This represents an increase of 28% over release
4.0. The recent growth of the data bank is summarized
below:
Release Date Number of entries Nb of amino
acids
3.0 11/86 4160 969 641
4.0 04/87 4387 1 036 010
5.0 09/87 5205 1 327 683
SOURCE OF DATA
Release 5.0 has been updated using protein sequence data
from release 12.0 of the PIR protein data bank, as well as
translation of nucleic-acid sequence data from release 12.0
of the EMBL nucleotide Data Library. The statistics for the
source of the sequences in release 5.0 of SWISS-PROT are:
Entries adapted from PIR entries 4419
Entries translated from EMBL entries 708
Entries entered "in house" 78
DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 4
New sequence data
To make SWISS-PROT as complete as possible we have scanned
the EMBL Data Library for sequences not yet available in
the protein data bank and we have translated and annotated
some 700 additional sequences.
In addition to those new sequences, some 200 existing
entries have been updated with sequence data translated
from EMBL entries.
We also have started to introduce DR lines pointing to the
primary accession number of the nucleotide sequence entry
in EMBL to entries which were sequenced at the DNA or RNA
level. The current statistics for DR lines are the
following:
Entries with pointer line(s) to a PIR entry 4419
Entries with pointer line(s) to an EMBL entry 1512
Entries with pointer line(s) to a PDB entry 140
Modifications in the feature keys
- A new feature key has been introduced with this release:
MUTAGEN which is used to describe sites which have been
experimentally altered.
- The description part of the CONFLICT and VARIANT keys
now follows a precise format.
a) If the position described is missing the description
part will be similar to those shown in the following
examples:
FT CONFLICT 33 33 MISSING (IN REF. 2).
FT VARIANT 12 245 MISSING (IN SMALL VARIANT).
b) Positions with amino acid change(s) are now described
as shown in the following examples:
FT CONFLICT 60 60 P -> A (IN REF. 2).
FT CONFLICT 1 5 ASTQS -> GAWTL (IN REF. 3).
FT VARIANT 39 39 P -> G (IN 50% OF THE MOLECULES).
Format modifications
We have made a number of small modifications to the format
of SWISS-PROT to make it more compatible with that of the
EMBL Data Library. Those format changes are the following:
- Journal abbreviations are now identical in both data
banks.
- The format of the RL line for submitted data and for
Ph.D. thesis has been modified to be compatible with
that used in the EMBL Data Library.
- In the DT line, there are now two spaces between the
date and the comment part (instead of one in previous
releases).
Organism identification code changes
We have modified some viruses organism identification codes
to make them compatible with the abbreviations commonly
used. Among the codes changed are those of the AIDS viruses
which now have organism codes starting with HIV1 or HIV2.
Various changes
- Keywords have been updated in a significant number of
entries.
- Cytochrome P450 entry names and descriptions have been
totally modified to take in account new nomenclature.
- The annotations of a number of protein families has been
updated and completed using recent reviews (among those
reviewed are: cystatins, ubiquitins, RAS oncogenes, "EF-
hand" calcium-binding proteins, ferritins,
serine/threonine protein kinases, GTP binding proteins,
etc).
THE NEXT RELEASE
SWISS-PROT release 6.0 will be probably ready in the end of
january or beginning of february 1987, it will include new
data from PIR release 13 and a significant number of
additional translated EMBL release 13 sequences.
At this stage all the sequences which are stored in SWISS-
PROT and which were sequenced at the DNA or RNA level
should have DR lines pointing to the primary accession
number of the nucleotide sequence entry in EMBL.
We also plan to add some new feature keys and to make the
keyword usage in SWISS-PROT more consistent.
APPENDIX A: DISKS FOR SWISS-PROT
IBM PC/AT 1.2Mb disks
- SWISS-PROT is stored on six 1.2 Mb disks.
- Disks 1 to 5 each contain a single bulk file
(PROT5_01.BLK to PROT5_05.BLK) which correspond to the
sequences adapted from entries in the PIR data bank.
Those sequences are classified in the order in which
they are found in the PIR data bank.
- Disk 6 contains two bulk files: PROT5_NW.BLK and
PROT5_PR.BLK. The first bulk file correspond to new
sequences translated from EMBL Data Library entries. The
second one to a few "preliminary" entry sequences.
IBM PC 360 Kb disks
- SWISS-PROT is stored on eighteen 360 Kb disks.
- Disks 1 to 15 each contain a single bulk file
(PROT5_01.BLK to PROT5_15.BLK) which correspond to the
sequences adapted from entries in the PIR data bank.
Those sequences are classified in the order in which
they are found in the PIR data bank.
- Disks 16 to 17 each contain a single bulk file
(PROT5_N1.BLK and PROT5_N2.BLK which correspond to new
sequences translated from EMBL Data Library entries.
- Disk 18 contain contains two bulk files: PROT5_N3.BLK
and PROT5_PR.BLK. The first bulk file correspond to the
last part of the new sequences translated from EMBL Data
Library entries. The second one to a few "preliminary"
entry sequences.