SWISS-PROT RELEASE 13.0 RELEASE NOTES
Date: January 18, 1990
Author: A. Bairoch
1. INTRODUCTION
1.1 Evolution
Release 13.0 of SWISS-PROT contains 13837 sequence entries, comprising
4'347'336 amino acids abstracted from 13560 references. This represents
an increase of 14% over release 12.0. The recent growth of the data bank
is summarized below:
Release   Date    Number of entries   Number of amino acids
  3.0     11/86          4160                  969 641
  4.0     04/87          4387                1 036 010
  5.0     09/87          5205                1 327 683
  6.0     01/88          6102                1 653 982
  7.0     04/88          6821                1 885 771
  8.0     08/88          7724                2 224 465
  9.0     11/88          8702                2 498 140
 10.0     03/89         10008                2 952 613
 11.0     07/89         10856                3 265 966
 12.0     10/89         12305                3 797 482
 13.0     01/90         13837                4 347 336
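The growth figure quoted in section 1.1 can be recomputed from the last two rows of the table. The short sketch below is only an illustration (the function name is ours). It suggests that the 14% increase refers to the number of amino acids, while the number of entries grew by roughly 12%.

```python
# Counts for the two most recent releases, copied from the table above.
releases = {
    "12.0": {"entries": 12305, "amino_acids": 3_797_482},
    "13.0": {"entries": 13837, "amino_acids": 4_347_336},
}


def percent_increase(old: int, new: int) -> float:
    """Return the relative growth from old to new, in percent."""
    return (new - old) / old * 100.0


entry_growth = percent_increase(
    releases["12.0"]["entries"], releases["13.0"]["entries"]
)
residue_growth = percent_increase(
    releases["12.0"]["amino_acids"], releases["13.0"]["amino_acids"]
)

print(f"entries: {entry_growth:.1f}%  amino acids: {residue_growth:.1f}%")
```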
1.2 Source of data
Release 13.0 has been updated using protein sequence data from release
22.0 of the PIR (Protein Identification Resource) protein data bank, as
well as translation of nucleotide sequence data from release 21.0 of the
EMBL Nucleotide Sequence Data Library.
As an indication of the source of the sequence data in the SWISS-PROT
data bank, we list here the statistics concerning the DR (Databank
Reference) pointer lines:
Entries with pointer(s) only to PIR entries: 3104
Entries with pointer(s) only to EMBL entries: 5894
Entries with pointer(s) to both EMBL and PIR entries: 3989
Entries with no pointer lines (entered in house): 850
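As an illustration, the four categories above could be derived from the DR lines of each entry along the following lines. This is a minimal sketch, not the procedure actually used to build the statistics: it assumes that a DR line starts with the line code 'DR' and names the data bank before the first semicolon, and the identifiers in the example are hypothetical. The last lines check that the four counts add up to the 13837 entries of this release.

```python
def classify_pointers(entry_lines):
    """Classify one entry by the data banks named on its DR lines."""
    banks = set()
    for line in entry_lines:
        if line.startswith("DR "):
            # Assumed layout: 'DR   BANK; identifier; name.'
            bank = line[2:].split(";", 1)[0].strip()
            banks.add(bank)
    has_pir = "PIR" in banks
    has_embl = "EMBL" in banks
    if has_pir and has_embl:
        return "both"
    if has_pir:
        return "PIR only"
    if has_embl:
        return "EMBL only"
    return "none"


# Hypothetical DR lines, for illustration only.
example = [
    "DR   EMBL; X00000; HYPOTHETICAL.",
    "DR   PIR; A00000; HYPOTHETICAL.",
]
print(classify_pointers(example))

# Counts reported in section 1.2.
counts = {"PIR only": 3104, "EMBL only": 5894, "both": 3989, "none": 850}
total = sum(counts.values())
print(total)
```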
2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 11
As the last SWISS-PROT release distributed to PC/Gene users was
release 11, we list here the changes made in both release 12 and
release 13.
2.1 Sequences and annotations
Almost 3000 sequences have been added since release 11, the sequence
data of 380 existing entries have been updated, and the annotations of
4900 entries have been revised. In particular, we have used review
articles to update the annotations of the following groups or families
of proteins:
- 11-S plant seed storage proteins
- 3'5'-cyclic nucleotide phosphodiesterases
- Acyl carrier proteins
- Aldehyde dehydrogenases
- Aminoacyl-transfer RNA synthetases
- AraC family bacterial transcription regulation proteins
- Arginases
- Bacteriophage P22 proteins
- Bacteriophage T4 proteins
- Band 3 protein family
- Biotin-requiring enzymes
- Chloroplast photosystems I and II proteins
- Creatine kinases
- Crp family bacterial activator proteins
- Cyclins
- Dehydrins
- DNA mismatch repair proteins
- Endothelins / sarafotoxins
- Enolases
- Eukaryotic thiol proteases
- Extradiol ring-cleavage dioxygenases
- Flavodoxins
- Globins, annelids
- Globins, molluscs
- Glucose-6-phosphate dehydrogenases
- Glutaredoxins
- Glutamate dehydrogenases
- Glycerate and 3-phosphoglycerate dehydrogenases
- Glycophorins
- Granzymes
- GTP-binding elongation factors
- Heat-labile enterotoxins
- Heat shock hsp90 proteins
- Insulin family proteins
- Insulin-like growth factor binding proteins
- Insect larval cuticle proteins
- Insect-type alcohol dehydrogenases / ribitol dehydrogenase family
- Integrins
- Interleukin 7
- Iron-containing alcohol dehydrogenases
- Lysosome-associated membrane glycoproteins
- LysR family bacterial activator proteins
- L-lactate dehydrogenases
- Macrolide-lincosamide-streptogramin B resistance proteins
- Malate dehydrogenase
- Mammalian defensins
- MHC class II proteins
- Mitochondrial energy transfer proteins
- Myc-type proteins
- Nerve growth factors
- N-4 cytosine-specific DNA methylases
- Peroxidases
- Phosphoglucose isomerases
- Phosphoglycerate kinases
- Phospholipases A2
- Picornaviruses genome polyproteins
- Ribosomal proteins
- Rubredoxins
- Serine hydroxymethyltransferases
- Serine/threonine specific protein phosphatases
- Staphylococcal enterotoxins / Streptococcal pyrogenic exotoxins
- Sugar transporters
- Thaumatin family proteins
- Transferrins
- Tryptophan synthase alpha and beta chains
- Tyrosine protein kinases
- Uracil-DNA glycosylases
- Vertebrate galactoside-binding lectins
- Zinc-containing alcohol dehydrogenases
2.2 New line-type
Release 12 introduced a new type of data line, the OG line. The OG
(OrGanelle) lines indicate whether the gene coding for a protein
originates from the mitochondria, the chloroplast, or a plasmid. The
format for the OG line is:
OG CHLOROPLAST.
OG MITOCHONDRION.
OG PLASMID name.
where 'name' is the name of the plasmid.
Previously this information was stored in the OS line, as shown in the
example below.
OS WHEAT (TRITICUM AESTIVUM) CHLOROPLAST.
The above example is now stored as:
OS WHEAT (TRITICUM AESTIVUM).
OG CHLOROPLAST.
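Programs that still hold entries in the pre-release 12 layout could convert them with something like the sketch below. It is only an illustration under stated assumptions: the function name is ours, only the CHLOROPLAST and MITOCHONDRION suffixes are handled (the old spelling of the mitochondrial suffix is assumed, and plasmids are ignored), and the spacing after the line code is copied from the input line.

```python
import re

# Old-style OS line ending with an organelle name (assumed spellings).
_OLD_OS = re.compile(r"^OS(\s+)(.*?)\s+(CHLOROPLAST|MITOCHONDRION)\.$")


def split_organelle(os_line: str) -> list[str]:
    """Turn an old-style OS line into an OS line plus an OG line.

    Lines that carry no organelle suffix are returned unchanged.
    """
    match = _OLD_OS.match(os_line)
    if match is None:
        return [os_line]
    gap, organism, organelle = match.groups()
    return [f"OS{gap}{organism}.", f"OG{gap}{organelle}."]


for line in split_organelle("OS WHEAT (TRITICUM AESTIVUM) CHLOROPLAST."):
    print(line)
```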
2.3 New topic for the comments (CC) line type
As of release 12 we have added a new 'topic' for the comments (CC) line-
type: CAUTION. It is used to indicate that possible errors and/or
grounds for confusion may exist. Example of its usage:
CC -!- CAUTION: ALSO SEE VERSION 2 OF THIS PROTEIN THAT DIFFERS DUE
CC TO A FRAMESHIFT.
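A program reading SWISS-PROT entries could collect the CAUTION comments as sketched below. This is a minimal illustration, assuming that each comment block starts with '-!-' followed by the topic and a colon, and that continuation lines carry only the CC line code and text, as in the example above. The function name is hypothetical.

```python
def caution_comments(entry_lines):
    """Return the text of every CAUTION comment found in the CC lines."""
    comments = []
    current = None  # parts of the CAUTION block being read, if any
    for line in entry_lines:
        if not line.startswith("CC"):
            current = None
            continue
        text = line[2:].strip()
        if text.startswith("-!-"):
            # A new comment block: read its topic and the start of its text.
            topic, _, body = text[3:].strip().partition(":")
            if topic.strip() == "CAUTION":
                current = [body.strip()]
                comments.append(current)
            else:
                current = None
        elif current is not None:
            # Continuation line of the CAUTION block being read.
            current.append(text)
    return [" ".join(parts) for parts in comments]


example = [
    "CC -!- CAUTION: ALSO SEE VERSION 2 OF THIS PROTEIN THAT DIFFERS DUE",
    "CC TO A FRAMESHIFT.",
]
print(caution_comments(example))
```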
2.4 Small change in the RL lines for submissions
RL lines for data submitted to EMBL or GenBank were previously
represented by two subtypes of RL lines, as illustrated in the following
examples:
RL SUBMITTED (OCT-1989) TO THE EMBL DATA LIBRARY.
RL SUBMITTED (OCT-1989) TO GENBANK.
Starting with release 13, all such lines use the following format:
RL SUBMITTED (OCT-1989) TO EMBL/GENBANK DATA BANKS.
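Software that compares RL lines across releases may want to bring the two older subtypes into the release 13 form. The sketch below shows one possible way to do so. The function name is ours, and the date pattern (three letters, a hyphen, four digits) is an assumption based on the examples above.

```python
import re

# Matches either of the two pre-release 13 submission subtypes.
_OLD_RL = re.compile(
    r"^RL(\s+)SUBMITTED \(([A-Z]{3}-\d{4})\) TO "
    r"(?:THE EMBL DATA LIBRARY|GENBANK)\.$"
)


def normalize_submission(rl_line: str) -> str:
    """Rewrite an old-style submission RL line in the release 13 format.

    Any other line is returned unchanged.
    """
    match = _OLD_RL.match(rl_line)
    if match is None:
        return rl_line
    gap, date = match.groups()
    return f"RL{gap}SUBMITTED ({date}) TO EMBL/GENBANK DATA BANKS."


print(normalize_submission("RL SUBMITTED (OCT-1989) TO THE EMBL DATA LIBRARY."))
print(normalize_submission("RL SUBMITTED (OCT-1989) TO GENBANK."))
```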
2.5 Documentation changes
- ACINDEX.TXT is a new document file which is an index of all the
accession numbers which appear in SWISS-PROT and the name of the
entries in which they occur.
- PDBTOSP.TXT is a new document file which is an index of all the
Brookhaven PDB entries referenced in SWISS-PROT.
- The JOURLIST.TXT document now gives both the abbreviation and the full
name of every journal cited in SWISS-PROT.
Important: for more detailed information concerning the SWISS-PROT
documentation please consult appendix B of these release notes.
3. IMPORTANT NOTES CONCERNING SWISS-PROT RELEASE 13 AND PC/GENE 6.01
3.1 The ryanodine receptor
The rabbit skeletal muscle ryanodine receptor (RYNR$RABIT) is a protein
of 5037 amino acid residues. PC/Gene release 6.01 can only analyze
proteins of up to 5000 residues. This limit will be raised in the next
major version (6.50). Until that version is released we have dealt with
this protein in the following way: the sequence entry RYNR$RABIT was
split into two parts. RYN1$RABIT contains the first 4560 residues (which
correspond to the cytoplasmic domain), and RYN2$RABIT contains residues
4561 to 5037.
Note: due to this modification there are 13838 sequence entries in the
PC/Gene version of the SWISS-PROT data bank (instead of 13837 as listed
in section 1.1 of these release notes).
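The arithmetic behind this split can be sketched as follows. This is only an illustration: the function name is hypothetical and the sequence is a placeholder string of the right length, not the real RYNR$RABIT sequence. The second part starts at residue 4561, as stated above, and both parts must fit within the 5000-residue limit of PC/Gene 6.01.

```python
PCGENE_MAX_RESIDUES = 5000  # limit stated for PC/Gene release 6.01


def split_for_pcgene(sequence: str, second_part_start: int):
    """Split a sequence in two; second_part_start is a 1-based position."""
    head = sequence[: second_part_start - 1]
    tail = sequence[second_part_start - 1 :]
    for part in (head, tail):
        if len(part) > PCGENE_MAX_RESIDUES:
            raise ValueError("part is still too long for PC/Gene 6.01")
    return head, tail


# Placeholder of 5037 residues standing in for RYNR$RABIT.
placeholder = "A" * 5037
ryn1, ryn2 = split_for_pcgene(placeholder, 4561)
print(len(ryn1), len(ryn2))

# One entry is removed and two are added in the PC/Gene version.
pcgene_entries = 13837 - 1 + 2
print(pcgene_entries)
```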
3.2 The OG line
The OG line-type introduced in release 12 (see section 2.2) is not
supported by release 6.01 of PC/Gene. This means that although these
lines are present in the SWISS-PROT data bank (either on the CD-ROM disk
or in the bulk files on the floppy disks), you cannot make use of them.
Release 6.50 will fully support OG lines.
Note to CD-ROM users: library files containing the names of all the
sequences which originate from either the mitochondria or the
chloroplast are available on the CD-ROM (for more details see the CD-ROM
release notes).
4. WE NEED YOUR HELP !
We welcome feedback from our users. We would especially appreciate
being notified if you find that sequences belonging to your field of
expertise are missing from the data bank. We would also like to be
notified about annotations that need updating, for example if the
function of a protein has been clarified or if new post-translational
information has become available.
APPENDIX A: SOME STATISTICS
A.1 Amino acid composition
A.1.1 Composition in percent for the complete data bank
Ala (A) 7.69 Gln (Q) 4.10 Leu (L) 9.11 Ser (S) 7.03
Arg (R) 5.22 Glu (E) 6.29 Lys (K) 5.87 Thr (T) 5.84
Asn (N) 4.42 Gly (G) 7.18 Met (M) 2.30 Trp (W) 1.33
Asp (D) 5.23 His (H) 2.27 Phe (F) 3.93 Tyr (Y) 3.20
Cys (C) 1.83 Ile (I) 5.37 Pro (P) 5.13 Val (V) 6.49
Asx (B) 0.01 Glx (Z) 0.01 Xaa (X) 0.04
A.1.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Ser, Val, Glu, Lys, Thr, Ile, Asp, Arg, Pro, Asn, Gln,
Phe, Tyr, Met, His, Cys, Trp
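The ranking in A.1.2 follows directly from the percentages in A.1.1. The short sketch below reproduces it by sorting the twenty standard amino acids by decreasing frequency. The ambiguous codes (Asx, Glx, Xaa) are left out, and the variable names are ours.

```python
# Composition in percent, copied from table A.1.1 (standard residues only).
composition = {
    "Ala": 7.69, "Gln": 4.10, "Leu": 9.11, "Ser": 7.03,
    "Arg": 5.22, "Glu": 6.29, "Lys": 5.87, "Thr": 5.84,
    "Asn": 4.42, "Gly": 7.18, "Met": 2.30, "Trp": 1.33,
    "Asp": 5.23, "His": 2.27, "Phe": 3.93, "Tyr": 3.20,
    "Cys": 1.83, "Ile": 5.37, "Pro": 5.13, "Val": 6.49,
}

# Most frequent residue first.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```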
A.2 Distribution of the sequences by organism of origin
Total number of species represented in this release of SWISS-PROT: 2032
A.2.1 Table of the frequency of occurrence of species
Species represented 1x: 916
2x: 389
3x: 187
4x: 127
5x: 87
6x: 67
7x: 32
8x: 28
9x: 44
10x: 20
11- 20x: 66
21-100x: 53
>100x: 16
A.2.2 Table of the most represented species
Number Frequency Species
1 1200 Human
2 1072 Escherichia coli
3 685 Mouse
4 576 Rat
5 443 Baker's yeast (Saccharomyces cerevisiae)
6 366 Bovine
7 239 Fruit fly (Drosophila melanogaster)
8 227 Chicken
9 195 Rabbit
10 160 Bacillus subtilis
11 159 Pig
12 131 African clawed frog (Xenopus laevis)
13 122 Bacteriophage T4
14 112 Salmonella typhimurium
15 103 Tobacco
16 102 Maize
17 92 Rice
18 84 Liverwort (Marchantia polymorpha)
19 78 Wheat
20 74 Staphylococcus aureus
21 71 Vaccinia virus
22 70 Herpes virus (Type 1, strain 17)
70 Pea
70 Spinach
25 69 Soybean
A.3 Distribution of the sequences by size
 From   To   Number         From   To   Number
    1-  50      923         1001-1100      112
   51- 100     1635         1101-1200       72
  101- 150     2474         1201-1300       59
  151- 200     1405         1301-1400       39
  201- 250     1113         1401-1500       33
  251- 300      960         1501-1600       16
  301- 350      842         1601-1700       15
  351- 400      815         1701-1800       13
  401- 450      618         1801-1900        9
  451- 500      678         1901-2000       15
  501- 550      532         2001-2100        6
  551- 600      329         2101-2200       18
  601- 650      243         2201-2300       18
  651- 700      173         2301-2400       10
  701- 750      174         2401-2500        8
  751- 800      110            >2500        26
  801- 850      101
  851- 900      121
  901- 950       60
  951-1000       62
Currently the five largest sequences are:
   RYNR$RABIT   5037 a.a.
   APB$HUMAN    4563 a.a.
   APOA$HUMAN   4548 a.a.
   DMD$HUMAN    3685 a.a.
   DMD$CHICK    3660 a.a.
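The size classes used in A.3 are 50 residues wide up to 1000 residues and 100 residues wide from 1001 to 2500, with a single class above that. The sketch below shows one way to assign a sequence length to its class, and checks that the counts in the table add up to the 13837 entries of this release. The function name is ours.

```python
def size_class(length: int) -> str:
    """Return the A.3 size class (as 'from-to') for a sequence length."""
    if length < 1:
        raise ValueError("sequence length must be positive")
    if length > 2500:
        return ">2500"
    width = 50 if length <= 1000 else 100
    upper = -(-length // width) * width  # round up to the class boundary
    return f"{upper - width + 1}-{upper}"


# Counts copied from table A.3, left column then right column.
counts = [
    923, 1635, 2474, 1405, 1113, 960, 842, 815, 618, 678,
    532, 329, 243, 173, 174, 110, 101, 121, 60, 62,
    112, 72, 59, 39, 33, 16, 15, 13, 9, 15, 6, 18, 18, 10, 8, 26,
]
total = sum(counts)

print(size_class(5037), size_class(477), total)
```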
APPENDIX B: DOCUMENTATION
SWISS-PROT documentation consists of the following items:
USERMAN .TXT SWISS-PROT user manual
SP13_REL.TXT Release notes (this document)
SHORTDES.TXT Short description of entries in SWISS-PROT (this document
contains the same information as that available in the
catalog file (PROT_CAT.TXT) but it is formatted differently)
JOURLIST.TXT List of abbreviations for journals cited
KEYWLIST.TXT List of keywords in use
SPECLIST.TXT List of organism (species) identification codes
SPECODES.TXT List of sequence entry codes classified by species
ACINDEX .TXT Accession number index
AUTINDEX.TXT Author index
CITINDEX.TXT Citation index
ECINDEX .TXT Index of enzymes classified by their EC number
EMBLTOSP.TXT Index of the EMBL Data Library sequences referenced in
SWISS-PROT
PDBTOSP .TXT Index of Brookhaven PDB entries referenced in SWISS-PROT
- All these document files are available on the CD-ROM disk. They are
stored in the '\DOC_DBAS\SPROT' directory.
- Except for AUTINDEX.TXT and SHORTDES.TXT, all the other document
files are stored on the two SWISS-PROT documentation floppy disks.
- Some of these documents are also distributed in a printed form (see
table below).
Document        Documentation   Printed
                disk number     copy
----------------------------------------
USERMAN .TXT         1          Yes
SP13_REL.TXT         1          Yes
SHORTDES.TXT        N.A.        [*]
JOURLIST.TXT         1          Yes
KEYWLIST.TXT         1          Yes
SPECLIST.TXT         1          Yes
SPECODES.TXT         1          No
ACINDEX .TXT         1          No
AUTINDEX.TXT        N.A.        No
CITINDEX.TXT         2          No
ECINDEX .TXT         1          Yes
EMBLTOSP.TXT         1          No
PDBTOSP .TXT         1          Yes
[*] The content of the catalog file (PROT_CAT.TXT) is provided in
printed form. It contains the same information as SHORTDES.TXT, but it
is formatted differently.
APPENDIX C: FLOPPY DISK VERSION
C.1 IBM PC/AT 1.2 Mb disks
SWISS-PROT release 13 is stored on eighteen 1.2 Mb disks. Each one
contains a single bulk file (PRT13_01.BLK to PRT13_18.BLK):
Disk First sequence Last sequence
1 10KA$MYCTU ATCD$RAT
2 ATCE$PIG CATL$CHICK
3 CATL$HUMAN CRAM$CRAAB
4 CRB$DROME DRN1$SHEEP
5 DRNE$VIBCH FM19$BACNO
6 FM1A$ECOLI H3$CAEEL
7 H3$CHICK HPIS$RHOTE
8 HPIS$THIPF KABL$MOUSE
9 KAC$HUMAN LPID$EDWTA
10 LPIV$ECOLI NEF$HIV1R
11 NEF$HIV1S PEPA$HUMAN
12 PEPA$MACFU PRTS$HUMAN
13 PRTS$SERMA RL7$DICDI
14 RL7$ECOLI SST2$YEAST
15 ST12$YEAST TRY1$ECOLI
16 TRY1$HUMAN VIP$GADMO
17 VIP$GOAT YP3$CHLTR
18 YP4$CHLTR ZP3$MOUSE
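Given the table above, the disk holding a particular entry can be found with a binary search on the first entry name of each disk. The sketch below is an illustration only. It assumes that entries are ordered by plain character comparison of their names, which agrees with the boundaries listed in the table but is not stated in these notes, and the function names are ours.

```python
from bisect import bisect_right

# First sequence on each of the eighteen disks, copied from the table above.
FIRST_ENTRY_ON_DISK = [
    "10KA$MYCTU", "ATCE$PIG", "CATL$HUMAN", "CRB$DROME", "DRNE$VIBCH",
    "FM1A$ECOLI", "H3$CHICK", "HPIS$THIPF", "KAC$HUMAN", "LPIV$ECOLI",
    "NEF$HIV1S", "PEPA$MACFU", "PRTS$SERMA", "RL7$ECOLI", "ST12$YEAST",
    "TRY1$HUMAN", "VIP$GOAT", "YP4$CHLTR",
]


def disk_for_entry(name: str) -> int:
    """Return the number (1 to 18) of the disk expected to hold an entry."""
    disk = bisect_right(FIRST_ENTRY_ON_DISK, name)
    if disk == 0:
        raise ValueError(f"{name} sorts before the first entry on disk 1")
    return disk


def bulk_file(disk: int) -> str:
    """Return the bulk file name for a disk, e.g. PRT13_01.BLK."""
    return f"PRT13_{disk:02d}.BLK"


print(disk_for_entry("KABL$MOUSE"), bulk_file(disk_for_entry("KABL$MOUSE")))
```

Note that the lookup does not verify that the entry exists; a name that sorts after ZP3$MOUSE is simply assigned to disk 18.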
C.2 IBM PS/2 1.4 Mb disks
The number and content of the 1.4 Mb disks for the PS/2 systems are
identical to those of the 1.2 Mb disks (see above).
C.3 Catalog file
The SWISS-PROT catalog file for PC/Gene (PROT_CAT.TXT) is stored on two
disks (CATALOG disks 1 and 2). Insert the first disk in your floppy
drive and type: INSCAT. Follow the program instructions; you will be
prompted to insert the second disk once the contents of the first one
have been copied.
C.4 Documentation disks
There are two documentation disks. The content of these two disks is
described in appendix B.