ExPASy logo ExPASy Home page Site Map Search ExPASy Contact us Swiss-Prot
 Hosted by br flag LNCC Brazil Mirror sites: Australia  Canada  China  Korea  Switzerland


                    SWISS-PROT RELEASE 17.0 RELEASE NOTES


                               1. INTRODUCTION

   1.1  Evolution

   Release 17.0  of SWISS-PROT  contains 20024 sequence entries, comprising
   6'524'504 amino  acids abstracted from 19591 references. This represents
   an increase of 9% over release 16. The recent growth of the data bank is
   summarized below:

   Release    Date   Number of entries     Nb of amino acids

   3.0        11/86               4160               969 641
   4.0        04/87               4387             1 036 010
   5.0        09/87               5205             1 327 683
   6.0        01/88               6102             1 653 982
   7.0        04/88               6821             1 885 771
   8.0        08/88               7724             2 224 465
   9.0        11/88               8702             2 498 140
   10.0       03/89              10008             2 952 613
   11.0       07/89              10856             3 265 966
   12.0       10/89              12305             3 797 482
   13.0       01/90              13837             4 347 336
   14.0       04/90              15409             4 914 264
   15.0       08/90              16941             5 486 399
   16.0       11/90              18364             5 986 949
   17.0       02/91              20024             6 524 504
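
   The 9% growth quoted above can be checked against the last two rows of
   the table. The following is a minimal Python sketch; the function name
   percent_increase is ours and is not part of any SWISS-PROT tool.

```python
def percent_increase(old: int, new: int) -> float:
    """Return the relative growth from old to new, in percent."""
    return (new - old) / old * 100.0

# Figures copied from the table rows for releases 16.0 and 17.0.
entries_growth = percent_increase(18364, 20024)
residues_growth = percent_increase(5986949, 6524504)

print(f"entries:     {entries_growth:.1f}%")
print(f"amino acids: {residues_growth:.1f}%")
```

   Both the number of entries and the number of amino acids grew by about
   9% between the two releases.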


   1.2  Source of data

   Release 17.0  has been  updated using protein sequence data from release
   26.0 of  the PIR (Protein Identification Resource) protein data bank, as
   well as translation of nucleotide sequence data from release 25.0 of the
   EMBL Nucleotide Sequence Data Library.

   As an indication of the source of the sequence data in the SWISS-PROT
   data bank, we list here the statistics concerning the DR (Databank
   Reference) pointer lines:

   Entries with pointer(s) to PIR entries only:           3752
   Entries with pointer(s) to EMBL entries only:          3713
   Entries with pointer(s) to both EMBL and PIR entries: 12112
   Entries with no pointer lines:                          447
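
   The four categories are mutually exclusive, so they should add up to
   the total number of entries in the release. A short Python sketch of
   that check, which also gives the share of each category (the dictionary
   keys are our own labels):

```python
# Counts copied from the four lines above.
pointer_counts = {
    "PIR only": 3752,
    "EMBL only": 3713,
    "EMBL and PIR": 12112,
    "no pointers": 447,
}

total = sum(pointer_counts.values())
shares = {kind: 100.0 * n / total for kind, n in pointer_counts.items()}

print(total)
for kind, pct in shares.items():
    print(f"{kind:<14}{pct:5.1f}%")
```

   The total matches the 20024 entries reported for release 17.0.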




      2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 16


   2.1  Sequences and annotations

   About 1700 sequences have been added since release 16, the sequence data
   of 312 existing entries has been updated, and the annotations of 3750
   entries have been revised. In particular, we have used review articles
   to update the annotations of the following groups or families of
   proteins:

   -  6-phosphogluconate dehydrogenase
   -  Aconitase
   -  Alpha-2 macroglobulin family
   -  ATP synthase a subunit
   -  Catalases
   -  Chalcone resveratrol synthases
   -  Citrate synthase
   -  Dihydroorotase
   -  DNA polymerase family A
   -  Eukaryotic cobalamin-binding proteins
   -  Fatty acid desaturases
   -  Fungal  Zn(2)-Cys(6)   binuclear   cluster   domain   transcriptional
      activators
   -  Gamma-glutamyltranspeptidase
   -  Glutamine amidotransferases class-I
   -  Glutamine amidotransferases class-II
   -  Gonadotropin-releasing hormones
   -  Guanylate cyclases
   -  LIM-1 domain proteins
   -  Myotoxins
   -  Nucleoside diphosphate kinases
   -  Pathogenesis-related proteins BetvI family
   -  Peroxidases
   -  Polyprenyl synthetases
   -  Ribosomal proteins
   -  Rotamases (cyclophilin and FKBP)
   -  Small cytokines (PF4/IL-8 and MCAF/MIP-1 subfamilies)
   -  Sodium symporters
   -  Thiol-activated cytolysins


   2.2 Status of cross-references to PIR

   Older releases of SWISS-PROT contained cross-references only to entries
   present in the annotated section of PIR (currently known as PIR1); we
   have now started adding cross-references to entries in the unannotated
   sections of PIR (known as PIR2 and PIR3).



                            3. FORTHCOMING CHANGES

   3.1  New line-types: RC and RP

   We plan  to implement the following change in release 19; the current RN
   line will  be replaced  by three  line types:  a modified  RN (Reference
   Number) line  type containing  just  the  reference  number,  a  new  RC
   (Reference Comment)  line  type  containing  comments  relevant  to  the
   reference (strain, tissue, etc.), and a new RP (Reference Position) line
   type containing  the extent of the sequencing carried out by the authors
   of the  reference. Three  examples of  the usage  of these new lines are
   given below.

      RN   [1]
      RC   STRAIN=K12;
      RP   SEQUENCE FROM N.A., AND SEQUENCE OF 1-23.

      RN   [1]
      RC   STRAIN=BALB/C; TISSUE=BRAIN;
      RP   SEQUENCE OF 24-56 AND 67-89.

      RN   [2]
      RC   X-RAY CRYSTALLOGRAPHY=1.8 ANGSTROMS;


   Each reference block will continue to have exactly one RN line. As many
   RC lines as are needed to display the reference's comments will appear;
   if a reference has no comment, no RC line will appear. Likewise, as many
   RP lines as are needed to display the extent of sequencing carried out
   by the authors of the reference will appear; if a reference does not
   pertain directly to sequencing data, no RP line will appear.
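
   To illustrate how a program might read the planned layout, here is a
   minimal Python sketch. It assumes only what the examples above show: a
   two-letter line code, three spaces, then the data, with RC data written
   as KEY=VALUE; tokens. The function name and the shape of the returned
   dictionary are our own choices, not part of the format.

```python
import re

def parse_reference_block(lines):
    """Parse the RN/RC/RP lines of one reference block into a dict."""
    ref = {"number": None, "comments": {}, "position": ""}
    for line in lines:
        code, data = line[:2], line[5:].strip()
        if code == "RN":
            # The modified RN line holds just the bracketed number.
            ref["number"] = int(re.fullmatch(r"\[(\d+)\]", data).group(1))
        elif code == "RC":
            # Several KEY=VALUE; tokens may share one line.
            for token in filter(None, (t.strip() for t in data.split(";"))):
                key, _, value = token.partition("=")
                ref["comments"][key] = value
        elif code == "RP":
            # Several RP lines are joined into one description.
            ref["position"] = (ref["position"] + " " + data).strip()
    return ref

example = [
    "RN   [1]",
    "RC   STRAIN=BALB/C; TISSUE=BRAIN;",
    "RP   SEQUENCE OF 24-56 AND 67-89.",
]
parsed = parse_reference_block(example)
print(parsed)
```

   A block without RC or RP lines simply yields an empty comment dictionary
   or an empty position string.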

   3.2  New line-types: CA and CF

   As we announced in the last two release notes, starting with release 18,
   the enzyme entries in SWISS-PROT will have two new line-types:

      CA   Description_of_catalytic_activity.
      CF   Description_of_cofactor.

   These lines will be automatically generated at each release of
   SWISS-PROT from the information stored in the ENZYME data bank. They
   will replace the 'CATALYTIC ACTIVITY' and 'COFACTOR' topics of the
   comment (CC) lines. Example:

      CC   -!- CATALYTIC ACTIVITY: L-ASPARTATE + 2-OXOGLUTARATE =
      CC       OXALOACETATE + L-GLUTAMATE.
      CC   -!- COFACTOR: PYRIDOXAL PHOSPHATE.

   will be changed to:

      CA   L-ASPARTATE + 2-OXOGLUTARATE = OXALOACETATE + L-GLUTAMATE.
      CF   PYRIDOXAL PHOSPHATE.
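
   The new lines will be generated from the ENZYME data bank, not derived
   from the existing comments. The Python sketch below therefore only
   illustrates the textual correspondence between the two layouts, which
   may be useful to users adapting their parsers. The function name is
   hypothetical, and continuation lines are assumed to be any lines that
   do not start a new '-!-' topic.

```python
def convert_enzyme_comments(cc_lines):
    """Rewrite CATALYTIC ACTIVITY / COFACTOR topics as CA / CF lines."""
    new_codes = {"CATALYTIC ACTIVITY": "CA", "COFACTOR": "CF"}
    topics = []  # [topic, text] pairs in order of appearance
    for line in cc_lines:
        body = line.strip()
        if body.startswith("CC "):
            body = body[3:].strip()
        if body.startswith("-!-"):
            topic, _, text = body[3:].partition(":")
            topics.append([topic.strip(), text.strip()])
        elif topics:
            # Continuation line: append to the current topic.
            topics[-1][1] += " " + body
    # Topics other than the two enzyme topics are dropped from the output.
    return [f"{new_codes[t]}   {text}" for t, text in topics if t in new_codes]

old_lines = [
    "CC   -!- CATALYTIC ACTIVITY: L-ASPARTATE + 2-OXOGLUTARATE =",
    "CC       OXALOACETATE + L-GLUTAMATE.",
    "CC   -!- COFACTOR: PYRIDOXAL PHOSPHATE.",
]
new_lines = convert_enzyme_comments(old_lines)
for line in new_lines:
    print(line)
```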



   3.3  Change in the OS line

   As we announced in the last two release notes, starting with release 18,
   we will invert the order of the information in the OS line. The current
   format is 'English common name (Latin name)'; the new format will be
   'Latin name (English common name)'. Example:

      OS   HUMAN (HOMO SAPIENS).

   will be changed to:

      OS   HOMO SAPIENS (HUMAN).
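
   Software that stores OS lines in the current order can apply the same
   inversion. The Python sketch below assumes an OS line with exactly one
   parenthesised name, as in the example above; the pattern and function
   name are ours. Lines without parentheses are returned unchanged.

```python
import re

# Current layout: "OS   <common name> (<Latin name>)."
OS_PATTERN = re.compile(r"^OS   (?P<common>[^()]+?) \((?P<latin>[^()]+)\)\.$")

def invert_os_line(line: str) -> str:
    """Swap the common and Latin names of a current-style OS line."""
    m = OS_PATTERN.match(line)
    if m is None:
        return line  # nothing in parentheses, so nothing to swap
    return f"OS   {m['latin']} ({m['common']})."

print(invert_os_line("OS   HUMAN (HOMO SAPIENS)."))
```

   Note that this simple pattern cannot tell a Latin name from other
   parenthesised information such as a strain designation, so such lines
   would need separate handling.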


                            4. ENZYME AND PROSITE

   4.1  The ENZYME data bank

   Release 4.0 of the ENZYME data bank is distributed along with release 17
   of SWISS-PROT. ENZYME release 4.0 contains information on 3072 enzymes.
   The data bank is complete and up to date. Until new enzyme nomenclature
   data are published, we only plan to update the SWISS-PROT pointers at
   each release of the protein sequence data bank, correct any errors, and
   complete the information concerning synonyms and cofactors using the
   literature.

   4.2  The PROSITE data bank

   Release 6.1 of the PROSITE data bank is distributed along with release
   17 of SWISS-PROT. PROSITE release 6.1 is not really a new release; the
   only change between releases 6.0 and 6.1 is the updating of the
   pointers to the SWISS-PROT entries whose names were modified between
   releases 16 and 17. The next release of PROSITE (7.0) will be
   distributed with release 18.0 of SWISS-PROT.


                            5. WE NEED YOUR HELP!

   We welcome feedback from our users. We would especially appreciate being
   notified if you find that sequences belonging to your field of expertise
   are missing from the data bank. We would also like to be notified about
   annotations that need updating, for example if the function of a protein
   has been clarified or if new post-translational information has become
   available.



                         APPENDIX A: SOME STATISTICS



   A.1  Amino acid composition

        A.1.1  Composition in percent for the complete data bank

   Ala (A) 7.64   Gln (Q) 4.09   Leu (L) 9.09   Ser (S) 7.10
   Arg (R) 5.24   Glu (E) 6.28   Lys (K) 5.86   Thr (T) 5.86
   Asn (N) 4.44   Gly (G) 7.12   Met (M) 2.31   Trp (W) 1.30
   Asp (D) 5.24   His (H) 2.27   Phe (F) 3.95   Tyr (Y) 3.21
   Cys (C) 1.83   Ile (I) 5.44   Pro (P) 5.10   Val (V) 6.48

   Asx (B) 0.01   Glx (Z) 0.01   Xaa (X) 0.03


        A.1.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Ser, Val, Glu, Thr, Lys, Ile, Asp, Arg, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Cys, Trp
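
   The ranking above follows from sorting the percentages in A.1.1. A
   short Python sketch of that step is given below. Because the table is
   rounded to two decimals, the tied pairs (Thr/Lys at 5.86 and Asp/Arg
   at 5.24) cannot be resolved from these figures alone and may come out
   in either order.

```python
# Percentages copied from table A.1.1 (standard amino acids only).
composition = {
    "Ala": 7.64, "Arg": 5.24, "Asn": 4.44, "Asp": 5.24, "Cys": 1.83,
    "Gln": 4.09, "Glu": 6.28, "Gly": 7.12, "His": 2.27, "Ile": 5.44,
    "Leu": 9.09, "Lys": 5.86, "Met": 2.31, "Phe": 3.95, "Pro": 5.10,
    "Ser": 7.10, "Thr": 5.86, "Trp": 1.30, "Tyr": 3.21, "Val": 6.48,
}

# Sort residue names by decreasing frequency.
ranked = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranked))
```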



   A.2  Distribution of the sequences by organism of origin

   Total number of species represented in this release of SWISS-PROT: 2630

        A.2.1 Table of the frequency of occurrence of species

        Species represented 1x: 1138
                            2x:  476
                            3x:  257
                            4x:  166
                            5x:  116
                            6x:   87
                            7x:   69
                            8x:   43
                            9x:   61
                           10x:   16
                       11- 20x:   98
                       21-100x:   81
                         >100x:   22



         A.2.2  Table of the most represented species

    Number   Frequency          Species
         1        1659          Human
         2        1376          Escherichia coli
         3         940          Mouse
         4         871          Rat
         5         643          Baker's yeast (Saccharomyces cerevisiae)
         6         453          Bovine
         7         376          Fruit fly (Drosophila melanogaster)
         8         331          Chicken
         9         252          Bacillus subtilis
        10         246          Rabbit
        11         236          African clawed frog (Xenopus laevis)
        12         232          Vaccinia virus (strain Copenhagen)
        13         220          Pig
        14         190          Human cytomegalovirus (strain AD169)
        15         176          Salmonella typhimurium
        16         160          Bacteriophage T4
        17         142          Maize
        18         124          Rice
        19         111          Tobacco
        20         108          Vaccinia virus (strain WR)
        21         104          Pea
        22         102          Wheat
        23          96          Staphylococcus aureus
        24          91          Slime mold (Dictyostelium discoideum)
        25          86          Barley
        26          85          Sheep
        27          84          Liverwort (Marchantia polymorpha)
        28          83          Soybean
        29          82          Spinach
        30          73          Caenorhabditis elegans
                    73          Neurospora crassa



   A.3  Distribution of the sequences by size

               From   To  Number             From   To   Number
                  1-  50    1174             1001-1100      162
                 51- 100    2099             1101-1200      105
                101- 150    3021             1201-1300       88
                151- 200    1820             1301-1400       52
                201- 250    1468             1401-1500       45
                251- 300    1316             1501-1600       20
                301- 350    1147             1601-1700       22
                351- 400    1131             1701-1800       20
                401- 450     867             1801-1900       22
                451- 500     959             1901-2000       21
                501- 550     718             2001-2100        9
                551- 600     475             2101-2200       22
                601- 650     336             2201-2300       24
                651- 700     258             2301-2400       11
                701- 750     243             2401-2500       11
                751- 800     183             >2500           37
                801- 850     144
                851- 900     150
                901- 950      95
                951-1000      89
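
   The table uses classes of 50 residues up to 1000, classes of 100
   residues from 1001 to 2500, and a single open class above 2500. The
   Python sketch below shows one way to assign a sequence length to its
   class; the function name and the label format are our own.

```python
def size_bin(length: int) -> str:
    """Return the label of the table class that a sequence length falls in."""
    if length < 1:
        raise ValueError("sequence length must be positive")
    if length > 2500:
        return ">2500"
    # Classes are 50 residues wide up to 1000 and 100 wide above that.
    width = 50 if length <= 1000 else 100
    low = (length - 1) // width * width + 1
    return f"{low}-{low + width - 1}"

for n in (23, 1000, 1001, 5037):
    print(n, size_bin(n))
```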



   Currently the ten largest sequences are:

                            RYNR$RABIT  5037 a.a.
                            APB$HUMAN   4563 a.a.
                            APOA$HUMAN  4548 a.a.
                            POLG$BVDV   3988 a.a.
                            POLG$HCVA   3898 a.a.
                            TRX$DROME   3759 a.a.
                            ACVA$PENCH  3746 a.a.
                            DMD$HUMAN   3685 a.a.
                            DMD$CHICK   3660 a.a.
                            POLG$KUNJM  3433 a.a.
