ExPASy logo ExPASy Home page Site Map Search ExPASy Contact us Swiss-Prot
 Hosted by br flag LNCC Brazil Mirror sites: Australia  Canada  China  Korea  Switzerland

             SWISS-PROT RELEASE 12.0 RELEASE NOTES


   Date:     October 14, 1989
   Author:   A. Bairoch


                         1. INTRODUCTION

   1.1  Evolution

   Release 12.0 of SWISS-PROT contains 12305 sequence entries, comprising
   3 797 482 amino acids abstracted from 12147 references. This represents
   an increase of 16% in the number of amino acids (13% in the number of
   entries) over release 11.0. The recent growth of the data bank is
   summarized below:

   Release    Date   Number of entries     Nb of amino acids

   3.0        11/86               4160               969 641
   4.0        04/87               4387             1 036 010
   5.0        09/87               5205             1 327 683
   6.0        01/88               6102             1 653 982
   7.0        04/88               6821             1 885 771
   8.0        08/88               7724             2 224 465
   9.0        11/88               8702             2 498 140
   10.0       03/89              10008             2 952 613
   11.0       07/89              10856             3 265 966
   12.0       10/89              12305             3 797 482
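
   The growth between two releases can be recomputed from the rows of the
   table. The following Python sketch is only an illustration; the names
   RELEASES and growth_percent are hypothetical and are not part of the
   SWISS-PROT distribution.

```python
# Figures copied from the table above: (release, entries, amino acids).
RELEASES = [
    ("11.0", 10856, 3265966),
    ("12.0", 12305, 3797482),
]


def growth_percent(old, new):
    """Return the relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


(_, prev_entries, prev_aa), (_, cur_entries, cur_aa) = RELEASES
entry_growth = growth_percent(prev_entries, cur_entries)
aa_growth = growth_percent(prev_aa, cur_aa)
print(f"entries: +{entry_growth:.1f}%  amino acids: +{aa_growth:.1f}%")
```

   The two figures differ because the average length of the sequences also
   changes from one release to the next.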


   1.2  Source of data

   Release 12.0  has been  updated using protein sequence data from release
   21.0 of  the PIR (Protein Identification Resource) protein data bank, as
   well as translation of nucleotide sequence data from release 20.0 of the
   EMBL Nucleotide Sequence Data Library.

   As an indication of the source of the sequence data in the SWISS-PROT
   data bank, we list here the statistics concerning the DR (Databank
   Reference) pointer lines:

   Entries with pointer(s) to PIR entries only:            3125
   Entries with pointer(s) to EMBL entries only:           4873
   Entries with pointer(s) to both EMBL and PIR entries:   3575
   Entries with no pointer lines (entered in house):        732
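
   As a rough illustration of how such a tally could be made, the Python
   sketch below sorts one entry into the four categories above. It assumes
   that each DR line has the layout "DR   BANK; identifier; ..." and that
   entries pointing only to other data banks count as having no pointers;
   both points are assumptions, and the function name and the identifiers
   in the example are hypothetical.

```python
def pointer_category(entry_lines):
    """Classify one entry by the data banks named on its DR lines."""
    banks = set()
    for line in entry_lines:
        if line.startswith("DR   "):
            # Assumed layout: "DR   BANK; identifier; ...".
            banks.add(line[5:].split(";")[0].strip())
    has_pir = "PIR" in banks
    has_embl = "EMBL" in banks
    if has_pir and has_embl:
        return "both"
    if has_pir:
        return "PIR only"
    if has_embl:
        return "EMBL only"
    return "none"


# Counts reported above; together they account for every entry.
REPORTED = {"PIR only": 3125, "EMBL only": 4873, "both": 3575, "none": 732}
total = sum(REPORTED.values())

# Hypothetical entry with made-up identifiers, for illustration only.
example = ["DR   PIR; A00001; EXAMPLE.", "DR   EMBL; X00001; EXAMPLE."]
print(pointer_category(example), total)
```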



      2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 11

   2.1  Sequences and annotations

   Some 1466 new sequences have been added since the last release, the
   sequence data of 173 existing entries has been updated, and the
   annotations of 2400 entries have been revised. In particular, we have
   used review articles to update the annotations of the following groups
   or families of proteins:

      Acyl carrier proteins
      Aminoacyl-transfer RNA synthetases
      Biotin-requiring enzymes
      Chloroplast photosystems I and II proteins
      Creatine kinases
      Crp bacterial activator proteins
      Enolases
      Glucose-6-phosphate dehydrogenases
      Glutaredoxins
      GTP-binding elongation factors
      Heat shock hsp90 proteins
      Insulin family proteins
      Insulin-like growth factor binding proteins
      Insect-type alcohol dehydrogenases / ribitol dehydrogenase family
      Integrins
      Iron-containing alcohol dehydrogenases
      LysR bacterial activator proteins
      Malate dehydrogenase
      Mammalian defensins
      Mitochondrial energy transfer proteins
      Myc-type proteins
      Phosphoglucose isomerases
      Phosphoglycerate kinases
      Serine/threonine specific protein phosphatases
      Sugar transporters
      Uracil-DNA glycosylases
      Vertebrate galactoside-binding lectins
      Zinc-containing alcohol dehydrogenases


   2.2  New line-type

   This release introduces a new type of data line, the OG line. The OG
   (OrGanelle) lines indicate whether the gene coding for a protein
   originates from the mitochondria, the chloroplast, or a plasmid. The
   format of the OG line is:

   OG   CHLOROPLAST.
   OG   MITOCHONDRION.
   OG   PLASMID name.

   Where 'name' is the name of the plasmid.

   Previously this  information was  stored in the OS line, as shown in the
   example below.

   OS   WHEAT (TRITICUM AESTIVUM) CHLOROPLAST.

   The above example will now be stored as:

   OS   WHEAT (TRITICUM AESTIVUM).
   OG   CHLOROPLAST.
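
   Software that still holds entries in the old layout could apply the same
   split. The Python sketch below is a minimal illustration, assuming that
   the organelle name is the last word of the old OS line and that the line
   code is followed by three spaces, as in the example above. Plasmids are
   not handled, since their old representation is not shown here. The name
   split_os_line is hypothetical.

```python
ORGANELLES = ("CHLOROPLAST", "MITOCHONDRION")


def split_os_line(line):
    """Split an old-style OS line into an OS line and an OG line.

    Lines that do not end with a known organelle are returned unchanged.
    """
    body = line[5:].rstrip().rstrip(".")
    for organelle in ORGANELLES:
        if body.endswith(" " + organelle):
            species = body[: -len(organelle)].rstrip()
            return [f"OS   {species}.", f"OG   {organelle}."]
    return [line.rstrip()]


for new_line in split_os_line("OS   WHEAT (TRITICUM AESTIVUM) CHLOROPLAST."):
    print(new_line)
```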


   2.3  New topic for the comments (CC) line type

   As of release 12 we have added a new 'topic' for the comments (CC) line-
   type: CAUTION,  which is  used to  warn  about  possible  errors  and/or
   grounds for confusion. Example of its usage:

     CC   -!- CAUTION: ALSO SEE VERSION 2 OF THIS PROTEIN THAT DIFFERS DUE
     CC       TO A FRAMESHIFT.
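
   A program reading the comments can pick out the topic as sketched below
   in Python. The sketch assumes that every comment block starts with
   '-!-' followed by 'TOPIC:' and that the following CC lines continue the
   same block, as in the example above. The name parse_cc_topics is
   hypothetical.

```python
def parse_cc_topics(lines):
    """Collect (topic, text) pairs from CC lines, joining continuations."""
    blocks = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("CC"):
            continue
        content = line[2:].strip()
        if content.startswith("-!-"):
            # A new block: the topic is everything before the first colon.
            topic, _, text = content[3:].partition(":")
            blocks.append([topic.strip(), text.strip()])
        elif blocks:
            # A continuation line of the current block.
            blocks[-1][1] += " " + content
    return [(topic, text) for topic, text in blocks]


example = [
    "CC   -!- CAUTION: ALSO SEE VERSION 2 OF THIS PROTEIN THAT DIFFERS DUE",
    "CC       TO A FRAMESHIFT.",
]
cautions = [text for topic, text in parse_cc_topics(example) if topic == "CAUTION"]
print(cautions)
```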


   2.4  Documentation changes

   -  ACINDEX.TXT is  a new  document file  which is  an index  of all  the
      accession numbers  which appear  in SWISS-PROT  and the  name of  the
      entries in which they occur.
   -  PDBTOSP.TXT is  a new  document file  which is  an index  of all  the
      Brookhaven PDB entries referenced in SWISS-PROT.
   -  The JOURLIST.TXT document now gives both the abbreviation and the
      full name of every journal cited in SWISS-PROT.


                             3. THE NEXT RELEASE

   SWISS-PROT release 13.0 will be available in January 1990.

   Starting with  release 13 SWISS-PROT will be distributed with PROSITE, a
   data bank  of sites  and patterns  in proteins.  Both data banks will be
   fully cross-referenced.



                            4. WE NEED YOUR HELP!

   We welcome any feedback from our users. We would especially appreciate
   being notified if sequences belonging to your field of expertise are
   missing from the data bank. We would also like to be notified about
   annotations that need updating, for example if the function of a
   protein has been clarified or if new post-translational information
   has become available.



                         APPENDIX A: SOME STATISTICS

   A.1  Amino acid composition

        A.1.1  Composition in percent for the complete data bank

   Ala (A) 7.70   Gln (Q) 4.10   Leu (L) 9.12   Ser (S) 7.03
   Arg (R) 5.21   Glu (E) 6.23   Lys (K) 5.85   Thr (T) 5.85
   Asn (N) 4.39   Gly (G) 7.21   Met (M) 2.29   Trp (W) 1.34
   Asp (D) 5.21   His (H) 2.27   Phe (F) 3.95   Tyr (Y) 3.21
   Cys (C) 1.85   Ile (I) 5.38   Pro (P) 5.14   Val (V) 6.50

   Asx (B) 0.01   Glx (Z) 0.01   Xaa (X) 0.03


        A.1.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Ser, Val, Glu, Thr = Lys, Ile, Arg = Asp, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Cys, Trp
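
   This ordering follows directly from the percentages in A.1.1. The Python
   sketch below shows one way to derive it, grouping residues whose
   percentages are equal; the names COMPOSITION and rank_by_frequency are
   hypothetical.

```python
# Percentages copied from section A.1.1 (the 20 standard amino acids).
COMPOSITION = {
    "Ala": 7.70, "Gln": 4.10, "Leu": 9.12, "Ser": 7.03,
    "Arg": 5.21, "Glu": 6.23, "Lys": 5.85, "Thr": 5.85,
    "Asn": 4.39, "Gly": 7.21, "Met": 2.29, "Trp": 1.34,
    "Asp": 5.21, "His": 2.27, "Phe": 3.95, "Tyr": 3.21,
    "Cys": 1.85, "Ile": 5.38, "Pro": 5.14, "Val": 6.50,
}


def rank_by_frequency(composition):
    """Return groups of residue names, most frequent first; ties share a group."""
    ordered = sorted(composition.items(), key=lambda item: item[1], reverse=True)
    groups = []
    previous = None
    for name, percent in ordered:
        if groups and percent == previous:
            groups[-1].append(name)
        else:
            groups.append([name])
        previous = percent
    return groups


groups = rank_by_frequency(COMPOSITION)
print(", ".join(" = ".join(group) for group in groups))
```

   The order of the names inside a tied group is arbitrary.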



   A.2  Distribution of the sequences by organism of origin

   Total number of species represented in this release of SWISS-PROT: 1841

        A.2.1 Table of the frequency of occurrence of species

        Species represented 1x: 834
                            2x: 345
                            3x: 180
                            4x: 117
                            5x:  73
                            6x:  60
                            7x:  27
                            8x:  29
                            9x:  38
                           10x:  16
                       11- 20x:  61
                       21-100x:  49
                         >100x:  12



        A.2.2  Table of the most represented species

    Number   Frequency          Species
         1        1093          Human
         2         951          Escherichia coli
         3         621          Mouse
         4         519          Rat
         5         397          Baker's yeast (Saccharomyces cerevisiae)
         6         337          Bovine
         7         210          Fruit fly (Drosophila melanogaster)
         8         205          Chicken
         9         174          Rabbit
        10         149          Pig
        11         133          Bacillus subtilis
        12         113          African clawed frog (Xenopus laevis)
        13          98          Tobacco
        14          96          Salmonella typhimurium
        15          94          Maize
        16          89          Rice
        17          84          Liverwort (Marchantia polymorpha)
        18          80          Bacteriophage T4
        19          77          Wheat
        20          70          Herpes virus (Type 1, Strain 17)
        21          69          Spinach
        22          68          Vaccinia Virus
        23          67          Varicella-Zoster virus (Strain Dumas)
        24          63          Soybean
        25          62          Bacteriophage Lambda



   A.3  Distribution of the sequences by size

      From   To  Number             From   To   Number
         1-  50     782             1001-1100       96
        51- 100    1520             1101-1200       63
       101- 150    2297             1201-1300       51
       151- 200    1257             1301-1400       31
       201- 250     978             1401-1500       22
       251- 300     827             1501-1600       13
       301- 350     728             1601-1700       15
       351- 400     708             1701-1800       11
       401- 450     540             1801-1900        9
       451- 500     615             1901-2000       14
       501- 550     476                 >2000       68
       551- 600     282
       601- 650     215
       651- 700     156
       701- 750     130
       751- 800      93
       801- 850      93
       851- 900     112
       901- 950      51
       951-1000      52


   Currently the three largest sequences are:

   RYNR$RABIT  5037 a.a.
   APB$HUMAN   4563 a.a.
   APOA$HUMAN  4548 a.a.



                       APPENDIX B: DISKS FOR SWISS-PROT

   B.1  IBM PC/AT 1.2 Mb disks

   SWISS-PROT release 12 is stored on sixteen 1.2 Mb disks. Each of these
   disks contains a single bulk file (PRT12_01.BLK to PRT12_16.BLK):

   Disk     First sequence        Last Sequence
    1       10K5$ECOLI            ATP6$YEAST
    2       ATP8$ASPAM            CHLN$ECOLI
    3       CHOA$STRSP            CYC$MIRLE
    4       CYC$MOUSE             FA10$HUMAN
    5       FA11$HUMAN            GTA1$RAT
    6       GTA2$RAT              HMEN$DROME
    7       HMEN$DROVI            KAD1$HUMAN
    8       KAD1$PIG              M4$DICDI
    9       M5$ECOLI              NRAM$INACR
   10       NRAM$INADA            POL$HIV2I
   11       POL$HIV2N             RBS4$LYCES
   12       RBS4$SOYBN            SMS1$HUMAN
   13       SMS1$ICTPU            TRPC$ACICA
   14       TRPC$ASPNG            VIP$HUMAN
   15       VIP$PIG               YU74$ECOLI
   16       YVL1$HCMVA            ZP3$MOUSE


   B.2  IBM PS/2 1.4 Mb disks

   The number and content of the 1.4 Mb disks for the PS/2 systems are
   identical to those of the 1.2 Mb disks (see above).
