UniProt Knowledgebase
Swiss-Prot Protein Knowledgebase
TrEMBL Protein Database

Release notes
UniProtKB release 9.0 of 31-Oct-2006

Content

  Introduction
  UniProtKB/Swiss-Prot Protein Knowledgebase release statistics
  UniProtKB/TrEMBL Protein Database release statistics

  Submissions and Updates
  Download information
  Contact
  Citation

  Related documents: UniProtKB user manual, Recent changes, Forthcoming changes.

Introduction

Release 9.0 of the UniProt Knowledgebase is composed of the UniProtKB/Swiss-Prot Protein Knowledgebase release 51.0 and the UniProtKB/TrEMBL Protein Database release 34.0.

More information on these databases can be found in the user manual section "What is the UniProt Knowledgebase?".


UniProtKB/Swiss-Prot Protein Knowledgebase release 51.0 statistics

Release 51.0 of 31-Oct-06 of UniProtKB/Swiss-Prot contains 241'242 sequence entries, comprising 88'541'632 amino acids abstracted from 148'048 references.

The growth of the database is summarized below.

Release  Date   Number of entries  Number of amino acids
-------  -----  -----------------  ---------------------
    2.0  09/86              3'939                900'163
    3.0  11/86              4'160                969'641
    4.0  04/87              4'387              1'036'010
    5.0  09/87              5'205              1'327'683
    6.0  01/88              6'102              1'653'982
    7.0  04/88              6'821              1'885'771
    8.0  08/88              7'724              2'224'465
    9.0  11/88              8'702              2'498'140
   10.0  03/89             10'008              2'952'613
   11.0  07/89             10'856              3'265'966
   12.0  10/89             12'305              3'797'482
   13.0  01/90             13'837              4'347'336
   14.0  04/90             15'409              4'914'264
   15.0  08/90             16'941              5'486'399
   16.0  11/90             18'364              5'986'949
   17.0  02/91             20'024              6'524'504
   18.0  05/91             20'772              6'792'034
   19.0  08/91             21'795              7'173'785
   20.0  11/91             22'654              7'500'130
   21.0  03/92             23'742              7'866'596
   22.0  05/92             25'044              8'375'696
   23.0  08/92             26'706              9'011'391
   24.0  12/92             28'154              9'545'427
   25.0  04/93             29'955             10'214'020
   26.0  07/93             31'808             10'875'091
   27.0  10/93             33'329             11'484'420
   28.0  02/94             36'000             12'496'420
   29.0  06/94             38'303             13'464'008
   30.0  10/94             40'292             14'147'368
   31.0  02/95             43'470             15'335'248
   32.0  11/95             49'340             17'385'503
   33.0  02/96             52'205             18'531'384
   34.0  10/96             59'021             21'210'389
   35.0  11/97             69'113             25'083'768
   36.0  07/98             74'019             26'840'295
   37.0  12/98             77'977             28'268'293
   38.0  07/99             80'000             29'085'965
   39.0  05/00             86'593             31'411'114
   40.0  10/01            101'602             37'315'215
   41.0  02/03            122'564             44'986'459
   42.0  10/03            135'850             50'046'799
   43.0  03/04            146'720             54'093'154
   44.0  07/04            153'871             56'608'159
   45.0  10/04            163'235             59'631'787
   46.0  02/05            168'297             61'443'278
   47.0  05/05            181'577             65'746'672
   48.0  09/05            194'317             70'391'852
   49.0  02/06            207'132             75'438'310
   50.0  05/06            222'289             81'585'146
   51.0  10/06            241'242             88'541'632
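
The growth table writes counts with apostrophes as thousands separators. As a minimal sketch (the helper names `parse_count` and `parse_growth_row` are illustrative and not part of any UniProt tool), the rows can be parsed and the net growth between two releases computed as follows. The net change in entry count need not equal the number of sequences added, since entries can also be removed.

```python
def parse_count(text: str) -> int:
    """Convert a count written with apostrophe separators, e.g. "241'242"."""
    return int(text.replace("'", ""))


def parse_growth_row(row: str) -> tuple[str, str, int, int]:
    """Split one table row into (release, date, entries, amino acids)."""
    release, date, entries, residues = row.split()
    return release, date, parse_count(entries), parse_count(residues)


# The last two rows of the growth table, copied as whitespace-separated text.
rows = [
    "50.0 05/06 222'289 81'585'146",
    "51.0 10/06 241'242 88'541'632",
]
previous, current = (parse_growth_row(r) for r in rows)

net_entries = current[2] - previous[2]
net_residues = current[3] - previous[3]
print(f"Net growth {previous[0]} -> {current[0]}: "
      f"{net_entries} entries, {net_residues} amino acids")
```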

In rare cases, UniProtKB/Swiss-Prot entries are removed. Deleted entries are almost exclusively Open Reading Frames (ORFs) that have been wrongly predicted to code for proteins. When there is enough evidence that these hypothetical proteins are not real, we remove them from UniProtKB/Swiss-Prot. The document delac_sp.txt lists all accession numbers that were previously present in UniProtKB/Swiss-Prot but have since been deleted from the database.


Status of the model organisms

We have selected a number of organisms that are the target of genome sequencing and/or mapping projects and for which we intend to:

From our efforts to annotate human sequence entries as completely as possible arose the HPI project, and the bacterial model organisms became the focus of the HAMAP project. Here is the current status of the model organisms which are not covered by these two projects:

Organism        Database cross-references  Index file    Number of sequences
--------------  -------------------------  ------------  -------------------
A.thaliana      TAIR                       arath.txt                   4'551
C.albicans      None yet                   calbican.txt                  572
C.elegans       Wormpep                    celegans.txt                2'966
D.discoideum    DictyBase                  dicty.txt                     332
D.melanogaster  FlyBase                    fly.txt                     2'436
M.musculus      MGD                        mgdtosp.txt                11'897
S.cerevisiae    SGD                        yeast.txt                   5'916
S.pombe         GeneDB_SPombe              pombe.txt                   3'082

UniProtKB/Swiss-Prot release statistics

1.  INTRODUCTION

Release 51.0 of 31-Oct-06 of UniProtKB/Swiss-Prot contains 241242 sequence entries,
comprising 88541632 amino acids abstracted from 148048 references.

19061 sequences have been added since release 50.0, the sequence data of
1336 existing entries has been updated and the annotations of
222181 entries have been revised.


2.  AMINO ACID COMPOSITION

   2.1  Composition in percent for the complete database

   Ala (A) 7.89   Gln (Q) 3.95   Leu (L) 9.65   Ser (S) 6.82
   Arg (R) 5.40   Glu (E) 6.67   Lys (K) 5.92   Thr (T) 5.41
   Asn (N) 4.13   Gly (G) 6.96   Met (M) 2.38   Trp (W) 1.13
   Asp (D) 5.35   His (H) 2.29   Phe (F) 3.96   Tyr (Y) 3.03
   Cys (C) 1.50   Ile (I) 5.90   Pro (P) 4.83   Val (V) 6.73

   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.00


   2.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Ser, Val, Glu, Lys, Ile, Thr, Arg, Asp, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Cys, Trp
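
As an illustration of how figures like those in sections 2.1 and 2.2 can be derived, the sketch below counts residues over a set of sequences, converts the counts to percentages, and ranks the residues by frequency. The function names and the toy sequences are hypothetical; this is not the procedure used to produce the release statistics.

```python
from collections import Counter


def composition_percent(sequences):
    """Return {one-letter code: percent of all residues} over all sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def rank_by_frequency(percentages):
    """Return the one-letter codes sorted from most to least frequent."""
    ordered = sorted(percentages.items(), key=lambda item: item[1], reverse=True)
    return [aa for aa, _ in ordered]


# Toy input, not real database content.
toy_sequences = ["MKLLA", "ALLG"]
pct = composition_percent(toy_sequences)
ranking = rank_by_frequency(pct)
for aa in ranking:
    print(f"{aa} {pct[aa]:.2f}")
```

Applied to the percentages in section 2.1, the same ranking step reproduces the order given in section 2.2, starting with Leu (9.65) and ending with Trp (1.13).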


3.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/Swiss-Prot: 10671

   The first twenty species represent 76403 sequences:  31.7 % of the total
   number of entries.


   3.1 Table of the frequency of occurrence of species

        Species represented 1x: 5274
                            2x: 1616
                            3x:  788
                            4x:  486
                            5x:  340
                            6x:  295
                            7x:  202
                            8x:  177
                            9x:  156
                           10x:   80
                       11- 20x:  416
                       21- 50x:  340
                       51-100x:  133
                         >100x:  368
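
Table 3.1 groups species by how many entries each contributes. A minimal sketch of that binning is shown below, assuming a mapping from species name to entry count; the names `BINS`, `bin_label`, and `frequency_table` and the toy species are illustrative. The last lines check that the published class counts add up to the 10671 species reported above.

```python
from collections import Counter

# (low, high, label); high=None means "no upper bound".
BINS = [(n, n, f"{n}x") for n in range(1, 11)] + [
    (11, 20, "11-20x"),
    (21, 50, "21-50x"),
    (51, 100, "51-100x"),
    (101, None, ">100x"),
]


def bin_label(count):
    """Return the label of the frequency class that contains `count`."""
    for low, high, label in BINS:
        if count >= low and (high is None or count <= high):
            return label
    raise ValueError(f"count must be a positive integer, got {count!r}")


def frequency_table(entries_per_species):
    """Count how many species fall into each frequency class."""
    table = Counter(bin_label(c) for c in entries_per_species.values())
    return {label: table.get(label, 0) for _, _, label in BINS}


# Toy input: two real rows from table 3.2 plus three made-up species.
toy = {"Homo sapiens": 14987, "Mimivirus": 884,
       "Toy species A": 1, "Toy species B": 1, "Toy species C": 15}
table = frequency_table(toy)
print(table["1x"], table["11-20x"], table[">100x"])

# Consistency check on the published class counts of table 3.1.
published = [5274, 1616, 788, 486, 340, 295, 202, 177, 156, 80,
             416, 340, 133, 368]
print(sum(published))
```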


   3.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1      14987  Homo sapiens (Human)
       2      11897  Mus musculus (Mouse)
       3       5916  Saccharomyces cerevisiae (Baker's yeast)
       4       5528  Rattus norvegicus (Rat)
       5       4877  Escherichia coli
       6       4551  Arabidopsis thaliana (Mouse-ear cress)
       7       3082  Schizosaccharomyces pombe (Fission yeast)
       8       2966  Caenorhabditis elegans
       9       2872  Bos taurus (Bovine)
      10       2842  Bacillus subtilis
      11       2436  Drosophila melanogaster (Fruit fly)
      12       1837  Escherichia coli O157:H7
      13       1782  Methanococcus jannaschii
      14       1774  Haemophilus influenzae
      15       1587  Salmonella typhimurium
      16       1556  Gallus gallus (Chicken)
      17       1509  Escherichia coli O6
      18       1508  Xenopus laevis (African clawed frog)
      19       1486  Shigella flexneri
      20       1410  Mycobacterium tuberculosis
      21       1347  Pongo pygmaeus (Orangutan)
      22       1182  Salmonella typhi
      23       1153  Mycobacterium bovis
      24       1105  Sus scrofa (Pig)
      25       1089  Pseudomonas aeruginosa
      26       1014  Oryza sativa (Rice)
      27        971  Archaeoglobus fulgidus
      28        970  Synechocystis sp. (strain PCC 6803)
      29        930  Brachydanio rerio (Zebrafish) (Danio rerio)
      30        884  Mimivirus
      31        866  Yersinia pestis
      32        863  Vibrio cholerae
      33        857  Rhizobium meliloti (Sinorhizobium meliloti)
      34        807  Oryctolagus cuniculus (Rabbit)
      35        754  Aquifex aeolicus
      36        723  Pasteurella multocida
      37        707  Vibrio parahaemolyticus
      38        690  Staphylococcus aureus (strain Mu50 / ATCC 700699)
      39        688  Staphylococcus aureus (strain N315)
      40        687  Mycoplasma pneumoniae
      41        677  Streptomyces coelicolor
      42        672  Staphylococcus aureus (strain MW2)
      43        670  Staphylococcus aureus (strain COL)
      44        669  Staphylococcus aureus (strain MRSA252)
      45        668  Staphylococcus aureus (strain MSSA476)
      46        660  Bacillus halodurans
      47        659  Canis familiaris (Dog)
      48        655  Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey)
      49        650  Vibrio vulnificus
      50        631  Vibrio vulnificus (strain YJ016)
      51        630  Mycobacterium leprae
      52        612  Anabaena sp. (strain PCC 7120)
      53        608  Treponema pallidum
      54        589  Pseudomonas putida (strain KT2440)
      55        589  Pseudomonas syringae pv. tomato
      56        587  Bacillus anthracis
      57        587  Methanobacterium thermoautotrophicum
      58        581  Neurospora crassa
      59        577  Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
      60        577  Staphylococcus epidermidis (strain ATCC 12228)
      61        572  Buchnera aphidicola subsp. Acyrthosiphon pisum
      62        572  Candida albicans (Yeast)
      63        570  Helicobacter pylori (Campylobacter pylori)
      64        569  Ashbya gossypii (Yeast) (Eremothecium gossypii)
      65        568  Photorhabdus luminescens subsp. laumondii
      66        565  Bradyrhizobium japonicum
      67        562  Pan troglodytes (Chimpanzee)
      68        562  Buchnera aphidicola subsp. Schizaphis graminum
      69        561  Yersinia pseudotuberculosis
      70        551  Helicobacter pylori J99 (Campylobacter pylori J99)
      71        551  Ralstonia solanacearum (Pseudomonas solanacearum)
      72        549  Rickettsia prowazekii
      73        548  Zea mays (Maize)
      74        548  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
      75        543  Lactococcus lactis subsp. lactis (Streptococcus lactis)
      76        540  Rhizobium loti (Mesorhizobium loti)
      77        539  Listeria monocytogenes
      78        535  Kluyveromyces lactis (Yeast) (Candida sphaerica)
      79        531  Listeria innocua
      80        528  Xanthomonas campestris pv. campestris
      81        518  Neisseria meningitidis serogroup A
      82        517  Neisseria meningitidis serogroup B
      83        516  Shewanella oneidensis
      84        512  Bacillus cereus (strain ATCC 14579 / DSM 31)
      85        507  Buchnera aphidicola subsp. Baizongia pistaciae
      86        507  Clostridium acetobutylicum
      87        505  Caulobacter crescentus (Caulobacter vibrioides)
      88        501  Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
      89        491  Xanthomonas axonopodis pv. citri
      90        484  Candida glabrata (Yeast) (Torulopsis glabrata)
      91        483  Mycoplasma genitalium
      92        483  Thermotoga maritima
      93        483  Salmonella paratyphi-a
      94        478  Streptococcus pneumoniae
      95        471  Xylella fastidiosa
      96        470  Listeria monocytogenes serotype 4b (strain F2365)
      97        462  Xylella fastidiosa (strain Temecula1 / ATCC 700964)
      98        461  Deinococcus radiodurans
      99        460  Brucella melitensis
     100        460  Oceanobacillus iheyensis
     101        460  Brucella suis
     102        452  Haemophilus ducreyi
     103        448  Methanosarcina acetivorans
     104        446  Pyrococcus horikoshii
     105        443  Corynebacterium glutamicum (Brevibacterium flavum)
     106        441  Pyrococcus abyssi
     107        441  Clostridium perfringens
     108        439  Halobacterium salinarium (Halobacterium halobium)
     109        435  Chlamydia trachomatis
     110        429  Methanosarcina mazei (Methanosarcina frisia)
     111        426  Bordetella bronchiseptica (Alcaligenes bronchisepticus)
     112        421  Borrelia burgdorferi (Lyme disease spirochete)
     113        420  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
     114        420  Photobacterium profundum (Photobacterium sp. (strain SS9))
     115        417  Nicotiana tabacum (Common tobacco)
     116        416  Pyrococcus furiosus
     117        415  Chlamydia pneumoniae (Chlamydophila pneumoniae)
     118        414  Chromobacterium violaceum
     119        413  Bordetella parapertussis
     120        413  Bordetella pertussis
     121        411  Thermoanaerobacter tengcongensis
     122        410  Bacillus cereus (strain ATCC 10987)
     123        410  Lactobacillus plantarum
     124        409  Synechococcus elongatus (Thermosynechococcus elongatus)
     125        406  Chlamydia muridarum
     126        405  Emericella nidulans (Aspergillus nidulans)
     127        405  Rhizobium sp. (strain NGR234)
     128        404  Campylobacter jejuni
     129        401  Streptococcus pyogenes serotype M6
     130        401  Streptococcus mutans
     131        401  Ovis aries (Sheep)
     132        400  Enterococcus faecalis (Streptococcus faecalis)
     133        395  Sulfolobus solfataricus
     134        395  Streptomyces avermitilis
     135        395  Salmonella choleraesuis
     136        393  Yarrowia lipolytica (Candida lipolytica)
     137        389  Streptococcus pyogenes serotype M1
     138        384  Streptococcus pyogenes serotype M18
     139        383  Streptococcus pyogenes serotype M3
     140        380  Rickettsia conorii
     141        374  Bacillus thuringiensis subsp. konkukian
     142        365  Chlorobium tepidum
     143        361  Pyrococcus kodakaraensis (Thermococcus kodakaraensis)
     144        360  Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
     145        360  Corynebacterium efficiens
     146        356  Rhodopseudomonas palustris
     147        356  Nitrosomonas europaea
     148        354  Acinetobacter sp. (strain ADP1)
     149        350  Methanopyrus kandleri
     150        348  Aeropyrum pernix
     151        347  Leptospira interrogans
     152        342  Gloeobacter violaceus
     153        341  Burkholderia pseudomallei (Pseudomonas pseudomallei)
     154        341  Bacillus cereus (strain ZK / E33L)
     155        339  Pisum sativum (Garden pea)
     156        337  Leptospira interrogans serogroup Icterohaemorrhagiae serovar copenhageni
     157        332  Dictyostelium discoideum (Slime mold)
     158        332  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
     159        331  Streptococcus agalactiae serotype III
     160        331  Bacillus clausii (strain KSM-K16)
     161        329  Streptococcus agalactiae serotype V
     162        328  Synechococcus sp. (strain WH8102)
     163        328  Sulfolobus tokodaii
     164        326  Mannheimia succiniciproducens (strain MBEL55E)
     165        321  Prochlorococcus marinus (strain MIT 9313)
     166        321  Prochlorococcus marinus
     167        319  Burkholderia mallei (Pseudomonas mallei)
     168        318  Bacillus licheniformis (strain DSM 13 / ATCC 14580)
     169        313  Methylococcus capsulatus
     170        313  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
     171        312  Vibrio fischeri (strain ATCC 700601 / ES114)
     172        311  Thermoplasma acidophilum
     173        309  Staphylococcus aureus
     174        308  Rhodopirellula baltica
     175        305  Triticum aestivum (Wheat)
     176        302  Fusobacterium nucleatum subsp. nucleatum
     177        300  Mycobacterium paratuberculosis
     178        300  Prochlorococcus marinus subsp. pastoris (strain CCMP 1378 / MED4)
     179        300  Geobacillus kaustophilus
     180        298  Synechococcus sp. (strain ATCC 27144 / PCC 6301 / SAUG 1402/1)
     181        297  Coxiella burnetii
     182        297  Staphylococcus haemolyticus (strain JCSC1435)
     183        297  Macaca mulatta (Rhesus macaque)
     184        297  Geobacter sulfurreducens
     185        292  Glycine max (Soybean)
     186        291  Staphylococcus saprophyticus subsp. saprophyticus
     187        290  Aspergillus fumigatus (Sartorya fumigata)
     188        287  Sulfolobus acidocaldarius
     189        286  Idiomarina loihiensis
     190        286  Solanum tuberosum (Potato)
     191        286  Thermus thermophilus (strain HB8 / ATCC 27634 / DSM 579)
     192        284  Pseudomonas putida
     193        283  Bacteroides thetaiotaomicron
     194        279  Wolinella succinogenes
     195        279  Pyrobaculum aerophilum
     196        278  Cavia porcellus (Guinea pig)
     197        278  Nocardia farcinica
     198        278  Hordeum vulgare (Barley)
     199        277  Zymomonas mobilis
     200        277  Clostridium tetani
     201        275  Thermoplasma volcanium
     202        274  Desulfovibrio vulgaris (strain Hildenborough / ATCC 29579 / NCIMB 8303)
     203        269  Synechococcus sp. (strain PCC 7942) (Anacystis nidulans R2)
     204        268  Bacteriophage T4
     205        267  Symbiobacterium thermophilum
     206        267  Spinacia oleracea (Spinach)
     207        266  Corynebacterium diphtheriae
     208        266  Shigella sonnei (strain Ss046)
     209        261  Thermus thermophilus (strain HB27 / ATCC BAA-163 / DSM 7039)
     210        261  Rhodobacter capsulatus (Rhodopseudomonas capsulata)
     211        259  Azoarcus sp. (strain EbN1)
     212        259  Brucella abortus
     213        256  Legionella pneumophila subsp. pneumophila
     214        255  Silicibacter pomeroyi
     215        255  Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
     216        255  Ureaplasma parvum (Ureaplasma urealyticum biotype 1)
     217        254  Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)
     218        254  Vaccinia virus (strain Copenhagen) (VACV)
     219        254  Wigglesworthia glossinidia brevipalpis
     220        251  Haloarcula marismortui (Halobacterium marismortui)
     221        251  Legionella pneumophila (strain Paris)
     222        251  Helicobacter hepaticus
     223        251  Methanococcus maripaludis
     224        250  Xanthomonas oryzae pv. oryzae
     225        249  Equus caballus (Horse)
     226        249  Shigella boydii serotype 4 (strain Sb227)
     227        249  Legionella pneumophila (strain Lens)
     228        248  Bifidobacterium longum
     229        247  Neisseria gonorrhoeae (strain ATCC 700825 / FA 1090)
     230        245  Pseudomonas syringae pv. syringae (strain B728a)
     231        242  Porphyromonas gingivalis (Bacteroides gingivalis)
     232        241  Shigella dysenteriae serotype 1 (strain Sd197)
     233        240  Chlamydophila caviae
     234        240  Leifsonia xyli subsp. xyli
     235        236  Haemophilus influenzae (strain 86-028NP)
     236        235  Bacillus stearothermophilus (Geobacillus stearothermophilus)
     237        232  Bacteroides fragilis
     238        231  Blochmannia floridanus
     239        229  Pseudomonas syringae pv. phaseolicola (strain 1448A / Race 6)
     240        228  Gluconobacter oxydans (Gluconobacter suboxydans)
     241        226  Campylobacter jejuni (strain RM1221)
     242        225  Lactobacillus johnsonii
     243        224  Propionibacterium acnes
     244        223  Bartonella henselae (Rochalimaea henselae)
     245        223  Desulfotalea psychrophila
     246        222  Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
     247        220  Porphyra purpurea
     248        220  Chlamydomonas reinhardtii
     249        216  Gorilla gorilla gorilla (Lowland gorilla)
     250        213  Cryptococcus neoformans (Filobasidiella neoformans)
     251        212  Bartonella quintana (Rochalimaea quintana)
     252        212  Pseudomonas fluorescens (strain PfO-1)
     253        211  Klebsiella pneumoniae
     254        210  Xanthomonas campestris pv. campestris (strain 8004)
     255        207  Cricetulus griseus (Chinese hamster)
     256        206  Burkholderia sp. (strain 383) (Burkholderia cepacia
     257        206  Anabaena variabilis (strain ATCC 29413 / PCC 7937)
     258        205  Colwellia psychrerythraea (strain 34H / ATCC BAA-681) (Vibrio psychroerythus)
     259        203  Bdellovibrio bacteriovorus
     260        201  Felis silvestris catus (Cat)
     261        200  Vaccinia virus (strain Western Reserve / WR) (VACV)
     262        200  Streptococcus thermophilus (strain ATCC BAA-250 / LMG 18311)


   
   3.3  Taxonomic distribution of the sequences


   Kingdom        sequences (% of the database)
    Archaea           10690 (  4%)
    Bacteria         116347 ( 48%)
    Eukaryota        103579 ( 43%)
    Viruses           10626 (  4%)


   Within Eukaryota:


    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                  14988 ( 14%)           (  6%)
     Other Mammalia         32159 ( 31%)           ( 13%)
     Other Vertebrata        9382 (  9%)           (  4%)
     Viridiplantae          16436 ( 16%)           (  7%)
     Fungi                  16083 ( 16%)           (  7%)
     Insecta                 4691 (  5%)           (  2%)
     Nematoda                3376 (  3%)           (  1%)
     Other                   6464 (  6%)           (  3%)
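
The two percentage columns use different denominators: the number of eukaryotic sequences and the size of the complete database. The sketch below recomputes a few of the values from the counts in the tables above, assuming rounding to the nearest whole percent; the helper `percent` is illustrative.

```python
def percent(part, whole):
    """Share of `part` in `whole`, rounded to a whole percent."""
    return round(100 * part / whole)


TOTAL = 241242  # entries in release 51.0
kingdoms = {"Archaea": 10690, "Bacteria": 116347,
            "Eukaryota": 103579, "Viruses": 10626}

# The four kingdoms partition the database.
print(sum(kingdoms.values()) == TOTAL)

eukaryota = kingdoms["Eukaryota"]
for name, count in {"Human": 14988, "Insecta": 4691}.items():
    print(f"{name:10s} {percent(count, eukaryota):3d}% of Eukaryota, "
          f"{percent(count, TOTAL):3d}% of the database")
```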


4.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50    4507             1001-1100     2086
                 51- 100   16972             1101-1200     1370
                101- 150   25049             1201-1300     1062
                151- 200   23742             1301-1400      887
                201- 250   24479             1401-1500      740
                251- 300   20223             1501-1600      376
                301- 350   21179             1601-1700      275
                351- 400   19510             1701-1800      219
                401- 450   15270             1801-1900      211
                451- 500   13104             1901-2000      172
                501- 550    9799             2001-2100      118
                551- 600    6807             2101-2200      177
                601- 650    5760             2201-2300      160
                651- 700    3886             2301-2400      106
                701- 750    3213             2401-2500       90
                751- 800    2663             >2500          627
                801- 850    2287
                851- 900    2428
                901- 950    1811
                951-1000    1433


   The average sequence length in UniProtKB/Swiss-Prot is 367 amino acids.

   The shortest sequence is   GWA_SEPOF (P83570):     2 amino acids.
   The longest sequence is  TITIN_HUMAN (Q8WZ42): 34350 amino acids.
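
The size classes above are 50 residues wide up to length 1000 and 100 residues wide from 1001 to 2500, with a single open class beyond that. A minimal sketch of this binning follows; `size_bin` is an illustrative name and its labels are written without the padding used in the table. The last line recomputes the average length from the release totals, on the assumption that the average is taken over all entries.

```python
def size_bin(length):
    """Return the size class, e.g. "51-100" or "1001-1100", for a length."""
    if length < 1:
        raise ValueError("sequence length must be at least 1")
    if length > 2500:
        return ">2500"
    width = 50 if length <= 1000 else 100
    low = ((length - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


# The shortest and longest sequences quoted above, plus two boundary cases.
for length in (2, 1000, 1001, 34350):
    print(length, size_bin(length))

# Average length: total amino acids divided by total entries.
print(round(88541632 / 241242))
```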


5.  JOURNAL CITATIONS

   Note: the following citation statistics reflect the number of distinct
         journal citations.

   Total number of journals cited in this release of UniProtKB/Swiss-Prot: 1756


   5.1 Table of the frequency of journal citations

        Journals cited 1x:  615
                       2x:  231
                       3x:  130
                       4x:   89
                       5x:   64
                       6x:   48
                       7x:   38
                       8x:   29
                       9x:   31
                      10x:   21
                  11- 20x:  120
                  21- 50x:  151
                  51-100x:   62
                    >100x:  127


   5.2  List of the most cited journals in UniProtKB/Swiss-Prot

   Nb    Citations   Journal name
   --    ---------   -------------------------------------------------------------
    1        14192   Journal of Biological Chemistry
    2         6814   Proceedings of the National Academy of Sciences of the U.S.A.
    3         4398   Journal of Bacteriology
    4         4128   Gene
    5         4011   Nucleic Acids Research
    6         3687   Biochemical and Biophysical Research Communications
    7         3459   FEBS Letters
    8         3183   Biochemistry
    9         3108   The EMBO Journal
   10         2902   European Journal of Biochemistry
   11         2727   Nature
   12         2601   Biochimica et Biophysica Acta
   13         2570   Molecular and Cellular Biology
   14         2424   Journal of Molecular Biology
   15         2250   Genomics
   16         2195   Cell
   17         1772   Biochemical Journal
   18         1666   Science
   19         1443   Molecular Microbiology
   20         1329   Plant Molecular Biology
   21         1265   Molecular and General Genetics
   22         1192   Journal of Cell Biology
   23         1177   Journal of Virology
   24         1081   Virology
   25         1062   Human Molecular Genetics
   26         1059   Journal of Biochemistry
   27         1010   Nature Genetics
   28         1004   Genes and Development
   29          906   Plant Physiology
   30          904   Oncogene
   31          873   The American Journal of Human Genetics
   32          802   Human Mutation
   33          763   Journal of Immunology
   34          737   Infection and Immunity
   35          726   Development
   36          703   Structure
   37          699   Genetics
   38          681   Yeast
   39          675   Archives of Biochemistry and Biophysics
   40          641   Journal of General Virology
   41          603   Microbiology
   42          585   Molecular Biology of the Cell
   43          551   FEMS Microbiology Letters
   44          544   Blood
   45          536   Nature Structural Biology
   46          525   The Plant Cell
   47          487   Human Genetics
   48          475   Current Genetics
   49          468   Cancer Research
   50          467   Journal of Cell Science
   51          465   Molecular Cell
   52          444   Developmental Biology
   53          429   Applied and Environmental Microbiology
   54          426   Mechanisms of Development
   55          426   The Plant Journal
   56          413   Journal of Clinical Investigation
   57          413   Protein Science
   58          409   Neuron
   59          406   Mammalian Genome
   60          406   Acta Crystallographica, Section D
   61          400   Molecular and Biochemical Parasitology
   62          383   Molecular Endocrinology
   63          376   Journal of Neuroscience
   64          372   The Journal of Experimental Medicine
   65          370   Current Biology
   66          364   Immunogenetics
   67          341   Journal of Molecular Evolution
   68          333   Endocrinology
   69          333   DNA and Cell Biology
   70          322   Journal of Neurochemistry
   71          307   DNA Sequence
   72          291   The Journal of Clinical Endocrinology and Metabolism
   73          291   American Journal of Physiology
   74          285   Biological Chemistry Hoppe-Seyler
   75          282   Toxicon
   76          281   Molecular Biology and Evolution
   77          274   Bioscience, Biotechnology, and Biochemistry
   78          273   Brain Research. Molecular Brain Research
   79          247   Cytogenetics and Cell Genetics
   80          242   Journal of General Microbiology
   81          231   Comparative Biochemistry and Physiology
   82          229   Proteins
   83          215   Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
   84          215   Antimicrobial Agents and Chemotherapy
   85          212   Journal of Medical Genetics
   86          210   Molecular Pharmacology
   87          205   Peptides
   88          193   Journal of Investigative Dermatology
   89          186   Biology of Reproduction
   90          181   Plant and Cell Physiology
   91          181   DNA Research
   92          180   Genome Research
   93          178   Molecular Plant-Microbe Interactions
   94          171   Nature Cell Biology
   95          171   European Journal of Immunology
   96          169   Virus Research
   97          158   Experimental Cell Research
   98          158   Tissue Antigens
   99          158   DNA
  100          157   Biochimie
  101          150   RNA
  102          146   Molecular and Cellular Endocrinology
  103          146   Molecular Phylogenetics and Evolution
  104          145   Hemoglobin
  105          144   Bioorganicheskaia Khimiia
  106          143   American Journal of Medical Genetics
  107          137   Archives of Microbiology
  108          134   Neurology
  109          133   Annals of Neurology
  110          132   Developmental Dynamics
  111          131   European Journal of Human Genetics
  112          129   Insect Biochemistry and Molecular Biology
  113          126   Journal of Human Genetics
  114          124   Genes to Cells
  115          123   Immunity
  116          118   Agricultural and Biological Chemistry
  117          117   Molecular Reproduction and Development
  118          116   General and Comparative Endocrinology
  119          116   Animal Genetics
  120          115   Planta
  121          112   Diabetes
  122          110   Molecular Immunology
  123          108   Glycobiology
  124          107   Developmental Cell
  125          106   Investigative Ophthalmology and Visual Science
  126          103   Journal of Protein Chemistry
  127          101   The New England Journal of Medicine


6.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/Swiss-Prot lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.
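
The "Average per entry" column in the table that follows is consistent with dividing the total number of lines by the 241242 entries of the release, rounding to two decimals, and printing "<0.01" for smaller values. The sketch below reproduces a few values under that assumption; `average_per_entry` is an illustrative helper, not the code used to build the table.

```python
TOTAL_ENTRIES = 241242  # entries in release 51.0


def average_per_entry(total_lines, total_entries=TOTAL_ENTRIES):
    """Format lines-per-entry the way the table prints it."""
    average = round(total_lines / total_entries, 2)
    return "<0.01" if average < 0.01 else f"{average:.2f}"


# Totals copied from the table for four line types.
for name, total in [("References (RL)", 475775),
                    ("Features (FT)", 1694975),
                    ("TISSUE SPECIFICITY", 23913),
                    ("Submitted to Swiss-Prot", 788)]:
    print(f"{name:25s} {average_per_entry(total)}")
```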

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                     475775              1.97
   Journal                          417892    219561    1.73
   Submitted to EMBL/GenBank/DDBJ    54275     47414    0.22
   Submitted to Swiss-Prot             788       784   <0.01
   Unpublished observations            629       623   <0.01
   Submitted to other databases        579       566   <0.01
   Book citation                       566       554   <0.01
   Plant Gene Register                 531       519   <0.01
   Thesis                              378       376   <0.01
   Patent                              131       129   <0.01
   Worm Breeder's Gazette                6         6   <0.01

Comments (CC)                       967004              4.01
   SIMILARITY                       271184    219367    1.12
   FUNCTION                         169309    163414    0.70
   SUBCELLULAR LOCATION             130620    130620    0.54
   CATALYTIC ACTIVITY                91194     84076    0.38
   SUBUNIT                           87937     87937    0.36
   PATHWAY                           48385     41451    0.20
   COFACTOR                          36561     32578    0.15
   TISSUE SPECIFICITY                23913     23913    0.10
   MISCELLANEOUS                     20879     18880    0.09
   PTM                               19119     15678    0.08
   DOMAIN                            14102     12187    0.06
   ALTERNATIVE PRODUCTS              10235     10235    0.04
   CAUTION                            8915      8197    0.04
   INDUCTION                          6672      6672    0.03
   INTERACTION                        5931      5931    0.02
   DEVELOPMENTAL STAGE                5926      5926    0.02
   ENZYME REGULATION                  3541      3541    0.01
   DISEASE                            3457      2516    0.01
   WEB RESOURCE                       3060      2533    0.01
   MASS SPECTROMETRY                  2556      2135    0.01
   BIOPHYSICOCHEMICAL PROPERTIES      1564      1564    0.01
   POLYMORPHISM                        562       549   <0.01
   RNA EDITING                         457       457   <0.01
   ALLERGEN                            413       413   <0.01
   TOXIC DOSE                          307       304   <0.01
   BIOTECHNOLOGY                       136       136   <0.01
   PHARMACEUTICAL                       69        69   <0.01

Features (FT)                      1694975              7.03
   CHAIN                            245076    237799    1.02
   TRANSMEM                         152755     33630    0.63
   TURN                             117303      9164    0.49
   METAL                            104372     25294    0.43
   STRAND                            90666      8491    0.38
   HELIX                             86054      8898    0.36
   CONFLICT                          85085     29528    0.35
   TOPO_DOM                          80831     16416    0.34
   DOMAIN                            76562     41407    0.32
   CARBOHYD                          72770     18269    0.30
   DISULFID                          71207     18191    0.30
   ACT_SITE                          55572     32657    0.23
   REPEAT                            51933      7667    0.22
   BINDING                           47911     19347    0.20
   MOD_RES                           42295     18871    0.18
   VARIANT                           41927      8508    0.17
   NP_BIND                           35175     25159    0.15
   REGION                            35073     18260    0.15
   COMPBIAS                          23586     13408    0.10
   SIGNAL                            23081     23071    0.10
   VAR_SEQ                           22047      9609    0.09
   MUTAGEN                           17867      4388    0.07
   MOTIF                             17158     11356    0.07
   ZN_FING                           16881      6588    0.07
   SITE                              14257      8162    0.06
   NON_TER                           10836      8297    0.04
   INIT_MET                          10172     10172    0.04
   COILED                             8808      5634    0.04
   PROPEP                             7394      6204    0.03
   LIPID                              6883      4535    0.03
   DNA_BIND                           6558      6123    0.03
   PEPTIDE                            6429      3966    0.03
   TRANSIT                            4212      4175    0.02
   CA_BIND                            2640      1086    0.01
   CROSSLNK                           1743      1175    0.01
   NON_CONS                           1150       519   <0.01
   UNSURE                              457       178   <0.01
   SE_CYS                              249       180   <0.01

Cross-references (DR)              3117026             12.92
   InterPro                         579873    222611    2.40
   EMBL                             456684    232945    1.89
   Pfam                             304763    215339    1.26
   PROSITE                          224649    137580    0.93
   GO                               212471     91220    0.88
   GenomeReviews                    138640    122934    0.57
   KEGG                             113059    102134    0.47
   PIR                               97026     90613    0.40
   TIGRFAMs                          94134     88121    0.39
   HAMAP                             92615     92497    0.38
   PRINTS                            89163     70134    0.37
   HSSP                              78938     78938    0.33
   SMART                             71541     54152    0.30
   BioCyc                            70591     65335    0.29
   ProDom                            57638     55724    0.24
   Ensembl                           42511     42498    0.18
   UniGene                           38777     36129    0.16
   PANTHER                           38091     37880    0.16
   PDB                               36752     10060    0.15
   SMR                               34082     34082    0.14
   ArrayExpress                      33838     33838    0.14
   RZPD-ProtExp                      25639     12023    0.11
   TIGR                              22645     22052    0.09
   PIRSF                             19888     19634    0.08
   LinkHub                           17389     17388    0.07
   HGNC                              14412     14352    0.06
   MIM                               12287     10033    0.05
   MGI                               11746     11700    0.05
   IntAct                            10997     10997    0.05
   SGD                                5974      5906    0.02
   MEROPS                             5241      4936    0.02
   RGD                                5225      5222    0.02
   GermOnline                         4925      4879    0.02
   TAIR                               4609      4521    0.02
   EcoGene                            4259      4256    0.02
   EchoBASE                           4160      4128    0.02
   H-InvDB                            3677      3659    0.02
   WormPep                            3566      2963    0.01
   WormBase                           3195      3114    0.01
   FlyBase                            3164      3040    0.01
   GeneDB_Spombe                      3115      3080    0.01
   TRANSFAC                           2862      2569    0.01
   SubtiList                          2784      2783    0.01
   Gramene                            2675      2675    0.01
   GeneFarm                           1761      1742    0.01
   StyGene                            1543      1539    0.01
   HPA                                1480      1320    0.01
   TubercuList                        1438      1402    0.01
   SWISS-2DPAGE                       1170      1170   <0.01
   ListiList                          1071      1063   <0.01
   Reactome                           1003      1003   <0.01
   ZFIN                                917       907   <0.01
   Leproma                             633       630   <0.01
   AGD                                 575       569   <0.01
   PhotoList                           568       568   <0.01
   LegioList                           500       500   <0.01
   MaizeDB                             439       434   <0.01
   OGP                                 375       374   <0.01
   HIV                                 361       356   <0.01
   REBASE                              353       349   <0.01
   ECO2DBASE                           351       299   <0.01
   DictyBase                           334       331   <0.01
   SagaList                            332       331   <0.01
   GlycoSuiteDB                        282       282   <0.01
   PeroxiBase                          265       258   <0.01
   PHCI-2DPAGE                         241       241   <0.01
   MypuList                            189       189   <0.01
   Aarhus/Ghent-2DPAGE                 128        98   <0.01
   Siena-2DPAGE                        103       103   <0.01
   HSC-2DPAGE                           85        85   <0.01
   PhosSite                             70        70   <0.01
   COMPLUYEAST-2DPAGE                   59        59   <0.01
   PMMA-2DPAGE                          52        52   <0.01
   PptaseDB                             29        29   <0.01
   Rat-heart-2DPAGE                     28        28   <0.01
   ANU-2DPAGE                           21        21   <0.01

Number of explicitly cross-referenced databases: 78
Number of implicitly cross-referenced databases: 27


7.  MISCELLANEOUS STATISTICS

Total number of distinct authors cited in UniProtKB/Swiss-Prot: 230300

Total number of entries encoded on a Mitochondrion: 4085
Total number of entries encoded on a Plasmid: 3160
Total number of entries encoded on a Plastid: 26
Total number of entries encoded on a Plastid; Apicoplast: 6
Total number of entries encoded on a Plastid; Chloroplast: 5862
Total number of entries encoded on a Plastid; Cyanelle: 145
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 90

Number of fragments: 8444
Number of additional sequences produced by alternative splicing, initiation or promoter usage: 16655 


UniProtKB/TrEMBL protein database release 34.0 statistics


1.  INTRODUCTION

Release 34.0 of 31-Oct-2006 of UniProtKB/TrEMBL contains 3313264 sequence entries
comprising 1073273937 amino acids.

Since release 33, 497407 sequences have been added, the sequence data of
2732 existing entries has been updated, and the annotations of
2815857 entries have been revised. The added sequences represent an
increase of 18% in the number of entries.
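
The 18% figure can be reproduced from the numbers above. The sketch below assumes that the percentage is the ratio of added entries to the size of the previous release, and that the previous size is the current total minus the additions (deleted or merged entries are ignored). Variable names are illustrative only.

```python
# Figures taken from the release 34.0 introduction above.
total_entries = 3313264   # entries in release 34.0
added_entries = 497407    # entries added since release 33

# Assumption: previous release size = current total minus the additions.
previous_entries = total_entries - added_entries

# Growth relative to the (derived) previous release size.
growth_percent = 100 * added_entries / previous_entries

print(f"Previous release (derived): {previous_entries} entries")
print(f"Growth: {growth_percent:.1f}%")
```

Under these assumptions the derived previous size equals the number of entries whose annotations were revised, and the growth rounds to 18%.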


2.  AMINO ACID COMPOSITION

   2.1  Composition in percent for the complete database

   Ala (A) 8.30   Gln (Q) 3.93   Leu (L) 9.81   Ser (S) 6.92
   Arg (R) 5.52   Glu (E) 6.04   Lys (K) 5.27   Thr (T) 5.65
   Asn (N) 4.32   Gly (G) 7.01   Met (M) 2.40   Trp (W) 1.34
   Asp (D) 5.21   His (H) 2.24   Phe (F) 4.05   Tyr (Y) 3.04
   Cys (C) 1.40   Ile (I) 5.94   Pro (P) 4.88   Val (V) 6.59

   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.05


   2.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Ser, Val, Glu, Ile, Thr, Arg, Lys, Asp, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Cys, Trp
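
The ordering in 2.2 follows directly from the percentages in 2.1. As a minimal sketch, the table can be sorted in descending order of frequency; the dictionary below simply restates the 2.1 values.

```python
# Percent composition from section 2.1 (three-letter code -> percent).
composition = {
    "Ala": 8.30, "Gln": 3.93, "Leu": 9.81, "Ser": 6.92,
    "Arg": 5.52, "Glu": 6.04, "Lys": 5.27, "Thr": 5.65,
    "Asn": 4.32, "Gly": 7.01, "Met": 2.40, "Trp": 1.34,
    "Asp": 5.21, "His": 2.24, "Phe": 4.05, "Tyr": 3.04,
    "Cys": 1.40, "Ile": 5.94, "Pro": 4.88, "Val": 6.59,
}

# Sort residue codes from most to least frequent.
ranking = sorted(composition, key=composition.get, reverse=True)

print(", ".join(ranking))
```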


3.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 119998

   The twenty most represented species account for 673630 sequences, or
   20.3% of the total number of entries.


   3.1 Table of the frequency of occurrence of species

        Species represented 1x:55767
                            2x:22572
                            3x:11680
                            4x: 6456
                            5x: 3574
                            6x: 2775
                            7x: 1989
                            8x: 1662
                            9x: 1274
                           10x: 1259
                       11- 20x: 6025
                       21- 50x: 2493
                       51-100x: 1023
                         >100x: 1449


   3.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     162793  Human immunodeficiency virus 1
       2      71887  Oryza sativa (japonica cultivar-group)
       3      55035  Homo sapiens (Human)
       4      47627  Mus musculus (Mouse)
       5      44945  Arabidopsis thaliana (Mouse-ear cress)
       6      32207  Hepatitis C virus
       7      28028  Tetraodon nigroviridis (Green puffer)
       8      27313  Tetrahymena thermophila SB210
       9      24948  Drosophila melanogaster (Fruit fly)
      10      20246  Caenorhabditis elegans
      11      20134  Trypanosoma cruzi
      12      17387  Medicago truncatula (Barrel medic)
      13      16934  Brachydanio rerio (Zebrafish) (Danio rerio)
      14      16817  Aedes aegypti (Yellowfever mosquito)
      15      16450  Phaeosphaeria nodorum SN15
      16      15078  Anopheles gambiae str. PEST
      17      14942  uncultured bacterium
      18      14666  Plasmodium chabaudi
      19      13103  Caenorhabditis briggsae
      20      13090  Dictyostelium discoideum AX4
      21      12866  Hepatitis B virus (HBV)
      22      12285  Xenopus laevis (African clawed frog)
      23      12042  Aspergillus oryzae
      24      11773  Plasmodium berghei
      25      11656  Gibberella zeae (Fusarium graminearum)
      26      11001  Chaetomium globosum CBS 148.51
      27      10779  Neurospora crassa
      28      10404  Aspergillus terreus NIH2624
      29      10299  Coccidioides immitis RS
      30      10060  Drosophila pseudoobscura (Fruit fly)
      31      10030  Aspergillus fumigatus (Sartorya fumigata)
      32       9704  Schistosoma japonicum (Blood fluke)
      33       9671  Emericella nidulans (Aspergillus nidulans)
      34       9449  Trypanosoma brucei
      35       9386  Candida albicans (Yeast)
      36       9325  Rattus norvegicus (Rat)
      37       9089  Entamoeba histolytica HM-1:IMSS
      38       9042  Rhodococcus sp. (strain RHA1)
      39       9000  Escherichia coli
      40       8513  Burkholderia xenovorans (strain LB400)
      41       8512  Stigmatella aurantiaca DW4/3-1
      42       8217  Bos taurus (Bovine)
      43       8109  Bradyrhizobium japonicum
      44       8063  Solibacter usitatus Ellin6076
      45       7937  Frankia sp. EAN1pec
      46       7809  Plasmodium yoelii yoelii
      47       7663  Burkholderia vietnamiensis G4
      48       7533  Streptomyces coelicolor
      49       7509  Burkholderia sp. (strain 383) (Burkholderia cepacia 
      50       7432  Bradyrhizobium sp. BTAi1
      51       7314  Streptomyces avermitilis
      52       7262  Myxococcus xanthus (strain DK 1622)
      53       7152  Rhizobium loti (Mesorhizobium loti)
      54       7106  Leishmania major
      55       7062  Rhizobium leguminosarum bv. viciae (strain 3841)
      56       7049  Burkholderia cenocepacia HI2424
      57       6963  Rhodopirellula baltica
      58       6951  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
      59       6776  Pseudomonas aeruginosa
      60       6711  Frankia alni ACN14a
      61       6679  Psychroflexus torquis ATCC 700755
      62       6629  Hahella chejuensis (strain KCTC 2396)
      63       6607  Burkholderia cepacia AMMD
      64       6545  Ustilago maydis (Smut fungus)
      65       6419  Cryptococcus neoformans (Filobasidiella neoformans)
      66       6394  Giardia lamblia ATCC 50803
      67       6393  Burkholderia cenocepacia (strain AU 1054)
      68       6383  Cryptococcus neoformans var. neoformans B-3501A
      69       6337  Sinorhizobium medicae WSM419
      70       6280  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      71       6225  Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
      72       6219  Ralstonia metallidurans (strain CH34 / ATCC 43123 / DSM 2839)
      73       6217  Yarrowia lipolytica (Candida lipolytica)
      74       6204  Bacillus anthracis
      75       6201  Ralstonia eutropha H16
      76       6153  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
      77       6150  Burkholderia pseudomallei (strain 1710b)
      78       6129  Bacillus thuringiensis serovar israelensis ATCC 35646
      79       6025  Plasmodium falciparum
      80       5989  Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
      81       5979  Mycobacterium vanbaalenii PYR-1
      82       5936  Yersinia pestis
      83       5904  Bacillus cereus G9241
      84       5896  Rhizobium meliloti (Sinorhizobium meliloti)
      85       5881  Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
      86       5852  Mycobacterium sp. KMS
      87       5811  Rhizobium etli (strain CFN 42 / ATCC 51251)
      88       5696  Crocosphaera watsonii
      89       5689  Bacillus sp. NRRL B-14911
      90       5687  Mycobacterium sp. JLS
      91       5665  Nocardia farcinica
      92       5599  Burkholderia pseudomallei (Pseudomonas pseudomallei)
      93       5590  Mycobacterium sp. (strain MCS)
      94       5589  Helicobacter pylori (Campylobacter pylori)
      95       5553  Gallus gallus (Chicken)
      96       5538  Photobacterium profundum 3TCK
      97       5534  Anabaena sp. (strain PCC 7120)
      98       5523  Bacillus weihenstephanensis KBAB4
      99       5516  Pseudomonas fluorescens (strain PfO-1)
     100       5513  Mycobacterium flavescens PYR-GCK
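
The share of the twenty most represented species quoted at the start of section 3 can be checked against the first twenty rows of the table above. A minimal sketch, with the release total taken from the introduction:

```python
# Frequencies of the twenty most represented species (rows 1-20 of table 3.2).
top20 = [
    162793, 71887, 55035, 47627, 44945,
    32207, 28028, 27313, 24948, 20246,
    20134, 17387, 16934, 16817, 16450,
    15078, 14942, 14666, 13103, 13090,
]

total_entries = 3313264  # UniProtKB/TrEMBL release 34.0

top20_total = sum(top20)
top20_share = 100 * top20_total / total_entries

print(f"Top 20 species: {top20_total} sequences ({top20_share:.1f}% of entries)")
```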


   3.3  Taxonomic distribution of the sequences

   Kingdom        sequences (% of the database)
    Archaea           74858 (  2%)
    Bacteria        1612809 ( 49%)
    Eukaryota       1184862 ( 36%)
    Viruses          437391 ( 13%)
    Other              3342 ( <1%)



   Within Eukaryota:

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                  55035 (  5%)           (  2%)
     Other Mammalia        119094 ( 10%)           (  4%)
     Other Vertebrata      157328 ( 13%)           (  5%)
     Viridiplantae         259197 ( 22%)           (  8%)
     Fungi                 187904 ( 16%)           (  6%)
     Insecta               134424 ( 11%)           (  4%)
     Nematoda               36759 (  3%)           (  1%)
     Other                 235121 ( 20%)           (  7%)
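
The percentages in both tables can be recomputed from the raw counts. The sketch below assumes whole-number rounding and a "<1%" label for non-zero shares below one percent; the helper name percent_label is hypothetical.

```python
TOTAL_ENTRIES = 3313264  # UniProtKB/TrEMBL release 34.0

kingdoms = {
    "Archaea": 74858,
    "Bacteria": 1612809,
    "Eukaryota": 1184862,
    "Viruses": 437391,
    "Other": 3342,
}

eukaryota = {
    "Human": 55035,
    "Other Mammalia": 119094,
    "Other Vertebrata": 157328,
    "Viridiplantae": 259197,
    "Fungi": 187904,
    "Insecta": 134424,
    "Nematoda": 36759,
    "Other": 235121,
}


def percent_label(count, total):
    """Return a whole-number percentage, or '<1%' for non-zero shares below 1%."""
    share = 100 * count / total
    if 0 < share < 1:
        return "<1%"
    return f"{round(share)}%"


for name, count in kingdoms.items():
    print(f"{name:<18}{count:>8} ({percent_label(count, TOTAL_ENTRIES)})")

euk_total = kingdoms["Eukaryota"]
for name, count in eukaryota.items():
    within = percent_label(count, euk_total)
    overall = percent_label(count, TOTAL_ENTRIES)
    print(f"{name:<18}{count:>8} ({within}) ({overall})")
```

The eight Eukaryota categories add up exactly to the Eukaryota count. The five kingdom counts add up to 3313262, two fewer than the release total; the sketch uses the release total as the denominator, which does not change any rounded percentage.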



4.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50   42142             1001-1100    19892
                 51- 100  217858             1101-1200    14274
                101- 150  273464             1201-1300    10158
                151- 200  258617             1301-1400     6712
                201- 250  259851             1401-1500     5505
                251- 300  246860             1501-1600     3963
                301- 350  231466             1601-1700     3126
                351- 400  183758             1701-1800     2686
                401- 450  148439             1801-1900     1991
                451- 500  127714             1901-2000     1673
                501- 550   93656             2001-2100     1295
                551- 600   68771             2101-2200     1321
                601- 650   51687             2201-2300     1094
                651- 700   40198             2301-2400      887
                701- 750   35649             2401-2500      672
                751- 800   31903             >2500         6094
                801- 850   23601
                851- 900   20923
                901- 950   15229
                951-1000   11919



   The average sequence length in UniProtKB/TrEMBL is   323 amino acids.

   The shortest sequence is Q96AT0_HUMAN:     4 amino acids.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
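
   The average length can be related to the totals given in the introduction. The sketch below assumes that the average is the total number of amino acids divided by the total number of entries, and that the reported value is truncated to a whole number; neither assumption is stated in this document.

```python
# Totals from the release 34.0 introduction.
total_amino_acids = 1073273937
total_entries = 3313264

mean_length = total_amino_acids / total_entries         # exact mean
truncated_mean = total_amino_acids // total_entries     # integer part only

print(f"Mean length:      {mean_length:.2f} amino acids")
print(f"Truncated to int: {truncated_mean} amino acids")
```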



5.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    4959113              1.50
   Submitted to EMBL/GenBank/DDBJ  2564765   1763042    0.77
   Journal                         2340853   1911015    0.71
   Thesis                             5927      5875   <0.01
   Book citation                      4222      4177   <0.01
   Submitted to other databases        390       382   <0.01
   Other                             42956     27184    0.01

Comments (CC)                      1594288              0.48
   CAUTION                          660733    660733    0.20
   SIMILARITY                       339625    332287    0.10
   SUBCELLULAR LOCATION             146413    146413    0.04
   FUNCTION                         143112    137402    0.04
   CATALYTIC ACTIVITY               111155    106722    0.03
   SUBUNIT                           81452     81452    0.02
   COFACTOR                          69193     68827    0.02
   PATHWAY                           28469     24415    0.01
   DOMAIN                             7826      7061   <0.01
   MISCELLANEOUS                      3690      3690   <0.01
   INTERACTION                        2586      2586   <0.01
   MASS SPECTROMETRY                    28        20   <0.01
   ALLERGEN                              6         6   <0.01

Features (FT)                      1584760              0.48
   NON_TER                         1415744    846140    0.43
   SIGNAL                           117681    113596    0.04
   CHAIN                             50795     29813    0.02
   TRANSIT                             540       536   <0.01

Cross-references (DR)             26048671              7.86
   GO                              6905142   1966760    2.08
   InterPro                        4978587   2263523    1.50
   EMBL                            3795886   3304821    1.15
   Pfam                            2836849   2111526    0.86
   PROSITE                         1568909   1014438    0.47
   KEGG                             886563    848900    0.27
   GenomeReviews                    847386    805667    0.26
   PRINTS                           640221    533328    0.19
   SMART                            543869    423745    0.16
   TIGRFAMs                         404484    373646    0.12
   SMR                              383447    383385    0.12
   ProDom                           370442    352631    0.11
   BioCyc                           286378    271096    0.09
   HSSP                             275921    275518    0.08
   PANTHER                          249322    246987    0.08
   PIR                              190563    155148    0.06
   TIGR                             136495    130204    0.04
   UniGene                          111140    106824    0.03
   Ensembl                           99717     99715    0.03
   ArrayExpress                      91421     91404    0.03
   RZPD-ProtExp                      81191     32808    0.02
   PIRSF                             80345     79566    0.02
   Gramene                           71161     71161    0.02
   MGI                               44511     43786    0.01
   FlyBase                           25700     25663    0.01
   TAIR                              19951     19890    0.01
   WormPep                           19324     19239    0.01
   WormBase                          19271     19188    0.01
   LinkHub                           14660     14660   <0.01
   MEROPS                            12421     11979   <0.01
   ZFIN                              12302     12300   <0.01
   LegioList                          5403      5373   <0.01
   IntAct                             5209      5209   <0.01
   ListiList                          4744      4727   <0.01
   AGD                                4141      4141   <0.01
   PDB                                4137      2465   <0.01
   PhotoList                          4112      3988   <0.01
   HGNC                               3152      3152   <0.01
   TubercuList                        2551      2545   <0.01
   DictyBase                          1967      1967   <0.01
   RGD                                1902      1896   <0.01
   GeneDB_Spombe                      1872      1859   <0.01
   SagaList                           1762      1668   <0.01
   Leproma                             974       973   <0.01
   TRANSFAC                            897       886   <0.01
   SGD                                 688       671   <0.01
   PeroxiBase                          633       627   <0.01
   MypuList                            593       589   <0.01
   REBASE                              124       119   <0.01
   PHCI-2DPAGE                         106       106   <0.01
   ANU-2DPAGE                           64        64   <0.01
   SWISS-2DPAGE                         48        48   <0.01
   Reactome                              7         7   <0.01
   PMMA-2DPAGE                           3         3   <0.01
   Siena-2DPAGE                          2         2   <0.01
   COMPLUYEAST-2DPAGE                    1         1   <0.01

Number of explicitly cross-referenced databases: 78
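
The "Average per entry" column in the table above appears to be the total number of lines divided by the total number of entries in the release, not by the number of entries carrying such a line. The sketch below makes that assumption, and further assumes that values rounding to 0.00 are shown as "<0.01"; the helper name per_entry_label is hypothetical.

```python
TOTAL_ENTRIES = 3313264  # UniProtKB/TrEMBL release 34.0


def per_entry_label(line_count, total_entries=TOTAL_ENTRIES):
    """Format the average number of lines per entry as in the table."""
    label = f"{line_count / total_entries:.2f}"
    # Assumption: averages that round to 0.00 are reported as '<0.01'.
    return "<0.01" if label == "0.00" else label


# A few rows from the table above: (line type, total number of lines).
rows = [
    ("References (RL)", 4959113),
    ("GO", 6905142),
    ("NON_TER", 1415744),
    ("WormBase", 19271),
    ("LinkHub", 14660),
]

for name, count in rows:
    print(f"{name:<18}{count:>9}  {per_entry_label(count)}")
```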


6.  MISCELLANEOUS STATISTICS

Total number of distinct authors cited in UniProtKB/TrEMBL: 234955

Total number of entries encoded on a Mitochondrion: 144724
Total number of entries encoded on a Plasmid: 55874
Total number of entries encoded on a Plastid: 3169
Total number of entries encoded on a Plastid; Apicoplast: 179
Total number of entries encoded on a Plastid; Chloroplast: 51775
Total number of entries encoded on a Plastid; Cyanelle: 7
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 166

Number of fragments: 848216


Submissions and Updates

We welcome feedback from our users. We would especially appreciate being notified if sequences belonging to your field of expertise are missing from the database. We would also like to be notified about annotations that need updating, for example if the function of a protein has been clarified or if new information about post-translational modifications has become available.

Submit new sequence data, updates and corrections at http://www.uniprot.org/support/submissions.shtml

For all queries regarding submissions to UniProtKB and to submit new protein sequence data, please contact:

UniProt Knowledgebase
The EMBL Outstation - The European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 462
Telefax: (+44 1223) 494 468
E-mail: datasubs@ebi.ac.uk


Download information

Bi-Weekly releases

The latest data of the UniProt Knowledgebase is available in various formats (flatfile, XML or FASTA) at http://www.uniprot.org/database/download.shtml. The data is further supplemented by a file containing the sequences of all additional alternative isoforms annotated in UniProtKB/Swiss-Prot. This data set is documented in the file ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/README.varsplic
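
For users who work with the FASTA files, the following is a minimal sketch of a reader for generic FASTA text. It is not a UniProt tool: the function name read_fasta and the two sample records are made up for illustration, and no assumption is made about the layout of the header line beyond the leading ">".

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        yield header, "".join(chunks)


# Made-up sample; a downloaded file would be opened with open(path) instead.
sample = io.StringIO(">example_1 hypothetical record\nMKT\nAYI\n>example_2\nGSH\n")
records = list(read_fasta(sample))

for header, sequence in records:
    print(header, len(sequence))
```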

Major releases

For users who wish to download the UniProt Knowledgebase only occasionally, we distribute the latest major release (updated 3 times per year) in flatfile format. Previous UniProtKB/Swiss-Prot and UniProtKB/TrEMBL releases are archived under ftp://ftp.uniprot.org/pub/databases/uniprot/previous_major_releases. The UniProt Knowledgebase major release is also available on CD-ROM from the EBI.


Contact

EMBL Outstation
European Bioinformatics Institute (EBI)
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 444
Fax: (+44 1223) 494 468
Electronic mail address: datalib@ebi.ac.uk / swissprot@ebi.ac.uk
WWW server: http://www.ebi.ac.uk/


Swiss Institute of Bioinformatics (SIB)
Centre Medical Universitaire
1, rue Michel Servet
1211 Geneva 4
Switzerland

Telephone: (+41 22) 379 50 50
Fax: (+41 22) 379 58 58
Electronic mail address: swiss-prot@expasy.org
WWW server: http://www.expasy.org/


Protein Information Resource (PIR)
Georgetown University Medical Center
3300 Whitehaven St., Suite 1200
Washington, DC 20008
United States of America

Telephone: (+1 202) 687 1039
Fax: (+1 202) 687 0057
Electronic mail address: pirmail@georgetown.edu
WWW server: http://pir.georgetown.edu

Citation

If you want to cite UniProt in a publication, please use the following reference:

Wu C.H., Apweiler R., Bairoch A., Natale D.A., Barker W.C., Boeckmann B., Ferro S., Gasteiger E., Huang H., Lopez R., Magrane M., Martin M.J., Mazumder R., O'Donovan C., Redaschi N., Suzek B. The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res. 34: D187-D191 (2006).