PROSITE documentation PDOC00021

{PDOC00021}
{PS00022; EGF_1}
{PS01186; EGF_2}
{PS50026; EGF_3}
{BEGIN}
******************************************
* EGF-like domain signatures and profile *
******************************************

A sequence of about thirty to forty amino-acid residues found in epidermal
growth factor (EGF) has been shown [1 to 6] to be present, in a more or less
conserved form, in a large number of other, mostly animal, proteins. EGF is a
polypeptide of about 50 amino acids with three internal disulfide bridges. It
first binds with high affinity to specific cell-surface receptors and then
induces their dimerization, which is essential for activating the tyrosine
kinase in the receptor cytoplasmic domain, initiating a signal transduction
cascade that results in DNA synthesis and cell proliferation.

A  common  feature  of  all  EGF-like  domains  is  that they are found in the
extracellular  domain  of  membrane-bound  proteins or in proteins known to be
secreted (exception: prostaglandin G/H synthase). The EGF-like domain includes
six cysteine residues which have been shown to be involved in disulfide bonds.
The structure of several EGF-like domains has been solved. The fold consists
of a two-stranded beta-sheet followed by a loop to a short C-terminal
two-stranded sheet (see <PDB:1EGF>). The subdomains between the conserved
cysteines vary strongly in length, as shown in the following schematic
representation of the EGF-like domain:

                 +-------------------+        +-------------------------+
                 |                   |        |                         |
  x(4)-C-x(0,48)-C-x(3,12)-C-x(1,70)-C-x(1,6)-C-x(2)-G-a-x(0,21)-G-x(2)-C-x
       |                   |         ************************************
       +-------------------+

'C': conserved cysteine involved in a disulfide bond.
'G': often conserved glycine.
'a': often conserved aromatic amino acid.
'*': region covered by both patterns.
'x': any residue.
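
As a rough illustration of the schematic above, the cysteine spacing and the
disulfide connectivity it depicts (C1-C3, C2-C4, C5-C6) can be sketched in
Python. This is not part of PROSITE: the names EGF_SPACING and disulfide_pairs
are hypothetical, the regular expression is only one possible reading of the
schematic, and the test sequence (mature human EGF) was typed in for this
example and should be checked against a sequence database.

```python
import re

# Mature human EGF (53 residues), typed in for illustration only; verify it
# against a sequence database before relying on it.
HUMAN_EGF = "NSDSECPLSHDGYCLHDGVCMYIEALDKYACNCVVGYIGERCQYRDLKWWELR"

# Cysteine spacing taken from the schematic. Assumptions of this sketch: the
# gaps contain no additional cysteines, and the 'G' and 'a' positions are not
# enforced because they are only "often" conserved. The last gap is the sum
# x(2) + G + a + x(0,21) + G + x(2), i.e. 7 to 28 residues.
EGF_SPACING = re.compile(
    r"(C)[^C]{0,48}(C)[^C]{3,12}(C)[^C]{1,70}(C)[^C]{1,6}(C)[^C]{7,28}(C)"
)


def disulfide_pairs(sequence):
    """Return 1-based cysteine position pairs (C1-C3, C2-C4, C5-C6), or None."""
    match = EGF_SPACING.search(sequence)
    if match is None:
        return None
    # Groups 1..6 capture the six conserved cysteines in order.
    c1, c2, c3, c4, c5, c6 = (match.start(i) + 1 for i in range(1, 7))
    return [(c1, c3), (c2, c4), (c5, c6)]


print(disulfide_pairs(HUMAN_EGF))
```

For the sequence used here the sketch reports the pairs (6, 20), (14, 31) and
(33, 42). A spacing match of this kind is far less selective than the patterns
and profile of this entry and should not be used for domain assignment.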

Some  proteins  known  to contain one or more copies of an EGF-like domain are
listed below.

 - Adipocyte differentiation inhibitor (gene PREF-1) from mouse (6 copies).
 - Agrin, a basal lamina protein  that causes the aggregation of acetylcholine
   receptors on cultured muscle fibers (4 copies).
 - Amphiregulin, a growth factor (1 copy).
 - Betacellulin, a growth factor (1 copy).
 - Blastula  proteins  BP10  and  Span from sea urchin which are thought to be
   involved in pattern formation (1 copy).
 - BM86, a glycoprotein antigen of cattle tick (7 copies).
 - Bone morphogenetic protein 1 (BMP-1), a protein which induces cartilage and
   bone formation  and  which  expresses  metalloendopeptidase  activity  (1-2
   copies). Homologous proteins are found in sea urchin - suBMP (1 copy) - and
   in Drosophila - the dorsal-ventral patterning protein tolloid (2 copies).
 - Caenorhabditis elegans developmental proteins lin-12 (13 copies)  and glp-1
   (10 copies).
 - Caenorhabditis elegans apx-1 protein, a patterning protein (4.5 copies).
 - Calcium-dependent serine proteinase (CASP) which degrades the extracellular
   matrix proteins type I and IV collagen and fibronectin (1 copy).
 - Cartilage matrix protein CMP (1 copy).
 - Cartilage oligomeric matrix protein COMP (4 copies).
 - Cell surface antigen 114/A10 (3 copies).
 - Cell surface glycoprotein complex transmembrane subunit ASGP-2  from rat (2
   copies).
 - Coagulation associated proteins C, Z (2 copies) and S (4 copies).
 - Coagulation factors VII, IX, X and XII (2 copies).
 - Complement C1r components (1 copy).
 - Complement C1s components (1 copy).
 - Complement-activating component of Ra-reactive factor (RARF) (1 copy).
 - Complement components C6, C7, C8 alpha and beta chains, and C9 (1 copy).
 - Crumbs, an epithelial development protein from Drosophila (29 copies).
 - Epidermal growth factor precursor (7-9 copies).
 - Exogastrula-inducing peptides A, C, D and X from sea urchin (1 copy).
 - Fat protein, a Drosophila cadherin-related tumor suppressor (5 copies).
 - Fetal  antigen  1, a probable neuroendocrine differentiation protein, which
   is derived from the delta-like protein (DLK) (6 copies).
 - Fibrillin 1 (47 copies) and fibrillin 2 (14 copies).
 - Fibropellins  IA  (21 copies), IB (13 copies), IC (8 copies), II (4 copies)
   and III   (8   copies)  from  the  apical  lamina  -  a  component  of  the
   extracellular matrix - of sea urchin.
 - Fibulin-1 and -2, two extracellular matrix proteins (9-11 copies).
 - Giant-lens  protein (protein Argos), which regulates cell determination and
   axon guidance in the Drosophila eye (1 copy).
 - Growth factor-related proteins from various poxviruses (1 copy).
 - Gurken protein, a Drosophila developmental protein (1 copy).
 - Heparin-binding EGF-like growth factor (HB-EGF), transforming growth factor
   alpha (TGF-alpha), growth factors Lin-3 and Spitz (1 copy);  the precursors
   are membrane proteins, the mature forms are extracellular.
 - Hepatocyte growth factor (HGF) activator (EC 3.4.21.-) (2 copies).
 - LDL  and  VLDL receptors, which bind and transport low-density lipoproteins
   and very low-density lipoproteins (3 copies).
 - LDL  receptor-related  protein  (LRP), which  may  act  as  a  receptor for
   endocytosis of extracellular ligands (22 copies).
 - Leucocyte  antigen  CD97  (3  copies),  cell  surface  glycoprotein EMR1 (6
   copies) and cell surface glycoprotein F4/80 (7 copies).
 - Limulus clotting factor C, which is involved in hemostasis and host defense
   mechanisms in the Japanese horseshoe crab (1 copy).
 - Meprin A alpha subunit, a mammalian membrane-bound endopeptidase (1 copy).
 - Milk fat globule-EGF factor 8 (MFG-E8) from mouse (2 copies).
 - Neuregulin GGF-I and GGF-II, two human glial growth factors (1 copy).
 - Neurexins from mammals (3 copies).
 - Neurogenic  proteins  Notch, Xotch and the human homolog Tan-1 (36 copies),
   Delta (9  copies)  and  the  similar  differentiation  proteins  Lag-2 from
   Caenorhabditis elegans (2  copies), Serrate (14 copies) and Slit (7 copies)
   from Drosophila.
 - Nidogen  (also called entactin), a basement membrane protein from chordates
   (2-6 copies).
 - Ookinete surface proteins (24 Kd, 25 Kd, 28 Kd) from Plasmodium (4 copies).
 - Pancreatic secretory granule membrane major glycoprotein GP2 (1 copy).
 - Perforin, which non-specifically lyses a variety of target cells (1 copy).
 - Proteoglycans  aggrecan (1 copy), versican (2 copies), perlecan (at least 2
   copies), brevican (1 copy) and chondroitin sulfate proteoglycan (gene PG-M)
   (2 copies).
 - Prostaglandin G/H synthase 1 and 2 (EC 1.14.99.1) (1 copy), which are
   found in the endoplasmic reticulum.
 - Reelin, an extracellular matrix protein  that  plays a role in  layering of
   neurons in the cerebral cortex and cerebellum of mammals (8 copies).
 - S1-5,  a  human  extracellular  protein whose ultimate activity is probably
   modulated by the environment (5 copies).
 - Schwannoma-derived growth factor (SDGF), an autocrine growth factor as well
   as a mitogen for different target cells (1 copy).
 - Selectins. Cell  adhesion  proteins such  as  ELAM-1 (E-selectin),  GMP-140
   (P-selectin), or the lymph-node homing receptor (L-selectin) (1 copy).
 - Serine/threonine-protein  kinase  homolog  (gene  Pro25)  from  Arabidopsis
   thaliana, which  may   be   involved   in   assembly   or   regulation   of
   light-harvesting chlorophyll A/B protein (2 copies).
 - Sperm-egg fusion proteins PH-30 alpha and beta from guinea pig (1 copy).
 - Stromal cell derived protein-1 (SCP-1) from mouse (6 copies).
 - TDGF-1, human teratocarcinoma-derived growth factor 1 (1 copy).
 - Tenascin  (or  neuronectin),  an  extracellular matrix protein from mammals
   (14.5 copies), chicken (TEN-A) (13.5 copies) and the related proteins human
   tenascin-X (18  copies)  and  tenascin-like  proteins  TEN-A and TEN-M from
   Drosophila (8 copies).
 - Thrombomodulin   (fetomodulin),  which  together  with  thrombin  activates
   protein C (6 copies).
 - Thrombospondin  1, 2 (3 copies), 3 and 4 (4 copies), adhesive glycoproteins
   that mediate cell-to-cell and cell-to-matrix interactions.
 - Thyroid peroxidase 1 and 2 (EC 1.11.1.8) from human (1 copy).
 - Transforming  growth  factor  beta-1  binding protein (TGF-B1-BP) (16 or 18
   copies).
 - Tyrosine-protein kinase receptors Tek and Tie (EC 2.7.1.112) (3 copies).
 - Urokinase-type  plasminogen  activator  (EC  3.4.21.73)  (UPA)  and  tissue
   plasminogen activator (EC 3.4.21.68) (TPA) (1 copy).
 - Uromodulin (Tamm-Horsfall urinary glycoprotein) (THP) (3 copies).
 - Vitamin  K-dependent  anticoagulants  protein C (2 copies) and protein S (4
   copies) and  the  similar  protein Z, a single-chain plasma glycoprotein of
   unknown function (2 copies).
 - 63 Kd sperm flagellar membrane protein from sea urchin (3 copies).
 - 93 Kd protein (gene nel) from chicken (5 copies).
 - Hypothetical  337.6  Kd  protein  T20G5.3  from  Caenorhabditis elegans (44
   copies).

The region between the 5th and 6th cysteine contains two conserved glycines of
which at  least  one  is  present  in  most  EGF-like  domains. We created two
patterns for  this  domain,  each  including one of these C-terminal conserved
glycine residues. The profile we developed covers the whole domain.

-Consensus pattern: C-x-C-x(2)-{V}-x(2)-G-{C}-x-C
                    [The 3 C's are involved in disulfide bonds]
-Sequences known to belong to this class detected by the pattern: ALL, except
 those that have very long or very short regions between the last 3 conserved
 cysteines of their EGF-like domain(s).
-Other sequence(s)  detected  in  Swiss-Prot:  87 proteins, of which 27 can be
 considered as possible candidates.

-Consensus pattern: C-x-C-x(2)-[GP]-[FYW]-x(4,8)-C
                    [The 3 C's are involved in disulfide bonds]
-Sequences known to belong to this class detected by the pattern: ALL, except
 those that have very long or very short regions between the last 3 conserved
 cysteines of their EGF-like domain(s).
-Other sequence(s)  detected  in  Swiss-Prot:  83 proteins, of which 49 can be
 considered as possible candidates.
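
The two consensus patterns above use PROSITE pattern syntax: 'x' is any
residue, '[...]' lists the allowed residues, '{...}' lists the forbidden
residues, and '(n)' or '(n,m)' gives a repeat count. A minimal Python sketch
of a translation to regular expressions follows. The function name
prosite_to_regex is hypothetical, the sketch ignores the '<' and '>' anchors
(not needed for these two patterns), and the test sequence (mature human EGF)
was typed in for this example and should be verified independently.

```python
import re

EGF_1 = "C-x-C-x(2)-{V}-x(2)-G-{C}-x-C"    # PS00022
EGF_2 = "C-x-C-x(2)-[GP]-[FYW]-x(4,8)-C"   # PS01186

# Mature human EGF (53 residues), typed in for illustration only.
HUMAN_EGF = "NSDSECPLSHDGYCLHDGVCMYIEALDKYACNCVVGYIGERCQYRDLKWWELR"


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate the element from an optional repeat such as (2) or (4,8).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"    # forbidden residues
        # '[...]' classes and single letters are already valid regex syntax.
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


for name, pattern in (("EGF_1", EGF_1), ("EGF_2", EGF_2)):
    regex = prosite_to_regex(pattern)
    hit = re.search(regex, HUMAN_EGF)
    print(name, regex, hit.start() + 1 if hit else None, hit.group() if hit else None)
```

With this test sequence both patterns match the segment CNCVVGYIGERC, which
starts at residue 31 and ends at the last conserved cysteine. Note that
re.search reports only the first match; scanning a multi-domain protein would
need re.finditer or an equivalent loop.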

-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.

-Note: The beta chain of the integrin family of proteins contains 2
 cysteine-rich repeats which were said to be dissimilar to the EGF pattern [7].
-Note: Laminin  EGF-like repeats (see <PDOC00961>) are longer than the average
 EGF module  and  contain  a further disulfide bond C-terminal of the EGF-like
 region. Perlecan  and  agrin  contain  both EGF-like domains and laminin-type
 EGF-like domains.
-Note: The patterns do not detect all of the repeats in proteins with multiple
 EGF-like repeats.
-Note: See <PDOC00913> for an entry specifically describing the subset of
 EGF-like domains that bind calcium.

-Last update: April 2006 / Pattern revised.

[ 1] Davis C.G.
     "The many faces of epidermal growth factor repeats."
     New Biol. 2:410-419(1990).
     PubMed=2288911
[ 2] Blomquist M.C., Hunt L.T., Barker W.C.
     "Vaccinia virus 19-kilodalton protein: relationship to several
     mammalian proteins, including two growth factors."
     Proc. Natl. Acad. Sci. U.S.A. 81:7363-7367(1984).
     PubMed=6334307
[ 3] Barker W.C., Johnson G.C., Hunt L.T., George D.G.
     Protein Nucl. Acid Enz. 29:54-68(1986).
[ 4] Doolittle R.F., Feng D.F., Johnson M.S.
     "Computer-based characterization of epidermal growth factor
     precursor."
     Nature 307:558-560(1984).
     PubMed=6607417
[ 5] Appella E., Weber I.T., Blasi F.
     "Structure and function of epidermal growth factor-like regions in
     proteins."
     FEBS Lett. 231:1-4(1988).
     PubMed=3282918
[ 6] Campbell I.D., Bork P.
     Curr. Opin. Struct. Biol. 3:385-392(1993).
[ 7] Tamkun J.W., DeSimone D.W., Fonda D., Patel R.S., Buck C.,
     Horwitz A.F., Hynes R.O.
     "Structure of integrin, a glycoprotein involved in the transmembrane
     linkage between fibronectin and actin."
     Cell 46:271-282(1986).
     PubMed=3487386

--------------------------------------------------------------------------------
PROSITE is copyrighted by the SIB Swiss Institute of Bioinformatics and
distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives
(CC BY-NC-ND 4.0) License, see https://prosite.expasy.org/prosite_license.html
--------------------------------------------------------------------------------

{END}