On the Function of MHC-Antigen Specificity

Arnold G. Reinhold
Cambridge, MA
March 3, 1989
Rev. 2ja

Current understanding of the structure and function of Class I MHC, embodied in the "protein-self" model, suggests that short protein fragments bind to a pocket on the MHC molecule and that the resulting complexes are presented on the cell surface for potential interaction with T-cells. [BJORKMAN87] [PARHAM88] [MARRACK87]

There are two separate binding specificities associated with this model: that between the protein and the MHC molecule, and that between the protein and the T-cell receptor. The latter is central to antigen recognition. The former has been more mysterious. Recent studies suggest that as few as one in a thousand fragments are sampled.

This paper suggests a function for the former, namely that it determines the subset of possible protein fragments that an individual's cells will present, and suggests how that subset selection could evolve to explain a number of functions of the immune system, including: improving the reliability of T-cell training, MHC restriction, vigorous rejection of tissue grafts from even closely related individuals, detection of virus-infected and cancerous cells, and T-cell clonal deletion. The model also suggests an adaptive function for apparently neutral genetic variation. An analysis along cryptographic lines indicates that this random selection model is numerically feasible.

The random selection model.

In the proposed model, Class I MHC proteins indirectly monitor a cell's protein synthesis by randomly sampling peptide fragments produced during proteolysis and presenting them on the surface of the cell, where they can be inspected by T-cells. T-cell training occurs by presenting thymocytes with numerous, similarly selected samples in the thymus. [Note for non-biological readers: The "T" in T-cell stands for thymus-derived.
T-cell precursors, called thymocytes, mature in the thymus gland, where self-reactive cells are thought to be deleted in a process called training.] Thymocytes that react to any sample are killed. Thus almost all T-cell clones that react to fragments of self-protein will have been deleted, and the sampled peptides presented by Class I MHC molecules on normally functioning cells will not induce an immune response. But when a virus-infected cell, for example, synthesizes non-self proteins, they will be degraded by the normal proteolysis pathways, and their fragments will be sampled and presented. The resulting sample-MHC complexes may then be recognized as foreign, and an immune response can be initiated against that cell and other infected cells.

There is an analogy between the proposed model and cryptologic authenticator systems. [SIMMONS82] Peptides can be considered as code words. Only those code words used to train thymocytes are valid. Cells corrupted by a virus are forced to exhibit an invalid code. For such a system to work, the code word length must be long enough to encode many more possibilities than the set of valid codes. We will show below that this is true for the MHC system.

MHC protein does not exhibit the high binding specificity of antigen receptors such as immunoglobulin (Ig) and the T-cell receptor. [SETTE87] So the small number of MHC protein allotypes present in an individual will bind a large number of different possible peptides. In the random selection model, MHC binding specificity determines which subset of all the peptide fragments produced in proteolysis will be sampled by an individual's MHC proteins. (See Figure 1.) One adaptive value in presenting only a small subset of possible protein fragments lies in making more reliable whatever mechanism is used to delete self-reactive T-cell clones. (See below.) Of course, the system cannot be too selective, lest it fail to react at all.
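The sampling idea can be made concrete with a toy Python model. Everything in it is an illustrative assumption rather than biology from this paper: peptides are random 9-mers, MHC binding specificity is replaced by a hypothetical keyed SHA-256 hash that samples a fraction f of all peptides for a given allotype key, and each sampled self-peptide is treated as a "valid code" whose reactive T-cell clone is deleted in training.

```python
import hashlib
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def is_sampled(allotype_key: str, peptide: str, f: float) -> bool:
    """Hypothetical stand-in for MHC binding specificity: a keyed hash gives
    each (allotype, peptide) pair a pseudo-random u in [0, 1), and the
    peptide is sampled (bound and presented) when u < f."""
    digest = hashlib.sha256(f"{allotype_key}:{peptide}".encode()).digest()
    u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53  # 53 random bits
    return u < f

def presented(peptides, allotype_key, f):
    """The subset of a cell's proteolytic fragments that its MHC presents."""
    return {p for p in peptides if is_sampled(allotype_key, p, f)}

def random_peptides(rng, n, length=9):
    """n random peptides (assumed 9-mers) standing in for proteolysis products."""
    return {"".join(rng.choice(AMINO_ACIDS) for _ in range(length)) for _ in range(n)}

rng = random.Random(1989)
key, f = "individual-A", 0.01  # hypothetical allotype key and selectivity

self_peptides = random_peptides(rng, 100_000)                  # S self-fragment types
viral_peptides = random_peptides(rng, 2_000) - self_peptides   # V foreign fragments

# Training: the thymus sees only the sampled self-peptides, so these are the
# "valid codes", and clones reacting to them are deleted.
valid_codes = presented(self_peptides, key, f)

# A normal cell expresses part of the self-proteome; all it presents is valid.
normal_cell = set(rng.sample(sorted(self_peptides), 30_000))
normal_alarms = presented(normal_cell, key, f) - valid_codes

# An infected cell also degrades viral proteins; sampled viral fragments are invalid.
infected_alarms = presented(normal_cell | viral_peptides, key, f) - valid_codes

print("valid codes to train on (expected near f*S):", len(valid_codes))
print("alarms from a normal cell:", len(normal_alarms))
print("alarms from an infected cell (expected near f*V):", len(infected_alarms))
```

In this sketch a normal cell can only present valid codes, whereas an infected cell presents roughly fV invalid ones. Lowering f shrinks the set the thymus must cover (about fS codes) but also shrinks the number of viral fragments exposed.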
Another possible adaptive function for MHC selectivity is to allow some self-proteins to be tolerated by the immune system even though they cannot be presented during thymocyte processing. Certain signal proteins, for example, might be inappropriate for expression in the thymus. Such self-proteins could co-evolve with MHC selectivity in such a way that they are never sampled by any MHC allotype. Since different signal proteins tend to be homologous, it is only necessary for the sampling process to ignore those portions of the protein that vary from versions that can be expressed in the thymus. Alternatively, the thymus could even express non-functional variants of signal proteins for the sole purpose of deleting reactive thymocytes.

Analysis.

Selectivity eases training.

To show the impact of MHC selectivity on training reliability, we assume a simple model of training wherein each thymocyte is repeatedly presented with self-peptides randomly sampled by the organism's MHC allotypes. If the thymocyte ever responds, i.e. if the thymocyte's T-cell receptor binds to the MHC-self-peptide complex, then the thymocyte is killed, perhaps by the specialized thymus cell that presented the peptide.

Say there are T thymocytes, the total number of self-peptide fragment types produced in normal proteolysis is S, the fraction of such fragments sampled by the organism's MHC allotypes is f, and the probability that a single thymocyte will fail to encounter one particular self-peptide type is p. To be sure a thymocyte is not self-reactive, it must be presented with all fS sampled self-peptides. A thymocyte that misses even one of them may be reactive to exactly that peptide. So the probability, r, that a self-reactive thymocyte will escape deletion, taken as the probability that at least one of the fS peptide types is never encountered (treating encounters as independent), is:

(1) r = 1 - (1-p)^fS ~= 1 - exp(-pfS), for small p.

The number, R, of self-reactive thymocytes escaping will then be:

(2) R = rT ~= (1 - exp(-pfS))T

For r to be small, pfS must be close to zero, in which case:

(3) R ~= pfST

Thus increasing MHC sampling selectivity (i.e.
decreasing f) makes the reliability requirements on thymocyte processing less stringent in direct proportion.

The counter-pressure to greater MHC selectivity is ensuring that inappropriate peptides, such as viral protein fragments, are sampled adequately. If V is the average number of different peptide fragments resulting from proteolysis of a virus's proteins, and f is again the fraction of all possible protein fragments that are sampled by an individual's MHC allotypes, then a reliable immune response to the virus infection requires that:

(4) fV >> 1

Equations (3) and (4) represent opposite selection pressures on f that must be balanced in the evolution of a species' MHC protein.

How big need the peptide fragments be?

We promised above to consider whether the code word space was large enough, i.e. whether the peptides observed to bind to MHC protein are long enough to allow distinction between self and non-self. To do this we must first estimate how many different protein fragments can occur in normal proteolysis. Then we will compare that estimate with a calculation of how many distinguishable peptides of appropriate length there are.

Let L be the number of amino acids in an average protein, and let G be the number of proteins encoded in a cell's diploid genome. We wish to compute N, the number of possible protein fragments of length between j and k, where j <= k:

(5) N = G((k-j+1)(L+1) + ((j-1)j - k(k+1))/2) ~= G(k-j+1)(L - (j+k)/2)

Here we assume that cleavage can occur at any amino acid, so that there are L-i+1 peptide fragments of length i; summing over i = j, ..., k gives (5). McKusick [MCKUSICK86] suggests a value of 150 for L and cites estimates of between 50,000 and 200,000 for the haploid gene count in man. Since most genes are either monomorphic or have highly homologous allotypes, we can use this estimate for G. A range of 2 to 20 amino acids seems typical of processed antigen lengths reported in the literature.

There are (at least) two corrections that have to be made.
First, the proteolytic enzymes produced by a cell only cleave at certain amino acid combinations. Second, homology between different genes will reduce the number of distinct fragments. We incorporate these effects in a correction factor c. Then for man we have:

(6) N is roughly between 0.87 x 10^8 c and 3.5 x 10^8 c

Peptides are different for our purposes only if T-cells can distinguish between them. Sette et al. [SETTE87] investigated the effect of single amino acid substitutions in an antigen with regard to both MHC II and T-cell interaction. More than half the residues were sensitive to conservative substitution for at least one of the two T-cell lines they reported on. If an organism's T-cell population is sensitive to conservative amino acid substitution at m residue sites in presented peptides, then the number of different peptides, M, that can be distinguished is roughly:

(7) M = 20^m

If m = 10, then:

(8) M = 20^10 = 1.02 x 10^13

Thus, even for a conservative estimate of the number of distinguishable residues on presented peptides, M >> N, i.e. the code space is ample.

We can also calculate b, a "break-even" value of m at which the number of peptides distinguishable at b residues equals N, the number of possible fragments in the human protein repertoire:

(9) 20^b = N ~= 3.2 x 10^8 c

(10) b ~= 6.54 + 0.77 log10(c)

For c = 0.1, the "break-even" length is about 6 amino acids.

Discussion.

This paper suggests four different pressures on the evolution of MHC allotypes:

1. Keeping the sample subset small enough to allow reliable deletion of self-reactive T-cells.
2. Avoiding the selection of certain self-proteins that cannot be expressed during T-cell maturation.
3. Selecting a large enough sample to detect pathogens and tumors.
4. Specifically sampling signal proteins whose cellular expression cannot be tolerated in later stages of development.
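Pressures 1 and 3 are the two sides of the trade-off expressed by equations (3) and (4) of the Analysis. The Python sketch below simply evaluates the Analysis formulas so that the trade-off and the code-space numbers can be explored. The function names are hypothetical, and the values of p, f, S, T and V in the first usage lines are illustrative assumptions, not estimates from this paper; only m = 10 and N = 3.2 x 10^8 c with c = 0.1 come from the text. Equation (5) is cross-checked against a direct sum over fragment lengths.

```python
import math

def escape_probability(p: float, f: float, S: float) -> tuple[float, float]:
    """Equation (1): exact r = 1 - (1 - p)^(fS) and the small-p form 1 - exp(-pfS)."""
    n = f * S  # number of sampled self-peptide types the thymus must present
    return 1.0 - (1.0 - p) ** n, -math.expm1(-p * n)

def escaped_thymocytes(p: float, f: float, S: float, T: float) -> tuple[float, float]:
    """Equations (2) and (3): R = rT, and its linear limit R ~= pfST."""
    r, _ = escape_probability(p, f, S)
    return r * T, p * f * S * T

def expected_viral_samples(f: float, V: float) -> float:
    """Equation (4): the expected number of sampled viral fragments, fV (needs >> 1)."""
    return f * V

def fragment_count(G: int, L: int, j: int, k: int) -> int:
    """Equation (5), exact form; assumes cleavage can occur at any residue."""
    return G * ((k - j + 1) * (L + 1) + ((j - 1) * j - k * (k + 1)) // 2)

def distinguishable_peptides(m: int) -> int:
    """Equation (7): M = 20^m."""
    return 20 ** m

def break_even_length(N: float) -> float:
    """Equations (9)-(10): the b for which 20^b = N."""
    return math.log(N, 20)

# Cross-check (5) against the direct sum over fragment lengths i = j..k.
assert fragment_count(1, 150, 2, 20) == sum(150 - i + 1 for i in range(2, 21))

# Illustrative parameter values (assumptions, not taken from the paper).
R, R_linear = escaped_thymocytes(p=1e-7, f=1e-3, S=1e8, T=1e6)
print(f"escaped self-reactive thymocytes R = {R:.0f} (pfST = {R_linear:.0f})")
print("fV =", expected_viral_samples(f=1e-3, V=50_000))

# Values from the text: m = 10 in (8), and N = 3.2 x 10^8 c in (9) with c = 0.1.
print("M for m = 10:", distinguishable_peptides(10))
print("break-even b for c = 0.1:", round(break_even_length(3.2e8 * 0.1), 2))
```

Because R ~= pfST is linear in f while fV must stay well above 1, halving f in this sketch halves the expected number of escaping self-reactive thymocytes but also halves the expected number of sampled viral fragments.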
Given these competing pressures, it is not surprising that MHC type has been shown by numerous investigators [IMMR83] to be linked with an individual's susceptibility to disease. Possible mechanisms include inadequate selectivity, resulting in the release of self-reactive T-cells; selecting too few samples from viral protein, resulting in an immune response that is too weak; and selecting particular antigen fragments from pathogens that mimic a self-protein idiotype, resulting in an autoimmune reaction.

One weakness in the protection afforded by MHC sampling is that a virus could evolve within a host individual that synthesizes only proteins which evade sampling by that individual's MHC. While this poses a danger to the individual, MHC polymorphism prevents such a virus from endangering an entire species. Since different individuals express MHC allotypes with different binding specificities, they sample different subsets of protein fragments. A virus that evades one individual's selection generally will be sampled by most other individuals' MHC proteins. In the cryptologic authenticator analogy, each individual's set of valid codes is determined by a key in the form of MHC allotypes. A virus could evolve to evade one key but not all keys.

The so-called neutral genetic drift, random amino acid substitutions with no apparent effect on protein function, might in fact be adaptive for organisms that use the class I MHC system. The resulting variation in self-protein sequences would provide additional protection against pathogens that have adapted to other species.

Viral infections are often suppressed but not eliminated. In order to mount a cellular defense against a virus-infected cell under the proposed model, the rate of viral protein synthesis must be high enough to ensure that some fragments are sampled. This suggests how a quiescent virus could evade eradication.
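The key analogy can be illustrated with a toy Python simulation. The assumptions are ours, not the paper's: each individual's MHC allotypes act as an independent pseudo-random key, modeled by a hypothetical keyed SHA-256 hash with sampling fraction f; peptides are abstract integer identifiers; and a virus "evolves" within one host simply by retaining fragments that host does not sample. Real MHC alleles share binding motifs, so real keys are not fully independent.

```python
import hashlib

def is_sampled(key: str, peptide: int, f: float) -> bool:
    """Hypothetical keyed sampler: an individual's MHC 'key' samples a
    pseudo-random fraction f of all peptides."""
    digest = hashlib.sha256(f"{key}:{peptide}".encode()).digest()
    u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53  # uniform in [0, 1)
    return u < f

def evolve_escape_variant(host_key: str, candidates, f: float, V: int) -> list[int]:
    """Toy 'evolution' inside one host: keep V fragments the host never samples."""
    return [p for p in candidates if not is_sampled(host_key, p, f)][:V]

f, V = 0.01, 500                      # illustrative selectivity and fragment count
variant = evolve_escape_variant("host", range(20_000), f, V)

# How many other individuals (independent keys) still sample at least one fragment?
others = [f"individual-{i}" for i in range(200)]
detected_by = sum(any(is_sampled(k, p, f) for p in variant) for k in others)

print("fragments sampled by the host:", sum(is_sampled("host", p, f) for p in variant))
print(f"other individuals sampling at least one fragment: {detected_by} of {len(others)}")
print("independent-key prediction 1 - (1 - f)^V:", 1 - (1 - f) ** V)
```

Under these assumptions each other individual misses all V fragments with probability (1 - f)^V, so a variant tuned to evade one key is exposed to nearly every other key once fV is well above 1.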
The model also suggests a way that incorporation of a weak or damaged provirus into an organism's genome might be protective against retroviruses: activation of a new, infectious provirus could also activate the inherited provirus. The resulting production of foreign but ineffective protein, and its subsequent degradation and presentation, could alert the immune system at an earlier stage of viral development.

The proposed model also permits some speculation on a mechanism of tumor cell recognition. Expression of certain early developmental signal proteins may stop before the thymocyte deletion system is operating. When such signals are inappropriately expressed later, say during oncogenesis, their fragments are sampled and stimulate the cytotoxic T-cell system. Such developmental proteins might have co-evolved with MHC to maximize their recognition potential.

MHC sampling polymorphism also suggests an explanation of the vigorous rejection of tissue grafts from closely related, though non-identical, donors. The donor cells will have MHC protein allotypes different from the recipient's and will therefore sample a different subset of protein fragments. As a result, the recipient's T-cells will see not just the few epitopes on the donor's variant MHC protein allotypes but a large number of antigen epitopes in the form of sampled peptide fragments that are new to the recipient. These would be fragments of proteins which, though synthesized by the recipient, are not sampled by the recipient's MHC proteins and which, therefore, have not induced deletion of self-reactive thymocytes during T-cell maturation. The donor cells have the wrong cryptologic key and are therefore presenting totally wrong authenticator codes to the recipient's T-cells.

Class I MHC performs the sampling function in the vast majority of cells whose normal function does not require ingesting foreign protein.
Class II MHC may perform a similar sampling function in antigen-processing cells such as macrophages and activated B-cells. Here the detection of a foreign protein fragment does not mark a cell as infected but triggers other immune responses. The first three selection pressures listed above would also seem appropriate to class II MHC.

Bibliography

[BJORKMAN87] Bjorkman, P.J., et al., "Structure of the human class I histocompatibility antigen, HLA-A2," Nature 329, p. 506 (1987)
[DELISI88] DeLisi, C., "Computers in molecular biology: current applications and emerging trends," Science 240, pp. 47-52 (1988)
[IMMR83] Immunological Reviews, Vol. 70 (1983)
[KLEIN79] Klein, J., "The major histocompatibility complex of the mouse," Science 203, pp. 516-521 (1979)
[MARRACK87] Marrack, P. and Kappler, J., "The T cell receptor," Science 238, pp. 1073-1111 (1987)
[MCKUSICK86] McKusick, V.A., "Mendelian Inheritance in Man," Seventh Edition, pp. xvii-xviii, Johns Hopkins Univ. Press, Baltimore, MD (1986)
[PARHAM88] Parham, P., "Presentation and processing of antigens in Paris," Immunology Today, Vol. 9, No. 3, pp. 65-68 (1988)
[SETTE87] Sette, A., Buus, S., Colob, S., Smith, J.A., Miles, C., Grey, H.M., "Structural characteristics of an antigen required for its interaction with Ia and recognition by T cells," Nature 328, pp. 395-399 (1987)
[SIMMONS82] Simmons, G.J., "Message authentication without secrecy," in "Secure communications and asymmetric cryptosystems," pp. 105-139, Westview Press, Boulder, CO (1982)