Home - Physics Matt McIrvin mmcirvin@world.std.com

A concise introduction to elementary statistical mechanics, or:

Where does the Boltzmann factor come from?

by Matt McIrvin

Contents

  1. Introduction: What it's used for
  2. Probability and entropy
  3. Temperature
  4. The Boltzmann factor
  5. Making it intuitive

(Even if you've never heard of the Boltzmann factor, you might appreciate the following. There is some algebra and a little differential calculus in it, but very little knowledge of physics is assumed, basically just the concept of energy.)

1. Introduction: What it's used for

From time to time I have heard people ask on sci.physics (and in other places, such as the American Journal of Physics, whose Questions and Answers section is basically a more genteel sci.physics in print form) where the Boltzmann factor e^(-E/kT) in statistical mechanics comes from.

This is a factor that shows up in situations where the temperature, T, is given; it is (proportional to) the probability that the system is in a state with energy E, where k is Boltzmann's constant (which you may have seen in the ideal gas law in a chemistry class). The more energetic the state is, the less probable it is, but increasing the temperature increases the probability of the more energetic states. The Boltzmann factor is the basis of a huge amount of thermodynamic and statistical physics, both classical and quantum.

The factor gives the probability of a single state; the probability of a given energy also depends on how many possible states there are with that energy. In the quantum case you can often actually count the states discretely when adding up probabilities. In the classical case the states form a continuum, and you have to replace the sums over states with integrals over phase space. This is one place where the quantum calculations are actually easier to understand, at least to me, and I'm going to assume in the following that states are always discrete. It amazes me that Ludwig Boltzmann managed to figure out more or less the argument that I'll describe here before quantum mechanics was invented.

So why does the expression have this specific form? Feynman justifies it heuristically in the Feynman Lectures on Physics by reference to the "exponential atmosphere"; thermo textbooks usually give a more or less complete explanation, but it's not all in one place and it's hard to see the flow of the logic. I've decided, therefore, to spend a little time writing up a concise and not terribly rigorous explanation, which introduces some of the basic concepts of statistical mechanics along the way. My inspiration mostly comes from Erwin Schrödinger's book Statistical Thermodynamics; I highly recommend it: it was written by a giant, it's fairly easy to read, it's very short, and as science textbooks go, it's extremely cheap (it is now a Dover reprint).

2. Probability and entropy

Any physical system that is made up of many, many tiny parts will have microscopic details to its physical behavior that are not easy to observe. There are various microscopic states the system can have, each of which is defined by the state of motion of every one of its atoms, for instance. But all we can measure easily are its macroscopic properties like density or pressure.

(You might wonder whether there is some fundamental, physical difference between macroscopic properties and microscopic ones. Really, there isn't. The macroscopic properties are just the ones we choose to measure or control, and the microscopic properties are the ones that jitter around behind the scenes. Usually, the macroscopic properties have to do with things we can comfortably measure with human-sized equipment, hence the name; but that is not necessarily so. In an age when atoms can be photographed and hauled about individually and piled like cannonballs, you can see that the distinction is somewhat porous.)

In the sort of situation studied in statistical mechanics, the microscopic state is constantly thrashing around randomly (subject only to conservation laws), and in such a situation the Second Law of Thermodynamics comes into play.

The Second Law of Thermodynamics can be nicely stated as follows: A physical system will, if isolated (that is, if energy cannot get in or out), tend toward the available macroscopic state in which the number of possible microscopic states is the largest.

This makes sense; if there are many different ways to have a certain set of macroscopic parameters, that ought to increase the likelihood of the system being in that macrostate. It's like rolling a pair of dice. Suppose that the "macrostate" is the total of the dice. There are six ways to get a total of 7 from the "microstates" of the two dice, but only one way to get a total of 2 (snake-eyes) or 12 (boxcars), so 7 is more likely.

Each die can have any of six "microstates", and for each microstate of one die, the other die can be in any microstate, so the number of microstates of the whole system is 6*6 = 36. In general, if you combine two systems into a bigger system, the number of possible microstates multiplies.
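The counting for two dice is small enough to do by brute force. The following is just an illustrative sketch (the variable names are mine); it lists every microstate and tallies how many of them produce each total:

```python
from collections import Counter
from itertools import product

# Each microstate is an ordered pair (first die, second die).
microstates = list(product(range(1, 7), repeat=2))

# The "macrostate" is the total; count the microstates behind each total.
ways = Counter(a + b for a, b in microstates)

print("number of microstates:", len(microstates))
for total in sorted(ways):
    print(total, ways[total])
```

The count for the whole system comes out as the product of the counts for the individual dice, which is the multiplication rule stated above.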

If you have hundreds of fair dice and put them all together, the total for which the number of possible microstates is at a maximum is (if you work it out) at a value of 3.5 times the number of dice. If you just shake these dice up in a bag it will be extremely improbable for you to get a total which is much different from that; if you do the experiment many times, the average deviation from this number is nearly certain to be much smaller than the total. If you carefully put the dice into the bag so that they show some vastly different total, then shake up the bag, the total of the uppermost faces of all the dice will converge rapidly on the value (the "macrostate") for which the number of ways to make it from individual dice ("microstates") is at a maximum. It's just the same for the macrostates of an isolated system (except for the additional restriction that some quantities, like the total energy, may be subject to conservation laws); thermal fluctuations do the "shaking," and the macroscopically measurable quantities converge on the values with the largest number of microstates. This is what the Second Law says.
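You don't actually have to shake a bag to check this. Here is a rough sketch that counts the microstates exactly for 100 dice by adding one die at a time; the choice of 100 dice and the 10% window are arbitrary illustrative values, not anything special:

```python
def count_totals(num_dice, sides=6):
    """Return {total: number of microstates} for num_dice fair dice."""
    counts = {0: 1}  # with zero dice there is one way to show a total of 0
    for _ in range(num_dice):
        new_counts = {}
        for total, ways in counts.items():
            for face in range(1, sides + 1):
                new_counts[total + face] = new_counts.get(total + face, 0) + ways
        counts = new_counts
    return counts

num_dice = 100
counts = count_totals(num_dice)
all_microstates = sum(counts.values())       # equals 6**num_dice
most_likely = max(counts, key=counts.get)    # total with the most microstates

# Fraction of all microstates whose total lies within 10% of 3.5 * num_dice.
center = 3.5 * num_dice
near = sum(w for t, w in counts.items() if abs(t - center) <= 0.1 * center)
print(most_likely, near / all_microstates)
```

The most likely total sits at 3.5 times the number of dice, and the overwhelming majority of microstates have totals close to it. Increasing num_dice makes the concentration tighter still.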

When systems are combined, the numbers of microstates multiply. It's inconvenient to multiply all of these truckloads of numbers, especially if there are 10^23 atoms to deal with! If you take the logarithms of numbers, then the log of the product is the sum of the logs. We can get a quantity maximized by the Second Law, which adds when you put systems together, by taking the logarithm of the number of microstates, as written on Ludwig Boltzmann's tombstone:

S = k log N                                                 (1)

where N is the number of microstates, "log" is the natural log, and k is an arbitrary constant, called Boltzmann's constant. S is the entropy. Stated in terms of this quantity, the Second Law says that isolated systems tend toward an equilibrium macrostate with as large a total entropy as possible, because then the number of microstates is the largest.

If you add up the numbers of microstates, N, for all the possible macrostates corresponding to a given energy, and take k log of the total, then you get the entropy corresponding to that energy, S(E).

Entropy is sometimes described as a measure of "disorder," and if you ponder the definition given above, you can see that in a narrow sense that is true. However, the word "disorder" has all sorts of esthetic, moral, and political meanings that have nothing to do with k log N. The identification of entropy with disorder in the broader sense (combined, sometimes, with neglect of the Second Law's proviso about isolated systems) has been used to justify many specious arguments, such as the claims that the Second Law disproves evolution or that it proves that technology is evil. It's better just to keep the Boltzmann definition, which is wonderfully simple, firmly in mind. Entropy has to do with the number of ways that the microstate can rearrange itself without affecting the macrostate.

3. Temperature

Okay, so what happens if you have two systems, and you put them in thermal contact so they can exchange some energy? The total energy, E = E1+E2, is constant, but E1 and E2 can individually change. What will the individual systems look like in equilibrium?

Now the two individual systems are not isolated, but the combination of the two is, so the Second Law should apply as long as we consider the combination as a whole. The number of microstates, N1*N2, will be as large as possible, so the total entropy, k (log N1 + log N2), will also be as large as possible: S = S1(E1) + S2(E - E1) will be maximized. But then S can't get any bigger if you change E1, so if you take the derivative of S with respect to E1, that has to be zero:

d S       d S1 (E1)       d S2 (E - E1)
----  =  -----------  +  --------------  =  0
d E1        d E1              d E1

or, since E2 = E - E1 means that dS2/dE1 = -dS2/dE2,

 d S1 (E1)       d S2 (E2)
-----------  -  -----------  =  0
   d E1            d E2

(Really these should be partial derivatives.)

So it appears that dS/dE, the derivative of entropy with respect to energy, is the same for the two systems. We call this "1/T", where T is a quantity called temperature. This is the definition of temperature! It's defined so that two systems in thermal contact will tend to equalize their temperatures. We define k (Boltzmann's constant) in (1) so that T is in whatever temperature units we want to use. (This definition will put the zero of temperature at absolute zero, so to use a Fahrenheit or Celsius scale, we have to add a constant too.)
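To see the equal-slope condition at work in numbers, here is a toy sketch. It rests on assumptions that are mine and not part of the argument above: each system is a collection of identical oscillators sharing indivisible energy quanta (so the microstate count is a binomial coefficient), k is set to 1, and the system sizes are arbitrary.

```python
from math import comb, log

def multiplicity(num_oscillators, quanta):
    """Ways to distribute identical energy quanta among the oscillators."""
    return comb(quanta + num_oscillators - 1, quanta)

def entropy(num_oscillators, quanta):
    return log(multiplicity(num_oscillators, quanta))  # S = k log N with k = 1

def slope(num_oscillators, quanta):
    """Centered finite-difference estimate of dS/dE, that is, 1/T."""
    return (entropy(num_oscillators, quanta + 1)
            - entropy(num_oscillators, quanta - 1)) / 2

n1, n2 = 300, 600      # sizes of system 1 and system 2 (arbitrary)
total_quanta = 900     # E = E1 + E2, held fixed

# Equilibrium: the split of energy that maximizes N1 * N2.
best_e1 = max(
    range(total_quanta + 1),
    key=lambda e1: multiplicity(n1, e1) * multiplicity(n2, total_quanta - e1),
)

print("E1 at equilibrium:", best_e1)
print("dS1/dE1:", slope(n1, best_e1))
print("dS2/dE2:", slope(n2, total_quanta - best_e1))
```

The two slopes agree closely at the most probable split, and they agree better as the systems get bigger. In this particular toy model the energy per oscillator also comes out nearly equal, but only because both systems are made of the same kind of oscillator.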

This is a little like discovering that an old friend, unbeknownst to you, has actually been working all these years as a secret agent in outer space, fighting Martians. Temperature is such a mundane quantity, reported on the evening news, and here we've defined it in terms of exotic concepts like the logarithm of the number of microstates! To make it less counterintuitive, just remember that by this definition, heat energy will always flow from hotter objects to colder ones when you put them in thermal contact, because the cold object gets more entropy per unit of energy than the hot one. This is exactly how temperature ought to behave, to conform to our usual experience of temperature. If it's cold outside, that means that the world gains a tremendous amount of entropy by sucking heat energy out of your body, so you'd better bundle up.

In high school you are told that temperature is a measure of energy density, but it really isn't except in very simple situations in which dS/dE is inversely proportional to the energy density (like an ideal gas). An expression always proportional to energy density would not behave in the way we want. The temperature is admittedly a function of the energy density (not the total energy, since both E and S scale in the same way with the size of the system if energy density is constant; temperature is an "intensive quantity"). But it is generally a nonlinear function of energy density, and it is different for different substances, and for the same substance under different conditions; so if you put two objects of identical energy density in thermal contact, they will not always be in thermal equilibrium, and two objects in equilibrium will not always have identical energy densities.

If you have two bags of fair, six-sided dice, and you repeatedly shake them up and dump them out, then select only the rolls in which all of the dice add up to some constant number (an unimaginably tedious exercise, I admit), in the majority of these rolls the average number on a die will be approximately the same in each bag. (How could it be otherwise?) But if one of the bags contains six-sided dice and the other contains those icosahedral dice that role-playing game addicts carry around with them, the result will be quite different; a typical die in one bag will carry more of the total than in the other, even though the system is in the "thermal equilibrium" obtained by shaking the bags.
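Fortunately the tedious exercise can be handed to a computer. The sketch below counts microstates exactly rather than rolling anything; the bag size of 10 dice and the fixed grand totals of 60 and 100 are arbitrary illustrative choices.

```python
def count_totals(num_dice, sides):
    """Return {total: number of microstates} for num_dice fair dice."""
    counts = {0: 1}
    for _ in range(num_dice):
        new_counts = {}
        for total, ways in counts.items():
            for face in range(1, sides + 1):
                new_counts[total + face] = new_counts.get(total + face, 0) + ways
        counts = new_counts
    return counts

def most_likely_split(ways_a, ways_b, grand_total):
    """Total shown by bag A that maximizes ways_a * ways_b at fixed grand total."""
    candidates = [t for t in ways_a if (grand_total - t) in ways_b]
    return max(candidates, key=lambda t: ways_a[t] * ways_b[grand_total - t])

dice_per_bag = 10
d6 = count_totals(dice_per_bag, 6)
d20 = count_totals(dice_per_bag, 20)

same = most_likely_split(d6, d6, 60)      # two bags of six-sided dice
mixed = most_likely_split(d6, d20, 100)   # six-sided bag and twenty-sided bag

print("two d6 bags, average per die:", same / dice_per_bag, (60 - same) / dice_per_bag)
print("d6 and d20, average per die: ", mixed / dice_per_bag, (100 - mixed) / dice_per_bag)
```

With identical bags the most probable split gives each die the same average; with the mixed bags the twenty-sided dice carry noticeably more of the total per die, even though both cases are "in equilibrium."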

4. The Boltzmann factor

Now that we've defined temperature, the rest is not so hard. We want to talk about a system that is at constant temperature. That's not the same as an isolated system, but we can define one in terms of the other by the method of combining systems to form bigger systems. We can just put the system we want to study in a great big "oven" that keeps it at constant temperature.

Consider the following pair of systems in thermal contact. One (system 1) is the system we're actually studying, and the other (system 2) is a "heat bath": an immense system at some temperature T2, which is so big that it has nearly all of the total energy E of the combined system. In other words, E1 << E. Suppose furthermore that the entropy of system 2 is a fairly smooth function of E whose derivative doesn't vary much over a range of size E1. In other words, system 2, the heat bath, can absorb energy E1 without the temperature changing significantly at all. Since the temperature is some function of the energy density, and in practice it will be quite continuous unless the heat bath is made of an unsuitable substance like nitroglycerine, this requirement is also easy to satisfy just by making the heat bath really big, so that the energy density changes very little when E1 is absorbed.

Now, for a given total energy E = E1+E2, the probability that there will be some energy E1 is just proportional to the total number of microstates of system 1 with energy E1 (call it N1) times the total number of microstates of system 2 with energy E2 (N2). What we are interested in is N2, because that will tell us how probable each of the E1 microstates is.

What is N2? Well, by the definition of entropy (1),

S2 (E2) = S2 (E - E1) = k log N2

Now, since we assume that S2 is a smooth function and E1 is very small compared to E, we can approximate S2 (E - E1) to first order in E1 by

          d S2 (E)
S2 (E) -  -------- * E1 = k log N2
             dE

But the derivative is just 1/(T2), and at equilibrium T2 = T1 = T; so

                      E1
k log N2 = S2 (E) - ------
                      T

or

            S2 (E)              E1
N2 = exp ( ------- ) * exp ( - ---- )
              k                 kT

There's the Boltzmann factor! So as long as a system is in contact with a great big heat bath at some temperature T, the probability of any microstate of system 1 with energy E1 has a factor proportional to e^(-E1/kT), because the number of microstates of the heat bath (which, in practice, could just be the outside world!) for energy E-E1 is proportional to this. The other factor, exp(S2(E)/k), is just a constant, which will be the same for all of the states; we can ignore it because we typically take care of the overall normalization of probabilities by dividing expectation values by the sum of e^(-E/kT) over all the states of system 1 (an integral over phase space in the classical case), the famous partition function.
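Here is a rough numerical check of the argument, under assumptions of my own choosing: the heat bath is modeled as 10,000 identical oscillators sharing 20,000 indivisible energy quanta, system 1 is a single oscillator with one microstate per energy, energies are measured in quanta, and k = 1. The exact relative probability of each E1 is the ratio of heat-bath microstate counts, and it is compared with e^(-E1/kT), where 1/kT is estimated from the slope of log N2.

```python
from math import comb, exp, log

def bath_multiplicity(quanta, num_oscillators=10_000):
    """Microstates of the heat bath (system 2) holding `quanta` energy units."""
    return comb(quanta + num_oscillators - 1, quanta)

total_quanta = 20_000   # total energy E of the combined system

# 1/kT = d(log N2)/dE, estimated with a centered finite difference at E.
beta = (log(bath_multiplicity(total_quanta + 1))
        - log(bath_multiplicity(total_quanta - 1))) / 2

n2_at_e = bath_multiplicity(total_quanta)
for e1 in range(6):
    exact = bath_multiplicity(total_quanta - e1) / n2_at_e   # N2(E - E1) / N2(E)
    boltzmann = exp(-beta * e1)                              # e^(-E1/kT)
    print(e1, exact, boltzmann)

# Normalize with the partition function: the sum of e^(-E1/kT) over the states
# of system 1 (cut off at 200 levels; the remaining terms are negligible).
levels = range(200)
z = sum(exp(-beta * e1) for e1 in levels)
probabilities = [exp(-beta * e1) / z for e1 in levels]
```

The two columns agree closely as long as E1 is tiny compared with E, which is exactly the condition assumed in the derivation. The constant factor exp(S2(E)/k) has dropped out of the ratio, as promised.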

5. Making it intuitive

Now that we've done the math, let's go back and look at what it means. The more energy system 1 has, the less energy the heat bath has. We've said that the heat bath's temperature is not significantly affected by its energy. That means that its entropy changes by a constant amount for a constant change in energy... which, in turn, means that the number of available microstates of the heat bath changes by a constant factor for a constant change in energy. That is, it has to vary exponentially with energy.

Is there some nice, intuitive way to see why this should happen? All that we've specified is that the system is really big. Suppose, in the spirit of Max Planck, that it consists of some large number N of oscillators of some sort (which could be atomic electrons, or standing-wave states of moving atoms in a box, or anything you like), and that the oscillators possess a quantized set of energy levels (not necessarily evenly spaced or anything). (From here on, N counts oscillators, not microstates as it did in (1).) If you put in a little more energy, then some of the oscillators will get bumped up one or more levels. The more energy you put in, the more oscillators this can happen to.

If you put in just enough to bump up one oscillator, there will be N ways to do this. Put in twice as much, and maybe the same oscillator could get bumped up another level, but that scenario is negligibly improbable compared to the probability that any one of the other N-1 oscillators will get bumped up. N-1 is practically the same as N if N is big, so there are, to an excellent approximation, N*N ways to put in twice as much energy. By the same reasoning, there will be N*N*N ways to put in three times as much... and so on. There's the constant factor associated with a constant change in energy.

This description is a little overidealized, because of course the oscillators won't all be at the same energy level to begin with, and they might not all be identical, so the spacing to the next level will in general vary. But even so, the average spacing to the next level won't vary significantly with the amount of energy you put in, so as the quantity of energy gets large enough to excite a lot of oscillators, the number of oscillators excited will vary linearly with the energy and the number of microstates will vary exponentially. This state of affairs will continue until you've put in enough energy that a significant fraction of the total number of oscillators, N, have been excited by it. Then, the temperature starts to rise appreciably; but you can postpone that by making N large enough.
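A quick numerical illustration of both halves of that statement, again using a toy model that is my own assumption rather than anything required by the argument: 10,000 identical oscillators with evenly spaced levels, already holding 5,000 quanta. The function reports the factor by which the number of microstates grows when one more quantum is added.

```python
from math import comb

def multiplicity(quanta, num_oscillators):
    """Ways to distribute identical energy quanta among the oscillators."""
    return comb(quanta + num_oscillators - 1, quanta)

def growth_factor(quanta, num_oscillators):
    """Factor by which the microstate count grows when one quantum is added."""
    return multiplicity(quanta + 1, num_oscillators) / multiplicity(quanta, num_oscillators)

num_oscillators = 10_000
start = 5_000   # quanta already in the system

for added in (0, 1, 10, 100, 5_000):
    print(added, growth_factor(start + added, num_oscillators))
```

For small additions the factor barely moves, so the number of microstates grows exponentially with the energy. Once the added energy is comparable to what was already there, the factor drifts, which is the temperature starting to rise.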

So the Boltzmann factor is just there because in a really big system, the number of microstates goes up exponentially with its energy. Naturally, taking energy out of the heat bath will make the probability of a state of the other system decrease by an exponential factor!

To do all this rigorously, you imagine doing a large number of experiments under the same macroscopic conditions and measuring the number of times you get each microscopic state; that is how the probabilities are defined. The collection of experiments is called an "ensemble." What I have done here is, in modern language, how you go from a "microcanonical ensemble" (in which energy is held precisely constant, but other things are allowed to fluctuate with the microscopic state) to a "canonical ensemble" (in which, instead, it is temperature that is held constant). If particle number is allowed to vary (by chemical reactions or diffusion across a membrane or something like that), then a completely analogous derivation leads one to the "chemical potential" and the grand canonical ensemble.

Most of equilibrium statistical mechanics follows.

Last modified April 25, 2000