A concise introduction to elementary
statistical mechanics, or:
Where does the Boltzmann factor come from?
by Matt McIrvin
Contents
- Introduction: What it's used for
- Probability and entropy
- Temperature
- The Boltzmann factor
- Making it intuitive
(Even if you've never heard of the Boltzmann factor, you might
appreciate the following. There is some algebra and a little
differential calculus in it, but very little knowledge of physics
is assumed, basically just the concept of energy.)
1. Introduction: What it's used for
From time to time I have heard people ask on sci.physics (and in
other places, such as the American Journal of Physics,
whose Questions and Answers section is basically a more genteel
sci.physics in print form) where the Boltzmann factor e^(-E/kT) in
statistical mechanics comes from.
This is a factor that shows up in situations where the
temperature, T, is given; it is (proportional to) the probability
that the system is in a state with energy E, where k is Boltzmann's
constant (which you may have seen in the ideal gas law in a
chemistry class). The more energetic the state is, the less
probable it is, but increasing the temperature increases the
probability of the more energetic states. The Boltzmann factor is
the basis of a huge amount of thermodynamic and statistical
physics, both classical and quantum.
The factor gives the probability of a single state; the
probability of a given energy also depends on how many
possible states there are with that energy. In the quantum case you
can often actually count the states discretely when adding up
probabilities. In the classical case the states form a continuum,
and you have to replace the sums over states with integrals over
phase space. This is one place where the quantum calculations are
actually easier to understand, at least to me, and I'm going to
assume in the following that states are always discrete. It amazes
me that Ludwig Boltzmann managed to figure out more or less the
argument that I'll describe here before quantum mechanics
was invented.
So why does the expression have this specific form? Feynman
justifies it heuristically in the Feynman Lectures on
Physics by reference to the "exponential atmosphere"; thermo
textbooks usually give a more or less complete explanation, but
it's not all in one place and it's hard to see the flow of the
logic. I've decided, therefore, to spend a little time writing up a
concise and not terribly rigorous explanation, which introduces
some of the basic concepts of statistical mechanics along the way.
My inspiration mostly comes from Erwin Schrödinger's book
Statistical Thermodynamics; I highly recommend it. It was
written by a giant, it's fairly easy to read, it's very short, and
as science textbooks go, it's extremely cheap (it is now a Dover
reprint).
2. Probability and entropy
Any physical system that is made up of many, many tiny parts
will have microscopic details to its physical behavior that are not
easy to observe. There are various microscopic states the
system can have, each of which is defined by the state of motion of
every one of its atoms, for instance. But all we can measure easily
are its macroscopic properties like density or
pressure.
(You might wonder whether there is some fundamental, physical
difference between macroscopic properties and microscopic ones.
Really, there isn't. The macroscopic properties are just the ones
we choose to measure or control, and the microscopic properties are
the ones that jitter around behind the scenes. Usually, the
macroscopic properties have to do with things we can comfortably
measure with human-sized equipment, hence the name; but that is not
necessarily so. In an age when atoms can be photographed and hauled
about individually and piled like cannonballs, you can see that the
distinction is somewhat porous.)
In the sort of situation studied in statistical mechanics, the
microscopic state is constantly thrashing around randomly (subject
only to conservation laws), and in such a situation the Second Law
of Thermodynamics comes into play.
The Second Law of Thermodynamics can be nicely stated as
follows: A physical system will, if isolated (that is, if energy
cannot get in or out), tend toward the available macroscopic state
in which the number of possible microscopic states is the
largest.
This makes sense; if there are many different ways to
have a certain set of macroscopic parameters, that ought to
increase the likelihood of the system being in that macrostate.
It's like rolling a pair of dice. Suppose that the "macrostate" is
the total of the dice. There are six ways to get a total of 7 from
the "microstates" of the two dice, but only one way to get a total
of 2 (snake-eyes) or 12 (boxcars), so 7 is more likely.
Each die can have any of six "microstates", and for each
microstate of one die, the other die can be in any microstate, so
the number of microstates of the whole system is 6*6 = 36. In
general, if you combine two systems into a bigger system,
the number of possible microstates multiplies.
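The dice counting above can be checked by brute force. Here is a minimal Python sketch (the function name count_microstates is purely illustrative) that enumerates every ordered roll of the dice and tallies how many "microstates" produce each total.

```python
from collections import Counter
from itertools import product


def count_microstates(num_dice=2, sides=6):
    """Count how many ordered rolls (microstates) give each total (macrostate)."""
    faces = range(1, sides + 1)
    return Counter(sum(roll) for roll in product(faces, repeat=num_dice))


ways = count_microstates()
for total in sorted(ways):
    print(total, ways[total])

print(sum(ways.values()))  # 36 microstates in all, i.e. 6*6
```

The table it prints rises from one way at a total of 2 to six ways at a total of 7, then falls back to one way at 12.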
If you have hundreds of fair dice and put them all together, the
total for which the number of possible microstates is at a maximum
is (if you work it out) at a value of 3.5 times the number of dice.
If you just shake these dice up in a bag it will be extremely
improbable for you to get a total which is much different from
that; if you do the experiment many times, the average deviation
from this number is nearly certain to be much smaller than the
total. If you carefully put the dice into the bag so that they show
some vastly different total, then shake up the bag, the total of
the uppermost faces of all the dice will converge rapidly on the
value (the "macrostate") for which the number of ways to make it
from individual dice ("microstates") is at a maximum. It's just the
same for the macrostates of an isolated system (except for the
additional restriction that some quantities, like the total energy,
may be subject to conservation laws); thermal fluctuations do the
"shaking," and the macroscopically measurable quantities converge
on the values with the largest number of microstates. This is what
the Second Law says.
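To see how sharply the total concentrates when there are many dice, one can count microstates exactly rather than shaking a bag. The sketch below is only an illustration under my own choice of numbers (100 dice, a window of 50 either side of the peak); the function names are hypothetical. It builds up the number of microstates for each total one die at a time.

```python
def total_distribution(num_dice, sides=6):
    """Return {total: number of microstates} for num_dice fair dice."""
    dist = {0: 1}  # zero dice: exactly one way to show a total of 0
    for _ in range(num_dice):
        new = {}
        for total, ways in dist.items():
            for face in range(1, sides + 1):
                new[total + face] = new.get(total + face, 0) + ways
        dist = new
    return dist


def fraction_within(dist, center, half_width):
    """Fraction of all microstates whose total lies within half_width of center."""
    all_ways = sum(dist.values())
    near = sum(w for t, w in dist.items() if abs(t - center) <= half_width)
    return near / all_ways


dist = total_distribution(100)
peak = max(dist, key=dist.get)
print(peak)  # 350, i.e. 3.5 times the number of dice
print(fraction_within(dist, peak, 50))
```

The possible totals run from 100 to 600, yet nearly all of the microstates sit in the narrow band around 350.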
When systems are combined, the numbers of microstates multiply.
It's inconvenient to multiply all of these truckloads of numbers,
especially if there are 10^23 atoms to deal with! If you take the
logarithms of numbers, then the log of the product is the sum of
the logs. We can get a quantity maximized by the Second Law, which
adds when you put systems together, by taking the
logarithm of the number of microstates, as written on Ludwig
Boltzmann's tombstone:
S = k log N (1)
where N is the number of microstates, "log" is
the natural log, and k is an arbitrary constant, called Boltzmann's
constant. S is the entropy. Stated in terms of this quantity, the
Second Law says that isolated systems tend toward an equilibrium
macrostate with as large a total entropy as possible, because then
the number of microstates is the largest.
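The point of the logarithm can be shown in a few lines. This is a small sketch, assuming units in which k = 1 and using two made-up microstate counts (36 and 20); the function name entropy is just for illustration.

```python
import math


def entropy(num_microstates, k=1.0):
    """Boltzmann entropy S = k log N, with k set to 1 by default."""
    return k * math.log(num_microstates)


n1, n2 = 36, 20  # hypothetical microstate counts for two systems

s_combined = entropy(n1 * n2)          # microstate counts multiply...
s_sum = entropy(n1) + entropy(n2)      # ...so entropies add

print(s_combined, s_sum)
```

The two printed numbers agree up to floating-point rounding, which is all that "entropy is additive" means here.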
If you count up all the microstates corresponding to a given
energy, whatever macrostate they belong to, and take k log of that
number, then you get the entropy corresponding to that energy, S(E).
Entropy is sometimes described as a measure of "disorder," and
if you ponder the definition given above, you can see that in a
narrow sense that is true. However, the word "disorder" has all
sorts of esthetic, moral, and political meanings that have nothing
to do with k log N. The identification of entropy with disorder in
the broader sense (combined, sometimes, with neglect of the Second
Law's proviso about isolated systems) has been used to justify many
specious arguments, such as the claims that the Second Law
disproves evolution or that it proves that technology is evil. It's
better just to keep the Boltzmann definition, which is wonderfully
simple, firmly in mind. Entropy has to do with the number of
ways that the microstate can rearrange itself without
affecting the macrostate.
3. Temperature
Okay, so what happens if you have two systems, and you put them
in thermal contact so they can exchange some energy? The total
energy, E = E1+E2, is constant, but E1 and E2 can individually
change. What will the individual systems look like in
equilibrium?
Now the two individual systems are not isolated, but the
combination of the two is, so the Second Law should apply as long
as we consider the combination as a whole. The number of
microstates, N1*N2, will be as large as possible, so the total
entropy, k (log N1 + log N2), will also be as large as possible: S
= S1(E1) + S2(E - E1) will be maximized. But then S can't get any
bigger if you change E1, so if you take the derivative of
S with respect to E1, that has to be zero:
$$\frac{dS}{dE_1} = \frac{dS_1(E_1)}{dE_1} + \frac{dS_2(E - E_1)}{dE_1} = 0$$

or

$$\frac{dS_1(E_1)}{dE_1} - \frac{dS_2(E_2)}{dE_2} = 0$$
(Really these should be partial
derivatives.)
So it appears that dS/dE, the derivative of entropy with respect
to energy, is the same for the two systems. We call this "1/T",
where T is a quantity called temperature. This is the
definition of temperature! It's defined so that two systems in
thermal contact will tend to equalize their temperatures. We define
k (Boltzmann's constant) in (1) so that T is in whatever
temperature units we want to use. (This definition will put the
zero of temperature at absolute zero, so to use a Fahrenheit or
Celsius scale, we have to add a constant too.)
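The equal-derivative condition can be checked numerically on a toy model. The sketch below assumes something not in the text above: each system is a collection of identical oscillators sharing indistinguishable energy quanta (often called an Einstein solid), for which the number of microstates with q quanta among n oscillators is the binomial coefficient C(q + n - 1, q). Units are chosen so that k = 1 and one quantum is one unit of energy; the system sizes (300 and 200 oscillators, 100 quanta) and the function names are arbitrary choices for illustration.

```python
import math


def log_multiplicity(num_oscillators, quanta):
    """Entropy (k = 1) of the toy system: log of C(quanta + n - 1, quanta)."""
    return math.log(math.comb(quanta + num_oscillators - 1, quanta))


def most_probable_split(n1, n2, total_quanta):
    """Energy of system 1 that maximizes the total entropy S1(E1) + S2(E - E1)."""
    return max(
        range(total_quanta + 1),
        key=lambda q1: log_multiplicity(n1, q1)
        + log_multiplicity(n2, total_quanta - q1),
    )


def inverse_temperature(num_oscillators, quanta):
    """Centered finite-difference estimate of dS/dE = 1/T (requires quanta >= 1)."""
    return (
        log_multiplicity(num_oscillators, quanta + 1)
        - log_multiplicity(num_oscillators, quanta - 1)
    ) / 2


q1 = most_probable_split(300, 200, 100)
print(q1)  # 60
print(inverse_temperature(300, q1), inverse_temperature(200, 100 - q1))
```

At the entropy-maximizing split the two estimates of dS/dE nearly coincide; the small remaining difference comes from energy being available only in whole quanta.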
This is a little like discovering that an old friend,
unbeknownst to you, has actually been working all these years as a
secret agent in outer space, fighting Martians. Temperature is such
a mundane quantity, reported on the evening news, and here we've
defined it in terms of exotic concepts like the logarithm of the
number of microstates! To make it less counterintuitive, just
remember that by this definition, heat energy will always flow from
hotter objects to colder ones when you put them in thermal contact,
because the cold object gets more entropy per unit of energy than
the hot one. This is exactly how temperature ought to behave, to
conform to our usual experience of temperature. If it's cold
outside, that means that the world gains a tremendous amount of
entropy by sucking heat energy out of your body, so you'd better
bundle up.
In high school you are told that temperature is a measure of
energy density, but it really isn't except in very simple
situations in which dS/dE is inversely proportional to the energy
density (like an ideal gas). An expression always proportional to
energy density would not behave in the way we want. The temperature
is admittedly a function of the energy density (not the
total energy, since both E and S scale in the same way with the
size of the system if energy density is constant; temperature is an
"intensive quantity"). But it is generally a nonlinear function of
energy density, and it is different for different substances, and
for the same substance under different conditions; so if you put
two objects of identical energy density in thermal contact, they
will not always be in thermal equilibrium, and two objects in
equilibrium will not always have identical energy densities.
If you have two bags of fair, six-sided dice, and you repeatedly
shake them up and dump them out, then select only the rolls in
which all of the dice add up to some constant number (an
unimaginably tedious exercise, I admit), in the majority of these
rolls the average number on a die will be approximately the same in
each bag. (How could it be otherwise?) But if one of the bags
contains six-sided dice and the other contains those icosahedral
dice that role-playing game addicts carry around with them, the
result will be quite different; a typical die in one bag will carry
more of the total than in the other, even though the system is in
the "thermal equilibrium" obtained by shaking the bags.
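The two-bag experiment does not have to be done by hand. Here is a sketch that counts microstates exactly instead of rolling; it assumes ten dice per bag and particular fixed totals (70 and 140), which are my own choices, and the function names are hypothetical.

```python
def total_distribution(num_dice, sides):
    """Return {total: number of ordered rolls} for num_dice fair dice."""
    dist = {0: 1}
    for _ in range(num_dice):
        new = {}
        for total, ways in dist.items():
            for face in range(1, sides + 1):
                new[total + face] = new.get(total + face, 0) + ways
        dist = new
    return dist


def most_probable_split(dist_a, dist_b, grand_total):
    """Split of a fixed grand total between two bags that has the most microstates."""
    candidates = [t for t in dist_a if grand_total - t in dist_b]
    best_a = max(candidates, key=lambda t: dist_a[t] * dist_b[grand_total - t])
    return best_a, grand_total - best_a


d6_bag = total_distribution(10, 6)
d20_bag = total_distribution(10, 20)

print(most_probable_split(d6_bag, d6_bag, 70))    # (35, 35): same average per die
print(most_probable_split(d6_bag, d20_bag, 140))  # (35, 105): 3.5 vs 10.5 per die
```

With identical bags the total is shared evenly, but with six-sided dice in one bag and twenty-sided dice in the other, the most probable split gives each twenty-sided die three times as much of the total.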
4. The Boltzmann factor
Now that we've defined temperature, the rest is not so hard. We
want to talk about a system that is at constant temperature. That's
not the same as an isolated system, but we can define one in terms
of the other by the method of combining systems to form bigger
systems. We can just put the system we want to study in a great big
"oven" that keeps it at constant temperature.
Consider the following pair of systems in thermal contact. One
(system 1) is the system we're actually studying, and the other
(system 2) is a "heat bath": an immense system at some temperature
T2, which is so big that it has nearly all of the total energy E of
the combined system. In other words, E1 << E. Suppose
furthermore that the entropy of system 2 is a fairly smooth
function of E whose derivative doesn't vary much over a range of
size E1. In other words, system 2, the heat bath, can absorb energy
E1 without the temperature changing significantly at all.
Since the temperature is some function of the energy
density, and in practice it will be quite continuous
unless the heat bath is made of an unsuitable substance like
nitroglycerine, this requirement is also easy to satisfy just by
making the heat bath really big, so that the energy density changes
very little when E1 is absorbed.
Now, for a given total energy E = E1+E2, the probability that
system 1 has some energy E1 is just proportional to the total
number of microstates of system 1 with energy E1 (call it N1) times
the total number of microstates of system 2 with energy E2 (N2).
What we are interested in is N2, because that will tell us how
probable each individual microstate of system 1 with energy E1 is.
What is N2? Well, by the definition of entropy (1),
S2 (E2) = S2 (E - E1) = k log N2
Now, since we assume that S2 is a smooth
function and E1 is very small compared to E, we can approximate S2
(E - E1) by
$$S_2(E) - \frac{dS_2(E)}{dE} E_1 = k \log N_2$$
But the derivative is just 1/(T2), and at
equilibrium T2 = T1 = T; so
$$k \log N_2 = S_2(E) - \frac{E_1}{T}$$
or
$$N_2 = \exp\left(\frac{S_2(E)}{k}\right) \exp\left(-\frac{E_1}{kT}\right)$$
There's the Boltzmann factor! So as long as a
system is in contact with a great big heat bath at some temperature
T, the probability of any microstate of system 1 with energy E1 has
a factor proportional to e^(-E1/kT), because the number of
microstates of the heat bath (which, in practice, could
just be the outside world!) for energy E-E1 is proportional to
this. The other factor is just a
constant, which will be the same for all of the states; we can
ignore that because we typically take care of the overall
normalization of probabilities by dividing expectation values by
the sum of e^(-E/kT) over all states, the famous partition function.
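In practice the Boltzmann factor is used as follows. This is a minimal sketch, assuming a system with a handful of discrete states (as assumed throughout this article, so the normalization is a sum over states) and made-up energies measured in the same units as kT; the function name is illustrative.

```python
import math


def boltzmann_probabilities(energies, kT):
    """Probability of each state: its Boltzmann factor divided by the partition function."""
    weights = [math.exp(-energy / kT) for energy in energies]
    partition_function = sum(weights)  # normalizes the probabilities to 1
    return [w / partition_function for w in weights]


energies = [0.0, 1.0, 2.0]  # hypothetical state energies

cold = boltzmann_probabilities(energies, kT=1.0)
hot = boltzmann_probabilities(energies, kT=5.0)

print(cold)
print(hot)
```

In both lists the probability falls as the energy rises, but at the higher temperature the most energetic state gets a noticeably larger share.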
5. Making it intuitive
Now that we've done the math, let's go back and look at what it
means. The more energy system 1 has, the less energy
the heat bath has. We've said that the heat bath's
temperature is not significantly affected by its energy. That means
that its entropy changes by a constant amount for a constant change
in energy... which, in turn, means that the number of available
microstates of the heat bath changes by a constant factor
for a constant change in energy. That is, it has to vary
exponentially with energy.
Is there some nice, intuitive way to see why this should happen?
All that we've specified is that the system is really big. Suppose,
in the spirit of Max Planck, that it consists of some large number
N of oscillators of some sort (which could be atomic electrons, or
standing-wave states of moving atoms in a box, or anything you
like), and that the oscillators possess a quantized set of energy
levels (not necessarily evenly spaced or anything). If you put in a
little more energy, then some of the oscillators will get bumped up
one or more levels. The more energy you put in, the more
oscillators this can happen to.
If you put in just enough to bump up one oscillator, there will
be N ways to do this. Put in twice as much, and maybe the
same oscillator could get bumped up another level, but
that scenario is negligibly improbable compared to the probability
that any one of the other N-1 oscillators will get bumped up. N-1
is practically the same as N if N is big, so there are, to an
excellent approximation, N*N ways to put in twice as much energy.
By the same reasoning, there will be N*N*N ways to put in three
times as much... and so on. There's the constant factor associated
with a constant change in energy.
This description is a little overidealized, because of course
the oscillators won't all be at the same energy level to begin
with, and they might not all be identical, so the spacing to the
next level will in general vary. But even so, the average
spacing to the next level won't vary significantly with the amount
of energy you put in, so as the quantity of energy gets large
enough to excite a lot of oscillators, the number of oscillators
excited will vary linearly with the energy and the number of
microstates will vary exponentially. This state of affairs will
continue until you've put in enough energy that a significant
fraction of the total number of oscillators, N, have been
excited by it. Then, the temperature starts to rise appreciably;
but you can postpone that by making N large enough.
So the Boltzmann factor is just there because in a really big
system, the number of microstates goes up exponentially with its
energy. Naturally, taking energy out of the heat bath will make the
probability of a state of the other system decrease by an
exponential factor!
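The claim that a big heat bath loses microstates by a constant factor for each unit of energy removed can be tested on a toy bath. The sketch below assumes a model of my own choosing, not one specified above: 10,000 identical oscillators sharing 5,000 indistinguishable quanta, for which the number of microstates is the binomial coefficient C(q + n - 1, q). The logarithm is computed with the log-gamma function so that no enormous integers are needed.

```python
import math


def log_microstates(num_oscillators, quanta):
    """log of C(quanta + n - 1, quanta), computed via log-gamma."""
    return (
        math.lgamma(quanta + num_oscillators)
        - math.lgamma(quanta + 1)
        - math.lgamma(num_oscillators)
    )


def log_steps(num_oscillators, quanta, max_removed):
    """Drop in log(microstates) for each successive quantum taken out of the bath."""
    return [
        log_microstates(num_oscillators, quanta - m)
        - log_microstates(num_oscillators, quanta - m - 1)
        for m in range(max_removed)
    ]


steps = log_steps(num_oscillators=10_000, quanta=5_000, max_removed=5)
for step in steps:
    print(step, math.exp(step))  # log of the factor, and the factor itself
```

Each quantum removed divides the number of bath microstates by very nearly the same factor (close to 3 for these numbers), which is exactly the exponential dependence that produces the Boltzmann factor.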
To do all this rigorously, you imagine doing a large number of
experiments under the same macroscopic conditions and measuring the
number of times you get each microscopic state; that is how the
probabilities are defined. The collection of experiments is called
an "ensemble." What I have done here is, in modern language, how
you go from a "microcanonical ensemble" (in which energy is held
precisely constant, but other things are allowed to fluctuate with
the microscopic state) to a "canonical ensemble" (in which,
instead, it is temperature that is held constant). If particle
number is allowed to vary (by chemical reactions or diffusion
across a membrane or something like that), then a completely
analogous derivation leads one to the "chemical potential" and the
grand canonical ensemble.
Most of equilibrium statistical mechanics follows.