Limitations of the IBM PC Architecture

or

The Curse of Segments

Ancient History

MITS shipped the first "personal computer", the Altair 8800, in 1975. It was a true computer, and could be purchased, in kit form, for about $400. Input was through front panel toggle switches; output through front panel LEDs (on/off, not 7-segment). Its market was primarily hobbyists and hackers.

By the late '70s, personal computers were available from many vendors, such as Tandy, Commodore, TI and Apple. Computers from different vendors were not compatible. Each vendor had its own architecture, its own operating system, its own bus interface, and its own software. When you purchased a computer, you implicitly made a major commitment to that vendor's standards.

The 6502

In 1980, the state-of-the-art architecture was exemplified by the 6502, designed by MOS Technology (by then a Commodore subsidiary). The 6502 is a single-chip microprocessor with an 8/16 architecture, i.e. an 8-bit data bus and a 16-bit address bus. The programming model reflects the hardware architecture: registers that hold data values, such as the accumulator, are 8 bits, while registers that hold addresses, such as the program counter, are 16 bits. A 16-bit address bus yields a 64K byte address space. By the early '80s, 64K-bit DRAMs were widely available, and 6502-based computers, such as the Commodore 64 and the Apple ][, were shipping with a full complement of 64K bytes of RAM.

In 1980, IBM decided to enter the PC market. They realized—correctly—that the never-ending fall in DRAM prices would soon make the 8/16 architecture obsolete. The next logical step would have been, say, a 16/32 architecture, such as the Motorola 68000. A 16/32 architecture would have improved performance by doubling memory bandwidth, and provided a 4G byte address space—enough for the foreseeable future.
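Every address-space figure in this article is a power of two: an n-bit address bus can select 2^n distinct bytes. The short C sketch below simply makes that arithmetic explicit for the bus widths that come up here (16, 20, 24 and 32 bits). The function names are illustrative, not taken from any library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Number of distinct byte addresses an address bus 'bits' wide can
   select: 2^bits. 64-bit arithmetic keeps the 32-bit case from overflowing. */
uint64_t address_space_bytes(unsigned bits)
{
    assert(bits < 64);
    return (uint64_t)1 << bits;
}

/* Print the address-space size for each bus width discussed in the text. */
void print_address_spaces(void)
{
    const unsigned widths[] = { 16, 20, 24, 32 };
    for (size_t i = 0; i < sizeof widths / sizeof widths[0]; i++) {
        uint64_t n = address_space_bytes(widths[i]);
        printf("%2u-bit address bus: %llu bytes (%lluK)\n",
               widths[i], (unsigned long long)n,
               (unsigned long long)(n / 1024));
    }
}
```

Calling `print_address_spaces()` lists 64K, 1024K (1M), 16384K (16M) and 4194304K (4G), the figures for the 6502, 8086, 80286 and a full 32-bit architecture respectively.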

Unfortunately, the 68000 wasn't actually available in 1980, and neither were any other single-chip microprocessors with a 16/32 architecture. The problem is that die size scales up directly with register width. When you go from 8/16 to 16/32, all the registers get twice as big, all the data paths get twice as wide, the ALU's adder carry chain gets twice as many terms—the whole design doubles in size. And in 1980, process technology simply hadn't reached the point where a single-chip 16/32 microprocessor could be fabricated at a marketable price.

The 8086

What was available was the Intel 8086. The 8086 was an ill-conceived attempt to provide an address space larger than 64K bytes without actually incurring the costs of a larger architecture. The 8086 is basically a 16/16 architecture. It has a 16-bit program counter, a 16-bit ALU, four 16-bit general purpose registers, and some 16-bit index registers. It also has four 16-bit segment registers. The 8086 performs all computation and data transfer in 16-bit arithmetic, with one exception. Immediately before gating an address onto the external address bus, the 8086 selects one of the segment registers, shifts its contents 4 bits to the left, and adds the 16-bit address (the offset) to it, using 20-bit arithmetic. The external address is therefore 20 bits, and the processor has an address space of 1M byte.
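The 8086 address calculation just described is small enough to write out. This is a minimal C model that assumes only what the paragraph states: segment shifted left 4 bits, 16-bit offset added, 20-bit result. `phys_addr` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* 8086 address generation: shift the 16-bit segment left 4 bits,
   add the 16-bit offset, and keep only 20 bits, because the 8086
   has only 20 address lines (a sum past FFFFF wraps to 00000). */
uint32_t phys_addr(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

Two consequences fall out immediately. Many different segment:offset pairs name the same byte (1234:0010 and 1235:0000 both give 12350), and a sum that carries past FFFFF wraps around to the bottom of memory, because there is no 21st address line.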

The four segment registers define four segments:

CS   the code segment
DS   the data segment
SS   the stack segment
ES   the extra segment

Most operations implicitly use the correct segment register: instruction fetches use CS, loads and stores use DS, pushes and pops use SS. A few operations, such as block move, use both DS and ES: DS for the source and ES for the destination.

Intel's documentation describes this architecture as a programming convenience: here's your code, here's your data, each neatly stored in its own segment. Students of computer science also like segmented architectures for various reasons having to do with OS design.

However, actually programming this machine is a nightmare. You can never just address anything. First, you have to make sure that a segment register is set up for it, and then you have to construct the address as an offset into that segment. A segment register can point to any 16-byte boundary in the entire 1M byte address space, but once it has been set, it only provides access to a 64K segment. If you have more than 64K of code or data, you have to reload segment registers on the fly. A particular problem is that there is no good way to index into an array that is bigger than 64K bytes.
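The 64K array problem is easiest to see by doing pointer arithmetic on a segment:offset pair. In this sketch (the `farptr` type and both functions are hypothetical), the naive version changes only the 16-bit offset, as an ordinary 8086 address calculation would. The second version recomputes the 20-bit address and renormalizes it so the segment absorbs the carry. That extra shift, add and segment reload on every step is the cost the paragraph is complaining about.

```c
#include <assert.h>
#include <stdint.h>

struct farptr { uint16_t seg, off; };   /* hypothetical segment:offset pair */

/* 20-bit physical address of a segment:offset pair. */
static uint32_t linear(struct farptr p)
{
    return (((uint32_t)p.seg << 4) + p.off) & 0xFFFFFu;
}

/* Naive arithmetic: only the 16-bit offset changes, so stepping past
   offset FFFF wraps back to the start of the same 64K segment. */
struct farptr add_naive(struct farptr p, uint32_t n)
{
    p.off = (uint16_t)(p.off + n);
    return p;
}

/* Arithmetic that crosses 64K boundaries: compute the 20-bit address,
   then renormalize so the offset is 0..F and the segment holds the rest. */
struct farptr add_normalized(struct farptr p, uint32_t n)
{
    uint32_t a = (linear(p) + n) & 0xFFFFFu;
    struct farptr q = { (uint16_t)(a >> 4), (uint16_t)(a & 0xFu) };
    return q;
}
```

Starting from 2000:FFF0 and advancing 20h bytes, the naive version lands at physical 20010 (back near the start of the segment), while the normalized version correctly reaches 30010, expressed as 3001:0000.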

IBM and the clones

Under normal circumstances, a design so twisted and flawed as the 8086 would have simply been ignored by the market and faded away. However, 1980 was Intel's lucky year. IBM chose the 8086 as the processor for the PC. Backed by IBM's marketing might and name recognition, the IBM PC quickly captured the bulk of the market. Other vendors either left the PC market (TI), pursued niche markets (Commodore, Apple) or abandoned their own architecture in favor of IBM's (Tandy). With a market share approaching 90%, the PC became a de facto standard. Software houses wrote operating systems (Microsoft DOS, Digital Research DOS), spreadsheets (Lotus 1-2-3), word processors (WordPerfect, WordStar) and compilers (Microsoft C, Borland C) that ran on the PC. Hardware vendors built disk drives, printers and data acquisition systems that connected to the PC's external bus.

Although IBM initially captured the PC market, it subsequently lost it to clone vendors. Accustomed to being a monopoly supplier of mainframe computers, IBM was unprepared for the fierce competition that arose as Compaq, Leading Edge, AT&T, Dell, ALR, AST, Ampro, Diversified Technologies and others all vied for a share of the PC market. Besides low prices and high performance, the clone vendors provided one other very important thing to the PC market: an absolute hardware standard. In order to sell a PC clone, the manufacturer had to be able to guarantee that it would run all of the customer's existing PC software, and work with all of the customer's existing peripheral hardware. The only way to do this was to design the clone to be identical to the original IBM PC at the register level. Thus, the standard that the IBM PC defined became graven in stone as dozens of clone vendors shipped millions of machines that conformed to it in every detail. This standardization has been an important factor in the low cost and wide availability of PC systems. It has also been a serious obstacle in the attempt to move beyond the limitations of the PC architecture.

Address space blues

The 8086 gives the PC an address space of 1M byte. This is conveniently described as sixteen disjoint 64K byte segments, numbered 0 through F by the first hex digit of the 20-bit address, and assigned as follows:

Segment   Use
0-9       RAM
A-B       video RAM
C-D       ROM on I/O cards
E-F       ROM BIOS (operating system code)

With 10 segments allowed for RAM, the PC can address up to 640K bytes of main memory. The space between 640K and 1M is reserved for hardware and operating system use. By the mid '80s this architecture was becoming obsolete. 256K-bit and 1M-bit DRAM chips were available; users were buying PCs with a full complement of 640K of RAM and wanted more. Unfortunately, as the table above shows, there is no place to put any more memory on a PC.
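Because each 64K block in the table is selected by the top four bits of a 20-bit address, the whole memory map reduces to a small lookup. Here is a hedged C sketch of that table (the enum and function names are made up for illustration). It also spells out where the 640K figure comes from: ten 64K blocks.

```c
#include <assert.h>
#include <stdint.h>

/* Ten 64K blocks of RAM: 10 * 65536 = 655360 bytes = 640K. */
#define RAM_TOP (10u * 0x10000u)

enum region { REGION_RAM, REGION_VIDEO_RAM, REGION_IO_ROM, REGION_BIOS_ROM };

/* Classify a 20-bit PC address by its leading hex digit (bits 19..16),
   following the 64K-block layout in the table above. */
enum region pc_region(uint32_t addr)
{
    assert(addr <= 0xFFFFFu);
    unsigned block = (unsigned)(addr >> 16);   /* 0x0 .. 0xF */
    if (block <= 0x9) return REGION_RAM;
    if (block <= 0xB) return REGION_VIDEO_RAM;
    if (block <= 0xD) return REGION_IO_ROM;
    return REGION_BIOS_ROM;
}
```

The last byte of RAM is at 9FFFF, and the very next address, A0000, is already video RAM; nothing in the map is left over for more main memory.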

One solution was bank-select memory systems. A vendor would design a memory card, add some bank-select registers, and map selected blocks of the card's memory into the PC address space, typically at C0000. With a bank-select system, the programmer is responsible for managing the bank-select registers and keeping track of which bank holds which data. Today, bank-select systems generally conform to the Lotus/Intel/Microsoft Expanded Memory Specification (LIM-EMS). In this context, the word expanded specifically denotes a bank-select system.
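A bank-select system can be modelled in a few lines. This sketch assumes a hypothetical card with 32 banks of 64K each, one bank register, and a single 64K window at C0000. Real LIM-EMS hardware instead maps 16K pages into a 64K page frame, so treat the sizes here as illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_BASE 0xC0000u   /* where the card's window appears (per the text) */
#define WINDOW_SIZE 0x10000u   /* assumed: a single 64K window */
#define NUM_BANKS   32u        /* assumed: 32 banks x 64K = 2M on the card */

/* Hypothetical bank-select card: the bank register chooses which 64K
   block of on-card memory answers at WINDOW_BASE .. WINDOW_BASE+FFFF. */
struct bank_card {
    uint8_t  ram[NUM_BANKS][WINDOW_SIZE];
    unsigned bank;                         /* the bank-select register */
};

void select_bank(struct bank_card *card, unsigned bank)
{
    assert(bank < NUM_BANKS);
    card->bank = bank;
}

/* A CPU read inside the window is routed to the currently selected bank. */
uint8_t card_read(const struct bank_card *card, uint32_t addr)
{
    assert(addr >= WINDOW_BASE && addr < WINDOW_BASE + WINDOW_SIZE);
    return card->ram[card->bank][addr - WINDOW_BASE];
}

/* A CPU write inside the window also goes to the selected bank. */
void card_write(struct bank_card *card, uint32_t addr, uint8_t value)
{
    assert(addr >= WINDOW_BASE && addr < WINDOW_BASE + WINDOW_SIZE);
    card->ram[card->bank][addr - WINDOW_BASE] = value;
}
```

The same CPU address returns different data depending on the bank register, which is why the programmer, not the hardware, has to keep track of which bank holds what.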

Expanded memory no doubt allowed a few programs to use more than 640K of RAM, but it is clearly inadequate as a long-term solution to the need for more memory. The only real solution is to move to a bigger architecture. Intel took the first step by introducing the 80286 processor.

The 80286

The 80286 is similar to the 8086 in concept. It is a 16/24 architecture. The data busses and registers are 16 bits. The external address bus is 24 bits, providing a 16M byte address space. Addresses are specified with a 16-bit segment selector and a 16-bit offset. The segment selector specifies a memory resident segment descriptor. The segment descriptor has a 24-bit segment base, a 16-bit segment size and some attribute bits. To generate an address, the 16-bit segment offset is added to the 24-bit segment base address using 24-bit arithmetic, and then gated to the external address bus.
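The protected-mode calculation can be sketched the same way. This C model keeps only what the paragraph describes (a descriptor with a 24-bit base and a 16-bit limit, with the offset added using 24-bit arithmetic) plus one real 80286 detail the text omits: the selector's upper 13 bits are the descriptor table index, and its low 3 bits hold a table indicator and a requested privilege level, which the sketch ignores. The struct layout and function name are hypothetical, a single descriptor table is assumed, and attribute bits are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified 80286 segment descriptor (attribute bits omitted). */
struct descriptor {
    uint32_t base;    /* 24-bit segment base */
    uint16_t limit;   /* highest valid offset in the segment */
};

/* Translate selector:offset to a 24-bit physical address.
   Index = selector >> 3; the low 3 selector bits are ignored here.
   Returns false where real hardware would raise a protection fault. */
bool translate_286(const struct descriptor *table, size_t table_len,
                   uint16_t selector, uint16_t offset, uint32_t *phys)
{
    size_t index = selector >> 3;
    if (index >= table_len || offset > table[index].limit)
        return false;
    *phys = (table[index].base + offset) & 0xFFFFFFu;
    return true;
}
```

Note that the offset is still 16 bits, so no segment can exceed 64K even though the base can lie anywhere in 16M.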

The 80286 gives the programmer a 16M byte address space. However, it is still hamstrung by the need to manipulate segment registers, and the fact that each segment is limited to 64K bytes, as in the 8086. More significantly, the 80286 is limited by the need to remain PC compatible.

Intel knew that they could not market a new processor unless it could run existing PC programs. Therefore, they designed the 80286 with two different execution modes: real mode and protected mode. Protected mode is the 16/24 architecture just described. Real mode is an exact emulation of the 8086 16/16 architecture. Real mode is sometimes called DOS mode. When an 80286 powers on, it boots up in real mode. This allows it to function as the processor in an IBM PC clone. Used this way, the 80286 provides a performance boost, due to its faster clocks and 16-bit data busses. However, the programmer is still restricted to the PC architecture, with its 1M byte address space and 640K RAM limitation. Since DOS and PC programs will not run on an 80286 processor in protected mode, most 80286 processors are run in real mode.

Extended memory madness

Today, most 80286 PCs are shipped with several megabytes of RAM. Since the 80286 has a 16M byte address space, this memory is addressed linearly; no bank-select hardware is necessary. In protected mode, all of the memory is usable. In real mode, the first 640K of memory is accessible, as in the standard PC architecture, and the remainder is not. Memory at addresses above the 1M byte limit of the original PC address space, and therefore inaccessible in real mode, is called extended memory (not to be confused with expanded memory).

The most common use of extended memory is to provide a RAM disk for a DOS system. When the program wants access to data stored on the RAM disk, it sets a mode bit that switches the 80286 to protected mode. This gives it access to extended memory. The program then performs the desired data transfer between its own memory space and the RAM disk in extended memory. There happens not to be any way to return from protected mode to real mode, so the program must then save its state and reset the processor. Upon rebooting, the processor resumes execution of the original program in real mode. In practice, all of this is handled by a device driver for the RAM disk, such as RAMDRIVE.SYS.

Intel intended the 80286 to provide a path for upward evolution of PC systems. In particular, they hoped that its DOS compatibility mode would allow it to gain acceptance, and that once there was a sufficient installed base of 80286 processors, software developers would begin writing operating systems and programs that used the features of protected mode. What actually happened was that PC clone vendors used it as a high-performance 8086, users ran it almost exclusively in real mode, and software developers balked at the intricacies and limitations of the protected mode segmented architecture.

The 80386

Intel's next offering was the 80386. Like the 80286, the 80386 has a segmented architecture, and like the 80286, it has two execution modes: real and protected.

In protected mode, the 80386 is a 32/32 architecture. The segmentation scheme is even more complex than that of the 80286, and I'll spare you the details. It does, however, allow 32-bit segment offsets, so a single segment can be up to 4G bytes. This allows a programmer to define a single segment that covers all of available memory, instead of having to continually juggle a collection of 64K byte segments. It also allows indexing into arrays that are larger than 64K bytes.
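For readers who do want one of the details: an 80386 descriptor can express a 4G byte segment because its 20-bit limit field is paired with a granularity bit that makes the limit count 4K pages instead of bytes. The sketch below (hypothetical function names; attribute bits and paging ignored) shows that limit expansion and the flat 32-bit offset calculation that lets a single segment cover all of memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 80386 descriptors hold a 20-bit limit plus a granularity (G) bit.
   G = 0: limit is in bytes (max 1M). G = 1: limit is in 4K pages, so the
   byte limit is (limit << 12) | 0xFFF, at most FFFFFFFF (4G - 1). */
uint32_t effective_limit_386(uint32_t limit20, bool granular)
{
    assert(limit20 <= 0xFFFFFu);
    return granular ? (limit20 << 12) | 0xFFFu : limit20;
}

/* Flat 32-bit addressing: the 32-bit offset is checked against the limit
   and added to the base; an index past 64K needs no segment juggling. */
bool translate_386(uint32_t base, uint32_t limit20, bool granular,
                   uint32_t offset, uint32_t *linear)
{
    if (offset > effective_limit_386(limit20, granular))
        return false;
    *linear = base + offset;   /* unsigned arithmetic wraps mod 2^32 */
    return true;
}
```

With base 0 and the maximum limit, an offset such as 12345678 is simply valid, which is the "one segment covering all of memory" arrangement described above.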

In real mode, the 80386 provides an exact emulation of the 8086 16/16 architecture.

Unfortunately, the 80386's capabilities go nearly as unused as the 80286's. DOS and PC programs will not run on an 80386 processor in protected mode, so most 80386 processors are run in real mode. The processor in my current machine runs in real mode. It provides access to 640K bytes of main memory and a 3456K byte RAM disk, for a total of 4M bytes of installed RAM.

Memory models

Like it or not, there are millions of PCs out there, and we have to program them in real mode. Microsoft C provides extensive facilities for making the best of a bad situation.

One issue is whether addresses are 16 bits or 32 bits; Microsoft C calls these near and far pointers, respectively. A 16-bit address provides an offset into a single segment, and the segment register must already have been loaded with the appropriate base address. A 32-bit address provides both a segment base address and an offset into that segment. When accessing memory through a 32-bit address, the program first loads the segment register from the upper 16 bits of the address, and then uses the lower 16 bits of the address as an offset into that segment. 32-bit addresses require more memory and more CPU cycles, but they provide access to the entire 1M byte address space of the 8086 processor.
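Packing and unpacking a 32-bit address is just shifting and masking. In this sketch the helper names are hypothetical (they are not Microsoft C library calls); the layout, segment in the upper 16 bits and offset in the lower 16, is the one the paragraph describes.

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit address: segment in the upper 16 bits, offset in the lower 16. */
uint32_t make_far(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 16) | off;
}

uint16_t far_seg(uint32_t fp) { return (uint16_t)(fp >> 16); }
uint16_t far_off(uint32_t fp) { return (uint16_t)(fp & 0xFFFFu); }

/* Dereferencing: the segment half goes into a segment register and the
   offset half is used within it, so on the 8086 the byte accessed is
   at (seg << 4) + off, truncated to 20 bits. */
uint32_t far_to_phys(uint32_t fp)
{
    return (((uint32_t)far_seg(fp) << 4) + far_off(fp)) & 0xFFFFFu;
}
```

The packed value is not a linear address: B800:00A0 packs to B80000A0 but refers to physical address B80A0, so comparing or subtracting two 32-bit addresses as plain integers can give wrong answers when their segments differ.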

If a program has less than 64K of data, then it can put all of its data into a single data segment and use 16-bit addresses to access it. Similarly, if a program has less than 64K of code, it can put all of its code into a single code segment, and use 16-bit addresses for jumps and subroutine calls. Conversely, if code or data do not fit within these limits, then the program must use 32-bit addresses. Microsoft C provides for all four possibilities through a set of memory models:

Memory model   Data addresses   Code addresses
Tiny           16-bit           16-bit
Small          16-bit           16-bit
Compact        32-bit           16-bit
Medium         16-bit           32-bit
Large          32-bit           32-bit
Huge           32-bit           32-bit

The Tiny memory model is the same as the Small model, except that the size of the code and data segments together must not exceed 64K bytes. Also, the Tiny model produces a .COM file instead of a .EXE file. .COM files are slightly smaller and load slightly faster under DOS.

The Huge memory model is the same as the Large memory model, except that individual arrays can exceed 64K bytes in size. However, addresses are still stored as segment:offset pairs, and the compiler declines to perform full 32-bit address arithmetic on them, so no single array element may straddle a 64K segment boundary. As a result, Huge arrays are subject to the restriction that the size of the array element must be a power of 2: any power of 2 up to 64K divides 64K evenly, so elements tile each segment exactly.
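The power-of-2 rule can be checked numerically. This sketch (hypothetical function name; it assumes the array starts at offset 0 of a segment) counts how many elements would straddle a 64K boundary. For a power-of-2 element size the count is always zero; any other size up to 64K leaves at least one element spanning the first boundary once the array grows past 64K.

```c
#include <assert.h>
#include <stdint.h>

/* Count array elements whose first and last bytes fall in different
   64K blocks, for an array starting at offset 0 of a segment. */
uint32_t straddling_elements(uint32_t elem_size, uint32_t count)
{
    assert(elem_size > 0);
    uint32_t n = 0;
    for (uint32_t i = 0; i < count; i++) {
        uint64_t first = (uint64_t)i * elem_size;   /* 64-bit: no overflow */
        uint64_t last  = first + elem_size - 1;
        if ((first >> 16) != (last >> 16))
            n++;
    }
    return n;
}
```

For 12-byte elements, element 5461 occupies bytes 65532 through 65543 and so crosses the first 64K boundary; with 16-byte elements, every boundary falls exactly between two elements.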

Choice of memory model is a compile-time option, so you can easily experiment with different models.


Notes

ill-conceived, twisted & flawed
In my opinion
8086
The IBM PC actually used the 8088, not the 8086. The 8088 is the same as the 8086 internally, but has an 8-bit external data bus. This trades memory bandwidth for lower system cost.
expanded
Think of an accordion, with the bellows expanding out behind the keyboard.
80286
There is also an 80186 processor. It is essentially an 8086 with clocks, interrupt controllers and other support circuitry integrated into a single chip. It is used mainly as an embedded processor in industrial control equipment.
extended
Think of a tower, with additional floors extending higher and higher.
reset the processor
I am not making this up.

Steven W. McDougall / swmcd@world.std.com / 1992 February