Sir Clive Sinclair had a dream: everyone should own a computer. In the early '80s, this was quite an ambitious, almost foolhardy thing to say, given that the cost of computing machinery was well beyond the grasp of individuals. Despite the hurdles, Sinclair Research Ltd. produced one of the most popular personal computers in Great Britain and, later on, in Europe: the Sinclair ZX Spectrum.

As a child I owned one of these machines, and because of it I am, happily, a computer scientist. Although the Spectrum is no longer produced, my nostalgia for this machine never quite disappeared. Recently, I embarked on a project to write an "emulator" (in Java) that could run some favorites from the prolific library of Spectrum software. This article describes the challenges I encountered and what I accomplished along the way. The source code can be downloaded from www.sys-con.com/java/sourcec.cfm.

What Is an Emulator?
According to Merriam-Webster, an emulator is "hardware or software that permits programs written for one computer to be run on another usually newer computer." Emulators are in relatively common use today, although they are largely invisible, which is a powerful testimonial to how well they do their job. The Java programming language is popular in large part because it can run on many different computers. This is achieved via an emulator (a "virtual machine") that allows Java programs to be executed on different platforms.

Emulators are particularly satisfying to write. Building an emulator is challenging, since it requires an intimate understanding of both the emulated machine and the host machine in order to bridge them together. Emulators are hard to get right since they require extensive attention to detail: every single facility of the emulated environment, whether it's a CPU instruction or interactions between two components, must work as per spec, or the program running on the emulator will likely fail. Once an emulator is completed, it's possible to revive old programs in such a way that they're completely oblivious to their new surroundings. It's almost like traveling back in time.

Emulation in Java
Java has matured tremendously in the years since I first came into contact with it. Since writing emulators has been an on-and-off hobby of mine, I thought it would be an interesting experiment to see what it would take to implement one in Java.

The first two challenges I had to think about before starting down this path were performance and timing. Performance is critical to the success of an emulator since no matter how accurate, if an emulator runs programs significantly slower than the original hardware, no one would want to use it. The Java HotSpot engine has recently advanced the performance of Java programs by leaps and bounds, yet the risk still remains: for every instruction of an original program, the emulator may have to execute tens or hundreds of instructions. Timing is also critical since Java does not (yet) have facilities for running programs in real time. Most notably, garbage collection may well interfere with the proper execution of the emulated program and cause significant pauses, making the programs appear jerky onscreen.

On the flip side, Java provides an excellent environment for writing code since it has a powerful and expressive API. The emulator I'll be describing is loosely based on an open-source emulator (implemented in C) that I worked on with a few other people a number of years ago. The Java source is an order of magnitude more compact and more elegant than the original C source. The Java AWT API significantly reduced the burden of implementing the screen emulation (one of the harder aspects of the original emulator). Last, but not least, the emulator can run on any Java Virtual Machine.

The Sinclair ZX Spectrum
The machine I wrote the emulator for is the Sinclair ZX Spectrum: one of the first personal computers. For a bit of history on the machines and the man behind them, visit www.nvg.ntnu.no/sinclair/.

The Java emulator, called "JZX", is loosely based on a Linux native emulator called "XZX" that I worked on a number of years ago (www.zx-spectrum.net/xzx/).

Despite its simplicity, the Spectrum was fully capable of running sophisticated software such as games, Pascal and C compilers, databases, and word processors.

Architecture Overview
The Spectrum hardware is assembled as shown in Figure 1.

The software architecture resembles Figure 1 but differs from it in a few key respects (see Figure 2).

The emulator emulates two distinct machines: the original 48K Spectrum model and the subsequent 128K one. Since the machines are similar, 95% of the emulator code base is shared. Not all peripherals from the original Spectrum are emulated: the I/O ports and the speaker are missing. The I/O ports are not present since the original hardware needed them for loading and saving software onto magnetic tape, while the emulator uses files instead. The speaker is not present since it would be almost impossible to emulate it correctly in Java: the sound in the original Spectrum was produced by turning the speaker on and off rapidly to create the appropriate frequency.

Every Java class that represents a Spectrum component is derived from BaseComponent (see Listing 1). BaseComponent and BaseSpectrum form an (extended) Composite pattern, where BaseSpectrum is the Composite and every BaseComponent has a reference to its parent. Any BaseComponent can access its parent and, from there, any of its siblings. This simulates the "bus" of the original machine. For example, whenever a byte is written into the video RAM, the BaseMemory object can retrieve its BaseSpectrum parent, from which it can retrieve the appropriate BaseScreen object and subsequently update the current screen frame.
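Here is a minimal sketch of how parent references can stand in for the bus. Only the class names BaseComponent, BaseSpectrum, BaseMemory and BaseScreen come from the text; the methods (getScreen(), write8(), screenWritten()), the flat 64K array and the lastDirtyOffset field are hypothetical, chosen for illustration rather than taken from Listing 1. The 0x4000-0x5AFF range is the 48K Spectrum's video RAM (pixels plus attributes).

```java
import java.util.ArrayList;
import java.util.List;

// Every component knows its parent; the parent is the "bus".
abstract class BaseComponent {
    private BaseSpectrum parent;

    void setParent(BaseSpectrum parent) { this.parent = parent; }

    BaseSpectrum getParent() { return parent; }
}

// The Composite: owns the components and hands out references to siblings.
class BaseSpectrum extends BaseComponent {
    private final List<BaseComponent> children = new ArrayList<>();
    private final BaseMemory memory;
    private final BaseScreen screen;

    BaseSpectrum() {
        memory = add(new BaseMemory());
        screen = add(new BaseScreen());
    }

    private <T extends BaseComponent> T add(T child) {
        child.setParent(this);
        children.add(child);
        return child;
    }

    BaseMemory getMemory() { return memory; }

    BaseScreen getScreen() { return screen; }
}

class BaseScreen extends BaseComponent {
    int lastDirtyOffset = -1;   // hypothetical: remembers the last video byte written

    void screenWritten(int offset16) { lastDirtyOffset = offset16; }
}

class BaseMemory extends BaseComponent {
    private final byte[] ram = new byte[0x10000];   // flat 64K, for brevity

    void write8(int addr16, int val8) {
        ram[addr16] = (byte) val8;
        if (addr16 >= 0x4000 && addr16 < 0x5B00) {  // 48K video RAM
            // Up to the parent, then across to the sibling screen.
            getParent().getScreen().screenWritten(addr16 - 0x4000);
        }
    }
}
```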

The BaseComponent class also imposes the contract for the major "lifetime events" of the emulator: startup, reset, shutdown, and load.

When the top-level BaseSpectrum object is created, it calls init() on itself and all its children, followed by reset(). When the emulator is shut down, it calls terminate() in the same way.
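The propagation of these lifetime events through the Composite can be sketched as follows. This is an illustration under assumptions, not the JZX code: the shared log list, the component names and the anonymous child components are hypothetical, and the load event is omitted because its signature is not described.

```java
import java.util.ArrayList;
import java.util.List;

abstract class BaseComponent {
    protected final List<String> log;   // hypothetical: shared event log for illustration
    private final String name;

    BaseComponent(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    void init()      { log.add(name + ".init"); }
    void reset()     { log.add(name + ".reset"); }
    void terminate() { log.add(name + ".terminate"); }
}

class BaseSpectrum extends BaseComponent {
    private final List<BaseComponent> children = new ArrayList<>();

    BaseSpectrum(List<String> log) {
        super("spectrum", log);
        children.add(new BaseComponent("cpu", log) {});
        children.add(new BaseComponent("memory", log) {});
        children.add(new BaseComponent("screen", log) {});
        init();    // every component is initialized first...
        reset();   // ...and only then reset
    }

    // Each event is applied to the composite itself, then to all its children.
    @Override void init() {
        super.init();
        for (BaseComponent c : children) c.init();
    }

    @Override void reset() {
        super.reset();
        for (BaseComponent c : children) c.reset();
    }

    @Override void terminate() {
        super.terminate();
        for (BaseComponent c : children) c.terminate();
    }
}
```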

Emulation Challenges
Main Loop
The main loop of the emulator resides in the BaseSpectrum class (see Listing 2).

Every instruction executed by the Z80 CPU takes a certain amount of time, which is a multiple of one "state" (also known as a "T-state"). The ULA renders one line of the screen every 224 CPU states (STATES_PER_LINE), so it's essential to keep track of how many states pass every time the CPU decodes and executes one instruction. For efficiency, the screen is not updated every time a new line becomes available, but only once per frame (every TV_LINES lines); at 50 frames per second, that means one frame every 20 milliseconds.
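The state bookkeeping can be sketched as follows, assuming the 48K timings of 224 T-states per line, 312 lines per frame and a 3.5 MHz clock. One frame is then 224 × 312 = 69,888 T-states, and 69,888 / 3,500,000 Hz ≈ 19.97 ms, matching the 20 ms frame period. The Z80 interface, the renderFrame hook and the value 312 for TV_LINES are assumptions for illustration; this is not Listing 2.

```java
// Hypothetical CPU abstraction: executes one instruction, returns the T-states it took.
interface Z80 {
    int step();
}

class MainLoop {
    static final int STATES_PER_LINE = 224;  // from the text (48K timing)
    static final int TV_LINES = 312;         // assumption: 48K lines per frame

    private final Z80 cpu;
    private final Runnable renderFrame;      // hypothetical hook: repaint, then wait for the interrupt
    private int states;                      // T-states accumulated on the current line
    private int line;                        // current TV line within the frame

    MainLoop(Z80 cpu, Runnable renderFrame) {
        this.cpu = cpu;
        this.renderFrame = renderFrame;
    }

    void run(long instructions) {
        for (long i = 0; i < instructions; i++) {
            states += cpu.step();
            while (states >= STATES_PER_LINE) {  // one line's worth of time has passed
                states -= STATES_PER_LINE;
                if (++line == TV_LINES) {        // a whole frame's worth of lines
                    line = 0;
                    renderFrame.run();
                }
            }
        }
    }
}
```

With a CPU that always reports 4 T-states, a frame completes every 69,888 / 4 = 17,472 instructions.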

The CPU interrupts are simulated by wait()ing on an external Thread object, which simply sets a public field to "true" and calls notifyAll() every 20ms, at which point the main emulator loop notifies the CPU of the interrupt. The reason for wait()ing on the interrupt is that the CPU emulation runs at the speed of the host machine; no attempt is made to slow it down to the original Spectrum speed, except whenever a screen frame is rendered. This turns out to be entirely appropriate and makes the emulation both fast and believable. Note that it's possible for the main emulation loop to "skip" an interrupt (if, for example, refreshing the screen takes too long). The risk is low: it would arise only on slower machines, and the effect would not be readily visible.
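A minimal sketch of that handshake follows. Unlike the description above, it guards the flag with a private lock object instead of a public field, and waits on that lock rather than on the Thread itself, because the Thread documentation recommends against using wait()/notifyAll() on Thread instances. InterruptTimer and awaitInterrupt() are hypothetical names.

```java
// Simulated 50 Hz interrupt source; the main loop blocks on it after each frame.
class InterruptTimer extends Thread {
    private final Object lock = new Object();
    private boolean pending;   // guarded by lock

    InterruptTimer() {
        setDaemon(true);       // don't keep the JVM alive on shutdown
    }

    @Override
    public void run() {
        try {
            while (!isInterrupted()) {
                Thread.sleep(20);          // one Spectrum frame
                synchronized (lock) {
                    pending = true;        // several missed ticks collapse into one
                    lock.notifyAll();      // wake the emulator loop
                }
            }
        } catch (InterruptedException e) {
            // interrupt() was called: stop ticking.
        }
    }

    // Blocks until the next tick, then consumes it.
    void awaitInterrupt() {
        synchronized (lock) {
            while (!pending) {             // loop guards against spurious wake-ups
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();  // preserve status and give up
                    return;
                }
            }
            pending = false;
        }
    }
}
```

Because the flag is a boolean rather than a counter, a loop that falls behind sees only one pending interrupt, which is the "skipped interrupt" behavior described above.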

Memory Emulation
The Z80 is an 8-bit CPU with a 16-bit address bus, meaning it can address up to 64K of memory. This works naturally for the 48K Spectrum (16K ROM and 48K RAM). The 128K Spectrum, on the other hand, has 12x16K pages (4xROM + 8xRAM); the software can select any four to be "seen" by the CPU. The BaseMemory implementation uses pages for the emulation in both models to achieve maximum reuse.

The BaseMemory object allows the CPU to select the "visible" pages via the method "public void pageIn(int frame, int page)". It also allows direct access to the page data via "public byte[] getBytes(int page)" (useful for the screen emulation, for instance).

All memory operations must convert "virtual" addresses to physical ones. The frame number is simply the top (most significant) 2 bits of the "virtual" address; the frame offset is the remaining low 14 bits (see Listing 3).
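As an illustration of this translation (not a reproduction of Listing 3), here is a sketch of a paged BaseMemory. pageIn() and getBytes() use the signatures quoted above; the constructor, read8(), write8() and read16() are hypothetical, named after the article's size-suffix convention. ROM write protection is omitted. The Z80 stores 16-bit values little-endian, low byte first.

```java
class BaseMemory {
    static final int PAGE_SIZE = 0x4000;             // 16K per page

    private final byte[][] pages;                    // all physical pages (ROM + RAM)
    private final int[] frameToPage = new int[4];    // which page each 16K frame shows

    BaseMemory(int pageCount) {                      // 4 for the 48K model, 12 for the 128K one
        pages = new byte[pageCount][PAGE_SIZE];
        for (int frame = 0; frame < 4; frame++) frameToPage[frame] = frame;
    }

    public void pageIn(int frame, int page) { frameToPage[frame] = page; }

    public byte[] getBytes(int page) { return pages[page]; }

    // Top 2 bits of the address pick the frame; the low 14 bits are the offset.
    int read8(int addr16) {
        return pages[frameToPage[addr16 >>> 14]][addr16 & 0x3FFF] & 0xFF;
    }

    void write8(int addr16, int val8) {
        pages[frameToPage[addr16 >>> 14]][addr16 & 0x3FFF] = (byte) val8;
    }

    // Little-endian 16-bit read; the second address wraps around at 64K.
    int read16(int addr16) {
        return read8(addr16) | (read8((addr16 + 1) & 0xFFFF) << 8);
    }
}
```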

Signed and Unsigned Data Types
You may be wondering about the return types of the methods in Listing 3: both return an int (32 bits), even though one might expect them to return a byte (8 bits) and a char (16 bits), respectively.

The Z80 CPU can operate on 8-bit or 16-bit values, either directly or via its registers. Although Java natively supports data types that are 8 and 16 bits wide, the emulator is implemented almost exclusively in terms of int. The reason is that the Java byte is a signed type while the Java char is an unsigned type, and mixing the two in arithmetic produces surprising results.

Consider the following (fictitious) Z80 instruction that adds the contents of the A register (8 bits) to the contents of the HL register (16 bits) and stores the results in the HL register:
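A minimal sketch of such an instruction, assuming A holds 0x80 and HL holds 0x0001 and using the names a (for A) and b (for HL) that the explanation below refers to; the class and method names are hypothetical:

```java
class SignExtensionBug {
    // Naive "ADD HL,A" using Java's narrow types.
    static char addHlA(byte a, char b) {
        return (char) (a + b);   // both operands are promoted to int before the addition
    }

    public static void main(String[] args) {
        byte a = (byte) 0x80;    // A register
        char b = 0x0001;         // HL register
        // prints HL = 0xFF81, not the expected 0x0081
        System.out.printf("HL = 0x%04X%n", (int) addHlA(a, b));
    }
}
```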

Surprised? The Java Language Specification (section 5.6.2, "Binary Numeric Promotion") mandates that when neither operand of a binary arithmetic operation is a long, float, or double, both operands are converted to int first. In our case, the byte value 0x80 and the char value 0x0001 are promoted to int before they're added. Since byte is a signed type, the promotion sign-extends it to 0xFFFFFF80. Since char is an unsigned type, the promotion yields 0x00000001. Adding the two int values gives 0xFFFFFF81, which is then truncated to a char, yielding 0xFF81 instead of the expected 0x0081. The only way to avoid this behavior is to explicitly prevent the sign extension in the widening conversion. The new code would look something like this:
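A sketch of the fix, assuming the same byte a (the A register) and char b (HL); the class and method names are hypothetical:

```java
class SignExtensionFix {
    // Mask the promoted byte back to its low 8 bits before adding, so 0x80
    // becomes 0x00000080 rather than the sign-extended 0xFFFFFF80.
    static char addHlA(byte a, char b) {
        return (char) ((a & 0xFF) + b);   // the char cast wraps the sum at 16 bits
    }

    public static void main(String[] args) {
        // prints HL = 0x0081
        System.out.printf("HL = 0x%04X%n", (int) addHlA((byte) 0x80, (char) 0x0001));
    }
}
```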

The "& 0xFF" masks off all but the low 8 bits of the promoted value, yielding the int 0x00000080, which is then added to "b", producing the correct result, 0x0081.

Although this fixes the immediate problem, it is only a partial solution for the CPU emulation as a whole. The reason has to do with the CPU flags that record whether a carry or overflow occurred during a particular operation. Java integer arithmetic silently wraps around and offers no overflow indication, so every result must be computed in a Java type wider than the value being emulated; the emulator then tests explicitly for carry and overflow and truncates the result. In the end, the only feasible solution is to use int everywhere and deal with truncation and overflow explicitly.

To keep the code readable, I adopted a naming convention that exposes the size of the data types involved: the size in bits of the return value is appended to the function's name, and the size of each argument to that argument's name. For example, "int read8(int val16)" means that the function returns 8 bits of data and receives 16 bits of data as an argument, each held in the least significant bits of an int. Furthermore, by convention all input arguments are already correctly sized and need no further masking, while all return values must be explicitly truncated before being returned.
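To show how computing in a wider int exposes carry and overflow, here is a sketch of an 8-bit addition that follows the size-suffix convention. The flag bit positions are the standard Z80 ones; the undocumented flag bits 3 and 5 are ignored. The class, method and field names are hypothetical, not the emulator's own.

```java
class Alu {
    // Z80 flag bits in the F register.
    static final int FLAG_C  = 0x01;  // carry
    static final int FLAG_N  = 0x02;  // add/subtract (reset by additions)
    static final int FLAG_PV = 0x04;  // parity/overflow (overflow for additions)
    static final int FLAG_H  = 0x10;  // half-carry
    static final int FLAG_Z  = 0x40;  // zero
    static final int FLAG_S  = 0x80;  // sign

    int f8;  // flags produced by the last operation

    // Returns 8 bits; takes two 8-bit arguments held in ints.
    int add8(int a8, int b8) {
        int sum = a8 + b8;                 // at most 0x1FE, so no Java overflow
        int result8 = sum & 0xFF;          // explicit truncation
        int f = 0;                         // N stays clear for an addition
        if ((result8 & 0x80) != 0) f |= FLAG_S;
        if (result8 == 0) f |= FLAG_Z;
        if ((a8 & 0x0F) + (b8 & 0x0F) > 0x0F) f |= FLAG_H;
        // Signed overflow: both operands share a sign that the result lacks.
        if (((a8 ^ result8) & (b8 ^ result8) & 0x80) != 0) f |= FLAG_PV;
        if (sum > 0xFF) f |= FLAG_C;
        f8 = f;
        return result8;
    }
}
```

For example, add8(0x7F, 0x01) returns 0x80 with S, H, and P/V set: a positive plus a positive has produced a negative 8-bit result.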

CPU Emulation
In addition to the signed/unsigned challenges described earlier, the CPU poses additional problems in the area of instruction decoding. The decoder is implemented as a large "switch()" statement, which switches on the first byte of the current instruction. Naturally, the code is very large and rather unwieldy to read and modify. One possible solution for dealing with such a large piece of contiguous code would be to have an array of IRunnable objects (that can decode a particular instruction in the "run()" method) indexed on the instruction code.

This approach would allow the code to be structured more elegantly, but it would multiply the number of classes and impose significant runtime overhead. The "switch()" approach, while difficult to write and maintain, is extremely fast: javac compiles a dense switch into a tableswitch bytecode instruction, which the JVM can execute as a jump table. Architecturally, this is the same as indexing into the object array, but without the cost of an interface method call for every instruction.
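To make the trade-off concrete, here is a much-reduced switch-based decoder covering six opcodes, with their documented Z80 T-state counts. The class and field names are hypothetical, flag handling is omitted, and memory is a flat int array for brevity.

```java
class Z80Core {
    int a, b, pc;                          // a few registers, held in ints
    boolean halted;
    final int[] mem = new int[0x10000];    // flat memory of 8-bit values

    private int fetch8() {
        int v = mem[pc];
        pc = (pc + 1) & 0xFFFF;
        return v;
    }

    /** Decodes and executes one instruction; returns the T-states it took. */
    int step() {
        int opcode = fetch8();
        // A full decoder covers nearly all 256 opcodes, so javac emits a tableswitch.
        switch (opcode) {
            case 0x00:                     // NOP
                return 4;
            case 0x06:                     // LD B,n
                b = fetch8();
                return 7;
            case 0x3E:                     // LD A,n
                a = fetch8();
                return 7;
            case 0x80:                     // ADD A,B (flags omitted here)
                a = (a + b) & 0xFF;
                return 4;
            case 0x76:                     // HALT
                halted = true;
                return 4;
            case 0xC3: {                   // JP nn, little-endian operand
                int lo = fetch8();
                int hi = fetch8();
                pc = (hi << 8) | lo;
                return 10;
            }
            default:
                throw new IllegalStateException(String.format(
                        "unimplemented opcode 0x%02X at 0x%04X", opcode, (pc - 1) & 0xFFFF));
        }
    }
}
```

The IRunnable alternative would replace the switch with something like handlers[opcode].run(), which costs one interface call per executed instruction.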

Screen Emulation
Each pixel on the Spectrum screen can be either on or off. This is represented by the appropriate bit value in a byte (the state of 8 adjacent pixels is governed by the byte value at a particular memory address in video RAM). Color information is represented by another byte.
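Concretely, on the Spectrum one attribute byte governs an 8x8 pixel cell: bits 0-2 give the ink color, bits 3-5 the paper color, bit 6 BRIGHT, and bit 7 FLASH. Pixel rows are also interleaved in video RAM. The sketch below decodes one pixel byte plus its attribute into eight RGB values. The RGB levels 0xCD and 0xFF are assumptions (emulators differ), FLASH is ignored, and all names are hypothetical.

```java
class SpectrumPixels {
    private static final int NORMAL = 0xCD, BRIGHT = 0xFF;  // assumed RGB levels

    // Color index bits: bit 0 = blue, bit 1 = red, bit 2 = green.
    static int rgb(int colour3, boolean bright) {
        int level = bright ? BRIGHT : NORMAL;
        int r = (colour3 & 2) != 0 ? level : 0;
        int g = (colour3 & 4) != 0 ? level : 0;
        int b = (colour3 & 1) != 0 ? level : 0;
        return (r << 16) | (g << 8) | b;
    }

    // Address of the pixel byte for row y (0-191) and byte column x (0-31):
    // the bits of y are shuffled as 010 y7 y6 y2 y1 y0 y5 y4 y3.
    static int pixelAddress16(int y, int x) {
        return 0x4000 | ((y & 0xC0) << 5) | ((y & 0x07) << 8) | ((y & 0x38) << 2) | x;
    }

    // One attribute byte per 8x8 cell, stored linearly after the pixels.
    static int attributeAddress16(int y, int x) {
        return 0x5800 + (y >> 3) * 32 + x;
    }

    // Expands one pixel byte into 8 RGB values; bit 7 is the leftmost pixel.
    static void decode(int pixel8, int attr8, int[] out, int offset) {
        boolean bright = (attr8 & 0x40) != 0;
        int ink = rgb(attr8 & 0x07, bright);
        int paper = rgb((attr8 >> 3) & 0x07, bright);
        for (int bit = 0; bit < 8; bit++) {
            out[offset + bit] = (pixel8 & (0x80 >> bit)) != 0 ? ink : paper;
        }
    }
}
```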

To draw pixels on the screen, the BaseScreen object extends java.awt.Canvas and implements the drawing logic in the paint() method. A nice side effect of this is that the emulator can be "embedded" into any AWT or Swing container that can render Canvas objects. This allows the emulator to run seamlessly as a standard application or an applet. The BaseScreen object uses an offscreen image to render the screen contents, after which the image is drawn directly onto the screen via java.awt.Graphics.drawImage() (this common technique prevents flickering).

Every time the CPU writes into the screen memory area, the BaseScreen object is notified. For maximum efficiency, the only action taken at this time is to set a boolean flag in an array, marking that particular screen byte as changed. When paint() is called, a for loop iterates through the boolean array and, for every "true" value, draws the corresponding byte into the offscreen image.
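The Canvas-plus-offscreen-image arrangement and the dirty-flag array can be sketched together as follows. ScreenCanvas is a hypothetical stand-in for BaseScreen. For brevity it assumes a linear (not interleaved) byte layout, black-on-white colors with no attributes, and uses the naïve per-pixel BufferedImage.setRGB() in the inner loop; the faster rendering techniques are discussed below.

```java
import java.awt.Canvas;
import java.awt.Graphics;
import java.awt.image.BufferedImage;

class ScreenCanvas extends Canvas {
    static final int SCREEN_W = 256, SCREEN_H = 192;

    private final BufferedImage offscreen =
            new BufferedImage(SCREEN_W, SCREEN_H, BufferedImage.TYPE_INT_RGB);
    private final boolean[] dirty = new boolean[SCREEN_W / 8 * SCREEN_H];  // one flag per byte
    private final byte[] video = new byte[dirty.length];                   // copy of the pixel bytes

    // Called on every CPU write into the pixel area: just record the change.
    void screenWritten(int offset, int val8) {
        video[offset] = (byte) val8;
        dirty[offset] = true;
    }

    // Skip the default background clear, which causes flicker.
    @Override
    public void update(Graphics g) {
        paint(g);
    }

    @Override
    public void paint(Graphics g) {
        for (int i = 0; i < dirty.length; i++) {
            if (!dirty[i]) continue;
            dirty[i] = false;
            int x = (i % 32) * 8, y = i / 32;   // linear layout: an assumption
            int bits = video[i] & 0xFF;
            for (int b = 0; b < 8; b++) {
                offscreen.setRGB(x + b, y, (bits & (0x80 >> b)) != 0 ? 0x000000 : 0xCDCDCD);
            }
        }
        g.drawImage(offscreen, 0, 0, null);     // a single blit of the finished frame
    }
}
```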

The mechanism for fast updates to the offscreen image is the challenging part. The naïve technique is simply to use java.awt.Graphics.fillRect() to render every pixel into the image. While this works, it's very slow due to the overhead of calling the fillRect() method and running it many times for 1x1 rectangles.

A better technique is to create the offscreen image as a decorator for a java.awt.image.MemoryImageSource object. The MemoryImageSource is created, in turn, containing a byte array in RGB format with the pixel data. The rendering code updates the byte array and then calls MemoryImageSource.newPixels() to notify the object that the data has been updated (see Listing 4).
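The general shape of this technique is sketched below (this is not Listing 4). It assumes the standard int-per-pixel ARGB ColorModel rather than a byte array, and MemoryImageScreen, drawByte() and flush() are hypothetical names. setAnimated(true) is required; otherwise newPixels() has no effect.

```java
import java.awt.Image;
import java.awt.Toolkit;
import java.awt.image.ColorModel;
import java.awt.image.MemoryImageSource;

class MemoryImageScreen {
    static final int W = 256, H = 192;

    final int[] pixels = new int[W * H];   // 0xAARRGGBB values, shared with the source
    private final MemoryImageSource source;
    final Image image;                     // the offscreen image drawn in paint()

    MemoryImageScreen() {
        source = new MemoryImageSource(W, H, ColorModel.getRGBdefault(), pixels, 0, W);
        source.setAnimated(true);          // without this, newPixels() does nothing
        image = Toolkit.getDefaultToolkit().createImage(source);
    }

    // Writes the 8 pixels of one screen byte directly into the array.
    void drawByte(int x, int y, int pixel8, int ink, int paper) {
        for (int b = 0; b < 8; b++) {
            pixels[y * W + x + b] = 0xFF000000 | ((pixel8 & (0x80 >> b)) != 0 ? ink : paper);
        }
    }

    // Once per frame: tell the image's consumers that the array has changed.
    void flush() {
        source.newPixels();
    }
}
```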

Table 1 provides the timing results, in milliseconds, for rendering 200 consecutive frames of the (same) Spectrum game (the hardware/software configuration is Windows 2000 Professional, Pentium III 650, 192MB RAM).

These timings are barely adequate: as you recall, each Spectrum frame is refreshed every 20ms. If rendering the frame takes longer than 20ms, the emulation will look choppy (it will skip frames and slow down the machine overall).

To improve performance, I made a key observation about the way color is encoded in the Spectrum. As described earlier, every byte in video RAM is paired with another byte that describes its color: the first byte simply shows whether the pixels are "on" or "off" (the "pixel" byte) and the second byte shows what color the pixels are (the "color" byte). This means there are a total of 256 * 256 different ways that a location in video RAM could appear on the emulated screen (256 values for the "pixel" byte and 256 values for the "color" byte). I can prerender some (fixed) number of these "pixel/color" byte combinations as Java image objects and then simply use java.awt.Graphics.drawImage() to render that piece of the video RAM on the screen (see Figure 3).

As you can see, the performance improvements are dramatic for Java1 (more than 90%); for Java2, however, performance is far worse (a slowdown of more than 1,000%!). The reason for the poor Java2 results lies in the performance of java.awt.Graphics.drawImage() under Java2. A full discussion is beyond the scope of this article, but you can read more about it in the Bug Parade section (http://developer.java.sun.com/developer/bugParade/bugs/4276423.html) of the Java Developer Connection Web site (http://developer.java.sun.com/developer/).

To resolve the performance problems in Java2, I use a different (and more conventional) technique that's similar to the MemoryImageSource technique described earlier. In Java2, the offscreen image object is a java.awt.image.BufferedImage, which is a subclass of java.awt.Image. BufferedImage has a setRGB() method that writes an array of RGB pixel values directly into the image, without the performance penalty of MemoryImageSource.newPixels().
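Here is a sketch of the BufferedImage variant, writing one screen byte's 8 pixels with a single bulk setRGB() call. The class and method names and the 8-element scratch buffer are assumptions for illustration.

```java
import java.awt.image.BufferedImage;

class BufferedScreen {
    static final int W = 256, H = 192;

    final BufferedImage offscreen = new BufferedImage(W, H, BufferedImage.TYPE_INT_RGB);
    private final int[] row = new int[8];   // scratch buffer for one screen byte

    // Expands one pixel byte and writes all 8 pixels in one call.
    void drawByte(int x, int y, int pixel8, int ink, int paper) {
        for (int b = 0; b < 8; b++) {
            row[b] = (pixel8 & (0x80 >> b)) != 0 ? ink : paper;
        }
        // Arguments: startX, startY, width, height, rgbArray, offset, scansize.
        offscreen.setRGB(x, y, 8, 1, row, 0, 8);
    }
}
```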

Note that prerendering all 256 * 256 (= 65,536) possible combinations as Java image objects would take a toll on the memory footprint of the emulator. A fixed-size cache of these images, on the other hand, runs the risk of "thrashing": discarding entries when the cache is full gives the garbage collector more work, slowing down the emulation. It's possible to reuse entries in the cache (instead of discarding them), but this brings back the original performance problems with Graphics.fillRect() or MemoryImageSource.newPixels(). A better idea is to prerender Java images for only half of the "pixel" byte (a nibble). Any "pixel" byte is then drawn by placing two prerendered Java images side by side. The total number of prerendered nibble images is far more manageable: 16 * 256 (= 4,096). The tradeoff is that I now need to make twice as many calls to java.awt.Graphics.drawImage(), but that turns out to be inconsequential.
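The nibble scheme can be sketched as follows. Here the 4,096 entries are filled lazily on first use and never evicted, which is an assumption; the text speaks of prerendering. The palette is simplified (assumed levels 0xCD/0xFF, FLASH ignored), and all names are hypothetical.

```java
import java.awt.Graphics;
import java.awt.image.BufferedImage;

class NibbleCache {
    private final BufferedImage[] tiles = new BufferedImage[16 * 256];  // nibble x attribute

    // Returns the 4x1 image for a 4-bit pixel pattern under a given attribute byte.
    BufferedImage tile(int nibble4, int attr8) {
        int key = (attr8 << 4) | nibble4;
        BufferedImage img = tiles[key];
        if (img == null) {                  // render once, then reuse forever
            img = new BufferedImage(4, 1, BufferedImage.TYPE_INT_RGB);
            boolean bright = (attr8 & 0x40) != 0;
            int ink = rgb(attr8 & 0x07, bright);
            int paper = rgb((attr8 >> 3) & 0x07, bright);
            for (int b = 0; b < 4; b++) {
                img.setRGB(b, 0, (nibble4 & (0x08 >> b)) != 0 ? ink : paper);
            }
            tiles[key] = img;
        }
        return img;
    }

    // One screen byte = two drawImage() calls: high nibble left, low nibble right.
    void drawByte(Graphics g, int x, int y, int pixel8, int attr8) {
        g.drawImage(tile(pixel8 >> 4, attr8), x, y, null);
        g.drawImage(tile(pixel8 & 0x0F, attr8), x + 4, y, null);
    }

    // Color index bits: bit 0 = blue, bit 1 = red, bit 2 = green.
    private static int rgb(int c3, boolean bright) {
        int level = bright ? 0xFF : 0xCD;
        return ((c3 & 2) != 0 ? level << 16 : 0)
             | ((c3 & 4) != 0 ? level << 8 : 0)
             | ((c3 & 1) != 0 ? level : 0);
    }
}
```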

Debugging Techniques
Debugging the emulator is very challenging. The hardest part to debug is the CPU emulation, primarily due to its sheer size and complexity: the CPU emulation code is bigger and more complicated than the rest of the emulator combined. Although the Z80 CPU is simple by today's standards, it has a great many flags, registers, and instructions that manipulate these flags. For example, an 8-bit addition modifies the sign, zero, half-carry, parity/overflow, add/subtract, and carry flags. Spectrum software uses all these flags, and any mistake most often translates into a "hard reset" of the emulated Spectrum.

Furthermore, determining exactly where the emulated software crashed is not as easy as watching for the equivalent of an illegal memory access or page fault. (There's no such thing on the Spectrum.) An incorrectly decoded instruction will most often translate into a Spectrum "hard reset" or "hang" thousands of instructions further down the instruction stream.

The easiest way to debug the emulator is to use another emulator (that's known to be correct) and compare the CPU traces for executing the same program. In this case, I modified the original Linux native emulator to output CPU traces (a CPU trace is the state of all registers and flags after executing every instruction). This has the advantage of pinpointing the precise spot where the Java emulator diverges from the native emulator and thus dramatically reduces the time required for debugging.
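The comparison step can be sketched as follows: both emulators write one line per executed instruction, and the first line that differs pinpoints the faulty instruction. The trace format (which registers appear, in what order) is an assumption, since any format works as long as both emulators print it identically. TraceDiff and its methods are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

class TraceDiff {
    // One trace line per instruction; the layout here is an assumption.
    static String traceLine(int pc16, int af16, int bc16, int de16, int hl16, int sp16) {
        return String.format("PC=%04X AF=%04X BC=%04X DE=%04X HL=%04X SP=%04X",
                pc16, af16, bc16, de16, hl16, sp16);
    }

    // Returns the 1-based number of the first differing line, or -1 if the traces match.
    static long firstDivergence(Reader expected, Reader actual) {
        try (BufferedReader exp = new BufferedReader(expected);
             BufferedReader act = new BufferedReader(actual)) {
            for (long line = 1; ; line++) {
                String e = exp.readLine();
                String a = act.readLine();
                if (e == null && a == null) return -1;   // both traces ended together
                if (e == null || a == null || !e.equals(a)) return line;
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```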

Performance Considerations
I was pleasantly surprised to see that it was not only possible to implement a Java emulator for the Spectrum, but that it also ran fast. The CPU emulation is on par with the native emulation; the screen emulation, while slightly slower and a bit more awkward, is well within the limits of realistic emulation for what I would consider "average" hardware. Surprisingly, and most probably due to screen emulation, Java2 didn't fare dramatically better than Java1, which puts Java1 on the map as a reasonable contender for this type of work.

Conclusion
As a platform for emulation, Java is a very strong player. Although the Sinclair Spectrum is not a terribly complex machine by today's standards, it poses significant challenges in its implementation, and requires strong support both in terms of language features and overall performance. Java's elegant and expressive language rises to the challenge and overcomes it easily with code that's more readable, more modular, and far more concise. Java's performance, the wild card in the equation, also meets expectations.

Author Bio
Razvan Surdulescu is a software developer at Trilogy in Austin, TX, where he writes e-business software in Java.

Java and Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. SYS-CON is independent of Sun Microsystems, Inc. SYS-CON, JDJEdge 2002 International Java Developer Conference or Java Developer's Journal is not affiliated with Sun Microsystems, Inc.