Last month I discussed the newly introduced Xilinx Extensible Processing Platform (EPP), which represents a new product line and a new venture for FPGA leader Xilinx. To briefly recap, each device in the EPP family is essentially a high-end microcontroller or embedded processor built around two ARM Cortex-A9 32-bit RISC processor cores, implemented as hard IP cores rather than as soft cores in the FPGA fabric. Each device also includes some amount of SRAM used largely for processor cache, some standard peripheral blocks implemented as hard IP cores, and multiple AMBA 4 interconnect buses that link the hard-core, on-chip IP blocks with an FPGA fabric. You can use that fabric to create additional peripheral devices or anything else you might need for the digital portion of your embedded design. These Xilinx devices will sell for the low tens of dollars and will consume much less power than full-tilt FPGAs, making them very attractive replacements for 32-bit microcontrollers and standalone processors in certain applications. This month, I want to focus on how you might use those multiple on-chip AMBA 4 buses to communicate with whatever you’ve implemented in the EPP’s FPGA fabric. Xilinx hasn’t yet discussed this sort of technical information, but it’s not too hard to project some basic facts.
There are essentially only three fundamental ways to use the Xilinx EPP’s on-chip AMBA 4 buses to communicate with peripheral devices, whether they are hard cores outside of the FPGA fabric or soft cores implemented in the FPGA fabric. Those three ways are registers, memory-mapped RAM, and streaming. Each of these communications approaches has advantages and disadvantages depending on application needs.
I/O data, control, and status registers date back to the earliest days of peripheral chips that were introduced along with the very first wave of microprocessors back in the 1970s. Back then, registers were generally no wider than eight bits. Data registers were almost always eight bits wide and permitted the passing of individual bytes back and forth between the processor and whatever I/O device lay beyond the peripheral chip. There were peripheral chips for simple parallel I/O, UARTs (universal asynchronous receiver/transmitters) for serial I/O, timer chips, interrupt controllers, and that was pretty much all there was at first. Each control and status register in these peripheral chips had individual bits and bit groups that implemented specific functions such as “set the output pins to be low-true” or “enable the interrupt pin.”
I/O registers were implemented as individual latches, so it was easy to take the output of a latch bit and use it for driving another piece of hardware inside of the peripheral chip or to take a signal and connect it to the D input of a status-register bit. We still use I/O status and control registers in precisely the same way today, inside of large peripheral blocks like Ethernet and video controllers. We simply use a lot more registers than before and they tend to be wider than eight bits these days.
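To make the register approach concrete, here is a minimal sketch in C of how software typically sets and clears individual control bits and polls a status bit. Everything in it is hypothetical: the register layout, the `pio_` names, and the bit positions are my own inventions for illustration and do not come from any Xilinx or ARM documentation. On real hardware the structure pointer would refer to a fixed bus address; here it can just as easily point at ordinary RAM so the code runs on a desktop machine.

```c
#include <stdint.h>

/* Hypothetical register block for an imaginary parallel-I/O peripheral.
 * The layout and bit assignments are assumptions made for illustration. */
typedef struct {
    volatile uint32_t data;    /* data register: low 8 bits carry one byte */
    volatile uint32_t control; /* control register: written by software    */
    volatile uint32_t status;  /* status register: driven by the hardware  */
} pio_regs_t;

#define PIO_CTRL_IRQ_ENABLE  (1u << 0) /* "enable the interrupt pin"           */
#define PIO_CTRL_ACTIVE_LOW  (1u << 1) /* "set the output pins to be low-true" */
#define PIO_STAT_DATA_READY  (1u << 0) /* a byte is waiting in the data reg    */

/* Set the given control bits and leave all the others untouched. */
static inline void pio_set_control(pio_regs_t *regs, uint32_t bits)
{
    regs->control |= bits;
}

/* Clear the given control bits and leave all the others untouched. */
static inline void pio_clear_control(pio_regs_t *regs, uint32_t bits)
{
    regs->control &= ~bits;
}

/* Poll the status register; if a byte is ready, copy it to *out and
 * return 1. Otherwise return 0 and leave *out alone. */
static inline int pio_try_read(pio_regs_t *regs, uint8_t *out)
{
    if ((regs->status & PIO_STAT_DATA_READY) == 0) {
        return 0;
    }
    *out = (uint8_t)(regs->data & 0xFFu);
    return 1;
}
```

The `volatile` qualifier matters on real hardware because it stops the compiler from caching or reordering away accesses to locations that the peripheral can change behind the processor's back.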
Memory-mapped I/O maps a large array of bus-addressed memory locations into a linear memory array inside of the peripheral device. Often, this memory array is implemented as a RAM inside of the peripheral device, but if the array is small enough, it might be implemented as a large register bank instead.
The earliest use for such memory-mapped arrays in I/O chips was for memory-mapped video. The CPU could write an image to memory-mapped video RAM and a simple sequencing controller read out the video and sent it to the display. Initially, access to the video RAM had to be interleaved between the processor and the display sequencer, but eventually, as display speeds and resolutions increased, video RAM became dual-ported to handle the rising number of access cycles per unit time.
Originally, it took an entire board to create a memory-mapped video controller. I recall using a Vector Graphics Flashwriter video display card in my North Star Horizon S-100 computer to implement fast video for an early WordStar editing system. I had to write the low-level video drivers in Z80 assembly code to connect the Flashwriter to the CP/M operating system and to WordStar itself. That was back in 1979 and things were mighty primitive back then. The advantage of memory-mapped video in those days was performance. The North Star’s Z80 CPU could directly manipulate every character location on the video display without using the serial escape sequences mandated by the use of RS-232 terminals. The processor could write a character directly to the screen with a simple byte move; it could examine a character with a simple byte read; and it could change a character’s attribute with a simple read-modify-write instruction sequence.
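Those three operations (byte move, byte read, and read-modify-write) translate almost directly into C. The sketch below assumes a hypothetical 80-by-24 character display in which each screen cell is one byte, with the low seven bits holding the character and the high bit selecting reverse video. That cell format and the `video_` function names are my assumptions for illustration, not the actual Flashwriter layout.

```c
#include <stdint.h>

#define VID_COLS          80     /* assumed screen width in characters  */
#define VID_ROWS          24     /* assumed screen height in characters */
#define VID_ATTR_REVERSE  0x80u  /* assumed: high bit = reverse video   */

/* Write a character to a screen cell with a single byte store.
 * The attribute bit of that cell is cleared as a side effect. */
static inline void video_put(volatile uint8_t *vram, int row, int col, char c)
{
    vram[row * VID_COLS + col] = (uint8_t)((uint8_t)c & 0x7Fu);
}

/* Examine the character in a screen cell with a single byte load. */
static inline char video_get(const volatile uint8_t *vram, int row, int col)
{
    return (char)(vram[row * VID_COLS + col] & 0x7Fu);
}

/* Change a cell's attribute with a read-modify-write sequence:
 * load the byte, alter only the attribute bit, store it back. */
static inline void video_set_reverse(volatile uint8_t *vram,
                                     int row, int col, int on)
{
    volatile uint8_t *cell = &vram[row * VID_COLS + col];
    uint8_t value = *cell;                        /* read   */
    if (on) {
        value = (uint8_t)(value | VID_ATTR_REVERSE);
    } else {
        value = (uint8_t)(value & ~VID_ATTR_REVERSE);
    }
    *cell = value;                                /* write  */
}
```

On a real system, `vram` would point at the bus address where the display memory is mapped. For experimentation, an ordinary array of `VID_ROWS * VID_COLS` bytes works as a stand-in.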
In an era when processors were relatively expensive, it made sense to use the CPU running the application code to directly manipulate video on the screen as well. In the 21st century, microprocessors are so cheap and CPUs are so isolated from peripheral devices by caches and bus hierarchies that we have radically changed the way video works in most computers and embedded systems. Most systems now employ separate video processors, but certain non-video applications and certain peripheral devices can still make effective use of memory-mapped I/O to provide direct processor access to peripheral memory.
Finally, there’s stream I/O, which directs long transaction bursts to one memory or port address. Large operating systems, Linux in particular, have a great affinity for stream I/O and it’s an essential I/O protocol for streaming audio and video media. (No coincidence there.) Generally, a peripheral processor is required in such streaming applications to interpret commands embedded within the data stream and to separate multiplexed data streams (such as merged audio/video streams, which have become extremely common). Often, it’s advisable to place a FIFO at the input port of a streaming-I/O peripheral to help buffer the incoming data stream. Buffering helps to bridge mismatched data rates or inter-burst latencies between the streaming transmitter and receiver.
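The buffering idea is easy to model in software. The following is a minimal sketch of a byte-wide FIFO built as a ring buffer, written in C as a stand-in for what would be a hardware FIFO in front of a streaming peripheral. The depth of eight entries and the `fifo_` names are arbitrary choices of mine for illustration. The producer pushes bytes as they arrive in a burst, and the consumer pops them at its own pace; a full FIFO tells the producer to stall and an empty one tells the consumer to wait.

```c
#include <stdint.h>

#define FIFO_DEPTH 8u  /* arbitrary depth; must be a power of two */

typedef struct {
    uint8_t  slots[FIFO_DEPTH];
    uint32_t head;  /* running count of bytes pushed */
    uint32_t tail;  /* running count of bytes popped */
} fifo_t;

/* Number of bytes currently buffered. Unsigned subtraction keeps this
 * correct even after the running counters wrap around. */
static inline uint32_t fifo_count(const fifo_t *f)
{
    return f->head - f->tail;
}

/* Producer side: returns 1 on success, 0 if the FIFO is full
 * (the transmitter must stall until the receiver drains a byte). */
static inline int fifo_push(fifo_t *f, uint8_t byte)
{
    if (fifo_count(f) == FIFO_DEPTH) {
        return 0;
    }
    f->slots[f->head % FIFO_DEPTH] = byte;
    f->head++;
    return 1;
}

/* Consumer side: returns 1 and stores the oldest byte in *out,
 * or returns 0 if the FIFO is empty. */
static inline int fifo_pop(fifo_t *f, uint8_t *out)
{
    if (fifo_count(f) == 0) {
        return 0;
    }
    *out = f->slots[f->tail % FIFO_DEPTH];
    f->tail++;
    return 1;
}
```

A deeper FIFO absorbs longer bursts and longer gaps between them, at the cost of more memory, which is exactly the trade-off a designer faces when sizing the hardware equivalent.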
Xilinx hasn’t discussed any of these details, but it’s likely that the EPP will support all three types of I/O transactions. What remains to be seen is what will be supported in hard-core IP and what will need to be implemented in the FPGA fabric.