Here’s a head-scratcher for you. Why not create tesseract FPGAs? A tesseract is the 4-dimensional analog of a 3D cube. (Just as a 3D cube can be unfolded into a set of six connected 2D squares, a tesseract can be unfolded into a set of eight connected 3D cubes.) I’ve loved the word ever since I learned it by reading Robert A. Heinlein’s classic science fiction short story from 1941 called “And He Built a Crooked House,” in which an earthquake causes a house built in the unfolded 3D shape of a tesseract to fold into an actual 4D tesseract, trapping the unfortunate occupants inside.
If you fold an FPGA into time, you can extrude some of the physical computational circuitry into elsewhen and reduce the amount of circuitry needed to implement your functions. And that is exactly what the new FPGA vendor Tabula has done. The company’s ABAX 3D FPGA architecture gets octuple duty from a LUT (lookup table) cell by fencing it in with eight sets of input/output latches and eight LUT configuration tables. Then, at 8x the “user” clock rate, the FPGA quickly reconfigures the LUT cell, runs part of a calculation, stores the partial result, and proceeds to the next step. The current FPGA design, just announced by Tabula, runs the user clock at 200 MHz and the “Spacetime” clock at 1.6 GHz. As a result, Tabula can offer really “large” FPGAs (in terms of logic cells) at really low prices compared to the big guys: Altera and Xilinx.
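The reconfigure, compute, store, repeat loop can be illustrated with a toy software model. This is a minimal sketch, not Tabula's implementation: the 2-input LUT width, the `Fold` record, the latch names, and the parity example are all assumptions made up for illustration. Only the figure of eight time slices per user clock comes from the description of the architecture.

```python
from dataclasses import dataclass

FOLDS = 8  # time slices per user clock (1.6 GHz / 200 MHz)

XOR = 0b0110    # truth table: output = a XOR b
BUF_A = 0b1100  # truth table: output = a (b is ignored)


@dataclass
class Fold:
    """Configuration of the one physical LUT during one time slice (hypothetical)."""
    truth_table: int  # 4 bits, indexed by (a << 1) | b
    src_a: str        # latch or input feeding LUT input a
    src_b: str        # latch or input feeding LUT input b
    dst: str          # latch that stores this slice's partial result


def run_user_cycle(folds, inputs):
    """Simulate one user-clock cycle: reconfigure, evaluate, store, repeat."""
    if len(folds) != FOLDS:
        raise ValueError(f"expected {FOLDS} folds, got {len(folds)}")
    latches = dict(inputs)
    for fold in folds:
        index = (latches[fold.src_a] << 1) | latches[fold.src_b]
        latches[fold.dst] = (fold.truth_table >> index) & 1
    return latches


def parity_folds():
    """Eight-input parity: seven XOR slices followed by one buffer slice."""
    folds = [Fold(XOR, "x0", "x1", "p1")]
    for i in range(2, 8):
        folds.append(Fold(XOR, f"p{i - 1}", f"x{i}", f"p{i}"))
    folds.append(Fold(BUF_A, "p7", "p7", "out"))
    return folds


def parity8(bits):
    """Compute the parity of eight bits using a single time-sliced LUT."""
    inputs = {f"x{i}": bit for i, bit in enumerate(bits)}
    return run_user_cycle(parity_folds(), inputs)["out"]
```

In this model, one physical LUT stands in for seven XOR gates (plus a buffer) within a single user clock, which is the kind of area saving the architecture is after.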
Now, to do this, you need some magic and you need to value logic-cell capacity over power consumption. First, the magic. Unless you’re going to retrain FPGA users to manually spread their designs across eight time slices, you need to make the 1.6 GHz reconfiguration trick work in the background. Altera and Xilinx spent more than a decade trying to sell the idea of spreading designs across time using “on-the-fly reconfigurable logic,” and most designers just never latched onto the idea. For some reason, engineers can understand software overlays and DLLs (dynamic-link libraries) but cannot come to grips with on-the-fly hardware reconfigurability. I think the issue is training more than anything else, but the big FPGA guys just couldn’t sell the idea broadly after trying for years. So there needs to be magic (or some appropriately advanced technology that looks like magic to most of us) to make this trick work.
And there is such magic in the form of a synthesis tool from Tabula that understands the extra-dimensional aspects of Tabula’s FPGA. The tool takes standard logic designs and “folds” them into time. However, like much of the magic in the Harry Potter book series, this magic isn’t perfect. You don’t necessarily get 8x the logic circuitry from a 1x FPGA. You get about 2.5x according to Tabula, depending on the design. And you get about 2.9x from the 8-ported, 1.6 GHz memories on the chip, again depending on the design. This gap between the real and the ideal reflects the difficulty of developing automated algorithms that can re-pipeline a datapath for additional stages. It’s an art, not a science, as any processor architect will tell you. You can’t always partition one datapath pipeline stage into eight because there just isn’t enough computation taking place in that pipeline stage to allow such expansion or re-pipelining. So, according to Tabula, the average LUT reuse is about 2.5x based on whatever test cases the company used to develop that number.
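One way to see why reuse falls short of 8x is to schedule a small netlist into folds by hand. The sketch below is an illustrative assumption, not Tabula's synthesis algorithm: it places each LUT in the earliest fold its inputs allow (plain ASAP levelization) and sizes the physical array by the busiest fold. The netlists and function names are hypothetical.

```python
from collections import Counter


def asap_levels(netlist):
    """Map each LUT to the earliest fold in which all its inputs are ready.

    netlist: dict of LUT name -> list of fan-in names. Names that are not
    keys of the dict are treated as primary inputs (ready before fold 1).
    """
    levels = {}

    def level(node):
        if node not in netlist:
            return 0  # primary input
        if node not in levels:
            levels[node] = 1 + max((level(src) for src in netlist[node]), default=0)
        return levels[node]

    for node in netlist:
        level(node)
    return levels


def reuse_factor(netlist, folds=8):
    """Logical LUTs divided by physical LUTs needed under ASAP scheduling."""
    if not netlist:
        raise ValueError("empty netlist")
    levels = asap_levels(netlist)
    if max(levels.values()) > folds:
        raise ValueError("logic depth exceeds the number of folds")
    busiest_fold = max(Counter(levels.values()).values())
    return len(netlist) / busiest_fold


# A deep, narrow chain folds perfectly: one LUT per fold.
chain = {f"n{i}": [f"n{i - 1}" if i > 1 else "x0"] for i in range(1, 9)}

# A shallow, wide reduction tree does not: the first fold needs four LUTs.
tree = {
    "a0": ["x0", "x1"], "a1": ["x2", "x3"], "a2": ["x4", "x5"], "a3": ["x6", "x7"],
    "b0": ["a0", "a1"], "b1": ["a2", "a3"],
    "c0": ["b0", "b1"],
}
```

With these toy inputs, `reuse_factor(chain)` is 8.0 while `reuse_factor(tree)` is only 1.75. A real tool can do better than ASAP by delaying LUTs that have slack, but it cannot create logic depth that the design does not have.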
Now for the power-consumption ramifications. Tabula’s FPGAs save die area (in terms of LUTs and on-chip memories), and therefore silicon cost, at the expense of power consumption. Running most of the on-chip circuitry at 1.6 GHz while delivering the performance of a 200 MHz FPGA must cost additional power. In the real world of chip design, power scales linearly with area but superlinearly with frequency, largely due to voltage-rail considerations. You need more voltage to operate at higher clock rates. There’s also the leakage issue to contend with, caused by setting transistor thresholds low enough to operate at 1.6 GHz. So it’s bound to be a bad tradeoff in terms of power. (I don’t actually know this because it doesn’t seem that Tabula’s been forthcoming about power numbers, but some physics just can’t be bypassed as long as you’re still using off-the-shelf CMOS.)
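The scaling argument can be put into back-of-the-envelope form using the standard dynamic-power relation for CMOS, where power is proportional to switched capacitance times voltage squared times frequency. Everything numeric below is an assumption for illustration, not a Tabula figure: switched capacitance is taken as proportional to LUT area, activity is assumed equal in both cases, the conventional FPGA is assumed to need 2.5x the LUTs (the quoted reuse figure), and the 10% supply-voltage increase is purely hypothetical. Leakage is ignored.

```python
def relative_dynamic_power(area, freq_hz, vdd):
    """Dynamic CMOS power up to a constant factor: area * Vdd^2 * f."""
    return area * vdd ** 2 * freq_hz


# Conventional FPGA: 2.5 units of LUT area clocked at the 200 MHz user rate.
conventional = relative_dynamic_power(area=2.5, freq_hz=200e6, vdd=1.0)

# Folded FPGA: 1 unit of LUT area clocked at 1.6 GHz, same supply voltage.
folded_same_v = relative_dynamic_power(area=1.0, freq_hz=1.6e9, vdd=1.0)

# Folded FPGA with a hypothetical 10% higher supply voltage to reach 1.6 GHz.
folded_higher_v = relative_dynamic_power(area=1.0, freq_hz=1.6e9, vdd=1.1)

ratio_same_v = folded_same_v / conventional
ratio_higher_v = folded_higher_v / conventional
```

Under these assumptions the folded design burns about 3.2x the dynamic power at equal voltage and roughly 3.9x with the hypothetical voltage bump. Real numbers would depend on measurements that have not been published.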
It’s true that you can sacrifice half of the virtualized Spacetime LUTs and get 400 MHz, or some other combination, but folks, it’s a 1.6 GHz device. It’s not designed for low power. Design tradeoffs obviously favored device cost, which you can see in the low, blink-inducing prices for the devices. Those prices are indeed mighty attractive for such high logic capacities. However, just about everyone’s worried about power these days, even people designing equipment for those power-sucking data centers that are cooled by diverting nearby rivers through the equipment racks. Every watt of operating power supplied to the equipment requires an additional watt for cooling (roughly speaking). A megawatt here, a megawatt there, and pretty soon you’re talking about some real energy consumption. And some real energy costs, which is what truly gets the attention of the data-center managers and owners.
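The two rules of thumb in this paragraph reduce to simple arithmetic. The sketch below is a hedged illustration: only the 8-fold/200 MHz and 4-fold/400 MHz points come from the discussion above, and treating any fold count from 1 to 8 as valid is an assumption. The one-watt-of-cooling-per-watt figure is the rough estimate quoted in the text, exposed here as a parameter. Function names are hypothetical.

```python
SPACETIME_CLOCK_MHZ = 1600  # internal reconfiguration clock
MAX_FOLDS = 8


def user_clock_mhz(folds):
    """User clock if each user cycle is split into `folds` time slices."""
    if not 1 <= folds <= MAX_FOLDS:
        raise ValueError(f"folds must be between 1 and {MAX_FOLDS}")
    return SPACETIME_CLOCK_MHZ / folds


def facility_watts(equipment_watts, cooling_per_watt=1.0):
    """Total draw including cooling, using a rough per-watt cooling overhead."""
    return equipment_watts * (1.0 + cooling_per_watt)
```

For example, `user_clock_mhz(8)` gives 200.0 and `user_clock_mhz(4)` gives 400.0, while `facility_watts(1_000_000)` gives 2,000,000 W: a megawatt of equipment becomes two megawatts at the meter.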
I’ve heard about the Tabula announcements from several sources starting with a morning-of article in the San Jose Mercury News. One of the best technical write-ups I’ve seen so far is this article by Kevin Morris from FPGA Journal. Online comments to Morris’ article suggest that there’s a lot of skepticism in the design community with respect to this new FPGA technology. As with any new technology, even a tesseract FPGA, time will tell if the market accepts this idea or if it will end up on the shelf next to the long-dead and now-dusty remains of reconfigurable logic.