<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Steve Leibson &#187; FPGA</title>
	<atom:link href="http://low-powerdesign.com/sleibson/index.php/category/fpga/feed/" rel="self" type="application/rss+xml" />
	<link>http://low-powerdesign.com/sleibson</link>
	<description>Leibson's Laws and the Penalties for Breaking Them</description>
	<lastBuildDate>Wed, 01 Feb 2012 00:01:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Generation-jumping 2.5D Xilinx Virtex-7 2000T FPGA delivers 1,954,560 logic cells, consumes only 20W</title>
		<link>http://low-powerdesign.com/sleibson/2011/10/25/generation-jumping-2-5d-xilinx-virtex-7-2000t-fpga-delivers-1954560-logic-cells-consumes-only-20w/</link>
		<comments>http://low-powerdesign.com/sleibson/2011/10/25/generation-jumping-2-5d-xilinx-virtex-7-2000t-fpga-delivers-1954560-logic-cells-consumes-only-20w/#comments</comments>
		<pubDate>Tue, 25 Oct 2011 14:00:10 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[FPGA]]></category>
		<category><![CDATA[Low-Power]]></category>
		<category><![CDATA[2.5D]]></category>
		<category><![CDATA[3D]]></category>
		<category><![CDATA[Virtex]]></category>
		<category><![CDATA[Virtex-7]]></category>
		<category><![CDATA[Xilinx]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=690</guid>
		<description><![CDATA[Xilinx announced today that it is shipping Virtex-7 2000T FPGAs to customers. This is one monster FPGA. Its 6.8 billion transistors deliver 1,954,560 logic cells, 21.55 Mbits of distributed SRAM, 2160 DSP slices, 46,512Kbits of block RAM, four PCIe ports, 36 &#8230; <a href="http://low-powerdesign.com/sleibson/2011/10/25/generation-jumping-2-5d-xilinx-virtex-7-2000t-fpga-delivers-1954560-logic-cells-consumes-only-20w/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><strong>Xilinx</strong> announced today that it is shipping <strong>Virtex-7 2000T FPGAs</strong> to customers. This is one monster FPGA. Its 6.8 billion transistors deliver 1,954,560 logic cells, 21.55 Mbits of distributed SRAM, 2160 DSP slices, 46,512Kbits of block RAM, four PCIe ports, 36 12.5Gbps GTX serial transceivers, and 1200 user I/O pins. All in about 20W (!!!). The only fly in the ointment, if you want to call it that, is that no one on this planet can make this FPGA as a monolithic device. The Virtex-7 2000T FPGA is a 2.5D assembly that combines four FPGA tiles on a silicon interposer. The interposer provides 10,000 connections between each of the four FPGA tiles.</p>
<p>Here’s an exploded diagram of the FPGA assembly:</p>
<p><a href="http://low-powerdesign.com/sleibson/wp-content/uploads/2011/10/Xilinx-Virtex-7-2000T-Exploded-Diagram.jpg"><img class="aligncenter size-full wp-image-691" title="Xilinx Virtex-7 2000T Exploded Diagram" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2011/10/Xilinx-Virtex-7-2000T-Exploded-Diagram.jpg" alt="" width="540" height="350" /></a></p>
<p>I’ll be covering the 2.5D/3D assembly aspects of this new FPGA in more detail on my <strong>EDA360 Insider</strong> blog this coming Thursday (for 3D Thursday), but I want to discuss the low-power aspects of this device here, now.</p>
<p>The obvious contributor to the Virtex-7 2000T FPGA’s power profile is the use of the 28nm TSMC HPL (high-performance, low-power) high-K, metal-gate (HKMG) process technology. Xilinx chose TSMC’s 28 HPL process to make all of its Series 7 FPGA family members (Virtex-7, Kintex-7, and Artix-7) over TSMC’s 28 HP and 28 LP process technologies. Instead of letting us know this fact and leaving it there, Xilinx published a White Paper that goes into great detail about the decision (“<a href="http://www.xilinx.com/support/documentation/white_papers/wp389_Lowering_Power_at_28nm.pdf" target="_blank">Lowering Power at 28nm with Xilinx 7 Series FPGAs</a>”).</p>
<p>Xilinx considered all three of the TSMC 28nm process technologies for the 7-series FPGA families but the company quickly locked on the two HKMG processes (HP and HPL) as being the “best” for FPGA design. Because Xilinx wanted to use just one process technology to cover all of the planned Series-7 FPGA families from high-performance to low-power, HKMG promised the best mix of performance and leakage for the company’s unified approach to designing all of the Series-7 FPGA families. TSMC’s 28 LP process uses PolySiON (polysilicon/silicon oxy-nitride) gate insulation and is best suited for designs that require less performance than FPGAs. The PolySiON 28 LP process produces transistors that are about 13% slower than those produced in the 28 HPL and 28 HP processes (for the types of transistors Xilinx would be using to build its Series-7 FPGAs) while exhibiting more than twice the leakage. The advantage of the 28 LP process is that it’s less expensive.</p>
<p>Eliminating the 28 LP process as a possibility left the choice between TSMC’s 28 HPL and 28 HP processes. Both processes can produce equally fast transistors but the 28 HP process produces transistors with about twice the leakage of the 28 HPL process for the types of transistors Xilinx would be using to build its Series-7 FPGAs. According to the White Paper, TSMC’s 28 HP process is better suited to GPU and CPU designs that require the ultimate performance and that have the power budget (~100W) to achieve that performance. The maximum Xilinx Series-7 FPGA power budget is 40W, so the company selected TSMC’s 28 HPL process technology. However, in a demo last week, Xilinx showed the Virtex-7 2000T FPGA simultaneously running 3600 copies of its 8-bit PicoBlaze processor core, delivering 180,000 MIPS while consuming 20W. That’s a really impressive amount of computation for the power.</p>
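<p>To put the demo numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the three figures quoted above and assumes, for simplicity, that the whole 20W is attributable to the processor cores, which probably overstates the per-core power.</p>

```python
# Figures quoted for the demo described above.
CORES = 3600          # 8-bit soft processor cores running at once
TOTAL_MIPS = 180_000  # aggregate throughput claimed for the demo
POWER_W = 20          # device power during the demo, in watts

mips_per_core = TOTAL_MIPS / CORES    # throughput of a single core
mips_per_watt = TOTAL_MIPS / POWER_W  # compute efficiency of the device
# Assumption: all 20W is charged to the cores (an upper bound per core).
mw_per_core = POWER_W * 1000 / CORES

print(f"{mips_per_core:.0f} MIPS per core")
print(f"{mips_per_watt:.0f} MIPS per watt")
print(f"{mw_per_core:.1f} mW per core")
```

<p>Under those assumptions, each core delivers 50 MIPS and the device as a whole manages 9000 MIPS per watt.</p>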
<p>Still, it’s the system implications of the Virtex-7 2000T FPGA’s huge capacity that really have an impact on system power consumption. Here is an example diagram that Xilinx used to showcase the power-consumption advantage of the Virtex-7 2000T FPGA:</p>
<p><a href="http://low-powerdesign.com/sleibson/wp-content/uploads/2011/10/Xilinx-Virtex-7-2000T-Power-Advantage.jpg"><img class="aligncenter size-full wp-image-692" title="Xilinx Virtex-7 2000T Power Advantage" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2011/10/Xilinx-Virtex-7-2000T-Power-Advantage.jpg" alt="" width="560" height="319" /></a></p>
<p>In this image, Xilinx claims that one Virtex-7 2000T FPGA delivers the equivalent capability of four “competitive” FPGAs, each with nearly 1M logic cells. Xilinx came to this conclusion based on characteristics other than logic-cell count and I’m not going to tell you that this diagram shows an apples-to-apples comparison. That’s not my point in using this diagram.</p>
<p>For my purposes, the diagram shows four smaller FPGAs tied together with many high-speed differential pairs. Each composite FPGA-to-FPGA link burns 8W. That’s 32W total used just for chip-to-chip communications. To me, that’s the secret low-power sauce that the 2.5D assembly approach of the Xilinx Virtex-7 2000T has. The extremely wide I/O provided by the silicon interposer drastically reduces the power consumption of tile-to-tile communications and this power consumption is included in the device’s overall power consumption number: 20W. That means the FPGA power consumption is essentially “free” if you look through the wrong end of the telescope.</p>
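<p>The arithmetic behind that “free” claim can be sketched in a few lines of Python. The link count of four is my assumption, inferred from the 8W-per-link and 32W-total figures; the rest comes straight from the numbers above.</p>

```python
# Assumption: four composite FPGA-to-FPGA links, inferred from 8W each and 32W total.
LINKS = 4
WATTS_PER_LINK = 8   # power burned by each composite link in the 4-chip design
DEVICE_POWER_W = 20  # total power quoted for one Virtex-7 2000T

interconnect_w = LINKS * WATTS_PER_LINK   # chip-to-chip power in the 4-chip design
margin_w = interconnect_w - DEVICE_POWER_W

print(f"4-chip interconnect alone: {interconnect_w}W")
print(f"Entire Virtex-7 2000T: {DEVICE_POWER_W}W")
print(f"Interconnect exceeds the whole device by {margin_w}W")
```

<p>In other words, the multi-chip design spends 12W more on inter-chip links alone than the 2.5D device spends on everything.</p>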
<p>Where might this reduction in power consumption come in handy? At the announcement, Xilinx VP of FPGA Development and Silicon Technology Liam Madden discussed two example cases relevant to this discussion.</p>
<p>The first example involved a customer looking at developing a large ASIC for a communications application. The defining performance characteristic was the need to handle a terabit/sec aggregate data rate using an estimated 20M gates for the logic. The power budget was about 30W. The customer knew that it wanted some amount of programmability and was therefore considering a 3-chip solution with one ASIC and two FPGAs. The estimated time to develop this design was three years and the estimated power consumption was 70W. Way out of the ballpark. One Xilinx Virtex-7 2000T filled the bill and beat the power budget by 30%.</p>
<p>The second example involved the replacement of six chips with one device. One Xilinx Virtex-7 FPGA swallowed all six devices and delivered 5x the performance of the existing multi-chip system while consuming 1/7 of the power.</p>
<p>Here’s the “before” picture:</p>
<p><a href="http://low-powerdesign.com/sleibson/wp-content/uploads/2011/10/6-chip-solution.jpg"><img class="aligncenter size-full wp-image-693" title="6-chip solution" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2011/10/6-chip-solution.jpg" alt="" width="424" height="258" /></a></p>
<p>And here’s the “after” picture showing the function of all six chips compiled into one Xilinx Virtex-7 2000T FPGA:</p>
<p><a href="http://low-powerdesign.com/sleibson/wp-content/uploads/2011/10/6-chips-in-a-Virtex-7-2000T.jpg"><img class="aligncenter size-full wp-image-694" title="6 chips in a Virtex-7 2000T" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2011/10/6-chips-in-a-Virtex-7-2000T.jpg" alt="" width="326" height="401" /></a></p>
<p>Clearly, 2.5D and 3D assembly is going to have a major influence on the way we design low-power systems in the future.</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2011/10/25/generation-jumping-2-5d-xilinx-virtex-7-2000t-fpga-delivers-1954560-logic-cells-consumes-only-20w/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Altera introduces SoC FPGA melding ARM Cortex-A9 dual-core processor complex with a 28nm FPGA fabric</title>
		<link>http://low-powerdesign.com/sleibson/2011/10/11/altera-introduces-soc-fpga-melding-arm-cortex-a9-dual-core-processor-complex-with-a-28nm-fpga-fabric/</link>
		<comments>http://low-powerdesign.com/sleibson/2011/10/11/altera-introduces-soc-fpga-melding-arm-cortex-a9-dual-core-processor-complex-with-a-28nm-fpga-fabric/#comments</comments>
		<pubDate>Tue, 11 Oct 2011 12:00:33 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[ARM]]></category>
		<category><![CDATA[FPGA]]></category>
		<category><![CDATA[Low-Power]]></category>
		<category><![CDATA[SDRAM]]></category>
		<category><![CDATA[SOC]]></category>
		<category><![CDATA[Altera]]></category>
		<category><![CDATA[SoC FPGA]]></category>
		<category><![CDATA[Xilinx]]></category>
		<category><![CDATA[Zynq]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=668</guid>
		<description><![CDATA[Xilinx first started to talk publicly about the fusion of processors and FPGAs—a product now known as Zynq—in 2010 and has announced plans to roll out parts by the end of this year. It was inevitable that Altera would eventually &#8230; <a href="http://low-powerdesign.com/sleibson/2011/10/11/altera-introduces-soc-fpga-melding-arm-cortex-a9-dual-core-processor-complex-with-a-28nm-fpga-fabric/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Xilinx first started to talk publicly about the fusion of processors and FPGAs (a product now known as Zynq) in 2010 and has announced plans to roll out parts by the end of this year. It was inevitable that Altera would eventually counter with a competing product line. Today the company revealed plans for a line of chips called SoC FPGAs. Comparisons between the Altera and Xilinx offerings are bound to follow, but first let’s look at the details of the Altera offerings.</p>
<p>The SoC FPGA line will include at least 18 different chips with various configurations for the “Hard Processor System” (HPS) and various sizes for the FPGA fabrics connected to the HPS block. In addition, the SoC FPGA product line will be based on two of the Altera 28nm FPGA fabrics—Cyclone V and Arria V—for two different speed grades within the product line. Here’s a generalized block diagram of a device in the product line:</p>
<p><a href="http://low-powerdesign.com/sleibson/wp-content/uploads/2011/10/Altera-SoC-FPGA-Block-Diagram.jpg"><img class="aligncenter size-full wp-image-669" title="Altera SoC FPGA Block Diagram" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2011/10/Altera-SoC-FPGA-Block-Diagram.jpg" alt="" width="513" height="757" /></a></p>
<p>The SoC FPGAs’ HPS is based on two 800MHz ARM Cortex-A9 processor cores with ARM Neon and single/double-precision FPU extensions. Each ARM Cortex-A9 processor has its own L1 caches: separate 32Kbyte caches for instructions and data. The two processor cores share a unified 512Kbyte L2 cache. Each processor also has private interval and watchdog timers. To keep the two processor cores fed with instructions and data, there’s a hard-core, multiport DDR SDRAM controller in the HPS that supports the DDR2, DDR3, LPDDR1, and LPDDR2 SDRAM interface protocols. There’s also a Flash memory controller with a built-in DMA engine. The Flash controller supports NOR and NAND Flash memories including ONFI 1.0 devices and SD, SDIO, and MMC memory cards. In addition, there’s ECC support for the SDRAM and the NAND Flash interfaces.</p>
<p>Next up are the hard-core peripherals within the HPS. There are a lot of them:</p>
<ul>
<li>Two 10/100/1000 Ethernet MACs with DMA</li>
<li>Two USB 2.0 On-the-Go (OTG) controllers with DMA</li>
<li>Four I2C controllers</li>
<li>Two CAN (Controller Area Network) controllers</li>
<li>SPI Master and SPI Slave ports</li>
<li>Two UARTs</li>
<li>General-purpose ports</li>
</ul>
<p>On-chip memory includes 64Kbytes of RAM and a boot ROM.</p>
<p>That’s already quite a lot but then there’s the FPGA section of the SoC FPGA to consider. On-chip FPGA capacity varies depending on whether the particular SoC FPGA device is based on the Cyclone V or Arria V FPGA fabrics. Devices based on the Cyclone V FPGA fabric will be offered with 25K, 40K, 85K, and 110K logic elements. Devices based on the Arria V FPGA fabric will be offered with 350K and 460K logic elements.</p>
<p>The HPS in the Altera SoC FPGA connects to the on-chip FPGA fabric through two 128-bit AXI buses: one for reads and one for writes. As you can see from the block diagram above, the hard-core peripherals not included in the HPS block separately connect to the FPGA fabric. What’s not apparent from the diagram is that the two ARM Cortex-A9 processors share a Snoop Control Unit (SCU) and there&#8217;s an ACP (accelerator coherency port) linking the HPS to the FPGA fabric so it’s possible to engineer accelerators that maintain coherency with the ARM Cortex-A9 processor cores&#8217; caches and implement them using the on-chip FPGA fabric.</p>
<p>In addition to the six FPGA array sizes (four for the devices based on the Cyclone V FPGA fabric and two for devices based on the Arria V FPGA fabric), Altera plans to offer parts with three HPS subsystem configurations: base, mid, and high. Combined with the six FPGA fabric sizes, that means there are at least 18 Altera SoC FPGA parts planned for the initial product lineup. Altera says that there will also be 1-processor variants in the SoC FPGA lineup. Just in case you suspect that’s perhaps a bit underpowered, keep in mind that essentially 100% of all system designs based on microcontrollers use a far less capable processor core than one 800MHz ARM Cortex-A9 core. You might want to check to make sure you’re not becoming overly acclimatized to multicore designs. On the other hand, if you’re running Android then two capable processor cores will come in handy.</p>
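<p>The “at least 18” figure is just a cross product, which the following Python sketch enumerates. It assumes that every fabric size will be paired with every HPS configuration, which Altera has not spelled out, and the labels are descriptive only, not real part numbers.</p>

```python
from itertools import product

# Fabric sizes (logic elements) quoted above, grouped by FPGA fabric family.
FABRICS = {
    "Cyclone V": ["25K", "40K", "85K", "110K"],
    "Arria V": ["350K", "460K"],
}
HPS_CONFIGS = ["base", "mid", "high"]

# Flatten to one entry per (family, size) pair: six fabric options in all.
fabric_options = [
    (family, size) for family, sizes in FABRICS.items() for size in sizes
]

# Assumption: every fabric option is offered with every HPS configuration.
combinations = list(product(fabric_options, HPS_CONFIGS))

print(f"{len(fabric_options)} fabrics x {len(HPS_CONFIGS)} HPS configs "
      f"= {len(combinations)} parts")
for (family, size), hps in combinations[:3]:
    print(f"{family} {size} fabric with {hps} HPS")
```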
<p>As the block diagram above shows, there are additional hard-core peripherals connected to the SoC FPGA chip’s FPGA array: as many as three more multiport SDRAM controllers, a Gen2 x4 PCIe port (supplemented with the possibility of implementing a soft Gen2 x8 PCIe port in the FPGA fabric), as many as six 10Gbps high-speed differential serial transceivers, and as many as thirty 6Gbps high-speed differential serial transceivers. These additional peripheral ports have separate access paths into the FPGA fabric of the SoC FPGA devices.</p>
<p>Perhaps the most interesting news is that low-end members of the Altera SoC FPGA family will sell for $15 in “high volumes.” That’s a lot of capability for a relatively low price. In fact, that’s a very low price in the FPGA world. The bad news is that Altera doesn’t plan to ship devices until the second half of 2012.</p>
<p>So that’s the Altera SoC FPGA. Now for the inevitable comparison based on my previous write-ups of the Xilinx Zynq. (See “<a href="http://low-powerdesign.com/sleibson/2011/03/01/xilinx-zynq-epps-create-a-new-category-that-fits-in-among-socs-fpgas-and-microcontrollers/" target="_blank">Xilinx Zynq EPPs create a new category that fits in among SoCs, FPGAs, and microcontrollers</a>”.) First, there’s the processor complex—what Altera calls the HPS. The two products are remarkably similar here: two 800MHz ARM Cortex-A9 processor cores with Neon DSP and FPU extensions, 512Kbytes of unified L2 cache, Flash controller, one SDRAM controller, Snoop Control Unit, timers and watchdog, DMA, etc. Both processor complexes support an ACP (Accelerator Coherency Port) interface into the FPGA fabrics.</p>
<p>There’s some difference in the processor complex-to-FPGA connection scheme: Altera offers one 128-bit read and one 128-bit write AXI bus and a 32-bit APB (Advanced Peripheral Bus) port plus additional ports that go directly from the FPGA fabric to the multiport SDRAM controller in the HPS. Xilinx offers four 32-bit and four 64-bit AXI ports plus direct access from the FPGA fabric to the SDRAM controller. So the Xilinx parts theoretically provide more raw interconnect bandwidth between the processor complex and the FPGA fabric than do the Altera parts. It remains to be seen if that raw capability can deliver more bandwidth in practice, but the potential is clearly there.</p>
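<p>Here is a rough Python sketch of that raw-width comparison. It counts only the AXI data-bus bits listed above and deliberately ignores clock rates, the Altera APB port, the direct SDRAM-controller paths, and arbitration overhead, none of which are specified here, so treat the result as a proxy for width per clock rather than a bandwidth figure.</p>

```python
# AXI data-port widths (in bits) between the processor complex and the FPGA fabric.
altera_ports = [128, 128]            # one read bus and one write bus
xilinx_ports = [32] * 4 + [64] * 4   # four 32-bit and four 64-bit ports

altera_bits = sum(altera_ports)
xilinx_bits = sum(xilinx_ports)
ratio = xilinx_bits / altera_bits

print(f"Altera: {altera_bits} bits across {len(altera_ports)} ports")
print(f"Xilinx: {xilinx_bits} bits across {len(xilinx_ports)} ports")
print(f"Xilinx/Altera raw width ratio: {ratio:.1f}")
```

<p>On width alone, the Xilinx scheme comes out at 384 bits against 256 bits for Altera, a factor of 1.5.</p>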
<p>But wait! Hold on there! The Altera SoC FPGA parts offer as many as three more SDRAM controllers outside of the hard-core HPS processor complex and those SDRAM controllers can be connected either to devices implemented as soft cores in the on-chip FPGA fabric or through the FPGA fabric to the HPS. That added SDRAM control capability could really be an advantage in systems with extremely high SDRAM bandwidth requirements.</p>
<p>Then there’s the PCIe controller. On the Altera SoC FPGAs, there’s one hard-core Gen2 x4 PCIe port and the possibility of implementing a second, soft-core Gen2 x8 PCIe port in the FPGA fabric. The Xilinx Zynq parts will provide a hard-core Gen2 x4 or x8 PCIe port, depending on the family member. There are additional 10.3Gbps serial channels available on the Xilinx Zynq components, so a soft-core PCIe controller is a possibility, as it is for the Altera SoC FPGAs.</p>
<p>Since I’ve brought up the topic of the FPGA fabric, let’s compare those as well. The various Altera SoC FPGA family members offer six FPGA fabric sizes: 25K, 40K, 85K, 110K, 350K, and 460K logic elements. The announced Xilinx Zynq family offers four fabric sizes: 30K, 85K, 125K, and 235K logic cells. So if you need really big FPGA fabrics to complement the capabilities provided by the processor complexes, then the Altera SoC FPGA family seems to offer more capacity for now. However, should a battlefield form at the high end, you can bet that Xilinx will fill out its product line there, where there’s more margin to be made.</p>
<p>Finally there’s pricing and availability. Both companies have announced high-volume unit pricing “below $15” but the Xilinx parts are supposed to be available this year and the Altera parts are scheduled to appear in the latter part of next year.</p>
<p>Together, today&#8217;s Altera SoC FPGA announcement and the previous Xilinx Zynq announcements create a truly exciting new product category—one that fuses FPGAs with high-performance microprocessors in a way guaranteed to dramatically extend the reach of FPGAs. The resulting mixture of capability, performance, power consumption, and cost simply cannot be replicated with a 2-chip design.</p>
<p>I predict that many system designers will be unable to resist this combination. Naysayers will point to previous failed attempts at merging FPGAs and hard microprocessor cores and some will predict a similar fate for this new generation of parts. But much has changed. First, the embedded industry has adopted the ARM architectures and there is a large body of programming talent available for this architecture. Second, these new parts are not FPGAs with processor cores tacked on. They are very capable and complete processor complexes, application processors in their own right, augmented with FPGA fabrics. From my perspective, the Altera SoC FPGAs and Xilinx Zynq parts stand a very good chance of defining a new and vibrant component category.</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2011/10/11/altera-introduces-soc-fpga-melding-arm-cortex-a9-dual-core-processor-complex-with-a-28nm-fpga-fabric/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Going against the low-power grain to resurrect and improve a 31-year-old HP calculator</title>
		<link>http://low-powerdesign.com/sleibson/2011/05/01/going-against-the-low-power-grain-to-resurrect-and-improve-a-31-year-old-hp-calculator/</link>
		<comments>http://low-powerdesign.com/sleibson/2011/05/01/going-against-the-low-power-grain-to-resurrect-and-improve-a-31-year-old-hp-calculator/#comments</comments>
		<pubDate>Sun, 01 May 2011 00:46:02 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[FPGA]]></category>
		<category><![CDATA[Low-Power]]></category>
		<category><![CDATA[41C]]></category>
		<category><![CDATA[HP]]></category>
		<category><![CDATA[HP 41C]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=546</guid>
		<description><![CDATA[Monte J. Dalrymple is a man with a mission: take one of HP’s most celebrated calculators, the HP 41C, and bring it into the 21st century. He’s done this by reverse engineering the 30-year-old CMOS “Nut” processor designed into the &#8230; <a href="http://low-powerdesign.com/sleibson/2011/05/01/going-against-the-low-power-grain-to-resurrect-and-improve-a-31-year-old-hp-calculator/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Monte J. Dalrymple is a man with a mission: take one of HP’s most celebrated calculators, the HP 41C, and bring it into the 21<sup>st</sup> century. He’s done this by reverse engineering the 30-year-old CMOS “Nut” processor designed into the HP 41C (which ran at a heart-stopping 360 kHz) and installing that processor, plus an MMU, 100 plug-in ROM module images, and a few more features into an Actel FPGA on a replacement CPU board that can operate as much as 50 times faster than the original. (Processor speed is programmable.) All to upgrade a calculator that hasn’t been manufactured for more than 20 years. Such is the stuff of obsession. (Dalrymple says his day job is working as a processor designer.)</p>
<p> </p>
<p>The original Nut processor, with its third-of-a-MHz clock, drew 10 microamps of standby current. The new FPGA-based processor, now dubbed “NEWT,” draws 11 times that amount, a whole 110 microamps. So standby battery life drops from more than a year to perhaps a month or two. (The HP 41C runs on four N batteries.) The new processor draws 7.9 mA while running.</p>
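<p>The “month or two” estimate follows from simple scaling, sketched below in Python. The sketch assumes that standby life is inversely proportional to standby current for a given set of batteries and that battery self-discharge can be ignored; the one-year and two-year starting points are illustrative values for “more than a year,” not measured figures.</p>

```python
ORIGINAL_STANDBY_UA = 10   # standby current of the original Nut processor
NEWT_STANDBY_UA = 110      # standby current of the FPGA-based NEWT processor

ratio = NEWT_STANDBY_UA / ORIGINAL_STANDBY_UA


def scaled_standby_days(original_days):
    """Scale standby life by the current ratio (assumes life is proportional to 1/current)."""
    return original_days / ratio


# Illustrative starting points for "more than a year" of original standby life.
for original_days in (365, 730):
    new_days = scaled_standby_days(original_days)
    print(f"{original_days} days originally: about {new_days:.0f} days with NEWT")
```

<p>With those inputs, the scaled standby life lands at roughly 33 to 66 days.</p>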
<p> </p>
<p>Here’s a photo showing the original and the new CPU boards (shown left and right respectively).</p>
<p> </p>
<p><img class="aligncenter size-full wp-image-547" title="HP 41CL processor board" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2011/05/HP-41CL-processor-board.jpg" alt="HP 41CL processor board" width="540" height="254" /></p>
<p> </p>
<p>Note that the old CPU board exclusively uses through-hole IC packaging technology, which forced the ROMs (in 8-pin DIPs) to be piggybacked due to the lack of circuit board real estate. The NEWT CPU board uses modern surface-mount technology, developed back in the 1980s after the HP 41C was developed.</p>
<p> </p>
<p>In case you too are obsessed with this pinnacle of HP calculator development…</p>
<p>You’ll find the manual for the NEWT processor <a href="http://www.systemyde.com/pdf/newt.pdf" target="_blank">here</a>.</p>
<p> </p>
<p>You’ll find the manual for the HP 41CL replacement processor board <a href="http://www.systemyde.com/pdf/sy41cl.pdf" target="_blank">here</a>.</p>
<p> </p>
<p>And you’ll find an article about the project <a href="http://systemyde.com/pdf/hhc2010.pdf" target="_blank">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2011/05/01/going-against-the-low-power-grain-to-resurrect-and-improve-a-31-year-old-hp-calculator/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Xilinx Zynq EPPs create a new category that fits in among SoCs, FPGAs, and microcontrollers</title>
		<link>http://low-powerdesign.com/sleibson/2011/03/01/xilinx-zynq-epps-create-a-new-category-that-fits-in-among-socs-fpgas-and-microcontrollers/</link>
		<comments>http://low-powerdesign.com/sleibson/2011/03/01/xilinx-zynq-epps-create-a-new-category-that-fits-in-among-socs-fpgas-and-microcontrollers/#comments</comments>
		<pubDate>Tue, 01 Mar 2011 11:30:14 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[FPGA]]></category>
		<category><![CDATA[IP]]></category>
		<category><![CDATA[Low-Power]]></category>
		<category><![CDATA[LPDDR]]></category>
		<category><![CDATA[LPDDR2]]></category>
		<category><![CDATA[SOC]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=502</guid>
		<description><![CDATA[After telegraphing its punch at ESC last spring, Xilinx has now introduced the first four members of its EPP product line and named them Zynq to differentiate them from the company’s FPGAs. (See “Xilinx redefines the high-end microcontroller with its &#8230; <a href="http://low-powerdesign.com/sleibson/2011/03/01/xilinx-zynq-epps-create-a-new-category-that-fits-in-among-socs-fpgas-and-microcontrollers/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>After telegraphing its punch at ESC last spring, Xilinx has now <a href="http://www.prnewswire.com/news-releases/xilinx-introduces-zynq-7000-family-industrys-first-extensible-processing-platform-117132003.html" target="_blank">introduced</a> the first four members of its EPP product line and named them Zynq to differentiate them from the company’s FPGAs. (See “<a href="http://low-powerdesign.com/sleibson/2010/05/01/xilinx-redefines-the-high-end-microcontroller-with-its-extensible-processing-platform-%E2%80%93-part-1/" target="_blank">Xilinx redefines the high-end microcontroller with its ARM-based Extensible Processing Platform – Part 1</a>” and “<a href="http://low-powerdesign.com/sleibson/2010/05/01/xilinx-redefines-the-high-end-microcontroller-with-its-arm-based-extensible-processing-platform-%e2%80%93-case-studies-%e2%80%93-part-2/" target="_blank">Xilinx redefines the high-end microcontroller with its ARM-based Extensible Processing Platform – Case Studies – Part 2</a>”.) Two of the four Zynq family members are designed for low-power applications and the other two emphasize performance over power. “What’s an EPP?” you might ask. It’s an “Extensible Processing Platform,” a new IC category Xilinx hopes to create. Think of an EPP as an embedded processor with an attached FPGA fabric. “Haven’t they tried this before?” you’re now asking. Yes, they have. This time, the difference is that Xilinx is emphasizing the “processor” aspect of the device over the FPGA aspect—and you can expect that change in emphasis to make all the difference.</p>
<p>The Xilinx Zynq EPP family is designed to wedge in between ASICs or SoCs, microcontrollers, and FPGAs. What Xilinx has done is leverage its 28nm expertise, earned from its development of the company’s Artix/Kintex/Virtex-7 FPGAs, to develop a new type of product that’s mostly hardened processor cores (with associated memory and peripherals) and then added a layer of FPGA fabric, like icing on a cake, to produce a new confection. With the smaller Zynq parts selling for less than $15 in volume, these confections will clearly catch the eye of many, many system designers trying to get the most bang for their silicon buck. Zynq EPPs will be available in first silicon starting in the second half of 2011 with general engineering samples available in 1H2012.</p>
<p>Here’s a family block diagram of the Xilinx Zynq EPPs:</p>
<p><img class="aligncenter size-full wp-image-523" title="Xilinx Zynq Block Diagram v2" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2011/03/Xilinx-Zynq-Block-Diagram-v2.jpg" alt="Xilinx Zynq Block Diagram v2" width="580" height="494" /></p>
<p>At its heart, each of the four Xilinx Zynq EPPs is a dual-core embedded processor based on two 800MHz ARM Cortex-A9 processors. Each processor is augmented with a copy of ARM’s NEON SIMD engine, a double-precision floating-point unit, 32 Kbytes of instruction cache, and 32 Kbytes of data cache. The two processor cores share a 512Kbyte unified L2 cache. Separate memory controllers, one for DRAM and one for Flash, connect the processor cores to external memory. You need two controllers because DDR DRAMs and Flash devices require radically different control algorithms for optimum operation.</p>
<p>There are a large number of additional peripherals on these chips—all in hard-core form—including two Gigabit Ethernet controllers; two USB 2.0 ports (with USB On-The-Go capability); two SDIO ports for talking to SD Flash media cards; two UARTs; two CAN bus controllers for automotive applications; two 12-bit 1Msample/sec A/D converters with 17 analog inputs; two I2C ports and two SPI ports for talking to serial peripherals; some GPIO pins for whatever else you need to talk to; and an 8-channel DMA controller to move data around the chip.</p>
<p>So far, the Zynq EPPs look like very nice, dual-core embedded processors. What happens next is part of Xilinx’ strategy to create an entirely new product category. Using the ARM AMBA 4 AXI4 interconnect as a connection matrix, Xilinx has driven four 32-bit and four 64-bit AXI4 ports into a block of FPGA fabric. The point of the included FPGA fabric is to allow system designers to create peripheral devices not already on the chip in hard-core form. (Note, Cadence introduced a <a href="http://eda360insider.wordpress.com/2011/02/28/cadence-rolls-out-huge-vip-catalog-merging-verification-ip-from-cadence-with-vip-from-denali-acquisition/" target="_blank">new verification IP catalog</a> with an AMBA4 VIP model just yesterday.)</p>
<p>The actual FPGA fabric capacity included on the Zynq EPPs ranges from 30,000 to 235,000 logic cells, depending on the Zynq family member. Xilinx will tell you that those logic-cell capacities are approximately equivalent to 430,000 to 3.5 million ASIC gates. How did Xilinx get these equivalent numbers? By multiplying by 15. Where did “15” come from? It’s an average, derived from the observation that one logic cell appears to do the job of 10 to 20 ASIC gates across a range of designs. Are the “ASIC gates” equivalencies accurate? Looks like plus or minus 33% to me. The Zynq FPGA fabrics also house block RAMs ranging in capacity from 240 Kbytes to 1.86 Mbytes and they include the usual MACs now commonly found in FPGA fabrics.</p>
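<p>That conversion is easy to reproduce. The Python sketch below applies the 10-to-20 range and the average of 15 described above to the rounded logic-cell counts quoted here. Note that 30,000 cells times 15 gives 450,000 gates, slightly more than the 430,000 Xilinx quotes, presumably because the logic-cell figures in this post are rounded; that explanation is my guess, not something Xilinx stated.</p>

```python
GATES_PER_CELL_AVG = 15   # average multiplier described above
GATES_PER_CELL_LOW = 10   # low end of the observed range
GATES_PER_CELL_HIGH = 20  # high end of the observed range


def asic_gate_estimate(logic_cells):
    """Return (low, average, high) ASIC-gate equivalents for a logic-cell count."""
    return (
        logic_cells * GATES_PER_CELL_LOW,
        logic_cells * GATES_PER_CELL_AVG,
        logic_cells * GATES_PER_CELL_HIGH,
    )


# Rounded logic-cell counts for the smallest and largest Zynq fabrics.
for cells in (30_000, 235_000):
    low, avg, high = asic_gate_estimate(cells)
    print(f"{cells:,} logic cells: about {avg:,} gates (range {low:,} to {high:,})")

# Relative spread of the range around the average multiplier.
spread = (GATES_PER_CELL_HIGH - GATES_PER_CELL_AVG) / GATES_PER_CELL_AVG
print(f"Spread around the average: plus or minus {spread:.0%}")
```

<p>The spread works out to one third either way, which is where my “plus or minus 33%” comes from.</p>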
<p>Each AMBA4 AXI4 port that bridges the processor complex to the FPGA fabric has a dual arbiter to handle simultaneous accesses from the various masters on the chip. A ninth port, based on the ARM Cortex-A9 ACP (accelerator coherency port), connects the processors’ snoop control unit to the FPGA fabric. The ACP provides a device, such as an external DMA controller, with direct access to CPU-coherent data regardless of where the data is in the CPU cache and memory hierarchy.</p>
<p>The two members of the Zynq family designed for low-power applications incorporate an FPGA fabric based on Xilinx’ Artix-7 FPGAs and the two high-performance members of the Zynq family incorporate an FPGA fabric based on the company’s Kintex-7 FPGAs. The two high-performance Zynq devices also sport either four or twelve 10.3Gbps serial transceiver channels and a PCIe Gen2 controller (4- or 8-lane depending on the Zynq family member).</p>
<p>Notably, it’s the hard-core processor section of the Zynq device that powers up first after a reset, which allows the OS to boot and some of the application code to start executing. This is a familiar environment for any embedded software team. After the processors are up and running, the code can then configure the FPGA fabric.</p>
<p>Here’s a table of the key attributes for the four initial members of the Xilinx Zynq EPP family:</p>
<p><img class="aligncenter size-full wp-image-504" title="Xilinx Zynq Family Table" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2011/02/Xilinx-Zynq-Family-Table.jpg" alt="Xilinx Zynq Family Table" width="600" height="357" /></p>
<p>Enough about the Zynq silicon. The development tools are equally important for such an extensively programmable and configurable device. Xilinx will be providing a $495, Eclipse-based Platform Studio Software Development Kit for the Zynq family. The on-chip ARM Cortex-A9 processor cores open the wide world of ARM’s development ecosystem to design teams using Zynq parts.</p>
<p>There are at least a couple of alternatives for developing peripheral blocks in the Zynq EPP FPGA fabrics. The Xilinx ISE Design Suite is the company’s standard FPGA development environment so any designer accustomed to developing logic designs with Xilinx FPGAs will feel at home. The design suite includes both development tools and plug-and-play peripheral IP with AMBA4 AXI4 interfaces that can be dropped into place on the chips. Xilinx has standardized on the AMBA4 AXI4 interconnect standard for its IP block interfaces for both EPPs and FPGAs. Hence the eight AMBA4 AXI4 ports on the Zynq parts. The Xilinx IP blocks also include bus-functional models for system simulation.</p>
<p>Xilinx has created a compelling value proposition with the new Zynq EPPs. It’s quite common for system-design teams to couple some sort of embedded processor with an FPGA in many designs that haven’t the volume needed to justify the design of a custom SoC. The Zynq EPPs offer yet another alternative, one that merges a dual-core embedded processor with a state-of-the-art FPGA fabric and connects the two with a high-bandwidth connection. Moreover, the Xilinx Zynq EPPs give system designers access to 28nm process technology at a relatively low component cost, low NRE (no need to redesign the processor complex), and zero mask and fab costs.</p>
<p>This mixture of capability, performance, and cost simply cannot be replicated with a 2-chip design. Going forward, few system-design teams will be able to avoid at least considering Zynq EPPs in their preliminary architectural explorations. Sure, if you’re building a mobile telephone handset, then a Zynq EPP clearly isn’t for you. If a low-cost microcontroller selling for a buck or so will do the job, that’s an obvious right choice. Custom SoCs still win the day for high-volume, low-power, high-performance applications. For in-between system designs, Zynq EPPs seem like they’re going to be mighty attractive.</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2011/03/01/xilinx-zynq-epps-create-a-new-category-that-fits-in-among-socs-fpgas-and-microcontrollers/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Low-Power Design with FPGAs: The Basics</title>
		<link>http://low-powerdesign.com/sleibson/2010/10/02/low-power-design-with-fpgas-the-basics/</link>
		<comments>http://low-powerdesign.com/sleibson/2010/10/02/low-power-design-with-fpgas-the-basics/#comments</comments>
		<pubDate>Sat, 02 Oct 2010 22:24:09 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[Design]]></category>
		<category><![CDATA[FPGA]]></category>
		<category><![CDATA[Low-Power]]></category>
		<category><![CDATA[SOC]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=456</guid>
		<description><![CDATA[Spiraling complexity in all facets of electronic design often cause us to take our eyes off the basics. A recent paper presented at the IEEE International Conference on Intelligent Control and Information Processing (ICICIP 2010), held in Dalian, China in &#8230; <a href="http://low-powerdesign.com/sleibson/2010/10/02/low-power-design-with-fpgas-the-basics/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Spiraling complexity in all facets of electronic design often causes us to take our eyes off the basics. A recent paper presented at the IEEE International Conference on Intelligent Control and Information Processing (ICICIP 2010), held in Dalian, China in August, is helpful simply by reminding us of some of the basics of low-power system design when using FPGAs. The paper’s title is “Survey of FPGA Low Power Design” and it was written by Fuming Sun, Haiyang Wang, Fei Fu, and Xiaoying Li. Sun is with University of Science &#038; Technology, Beijing, China. Wang is with JiangNan Institute of Computing Technology, China. Fei Fu is with PLA 65035 troops, China. Li is with the Cadence Beijing R&#038;D Center.</p>
<p>The paper starts by pointing out that there are three primary factors that determine FPGA power dissipation: static leakage power, short-circuit switching power, and dynamic power. Static power is due to transistor leakage and it has grown in importance as the lithographic features used for FPGA manufacturing shrink. It’s not the shrinking features that cause the leakage however; it’s the exponential increase in the number of transistors, the subsequent lowering of power-supply voltages to reduce dynamic power dissipation, and the corresponding drop in transistor threshold voltages to accommodate the lower supply rail. As a result of the narrowing between transistor threshold voltage and supply voltage, it becomes harder and harder to switch the transistors fully off and so the transistors leak current even when “off.” When there are hundreds of millions of transistors on the FPGA, the resulting leakage becomes large. Therefore, one rule for reducing power consumption in FPGA-based systems is simply to be sure you’re using the smallest possible FPGA to minimize the number of leaky transistors in your design.</p>
<p>Dynamic power is consumed during switching events within the FPGA’s core or I/O logic. One of two ways to reduce dynamic power is to reduce the switching toggle rate wherever possible. That’s the way we usually express this idea. However, the paper presents a different and useful way to think about this goal: “it is desirable to maximize the amount of functionality for each toggle of a high fan-out net.” In other words, figure out how to maximize the amount of work done each clock cycle to reduce the number of required clock cycles. Another common word for this approach is “parallelism.” The more parallelism you can achieve in your design, the lower the required clock rate. The paper also recommends the use of dual-edge triggered flip flops so that work’s done on both clock edges, but not many FPGAs support this kind of flip flop although some CPLDs apparently do. Consequently, it’s unclear just how practical this particular idea is.</p>
<p>However, emphasizing parallelism actually increases the number of transistors needed to implement your design so at the RTL-design level you want to reduce unexpected glitches to minimize short-circuit and dynamic-switching power. You do that with clock synchronization and gated clocks. Insertion of gated clocks switches off clocks to registers that need not toggle on a certain cycle, which reduces dynamic power consumption but does not affect operation. According to the paper, clock gating can reduce dynamic power consumption by 30% to 40%, which is a clear and substantial benefit.</p>
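<p>A rough way to see what that 30% to 40% figure means for a whole design is to separate static power from dynamic power. The sketch below is a back-of-the-envelope model and not something taken from the paper: the 1.0W static and 2.0W dynamic baseline values are made up for illustration, and the function name is hypothetical.</p>

```python
# Back-of-the-envelope model: clock gating trims dynamic power only.
# The baseline wattages are hypothetical, chosen only for illustration.

def total_power_after_gating(static_w, dynamic_w, dynamic_reduction):
    """Total power once clock gating removes a fraction of the dynamic power."""
    return static_w + dynamic_w * (1.0 - dynamic_reduction)


static_w, dynamic_w = 1.0, 2.0  # assumed baseline: 3.0W total
baseline_w = static_w + dynamic_w
for reduction in (0.30, 0.40):  # range cited from the paper
    gated_w = total_power_after_gating(static_w, dynamic_w, reduction)
    saving = (baseline_w - gated_w) / baseline_w
    print(f"{reduction:.0%} dynamic reduction: {gated_w:.2f}W total, {saving:.0%} overall saving")
```

<p>With these assumed numbers, the overall saving is smaller than the dynamic saving because static leakage is untouched by gating. The larger the static share of the budget, the less clock gating helps overall, which is one reason picking the smallest suitable FPGA still matters.</p>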
<p>There are a few more helpful hints within this paper. First, the paper recommends that you take advantage of as many of the on-chip hard IP cores (including DSP blocks, FIFOs, multipliers, etc.) as you can. These hard cores consume only about 20% of the power needed by the same functions implemented in the programmable FPGA fabric. The number of hard cores on FPGAs is increasing. For example, Xilinx’s recently announced EPP family emphasizes the use of hard cores to implement core microprocessor subsystems and encourages the use of the FPGA fabric to implement only those functions that demand a customized design approach.</p>
<p>Finally, the paper notes that RAMs are a major contributor to power consumption in many FPGA-based designs and that block RAM can be a major contributor to the overall power dissipation within the FPGA. The paper recommends that you should be careful to choose the right RAM primitives for the job and your design should ensure that the block RAM is only enabled when data is needed.</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2010/10/02/low-power-design-with-fpgas-the-basics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>More on the Xilinx EPP: Three ways to communicate with on-chip peripherals</title>
		<link>http://low-powerdesign.com/sleibson/2010/06/02/more-on-the-xilinx-epp-three-ways-to-communicate-with-on-chip-peripherals/</link>
		<comments>http://low-powerdesign.com/sleibson/2010/06/02/more-on-the-xilinx-epp-three-ways-to-communicate-with-on-chip-peripherals/#comments</comments>
		<pubDate>Wed, 02 Jun 2010 03:11:06 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[Design]]></category>
		<category><![CDATA[FPGA]]></category>
		<category><![CDATA[SOC]]></category>
		<category><![CDATA[processor]]></category>
		<category><![CDATA[Xilinx]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=358</guid>
		<description><![CDATA[Last month I discussed the newly introduced Xilinx Extensible Processing Platform (EPP), which represents a new product line and a new venture for FPGA leader Xilinx. To briefly recap, devices in the EPP device family are essentially a high-end microcontroller &#8230; <a href="http://low-powerdesign.com/sleibson/2010/06/02/more-on-the-xilinx-epp-three-ways-to-communicate-with-on-chip-peripherals/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Last month I discussed the newly introduced Xilinx Extensible Processing Platform (EPP), which represents a new product line and a new venture for FPGA leader Xilinx. To briefly recap, devices in the EPP device family are essentially a high-end microcontroller or embedded processor based on two ARM Cortex-A9 32-bit RISC processor cores (implemented as hard IP cores and not soft cores in the FPGA fabric), some amount of SRAM used largely for processor cache, some standard peripheral blocks implemented as hard IP cores, and multiple AMBA 4 interconnect buses that link the hard-core, on-chip IP blocks with an FPGA fabric that you can use to create additional peripheral devices or anything else you might need for the digital portion of your embedded design. These Xilinx devices will sell for the low tens of dollars and will consume much less power than full-tilt FPGAs, making them very attractive replacements for 32-bit microcontrollers and standalone processors in certain applications. This month, I want to focus on how you might use those multiple on-chip AMBA 4 buses to communicate with whatever you’ve implemented in the EPP’s FPGA fabric. Xilinx hasn’t yet discussed this sort of technical information, but it’s not too hard to project some basic facts.</p>
<p>There are essentially only three fundamental ways to use the Xilinx EPP’s on-chip AMBA 4 buses to communicate with peripheral devices whether they are hard cores outside of the FPGA fabric or soft cores implemented in the FPGA fabric. Those three ways are: registers, memory-mapped RAM, or streaming. Each of these communications approaches has advantages and disadvantages depending on application needs.</p>
<p>I/O data, control, and status registers date back to the earliest days of peripheral chips that were introduced along with the very first wave of microprocessors back in the 1970s. Back then, registers were generally no wider than eight bits. Data registers were almost always eight bits wide and permitted the passing of individual bytes back and forth between the processor and whatever I/O device lay beyond the peripheral chip. There were peripheral chips for simple parallel I/O, UARTs (universal asynchronous receiver/transmitters) for serial I/O, timer chips, interrupt controllers, and that was pretty much all there was at first.  Each control and status register in these peripheral chips had individual bits and bit groups that implemented specific functions such as “set the output pins to be low-true” or “enable the interrupt pin.”</p>
<p>I/O registers were implemented as individual latches, so it was easy to take the output of a latch bit and use it for driving another piece of hardware inside of the peripheral chip or to take a signal and connect it to the D input of a status-register bit. We still use I/O status and control registers in precisely the same way today, inside of large peripheral blocks like Ethernet and video controllers. We simply use a lot more registers than before and they tend to be wider than eight bits these days.</p>
<p>Memory-mapped I/O maps a large array of bus-addressed memory locations into a linear memory array inside of the peripheral device. Often, this memory array is implemented as a RAM inside of the peripheral device but if the memory array is small enough, it might be implemented as a large register bank instead of RAM.</p>
<p>The earliest use for such memory-mapped arrays in I/O chips was for memory-mapped video. The CPU could write an image to memory-mapped video RAM and a simple sequencing controller read out the video and sent it to the display. Initially, access to the video RAM had to be interleaved between processor and display sequencer but eventually as display speeds and resolution increased, video RAM became dual-ported to handle the rising number of access cycles per unit time.</p>
<p>Originally, it took an entire board to create a memory-mapped video controller. I recall using a Vector Graphics Flashwriter video display card in my North Star Horizon S-100 computer to implement fast video for an early WordStar editing system. I had to write the low-level video drivers in Z80 assembly code to connect the Flashwriter to the CP/M operating system and to WordStar itself. That was back in 1979 and things were mighty primitive back then. The advantage of the memory-mapped video back then was performance. The North Star’s Z80 CPU could directly manipulate every character location on the video display without using the serial escape sequences mandated by the use of RS-232 terminals. The processor would write characters directly to the screen with a simple byte move; it could examine characters with a simple byte read; and it could change the character’s attribute with a simple read-modify-write instruction sequence.</p>
<p>In an era where processors were relatively expensive, it made sense to use the CPU running the application code to directly manipulate video on the screen as well. In the 21<sup>st</sup> century, microprocessors are so cheap and CPUs are so isolated from peripheral devices by caches and bus hierarchies that we have radically changed the way video works in most computers and embedded systems. Most systems now employ separate video processors but there are still certain non-video applications and certain peripheral devices that can still make effective use of memory-mapped I/O to provide direct processor access to peripheral memory.</p>
<p>Finally there’s stream I/O, which directs long transaction bursts to one memory or port address. Large operating systems, Linux in particular, have a great affinity for stream I/O and it’s an essential I/O protocol for streaming audio and video media. (No coincidence there.) Generally, a peripheral processor is required in such streaming applications to interpret commands embedded within the data stream and to separate multiplexed data streams (such as merged audio/video streams, which have become extremely common). Often, it’s advisable to place a FIFO at the input port of a streaming-I/O peripheral to help buffer the incoming data stream. Buffering helps to bridge mismatched data rates or inter-burst latencies between the streaming transmitter and receiver.</p>
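<p>The three styles described above can be sketched with a toy software model. This is only an illustration in Python, not a description of the EPP or of any real peripheral: the class, the control-bit assignment, the buffer size, and the FIFO depth are all hypothetical.</p>

```python
# Toy software model of the three I/O styles; all names are hypothetical.
from collections import deque


class ToyPeripheral:
    ENABLE_IRQ = 0x01  # bit 0 of the control register (made up for this model)

    def __init__(self, ram_size=16, fifo_depth=4):
        self.control = 0                # register-style I/O: one control word
        self.ram = bytearray(ram_size)  # memory-mapped I/O: addressable buffer
        self.fifo = deque()             # stream I/O: FIFO behind a single port
        self.fifo_depth = fifo_depth

    def write_port(self, word):
        """Stream write to one port address; returns False when the FIFO is full."""
        if len(self.fifo) == self.fifo_depth:
            return False
        self.fifo.append(word)
        return True


dev = ToyPeripheral()

# 1. Registers: set a single control bit with a read-modify-write.
dev.control = dev.control | ToyPeripheral.ENABLE_IRQ

# 2. Memory-mapped: write a character, read it back, then change it in place.
dev.ram[3] = ord("A")
dev.ram[3] = dev.ram[3] | 0x20  # read-modify-write turns "A" into "a"

# 3. Streaming: push a burst at one port; the FIFO absorbs it until it is full.
accepted = [dev.write_port(w) for w in range(6)]

print(dev.control, chr(dev.ram[3]), accepted)
```

<p>In this model the last two writes of the six-word burst are refused because the four-entry FIFO is already full and nothing is draining it. That is the situation the input FIFO is meant to smooth over in a real design, where the receiver empties the buffer between bursts.</p>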
<p>Xilinx hasn’t discussed any of these details but it’s likely that the EPP will support all three types of I/O transactions. What remains to be seen is what will be supported in hard-core IP and what will need to be implemented in the FPGA fabric.</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2010/06/02/more-on-the-xilinx-epp-three-ways-to-communicate-with-on-chip-peripherals/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Xilinx redefines the high-end microcontroller with its ARM-based Extensible Processing Platform – Case Studies – Part 2</title>
		<link>http://low-powerdesign.com/sleibson/2010/05/01/xilinx-redefines-the-high-end-microcontroller-with-its-arm-based-extensible-processing-platform-%e2%80%93-case-studies-%e2%80%93-part-2/</link>
		<comments>http://low-powerdesign.com/sleibson/2010/05/01/xilinx-redefines-the-high-end-microcontroller-with-its-arm-based-extensible-processing-platform-%e2%80%93-case-studies-%e2%80%93-part-2/#comments</comments>
		<pubDate>Sat, 01 May 2010 20:22:06 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[Design]]></category>
		<category><![CDATA[FPGA]]></category>
		<category><![CDATA[Low-Power]]></category>
		<category><![CDATA[SOC]]></category>
		<category><![CDATA[ARM]]></category>
		<category><![CDATA[microcontroller]]></category>
		<category><![CDATA[Xilinx]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=347</guid>
		<description><![CDATA[In my previous blog, I discussed the hard-core features of Xilinx’s new Extensible Processing Platform (EPP) and explained the device at the 50,000-foot level. In this blog, I’ll dig a bit deeper into the thinking behind the EPP’s FPGA fabric &#8230; <a href="http://low-powerdesign.com/sleibson/2010/05/01/xilinx-redefines-the-high-end-microcontroller-with-its-arm-based-extensible-processing-platform-%e2%80%93-case-studies-%e2%80%93-part-2/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>In my <a href="http://j.mp/aJd8AI" target="_blank">previous blog</a>, I discussed the hard-core features of Xilinx’s new Extensible Processing Platform (EPP) and explained the device at the 50,000-foot level. In this blog, I’ll dig a bit deeper into the thinking behind the EPP’s FPGA fabric and I’ll show some case studies that indicate why Xilinx may have come up with a product family that will revolutionize high-end embedded system design.</p>
<p>Two features of Xilinx’s EPP architecture differentiate it from other microcontrollers. The first, discussed in Part 1, is the presence of a dual-core ARM Cortex-A9 processor. Most microcontrollers contain only one processor core. The EPP has two. So it’s already starting from a high-end position. The second differentiating feature is the inclusion of an unidentified amount of FPGA fabric on the device. Since the Xilinx EPP represents a family of parts, it’s safe to assume that various family members will contain differing amounts of FPGA fabric. That’s an especially safe assumption because the Xilinx presentation showed two EPP examples with different amounts of FPGA fabric. So we know that the family will likely include at least two parts—and probably many more if the product line proves successful.</p>
<p>What do you do with this FPGA fabric? Well the hard-core section of the EPP already gives you two 32-bit processor cores, some microprocessor peripherals, a memory controller, and some SRAM cache. So you might use the fabric to add some standard peripherals that your design needs that are not included in the standard hard-core set. Because the EPP is based on the AMBA-AXI bus, there are already many such peripheral devices available as synthesizable IP to choose from and the mere presence of Xilinx’s EPP is likely to increase the number of choices substantially as IP vendors decide to jump on the bandwagon.</p>
<p>Perhaps more likely, you will develop custom accelerators for application-specific tasks that permit the EPP to perform task-specific computations really, really fast. Bolt-on, bus-connected acceleration is the preferred design style for many embedded systems architects and it appears to me that the Xilinx EPP heartily supports this design style. I expect the Xilinx EPP offerings to flourish because it complements in-favor system design styles so well. So let’s take a look at two case studies provided by Xilinx to illustrate how the EPP can reduce a system design’s parts count, cost, and power consumption.</p>
<p><img class="alignright size-full wp-image-352" title="Xilinx EPP Auto Application" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2010/05/Xilinx-EPP-Auto-Application2.jpg" alt="Xilinx EPP Auto Application" width="358" height="444" />The first example is for an automotive optical-recognition system that provides a driver with a number of assist features for collision avoidance, blind spot detection, visually assisted cruise control, night vision, a self-parking system, and a lane-departure warning system. An automotive vendor wanted to develop such a system in a compact package that could be installed high on the windshield between the glass and the rear-view mirror. The system needed to be passively cooled (not an easy feat considering the location of the system). Sensors feeding the system will include video cameras, passive infrared sensors, and active RADAR sensors. The vendor wished for the system to be scalable, based on which and how many sensors are used in the vehicle.</p>
<p>The total processing requirement for this system included 1600 DMIPS from the supervisory processor and 32 GMACs for the sensor processing. Cost and power targets for this system were $50 and 5W. A design based on a processor-based ASSP backed with two auxiliary DSPs (needed to provide the 32 GMACs) came in at $45.75 and 6.6W, so the cost target was achieved but the power consumption was too high. A second design based on a Xilinx EPP came in at “less than” $40.75 (less than because Xilinx is still somewhat secretive about pricing for an unannounced product, so the listed EPP costs “less than $25”) and 4.2W, so the power consumption is about 15% below budget. More important, the EPP design provides roughly 200% of the DMIPS and GMACs needed by the design, delivering 3335 DMIPS and 60 GMACs. Even with these cost and power advantages, the Xilinx EPP would be far less attractive if it forced the software team to use an unfamiliar hardware architecture. One of the biggest advantages of the Xilinx approach is the familiar nature of the EPP’s foundation hardware.</p>
<p>The second case study involves an intelligent video surveillance system that can monitor a scene and raise alarms or generate alerts based on the scene. The estimate for processing requirements was 3100 MIPS from the supervisory processor and 49 GMACs for video processing. Cost and power targets were $100 and 10W. A system design based on separate host and video processors came in just above the processing requirements, with a parts cost of $93 and a power dissipation of 10W. So this discrete design just meets spec with very little processing headroom and no leeway in power dissipation. A second system design based on a Xilinx EPP delivers 3335 DMIPS and 60 GMACs, so there’s ample video-processing headroom. Parts cost dropped to “less than $87” (again, Xilinx is being cagey with quoting EPP costs) and power dissipation dropped to 7.9W (about 20% under the power goal).</p>
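<p>The margins in both case studies are easy to check. Here is a small Python sketch that uses only the figures quoted above; the dictionary layout and the function name are illustrative, and the surveillance requirement quoted as 3100 MIPS is treated as DMIPS for the comparison, which is an assumption.</p>

```python
# Margins for the two case studies, using only the figures quoted in the text.
# The dictionary layout and function name are illustrative.

def margin(target, actual):
    """Fraction by which actual falls below target (negative means over target)."""
    return (target - actual) / target


case_studies = {
    "automotive": {
        "power_target_w": 5.0, "discrete_w": 6.6, "epp_w": 4.2,
        "need_dmips": 1600, "need_gmacs": 32,
    },
    "surveillance": {
        # The text gives 3100 MIPS; treated as DMIPS here (an assumption).
        "power_target_w": 10.0, "discrete_w": 10.0, "epp_w": 7.9,
        "need_dmips": 3100, "need_gmacs": 49,
    },
}
EPP_DMIPS, EPP_GMACS = 3335, 60  # EPP capability quoted in both studies

for name, c in case_studies.items():
    print(
        f"{name}: discrete power margin {margin(c['power_target_w'], c['discrete_w']):+.0%}, "
        f"EPP power margin {margin(c['power_target_w'], c['epp_w']):+.0%}, "
        f"EPP delivers {EPP_DMIPS / c['need_dmips']:.0%} of required DMIPS "
        f"and {EPP_GMACS / c['need_gmacs']:.0%} of required GMACs"
    )
```

<p>By this arithmetic the EPP design lands about 16% under the automotive power budget and about 21% under the surveillance budget, in line with the rounded percentages quoted above, while the two-chip automotive design overshoots its power budget by 32%. The processing headroom is much larger in the automotive case (roughly double the requirement) than in the surveillance case, where the same EPP figures are only modestly above the requirement.</p>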
<p>Both of these case studies illustrate the Xilinx EPP’s applicability in high-end embedded systems with big processing requirements. In such systems, the EPP’s standardized, high-end, hard-core, dual-processor core (an ARM Cortex-A9 MP cluster), coupled to a high-performance, 28nm FPGA fabric through multiple high-performance buses, is a significant asset, well suited to such high-end applications. Even though these are high-end applications, they are likely to boost sales of Xilinx’s EPP-based devices to levels rarely achieved by Xilinx’s more expensive FPGAs. EPP component costs listed in these two case studies suggest that Xilinx plans to sell these parts for tens of dollars, not hundreds or thousands of dollars. This feat is possible only because the standardized components within the EPP are hard cores, and they consequently consume only 5-10% of the silicon they’d require if implemented with an FPGA fabric.</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2010/05/01/xilinx-redefines-the-high-end-microcontroller-with-its-arm-based-extensible-processing-platform-%e2%80%93-case-studies-%e2%80%93-part-2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Xilinx redefines the high-end microcontroller with its ARM-based Extensible Processing Platform – Part 1</title>
		<link>http://low-powerdesign.com/sleibson/2010/05/01/xilinx-redefines-the-high-end-microcontroller-with-its-extensible-processing-platform-%e2%80%93-part-1/</link>
		<comments>http://low-powerdesign.com/sleibson/2010/05/01/xilinx-redefines-the-high-end-microcontroller-with-its-extensible-processing-platform-%e2%80%93-part-1/#comments</comments>
		<pubDate>Sat, 01 May 2010 19:10:39 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[Design]]></category>
		<category><![CDATA[DRAM]]></category>
		<category><![CDATA[FPGA]]></category>
		<category><![CDATA[Low-Power]]></category>
		<category><![CDATA[SOC]]></category>
		<category><![CDATA[ARM]]></category>
		<category><![CDATA[Cortex]]></category>
		<category><![CDATA[microcontroller]]></category>
		<category><![CDATA[Xilinx]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=339</guid>
		<description><![CDATA[Last week at the Embedded Systems Conference (ESC) held in San Jose, California, Xilinx disclosed additional information about its upcoming Extensible Processing Platform (EPP), which I previously discussed in a February 1 blog entry written just after RTECC (the Real &#8230; <a href="http://low-powerdesign.com/sleibson/2010/05/01/xilinx-redefines-the-high-end-microcontroller-with-its-extensible-processing-platform-%e2%80%93-part-1/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Last week at the Embedded Systems Conference (ESC) held in San Jose, California, Xilinx disclosed additional information about its upcoming Extensible Processing Platform (EPP), which I previously discussed in a February 1 blog entry written just after RTECC (the Real Time Embedded Computing Conference, see <a href="http://low-powerdesign.com/sleibson/2010/02/01/designing-low-power-systems-with-fpgas-part-2/" target="_blank">Designing Low-Power Systems with FPGAs, Part 2</a>). This past week at a press conference, Xilinx’s Senior VP of Worldwide Marketing and Business Development Vin Ratford again spoke of the upcoming processor-centric devices Xilinx plans to introduce next year, but this time he provided far more detail. As promised, the devices fuse features of a high-end microcontroller (hard-core implementations of a 32-bit processor, memory, and I/O) with an FPGA fabric. But wait, you say, haven’t both Xilinx and Altera (and other FPGA vendors) tried this before? Yes, they have, with uninspiring results. However, I submit that Xilinx’s EPP is substantially different and it stands a very good chance of capturing significant market share from microcontrollers and from discrete processors. It may also be very attractive to design teams considering the development of certain types of SOCs. Consequently, the Xilinx EPP family may well become the family of high-volume parts Xilinx wants to have in its product catalog. Ratford provided so much information in his ESC announcement that I’ll need multiple blog entries to cover it all. In this first entry, I’ll describe what Xilinx’s EPP is and I’ll cover some of the thinking behind the architecture; in the second entry, I’ll describe some case studies that illustrate why this component family might be very attractive for a certain class of embedded product, because it promises lower parts count, lower cost, and higher performance with lower power consumption. 
Please understand that Xilinx stopped short of announcing actual products. Ratford described an architecture that will be used to produce a product family with actual products starting to appear next year.</p>
<p> There are two major components to Xilinx’s EPP: a hard-wired, high-end, microcontroller-like block and a connected FPGA fabric based on Xilinx’s 28nm unified FPGA logic-cell design as shown in the diagram below.</p>
<p> </p>
<div id="attachment_340" class="wp-caption aligncenter" style="width: 530px"><img class="size-full wp-image-340" title="Xilinx EPP Block Diagram" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2010/05/Xilinx-EPP-Block-Diagram.jpg" alt="Xilinx EPP Block Diagram" width="520" height="297" /><p class="wp-caption-text">Xilinx EPP Block Diagram</p></div>
<p> </p>
<p> </p>
<p>First, let’s look at the hard-wired portion. It’s well known that processors don’t run very fast when implemented with FPGAs. The reason mostly revolves around the wiring congestion associated with the large register files of 32-bit RISC processors. Wiring congestion translates into “slow” and you can figure on giving up 50-75% or more of the processor’s maximum clock rate in a given process technology when comparing a synthesized ASIC implementation against a synthesized FPGA implementation. Hand optimization can reclaim some of that speed but if you’re planning on using a standard processor architecture anyway, it makes perfect sense to implement the processor on the FPGA as a hard core using a standard ASIC synthesis flow. That way, you get the full speed of the IC process technology along with the full logic density and therefore a much lower silicon cost.</p>
<p>Xilinx has chosen ARM’s Cortex-A9 32-bit RISC processor core for the EPP but has gone a step farther by implementing a dual-core version of this processor. That choice immediately puts the Xilinx EPP family at the high end of the microcontroller spectrum. First, there are two 32-bit processor cores. Second, a Cortex-A9 processor can run at 2 GHz in TSMC’s 40nm, high-performance process technology. That’s one fast processor, much faster than many embedded applications require. A dual-core version, as is employed in Xilinx’s EPP family, is faster still.</p>
<p>In choosing a standard processor core from ARM’s extremely successful stable of processors, Xilinx has plugged directly into a broad community of embedded software developers. In other words, choosing the widely used ARM architecture telegraphs Xilinx’s recognition that embedded software development is now the largest and most expensive part of any high-end embedded project. In many such projects, software developers often outnumber hardware developers by 10:1. In announcing the EPP, Xilinx shows that it fully recognizes the need to make the software development team happy first. The company’s selection of an ARM processor core also leverages the associated large and familiar development-tool set, the good selection of operating systems, and the extended ecosystem that goes with the ARM architecture’s large and growing market dominance in the embedded space. All of these factors make the ARM processor very attractive to embedded development teams.</p>
<p>To the dual-core ARM Cortex-A9 processor, Xilinx has added a number of hard-core peripherals including SRAM caches, timers, interrupt controllers, switches, memory controllers, and commonly used I/O peripherals certain to be useful for many high-end embedded designs. Because these additional blocks are all hard-core implementations, they too take little room on the chip and consume much less power than they’d need if implemented in an FPGA fabric. Note that the EPP chips will contain enough SRAM for caches and small scratchpads; however, bulk memory, generally implemented with DRAM, will be off-chip. Consequently, the EPP architecture includes hard-core DRAM controllers to manage off-chip memory. Ratford’s talk at ESC did not elaborate on the type of memory the on-chip controller can handle; however, DDR2, DDR3, or both would probably be a good guess, considering the high-end nature of the EPP family. The targeted applications will need a lot of memory, and DDR2 and DDR3 DRAM are now the best choices in terms of cost/bit.</p>
<p>Key to the software-friendly approach Xilinx is taking with the EPP, the architecture boots code upon power up just like a microcontroller. Only then is the FPGA fabric configured. This approach makes the EPP look very familiar to software developers who are not at all comfortable with writing code for a fluid, amorphous system that’s not well-defined when power comes up. The FPGA vendors spent a lot of money on reconfigurable architectures learning this lesson. In addition, HLL compilers don’t much care for undefined hardware either—undefined hardware just doesn’t fit the standard software-programming models. So the implementation of a complete, hard-wired microcontroller within the EPP cuts out a lot of that old unfamiliar strangeness associated with previous attempts to marry hard processor cores and FPGA fabrics.</p>
<p>Speaking of the FPGA fabric, Xilinx will be using the unified 28nm FPGA fabric in the EPP. Xilinx developed this fabric for its next-generation Spartan and Virtex FPGAs. (If you want more details about this FPGA fabric, take a look at the White Paper <a href="http://www.xilinx.com/support/documentation/white_papers/wp312_Next_Gen_28_nm_Overview.pdf " target="_blank">here</a>.) According to Ratford, Xilinx’s Virtex and Spartan FPGAs will both employ this fabric, which is the first time that Xilinx has used the same FPGA fabric for its high-performance and its low-cost FPGA product families. Using the same fabric for the two Xilinx FPGA product lines and for the EPP means that Xilinx need only develop one set of hardware-design tools for the 28nm node, and hardware designers need only learn one set of tools.</p>
<p>The EPP’s hard-core embedded microcontroller communicates with the on-chip FPGA fabric using ARM’s newly announced AMBA 4/AXI bus. Ratford said at RTECC and repeated at ESC that Xilinx worked with ARM to develop a version of this new bus specifically for FPGA use, but he has not provided details. The diagram of the EPP Ratford projected (reproduced above) shows multiple buses connecting the EPP’s hard-core embedded microcontroller and the on-chip FPGA fabric. Although Ratford provided no additional details, I plan to write a third blog entry discussing possible ways of optimally connecting the processor cores to the FPGA fabric. In the next installment of this blog, I’ll discuss some specific case studies Ratford covered in his ESC presentation that show how the EPP can reduce the parts count, cost, and power consumption of high-end embedded systems.</p>
<p>(You can find a White Paper describing the Xilinx EPP <a href="http://www.xilinx.com/support/documentation/white_papers/wp369_Extensible_Processing_Platform_Overview.pdf" target="_blank">here</a>.)</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2010/05/01/xilinx-redefines-the-high-end-microcontroller-with-its-extensible-processing-platform-%e2%80%93-part-1/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Tabula FPGA Scatters Logic, Memory, and Power Across Space and Time</title>
		<link>http://low-powerdesign.com/sleibson/2010/04/01/tabula-fpga-scatters-logic-memory-and-power-across-space-and-time/</link>
		<comments>http://low-powerdesign.com/sleibson/2010/04/01/tabula-fpga-scatters-logic-memory-and-power-across-space-and-time/#comments</comments>
		<pubDate>Thu, 01 Apr 2010 15:20:51 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[CMOS]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[FPGA]]></category>
		<category><![CDATA[Low-Power]]></category>
		<category><![CDATA[Tabula]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=337</guid>
		<description><![CDATA[Here’s a head-scratcher for you. Why not create tesseract FPGAs? A tesseract is the 4-dimensional version of a 3D cube. (Just as a 3D cube can be unfolded to make a set of six connected 2D squares, a tesseract can &#8230; <a href="http://low-powerdesign.com/sleibson/2010/04/01/tabula-fpga-scatters-logic-memory-and-power-across-space-and-time/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Here’s a head-scratcher for you. Why not create <a href="http://en.wikipedia.org/wiki/Tesseract" target="_blank">tesseract</a> FPGAs? A tesseract is the 4-dimensional version of a 3D cube. (Just as a 3D cube can be unfolded to make a set of six connected 2D squares, a tesseract can be unfolded into a set of eight connected 3D cubes.) I’ve loved the word ever since I learned it by reading Robert A. Heinlein&#8217;s classic science fiction short story from 1941 called “And He Built a Crooked House” in which an earthquake causes a house built in the unfolded 3D shape of a tesseract to fold into an actual 4D tesseract, trapping the unfortunate occupant inside. If you fold an FPGA into time, you can extrude some of the physical computational circuitry into elsewhen and reduce the amount of circuitry needed to implement your functions. And that is exactly what the new FPGA vendor <a href="http://www.tabula.com/" target="_blank">Tabula</a> has done. The company’s ABAX 3D FPGA architecture gets octuple duty from a LUT cell by fencing it in with eight sets of input/output latches and eight LUT configuration tables. Then, at 8x the “user” clock rate, the FPGA quickly reconfigures the LUT cell, runs part of a calculation, stores the partial result, and proceeds to the next step. The current FPGA design, just announced by Tabula, runs the user clock at 200 MHz and the “Spacetime” clock at 1.6 GHz. As a result, Tabula can offer really “large” FPGAs (in terms of logic cells) at really low prices compared to the big guys: Altera and Xilinx.</p>
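<p>For readers who think better in code, here is a minimal Python sketch of the reconfigure-compute-store loop described above. It is a toy model under loud assumptions, not Tabula’s actual design: the names <code>run_user_cycle</code>, <code>parity4</code>, and the schedule format are hypothetical, the LUT is shrunk to two inputs, and only the 8x fold count comes from the figures quoted here.</p>
<pre>
```python
# Toy model of ONE physical 2-input LUT reused across sub-cycles ("folds").
# Hypothetical illustration only; not Tabula's architecture or tool flow.

FOLDS = 8  # sub-cycles per user clock, matching the 8x figure in the text

# A LUT configuration is just a truth table indexed by (a * 2 + b).
XOR = (0, 1, 1, 0)


def run_user_cycle(inputs, schedule):
    """Evaluate one user clock cycle on a single time-multiplexed LUT.

    inputs:   dict mapping signal names to bits (0 or 1)
    schedule: list of (table, src_a, src_b, dest) steps, one per fold
    """
    if len(schedule) > FOLDS:
        raise ValueError("schedule needs more folds than the hardware has")
    latches = dict(inputs)  # stands in for the per-fold input/output latches
    for table, src_a, src_b, dest in schedule:
        a, b = latches[src_a], latches[src_b]
        # "Reconfiguring" the LUT means selecting a different truth table;
        # the partial result is stored for use in a later fold.
        latches[dest] = table[a * 2 + b]
    return latches


def parity4(bits):
    """4-bit parity: three XOR gates' worth of logic on one physical LUT."""
    inputs = {f"x{i}": bit for i, bit in enumerate(bits)}
    schedule = [
        (XOR, "x0", "x1", "t0"),
        (XOR, "x2", "x3", "t1"),
        (XOR, "t0", "t1", "p"),
    ]
    return run_user_cycle(inputs, schedule)["p"]
```
</pre>
<p>Note that this parity example fills only three of the eight available folds. One physical LUT does the work of three, not eight, because the logic between user-clock edges simply runs out.</p>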
<p>Now, to do this, you need some magic and you need to value logic-cell capacity over power consumption. First, the magic. Unless you’re going to retrain FPGA users to manually spread their designs across eight time slices, you need to make the 1.6GHz reconfiguration trick work in the background. Altera and Xilinx spent more than a decade trying to sell the idea of spreading designs across time using “on-the-fly reconfigurable logic” and most designers just never latched onto the idea. For some reason, engineers can understand software overlays and DLLs (dynamic-link libraries) but cannot come to grips with on-the-fly hardware reconfigurability. I think the issue is training more than anything else, but the big FPGA guys just couldn&#8217;t sell the idea broadly after trying for years. So there needs to be magic (or some appropriately advanced technology that looks like magic to most of us) to make this trick work.</p>
<p>And there is such magic in the form of an appropriate synthesis tool from Tabula that understands the extra-dimensional aspects of Tabula’s FPGA. The tool takes standard logic designs and “folds” them into time. However, like much of the magic in the Harry Potter book series, this magic isn’t perfect. You don’t necessarily get 8x the logic circuitry from a 1x FPGA. You get about 2.5x according to Tabula, depending on the design. And you get about 2.9x from the 8-ported, 1.6GHz memories on the chip, again, depending on the design. This gap between the real and the ideal reflects the difficulty in developing automated algorithms that can re-pipeline a datapath for additional stages. It’s an art, not a science, as any CPU/processor/microprocessor architect will tell you. You can’t always partition one datapath pipeline stage into eight because there just isn’t enough computation taking place in that pipeline stage to allow such expansion or re-pipelining. So, according to Tabula, the average LUT reuse is about 2.5x based on whatever test cases the company used to develop that number.</p>
<p>Now for the power-consumption ramifications. Tabula’s FPGAs reduce die area (in terms of LUTs and on-chip memories), and therefore silicon cost, at the expense of power consumption. Running most of the on-chip circuitry at 1.6GHz while delivering the performance of a 200MHz FPGA must cost additional power. In the real world of chip design, power scales linearly with area but superlinearly with frequency, largely due to voltage-rail considerations. You need more voltage to operate at higher clock rates. There’s also leakage to contend with, because the transistor thresholds must be set low enough to operate at 1.6GHz. So it’s bound to be a bad tradeoff in terms of power. (I don’t actually know this because it doesn’t seem that Tabula’s been forthcoming about power numbers, but some physics just can’t be bypassed as long as you’re still using off-the-shelf CMOS.)</p>
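<p>Here is a back-of-the-envelope version of that argument in Python. It leans on the textbook first-order relation for CMOS dynamic power (power proportional to switched capacitance times voltage squared times frequency), uses die area as a stand-in for capacitance, and ignores leakage and real-world switching activity entirely. The function names and the voltage values are my own illustrative assumptions; only the 8x clock ratio and the 2.5x reuse figure come from the numbers above. Tabula has published no power data that I know of, so treat the result as a shape, not a measurement.</p>
<pre>
```python
# Rough comparison of a time-folded LUT against the "flat" logic it replaces.
# Assumption: dynamic power ~ activity * capacitance * V^2 * f, with die area
# standing in for capacitance. Leakage is ignored. All values are relative.


def dynamic_power(area, freq, volts, activity=1.0):
    """Relative dynamic power under the first-order CMOS model."""
    return activity * area * volts ** 2 * freq


def folded_vs_flat(reuse=2.5, speedup=8.0, v_flat=1.0, v_folded=1.0):
    """Ratio of folded-LUT power to the power of the flat logic it replaces.

    reuse:    how many conventional LUTs one folded LUT stands in for
    speedup:  internal clock rate relative to the user clock
    v_flat, v_folded: hypothetical relative supply voltages
    """
    flat = dynamic_power(area=reuse, freq=1.0, volts=v_flat)
    folded = dynamic_power(area=1.0, freq=speedup, volts=v_folded)
    return folded / flat
```
</pre>
<p>With equal supply voltages, <code>folded_vs_flat()</code> works out to 8 / 2.5 = 3.2, so the folded logic burns roughly three times the dynamic power of the flat logic it replaces in this simple model. Any extra voltage needed to hit the faster clock, for example <code>folded_vs_flat(v_folded=1.1)</code>, pushes the ratio higher because voltage enters as a square.</p>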
<p>It’s true that you can sacrifice half of the virtualized Spacetime LUTs and get 400MHz or some other combination, but folks, it’s a 1.6GHz device. It’s not designed for low power. Design tradeoffs obviously favored device cost, which you can see in the low, blink-inducing prices for the devices. Those prices are indeed mighty attractive for such high logic capacities. However, just about everyone’s worried about power these days, even people designing equipment for those power-sucking data centers that are cooled by diverting nearby rivers through the equipment racks. Every watt of operating power supplied to the equipment requires an additional watt for cooling (roughly speaking). A megawatt here, a megawatt there, and pretty soon you’re talking about some real energy consumption. And some real energy costs, which is what truly gets the attention of the data-center managers and owners.</p>
<p>I’ve heard about the Tabula announcements from several sources starting with a morning-of article in the San Jose Mercury News. One of the best technical write-ups I’ve seen so far is this <a href="http://www.fpgajournal.com/fpgajournal/feature_articles/20100323-tabula/" target="_blank">article</a> by Kevin Morris from FPGA Journal. Online comments to Morris’ article suggest that there’s a lot of skepticism in the design community with respect to this new FPGA technology. As with any new technology, even a tesseract FPGA, time will tell if the market accepts this idea or if it will end up on the shelf next to the long-dead and now-dusty remains of reconfigurable logic.</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2010/04/01/tabula-fpga-scatters-logic-memory-and-power-across-space-and-time/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Designing Low-Power Systems with FPGAs, Part 2</title>
		<link>http://low-powerdesign.com/sleibson/2010/02/01/designing-low-power-systems-with-fpgas-part-2/</link>
		<comments>http://low-powerdesign.com/sleibson/2010/02/01/designing-low-power-systems-with-fpgas-part-2/#comments</comments>
		<pubDate>Mon, 01 Feb 2010 17:34:51 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[Design]]></category>
		<category><![CDATA[FPGA]]></category>
		<category><![CDATA[SOC]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=298</guid>
		<description><![CDATA[Literally within an hour of posting my last blog entry on designing low-power systems with FPGAs, Altera’s marketing engine issued a related email and dropped it into my inbox. Altera’s email pre-announces the company’s upcoming FPGAs based on 28nm lithography. &#8230; <a href="http://low-powerdesign.com/sleibson/2010/02/01/designing-low-power-systems-with-fpgas-part-2/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Literally within an hour of posting my last blog entry on designing low-power systems with FPGAs, Altera’s marketing engine issued a <a href="http://www.altera.com/b/innovating-at-28-nm.html?contactID=104156938&amp;gwkey=XKSYVJX39T" target="_blank">related email</a> and dropped it into my inbox. Altera’s email pre-announces the company’s upcoming FPGAs based on 28nm lithography. The email included the following marketing graph (with no scale) to explain the advantages of the smaller geometries for FPGA manufacture.</p>
<p><img class="aligncenter size-full wp-image-299" title="Altera 28nm devices" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2010/02/Altera-28nm-devices.jpg" alt="Altera 28nm devices" width="492" height="300" /></p>
<p>The first set of bars in the graph sets the baseline using Altera’s 40nm devices as a reference. The next set of bars shows that the feature shrink alone improves FPGA gate density by 25% and cuts power consumption by about 12.5%. (Note: That’s my eyeball talking, not Altera’s official numbers.)</p>
<p>The next set of bars shows what happens incrementally when Altera takes some major logic blocks and hard-codes them. Suddenly, gate density doubles and power consumption drops by 40% compared to a 40nm FPGA.</p>
<p>The last set of bars shows what happens when you combine the lithography shrink and hard-coded IP. Suddenly you’re getting 4x the gate density at a mere 25% of the power consumption compared to 40nm devices. (Note: I’m not sure what suddenly happened to the transceiver count, that third bar in the group, which had been constant until everything got combined in the last set. My guess is that the marketing artist who drew the graph got overzealous, cut everything 75% for visual consistency, and the proofreaders missed it. I think the number of transceivers is supposed to stay constant, based on the first three sets of bars in the graph.)</p>
<p>Two things to note here. First, you get a lot of bang out of hard-coded IP. Coincidentally, MIPS announced that Altera had licensed the MIPS32 architecture back in October, 2008 but Altera was mum on the subject back then. RISC processor cores make lousy targets for programmable FPGA fabrics, largely because of the routing congestion around their large register files, so processor core IP is one of the IP types that really should be hard-coded onto an FPGA. Although neither Altera nor Xilinx had much success with their first-generation FPGAs that incorporated hard-coded processor cores, that doesn’t mean they’re not going to try again and the MIPS announcement late last year telegraphed that move.</p>
<p>Want more proof? Last week at the Real Time Embedded Computing Conference held in Santa Clara, California, Xilinx’s Senior VP of Worldwide Marketing and Business Development Vin Ratford did more than telegraph his company’s intent to put processor cores back into FPGAs. He announced and elaborated on that intent. Xilinx will be adopting the ARM architecture and an FPGA-friendly version of ARM’s AMBA interconnect in future FPGA generations.</p>
<p>Make no mistake. Processors are coming to FPGAs for several reasons. First, a RISC processor core consumes between 25,000 and 50,000 gates. You can drop one of those puppies into an FPGA fabric and never see it. In essence, those transistors are “free.” That’s the nature of an FPGA’s programmable interconnect. Logic just sort of disappears.</p>
<p>Second, you can’t build a system without at least one processor these days. Which immediately leads to the third reason. If Xilinx and Altera truly wish to convert their “We’re taking over everything” or “All your chips are belong to us” attitudes into reality, then the processor will just have to live on the FPGA silicon. Otherwise, the FPGA companies don’t get all of the chips. It’s as simple as that.</p>
<p>However, as both Altera and Xilinx discovered last time they tried this, dropping a processor core into an FPGA and making it usable is not just a matter of burying some gates into the FPGA fabric. Effective ways of connecting the processor to the programmable FPGA fabric must also exist and the software developers—who represent more than 90% of modern embedded development teams—must also be happy with the integration. You only make them happy with good development, profiling, and debugging tools.</p>
<p>And there’s the rub.</p>
<p>(It’s possible that Shakespeare’s Hamlet was indeed an embedded systems developer.)</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2010/02/01/designing-low-power-systems-with-fpgas-part-2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>


