<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Steve Leibson &#187; CMOS</title>
	<atom:link href="http://low-powerdesign.com/sleibson/index.php/category/cmos/feed/" rel="self" type="application/rss+xml" />
	<link>http://low-powerdesign.com/sleibson</link>
	<description>Leibson's Laws and the Penalties for Breaking Them</description>
	<lastBuildDate>Wed, 01 Feb 2012 00:01:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>What if 2.5D got really cheap? How would that affect low-power design?</title>
		<link>http://low-powerdesign.com/sleibson/2011/11/17/what-if-2-5d-got-really-cheap-how-would-that-affect-low-power-design/</link>
		<comments>http://low-powerdesign.com/sleibson/2011/11/17/what-if-2-5d-got-really-cheap-how-would-that-affect-low-power-design/#comments</comments>
		<pubDate>Thu, 17 Nov 2011 18:09:37 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[2.5D]]></category>
		<category><![CDATA[CMOS]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[Flash]]></category>
		<category><![CDATA[Low-Power]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=727</guid>
		<description><![CDATA[Last week, silicon-interposer foundry Deca Technologies unstealthed. I found out from an article in the San Jose Mercury News and just published a blog about the announcement in my other blog, the EDA360 Insider. Deca is a subsidiary of Cypress &#8230; <a href="http://low-powerdesign.com/sleibson/2011/11/17/what-if-2-5d-got-really-cheap-how-would-that-affect-low-power-design/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Last week, silicon-interposer foundry Deca Technologies unstealthed. I found out from an <a href="http://www.mercurynews.com/business/ci_19297216" target="_blank">article</a> in the San Jose Mercury News and just published a <a href="http://eda360insider.wordpress.com/2011/11/17/is-cypress-subsidiary-deca-technologies-onto-2-5d-packaging-in-a-big-way/" target="_blank">blog</a> about the announcement in my other blog, the <strong>EDA360 Insider</strong>. Deca is a subsidiary of Cypress Semiconductor and the outspoken President and CEO of Cypress, TJ Rodgers, was good for a quote, as always:</p>
<p>“We want to use the dense, reliable silicon interconnect inherent in Moore’s Law to integrate the dissimilar chips used in today’s systems, but we face an economic barrier because the interconnect on silicon chips is 1,000 times more expensive than the interconnect on PC boards.</p>
<p>“We could enable a new silicon-based interconnect paradigm if we could make silicon interconnect wafers for $10, just what silicon solar wafers cost today. The problem of mapping solar technology onto Moore’s Law is straightforward, but difficult, and we believe DecaTech has the answer.”</p>
<p>Now don’t take that $10 per interposer wafer to the bank. I get the impression that’s a long-term goal, not a short-term pricing roadmap. However, even a 10x drop in interposer costs will have a big influence on the future of 2.5D assembly technology.</p>
<p>And why should we as low-power systems designers care? Because interconnect is expensive and because interconnect now largely determines system performance. First, think about expense. Let your mind go back 40 years (if it can) to the birth of the microprocessor, which we celebrate this month. In the 1970s, microprocessor interconnect meant a bus. Not on a board but in a system. One of the most successful early microprocessor buses was the S-100 bus. It was named for the 100-pin edge connectors and the 100-conductor bus used to interconnect system boards in the original Altair 8800 microcomputer, introduced in 1975. The bus was subsequently adopted by several microcomputer vendors including Imsai, Vector Graphic, North Star Computers (formerly known as Kentucky Fried Computers), and Processor Technology, and by board vendors including Godbout Electronics/Compupro and Morrow Micro-Stuff/ThinkerToys.</p>
<p>Back then, due to the nascent state of semiconductor integration, you would need to have a processor board, one or, more likely, several memory boards, a video board, and one or more I/O boards. A major system expense was just the half dozen or so 100-pin edge connectors and the simple but large circuit board that implemented the bus. The S-100 connectors were expensive and you needed a lot of energy to drive the bus lines because they were physically large. As bus speeds increased, the lines also required resistive termination to prevent ringing, and you needed even more energy to drive the termination resistors.</p>
<p>By 1981 when the IBM PC appeared, things were getting somewhat better. We still had half a dozen edge connectors but we were down to 62 edge-connector pins (for 8-bit systems). Add another 36 pins when we jumped to 16 bits and we found ourselves right back at close to a 100-pin bus. So much for progress.</p>
<p>For board-to-board interconnect, things are going serial (think of the PCI evolution to PCIe) but chip-to-chip interconnect on a board is still largely parallel, with lots of pins on a chip looking to connect to lots of pins on other chips. There is still impedance in those PCB traces and you still need relatively big drivers that consume significant amounts of energy to drive those traces. Hence a movement to go serial for chip-to-chip interconnect on a board, an extension of the migration of buses to serialized versions.</p>
<p>High-speed serial buses incur their own costs. There’s the energy cost of driving even a few wires at multi-GHz speeds and there’s the performance hit in the form of latency increase that you get when you serialize and then deserialize a data stream. It’s not all wine and roses.</p>
<p>So we try to put as much as we can on one IC, but that’s not an ideal solution either. Not all IC processing is the same and chips with different functions are really better off being fabricated with different IC fabrication processes. For example, NAND Flash and DRAM processes push as far down the Moore’s Law curve as they can get, as fast as they can, to boost density and cut cost per bit. CMOS logic processes are right behind the memory processes but use more layers of on-chip metal interconnect because they require more random connectivity than memories. Analog ICs typically operate at higher supply voltages and they’re nowhere near the leading/bleeding edge of IC process technology. It’s not economical to put all of these different functions on one die, and so we see renewed interest in multichip modules, known by the 21<sup>st</sup>-century name: 2.5D IC assembly using silicon interposers.</p>
<p>2.5D assembly using bare semiconductor die attached to small interposers instead of big circuit boards significantly changes the parallel/serial I/O equation. Suddenly, you don’t need such big I/O drivers on the chip because there’s no wire bond, no IC package interconnect, and significantly shorter traces to drive. Suddenly, massively parallel I/O consumes only a fraction of the energy it previously needed, and the breakeven point between parallel I/O and serialized, high-speed I/O shifts to favor parallel I/O more and serial I/O less.</p>
<p>And when major changes like that happen to such equations, the way we design systems also changes.</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2011/11/17/what-if-2-5d-got-really-cheap-how-would-that-affect-low-power-design/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Think Globally, Act in Parallel. What can you do with one million ARM cores acting in parallel and how do you get there?</title>
		<link>http://low-powerdesign.com/sleibson/2011/07/16/think-globally-act-in-parallel-what-can-you-do-with-one-million-arm-cores-acting-in-parallel-and-how-do-you-get-there/</link>
		<comments>http://low-powerdesign.com/sleibson/2011/07/16/think-globally-act-in-parallel-what-can-you-do-with-one-million-arm-cores-acting-in-parallel-and-how-do-you-get-there/#comments</comments>
		<pubDate>Sat, 16 Jul 2011 23:47:06 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[ARM]]></category>
		<category><![CDATA[CMOS]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[DRAM]]></category>
		<category><![CDATA[Low-Power]]></category>
		<category><![CDATA[Networking]]></category>
		<category><![CDATA[SDRAM]]></category>
		<category><![CDATA[SOC]]></category>
		<category><![CDATA[SRAM]]></category>
		<category><![CDATA[Cortex-M0]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[Samsung]]></category>
		<category><![CDATA[SpiNNaker]]></category>
		<category><![CDATA[UMC]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=615</guid>
		<description><![CDATA[Professor Steve Furber’s SpiNNaker project is in the news again. I wrote about Furber’s massively parallel brain-emulation project back on March 30 after listening to his keynote at this year’s DATE (Design Automation and Test Europe) conference in Grenoble, France. &#8230; <a href="http://low-powerdesign.com/sleibson/2011/07/16/think-globally-act-in-parallel-what-can-you-do-with-one-million-arm-cores-acting-in-parallel-and-how-do-you-get-there/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Professor Steve Furber’s SpiNNaker project is in the news again. I wrote about Furber’s massively parallel brain-emulation project back on March 30 after listening to his keynote at this year’s DATE (Design Automation and Test Europe) conference in Grenoble, France. (See “<a href="http://low-powerdesign.com/sleibson/2011/03/30/the-incredible-vanishing-power-of-a-machine-instruction-is-this-the-way-to-the-brain/" target="_blank">The incredible vanishing power of a machine instruction. Is this the way to the brain?</a>”) Furber’s DATE keynote title says it all: “Biologically-inspired massively-parallel architectures—computing beyond a million processors.” Furber and his team are referencing nature to help them tackle the really hard processing problems we need to solve in the future through massively parallel, brain-like computing. Brain-like computing—go slow, go wide, go massively parallel—seems to offer a proven, low-power approach to solving some of these big computational problems.</p>
<p>The SpiNNaker project is again in the news at EETimes Europe (see “<a href="http://www.electronics-eetimes.com/en/a-million-arm-cores-to-host-brain-simulator.html?cmp_id=7&amp;news_id=222908354&amp;vID=209" target="_blank">A million ARM cores to host brain simulator</a>”) and the idea of harnessing one million ARM processor cores is certainly a big idea. It excites me. However, we’re still at the humble beginnings of the project.</p>
<p>The SpiNNaker project’s first test chip harnesses 18 ARM9 cores on one 130nm chip manufactured by UMC in Taiwan. This is a 100M-transistor chip and, like most many-processor SoCs, the SpiNNaker SoC mostly consists of memory. The memory needs to be close to the processors for speed and for low-power consumption and there are 55 32Kbyte SRAM blocks on the SpiNNaker die. That’s 14 million bits of SRAM and, frankly speaking, that’s really not very much SRAM. Eighteen processors isn’t really a large number of processors either when your stated goal is one million.</p>
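<p>The SRAM figure above is easy to check. Here is a minimal Python sketch of the arithmetic, assuming that “32Kbyte” means 32 × 1024 bytes; the constant and function names are made up for illustration and are not part of the SpiNNaker project.</p>

```python
# Sketch of the on-chip SRAM arithmetic quoted in the post.
# Assumption: "32Kbyte" is a binary kilobyte count (32 * 1024 bytes).
SRAM_BLOCKS = 55             # SRAM blocks on the SpiNNaker test chip (from the post)
BYTES_PER_BLOCK = 32 * 1024  # 32 Kbytes per block (from the post)
BITS_PER_BYTE = 8


def total_sram_bits(blocks=SRAM_BLOCKS, bytes_per_block=BYTES_PER_BLOCK):
    """Return the total number of SRAM bits across all blocks."""
    return blocks * bytes_per_block * BITS_PER_BYTE


if __name__ == "__main__":
    bits = total_sram_bits()
    print(f"{bits:,} bits of SRAM (about {bits / 1e6:.1f} million)")
```

<p>Under that assumption the total comes to 14,417,920 bits, which matches the “14 million bits” quoted above.</p>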
<p>The ARM processors on the SpiNNaker chip use packet communications to emulate the electrical spike communications that occur among the neurons in human and animal brains. From a hardware perspective, I think it’s easy to conceive of a system-level design like this and even conceptually scaling the design to a million connected ARM9 processors isn’t really hard, as long as you don’t try to enumerate all of the processors in your mind. However, with 18 processors per chip, you’ll need approximately 55,600 chips to build an interconnected network of one million processors. That’s still a mighty big box of hardware. More on that in a bit.</p>
<p>The rub is that we really don’t have many good ideas for programming such a massively parallel system. The SpiNNaker project seems to be mostly a hardware endeavor with the explicitly stated intent of developing a hardware testbed for brain researchers who will use SpiNNaker systems for studying various theories of brain function. Presumably, we’ll learn more about massively parallel programming by working with these systems and no doubt we will. As Furber says in a quote published in the EETimes Europe article, “We don&#8217;t know how the brain works as an information-processing system, and we do need to find out. We hope that our machine will enable significant progress towards achieving this understanding.&#8221;</p>
<p>Each SpiNNaker chip in the current design is bundled with a 166MHz, 1Gbit DDR SDRAM and packaged in a 300-pin BGA package. But we’re not going to be building million-processor testbeds with 18 processors per packaged chip. I’m almost absolutely, positively certain about that. This first SpiNNaker prototype just doesn’t scale to one million processors very easily. So the question is, how to get there?</p>
<p>Well, possible clues to answer that question can be found in two recent blogs that I wrote on the <strong>EDA360 Insider</strong> blog. First, Samsung has just announced successful tapeout of a 20nm test chip incorporating an ARM Cortex-M0 processor core. (See “<a href="http://eda360insider.wordpress.com/2011/07/12/samsung-20nm-test-chip-includes-arm-cortex-m0-processor-core-how-many-will-fit-on-the-head-of-a-pin/" target="_blank">Samsung 20nm test chip includes ARM Cortex-M0 processor core. How many will fit on the head of a pin?</a>”) Now an ARM Cortex-M0 processor is not as powerful as an ARM9 processor, but then it’s not supposed to be. It’s designed for control-oriented applications and its 3-stage execution pipeline isn’t designed to get maximum speed from any given process technology. However, we’re building a system that emulates a brain that operates at a few hundred Hertz (that’s <strong>Hertz</strong>, not kilohertz, megahertz, or gigahertz) so I really don’t think the clock speed is all that critical when you’re talking about a million processors. The ARM Cortex-M0 processor core is still a 32-bit RISC processor and I am guessing with a high degree of confidence that it’s fully up to the task of executing the required electrical-spike calculations, albeit not quite as quickly as an ARM9 processor.</p>
<p>What’s interesting about a 12-to-14Kgate ARM Cortex-M0 processor implemented in 20nm process technology is that my calculations suggest that more than half a million ARM Cortex-M0 processors would fit on a chip the size of an Intel “Tukwila” Itanium processor (OK, that’s a big chip, but it’s a commercial one) and that calculation is based on the published number for the area required by an ARM Cortex-M0 implemented in 90nm process technology, not 20nm. Now there’s a lot of slop in this calculation. First, there’s the disparity of using 90nm numbers instead of 20nm numbers. Then there’s the disparity caused by putting no memory at all into the calculation. I just mentally tiled processors edge to edge. Ditto, there’s no on-chip interconnect.</p>
<p>So you probably won’t get half a million ARM Cortex-M0 processor cores on one 20nm chip. But you might get 100,000 or 200,000 ARM Cortex-M0 processor cores on a chip along with an interesting amount of memory and the required interconnect. Now we’re talking about only a handful of chips to get to one million processors. We’re talking about a tabletop box. Now we’re getting into the realm of the feasible for million-processor systems.</p>
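<p>To put rough numbers on that, here is a small Python sketch that computes how many chips a million-processor system would need at the core counts mentioned in this post. It only divides and rounds up; the function name is hypothetical and it ignores spares, yield, and everything else a real system would need.</p>

```python
# Back-of-the-envelope chip counts for a one-million-processor system.
# The cores-per-chip values are the ones discussed in the post:
# 18 (current SpiNNaker test chip) and the speculative 100,000 and 200,000.
TARGET_CORES = 1_000_000


def chips_needed(cores_per_chip, target=TARGET_CORES):
    """Return the smallest whole number of chips that reaches the target."""
    if cores_per_chip <= 0:
        raise ValueError("cores_per_chip must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (target + cores_per_chip - 1) // cores_per_chip


if __name__ == "__main__":
    for cores in (18, 100_000, 200_000):
        print(f"{cores:>7,} cores per chip: {chips_needed(cores):>6,} chips")
```

<p>At 18 cores per chip the answer is 55,556 chips, in line with the “approximately 55,600” figure above; at 100,000 or 200,000 cores per chip it drops to 10 or 5.</p>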
<p>The second related blog entry I recently wrote in <strong>EDA360 Insider</strong> that also bears on this very interesting endeavor was about an announcement from Imec, a global research company. Just days ago, Imec announced that it and its partners successfully assembled a custom logic chip with two DRAMs in a stacked 3D configuration. (See “<a href="http://eda360insider.wordpress.com/2011/07/14/3d-thursday-imec-prototypes-3d-chip-stack-finds-some-thermal-surprises/" target="_blank">3D Thursday: IMEC prototypes 3D chip stack, finds some thermal surprises</a>”.) This 3D stacked-chip prototype allowed Imec to test out some process ideas for manufacturing 3D stacked chip assemblies and to make some critical thermal tests to verify thermal models that will be so necessary when 3D assembly goes mass market. The 3D chip stack uses copper-tin micro-bumps and compression bonding for the electrical and mechanical assembly of the chip stack and you can see photos of the assembled stack below.</p>
<p>Here’s a photo of the overall chip stack:</p>
<p><a href="http://low-powerdesign.com/sleibson/wp-content/uploads/2011/07/Imec-3D-Chip.bmp"><img class="aligncenter size-full wp-image-616" title="Imec 3D Chip" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2011/07/Imec-3D-Chip.bmp" alt="" /></a></p>
<p>And here’s a close-up of the edge of the chip stack to show the three stacked die.</p>
<p><a href="http://low-powerdesign.com/sleibson/wp-content/uploads/2011/07/Imec-3D-Chip-Closeup.bmp"><img class="aligncenter size-full wp-image-617" title="Imec 3D Chip Closeup" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2011/07/Imec-3D-Chip-Closeup.bmp" alt="" /></a></p>
<p>The 3D stack’s base chip is approximately 750µm thick. The two top components in the chip stack are each 25µm thick. There’s more technical info in the referenced <strong>EDA360 Insider</strong> blog.</p>
<p>I am convinced that 3D stacking of logic and RAM chips will be absolutely essential to developing massively parallel, low-power systems like the ones envisioned by the SpiNNaker project. First, the only way to feed data and instructions to massively parallel processing chips is through large amounts of on-chip memory and through high-bandwidth, low-energy channels connected to large off-chip memories. 3D assembly techniques permit both Wide I/O and high-speed serial I/O channels to work most effectively and at minimal energy levels and I expect to see rapid adoption of 3D assembly—even and perhaps especially in high-volume, cost-sensitive applications such as mobile phone handsets—in the next few years. This is precisely the sort of manufacturing technology we require to think seriously about million-processor systems.</p>
<p>Now all we need to do is figure out how to program them.</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2011/07/16/think-globally-act-in-parallel-what-can-you-do-with-one-million-arm-cores-acting-in-parallel-and-how-do-you-get-there/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Need to cut IP power? (Who doesn’t?) “Press here” says Calypto</title>
		<link>http://low-powerdesign.com/sleibson/2011/06/12/need-to-cut-ip-power-who-doesn%e2%80%99t-%e2%80%9cpress-here%e2%80%9d-says-calypto/</link>
		<comments>http://low-powerdesign.com/sleibson/2011/06/12/need-to-cut-ip-power-who-doesn%e2%80%99t-%e2%80%9cpress-here%e2%80%9d-says-calypto/#comments</comments>
		<pubDate>Sun, 12 Jun 2011 20:29:33 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[Clock Gating]]></category>
		<category><![CDATA[CMOS]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[Low-Power]]></category>
		<category><![CDATA[SRAM]]></category>
		<category><![CDATA[Calypto]]></category>
		<category><![CDATA[PowerPro]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=599</guid>
		<description><![CDATA[All SoCs are built with IP blocks. Some of those are legacy IP blocks. Some are purchased from other vendors. Some are developed in-house. All of them draw power—static and dynamic power. At nanometer lithographies, the way to cut static &#8230; <a href="http://low-powerdesign.com/sleibson/2011/06/12/need-to-cut-ip-power-who-doesn%e2%80%99t-%e2%80%9cpress-here%e2%80%9d-says-calypto/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>All SoCs are built with IP blocks. Some of those are legacy IP blocks. Some are purchased from other vendors. Some are developed in-house. All of them draw power—static and dynamic power. At nanometer lithographies, the way to cut static power is through circuit tricks like high-Vt transistors and by powering down entire blocks when not needed. The way to cut dynamic power within an IP block is to stop clocking anything that doesn’t need to be clocked. Designers can gate clocks during the development of an IP design but what about existing IP blocks? Some can be retrofitted with clock gating but the ease of that exercise depends on how familiar the IP designer is with that IP block and how well documented the block is.</p>
<p> </p>
<p>Face it, <span style="text-decoration: line-through;">some</span> most IP blocks aren’t that well documented. You may never know enough about the internals of a purchased IP block to fiddle with its clocking. Legacy IP blocks may have been long abandoned by their designers who have gone off to other tasks, other companies, other planes of existence. Even a block you’ve designed yourself may have scrolled off your own internal memory window long ago.</p>
<p> </p>
<p>Designers everywhere have a common solution for these sorts of problems. “Give me a tool to do this” they demand from EDA vendors. “I just want to push the button.”</p>
<p> </p>
<p>Usually, that’s easier said than done. Calypto’s got a tool you can try, however. It’s called PowerPro and comes in two flavors: CG and MG. The CG flavor is based on the company’s SLEC sequential logic equivalency checker. That’s a tool that checks to see if modified IP block “A prime” works the same as original IP block “A.” It’s a general-purpose EDA tool with a variety of uses and one of those uses is for comparing an IP block’s function before and after clock gating.</p>
<p> </p>
<p>Calypto’s PowerPro CG encapsulates the SLEC EDA tool to produce a “done for you” tool that can automatically insert clock gating into an IP design. It also checks to make sure the IP block’s behavior doesn’t change as a result of the added clock gating. The insertion process usually takes 4 to 8 hours, according to Calypto CEO Doug Aitelli, who spoke to me about the product at DAC 2011 in San Diego. What do you get for this overnight run? Usually a 10% to 30% reduction in dynamic power, said Aitelli. Sometimes as much as 60%. Not bad for “pushing the button,” I’d say.</p>
<p> </p>
<p>There’s another flavor of PowerPro called PowerPro MG. Nope, not named for a cute little British sports car, “MG” stands for “memory gating.” We tend to forget that today’s SoCs are more than half memory measured by die area. Usually SRAM. We sort of allude to this fact when we talk about MPSoCs—multiple processor SoCs. With each of those processors comes a boatload of on-chip SRAM for fast execution. However, we don’t seem to explicitly call out the memory. We tend to ignore it. I guess MMSoC—mostly memory SoC—doesn’t have the same cachet as MPSoC in our processor-centric world.</p>
<p> </p>
<p>However, if more than half of an SoC is SRAM, it makes sense to pay some attention to reducing the power consumption of an SoC’s on-chip SRAM blocks. That’s what Calypto’s PowerPro MG does. It can automatically add memory gating to an SoC design by evaluating the design’s behavior across many cycles.</p>
<p> </p>
<p>It also goes a step further. Many SRAM blocks for SoCs now have a sleep mode where the memory’s operating power can be reduced by shutting down peripheral circuitry such as address decoders and sense amps while keeping the memory storage array alive. According to Calypto’s Aitelli, most SoC designers find these sleep modes too hard to use, so they simply don’t use them. They don’t have the time. But those sleep modes are still there just waiting to be used. PowerPro MG will add the necessary sleep/wake-up state machine to exploit this little-used memory feature. Push the button, save power.</p>
<p> </p>
<p>Just a story from a chance meeting at DAC. Par for the course. There’s always something new to learn, something new to try.</p>
<p> </p>
<p>To read my blog on the Low-Power Report Card Panel at DAC, click <a href="http://j.mp/kwX9HI" target="_blank">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2011/06/12/need-to-cut-ip-power-who-doesn%e2%80%99t-%e2%80%9cpress-here%e2%80%9d-says-calypto/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tabula FPGA Scatters Logic, Memory, and Power Across Space and Time</title>
		<link>http://low-powerdesign.com/sleibson/2010/04/01/tabula-fpga-scatters-logic-memory-and-power-across-space-and-time/</link>
		<comments>http://low-powerdesign.com/sleibson/2010/04/01/tabula-fpga-scatters-logic-memory-and-power-across-space-and-time/#comments</comments>
		<pubDate>Thu, 01 Apr 2010 15:20:51 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[CMOS]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[FPGA]]></category>
		<category><![CDATA[Low-Power]]></category>
		<category><![CDATA[Tabula]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=337</guid>
		<description><![CDATA[Here’s a head-scratcher for you. Why not create tesseract FPGAs? A tesseract is the 4-dimensional version of a 3D cube. (Just as a 3D cube can be unfolded to make a set of six connected 2D squares, a tesseract can &#8230; <a href="http://low-powerdesign.com/sleibson/2010/04/01/tabula-fpga-scatters-logic-memory-and-power-across-space-and-time/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Here’s a head-scratcher for you. Why not create <a href="http://en.wikipedia.org/wiki/Tesseract" target="_blank">tesseract</a> FPGAs? A tesseract is the 4-dimensional version of a 3D cube. (Just as a 3D cube can be unfolded to make a set of six connected 2D squares, a tesseract can be unfolded into a set of eight connected 3D cubes.) I’ve loved the word ever since I learned it by reading Robert A. Heinlein&#8217;s classic science fiction short story from 1941 called “And He Built a Crooked House” in which an earthquake causes a house built in the unfolded 3D shape of a tesseract to fold into an actual 4D tesseract, trapping the unfortunate occupant inside.</p>
<p>If you fold an FPGA into time, you can extrude some of the physical computational circuitry into elsewhen and reduce the amount of circuitry needed to implement your functions. And that is exactly what the new FPGA vendor <a href="http://www.tabula.com/" target="_blank">Tabula</a> has done. The company’s ABAX 3D FPGA architecture gets octuple duty from a LUT cell by fencing it in with eight sets of input/output latches and eight LUT configuration tables. Then, at 8x the “user” clock rate, the FPGA quickly reconfigures the LUT cell, runs part of a calculation, stores the partial result, and proceeds to the next step. The current FPGA design, just announced by Tabula, runs the user clock at 200 MHz and the “Spacetime” clock at 1.6 GHz. As a result, Tabula can offer really “large” FPGAs (in terms of logic cells) at really low prices compared to the big guys: Altera and Xilinx.</p>
<p>Now to do this, you need some magic and you need to value logic-cell capacity over power consumption. First, the magic. Unless you’re going to retrain FPGA users to manually spread their designs across eight time slices, you need to make the 1.6GHz reconfiguration trick work in the background. Altera and Xilinx spent more than a decade trying to sell the idea of spreading designs across time using “on-the-fly reconfigurable logic” and most designers just never latched onto the idea. For some reason, engineers can understand software overlays and DLLs (dynamic-linked libraries) but cannot come to grips with on-the-fly hardware reconfigurability. I think the issue is training more than anything else, but the big FPGA guys just couldn&#8217;t sell the idea broadly after trying for years. So there needs to be magic—or some appropriately advanced technology that looks like magic to most of us—to make this trick work.</p>
<p>And there is such magic in the form of an appropriate synthesis tool from Tabula that understands the extra-dimensional aspects of Tabula’s FPGA. The tool takes standard logic designs and “folds” them into time. However, like much of the magic in the Harry Potter book series, this magic isn’t perfect. You don’t necessarily get 8x the logic circuitry from a 1x FPGA. You get about 2.5x according to Tabula, depending on the design. And you get about 2.9x from the 8-ported, 1.6GHz memories on the chip, again, depending on the design. This gap between the real and the ideal reflects the difficulty in developing automated algorithms that can re-pipeline a datapath for additional stages. It’s an art, not a science, as any CPU/processor/microprocessor architect will tell you. You can’t always partition one datapath pipeline stage into eight because there just isn’t enough computation taking place in that pipeline stage to allow such expansion or re-pipelining. So, according to Tabula, the average LUT reuse is about 2.5x based on whatever test cases the company used to develop that number.</p>
<p>Now for the power-consumption ramifications. Tabula’s FPGAs reduce die area (in terms of LUTs and on-chip memories), and therefore silicon cost, at the expense of power consumption. Running most of the on-chip circuitry at 1.6GHz while delivering the performance of a 200MHz FPGA must cost additional power. In the real world of chip design, power scales linearly with area but superlinearly with frequency, largely due to voltage-rail considerations: you need more voltage to operate at higher clock rates. There’s also the added leakage to contend with, caused by setting transistor thresholds low enough to operate at 1.6GHz. So it’s bound to be a bad tradeoff in terms of power. (I don’t actually know this because it doesn’t seem that Tabula’s been forthcoming about power numbers, but some physics just can’t be bypassed as long as you’re still using off-the-shelf CMOS.)</p>
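<p>The gap between the ideal 8x and the quoted gains can be expressed as a simple ratio. The Python sketch below uses only the figures given in this post (eight folds, a 200 MHz user clock, and Tabula’s quoted gains of about 2.5x for logic and 2.9x for memory). The function names and the term “folding efficiency” are illustrative labels, not Tabula terminology.</p>

```python
# Sketch of the time-folding arithmetic, using the figures quoted in the post.
FOLDS = 8             # time slices per user clock cycle
USER_CLOCK_MHZ = 200  # user-visible clock rate
LUT_GAIN = 2.5        # effective logic gain quoted by Tabula
MEMORY_GAIN = 2.9     # effective memory gain quoted by Tabula


def spacetime_clock_mhz(user_clock_mhz=USER_CLOCK_MHZ, folds=FOLDS):
    """Internal clock needed to run `folds` time slices per user cycle."""
    return user_clock_mhz * folds


def folding_efficiency(effective_gain, folds=FOLDS):
    """Fraction of the ideal `folds`-times gain that is actually realized."""
    return effective_gain / folds


if __name__ == "__main__":
    print(f"Internal clock: {spacetime_clock_mhz()} MHz")
    print(f"Logic folding efficiency:  {folding_efficiency(LUT_GAIN):.1%}")
    print(f"Memory folding efficiency: {folding_efficiency(MEMORY_GAIN):.1%}")
```

<p>In other words, a 2.5x gain from eight time slices means that, on average, a little under a third of the ideal folding benefit is realized for logic, and slightly more for memory.</p>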
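<p>For a feel of how that scaling plays out, here is a rough Python sketch based on the standard first-order CMOS dynamic-power relation, P ≈ α·C·V²·f, with switched capacitance taken as proportional to active area. Everything numeric beyond the 8x clock ratio and the 2.5x LUT figure is an assumption: the 1.2x voltage ratio is a purely hypothetical placeholder, not a Tabula number, and the model ignores leakage and assumes the same activity factor in both designs.</p>

```python
# First-order dynamic power model: P is proportional to area * V^2 * f.
# This is a sketch under stated assumptions, not measured data.


def relative_dynamic_power(freq_ratio, voltage_ratio=1.0, area_ratio=1.0):
    """Dynamic power of a design relative to a baseline design.

    freq_ratio    -- clock frequency relative to the baseline
    voltage_ratio -- supply voltage relative to the baseline
    area_ratio    -- switched area (capacitance) relative to the baseline
    """
    return area_ratio * voltage_ratio ** 2 * freq_ratio


if __name__ == "__main__":
    freq_ratio = 1600 / 200  # 1.6GHz internal clock vs. 200MHz user clock
    area_ratio = 1 / 2.5     # 2.5x LUT reuse implies roughly 0.4x the LUT area

    # Same supply voltage in both designs (optimistic assumption).
    same_v = relative_dynamic_power(freq_ratio, 1.0, area_ratio)
    # Hypothetical 20% higher supply voltage to reach the faster clock.
    higher_v = relative_dynamic_power(freq_ratio, 1.2, area_ratio)

    print(f"Same voltage:        {same_v:.2f}x baseline dynamic power")
    print(f"1.2x supply voltage: {higher_v:.2f}x baseline dynamic power")
```

<p>Even with no voltage increase at all, the 8x clock outweighs the 2.5x area saving in this simple model, and any voltage bump makes the gap grow with the square of the voltage ratio.</p>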
<p>It’s true that you can sacrifice half of the virtualized Spacetime LUTs and get 400MHz or some other combinations, but folks it’s a 1.6GHz device. Not designed for low power. Design tradeoffs obviously favored device cost, which you can see in the low, blink-inducing prices for the devices. Those prices are indeed mighty attractive for such high logic capacities. However, just about everyone’s worried about power these days, even people designing equipment for those power-sucking data centers that are cooled by diverting nearby rivers through the equipment racks. Every Watt of operating power supplied to the equipment requires an additional Watt for cooling (roughly speaking). A megawatt here, a megawatt there, and pretty soon you’re talking about some real energy consumption. And some real energy costs, which is what truly gets the attention of the data-center managers and owners.</p>
<p>I’ve heard about the Tabula announcements from several sources starting with a morning-of article in the San Jose Mercury News. One of the best technical write-ups I’ve seen so far is this <a href="http://www.fpgajournal.com/fpgajournal/feature_articles/20100323-tabula/" target="_blank">article</a> by Kevin Morris from FPGA Journal. Online comments to Morris’ article suggest that there’s a lot of skepticism in the design community with respect to this new FPGA technology. As with any new technology, even a tesseract FPGA, time will tell if the market accepts this idea or if it will end up on the shelf next to the long-dead and now-dusty remains of reconfigurable logic.</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2010/04/01/tabula-fpga-scatters-logic-memory-and-power-across-space-and-time/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Intel cuts IC power by allowing, detecting, and correcting errors</title>
		<link>http://low-powerdesign.com/sleibson/2010/04/01/cut-power-by-allowing-detecting-and-correcting-errors/</link>
		<comments>http://low-powerdesign.com/sleibson/2010/04/01/cut-power-by-allowing-detecting-and-correcting-errors/#comments</comments>
		<pubDate>Thu, 01 Apr 2010 14:59:39 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[CMOS]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[Low-Power]]></category>
		<category><![CDATA[error correction]]></category>
		<category><![CDATA[error detection]]></category>
		<category><![CDATA[Intel]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=331</guid>
		<description><![CDATA[The low-power IC-design train has long ridden the rails of lowered supply voltage. However, these lowered supply rails are tangentially approaching transistor threshold voltages and have long been headed for a serious collision because transistors in large, nanometer ICs run &#8230; <a href="http://low-powerdesign.com/sleibson/2010/04/01/cut-power-by-allowing-detecting-and-correcting-errors/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The low-power IC-design train has long ridden the rails of lowered supply voltage. However, these lowered supply rails are tangentially approaching transistor threshold voltages and have long been headed for a serious collision because transistors in large, nanometer ICs run closer and closer to their switching limits. When designing these large circuits, chip designers and EDA tools must make allowances for noise or voltage droop on the supply rails and for noise on the signal interconnects within the chip. That means the designs can’t really run the transistors as fast as possible or at the lowest possible voltage without risking imperfect operation. And who wants to risk imperfect circuit operation? Well, Intel for one.</p>
<p>In a recent <a href="http://www.technologyreview.com/computing/24843/" target="_blank">article</a> published in the MIT Technology Review, Katherine Bourzac writes up a report from Intel Labs about an experimental 45nm chip that allows circuits to run at sub-optimum voltage and somewhat-too-fast frequency settings. Most of the time, there’s no problem because there’s not enough noise or droop to cause the circuits to compute incorrectly. However, sometimes, under certain conditions, there will be errors. What to do? Add error-detection circuitry to detect errors when they happen and then back up one step in the calculation, raise the operating voltage a bit or drop the operating frequency a bit, re-run the calculation to get the right result, and then back the supply voltage down to normal. This is research into what Intel Labs calls “resilient circuits.”</p>
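<p>The detect-and-replay idea is easy to sketch in software. What follows is a toy model of the control loop described above, not Intel’s circuit: the detector callable, the millivolt values, and the set of “noisy” inputs are all hypothetical stand-ins.</p>
<pre>
```python
def run_resilient(inputs, compute, error_detected, low_mv=900, safe_mv=1000):
    """Run each step at an aggressive supply voltage and replay it on error.

    compute(value) produces the correct result for one step.
    error_detected(value, millivolts) stands in for the on-chip error detector.
    Both callables and both voltage levels are illustrative assumptions.
    """
    results = []
    replays = 0
    for value in inputs:
        millivolts = low_mv                      # normal, aggressive operating point
        if error_detected(value, millivolts):    # detector flags this step
            millivolts = safe_mv                 # raise the supply a bit
            replays += 1                         # back up and re-run the step
            if error_detected(value, millivolts):
                raise RuntimeError("error persists at the safe voltage")
        results.append(compute(value))           # result accepted
        # The next iteration starts again at low_mv, so the supply backs down.
    return results, replays


if __name__ == "__main__":
    noisy = {3, 7}  # steps that hypothetically fail at the low voltage

    def detector(value, millivolts):
        return value in noisy and millivolts != 1000

    print(run_resilient(range(10), lambda x: x * x, detector))
```
</pre>
<p>The point of the loop is that the cost of the higher voltage is paid only on the replayed steps, while every other step runs at the lower setting.</p>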
<p>Is there a benefit to this approach? Specifically, is there a power benefit? Apparently, there is. Bourzac quotes Wen-Hann Wang, director of circuits and systems research at Intel and vice president of Intel Labs, who says that even with the extra error-detection circuitry, the net power savings can be a whopping 37%. (Or, if you’re a speed freak, you can get 21% faster operation without reducing operating power.) Wang points out that today’s chips are designed to operate in demanding, multimode scenarios such as “playing a graphics-rich game, uploading video to Facebook, and surfing the Web” (Isn’t it amazing how cell-phone scenarios have replaced computer-use scenarios these days?) and that today’s devices must be designed to handle such scenarios correctly, which means that the chip’s circuits will be overdesigned and will use excessive power most of the time, when simpler operating modes are in use. An error-detection-and-correction scheme allows the design of chips that only use additional power when it’s needed—when there’s an error.</p>
<p>There are at least two more factors to consider as well. First, chips age. As they do, device thresholds change and metal migrates, leading to minute changes in the currents flowing within the chip—changes that deviate from modeled operating scenarios created during chip design. The normal result of these changes for devices that are designed to run perfectly all the time is that the circuitry eventually does not run perfectly and the chip effectively dies even though it actually could operate properly at a slightly higher operating voltage or a lower operating frequency. Apparently, according to Bourzac’s article, the addition of error-detecting-and-correcting circuitry and algorithms also compensates for the problems associated with chip aging.</p>
<p>Second, as Moore’s Law takes the industry down the rabbit hole of shrinking geometries, many more error sources appear. That makes error-detection-and-correction schemes even more attractive and no doubt that is why Intel Labs is looking into the design of such circuitry now rather than later.</p>
<p>I think that the advent of real error-detecting-and-correcting computational circuitry is long overdue. On-chip variability already causes enough headaches to trigger more research into how digital circuitry must deal with errors in a probabilistic world, not the absolutely perfect Boolean world we’ve come to assume over the 70-some years of digital design. The storage and memory worlds got the call long ago. Disk drives became probabilistic with the adoption of PRML (partial-response, maximum likelihood) coding more than a decade ago and have always had to use error detection and correction to deal with real-world, flawed storage media. DRAM and NAND manufacturers long ago adopted redundant design to allow for dead bits, rows, and columns in their devices. Viterbi, Turbo, and other algorithms protect digital data from errors inherent in transmission over the air, with all the associated noise and reflections that are part of everyday cellular telephony. So, is digital design at the chip level different? Apparently not.</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2010/04/01/cut-power-by-allowing-detecting-and-correcting-errors/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Laser Spike Annealing of Nickel in Nanometer CMOS ICs Cuts Leakage 10x</title>
		<link>http://low-powerdesign.com/sleibson/2009/12/06/laser-spike-annealing-of-nickel-in-nanometer-cmos-ics-cuts-leakage-10x/</link>
		<comments>http://low-powerdesign.com/sleibson/2009/12/06/laser-spike-annealing-of-nickel-in-nanometer-cmos-ics-cuts-leakage-10x/#comments</comments>
		<pubDate>Sun, 06 Dec 2009 20:22:55 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[CMOS]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[Green Design]]></category>
		<category><![CDATA[Low-Power]]></category>
		<category><![CDATA[SOC]]></category>
		<category><![CDATA[leakage]]></category>
		<category><![CDATA[process_technology]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=261</guid>
		<description><![CDATA[One of the sad facts of life for nanometer silicon has been the rise of leakage current as device geometries shrink. At 65nm, CMOS leakage currents roughly equal operating currents, making it virtually impossible to reduce overall operating current by &#8230; <a href="http://low-powerdesign.com/sleibson/2009/12/06/laser-spike-annealing-of-nickel-in-nanometer-cmos-ics-cuts-leakage-10x/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>One of the sad facts of life for nanometer silicon has been the rise of leakage current as device geometries shrink. At 65nm, CMOS leakage currents roughly equal operating currents, making it virtually impossible to reduce overall operating current by more than half. I’ve long thought this was the result of low-V<sub>t</sub> transistors that can never fully turn off, a consequence of the drive to recover speed that’s lost when supply voltages are cut to reduce operating power. Turns out there’s another culprit: nickel contamination that occurs when nickel atoms drift away from the nickel-silicide interface layer used to improve the connectivity of metal inter-layer contact plugs. The nickel atoms drift during the annealing process, which is used to drive the deposited nickel atoms into the transistors’ source and drain contact pads. The first of two annealing cycles drives the metallic nickel atoms into the silicon source and drain pads creating Ni<sub>2</sub>Si silicide. A second, higher-temperature annealing process converts the Ni<sub>2</sub>Si into NiSi, which has lower resistance and thus provides good electrical connectivity between the contact pad and the metal interconnect plug.</p>
<p>It turns out that current “soak” annealing processes (which last for tens of seconds) allow the nickel atoms to drift far afield. Like beach sand in your bathing suit, the nickel gets into places you’d rather not have it. The drifting nickel atoms seem to have an affinity for silicon lattice discontinuities, which can be found at the outside ends of the transistor where source and drain diffusions meet the isolation trenches and in long, narrow voids that run from the source and drain regions towards and into the FET channel. Both of these hiding places cause leakage because the metallic nickel conducts electricity where there should be insulator or semiconductor material. Nickel at the ends of the transistor causes substrate leakage and nickel atoms in the channel naturally cause channel leakage.</p>
<p>Applied Materials and European semiconductor research powerhouse IMEC have jointly developed a laser-annealing process that lasts one millisecond instead of tens of seconds. As a result, the diffusing nickel doesn’t have time to drift into these unwanted places during the second annealing step that generates NiSi. Applied Materials described a similar laser-spike annealing process back in 2004 (see article <a href="http://www.appliedmaterials.com/products/assets/front_end/SST-Low-Temp_Spike_Anneal_for_NiSi.pdf" target="_blank">here</a>), but reportedly achieved only a 3-4% leakage reduction back then. This latest development appears to be a refinement of that earlier technique. The two companies will be presenting their findings at this week’s <a href="http://www.his.com/%7Eiedm/" target="_blank">IEDM</a> conference in Baltimore, Maryland.</p>
<p>IMEC and Applied Materials will indeed have pulled a rabbit out of the hat if this laser-spike annealing process plus the application of appropriate transistor-design rules result in cutting leakage currents by 90% for nanometer CMOS. Leakage-driven power loss has become a significant problem for advanced IC design and had appeared to be insurmountable, even with the addition of high-K and metal-gate processing. Now, it appears there’s a real solution with the best of all possible implications for system and logic designers: they don’t need to learn anything new. They can leave this fix to the design tools and to the process engineers and once again skirt the system-level and architectural issues of low-power design.</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2009/12/06/laser-spike-annealing-of-nickel-in-nanometer-cmos-ics-cuts-leakage-10x/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Surprising Popularity Rise of On-Chip Memory</title>
		<link>http://low-powerdesign.com/sleibson/2009/11/08/the-surprising-popularity-rise-of-on-chip-memory/</link>
		<comments>http://low-powerdesign.com/sleibson/2009/11/08/the-surprising-popularity-rise-of-on-chip-memory/#comments</comments>
		<pubDate>Sun, 08 Nov 2009 16:53:04 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[CMOS]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[DRAM]]></category>
		<category><![CDATA[Low-Power]]></category>
		<category><![CDATA[SOC]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=243</guid>
		<description><![CDATA[I attended the 7th International SOC Conference in Newport Beach last week and several of the speakers addressed issues relating to SOC and system power. One of these speakers was Bob Madge, Director of Technology Marketing at LSI Corp (formerly &#8230; <a href="http://low-powerdesign.com/sleibson/2009/11/08/the-surprising-popularity-rise-of-on-chip-memory/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I attended the 7<sup>th</sup> International SOC Conference in Newport Beach last week and several of the speakers addressed issues relating to SOC and system power. One of these speakers was Bob Madge, Director of Technology Marketing at LSI Corp (formerly LSI Logic). In case you didn’t know, LSI has been evolving its business from its original focus on developing ASICs and SOCs for customers to a focus on programmable ASSPs (application-specific standard products) and custom silicon specifically aimed at the networking and storage markets. Madge’s first slide explained the reasoning: storage-capacity growth is projected at 49% per year and network-traffic growth at 42% per year. Good growth numbers for a business to target.</p>
<p>To deliver competitive parts, LSI stays on top of IC design and manufacturing trends. One trend that caught LSI and the semiconductor industry by surprise has been the rapid growth in on-chip memory use. On-chip memory makes sense for two reasons. First and foremost, it provides better performance than off-chip memory because putting memory on the chip along with the logic circuitry eliminates two sets of off-chip drivers and receivers, which also reduces power consumption for memory transactions. Second, on-chip logic can communicate with on-chip memory over extremely wide memory interfaces, because pin count is not an issue if you stay on the chip. A wide memory interface reduces the number of transfers needed to move a given amount of data, and lower transfer rates cut power as well.</p>
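<p>The second point is simple arithmetic, sketched below. The payload size and the two bus widths are hypothetical numbers chosen only to show the effect; they are not from Madge’s talk.</p>
<pre>
```python
def transfers_needed(payload_bits, bus_width_bits):
    """Number of bus transfers required to move payload_bits over the bus."""
    if payload_bits == 0:
        return 0
    # Ceiling division: a partly filled final transfer still counts as one.
    return (payload_bits + bus_width_bits - 1) // bus_width_bits


if __name__ == "__main__":
    payload = 4096  # bits to move (illustrative)
    # A pin-limited off-chip interface versus a wide on-chip interface.
    for width in (32, 1024):
        print(width, transfers_needed(payload, width))
```
</pre>
<p>Widening the interface by 32x cuts the transfer count by the same factor, so the same bandwidth can be delivered at a much lower transfer rate.</p>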
<p>However, merging logic and memory on one piece of silicon has always presented design and manufacturing issues. Bulk, high-volume, high-capacity memory manufacturing processes differ from logic manufacturing processes because the two processes must optimize different parameters. Memory processes emphasize low-cost manufacturing and tend to have fewer metal layers than logic processes, which emphasize speed and on-chip connectivity. “Frequency, density, and power are always a challenge,” said Madge.</p>
<p>For example:</p>
<ul>
<li>Today’s network routers use 400-Mbit buffers. Switches need 512 Mbits of storage or more. In the future, said Madge, these devices will need as much as 1 Gbit of on-chip memory in multiple configurations.</li>
<li>IP controllers used in network storage applications currently use 60 to 100 Mbits of cache memory. In the future, these devices will need 200 Mbits of memory or more.</li>
<li>Media processors currently use 60 to 80 Mbits of memory running at 500 MHz. Future needs will be on the order of 100 to 200 Mbits of memory running at 600 to 700 MHz.</li>
</ul>
<p>All of these examples demonstrate the coming challenges for fast, dense, on-chip memory.</p>
<p>LSI is looking at embedded (on-chip) DRAM and the use of 3D, through-silicon via technology for chip-to-chip stacking as ways of increasing the amount of on-chip memory. The company is doing this because it sees a continued and rapid rise in the amount of on-chip memory needed for its networking and storage chips.</p>
<p>Embedded DRAM uses a 1T (one-transistor) cell, which obviously improves density over a 4T or 6T static RAM cell. Embedded DRAM also reduces static and dynamic power consumption because the fewer transistors use less power and leak less current than the greater number of transistors required to build the same amount of SRAM.</p>
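<p>A quick count shows how big the difference is. This sketch uses the 1-Gbit figure from the router example above and counts only the transistors in the bit cells. It treats 1 Gbit as 2**30 bits and ignores DRAM storage capacitors, sense amps, decoders, and refresh logic, all of which are simplifying assumptions on my part.</p>
<pre>
```python
def cell_transistors(capacity_bits, transistors_per_cell):
    """Transistors in the bit-cell array alone, ignoring peripheral circuitry."""
    return capacity_bits * transistors_per_cell


if __name__ == "__main__":
    one_gbit = 2 ** 30  # assumption: binary gigabit
    edram = cell_transistors(one_gbit, 1)    # 1T embedded DRAM cell
    sram_4t = cell_transistors(one_gbit, 4)  # 4T SRAM cell
    sram_6t = cell_transistors(one_gbit, 6)  # 6T SRAM cell
    print(edram, sram_4t, sram_6t)
    print(sram_6t // edram)                  # ratio of 6T SRAM to 1T DRAM
```
</pre>
<p>Every one of those extra SRAM transistors is a potential leakage path, which is the heart of the power argument for embedded DRAM.</p>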
<p>LSI is also investigating other power-saving features that become possible when you move memory onto the logic chip including a sleep mode for the memory, dual power rails, and low-voltage operation. However, said Madge, the biggest benefit appears to be a move to embedded DRAM because of the huge reduction in transistor counts.</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2009/11/08/the-surprising-popularity-rise-of-on-chip-memory/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Give OTP a chance for low-power, on-chip storage</title>
		<link>http://low-powerdesign.com/sleibson/2009/10/04/give-otp-a-change-for-low-power-on-chip-storage/</link>
		<comments>http://low-powerdesign.com/sleibson/2009/10/04/give-otp-a-change-for-low-power-on-chip-storage/#comments</comments>
		<pubDate>Sun, 04 Oct 2009 18:58:37 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[CMOS]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[Flash]]></category>
		<category><![CDATA[Hubble]]></category>
		<category><![CDATA[Low-Power]]></category>
		<category><![CDATA[Space]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ASIC]]></category>
		<category><![CDATA[OTP]]></category>
		<category><![CDATA[PROM]]></category>
		<category><![CDATA[SOC]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=185</guid>
		<description><![CDATA[The on-chip memories that get most of the attention are read/write memories such as SRAM, DRAM, Flash, and MRAM (which I just covered in my previous blog entry). However, there&#8217;s a place for OTP (one-time programmable) memory on chip, so &#8230; <a href="http://low-powerdesign.com/sleibson/2009/10/04/give-otp-a-change-for-low-power-on-chip-storage/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The on-chip memories that get most of the attention are read/write memories such as SRAM, DRAM, Flash, and MRAM (which I just covered in my previous blog entry). However, there&#8217;s a place for OTP (one-time programmable) memory on chip, so the technology bears some thought. I discussed OTP at last week&#8217;s <a href="http://www.gsaglobal.org/expo/2009/attendees/program.aspx" target="_blank">GSA Emerging Opportunities Expo and Conference</a> in Santa Clara, California with Jim Lipman of <a href="http://www.sidense.com/" target="_blank">Sidense</a>, a vendor that offers hard IP for on-chip OTP memory.</p>
<p>Sidense&#8217;s SiPROM memory cell consists of one specially designed FET as shown in the figure below. The special part of the FET&#8217;s design is a stepped gate-oxide layer with two thicknesses: thick and thin. Unprogrammed, the FET looks like a FET. Programming causes a controlled disruption in the thin part of the FET&#8217;s channel-oxide insulation to produce a conduction path from the FET&#8217;s gate to the conduction channel. Charge-coupled sense amps can detect whether a FET in the OTP array has been programmed.</p>
<p><a href="http://low-powerdesign.com/sleibson/wp-content/uploads/2009/10/sidense-memory-cell.jpg"><img class="aligncenter size-medium wp-image-186" title="sidense-memory-cell" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2009/10/sidense-memory-cell.jpg" alt="" width="516" height="300" /></a></p>
<p>It&#8217;s because of the charge-coupled sense amps that Sidense&#8217;s SiPROM technology qualifies as a low-power memory technology. These sense amps are only on for tens of nanoseconds during a read cycle and are not powered continuously. This is a patented feature of Sidense&#8217;s technology.</p>
<p>Although designers have an obvious bias towards read/write technologies for on-chip memory, OTP memory can be quite useful for storing infrequently programmed or reprogrammed data such as calibration and trim settings, serial numbers, configurations, boot code, and security keys. This last application is particularly interesting. Lipman provided an example. The security keys for the HDMI digital display interface spec need about 2.5 kbits for storage. However, there&#8217;s the possibility that the security can be broken and that new keys will need to be distributed. A 16-kbit array of OTP memory can store about six sets of HDMI keys, which should be enough storage to last beyond the expected life of the end equipment.</p>
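<p>Lipman&#8217;s key-storage example can be modeled as an append-only set of slots. The sketch below is my own illustration, not Sidense&#8217;s interface or the HDMI key format: the class name, the slot layout, and the choice of 2560 bits for &#8220;about 2.5 kbits&#8221; are all assumptions.</p>
<pre>
```python
class OtpKeyStore:
    """Append-only key slots in a one-time-programmable array (toy model)."""

    def __init__(self, array_bits=16 * 1024, key_set_bits=2560):
        self.key_set_bits = key_set_bits
        self.capacity = array_bits // key_set_bits  # whole key sets that fit
        self.slots = []                             # programmed slots, oldest first

    def program_key_set(self, key_set):
        """Burn a new key set into the next unused slot; old slots are never rewritten."""
        if len(key_set) * 8 != self.key_set_bits:
            raise ValueError("key set has the wrong size")
        if len(self.slots) == self.capacity:
            raise RuntimeError("OTP array is full")
        self.slots.append(bytes(key_set))

    def active_key_set(self):
        """The most recently programmed key set, or None if nothing is programmed."""
        if not self.slots:
            return None
        return self.slots[-1]


if __name__ == "__main__":
    store = OtpKeyStore()
    print(store.capacity)                 # how many key sets fit in 16 kbits
    store.program_key_set(bytes(320))     # factory keys (320 bytes = 2560 bits)
    store.program_key_set(bytes([1]) * 320)  # replacement keys after a breach
    print(store.active_key_set()[:4])
```
</pre>
<p>Each key replacement consumes one fresh slot, so the array size directly sets how many times the keys can be reissued over the life of the product.</p>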
<p>You should also be aware of the factors that argue in favor of on-chip OTP memory. Sidense&#8217;s cells are about 1.2x larger than ROM cells, so there&#8217;s a 20% size penalty for the flexibility of programmability. In exchange for this size penalty, there&#8217;s no need for a mask change if the data stored in the OTP memory needs to be changed in the factory or in the field (for an update).</p>
<p>In addition, Sidense&#8217;s OTP memory easily tracks IC manufacturing process changes. It is hard IP, however, so Sidense must tailor the IP for each vendor&#8217;s process technology. Sidense&#8217;s SiPROM products are currently available from 180nm to 55nm and are portable to 40nm and below. Supported foundries include TSMC, UMC, Fujitsu Microelectronics, SMIC, Tower, IBM and Chartered.</p>
<p>It&#8217;s also interesting to compare OTP memory with Flash. Lipman says that Sidense&#8217;s OTP SiPROM cells are about half the size of Flash cells for a given semiconductor technology. In addition, the creation of Flash-cell floating gates adds process changes that can add roughly 30% to wafer production costs. Finally, Flash process technology is clearly getting into trouble as lithographies shrink. Some presenters at the recent <a href="http://www.flashmemorysummit.com/" target="_blank">Flash Memory Summit</a> were predicting that the 22nm node might be the last node to support Flash memory, although such end-of-the-world prognostications from the semiconductor pundits are often wrong. By contrast, Sidense&#8217;s SiPROM cells require only standard CMOS processing, so the company claims it&#8217;s easier for their OTP memory than it is for Flash cells to track process improvements.</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2009/10/04/give-otp-a-change-for-low-power-on-chip-storage/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Could A Low-Power Middle Ground Between ASICs/SOCs and FPGAs Help You?</title>
		<link>http://low-powerdesign.com/sleibson/2009/09/05/could-a-low-power-middle-ground-between-asicssocs-and-fpgas-help-you/</link>
		<comments>http://low-powerdesign.com/sleibson/2009/09/05/could-a-low-power-middle-ground-between-asicssocs-and-fpgas-help-you/#comments</comments>
		<pubDate>Sat, 05 Sep 2009 16:24:04 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[CMOS]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[Low-Power]]></category>
		<category><![CDATA[eASIC]]></category>
		<category><![CDATA[FPGA]]></category>
		<category><![CDATA[Nextreme]]></category>
		<category><![CDATA[structured ASIC]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=124</guid>
		<description><![CDATA[You can’t always get what you want, But if you try sometime, You’ll find, You get what you need. Those lyrics from a Rolling Stones song describe the situation with ASICs/SOCs and FPGAs. For low power, you want &#8230; <a href="http://low-powerdesign.com/sleibson/2009/09/05/could-a-low-power-middle-ground-between-asicssocs-and-fpgas-help-you/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><em>You can’t always get what you want,<br />
But if you try sometime,<br />
You’ll find,<br />
You get what you need.</em></p>
<p>Those lyrics from a Rolling Stones song describe the situation with ASICs/SOCs and FPGAs. For low power, you want an ASIC or SOC. However, there are huge obstacles to using an ASIC or SOC. First, you need a team that knows how to design custom silicon or you need to rent one, which is expensive. If you have your own design team, you should be prepared to drop a million dollars or so on design tools and another million or so on NRE charges. Also be prepared for a 6-18 month design cycle, lots of painstaking verification, and the risk of at least one silicon respin due to design errors or spec changes. High risk indeed.</p>
<p>On the other hand, there are FPGAs. The NRE cost is zilch. The design tools are low-cost or no-cost. There’s no physical chip design required, hence a lot less verification. In short, it’s much easier to design a system based on FPGAs than on SOCs or ASICs, but there’s a price to pay: higher unit cost, less performance, and higher power consumption. All three figures of merit are 10-20x out of whack for FPGAs versus ASICs/SOCs. In addition, you’ll not get the same maximum gate count in an FPGA, not by a long shot.</p>
<p>So if you need an ASIC or SOC, then you need one. If not, and if an FPGA’s part cost, power consumption, and/or performance aren’t where your design needs to be, there is a middle ground. In the recent past, this middle-ground component has been called a “structured ASIC.” That&#8217;s become a tarnished name. In the distant past, a similar sort of device might have been called a “gate array.” Today, eASIC calls it a “new ASIC.”</p>
<p>What’s a “new ASIC”? If it’s an eASIC Nextreme or Nextreme2, then it’s a predesigned field-of-LUTs device with a preconfigured routing fabric on the metal layers. The only unconfigured layer is the via 6 layer. Standard Nextreme wafers are processed to metal layer 6 and stored. When a design is sent in, the via 6 and metal 7 and 8 layers are added. Depending on how fast the part needs to be made, the via 6 layer is customized using either direct-write e-beam or a standard lithographic mask and then the standard metal 7 and 8 layers are added on top.</p>
<p>So, what do you get from this technology? You get a zero-NRE, FPGA-like device that has much higher silicon density than an FPGA because there are no switches or configuration RAM cells in the routing fabric, just fast, tiny layer-6 configuration vias. Consequently, you get a chip that can clock faster than an FPGA: 250 MHz (typical) for a 90nm Nextreme &#8220;new ASIC&#8221; and 500 MHz (typical) for a 45nm Nextreme2 &#8220;new ASIC.&#8221; You get a device that operates at lower power than an FPGA and you get a device that offers more gates/chip at lower component costs (but not as low as for an ASIC/SOC). You also get a chip that’s easier to design than an ASIC/SOC and one that can be delivered in as little as 4 weeks. Design-tool cost is lower than for ASICs/SOCs as well because eASIC offers a specialized, Nextreme-specific version of Magma’s design tools for as little as $8k per seat.</p>
<p>What are Nextreme parts used for? I asked Jasbinder (Jazz) Bhoot, eASIC’s VP of Worldwide Marketing, that question. His answer was both interesting and a bit surprising:</p>
<ul>
<li> Cell phone microprojectors (where cost and power dissipation are critical)</li>
<li> Other microprojectors</li>
<li> Medical devices such as ultrasound imagers where power is not so much of a problem but device cooling is a big problem</li>
<li> Portable medical devices that run on batteries</li>
<li> Wired networking products where Nextreme parts are consolidating several FPGA designs into one chip with much lower power consumption</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2009/09/05/could-a-low-power-middle-ground-between-asicssocs-and-fpgas-help-you/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A Hunka, Hunka Burning CMOS (All About Latchup)</title>
		<link>http://low-powerdesign.com/sleibson/2009/07/05/a-hunka-hunka-burning-cmos-all-about-latchup/</link>
		<comments>http://low-powerdesign.com/sleibson/2009/07/05/a-hunka-hunka-burning-cmos-all-about-latchup/#comments</comments>
		<pubDate>Sun, 05 Jul 2009 18:47:52 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[CMOS]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[Low-Power]]></category>
		<category><![CDATA[latchup]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=64</guid>
		<description><![CDATA[You’re a mere 10 minutes from completely understanding and preventing CMOS latchup in your low-power designs. Wizard of Oz Dave Jones has just posted his sixteenth EE Video Blog on these topics. Here it is:      ]]></description>
			<content:encoded><![CDATA[<p style="text-align: left;">You’re a mere 10 minutes from completely understanding and preventing CMOS latchup in your low-power designs. Wizard of Oz Dave Jones has just posted his <a title="EEVBlog #16" href="http://www.alternatezone.com/eevblog/" target="_blank">sixteenth EE Video Blog</a> on these topics. Here it is:</p>
<p> <br />
<center><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="344" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://www.youtube.com/v/S0TZMivVzVk&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;hl=en&amp;feature=player_embedded&amp;fs=1" /><embed type="application/x-shockwave-flash" width="425" height="344" src="http://www.youtube.com/v/S0TZMivVzVk&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;hl=en&amp;feature=player_embedded&amp;fs=1"></embed></object><br />
</center></p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2009/07/05/a-hunka-hunka-burning-cmos-all-about-latchup/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>


