<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Steve Leibson &#187; SDRAM</title>
	<atom:link href="http://low-powerdesign.com/sleibson/index.php/category/sdram/feed/" rel="self" type="application/rss+xml" />
	<link>http://low-powerdesign.com/sleibson</link>
	<description>Leibson's Laws and the Penalties for Breaking Them</description>
	<lastBuildDate>Wed, 01 Feb 2012 00:01:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Is 2012 going to be another breakout year for NAND Flash and Low-Power Design?</title>
		<link>http://low-powerdesign.com/sleibson/2012/01/09/is-2012-going-to-be-another-breakout-year-for-nand-flash-and-low-power-design/</link>
		<comments>http://low-powerdesign.com/sleibson/2012/01/09/is-2012-going-to-be-another-breakout-year-for-nand-flash-and-low-power-design/#comments</comments>
		<pubDate>Mon, 09 Jan 2012 13:00:04 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[Flash]]></category>
		<category><![CDATA[SDRAM]]></category>
		<category><![CDATA[SOC]]></category>
		<category><![CDATA[Video]]></category>
		<category><![CDATA[cadence]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Micron]]></category>
		<category><![CDATA[NAND]]></category>
		<category><![CDATA[Nikon]]></category>
		<category><![CDATA[Samsung]]></category>
		<category><![CDATA[Sony]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=754</guid>
		<description><![CDATA[It’s just one week into the year, and I am increasingly getting the feeling that 2012 is going to be a momentous, tumultuous year for semiconductor technology and low-power system design. Among the many recent events that are giving me this &#8230; <a href="http://low-powerdesign.com/sleibson/2012/01/09/is-2012-going-to-be-another-breakout-year-for-nand-flash-and-low-power-design/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>It’s just one week into the year, and I am increasingly getting the feeling that 2012 is going to be a momentous, tumultuous year for semiconductor technology and low-power system design. Among the many recent events that are giving me this feeling are the changes taking place in the NAND Flash arena. Nearly all low-power system designers depend on NAND Flash in some form because it is currently the technology of choice for storing code and data when a system is in deep low-power/sleep mode or when switched off. We use NAND Flash on chip for microcontrollers. We use NAND Flash chips on board for main storage in mobile phone handsets, tablets, eBook readers, and many other embedded systems. We use NAND Flash cards for removable storage in cameras, camcorders, mobile phone handsets, voice recorders, and media players. Any changes to NAND Flash technology ripple widely through the low-power design landscape like earth tremors.</p>
<p>At least three major changes to NAND Flash technology in the recent past have caught my attention. The first such event I want to discuss in this blog entry is the HMC or Hybrid Memory Cube that Micron first announced last year and is now in joint development with major partners including Samsung and IBM.</p>
<p><a href="http://low-powerdesign.com/sleibson/wp-content/uploads/2012/01/Micron-Hybrid-Memory-Cube.png"><img class="alignright size-full wp-image-756" style="margin: 10px;" title="Micron Hybrid Memory Cube" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2012/01/Micron-Hybrid-Memory-Cube.png" alt="" width="252" height="186" /></a>I previously wrote about the HMC (see “<a href="http://eda360insider.wordpress.com/2011/12/01/3d-thursday-hybrid-memory-cube-does-anyone-know-whats-happening-with-ibm-and-micron/" target="_blank">3D Thursday: Hybrid Memory Cube—Does anyone know what’s happening with IBM and Micron?</a>”) and its design is for high-performance computing systems that require extremely high throughput: 1 Tbit/sec. (See “<a href="http://eda360insider.wordpress.com/2011/08/22/want-to-know-more-about-the-micron-hybrid-memory-cube-hmc-how-about-its-terabitsec-data-rate/" target="_blank">Want to know more about the Micron Hybrid Memory Cube (HMC)? How about its terabit/sec data rate?</a>”) The HMC is a DRAM example of the kinds of memory modules we’re likely to see from the marriage of 3D IC assembly techniques and advanced NAND Flash devices.</p>
<p>The HMC runs many, many TSVs (through-silicon vias) up through a stack of as many as four SDRAM die to access the inherent parallelism of the multiple DRAM arrays on each die. Each proprietary DRAM die in the HMC stack has 16 separate memory arrays, resulting in substantial potential parallelism and, consequently, substantial potential memory throughput.</p>
<p>However, the high-performance approach of the HMC is not the only way to harness 3D assembly and semiconductor memory. For example, at the end of last year, I wrote an extended blog describing a thought experiment that employed the HMC design concepts using Wide I/O SDRAM instead of the special DRAM chips in the HMC. (See “<a href="http://eda360insider.wordpress.com/2011/12/28/3d-thursday-lets-end-2011-with-a-high-performance-dram-memory-stack-design-how-would-you-improve-it/" target="_blank">3D Thursday: Let’s end 2011 with a high-performance DRAM memory stack design. How would you improve it?</a>”) Wide I/O SDRAM presents four independent 128-bit DRAM channels to the host system, resulting in a high level of memory parallelism. Just not as high as for the HMC. In fact, the performance is about half that of the HMC but it’s still pretty good. The same parallelism concepts could be applied to NAND Flash devices designed to a similar Wide I/O specification for NAND Flash. The lower interface speeds enabled by a Wide I/O memory interface port really drop power consumption while maintaining good performance through the parallelism uncovered by the access to the multiple on-chip memory arrays.</p>
<p>I have not heard of any efforts to adapt the Wide I/O interface spec to NAND Flash devices. Not yet. But the move to extracting parallelism from the arrays on all memory chips is too attractive to ignore in a world that perpetually thirsts for bandwidth at low power.</p>
<p>At the end of last year, two other announcements directly related to NAND Flash memory caught my eye: the introduction of the XQD memory card format and the ONFI 3.0 interface spec. The CompactFlash Association <a href="http://compactflash.org/2011/compactflash-association-announces-the-first-video-performance-guarantee-vpg-profile-specification/" target="_blank">introduced</a> the XQD memory card format in December 2011. The XQD memory card has a slightly larger footprint than an SD memory card and a somewhat smaller footprint than a CompactFlash (CF) memory card. It’s as thick as a CF card. But the really big difference here is the interface to the memory card. The XQD memory card uses a PCIe (PCI Express) interface clocked initially at 2.5 Gbits/sec, resulting in a maximum write speed of 125 Mbytes/sec.</p>
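<p>As a rough check on those numbers, the sketch below converts the 2.5 Gbits/sec line rate into payload bytes per second. It assumes a single lane and the 8b/10b line coding used by first-generation PCIe, and it ignores packet and protocol overhead. The function name and the one-lane default are illustrative assumptions, not part of the XQD announcement.</p>
<pre>
```python
def pcie_payload_mbytes_per_sec(line_rate_gbits, lanes=1, payload_bits=8, coded_bits=10):
    """Payload bandwidth in Mbytes/sec for an 8b/10b-coded serial link."""
    # 8b/10b coding sends 10 line bits for every 8 payload bits.
    payload_gbits = line_rate_gbits * lanes * payload_bits / coded_bits
    return payload_gbits * 1000 / 8  # Gbits to Mbits, then Mbits to Mbytes

link_ceiling = pcie_payload_mbytes_per_sec(2.5)
print(link_ceiling)        # 250.0
print(125 / link_ceiling)  # 0.5
```
</pre>
<p>Under those assumptions the link tops out near 250 Mbytes/sec, so the quoted 125 Mbytes/sec write speed would use roughly half of what the interface can carry.</p>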
<p><a href="http://low-powerdesign.com/sleibson/wp-content/uploads/2012/01/Nikon-D4-DSLR.png"><img class="size-full wp-image-757 alignright" style="border: 0px;" title="Nikon D4 DSLR" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2012/01/Nikon-D4-DSLR.png" alt="" width="248" height="238" /></a>That’s really fast and speed is important when you’re shooting large images at a fast rate, which occurs during HD video recording and at high burst speeds in high-resolution digital still cameras. Both such conditions exist in the new Nikon D4 DSLR, which Nikon <a href="http://www.dpreview.com/news/2012/01/06/NikonD4" target="_blank">launched</a> just last week. The Nikon D4 DSLR can shoot 16.2 Mpixel frames at 10 to 11 frames per second. Normally, DSLRs use in-camera RAM to buffer burst-mode still captures but the Nikon D4 DSLR can accept the new XQD memory cards and Sony <a href="http://www.dpreview.com/news/2012/01/06/sony-xqd-memory-cards" target="_blank">introduced</a> the first series of such cards last week, concurrent with Nikon’s introduction of the Nikon D4 DSLR.</p>
<p><a href="http://low-powerdesign.com/sleibson/wp-content/uploads/2012/01/Sony-H-Series-XQD-card.png"><img class="alignright size-full wp-image-758" title="Sony H Series XQD card" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2012/01/Sony-H-Series-XQD-card.png" alt="" width="162" height="227" /></a>Sony claims that its H Series XQD card can accept bursts of 100 uncompressed still images from the Nikon D4 DSLR in continuous shot mode. That’s a huge jump in burst length for a digital still camera and will be invaluable in shooting images of sports activities, for example.</p>
<p>One of the secrets behind the XQD card format’s performance is that PCIe interface port, which is also unique in that it is a memory interface and is not derived from a disk interface. That should mean that a host processor doesn’t need a disk controller to operate an XQD card. The card can be mapped to the host processor’s memory bus and the controller can reside in each memory card. Eliminating the disk controller from the serial chain between the processor and the Flash memory chips should cut costs, reduce power consumption, and boost performance.</p>
<p>All of those benefits are welcome in the world of low-power design. After all, do we really need controllers controlling controllers in an efficient system design? I don’t think so.</p>
<p>Now before you bemoan the need for a controller in each memory card, you should be aware that there already is a controller in each CF and SD memory card. You don’t think that NAND Flash arrays already look like disk drives, do you? We do indeed currently have controllers controlling controllers in existing NAND Flash memory subsystems.</p>
<p>A PCIe interface spec should simplify things somewhat.</p>
<p>The third development that’s caught my eye in the Flash memory arena is the announcement of the ONFI 3.0 interface specification for Flash memory. The ONFI (Open NAND Flash Interface) Working Group <a href="http://onfi.org/news-events/onfi-announces-publication-of-the-3-0-standard-pushes-data-transfer-speeds-to-400-mbsec/" target="_blank">introduced</a> the third major revision of the ONFI spec nearly a year ago, in March 2011. What’s new is that there are now products appearing that use ONFI 3.0.</p>
<p>The advantage of the new ONFI specification is that it doubles transfer rates to 400 Mtransfers/sec using the NV-DDR2 200MHz double-data-rate (DDR) protocol while adopting 1.8V SSTL_18 signaling to cut the power dissipation of the interface. See a pattern evolving here? More performance and less power consumption. The question is whether ONFI 3.0 is real. Well, the memories now seem real because <a href="http://low-powerdesign.com/sleibson/wp-content/uploads/2012/01/Intel-Micron-128Gbit-ONFI-3-Flash-chip.png"><img class="alignright size-full wp-image-759" title="Intel Micron 128Gbit ONFI 3 Flash chip" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2012/01/Intel-Micron-128Gbit-ONFI-3-Flash-chip.png" alt="" width="300" height="261" /></a>Intel and Micron jointly <a href="http://newsroom.intel.com/community/intel_newsroom/blog/2011/12/06/intel-micron-extend-nand-flash-technology-leadership-with-introduction-of-worlds-first-128gb-nand-device-and-mass-production-of-64gb-20nm-nand" target="_blank">previewed</a> a 128Gbit NAND Flash device in December with the derivative 64Gbit NAND Flash device going into production now. According to the joint Intel/Micron announcement, the 128Gbit device will be in volume production later this year after a “rapid transition” from the 64Gbit device.</p>
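<p>The 400 Mtransfers/sec figure follows directly from the 200MHz DDR clock, as this small sketch shows. The 8-bit data bus used to turn transfers into bytes is my assumption for illustration, and the function names are hypothetical.</p>
<pre>
```python
def ddr_transfers_per_sec(clock_mhz):
    """A double-data-rate bus moves data on both clock edges."""
    return clock_mhz * 2  # result in Mtransfers/sec

def bus_mbytes_per_sec(clock_mhz, bus_width_bits=8):
    """Peak bandwidth; the 8-bit default bus width is an assumption."""
    return ddr_transfers_per_sec(clock_mhz) * bus_width_bits // 8

print(ddr_transfers_per_sec(200))  # 400
print(bus_mbytes_per_sec(200))     # 400
```
</pre>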
<p>However, an ONFI 3.0 memory device isn’t sufficient. You also need a controller on an SoC that can operate ONFI 3.0 devices. Cadence just <a href="http://www.cadence.com/cadence/newsroom/press_releases/pages/pr.aspx?xml=010912_onfi3" target="_blank">introduced</a> an ONFI 3.0 NAND Flash controller IP block and companion PHY IP today along with appropriate verification IP so it’s now possible to include an ONFI 3.0 NAND Flash controller in an SoC design using the standard ASIC flow.</p>
<p>As you can see, there’s a tremendous amount of new technological development going into NAND Flash memory and I see big things ahead this year, all to the benefit of low-power system designers.</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2012/01/09/is-2012-going-to-be-another-breakout-year-for-nand-flash-and-low-power-design/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Altera introduces SoC FPGA melding ARM Cortex-A9 dual-core processor complex with a 28nm FPGA fabric</title>
		<link>http://low-powerdesign.com/sleibson/2011/10/11/altera-introduces-soc-fpga-melding-arm-cortex-a9-dual-core-processor-complex-with-a-28nm-fpga-fabric/</link>
		<comments>http://low-powerdesign.com/sleibson/2011/10/11/altera-introduces-soc-fpga-melding-arm-cortex-a9-dual-core-processor-complex-with-a-28nm-fpga-fabric/#comments</comments>
		<pubDate>Tue, 11 Oct 2011 12:00:33 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[ARM]]></category>
		<category><![CDATA[FPGA]]></category>
		<category><![CDATA[Low-Power]]></category>
		<category><![CDATA[SDRAM]]></category>
		<category><![CDATA[SOC]]></category>
		<category><![CDATA[Altera]]></category>
		<category><![CDATA[SoC FPGA]]></category>
		<category><![CDATA[Xilinx]]></category>
		<category><![CDATA[Zynq]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=668</guid>
		<description><![CDATA[Xilinx first started to talk publicly about the fusion of processors and FPGAs—a product now known as Zynq—in 2010 and has announced plans to roll out parts by the end of this year. It was inevitable that Altera would eventually &#8230; <a href="http://low-powerdesign.com/sleibson/2011/10/11/altera-introduces-soc-fpga-melding-arm-cortex-a9-dual-core-processor-complex-with-a-28nm-fpga-fabric/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Xilinx first started to talk publicly about the fusion of processors and FPGAs—a product now known as Zynq—in 2010 and has announced plans to roll out parts by the end of this year. It was inevitable that Altera would eventually counter with a competing product line. Today the company revealed plans for a line of chips called SoC FPGAs and comparisons between the Altera and Xilinx offerings are inevitable, but let’s look at the details for the Altera offerings.</p>
<p>The SoC FPGA line will include at least 18 different chips with various configurations for the “Hard Processor System” (HPS) and various sizes for the FPGA fabrics connected to the HPS block. In addition, the SoC FPGA product line will be based on two of the Altera 28nm FPGA fabrics—Cyclone V and Arria V—for two different speed grades within the product line. Here’s a generalized block diagram of a device in the product line:</p>
<p><a href="http://low-powerdesign.com/sleibson/wp-content/uploads/2011/10/Altera-SoC-FPGA-Block-Diagram.jpg"><img class="aligncenter size-full wp-image-669" title="Altera SoC FPGA Block Diagram" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2011/10/Altera-SoC-FPGA-Block-Diagram.jpg" alt="" width="513" height="757" /></a></p>
<p>The SoC FPGAs’ HPS is based on two 800MHz ARM Cortex-A9 processor cores with ARM Neon and single/double-precision FPU extensions. Each ARM Cortex-A9 processor has its own L1 caches: separate 32Kbyte caches for instructions and data. The two processor cores share a unified 512Kbyte L2 cache. Each processor also has private interval and watchdog timers. To keep the two processor cores fed with instructions and data, there’s a hard-core, multiport DDR SDRAM controller in the HPS that supports the DDR2 and DDR3 and the LPDDR1 and LPDDR2 SDRAM interface protocols. There’s also a Flash memory controller with a built-in DMA engine. The Flash controller supports NOR and NAND Flash memories including ONFI 1.0 devices and SD, SDIO, and MMC memory cards. In addition, there’s ECC support for the SDRAM and the NAND Flash interfaces.</p>
<p>Next up are the hard-core peripherals within the HPS. There are a lot of them:</p>
<ul>
<li>Two 10/100/1000 Ethernet MACs with DMA</li>
<li>Two USB 2.0 On-the-Go (OTG) controllers with DMA</li>
<li>Four I2C controllers</li>
<li>Two CAN (Controller Area Network) controllers</li>
<li>SPI Master and SPI Slave ports</li>
<li>Two UARTs</li>
<li>General-purpose ports</li>
</ul>
<p>On-chip memory includes 64Kbytes of RAM and a boot ROM.</p>
<p>That’s already quite a lot but then there’s the FPGA section  of the SoC FPGA to consider. On-chip FPGA capacity varies depending on whether the particular SoC FPGA device is based on the Cyclone V or Arria V FPGA fabrics. Devices based on the Cyclone V FPGA fabric will be offered with 25K, 40K, 85K, and 110K logic elements. Devices based on the Arria V FPGA fabric will be offered with 350K and 460K logic elements.</p>
<p>The HPS in the Altera SoC FPGA connects to the on-chip FPGA fabric through two 128-bit AXI buses, one for reads and one for writes. As you can see from the block diagram above, the hard-core peripherals not included in the HPS block separately connect to the FPGA fabric. What’s not apparent from the diagram is that the two ARM Cortex-A9 processors share a Snoop Control Unit (SCU) and there&#8217;s an ACP (accelerator coherency port) linking the HPS to the FPGA fabric so it’s possible to engineer accelerators that maintain coherency with the ARM Cortex-A9 processor cores&#8217; caches and implement them using the on-chip FPGA fabric.</p>
<p>In addition to the six FPGA array sizes (four for the devices based on the Cyclone V FPGA fabric and two for devices based on the Arria V FPGA fabric), Altera plans to offer parts with three HPS subsystem configurations: base, mid, and high. Combined with the six FPGA fabric sizes, that means there are at least 18 Altera SoC FPGA parts planned for the initial product lineup. Altera says that there will also be 1-processor variants in the SoC FPGA lineup. Just in case you suspect that’s perhaps a bit underpowered, keep in mind that essentially 100% of all system designs based on microcontrollers use a far less capable processor core than one 800MHz ARM Cortex-A9 core. You might want to check to make sure you’re not becoming overly acclimatized to multicore designs. On the other hand, if you’re running Android then two capable processor cores will come in handy.</p>
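<p>The count of 18 is simply every fabric size paired with every HPS configuration. Here is a minimal sketch of that enumeration; it assumes each pairing becomes a distinct part, which is how I read the announcement, and the variable names are mine rather than Altera part numbers.</p>
<pre>
```python
from itertools import product

# Logic-element counts announced for each FPGA fabric family.
fabric_sizes = {
    "Cyclone V": ["25K", "40K", "85K", "110K"],
    "Arria V": ["350K", "460K"],
}
hps_configs = ["base", "mid", "high"]

# Flatten to (family, size) pairs, then pair each with every HPS configuration.
fabrics = [(family, size) for family, sizes in fabric_sizes.items() for size in sizes]
variants = list(product(fabrics, hps_configs))

print(len(fabrics))   # 6
print(len(variants))  # 18
```
</pre>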
<p>As the block diagram above shows, there are additional hard-core peripherals connected to the SoC FPGA chip’s FPGA array: as many as three more multiport SDRAM controllers, a Gen2 x4 PCIe port (supplemented with the possibility of implementing a soft Gen2 x8 PCIe port in the FPGA fabric), and as many as six 10Gbps high-speed, differential  serial transceivers and as many as thirty 6Gbps high-speed, differential  serial transceivers. These additional peripheral ports have separate access paths into the FPGA fabric of the SoC FPGA devices.</p>
<p>Perhaps the most interesting news is that low-end members of the Altera SoC FPGA family will sell for $15 in “high volumes.” That’s a lot of capability for a relatively low price. In fact, that’s a very low price in the FPGA world. The bad news is that Altera doesn’t plan to ship devices until the second half of 2012.</p>
<p>So that’s the Altera SoC FPGA. Now for the inevitable comparison based on my previous write-ups of the Xilinx Zynq. (See “<a href="http://low-powerdesign.com/sleibson/2011/03/01/xilinx-zynq-epps-create-a-new-category-that-fits-in-among-socs-fpgas-and-microcontrollers/" target="_blank">Xilinx Zynq EPPs create a new category that fits in among SoCs, FPGAs, and microcontrollers</a>”.) First, there’s the processor complex—what Altera calls the HPS. The two products are remarkably similar here: two 800MHz ARM Cortex-A9 processor cores with Neon DSP and FPU extensions, 512Kbytes of unified L2 cache, Flash controller, one SDRAM controller, Snoop Control Unit, timers and watchdog, DMA, etc. Both processor complexes support an ACP (Accelerator Coherency Port) interface into the FPGA fabrics.</p>
<p>There’s some difference in the processor complex-to-FPGA connection scheme: Altera offers one 128-bit read and one 128-bit write AXI bus and a 32-bit APB (Advanced Peripheral Bus) port plus additional ports that go directly from the FPGA fabric to the multiport SDRAM controller in the HPS. Xilinx offers four 32-bit and four 64-bit AXI ports plus direct access from the FPGA fabric to the SDRAM controller. So the Xilinx parts theoretically provide more raw interconnect bandwidth between the processor complex and the FPGA fabric than do the Altera parts. It remains to be seen if that raw capability can deliver more bandwidth in practice, but the potential is clearly there.</p>
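<p>To put rough numbers on that comparison, the sketch below just adds up the AXI data-path widths quoted above. It deliberately ignores clock rates, the APB port, and the direct SDRAM paths, none of which I have figures for, so treat it as a width comparison only and not a bandwidth measurement.</p>
<pre>
```python
# AXI data-path widths in bits, as quoted in the two announcements.
altera_axi_bits = [128, 128]           # one read bus, one write bus
xilinx_axi_bits = [32] * 4 + [64] * 4  # four 32-bit and four 64-bit ports

altera_total = sum(altera_axi_bits)
xilinx_total = sum(xilinx_axi_bits)

print(altera_total)                 # 256
print(xilinx_total)                 # 384
print(xilinx_total / altera_total)  # 1.5
```
</pre>
<p>On width alone, the Xilinx arrangement comes out about 1.5 times wider, which is the "theoretical" advantage mentioned above.</p>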
<p>But wait! Hold on there! The Altera SoC FPGA parts offer as many as three more SDRAM controllers outside of the hard-core HPS processor complex and those SDRAM controllers can be connected either to devices implemented as soft cores in the on-chip FPGA fabric or through the FPGA fabric to the HPS. That added SDRAM control capability could really be an advantage in systems with extremely high SDRAM bandwidth requirements.</p>
<p>Then there’s the PCIe controller. On the Altera SoC FPGAs, there’s one hard-core Gen2 x4 PCIe port and the possibility of implementing a second, soft-core Gen2 x8 PCIe port in the FPGA fabric. The Xilinx Zynq parts will provide a hard-core Gen2 x4 or x8 PCIe port, depending on the family member. There are additional 10.3Gbps serial channels available on the Xilinx Zynq components, so a soft-core PCIe controller is a possibility, as it is for the Altera SoC FPGAs.</p>
<p>Since I’ve brought up the topic of the FPGA fabric, let’s compare those as well. The various Altera SoC FPGA family members offer six FPGA fabric sizes: 25K, 40K, 85K, 110K, 350K, and 460K logic elements. The announced Xilinx Zynq family offers four fabric sizes: 30K, 85K, 125K, and 235K logic elements. So if you need really big FPGA fabrics to complement the capabilities provided by the processor complexes, then the Altera SoC FPGA family seems to offer more capacity for now. However, should a battlefield form at the high end, you can bet that Xilinx will be filling out the product line at the high end, where there’s more margin to be made.</p>
<p>Finally there’s pricing and availability. Both companies have announced high-volume unit pricing “below $15”  but the Xilinx parts are supposed to be available this year and the Altera parts are scheduled to appear in the latter part of next year.</p>
<p>Together, today&#8217;s Altera SoC FPGA announcement and the previous Xilinx Zynq announcements create a truly exciting new product category—one that fuses FPGAs with high-performance microprocessors in a way guaranteed to dramatically extend the reach of FPGAs. The resulting mixture of capability, performance, power consumption, and cost simply cannot be replicated with a 2-chip design.</p>
<p>I predict that many system designers will be unable to resist this combination. Naysayers will point to previous failed attempts at merging FPGAs and hard microprocessor cores and some will predict a similar fate for this new generation of parts. Much has changed. First, the embedded industry has adopted the ARM architecture and there is a large body of programming talent available for this architecture. Second, these new parts are not FPGAs with processor cores tacked on. They are very capable and complete processor complexes, application processors in their own right, augmented with FPGA fabrics. From my perspective, the Altera SoC FPGAs and Xilinx Zynq parts stand a very good chance of defining a new and vibrant component category.</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2011/10/11/altera-introduces-soc-fpga-melding-arm-cortex-a9-dual-core-processor-complex-with-a-28nm-fpga-fabric/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Think Globally, Act in Parallel. What can you do with one million ARM cores acting in parallel and how do you get there?</title>
		<link>http://low-powerdesign.com/sleibson/2011/07/16/think-globally-act-in-parallel-what-can-you-do-with-one-million-arm-cores-acting-in-parallel-and-how-do-you-get-there/</link>
		<comments>http://low-powerdesign.com/sleibson/2011/07/16/think-globally-act-in-parallel-what-can-you-do-with-one-million-arm-cores-acting-in-parallel-and-how-do-you-get-there/#comments</comments>
		<pubDate>Sat, 16 Jul 2011 23:47:06 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[ARM]]></category>
		<category><![CDATA[CMOS]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[DRAM]]></category>
		<category><![CDATA[Low-Power]]></category>
		<category><![CDATA[Networking]]></category>
		<category><![CDATA[SDRAM]]></category>
		<category><![CDATA[SOC]]></category>
		<category><![CDATA[SRAM]]></category>
		<category><![CDATA[Cortex-M0]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[Samsung]]></category>
		<category><![CDATA[SpiNNaker]]></category>
		<category><![CDATA[UMC]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=615</guid>
		<description><![CDATA[Professor Steve Furber’s SpiNNaker project is in the news again. I wrote about Furber’s massively parallel brain-emulation project back on March 30 after listening to his keynote at this year’s DATE (Design Automation and Test Europe) conference in Grenoble, France. &#8230; <a href="http://low-powerdesign.com/sleibson/2011/07/16/think-globally-act-in-parallel-what-can-you-do-with-one-million-arm-cores-acting-in-parallel-and-how-do-you-get-there/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Professor Steve Furber’s SpiNNaker project is in the news again. I wrote about Furber’s massively parallel brain-emulation project back on March 30 after listening to his keynote at this year’s DATE (Design Automation and Test Europe) conference in Grenoble, France. (See “<a href="http://low-powerdesign.com/sleibson/2011/03/30/the-incredible-vanishing-power-of-a-machine-instruction-is-this-the-way-to-the-brain/" target="_blank">The incredible vanishing power of a machine instruction. Is this the way to the brain?</a>”) Furber’s DATE keynote title says it all: “Biologically-inspired massively-parallel architectures—computing beyond a million processors.” Furber and his team are referencing nature to help them tackle the really hard processing problems we need to solve in the future through massively parallel, brain-like computing. Brain-like computing—go slow, go wide, go massively parallel—seems to offer a proven, low-power approach to solving some of these big computational problems.</p>
<p>The SpiNNaker project is again in the news at EETimes Europe (see “<a href="http://www.electronics-eetimes.com/en/a-million-arm-cores-to-host-brain-simulator.html?cmp_id=7&amp;news_id=222908354&amp;vID=209" target="_blank">A million ARM cores to host brain simulator</a>”) and the idea of harnessing one million ARM processor cores is certainly a big idea. It excites me. However, we’re still at the humble beginnings of the project.</p>
<p>The SpiNNaker project’s first test chip harnesses 18 ARM9 cores on one 130nm chip manufactured by UMC in Taiwan. This is a 100M-transistor chip and, like most many-processor SoCs, the SpiNNaker SoC mostly consists of memory. The memory needs to be close to the processors for speed and for low-power consumption and there are 55 32Kbyte SRAM blocks on the SpiNNaker die. That’s 14 million bits of SRAM and, frankly speaking, that’s really not very much SRAM. Eighteen processors isn’t really a large number of processors either when your stated goal is one million.</p>
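<p>For anyone who wants to check the SRAM arithmetic, here is the calculation spelled out. It assumes 1 Kbyte means 1024 bytes; the constant names are mine.</p>
<pre>
```python
SRAM_BLOCKS = 55
KBYTES_PER_BLOCK = 32

# Blocks times Kbytes per block, then Kbytes to bytes, then bytes to bits.
total_bits = SRAM_BLOCKS * KBYTES_PER_BLOCK * 1024 * 8

print(total_bits)                  # 14417920
print(round(total_bits / 1e6, 1))  # 14.4
```
</pre>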
<p>The ARM processors on the SpiNNaker chip use packet communications to emulate the electrical spike communications that occur among the neurons in human and animal brains. From a hardware perspective, I think it’s easy to conceive of a system-level design like this and even conceptually scaling the design to a million connected ARM9 processors isn’t really hard, as long as you don’t try to enumerate all of the processors in your mind. However, with 18 processors per chip, you’ll need approximately 55,600 chips to build an interconnected network of one million processors. That’s still a mighty big box of hardware. More on that in a bit.</p>
<p>The rub is that we really don’t have many good ideas for programming such a massively parallel system. The SpiNNaker project seems to be mostly a hardware endeavor with the explicitly stated intent of developing a hardware testbed for brain researchers who will use SpiNNaker systems for studying various theories of brain function. Presumably, we’ll learn more about massively parallel programming by working with these systems and no doubt we will. As Furber says in a quote published in the EETimes Europe article, “We don&#8217;t know how the brain works as an information-processing system, and we do need to find out. We hope that our machine will enable significant progress towards achieving this understanding.&#8221;</p>
<p>Each SpiNNaker chip in the current design is bundled with a 166MHz, 1Gbit DDR SDRAM and packaged in a 300-pin BGA package. But we’re not going to be building million-processor testbeds with 18 processors per packaged chip. I’m almost absolutely, positively certain about that. This first SpiNNaker prototype just doesn’t scale to one million processors very easily. So the question is, how to get there?</p>
<p>Well, possible clues to answer that question can be found in two recent blogs that I wrote on the <strong>EDA360 Insider</strong> blog. First, Samsung has just announced successful tapeout of a 20nm test chip incorporating an ARM Cortex-M0 processor core. (See “<a href="http://eda360insider.wordpress.com/2011/07/12/samsung-20nm-test-chip-includes-arm-cortex-m0-processor-core-how-many-will-fit-on-the-head-of-a-pin/" target="_blank">Samsung 20nm test chip includes ARM Cortex-M0 processor core. How many will fit on the head of a pin?</a>”) Now an ARM Cortex-M0 processor is not as powerful as an ARM9 processor, but then it’s not supposed to be. It’s designed for control-oriented applications and its 3-stage execution pipeline isn’t designed to get maximum speed from any given process technology. However, we’re building a system that emulates a brain that operates at a few hundred Hertz (that’s <strong>Hertz</strong>, not kilohertz, megahertz, or gigahertz) so I really don’t think the clock speed is all that critical when you’re talking about a million processors. The ARM Cortex-M0 processor core is still a 32-bit RISC processor and I am guessing with a high degree of confidence that it’s fully up to the task of executing the required electrical-spike calculations, albeit not quite as quickly as an ARM9 processor.</p>
<p>What’s interesting about a 12-to-14Kgate ARM Cortex-M0 processor implemented in 20nm process technology is that my calculations suggest that more than half a million ARM Cortex-M0 processors would fit on a chip the size of an Intel “Tukwila” Itanium processor (OK, that’s a big chip, but it’s a commercial one) and that calculation is based on the published number for the area required by an ARM Cortex-M0 implemented in 90nm process technology, not 20nm. Now there’s a lot of slop in this calculation. First, there’s the disparity of using 90nm numbers instead of 20nm numbers. Then there’s the disparity caused by putting no memory at all into the calculation. I just mentally tiled processors edge to edge. Ditto, there’s no on-chip interconnect.</p>
<p>So you probably won’t get half a million ARM Cortex-M0 processor cores on one 20nm chip. But you might get 100,000 or 200,000 ARM Cortex-M0 processor cores on a chip along with an interesting amount of memory and the required interconnect. Now we’re talking about only a handful of chips to get to one million processors. We’re talking about a tabletop box. Now we’re getting into the realm of the feasible for million-processor systems.</p>
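<p>To put rough numbers on that handful of chips, here is a minimal Python sketch. The per-chip core counts are the guesses from the paragraph above (including the optimistic half-million figure), not measured data, and the function name is simply my own label.</p>

```python
TARGET_CORES = 1_000_000  # the million-processor goal discussed above


def chips_needed(cores_per_chip: int, target: int = TARGET_CORES) -> int:
    """Smallest whole number of chips that reaches the target core count."""
    # Integer ceiling division avoids any floating-point rounding surprises.
    return (target + cores_per_chip - 1) // cores_per_chip


# Per-chip counts are the rough estimates from the text, not vendor figures.
for cores_per_chip in (100_000, 200_000, 500_000):
    print(cores_per_chip, "cores per chip:", chips_needed(cores_per_chip), "chips")
```

<p>With the 100,000-to-200,000 range, the sketch gives five to ten chips, which is what I mean by a handful.</p>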
<p>The second related blog entry I recently wrote in <strong>EDA360 Insider</strong> that also bears on this very interesting endeavor was about an announcement from Imec, a global research company. Just days ago, Imec announced that it and its partners successfully assembled a custom logic chip with two DRAMs in a stacked 3D configuration. (See “<a href="http://eda360insider.wordpress.com/2011/07/14/3d-thursday-imec-prototypes-3d-chip-stack-finds-some-thermal-surprises/" target="_blank">3D Thursday: IMEC prototypes 3D chip stack, finds some thermal surprises</a>”.) This 3D stacked-chip prototype allowed Imec to test out some process ideas for manufacturing 3D stacked chip assemblies and to make some critical thermal tests to verify thermal models that will be so necessary when 3D assembly goes mass market. The 3D chip stack uses copper-tin micro-bumps and compression bonding for the electrical and mechanical assembly of the chip stack and you can see photos of the assembled stack below.</p>
<p>Here’s a photo of the overall chip stack:</p>
<p><a href="http://low-powerdesign.com/sleibson/wp-content/uploads/2011/07/Imec-3D-Chip.bmp"><img class="aligncenter size-full wp-image-616" title="Imec 3D Chip" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2011/07/Imec-3D-Chip.bmp" alt="" /></a></p>
<p>And here’s a close-up of the edge of the chip stack to show the three stacked die.</p>
<p><a href="http://low-powerdesign.com/sleibson/wp-content/uploads/2011/07/Imec-3D-Chip-Closeup.bmp"><img class="aligncenter size-full wp-image-617" title="Imec 3D Chip Closeup" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2011/07/Imec-3D-Chip-Closeup.bmp" alt="" /></a></p>
<p>The 3D Stack’s base chip is approximately 750µm thick. The two top components in the chip stack are each 25µm thick. There’s more technical info in the referenced <strong>EDA360 Insider</strong> blog.</p>
<p>I am convinced that 3D stacking of logic and RAM chips will be absolutely essential to developing massively parallel, low-power systems like the ones envisioned by the SpiNNaker project. First, the only way to feed data and instructions to massively parallel processing chips is through large amounts of on-chip memory and through high-bandwidth, low-energy channels connected to large off-chip memories. 3D assembly techniques permit both Wide I/O and high-speed serial I/O channels to work most effectively and at minimal energy levels and I expect to see rapid adoption of 3D assembly—even and perhaps especially in high-volume, cost-sensitive applications such as mobile phone handsets—in the next few years. This is precisely the sort of manufacturing technology we require to think seriously about million-processor systems.</p>
<p>Now all we need to do is figure out how to program them.</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2011/07/16/think-globally-act-in-parallel-what-can-you-do-with-one-million-arm-cores-acting-in-parallel-and-how-do-you-get-there/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Multicore server, PC, and embedded designs push memory power, drive use of advanced DDR3 SDRAMs</title>
		<link>http://low-powerdesign.com/sleibson/2010/07/02/multicore-server-pc-and-embedded-designs-push-memory-power-drive-use-of-advanced-ddr3-sdrams/</link>
		<comments>http://low-powerdesign.com/sleibson/2010/07/02/multicore-server-pc-and-embedded-designs-push-memory-power-drive-use-of-advanced-ddr3-sdrams/#comments</comments>
		<pubDate>Fri, 02 Jul 2010 21:32:15 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[Design]]></category>
		<category><![CDATA[DRAM]]></category>
		<category><![CDATA[Green Design]]></category>
		<category><![CDATA[Low-Power]]></category>
		<category><![CDATA[SDRAM]]></category>
		<category><![CDATA[DDR2]]></category>
		<category><![CDATA[DDR3]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=413</guid>
		<description><![CDATA[Systems designers try all sorts of methods to reduce system power consumption. For years, we’ve relied on circuit tricks and have been reducing logic supply levels from the 5V power supplies that were so common from the 1970s and &#8230; <a href="http://low-powerdesign.com/sleibson/2010/07/02/multicore-server-pc-and-embedded-designs-push-memory-power-drive-use-of-advanced-ddr3-sdrams/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Systems designers try all sorts of methods to reduce system power consumption. For years, we’ve relied on circuit tricks and have been reducing logic supply levels from the 5V power supplies that were so common from the 1970s through the 1980s to the 1V levels we now employ with today’s advanced logic chips. Memory supply voltages have dropped as well. For example, the original DDR SDRAMs had a 2.5V supply voltage and DDR2 SDRAM employs a 1.8V supply voltage. That’s nearly double today’s SOC, processor, and microcontroller core voltages. The reason for this lag in supply-voltage reduction is that memory vendors prefer to stay in the economic sweet spot for IC lithography, whereas logic designers prefer to stay on or near the bleeding edge. Consequently, memory’s share of a system’s power-consumption pie has been rising and there really hasn’t been much attention paid to reducing memory power consumption. The advent of DDR3 SDRAM provides another opportunity to cut memory power through further reductions in memory supply voltage. Using advanced process technology, Samsung has attained a supply voltage of 1.35V for its 40nm DDR3 SDRAMs. This drop in memory supply voltage can produce a 38% cut in server power consumption, according to Samsung.</p>
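<p>Why does supply voltage matter so much? To a first order, CMOS switching power scales with the square of the supply voltage. The Python sketch below applies only that textbook relation to the voltages quoted above. It is not Samsung’s methodology and it ignores leakage, I/O termination, process shrinks, and density changes, so treat the results as a sanity check rather than a prediction.</p>

```python
def relative_dynamic_power(v_new: float, v_old: float) -> float:
    """Switching power at v_new relative to v_old, all else held equal.

    Uses the first-order CMOS relation P ~ C * V^2 * f, so only the
    voltage ratio matters here.
    """
    return (v_new / v_old) ** 2


# Supply voltages quoted in the text; the labels are just for printing.
for label, v_old in (("original DDR at 2.5V", 2.5), ("DDR2 at 1.8V", 1.8)):
    ratio = relative_dynamic_power(1.35, v_old)
    print(f"1.35V versus {label}: {100 * (1 - ratio):.0f}% less switching power")
```
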
<p> </p>
<p>Performance isn’t really the engine that drives DDR3 adoption. The real driver is bandwidth and there are two design trends that force the quest for ever-increasing amounts of memory bandwidth. The first such design trend is the wholesale adoption of homogeneous and heterogeneous multicore architectures. As an industry, we’ve embraced the use of multiple processor cores as a solution to the death of Dennard scaling. Although most people attribute the increase in operating frequency and the decrease in per-transistor power consumption through lithographic shrinks to Moore’s Law, which Gordon Moore codified in an article he published in 1965 while working at Fairchild Semiconductor, that attribution is not factually correct. Moore simply predicted that the number of transistors on a chip would grow exponentially over time as lithographies shrank. It was IBM’s Robert Dennard who observed in 1974 that lithographic advances in IC manufacturing also consistently produced faster transistors that consumed less power. For decades, we’ve used Dennard scaling to produce faster and faster processors (while attributing the improvements to Moore’s Law).</p>
<p> </p>
<p>The semiconductor industry has poured billions of dollars into keeping Moore’s Law alive but Dennard scaling died at 90nm. We continue to get more transistors on a chip with each advance in IC lithographic scaling, but the transistors no longer get appreciably faster, so the MHz wars have ended. Worse, pushing transistors to their performance limit now produces leaky transistors that dissipate as much power when off as when on. We now recognize that the way to get more performance is to use the transistor bounty to increase the number of processors and to distribute the work load across these processors without striving for multi-GHz clock rates.</p>
<p> </p>
<p>With all of these on-chip processors executing code and accessing data on a multicore chip, system designers must find a way to make large amounts of inexpensive memory available to these processors. For the last decade, the most cost-effective way to provide a system with large amounts of low-cost memory has been the SDRAM. The classic system design teams a multicore processor or SOC with one or more SDRAM channels. As memory bandwidth needs rise, both the SDRAMs’ per-channel transfer rate and the number of SDRAM channels used have increased. DDR transfer rates have now reached and exceeded 1600 Mtransfers/sec and it’s not uncommon to find server processors with three SDRAM channels, for example. Because of the constant thirst for memory bandwidth, DDR3 SDRAM sales exceeded DDR2 SDRAM sales beginning with the first quarter of 2010, according to the leading SDRAM vendor Samsung, and the company expects DDR2’s share of SDRAM market sales to drop below 20% by the end of the year.</p>
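<p>Here is a quick sketch of what those transfer rates mean in bytes. It assumes a 64-bit data bus per SDRAM channel, which is my assumption based on the usual DIMM width and is not stated above, and it computes theoretical peak bandwidth only.</p>

```python
def peak_bandwidth_gbytes(mtransfers_per_sec: float,
                          bus_width_bits: int = 64,
                          channels: int = 1) -> float:
    """Theoretical peak bandwidth in Gbytes/sec.

    bus_width_bits=64 is an assumed per-channel data width; sustained
    bandwidth in a real system will be lower than this peak.
    """
    bytes_per_transfer = bus_width_bits / 8
    return mtransfers_per_sec * 1e6 * bytes_per_transfer * channels / 1e9


print(peak_bandwidth_gbytes(1600))              # one channel
print(peak_bandwidth_gbytes(1600, channels=3))  # three-channel server processor
```

<p>Under that assumption, one 1600-Mtransfers/sec channel peaks at 12.8 Gbytes/sec and three channels peak at 38.4 Gbytes/sec.</p>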
<p> </p>
<p>When you move that much data between a processor and memory, you’re likely to dissipate a considerable amount of power and indeed, memory power consumption has been on the rise. Lowering memory power consumption can substantially lower system-level power consumption. For example, states Samsung, going to 40nm, 2-Gbit DDR3 SDRAM with a 1.35V power supply can cut a server’s memory power consumption by 80% compared to the equivalent number of storage bits implemented with 60nm, 1-Gbit, DDR2 SDRAMs running at 1.8V and can even cut memory power consumption by 38% compared to equal-sized memory arrays consisting of 60nm, 1-Gbit, DDR2 SDRAMs running at 1.5V.</p>
<p>As a result, according to Samsung’s measurements, 40nm, 2-Gbit DDR3 SDRAMs running at 1.35V can cut power by an astonishing 38% at the system level for servers. To put that into economic perspective, says Samsung, the use of 1.35V DDR3 SDRAMs in a server can save 2564 kilowatt-hours per year. Samsung estimates that there will be 32 million servers operating in data centers worldwide by the end of this year. If they all were equipped with 1.35V DDR3 memory, the annual power consumption would be reduced by 82 terawatt-hours, worth an estimated $28 billion. That kind of money gets any data-center manager’s attention.</p>
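<p>The arithmetic behind those totals is easy to check. The short sketch below multiplies out Samsung’s per-server and server-count figures and then backs out the electricity price implied by the $28 billion estimate. The variable names are mine.</p>

```python
KWH_SAVED_PER_SERVER = 2564        # per year, Samsung's figure
SERVERS_WORLDWIDE = 32_000_000     # Samsung's estimate for year end
CLAIMED_VALUE_USD = 28e9           # Samsung's estimated annual value

total_kwh = KWH_SAVED_PER_SERVER * SERVERS_WORLDWIDE
total_twh = total_kwh / 1e9                        # 1 TWh is 1e9 kWh
implied_usd_per_kwh = CLAIMED_VALUE_USD / total_kwh

print(f"{total_twh:.1f} TWh per year")
print(f"{implied_usd_per_kwh:.2f} dollars per kWh implied")
```

<p>The energy total works out to about 82 terawatt-hours, matching Samsung’s number. The implied price is roughly 34 cents per kilowatt-hour, which looks high for a bare utility rate, so the dollar estimate may fold in costs beyond the raw electricity. The figures above don’t say.</p>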
<p>The same sort of energy savings applies to any multicore system whether it’s a server, a PC, or an embedded system based on a heterogeneous multicore processor design.</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2010/07/02/multicore-server-pc-and-embedded-designs-push-memory-power-drive-use-of-advanced-ddr3-sdrams/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>SPMT engulfs LPDDR2 standard, making adoption a no-brainer. Meanwhile Marvell jumps on the bandwagon.</title>
		<link>http://low-powerdesign.com/sleibson/2010/06/07/spmt-engulfs-lpddr2-standard-making-adoption-a-no-brainer-coincidentally-marvell-jumps-on-the-bandwagon/</link>
		<comments>http://low-powerdesign.com/sleibson/2010/06/07/spmt-engulfs-lpddr2-standard-making-adoption-a-no-brainer-coincidentally-marvell-jumps-on-the-bandwagon/#comments</comments>
		<pubDate>Mon, 07 Jun 2010 09:00:05 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[Design]]></category>
		<category><![CDATA[DRAM]]></category>
		<category><![CDATA[Low-Power]]></category>
		<category><![CDATA[LPDDR2]]></category>
		<category><![CDATA[SDRAM]]></category>
		<category><![CDATA[SOC]]></category>
		<category><![CDATA[DDR2]]></category>
		<category><![CDATA[DDR3]]></category>
		<category><![CDATA[DDR4]]></category>
		<category><![CDATA[Hynix]]></category>
		<category><![CDATA[Marvell]]></category>
		<category><![CDATA[multicore]]></category>
		<category><![CDATA[Samsung]]></category>
		<category><![CDATA[SPMT]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=368</guid>
		<description><![CDATA[An insidious power problem has slowly crept up on embedded-system designers. While most of us were firmly focused on the power dissipation of our ever-expanding logic designs with their increasing number of processor cores in multicore designs, we mostly ignored &#8230; <a href="http://low-powerdesign.com/sleibson/2010/06/07/spmt-engulfs-lpddr2-standard-making-adoption-a-no-brainer-coincidentally-marvell-jumps-on-the-bandwagon/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p style="text-align: left;"><img class="alignright size-full wp-image-371" style="border: white 10px solid;" title="SPMT Logo" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2010/06/SPMT-Logo1.jpg" alt="SPMT Logo" width="200" height="64" />An insidious power problem has slowly crept up on embedded-system designers. While most of us were firmly focused on the power dissipation of our ever-expanding logic designs with their increasing number of processor cores in multicore designs, we mostly ignored the huge leaps in power consumption being caused by the rapid growth in memory size and big jumps in memory-access speeds and memory bandwidth. To cut memory costs, most high-end mobile and embedded designs today employ one high-bandwidth SDRAM device or array to satisfy all of a system&#8217;s memory requirements. Yet we think very little about the power impact of hooking big DDR SDRAMs up to our SOCs and ASICs—and these SDRAMs run at clock rates measured in hundreds of MHz or GHz, at transfer rates that are double the clock rate. It takes some real power to sling bits between a processor and SDRAM at transfer rates approaching or exceeding 1 Gtransfers/sec and even though the supply and I/O voltages have been dropping on SDRAM keeping memory power somewhat in check (only somewhat), wide DDR2 and DDR3 memory interfaces that deliver the highest bandwidths may now consume Watts of power. Watts! This simply cannot stand.</p>
<p>Not coincidentally, that’s the position of the <a href="http://www.spmt.org/index.aspx">SPMT (Serial Port Memory Technology) Consortium</a>, which has been developing a low-power, high-performance memory interface for mobile and embedded applications. The low-power aspect arises primarily from SPMT’s use of low-voltage differential signaling (LVDS), which transfers information using 150 mV differential signal swings instead of single-ended, ground-referenced signal swings of more than a volt. The high-performance aspect arises from the use of multi-Gbits/sec transfer rates per SPMT data lane.</p>
<p>But there’s been a big, ugly fly in the SPMT ointment. Memory vendors know that more than 80% of all DRAMs go into PCs and servers and they stick with memory designs (and memory interfaces in particular) that best suit the needs of PC and server designers. Today, that means DDR2 memory, which is the mainstream DRAM technology, but the industry is quickly switching to DDR3. DDR4 is as yet undefined but it too is a rapidly approaching memory-interface specification that will most assuredly &#8220;fix&#8221; the problems we have with DDR3. These PC- and server-centric, high-speed parallel SDRAM interfaces burn a lot of power to deliver high bandwidth, which creates the niche opportunity that the SPMT Consortium has been trying to fill for mobile and embedded designs. Unfortunately, DDR memory has such a huge presence in the DRAM arena that there’s been little chance for any other interface approach to take hold.</p>
<p>Until now.</p>
<p>Today, the SPMT Consortium announced a major revision to the SPMT standard that may well spell the difference between an interesting technical exercise and an immensely successful new memory-interface standard. Previously, the SPMT specification multiplexed read/write commands and the data on the same unidirectional LVDS lanes. Doing so somewhat reduced the throughput on the data lines but it also reduced the memory pin count because SPMT memory didn’t need separate control/address (CA) lines. The reduced pin count was considered a major benefit that reduced the cost of packaged SPMT memory devices. The new SPMT specification, which completely supersedes the prior specification, does away with this control/address/data multiplexing in favor of using the same CA signal and pin definitions that LPDDR2 memory uses to carry control and address signaling.</p>
<p>This is a significant and important change to the SPMT spec because LPDDR2 is already poised to take over the mobile and embedded design spaces. (See <a href="http://www.denali.com/wordpress/index.php/dmr/2010/05/20/lpddr2-the-new-mainstream-memory-for-emb">LPDDR2: The new mainstream memory for embedded and mobile applications?</a> on Denali Software’s Memory Report blog.) Further, four pairs of unidirectional SPMT data lanes now precisely overlap the 16 bidirectional data lines of a x16 LPDDR2 memory, making it possible to build one memory chip that can support both LPDDR2 and SPMT protocols using the same set of pins. What that means is that with only a few changes to the memory controller and memory PHY, an SOC or embedded processor can accommodate both LPDDR2 and SPMT memory using exactly the same set of interface pins. It also means that SDRAMs designed to the new SPMT specification can be used as LPDDR2 SDRAMs, ensuring a ready market when commercial SPMT SDRAMs first hit the market near the end of 2011—assuming things go according to the SPMT Consortium&#8217;s current plans.</p>
<p>So where’s the power advantage? It kicks in after the required SDRAM transfer rate hits a critical level. For example, the SPMT Consortium’s data estimates that a x32 LPDDR2 memory interface operating at 400MHz dissipates about 180mW while providing 3.2 Gbytes/sec of peak data throughput over 32 data lines (800 Mbits/sec/pin) and 360mW at a peak data throughput of 6.4 Gbytes/sec over 64 data lines. (Regular old DDR2 and DDR3 SDRAM interfaces would consume a lot more power than this.) By contrast, the SPMT interface dissipates 180mW while transferring 6.4 Gbytes/sec over eight data lanes (8 Gbits/sec/lane) and 360mW when transferring 12.8 Gbytes/sec over 16 data lanes. So the SPMT interface appears to be about twice as power efficient as the LPDDR2 interface at higher data rates, which LPDDR2 memory can’t attain without resorting to a very wide data bus and using several memory devices in the bargain. However, the LPDDR2 parallel interface has a power advantage over the SPMT serial interface at lower transfer rates. So LPDDR2 memory might suffice for today’s embedded and mobile applications and might also suffice for low-activity modes in future applications.</p>
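<p>One way to compare the two interfaces is energy per transferred bit. The sketch below uses only the power and throughput figures quoted above from the SPMT Consortium. The function names are mine, and the results describe peak operation, not a measured workload.</p>

```python
def picojoules_per_bit(milliwatts: float, gbytes_per_sec: float) -> float:
    """Interface energy per payload bit at the quoted peak throughput."""
    watts = milliwatts * 1e-3
    bits_per_sec = gbytes_per_sec * 8e9
    return watts / bits_per_sec * 1e12


def gbits_per_line(gbytes_per_sec: float, lines: int) -> float:
    """Payload rate carried by each data line or lane."""
    return gbytes_per_sec * 8 / lines


# (label, power in mW, throughput in Gbytes/sec, data lines or lanes)
cases = (
    ("x32 LPDDR2", 180, 3.2, 32),
    ("8-lane SPMT", 180, 6.4, 8),
)
for label, mw, gbytes, lines in cases:
    print(label,
          f"{picojoules_per_bit(mw, gbytes):.2f} pJ/bit,",
          f"{gbits_per_line(gbytes, lines):.1f} Gbits/sec per line")
```

<p>By this measure, the LPDDR2 interface spends about 7 pJ per bit and the SPMT interface about 3.5 pJ per bit, which is the factor of two. The payload rate works out to about 800 Mbits/sec on each LPDDR2 data line and 6.4 Gbits/sec on each SPMT lane. The 8-Gbits/sec/lane figure may be a raw line rate that includes encoding overhead, but that is my guess and the consortium’s numbers above don’t confirm it.</p>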
<p style="text-align: left;">The graph below, supplied by SPMT, tells the story. The graph shows that at low data rates, LPDDR2 memory dissipates less power than SPMT memory, largely because of the DLL integrated into SPMT memory. (DLLs consume non-negligible amounts of power and although DDR2 and DDR3 memories incorporate DLLs, LPDDR2 memory does not.) So the SPMT Consortium has done something very smart and has developed an integrated mode-switching mechanism called SerialSwitch, which allows an SDRAM controller to programmably shift an SPMT memory between its LPDDR2 and SPMT serial interface modes using a control register built into the memory device.</p>
<p style="text-align: left;"> </p>
<p style="text-align: left;"> <img class="size-full wp-image-372 aligncenter" title="Memory Crossover" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2010/06/Memory-Crossover.jpg" alt="Memory Crossover" width="600" height="351" /></p>
<p style="text-align: left;"> </p>
<p style="text-align: left;">Mobile phone vendors and other embedded/mobile system designers know that video will be heavily used in many future products and they also know that memory transfer-rate and bandwidth requirements will only go up as a result. SPMT&#8217;s SerialSwitch mechanism provides a way for one memory device to support both low- and high-bandwidth operating modes with an appropriate level of power consumption depending on a system&#8217;s instantaneous bandwidth requirements. By definition, all commercial SPMT memories will incorporate the SerialSwitch feature. The following figure shows how the SPMT SerialSwitch mechanism works.</p>
<p style="text-align: left;"> </p>
<p style="text-align: left;"><img class="aligncenter size-full wp-image-373" style="margin-left: 0px; margin-right: 0px;" title="SerialSwitch" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2010/06/SerialSwitch.jpg" alt="SerialSwitch" width="600" height="286" /></p>
<p style="text-align: left;"> </p>
<p>During time Tg, the figure shows SPMT memory operating as a x16 LPDDR2 memory. Note that the data lines (DQ/HS) employ full-voltage, single-ended signaling in this mode and that the memory’s DLL is off, which saves power. At the beginning of time Th, the system determines that more bandwidth is or soon will be needed, so it directs the memory controller to send a command to the memory to spin up the DLL in preparation for switching to SPMT serial mode. That process takes 5 to 10 microseconds. During this time, the memory continues to operate as an LPDDR2 memory, so the DLL spin-up time is hidden and doesn’t interfere with system operation, but power consumption will rise. Once the SPMT memory’s DLL has spun up, at time Ti, the system&#8217;s memory controller commands the SPMT memory to switch to serial communications mode. This transition takes a maximum of 10 clock cycles. After that and during time Tj in the figure above, the memory operates in SPMT serial-communications mode. Note that the data lines have switched to LVDS signaling, as shown in the figure. LVDS signaling reduces the memory interface&#8217;s power consumption. At some later time depending on system requirements, the memory controller can power down the memory (shown as time Tk) or switch back to LPDDR2 mode (shown in the figure as the period that follows the power-down interval starting at time Tk). Don’t be misled by this figure, by the way: SPMT memory need not pass through the power-down mode to switch from SPMT-serial communications to LPDDR2 mode.</p>
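<p>The sequence above amounts to a small state machine, which can be sketched in Python as follows. This is only my reading of the figure. The mode names, the transition table, and the <code>SerialSwitchModel</code> class are hypothetical bookkeeping on the controller side, not the SPMT register interface, and the table includes only the transitions described in the text.</p>

```python
from enum import Enum, auto


class Mode(Enum):
    LPDDR2 = auto()             # time Tg: single-ended signaling, DLL off
    LPDDR2_DLL_WARMUP = auto()  # time Th: still LPDDR2 traffic while the DLL spins up
    SPMT_SERIAL = auto()        # time Tj: LVDS serial lanes
    POWER_DOWN = auto()         # time Tk


# Only the transitions described in the text; a real device may allow others.
ALLOWED = {
    Mode.LPDDR2: {Mode.LPDDR2_DLL_WARMUP},
    Mode.LPDDR2_DLL_WARMUP: {Mode.SPMT_SERIAL},
    Mode.SPMT_SERIAL: {Mode.POWER_DOWN, Mode.LPDDR2},
    Mode.POWER_DOWN: {Mode.LPDDR2},
}


class SerialSwitchModel:
    """Hypothetical model of the mode sequence, not a device driver."""

    def __init__(self):
        self.mode = Mode.LPDDR2      # assumed starting mode
        self.history = [self.mode]

    def request(self, target):
        """Move to target if the text describes that transition."""
        if target not in ALLOWED[self.mode]:
            raise ValueError(f"{self.mode.name} to {target.name} is not modeled")
        self.mode = target
        self.history.append(target)
        return self.mode


model = SerialSwitchModel()
for step in (Mode.LPDDR2_DLL_WARMUP, Mode.SPMT_SERIAL, Mode.LPDDR2):
    model.request(step)
print([m.name for m in model.history])
```

<p>Note that the model rejects a jump straight from LPDDR2 mode to serial mode because the DLL must spin up first, while the return trip from serial mode to LPDDR2 mode needs no stop in power-down.</p>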
<p>Systems can use SPMT memory in LPDDR2 mode at boot time and whenever the system is operating in a mode with low memory-bandwidth requirements. The system can quickly switch to the LVDS SPMT-serial mode whenever it requires higher memory data rates—for example when video is activated, when multiple operating modes are in use simultaneously, or when multiple processors are running in a multicore device. The SPMT Consortium estimates that the optimum crossover point between LPDDR2 and SPMT serial interface data rates for a x16/8-lane LPDDR2/SPMT-serial memory device is around 1.6 Gbytes/sec based on energy considerations.</p>
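<p>A memory controller could turn that crossover estimate into a simple switching policy. The Python sketch below is one hypothetical way to do it. The 1.6-Gbytes/sec threshold comes from the consortium’s estimate for a x16/8-lane device, while the function name, the mode strings, and the 10% hysteresis margin are my own assumptions, added so the memory doesn’t flip modes repeatedly when demand hovers near the crossover.</p>

```python
CROSSOVER_GBYTES_PER_SEC = 1.6  # consortium estimate for a x16/8-lane device


def pick_mode(demand_gbytes_per_sec: float,
              current: str = "lpddr2",
              margin: float = 0.10) -> str:
    """Choose an interface mode from the expected bandwidth demand.

    margin is an assumed hysteresis band around the crossover point;
    inside the band, the current mode is kept.
    """
    upper = CROSSOVER_GBYTES_PER_SEC * (1 + margin)
    lower = CROSSOVER_GBYTES_PER_SEC * (1 - margin)
    if demand_gbytes_per_sec >= upper:
        return "spmt_serial"
    if demand_gbytes_per_sec <= lower:
        return "lpddr2"
    return current


print(pick_mode(0.4))                          # light load, e.g. after boot
print(pick_mode(3.2))                          # heavy load, e.g. video running
print(pick_mode(1.6, current="spmt_serial"))   # inside the band: no change
```
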
<p>By subsuming the LPDDR2 standard and making SPMT memories wholly superset compatible with LPDDR2 memories, I think the SPMT consortium has significantly raised the likelihood of adoption when commercial SPMT memories finally appear late next year. I also think the likelihood of such memories appearing is pretty high considering that the top two DRAM vendors, Samsung and Hynix, are members of the SPMT Consortium. Together, Samsung and Hynix have a bit more than half of the overall DRAM market according to the latest stats from the DRAMeXchange (<a href="http://j.mp/aNaNiY">http://j.mp/aNaNiY</a>).</p>
<p>On the embedded processor side of the equation, Marvell has announced that it too has joined the consortium, which further improves SPMT’s chances of success. In fact, Marvell supplied a canned quote for the SPMT Consortium&#8217;s press release with one of the strongest statements I&#8217;ve seen in such press releases, so I am suspending my usual cynicism about such quotes and reproduce it here:</p>
<p><em>“Today’s mobile DRAM technology is geared to support the bandwidth needs of single core processors. As devices evolve to integrate multi-core CPU, multi shader 3D graphic engines at multi-GigaHertz speeds, it’s clear that DRAM will be the single performance bottleneck, especially for handheld systems where power budget is a major constraint,” said Dr. Sehat Sutardja, chairman, president and chief executive officer at Marvell. “Marvell is joining the SPMT Consortium to actively promote Serial Port Memory Technology as an industry standard and address the immediate needs of the industry. We encourage other companies active in the sector to join us in our mission.”</em></p>
<p>Strong backing like this from a market maker like Marvell can only help SPMT&#8217;s cause. Whether or not SPMT actually reaches critical mass is something that we’ll all be watching as events unfold in the hotly competitive memory arena over the next 18 to 24 months.</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2010/06/07/spmt-engulfs-lpddr2-standard-making-adoption-a-no-brainer-coincidentally-marvell-jumps-on-the-bandwagon/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>State-of-the-Art in Low-Power Memory: Denali’s MemCon</title>
		<link>http://low-powerdesign.com/sleibson/2009/06/30/state-of-the-art-in-low-power-memory-denali%e2%80%99s-memcon/</link>
		<comments>http://low-powerdesign.com/sleibson/2009/06/30/state-of-the-art-in-low-power-memory-denali%e2%80%99s-memcon/#comments</comments>
		<pubDate>Tue, 30 Jun 2009 16:06:42 +0000</pubDate>
		<dc:creator>sleibson321</dc:creator>
				<category><![CDATA[DRAM]]></category>
		<category><![CDATA[Flash]]></category>
		<category><![CDATA[Low-Power]]></category>
		<category><![CDATA[LPDDR]]></category>
		<category><![CDATA[LPDDR2]]></category>
		<category><![CDATA[SDRAM]]></category>

		<guid isPermaLink="false">http://low-powerdesign.com/sleibson/?p=49</guid>
		<description><![CDATA[Need gobs of cheap RAM? Need it to operate at the lowest possible power? This blog&#8217;s for you. I attended Denali&#8217;s ninth annual MemCon conference a few days ago. It was three days of intensive discussion about the state of &#8230; <a href="http://low-powerdesign.com/sleibson/2009/06/30/state-of-the-art-in-low-power-memory-denali%e2%80%99s-memcon/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><em>Need gobs of cheap RAM? Need it to operate at the lowest possible power? This blog&#8217;s for you.</em></p>
<p>I attended Denali&#8217;s ninth annual MemCon conference a few days ago. It was three days of intensive discussion about the state of the art in DRAM and Flash memory, the two mainstay memory technologies in use today. Surprisingly, NAND Flash memory is now the low-cost leader in terms of cost per bit, having passed DRAM a few years ago. However, DRAM remains the mainstay memory for the vast majority of designs and DDR SDRAM now rules as it becomes easier and easier to find microcontrollers and FPGAs with direct DDR interfaces and DDR controller and PHY IP for SOCs.</p>
<p>Memory power consumption as a percentage of system power consumption has grown with the rapid growth of memory-array size in all sorts of systems. A real eye opener at MemCon 09 was a chart on the power consumption of memory in server systems, where the large server memory arrays consume as much as 40% of the system power and the processor now consumes a mere 28%. Why is that important? It&#8217;s important because big server users like Google pay tens of millions of dollars each year in electrical power costs to run and to cool their server farms and 40% of a few tens of millions of dollars is, well, tens of millions of dollars.</p>
<p>Note that the current share-of-power percentages for servers don&#8217;t make processor power consumption unimportant (28% is still a big number), but the clear message is that server designers must now be far more concerned with memory power consumption because it&#8217;s a big part of the power puzzle. As embedded designs adopt large DDR memory DIMMs for bulk memory, the same sort of situation applies. Embedded designers must also be aware of the way their DRAM choices affect system power.</p>
<p>Marc Greenberg, Denali&#8217;s Director of Technical Marketing, gave a 2-hour tutorial on low-power DDR SDRAM on the first day of MemCon09. He threw up one slide that does a terrific job of putting all of the low-power SDRAM parts in perspective:</p>
<div id="attachment_60" class="wp-caption aligncenter" style="width: 500px"><a href="http://low-powerdesign.com/sleibson/wp-content/uploads/2009/06/low-power-dram-selection1.jpg"><img class="size-full wp-image-60" title="low-power-dram-selection1" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2009/06/low-power-dram-selection1.jpg" alt="Low-Power DDR Selection Criteria" width="490" height="381" /></a><p class="wp-caption-text">Low-Power DDR Selection Criteria</p></div>
<p>This slide shows the optimum type of SDRAM to use based on your design&#8217;s memory-capacity and speed requirements. I like this slide a lot because it helps you to pick from the wide array of DDR types and speeds. However, it seems that your selection job is about to become a lot simpler. Look what happens to the chart when you add in LPDDR2 memory:</p>
<div id="attachment_61" class="wp-caption aligncenter" style="width: 500px"><a href="http://low-powerdesign.com/sleibson/wp-content/uploads/2009/06/low-power-dram-selection-with-lpddr21.jpg"><img class="size-full wp-image-61" title="low-power-dram-selection-with-lpddr21" src="http://low-powerdesign.com/sleibson/wp-content/uploads/2009/06/low-power-dram-selection-with-lpddr21.jpg" alt="Low-Power DDR Selection Criteria with LPDDR2" width="490" height="381" /></a><p class="wp-caption-text">Low-Power DDR Selection Criteria with LPDDR2</p></div>
<p>LPDDR2 memory delivers the low-power goods by operating the SDRAM&#8217;s memory core and I/O at 1.2V, which is what you need to do to substantially cut memory power these days. Several manufacturers have announced LPDDR2 parts with I/O speeds to 400MHz/DDR800 and spec sheets for these parts are beginning to appear on DRAM vendor Web sites. LPDDR2 vendors with announced parts include Elpida, Hynix, Micron, and Nanya. Note that there&#8217;s also the possibility for existing LPDDR1 vendors to create parts that operate at 1.2V for similar power savings and that some of the soon-to-be-seen DDR3 parts may operate at 1.35V, which qualifies them as low-power DRAMs.</p>
<p>In addition, there&#8217;s a spec for LPDDR2 non-volatile memory (LPDDR2-NVM) to allow LPDDR2 DRAM and Flash to be intermixed. The big advantage of Flash LPDDR2 is the very low standby power but Flash memory exhibits both read and write wear-out failure, so DRAM isn&#8217;t yet obsolete and you&#8217;ll likely need both memory types in your system design. The LPDDR2-NVM spec allows for I/O speeds to 533MHz/DDR1066 operation, but Greenberg says that the initial LPDDR2-NVM parts are likely to be slower than the maximum.</p>
]]></content:encoded>
			<wfw:commentRss>http://low-powerdesign.com/sleibson/2009/06/30/state-of-the-art-in-low-power-memory-denali%e2%80%99s-memcon/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>


