Do you believe in 21st century Intelligent Design?

Late last month, columnist Mike Cassidy wrote about visionary Clayton Christensen’s Innovator’s Dilemma in the San Jose Mercury News and his words reminded me that it was time, past time, to make yet another blog-based plea for intelligent design. No, I’m not talking about “intelligent design” in the form of an alternative to evolutionary theory. Not that one. I’m talking about “intelligent design” in the form of adding more microprocessors and more software to all electronic designs in a valiant attempt to produce products that are more aware of the context of their surroundings. In other words, products that are far less stupid. At the same time, I believe that this form of intelligent design can help you cut product manufacturing cost.

More value at lower cost. Doesn’t that seem like a good deal?

Here’s what Cassidy wrote that triggered this blog:

“Clay Christensen has an idea: Scare the hell out of yourselves.

OK, that’s not precisely the way he put it. But the author of “The Innovator’s Dilemma” is all about new ideas. Not just new — but different, unorthodox, radical, uncertain, frightening and disruptive. You don’t solve old problems with old ideas. The other day, Christensen held a one-man teach-in for non-profits and their supporters at San Jose’s Mexican Heritage Plaza, preaching the gospel of “disruptive innovation.” It’s an idea that is embedded in Silicon Valley’s DNA. It is also an idea that is a lot easier to talk about than to actually deploy.”

So what’s the connection with “intelligent design”? I was immediately reminded of a great new product, a home thermostat of all things, that I’d just written up from last month’s CES show. (See “Friday Video: Two minutes of system-design expertise from Matt Rogers, VP of Engineering and founder of Nest and designer of the thermostat of the future”) The thermostat is from a new company called Nest and the product is called the Nest Learning Thermostat. It looks like an updated 21st century version of the old golden Honeywell manual thermostats that were common in the 1950s and 1960s. (Noted industrial designer Henry Dreyfuss created the Honeywell thermostat’s iconic circular industrial design in 1952.)

However, unlike those old Honeywell thermostats that were based on bimetallic temperature-sensing coil springs and mercury tilt switches, the Nest Learning Thermostat is based on a 32-bit microprocessor. And a TCP/IP stack. And WiFi. In adding these specific technologies to its Learning Thermostat, Nest demonstrates a grasp of Christensen’s concept of the “Innovator’s Dilemma.”

How?

First, understand that the context of what we mean by “home” has changed. Frequently in the modern Western world, there’s no one home. That means the home’s heating and cooling requirements are different. Also, our definition of “home” increasingly includes a home WiFi network, possibly with access to the outside world through the home’s broadband router. Add some intelligence and connectivity to a thermostat, and you can disruptively change how we heat and cool our homes with a large resulting energy savings.

After installation, the Nest Learning Thermostat needs to know three things:

  1. What’s your Zip code? (Yes, I know there’s a US bias built in here. Early intelligence has its limits.)
  2. Should the thermostat start to heat or cool your home?
  3. What temperatures should the Nest Thermostat use to heat or cool your home while you are away?

After that, you set the thermostat to the desired current temperature and the Nest Learning Thermostat then starts to observe your daily habits (with respect to heating and cooling only!). It monitors when you turn up the heat (like in the morning) and when you turn it down (like before you go to sleep). It notes when you turn on the cooling (like when the afternoon sun starts to make it overly warm for you).

The thermostat learns your daily routine and your weekend routine (weekend habits are different for most people) and by the eighth day, the thermostat has developed a pretty good idea of your habits and strives to maintain a comfortable home for you by catering to those habits. The Nest Learning Thermostat can start to heat your home several minutes before you rise in the morning. (This is a major feature for those of us whose first job in the morning is to turn up the heat for our spouses so they can get up.)
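To make the learning idea a little more concrete, here is a minimal Python sketch of one way a thermostat could build a schedule from manual adjustments. This is not Nest's algorithm. The `ScheduleLearner` class, its method names, and the simple averaging rule are all assumptions made purely for illustration. The only ideas taken from the description above are that weekdays and weekends are learned separately and that an "away" temperature is used when nothing else is known.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


class ScheduleLearner:
    """Toy habit learner (hypothetical): averages manual setpoints per (day type, hour)."""

    def __init__(self, away_temp_f):
        # Fallback temperature used when no habit has been observed for a time slot.
        self.away_temp_f = away_temp_f
        self._samples = defaultdict(list)

    @staticmethod
    def _slot(when):
        # Weekdays and weekends are tracked separately; Saturday=5, Sunday=6.
        is_weekend = when.weekday() >= 5
        return (is_weekend, when.hour)

    def record_adjustment(self, when, setpoint_f):
        """Remember that the user dialed in setpoint_f at this time."""
        self._samples[self._slot(when)].append(setpoint_f)

    def predicted_setpoint(self, when):
        """Average of past setpoints for this slot, or the away temperature."""
        samples = self._samples.get(self._slot(when))
        return mean(samples) if samples else self.away_temp_f


learner = ScheduleLearner(away_temp_f=60)
learner.record_adjustment(datetime(2012, 1, 9, 6, 30), 70)   # a Monday morning
learner.record_adjustment(datetime(2012, 1, 10, 6, 45), 68)  # a Tuesday morning
print(learner.predicted_setpoint(datetime(2012, 1, 11, 6, 0)))  # a Wednesday morning
```

A real product would presumably also weigh occupancy, weather, and how long the house takes to warm up. The sketch only shows why a week or so of observations is enough to fill in a usable weekday and weekend table.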

You can log into your thermostat from your office to adjust the heat so that your home is warm by the time you return from work or to let the thermostat know you’re going out for dinner and the heat can be delayed for a few hours. And of course, you can do the same from your smartphone no matter where you are on the planet (assuming you’ve got cellular coverage wherever you are).

The Nest Learning Thermostat also tries to train you after it learns your habits. It wants you to learn better habits in terms of energy consumption. It does this by starting to display a small green leaf when you turn down the heat or taper off on the air conditioning relative to your habitual heating and cooling use. This leaf tells you that you’re saving energy and thus money. It’s a subtle form of coercion and some people won’t like being told what to do by a thermostat. Others—the ones most likely to buy this product—will appreciate the watchful eye.

None of this would be possible if the Nest Learning Thermostat did not have a 32-bit microprocessor and a WiFi connection. Note that it took an entirely new company to build a thermostat like this. Although Honeywell still makes thermostats, microprocessor-based ones at that, it’s not building anything like the Nest Learning Thermostat. At least not this year. We just had to replace the thermostat in our condo and it is a Honeywell thermostat. It’s a standard setback thermostat design with an LCD. I have to tell it the time. I have to program it for a setback sequence. And Honeywell knew that the user interface on this product was so unintuitive that it kindly included a fold-out instruction booklet that pokes out of the top left of the thermostat, as you can see from this photo.

Frankly, I find the display of the new Honeywell thermostat extremely confusing. It shows the current temperature on the left side in large characters, the setpoint temperature in somewhat smaller characters on the right side, and the time in even smaller characters in the middle. To me, it looks like the display information was simply thrown on the display with little consideration to how the information is portrayed. The time is not the central piece of information on a thermostat yet that information is central to the display, albeit in small characters. The two important and conjoined pieces of information, the current and setpoint temperatures, are spaced nearly as far apart on the display as possible. And there’s no fixing this design with a firmware update. That LCD’s permanently configured to save manufacturing cost.

Not, repeat not intelligent design.

You can bet that the Nest Learning Thermostat does not have an ungainly instruction booklet poking out of its sleek, smooth industrial design. To anyone who has ever used a regular dumb thermostat, the Nest Learning Thermostat’s everyday operational use appears to be entirely intuitive. And if you want to do more complex things with the Nest Learning Thermostat, you interact with it through a Web page, not a handful of multi-use rubber buttons and a limited (for cost reasons) LCD. For these reasons (and a few more), I think the Nest Learning Thermostat is one of the best examples of 21st century intelligent design I’ve seen. It offers up several lessons in thoughtful design that I hope you will appreciate as much as I do.

Note: To read Mike Cassidy’s entire column, click on the following link: “Clay Christensen sees Silicon Valley non-profits’ dilemma as the innovator’s dilemma”

Posted in Design, Low-Power

3-Hour, $50 Short course in Low-Power Design with Prof. Jan Rabaey. Silicon Valley, Jan 31

The Santa Clara Valley (SCV) Chapter of the IEEE Solid State Circuits Society is hosting a 3-hour short course in low-power design at the end of this month. The course is divided into two parts:

  1. Fundamentals of low-power design and a review of well-established low-power design techniques
  2. New low-power design techniques that will come into their own based on current technology and application trends.

This is a rare opportunity to get a concentrated presentation from Professor Jan Rabaey of UC Berkeley, who happens to be an excellent and engaging speaker. The short course will be taught at the TI Silicon Valley Auditorium (formerly National Semiconductor), 2900 Semiconductor Drive, Santa Clara, CA.

 

The charge is $50 plus a $3.74 event fee. Register here.

 

Posted in Low-Power

Is 2012 going to be another breakout year for NAND Flash and Low-Power Design?

It’s just one week into the year, and I am increasingly getting the feeling that 2012 is going to be a momentous, tumultuous year for semiconductor technology and low-power system design. Among the many recent events that are giving me this feeling are the changes taking place in the NAND Flash arena. Nearly all low-power system designers depend on NAND Flash in some form because it is currently the technology of choice for storing code and data when a system is in deep low-power/sleep mode or when switched off. We use NAND Flash on chip for microcontrollers. We use NAND Flash chips on board for main storage in mobile phone handsets, tablets, eBook readers, and many other embedded systems. We use NAND Flash cards for removable storage in cameras, camcorders, mobile phone handsets, voice recorders, and media players. Any changes to NAND Flash technology ripple widely through the low-power design landscape like earth tremors.

At least three major changes to NAND Flash technology in the recent past have caught my attention. The first such event I want to discuss in this blog entry is the HMC or Hybrid Memory Cube that Micron first announced last year and is now in joint development with major partners including Samsung and IBM.

I previously wrote about the HMC (see “3D Thursday: Hybrid Memory Cube—Does anyone know what’s happening with IBM and Micron?”) and its design is for high-performance computing systems that require extremely high throughput: 1 Tbit/sec. (See “Want to know more about the Micron Hybrid Memory Cube (HMC)? How about its terabit/sec data rate?”) The HMC is a DRAM example of the kinds of memory modules we’re likely to see from the marriage of 3D IC assembly techniques and advanced NAND Flash devices.

The HMC runs many, many TSVs (through-silicon vias) up through a stack of as many as four SDRAM die to access the inherent parallelism of the multiple DRAM arrays on each die. Each proprietary DRAM die in the HMC stack has 16 separate memory arrays, resulting in substantial potential parallelism and consequently, substantial potential memory throughput.

However, the high-performance approach of the HMC is not the only way to harness 3D assembly and semiconductor memory. For example, at the end of last year, I wrote an extended blog describing a thought experiment that employed the HMC design concepts using Wide I/O SDRAM instead of the special DRAM chips in the HMC. (See “3D Thursday: Let’s end 2011 with a high-performance DRAM memory stack design. How would you improve it?”) Wide I/O SDRAM presents four independent 128-bit DRAM channels to the host system, resulting in a high level of memory parallelism. Just not as high as for the HMC. In fact, the performance is about half that of the HMC but it’s still pretty good. The same parallelism concepts could be applied to NAND Flash devices designed to a similar Wide I/O specification for NAND Flash. The lower interface speeds enabled by a Wide I/O memory interface port really drop power consumption while maintaining good performance through the parallelism uncovered by the access to the multiple on-chip memory arrays.

I have not heard of any efforts to adapt the Wide I/O interface spec to NAND Flash devices. Not yet. But the move to extracting parallelism from the arrays on all memory chips is too attractive to ignore in a world that perpetually thirsts for bandwidth at low power.

At the end of the year, two other announcements directly related to NAND Flash memory have caught my eye: the introduction of the XQD memory card format and the ONFI 3.0 interface spec. The Compact Flash Association introduced the XQD memory card format in December 2011. The XQD memory card has a slightly larger footprint than an SD memory card and a somewhat smaller footprint than a Compact Flash (CF) memory card. It’s as thick as a CF card. But the really big difference here is the interface to the memory card. The XQD memory card uses a PCIe (PCI Express) interface clocked initially at 2.5 Gbits/sec, resulting in a maximum write speed of 125 Mbytes/sec.

That’s really fast and speed is important when you’re shooting large images at a fast rate, which occurs during HD video recording and at high burst speeds in high-resolution digital still cameras. Both such conditions exist in the new Nikon D4 DSLR, which Nikon launched just last week. The Nikon D4 DSLR can shoot 16.2 Mpixel frames at 10 to 11 frames per second. Normally, DSLRs use in-camera RAM to buffer burst-mode still captures but the Nikon D4 DSLR can accept the new XQD memory cards and Sony introduced the first series of such cards last week, concurrent with Nikon’s introduction of the Nikon D4 DSLR.

Sony claims that its H Series XQD card can accept bursts of 100 uncompressed still images from the Nikon D4 DSLR in continuous shot mode. That’s a huge jump in burst length for a digital still camera and will be invaluable in shooting images of sports activities, for example.

One of the secrets behind the XQD card format’s performance is that PCIe interface port, which is also unique in that it is a memory interface and is not derived from a disk interface. That should mean that a host processor doesn’t need a disk controller to operate an XQD card. The card can be mapped to the host processor’s memory bus and the controller can reside in each memory card. Eliminating the disk controller from the serial chain between the processor and the Flash memory chips should cut costs, reduce power consumption, and boost performance.

All of those benefits are welcome in the world of low-power design. After all, do we really need controllers controlling controllers in an efficient system design? I don’t think so.

Now before you bemoan the need of a controller in each memory card, you should be aware that there already is a controller in each CF and SD memory card. You don’t think that NAND Flash arrays already look like disk drives, do you? We do indeed currently have controllers controlling controllers in existing NAND Flash memory subsystems.

A PCIe interface spec should simplify things somewhat.
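As a back-of-the-envelope check on the XQD figures quoted above, the short Python sketch below converts the 2.5 Gbits/sec line rate into usable bytes per second. It assumes a single PCIe lane and the 8b/10b line coding used by first-generation PCIe, and it ignores packet and protocol overhead. The lane count and the function name are my assumptions, not something stated in the card announcement.

```python
def pcie_8b10b_payload_mbytes_per_sec(line_rate_gbps=2.5, lanes=1):
    """Raw data bandwidth of an 8b/10b-coded serial link, per direction."""
    # With 8b/10b coding, every 10 bits on the wire carry 8 bits of data.
    data_bits_per_sec = line_rate_gbps * 1e9 * 8 / 10 * lanes
    return data_bits_per_sec / 8 / 1e6  # bits -> bytes -> Mbytes


link_mbytes = pcie_8b10b_payload_mbytes_per_sec()
card_write_mbytes = 125  # maximum write speed quoted for the XQD format

print(f"link ceiling: {link_mbytes:.0f} Mbytes/sec")
print(f"quoted write speed uses {card_write_mbytes / link_mbytes:.0%} of that ceiling")
```

Under these assumptions the link itself tops out at roughly twice the quoted write speed, which suggests that the Flash devices and the in-card controller, not the PCIe port, set the initial limit.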

The third development that’s caught my eye in the Flash memory arena is the announcement of the ONFI 3.0 interface specification for Flash memory. The ONFI (Open NAND Flash Interface) Working Group introduced the third major revision of the ONFI spec nearly a year ago, in March 2011. What’s new is that there are now products appearing that use ONFI 3.0.

The advantage of the new ONFI specification is that it doubles transfer rates to 400 Mtransfers/sec using the NV-DDR2 200MHz double-data-rate (DDR) protocol while adopting 1.8V SSTL_18 signaling to cut the power dissipation of the interface. See a pattern evolving here? More performance and less power consumption. The question is whether ONFI 3.0 is real. Well, the memories now seem real because Intel and Micron jointly previewed a 128Gbit NAND Flash device in December with the derivative 64Gbit NAND Flash device going into production now. According to the joint Intel/Micron announcement, the 128Gbit device will be in volume production later this year after a “rapid transition” from the 64Gbit device.

However, an ONFI 3.0 memory device isn’t sufficient. You also need a controller on an SoC that can operate ONFI 3.0 devices. Cadence just introduced an ONFI 3.0 NAND Flash controller IP block and companion PHY IP today along with appropriate verification IP so it’s now possible to include an ONFI 3.0 NAND Flash controller in an SoC design using the standard ASIC flow.
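The ONFI 3.0 transfer-rate arithmetic mentioned above is simple enough to sketch in a few lines of Python. A double-data-rate interface moves data on both clock edges, so a 200MHz clock yields 400 Mtransfers/sec. Turning that into bytes per second requires a bus width, and the 8-bit width used here is an assumption on my part, as are the function names.

```python
def ddr_transfers_per_sec(clock_hz):
    """A DDR interface transfers data on both the rising and falling clock edge."""
    return 2 * clock_hz


def peak_mbytes_per_sec(clock_hz, bus_width_bits=8):
    """Peak interface bandwidth; bus_width_bits=8 is an assumed data-bus width."""
    return ddr_transfers_per_sec(clock_hz) * bus_width_bits / 8 / 1e6


clock = 200e6  # NV-DDR2 clock rate quoted for ONFI 3.0
print(f"{ddr_transfers_per_sec(clock) / 1e6:.0f} Mtransfers/sec")
print(f"{peak_mbytes_per_sec(clock):.0f} Mbytes/sec peak on an 8-bit bus")
```

These are peak interface numbers only. Sustained throughput depends on the NAND array's own read and program times, which this sketch does not model.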

As you can see, there’s a tremendous amount of new technological development going into NAND Flash memory and I see big things ahead this year, all to the benefit of low-power system designers.

Posted in EDA, Flash, SDRAM, SOC, Video

2011: A great year for low-power design, wasn’t it? Part B

2011 was a great year for low-power design. I don’t think I can remember a year as good to low-power designers and I thought I’d devote this second part of my blog post on this topic to review some major process and packaging technology advances that occurred this year, which I think will have major implications for years to come.

28nm

Perhaps the biggest process innovation to hit the semiconductor industry, with a long-tail effect on low-power semiconductor design, will turn out to be the 28nm process node. Here are the clues on which I’m basing this opinion.

First, Xilinx provided a very long and detailed explanation as to why it selected the TSMC 28nm HPL process technology for all three members of its series 7 FPGAs (Virtex-7, Kintex-7, and Artix-7) over the TSMC 28nm HP and LP process technologies. The 28nm HP process variant is strictly for high-performance designs and the 28nm LP process variant, which employs PolySiON (polysilicon/silicon oxy-nitride) gate oxide, delivers low leakage but also lower-performance transistors. The 28nm HPL process variant is a high-K metal-gate technology that produces high-speed, low-leakage transistors. (See “Xilinx 28nm low-power SoC design class, part 2: Process Technology”)

It’s not Xilinx’ selection of the 28nm HPL process that’s the highlight here—it’s the fact that there are two low-power variants of the 28nm process available to IC design teams. They have the flexibility to pick the process variant that best meets the objectives for a specific IC.

Altera took a different approach in developing a low-power FPGA based on 28nm process technology. Late last month, the company announced that it had started shipping low-power Arria V FPGAs based on the TSMC 28nm LP process variant because it allows the Arria V FPGA family to deliver “the lowest total power, lowest static power, and lowest transceiver power of any midrange FPGA family, consuming up to 40% less power compared to previous generation devices.”

It’s this flexibility in 28nm process variants that I think will allow all sorts of interesting low-power design techniques to be used in future products designed with 28nm process technology. And the 28nm process node is critically important for another reason: It appears to be the last process node to be manufactured without taking extraordinary measures in lithography. By that, I mean that after the 28nm node, we will be seeing a big bump in lithographic complexity, first through double, triple, and quadruple patterning and then through EUV (extreme ultraviolet, aka X-rays) lithography. Scaling is about to get much more difficult after 28nm.

3D IC Assembly 

Which leads me to the other big semiconductor manufacturing innovation whose time has apparently arrived: 3D IC assembly. I believe that the time has arrived for 3D IC assembly to become mainstream and that we will see a revolution in SoC design and development based on adding 3D design to the mix. There are just too many benefits to ignore and I discussed some of these in my previous blog post (Part A).

To recap, the big advantages are:

  • A huge reduction in I/O power to transfer signals from chip to chip
  • A huge increase in chip-to-chip bandwidth without increasing I/O power consumption or packaging costs
  • Ability to intermix logic, memory, analog, and RF functions by implementing them on separately optimized die (using separately optimized process technologies) and then creating appropriate 3D assemblies
  • Cost advantages by reducing average die size
  • Reduction in pressure to jump to the next process node (and higher design and NRE costs) by providing an alternative path to SoC-level integration

These advantages cannot be ignored and indeed companies are not ignoring them. TSMC itself has already stepped in and proposed itself as a 1-stop shop for IC die manufacture and 3D assembly. (See “3D Week: The State of 3D IC assembly—December 2011”) Xilinx is already solidly in the 2.5D IC assembly camp with the Virtex-7 2000T FPGA and you can bet that technology will make its way down the product line as quickly as the costs of early adoption can be reduced.

JEDEC has announced the finalization of the Wide I/O SDRAM specification, which is essential to the development of 3D memory stacks with high-speed, low-power data-transfer features. (See “3D Week: JEDEC Wide I/O Memory spec cleared for use”)

The economics of the entire industry now solidly point to the adoption of 3D IC assembly as a mainstream packaging technology. (See “3D Week: Driven by economics, it’s now one minute to 3D”) Between the increasing availability of 28nm process technology and the rise of 3D IC assembly, I see a new flowering of electronic design the likes of which we have not seen since the early 1990s, when surface-mount technology and ASIC design both came of age nearly simultaneously. That was truly a great era for the industry. The same sort of simultaneous advance in semiconductor and packaging technology is taking place with 28nm process technology and 3D IC assembly.

It’s a great time for low-power design.

Posted in 2.5D, 3D, Low-Power

2011: A great year for low-power design, wasn’t it?

2011 was a great year for low-power design. I don’t think I can remember a year as good to low-power designers. I thought I’d devote this blog to a review of some major developments in 2011 that made low-power designers’ lives easier. In fact, there’s so much to talk about that I’m splitting this blog post in two. In the first half, I’ll write about significant developments in standard silicon offerings including microcontrollers, embedded application processors, and FPGAs. In part B, I’ll discuss some of the year’s most significant developments in design at the silicon level and the implications for people who design ASICs, SoCs, and ASSPs. It truly was a bountiful year.

Low-power microcontrollers

If there ever was a year for microcontroller advancement, this was it. Every major microcontroller vendor had something new and exciting on the low-power front. So many developments that I can only hit the highlights:

In August, ARM’s Alan Rampon wrote a blog post listing 17 microcontroller vendors that were offering a broad range of low-power devices based on various ARM Cortex-M series processor cores. The vendor list includes:

  • Analog Devices (Cortex-M3)
  • Atmel (Cortex-M3)
  • Broadcom (Cortex-M3)
  • Cypress Semiconductor (Cortex-M3)
  • Dust Networks (Cortex-M3)
  • Ember (Cortex-M3)
  • Energy Micro (Cortex-M0, M3)
  • Freescale Semiconductor (Cortex-M4)
  • Fujitsu (Cortex-M3)
  • Holtek (Cortex-M3)
  • Nuvoton (Cortex-M0)
  • NXP (Cortex-M0, M3, M4)
  • ON Semiconductor (Cortex-M3)
  • Samsung (Cortex-M0, M3)
  • ST Microelectronics (Cortex-M3)
  • Texas Instruments (Cortex-M3)
  • Toshiba (Cortex-M3)

That list is probably somewhat dated already, but you get the idea. The proliferation of low-power microcontrollers greatly accelerated during 2011. One development that really sticks in my mind (because it’s recent) is the onset of shipments of the NXP Semiconductor LPC4350, which packs an ARM Cortex-M4 and an ARM Cortex-M0 into one microcontroller that costs less than $4 in quantities of 10,000.

This microcontroller is on the forefront of a new wave of processor design called “asymmetric multiprocessing” and there’s a real “wave-of-the-future” look to this development. (See “More news on the asymmetric processing SoC front”)

Asymmetric Multiprocessing

The microprocessor is 40 years old (last month!) and silicon microprocessor implementations have really advanced over those four decades while many of our design memes have not. In particular, I’m thinking of the meme that says “processors are expensive, so layer as many tasks as possible on a processor to save money.” The net effect of this meme is to make us develop increasingly complex multitasking schemes in an attempt to get processor utilization up to 80% or 90% or perhaps even 95%.

Now any engineer can tell you that when you load any component to near 100%, you have just sent an engraved, gold-plated invitation to Murphy, asking for an audience. In other words, something will go wrong. You won’t always get the latency you expected. You won’t always get the bandwidth you need.

So you’d better ask yourself: Are complex multitasking systems really worth the effort when I can get two 32-bit microprocessor cores in one device for less than $4? You’d better be serious coming up with that answer. I believe that asymmetric multiprocessing will remake all of design, including low-power design, during this coming decade.
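The warning above about loading a processor to 80%, 90%, or 95% can be illustrated with a textbook queueing model. The sketch below uses the M/M/1 single-server queue, in which the mean time a task spends in the system is the bare service time divided by (1 - utilization). The choice of M/M/1 is my assumption. Real multitasking workloads are not this tidy, so treat the numbers as a trend, not a prediction.

```python
def mm1_latency_multiplier(utilization):
    """Mean time in an M/M/1 system, expressed as a multiple of the bare service time."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in the range [0, 1)")
    return 1.0 / (1.0 - utilization)


for load in (0.50, 0.80, 0.90, 0.95):
    multiplier = mm1_latency_multiplier(load)
    print(f"{load:.0%} busy -> {multiplier:.1f}x the unloaded latency")
```

In this model the average latency is roughly 2, 5, 10, and 20 times the unloaded figure at those four utilization levels. Splitting the work across two lightly loaded cores keeps each one far from the steep part of that curve.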

Asymmetric multiprocessor design wasn’t the only innovation that loomed in 2011. Xilinx finally announced the first four members of its new Zynq 7000 EPP (Extensible Processing Platform) family, which fuses a processor complex containing two ARM Cortex-A9 processor cores with an FPGA fabric.

Now assembling systems with microprocessors and FPGAs isn’t new. In fact, putting processor cores and FPGA fabrics onto the same piece of silicon isn’t particularly new either. However, doing it right? That is new. And this development fits into the low-power design world because putting the processor complex and the FPGA fabric on chip with a massive on-chip interconnect between the two cuts interface power significantly by reducing the interconnect frequency. You don’t need GHz interconnect clock rates when you have thousands of wires for parallel interconnect.
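The trade between wire count and clock rate described above comes down to one division. The Python sketch below computes the per-wire clock needed to hit a target bandwidth. The 100 Gbit/sec target and the three wire counts are hypothetical values I picked for illustration, and they are not Zynq specifications.

```python
def required_clock_hz(bandwidth_bits_per_sec, wires, bits_per_wire_per_clock=1):
    """Clock rate each wire must run at to deliver the target aggregate bandwidth."""
    return bandwidth_bits_per_sec / (wires * bits_per_wire_per_clock)


TARGET_BANDWIDTH = 100e9  # hypothetical 100 Gbit/sec between processor and fabric

for wires in (4, 64, 2048):
    clock_mhz = required_clock_hz(TARGET_BANDWIDTH, wires) / 1e6
    print(f"{wires:5d} wires -> {clock_mhz:10.1f} MHz per wire")
```

With a few wires the link needs multi-GHz signaling, while a couple of thousand on-chip wires can carry the same traffic at tens of MHz, and slower signaling generally means simpler, lower-power drivers.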

2.5D IC Assembly

Speaking of Xilinx, the company started shipping engineering samples of the Virtex-7 2000T FPGAs to customers last month and this too is a low-power design story. The story is completely told in this graphic:

 

 

The Xilinx Virtex-7 2000T is a very large FPGA with two million logic elements. But it’s not a monolithic piece of silicon. Rather, the Virtex-7 2000T consists of four “identical” FPGA tiles, each with half a million logic elements (and a ton of other stuff). The FPGA tiles are mounted on a silicon interposer, which establishes more than 10,000 connections between adjacent tiles (56,000 connections in total). The silicon interposer is a fascinating piece of technology. It’s a 65nm IC with four layers of metal on each side of the die and no transistors. It’s a silicon circuit board that must be made in a wafer fab. In this case, TSMC owns the fab. The interposer is as large as the stepper reticle will allow. The advantage here is that each FPGA tile is a quarter of the size of the interposer, and die yield has an exponential relationship to die size. The smaller the die, the better the yield percentage. So 2.5D assembly makes a lot of sense in several different ways.
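The exponential relationship between die size and yield can be sketched with the simple Poisson yield model, Y = exp(-A × D), where A is die area and D is defect density. The model choice, the defect density, and the die areas below are all hypothetical values chosen for illustration. They are not Xilinx or TSMC data.

```python
import math


def poisson_yield(die_area_cm2, defects_per_cm2):
    """Fraction of die expected to be defect-free under a Poisson defect model."""
    return math.exp(-die_area_cm2 * defects_per_cm2)


DEFECT_DENSITY = 0.5    # hypothetical defects per square centimeter
MONOLITHIC_AREA = 6.0   # hypothetical area of one giant die, in square centimeters

monolithic = poisson_yield(MONOLITHIC_AREA, DEFECT_DENSITY)
tile = poisson_yield(MONOLITHIC_AREA / 4, DEFECT_DENSITY)

print(f"monolithic die yield:    {monolithic:.1%}")
print(f"quarter-size tile yield: {tile:.1%}")
```

With these made-up numbers, about 5% of the monolithic die would be good versus about 47% of the quarter-size tiles. The sketch ignores interposer yield and assembly losses, which a real cost comparison would have to include.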

The 2.5D IC assembly-with-interposer approach taken to create the Xilinx Virtex-7 2000T allows the FPGA tiles to use lower power I/O drivers because these drivers will only be driving short, closely controlled traces between adjacent tiles. That system-design knowledge saves power. Although the Xilinx Virtex-7 2000T uses four identical die fabricated with a 28nm process technology to realize the active elements, 2.5D IC assembly permits heterogeneous die assembly as well, as shown in this image from Xilinx:

 

 

As you can see, 2.5D IC assembly allows designers the freedom to intermix die from radically different IC technologies such as logic, memory (DRAM, Flash, SRAM, etc.), analog, and RF. It’s a pc-board-like technology but on a much smaller scale. The resulting 2.5D device may well be better optimized and cost less than it might if the design team attempted to place everything on one monolithic die. That’s a topic I’ll take up in Part B of this blog entry.

Posted in 2.5D, ARM, Design, Low-Power, Microcontroller, Multicore, SOC

What if 2.5D got really cheap? How would that affect low-power design?

Last week, silicon-interposer foundry Deca Technologies unstealthed. I found out from an article in the San Jose Mercury News and just published a blog about the announcement in my other blog, the EDA360 Insider. Deca is a subsidiary of Cypress Semiconductor and the outspoken President and CEO of Cypress, TJ Rodgers, was good for a quote, as always:

“We want to use the dense, reliable silicon interconnect inherent in Moore’s Law to integrate the dissimilar chips used in today’s systems, but we face an economic barrier because the interconnect on silicon chips is 1,000 times more expensive than the interconnect on PC boards.

“We could enable a new silicon-based interconnect paradigm if we could make silicon interconnect wafers for $10, just what silicon solar wafers cost today. The problem of mapping solar technology onto Moore’s Law is straightforward, but difficult, and we believe DecaTech has the answer.”

Now don’t take that $10 per interposer wafer to the bank. I get the impression that’s a long-term goal, not a short-term pricing roadmap. However, even a 10x drop in interposer costs will have a big influence on the future of 2.5D assembly technology.

And why should we as low-power systems designers care? Because interconnect is expensive and because interconnect now largely determines system performance. First, think about expense. Let your mind go back 40 years (if it can) to the birth of the microprocessor, which we celebrate this month. In the 1970s, microprocessor interconnect meant a bus. Not on a board but in a system. One of the most successful early microprocessor buses was the S-100 bus. It was named for the 100-pin edge connectors and the 100-conductor bus used to interconnect system boards in the original Altair 8800 microcomputer introduced in 1975 and subsequently adopted by several microcomputer vendors including Imsai, Vector Graphics, North Star Computers (formerly known as Kentucky Fried Computers), and Processor Technology and by board vendors including Godbout Electronics/Compupro and Morrow Micro-Stuff/ThinkerToys.

Back then, due to the nascent state of semiconductor integration, you would have to have a processor board, one or more memory boards, a video board, and one or more I/O boards. A major system expense was just the half dozen or so 100-pin edge connectors and the simple but large circuit board that implemented the bus. The S-100 connectors were expensive and you needed a lot of energy to drive the bus lines because they were physically large and because, as bus speeds increased, they required resistive termination to prevent ringing, and you needed even more energy to drive the termination resistors.

By 1981 when the IBM PC appeared, things were getting somewhat better. We still had half a dozen edge connectors but we were down to 62 edge-connector pins (for 8-bit systems). Add another 36 pins when we jumped to 16 bits and we found ourselves right back at close to a 100-pin bus. So much for progress.

For board-to-board interconnect, things are going serial (think of the PCI evolution to PCIe) but chip-to-chip interconnect on a board is still largely parallel with lots of pins on a chip looking to connect to lots of pins on other chips. There is still impedance in those pcb traces and you still need relatively big drivers that consume significant amounts of energy to drive those traces. Hence a movement to go serial for chip-to-chip interconnect on a board—an extension of the migration of buses to serialized versions.

High-speed serial buses incur their own costs. There’s the energy cost of driving even a few wires at multi-GHz speeds and there’s the performance hit in the form of latency increase that you get when you serialize and then deserialize a data stream. It’s not all wine and roses.

So we try to put as much as we can on one IC, but that’s not an ideal solution either. Not all IC processing is the same and chips with different functions are really better off being fabricated with different IC fabrication processes. For example, NAND Flash and DRAM processes push as far down the Moore’s Law curve as they can get, as fast as they can, to boost density and cut cost per bit. CMOS logic processes are right behind the memory processes but use more layers of on-chip metal interconnect because they require more random connectivity than memories. Analog ICs typically operate at higher supply voltages and they’re nowhere near the leading/bleeding edge of IC process technology. It’s not economical to put all of these different functions on one die, and so we see renewed interest in multichip modules, known by the 21st-century name: 2.5D IC assembly using silicon interposers.

2.5D assembly using bare semiconductor die attached to small interposers instead of big circuit boards significantly changes the parallel/serial I/O equation. Suddenly, you don’t need such big I/O drivers on the chip because there’s no wire bond, no IC package interconnect, and significantly shorter traces to drive. Suddenly, massively parallel I/O consumes only a fraction of the energy it previously needed, and the breakeven point between parallel I/O and serialized, high-speed I/O moves. The balance shifts to favor parallel I/O more and serial I/O less.

And when major changes like that happen to such equations, the way we design systems also changes.

 

Posted in 2.5D, CMOS, Design, Flash, Low-Power

“Watt’s Next?” asks Chris Malachowsky, co-founder, NVIDIA Fellow, and Senior VP of Research

Everything, literally everything, we design today is defined by its power consumption, said Chris Malachowsky, an NVIDIA co-founder, fellow, and senior VP of research. Malachowsky spoke yesterday at a luncheon during the ICCAD conference held this week in San Jose, California. At the low end of the system spectrum, mobile devices are defined by how much you can do with a Watt. At the high end, supercomputers and supercomputer performance are now defined by how much electricity you can afford. Malachowsky joked that supercomputers now use so much electricity that the local power company is giving them away for free when you sign up for a 2-year service contract, just like mobile phone handsets are subsidized by the carriers here in the US. That’s funny enough to hurt.

Coincidentally, NVIDIA makes chips that go into both wireless handsets at the low end (the company’s Tegra series of processors) and supercomputers at the high end (the company’s Tesla series of GPU—graphics processing unit—processing chips). In between are the original NVIDIA products, the GeForce and QUADRO series of graphics chips and boards. Here’s a graphic of the NVIDIA product line from Malachowsky’s talk:

Tegra mobile application processors go into mobile handsets that cost roughly $100 but deliver about 2x the CPU performance and 4x the GPU performance of a PC that sold 10 years ago for about $3200. Here are the specs for comparison:

 

The quest for performance isn’t going to stop in the handset space so NVIDIA has a roadmap for future processors. The product currently on the market is the Tegra 2 and NVIDIA has already previewed the next step up, code-named Kal-El (Superman’s original name on Krypton). (Note: for more information on Kal-El, see my blog posts “Processor Wars: NVIDIA reveals a phantom fifth ARM Cortex-A9 processor core in Kal-El mobile processor IC. Guess why it’s there?” and “Friday Video: Why do you need four ARM Cortex-A9 processor cores in a mobile processor SoC?”)

Here’s an NVIDIA roadmap for Tegra processors from Malachowsky’s talk:

Note that NVIDIA likes to use superhero alter-ego names for future Tegra processors.

One thing that’s not progressing quickly on the mobile handset front is battery capacity. Batteries are just not getting better as fast as we’re adding transistors to silicon die thanks to Moore’s Law. As a result, the Tegra processors, like all mobile application processors, are constrained by the amount of power available in a handset.

On the supercomputer front, NVIDIA Tesla GPU chips already power three of the five fastest supercomputers in the world: the Tianhe-1A, the Titan (evolved from the x86 Jaguar), and the Nebulae. These supercomputers use a lot of processors, as you can see from this image:

The reason that NVIDIA chips are in supercomputers at all is that researchers and students recognized that NVIDIA’s evolving line of graphics chips contained a lot of parallel processing power. If certain tough math problems and algorithms could be re-expressed to look like problems of drawing and shading triangles, then GPUs could be pressed into service for these other sorts of problems. This conceptual leap resulted in the development of the NVIDIA Tesla line of supercomputing GPU chips.

However, supercomputers are also being constrained by power. The limit is not how much power is available (it takes megawatts to run a supercomputer) but how much power is affordable. And don’t forget, for every megawatt needed to power the supercomputer, you need a comparable amount of power to cool the supercomputer.

Even the US Department of Energy (DOE) is concerned. It recently put out a Request for Information (RFI) to find out how we might build a 1-Exaflop (an Exaflop is a billion Gigaflops) supercomputer that “only” consumes 20MW (!!!). On the current commercial trajectory, with no extra DOE help, we will eventually be able to build an Exaflop supercomputer but it will consume four or five times that amount of energy, said Malachowsky.
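To put the DOE numbers in perspective, here is a quick back-of-the-envelope calculation of the efficiency they imply. It is only a sketch: it reads “four or five times the amount of energy” as four or five times the 20MW power draw, which is my assumption, and the function and variable names are mine.

```python
# Efficiency implied by the DOE target quoted above (a rough sketch).
EXAFLOP = 1e18          # floating-point operations per second
GIGAFLOP = 1e9
TARGET_POWER_W = 20e6   # the 20MW goal from the RFI


def gflops_per_watt(flops: float, watts: float) -> float:
    """Return sustained efficiency in Gigaflops per watt."""
    return flops / watts / GIGAFLOP


# Efficiency needed to hit the DOE goal.
target = gflops_per_watt(EXAFLOP, TARGET_POWER_W)

# Assumption: the "current trajectory" machine draws 4x or 5x the 20MW goal.
trajectory = {k: gflops_per_watt(EXAFLOP, TARGET_POWER_W * k) for k in (4, 5)}

print(f"DOE target at 20MW: {target:.1f} Gflops/W")
for k, eff in trajectory.items():
    print(f"{k}x the power ({TARGET_POWER_W * k / 1e6:.0f}MW): {eff:.1f} Gflops/W")
```

In other words, the goal works out to about 50 Gigaflops for every watt, while the business-as-usual machine would manage only 10 to 12.5.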

Why do we need an Exaflop supercomputer? Because simulation has replaced the wet lab, said Malachowsky. Science, all science, needs simulation and the more the better. The more the faster. As Malachowsky said in his talk, science needs 1000x more computing (but without 1000x the power consumption) because simulation or “computational science” has become the third pillar of science.

(Note: Theory and Experimentation are the first two pillars. Yeah, I didn’t know that either, but I have it on the authority of the President’s Information Technology Advisory Committee.)

And get this: no more lazy, lazy processor or system architects. The “process fairies” aren’t working as hard as they used to, said Malachowsky. Oh sure, they’re still bringing us 2x the transistor count with each new IC process step just like Gordon Moore promised way back in 1965. Sure, the process fairies are keeping that promise. But poor Dennard. His observation about power and speed scaling with lithographic geometry—that’s dead. It died at 90nm. Party’s over.

So what? Here’s what. We’re going to have to rethink our approaches to getting more processing performance using less power. Scaling is out and here’s the graphic proof from Malachowsky’s talk:

Without architectural innovation, the average annual rate of processor performance improvement appears to be dropping from 52% to 20%. Architecture is in and we’ve got to get smarter because the process fairies aren’t working as hard as they used to.
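A little compound-interest arithmetic shows how much that drop hurts. The sketch below takes the two annual improvement rates from the graphic and asks how long each would take to deliver the 1000x that Malachowsky says science needs. Treating the rates as constant year after year is my simplification, not something from the talk.

```python
import math


def years_to_reach(multiple: float, annual_rate: float) -> float:
    """Years of compound growth at annual_rate needed to reach the given multiple."""
    return math.log(multiple) / math.log(1.0 + annual_rate)


def growth_after(years: int, annual_rate: float) -> float:
    """Performance multiple after the given number of years of compound growth."""
    return (1.0 + annual_rate) ** years


for rate in (0.52, 0.20):
    print(
        f"{rate:.0%} per year: {growth_after(10, rate):.1f}x after 10 years, "
        f"{years_to_reach(1000, rate):.1f} years to reach 1000x"
    )
```

At 52% per year you get to 1000x in roughly 16 or 17 years. At 20% per year it takes closer to 38.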

What can we do? Well, one approach is already evident in the design of the multicore NVIDIA Kal-El mobile application processor. The Kal-El chip contains five ARM Cortex-A9 processor cores. All five are architecturally similar, but one of them is synthesized for low-power operation. The other four identical cores are synthesized for maximum performance and consequently draw more power. When the Kal-El chip has a lot of work to do, one or more of the high-performance cores is operating. When there’s just a little work to do, the operating system transfers the work load to the low-power core and shuts down all four of the high-performance cores. The Android OS already knows how to do this.
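The switching policy described above can be sketched in a few lines. This is purely illustrative and is not NVIDIA or Android code: the load metric, the 0.25 threshold for what the companion core can handle, and the choice to turn the companion core off whenever a fast core is running are all hypothetical assumptions of mine.

```python
import math
from dataclasses import dataclass

FAST_CORES = 4  # the four high-performance Cortex-A9 cores described above


@dataclass(frozen=True)
class CorePlan:
    companion_on: bool   # the low-power core
    fast_cores_on: int   # how many high-performance cores are powered


def plan_cores(load: float, companion_capacity: float = 0.25) -> CorePlan:
    """Pick which cores to power for a given load.

    load is demand measured in units of one fast core's throughput
    (a hypothetical metric); companion_capacity is the hypothetical
    share of that throughput the low-power core can deliver.
    """
    if load < 0:
        raise ValueError("load must be non-negative")
    if load <= companion_capacity:
        # Light work: run on the low-power core, all fast cores go dark.
        return CorePlan(companion_on=True, fast_cores_on=0)
    # Heavier work: wake just enough fast cores, capped at four.
    return CorePlan(companion_on=False, fast_cores_on=min(FAST_CORES, math.ceil(load)))


for load in (0.1, 0.8, 2.3, 6.0):
    print(load, plan_cores(load))
```

A real governor would add hysteresis so the chip doesn’t flip back and forth at the threshold, but the basic idea is the same: power only the silicon the workload needs.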

Kal-El’s low-power “companion” ARM Cortex-A9 core is an example of an emerging SoC design style called “dark silicon.” Fortunately, dark silicon is much easier to understand than dark matter or dark energy. Dark silicon simply describes sections of an SoC that are shut down and powered off. In earlier days when there weren’t enough transistors to go around, letting a piece of silicon go dark was unthinkable. In fact, we loaded up a processor with as much work as it could do and perhaps even a little more if we needed to push things. Dark silicon? Fugetaboutit. But now in the multicore era, we’re getting quite used to the idea.

However, dark silicon isn’t going to save us by itself. We need to get smarter about what we do inside of a single core as well, said Malachowsky. We’re going to have to get smart about the energy cost of everything we do inside of a processor core. Malachowsky didn’t directly explain what this means but he did provide a clue.

Here’s a table from Malachowsky’s presentation that shows the energy cost of typical processor transactions:

Note the pattern in this table. The energy costs for moving operands around on chip are comparable to those for performing a computation. This ratio actually gets worse for data movement as lithographic scaling progresses because gates get smaller but the average wire length and cross-sectional resistance get larger. The energy cost for moving an operand on or off chip is higher still. It takes power to wiggle those printed-circuit board traces.

As I said, it was just a clue.

One of the most interesting parts of Malachowsky’s talk for me was where the funding for this architectural research will come from. I would never have guessed.

Video games.

That would be the two middle NVIDIA product lines shown in the first image in this blog post: GeForce and QUADRO. It seems that the video gaming market is pretty big, about $35 billion per year. That’s bigger than the movie market (and way bigger than EDA). Hard-core gamers will apparently pay handsomely for architectural advances as long as those advances let them shoot faster.

So when we cure cancer, you can thank a gamer. Meanwhile, give some thought to Malachowsky’s words. There are a lot of really sharp ideas for designers of low-power systems in this presentation.

Posted in ARM, Low-Power, Multicore, SOC

Generation-jumping 2.5D Xilinx Virtex-7 2000T FPGA delivers 1,954,560 logic cells, consumes only 20W

Xilinx announced today that it is shipping Virtex-7 2000T FPGAs to customers. This is one monster FPGA. Its 6.8 billion transistors deliver 1,954,560 logic cells, 21.55 Mbits of distributed SRAM, 2160 DSP slices, 46,512Kbits of block RAM, four PCIe ports, 36 12.5Gbps GTX serial transceivers, and 1200 user I/O pins. All in about 20W (!!!). The only fly in the ointment, if you want to call it that, is that no one on this planet can make this FPGA as a monolithic device. The Virtex-7 2000T FPGA is a 2.5D assembly that combines four FPGA tiles on a silicon interposer. The interposer provides 10,000 connections between each of the four FPGA tiles.

Here’s an exploded diagram of the FPGA assembly:

I’ll be covering the 2.5D/3D assembly aspects of this new FPGA in more detail on my EDA360 Insider blog this coming Thursday (for 3D Thursday), but I want to discuss the low-power aspects of this device here, now.

The obvious contributor to the Virtex-7 2000T FPGA’s power profile is the use of the 28nm TSMC HPL (high-performance, low-power) high-K, metal-gate (HKMG) process technology. Xilinx chose TSMC’s 28 HPL process to make all of its Series 7 FPGA family members (Virtex-7, Kintex-7, and Artix-7) over TSMC’s 28 HP and 28 LP process technologies. Instead of letting us know this fact and leaving it there, Xilinx published a White Paper that goes into great detail about the decision (“Lowering Power at 28nm with Xilinx 7 Series FPGAs”).

Xilinx considered all three of the TSMC 28nm process technologies for the 7-series FPGA families but the company quickly locked on the two HKMG processes (HP and HPL) as being the “best” for FPGA design. Because Xilinx wanted to use just one process technology to cover all of the planned Series-7 FPGA families from high-performance to low-power, HKMG promised the best mix of performance and leakage for the company’s unified approach to designing all of the Series-7 FPGA families. TSMC’s 28 LP process uses PolySiON (polysilicon/silicon oxy-nitride) gate insulation and is best suited for designs that require less performance than FPGAs. The PolySiON 28 LP process produces transistors that are about 13% slower than those produced in the 28 HPL and 28 HP processes (for the types of transistors Xilinx would be using to build its Series-7 FPGAs) while exhibiting more than twice the leakage. The advantage of the 28 LP process is that it’s less expensive.

Eliminating the 28 LP process as a possibility left the choice between TSMC’s 28 HPL and 28 HP processes. Both processes can produce equally fast transistors but the 28 HP process produces transistors with about twice the leakage of the 28 HPL process for the types of transistors Xilinx would be using to build its Series-7 FPGAs. According to the White Paper, TSMC’s 28 HP process is better suited to GPU and CPU designs that require the ultimate performance and that have the power budget (~100W) to achieve that performance. The maximum Xilinx Series-7 FPGA power budget is 40W, so the company selected TSMC’s 28 HPL process technology. Even so, in a demo last week, Xilinx showed the Virtex-7 2000T FPGA simultaneously running 3600 copies of its 8-bit PicoBlaze processor core, delivering 180,000 MIPS while consuming 20W. That’s a really impressive amount of computation for the power.
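Just how impressive? Here is the arithmetic on the demo figures. The three inputs come straight from the demo numbers above. The per-core power figure is a rough sketch that charges the whole 20W to the processor cores, which is my simplifying assumption (the device’s I/O, clocking, and leakage all share that budget).

```python
# Figures quoted for the Virtex-7 2000T demo.
CORES = 3600
TOTAL_MIPS = 180_000
POWER_W = 20

mips_per_core = TOTAL_MIPS / CORES    # throughput of each 8-bit core
mips_per_watt = TOTAL_MIPS / POWER_W  # overall compute efficiency

# Assumption: all 20W is attributed to the cores themselves.
milliwatts_per_core = POWER_W / CORES * 1000

print(f"{mips_per_core:.0f} MIPS per core")
print(f"{mips_per_watt:.0f} MIPS per watt")
print(f"about {milliwatts_per_core:.1f}mW per core")
```

That works out to 50 MIPS per core and 9000 MIPS per watt, with each core costing only a few milliwatts.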

However, it’s the system implications of the Virtex-7 2000T FPGA’s huge capacity that really have an impact on system power consumption. Here is an example diagram that Xilinx used to showcase the power-consumption advantage of the Virtex-7 2000T FPGA:

In this image, Xilinx claims that one Virtex-7 2000T FPGA delivers the equivalent capability of four “competitive” FPGAs, each with nearly 1M logic cells. Xilinx came to this conclusion based on characteristics other than logic-cell count and I’m not going to tell you that this diagram shows an apples-to-apples comparison. That’s not my point in using this diagram.

For my purposes, the diagram shows four smaller FPGAs tied together with many high-speed differential pairs. Each composite FPGA-to-FPGA link burns 8W. With four such links, that’s 32W total used just for chip-to-chip communications. To me, that’s the secret low-power sauce that the 2.5D assembly approach of the Xilinx Virtex-7 2000T has. The extremely wide I/O provided by the silicon interposer drastically reduces the power consumption of tile-to-tile communications and this power consumption is included in the device’s overall power consumption number: 20W. That means the FPGA power consumption is essentially “free” if you look through the wrong end of the telescope.

Where might this reduction in power consumption come in handy? At the announcement, Xilinx VP of FPGA Development and Silicon Technology Liam Madden discussed two example cases relevant to this discussion.

The first example involved a customer looking at developing a large ASIC for a communications application. The defining performance characteristic was the need to handle a terabit/sec aggregate data rate using an estimated 20M gates for the logic. The power budget was about 30W. The customer knew that it wanted some amount of programmability and was therefore considering a 3-chip solution with one ASIC and two FPGAs. The estimated time to develop this design was three years and the estimated power consumption was 70W. Way out of the ballpark. One Xilinx Virtex-7 2000T filled the bill and beat the power budget by 30%.
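The numbers in this example are worth running. In the sketch below I read “beat the power budget by 30%” as “came in 30% under the 30W budget,” which is my interpretation, not a figure Xilinx stated.

```python
BUDGET_W = 30.0               # customer's power budget
THREE_CHIP_ESTIMATE_W = 70.0  # estimate for one ASIC plus two FPGAs
MARGIN_UNDER_BUDGET = 0.30    # assumption: "beat the budget by 30%"

# Implied power of the single-FPGA solution.
fpga_w = BUDGET_W * (1 - MARGIN_UNDER_BUDGET)

# How badly the 3-chip design missed, and how the FPGA compares to it.
overrun = THREE_CHIP_ESTIMATE_W / BUDGET_W
fpga_vs_three_chip = fpga_w / THREE_CHIP_ESTIMATE_W

print(f"Single FPGA: about {fpga_w:.0f}W")
print(f"3-chip estimate: {overrun:.2f}x the budget")
print(f"FPGA draws {fpga_vs_three_chip:.0%} of the 3-chip estimate")
```

Under that reading the single-FPGA design lands at about 21W, which squares nicely with the 20W figure quoted for the device.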

The second example involved the replacement of six chips with one device. One Xilinx Virtex-7 FPGA swallowed all six devices and delivered 5x the performance of the existing multi-chip system while consuming 1/7 of the power.

Here’s the “before” picture:

And here’s the “after” picture showing the function of all six chips compiled into one Xilinx Virtex-7 2000T FPGA:

Clearly, 2.5D and 3D assembly are going to have a major influence on the way we design low-power systems in the future.

Posted in FPGA, Low-Power

Tesla? Hah! How about a brand new, 2012 all-electric DeLorean? $90K

McFly!!! This was just too good to pass up. DeLorean Motor Company (Humble, TX), the establishment that bought the remains of the original DeLorean car company including tons of finished stainless-steel body parts, plans to bring out an electric version next year. A brand new electric version, re-engineered with a 260 hp electric motor that supposedly drives the car from 0 to 60 mph in 4.9 seconds. Wowzers!

The electric DeLorean debuted at the DMC Texas Open House Event on Oct. 14th, 2011. Current range is said to be 70 miles without the Mr. Fusion option but there’s no word on when that will become available. Probably sometime in the future.

Read about the Jalopnik test drive of the electric DeLorean here.

Posted in Low-Power

Altera introduces SoC FPGA melding ARM Cortex-A9 dual-core processor complex with a 28nm FPGA fabric

Xilinx first started to talk publicly about the fusion of processors and FPGAs—a product now known as Zynq—in 2010 and has announced plans to roll out parts by the end of this year. It was inevitable that Altera would eventually counter with a competing product line. Today the company revealed plans for a line of chips called SoC FPGAs and comparisons between the Altera and Xilinx offerings are inevitable, but let’s look at the details for the Altera offerings.

The SoC FPGA line will include at least 18 different chips with various configurations for the “Hard Processor System” (HPS) and various sizes for the FPGA fabrics connected to the HPS block. In addition, the SoC FPGA product line will be based on two of the Altera 28nm FPGA fabrics—Cyclone V and Arria V—for two different speed grades within the product line. Here’s a generalized block diagram of a device in the product line:

The SoC FPGAs’ HPS is based on two 800MHz ARM Cortex-A9 processor cores with ARM Neon and single/double-precision FPU extensions. Each ARM Cortex-A9 processor has its own separate 32Kbyte L1 caches for instructions and data. The two processor cores share a unified 512Kbyte L2 cache. Each processor also has private interval and watchdog timers. To keep the two processor cores fed with instructions and data, there’s a hard-core, multiport DDR SDRAM controller in the HPS that supports the DDR2 and DDR3 and the LPDDR1 and LPDDR2 SDRAM interface protocols. There’s also a Flash memory controller with a built-in DMA engine. The Flash controller supports NOR and NAND Flash memories including ONFI 1.0 devices and SD, SDIO, and MMC memory cards. In addition, there’s ECC support for the SDRAM and the NAND Flash interfaces.

Next up are the hard-core peripherals within the HPS. There are a lot of them:

  • Two 10/100/1000 Ethernet MACs with DMA
  • Two USB 2.0 On-the-Go (OTG) controllers with DMA
  • Four I2C controllers
  • Two CAN (Controller Area Network) controllers
  • SPI Master and SPI Slave ports
  • Two UARTs
  • General-purpose ports

On-chip memory includes 64Kbytes of RAM and a boot ROM.

That’s already quite a lot but then there’s the FPGA section of the SoC FPGA to consider. On-chip FPGA capacity varies depending on whether the particular SoC FPGA device is based on the Cyclone V or Arria V FPGA fabrics. Devices based on the Cyclone V FPGA fabric will be offered with 25K, 40K, 85K, and 110K logic elements. Devices based on the Arria V FPGA fabric will be offered with 350K and 460K logic elements.

The HPS in the Altera SoC FPGA connects to the on-chip FPGA fabric through two 128-bit AXI buses, one for reads and one for writes. As you can see from the block diagram above, the hard-core peripherals not included in the HPS block separately connect to the FPGA fabric. What’s not apparent from the diagram is that the two ARM Cortex-A9 processors share a Snoop Control Unit (SCU) and there’s an ACP (accelerator coherency port) linking the HPS to the FPGA fabric so it’s possible to engineer accelerators that maintain coherency with the ARM Cortex-A9 processor cores’ caches and implement them using the on-chip FPGA fabric.

In addition to the six FPGA array sizes (four for the devices based on the Cyclone V FPGA fabric and two for devices based on the Arria V FPGA fabric), Altera plans to offer parts with three HPS subsystem configurations: base, mid, and high. Combined with the six FPGA fabric sizes, that means there are at least 18 Altera SoC FPGA parts planned for the initial product lineup. Altera says that there will also be 1-processor variants in the SoC FPGA lineup. Just in case you suspect that’s perhaps a bit underpowered, keep in mind that essentially 100% of all system designs based on microcontrollers use a far less capable processor core than one 800MHz ARM Cortex-A9 core. You might want to check to make sure you’re not becoming overly acclimatized to multicore designs. On the other hand, if you’re running Android then two capable processor cores will come in handy.
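The “at least 18” figure is just a cross product, as this short sketch shows. It assumes every fabric size will be offered with every HPS configuration, which is what the arithmetic above implies. The tuples are my own shorthand, not Altera part numbers.

```python
from itertools import product

# Fabric sizes in thousands of logic elements, from the announcement.
CYCLONE_V_SIZES_K = (25, 40, 85, 110)
ARRIA_V_SIZES_K = (350, 460)
HPS_CONFIGS = ("base", "mid", "high")

fabrics = [("Cyclone V", k) for k in CYCLONE_V_SIZES_K] + [
    ("Arria V", k) for k in ARRIA_V_SIZES_K
]

# Assumption: every fabric size is paired with every HPS configuration.
lineup = [(family, size_k, hps) for (family, size_k), hps in product(fabrics, HPS_CONFIGS)]

print(f"{len(fabrics)} fabrics x {len(HPS_CONFIGS)} HPS configurations = {len(lineup)} parts")
for family, size_k, hps in lineup:
    print(f"{family} {size_k}K logic elements, {hps} HPS")
```

Any 1-processor variants would come on top of these 18.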

As the block diagram above shows, there are additional hard-core peripherals connected to the SoC FPGA chip’s FPGA array: as many as three more multiport SDRAM controllers, a Gen2 x4 PCIe port (supplemented with the possibility of implementing a soft Gen2 x8 PCIe port in the FPGA fabric), as many as six 10Gbps high-speed, differential serial transceivers, and as many as thirty 6Gbps high-speed, differential serial transceivers. These additional peripheral ports have separate access paths into the FPGA fabric of the SoC FPGA devices.

Perhaps the most interesting news is that low-end members of the Altera SoC FPGA family will sell for $15 in “high volumes.” That’s a lot of capability for a relatively low price. In fact, that’s a very low price in the FPGA world. The bad news is that Altera doesn’t plan to ship devices until the second half of 2012.

So that’s the Altera SoC FPGA. Now for the inevitable comparison based on my previous write-ups of the Xilinx Zynq. (See “Xilinx Zynq EPPs create a new category that fits in among SoCs, FPGAs, and microcontrollers”.) First, there’s the processor complex—what Altera calls the HPS. The two products are remarkably similar here: two 800MHz ARM Cortex-A9 processor cores with Neon DSP and FPU extensions, 512Kbytes of unified L2 cache, Flash controller, one SDRAM controller, Snoop Control Unit, timers and watchdog, DMA, etc. Both processor complexes support an ACP (Accelerator Coherency Port) interface into the FPGA fabrics.

There’s some difference in the processor complex-to-FPGA connection scheme: Altera offers one 128-bit read and one 128-bit write AXI bus and a 32-bit APB (Advanced Peripheral Bus) port plus additional ports that go directly from the FPGA fabric to the multiport SDRAM controller in the HPS. Xilinx offers four 32-bit and four 64-bit AXI ports plus direct access from the FPGA fabric to the SDRAM controller. So the Xilinx parts theoretically provide more raw interconnect bandwidth between the processor complex and the FPGA fabric than do the Altera parts. It remains to be seen if that raw capability can deliver more bandwidth in practice, but the potential is clearly there.
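Here is the raw-width comparison behind that statement. It is a rough sketch that counts only the AXI data-path bits listed above and leaves out both vendors’ direct SDRAM-controller ports. Turning widths into bandwidth would need clock rates, which neither announcement detail above provides, so the ratio only holds under my assumption of equal clocks and one transfer per clock on every port.

```python
# AXI port widths in bits, as listed in the comparison above.
ALTERA_AXI_BITS = [128, 128]          # one read bus, one write bus
ALTERA_APB_BITS = 32                  # separate peripheral bus port
XILINX_AXI_BITS = [32] * 4 + [64] * 4

altera_axi_total = sum(ALTERA_AXI_BITS)
altera_with_apb = altera_axi_total + ALTERA_APB_BITS
xilinx_axi_total = sum(XILINX_AXI_BITS)

# Assumption: equal clocks and one transfer per clock on every port.
ratio = xilinx_axi_total / altera_axi_total

print(f"Altera AXI: {altera_axi_total} bits ({altera_with_apb} including APB)")
print(f"Xilinx AXI: {xilinx_axi_total} bits")
print(f"Xilinx/Altera raw AXI width ratio: {ratio:.2f}")
```

So on paper the Xilinx parts have about 1.5x the AXI width, which is where the “theoretically” comes from.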

But wait! Hold on there! The Altera SoC FPGA parts offer as many as three more SDRAM controllers outside of the hard-core HPS processor complex and those SDRAM controllers can be connected either to devices implemented as soft cores in the on-chip FPGA fabric or through the FPGA fabric to the HPS. That added SDRAM control capability could really be an advantage in systems with extremely high SDRAM bandwidth requirements.

Then there’s the PCIe controller. On the Altera SoC FPGAs, there’s one hard-core Gen2 x4 PCIe port and the possibility of implementing a second, soft-core Gen2 x8 PCIe port in the FPGA fabric. The Xilinx Zynq parts will provide a hard-core Gen2 x4 or x8 PCIe port, depending on the family member. There are additional 10.3Gbps serial channels available on the Xilinx Zynq components, so a soft-core PCIe controller is a possibility, as it is for the Altera SoC FPGAs.

Since I’ve brought up the topic of the FPGA fabric, let’s compare those as well. The various Altera SoC FPGA family members offer six FPGA fabric sizes: 25K, 40K, 85K, 110K, 350K, and 460K logic elements. The announced Xilinx Zynq family offers four fabric sizes: 30K, 85K, 125K, and 235K logic elements. So if you need really big FPGA fabrics to complement the capabilities provided by the processor complexes, then the Altera SoC FPGA family seems to offer more capacity for now. However, should a battlefield form at the high end, you can bet that Xilinx will be filling out the product line at the high end, where there’s more margin to be made.

Finally there’s pricing and availability. Both companies have announced high-volume unit pricing “below $15” but the Xilinx parts are supposed to be available this year and the Altera parts are scheduled to appear in the latter part of next year.

Together, today’s Altera SoC FPGA announcement and the previous Xilinx Zynq announcements create a truly exciting new product category—one that fuses FPGAs with high-performance microprocessors in a way guaranteed to dramatically extend the reach of FPGAs. The resulting mixture of capability, performance, power consumption, and cost simply cannot be replicated with a 2-chip design.

I predict that many system designers will be unable to resist this combination. Naysayers will point to previous failed attempts at merging FPGAs and hard microprocessor cores and some will predict a similar fate for this new generation of parts. Much has changed. First, the embedded industry has adopted the ARM architecture and there is a large body of programming talent available for it. Second, these new parts are not FPGAs with processor cores tacked on. They are very capable and complete processor complexes, application processors in their own right, augmented with FPGA fabrics. From my perspective, the Altera SoC FPGAs and Xilinx Zynq parts stand a very good chance of defining a new and vibrant component category.

Posted in ARM, FPGA, Low-Power, SDRAM, SOC