Monthly Archives: October 2014


Low-Power VLSI Design Methodology

Low-power design is today's need in VLSI. Why? Well, ask yourself! When you go to a gadget shop looking for a new cell phone, apart from the price, what are the qualitative things you would be most concerned about?


  • Features including the speed of the processor
  • Battery back-up
  • Operating System

A good operating system can make efficient use of the system's hardware resources, but its choice is driven more by the software applications you wish to run. The first two, however, are directly influenced by the design methodology and the technology node behind your device.


You would love to buy a cell phone with a faster processor so that your applications run faster and your computations finish quicker. At the same time, you wouldn't want to charge your phone every hour, or for that matter every day! This translates into a design challenge: make the device consume the least power possible.

Frequency and power go hand in hand. You cannot keep increasing the frequency (assuming timing is met!) without expecting a hit on power.


Power itself has many components. To give you a glimpse, we'll talk about them briefly.

Power dissipated has two components: Dynamic and Static.





Dynamic power is the component of total power that comes into the picture when the devices (the individual transistors) switch their values from 0 to 1 or vice versa. Dynamic power itself has two components:

  • Capacitive Load Power: Depends on the output load of each transistor switching states.
  • Short Circuit Power: Depends on the input transition.




Static power is the component dissipated when the device is not switching, i.e. it is in standby mode; it consists mainly of leakage power.
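As a reference, a standard first-order CMOS power model that captures these components (a generic textbook sketch, with α denoting the switching activity factor; the exact equation used in the original post's figure may differ slightly) is:

```latex
P_{\text{total}} = P_{\text{dynamic}} + P_{\text{static}}, \qquad
P_{\text{dynamic}} = \underbrace{\alpha\, C_L\, V_{DD}^{2}\, f}_{\text{capacitive load}}
                   + \underbrace{t_{sc}\, V_{DD}\, I_{\text{peak}}\, f}_{\text{short circuit}}, \qquad
P_{\text{static}} \approx V_{DD}\, I_{\text{leak}}
```

Here C_L is the switched load capacitance, V_DD the supply voltage, f the clock frequency, t_sc the short-circuit conduction time and I_leak the leakage current.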


We said that power and device speed go hand in hand, and this is evident from the above equation. As you increase the frequency of your design (again, timing must be met!), the switching rate of the devices increases, and hence so does the capacitive-load component of the dynamic power.



One workaround to reduce power is to reduce the supply voltage at which your devices operate. But this, in turn, reduces the signal swing available for the devices to cross the threshold voltage (Vt) and hence brings a host of design challenges of its own.
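As a rough illustration of why supply scaling is so attractive, here is a minimal Python sketch of the quadratic effect of V_DD on the capacitive-load term. The activity factor, load capacitance, voltages and frequency are illustrative assumptions, not measured silicon data.

```python
# Illustrative only: quadratic dependence of dynamic power on supply voltage.
def dynamic_power(alpha, c_load_f, vdd, freq_hz):
    """First-order capacitive-load power: P = alpha * C * Vdd^2 * f."""
    return alpha * c_load_f * vdd ** 2 * freq_hz

# Assumed values for a hypothetical block: alpha=0.1, C=1 nF total, f=500 MHz.
p_nominal = dynamic_power(alpha=0.1, c_load_f=1e-9, vdd=1.2, freq_hz=500e6)
p_scaled  = dynamic_power(alpha=0.1, c_load_f=1e-9, vdd=0.9, freq_hz=500e6)

print(f"Power at 1.2 V: {p_nominal * 1e3:.1f} mW")                    # ~72.0 mW
print(f"Power at 0.9 V: {p_scaled * 1e3:.1f} mW")                     # ~40.5 mW
print(f"Reduction: {100 * (1 - p_scaled / p_nominal):.0f}%")          # ~44% from voltage scaling alone
```

The roughly 44% saving from a 1.2 V to 0.9 V step is exactly the (0.9/1.2)^2 factor; the catch, as noted above, is the reduced signal swing relative to Vt.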


Before I conclude this post, I would like to make one last point. Device complexity is increasing every day while device size keeps shrinking. This keeps your latest cell phone sleek, but again, the hit is on power!





The above table shows the trend of ever-increasing power dissipation as technology nodes scale down. This has forced designers to come up with innovative design solutions to deliver the best to you.

In upcoming posts, we will discuss these design-for-low-power solutions in detail.


[1] Low Power Methodology Manual: For System on Chip Design by Michael Keating, David Flynn, Robert Aitken, Alan Gibbons and Kaijian Shi.



This is a guest post by Naman Gupta, a Static Timing Analysis (STA) engineer at a leading semiconductor company in India. To read more blog posts from Naman, visit


Selecting an embedded MCU for SoC

This is a guest post by Dolphin Integration which provides IP core, EDA tool and ASIC/SoC design services


When a SoC integrator has to select a microcontroller for his application, most of the time he relies on his own previous experience rather than on a rational assessment, as no standard benchmarking process has emerged for selecting an embedded microcontroller. The reason is that there are many criteria to consider, and it is difficult to achieve a fair comparison between the different products available.


Experience shows that the main criteria to consider when selecting a microcontroller are, on the one hand, speed, area, power consumption, computing power and code density, and on the other hand, the quality and maturity of the development and debug tools. Each of these criteria depends on many parameters and evaluation conditions that make accurate comparisons difficult. Moreover, selecting an embedded MCU requires that the evaluator know how to balance the relative weight of each criterion depending on the challenges of the top-level circuit implementation.


Therefore, given the number of parameters to take into account (technology, Process/Voltage/Temperature conditions, standard cell library, tool versions, etc.), and since suppliers rarely provide exhaustive specifications of all these parameters when communicating the performance of a core, it is unlikely that the evaluator can make a meaningful comparison between different products based only on datasheet figures. In this article we discuss the pitfalls to avoid when assessing the relevant criteria for selecting an embedded MCU, and we propose some guidelines to make such assessments more straightforward.


Indeed, the power consumption of a processor depends on many factors, such as the target technology (lithography size, process flavour, threshold voltages... see Table 1), the standard cell library used to implement the core, and the execution activity imposed on the processor during power simulation. Beyond that, what is left unsaid is often more important than what is stated explicitly. Consequently, caution and judicious reading of vendor datasheets are mandatory when comparing power figures for competing processor IP. Rarely will you find apples to compare with apples from the datasheets alone.


Table 1: Process option versus Power

Table 1 shows the great variability of power according to threshold voltage and process variant. In the given example (a NAND2 drive-2 gate), the dynamic power can be 43% higher in an LP process, but the leakage power can be up to 175 times lower! Furthermore, as leakage varies dramatically with temperature, leaving the reader unaware of the process and temperature corner used for the power estimation may lead to a wrong comparison and to attributing artificial qualities to a processor. Table 2 shows the impact of standard cell library selection on power figures: depending on the library chosen, within the same lithography and the same process, the dynamic power may vary widely.

Table 2: Library options versus Power


Figure 1 shows the impact of process corner on power specifications. We can see that the PVT conditions applied for the measurement can decrease the reported power consumption by 30%.


Figure 1: Flip80251-Typhoon power consumption in different PVT conditions


While the power consumption values of a packaged processor (standard chip) necessarily take into account all circuitry in the package, power specifications of processor cores are based on simulations – vendors are free to omit or ignore any number of power-dissipating functions when reporting power numbers. Some vendors do not include specific functions when they measure power consumption, and they often specify in their datasheets what they did not take into account.


However, these functions are sometimes vital for the microcontroller. For example, when a supplier does not include the clock tree, the customer has to appreciate the consequences of this omission: the clock tree contains the gates operating at the highest frequency of the core, it dissipates a large share of the CPU power, and it is not possible to design a microcontroller without one. Obviously, it would be unfair to compare the power specifications of two processors when the clock tree is omitted from one of them (see Figure 2).


Figure 2: Flip80251-Typhoon power consumption with/without function omission


Another frequent omission: should static consumption be considered? That depends on the fabrication technology, the Vt flavour selected (nominal, high or low Vt), the temperature range and the operating frequency. Down to 180 nm, static power consumption is usually not considered because it is insignificant compared to the dynamic power consumption. In advanced fabrication technologies, however, from 90 nm onward, static power consumption can no longer be ignored. It can represent a significant part of the total consumption, yet most of the time vendors do not report it.

Even if the datasheet clarifies all the items listed above, one more important "detail" usually remains unclear: what is the processor doing while the power simulations run? If the program run during the power simulation is a loop of NOPs (no operation), the customer should expect lower power numbers than if the processor were exercising its functional units. Thus even the benchmark program run during power simulation influences the core's power consumption specifications.

Table 3: Influence of processor activity on power figures

Data contained in datasheets give a global idea of the dynamic consumption and help to form an overview of competing solutions. But since no standardized power-benchmarking program for processor cores has emerged, and since the measurement conditions can be far from the customer's application conditions, it seems unrealistic to obtain an accurate power consumption estimate for a microcontroller without running one's own power simulations.


Thus, such partial data will not enable an accurate model of the consumption, which the SoC designer needs for proper sizing of the power grid to meet IR-drop and electromigration criteria. We encourage the customer to run his own power simulation for two reasons: he has full control of the evaluation conditions, and he can assess the power consumption of the rest of the SoC, especially the memory system, at the same time.


This last point is important and often underestimated: an embedded processor is just one part of the system, and a processor that reduces the number of accesses to the memory system is a better processor for low-power optimisation. It is therefore important to use EDA solutions that provide an early assessment of the system power consumption. When evaluating a processor core, you could benefit from an evaluation version of our SCROOGE EDA solution, which quickly provides an accurate power consumption estimate thanks to its emulation of the clock tree and the wire loads.


When processor vendors communicate area figures, there are also omissions that can make the numbers meaningless. The MCU area depends heavily on the configuration considered: is it only the processor configuration? If not, which peripherals are included? Which options? These questions are crucial when considering MCU area. For example, the area of the Flip80251-Typhoon MCU in its processor-only configuration increases by 24% when standard peripherals such as timers, a UART, I/O ports and embedded debug support are added (see Figure 3). The difference in area between configurations can be very significant. As a result, to compare the area of two MCUs, their configurations must be exactly the same. Even synthesis options such as scan insertion have to be included (or excluded) consistently in the area estimation and stated in the measurement conditions.



Figure 3: Flip80251-Typhoon area


Core vendors typically communicate area either as a number of (equivalent) gates or as a silicon area in mm². Both ways present traps to avoid.

The number of gates is helpful to express the area independently of the library chosen for synthesis. It is computed by dividing the core area after synthesis by the area of a reference gate. For all vendors, this reference gate is a NAND2 gate, but is there only one NAND2 gate? The answer is no, because there are different subtypes of NAND2 gates varying in size. A NAND2 gate is available in several "drives" in a library: for instance, a NAND2-drive1 has an area of 7.526 μm² in a TSMC 0.18 μm process, while a NAND2-drive2 has an area of 15 μm². As a result, an IP announced at 5,000 gates in NAND2-drive1 equivalent gate count would be announced at about 2,500 gates in NAND2-drive2 equivalent gate count. The figure is thus very different from one count to another if the drive is not indicated in the IP supplier's documentation.
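To make the arithmetic explicit, here is a small Python sketch reproducing the drive-strength effect described above. The NAND2 areas are the TSMC 0.18 μm figures quoted in the text; the 5,000-gate core is a hypothetical example.

```python
# Equivalent gate count = synthesized core area / area of the reference NAND2 gate.
NAND2_DRIVE1_UM2 = 7.526   # TSMC 0.18 um, drive 1 (figure quoted above)
NAND2_DRIVE2_UM2 = 15.0    # TSMC 0.18 um, drive 2 (figure quoted above)

core_area_um2 = 5_000 * NAND2_DRIVE1_UM2   # hypothetical core: 5,000 drive-1 equivalent gates

gates_drive1 = core_area_um2 / NAND2_DRIVE1_UM2
gates_drive2 = core_area_um2 / NAND2_DRIVE2_UM2

print(f"Gate count (NAND2 drive-1 reference): {gates_drive1:.0f}")  # 5000
print(f"Gate count (NAND2 drive-2 reference): {gates_drive2:.0f}")  # ~2509
```

Same silicon, same synthesis result, yet a factor-of-two difference in the advertised gate count, purely from the choice of reference gate.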


The other point that makes gate count comparisons unfair is the variability of gate size between libraries. The relative size between the NAND2 and the other cells has a direct impact on the gate count. Imagine you synthesize a core with two libraries in TSMC 0.18 μm that differ only in the size of the NAND2 drive 1 and of one flip-flop. Library A has a NAND2 of 10 μm² and a DFF of 100 μm². Library B has a larger NAND2 (15 μm²) but a smaller DFF (95 μm²). Suppose that for a given design the number of NAND2s and DFFs is the same. The total area would be identical with library A and library B; however, library A will give a gate count 50% larger than library B, even though it is the same core, synthesized with the same constraints in the same technology. As a result, the gate count depends on multiple parameters and can be drastically different depending on how it is calculated and what is considered. The gate count is therefore uncertain and not sufficient to draw conclusions about MCU area.
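The same comparison can be sketched in a few lines of Python, using the hypothetical library A / library B figures from the paragraph above (the cell count is an arbitrary assumption):

```python
# Two hypothetical TSMC 0.18 um libraries from the example above (areas in um^2).
LIB_A = {"nand2": 10.0, "dff": 100.0}
LIB_B = {"nand2": 15.0, "dff": 95.0}

n = 1_000  # assume the design uses the same number of NAND2s and DFFs

def core_area(lib, n_cells):
    """Total area when the design contains n_cells NAND2s and n_cells DFFs."""
    return n_cells * (lib["nand2"] + lib["dff"])

for name, lib in (("A", LIB_A), ("B", LIB_B)):
    area = core_area(lib, n)
    gates = area / lib["nand2"]   # equivalent gate count with that library's NAND2 as reference
    print(f"Library {name}: area = {area:.0f} um^2, gate count = {gates:.0f}")

# Library A: area = 110000 um^2, gate count = 11000
# Library B: area = 110000 um^2, gate count ~= 7333  (A reports ~50% more gates for identical area)
```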


Therefore, silicon area is the figure that really matters: it represents the real cost for the designer and their customer. However, once again, this figure depends on many parameters: the lithography size and the process, the standard cell library used, the PVT conditions, the synthesis constraints (frequency, I/O delays, max cap, rest-of-SoC constraints, etc.), and whether the result is pre- or post-place-and-route. Taking all these parameters into account, it seems nearly impossible to obtain area figures from different providers measured under exactly the same conditions. The customer will even have to ask the MCU provider for the detailed measurement conditions, because they are rarely indicated in the datasheets. Datasheet figures are thus useful only for a first evaluation.

Table 4: Influence of library in core area

Moreover, the principal parameter impacting the area is the chosen clock frequency. Between an area-oriented and a speed-oriented synthesis, the results will differ widely. This shows that when comparing datasheet values, it is crucial to know whether the microcontrollers are compared under the same conditions.

Figure 4: impact of target speed on silicon area


In conclusion, if a SoC designer wants to extract information about MCU area from datasheets, he will have to be very careful about the measurement conditions. He should also keep in mind that for tiny processors in advanced process technologies, the CPU area is so small, and the differences so minor, that gate count should be only a secondary decision factor.


As shown before, even though frequency and area are closely linked, the right way to define the target frequency is to look for the frequency that allows the critical part of the application program to be processed in the required amount of time. You can then assess the impact of that frequency on area and power. Indeed, the goal is not to target the highest possible frequency for a core, since that would yield both a huge area and a high consumption, but to find the "processing power – core frequency" pair that provides the best area/power trade-off. In that sense, processor frequency should be treated as a means of achieving the other targets. In the next section, we propose a method to assess the processing power needed while avoiding the classic traps.


The wide range of applications makes it difficult to characterize the embedded domain. In embedded systems, applications range from sensor systems with simple MCUs to smartphones that have almost the functionality of a desktop machine combined with support for wireless communications. Another particularity of the embedded world is that there is no significant legacy code base that would favour a standard instruction set architecture (ISA), as has happened in the desktop world. This has led to a remarkable diversity of ISAs for embedded applications, which makes the selection of the benchmark program even more crucial for finding the best architecture for a particular application. The most frequent pitfall when evaluating code density or processing power is to underestimate the influence of the selected benchmark on the reliability of the result. By not applying the right benchmark, you could select a processor that is not the most appropriate for your application, as shown in Table 5 below.

Table 5: Code size for different benchmark

These benchmarks belong to MiBench's "industrial control" category.

So, the question is: how to select the right benchmark for a meaningful appreciation of the processing power (and consequently the code density)?


Since different application domains have different execution characteristics, a wide range of benchmark programs has been developed in an attempt to characterize these different domains. Most of these benchmarks target specific areas of computation. For instance, the primary focus of Dhrystone is integer performance; LINPACK targets vectorizable computations; and Whetstone targets floating-point-intensive applications. Other benchmarks are available to stress network TCP/IP stacks, data input/output and other specific tasks.


Even though the limitations of Dhrystone are well known to a majority of developers, core vendors, including the largest one, ARM™, still communicate processor performance by giving their Dhrystone score.

Why is Dhrystone not good?

Dhrystone is a "synthetic benchmark". Synthetic benchmarks are artificial programs that include mixes of operations carefully chosen to match the relative mix of operations observed in some class of application programs. The hypothesis is that the instruction mix is the same as that of the user program, so that the performance obtained when executing the synthetic program should provide an accurate indication of what would be obtained when executing an actual application. The main problem is that the memory-access patterns of real applications are very hard to duplicate in a synthetic program. These patterns determine memory locality, which deeply affects the performance of a hierarchical memory subsystem (i.e. one including a cache). As a result, hardware and compilation optimizations can produce execution times that are significantly different from the execution times produced on actual application programs, even though the relative instruction mix is the same in both cases.


To overcome the limited capabilities of synthetic benchmarks, standardized sets of real application programs have been collected into various application-program benchmark suites. These real application programs characterize how current applications will exercise a system more accurately than other types of benchmark programs. However, to reduce the time required to run the entire set of programs, they often use artificially small input data sets, which may limit their ability to accurately model the memory behaviour and I/O requirements of a user's application programs. Even with these limitations, these benchmark programs are the best that have been developed to date.


There have been some efforts to characterize embedded workloads, most notably the suite developed by the EEMBC consortium and its academic equivalent, MiBench. Their authors recognized the difficulty of using just one suite to characterize such a diverse application domain and instead produced a set of suites that typify workloads in particular embedded markets. For instance, MiBench benchmark programs are divided into six suites, each targeting a specific area of the embedded market. The categories are Automotive and Industrial Control, Consumer Devices, Office Automation, Networking, Security, and Telecommunications. All the programs are available as standard C source code to ensure portability.


It is important to note that a benchmark program should be easy to use and relatively simple to execute on a variety of systems. A benchmark that is difficult to use is more likely to be used incorrectly. Furthermore, if the benchmark is not easy to port to various systems, it is probably a better use of the performance analyst's time to measure actual application performance rather than to spend time trying to run the benchmark program.

In summary:

  • Select carefully your benchmark according to your application domain
  • Do not blindly trust the result if you cannot clearly state how close the benchmark program is to your real application
  • Do not underestimate how much C code can be optimized for a given architecture. If the evaluator ports C code that has been optimized for a different architecture, the initial code density obtained on the new architecture is likely to be far from the best result achievable on that architecture. Also pay attention to the relative maturity of the C compiler optimizations
  • Pay attention to C-library support and optimisation. Most recent 32-bit MCU architectures have a development tool suite based on GCC, whose C library supports functions that are usually useless in embedded systems. As a result, if these libraries are not optimised for embedded systems, they will be unnecessarily large. Optimising the C library is a way to reduce code size without modifying the application program or optimising the C compiler


It is very easy to make a processor assessment that leads to a wrong conclusion, because at each step of the evaluation there are many opportunities to compare data that were not measured under the same conditions. That is why many circuits embed not the most appropriate MCU, but only a convenient one. In this article, we have seen how Dolphin documents a rigorous framework that avoids the main pitfalls of the evaluation process and, when coupled with advanced power-simulation tools and methodologies, leads to an objective assessment of different MCU architectures so as to select the right embedded MCU for the targeted application.


Click here to learn more about Dolphin Integration products and services




Electronic Design: Getting to First Time Right

This is a guest post from Michael Hermann, V.P. of Engineering at Nuvation Engineering, provider of complex electronic product development and design services.


Getting to "first time right" is a key goal at Nuvation Engineering and is built into our electronic design methodology. Broadly speaking, it means two things: First, when you design and build new hardware, your methodology delivers new boards with zero cuts and jumps. You may still need to tweak a component value or two or change component population options, but you've avoided the problems that would require a board re-spin. Second, your first board spin has NO architectural or serious performance issues that would require a re-spin before transitioning to volume manufacturing. So basically "First Time Right" means "no board re-spin required." This helps your project stay on budget, on schedule, and maybe even ahead of the curve on both of those critical KPIs.


Electronic Design Example: Custom BMS for liquid metal battery energy storage system
Custom battery management system for large-scale liquid metal battery. Learn more.


Let’s be up front about one thing so my fellow engineers don’t call me out here – first time right design isn’t about trying to get to the highly unlikely outcome of being able to take your very first layout to long-term manufacturing without making any changes along the way. That’s just almost never going to happen, and if it did you just got lucky (in terms of long-term optimization needs), that’s all. Realistically, a First Time Right board will very likely still require minor layout updates as you proceed toward the volume manufacturing process – optimizations are to be expected for reasons of test, manufacturing cost, system integration, and other considerations.


3 Benefits of First Time Right

Three of the most important benefits are schedule, cost, and confidence.

Schedules get shorter in the long run because major design changes (remember, don't confuse First Time Right with "first time perfect") cost a lot of time and money.


Electronic Design Example: FDR InfiniBand PCIe Card
FDR InfiniBand (14Gb/s data rate per lane) PCIe card. Learn more.

Regarding confidence, in engineering we’ve always got customers – very directly when you’re an electronic design services company like Nuvation, but an internal engineering team is accountable to customers too. Product management, sales, marketing, manufacturing, executive management – these people/groups are all looking for, and counting on, outcomes from Engineering. So whether internal or external, an engineering team with a First Time Right approach is going to be better able to instill confidence in the many stakeholders who are dependent on Engineering’s success. First Time Right outcomes can, apart from helping your job security, increase cooperation and collaborative problem solving and improve performance outcomes across departments, or even across whole organizations. It’s sometimes hard to measure all these benefits, but we can certainly say that First Time Right outcomes reach beyond just the project scope, budget and schedule.


5 Proven Principles to Reach First Time Right

So how do you get to First Time Right? While there are many techniques, I will discuss what I have found to be some of the most important guiding principles. By “principles,” I mean how one approaches the work – the work itself would cover things like simulations, tolerances, flexibility/expansion approaches, etc., all very important things which I may write about sometime, but not here today.


Up-Front Planning and Design – Every board spin of even a moderately complicated design will cost weeks of schedule time and a multitude of engineering hours. Investing some time up front on rigorous planning and design can save you from much more damaging schedule and cost overruns later in the project.

Electronic Design Example: High-speed HD camera with CCD sensor.
High-speed HD camera with CCD sensor. Learn more.

At Nuvation we distinguish design as happening at the document level, which is why we write detailed Hardware Design Descriptions (HDDs) before jumping into schematics. In the HDD you select your major components, figure out your reset sequencing, your I/O, your clocking, and so on (it’s quite a long list). All these kinds of things need to be thought through before you’re drawing schematics, or you’re going to be wasting a lot of time redoing work.


Peer Review – This is perhaps one of the most fundamental principles at Nuvation. We consider engineering to be a team sport, and your peers are your teammates. We do peer reviews both informally and formally: “Informally” means we encourage engineers to go to their colleagues for consultation, input, and problem-solving – don’t be a lone wolf. “Formally” means we do reviews before designs proceed to the next stages – we do those reviews at the design level (see below for what I mean by that), as schematics near completion, and at multiple points during PCB layout (usually in the placement and routing phases).


Signoffs – Nuvation leverages some of the best value points from phase-gate project management methodologies. This includes having several points during a project where we ask ourselves and our customers respectively for signoff. This feeds into peer review too. When you ask somebody to sign off on something and really say “I approve it,” this ups the ante.


Hardware is Not an Island – at Nuvation hardware, software and FPGA engineering are a combined group. If you don’t have these disciplines working together, you’re going to have architectural problems. A practical application of this principle is to have the hardware teams get their requirements design document(s) and schematics reviewed by the embedded software team. The software engineers don’t need to possess the skills required to do the hardware work themselves, but at the interaction points they must be able to contribute at a level that can corroborate the quality of the design and help improve upon it; members of great teams understand each other’s work.


There are no Junior Engineers – at Nuvation we don’t have “junior” engineers – once you’re on the team, you’re a “Staff Engineer” and you grow from there. Entry-level engineers bring a highly valuable questioning approach; without years of “doing it that way,” they are unburdened with the past – they can come from curiosity and tend to question more.


Nuvation has chosen to harness these fresh ways of looking at engineering and we actively value the contributions of our recently graduated engineers, hence the term “Staff Engineer.” A good engineering group embraces fresh and new perspectives; to think you know it all is just plain arrogance.


To Sum – First Time Right design is achieved through collaboration across disciplines (peer review), rigorous planning and documentation, accountability (sign-offs), and valuing the contributions of all staff engineers regardless of seniority. The major KPIs impacted are schedule, cost, and confidence. Of course there are many other aspects to an engineering approach that achieves a First Time Right design; for example, working with highly talented engineers is certainly a big part of it – but at the end of the day two teams with equal skill can produce different outcomes, depending on how they work together, and the principles they follow to achieve First Time Right electronic design.

This is a guest post from Michael Hermann, V.P. of Engineering at Nuvation Engineering, a U.S. and Canadian-based provider of complex electronic product development and design services.


Flip Chip Market and Technology Trends

Yole Développement announces its Flip Chip Market and Technology Trends report. Yole Développement's analysis updates the business status of the Flip-Chip market, including data for TIM, underfills, substrates and Flip-Chip bonders. It offers a fully updated 2010 – 2018 market forecast, a detailed technology roadmap built with a bottom-up approach, plus a strong focus on micro-bumping for 3DIC & 2.5D.



Over the next five years, an incredible 3x wafer growth is expected for the Flip-Chip platform, which will reach more than 40M 12''-equivalent wafer starts per year (wspy) by 2018!
Despite its high 19% CAGR, Flip-Chip is not new – in fact, it was first introduced by IBM over 30 years ago! As such, it would be easy to consider it an old, uninteresting, mature technology… but this is far from true! Instead, Flip-Chip is keeping up with the times and developing new bumping solutions to serve the most advanced technologies, like 3DIC and 2.5D. Indeed, no matter what packaging technology you're using, a bumping step is always required at the end! In 2012, bumping technologies accounted for 81% of the total installed capacity in the middle-end area. That's big. Really big. So big that it represents more than 14M 12''-equivalent wafers – and fab loading rates are high as well, especially for the Cu pillar platform (88%). Flip-Chip is also big on value: in 2012 it was a $20B market (making it the biggest market in the middle-end area), and Yole Développement expects it to continue growing at a 9% clip, ultimately reaching $35B by 2018!
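As a quick sanity check on the growth figures quoted above, here is a minimal Python sketch relating the CAGRs to the end-point values; the 2012 – 2018 six-year compounding span is my assumption.

```python
# Compound annual growth rate: value_end = value_start * (1 + cagr) ** years
def cagr(start, end, years):
    return (end / start) ** (1 / years) - 1

# Market value quoted above: $20B in 2012 growing to $35B by 2018.
print(f"Implied value CAGR 2012-2018: {cagr(20, 35, 6) * 100:.1f}%")  # ~9.8%, close to the quoted ~9%

# Wafer volume: compounding the quoted 19% CAGR over the same six years
# gives roughly the ~3x growth mentioned for the Flip-Chip platform.
print(f"19% CAGR over 6 years: {1.19 ** 6:.2f}x")                     # ~2.84x
```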
Flip-Chip capacity is expected to grow over the next five years to meet large demand from three main areas:
1) CMOS 28nm IC, including new applications like APE and BB
2) The next generation of DDR Memory
3) 3DIC/2.5D interposer using micro-bumping.

Driven by these applications, Cu pillar is on its way to becoming the interconnect of choice for Flip-Chip.
In addition to traditional applications that have used Flip-Chip for a while now (laptops, desktops and their CPUs, GPUs & chipsets – which are growing slowly but still represent significant production volumes for Flip-Chip), Yole Développement's analysts expect to see strong demand from mobile & wireless (smartphones), consumer applications (tablets, smart TVs, set-top boxes), computing and high-performance/industrial applications such as networking, servers, data centers and HPC.
The new “Flip-Chip packaged ICs” are expected to radically alter the market landscape with new specific motivations that will drive demand for wafer bumping. “In the context of 3D integration and the “More than Moore” approach, Flip-Chip is one of the key technology bricks and will help enable more sophisticated system on chip integration than ever before!”, says Lionel Cadix, Market & Technology Analyst, Advanced Packaging, at Yole Développement.
Flip-Chip is being reshaped by a new kind of demand that is hungry for Cu pillars and micro-bumps, which are on their way to becoming the new mainstream bumping metallurgy for die interconnection.


Meanwhile, Cu pillar is fast becoming the interconnect of choice for advanced CMOS (≤ 28nm), memory, and micro-bumping for 2.5D interposers and 3DIC.
In addition to studying mainstream bumping technologies, this Yole Développement report focuses on Cu pillar bumping, which is becoming increasingly popular for a wide variety of applications. The massive adoption of Cu pillars is motivated by a combination of several drivers, including very fine pitch, no UBM needed, high Z standoff, etc.
Cu pillar Flip-Chip is expected to grow at a 35% CAGR between 2010-2018 in terms of wafer count. Production is already high at Intel, the #1 Flip-Chip producer – and by 2014, more than 50% of bumped wafers for Flip-Chip will be equipped with Cu pillars.
As early as 2013, micro-bumping for 2.5D & 3DIC, in conjunction with new applications like APE, DDR memory, etc., will boost Flip-Chip demand and create new challenges and new technological developments (see figure on the left). Today, Flip-Chip is available in a wide range of pitches to answer the specific needs of every application.
The ultimate evolution in bumping technologies will consist of directly bonding ICs via their copper pads. 3D integration of ICs using this bump-less Cu-Cu bonding is expected to provide an IC-to-IC connection density higher than 4 × 10^5 cm^-2, making it suitable for future wafer-level 3D integration of ICs in order to extend Moore's Law scaling.
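To give a feel for what that density means, here is a small Python sketch that converts it into an equivalent interconnect pitch, assuming (purely for illustration) a uniform square array of connections:

```python
import math

# Quoted IC-to-IC connection density for bump-less Cu-Cu bonding.
density_per_cm2 = 4e5

# Assuming a uniform square array, each connection occupies 1/density of area,
# so the equivalent pitch is the square root of that area.
area_per_connection_cm2 = 1 / density_per_cm2
pitch_um = math.sqrt(area_per_connection_cm2) * 1e4   # 1 cm = 10,000 um

print(f"Equivalent pitch: ~{pitch_um:.0f} um")   # ~16 um
```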



Taiwan is the #1 location for Flip-Chip bumping, announced Yole Développement.
The major OSATs are preparing to produce fcBGA-based Cu pillar packages and won't limit the reach of Cu pillar bumping to fcCSP. This will allow every company involved in CPUs, GPUs, chipsets, APE, BB, ASICs, FPGAs and memory to access Cu pillar Flip-Chip technology.
Cu pillar capacity is expected to grow rapidly over the 2010 – 2014 timeframe (31% CAGR), hitting ~ 9M wspy by 2014 and supporting the growing demand for micro-bumping and advanced CMOS IC bumping.
In the evolving middle-end area, CMOS foundries now offer wafer bumping services (TSMC, GLOBALFOUNDRIES, etc.), as opposed to bumping houses, which are dedicated to bumping operations (FCI, Nepes, etc.), and OSATs, which keep investing in advanced bumping technologies.


In 2012, OSATs owned 31% of installed capacity in ECD solder bumping and 22% of installed capacity in Cu pillar bumping. A full overview of 2012 installed capacities for all bumping platforms is provided in this report.
Concerning geography, Taiwan has the biggest overall bumping capacity (regardless of the metallurgy), with important capacity coming from foundries and OSAT factories. Taiwan currently leads the outsourcing “solder & copper” Flip-Chip wafer bumping market (see figure on the left). Flip-Chip market growth, spurred on by the emergence of the “middle-end” environment, has challenged traditional “IDM vs. fabless” supply chain possibilities more than ever before!


This is a guest post by Yole Développement that provides marketing, technology and strategy consulting.


Getting the Most out of IC Design Houses

In the field of IC design, there are many companies with a unique focus and deep experience in designing integrated circuits, as well as in the other aspects of IC design house services, such as packaging, testing, validation, et cetera.


It may seem like a simple choice to go with one of the big-name companies already established on the market, but their services can be much more expensive than one has bargained for.


Therefore, many companies may want to seek the services of smaller, more specialized IC design houses to cater to very specific needs, or may want to employ a large provider but find that their projects are too small to be considered by these established names.


A pool of IC design houses and vendors makes it easier to find the right IC design house(s) for your needs, which not only helps you save costs but also reduces time to market and minimizes the risk of paying for additional masks or design bugs.


But How Do IC Design Houses Get Exposure?


Looking for a more independent IC design house can be difficult for those relying on a typical Google search. That is one of the primary goals of this search directory: to help companies make the most affordable choice of IC design house while helping smaller, more independent companies obtain the exposure they need.


Smaller companies with a narrower focus in the IC design realm can boost their exposure and get the jump start they need to work their way into the market.


Provide the Right Projects to IC Design Houses


Although visitors are free to browse the lists of providers using the search option, there is also an RFQ form that can be filled out and submitted to the IC design houses via a concierge service. Based on the criteria listed, a visitor's request is submitted to those IC design houses whose skills best fit the job, and they can then contact you directly. This makes it much easier for customer and provider to engage in a dialogue that leads to the creation of a project and possible future engagement.


By pairing the right IC design house(s) with each customer, the marketplace will continue to grow, and the new IC design houses that are constantly being created won't be left stranded in the market with no way of gaining the attention they deserve.


Join AnySilicon search directory to get your IC design house listed – click here.


TSV is a business…Looking for wider adoption!

3D Through Silicon Via (TSV) technology is used in MEMS, CMOS Image Sensors and high-end applications. When will it be used for mainstream consumer applications?… These questions are addressed in the new report released by Yole Développement (Yole): 3DIC & 2.5D TSV Interconnect for Advanced Packaging – 2014 Business Update. This technology & market report gives an overview of TSV implementation for various devices and packages, including memories, logic, MEMS, photonics, CIS and other applications. The company analyzes future 3D products for high-end applications and alternative packaging technologies (fan-out, advanced organic substrates, monolithic 3D). Yole also details the market adoption roadmap and wafer starts by platform and by application.


"Through Silicon Via (TSV) technology was adopted in production a few years ago for MEMS and CMOS Image Sensors (CIS). Driven by consumer applications such as smartphones and tablets, this market is expected to continue to grow over the next several years. For high-end memories, 2015 will be the turning point for 3D adoption", explains Thibault Buisson, Technology & Market Analyst, Advanced Packaging at Yole. "Standards have now been established, therefore the industry will be ready to enter high-volume manufacturing. Wide I/O and logic-on-logic will follow, most probably around 2016-2017", he adds. Emerging applications, such as photonics based on interposers, are also being developed for future products. However, their market entrance is most likely not going to happen before 2019-2020.



According to Yole Développement, market drivers have not fundamentally changed over the years. Today, 3DIC is still driven by the need to increase performance and functionality, and to reduce form factor and cost. Thanks to its many advantages, as well as its ability to enable heterogeneous integration, 3DIC technology is being considered for a wide range of applications.


“There is a significant advantage to using 3DIC and that is why this packaging platform is part of all the roadmaps of the key semiconductor players across the entire supply chain”, details Rozalia Beica, CTO & Business Unit Director, Advanced Packaging and Semiconductor Manufacturing.


Once 3D is adopted, it is never dropped! In CMOS Image Sensor applications, the evolution of TSV has never stopped. Even though the integration methods used for CMOS Image Sensors have changed over the years, TSV has continued to be incorporated in the packaging of these devices, increasing functionality and enabling more efficient utilization of the silicon area. Sony, the leader in CMOS Image Sensors, by using a full-fill TSV, via-last approach to stack the CIS onto a CMOS die, was able to dedicate 90% of its die surface area to the pixel array while decreasing the size of the die. This technology, called Exmor, uses a 3D stacked integration approach and is currently the new trend for this type of device, as it enables a smaller die size and faster on-chip processing. The path is open for heterogeneous integration of devices: MEMS are being integrated onto ASIC dies connected with TSVs (for example mCube and Bosch with their accelerometer products), 3D stacked devices with integrated passives are appearing for medical applications, etc.


This is a guest post by Yole Développement. Click here to read more from Yole Développement.