Monthly Archives: March 2016


IDDQ Testing

IDDQ testing is one of the many ways to test CMOS integrated circuits in production; these tests aim to find the various types of manufacturing fault, which can be a serious hazard in the field. The method relies on measuring the supply current (Idd) in the quiescent state (the static state of a non-switching circuit). The current measured in this state is called Iddq, for Idd (quiescent).


 

This testing method is based on the principle that in a correctly operating quiescent CMOS digital circuit there is no static current path between the power supply and ground, apart from a small amount of leakage. Many semiconductor manufacturing faults create such a path, so the test detects them simply by picking up an increased magnitude of quiescent current. This gives it the advantage of checking the chip for many possible faults with a single measurement. It also outperforms conventional stuck-at fault test vectors in that it catches faults that usually go undetected by those measurements.

 

Even though this method is popular and simple, its inner workings are more subtle than just measuring the supply current. To use an example, if a line is shorted to Vdd, the circuit will still draw no extra current while the gate driving the signal sets it to ‘1’. But a different input pattern attempting to set the signal to ‘0’ will show an increase in quiescent current, indicating a defective part. A typical Iddq test uses about 20 input vectors. These test inputs need only controllability, not necessarily observability, because observation takes place through the shared power connection.
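To make the pass/fail idea concrete, here is a minimal sketch in Python of how a tester might screen parts against an Iddq limit. The threshold and current values are hypothetical, chosen only to illustrate the decision rule, not taken from any real test program.

```python
# Minimal sketch of an IDDQ pass/fail decision (hypothetical values).
# A real tester applies each vector, waits for the circuit to settle,
# then measures the quiescent supply current.

IDDQ_LIMIT_UA = 10.0  # pass/fail threshold in microamps (assumed)

def iddq_test(measurements_ua):
    """Return the vector indices whose quiescent current exceeds the
    limit; an empty list means the part passes."""
    return [i for i, idd in enumerate(measurements_ua) if idd > IDDQ_LIMIT_UA]

# Example: 20 vectors, one of which excites a short to Vdd.
measured = [0.4] * 20
measured[7] = 350.0  # fault activated by vector 7 draws extra static current

failing = iddq_test(measured)
print("FAIL at vectors", failing) if failing else print("PASS")
```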

 

IDDQ testing has significant practical advantages. First, it is a simple and direct test that identifies physical defects more effectively than standard methods. Second, its overhead is modest: design time and area overhead are relatively low, test generation is fast, test application time is short because the vector sets are small, and it catches underlying defects that other tests cannot pick up immediately.

 

One disadvantage of IDDQ testing is that it is slower, and therefore more expensive, than methods such as scan testing. The reason is that analog current measurements take much more time than reading digital pins in mass production.


NVM Express IP for Enterprise SSDs – Overview and Implementation

The NVM Express (NVMe) specification was introduced in 2011. Five years later, it is firmly established as the new standard storage interface for Solid-State Drives (SSDs). Even though SAS and SATA SSDs still dominate the market in unit shipments, the PCIe SSD market share is growing fast and is expected to overtake SAS and SATA.

 

Most SSD manufacturers jumped into this new storage market with flash-based technology. A second wave of products will come in the near future, using a new generation of non-volatile memories that deliver impressive speed compared to NandFlash. SSD manufacturers will have to deal with low-latency SSD controller design in order to benefit from the new NVM features, while keeping reliability high and power consumption low.

 

This white paper proposes a solution based on a full hardware NVMe implementation, describing its architecture, implementation and characterization.


The origin of NVM Express (NVMe)

 

From HDD to SSD

 

The total cost of ownership (TCO) of a system based on SSDs (Solid State Drives) is now in the same range as systems running on spinning disk drives (HDDs), or even lower for configurations that require high performance. The price per gigabyte is higher, but the performance yields a higher return on investment. An HDD is limited to about 100 random IOPS due to its mechanical latency, while a good SATA SSD can reach up to 70 kIOPS. An application that requires 10 kIOPS therefore needs at least 100 HDDs, but a single SATA SSD is enough, saving cost on drives, CPUs, memory, software licenses and power consumption.
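As a back-of-the-envelope check of that sizing argument, using only the illustrative figures above and rounding drive counts up:

```python
import math

TARGET_IOPS = 10_000
HDD_IOPS = 100         # random IOPS per spinning disk (figure from the text)
SATA_SSD_IOPS = 70_000 # good SATA SSD (figure from the text)

hdds_needed = math.ceil(TARGET_IOPS / HDD_IOPS)       # -> 100 drives
ssds_needed = math.ceil(TARGET_IOPS / SATA_SSD_IOPS)  # -> 1 drive
print(hdds_needed, ssds_needed)
```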

 

 

In the digital world, where the demand for performance increases every day, we see the same comparison between SAS/SATA SSDs and PCIe SSDs. The latter are more expensive in dollars per gigabyte, but provide roughly 10 times the performance in terms of IOPS. For applications whose value is tied to performance, PCIe SSDs are therefore the standard choice, providing the lowest TCO.

 

 

With faster non-volatile memories (NVM) and more powerful CPUs, the SATA/SAS interface is the IOPS bottleneck. The PCIe interface provides higher bandwidth and lower latency.

 

 

Non-Volatile Memory and Multicore

 

NandFlash memory is the technology currently used in SSDs: read access is in the 50 µs range, and write access in the 1 ms range. Post-NandFlash memory will be available on the market soon. Intel and Micron announced the 3DXP technology in 2015, and many other players are working on equivalent technologies. This next generation of non-volatile memories (NG-NVM), such as 3DXP, RRAM or MRAM, offers higher performance than NandFlash, in the microsecond range or below for both read and write access.

 

In addition to the new generation of non-volatile memories, processing technologies are evolving toward higher performance. Clock frequency is no longer growing, due to physical limitations, but architectures allow the integration of multiple cores. High-end CPUs come with up to 18 cores, while some ASSPs come with more than 200 cores (manycore CPUs). CPUs are thus able to provide more computing capability, including IO management.

NVM express specification

 

As described above, current technologies allow the design of very fast SSDs based on fast memories, a low-latency PCIe interface, and high-performance multicore CPUs on the host side. But what about the storage protocol? Legacy protocols designed for single-core architectures and slow disks are no longer adequate. In addition, each of the first PCIe SSD manufacturers developed its own driver, adding development and qualification costs.

 

A new standard protocol was required…NVM Express.

 

NVM Express was developed to reduce latency and provide faster performance, with support for security and end-to-end data protection. Defined by more than 80 NVM Express Work Group members, the specification, published in March 2011, provides a flexible architecture for Enterprise and Client platforms. NVM Express is an optimized, high-performance, scalable host controller interface with a streamlined register interface and command set designed for Enterprise and Client systems that use PCI Express SSDs. For more details on the specification: www.nvmexpress.org.

 

On the host side, the driver source code is available on the NVM Express website, and is supported by the major operating systems.

 

NVM Express benefits

 

  • Driver standardization: an open-source version is available from the NVM Express web site. Future NVMe features, such as vendor-specific commands, may be integrated into the standard driver.
  • Performance increase, since the SATA bottleneck has been removed:
    • A virtually unlimited number of commands: NVM Express is able to benefit from all the NandFlash resources in parallel (SATA is limited to 32).
    • Multi-queue management
    • Faster serial bus with multiple lanes
    • Reduced power consumption

Protocol

 

The NVM Express specification defines a controller interface for PCIe SSDs used in Enterprise and Client applications. It is based on a queue mechanism with an advanced register interface, a command set and a feature set that includes error logging, status reporting, system monitoring (SMART, health) and firmware management.

 

As a basic example, below is the protocol for a single write command.

 

 

NVMe is a protocol encapsulated in PCIe data packets. Here is a simplified description of a write access (a toy model in Python follows the list):

1) The host writes to the device's registers (the doorbell) to inform it that a new submission command is ready in a submission queue.

2) The device fetches the submission command from the submission queue in host memory. The submission command describes the write access: source and destination addresses, data size, priority, and so on.

3) The device manages the data transfer.

4) The device posts an entry to the completion queue to notify the host.
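The toy model below restates those four steps in Python. It is only a conceptual sketch: the queue, register and field names are invented for illustration and do not come from the NVMe specification or any real driver.

```python
# Toy model of the simplified NVMe write flow above (not a driver:
# names, sizes and fields here are illustrative only).
from collections import deque

submission_queue = deque()   # lives in host memory
completion_queue = deque()   # lives in host memory
doorbell = {"sq_tail": 0}    # device register written by the host

# 1) Host enqueues a write command and rings the doorbell.
submission_queue.append({"opcode": "WRITE", "lba": 0x1000, "nblocks": 8,
                         "host_buffer": 0xDEAD0000, "cid": 1})
doorbell["sq_tail"] += 1

# 2) Device fetches the submission command from host memory.
cmd = submission_queue.popleft()

# 3) Device performs the data transfer (DMA between host buffer and media).
#    ... omitted ...

# 4) Device posts a completion entry and notifies the host.
completion_queue.append({"cid": cmd["cid"], "status": "SUCCESS"})
print(completion_queue[0])
```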

Commands and queues

 

As explained above, NVMe is a submission/completion queue-based protocol. Commands are created by the host and placed in a submission queue; a completion queue is used to signal the executed commands back to the host. The protocol supports up to 64k I/O queues with up to 64k entries per queue. When a submission command is ready in the submission queue, the host signals it to the device through a tail/head doorbell mechanism, and the device then fetches the submission command from host memory. The submission command is executed according to its priority, defined by an arbitration scheme. Each submission command is 64 bytes in size and is composed of a command identifier, an opcode, a size, an address and all the other information required for its execution. The predefined submission commands are specific to data transfer, which leads to a much shorter list than the SCSI protocol's.
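As a rough illustration of the fixed 64-byte command size, the sketch below packs a simplified write entry. The field layout and offsets are an approximation for illustration only, not the normative layout from the specification.

```python
import struct

# Simplified 64-byte submission entry: opcode, command id, namespace,
# two data-pointer words, starting LBA and block count, padded to 64 B.
# Offsets are illustrative; consult the NVMe spec for the real layout.
SQE_FORMAT = "<BxH I 8x Q Q Q H 22x"  # little-endian, 64 bytes total

entry = struct.pack(SQE_FORMAT,
                    0x01,        # opcode (0x01 = Write in the NVM command set)
                    1,           # command identifier (CID)
                    1,           # namespace ID
                    0xDEAD0000,  # data pointer 1 (e.g. a PRP entry)
                    0,           # data pointer 2
                    0x1000,      # starting LBA
                    7)           # number of blocks, 0-based (8 blocks)
assert len(entry) == 64
```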

 

The list includes administration and IO submission commands:

 

Administration submission commands:

 

Delete I/O Submission Queue, Create I/O Submission Queue, Get Log Page, Delete I/O Completion Queue, Create I/O Completion Queue, Identify, Abort, Set Features, Get Features, Asynchronous Event Request, Firmware Activate, Firmware Image Download, I/O Command Set specific, Format NVM, Security Send, and Security Receive

 

I/O commands:

Mandatory IO submission commands: Flush, Write, Read

Optional IO submission commands: Write Uncorrectable, Compare, Write Zeroes, Dataset Management, Reservation Register, Reservation Report, Reservation Acquire, Reservation Release

 


Enterprise Grade: performance and reliability

 

Latency

 

There are multiple definitions of latency. Rather than pick one detailed definition, this paper lists some parameters to take into account when comparing latency numbers.

 

Start and stop times: latency is the time between a request and its completion. For an SSD, what counts as the start time: when the host initiates the data transfer, or when the SSD receives the request? The difference is that between a subsystem latency and a system latency.

 

And the stop time: is it the end of the full data transfer, or the moment the first data appears on the bus?

 

Another parameter is the data size, when latency is taken to include the full data transfer. For PCIe SSDs, latencies are quoted for IO sizes of 4 kB, 512 B or less. The transfer time is not the same, leading to differences in the latency numbers.

 

In addition, the latency given in an SSD's documentation is usually measured on a fresh drive. In that case the memory is empty and access is quite fast: less than 100 µs, or even lower when cache mechanisms are used. After a few hours of running, however, latency is more in the millisecond range, or at best a few hundred microseconds.

 

So, when talking about latency, make sure everyone is using the same definition.

 

Queue depth

 

The queue depth (QD) is the number of commands executed in the same batch. A queue depth of 1 means the host waits for the completion of each command before sending a new one; a queue depth of 8 means that 8 commands are processed in the same batch. IOPS performance is tied to the queue depth and the latency. To reach the maximum number of IOPS, SSD manufacturers typically benchmark with QD = 256; in real-life applications, the queue depth is between 8 and 32.
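The relationship underneath these numbers is roughly IOPS ≈ queue depth / latency. A quick sanity check with illustrative values:

```python
# IOPS ~= queue_depth / average_latency, assuming the device can keep
# queue_depth commands in flight and latency stays constant
# (a simplification: real latency grows with queue depth).
def iops(queue_depth, latency_s):
    return queue_depth / latency_s

print(iops(1, 100e-6))   # QD=1,  100 us latency -> 10,000 IOPS
print(iops(32, 100e-6))  # QD=32, same latency   -> 320,000 IOPS
```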

 

IOPS

 

It's important to understand IOPS numbers. Storage manufacturers typically quote best-case IOPS figures. But for which PCIe configuration? Which IO size? Which queue depth? Here is a derivation of the theoretical maximum number of IOPS on a PCIe Gen2 x4 configuration.

 

IOPS is one of the key performance parameters for a storage system; the others are latency (in µs or ms) and throughput (MB/s). The IO size is generally 4 kB, and results are given in kIOPS. The typical measurements listed in storage documentation cover read and write, with random and sequential accesses. Benchmark studies add mixed workloads such as 70/30 read/write.

 

 

A Gen2 lane runs at 5 Gb/s, so four lanes give 20 Gb/s overall. It would be too easy to simply divide 20 Gb/s by 4096 bytes and claim 610 kIOPS! The real value must take at least three parameters into account: the 8b/10b coding, the PCIe overhead, and the fact that a bus is never used 100% of the time. The first factor, 8b/10b coding, means that only 80% of the bits transferred on the PCIe bus are real information. The second factor is the PCIe overhead: in addition to each data packet, extra bytes are transferred, such as a sequence number, header and CRC, amounting to 20 to 24 bytes of overhead. Taking the typical payload (data packet size) of 256 B, 280 B are used to transfer 256 B, a 256/280 = 91% efficiency. Finally, the PCIe bus occupancy rate is estimated at 90%.

 

 

Therefore, the maximum is 610k x 80% (8b/10b coding) x 91% (PCIe overhead) x 90% (PCIe bus occupancy) = 400 kIOPS. This is the maximum we can reach assuming pure data transfer, without any protocol or command management. Unfortunately, it is impossible to send and receive only data on the bus of a PCIe SSD: a protocol is required to carry mandatory information alongside the data (address, packet size, priority, system information…). The NVMe specification defines an optimized register interface, command set and feature set for PCIe SSDs. The commands needed for data transfer add traffic on the PCIe bus, causing a performance loss relative to the theoretical maximum of 400 kIOPS. This loss is estimated at 5%, leading to 400 kIOPS x 95% = 380 kIOPS. This is the maximum IOPS performance one can observe on an optimized storage system using an NVMe PCIe Gen2 x4 interface.
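The derivation above is easy to reproduce; the short script below recomputes the 610k, 400k and 380k figures from the stated factors.

```python
# Theoretical IOPS ceiling for a PCIe Gen2 x4 link with 4 kB IOs,
# using the factors estimated in the text.
lane_gbps, lanes = 5e9, 4
io_bits = 4096 * 8

raw_iops = lane_gbps * lanes / io_bits  # ~610 kIOPS, naive division
encoding = 0.80                          # 8b/10b line coding
pcie_overhead = 256 / 280                # ~91%, 24 B overhead per 256 B payload
occupancy = 0.90                         # bus is never 100% busy
nvme_protocol = 0.95                     # ~5% loss for NVMe command traffic

print(raw_iops * encoding * pcie_overhead * occupancy)                  # ~400 kIOPS
print(raw_iops * encoding * pcie_overhead * occupancy * nvme_protocol)  # ~380 kIOPS
```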

 

 

Performance for intensive applications

 

Why do we need systems with low latency? Social networks are a good example: a server receiving millions of requests must access the storage media in the shortest possible time. And that is not the only case; database requests, big-data analytics, in-memory computing and others have the same need. For all these businesses, higher latency means fewer transactions per day, which is often closely linked to revenue. In a few words: time is money!

 

It is easy to see how low latency benefits the applications above, but there are other ways to exploit low-latency NVMe drives. In a software-defined infrastructure, the resources are virtualized; this flexible server management is feasible only with fast connections between storage and compute, even across racks. For that, the NVM Express specification is being adapted to other physical layers (Ethernet, InfiniBand, Fibre Channel…) in order to keep NVMe's low-latency advantages; this is called NVMe over Fabrics.

 

 

Reliability

 

Reliability is one of the most important criteria for an enterprise storage system: losing data is not allowed. What makes a storage system reliable? The main potential issues are memory corruption, controller failure and performance inconsistency, any of which may lead to an availability problem. IT managers typically expect 99.999% availability. Below is a short list of features to integrate into an NVMe SSD in order to reach the five nines.

 

Quality of service: this is often expressed in terms of latency. SSD manufacturers often advertise their lowest latency number (e.g. 20 µs), but IT managers will look at the 99.99th-percentile latency (e.g. a few hundred µs). The internal SSD management, including NVMe protocol processing, must therefore be designed to ensure low latency in all cases.

 

Redundancy: all-flash arrays are typically based on a dual-controller architecture, with one active and one standby controller. This allows access to the SSDs even if a controller has failed. On the SSD side, it means that each SSD must be accessible by the two controllers. A switch mechanism is required in front of the SSD or, as a better solution, a dual-port interface can be integrated inside the SSD. This can be done by using two PCIe interfaces.

 

Data protection: data coming from the host passes through many interfaces and memory buffers: the PCIe interface, internal RAM, CPU, external DRAM, flash memory… At each step there is a chance of data corruption, for example due to power spikes. Ensuring end-to-end protection requires specific mechanisms, which are well defined by the T10 organization.

 

ECC: NandFlash memory is well known for its limited endurance (a few hundred to a few thousand cycles). As this limit is approached, bits in the memory may be corrupted, so an error correction code (ECC) is required to detect and correct the corrupted data. The ECC is implemented in the SSD controller, in software or hardware. Common ECCs are based on BCH codes; LDPC codes, which can correct a larger number of errors, are better suited to upcoming NandFlash memories.

 

What about NVMe and Reliability?

 

The NVM Express specification has been defined for both enterprise and client SSDs, so reliability features can be integrated easily.

 

 

 

NVMe SSD Design

 

Architecture

 

IP-Maker has developed its own NVMe IP from the ground up, designed for integration into SSD controllers. Using a pre-validated NVMe IP core greatly reduces time-to-market for storage OEMs that want a powerful NVMe-compliant solution. The IP-Maker NVMe IP core is full-featured and easy to use in both FPGA and ASIC designs.

 

Below is the architecture of the NVMe IP from IP-Maker. All the parts required by the NVMe specification are implemented as separate hardware blocks, including the configuration space, queue context management, queue arbitration and the read/write DMA engines.

 

 

Each hardware block takes only a few clock cycles to do its work, dramatically reducing the NVMe processing latency. The impact of NVMe processing on overall system latency is therefore very small compared to the other latency contributors.

 

The NVMe commands are processed by an automatic command processing unit. The data transfer rate is accelerated by the multi-channel DMA integrated in the read and write engines (the read and write DMA channels are independent and can operate in parallel), so the PCIe bus is kept busy with NVMe accesses at all times. The maximum throughput is set by the PCIe configuration: the number of lanes and the speed generation.

 

This full hardware NVMe architecture is ideal for persistent memories, such as 3DXP, NVRAM, MRAM or RRAM.

 

Command processing unit

 

IP-Maker has implemented a full hardware architecture for the NVMe commands. That includes the different steps: fetching, arbitration and execution. This Automatic Command Processing Unit is connected to a multi-channel DMA to perform data transfers.

 

All the fetching is done automatically in hardware with dedicated, customizable DMAs. The different arbitration rules, such as round robin, are implemented in a dedicated hardware block, and the commands are executed by an NVMe command engine. The additional features are also managed in hardware, including queue context management, interrupt management, asynchronous events and the error log page.

 

All administration submission commands and the mandatory NVM IO submission commands are processed in hardware, without an external processor. For a read or write command, the DMAs are triggered to perform the data transfers, resulting in a full hardware implementation of the NVM Express specification. A CPU may still be needed for flexibility, for example for vendor-specific commands. This architecture, with its standard interfaces, is easy to integrate between the PCIe and memory controllers.

 

Multichannel DMA

 

The multi-channel DMA is configurable, with up to 32 channels for read and up to 32 channels for write. The DMA engines are activated as soon as a memory transfer command is fetched by the automatic command processing unit. Using multiple channels allows data transfer to proceed continuously, avoiding stalls while data is being read from or written to the memory. NVMe is based on a queue mechanism in which one queue can hold multiple commands. For latency optimization, it is best to use one command per queue; for data transfer throughput, it is better to increase the number of commands per queue (e.g. 1024 commands per queue).

 

Reliability

 

The way the NVMe protocol is managed and processed can have an important impact on quality of service. In a full software implementation, the processing time may vary with IRQ handling; in a full hardware architecture, the processing time is deterministic, providing good system latency QoS.

 

A full hardware NVMe implementation is also easier to combine with a dual PCIe interface. An NVMe IP can sit behind each PCIe controller, or a single NVMe IP can be shared by both PCIe controllers. The second option is easier to manage, with a tag system identifying which PCIe controller is accessing the NVMe IP.

 


Test and verification

 

Reference design

 

The NVMe IP has been integrated into an FPGA-based reference design built on a Xilinx FPGA. The NVMe IP is connected to the PCIe hard IP, configured as Gen2 x4, and to a soft DDR3 controller IP. The storage side of this reference design uses 2 GB of DDR3 memory in order to demonstrate the NVMe IP's performance.

 

 

On the host side, a server platform runs Linux Fedora 17 with the NVMe driver. When installed in the PCIe slot, the NVMe reference design is detected as an NVMe storage device. Performance is measured with the standard FIO tool (the GFIO version, with a graphical user interface).

 

The latency numbers have been measured with GFIO, a standard tool for storage benchmarking:

 

 

GFIO shows 380 kIOPS (the maximum on a Gen2 x4 configuration) for both sequential and random 4 kB IOs. On the latency side, it measures 12 µs (QD = 1, IO = 4 kB), split between 3 µs for data transfer setup and 9 µs for data transfer completion. Setup covers the file system and NVMe driver; completion covers the doorbell register write, submission command fetch, data transfer and completion command. The latency contribution of the IP-Maker NVMe IP itself is only a few hundred nanoseconds.

 

Other tools, such as a protocol analyzer, may be used to examine all the NVMe transaction details.

 

Compliance test

 

In order to be NVM Express compliant, an SSD product must pass the official test suite managed by the University of New Hampshire Interoperability Lab (UNH-IOL). The conformance tests are based on the UNH-IOL NVM Express Conformance Test Suite document released by the NVM Express work group. UNH-IOL provides NVM Express conformance testing.

 

For more details on the test suite: https://www.iol.unh.edu/services/testing/NVMe

 

The IP-Maker reference design successfully passed the NVMe compliance test in 2015.

IP-Maker is part of the official NVMe integrator list: https://www.iol.unh.edu/registry/nvme

 


Conclusion

 

This NVMe hardware implementation combines ultra-low latency, reduced power consumption and limited silicon cost, making it an ideal solution for high-performance data storage systems. Industry leaders can obtain enterprise-grade behaviour without added cost. It is also ready to support the next generation of NVM, which brings better latency, density and power consumption.

 


About IP-Maker

IP-Maker is a leader in Intellectual Property (IP) for high-performance storage applications. IP-Maker's NVM Express (NVMe) technology provides a unique hardware-accelerated solution that leverages PCIe SSD performance, including ultra-low latency and high throughput. IP-Maker is a contributor to the NVMe specification. The ASIC and FPGA IP portfolio includes NVMe, Universal NandFlash Controller and ECC IP cores. The combination of IP-Maker technology and its associated services dramatically cuts time-to-market.

www.ip-maker.com

 

Contact information: contact@ip-maker.com  

 


Understanding IC Cost

There have been many debates about the final cost of an IC, and over the years much misconception and failure to agree on how it should be calculated. The reason is that ICs are not a simple concept anymore. Technology moves at an extremely fast pace, and chip designers have to keep up with it when calculating IC cost.

 

 

A while ago, the silicon die was the dominant factor in IC cost. Back then, estimating a chip's cost was as easy as determining its die size. While silicon remains a key element in the equation, it has become necessary to consider all the other components that play an equally important role.

 

Experts have noted that there is a very simple equation one can use in order to determine the final chip cost:

 

Final IC cost = die cost + package cost + test cost + shipping cost

 

 

This equation covers the factors with the biggest impact on the cost of producing a chip. Of course, other factors can be added to this standard calculation. The shipping term, for example, covers more than transport alone: handling, the ERP system, trays and boxes, insurance, and other costs you need to consider.
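A minimal sketch of that equation in Python, with die cost derived from wafer price and yield; every figure below is hypothetical and only illustrates the structure of the calculation.

```python
# Minimal sketch of the cost equation above (all figures hypothetical).
# Die cost is derived from wafer price and yielded good dice per wafer.
def die_cost(wafer_price, gross_dice_per_wafer, yield_fraction):
    return wafer_price / (gross_dice_per_wafer * yield_fraction)

def final_ic_cost(die, package, test, shipping):
    return die + package + test + shipping

die = die_cost(wafer_price=3000.0, gross_dice_per_wafer=2000, yield_fraction=0.85)
print(round(final_ic_cost(die, package=0.35, test=0.10, shipping=0.05), 3))
```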

 

The most important thing to keep in mind when calculating IC cost is that the price may change during the manufacturing process for several reasons. The most obvious is that some price elements, such as yield and test time, are not yet fixed. Technical decisions made during manufacturing can also have a huge impact on the economics of the project. The only way to calculate an accurate IC cost is to recognize which elements are still open.

 

 

Early analysis is the key to ensuring that you keep the IC costs under control. Therefore, we have made this calculator that can help you get a good estimate of the final IC cost.

 

IC Cost

  • Wafer: wafer price quote
  • Wafer Sort: test price quote
  • Package: package price quote
  • Final Test: test price quote

 

 


Designing for Power Integrity: Status, Challenges and Opportunities

It has been almost two decades since the target impedance concept was first proposed for the design of power distribution networks. Both academia and industry have come a long way since then by proposing solutions for managing power integrity in packages and printed circuit boards (PCBs). This paper briefly reviews the past and identifies challenges that need to be addressed in the future to tackle this problem. These challenges are often opportunities for research that can lead to interesting and innovative solutions. Some ideas for managing power integrity in the future are discussed in this paper.

 

I. Introduction

 

The semiconductor industry has been very successful in scaling the transistor over the last five decades. Thanks to Moore's law, this scaling has enabled the integration of a billion transistors on a chip today. However, scaling requires that the voltage be reduced from one computer generation to the next, and this has essentially led to the problems related to power distribution. With current increasing due to the doubling of transistors every 18 months and due to voltage scaling, the demands placed on the power distribution network have been steadily increasing. Designing for power integrity refers to managing the power supply noise across the voltage and ground terminals of the transistors such that they function at speed. The chip, package and PCB each contribute their fair share to the generation of power supply noise, and hence their individual designs and the interactions between them play a large role in determining it. This is depicted in Figure 1, where both the core and I/O circuits need to be powered through the power distribution network (PDN).

 

 

In Figure 1, the core circuits correspond to transistors that communicate with each other within the integrated circuit (IC), while the I/O circuits are the input/output terminals used to communicate between ICs through the package and PCB. As has been well documented by several researchers [1], voltage fluctuations on the power supply rails of the transistors cause increased jitter and a reduction in voltage margin; power supply fluctuations therefore have a direct influence on the operating frequency. Over the years, off-chip communication speed between ICs has steadily increased, with speeds greater than 6 Gbps per I/O terminal supported, as in GDDR5 [2], along with an increase in the number of parallel bits transmitted. With the objective of reaching >1 TBps of communication bandwidth between ICs, the electronics industry is developing new technologies and signalling schemes to enable it. Examples are 2.5D and 3D integration, where ICs are partitioned and communicate with each other through a silicon interposer or are stacked on each other using through-silicon vias, as shown in Figure 2 [3].

In Figure 2, the communication between ICs determines the speed of the system, and hence the interconnections and package play a very critical role in determining system performance. The focus of this article is primarily on off-chip (or I/O) signalling.

 

II. The Status

Though power supply noise is a transient phenomenon that occurs due to the switching of transistors, the design of the PDN is best accomplished in the frequency domain. This concept, which originated in the mid-1990s, is the methodology pursued by most designers today. It involves optimizing the PDN impedance so that it meets a target impedance value, where the target impedance is defined using Ohm's law. This is illustrated in Figure 3, where the target impedance Z_T allowed looking from the power supply terminals of the transistor or IC towards the Voltage Regulator Module (VRM) is given by:

Z_T = dV / I

where dV is the allowed ripple (a specification of the transistor) and I is the current drawn. If the current drawn by the transistors is known and is assumed constant with frequency, then the target impedance is constant with frequency as well, as shown in Figure 3. Since the PDN contains resistance, inductance and capacitance, the resulting frequency response is oscillatory, with resonances (nulls) and anti-resonances (peaks), as depicted in the figure. The objective of the design process is to ensure that the PDN response never exceeds the target impedance over the frequency bandwidth of interest (typically up to the fundamental of the clock frequency or higher) [4].

 

On a semi-log scale, as shown in Figure 3, the positive slope in the PDN response is due to the inductance, shown as the culprit here (since it increases the impedance), and the negative slope is due to the capacitance, shown as the saviour (since it decreases the impedance). The effect of resistance is not shown in the figure even though it is very important in managing the frequency response. As noted in Figure 3, various parts of the system, including the voltage regulator module (VRM), the capacitors (on the chip, package and PCB) and the planes, contribute to the impedance depending on the frequency range. This concept can be applied to both core and I/O signalling, the fundamental difference between the two being the long transmission lines in the package and PCB used to communicate between ICs in the latter.

 

Figure 3: Target impedance and PDN response

 

Figure 4: Transmission line signalling, return current, return path discontinuity (RPD) and role of capacitor

 

 

 

Figure 5: Eye diagrams (a) before capacitors: height=400mV and jitter=93ps and (b) after capacitors: height=475mV and jitter=76ps

 

The importance of transmission line signalling for signal and power integrity is best illustrated using a simple four-metal-layer structure in Figure 4, where an interconnection transitions between the top and bottom layers, creating two discontinuities along its path [5]. As is well known from electromagnetic theory, a transmission line supports a forward and a return current; charging the interconnection in Figure 4 causes a return current to flow on the plane closest to it. The return current must transition between the planes, but since the planes are at different DC potentials (VDD and VSS) they are not connected to each other. This causes return path discontinuities (RPDs) at the via locations, which act like displacement current sources and create an electromagnetic disturbance between the two planes. Since the edges of the planes are open-circuited, standing wave resonances build up between the planes over time. As shown in [5] and by others, the standing waves create an impedance response between the planes at ports 3 and 4 in Figure 4 similar to the response in Figure 3, causing a variation in the insertion loss of the signal lines between ports 1 and 2. The channel response, measured as an eye diagram at port 2, is affected by the RPDs. Today, this effect is mitigated either by using capacitors (shown in Figure 4) or by stitching the planes together with vias (if the planes are at the same DC potential) at the discontinuities. This design approach provides the necessary continuity for the return currents, thereby improving the channel response.

The measured eye diagrams for the example in Figure 4 at port 2, using a 600 Mbps Pseudo-Random Bit Stream (PRBS) at port 1, are shown in Figure 5, where a ~20% improvement in eye height and jitter results from using decoupling capacitors. This simple example of the effect of RPDs on the channel response illustrates one of the root causes of interaction between signal and power distribution in a system. Adding capacitors to compensate for return path discontinuities is similar in concept to Figure 3, where the impedance of the power distribution is lowered by adding capacitors. In 2004, lowering the PDN impedance to improve signal and power integrity was identified as a major future requirement [6]. Since then, new technologies have emerged from both industry and academia to address this, including thin dielectrics and embedded decoupling capacitors in both the package and PCB, which are now commercially available [1].

Over the years, with the complexity of systems growing due to the increase in channel speed and bandwidth, the number of discontinuities in the system is growing rapidly. Managing these discontinuities using capacitors or other means is a major challenge we face today.

 

III. The Challenge

In a complex package or PCB, there are many discontinuities along the return path that affect signal integrity. An example is illustrated in Figure 6, where a subset of the discontinuities experienced by the signal lines is shown for one layer of an eight-layer package from IBM. Of course, not every discontinuity can be fixed using capacitors; other means must be used as well, such as changing the stack-up, re-routing lines around discontinuities and removing via transitions, to name a few. In most packages and PCBs, the severity of the RPDs results from two primary effects, namely 1) referencing, where a signal line is referenced to the ground plane, the voltage plane or both, and 2) cavity resonances due to the standing waves generated between the voltage and ground planes. Over the last decade, several signal and power integrity tools have emerged that have helped immensely in tackling the RPD identification problem. To reduce PDN impedance we have essentially relied on the use of capacitors. Some complex systems today use the advanced technologies mentioned earlier, such as thin dielectrics and embedded high-K dielectrics, to reduce impedance. Hence, over the last two decades we have successfully reduced the PDN impedance by a large factor, to the order of milliohms today, as shown in Figure 7.

 

Figure 7: PDN impedance and role of capacitors over the last two decades

 

Today, the number of capacitors we use in a system is enormous, often far greater than the number of ICs they service. With the electronics industry being very cost-conscious, the luxury of relying on capacitors to continually reduce the PDN impedance from one generation to the next is becoming increasingly hard to afford. Simply put, adding capacitors is akin to throwing away money, as illustrated in Figure 7, and this represents the overwhelming challenge we face today. Of course, not all capacitors exist to mitigate RPDs in the package and PCB, since they also support the core circuits, and therefore the IC designers are equally to blame. But as the designers of the channel for inter-chip communication, can we address this problem through means other than capacitors? In [6], another issue highlighted was the need to minimize noise coupling through the power distribution. This led to the development of electromagnetic bandgap (EBG) structures for isolation, where slots in the voltage and ground planes create bandgaps in the frequency response [1], [7]. Though this improves isolation significantly, it introduces more RPDs into the design and also increases power supply noise at the source (due to larger inductance), both of which can be very problematic.

The problems identified could very well be opportunities for research. Some ideas are discussed in the next section; they are by no means the only solutions, but hopefully they act as a catalyst to transform the way we address these challenges in future years.

 

Figure 8: Power transmission line based PDN for chip to chip communication

 

IV. The Opportunity

There are two fundamental problems that require a solution today, namely 1) minimizing return path discontinuities in a design and 2) suppressing power supply noise. These two issues are related to each other, and our goal is to solve them while minimizing the number of capacitors required. To this end, a possible solution addressing the communication path between two ICs is shown in Figure 8.

In the figure, the voltage plane is removed and replaced with a power transmission line (PTL) [8], where the signal transmission line and PTL are referenced to a common ground plane. This removes the RPDs described in Figure 4, caused by a change in reference planes for the signal lines, and eliminates cavity resonances between the voltage and ground planes. Since cavity resonances are the primary source of coupling through the PDN, EBG-like structures are no longer required, and filters can be designed into the PTL to mitigate coupling at the source. And since noise coupling now occurs locally, as crosstalk between the transmission lines, it can be mitigated through known methods. Decoupling capacitors are of two types, namely 1) source capacitors that provide charge to the switching circuits and 2) capacitors used to mitigate RPDs. The second class of capacitors can be eliminated using the signalling scheme in Figure 8, thereby reducing the total number of capacitors required. Moreover, since the impedance of the PTL is determined by the on-resistance of the transistor, the impedance of the signal line (typically 50Ω) and the voltage swing required at the receiver, its value is of the order of ohms, not milliohms. The termination resistor at the source end in Figure 8 absorbs reflections in the network and can be removed depending on the signalling scheme used [10]. PTLs, however, fix only half the problem: the schematic in Figure 8 can still generate power supply noise through voltage fluctuations between the TxPwr/RxPwr and ground nodes of both the transmitter and receiver ICs. This issue can be addressed by constructing a PDN that stays pre-charged to a constant voltage at all times. In Figure 8, if the PDN is pre-charged to a constant voltage, then the parasitics of the PDN, shown as a series resistor and inductor in the figure, will no longer cause fluctuations at the power supply terminals of the ICs. This is possible with a simple modification to the circuit schematic in Figure 8, as shown in Figure 9.

The dummy path transistor in Figure 9 is a PMOS transistor whose gate input is inverted compared to the data input (data_in). In the figure, the PTL has an impedance of 25Ω to obtain a 1.25V voltage swing across the 50Ω resistor; the receiver IC is not shown. When the signal transmission line is being charged, the dummy path transistor is OFF; it only turns ON during the discharging of the transmission line. If the on-resistance of the dummy path transistor is adjusted to Rpmos+Z0, then during discharging the current flowing from the power supply (Vdd) to ground through this transistor equals the current drawn while charging the signal line. Since the current is the same during both cycles, the power supply node TxPwr stays charged to a constant voltage, and therefore there is no noise across the power supply terminals of the transistors (at least theoretically!). Details of this signalling scheme are available in [8]. Implementation of the CCPTL scheme on a PCB using off-the-shelf ICs has shown that the improvements in eye height and peak-to-peak jitter at 1.5 Gbps can be as large as 15% and 36% respectively (Figure 10), compared to the conventional schemes used today.

Figure 9: Signalling using Constant Current Power Transmission Line (CCPTL)
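The 1.25 V swing quoted above can be sanity-checked with a simple resistive-divider view of the driver's steady state. The PMOS on-resistance value below is an assumption chosen for illustration, not a figure from the paper.

```python
# Steady-state swing across the 50-ohm termination when driving through
# the PMOS on-resistance and the 25-ohm PTL (resistive-divider view;
# r_pmos = 25 ohms is an assumed value for illustration).
vdd = 2.5      # supply voltage in volts
r_pmos = 25.0  # assumed driver on-resistance
z_ptl = 25.0   # PTL impedance from the text
r_term = 50.0  # termination resistor from the text

v_swing = vdd * r_term / (r_pmos + z_ptl + r_term)
print(v_swing)  # -> 1.25 V, matching the swing quoted in the text
```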

 

So, imagine a PDN that contains no voltage planes, contains minimal return path discontinuities, uses high-impedance structures and requires fewer capacitors than before. The PTL, in some cases, can be routed on the same layer as the signal transmission line, reducing the number of layers required. Since the PDN is always charged to a constant voltage, the design is very tolerant of manufacturing variations, and the mismatch effect between the PTL and signal lines is minimal. Designing such structures becomes much easier due to the absence of resonances, and therefore CAD tools that analyse entire packages or PCBs to compute the PDN impedance may no longer be required (not sure this would be good for the EDA vendors!). In addition, a single PTL can service several drivers, reducing routing congestion, as described later in this section.

In the CCPTL scheme, since constant current is drawn from the power supply during both the charging and discharging of the signal transmission line, the power usage doubles compared to more conventional methods. With low power a main driver for many electronics applications, and with the number of parallel I/Os between ICs expected to increase, the power usage needs to be reduced. This is possible through coding schemes such as pseudo-balanced signalling [9], [10] and inversion coding [11]. The pseudo-balanced PTL (PBPTL), which uses 4b/6b coding to maintain a constant current through the PDN, has been shown to reduce power consumption by 50% compared to CCPTL [10]. Similarly, inversion coding with a Constant Voltage PTL (CVPTL), where a resistive network varies the current through the PDN to hold the power supply at the chip terminals at a constant voltage, has been shown to reduce power as well [11]. These alternate techniques provide the benefits described earlier while simultaneously reducing power.

 

Figure 10: Eye diagrams at 1.5Gbps for (a) conventional and (b) CCPTL signalling for one channel

 

To demonstrate the practicality and scalability of this approach for industrial applications, the PBPTL scheme has been successfully applied to power the output drivers of a Spartan-6 LX45 FG(G)484 Xilinx Field Programmable Gate Array (FPGA) IC, shown in Figure 11a. This implementation supported 12-bit to 18-bit coding using a PTL of impedance 10Ω at 600 MHz, where a single PTL fed all the drivers, as shown in Figure 11b. The eye diagram at the far end of a signal line for a PRBS at 600 MHz is shown in Figure 12, showing a peak-to-peak jitter of 100 ps and an eye height of 1.37 V for a 2.5 V power supply; the improvement in peak-to-peak jitter was 66% compared to more conventional methods [12].

 

Figure 11: Xilinx FPGA with PBPTL implementation (a) Fabricated Board and (b) Layout showing PTL

 

Figure 12: Measured eye diagram of Xilinx FPGA board

 

V. Conclusion

Packages and boards are becoming very complex. Managing signal integrity is becoming very challenging due to the interaction between the signal and power distribution networks, which occurs through return path discontinuities. Given the challenges we face in designing such packages and PCBs, it is time for our community to start thinking outside the box for managing power integrity in the future, rather than just extending already known methods. Stuffing the package and PCB with more and more capacitors to reduce PDN noise is definitely not the answer. This is a great opportunity for all of us to think of new approaches to address the challenges we face. We need the Eureka moment, and it is certainly time for our community to innovate!

 

VI. Acknowledgement

Most of the work on power transmission lines was supported by the National Science Foundation under contract number ECCS-0967134.

 

VII. References

[1] M. Swaminathan and E. Engin, "Power Integrity Modeling and Design for Semiconductors and Systems", Prentice Hall, Nov. 2007.

[2] S. Bae, et al., "A 60nm 6 Gb/s/pin GDDR5 Graphics DRAM with Multifaceted Clocking and ISI/SSN Reduction Techniques", IEEE International Solid-State Circuits Conference, pp. 278-279, 2008.

[3] M. Swaminathan and K. J. Han, "Design and Modeling for 3D ICs and Interposers", World Scientific Publishers, Sep. 2013.

[4] L. D. Smith, R. E. Anderson, D. W. Forehand, T. J. Pelc, and T. Roy, "Power distribution system design methodology and capacitor selection for modern CMOS technology", IEEE Trans. on Advanced Packaging, vol. 22, no. 3, pp. 284-291, Aug. 1999.

[5] M. Swaminathan, D. Chung, S. Grivet-Talocia, K. Bharath, V. Laddha, and J. Xie, "Designing and Modeling for Power Integrity", Invited Paper, IEEE Transactions on Electromagnetic Compatibility, vol. 52, no. 2, pp. 288-310, May 2010.

[6] M. Swaminathan, J. Kim, I. Novak and J. Libous, "Power Distribution Networks for System-on-Package", IEEE Trans. on Advanced Packaging, vol. 27, no. 2, pp. 286-300, May 2004.

[7] M. H. Nisanci, F. de Paulis, D. di Febo and A. Orlandi, "Practical EBG Application to Multilayer PCB: Impact on Signal Integrity", IEEE EMC Magazine, vol. 2, no. 2, pp. 82-87, 2013.

[8] S. Huh, M. Swaminathan and D. Keezer, "Constant current power transmission line based power delivery network for single-ended signaling", IEEE Trans. on Electromagnetic Compatibility, vol. 53, no. 4, pp. 1050-1064, 2011.

[9] D. Oh, F. Ware, W. P. Kim, J. H. Kim, J. Wilson, L. Luo, J. Kizer, R. Schmitt, C. Yuan and J. Eble, "Pseudo-differential signaling scheme based on 4b/6b multiwire code", Proceedings of Electrical Performance of Electronic Packaging, pp. 29-32, Oct. 2008.

[10] S. Huh, M. Swaminathan and D. Keezer, "Pseudo-balanced Signaling using Power Transmission Lines for Parallel I/O Links", IEEE Transactions on Electromagnetic Compatibility, vol. 55, no. 2, pp. 315-327, 2013.

[11] S. Telikapalli, M. Swaminathan and D. Keezer, "Minimizing simultaneous switching noise at reduced power with constant voltage power transmission lines for high speed signaling", International Symposium on Quality Electronic Design (ISQED), pp. 714-718, 2013.

[12] S. K. Kim and M. Swaminathan, "Implementation of Power Transmission Lines for Field Programmable Gate Arrays for Managing Signal and Power Integrity", International Symposium on Electromagnetic Compatibility, 2013.

 

 

_______________________________________________________________

This is a guest post by Madhavan Swaminathan, the John Pippin Chair in Electromagnetics in the School of Electrical and Computer Engineering (ECE) and Director of the Interconnect and Packaging Center, Georgia Tech, and the Founder and CTO of E-System Design, a company focused on the development of CAD tools for achieving signal and power integrity in integrated 3D micro- and nano-systems.

The article was first published on IEEE Electromagnetic Compatibility Magazine – Volume 2 – Quarter 3


Speed in ICs: A Major Concern

First, let me ask: what strikes your mind when I say performance?
Intel pushed processor designs from MHz to GHz frequencies, improving performance of course, but the advantage came with flaws. Yes, data could be sent and received faster serially, but power consumption grew just as fast. Then low-power techniques such as clock gating and power gating came into the limelight. Clock gating, as the name says, simply means gating the clock.

 


 

Why waste clock toggles when the data is controlled by another signal? But inserting a gate into the clock path means tampering with the highest-priority signal in the design. Power gating is similar: sections of the circuit are shut off when not in use.

 

While many companies such as Intel and Soctronics are already busy working on 14 nm technology (small, lightweight products being what the market has always demanded), the catch is this: the less area a system takes, the more devices an IC can hold, and the more devices it holds, the more logic and therefore functionality it has. This recalls how Jack S. Kilby thought of the "integration of circuits": devices kept shrinking as feature widths decreased, and wire resistance and internal capacitance decreased along with them.

 

Now, as delay decreases the time it takes for an electron to travel from one point to another, the frequency of operation increases. This is what we have seen, and it should continue, right? Yet a few years back, SONY laptops drew complaints from users: after a year or so, some products running at a 3 GHz clock started producing a humming sound, louder even than desktop systems, like a plane taking off. The systems would also run so hot that you could not let them rest on your lap. That frustrating sound came from the fans required to remove the heat produced by the continuous switching of signals.
Experiencing this, some clever users tried reinstalling the software, but that is not a reliable solution. Normally, fan noise should only appear when you have many tabs open at once, yet here the sound continued even when the machine was apparently idle ("shut down condition", 0% CPU usage and 25% memory usage).

 

 

Anyway, what made this happen? And why has speed become a limitation now, when device shrinking should actually keep increasing it?

 

To answer this question it is important to know:

 

  • What exactly is the size of a silicon dioxide molecule?
  • Can we still ignore the delays of the wires that connect the devices (i.e. transistors) at this scale?

 

Switching speed is decided by how strong the electric field is. As the device shrinks, the area reduces, which reduces the distance between the two plates (gate and channel) and eventually results in a stronger and stronger field. But the molecular size of silicon dioxide is approximately 0.09 nm (for 45 nm technology), which sets a limit below which you simply cannot go. Hence, further reduction in area beyond 45 nm results in more power consumption but no corresponding increase in speed (frequency of operation). Moreover, with channel lengths down to a few nanometres, wire capacitance and resistance play an important role in the delays and can no longer be ignored. So frequency of operation is now an important issue.
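To make the field argument concrete, the vertical field across the gate oxide is simply E = V / t_ox. The voltage and oxide thickness below are assumed values for illustration, not figures from the article.

```python
# Vertical field across the gate oxide: E = V / t_ox.
# Illustrative values only: ~1 V gate voltage across a ~1.2 nm oxide.
v_gate = 1.0    # volts (assumed)
t_ox = 1.2e-9   # oxide thickness in metres (assumed)

e_field = v_gate / t_ox
print(f"{e_field:.2e} V/m")  # ~8.3e8 V/m: thinning the oxide further
                             # pushes the field toward breakdown limits
```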

 

_______________________________

This is a guest post by Shiven Pandya


TI Factory in Greenock to Close Down or Transfer

Hundreds of jobs are under threat after Texas Instruments announced the closure of its Greenock factory.

 

TI said it would close the fab over the next three years and move its operations to fabs in Germany, Japan and Maine in order to save money. Approximately 400 people are likely to be affected, across manufacturing, engineering, management and support. The company said it does not expect any job losses before 2017, and that it will take three years to fully transfer operations abroad.

 


 

The company stated: “Our employees have done everything they can to keep the site cost-competitive, and we strongly considered ways to improve the site’s efficiency, such as upgrading or expanding the facility.

 

“However, even with a considerable investment, TI’s factory in Greenock would be far less efficient than our other larger, more efficient fabs (fabrication plants), which have open capacity available to absorb what’s produced in Greenock.

 

“As part of this process, we are attempting to sell and transfer the facility as an on-going manufacturing operation (manufacturing related jobs, equipment, land and building).

 

“We have contracted with Atreg, a company that specializes in selling manufacturing properties, to help us with this.”