ASIC stands for Application-Specific Integrated Circuit and refers to a semiconductor solution designed for a particular application, as opposed to solutions like Field Programmable Gate Arrays (FPGAs), which can be reprogrammed multiple times to perform different functions. An ASIC is also sometimes referred to as an SoC (System on Chip).
The journey of designing an ASIC is a long and winding road that takes you from a concept to working silicon. Although the end product is typically extremely small (measured in mm²), the journey is quite interesting, full of challenges and trade-offs that designers need to wrap their heads around to make the best engineering call. This post walks through the steps of the ASIC design flow, from design specification to tape-out for manufacturing in the foundry, and highlights the important decisions and activities that each step entails. While the intricacies of each step depend on the choice of EDA vendor, the design application and the technology node, the sequence largely remains the same. Figure 1 shows the flow chart for the ASIC design flow.
Figure 1: ASIC Design Flow
ASIC Specification
The first step in the ASIC design flow is defining the specifications of the product before we embark on designing it. This phase typically involves market surveys with potential customers to figure out their needs, and conversations with technology experts to gauge future trends. The latter is particularly important because an ASIC design cycle may take anywhere from 6 months to 2 years. It is therefore important to predict which trends will be relevant 1-2 years down the line if the product is to sell to a wide audience.
This market research translates into high-level product specifications such as: the top-level functionality of the ASIC; the specific computation algorithms to implement; the clock frequencies that would make the product appealing to customers; the package type, for example Ball Grid Array (BGA) or Chip Scale Package (CSP); the power supply; the communication protocols used to interface with the external world; and the temperature range over which the product must work.
Developing a thorough and correct specification sets a solid foundation for the ASIC design. The technical requirements will be refined over time, but it’s important to capture them in an unambiguous manner.
ASIC Architecture
After pruning the specifications, it’s time to partition the functionality of the entire ASIC or SoC into multiple functional blocks. Architects brainstorm many possible options for the architecture and discuss their pros and cons in terms of performance implications, technical feasibility, and resource allocation in both cost and time. A good architecture extracts the best performance from the ASIC while minimizing hardware resources, which directly helps keep the overall cost of the chip within the allocated budget. During this phase, architects define the relationships between the functional blocks and allocate a time budget to each block. All these technical details are captured in an architecture document.
Figure 2: Micro-architecture of Intel’s Haswell Processor. Image Courtesy: Real World Technologies
Once you have a high-level idea of all the functional blocks needed, it is prudent to identify the critical modules and decide whether to reuse IPs from previous projects as they are, modify existing IPs, or procure them from other parties.
The divide between hardware and software blocks is also a critical part of this phase of the ASIC design. The design is captured in a high-level programming language like C++ or SystemC.
Logic Design and Verification
This step is the frontend part of the ASIC design flow and involves coding the data flow of each functional block in a hardware description language like Verilog, VHDL or SystemVerilog. The interactions between the functional blocks are also coded. Logic design usually comprises:
Combinational Logic: Combinational logic refers to Boolean gates like OR, AND, NAND and NOR. While these gates are simple, they can be combined to perform complex digital operations.
Sequential Elements: Sequential elements play a critical role in interfacing between different combinational logic clouds by temporarily storing their outputs. These elements, such as flip-flops and latches, are also referred to as memory elements and are controlled by a synchronizing signal referred to as a clock. Both flip-flops and latches are bi-stable elements because they have 2 stable states: 0 and 1.
Finite State Machines (FSMs): These are a higher-level abstraction of sequential logic and can be implemented in both hardware and software. FSMs model the response of a digital machine to a sequence of inputs, producing a deterministic set of outputs, and serve as an important building block for logic designers.
Arithmetic Logic Blocks: Arithmetic computations form the heart of the computing logic and are usually the performance bottleneck in high-performance CPU cores. They include addition, subtraction, multiplication and division. There are numerous possible implementations of these circuits, each offering a different trade-off between performance, area and power. Logic designers can choose the one best suited to their application to optimize for one or more of these parameters.
Data-path Design: In addition to coding combinations of the above elements, Hardware Description Languages (HDLs) can model data-path design abstractly, much like a programming language, in a form that EDA tools can interpret correctly. Examples include multiplexing, decoding and case statements.
Analog Design: In addition to digital logic, an ASIC may have many analog components that help it interface with the real world. These may include temperature sensors, Analog to Digital Converters (ADCs), Digital to Analog Converters (DACs) and, most importantly, the clock-generating Phase Locked Loops (PLLs).
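To make the split between combinational logic, sequential elements and FSMs a little more concrete, here is a minimal software sketch in Python (not an HDL). It models a hypothetical detector for the bit pattern 1-0-1; the state names and the pattern are assumptions chosen purely for illustration. The function `next_state_and_output` plays the role of the combinational cloud, and the loop in `run_fsm` plays the role of the clocked state register.

```python
from enum import Enum


class State(Enum):
    IDLE = 0     # nothing useful seen yet
    GOT_1 = 1    # last bit seen was "1"
    GOT_10 = 2   # last two bits seen were "10"


def next_state_and_output(state, bit):
    """Combinational part: (current state, input bit) -> (next state, output)."""
    if state is State.IDLE:
        return (State.GOT_1, 0) if bit else (State.IDLE, 0)
    if state is State.GOT_1:
        return (State.GOT_1, 0) if bit else (State.GOT_10, 0)
    # state is GOT_10: a 1 completes "101"; that 1 may also start the next match
    return (State.GOT_1, 1) if bit else (State.IDLE, 0)


def run_fsm(bits):
    """Sequential part: the state 'register' is updated once per clock tick."""
    state = State.IDLE
    outputs = []
    for bit in bits:
        state, out = next_state_and_output(state, bit)
        outputs.append(out)
    return outputs
```

Calling `run_fsm` with a list of 0s and 1s returns one output bit per input bit, with a 1 on every cycle in which the pattern completes. In real RTL the same structure would typically appear as a combinational next-state block plus a clocked register.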
Example of a behavioral HDL code for 2:1 Multiplexer:
In parallel with logic design, the verification team needs to develop a verification plan for both the digital and analog components, and create testbenches that exercise the design across all the corner cases to ensure that its functionality is consistent with the specification. Writing the RTL usually takes around 10-20% of the entire design cycle time, while verification accounts for 80-90% of the time.
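The idea of a testbench can be sketched in a few lines of Python. The example below is only an analogy for what an HDL testbench does: it treats a small ripple-carry adder built from full adders as the "design under test" and checks it exhaustively against a golden reference (Python's own `+`). The 4-bit width and all function names are assumptions made for this illustration; real designs are far too large for exhaustive testing, which is why verification plans lean on directed and random tests.

```python
from itertools import product


def full_adder(a, b, cin):
    """One-bit full adder built from XOR/AND/OR operations."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a, b, width=4):
    """Design under test: chain `width` full adders, LSB first."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry


def run_testbench(width=4):
    """Drive every input pair and compare against the reference model."""
    mismatches = []
    for a, b in product(range(1 << width), repeat=2):
        total, carry = ripple_carry_add(a, b, width)
        expected = a + b  # golden reference
        if total != expected % (1 << width) or carry != expected >> width:
            mismatches.append((a, b))
    return mismatches
```

An empty list from `run_testbench()` means the adder agrees with the reference for every input pair at that width.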
Physical Design
This refers to the backend design cycle. If there’s just one aspect that distinguishes backend design from frontend design, it is delay. Frontend design, while cognizant of logic delays and speed, largely ignores them for the majority of RTL coding and verification. Physical design, on the other hand, sees real delay right from the very beginning.
Physical design flow is further sub-divided into the following:
Synthesis
Synthesis reads in the RTL code (.v or .sv files) along with the standard cell libraries, which may contain delay information (.lib files) and physical dimensions and metal layer information for each cell (.lef files), plus other constraint files, and converts the behavioral or dataflow code into real standard cell gates. Note that there are many possible implementations of a 2:1 multiplexer, and synthesis is responsible for making an educated trade-off between performance, power and area to arrive at the best implementation under these constraints. For the 2:1 multiplexer, one possible implementation is shown below:
Figure 3: Gate level implementation of 2:1 Multiplexer
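The point that one behavior maps to several gate-level implementations can be checked in software. The Python sketch below (a model, not synthesis output, and not necessarily the circuit drawn in the figure) compares a behavioral 2:1 multiplexer against an AND/OR/NOT version and a NAND-only version over the full truth table. The function names and the convention that `sel = 1` selects input `b` are assumptions for this example.

```python
from itertools import product


def mux_behavioral(a, b, sel):
    """Behavioral description: pick b when sel is 1, otherwise a."""
    return b if sel else a


def mux_and_or(a, b, sel):
    """Gate-level version using one NOT, two ANDs and one OR."""
    not_sel = 1 - sel
    return (a & not_sel) | (b & sel)


def nand(x, y):
    return 1 - (x & y)


def mux_nand_only(a, b, sel):
    """Gate-level version using four 2-input NAND gates."""
    not_sel = nand(sel, sel)
    return nand(nand(a, not_sel), nand(b, sel))


def equivalent(f, g):
    """Exhaustively compare two 3-input single-bit functions."""
    return all(f(a, b, s) == g(a, b, s) for a, b, s in product((0, 1), repeat=3))
```

Both `equivalent(mux_behavioral, mux_and_or)` and `equivalent(mux_behavioral, mux_nand_only)` hold, so a synthesis tool is free to pick whichever structure fits its timing, area and power targets better.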
Floorplanning
The floorplanning step formalizes and refines the floorplan that was first conjured up during the architecture planning step. In this step, the entire die area is divided into physical partitions, and their shapes are molded with the area requirements, the flow of top-level data and control buses, and the possibility of future growth in mind. Pins and ports are assigned rough locations, which can be refined further depending on the Place and Route results.
Figure 4: Floorplanning the blocks relative to each other. Image Courtesy: Andrew Kahng, UCSD
It’s quite common for physical design engineers to work on more than one floorplan in parallel and evaluate which works best for overall design QoR (Quality of Results). This is usually the most critical step in the physical design cycle and requires multiple iterations. Any additional time spent here is worth it, considering its long-lasting implications for routing congestion, cell density, timing QoR and DRCs.
Designing a robust power delivery grid that addresses static and dynamic IR drop is also a critical function of the floorplanning step.
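As a rough illustration of static IR drop, the Python sketch below models a single power rail fed from one end, with equal-resistance segments between cells that each draw a constant current. This is a deliberately simplified, assumed model (real grids are two-dimensional meshes analyzed by sign-off tools), and the function names are made up for the example. Each segment carries the current of every tap downstream of it, so the voltage sags the most at the far end of the rail.

```python
def rail_voltages(vdd, seg_resistance, tap_currents):
    """Voltage seen at each tap along a rail fed from one end.

    vdd            -- supply voltage at the feed point (volts)
    seg_resistance -- resistance of each rail segment (ohms)
    tap_currents   -- current drawn at each tap, nearest tap first (amps)
    """
    voltages = []
    v = vdd
    remaining = sum(tap_currents)  # current flowing through the first segment
    for i_tap in tap_currents:
        v -= seg_resistance * remaining  # V = I * R drop across this segment
        voltages.append(v)
        remaining -= i_tap               # this tap's current leaves the rail here
    return voltages


def worst_ir_drop(vdd, seg_resistance, tap_currents):
    """Largest drop from the supply to any tap on the rail."""
    return vdd - min(rail_voltages(vdd, seg_resistance, tap_currents))
```

In this model, halving the segment resistance (wider straps or more metal) or feeding the rail from both ends reduces the worst-case drop, which is the kind of trade-off a power grid plan weighs against routing resources.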
Placement
During placement, all standard cells are placed in legal locations on site rows. The aim of this step is to minimize wire length while producing a placement that helps timing converge faster.
Figure 5: Standard Cells arranged on site rows. Image Courtesy: Andrew Kahng, UCSD
No real routes are laid during this step. Instead, placement estimates routing through a step called Global Routing, which estimates the total wire length and global route congestion. Many modern placement engines can also take into account the switching activity from SAIF or VCD files and optimize the placement for lower dynamic power.
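One widely used way to estimate wire length before any real routes exist is the half-perimeter wire length (HPWL): for each net, take the bounding box around its pins and add the box's width and height. The Python sketch below shows the calculation; the cell names, coordinates and data layout are hypothetical, and it assumes each cell's pin sits at the cell's placed location.

```python
def hpwl(pins):
    """Half-perimeter of the bounding box around a net's pin locations."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(placement, nets):
    """Sum HPWL over all nets.

    placement -- dict mapping cell name to its (x, y) location
    nets      -- list of nets, each a list of connected cell names
    """
    return sum(hpwl([placement[cell] for cell in net]) for net in nets)


# Hypothetical three-cell design with two nets.
placement = {"u1": (0, 0), "u2": (3, 4), "u3": (1, 1)}
nets = [["u1", "u2"], ["u1", "u2", "u3"]]
print(total_hpwl(placement, nets))
```

A placer can evaluate a candidate move by recomputing this sum: pulling `u2` closer to the other two cells shrinks both bounding boxes and lowers the estimate.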
Figure 6: Placed design. Image courtesy: Andrew Kahng, UCSD
Clock Tree Synthesis
Until now, the clock network was treated as ideal. During clock tree synthesis, clocks are propagated and the clock tree is built using clock buffers. The major goals of this step are to achieve optimal clock latency while minimizing clock skew. Many algorithms have been proposed for designing an optimal clock tree, such as H-tree and Steiner tree. In addition, one may choose a clock mesh, Multi-source Clock Tree Synthesis or traditional Single Point Clock Tree Synthesis, which offer different trade-offs in dynamic power, routing resources and OCV adjustment due to the common clock path.
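Latency and skew are easy to pin down with a small model. The Python sketch below represents a clock tree as a dictionary in which every node records its parent and the delay from that parent; the arrival time at a flip-flop is the sum of delays along its path from the root. The tree shape, node names and delay values (in picoseconds) are invented for the example.

```python
def sink_arrival_times(tree, sinks):
    """Clock arrival time at each sink.

    tree  -- dict: node -> (parent, delay from parent); the root's parent is None
    sinks -- names of the clock sinks (flip-flop clock pins)
    """
    def arrival(node):
        parent, delay = tree[node]
        return delay if parent is None else arrival(parent) + delay

    return {sink: arrival(sink) for sink in sinks}


def latency_and_skew(arrivals):
    """Latency = latest arrival; skew = latest minus earliest arrival."""
    times = list(arrivals.values())
    return max(times), max(times) - min(times)


# Hypothetical tree: one root driving two buffers and three flops (delays in ps).
tree = {
    "root": (None, 0),
    "buf1": ("root", 50),
    "buf2": ("root", 55),
    "ff1": ("buf1", 20),
    "ff2": ("buf1", 25),
    "ff3": ("buf2", 18),
}
arrivals = sink_arrival_times(tree, ["ff1", "ff2", "ff3"])
print(latency_and_skew(arrivals))
```

Balanced structures such as an H-tree aim to make every root-to-sink path look alike so the skew term stays small, usually at the price of extra buffers and wire.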
Figure 7: Typical H tree clock distribution. Image Courtesy: Research Gate
Because the clock is the signal with the highest toggling frequency in the design, the clock buffer tree can account for a large share of the dynamic power dissipated in an ASIC, often cited as a third to a half of the total and higher in some designs. The architecture may support clock gating to turn off the clock to idle parts of the chip and save dynamic power.
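The reason the clock dominates follows from the usual first-order expression for switching power, P = α · C · V² · f, where α is the fraction of cycles in which a node makes a power-consuming transition. A clock net charges once every cycle, so its α is 1, while typical data nets toggle far less often. The Python sketch below applies that formula and models clock gating as a reduction in the fraction of cycles the clock is enabled; the function names and all numeric values are assumptions for illustration.

```python
def dynamic_power(activity, capacitance_farads, vdd_volts, freq_hz):
    """Switching power in watts: P = alpha * C * Vdd^2 * f."""
    return activity * capacitance_farads * vdd_volts ** 2 * freq_hz


def gated_clock_power(capacitance_farads, vdd_volts, freq_hz, enabled_fraction):
    """Clock-net power when the clock is only enabled part of the time.

    An ungated clock has activity 1.0; gating scales that by the fraction
    of cycles in which the clock is actually allowed to toggle.
    """
    return dynamic_power(enabled_fraction, capacitance_farads, vdd_volts, freq_hz)


# Hypothetical clock network: 200 pF total, 0.8 V supply, 1 GHz.
ungated = dynamic_power(1.0, 200e-12, 0.8, 1e9)      # about 0.128 W
gated = gated_clock_power(200e-12, 0.8, 1e9, 0.25)   # clock enabled 25% of the time
```

Under this simple model, a block whose clock is enabled a quarter of the time burns a quarter of the clock power, which is why gating idle logic is such an effective lever.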
Detail Routing
With all instances placed and clocks routed, it’s now time to route the signal nets. Modern processes support a 10-12 layer metal stack, with M0-M1 reserved for routing inside standard cells. The algorithm used for detail routing is usually a glorified maze router with added constraints to ensure faster run-times. The metal resources are divided into tracks, which are the legal locations for metal routes. The aim of detail routing is to minimize detours, because these may have implications for timing, and to minimize DRC (Design Rule Check) violations like opens and shorts. This step performs multiple search-and-repair loops (10-20) to keep the overall DRC count low.
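For readers who have not seen a maze router, the classic textbook version (often called Lee's algorithm) is a breadth-first search over a routing grid. The Python sketch below routes a single two-pin net on a single layer and is only meant to show the core idea; production routers add multiple layers, vias, design rules and rip-up-and-reroute on top. The grid encoding and function name are assumptions for this example.

```python
from collections import deque


def maze_route(grid, start, goal):
    """Shortest rectilinear path between two cells on a routing grid.

    grid  -- list of rows; 0 is a free track cell, 1 is blocked
    start -- (row, col) of the source pin
    goal  -- (row, col) of the target pin
    Returns the path as a list of cells, or None if no route exists.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}           # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # backtrace from goal to start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None
```

Because the search expands outward one step at a time, the first time it reaches the target it has found a path with the fewest grid steps, which corresponds to the "minimum detour" goal described above. Cells already used by other nets can be marked as blocked before routing the next one.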
Figure 8: Routed Design. Image Courtesy: Andrew Kahng, UCSD
Physical and Timing Verification
While logic verification ensures correct functionality, physical verification ensures correct layout. The number of physical verification checks has grown and now includes DRC (Design Rule Checks), LVS (Layout versus Schematic), electromigration, electrostatic discharge (ESD) violations, antenna violations, pattern match (PM) violations, shorts, opens, floating nets and more. It is important to track these violations in parallel with the Place and Route flow to avoid any surprises just days before tape-out.
Timing verification checks that the chip runs at the specified frequency by ensuring that setup and hold requirements are met for all timing paths in the design.
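Setup and hold checks boil down to two inequalities on a flop-to-flop path. The Python sketch below computes both slacks for a single path using a simplified single-corner model; the parameter names and example numbers are assumptions for illustration, and a real static timing tool also accounts for on-chip variation, clock uncertainty and multiple corners. Positive slack means the check passes.

```python
def setup_slack(period, launch_latency, capture_latency,
                clk_to_q, max_data_delay, setup_time):
    """Data must arrive before the next capture edge minus the setup time."""
    arrival = launch_latency + clk_to_q + max_data_delay
    required = period + capture_latency - setup_time
    return required - arrival


def hold_slack(launch_latency, capture_latency,
               clk_to_q, min_data_delay, hold_time):
    """Data must stay stable until the hold time after the same capture edge."""
    arrival = launch_latency + clk_to_q + min_data_delay
    required = capture_latency + hold_time
    return arrival - required


# Hypothetical path, all values in picoseconds (1 GHz clock -> 1000 ps period).
s = setup_slack(period=1000, launch_latency=100, capture_latency=120,
                clk_to_q=50, max_data_delay=700, setup_time=30)
h = hold_slack(launch_latency=100, capture_latency=120,
               clk_to_q=50, min_data_delay=40, hold_time=20)
```

Note how the clock period appears only in the setup check: a setup violation can be fixed by slowing the clock, whereas a hold violation cannot and has to be fixed in the design itself.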
Figure 9: FRICO ASIC, 350 nm technology
The ASIC design flow is a complex engineering problem that goes through a plethora of steps from concept to silicon. While some steps are more art than engineering (like floorplanning), others entail sound engineering trade-offs (like physical design and timing). With increasing demand for better performance and shrinking time to market, the ASIC design flow will continue to get more intricate over the next decade. The core motivation and design philosophy, however, will remain the same.
ASIC Design Cost – By Nanometer
Selecting a technology node has a direct impact on your design cost.
A more advanced node requires a more sophisticated mask set and EDA tools, so the NRE (non-recurring engineering) cost will be higher.
The chart below shows chip design cost by technology node, from 28 nm to 2 nm: