Best Practices: IC Troubleshooting & IC Failure Analysis

“Perception is reality.” We have all heard it. When an IC fails, or when the customer thinks it has failed, we must respond with a failure analysis (FA). To do that effectively, however, we need accurate, pertinent information about the incident. That is the only way to avoid guesswork.

Let me relate an incident that happened not long ago. A part was returned as a failure, and we knew nothing else about it. We ran it on the automatic test equipment (ATE), bench tested it, x-rayed it, and decapped it. We flooded it with soft electrons in an electron microscope to look for emission sites that would indicate damage. We mapped its temperature with a liquid crystal coating to look for hot spots. The part was perfect. We found no reason for the failure, so the QA department said exactly that in the FA report. Why, we wondered, had the part been returned as failed?

About two months later we learned, almost by accident, that the customer saw this failure only when the part was heated above +60°C. We started the FA again. We had originally tested the part only at room temperature (+25°C), and we had found… nothing. Now we could not retest it hot: the part no longer functioned, because it had been destroyed in the process of testing it the first time.

Ultimately, this was a one-time return; the failure never recurred. But the episode taught us something more important: without crucial performance data, meaning the conditions under which the part actually failed, we were blind and guessing. We wasted considerable time and money for nothing.

An Exhaustive Exercise in QA Futility

Often a failed IC is so damaged that the origin of the damage cannot be determined. One customer took a board from its assembly contractor back to its own lab, removed the IC from the board, and claimed that the IC had failed. That may well have been true. The customer then concluded that the “root cause” lay in the IC itself. They wanted an FA, but where was the failure data? Had the circumstances been recorded carefully? What would prevent future failures? We were back to guessing instead of checking facts, which is hardly a prescription for a meaningful FA.

In this case the customer had concentrated on three pins of a multioutput device. Here is what we did know: the part left the fab working, with outgoing defect levels of a few parts per billion, and it operated in a circuit for hours before it failed. Was it an infant-mortality failure, or was it damaged by external handling? Was it damaged in the customer’s circuit? In the application environment? Did electrostatic discharge (ESD) at the factory weaken the circuit so that it failed later? Or had a shipping clerk damaged it by ignoring ESD protocol? The list of possible factors seemed endless.


The first partial schematic we received from the customer was not very helpful. It showed neither what drove the failed part nor what the part had to drive. We asked the local field applications engineer (FAE) to check the grounding. Were the grounds separated correctly? The schematic did not tell us.

We received a few more pieces of the schematic, but now we had more questions than answers. Why did the customer check only three of the many outputs? Were any of the device's input or output pins connected through low impedances to board pins? Did the power and ground pins count as low-impedance connections? Could ESD on the board pins be the issue? We were still guessing.

____________________________________

This is a guest post by Sage Analytical Lab