Why Nvidia’s Blackwell is Having Issues with TSMC CoWoS-L

Plus: UCIe 2.0 for 3D stacking; Samsung slows down Texas expansion

Nvidia’s Blackwell Experiences Delays

Nvidia’s Blackwell AI chip is reportedly facing significant delays due to several issues discovered late in its manufacturing cycle.

Reportedly, one problem is with the processor die connecting the two Blackwell GPUs on a GB200 chip. Nvidia is revising the design and will need to requalify with TSMC before mass production can begin.

Big tech players who have invested heavily in Nvidia’s technology are reportedly not happy. Google ordered over 400K GB200 chips in a deal exceeding $10B. Meta has reportedly also placed a $10B order, and Microsoft had plans to have GB200 GPUs ready for OpenAI by the first quarter of 2025, which is now in doubt.

Another problem has been packaging-related. Nvidia’s B100 and B200 GPUs are the first processors to use TSMC’s CoWoS-L packaging, which connects chiplets using a redistribution layer (RDL) interposer with local silicon bridges.

Figure 1: Nvidia Blackwell AI Chip
Image Source: Nvidia

Allegedly, a mismatch in the coefficient of thermal expansion (CTE) among the GPU chiplets, the silicon bridges, the RDL interposer, and motherboard substrate led to warping and system failure. According to reports, Nvidia had to redesign the GPU silicon’s top metal layers and bumps to improve yields.

Nvidia’s B100 and B200 GPUs are the industry’s first products to use TSMC’s CoWoS-L packaging with a “super carrier interposer.” This enables systems-in-package up to six times the reticle size by integrating active or passive local silicon interconnect (LSI) bridges into an RDL interposer (rather than the silicon interposer of CoWoS-S, which is used for the H100). Placement of the bridge dies requires state-of-the-art precision, particularly for the bridges between the two main compute dies, which are essential for maintaining the 10 TB/s interconnect.

Analysts from Semi Analysis report that there could be a CTE mismatch among the GPU chiplets, the LSI bridges, the RDL interposer, and the motherboard substrate, which would cause warpage and failure of the whole SiP (Figure 2).
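To give a feel for why a CTE mismatch matters at this package size, here is a minimal back-of-the-envelope sketch. It uses the standard first-order relation (differential strain = CTE difference × temperature change) and ignores stiffness, layer thickness, and constraint effects, so it is not a warpage model. The CTE values, the temperature swing, and the span length are illustrative assumptions, not Nvidia or TSMC data.

```python
# First-order estimate of differential thermal expansion between two bonded
# materials. All numbers are illustrative assumptions, not Blackwell data.

SILICON_CTE_PPM_PER_K = 2.6    # approximate value for bulk silicon
ORGANIC_CTE_PPM_PER_K = 15.0   # assumed value for an organic substrate


def mismatch_displacement_um(cte_a_ppm, cte_b_ppm, delta_t_k, length_mm):
    """Unconstrained difference in expansion over `length_mm`, in microns."""
    delta_alpha = abs(cte_a_ppm - cte_b_ppm) * 1e-6  # per kelvin
    strain = delta_alpha * delta_t_k                 # dimensionless
    return strain * length_mm * 1000.0               # mm -> microns


if __name__ == "__main__":
    # Hypothetical scenario: a 200 K temperature swing across a 50 mm span.
    d = mismatch_displacement_um(
        SILICON_CTE_PPM_PER_K, ORGANIC_CTE_PPM_PER_K, 200.0, 50.0
    )
    print(f"Differential expansion: {d:.0f} um")
```

Under these assumed numbers the two materials would want to differ in length by roughly a tenth of a millimeter. In a bonded stack that difference has to be absorbed as stress or bending, which is the mechanism behind the reported warpage.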

Additionally, there are reports that the top global routing metal layers and the bumps of the Blackwell GPU silicon had to be redesigned.

Figure 2: Examples of CTE issues in Nvidia’s Blackwell
Image Source: Resonac

There also appears to be an issue with TSMC not having enough CoWoS-L capacity. TSMC is both building a new fab, AP6, for CoWoS-L and converting existing CoWoS-S capacity at AP3.

Figure 3: Nvidia’s Blackwell family specifications
Image Source: Semi Analysis

UCIe for 3D Stacking

The Universal Chiplet Interconnect Express (UCIe) Consortium recently released its 2.0 specification, with updates that address design challenges for testability, manageability, and debug (DFx) across the SiP lifecycle and across multiple chiplets. A key feature of the update is support for 3D packaging, which lets chiplets dramatically increase bandwidth density and power efficiency.

In discussions with EE Times, consortium chair Debendra Das Sharma described UCIe 1.0, which supported 2D and 2.5D packaging, as a planar interconnect with side-by-side chiplets. The new specification adds support for stacking chiplets vertically in 3D. UCIe 2.0 is fully backward compatible and continues to support vendor-agnostic chiplet interoperability.

Das Sharma added that a recently formed automotive working group reflects interest from that sector in starting to gather requirements.

Debugging and testing are also updated in UCIe 2.0 both on a die basis and once the chiplets are packaged.

Das Sharma added that one of the 3D trends in chiplets is a move to hybrid bonding, which is becoming more mainstream and has allowed for aggressive shrinking of the bump pitches between chiplets. A 3D interconnect all but eliminates the distance between chiplets, which means interoperability must be constrained to the same bump pitch.

UCIe-3D is optimized for hybrid bonding, with bump pitches ranging from as large as 10-25 microns down to one micron or less, to provide flexibility and scalability.
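The pitch range quoted above translates into a very wide range of connection density, because density scales with the inverse square of the pitch. The sketch below shows the arithmetic under a simplifying assumption of a uniform square grid of pads. The function name is hypothetical, and the figures count all pads without separating signal from power and ground, so they are upper bounds for illustration rather than numbers from the UCIe specification.

```python
# Pad density for a uniform square grid at a given bump pitch.
# Assumption: every grid site holds a pad; real layouts will differ.

def connections_per_mm2(pitch_um: float) -> float:
    """Pads per square millimeter for a square grid at `pitch_um` microns."""
    if pitch_um <= 0:
        raise ValueError("pitch must be positive")
    pads_per_mm = 1000.0 / pitch_um  # pads along one millimeter
    return pads_per_mm ** 2


if __name__ == "__main__":
    for pitch in (25.0, 10.0, 1.0):
        density = connections_per_mm2(pitch)
        print(f"{pitch:>4.0f} um pitch -> {density:,.0f} pads/mm^2")
```

Going from a 25 micron pitch to a 1 micron pitch raises the theoretical pad count per square millimeter from 1,600 to 1,000,000, a 625x increase, which is why pitch scaling is central to the bandwidth density claims for 3D stacking.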

On the security front, UCIe 2.0 uses a hub and spoke model, with a management director acting as the root of trust.

Samsung Slows Down Texas Expansion

For those of you who missed it, Samsung has slowed the ramp of its new fab in Taylor, Texas, despite a conditional CHIPS Act award of more than $6B. During its quarterly report in April, Samsung said that it was delaying the production start for the Taylor fab project from the second half of 2024 to “maybe 2026.”

For all the latest in advanced packaging, stay linked to IFTLE.