CoWoS advanced packaging now represents 17–23% of AI accelerator manufacturing costs, with per-chip packaging ranging from $750 for H100-class designs to over $1,100 for NVIDIA's B200. Memory and packaging together account for 60–70% of total chip COGS, far surpassing the logic die itself. This guide synthesizes analyst estimates (Morgan Stanley, JPMorgan, Bernstein), TSMC earnings guidance, Epoch AI cost models, and TrendForce supply chain reporting into the definitive 2026 view of advanced packaging economics. Data as of April 2026.
Where does the money actually go in an AI accelerator? Logic silicon is the smallest slice.
[Chart: AI accelerator cost breakdown by component. Source: Morgan Stanley, DigiTimes, December 2025]
CoWoS packaging cost by variant
CoWoS-S uses a monolithic silicon interposer fabricated on TSMC's N65-equivalent process with TSVs, multi-layer RDL, and deep-trench capacitors. For the H100 (an 814mm² die plus 5–6 HBM3 stacks on a ~2,500mm² interposer), total packaging cost runs approximately $750 per chip. JPMorgan estimates a fully processed CoWoS wafer costs $10,000–$12,000 (JPMorgan, April 2025), with the interposer alone consuming 50–70% of packaging cost. With roughly 25–28 gross interposers per 300mm wafer and 60–70% interposer yields at this size, wafer cost divided by good interposers works out to roughly $510–$800 per interposer ($590–$800 at the 60% yield end), before assembly and HBM attach. The practical size ceiling for CoWoS-S is ~2,700mm² (3.3× reticle).
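The interposer arithmetic above can be sketched in a few lines. This is a minimal illustration rather than JPMorgan's model: the function name `cost_per_good_interposer` is hypothetical, and it assumes the full wafer cost is spread evenly over yielded interposers with no salvage value.

```python
def cost_per_good_interposer(wafer_cost: float, gross_per_wafer: int, yield_frac: float) -> float:
    """Spread one wafer's cost over the interposers that survive yield loss."""
    good_units = gross_per_wafer * yield_frac
    return wafer_cost / good_units

# Corners of the ranges quoted in the text (USD per wafer, sites per wafer, yield).
low_70 = cost_per_good_interposer(10_000, 28, 0.70)   # cheapest wafer, most sites, best yield
low_60 = cost_per_good_interposer(10_000, 28, 0.60)   # same wafer and sites at 60% yield
high_60 = cost_per_good_interposer(12_000, 25, 0.60)  # priciest wafer, fewest sites, worst yield

print(f"70% yield floor: ${low_70:,.0f}")
print(f"60% yield range: ${low_60:,.0f} to ${high_60:,.0f}")
```

At 60% yield the sketch lands on roughly $595 to $800 per good interposer; at 70% yield the floor drops to about $510.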
CoWoS-L replaces the monolithic silicon interposer with an organic RDL substrate embedded with Local Silicon Interconnect (LSI) bridge dies, enabling packages far beyond CoWoS-S's reticle limit. NVIDIA's B200 uses CoWoS-L to integrate two ~800mm² compute dies plus 8 HBM3E stacks, a configuration physically impossible on CoWoS-S. Per-chip packaging cost runs $1,000–$1,100 (Silicon Analysts estimate; Epoch AI Monte Carlo, 2025), a 33–47% premium over the ~$750 H100-class CoWoS-S figure. The premium comes from additional LSI bridge components, higher microbump counts, and initially lower assembly yields. Paradoxically, CoWoS-L is more cost-effective at very large package sizes because it sidesteps the catastrophic yield losses of fabricating monolithic silicon interposers beyond 2,700mm². Small LSI bridges yield at ~90% versus ~60% for large monolithic interposers.
CoWoS-R replaces the silicon interposer entirely with an organic thin-film interposer using InFO-based RDL — the cheapest CoWoS variant. AWS Trainium2 is the flagship user, employing CoWoS-R for a dual-chiplet configuration with 4 HBM stacks.
| Feature | CoWoS-S | CoWoS-L | CoWoS-R |
|---|---|---|---|
| Interposer type | Monolithic silicon (TSVs) | Organic RDL + LSI bridges | Organic RDL (InFO-based) |
| Max interposer size | ~2,700mm² (3.3× reticle) | >5,000mm² (6×+ reticle) | Scalable, large |
| Max die count | 1 SoC + 6–8 HBM | 2+ SoCs + 8–12 HBM | 1 SoC + 4+ HBM |
| Typical cost/chip | $300–$800 | $800–$2,000 | $500–$1,000 |
| Key products | H100, MI300X, TPU v5/v6 | B200/B300, Rubin, MI400 | Trainium2/3, networking |
| 2026 capacity share | ~30–40% (declining) | ~50–60% (growing) | ~5–10% (niche) |
TSMC raised CoWoS prices 10–20% for 2025 (TweakTown, January 2025), with Morgan Stanley projecting an additional 20% cumulative increase through 2026. CoWoS-S lines remained fully booked through 2025 into 2026 (TSMC CEO C.C. Wei, Q3 2025 earnings call). JPMorgan projects CoWoS-L will comprise "the overwhelming majority of CoWoS production through 2027." NVIDIA alone secured >70% of CoWoS-L capacity for 2025 (TrendForce, February 2025).
How alternatives compare. Intel EMIB embeds small silicon bridges only where die-to-die connections are needed, achieving 30–40% lower cost than CoWoS (TrendForce, November 2025). Bernstein estimates EMIB packaging at "low hundreds of dollars per chip" versus $900–$1,000 for CoWoS on equivalent Rubin-class designs (Bernstein, 2026). Intel claims ~90% wafer utilization for small bridge dies versus ~60% for large interposers. TSMC InFO-PoP for iPhone A-series chips costs an estimated $10–$30 per chip, at least 10× cheaper than even the low end of the CoWoS-S range ($300), because it eliminates the silicon interposer entirely.
TSMC is also outsourcing aggressively: 240,000–270,000 CoWoS wafers/year will move to OSATs in 2026 (Global Semi Research, December 2025), with Amkor handling 180,000–190,000, SPIL 60,000–80,000, and ASE tripling to 20,000–25,000 WPM by end-2026. ASE raised advanced packaging prices 5–20% for 2026.
Key takeaway: CoWoS-L costs 20–40% more than CoWoS-S per chip, but is actually more cost-effective for packages exceeding 2,700mm² because it avoids catastrophic silicon interposer yield losses. For everything larger than a single reticle + 8 HBM, CoWoS-L is the only viable option.
When chiplets beat monolithic — and when they don't
The chiplet-versus-monolithic equation hinges on three variables: defect density, die size, and packaging cost. At TSMC N3's mature defect density of ~0.09 defects/cm², a monolithic 800mm² die yields approximately 48.7% under the Poisson model, producing ~32 good dies per wafer from 65 gross. At a wafer cost of ~$19,500 for N3, that translates to approximately $609 per good die.
Four 200mm² chiplets on the same N3 process yield 83.5% each, producing ~255 good chiplets per wafer at $76 each, or $306 for a complete set of four. The chiplet approach saves roughly $300 in silicon costs. But this is overwhelmed by advanced packaging: a CoWoS-L interposer plus assembly adds $700–$1,000, KGD testing adds $20–$40, and integration testing $50–$100. Total chiplet cost reaches $1,076–$1,446 versus ~$700 for the monolithic approach, so monolithic is ~45% cheaper for 800mm² total silicon at mature defect densities.
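The silicon side of this comparison follows directly from the Poisson model, Y = exp(-A·D₀). Here is a minimal sketch under stated assumptions: the gross count of 305 chiplet sites per wafer is not given in the text and is back-solved from ~255 good chiplets at 83.5% yield, and the helper names are hypothetical.

```python
import math

WAFER_COST_USD = 19_500  # N3 wafer cost quoted in the text

def poisson_yield(area_mm2: float, d0_per_cm2: float) -> float:
    """Poisson yield Y = exp(-A * D0), with area converted from mm^2 to cm^2."""
    return math.exp(-(area_mm2 / 100.0) * d0_per_cm2)

def cost_per_good_die(area_mm2: float, d0_per_cm2: float, gross_dies: int) -> float:
    """Wafer cost divided by the expected number of good dies."""
    return WAFER_COST_USD / (gross_dies * poisson_yield(area_mm2, d0_per_cm2))

D0 = 0.09  # defects/cm^2, the mature-N3 figure from the text
mono_cost = cost_per_good_die(800, D0, gross_dies=65)
# Assumption: ~305 gross 200mm^2 sites per wafer (back-solved, not from the source).
chiplet_set_cost = 4 * cost_per_good_die(200, D0, gross_dies=305)

print(f"monolithic 800mm^2: yield {poisson_yield(800, D0):.1%}, ${mono_cost:,.0f} per good die")
print(f"4 x 200mm^2 set:    yield {poisson_yield(200, D0):.1%}, ${chiplet_set_cost:,.0f} per set")
```

The monolithic figure comes out a few dollars above the $609 in the text (about $616) because the sketch keeps the fractional 31.6 good dies instead of rounding up to 32.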
The picture changes dramatically as defect density rises: monolithic stays cheaper until defect density passes ~0.17/cm², after which the chiplet design wins.
[Chart: monolithic versus chiplet cost as a function of defect density. Source: Silicon Analysts, Poisson yield model, 800mm² total silicon, TSMC N3 wafer at $19,500, CoWoS-L packaging at $800]
| D₀ (def/cm²) | Monolithic yield (800mm²) | Monolithic die cost | 4-chiplet die cost | Chiplet total (with CoWoS) | Winner |
|---|---|---|---|---|---|
| 0.05 (very mature) | 67.0% | $449 | $264 | $1,124 | Monolithic |
| 0.09 (mature N3) | 48.7% | $609 | $306 | $1,166 | Monolithic |
| 0.15 (early production) | 30.1% | $995 | $372 | $1,232 | Monolithic |
| 0.20 (immature) | 20.2% | $1,489 | $432 | $1,292 | Chiplet |
| 0.30 (very early) | 9.1% | $3,304 | $576 | $1,436 | Chiplet |
The crossover at 800mm² occurs around D₀ ≈ 0.17–0.20 defects/cm², typical of early production on a new node. The common rule of thumb that "chiplets beat monolithic above 400–500mm²" applies only when using cheap organic-substrate MCM packaging ($50–$200). AMD's EPYC Genoa is the clearest example: 12 small CCDs (~70mm² each) plus one IOD (~419mm²) on an inexpensive organic substrate deliver >40% cost reduction versus a hypothetical monolithic equivalent (Chiplet Actuary paper, arXiv). But for AI accelerators requiring HBM, which mandates CoWoS packaging regardless, the comparison becomes less about cost and more about physical necessity (exceeding reticle limits) and risk management (yield insurance on immature nodes).
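The crossover can be located numerically by bisecting on D₀. The sketch below is illustrative only and rests on several assumptions: a fixed $860 chiplet adder (the constant gap between the table's chiplet die cost and chiplet total), a ~$91 monolithic overhead (the ~$700 monolithic total minus the $609 die), and 305 gross chiplet sites per wafer. A pure Poisson model will not reproduce the table's chiplet column exactly, so treat the result as a sanity check on the ~0.17 figure.

```python
import math

WAFER_COST_USD = 19_500       # N3 wafer cost from the text
MONO_GROSS = 65               # gross 800mm^2 dies per wafer (from the text)
CHIPLET_GROSS = 305           # assumption: gross 200mm^2 sites per wafer
MONO_OVERHEAD_USD = 91        # assumption: ~$700 total minus the $609 die
CHIPLET_OVERHEAD_USD = 860    # assumption: the table's fixed packaging/test adder

def die_cost(area_mm2: float, d0: float, gross: int) -> float:
    """Cost per good die under the Poisson yield model."""
    return WAFER_COST_USD / (gross * math.exp(-(area_mm2 / 100.0) * d0))

def monolithic_total(d0: float) -> float:
    return die_cost(800, d0, MONO_GROSS) + MONO_OVERHEAD_USD

def chiplet_total(d0: float) -> float:
    return 4 * die_cost(200, d0, CHIPLET_GROSS) + CHIPLET_OVERHEAD_USD

def crossover_d0(lo: float = 0.05, hi: float = 0.30, tol: float = 1e-5) -> float:
    """Bisect for the D0 at which both totals are equal (monolithic is cheaper below it)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if monolithic_total(mid) < chiplet_total(mid):
            lo = mid   # monolithic still cheaper: crossover lies above mid
        else:
            hi = mid
    return (lo + hi) / 2

crossover = crossover_d0()
print(f"crossover D0 ~ {crossover:.3f} defects/cm^2")
```

Under these assumptions the crossover falls between 0.16 and 0.17 defects/cm², in line with the figure quoted above. Changing either overhead constant shifts it noticeably, which is why the packaging adder dominates the comparison.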
Yield versus die size at current defect densities (Poisson model):
| Die size | D₀=0.05 | D₀=0.07 | D₀=0.09 | D₀=0.12 | D₀=0.15 | D₀=0.20 |
|---|---|---|---|---|---|---|
| 100mm² | 95.1% | 93.2% | 91.4% | 88.7% | 86.1% | 81.9% |
| 200mm² | 90.5% | 86.9% | 83.5% | 78.7% | 74.1% | 67.0% |
| 400mm² | 81.9% | 75.6% | 69.8% | 61.9% | 54.9% | 44.9% |
| 600mm² | 74.1% | 65.7% | 58.3% | 48.7% | 40.7% | 30.1% |
| 800mm² | 67.0% | 57.1% | 48.7% | 38.3% | 30.1% | 20.2% |
| 1000mm² | 60.7% | 49.7% | 40.7% | 30.1% | 22.3% | 13.5% |
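The table above can be regenerated from the Poisson formula alone. This short sketch does so as a check; the variable names are illustrative.

```python
import math

DIE_SIZES_MM2 = [100, 200, 400, 600, 800, 1000]
D0_VALUES = [0.05, 0.07, 0.09, 0.12, 0.15, 0.20]  # defects per cm^2

def poisson_yield(area_mm2: float, d0_per_cm2: float) -> float:
    """Y = exp(-A * D0), with area converted from mm^2 to cm^2."""
    return math.exp(-(area_mm2 / 100.0) * d0_per_cm2)

# Yield in percent, rounded to one decimal, keyed by die size.
yield_table = {
    area: [round(100 * poisson_yield(area, d0), 1) for d0 in D0_VALUES]
    for area in DIE_SIZES_MM2
}

for area, row in yield_table.items():
    cells = " | ".join(f"{pct:.1f}%" for pct in row)
    print(f"| {area}mm² | {cells} |")
```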
Real-world chip cost comparisons illustrate the range:
| Chip | Architecture | Total silicon | Node | Packaging | Est. COGS |
|---|---|---|---|---|---|
| NVIDIA H100 | Monolithic 814mm² | 814mm² | 4N | CoWoS-S, ~$750 | $3,320 |
| NVIDIA B200 | 2× ~800mm² | ~1,600mm² | 4NP | CoWoS-L, ~$1,100 | $6,400 |
| AMD MI300X | 8 XCD + 4 IOD (12 dies) | ~2,400mm² | N5+N6 | SoIC + CoWoS-S, ~$1,500 | $5,300 |
| AMD EPYC Genoa | 12 CCD + 1 IOD | ~1,259mm² | N5+N6 | Organic MCM, ~$75 | $300–500 |
| Apple M2 Ultra | 2× M2 Max | ~780mm² | N5P | InFO-L (UltraFusion), ~$75 | $200–350 |
The H100's cost breakdown is revealing: the 814mm² logic die costs only ~$300 (9% of COGS), while HBM3 memory is ~$1,350 (41%), CoWoS-S is ~$750 (23%), and test/assembly is ~$920 (28%). For B200, HBM3E rises to ~$2,900 (45% of COGS). NVIDIA moved to dual-die for Blackwell not for cost savings but because a single ~1,600mm² die would far exceed the ~858mm² reticle limit. Jensen Huang stated NVIDIA invested ~$10B in NV-HBI interconnect R&D to make dual-die work at 10 TB/s die-to-die bandwidth.
Key takeaway: The "chiplets are always cheaper above 400mm²" rule is wrong for HBM-bearing AI accelerators. On CoWoS, monolithic wins until defect density exceeds ~0.17/cm². Chiplets are chosen for physics (reticle limit) and yield insurance, not cost savings.
Test economics — the hidden cost multiplier
For a large AI accelerator like the H100, total test and assembly costs run ~$920 (28% of $3,320 COGS). Individual step costs scale sharply with die complexity:
| Test step | Standard SoC | Large AI accelerator (~800mm²) |
|---|---|---|
| Wafer sort | $2–5/die | $5–15/die |
| Final test | $5–20/package | $20–50+/package |
| Burn-in | $5–15/chip | $15–30+/chip |
| System-level test | $5–20 | $10–50+ |
ATE hourly rates run ~$100/hour for mainstream SoC testers; high-end configurations (Advantest V93000, Teradyne UltraFLEXplus) reach $100–$200/hour fully loaded (AnySilicon, 2024). Large AI accelerator dies require 30–120 seconds per die at wafer sort versus 0.5–2 seconds for simple IoT chips, with only 1–2 die per touchdown due to massive die area — collapsing parallelism and driving up per-die cost. Burn-in runs 24–168 hours at 125–175°C under voltage stress (SemiEngineering, 2024). Historically test was 2–3% of IC revenue; for advanced-node AI chips it is rising to 5–10%.
Chiplets insert a critical additional step: Known Good Die (KGD) testing. Once chiplets are bonded via microbumps or hybrid bonds, rework is essentially impossible — a single defective die scraps the entire package, including all other good chiplets, interposer, and HBM stacks (Cadence, 2024). The "Rule of Ten" applies: detecting defects costs 10× more at each subsequent manufacturing stage.
KGD testing costs $5–$15 per chiplet for thorough structural, parametric, and BIST testing. FormFactor notes that full KGD testing of every die is "often not economically feasible" — the industry compromise is "Good Enough Die" testing that balances cost against fallout risk (FormFactor/Amy Leong, 2022). Composite yield math makes quality paramount: for a 4-chiplet design at 95% KGD each, system yield before assembly losses is only 0.95⁴ = 81.5%. For AMD's MI300X with 12 chiplets, even 98% per-chiplet KGD produces only 0.98¹² = 78.5% composite yield.
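The composite-yield figures above follow from raising the per-chiplet KGD rate to the chiplet count. A small sketch, with a hypothetical inverse helper that asks what per-chiplet rate a given system target would require:

```python
def composite_yield(kgd: float, n_chiplets: int) -> float:
    """System yield before assembly losses, assuming independent chiplets."""
    return kgd ** n_chiplets

def required_kgd(target_system_yield: float, n_chiplets: int) -> float:
    """Per-chiplet KGD rate needed to reach a target composite yield."""
    return target_system_yield ** (1.0 / n_chiplets)

print(f"4 chiplets at 95% KGD:  {composite_yield(0.95, 4):.1%}")   # 81.5%
print(f"12 chiplets at 98% KGD: {composite_yield(0.98, 12):.1%}")  # 78.5%

# Hypothetical target: 80% composite yield across 12 chiplets.
print(f"KGD needed for 80% at N=12: {required_kgd(0.80, 12):.2%}")
```

For the hypothetical 80% target at 12 chiplets, the required per-chiplet rate comes out a little above 98%.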
The math is brutal: every chiplet you add punishes system yield multiplicatively. This is why MI300X-class designs demand ≥98% KGD per chiplet just to reach passable system yield.
[Chart: composite yield versus chiplet count. Source: Silicon Analysts, composite yield = (KGD)^N]
| Cost category | Monolithic (~800mm²) | Chiplet (2–4 dies + interposer) |
|---|---|---|
| Wafer sort | $5–15/die | $3–8/chiplet × N |
| KGD testing | N/A | $5–15/chiplet |
| Interposer test | N/A | $2–5 |
| Dicing | $0.50–2 | $0.50–1 × N |
| Packaging/assembly | $15–50 (FC-BGA) | $50–150 (CoWoS) |
| Final test | $10–50 | $15–60 |
| Burn-in | $5–30 | $10–40 |
| Total test | $45–200 | $80–350+ |
Total test cost for chiplet packages runs 15–30% higher than monolithic equivalents (Chiplet Actuary, arXiv). The B200's ~$1,550 in test and assembly is well above the H100's ~$920, reflecting higher multi-die testing complexity. UCIe lane repair provides partial mitigation: redundant lanes can be switched in during test, improving effective assembly yield by 1–5% (SemiEngineering/Amkor, 2024).
Key takeaway: Once chiplets are bonded, rework is impossible — a single bad die scraps the whole package. That physics is why KGD testing adds 15–30% to chiplet test costs versus monolithic, and why "Good Enough Die" compromises are industry standard.
CoWoS capacity, allocation, and lead times
TSMC's CoWoS capacity has roughly doubled annually since 2023, the kind of growth curve the semiconductor industry almost never sees outside of greenfield node ramps:
[Chart: TSMC CoWoS capacity by period. Source: TrendForce, Morgan Stanley, JPMorgan, Global Semi Research, December 2025]
| Period | Capacity (WPM) | Source |
|---|---|---|
| End 2023 | ~13,000–16,000 | Nomad Semi, TSMC |
| End 2024 | ~35,000–40,000 | TrendForce, October 2024 |
| End 2025 | ~75,000–80,000 | TrendForce, Global Semi Research |
| Q1 2026 (est.) | ~80,000–90,000 | Inferred |
| End 2026 target | 120,000–130,000 | TrendForce, Morgan Stanley, DigiTimes |
| End 2027 target | ~141,000–170,000 | JPMorgan / 36kr |
CoWoS-L is the primary growth driver: of NVIDIA's ~595,000 CoWoS wafers for 2026, ~510,000 are CoWoS-L at TSMC (Morgan Stanley, December 2025). TSMC is boosting CoWoS-S capacity mainly through equipment reallocation rather than new builds. TSMC operates advanced packaging across AP3 (Longtan), AP5/AP5B (Taichung), AP6/AP6B (Zhunan), AP7 (Chiayi; opening ceremony December 4, 2025), and AP8 (Tainan; 96,000+ sqm, 9× AP6's size, equipment move-in began Q4 2025). A US packaging facility is planned to break ground in 2026 for completion by 2029.
Customer allocation is heavily concentrated. Global CoWoS demand in 2026 is projected at approximately 1 million wafers, up 40–50% YoY (Morgan Stanley, December 2025). NVIDIA alone takes more than the rest of the industry combined:
[Chart: 2026 CoWoS wafer allocation by customer. Source: Silicon Analysts estimate aggregating Epoch AI, Raymond James, TrendForce, March 2026]
| Customer | 2026 wafers | Share | Key products |
|---|---|---|---|
| NVIDIA | ~595,000 | ~60% | Rubin, Blackwell Ultra, Vera CPU |
| Broadcom | ~150,000 | ~15% | Google TPU (90K), Meta ASIC (50K), OpenAI (10K) |
| AMD | ~105,000 | ~11% | MI355, MI400, Venice CPU |
| Marvell | ~55,000 | ~5.5% | Custom chips for AWS, Microsoft |
| Amazon/Alchip | ~50,000 | ~5% | Trainium3, custom AI ASICs |
| MediaTek | ~20,000 | ~2% | Google TPU project (new entrant) |
| Others | ~25,000 | <3% | Various |
Broadcom is gaining share as hyperscaler ASIC demand accelerates. MediaTek is a new entrant booking ~20,000 wafers for Google's TPU project. The top three customers (NVIDIA, Broadcom, AMD) lock in roughly 85% of capacity, leaving about 15% for second-tier players and startups.
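The shares in the allocation table can be recomputed from the wafer counts. The sketch below uses the table's figures as-is; the variable names are illustrative.

```python
# 2026 CoWoS wafer allocations from the table above.
allocations = {
    "NVIDIA": 595_000,
    "Broadcom": 150_000,
    "AMD": 105_000,
    "Marvell": 55_000,
    "Amazon/Alchip": 50_000,
    "MediaTek": 20_000,
    "Others": 25_000,
}

total_wafers = sum(allocations.values())
shares = {name: wafers / total_wafers for name, wafers in allocations.items()}

# Rank customers by allocation and accumulate the top three.
ranked = sorted(allocations, key=allocations.get, reverse=True)
top3_share = sum(shares[name] for name in ranked[:3])

for name in ranked:
    print(f"{name:<14} {allocations[name]:>8,} {shares[name]:6.1%}")
print(f"top three ({', '.join(ranked[:3])}): {top3_share:.1%} of {total_wafers:,} wafers")
```

The counts sum to 1,000,000 wafers, and NVIDIA, Broadcom, and AMD together account for 85% of them.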
Lead times have compressed from the 50+ week peak during 2024–early 2025 (FinancialContent, January 2026) to 30–40 weeks for new orders as of early 2026, driven by TSMC expansion and OSAT outsourcing. C.C. Wei stated capacity was "about three times short" of AI demand during Q3 2025. By early 2026, Morgan Stanley's OCP Conference analysis and Jensen Huang's commentary suggest foundries and CoWoS are "no longer the primary bottleneck" — constraints are shifting downstream to memory, power infrastructure, and rack assembly. A notable DigiTimes report from August 2025 indicated CoWoS utilization was briefly ~60%, complicating the "perpetually sold out" narrative.
Key takeaway: Supply-demand is approaching equilibrium. At end-2026 run rates, TSMC (~130K WPM) plus OSATs (~40K WPM) annualizes to ~2M wafer starts per year against ~1.0–1.15M projected demand. The bottleneck is moving from packaging to HBM and power.
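The supply-versus-demand arithmetic in the takeaway is a simple annualization. In the sketch below, note the assumption that the monthly figures are year-end run rates; capacity ramps through the year, so actual 2026 output would be lower than this annualized number.

```python
TSMC_WPM = 130_000                      # upper end of the end-2026 target
OSAT_WPM = 40_000                       # OSAT figure from the takeaway
DEMAND_RANGE = (1_000_000, 1_150_000)   # projected 2026 demand, wafers

# Annualize the year-end monthly run rate (assumption: held flat for 12 months).
annual_supply = (TSMC_WPM + OSAT_WPM) * 12
coverage = [annual_supply / demand for demand in DEMAND_RANGE]

print(f"annualized supply: {annual_supply:,} wafers")
print(f"supply/demand coverage: {min(coverage):.2f}x to {max(coverage):.2f}x")
```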
AI chip packaging case studies
NVIDIA B200 integrates two ~800mm² GB100 dies (TSMC 4NP, 208B transistors total) connected via NV-HBI at 10 TB/s, with 8 HBM3E stacks delivering 192GB at 8.0 TB/s, all on a CoWoS-L organic interposer with embedded LSI bridges. Estimated total COGS is ~$6,400: packaging ~$1,100 (17%), HBM3E ~$2,900 (45%), logic dies ~$850 (13%), test/assembly ~$1,550 (24%). Packaging yield losses alone add roughly $1,000 in effective scrap cost. The GB200 NVL72 rack (72 GPUs + 36 Grace CPUs, 120kW liquid-cooled) implies GPU-only component COGS exceeding $460,000 per rack before system integration.
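The B200 breakdown and the rack-level figure can be checked with a short roll-up. The sketch uses only the component estimates quoted above and counts GPUs alone; Grace CPUs, networking, and system integration are excluded, as in the text.

```python
# Estimated B200 COGS components in USD, from the paragraph above.
b200_cogs = {
    "packaging": 1_100,
    "HBM3E": 2_900,
    "logic dies": 850,
    "test/assembly": 1_550,
}

total_cogs = sum(b200_cogs.values())
cogs_shares = {item: cost / total_cogs for item, cost in b200_cogs.items()}

GPUS_PER_RACK = 72  # GB200 NVL72
rack_gpu_cogs = GPUS_PER_RACK * total_cogs

for item, cost in b200_cogs.items():
    print(f"{item:<14} ${cost:>5,} {cogs_shares[item]:5.0%}")
print(f"total per GPU: ${total_cogs:,}")
print(f"GPU-only COGS per NVL72 rack: ${rack_gpu_cogs:,}")
```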
AMD MI300X is a 3.5D packaging tour de force: 8 XCD compute chiplets (N5, ~115mm² each) are 3D hybrid-bonded onto 4 IOD dies (N6, ~370mm² each) via SoIC at 9µm TSV pitch. The 12-chiplet stack sits on a CoWoS-S interposer at ~3.5× reticle alongside 8 HBM3 stacks. Total active silicon exceeds 2,400mm² across 153B transistors. Estimated COGS is ~$5,300, with packaging at $1,200–$1,800 reflecting the world's most complex commercial packaging — over 100 pieces of silicon per package including HBM layers.
Google TPU v7 (Ironwood) uses dual ~700mm² compute dies on TSMC N3P with 8 HBM3E stacks (192GB, 7.2 TB/s), delivering 4,614 FP8 TFLOPS — a dual-die + 8 HBM configuration that mirrors B200, a physics-driven convergence. Estimated packaging cost is $1,000–$1,300. Microsoft Maia 200 (launched January 2026) uses a monolithic 727mm² die on TSMC N3E with 216GB HBM3E across 6 stacks; estimated packaging is $900–$1,200. AWS Trainium2 uses CoWoS-R — the cost-conscious organic-interposer variant — for 2 chiplets + 4 HBM3 stacks. Meta MTIA v2 is the outlier: a 421mm² die on TSMC N5 using standard flip-chip BGA packaging with LPDDR5 — no HBM, no advanced packaging — at ~$50–$150 total COGS. Newer MTIA generations (300+) are transitioning to chiplet architectures with CoWoS-S and HBM.
| Chip | Packaging | HBM | Est. pkg cost | Est. total COGS |
|---|---|---|---|---|
| NVIDIA B200 | CoWoS-L | 8× HBM3E (192GB) | ~$1,100 | ~$6,400 |
| NVIDIA H100 | CoWoS-S | 5–6× HBM3 (80GB) | ~$750 | ~$3,320 |
| AMD MI300X | SoIC + CoWoS-S | 8× HBM3 (192GB) | ~$1,500 | ~$5,300 |
| Google TPU v7 | CoWoS (likely L) | 8× HBM3E (192GB) | ~$1,000–1,300 | Not public |
| Microsoft Maia 200 | CoWoS (S or L) | 6× HBM3E (216GB) | ~$900–1,200 | ~$5,000–7,000 |
| AWS Trainium2 | CoWoS-R | 4× HBM3 (96GB) | ~$700–1,000 | ~$3,000–4,500 |
| Meta MTIA v2 | Standard BGA | None (LPDDR5) | ~$5–15 | ~$50–150 |
Future packaging roadmap
CoWoS 9.5× reticle scales the interposer to ~8,100+ mm², accommodating 12+ HBM stacks alongside cutting-edge logic dies. Mass production is targeted for 2027 (TSMC North America Technology Symposium, May 2025; TrendForce, November 2024). Bernstein estimates NVIDIA's Rubin accelerator will carry a CoWoS packaging cost of ~$900–$1,000 per chip using this configuration with 12+ HBM stacks.
CoPoS (Chip-on-Panel-on-Substrate) replaces the round 300mm wafer with a 310mm × 310mm rectangular panel for interposer fabrication. Tool deliveries to TSMC subsidiary Xintec's R&D line began February 2026, with full pilot line completion targeted June 2026 (TrendForce/Commercial Times, April 2026). Mass production is expected late 2028 to early 2029 at AP7 Phase 4 in Chiayi. Panel utilization exceeds 95% versus ~85% for circular wafers, driving an expected 20–30% cost advantage over current CoWoS. The substrate-less CoWoP variant could achieve 30–50% cost reduction by eliminating ABF substrates, which currently account for ~40% of packaging cost.
SoIC capacity stands at ~10,000 WPM in 2025, targeting 15,000–20,000 WPM by end-2026 (TrendForce, March 2025). CapEx runs up to $7 billion per 10,000 WPM of SoIC capacity — among the most capital-intensive packaging technologies. Current customers include AMD (MI300 series); Apple, NVIDIA (Rubin), and Broadcom are confirmed future users.
Hybrid bonding is in production for AMD MI300 (SoIC), 3D NAND, and CMOS image sensors, but HBM4 will stick with microbumps, postponing hybrid bonding adoption to HBM4E around 2027 (Semiconductor Engineering, March 2026). The hybrid bonder market reached ~$152M in 2025 and is projected to reach $397M by 2030 at 21.1% CAGR (Yole Group, 2025). Thermocompression bonding remains simpler and cheaper — hybrid bonding only becomes economical when pad pitch drops below ~10µm.
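The market projection above can be sanity-checked with the standard compound-growth formula. This is only a consistency check on the quoted numbers, assuming five compounding years from 2025 to 2030; the helper names are illustrative.

```python
def project(value: float, cagr: float, years: int) -> float:
    """Compound a starting value forward at a constant annual growth rate."""
    return value * (1 + cagr) ** years

def implied_cagr(start: float, end: float, years: int) -> float:
    """Constant annual growth rate that takes start to end over the given years."""
    return (end / start) ** (1 / years) - 1

# Hybrid bonder market, USD millions: ~152 in 2025, 397 projected for 2030.
projected_2030 = project(152, 0.211, 5)
cagr = implied_cagr(152, 397, 5)

print(f"152 at 21.1% for 5 years: ~${projected_2030:.0f}M")
print(f"implied CAGR from 152 to 397: {cagr:.1%}")
```

Both directions agree with the quoted figures to within rounding.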
UCIe 3.0 was released August 5, 2025 (UCIe Consortium), doubling data rates to 48 GT/s and 64 GT/s and extending sideband reach to 100mm. The consortium has grown to 150+ members, enabling multi-vendor chiplet interoperability and lowering barriers to entry for smaller design teams.
Industry economics
The total advanced packaging market reached approximately $43–50 billion in 2025 (Yole, IMARC, Acumen), growing at 9.5–10.6% CAGR through 2030. The high-performance segment (chiplets, AI/HPC) is the fastest-growing subsector at 23% CAGR, projected to reach $28.5B by 2030 (Yole/SEMI Summit, 2025). TSMC's advanced packaging revenue reached ~8% of total company revenue in 2025, targeting >10% in 2026. Industry-wide CapEx for advanced packaging exceeded $14 billion in 2025 (Yole), with ASE alone planning $7B for 2026.
Global CoWoS wafer demand: 370,000 wafers in 2024 → 670,000 in 2025 → ~1 million in 2026 (Morgan Stanley). Key equipment vendors span BESI and ASMPT (bonding), Canon and ASML (packaging lithography — ASML's TWINSCAN XT:260 launched in 2024 supports interposers up to 3,432mm² without stitching), and Advantest (~31% ATE share) and Teradyne (~23%). High-end ATE systems cost $1.5–$5 million per unit.
Ajinomoto controls >95% of the ABF film market for CPU/GPU substrates (Nikkei Asia), with supply constrained for high-layer-count packages. Ajinomoto plans a 50% boost in ABF output by 2030. Intel and NVIDIA have co-invested in substrate supplier expansions, covering ~50% of new production line costs. Glass substrates are emerging as a long-term alternative — AMD and AWS are reportedly accelerating glass substrate timelines.
CoWoS costs are rising, not falling: TSMC raised prices 10–20% for 2025, with Morgan Stanley projecting additional 20% cumulative increases through 2026. The progression from H100-class (~$750) to B200-class (~$1,100) to Rubin-class (~$900–$1,000) shows per-chip costs remain elevated even as TSMC scales. The most significant cost relief will come from panel-level packaging (CoPoS), which should deliver 20–50% cost reduction — but mass production timing of late 2028–early 2029 means meaningful relief is still 2–3 years away.
Frequently asked questions
How much does CoWoS packaging cost per chip?
CoWoS-S costs approximately $300–$800 per chip depending on die size and HBM count. CoWoS-L costs $800–$2,000 for multi-die configurations. For an H100-class chip, packaging is about $750 (23% of total COGS). For NVIDIA's B200, packaging is ~$1,100 (17%).
Is chiplet packaging cheaper than monolithic?
It depends on packaging technology. On organic substrates (like AMD EPYC), chiplets save >40% above ~400mm². On CoWoS (required for HBM), chiplets only break even above ~800mm² on immature processes (D₀ > 0.17 defects/cm²). On mature nodes, a monolithic design is ~45% cheaper at 800mm² total silicon because CoWoS packaging costs ($700–$1,000) overwhelm the silicon yield advantage.
What is Known Good Die (KGD) testing?
KGD testing verifies each chiplet works before assembly. Once bonded via microbumps or hybrid bonds, rework is impossible — a single defective die scraps the entire package. KGD costs $5–$15 per chiplet and is mandatory for cost-effective multi-die packaging.
How long is the CoWoS lead time in 2026?
Lead times have compressed from 50+ weeks in 2024–early 2025 to approximately 30–40 weeks for new orders as of Q1 2026, driven by TSMC capacity expansion and OSAT outsourcing. Constraints have shifted downstream to HBM, power infrastructure, and rack assembly.
What is CoWoS-L and how is it different from CoWoS-S?
CoWoS-L uses an organic RDL interposer with embedded Local Silicon Interconnect (LSI) bridges instead of a monolithic silicon interposer. It supports packages up to 5,000mm²+ (versus 2,700mm² for CoWoS-S) and is used for NVIDIA's B200 and future Rubin GPUs. It costs 20–40% more per chip than CoWoS-S but is actually more economical at very large package sizes because it avoids catastrophic interposer yield losses.
Who gets the most CoWoS capacity?
NVIDIA secures ~60% of total CoWoS allocation in 2026 (~595,000 wafers). Broadcom gets ~15% (with Google TPU, Meta ASIC, and OpenAI projects), AMD ~11%, Marvell ~5.5%, and Amazon/Alchip ~5%. The top three customers lock in roughly 85% of capacity, squeezing out smaller AI chip companies.