Phison AI Data Platform Infrastructure

Phison GPU Server

Pre-integrated GPU servers with aiDAPTIVLink middleware and aiDAPTIVCache NAND Flash. They extend AI memory, reuse KV cache across inference sessions, and support up to 5× more concurrent users on the same infrastructure.

5× Higher Concurrent Users

Up to 5× more concurrent users on the same GPU infrastructure, validated on PRO6000 (20 to 60 users, 3×) and H200 (20 to 100 users, 5×) nodes.

800B Parameter Models

Train and serve LLMs of up to 800B parameters with aiDAPTIVCache memory extension.

Architecture

How aiDAPTIV Transforms the GPU Server

Traditional GPU servers rely on HBM, DRAM, and SSD tiers alone. Phison aiDAPTIV extends AI memory by adding aiDAPTIVLink middleware and aiDAPTIVCache NAND Flash.
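To give a sense of why KV cache puts pressure on GPU memory as concurrent users grow, the following is a minimal sketch of the standard per-session KV cache size estimate. The model shape (80 layers, 8 KV heads, head dimension 128), the 16-bit cache precision, and reading "16K" as 16 × 1024 tokens are assumptions chosen to resemble a Llama-3.3-70B-style model at the input length quoted in this page's note. They are not Phison-published figures, and the sketch says nothing about how aiDAPTIV itself manages the cache.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Estimate the KV cache held by one sequence.

    Each token stores one key and one value vector (hence the factor 2)
    per layer and per KV head.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Assumed shape for a Llama-3.3-70B-style model with grouped-query attention.
per_session = kv_cache_bytes(
    num_layers=80,
    num_kv_heads=8,
    head_dim=128,
    seq_len=16 * 1024,   # assumed meaning of the 16K input length
    bytes_per_value=2,   # assumed 16-bit cache entries
)

per_session_gib = per_session / 2**30
print(f"{per_session_gib:.1f} GiB of KV cache per 16K-token session")  # prints 5.0 GiB
```

Under these assumptions each 16K-token session holds about 5 GiB of KV cache, so the total grows linearly with the number of sessions kept resident.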

[Figure: Traditional Approach, AI Training Architecture]

[Figure: Phison aiDAPTIV Approach, new AI Training Architecture]

Performance

GPU Server Performance Results

Compare concurrent user capacity and inference metrics with and without aiDAPTIV on the same GPU server infrastructure.

5× Higher Concurrent Users

Phison aiDAPTIV expands effective AI memory in software, with no extra GPUs required. Same hardware, up to 5× more concurrent users.

Inference Speed

Multi-GPU DAS benchmark: concurrent users, time to first token (TTFT), and tokens per second (TPS) with and without aiDAPTIV.

Multi-GPU DAS inference speed benchmark comparison
GPU	Model	Cache	aiDAPTIV	Users	TTFT (s)	TPS (tokens/s)
6000ada ×8	gpt-oss-120b	AI100 ×2	Without	10	6.7	28.3
6000ada ×8	gpt-oss-120b	AI100 ×2	With	40	2.3	28.0
PRO6000 ×8	gpt-oss-120b	AI100 ×2	Without	20	10.1	18.8
PRO6000 ×8	gpt-oss-120b	AI100 ×2	With	60	2.6	21.3
H200 ×8	llama3.3-70b	AI200 ×4	Without	20	7.4	21.7
H200 ×8	llama3.3-70b	AI200 ×4	With	100	8.3	22.2
B300 ×8	llama3.3-70b	AI200 ×8	Without	60	9.3	16.5
B300 ×8	llama3.3-70b	AI200 ×8	With	180	9.9	17.7
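
The concurrent-user multipliers behind the "up to 5×" headline can be recomputed directly from the benchmark rows. The short sketch below does that arithmetic. The user counts are copied from the table, while the `BENCHMARKS` structure and the `user_multiplier` helper are illustrative names, not part of any Phison tooling.

```python
# (GPU configuration, model, users without aiDAPTIV, users with aiDAPTIV),
# copied from the benchmark table.
BENCHMARKS = [
    ("6000ada ×8", "gpt-oss-120b", 10, 40),
    ("PRO6000 ×8", "gpt-oss-120b", 20, 60),
    ("H200 ×8", "llama3.3-70b", 20, 100),
    ("B300 ×8", "llama3.3-70b", 60, 180),
]


def user_multiplier(users_without, users_with):
    """Ratio of concurrent users with aiDAPTIV to users without it."""
    return users_with / users_without


multipliers = {
    gpu: user_multiplier(without, with_)
    for gpu, _model, without, with_ in BENCHMARKS
}

for gpu, ratio in multipliers.items():
    print(f"{gpu}: {ratio:.0f}× concurrent users")
```

On the published figures this gives 4× for 6000ada, 3× for PRO6000, 5× for H200, and 3× for B300, so the 5× headline corresponds to the H200 configuration.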

Configurations

GPU Server System Configurations

Pre-integrated DAS architecture options from 4U RTX to 8U HGX, each paired with aiDAPTIVCache for training and inference at scale.

4U Server

NVIDIA RTX 6000 Ada

CPU: 2 × 12 Cores
aiDAPTIVCache: 2 × AI100E 2TB or 4 × AI100E 2TB
System Memory: 512 / 1024 GB

LLM Model Size (Training): < 200B / 400B
Concurrent Users* (Inference): 10 without aiDAPTIV, 40 with aiDAPTIV

Power Spec.: Max. 6 kW (Avg. 2 kW / 3 kW)

4U Server

NVIDIA RTX PRO6000 Server Edition

CPU: 2 × 32 Cores
aiDAPTIVCache: 2 × AI100E 2TB or 4 × AI100E 2TB or 4 × AI200E 4TB
System Memory: 256 / 512 / 1024 GB

LLM Model Size (Training): < 200B / 400B / 800B
Concurrent Users* (Inference): 20 without aiDAPTIV, 60 with aiDAPTIV

Power Spec.: Max. 12 kW (Avg. 2 kW / 3 kW / 6 kW)

5U Server

NVIDIA HGX H200

CPU: 2 × 48 Cores
aiDAPTIVCache: 4 × AI200E 4TB
System Memory: 2048 GB

LLM Model Size (Training): ≤ 800B
Concurrent Users* (Inference): 20 without aiDAPTIV, 100 with aiDAPTIV

Power Spec.: Max. 18 kW (Avg. 12 kW)

8U Server

NVIDIA HGX B300

CPU: 2 × 64 Cores
aiDAPTIVCache: 4 × AI200E 4TB
System Memory: 3072 GB

LLM Model Size (Training): ≤ 800B
Concurrent Users* (Inference): 60 without aiDAPTIV, 180 with aiDAPTIV

Power Spec.: 36 kW (Avg. 14 kW)

OS: Ubuntu · 1920GB ×2 (RAID1).
* Note: TTFT < 10 sec, TPS > 20 tokens/sec, input token length = 16K.
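
The page does not say exactly how the note's thresholds were applied to the published rows. As a hedged illustration, the sketch below shows one way to check a measured run against them. The `Run` dataclass, the constant names, and `meets_note` are hypothetical helpers, and the sample values are the with-aiDAPTIV rows from the benchmark table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    gpu: str
    ttft_s: float  # time to first token, in seconds
    tps: float     # tokens per second


# Thresholds taken from the note: TTFT < 10 sec, TPS > 20 tokens/sec.
MAX_TTFT_S = 10.0
MIN_TPS = 20.0


def meets_note(run: Run) -> bool:
    """True when a run satisfies both thresholds stated in the note."""
    return run.ttft_s < MAX_TTFT_S and run.tps > MIN_TPS


# With-aiDAPTIV rows as published in the benchmark table.
WITH_AIDAPTIV = [
    Run("6000ada ×8", 2.3, 28.0),
    Run("PRO6000 ×8", 2.6, 21.3),
    Run("H200 ×8", 8.3, 22.2),
    Run("B300 ×8", 9.9, 17.7),
]

results = {run.gpu: meets_note(run) for run in WITH_AIDAPTIV}
for gpu, ok in results.items():
    print(f"{gpu}: {'within' if ok else 'outside'} the note's thresholds")
```

With the figures as published, the 6000ada, PRO6000, and H200 rows satisfy both thresholds, while the B300 row reports 17.7 TPS, below the 20 tokens/sec mark.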