Azure Storage 2026: AI Training, Inference, and Mission-Critical Storage

January 22, 2026

Summary

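Microsoft has published its Azure Storage 2026 roadmap, which focuses on upgrading the data path for AI training, fine-tuning, and inference. Highlights include larger-scale Blob storage, Azure Managed Lustre paired with NVIDIA DGX on Azure, and deeper integration with ecosystems such as Foundry, Ray, and LangChain. For enterprises, this means more efficient support for the sustained, high-concurrency demands of large models and agent applications, along with better performance, governance, and cost efficiency for SAP, stateful Kubernetes applications, and ultra-low-latency mission-critical workloads.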

Introduction: why this matters

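AI is moving from occasional experiments to always-on production; inference and autonomous "agentic" workloads in particular generate sustained, highly concurrent access patterns. The Azure Storage 2026 roadmap focuses on connecting the end-to-end AI data flow (training → fine-tuning → inference), while improving cost, operational simplicity, and performance for traditional mission-critical systems such as SAP and for ultra-low-latency trading platforms.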

What’s new (and what Microsoft is emphasizing)

1) Training at frontier scale: Blob and high-throughput data paths

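Blob scaled accounts are highlighted as a way to scale out to hundreds of scale units per region, targeting workloads with millions of objects (common in training and fine-tuning datasets and in checkpoint and model file management).
Microsoft notes that innovations built to support OpenAI-scale operations are now becoming more broadly available to enterprises.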

2) Purpose-built storage for AI compute: Azure Managed Lustre (AMLFS)

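Azure's collaboration with NVIDIA on NVIDIA DGX on Azure pairs accelerated compute with Azure Managed Lustre to keep GPU clusters continuously supplied with data.
AMLFS now includes preview support for 25 PiB namespaces and up to 512 GBps of throughput, positioning it as a top-tier managed Lustre offering for large research and industrial-scale inference scenarios (such as automotive and robotics).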
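To put the preview limits in perspective, the following back-of-the-envelope sketch estimates how long a single full pass over a 25 PiB namespace would take at the 512 GBps peak. It assumes that PiB means 2^50 bytes, that GBps means decimal gigabytes (10^9 bytes) per second, and that the peak rate is sustained for the entire pass; real jobs will typically see less than the peak.

```python
# Back-of-the-envelope: time to read an entire AMLFS namespace once at peak throughput.
# Assumptions (not stated in the roadmap): PiB = 2**50 bytes, GBps = 10**9 bytes/s,
# and the peak rate is sustained for the whole pass.

PIB = 2**50  # bytes in one pebibyte
GB = 10**9   # bytes in one decimal gigabyte


def full_scan_hours(namespace_pib: float, throughput_gbps: float) -> float:
    """Return the hours needed to read `namespace_pib` PiB at `throughput_gbps` GB/s."""
    total_bytes = namespace_pib * PIB
    bytes_per_second = throughput_gbps * GB
    return total_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    hours = full_scan_hours(25, 512)
    print(f"25 PiB at 512 GBps: {hours:.1f} hours")  # prints "25 PiB at 512 GBps: 15.3 hours"
```

Even at the maximum advertised rate, one full read of the largest preview namespace is on the order of 15 hours, which is why keeping GPUs fed usually depends on streaming the working set rather than rescanning everything.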

3) AI ecosystem integrations: faster paths from data to inference

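Deeper integrations with the AI framework ecosystem are planned, including Microsoft Foundry, Ray/Anyscale, and LangChain.
Native Azure Blob integration within Foundry is positioned to help bring enterprise data into Foundry IQ for knowledge grounding, fine-tuning, and low-latency context serving, while keeping governance and security controls within the tenant.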

4) Agentic scale cloud-native apps: block storage + Kubernetes orchestration

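Microsoft notes that agents can generate an order of magnitude more queries than human-driven applications, putting pressure on the storage and database layers.
Elastic SAN is described as a core building block for SaaS-style, multi-tenant architectures, providing managed block storage pools with guardrails.
Azure Container Storage (ACStor) is shifting toward a Kubernetes operator model, and Microsoft has signaled its intent to open-source the codebase beyond the CSI drivers, with the goal of simplifying development of stateful applications on Kubernetes.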
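The pressure described above can be made concrete with two small calculations. The first scales a hypothetical human-driven query rate by the order-of-magnitude (10×) agent multiplier to get a backend IOPS target. The second illustrates one common reason pooled block storage suits multi-tenant designs: when tenants peak at different times, a shared pool only needs to cover the peak of the combined load, not the sum of every tenant's individual peak. All function names and numbers here are hypothetical illustrations, not Elastic SAN APIs or measured results.

```python
from typing import Sequence


def agent_iops_target(human_qps: float, agent_multiplier: float, ios_per_query: float) -> float:
    """Backend IOPS needed if agents multiply the human query rate by `agent_multiplier`."""
    return human_qps * agent_multiplier * ios_per_query


def sum_of_peaks(tenants: Sequence[Sequence[float]]) -> float:
    """Capacity needed if each tenant gets a dedicated volume sized to its own peak."""
    return sum(max(series) for series in tenants)


def peak_of_sum(tenants: Sequence[Sequence[float]]) -> float:
    """Capacity needed by one shared pool: the highest combined load at any time step."""
    return max(sum(step) for step in zip(*tenants))


if __name__ == "__main__":
    # Hypothetical: 200 human-driven queries/s, 10x agent amplification, 5 I/Os per query.
    print(agent_iops_target(200, 10, 5))  # 10000

    # Hypothetical IOPS samples for three tenants whose peaks occur at different times.
    tenants = [
        [100, 10, 10],
        [10, 100, 10],
        [10, 10, 100],
    ]
    print(sum_of_peaks(tenants))  # 300
    print(peak_of_sum(tenants))   # 120
```

In this toy case, dedicated per-tenant volumes would need 300 IOPS of provisioned capacity while a shared pool would need 120; the gap shrinks if tenants peak at the same time, so the benefit depends on how correlated the tenants' load is.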

5) Mission-critical price/performance: SAP, ANF, Ultra Disk

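For SAP HANA, Azure's M-series updates raise disk performance targets to roughly 780K IOPS and 16 GB/s of throughput.
Azure NetApp Files (ANF) and Azure Premium Files remain the core shared storage options, with TCO improvements through initiatives such as the ANF Flexible Service Level and Azure Files Provisioned v2.
Coming soon: an Elastic ZRS service level for ANF, providing zone-redundant HA through synchronous replication across availability zones.
Ultra Disk performance is highlighted: sub-500 µs latency, and up to 400K IOPS and 10 GB/s, rising to up to 800K IOPS and 14 GB/s on Ebsv6 VMs.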
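The IOPS and throughput figures above are two separate ceilings, and which one binds depends on I/O size. The sketch below computes the I/O size at which the throughput cap becomes the limit (throughput ÷ IOPS) and, using Little's law (outstanding I/Os = IOPS × latency), roughly how many I/Os must be in flight to reach an IOPS ceiling at a given latency. It assumes GB means 10^9 bytes and treats 500 µs as an upper bound taken from the sub-500 µs figure; the class and function names are illustrative, and real limits also depend on VM size and disk configuration.

```python
from dataclasses import dataclass

# Assumptions: GB = 10**9 bytes; latency values are illustrative upper bounds.


@dataclass
class PerfCeiling:
    name: str
    max_iops: float
    max_gb_per_s: float

    def crossover_io_bytes(self) -> float:
        """I/O size above which the throughput cap, not the IOPS cap, limits performance."""
        return self.max_gb_per_s * 1e9 / self.max_iops

    def effective_iops(self, io_bytes: float) -> float:
        """IOPS achievable at a fixed I/O size, limited by whichever cap is hit first."""
        return min(self.max_iops, self.max_gb_per_s * 1e9 / io_bytes)


def outstanding_ios(iops: float, latency_s: float) -> float:
    """Little's law: average I/Os in flight = completion rate x time each I/O spends in the system."""
    return iops * latency_s


if __name__ == "__main__":
    ceilings = [
        PerfCeiling("SAP HANA on M-series (target)", 780_000, 16),
        PerfCeiling("Ultra Disk", 400_000, 10),
        PerfCeiling("Ultra Disk on Ebsv6", 800_000, 14),
    ]
    for c in ceilings:
        kb = c.crossover_io_bytes() / 1000
        print(f"{c.name}: throughput cap binds above ~{kb:.1f} KB per I/O")

    # Sustaining 400K IOPS at 500 us per I/O needs about 200 I/Os in flight.
    print(f"{outstanding_ios(400_000, 500e-6):.0f} outstanding I/Os")
```

With these numbers the crossover sits around 20.5 KB, 25.0 KB, and 17.5 KB per I/O respectively, so small-block transactional I/O tends to hit the IOPS ceiling first while large sequential I/O hits the throughput ceiling.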

Impact on IT admins and platform teams

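For inference- and agent-heavy applications, expect architectures to place more emphasis on throughput, concurrency, and data locality.
Kubernetes operators, and potentially an open-source ACStor, may change how teams standardize stateful workloads on AKS.
Storage selection will be more closely matched to specific workloads: Blob for datasets and context, Lustre for GPU pipelines, Elastic SAN and Ultra Disk for high-IOPS transactional needs, and ANF for shared enterprise workloads.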
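The storage-selection guidance above amounts to a lookup from workload profile to storage service. The sketch below encodes exactly that mapping so a platform team could, for example, list the services a mixed AI platform would touch. The profile names and the `recommend` helper are hypothetical; the mapping itself comes from the paragraph above and is not an official sizing tool.

```python
from enum import Enum
from typing import Iterable, List


class WorkloadProfile(Enum):
    # Hypothetical profile names mirroring the categories in the article.
    DATASETS_AND_CONTEXT = "datasets_and_context"
    GPU_PIPELINE = "gpu_pipeline"
    HIGH_IOPS_TRANSACTIONAL = "high_iops_transactional"
    SHARED_ENTERPRISE = "shared_enterprise"


# Mapping taken from the storage-selection guidance in the article.
STORAGE_BY_PROFILE = {
    WorkloadProfile.DATASETS_AND_CONTEXT: ["Azure Blob Storage"],
    WorkloadProfile.GPU_PIPELINE: ["Azure Managed Lustre (AMLFS)"],
    WorkloadProfile.HIGH_IOPS_TRANSACTIONAL: ["Elastic SAN", "Ultra Disk"],
    WorkloadProfile.SHARED_ENTERPRISE: ["Azure NetApp Files (ANF)"],
}


def recommend(profiles: Iterable[WorkloadProfile]) -> List[str]:
    """Return the de-duplicated storage services for the given profiles, in first-seen order."""
    services: List[str] = []
    for profile in profiles:
        for service in STORAGE_BY_PROFILE[profile]:
            if service not in services:
                services.append(service)
    return services


if __name__ == "__main__":
    # Hypothetical training-plus-inference platform: dataset store and a GPU feed.
    print(recommend([WorkloadProfile.DATASETS_AND_CONTEXT, WorkloadProfile.GPU_PIPELINE]))
```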

Action items / next steps

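Map AI workloads by stage (training vs. inference vs. agentic) and align each stage with a storage type (Blob + AMLFS + block/shared).
Review the AMLFS preview limits (25 PiB / 512 GBps) and validate which GPU pipeline bottlenecks Lustre can actually relieve.
Evaluate Elastic SAN for multi-tenant SaaS or high-concurrency microservices that need pooled block storage.
If enterprise applications need zone-redundant NFS with consistent performance, plan for ANF Elastic ZRS.
For AKS teams, track ACStor operator and open-source updates as a way to reduce custom stateful storage management.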
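For the action item on validating GPU pipeline bottlenecks, a first-pass check is to compare the read bandwidth a training job needs with what the file system can deliver. The sketch below is a simplified model that assumes a steady sample rate per GPU and a fixed average sample size, and ignores caching, compression, and metadata overhead. The `TrainingJob` class, the cluster size, the sample rate, and the sample size in the example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrainingJob:
    gpus: int
    samples_per_sec_per_gpu: float
    bytes_per_sample: float

    def required_bytes_per_sec(self) -> float:
        """Aggregate read bandwidth needed to keep every GPU fed."""
        return self.gpus * self.samples_per_sec_per_gpu * self.bytes_per_sample


def storage_utilization(job: TrainingJob, storage_gb_per_s: float) -> float:
    """Fraction of storage throughput the job needs; above 1.0 means storage is the bottleneck.

    Assumes GB = 10**9 bytes.
    """
    return job.required_bytes_per_sec() / (storage_gb_per_s * 1e9)


if __name__ == "__main__":
    # Hypothetical job: 1,024 GPUs, 2,000 samples/s per GPU, 300 KB per sample.
    job = TrainingJob(gpus=1024, samples_per_sec_per_gpu=2000, bytes_per_sample=300_000)
    util = storage_utilization(job, storage_gb_per_s=512)
    print(f"required: {job.required_bytes_per_sec() / 1e9:.1f} GB/s, utilization: {util:.2f}")
```

Here the hypothetical job needs about 614.4 GB/s, a utilization of 1.20 against a 512 GBps ceiling, so storage would throttle the GPUs. A ratio well below 1.0 suggests the bottleneck lies outside storage.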
