Fireworks AI on Microsoft Foundry for Azure Inference
Summary
Microsoft has launched a public preview of Fireworks AI on Microsoft Foundry, bringing high-throughput, low-latency open-model inference to Azure through a single managed endpoint. It matters because enterprises can now access models like DeepSeek V3.2, gpt-oss-120b, Kimi K2.5, and MiniMax M2.5 with Azure’s governance, serverless or provisioned deployment options, and bring-your-own-weights support—making it easier to move open-model AI from experimentation into production.
Introduction
Organizations adopting open models want more than raw performance—they need a practical way to run those models securely, govern them consistently, and move from testing to production without stitching together multiple tools. Microsoft’s new public preview of Fireworks AI on Microsoft Foundry is aimed at solving that problem by combining fast open-model inference with Azure’s enterprise management and governance capabilities.
What’s new
Microsoft Foundry now includes Fireworks AI as a public preview option for open model inference in Azure. The announcement positions Foundry as a centralized control plane for the full AI lifecycle, including model evaluation, deployment, customization, and operations.
Key updates include:
- Public preview of Fireworks AI on Microsoft Foundry for high-throughput, low-latency open model inference
- Access to supported open models through a single Azure endpoint in Foundry
- Support for these models today:
  - DeepSeek V3.2
  - OpenAI gpt-oss-120b
  - Kimi K2.5
  - MiniMax M2.5
- MiniMax M2.5 is newly added to Foundry with serverless support
- Bring-your-own-weights (BYOW) support for quantized or fine-tuned models trained elsewhere
- Deployment flexibility with:
  - Serverless, pay-per-token inference for rapid experimentation
  - Provisioned Throughput Units (PTUs) for predictable production performance
Microsoft also highlighted Fireworks AI’s large-scale inference capabilities, including internet-scale token processing and benchmark-leading throughput for open models.
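To make the single-endpoint idea concrete, here is a minimal sketch of assembling a chat-completion request against a Foundry-hosted model. The resource URL, API path, and deployment name are placeholders, not confirmed by the announcement; Foundry serverless endpoints commonly expose an OpenAI-compatible chat-completions route, but verify the exact URL and auth scheme shown for your deployment in the Foundry portal before using this shape.

```python
import json

# Placeholder endpoint -- substitute your Foundry resource's actual URL.
ENDPOINT = "https://<your-resource>.services.ai.azure.com"


def build_chat_request(deployment: str, user_prompt: str, api_key: str) -> dict:
    """Assemble the URL, headers, and JSON body for a chat-completion call.

    Assumes an OpenAI-compatible chat-completions route (a common shape
    for serverless model endpoints); the path and header names here are
    assumptions to check against your deployment's endpoint details.
    """
    return {
        "url": f"{ENDPOINT}/openai/deployments/{deployment}/chat/completions",
        "headers": {
            "Content-Type": "application/json",
            "api-key": api_key,  # or a Microsoft Entra ID bearer token
        },
        "body": json.dumps({
            "messages": [{"role": "user", "content": user_prompt}],
            "max_tokens": 256,
        }),
    }


# Example: target a serverless deployment of gpt-oss-120b
# (the deployment name "gpt-oss-120b" is hypothetical).
req = build_chat_request("gpt-oss-120b", "Summarize our Q3 results.", "KEY")
print(req["url"])
```

Because every supported model sits behind the same endpoint shape, swapping models is a matter of changing the deployment name rather than re-plumbing a separate serving stack.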
Why this matters for IT and platform teams
For Azure administrators, AI platform teams, and enterprise architects, this reduces the operational complexity of supporting open models. Instead of building separate serving stacks or governance frameworks, teams can use Foundry as a single environment for model access, deployment, observability, and policy control.
This is especially relevant for organizations that want to:
- Standardize on open models without vendor lock-in
- Support custom fine-tuned models while keeping a consistent serving platform
- Balance cost and performance across experimentation and production workloads
- Apply enterprise governance and security controls to AI deployments in Azure
Recommended next steps
Admins and AI teams should:
- Review the Microsoft Foundry model catalog for Fireworks-hosted models.
- Evaluate whether serverless or PTU-based deployments best fit workload requirements.
- Test BYOW scenarios if your organization already has fine-tuned or quantized open models.
- Validate governance, observability, and operational requirements before production rollout.
- Track Microsoft’s additional guidance on model customization and lifecycle management in Foundry.
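When weighing serverless against PTU-based deployments, a rough breakeven calculation is a useful starting point. The sketch below uses illustrative placeholder figures, not Azure list prices; substitute your actual rates from the Azure pricing pages before drawing conclusions.

```python
# Rough breakeven sketch for choosing between serverless pay-per-token
# billing and a reserved PTU deployment. All prices are illustrative
# placeholders -- replace with your actual Azure rates.


def breakeven_tokens(serverless_price_per_1k: float,
                     ptu_monthly_cost: float) -> float:
    """Monthly token volume above which a reserved PTU deployment
    becomes cheaper than serverless pay-per-token billing."""
    return ptu_monthly_cost / serverless_price_per_1k * 1000


# Placeholder figures: $0.002 per 1K tokens serverless, $500/month PTU.
tokens = breakeven_tokens(0.002, 500.0)
print(f"Breakeven at {tokens:,.0f} tokens/month")  # 250,000,000 tokens
```

Below the breakeven volume, serverless experimentation is the cheaper path; above it, provisioned throughput also buys predictable latency, which the raw cost comparison does not capture.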
Fireworks AI on Microsoft Foundry gives Azure customers a stronger path to operationalizing open models at scale—without sacrificing performance, flexibility, or enterprise control.