Fireworks AI on Microsoft Foundry for Azure Inference
Summary
Microsoft has launched a public preview of Fireworks AI on Microsoft Foundry, bringing high-throughput, low-latency open-model inference to Azure through a single managed endpoint. It matters because enterprises can now access models like DeepSeek V3.2, gpt-oss-120b, Kimi K2.5, and MiniMax M2.5 with Azure’s governance, serverless or provisioned deployment options, and bring-your-own-weights support—making it easier to move open-model AI from experimentation into production.
Introduction
Organizations adopting open models want more than raw performance—they need a practical way to run those models securely, govern them consistently, and move from testing to production without stitching together multiple tools. Microsoft’s new public preview of Fireworks AI on Microsoft Foundry is aimed at solving that problem by combining fast open-model inference with Azure’s enterprise management and governance capabilities.
What’s new
Microsoft Foundry now includes Fireworks AI as a public preview option for open model inference in Azure. The announcement positions Foundry as a centralized control plane for the full AI lifecycle, including model evaluation, deployment, customization, and operations.
Key updates include:
- Public preview of Fireworks AI on Microsoft Foundry for high-throughput, low-latency open model inference
- Access to supported open models through a single Azure endpoint in Foundry
- Support for these models today:
  - DeepSeek V3.2
  - OpenAI gpt-oss-120b
  - Kimi K2.5
  - MiniMax M2.5
- MiniMax M2.5 is newly added to Foundry with serverless support
- Bring-your-own-weights (BYOW) support for quantized or fine-tuned models trained elsewhere
- Deployment flexibility with:
  - Serverless, pay-per-token inference for rapid experimentation
  - Provisioned Throughput Units (PTUs) for predictable production performance
Microsoft also highlighted Fireworks AI’s large-scale inference capabilities, including internet-scale token processing and benchmark-leading throughput for open models.
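To make the single-endpoint idea concrete, here is a minimal sketch of assembling a chat-completion request against a Foundry-hosted model. The resource URL, API path, and deployment name are placeholders, not confirmed by the announcement; Foundry serverless endpoints commonly expose an OpenAI-compatible chat-completions route, but verify the exact URL and auth scheme shown for your deployment in the Foundry portal before using this shape.

```python
import json

# Placeholder endpoint -- substitute your Foundry resource's actual URL.
ENDPOINT = "https://<your-resource>.services.ai.azure.com"


def build_chat_request(deployment: str, user_prompt: str, api_key: str) -> dict:
    """Assemble the URL, headers, and JSON body for a chat-completion call.

    Assumes an OpenAI-compatible chat-completions route (a common shape
    for serverless model endpoints); the path and header names here are
    assumptions to check against your deployment's endpoint details.
    """
    return {
        "url": f"{ENDPOINT}/openai/deployments/{deployment}/chat/completions",
        "headers": {
            "Content-Type": "application/json",
            "api-key": api_key,  # or a Microsoft Entra ID bearer token
        },
        "body": json.dumps({
            "messages": [{"role": "user", "content": user_prompt}],
            "max_tokens": 256,
        }),
    }


# Example: target a serverless deployment of gpt-oss-120b
# (the deployment name "gpt-oss-120b" is hypothetical).
req = build_chat_request("gpt-oss-120b", "Summarize our Q3 results.", "KEY")
print(req["url"])
```

Because every supported model sits behind the same endpoint shape, swapping models is a matter of changing the deployment name rather than re-plumbing a separate serving stack.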
Why this matters for IT and platform teams
For Azure administrators, AI platform teams, and enterprise architects, this reduces the operational complexity of supporting open models. Instead of building separate serving stacks or governance frameworks, teams can use Foundry as a single environment for model access, deployment, observability, and policy control.
This is especially relevant for organizations that want to:
- Standardize on open models without vendor lock-in
- Support custom fine-tuned models while keeping a consistent serving platform
- Balance cost and performance across experimentation and production workloads
- Apply enterprise governance and security controls to AI deployments in Azure
Recommended next steps
Admins and AI teams should:
- Review the Microsoft Foundry model catalog for Fireworks-hosted models.
- Evaluate whether serverless or PTU-based deployments best fit workload requirements.
- Test BYOW scenarios if your organization already has fine-tuned or quantized open models.
- Validate governance, observability, and operational requirements before production rollout.
- Track Microsoft’s additional guidance on model customization and lifecycle management in Foundry.
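When weighing serverless against PTU-based deployments, a rough breakeven calculation is a useful starting point. The sketch below uses illustrative placeholder figures, not Azure list prices; substitute your actual rates from the Azure pricing pages before drawing conclusions.

```python
# Rough breakeven sketch for choosing between serverless pay-per-token
# billing and a reserved PTU deployment. All prices are illustrative
# placeholders -- replace with your actual Azure rates.


def breakeven_tokens(serverless_price_per_1k: float,
                     ptu_monthly_cost: float) -> float:
    """Monthly token volume above which a reserved PTU deployment
    becomes cheaper than serverless pay-per-token billing."""
    return ptu_monthly_cost / serverless_price_per_1k * 1000


# Placeholder figures: $0.002 per 1K tokens serverless, $500/month PTU.
tokens = breakeven_tokens(0.002, 500.0)
print(f"Breakeven at {tokens:,.0f} tokens/month")  # 250,000,000 tokens
```

Below the breakeven volume, serverless experimentation is the cheaper path; above it, provisioned throughput also buys predictable latency, which the raw cost comparison does not capture.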
Fireworks AI on Microsoft Foundry gives Azure customers a stronger path to operationalizing open models at scale—without sacrificing performance, flexibility, or enterprise control.