
Fireworks AI on Microsoft Foundry for Azure Inference

3 min read

Summary

Microsoft has launched a public preview of Fireworks AI on Microsoft Foundry, bringing high-throughput, low-latency open-model inference to Azure through a single managed endpoint. It matters because enterprises can now access models like DeepSeek V3.2, gpt-oss-120b, Kimi K2.5, and MiniMax M2.5 with Azure’s governance, serverless or provisioned deployment options, and bring-your-own-weights support—making it easier to move open-model AI from experimentation into production.


Fireworks AI arrives on Microsoft Foundry

Introduction

Organizations adopting open models want more than raw performance—they need a practical way to run those models securely, govern them consistently, and move from testing to production without stitching together multiple tools. Microsoft’s new public preview of Fireworks AI on Microsoft Foundry is aimed at solving that problem by combining fast open-model inference with Azure’s enterprise management and governance capabilities.

What’s new

Microsoft Foundry now includes Fireworks AI as a public preview option for open-model inference in Azure. The announcement positions Foundry as a centralized control plane for the full AI lifecycle, including model evaluation, deployment, customization, and operations.

Key updates include:

  • Public preview of Fireworks AI on Microsoft Foundry for high-throughput, low-latency open-model inference
  • Access to supported open models through a single Azure endpoint in Foundry
  • Support for these models today:
    • DeepSeek V3.2
    • OpenAI gpt-oss-120b
    • Kimi K2.5
    • MiniMax M2.5
  • MiniMax M2.5 is newly added to Foundry with serverless support
  • Bring-your-own-weights (BYOW) support for quantized or fine-tuned models trained elsewhere
  • Deployment flexibility with:
    • Serverless, pay-per-token inference for rapid experimentation
    • Provisioned Throughput Units (PTUs) for predictable production performance
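To make the single-endpoint idea concrete, the sketch below assembles an OpenAI-style chat-completions request and posts it to a Foundry endpoint. The endpoint URL, route, and API version shown here are illustrative assumptions, not documented values; check the Foundry model catalog for the exact endpoint details of your deployment.

```python
# Minimal sketch of calling an open model through a single Foundry endpoint.
# The endpoint host, route, and api-version below are placeholders, not
# confirmed values; the payload follows the OpenAI-compatible
# chat-completions convention that serverless endpoints commonly expose.
import json
import urllib.request

FOUNDRY_ENDPOINT = "https://<your-resource>.services.ai.azure.com"  # placeholder
API_VERSION = "2024-05-01-preview"  # placeholder; confirm in the Foundry docs

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Assemble an OpenAI-style chat-completions payload."""
    return {
        "model": model,  # e.g. "gpt-oss-120b" from the list above
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def send_chat_request(api_key: str, payload: dict) -> dict:
    """POST the payload to the (placeholder) Foundry inference route."""
    req = urllib.request.Request(
        f"{FOUNDRY_ENDPOINT}/models/chat/completions?api-version={API_VERSION}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Because all four supported models sit behind the same endpoint shape, swapping models should only require changing the `model` field rather than rebuilding a serving stack.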

Microsoft also highlighted Fireworks AI’s large-scale inference capabilities, including internet-scale token processing and benchmark-leading throughput for open models.

Why this matters for IT and platform teams

For Azure administrators, AI platform teams, and enterprise architects, this reduces the operational complexity of supporting open models. Instead of building separate serving stacks or governance frameworks, teams can use Foundry as a single environment for model access, deployment, observability, and policy control.

This is especially relevant for organizations that want to:

  • Standardize on open models without vendor lock-in
  • Support custom fine-tuned models while keeping a consistent serving platform
  • Balance cost and performance across experimentation and production workloads
  • Apply enterprise governance and security controls to AI deployments in Azure

Admins and AI teams should:

  1. Review the Microsoft Foundry model catalog for Fireworks-hosted models.
  2. Evaluate whether serverless or PTU-based deployments best fit workload requirements.
  3. Test BYOW scenarios if your organization already has fine-tuned or quantized open models.
  4. Validate governance, observability, and operational requirements before production rollout.
  5. Track Microsoft’s additional guidance on model customization and lifecycle management in Foundry.
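For step 2, a quick back-of-envelope comparison can help decide between serverless pay-per-token billing and a PTU reservation. The sketch below uses placeholder prices, not published Azure rates, to estimate the monthly token volume at which a fixed reservation becomes cheaper.

```python
# Back-of-envelope helper for comparing serverless vs PTU deployments.
# All prices are illustrative placeholders, not published Azure rates.

def serverless_cost(tokens_per_month: int, price_per_1k_tokens: float) -> float:
    """Total monthly cost under pay-per-token serverless billing."""
    return tokens_per_month / 1000 * price_per_1k_tokens

def breakeven_tokens(ptu_monthly_cost: float, price_per_1k_tokens: float) -> int:
    """Monthly token volume above which a fixed PTU reservation is cheaper
    than paying per token."""
    return int(ptu_monthly_cost / price_per_1k_tokens * 1000)
```

With an assumed $0.002 per 1K tokens and a $500/month reservation, the breakeven works out to 250 million tokens per month; workloads well below that figure likely fit the serverless tier, while steady high-volume traffic favors PTUs.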

Fireworks AI on Microsoft Foundry gives Azure customers a stronger path to operationalizing open models at scale—without sacrificing performance, flexibility, or enterprise control.

Need help with Azure?

Our experts can help you implement and optimize your Microsoft solutions.

Talk to an Expert


Tags: Azure, Microsoft Foundry, Fireworks AI, open models, AI inference
