Azure

Fireworks AI on Microsoft Foundry for Azure Inference

3 min read

Summary

Microsoft has launched a public preview of Fireworks AI on Microsoft Foundry, bringing high-throughput, low-latency open-model inference to Azure through a single managed endpoint. It matters because enterprises can now access models like DeepSeek V3.2, gpt-oss-120b, Kimi K2.5, and MiniMax M2.5 with Azure’s governance, serverless or provisioned deployment options, and bring-your-own-weights support—making it easier to move open-model AI from experimentation into production.

Audio Summary

0:00--:--
Need help with Azure?Talk to an Expert

Fireworks AI arrives on Microsoft Foundry

Introduction

Organizations adopting open models want more than raw performance—they need a practical way to run those models securely, govern them consistently, and move from testing to production without stitching together multiple tools. Microsoft’s new public preview of Fireworks AI on Microsoft Foundry is aimed at solving that problem by combining fast open-model inference with Azure’s enterprise management and governance capabilities.

What’s new

Microsoft Foundry now includes Fireworks AI as a public preview option for open model inference in Azure. The announcement positions Foundry as a centralized control plane for the full AI lifecycle, including model evaluation, deployment, customization, and operations.

Key updates include:

  • Public preview of Fireworks AI on Microsoft Foundry for high-throughput, low-latency open model inference
  • Access to supported open models through a single Azure endpoint in Foundry
  • Support for these models today:
    • DeepSeek V3.2
    • OpenAI gpt-oss-120b
    • Kimi K2.5
    • MiniMax M2.5
  • MiniMax M2.5 is newly added to Foundry with serverless support
  • Bring-your-own-weights (BYOW) support for quantized or fine-tuned models trained elsewhere
  • Deployment flexibility with:
    • Serverless, pay-per-token inference for rapid experimentation
    • Provisioned Throughput Units (PTUs) for predictable production performance

Microsoft also highlighted Fireworks AI’s large-scale inference capabilities, including internet-scale token processing and benchmark-leading throughput for open models.

Why this matters for IT and platform teams

For Azure administrators, AI platform teams, and enterprise architects, this reduces the operational complexity of supporting open models. Instead of building separate serving stacks or governance frameworks, teams can use Foundry as a single environment for model access, deployment, observability, and policy control.

This is especially relevant for organizations that want to:

  • Standardize on open models without vendor lock-in
  • Support custom fine-tuned models while keeping a consistent serving platform
  • Balance cost and performance across experimentation and production workloads
  • Apply enterprise governance and security controls to AI deployments in Azure

Admins and AI teams should:

  1. Review the Microsoft Foundry model catalog for Fireworks-hosted models.
  2. Evaluate whether serverless or PTU-based deployments best fit workload requirements.
  3. Test BYOW scenarios if your organization already has fine-tuned or quantized open models.
  4. Validate governance, observability, and operational requirements before production rollout.
  5. Track Microsoft’s additional guidance on model customization and lifecycle management in Foundry.

Fireworks AI on Microsoft Foundry gives Azure customers a stronger path to operationalizing open models at scale—without sacrificing performance, flexibility, or enterprise control.

Need help with Azure?

Our experts can help you implement and optimize your Microsoft solutions.

Talk to an Expert

Stay updated on Microsoft technologies

AzureMicrosoft FoundryFireworks AIopen modelsAI inference

Related Posts

Azure

Azure Storage Migration: Plan and Move Data Confidently

Microsoft has outlined a more structured Azure Storage migration approach that combines Azure Migrate, the new Azure Copilot Migration Agent preview, Azure Storage Mover, and Azure Data Box. The guidance helps IT teams choose the right planning and transfer tools based on data size, network limits, synchronization needs, and modernization goals.

Azure

Azure Build 2026: 3 AI Priorities for Business Leaders

Microsoft Build 2026 emphasized a shift from AI experimentation to enterprise-scale systems designed to deliver measurable business outcomes. Key Azure announcements focused on shared business context for AI, integrated agent platforms with governance, and broader model choice to help organizations deploy AI faster, more securely, and with better cost control.

Azure

Claude Fable 5 in Microsoft Foundry Now Available

Microsoft has added Anthropic’s Claude Fable 5 to Microsoft Foundry, Foundry Agent Service, and GitHub Copilot for enterprise AI workloads. The model is designed for long-running, multi-step tasks and multimodal reasoning, while Foundry adds the governance, guardrails, and operational controls organizations need to deploy autonomous agents safely on Azure.

Azure

Azure Cobalt 200 VMs Boost Agentic AI Performance

Microsoft has announced early access preview for Azure Cobalt 200 Arm-based VMs, delivering up to 50% better generational CPU performance than Cobalt 100 for cloud-native, Linux-based, and agentic AI workloads. The new VMs add higher storage and networking performance, scale to 128 vCPUs, and enable memory encryption by default, making them important for organizations optimizing AI inferencing, data pipelines, and modern web services.

Azure

Azure Foundry IQ Adds Serverless Retrieval and MCP

Microsoft has expanded Azure Foundry IQ with serverless retrieval in public preview, new multi-source knowledge connectors, and generally available knowledge bases for production agent workloads. The updates help developers build and scale grounded AI agents faster while improving security, retrieval quality, and access to both enterprise and web data.

Azure

Microsoft Discovery GA: R&D AI Platform and App Preview

Microsoft has made Microsoft Discovery generally available as a production-ready platform for building and governing agentic AI workflows in scientific and engineering research. It also introduced the Microsoft Discovery app in preview, giving researchers and academic teams a simpler local entry point before moving to enterprise-scale deployments.