feat: add azure-foundry provider for Microsoft Foundry model access #950

Open
guglxni wants to merge 5 commits into MoonshotAI:main from guglxni:feat/azure-foundry-provider

guglxni commented Jun 20, 2026

Related Issue

Resolves #918

Problem

Kimi Code has no first-class support for Microsoft Foundry model deployments. Users must hand-wire the generic openai provider against Foundry's OpenAI v1-compatible route, which is undocumented, fragile around auth (api-key vs Bearer), and poorly named for Foundry's multi-model catalog (GPT, DeepSeek, Llama, Mistral, etc.). Real-world usage already exists via this workaround (#520).

Foundry-hosted Kimi reasoning models (e.g. Kimi-K2.6) additionally hit think-only responses when wired through the generic OpenAI adapter: max_tokens shares the output budget with reasoning_content, so the model can finish reasoning without emitting visible text or tool calls.

What changed

  • Add a new azure-foundry provider type across kosong, agent-core, and oauth custom-registry wiring.
  • Implement AzureFoundryChatProvider targeting Foundry's OpenAI v1 route (https://{resource}.openai.azure.com/openai/v1) with api-key header auth, delegating streaming/tools/reasoning to the existing OpenAI chat-completions adapter.
  • Require base_url before constructing the client so api-key auth never falls back to the default OpenAI host.
  • Clamp completion budgets against Foundry's shared input+output context window (max_context_size).
  • Recover once when a model stalls after tool results without issuing further tool calls (#520).
  • Foundry-hosted Kimi reasoning models: detect Kimi deployment ids and send the native Kimi wire format — max_completion_tokens (visible output budget) plus thinking: { type: 'enabled' } alongside reasoning_effort, instead of max_tokens which conflates reasoning and output.
  • Honor explicit withThinking('off') over history-based reasoning_effort auto-injection.
  • Apply KIMI_MODEL_THINKING_KEEP to Foundry-hosted Kimi models.
  • Document setup and credential keys (AZURE_FOUNDRY_API_KEY, AZURE_FOUNDRY_BASE_URL) in English and Chinese provider/config docs.
  • Add unit and e2e adapter tests for auth headers, base URL normalization, shared-window clamping, Kimi wire format, provider resolution, and catalog wire inference.
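
The shared-window clamping mentioned above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `clampCompletionBudget`, the `BudgetInput` fields, and the optional safety margin are hypothetical names, and the only idea taken from the description is that input and output tokens draw from one `max_context_size` window.

```typescript
// Hypothetical helper: all names and the safety margin are illustrative,
// not taken from the PR's code.
interface BudgetInput {
  requestedCompletionTokens: number; // completion budget the caller asked for
  estimatedInputTokens: number; // prompt, history, and tool schemas
  maxContextSize: number; // shared input+output window (max_context_size)
  safetyMargin?: number; // assumed slack for tokenizer estimation error
}

function clampCompletionBudget(input: BudgetInput): number {
  const margin = input.safetyMargin ?? 0;
  // Whatever the input does not use is all that is left for the completion.
  const remaining = input.maxContextSize - input.estimatedInputTokens - margin;
  if (remaining <= 0) {
    throw new RangeError(
      "input already fills the shared context window; nothing left for output",
    );
  }
  return Math.min(input.requestedCompletionTokens, remaining);
}
```

Under these assumptions, a request for 32000 completion tokens with 120000 input tokens in a 131072-token window is clamped to 11072, while a request that already fits passes through unchanged.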

Out of scope (follow-ups per #918): Entra ID token refresh, legacy deployment URLs with api-version, Foundry Agent Service APIs, and /provider catalog import (models.dev has no Azure entry).

Checklist

  • I have read the CONTRIBUTING document.
  • I have linked a related issue, or explained the problem above.
  • I have added tests that prove my feature works.
  • Ran gen-changesets skill, or this PR needs no changeset.
  • Ran gen-docs skill, or this PR needs no doc update.

Test plan

  • pnpm vitest run packages/kosong/test/azure-foundry.test.ts packages/kosong/test/kimi-reasoning.test.ts packages/kosong/test/shared-context-window.test.ts packages/kosong/test/catalog.test.ts packages/agent-core/test/harness/runtime-provider.test.ts packages/agent-core/test/config/kimi-env-params.test.ts
  • pnpm --filter @moonshot-ai/kosong typecheck
  • pnpm --filter @moonshot-ai/agent-core typecheck
  • pnpm --filter @moonshot-ai/kimi-code-oauth typecheck

Introduce a first-class azure-foundry provider that targets Foundry's OpenAI
v1-compatible route with api-key authentication, so users no longer need to
hand-wire the generic openai provider for Azure deployments.

Closes MoonshotAI#918

Co-authored-by: Cursor <cursoragent@cursor.com>

changeset-bot Bot commented Jun 20, 2026

🦋 Changeset detected

Latest commit: 1207f07

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

| Name | Type |
| --- | --- |
| @moonshot-ai/kimi-code | Minor |

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f36bd80f8e

const clientOpts: Record<string, unknown> = {
  apiKey: key,
  baseURL: baseUrl,

[P2] Require a Foundry base URL before building the client

When an azure-foundry provider is configured with an API key but no base_url/AZURE_FOUNDRY_BASE_URL, baseUrl remains undefined here, so the OpenAI SDK falls back to its default OpenAI host instead of failing fast. That sends the Foundry api-key header to the wrong endpoint and produces a confusing upstream auth error; this provider should reject missing/blank base URLs before constructing the client.
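
One way to fail fast here is a small guard that runs before the client is constructed. The sketch below is only an illustration of the suggestion: `requireFoundryBaseUrl` is a hypothetical name, and the scheme check and trailing-slash trimming are assumptions about what a reasonable normalization might do, not a description of the PR's code.

```typescript
// Hypothetical guard: rejects a missing or blank base URL so the api-key
// header can never be sent to the SDK's default host.
function requireFoundryBaseUrl(raw: string | undefined): string {
  const trimmed = raw === undefined ? "" : raw.trim();
  if (trimmed === "") {
    throw new Error(
      "azure-foundry: base_url (or AZURE_FOUNDRY_BASE_URL) is required",
    );
  }
  // Assumption: only absolute http(s) URLs are accepted.
  if (!/^https?:\/\//i.test(trimmed)) {
    throw new Error("azure-foundry: base_url must be an absolute http(s) URL");
  }
  // Assumption: trailing slashes are dropped so path joins stay predictable.
  return trimmed.replace(/\/+$/, "");
}
```

With this guard, `requireFoundryBaseUrl(undefined)` and `requireFoundryBaseUrl("   ")` both throw before any request is made, and a value such as `https://example.openai.azure.com/openai/v1/` (a placeholder resource name) comes back without the trailing slash.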

guglxni and others added 4 commits June 20, 2026 21:02
Require base_url before constructing the Foundry client so api-key auth
never falls back to the default OpenAI host. Clamp completion budgets against
Foundry's shared input+output context window and recover once when a model
stalls after tool results without issuing further tool calls.
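
The "recover once" behaviour can be pictured as a small decision function. This is a sketch under assumptions: the commit does not say how a stall is detected, so the definition used here (no tool calls and no visible text after tool results) and the names `ModelTurn`, `isStalled`, and `nextStepAfterToolResults` are all hypothetical.

```typescript
// Hypothetical shapes: the real provider types are not shown in this PR page.
interface ModelTurn {
  text: string; // visible assistant text for the turn
  toolCallCount: number; // number of tool calls the model issued
}

type NextStep = "continue" | "retry-once" | "stop";

// Assumed stall definition: the turn produced neither text nor tool calls.
function isStalled(turn: ModelTurn): boolean {
  return turn.toolCallCount === 0 && turn.text.trim() === "";
}

// Allow exactly one recovery attempt; a second stall ends the loop.
function nextStepAfterToolResults(
  turn: ModelTurn,
  recoveryUsed: boolean,
): NextStep {
  if (!isStalled(turn)) {
    return "continue";
  }
  return recoveryUsed ? "stop" : "retry-once";
}
```

Keeping the decision pure makes the single-retry limit easy to test without a live model: the caller tracks `recoveryUsed` and flips it after the first retry.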

Addresses Codex review on MoonshotAI#950. Relates to MoonshotAI#918 and MoonshotAI#520.

Co-authored-by: Cursor <cursoragent@cursor.com>

Foundry deployments of Kimi-K2.x were using max_tokens, which shares the
output budget with reasoning_content and can yield think-only responses.
Use max_completion_tokens and thinking enablement like the native Kimi
provider, honor explicit thinking-off over history auto-injection, and
apply shared-window clamping against the correct completion field.

Microsoft Foundry exposes Kimi through the OpenAI chat-completions
schema and rejects the Moonshot-proprietary `thinking` argument. Keep
reasoning enabled via `reasoning_effort` and the max_completion_tokens
split; only KimiChatProvider sends `thinking` on the native Moonshot API.
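
The resulting field selection might look like the sketch below. It is an illustration only: `isKimiDeployment`, `budgetFieldsFor`, and the prefix-based detection are hypothetical, since the PR page does not show how Kimi deployment ids are recognised. What it takes from the commits is that Foundry-hosted Kimi gets `max_completion_tokens` plus `reasoning_effort`, never `thinking`, and that other deployments keep `max_tokens`.

```typescript
type ReasoningEffort = "low" | "medium" | "high";

interface BudgetFields {
  max_tokens?: number;
  max_completion_tokens?: number;
  reasoning_effort?: ReasoningEffort;
}

// Assumption: Kimi deployments are recognised by an id prefix. Real
// deployment ids can be arbitrary, so the actual detection may differ.
function isKimiDeployment(deploymentId: string): boolean {
  return /^kimi[-_]/i.test(deploymentId);
}

function budgetFieldsFor(
  deploymentId: string,
  completionBudget: number,
  effort?: ReasoningEffort,
): BudgetFields {
  if (!isKimiDeployment(deploymentId)) {
    // Non-Kimi deployments keep the generic OpenAI-style budget field.
    return { max_tokens: completionBudget };
  }
  // Kimi on Foundry: separate visible-output budget, reasoning kept on via
  // reasoning_effort, and no Moonshot-only `thinking` field.
  const fields: BudgetFields = { max_completion_tokens: completionBudget };
  if (effort !== undefined) {
    fields.reasoning_effort = effort;
  }
  return fields;
}
```

For example, `budgetFieldsFor("Kimi-K2.6", 8192, "high")` yields `max_completion_tokens` and `reasoning_effort` only, while a non-Kimi deployment id yields just `max_tokens`.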