
Alibaba (Qwen) Releases Qwen-AgentWorld-35B-A3B

ALIBABA (QWEN) · OSS

Last updated: June 24, 2026, 12:05

Alibaba (Qwen) has released Qwen-AgentWorld-35B-A3B.

What Changed

Qwen-AgentWorld-35B-A3B
📑 Technical Report | 📖 Blog | 🤗 Hugging Face | 🤖 ModelScope | 💻 GitHub | 🖥️ Demo
> [!Note]
> This repository contains the model weights and configuration files for Qwen-AgentWorld-35B-A3B, a native language world model trained for agentic environment simulation.
>
> These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, etc.

Qwen-AgentWorld is the first language world model to cover seven agent interaction domains within a single model. It simulates agentic environments via long chain-of-thought reasoning, predicting the next environment state given an agent's action and interaction history. It is trained through a three-stage pipeline: CPT injects environment knowledge, SFT activates next-state-prediction reasoning, and RL sharpens simulation fidelity. Qwen-AgentWorld is therefore a native world model: environment modeling is the training objective from the CPT stage onward, not a post-hoc add-on.
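
The next-state-prediction loop described above can be sketched in a few lines of Python. This is a minimal illustration only: the `Step` and `Episode` structures, the bracketed prompt layout, and the `fake_predict` stand-in are all assumptions made for this example, not the model's actual input template or API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One agent action and the environment state that followed it."""
    action: str
    observation: str


@dataclass
class Episode:
    """Interaction history for one simulated environment (hypothetical structure)."""
    domain: str          # e.g. "Terminal", one of the seven domains
    initial_state: str
    steps: List[Step] = field(default_factory=list)


def build_prompt(episode: Episode, next_action: str) -> str:
    """Serialize the history plus the pending action into one text prompt.

    The layout is an assumption for illustration, not the real template.
    """
    lines = [f"[domain] {episode.domain}", f"[state 0] {episode.initial_state}"]
    for i, step in enumerate(episode.steps, start=1):
        lines.append(f"[action {i}] {step.action}")
        lines.append(f"[state {i}] {step.observation}")
    pending = len(episode.steps) + 1
    lines.append(f"[action {pending}] {next_action}")
    lines.append(f"[state {pending}]")  # the world model fills in this state
    return "\n".join(lines)


def simulate(episode: Episode, action: str, predict: Callable[[str], str]) -> str:
    """Ask the world model for the next state and append it to the history."""
    observation = predict(build_prompt(episode, action))
    episode.steps.append(Step(action, observation))
    return observation


def fake_predict(prompt: str) -> str:
    """Stand-in for a real model call; a deployment would query the served model."""
    return "README.md  src/" if prompt.endswith("[state 1]") else "(unknown)"


episode = Episode(domain="Terminal", initial_state="$ pwd\n/home/agent/project")
print(simulate(episode, "ls", fake_predict))
```

Passing the predictor in as a plain callable keeps the history bookkeeping independent of how the model is served, so the same loop would work with Transformers, vLLM, or SGLang behind `predict`.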
Highlights
– Seven Unified Domains. A single model covers MCP (tool calling), Search, Terminal, SWE (software engineering), Android, Web, and OS — spanning both text and GUI interaction environments.
– Native World Model. Environment modeling from CPT onward, not post-hoc adaptation on a general-purpose LLM.
– Generalizable, Scalable & Controllable Simulator. Zero-shot generalization to out-of-distribution (OOD) environments (e.g., OpenClaw); training in simulated environments built with controllable perturbations and fictional-world construction surpasses training in real environments.
– Agent Foundation Model. RL warm-up as a language world model (LWM) on single-turn, non-agentic trajectories transfers to multi-turn, tool-calling agentic tasks across 7 benchmarks, including 3 that are entirely out-of-domain.
Model Overview
– Type: Causal Language Model (Language World Model)
– Base Model: Qwen3.5-35B-A3B-Base
– Training Stage: Continual Pre-Training (CPT) → Supervised Fine-Tuning (SFT) → Reinforcement Learning (RL, GSPO)
– Number of Parameters: 35B in total and 3B activated
– Hidden Dimension: 2048
– Token Embedding: 248320 (Padded)
– Number of Layers: 40
– Hidden Layout: 10 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE))
– Gated DeltaNet:
  – Number of Linear Attention Heads: 32 for V and 16 for QK
  – Head Dimension: 128
– Gated Attention:
  – Number of Attention Heads: 16 for Q and 2 for KV
  – Head Dimension: 256
  – Rotary Position Embedding Dimension: 64
– Mixture of Experts:
  – Number of Experts: 256
  – Number of Activated Experts: 8 Routed + 1 Shared
  – Expert Intermediate Dimension: 512
– Context Length: 262,144 tokens
– Disclaimer: No outputs from external API services are included in the training pipeline.
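
A quick arithmetic check of the figures above shows how the hybrid layout adds up and what it could mean for memory at full context. The sketch rests on three assumptions that the list does not state: the cache is stored in 16-bit precision, only the Gated Attention layers keep a per-token KV cache (linear-attention layers such as Gated DeltaNet typically hold a fixed-size state instead), and a single sequence is being served.

```python
# Figures copied from the Model Overview list.
NUM_LAYERS = 40
BLOCKS = 10                # outer repeat count in the hidden layout
DELTANET_PER_BLOCK = 3     # Gated DeltaNet layers per block
ATTENTION_PER_BLOCK = 1    # Gated Attention layers per block
KV_HEADS = 2
ATTN_HEAD_DIM = 256
CONTEXT_LENGTH = 262_144
ROUTED_ACTIVE, SHARED_ACTIVE = 8, 1

BYTES_PER_VALUE = 2        # assumption: bf16/fp16 cache, no quantization

deltanet_layers = BLOCKS * DELTANET_PER_BLOCK
attention_layers = BLOCKS * ATTENTION_PER_BLOCK
# The layout should account for every layer.
assert deltanet_layers + attention_layers == NUM_LAYERS

# Keys and values (factor 2) for each KV head in each full-attention layer.
kv_bytes_per_token = 2 * attention_layers * KV_HEADS * ATTN_HEAD_DIM * BYTES_PER_VALUE
kv_bytes_full_context = kv_bytes_per_token * CONTEXT_LENGTH

print(f"Gated DeltaNet layers: {deltanet_layers}")
print(f"Gated Attention layers: {attention_layers}")
print(f"KV cache per token: {kv_bytes_per_token} bytes")
print(f"KV cache at full context: {kv_bytes_full_context / 2**30:.1f} GiB")
print(f"Experts active per token: {ROUTED_ACTIVE + SHARED_ACTIVE}")
```

Under these assumptions, 30 of the 40 layers are Gated DeltaNet and 10 are Gated Attention, and the attention KV cache for one full-length sequence comes to about 5 GiB. Actual memory use depends on the serving framework and its cache layout.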
Performance
AgentWorldBench (Open-Ended Evaluation)
Five-dimensional rubric mean per domain, normalized to 0-100 scale.
| Model | MCP | Search | Terminal | SWE | Android | Web | OS | Overall |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| GPT-5.4 | 70.10 | 37.26 | 53.69 | 66.29 | 60.00 | 51.80 | 68.58 | 58.25 |
| Claude Opus 4.8 | 54.93 | 35.14 | 59.18 | 64.10 | 61.50 | 54.66 | 66.62 | 56.59 |
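
The table does not say how the Overall column is derived. For the two rows shown here, it is consistent with an unweighted mean of the seven domain scores, which the short check below reproduces. Treat this as an observation about these rows, not a statement of the benchmark's official aggregation rule; `macro_average` is a helper name introduced for this example.

```python
# Rows copied from the table: seven domain scores, then the reported Overall.
DOMAINS = ["MCP", "Search", "Terminal", "SWE", "Android", "Web", "OS"]
ROWS = {
    "GPT-5.4": ([70.10, 37.26, 53.69, 66.29, 60.00, 51.80, 68.58], 58.25),
    "Claude Opus 4.8": ([54.93, 35.14, 59.18, 64.10, 61.50, 54.66, 66.62], 56.59),
}


def macro_average(scores):
    """Unweighted mean across domains."""
    return sum(scores) / len(scores)


for model, (scores, reported) in ROWS.items():
    assert len(scores) == len(DOMAINS)
    computed = macro_average(scores)
    print(f"{model}: computed {computed:.2f}, reported {reported:.2f}")
```

Both computed means agree with the reported Overall values to two decimal places.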

Availability and Links

See the official site for details.

SOURCE: Alibaba (Qwen) (2026-06-22)
