Alibaba (Qwen) releases Qwen3-ForcedAligner-0.6B-hf

ALIBABA (QWEN) · OSS

Last updated: June 26, 2026, 18:05

Alibaba (Qwen) has released Qwen3-ForcedAligner-0.6B-hf.

▸What's new

Qwen3-ForcedAligner (Transformers native)
Overview
The Qwen3-ASR family includes Qwen3-ASR-1.7B and Qwen3-ASR-0.6B, which support language identification and ASR for 52 languages and dialects. Both leverage large-scale speech training data and the strong audio understanding capability of their foundation model, Qwen3-Omni. The 1.7B version achieves state-of-the-art performance among open-source ASR models and is competitive with the strongest proprietary commercial APIs.
Key features:
– All-in-one: Supports language identification and speech recognition for 30 languages and 22 Chinese dialects, including English accents from multiple countries and regions.
– Excellent and Fast: High-quality and robust recognition under complex acoustic environments. Qwen3-ASR-0.6B reaches 2000× throughput at a concurrency of 128. Both models support streaming/offline unified inference with a single model and handle long audio.
– Forced Alignment: Qwen3-ForcedAligner-0.6B supports timestamp prediction for arbitrary units within up to 5 minutes of speech in 11 languages, surpassing E2E-based forced-alignment models in accuracy.
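Because the aligner handles at most 5 minutes of speech per call, longer recordings have to be split before alignment, and the resulting timestamps must be shifted back onto the full file's timeline. The following is a minimal sketch under two assumptions: windows are cut at fixed lengths, and the aligner output has already been turned into `(unit, start, end)` tuples in seconds relative to each segment. `plan_segments` and `to_global` are hypothetical helpers and are not part of Transformers or the Qwen release.

```python
from typing import List, Tuple

# Documented limit for Qwen3-ForcedAligner-0.6B: up to 5 minutes of speech.
MAX_ALIGN_SECONDS = 5 * 60


def plan_segments(duration_s: float,
                  max_len_s: float = MAX_ALIGN_SECONDS) -> List[Tuple[float, float]]:
    """Split [0, duration_s) into consecutive windows no longer than max_len_s."""
    if duration_s <= 0:
        return []
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_len_s, duration_s)
        segments.append((start, end))
        start = end
    return segments


def to_global(local_stamps: List[Tuple[str, float, float]],
              segment_start: float) -> List[Tuple[str, float, float]]:
    """Shift (unit, start, end) stamps from segment-local time to file time.

    Assumes the (hypothetical) per-segment aligner output uses seconds
    measured from the start of that segment.
    """
    return [(unit, s + segment_start, e + segment_start)
            for unit, s, e in local_stamps]
```

For example, `plan_segments(700.0)` yields three windows: `(0.0, 300.0)`, `(300.0, 600.0)`, and `(600.0, 700.0)`. Fixed-length cuts can split a word in two, so a real pipeline would more likely cut at silences and divide the transcript to match.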
Model Architecture
Available Checkpoints
| Model | Supported Languages | Supported Dialects | Inference Mode | Audio Types |
|---|---|---|---|---|
| Qwen/Qwen3-ASR-1.7B-hf & Qwen/Qwen3-ASR-0.6B-hf | Chinese (zh), English (en), Cantonese (yue), Arabic (ar), German (de), French (fr), Spanish (es), Portuguese (pt), Indonesian (id), Italian (it), Korean (ko), Russian (ru), Thai (th), Vietnamese (vi), Japanese (ja), Turkish (tr), Hindi (hi), Malay (ms), Dutch (nl), Swedish (sv), Danish (da), Finnish (fi), Polish (pl), Czech (cs), Filipino (fil), Persian (fa), Greek (el), Hungarian (hu), Macedonian (mk), Romanian (ro) | Anhui, Dongbei, Fujian, Gansu, Guizhou, Hebei, Henan, Hubei, Hunan, Jiangxi, Ningxia, Shandong, Shaanxi, Shanxi, Sichuan, Tianjin, Yunnan, Zhejiang, Cantonese (HK), Cantonese (Guangdong), Wu, Minnan | Offline / Streaming | Speech, Singing Voice, Songs with BGM |
| Qwen/Qwen3-ForcedAligner-0.6B-hf | Chinese, English, Cantonese, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish | N/A | NAR (non-autoregressive) | Speech |
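The ASR models cover 30 languages, but the aligner covers only 11. As a result, some transcripts (for example Arabic or Thai) cannot be timestamped with this aligner. The sketch below encodes the table as Python sets so a pipeline can check this before it runs. The aligner row lists only language names, so mapping those names to the codes used in the ASR row is an assumption on our part. Chinese dialects are left out, and `can_transcribe_and_align` is a hypothetical helper.

```python
# Language codes copied from the ASR row of the "Available Checkpoints" table.
ASR_LANGUAGES = {
    "zh", "en", "yue", "ar", "de", "fr", "es", "pt", "id", "it",
    "ko", "ru", "th", "vi", "ja", "tr", "hi", "ms", "nl", "sv",
    "da", "fi", "pl", "cs", "fil", "fa", "el", "hu", "mk", "ro",
}

# The aligner row gives names only; these codes are our assumed mapping
# (Chinese, English, Cantonese, French, German, Italian, Japanese,
# Korean, Portuguese, Russian, Spanish).
ALIGNER_LANGUAGES = {"zh", "en", "yue", "fr", "de", "it", "ja", "ko", "pt", "ru", "es"}


def can_transcribe_and_align(lang_code: str) -> bool:
    """True if the language appears in both the ASR and the aligner lists."""
    return lang_code in ASR_LANGUAGES and lang_code in ALIGNER_LANGUAGES
```

Every aligner language also appears in the ASR list, so the ASR-then-align pipeline works for all 11 of them.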
---
Setup
Until Qwen3-ForcedAligner is part of an official Transformers release, install from source:
```bash
pip install git+https://github.com/huggingface/transformers
```
---
Usage
With Qwen3-ASR
Transcribe with the ASR model, then pass the transcript and audio to the forced aligner.
```python
import torch
from transformers import AutoProcessor, AutoModelForMultimodalLM, AutoModelForTokenClassification

# The ASR model produces the transcript; the forced aligner adds timestamps to it.
asr_model_id = "Qwen/Qwen3-ASR-0.6B-hf"
aligner_model_id = "Qwen/Qwen3-ForcedAligner-0.6B-hf"

# Load the ASR processor and model.
asr_processor = AutoProcessor.from_pretrained(asr_model_id)
asr_model = AutoModelForMultimodalLM.from_pretrained(asr_model_id, device_map="auto")

# Load the forced-aligner processor and model in bfloat16.
aligner_processor = AutoProcessor.from_pretrained(aligner_model_id)
aligner_model = AutoModelForTokenClassification.from_pretrained(
    aligner_model_id, dtype=torch.bfloat16, device_map="auto"
)

# The source example is truncated at this point:
# audio_url = "https:/
```
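The source example ends before the aligner is called, so this post does not show the output format. A common use of word-level timestamps is subtitle generation. The following is a minimal sketch that assumes the timestamps have already been extracted into `(unit, start, end)` tuples in seconds. `srt_time` and `to_srt` are hypothetical helpers, not part of Transformers or the Qwen release.

```python
from typing import List, Tuple

# Assumed shape of one aligned unit: (text, start seconds, end seconds).
Stamp = Tuple[str, float, float]


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(stamps: List[Stamp], units_per_cue: int = 7, sep: str = " ") -> str:
    """Group aligned units into numbered SRT cues of up to units_per_cue units.

    Use sep="" for character-level units (e.g. Chinese or Japanese),
    where joining with spaces would be wrong.
    """
    cues = []
    for i in range(0, len(stamps), units_per_cue):
        group = stamps[i:i + units_per_cue]
        start, end = group[0][1], group[-1][2]
        text = sep.join(unit for unit, _, _ in group)
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    # A blank line separates consecutive SRT cues.
    return "\n".join(cues)
```

Grouping by a fixed unit count keeps the sketch short. A real subtitle tool would more likely break cues at pauses or punctuation.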

◆How to get it / Links

See the official site.

Read the official announcement

SOURCE: Alibaba (Qwen) (2026-06-26)
