Skip to content
  • Status
  • Announcements
  • Docs
  • Support
  • About
  • Partners
  • Enterprise
  • Careers
  • Pricing
  • Privacy
  • Terms
  •  
  • © 2025 OpenRouter, Inc

    Qwen: Qwen3 VL 32B Instruct

    qwen/qwen3-vl-32b-instruct

    Created Oct 23, 2025262,144 context
    $0.50/M input tokens$1.50/M output tokens

    Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text comprehension, enabling fine-grained spatial reasoning, document and scene analysis, and long-horizon video understanding.Robust OCR in 32 languages, and enhanced multimodal fusion through Interleaved-MRoPE and DeepStack architectures. Optimized for agentic interaction and visual tool use, Qwen3-VL-32B delivers state-of-the-art performance for complex real-world multimodal tasks.

    Recent activity on Qwen3 VL 32B Instruct

    Total usage per day on OpenRouter

    Prompt
    1.82M
    Completion
    73K

    Prompt tokens measure input size. Reasoning tokens show internal thinking before a response. Completion tokens reflect total output length.