StepFun: Step3

OpenAI • text • vision • function-calling • json-mode

Provider IDstepfun-ai/step3

Step3 is a cutting-edge multimodal reasoning model—built on a Mixture-of-Experts architecture with 321B total parameters and 38B active. It is designed end-to-end to minimize decoding costs while delivering top-tier performance in vision–language reasoning. Through the co-design of Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD), Step3 maintains exceptional efficiency across both flagship and low-end accelerators.

Quick Summary

Best For:

High-volume, low-latency tasks where cost efficiency is paramount

Pricing:

$0.00/1M input tokens, $0.00/1M output tokens

Context Window:

65,536 tokens

Key Differentiator:

Cost-optimized for high-volume usage

Specifications

Context Window

65,536 tokens

Max Output Tokens

65,536 tokens

Streaming

Yes

JSON Mode

Yes

Vision

Yes

Tier

Affordable

Capabilities

text

vision

function-calling

json-mode

StepFun: Step3

Best For:

Pricing:

Context Window:

Key Differentiator:

Social

Legal