Overview
One of the most capable open-source models available, excelling at reasoning, coding, mathematical tasks, and academic research while delivering cost-effective performance.
Key Features
- Mixture-of-Experts (MoE) architecture with 671B total parameters, of which 37B are activated per token (see the routing sketch after this list)
- Multi-head Latent Attention (MLA) and auxiliary-loss-free load balancing for efficiency
- Pre-trained on 14.8T high-quality tokens, with supervised fine-tuning and reinforcement learning post-training
- Superior performance on math, coding, and reasoning benchmarks compared with leading models
- Cost-effective training at a reported total cost of roughly $5.6M, achieved through efficiency techniques such as FP8 mixed-precision training
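To make the parameter figures above concrete, the following is a minimal, illustrative sketch of top-k expert routing in Python. All names, sizes, and the gating scheme are hypothetical simplifications, not DeepSeek's implementation (which additionally uses an auxiliary-loss-free load-balancing strategy); it only shows why a MoE model can hold 671B total parameters while activating roughly 37B per token.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2  # toy sizes, not DeepSeek's

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Router weights and one toy feed-forward matrix per expert.
W_router = rng.normal(size=(d_model, n_experts))
W_expert = rng.normal(size=(n_experts, d_model, d_model))

def moe_layer(x):
    """Send each token to its top-k experts and mix outputs by gate weight.

    Only k of n experts run per token, so most parameters stay inactive
    on any given token -- the core idea behind "37B activated" out of 671B.
    """
    scores = softmax(x @ W_router)                    # (tokens, n_experts)
    chosen = np.argsort(scores, axis=-1)[:, -top_k:]  # top-k expert indices
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        gates = scores[t, chosen[t]]
        gates = gates / gates.sum()                   # renormalize over chosen experts
        for gate, e in zip(gates, chosen[t]):
            out[t] += gate * np.tanh(x[t] @ W_expert[e])
    return out

tokens = rng.normal(size=(4, d_model))
print(moe_layer(tokens).shape)  # -> (4, 16)
```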
Input Capabilities
Text
Technical Specifications
Context Window
64,000 tokens
Max Output
8,000 tokens
Model Information
Provider: DeepSeek
Model Code: deepseek-chat
Category: Chat
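For context, here is a minimal usage sketch assuming DeepSeek exposes an OpenAI-compatible chat completions endpoint; the base URL, key handling, and parameter names below follow the OpenAI Python SDK conventions and should be verified against DeepSeek's current documentation.

```python
from openai import OpenAI

# Assumed OpenAI-compatible endpoint; replace the placeholder key with your own.
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",  # model code from the section above
    messages=[{"role": "user", "content": "Summarize the CAP theorem in two sentences."}],
    max_tokens=8000,        # stay within the 8,000-token output limit
)
print(response.choices[0].message.content)
```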