Yuan 3.0 Ultra: New AI Model for Smarter, Faster Tech

Author: DeepGeek

Research on how load is distributed across experts shows two main phases during training:

  1. Early Stage: Expert loads swing widely because the experts start from random initialization.
  2. Stable Stage: Expert loads settle down, and the ranking of experts by how much data they process stays mostly the same.
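
One rough way to see the difference between the two stages is to compare which experts are busiest at two checkpoints. The sketch below is only an illustration: the token counts are made up, and `rank_overlap` is a hypothetical helper, not a metric taken from the Yuan 3.0 Ultra work.

```python
def rank_order(loads):
    """Return expert indices sorted from highest to lowest token load."""
    return sorted(range(len(loads)), key=lambda i: loads[i], reverse=True)


def rank_overlap(loads_a, loads_b, top_k):
    """Fraction of the top_k busiest experts shared by two load snapshots."""
    top_a = set(rank_order(loads_a)[:top_k])
    top_b = set(rank_order(loads_b)[:top_k])
    return len(top_a & top_b) / top_k


# Made-up token counts per expert at four training checkpoints.
early_1 = [120, 80, 300, 40, 210, 95, 60, 150]
early_2 = [60, 250, 90, 310, 70, 180, 140, 30]
stable_1 = [400, 35, 280, 20, 150, 90, 310, 60]
stable_2 = [390, 40, 300, 25, 140, 95, 295, 70]

# Early checkpoints: the busiest experts are still changing.
print(rank_overlap(early_1, early_2, top_k=4))
# Stable checkpoints: the same experts stay on top.
print(rank_overlap(stable_1, stable_2, top_k=4))
```

With these invented numbers, the early snapshots share none of their four busiest experts, while the stable snapshots share all four.
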
  • Small Load Rule (α): This helps find experts that handle much less work than average.
  • Total Load Rule (β): This finds the experts that do the least work overall.
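
The two rules can be combined into a simple selection loop. This is a minimal sketch under stated assumptions: it treats α as a fraction of the mean load that an expert must fall below, and β as a cap on the share of total load the selected experts may hold together. The article gives neither the exact formulas nor the threshold values, so `prune_candidates`, the thresholds, and the load numbers are all hypothetical.

```python
def prune_candidates(loads, alpha, beta):
    """Return indices of experts that pass both hypothetical load rules.

    alpha: an expert qualifies only if its load is below alpha * mean load.
    beta:  the selected experts together may hold at most beta * total load.
    """
    total = sum(loads)
    mean = total / len(loads)
    # Visit experts from least loaded to most loaded.
    order = sorted(range(len(loads)), key=lambda i: loads[i])

    selected, cumulative = [], 0
    for i in order:
        if loads[i] >= alpha * mean:
            break  # every remaining expert is at least this loaded
        if cumulative + loads[i] > beta * total:
            break  # adding this expert would exceed the total-load budget
        selected.append(i)
        cumulative += loads[i]
    return sorted(selected)


# Made-up token counts per expert.
loads = [400, 35, 280, 20, 150, 90, 310, 60]
print(prune_candidates(loads, alpha=0.4, beta=0.1))
```

Tightening β (for example to 0.05) shrinks the selection even when more experts pass the α test, which is why the two rules are checked together here.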

Faster Hardware and Better Expert Setup

Method               | TFLOPS per GPU
Base Model (1515B)   | 62.14
DeepSeek-V3 Aux Loss | 80.82
Yuan3.0 Ultra (LAEP) | 92.60
  • Model Pruning: Accounted for a 32.4% efficiency gain.
  • Expert Rearrangement: Accounted for a 15.9% efficiency gain.
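
The throughput figures can be turned into relative gains over the base model with a few lines of arithmetic. The sketch below uses only the TFLOPS numbers listed above. The helper name `gain_over_base` is made up for illustration.

```python
# TFLOPS per GPU, copied from the figures above.
tflops = {
    "Base Model (1515B)": 62.14,
    "DeepSeek-V3 Aux Loss": 80.82,
    "Yuan3.0 Ultra (LAEP)": 92.60,
}

BASE = "Base Model (1515B)"


def gain_over_base(method):
    """Percentage throughput gain of a method relative to the base model."""
    return (tflops[method] / tflops[BASE] - 1) * 100


for method in tflops:
    if method != BASE:
        print(f"{method}: +{gain_over_base(method):.1f}%")
```

This works out to roughly a 30.1% gain for the DeepSeek-V3 auxiliary-loss setup and roughly 49.0% for Yuan3.0 Ultra with LAEP. The two listed contributions (32.4% and 15.9%) sum to 48.3%, close to that total. The article does not say how each contribution was measured, so the small gap is left unexplained here.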

Less Overthinking with New RIRM Method

  • rmin=0: Best for quick, direct answers.
  • rmax=3: The highest number of checks allowed.
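
To make the two bounds concrete, here is a minimal sketch of how a score could fall off as a response uses more checks. The linear shape and the function `reflection_score` are assumptions for illustration only. The article states just the two bounds, not the actual RIRM reward formula.

```python
R_MIN = 0  # from the article: suits quick, direct answers
R_MAX = 3  # from the article: highest number of checks allowed


def reflection_score(num_checks, r_min=R_MIN, r_max=R_MAX):
    """Hypothetical score in [0, 1] that shrinks as checks approach r_max."""
    if num_checks <= r_min:
        return 1.0
    if num_checks >= r_max:
        return 0.0
    # Assumed linear falloff between the two bounds.
    return 1.0 - (num_checks - r_min) / (r_max - r_min)


for checks in range(5):
    print(checks, round(reflection_score(checks), 2))
```

Under this assumed shape, a response with no extra checks keeps the full score, and one that reaches or passes the cap gets none.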

How Yuan 3.0 Ultra Does on Business Tests

Test       | What It Tests     | Yuan3.0 Ultra Score | Top Competitor Score
Docmatix   | Multimodal RAG    | 67.4%               | 48.4% (GPT-5.2)
ChatRAG    | Text Search (Avg) | 68.2%               | 53.6% (Kimi K2.5)
MMTab      | Table Questions   | 62.3%               | 66.2% (Kimi K2.5)
SummEval   | Summaries         | 62.8%               | 49.9% (Claude Opus 4.6)
Spider 1.0 | Text-to-SQL       | 83.9%               | 82.7% (Kimi K2.5)
BFCL V3    | Using Tools       | 67.8%               | 78.8% (Gemini 3.1 Pro)

#Yuan30Ultra #AIModel #MultimodalAI #MoEModel #AIEfficiency #LargeLanguageModels #ArtificialIntelligence