Summary

Context

  • Goal: a more efficient and more capable open-source model for coding agents

Solution

  • Use index-share (some form of optimisation in the attention layer) to reduce FLOPs
  • Multi-token prediction (MTP) layers to improve throughput
  • Support a 1M-token context window.
  • Use a new infrastructure (slime) for training and inference.
  • Anti-hacking to improve RL. RL tends to find shortcuts to obtain the reward (reward hacking), so detect when the model does this during training and inference.
  • Only 1% behind Opus 4.8???
    • The comparison uses the 744B model (800 GB+)
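
As a rough sanity check on the "800 GB+" figure for the 744B model, the sketch below estimates the memory needed for the weights alone at a few common precisions. The parameter count comes from the notes above. The list of precisions and the reading of "800 GB+" as a weight footprint are assumptions, since the notes do not say what the figure measures.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 744e9  # 744B parameters, taken from the notes

# Assumed precisions; the notes do not state which one applies.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1, "int4": 0.5}

# Weight footprint per precision, ignoring KV cache, activations and runtime overhead.
FOOTPRINT_GB = {
    name: weight_memory_gb(NUM_PARAMS, nbytes)
    for name, nbytes in BYTES_PER_PARAM.items()
}

for name, gb in FOOTPRINT_GB.items():
    print(f"{name}: {gb:,.0f} GB")
```

Under these assumptions, one byte per parameter gives about 744 GB for the weights, which would be in line with "800 GB+" once some overhead is added. Two bytes per parameter would give roughly double that, so the figure probably does not describe a 16-bit checkpoint.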