GOOG: Google Releases New Technique to...

Google has released Multi-Token Prediction (MTP) drafters for its Gemma 4 family of open AI models. The drafters can speed up inference by up to three times without degrading output quality. They target a key bottleneck in large language models, where generation speed is often limited by memory bandwidth rather than raw computing power.

The enhancement uses a form of speculative decoding, in which a smaller, faster "drafter" model proposes several future tokens of text at once. The main, more powerful Gemma 4 model then verifies the proposed tokens in a single parallel pass instead of generating them one by one. The result is significantly faster text generation for applications running on local computers, on mobile devices, and in the cloud. The drafters are available under an open-source license.
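
To illustrate the draft-then-verify idea described above, here is a minimal Python sketch of greedy speculative decoding. It is not Google's implementation or the Gemma 4 API: the function names (`speculative_step`, `generate`, `greedy`) and the two toy "models" are hypothetical stand-ins, and the verification loop calls the target once per position for readability, where a real system would score all drafted positions in one batched forward pass.

```python
from typing import Callable, List

Token = int
# Hypothetical interface: a "model" is a function returning its greedy next token.
NextToken = Callable[[List[Token]], Token]


def speculative_step(target: NextToken, drafter: NextToken,
                     context: List[Token], k: int) -> List[Token]:
    """Draft k tokens with the small model, then keep the prefix the target agrees with."""
    # 1. The drafter proposes k tokens one after another (cheap calls).
    draft: List[Token] = []
    for _ in range(k):
        draft.append(drafter(context + draft))

    # 2. The target checks every drafted position. Shown as a loop here;
    #    in practice these checks come out of a single parallel pass.
    accepted: List[Token] = []
    for i in range(k):
        expected = target(context + draft[:i])
        if draft[i] != expected:
            # First disagreement: take the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(draft[i])

    # 3. Every draft token was accepted, so the target adds one more token.
    accepted.append(target(context + draft))
    return accepted


def generate(target: NextToken, drafter: NextToken,
             prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Repeat speculative steps until max_new tokens have been produced."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(target, drafter, out, k))
    return out[:len(prompt) + max_new]


def greedy(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Plain one-token-at-a-time decoding, used as the reference."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


# Toy stand-ins (assumptions for illustration only): the target counts
# upward modulo 10, and the drafter makes a mistake whenever it sees a 4.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_drafter(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    fast = generate(toy_target, toy_drafter, [2], max_new=12)
    slow = greedy(toy_target, [2], max_new=12)
    print(fast == slow)
```

Because the target model has the final say at every position, the sketch produces the same tokens as ordinary greedy decoding, which mirrors the "without degrading quality" claim. Any speedup depends on how often the drafter's guesses are accepted.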
