Cloudflare launched new infrastructure to run large AI language models across its global network. The system separates input processing (prefill) from output generation (decode) and runs each stage on hardware configured for that workload.
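
To make the prefill/decode split concrete, here is a minimal toy sketch in Python. It is not Cloudflare's implementation: the class names (`PrefillWorker`, `DecodeWorker`, `KVCache`), the `serve` function, and the stand-in "model" arithmetic are all assumptions made for illustration. It only shows the general shape of the idea, in which one stage processes the whole prompt and hands its state to a separate stage that generates tokens one at a time.

```python
from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Hypothetical stand-in for the attention state that prefill produces."""
    tokens: list[int] = field(default_factory=list)


class PrefillWorker:
    """Hypothetical worker: processes the entire prompt in a single pass."""

    def run(self, prompt_tokens: list[int]) -> KVCache:
        # A real system would compute attention keys/values here;
        # this sketch just records the prompt tokens.
        return KVCache(tokens=list(prompt_tokens))


class DecodeWorker:
    """Hypothetical worker: generates output one token at a time from the cache."""

    def run(self, cache: KVCache, max_new_tokens: int) -> list[int]:
        output = []
        for _ in range(max_new_tokens):
            # Toy "model": the next token is the sum of the context modulo 100.
            next_token = sum(cache.tokens) % 100
            cache.tokens.append(next_token)
            output.append(next_token)
        return output


def serve(prompt_tokens: list[int], max_new_tokens: int) -> list[int]:
    """Run prefill on one worker, then hand the cache to a separate decode worker."""
    cache = PrefillWorker().run(prompt_tokens)
    return DecodeWorker().run(cache, max_new_tokens)
```

Because the two stages are separate objects joined only by the cache they exchange, each could in principle be placed on a different machine and scaled independently, which is the point of the separation.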

The architecture includes a custom inference engine, Infire, that manages GPUs. A second system, Unweight, compresses AI model weights by 15-22% without sacrificing accuracy.
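
As a rough sense of scale for the 15-22% figure, the short Python sketch below applies that range to a hypothetical model. The 140 GB starting size is an assumption chosen for illustration (about what 70 billion parameters occupy at 16 bits each), not a number from Cloudflare, and the function name `compressed_size_gb` is likewise made up for this example.

```python
def compressed_size_gb(original_gb: float, reduction: float) -> float:
    """Return the weight size left after removing `reduction` (a fraction from 0 to 1)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return original_gb * (1.0 - reduction)


# Hypothetical model size: 70e9 parameters * 2 bytes each = 140 GB of weights.
ORIGINAL_GB = 140.0

smallest_saving = compressed_size_gb(ORIGINAL_GB, 0.15)  # low end of the reported range
largest_saving = compressed_size_gb(ORIGINAL_GB, 0.22)   # high end of the reported range

print(f"After 15% compression: {smallest_saving:.1f} GB")
print(f"After 22% compression: {largest_saving:.1f} GB")
```

Under these assumptions the weights would shrink from 140 GB to somewhere between about 109 GB and 119 GB, freeing roughly 21 to 31 GB of GPU memory.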

Together, these changes reduce memory usage, which makes models faster and more efficient to run on the platform.