Open-Source Models Make Huge Leaps

Welcome to this week's edition of Overclocked!

Open-source AI models are crushing benchmarks and challenging the closed-source world, while Mark Zuckerberg says the future isn’t one giant model—it’s billions of personal superintelligences. Let’s dive in ⬇️

In today’s newsletter ↓
🆚 Open-source challengers aim past GPT-4
🧠 Zuckerberg plots a personal AI for everyone
💼 Big Tech’s AI–core business blur sparks antitrust talk
🏀 AI technologists negotiate $100 million contracts
🔬 Weekly Challenge: Stress-test models on OpenRouter

🆚 The Open-Source Battle Heats Up

Open source has never looked so formidable. In the span of two weeks, three heavyweight contenders released weights that edge ever closer to the closed-source titans.

🔥 GLM 4.5 (Zhipu AI)

Beijing-based Zhipu released GLM 4.5-9B weights under an Apache-2.0 license. Benchmarks in English and Chinese show 77 percent on MMLU—roughly GPT-3.5 territory—while context windows stretch to 256K tokens, enough to swallow a full textbook. The catch: generation still lags in reasoning depth, and the largest 130B-parameter checkpoint remains research-only.

🌅 Horizon Alpha, OpenAI’s Ghost?

A surprise repo called Horizon Alpha-36B appeared on Hugging Face with a model card referencing “OpenAI alignment weights.” Researchers quickly noted a GPT-style tokenizer and strong code synthesis scores. OpenAI hasn’t claimed the drop, but speculation is rampant that the company quietly seeded a community test bed before GPT-5. Until provenance is confirmed, major platforms are sandboxing Horizon Alpha to watch for license or safety red flags.

🚀 Kimi K2 (Moonshot AI)

Beijing’s Moonshot pushed Kimi K2-MoE, a mixture-of-experts model boasting a 1-million-token context and 88 percent on GSM8K math—within two points of Gemini 2.5 Pro. Early adopters love Kimi’s long-document Q&A but report slower first-token latency outside mainland China.

🌏 Beyond the Benchmarks

  • Sovereignty: Chinese labs pitch open models as a hedge against U.S. chip sanctions.

  • Cost curves: Self-hosting GLM 4.5 on eight H100s costs ≈$1.20 per million tokens, about half the price of GPT-4o via API (see the back-of-envelope sketch after this list).

  • Innovation loops: Open weights let startups fine-tune for niche domains (legal Spanish, quantum chemistry) without waiting for closed vendors.
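
For context, here’s a minimal back-of-envelope sketch of how a figure like that pencils out. The GPU rental rate and throughput below are illustrative assumptions, not measured numbers; real costs swing widely with provider pricing and batch size.

```python
# Back-of-envelope cost model for self-hosted inference.
# All inputs are illustrative assumptions, not benchmarks.
GPU_HOURLY_RATE = 2.50            # assumed $/hour to rent one H100
NUM_GPUS = 8                      # the 8x H100 node mentioned above
THROUGHPUT_TOK_PER_SEC = 4_600    # assumed aggregate batched decode throughput

hourly_cost = GPU_HOURLY_RATE * NUM_GPUS            # $/hour for the whole node
tokens_per_hour = THROUGHPUT_TOK_PER_SEC * 3600     # tokens generated per hour
cost_per_million = hourly_cost / tokens_per_hour * 1_000_000

print(f"~${cost_per_million:.2f} per million tokens")  # ~$1.21 with these assumptions
```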

Yet limitations remain: scattered documentation, inconsistent safety filters, and no guarantee of long-term support. For developers, the decision is no longer open vs. closed but “Which blend of cost, capability, and governance fits my stack?” The arms race just turned cooperative—and chaotic.

🧠 Zuckerberg Bets Big on Personal Superintelligence

In a manifesto titled “Personal Superintelligence,” Mark Zuckerberg laid out Meta’s most ambitious vision yet: an AI that “knows us deeply, understands our goals, and helps us achieve them.” Unlike rivals chasing a monolithic AGI, Meta says the future belongs to billions of individualized models running on-device and in a privacy-aware cloud.

🔑 Key Pillars

Here’s why Meta is uniquely positioned to gain a foothold in the ‘personal superintelligence’ race:

  • Context-rich hardware: Meta’s next-gen smart-glasses will use cameras and multimodal sensors to feed real-time context into a local Llama variant.

  • Edge inference: A pared-down “ego-model” will run on-device, with heavy reasoning off-loaded to data-center LLMs tuned to a user’s preferences.

  • Open ecosystem: While acknowledging safety trade-offs, Meta reiterates its commitment to open-sourcing most research checkpoints—arguing “a free society requires visible code.”

🏗 Infrastructure Check

On infrastructure, Meta is among the best-positioned tech companies on compute, talent, and sheer spend. From buying a 49% stake in Scale AI to luring AI researchers and scientists away from Apple, OpenAI, and other top labs, Zuckerberg is digging in deep: into his own pockets and into the ground for new data centers.

🤔 What Could Possibly Go Wrong?

  • Privacy drift: Glasses that “see what we see” intensify fears of always-on surveillance.

  • Safety scaling: Building guardrails for billions of unique models is uncharted territory.

  • Economic shakeup: If everyone wields a negotiation-savvy AI, does anyone have leverage?

Still, the upside is hard to ignore. Imagine an assistant that drafts a pitch in your voice, books travel around your calendar quirks, and nudges you off doom-scrolling—without ever phoning home to a centralized brain. Meta believes getting there first could reshape the consumer-tech pecking order for the next decade.

The Weekly Scoop 🍦

💡 Weekly Challenge: Stress-Test Models on OpenRouter Like a Pro

Goal: Figure out which open-source (or semi-open) model actually fits your daily workflow—without burning hours of guess-and-check.

Here’s what to do:

  • 🔎 Pick Your Fighters – open OpenRouter.ai and bookmark three engines with very different DNA (e.g., Llama-3-70B, Mixtral-8x22B, and Kimi K2-MoE).

  • ✍️ Craft a Nuanced Prompt – start with a tough but realistic task:
    “Draft a polite refund email that cites the EU Consumer Rights Directive 2011/83/EU and offers a store credit alternative.”

  • ⏱️ Run Them in Parallel – note first-token latency, full-response time, word count, and whether the model refuses, hallucinates law, or nails the citation (a runnable sketch follows this list).

  • 🔄 Switch Contexts – recycle the same trio of models on:
    • legal Spanish summary
    • a 12-line haiku epic
    • Python code to sort a 2 GB CSV

  • 📊 Scorecard It – rate each run 1-5 on speed, accuracy, tone, and cost per 1K tokens.
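
If you’d rather script the comparison than eyeball it in the playground, here’s a minimal sketch that fires the same prompt at several models through OpenRouter’s OpenAI-compatible chat-completions endpoint and times each response. It assumes an OPENROUTER_API_KEY environment variable, and the model slugs are illustrative placeholders; check OpenRouter’s catalog for the exact IDs.

```python
# Weekly Challenge sketch: send one prompt to several OpenRouter models in
# parallel and compare wall-clock latency and output length.
import os
import time
import requests
from concurrent.futures import ThreadPoolExecutor

API_URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}

MODELS = [                               # three engines with very different DNA;
    "meta-llama/llama-3-70b-instruct",   # illustrative slugs, verify before running
    "mistralai/mixtral-8x22b-instruct",
    "moonshotai/kimi-k2",
]

PROMPT = ("Draft a polite refund email that cites the EU Consumer Rights "
          "Directive 2011/83/EU and offers a store credit alternative.")

def run(model: str) -> dict:
    """Send one chat-completion request and time the full response."""
    start = time.perf_counter()
    resp = requests.post(
        API_URL,
        headers=HEADERS,
        json={"model": model, "messages": [{"role": "user", "content": PROMPT}]},
        timeout=120,
    )
    resp.raise_for_status()
    text = resp.json()["choices"][0]["message"]["content"]
    return {
        "model": model,
        "seconds": round(time.perf_counter() - start, 2),
        "words": len(text.split()),
        "preview": text[:80],
    }

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
        for row in pool.map(run, MODELS):
            print(row)
```

This measures full-response time and word count; to capture first-token latency specifically, switch the request to streaming and timestamp the first chunk, then log each row into your scorecard alongside the per-1K-token prices OpenRouter lists for every model.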

That’s it for the latest in AI news this week! Open-source fireworks and Meta’s personal-AI gamble prove the frontier is as much about philosophy as horsepower. Which side are you on? Hit reply and share.

Zoe from Overclocked