Thursday, January 15, 2026

Radar Trends to Watch: December 2025 – O’Reilly

November ended. Thanksgiving (in the US), turkey, and a train of model announcements. The announcements were exciting: Google’s Gemini 3 puts it in the lead among large language models, at least for the time being. Nano Banana Pro is a spectacularly good text-to-image model. OpenAI has released its heavy hitters, GPT-5.1-Codex-Max and GPT-5.1 Pro. And the Allen Institute released its latest open source model, Olmo 3, the leading open source model from the US.

Since Trends avoids deal-making (should we?), we’ve also avoided the angst around an AI bubble and its implosion. Right now, it’s safe to say that the bubble is formed of money that hasn’t yet been invested, let alone spent. If it’s a bubble, it’s in the future. Do promises and wishes make a bubble? Does a bubble made of promises and wishes pop with a bang or a pffft?

AI

  • Now that Google and OpenAI have laid down their cards, Anthropic has released its latest heavyweight model: Opus 4.5. They’ve also dropped the price significantly.
  • The Allen Institute has released its latest open source model, Olmo 3. The institute has opened up the whole development process to allow other teams to understand its work.
  • Not to be outdone, Google has released Nano Banana Pro (aka Gemini 3 Pro Image), its state-of-the-art image generation model. Nano Banana’s biggest feature is the ability to edit images to change the appearance of items without redrawing them from scratch. And according to Simon Willison, it watermarks the parts of an image it generates with SynthID.
  • OpenAI has released two more components of GPT-5.1, GPT-5.1-Codex-Max (API) and GPT-5.1 Pro (ChatGPT). This release brings the company’s most powerful models for generative work into view.
  • A group of quantum physicists claim to have reduced the size of the DeepSeek model by half, and to have removed Chinese censorship. The model can now tell you what happened in Tiananmen Square, explain what Pooh looks like, and answer other forbidden questions.
  • The release train for Gemini 3 has begun, and the commentariat quickly crowned it king of the LLMs. It includes the ability to spin up a web interface so users can give it more information about their questions, and to generate diagrams along with text output.
  • As part of the Gemini 3 release, Google has also announced a new agentic IDE called Antigravity.
  • Google has released a new weather forecasting model, WeatherNext 2, that can forecast with resolutions up to 1 hour. The data is available through Earth Engine and BigQuery, for those who would like to do their own forecasting. There’s also an early access program on Vertex AI.
  • Grok 4.1 has been released, with reports that it is currently the best model at generative prose, including creative writing. Be that as it may, we don’t see why anyone would use an AI that has been trained to reflect Elon Musk’s thoughts and values. If AI has taught us one thing, it’s that we need to think for ourselves.
  • AI demands the creation of new data centers and new energy sources. States want to ensure that those power plants are built, and built in ways that don’t pass costs on to consumers.
  • Grokipedia uses questionable sources. Is anyone surprised? How else would you train an AI on the latest conspiracy theories?
  • AMD GPUs are competitive, but they’re hampered because there are few libraries for low-level operations. To solve this problem, Chris Ré and others have announced HipKittens, a library of low-level programming primitives for AMD GPUs.
  • OpenAI has released GPT-5.1. The two new models are Instant, which is tuned to be more conversational and “human,” and Thinking, a reasoning model that now adapts the time it takes to “think” to the difficulty of the questions.
  • Large language models, including GPT-5 and the Chinese models, show bias against users who use a German dialect rather than standard German. The bias appeared to be greater as the model size increased. These results also apply to languages like English.
  • Ethan Mollick on evaluating (ultimately, interviewing) your AI models is a must-read.
  • Yann LeCun is leaving Facebook to launch a new startup that will develop his ideas about building AI.
  • Harbor is a new tool that simplifies benchmarking frameworks and models. It’s from the developers of the Terminal-Bench benchmark. And it brings us a step closer to a world where people build their own specialized AI rather than rely on large providers.
  • Music rights holders are beginning to make deals with Udio (and presumably other companies) that train their models on existing music. Unfortunately, this doesn’t solve the bigger problem: Music is a “collectively produced shared cultural good, sustained by human labor. Copyright isn’t suited to protecting this kind of shared value,” as professors Oliver Bown and Kathy Bowrey have argued.
  • Moonshot AI has finally released Kimi K2 Thinking, the first open weights model to have benchmark results competitive with, or exceeding, the best closed weights models. It’s designed to be used as an agent, calling external tools as needed to solve problems.
  • Tongyi DeepResearch is a new fully open source agent for doing research. Its results are comparable to OpenAI deep research, Claude Sonnet 4, and similar models. Tongyi is part of Alibaba; it’s yet another important model to come out of China.
  • Data centers in space? It’s an interesting and challenging idea. Cooling is a much bigger problem than you’d expect. They would require massive arrays of solar cells for power. But some people think it might happen.
  • MiniMax M2 is a new open weights model that focuses on building agents. It has performance similar to Claude Sonnet but at a much lower price point. It also embeds its thought processes between <think> and </think> tags, which is an important step toward interpretability.
  • DeepSeek has released a new model for OCR with some very interesting properties: It has a new process for storing and retrieving memories that also makes the model significantly more efficient.
  • Agent Lightning provides a code-free way to train agents using reinforcement learning.
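
The MiniMax M2 item above mentions reasoning that is embedded in tags inside the model’s output. A minimal sketch of how a client might separate that reasoning from the final answer, assuming the tags are literally `<think>` and `</think>` and that the response arrives as a plain string. The function name `split_thinking` is hypothetical and not part of any MiniMax API.

```python
import re

# Assumption: reasoning is wrapped in literal <think>...</think> tags.
# DOTALL lets a reasoning segment span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(response: str) -> tuple[list[str], str]:
    """Return (reasoning segments, response text with those segments removed)."""
    thoughts = [segment.strip() for segment in THINK_RE.findall(response)]
    answer = THINK_RE.sub("", response).strip()
    return thoughts, answer


if __name__ == "__main__":
    sample = "<think>User wants a sum. 2 + 2 = 4.</think>The answer is 4."
    thoughts, answer = split_thinking(sample)
    print(thoughts)
    print(answer)
```

Keeping the reasoning segments rather than discarding them is what makes them useful for inspection: they can be logged next to the answer they produced.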

Programming

  • The Zig programming language has published a book. Online, of course.
  • Google is weakening its controversial new rules about developer verification. The company plans to create a separate class for applications with limited distribution, and develop a flow that will allow the installation of unverified apps.
  • Google’s LiteRT is a library for running AI models in browsers and small devices. LiteRT supports Android, iOS, embedded Linux, and microcontrollers. Supported languages include Java, Kotlin, Swift, Embedded C, and C++.
  • Does AI-assisted coding mean the end of new languages? Simon Willison thinks that LLMs can encourage the development of new programming languages. Design your language and ship it with a Claude Skills-style document; that should be enough for an LLM to learn how to use it.
  • Deepnote, a successor to the Jupyter Notebook, is a next-generation notebook for data analytics that’s built for teams. There’s now a shared workspace; different blocks can use different languages; and AI integration is on the road map. It’s now open source.
  • The idea of assigning colors (red, blue) to tools may be helpful in limiting the risk of prompt injection when building agents. What tools can return something damaging? This sounds like a step towards the application of the “least privilege” principle to AI design.
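
The tool-coloring idea in the last item can be sketched in a few lines. This is only an illustration under stated assumptions: the item does not define the colors, so the sketch assumes “red” marks tools that can return untrusted content and “blue” marks tools that can take consequential actions. Every name here (`Color`, `Tool`, `AgentSession`) is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Color(Enum):
    RED = "red"    # assumption: tool may return untrusted, attacker-controlled content
    BLUE = "blue"  # assumption: tool can take a consequential or sensitive action


@dataclass(frozen=True)
class Tool:
    name: str
    color: Color


@dataclass
class AgentSession:
    """Tracks whether untrusted content has entered the agent's context."""
    tainted: bool = False
    log: list[str] = field(default_factory=list)

    def call(self, tool: Tool) -> bool:
        """Return True if the call is allowed, False if the policy blocks it."""
        # Once a red tool has run, refuse blue tools for the rest of the session.
        if tool.color is Color.BLUE and self.tainted:
            self.log.append(f"blocked {tool.name}")
            return False
        if tool.color is Color.RED:
            self.tainted = True
        self.log.append(f"allowed {tool.name}")
        return True


if __name__ == "__main__":
    fetch = Tool("fetch_web_page", Color.RED)
    pay = Tool("send_payment", Color.BLUE)
    session = AgentSession()
    print(session.call(pay))    # nothing untrusted has been seen yet
    print(session.call(fetch))  # untrusted content enters the context
    print(session.call(pay))    # the policy now refuses the sensitive action
    print(session.log)
```

Refusing sensitive calls once untrusted content is in context is one way to apply least privilege. A real agent would probably escalate to a human for approval rather than refuse outright.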

Safety

  • We’re making the same mistake with AI security as we made with cloud security (and security in general): treating security as an afterthought.
  • Anthropic claims to have disrupted a Chinese cyberespionage group that was using Claude to generate attacks against other systems. Anthropic claims that the attack was 90% automated, though that claim is controversial.
  • Don’t become a victim. Data collected for online age verification makes your site a target for attackers. That data is valuable, and they know it.
  • A research collaboration uses data poisoning and AI to disrupt deepfake images. Users use Silverer to process their images before posting. The tool makes invisible changes to the original image that confuse AIs creating new images, leading to unusable distortions.
  • Is it a surprise that AI is being used to generate fake receipts and expense reports? After all, it’s used to fake just about everything else. It was inevitable that business applications of AI fakery would appear.
  • HydraPWK2 is a Linux distribution designed for penetration testing. It’s based on Debian and is supposedly easier to use than Kali Linux.
  • How secure is your trusted execution environment (TEE)? All of the major hardware vendors are vulnerable to a number of physical attacks against “secure enclaves.” And their terms of service often exclude physical attacks.
  • Atroposia is a new malware-as-a-service package that includes a local vulnerability scanner. Once an attacker has broken into a site, they can find other ways to remain there.
  • A new kind of phishing attack (CoPhishing) uses Microsoft Copilot Studio agents to steal credentials by abusing the Sign In topic. Microsoft has promised an update that will defend against this attack.

Operations

  • Here’s how to install Open Notebook, an open source equivalent to NotebookLM, to run on your own hardware. It uses Docker and Ollama to run the notebook and the model locally, so data never leaves your system.
  • Open source isn’t “free as in beer.” Nor is it “free as in freedom.” It’s “free as in puppies.” For better or for worse, that just about says it.
  • Need a framework for building proxies? Cloudflare’s next generation Oxy framework might be what you need. (Whatever you think of their recent misadventure.)
  • MIT Media Lab’s Project NANDA intends to build infrastructure for a decentralized network of AI agents. They describe it as a global decentralized registry (not unlike DNS) that can be used to discover and authenticate agents using MCP and A2A. Isn’t this what we wanted from the internet in the first place?

Web

Things
