#25 AI Field Notes
Plus: OpenAI o3's AGI Vibe, Google Gemini 2, AI Luminary Warns of Peak Data and Unpredictable AGI, Alignment Faking in AI, Bubble Talk, and Holiday Greetings
Welcome back! In this issue, I cover:
Notable Developments:
OpenAI's 12-Day Shipmas Ends with a Bang
Google’s Big December Launches
AI Luminary Warns of Peak Data and Unpredictable AGI
Directors’ Corner: 2024 AI Field Notes
AI Safety: Alignment Faking
Bubble Talk
One More Thing: Happy Holidays
Enjoy.
Notable Developments
1. OpenAI's 12-Day Shipmas Ends with a Bang
OpenAI launched a "Shipmas" event this December, unveiling new products over 12 days. The 1-800-CHATGPT phone number, which lets you hold a voice conversation with ChatGPT, garnered significant social media attention despite its limited functionality. Sora demonstrated groundbreaking capabilities in generating video from text or images, advancing AI modeling of the physical world with implications ranging from 3D content creation to robotics.
The most significant launch came on the final day with the o3 Reasoning Model Preview. While OpenAI has not officially claimed it as Artificial General Intelligence (AGI), it is widely regarded as a substantial leap forward, laying the groundwork for AI applications in scientific research and systems engineering that were previously considered out of reach.
o3 is the first AI model to surpass human performance (scoring 87.5% versus humans' 85%) on the semi-private evaluation within the ARC-AGI benchmark, breaking a five-year record. In competitive coding, it achieved a rating of 2727 on Codeforces, a level reached by only a few hundred top human programmers worldwide. The model is extremely compute-intensive and remains in a preview phase.
Meanwhile, OpenAI is allowing roughly 400 of its current and former employees to cash out up to $10 million each in a private stock sale to SoftBank.
2. Google’s Big December Launches
While OpenAI may have taken the award for most significant advancement, Google has launched a series of AI productivity and product-development tools, powered by its Gemini 2.0 model, with a dramatically improved user experience. They are immediately useful and fairly affordable; the only thing lacking is better marketing, as few people know about them. I particularly love:
Deep Research acts as a personal research assistant, streamlining time-intensive tasks like competitor analysis or market research. It creates detailed, actionable reports by synthesizing information from across the web. I love my updated research flow of Perplexity + Gemini Deep Research + Google NotebookLM.
Flash Thinking, part of the Gemini 2.0 suite, excels at solving intricate problems by employing structured reasoning and transparency. It considers multiple angles before arriving at conclusions, making it ideal for tackling complex scenarios like financial modeling or operational planning. Its ability to "think out loud" ensures clarity and builds trust in its outputs.
Other innovations, such as Project Astra and Project Mariner, showcase how Gemini's multimodal capabilities can transform real-world applications. Astra integrates tools like Search and Maps to assist with everyday tasks, while Mariner enables autonomous web navigation, saving time on repetitive online processes.
3. AI Luminary Warns of Peak Data and Unpredictable AGI
At NeurIPS 2024, OpenAI co-founder and former Chief Scientist Ilya Sutskever delivered two critical warnings about AI's future that have profound implications for businesses.
First, he declared that the industry has reached "peak data," with the supply of high-quality internet data for training AI models nearing exhaustion. Describing data as the "fossil fuel of AI," Sutskever emphasized that the era of scaling AI through massive datasets is ending. This looming scarcity will force companies to rethink their AI strategies, potentially relying on alternatives like synthetic data, real-time learning systems, or more efficient training methods.
Second, Sutskever cautioned that as AI progresses toward reasoning and decision-making capabilities—hallmarks of artificial general intelligence (AGI)—its behavior will become increasingly unpredictable. Reasoning systems evaluate millions of possibilities step-by-step, often producing outcomes that even experts cannot foresee. While this unpredictability could drive innovation and solve complex problems, it also introduces significant risks around control and safety.
For business leaders, these insights signal a turning point. Data scarcity and the unpredictability of advanced AI systems will reshape how organizations develop and deploy AI. Preparing for this shift will require investments in governance, innovation, and adaptability to navigate a more complex AI landscape. These views are particularly timely considering the announcement of OpenAI o3 only days later.
Directors’ Corner: 2024 AI Field Notes
In "2024 AI Field Notes series" first published on LinkedIn, I share observations and thoughts from my work and conversations with business leaders and board directors.
Through conversations with dozens of organizations about their AI initiatives, I've observed a clear pattern: The difference between success and failure rarely comes down to technology or budget.
Recently, a technology leader shared with me how they scrapped their flashy AI roadmap to focus on mapping their worst operational bottlenecks. Six months later, they've achieved more with basic automation than their competitors have with cutting-edge models.
Culture trumps code every time. Organizations that thrive with AI have built what I call "learning loops": their teams experiment, fail fast, and share insights openly. Meanwhile, I've seen perfectly engineered AI implementations and various expensive "AI copilots" wither in organizations with rigid processes or no communication about the 'why' behind the initiatives.
The highest performers consistently spend more time on change management than model training. In my research and conversations, successful teams invest more in workflow redesign and staff development than in the technology itself.
The gap isn't between those who have AI and those who don't - it's between those who understand transformation is about people, not just technology.
What separates AI projects that get funded from those that don't? It's usually not about the technology or cost.
Through my work with CFOs evaluating AI proposals, I've noticed a pattern: successful proposals don't just pitch isolated projects. They build ROI models that connect the dots between project returns and company-wide impact.
Here's the framework that is most likely to win approval (a worked example of the arithmetic follows the lists below):
First, they map the complete cost story:
👷 Project Implementation: Software licensing, security reviews, initial data preparation, integration with existing systems
👷 People & Process: Training, workflow redesign time, documentation creation, governance setup
👷 Ongoing Operations: Regular retraining needs, support desk capacity, system monitoring, data quality maintenance, compliance and audit
Then, they track value streams at two levels:
🎯 Project-Level Returns:
1) Time savings: Task completion speed × volume × fully loaded cost
2) Quality gains: Error reduction × average resolution cost
3) Capacity created: Hours freed × reallocation value
🎯 Business-Level Impact:
1) Revenue acceleration: Faster market response × sales conversion
2) Cost avoidance: Automated workflows × labor cost saved
3) Risk reduction: Fewer errors × compliance incident costs
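To make the arithmetic concrete, here is a minimal sketch in Python of how the two levels might be computed. Every figure and variable name below is a hypothetical placeholder of my own, not data from any real proposal.

```python
# Minimal sketch of the two-level ROI arithmetic described above.
# Every figure below is a hypothetical placeholder, not data from any real proposal.

# --- Project-level returns ---
time_savings     = 0.5 * 12_000 * 45    # hours saved per task × annual task volume × fully loaded hourly cost
quality_gains    = 300 * 250            # errors avoided per year × average resolution cost
capacity_created = 4_000 * 60           # hours freed per year × value per reallocated hour
project_level = time_savings + quality_gains + capacity_created

# --- Business-level impact ---
revenue_acceleration = 200 * 1_500      # extra conversions from faster market response × value per conversion
cost_avoidance       = 8 * 85_000       # automated workflows × annual labor cost saved per workflow
risk_reduction       = 12 * 20_000      # compliance incidents avoided × cost per incident
business_level = revenue_acceleration + cost_avoidance + risk_reduction

# --- Complete cost story: implementation + people & process + ongoing operations ---
total_costs = 400_000

roi = (project_level + business_level - total_costs) / total_costs
print(f"Project-level returns: ${project_level:,.0f}")
print(f"Business-level impact: ${business_level:,.0f}")
print(f"ROI on total cost:     {roi:.1%}")
```

In practice, each input would come from the cost mapping and value streams above, and keeping the two levels separate makes it easier to compare initiatives across a portfolio.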
This approach helps CFOs sequence investments for maximum P&L and balance sheet impact. High-performing finance teams optimize based on compound returns. In my conversations, they evaluate each initiative on both individual metrics and portfolio multiplication effects. They avoid the trap of funding duplicative efforts while ensuring every investment strengthens their capability foundation.
Are your responsible AI practices delivering real results? Many organizations struggle to move beyond theoretical frameworks and compliance checklists. Through industry conversations and market observation, it's clear to me that responsible AI implementation varies significantly by context. What works for a global enterprise rarely translates directly to a local business—yet some fundamental strategies are emerging.
The most effective approaches start simple: document actual usage, assess real impacts, and communicate clearly with stakeholders. One company maintains a straightforward "AI use log" visible to all employees, tracking everything from automated quality checks to assisted design tools; this transparency naturally surfaces potential issues before they become problems.
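As a purely hypothetical illustration (the fields below are my own assumptions, not that company's actual format), even a lightweight log entry like the following can make usage visible and reviewable without heavy tooling:

```python
# Hypothetical sketch of an entry in a company-wide "AI use log".
# Field names are assumptions for illustration; adapt them to your own context.
from dataclasses import dataclass
from datetime import date

@dataclass
class AIUseLogEntry:
    tool: str            # e.g. "automated quality checks", "assisted design tool"
    owner: str           # team or person accountable for this use case
    purpose: str         # what the tool is used for, in plain language
    data_touched: str    # categories of data the tool can see
    human_review: bool   # whether a person reviews outputs before they are used
    last_reviewed: date  # when this entry was last checked for accuracy

ai_use_log = [
    AIUseLogEntry(
        tool="Assisted design tool",
        owner="Product design",
        purpose="Generate first-draft mockups for internal review",
        data_touched="Internal style guides only; no customer data",
        human_review=True,
        last_reviewed=date(2024, 12, 1),
    ),
]
```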
For growing organizations, balancing speed with responsibility presents unique challenges. While small enough to maintain direct oversight, they're scaling too quickly for manual tracking alone. Many find success integrating AI governance into existing processes—embedding checkpoints in sprint planning, adding AI impact criteria to project reviews, and automating compliance checks where possible. These steps maintain accountability without sacrificing the agility needed for growth.
Larger organizations succeed by emphasizing enablement over restriction in their AI oversight. Rather than creating barriers, they establish guidelines that help teams move quickly while staying aligned with ethical principles. Monthly reviews focus on tangible outcomes rather than theoretical risks.
AI Safety: Alignment Faking
The recent surge in powerful AI product releases has made AI safety an increasingly urgent priority. While trust in AI systems takes significant time and evidence to build, it can be shattered in an instant.
In Shakespeare's Othello, the character Iago presents himself as Othello's devoted friend while secretly working to destroy him. This deceptive behavior mirrors what researchers call 'alignment faking' in AI systems.
A recent collaborative study between Anthropic's Alignment Science team and Redwood Research has revealed an alarming finding: the first documented evidence of a large language model exhibiting alignment faking behavior, despite not being explicitly trained or prompted to do so.
Bubble Talk
Databricks, the AI data infrastructure leader featured a few times in past issues, raised a $10bn Series J round at a $62bn valuation. In a recent interview, Databricks CEO Ali Ghodsi said they had initially considered raising $3bn to $4bn and received $19bn of interest, some from investors they had not yet talked to. While Databricks may have strong growth momentum to support its valuation, Ghodsi points to signs of an AI bubble among unproven businesses around Silicon Valley.
“It’s peak AI bubble. It doesn’t take a genius to know that a company with five people which has no product, no innovation, no IP — just recent grads — [is not] worth hundreds of millions, sometimes billions,” said Ghodsi.
An oddity is the questionable marketing appearing on San Francisco billboards lately. I particularly disliked a series from one startup, featuring slogans like "Humans are So 2023," "Hire Artisans, Not Humans," "Artisans Won't Complain About Work-Life Balance," and "Artisans Won't Come Into Work Hungover." This type of messaging not only misrepresents what AI agents are actually designed to do, but also needlessly feeds into AI anxiety. I hope that painting dystopian scenarios isn't becoming the new shortcut to startup visibility.
On the fundraising front, Perplexity’s valuation jumped 9x within a year with its latest $500m raise at a $9bn valuation. [I shared a cautious take in a previous issue of this newsletter.]
One More Thing: Happy Holidays!
Thank you for an amazing 2024! I wish you and your loved ones a healthy and happy holiday season.
We are heading to the Bruno Mars concert in Las Vegas. Can’t wait!