Pizza, Drinks, and World Models @ ICLR
There are two things you can count on when you put a bunch of AI researchers in a room with pizza and wine: (1) debates get loud, and (2) nobody agrees on what a “world model” really is.
At our informal ICLR meetup, between slices of Da Michele’s legendary pizza, we found ourselves arguing whether world models should be compact simulators of physics or sprawling networks of concepts. One camp said: “keep them lean, like a chess engine — grounded in prediction, not bloated with theory.” The other countered: “without structure, they’re just pattern matchers with no real grasp of causality.” Neither side backed down, and I suspect the debate will outlive the last bottle of wine we opened.
For me, the night reinforced a theme I’d been mulling over for a while: sight isn’t enough. Recognition is not comprehension. Pixels don’t make opinions. If we want models that can truly reason, anticipate, and respond, they need informed, opinionated perspectives — not just sharper eyes. That’s the chasm we have to cross, whether we’re talking robotics, AR, or good old-fashioned computer vision benchmarks.
The best part? These conversations happened without a podium, without a timer, and with tomato sauce flying dangerously close to laptops. Sometimes the most important research questions don’t emerge in peer-reviewed papers — they surface between sips of wine and bites of pizza, when people are relaxed enough to argue what they really think.
