Evaluating MCP Shopping Agents: Why Tool Design Beats Model Scale
We ran hundreds of shopping-agent conversations across eight MCP routes and four model sizes. The bottleneck was never the model; it was the interface.