AEO Budget Benchmark 2026: 11% of Marketing Spend, Climbing Fast
Causal Impact, GeoLift, and ZIP-level holdouts give marketing leaders the first defensible answer to whether AEO investment actually moves revenue.
By Tessa Wright, Enterprise & Revenue · May 26, 2026
ZIP-code geo experiments using Google Causal Impact, Meta GeoLift, Eppo, and Statsig give CFOs defensible proof that AEO investment lifts revenue. Here is the methodology.
Frequently Asked Questions
What is a geo experiment for AEO and why use ZIP codes?
A geo experiment for AEO splits a region into matched test and control geographies, applies the AEO intervention (citation work, local schema, llms.txt, content push) to the test markets only, then compares their outcomes against a synthetic counterfactual built from the control markets. ZIP codes are the right unit for local AEO for three reasons: they roughly match the location granularity that tools like ChatGPT search and Perplexity use when grounding local answers, they are small enough to yield a large sample of geographies, and they map cleanly to the location fields in most CRM and ad platforms. For national-brand AEO, DMAs (210 in the US) are often the better unit because each one carries more volume and less noise. The output is a defensible point estimate of incremental revenue or sessions, with credible intervals, that survives CFO scrutiny.
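To make the matched split concrete, here is a minimal Python sketch of matched-pair randomization, one common way to build comparable test and control groups. It assumes you already have a single pre-period volume figure per ZIP (sessions or revenue); the function name, the ZIP codes, and the volumes are hypothetical, and none of the tools named in this article is implied to work this way.

```python
import random
from typing import Dict, List, Tuple


def matched_pair_assignment(
    pre_period_volume: Dict[str, float], seed: int = 42
) -> Tuple[List[str], List[str]]:
    """Split ZIPs into test and control arms by matched-pair randomization.

    ZIPs are ranked by pre-period volume, neighbours in the ranking are
    paired, and a seeded coin flip inside each pair picks the test ZIP.
    With an odd count, the lowest-volume ZIP is left unassigned.
    """
    rng = random.Random(seed)
    ranked = sorted(pre_period_volume, key=lambda z: pre_period_volume[z], reverse=True)
    test: List[str] = []
    control: List[str] = []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # random assignment within the matched pair
        test.append(pair[0])
        control.append(pair[1])
    return test, control


# Hypothetical ZIPs with pre-period daily sessions.
volumes = {"10001": 1200.0, "10002": 1150.0, "10003": 640.0, "10004": 610.0, "10005": 90.0}
test_zips, control_zips = matched_pair_assignment(volumes)
```

Pairing ZIPs of similar volume before the coin flip keeps large and small markets balanced across the two arms, which tends to reduce the noise in the final comparison relative to a plain random split.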
How does Google Causal Impact differ from a regular A/B test?
Google's CausalImpact R package, released in 2014 by Kay Brodersen and colleagues at Google Research, fits a Bayesian structural time-series model to the test market's pre-intervention data, using the control-market series as covariates, then projects what the test market would have done absent the intervention. The difference between observed and projected outcomes is the estimated causal effect, reported with full posterior credible intervals. Unlike a standard A/B test, Causal Impact does not require user-level randomization, which is impossible for AEO because LLM citation behavior cannot be randomized per user. It works for organic channels, brand marketing, and any intervention you cannot randomize at the click level. The tradeoff is that the inference is only as good as the control series, which is why matched-market selection matters more than the choice of model.
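The core logic, observed minus projected, can be illustrated without the Bayesian machinery. The Python sketch below is a deliberately simplified stand-in, not CausalImpact: it fits ordinary least squares of the test series on one aggregated control series over the pre-period, projects the post-period, and reports the cumulative effect and relative lift. It has no trend or seasonal components and no credible intervals; the function names and toy numbers are hypothetical.

```python
from typing import List, Tuple


def fit_ols(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)  # must be non-zero
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def counterfactual_effect(test: List[float], control: List[float],
                          intervention_index: int) -> Tuple[float, float]:
    """Fit test ~ control on the pre-period, project the post-period, and
    return (cumulative effect, relative lift) of observed over projected."""
    a, b = fit_ols(control[:intervention_index], test[:intervention_index])
    projected = [a + b * c for c in control[intervention_index:]]
    observed = test[intervention_index:]
    effect = sum(observed) - sum(projected)
    return effect, effect / sum(projected)


# Toy data: the test market tracks half of the aggregated control series in
# the pre-period (days 0-5), then runs about 10% above that relationship.
control = [100, 104, 96, 102, 98, 100, 100, 102, 98, 100]
test = [50, 52, 48, 51, 49, 50, 55, 56, 54, 55]
effect, lift = counterfactual_effect(test, control, intervention_index=6)
```

For a real readout, use the CausalImpact package itself, which adds the time-series components and the posterior intervals this sketch omits.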
Can I run a geo experiment on a tight budget without Eppo or Statsig?
Yes. The open-source stack is sufficient for most operators. Install the CausalImpact R package (or one of its Python ports), pull daily revenue and sessions by ZIP or DMA from your warehouse, choose 5 to 10 matched test markets and 20 to 40 control markets using pre-period correlation, and apply the AEO intervention to the test markets only for at least four weeks. Meta's GeoLift R package, also free, adds power analysis and automated market selection. Paid platforms like Eppo and Statsig add multi-team workflows, automated power calculations, and PR-grade reporting; they justify their cost above roughly $10M ARR or for teams running more than four concurrent experiments. Below that scale, the open-source path delivers comparable statistical rigor at zero license cost.
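Ranking candidate control markets by pre-period correlation takes only a few lines. This Python sketch assumes `test_series` is the summed daily pre-period metric across your chosen test markets and `candidates` maps each potential control ZIP to its own daily series over the same dates; the names and numbers are hypothetical. GeoLift's market-selection tooling automates a more thorough version of this step.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_controls(test_series: List[float],
                  candidates: Dict[str, List[float]], k: int) -> List[str]:
    """Return the k candidate geos whose pre-period series correlate best
    with the test-market aggregate."""
    scores = {geo: pearson(test_series, series) for geo, series in candidates.items()}
    return sorted(scores, key=lambda geo: scores[geo], reverse=True)[:k]


# Hypothetical pre-period daily sessions.
test_agg = [10, 12, 11, 15, 14, 13, 16]
candidates = {
    "Z1": [20, 24, 22, 30, 28, 26, 32],  # moves in lockstep with the test aggregate
    "Z2": [16, 14, 15, 11, 12, 13, 10],  # moves opposite to it
    "Z3": [5, 6, 6, 7, 7, 7, 8],         # loosely follows it
}
top_controls = rank_controls(test_agg, candidates, k=2)
```

Correlation on raw levels can be inflated by a shared trend or weekly cycle, so correlating day-over-day changes is a common variant of the same idea.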
How long does a ZIP-code AEO geo experiment need to run?
Plan for a six-to-twelve-week test window, preceded by a stable pre-period of four to eight weeks for model fitting. Four weeks is the practical minimum for treatment: LLM citation indexes lag content publication by 7 to 21 days for most major engines, and you need at least two stable post-citation weeks on top of that for conversion data to settle, so engines at the slow end of that range push the minimum toward five weeks. Underpowered tests that run for two weeks are the most common mistake we see; they almost always fail to reject the null even when the intervention worked. Run a power analysis upfront, using GeoLift's power simulator or, on the Causal Impact side, a simulation that injects known lifts into historical pre-period data. The minimum detectable effect at the geo level is typically 8 to 15 percent lift, which is meaningful for local-AEO work but too coarse to detect 2 to 3 percent changes.
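One way to run the power check yourself is a Monte Carlo simulation: generate series with no treatment to see how large an estimated lift noise alone produces, then inject a known lift and count how often the estimate clears that bar. The Python sketch below does this on synthetic data; the daily volumes, noise level, period lengths, the one-sided 95th-percentile threshold, and the simple regression estimator are all assumptions. Real ZIP-level series carry seasonality and autocorrelation that independent Gaussian noise does not capture, so prefer running the same loop on your own historical data.

```python
import random
from typing import List, Tuple


def fit_ols(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def estimated_lift(test: List[float], control: List[float], t0: int) -> float:
    """Relative lift of the post-period over a regression counterfactual."""
    a, b = fit_ols(control[:t0], test[:t0])
    predicted = sum(a + b * c for c in control[t0:])
    return (sum(test[t0:]) - predicted) / predicted


def simulate_geo(rng: random.Random, n_pre: int, n_post: int,
                 noise_sd: float, lift: float) -> Tuple[List[float], List[float]]:
    """Hypothetical daily series: control around 1000/day, test tracks half
    of control plus independent Gaussian noise, post-period scaled by 1 + lift."""
    control = [1000 + rng.gauss(0, 50) for _ in range(n_pre + n_post)]
    test = [0.5 * c + rng.gauss(0, noise_sd) for c in control]
    for i in range(n_pre, n_pre + n_post):
        test[i] *= 1 + lift
    return test, control


def simulated_power(lift: float, n_pre: int = 56, n_post: int = 28,
                    noise_sd: float = 25.0, n_sims: int = 500, seed: int = 7) -> float:
    """Share of simulated tests whose estimated lift beats the 95th
    percentile of estimates from simulations with no injected lift."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_sims):
        test, control = simulate_geo(rng, n_pre, n_post, noise_sd, 0.0)
        null.append(estimated_lift(test, control, n_pre))
    null.sort()
    threshold = null[int(0.95 * n_sims)]  # one-sided 5% false-positive bar
    hits = 0
    for _ in range(n_sims):
        test, control = simulate_geo(rng, n_pre, n_post, noise_sd, lift)
        if estimated_lift(test, control, n_pre) > threshold:
            hits += 1
    return hits / n_sims
```

Sweeping `lift` over a grid, for example 0.02 to 0.15, and taking the smallest value whose power reaches 0.8 gives a simulated minimum detectable effect for the assumed noise level.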
What metrics should I measure in an AEO geo experiment?
Three layers. Top-of-funnel: branded search volume per geo (Google Trends or paid Glimpse data), direct traffic, and citation share-of-voice tracked by Profound, Otterly, or Peec. Mid-funnel: organic sessions, AI-referred sessions (identified by parsing UTM parameters and referrers from OpenAI, Anthropic, and Perplexity products), and lead-form submits. Bottom-funnel: pipeline created, opportunities, and closed-won revenue, tied to geo via the CRM billing-state or shipping-ZIP field. Run the Causal Impact model separately for each metric, and expect lifts further down the funnel to surface with a longer lag. For local-AEO work, store visits from Google Business Profile insights and Apple Business Connect actions are also worth including, since LLM-driven discovery often ends in offline foot traffic that web analytics cannot capture.
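AI-referred sessions have to be identified before they can be counted per geo. This Python sketch classifies a session by its `utm_source` parameter, falling back to the referrer hostname, and tallies sessions by ZIP and engine; those per-ZIP counts, bucketed by day, are the kind of series you would feed into the model. The hostname-to-engine table is an assumption to audit against the referrers that actually appear in your logs, and the row format (ZIP, landing URL, referrer) is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple
from urllib.parse import parse_qs, urlparse

# Hypothetical mapping: verify against the hostnames in your own logs.
AI_REFERRER_HOSTS: Dict[str, str] = {
    "chatgpt.com": "OpenAI",
    "chat.openai.com": "OpenAI",
    "claude.ai": "Anthropic",
    "perplexity.ai": "Perplexity",
}


def ai_source(landing_url: str, referrer: str) -> Optional[str]:
    """Return the AI engine behind a session, or None if not AI-referred.
    utm_source is checked first because referrer headers are not always passed."""
    params = parse_qs(urlparse(landing_url).query)
    candidates = params.get("utm_source", []) + [urlparse(referrer).hostname or ""]
    for value in candidates:
        host = value.lower()
        if host.startswith("www."):
            host = host[4:]
        if host in AI_REFERRER_HOSTS:
            return AI_REFERRER_HOSTS[host]
    return None


def ai_sessions_by_zip(sessions: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count AI-referred sessions per (zip, engine) from (zip, landing_url, referrer) rows."""
    counts: Counter = Counter()
    for zip_code, landing_url, referrer in sessions:
        engine = ai_source(landing_url, referrer)
        if engine is not None:
            counts[(zip_code, engine)] += 1
    return counts


# Hypothetical session rows.
rows = [
    ("10001", "https://example.com/?utm_source=chatgpt.com", ""),
    ("10001", "https://example.com/", "https://www.perplexity.ai/"),
    ("10002", "https://example.com/", "https://claude.ai/chat/abc"),
    ("10002", "https://example.com/", "https://www.google.com/"),
]
counts = ai_sessions_by_zip(rows)
```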