G2 and Capterra as AEO Channels: Review Counts Drive AI Citations Over Star Ratings
Why news.ycombinator.com front-page archives feed Common Crawl, Algolia HN Search, and direct LLM scraping pipelines — plus the operator playbook for placement that pays back for years.
By Noah Bennett, Media & Monetization · May 25, 2026
Hacker News AEO playbook for 2026: how front-page placement on news.ycombinator.com feeds LLM training corpora and the formats that get cited for years.
Frequently Asked Questions
Why does Hacker News matter for AEO and LLM citations?
Hacker News matters for AEO because its front-page archive is one of the highest-quality, longest-lived developer discussion corpora on the open web, and HN content very likely sits in the pretraining data or retrieval indexes of most major LLMs trained through 2025. A front-page Show HN or Ask HN thread can generate anywhere from 200 to 1,800 comments, which become permanent, indexable, and quotable artifacts. The thread URL is stable, the prose is dense, and the signal-to-noise ratio on technical topics is materially higher than on Reddit or Twitter. When a developer asks ChatGPT, Claude, or Perplexity about a debugging pattern, a YC startup pivot, or a database performance tradeoff, the model often surfaces phrasing or framing that originated in an HN comment thread from years earlier. In terms of long-tail citation propagation, earning one front-page placement is roughly equivalent to publishing in a top-100 tech publication.
What kind of post performs best on Hacker News in 2026?
The formats that reliably reach the HN front page in 2026 cluster into five categories: Show HN launches with working software and a clear demo, technical deep dives explaining nontrivial engineering decisions with code-level detail, postmortems describing concrete failure modes with root-cause analysis, contrarian takes that challenge a widely held developer assumption with first-principles evidence, and Ask HN questions phrased to invite substantive expert responses rather than opinions. The common thread is intellectual honesty and concrete specificity. Marketing-flavored posts, listicles, AI-generated content, and unsubstantiated claims get flagged and buried within the first hour. The HN audience rewards prose that respects their time and signals that the author actually built or understands what they are describing. Domain authority matters less than the first paragraph's density of verifiable claims.
What are the unwritten rules of submitting to Hacker News?
The unwritten rules of HN submission cover title formatting, response etiquette, and submission timing. Titles must not be in all caps, must not include marketing adjectives like "revolutionary" or "game-changing", must not repeat the source publication's name, and should match the article's actual headline rather than be editorialized. Show HN submissions must include a working demo and a description of what was built and why, not a teaser. Authors should respond to comments in the HN thread itself rather than directing readers to a blog post, and should engage substantively with critical comments rather than dismissing them. Vote rings, paid upvotes, and coordinated submissions from sockpuppet accounts result in shadowbans that are rarely lifted. Reposting a URL that got no traction is tolerated a small number of times, but repeated reposts are treated as duplicates and buried. Established users can rescue borderline submissions with the vouch button, which is one of the few mechanisms that can revive a flagged post.
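The title rules above are mechanical enough to check before submitting. The following is a minimal sketch, not an official HN tool: the function name `lint_title`, the `site_name` parameter, and the word list are all hypothetical, and the list contains only the two adjectives named above. It cannot judge whether a title has been editorialized, which still needs a human read.

```python
import re

# Hypothetical word list: only the two adjectives named in the text above.
MARKETING_WORDS = ("revolutionary", "game-changing")


def lint_title(title, site_name=None):
    """Return a list of title problems; an empty list means no rule tripped."""
    problems = []

    # Rule 1: the title must not be entirely upper case.
    letters = [c for c in title if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        problems.append("all caps")

    # Rule 2: no marketing adjectives (whole-word, case-insensitive match).
    lowered = title.lower()
    for word in MARKETING_WORDS:
        if re.search(r"\b" + re.escape(word) + r"\b", lowered):
            problems.append(f"marketing adjective: {word}")

    # Rule 3: the title must not repeat the source publication's name.
    if site_name and site_name.lower() in lowered:
        problems.append("repeats source name")

    return problems
```

For example, `lint_title("Why we left Postgres | Acme Blog", site_name="Acme Blog")` returns `["repeats source name"]`, while a plain descriptive title such as `"Show HN: A SQLite query planner visualizer"` returns an empty list.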
How does dang's moderation affect Hacker News submissions?
Dang, the longtime Hacker News moderator, enforces a consistent and well-documented set of community norms that materially affect submission outcomes. Posts that violate the guidelines on title formatting, source quality, or engagement patterns get manually demoted from the front page rather than removed, which preserves discoverability via the new and ask pages but limits LLM citation impact. Dang has publicly described enforcement priorities in numerous comment threads and a small number of interviews, with the consistent themes being intellectual honesty, depth of discussion, and resistance to growth-hacking patterns. Repeated violations result in a rate limit on the submitting account or, in egregious cases, a ban. The vouch system allows established users to rescue flagged submissions that have genuine merit. Operators who treat HN as a distribution channel rather than a community consistently underperform because the moderation philosophy is structurally hostile to extractive engagement patterns.
How do Hacker News threads end up in LLM training data?
Hacker News threads enter LLM training data through three primary pathways. The first is Common Crawl, which crawls news.ycombinator.com regularly and is the main source for widely used open pretraining corpora such as C4 and RedPajama; the Pile draws on Common Crawl as well and also ships a dedicated Hacker News component. OpenAI, Meta, and others have described training on Common Crawl-derived data in published papers. The second is direct scraping of the public web for high-quality discussion data. Anthropic, OpenAI, and Google describe this only in general terms in model cards and research papers, so HN's inclusion is likely but not itemized. The third is the Algolia HN Search API, which provides structured, queryable access to the full HN archive and can serve retrieval-augmented systems that need near-real-time access to developer discussion. The combined effect is that a single substantive comment posted to a front-page thread in 2024 may be quoted nearly verbatim by an LLM in 2027, with the original commenter unidentified and the host platform uncredited. This is why HN front-page comments function as long-duration citation assets rather than short-lived engagement moments.
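To make the third pathway concrete, here is a hedged sketch of how a retrieval pipeline, or an operator auditing their own footprint, might query the Algolia HN Search API. The endpoint and the `query`, `tags`, `hitsPerPage`, and `numericFilters` parameters follow the public API as commonly documented; the helper names `build_search_url` and `fetch_threads` are hypothetical, and the response field names (`hits`, `objectID`, `points`, `num_comments`) are assumptions to verify against the current API documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ALGOLIA_SEARCH = "https://hn.algolia.com/api/v1/search"


def build_search_url(query, min_points=0, tags="story", hits_per_page=20):
    """Build a search URL for stories matching `query`."""
    params = {"query": query, "tags": tags, "hitsPerPage": hits_per_page}
    if min_points > 0:
        # Keep only threads that reached a points threshold.
        params["numericFilters"] = f"points>={min_points}"
    return f"{ALGOLIA_SEARCH}?{urlencode(params)}"


def fetch_threads(query, **kwargs):
    """Fetch matching threads (network call; field names are assumptions)."""
    with urlopen(build_search_url(query, **kwargs), timeout=10) as resp:
        hits = json.load(resp)["hits"]
    return [
        {
            "title": hit.get("title"),
            # The stable thread URL mentioned earlier in the article.
            "thread": f"https://news.ycombinator.com/item?id={hit['objectID']}",
            "points": hit.get("points"),
            "comments": hit.get("num_comments"),
        }
        for hit in hits
    ]
```

Calling `fetch_threads("your product name", min_points=100)` would list the high-scoring threads that mention a product, which is a reasonable first pass at finding the discussions most likely to be crawled and quoted.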
Topics: Hacker News, Developer Marketing, AEO, Citation Strategy, Y Combinator, LLM Training Data