Hacker News is one of the most signal-dense feeds on the internet. If you want to track what the technical community is paying attention to, monitor job postings from YC-backed companies, analyze technology trends over time, or build a content pipeline from a high-quality source, HN data is uniquely valuable.
Getting that data reliably, at scale, and without rate-limiting headaches is a different story. This post covers the use cases, the constraints, and the fastest path to structured HN data in 2026.
The use cases span a wide range of teams and applications:
HN does have a public API at hacker-news.firebaseio.com. It is free and requires no authentication. The catch is how it works: it exposes individual item endpoints (one per story, one per comment) and a list of current top story IDs, but provides no bulk access, no filtering, no search, and no pagination beyond the top 500 items.
To get the top 100 stories with full metadata, you need to:
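In outline, that means one request to `/v0/topstories.json` for the ranked ID list, then one request per ID to `/v0/item/<id>.json`. A minimal Python sketch of that two-step flow follows. The function names and the injectable `fetch` parameter are illustrative choices, not part of the HN API, and the sketch deliberately leaves out retries and pacing:

```python
import json
from typing import Callable
from urllib.request import urlopen

BASE_URL = "https://hacker-news.firebaseio.com/v0"


def fetch_json(url: str):
    """Fetch a URL and decode its JSON body (blocking, no retries)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def top_stories(limit: int = 100, fetch: Callable[[str], object] = fetch_json) -> list:
    """Return full metadata for the first `limit` top stories."""
    # Step 1: a single request returns the ranked list of story IDs.
    ids = fetch(f"{BASE_URL}/topstories.json")[:limit]
    stories = []
    for item_id in ids:
        # Step 2: one request per story to get its metadata.
        item = fetch(f"{BASE_URL}/item/{item_id}.json")
        if item is not None:  # missing items come back as JSON null
            stories.append(item)
    return stories
```

Calling `top_stories(100)` issues 101 sequential requests, which is the per-item cost this section describes.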
For ad-hoc use this is fine. For scheduled pipelines pulling full comment trees, historical data, or job thread contents, the per-item architecture means hundreds or thousands of requests per run. The API has no documented rate limits but throttles aggressively under load, and Firebase connections add latency that compounds across bulk requests.
The real bottleneck: A single "Who is Hiring?" thread can contain 1,000+ top-level comments, each requiring a separate API call. Fetching a complete thread reliably takes minutes of carefully paced requests, error handling, and deduplication logic.
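To make the pacing, error handling, and deduplication concrete, here is a rough sketch of fetching a thread's top-level comments. It relies on the `kids`, `deleted`, and `dead` fields that HN items carry; the function name, the `fetch_item` callable, and the delay and retry defaults are assumptions for illustration, not tuned values:

```python
import time
from typing import Callable, Optional


def fetch_top_level_comments(
    thread_id: int,
    fetch_item: Callable[[int], Optional[dict]],
    delay_s: float = 0.1,   # assumed pause between requests
    max_retries: int = 3,   # assumed retry budget per comment
) -> list:
    """Fetch a thread's top-level comments with pacing, retries, and dedup."""
    thread = fetch_item(thread_id) or {}
    seen = set()
    comments = []
    for kid_id in thread.get("kids", []):
        if kid_id in seen:  # skip IDs we have already requested
            continue
        seen.add(kid_id)
        for attempt in range(max_retries):
            try:
                item = fetch_item(kid_id)
                break
            except OSError:
                # Back off exponentially before retrying a failed request.
                time.sleep(delay_s * 2 ** attempt)
        else:
            continue  # every attempt failed; move on to the next comment
        # Deleted and dead comments carry no usable text.
        if item and not item.get("deleted") and not item.get("dead"):
            comments.append(item)
        time.sleep(delay_s)  # pace requests between comments
    return comments
```

Even at 0.1 seconds between requests, a 1,000-comment thread spends well over a minute on pacing alone, before any network latency or retries.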
We built and maintain an HN Top Stories scraper on Apify that handles the per-item Firebase fetching, rate limiting, and data normalization for you. You configure what you want; it returns clean structured JSON.
{
  "listType": "topstories",
  "maxItems": 100,
  "includeComments": false
}
Supported listType values: topstories, newstories, beststories, askstories, showstories, jobstories.
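If you generate this input programmatically, it can help to validate `listType` before starting a run. The helper below is a hypothetical convenience, not part of the actor; it only mirrors the three input fields and six list types shown here:

```python
SUPPORTED_LIST_TYPES = {
    "topstories", "newstories", "beststories",
    "askstories", "showstories", "jobstories",
}


def build_run_input(list_type: str = "topstories", max_items: int = 100,
                    include_comments: bool = False) -> dict:
    """Build the actor input dict, rejecting unsupported list types early."""
    if list_type not in SUPPORTED_LIST_TYPES:
        raise ValueError(f"unsupported listType: {list_type!r}")
    return {
        "listType": list_type,
        "maxItems": max_items,
        "includeComments": include_comments,
    }
```

With the `apify-client` Python package, a dict like this would typically be passed as `run_input` when calling an actor; the actor ID is not given in this post, so it is left out here.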
Each story returns a clean object:
{
  "id": 43812045,
  "title": "Show HN: I built a local-first sync engine for SQLite",
  "url": "https://github.com/example/sync-engine",
  "score": 412,
  "by": "username",
  "time": 1745401200,
  "descendants": 87,
  "type": "story",
  "hnUrl": "https://news.ycombinator.com/item?id=43812045"
}
| Field | Type | Description |
|---|---|---|
| id | integer | HN item ID |
| title | string | Story title |
| url | string | External link (null for Ask HN) |
| score | integer | Upvote count at time of scrape |
| by | string | Submitter username |
| time | integer | Unix timestamp of submission |
| descendants | integer | Total comment count |
| type | string | story / job / ask / show |
| hnUrl | string | Direct HN discussion link |
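For downstream code, one option is to load each record into a typed object and convert the Unix timestamp once. This is a sketch based only on the fields in the table; the `Story` class, `parse_story`, and the `submitted_at` property are illustrative names, not something the actor provides:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Story:
    id: int
    title: str
    url: Optional[str]  # null for Ask HN posts
    score: int
    by: str
    time: int           # Unix timestamp of submission
    descendants: int
    type: str
    hn_url: str

    @property
    def submitted_at(self) -> datetime:
        """Submission time as a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.time, tz=timezone.utc)


def parse_story(record: dict) -> Story:
    """Map one output record onto a Story, tolerating a missing url."""
    return Story(
        id=record["id"],
        title=record["title"],
        url=record.get("url"),
        score=record["score"],
        by=record["by"],
        time=record["time"],
        descendants=record["descendants"],
        type=record["type"],
        hn_url=record["hnUrl"],
    )
```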
Output is available as JSON, CSV, or XLSX from the Apify platform. Runs can be scheduled (hourly, daily, weekly) to power automated pipelines without any infrastructure on your end.
The actor uses Pay Per Event pricing at $0.005 per story. The math is simple:
| Volume | Cost |
|---|---|
| Top 100 stories | $0.50 |
| 500 stories | $2.50 |
| Daily run × 30 days (100 stories/day) | $15/month |
For a daily digest or monitoring pipeline, that is a trivially small infrastructure cost compared to maintaining your own Firebase polling service with retry logic and error handling.
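The figures in the table come from a single multiplication, which is easy to script for your own volumes. The sketch below assumes the flat $0.005 per-story price is the only charge; any other platform costs are outside what this post covers, and `run_cost` is a made-up helper name:

```python
from decimal import Decimal

PRICE_PER_STORY = Decimal("0.005")  # USD per story, from the pricing above


def run_cost(stories: int, runs: int = 1) -> Decimal:
    """Cost in USD for `runs` runs of `stories` stories each."""
    return PRICE_PER_STORY * stories * runs
```

For example, `run_cost(100, runs=30)` reproduces the $15/month row. `Decimal` is used instead of `float` so that small per-story prices do not pick up rounding noise.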
HN Top Stories Scraper on Apify →
Apify has a free tier for testing. Sign up here if you do not have an account. The actor connects directly to Apify's scheduling and storage APIs, so you can build automated pipelines without managing any additional infrastructure.