
Scraping Hacker News Data in 2026 (Top Stories, Jobs, Comments)

April 24, 2026 · 5 min read

Hacker News is one of the most signal-dense feeds on the internet. If you want to track what the technical community is paying attention to, monitor job postings from YC-backed companies, analyze technology trends over time, or build a content pipeline from a high-quality source, HN data is uniquely valuable.

Getting that data reliably, at scale, and without rate-limiting headaches is a different story. This post covers the use cases, the constraints, and the fastest path to structured HN data in 2026.

Why people scrape Hacker News

The use cases span a wide range of teams and applications:

Tech trend monitoring: HN surfaces emerging technologies, tools, and papers days or weeks before they reach mainstream coverage. Teams building intelligence pipelines use HN as an early-warning system.
Job market analysis: The monthly "Who is Hiring?" threads contain semi-structured job postings from thousands of companies. Researchers and job aggregators extract these for salary benchmarking, remote work trend analysis, and hiring signal tracking.
Content curation: Newsletters, Slack bots, and internal digests pull developer content from HN programmatically and filter it by score, comment count, or domain.
Competitive intelligence: Tracking when your product, a competitor, or your industry appears on HN, and how the community responds, is a valuable brand and product signal.
Research datasets: Academic and industry researchers studying online communities, information diffusion, or developer discourse use HN comment threads as a corpus.

The official API and its limits

HN does have a public API at hacker-news.firebaseio.com. It is free and requires no authentication. The catch is how it works: it exposes individual item endpoints (one per story, one per comment) and ID lists for the current top, new, best, Ask, Show, and job stories, but provides no bulk access, no filtering, no search, and no way to page past the IDs those lists return (at most 500 for top stories).

To get the top 100 stories with full metadata, you need to:

Fetch the top stories list (one request → 500 IDs)
Fetch each item individually (100 requests)
Handle rate limiting and retries yourself
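
The three steps above can be sketched in Python roughly as follows. The endpoint paths are those of the public Firebase API; the function names, the fixed `delay_seconds` pause, and the injectable `get_json` parameter are illustrative choices of ours, and the sketch deliberately does no retrying.

```python
import json
import time
from typing import Any, Callable
from urllib.request import urlopen

API_BASE = "https://hacker-news.firebaseio.com/v0"


def http_get_json(url: str) -> Any:
    """Fetch a URL and decode its JSON body (no retries in this sketch)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def fetch_top_stories(limit: int = 100,
                      get_json: Callable[[str], Any] = http_get_json,
                      delay_seconds: float = 0.1) -> list:
    """Return up to `limit` story items in front-page order."""
    # Step 1: one request returns the current list of top story IDs.
    story_ids = get_json(f"{API_BASE}/topstories.json")[:limit]

    stories = []
    for story_id in story_ids:
        # Step 2: one request per item.
        item = get_json(f"{API_BASE}/item/{story_id}.json")
        if item is not None:  # a missing item decodes to None (JSON null)
            stories.append(item)
        # Step 3, simplified: a fixed pause between requests (assumed value).
        time.sleep(delay_seconds)
    return stories
```

Calling `fetch_top_stories(limit=100)` issues 101 requests in sequence, which is where the per-item cost discussed here comes from.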

For ad-hoc use this is fine. For scheduled pipelines pulling full comment trees, historical data, or job thread contents, the per-item architecture means hundreds or thousands of requests per run. The API documents no rate limit, but bulk runs can still be throttled under load, and the latency of each Firebase round trip compounds across thousands of sequential requests.
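
If you do roll your own client, retries usually end up as a small wrapper with exponential backoff. The sketch below is one common pattern, not anything the HN API prescribes: `with_retries`, its defaults, and the jitter range are assumptions made for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T],
                 max_attempts: int = 5,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying on OSError with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except OSError:
            # urllib's URLError and HTTPError, and socket timeouts,
            # are all subclasses of OSError.
            if attempt == max_attempts:
                raise
            # Base delays of 0.5s, 1s, 2s, ... plus up to 100% random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Any zero-argument callable works here, for example a `lambda` that fetches one item URL. The jitter keeps parallel workers from retrying in lockstep.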

The real bottleneck: A single "Who is Hiring?" thread can contain 1,000+ top-level comments, each requiring a separate API call. Fetching a complete thread reliably means minutes of carefully paced requests, plus error handling and deduplication logic.
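
To make the shape of that work concrete, here is a minimal sketch of a thread walk. It relies on the `kids`, `deleted`, and `dead` fields that the API returns on items; `iter_thread_comments` and the `get_item` callable (which would perform one API call per ID) are hypothetical names, and pacing and retries are left out.

```python
from collections import deque
from typing import Any, Callable, Iterator, Optional


def iter_thread_comments(root_id: int,
                         get_item: Callable[[int], Any],
                         max_depth: Optional[int] = None) -> Iterator[dict]:
    """Yield comments under `root_id` breadth-first, fetching each ID once."""
    root = get_item(root_id) or {}
    queue = deque((kid, 1) for kid in root.get("kids", []))
    seen = set()  # deduplication: never fetch the same ID twice
    while queue:
        item_id, depth = queue.popleft()
        if item_id in seen:
            continue
        seen.add(item_id)
        item = get_item(item_id)  # one API call per comment
        if not item or item.get("deleted") or item.get("dead"):
            continue  # skip missing, deleted, and dead items (and their replies)
        yield item
        if max_depth is None or depth < max_depth:
            queue.extend((kid, depth + 1) for kid in item.get("kids", []))
```

For a hiring thread, `max_depth=1` limits the walk to the top-level postings and skips the replies beneath them.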

The practical solution: a ready-made HN scraper

We built and maintain an HN Top Stories scraper on Apify that handles the per-item Firebase requests, rate limiting, and data normalization for you. You configure what you want; it returns clean structured JSON.

Input

{
  "listType": "topstories",
  "maxItems": 100,
  "includeComments": false
}

Supported listType values: topstories, newstories, beststories, askstories, showstories, jobstories.
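
If you generate this input from code, a small builder can catch typos in `listType` before a run starts. This is a sketch only: `build_input` is a hypothetical helper, and the lower bound on `maxItems` is our assumption rather than a documented constraint.

```python
import json

# The listType values named above.
LIST_TYPES = {"topstories", "newstories", "beststories",
              "askstories", "showstories", "jobstories"}


def build_input(list_type: str = "topstories", max_items: int = 100,
                include_comments: bool = False) -> str:
    """Return the input as a JSON string, rejecting unknown list types."""
    if list_type not in LIST_TYPES:
        raise ValueError(f"unsupported listType: {list_type!r}")
    if max_items < 1:  # assumed constraint, not taken from the actor's docs
        raise ValueError("maxItems must be at least 1")
    return json.dumps({
        "listType": list_type,
        "maxItems": max_items,
        "includeComments": include_comments,
    }, indent=2)
```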

Output

Each story returns a clean object:

{
  "id": 43812045,
  "title": "Show HN: I built a local-first sync engine for SQLite",
  "url": "https://github.com/example/sync-engine",
  "score": 412,
  "by": "username",
  "time": 1745401200,
  "descendants": 87,
  "type": "story",
  "hnUrl": "https://news.ycombinator.com/item?id=43812045"
}

What you get per story

Field	Type	Description
`id`	integer	HN item ID
`title`	string	Story title
`url`	string	External link (null for Ask HN)
`score`	integer	Upvote count at time of scrape
`by`	string	Submitter username
`time`	integer	Unix timestamp of submission
`descendants`	integer	Total comment count
`type`	string	story / job / ask / show
`hnUrl`	string	Direct HN discussion link

Output is available as JSON, CSV, or XLSX from the Apify platform. Runs can be scheduled (hourly, daily, weekly) to power automated pipelines without any infrastructure on your end.
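
Once you have the JSON export, a few lines of Python are enough to filter it by score or domain. The sketch below assumes records shaped like the sample in the Output section; `summarize`, `top_by_score`, and the `min_score` threshold are illustrative names and values, not part of the actor.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse


def summarize(story: dict) -> dict:
    """Add a readable UTC timestamp and the link's domain to one record."""
    url = story.get("url")
    submitted = datetime.fromtimestamp(story["time"], tz=timezone.utc)
    return {
        "title": story["title"],
        "score": story["score"],
        "comments": story.get("descendants") or 0,
        "domain": urlparse(url).netloc if url else None,  # None for Ask HN
        "submitted_at": submitted.isoformat(),
        "hnUrl": story["hnUrl"],
    }


def top_by_score(stories: list, min_score: int = 100) -> list:
    """Keep stories at or above `min_score`, highest score first."""
    kept = [summarize(s) for s in stories if s.get("score", 0) >= min_score]
    return sorted(kept, key=lambda s: s["score"], reverse=True)


# The sample record from the Output section; a real pipeline would
# json.load() the exported file instead.
sample = {
    "id": 43812045,
    "title": "Show HN: I built a local-first sync engine for SQLite",
    "url": "https://github.com/example/sync-engine",
    "score": 412,
    "by": "username",
    "time": 1745401200,
    "descendants": 87,
    "type": "story",
    "hnUrl": "https://news.ycombinator.com/item?id=43812045",
}

for row in top_by_score([sample]):
    print(row["score"], row["domain"], row["submitted_at"], row["title"])
```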

Pricing

The actor uses Pay Per Event pricing at $0.005 per story. The math is simple:

Volume	Cost
Top 100 stories	$0.50
500 stories	$2.50
Daily run × 30 days (100 stories/day)	$15/month
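
The table rows are just the per-story rate multiplied out. A small estimator, with `estimate_cost` as a hypothetical helper and `Decimal` used to avoid floating-point rounding, reproduces them:

```python
from decimal import Decimal

PRICE_PER_STORY = Decimal("0.005")  # USD per story, the rate quoted above


def estimate_cost(stories_per_run: int, runs: int = 1) -> Decimal:
    """Estimated USD cost for `runs` runs of `stories_per_run` stories each."""
    return PRICE_PER_STORY * stories_per_run * runs


for label, cost in [
    ("Top 100 stories", estimate_cost(100)),
    ("500 stories", estimate_cost(500)),
    ("Daily run x 30 days", estimate_cost(100, runs=30)),
]:
    print(f"{label}: ${cost:.2f}")
```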

For a daily digest or monitoring pipeline, that is a trivially small infrastructure cost compared to maintaining your own Firebase polling service with retry logic and error handling.

Try it

HN Top Stories Scraper on Apify →

Apify has a free tier for testing. Sign up here if you do not have an account. The actor connects directly to Apify's scheduling and storage APIs, so you can build automated pipelines without managing any additional infrastructure.