Substack has become one of the most important sources of signal in independent media and tech writing. If you want to monitor newsletters in a niche, track how often writers publish, analyze engagement across publications, or build a database of Substack authors for outreach, you need programmatic access to newsletter data.
Substack does not offer a public API. Getting post data at scale requires working around rate limits, handling JavaScript-rendered pages, and managing sessions cleanly. This post covers the practical approach.
Substack pages load content dynamically and implement rate limiting that kicks in quickly under automated access patterns. Several data points — like subscriber counts — are not publicly visible and require authenticated sessions to access.
The rate limit problem: Substack throttles unauthenticated requests aggressively. A naive scraper pulling posts from multiple publications sequentially will hit 429 errors within minutes. Reliable extraction requires session management, respectful pacing, and retry logic built into the request layer.
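To make "retry logic built into the request layer" concrete, here is a minimal sketch of exponential backoff with jitter on HTTP 429. It is not the actor's implementation. The `fetch` callable, the function name, and the delay values are all assumptions chosen for illustration; in practice `fetch` would wrap whatever HTTP client you use.

```python
import random
import time
from typing import Callable, Tuple

# Hypothetical interface: a fetcher takes a URL and returns (status_code, body).
Fetcher = Callable[[str], Tuple[int, str]]


def fetch_with_backoff(
    url: str,
    fetch: Fetcher,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(url), retrying on HTTP 429 with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        status, body = fetch(url)
        if status == 200:
            return body
        if status != 429:
            # Only rate-limit responses are retried in this sketch.
            raise RuntimeError(f"unexpected status {status} for {url}")
        if attempt < max_attempts - 1:
            # Wait 1x, 2x, 4x, ... the base delay, plus up to 25% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.25))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts: {url}")
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. The jitter spreads retries out so several workers do not hit the server again at the same moment.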
Beyond rate limiting, each publication has a different URL structure, posts can be free or paywalled, and extracting engagement data (likes, comments) requires handling different API endpoint patterns across publications.
We built and maintain a Substack Scraper on Apify that handles session management, rate limiting, and data normalization. You provide publication URLs; it returns structured JSON with post data and engagement metrics.
    {
      "publicationUrls": [
        "https://example.substack.com",
        "https://another.substack.com"
      ],
      "maxPostsPerPublication": 50,
      "includePaywalledPosts": false
    }
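As a rough sketch, the input above could be built and submitted from Python with the `apify-client` package. The actor ID is not given in this post, so `actor_id` is a placeholder you would need to fill in. The validation rules in `build_run_input` are assumptions for illustration, not checks the actor is known to perform.

```python
from typing import Any, Dict, List


def build_run_input(
    publication_urls: List[str],
    max_posts: int = 50,
    include_paywalled: bool = False,
) -> Dict[str, Any]:
    """Build the actor input shown above, with basic (assumed) validation."""
    if not publication_urls:
        raise ValueError("at least one publication URL is required")
    for url in publication_urls:
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"expected an http(s) URL, got {url!r}")
    if max_posts < 1:
        raise ValueError("max_posts must be at least 1")
    return {
        "publicationUrls": list(publication_urls),
        "maxPostsPerPublication": max_posts,
        "includePaywalledPosts": include_paywalled,
    }


def run_actor(token: str, actor_id: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start a run, wait for it to finish, and return the dataset items."""
    # Imported lazily so build_run_input works without the client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A call would look like `run_actor(token, actor_id, build_run_input(["https://example.substack.com"]))`, where both `token` and `actor_id` come from your own Apify account and the actor's store page.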
Each post is returned as a structured object:
    {
      "title": "The State of Developer Tools in 2026",
      "subtitle": "A look at where the ecosystem is heading",
      "slug": "the-state-of-developer-tools-2026",
      "url": "https://example.substack.com/p/the-state-of-developer-tools-2026",
      "publishedAt": "2026-03-18T10:00:00Z",
      "author": "Jane Smith",
      "publicationName": "Dev Dispatch",
      "likes": 412,
      "commentCount": 38,
      "wordCount": 2400,
      "isFreePost": true
    }
| Field | Type | Description |
|---|---|---|
| title | string | Post headline |
| subtitle | string | Post subtitle or deck |
| url | string | Full post URL |
| publishedAt | string | ISO 8601 publish timestamp |
| author | string | Writer name |
| publicationName | string | Newsletter name |
| likes | integer | Like count at time of scrape |
| commentCount | integer | Number of comments |
| wordCount | integer | Approximate post length |
| isFreePost | boolean | Whether post is publicly accessible |
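If you consume the output in Python, one option is to map the fields in the table above onto a typed record. The sketch below assumes every field is present on each item. The `Post` class and the `engagement_per_1k_words` metric are illustrative names, not part of the actor.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict


@dataclass(frozen=True)
class Post:
    title: str
    subtitle: str
    url: str
    published_at: datetime
    author: str
    publication_name: str
    likes: int
    comment_count: int
    word_count: int
    is_free_post: bool

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "Post":
        # Swap the trailing "Z" so fromisoformat also works before Python 3.11.
        published = datetime.fromisoformat(raw["publishedAt"].replace("Z", "+00:00"))
        return cls(
            title=raw["title"],
            subtitle=raw["subtitle"],
            url=raw["url"],
            published_at=published,
            author=raw["author"],
            publication_name=raw["publicationName"],
            likes=int(raw["likes"]),
            comment_count=int(raw["commentCount"]),
            word_count=int(raw["wordCount"]),
            is_free_post=bool(raw["isFreePost"]),
        )

    def engagement_per_1k_words(self) -> float:
        """Likes plus comments per 1,000 words (an illustrative metric)."""
        if self.word_count == 0:
            return 0.0
        return (self.likes + self.comment_count) * 1000 / self.word_count
```

Parsing `publishedAt` into a timezone-aware `datetime` up front makes later steps, such as computing publishing frequency, less error-prone than comparing raw strings.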
Output is available as JSON, CSV, or XLSX. Runs can be scheduled on Apify to monitor publications continuously and pipe new posts into downstream pipelines.
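For continuous monitoring, a downstream pipeline usually only wants posts it has not seen before. A minimal sketch of that filter, assuming the `url` field uniquely identifies a post and that you keep the set of seen URLs somewhere between runs (the storage is left out here):

```python
from typing import Any, Dict, Iterable, List, Set


def new_posts(items: Iterable[Dict[str, Any]], seen_urls: Set[str]) -> List[Dict[str, Any]]:
    """Return items whose url is not yet in seen_urls, and record them as seen."""
    fresh: List[Dict[str, Any]] = []
    for item in items:
        url = item["url"]
        if url not in seen_urls:
            seen_urls.add(url)
            fresh.append(item)
    return fresh
```

Because `seen_urls` is updated in place, calling the function again with the same items returns an empty list, which is the behavior you want for a scheduled job that re-scrapes the same publications.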
The actor uses Pay Per Event pricing at $0.005 per post.
| Volume | Cost |
|---|---|
| 100 posts | $0.50 |
| 500 posts | $2.50 |
| 10 publications × 50 posts each | $2.50 |
| Daily monitoring (10 pubs) × 30 days | ~$1.50/month |
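The table rows follow from multiplying the post count by the per-post price. The sketch below reproduces that arithmetic. The daily-monitoring figure is consistent with roughly one new post per publication per day, which is an inference from the numbers rather than something the pricing states.

```python
from decimal import Decimal

# Per-post price from the pricing stated above.
PRICE_PER_POST = Decimal("0.005")


def estimate_cost(posts: int) -> Decimal:
    """Estimated cost in USD for scraping the given number of posts."""
    if posts < 0:
        raise ValueError("posts must not be negative")
    return PRICE_PER_POST * posts


def estimate_monthly_cost(publications: int, new_posts_per_day: int = 1, days: int = 30) -> Decimal:
    """Monthly monitoring cost, assuming a fixed number of new posts per publication per day."""
    return estimate_cost(publications * new_posts_per_day * days)
```

`Decimal` is used instead of `float` so that small per-event prices add up without binary rounding artifacts.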
Apify has a free tier for testing, so you can sign up and try the actor if you do not already have an account. The actor connects directly to Apify’s scheduling and webhook APIs, so you can trigger runs automatically and push results to your data pipeline without managing infrastructure.