Bluesky crossed 30 million registered users in early 2026 and continues to grow as the decentralized alternative to Twitter/X. For researchers, marketers, and data engineers, that growth makes Bluesky increasingly relevant: it is an active platform with real engagement, a developer-friendly architecture, and public data accessible without the $42K/month price tag of Twitter's enterprise API tier.
The AT Protocol, the open standard Bluesky runs on, does expose public APIs. But like most public APIs designed for individual apps, it enforces rate limits that make bulk data collection impractical. The standard limit is approximately 3,000 requests per 5-minute window per IP. That sounds generous until you try to collect posts at scale: search returns at most 100 posts per page, so a single query for a moderately active keyword might require hundreds of paginated requests just to retrieve the past 24 hours of results. Add profile lookups, follower graphs, and repost chains, and you exhaust that budget quickly.
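To see why pagination eats the budget, here is a minimal sketch of a cursor loop over `app.bsky.feed.searchPosts`, which returns up to 100 posts per page plus a `cursor` for the next page. The `fetch_page` callable is a hypothetical stand-in for whatever HTTP client you use, so the loop can be exercised without network access.

```python
from typing import Callable, Optional

# Hypothetical: fetch_page(query, cursor, limit) returns a dict shaped like a
# searchPosts response, {"posts": [...], "cursor": "..."}; no cursor on the last page.
FetchPage = Callable[[str, Optional[str], int], dict]

PAGE_LIMIT = 100  # searchPosts accepts limit values from 1 to 100


def collect_search(fetch_page: FetchPage, query: str, max_results: int) -> tuple[list, int]:
    """Follow searchPosts cursors until max_results posts or no more pages.

    Returns the collected posts and the number of requests spent.
    """
    posts: list = []
    cursor: Optional[str] = None
    requests = 0
    while len(posts) < max_results:
        limit = min(PAGE_LIMIT, max_results - len(posts))
        page = fetch_page(query, cursor, limit)
        requests += 1
        posts.extend(page.get("posts", []))
        cursor = page.get("cursor")
        if not cursor or not page.get("posts"):
            break  # no further pages
    return posts[:max_results], requests
```

At 100 posts per page, 20,000 posts for one keyword cost at least 200 requests, before any profile or thread lookups.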
For one-off lookups, such as checking a single profile or grabbing the latest 20 posts on a topic, the AT Protocol API works fine. For anything resembling a research dataset, a brand monitoring pipeline, or a competitive intelligence workflow, the per-IP rate limit puts a hard ceiling on how much data a single client can collect in any given window.
A well-built scraper works within that ceiling instead of running into it: it paces its requests, retries failures, and distributes traffic across managed proxy infrastructure, while collecting only data that Bluesky already makes public. That is what the actor described in this post does.
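Whatever does the collecting, staying under a per-IP budget comes down to pacing. A minimal sketch, assuming the roughly 3,000-requests-per-5-minutes limit quoted above: a sliding-window limiter that reports how long to wait before the next request. Running one limiter per proxy IP would be the natural extension; this is an illustration, not the actor's actual implementation.

```python
import collections
import time
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `max_requests` in any rolling `window_s`-second span."""

    def __init__(self, max_requests: int = 3000, window_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window_s = window_s
        self.clock = clock
        self.sent: collections.deque = collections.deque()  # timestamps of recent requests

    def delay(self) -> float:
        """Seconds to wait before another request fits in the window (0 if it fits now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the rolling window.
        while self.sent and now - self.sent[0] >= self.window_s:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        return self.window_s - (now - self.sent[0])

    def record(self) -> None:
        """Register a request as sent at the current clock time."""
        self.sent.append(self.clock())

    def acquire(self) -> None:
        """Block until a request is allowed, then record it."""
        while (wait := self.delay()) > 0:
            time.sleep(wait)
        self.record()
```

With the defaults, the limiter averages out to 10 requests per second per IP; the injectable `clock` exists so the window logic can be tested without sleeping.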
Track mentions of your brand, product name, or competitors across Bluesky in near-real time. Unlike Twitter, Bluesky's public feed is genuinely public — no authentication required to read posts. A monitoring pipeline that runs the actor on a schedule gives you a daily snapshot of brand sentiment, emerging complaints, and organic advocacy without manual search.
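A daily snapshot mostly comes down to bucketing records by date. A minimal sketch, assuming post records shaped like the sample record shown later in this post (a `text` field and an ISO 8601 `indexedAt` timestamp in UTC); the term list and case-insensitive substring matching are illustrative choices, not something the actor provides.

```python
from collections import Counter, defaultdict


def daily_mentions(records: list[dict], terms: list[str]) -> dict[str, Counter]:
    """Count, per UTC day, how many posts mention each term (case-insensitive substring match).

    Days with no matching posts do not appear in the result.
    """
    by_day: dict[str, Counter] = defaultdict(Counter)
    lowered = [(term, term.lower()) for term in terms]
    for rec in records:
        day = rec["indexedAt"][:10]  # "2026-04-20T14:32:11.000Z" -> "2026-04-20"
        text = rec.get("text", "").lower()
        for term, needle in lowered:
            if needle in text:
                by_day[day][term] += 1
    return dict(by_day)
```

Substring matching is crude (it would also count "acmewidgets"); a word-boundary regex is a reasonable upgrade once the pipeline works.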
Bluesky is increasingly used by researchers, journalists, and policy professionals who left Twitter after the API paywall, including many social media researchers who previously relied on the Twitter Academic API. Bulk post collection by keyword, hashtag, or user cohort enables the same kinds of discourse analysis, network mapping, and temporal trend studies that drove a decade of Twitter-based research.
Profile data — follower count, following count, post frequency, engagement patterns — gives marketers a quantitative baseline for influencer identification and competitor benchmarking. Collecting this data at scale across a list of accounts provides the kind of comparative analysis that manual browsing cannot produce efficiently.
Identifying what content performs well on Bluesky — which posts get reshared, which topics spike in a given week — requires data before it requires judgment. A bulk collection of posts filtered by keyword or engagement threshold gives content teams the raw material to spot emerging topics before they peak.
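Filtering by an engagement threshold is short work once the data is in hand. A minimal sketch, assuming post records with the `likeCount`, `repostCount`, and `replyCount` fields shown in the sample record later in this post; the unweighted sum is a hypothetical scoring choice, and you may prefer to weight reposts more heavily.

```python
def engagement(rec: dict) -> int:
    """Unweighted engagement score: likes + reposts + replies (missing fields count as 0)."""
    return rec.get("likeCount", 0) + rec.get("repostCount", 0) + rec.get("replyCount", 0)


def top_posts(records: list[dict], min_engagement: int, limit: int = 10) -> list[dict]:
    """Posts at or above the threshold, highest engagement first, capped at `limit`."""
    hits = [r for r in records if engagement(r) >= min_engagement]
    return sorted(hits, key=engagement, reverse=True)[:limit]
```

For the sample record later in this post (142 likes, 38 reposts, 17 replies), the score is 197.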
The actor supports both handles (such as user.bsky.social) and DID identifiers. It accepts a simple JSON input, and the three most common collection modes are keyword search, profile scraping, and user feed collection. To search posts by keyword:
```json
{
  "query": "AI agents",
  "maxResults": 500,
  "type": "posts",
  "sort": "latest"
}
```
To collect profile data for a list of handles:
```json
{
  "type": "profiles",
  "handles": [
    "atproto.com",
    "jay.bsky.team",
    "pfrazee.com"
  ]
}
```
To collect the recent posts from a specific user:
```json
{
  "type": "feed",
  "handle": "pfrazee.com",
  "maxResults": 200
}
```
Whichever mode you use, results are written to the run's Apify dataset as consistently formatted JSON records.
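If you would rather trigger runs from code than from the Apify console, the official `apify-client` Python package can start an actor and read back its dataset. A minimal sketch, written against the 1.x releases of `apify-client`: the actor ID you pass in, the `APIFY_TOKEN` environment variable, and the `build_input` helper are assumptions for illustration, and the validation mirrors only the three example inputs above, not the actor's full input schema.

```python
import os

VALID_TYPES = {"posts", "profiles", "feed"}


def build_input(mode: str, **options) -> dict:
    """Hypothetical helper: assemble an input shaped like the three examples above."""
    if mode not in VALID_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_TYPES)}, got {mode!r}")
    required = {"posts": "query", "profiles": "handles", "feed": "handle"}[mode]
    if required not in options:
        raise ValueError(f"type {mode!r} needs a {required!r} field")
    return {"type": mode, **options}


def run_actor(actor_id: str, run_input: dict) -> list[dict]:
    """Start the actor, wait for the run to finish, and return its dataset items."""
    from apify_client import ApifyClient  # pip install apify-client; imported lazily

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not start")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

For example, `run_actor(ACTOR_ID, build_input("posts", query="AI agents", maxResults=500, sort="latest"))` reproduces the keyword-search input, where `ACTOR_ID` is the ID shown on the actor's Apify page.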
Each post is returned as a flat JSON object. Here is a representative record from a keyword search run:
```json
{
  "uri": "at://did:plc:abc123/app.bsky.feed.post/xyz789",
  "url": "https://bsky.app/profile/alice.bsky.social/post/xyz789",
  "text": "The shift to decentralized social is real. Bluesky crossed 30M users and it's actually good.",
  "authorHandle": "alice.bsky.social",
  "authorDisplayName": "Alice Chen",
  "authorDid": "did:plc:abc123",
  "likeCount": 142,
  "repostCount": 38,
  "replyCount": 17,
  "indexedAt": "2026-04-20T14:32:11.000Z",
  "lang": "en",
  "hasImages": false,
  "hasExternalLink": false,
  "replyTo": null
}
```
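The `uri` and `url` fields carry the same identity in two forms. An AT URI has the shape `at://<authority>/<collection>/<rkey>`, where the authority is usually the author's DID, and the bsky.app URL reuses the record key after `/post/`. A minimal sketch of that mapping, assuming the URL pattern in the sample record; when no handle is supplied it falls back to the DID, which bsky.app also accepts in profile URLs.

```python
from typing import NamedTuple, Optional


class AtUri(NamedTuple):
    authority: str   # usually a DID such as did:plc:...
    collection: str  # e.g. app.bsky.feed.post
    rkey: str        # record key


def parse_at_uri(uri: str) -> AtUri:
    """Split at://<authority>/<collection>/<rkey> into its three parts."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3:
        raise ValueError(f"expected authority/collection/rkey: {uri!r}")
    return AtUri(*parts)


def post_web_url(uri: str, handle: Optional[str] = None) -> str:
    """Build the bsky.app URL for a post record, preferring the handle over the authority."""
    at = parse_at_uri(uri)
    if at.collection != "app.bsky.feed.post":
        raise ValueError(f"not a post record: {at.collection}")
    return f"https://bsky.app/profile/{handle or at.authority}/post/{at.rkey}"
```

Keying on `uri` rather than `url` is the safer choice for deduplication, since handles can change while the DID and record key do not.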
Results stream into an Apify dataset as the run progresses. You can export to JSON, CSV, or JSONL, or push results to a webhook, Google Sheets, S3, or any downstream tool Apify integrates with.
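Apify's export handles format conversion for you, but if you pull items through the API instead, flattening them yourself is straightforward because each post is already a flat object. A minimal sketch using only the standard library, assuming records shaped like the sample above; the column list is an illustrative subset.

```python
import csv
import io
import json

COLUMNS = ["uri", "authorHandle", "text", "likeCount", "repostCount", "replyCount", "indexedAt"]


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSONL: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def to_csv(records: list[dict], columns: list[str] = COLUMNS) -> str:
    """CSV with a header row; fields outside `columns` are dropped, missing ones left blank."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

The `csv` module takes care of quoting post text that contains commas, quotes, or newlines, which hand-rolled string joins usually get wrong.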
A few things worth knowing before you run a large collection job:
No code required. The actor runs on Apify's managed infrastructure — you provide the input, Apify handles the execution, and results land in a structured dataset you can export or connect downstream.
To configure a run, choose the type (posts, profiles, or feed), enter your query or handles, and set maxResults.
For recurring pipelines (daily keyword monitoring, weekly profile snapshots, monthly trend reports), Apify's built-in scheduler runs the actor on your chosen interval without any infrastructure management. Set it once, get data on a schedule.
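Scheduled runs over overlapping time windows will return some of the same posts more than once. A minimal sketch of merging each run into a running collection keyed on the post `uri`, assuming runs are merged in the order they finished so the newest copy (with the freshest like, repost, and reply counts) wins; `merge_runs` is a hypothetical helper, not part of the actor.

```python
def merge_runs(existing: dict[str, dict], new_items: list[dict]) -> tuple[dict[str, dict], int]:
    """Upsert post records keyed by `uri`; items from the newer run overwrite older copies.

    Returns the merged mapping and how many URIs were not seen before.
    """
    merged = dict(existing)
    added = 0
    for item in new_items:
        uri = item["uri"]
        if uri not in merged:
            added += 1
        merged[uri] = item  # assumes new_items come from a later run than `existing`
    return merged, added
```

The `added` count doubles as a cheap health check: a scheduled keyword run that suddenly adds zero new posts is worth a look.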
If you need Bluesky data integrated into a larger pipeline (a data warehouse, a BI tool, a CRM), Apify's output integrations cover Google Sheets, Zapier, Make, S3, and webhooks without custom code.
Bluesky's 30M+ user base and genuinely public data model make it one of the more accessible social platforms for large-scale data collection in 2026. The AT Protocol APIs are well-documented and free — but their rate limits (~3,000 requests per 5-minute window, IP-based) make bulk collection impractical without managed proxy infrastructure and careful request pacing.
For teams that need Bluesky posts, profiles, or search results at scale without building and maintaining that infrastructure, the Bluesky Scraper on Apify handles rate management, retries, and proxy distribution automatically. It returns clean JSON covering posts (text, engagement counts, timestamps, author metadata), profiles (bio, follower/following counts, post history), and keyword search results — ready to export or connect downstream.
Whether you are doing academic research, brand monitoring, content trend analysis, or competitive intelligence, it is a practical starting point that skips the infrastructure work entirely. Start a free run on Apify →