How to Collect Twitter/X Data in 2026 (Without Getting Blocked)

April 22, 2026 · 8 min read
Contents
- Why Twitter data is still worth collecting
- Why it's so hard in 2026
- What you can actually get
- Using the Twitter Scraper actor
- Input configuration
- Output structure
- Pricing vs official API
- Use cases

Twitter/X still has somewhere around 600 million monthly active users. That's a lot of real-time public opinion, brand mentions, market signals, and research data. The problem is that getting systematic access to that data in 2026 is genuinely painful - either expensive through the official API or fragile if you try to roll your own solution.

This post covers what data you can collect, why doing it yourself is harder than it looks, and how to use a managed scraper actor to skip the hard parts.

Why Twitter data is still worth collecting

Despite everything - the rebranding, the API lockdown, the general chaos since 2022 - X remains the primary real-time public conversation platform. A few specific use cases where it's genuinely irreplaceable:

The data is genuinely useful. The access problem is what's frustrating.

Why it's so hard in 2026

The official API pricing is the obvious starting point. Here's the current tier breakdown:

Tier | Price | Read access
Free | $0/mo | Write-only. No reads.
Basic | $100/mo | 10,000 tweets/month
Pro | $5,000/mo | 1M tweets/month, full archive search
Enterprise | $42,000+/mo | Negotiated volume

For context: the 2022 standard API gave you around 500,000 tweets/month for free. The Basic tier today at 10,000 tweets is roughly 50x less data for $1,200/year. The jump from Basic to Pro is $4,900/month for the privilege of getting meaningful volume.
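The arithmetic behind those figures is easy to check. Here's a minimal sketch that uses only the numbers quoted in this post; the per-1,000-tweet cost is a derived figure, not something X publishes.

```python
# Tier figures copied from the table above.
TIERS = {
    "Basic": {"usd_per_month": 100, "tweets_per_month": 10_000},
    "Pro": {"usd_per_month": 5_000, "tweets_per_month": 1_000_000},
}
FREE_2022_TWEETS_PER_MONTH = 500_000  # "around 500,000" per the post


def usd_per_thousand(tier: str) -> float:
    """Cost per 1,000 tweets if the tier's monthly quota is fully used."""
    t = TIERS[tier]
    return t["usd_per_month"] * 1_000 / t["tweets_per_month"]


reduction = FREE_2022_TWEETS_PER_MONTH / TIERS["Basic"]["tweets_per_month"]
basic_per_year = TIERS["Basic"]["usd_per_month"] * 12
jump = TIERS["Pro"]["usd_per_month"] - TIERS["Basic"]["usd_per_month"]

print(f"Volume vs 2022 free tier: {reduction:.0f}x less")
print(f"Basic per year: ${basic_per_year:,}")
print(f"Basic -> Pro jump: ${jump:,}/month")
for name in TIERS:
    print(f"{name}: ${usd_per_thousand(name):.2f} per 1,000 tweets at full quota")
```

Per tweet, Pro is actually cheaper than Basic. The catch is that you only get that rate if you use the whole quota.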

So most people look at alternatives. And that's where it gets complicated.

X has invested heavily in bot detection over the past two years. The signals they track go well beyond simple rate limiting - session behavior, browser fingerprinting, account age, interaction patterns. If you're trying to build your own scraper today, you're fighting an adversarial system that's actively maintained and updated. What worked six months ago probably doesn't work now.

Even if you get the technical side right, there's the operational overhead: maintaining multiple accounts, rotating sessions, handling CAPTCHAs, monitoring for blocks, keeping up with site changes whenever X deploys frontend updates. It's a full-time maintenance job for something that's supposed to be infrastructure.

Note on terms of service: Automated data collection from X is governed by their developer policy. Review the X Developer Agreement for your specific use case before collecting data at scale.

What you can actually get

Assuming you have working access, here's what's available from public X data:

Tweet data

Profile data

Search and discovery

Using the Twitter Scraper actor

The Twitter Scraper on Apify handles the browser automation, session management, and anti-detection layer so you don't have to. You configure what you want to collect, run the actor, and get structured JSON back.

It runs on Apify's infrastructure - no local setup, no proxies to manage, no maintenance when X changes their frontend. The actor gets updated when things break.

Input configuration

The actor takes a JSON input. Here's a typical configuration for collecting tweets by search query:

{
  "searchTerms": [
    "#bitcoin",
    "ethereum price",
    "from:elonmusk"
  ],
  "maxItems": 500,
  "tweetLanguage": "en",
  "onlyVerifiedUsers": false,
  "includeUserInfo": true,
  "dateFrom": "2026-04-01",
  "dateTo": "2026-04-22",
  "sortBy": "Latest"
}

For collecting a specific user's timeline:

{
  "usernames": [
    "OpenAI",
    "AnthropicAI",
    "karpathy"
  ],
  "maxTweetsPerUser": 200,
  "includeReplies": false,
  "includeRetweets": true
}

For hashtag monitoring:

{
  "searchTerms": ["#webdev", "#buildinpublic"],
  "maxItems": 1000,
  "sortBy": "Latest",
  "includeUserInfo": true
}

maxItems caps how many items a run returns, which is what controls your cost - lower it when testing, raise it for production runs. sortBy can be "Latest" for chronological or "Top" for engagement-ranked results.
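If you generate these inputs from code, it helps to catch bad values before paying for a run. Here's a rough sketch of a builder for the search input. The field names come from the examples above; the helper `build_search_input` and its validation rules are my own assumptions, not something the actor provides.

```python
from datetime import date

ALLOWED_SORT = {"Latest", "Top"}  # the two sortBy values named in the post


def build_search_input(search_terms, max_items, date_from=None, date_to=None,
                       sort_by="Latest", include_user_info=True):
    """Return a search input dict, raising ValueError on obviously bad values."""
    if not search_terms:
        raise ValueError("at least one search term is required")
    if max_items < 1:
        raise ValueError("max_items must be a positive integer")
    if sort_by not in ALLOWED_SORT:
        raise ValueError(f"sort_by must be one of {sorted(ALLOWED_SORT)}")

    run_input = {
        "searchTerms": list(search_terms),
        "maxItems": max_items,
        "sortBy": sort_by,
        "includeUserInfo": include_user_info,
    }
    # date.fromisoformat raises ValueError on malformed dates.
    start = date.fromisoformat(date_from) if date_from else None
    end = date.fromisoformat(date_to) if date_to else None
    if start and end and start > end:
        raise ValueError("date_from must not be after date_to")
    if start:
        run_input["dateFrom"] = start.isoformat()
    if end:
        run_input["dateTo"] = end.isoformat()
    return run_input


# Small test run first, then raise max_items for production.
test_input = build_search_input(["#webdev"], max_items=20)
prod_input = build_search_input(
    ["#webdev", "#buildinpublic"], max_items=1000,
    date_from="2026-04-01", date_to="2026-04-22",
)
```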

Output structure

Each tweet comes back as a structured JSON object. Here's what a typical output record looks like:

{
  "id": "1782341928374651392",
  "text": "just shipped a new feature for our data pipeline - cuts processing time by 40%. small win but these add up",
  "createdAt": "2026-04-22T09:14:23.000Z",
  "author": {
    "id": "38294756",
    "username": "dataengineerX",
    "displayName": "Data Engineer",
    "followersCount": 4821,
    "followingCount": 312,
    "verified": false,
    "profileImageUrl": "https://pbs.twimg.com/profile_images/..."
  },
  "metrics": {
    "likes": 47,
    "retweets": 8,
    "replies": 3,
    "views": 1204,
    "bookmarks": 12
  },
  "hashtags": [],
  "urls": [],
  "language": "en",
  "isRetweet": false,
  "isReply": false,
  "media": []
}

If you enabled includeUserInfo, every tweet includes the full author object. If a tweet has media, the media array contains image/video URLs and metadata. Hashtags and cashtags are extracted and returned as arrays - useful for filtering without doing string parsing yourself.
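As a sketch of what array-based filtering looks like in practice: the record shape below follows the sample output above. Whether the actor returns hashtags with or without the leading "#" isn't stated, so this normalizes both forms. The helper names are hypothetical.

```python
def normalize_tag(tag: str) -> str:
    """Lowercase and drop a leading '#', so '#WebDev' and 'webdev' compare equal."""
    return tag.lstrip("#").lower()


def has_any_hashtag(tweet: dict, wanted) -> bool:
    """True if the tweet's hashtags array contains any of the wanted tags."""
    wanted_set = {normalize_tag(t) for t in wanted}
    return any(normalize_tag(t) in wanted_set for t in tweet.get("hashtags", []))


def author_username(tweet: dict):
    """Return the author's username, or None when no author object is present."""
    author = tweet.get("author") or {}
    return author.get("username")


sample = [
    {"id": "1", "text": "shipping day", "hashtags": ["buildinpublic"],
     "author": {"username": "dataengineerX"}},
    {"id": "2", "text": "no tags here", "hashtags": []},
]

matches = [t for t in sample if has_any_hashtag(t, ["#BuildInPublic"])]
print([(t["id"], author_username(t)) for t in matches])
```

Reading the author through a helper keeps downstream code from crashing on records that lack an author object.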

The output goes to Apify's dataset storage by default. You can export it as JSON, CSV, or JSONL, or pull it via the Apify API directly into your pipeline.
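For pulling results straight into a pipeline, here's a minimal standard-library sketch. It assumes Apify's v2 REST API exposes a `run-sync-get-dataset-items` endpoint under `/acts/{actorId}`, that the actor ID is written as `username~actor-name`, and that the token goes in a Bearer header. That matches my understanding of the Apify API, but check it against the current API reference before relying on it. The synchronous endpoint also has a run-time limit, so large jobs are better started asynchronously.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Assumed "username~actor-name" form of apify.com/cryptosignals/twitter-scraper
ACTOR_ID = "cryptosignals~twitter-scraper"


def build_run_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST that runs the actor and returns its items."""
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_actor(token: str, run_input: dict, timeout: float = 300.0) -> list:
    """Send the request and parse the JSON array of tweet records."""
    request = build_run_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Nothing is sent here; call run_actor(token, run_input) with a real token to execute.
request = build_run_request("YOUR_APIFY_TOKEN", {"searchTerms": ["#webdev"], "maxItems": 20})
print(request.get_method(), request.full_url)
```

Separating request construction from sending means you can inspect or log exactly what will be posted before it costs anything.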

Pricing vs official API

Here's how the numbers compare for typical use cases:

Need | Official X API | Twitter Scraper (Apify)
10,000 tweets/month | $100/mo (Basic) | ~$5-15/mo
100,000 tweets/month | $5,000/mo (Pro) | ~$50-150/mo
1M tweets/month | $5,000/mo (Pro, at limit) | ~$500-1,000/mo
Full archive search | $5,000/mo minimum | Depends on volume
Real-time streaming | Enterprise only | Not available

The actor makes most sense in the 10k-500k tweets/month range - where the official API is either too expensive or too limited. At very high volumes (around 1M tweets/month), the price gap narrows and the official API becomes more competitive if you can afford Pro tier. Anything beyond Pro's 1M cap means Enterprise pricing.
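To see where your own volume lands, here's a rough estimator built from the table above. The scraper rate of $0.50-1.50 per 1,000 tweets is my own linear reading of the 10k and 100k rows, not published pricing. It overshoots the table's upper bound at 1M tweets, so treat the high end as pessimistic.

```python
# (tier name, monthly tweet quota, USD per month), from the tier table in this post
OFFICIAL_TIERS = [("Basic", 10_000, 100), ("Pro", 1_000_000, 5_000)]
ENTERPRISE_FLOOR_USD = 42_000          # "$42,000+/mo", actual price is negotiated
SCRAPER_USD_PER_1K = (0.50, 1.50)      # assumption derived from the comparison table


def official_cost(tweets_per_month: int):
    """Return (tier, USD/month) for the smallest tier whose quota covers the volume."""
    for name, quota, usd in OFFICIAL_TIERS:
        if tweets_per_month <= quota:
            return name, usd
    return "Enterprise", ENTERPRISE_FLOOR_USD


def scraper_cost_range(tweets_per_month: int):
    """Return an estimated (low, high) USD/month range for the scraper."""
    low, high = SCRAPER_USD_PER_1K
    thousands = tweets_per_month / 1_000
    return thousands * low, thousands * high


for volume in (10_000, 100_000, 500_000):
    tier, usd = official_cost(volume)
    low, high = scraper_cost_range(volume)
    print(f"{volume:>9,} tweets: official {tier} ${usd:,}/mo vs scraper ~${low:,.0f}-{high:,.0f}/mo")
```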

One thing to factor in: the official API gives you full archive search back to 2006 at Pro and above, and real-time streaming at Enterprise. The scraper approach gets you recent data well, but historical data collection at scale is slow. If you need tweets from 2019, the official API is the right tool - just expensive.

Getting started: Apify has a free tier with $5 monthly platform credit. That's enough to run test jobs and validate your data pipeline before committing to a paid plan. The Twitter Scraper actor is available at apify.com/cryptosignals/twitter-scraper.

Use cases and practical patterns

Brand monitoring pipeline

Run the actor on a daily cron job searching for your brand name and key product terms. Pull the results into your database, flag anything with negative sentiment keywords, route to Slack. You can have this running in a few hours without maintaining any scraping infrastructure.
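The flagging step can be as simple as keyword matching. Here's a minimal sketch: the keyword list is a made-up example, and the `{"text": ...}` payload is the basic message shape Slack's incoming webhooks accept. Posting it to your webhook URL is left out.

```python
import re

NEGATIVE_KEYWORDS = ["broken", "refund", "scam", "outage"]  # hypothetical list


def compile_keywords(keywords):
    """Build one case-insensitive, whole-word pattern for all keywords."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(rf"\b({alternatives})\b", re.IGNORECASE)


def flag_negative(tweets, pattern):
    """Return tweets whose text matches, with the keywords that matched."""
    flagged = []
    for tweet in tweets:
        hits = sorted({m.lower() for m in pattern.findall(tweet.get("text", ""))})
        if hits:
            flagged.append({"id": tweet["id"], "text": tweet["text"], "matched": hits})
    return flagged


def slack_payload(flagged):
    """Format flagged tweets as a Slack incoming-webhook message body."""
    lines = [f"- {f['id']}: {f['text']} (matched: {', '.join(f['matched'])})"
             for f in flagged]
    return {"text": f"{len(flagged)} tweet(s) flagged\n" + "\n".join(lines)}


tweets = [
    {"id": "1", "text": "Loving the new release"},
    {"id": "2", "text": "Total SCAM, I want a refund"},
]
flagged = flag_negative(tweets, compile_keywords(NEGATIVE_KEYWORDS))
print(slack_payload(flagged)["text"])
```

Keyword matching is crude (it will flag "not a scam" too), so treat it as a first-pass filter ahead of a human or a proper sentiment model.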

Research dataset collection

For academic work - define your search terms and date range, run once, export to CSV or JSONL. The structured output means no parsing work on your end. You get clean tweet text, engagement metrics, and author data ready for analysis.
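If you pull JSON and want your own CSV layout instead of the platform export, the nested author and metrics objects need flattening. Here's a sketch using the field names from the sample record above; the column selection is an arbitrary choice for illustration.

```python
import csv
import io

COLUMNS = ["id", "createdAt", "username", "text", "likes", "retweets", "replies", "views"]


def to_row(tweet: dict) -> dict:
    """Flatten one tweet record into a single-level dict of the chosen columns."""
    author = tweet.get("author") or {}
    metrics = tweet.get("metrics") or {}
    return {
        "id": tweet.get("id"),            # kept as a string to avoid precision loss
        "createdAt": tweet.get("createdAt"),
        "username": author.get("username"),
        "text": tweet.get("text"),
        "likes": metrics.get("likes"),
        "retweets": metrics.get("retweets"),
        "replies": metrics.get("replies"),
        "views": metrics.get("views"),
    }


def tweets_to_csv(tweets) -> str:
    """Serialize tweets to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for tweet in tweets:
        writer.writerow(to_row(tweet))
    return buffer.getvalue()


records = [{
    "id": "1782341928374651392",
    "text": "just shipped a new feature, small win",
    "createdAt": "2026-04-22T09:14:23.000Z",
    "author": {"username": "dataengineerX"},
    "metrics": {"likes": 47, "retweets": 8, "replies": 3, "views": 1204},
}]
csv_text = tweets_to_csv(records)
print(csv_text)
```

The csv module handles quoting, so commas and line breaks inside tweet text don't break the file.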

Competitor monitoring

Set up user timeline collection for competitor accounts. Track what they're announcing, how their customers are replying, and how engagement trends over time. The data is public - this is standard competitive intelligence work.
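One way to turn timelines into a trend is average engagement per account per day. This sketch assumes records shaped like the sample output, with the author object included, and defines engagement as likes + retweets + replies. That definition is my choice, not a standard.

```python
from collections import defaultdict
from datetime import datetime


def engagement(tweet: dict) -> int:
    """Sum of likes, retweets and replies; missing metrics count as zero."""
    metrics = tweet.get("metrics") or {}
    return metrics.get("likes", 0) + metrics.get("retweets", 0) + metrics.get("replies", 0)


def daily_engagement(tweets) -> dict:
    """Map (username, 'YYYY-MM-DD') to the mean engagement of that day's tweets."""
    buckets = defaultdict(list)
    for tweet in tweets:
        # Swap the trailing 'Z' for an explicit offset so fromisoformat accepts it
        # on older Python versions too.
        created = datetime.fromisoformat(tweet["createdAt"].replace("Z", "+00:00"))
        key = (tweet["author"]["username"], created.date().isoformat())
        buckets[key].append(engagement(tweet))
    return {key: sum(values) / len(values) for key, values in sorted(buckets.items())}


timeline = [
    {"createdAt": "2026-04-21T08:00:00.000Z", "author": {"username": "acme"},
     "metrics": {"likes": 8, "retweets": 1, "replies": 1}},
    {"createdAt": "2026-04-21T17:30:00.000Z", "author": {"username": "acme"},
     "metrics": {"likes": 15, "retweets": 3, "replies": 2}},
    {"createdAt": "2026-04-22T09:14:23.000Z", "author": {"username": "acme"},
     "metrics": {"likes": 47, "retweets": 8, "replies": 3}},
]
trend = daily_engagement(timeline)
for (username, day), average in trend.items():
    print(username, day, average)
```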

Crypto/finance signals

Collect tweets mentioning specific tickers or projects, run sentiment analysis, feed into a trading signal pipeline. The volume and recency controls let you tune how much data you're processing per run.
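Before any sentiment model, the simplest signal is mention volume per ticker. Here's a sketch that counts how many tweets mention each watched cashtag by scanning the tweet text. The regex and the watchlist are illustrative assumptions, and sentiment scoring is out of scope here.

```python
import re
from collections import Counter

# "$" followed by 1-6 letters, not preceded by a letter or digit (skips "US$BTC"
# style fragments), and letters only, so prices like "$100" are ignored.
CASHTAG = re.compile(r"(?<![A-Za-z0-9])\$([A-Za-z]{1,6})\b")


def ticker_counts(tweets, watchlist) -> Counter:
    """Count tweets mentioning each watched ticker (once per tweet, case-insensitive)."""
    watched = {ticker.upper() for ticker in watchlist}
    counts = Counter()
    for tweet in tweets:
        found = {match.upper() for match in CASHTAG.findall(tweet.get("text", ""))}
        counts.update(found & watched)
    return counts


tweets = [
    {"text": "$BTC breaking out, $btc to 100k"},
    {"text": "Rotating from $ETH into $BTC"},
    {"text": "Paid $100 for lunch"},
]
counts = ticker_counts(tweets, ["BTC", "ETH", "SOL"])
print(counts.most_common())
```

Counting once per tweet keeps a single spammy post from inflating the numbers.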

Twitter data collection in 2026 has a real cost - either in money if you use the official API, or in maintenance overhead if you build your own. A managed actor lands in the middle: you pay for compute rather than API access, and someone else handles the infrastructure upkeep. For most projects in the 10k-500k tweet/month range, that's the sensible trade-off.
