
How to Scrape SEC EDGAR Filings in 2026 (No API Key Required)

May 2, 2026 · 6 min read

The SEC EDGAR system holds more than 35 million filings spanning every U.S. public company and every regulated investment vehicle since the early 1990s. 10-Ks, 10-Qs, 8-Ks, S-1s, proxy statements, insider transactions, fund holdings, M&A disclosures — the full universe of corporate disclosure required by U.S. securities law lives there, free for anyone to read. For investment researchers, quantitative analysts, compliance teams, M&A trackers, and journalists, EDGAR is the closest thing to a primary source the public markets have. The catch is that EDGAR was designed for human browsing, not bulk extraction. Pulling structured filing metadata out of it at scale is a real engineering problem, even though the underlying data is fully public.

This post walks through why EDGAR data is so valuable, what makes it tricky to collect cleanly, and how to extract structured filing records across companies, forms, and date ranges without writing or maintaining your own pipeline.

Why SEC EDGAR data matters

Investment research and fundamental analysis — Annual 10-Ks and quarterly 10-Qs are the canonical source for financial statements, risk factors, segment breakdowns, and management commentary. Building a comparable cross-company dataset means systematically pulling the most recent filings for an entire universe of tickers, not clicking through one company at a time.
Event-driven trading and signal generation — 8-K filings disclose material events — executive departures, acquisitions, earnings, restatements, debt issuance — within four business days of the event. Latency-sensitive desks watch the EDGAR feed continuously; even slower-moving discretionary funds want a clean historical 8-K archive to backtest event-driven hypotheses.
Competitive and M&A intelligence — S-1 and S-4 filings around IPOs and mergers are dense with strategic detail: customer concentration, pricing models, key partnerships, named competitors, churn metrics. Tracking new S-1 filings across a sector gives you a real-time view of who is going public and how they describe their market.
Compliance and risk monitoring — Auditors, regulators, and counterparty risk teams need to track filings of specific entities or filing types continuously. A clean filing-level feed with filed dates and document links is the backbone of any disclosure-monitoring pipeline.
Academic and journalism research — Researchers studying disclosure timing, executive compensation, related-party transactions, or environmental and governance language need historical filing datasets at the form level. Building those datasets without EDGAR access at scale is effectively impossible.

What makes SEC EDGAR hard to work with

EDGAR is a public system, and at small volumes you can simply browse it. The problems start the moment you want a clean filing-level dataset across many companies, forms, and years.

Identifier resolution and pagination depth: Companies are identified by CIK (Central Index Key), not ticker, and a single legal entity can have multiple historical CIKs through reorganisations and name changes. Mapping “Apple Inc” or “TSLA” to the right CIK and pulling its full filing history requires resolving the entity, paginating through filings going back decades for older issuers, and stitching results together cleanly. The SEC’s fair-access policy for its public endpoints is strict: automated clients are expected to identify themselves with a User-Agent header and stay under 10 requests per second, and naive collection patterns are throttled or blocked.
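Two of the chores above, normalising a CIK and pacing requests, can be sketched in a few lines of Python. This is a minimal illustration, not how the scraper itself is implemented: the names `normalize_cik` and `Throttle` are hypothetical, and the interval you pass is an assumption you should set from the SEC's current published limits.

```python
import time


def normalize_cik(value):
    """Return a CIK as a 10-digit, zero-padded string (e.g. '0001318605')."""
    digits = str(value).strip()
    if not (digits.isascii() and digits.isdigit()) or len(digits) > 10:
        raise ValueError(f"not a valid CIK: {value!r}")
    return digits.zfill(10)


class Throttle:
    """Hypothetical helper: keep successive calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the pacing logic can be tested
        self._sleep = sleep
        self._last = None      # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` before each HTTP request spaces them out; `Throttle(0.2)`, for instance, would cap a single-threaded collector at roughly five requests per second.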

Filing metadata is spread across multiple endpoints with different shapes. Submissions data lives in one place, full-text search across the body of filings lives in another, and the actual document indexes that link to PDFs and primary documents are reconstructed from accession numbers using a path scheme you have to derive from documentation. Form types are inconsistent across decades — 10-K vs 10-K/A vs 10-KSB — and the same logical concept shows up under several form codes that you need to normalise. Date ranges interact awkwardly with EDGAR’s pagination, so a naive filter pulls the wrong slice unless you handle window edges explicitly.
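As a rough sketch of the last two points, the snippet below folds amended and legacy form codes into a base form and applies a date window with both edges included. The `LEGACY_FORMS` mapping and the function names are illustrative assumptions; which legacy codes belong to which family is an analytical choice, and this is not necessarily how the scraper normalises forms.

```python
from datetime import date

# Hypothetical family map: legacy form codes folded into a modern base form.
LEGACY_FORMS = {
    "10-KSB": "10-K",
    "10-K405": "10-K",
    "10-QSB": "10-Q",
}


def base_form(form_type):
    """Return (base_form, is_amendment) for a raw EDGAR form code."""
    form = form_type.strip().upper()
    is_amendment = form.endswith("/A")
    if is_amendment:
        form = form[:-2]  # drop the "/A" amendment suffix
    return LEGACY_FORMS.get(form, form), is_amendment


def in_window(filed_date, start, end):
    """True if an ISO date string falls inside [start, end], both edges included."""
    filed = date.fromisoformat(filed_date)
    return date.fromisoformat(start) <= filed <= date.fromisoformat(end)
```

Keeping the amendment flag separate from the base form lets you decide later whether a 10-K/A should replace, supplement, or be excluded alongside the original 10-K.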

None of this is fundamentally hard, but doing it correctly across 35 million filings, 600 thousand filers, and more than 30 years of history is a substantial integration job that has nothing to do with the actual research question you started with.

How to use the SEC EDGAR Scraper

We maintain an SEC EDGAR Scraper on Apify that handles entity resolution, pagination, form normalisation, and document URL construction. You give it a company name (or CIK), pick which forms you care about, set a date range, and it returns clean structured filing records ready for analysis or import. No API key is required: the scraper uses official SEC public endpoints under the SEC’s fair-access policy.

Input

Pull up to 20 10-K and 8-K filings for Tesla (add startDate and endDate to limit the pull to the last year):

{
  "searchMode": "company",
  "query": "Tesla Inc",
  "forms": "10-K,8-K",
  "maxItems": 20
}

Full-text search across all filings for any mention of a topic:

{
  "searchMode": "fulltext",
  "query": "climate change risk",
  "forms": "10-K",
  "startDate": "2025-01-01",
  "endDate": "2026-01-01",
  "maxItems": 100
}

Look up filings directly by CIK (skips entity resolution):

{
  "searchMode": "company",
  "query": "0001318605",
  "forms": "10-Q",
  "maxItems": 12
}

Calling the actor from Python

Using the Apify Python client:

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient('YOUR_API_TOKEN')

# Start the actor and wait for the run to finish.
run = client.actor('cryptosignals/sec-edgar-scraper').call(run_input={
    'searchMode': 'company',
    'query': 'Apple Inc',
    'forms': '10-K,10-Q,8-K',
    'startDate': '2024-01-01',
    'endDate': '2026-05-01',
    'maxItems': 50,
})

# Iterate over the filing records in the run's default dataset.
for item in client.dataset(run['defaultDatasetId']).iterate_items():
    print(item['filed_date'], item['filing_type'], item['document_url'])

Output

Each filing is returned as a structured JSON record:

{
  "company_name": "Tesla, Inc.",
  "cik": "0001318605",
  "ticker": "TSLA",
  "filing_type": "10-K",
  "filed_date": "2026-02-03",
  "period_of_report": "2025-12-31",
  "description": "Annual report",
  "document_url": "https://www.sec.gov/Archives/edgar/data/1318605/000131860526000012/0001318605-26-000012-index.htm",
  "primary_document": "tsla-20251231.htm",
  "accession_number": "0001318605-26-000012",
  "scraped_at": "2026-05-02T10:30:00Z"
}

The document_url points to the EDGAR filing index page, which lists the primary document and every exhibit. accession_number is the canonical filing identifier you can use to deduplicate across runs or join against other EDGAR-derived datasets.
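A short sketch of how these fields can be used downstream. It assumes the primary document sits in the same archive folder as the index page, which is how the document_url in the sample record is laid out; the function names are hypothetical and not part of the scraper.

```python
def primary_document_url(record):
    """Build a direct link to the primary document from one filing record."""
    cik = str(int(record["cik"]))                          # archive paths drop leading zeros
    folder = record["accession_number"].replace("-", "")   # folder name is the accession number without dashes
    return (
        "https://www.sec.gov/Archives/edgar/data/"
        f"{cik}/{folder}/{record['primary_document']}"
    )


def dedupe_by_accession(records):
    """Keep the first record seen for each accession number, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = record["accession_number"]
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Running `dedupe_by_accession` over the concatenated output of several runs leaves one record per filing, so overlapping date windows do not inflate the dataset.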

Common use cases

Use case	Typical query
Build a 10-K corpus for an industry	Pull the latest 10-K for every issuer in a sector to feed into NLP pipelines for risk-factor analysis, segment extraction, or competitor mention tracking.
8-K event monitoring	Pull all 8-K filings across a watchlist over a sliding window to power event-driven alerts or backtests on material-event categories.
IPO and S-1 tracking	Pull all S-1 filings filed in the last 90 days to maintain a real-time view of upcoming IPOs and their disclosed business models.
Compliance and disclosure monitoring	Continuously refresh the filing list for a portfolio of regulated entities and alert on new filings of specific types within hours of submission.
Cross-filing full-text search	Search for a specific phrase, technology, or counterparty mentioned across all 10-Ks in a date range to surface every issuer that disclosed exposure to it.
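The watchlist pattern, for example, comes down to generating one input per company with a rolling date range. The sketch below only builds the input dictionaries, using the field names from the input examples earlier in this post; the helper name and its defaults are illustrative assumptions.

```python
from datetime import date, timedelta


def sliding_window_inputs(watchlist, forms, days, today=None, max_items=50):
    """Build one actor input per company covering the last `days` days."""
    today = today or date.today()
    start = today - timedelta(days=days)
    return [
        {
            "searchMode": "company",
            "query": name,
            "forms": ",".join(forms),          # the actor takes a comma-separated string
            "startDate": start.isoformat(),
            "endDate": today.isoformat(),
            "maxItems": max_items,
        }
        for name in watchlist
    ]
```

Each dictionary can be passed as `run_input` to a separate actor call, and rerunning the helper on a schedule moves the window forward automatically.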

For company-level enrichment beyond SEC disclosure data — private companies, funding rounds, headcount, founders — pair this scraper with our Crunchbase Scraper for a fuller picture of the public-and-private corporate landscape.

Pricing

Pricing model	Cost	Effective
Pay per result	$0.01 per filing record	From May 19, 2026

Pay-per-result pricing means you only pay for successfully extracted filings, not for compute time, retries, or queries that returned no matches. A 1,000-filing pull is a flat $10; a 10,000-filing dataset is $100. Apify’s Free tier ($0/month) includes a monthly platform credit you can spend on this actor for evaluation runs before committing to a paid plan.
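For budgeting a larger pull, the arithmetic is a single multiplication. The sketch below hard-codes the per-record price from the table above; the function name is hypothetical, and it does not account for any free platform credit.

```python
from decimal import Decimal

# Per-record price taken from the pricing table above.
PRICE_PER_RECORD = Decimal("0.01")


def estimate_cost(filing_count):
    """Estimated cost in USD for a given number of returned filing records."""
    if filing_count < 0:
        raise ValueError("filing_count must be non-negative")
    return PRICE_PER_RECORD * filing_count
```

Using `Decimal` rather than floats keeps the estimate exact to the cent when counts get large.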

Get started

Run the scraper directly in the Apify console: apify.com/cryptosignals/sec-edgar-scraper. Pick a search mode, paste a company name or query, choose your forms and date range, hit Start. The dataset is downloadable as JSON, CSV, or Excel, and accessible via the standard Apify dataset API for any downstream pipeline.