Shopify powers over 4.5 million online stores worldwide, from solo DTC brands to enterprise retailers doing hundreds of millions in annual revenue. For competitive intelligence teams, price monitoring tools, dropshipping operators, and market researchers, Shopify store data — product catalogs, pricing, variant availability, and inventory signals — represents one of the most valuable e-commerce datasets available outside of Amazon.
Unlike Amazon or Walmart, Shopify does not have a centralized marketplace API. Each store is an independent deployment. While Shopify’s platform exposes some data through JSON endpoints, these are inconsistently enabled across stores, rate-limited, and often incomplete. Getting reliable, structured product data across dozens or hundreds of Shopify stores requires a different approach.
On the surface, many Shopify stores look accessible. The platform’s JSON product endpoints (/products.json) are widely known and work on stores that have not disabled them. But this apparent openness hides significant operational complexity once you scale beyond a handful of stores.
The consistency problem: Shopify stores are not a single target — they are millions of independently configured deployments. Some expose full JSON feeds; others disable them, add Cloudflare, or implement custom bot detection at the CDN layer. A scraper that works reliably on one store may fail entirely on another store built on the same platform. Handling this heterogeneity at scale requires per-store detection logic and fallback strategies that most teams do not have time to build.
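As a rough illustration of per-store detection, the sketch below classifies the response to a single probe request and maps it to a fallback strategy. The status codes treated as "blocked", the three categories, and the strategy names are all assumptions made for this example; real stores will need more nuanced checks.

```python
import json
from enum import Enum


class StoreAccess(Enum):
    JSON_OK = "json_ok"        # endpoint returned a usable product feed
    JSON_DISABLED = "disabled" # store answered, but not with a product feed
    BLOCKED = "blocked"        # response looks like bot protection or throttling


# Hypothetical strategy names; what each one does is up to your pipeline.
FALLBACK = {
    StoreAccess.JSON_OK: "json_feed",
    StoreAccess.JSON_DISABLED: "parse_html_pages",
    StoreAccess.BLOCKED: "browser_session",
}


def classify_probe(status: int, content_type: str, body: str) -> StoreAccess:
    """Classify the response to a probe such as GET <store>/products.json."""
    # Assumption: these status codes usually indicate blocking or rate limiting.
    if status in (403, 429, 503):
        return StoreAccess.BLOCKED
    if status == 200 and "json" in content_type.lower():
        try:
            payload = json.loads(body)
        except ValueError:
            return StoreAccess.JSON_DISABLED
        if isinstance(payload, dict) and isinstance(payload.get("products"), list):
            return StoreAccess.JSON_OK
    # Anything else (404, redirects to HTML, malformed JSON) is treated as "no feed".
    return StoreAccess.JSON_DISABLED
```

Keeping the classifier a pure function of status, content type, and body makes it easy to test against saved responses from stores that behave differently.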
Beyond access inconsistency, Shopify’s JSON endpoints are paginated with a 250-product limit per page and no standardized pagination token across store versions. Large catalogs with thousands of SKUs require careful pagination management and deduplication. Variant data — sizes, colors, material options — is nested and not always cleanly separated from parent product records, requiring normalization logic that differs across store themes and catalog structures.
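A minimal sketch of the pagination and deduplication loop, assuming the feed accepts `limit` and `page` query parameters and that each product carries an `id` field. That holds on many stores but, as noted above, not all of them. `fetch_page` and `iter_products` are illustrative names, not part of any Shopify library.

```python
import json
import urllib.request
from typing import Callable, Iterator

PAGE_SIZE = 250  # per-page cap described above


def fetch_page(store_url: str, page: int) -> list:
    """Fetch one page of the public feed (assumes `limit` and `page` are honored)."""
    url = f"{store_url.rstrip('/')}/products.json?limit={PAGE_SIZE}&page={page}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp).get("products", [])


def iter_products(
    store_url: str,
    fetch: Callable[[str, int], list] = fetch_page,
    max_pages: int = 100,
) -> Iterator[dict]:
    """Yield each product once, stopping at the first empty or short page."""
    seen = set()
    for page in range(1, max_pages + 1):
        products = fetch(store_url, page)
        if not products:
            break
        for product in products:
            # Catalog edits between requests can shift items across pages,
            # so the same product may appear twice; skip repeats.
            if product["id"] in seen:
                continue
            seen.add(product["id"])
            yield product
        if len(products) < PAGE_SIZE:
            break  # a short page means the catalog is exhausted
```

The `fetch` parameter is injectable so the loop can be exercised with canned pages, and `max_pages` acts as a safety stop for stores that keep returning data.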
Inventory data adds another layer of complexity. Shopify stores can configure inventory tracking per variant, and the available quantity field is often omitted or set to a non-meaningful value for stores using third-party fulfillment. Reading inventory signals correctly requires understanding which stores have configured tracking and which have not.
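One way to read these signals is to trust a quantity only when it agrees with the availability flag. The heuristic below is an assumption for illustration, not a rule Shopify defines; it uses the `inventoryQuantity` and `available` field names from the output format shown later in this article.

```python
def inventory_signal(variant: dict) -> str:
    """Return 'in_stock', 'out_of_stock', or 'untracked' for one variant."""
    qty = variant.get("inventoryQuantity")
    available = bool(variant.get("available"))
    if qty is None:
        return "untracked"  # the store does not expose a quantity at all
    if qty <= 0 and available:
        # Purchasable despite no counted stock: the number is not meaningful,
        # which is typical when fulfillment is handled elsewhere.
        return "untracked"
    return "in_stock" if available else "out_of_stock"


def store_tracks_inventory(variants: list, threshold: float = 0.5) -> bool:
    """Guess whether a store tracks inventory from the share of usable signals."""
    if not variants:
        return False
    usable = sum(1 for v in variants if inventory_signal(v) != "untracked")
    return usable / len(variants) >= threshold
```

The 0.5 threshold is arbitrary; tune it against stores whose tracking configuration you already know.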
Anti-bot measures have also increased across high-value Shopify stores. Major DTC brands that know their pricing data is commercially sensitive have added Cloudflare Bot Management, behavioral fingerprinting challenges, and IP reputation scoring on top of the standard Shopify stack. These stores effectively close the JSON endpoints to non-browser traffic.
We maintain a Shopify Store Scraper on Apify that handles store-level detection, JSON fallback logic, pagination, and data normalization. You provide store URLs; it returns structured product catalogs with variant-level detail.
Scrape one or more Shopify stores by URL:
```json
{
  "storeUrls": [
    "https://gymshark.com",
    "https://allbirds.com",
    "https://chubbiesshorts.com"
  ],
  "maxProductsPerStore": 500,
  "includeVariants": true,
  "includeInventory": true
}
```
Or target specific product categories within a store:
```json
{
  "storeUrls": ["https://gymshark.com"],
  "collectionPath": "/collections/mens-t-shirts",
  "maxProductsPerStore": 200
}
```
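If you prefer to start runs from code, the sketch below builds the same input and passes it to the actor through the `apify-client` Python package. `ACTOR_ID` is a placeholder, since the actor's exact ID is not given here, and the helper names are illustrative.

```python
from typing import Iterator, Optional

# Placeholder: substitute the actor's real ID from its Apify Store page.
ACTOR_ID = "<username>/<actor-name>"


def build_run_input(
    store_urls,
    max_products_per_store: int = 500,
    collection_path: Optional[str] = None,
    include_variants: bool = True,
    include_inventory: bool = True,
) -> dict:
    """Assemble an input object with the keys shown in the examples above."""
    run_input = {
        "storeUrls": list(store_urls),
        "maxProductsPerStore": max_products_per_store,
        "includeVariants": include_variants,
        "includeInventory": include_inventory,
    }
    if collection_path:
        run_input["collectionPath"] = collection_path
    return run_input


def run_scraper(token: str, run_input: dict, actor_id: str = ACTOR_ID) -> Iterator[dict]:
    """Start a run, wait for it to finish, and yield the dataset items."""
    # Imported lazily so build_run_input works without the dependency installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()
```

A call would look like `run_scraper(token, build_run_input(["https://gymshark.com"], 200, "/collections/mens-t-shirts"))`, with `token` being your Apify API token.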
Each product returns a structured object:
```json
{
  "store": "gymshark.com",
  "productId": "6789012345678",
  "title": "Vital Seamless 2.0 T-Shirt",
  "handle": "vital-seamless-2-0-t-shirt-mens",
  "vendor": "Gymshark",
  "productType": "T-Shirts",
  "tags": ["mens", "seamless", "training"],
  "price": 40.00,
  "compareAtPrice": null,
  "currency": "USD",
  "availableVariants": 18,
  "totalVariants": 24,
  "variants": [
    {
      "variantId": "39876543210123",
      "title": "Small / Black",
      "price": 40.00,
      "sku": "GS-VST-S-BLK",
      "inventoryQuantity": 143,
      "available": true
    },
    {
      "variantId": "39876543210456",
      "title": "Medium / Black",
      "price": 40.00,
      "sku": "GS-VST-M-BLK",
      "inventoryQuantity": 0,
      "available": false
    }
  ],
  "images": ["https://cdn.shopify.com/s/files/..."],
  "publishedAt": "2025-09-14T10:00:00Z",
  "updatedAt": "2026-04-22T08:30:00Z",
  "url": "https://gymshark.com/products/vital-seamless-2-0-t-shirt-mens"
}
```
| Field | Type | Description |
|---|---|---|
| store | string | Source Shopify store domain |
| productId | string | Shopify internal product ID |
| title | string | Product name |
| vendor | string | Brand or manufacturer name |
| price | float | Current selling price |
| compareAtPrice | float | Original price if on sale, else null |
| availableVariants | integer | Variant count currently in stock |
| variants | array | All size/color variants with price and inventory |
| tags | array | Store-assigned product tags |
| publishedAt | string | When product was first listed |
| updatedAt | string | Last catalog update timestamp |
Output is available as JSON, CSV, or XLSX. Runs can be scheduled on Apify to monitor specific stores on a daily or hourly cadence, enabling automated price change detection and inventory alerts.
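Because variants are nested under each product in the JSON output, variant-level analysis is often easier on a flat table with one row per variant. Here is a small sketch of that flattening using the field names from the example object above; the column selection and the function names are our own choices for illustration.

```python
import csv
import io

FIELDS = [
    "store", "productId", "title", "variantId", "variantTitle",
    "sku", "price", "inventoryQuantity", "available",
]


def flatten_variants(product: dict) -> list:
    """Return one flat row per variant, copying parent fields onto each row."""
    rows = []
    for variant in product.get("variants", []):
        rows.append({
            "store": product.get("store"),
            "productId": product.get("productId"),
            "title": product.get("title"),
            "variantId": variant.get("variantId"),
            "variantTitle": variant.get("title"),
            "sku": variant.get("sku"),
            # Fall back to the product-level price if the variant has none.
            "price": variant.get("price", product.get("price")),
            "inventoryQuantity": variant.get("inventoryQuantity"),
            "available": variant.get("available"),
        })
    return rows


def to_csv(products) -> str:
    """Serialize all variant rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for product in products:
        writer.writerows(flatten_variants(product))
    return buf.getvalue()
```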
A typical workflow is to run the scraper daily against a list of competitor stores, then diff the results against the previous run. Any product where price changed or availableVariants dropped to zero triggers a downstream alert — a Slack message, a webhook to your pricing tool, or a row in a Google Sheet.
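The diff step can be sketched in a few lines. This version assumes both runs are lists of product objects in the output format shown earlier and keys each product by `store` plus `productId`; the alert structure and function names are made up for the example.

```python
def index_by_product(items) -> dict:
    """Key each product by (store, productId) so the two runs can be joined."""
    return {(item["store"], item["productId"]): item for item in items}


def diff_runs(previous, current) -> list:
    """Return alerts for price changes and products that just sold out."""
    before = index_by_product(previous)
    alerts = []
    for item in current:
        old = before.get((item["store"], item["productId"]))
        if old is None:
            continue  # new product: no baseline to compare against
        base = {
            "store": item["store"],
            "productId": item["productId"],
            "title": item.get("title"),
        }
        if item["price"] != old["price"]:
            alerts.append({**base, "type": "price_change",
                           "old": old["price"], "new": item["price"]})
        # Alert only on the transition to zero, not on every run while sold out.
        if old["availableVariants"] > 0 and item["availableVariants"] == 0:
            alerts.append({**base, "type": "sold_out"})
    return alerts
```

Each alert is a plain dictionary, so it can be serialized directly into a webhook payload or appended as a spreadsheet row.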
Apify’s scheduling and webhook APIs make this straightforward to automate without managing infrastructure. You configure the actor run, set a daily schedule, and point the output to your destination. No servers required.
The actor uses Pay Per Event pricing at $0.005 per product.
| Volume | Cost |
|---|---|
| 100 products | $0.50 |
| 500 products | $2.50 |
| Full store (1,000 SKUs) | $5.00 |
| Daily monitor (5 stores × 200 SKUs × 30 days) | $150/month |
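To estimate other volumes, multiply the number of products returned per run by the per-product rate and by the number of runs. The helper below is a simple sketch that assumes the $0.005 rate is the only charge; `estimate_cost` is an illustrative name.

```python
from decimal import Decimal

PRICE_PER_PRODUCT = Decimal("0.005")  # rate quoted above, in USD


def estimate_cost(products_per_run: int, runs: int = 1) -> Decimal:
    """Estimated charge in USD, rounded to cents."""
    total = PRICE_PER_PRODUCT * products_per_run * runs
    return total.quantize(Decimal("0.01"))


print(estimate_cost(1000))  # 5.00
```

For a recurring monitor, pass the per-run volume and the number of runs in the billing period, for example `estimate_cost(5 * 200, runs=30)`.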
Apify has a free tier for testing if you do not already have an account. The actor integrates with Apify’s scheduling, webhook, and dataset APIs, so automated competitor monitoring pipelines can run without any infrastructure of your own.