AliExpress is one of the largest e-commerce platforms in the world, with over 100 million products listed across virtually every consumer category. For anyone working in e-commerce intelligence, price monitoring, dropshipping research, or competitive analysis, getting reliable product data from AliExpress is not optional -- it is table stakes.
The problem is that AliExpress does not want you scraping their data. They have invested heavily in bot detection, JavaScript-rendered content, and IP-based rate limiting that makes naive scraping approaches completely useless. If you have ever tried to use Python's requests library to fetch an AliExpress product page, you already know the result: you get a page full of empty divs, "loading..." placeholders, and zero actual product data.
This guide is the result of months of trial and error building production scraping pipelines that pull data from AliExpress reliably. I will walk you through exactly what works in 2026, what does not, and the specific techniques that let you extract product data without getting your IP addresses burned. We will cover everything from the initial environment setup to handling AliExpress's specific anti-bot measures, building retry logic for production use, and structuring the extracted data into clean, usable formats.
Whether you are building a price comparison tool, researching suppliers for a dropshipping business, monitoring competitor pricing, or feeding data into an analytics pipeline, this guide gives you the complete playbook. Every code example is tested and working as of March 2026.
Unlike most retail sites that serve product data in clean HTML, AliExpress uses a combination of server-rendered shells and JavaScript-populated content. The initial HTML response contains the page layout and navigation, but nearly all product-specific data -- prices, shipping info, seller ratings, SKU variants -- is injected by JavaScript after the page loads. This immediately eliminates any approach based on simple HTTP requests and HTML parsing.
The second layer of difficulty is AliExpress's bot detection system, which has become significantly more sophisticated over the past two years. It operates on multiple signals simultaneously:
- TLS fingerprinting: a client built on the requests or httpx libraries produces a JA3 fingerprint that is trivially distinguishable from a real Chrome browser. AliExpress checks this.
- Headless browser detection: navigator.webdriver === true, missing browser plugins, and Chrome DevTools Protocol artifacts.

The third problem is structural. AliExpress frequently changes their page layout, CSS class names, and the internal data structure of their JavaScript payloads. A scraper that works perfectly today might break next month when they reorganize their frontend code. Any production scraper needs to be designed with this brittleness in mind.
Despite all of this, AliExpress scraping is entirely feasible with the right approach. The key insight is that AliExpress embeds most product data in a JavaScript variable called window.__INIT_DATA__, and this variable is far more stable than the visual DOM structure. Combined with a properly configured headless browser and residential proxy rotation, you can build scrapers that work reliably for months at a time.
Before writing any scraping code, you need the right tools installed. Here is the complete setup:
# Install the client first: pip install apify-client
# Managed actor call: skip guest tokens, rotating proxies, and brittle selectors
from apify_client import ApifyClient

client = ApifyClient('YOUR_APIFY_TOKEN')
run = client.actor('cryptosignals/aliexpress-scraper').call(
    run_input={'searchKeywords': ['wireless earbuds'], 'maxItems': 50}
)
for item in client.dataset(run['defaultDatasetId']).iterate_items():
    print(item)
aliexpress-scraper/
    scraper.py          # Main scraping logic
    search_scraper.py   # Search result scraping
    batch_runner.py     # Batch processing with retry logic
    config.py           # Proxy and settings configuration
    output/             # JSON output directory
    logs/               # Error and debug logs
Playwright is the only reliable approach for AliExpress in 2026. It runs a real Chromium browser, executes JavaScript, and produces authentic browser fingerprints that pass AliExpress's detection. Here is the complete scraper, explained step by step.
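A minimal sketch of this approach, assuming Playwright's synchronous Python API. It leaves out the stealth patches and the DOM fallback, and the function names and proxy arguments are illustrative rather than taken from the original scraper.

```python
from typing import Any, Optional


def build_launch_options(proxy_server: Optional[str] = None,
                         username: Optional[str] = None,
                         password: Optional[str] = None) -> dict:
    """Build keyword arguments for chromium.launch() (hypothetical helper)."""
    options: dict = {"headless": True}
    if proxy_server:
        proxy = {"server": proxy_server}
        if username and password:
            proxy.update({"username": username, "password": password})
        options["proxy"] = proxy
    return options


def fetch_init_data(url: str, **proxy_kwargs: Any) -> Optional[dict]:
    """Load a product page in Chromium and return window.__INIT_DATA__, or None."""
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(**build_launch_options(**proxy_kwargs))
        try:
            page = browser.new_page()
            # Wait until the network has gone quiet so page scripts have run.
            page.goto(url, wait_until="networkidle", timeout=60_000)
            data = page.evaluate("() => window.__INIT_DATA__ || null")
        finally:
            browser.close()
    return data if isinstance(data, dict) else None
```

Calling `fetch_init_data(url, proxy_server="http://proxy.example:8000")` would route the browser through that proxy; the host shown is a placeholder.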
Let me break down the critical parts of this code:
- playwright-stealth: the playwright-stealth library modifies browser properties that headless detection scripts check. Without it, AliExpress detects you within the first few requests.
- wait_until="networkidle": this tells Playwright to wait until there have been no network requests for 500ms. This ensures all JavaScript has finished populating the page data.
- Extraction order: the scraper first tries window.__INIT_DATA__ for structured data, then falls back to DOM selectors. The JavaScript variable is more reliable and contains more fields.

The window.__INIT_DATA__ variable is the single most valuable data source on any AliExpress product page. It is a large JSON object (often 50-200KB) that contains virtually everything about the product, the seller, shipping options, and SKU pricing. It exists because AliExpress's React frontend needs this data to render the page, and they inject it into the page as a global variable during server-side rendering.
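Because the variable is injected into the page source, it can also be recovered from the HTML string (for example, the output of Playwright's `page.content()`). The sketch below assumes the page contains an assignment of the form `window.__INIT_DATA__ = {...}` with a plain JSON literal on the right-hand side; if the real markup differs, the function simply returns `None`.

```python
import json
from typing import Optional

MARKER = "window.__INIT_DATA__"


def extract_init_data(html: str) -> Optional[dict]:
    """Return the JSON object assigned to window.__INIT_DATA__, or None."""
    marker_pos = html.find(MARKER)
    if marker_pos == -1:
        return None
    equals_pos = html.find("=", marker_pos + len(MARKER))
    if equals_pos == -1:
        return None
    # raw_decode does not skip leading whitespace, so do it by hand.
    start = equals_pos + 1
    while start < len(html) and html[start].isspace():
        start += 1
    try:
        # raw_decode parses one JSON value and ignores whatever follows it,
        # so the trailing ";</script>" does not need to be trimmed.
        data, _end = json.JSONDecoder().raw_decode(html, start)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None
```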
The structure changes periodically as AliExpress refactors their frontend, but the overall shape has been stable since mid-2025. Here are the key paths you need to know:
| Field | Contains |
|---|---|
| Product info | Title, product ID, category, item specifics |
| Pricing | Current price, original price, discount percentage, currency |
| Sales | Total sold count, formatted sales number |
| Reviews | Average star rating, total reviews, positive rate |
| Seller | Store name, ID, rating, follower count, years active |
| Shipping | Ships-from country, free shipping flag, estimated delivery |
| SKUs / variants | All SKU variants with prices, images, stock status |
| Images | Product image URLs (full resolution) |
| Description | Product description HTML and specification table |
| Related | Related products and category breadcrumbs |
A managed actor returns these as flat, documented fields — you don't need to maintain selectors or internal JSON paths.
Because these paths can change, always use defensive access with fallbacks:
# With a managed actor you get flat, documented fields, so there is no need to walk internal JSON
from apify_client import ApifyClient

client = ApifyClient('YOUR_APIFY_TOKEN')
run = client.actor('cryptosignals/aliexpress-scraper').call(
    run_input={'productUrls': ['https://www.aliexpress.com/item/1005006123456789.html']}
)
for item in client.dataset(run['defaultDatasetId']).iterate_items():
    price = item.get('price', 'N/A')
    sold = item.get('sold_count', '0')
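If you do walk the raw nested data yourself, the same defensive idea can be sketched as a small helper that tries several candidate paths in order. The path names in the usage example are hypothetical placeholders, not AliExpress's actual keys.

```python
from typing import Any, Iterable, Sequence

_MISSING = object()


def dig(data: Any, path: Sequence, default: Any = None) -> Any:
    """Follow a sequence of dict keys / list indexes, returning default on any miss."""
    current = data
    for key in path:
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif (isinstance(current, list) and isinstance(key, int)
              and -len(current) <= key < len(current)):
            current = current[key]
        else:
            return default
    return current


def first_present(data: Any, paths: Iterable[Sequence], default: Any = None) -> Any:
    """Return the first non-None value found among several candidate paths."""
    for path in paths:
        value = dig(data, path, _MISSING)
        if value is not _MISSING and value is not None:
            return value
    return default
```

For example, `first_present(init_data, [("data", "priceModule", "price"), ("data", "priceComponent", "price")], "N/A")` keeps working if the data moves from one of those (made-up) locations to the other.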
Additionally, log the raw __INIT_DATA__ structure when your parser encounters unexpected shapes. This makes debugging much easier when AliExpress changes their data format:
import json
import logging
import os
from datetime import datetime

logger = logging.getLogger(__name__)

def dump_debug_data(init_data: dict, url: str) -> None:
    """Save raw init data for debugging when parsing fails."""
    os.makedirs("debug", exist_ok=True)
    # Timestamped filename so repeated failures do not overwrite each other
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"debug/init_data_{timestamp}.json"
    with open(filename, "w", encoding="utf-8") as f:
        json.dump({"url": url, "data": init_data}, f, indent=2)
    logger.info("Saved debug data to %s", filename)
Here is the complete JSON schema for a successfully scraped product. Every field in this schema maps to a specific extraction in the parse_init_data function above:
{
  "title": "Wireless Bluetooth Earbuds TWS Headphones",
  "product_id": "1005006123456789",
  "category_id": "44",
  "price": "US $12.99",
  "original_price": "US $25.98",
  "discount": "50%",
  "currency": "USD",
  "sold_count": "5,000+",
  "sold_count_raw": 5234,
  "star_rating": "4.8",
  "review_count": 1247,
  "positive_rate": "96.2%",
  "seller_name": "TechGadgets Official Store",
  "seller_id": "912345678",
  "seller_rating": "97.5%",
  "store_followers": 45230,
  "ships_from": "CN",
  "free_shipping": true,
  "variants": [
    {
      "name": "Color",
      "options": [
        {"name": "Black", "image": "https://ae01.alicdn.com/..."},
        {"name": "White", "image": "https://ae01.alicdn.com/..."}
      ]
    },
    {
      "name": "Ships From",
      "options": [
        {"name": "China", "image": ""},
        {"name": "United States", "image": ""}
      ]
    }
  ],
  "images": [
    "https://ae01.alicdn.com/kf/image1.jpg",
    "https://ae01.alicdn.com/kf/image2.jpg"
  ],
  "source_url": "https://www.aliexpress.com/item/1005006123456789.html",
  "scrape_status": "success"
}
For batch operations, wrap the results in a container with metadata:
{
  "scrape_run": {
    "timestamp": "2026-03-30T14:22:00Z",
    "total_urls": 50,
    "successful": 47,
    "failed": 3,
    "avg_time_per_product": 8.2
  },
  "products": [
    { "...product data..." },
    { "...product data..." }
  ],
  "errors": [
    {"url": "https://...", "error": "CAPTCHA detected", "timestamp": "..."}
  ]
}
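One way to assemble that container from a list of per-product results is sketched below. It assumes each result is a dict carrying `scrape_status` and `source_url` as in the product schema, and that failed results also carry `error` and `timestamp` keys; those last two are assumptions of this sketch.

```python
from datetime import datetime, timezone


def build_run_report(results: list, elapsed_seconds: float) -> dict:
    """Wrap per-product results in the batch container with summary metadata."""
    products = [r for r in results if r.get("scrape_status") == "success"]
    errors = [
        {
            "url": r.get("source_url"),
            "error": r.get("error", "unknown"),
            "timestamp": r.get("timestamp"),
        }
        for r in results
        if r.get("scrape_status") != "success"
    ]
    total = len(results)
    return {
        "scrape_run": {
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "total_urls": total,
            "successful": len(products),
            "failed": len(errors),
            # Average wall-clock time per URL, guarded against an empty batch.
            "avg_time_per_product": round(elapsed_seconds / total, 1) if total else 0.0,
        },
        "products": products,
        "errors": errors,
    }
```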
Product page scraping gets you detailed data on individual items, but often you need to discover products first. AliExpress search and category pages list dozens of products per page with summary data.
AliExpress search pages use generated class names such as multi--titleText--3eOiq, where the trailing suffix is not stable. Using partial matches with [class*='multi--titleText'] is more resilient than exact class names.
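To illustrate the partial-match idea outside a browser, here is a standard-library sketch that collects the text of every element whose class attribute contains a given fragment, which is the same test that `[class*='...']` performs. The sample class suffixes in the usage note are made up.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PartialClassTextExtractor(HTMLParser):
    """Collect text from elements whose class attribute contains a fragment."""

    def __init__(self, class_fragment: str):
        super().__init__()
        self.class_fragment = class_fragment
        self.matches = []
        self._depth = 0      # > 0 while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return           # void elements never get a closing tag
        if self._depth:
            self._depth += 1
            return
        class_value = dict(attrs).get("class") or ""
        if self.class_fragment in class_value:
            self._depth = 1
            self._buffer = []

    def handle_startendtag(self, tag, attrs):
        pass                 # self-closing tags carry no text

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.matches.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def texts_by_class_fragment(html: str, fragment: str) -> list:
    parser = PartialClassTextExtractor(fragment)
    parser.feed(html)
    return parser.matches
```

Both `multi--titleText--3eOiq` and a later rename such as `multi--titleText--9zXk2` match the fragment `multi--titleText`, so the extractor survives a suffix change.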
AliExpress's bot detection is multi-layered. Here are the specific techniques that matter, ranked by impact:
Residential proxies are the single most important factor. From a datacenter IP (AWS, GCP, DigitalOcean), you will get blocked within 5-10 requests regardless of what other techniques you use. Residential proxies route your traffic through real ISP connections, making your requests indistinguishable from a real home user.
For AliExpress specifically, ThorData provides residential proxies with geo-targeting that works well. Their ability to target specific countries matters because AliExpress shows different pricing and availability based on the requester's location. If you are tracking prices for a US-facing dropshipping store, you want your proxy to exit from a US residential IP.
# Use the managed scraper: no maintenance, no blocks, no auth headaches
from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')  # get yours at apify.com
run = client.actor('cryptosignals/aliexpress-scraper').call(
    run_input={'keywords': ['bluetooth earbuds'], 'maxResults': 50}
)
for item in client.dataset(run['defaultDatasetId']).iterate_items():
    print(item)
Even with perfect anti-detection, hitting AliExpress too fast from any single IP will trigger rate limiting. Here is a production-ready rotation strategy:
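A minimal sketch of per-proxy pacing, assuming a simple round-robin pool. The `min_interval` and `jitter` defaults are placeholder values to tune against your own block rates, not measured AliExpress limits, and the class name is illustrative.

```python
import itertools
import random
import time


class ProxyRotator:
    """Round-robin over proxies, enforcing a minimum gap between uses of each one."""

    def __init__(self, proxies, min_interval=30.0, jitter=10.0,
                 clock=time.monotonic, sleep=time.sleep):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(list(proxies))
        self.min_interval = min_interval
        self.jitter = jitter
        self._clock = clock
        self._sleep = sleep
        self._last_used = {}

    def acquire(self) -> str:
        """Return the next proxy, sleeping first if it was used too recently."""
        proxy = next(self._cycle)
        last = self._last_used.get(proxy)
        if last is not None:
            # Random jitter keeps the request timing from looking mechanical.
            required_gap = self.min_interval + random.uniform(0, self.jitter)
            wait = required_gap - (self._clock() - last)
            if wait > 0:
                self._sleep(wait)
        self._last_used[proxy] = self._clock()
        return proxy
```

The `clock` and `sleep` parameters exist only so the pacing logic can be exercised without real delays.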
For AliExpress specifically, these rate limits apply as of 2026:
Cause: The page either did not load (network issue) or loaded a different page than expected (CAPTCHA, auth wall, country redirect).
Fix: Check what page actually loaded before waiting for selectors:
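A rough sketch of such a check, given the final URL and HTML of the loaded page (for example Playwright's `page.url` and `page.content()`). The marker strings are guesses for illustration and should be replaced with whatever you actually observe on blocked pages.

```python
from urllib.parse import urlparse

# Hypothetical markers; adjust to what blocked or redirected pages really contain.
CAPTCHA_MARKERS = ("captcha", "punish", "slide to verify")
LOGIN_MARKERS = ("login.aliexpress", "/login")


def classify_page(final_url: str, html: str,
                  expected_host: str = "www.aliexpress.com") -> str:
    """Classify a loaded page as product, captcha, login, redirect, or unexpected."""
    url_lower = final_url.lower()
    body_lower = html.lower()
    if any(m in url_lower or m in body_lower for m in CAPTCHA_MARKERS):
        return "captcha"
    if any(m in url_lower for m in LOGIN_MARKERS):
        return "login"
    parsed = urlparse(final_url)
    if parsed.netloc.lower() != expected_host:
        return "redirect"   # e.g. a country-specific subdomain
    if "/item/" not in parsed.path:
        return "unexpected"
    return "product"
```

Only wait for product selectors when the result is `"product"`; any other value is a signal to rotate the proxy or retry instead of timing out.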
Cause: JavaScript did not finish executing before extraction. Or the product uses dynamic pricing that requires additional API calls.
Fix: Use window.__INIT_DATA__ instead of DOM selectors. The data is available in the JS variable even before it renders in the DOM.
Cause: Your IP range is blocked. Datacenter IPs almost always get 403.
Fix: Switch to residential proxies. If you are already using residential proxies, your proxy provider's IP pool might be burned for AliExpress. Try a different provider or different geographic region.
Cause: AliExpress serves different prices based on geographic location, user history, and whether the visitor appears to be a new customer.
Fix: Standardize your locale, timezone, and proxy location. For consistent pricing data, always use the same country for proxy exit and browser locale.
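One way to keep these settings aligned is to derive them from a single country code. In the sketch below, `locale` and `timezone_id` are keyword arguments accepted by Playwright's `browser.new_context()`; the profile table and function name are illustrative, and how you select the proxy exit country depends on your provider.

```python
# Illustrative profiles; extend with the markets you actually track.
PROFILES = {
    "US": {"locale": "en-US", "timezone_id": "America/New_York"},
    "GB": {"locale": "en-GB", "timezone_id": "Europe/London"},
    "DE": {"locale": "de-DE", "timezone_id": "Europe/Berlin"},
}


def context_options(country: str) -> dict:
    """Return browser-context settings that match the proxy exit country."""
    try:
        profile = PROFILES[country.upper()]
    except KeyError:
        raise ValueError(f"no profile defined for country {country!r}") from None
    # Return a copy so callers cannot mutate the shared table.
    return dict(profile)
```

Usage would look like `browser.new_context(**context_options("US"))`, paired with a proxy that exits in the same country.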
Cause: AliExpress's bot detection has flagged your session. This is more severe than a CAPTCHA -- it usually means your browser fingerprint or behavior pattern was flagged.
Fix: Kill the browser instance entirely. Rotate to a fresh proxy. Wait at least 10 minutes before retrying. Do not reuse any cookies or session state from the flagged session.
Cause: The page loaded an error state, or AliExpress has changed how they inject the data for certain product types (e.g., digital products, pre-order items).
Fix: Add a retry with a fresh session. If it consistently fails for specific products, those products may use a different frontend rendering path. Fall back to DOM extraction for those items.
For production use, you need a batch processor that handles failures gracefully and retries with different proxies:
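The retry loop can be sketched as follows, assuming a `scrape(url, proxy)` callable that returns a product dict and raises an exception on failure. That callable, the backoff values, and the error format are assumptions of this sketch.

```python
import itertools
import time


def run_batch(urls, scrape, proxies, max_attempts=3, backoff=5.0, sleep=time.sleep):
    """Scrape each URL, retrying failures on the next proxy in the pool."""
    proxy_cycle = itertools.cycle(list(proxies))
    products, errors = [], []
    for url in urls:
        last_error = None
        for attempt in range(1, max_attempts + 1):
            proxy = next(proxy_cycle)   # every attempt uses a different proxy
            try:
                products.append(scrape(url, proxy))
                break
            except Exception as exc:    # broad on purpose: any failure triggers a retry
                last_error = exc
                if attempt < max_attempts:
                    sleep(backoff * attempt)   # linear backoff between attempts
        else:
            # The loop finished without a break, so every attempt failed.
            errors.append({"url": url, "error": str(last_error),
                           "attempts": max_attempts})
    return products, errors
```

A failed URL never aborts the batch; it ends up in `errors` so it can be re-queued later.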
Dropshipping product research is the most common use case. Dropshippers need to find products with high sales volume, good ratings, and reliable sellers. By scraping search results for trending categories and then deep-scraping the top products, you can identify winning products before they saturate the market. Key fields: sold_count, star_rating, seller_rating, free_shipping.
E-commerce businesses that source from AliExpress need to know when supplier prices change. A daily scrape of your product catalog URLs, compared against historical prices stored in a database, lets you trigger alerts when a product's price drops (buying opportunity) or spikes (time to find an alternative supplier). Key fields: price, original_price, discount.
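The comparison step might look like the sketch below. It assumes prices are display strings in the "US $12.99" style from the schema (dot as the decimal separator), keyed by product ID; the 5% threshold and the function names are arbitrary choices for illustration.

```python
import re
from decimal import Decimal
from typing import Optional


def parse_price(text: Optional[str]) -> Optional[Decimal]:
    """Extract the numeric part of a price string such as 'US $12.99'."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    if not match:
        return None
    return Decimal(match.group().replace(",", ""))


def price_alerts(previous: dict, current: dict, threshold_pct=Decimal("5")) -> list:
    """Compare two {product_id: price_string} snapshots and report large moves."""
    alerts = []
    for product_id, new_text in current.items():
        old = parse_price(previous.get(product_id))
        new = parse_price(new_text)
        if old is None or new is None or old == 0:
            continue   # new product or unparseable price: nothing to compare
        change = (new - old) / old * 100
        if abs(change) >= threshold_pct:
            alerts.append({
                "product_id": product_id,
                "old": old,
                "new": new,
                "change_pct": change.quantize(Decimal("0.1")),
                "kind": "drop" if change < 0 else "spike",
            })
    return alerts
```

In practice `previous` would be loaded from your price history database and `current` from the latest daily scrape.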
If you sell on Amazon or Shopify, knowing the AliExpress source price for competing products tells you whether competitors are operating on thin margins or have room to undercut you. Cross-reference AliExpress product titles with Amazon listings to map the supply chain. Key fields: title, price, images (for visual matching).
Aggregating search result data across categories over time reveals which product types are gaining or losing traction. A product that shows accelerating sales velocity (increasing sold_count between weekly scrapes) is trending up. Key fields: sold_count_raw, review_count, category_id.
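As a small sketch of that velocity check, assuming you have a chronological list of weekly `sold_count_raw` snapshots for one product (the function names are illustrative):

```python
def weekly_velocity(sold_counts: list) -> list:
    """Units sold between consecutive weekly snapshots."""
    return [later - earlier for earlier, later in zip(sold_counts, sold_counts[1:])]


def is_accelerating(sold_counts: list) -> bool:
    """True when each week's sales exceed the previous week's sales."""
    velocity = weekly_velocity(sold_counts)
    if len(velocity) < 2:
        return False   # need at least three snapshots to compare two weeks
    return all(later > earlier for earlier, later in zip(velocity, velocity[1:]))
```

Snapshots of 5000, 5100, 5250, and 5500 give weekly sales of 100, 150, and 250, which this check treats as trending up.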
Before committing to a supplier for bulk orders, scrape all products from their store and aggregate their ratings. Sellers with consistently high star ratings and positive feedback rates across many products are more reliable than sellers with a single highly-rated product. Key fields: seller_rating, store_followers, star_rating, positive_rate.
If you would rather not maintain Playwright code, proxy rotation, and anti-detection patches yourself, managed scraping tools handle all of this for you.
I built an AliExpress Product Scraper on Apify that returns 20+ fields per product including soldCount, starRating, reviewCount, originalPrice, discount, full SKU variant data, and seller metrics. It handles proxy rotation, CAPTCHA detection, retries, and AliExpress's constantly-changing page structure internally.
The advantage is zero maintenance. When AliExpress changes their frontend (which happens every few weeks), the managed scraper absorbs the update. Your pipeline keeps running without you debugging broken selectors at midnight.
AliExpress scraping in 2026 comes down to three non-negotiable requirements: a headless browser (Playwright), residential proxies, and respect for rate limits. Skip any one of these and you will spend more time fighting blocks than actually collecting data.
The window.__INIT_DATA__ approach is the most durable extraction method because it pulls from the same data source that AliExpress's own frontend uses. DOM selectors break monthly; the JavaScript data structure changes maybe twice a year.
For small-scale research (under 100 products per day), the code in this guide running with a few residential proxies is more than sufficient. For larger volumes, consider the managed Apify scraper or building out a distributed system with a proper proxy rotation infrastructure.
Built by Crypto Volume Signal Scanner -- tools for developers who work with web data.