NASA's Astronomy Picture of the Day has been running since June 16, 1995. Every single day for more than three decades, an astronomer has hand-picked an image of the universe (a galaxy, a comet, a Mars rover panorama, a Hubble deep field) and written a short explanation aimed at a curious general audience. There are now more than 11,000 entries in the archive, and the project is still going. As an open source of high-quality, well-curated, clearly credited astronomy media, APOD has no real equivalent.
If you're building anything around space or science education, or simply need a beautiful image and a paragraph of context per day, APOD is the obvious data source. The catch is the same as with most government datasets: it exists, it's free, and the moment you try to use it programmatically at any scale you start running into edge cases. This post walks through what APOD data actually looks like, what people use it for, and how to pull it cleanly without writing or maintaining the integration yourself.
Each APOD entry is a structured record. The interesting fields are the picture URL, an HD version, a title, a longer explanation written by the curating astronomer, the date, and an optional copyright credit when the image isn't public domain. About 5–10% of entries are videos rather than images — YouTube embeds, Vimeo clips, or interactive panoramas. Anything you build needs to handle both.
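As a rough sketch of what "handle both" can look like in practice, here is one way to model an entry in Python. The field names follow the sample record shown later in this post; the `ApodEntry` class, the `parse_entry` helper, and the choice to treat a missing `media_type` as an image are illustrative assumptions, not part of any official client.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApodEntry:
    """One APOD record. Hypothetical model; fields mirror the sample output."""
    date: str
    title: str
    explanation: str
    url: str
    media_type: str = "image"
    hdurl: Optional[str] = None
    copyright: Optional[str] = None

    @property
    def is_image(self) -> bool:
        return self.media_type == "image"

    def best_image_url(self) -> Optional[str]:
        """Prefer the HD version, fall back to the standard URL.

        Returns None for videos and other non-image entries so callers
        have to decide explicitly what to render instead.
        """
        if not self.is_image:
            return None
        return self.hdurl or self.url


def parse_entry(raw: dict) -> ApodEntry:
    """Build an ApodEntry from a raw dict, tolerating missing optional fields."""
    return ApodEntry(
        date=raw["date"],
        title=raw["title"].strip(),
        explanation=(raw.get("explanation") or "").strip(),
        url=raw["url"],
        # Assumption: entries without a media_type are treated as images.
        media_type=raw.get("media_type") or "image",
        hdurl=raw.get("hdurl") or None,
        # Empty or whitespace-only credits are normalized to None.
        copyright=(raw.get("copyright") or "").strip() or None,
    )
```

A widget can then branch on `entry.is_image` and embed `entry.url` directly when the entry is a video.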
The dataset is small enough to walk in full (one entry per day since 1995) but large enough that you don't want to babysit a backfill. The image hosting also moves around occasionally, and older entries sometimes have inconsistent metadata: missing copyright fields, broken thumbnail URLs, slightly different HTML formatting in the explanation. Anyone who has tried to write their own APOD integration has hit at least one of these.
The official APOD endpoint is rate-limited per API key, returns slightly different shapes depending on the parameters you send (a single object for one date, an array for a date range or a random count), and gets cranky on long date ranges. Some entries return a video URL where you'd expect an image. Backfilling thirty years of history without rate-limit retries and schema normalization is a weekend you're not getting back.
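To make the backfill problem concrete, here is a minimal sketch of the two pieces a do-it-yourself integration usually needs: splitting a long date range into smaller chunks, and retrying with exponential backoff when a chunk is rate-limited. The `fetch` callable, the `RateLimited` exception, the 30-day chunk size, and the delay values are all hypothetical placeholders; the actual HTTP call is left to whatever client you use.

```python
import time
from datetime import date, timedelta
from typing import Callable, Dict, Iterator, List, Tuple

Fetch = Callable[[date, date], List[Dict]]


class RateLimited(Exception):
    """Hypothetical error that a fetch function raises on an HTTP 429."""


def date_chunks(start: date, end: date, days: int = 30) -> Iterator[Tuple[date, date]]:
    """Yield inclusive (chunk_start, chunk_end) pairs covering start..end."""
    cur = start
    while cur <= end:
        chunk_end = min(cur + timedelta(days=days - 1), end)
        yield cur, chunk_end
        cur = chunk_end + timedelta(days=1)


def fetch_with_retry(
    fetch: Fetch,
    start: date,
    end: date,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict]:
    """Call fetch(start, end), backing off 1s, 2s, 4s, ... on RateLimited."""
    for attempt in range(max_attempts):
        try:
            return fetch(start, end)
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller see the failure
            sleep(base_delay * 2 ** attempt)
    return []  # only reached when max_attempts is 0


def backfill(fetch: Fetch, start: date, end: date, **retry_kwargs) -> List[Dict]:
    """Walk start..end chunk by chunk and collect every record."""
    records: List[Dict] = []
    for chunk_start, chunk_end in date_chunks(start, end):
        records.extend(fetch_with_retry(fetch, chunk_start, chunk_end, **retry_kwargs))
    return records
```

The `sleep` parameter is injectable so the retry logic can be exercised in tests without actually waiting.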
Most production builds also want some form of residential proxies in front of any large historical fetch — not because APOD is hostile, but because shared cloud IPs eat 429s long before per-key quotas do.
We built and maintain cryptosignals/nasa-apod-scraper on Apify so you don't have to. It handles the rate limits, the schema normalization, and the video-vs-image handling. You give it a small JSON input describing what you want, and it returns clean structured records. Pay-per-result, no API key juggling.
{
  "action": "today"
}
Returns:
[
  {
    "date": "2026-05-03",
    "title": "M31: The Andromeda Galaxy",
    "explanation": "The most distant object easily visible to the unaided eye...",
    "url": "https://apod.nasa.gov/apod/image/2605/M31_HubbleSpitzerGendler_960.jpg",
    "hdurl": "https://apod.nasa.gov/apod/image/2605/M31_HubbleSpitzerGendler_4096.jpg",
    "media_type": "image",
    "copyright": "Robert Gendler"
  }
]
For five random entries:
{
  "action": "random",
  "count": 5
}
For every entry in a date range:
{
  "action": "range",
  "start_date": "2025-01-01",
  "end_date": "2025-12-31"
}
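If you generate these inputs from code rather than by hand, a small builder can catch mistakes before a run is started. The sketch below only covers the three input shapes shown above; the `build_input` helper and its validation rules are assumptions for illustration, and the actor itself may accept or reject other combinations.

```python
from datetime import date
from typing import Any, Dict, Optional

# First APOD entry, per the start date mentioned at the top of this post.
FIRST_APOD = date(1995, 6, 16)


def build_input(
    action: str,
    *,
    count: Optional[int] = None,
    start_date: Optional[date] = None,
    end_date: Optional[date] = None,
) -> Dict[str, Any]:
    """Return a JSON-serializable input dict for one of the three actions."""
    if action == "today":
        return {"action": "today"}

    if action == "random":
        if count is None or count < 1:
            raise ValueError("random needs a positive count")
        return {"action": "random", "count": count}

    if action == "range":
        if start_date is None or end_date is None:
            raise ValueError("range needs start_date and end_date")
        if start_date > end_date:
            raise ValueError("start_date must not be after end_date")
        if start_date < FIRST_APOD:
            raise ValueError("APOD has no entries before 1995-06-16")
        return {
            "action": "range",
            "start_date": start_date.isoformat(),
            "end_date": end_date.isoformat(),
        }

    raise ValueError(f"unknown action: {action!r}")
```

For example, `build_input("range", start_date=date(2025, 1, 1), end_date=date(2025, 12, 31))` produces the same dict as the range input above.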
The output is one normalized record per day, with consistent field names, the media type clearly tagged, and broken/missing fields handled. Whatever you do with the result — render a widget, send an email, fill a database, train a model — you start from clean data instead of a pile of edge cases.
If APOD is just one input in a larger space or science product, the actor pattern composes well: fetch APOD daily, store the records, and combine with whatever other feeds you want (launches, telescope schedules, near-Earth object data) without each integration becoming its own maintenance burden.
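One simple way to do the "store the records" step is a local SQLite table keyed on the date, so re-running a daily fetch overwrites the existing row instead of duplicating it. This is only a sketch: the table name, the column layout (copied from the sample record), and the `store` helper are assumptions, and a production setup may well prefer a different database.

```python
import sqlite3
from typing import Iterable, Mapping

FIELDS = ("date", "title", "explanation", "url", "hdurl", "media_type", "copyright")

SCHEMA = """
CREATE TABLE IF NOT EXISTS apod (
    date        TEXT PRIMARY KEY,
    title       TEXT,
    explanation TEXT,
    url         TEXT,
    hdurl       TEXT,
    media_type  TEXT,
    copyright   TEXT
)
"""

# INSERT OR REPLACE keeps one row per date: a re-fetched day replaces the old row.
UPSERT = """
INSERT OR REPLACE INTO apod (date, title, explanation, url, hdurl, media_type, copyright)
VALUES (:date, :title, :explanation, :url, :hdurl, :media_type, :copyright)
"""


def open_store(path: str = "apod.db") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store(conn: sqlite3.Connection, records: Iterable[Mapping]) -> int:
    """Write records, filling any missing optional field with NULL."""
    rows = [{field: record.get(field) for field in FIELDS} for record in records]
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, rows)
    return len(rows)
```

Other feeds can then live in their own tables in the same file and be joined against `apod.date` when needed.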
Run cryptosignals/nasa-apod-scraper on Apify. Free signup credits cover thousands of records, which is more than enough for the first version of whatever you're building.