DEAR PEOPLE FROM THE FUTURE: Here's what we've figured out so far...

Welcome! This is a Q&A website for computer programmers and users alike, focused on helping fellow programmers and users. Read more

What are you stuck on? Ask a question and hopefully somebody will be able to help you out!
+2 votes
by

2 Answers

+1 vote
 
Best answer

I remember using one called warrick but the links appear to be dead (this could be a mirror, maybe?). However I've found other two tools, one called wayback-machine-downloader (Ruby) and the other called waybackpack (Python).

Wayback Machine Downloader

Usage: wayback_machine_downloader http://example.com

Download an entire website from the Wayback Machine.

Optional options:
    -d, --directory PATH             Directory to save the downloaded files into
				     Default is ./websites/ plus the domain name
    -s, --all-timestamps             Download all snapshots/timestamps for a given website
    -f, --from TIMESTAMP             Only files on or after timestamp supplied (ie. 20060716231334)
    -t, --to TIMESTAMP               Only files on or before timestamp supplied (ie. 20100916231334)
    -e, --exact-url                  Download only the url provied and not the full site
    -o, --only ONLY_FILTER           Restrict downloading to urls that match this filter
				     (use // notation for the filter to be treated as a regex)
    -x, --exclude EXCLUDE_FILTER     Skip downloading of urls that match this filter
				     (use // notation for the filter to be treated as a regex)
    -a, --all                        Expand downloading to error files (40x and 50x) and redirections (30x)
    -c, --concurrency NUMBER         Number of multiple files to download at a time
				     Default is one file at a time (ie. 20)
    -p, --maximum-snapshot NUMBER    Maximum snapshot pages to consider (Default is 100)
				     Count an average of 150,000 snapshots per page
    -l, --list                       Only list file urls in a JSON format with the archived timestamps, won't download anything

waybackpack

usage: waybackpack [-h] (-d DIR | --list) [--raw] [--root ROOT]
                   [--from-date FROM_DATE] [--to-date TO_DATE]
                   [--user-agent USER_AGENT] [--follow-redirects]
                   [--uniques-only] [--collapse COLLAPSE] [--quiet]
                   url

positional arguments:
  url                   The URL of the resource you want to download.

optional arguments:
  -h, --help            show this help message and exit
  -d DIR, --dir DIR     Directory to save the files. Will create this
                        directory if it doesn't already exist.
  --list                Instead of downloading the files, only print the list
                        of snapshots.
  --raw                 Fetch file in its original state, without any
                        processing by the Wayback Machine or waybackpack.
  --root ROOT           The root URL from which to serve snapshotted
                        resources. Default: 'https://web.archive.org'
  --from-date FROM_DATE
                        Timestamp-string indicating the earliest snapshot to
                        download. Should take the format YYYYMMDDhhss, though
                        you can omit as many of the trailing digits as you
                        like. E.g., '201501' is valid.
  --to-date TO_DATE     Timestamp-string indicating the latest snapshot to
                        download. Should take the format YYYYMMDDhhss, though
                        you can omit as many of the trailing digits as you
                        like. E.g., '201604' is valid.
  --user-agent USER_AGENT
                        The User-Agent header to send along with your requests
                        to the Wayback Machine. If possible, please include
                        the phrase 'waybackpack' and your email address. That
                        way, if you're battering their servers, they know who
                        to contact. Default: 'waybackpack'.
  --follow-redirects    Follow redirects.
  --uniques-only        Download only the first version of duplicate files.
  --collapse COLLAPSE   An archive.org `collapse` parameter. Cf.:
                        https://github.com/internetarchive/wayback/blob/master
                        /wayback-cdx-server/README.md#collapsing
  --quiet               Don't log progress to stderr.
  --max-retries MAX_RETRIES
                        How many times to try accessing content with 4XX or
                        5XX status code before skipping?
by
selected by
+1 vote

If you append the suffix if_ to the date in the URL, it will hide the archive toolbar. Like this https://web.archive.org/web/20210715if_/https://example.org

If you append the suffix id_ to the date in the URL, it will give you the original page. Like this https://web.archive.org/web/20210715id_/https://example.org

Therefore you can wget the page

wget -p https://web.archive.org/web/20210715id_/https://example.org

It's an old trick that I cannot find documented anywhere. Cannot even remember where I learned it.

by
Contributions licensed under CC0
...