ArchiveBox is an open-source application for downloading and archiving websites so they can be viewed offline. It has a good user interface and is simple to use. This post walks through a simple setup and basic usage.

Setup

For applications like this, it's usually easiest to use Docker. I run ArchiveBox at home and don't need SSL, so the basic docker-compose configuration is sufficient. The docker-compose.yml file is in the ArchiveBox GitHub repository. Below is the quick setup from the project's website. Make a note of the admin password you set during setup; you will need it to log in once the container is running.

# make directory to contain compose file and download it
mkdir ~/archivebox && cd ~/archivebox
curl -O 'https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/master/docker-compose.yml'

# run initial admin account setup
docker-compose run archivebox init --setup

# start the container fully
docker-compose up -d
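Newer Docker installations ship Compose v2 as the "docker compose" plugin rather than the standalone docker-compose binary used above. The sketch below is a hypothetical compose helper (not part of ArchiveBox or Docker) that uses whichever one is installed, so the same setup steps work on either.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefer the Compose v2 plugin ("docker compose"),
# fall back to the standalone v1 binary ("docker-compose").
compose() {
  if docker compose version >/dev/null 2>&1; then
    docker compose "$@"
  else
    docker-compose "$@"
  fi
}

# The quick-setup steps from above, rewritten to use the helper:
#   mkdir -p ~/archivebox && cd ~/archivebox
#   curl -O 'https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/master/docker-compose.yml'
#   compose run archivebox init --setup
#   compose up -d
```

The check runs "docker compose version" quietly and only falls back when that fails, so nothing else about the commands changes.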

This is what my compose file looks like after removing the commented-out lines.

version: '2.4'

services:
    archivebox:
        image: ${DOCKER_IMAGE:-archivebox/archivebox:master}
        command: server --quick-init 0.0.0.0:8000
        ports:
            - 8000:8000
        environment:
            - ALLOWED_HOSTS=*
        volumes:
            - ./data:/data
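If you would rather write the trimmed file directly than download and edit the upstream one, a quoted heredoc does it. The function name write_compose_file is only for illustration. Quoting the EOF delimiter stops the shell from expanding ${DOCKER_IMAGE:-...}, so that substitution is left for Compose to perform.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the trimmed compose file into a directory
# (default: the current one). The quoted 'EOF' keeps ${...} literal.
write_compose_file() {
  local dir="${1:-.}"
  mkdir -p "$dir"
  cat > "$dir/docker-compose.yml" <<'EOF'
version: '2.4'

services:
    archivebox:
        image: ${DOCKER_IMAGE:-archivebox/archivebox:master}
        command: server --quick-init 0.0.0.0:8000
        ports:
            - 8000:8000
        environment:
            - ALLOWED_HOSTS=*
        volumes:
            - ./data:/data
EOF
}
```

For example, write_compose_file ~/archivebox followed by the init and up commands gives the same setup as above.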

If you need SSL, see the original upstream compose file, which has a section on running Nginx with SSL in front of ArchiveBox.

Logging In

The admin user you created can now log in to the web interface. Open http://localhost:8000 and log in with the credentials you set during setup.
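The server can take a few seconds to come up after the container starts. This is a minimal sketch of a hypothetical wait_for_ui function that polls the address above with curl until it answers; the 30 attempts and 2-second pause are arbitrary assumptions, not ArchiveBox defaults.

```shell
#!/usr/bin/env bash
# Poll a URL until it responds or we run out of attempts.
# Arguments: URL (default http://localhost:8000), max attempts (default 30).
wait_for_ui() {
  local url="${1:-http://localhost:8000}"
  local attempts="${2:-30}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # -s: quiet, -f: fail on HTTP 4xx/5xx, --max-time: avoid hanging,
    # -o /dev/null: discard the page body. Redirects (3xx) count as "up".
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      echo "ArchiveBox is up at $url"
      return 0
    fi
    # Only sleep if another attempt follows.
    if (( i < attempts )); then
      sleep 2
    fi
  done
  echo "No response from $url after $attempts attempts" >&2
  return 1
}
```

Running wait_for_ui right after the up -d step tells you when the login page should be reachable.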

Archiving a Website

To archive a website, click “ADD+” at the top of the page.

The page now shows a text box for entering URLs. Enter the URL you want to archive. For this tutorial we will use https://isitchristmas.com/ because it is a tiny site.

Below the text box, choose which archive methods to use. If you don't pick any, ArchiveBox uses all of them. I usually use wget.

Now click “Add URLs and archive +” to start archiving. Some websites take a while, so you can leave the page and check back later. The new snapshot appears in the “Snapshots” section.

CLI Commands

You can do the same from the command line inside the Docker container. To open a shell in the container, run:

docker exec -itu archivebox archivebox_archivebox_1 bash
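As an alternative to an interactive shell, you can run one archivebox command at a time through docker exec. The ab function and the ARCHIVEBOX_CONTAINER override below are hypothetical names for this sketch; the default container name is the one used in the command above. The -i flag keeps stdin attached so URLs can be piped in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one archivebox command in the container as the
# archivebox user. Set ARCHIVEBOX_CONTAINER if your container name differs.
ab() {
  docker exec -i --user=archivebox \
    "${ARCHIVEBOX_CONTAINER:-archivebox_archivebox_1}" archivebox "$@"
}
```

Usage: ab add https://isitchristmas.com/ or ab status from the host.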

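If Docker reports that no such container exists, run docker ps to find its actual name; Docker Compose v2 names it archivebox-archivebox-1 instead. Then, from the container's shell, add a website: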

archivebox add https://isitchristmas.com/
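archivebox add also accepts URLs on stdin, which is handy for a batch. The sketch below is a hypothetical add_from_file function that skips blank lines and # comments and archives with wget only, mirroring the web UI choice. It assumes your ArchiveBox version supports the --extract option of archivebox add; check archivebox add --help before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive every URL in a text file (one per line),
# ignoring blank lines and "#" comments. Run inside the container shell.
# Assumption: "archivebox add --extract=..." exists in your version.
add_from_file() {
  local file="$1"
  local methods="${2:-wget}"   # comma-separated extractor names
  grep -Ev '^[[:space:]]*(#|$)' "$file" | archivebox add --extract="$methods"
}
```

For example, add_from_file urls.txt archives each listed site with wget, and add_from_file urls.txt wget,pdf would also request a PDF.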

The archivebox command can do a lot more on its own. These examples come from the "Example Use" section of its built-in help:

Example Use:
    mkdir my-archive; cd my-archive/
    archivebox init
    archivebox status

    archivebox add https://example.com/some/page
    archivebox add --depth=1 ~/Downloads/bookmarks_export.html

    archivebox list --sort=timestamp --csv=timestamp,url,is_archived
    archivebox schedule --every=day https://example.com/some/feed.rss
    archivebox update --resume=15109948213.123 
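The list --csv example above can feed a quick progress check. This is a sketch of a hypothetical count_archived filter; it assumes the output starts with a header row and that is_archived is the third column, printed as true or false (possibly quoted, any case). URLs that contain commas would break this simple split.

```shell
#!/usr/bin/env bash
# Count archived vs. pending rows from CSV with columns
# timestamp,url,is_archived read on stdin.
count_archived() {
  awk -F',' '
    NR == 1 { next }                 # skip the header row
    {
      flag = tolower($3)
      gsub(/[" ]/, "", flag)         # strip quotes and padding
      if (flag == "true") archived++; else pending++
    }
    END { printf "archived=%d pending=%d\n", archived, pending }
  '
}
```

Usage inside the container: archivebox list --csv=timestamp,url,is_archived | count_archived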

Once a URL has been added from the command line, its archive also appears in the web UI.