ArchiveBox Linux Setup
ArchiveBox is an open-source application that downloads and archives websites so they can be viewed offline. It has a clean web interface and is simple to use. This post walks through a quick setup and basic usage.
Setup
For applications like this, it’s usually easiest to just use Docker. I use ArchiveBox at home and don’t need SSL, so the basic docker-compose configuration is sufficient. You can find the docker-compose.yml file here. Below is the quick setup from their website. Make a note of the admin username and password you set during the init step; you will need them to log in once the container is running.
# make directory to contain compose file and download it
mkdir ~/archivebox && cd ~/archivebox
curl -O 'https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/master/docker-compose.yml'
# run initial admin account setup
docker-compose run archivebox init --setup
# start the container fully
docker-compose up -d
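Newer Docker installs ship Compose as a plugin invoked as `docker compose`, while the commands above use the standalone `docker-compose` binary. Here is a minimal sketch, assuming either one may be present, that picks whichever is available. The function name `pick_compose` is hypothetical and not part of Docker or ArchiveBox.

```shell
#!/bin/sh
# Print the Compose invocation available on this machine.
# pick_compose is a hypothetical helper name used only for illustration.
pick_compose() {
    if command -v docker >/dev/null 2>&1 && docker compose version >/dev/null 2>&1; then
        echo "docker compose"      # Compose v2 plugin
    elif command -v docker-compose >/dev/null 2>&1; then
        echo "docker-compose"      # standalone binary, as used in this post
    else
        echo "no Docker Compose found; install Docker first" >&2
        return 1
    fi
}
```

With this in place, something like `COMPOSE=$(pick_compose) && $COMPOSE up -d` would run the same step with either variant.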
This is what my compose file looks like after removing the commented-out lines.
version: '2.4'
services:
  archivebox:
    image: ${DOCKER_IMAGE:-archivebox/archivebox:master}
    command: server --quick-init 0.0.0.0:8000
    ports:
      - 8000:8000
    environment:
      - ALLOWED_HOSTS=*
    volumes:
      - ./data:/data
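The `image:` line uses `${DOCKER_IMAGE:-archivebox/archivebox:master}`, which Compose interpolates with the same default-value rule as the shell: the variable's value if it is set and non-empty, otherwise the text after `:-`. Below is a small sketch of that rule in plain shell. `resolve_image` is a hypothetical name, and the `latest` tag is only an example value, so check which tags the project actually publishes.

```shell
#!/bin/sh
# Mirror the ${VAR:-default} rule used on the image: line of the compose file.
# resolve_image is a hypothetical helper, not part of Compose or ArchiveBox.
resolve_image() {
    echo "${DOCKER_IMAGE:-archivebox/archivebox:master}"
}

unset DOCKER_IMAGE
resolve_image          # variable unset: falls back to the default image

DOCKER_IMAGE=""
resolve_image          # variable empty: ":-" still falls back to the default

DOCKER_IMAGE="archivebox/archivebox:latest"   # example tag, an assumption
resolve_image          # variable set: the override wins
```

With Compose, the override can come from the calling shell's environment or from a `.env` file placed next to docker-compose.yml.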
If you need SSL, see the original compose file. It has a section on setting up Nginx with SSL in front of ArchiveBox.
Logging In
Your installation should now be reachable at http://localhost:8000. Log in with the admin credentials you created during setup, and ArchiveBox is ready to use.
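The container can take a moment to start answering requests after `docker-compose up -d`. Here is a rough sketch of a readiness check, assuming `curl` is installed on the host. `wait_for_url` is a hypothetical helper, and the default attempt count and delay are arbitrary choices.

```shell
#!/bin/sh
# Poll a URL until it answers or the attempts run out.
# wait_for_url is a hypothetical helper; the defaults are arbitrary choices.
#   $1: URL to poll   $2: max attempts (default 30)   $3: seconds between tries (default 2)
wait_for_url() {
    url=$1
    attempts=${2:-30}
    delay=${3:-2}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f: treat HTTP errors (>= 400) as failure; -s: quiet; -o: discard body
        if curl -fs -o /dev/null --max-time 5 "$url"; then
            echo "up after $i attempt(s): $url"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "gave up waiting for $url" >&2
    return 1
}
```

Calling `wait_for_url http://localhost:8000` after starting the container would block until the web interface responds or the attempts are used up.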
Archiving a Website
To archive a website, click “ADD+” at the top of the page.
The page will show a text box for URLs. Enter the URL you want to archive. For this tutorial we will use https://isitchristmas.com/ since it is a tiny site.
Below the text box, pick the archive methods you need. If you don’t pick any, it will archive using all of the methods. I usually use wget.
Now click “Add URLs and archive +” to begin the archiving process. Some websites take a while, so you can leave the page and check back later. Once it finishes, the archive will appear in the “snapshot” section.
CLI Commands
The same can be done from the command line inside the Docker container. To open a shell in the container, run:
docker exec -itu archivebox archivebox_archivebox_1 bash
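The container name `archivebox_archivebox_1` follows Compose's default pattern of project, service, and index, where the project defaults to the name of the directory holding the compose file. The standalone `docker-compose` (v1) joins these with underscores, while the `docker compose` plugin (v2) uses hyphens. Here is a rough sketch of that naming rule, with `default_container_name` as a hypothetical helper. It ignores `COMPOSE_PROJECT_NAME`, `container_name:` overrides, and Compose's stripping of unusual characters, so check `docker ps` if the command above fails.

```shell
#!/bin/sh
# Guess the default Compose container name for a service.
# default_container_name is a hypothetical helper for illustration only.
#   $1: directory holding docker-compose.yml
#   $2: service name
#   $3: "v1" (docker-compose, underscores) or "v2" (docker compose, hyphens)
default_container_name() {
    project=$(basename "$1" | tr '[:upper:]' '[:lower:]')
    case "$3" in
        v2) sep="-" ;;
        *)  sep="_" ;;
    esac
    echo "${project}${sep}$2${sep}1"
}

default_container_name "$HOME/archivebox" archivebox v1   # archivebox_archivebox_1
default_container_name "$HOME/archivebox" archivebox v2   # archivebox-archivebox-1
```

Alternatively, running `docker-compose exec --user archivebox archivebox bash` from ~/archivebox addresses the service by name and avoids guessing the container name.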
Then, to add a website, run the following.
archivebox add https://isitchristmas.com/
This usage comes from the command’s help output. There is a lot you can do from the command line alone. Here is the example section of the help text.
Example Use:
mkdir my-archive; cd my-archive/
archivebox init
archivebox status
archivebox add https://example.com/some/page
archivebox add --depth=1 ~/Downloads/bookmarks_export.html
archivebox list --sort=timestamp --csv=timestamp,url,is_archived
archivebox schedule --every=day https://example.com/some/feed.rss
archivebox update --resume=15109948213.123
Once a URL has been added, the web UI will show the archive you specified.
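The same subcommands can also be run from the host without opening a shell in the container, in the way the setup step ran `docker-compose run archivebox init --setup`. Here is a minimal sketch of a wrapper, assuming it is called from the directory containing docker-compose.yml. The function name `abx` and the `ABX_DRY_RUN` variable are hypothetical conveniences, not ArchiveBox features.

```shell
#!/bin/sh
# Run an ArchiveBox subcommand through docker-compose from the host.
# abx and ABX_DRY_RUN are hypothetical names invented for this sketch.
abx() {
    if [ -n "${ABX_DRY_RUN:-}" ]; then
        # Dry run: print the command instead of executing it
        echo "docker-compose run archivebox $*"
        return 0
    fi
    docker-compose run archivebox "$@"
}

# Preview the commands without needing Docker to be running
ABX_DRY_RUN=1
abx status
abx add https://isitchristmas.com/
```

Unsetting `ABX_DRY_RUN` makes the same calls execute for real, so `abx add https://isitchristmas.com/` would archive the tutorial site from the host.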