User:Mjb/ia
Sometimes, to upload and manage content on archive.org, I use the Internet Archive command-line tool, ia.
The official documentation for it is pretty good:
- Command-Line Interface documentation at archive.org
What I am writing here is just a supplement to cover some things that came up as I used the tool.
Contents
- 1 Installation and setup
- 2 Bandwidth limiting
- 3 Upload a folder of videos
- 4 Replacing files in a folder is not possible
- 5 Resuming is not possible
- 6 Delete all content for a given identifier
- 7 See what files are in the item
- 8 See what changes are pending
- 9 See what metadata is associated with the item and its files
- 10 Update metadata
Installation and setup
As far as I know, it only runs on Unix-like systems, so I installed it on Lubuntu, which I run in a VirtualBox virtual machine on Windows:
sudo apt-get install ia
You only have to set your login credentials once, unless you are going to be using multiple accounts:
ia configure
Bandwidth limiting
Sometimes I want to throttle the network traffic used by ia, but the normal way of doing this (using trickle) does not have any effect. So, with the help of a guide found in a web search, I discovered I can use tc to throttle the entire network interface (as seen by Lubuntu) instead:
sudo tc qdisc add dev enp0s3 root tbf rate 880kbit latency 60ms burst 1540
In that command, 880kbit is the rate I want to limit to (about 110 KB/s), and enp0s3 is the name of the network interface, as found by running ip link.
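The rate arithmetic is just bytes to bits (110 KB/s × 8 = 880 kbit/s). Here is a minimal sketch that does the conversion and prints the tc command for review instead of running it. The function names kb_to_kbit and tc_limit_cmd are made up for this example; the latency and burst values are simply the ones used above.

```shell
# Convert a target rate in kilobytes per second to the kbit value tc expects.
# 1 byte = 8 bits, so 110 KB/s becomes 880kbit.
kb_to_kbit() {
    echo "$(( $1 * 8 ))kbit"
}

# Print (do not run) the tc command for a given interface and KB/s rate.
# Latency and burst are fixed to the values used on this page.
tc_limit_cmd() {
    echo "sudo tc qdisc add dev $1 root tbf rate $(kb_to_kbit "$2") latency 60ms burst 1540"
}

# Usage:
#   tc_limit_cmd enp0s3 110
# To remove the limit again later:
#   sudo tc qdisc del dev enp0s3 root
```

Printing the command rather than executing it keeps sudo out of the helper, so the line can be checked before it touches the interface.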
I discovered through trial and error that 60ms latency results in no dropped packets, while 50ms results in about a 4% drop rate. To see the packet stats:
tc -s -d qdisc ls dev enp0s3
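To turn those stats into a percentage, something like the following sketch may help. It assumes the stats contain a line of the form "Sent 1234 bytes 96 pkt (dropped 4, overlimits 0 requeues 0)", and it assumes the drop rate means dropped / (sent + dropped); both are my assumptions, and drop_rate is a made-up helper name.

```shell
# Read "tc -s" output on stdin and print the drop rate as a percentage.
# Assumption: the stats line looks like
#   Sent 1234 bytes 96 pkt (dropped 4, overlimits 0 requeues 0)
# Assumption: drop rate = dropped / (sent + dropped).
drop_rate() {
    awk '
        /Sent .* pkt/ {
            gsub(/[(,]/, "")                 # strip "(" and "," so fields split cleanly
            for (i = 1; i <= NF; i++) {
                if ($i == "pkt")     sent    = $(i - 1)
                if ($i == "dropped") dropped = $(i + 1)
            }
        }
        END {
            if (sent + dropped > 0)
                printf "%.1f%%\n", 100 * dropped / (sent + dropped)
        }
    '
}

# Usage:
#   tc -s -d qdisc ls dev enp0s3 | drop_rate
```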
Upload a folder of videos
Choose a URL-friendly identifier, name the file or folder to upload, and set the mediatype correctly. Everything else is optional.
ia upload identifier folderpath --metadata="title:This is a Better Title than the Identifier" --metadata="mediatype:movies" --metadata="date:1990-05-20" --metadata="language:English"
Common mediatypes: texts, movies, image, or data. If you accidentally enter videos, it will be interpreted as movies.
When you upload a folder, it only uploads the contents of the folder, not the top-level folder itself that you gave on the command line.
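The upload command gets long once there are several metadata fields. Here is a minimal sketch of a wrapper that takes plain key:value pairs and turns each one into its own --metadata flag. upload_item is a made-up name, not part of ia, and the mediatype check only enforces the advice above; it is not something I know ia itself to require.

```shell
# Sketch: upload_item IDENTIFIER PATH KEY:VALUE...
# upload_item is a hypothetical helper, not part of ia.
upload_item() {
    identifier="$1"
    path="$2"
    shift 2

    # Following the advice above, refuse to run without a mediatype pair.
    case " $* " in
        *" mediatype:"*) ;;
        *) echo "upload_item: no mediatype:... pair given" >&2; return 1 ;;
    esac

    # Replace each KEY:VALUE argument with its own --metadata flag.
    for pair in "$@"; do
        set -- "$@" "--metadata=$pair"
        shift
    done

    ia upload "$identifier" "$path" "$@"
}

# Usage:
#   upload_item identifier folderpath \
#       "title:This is a Better Title than the Identifier" \
#       "mediatype:movies" "date:1990-05-20" "language:English"
```

Passing the pairs as separate arguments keeps titles with spaces intact without extra quoting inside the function.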
Replacing files in a folder is not possible
Do not try to upload individual files to replace files in a folder; they will not go into the subfolder. You have to delete and upload the whole folder!
Resuming is not possible
As far as I know, there is no way to resume an interrupted transfer. So if you only get part of a file uploaded, that partial upload is lost. If you only get part of a folder uploaded, you need to delete everything and try again.
Delete all content for a given identifier
ia delete identifier --all -H x-archive-keep-old-version:0
The tool returns before the server has actually deleted everything, so give it a few minutes and be patient.
It is possible some files will still be left behind, e.g. metadata .xml, .sqlite, and a thumbnail image. Don't sweat it.
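Rather than guessing when the server is done, the file listing can be polled. This is only a sketch: wait_for_delete is a made-up helper, the defaults (10 tries, 30 seconds apart) are arbitrary, and I am assuming ia list prints one line per remaining file. Because some leftover files may never go away, it gives up after the last try instead of looping forever.

```shell
# Sketch: wait_for_delete IDENTIFIER [TRIES] [DELAY_SECONDS]
# Returns 0 once "ia list" prints nothing, 1 if files are still listed
# after the last try (which is expected if leftovers remain).
wait_for_delete() {
    identifier="$1"
    tries="${2:-10}"
    delay="${3:-30}"

    while [ "$tries" -gt 0 ]; do
        # Assumption: ia list prints one line per file still in the item.
        remaining="$(ia list "$identifier" | wc -l | tr -d ' ')"
        if [ "$remaining" -eq 0 ]; then
            echo "$identifier: no files listed"
            return 0
        fi
        echo "$identifier: $remaining file(s) still listed"
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Usage:
#   wait_for_delete identifier
```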
See what files are in the item
ia list identifier
See what changes are pending
Whenever you upload, delete, or change something, a bunch of related tasks are added to a queue. To get a list of pending tasks for a certain item, you can visit https://archive.org/catalog.php?identifier=foo (replace foo with the identifier), which is also linked from https://archive.org/manage/foo (the archive item manager).
A history page at https://catalogd.archive.org/history/foo shows even more info.
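To save retyping, here is a small sketch that prints those three URLs for a given identifier. item_urls is a made-up helper name; the URL patterns are exactly the ones given above.

```shell
# Print the task list, item manager, and history URLs for an identifier.
item_urls() {
    printf 'https://archive.org/catalog.php?identifier=%s\n' "$1"
    printf 'https://archive.org/manage/%s\n' "$1"
    printf 'https://catalogd.archive.org/history/%s\n' "$1"
}

# Usage:
#   item_urls foo
```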
See what metadata is associated with the item and its files
ia metadata identifier
Update metadata
ia metadata identifier --modify="date:1985"