User:Mjb/ia
Sometimes, to upload and manage content on archive.org, I use the Internet Archive command-line tool, ia.
The official documentation for it is pretty good:
- Command-Line Interface documentation at archive.org
What I am writing here is just a supplement to cover some things that came up as I used the tool.
Contents
- 1 Installation and setup
- 2 Bandwidth limiting
- 3 Upload a folder of videos
- 4 Replacing files in a subfolder is not possible
- 5 Resuming is not possible
- 6 Delete all content for a given identifier
- 7 See what files are in the item
- 8 See what changes are pending
- 9 See what metadata is associated with the item and its files
- 10 Update metadata
Installation and setup
It only runs on Unix-like systems, so I installed it in Lubuntu, which I run as a VirtualBox virtual machine on Windows:
sudo apt-get install ia
You only have to set your login credentials once, unless you are going to be using multiple accounts:
ia configure
Bandwidth limiting
Sometimes I want to throttle the network traffic used by ia, but the normal way of doing this (using trickle) does not have any effect. So, with the help of a guide found in a web search, I discovered I can use tc to throttle the entire network interface (as seen by Lubuntu) instead:
sudo tc qdisc add dev enp0s3 root tbf rate 880kbit latency 60ms burst 1540
In that command, 880kbit is the rate I want to limit to (about 110 KB/s). enp0s3 is the name of the network interface, as found by running ip link.
I discovered through trial and error that 60ms latency results in no dropped packets, while 50ms results in about a 4% drop rate. To see the packet stats:
tc -s -d qdisc ls dev enp0s3
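The arithmetic behind the 880kbit figure, and the command to lift the limit afterwards, can be sketched as follows. The helper name kbytes_to_kbit is made up for illustration, and it assumes 1 KB means 1000 bytes, which matches how tc reads the kbit unit. The script only prints the tc commands; it does not run them.

```shell
# Hypothetical helper: convert a rate in KB/s (1 KB = 1000 bytes)
# into the kbit rate string used in the tc command above.
kbytes_to_kbit() {
    echo "$(( $1 * 8 ))kbit"
}

IFACE=enp0s3                  # interface name, as reported by `ip link`
RATE=$(kbytes_to_kbit 110)    # 110 KB/s * 8 bits per byte = 880kbit

# Dry run: print the commands. Remove "echo" to actually run them.
echo sudo tc qdisc add dev "$IFACE" root tbf rate "$RATE" latency 60ms burst 1540
echo sudo tc qdisc del dev "$IFACE" root    # removes the limit again
```

Deleting the root qdisc puts the interface back on its default queueing, so the throttle does not outlive the upload session unless you want it to.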
Upload a folder of videos
Choose a URL-friendly identifier, say what files to upload, and set the mediatype correctly. Everything else is optional.
ia upload identifier folderpath/* \
  --metadata="title:This is a Better Title than the Identifier" \
  --metadata=mediatype:movies \
  --metadata=collection:opensource_movies \
  --metadata=date:1990-05-20 \
  --metadata=language:English
Common mediatypes: texts, movies, image, or data. If you accidentally enter videos, it will be interpreted as movies.
The collection defaults to opensource, which is Community Texts. Use opensource_movies for Community Videos. If the item ends up in the wrong collection, you will have to email info@archive.org from the email address associated with the account and ask them to move the item for you.
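Since the identifier ends up in the item's URL, it may help to check it before uploading. This is only a sketch: is_url_friendly is a hypothetical helper, and the allowed character set (letters, digits, dot, underscore, hyphen) is an assumption about what counts as URL-friendly, not archive.org's official rule.

```shell
# Hypothetical helper: succeed only if the identifier is non-empty and
# contains nothing but letters, digits, dots, underscores, and hyphens.
# The character set is an assumption, not an official archive.org rule.
is_url_friendly() {
    case "$1" in
        ""|*[!A-Za-z0-9._-]*) return 1 ;;   # empty, or has some other character
        *) return 0 ;;
    esac
}

# Usage idea (not run here):
#   is_url_friendly "$id" && ia upload "$id" folderpath/* --metadata=mediatype:movies
```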
Replacing files in a subfolder is not possible
You can replace files by uploading them again. However, this only works for files in the top level of the item.
Do not try to upload individual files to replace files in a subfolder; they will not go into the subfolder. You have to replace the whole subfolder!
Resuming is not possible
As far as I know, there is no way to resume an interrupted transfer. If only part of a file was uploaded, that part is discarded and the file has to be sent again from the start. If only part of a subfolder was uploaded, you need to delete the subfolder and try again.
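For files in the top level of the item, one way to find out what still needs to be sent after an interruption is to compare a local file list with the output of ia list. The sketch below assumes both lists are plain text with one file name per line. missing_files, local.txt, and remote.txt are made-up names. It does not resume anything; it only tells you which whole files are absent from the item.

```shell
# Hypothetical helper: print the names in the first list ($1) that do
# not appear as exact whole lines in the second list ($2).
missing_files() {
    grep -vxFf "$2" "$1"
}

# Usage idea (not run here; assumes folderpath holds only regular files):
#   ls folderpath > local.txt
#   ia list identifier > remote.txt
#   missing_files local.txt remote.txt
```

The help text for ia upload also lists a --checksum option, described as skipping files based on checksum. It is not tested here, so treat it as something to try rather than a confirmed fix.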
Delete all content for a given identifier
ia delete identifier --all -H x-archive-keep-old-version:0
The tool will exit before the server has actually deleted everything, so give it a few minutes and be patient.
It is possible some files will still be left behind, e.g. metadata .xml, .sqlite, and a thumbnail image. Don't sweat it.
See what files are in the item
ia list identifier
See what changes are pending
Whenever you upload, delete, or change something, a bunch of related tasks are added to a queue. To get a list of pending tasks for a certain item, visit https://archive.org/catalog.php?identifier=foo (replace foo with the item's identifier). The same page is linked from https://archive.org/manage/foo (the archive item manager).
A history page at https://catalogd.archive.org/history/foo shows even more info.
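To avoid retyping those URLs, small helpers like the ones below can fill in the identifier. The function names are made up for illustration; the URL patterns are the ones given above.

```shell
# Hypothetical helpers: build the task and history URLs for an item.
catalog_url() { echo "https://archive.org/catalog.php?identifier=$1"; }
manage_url()  { echo "https://archive.org/manage/$1"; }
history_url() { echo "https://catalogd.archive.org/history/$1"; }

# Usage idea:
#   catalog_url foo
# The ia tool also has a tasks subcommand (ia tasks identifier), which
# should show similar information without a browser.
```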
See what metadata is associated with the item and its files
ia metadata identifier
Update metadata
ia metadata identifier --modify="date:1985"