User:Mjb/ia

From Offset
Revision as of 19:43, 6 August 2020 by Mjb

Sometimes, to upload and manage content on archive.org, I use the Internet Archive command-line tool, ia.

The official documentation for it is pretty good.

What I am writing here is just a supplement to cover some things that came up as I used the tool.

Installation and setup

It only runs on Unix-like systems, so I installed it on Lubuntu, which I run in a VirtualBox virtual machine on Windows:

   sudo apt-get install ia

You only have to set your login credentials once, unless you are going to be using multiple accounts:

   ia configure

Bandwidth limiting

Sometimes I want to throttle the network traffic used by ia, but the usual way of doing this (running it under trickle) has no effect. With the help of a guide I found through a web search, I discovered that I can use tc to throttle the entire network interface (as seen by Lubuntu) instead:

   sudo tc qdisc add dev enp0s3 root tbf rate 880kbit latency 60ms burst 1540

In that command, 880kbit is the rate I want to limit to (880 kilobits per second, i.e. 110 KB/s), and enp0s3 is the name of the network interface, as shown by running ip link.
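To turn a target speed in KB/s into a tc rate, multiply by 8, because tc reads kbit as kilobits (1000 bits) per second. Here is a minimal sketch. The function name is made up for illustration and is not part of ia or tc:

```shell
# Hypothetical helper (not part of ia or tc): convert a target speed in
# KB/s into a value for tc's "rate" option. tc reads "kbit" as kilobits
# (1000 bits) per second, so 1 KB/s is 8 kbit.
kbytes_to_tc_rate() {
    echo "$(( $1 * 8 ))kbit"
}
```

For example, kbytes_to_tc_rate 110 gives 880kbit. The tc man page also lists kbps as kilobytes per second, so mixing up kbps and kbit is an easy way to be off by a factor of 8.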

I discovered through trial and error that 60ms latency results in no dropped packets, while 50ms results in about a 4% drop rate. To see the packet stats:

   tc -s -d qdisc ls dev enp0s3
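To take the limit off again, delete the root qdisc with sudo tc qdisc del dev enp0s3 root. The sketch below wraps the on, off, and stats commands in shell functions. The function names and the TC and IFACE variables are hypothetical. Setting TC="echo tc" prints the commands instead of running them.

```shell
# Hypothetical wrapper around the tc commands above (names are illustrative).
# TC is left unquoted on purpose so "sudo tc" splits into two words;
# set TC="echo tc" for a dry run, or IFACE to another name from `ip link`.
TC=${TC:-"sudo tc"}
IFACE=${IFACE:-enp0s3}

limit_on() {    # usage: limit_on 880kbit
    # remove any existing limit first; tc refuses a second root qdisc
    $TC qdisc add dev "$IFACE" root tbf rate "$1" latency 60ms burst 1540
}

limit_off() {   # deleting the root qdisc restores the interface's default
    $TC qdisc del dev "$IFACE" root
}

limit_stats() { # packet counters, including drops
    $TC -s -d qdisc ls dev "$IFACE"
}
```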

Upload a folder of videos

Choose a URL-friendly identifier, say what files to upload, and set the mediatype correctly. Everything else is optional.

   ia upload identifier folderpath/* --metadata="title:This is a Better Title than the Identifier" --metadata=mediatype:movies --metadata=collection:opensource_movies --metadata=date:1990-05-20 --metadata=language:English 

Common mediatypes: texts, movies, image, or data. If you accidentally enter videos, it will be interpreted as movies.

The collection defaults to opensource, which is Community Texts; use opensource_movies for Community Videos. If an item ends up in the wrong collection, you will have to email info@archive.org from the email address associated with the account and ask them to move it for you.
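The notes above can be folded into a small wrapper around the upload command, so the mediatype and collection are always set together. This is a sketch under stated assumptions. The function names and the IA and COLLECTION variables are hypothetical. The identifier check also assumes that "URL-friendly" means ASCII letters, digits, underscores, hyphens, and periods, starting with a letter or digit; archive.org's actual rules may differ.

```shell
# Hypothetical wrapper around the `ia upload` command above.
# Set IA=echo to print the command instead of running it, and
# COLLECTION=... to override the collection chosen below.
IA=${IA:-ia}

# ASSUMPTION: "URL-friendly" = ASCII letters, digits, '_', '-', '.',
# starting with a letter or digit.
is_url_friendly_identifier() {
    case $1 in
        [A-Za-z0-9]*) ;;
        *) return 1 ;;
    esac
    case $1 in
        *[!A-Za-z0-9._-]*) return 1 ;;
    esac
}

# Community collection for a mediatype, per the notes above.
community_collection() {
    case $1 in
        texts)  echo opensource ;;         # Community Texts
        movies) echo opensource_movies ;;  # Community Videos
        *) return 1 ;;                     # not covered here
    esac
}

# usage: upload_folder IDENTIFIER FOLDER MEDIATYPE TITLE [more ia options...]
upload_folder() {
    local id=$1 folder=$2 mediatype=$3 title=$4 collection
    shift 4
    is_url_friendly_identifier "$id" || {
        echo "identifier is not URL-friendly: $id" >&2
        return 1
    }
    # archive.org reads "videos" as movies anyway; normalize it here
    [ "$mediatype" = videos ] && mediatype=movies
    if [ -n "${COLLECTION:-}" ]; then
        collection=$COLLECTION
    elif ! collection=$(community_collection "$mediatype"); then
        echo "set COLLECTION for mediatype $mediatype" >&2
        return 1
    fi
    $IA upload "$id" "$folder"/* \
        --metadata="title:$title" \
        --metadata="mediatype:$mediatype" \
        --metadata="collection:$collection" \
        "$@"
}
```

Extra metadata such as the date and language go at the end, e.g. upload_folder identifier folderpath movies "This is a Better Title than the Identifier" --metadata=date:1990-05-20 --metadata=language:English. The wrapper refuses to guess a collection for image or data unless COLLECTION is set, because leaving it out would put the item in Community Texts.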

IA does not want you to upload folders, only files. The initial upload can include folders at the top level, but if you do that, you can't upload anything into those folders or modify/replace anything in them. You would have to re-upload the entire folder if you want to make any changes to it.
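Because of this, it can help to check the upload folder for subfolders before the first upload. Here is a minimal sketch using find. The function name is hypothetical, and -mindepth/-maxdepth are GNU/BSD find options rather than POSIX ones:

```shell
# Hypothetical pre-flight check: list the subfolders at the top level of
# the folder you are about to upload. Per the note above, anything in
# them can only be changed later by re-uploading the whole subfolder.
list_subfolders() {   # usage: list_subfolders FOLDER
    find "$1" -mindepth 1 -maxdepth 1 -type d
}
```

If list_subfolders folderpath prints anything, decide before uploading whether those subfolders will ever need changes.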

Replacing files in a subfolder is not possible

You can replace files by uploading them again. However, this only works for files in the top level of the item.

Do not try to upload individual files to replace files in a subfolder; they will not go into the subfolder. You have to replace the whole subfolder!
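To replace only the top-level files from a local copy of the item, one option is to upload just the regular files in the folder and skip subfolders. Here is a sketch with a hypothetical function name; IA=echo gives a dry run:

```shell
# Hypothetical helper: re-upload (replace) the top-level files of a local
# folder in one `ia upload` call, skipping subfolders, since uploading
# their files individually would not put them back into the subfolder.
# Set IA=echo for a dry run.
IA=${IA:-ia}

replace_top_level_files() {   # usage: replace_top_level_files IDENTIFIER FOLDER
    local id=$1 folder=$2 f
    set --                     # reuse "$@" as the list of files to send
    for f in "$folder"/*; do
        [ -f "$f" ] && set -- "$@" "$f"   # regular files only
    done
    [ $# -gt 0 ] || { echo "no top-level files in $folder" >&2; return 1; }
    $IA upload "$id" "$@"
}
```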

Resuming is not possible

As far as I know, there is no way to resume an interrupted transfer. So if you only get part of a file uploaded, it is gone. If you only get part of a subfolder uploaded, you need to delete it and try again.
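Since there is no resume, the practical step after an interruption is to work out which files never arrived and upload only those. Here is a sketch under two assumptions: ia list prints one file name per line, and the local folder is flat. The function name is hypothetical:

```shell
# Hypothetical helper: after an interrupted upload, list the local
# top-level files whose names are not in the item yet.
# ASSUMPTION: `ia list IDENTIFIER` prints one file name per line.
# A partly uploaded file is gone (see above), so it shows up here too.
missing_files() {   # usage: ia list IDENTIFIER | missing_files FOLDER
    local folder=$1 listing f
    listing=$(cat)
    for f in "$folder"/*; do
        [ -f "$f" ] || continue
        printf '%s\n' "$listing" | grep -Fxq -- "${f##*/}" ||
            printf '%s\n' "$f"
    done
}
```

On Lubuntu the result can be fed back to ia with GNU xargs: ia list identifier | missing_files folderpath | xargs -r -d '\n' ia upload identifier. Here -r skips the upload when nothing is missing.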

Delete all content for a given identifier

   ia delete identifier --all -H x-archive-keep-old-version:0

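Give it a few minutes: the tool finishes before the server has actually deleted everything, so be patient. The -H option sends the x-archive-keep-old-version:0 header, which tells the server not to keep backup copies of the deleted files.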

It is possible some files will still be left behind, e.g. metadata .xml, .sqlite, and a thumbnail image. Don't sweat it.
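To tell when the server is done, one approach is to poll ia list and ignore the expected leftovers. Here is a sketch. The function names are hypothetical, and it assumes that ia list prints one name per line and that the leftover thumbnail is named __ia_thumb.jpg:

```shell
# Hypothetical helpers for checking on `ia delete --all`.
# ASSUMPTIONS: `ia list IDENTIFIER` prints one file name per line, and the
# thumbnail left behind is named __ia_thumb.jpg.
IA=${IA:-ia}

# Drop the expected leftovers (.xml, .sqlite, thumbnail) from a listing.
not_leftovers() {
    grep -v -e '\.xml$' -e '\.sqlite$' -e '^__ia_thumb\.jpg$'
}

# Poll until only leftovers remain. usage: wait_for_delete IDENTIFIER [SECONDS]
wait_for_delete() {
    while $IA list "$1" | not_leftovers | grep -q .; do
        sleep "${2:-60}"
    done
}
```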

See what files are in the item

   ia list identifier

See what changes are pending

Whenever you upload, delete, or change something, a set of related tasks is added to a queue. To see the pending tasks for an item, visit https://archive.org/catalog.php?identifier=foo (replace foo with the identifier). That page is also linked from https://archive.org/manage/foo, the archive item manager.

A history page at https://catalogd.archive.org/history/foo shows even more info.
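Because all three pages follow the same URL pattern, a tiny helper can print them for any identifier. This is a sketch; the function name is hypothetical:

```shell
# Hypothetical helper: print the status pages mentioned above for an item.
item_status_urls() {   # usage: item_status_urls IDENTIFIER
    printf '%s\n' \
        "https://archive.org/manage/$1" \
        "https://archive.org/catalog.php?identifier=$1" \
        "https://catalogd.archive.org/history/$1"
}
```

For example, xdg-open "$(item_status_urls foo | sed -n 2p)" opens the pending-task list in the default browser.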

See what metadata is associated with the item and its files

   ia metadata identifier
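The output is JSON. To pull out a single item-level field, one option is to pipe it through python3. This sketch assumes the item's fields sit under a top-level "metadata" key. The function name is hypothetical:

```shell
# Hypothetical helper: print one field from the item-level metadata.
# ASSUMPTION: `ia metadata IDENTIFIER` prints a JSON object whose
# "metadata" key holds the item's fields. Set IA to a stub for testing.
IA=${IA:-ia}

item_field() {   # usage: item_field IDENTIFIER FIELD
    $IA metadata "$1" | python3 -c '
import json, sys
value = json.load(sys.stdin).get("metadata", {}).get(sys.argv[1], "")
print(value if isinstance(value, str) else json.dumps(value))
' "$2"
}
```

For example, item_field identifier title prints the title. List-valued fields, such as several subjects, are printed as JSON.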

Update metadata

   ia metadata identifier --modify="date:1985"
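To change several fields at once, the sketch below turns KEY:VALUE pairs into --modify options for one ia metadata call. It assumes your version of ia accepts --modify more than once (check ia metadata --help). The function name is hypothetical:

```shell
# Hypothetical wrapper: apply several KEY:VALUE changes in one
# `ia metadata` call. ASSUMPTION: your version of ia accepts --modify
# more than once. Set IA=echo for a dry run.
IA=${IA:-ia}

modify_metadata() {   # usage: modify_metadata IDENTIFIER KEY:VALUE...
    local id=$1 pair n
    shift
    n=$#
    for pair in "$@"; do
        set -- "$@" "--modify=$pair"   # append the converted option
    done
    shift "$n"                         # drop the original KEY:VALUE words
    $IA metadata "$id" "$@"
}
```

For example, modify_metadata identifier date:1985 "title:A New Title" runs ia metadata identifier --modify=date:1985 "--modify=title:A New Title".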