Difference between revisions of "User:Mjb/ia"

From Offset

Revision as of 17:00, 15 May 2020

Sometimes, to upload and manage content on archive.org, I use the Internet Archive command-line tool, ia.

The official documentation for it is pretty good:

What I am writing here is just a supplement to cover some things that came up as I used the tool.

Installation and setup

It only runs on Unix-like systems, so I installed it on Lubuntu, which I run in a VirtualBox virtual machine on Windows:

   sudo apt-get install ia

You only have to set your login credentials once, unless you are going to be using multiple accounts:

   ia configure

Bandwidth limiting

Sometimes I want to throttle the network traffic used by ia, but the usual way of doing this (wrapping the command in trickle) has no effect. With the help of a guide found in a web search, I discovered I can instead use tc to throttle the entire network interface, as seen by Lubuntu:

   sudo tc qdisc add dev enp0s3 root tbf rate 880kbit latency 60ms burst 1540

In that command, 880kbit is the rate I want to limit to (880 kilobits per second, which is about 110 KB/s), and enp0s3 is the name of the network interface, as found by running ip link.

I discovered through trial and error that 60ms latency results in no dropped packets, while 50ms results in about a 4% drop rate. To see the packet stats:

   tc -s -d qdisc ls dev enp0s3
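
As a rough sketch of the arithmetic, and of how to undo the limit afterwards, here are a few shell helpers. The function names are my own and are not part of tc; enp0s3 and the latency and burst values are just the ones from my command above. The helpers only print the tc commands, so you can look them over before running them with sudo.

```shell
# Hypothetical helpers (my own names, not part of tc).

# Convert a rate in kilobytes per second to the kbit figure tc expects.
# 1 byte = 8 bits, so 110 KB/s becomes 880kbit.
kBps_to_kbit() {
    echo "$(( $1 * 8 ))kbit"
}

# Print the command that adds the token bucket filter to an interface.
# Usage: throttle_cmd INTERFACE KILOBYTES_PER_SECOND
throttle_cmd() {
    echo "tc qdisc add dev $1 root tbf rate $(kBps_to_kbit "$2") latency 60ms burst 1540"
}

# Print the command that removes the root qdisc again, lifting the limit.
# Usage: unthrottle_cmd INTERFACE
unthrottle_cmd() {
    echo "tc qdisc del dev $1 root"
}
```

For example, sudo $(unthrottle_cmd enp0s3) should remove the limit once the upload is done.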

Upload a folder of videos

Three things are required: a URL-friendly identifier, the file or folder to upload, and the correct mediatype. Everything else is optional.

   ia upload identifier folderpath --metadata="title:This is a Better Title than the Identifier" --metadata="mediatype:movies" --metadata="date:1990-05-20" --metadata="language:English"

Common mediatypes: texts, movies, image, or data. If you accidentally enter videos, it will be interpreted as movies.

When you upload a folder, ia uploads only the contents of the folder, not the top-level folder you gave on the command line.
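
If you upload video folders often, the command above can be wrapped in a small shell function. This is only a sketch: upload_videos is a name I made up, and it sets just the title, mediatype, and date fields shown above.

```shell
# Hypothetical wrapper around the ia upload command shown above.
# Usage: upload_videos IDENTIFIER FOLDER TITLE DATE
upload_videos() {
    ia upload "$1" "$2" \
        --metadata="title:$3" \
        --metadata="mediatype:movies" \
        --metadata="date:$4"
}
```

A call would look like: upload_videos my-identifier ./folderpath "This is a Better Title" 1990-05-20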

Replacing files in a subfolder is not possible

You can replace files by uploading them again. However, this only works for files in the top level of the item.

Do not try to upload individual files to replace files in a subfolder; they will not go into the subfolder. You have to replace the whole subfolder!

Resuming is not possible

As far as I know, there is no way to resume an interrupted transfer. If only part of a file gets uploaded, that partial upload is lost. If only part of a folder gets uploaded, you need to delete everything and try again.

Delete all content for a given identifier

   ia delete identifier --all -H x-archive-keep-old-version:0

The tool exits before the server has actually deleted everything, so be patient and give it a few minutes.

It is possible some files will still be left behind, e.g. metadata .xml, .sqlite, and a thumbnail image. Don't sweat it.
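
Rather than guessing when the server is done, you could poll ia list until only the expected leftovers remain. The sketch below makes several assumptions: wait_for_delete and POLL_SECONDS are names I made up, the number of leftover files to tolerate is something you pick yourself, and I am assuming ia list prints one file name per line.

```shell
# Hypothetical helper: poll `ia list` until at most MAX_LEFT files remain,
# giving up after TRIES attempts. Returns 0 on success, 1 on giving up.
# Usage: wait_for_delete IDENTIFIER MAX_LEFT TRIES
wait_for_delete() {
    local id="$1" max_left="$2" tries="$3" left
    while [ "$tries" -gt 0 ]; do
        # Count non-empty lines; grep -c exits 1 when the count is 0.
        left="$(ia list "$id" | grep -c . || true)"
        if [ "$left" -le "$max_left" ]; then
            return 0
        fi
        tries=$((tries - 1))
        # Wait between polls (60 seconds unless POLL_SECONDS is set).
        sleep "${POLL_SECONDS:-60}"
    done
    return 1
}
```

For example, wait_for_delete identifier 4 10 would check once a minute, up to ten times, until four or fewer files are left.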

See what files are in the item

   ia list identifier

See what changes are pending

Whenever you upload, delete, or change something, a bunch of related tasks are added to a queue. To get a list of pending tasks for a certain item, visit https://archive.org/catalog.php?identifier=foo (replacing foo with the identifier). That page is also linked from the archive item manager at https://archive.org/manage/foo.

A history page at https://catalogd.archive.org/history/foo shows even more info.
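
To save retyping, a tiny shell function can print all three of those pages for a given identifier. The function name item_status_urls is my own; the URL patterns are just the ones listed above.

```shell
# Hypothetical helper: print the task list, item manager, and history
# URLs for one identifier.
# Usage: item_status_urls IDENTIFIER
item_status_urls() {
    printf '%s\n' \
        "https://archive.org/catalog.php?identifier=$1" \
        "https://archive.org/manage/$1" \
        "https://catalogd.archive.org/history/$1"
}
```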

See what metadata is associated with the item and its files

   ia metadata identifier

Update metadata

   ia metadata identifier --modify="date:1985"
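
As far as I can tell, --modify can be given more than once, so several fields can be changed in a single call. Here is a sketch of a wrapper that does that; set_metadata is a name I made up, and it simply turns each key:value argument into its own --modify option.

```shell
# Hypothetical wrapper: set several metadata fields in one ia call.
# Usage: set_metadata IDENTIFIER key:value [key:value ...]
set_metadata() {
    local id="$1" pair
    shift
    for pair in "$@"; do
        # Append the --modify form of this pair, then drop the original,
        # so the argument list ends up holding only --modify options.
        set -- "$@" "--modify=$pair"
        shift
    done
    ia metadata "$id" "$@"
}
```

For example: set_metadata identifier "date:1985" "language:English"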