User:Mjb/ia

Latest revision as of 21:57, 7 August 2020

Sometimes, to upload and manage content on archive.org, I use the Internet Archive command-line tool, ia.

The official documentation for it is pretty good.

What I am writing here is just a supplement to cover some things that came up as I used the tool.

Installation and setup

As far as I know, it only runs on Unix-like systems, so I installed it on Lubuntu, which I run in a VirtualBox virtual machine on Windows:

   sudo apt-get install ia

You only have to set your login credentials once, unless you are going to be using multiple accounts:

   ia configure

Bandwidth limiting

Sometimes I want to throttle the network traffic used by ia, but the normal way of doing this (using trickle) does not have any effect. So, with the help of a guide found in a web search, I discovered I can use tc to throttle the entire network interface (as seen by Lubuntu) instead:

   sudo tc qdisc add dev enp0s3 root tbf rate 880kbit latency 60ms burst 1540

In that command, 880kbit is the rate I want to limit to (about 110 KB/s). enp0s3 is the name of the network interface, as found by running ip link.

I discovered through trial and error that 60ms latency results in no dropped packets, while 50ms results in about a 4% drop rate. To see the packet stats:

   tc -s -d qdisc ls dev enp0s3
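
The rate arithmetic above can be sketched in shell. This is only an illustration: the function name kbytes_to_kbit is made up for this example, and it assumes 1 byte = 8 bits with decimal kilo units on both sides. The command in the trailing comment is the standard tc way to remove a root qdisc again; it is not run here because it needs root.

```shell
# Convert a rate in kilobytes per second to the kilobits per second
# that the tc command line above uses (assumption: 1 byte = 8 bits).
kbytes_to_kbit() {
    echo $(( $1 * 8 ))
}

# 110 KB/s is the target rate mentioned above.
rate_kbit=$(kbytes_to_kbit 110)
echo "rate ${rate_kbit}kbit"

# To lift the limit again later (not run here; needs root):
#   sudo tc qdisc del dev enp0s3 root
```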

Upload a folder of videos

Choose a URL-friendly identifier, say what files to upload, and set the mediatype correctly. Everything else is optional.

   ia upload identifier folderpath/* --metadata="title:This is a Better Title than the Identifier" --metadata=mediatype:movies --metadata=collection:opensource_movies --metadata=date:1990-05-20 --metadata=language:English 

Common mediatypes: texts, movies, image, or data. If you accidentally enter videos, it will be interpreted as movies.

The collection defaults to opensource, which is Community Texts. Use opensource_movies for Community Videos. If an item ends up in the wrong collection, you will have to email info@archive.org from the email address associated with the account and ask them to move the item for you.
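
Since a wrong mediatype or collection is a nuisance to fix afterwards, a small check before uploading may help. The sketch below is hypothetical: preflight is not part of ia, and it assumes that "URL-friendly" means only ASCII letters, digits, dots, underscores and hyphens, which may be stricter or looser than what archive.org actually accepts. The mediatype list is just the common ones named above.

```shell
# Hypothetical helper: sanity-check an identifier and mediatype before uploading.
# Assumption: "URL-friendly" means only ASCII letters, digits, '.', '_' and '-'.
preflight() {
    id=$1
    mediatype=$2
    case "$id" in
        ''|*[!A-Za-z0-9._-]*) echo "bad identifier: $id"; return 1 ;;
    esac
    case "$mediatype" in
        texts|movies|image|data) ;;
        *) echo "unusual mediatype: $mediatype"; return 1 ;;
    esac
    echo "ok: $id ($mediatype)"
}

preflight my-home-videos_1990 movies
preflight "my videos" movies || true
```

The second call fails on purpose because of the space in the identifier.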

Subfolders are static

You can replace files by uploading them again. However, this only works for files in the top level of the item.

Do not try to upload individual files to replace files in a subfolder; they will not go into the subfolder.

As far as I can tell, this is because IA wants you to upload files, not folders. For some reason, they do allow the uploading of top-level folders, but once a folder is uploaded, you can't add anything to it or modify or replace anything in it. To change anything in a subfolder, you have to re-upload the entire subfolder.
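
Given that, it can be worth checking for subfolders before uploading. A minimal sketch, assuming GNU or BSD find (for -mindepth and -maxdepth); list_subfolders is a made-up name, and the demo folder is a throwaway created just for the example.

```shell
# Print the immediate subfolders of a local folder, one per line.
# Anything listed here would become a subfolder of the item.
list_subfolders() {
    find "$1" -mindepth 1 -maxdepth 1 -type d | sort
}

# Demo on a temporary folder with one video and one subfolder.
demo=$(mktemp -d)
mkdir "$demo/extras"
touch "$demo/tape1.mp4" "$demo/extras/notes.txt"
list_subfolders "$demo"
rm -r "$demo"
```

If the function prints nothing, everything in the folder is a top-level file and can be replaced individually later.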

Resuming is not possible

As far as I know, there is no way to resume an interrupted transfer. If only part of a file gets uploaded, that partial upload is lost and you have to upload the file again from the start. If only part of a subfolder gets uploaded, you need to delete the subfolder and try again.
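
For top-level files, one way to find out what still needs uploading after an interruption is to compare a local file list with the item's file list. The sketch below only does the comparison; missing_from_item is a made-up name, and the remote list is a stand-in for the output of ia list identifier, assuming that command prints one file name per line.

```shell
# Print the names that are in the local list but not in the remote list.
# Both arguments are text files with one file name per line.
missing_from_item() {
    # -F fixed strings, -x whole-line match, -v invert, -f read names from file.
    # grep exits non-zero when nothing is missing, so that status is ignored.
    grep -Fxv -f "$2" "$1" || true
}

work=$(mktemp -d)
printf '%s\n' tape1.mp4 tape2.mp4 tape3.mp4 > "$work/local.txt"
# Stand-in for: ia list identifier > "$work/remote.txt"
printf '%s\n' tape1.mp4 tape3.mp4 > "$work/remote.txt"
missing_from_item "$work/local.txt" "$work/remote.txt"
rm -r "$work"
```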

Deleting works best when nothing else is going on

It seems directories get locked when uploading is happening or tasks are queued. If you try to delete anything in the meantime, you will get a "no files found" message, even if the files are there. Try again when nothing is going on.

Delete all content for a given identifier

   ia delete identifier --all -H x-archive-keep-old-version:0

The tool returns before the server has actually deleted everything, so give it a few minutes; be patient.

It is possible some files will still be left behind, e.g. the metadata .xml and .sqlite files and a thumbnail image. Don't sweat it.

See what files are in the item

   ia list identifier

See what changes are pending

Whenever you upload, delete, or change something, a bunch of related tasks are added to a queue. To see the pending tasks for an item, visit https://archive.org/catalog.php?identifier=foo (replace foo with your identifier). That page is also linked from the archive item manager at https://archive.org/manage/foo.

A history page at https://catalogd.archive.org/history/foo shows even more info.
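
To avoid retyping these addresses, the three URLs above can be printed for any identifier. A minimal sketch; item_urls is a made-up name, and the URL patterns are just the ones given above.

```shell
# Print the task list, item manager, and history URLs for an identifier.
item_urls() {
    printf '%s\n' \
        "https://archive.org/catalog.php?identifier=$1" \
        "https://archive.org/manage/$1" \
        "https://catalogd.archive.org/history/$1"
}

item_urls foo
```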

See what metadata is associated with the item and its files

   ia metadata identifier

Update metadata

   ia metadata identifier --modify="date:1985"