Difference between revisions of "URI/File scheme/Plan of action"


Revision as of 21:41, 2 January 2011

This is the plan of action for updating the 'file' URI scheme. It is part of the W3C URI Interest Group's 'file' URI scheme update project.

Approach

There is disagreement over how prescriptive the specification should be, but the general approach taken by the IETF in the past has leaned toward "document what works" more than "fix what's broken." The chapter "IETF and the RFC Standards Process", from The Art of Unix Programming by Eric Steven Raymond, emphasizes that Internet RFCs and standards tend to be based more on actual implementation than on pie-in-the-sky theory.

Therefore, we are starting with a survey of implementations, and will then provide guidelines that emphasize interoperability.

Survey implementations of file: URIs

What do they implement? How do they map file: URIs to operating-system-specific situations, including character set transformations, drive letters, remotely mounted directories, individual mount points, and symbolic links or redirections?

What is useful common practice for getting interoperable results when creating file: URIs?

Recommend useful practice based on implementations

Based on the survey of implementations, recommend action:

  • What should URI creators write so that file: URIs are interoperable with most current implementations?
  • What should (new or updated) file: URI interpreters do to handle the diversity of file: URIs likely to be seen?
  • Update the Proposed Standard for file: URIs to be consistent with common current practice, such that progression to Draft Standard (multiple, independent, interoperable implementations) is likely.
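As a rough illustration of the first question, here is a minimal Python sketch of a URI creator that always writes an empty authority, the one form that every browser in the survey results further down this page resolved to the local file system. The function name make_file_uri and its windows flag are hypothetical, and percent-encoding as UTF-8 is an assumption of the sketch, not something the survey has established.

```python
from urllib.parse import quote

def make_file_uri(path: str, windows: bool = False) -> str:
    """Build a file: URI with an empty authority from an absolute path.

    Hypothetical helper for illustration only; not taken from any spec.
    """
    if windows:
        # Treat backslashes as separators and expect a drive letter, e.g. C:\dir
        path = path.replace("\\", "/")
        if not (len(path) >= 3 and path[0].isalpha() and path[1:3] == ":/"):
            raise ValueError("expected an absolute path with a drive letter")
        # The drive letter becomes the first path segment: /C:/dir
        path = "/" + path
    elif not path.startswith("/"):
        raise ValueError("expected an absolute path")
    # Assumption: percent-encode as UTF-8, keeping "/" and ":" literal.
    return "file://" + quote(path, safe="/:")

print(make_file_uri("/home/user/a b.txt"))
print(make_file_uri(r"C:\Documents and Settings\x.txt", windows=True))
```

The drive letter is placed in the path rather than the authority because several of the surveyed browsers rewrite a drive-letter authority into exactly that form.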

List of implementations to Survey

Here is a list of browsers that we should cover (add to it if you feel something is missing):

  • Firefox: Windows 2000/XP (maybe Windows 98), Mac OS X, Linux
    • Version 1.0.6 on Windows XP:
      • empty authority: local file system
      • "localhost" authority: local file system
      • authority is a drive letter: rewritten so authority is empty and drive letter is part of path
      • other: URL is ignored completely
  • Mozilla Suite: same as above (future versions will be released as SeaMonkey, by an independent group from the Mozilla Foundation)
    • Version 1.8b on Windows XP:
      • empty authority: local file system
      • "localhost" authority: local file system
      • authority is a drive letter: rewritten so authority is empty and drive letter is part of path
      • other: treated like "localhost"
  • Opera: same as above
    • (see Opera URI note)
    • Version 7.0 on Windows XP:
      • empty authority: rewritten so authority is "localhost"
      • "localhost" authority: local file system
      • authority is a drive letter: rewritten so authority is "localhost" and drive letter is part of path
      • other: seems to be trying to access something, somewhere; eventually times out with "Can't open file" error dialog; haven't been able to figure out anything that actually succeeds in accessing something
  • Lynx: character mode browser, same as above
    • Version 2.8.5rel.2 on Cygwin on Windows XP:
      • drive letters not understood on Cygwin, but /cygdrive/c is
      • note anomaly: if no trailing slash after authority, path is treated as "." rather than "/"
      • empty authority: local file system
      • "localhost" authority: local file system
      • other: file URI is treated as ftp URI
  • Links: character mode browser, same as above
    • Version 0.99pre14 on Cygwin on Windows XP:
      • drive letters not understood on Cygwin, but /cygdrive/c is
      • empty authority: local file system
      • other: error
  • W3m: character mode browser, same as above
    • Version 0.5.1 on Cygwin on Windows XP:
      • drive letters not understood on Cygwin, but /cygdrive/c is
      • empty authority: local file system
      • "localhost" authority: local file system
      • other: error
  • Internet Explorer: same Windows OSs as above; Internet Explorer on Macintosh. (No Linux)
    • Version 6.0 on Windows XP:
      • empty authority: local file system
      • "localhost" authority: error
      • authority is a drive letter: rewritten so authority is empty and drive letter is part of path
      • other: file://authority/path is rewritten as UNC name \\authority\path
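The four authority cases used in the results above (empty, "localhost", drive letter, other) can be told apart mechanically. The following Python sketch is one way to do it with the standard library's urlsplit; the function names classify_authority and to_unc are hypothetical. The to_unc helper merely mimics the rewriting observed for Internet Explorer 6.0 in the "other" case and does no percent-decoding.

```python
import string
from urllib.parse import urlsplit

def classify_authority(uri: str) -> str:
    """Return which survey case the authority of a file: URI falls under."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError("not a file: URI")
    authority = parts.netloc
    if authority == "":
        return "empty"
    if authority.lower() == "localhost":
        return "localhost"
    # One ASCII letter followed by ":" or "|", as in file://C:/dir/x.txt
    if (len(authority) == 2
            and authority[0] in string.ascii_letters
            and authority[1] in ":|"):
        return "drive letter"
    return "other"

def to_unc(uri: str) -> str:
    """Mimic the observed IE 6.0 rewrite of file://authority/path to a UNC name."""
    parts = urlsplit(uri)
    if classify_authority(uri) != "other":
        raise ValueError("only the 'other' case maps to a UNC name")
    return "\\\\" + parts.netloc + parts.path.replace("/", "\\")

for test_uri in ("file:///C:/dir/x.txt", "file://localhost/etc/hosts",
                 "file://C:/dir/x.txt", "file://server/share/x.txt"):
    print(test_uri, "->", classify_authority(test_uri))
```

A classifier like this could be used to label test URIs before recording how each browser handles them, so that results stay comparable across browsers.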

The list above covers the major browsers and a few minor ones. We should focus on the latest of each; perhaps some older versions with a large installed base should be included.

Here is a list of file systems we should acknowledge. Each has different limitations in what characters it can support and how it stores them; this may affect how file: URIs are constructed and interpreted. Perhaps a matrix of how each browser interacts with each file system native to its platform would be useful:

  • FAT32 (popular on older Windows systems)
  • NTFS (popular on newer Windows systems)
  • ext2, ext3, ReiserFS, Reiser4 (popular on Linux systems)
  • UFS (popular on BSD systems; optionally available on Mac OS X)
  • ISO 9660, Joliet, Rock Ridge, ISO 13490 (popular on CD-ROMs)
  • HFS+ (the default on Mac OS X; introduced with Mac OS 8.1)

Further reading on file systems and encodings:

  • Comparison of file systems on Wikipedia covers a lot of ground, and links to separate articles about each file system. Check the discussion page as well.
  • File system info by Chris Giese, in addition to providing various FAT technical details, covers encodings, legal characters, and limitations of FAT12, FAT16, VFAT, FAT32, NTFS, ext2, ISO9660, Joliet, and HFS+
  • This page from IBM's WebSphere CORBA documentation is an example of an implementation that expects to see ":" and "\" in a 'file' URL. It also seems to treat file:// as merely the scheme prefix, followed immediately by the drive letter, without recognizing that the double slash marks the start of the authority component
  • This Lynx documentation shows how an implementation might treat '~' specially in a 'file' URL

Other questions to survey, in addition to drive letters and remote mounts: how does "file" access to local files behave with a) no hostname, b) "localhost", and c) the actual host name? Does "file" access remote hosts, via either their name or their IP address, when they are not remotely mounted? In my experience, this last case fails with the browsers that I have tried.
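To make the three hostname variants easy to test consistently, a small generator such as the Python sketch below could produce them for any local path. The function name survey_uris is hypothetical, the path is assumed to be absolute and already percent-encoded, and the machine's own name is taken from socket.gethostname() unless one is supplied.

```python
import socket
from typing import Dict, Optional

def survey_uris(abs_path: str, hostname: Optional[str] = None) -> Dict[str, str]:
    """Return the three authority variants of a file: URI for one local path."""
    if not abs_path.startswith("/"):
        raise ValueError("expected an absolute, already percent-encoded path")
    # Fall back to this machine's own host name when none is given.
    host = hostname if hostname is not None else socket.gethostname()
    return {
        "no hostname": "file://" + abs_path,
        "localhost": "file://localhost" + abs_path,
        "actual host name": "file://" + host + abs_path,
    }

for label, uri in survey_uris("/tmp/test.html").items():
    print(label + ":", uri)
```

Each generated URI can then be pasted into the browser under test and the outcome recorded next to its label.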

In addition, check for variation with different character encodings (especially for Asian languages).
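The kind of variation to look for can be shown with a short Python sketch: the same file name percent-encodes to different octets depending on the character encoding chosen, so a URI written under one encoding may not resolve in an implementation that assumes another. The function name encode_segment is hypothetical, and the choice of UTF-8 and Shift_JIS as the two encodings to compare is only an example.

```python
from urllib.parse import quote, unquote

def encode_segment(name: str, encoding: str) -> str:
    """Percent-encode one path segment using the given character encoding."""
    return quote(name, safe="", encoding=encoding)

name = "日本語.txt"
as_utf8 = encode_segment(name, "utf-8")
as_sjis = encode_segment(name, "shift_jis")

print("UTF-8:    ", as_utf8)
print("Shift_JIS:", as_sjis)

# Decoding with the wrong encoding does not give back the original name.
print(unquote(as_sjis, encoding="utf-8", errors="replace") == name)
```

A survey entry for each browser could record which of these forms, if any, it resolves to the intended file.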