The OS.Path structure
signature OS_PATH
structure Path : OS_PATH
The OS.Path structure provides support for manipulating the syntax of file system paths independent of the underlying file system. It is purposely designed not to rely on any file system operations: none of the functions accesses the actual file system. There are two reasons for this design: many systems support multiple file systems that may have different semantics, and applications may need to manipulate paths that do not exist in the underlying file system.
Before discussing the model of paths and the semantics of the individual operations, we need to define some terms:
An arc denotes a directory or file relative to the directory in which it is recorded. In a path string, arcs are separated by the arc separator character. This character is #"/" in Unix; in Microsoft Windows, both #"\\" and #"/" are allowed. For example, in Unix, the path "abc/def" contains two arcs: "abc" and "def". There are two special arcs: parentArc and currentArc. Under both Unix and Microsoft Windows, the parentArc is ".." and currentArc is ".". An empty arc corresponds to an empty string.
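Although represented concretely as a string, an arc should be viewed as an abstraction in the context of the OS.Path structure, with a limited set of valid representations. In particular, a non-empty string a corresponds to a valid representation of an arc only if fromString a returns {isAbs=false, vol="", arcs=[a]}.
A path corresponds to a list of arcs, with an optional root, that denotes the path of directories leading to a file or directory in the file system hierarchy.
An absolute path has a root. Unix examples include "/" and "/a/b"; Microsoft Windows examples include "\", "\a\b", and "A:\a\b".
A relative path is one without a root. Unix examples include ".." and "a/b"; Microsoft Windows examples include "..", "a\b", and "A:a\b".
A canonical path contains no occurrences of the empty arc, no occurrences of the current arc unless the current arc is the only arc in the path, and parent arcs only at the beginning, and then only if the path is relative. Some examples of canonical paths, using Unix syntax, are ".", "/.", "/", "a", "a/b/c", "..", "../a", "../../a/b/c", and "/a/b/c".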
System note [WINDOWS]: In a Microsoft Windows implementation, canonical paths are entirely lowercase.
A path has an associated volume. Under Unix, there is only one volume, whose name is "". Under Microsoft Windows, example volume names are "", "A:", and "C:".
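In addition to operations for canonicalizing paths and computing relative paths, the Path structure supports path manipulations relative to three different views of a path:
A navigation-oriented view, where a path is broken down into its root and a list of arcs. A path is either absolute or relative. The root of a path specifies the volume to which the path is taken to be relative. For Unix, there is only the "" volume.
A directory/file view, where a path is broken down into a directory specifier and a filename.
A base/extension view, where a path is broken down into a base filename and an extension. We make the assumption that the extension separator character is #".". This works for Microsoft Windows, OS/2, and Unix; the Macintosh does not really have a notion of extension.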
Our main design principle is that the functions should behave in a natural fashion when applied to canonical paths. All functions, except concat, preserve canonical paths, i.e., if all arguments are canonical, then so is the result.
Note that although the model of path manipulation provided by the Path structure is operating system independent, the analysis of strings is not. In particular, any given implementation of the Path structure has an implicit notion of what the arc separator character is. Thus, on a Microsoft Windows system, Path will treat the string "\\d\\e" as representing an absolute path with two arcs, whereas on a Unix system, it will correspond to a relative path with one arc.
exception Path
exception InvalidArc
val parentArc : string
val currentArc : string
val fromString : string -> {isAbs : bool, vol : string, arcs : string list}
val toString : {isAbs : bool, vol : string, arcs : string list} -> string
val validVolume : {isAbs : bool, vol : string} -> bool
val getVolume : string -> string
val getParent : string -> string
val splitDirFile : string -> {dir : string, file : string}
val joinDirFile : {dir : string, file : string} -> string
val dir : string -> string
val file : string -> string
val splitBaseExt : string -> {base : string, ext : string option}
val joinBaseExt : {base : string, ext : string option} -> string
val base : string -> string
val ext : string -> string option
val mkCanonical : string -> string
val isCanonical : string -> bool
val mkAbsolute : {path : string, relativeTo : string} -> string
val mkRelative : {path : string, relativeTo : string} -> string
val isAbsolute : string -> bool
val isRelative : string -> bool
val isRoot : string -> bool
val concat : string * string -> string
val fromUnixPath : string -> string
val toUnixPath : string -> string
val parentArc : string
The string denoting the parent directory (e.g., ".." on Microsoft Windows and Unix).
val currentArc : string
The string denoting the current directory (e.g., "." on Microsoft Windows and Unix).
fromString path
returns the decomposition {isAbs, vol, arcs} of the path specified by path. vol is the volume name and arcs is the list of (possibly empty) arcs of the path. isAbs is true if the path is absolute. Under Unix, the volume name is always the empty string; under Microsoft Windows, it can also have the form "A:", "C:", etc.
Here are some examples for Unix paths:
| path | fromString path |
|---|---|
| "" | {isAbs=false, vol="", arcs=[]} |
| "/" | {isAbs=true, vol="", arcs=[""]} |
| "//" | {isAbs=true, vol="", arcs=["", ""]} |
| "a" | {isAbs=false, vol="", arcs=["a"]} |
| "/a" | {isAbs=true, vol="", arcs=["a"]} |
| "//a" | {isAbs=true, vol="", arcs=["", "a"]} |
| "a/" | {isAbs=false, vol="", arcs=["a", ""]} |
| "a//" | {isAbs=false, vol="", arcs=["a", "", ""]} |
| "a/b" | {isAbs=false, vol="", arcs=["a", "b"]} |
toString {isAbs, vol, arcs}
makes a string out of a path represented as a list of arcs. isAbs specifies whether or not the path is absolute, and vol provides a corresponding volume. It returns "" when applied to {isAbs=false, vol="", arcs=[]}. The exception Path is raised if validVolume {isAbs, vol} is false, or if isAbs is false and arcs has an initial empty arc. The exception InvalidArc is raised if any component in arcs is not a valid representation of an arc. The exception Size is raised if the resulting string would have size greater than String.maxSize.
toString o fromString is the identity. fromString o toString is also the identity, provided no exception is raised and none of the strings in arcs contains an embedded arc separator character. In addition, isRelative (toString {isAbs=false, vol, arcs}) evaluates to true when defined.
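The round trip between fromString and toString can be sketched as follows. This is a minimal illustration, not part of the specification: the expected values in the comments assume a Unix host (arc separator #"/", single volume ""), and the binding names rebuilt and rejected are made up for the example.

```sml
(* Sketch only: expected values assume a Unix host. *)
val {isAbs, vol, arcs} = OS.Path.fromString "a/b"
(* Per the fromString table: isAbs = false, vol = "", arcs = ["a", "b"]. *)

(* toString rebuilds the path string from the decomposed record. *)
val rebuilt = OS.Path.toString {isAbs = isAbs, vol = vol, arcs = arcs}

(* A relative path whose first arc is empty should be rejected with Path. *)
val rejected =
  (OS.Path.toString {isAbs = false, vol = "", arcs = ["", "a"]}; false)
  handle OS.Path.Path => true
```

On a Microsoft Windows host the same calls would be analyzed with that system's separators and volumes, so the results may differ.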
validVolume {isAbs, vol}
returns true if vol is a valid volume name for an absolute or relative path, respectively as isAbs is true or false. Under Unix, the only valid volume name is "". Under Microsoft Windows, the valid volume names have the form "a:", "A:", "b:", "B:", etc. and, if isAbs = false, also "". Under MacOS, isAbs can be true if and only if vol is "".
getVolume path
returns the volume portion of the path path.
getParent path
returns a string denoting the parent directory of path. It holds that getParent path = path if and only if path is a root. If the last arc is empty or the parent arc, then getParent appends a parent arc. If the last arc is the current arc, then it is replaced with the parent arc. Note that if path is canonical, then the result of getParent will also be canonical.
Here are some examples for Unix paths:
| path | getParent path |
|---|---|
| "/" | "/" |
| "a" | "." |
| "a/" | "a/.." |
| "a///" | "a///.." |
| "a/b" | "a" |
| "a/b/" | "a/b/.." |
| ".." | "../.." |
| "." | ".." |
| "" | ".." |
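As a small illustration of the root property, the sketch below calls getParent on two paths from the table and then uses a hypothetical helper, depth, which is not part of OS.Path. It assumes a Unix host, and depth is only meaningful for absolute paths, since repeated getParent on a relative path keeps adding parent arcs and never reaches a root.

```sml
(* Expected values assume a Unix host and follow the table above. *)
val parentOfFile = OS.Path.getParent "a/b"   (* "a" *)
val parentOfRoot = OS.Path.getParent "/"     (* "/": a root is its own parent *)

(* Hypothetical helper: number of getParent steps needed to reach the root.
   Only call this on absolute paths; it would not terminate on relative ones. *)
fun depth p =
  let
    val parent = OS.Path.getParent p
  in
    if parent = p then 0 else 1 + depth parent
  end

val twoLevels = depth "/a/b"
```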
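splitDirFile path
splits the string path path into its directory and file parts, where the file part is defined to be the last arc. The file will be "", if the last arc is "".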
Here are some examples for Unix paths:
| path | splitDirFile path |
|---|---|
| "" | {dir = "", file = ""} |
| "." | {dir = "", file = "."} |
| "b" | {dir = "", file = "b"} |
| "b/" | {dir = "b", file = ""} |
| "a/b" | {dir = "a", file = "b"} |
| "/a" | {dir = "/", file = "a"} |
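joinDirFile {dir, file}
creates a whole path out of a directory and a file by extending the path dir with the arc file. If the string file does not correspond to an arc, it raises InvalidArc. The exception Size is raised if the resulting string would have size greater than String.maxSize.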
dir path
file path
return the directory and file parts of a path, respectively. They are equivalent to #dir o splitDirFile and #file o splitDirFile, respectively, although they are probably more efficient.
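The directory/file view can be exercised as in the following sketch. The expected values in the comments come from the splitDirFile table and assume a Unix host; the binding names are chosen only for this example.

```sml
(* Split a path into its directory part and its last arc. *)
val {dir, file} = OS.Path.splitDirFile "a/b"   (* dir = "a", file = "b" *)

(* joinDirFile extends dir with the arc file, undoing the split here. *)
val rejoined = OS.Path.joinDirFile {dir = dir, file = file}

(* dir and file are the two projections of splitDirFile. *)
val rootDir = OS.Path.dir "/a"    (* "/" *)
val lastArc = OS.Path.file "/a"   (* "a" *)
```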
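splitBaseExt path
splits the path path into its base and extension parts. The extension is a non-empty sequence of characters following the right-most, non-initial occurrence of "." in the last arc; NONE is returned if the extension is not defined. The base part is everything to the left of the extension except the final ".". Note that if there is no extension, a terminating "." is included with the base part.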
Here are some examples for Unix paths:
| path | splitBaseExt path |
|---|---|
| "" | {base = "", ext = NONE} |
| ".login" | {base = ".login", ext = NONE} |
| "/.login" | {base = "/.login", ext = NONE} |
| "a" | {base = "a", ext = NONE} |
| "a." | {base = "a.", ext = NONE} |
| "a.b" | {base = "a", ext = SOME "b"} |
| "a.b.c" | {base = "a.b", ext = SOME "c"} |
| ".news/comp" | {base = ".news/comp", ext = NONE} |
joinBaseExt {base, ext}
returns an arc composed of the base name and the extension (if different from NONE). It is a left inverse of splitBaseExt, i.e., joinBaseExt o splitBaseExt is the identity. The opposite does not hold, since the extension may be empty, or may contain extension separators. Note that although splitBaseExt will never return the extension SOME(""), joinBaseExt treats this as equivalent to NONE. The exception Size is raised if the resulting string would have size greater than String.maxSize.
base path
ext path
return the base and extension parts of a path, respectively. They are equivalent to #base o splitBaseExt and #ext o splitBaseExt, respectively, although they are probably more efficient.
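A short sketch of the base/extension view follows. The values in the comments are taken from the splitBaseExt table; replaceExt is a hypothetical helper introduced here for illustration and is not part of OS.Path.

```sml
(* Only the right-most, non-initial "." in the last arc starts an extension. *)
val {base, ext} = OS.Path.splitBaseExt "a.b.c"   (* base = "a.b", ext = SOME "c" *)
val noExt = OS.Path.ext ".login"                 (* NONE: the "." is initial *)

(* Hypothetical helper: swap the extension of a path for another one. *)
fun replaceExt (path, newExt) =
  OS.Path.joinBaseExt {base = OS.Path.base path, ext = SOME newExt}

val renamed = replaceExt ("a.b.c", "d")
```

Because joinBaseExt treats SOME "" as NONE, calling replaceExt with an empty extension would simply yield the base.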
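mkCanonical path
returns the canonical path equivalent to path. Redundant occurrences of the parent arc, the current arc, and the empty arc are removed. The canonical path will never be the empty string; the empty path is converted to the current directory path ("." under Unix and Microsoft Windows).
Note that the syntactic canonicalization provided by mkCanonical may not preserve file system meaning in the presence of symbolic links (see concat).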
isCanonical path
returns true if path is a canonical path. It is equivalent to (path = mkCanonical path).
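The following sketch shows canonicalization on a few inputs. It assumes a Unix host; the expected values follow from the definition of a canonical path given earlier, and the binding names are illustrative only.

```sml
(* Empty arcs and current arcs are dropped during canonicalization. *)
val cleaned = OS.Path.mkCanonical "a//b/."   (* "a/b" *)

(* The empty path is converted to the current directory path. *)
val fromEmpty = OS.Path.mkCanonical ""       (* "." *)

(* isCanonical p is equivalent to p = mkCanonical p. *)
val alreadyCanonical = OS.Path.isCanonical "a/b/c"   (* true *)
val notCanonical = OS.Path.isCanonical "a/./b"       (* false *)
```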
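mkAbsolute {path, relativeTo}
returns an absolute path that is equivalent to the path path relative to the absolute path relativeTo. If path is already absolute, it is returned unchanged. Otherwise, the function returns the canonical concatenation of relativeTo with path, i.e., mkCanonical (concat (relativeTo, path)). Thus, if path and relativeTo are canonical, the result will be canonical. If relativeTo is not absolute, or if the two paths refer to different volumes, then the Path exception is raised. The exception Size is raised if the resulting string would have size greater than String.maxSize.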
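mkRelative {path, relativeTo}
returns a relative path that, when taken relative to the canonical form of relativeTo, is equivalent to the path path. If path is relative, it is returned unchanged.
If relativeTo is not absolute, or if path and relativeTo are both absolute but have different roots, the Path exception is raised. The exception Size is raised if the resulting string would have size greater than String.maxSize.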
Here are some examples for Unix paths:
| path | relativeTo | mkRelative{path, relativeTo} |
|---|---|---|
| "a/b" | "/c/d" | "a/b" |
| "/" | "/a/b/c" | "../../.." |
| "/a/b/" | "/a/c" | "../b/" |
| "/a/b" | "/a/c" | "../b" |
| "/a/b/" | "/a/c/" | "../b/" |
| "/a/b" | "/a/c/" | "../b" |
| "/" | "/" | "." |
| "/" | "/." | "." |
| "/" | "/.." | "." |
| "/a/b/../c" | "/a/d" | "../b/../c" |
| "/a/b" | "/c/d" | "../../a/b" |
| "/c/a/b" | "/c/d" | "../a/b" |
| "/c/d/a/b" | "/c/d" | "a/b" |
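The relationship between mkRelative and mkAbsolute can be sketched with one row of the table above. The example assumes a Unix host; the binding names rel, absPath, and raised are made up for the illustration.

```sml
(* "/a/b" seen from "/c/d": two parent arcs, then the remaining arcs. *)
val rel = OS.Path.mkRelative {path = "/a/b", relativeTo = "/c/d"}   (* "../../a/b" *)

(* mkAbsolute canonicalizes the concatenation of relativeTo and path. *)
val absPath = OS.Path.mkAbsolute {path = rel, relativeTo = "/c/d"}  (* "/a/b" *)

(* A relativeTo that is not absolute should raise Path. *)
val raised =
  (OS.Path.mkRelative {path = "/a/b", relativeTo = "c/d"}; false)
  handle OS.Path.Path => true
```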
isAbsolute path
isRelative path
return true if path is, respectively, absolute or relative.
isRoot path
returns true if path is a canonical specification of a root directory.
concat (path, t)
returns the path consisting of path followed by t. It raises the exception Path if t is not a relative path or if path and t refer to different volumes. The exception Size is raised if the resulting string would have size greater than String.maxSize.
An implementation of concat might be:
    fun concat (p1, p2) = (case (fromString p1, fromString p2)
           of (_, {isAbs=true, ...}) => raise Path
            | ({isAbs, vol=v1, arcs=al1}, {vol=v2, arcs=al2, ...}) =>
                if ((v2 = "") orelse (v1 = v2))
                  then toString {isAbs=isAbs, vol=v1, arcs=concatArcs(al1, al2)}
                  else raise Path
          (* end case *))
where concatArcs is like List.@, except that a trailing empty arc in the first argument is dropped. Note that concat should not be confused with the concatenation of two strings.
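The helper concatArcs is described above but not defined. One way it might be written is sketched below; this is an assumption based only on that description (list append with a trailing empty arc of the first argument dropped), and dropTrailingEmpty is a name introduced for the example.

```sml
(* Sketch of concatArcs as described in the text: like List.@, except that
   a trailing empty arc in the first argument is dropped. *)
fun concatArcs (al1 : string list, al2 : string list) : string list =
  let
    (* Remove the final arc of the list only when it is the empty arc. *)
    fun dropTrailingEmpty [] = []
      | dropTrailingEmpty [""] = []
      | dropTrailingEmpty (a :: rest) = a :: dropTrailingEmpty rest
  in
    dropTrailingEmpty al1 @ al2
  end

val joinedArcs = concatArcs (["a", ""], ["b"])
val plainArcs = concatArcs (["a", "b"], ["c"])
```

Only the last arc is inspected, so empty arcs elsewhere in the first list are kept.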
concat does not preserve canonical paths. For example, concat ("a/b", "../c") returns "a/b/../c". The parent arc is not removed because "a/b/../c" and "a/c" may not be equivalent in the presence of symbolic links.
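The difference between concat and an explicit canonicalization can be seen in the following sketch, which assumes a Unix host and uses binding names chosen for the example.

```sml
(* concat keeps the parent arc: the result is not canonical. *)
val joined = OS.Path.concat ("a/b", "../c")   (* "a/b/../c" *)

(* Canonicalizing is a separate, purely syntactic step. *)
val canon = OS.Path.mkCanonical joined        (* "a/c" *)

(* The second argument must be relative; an absolute one raises Path. *)
val raised =
  (OS.Path.concat ("a", "/b"); false)
  handle OS.Path.Path => true
```

Whether "a/c" names the same file as "a/b/../c" depends on the file system, which is why the canonicalization is left to the caller.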
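fromUnixPath s
converts the Unix-style path s to the path syntax of the host operating system. Slash characters are translated to the directory separators of the local system, as are parent arcs and current arcs. This function raises the InvalidArc exception if any arc in the Unix path is invalid in the host system's path syntax (e.g., an arc that has a backslash character in it when the host system is Microsoft Windows).
Note that the syntax of Unix pathnames necessarily limits this function. It is not possible to specify paths that have a non-empty volume name or paths that have a slash in one of their arcs using this function.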
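toUnixPath s
converts the path s, which is in the host operating system's syntax, to a Unix-style path. If the path s has a non-empty volume name, then the Path exception is raised. Also, if any arc in the pathname contains the slash character, then the InvalidArc exception is raised.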
See also: OS, OS.FileSys, OS.IO, OS.Process, Posix.FileSys
Syntactically, two paths can be checked for equality by applying string equality to canonical versions of the paths. Since volumes and individual arcs are just special classes of paths, an identical test for equality can be applied to these classes.
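The syntactic equality test described above might be written as follows. samePath is a hypothetical helper, not part of OS.Path, and the example inputs assume a Unix host.

```sml
(* Hypothetical helper: compare two paths by their canonical forms.
   This is purely syntactic and never consults the file system. *)
fun samePath (p, q) = OS.Path.mkCanonical p = OS.Path.mkCanonical q

val equalPaths = samePath ("a/./b", "a/b")
val differentPaths = samePath ("a", "b")
```

As noted for mkCanonical, two paths that compare equal this way may still differ in file system meaning when symbolic links are involved.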
Generated April 12, 2004
Last Modified June 18, 2002
Comments to John Reppy.
This document may be distributed freely over the internet as long as the copyright notice and license terms below are prominently displayed within every machine-readable copy.
Copyright © 2004 AT&T and Lucent Technologies. All rights reserved.
Permission is granted for internet users to make one paper copy for their
own personal use. Further hardcopy reproduction is strictly prohibited.
Permission to distribute the HTML document electronically on any medium
other than the internet must be requested from the copyright holders by
contacting the editors.
Printed versions of the SML Basis Manual are available from Cambridge
University Press.
To order, please visit
www.cup.org (North America) or
www.cup.cam.ac.uk (outside North America).