PRIM_IO
signature
signature PRIM_IO
structure BinPrimIO
:> PRIM_IO
where type array = Word8Array.array
where type vector = Word8Vector.vector
where type elem = Word8.word
where type pos = Position.int
structure TextPrimIO
:> PRIM_IO
where type array = CharArray.array
where type vector = CharVector.vector
where type elem = Char.char
structure WideTextPrimIO
:> PRIM_IO (* OPTIONAL *)
where type array = WideCharArray.array
where type vector = WideCharVector.vector
where type elem = WideChar.char
The PRIM_IO
signature is an abstraction of the fundamental system call operations commonly available on open file descriptors OS.IO.iodesc
. Imperative and Stream I/O facilities do not access the operating system directly, but access the appropriate Primitive I/O reader
and writer
that accomplishes the required system call.
Several operations in the PRIM_IO
interface will raise exceptions that have been left intentionally unspecified. The actual exception raised will usually be operating system-dependent, but may vary. For example, a reader
connected to a prime number generator might raise Overflow
. More typically, the close operation on a reader or writer may cause an exception to be raised if there is a failure in the underlying file system, such as the disk being full or the file server being unavailable. In addition, one would expect readVec
and readVecNB
to raise Size
if the resulting vector would exceed the maximum allowed vector size or if the input parameter is negative. Similarly, one would expect readArr
, readArrNB
, writeArr
, writeArrNB
, writeVec
, and writeVecNB
to raise Subscript
if array bounds are violated. Readers and writers should not, in general, raise the IO.Io
exception. It is assumed that the higher levels will catch the exceptions raised at the Primitive I/O level, and appropriately construct and raise the IO.Io
exception.
A reader
is required to raise IO.Io
if any of its functions, except close
or getPos
, is invoked after a call to close
. A writer
is required to raise IO.Io
if any of its functions, except close
, is invoked after a call to close
. In both cases, the cause
field of the exception should be IO.ClosedStream
.
type elem
type vector
type vector_slice
type array
type array_slice
eqtype pos
val compare : pos * pos -> order
datatype reader
= RD of {
name : string,
chunkSize : int,
readVec : (int -> vector) option,
readArr : (array_slice -> int) option,
readVecNB : (int -> vector option) option,
readArrNB : (array_slice -> int option) option,
block : (unit -> unit) option,
canInput : (unit -> bool) option,
avail : unit -> int option,
getPos : (unit -> pos) option,
setPos : (pos -> unit) option,
endPos : (unit -> pos) option,
verifyPos : (unit -> pos) option,
close : unit -> unit,
ioDesc : OS.IO.iodesc option
}
datatype writer
= WR of {
name : string,
chunkSize : int,
writeVec : (vector_slice -> int) option,
writeArr : (array_slice -> int) option,
writeVecNB : (vector_slice -> int option) option,
writeArrNB : (array_slice -> int option) option,
block : (unit -> unit) option,
canOutput : (unit -> bool) option,
getPos : (unit -> pos) option,
setPos : (pos -> unit) option,
endPos : (unit -> pos) option,
verifyPos : (unit -> pos) option,
close : unit -> unit,
ioDesc : OS.IO.iodesc option
}
val openVector : vector -> reader
val nullRd : unit -> reader
val nullWr : unit -> writer
val augmentReader : reader -> reader
val augmentWriter : writer -> writer
type elem
elem
type is an abstraction that represents the ``element'' of a file (or device, etc.). Typically, elements are either characters (char
) or bytes (Word8.word
).
type vector
type vector_slice
type array
type array_slice
vector
type is an immutable vector of elements; the vector_slice
type is a slice of a vector; the array
type is an mutable array of elements; and the array_slice
type is a slice of an array.
eqtype pos
compare (pos, pos')
LESS
, EQUAL
, or GREATER
when pos is less than, equal to, or greater than pos', respectively, in some underlying linear ordering on pos
values.
datatype reader
= RD of {
name : string,
chunkSize : int,
readVec : (int -> vector) option,
readArr : (array_slice -> int) option,
readVecNB : (int -> vector option) option,
readArrNB : (array_slice -> int option) option,
block : (unit -> unit) option,
canInput : (unit -> bool) option,
avail : unit -> int option,
getPos : (unit -> pos) option,
setPos : (pos -> unit) option,
endPos : (unit -> pos) option,
verifyPos : (unit -> pos) option,
close : unit -> unit,
ioDesc : OS.IO.iodesc option
}
reader
is an abstraction for a source of items of type elem
. Usually, it will correspond to a file or device opened for reading. It can also represent the output of some algorithm or function, not necessarily connected to the outside world, that produces elements. The resulting sequence of elements is potentially unbounded. In the description below, we will usually refer to the limit sequence as a ``file,'' as this is the most common instance.
name
reader
, used in error messages shown to the user.
chunkSize
reader
. This is typically set to the block size of the operating system's buffers. chunkSize
= 1 strongly recommends unbuffered reads, but since buffering is handled at a higher level, it cannot guarantee it. chunkSize
<= 0 is illegal.
readVec(n)
0
). It blocks, if necessary, until end-of-stream is detected or at least one element is available.
It is recommended that implementations of this function raise the Size
exception if n < 0.
readArr(slice)
0
(this function also returns 0
when slice is empty). It blocks, if necessary, until at least one element is available.
readVecNB(n)
SOME
(v)
; or if end-of-stream is detected without blocking, returns SOME
(fromList
[])
; or if a read would block, returns NONE
.
readArrNB(slice)
NONE
, otherwise it returns SOME
(n)
, where n is the number of elements actually read (0
on end-of-stream)
block()
canInput()
true
if and only if the next read can proceed without blocking.
avail()
NONE
if it cannot be determined. For files or strings, this is the file or string size minus the current position; for most other input sources, this is probably NONE
. This can be used as a hint by inputAll
. Note that this is a byte count, not an element count.
getPos()
getPos
function must be non-decreasing (in the absence of setPos
operations or other interference on the underlying object).
setPos(i)
endPos()
verifyPos()
getPos
, except that the latter may maintain its own notion of file position for efficiency, whereas verifyPos
will typically perform a system call to obtain the underlying operating system's value of the file position.
close
reader
closed and, if necessary, performs any cleanup and releases any operating system resources. Further operations on the reader
(besides close
and getPos
) raise IO.ClosedStream
.
ioDesc
readVec
, readVecNB
, readArr
, or readArrNB
must be provided. Providing more of the optional functions increases functionality and/or efficiency of clients.
readVec
, readArr
, and block
means that blocking input is not possible.
readVecNB
, readArrNB
, and canInput
means that non-blocking input is not possible.
readVecNB
means that non-blocking input requires two system calls (using canInput
and readVec
).
readArr
or readArrNB
means that input into an array requires extra copying. Note that the ``lazy functional stream'' model does not use arrays at all.
setPos
prevents random access.
avail
return a value helps the client perform very large input more efficiently, with one system call and no copying.
If the reader
can provide more than the minimum set of operations in a way that is more efficient then the obvious synthesis (see augmentReader
), then by all means it should do so. Providing more than the minimum by just doing the obvious synthesis inside the Primitive I/O layer is not recommended because then clients will not get the ``hint'' about which are the efficient (``recommended'') operations. Clients concerned with efficiency will make use of the operations provided natively, and may need to choose algorithms depending on which operations are provided; clients not concerned with efficiency or requiring certain operations can use the reader constructed by augmentReader
.
datatype writer
= WR of {
name : string,
chunkSize : int,
writeVec : (vector_slice -> int) option,
writeArr : (array_slice -> int) option,
writeVecNB : (vector_slice -> int option) option,
writeArrNB : (array_slice -> int option) option,
block : (unit -> unit) option,
canOutput : (unit -> bool) option,
getPos : (unit -> pos) option,
setPos : (pos -> unit) option,
endPos : (unit -> pos) option,
verifyPos : (unit -> pos) option,
close : unit -> unit,
ioDesc : OS.IO.iodesc option
}
writer
is a file (device, etc.) opened for writing. A writer
is an abstraction for a store of items of type elem
. Usually, it will correspond to a file or device opened for writing. It can also represent input to some algorithm or function, not necessarily connected to the outside world, which consumes the output to guide its computations. The resulting store of elements is potentially unbounded. In the description below, we will usually refer to the store as a ``file,'' as this is the most common instance.
name
chunkSize
writer
. This is typically set to the block size of the operating system's buffers. chunkSize
= 1 strongly recommends unbuffered writes, but since buffering is handled at a higher level, it cannot guarantee it. chunkSize
<= 0 is illegal.
writeVec(slice)
writeArr(slice)
writeVecNB(slice)
SOME
(n)
, where n is the number of elements actually written. Otherwise, if it would block then it returns NONE
without blocking.
writeVecNB(slice)
SOME
(n)
, where n is the number of elements actually written. Otherwise, if it would block then it returns NONE
without blocking.
block()
writer
is guaranteed to be able to write without blocking.
canOutput()
true
if and only if the next write can proceed without blocking.
getPos()
endPos()
setPos(i)
verifyPos()
getPos
, except that the latter may maintain its own notion of file position for efficiency, whereas verifyPos
will typically perform a system call to obtain the underlying operating system's value of the file position.
close()
writer
closed and, if necessary, performs any cleanup and releases any operating system resources. Further operations (other than close
) raise IO.ClosedStream
.
ioDesc
The write operations return the number of full elements that have been written. If the size of an element is greater than 1 byte, it is possible that an additional part of an element might be written. For example, if one tries to write 2 elements, each of size 3 bytes, the underlying system write operation may report that only 4 of the 6 bytes has been written. Thus, one full element has been written, plus part of the second, so the write operation would return 1.
One of writeVec
, writeVecNB
, writeArr
, or writeArrNB
must be provided. Providing more of the optional functions increases functionality and/or efficiency of clients.
writeVec
, writeArr
, and block
means that blocking output is not possible.
writeVecNB
, writeArrNB
, and canOutput
means that non-blocking output is not possible.
writeArr
or writeArrNB
means that extra copying will be required to write from an array.
setPos
prevents random access.
openVector v
val nullRd : unit -> reader
val nullWr : unit -> writer
nullRd
acts like a reader that is always at end-of-stream. The writer nullWr
serves as a sink; any data written using it is thrown away. Null readers and writers can be closed; if closed, they are expected to behave the same as any other closed reader or writer.
augmentReader rd
readVec
, readArr
, readVecNB
, and readArrNB
are provided, by synthesizing these from the operations of rd
.
For example, augmentReader
can synthesize readVec
from readVecNB
and block
, synthesize vector reads from array reads, synthesize array reads from vector reads, as needed. The following table indicates how each synthesis can be accomplished.
Synthesize: | From: |
---|---|
readVec
|
readVec or readArr or (block and (readVecNB or readArrNB ))
|
readArr
|
readArr or readVec or (block and (readArrNB or readVecNB ))
|
readVecNB
|
readVecNB or readArrNB or (canInput and (readVec or readArr ))
|
readArrNB
|
readArrNB or readVecNB or (canInput and (readArr or readVec ))
|
augmentReader
should do no harm: if a reader rd supplies some operation (such as readArr
), then augmentReader
(rd)
provides the same implementation of that operation, not a synthesized one.
augmentWriter wr
writeVec
, writeArr
, writeVecNB
, and writeArrNB
are provided, by synthesizing these from the operations of wr
.
The following table indicates how each synthesis can be accomplished.
Synthesize: | From: |
---|---|
writeVec
|
writeVec or writeArr or (block and (writeVecNB or writeArrNB ))
|
writeArr
|
writeArr or writeVec or (block and (writeArrNB or writeVecNB ))
|
writeVecNB
|
writeVecNB or writeArrNB or (canOutput and (writeVec or writeArr ))
|
writeArrNB
|
writeArrNB or writeVecNB or (canOutput and (writeArr or writeVec ))
|
BinIO
,IMPERATIVE_IO
,OS.IO
,Posix.IO
,PrimIO
,STREAM_IO
,TextIO
It may not be possible to use augmentReader
or augmentWriter
to synthesize operations in a way that is thread-safe in concurrent systems.
None of the function components in readers and writers should block, except for the obvious ones: readVec
, readArray
, writeVec
, writeArray
, and block
.
The end-of-stream condition at the Stream I/O level is an artifact of a read operation at the Primitive I/O level returning 0 elements. Although this event is maintained in the stream, it is a transient condition at the Primitive I/O level. If a call to readVec
returns an empty vector one time, it is quite possible that another call to readVec
at the same file position will return elements. The value returned by an endPos
function should indicate the position after the last element in the file at the time the function is called.
Implementation note:
The functions
getPos
,setPos
,endPos
, andverifyPos
are used to support arbitrary random access on the underlying I/O file or device. On most systems, this is well-supported only on static objects such as strings or regular files. Typical implementations will therefore setgetPos
and the rest toNONE
in both readers and writers for all other file types.In an implementation where an input stream is represented as a chain of buffers, each buffer may contain, along with its vector of data, a
pos
value indicating where the data came from in the underlying file. As each buffer corresponds to a read operation, the client is likely to callgetPos
on each read operation. Thus, the reader should, if possible, maintain its own version of the file position, to be returned bygetPos
, to avoid extra system calls.Unlike readers, which can expect their
getPos
functions to be called frequently, writers need not implementgetPos
in a highly efficient manner: a system call for eachgetPos
is acceptable. Indeed, with a file opened for atomic append, the information cannot be obtained reliably except by a system call usingverifyPos
.
Generated April 12, 2004
Last Modified July 1, 2002
Comments to John Reppy.
This document may be distributed freely over the internet as long as the copyright notice and license terms below are prominently displayed within every machine-readable copy.
Copyright © 2004 AT&T and Lucent Technologies. All rights reserved.
Permission is granted for internet users to make one paper copy for their
own personal use. Further hardcopy reproduction is strictly prohibited.
Permission to distribute the HTML document electronically on any medium
other than the internet must be requested from the copyright holders by
contacting the editors.
Printed versions of the SML Basis Manual are available from Cambridge
University Press.
To order, please visit
www.cup.org (North America) or
www.cup.cam.ac.uk (outside North America). |