STREAM_IO signature
signature STREAM_IO
The STREAM_IO signature defines the interface of the Stream I/O layer in the I/O stack. This layer provides buffering over the readers and writers of the Primitive I/O layer.
Input streams are treated in the lazy functional style: that is, input from a stream f yields a finite vector of elements, plus a new stream f'. Input from f again will yield the same elements; to advance within the stream in the usual way, it is necessary to do further input from f'. This interface allows arbitrary lookahead to be done very cleanly, which should be useful both for ad hoc lexical analysis and for table-driven, regular-expression-based lexing.
Output streams are handled more conventionally, since the lazy functional style does not seem to make sense for output.
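As a minimal sketch of the lazy input style described above, the following uses TextIO.StreamIO (one instantiation of this signature) over an in-memory string created with TextIO.openString. The helper name peekN and the sample text are illustrative assumptions, not part of the signature.

```sml
structure TS = TextIO.StreamIO

(* peekN is a hypothetical helper: look at the next n elements of f.
   Nothing is consumed, because the caller simply keeps using f. *)
fun peekN (f : TS.instream, n : int) : TS.vector = #1 (TS.inputN (f, n))

val f0 = TextIO.getInstream (TextIO.openString "let x = 1")

val ahead = peekN (f0, 3)              (* lookahead on f0 *)
val (again, f1) = TS.inputN (f0, 3)    (* reading f0 again yields the same elements *)
val (rest, _) = TS.inputAll f1         (* advancing requires the new stream f1 *)
```

Because f0 is never modified, a lexer written this way can try several alternatives from the same stream value and only keep the stream returned by the one that matched.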
Stream I/O functions may raise the Size exception if a resulting vector of elements would exceed the maximum vector size, or the IO.Io exception. In general, when IO.Io is raised as a result of a failure in a lower-level module, the underlying exception is caught and propagated up as the cause component of the IO.Io exception value. This will usually be a Subscript, IO.ClosedStream, OS.SysErr, or Fail exception (the last possible because of user-supplied readers or writers), but the stream I/O module will rarely (perhaps never) need to inspect it.
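The following sketch shows one way a client might report an IO.Io failure without matching on the specific cause. It assumes a scratch file from OS.FileSys.tmpName can be created, and it provokes the failure by writing to a stream that has already been closed; the name describe is hypothetical.

```sml
structure TS = TextIO.StreamIO

(* Report an I/O failure using the fields of IO.Io; the cause is only
   turned into text, never matched against a specific exception. *)
fun describe (IO.Io {name, function, cause}) =
      function ^ " failed on " ^ name ^ ": " ^ exnMessage cause
  | describe e = exnMessage e

(* Assumption: a scratch file obtained from OS.FileSys.tmpName is writable. *)
val path = OS.FileSys.tmpName ()
val out = TextIO.getOutstream (TextIO.openOut path)
val () = TS.closeOut out               (* out is now closed, hence terminated *)

(* Output on a terminated stream is expected to raise IO.Io. *)
val report = (TS.output (out, "x"); "no error") handle e => describe e
val () = OS.FileSys.remove path
```

The exact text of report depends on the implementation, which is why the sketch only formats the cause rather than inspecting it.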
type elem
type vector
type instream
type outstream
type out_pos
type reader
type writer
type pos
val input : instream -> vector * instream
val input1 : instream -> (elem * instream) option
val inputN : instream * int -> vector * instream
val inputAll : instream -> vector * instream
val canInput : instream * int -> int option
val closeIn : instream -> unit
val endOfStream : instream -> bool
val output : outstream * vector -> unit
val output1 : outstream * elem -> unit
val flushOut : outstream -> unit
val closeOut : outstream -> unit
val mkInstream : reader * vector -> instream
val getReader : instream -> reader * vector
val filePosIn : instream -> pos
val setBufferMode : outstream * IO.buffer_mode -> unit
val getBufferMode : outstream -> IO.buffer_mode
val mkOutstream : writer * IO.buffer_mode -> outstream
val getWriter : outstream -> writer * IO.buffer_mode
val getPosOut : outstream -> out_pos
val setPosOut : out_pos -> outstream
val filePosOut : out_pos -> pos
type elem
type vector
These are the abstract types of stream elements and vectors of elements. For text streams, these are Char.char and String.string, while for binary streams, these are Word8.word and Word8Vector.vector.
type instream
Input streams are in one of three states: active, truncated, or closed. When initially created, the stream is active. When disconnected from its underlying primitive reader (e.g., by getReader), the stream is truncated. When closeIn is applied to the stream, the stream enters the closed state. A closed stream is also truncated. The only real difference between a truncated stream and a closed one is that in the latter case, the stream's primitive I/O reader is closed.
Reading from a truncated input stream will never block; after all buffered elements are read, input operations always return empty vectors.
type outstream
Output streams are in one of three states: active, terminated, or closed. When initially created, the stream is active. When disconnected from its underlying primitive writer (e.g., by getWriter), the stream is terminated. When closeOut is applied to the stream, the stream enters the closed state. A closed stream is also terminated. The only real difference between a terminated stream and a closed one is that in the latter case, the stream's primitive I/O writer is closed.
In a terminated output stream, there is no mechanism for performing more output, so any output operations will raise the IO.Io exception.
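type out_pos
The type of positions in output streams. A value of this type can be used to reconstruct an output stream at the position recorded in the out_pos value. Thus, the canonical representation for the type is (outstream * pos).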
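type reader
type writer
The types of the readers and writers that underlie the input and output streams.
type pos
The type of positions in the underlying readers and writers; it is used to express a specific position in a file. In some instantiations of this signature (e.g., TextIO.StreamIO), pos is abstract; in others it may be concrete (e.g., Position.int in BinIO.StreamIO).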
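input f
returns a vector of one or more elements from f and the remainder of the stream, if any elements are available. If an end-of-stream has been reached, the empty vector is returned. The function may block until one of these conditions is satisfied. This function raises the Io exception if there is an error in the underlying reader.
input1 f
returns the next element in the stream f and the remainder of the stream. If the stream is at the end, then NONE is returned. It may block until one of these conditions is satisfied. This function raises the Io exception if there is an error in the underlying reader.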
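As an illustration of how input1 threads the stream through a computation, the sketch below counts the elements remaining before the next end-of-stream. It assumes the TextIO.StreamIO instantiation and an in-memory stream from TextIO.openString; countElems is a hypothetical helper.

```sml
structure TS = TextIO.StreamIO

(* countElems is a hypothetical helper: it follows the chain of streams
   returned by input1 until NONE signals the end of the stream. *)
fun countElems (f : TS.instream) : int =
  let
    fun loop (g, n) =
          case TS.input1 g of
              NONE => n
            | SOME (_, g') => loop (g', n + 1)
  in
    loop (f, 0)
  end

val f = TextIO.getInstream (TextIO.openString "hello")
val n = countElems f
val m = countElems f        (* f itself was never advanced *)
```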
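inputN (f, n)
returns a vector of the next n elements from f and the rest of the stream. If fewer than n elements are available before the next end-of-stream, it returns all of the elements up to that end-of-stream. It may block until it can determine whether additional elements are available or an end-of-stream condition holds. This function raises the Io exception if there is an error in the underlying reader. It raises Size if n < 0 or the number of elements to be returned is greater than maxLen. Also, inputN(f,0) returns immediately with an empty vector and f, so this cannot be used as an indication of end-of-stream.
Using instreams, one can synthesize a non-blocking version of inputN from inputN and canInput, as inputN is guaranteed not to block if a previous call to canInput returned SOME(_).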
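The synthesis mentioned above might look like the following sketch. It assumes the TextIO.StreamIO instantiation; the name inputNNB is hypothetical, and the function inherits the Io and Size behavior of canInput and inputN.

```sml
structure TS = TextIO.StreamIO

(* inputNNB is a hypothetical name. NONE means input would block;
   otherwise at most n elements are returned without blocking. *)
fun inputNNB (f : TS.instream, n : int) : (TS.vector * TS.instream) option =
      case TS.canInput (f, n) of
          NONE => NONE
        | SOME k => SOME (TS.inputN (f, k))
```

A result of SOME ("", f) arises when canInput reports SOME 0, that is, when the stream is at end-of-stream; callers that need to distinguish this case from real data can test the length of the returned vector.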
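inputAll f
returns the vector of the rest of the elements in the stream f (i.e., up to an end-of-stream), together with a new stream f'. Care should be taken when using this function, since it can block indefinitely on interactive streams. This function raises the Io exception if there is an error in the underlying reader. The stream f' is immediately past the next end-of-stream of f. For ordinary files in which only one end-of-stream is expected, f' can be ignored. If a file has multiple end-of-stream conditions (which can happen under some operating systems), inputAll returns all the elements up to the next end-of-stream. It raises Size if the number of elements to be returned is greater than maxLen for the relevant vector type.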
canInput (f, n)
returns NONE if any attempt at input would block. It returns SOME(k), where 0 <= k <= n, if a call to input would return immediately with at least k characters. Note that k = 0 corresponds to the stream being at end-of-stream.
Some streams may not support this operation, in which case the Io exception will be raised. This function also raises the Io exception if there is an error in the underlying reader. It raises the Size exception if n < 0.
Implementation note:
It is suggested that implementations of canInput should attempt to return as large a k as possible. For example, if the buffer contains 10 characters and the user calls canInput (f, 15), canInput should call readVecNB(5) to see if an additional 5 characters are available.
Such a lookahead commits the stream to the characters read by readVecNB, but it does not commit the stream to return those characters on the next call to input. Indeed, a typical implementation will simply return the remainder of the current buffer, in this case consisting of 10 characters, if input is called. On the other hand, an implementation can decide to always respond to input with all the elements currently available, provided an earlier call to input has not committed the stream to a particular response. The only requirement is that any future call of input on the same input stream must return the same vector of elements.
closeIn f
marks the stream closed and closes the underlying reader. Applying closeIn on a closed stream has no effect. This function raises the Io exception if there is an error in the underlying reader.
endOfStream f
tests whether f satisfies the end-of-stream condition. If there is no further input in the stream, then this returns true; otherwise it returns false. This function raises the Io exception if there is an error in the underlying reader.
This function may block when checking for more input. It is equivalent to

    (length(#1(input f)) = 0)

where length is the vector length operation.
Note that even if endOfStream returns true, subsequent input operations may succeed if more data becomes available. A stream can have multiple end-of-streams interspersed with normal elements. This can happen on Unix, for example, if a user types control-D (#"\^D") on a terminal device, and then keeps typing characters; it may also occur on file descriptors connected to sockets.
Multiple end-of-streams is a property of the underlying reader. Thus, readVec on a reader may return an empty string, then another call to readVec on the same reader may return a nonempty string, then a third call may return an empty string. It is always true, however, that

    endOfStream f = endOfStream f

In addition, if endOfStream f returns true, then input f returns ("",f') and endOfStream f' may or may not be true.
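output (f, vec)
writes the vector of elements vec to the stream f. This raises the exception Io if f is terminated. This function also raises the Io exception if there is an error in the underlying writer.
output1 (f, el)
writes the element el to the stream f. This raises the exception Io if f is terminated. This function also raises the Io exception if there is an error in the underlying writer.
flushOut f
flushes f's buffers to the underlying writer. This function raises the Io exception if there is an error in the underlying writer.
closeOut f
flushes f's buffers, marks the stream closed, and closes the underlying writer. This function raises the Io exception if there is an error in the underlying writer or if flushing fails. In the latter case, the stream is left open.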
mkInstream (rd, v)
returns a new instream built on top of the reader rd with the initial buffer contents v.
If the reader does not implement all of its fields (for example, if random access operations are missing), then certain operations will raise exceptions when applied to the resulting instream. The following table describes the minimal relationship between instream operations and a reader:
instream supports: | if reader implements: |
---|---|
input, inputN, etc. | readVec |
canInput | readVecNB |
endOfStream | readVec |
filePosIn | getPos and setPos |
mkInstream should construct the input stream using the reader provided. If the user wishes to employ synthesized functions in the reader, the user may call mkInstream with an augmented reader augmentReader(rd). See PRIM_IO for a description of the functions generated by augmentReader.
Building more than one input stream on top of a single reader has unpredictable effects, since readers are imperative objects. In general, there should be a 1-1 correspondence between a reader and a sequence of input streams. Also note that creating an input stream this way means that the stream could be unaware that the reader has been closed until the stream actually attempts to read from it.
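To make the relationship between mkInstream and a reader concrete, here is a sketch of a reader that supplies only readVec, built over an in-memory string. It assumes the TextPrimIO.RD record has the fields of the PRIM_IO signature as found in current Basis implementations; the name stringReader and the chunk size of 4 are illustrative choices.

```sml
structure TS = TextIO.StreamIO

(* stringReader is a hypothetical helper. Only readVec does real work;
   all optional operations are left unimplemented (NONE). *)
fun stringReader (s : string) : TextPrimIO.reader =
  let
    val pos = ref 0
    (* Return up to n characters; an empty string signals end-of-stream. *)
    fun readVec n =
      let
        val k = Int.min (n, size s - !pos)
        val v = String.substring (s, !pos, k)
      in
        pos := !pos + k; v
      end
  in
    TextPrimIO.RD {
      name = "<string>", chunkSize = 4,
      readVec = SOME readVec, readArr = NONE,
      readVecNB = NONE, readArrNB = NONE,
      block = NONE, canInput = NONE,
      avail = fn () => NONE,
      getPos = NONE, setPos = NONE, endPos = NONE, verifyPos = NONE,
      close = fn () => (), ioDesc = NONE
    }
  end

val f = TS.mkInstream (stringReader "abcdefgh", "")
val (all, _) = TS.inputAll f
```

With this reader, input and inputAll are available, while canInput and filePosIn would be expected to raise Io, in line with the table above.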
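getReader f
marks the input stream f as truncated and returns the underlying reader along with any unconsumed data from its buffer. The data returned will have the value (closeIn f; inputAll f). The function raises the exception Io if f is closed or truncated.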
filePosIn f
returns the primitive-level reader position that corresponds to the next element to be read from the buffered stream f. This raises the exception Io if the stream does not support the operation, or if f has been truncated.
It should be true that, if #1(inputAll f) returns vector v, then

    (setPos (filePosIn f); readVec (length v))

should also return v, assuming all operations are defined and terminate.
Implementation note:
If the pos type is a concrete integer corresponding to a byte offset, and the translation function (between bytes and elements) is known, the value can be computed directly. If not, the value is given by

    fun pos (bufp, n, r as RD rdr) = let
          val readVec = valOf (#readVec rdr)
          val getPos = valOf (#getPos rdr)
          val setPos = valOf (#setPos rdr)
          val savep = getPos()
          in
            setPos bufp;
            readVec n;
            getPos () before setPos savep
          end

where bufp is the file position corresponding to the beginning of the current buffer, n is the number of elements already read from the current buffer, and r is the stream's underlying reader.
setBufferMode (f, mode)
getBufferMode f
These functions set and get the buffering mode of the output stream f. Setting the buffer mode to IO.NO_BUF causes any buffered output to be flushed. If the flushing fails, the Io exception is raised. Switching the mode between IO.LINE_BUF and IO.BLOCK_BUF should not cause flushing. If, in going from IO.BLOCK_BUF to IO.LINE_BUF, the user desires that the buffer contain no newline characters, the user should call flushOut explicitly.
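The flushing effect of switching to IO.NO_BUF can be observed with a sketch like the one below. It assumes a scratch file from OS.FileSys.tmpName that can be read back while still open for output, which holds on typical Unix systems but is not guaranteed everywhere.

```sml
structure TS = TextIO.StreamIO

(* Assumption: a scratch file obtained from OS.FileSys.tmpName is writable
   and can be read back while it is still open for output. *)
val path = OS.FileSys.tmpName ()
val out = TextIO.getOutstream (TextIO.openOut path)

val () = TS.setBufferMode (out, IO.BLOCK_BUF)
val () = TS.output (out, "pending")        (* may sit in the buffer *)
val () = TS.setBufferMode (out, IO.NO_BUF) (* flushes buffered output *)

val ins = TextIO.openIn path
val seen = TextIO.inputAll ins             (* what has reached the file so far *)
val () = TextIO.closeIn ins

val unbuffered = (case TS.getBufferMode out of IO.NO_BUF => true | _ => false)
val () = (TS.closeOut out; OS.FileSys.remove path)
```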
mkOutstream (wr, mode)
returns a new output stream built on top of the writer wr with the indicated buffer mode.
If the writer does not implement all of its fields (for example, if random access operations are missing), then certain operations will raise exceptions when applied to the resulting outstream. The following table describes the minimal relationship between outstream operations and a writer:
outstream supports: | if augmented writer implements: |
---|---|
output, output1, etc. | writeArr |
flushOut | writeArr |
setBufferMode | writeArr |
getPosOut | writeArr and getPos |
setPosOut | writeArr and setPos |
mkOutstream should construct the output stream using the writer provided. If the user wishes to employ synthesized functions in the writer, the user may call mkOutstream with an augmented writer augmentWriter(wr). See PRIM_IO for a description of the functions generated by augmentWriter.
Building more than one outstream on top of a single writer has unpredictable effects, since buffering may change the order of output. In general, there should be a 1-1 correspondence between a writer and an output stream. Also note that creating an output stream this way means that the stream could be unaware that the writer has been closed until the stream actually attempts to write to it.
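getWriter f
flushes the stream f, marks it as terminated, and returns the underlying writer and the stream's buffer mode. This raises the exception Io if f is closed, or if the flushing fails.
getPosOut f
returns the current position of the stream f. This raises the exception Io if the stream does not support the operation, if any implicit flushing fails, or if f is terminated.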
Implementation note:
A typical implementation of this function will require calculating a value of type pos, capturing where the next element written to f will be written in the underlying file. If the pos type is a concrete integer corresponding to a byte offset, and the translation function (between bytes and elements) is known, the value can be computed directly using getPos. If not, the value is given by

    fun pos (f, w as WR wtr) = let
          val getPos = valOf (#getPos wtr)
          in
            flushOut f;
            getPos ()
          end

where f is the output stream and w is the stream's underlying writer.
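setPosOut opos
flushes the output buffer of the stream underlying opos, sets the current position of the stream to the position recorded in opos, and returns the stream. This raises an Io exception if the flushing fails, if the stream does not support the operation, or if the stream underlying opos is terminated.
filePosOut opos
returns the primitive-level writer position that corresponds to the abstract output stream position opos.
Suppose we are given an output stream f and a vector of elements v, and let opos equal getPosOut(f). Then the code

    (setPos (filePosOut opos); writeVec{buf=v,i=0,sz=NONE})

should have the same effect as the last line of the function

    fun put (outs,x) = (flushOut outs; output(outs,x); flushOut outs)

when called with (f,v), assuming all operations are defined and terminate, and that the call to writeVec returns length v.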
See also: IMPERATIVE_IO, PRIM_IO, StreamIO, TEXT_STREAM_IO
The type of input1 is inconsistent with those of the other input functions: it returns an option rather than a vector. This is intentional: its type makes it a (char,instream) StringCvt.reader, and thus a source of characters for the various scan functions.
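For instance, input1 can be passed directly to Int.scan, as in the sketch below. It assumes the TextIO.StreamIO instantiation and an in-memory stream from TextIO.openString; scanInt is a hypothetical helper.

```sml
structure TS = TextIO.StreamIO

(* scanInt is a hypothetical helper: Int.scan consumes characters through
   TS.input1 and returns the stream positioned just after the number. *)
fun scanInt (f : TS.instream) : (int * TS.instream) option =
      Int.scan StringCvt.DEC TS.input1 f

val f = TextIO.getInstream (TextIO.openString "42 apples")
val result =
      case scanInt f of
          SOME (n, rest) => SOME (n, #1 (TS.inputAll rest))
        | NONE => NONE
```

If the scan fails, the original stream f is still available unchanged, so another scanner can be tried from the same point.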
Another point to notice about input1 is that it cannot be used to read beyond an end-of-stream. When an end-of-stream is encountered, the programmer will need to use one of the other input functions to obtain the stream after the end-of-stream. For example, if input1(f) returns NONE, a call to inputN(f,1) will return immediately with (fromList [], f'), and f' can be used to continue input.
It is possible that a stream's underlying reader/writer, or its operating system file descriptor, could be closed while the stream is still active. When this condition is detected, typically by an exception being raised by the lower level, the stream should raise the IO.Io exception with cause set to IO.ClosedStream. On a related point, one can close a truncated or terminated stream. This is intended as a convenience, with the inactive stream providing a handle to the underlying file, but it also provides an opportunity to close a reader or writer being actively used by another stream.
Output flushing can occur by calls to any output operation, or by calls to flushOut, closeOut, getWriter, setPosOut, getPosOut, or if setBufferMode is called with mode IO.NO_BUF. If flushing finds that it can do only a partial write (i.e., writeVec or a similar function returns a number of elements written less than its sz argument), then the stream function must adjust the stream's buffer for the items written and then try again. If the first or any successive write attempt raises an exception, then the stream function must raise the IO.Io exception.
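The retry rule for partial writes can be sketched independently of any real writer. In the code below, write is a hypothetical stand-in for a primitive write operation that accepts a prefix of its argument and returns the number of characters taken; flushAll and the name "<sketch>" are likewise illustrative.

```sml
(* flushAll is a hypothetical helper. It keeps calling write on the
   unwritten remainder until nothing is left, and reports any exception
   from write as IO.Io with that exception as the cause. *)
fun flushAll (write : string -> int) (buf : string) : unit =
  if buf = "" then ()
  else
    let
      val k = (write buf)
                handle e =>
                  raise IO.Io {name = "<sketch>", function = "flushAll", cause = e}
    in
      (* Drop the k characters that were written and try again. *)
      flushAll write (String.extract (buf, k, NONE))
    end

(* Usage: a sink that accepts at most 3 characters per call. *)
val sink = ref ""
fun write3 s =
  let val k = Int.min (3, size s)
  in sink := !sink ^ String.substring (s, 0, k); k end

val () = flushAll write3 "abcdefgh"
```

The sketch loops forever if write keeps returning 0; a real implementation would need to decide how to treat a writer that makes no progress.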
For the remainder of this chapter, we shall assume the following binding:

    structure TS = TextIO.StreamIO

and that elem = char. Also, the predicates used to illustrate a point should all evaluate to true, assuming they complete without exception.
Input is semi-deterministic: input may read any number of elements from f the "first" time, but then it is committed to its choice, and must return the same number of elements on subsequent reads from the same point:

    fun chkInput f = let
          val (a,_) = TS.input f
          val (b,_) = TS.input f
          in
            a=b
          end

always returns true. In general, any expression involving input streams and functions defined in the STREAM_IO signature should always evaluate to the same value, barring exceptions.
Closing or truncating a stream just causes the not-yet-determined part of the stream to be empty:

    fun chkClose f = let
          val (a,f') = TS.input f
          val _ = TS.closeIn f
          val (b,_) = TS.input f
          in
            a=b andalso TS.endOfStream f'
          end

Closing a closed stream is legal and harmless:

    fun closeTwice f = (TS.closeIn f; TS.closeIn f; true)

If a stream has already been at least partly determined, then input cannot possibly block:

    fun noBlock f = let
          val (s,_) = TS.input f
          in
            case TS.canInput (f, 1)
             of SOME 0 => (size s) = 0
              | SOME _ => (size s) > 0
              | NONE => false
          end

Note that a successful canInput does not imply that more characters remain before end-of-stream, just that reading will not block.
A freshly opened stream is still undetermined (no "read" has yet been done on the underlying reader):

    fun newStr rdr = let
          val a = TS.mkInstream (rdr, "")
          in
            TS.closeIn a;
            size (#1 (TS.input a)) = 0
          end

This has the useful consequence that if one opens a stream, then extracts the underlying reader, the reader has not yet been advanced in its file.
A generalization of this property says that the first time any stream value is produced, it is up-to-date with respect to its reader:

    fun nreads (f,0) = f
      | nreads (f,n) = let
          val (_,f') = TS.input f
          in
            nreads (f',n-1)
          end

    fun reads (rdr, n) = let (* for any n>=0 *)
          val f = nreads (TS.mkInstream (rdr, ""), n)
          in
            TS.closeIn f;
            size (#1 (TS.input f)) = 0
          end

The sequence of strings returned from a fresh stream by input is exactly the sequence returned by the underlying reader. This includes end-of-stream conditions, which the reader indicates by returning a zero-element vector and input indicates in the same way.
The endOfStream test is equivalent to input returning an empty sequence:

    fun isEOS f = let
          val (a,_) = TS.input f
          in
            ((size a)=0) = (TS.endOfStream f)
          end

The semantics of inputAll can be defined in terms of input:

    fun inputAll f = (case TS.input f
           of ("",f') => ("",f')
            | (s,f') => let
                val (rest,f'') = inputAll f'
                in
                  (s ^ rest, f'')
                end)

An actual implementation, however, is likely to be much more efficient; for example, on a large file, inputAll might read the whole file in a single system call or use memory mapping. Note that if a stream f contains data "abc" followed by an end-of-stream followed by "defg" and another end-of-stream, then inputAll f returns ("abc",f'), and inputAll f' returns ("defg",f'').
The semantics of inputN can be related to inputAll by the following predicate:

    fun allAndN (f,n) = let
          val (s,f1) = TS.inputN (f,n)
          val (t,f2) = TS.inputAll f
          in
            size s < n andalso s=t andalso equiv (f1,f2)
            orelse let
              val (r,f3) = TS.inputAll f1
              in
                size s = n andalso t = s ^ r andalso equiv (f2,f3)
              end
          end

where the equiv predicate represents that the two argument streams behave identically under input:

    fun equiv (f,g) = let
          val (s,f') = TS.input f
          val (t,g') = TS.input g
          in
            s=t andalso equiv (f',g')
          end

ignoring termination conditions. If f contained exactly n characters before the end-of-stream, then r in allAndN will be the empty string. Another way of saying this is that inputN returns fewer than n characters if and only if those elements are followed by an end-of-stream.
The semantics of input1 can be defined in terms of inputN:

    fun input1 f = (case TS.inputN (f,1)
           of ("",_) => NONE
            | (s,f') => SOME (String.sub (s,0), f'))

If chunkSize = 1 in the underlying reader, then input operations should be unbuffered. Thus, the following function always returns true:

    fun isTrue (rdr : TextPrimIO.reader) = let
          val f = TS.mkInstream (rdr, "")
          val (_,f') = TS.input f
          val (TextPrimIO.RD{chunkSize,...},s) = TS.getReader f'
          in
            (chunkSize > 1) orelse (size s = 0)
          end

where rdr denotes a reader created from a newly opened file. Although input may perform a Primitive I/O read operation on the reader for k >= 1 elements, it must immediately return all the elements it receives. This does not hold, however, for partly determined input streams. For example, the function

    fun maybeTrue (rdr : TextPrimIO.reader) = let
          val f = TS.mkInstream (rdr, "")
          val _ = TS.input (#2 (TS.input f))
          val (_,f') = TS.input f
          val (TextPrimIO.RD{chunkSize,...},s) = TS.getReader f'
          in
            (chunkSize > 1) orelse (size s = 0)
          end

might return false. In this case, the stream f has accumulated a history of more input, which will not be emptied by a single call to input.
Similarly, if a writer sets chunkSize = 1, it suggests that output operations should be unbuffered. An application can specify that a stream should be unbuffered using the setBufferMode function.
Implementation note:
A general rule for implementing stream input is: "do not bother the reader." Whenever it is possible to do so, input must be done by using elements from the buffer, without any operation on the underlying reader. This is necessary so that repeated calls to endOfStream will not make repeated system calls.
Implementations may require a device such as an extra boolean to mark multiple end-of-streams, in order that input applied to the same stream always returns the same vector.
The manual page of the StreamIO functor lists a variety of implementation suggestions, many of which are applicable to any implementation of the STREAM_IO signature.
In general, if an exception occurs during any Stream I/O operation, then the stream must leave itself in a consistent state, without losing or duplicating data. In some SML systems, a user interrupt aborts execution and returns control to a top-level prompt, without raising any exception that the current execution can handle. In that situation it may be impossible to preserve all information; data (input or output) must never be duplicated, but may be lost. This can be accomplished without Stream I/O doing any explicit masking of interrupts or locking. On output, the internal state (saying how much has been written) should be updated before doing the write operation; on input, the read should be done before updating the count of valid characters in the buffer.
Generated April 12, 2004
Last Modified May 10, 1996
Comments to John Reppy.
This document may be distributed freely over the internet as long as the copyright notice and license terms below are prominently displayed within every machine-readable copy.
Copyright © 2004 AT&T and Lucent Technologies. All rights reserved.
Permission is granted for internet users to make one paper copy for their
own personal use. Further hardcopy reproduction is strictly prohibited.
Permission to distribute the HTML document electronically on any medium
other than the internet must be requested from the copyright holders by
contacting the editors.
Printed versions of the SML Basis Manual are available from Cambridge University Press. To order, please visit www.cup.org (North America) or www.cup.cam.ac.uk (outside North America).