Streams offer another way to represent sequential data implicitly. A stream is a lazily computed linked list. Like the Link class from Chapter 2, a Stream instance responds to requests for its first element and the rest of the stream. Like an Link, the rest of a Stream is itself a Stream. Unlike an Link, the rest of a stream is only computed when it is looked up, rather than being stored in advance. That is, the rest of a stream is computed lazily.

To achieve this lazy evaluation, a stream stores a function that computes the rest of the stream. Whenever this function is called, its returned value is cached as part of the stream in an attribute called _rest, named with an underscore to indicate that it should not be accessed directly.

The accessible attribute rest is a property method that returns the rest of the stream, computing it if necessary. With this design, a stream stores how to compute the rest of the stream, rather than always storing the rest explicitly.

>>> class Stream:
            """A lazily computed linked list."""
            class empty:
                def __repr__(self):
                    return 'Stream.empty'
            empty = empty()
            def __init__(self, first, compute_rest=lambda: empty):
                assert callable(compute_rest), 'compute_rest must be callable.'
                self.first = first
                self._compute_rest = compute_rest
            @property
            def rest(self):
                """Return the rest of the stream, computing it if necessary."""
                if self._compute_rest is not None:
                    self._rest = self._compute_rest()
                    self._compute_rest = None
                return self._rest
            def __repr__(self):
                return 'Stream({0}, <...>)'.format(repr(self.first))
    

A linked list is defined using a nested expression. For example, we can create an Link that represents the elements 1 then 5 as follows:

>>> r = Link(1, Link(2+3, Link(9)))
    

Likewise, we can create a Stream representing the same series. The Stream does not actually compute the second element 5 until the rest of the stream is requested. We achieve this effect by creating anonymous functions.

>>> s = Stream(1, lambda: Stream(2+3, lambda: Stream(9)))
    

Here, 1 is the first element of the stream, and the lambda expression that follows returns a function for computing the rest of the stream.

Accessing the elements of linked list r and stream s proceed similarly. However, while 5 is stored within r, it is computed on demand for s via addition, the first time that it is requested.

>>> r.first
    1
    >>> s.first
    1
    >>> r.rest.first
    5
    >>> s.rest.first
    5
    >>> r.rest
    Link(5, Link(9))
    >>> s.rest
    Stream(5, <...>)
    

While the rest of r is a two-element linked list, the rest of s includes a function to compute the rest; the fact that it will return the empty stream may not yet have been discovered.

When a Stream instance is constructed, the field self._rest is None, signifying that the rest of the Stream has not yet been computed. When the rest attribute is requested via a dot expression, the rest property method is invoked, which triggers computation with self._rest = self._compute_rest(). Because of the caching mechanism within a Stream, the compute_rest function is only ever called once, then discarded.

The essential properties of a compute_rest function are that it takes no arguments, and it returns a Stream or Stream.empty.

Lazy evaluation gives us the ability to represent infinite sequential datasets using streams. For example, we can represent increasing integers, starting at any first value.

>>> def integer_stream(first):
            def compute_rest():
                return integer_stream(first+1)
            return Stream(first, compute_rest)
    
>>> positives = integer_stream(1)
    >>> positives
    Stream(1, <...>)
    >>> positives.first
    1
    

When integer_stream is called for the first time, it returns a stream whose first is the first integer in the sequence. However, integer_stream is actually recursive because this stream's compute_rest calls integer_stream again, with an incremented argument. We say that integer_stream is lazy because the recursive call to integer_stream is only made whenever the rest of an integer stream is requested.

>>> positives.first
    1
    >>> positives.rest.first
    2
    >>> positives.rest.rest
    Stream(3, <...>)
    

The same higher-order functions that manipulate sequences -- map and filter -- also apply to streams, although their implementations must change to apply their argument functions lazily. The function map_stream maps a function over a stream, which produces a new stream. The locally defined compute_rest function ensures that the function will be mapped onto the rest of the stream whenever the rest is computed.

>>> def map_stream(fn, s):
            if s is Stream.empty:
                return s
            def compute_rest():
                return map_stream(fn, s.rest)
            return Stream(fn(s.first), compute_rest)
    

A stream can be filtered by defining a compute_rest function that applies the filter function to the rest of the stream. If the filter function rejects the first element of the stream, the rest is computed immediately. Because filter_stream is recursive, the rest may be computed multiple times until a valid first element is found.

>>> def filter_stream(fn, s):
            if s is Stream.empty:
                return s
            def compute_rest():
                return filter_stream(fn, s.rest)
            if fn(s.first):
                return Stream(s.first, compute_rest)
            else:
                return compute_rest()
    

The map_stream and filter_stream functions exhibit a common pattern in stream processing: a locally defined compute_rest function recursively applies a processing function to the rest of the stream whenever the rest is computed.

To inspect the contents of a stream, we can coerce up to the first k elements to a Python list.

>>> def first_k_as_list(s, k):
            first_k = []
            while s is not Stream.empty and k > 0:
                first_k.append(s.first)
                s, k = s.rest, k-1
            return first_k
    

These convenience functions allow us to verify our map_stream implementation with a simple example that squares the integers from 3 to 7.

>>> s = integer_stream(3)
    >>> s
    Stream(3, <...>)
    >>> m = map_stream(lambda x: x*x, s)
    >>> m
    Stream(9, <...>)
    >>> first_k_as_list(m, 5)
    [9, 16, 25, 36, 49]
    

We can use our filter_stream function to define a stream of prime numbers using the sieve of Eratosthenes, which filters a stream of integers to remove all numbers that are multiples of its first element. By successively filtering with each prime, all composite numbers are removed from the stream.

>>> def primes(pos_stream):
            def not_divible(x):
                return x % pos_stream.first != 0
            def compute_rest():
                return primes(filter_stream(not_divible, pos_stream.rest))
            return Stream(pos_stream.first, compute_rest)
    

By truncating the primes stream, we can enumerate any prefix of the prime numbers.

>>> prime_numbers = primes(integer_stream(2))
    >>> first_k_as_list(prime_numbers, 7)
    [2, 3, 5, 7, 11, 13, 17]
    

Streams contrast with iterators in that they can be passed to pure functions multiple times and yield the same result each time. The primes stream is not "used up" by converting it to a list. That is, the first element of prime_numbers is still 2 after converting the prefix of the stream to a list.

>>> prime_numbers.first
    2
    

Just as linked lists provide a simple implementation of the sequence abstraction, streams provide a simple, functional, recursive data structure that implements lazy evaluation through the use of higher-order functions.