Composing Programs - Interpreting SQL

Browse Knowledge See Learning Tools Try Rapid Repetition Take Practice Test Play Quiz Game Study Flash Cards

In order to create an interpreter for the subset of SQL we have introduced so far, we need to create a representation for tables, a parser for statements written as text, and an evaluator for parsed statements. The sql interpreter example includes all of these components, providing a simple but functional demonstration of a declarative language interpreter.

In this implementation, each table has its own a class, and each row in a table is represented by an instance of its table's class. A row has one attribute per column in the table, and a table is a sequence of rows.

The class for a table is created using the namedtuple function in the collections package of the Python standard library, which returns a new sub-class of tuple that gives names to each element in the tuple.

Consider the cities table from the previous section, repeated below.

sqlite> create table cities as
       ...>    select 38 as latitude, 122 as longitude, "Berkeley" as name union
       ...>    select 42,             71,               "Cambridge"        union
       ...>    select 45,             93,               "Minneapolis";

The following Python statements construct a representation for this table.

>>> from collections import namedtuple
    >>> CitiesRow = namedtuple("Row", ["latitude", "longitude", "name"])
    >>> cities = [CitiesRow(38, 122, "Berkeley"),
                  CitiesRow(42,  71, "Cambridge"),
                  CitiesRow(43,  93, "Minneapolis")]

The result of a select statement can be interpreted using sequence operations. Consider the distances table from the previous section, repeated below.

sqlite> create table distances as
       ...>   select name, 60*abs(latitude-38) as distance from cities;
    sqlite> select distance/5, name from distances;
    0|Berkeley
    48|Cambridge
    84|Minneapolis

This table is generated from the name and latitude columns of the cities table. This resulting table can be generated by mapping a function over the rows of the input table, a function that returns a DistancesRow for each CitiesRow.

>>> DistancesRow = namedtuple("Row", ["name", "distance"])
    >>> def select(cities_row):
            latitude, longitude, name = cities_row
            return DistancesRow(name, 60*abs(latitude-38))
    >>> distances = list(map(select, cities))
    >>> for row in distances:
            print(row)
    Row(name='Berkeley', distance=0)
    Row(name='Cambridge', distance=240)
    Row(name='Minneapolis', distance=300)

The design of our SQL interpreter generalizes this approach. A select statement is represented as an instance of a class Select that is constructed from the clauses of the select statement.

>>> class Select:
            """select [columns] from [tables] where [condition] order by [order]."""
            def __init__(self, columns, tables, condition, order):
                self.columns = columns
                self.tables = tables
                self.condition = condition
                self.order = order
                self.make_row = create_make_row(self.columns)
            def execute(self, env):
                """Join, filter, sort, and map rows from tables to columns."""
                from_rows = join(self.tables, env)
                filtered_rows = filter(self.filter, from_rows)
                ordered_rows = self.sort(filtered_rows)
                return map(self.make_row, ordered_rows)
            def filter(self, row):
                if self.condition:
                    return eval(self.condition, row)
                else:
                    return True
            def sort(self, rows):
                if self.order:
                    return sorted(rows, key=lambda r: eval(self.order, r))
                else:
                    return rows

The execute method joins input tables, filters and orders the resulting rows, then maps a function called make_row over those resulting rows. The make_row function is created in the Select constructor by a call to create_make_row, a higher-order function that creates a new class for the resulting table and defines how to project an input row to an output row. (A version of this function with more error handling and special cases appears in sql.)

>>> def create_make_row(description):
            """Return a function from an input environment (dict) to an output row.
            description -- a comma-separated list of [expression] as [column name]
            """
            columns = description.split(", ")
            expressions, names = [], []
            for column in columns:
                if " as " in column:
                    expression, name = column.split(" as ")
                else:
                    expression, name = column, column
                expressions.append(expression)
                names.append(name)
            row = namedtuple("Row", names)
            return lambda env: row(*[eval(e, env) for e in expressions])

Finally, we need to define the join function that creates the input rows. Given an env dictionary contains existing tables (lists of rows) keyed by their name, the join function groups together all combinations of rows in the input tables using the product function in the itertools package. It maps a function called make_env over the joined rows, a function that converts each combination of rows into a dictionary so that it can be used to evaluate expressions. (A version of this function with more error handling and special cases appears in sql.)

>>> from itertools import product
    >>> def join(tables, env):
            """Return an iterator over dictionaries from names to values in a row.
            tables -- a comma-separate sequences of table names
            env    -- a dictionary from global names to tables
            """
            names = tables.split(", ")
            joined_rows = product(*[env[name] for name in names])
            return map(lambda rows: make_env(rows, names), joined_rows)
    >>> def make_env(rows, names):
            """Create an environment of names bound to values."""
            env = dict(zip(names, rows))
            for row in rows:
                for name in row._fields:
                    env[name] = getattr(row, name)
            return env

Above, row._fields evaluates to the column names of the table containing the row. The _fields attribute exists because the type of row is a namedtuple class.

Our interpreter is complete enough to execute select statements. For instance, we can compute the latitude distance from Berkeley for all other cities, ordered by their longitude.

>>> env = {"cities": cities}
    >>> select = Select("name, 60*abs(latitude-38) as distance",
                        "cities", "name != 'Berkeley'", "-longitude")
    >>> for row in select.execute(env):
            print(row)
    Row(name='Minneapolis', distance=300)
    Row(name='Cambridge', distance=240)

The example above is equivalent to the following SQL statement.

sqlite> select name, 60*abs(latitude-38) as distance
       ...>        from cities where name != "Berkeley" order by -longitude;
    Minneapolis|420
    Cambridge|240

We can also store this resulting table in the environment and join it with the cities table, retrieving the longitude for each city.

>>> env["distances"] = list(select.execute(env))
    >>> joined = Select("cities.name as name, distance, longitude", "cities, distances",
                        "cities.name == distances.name", None)
    >>> for row in joined.execute(env):
            print(row)
    Row(name='Cambridge', distance=240, longitude=71)
    Row(name='Minneapolis', distance=300, longitude=93)

The example above is equivalent to the following SQL statement.

sqlite> select cities.name as name, distance, longitude
       ...>        from cities, distances where cities.name = distances.name;
    Cambridge|240|71
    Minneapolis|420|93

The full sql example program also contains a simple parser for select statements, as well as execute methods for create table and union. The interpreter can correctly execute all SQL statements included within the text so far. While this simple interpreter only implements a small amount of the full Structured Query Language, its structure demonstrates the relationship between sequence processing operations and query languages.

Query Plans. Declarative languages describe the form of a result, but do not explicitly describe how that result should be computed. This interpreter always joins, filters, orders, and then projects the input rows in order to compute the result rows. However, more efficient ways to compute the same result may exist, and query interpreters are free to choose among them. Choosing efficient procedures for computing query results is a core feature of database systems.

For example, consider the final select statement above. Rather than computing the join of cities and distances and then filtering the result, the same result may be computed by first sorting both tables by the name column and then joining only the rows that have the same name in a linear pass through the sorted tables. When tables are large, efficiency gains from query plan selection can be substantial.

Original Link

http://composingprograms.com/pages/43-declarative-programming.html#interpreting-sql