To specify a MapReduce application, we require an implementation of the MapReduce framework into which we can insert map and reduce functions. In the following section, we will use the open-source Hadoop implementation. In this section, we develop a minimal implementation using built-in tools of the Unix operating system.

The Unix operating system creates an abstraction barrier between user programs and the underlying hardware of a computer. It provides a mechanism for programs to communicate with each other, in particular by allowing one program to consume the output of another. In their seminal text on Unix programming, Kernigham and Pike assert that, ""The power of a system comes more from the relationships among programs than from the programs themselves."

A Python source file can be converted into a Unix program by adding a comment to the first line indicating that the program should be executed using the Python 3 interpreter. The input to a Unix program is an iterable object called standard input and accessed as sys.stdin. Iterating over this object yields string-valued lines of text. The output of a Unix program is called standard output and accessed as sys.stdout. The built-in print function writes a line of text to standard output. The following Unix program writes each line of its input to its output, in reverse:

#!/usr/bin/env python3
  
  import sys
  
  for line in sys.stdin:
      print(line.strip('\n')[::-1])
  

If we save this program to a file called rev.py, we can execute it as a Unix program. First, we need to tell the operating system that we have created an executable program:

$ chmod u+x rev.py
  

Next, we can pass input into this program. Input to a program can come from another program. This effect is achieved using the | symbol (called "pipe") which channels the output of the program before the pipe into the program after the pipe. The program nslookup outputs the host name of an IP address (in this case for the New York Times):

$ nslookup 170.149.172.130 | ./rev.py
  moc.semityn.www
  

The cat program outputs the contents of files. Thus, the rev.py program can be used to reverse the contents of the rev.py file:

$ cat rev.py | ./rev.py
  3nohtyp vne/nib/rsu/!#
  
  sys tropmi
  
  :nidts.sys ni enil rof
  )]1-::[)'n\'(pirts.enil(tnirp
  

These tools are enough for us to implement a basic MapReduce framework. This version has only a single map task and single reduce task, which are both Unix programs implemented in Python. We run an entire MapReduce application using the following command:

$ cat input | ./mapper.py | sort | ./reducer.py
  

The mapper.py and reducer.py programs must implement the map function and reduce function, along with some simple input and output behavior. For instance, in order to implement the vowel counting application described above, we would write the following count_vowels_mapper.py program:

#!/usr/bin/env python3
  
  import sys
  from mr import emit
  
  def count_vowels(line):
      """A map function that counts the vowels in a line."""
      for vowel in 'aeiou':
          count = line.count(vowel)
          if count > 0:
              emit(vowel, count)
  
  for line in sys.stdin:
      count_vowels(line)
  

In addition, we would write the following sum_reducer.py program:

#!/usr/bin/env python3
  
  import sys
  from mr import values_by_key, emit
  
  for key, value_iterator in values_by_key(sys.stdin):
      emit(key, sum(value_iterator))
  

The mr module is a companion module to this text that provides the functions emit to emit a key-value pair and group_values_by_key to group together values that have the same key. This module also includes an interface to the Hadoop distributed implementation of MapReduce.

Finally, assume that we have the following input file called haiku.txt:

Google MapReduce
  Is a Big Data framework
  For batch processing
  

Local execution using Unix pipes gives us the count of each vowel in the haiku:

$ cat haiku.txt | ./count_vowels_mapper.py | sort | ./sum_reducer.py
  'a'   6
  'e'   5
  'i'   2
  'o'   5
  'u'   1