
Makefiles without the annoying bits

Build automation: the classic case

Build automation: NLP pipeline

Build automation: running machine learning experiments

What’s great about Makefiles

  • power of shell scripts + dependency management
  • target wildcards: abstract over multiple objects/documents/experiments…
  • declarative → self-documenting
  • widespread → suitable for distribution and replication

What’s not so great

  • arcane syntax
  • wildcards quite limited


Produce to the rescue!

Make syntax

# Compile
%.o : %.c
	cc -c $<

# Link
% : %.o
	cc -o $@ $<

Produce syntax

# Compile
[%{name}.o]
dep.c = %{name}.c
recipe = cc -c %{c}

# Link
[%{name}]
dep.o = %{name}.o
recipe = cc -o %{target} %{o}

Make: only first prerequisite named

out/%.pos : out/ out/%.pos.corr
	./src/scripts/apply_corrections $< \
        --corrections out/$*.pos.corr > $@

DRY violation!

Produce: name prerequisites

[out/%{name}.pos] = %{name}
dep.corr = %{name}.pos.corr
recipe = ./src/scripts/apply_corrections %{auto} %{corr} > %{target} 

You don’t have to name them

[%{name}.pdf]
deps = %{name}.tex bibliography.bib
recipe =
	pdflatex %{name}
	bibtex %{name}
	pdflatex %{name}
	pdflatex %{name}

Make: a single wildcard

out/%.labeled : out/$$(subst test,train,$$(subst dev,train,$$*)).model \
                out/$$(basename $$*).feat
	wapiti label -m $< out/$(basename $*).feat > $@

Produce: any number of wildcards…

[out/%{corpus}.%{portion}.%{fset}.labeled]
dep.model = out/%{corpus}.train.%{fset}.model
dep.input = out/%{corpus}.%{portion}.feat
recipe = wapiti label -m %{model} %{input} > %{target}

…or even regular expressions…

dep.model = out/%{corpus}.train.%{fset}.model
dep.input = out/%{corpus}.%{portion}.feat
recipe = wapiti label -m %{model} %{input} > %{target}

…or matching conditions.

[out/%{corpus}.%{portion}.%{fset}.labeled]
cond = %{portion in ('dev', 'test')}
dep.model = out/%{corpus}.train.%{fset}.model
dep.input = out/%{corpus}.%{portion}.feat
recipe = wapiti label -m %{model} %{input} > %{target}
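
How several named wildcards might bind against one target name can be sketched in Python. This is an illustration only, not Produce's actual implementation: the helper names pattern_to_regex, match_target and expand are made up here, and each wildcard is assumed to match any non-empty string, as short as possible.

```python
import re

# A wildcard looks like %{name}; the name must be a valid regex group name.
WILDCARD = re.compile(r'%\{([A-Za-z_]\w*)\}')

def pattern_to_regex(pattern):
    """Compile 'out/%{a}.%{b}' into a regex with named groups a and b."""
    parts, pos, seen = [], 0, set()
    for m in WILDCARD.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal text
        name = m.group(1)
        if name in seen:
            parts.append('(?P=%s)' % name)       # repeated wildcard: same value
        else:
            parts.append('(?P<%s>.+?)' % name)   # first use: capture it
            seen.add(name)
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile(''.join(parts))

def match_target(pattern, target, cond=None):
    """Return the wildcard bindings if the rule applies to target, else None."""
    m = pattern_to_regex(pattern).fullmatch(target)
    if m is None:
        return None
    bindings = m.groupdict()
    if cond is not None and not cond(bindings):  # optional matching condition
        return None
    return bindings

def expand(template, bindings):
    """Fill the bound wildcard values into a dependency template."""
    return WILDCARD.sub(lambda m: bindings[m.group(1)], template)

rule = 'out/%{corpus}.%{portion}.%{fset}.labeled'
cond = lambda b: b['portion'] in ('dev', 'test')
bindings = match_target(rule, 'out/conll.dev.basic.labeled', cond)
model = expand('out/%{corpus}.train.%{fset}.model', bindings)
```

Once the target has bound corpus, portion and fset, the dependency names follow by plain substitution, which is what the nested subst/basename calls in the Make version have to emulate.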

Make: counterintuitive declarations

.PHONY: clean
clean:
	rm *.o temp

Produce: rule attributes

[clean]
type = task
recipe = rm *.o temp

Make: a handful of functions

sources := foo.c bar.c baz.s ugh.h
foo: $(sources)
	cc $(filter %.c %.s,$(sources)) -o foo

Produce: the full power of Python

[]
sources = foo.c bar.c baz.s ugh.h

[foo]
deps = %{sources}
recipe = cc %{' '.join([f for f in sources.split() \
        if f.endswith('.c') or f.endswith('.s')])} -o %{target}
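
How a %{...} expression could be expanded with eval can be sketched as follows. This is a simplified illustration, not Produce's actual code: the function name interpolate is made up, and expressions are assumed to contain no curly braces of their own.

```python
import re

# Matches %{expr}, where expr contains no curly braces (simplifying assumption).
EXPR = re.compile(r'%\{([^{}]*)\}')

def interpolate(text, variables):
    """Replace every %{expr} in text by str(eval(expr)).

    The variables are passed as the globals of the evaluation so that
    comprehensions inside the expression can see them too.
    """
    def repl(m):
        return str(eval(m.group(1), dict(variables)))
    return EXPR.sub(repl, text)

variables = {'sources': 'foo.c bar.c baz.s ugh.h', 'target': 'foo'}
recipe = ("cc %{' '.join([f for f in sources.split() "
          "if f.endswith('.c') or f.endswith('.s')])} -o %{target}")
command = interpolate(recipe, variables)
```

A plain variable reference such as %{target} is then just the simplest possible expression. As with a Makefile, the build file is trusted code, so evaluating it is no more dangerous than running its recipes.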

Minor Make annoyances fixed

  • % vs. $* confusion
  • $$ in recipe

Design goals

  1. killer feature: multiple wildcards
  2. low barrier for users
    • intuitive syntax
    • easy to install: one file, drop it in $PATH
  3. do one thing and do it well
    • any scripting language for recipes
    • Python for dependency generation
    • thin wrapper for Producefile syntax and dependency management
  4. low development cost

Why Python?

  • no compilation needed
  • rich standard library
    • argparse for command-line options
    • configparser and shlex for parsing Producefiles
    • eval for evaluating Python expressions
    • subprocess for executing recipes
    • logging for info and debugging

Other build automation tools

  • Ant, SCons, A-A-P…
    • define their own languages for recipes
    • designed with specific tasks in mind
  • Rake
    • Rakefile = Ruby script
    • too much syntax
  • redo
    • commendably minimal


None of them support multiple wildcards!

Rake example

# Compile
rule '.o' => ['.c'] do |t|
  sh "cc -c #{t.source}"
end

  • too much syntax to remember
  • no multiple wildcards (just regex)

Coolest challenges

  • interplay of %{wildcards} and %{"expressions"}
  • the build algorithm

The build algorithm


  • automatically run all commands to produce requested target
  • don’t build files that are already up-to-date (timestamp)
  • allow for deleting intermediate files without affecting up-to-dateness

Phase 1: build dependency graph, gather info

Starting from targets requested by user, for each target,
  1. fail on cyclic dependency
  2. stop if target already seen
  3. determine which rule to use (first that matches)
  4. list direct dependencies, process them recursively
  5. determine target type: file or task
  6. determine whether target is missing
  7. determine target time
  8. determine whether target is out of date
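
Steps 1 to 4 can be sketched as a depth-first traversal. This is a minimal sketch under simplifying assumptions, not Produce's actual code: rule matching is replaced by a hypothetical callable direct_deps_of that returns the direct dependencies of a target, and the function name build_graph is made up.

```python
def build_graph(requested, direct_deps_of):
    """Collect the direct dependencies of every target reachable from requested.

    direct_deps_of stands in for "find the first matching rule and expand
    its dependencies". Raises ValueError on a cyclic dependency.
    """
    direct_dependencies = {}

    def visit(target, path):
        if target in path:                      # step 1: cycle check
            chain = ' -> '.join(path + [target])
            raise ValueError('cyclic dependency: ' + chain)
        if target in direct_dependencies:       # step 2: already seen
            return
        deps = list(direct_deps_of(target))     # steps 3 and 4
        direct_dependencies[target] = deps
        for dd in deps:
            visit(dd, path + [target])

    for target in requested:
        visit(target, [])
    return direct_dependencies

rules = {'prog': ['prog.o'], 'prog.o': ['prog.c'], 'prog.c': []}
graph = build_graph(['prog'], lambda t: rules.get(t, []))
```

The path argument holds the chain of targets currently being expanded, so a target that reappears on its own chain is reported as a cycle, while a target merely shared by two rules is visited once and skipped the second time.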

Phase 2: produce

For each target requested by user, call build_if_necessary(target):

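def build_if_necessary(target):
    if target in out_of_date or target in missing:
        build(target)

def build(target):
    for dd in direct_dependencies[target]:
        build_if_necessary(dd)  # bring each direct dependency up to date first
    run_recipe(target)  # placeholder name: execute the matching rule's recipe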


A target is missing iff its type is file and it does not exist.

“Out of date”?

A target is out of date if any of these conditions holds:
  • its type is task
  • some direct dependency is newer
  • some direct dependency is out of date
  • the “always build” option is on


The time of a task is 0.

The time of a missing file is the time of its newest direct dependency (or 0 if none).

The time of an existing file is its last-modified time.
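
The definitions of "missing", "out of date" and target time can be put together in a short sketch. It is an illustration under stated assumptions, not Produce's actual code: the function name gather_info is made up, the file system is replaced by a dict mtimes from file name to last-modified time, and the dependency graph is assumed to be acyclic.

```python
def gather_info(targets, direct_dependencies, task_targets, mtimes,
                always_build=False):
    """Return (missing, out_of_date, time) for all reachable targets.

    task_targets: set of targets whose type is task; all others are files.
    mtimes: stand-in for the file system; a file exists iff it is a key.
    """
    missing, out_of_date, time = set(), set(), {}

    def visit(target):
        if target in time:                      # already processed
            return
        deps = direct_dependencies.get(target, [])
        for dd in deps:
            visit(dd)
        is_task = target in task_targets
        if not is_task and target not in mtimes:
            missing.add(target)
        if is_task:
            time[target] = 0
        elif target in missing:                 # inherit newest dependency time
            time[target] = max((time[dd] for dd in deps), default=0)
        else:
            time[target] = mtimes[target]
        if (is_task or always_build
                or any(time[dd] > time[target] for dd in deps)
                or any(dd in out_of_date for dd in deps)):
            out_of_date.add(target)

    for target in targets:
        visit(target)
    return missing, out_of_date, time

deps = {'prog': ['prog.o'], 'prog.o': ['prog.c'], 'prog.c': []}
# prog.o was deleted after prog was built: nothing needs rebuilding.
fresh = gather_info(['prog'], deps, set(), {'prog.c': 10, 'prog': 20})
# prog.c was edited afterwards: prog is out of date, via the missing prog.o.
stale = gather_info(['prog'], deps, set(), {'prog.c': 30, 'prog': 20})
```

Because a missing file takes the time of its newest direct dependency, a deleted intermediate file neither forces a rebuild nor hides a newer source: in the second case prog.o gets time 30, which is newer than prog.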


  • parallel building
  • includes
  • tweaking options for fooling the build algorithm

Come get some Produce!

