Produce

Makefiles without the annoying bits

http://kilian.evang.name/

Build automation: the classic case

example dependency chart for compiling a C program

Build automation: NLP pipeline

example dependency chart for running an NLP pipeline

Build automation: running machine learning experiments

example dependency chart for running machine learning experiments

What’s great about Makefiles

power of shell scripts + dependency management
target wildcards: abstract over multiple objects/documents/experiments…
declarative → self-documenting
widespread → suitable for distribution and replication

What’s not so great

arcane syntax
wildcards quite limited

Produce to the rescue!

Make syntax


# Compile
%.o : %.c
	cc -c $<

# Link
% : %.o
	cc -o $@ $<

Produce syntax


# Compile
[%{name}.o]
dep.c = %{name}.c
recipe = cc -c %{c}

# Link
[%{name}]
dep.o = %{name}.o
recipe = cc -o %{target} %{o}

Make: only first prerequisite named


out/%.pos : out/%.pos.auto out/%.pos.corr
	./src/scripts/apply_corrections $< \
        --corrections out/$*.pos.corr > $@

DRY violation!

Produce: name prerequisites


[out/%{name}.pos]
dep.auto = out/%{name}.pos.auto
dep.corr = out/%{name}.pos.corr
recipe = ./src/scripts/apply_corrections %{auto} --corrections %{corr} > %{target}

You don’t have to name them


[%{name}.pdf]
deps = %{name}.tex bibliography.bib
recipe =
	pdflatex %{name}
	bibtex %{name}
	pdflatex %{name}
	pdflatex %{name}
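
Because a Producefile is INI-style, Python's `configparser` and `shlex` can do much of the parsing. The sketch below reads two of the rules above; it is an assumption about how this could work, not Produce's actual parser, and `read_rules` is a hypothetical name. Interpolation has to be switched off, since configparser's default `BasicInterpolation` gives `%` its own meaning and rejects `%{`.

```python
import configparser
import shlex

PRODUCEFILE = """\
[%{name}]
dep.o = %{name}.o
recipe = cc -o %{target} %{o}

[%{name}.pdf]
deps = %{name}.tex bibliography.bib
recipe =
    pdflatex %{name}
    bibtex %{name}
    pdflatex %{name}
    pdflatex %{name}
"""

def read_rules(text):
    """Return {target pattern: (named deps, unnamed deps, recipe lines)}."""
    # interpolation=None: leave %{...} alone instead of treating % as
    # configparser's own interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    rules = {}
    for pattern in parser.sections():
        section = parser[pattern]
        # Named dependencies are the keys of the form dep.NAME.
        named = {key[len("dep."):]: value for key, value in section.items()
                 if key.startswith("dep.")}
        # Unnamed dependencies are a whitespace-separated list.
        unnamed = shlex.split(section.get("deps", ""))
        # Indented continuation lines arrive joined by newlines.
        recipe = section.get("recipe", "").strip().splitlines()
        rules[pattern] = (named, unnamed, recipe)
    return rules

for pattern, rule in read_rules(PRODUCEFILE).items():
    print(pattern, rule)
```

Letting configparser handle continuation lines is what makes multi-line recipes like the LaTeX one possible without any backslashes.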

Make: a single wildcard


.SECONDEXPANSION:
out/%.labeled : out/$$(subst test,train,$$(subst dev,train,$$*)).model \
                out/$$(basename $$*).feat
	wapiti label -m $< out/$(basename $*).feat > $@

Produce: any number of wildcards…


[out/%{corpus}.%{portion}.%{fset}.labeled]
dep.model = out/%{corpus}.train.%{fset}.model
dep.input = out/%{corpus}.%{portion}.feat
recipe = wapiti label -m %{model} %{input} > %{target}

…or even regular expressions…


[/out/(?P<corpus>.*)\.(?P<portion>dev|test)\.(?P<fset>.*)\.labeled/]
dep.model = out/%{corpus}.train.%{fset}.model
dep.input = out/%{corpus}.%{portion}.feat
recipe = wapiti label -m %{model} %{input} > %{target}

…or matching conditions.


[out/%{corpus}.%{portion}.%{fset}.labeled]
cond = %{portion in ('dev', 'test')}
dep.model = out/%{corpus}.train.%{fset}.model
dep.input = out/%{corpus}.%{portion}.feat
recipe = wapiti label -m %{model} %{input} > %{target}
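
One plausible way to implement all three variants is sketched below; it is not necessarily how Produce does it. Each `%{name}` in the target pattern becomes a named regex group (the same shape as the hand-written regular expression above, with `.+` for every wildcard), the condition is evaluated as a Python expression with the wildcard values as variables, and the values are substituted into the dependency patterns. The helper names and the example target names (`wsj`, `basic`) are made up, and `cond` is passed without its surrounding `%{…}`.

```python
import re

WILDCARD = re.compile(r"%\{(\w+)\}")

def pattern_to_regex(pattern):
    """'out/%{a}.%{b}.labeled' -> regex with named groups a and b."""
    parts = WILDCARD.split(pattern)
    # re.split with one capturing group alternates literal, name, literal, ...
    return re.compile("".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"
        for i, part in enumerate(parts)))

def match_rule(target_pattern, cond, dep_patterns, target):
    """Return {dep name: path} if the rule applies to target, else None."""
    m = pattern_to_regex(target_pattern).fullmatch(target)
    if m is None:
        return None
    bindings = m.groupdict()
    # cond is a Python expression such as "portion in ('dev', 'test')";
    # eval runs arbitrary code, so only use it on trusted rule files.
    if cond is not None and not eval(cond, dict(bindings)):
        return None
    return {name: WILDCARD.sub(lambda g: bindings[g.group(1)], dep)
            for name, dep in dep_patterns.items()}

RULE = ("out/%{corpus}.%{portion}.%{fset}.labeled",
        "portion in ('dev', 'test')",
        {"model": "out/%{corpus}.train.%{fset}.model",
         "input": "out/%{corpus}.%{portion}.feat"})

print(match_rule(*RULE, "out/wsj.dev.basic.labeled"))
print(match_rule(*RULE, "out/wsj.train.basic.labeled"))  # None: cond is false
```

If a wildcard value can itself contain dots, the split is ambiguous and regex backtracking decides it; restricting a wildcard (as `dev|test` does in the regex rule) or adding a condition removes the ambiguity.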

Make: counterintuitive declarations


.PHONY: clean
clean:
	rm *.o temp

Produce: rule attributes


[vacuum]
type = task
recipe = rm *.o temp

Make: a handful of functions


sources := foo.c bar.c baz.s ugh.h
foo: $(sources)
	cc $(filter %.c %.s,$(sources)) -o foo

Produce: the full power of Python


[]
sources = foo.c bar.c baz.s ugh.h

[foo]
deps = %{sources}
recipe = cc %{' '.join([f for f in sources.split() \
        if f.endswith('.c') or f.endswith('.s')])} -o %{target}
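
Expanding `%{...}` can be sketched as: find each expression, evaluate it with the global variables and the wildcard values in scope, and splice in the result. In this sketch a wildcard like `%{name}` is simply the one-word expression `name`, which is one way wildcards and expressions can share a syntax. `expand` is a hypothetical name; it counts braces so expressions may contain dict or set literals, but it does not handle braces inside string literals.

```python
def expand(text, variables):
    """Replace each %{expression} in text with str(eval(expression)).

    variables (global settings plus wildcard values) are visible to the
    expression as global names.
    """
    out = []
    i = 0
    while True:
        start = text.find("%{", i)
        if start == -1:
            out.append(text[i:])
            return "".join(out)
        out.append(text[i:start])
        # Find the matching "}", allowing nested braces.
        depth = 0
        for j in range(start + 1, len(text)):  # text[start + 1] is "{"
            if text[j] == "{":
                depth += 1
            elif text[j] == "}":
                depth -= 1
                if depth == 0:
                    break
        else:
            raise ValueError(f"unterminated %{{ at position {start}")
        expression = text[start + 2:j].strip()
        # eval executes arbitrary code: Producefiles must be trusted.
        out.append(str(eval(expression, dict(variables))))
        i = j + 1

variables = {"sources": "foo.c bar.c baz.s ugh.h", "target": "foo"}
recipe = ("cc %{' '.join([f for f in sources.split() "
          "if f.endswith('.c') or f.endswith('.s')])} -o %{target}")
print(expand(recipe, variables))  # cc foo.c bar.c baz.s -o foo
```

Passing the variables as globals (rather than locals) keeps them visible inside comprehensions, which matters for expressions like the one in the rule above.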

Minor Make annoyances fixed

% (in patterns) vs. $* (in recipes) confusion
$$ needed for a literal $ in recipes

Design goals

killer feature: multiple wildcards
low barrier for users
- intuitive syntax
- easy to install: one file, drop it in $PATH
do one thing and do it well
- any scripting language for recipes
- Python for dependency generation
- thin wrapper for Producefile syntax and dependency management
low development cost

Why Python?

no compilation needed
rich standard library
- argparse for command-line options
- configparser and shlex for parsing Producefiles
- eval for evaluating Python expressions
- subprocess for executing recipes
- logging for info and debugging
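
For instance, running a recipe takes little more than `subprocess` and `logging`. The sketch below assumes a recipe is handed to `sh` as one script and that a failing command should abort the build; `run_recipe` and the logger name are hypothetical, not Produce's own code.

```python
import logging
import subprocess

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
log = logging.getLogger("produce-sketch")  # hypothetical logger name

def run_recipe(target, recipe):
    """Run a possibly multi-line recipe as one sh script; raise on failure."""
    log.info("producing %s", target)
    # "sh -e": stop at the first failing command, much as Make stops
    # when a recipe line fails.
    result = subprocess.run(["sh", "-e", "-c", recipe])
    if result.returncode != 0:
        raise RuntimeError(
            f"recipe for {target} failed with exit status {result.returncode}")

run_recipe("greeting", "echo hello\necho world")
```

Unlike Make, which by default starts a new shell for every recipe line, this runs the whole recipe in one shell, so for example a `cd` on one line still applies to the next.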

Other build automation tools

Ant, SCons, A-A-P…
- define their own languages for recipes
- designed with specific tasks in mind
Rake
- Rakefile = Ruby script
- too much syntax
redo
- commendably minimal

None of them support multiple wildcards!

Rake example


# Compile
rule '.o' => ['.c'] do |t|
  sh "cc -c #{t.source}"
end

too much syntax to remember
no multiple wildcards (just regex)

Coolest challenges

interplay of %{wildcards} and %{"expressions"}
the build algorithm

The build algorithm

Desiderata

automatically run all commands to produce requested target
don’t build files that are already up-to-date (timestamp)
allow for deleting intermediate files without affecting up-to-dateness

Phase 1: build dependency graph, gather info

Starting from targets requested by user, for each target,

fail on cyclic dependency
stop if target already seen
determine which rule to use (first that matches)
list direct dependencies, process them recursively
determine target type: file or task
determine whether target is missing
determine target time
determine whether target is out of date
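
The traversal part of phase 1 can be sketched as a depth-first search that carries the current path, so a target reappearing on its own path signals a cycle. In this sketch rule matching is abstracted into a hypothetical callback `direct_deps_of`, and the per-target bookkeeping (type, missing, time, out of date) is only marked by a comment.

```python
def collect(targets, direct_deps_of):
    """Phase 1 traversal: return {target: [direct dependencies]}.

    direct_deps_of(target) stands in for "first rule that matches";
    it is a hypothetical callback for this sketch.
    """
    direct_dependencies = {}

    def visit(target, path):
        if target in path:                    # fail on cyclic dependency
            cycle = " -> ".join(path + [target])
            raise RuntimeError(f"cyclic dependency: {cycle}")
        if target in direct_dependencies:     # stop if target already seen
            return
        deps = direct_deps_of(target)
        direct_dependencies[target] = deps
        for dep in deps:                      # process them recursively
            visit(dep, path + [target])
        # All dependencies are processed at this point, so type, missing,
        # time and out-of-date status could be determined here.

    for target in targets:
        visit(target, [])
    return direct_dependencies

graph = {"prog": ["prog.o"], "prog.o": ["prog.c"], "prog.c": []}
print(collect(["prog"], graph.__getitem__))
```

The cycle check has to come before the "already seen" check: a target on the current path is already recorded, so checking "seen" first would silently hide the cycle.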

Phase 2: produce

For each target requested by user, call build_if_necessary(target):


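def build_if_necessary(target):
    # out_of_date and missing are sets computed in phase 1
    if target in out_of_date or target in missing:
        build(target)

def build(target):
    # bring direct dependencies up to date first
    for dd in direct_dependencies[target]:
        build_if_necessary(dd)
    run_recipe(target)
    # the target is now fresh: never build it twice
    out_of_date.discard(target)
    missing.discard(target)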
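
The two functions rely on state from phase 1. To try them out, that state can be filled in by hand; in this sketch `run_recipe` only records which targets it would build, and the C-program graph is a made-up example.

```python
built = []  # records recipe runs in this sketch

def run_recipe(target):
    # Stand-in for executing the target's recipe.
    built.append(target)

def build_if_necessary(target):
    if target in out_of_date or target in missing:
        build(target)

def build(target):
    for dd in direct_dependencies[target]:
        build_if_necessary(dd)
    run_recipe(target)
    out_of_date.discard(target)
    missing.discard(target)

# Hand-made phase-1 results: prog.c was edited after the last build,
# so prog.o is out of date, and therefore prog is too.
direct_dependencies = {"prog": ["prog.o"], "prog.o": ["prog.c"], "prog.c": []}
out_of_date = {"prog.o", "prog"}
missing = set()

build_if_necessary("prog")
print(built)  # ['prog.o', 'prog']
```

With `out_of_date = set()` and `missing = {"prog.o"}` instead, nothing is built: `prog` is up to date, so the deleted intermediate file is never requested.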

“Missing”?

A target is missing iff its type is file and it does not exist.

“Out of date”?

A target is out of date if any of these conditions hold:

its type is task
some direct dependency is newer
some direct dependency is out of date
the “always build” option is on

“Newer”?

The time of a task is 0.

The time of a missing file is the time of its newest direct dependency (or 0 if none).

The time of an existing file is its last-modified time.
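
These definitions translate almost line by line into code. The sketch assumes that dependencies are analyzed before their dependents, as in phase 1; the `Target` record and its fields are made up for illustration, and modification times are plain numbers set by hand (a real implementation would read them from the file system, e.g. with `os.path.getmtime`).

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Target:
    name: str
    type: str = "file"                  # "file" or "task"
    mtime: Optional[float] = None       # None: the file does not exist
    deps: List["Target"] = field(default_factory=list)  # direct dependencies
    # Results, filled in by analyze():
    missing: bool = False
    time: float = 0.0
    out_of_date: bool = False

def analyze(t, always_build=False):
    """Apply the definitions to t; its direct dependencies must be analyzed."""
    t.missing = t.type == "file" and t.mtime is None
    if t.type == "task":
        t.time = 0.0
    elif t.missing:
        # time of the newest direct dependency, or 0 if there is none
        t.time = max((d.time for d in t.deps), default=0.0)
    else:
        t.time = t.mtime
    t.out_of_date = (t.type == "task"
                     or any(d.time > t.time for d in t.deps)
                     or any(d.out_of_date for d in t.deps)
                     or always_build)

# prog.o, an intermediate file, was deleted after a successful build.
src = Target("prog.c", mtime=100.0)
obj = Target("prog.o", deps=[src])
prog = Target("prog", mtime=200.0, deps=[obj])
for t in (src, obj, prog):              # dependencies before dependents
    analyze(t)
print(obj.missing, obj.time, prog.out_of_date)  # True 100.0 False
```

Giving a missing file the time of its newest dependency is what lets intermediate files be deleted: `prog` stays up to date. If `prog.c` is later edited (say to time 300.0), the missing `prog.o` takes on that time, `prog` becomes out of date, and phase 2 rebuilds both.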

# TODO

parallel building
includes
tweaking options for fooling the build algorithm

Come get some Produce!

https://github.com/texttheater/produce/

Photo courtesy of Patrick Feller, CC BY 2.0