Produce

Makefiles without the annoying bits

http://kilian.evang.name/

Build automation: the classic case

example dependency chart for compiling a C program

Build automation: NLP pipeline

example dependency chart for running an NLP pipeline

Groningen Meaning Bank logo

Build automation: running machine learning experiments

example dependency chart for running machine learning experiments

What’s great about Makefiles

  • power of shell scripts + dependency management
  • target wildcards: abstract over multiple objects/documents/experiments…
  • declarative → self-documenting
  • widespread → suitable for distribution and replication

What’s not so great

  • arcane syntax
  • wildcards quite limited

 

Produce to the rescue!

Make syntax


# Compile
%.o : %.c
	cc -c $<

# Link
% : %.o
	cc -o $@ $<
					

Produce syntax


# Compile
[%{name}.o]
dep.c = %{name}.c
recipe = cc -c %{c}

# Link
[%{name}]
dep.o = %{name}.o
recipe = cc -o %{target} %{o}
					

Make: only first prerequisite named


out/%.pos : out/%.pos.auto out/%.pos.corr
	./src/scripts/apply_corrections $< \
        --corrections out/$*.pos.corr > $@
					

DRY violation!

Produce: name prerequisites


[out/%{name}.pos]
dep.auto = out/%{name}.pos.auto
dep.corr = out/%{name}.pos.corr
recipe = ./src/scripts/apply_corrections %{auto} --corrections %{corr} > %{target}
					

You don’t have to name them


[%{name}.pdf]
deps = %{name}.tex bibliography.bib
recipe =
	pdflatex %{name}
	bibtex %{name}
	pdflatex %{name}
	pdflatex %{name}
					

Make: a single wildcard


.SECONDEXPANSION:
out/%.labeled : out/$$(subst test,train,$$(subst dev,train,$$*)).model \
                out/$$(basename $$*).feat
	wapiti label -m $< out/$(basename $*).feat > $@
					

Produce: any number of wildcards…


[out/%{corpus}.%{portion}.%{fset}.labeled]
dep.model = out/%{corpus}.train.%{fset}.model
dep.input = out/%{corpus}.%{portion}.feat
recipe = wapiti label -m %{model} %{input} > %{target}
					

…or even regular expressions…


[/out/(?P<corpus>.*)\.(?P<portion>dev|test)\.(?P<fset>.*)\.labeled/]
dep.model = out/%{corpus}.train.%{fset}.model
dep.input = out/%{corpus}.%{portion}.feat
recipe = wapiti label -m %{model} %{input} > %{target}
					

…or matching conditions.


[out/%{corpus}.%{portion}.%{fset}.labeled]
cond = %{portion in ('dev', 'test')}
dep.model = out/%{corpus}.train.%{fset}.model
dep.input = out/%{corpus}.%{portion}.feat
recipe = wapiti label -m %{model} %{input} > %{target}
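
One way to picture how multi-wildcard targets and matching conditions could work is sketched below. This is an illustration only, not Produce's actual implementation: the helper names pattern_to_regex and match_rule are made up for this example, and the sketch assumes each wildcard name occurs at most once per pattern.

```python
import re

# Matches a named wildcard such as %{corpus} in a target pattern.
WILDCARD = re.compile(r"%\{(\w+)\}")


def pattern_to_regex(pattern):
    """Translate a pattern with %{name} wildcards into a regex with named groups.

    Hypothetical helper; assumes each wildcard name occurs only once.
    """
    parts = []
    pos = 0
    for m in WILDCARD.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal text
        parts.append("(?P<%s>.*)" % m.group(1))          # named wildcard
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


def match_rule(pattern, cond, target):
    """Return the wildcard bindings if the rule applies to target, else None."""
    m = pattern_to_regex(pattern).fullmatch(target)
    if m is None:
        return None
    bindings = m.groupdict()
    # The condition is a Python expression over the wildcard values.
    if cond is not None and not eval(cond, dict(bindings)):
        return None
    return bindings


bindings = match_rule(
    "out/%{corpus}.%{portion}.%{fset}.labeled",
    "portion in ('dev', 'test')",
    "out/conll.dev.base.labeled",
)
print(bindings)
```

With these inputs the three wildcards bind to conll, dev and base; a target with portion train matches the pattern but is rejected by the condition.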
					

Make: counterintuitive declarations


.PHONY: clean
clean:
	rm *.o temp
					

Produce: rule attributes


[vacuum]
type = task
recipe = rm *.o temp
					

Make: a handful of functions


sources := foo.c bar.c baz.s ugh.h
foo: $(sources)
	cc $(filter %.c %.s,$(sources)) -o foo
					

Produce: the full power of Python


[]
sources = foo.c bar.c baz.s ugh.h

[foo]
deps = %{sources}
recipe = cc %{' '.join([f for f in sources.split() \
        if f.endswith('.c') or f.endswith('.s')])} -o %{target}
					

Minor Make annoyances fixed

  • % vs. $* confusion
  • $$ in recipe

Design goals

  1. killer feature: multiple wildcards
  2. low barrier for users
    • intuitive syntax
    • easy to install: one file, drop it in $PATH
  3. do one thing and do it well
    • any scripting language for recipes
    • Python for dependency generation
    • thin wrapper for Producefile syntax and dependency management
  4. low development cost

Why Python?

  • no compilation needed
  • rich standard library
    • argparse for command-line options
    • configparser and shlex for parsing Producefiles
    • eval for evaluating Python expressions
    • subprocess for executing recipes
    • logging for info and debugging
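
To make the standard-library point concrete, here is a small sketch of reading Producefile-like text with configparser and splitting a dependency list with shlex. How Produce itself configures these modules is not shown on the slides, so the settings below (in particular interpolation=None, which stops configparser from treating % specially) are assumptions made for this example.

```python
import configparser
import shlex

# A Producefile-like snippet, reusing rules from the earlier slides.
PRODUCEFILE = """\
[%{name}.o]
dep.c = %{name}.c
recipe = cc -c %{c}

[%{name}.pdf]
deps = %{name}.tex bibliography.bib
recipe = pdflatex %{name}
"""

# Assumption: switch off configparser's own %-interpolation so that
# %{...} reaches us untouched.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(PRODUCEFILE)

# Each section header is a target pattern; its options are rule attributes.
rules = [(section, dict(parser[section])) for section in parser.sections()]

# shlex splits a whitespace-separated dependency list, honouring quotes.
dep_lists = {
    pattern: shlex.split(attrs["deps"])
    for pattern, attrs in rules
    if "deps" in attrs
}

for pattern, attrs in rules:
    print(pattern, attrs)
```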

Other build automation tools

  • Ant, SCons, A-A-P…
    • define their own languages for recipes
    • designed with specific tasks in mind
  • Rake
    • Rakefile = Ruby script
    • too much syntax
  • redo
    • commendably minimal

 

None of them support multiple wildcards!

Rake example


# Compile
rule '.o' => ['.c'] do |t|
  sh "cc -c #{t.source}"
end
						
  • too much syntax to remember
  • no multiple wildcards (just regex)

Coolest challenges

  • interplay of %{wildcards} and %{"expressions"}
  • the build algorithm
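
The interplay of wildcards and expressions can be sketched as plain string interpolation: every %{...} is evaluated as a Python expression in which the wildcard values and rule attributes are ordinary variables. The function below is a simplified illustration, not Produce's code; the name interpolate is made up, and the regex assumes the expression itself contains no closing brace.

```python
import re

# Assumption: an expression runs up to the first closing brace.
EXPRESSION = re.compile(r"%\{([^}]*)\}")


def interpolate(template, variables):
    """Replace each %{expr} in template by the value of the Python expression."""
    def evaluate(match):
        # Variables are passed as globals so that nested scopes
        # (e.g. generator expressions) can see them too.
        return str(eval(match.group(1), dict(variables)))
    return EXPRESSION.sub(evaluate, template)


# A bare name is just the simplest possible expression.
print(interpolate("cc -o %{target} %{o}", {"target": "hello", "o": "hello.o"}))

# Arbitrary Python works the same way.
print(interpolate(
    "cc %{' '.join(f for f in sources.split() if f.endswith('.c'))}",
    {"sources": "foo.c bar.c baz.s ugh.h"},
))
```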

The build algorithm

Desiderata

  • automatically run all commands to produce requested target
  • don’t build files that are already up-to-date (timestamp)
  • allow for deleting intermediate files without affecting up-to-dateness

Phase 1: build dependency graph, gather info

Starting from targets requested by user, for each target,
  1. fail on cyclic dependency
  2. stop if target already seen
  3. determine which rule to use (first that matches)
  4. list direct dependencies, process them recursively
  5. determine target type: file or task
  6. determine whether target is missing
  7. determine target time
  8. determine whether target is out of date

Phase 2: produce

For each target requested by user, call build_if_necessary(target):

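def build_if_necessary(target):
    # Rebuild only targets that Phase 1 flagged as out of date or missing.
    if target in out_of_date or target in missing:
        build(target)

def build(target):
    # Bring the direct dependencies up to date first, then run the recipe.
    for dd in direct_dependencies[target]:
        build_if_necessary(dd)
    run_recipe(target)
    # The target now exists and is current, so clear both flags.
    out_of_date.discard(target)
    missing.discard(target)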

						

“Missing”?

A target is missing iff its type is file and it does not exist.

“Out of date”?

A target is out of date if any of these conditions holds:
  • its type is task
  • some direct dependency is newer
  • some direct dependency is out of date
  • the “always build” option is on

“Newer”?

The time of a task is 0.

The time of a missing file is the time of its newest direct dependency (or 0 if none).

The time of an existing file is its last-modified time.
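
Putting the definitions of missing, out of date and time together with Phase 2 gives the following self-contained simulation. It is a sketch under simplifying assumptions, not Produce's code: the file system is a dictionary of made-up modification times, running a recipe just records the target name, the graph is assumed to be acyclic, and files without a rule are not treated specially.

```python
def produce(requested, deps, mtimes, tasks=frozenset(), always_build=False):
    """Return the targets whose recipes would run, in order.

    deps:   target -> list of direct dependencies
    mtimes: existing file -> last-modified time (simulated file system)
    tasks:  targets whose type is task rather than file
    """
    missing, out_of_date, time = set(), set(), {}
    built = []

    def analyze(target):  # Phase 1, without cycle detection
        if target in time:
            return
        dds = deps.get(target, [])
        for dd in dds:
            analyze(dd)
        if target in tasks:
            time[target] = 0
        elif target in mtimes:
            time[target] = mtimes[target]
        else:
            # A missing file inherits the time of its newest direct dependency.
            missing.add(target)
            time[target] = max((time[dd] for dd in dds), default=0)
        if (target in tasks or always_build
                or any(time[dd] > time[target] for dd in dds)
                or any(dd in out_of_date for dd in dds)):
            out_of_date.add(target)

    def build_if_necessary(target):  # Phase 2
        if target in out_of_date or target in missing:
            build(target)

    def build(target):
        for dd in deps.get(target, []):
            build_if_necessary(dd)
        built.append(target)  # stands in for running the recipe
        out_of_date.discard(target)
        missing.discard(target)

    for target in requested:
        analyze(target)
    for target in requested:
        build_if_necessary(target)
    return built


deps = {"prog": ["prog.o"], "prog.o": ["prog.c"]}
# prog.o was deleted, but prog is newer than prog.c: nothing to do.
print(produce(["prog"], deps, {"prog.c": 1, "prog": 3}))
# prog.c was edited after prog was built: rebuild prog.o, then prog.
print(produce(["prog"], deps, {"prog.c": 5, "prog": 3}))
```

The first call shows why a missing file takes the time of its newest dependency: the deleted intermediate prog.o does not make prog look stale, so deleting it triggers no rebuild.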

# TODO

  • parallel building
  • includes
  • tweaking options for fooling the build algorithm

Come get some Produce!

fruit and vegetables on a stand

https://github.com/texttheater/produce/

Photo courtesy of Patrick Feller, CC BY 2.0