Overview
Teaching: 15 min
Exercises: 15 min
Questions
How can I write a Makefile to update things when my scripts have changed rather than my input files?
Objectives
Output files are a product not only of input files but of the scripts or code that created the output files.
Recognize and avoid false dependencies.
Our Makefile now looks like this (download it from here):
# Generate summary table.
results.txt : *.dat
	python zipf_test.py $^ > $@

# Count words.
.PHONY : dats
dats : isles.dat abyss.dat last.dat

isles.dat : books/isles.txt
	./wordcount $< > $@

abyss.dat : books/abyss.txt
	./wordcount $< > $@

last.dat : books/last.txt
	./wordcount $< > $@

wordcount : wordcount.cpp main.cpp
	c++ --std=c++11 -o wordcount wordcount.cpp main.cpp

.PHONY : clean
clean :
	rm -f *.dat
	rm -f results.txt
Our data files are a product not only of our text files, but also of the
program wordcount that processes the text files and creates the data files.
The wordcount program is, in turn, generated by compiling the C++ source
files wordcount.cpp and main.cpp. A change to the source (e.g. to add a new
column of summary data or remove an existing one) results in changes to the
.dat files it outputs. So, let’s pretend to edit wordcount.cpp, using touch,
and re-run Make:
$ make dats
$ touch wordcount.cpp
$ make dats
Nothing happens! Though we’ve updated wordcount.cpp and we have a rule that
will recompile wordcount, this doesn’t happen and our data files are not
updated. This is because our rules for creating .dat files don’t record any
dependencies on wordcount.
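To see why the rule is never triggered, it can help to model the check Make performs. The following is a minimal Python sketch, not Make's actual implementation: the helper is_out_of_date and the placeholder files are made up for illustration, and it assumes Make only compares the modification times of a target and of the prerequisites listed in that target's rule.

```python
import os
import tempfile


def is_out_of_date(target, prerequisites):
    """Simplified model of Make's check: rebuild if the target is missing
    or any listed prerequisite has a newer modification time."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_time for p in prerequisites)


with tempfile.TemporaryDirectory() as workdir:
    def make_file(name, mtime):
        """Create a placeholder file with an explicit modification time."""
        path = os.path.join(workdir, name)
        with open(path, "w") as handle:
            handle.write("placeholder\n")
        os.utime(path, (mtime, mtime))
        return path

    isles_txt = make_file("isles.txt", 1000)
    wordcount = make_file("wordcount", 1000)
    isles_dat = make_file("isles.dat", 2000)  # built after both inputs

    # Pretend the program was recompiled after isles.dat was written.
    os.utime(wordcount, (3000, 3000))

    # Rule as written above: isles.dat : books/isles.txt
    without_program = is_out_of_date(isles_dat, [isles_txt])
    # Same rule with the program listed as a prerequisite too.
    with_program = is_out_of_date(isles_dat, [isles_txt, wordcount])

print(without_program)  # False
print(with_program)     # True
```

Because wordcount is not listed in the rule, its timestamp is never consulted, whatever happens to wordcount.cpp.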
We need to add wordcount as a dependency of each of our data files as well:
isles.dat : books/isles.txt wordcount
	./wordcount $< > $@

abyss.dat : books/abyss.txt wordcount
	./wordcount $< > $@

last.dat : books/last.txt wordcount
	./wordcount $< > $@
If we pretend to edit wordcount.cpp again and re-run Make,
$ touch wordcount.cpp
$ make dats
then we get:
c++ --std=c++11 -o wordcount wordcount.cpp main.cpp
./wordcount books/isles.txt > isles.dat
./wordcount books/abyss.txt > abyss.dat
./wordcount books/last.txt > last.dat
Dry run
make can show the commands it will execute without actually running them if we pass the -n flag:
$ touch wordcount.cpp
$ make -n dats
This gives the same output to the screen as without the -n flag, but the commands are not actually run. Using this ‘dry-run’ mode is a good way to check that you have set up your Makefile properly before actually running the commands in it.
The following figure shows the dependencies embodied within our Makefile,
involved in building the results.txt target, after adding wordcount and
zipf_test.py as dependencies to their respective target files:
Why Don’t the .txt Files Depend on wordcount?
The .txt files are input files and have no dependencies. To make these depend on wordcount would introduce a false dependency.
Why Don’t the .dat Files Depend on wordcount.cpp?
The .dat files are generated by wordcount, not by wordcount.cpp directly. If they depended on wordcount.cpp instead, they would be regenerated whenever wordcount.cpp changed, but the wordcount program would not be recompiled first because its rule would not be triggered. The data files would then be rebuilt with the old executable.
Intuitively, we should also add wordcount as a dependency for results.txt,
as the final table should be rebuilt as we remake the .dat files. However,
it turns out we don’t have to! Let’s see what happens to results.txt when we
update wordcount.cpp:
$ touch wordcount.cpp
$ make results.txt
then we get:
c++ --std=c++11 -o wordcount wordcount.cpp main.cpp
./wordcount books/abyss.txt > abyss.dat
./wordcount books/isles.txt > isles.dat
./wordcount books/last.txt > last.dat
python zipf_test.py abyss.dat isles.dat last.dat > results.txt
The whole pipeline is triggered, even the creation of the results.txt file!
To understand this, note that according to the dependency figure,
results.txt depends on the .dat files. The update of wordcount triggers an
update of the *.dat files. make then sees that the dependencies (the .dat
files) are newer than the target file (results.txt) and recreates
results.txt. This is an example of the power of make: updating a subset of
the files in the pipeline triggers rerunning the appropriate downstream
steps.
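The chain of updates can be traced with a small simulation. This Python sketch is a simplified model under stated assumptions, not how Make is implemented: the RULES table is transcribed from the Makefile with *.dat written out as the three data files, the function rebuild_order and the integer timestamps are invented for illustration, and every source file is assumed to exist.

```python
# Rules from the lesson's Makefile: target -> prerequisites.
RULES = {
    "results.txt": ["abyss.dat", "isles.dat", "last.dat"],
    "abyss.dat": ["books/abyss.txt", "wordcount"],
    "isles.dat": ["books/isles.txt", "wordcount"],
    "last.dat": ["books/last.txt", "wordcount"],
    "wordcount": ["wordcount.cpp", "main.cpp"],
}


def rebuild_order(goal, mtimes, rules=RULES):
    """Return the targets a Make-like tool would rebuild for `goal`, in order.

    `mtimes` maps file names to logical timestamps (larger means newer).
    The caller's dictionary is not modified.
    """
    times = dict(mtimes)
    clock = max(times.values()) + 1
    rebuilt = []

    def update(target):
        nonlocal clock
        prerequisites = rules.get(target, [])
        for prerequisite in prerequisites:  # bring prerequisites up to date first
            update(prerequisite)
        if target not in rules:             # plain source file: nothing to build
            return
        stale = target not in times or any(
            times[p] > times[target] for p in prerequisites
        )
        if stale:
            rebuilt.append(target)
            times[target] = clock           # the rebuilt file is now the newest
            clock += 1

    update(goal)
    return rebuilt


# State after a full build: sources oldest, then wordcount, .dat files, results.txt.
mtimes = {
    "books/abyss.txt": 1, "books/isles.txt": 1, "books/last.txt": 1,
    "wordcount.cpp": 1, "main.cpp": 1,
    "wordcount": 2,
    "abyss.dat": 3, "isles.dat": 3, "last.dat": 3,
    "results.txt": 4,
}
print(rebuild_order("results.txt", mtimes))  # [] because everything is up to date

mtimes["wordcount.cpp"] = 5                  # equivalent of: touch wordcount.cpp
print(rebuild_order("results.txt", mtimes))
# ['wordcount', 'abyss.dat', 'isles.dat', 'last.dat', 'results.txt']
```

The order matches the commands Make printed above: results.txt never mentions wordcount, yet it is rebuilt because its own prerequisites became newer along the way.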
Updating One Input File
What will happen if you now execute:
$ touch books/last.txt
$ make results.txt
1. only last.dat is recreated
2. all .dat files are recreated
3. only last.dat and results.txt are recreated
4. all .dat and results.txt are recreated
Solution
3. only last.dat and results.txt are recreated.
Follow the dependency tree to understand the answer(s).
wordcount as a Dependency of results.txt
What would happen if you actually added wordcount as a dependency of results.txt, and why?
Solution
If you change the rule for the results.txt file like this:
results.txt : *.dat wordcount
	python zipf_test.py $^ > $@
wordcount becomes a part of $^, thus the command becomes
python zipf_test.py abyss.dat isles.dat last.dat wordcount > results.txt
This results in an error from zipf_test.py as it tries to parse the executable as if it were a .dat file. Try this by running:
$ make results.txt
You’ll get
python zipf_test.py abyss.dat isles.dat last.dat wordcount > results.txt
Traceback (most recent call last):
  File "zipf_test.py", line 19, in <module>
    counts = load_word_counts(input_file)
  File "zipf_test.py", line 12, in load_word_counts
    for line in input_fd:
  File "/path/to/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 0: invalid continuation byte
make: *** [results.txt] Error 1
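The effect on the command line can be reproduced with a toy expansion of the automatic variables. This is a hedged Python sketch rather than Make's real expansion logic: the function expand_recipe is hypothetical, it handles only $@, $< and $^, and the list of .dat files stands in for what the *.dat wildcard matches.

```python
def expand_recipe(recipe, target, prerequisites):
    """Substitute the automatic variables $@, $< and $^ in one recipe line.

    $@ is the target, $< the first prerequisite, and $^ all prerequisites
    (each name listed once). Other Make variables are not handled.
    """
    unique = list(dict.fromkeys(prerequisites))  # drop repeats, keep order
    first = unique[0] if unique else ""
    return (
        recipe.replace("$@", target)
        .replace("$<", first)
        .replace("$^", " ".join(unique))
    )


dats = ["abyss.dat", "isles.dat", "last.dat"]  # what *.dat matches here

print(expand_recipe("python zipf_test.py $^ > $@", "results.txt", dats))
# python zipf_test.py abyss.dat isles.dat last.dat > results.txt

print(expand_recipe("python zipf_test.py $^ > $@", "results.txt", dats + ["wordcount"]))
# python zipf_test.py abyss.dat isles.dat last.dat wordcount > results.txt

print(expand_recipe("./wordcount $< > $@", "isles.dat", ["books/isles.txt", "wordcount"]))
# ./wordcount books/isles.txt > isles.dat
```

Every name in the prerequisite list ends up in $^, whether or not the command can make sense of it, whereas $< only ever picks up the first one.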
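We still have to add the zipf_test.py script as a dependency of
results.txt. Given the answer to the challenge above, we cannot pass $^ to
zipf_test.py as before, because the script itself would then appear among
its own input files. Nor can we use $<, which refers only to the first
dependency (abyss.dat if *.dat is listed first). Instead, we can list
zipf_test.py as the first dependency and pass all the dependencies as
arguments to python using $^:
results.txt : zipf_test.py *.dat
	python $^ > $@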
Where We Are
This Makefile contains everything done so far in this topic.
Key Points
Make results depend on processing scripts as well as data files.
Dependencies are transitive: if A depends on B and B depends on C, a change to C will indirectly trigger an update to A.