It's easy to think of Camlp4 as just "defmacro on steroids"; that is, just a tool for syntax extension, but it is really a box of independently-useful tools. As we've seen, Camlp4 can be used purely for code generation; in this post I'll describe a tool that uses it purely for code consumption: a (minimal, broken) version of otags:
openCamlp4.PreCastmoduleM =Camlp4OCamlRevisedParser.Make(Syntax)moduleN =Camlp4OCamlParser.Make(Syntax)We're going to call the OCaml parser directly. These functor applications are used only for their effect (which is to fill in an empty grammer with OCaml cases); ordinarily they would be called as part of Camlp4's dynamic loading process. Recall that the original syntax parser is an extension of the revised parser, so we need both, in this order.
letfiles =ref[]letrecdo_fn fn =letst =Stream.of_channel (open_in fn)inletstr_item =Syntax.parse_implem (Loc.mk fn) st inletstr_items =Ast.list_of_str_item str_item []inlettags =List.fold_right do_str_item str_items []in files :=(fn, tags)::!filesWe'll call
do_fn
for each filename on the command line. The Syntax.parse_implem
function takes a Loc.t
and a stream, and parses the stream into a str_item
. (The initial Loc.t
just provides the filename so later locations can refer to it, for error messages etc.) Now, recall that even though we got back a single str_item
, it can contain several definitions (collected with StSem
). We use Ast.list_of_str_item
to get an ordinary list, then accumulate tags into files
.
anddo_str_item si tags =match si with(* | <:str_item< let $rec:_$ $bindings$ >> -> *)|Ast.StVal (_, _, bindings)->letbindings =Ast.list_of_binding bindings []inList.fold_right do_binding bindings tags | _ -> tagsWe'll only consider value bindings. The commented-out
str_item
quotation doesn't work (run it through Camlp4 to see why--I'm not sure where the extra StSem
/StNil
come from), so we fall back to an explicit constructor. (The rec
antiquotation matches a flag controlling whether an StVal
is a let rec
or just a let
; here we don't care.) Now we have an Ast.binding
, which again can contain several bindings (collected with BiAnd
) so we call Ast.list_of_bindings
.
anddo_binding bi tags =match bi with|<:binding@loc<$lid:lid$=$_$>>->letline =Loc.start_line loc inletoff =Loc.start_off loc inletpre ="let "^ lid in(pre, lid, line, off)::tags | _ -> tagsWe're going to generate an
etags
-format file, where each definition consists of a prefix of the line in the source, the tag itself, the line number, and the character offset. If you look in the parser you'll see that the left side of a binding can be any pattern (as you'd expect), but we only handle the case where it's a single identifier; the lid
antiquotation extracts it as a string. The line number and character offset are easy to find from the location of the binding (see camlp4/Camlp4/Sig.ml for the Loc
functions), which we get with @loc
. The prefix is problematic: the location of the binding does not include the let
or and
part, and anyway what we really want is everything from the beginning of the line. Doable but not so instructive of Camlp4, so we just tack on a "let "
prefix (so this doesn't work for and
or if there is whitespace).
letprint_tags files =letch = open_out "TAGS"inListLabels.iter files ~f:(fun(fn, tags)->Printf.fprintf ch "\012\n%s,%d\n" fn 0;ListLabels.iter tags ~f:(fun(pre, tag, line, off)->Printf.fprintf ch "%s\127%s\001%d,%d\n" pre tag line off))Generating the tags file is straightforward, following the description at the bottom of the otags README. (The
0
is supposed to be the length of the tag data, but my Emacs doesn't seem to care.) We put the pieces together with Arg
:
;;Arg.parse [] do_fn "otags: fn1 [fn2 ...]"; print_tags !filesand finally, a Makefile:
otags: otags.ml ocamlc \ -pp camlp4of \ -o otags \ -I +camlp4 -I +camlp4/Camlp4Parsers \ dynlink.cma camlp4fulllib.cma otags.mlWe could improve this in many ways (error-handling, patterns, types, etc.); clearly we can't replicate otags in a few dozen lines. But Camlp4 takes care of a lot of the hard work. Next time, maybe, an actual syntax extension.