Channel: Ambassador to the Computers

Reading Camlp4, part 8: implementing quotations


The Camlp4 system of quotations and antiquotations is an awesome tool for producing and consuming OCaml ASTs. In this post (and the following one) we will see how to provide this facility for other syntaxes and ASTs. Here we consider just quotations; we’ll add antiquotations in the following post.

An AST for JSON

Our running example will be a quotation expander for JSON. Let’s begin with the JSON AST, in a module Jq_ast:

type t =
  | Jq_null
  | Jq_bool of bool
  | Jq_number of float
  | Jq_string of string
  | Jq_array of t list
  | Jq_object of (string * t) list

This is the same (modulo order and names) as json_type from the json-wheel library, but for various reasons we will not be able to use json_type. The Jq_ prefix is for json_quot, the name of this little library.

Parsing JSON

We’ll use a Camlp4 grammar to parse JSON trees. It is not necessary to use Camlp4’s parsing facilities in order to implement quotations—ultimately we will need to provide just a function from strings to ASTs, so we could use ocamlyacc or what-have-you instead—but it is convenient. Here is the parser:

open Camlp4.PreCast
open Jq_ast

module Gram = MakeGram(Lexer)

let json = Gram.Entry.mk "json"

;;

EXTEND Gram
  json: [[
      "null" -> Jq_null
    | "true" -> Jq_bool true
    | "false" -> Jq_bool false
    | i = INT -> Jq_number (float_of_string i)
    | f = FLOAT -> Jq_number (float_of_string f)
    | s = STRING -> Jq_string s
    | "["; es = LIST0 json SEP ","; "]" -> Jq_array es
    | "{";
        kvs = LIST0 [ s = STRING; ":"; j = json -> (s, j) ] SEP ",";
      "}" -> Jq_object kvs
  ]];
END

We use the default Camlp4 lexer (with MakeGram(Lexer)); as we have seen, keywords mentioned in a Camlp4 grammar are added to the lexer, so we don’t need to do anything special to lex null etc. However, while JSON/Javascript has a single number type, the default lexer returns different tokens for INT and FLOAT numbers, so we convert each to Jq_number. In fact, these tokens (along with STRING) represent OCaml integer, float and string literals, which do not exactly match the corresponding JSON ones, but they are fairly close so let’s not worry about it for now; we’ll revisit the lexer in a later post.

The parser itself is pleasingly compact; we can make good use of the LIST0 special symbol and an anonymous entry for parsing objects. Unfortunately things will get a little more complicated when we come to antiquotations.

Lifting the AST

Next we need to “lift” values of the JSON AST to values of the OCaml AST. What does “lift” mean, and why do we need to do it? The goal is to convert quotations in OCaml code, such as

let x = <:json< [1, "foo", true] >>

into the equivalent

let x =
  Jq_ast.Jq_array [
    Jq_ast.Jq_number 1.;
    Jq_ast.Jq_string "foo";
    Jq_ast.Jq_bool true
  ]

This is to happen as part of Camlp4 preprocessing, which produces an OCaml AST, so what we produce in place of the <:json< ... >> expression must be a fragment of OCaml AST. We have a parser which takes a valid JSON string to the JSON AST; what remains is to take a JSON AST value to the corresponding OCaml AST. So we need a function with cases something like:

| Jq_null -> <:expr< Jq_null >>
| Jq_number n -> <:expr< Jq_number $`flo:n$ >>
| ...
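To make the idea of lifting concrete without pulling in Camlp4, here is a minimal sketch in plain OCaml. It is not the real lifting function: where the real one returns a fragment of OCaml AST, this hypothetical lift returns OCaml source text that would rebuild the same value. The type is a local copy of the JSON AST above, and non-finite numbers are not handled.

```ocaml
(* Local copy of the JSON AST from this post. *)
type t =
  | Jq_null
  | Jq_bool of bool
  | Jq_number of float
  | Jq_string of string
  | Jq_array of t list
  | Jq_object of (string * t) list

(* Hypothetical helper: render a float as an OCaml literal, adding
   parentheses when it is negative so it can be a constructor argument. *)
let float_literal n =
  let s = string_of_float n in
  if s.[0] = '-' then "(" ^ s ^ ")" else s

(* Lift a JSON AST value to OCaml source text that rebuilds it.
   The real lifting function builds Ast.expr values instead of strings. *)
let rec lift = function
  | Jq_null -> "Jq_ast.Jq_null"
  | Jq_bool b -> "Jq_ast.Jq_bool " ^ string_of_bool b
  | Jq_number n -> "Jq_ast.Jq_number " ^ float_literal n
  | Jq_string s -> Printf.sprintf "Jq_ast.Jq_string %S" s
  | Jq_array es ->
      "Jq_ast.Jq_array [" ^ String.concat "; " (List.map lift es) ^ "]"
  | Jq_object kvs ->
      let pair (k, v) = Printf.sprintf "(%S, %s)" k (lift v) in
      "Jq_ast.Jq_object [" ^ String.concat "; " (List.map pair kvs) ^ "]"

let () =
  print_endline
    (lift (Jq_array [Jq_number 1.; Jq_string "foo"; Jq_bool true]))
```

Run on the example quotation, this prints the same text as the expansion shown earlier (modulo layout). Every constructor needs its own case, which is why writing this by hand gets tedious for a large AST.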

It is not such a big deal to hand-write this lifting function for a small AST like JSON, but it is arduous and error-prone for full-size ASTs. Fortunately Camlp4 has a filter which does it for us. Let’s first look at the signature of the Jq_ast module:

open Camlp4.PreCast

type t = ... (* as above *)

module MetaExpr :
sig
  val meta_t : Ast.loc -> t -> Ast.expr
end

module MetaPatt :
sig
  val meta_t : Ast.loc -> t -> Ast.patt
end

The generated modules MetaExpr and MetaPatt provide functions to lift a JSON AST to either an OCaml expr (when the quotation appears as an expression) or patt (when it appears as a pattern). The loc arguments are inserted into the resulting OCaml AST so that compile errors have correct locations.

Now the implementation of Jq_ast:

module Jq_ast =
struct
  type float' = float

  type t = (* almost as above *)
    ...
    | Jq_number of float'
    ...
end

include Jq_ast

open Camlp4.PreCast (* for Ast refs in generated code *)

module MetaExpr =
struct
  let meta_float' _loc f = <:expr< $`flo:f$ >>
  include Camlp4Filters.MetaGeneratorExpr(Jq_ast)
end

module MetaPatt =
struct
  let meta_float' _loc f = <:patt< $`flo:f$ >>
  include Camlp4Filters.MetaGeneratorPatt(Jq_ast)
end

The file needs the Camlp4MetaGenerator filter (the camlp4.metagenerator package with findlib). The main idea is that the calls to Camlp4Filters.MetaGenerator{Expr,Patt} are expanded into the lifting functions. But there are a couple of fussy details:

First: The argument module Jq_ast which we pass to the generators is used both on the left and right of the generated function; if you look at the generated code there are cases like:

| Jq_ast.Jq_null -> <:expr< Jq_ast.Jq_null >>

(The <:expr< .. >> is already expanded in the actual generated code.) We need the AST to be available qualified by the module Jq_ast both in the current file and also in code that uses the quotation. So we have a nested Jq_ast module (for local uses, on the left-hand side) which we include (for external uses, on the right-hand side).
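The nested-module-plus-include trick can be tried in isolation. The sketch below uses plain OCaml and an AST cut down to two constructors for brevity; it only shows that the qualified and unqualified names refer to the same type.

```ocaml
(* Think of this as the contents of a file jq_ast.ml, cut down to two
   constructors for illustration. *)
module Jq_ast = struct
  type t = Jq_null | Jq_bool of bool
end

(* Re-export the type and constructors at the top level of the file;
   this is what other files see as Jq_ast.t and Jq_ast.Jq_null. *)
include Jq_ast

(* Inside the file, the nested module makes the qualified name valid ... *)
let qualified : Jq_ast.t = Jq_ast.Jq_null

(* ... and the include makes the unqualified name valid too. *)
let unqualified : t = Jq_null

(* Both names denote the same constructor of the same type. *)
let () = assert (qualified = unqualified)
```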

Second: The generators scan all the types defined in the current module, then generate code from the last-appearing recursive bundle. (In this case the recursive bundle contains just t, but in general there can be more than one; mutually recursive lifting functions are generated.) There are some special cases for predefined types, and in particular for float; however, the predefined case for float seems to be wrong:

let meta_float _loc s = Ast.ExFlo (_loc, s)

The ExFlo constructor takes a string representing the float, but calls to meta_float are generated with a float argument whenever float appears in your type, so the generated code does not typecheck. To work around this, we define the type float' (on its own rather than as part of the last-appearing recursive bundle, or else Camlp4 would generate a meta_float' that calls meta_float), and provide correct meta_float' functions. There is a similar bug with meta_int, but meta_bool is correct, so our Jq_bool case does not need fixing.
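The type mismatch is easy to see with a stand-in. In this sketch the expr type and its ExFlo constructor are simplified local definitions (no location argument), not the Camlp4 ones, and string_of_float is assumed here as the float-to-text conversion purely for illustration.

```ocaml
(* Simplified stand-in for the Camlp4 constructor, minus the location. *)
type expr = ExFlo of string

(* Shape of the predefined function: it expects the float's text. *)
let meta_float (s : string) : expr = ExFlo s

(* Shape of the workaround: convert the float to text first. *)
let meta_float' (f : float) : expr = ExFlo (string_of_float f)

(* Writing meta_float 1.5 here would be rejected by the type checker,
   because 1.5 is a float and meta_float wants a string. *)
let () = assert (meta_float' 1.5 = meta_float "1.5")
```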

(It is interesting to contrast this approach of lifting the AST with how it is handled in Template Haskell using the “scrap your boilerplate” pattern; see Geoffrey Mainland’s paper Why It’s Nice to be Quoted.)

Quotations

Finally we can hook the parser and AST lifter into Camlp4’s quotation machinery, in the Jq_quotations module:

open Camlp4.PreCast

module Q = Syntax.Quotation

let json_eoi = Jq_parser.Gram.Entry.mk "json_eoi"

EXTEND Jq_parser.Gram
  json_eoi: [[ x = Jq_parser.json; EOI -> x ]];
END;;

let parse_quot_string loc s =
  Jq_parser.Gram.parse_string json_eoi loc s

let expand_expr loc _ s =
  Jq_ast.MetaExpr.meta_t loc (parse_quot_string loc s)

let expand_str_item loc _ s =
  let exp_ast = expand_expr loc None s in
  <:str_item@loc< $exp:exp_ast$ >>

let expand_patt loc _ s =
  Jq_ast.MetaPatt.meta_t loc (parse_quot_string loc s)

;;

Q.add "json" Q.DynAst.expr_tag expand_expr;
Q.add "json" Q.DynAst.patt_tag expand_patt;
Q.add "json" Q.DynAst.str_item_tag expand_str_item;
Q.default := "json"

First, we make a new grammar entry json_eoi which parses a json expression followed by the end-of-input token EOI. Grammar entries ordinarily ignore the rest of the input after a successful parse. If we were to use the json entry directly, we would silently accept quotations with trailing garbage, and in particular incorrect quotations that happen to have a correct prefix, rather than alerting the user.
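The same issue shows up with any parser that is happy to stop early. As a rough analogy in plain OCaml (a hand-rolled digit parser, nothing to do with Camlp4 grammars; both function names are made up for this sketch), compare a prefix parser with a wrapper that insists on reaching the end of the input:

```ocaml
(* Hypothetical prefix parser: reads leading decimal digits and returns
   the value together with the position where it stopped. *)
let parse_digits (s : string) : (int * int) option =
  let n = String.length s in
  let rec go i acc =
    if i < n && s.[i] >= '0' && s.[i] <= '9'
    then go (i + 1) (acc * 10 + Char.code s.[i] - Char.code '0')
    else (i, acc)
  in
  match go 0 0 with
  | (0, _) -> None            (* no digits at all *)
  | (i, v) -> Some (v, i)

(* Wrapper playing the role of json_eoi: succeed only if the prefix
   parser consumed the whole input. *)
let parse_digits_eoi (s : string) : int option =
  match parse_digits s with
  | Some (v, i) when i = String.length s -> Some v
  | _ -> None
```

Here parse_digits accepts "12abc" and reports that it stopped at position 2, while parse_digits_eoi rejects it, which is the behavior we want for quotations.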

Then we register quotation expanders for the <:json< >> quotation in the expr, patt, and str_item contexts (str_item is useful because that is the context at the top level prompt), using Syntax.Quotation.add. All the expanders do is call the parser, then run the result through the appropriate lifting function.

Finally we set json as the default quotation, so we can just say << >> for JSON quotations. This is perhaps a bit cheeky, since the user may want something else as the default quotation; whichever module is loaded last wins.

It is worth reflecting on how the quotation mechanism works in the OCaml parser: There is a lexer token for quotations, but no node in the OCaml AST, so everything must happen in the parser. When a quotation is lexed, its entire contents are returned as a string. (Nested quotations are matched in the lexer without considering the embedded syntax; see quotation and antiquot in camlp4/Camlp4/Struct/Lexer.mll. This makes the << and >> tokens unusable in the embedded syntax.) The string is then expanded according to the table of registered expanders; expanders return a fragment of OCaml AST which is inserted into the parse tree.
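A toy model of that table, in plain OCaml, may help fix the picture. Everything here is hypothetical: strings stand in for AST fragments, and the model ignores the expr/patt/str_item context that the real table also takes into account.

```ocaml
(* Hypothetical registry: quotation name -> expander over the raw
   contents. Strings stand in for OCaml AST fragments. *)
let expanders : (string, string -> string) Hashtbl.t = Hashtbl.create 8

(* Name of the default quotation, used for << ... >>. *)
let default = ref ""

(* Registering under an existing name replaces the earlier expander. *)
let add name expander = Hashtbl.replace expanders name expander

(* An empty name selects the default quotation. *)
let expand name contents =
  let name = if name = "" then !default else name in
  match Hashtbl.find_opt expanders name with
  | Some f -> f contents
  | None -> failwith ("no expander registered for quotation " ^ name)

let () =
  (* A made-up expander; the real one parses and then lifts. *)
  add "json" (fun s -> "parsed-and-lifted(" ^ s ^ ")");
  default := "json"
```

Because default is a single mutable reference, a later assignment by another module simply overwrites ours.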

You might have thought (as I did) that something fancy happens with quotations, e.g. Camlp4 switches to a different parser on the fly, then back to the original parser for antiquotations. But it is much simpler than that. At the same time, it is much more complicated than that, as we will see next time when we cover antiquotations (and in particular how nested antiquotations/quotations are handled).

(You can find the complete code here, including a pretty-printer and integration with the top level; after building and installing you can say e.g.

# << [1, "foo", true] >>;;
- : Jq_ast.t = [1, "foo", true]

although without antiquotations it is not very useful.)

