CWMClone in SWI-Prolog (alpha release)

This is an early draft of a CWM clone. Well, perhaps more than a CWM clone (I'd like to have multiple engines). But definitely an inferencing system that uses N3 as its logic language.

I started writing it to get a grip on the syntax and semantics of N3, and to play around with implementing tokenizers, programming languages, parsers, and metacircular inference engines in Prolog. I've learned a fair bit, though exactly what is a bit harder to tell :)

Oh, I was also convinced that CWM.py is way slower than it had to be. I'm hoping that a clean implementation in a reasonable Prolog will also exhibit loads of speed.

It's extremely hacky. (Not as bad as it was :)) I'm still feeling my way around N3, CWM, and Prolog. But it reads a fair bit of N3, and can do a bit of --think style inference. Not too shabby.

Ok, very shabby. Don't tell me, fix it and show me up :)

Downloading it

You don't want to download it, do you? Egad ....

Well, for the masochistic, try:

http://www.unc.edu/~bparisa/sw/cwmclone/cwmclone.P

You also need SWI-Prolog, and the rdf_db.pl and xml_write.pl modules (pop them in your library directory). Peek at the RdfDB page for useful rdf_db.pl commands.

Whoops, you actually need my rdf_db.pl, as it handles subject literals. Bleah.

http://www.unc.edu/~bparisa/sw/cwmclone/rdf_db.pl.txt (strip off the .txt)

You may also want:

http://www.unc.edu/~bparsia/SW/cwmclone/test.n3 (parent/child/grandparent)
http://www.unc.edu/~bparsia/SW/cwmclone/rules12.2.n3 (slightly modified version of rules12.n3, using rule-generating rules). You don't want this anymore; the regular rules12.n3 works fine now.

Using it

You downloaded it, and now you want to use it? Oh my.

Well, I guess I should give you a few hints, so you don't totally hate me. Or maybe so you will.

Fire up SWI-Prolog. Consult cwmclone.P, e.g., by entering ['path/to/cwmclone.P']. at the command prompt. If you stash it in your library directory, [library(cwmclone)]. may work (not sure about the .P suffix; I generally use .pl, but people get confused).

Yeah yeah, warnings galore. It's not polished, OK?

You can consult_n3 a file into an rdf_db database:

?- consult_n3('Path/to/test.n3', user).

Now:

?- compile_rules(user).

Will convert all the top level rules to a form the engine can process.

?- filter(user, test).

Will write the results of applying the rules in the top level context (user) to the top level context into the rdf_db database named test. You can scan the results by:

?- rdf(S, P, O, test).

(Remember, the semicolon will take you to the next result.)

Finally, you can do something like --think:

?- apply_rules(user).

(Only do this for user.) rdf(S, P, O). should show all the triples read in, plus all the triples derived from repeated application of the rules to and into the top level context (user).

For example, test.n3 has the following rules:

this log:forAll t:x, t:y, t:z.
{t:x t:parent t:y.} log:implies {t:y t:child t:x}.
{t:x t:parent t:y. t:y t:parent t:z.} log:implies {t:x t:gParent t:z. t:z t:gChild t:x}.
{t:x t:parent t:y. t:y t:gParent t:z.} log:implies {t:x t:ggParent t:z.}.

Given base facts only of parent relations, this will generate great-grand-parent relations. Notice that it first has to generate grandparent relations.
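To make the "it first has to generate grandparent relations" point concrete, here is a minimal sketch in plain Prolog of applying those three rules round after round until nothing new turns up. It is not how CWMClone does it: triple/3, rule/2, holds/1, new_fact/3, and think/0 are hypothetical names made up for this illustration, the sample facts are invented, and it uses ordinary dynamic facts instead of rdf_db.pl.

```prolog
:- dynamic triple/3.

% Base facts: parent relations only (invented sample data).
triple(alice, parent, bob).
triple(bob,   parent, carol).
triple(carol, parent, dave).

% rule(Antecedents, Consequents): one clause per log:implies rule in test.n3.
rule([t(X, parent, Y)],                   [t(Y, child, X)]).
rule([t(X, parent, Y), t(Y, parent, Z)],  [t(X, gParent, Z), t(Z, gChild, X)]).
rule([t(X, parent, Y), t(Y, gParent, Z)], [t(X, ggParent, Z)]).

% holds(+Patterns): every pattern matches some stored triple.
holds([]).
holds([t(S, P, O)|Rest]) :-
    triple(S, P, O),
    holds(Rest).

% new_fact(-S, -P, -O): a consequent derivable in one step and not yet stored.
new_fact(S, P, O) :-
    rule(Body, Head),
    holds(Body),
    member(t(S, P, O), Head),
    \+ triple(S, P, O).

% think/0: apply all rules repeatedly until a round adds nothing.
think :-
    findall(t(S, P, O), new_fact(S, P, O), New0),
    sort(New0, New),                      % drop duplicate derivations
    (   New == []
    ->  true
    ;   forall(member(t(S, P, O), New), assertz(triple(S, P, O))),
        think
    ).
```

With these sample facts, ?- think, triple(X, ggParent, Z). should give X = alice, Z = dave. The ggParent triple only shows up in the second round, because the gParent triples it depends on are not stored until the first round finishes.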

Another example: rules12.n3 has a rule-generating rule for transitive properties. It declares "ancestor" to be a transitive property, and provides a few relations using that property. The first filtering will get the derived rule (i.e., the specialization of transitivity to #ancestor). If you filtered into the top context, a second filter will apply the transitivity. If you keep doing that in the top context, you have the equivalent of apply_rules/1.
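The two-stage idea (first derive the specialized rule, then apply it) can be sketched in plain Prolog roughly as follows. Again, this is an illustration under assumptions, not CWMClone's code: generate_rules/0, filter_once/0, holds/1, the type/transitiveProperty vocabulary, and the sample facts are all hypothetical, and rule generation is split out as its own step rather than being done by a filter pass.

```prolog
:- dynamic triple/3, rule/2.

% Invented sample data: one transitive property and a chain of three facts.
triple(ancestor, type, transitiveProperty).
triple(a, ancestor, b).
triple(b, ancestor, c).
triple(c, ancestor, d).

% generate_rules/0: the "rule-generating rule". For every property P declared
% transitive, store a rule that is transitivity specialized to P.
% (Calling it twice stores the rule twice, which is harmless here.)
generate_rules :-
    forall(triple(P, type, transitiveProperty),
           assertz(rule([t(X, P, Y), t(Y, P, Z)], [t(X, P, Z)]))).

% holds(+Patterns): every pattern matches some stored triple.
holds([]).
holds([t(S, P, O)|Rest]) :-
    triple(S, P, O),
    holds(Rest).

% filter_once/0: one round of applying the stored rules, writing the
% new triples back into the same store (the "top context").
filter_once :-
    findall(t(S, P, O),
            (   rule(Body, Head),
                holds(Body),
                member(t(S, P, O), Head),
                \+ triple(S, P, O)
            ),
            New0),
    sort(New0, New),
    forall(member(t(S, P, O), New), assertz(triple(S, P, O))).
```

After ?- generate_rules, filter_once. the store should contain triple(a, ancestor, c) and triple(b, ancestor, d); a second filter_once adds triple(a, ancestor, d). Repeating filter_once until it adds nothing is the fixpoint behaviour the text attributes to apply_rules/1.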

The N3 read-evaluate-print loop (repl)

Well, it's not really a repl. Evaluation is sorta weak. But you can enter N3 statements into the current db. Start it up with:

?- n3_repl.

Shut it down with:

n3- @stop.

No way to apply or filter from this line, yet. Or to query. It's still fun, I think.

Why the hell am I "releasing" it?

Well, I'm not, really. Not really. But, why not? It's actually something one might use to make something much better :)

Actually, in the past week (well, week of Feb. 23 or so) I've made so much progress I'm no longer ashamed of it, even a little bit. I'm parsing loads of real world N3 files and am closing in on correct inference. Built-ins are the last bit of compiler hacking to do for functionality, and then comes implementing the individual built-ins.

Note: This is not a contribution, nor should the author, or any criticism from him, be taken seriously.

--Bijan Parsia.

ToDo

TooDone

Built-ins

Built-ins deserve their own real section. They are nasty and complicated, and muck up a lot of things easily. For me, at least. I find it very unclear what to do with built-ins in the consequent (do they change the builtin?). The compiler doesn't yet acknowledge builtins, as such. The following built-ins have, at least, scratch implementations:

Tests and Timings

Tests to make work

A lot of the tests floating around seem rather uninteresting. I mean, tests of builtins are handy, I suppose, but surely the (an?) interesting bit is the inference stuff. Translation stuff is interesting, I guess. Maybe. I'm not all that interested, though, in N3 <-> RDF or even <-> NTriples. Good chunks of N3 don't make the roundtrip, and I've already got an RDF parser (which I could use to feed RDF files into an N3 db). I don't see many tests with what the results should be. And I'm not finding good performance hitters.

Hmm. Euler stuff might help! Er..especially for cloning Euler :)

Performance

Performance is a big reason for my writing this. cwm.py (old, 1.8x versions) is slow. (Or so people complain.) New cwm.py is super duper slow. And, to be frank, the internals are hell. Nobody wants to touch them and even when TimBL touches them, sometimes bad things happen :)

Implementing a logic engine in Python, not using continuations, seems likely to be tedious and hard to get right and fast (though, I may take a shot at translating the engine in Computing With Logic over to Python). Prolog has decades of implementation experience, often directed at making it fast. So, I figured that a fairly straightforward implementation of CWM in Prolog was likely to be both much easier to understand, and as fast, if not faster.

Where do things stand now? Well, it seems that a fair number of the longer running examples use a fair number of built-ins, so testing them will have to wait (and it seems that their performance is somewhat dependent on those built-ins, which basically comes down to which libs are better, Python's or the Prolog implementation's). CWM's parser is likely to whomp CWMClone's for a number of reasons. Thus far, however, inference time seems comparable to old CWM. (And there are some rank inefficiencies, like recompiling all the rules after each round of application.)

Ooh, I pulled out rdf_assert/4 from the compiled rules and a filter rocked. Subsecond performance!!!! on the rules/0_95 test! CWM on a faster machine takes 20 seconds or so (but it is fetching the files from the net). Yeah, there's some speed to be mined. (Note that rdf_assert/4 is used merely because it results in the nice rdf_db.pl convenience functions ... which aren't really needed by a CWMClone.)

Future thoughts: I'd like to have the compiler compile to Mercury programs, and thence to C. Compiled N3 scripts! (GNU Prolog might work well for this too, but Mercury is interesting for other reasons.)

Indeed, I think that porting CWMClone will be my Learning Mercury project. Mercury aims to be logically purer than Prolog and way more efficiently compilable. Whether this is true, I don't know. There are certainly faster Prologs than SWI, and this ain't bad!

Thanks and Acknowledgements

Thanks to Sean B. Palmer for listening to me whinge, testing stuff, running stuff, running CWM for comparison, running Eep for comparison, hosting test files, putting together the CWM info page, etc. etc. etc. Now if he'd only learn Prolog and slave for me.