PyClaireH versus RyClaireH: which bot wins the imitation game?

I’ve been meaning to write about my newest bot for a while, but finally here we are. Welcome, RyClaireH, to the fold. Yay! In case you’re wondering who the other bot is, you can find out all about PyClaireH in this blog post. For those already familiar with Py, the easiest way to describe Ry is through her differences. Where Py is a Markov chain (more detail on this), Ry is a much more sophisticated pseudo-Markov chain. Py essentially uses word-level probabilities to construct sentences based on the likelihood that one word will occur after another. Ry, on the other hand, uses NLP (natural language processing) tools. With these, she tags each word in the data for its part of speech (e.g. noun, verb, adverb, adjective) in advance, and she also uses dictionaries of nouns and adjectives to help her formulate more syntactically coherent tweets.

In a nutshell, Py is a very simple bot that works at the level of the word (lexis), whereas Ry works with both words (lexis) and grammar (syntax). Neither bot has any help at the higher levels of language, such as the meanings of words (semantics), and certainly not the meanings communicated beyond the word (pragmatics). Arguably a semantically “aware” bot is possible: semantic tagging by something like the USAS tagger could provide a nice way in, but incorporating that into the model requires a level of programming competence I don’t possess.
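If you like seeing the idea rather than just reading about it, here is a minimal Python sketch of the two approaches. To be clear, this is not either bot’s actual code: the tiny corpus is made up, and the part-of-speech step uses NLTK purely as a stand-in for “an NLP toolkit”, since the post above only commits to NLP in general.

```python
import random
from collections import defaultdict

# A toy corpus; the real bots train on much larger tweet data.
corpus = "the cat sat on the mat . the cat saw the dog .".split()

# Py-style: a word-level Markov chain. Record which words follow which,
# then walk the chain, picking each next word in proportion to how often
# it followed the current one in the training data.
transitions = defaultdict(list)
for current_word, next_word in zip(corpus, corpus[1:]):
    transitions[current_word].append(next_word)

word = "the"
output = [word]
for _ in range(8):
    followers = transitions.get(word)
    if not followers:
        break
    word = random.choice(followers)  # repeats in the list give frequency weighting
    output.append(word)
print(" ".join(output))

# Ry-style addition: tag each word for its part of speech in advance, so that
# generation can pay attention to syntax as well as lexis.
# (Requires: nltk.download("punkt") and nltk.download("averaged_perceptron_tagger"))
import nltk
tagged = nltk.pos_tag(nltk.word_tokenize("The cat sat on the mat."))
print(tagged)  # e.g. [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD'), ...]
```

The difference in the sketch mirrors the difference in the bots: the first half only ever knows “what word tends to come next”, while the tagged output gives a later stage something grammatical to work with, such as slotting nouns and adjectives into plausible positions.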

As I mentioned in my Py post a few months ago, one of my interests was whether more coding and extra tools would make Ry more convincing than Py, or whether they would actually hinder her, so this post is a non-serious, barely-scientific, entirely-amusement-driven shoot-out between the two. You can (and indeed should) pick about three hundred holes in the general methods and rigour of this, but I refuse to let that stop me.

So, onwards.