cpp-markov-chain

A simple C++ Markov chain and HTTP server, designed to be used in tandem.

The Chain Terminal

This is the main method of interacting with chains locally, and can be used to load, create and run chains.

About Chains

Chains are implemented as C++ maps between a phrase and a word. Each query includes a soft limit and a hard limit. Chains split on spaces, so punctuation is part of a word (e.g. 'complete.'). The soft limit is the minimum word count the chain must reach before a full stop can end the output (the output can still end earlier if the chain runs out of data). For example, with the output 'I am a fish.' and a soft limit of 2, the output ends at 'fish.'. The hard limit is an absolute cut-off point: once the hard limit is reached, the output ends immediately (e.g. 'I am a...' with a hard limit of 3).
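The stopping rules above can be sketched as a small function. This is only an illustration, not the project's actual code: the name apply_limits is hypothetical, and appending '...' at the hard limit is an assumption taken from the 'I am a...' example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: applies the soft/hard limit rules to a stream of
// generated words and returns the text that would be emitted.
std::string apply_limits(const std::vector<std::string>& words,
                         std::size_t soft_limit, std::size_t hard_limit) {
    std::string out;
    std::size_t count = 0;
    for (const std::string& word : words) {
        if (count == hard_limit) {
            // Hard limit reached: cut off immediately.
            // Appending "..." is an assumption based on the example above.
            return out + "...";
        }
        if (!out.empty()) out += ' ';
        out += word;
        ++count;
        // A word ending in a full stop ends the output, but only once
        // the soft limit has been met.
        if (count >= soft_limit && !word.empty() && word.back() == '.') {
            return out;
        }
    }
    return out;  // the chain ran out of data before either limit applied
}
```

With the words 'I am a fish. I like water.', a soft limit of 2 and a generous hard limit stop the output at 'fish.', while a hard limit of 3 cuts it off after 'a'.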

The number of words a chain has in its context is referred to as its 'length'. A chain with a length of 1 only knows which word follows another. A chain with a length of 2 knows that, in the sentence 'I am a fish.', 'a' follows 'I am' and 'fish.' follows 'am a'. Words are case-sensitive and include any punctuation (i.e. 'fish' is not a word in that sentence, but 'fish.' is).
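A minimal sketch of how a chain of a given length could be built from text. The names Chain, split_words and train are hypothetical, and storing a vector of follower words per phrase is an assumption made here so that one phrase can have several possible next words. It assumes a length of at least 1.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Assumed layout: each phrase maps to every word seen directly after it.
using Chain = std::map<std::string, std::vector<std::string>>;

// Splits on whitespace only, so punctuation stays attached ("fish." is one word).
std::vector<std::string> split_words(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);
    return words;
}

// Maps every run of `length` consecutive words to the word that follows it.
// Assumes length >= 1.
Chain train(const std::string& text, std::size_t length) {
    Chain chain;
    const std::vector<std::string> words = split_words(text);
    for (std::size_t i = 0; i + length < words.size(); ++i) {
        std::string phrase = words[i];
        for (std::size_t j = 1; j < length; ++j) phrase += ' ' + words[i + j];
        chain[phrase].push_back(words[i + length]);
    }
    return chain;
}
```

Training on 'I am a fish.' with a length of 2 gives two entries: 'I am' followed by 'a', and 'am a' followed by 'fish.'.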

Key Characters

Directories are separated from the command character by a single space.

'n' creates a new chain
'l' loads a chain from a given directory*
't' trains a chain on a given directory
's' saves a chain to a given directory*
'c' changes a chain option (e.g. default soft/hard limit)
- 's' for soft limit
- 'h' for hard limit
- 'd' sets the debug mode. Enter a 1 to enable it, or 0 to disable it
'd' displays information about the chain
'>' regurgitates the rest of the input
'q' quits the program

Otherwise, the text is assumed to be input for the chain to regurgitate.

* Chains should be saved with a .jkc extension. If the chain has already been saved or loaded, it remembers this location, so you don't need to specify it again.
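The command format above could be parsed roughly as follows. This is a sketch under assumptions, not the terminal's real parser: Command and parse_command are hypothetical names, and treating '>' like the other keys (key, one space, then the rest) is a guess.

```cpp
#include <string>

struct Command {
    char key;              // one of the key characters, or '\0' for plain chain input
    std::string argument;  // directory or remaining text; may be empty
};

// Hypothetical parser: a command is a key character on its own, or a key
// character followed by a single space and an argument.
Command parse_command(const std::string& line) {
    static const std::string keys = "nltscd>q";
    const bool has_key = !line.empty() && keys.find(line[0]) != std::string::npos;
    if (has_key && line.size() == 1) return {line[0], ""};
    if (has_key && line[1] == ' ') return {line[0], line.substr(2)};
    // Anything else is treated as input for the chain.
    return {'\0', line};
}
```

For example, 'l models/a.jkc' parses as key 'l' with the argument 'models/a.jkc', while 'dog' is passed through as chain input because the 'd' is not followed by a space.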

Running The Server

To start the server, run ./run-server (-c [server config path]) (d), where the arguments in parentheses are optional. The server config stores settings about the server. If a config cannot be found, values default to:

port = 6678
model-directory = ../models/

'd' runs the server in debug mode, which prints a lot of information to the terminal.
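As an illustration of the arguments and defaults described above, here is a minimal sketch of argument handling. ServerOptions and parse_args are hypothetical names, reading the config file itself is left out, and the real server may handle its arguments differently.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct ServerOptions {
    std::string config_path;                      // empty means no config was given
    bool debug = false;                           // set by the 'd' argument
    int port = 6678;                              // default listed above
    std::string model_directory = "../models/";   // default listed above
};

// Hypothetical parser for the arguments after the program name.
ServerOptions parse_args(const std::vector<std::string>& args) {
    ServerOptions opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i] == "-c" && i + 1 < args.size()) {
            opts.config_path = args[++i];  // consume the path that follows -c
        } else if (args[i] == "d") {
            opts.debug = true;
        }
    }
    return opts;
}
```

With no arguments the defaults are kept; passing '-c server.conf d' (where server.conf is just an example path) records the config path and enables debug mode.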

Running The Server as a Container

The container uses a two-stage build (which is why .dockerignore is a bit bland) so that the image contains only the necessary files: the server executable, config file, and models directory.

Querying The Server

The HTTP request contains up to 4 parameters: the model, the prompt, the hard limit, and the soft limit. An example request would be 'http://localhost:6678/?model=mdl1.jkc&prompt=My%20name%20is&soft_limit=3'. Note that 'hard_limit' is not included, so it defaults to the value in the server's config file. In this request, the model queried would be (server.model-directory + '/mdl1.jkc').
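To show how such a query string breaks down into parameters, here is a small sketch. percent_decode and parse_query are hypothetical names, and the server's own parsing may differ (for instance in how it treats '+' or malformed escapes).

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Decodes %XX escapes (e.g. %20 becomes a space).
// Malformed escapes are copied through unchanged.
std::string percent_decode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size() &&
            std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
            std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
            out += static_cast<char>(std::stoi(in.substr(i + 1, 2), nullptr, 16));
            i += 2;
        } else {
            out += in[i];
        }
    }
    return out;
}

// Splits "a=1&b=2" into key/value pairs; pairs without '=' are skipped.
std::map<std::string, std::string> parse_query(const std::string& query) {
    std::map<std::string, std::string> params;
    std::size_t start = 0;
    while (start <= query.size()) {
        std::size_t end = query.find('&', start);
        if (end == std::string::npos) end = query.size();
        const std::string pair = query.substr(start, end - start);
        const std::size_t eq = pair.find('=');
        if (eq != std::string::npos) {
            params[percent_decode(pair.substr(0, eq))] =
                percent_decode(pair.substr(eq + 1));
        }
        start = end + 1;
    }
    return params;
}
```

Parsing the query part of the example request yields the model 'mdl1.jkc', the prompt 'My name is' and a soft_limit of '3', with no 'hard_limit' entry, which is where the config default would apply.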

Testing The Chain

The chain test suite first trains a model on a very small input and tests it locally, including checking for random chance (one input phrase having multiple output words). Next, similar tests are run against the API using Python and the requests package.

Sometimes the tests will fail with errno 111 (connection refused), meaning the server didn't get the socket it was supposed to, so the tests can't connect to it. Wait a bit and try again. This can also happen when running ./run-server directly, though no error message will appear; in that case stop the process, wait a bit, and try again.

Personal Reflection on the Project

I was originally inspired to make something like this after reading about Nepenthes, an anti-web-scraper tar pit designed to trap crawlers (particularly gen-AI ones) that violate a website's robots.txt, and which uses Markov babble to poison/entertain them while doing so.

This was also a project I aimed to do with minimal AI assistance (the occasional pointer towards C++ syntax), and while using git in the CLI. This is why an embarrassing number of the commits are along the lines of 'forgot x'. Setting up GitHub Actions also took some time, with the reason for most of the errors simply being that some of the files it was trying to transfer had been .gitignore'd, so weren't there. Hopefully, lesson learned.

The GitHub Actions file is originally courtesy of the University of Warwick Computing Society, and I am also using their services to host an example container at https://examplejarkov.containers.uwcs.co.uk/?model=example.jkc. It is trained on this file (except this last bit, unless I decide to retrain it, which is unlikely).

*'jarkov' is a combination of my name and Markov; it was not intended to sound explicit.

There are a few more features I could add, but I am happy with its current state, being actually somewhat usable. One example would be finding an Accept header value that actually displays the result as JSON in my browser, though it's already in the right format to be parsed.
