llama.clj is a Clojure wrapper for the llama.cpp library.
deps.edn dependency:
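A minimal deps.edn entry might look like the following sketch. The coordinate shown is the project's Maven group/artifact as published; the version string is a placeholder, so check the project README for the current release (and for any additional native-binding dependency your platform needs):

```clojure
;; deps.edn (sketch -- substitute the current release version)
{:deps
 {com.phronemophobic/llama-clj {:mvn/version "LATEST"}}}
```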
All of the docs assume the following requires:
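Assuming the library's main namespace `com.phronemophobic.llama` (and its companion util namespace), the requires would look like:

```clojure
;; Main API: create-context, generate-tokens, generate, generate-string.
(require '[com.phronemophobic.llama :as llama])
;; Utility helpers (tokenization, etc.); included here as the docs assume it.
(require '[com.phronemophobic.llama.util :as llutil])
```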
Throughout these docs, we'll be using the llama 7b chat model and the following context based on it.
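A sketch of that setup is below; the model filename is an assumption, so point `model-path` at wherever you downloaded your copy of the llama 7b chat model:

```clojure
;; Path to a local ggml model file (assumed filename -- adjust to taste).
(def model-path "models/llama-2-7b-chat.ggmlv3.q4_0.bin")

;; A context created once and reused across the examples in these docs.
(def llama-context (llama/create-context model-path))
```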
The llama.clj API is built around two functions, llama/create-context and llama/generate-tokens. The create-context function builds a context that can be used (and reused) to generate tokens.
The llama/create-context function has two arities. If no opts are specified, then defaults will be used.
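The two arities can be sketched as follows; the option keys shown are illustrative assumptions mirroring common llama.cpp parameters, so consult the create-context docstring for the actual supported set:

```clojure
;; One-arg arity: defaults are used for all options.
(llama/create-context model-path)

;; Two-arg arity: an opts map overrides the defaults.
;; Keys here are examples only (context size, GPU offload).
(llama/create-context model-path {:n-ctx 2048
                                  :n-gpu-layers 1})
```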
The model-path arg should be a string path (relative or absolute) to an F16, Q4_0, Q4_1, Q5_0, Q5_1, or Q8_0 ggml model.
Once a context is created, it can then be passed to llama/generate-tokens. The llama/generate-tokens function returns a seqable, reducible sequence of tokens given a prompt. That means generated tokens can be processed using all of the normal Clojure sequence- and transducer-based functions.
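For example, because the result is reducible, a transducer can cap the number of tokens produced; the prompt text here is arbitrary:

```clojure
;; Collect only the first 10 generated tokens using a transducer.
;; Generation stops early because `take` terminates the reduction.
(into []
      (take 10)
      (llama/generate-tokens llama-context "Hello, my name is"))
```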
Working with raw tokens is useful in some cases, but most of the time it's more useful to work with a generated sequence of strings corresponding to those tokens. llama.clj provides a simple wrapper of llama/generate-tokens for that purpose, llama/generate.
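Assuming llama/generate yields strings as they are produced, streaming output to the REPL can be sketched as:

```clojure
;; Print each chunk of generated text as it arrives.
(run! print
      (llama/generate llama-context
                      "Describe the moon in one sentence."))
```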
If results don't need to be streamed, then llama/generate-string can be used to return a string with all of the generated text up to the max context size.
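A non-streaming call is then a single blocking expression:

```clojure
;; Blocks until generation finishes, returning the full response
;; as one string.
(llama/generate-string llama-context
                       "What is a Clojure transducer?")
```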
To generate embeddings, contexts must be created with :embedding set to true.
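A sketch of that setup follows. Creating the context with :embedding true comes straight from the text above; the `get-embedding` accessor used afterwards is an assumption about the API's name for reading the embedding vector back out, so verify it against the library's docs:

```clojure
;; Embedding contexts must be created with :embedding set to true.
(def embedding-ctx
  (llama/create-context model-path {:embedding true}))

;; ASSUMPTION: an accessor like this returns the embedding vector
;; for a prompt; the actual function name may differ.
(llama/get-embedding embedding-ctx "hello world")
```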