Generated with Clerk from notebooks/usage.clj


llama.clj is a Clojure wrapper for the llama.cpp library.


deps.edn dependency:

com.phronemophobic/llama-clj {:mvn/version "0.8.2"}


All of the docs assume the following requires:

(require '[com.phronemophobic.llama :as llama])

Throughout these docs, we'll be using the llama 7b chat model and the following context based on this model.

;; downloaded previously from
(def llama7b-path "models/llama-2-7b-chat.ggmlv3.q4_0.bin")
(def llama-context (llama/create-context llama7b-path {:n-gpu-layers 1}))


The llama.clj API is built around two functions, llama/create-context and llama/generate-tokens. llama/create-context builds a context that can be used (and reused) to generate tokens.

Context Creation

llama/create-context has two arities:

(llama/create-context model-path)
(llama/create-context model-path opts)

If no opts are specified, then defaults will be used.

The model-path arg should be a string path (relative or absolute) to a F16, Q4_0, Q4_1, Q5_0, Q5_1, or Q8_0 ggml model.
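As a sketch of the two arities (the :n-gpu-layers option follows the example context created above; the var names here are illustrative, not part of the library):

```clojure
;; Arity 1: create a context using default options.
(def default-context
  (llama/create-context llama7b-path))

;; Arity 2: create a context with an explicit opts map,
;; e.g. offloading a layer to the GPU as in the example above.
(def gpu-context
  (llama/create-context llama7b-path {:n-gpu-layers 1}))
```

Both calls require a compatible ggml model file at llama7b-path; contexts can then be reused across multiple generation calls.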

Token Generation

Once a context is created, it can be passed to llama/generate-tokens. The llama/generate-tokens function returns a seqable or reducible sequence of tokens given a prompt. That means generated tokens can be processed using all of the normal Clojure sequence- and transducer-based functions.

(first (llama/generate-tokens llama-context "Hello World"))

(clojure.string/join
 (eduction
  (llama/decode-token llama-context)
  (take 10)
  (llama/generate-tokens llama-context "Hello World")))
"! My name"

Generating Text

Working with raw tokens is useful in some cases, but most of the time, it's more useful to work with a generated sequence of strings corresponding to those tokens. llama.clj provides a simple wrapper of llama/generate-tokens for that purpose, llama/generate.

(into []
      (take 5)
      (llama/generate llama-context "Write a haiku about documentation."))
[" " "U" "n" "t" "e"]

If results don't need to be streamed, then llama/generate-string can be used to return a string with all the generated text up to the max context size.

(llama/generate-string llama-context
                       "Write a haiku about documentation.")
Writing in silence
Behind every code

Generating Embeddings

To generate embeddings, contexts must be created with :embedding set to true.

(def llama-embedding-context
  (llama/create-context llama7b-path {:n-gpu-layers 1
                                      :embedding true}))
(llama/generate-embedding llama-embedding-context "some text")
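Embeddings are most useful when compared with one another. As a hedged sketch, assuming llama/generate-embedding returns a seqable collection of floats, cosine similarity between two embeddings can be computed with plain Clojure:

```clojure
;; Cosine similarity: dot(a, b) / (|a| * |b|).
;; Values closer to 1.0 indicate more similar texts.
(defn cosine-similarity [a b]
  (let [dot  (reduce + (map * a b))
        norm (fn [v] (Math/sqrt (reduce + (map * v v))))]
    (/ dot (* (norm a) (norm b)))))

(cosine-similarity
 (llama/generate-embedding llama-embedding-context "some text")
 (llama/generate-embedding llama-embedding-context "similar text"))
```

Note that cosine-similarity is a hypothetical helper for illustration; it is not part of the llama.clj API.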