Generated with Clerk from notebooks/usage.clj

llama.clj

llama.clj is a clojure wrapper for the llama.cpp library.

Dependency

deps.edn dependency:

com.phronemophobic/llama-clj {:mvn/version "0.8.2"}

Requires

All of the docs assume the following requires:

(require '[com.phronemophobic.llama :as llama])

Throughout these docs, we'll be using the Llama 2 7B chat model and the following context based on that model.

;; downloaded previously from
;; https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q4_0.bin
(def llama7b-path "models/llama-2-7b-chat.ggmlv3.q4_0.bin")
(def llama-context (llama/create-context llama7b-path {:n-gpu-layers 1}))

Overview

The llama.clj API is built around two functions: llama/create-context and llama/generate-tokens. The llama/create-context function builds a context that can be used (and reused) to generate tokens.

Context Creation

The llama/create-context function has two arities:

(llama/create-context model-path)
(llama/create-context model-path opts)

If no opts are specified, then defaults will be used.

The model-path arg should be a string path (relative or absolute) to a F16, Q4_0, Q4_1, Q5_0, Q5_1, or Q8_0 ggml model.
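As a sketch of what an options map might look like, the snippet below builds one up as plain data before passing it to llama/create-context. The :n-gpu-layers key appears elsewhere in these docs; :n-ctx (the context window size) is an assumed option name that may differ in your llama.clj version.

```clojure
;; Options are a plain Clojure map. :n-gpu-layers is used in these docs;
;; :n-ctx is an ASSUMED option name for the context window size.
(def ctx-opts
  {:n-gpu-layers 1     ;; layers to offload to the GPU
   :n-ctx        2048});; assumed: max tokens of context

;; Passing the opts map (assumes llama7b-path from above):
;; (llama/create-context llama7b-path ctx-opts)
```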

Token Generation

Once a context is created, it can be passed to llama/generate-tokens. The llama/generate-tokens function returns a seqable and reducible sequence of tokens for a given prompt. That means generated tokens can be processed using all of the normal Clojure sequence- and transducer-based functions.

(first (llama/generate-tokens llama-context "Hello World"))
29991
(clojure.string/join
 (eduction
  (llama/decode-token llama-context)
  (take 10)
  (llama/generate-tokens llama-context "Hello World")))
! My name
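Because the result is reducible, any reducing function works over it. Below is a sketch of a token-counting step: it is demonstrated on a plain vector (since any reducible behaves the same way), with the equivalent generate-tokens call shown commented because it assumes the llama-context defined above.

```clojure
;; A reducing step that counts elements; works on any reducible,
;; including the sequence returned by llama/generate-tokens.
(def count-rf (completing (fn [n _] (inc n))))

;; On a plain vector, bounded by a transducer:
(transduce (take 3) count-rf 0 [10 20 30 40])
;; => 3

;; The same call shape applies to generated tokens
;; (assumes llama-context from above):
;; (transduce (take 100) count-rf 0
;;            (llama/generate-tokens llama-context "Hello World"))
```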

Generating Text

Working with raw tokens is useful in some cases, but most of the time it's more useful to work with a generated sequence of strings corresponding to those tokens. llama.clj provides a simple wrapper over llama/generate-tokens for that purpose: llama/generate.

(into
 []
 (take 5)
 (llama/generate llama-context "Write a haiku about documentation."))
[" " "U" "n" "t" "e"]
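Since llama/generate is reducible, streamed output can also be printed as it arrives using run!. A minimal sketch: the helper below is shown working on plain strings, and the llama/generate call is commented because it assumes the llama-context defined above.

```clojure
;; Print each piece of generated text as it is produced. run! reduces
;; the reducible sequence purely for its printing side effect.
(defn stream-print! [pieces]
  (run! print pieces))

;; With plain strings, captured for illustration:
(with-out-str (stream-print! ["Hi" " " "there"]))
;; => "Hi there"

;; With the model (assumes llama-context from above):
;; (stream-print! (eduction (take 20)
;;                          (llama/generate llama-context
;;                                          "Write a haiku.")))
```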

If results don't need to be streamed, then llama/generate-string can be used to return a string with all the generated text up to the max context size.

(llama/generate-string
 llama-context
 "Write a haiku about documentation.")
Unterscheidung:
Documentation
Writing in silence
Behind every code

Generating Embeddings

To generate embeddings, contexts must be created with :embedding set to true.

(def llama-embedding-context
  (llama/create-context llama7b-path {:n-gpu-layers 1
                                      :embedding true}))
#object[com.phronemophobic.llama.raw.proxy$com.sun.jna.Pointer$ILookup$ILLamaContext$AutoCloseable$49378b1 0x42375f8c "native@0x7fd92fe6d300"]
(llama/generate-embedding llama-embedding-context "some text")
#object["[F" 0x55ee9a17 "[F@55ee9a17"]
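The printed [F above indicates a Java float array. A common use of embeddings is comparing texts by cosine similarity; here is a minimal sketch in plain Clojure, with the embedding calls commented because they assume the llama-embedding-context defined above.

```clojure
;; Cosine similarity between two float arrays, such as those returned
;; by llama/generate-embedding. Arrays are seqable, so map/reduce work.
(defn cosine-similarity [^floats a ^floats b]
  (let [dot  (reduce + (map * a b))
        norm (fn [v] (Math/sqrt (reduce + (map #(* % %) v))))]
    (/ dot (* (norm a) (norm b)))))

;; Identical directions score approximately 1.0:
(cosine-similarity (float-array [1 2 3]) (float-array [1 2 3]))

;; With real embeddings (assumes llama-embedding-context from above):
;; (cosine-similarity
;;  (llama/generate-embedding llama-embedding-context "some text")
;;  (llama/generate-embedding llama-embedding-context "other text"))
```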