Since I purchased my Mac Mini last month I have tried three methods for running LLMs on Apple Silicon. Here I will only discuss Ollama since it is the method I now use most of the time. Ollama can be run on the command line and it also supports a REST interface. I have 32G of memory, but 16G also works well for the examples here. With 32G I can run three or four 7B models or two 13B models concurrently.
Note: if you don’t have a Mac with Apple Silicon, you can still try Ollama using my short demo Google Colab notebook olama_local_langchain.ipynb.
Start by installing the Ollama application, which should also install the command line utility at /usr/local/bin/ollama.
The first time you reference a model it is downloaded and cached for future use. You can also pull a model directly, for example:
ollama pull mistral:instruct
I like to place my prompts in a text file and use something like the following to run a prompt:
ollama run mistral:instruct --verbose "Please process $(cat test.txt)"
I usually don’t use the --verbose option.
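If you want to script this same command line workflow from Python, a subprocess call is one option. This is a minimal sketch, assuming ollama is on your PATH and that a test.txt prompt file exists as in the example above:

import subprocess
from pathlib import Path

# Build the prompt from a local text file, as in the shell example.
prompt = "Please process " + Path("test.txt").read_text()

# Invoke the ollama CLI; the model's completion is written to stdout.
result = subprocess.run(
    ["ollama", "run", "mistral:instruct", prompt],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)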
Just running a model gives you both a REPL and a local REST service:
ollama run mistral:instruct
>>> Send a message (/? for help)
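The REST service listens on localhost port 11434 by default. As a rough sketch, here is one way to call the /api/generate endpoint from Python using the requests library; the prompt text here is just illustrative, and setting stream to False asks Ollama to return the whole completion in a single JSON object:

import requests

# Ollama's REST service listens on localhost:11434 by default.
OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    "model": "mistral:instruct",
    "prompt": "Why is the sky blue?",
    "stream": False,  # return one JSON object instead of a stream of chunks
}

response = requests.post(OLLAMA_URL, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["response"])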
Here is a link to my Racket book showing how to set up Ollama and use it with a Racket (Scheme) client: https://leanpub.com/racket-ai/read#leanpub-auto-using-a-local-mistral-7b-model-with-ollamaai
If you prefer using a Python client, then pip install the langchain library and try this example script:
from langchain.llms import Ollama

# Connect to the local Ollama service (http://localhost:11434 by default).
llm = Ollama(
    model="mistral:7b-instruct",
    verbose=False,
)

s = llm("how much is 1 + 2?")
print(s)
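LangChain can also stream tokens to the console as they are generated by attaching a callback handler. Here is a minimal sketch of that variant, assuming the same local mistral:7b-instruct model; the CallbackManager and StreamingStdOutCallbackHandler classes come from LangChain:

from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import Ollama

# Print tokens to stdout as they are generated instead of
# waiting for the complete response.
llm = Ollama(
    model="mistral:7b-instruct",
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
)

llm("how much is 1 + 2?")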