Since I purchased my Mac Mini last month I have tried three methods for running LLMs on Apple Silicon. Here I will only discuss Ollama since it is the method I now use most of the time. Ollama can be run on the command line and it also supports a REST interface. I have 32G of memory, but 16G also works well for the examples here. With 32G I can run three or four 7B models or two 13B models concurrently.
Note: if you don’t have a Mac with Apple Silicon, you can still try Ollama using my short demo Google Colab notebook olama_local_langchain.ipynb.
Start by installing the Ollama application, which should also install the command line utility at /usr/local/bin/ollama.
The first time you reference a model it is downloaded and cached for future use. You can also pull a model directly, for example:
ollama pull mistral:instruct
I like to place my prompts in a text file and use something like the following to run a prompt:
ollama run mistral:instruct --verbose "Please process $(cat test.txt)"
I usually don’t use the --verbose option.
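If you want to script this same command line workflow from Python, a subprocess call is one option. This is a minimal sketch, assuming ollama is on your PATH and that a test.txt prompt file exists as in the example above:

import subprocess
from pathlib import Path

# Build the prompt from a local text file, as in the shell example.
prompt = "Please process " + Path("test.txt").read_text()

# Invoke the ollama CLI; the model's completion is written to stdout.
result = subprocess.run(
    ["ollama", "run", "mistral:instruct", prompt],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)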
Just running a model gives you both a REPL and a local REST service:
ollama run mistral:instruct
>>> Send a message (/? for help)
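The REST service listens on localhost port 11434 by default. As a rough sketch, here is one way to call the /api/generate endpoint from Python using the requests library; the prompt text here is just illustrative, and setting stream to False asks Ollama to return the whole completion in a single JSON object:

import requests

# Ollama's REST service listens on localhost:11434 by default.
OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    "model": "mistral:instruct",
    "prompt": "Why is the sky blue?",
    "stream": False,  # return one JSON object instead of a stream of chunks
}

response = requests.post(OLLAMA_URL, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["response"])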
Here is a link to my Racket book showing how to set up Ollama and use it with a Racket (Scheme) client: https://leanpub.com/racket-ai/read#leanpub-auto-using-a-local-mistral-7b-model-with-ollamaai
If you prefer using a Python client, then pip install the langchain library and try this example script:
from langchain.llms import Ollama

# Connect to the local Ollama service (http://localhost:11434 by default).
llm = Ollama(
    model="mistral:7b-instruct",
    verbose=False,
)

s = llm("how much is 1 + 2?")
print(s)
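LangChain can also stream tokens to the console as they are generated by attaching a callback handler. Here is a minimal sketch of that variant, assuming the same local mistral:7b-instruct model; the CallbackManager and StreamingStdOutCallbackHandler classes come from LangChain:

from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import Ollama

# Print tokens to stdout as they are generated instead of
# waiting for the complete response.
llm = Ollama(
    model="mistral:7b-instruct",
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
)

llm("how much is 1 + 2?")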