Trying out running LLMs on my MacBook
I'm a bit late to the party in testing local LLM deployment with `brew install ollama`, but hey, better late than never.
I browsed around to decide which models to test using canirun.ai. It's a useful tool for getting a glimpse of which LLMs your machine can handle.
Mine is an Apple M2 Pro chip with 16GB of unified memory (shared between the CPU and GPU, so it serves as VRAM too). Not powerful, but it can handle some decent LLMs in early 2026.
I test-ran these models:
- tinyllama:latest
- qwen3.5:9b
- gemma4:e2b
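For reference, the whole setup boils down to a few commands. The model tags below are the ones I tried; swap in whatever canirun.ai suggests for your machine:

```shell
# Install ollama via homebrew, then start the local server
brew install ollama
ollama serve &

# Pull a model and chat with it (repeat for each tag)
ollama pull tinyllama:latest
ollama run tinyllama:latest "How many e's are in the word apple?"

# See which models are installed locally
ollama list
```

`ollama run` will also pull the model automatically if it isn't present yet, so the explicit `pull` is optional.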
I expected tinyllama to be tiny and fast, but its capability is so limited that it's hard for me to think of a use case for it. Maybe I'm wrong; I welcome any input. But at the very least, I won't give a meaningful task to a model that miscalculates 2+2 and miscounts how many e's are in "apple".
The other two, however, are pretty impressive. Running this kind of capability locally is beyond what I imagined. I haven't dug deeper into the factors that dictate a model's response time, but gemma4:e2b is super fast, even in thinking mode. I can imagine doing real work with it. qwen3.5:9b isn't bad either, but its thinking mode takes much longer, and I observed some "overthinking" on simple tasks.
OK, those are my initial thoughts. My next move is to try wiring these up to see if they can replace my API-based use cases that currently talk to Claude and OpenAI frontier models.
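As a starting point for that, ollama exposes a plain HTTP API on `localhost:11434`, so swapping it in for a hosted API can be as small as changing the endpoint. A minimal sketch using only the Python standard library (the model tag and prompt are just placeholders; `ollama serve` must be running for `ask` to work):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local ollama endpoint


def build_chat_request(model: str, prompt: str) -> dict:
    """Build the JSON body for ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # get one complete JSON response instead of a stream
    }


def ask(model: str, prompt: str) -> str:
    """Send a chat request to the local ollama server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]


if __name__ == "__main__":
    print(ask("qwen3.5:9b", "Summarize this in one sentence: local LLMs are fun."))
```

ollama also serves an OpenAI-compatible endpoint at `/v1`, so existing OpenAI client code can often be pointed at it by just changing the base URL.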