Gemma 4:e4b as a Local Coding Assistant — First Look
Google DeepMind released Gemma 4 this week, and I couldn’t help myself — I had to see if the smaller e4b model was worth using as a local coding assistant on my MacBook Pro M5 (base model, 16GB). This is the first time I’ve seriously tested a local LLM. Realistically, I hadn’t found one that ran well enough on my Mac to bother.
What got my attention was this claim from the announcement:
> E2B and E4B models: A new level of intelligence for mobile and IoT devices. Engineered from the ground up for maximum compute and memory efficiency, these models activate an effective 2 billion and 4 billion parameter footprint during inference to preserve RAM and battery life. In close collaboration with our Google Pixel team and mobile hardware leaders like Qualcomm Technologies and MediaTek, these multimodal models run completely offline with near-zero latency across edge devices like phones, Raspberry Pi, and NVIDIA Jetson Orin Nano. Android developers can now prototype agentic flows in the AICore Developer Preview today for forward-compatibility with Gemini Nano 4.
If it can run on a Google Pixel, my M5 should have no trouble. So I set it up through Ollama and put it to work on iOS development. It runs, it’s fast, and it’s actually useful — with one big caveat out of the box: it talks too much.
Contents
- The Default System Prompt
- Taming the Verbosity
- Results with the Tweaked Model
- By the Numbers
- What’s Next
The Default System Prompt
When prompted, Gemma 4:e4b takes up about 10GB of RAM and the output quality is genuinely good. The problem is the quantity. Gemma 4 likes to think out loud — and it narrates every step of that process in its response. For a quick coding question, you end up wading through a wall of reasoning before you get to the code you actually wanted.
Here’s what I mean. I prompted it to “create a model for a TodoItem using SwiftData”:
>>> create a model for a TodoItem using SwiftData
Thinking...
Here’s a thinking process that leads to the suggested solution:

1. Understand the Goal: The request is to create a model for a `TodoItem`, specifically using SwiftData.
2. Identify Key Components of a `TodoItem`: What attributes should a typical to-do list item have?
   - Title/Task Description (String)
   - Completion Status (Boolean)
   - Due Date (Optional Date)
   - Creation Date (Often useful, but sometimes automatic)
   - Self-Correction/Enhancement: It’s good practice to include a unique identifier or a way to track it, though SwiftData handles the primary ID automatically.
3. Recall SwiftData Model Requirements: To make a class persistent in SwiftData, it must:
   - Be a class (not a struct).
   - Use the `@Model` macro.
   - Often benefit from being `Observable` (though not strictly required for persistence, it’s best practice when working with SwiftUI/State).
4. Draft the Basic Model Structure (Code Skeleton):
   ```swift
   @Model
   final class TodoItem {
       // Properties go here
   }
   ```
5. Implement Properties and Initializer:
   - Properties: `title: String` (needs a default value or initializer argument), `isCompleted: Bool`, `dueDate: Date?` (optional), `createdAt: Date` (set when created)
   - Initializer: Provide an `init` that takes necessary values.
6. Add Best Practices and Context: A user can’t just use the model; they need to know how to use it.
   - Usage Context: Show how to initialize the container and environment.
   - Example Usage: Show how to save a new item and how to read them in a SwiftUI `View`.
7. Review and Refine: Structure the answer clearly with code blocks and explanations.
8. Final Output Generation.
Full Response
This is a very common and straightforward use case for SwiftData. When creating a model for persistence in SwiftData, you must mark the class with the @Model macro and define its properties. Here is a complete model definition, followed by an explanation and usage examples.
1. The TodoItem Model
```swift
import Foundation
import SwiftData

@Model
final class TodoItem {
    // MARK: Properties
    var title: String
    var isCompleted: Bool
    var dueDate: Date?
    var createdAt: Date

    // MARK: Initializer
    init(title: String, isCompleted: Bool = false, dueDate: Date? = nil, createdAt: Date = Date()) {
        self.title = title
        self.isCompleted = isCompleted
        self.dueDate = dueDate
        self.createdAt = createdAt
    }

    var priorityDescription: String {
        if dueDate == nil {
            return "No Due Date"
        } else if dueDate!.timeIntervalSince(Date()) < 0 {
            return "Overdue"
        } else {
            return "Due"
        }
    }
}
```
2. Explanation of the Code
| Element | Purpose | Why it’s needed |
|---|---|---|
| `@Model` | Tells SwiftData this class is designed to be persisted to the database. | Required. Makes the class saveable. |
| `final class` | The class must be a class (not a struct) and it is best practice to mark it `final`. | Required. SwiftData models are reference types. |
| `var` | Properties are marked `var` because their values are expected to change over time. | Defines the persistent fields. |
| `init(...)` | Explicit initializer ensures the object can be created correctly and saved. | Ensures correct initialization. |
| `createdAt: Date` | Set within `init` to capture the precise moment of creation. | Tracks when the item was added. |
3. How to Use the Model
App Entry Point:
```swift
@main
struct TodoApp: App {
    var body: some Scene {
        WindowGroup {
            ContentView()
        }
        .modelContainer(for: TodoItem.self)
    }
}
```
ContentView:
```swift
import SwiftUI
import SwiftData

struct ContentView: View {
    @Query(sort: \TodoItem.createdAt, order: .reverse) var todos: [TodoItem]
    @Environment(\.modelContext) private var modelContext
    @State private var newItemTitle: String = ""

    var body: some View {
        NavigationView {
            VStack {
                HStack {
                    TextField("Enter new task...", text: $newItemTitle)
                        .textFieldStyle(.roundedBorder)
                    Button("Add") { addItem() }
                        .buttonStyle(.bordered)
                        .disabled(newItemTitle.isEmpty)
                }
                .padding()
                List {
                    ForEach(todos) { item in
                        HStack {
                            Image(systemName: item.isCompleted ? "checkmark.circle.fill" : "circle")
                                .foregroundColor(item.isCompleted ? .green : .gray)
                            Text(item.title)
                                .strikethrough(item.isCompleted)
                            Spacer()
                            Text(item.priorityDescription)
                                .font(.caption)
                                .foregroundColor(.secondary)
                        }
                        .contentShape(Rectangle())
                        .onTapGesture { item.isCompleted.toggle() }
                    }
                }
                .navigationTitle("To-Do List")
            }
        }
    }

    private func addItem() {
        let newItem = TodoItem(title: newItemTitle)
        modelContext.insert(newItem)
        newItemTitle = ""
    }
}
```
```
total duration:       1m9.500163958s
load duration:        606.196625ms
prompt eval count:    25 token(s)
prompt eval duration: 7.361703042s
prompt eval rate:     3.40 tokens/s
eval count:           1961 token(s)
eval duration:        1m0.009991456s
eval rate:            32.68 tokens/s
```
The code is correct and the output is thorough. But at 1,961 output tokens and over a minute to respond, it’s not a workflow I’d reach for when I’m in the middle of building something. Thankfully, Ollama lets you customize models — so I put together a Modelfile to fix this.
Taming the Verbosity
The two most important tweaks in the Modelfile below are No Yapping (suppress the thinking monologue in the output) and a low temperature (0.3) for more deterministic, focused answers. Think of it as giving the model a persona: a senior iOS engineer who assumes you know what you’re doing and just wants to hand you the code.
```
# Modelfile for Gemma 4B - iOS Development Assistant
FROM gemma4:e4b

PARAMETER temperature 0.3
PARAMETER num_ctx 8192
PARAMETER num_predict 2048
PARAMETER repeat_penalty 1.1
PARAMETER stop "<end_of_turn>"

SYSTEM """You are a Senior iOS Engineer and SwiftUI Expert.

Operational Philosophy:
1. Confidence First: Provide solutions directly if confident.
2. Tactical Documentation: Use the MCP server only for unfamiliar APIs or edge cases.
3. No Yapping: Do not output "Thinking..." blocks or internal monologues.
4. Professional Peer: Assume the user is an experienced developer. Skip basic explanations of protocols like Identifiable or Codable unless the implementation is non-standard.

Code Standards:
- Prioritize Swift 6 and @Observable patterns.
- Keep models lean; move logic to ViewModels/Services.
- Use MARK comments for organization.

Response Format:
- Start with the code block immediately.
- Follow with brief, high-level architectural notes only if necessary.
- No conversational filler (e.g., "Certainly! Here is your model...").
"""
```
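With the Modelfile saved, registering and running the customized model takes two Ollama commands. The model name `gemma4-ios` here is my own choice; pick whatever name you like:

```shell
# Build a new local model from the Modelfile in the current directory
ollama create gemma4-ios -f Modelfile

# Start an interactive session with the tweaked model
ollama run gemma4-ios
```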
Here is the original SwiftData prompt again, plus a simpler variant without SwiftData, both run against the tweaked model.
Results with the Tweaked Model
Prompt 1 — Basic TodoItem
>>> create a model for a TodoItem
Thinking...
Plan:
- Define a `TodoItem` struct.
- Make it conform to `Identifiable` and `Codable`.
- Since the prompt implies modern SwiftUI development, I will use `@Observable` if it represents state that changes, but for a simple data model, a standard `struct` conforming to `Identifiable` is usually sufficient. If the item itself needs to trigger view updates upon modification (e.g., toggling completion), making the container observable or using `@Observable` on the model itself is appropriate. I will stick to a standard `struct` for the model and assume the ViewModel handles the state management, which is best practice.

Refinement: Use Swift 5.9+ features like `UUID` for IDs and modern optional handling.
```swift
import Foundation

struct TodoItem: Identifiable, Codable {
    let id: UUID
    var title: String
    var isCompleted: Bool = false
    var creationDate: Date = Date()

    init(title: String, isCompleted: Bool = false) {
        self.id = UUID()
        self.title = title
        self.isCompleted = isCompleted
    }
}
```
Architectural Notes:
The TodoItem is kept as a pure struct conforming to Identifiable and Codable. State management (e.g., the array of items and the logic for toggling completion) should reside in an @Observable class (ViewModel) to ensure SwiftUI views react correctly to mutations.
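The model's note points at an `@Observable` ViewModel owning the item array. As a minimal sketch of what that might look like (the class and method names here are my own, not part of the model's output):

```swift
import Foundation
import Observation

// Hypothetical companion ViewModel for the TodoItem struct above.
// @Observable (Observation framework, iOS 17+) lets SwiftUI views
// react to mutations of the items array.
@Observable
final class TodoListViewModel {
    var items: [TodoItem] = []

    // Append a new item with the given title.
    func add(_ title: String) {
        items.append(TodoItem(title: title))
    }

    // Toggle completion by locating the item via its id.
    func toggle(_ item: TodoItem) {
        guard let index = items.firstIndex(where: { $0.id == item.id }) else { return }
        items[index].isCompleted.toggle()
    }
}
```

Because `TodoItem` is a value type, mutations go through the array index rather than the passed-in copy; the `@Observable` class is what publishes the change to SwiftUI.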
Prompt 2 — TodoItem with SwiftData
>>> create a model for a TodoItem that will use SwiftData
Thinking...
The user wants to adapt the TodoItem model to be used with SwiftData. This requires:
- Adopting the `@Model` macro.
- Ensuring properties are compatible with persistence (e.g., using `UUID` or `String` for IDs, and standard Swift types).
- Removing manual `Identifiable` conformance if SwiftData handles the primary key, though keeping it doesn’t hurt.

I will use `@Model` and adjust the initialization pattern slightly, as SwiftData models are often initialized via the context or require property wrappers.
```swift
import Foundation
import SwiftData

@Model
final class TodoItem {
    var id: UUID
    var title: String
    var isCompleted: Bool
    var creationDate: Date

    init(id: UUID = UUID(), title: String, isCompleted: Bool = false, creationDate: Date = Date()) {
        self.id = id
        self.title = title
        self.isCompleted = isCompleted
        self.creationDate = creationDate
    }
}
```
Architectural Notes:
By using @Model, the TodoItem class is now managed by the SwiftData container. We switched from a struct to a class because SwiftData models must be reference types. The id property is retained for external referencing, but SwiftData manages persistence via the @Model wrapper. When fetching or creating items, ensure you inject the modelContext into your view or view model.
By the Numbers
The biggest difference isn’t speed — the model generates tokens at roughly the same rate either way. The difference is how many tokens it decides to produce.
| Metric | Default system prompt | Tweaked — Basic struct | Tweaked — SwiftData |
|---|---|---|---|
| Prompt | create model w/ SwiftData | create model | create model w/ SwiftData |
| Output tokens | 1,961 | 334 | 333 |
| Output speed | 32.68 tok/s | 34.02 tok/s | 34.16 tok/s |
| Output duration | 60.0s | 9.8s | 9.7s |
| Total duration | 1m 9.5s | 22.0s | 21.0s |
| Prompt tokens | 25 | 205 | 393 |
| Prompt speed | 3.40 tok/s | 17.44 tok/s | 36.98 tok/s |
A few things stand out:
- Output tokens dropped ~83% (1,961 → ~333) just from the system prompt — the model generates the same useful code in a fraction of the tokens.
- Total time went from ~70s to ~21s, almost entirely because there were fewer tokens to generate, not because the model got faster.
- Prompt eval speed increased across the two tweaked runs (3.40 → 17.44 → 36.98 tok/s) because the second and third prompts benefit from a warm KV cache — the model context was already partially loaded from the previous exchange.
- The thinking block is still present in the tweaked model, but it’s much more concise — just enough planning to arrive at a correct answer, then it stops.
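The headline reduction is easy to sanity-check from the raw token counts in the table:

```python
# Token counts taken from the Ollama run stats above
default_tokens = 1961
tweaked_tokens = 333  # SwiftData prompt; the basic-struct run was 334

# Fractional reduction in output tokens after the Modelfile tweaks
reduction = (default_tokens - tweaked_tokens) / default_tokens
print(f"{reduction:.0%}")  # → 83%
```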
What’s Next
I’ve at least reached the point where I can say that running Gemma 4:e4b locally on my Mac is doable, and I’m going to keep testing it. The next thing I want to figure out is how to connect it to an MCP server or a RAG pipeline built on Apple’s Developer Documentation, so it always has access to the latest APIs and patterns. I’d also love to get it working alongside Claude Code or Opencode in some way.
At the very least, I’m not as worried about hitting my Claude Pro session limits anymore — I’ve got a capable local fallback for when I hit the wall.