Gemma 4:e4b as a Local Coding Assistant — First Look
Google DeepMind released Gemma 4 this week, and I couldn’t help myself — I had to see if the smaller e4b model was worth using as a local coding assistant on my MacBook Pro M5 (base model, 16GB). This is the first time I’ve seriously tested a local LLM. Realistically, I hadn’t found one that ran well enough on my Mac to bother.
What got my attention was this claim from the announcement:
> E2B and E4B models: A new level of intelligence for mobile and IoT devices. Engineered from the ground up for maximum compute and memory efficiency, these models activate an effective 2 billion and 4 billion parameter footprint during inference to preserve RAM and battery life. In close collaboration with our Google Pixel team and mobile hardware leaders like Qualcomm Technologies and MediaTek, these multimodal models run completely offline with near-zero latency across edge devices like phones, Raspberry Pi, and NVIDIA Jetson Orin Nano. Android developers can now prototype agentic flows in the AICore Developer Preview today for forward-compatibility with Gemini Nano 4.
If it can run on a Google Pixel, my M5 should have no trouble. So I set it up through Ollama and put it to work on iOS development. It runs, it’s fast, and it’s actually useful — with one big caveat out of the box: it talks too much.
Contents
- The Default System Prompt
- Taming the Verbosity
- Results with the Tweaked Model
- By the Numbers
- What’s Next
The Default System Prompt
When prompted, Gemma 4:e4b takes up about 10GB of RAM and the output quality is genuinely good. The problem is the quantity. Gemma 4 likes to think out loud — and it narrates every step of that process in its response. For a quick coding question, you end up wading through a wall of reasoning before you get to the code you actually wanted.
Here’s what I mean. I prompted it to “create a model for a TodoItem using SwiftData”:
>>> create a model for a TodoItem using SwiftData
Thinking...
Here’s a thinking process that leads to the suggested solution:

1. Understand the Goal: The request is to create a model for a `TodoItem`, specifically using SwiftData.
2. Identify Key Components of a `TodoItem`: What attributes should a typical to-do list item have?
   - Title/Task Description (String)
   - Completion Status (Boolean)
   - Due Date (Optional Date)
   - Creation Date (Often useful, but sometimes automatic)
   - Self-Correction/Enhancement: It’s good practice to include a unique identifier or a way to track it, though SwiftData handles the primary ID automatically.
3. Recall SwiftData Model Requirements: To make a class persistent in SwiftData, it must:
   - Be a class (not a struct).
   - Use the `@Model` macro.
   - Often benefit from being `Observable` (though not strictly required for persistence, it’s best practice when working with SwiftUI/State).
4. Draft the Basic Model Structure (Code Skeleton):
   ```swift
   @Model
   final class TodoItem {
       // Properties go here
   }
   ```
5. Implement Properties and Initializer:
   - Properties: `title: String` (needs a default value or initializer argument), `isCompleted: Bool`, `dueDate: Date?` (optional), `createdAt: Date` (set when created)
   - Initializer: Provide an `init` that takes necessary values.
6. Add Best Practices and Context: A user can’t just use the model; they need to know how to use it.
   - Usage Context: Show how to initialize the container and environment.
   - Example Usage: Show how to save a new item and how to read them in a SwiftUI `View`.
7. Review and Refine: Structure the answer clearly with code blocks and explanations.
8. Final Output Generation.
Full Response
This is a very common and straightforward use case for SwiftData. When creating a model for persistence in SwiftData, you must mark the class with the @Model macro and define its properties. Here is a complete model definition, followed by an explanation and usage examples.
1. The TodoItem Model
```swift
import Foundation
import SwiftData

@Model
final class TodoItem {
    // MARK: Properties
    var title: String
    var isCompleted: Bool
    var dueDate: Date?
    var createdAt: Date

    // MARK: Initializer
    init(title: String, isCompleted: Bool = false, dueDate: Date? = nil, createdAt: Date = Date()) {
        self.title = title
        self.isCompleted = isCompleted
        self.dueDate = dueDate
        self.createdAt = createdAt
    }

    var priorityDescription: String {
        if dueDate == nil {
            return "No Due Date"
        } else if dueDate!.timeIntervalSince(Date()) < 0 {
            return "Overdue"
        } else {
            return "Due"
        }
    }
}
```
2. Explanation of the Code
| Element | Purpose | Why it’s needed |
|---|---|---|
| `@Model` | Tells SwiftData this class is designed to be persisted to the database. | Required. Makes the class saveable. |
| `final class` | The class must be a class (not a struct) and it is best practice to mark it `final`. | Required. SwiftData models are reference types. |
| `var` | Properties are marked `var` because their values are expected to change over time. | Defines the persistent fields. |
| `init(...)` | Explicit initializer ensures the object can be created correctly and saved. | Ensures correct initialization. |
| `createdAt: Date` | Set within `init` to capture the precise moment of creation. | Tracks when the item was added. |
3. How to Use the Model
App Entry Point:
```swift
@main
struct TodoApp: App {
    var body: some Scene {
        WindowGroup {
            ContentView()
        }
        .modelContainer(for: TodoItem.self)
    }
}
```
ContentView:
```swift
import SwiftUI
import SwiftData

struct ContentView: View {
    @Query(sort: \TodoItem.createdAt, order: .reverse) var todos: [TodoItem]
    @Environment(\.modelContext) private var modelContext
    @State private var newItemTitle: String = ""

    var body: some View {
        NavigationView {
            VStack {
                HStack {
                    TextField("Enter new task...", text: $newItemTitle)
                        .textFieldStyle(.roundedBorder)
                    Button("Add") { addItem() }
                        .buttonStyle(.bordered)
                        .disabled(newItemTitle.isEmpty)
                }
                .padding()
                List {
                    ForEach(todos) { item in
                        HStack {
                            Image(systemName: item.isCompleted ? "checkmark.circle.fill" : "circle")
                                .foregroundColor(item.isCompleted ? .green : .gray)
                            Text(item.title)
                                .strikethrough(item.isCompleted)
                            Spacer()
                            Text(item.priorityDescription)
                                .font(.caption)
                                .foregroundColor(.secondary)
                        }
                        .contentShape(Rectangle())
                        .onTapGesture { item.isCompleted.toggle() }
                    }
                }
                .navigationTitle("To-Do List")
            }
        }
    }

    private func addItem() {
        let newItem = TodoItem(title: newItemTitle)
        modelContext.insert(newItem)
        newItemTitle = ""
    }
}
```
```
total duration:       1m9.500163958s
load duration:        606.196625ms
prompt eval count:    25 token(s)
prompt eval duration: 7.361703042s
prompt eval rate:     3.40 tokens/s
eval count:           1961 token(s)
eval duration:        1m0.009991456s
eval rate:            32.68 tokens/s
```
The code is correct and the output is thorough. But at 1,961 output tokens and over a minute to respond, it’s not a workflow I’d reach for when I’m in the middle of building something. Thankfully, Ollama lets you customize models — so I put together a Modelfile to fix this.
Taming the Verbosity
The two most important tweaks in the Modelfile below are No Yapping (suppress the thinking monologue in the output) and a low temperature (0.3) for more deterministic, focused answers. Think of it as giving the model a persona: a senior iOS engineer who assumes you know what you’re doing and just wants to hand you the code.
```
# Modelfile for Gemma 4B - iOS Development Assistant
FROM gemma4:e4b

PARAMETER temperature 0.3
PARAMETER num_ctx 8192
PARAMETER num_predict 2048
PARAMETER repeat_penalty 1.1
PARAMETER stop "<end_of_turn>"

SYSTEM """You are a Senior iOS Engineer and SwiftUI Expert.

Operational Philosophy:
1. Confidence First: Provide solutions directly if confident.
2. Tactical Documentation: Use the MCP server only for unfamiliar APIs or edge cases.
3. No Yapping: Do not output "Thinking..." blocks or internal monologues.
4. Professional Peer: Assume the user is an experienced developer. Skip basic explanations of protocols like Identifiable or Codable unless the implementation is non-standard.

Code Standards:
- Prioritize Swift 6 and @Observable patterns.
- Keep models lean; move logic to ViewModels/Services.
- Use MARK comments for organization.

Response Format:
- Start with the code block immediately.
- Follow with brief, high-level architectural notes only if necessary.
- No conversational filler (e.g., "Certainly! Here is your model...").
"""
```
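With the Modelfile saved, registering and running the customized model takes two Ollama commands. The model name `gemma4-ios` here is my own choice; pick whatever name you like:

```shell
# Build a new local model from the Modelfile in the current directory
ollama create gemma4-ios -f Modelfile

# Start an interactive session with the tweaked model
ollama run gemma4-ios
```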
Here is the original SwiftData prompt again, plus a simpler variant without SwiftData, both run against the tweaked model.
Results with the Tweaked Model
Prompt 1 — Basic TodoItem
>>> create a model for a TodoItem
Thinking...
Plan:
- Define a `TodoItem` struct.
- Make it conform to `Identifiable` and `Codable`.
- Since the prompt implies modern SwiftUI development, I will use `@Observable` if it represents state that changes, but for a simple data model, a standard `struct` conforming to `Identifiable` is usually sufficient. If the item itself needs to trigger view updates upon modification (e.g., toggling completion), making the container observable or using `@Observable` on the model itself is appropriate. I will stick to a standard `struct` for the model and assume the ViewModel handles the state management, which is best practice.

Refinement: Use Swift 5.9+ features like `UUID` for IDs and modern optional handling.
```swift
import Foundation

struct TodoItem: Identifiable, Codable {
    let id: UUID
    var title: String
    var isCompleted: Bool = false
    var creationDate: Date = Date()

    init(title: String, isCompleted: Bool = false) {
        self.id = UUID()
        self.title = title
        self.isCompleted = isCompleted
    }
}
```
Architectural Notes:
The TodoItem is kept as a pure struct conforming to Identifiable and Codable. State management (e.g., the array of items and the logic for toggling completion) should reside in an @Observable class (ViewModel) to ensure SwiftUI views react correctly to mutations.
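The model's note points at an `@Observable` ViewModel owning the item array. As a minimal sketch of what that might look like (the class and method names here are my own, not part of the model's output):

```swift
import Foundation
import Observation

// Hypothetical companion ViewModel for the TodoItem struct above.
// @Observable (Observation framework, iOS 17+) lets SwiftUI views
// react to mutations of the items array.
@Observable
final class TodoListViewModel {
    var items: [TodoItem] = []

    // Append a new item with the given title.
    func add(_ title: String) {
        items.append(TodoItem(title: title))
    }

    // Toggle completion by locating the item via its id.
    func toggle(_ item: TodoItem) {
        guard let index = items.firstIndex(where: { $0.id == item.id }) else { return }
        items[index].isCompleted.toggle()
    }
}
```

Because `TodoItem` is a value type, mutations go through the array index rather than the passed-in copy; the `@Observable` class is what publishes the change to SwiftUI.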
Prompt 2 — TodoItem with SwiftData
>>> create a model for a TodoItem that will use SwiftData
Thinking...
The user wants to adapt the TodoItem model to be used with SwiftData. This requires:
- Adopting the `@Model` macro.
- Ensuring properties are compatible with persistence (e.g., using `UUID` or `String` for IDs, and standard Swift types).
- Removing manual `Identifiable` conformance if SwiftData handles the primary key, though keeping it doesn’t hurt.

I will use `@Model` and adjust the initialization pattern slightly, as SwiftData models are often initialized via the context or require property wrappers.
```swift
import Foundation
import SwiftData

@Model
final class TodoItem {
    var id: UUID
    var title: String
    var isCompleted: Bool
    var creationDate: Date

    init(id: UUID = UUID(), title: String, isCompleted: Bool = false, creationDate: Date = Date()) {
        self.id = id
        self.title = title
        self.isCompleted = isCompleted
        self.creationDate = creationDate
    }
}
```
Architectural Notes:
By using @Model, the TodoItem class is now managed by the SwiftData container. We switched from a struct to a class because SwiftData models must be reference types. The id property is retained for external referencing, but SwiftData manages persistence via the @Model wrapper. When fetching or creating items, ensure you inject the modelContext into your view or view model.
By the Numbers
The biggest difference isn’t speed — the model generates tokens at roughly the same rate either way. The difference is how many tokens it decides to produce.
| Metric | Default system prompt | Tweaked — Basic struct | Tweaked — SwiftData |
|---|---|---|---|
| Prompt | create model w/ SwiftData | create model | create model w/ SwiftData |
| Output tokens | 1,961 | 334 | 333 |
| Output speed | 32.68 tok/s | 34.02 tok/s | 34.16 tok/s |
| Output duration | 60.0s | 9.8s | 9.7s |
| Total duration | 1m 9.5s | 22.0s | 21.0s |
| Prompt tokens | 25 | 205 | 393 |
| Prompt speed | 3.40 tok/s | 17.44 tok/s | 36.98 tok/s |
A few things stand out:
- Output tokens dropped ~83% (1,961 → ~333) just from the system prompt — the model generates the same useful code in a fraction of the tokens.
- Total time went from ~70s to ~21s, almost entirely because there were fewer tokens to generate, not because the model got faster.
- Prompt eval speed increased across the two tweaked runs (3.40 → 17.44 → 36.98 tok/s) because the second and third prompts benefit from a warm KV cache — the model context was already partially loaded from the previous exchange.
- The thinking block is still present in the tweaked model, but it’s much more concise — just enough planning to arrive at a correct answer, then it stops.
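The headline reduction is easy to sanity-check from the raw token counts in the table:

```python
# Token counts taken from the Ollama run stats above
default_tokens = 1961
tweaked_tokens = 333  # SwiftData prompt; the basic-struct run was 334

# Fractional reduction in output tokens after the Modelfile tweaks
reduction = (default_tokens - tweaked_tokens) / default_tokens
print(f"{reduction:.0%}")  # → 83%
```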
What’s Next
I’ve at least reached the point where I can say that running Gemma 4:e4b locally on my Mac is doable, and I’m going to keep testing it. The next thing I want to figure out is how to connect it to an MCP server or a RAG pipeline built on Apple’s Developer Documentation, so it always has access to the latest APIs and patterns. I’d also love to get it working alongside Claude Code or Opencode in some way.
At the very least, I’m not as worried about hitting my Claude Pro session limits anymore — I’ve got a capable local fallback for when I hit the wall.