
Using Claude Code on a $20/Month Subscription: Notes from Daily Mobile Development at MSApps

Author: Shay Shimoni, Mobile Developer, MSApps

Tags: AI-assisted development, Claude Code, React Native, iOS, Android, mobile engineering, developer productivity

If you're a mobile developer wondering whether AI coding tools are worth the overhead — the setup cost, the prompt engineering, the token management — this post tries to give an honest answer. I've been using Claude Code on a $20/month Pro subscription as part of my daily workflow across six production apps. By the end of this post you'll know exactly how the workflow is set up, which plugins make the token budget last, and what a real multi-round bug fix looks like when the AI is involved — including where it got it wrong before it got it right.

The Setup: What MSApps Builds

MSApps is a mobile app agency based in Israel. The codebases I work with include:

  • DreamCard (React Native + TypeScript) — a loyalty card and offers platform with authentication flows, background refresh cycles, and third-party SDK integrations
  • myzeekr iOS (UIKit, Alamofire, ZeekrSDK + Ituran + Au10tix, Hebrew RTL) — a connected car platform for Zeekr Israel
  • myzeekr Android (Kotlin 2.2.0, Retrofit/OkHttp, View-based UI) — the Android counterpart
  • mylexus iOS and mygeely iOS — connected car apps for Lexus Israel and Geely Israel respectively, each with their own SDK and payment integrations
  • nerivio iOS (UIKit + Coordinator + MVVM, BLE, NFC) — a medical device app for the Nerivio migraine treatment device by Theranica

These codebases have enough moving parts — Keychain persistence, biometric authentication, multiple build targets, binary SDK frameworks — that getting an AI assistant to be genuinely useful required some setup work upfront.

Starting with Claude Code: The Learning Curve

The first thing I had to learn was that Claude Code is not a chatbot you paste code into. It's a terminal agent with direct filesystem access, tool use, and persistent memory. The difference matters.

When I first pointed it at myzeekr iOS, I ran the /learn-codebase skill — it read every source file in full, indexing the architecture, singletons, network layer patterns, build targets, and navigation flows into a persistent memory store. That one-time upfront investment meant every subsequent session started with context I didn't have to re-explain.

The memory system (claude-mem) stores observations across sessions. Every time the AI discovers something non-obvious — a stale closure pattern, an SDK initialization sequence, a race condition in async state — it writes it to a searchable index backed by SQLite. On the next session, it searches that index rather than re-reading the same files. More on this in the token section.

A Real Bug: Where the AI Helped (and Where It Needed Multiple Passes)

One case that illustrates the workflow well is Bug #12584 in DreamCard's UserContext.tsx.

The Symptom

Users reported crashes when returning to the app from external navigation (Waze, for example). Pressing buttons after the crash produced a "Maximum call stack size exceeded" error. The React Native bridge was corrupted.

The Root Cause (First Pass)

The original code threw new Error('Error') inside a useEffect to trigger React's ErrorBoundary. In theory, this routes the error to a recoverable error screen. In practice, throwing inside a useEffect in React Native corrupts the bridge between JavaScript and the native layer. Once corrupted, no UI interaction works.

The fix seemed obvious: replace throw new Error() with a safe navigation call to an error screen using navigationRef.navigate(ERROR). This keeps the React tree alive.

The Fix That Needed Three More Rounds

The initial fix wasn't enough. Running the code reviewer agent — which reviews independently, without the context of why the change was made — surfaced three follow-on correctness issues.

Issue 1 — React State Batching

The first attempt at the fix reset a useRef flag (isLoginFlowRef.current = false) synchronously inside handleLogin after await hadleRefreshData(). The reasoning: set the flag true before mutations, false after. Simple.

What the reviewer caught: React batches state updates asynchronously. When a mutation calls setHasRedirectedToError(true) inside onError, that's a scheduled state update — it doesn't re-render immediately. React flushes the batch after the current synchronous queue drains. So the actual execution sequence was:

  1. handleLogin sets isLoginFlowRef.current = true
  2. Mutation fails → onError calls setHasRedirectedToError(true) (schedules update)
  3. hadleRefreshData returns (swallows error in outer try-catch)
  4. handleLogin sets isLoginFlowRef.current = false (flag cleared here)
  5. React flushes batched state update, useEffect fires
  6. Effect checks isLoginFlowRef.current → already false → early return
  7. Error screen never shown

The fix: move the ref read into the effect, then capture and clear the flag in a single step:

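// Inside the effect: read the flag and clear it in the same step
const shouldNavigate = isLoginFlowRef.current;
isLoginFlowRef.current = false;
// Only errors raised during the login flow lead to the error screen
if (!shouldNavigate) return;
if (navigationRef.isReady()) navigationRef.navigate(ERROR);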
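To make the ordering concrete, here is a minimal React-free simulation of the seven-step sequence above, comparing the first attempt with the fix. It assumes a setTimeout(0) is a fair stand-in for React's deferred re-render and effect flush; simulateLogin, refreshData, and errorEffect are hypothetical names, not DreamCard code.

```typescript
// React-free simulation of the batching race. Assumption: setTimeout(0) stands
// in for React's deferred re-render + effect flush. All names are hypothetical.
type Ref<T> = { current: T };

// clearInHandler = true reproduces the first attempt; false is the fix.
async function simulateLogin(clearInHandler: boolean): Promise<string[]> {
  const navigations: string[] = [];
  const isLoginFlowRef: Ref<boolean> = { current: false };

  // Stand-in for the effect that runs once hasRedirectedToError becomes true.
  const errorEffect = (): void => {
    if (clearInHandler) {
      // First attempt: the effect only reads a flag the handler may have cleared.
      if (isLoginFlowRef.current) navigations.push("ERROR");
      return;
    }
    // Fix: capture and clear the flag inside the effect itself.
    const shouldNavigate = isLoginFlowRef.current;
    isLoginFlowRef.current = false;
    if (shouldNavigate) navigations.push("ERROR");
  };

  // Stand-in for onError calling setHasRedirectedToError(true): the update is
  // only scheduled, so the effect runs after the handler has continued.
  const onError = (): void => {
    setTimeout(errorEffect, 0);
  };

  // Stand-in for the refresh function that swallows the mutation error.
  const refreshData = async (): Promise<void> => {
    try {
      throw new Error("mutation failed");
    } catch {
      onError();
    }
  };

  // Stand-in for handleLogin.
  isLoginFlowRef.current = true;
  await refreshData();
  if (clearInHandler) isLoginFlowRef.current = false; // step 4 of the sequence

  // Give the scheduled "effect" time to run before reporting.
  await new Promise<void>((resolve) => setTimeout(() => resolve(), 10));
  return navigations;
}
```

The first attempt resolves to an empty list because the handler clears the flag before the scheduled effect reads it; the fix resolves to a single ERROR navigation.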

Issue 2 — Concurrent Background Refresh Race Condition

The app runs two independent background refresh triggers: a 30-minute timer and an AppState listener that fires when the app foregrounds. Both shared a single boolean ref (isBackgroundRefreshRef).

The race: the timer fires, sets the ref to true, and starts a refresh. The user backgrounds and re-foregrounds the app mid-refresh, so the AppState listener also sets the ref to true and starts a second refresh. The timer's refresh completes first and sets the ref to false while AppState's refresh is still in flight. When AppState's getCustomerMutation.onError fires, the ref is already false, the gate is open, and the error screen appears for what was only a background refresh failure.

The fix: replace the boolean with a counter:

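const backgroundRefreshCountRef = useRef(0);
// On entry to each background refresh:
backgroundRefreshCountRef.current += 1;
// On exit (ideally in a finally block, so a failed refresh still decrements):
backgroundRefreshCountRef.current -= 1;
// Gate check in onError: only redirect when no background refresh is in flight
if (backgroundRefreshCountRef.current === 0) setHasRedirectedToError(true);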

Gate only opens when all in-flight refreshes complete. If timer and AppState overlap, counter reaches 2; whichever completes first decrements to 1; final completion decrements to 0.
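A small synchronous model shows why the counter closes the race. BooleanGate and CounterGate are hypothetical stand-ins for the two ref designs, and the replay follows the interleaving described above: timer starts, AppState starts, timer finishes, then AppState's onError checks the gate.

```typescript
// Hypothetical model of the two refresh triggers sharing one gate.
interface RefreshGate {
  begin(): void;
  end(): void;
  isOpen(): boolean; // true => an API error may redirect to the error screen
}

// Original design: one boolean shared by both triggers.
class BooleanGate implements RefreshGate {
  private inBackgroundRefresh = false;
  begin(): void { this.inBackgroundRefresh = true; }
  end(): void { this.inBackgroundRefresh = false; }
  isOpen(): boolean { return !this.inBackgroundRefresh; }
}

// Fixed design: count in-flight refreshes; open only at zero.
class CounterGate implements RefreshGate {
  private inFlight = 0;
  begin(): void { this.inFlight += 1; }
  // Clamp guards against an unbalanced extra decrement.
  end(): void { this.inFlight = Math.max(0, this.inFlight - 1); }
  isOpen(): boolean { return this.inFlight === 0; }
}

// Replays the overlap and reports whether the error screen would be shown.
function errorScreenShownDuringOverlap(gate: RefreshGate): boolean {
  gate.begin(); // 30-minute timer refresh starts
  gate.begin(); // AppState foreground refresh starts
  gate.end();   // timer refresh completes first
  const shown = gate.isOpen(); // AppState refresh fails; onError checks the gate
  gate.end();   // AppState refresh completes
  return shown;
}
```

With BooleanGate the replay reports the error screen as shown; with CounterGate it does not. In the real hook, the decrement is safest in a finally block so a refresh that throws still releases its count.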

Issue 3 — Navigator Not Ready on Cold Start

On cold start, the navigation container may not be mounted when an API error fires. The original code cleared the hasRedirectedToError flag before checking navigationRef.isReady(). If the navigator wasn't ready, the flag was cleared but navigation was skipped — and the effect would never re-run because its dependency was manually reset to false.

Fix: only clear the flag after successful navigation:

if (!navigationRef.isReady()) return; // flag stays true, so the next run of this effect can retry
setHasRedirectedToError(false);
navigationRef.navigate(ERROR);
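Here is a sketch of the same rule as a pure function, with the navigator faked so it can run outside React. NavigatorLike, runRedirectEffect, and the ERROR route string are assumptions for illustration.

```typescript
// Hypothetical stand-in for navigationRef.
interface NavigatorLike {
  isReady(): boolean;
  navigate(route: string): void;
}

const ERROR = "ERROR"; // assumed route name

// One run of the redirect effect. Returns the new value of hasRedirectedToError.
function runRedirectEffect(hasRedirectedToError: boolean, nav: NavigatorLike): boolean {
  if (!hasRedirectedToError) return false;
  // Navigator not mounted yet (cold start): keep the flag so a later run can retry.
  if (!nav.isReady()) return true;
  nav.navigate(ERROR);
  return false; // clear only once navigation has actually been issued
}
```

One caveat: in React, keeping the flag true only helps if the effect actually runs again, and an effect re-runs only when one of its dependencies changes. One option (an assumption, not part of the fix described here) is to also depend on a readiness state set from NavigationContainer's onReady callback.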

This took several review-fix-review cycles, not one clean pass. The independent reviewer (no conversation history, no knowledge of my reasoning) is what caught issues 2 and 3 at the point where I thought the fix was done. That's the honest version of how it went.

Integrations: CLIs and APIs I Use Daily

One of the more practical things about Claude Code's terminal-based design is that it can call any CLI or REST API you already use — without a browser, without copy-paste, and without a custom integration.

Azure DevOps — REST API with PAT

Our work items live in Azure DevOps. The setup is simple: store the PAT (personal access token), organization name, and project name in ~/.claude/settings.json. When I say "pick up work item 12585," Claude Code calls the Azure DevOps REST API directly with curl, parses the JSON response, and reads the task description, even when it's in Hebrew, since our team writes tasks in Hebrew.
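For readers who want to reproduce this outside Claude Code, here is roughly the same request in TypeScript. It is a sketch assuming Node 18+ (global fetch and btoa) and the Get Work Item endpoint with api-version 7.0. The helper names are hypothetical, and where the org, project, and PAT come from is up to you.

```typescript
// Sketch: fetch an Azure DevOps work item over REST with a PAT (assumes Node 18+).
interface WorkItemSummary {
  id: number;
  title: string;
  description: string; // HTML, as returned by Azure DevOps
}

function workItemUrl(org: string, project: string, id: number): string {
  return `https://dev.azure.com/${encodeURIComponent(org)}/${encodeURIComponent(project)}` +
    `/_apis/wit/workitems/${id}?api-version=7.0`;
}

// Azure DevOps accepts a PAT as the password of HTTP Basic auth with an empty user name.
function basicAuthHeader(pat: string): string {
  return `Basic ${btoa(`:${pat}`)}`;
}

async function getWorkItem(org: string, project: string, pat: string, id: number): Promise<WorkItemSummary> {
  const res = await fetch(workItemUrl(org, project, id), {
    headers: { Authorization: basicAuthHeader(pat), Accept: "application/json" },
  });
  if (!res.ok) throw new Error(`Azure DevOps returned HTTP ${res.status}`);
  const body = (await res.json()) as { id: number; fields: Record<string, unknown> };
  return {
    id: body.id,
    title: String(body.fields["System.Title"] ?? ""),
    description: String(body.fields["System.Description"] ?? ""),
  };
}
```

Calling getWorkItem(org, project, pat, 12585) would return the title and HTML description; Hebrew text arrives as ordinary UTF-8 inside the JSON, so it needs no special handling.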

Work item 12585 was titled "שינויים בSDK המוטמע של אקוסטיק" — changes to the embedded Acoustic SDK. Claude read the Hebrew description, identified the root cause (manual payload construction bypassing the SDK's automatic session UUID generation), and outlined the fix. No browser tab, no copy-paste.

The firm rule here: always REST API, never az CLI. The CLI adds a dependency and auth layer that isn't necessary when a PAT works directly.

Monday.com — MCP via Claude AI Connector

For Monday.com I use the built-in MCP integration that ships with Claude's connector ecosystem. Once authenticated, Claude Code can query boards, read items, and pull ticket details directly from a Monday.com URL without leaving the terminal. No custom API wrapper, no personal token management — the MCP connector handles auth.

In practice this means I can say "pull item 12345 from the board" and get the full ticket description, assignee, and status inline, the same way Azure DevOps work items come through. The two integrations cover the two places our team tracks work, and both are accessible from a single terminal session.

GitHub CLI (gh)

For GitHub-related work — creating PRs, checking CI status, reviewing diffs, commenting on issues — I use the gh CLI. Claude Code can call gh directly in the terminal, which means PR creation follows the same format every time: conventional commit subject, bullet list of changes, no trailing noise. The CLAUDE.md rules define the PR format and Claude follows it without prompting.

gh pr create, gh pr view, gh pr checks, gh issue list: all callable inline during a session, without switching context to a browser. When a CI check fails, gh run view --log-failed pulls the logs of the failed steps directly into the session.
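The PR format is also easy to enforce mechanically. The sketch below builds the argument list for gh pr create from a conventional-commit subject and a list of changes. buildPrCreateArgs, the accepted commit types, and the default base branch main are assumptions, not the contents of our CLAUDE.md.

```typescript
// Conventional-commit subject: type(scope)!: description (scope and ! optional).
const CONVENTIONAL_SUBJECT =
  /^(feat|fix|chore|docs|refactor|test|perf|build|ci|style|revert)(\([\w.-]+\))?!?: \S.*$/;

// Hypothetical helper: returns argv for `gh pr create` in the
// "conventional subject + bullet list of changes" format.
function buildPrCreateArgs(title: string, changes: string[], base = "main"): string[] {
  if (!CONVENTIONAL_SUBJECT.test(title)) {
    throw new Error(`Not a conventional commit subject: "${title}"`);
  }
  if (changes.length === 0) throw new Error("PR body needs at least one change");
  const body = changes.map((c) => `- ${c.trim()}`).join("\n");
  return ["pr", "create", "--base", base, "--title", title, "--body", body];
}
```

Passing the array to execFileSync("gh", args) from node:child_process avoids shell quoting problems with multi-line bodies.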

Learning from Andrej Karpathy's Approach

I want to be honest about where some of my thinking on this came from. Andrej Karpathy, a founding member of OpenAI, former director of AI at Tesla, and one of the people who thinks most clearly about how developers should work with LLMs, has written and talked a lot about what changes when AI becomes part of the development loop.

One idea that stuck with me was what he called "vibe coding." His framing: AI has crossed a threshold where you can build real software through English, largely forgetting that code exists underneath. He's used it to write a custom BPE tokenizer in Rust, build quick demo apps, and even spin up entire throwaway applications just to track down a single bug — because at some point, generating code becomes cheap enough to be disposable.

What I took from that isn't a license to stop thinking about code quality. It's almost the opposite. The interesting challenge shifts: if generating code is cheap, the bottleneck becomes knowing what to ask for, how to review what comes back, and how to set up the guardrails that prevent the AI from going off-scope. Karpathy describes this well — vibe coding empowers trained professionals to write software that would otherwise never get written, but the professional judgment about what to build and whether it's correct doesn't disappear.

His "How I use LLMs" video (on his YouTube channel) is worth watching if you're getting started. The practical point that influenced me most: treat the AI as something you direct and verify, not something you trust blindly. The code it writes needs to be read. The explanations it gives need to be checked against actual behavior. The more production-critical the context, the more important it is that a human is in the loop at every significant decision point.

That maps directly to how I set up the workflow here: the CLAUDE.md rules that constrain scope, the mandatory commit approval before anything gets pushed, the independent code reviewer that sees no context from the session. Those aren't just token-saving measures. They're the guardrails that make the speed sustainable.

Token Management: How to Stay Inside $20/Month

Claude's $20/month Pro plan includes Claude Code, but its usage limits can run out quickly if you're not deliberate. Here's what actually makes a difference.

context-mode

The most impactful plugin I use is context-mode. It routes large command outputs through a sandboxed subprocess — instead of cat UserContext.tsx dumping 800 lines into the conversation context, ctx_execute_file runs analysis on the file in a separate process and returns only a summary. Raw tool output that would cost 30,000 tokens costs under 2,000.

The ctx_batch_execute command is the workhorse: run ten shell commands, auto-index the output, and search it with ten queries — all in one tool call. That replaces what would otherwise be thirty separate tool calls followed by ten follow-up searches.

Reported savings over a full session: ~94%.
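The underlying idea is simple enough to sketch: keep raw output out of the conversation and return only the lines that answer the question, plus a size estimate. This illustrates the principle, not context-mode's implementation or API, and the four-characters-per-token figure is a rough assumption.

```typescript
// Illustration of "summarize instead of dumping", not context-mode's code.
interface OutputSummary {
  totalLines: number;
  matches: string[]; // "lineNo: text" for each matching line, capped at maxMatches
  estimatedRawTokens: number;
  estimatedSummaryTokens: number;
}

// Rough heuristic (assumption): about 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function summarizeOutput(raw: string, query: RegExp, maxMatches = 20): OutputSummary {
  // Drop g/y flags so RegExp.test does not carry lastIndex between lines.
  const re = new RegExp(query.source, query.flags.replace(/[gy]/g, ""));
  const lines = raw.split("\n");
  const matches: string[] = [];
  lines.forEach((line, i) => {
    if (matches.length < maxMatches && re.test(line)) {
      matches.push(`${i + 1}: ${line.trim()}`);
    }
  });
  return {
    totalLines: lines.length,
    matches,
    estimatedRawTokens: estimateTokens(raw),
    estimatedSummaryTokens: estimateTokens(matches.join("\n")),
  };
}
```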

caveman mode

A plugin that compresses Claude's own responses — drops articles, filler phrases, pleasantries, hedging. "Sure! I'd be happy to help you with that. The issue you're experiencing is likely caused by..." becomes "Bug in auth middleware. Token expiry uses < not <=. Fix:". Same information, about 75% fewer tokens on the response side.

claude-mem — Persistent Memory Across Sessions

Without persistent memory, every session starts from scratch. The same 50 files get read again, the same architecture gets re-explained. claude-mem writes discoveries to a SQLite-backed observation store that survives session boundaries. New sessions search memory first; files only get re-read when the memory is ambiguous or potentially stale.

After running /learn-codebase on all six projects in one sitting early on, every subsequent session could reference architecture details, SDK initialization sequences, and navigation patterns without re-reading source files. By the numbers: 116 sessions tracked across a month, 697 observations stored, 4.67 million discovery tokens logged — representing all the work the AI did that memory now prevents repeating.
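A toy version of the "search memory first, re-read only if stale" pattern is sketched below. It is not claude-mem's schema or API: the Observation shape, the in-memory array standing in for SQLite, and the FNV-1a content hash used to detect staleness are all assumptions.

```typescript
// Toy observation store; NOT claude-mem's schema or API.
interface Observation {
  id: number;
  project: string;
  file: string;
  text: string;
  fileHash: string; // hash of the file contents when the observation was written
}

// 32-bit FNV-1a over UTF-16 code units; enough to notice that a file changed.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

class ObservationStore {
  private observations: Observation[] = [];

  add(project: string, file: string, fileContents: string, text: string): Observation {
    const obs = { id: this.observations.length + 1, project, file, text, fileHash: fnv1a(fileContents) };
    this.observations.push(obs);
    return obs;
  }

  // Keyword search; each hit is flagged stale if its file changed since it was recorded.
  search(
    project: string,
    keyword: string,
    currentContents: (file: string) => string,
  ): Array<Observation & { stale: boolean }> {
    const needle = keyword.toLowerCase();
    return this.observations
      .filter((o) => o.project === project && o.text.toLowerCase().includes(needle))
      .map((o) => ({ ...o, stale: fnv1a(currentContents(o.file)) !== o.fileHash }));
  }
}
```

Tagging each observation with a hash of its source file is one way to make the stale-memory problem described later explicit rather than silent: a stale hit is a cue to re-read the file.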

Codex Plugin — Delegating Heavy Lifting

The Codex plugin (codex:rescue) is useful when the main session needs to hand off a large investigation or implementation without burning context. Instead of the main Claude conversation expanding to read 20 files and trace 5 call chains, a Codex sub-agent handles the exploration in its own context window and returns a compressed summary.

The practical use case: when a bug investigation requires reading files across multiple modules, or when an implementation task involves changes across 4+ files, delegating to Codex keeps the main session lean. The main conversation stays focused on decisions and review; Codex handles the mechanical reading and writing.

It also helps on tasks where a fresh pair of eyes matters — similar to the code-reviewer agent principle, a Codex agent that hasn't been reasoning about the problem for the last 20 turns will sometimes surface a simpler approach.

Agent Delegation

For broad codebase exploration, I delegate to sub-agents of the Explore agent type. Each runs in a separate context window and returns a compressed summary, so the main conversation never sees the raw file contents.

For code review, the code-reviewer agent runs with no conversation history. It sees only the diff and the code, not my reasoning for the change. That independence is what makes it catch things.

Disciplined Scope

The biggest token sink is context drift — asking Claude to explore things tangentially related to the task. The CLAUDE.md rules enforce: one task at a time, no unsolicited refactoring, no cleanup of unrelated code. This prevents the AI from exploring files that don't need to be read.

Quality of Output: What Actually Works

What I've found it useful for:

  • Tracing bugs across async state. The React batching issue above involved understanding exactly when React flushes state updates relative to synchronous code. Mapping out the execution sequence step by step, rather than relying on a mental model of React internals, helped — though it still took multiple passes.
  • Cross-file impact analysis. Changing a function signature? Claude checks all call sites before confirming the change is safe. Removing a feature flag? It confirms nothing else references it. This kind of mechanical check is easy to skip under time pressure and catches real mistakes.
  • Code review with independent perspective. An agent that hasn't seen my implementation decisions will sometimes flag things I rationalized away when writing the code. Not always, but often enough to be worth running.
  • Getting oriented in an unfamiliar codebase. The /learn-codebase skill reads all source files and stores what it learns. When I'm working on a project I haven't touched in a while, I can ask questions about architecture without manually tracing through files — though I always verify against the source before acting on the memory.
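The call-site check from the second bullet is mechanical enough to sketch. findCallSites is a hypothetical helper that scans already-loaded files with a word-boundary regex; that is a rough stand-in for real static analysis, so it will miss aliased or dynamically constructed calls, and it also matches the declaration itself.

```typescript
interface CallSite {
  file: string;
  line: number;
  text: string;
}

// Escape regex metacharacters so the function name is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Finds lines where `functionName(` appears as a whole word (calls and the declaration).
function findCallSites(files: Map<string, string>, functionName: string): CallSite[] {
  const pattern = new RegExp(`\\b${escapeRegExp(functionName)}\\s*\\(`);
  const sites: CallSite[] = [];
  files.forEach((contents, file) => {
    contents.split("\n").forEach((text, i) => {
      if (pattern.test(text)) sites.push({ file, line: i + 1, text: text.trim() });
    });
  });
  return sites;
}
```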

Where it needs guidance:

  • It will over-engineer if you let it. The default tendency is to add abstractions, handle edge cases, and create helper functions. The CLAUDE.md rules push back explicitly: no library for something doable in a few lines, no refactoring beyond the current task, no features that weren't asked for.
  • It cannot test UI. For React Native, it can verify TypeScript type correctness and run linters, but it cannot launch a simulator and confirm the error screen actually appears. UI verification is still manual, and the rules require explicit acknowledgment when UI testing wasn't done.
  • Stale memory is real. A memory that says "function X is at line 157" was accurate when written. Three weeks and a refactor later, it may be wrong. Memory is a starting point, not a source of truth.

The Workflow in Practice

A typical session:

  1. Start Claude Code in the project directory
  2. Memory loads automatically (MEMORY.md index is always in context)
  3. Describe the task or give a work item number
  4. Claude fetches the work item from the task tracker, or I describe the bug directly
  5. Exploration phase: reads relevant files, maps affected code paths (via Codex or Explore agent for large scope)
  6. Calls advisor() before writing anything — routes the full conversation to a stronger model for a second opinion
  7. Implements the fix, runs linter/type-checker
  8. Delegates to code reviewer agent for independent review
  9. Presents commit message for approval before committing
  10. I verify manually if UI is involved; gh pr create if ready to push

The advisor() call in step 6 is worth running on anything stateful or async. On the UserContext bug, it flagged the React batching issue before the first implementation attempt. That's the kind of catch that saves a round trip.

Some Numbers

The numbers below cover one calendar month of work across six apps. The session, observation, and discovery-token counts come directly from the claude-mem SQLite database, not estimates; the ~94% figure is context-mode's own reported savings.

Metric                          Value
------------------------------  ------------------------
Sessions tracked                116
Observations stored             697
Total discovery tokens logged   4,674,038
Context-mode token savings      ~94% per session
Projects in active use          6 apps (iOS + Android)
Subscription cost               $20/month Claude Pro

The 4.67 million discovery tokens represent the actual cost of all work done — architectural analysis, bug investigations, code generation, reviews. Memory prevents repeating that work in future sessions. The counter-based refresh race condition fix, the React batching analysis, the ZeekrPower widget architecture — all stored, all searchable, none of it needs to be re-derived.

Where This Leaves Things

The honest summary: Claude Code on a $20/month plan is a genuinely useful tool for mobile development, but it requires deliberate setup to stay that way. The plugins (context-mode, caveman, claude-mem, Codex delegation) are what keep it inside the budget. The CLAUDE.md rules are what keep it focused. The independent code reviewer is what catches the things you rationalize away.

The main thing Karpathy gets right is that the bottleneck shifts — not to generating code, but to knowing what to ask for and whether what came back is correct. That judgment doesn't get automated. It just gets more important.
