Once a company adopts AI, the bill is often larger than expected: a small team can easily spend ¥10k+/month on official Claude, and ¥20k+ when mixing several models. Most of that cost is avoidable. This guide applies a five-layer methodology (platform, model, prompt, workflow, and team management) to cut costs to a fraction of the original.
Where the Money Goes
A typical spend breakdown:
- Official Claude: ¥10k+/month (daily coding + app calls for a small team)
- GPT-5.5: ¥5k+/month (dev/test + production inference)
- Multi-model mix: ¥20k+/month, with bills scattered across platforms and hard to control
The takeaway: this spend can be cut, and the most aggressive combination of the layers below saves 90%+.
Layer 1: Pick the Right Platform (~80% savings)
This is the single biggest cut. For the same model, official and relay prices can differ by an order of magnitude or more.
| Option | Price per 1M tokens (Sonnet) | Monthly cost (10M tokens) | Notes |
|---|---|---|---|
| Claude official | $3~15 | ¥657 | Billed in USD, with currency conversion loss |
| Zivv relay | ¥1.2 | ¥12 | Billed directly in CNY |
| Other relays | ¥3~5 | ¥30~50 | Inconsistent pricing and stability |
In this comparison, swapping official for Zivv alone brings the bill to about 2% of the original (¥12 vs ¥657), which makes it the highest-leverage, lowest-effort step.
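The arithmetic behind that comparison can be checked in a few lines. This is a minimal sketch that assumes a flat per-million-token price and takes the official monthly figure directly from the table; the function and variable names are illustrative only.

```python
# Prices and volume are copied from the comparison table above.
MONTHLY_TOKENS_M = 10  # millions of tokens per month


def monthly_cost(price_per_million: float, tokens_millions: float = MONTHLY_TOKENS_M) -> float:
    """Flat-rate monthly cost: unit price times volume."""
    return price_per_million * tokens_millions


zivv_monthly = monthly_cost(1.2)   # ¥1.2 per 1M tokens
official_monthly = 657.0           # ¥, taken directly from the table
ratio = zivv_monthly / official_monthly

print(f"Zivv: ¥{zivv_monthly:.0f}, share of official bill: {ratio:.1%}")
# prints: Zivv: ¥12, share of official bill: 1.8%
```

The same function reproduces the "Other relays" row: ¥3~5 per 1M tokens at 10M tokens gives ¥30~50.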
Layer 2: Pick the Right Model (60~80% savings)
Not every task needs the top model. Running formatting or classification on Opus is pure waste. Current model tiers:
| Model | Role | Use cases |
|---|---|---|
| Claude Haiku 4.5 | Fast & cheap | Classification, extraction, formatting, simple Q&A, high-frequency iteration |
| Claude Sonnet 4.6 | General workhorse | Daily coding, content generation, most business logic |
| Claude Opus 4.8 | Top-tier reasoning | Complex architecture, hard debugging, deep reasoning |
| GPT-5.5 | OpenAI workhorse | When you need the OpenAI ecosystem or specific capabilities |
Principle: default to Sonnet 4.6, downgrade simple work to Haiku 4.5, and reserve Opus 4.8 for genuinely complex tasks. Routing by scenario typically cuts another 60~80%.
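One way to apply this principle is a small routing table in front of every call. The sketch below is an assumption-laden illustration: the task categories are hypothetical labels derived from the tiers above, and the model strings are placeholders rather than real API model IDs.

```python
# Hypothetical task labels mapped to tiers; model names are placeholders,
# not actual API model identifiers.
CHEAP_MODEL = "haiku-4.5"
DEFAULT_MODEL = "sonnet-4.6"
TOP_MODEL = "opus-4.8"

ROUTES = {
    "classification": CHEAP_MODEL,
    "extraction": CHEAP_MODEL,
    "formatting": CHEAP_MODEL,
    "simple_qa": CHEAP_MODEL,
    "architecture": TOP_MODEL,
    "hard_debugging": TOP_MODEL,
    "deep_reasoning": TOP_MODEL,
}


def pick_model(task_type: str) -> str:
    """Return the tier for a task, falling back to the general workhorse."""
    return ROUTES.get(task_type, DEFAULT_MODEL)


print(pick_model("formatting"))     # haiku-4.5
print(pick_model("daily_coding"))   # sonnet-4.6 (not listed, so default)
print(pick_model("architecture"))   # opus-4.8
```

Defaulting unknown task types to the mid tier keeps the table short and means a missing entry costs a little extra rather than silently hitting the most expensive model.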
Layer 3: Optimize Prompts (30~50% savings)
Tokens are money, on both the input and the output side. Common sources of waste: dumping whole files into the prompt, repeating background context, and letting the model emit long filler.
Techniques:
- Cut redundant instructions and repeated context; provide only what's needed
- Use structured input (JSON / tables) instead of verbose natural language
- Explicitly request concise output, capping length or format
- Use few-shot examples instead of long rule descriptions; they are often shorter and more accurate
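The techniques above can be combined in a single prompt builder. The following is a rough sketch, not a benchmark: the example task, the prompts, and the 4-characters-per-token estimate are all assumptions made for illustration, and a real tokenizer should be used for anything billing-related.

```python
import json


def rough_tokens(text: str) -> int:
    """Very rough estimate (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


# A verbose prompt that restates rules in prose and invites a long answer.
verbose_prompt = (
    "You are a helpful assistant. Please carefully read the following customer message "
    "and think about which category it belongs to. The categories we use are billing, "
    "which covers invoices and payments; bug, which covers anything broken; and other, "
    "which covers everything else. Please explain your reasoning in detail. "
    "The customer message is: The invoice total is wrong."
)


def compact_prompt(message: str) -> str:
    """Structured input, two few-shot examples, and an explicit output cap."""
    payload = {
        "labels": ["billing", "bug", "other"],
        "examples": [
            {"text": "App crashes on login", "label": "bug"},
            {"text": "Charged twice this month", "label": "billing"},
        ],
        "text": message,
    }
    return "Classify. Reply with the label only.\n" + json.dumps(payload, separators=(",", ":"))


compact = compact_prompt("The invoice total is wrong.")
print(rough_tokens(verbose_prompt), "vs", rough_tokens(compact))
```

The compact version is shorter on input and, because it asks for the label only, also limits output tokens, which is where a request for detailed reasoning would otherwise add cost.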
Layer 4: Engineer the Workflow (20~40% savings)
Put engineering to work:
- Prompt Cache: route repeated system prompts and long context through cache for big discounts on hits
- Batch API: submit non-real-time tasks in batches at lower unit price
- Streaming: improve UX, and combine it with early termination so you stop paying for output nobody needs
- Retry & downgrade: retry failed calls automatically and fall back to a cheaper model when needed, instead of repeating the full call
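The retry-and-downgrade item in the list above can be sketched as a small wrapper. This is a minimal illustration under stated assumptions: `call_model` is a hypothetical callable standing in for whatever API client you use, the model names are placeholders, and backoff between retries is omitted for brevity.

```python
from typing import Callable, Optional, Sequence, Tuple

# Errors treated as transient in this sketch; a real client will have its own types.
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)


def call_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
    retries_per_model: int = 2,
) -> Tuple[str, str]:
    """Try each model in order, retrying transient failures before downgrading."""
    last_error: Optional[Exception] = None
    for model in models:
        for _ in range(retries_per_model):
            try:
                return model, call_model(model, prompt)
            except TRANSIENT_ERRORS as exc:
                last_error = exc  # remember the failure and try again
    raise RuntimeError("all models failed") from last_error


def flaky_call(model: str, prompt: str) -> str:
    """Stand-in for a real API client: the pricier model always times out."""
    if model == "sonnet-4.6":
        raise TimeoutError("simulated timeout")
    return f"[{model}] ok"


used_model, reply = call_with_fallback(
    "Summarize this ticket.", ["sonnet-4.6", "haiku-4.5"], flaky_call
)
print(used_model, reply)  # haiku-4.5 [haiku-4.5] ok
```

Ordering the list from preferred to cheaper means a persistent failure degrades quality slightly rather than triggering repeated calls at the higher price.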
Layer 5: Team Budget Management (rarely offered elsewhere)
The first four layers control per-call cost; the fifth controls runaway spend. That is exactly the value of Zivv Teams, a capability most relays and official APIs lack:
- Per-member keys and quotas: assign each member/project an independent key and budget cap, auto-blocking overruns
- Real-time usage dashboard: view consumption by member, model, and project, so it is obvious who is burning budget
- Unified top-up and billing: one team account, no more one-credit-card-per-person, no scattered bills
- Tiered permissions: admins control which models and quotas are available, so not everyone can call the priciest Opus
Many companies lose control not because unit prices are high, but because no one tracks who uses how much. Team mode closes that gap.
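To make the idea of per-member caps concrete, here is a minimal local sketch of quota enforcement. It is not Zivv Teams' implementation or API; the class, method, and exception names are hypothetical and only illustrate the "block on overrun" behavior described above.

```python
from dataclasses import dataclass, field
from typing import Dict


class BudgetExceeded(Exception):
    """Raised when a charge would push a member past their cap."""


@dataclass
class TeamBudget:
    caps: Dict[str, float]                                   # member -> monthly cap (¥)
    spent: Dict[str, float] = field(default_factory=dict)    # member -> spend so far (¥)

    def charge(self, member: str, cost: float) -> float:
        """Record a charge and return the remaining budget, or block the call."""
        used = self.spent.get(member, 0.0)
        cap = self.caps.get(member, 0.0)  # members without a cap get no budget
        if used + cost > cap:
            raise BudgetExceeded(f"{member} would exceed cap of ¥{cap:.2f}")
        self.spent[member] = used + cost
        return cap - self.spent[member]


budget = TeamBudget(caps={"alice": 100.0})
print(budget.charge("alice", 60.0))  # 40.0
try:
    budget.charge("alice", 50.0)     # 60 + 50 > 100, so this is blocked
except BudgetExceeded as exc:
    print("blocked:", exc)
```

Checking the cap before recording the charge means an overrun is rejected up front, rather than discovered on the next invoice.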
Worked Example: From ¥20,000 to ¥3,000
Starting point: 100-person startup, Claude Opus as the workhorse, 10M tokens/month, ¥20,000/month official bill.
| Step | Action | Monthly | vs Previous |
|---|---|---|---|
| Start | Official Opus | ¥20,000 | - |
| Step 1 | Switch to Zivv | ¥240 | ↓98% |
| Step 2 | Route by task | ¥150 | ↓38% |
| Step 3 | Prompt optimization | ¥120 | ↓20% |
| Step 4 | Batch + Cache | ¥30 | ↓75% |
| — | Add team management to prevent rebound | — | Prevents runaway |
Note: the table shows the optimization path at constant volume; absolute values from Step 2 onward grow with usage. After usage growth, real teams typically settle around ¥3,000/month, still only 15% of the original bill.
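The step-by-step reductions can be recomputed from the monthly figures in the table. The sketch below uses only numbers from the worked example; the results match the table's percentages to within a percentage point.

```python
# Monthly bills (¥) copied from the worked-example table.
steps = [
    ("Start", 20_000),
    ("Step 1", 240),
    ("Step 2", 150),
    ("Step 3", 120),
    ("Step 4", 30),
]

# Reduction of each step relative to the previous one.
drops = {}
for (_, prev), (name, cur) in zip(steps, steps[1:]):
    drops[name] = (prev - cur) / prev
    print(f"{name}: ¥{cur} (↓{drops[name]:.1%})")

# Share of the original bill after usage growth, per the note above.
steady_state_share = 3_000 / steps[0][1]
print(f"Steady state: {steady_state_share:.0%} of the original")
```

This prints reductions of 98.8%, 37.5%, 20.0%, and 75.0% for Steps 1 through 4, and confirms that ¥3,000 is 15% of the ¥20,000 starting bill.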
Action Plan
- Today: sign up for Zivv, switch one project over, and immediately see the Layer 1 drop
- This week: analyze your team's token consumption and find which tasks use top-tier models
- Next week: route by scenario (Haiku 4.5 / Sonnet 4.6 / Opus 4.8) and optimize your top 5 costliest prompts
- In two weeks: adopt Batch API and Prompt Cache; enable Teams, assign per-member keys and budgets to prevent cost rebound at the source
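For the "this week" step, a first pass at the analysis can be done on exported usage records. The sketch below assumes a hypothetical record shape (`member`, `task`, `model`, `tokens`) and made-up sample data; adapt the field names to whatever your platform's export actually provides.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical usage records; real exports will differ in shape and naming.
usage_log = [
    {"member": "alice", "task": "formatting", "model": "opus-4.8", "tokens": 400_000},
    {"member": "bob", "task": "architecture", "model": "opus-4.8", "tokens": 250_000},
    {"member": "alice", "task": "coding", "model": "sonnet-4.6", "tokens": 900_000},
    {"member": "carol", "task": "formatting", "model": "opus-4.8", "tokens": 150_000},
]


def tokens_by(records: Iterable[dict], key: str) -> Dict[str, int]:
    """Sum token counts grouped by one field (e.g. 'model' or 'member')."""
    totals: Dict[str, int] = defaultdict(int)
    for record in records:
        totals[record[key]] += record["tokens"]
    return dict(totals)


def top_tier_tasks(records: Iterable[dict], top_model: str = "opus-4.8") -> List[Tuple[str, int]]:
    """Tasks that run on the top-tier model, largest consumers first."""
    on_top = [r for r in records if r["model"] == top_model]
    return sorted(tokens_by(on_top, "task").items(), key=lambda kv: kv[1], reverse=True)


print(tokens_by(usage_log, "model"))
print(top_tier_tasks(usage_log))
```

In this sample, formatting is the largest consumer of top-tier tokens, which makes it the first candidate to move to a cheaper tier.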