
If you’ve ever tried to find a specific Zerto answer by searching help.zerto.com — what’s actually new in 10.9, the right combination of permissions for a vSphere service account, the one KB article that explains why your VRA install is failing — you know the search box is doing its best but isn’t really pulling its weight. The Zerto docs are huge: ~21,000 pages spanning every product, platform, and version going back to the 9.x line. You usually have to know roughly where the answer lives before you can find it.
I got tired of doing that. So I built a Zerto Docs MCP server — well, Claude did. 🙂 It’s a hosted MCP server that any LLM client (Claude Desktop, Claude Code, Cursor, VS Code with Copilot) can call. You ask Claude something like “What’s new in cyber recovery VPGs in 10.9?” and it gets back a tight, citation-backed answer drawn from the actual Zerto docs.
Disclaimer up front
This is a personal side project. It is not a Zerto or HPE product, not supported by Zerto, and not endorsed by my employer. The corpus is 100% publicly available data scraped from help.zerto.com — exactly the same pages anyone can read in a browser. No internal docs, no support tickets, no customer environments, no roadmap. Don’t use this as a substitute for talking to Zerto support, and don’t take my MCP’s word for anything safety-critical without verifying against the source page (which is always linked in the response).
What it is
The Zerto Docs MCP server is a streamable-HTTP MCP — a small web service that speaks the Model Context Protocol so LLMs can call its tools. There are eleven of them, but the four you’ll actually use are:
- search_docs: semantic search across every Zerto doc page, with optional filters for version (10.9, 10.8, 9.7…) and platform (VMware vSphere, Hyper-V, AWS, Azure, vCD…). Returns the most relevant chunks with their source URLs so the LLM can cite.
- get_page: full markdown of a specific page when Claude needs more context than a chunk preview.
- diff_versions: unified diff of a topic between two versions. Useful for “what changed about X between 10.8 and 10.9”.
- bundle_changelog: bundle-level summary of added pages, removed pages, and changed pages with churn counts. Great for getting an overview before diving in.
There’s a separate set of tools for the Zerto Interoperability Matrix (interop_check, interop_versions, interop_platforms, interop_categories) — same idea, backed by the matrix data rather than prose docs. So you can ask things like “Is RHEL 9 supported on Hyper-V in 10.9?” and get an authoritative yes/no with a citation.
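For a feel of what a client sends under the hood, here is a minimal sketch of the JSON-RPC body behind an MCP tools/call request. The argument names query and platform are assumptions for illustration (only version appears elsewhere in this post), and a real MCP client also performs the initialization handshake and HTTP transport for you.

```python
import json


def build_tool_call(name, arguments, request_id=1):
    """Build the JSON-RPC 2.0 body for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# "query" and "platform" are assumed argument names, not confirmed by the server.
payload = build_tool_call(
    "search_docs",
    {"query": "cyber recovery VPGs", "version": "10.9", "platform": "VMware vSphere"},
)
print(json.dumps(payload, indent=2))
```

In practice you never hand-build this; the LLM client does it each time the model decides to call a tool.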
What it knows
- Every page from every bundle on help.zerto.com — Admin guides, Install guides, Release Notes, Swagger API references, the lifecycle matrix, and the full KB article archive (659 articles).
- The Interoperability Matrix, scraped weekly from the same source the iframe on help.zerto.com pulls from.
- Cross-version “topic clusters” — Zerto’s editor-curated mapping of “this page in 10.9 corresponds to that page in 10.8” — which is what makes the diff tools work.
It does not know about:
- Anything from inside HPE / Zerto (no internal docs, no support tickets, no presales decks, no roadmap)
- Anything from customer environments
- Anything posted to help.zerto.com after the last weekly refresh (the corpus refreshes on Monday every week)
How to use it
Public endpoint:
https://mcp.jpaul.io/metamcp/zerto-docs/mcp
It’s a streamable-HTTP MCP, no auth required. Add it to any MCP-aware client:
Claude Desktop
Settings → Developer → Edit Config, drop this into claude_desktop_config.json:
{
  "mcpServers": {
    "zerto-docs": {
      "type": "http",
      "url": "https://mcp.jpaul.io/metamcp/zerto-docs/mcp"
    }
  }
}
Restart Claude Desktop, start a fresh chat, and the tools will be available.
Claude Code
claude mcp add zerto-docs --transport http https://mcp.jpaul.io/metamcp/zerto-docs/mcp
Open a new claude session — mcp__zerto-docs__search_docs shows up in the tool list.
Cursor
Settings → MCP → Add new server. Paste the URL.
VS Code with GitHub Copilot
VS Code added MCP support relatively recently. Drop this into your settings.json:
{
  "github.copilot.advanced": {
    "mcp": {
      "servers": {
        "zerto-docs": {
          "url": "https://mcp.jpaul.io/metamcp/zerto-docs/mcp"
        }
      }
    }
  }
}
Once it’s wired up, try things like:
- “What’s new in Zerto 10.9?”
- “How do I configure SAML federation with Keycloak on the Linux ZVM?”
- “Is Microsoft Defender XDR integration supported in 10.9? Where can I learn more?”
- “Give me a changelog of Admin.VC.HTML.10.9 vs Admin.VC.HTML.10.8.”
- “Find me a KB article about Hyper-V VRA install troubleshooting.”
How it’s built (the short version)
I’ll skip the scraping part — the docs come from help.zerto.com’s public JSON API and we re-fetch every week. What’s actually interesting is the retrieval layer, because the first pass was meh and the second pass made it dramatically better.
Round 1: standard RAG
The textbook recipe. Chunk every page on heading boundaries (~600 tokens per chunk), embed each chunk with nomic-embed-text running on Ollama, store the vectors in Chroma. Query time: embed the user’s question with the same model, ask Chroma for the top-K cosine matches, return them with metadata.
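The recipe above can be sketched in plain Python. This is a toy illustration, not the project's code: the word budget stands in for the ~600-token limit, the vectors are hand-made rather than produced by nomic-embed-text, and a list of tuples stands in for Chroma.

```python
import math
import re


def chunk_on_headings(markdown, max_words=450):
    """Split at markdown heading lines, then cap each chunk at max_words.

    max_words is an assumed rough proxy for a token budget.
    Whitespace inside a chunk is collapsed to single spaces.
    """
    sections, current = [], []
    for line in markdown.splitlines():
        if re.match(r"#{1,6}\s", line) and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))

    chunks = []
    for section in sections:
        words = section.split()
        for start in range(0, len(words), max_words):
            chunks.append(" ".join(words[start:start + max_words]))
    return chunks


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=5):
    """indexed is a list of (chunk_id, vector); return the k closest ids."""
    ranked = sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]
```

The important property is that the query vector and every chunk vector are computed independently, which is exactly what causes the failure modes described next.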
This worked… fine. It found roughly the right neighborhood of pages for most queries. But it had three failure modes that showed up over and over:
- It returned five copies of the same wrong sub-topic. Asking “how do I run a failover test?” returned the “Examples of Queries for the Zerto AI Assistant” page across five different platform bundles before any of the actual Failover Test Operation pages appeared.
- It got dragged toward dense topical language. A query that exactly matched a KB article’s title would still rank middle-of-page chunks from long Admin docs higher — those chunks just had more “ZVM/recover/protect” words in close proximity, and dense embeddings rate dense topical overlap as more relevant than a short, precise title match.
- It buried KB articles entirely. Even though the entire 659-article archive was indexed, KBs almost never made the top-50 — they got crowded out by the longer vendor-doc chunks.
These aren’t bugs in the implementation. They’re a fundamental limit of bi-encoder (two-tower) embeddings: the query and the document are encoded separately, then compared. The model never gets to actually look at the two side by side.
Round 2: add a cross-encoder reranker
A cross-encoder is a different beast. It takes a (query, document) pair as a single input — both texts go through the same attention layers — and outputs one number representing “how relevant is this document to this query, considered jointly.” Much more accurate than the bi-encoder you use for dense retrieval. The catch: too slow to run over the whole index, so you only use it as a second pass over the top-N dense candidates.
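The two-stage shape is easy to sketch. In the snippet below, score_pair is a placeholder for the cross-encoder call, and toy_overlap_score is a deliberately crude stand-in (word overlap) so the example runs without a model. Both names are hypothetical.

```python
from typing import Callable, List, Tuple


def rerank(query: str,
           candidates: List[Tuple[str, str]],
           score_pair: Callable[[str, str], float],
           top_n: int = 5) -> List[str]:
    """Re-order first-pass candidates by a joint (query, document) score.

    candidates are (chunk_id, text) pairs in dense-retrieval order.
    The sort is stable, so ties keep their dense order.
    """
    scored = [(score_pair(query, text), chunk_id) for chunk_id, text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk_id for _, chunk_id in scored[:top_n]]


def toy_overlap_score(query: str, document: str) -> float:
    """Stand-in scorer: fraction of query words that appear in the document."""
    q = set(query.lower().split())
    d = set(document.lower().split())
    return len(q & d) / len(q) if q else 0.0
```

A real cross-encoder replaces toy_overlap_score with one model forward pass per pair, which is why it only runs over the top-N candidates and not the whole index.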
I built an A/B harness — 25 hand-curated Zerto queries across categories like factual lookup, cross-version, vague intent, and multi-doc synthesis — and measured dense-only vs. dense+rerank on Mean Reciprocal Rank, Recall@5, and nDCG@5. Two candidate rerankers, both running on llama.cpp:
| Retriever | MRR | Recall@5 | nDCG@5 |
|---|---|---|---|
| Dense only | 0.087 | 0.083 | 0.075 |
| Dense + bge-reranker-v2-m3 (568M params) | 0.267 (+0.18) | 0.300 (+0.22) | 0.233 (+0.16) |
| Dense + jina-reranker-v2-base (278M) | 0.342 (+0.25) | 0.296 (+0.21) | 0.280 (+0.20) |
That’s roughly a 3–4× improvement across the board. jina edged BGE on MRR and nDCG@5, the metrics that map most directly to user experience (MRR rewards putting the first right answer at or near position 1), landed at essentially the same Recall@5, and runs about 3× faster. So that’s what got deployed.
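For readers who want to reproduce this kind of comparison, here is a minimal sketch of the three metrics using their standard definitions with binary relevance. Whether the harness uses exactly these variants (for example graded relevance for nDCG) is an assumption.

```python
import math


def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant result, or 0.0 if none appears."""
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """runs is a list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def recall_at_k(ranked, relevant, k=5):
    """Fraction of the relevant documents that show up in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


def ndcg_at_k(ranked, relevant, k=5):
    """Binary-gain nDCG: discounted hits divided by the best possible score."""
    dcg = sum(1.0 / math.log2(pos + 1)
              for pos, doc_id in enumerate(ranked[:k], start=1)
              if doc_id in relevant)
    ideal = sum(1.0 / math.log2(pos + 1)
                for pos in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0
```

Each function scores one query; averaging over the labeled queries gives the numbers in a table like the one above.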
What it looks like in practice — same query, both retrievers:
Q: What’s new in Zerto 10.9?
Dense only:
1. ZertoAVSModule 3.9.x: a note about a deprecated Azure module
2. VRA0061: alarm-code page, body: “Deprecated in Zerto 10.0_U5 and later.”
3. VRA0062: same alarm
4–5. more alarm-code pages
With the reranker:
1. Release Notes for Zerto 10.9 > What's New > General (AI Assistant, MCP, Recovery Plans, Defender XDR, Vault integration…)
2. Release Notes for Zerto 10.9 > What's New > Analytics
3. Release Notes for Zerto 10.9 > What's New > AVS
4–5. more release-notes sections
Both retrievers returned five pages. The dense ones shared keyword overlap with “new” / “10.9” because Zerto deprecates a bunch of alarms in every release, and alarm-code pages are short enough that the overlap dominates the embedding. The reranker, reading the query and each candidate jointly, immediately recognized the actual Release Notes sections were the right answer. Same embedding model, same Chroma index, same network path. Just one extra HTTP hop to a cross-encoder running on a 1080 Ti.
The reranker itself runs as a sidecar container — official llama.cpp Docker image, jina-reranker-v2-base-multilingual Q8 GGUF, pinned to one GPU. The MCP container posts the top-200 dense candidates to it for every query and gets back ranked scores. End-to-end latency is roughly a second, dominated by the embed step (which the docs at help.zerto.com don’t really help me speed up).
What’s next
A few things on the list. None of them block useful usage of what’s already live.
- Filter-aware queries for KB articles. KB articles often apply to many Zerto versions (8.0; 8.5; 9.0; …) and multiple source/target hypervisor combos. Right now search_docs(version="10.9") doesn’t match KB articles even when they apply to 10.9, because our metadata schema stores version as a single scalar. Fixable, just hasn’t been done yet.
- Hybrid retrieval. Adding BM25 + dense fusion would help the long tail of queries that mention exact error codes, file paths, or API endpoint names, the kind of thing dense embeddings tend to fuzz over.
- A weekly diff digest. The corpus refresh already detects per-bundle changes. A simple summary of “what Zerto edited this week” would actually be useful for anyone keeping up with the docs.
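On the first item, one possible shape for the fix is to parse the semicolon-separated version string into a set and test membership at filter time. This is only a sketch: the versions metadata key is hypothetical, the sample string is made up, and how a multi-valued field would actually be stored in the index is left open.

```python
def parse_versions(raw):
    """Turn a string such as '8.0; 8.5; 9.0' into a set of version labels."""
    return {part.strip() for part in raw.split(";") if part.strip()}


def matches_version(chunk_meta, wanted):
    """True when no filter is given or the chunk applies to the wanted version.

    'versions' is a hypothetical multi-valued replacement for the scalar field.
    """
    if wanted is None:
        return True
    return wanted in chunk_meta.get("versions", set())


# Made-up KB metadata for illustration only.
kb_meta = {"versions": parse_versions("9.0; 10.8; 10.9")}
print(matches_version(kb_meta, "10.9"))
```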
If you find a query where it returns the wrong thing — or just want to compare what it says against what you expected — drop a note in the comments or reach out. The whole project is a personal sandbox and feedback is genuinely useful.
Update (2026-05-20): Curated gotchas for scripting against the Zerto API
The original retrieval pipeline does well on prose questions — “what’s new in 10.9”, “how do I configure SAML”, that sort of thing. Where it falls down is on hard-won knowledge that isn’t in the docs because nobody bothered to write it on help.zerto.com. The swagger says auth uses an implicit OAuth flow. The actual working code uses Keycloak password grant. The swagger says applyUpgrade.platform is nullable. The server rejects empty. There are dozens of these.
I’ve been collecting them over the years building Zerto integrations — the kind of thing where you’ve already wasted an afternoon before you figure out which detail Zerto forgot to document. Last week I had Claude help me distill them into one ~500-line gotchas doc.
That doc is now baked into the MCP. Three delivery mechanisms because no single one catches every case:
A dedicated zerto_api_lessons tool. Call with no args to get the full doc, or topic="authentication" to filter to a single section. The tool description tells the LLM to call it proactively whenever the user asks for help writing a script, integrating with the Zerto REST API, or working with the swagger — and Claude actually does. Tool descriptions are first-class context in MCP, which is one of the protocol’s better design decisions.
Indexed in the corpus. The lessons doc lives in the same Chroma index as the help.zerto.com pages, so search_docs will surface specific sections via dense + reranker for tangential queries — e.g. “why is my Zerto token expiring after 60 seconds” pulls the auth section even though the query never mentioned “API”.
Auto-hint banner in search_docs results. When the query matches script/API trigger words (script, swagger, automation, oauth, keycloak, bearer, powershell, curl, etc.), a one-line nudge appears at the top of the results pointing at the dedicated tool. Safety net for the case where the LLM doesn’t think to call it proactively.
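The banner logic can be as small as a set intersection. The sketch below assumes whole-word matching and an invented hint string; the real trigger list is longer (the "etc." above) and the real wording may differ.

```python
import re

# Subset of trigger words taken from the list above.
TRIGGER_WORDS = {
    "script", "swagger", "automation", "oauth",
    "keycloak", "bearer", "powershell", "curl",
}

# Hypothetical wording for the one-line nudge.
HINT = "Hint: for scripting or API work, also call the zerto_api_lessons tool."


def maybe_add_hint(query, results_text):
    """Prepend the hint when the query contains any trigger word."""
    words = set(re.findall(r"[a-z0-9_]+", query.lower()))
    if words & TRIGGER_WORDS:
        return HINT + "\n\n" + results_text
    return results_text
```

Whole-word matching means "scripts" would not trigger on "script"; a production version would probably want stemming or prefix matching.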
You don’t need to do anything to pick this up — same MCP endpoint, same client config. Just ask Claude to help you with a Zerto script and the gotchas come along automatically. Brings the tool list to twelve.
A few representative things the doc covers:
- ZCA and ZVM expose the same API. Don’t write two clients.
- Every long-running operation returns 202 Accepted + an id; you poll GET /v1/tasks/{id}. Six terminal completion states, one of which (FailedWithRollbackFailure) means “manual intervention required”, so surface it loudly.
- Auth is Keycloak password grant against /auth/realms/zerto/protocol/openid-connect/token. Not the implicit flow the swagger advertises. Not client_credentials.
- applyUpgrade.platform must be exactly "aws", "azure", or "vmware". Swagger says nullable. Server says no.
- PlatformInformation is null on AWS ZCAs. Detect via SiteType instead.
- Port 9669 was 9.7-and-earlier. 10.x uses 443.
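The task-polling item above lends itself to a small helper. This is a sketch under assumptions: get_state is a caller-supplied function that would wrap GET /v1/tasks/{id} and return the task's state as a string, and the caller passes in the terminal state names, since only FailedWithRollbackFailure is named here.

```python
import time


def wait_for_task(get_state, terminal_states,
                  loud_states=frozenset({"FailedWithRollbackFailure"}),
                  interval_s=2.0, timeout_s=600.0, sleep=time.sleep):
    """Poll until the task reaches a terminal state and return that state.

    States in loud_states raise immediately so they cannot be missed.
    interval_s and timeout_s are arbitrary defaults, not Zerto guidance.
    """
    waited = 0.0
    while True:
        state = get_state()
        if state in loud_states:
            raise RuntimeError(f"Task ended in {state}: manual intervention required")
        if state in terminal_states:
            return state
        if waited >= timeout_s:
            raise TimeoutError(f"Task still in state {state} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Injecting get_state and sleep keeps the loop testable without a live ZVM. The state names you pass in should come from the API itself, not from this sketch.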
If you’ve got your own war stories from integrating against Zerto, I’d genuinely love to add them — the whole point of this is to capture lived experience that doesn’t otherwise get written down. The full doc is in the repo at corpus/curated-api-lessons/zerto-api-knowledge.md.
Update (2026-05-20, later that day): Hybrid retrieval (BM25 + dense + RRF)
Closer to publication time I hit the limit of dense-only candidate retrieval. Specific technical queries (“python script to create a VPG on AWS”, “curl example for a failover test in ZIC”, “powershell example to monitor a Zerto task”) buried the matching example scripts at dense rank #1000+, even though the chunks contained every query term literally. The reranker couldn’t help, because it only re-scores the top-200 dense candidates and never saw them.
This is a known failure mode of bi-encoder dense embeddings: when a query has rare technical tokens (filenames, language names, error codes) and the broader corpus contains thousands of prose pages that semantically match the query’s topic, the prose wins on sheer mass. The Python script for “create a VPG” can’t out-vector the much larger corpus of Admin guide chunks talking about creating VPGs in flowing English.
The fix is the standard “next thing after rerank” in modern RAG: hybrid retrieval. Add a BM25 (lexical) index alongside the dense one, run both queries in parallel, fuse the ranked lists via Reciprocal Rank Fusion, then hand the merged pool to the existing reranker.
I used SQLite FTS5 for BM25 — built into the stdlib, on-disk like Chroma, builds 73,000 chunks in three seconds. Fusion is the textbook RRF formula. The reranker stays unchanged.
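For reference, Reciprocal Rank Fusion fits in a few lines: each document scores the sum of 1 / (k + rank) over every list it appears in. The sketch below uses k=60, the constant commonly used with RRF; whether this project uses the same value is an assumption.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60, top_n=None):
    """Fuse several ranked id lists with Reciprocal Rank Fusion.

    Ranks start at 1. A document missing from a list simply gets no
    contribution from that list.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    fused = sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
    return fused if top_n is None else fused[:top_n]


# Toy ids: the script chunk is low in the dense list but first in BM25.
dense = ["admin1", "admin2", "script"]
bm25 = ["script", "admin1"]
print(rrf_fuse([dense, bm25]))
```

Because RRF only looks at ranks, it needs no score normalization between BM25 and cosine similarity. The fused pool then goes to the reranker, which now gets a chance to see the lexical matches.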
Headline numbers from the same A/B harness as before (20 labeled queries):
| Retriever | MRR | Recall@5 | nDCG@5 |
|---|---|---|---|
| dense (original) | 0.087 | 0.083 | 0.075 |
| dense + reranker | 0.342 | 0.296 | 0.280 |
| BM25 alone | 0.325 | 0.321 | 0.290 |
| hybrid (BM25 + dense + RRF) | 0.260 | 0.308 | 0.225 |
Counterintuitively, pure BM25 outperforms hybrid in this bench because the dense contribution dilutes BM25’s high-confidence wins. In production where the cross-encoder reranker scores the merged pool, hybrid+rerank tops them all — the queries that motivated this change now surface their correct scripts at top-1.
Gated behind HYBRID_SEARCH=true env so dev and local stay on the original path, with a one-env-var kill switch if anything misbehaves. There’s a longer writeup of the failure modes, the eval numbers, and the curated-knowledge layer (KB articles + API lessons + working code examples) coming as a Part 2 — this addendum is just to keep the published post accurate.