Beyond Vector Search: Hybrid Retrieval with Neo4j and Pinecone for Multi-Hop Reasoning
Vector search is great until you ask it a question with a “via” in it.
“What proposals did Maria send to customers that her team manages, before Q4?”
Pure semantic retrieval will return documents about Maria, customers, proposals, and Q4, and then the LLM has to figure out the relationships, often badly. The model doesn’t know that “her team” is a graph traversal: (Maria:User)<-[:REPORTS_TO]-(:User)-[:MANAGES]->(:Customer). It just sees a bag of chunks.
For the AI-native knowledge platform we’re building at EG-Labs, this kind of multi-hop, relational query is the common case, not the edge case. So we ended up with a hybrid retrieval layer: Neo4j for entities and relationships, Pinecone for unstructured semantic content, and a planner that decides which one to ask first.
This post is what I’d tell a friend who’s about to build this.
When pure vector search isn’t enough
Vector RAG works beautifully when:
- The corpus is mostly documents
- Questions are answered by finding 1–3 chunks
- “Topical similarity” approximates relevance
It struggles when:
- The corpus is about people, accounts, deals, projects — not just documents
- Questions involve relationships (“who reports to X”, “deals that touched team Y”)
- The same noun phrase resolves to multiple entities and you need disambiguation
- “Recency” or “ownership” or “membership” is part of the answer, not just the search
If you find yourself stuffing structured metadata into chunk text just so the embedder picks it up — [Owner: Maria, Customer: Acme, Stage: Q4-2025] — you’re already trying to encode a graph in vector space. It’s a bad bet at scale.
Architecture
We split storage by content shape:
┌──────────────────────────────────────────────┐
│ Ingestion (Nango + custom connectors)        │
│ Slack · Google Drive · GitHub · CRM · ...    │
└─────────────┬──────────────┬─────────────────┘
              │              │
 (entities &  │              │ (long-form text:
  relations)  │              │  docs, threads,
              │              │  comments)
              ▼              ▼
       ┌──────────────┐ ┌──────────────┐
       │    Neo4j     │ │   Pinecone   │
       │ (graph core) │ │ (vector core)│
       └──────┬───────┘ └──────┬───────┘
              └─────┬──────────┘
                    ▼
         ┌──────────────────────┐
         │  Retrieval planner   │
         │  (graph-first /      │
         │   vector-first /     │
         │   hybrid)            │
         └──────────┬───────────┘
                    ▼
            ┌────────────────┐
            │   Answer LLM   │
            └────────────────┘
Two stores, one router, one answer model.
Modeling: what goes where
Neo4j — entities and the relationships between them
For each integration we ingest, we extract a small set of canonical entity types and the relationships between them:
// Nodes
(:User {id, email, name, slack_id, gdrive_id}) // external ids flattened: Neo4j properties cannot hold nested maps
(:Customer {id, name, segment})
(:Deal {id, stage, amount, close_date})
(:Document {id, source, title, url, modified_at, pinecone_id})
(:Channel {id, name})
(:Project {id, name})
// Relationships
(:User)-[:REPORTS_TO]->(:User)
(:User)-[:MANAGES]->(:Customer)
(:User)-[:OWNS]->(:Deal)
(:User)-[:AUTHORED]->(:Document)
(:User)-[:MEMBER_OF]->(:Channel)
(:Document)-[:MENTIONS]->(:User|Customer|Deal)
(:Document)-[:IN]->(:Channel)
(:Deal)-[:WITH]->(:Customer)
Crucially: documents are nodes too. We store the document’s metadata in Neo4j (author, source, timestamps, mentions) but the body lives in Pinecone. The graph node holds a pointer (pinecone_id) so we can pivot.
Pinecone — long-form semantic content
The unstructured side is conventional. We chunk the body of each document, embed with a mid-sized model, and store with rich metadata:
metadata = {
    "doc_id": doc.id,               # joins back to Neo4j
    "source": doc.source,           # "slack", "gdrive", ...
    "author_id": author_neo4j_id,   # joins back to Neo4j
    "modified_at": ts,              # numeric (e.g. epoch seconds) so range filters work
    "mentions": neo4j_ids,          # list of id strings; also in Neo4j as relations
    "permissions": permissions,     # list of strings; ACL filter at query time
}
The doc_id and author_id are the bridge. After a vector search returns chunks, we lift each chunk back into the graph by its doc_id and traverse from there.
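A minimal sketch of that lift step, assuming each vector match is a dict with a metadata field that carries doc_id. The helper name, the match shape, and the RETURN columns are illustrative, not our production code; the labels and relationship types are the ones from the schema above.

```python
from typing import Any

# Hypothetical enrichment query: one row per document with author and channel.
LIFT_QUERY = """
MATCH (d:Document) WHERE d.id IN $doc_ids
OPTIONAL MATCH (a:User)-[:AUTHORED]->(d)
OPTIONAL MATCH (d)-[:IN]->(c:Channel)
RETURN d.id AS doc_id, d.title AS title, a.name AS author, c.name AS channel
"""


def build_lift_request(matches: list[dict[str, Any]]) -> tuple[str, dict[str, list[str]]]:
    """Collect unique doc_ids from vector matches, keeping rank order."""
    doc_ids: list[str] = []
    seen: set[str] = set()
    for match in matches:
        doc_id = match.get("metadata", {}).get("doc_id")
        if doc_id is not None and doc_id not in seen:
            seen.add(doc_id)
            doc_ids.append(doc_id)
    # Values travel as query parameters, never as string-interpolated Cypher.
    return LIFT_QUERY, {"doc_ids": doc_ids}
```

Several chunks usually come from the same document, so deduplicating before the graph call keeps it to a single round trip per query rather than one per chunk.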
Retrieval: pick a starting point, then traverse
The hard part isn’t the storage — it’s deciding where to start a query. We have a small classifier (currently prompt-based, will likely become a fine-tuned model) that sorts incoming questions into three retrieval shapes:
1. Graph-first
The question is about specific entities or relationships:
“What deals is Maria’s team working on?”
The plan:
MATCH (m:User {name: "Maria"})<-[:REPORTS_TO]-(report:User)
MATCH (report)-[:OWNS]->(d:Deal)
WHERE d.stage <> "closed_lost"
RETURN d.id, d.stage, d.amount, d.close_date, report.name
Then for each Deal.id we pull the most recent associated documents from Pinecone (filtered by mentions containing the deal’s id) for context. The graph gives us the answer skeleton; vector search fills in the flesh.
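One wrinkle worth spelling out: a vector query ranks by similarity, not by date, so "most recent" has to be enforced on the client (or with a date filter). A rough sketch, assuming a hypothetical search(filter, top_k) wrapper around the vector store and chunks shaped like the metadata above; the over-fetch factor is an arbitrary choice.

```python
from typing import Any, Callable

Chunk = dict[str, Any]


def attach_context(
    deals: list[dict[str, Any]],
    search: Callable[[dict[str, Any], int], list[Chunk]],
    per_deal: int = 3,
) -> list[dict[str, Any]]:
    """For each deal row from the graph, attach its most recent chunks."""
    enriched = []
    for deal in deals:
        # Match chunks whose `mentions` list contains this deal's id.
        mention_filter = {"mentions": {"$in": [deal["id"]]}}
        # Over-fetch, then sort by recency on the client.
        candidates = search(mention_filter, per_deal * 4)
        newest_first = sorted(
            candidates,
            key=lambda chunk: chunk["metadata"]["modified_at"],
            reverse=True,
        )
        enriched.append({**deal, "context": newest_first[:per_deal]})
    return enriched
```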
2. Vector-first
The question is about content, not entities:
“What are people complaining about in the onboarding flow?”
Standard semantic search over Pinecone, then we enrich the results via the graph — for each top-k chunk, we look up its doc_id in Neo4j and pull author, channel, related entities. The LLM answers using the chunks but cites using the graph metadata.
3. Hybrid (the interesting one)
The question is both:
“Summarize what the data team has been saying about the migration this quarter.”
Two parallel queries:
- Graph: find the data team (User -[:MEMBER_OF]-> Channel {name: 'data-team'}), get the user ids, scope to the last 90 days
- Vector: semantic search for “migration” within Pinecone, filtered by the user ids returned from the graph
The vector search has its candidate space narrowed by graph membership, which means we don’t need a big top-k — top_k = 10 against ~50 candidates is far better than top_k = 50 against the whole index. Latency drops, recall on the right documents goes up.
This is the pattern most people miss when first building hybrid: the graph is a prefilter for the vector search, not just a separate fetch.
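In code, the prefilter is just a metadata filter built from the graph result. The sketch below assumes the chunk metadata from earlier (author_id, modified_at as epoch seconds, permissions) and uses Pinecone's documented filter operators ($and, $in, $gte); the function name and the choice to raise on empty input are mine.

```python
from datetime import datetime, timedelta, timezone
from typing import Any


def build_prefilter(
    user_ids: list[str],
    audiences: list[str],
    window_days: int,
    now: datetime,
) -> dict[str, Any]:
    """Turn graph output into a metadata filter for the vector query."""
    if not user_ids or not audiences:
        # Refuse to silently widen the search to the whole index.
        raise ValueError("need at least one user id and one audience")
    cutoff = int((now - timedelta(days=window_days)).timestamp())
    return {
        "$and": [
            {"author_id": {"$in": user_ids}},      # graph membership
            {"modified_at": {"$gte": cutoff}},     # time window
            {"permissions": {"$in": audiences}},   # ACL, always applied
        ]
    }


now = datetime(2025, 10, 1, tzinfo=timezone.utc)
prefilter = build_prefilter(["u1", "u2"], ["team:data"], window_days=90, now=now)
# `prefilter` would go in the `filter` argument of the vector query,
# next to the embedding of "migration" and a small top_k.
```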
The planner
We tried two approaches before settling on the third.
Attempt 1: LLM-as-router with tool calling. Give the model query_graph(cypher) and query_vectors(query, filters) tools and let it choose. Worked, but: latency was bad (the routing call alone added 800ms), the model occasionally invented Cypher that didn’t match our schema, and debugging “why did it route this query that way?” was hell.
Attempt 2: Pure heuristic classifier. Bag-of-keywords (who, what, when → graph-first; “summarize”, “explain” → vector-first). Cheap, fast, dumb. Worked for 60% of queries.
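To make "dumb" concrete, here is a reconstruction of that kind of classifier. The keyword sets are the ones quoted above; the precedence order and the default are assumptions.

```python
import re

GRAPH_WORDS = {"who", "what", "when"}
VECTOR_WORDS = {"summarize", "explain"}


def classify(question: str) -> str:
    """Route on keyword hits alone; no notion of entities or hybrid shapes."""
    tokens = set(re.findall(r"[a-z]+", question.lower()))
    if tokens & VECTOR_WORDS:
        return "vector-first"
    if tokens & GRAPH_WORDS:
        return "graph-first"
    return "vector-first"  # assumed default when nothing matches


# Routed as intended:
print(classify("What deals is Maria's team working on?"))
# Misrouted: this is a content question, but "what" sends it graph-first.
print(classify("What are people complaining about in the onboarding flow?"))
```

The second call is the failure mode in miniature: the question word says nothing about whether the answer lives in relationships or in text, and a keyword router can never emit a hybrid plan.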
Attempt 3 (current): Constrained-output classifier with hard fallbacks. A small model (Haiku-class) emits a JSON plan:
{
"shape": "hybrid",
"graph": {
"anchor_entities": ["data-team"],
"anchor_types": ["Channel", "User"],
"time_window_days": 90
},
"vector": {
"query_text": "migration",
"filter_by_graph": true
}
}
If the model produces a plan that fails schema validation, we fall back to vector-only with no graph filter. Boring works.
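The validate-or-fall-back step might look something like this. It is a hand-rolled check rather than a real JSON Schema validator, and the shape names other than "hybrid" are assumed from the diagram labels.

```python
import json
from typing import Any

SHAPES = {"graph-first", "vector-first", "hybrid"}


def vector_only(question: str) -> dict[str, Any]:
    """The boring fallback: plain semantic search, no graph filter."""
    return {
        "shape": "vector-first",
        "vector": {"query_text": question, "filter_by_graph": False},
    }


def _check(condition: bool) -> None:
    if not condition:
        raise ValueError("plan failed validation")


def parse_plan(raw: str, question: str) -> dict[str, Any]:
    """Return the planner's plan if it validates, else the fallback."""
    try:
        plan = json.loads(raw)
        _check(isinstance(plan, dict) and plan.get("shape") in SHAPES)
        if plan["shape"] != "vector-first":
            graph = plan.get("graph")
            _check(isinstance(graph, dict))
            anchors = graph.get("anchor_entities")
            _check(isinstance(anchors, list) and len(anchors) > 0)
            _check(all(isinstance(anchor, str) for anchor in anchors))
        if plan["shape"] != "graph-first":
            vector = plan.get("vector")
            _check(isinstance(vector, dict))
            _check(isinstance(vector.get("query_text"), str))
            _check(isinstance(vector.get("filter_by_graph"), bool))
    except (ValueError, TypeError):
        # Covers malformed JSON (JSONDecodeError is a ValueError) and bad types.
        return vector_only(question)
    return plan
```

The fallback still goes through the normal permission filter at query time; "no graph filter" only means the candidate space is not narrowed by graph membership.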
Things I learned the hard way
1. Permissions belong in both stores. We initially put ACLs only in Pinecone metadata. Then someone asked a graph-first question and we returned data they shouldn’t have seen. Now every Neo4j relationship and every Pinecone chunk carries the same audience set, and queries always filter on it. The duplication is annoying; the alternative is a security incident.
2. Don’t model the world. Model the answer. Our first schema had 30 entity types and 60 relationships because we tried to be faithful to “the domain”. Recall on graph queries was bad because the planner didn’t know which of three near-synonym relationships to pick. We collapsed to 8 types / 15 relationships and recall jumped. Graph schemas should be coarse enough that the planner can hit them reliably — not fine-grained ontologies for their own sake.
3. Eval graph retrieval separately from vector retrieval. We had a vector eval set early. We didn’t have a graph eval set until we shipped a regression that degraded multi-hop answers by 40% and only noticed because a customer flagged it. Now: 100 hand-labeled “answer-supporting subgraph” cases, run on every schema or query change.
4. Cypher generation is brittle. Cypher templates are not. Letting the LLM generate Cypher from scratch is asking for trouble. We expose ~12 parameterized Cypher templates (think: entity_neighbors, time_scoped_authored, team_membership, path_between) and the planner only fills in slots. Coverage is good enough; safety is much better.
5. Update the graph incrementally. Re-ingesting from scratch is expensive and error-prone. Each connector emits change events; we apply them as MERGE operations with last_seen timestamps and run a sweep nightly to expire stale relations. Pinecone updates are similarly delta-driven.
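The slot-filling idea from lesson 4 could be sketched like this. The template names are two of the ones mentioned above; the Cypher bodies and slot names are illustrative guesses against the schema shown earlier, not the real templates.

```python
from typing import Any

# name -> (parameterized Cypher, exact set of slots the planner must fill)
TEMPLATES: dict[str, tuple[str, frozenset[str]]] = {
    "team_membership": (
        "MATCH (u:User)-[:MEMBER_OF]->(:Channel {name: $channel_name}) "
        "RETURN u.id AS user_id",
        frozenset({"channel_name"}),
    ),
    "entity_neighbors": (
        "MATCH (n {id: $entity_id})-[r]-(m) "
        "RETURN type(r) AS rel, m.id AS neighbor_id LIMIT $limit",
        frozenset({"entity_id", "limit"}),
    ),
}


def fill(template: str, slots: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Validate the planner's slots and return (cypher, params) for the driver."""
    if template not in TEMPLATES:
        raise KeyError(f"unknown template: {template}")
    cypher, expected = TEMPLATES[template]
    if set(slots) != expected:
        raise ValueError(f"{template} expects slots {sorted(expected)}")
    # Slot values are passed as query parameters, so the model never
    # contributes a single character of Cypher.
    return cypher, dict(slots)


cypher, params = fill("team_membership", {"channel_name": "data-team"})
```

The planner's whole output surface is a template name plus a few values, which is also what makes routing decisions easy to log and replay.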
What I’d build next
A few things on the roadmap:
- Embedding-on-graph retrieval — embed entities themselves (their neighborhood as a subgraph summary) so we can do “find entities semantically like this one” without leaving Neo4j. Initial experiments look promising.
- Better disambiguation. “Maria” is currently resolved by exact match plus most-recent-interaction tiebreaker. A learned entity-resolution model would handle “the Maria from sales” vs. “the Maria from data” much more reliably.
- Streaming partial answers. When the graph plan returns 50 candidate deals and we’re enriching each with vector context, we could stream results to the user as they’re enriched instead of waiting for the full set. Latency is the silent UX killer.
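For the streaming idea, asyncio.as_completed already does most of the work. A rough sketch of what it might look like, where enrich stands in for the per-deal vector lookup and fake_enrich exists only to make the example runnable:

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable


async def stream_enriched(
    deal_ids: list[str],
    enrich: Callable[[str], Awaitable[dict[str, Any]]],
) -> AsyncIterator[dict[str, Any]]:
    """Yield each enriched deal as soon as its lookup finishes."""
    tasks = [asyncio.ensure_future(enrich(deal_id)) for deal_id in deal_ids]
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def fake_enrich(deal_id: str) -> dict[str, Any]:
    # Stand-in for a vector lookup; the delay just varies per deal.
    await asyncio.sleep(0.01 * int(deal_id[-1]))
    return {"deal_id": deal_id, "chunks": []}


async def main() -> list[str]:
    arrival_order = []
    async for item in stream_enriched(["d3", "d1", "d2"], fake_enrich):
        arrival_order.append(item["deal_id"])
    return arrival_order


arrival_order = asyncio.run(main())
```

Results arrive in completion order rather than request order, so the UI has to tolerate rows appearing out of sequence. A real version would also want to cancel outstanding tasks if the user navigates away.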
The takeaway
If your domain is fundamentally relational — people, accounts, deals, projects, organizations — pure vector RAG will paper over the structure for a while and then quietly stop working. Hybrid retrieval over a graph + vector store is more upfront work, but it pays for itself the first time someone asks a question with a “via” in it.
The architecture is mundane. The leverage is in the planner and the evals.