Knowledge Graphs as Memory: Why Your AI Agent Needs to Think in Relationships
RAG gave agents access to documents. Knowledge graphs might give them something more valuable: understanding.
In this series on AI-era engineering practices, we’ve explored how to specify, test, and continuously evaluate AI systems. We’ve talked about the shift from vibe coding to agentic engineering. But there’s a foundational problem we haven’t addressed directly: memory.
AI agents are forgetful. Not occasionally, but fundamentally. They operate with volatile context windows, truncated histories, and almost no persistent understanding of the user’s evolving world. Every conversation starts from scratch. Every task lacks the context of what came before.
This isn’t a bug we can prompt our way around. It’s an architectural limitation that knowledge graphs might finally solve.
What Does “Knowledge Graph as Memory” Actually Mean?
Let me be concrete about what we’re discussing, because “knowledge graph” gets thrown around loosely.
Sir Tim Berners-Lee invented the web on the conviction that knowledge is represented as much by the relationships between facts as by the facts themselves, if not more so. He envisioned a collaborative system in which adding these links, or relationships, between sources (nodes) would be easy.
A knowledge graph represents information as a network of entities (nodes) and the relationships between them (edges). Instead of storing facts as isolated text chunks, e.g. “Alice works at Acme Pty Ltd” and “Acme Pty Ltd is in Cape Town”, a knowledge graph captures the underlying structure:
(Alice)-[:WORKS_AT]->(Acme Pty Ltd)-[:LOCATED_IN]->(Cape Town)
This structure enables something that vector databases fundamentally cannot: traversal. You can ask “Where does Alice work?” and get Acme Pty Ltd. You can then ask, “Where is that?” and traverse to Cape Town. The relationships are explicit, queryable, and composable.
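The traversal idea is simple enough to sketch in a few lines of Python. The graph here is a plain dict standing in for a real graph database, with the entity and relationship names taken from the example above:

```python
# A minimal sketch of explicit relationship traversal. A plain dict
# stands in for a real graph store; names follow the Alice example.
graph = {
    "Alice": {"WORKS_AT": ["Acme Pty Ltd"]},
    "Acme Pty Ltd": {"LOCATED_IN": ["Cape Town"]},
}

def traverse(entity, *relations):
    """Follow a chain of relationship types from a starting entity."""
    frontier = [entity]
    for rel in relations:
        frontier = [target for node in frontier
                    for target in graph.get(node, {}).get(rel, [])]
    return frontier

traverse("Alice", "WORKS_AT")                # ['Acme Pty Ltd']
traverse("Alice", "WORKS_AT", "LOCATED_IN")  # ['Cape Town']
```

Each hop is an explicit lookup, not a similarity search: composing two hops answers the "Where is that?" follow-up directly.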
When we talk about knowledge graphs as agent memory, we’re talking about storing not just what the agent learns, but how different pieces of information relate to each other, and how those relationships change over time.
Traditional agent architectures treat memory as a collection of text snippets retrieved by semantic similarity. You ask a question, the system finds chunks that seem relevant based on embedding similarity, and the LLM tries to synthesise them into an answer.
Knowledge graph memory inverts this. Instead of finding similar text, you traverse explicit relationships. Instead of hoping the LLM can infer connections, you store them directly.
Why RAG Isn’t Enough
Retrieval-Augmented Generation (RAG) has been transformational. It grounds LLM responses in real data, reduces hallucinations, and enables access to information beyond the model’s training cutoff. We’ve built RAG systems, and they work.
But RAG has fundamental limitations that become painfully apparent in agentic applications:
Single-hop retrieval. Traditional RAG finds chunks similar to your query. It struggles when answering requires connecting dots across multiple documents or understanding how facts relate to each other. "What did Alice say about the roadmap before the product launch?" requires temporal reasoning across multiple interactions, something vector similarity alone can't provide.
No relationship awareness. RAG treats knowledge as disconnected text fragments. It can’t natively understand that Alice manages Bob, Bob wrote the report, and the report affects Project X. These connections exist implicitly in documents, but the retrieval system can’t traverse them.
Semantic similarity isn’t semantic understanding. Finding text that’s “similar” to a query is different from understanding what the query means. RAG might retrieve text mentioning both Product A and Product B without knowing whether they’re competitors, complements, or entirely unrelated.
Stale context. RAG systems typically require reprocessing entire document collections when data changes. For agents that need to learn continuously from interactions, this creates an impractical maintenance burden.
The research backs this up. GNN-RAG (Mavromatis & Karypis, 2024) demonstrated that graph neural retrieval outperforms competing approaches by 8.9–15.5% on answer F1 for multi-hop and multi-entity questions across the WebQSP and CWQ benchmarks. HopRAG (Liu et al., 2025; ACL Findings) showed that graph-structured retrieval with logical traversal achieves over 36% higher answer accuracy and 21% improved retrieval F1 compared to dense vector retrievers on the MuSiQue, 2WikiMultiHopQA, and HotpotQA benchmarks. And SG-RAG (2025) found that subgraph retrieval from knowledge graphs outperformed traditional RAG by a statistically significant margin across 1-hop, 2-hop, and 3-hop questions using both Llama-3 and GPT-4 Turbo. For complex queries requiring synthesis across multiple sources, the gap widens further.
This doesn’t mean RAG is useless; far from it. But for agents that need to reason about relationships, track temporal changes, and maintain coherent understanding across sessions, we need something more.
The Knowledge Graph Advantage
So what do knowledge graphs actually provide that vector databases don’t?
1. Multi-Hop Reasoning
Consider this query: “Which employees who worked on Project Alpha also contributed to the security audit that flagged the issues Bob mentioned in last week’s standup?”
Answering this requires traversing multiple relationships:
Employees → worked on → Project Alpha
Employees → contributed to → Security Audit
Security Audit → flagged → Issues
Bob → mentioned → Issues
Issues → discussed in → Standup
Standup → occurred in → Last Week
A knowledge graph can traverse this path directly. RAG would need to retrieve relevant chunks for each entity and hope the LLM can piece together the connections, often unsuccessfully.
2. Temporal Awareness
Facts change. Alice worked at Acme Pty Ltd. Now she works at BigTech. A naive memory system might return conflicting information. A temporally-aware knowledge graph tracks validity periods:
(Alice)-[:WORKS_AT {valid_from: "2023-01", valid_to: "2025-06"}]->(Acme Pty Ltd)
(Alice)-[:WORKS_AT {valid_from: "2025-06"}]->(BigTech)
Now you can ask “Where did Alice work in 2024?” or “How has Alice’s employment changed?” and get accurate, time-aware answers. This bi-temporal model (tracking both when events occurred and when they were recorded) is essential for agents that need to understand how the world evolves.
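A point-in-time query over validity intervals can be sketched in Python. The edges below mirror the WORKS_AT example above; string comparison suffices because zero-padded YYYY-MM dates sort lexicographically:

```python
from dataclasses import dataclass

@dataclass
class Edge:
    source: str
    rel: str
    target: str
    valid_from: str            # "YYYY-MM"
    valid_to: str = "9999-12"  # open-ended if the fact is still valid

edges = [
    Edge("Alice", "WORKS_AT", "Acme Pty Ltd", "2023-01", "2025-06"),
    Edge("Alice", "WORKS_AT", "BigTech", "2025-06"),
]

def facts_at(entity, rel, as_of):
    """Return targets of a relationship that were valid at a point in time."""
    return [e.target for e in edges
            if e.source == entity and e.rel == rel
            and e.valid_from <= as_of < e.valid_to]

facts_at("Alice", "WORKS_AT", "2024-03")  # ['Acme Pty Ltd']
facts_at("Alice", "WORKS_AT", "2025-09")  # ['BigTech']
```

A full bi-temporal model would add a second pair of timestamps for when each edge was recorded; this sketch covers only validity time.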
3. Explainability
When an agent makes a recommendation, you want to know why. Knowledge graphs provide explicit reasoning paths that can be audited. “I recommended this because Alice → manages → Bob, Bob → wrote → this report, and this report → relates to → your question.” The chain of reasoning is transparent, not a black-box embedding similarity score.
4. Incremental Updates
Unlike RAG systems that often require reindexing entire collections, knowledge graphs can be updated incrementally. New facts are added, relationships are modified, and the graph evolves without expensive recomputation. For agents learning from ongoing interactions, this is essential.
The Landscape: Projects and Frameworks
The tooling for knowledge graph agent memory has matured significantly. Here’s what’s worth knowing:
Graphiti / Zep
Graphiti is Zep’s open-source framework for building temporally-aware knowledge graphs. It’s specifically designed for agent memory, handling chat histories, structured data, and unstructured text in a unified graph.
What makes Graphiti interesting is its bi-temporal model, which tracks both when events occurred and when they were ingested. Every relationship includes validity intervals, enabling powerful historical queries. The system achieves sub-300ms retrieval latency by combining semantic embeddings, keyword search, and graph traversal, avoiding LLM calls during retrieval.
In benchmarks, Zep (powered by Graphiti) outperformed MemGPT on the Deep Memory Retrieval benchmark (94.8% vs 93.4%) while reducing response latency by 90% compared to full-context approaches.
Mem0
Mem0 takes a different approach: providing a memory layer that works with or without graph capabilities. The base system uses vector storage for simplicity, with an optional graph layer (Mem0ᵍ) for relationship-aware applications.
The research shows Mem0 achieves 26% higher accuracy than OpenAI’s memory system on the LOCOMO benchmark, with 91% lower latency and 90% token savings compared to full-context approaches. The graph-enhanced variant adds about 2% to accuracy while maintaining reasonable latency.
Mem0 supports Neo4j, Memgraph, and Amazon Neptune as graph backends, with straightforward integration into existing LLM workflows.
Neo4j MCP Servers
Multiple projects now provide Model Context Protocol (MCP) servers for Neo4j integration with AI assistants:
mcp-neo4j-agent-memory - Specialised for memory operations with 10 purpose-built tools
mcp-neo4j-memory-server - Focuses on maintaining context across conversations with semantic search
These enable AI assistants like Claude to build and query persistent knowledge graphs directly, storing facts as nodes and creating semantic relationships between them.
Microsoft GraphRAG / LightRAG
Microsoft’s GraphRAG approaches the problem from a document-analysis angle: automatically extracting knowledge graphs from text corpora using LLMs, then using the graph structure to enhance retrieval.
GraphRAG excels at "global" queries that require understanding the semantic structure of an entire corpus, outperforming naive RAG by 70–80% on comprehensiveness and diversity metrics. The tradeoff is cost, as initial graph construction can be expensive.
LightRAG offers a faster, cheaper alternative with easier incremental updates. For teams wanting stepwise improvement over pure vector RAG without the full complexity, it’s worth evaluating.
When Knowledge Graphs Make Sense (And When They Don’t)
Knowledge graphs aren’t universally superior to vector databases. The right choice depends on your use case.
Knowledge graphs excel when:
Queries require multi-hop reasoning across relationships
Temporal reasoning matters (what changed, when, how)
Explainability and audit trails are important
Your domain has clear entity types and relationship patterns
Agents need to learn and adapt over long interactions
Vector RAG is often sufficient when:
Queries are straightforward, factual lookups
Relationships between data points don’t matter much
You need fast deployment with minimal infrastructure
Your data is highly unstructured without clear entities
Occasional inaccuracy is acceptable
The hybrid approach is increasingly common: Use vector retrieval to cast a wide net for semantic relevance, then use graph structures to refine results and reason over relationships. Many production systems combine both, getting breadth from vectors and precision from graphs.
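The hybrid pattern above can be sketched in a few lines; the similarity scores and the one-hop neighbourhood are stand-ins for a real embedding index and graph database, and the document names are illustrative:

```python
# Hybrid retrieval sketch: vector similarity supplies breadth,
# graph structure supplies precision. Scores and the neighbourhood
# set below are stand-ins for real vector and graph stores.
vector_scores = {            # candidate document -> similarity to query
    "Quarterly Report": 0.91,
    "Holiday Policy": 0.88,  # similar wording, but unrelated entity
    "Project X Memo": 0.85,
}

related_to_alice = {"Quarterly Report", "Project X Memo"}  # 1 hop from Alice

def hybrid_retrieve(scores, related, k=2):
    """Rank by similarity, but keep only graph-connected candidates."""
    connected = {doc: s for doc, s in scores.items() if doc in related}
    return sorted(connected, key=connected.get, reverse=True)[:k]

hybrid_retrieve(vector_scores, related_to_alice)
# ['Quarterly Report', 'Project X Memo']
```

The high-scoring but unrelated "Holiday Policy" is filtered out by the graph constraint, which is exactly the precision the vector stage lacks.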
The Challenges
Let me be honest about the difficulties. Knowledge graphs aren’t a free lunch.
Schema design complexity. Designing an effective graph schema requires a deep understanding of your domain. What are the entity types? What relationships matter? Get this wrong, and you’ll have a graph that doesn’t support the queries you actually need.
Construction overhead. Extracting entities and relationships from unstructured data is non-trivial. LLM-based extraction has improved dramatically, but it adds latency and cost to ingestion pipelines.
Maintenance burden. Graphs require ongoing schema governance and entity resolution. As your domain evolves, the graph needs to evolve too. This isn’t set-and-forget infrastructure.
Query performance at scale. Complex graph traversals can be slow on large graphs, especially with multi-hop queries. Index design and query optimisation matter.
Learning curve. Graph query languages like Cypher require a different way of thinking than SQL or simple API calls. Your team needs to build new skills.
These challenges are real but manageable. The question is whether your use case justifies the investment.
My Experiment: Personal Context with Neo4j + Google ADK
I’ve been experimenting with a personal AI agent that uses Neo4j as its memory layer. The goal: create an agent that understands my personal context; not just individual data points, but the relationships between life events, health, productivity, goals, and outcomes over time.
Here’s what I learned from actually implementing this.
The Architecture
The system uses a multi-agent architecture with Google’s Agent Development Kit (ADK):
Orchestrator Agent
├── User Context Agent (Neo4j MCP Tools)
└── Personal Insights Agent (Data Analysis Tools)
The orchestrator handles conversation flow. At the start of each session, it asks the context agent to retrieve stored context. After each response, it asks the context agent to store anything worth remembering. The insights agent focuses on analysing the data—it doesn’t need to know how memory works.
Critically, the context agent uses the official Neo4j MCP server (neo4j-mcp) via MCP tools. This was simpler than I expected:
# thrively_agent/agents/context_agent.py
import os

from google.adk.agents import LlmAgent
from google.adk.tools.mcp_tool import MCPToolset, StdioConnectionParams
from mcp import StdioServerParameters

context_agent = LlmAgent(
    name="user_context_expert",
    description="Manages user context in Neo4j knowledge graph.",
    model="gemini-2.0-flash",
    instruction=context_agent_instruction,
    tools=[
        MCPToolset(
            connection_params=StdioConnectionParams(
                server_params=StdioServerParameters(
                    command="neo4j-mcp",
                    env={
                        "NEO4J_URI": os.environ["NEO4J_URI"],
                        "NEO4J_USERNAME": os.environ["NEO4J_USERNAME"],
                        "NEO4J_PASSWORD": os.environ["NEO4J_PASSWORD"],
                    },
                ),
            ),
            tool_filter=["get-schema", "read-cypher", "write-cypher"],
        )
    ],
)
That’s it. The agent gets three tools: read-cypher, write-cypher, and get-schema. Everything else is just prompting.
Why Life Events Need Graphs
The insight that made this click: life events don’t exist in isolation. They have causes, effects, and relationships:
A job change → causes stress → impacts sleep quality
A pregnancy → relates to partner → influences wellness goals
An injury → prevents running → requires alternative exercise
Vector similarity search can find documents mentioning these things. But it can’t traverse the relationship: “What’s preventing me from reaching my running goal?” requires understanding that the injury → prevents → running goal path exists.
In Cypher, storing a life event with its impacts looks like:
MATCH (u:User {user_id: $user_id})

// Create the life event
MERGE (e:LifeEvent {name: "New Job at TechCorp"})
SET e.type = "job_change",
    e.description = "Started Senior Engineer role",
    e.date = date("2026-01-15"),
    e.impact = "mixed",
    e.status = "active"

// Link to user
MERGE (u)-[:HAS_EVENT]->(e)

// Create personal insight and link
MERGE (i:PersonalInsight {category: "sleep", name: "Sleep Quality Pattern"})
SET i.observation = "Sleep quality drops during high-stress periods"
MERGE (e)-[:IMPACTS]->(i)

// Create affected goal and link
MERGE (g:Goal {name: "Morning Run Goal"})
SET g.description = "Return to 40km/week running",
    g.status = "blocked"
MERGE (e)-[:PREVENTS]->(g)
Now, when I ask “Why am I not sleeping well?”, the agent can traverse: User → has_event → New Job → impacts → Sleep Quality Pattern. And when I ask “What’s blocking my running goal?”, it can find: New Job → prevents → Morning Run Goal.
The Prompt That Makes It Work
The context agent’s instruction is deliberately open-ended. I didn’t define a rigid schema; I let the agent decide what’s worth storing:
You are the User Context Expert.
<Context for my specific application and overarching agent goal...>
### Freedom to Create
You are NOT limited to any fixed schema. Create whatever node labels, relationship types, and properties you think best represent the user's context. Some ideas:
- LifeEvent, PersonalInsight, Preference, Goal, Achievement, Challenge, Person, Habit
- Relationships: IMPACTS, CAUSES, MOTIVATES, PREVENTS, RELATES_TO, KNOWS, etc.
- But feel free to invent new ones that better capture the meaning
### Guidelines
- Quality over quantity: only store genuinely meaningful information
- Include temporal context (dates) when mentioned or inferred
- Capture emotional context and impact when relevant
- Link related entities to each other, not just to the User node
This worked surprisingly well. The agent creates reasonable structures without needing exhaustive schema documentation. It invents relationship types like WORRIED_ABOUT or CELEBRATES when they fit the context better than generic RELATES_TO.
What Surprised Me
The schema-less approach works. I expected chaos. Instead, the agent creates coherent graphs. It reuses labels consistently and creates sensible relationships. There’s drift over time, but the graph remains queryable.
Retrieval is fast. Querying the user’s context graph takes 50-100ms. No vector embedding, no reranking, just a Cypher query that traverses relationships.
The agent decides what’s memorable. I don’t have explicit rules for “what to store.” The prompt says “only store genuinely meaningful information”, and the agent makes reasonable judgments. It stores life events and goals but skips routine queries about sleep scores.
Debugging is visual. Neo4j Browser lets me see exactly what the agent stored. When something goes wrong, I can visualise the graph and understand why.
What I’d Do Differently
More explicit temporal handling. The bi-temporal model from Graphiti (tracking both event time and ingestion time) would be valuable. My current approach stores dates but doesn’t systematically track validity periods.
Entity resolution prompts. The agent sometimes creates duplicate nodes for the same concept ("running goal" vs "40km weekly running"). A dedicated deduplication pass would help.
Structured extraction first. Currently, the insights agent writes its response, then delegates to the context agent to store relevant context. A cleaner approach: extract structured entities during the conversation, then batch-write at session end.
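The entity-resolution pass mentioned above could start as something this simple: before creating a node, check whether a close-enough name already exists and reuse it. The threshold and names are illustrative, not from the actual agent:

```python
import difflib

# Hypothetical deduplication pass: before MERGE-ing a new node,
# check whether an existing node name is close enough to reuse.
existing = ["Morning Run Goal", "Sleep Quality Pattern"]

def resolve(name, known, threshold=0.75):
    """Return the closest known entity name, or the new name if none match."""
    matches = difflib.get_close_matches(name, known, n=1, cutoff=threshold)
    return matches[0] if matches else name

resolve("Morning Running Goal", existing)  # reuses "Morning Run Goal"
resolve("New Job at TechCorp", existing)   # no match, kept as a new entity
```

String similarity alone will miss semantic duplicates ("40km weekly running" vs "Morning Run Goal"), so a production pass would likely combine this with embedding similarity or an LLM judgment.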
The Bottom Line
Neo4j + MCP gives agents relationship-aware memory with minimal infrastructure. The official neo4j-mcp server provides read-cypher and write-cypher tools that work directly with ADK agents. For local development, it’s a Homebrew install away (brew install neo4j-mcp). For production, Neo4j Aura’s free tier handles POC workloads.
Where This Is Heading
The direction seems clear: agent memory is moving from flat document retrieval toward structured, relationship-aware knowledge representation. Not because graphs are trendy, but because agents that reason about relationships outperform those that don’t.
Several trends reinforce this:
MCP adoption. The Model Context Protocol is becoming a standard for connecting agents to external systems, and graph databases are a natural fit. As MCP matures, expect tighter integration between AI assistants and knowledge graphs.
Temporal reasoning demands. As agents take on longer-running tasks and maintain relationships across sessions, temporal awareness becomes essential. You can’t manage a multi-week project with an agent that forgets yesterday.
Enterprise requirements. Audit trails, explainability, and compliance all benefit from graph-based memory where reasoning chains are explicit and queryable.
Hybrid architectures. The “vector vs. graph” debate is resolving into “vector AND graph.” Combining semantic search with structured knowledge retrieval gives agents both breadth and precision.
The agents that remember best will be the agents that reason best. And reasoning requires structure.
Start Here
If you’re building agents and haven’t explored graph-based memory:
Identify a use case where relationships matter. Not every agent needs a knowledge graph. Find one where multi-hop reasoning or temporal awareness would clearly improve outcomes.
Start with a managed service. Neo4j Aura, Amazon Neptune, or Zep Cloud let you experiment without infrastructure overhead. Build understanding before building infrastructure.
Try an MCP integration. If you’re using Claude or another MCP-compatible assistant, one of the Neo4j MCP servers can give you graph memory capabilities in an afternoon.
Evaluate against your actual queries. Build a small eval set of queries your agent should handle. Compare graph-based retrieval against pure vector retrieval. Let the numbers guide your decision.
Plan for iteration. Your first schema will be wrong. Build incrementally, learn from actual usage, and refine. Graph-based memory is a capability you grow into, not a switch you flip.
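The evaluation step above ("Evaluate against your actual queries") can be sketched as a toy harness. The two retrievers here are stand-in dict lookups, and the queries are illustrative; swap in your real graph and vector retrieval calls:

```python
# Toy eval harness: run the same query set through two retrievers
# and compare hit rates. Both retrievers are stand-ins.
eval_set = [
    ("Where does Alice work?", "Acme Pty Ltd"),
    ("What is blocking the running goal?", "New Job at TechCorp"),
]

def graph_retrieve(query):   # stand-in for a Cypher traversal
    return {"Where does Alice work?": ["Acme Pty Ltd"],
            "What is blocking the running goal?": ["New Job at TechCorp"],
            }.get(query, [])

def vector_retrieve(query):  # stand-in for an embedding search
    return {"Where does Alice work?": ["Acme Pty Ltd"]}.get(query, [])

def hit_rate(retrieve, cases):
    """Fraction of cases where the expected answer appears in the results."""
    return sum(expected in retrieve(q) for q, expected in cases) / len(cases)

hit_rate(graph_retrieve, eval_set)   # 1.0
hit_rate(vector_retrieve, eval_set)  # 0.5
```

Even a handful of real queries scored this way will tell you more than any benchmark table about whether graph memory earns its keep in your system.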
The tools are ready. The research supports it. The question is whether your agents need to understand relationships, and, if they do, how much longer they can afford to forget.
This post continues the AI-Era Engineering Practices series. Previous posts have covered Engineering Fundamentals, Writing User Stories for Uncertain Systems, Testing the Untestable, CI/CD/CE: The Third Pillar, and From Vibe Coding to Agentic Engineering.
Sources and Further Reading:
Graphiti (Zep) - GitHub - Open-source temporal knowledge graphs for agent memory
Zep: A Temporal Knowledge Graph Architecture for Agent Memory (arXiv) - Research paper with benchmark results
Mem0 - GitHub - Universal memory layer for AI agents
Mem0 Research: Building Production-Ready AI Agents (arXiv) - Benchmark comparisons
Microsoft GraphRAG - GitHub - Graph-based RAG from Microsoft Research
Neo4j Agent Memory MCP Server - MCP integration for Neo4j
Neo4j Blog: Graphiti Knowledge Graph Memory - Technical deep dive
AWS Blog: Persistent Memory with Mem0 and Neptune - Enterprise deployment patterns
GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning (arXiv) - Graph retrieval outperforms flat RAG by 8.9–15.5% on multi-hop KGQA
HopRAG: Multi-Hop Reasoning for Logic-Aware RAG (arXiv / ACL Findings 2025) - Graph-structured traversal achieves 36%+ gains over dense retrievers
SG-RAG: SubGraph RAG for Multi-Hop Question Answering (MDPI) - Subgraph retrieval outperforms traditional RAG across hop counts
Personalised Health Knowledge Graph (PubMed) - Healthcare application of personal knowledge graphs