
GraphRAG Visualizer: Knowledge Graph-Enhanced RAG for Document Analysis
Building a visual interface for exploring GraphRAG-indexed document collections with fast NLP-based graph extraction
Introduction
GraphRAG Visualizer is a project for visualizing and exploring knowledge graphs extracted from document collections using Microsoft GraphRAG. The project combines:
- GraphRAG Indexing Pipeline for extracting entities, relationships, and communities
- GraphRAG API for local and global search queries
- GraphRAG Visualizer for interactive exploration of the knowledge graph
While traditional RAG systems (Retrieval-Augmented Generation) rely on simple vector search, GraphRAG goes a step further: it extracts structured knowledge graphs from documents, enabling deeper semantic connections and better answers to complex questions.
Graph structure of an entity with its relationships
Problem Statement: Why GraphRAG?
Limitations of Traditional RAG
Classic RAG systems operate on a simple principle:
- Chunking: Documents are divided into small text sections
- Embedding: Each chunk is converted into a vector
- Retrieval: Upon a query, the semantically most similar chunks are retrieved
- Generation: An LLM generates an answer based on the retrieved chunks
The Problem: This method fails with questions that require global knowledge across the entire document corpus.
Example:
"What are the main themes in these 100 research papers?"
A traditional RAG system would only retrieve a few semantically similar chunks – but the question requires a synthesis across all documents.
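The retrieval step described above can be reduced to a cosine-similarity top-k lookup. The sketch below is my own minimal illustration of that idea (the type and function names are mine, not from any RAG library):

```typescript
// Minimal sketch of classic RAG retrieval: rank chunks by cosine similarity.
type Chunk = { text: string; embedding: number[] };

const cosine = (a: number[], b: number[]): number => {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
};

// Return the k chunks most similar to the query embedding.
const retrieve = (query: number[], chunks: Chunk[], k: number): Chunk[] =>
  [...chunks]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
```

The weakness is visible right in the code: `retrieve` only ever sees the k nearest chunks, so any answer that requires synthesizing the whole corpus is out of reach.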
GraphRAG's Solution
GraphRAG addresses this limitation through:
- Knowledge Graph Extraction: Entities and relationships are extracted from the text
- Community Detection: Related entities are grouped into thematic clusters
- Hierarchical Summarization: Summaries are generated for each community
- Global Search: Queries can be answered using all community reports
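Global Search works map-reduce style over the community reports. The sketch below illustrates the shape of that pipeline; a toy keyword-overlap score stands in for the LLM "map" call that GraphRAG actually makes, and all names are mine:

```typescript
// Sketch of GraphRAG's global search: a map step rates each community
// report's relevance to the query, a reduce step synthesizes the top hits.
// A keyword-overlap score stands in here for the real LLM "map" call.
type Report = { title: string; summary: string };

const score = (query: string, report: Report): number => {
  const words = new Set(query.toLowerCase().split(/\W+/));
  return report.summary
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => words.has(w)).length;
};

const globalSearch = (query: string, reports: Report[], k = 3): Report[] =>
  reports
    .map((r) => ({ r, s: score(query, r) }))
    .filter((x) => x.s > 0)
    .sort((a, b) => b.s - a.s)
    .slice(0, k)
    .map((x) => x.r); // in GraphRAG these summaries feed a final LLM call
```

Because every community report gets scored, the answer can draw on the entire corpus rather than a handful of nearest-neighbor chunks.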
Comparison: GraphRAG vs. traditional RAG
Architecture & Technology Stack
System Overview
My GraphRAG Architecture
Technologies Used
| Component | Technology | Description |
|---|---|---|
| Indexing | Microsoft GraphRAG | Knowledge Graph Extraction Pipeline |
| LLM | OpenAI GPT-4o-mini | Community Report Generation |
| Embedding | OpenAI text-embedding-3-small | Query Embedding (for Local/Global Search) |
| API | graphrag-api | FastAPI Backend for Search Queries |
| Frontend | graphrag-visualizer | React-based Visualization |
| Graph Rendering | react-force-graph | 2D/3D Force-Directed Graph |
GraphRAG Indexing: Standard vs. Fast Method
GraphRAG offers two indexing methods with different trade-offs:
Standard Method (graphrag index)
The standard method uses an LLM for all reasoning tasks:
- Entity Extraction: LLM extracts named entities with descriptions
- Relationship Extraction: LLM describes relationships between entity pairs
- Entity/Relationship Summarization: LLM summarizes all instances
- Community Report Generation: LLM generates summaries for each community
Pros:
- High-quality, semantically rich descriptions
- Better graph quality for exploration
Cons:
- High LLM costs (~75% of indexing costs)
- Slow processing
Fast Method (graphrag index --method fast)
The fast method replaces LLM reasoning with classic NLP techniques:
- Entity Extraction: Noun phrases are extracted using NLTK/spaCy (no descriptions)
- Relationship Extraction: Relationships are based on text-unit co-occurrence
- No Summarization: skipped, since the NLP-extracted entities carry no descriptions to summarize
- Community Report Generation: Only this step still uses the LLM
Pros:
- Significantly lower costs
- Faster processing
Cons:
- Less semantically rich descriptions
- "Noisier" graph
My Configuration: Fast Method with OpenAI
For this project, I chose the Fast Method to minimize costs and enable fast iterations:
# LLM settings
models:
  default_chat_model:
    type: openai_chat
    api_base: https://api.openai.com/v1
    model: gpt-4o-mini
    api_key: ${OPEN_AI_KEY}
    model_supports_json: true
    concurrent_requests: 3
    async_mode: threaded
    retry_strategy: native
    max_retries: 2
    tokens_per_minute: 100000
    requests_per_minute: 200
    completion_params:
      temperature: 0.0
      max_tokens: 1536
    encoding_model: cl100k_base
  default_embedding_model:
    type: openai_embedding
    api_base: https://api.openai.com/v1
    model: text-embedding-3-small
    api_key: ${OPEN_AI_KEY}
    concurrent_requests: 3
    async_mode: threaded

# Input settings
input:
  type: file
  file_type: text
  base_dir: "input"

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id]

# Workflow settings
embed_text:
  enabled: true

extract_graph_nlp:
  text_analyzer:
    extractor_type: regex_english  # Fast NLP extraction

cluster_graph:
  max_cluster_size: 10

community_reports:
  model_id: default_chat_model
  graph_prompt: "prompts/community_report_graph.txt"
  text_prompt: "prompts/community_report_text.txt"
  max_length: 2000
  max_input_length: 8000
Important Configuration Points:
- embed_text: enabled: true – the LanceDB vector store is always created by default, even if this is disabled
- extract_graph_nlp.extractor_type: regex_english – uses regex-based noun phrase extraction instead of the LLM
- community_reports – the only step that uses the LLM
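The chunks settings above (size 1200, overlap 100) describe a sliding window over the input. The sketch below shows the idea; GraphRAG chunks by tokens, so words stand in for tokens here, and the function is my own illustration:

```typescript
// Sketch of the sliding-window chunking behind `chunks: size/overlap`.
// GraphRAG operates on tokens; words stand in for tokens here.
const chunk = (tokens: string[], size: number, overlap: number): string[][] => {
  const step = size - overlap; // each window starts `size - overlap` after the last
  const out: string[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    out.push(tokens.slice(start, start + size));
    if (start + size >= tokens.length) break; // last window reached the end
  }
  return out;
};
```

The overlap matters for the fast method in particular: an entity mention near a chunk boundary lands in two windows, which preserves co-occurrence relationships that a hard cut would split.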
Indexing Pipeline in Detail
Data Flow
Cost Estimation (Fast Method)
For 2 text files (~100 KB):
| Step | Token Usage | Cost (gpt-4o-mini) |
|---|---|---|
| NLP Steps | 0 | $0.00 |
| Community Reports | ~40-70k Input, ~5-10k Output | ~$0.01-0.03 |
| Total | ~45-80k | ~$0.02 |
For comparison: The Standard Method would cost about $0.20-0.50 for the same corpus.
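As a back-of-envelope check of the table above, assuming gpt-4o-mini pricing of $0.15 per 1M input tokens and $0.60 per 1M output tokens (rates at the time of writing; verify current pricing before relying on this):

```typescript
// Cost estimate from token counts, using assumed gpt-4o-mini rates:
// $0.15 / 1M input tokens, $0.60 / 1M output tokens.
const costUSD = (inputTokens: number, outputTokens: number): number =>
  (inputTokens / 1e6) * 0.15 + (outputTokens / 1e6) * 0.6;

// Upper bound from the table: 70k input + 10k output
// costUSD(70_000, 10_000) ≈ $0.0165, consistent with the ~$0.02 estimate
```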
Output: Parquet Files
After successful indexing, the following Parquet files are generated:
| File | Content | Required for Visualizer |
|---|---|---|
| entities.parquet | Extracted entities (noun phrases) | ✓ Required |
| relationships.parquet | Relationships between entities | ✓ Required |
| documents.parquet | Input document metadata | Optional |
| text_units.parquet | Text chunks with entity references | Optional |
| communities.parquet | Community cluster assignments | Optional |
| community_reports.parquet | LLM-generated community summaries | Optional |
GraphRAG Visualizer
Features
The visualizer offers three main views:
1. Graph Visualization
Graph Visualization View
Features:
- 2D/3D rendering with react-force-graph
- Node coloring by type (Entity, Community, Document, etc.)
- Interactive hover highlighting
- Zoom, Pan, and Focus navigation
- Optional labels for nodes and links
2. Search Interface
Search View
Search Types:
- Local Search: Finds relevant entities and their context
- Global Search: Synthesizes answers from all community reports
Example Query:
"Find me all relationships between Trump and Twitter. Analyse them overall."
3. Data Tables
Data Tables View
Enables exploration of:
- Entities
- Relationships
- Documents
- Text Units
- Communities
- Community Reports
Critical Code Components
Graph Data Processing (useGraphData.ts)
The hook transforms Parquet data into a graph format for react-force-graph:
const useGraphData = (
  entities: Entity[],
  relationships: Relationship[],
  documents: Document[],
  textunits: TextUnit[],
  communities: Community[],
  communityReports: CommunityReport[],
  covariates: Covariate[],
  includeDocuments: boolean,
  includeTextUnits: boolean,
  includeCommunities: boolean,
  includeCovariates: boolean
) => {
  const [graphData, setGraphData] = useState<CustomGraphData>({
    nodes: [],
    links: [],
  });

  useEffect(() => {
    // Entity nodes
    const nodes: CustomNode[] = entities.map((entity) => ({
      uuid: entity.id,
      id: entity.title,
      name: entity.title,
      type: entity.type,
      description: entity.description,
      text_unit_ids: entity.text_unit_ids,
      neighbors: [],
      links: [],
    }));

    const nodesMap: { [key: string]: CustomNode } = {};
    nodes.forEach((node) => (nodesMap[node.id] = node));

    // Relationship links (drop links whose endpoints are not known entities)
    const links: CustomLink[] = relationships
      .map((relationship) => ({
        source: relationship.source,
        target: relationship.target,
        type: relationship.type,
        weight: relationship.weight,
        description: relationship.description,
      }))
      .filter((link) => nodesMap[link.source] && nodesMap[link.target]);

    // Build neighbor references for hover highlighting
    links.forEach((link) => {
      const sourceNode = nodesMap[link.source];
      const targetNode = nodesMap[link.target];
      if (sourceNode && targetNode) {
        sourceNode.neighbors!.push(targetNode);
        targetNode.neighbors!.push(sourceNode);
        sourceNode.links!.push(link);
        targetNode.links!.push(link);
      }
    });

    setGraphData({ nodes, links });
  }, [entities, relationships /* ... */]);

  return graphData;
};
Parquet File Reading (parquet-utils.ts)
Parquet files are read client-side with hyparquet:
export const readParquetFile = async (
  file: File | Blob,
  schema?: string
): Promise<any[]> => {
  const arrayBuffer = await file.arrayBuffer();
  return new Promise((resolve, reject) => {
    const options: ParquetReadOptions = {
      // hyparquet accepts any { byteLength, slice } source;
      // a plain ArrayBuffer satisfies its AsyncBuffer interface
      file: arrayBuffer,
      rowFormat: "object",
      onComplete: (rows: Record<string, any>[]) => {
        if (schema === "entity") {
          resolve(
            rows.map((row) => ({
              id: row["id"],
              human_readable_id: parseValue(row["human_readable_id"], "number"),
              title: row["title"],
              type: row["type"],
              description: row["description"],
              text_unit_ids: row["text_unit_ids"],
            }))
          );
        }
        // ... other schemas
      },
    };
    parquetRead(options).catch(reject);
  });
};
Graph Visualization (GraphViewer.tsx)
The graph is rendered with react-force-graph-2d:
<ForceGraph2D
  ref={graphRef}
  graphData={graphData}
  nodeAutoColorBy="type"
  nodeRelSize={NODE_R}
  autoPauseRedraw={false}
  linkWidth={(link) => (showHighlight && highlightLinks.has(link) ? 5 : 1)}
  linkDirectionalParticles={showHighlight ? 4 : 0}
  nodeCanvasObjectMode={(node) =>
    showHighlight && highlightNodes.has(node)
      ? "before"
      : showLabels
      ? "after"
      : undefined
  }
  nodeCanvasObject={(node, ctx) => {
    if (showHighlight && highlightNodes.has(node)) {
      paintRing(node as CustomNode, ctx);
    }
    if (showLabels) {
      renderNodeLabel(node as CustomNode, ctx);
    }
  }}
  onNodeHover={showHighlight ? handleNodeHover : undefined}
  onNodeClick={handleNodeClick}
  backgroundColor={getBackgroundColor()}
/>
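The `paintRing` helper referenced in `nodeCanvasObject` is not shown above; a possible implementation looks like the sketch below (my own version, drawn against a minimal canvas interface so the snippet stays self-contained; the real component may differ):

```typescript
// Hypothetical sketch of the `paintRing` helper: draws a highlight disc
// behind a hovered node. `Ctx` is a minimal subset of the canvas 2D API.
interface Ctx {
  fillStyle: string;
  beginPath(): void;
  arc(x: number, y: number, r: number, a0: number, a1: number): void;
  fill(): void;
}

const NODE_R = 6; // matches nodeRelSize in the component above

const paintRing = (node: { x?: number; y?: number }, ctx: Ctx): void => {
  if (node.x === undefined || node.y === undefined) return; // not laid out yet
  ctx.beginPath();
  ctx.arc(node.x, node.y, NODE_R * 1.4, 0, 2 * Math.PI); // slightly larger than the node
  ctx.fillStyle = "orange";
  ctx.fill();
};
```

Because `nodeCanvasObjectMode` returns "before" for highlighted nodes, this disc is painted first and the node's regular circle lands on top of it, producing the ring effect.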
Project Setup
Prerequisites
- Python 3.10+
- Node.js 18+
- OpenAI API Key
1. GraphRAG Indexing
# Clone and Setup
cd graphrag-api
python -m venv venv
.\venv\Scripts\activate
pip install graphrag
# Set Environment Variable
set OPEN_AI_KEY=sk-your-key-here
# Start Indexing (Fast Method)
cd ragtest
graphrag index --method fast
2. GraphRAG API Server
cd graphrag-api
pip install -r requirements.txt
uvicorn api:app --reload
3. Visualizer Frontend
cd graphrag-visualizer
npm install
npm start
Open http://localhost:3000, upload the Parquet files from ragtest/output/, and explore your knowledge graph!
Results & Observations
Graph Statistics (Example-Twitter)
| Metric | Value |
|---|---|
| Input Documents | 2 Text Files |
| Total Size | 100 KB |
| Extracted Entities | 336 |
| Extracted Relationships | 11483 |
| Communities | 73 |
| Indexing Time (Fast) | 5 |
| Estimated Costs | ~$0.02 |
Performance Observations
Fast Method:
- NLP Steps (1-8) are very fast (seconds to minutes)
- create_community_reports_text is the most time-consuming step (~75% of total time)
- LLM costs are incurred only for Community Reports
Visualizer Performance:
- Graph rendering can become slow with >1000 nodes
- 2D rendering is significantly faster than 3D
- Labels should be disabled for many nodes
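The rule of thumb above can be encoded as a simple guard in the viewer (a sketch of my own; the threshold is a rough value taken from the observations, not a measured constant):

```typescript
// Adaptive guard: honor the user's label toggle only while the graph is
// small enough to render labels smoothly.
const LABEL_LIMIT = 1000; // rough threshold from the performance observations

const shouldShowLabels = (nodeCount: number, userWantsLabels: boolean): boolean =>
  userWantsLabels && nodeCount <= LABEL_LIMIT;
```

Wired into the component, `showLabels={shouldShowLabels(graphData.nodes.length, labelsToggle)}` would keep large graphs responsive without removing the toggle for small ones.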
Conclusion & Lessons Learned
Key Takeaways
- Fast Method is a good compromise – significantly cheaper than the Standard Method, yet sufficient for exploration and prototyping
- Community Reports are the core value – the LLM-generated summaries enable Global Search and a deeper understanding of the corpus
- Visualization makes the difference – the graph reveals connections that remain hidden in text form
- Configuration is crucial – chunk size, max_cluster_size, and prompt engineering all influence result quality
Areas for Improvement
- Personalized Labels: Better display names instead of technical IDs
- Adaptive Graph Rendering: Clustering or aggregation for large graphs
- Real-time Indexing: Incremental updates instead of full re-indexing
References & Resources
Repositories
- Microsoft GraphRAG - Official GraphRAG Repository
- GraphRAG Documentation - Official Documentation
- graphrag-api - FastAPI Backend
- graphrag-visualizer - React Frontend
Further Reading
- GraphRAG: Unlocking LLM Discovery on Narrative Private Data - Microsoft Research Blog
- GraphRAG Explained: Enhancing RAG with Knowledge Graphs - Medium Article
- From Local to Global: A Graph RAG Approach - Original Paper
Technologies & Tools
- OpenAI API - GPT-4o-mini & text-embedding-3-small
- React - Frontend Framework
- react-force-graph - Graph Visualization
- Material-UI - UI Components
- hyparquet - Client-side Parquet Reader
Keywords: GraphRAG, Knowledge Graphs, RAG, Retrieval-Augmented Generation, NLP, OpenAI, React, Data Visualization, Microsoft Research