GraphRAG Visualizer: Knowledge Graph-Enhanced RAG for Document Analysis
AI & Tech · 9 min read · October 10, 2025

Building a visual interface for exploring GraphRAG-indexed document collections with fast NLP-based graph extraction

GraphRAG · Knowledge Graphs · RAG · NLP · OpenAI · Data Visualization · React

Introduction

GraphRAG Visualizer is a project for visualizing and exploring knowledge graphs extracted from document collections using Microsoft GraphRAG. The project combines:

  • GraphRAG Indexing Pipeline for extracting entities, relationships, and communities
  • GraphRAG API for local and global search queries
  • GraphRAG Visualizer for interactive exploration of the knowledge graph

While traditional Retrieval-Augmented Generation (RAG) systems rely on simple vector search, GraphRAG goes a step further: it extracts structured knowledge graphs from documents, enabling deeper semantic connections and better answers to complex questions.

Graph structure of an entity with its relationships

Problem Statement: Why GraphRAG?

Limitations of Traditional RAG

Classic RAG systems operate on a simple principle:

  1. Chunking: Documents are divided into small text sections
  2. Embedding: Each chunk is converted into a vector
  3. Retrieval: Upon a query, the semantically most similar chunks are retrieved
  4. Generation: An LLM generates an answer based on the retrieved chunks
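The four steps above can be sketched end-to-end in a few lines. This is a toy illustration only: a bag-of-words count vector stands in for a real embedding model, and the chunker splits on words rather than tokenizer tokens.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def chunk(doc: str, size: int) -> list[str]:
    # Step 1: split the document into fixed-size word windows.
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Steps 2+3: embed query and chunks, return the k most similar chunks.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

corpus = "GraphRAG extracts knowledge graphs from documents. Vector search retrieves similar chunks only."
top_chunks = retrieve("knowledge graphs", chunk(corpus, size=6))
# Step 4 would pass top_chunks to an LLM as context for answer generation.
```

Because retrieval only ever returns the top-k chunks, any question whose answer is spread across the whole corpus falls outside what this loop can see.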

The Problem: This method fails with questions that require global knowledge across the entire document corpus.

Example:

"What are the main themes in these 100 research papers?"

A traditional RAG system would only retrieve a few semantically similar chunks – but the question requires a synthesis across all documents.

GraphRAG's Solution

GraphRAG addresses this limitation through:

  1. Knowledge Graph Extraction: Entities and relationships are extracted from the text
  2. Community Detection: Related entities are grouped into thematic clusters
  3. Hierarchical Summarization: Summaries are generated for each community
  4. Global Search: Queries can be answered using all community reports
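Steps 1 and 2 can be illustrated on a toy graph. GraphRAG itself uses Leiden clustering for community detection; plain connected components are used here as a deliberate simplification:

```python
from collections import defaultdict

# Toy entity-relationship graph (edges from step 1, knowledge graph extraction).
edges = [("Trump", "Twitter"), ("Twitter", "Musk"), ("NLTK", "spaCy")]

adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

def communities(adj):
    # Step 2, simplified: group entities by connected component
    # (GraphRAG uses Leiden clustering instead).
    seen, groups = set(), []
    for start in list(adj):
        if start in seen:
            continue
        group, stack = [], [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.append(node)
            stack.extend(adj[node] - seen)
        groups.append(sorted(group))
    return groups

clusters = communities(adj)
# Steps 3+4: each cluster would then be summarized by the LLM into a
# community report, and global search synthesizes answers across reports.
```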

Comparison diagrams: GraphRAG vs. traditional RAG

Architecture & Technology Stack

System Overview

My GraphRAG Architecture

Technologies Used

| Component | Technology | Description |
|---|---|---|
| Indexing | Microsoft GraphRAG | Knowledge graph extraction pipeline |
| LLM | OpenAI GPT-4o-mini | Community report generation |
| Embedding | OpenAI text-embedding-3-small | Query embedding (for local/global search) |
| API | graphrag-api | FastAPI backend for search queries |
| Frontend | graphrag-visualizer | React-based visualization |
| Graph rendering | react-force-graph | 2D/3D force-directed graph |

GraphRAG Indexing: Standard vs. Fast Method

GraphRAG offers two indexing methods with different trade-offs:

Standard Method (graphrag index)

The standard method uses an LLM for all reasoning tasks:

  • Entity Extraction: LLM extracts named entities with descriptions
  • Relationship Extraction: LLM describes relationships between entity pairs
  • Entity/Relationship Summarization: LLM summarizes all instances
  • Community Report Generation: LLM generates summaries for each community

Pros:

  • High-quality, semantically rich descriptions
  • Better graph quality for exploration

Cons:

  • High LLM costs (~75% of indexing costs)
  • Slow processing

Fast Method (graphrag index --method fast)

The fast method replaces LLM reasoning with classic NLP techniques:

  • Entity Extraction: Noun phrases are extracted using NLTK/spaCy (no descriptions)
  • Relationship Extraction: Relationships are based on text-unit co-occurrence
  • No Summarization: Not needed, since the NLP-extracted entities and relationships carry no descriptions to merge
  • Community Report Generation: Only this step still uses the LLM

Pros:

  • Significantly lower costs
  • Faster processing

Cons:

  • Less semantically rich descriptions
  • "Noisier" graph

My Configuration: Fast Method with OpenAI

For this project, I chose the Fast Method to minimize costs and enable fast iterations:

# LLM settings
models:
  default_chat_model:
    type: openai_chat
    api_base: https://api.openai.com/v1
    model: gpt-4o-mini
    api_key: ${OPEN_AI_KEY}
    model_supports_json: true
    concurrent_requests: 3
    async_mode: threaded
    retry_strategy: native
    max_retries: 2
    tokens_per_minute: 100000
    requests_per_minute: 200
    completion_params:
      temperature: 0.0
      max_tokens: 1536
    encoding_model: cl100k_base

  default_embedding_model:
    type: openai_embedding
    api_base: https://api.openai.com/v1
    model: text-embedding-3-small
    api_key: ${OPEN_AI_KEY}
    concurrent_requests: 3
    async_mode: threaded

# Input settings
input:
  type: file
  file_type: text
  base_dir: "input"

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id]

# Workflow settings
embed_text:
  enabled: true

extract_graph_nlp:
  text_analyzer:
    extractor_type: regex_english # Fast NLP extraction

cluster_graph:
  max_cluster_size: 10

community_reports:
  model_id: default_chat_model
  graph_prompt: "prompts/community_report_graph.txt"
  text_prompt: "prompts/community_report_text.txt"
  max_length: 2000
  max_input_length: 8000

Important Configuration Points:

  1. embed_text: enabled: true – the LanceDB vector store is always created by default, even when this setting is disabled
  2. extract_graph_nlp.extractor_type: regex_english – Uses regex-based noun phrase extraction instead of LLM
  3. community_reports – The only step that uses the LLM
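The chunks settings above describe a sliding window with overlap. A minimal sketch of that windowing (GraphRAG counts tokenizer tokens; plain words and small numbers are used here for readability):

```python
def chunk_tokens(tokens: list[str], size: int = 1200, overlap: int = 100) -> list[list[str]]:
    # Sliding window: each chunk repeats the last `overlap` tokens of the
    # previous one, so entities near a boundary appear in both chunks.
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

# Small demonstration: 30 tokens, window of 10, overlap of 2.
words = [f"w{i}" for i in range(30)]
chunks = chunk_tokens(words, size=10, overlap=2)
```

With the configured values (size 1200, overlap 100), consecutive chunks share 100 tokens, which keeps cross-boundary entity mentions from being split between chunks.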

Indexing Pipeline in Detail

Data Flow

Data Flow

Cost Estimation (Fast Method)

For 2 text files (~100 KB):

| Step | Token Usage | Cost (gpt-4o-mini) |
|---|---|---|
| NLP steps | 0 | $0.00 |
| Community reports | ~40-70k input, ~5-10k output | ~$0.01-0.03 |
| **Total** | | ~$0.02 |

For comparison: The Standard Method would cost about $0.20-0.50 for the same corpus.
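The estimate follows directly from per-token pricing. The gpt-4o-mini list prices below ($0.15 per 1M input tokens, $0.60 per 1M output tokens) were current at the time of writing; verify against OpenAI's pricing page before relying on them:

```python
# gpt-4o-mini list prices per 1M tokens (assumption -- check current pricing).
PRICE_IN, PRICE_OUT = 0.15, 0.60

def cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * PRICE_IN + output_tokens / 1e6 * PRICE_OUT

low = cost(40_000, 5_000)    # lower bound of the community-report estimate
high = cost(70_000, 10_000)  # upper bound
# low/high land at roughly $0.01-$0.02, matching the ~$0.02 total above.
```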


Output: Parquet Files

After successful indexing, the following Parquet files are generated:

| File | Content | Required for Visualizer |
|---|---|---|
| entities.parquet | Extracted entities (noun phrases) | ✓ Required |
| relationships.parquet | Relationships between entities | ✓ Required |
| documents.parquet | Input document metadata | Optional |
| text_units.parquet | Text chunks with entity references | Optional |
| communities.parquet | Community cluster assignments | Optional |
| community_reports.parquet | LLM-generated community summaries | Optional |

GraphRAG Visualizer

Features

The visualizer offers three main views:

1. Graph Visualization

Graph Visualization View

Features:

  • 2D/3D rendering with react-force-graph
  • Node coloring by type (Entity, Community, Document, etc.)
  • Interactive hover highlighting
  • Zoom, Pan, and Focus navigation
  • Optional labels for nodes and links

2. Search Interface

Search View

Search Types:

  • Local Search: Finds relevant entities and their context
  • Global Search: Synthesizes answers from all community reports

Example Query:

"Find me all relationships between Trump and Twitter. Analyse them overall."

3. Data Tables

Data Tables View

Enables exploration of:

  • Entities
  • Relationships
  • Documents
  • Text Units
  • Communities
  • Community Reports

Critical Code Components

Graph Data Processing (useGraphData.ts)

The hook transforms Parquet data into a graph format for react-force-graph:

const useGraphData = (
  entities: Entity[],
  relationships: Relationship[],
  documents: Document[],
  textunits: TextUnit[],
  communities: Community[],
  communityReports: CommunityReport[],
  covariates: Covariate[],
  includeDocuments: boolean,
  includeTextUnits: boolean,
  includeCommunities: boolean,
  includeCovariates: boolean
) => {
  const [graphData, setGraphData] = useState<CustomGraphData>({
    nodes: [],
    links: [],
  });

  useEffect(() => {
    // Entity nodes
    const nodes: CustomNode[] = entities.map((entity) => ({
      uuid: entity.id,
      id: entity.title,
      name: entity.title,
      type: entity.type,
      description: entity.description,
      text_unit_ids: entity.text_unit_ids,
      neighbors: [],
      links: [],
    }));

    const nodesMap: { [key: string]: CustomNode } = {};
    nodes.forEach((node) => (nodesMap[node.id] = node));

    // Relationship links
    const links: CustomLink[] = relationships
      .map((relationship) => ({
        source: relationship.source,
        target: relationship.target,
        type: relationship.type,
        weight: relationship.weight,
        description: relationship.description,
      }))
      .filter((link) => nodesMap[link.source] && nodesMap[link.target]);

    // Build neighbor references
    links.forEach((link) => {
      const sourceNode = nodesMap[link.source];
      const targetNode = nodesMap[link.target];
      if (sourceNode && targetNode) {
        sourceNode.neighbors!.push(targetNode);
        targetNode.neighbors!.push(sourceNode);
        sourceNode.links!.push(link);
        targetNode.links!.push(link);
      }
    });

    setGraphData({ nodes, links });
  }, [entities, relationships /* ... */]);

  return graphData;
};

Parquet File Reading (parquet-utils.ts)

Parquet files are read client-side with hyparquet:

export const readParquetFile = async (
  file: File | Blob,
  schema?: string
): Promise<any[]> => {
  const arrayBuffer = await file.arrayBuffer();
  const asyncBuffer = new AsyncBuffer(arrayBuffer);

  return new Promise((resolve, reject) => {
    const options: ParquetReadOptions = {
      file: asyncBuffer,
      rowFormat: "object",
      onComplete: (rows: Record<string, any>[]) => {
        if (schema === "entity") {
          resolve(
            rows.map((row) => ({
              id: row["id"],
              human_readable_id: parseValue(row["human_readable_id"], "number"),
              title: row["title"],
              type: row["type"],
              description: row["description"],
              text_unit_ids: row["text_unit_ids"],
            }))
          );
        }
        // ... other schemas
      },
    };
    parquetRead(options).catch(reject);
  });
};

Graph Visualization (GraphViewer.tsx)

The graph is rendered with react-force-graph-2d:

<ForceGraph2D
  ref={graphRef}
  graphData={graphData}
  nodeAutoColorBy="type"
  nodeRelSize={NODE_R}
  autoPauseRedraw={false}
  linkWidth={(link) => (showHighlight && highlightLinks.has(link) ? 5 : 1)}
  linkDirectionalParticles={showHighlight ? 4 : 0}
  nodeCanvasObjectMode={(node) =>
    showHighlight && highlightNodes.has(node)
      ? "before"
      : showLabels
      ? "after"
      : undefined
  }
  nodeCanvasObject={(node, ctx) => {
    if (showHighlight && highlightNodes.has(node)) {
      paintRing(node as CustomNode, ctx);
    }
    if (showLabels) {
      renderNodeLabel(node as CustomNode, ctx);
    }
  }}
  onNodeHover={showHighlight ? handleNodeHover : undefined}
  onNodeClick={handleNodeClick}
  backgroundColor={getBackgroundColor()}
/>

Project Setup

Prerequisites

  • Python 3.10+
  • Node.js 18+
  • OpenAI API Key

1. GraphRAG Indexing

# Clone and Setup
cd graphrag-api
python -m venv venv
.\venv\Scripts\activate
pip install graphrag

# Set Environment Variable
set OPEN_AI_KEY=sk-your-key-here

# Start Indexing (Fast Method)
cd ragtest
graphrag index --method fast

2. GraphRAG API Server

cd graphrag-api
pip install -r requirements.txt
uvicorn api:app --reload

3. Visualizer Frontend

cd graphrag-visualizer
npm install
npm start

Open http://localhost:3000, upload the Parquet files from ragtest/output/, and explore your knowledge graph!


Results & Observations

Graph Statistics (Twitter Example)

| Metric | Value |
|---|---|
| Input documents | 2 text files |
| Total size | 100 KB |
| Extracted entities | 336 |
| Extracted relationships | 11,483 |
| Communities | 73 |
| Indexing time (fast) | 5 |
| Estimated costs | ~$0.02 |
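Two derived figures put the fast method's "noisier graph" trade-off into perspective. These are rough averages (GraphRAG communities are hierarchical, so entities-per-community is only indicative):

```python
entities, relationships, communities = 336, 11_483, 73

# Average node degree in an undirected graph: 2E / V.
# ~68 edges per entity -- a symptom of co-occurrence-based extraction,
# which links every entity pair sharing a text unit.
avg_degree = 2 * relationships / entities

# Rough average community size (communities are hierarchical, so this
# mixes levels and is only a ballpark figure).
avg_entities_per_community = entities / communities
```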

Performance Observations

Fast Method:

  • NLP Steps (1-8) are very fast (seconds to minutes)
  • create_community_reports_text is the most time-consuming step (~75% of total time)
  • LLM costs are incurred only for Community Reports

Visualizer Performance:

  • Graph rendering can become slow with >1000 nodes
  • 2D rendering is significantly faster than 3D
  • Labels should be disabled for many nodes

Conclusion & Lessons Learned

Key Takeaways

  1. Fast Method is a good compromise – Significantly cheaper than the Standard Method, yet sufficient for exploration and prototyping

  2. Community Reports are the core value – The LLM-generated summaries enable Global Search and deep understanding

  3. Visualization makes the difference – The graph helps to discover connections that remain hidden in text form

  4. Configuration is crucial – Chunk size, max_cluster_size, and prompt engineering influence result quality

Areas for Improvement

  • Personalized Labels: Better display names instead of technical IDs
  • Adaptive Graph Rendering: Clustering or aggregation for large graphs
  • Real-time Indexing: Incremental updates instead of full re-indexing



Keywords: GraphRAG, Knowledge Graphs, RAG, Retrieval-Augmented Generation, NLP, OpenAI, React, Data Visualization, Microsoft Research