Microsoft Research

A modular graph-based Retrieval-Augmented Generation system

The GraphRAG project is a data pipeline and transformation suite that is designed to extract meaningful, structured data from unstructured text using the power of LLMs.


33.1k GitHub stars · 3.5k forks · v3.0.9 latest release

Overview

What makes GraphRAG different

GraphRAG uses knowledge graph memory structures to enhance LLM outputs, moving beyond simple vector search to deliver richer, more connected answers.

Knowledge graph memory

GraphRAG builds a structured knowledge graph from your unstructured text, creating entities, relationships, and communities that ground every LLM response in your data's natural structure.

LLM-powered extraction

Leverages the power of LLMs to extract meaningful, structured data from unstructured text. Entities, relationships, and claims are identified and connected automatically.

Community detection and summarization

Detects communities within the knowledge graph and generates community-level summaries, enabling the system to answer global, structural questions that span your entire dataset.

Modular data pipeline

The data pipeline and transformation suite is fully modular — each step from text chunking to graph construction to answer generation can be configured, extended, or replaced.

How it works

The GraphRAG pipeline explained

GraphRAG transforms raw text into a queryable knowledge graph through a sequence of well-defined stages, with LLMs doing the extraction and summarization work.

01

Chunk source documents

Raw text is split into overlapping, token-sized chunks (GraphRAG calls them text units). Each chunk must carry enough context for the LLM to extract entities, relationships, and claims reliably, while staying small enough that details are not missed.
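To make the chunking step concrete, here is a minimal sketch of fixed-size windows with overlap. It assumes whitespace-separated words stand in for tokens; GraphRAG itself counts tokens with a real tokenizer and reads sizes from its own configuration. The function name `chunk_tokens` and the sizes shown are hypothetical.

```python
def chunk_tokens(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of `size` "tokens" that share `overlap` tokens.

    Whitespace splitting is a stand-in for a real tokenizer (assumption).
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    tokens = text.split()
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(20))
chunks = chunk_tokens(text, size=8, overlap=2)
```

The overlap means a sentence cut at a window boundary still appears whole in at least one neighbor, at the price of processing a few tokens twice.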

02

Extract entities and relationships

An LLM reads each chunk and identifies entities (people, places, organizations, concepts), the relationships between them, and, when claim extraction is enabled, claims about those entities.
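The extraction step comes down to prompting an LLM and parsing its structured reply. The sketch below assumes a hypothetical pipe-delimited reply format and replaces the model with a canned `fake_llm` stub; GraphRAG's real prompts use their own delimiters and record layout, so treat `parse_extraction` and the format as illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    type: str
    description: str


@dataclass
class Relationship:
    source: str
    target: str
    description: str
    weight: float


def fake_llm(chunk: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns a canned reply."""
    return (
        "entity|Alice|person|A researcher at the lab\n"
        "entity|Acme Lab|organization|A research lab\n"
        "relationship|Alice|Acme Lab|Alice works at Acme Lab|2"
    )


def parse_extraction(raw: str) -> tuple[list[Entity], list[Relationship]]:
    """Parse one line per record: entity|NAME|TYPE|DESC or relationship|SRC|DST|DESC|WEIGHT."""
    entities: list[Entity] = []
    relationships: list[Relationship] = []
    for line in raw.strip().splitlines():
        fields = [field.strip() for field in line.split("|")]
        if fields[0] == "entity" and len(fields) == 4:
            entities.append(Entity(fields[1].upper(), fields[2].upper(), fields[3]))
        elif fields[0] == "relationship" and len(fields) == 5:
            relationships.append(
                Relationship(fields[1].upper(), fields[2].upper(), fields[3], float(fields[4]))
            )
        # Malformed lines are skipped instead of failing the whole chunk.
    return entities, relationships


entities, relationships = parse_extraction(fake_llm("Alice works at Acme Lab."))
```

Upper-casing names at parse time is one simple way to make the later merge step less sensitive to how the model capitalized an entity.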

03

Build the knowledge graph

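Extracted entities and relationships are assembled into a knowledge graph. Entities that share the same name are merged, their descriptions are combined and summarized by the LLM, repeated relationships strengthen the corresponding edges, and the resulting graph is persisted for querying.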
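A minimal sketch of the assembly step follows, assuming each chunk has already produced `(name, description)` entity pairs and `(source, target, weight)` relationship triples. Merging on a normalized name and summing edge weights are simplifying assumptions, and the LLM summarization of merged descriptions is left out; `build_graph` and `normalize` are hypothetical names.

```python
from collections import defaultdict
from collections.abc import Iterable

EntityRow = tuple[str, str]               # (name, description)
RelationshipRow = tuple[str, str, float]  # (source, target, weight)


def normalize(name: str) -> str:
    """Naive entity key: trimmed, upper-cased name."""
    return name.strip().upper()


def build_graph(
    extractions: Iterable[tuple[list[EntityRow], list[RelationshipRow]]],
) -> tuple[dict[str, list[str]], dict[tuple[str, str], float]]:
    descriptions: dict[str, list[str]] = defaultdict(list)    # entity -> distinct descriptions
    edges: dict[tuple[str, str], float] = defaultdict(float)  # (a, b) with a <= b -> summed weight
    for entities, relationships in extractions:
        for name, description in entities:
            key = normalize(name)  # same normalized name means same entity
            if description not in descriptions[key]:
                descriptions[key].append(description)
        for source, target, weight in relationships:
            a, b = sorted((normalize(source), normalize(target)))  # undirected edge key
            descriptions.setdefault(a, [])  # make sure both endpoints exist as nodes
            descriptions.setdefault(b, [])
            edges[(a, b)] += weight  # repeated mentions strengthen the edge
    return dict(descriptions), dict(edges)


per_chunk = [
    ([("Alice", "A researcher"), ("Acme Lab", "A lab")], [("Alice", "Acme Lab", 1.0)]),
    ([("alice ", "Works on graphs")], [("Acme Lab", "ALICE", 2.0)]),
]
nodes, edges = build_graph(per_chunk)
```

In a full pipeline, each entity's list of descriptions would then be condensed into one summary by the LLM.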

04

Detect and summarize communities

A hierarchical community detection algorithm (the GraphRAG paper uses Leiden) partitions the graph into clusters of closely connected entities. Each community is summarized by an LLM, producing structured reports that capture the main themes and topics of that community.
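The sketch below illustrates community detection with a deliberately simpler, swapped-in technique: weighted label propagation, not Leiden. It produces a single flat partition rather than GraphRAG's hierarchy, and it assumes edges come as `{(a, b): weight}`. All function names are hypothetical; the grouped members are what a summarization prompt would receive.

```python
from collections import defaultdict


def to_adjacency(edges: dict[tuple[str, str], float]) -> dict[str, dict[str, float]]:
    """Turn undirected weighted edges {(a, b): w} into a neighbor map."""
    adjacency: dict[str, dict[str, float]] = defaultdict(dict)
    for (a, b), weight in edges.items():
        adjacency[a][b] = weight
        adjacency[b][a] = weight
    return dict(adjacency)


def label_propagation(adjacency: dict[str, dict[str, float]], max_iter: int = 100) -> dict[str, str]:
    """Weighted asynchronous label propagation (a simple stand-in for Leiden)."""
    labels = {node: node for node in adjacency}  # each node starts in its own community
    for _ in range(max_iter):
        changed = False
        for node in sorted(adjacency):  # fixed visiting order keeps runs reproducible
            scores: dict[str, float] = defaultdict(float)
            for neighbor, weight in adjacency[node].items():
                scores[labels[neighbor]] += weight  # neighbors vote with their edge weight
            if not scores:
                continue
            best = max(scores.values())
            candidates = {label for label, score in scores.items() if score == best}
            if labels[node] in candidates:
                continue  # keep the current label on ties to avoid oscillation
            labels[node] = min(candidates)  # deterministic tie-break
            changed = True
        if not changed:
            break  # no node moved, so the partition is stable
    return labels


def group_communities(labels: dict[str, str]) -> dict[str, list[str]]:
    """Collect member entities per community, e.g. to feed a summarization prompt."""
    groups: dict[str, list[str]] = defaultdict(list)
    for node, label in sorted(labels.items()):
        groups[label].append(node)
    return dict(groups)


# Two tightly linked triangles joined by one weak edge.
edges = {
    ("A", "B"): 2.0, ("A", "C"): 2.0, ("B", "C"): 2.0,
    ("X", "Y"): 2.0, ("X", "Z"): 2.0, ("Y", "Z"): 2.0,
    ("C", "X"): 1.0,
}
communities = group_communities(label_propagation(to_adjacency(edges)))
```

On this toy graph the two triangles end up in separate communities because the weak bridge edge is outvoted by the heavier edges inside each triangle.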

05

Answer questions with graph context

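When a query arrives, GraphRAG retrieves relevant graph context (entities, relationships, source text, and community summaries) and feeds it to the LLM to produce a grounded, context-rich answer. Global search draws on community summaries to answer questions about the whole corpus; local search starts from the entities most related to the query.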
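Global search, as described in the GraphRAG paper, follows a map-reduce pattern over community reports. The sketch below shows only the shape of that flow under strong assumptions: a crude keyword-overlap scorer replaces the LLM "map" call, a word count replaces a token budget, and the final LLM "reduce" call that writes the answer is omitted. The function names are hypothetical.

```python
import re


def terms(text: str) -> set[str]:
    """Lower-cased words longer than three characters (a crude stop-word filter)."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def map_score(question: str, report: str) -> int:
    """Stand-in for the LLM 'map' call: rate a report by keyword overlap."""
    return len(terms(question) & terms(report))


def build_reduce_context(question: str, reports: list[str], max_words: int = 50) -> list[str]:
    # Map: score every community report independently (parallelizable in practice).
    scored = [(map_score(question, report), report) for report in reports]
    # Drop irrelevant reports, then rank the rest (stable sort keeps ties in input order).
    ranked = sorted((pair for pair in scored if pair[0] > 0), key=lambda pair: pair[0], reverse=True)
    # Reduce input: pack top-ranked reports until the context budget is reached.
    context: list[str] = []
    used = 0
    for _, report in ranked:
        size = len(report.split())
        if used + size > max_words:
            break  # stop at the first report that no longer fits
        context.append(report)
        used += size
    return context


reports = [
    "Community 1: Alice and Acme Lab collaborate on graph research.",
    "Community 2: Weather patterns in coastal towns.",
    "Community 3: Graph research funding at Acme Lab.",
]
context = build_reduce_context("Who funds graph research at Acme Lab?", reports)
```

Because every report is scored independently, the map step scales out across many communities, and the budget keeps the final prompt bounded no matter how large the corpus is.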

Why it matters

GraphRAG turns unstructured text into structured insight

Most RAG systems rely on flat vector search. GraphRAG adds a knowledge graph layer that captures the relationships between entities, enabling richer, more connected answers.

Global understanding

Vector search finds similar passages. GraphRAG also models the connections between them, so it can answer questions that require synthesizing information across your entire corpus.

Structured grounding

Entities, relationships, and community summaries give the LLM a structured foundation to reason over, which can reduce hallucination and improve answer quality.

Proven at scale

Built by Microsoft Research, GraphRAG has been evaluated on large document collections in the accompanying research and is actively maintained with regular releases and community contributions.

Deep dive

What Is GraphRAG? A graph-based approach to RAG

A clear explanation of what GraphRAG is, who built it, how the pipeline works, and how it compares to traditional RAG approaches.

GraphRAG is a modular graph-based Retrieval-Augmented Generation (RAG) system developed by Microsoft Research. Unlike conventional RAG systems that retrieve flat text chunks via vector similarity, GraphRAG first builds a knowledge graph from the source documents — identifying entities, their relationships, and community structures — and then uses that graph to ground LLM responses. This enables the system to answer global, structural, and multi-hop questions that standard RAG approaches struggle with.
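Multi-hop questions are where an explicit graph helps: a chunk that mentions Alice may never mention the project her lab runs, but the graph links them. The breadth-first expansion below is a generic illustration of that idea, not GraphRAG's own retrieval code; the entities and the `expand_hops` helper are hypothetical.

```python
from collections import deque


def expand_hops(adjacency: dict[str, set[str]], seeds: list[str], max_hops: int = 2) -> dict[str, int]:
    """Return every entity reachable from the seeds within max_hops, with its hop distance."""
    distance = {seed: 0 for seed in seeds if seed in adjacency}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


graph = {
    "ALICE": {"ACME LAB"},
    "ACME LAB": {"ALICE", "PROJECT X"},
    "PROJECT X": {"ACME LAB", "BOB"},
    "BOB": {"PROJECT X"},
}
reachable = expand_hops(graph, ["ALICE"], max_hops=2)
```

Starting from `ALICE`, two hops reach `PROJECT X` through `ACME LAB`, a connection that similarity search over individual chunks has no direct way to follow.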

Quick start

How to go from reading about GraphRAG to running it

The project assumes Python 3.10+ and an OpenAI-compatible LLM endpoint. If that matches your setup, this is the shortest path to a first index and query.

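```bash
# Install GraphRAG
pip install graphrag

# Create the project folder and its input directory, then add your .txt files
mkdir -p ./my_project/input

# Initialize project (generates settings.yaml and .env)
graphrag init --root ./my_project

# Set your API key in ./my_project/.env (GRAPHRAG_API_KEY=...)

# Build the knowledge graph index
graphrag index --root ./my_project

# Ask a question
graphrag query --root ./my_project --method global --query "What are the key themes in this data?"
```
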
Step 1

Install the graphrag package via pip and run init to create your project structure.

Step 2

Add your API key to the .env file that init created, place your source documents (plain .txt files by default) in the input directory, and run the index command.

Step 3

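Query your indexed data with global search (corpus-wide questions answered from community summaries) or local search (questions about specific entities). Recent releases also offer DRIFT and basic search methods.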

FAQ

The fastest answers to the questions people ask first

Start here if you want to understand GraphRAG's key concepts, requirements, and limitations without reading the entire repo first.

What is GraphRAG?

GraphRAG is a modular graph-based Retrieval-Augmented Generation (RAG) system by Microsoft Research. It extracts entities and relationships from unstructured text to build a knowledge graph, then uses that graph to ground LLM responses for richer, more structured answers.

Who created GraphRAG?

GraphRAG comes from Microsoft Research. The repo lives at microsoft/graphrag on GitHub and represents ongoing research into using knowledge graph memory structures to enhance LLM outputs.

How is GraphRAG different from normal RAG?

Normal RAG retrieves flat text chunks via vector similarity. GraphRAG builds a knowledge graph with entities, relationships, and community summaries before retrieval, enabling it to answer global and multi-hop questions that standard RAG struggles with.

What do you need to run GraphRAG?

GraphRAG requires Python 3.10+, an LLM provider (OpenAI API or a compatible endpoint), and an embedding model. Indexing can be expensive because it makes many LLM calls per document, so start with small datasets and review the cost documentation before scaling.
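A back-of-the-envelope estimate shows why corpus size drives indexing cost. This sketch counts only entity-extraction calls, assuming one call per chunk plus an assumed number of follow-up passes (the GraphRAG paper calls these "gleanings"); community summarization and embedding calls come on top. The chunk sizes below are illustrative values, not GraphRAG defaults, and `estimate_extraction_calls` is a hypothetical helper.

```python
def estimate_extraction_calls(
    corpus_tokens: int, chunk_size: int, overlap: int, gleanings: int = 0
) -> tuple[int, int]:
    """Rough count of chunks and extraction LLM calls for a corpus."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    if corpus_tokens <= 0:
        return 0, 0
    step = chunk_size - overlap  # tokens each new chunk advances
    if corpus_tokens <= chunk_size:
        chunks = 1
    else:
        # First chunk covers chunk_size tokens; each later one adds `step` (integer ceiling).
        chunks = 1 + (corpus_tokens - chunk_size + step - 1) // step
    calls = chunks * (1 + gleanings)  # initial pass plus follow-up passes per chunk
    return chunks, calls


chunks, calls = estimate_extraction_calls(1_000_000, chunk_size=1200, overlap=100, gleanings=1)
```

For a one-million-token corpus with these illustrative settings, that is 909 chunks and 1,818 extraction calls before any summarization, which is why a small trial run is the cheaper way to calibrate.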

Is GraphRAG free and open source?

Yes, GraphRAG is open source under the MIT license, so it is free to use, modify, and distribute. Note that the code is provided as a demonstration and is not an officially supported Microsoft offering.

Primary sources

Every claim on this page is grounded in the repo

All content here is sourced from the official GraphRAG repository, documentation, or published research so you can verify the details yourself.

GitHub Repo

The source for the README, project structure, installation, and quick-start commands.

Documentation

Explains the data pipeline, configuration, indexing workflow, and query methods.

Research Blog

The Microsoft Research blog post introducing the GraphRAG approach and its underlying research.

Arxiv Paper

The GraphRAG paper on arXiv detailing the methodology, evaluation, and results.