The GraphRAG project is a data pipeline and transformation suite that is designed to extract meaningful, structured data from unstructured text using the power of LLMs.
GraphRAG uses knowledge graph memory structures to enhance LLM outputs, moving beyond simple vector search to deliver richer, more connected answers.
GraphRAG builds a structured knowledge graph from your unstructured text, creating entities, relationships, and communities that ground every LLM response in your data's natural structure.
GraphRAG uses LLMs to extract meaningful, structured data from unstructured text, identifying and connecting entities, relationships, and claims automatically.
Detects communities within the knowledge graph and generates community-level summaries, enabling the system to answer global, structural questions that span your entire dataset.
The data pipeline and transformation suite is fully modular — each step from text chunking to graph construction to answer generation can be configured, extended, or replaced.
GraphRAG transforms raw text into a queryable knowledge graph through a sequence of well-defined stages, each powered by LLMs.
Raw text is split into manageable chunks. Each chunk preserves enough context for the LLM to extract entities, relationships, and claims with high fidelity.
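As a rough illustration of the chunking stage, the sketch below splits text into overlapping windows so that each chunk carries some of the preceding context. It is a minimal sketch, not GraphRAG's implementation: the function name chunk_text, the word-based window, and the default sizes are all assumptions made for this example.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks; each chunk repeats the last
    `overlap` words of the previous one so context is not cut off abruptly.

    Hypothetical helper for illustration only (not a GraphRAG API).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_text(sample, chunk_size=4, overlap=1):
        print(chunk)
```

A larger overlap keeps more shared context between neighbouring chunks at the cost of more text sent to the LLM overall.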
An LLM reads each chunk and identifies entities (people, places, concepts), the relationships between them, and any claims or attributes attached to each entity.
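To make the shape of the extraction output concrete, here is a minimal sketch of typed records for entities and relationships, plus a parser for one chunk's LLM response. The JSON schema, the class names, and parse_extraction are assumptions for illustration; the prompt and output format GraphRAG actually uses may differ.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    type: str
    description: str


@dataclass(frozen=True)
class Relationship:
    source: str
    target: str
    description: str


def parse_extraction(raw: str) -> tuple[list[Entity], list[Relationship]]:
    """Parse a JSON extraction result (hypothetical schema) into typed records."""
    data = json.loads(raw)
    entities = [
        Entity(e["name"], e["type"], e.get("description", ""))
        for e in data.get("entities", [])
    ]
    known = {e.name for e in entities}
    # Keep only relationships whose endpoints were both extracted as entities.
    relationships = [
        Relationship(r["source"], r["target"], r.get("description", ""))
        for r in data.get("relationships", [])
        if r["source"] in known and r["target"] in known
    ]
    return entities, relationships


# Hypothetical LLM response for a single chunk.
sample_response = json.dumps({
    "entities": [
        {"name": "Ada Lovelace", "type": "person", "description": "mathematician"},
        {"name": "Analytical Engine", "type": "concept", "description": "proposed computer"},
    ],
    "relationships": [
        {"source": "Ada Lovelace", "target": "Analytical Engine", "description": "wrote notes on"},
        {"source": "Ada Lovelace", "target": "Unknown Person", "description": "dangling edge"},
    ],
})

if __name__ == "__main__":
    entities, relationships = parse_extraction(sample_response)
    print(entities)
    print(relationships)
```

Dropping relationships with unknown endpoints is one possible design choice; a real pipeline might instead create placeholder entities for them.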
Extracted entities and relationships are assembled into a knowledge graph. Co-references are resolved, duplicates merged, and the graph structure is persisted for querying.
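The assembly step can be pictured with a small sketch that merges duplicate entity mentions and collects edges between them. Merging by normalized name is a deliberately simple stand-in for real co-reference resolution, and build_graph, normalize, and the tuple layouts are assumptions for this example rather than GraphRAG's own data model.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Collapse whitespace and ignore case so trivially different mentions merge."""
    return " ".join(name.split()).casefold()


def build_graph(entities, relationships):
    """Assemble a small undirected graph.

    entities: iterable of (name, description)
    relationships: iterable of (source, target, description)
    Returns (nodes, edges), where nodes maps a normalized key to its merged
    record and edges maps a sorted key pair to the list of descriptions.
    """
    nodes = {}
    for name, description in entities:
        key = normalize(name)
        node = nodes.setdefault(key, {"name": name.strip(), "descriptions": []})
        if description and description not in node["descriptions"]:
            node["descriptions"].append(description)

    edges = defaultdict(list)
    for source, target, description in relationships:
        a, b = normalize(source), normalize(target)
        if a == b or a not in nodes or b not in nodes:
            continue  # skip self-loops and edges to entities that were never extracted
        edges[tuple(sorted((a, b)))].append(description)
    return nodes, dict(edges)


if __name__ == "__main__":
    nodes, edges = build_graph(
        [("Ada Lovelace", "mathematician"), ("ada  lovelace", "wrote notes on the engine")],
        [],
    )
    print(nodes)
```

Persisting the result is not shown; any serializable form of the two dictionaries would do for a sketch at this scale.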
Community detection algorithms identify clusters of closely connected entities. Each community is summarized by an LLM, producing structured reports that capture the themes and topics of that community.
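The sketch below shows the general shape of this stage: group connected entities, then build a summarization prompt per group. It uses plain connected components as a simple stand-in, which is not a real community detection algorithm and is not claimed to be what GraphRAG runs; find_communities, community_prompt, and the prompt wording are assumptions for illustration.

```python
from collections import defaultdict, deque


def find_communities(edges):
    """Group nodes into connected components (a stand-in for community detection).

    edges: iterable of (node_a, node_b). Returns a list of sorted member lists.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    communities = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:  # breadth-first walk over one component
            node = queue.popleft()
            members.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        communities.append(sorted(members))
    return communities


def community_prompt(members, edges):
    """Build the text an LLM would be asked to summarize for one community."""
    member_set = set(members)
    lines = [f"- {a} -- {b}" for a, b in edges if a in member_set and b in member_set]
    return "Summarize the themes connecting these entities:\n" + "\n".join(lines)


if __name__ == "__main__":
    demo_edges = [("A", "B"), ("B", "C"), ("X", "Y")]
    for members in find_communities(demo_edges):
        print(community_prompt(members, demo_edges))
```

A production system would use a modularity-based method that can split a single connected component into several communities; the prompt-per-community pattern stays the same.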
When a query arrives, GraphRAG retrieves relevant graph context — entities, relationships, and community summaries — and feeds it to the LLM to produce grounded, context-rich answers.
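The retrieval step can be sketched as: find the entities a query mentions, collect their relationships and community summaries, and place that context in the prompt. The substring match below is a simple stand-in for whatever retrieval method the real system uses, and retrieve_context, build_prompt, and the data layouts are assumptions for this example.

```python
def retrieve_context(query, nodes, edges, community_of, summaries):
    """Collect graph context relevant to a query.

    nodes: dict of entity name -> description
    edges: list of (source, target, description)
    community_of: dict of entity name -> community id
    summaries: dict of community id -> summary text
    """
    query_lower = query.casefold()
    # Stand-in matching: an entity is relevant if its name appears in the query.
    matched = [name for name in nodes if name.casefold() in query_lower]

    lines = [f"Entity: {name}: {nodes[name]}" for name in matched]
    for source, target, description in edges:
        if source in matched or target in matched:
            lines.append(f"Relationship: {source} -> {target}: {description}")
    community_ids = sorted({community_of[n] for n in matched if n in community_of})
    for cid in community_ids:
        lines.append(f"Community {cid}: {summaries.get(cid, '')}")
    return "\n".join(lines)


def build_prompt(query, context):
    """Wrap the retrieved context and the question into a single LLM prompt."""
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    ctx = retrieve_context(
        "Who worked with Ada Lovelace?",
        {"Ada Lovelace": "mathematician", "Charles Babbage": "engineer"},
        [("Ada Lovelace", "Charles Babbage", "collaborated with")],
        {"Ada Lovelace": 0, "Charles Babbage": 0},
        {0: "Early computing pioneers."},
    )
    print(build_prompt("Who worked with Ada Lovelace?", ctx))
```

Note that the relationship line brings in Charles Babbage even though the query never names him, which is the kind of connected context flat passage retrieval would not surface on its own.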
Most RAG systems rely on flat vector search. GraphRAG adds a knowledge graph layer that captures the relationships between entities, enabling richer, more connected answers.
Vector search finds similar passages. GraphRAG understands the connections between them, answering questions that require synthesizing information across your entire corpus.
Entities, relationships, and community summaries provide a structured foundation for LLM reasoning, which helps reduce hallucination and improve answer quality.
Built by Microsoft Research, GraphRAG targets large document collections and is actively maintained, with regular releases and community contributions.
A clear explanation of what GraphRAG is, who built it, how the pipeline works, and how it compares to traditional RAG approaches.
GraphRAG is a modular graph-based Retrieval-Augmented Generation (RAG) system developed by Microsoft Research. Unlike conventional RAG systems that retrieve flat text chunks via vector similarity, GraphRAG first builds a knowledge graph from the source documents — identifying entities, their relationships, and community structures — and then uses that graph to ground LLM responses. This enables the system to answer global, structural, and multi-hop questions that standard RAG approaches struggle with.
The project assumes Python 3.10+ and an OpenAI-compatible LLM endpoint. If that matches your setup, this is the shortest path to a first index and query.
Install the graphrag package via pip and run init to create your project structure.
Place your source documents in the input directory and run the index command.
Query your indexed data using global or local search methods.
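The three steps above might look like the following when scripted. This is a sketch under stated assumptions: the init, index, and query commands and the global and local methods come from the steps above, but the --root, --method, and --query flag names, the ./ragtest directory, and the sample question are assumptions that may not match your installed version, so confirm them against the official documentation before running anything.

```python
import shlex


def quickstart_commands(
    root: str = "./ragtest",  # hypothetical project directory
    query: str = "What are the top themes in this dataset?",  # hypothetical question
    method: str = "global",  # "global" or "local", per the steps above
) -> list[list[str]]:
    """Return the quick-start commands as argument lists, without running them.

    Flag names are assumptions; check the CLI help for your installed version.
    """
    return [
        ["pip", "install", "graphrag"],
        ["graphrag", "init", "--root", root],
        # Place source documents in the project's input directory before indexing.
        ["graphrag", "index", "--root", root],
        ["graphrag", "query", "--root", root, "--method", method, "--query", query],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of executing them, since indexing
    # calls the LLM and can be expensive.
    for argv in quickstart_commands():
        print(shlex.join(argv))
```

Printing rather than executing is intentional: it lets you check each command, and the cost implications of indexing, before anything runs.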
Start here if you want to understand GraphRAG's key concepts, requirements, and limitations without reading the entire repo first.
GraphRAG is a modular graph-based Retrieval-Augmented Generation (RAG) system by Microsoft Research. It extracts entities and relationships from unstructured text to build a knowledge graph, then uses that graph to ground LLM responses for richer, more structured answers.
GraphRAG comes from Microsoft Research. The repo lives at microsoft/graphrag on GitHub and represents ongoing research into using knowledge graph memory structures to enhance LLM outputs.
Conventional RAG retrieves flat text chunks via vector similarity. GraphRAG builds a knowledge graph with entities, relationships, and community summaries before retrieval, enabling it to answer global and multi-hop questions that standard RAG struggles with.
GraphRAG requires Python 3.10+, an LLM provider (OpenAI API or compatible), and an embedding model. Indexing can be computationally expensive — start with small datasets and review the cost documentation before scaling.
GraphRAG is open source under the MIT license. It is free to use, modify, and distribute. Note that the code is provided as a demonstration and is not an officially supported Microsoft offering.
All content here is sourced from the official GraphRAG repository, documentation, or published research so you can verify the details yourself.
GitHub Repo: The source for the README, project structure, installation, and quick-start commands.
Documentation: Where the README explains the data pipeline, configuration, indexing workflow, and query methods.
Research Blog: The Microsoft Research blog post introducing the GraphRAG approach and its underlying research.
arXiv Paper: The GraphRAG paper on arXiv detailing the methodology, evaluation, and results.