MCP Tool Registry: Automated Creation of RAG Systems
Modern large language models (LLMs) demonstrate impressive capabilities in generating and analyzing texts. However, LLMs have several limitations, two of the most critical being:
  • the finite size of the context window — the model cannot “remember” large volumes of data;
  • the lack of access to external tools and data — this makes it difficult to solve complex tasks.
There are two key methods that overcome these limitations:
  • Retrieval Augmented Generation (RAG) — solves the memory problem by allowing the LLM to query relevant information from external knowledge bases during answer generation.
  • Model Context Protocol (MCP) — standardizes the interaction of LLMs with external tools, enabling the model to use third-party services, APIs, and systems as if they were its own reasoning and action tools (e.g., Google Calendar or Tavily).
MCP plays a crucial role in building RAG because all RAG components — embeddings, databases, rerankers — can be represented as separate MCP servers. This transforms a monolithic system into a set of modular and interchangeable microservices. However, managing multiple such servers and orchestrating workflows between them becomes a task in itself.
This is where our solution — MCP Tool Registry — acts as the central registry. It does not replace MCP but extends its capabilities by providing a single entry point for managing the entire ecosystem of MCP servers required for RAG. The registry automates complex multi-step processes requiring sequential communication with several servers, hiding this complexity from the end user. Thus, LLM applications can be developed through a single interface, without the need for manual integration of each component.
In this article, we will explain in detail what MCP Tool Registry consists of, how it works, what benefits it provides for businesses and AI engineers, and demonstrate its operation with a practical example.
Introduction
The implemented MCP Registry is a central management node — a single interface between MCP servers and clients. Its architecture is built around the main MCP server, which performs the following key functions:
  • Registration and discovery of MCP servers
Servers register in the registry, which serves as a unified catalog, allowing clients to easily use available services.
  • Tool aggregation
Collects metadata from all registered servers and presents them to the client as a unified catalog of capabilities.
  • Request routing to the appropriate services
Determines which server should handle a given client request, then forwards the request to it.
  • Health check and monitoring of service states
The system can verify the operability of each server to ensure requests are routed only to active and responsive services.
  • Authentication and authorization management
Provides access only to authorized users via Bearer tokens.
The developed MCP Registry represents a unified interaction interface with MCP servers, freeing clients from the need to interact manually with each component.
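The registration and discovery functions described above can be sketched in a few lines of Python. All class, method, and server names here are illustrative, not the project's actual API:

```python
class MCPRegistry:
    """Keeps a catalog of MCP servers and the tools they expose."""

    def __init__(self):
        self._servers = {}  # server name -> {"url": ..., "tools": [...]}

    def register(self, name, url, tools):
        """A server announces itself and its tool metadata."""
        self._servers[name] = {"url": url, "tools": list(tools)}

    def discover(self):
        """Clients get one unified catalog instead of polling each server."""
        return {name: info["tools"] for name, info in self._servers.items()}

registry = MCPRegistry()
registry.register("embedder", "http://embed:8001", ["embed_text"])
registry.register("vector-db", "http://qdrant:6333", ["upsert", "search"])
print(registry.discover())
# {'embedder': ['embed_text'], 'vector-db': ['upsert', 'search']}
```

In the real registry, registration happens over the network and the catalog also carries per-tool schemas; the sketch only shows the shape of the idea.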
Architecture of the MCP Registry and Its Interaction with the Client
MCP Servers Aggregated in the Registry:
The registry unifies services covering the key tasks of working with data and models, from preparing and annotating text and audio to storing vector representations and generating LLM responses. Together they form a core toolset for building various AI scenarios: each component is responsible for a specific part of the process, while collectively they form a coherent workflow.
To use the MCP Registry, the following infrastructure is required:
  • A model service for text vectorization (e.g., bge-m3)
  • A service for extracting text from PDF (e.g., LangChain tools)
  • A model service for text reranking (e.g., bge-reranker)
  • A vector database service (e.g., Qdrant)
  • A PostgreSQL service
  • An LLM service for text generation (e.g., llama-3-8b-instruct-8k)
  • A text chunking service (e.g., LangChain)
  • A transcription service (e.g., Whisper)
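As a rough illustration, this infrastructure can be described as a simple service map that the registry works against. Every key, URL, and helper below is a placeholder for your own deployment, not the project's actual configuration format:

```python
# Hypothetical service map for the infrastructure listed above.
INFRASTRUCTURE = {
    "embedder":    {"model": "bge-m3",                 "url": "http://embed:8001"},
    "pdf_extract": {"impl":  "langchain",              "url": "http://pdf:8002"},
    "reranker":    {"model": "bge-reranker",           "url": "http://rerank:8003"},
    "vector_db":   {"impl":  "qdrant",                 "url": "http://qdrant:6333"},
    "sql_db":      {"impl":  "postgresql",             "url": "postgresql://db:5432"},
    "llm":         {"model": "llama-3-8b-instruct-8k", "url": "http://llm:8004"},
    "chunker":     {"impl":  "langchain",              "url": "http://chunk:8005"},
    "transcriber": {"model": "whisper",                "url": "http://asr:8006"},
}

def missing_services(available):
    """Report which required roles are not yet deployed."""
    return sorted(set(INFRASTRUCTURE) - set(available))
```

A startup check like `missing_services(["embedder", "vector_db"])` makes it easy to fail fast when part of the stack is not running.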
As a practical application of MCP Registry’s capabilities, we developed a workflow that automates the construction and operation of a RAG system. The solution allows clients to deploy a local semantic search service on their own data, ensuring security and confidentiality.
The process can be divided into two stages:
1. Automated data preparation for the vector database.
2. Interaction with the ready RAG system via a chat interface.
Workflow of Automated RAG Construction
The data preparation workflow runs as a local proxy server for the registry that processes user data. To send data to the remote server, files are encoded in base64, which converts binary content into strings suitable for network transfer. As the vector store, we chose Qdrant for its high performance and reliability.
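The base64 transfer step can be sketched as follows; `encode_file` and `decode_payload` are hypothetical helper names, not part of the actual proxy:

```python
import base64

def encode_bytes(data: bytes) -> str:
    """Binary content -> base64 string suitable for JSON/network transfer."""
    return base64.b64encode(data).decode("ascii")

def encode_file(path: str) -> str:
    """Read a local file and encode it for sending to the remote server."""
    with open(path, "rb") as f:
        return encode_bytes(f.read())

def decode_payload(payload: str) -> bytes:
    """Restore the original bytes on the receiving side."""
    return base64.b64decode(payload)
```

The cost of this convenience is a roughly 33% size overhead, which is usually acceptable for document-sized payloads.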
Pipeline steps:
1. Extract text from local files.
2. Split the text into chunks.
3. Obtain vector representations of each chunk using the bge-m3 model.
4. Save the vectors into the vector database.
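The four steps can be sketched as a plain Python pipeline. Each function body below is a toy stand-in for a request routed by the registry to the corresponding MCP server; only the overall shape reflects the real workflow:

```python
def extract_text(file_bytes: bytes) -> str:
    # Step 1: stand-in for the PDF/text extraction server
    return file_bytes.decode("utf-8", errors="ignore")

def split_into_chunks(text: str, size: int = 20) -> list[str]:
    # Step 2: stand-in for the chunking server (fixed-size split here)
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed_chunks(chunks: list[str]) -> list[list[float]]:
    # Step 3: stand-in for the bge-m3 embedding server (toy vectors)
    return [[float(len(c))] for c in chunks]

def save_vectors(chunks, vectors, collection: str) -> str:
    # Step 4: stand-in for a Qdrant upsert; returns the collection name
    return collection

def prepare_data(file_bytes: bytes, collection: str) -> str:
    chunks = split_into_chunks(extract_text(file_bytes))
    return save_vectors(chunks, embed_chunks(chunks), collection)
```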
Using MCP Registry: Example of Creating RAG
Each step of the data preparation workflow uses the corresponding MCP server from the registry. The entire process runs automatically in response to a single user request, without the need for manual intervention. The output is the name of the vector collection containing all processed data.
Workflow for RAG Inference
This workflow is a chat interface where the user can ask questions about the uploaded data. At launch, the user specifies the collection name produced during the data preparation stage.
Five steps of the inference workflow:
1. The user submits a query.
2. The query is converted into a vector using the bge-m3 model.
3. A search is performed on the Qdrant vector database.
4. Retrieved chunks are reranked using the reranker model.
5. The most relevant chunk is passed to the LLM along with the query to generate the final answer.
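The five inference steps can be sketched the same way, with every MCP call stubbed out; all function names and return values are illustrative:

```python
def embed_query(query: str) -> list[float]:
    # Step 2: bge-m3 turns the query into a vector (toy vector here)
    return [float(len(query))]

def search_vectors(vector, collection: str, k: int = 5) -> list[str]:
    # Step 3: Qdrant similarity search returns candidate chunks
    return ["chunk about topic A", "chunk about topic B"]

def rerank(query: str, chunks: list[str]) -> list[str]:
    # Step 4: the reranker reorders candidates by relevance
    return sorted(chunks)  # stand-in ordering, not a real model

def generate_answer(query: str, context: str) -> str:
    # Step 5: the LLM answers using the best chunk as context
    return f"Based on: {context}"

def rag_answer(query: str, collection: str) -> str:
    # Step 1: the user's query enters the pipeline here
    best = rerank(query, search_vectors(embed_query(query), collection))[0]
    return generate_answer(query, best)
```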
Project Codebase
The MCP Registry was implemented in Python using the FastMCP library.
High-level Implementation of MCP Registry. GitHub
  • The get_server_and_tools function provides full information about servers and their tools.
  • The router function redirects requests to the appropriate server and calls the required tool. This function implements the Facade pattern, offering a unified programming interface for accessing MCP servers.
  • The health_check_servers function retrieves the operational status of MCP servers.
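A minimal sketch of these three entry points, assuming an in-memory server table; the function names mirror the source, but the bodies are illustrative (the real implementation talks to remote MCP servers over the network):

```python
SERVERS = {
    "embedder": {
        "url": "http://embed:8001",
        # tool name -> callable standing in for a remote MCP tool call
        "tools": {"embed_text": lambda text: [0.1, 0.2, 0.3]},
    },
}

def get_server_and_tools():
    """Full catalog: every registered server and its tool names."""
    return {name: sorted(s["tools"]) for name, s in SERVERS.items()}

def router(server: str, tool: str, **kwargs):
    """Facade: a single entry point that dispatches to the right server."""
    try:
        fn = SERVERS[server]["tools"][tool]
    except KeyError:
        raise KeyError(f"unknown server/tool: {server}/{tool}") from None
    return fn(**kwargs)

def health_check_servers():
    """Real code would ping each server; here every entry reports ok."""
    return {name: "ok" for name in SERVERS}
```

The Facade shape is what matters: clients only ever call `router`, so swapping a server's implementation never touches client code.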
The data preparation workflow was written using the LangGraph library and implemented according to the ETL principle.
High-level Implementation of Workflow for Data Preparation for RAG. GitHub
LangGraph was chosen instead of simpler alternatives (such as basic scripts) because it supports the State concept and enables the creation of complex, nonlinear pipelines with conditions and loops.
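To illustrate the State idea without pulling in LangGraph itself, here is a tiny stateful graph emulated in plain Python: each node reads and updates a shared state dict and names the next node, which is exactly what enables conditional branches and loops. Node names and logic are invented for the example:

```python
def node_extract(state):
    # Stand-in for a real extraction step; writes into the shared state
    state["text"] = state["raw"].upper()
    return "chunk"

def node_chunk(state):
    state["chunks"] = state["text"].split()
    # A conditional edge: retry extraction if chunking produced nothing
    return "done" if state["chunks"] else "extract"

NODES = {"extract": node_extract, "chunk": node_chunk}

def run_graph(state, start="extract"):
    current = start
    while current != "done":
        current = NODES[current](state)
    return state
```

In LangGraph the same structure is declared with typed state and explicit edges, and the framework adds persistence and streaming on top.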
High-level Implementation of Workflow for RAG Inference. GitHub
Practical Value
For businesses, our MCP Registry serves as a ready-made solution for rapid AI prototyping and deployment. It allows companies to quickly roll out working RAG systems for internal data, saving resources on expensive NLP engineers. The developed workflow prepares data and launches RAG automatically.
The tool is especially valuable for small and medium-sized businesses that lack internal resources and expertise but want to automate processes using AI — for example, documentation analysis, internal knowledge search, or customer support.
For AI engineers and researchers, the solution is not just a platform for experimentation but a production-ready framework for building complex agent systems. They can use the registry to create applications that go beyond simple RAG — such as agents capable of planning task solutions by sequentially using different tools (knowledge base search, computations, API interactions).
A key advantage of the MCP Registry is its ability to easily update or replace individual components — such as the embedding model or database — without rewriting the entire system. This makes it a reusable tool for various tasks. Moreover, the registry’s logic can be packaged into a library to standardize MCP application development, providing a ready-made framework with routing, health checks, and templates for adding new servers. This significantly accelerates integration of MCP servers into new projects, allowing developers to focus on business logic instead of infrastructure.
Example of MCP Registry Integration with Cursor
To demonstrate the MCP Tool Registry in real-world conditions, we integrated it into the popular Cursor IDE, which has built-in MCP support. With just a few steps, local files can be transformed into a fully functional RAG system.
To configure Cursor, MCP servers are connected through the mcp.json file. The configuration file includes the server address and the required credentials.
Cursor Configuration File. GitHub
The configuration file for MCP Registry specifies the URL for access and the authorization data set via environment variables.
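A sketch of what such an mcp.json entry might look like. The exact keys depend on the Cursor version, and the URL and token values below are placeholders, not the project's real configuration:

```json
{
  "mcpServers": {
    "mcp-registry": {
      "url": "http://localhost:8000/mcp",
      "env": {
        "AUTH_TOKEN": "your-bearer-token"
      }
    }
  }
}
```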
Once the configuration is applied, Cursor should detect the required MCP servers.
To prepare data, specify the file paths in the chat and request the agent to prepare data for RAG. The output will be the name of the collection containing the processed data.
Demonstration of Workflow for Data Processing via MCP in Cursor
After preparation, users can start interacting with the RAG system. The inference workflow's chat interface was implemented using the rich library. Together, the two workflows prepare the data and query it automatically.
Interface for Interacting with RAG
Conclusion
This article presented MCP Tool Registry — a central registry for managing MCP servers. The registry lowers the entry barrier for creating complex AI applications, offering businesses a ready-made tool for rapid prototyping and providing engineers with a standardized experimentation platform.
Its practical value was demonstrated with examples of automated RAG creation and integration into the Cursor IDE — reducing complex tasks to a few simple automated steps.