Blog

This paper introduces the Circuit Tracing method for detecting and analyzing hallucinations in Retrieval-Augmented Generation (RAG) systems. The study builds on two key works from Anthropic: the construction of attribution graphs for large language models (LLMs) and their application to fundamental interpretability tasks. We provide a detailed description of the Circuit Tracing methodology, the process of training a transcoder for the Qwen2.5-7B model, and the design and practical implementation of a hallucination detector.

September 16, 2025

Schema-Guided Reasoning: A New Method of Structuring Reasoning in LLMs to Reduce Errors

This work presents Schema-Guided Reasoning (SGR), a method that structures LLM reasoning with predefined schemas enforced by structured outputs and constrained decoding. We describe schema specification and enforcement, along with three orchestration patterns: Cascade for sequential workflows, Routing for branching workflows, and Cycle for iterative workflows. The method improves reproducibility, accuracy, and auditability, at the cost of reduced flexibility and added schema design overhead. We demonstrate it on automated text-to-SQL generation, where SGR reduces critical errors and enables stepwise validation.

September 15, 2025

MCP Tool Registry: Automated Creation of RAG Systems

This work presents an approach to aggregating and using MCP servers within an MCP Registry. It describes the architecture of the registry and the MCP servers that comprise it, and demonstrates the approach by automating the creation of a RAG system and integrating it into the Cursor IDE.

September 11, 2025

Content Filtering System Based on Large Language Models: Architecture, Limitations, and Prospects

This work addresses the problem of filtering unwanted content in AI-driven systems. We describe a multi-layer architecture that combines fast detection methods with deeper contextual analysis. Special attention is given to resilience against filter bypass techniques and the importance of accounting for legal and cultural differences.

September 10, 2025
