In this 6-part blog series, we will give you an overview of how we implemented a RAG system at the Science Media Center. But before we get lost in technical details, we’ll start the series with some theoretical background you should know. In this first part, we’ll look at the problems of “classical” LLMs (i.e., foundation models such as GPT, Gemini, or Claude), and why RAG can be an effective solution.
The Issues with Language Models
LLMs have two major issues with respect to their training data: temporal limitations and knowledge gaps.
Temporal Limitations
Let’s start with a simple example. We’ll use the OpenAI API to ask OpenAI’s newest model a question about the federal elections in Germany:
from openai import OpenAI

client = OpenAI()
response = client.responses.create(
    model="gpt-5-nano",
    input="When are the next federal elections in Germany?"
)
print(response.output_text)
Now, this is what the model replies:
The next federal election in Germany (Bundestagswahl) is currently scheduled for autumn 2025, since the last one took place on 26 September 2021.
German Bundestag elections are held every four years unless the parliament is dissolved earlier (which is quite rare). The exact date for 2025 has not yet been officially set — the German Federal President fixes it, but it must fall between late August and late October 2025.
If you’re familiar with German politics, you will recognize the factual mistakes. The model assumes it is still before November 2024 and therefore does not know that the 2025 federal elections already took place on February 23, 2025, seven months ahead of schedule due to the collapse of the Scholz governing coalition.
Why does this happen? LLMs are trained on static data. Their world knowledge is essentially frozen at the time of training. That means they are already “outdated” the moment they are published. For example, OpenAI’s most recent model was released in August 2025, but its training data only goes up to September 2024, creating an immediate knowledge gap.
This temporal limitation is particularly problematic for people who need access to the most current information for their work.
Knowledge Gaps
The second issue is topical rather than temporal. LLMs have domain-specific gaps: they haven’t been trained on all possible knowledge. It’s impossible (for now) to train them on the entire world’s information, and some specialized domains may be underrepresented in their training data.
Let’s try a different query:
from openai import OpenAI

client = OpenAI()
response = client.responses.create(
    model="gpt-5-nano",
    input="According to the German Außenwirtschaftsverordnung (AWV), § 75 — prohibitions on trade and brokerage transactions concerning goods listed in Part I, Section A of the Ausfuhrliste. With which countries is trade prohibited?"
)
print(response.output_text)
And the LLM replies:
Trade is prohibited for goods listed in Part I, Section A of the Ausfuhrliste to the following countries (and their control areas):
– Iran
– North Korea (DPRK)
– Syria
– Cuba

If you need, I can provide the exact legal wording or point you to the current BAFA/AWV page for the latest version.
This isn’t entirely wrong (you can check out the full list of countries here), but it’s not complete either. Several countries are missing, which indicates that the model hasn’t been trained on the full corpus of German trade law. For most users this may not matter, but if you’re explicitly working in that field, relying on such incomplete answers is risky.
From Gaps to Hallucinations
Both temporal and topical gaps contribute to probably the best-known problem with LLMs: hallucinations. When a model doesn’t know the answer, instead of saying “I don’t know”, it generates a convincing-sounding guess. The danger is that these guesses are presented fluently and confidently, making them easy to believe.
(If you’re curious about why hallucinations happen, how they are a structural property of LLMs as prediction systems, and how this connects to humans’ general preference for confident guesses over explicit admissions of ignorance, check out this paper.)
Why RAG Helps
One way to reduce these risks is retrieval-augmented generation (RAG). As the name implies, RAG consists of three steps:
Retrieve: Based on the input question, retrieve relevant text passages from a knowledge base that we define and control (more on that in the next post).
Augment: Feed those retrieved text passages into the LLM alongside the question.
Generate: Let the LLM generate an answer to the question, grounded in those retrieved text passages.
By connecting LLMs with curated, up-to-date sources, RAG compensates for both temporal and topical gaps and helps mitigate hallucinations. Think of it as giving the LLM a research assistant that can quickly find and provide relevant, current information from trusted sources.
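To make the three steps concrete, here is a minimal sketch in the style of the API calls above. It is not our production pipeline: the tiny hard-coded knowledge base and the naive keyword-overlap retrieve function are only placeholders for the real retrieval step (embeddings and vector search) that we’ll get to in the next post.

from openai import OpenAI

client = OpenAI()

# Toy in-memory knowledge base; a real system would use a curated,
# regularly updated document store.
knowledge_base = [
    "The German federal election (Bundestagswahl) was held early on "
    "23 February 2025 after the governing coalition collapsed in November 2024.",
    "The next regular Bundestagswahl is expected in 2029.",
]

def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    # Naive keyword-overlap scoring as a stand-in for a real retriever
    q_words = set(question.lower().split())
    return sorted(passages, key=lambda p: -len(q_words & set(p.lower().split())))[:k]

question = "When are the next federal elections in Germany?"

# Retrieve: find relevant passages in our own knowledge base
context = retrieve(question, knowledge_base)

# Augment: put the retrieved passages into the prompt
prompt = (
    "Answer the question using only the context below.\n\n"
    "Context:\n" + "\n".join(f"- {p}" for p in context) + "\n\n"
    f"Question: {question}"
)

# Generate: let the model answer, grounded in the retrieved context
response = client.responses.create(model="gpt-5-nano", input=prompt)
print(response.output_text)

With the retrieved passages in the prompt, the model no longer has to rely solely on its frozen training data to answer the election question.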
In the next post, we’ll look more closely at how a RAG pipeline works and how relevant text sections can be identified and retrieved from your knowledge base.