
We recently released LLM Expanded Search for our blog’s vector search. It builds on what we covered in our earlier posts about AI-powered search and vector search basics. Here’s how we built it with our internal AI setup (Open WebUI running an OpenAI-compatible API), why it makes search better, and what’s coming next.

What “LLM Expanded Search” actually does

Here’s the basic idea: when you search for something, we first ask an LLM to come up with related terms and phrases. Then we search for all of those terms, not just your original query.

This really helps with short or vague searches where regular vector search might miss the relevant context. For example, “S3” refers to Amazon S3, a cloud object storage service, but on its own “S3” doesn’t give a vector search enough context to work with. An LLM can expand the query with terms about cloud object storage in general, providing enough context to surface the posts that are actually about S3.

How it works

The frontend is pretty straightforward: our search bar has two options, “Search” (just hit Enter) and “LLM Expanded Search” (Shift/Ctrl/Command+Enter).

When you use expanded search, here’s what happens:

  1. We send your query to an LLM and ask it for related terms and phrases.
  2. We embed your original query along with the expanded terms.
  3. We search the vector index with all of them, rank the results by score, and de-duplicate so each post appears once.
  4. We render concise snippets of the top results.

Here’s a simple example of how we expand queries:

from openai import OpenAI
import os

client = OpenAI(
    base_url=os.getenv("OPENAI_BASE_URL"),   # e.g., http://openwebui.local/api/v1
    api_key=os.getenv("OPENAI_API_KEY")      # token managed in your environment
)

def expand_query(raw_query: str) -> list[str]:
    messages = [
        {
            "role": "system",
            "content": (
                "You expand a short search query into a concise, comma-separated list of "
                "synonyms and closely related phrases (8–15 items). No explanations."
            )
        },
        {"role": "user", "content": raw_query}
    ]
    res = client.chat.completions.create(
        model=os.getenv("OPENAI_MODEL", "local-llm"),
        messages=messages,
        temperature=0.2,
        max_tokens=200,
    )
    text = res.choices[0].message.content
    return [t.strip() for t in text.split(",") if t.strip()]
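
For a short query like “S3”, the expansion might come back with terms like “Amazon S3”, “object storage”, and “cloud storage buckets” (the exact list depends on the model). Calling it is straightforward:

terms = expand_query("S3")
print(terms)  # e.g. ['Amazon S3', 'object storage', ...] -- illustrative; output varies by model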

After that, we embed the original query and the expanded terms, search the vector index, then sort by score and drop duplicates so each post appears once. Finally, we render concise snippets.
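
The retrieval step itself is conceptually simple. Here’s a minimal sketch, assuming a LangChain-style vector store whose similarity_search_with_score returns (Document, score) pairs (our real index wiring differs in the details):

def retrieve(vector_store, raw_query: str, expanded_terms: list[str], k: int = 10):
    """Search the index with the original query plus each expanded term."""
    results = []
    for term in [raw_query] + expanded_terms:
        # Whether higher or lower scores are better depends on the store;
        # ours returns similarity, so higher is better.
        results.extend(vector_store.similarity_search_with_score(term, k=k))
    return results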

For example, after a similarity search you can rank and de-duplicate like this:

# given: results = [(doc, score), ...]
valid = [(d, float(s)) for d, s in results if float(s) > 0.05]
valid.sort(key=lambda x: x[1], reverse=True)  # highest score first

seen = set()
unique = []
for doc, score in valid:
    src = doc.metadata.get("source", "")
    if src not in seen:
        unique.append((doc, score))
        seen.add(src)

# unique now holds the top-ranked, de-duplicated posts
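
Snippet rendering can be as simple as trimming each matching document’s text; here’s a rough sketch (again assuming LangChain-style Document objects with a page_content attribute):

def make_snippet(doc, max_len: int = 200) -> str:
    """Build a short, single-line preview from a document's text."""
    text = " ".join(doc.page_content.split())  # collapse newlines and extra whitespace
    return text if len(text) <= max_len else text[:max_len].rstrip() + "…"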

Why we chose Open WebUI

A few reasons made Open WebUI the right choice: it exposes an OpenAI-compatible API, so the standard openai Python client works against our self-hosted models without changes, and it keeps the whole pipeline on infrastructure we already run.

What’s next: Moving more into Open WebUI

We’re looking into moving more of the search pipeline directly into Open WebUI workflows:

  1. Query expansion (LLM)
  2. Vector retrieval (custom tool that hits our index)

This would give us tighter integration, fewer network calls, and simpler deployment, while making it easier to try new approaches.
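
To give a feel for what that could look like: Open WebUI tools are written as plain Python classes, so a retrieval tool might be shaped roughly like the sketch below (class and method names are hypothetical, not our implementation):

def _search_index(query: str) -> list[tuple[str, str]]:
    """Placeholder for a call into our vector index; returns (title, url) pairs."""
    return []

class Tools:
    def search_blog_posts(self, query: str) -> str:
        """
        Search the blog's vector index for posts related to the query.
        :param query: the search query, possibly already LLM-expanded
        """
        hits = _search_index(query)
        if not hits:
            return "No matching posts found."
        return "\n".join(f"{title}: {url}" for title, url in hits)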

What you’ll notice when using it

Give it a try at our blog. Just use the search bar in our header: press Enter for regular search, or Shift/Ctrl/Command+Enter for LLM Expanded Search.

Want to know more about why we built this? Check out the announcement and vector search posts linked above.

If you’re interested in setting up LLM-expanded vector search or running something similar self-hosted with Open WebUI, we’d love to help out.
