Build a searchable knowledge base from your documents

compliance · intermediate · proven

The problem

You've got policies, procedures, training materials, and FAQs scattered across documents. When staff have questions ('What's our lone working policy?', 'How do I process a safeguarding referral?'), they either ask a colleague or spend ages searching through PDFs. The information exists, but it's not findable.

The solution

Use NotebookLM or a RAG (Retrieval Augmented Generation) system to create a searchable knowledge base. Upload all your documents, and staff can ask questions in plain English. The AI finds the relevant sections, synthesises an answer, and cites which document it came from. No need to remember which PDF has what - just ask.

What you get

A conversational interface where staff ask questions and get answers pulled from your documents with citations. 'What's our data retention policy?' returns the relevant policy section plus which document and page it came from. Staff get instant answers, and you reduce repetitive questions to managers.

Before you start

  • Your policies, procedures, and reference materials in digital format (PDF, Word, or text)
  • For NotebookLM: a Google account (easiest option)
  • For custom RAG: Python skills and an OpenAI API key

When to use this

  • Staff frequently ask the same questions that are answered in documentation
  • Your knowledge is scattered across many documents
  • Onboarding new staff takes ages because there's so much to learn
  • You want to reduce the load on managers answering procedural questions

When not to use this

  • Your procedures change daily - the knowledge base would always be out of date
  • You've only got a handful of simple policies - might be quicker to just bookmark them
  • Your documents are so poorly organised that even AI can't make sense of them
  • The questions people ask require human judgement, not procedure lookup

Steps

  1. Gather your knowledge documents

    Collect all the documents staff need to reference: policies, procedures, training guides, FAQs, org charts, contact lists. Get them into digital format. For NotebookLM you can use PDFs, Word docs, or text. Aim for clarity over volume - 20 good documents beat 100 messy ones. If you plan to build the custom RAG option later, see the PDF text-extraction sketch after these steps.

  2. Quick option: Use NotebookLM

    Go to NotebookLM, create a new notebook, and upload your documents as 'sources'. That's it. You can now ask questions and it'll pull answers from across all documents, citing where the information came from. Share the notebook with your team. This is the fastest path to a working knowledge base.

  3. Test with real staff questions

    Ask the questions staff actually ask: 'How do I book annual leave?', 'What do I do if someone makes a safeguarding disclosure?', 'What's our social media policy?'. Check that answers are accurate and cite the right documents. If it struggles, your documents might need better structure.

  4. Refine document organisation if needed

    If the AI gives wrong answers or can't find information, check your documents. Are policies clearly titled? Are sections well-organised? Sometimes adding a table of contents or clearer headings helps the AI find the right information.

  5. Build a custom RAG for integration (optional)

    If you need this integrated into your own systems (intranet, Teams bot, etc.), you'll need a custom RAG implementation using the OpenAI API and a vector database. The example code shows the basics. This is more work but gives you more control.

  6. Keep documents updated

    Set a reminder to update the knowledge base when policies change. Delete old versions, upload new ones. The AI is only as good as the documents you give it. If policies are 3 years out of date, the answers will be wrong.

  7. Train staff how to ask good questions (optional)

    Show staff how to use it: be specific ('What's the process for closing a safeguarding case?' not just 'safeguarding'), check the citations, and escalate to a human when the answer isn't clear. This is a tool to reduce simple lookups, not replace human judgement.
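
If you take the custom RAG route, step 1 usually means pulling plain text out of your PDFs before anything can be indexed. Here is a minimal sketch using pypdf (already in the pip install line below); the filename is just an example, and scanned image-only PDFs will need OCR instead.

# Requires: pip install pypdf
from pypdf import PdfReader

# Example filename - substitute one of your own policy documents
reader = PdfReader("lone-working-policy.pdf")

# Join the text from every page; extract_text() can return None for image-only pages
text = "\n".join(page.extract_text() or "" for page in reader.pages)

# Spot-check the extraction before indexing it
print(text[:500])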

Example code

Basic RAG implementation with OpenAI

This is a minimal RAG system if you want to build your own. For most charities, NotebookLM is easier.

# This requires: pip install openai chromadb pypdf

import os

from openai import OpenAI
import chromadb
from chromadb.utils import embedding_functions

# The OpenAI client reads OPENAI_API_KEY from your environment
client = OpenAI()

# Set up the vector database with OpenAI embeddings
chroma_client = chromadb.Client()
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.environ.get("OPENAI_API_KEY"),
    model_name="text-embedding-3-small"
)

collection = chroma_client.create_collection(
    name="policies",
    embedding_function=openai_ef
)

# Add documents (simplified - in practice you'd chunk long docs)
documents = [
    "Our data retention policy states that service records must be kept for 7 years...",
    "Lone working policy: Staff must not conduct home visits alone if risk assessment indicates...",
    "Safeguarding referral process: 1) Ensure immediate safety. 2) Record disclosure verbatim..."
]

# Index documents
collection.add(
    documents=documents,
    ids=[f"doc_{i}" for i in range(len(documents))]
)

def ask_question(question):
    # 1. Find relevant documents
    results = collection.query(
        query_texts=[question],
        n_results=3
    )

    relevant_docs = results['documents'][0]

    # 2. Generate answer using relevant context
    context = "\n\n".join(relevant_docs)

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant that answers questions based on the provided policy documents. Always cite which policy section you're referencing."
            },
            {
                "role": "user",
                "content": f"""Answer this question based on our policies:

{question}

Relevant policy sections:
{context}

Provide an answer and cite which section you're using."""
            }
        ]
    )

    return response.choices[0].message.content

# Example usage
answer = ask_question("What's our lone working policy?")
print(answer)

# In practice you'd add:
# - Document chunking for long PDFs
# - Metadata (document name, section, date)
# - User interface (web app, Teams bot, etc.)
# - Access controls
# - Update mechanisms
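
As a follow-on from the comments above, here is one way chunking, metadata, and updates could look. This is a sketch, not a drop-in module: it reuses the `collection` and the pypdf install from the example above, and the `chunk_text` and `reindex_document` helpers (and their chunk sizes and filenames) are made up for illustration.

from pypdf import PdfReader

def chunk_text(text, chunk_size=1000, overlap=100):
    # Naive fixed-size chunking with a little overlap so sentences that
    # straddle a boundary still appear whole in at least one chunk
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def reindex_document(doc_name, pdf_path):
    # Remove any previously indexed chunks for this document, then add the
    # current version - this is the 'update mechanism' from the list above
    collection.delete(where={"source": doc_name})

    reader = PdfReader(pdf_path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)

    chunks = chunk_text(text)
    collection.add(
        documents=chunks,
        metadatas=[{"source": doc_name, "chunk": i} for i in range(len(chunks))],
        ids=[f"{doc_name}_{i}" for i in range(len(chunks))]
    )

# Example usage when a policy is revised
reindex_document("lone-working-policy", "policies/lone-working-policy-v3.pdf")

With metadata attached, the query inside ask_question can also read results['metadatas'], so answers can cite the document name rather than just quoting the text.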

Tools

NotebookLM · service · free
OpenAI API · service · paid

Resources

At a glance

Time to implement: hours
Setup cost: free
Ongoing cost: free
Cost trend: stable
Organisation size: small, medium, large
Target audience: operations-manager, it-technical, ceo-trustees

NotebookLM is free and handles most use cases. Custom RAG costs ~£0.001 per query plus setup time. Main cost is organising your documents.
