Create an AI assistant with web search and document tools
The problem
Your team wastes hours researching grant opportunities, finding sector benchmarks, or answering policy questions from your 50-page handbook. Each time someone asks 'Are there grants for youth mental health?' or 'What's our safeguarding escalation process?', it's a manual research task. You want an AI assistant that can actually search the web and your documents to find accurate answers.
The solution
Build an AI assistant with tools for web search and document retrieval. When asked a question, the AI decides whether to search online (for grants, benchmarks, news) or search your internal documents (policies, reports, case studies). It retrieves relevant information, synthesises it, and provides sourced answers. This is research-as-a-service for your team.
What you get
A conversational research assistant that answers questions by searching the web and your documents. Example uses: Grant research bot that finds relevant funding opportunities and explains eligibility. Policy Q&A that cites specific sections of your handbook. Impact research assistant that finds sector benchmarks. All answers include sources so you can verify.
Before you start
- Clear use case: what will people ask this assistant?
- For web search: API key for search service (SerpAPI, Brave Search API, or similar)
- For document search: Your documents in searchable format (PDFs, Word docs, or text files)
- API key from OpenAI or Anthropic
- Either: n8n account OR Python environment for custom build
When to use this
- Team regularly asks research questions with findable answers (grants, policies, sector data)
- You have documents that need to be searchable (policy handbooks, reports, procedures)
- Research tasks are time-consuming but follow patterns
- Answers can be verified (you need sources cited, not just AI opinions)
- You want to democratise access to knowledge without everyone reading 50-page handbooks
When not to use this
- Questions require expert judgement, not just information retrieval
- Documents change constantly (assistant would give outdated answers)
- Fewer than 10-20 research queries per week (manual is fine)
- You need 100% accuracy (AI can misinterpret sources - always verify critical info)
- Documents contain highly confidential information you can't expose via chatbot
- Web searches would surface sensitive organisational information
Steps
1. Define your use case and gather sources
Choose ONE use case to start: Grant research assistant OR Policy Q&A OR Sector benchmark finder. Don't try to do everything at once. Gather your sources: For grants, you'll search the web. For policies, collect your handbook PDFs. For benchmarks, identify trusted websites (NCVO, Charity Commission, sector bodies).
2. Choose your tech stack
Easiest: n8n with AI Agent node + Google Search tool or Document Search tool. No code required, visual workflow builder. Reference: https://docs.n8n.io/integrations/builtin/cluster-nodes/root-nodes/n8n-nodes-langchain.agent/. More control: Python with LangChain for custom agents and tools. Choose based on technical skills and customisation needs.
3. Set up web search tool
Get API key from SerpAPI, Brave Search, or use n8n's built-in search. Test it manually: search for 'youth mental health grants UK' and check you get relevant results. Configure: how many results to retrieve (5-10), what to extract (title, snippet, URL). This becomes a tool the AI can call.
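If you want to sanity-check the search API on its own before wiring it to the AI, a short script like the sketch below will do. It assumes SerpAPI's JSON endpoint and its organic_results field names (title, snippet, link); Brave Search uses different parameters, so adjust accordingly.

# Minimal sketch: test the web search tool manually before connecting it to the AI.
# Assumes SerpAPI's JSON endpoint; needs SERPAPI_API_KEY set in your environment.
import os
import requests

def web_search(query, num_results=5):
    """Return title, snippet and URL for the top organic results."""
    response = requests.get(
        "https://serpapi.com/search.json",
        params={"q": query, "num": num_results, "api_key": os.environ["SERPAPI_API_KEY"]},
        timeout=30,
    )
    response.raise_for_status()
    results = response.json().get("organic_results", [])
    return [{"title": r.get("title"), "snippet": r.get("snippet"), "url": r.get("link")} for r in results]

# Quick manual check: are the results actually relevant?
for hit in web_search("youth mental health grants UK"):
    print(hit["title"], "-", hit["url"])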
4. Set up document search tool
Index your documents: Convert PDFs/Word docs to searchable text, split into chunks (paragraphs or sections), create embeddings (numerical representations for semantic search). n8n: Use Vector Store nodes. Custom code: Use LangChain document loaders + FAISS or Chroma vector store. Test: search for a phrase and verify you get relevant chunks.
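The sketch below is a condensed version of that indexing pipeline, reduced to one file so you can test retrieval quickly (the full Policy Q&A example further down does the same for a whole folder). The file name is a placeholder; it assumes the classic LangChain imports and an OPENAI_API_KEY in your environment.

# Minimal sketch: index one PDF and test retrieval. The file name is a placeholder.
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

pages = PyPDFLoader("volunteer-handbook.pdf").load()  # one document per page
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(pages)
store = FAISS.from_documents(chunks, OpenAIEmbeddings())  # embedding calls need OPENAI_API_KEY

# Test: does a phrase you know is in the handbook come back as a relevant chunk?
for doc in store.similarity_search("safeguarding escalation", k=3):
    print(doc.metadata.get("page"), "-", doc.page_content[:150])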
5. Connect AI with both tools
Configure AI agent with access to both search tools. Provide clear tool descriptions: 'web_search: Search the internet for current information about grants, sector news, or benchmarks' and 'document_search: Search our internal policy handbook and reports'. The AI will decide which tool(s) to use based on the question.
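As a rough sketch of how both tools sit behind one agent: the code below uses the same classic LangChain API as the examples further down, and assumes the vectorstore built in step 4 plus OPENAI_API_KEY and SERPAPI_API_KEY in your environment.

# Sketch: one agent, two tools. The tool names and descriptions are what the model
# uses to decide which tool to call, so keep them specific.
from langchain.agents import initialize_agent, AgentType, Tool
from langchain.chat_models import ChatOpenAI
from langchain.utilities import SerpAPIWrapper

llm = ChatOpenAI(model="gpt-4o", temperature=0.3)
web = SerpAPIWrapper()

def search_documents(query: str) -> str:
    """Return the top matching chunks from the internal document index."""
    docs = vectorstore.similarity_search(query, k=3)  # vectorstore built in step 4
    return "\n\n".join(d.page_content for d in docs)

tools = [
    Tool(
        name="web_search",
        func=web.run,
        description="Search the internet for current information about grants, sector news, or benchmarks."
    ),
    Tool(
        name="document_search",
        func=search_documents,
        description="Search our internal policy handbook and reports."
    ),
]

agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True, handle_parsing_errors=True)
print(agent.run("What's our volunteer safeguarding policy?"))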
6. Test with realistic questions
Ask questions your team would actually ask: 'Find grants for youth mental health in London', 'What's our volunteer safeguarding policy?', 'What's the sector average for admin costs?'. Check: Does it use the right tool? Are sources relevant? Is the answer accurate? Common issues: AI searches when it should know, or doesn't search when it should.
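One way to check tool choice systematically is a small test harness that records which tool the agent called for each question. The sketch below assumes the two-tool setup from step 5 and rebuilds the agent with return_intermediate_steps=True so the tool calls are visible; the expected-tool labels are illustrative.

# Sketch: run test questions and report which tool the agent actually used.
from langchain.agents import initialize_agent, AgentType

agent = initialize_agent(
    tools, llm,  # tools and llm from the step 5 setup
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    return_intermediate_steps=True,
    handle_parsing_errors=True,
)

test_cases = [
    ("Find grants for youth mental health in London", "web_search"),
    ("What's our volunteer safeguarding policy?", "document_search"),
]

for question, expected_tool in test_cases:
    result = agent({"input": question})
    tools_used = [action.tool for action, _ in result["intermediate_steps"]]
    status = "PASS" if expected_tool in tools_used else "CHECK"
    print(f"{status}: {question} -> tools used: {tools_used}")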
7. Add source citation and verification
Critical: Make the AI cite sources. Prompt: 'Always provide sources for your answers. For web search: include URLs. For documents: cite the document name and section.' Add a 'verify this' button in your interface so users can check sources. Never present AI answers as fact without attribution.
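A simple way to enforce this is to prepend the citation rules to every request rather than relying on users to ask. The wording below is illustrative, and the agent is the two-tool agent from step 5.

# Sketch: bake the citation requirement into every request.
CITATION_RULES = """Always provide sources for your answers.
- For web search results: include the URL of each page you relied on.
- For internal documents: cite the document name and section or page.
- If you cannot find a source, say so rather than guessing."""

def ask_with_sources(question: str) -> str:
    return agent.run(f"{CITATION_RULES}\n\nQuestion: {question}")

print(ask_with_sources("Are there grants for youth mental health in London?"))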
8. Build simple interface and gather feedback
n8n: Expose as webhook or chat widget. Custom code: Build Streamlit interface or Slack bot. Launch to 3-5 users first. Gather feedback: What questions work well? What fails? Are sources helpful? Iterate based on real use. Don't worry about polish yet - focus on whether it saves time.
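For a quick pilot interface, something like the Streamlit sketch below is enough. Here answer_question is a placeholder for whichever function you built in the earlier steps; it should return the answer text with sources included.

# Minimal Streamlit sketch for a pilot interface. Run with: streamlit run app.py
import streamlit as st

def answer_question(question: str) -> str:
    # Placeholder: call your agent or QA chain here and return the answer text.
    return "..."

st.title("Research assistant (pilot)")
question = st.text_input("Ask about grants, policies or sector benchmarks")
if question:
    with st.spinner("Searching..."):
        st.write(answer_question(question))
    st.caption("Check the cited sources before acting on an answer.")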
Example code
Grant research assistant with web search
Grant research assistant using LangChain and web search. Install: pip install langchain openai google-search-results. Note: the imports follow the classic LangChain API; recent releases (0.1+) move them into the langchain-community and langchain-openai packages, so adapt the imports if you are on a newer version.
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType
from langchain.chat_models import ChatOpenAI
from langchain.utilities import SerpAPIWrapper
import os
# Configuration
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["SERPAPI_API_KEY"] = "your-serpapi-key"
# Initialize LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0.3)
# Set up web search tool
search = SerpAPIWrapper()
tools = [
    Tool(
        name="Web Search",
        func=search.run,
        description="Search the internet for information about grants, funding opportunities, or sector news. Use this for questions about current grant programmes, funder priorities, or deadlines."
    )
]
# Create agent
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
    handle_parsing_errors=True
)
# Custom prompt for grant research
prefix = """You are a grant research assistant for a UK charity. Your job is to find relevant grant opportunities and funding information.
When answering:
1. Always search for current information (grants change frequently)
2. Cite your sources with URLs
3. Summarise eligibility criteria clearly
4. Note application deadlines if found
5. If information is unclear, say so
Answer the question below:"""
def find_grants(question):
    """Research grants using web search"""
    full_prompt = f"{prefix}\n\n{question}"
    response = agent.run(full_prompt)
    return response
# Example usage
questions = [
    "Find grant opportunities for youth mental health projects in London",
    "What are Comic Relief's current funding priorities?",
    "Are there any grants for refugee support closing in the next month?"
]
for q in questions:
    print(f"\nQuestion: {q}")
    print("\nResearch:")
    print(find_grants(q))
    print("\n" + "=" * 80)

Policy Q&A bot with document search
Policy Q&A using document search with citations. Install: pip install langchain openai faiss-cpu pypdf (the same note about LangChain versions applies).
from langchain.document_loaders import PyPDFLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
import os
# Configuration
os.environ["OPENAI_API_KEY"] = "your-api-key"
DOCS_FOLDER = "./policy-docs" # Folder with your PDFs
# Load and process documents
print("Loading policy documents...")
loader = DirectoryLoader(
    DOCS_FOLDER,
    glob="**/*.pdf",
    loader_cls=PyPDFLoader
)
documents = loader.load()
# Split into chunks for better retrieval
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
chunks = text_splitter.split_documents(documents)
print(f"Loaded {len(documents)} documents, split into {len(chunks)} chunks")
# Create vector store for semantic search
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(chunks, embeddings)
# Create Q&A chain
llm = ChatOpenAI(model="gpt-4o", temperature=0.2)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True
)
def ask_policy_question(question):
    """Answer question using policy documents"""
    result = qa_chain({"query": question})
    answer = result["result"]
    sources = result["source_documents"]
    print(f"\nAnswer: {answer}\n")
    print("Sources:")
    for i, doc in enumerate(sources, 1):
        source_file = doc.metadata.get("source", "Unknown")
        page = doc.metadata.get("page", "?")
        print(f"  {i}. {source_file} (page {page})")
        print(f"     Extract: {doc.page_content[:200]}...\n")
# Example usage
policy_questions = [
    "What is our safeguarding escalation process?",
    "What are the requirements for volunteer DBS checks?",
    "What expenses can volunteers claim?"
]

for q in policy_questions:
    print(f"\nQuestion: {q}")
    ask_policy_question(q)
    print("=" * 80)

Tools and resources
- Build AI agents with web search and document tools in n8n (low-code).
- LangChain retrieval agents (tutorial): Building agents with document retrieval and web search tools.
- SerpAPI for web search (documentation): Google search API with generous free tier.
- Building a RAG chatbot (tutorial): Retrieval-augmented generation for Q&A over documents.
At a glance
- Time to implement: days
- Setup cost: low
- Ongoing cost: low
- Cost trend: stable
- Organisation size: small, medium, large
- Target audience: operations-manager, fundraising, program-delivery, it-technical
n8n free tier works for testing. SerpAPI: 100 free searches/month, then $50/month for 5000 searches. LLM costs: £0.02-0.10 per conversation depending on model and sources retrieved. For 50 queries/day (roughly 1,500 queries/month), expect £30-150/month total. Self-hosting n8n and using open search APIs reduces costs significantly.