Build a conversational data analyst with tool use
The problem
Your team asks questions like 'How many donors gave more than £500 last quarter?' or 'Which services had the most referrals in January?' You either answer these manually (slow) or they learn SQL (unrealistic). You want an AI assistant they can chat with that actually queries your data and gives accurate answers, rather than guessing from the wording of the question.
The solution
Build an AI assistant with 'tool use' (also called 'function calling'). The LLM can call functions you define - like executing SQL queries, reading CSV files, or searching documents. When asked a question, it decides which tools to use, executes them, and answers based on real data. This is chat as an interface to your systems.
What you get
A conversational assistant (chatbot, Slack bot, or simple web interface) that answers data questions by actually querying your database or files. Users ask questions in plain English; the AI works out what data to fetch, executes the query, and explains the results. Because answers come from real queries, accuracy depends on your data quality rather than on what the model happens to remember.
Before you start
- Data in queryable format (SQL database, CSV files, or structured documents)
- Clear understanding of what questions users need answered
- API key from OpenAI or Anthropic (£10-20 to start)
- Either: n8n account (low-code approach) OR Python/Node skills (custom code approach)
- Time to define and test the tools/functions the AI can call
When to use this
- Staff regularly ask data questions you have to answer manually
- Data exists in structured format (database, spreadsheets, documented APIs)
- Questions follow patterns (donor stats, service numbers, financial summaries)
- You want to democratise data access without teaching everyone SQL
- Data governance allows AI to query your systems (with appropriate access controls)
When not to use this
- Data is unstructured or poorly documented (AI can't query chaos)
- Questions require complex judgement, not just data lookup
- Data contains highly sensitive information without row-level security
- Fewer than 10-20 queries per week (manual answering is fine)
- You need audit trails for every query (tool use can be logged but adds complexity)
- Data changes so frequently that results are immediately stale
Steps
1. Map out your data and common questions
List what data you have (donor database, service records, financial data) and where it lives (Salesforce, CSV exports, MySQL database). Write down 10-15 common questions staff ask: 'How many new donors last month?', 'What's our average donation?', 'Which programme had the most participants?'. This defines which tools you'll build; a rough mapping like the sketch below helps.
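If it helps, draft that mapping as a simple lookup before writing any real code. The tool names here are hypothetical placeholders, not a fixed API:

# Hypothetical question-to-tool map, drafted before any real code exists.
# Each common question should point at one candidate tool; a question that
# fits no tool cleanly is a sign your tool list needs another entry.
QUESTION_TO_TOOL = {
    "How many new donors last month?": "get_donor_count",
    "What's our average donation?": "get_donation_stats",
    "Which programme had the most participants?": "get_programme_stats",
}

for question, tool in QUESTION_TO_TOOL.items():
    print(f"{tool:22} <- {question}")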
2. Choose your approach: n8n (low-code) or custom code
n8n approach: a visual workflow builder with an AI agent node. Easier to set up, less flexible. Good for connecting to common tools (Airtable, Google Sheets, PostgreSQL) and for teams without developers. Custom code approach: more control and can handle complex logic, but requires Python/Node skills. Choose based on your technical capacity and data sources.
3. Define your first tool/function
Start with one simple tool. Example: 'get_donor_count', which queries your database for donor numbers in a date range. Define: (1) the function name, (2) the parameters it accepts (start_date, end_date), (3) what it returns (a number). In n8n, create a workflow that executes the SQL. In custom code, write a Python function that queries your database. Test that it works manually first, as in the sketch below.
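A minimal sketch of that first tool, tested by hand before any AI is involved. It assumes a SQLite database with a donations table (donor_id, donation_date), matching the full example further down; adapt the connection and schema to your own data.

import sqlite3

def get_donor_count(start_date, end_date):
    """Count distinct donors between two dates (YYYY-MM-DD)."""
    conn = sqlite3.connect("charity_data.db")
    try:
        cursor = conn.execute(
            "SELECT COUNT(DISTINCT donor_id) FROM donations "
            "WHERE donation_date BETWEEN ? AND ?",
            (start_date, end_date),
        )
        return cursor.fetchone()[0]
    finally:
        conn.close()

# Manual test: compare against a number you already trust before
# wiring this function up to the model.
print(get_donor_count("2024-10-01", "2024-10-31"))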
4. Connect the AI with tool use capabilities
n8n: use the AI Agent node (LangChain-based) and add your tool as a workflow connection. Reference: https://docs.n8n.io/integrations/builtin/cluster-nodes/root-nodes/n8n-nodes-langchain.agent/. Custom code: use OpenAI function calling or the Anthropic tool use API, providing your function definition to the model. The AI can then decide when to call your tool.
5. Test with real questions
Ask your assistant: 'How many donors did we have in October 2024?' The AI should: (1) recognise it needs data, (2) call your get_donor_count tool with the right dates, (3) receive the result, (4) answer the question. If it fails, check that your function definition is clear, that the model has enough context about what data you hold, and that error handling works. A rough regression check is sketched below.
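One rough way to automate this check, assuming you add a small logging hook so each tool call appends (name, arguments) to a CALL_LOG list. Neither that hook nor the expected values below are part of the example code; they are illustrative.

# Run known questions and assert the model picked the right tool with the
# right dates. Assumes your tool functions append (name, args) to CALL_LOG.
CALL_LOG = []

EXPECTED = [
    ("How many donors did we have in October 2024?",
     "get_donor_count",
     {"start_date": "2024-10-01", "end_date": "2024-10-31"}),
]

for question, want_tool, want_args in EXPECTED:
    CALL_LOG.clear()
    chat_with_data(question)  # from the example code below
    assert (want_tool, want_args) in CALL_LOG, (
        f"{question!r}: expected {want_tool}({want_args}), got {CALL_LOG}"
    )
print("All tool-selection checks passed")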
6. Add more tools iteratively
Add one tool at a time: get_donation_total, search_case_notes, get_service_stats. Test each thoroughly before adding the next. The assistant becomes more capable as it gains tools, but too many at once confuses it. Aim for 3-5 core tools that cover 80% of questions; you can always add more later.
7. Build a simple interface
n8n: expose the workflow as a webhook (API endpoint) or connect it to Slack/Teams. Custom code: build a simple Streamlit app for a web interface, or integrate with Slack. Start simple: the command line is fine for testing. Polish the interface after you've validated that tool use works reliably; a minimal Streamlit sketch follows.
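A minimal Streamlit sketch, assuming chat_with_data from the example code below lives in a module called analyst (the module name is an assumption). Run it with: streamlit run app.py

import streamlit as st
from analyst import chat_with_data  # your own module; name is illustrative

st.title("Data assistant")

# Keep the conversation visible across Streamlit reruns
if "history" not in st.session_state:
    st.session_state.history = []

for q, a in st.session_state.history:
    st.chat_message("user").write(q)
    st.chat_message("assistant").write(a)

question = st.chat_input("Ask a question about our data")
if question:
    answer = chat_with_data(question)
    st.session_state.history.append((question, answer))
    st.chat_message("user").write(question)
    st.chat_message("assistant").write(answer)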
8. Add guardrails and monitoring
Critical: log every query the AI makes (which tools it called, with what parameters). Set limits: maximum queries per user per day, and which tables or data the AI can access. Add human confirmation for potentially destructive operations. Monitor for unusual query patterns, errors, and misuse. This is not optional for production use; one way to wrap tools with logging and limits is sketched below.
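One way to bolt on the logging and limits described above is a small wrapper around each tool function. The cap value and log file here are placeholders to adapt to your setup.

import json
import logging
from datetime import date

logging.basicConfig(filename="tool_calls.log", level=logging.INFO)
MAX_CALLS_PER_DAY = 200  # placeholder cap; tune to your usage
_calls_today = {"date": date.today(), "count": 0}

def guarded(tool_fn):
    """Wrap a tool so every call is logged and a daily cap is enforced."""
    def wrapper(**kwargs):
        today = date.today()
        if _calls_today["date"] != today:
            _calls_today.update(date=today, count=0)
        if _calls_today["count"] >= MAX_CALLS_PER_DAY:
            return {"error": "Daily query limit reached"}
        _calls_today["count"] += 1
        logging.info("tool=%s args=%s", tool_fn.__name__, json.dumps(kwargs))
        return tool_fn(**kwargs)
    return wrapper

# Wrap tools before handing them to the model, e.g.:
# available_functions = {"get_donor_count": guarded(get_donor_count)}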
Example code
Simple data analyst with OpenAI function calling
Basic conversational analyst using OpenAI function calling. AI queries your database to answer questions accurately.
from openai import OpenAI
import sqlite3
import json
from datetime import datetime

# Configuration (openai>=1.0 client; the old openai.ChatCompletion API is removed)
client = OpenAI(api_key='your-api-key')
DB_PATH = 'charity_data.db'  # Your SQLite database
# Define tools the AI can use
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_donor_count",
            "description": "Get the number of donors in a date range",
            "parameters": {
                "type": "object",
                "properties": {
                    "start_date": {
                        "type": "string",
                        "description": "Start date in YYYY-MM-DD format"
                    },
                    "end_date": {
                        "type": "string",
                        "description": "End date in YYYY-MM-DD format"
                    }
                },
                "required": ["start_date", "end_date"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_total_donations",
            "description": "Get total donation amount in a date range",
            "parameters": {
                "type": "object",
                "properties": {
                    "start_date": {"type": "string"},
                    "end_date": {"type": "string"}
                },
                "required": ["start_date", "end_date"]
            }
        }
    }
]
# Implement the actual functions
def get_donor_count(start_date, end_date):
    """Query database for donor count"""
    conn = sqlite3.connect(DB_PATH)
    cursor = conn.cursor()
    cursor.execute("""
        SELECT COUNT(DISTINCT donor_id)
        FROM donations
        WHERE donation_date BETWEEN ? AND ?
    """, (start_date, end_date))
    count = cursor.fetchone()[0]
    conn.close()
    return count

def get_total_donations(start_date, end_date):
    """Query database for total donations"""
    conn = sqlite3.connect(DB_PATH)
    cursor = conn.cursor()
    cursor.execute("""
        SELECT SUM(amount)
        FROM donations
        WHERE donation_date BETWEEN ? AND ?
    """, (start_date, end_date))
    total = cursor.fetchone()[0] or 0
    conn.close()
    return total

# Map function names to actual functions
available_functions = {
    "get_donor_count": get_donor_count,
    "get_total_donations": get_total_donations
}
def chat_with_data(user_question):
    """Main conversation loop with tool use"""
    messages = [
        {"role": "system", "content": (
            "You are a data analyst assistant. You can query our donor "
            "database to answer questions. Today's date is "
            + datetime.now().strftime("%Y-%m-%d")
        )},
        {"role": "user", "content": user_question}
    ]

    # Call the model with tools available
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        tool_choice="auto"
    )
    response_message = response.choices[0].message

    # Check if the AI wants to call a function
    if response_message.tool_calls:
        # Append the assistant turn once, then one tool result per call
        messages.append(response_message)
        for tool_call in response_message.tool_calls:
            function_name = tool_call.function.name
            function_args = json.loads(tool_call.function.arguments)
            print(f"AI is calling: {function_name}({function_args})")

            # Execute the function
            function_response = available_functions[function_name](**function_args)
            print(f"Function returned: {function_response}")

            # Send the function result back to the AI
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": str(function_response)
            })

        # Get final response with function results
        second_response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages
        )
        return second_response.choices[0].message.content
    else:
        return response_message.content
# Example usage
questions = [
    "How many donors did we have in October 2024?",
    "What was our total donation amount in Q3 2024?",
    "Compare donor numbers between September and October 2024"
]

for question in questions:
    print(f"\nQ: {question}")
    answer = chat_with_data(question)
    print(f"A: {answer}")

Data analyst with Claude tool use
Conversational analyst using Claude tool use. Queries CSV data to answer questions.
import anthropic
import pandas as pd
import json
# Configuration
client = anthropic.Anthropic(api_key='your-api-key')
CSV_PATH = 'donor_data.csv'
# Load data
df = pd.read_csv(CSV_PATH)
# Define tools for Claude
tools = [
    {
        "name": "query_donor_data",
        "description": "Query the donor database. Returns matching records as JSON. Available columns: donor_id, name, total_given, last_donation_date, status",
        "input_schema": {
            "type": "object",
            "properties": {
                "filter_column": {
                    "type": "string",
                    "description": "Column to filter on (e.g., 'status', 'total_given')"
                },
                "filter_value": {
                    "type": "string",
                    "description": "Value to match"
                },
                "aggregate": {
                    "type": "string",
                    "enum": ["count", "sum", "mean", "none"],
                    "description": "Aggregation to perform"
                }
            },
            "required": ["aggregate"]
        }
    }
]
def query_donor_data(filter_column=None, filter_value=None, aggregate="none"):
    """Query donor data from CSV"""
    data = df.copy()

    # Apply filter if specified
    if filter_column and filter_value:
        data = data[data[filter_column].astype(str) == filter_value]

    # Apply aggregation (cast numpy scalars to plain floats for JSON)
    if aggregate == "count":
        return len(data)
    elif aggregate == "sum":
        return float(data['total_given'].sum())
    elif aggregate == "mean":
        return float(data['total_given'].mean())
    else:
        return data.to_dict('records')[:10]  # Return first 10 records
def chat_with_claude(user_message):
    """Chat with Claude using tool use"""
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )

        # Check if Claude wants to use a tool
        if response.stop_reason == "tool_use":
            # Extract tool use
            tool_use_block = next(
                block for block in response.content
                if block.type == "tool_use"
            )
            tool_name = tool_use_block.name
            tool_input = tool_use_block.input
            print(f"Claude is calling: {tool_name}({tool_input})")

            # Execute the tool
            if tool_name == "query_donor_data":
                result = query_donor_data(**tool_input)
            else:
                result = {"error": "Unknown tool"}
            print(f"Tool returned: {result}")

            # Append the assistant turn and tool result so the conversation
            # accumulates across multiple tool calls instead of being rebuilt
            messages.append({"role": "assistant", "content": response.content})
            messages.append({
                "role": "user",
                "content": [
                    {
                        "type": "tool_result",
                        "tool_use_id": tool_use_block.id,
                        "content": json.dumps(result)
                    }
                ]
            })
        else:
            # Final response: return the first text block
            return next(
                block.text for block in response.content
                if hasattr(block, "text")
            )
# Example
print(chat_with_claude("How many active donors do we have?"))
print(chat_with_claude("What's the average donation amount?"))
Resources
- n8n AI Agent node (documentation): official guide to building AI agents with tool use in n8n.
- OpenAI function calling guide (documentation): how to give GPT models access to custom functions.
- Anthropic Claude tool use (documentation): how to use tool calling with Claude models.
- Building a SQL query agent (tutorial): OpenAI Cookbook example of an AI that queries databases.
At a glance
- Time to implement: days
- Setup cost: low
- Ongoing cost: low
- Cost trend: stable
- Organisation size: small, medium, large
- Target audience: operations-manager, data-analyst, it-technical, ceo-trustees, fundraising
n8n's free tier is adequate for testing and small teams, and self-hosted n8n is free. API costs depend on usage: typically £0.01-0.05 per conversation. At 100 queries/day, expect £30-150/month depending on the model (GPT-4o-mini is cheaper; GPT-4 is more capable). Claude Sonnet is a good middle ground.
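A back-of-envelope check of that range, using the per-conversation costs quoted above over a 30-day month:

queries_per_day = 100
for cost_per_conversation in (0.01, 0.05):  # £ per conversation, from above
    monthly = queries_per_day * cost_per_conversation * 30
    print(f"£{cost_per_conversation:.2f}/conversation -> £{monthly:.0f}/month")
# -> £30/month and £150/month, matching the £30-150 range quoted above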