
Build a conversational data analyst with tool use

operations · intermediate · emerging

The problem

Your team asks questions like 'How many donors gave more than £500 last quarter?' or 'Which services had the most referrals in January?' You either answer manually (slow) or they learn SQL (unrealistic). You want an AI assistant they can chat with that actually queries your data and gives accurate answers, rather than guessing from the wording of the question.

The solution

Build an AI assistant with 'tool use' (also called 'function calling'). The LLM can call functions you define - like executing SQL queries, reading CSV files, or searching documents. When asked a question, it decides which tools to use, executes them, and answers based on real data. This is chat as an interface to your systems.

What you get

A conversational assistant (chatbot, Slack bot, or simple web interface) that answers data questions by actually querying your database or files. Users ask questions in plain English, the AI figures out what data to fetch, executes the query, and explains the results. Accuracy depends on data quality, not AI hallucination.

Before you start

  • Data in queryable format (SQL database, CSV files, or structured documents)
  • Clear understanding of what questions users need answered
  • API key from OpenAI or Anthropic (£10-20 to start)
  • Either: n8n account (low-code approach) OR Python/Node skills (custom code approach)
  • Time to define and test the tools/functions the AI can call

When to use this

  • Staff regularly ask data questions you have to answer manually
  • Data exists in structured format (database, spreadsheets, documented APIs)
  • Questions follow patterns (donor stats, service numbers, financial summaries)
  • You want to democratise data access without teaching everyone SQL
  • Data governance allows AI to query your systems (with appropriate access controls)

When not to use this

  • Data is unstructured or poorly documented (AI can't query chaos)
  • Questions require complex judgement, not just data lookup
  • Data contains highly sensitive information without row-level security
  • Fewer than 10-20 queries per week (manual answering is fine)
  • You need audit trails for every query (tool use can be logged but adds complexity)
  • Data changes so frequently that results are immediately stale

Steps

  1. Map out your data and common questions

    List what data you have (donor database, service records, financial data) and where it lives (Salesforce, CSV exports, MySQL database). Write down 10-15 common questions staff ask: 'How many new donors last month?', 'What's our average donation?', 'Which programme had most participants?'. This defines what tools you'll build.
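One lightweight way to run this mapping exercise is to tag each question with the data operation it needs and count which operations recur; the questions and tool names below are illustrative placeholders, not part of any real system:

```python
from collections import Counter

# Hypothetical question-to-tool mapping -- swap in your own questions
# and candidate tool names from the mapping exercise above
question_map = [
    ("How many new donors last month?", "get_donor_count"),
    ("What's our average donation?", "get_donation_stats"),
    ("What was total giving in Q3?", "get_donation_stats"),
    ("Which programme had most participants?", "get_programme_stats"),
]

# Count how many questions each candidate tool would answer
tool_demand = Counter(tool for _, tool in question_map)
for tool, n in tool_demand.most_common():
    print(f"{tool}: covers {n} question(s)")
```

Tools that cover several questions are the ones to build first.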

  2. Choose your approach: n8n (low-code) or custom code

    n8n approach: a visual workflow builder with an AI Agent node. Easier to set up but less flexible; good for connecting common tools (Airtable, Google Sheets, PostgreSQL) and for teams without developers. Custom code approach: more control and room for complex logic, but requires Python or Node skills. Choose based on your technical capacity and data sources.

  3. Define your first tool/function

    Start with one simple tool. Example: 'get_donor_count' that queries your database for donor numbers in a date range. Define: (1) Function name, (2) Parameters it accepts (start_date, end_date), (3) What it returns (a number). In n8n: create a workflow that executes SQL. In custom code: write a Python function that queries your database. Test it works manually first.
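A minimal sketch of this step, using an in-memory SQLite database so it runs standalone; the table and column names are assumptions, so match them to your own schema and real database path. The point is to exercise the function by hand before any AI is involved:

```python
import sqlite3

# Throwaway in-memory database with a few sample rows -- replace with
# a connection to your real database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE donations (donor_id INTEGER, donation_date TEXT, amount REAL)")
conn.executemany("INSERT INTO donations VALUES (?, ?, ?)", [
    (1, "2024-10-03", 50.0),
    (2, "2024-10-15", 500.0),
    (1, "2024-11-02", 25.0),
])

def get_donor_count(start_date: str, end_date: str) -> int:
    """Count distinct donors who gave between start_date and end_date (inclusive)."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT donor_id) FROM donations"
        " WHERE donation_date BETWEEN ? AND ?",
        (start_date, end_date),
    ).fetchone()
    return row[0]

# Manual test before wiring anything into an AI
print(get_donor_count("2024-10-01", "2024-10-31"))  # 2 distinct donors gave in October
```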

  4. Connect the AI with tool use capabilities

    n8n: Use the AI Agent node (LangChain), add your tool as a workflow connection. Reference: https://docs.n8n.io/integrations/builtin/cluster-nodes/root-nodes/n8n-nodes-langchain.agent/. Custom code: Use OpenAI function calling or Anthropic tool use API - provide your function definition to the model. The AI can now decide when to call your tool.

  5. Test with real questions

    Ask your assistant: 'How many donors did we have in October 2024?' The AI should: (1) Recognise it needs data, (2) Call your get_donor_count tool with the right dates, (3) Receive the result, (4) Answer the question. If it fails: check your function definition is clear, the AI has good context about what data you have, and error handling works.

  6. Add more tools iteratively

    Add one tool at a time: get_donation_total, search_case_notes, get_service_stats. Test each thoroughly before adding the next. The AI gets better as it has more tools, but too many at once confuses it. Aim for 3-5 core tools that cover 80% of questions. You can always add more later.

  7. Build a simple interface

    n8n: Can expose as webhook (API endpoint) or connect to Slack/Teams. Custom code: Build a simple Streamlit app for web interface, or integrate with Slack. Start simple - command line is fine for testing. Polish the interface after you've validated the tool use works reliably.
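For the 'command line is fine' starting point, a loop like this is enough; `answer_fn` would be a tool-use function such as `chat_with_data` from the example code in this recipe, and the input/output hooks are there so you can smoke-test the loop with a canned answer function instead of a live model:

```python
# Bare-bones command-line interface: good enough for validating tool use
# before building anything prettier
def run_cli(answer_fn, input_fn=input, output_fn=print):
    output_fn("Ask a question (or 'quit' to exit)")
    while True:
        question = input_fn("> ").strip()
        if question.lower() in {"quit", "exit", ""}:
            break
        try:
            output_fn(answer_fn(question))
        except Exception as e:  # surface tool errors instead of crashing the loop
            output_fn(f"Error: {e}")

# Smoke test with canned input and a fake answer function
replies = iter(["How many donors?", "quit"])
outputs = []
run_cli(lambda q: f"(answer to: {q})",
        input_fn=lambda _: next(replies),
        output_fn=outputs.append)
```

Swap the fake `answer_fn` for your real one and the defaults give you an interactive prompt.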

  8. Add guardrails and monitoring

    Critical: Log all queries the AI makes (what tools it called, with what parameters). Set limits: max queries per user per day, which tables/data the AI can access. Add human confirmation for potentially destructive operations. Monitor for: unusual query patterns, errors, misuse. This is not optional for production use.
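One way to sketch the logging and limits is to route every tool execution through a single guard function; the allow-list contents, quota, and in-memory counter below are illustrative choices, not a production design:

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tool_audit")

ALLOWED_TOOLS = {"get_donor_count", "get_total_donations"}  # explicit allow-list
MAX_CALLS_PER_DAY = 100

call_counts = {}  # (user_id, date) -> count; use persistent storage in production

def guarded_call(user_id, tool_name, args, tool_fn):
    """Log every tool call and enforce an allow-list plus a per-user daily quota."""
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"Tool not allowed: {tool_name}")
    today = datetime.now(timezone.utc).date().isoformat()
    key = (user_id, today)
    call_counts[key] = call_counts.get(key, 0) + 1
    if call_counts[key] > MAX_CALLS_PER_DAY:
        raise PermissionError(f"Daily query limit reached for {user_id}")
    # Audit trail: who called what, with which parameters
    log.info("user=%s tool=%s args=%s", user_id, tool_name, json.dumps(args))
    return tool_fn(**args)
```

Call this wherever the AI's chosen tool is executed, so nothing runs unlogged or outside the allow-list.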

Example code

Simple data analyst with OpenAI function calling

Basic conversational analyst using OpenAI function calling. AI queries your database to answer questions accurately.

from openai import OpenAI
import sqlite3
import json
from datetime import datetime

# Configuration
client = OpenAI(api_key='your-api-key')  # or set the OPENAI_API_KEY env var
DB_PATH = 'charity_data.db'  # Your SQLite database

# Define tools the AI can use
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_donor_count",
            "description": "Get the number of donors in a date range",
            "parameters": {
                "type": "object",
                "properties": {
                    "start_date": {
                        "type": "string",
                        "description": "Start date in YYYY-MM-DD format"
                    },
                    "end_date": {
                        "type": "string",
                        "description": "End date in YYYY-MM-DD format"
                    }
                },
                "required": ["start_date", "end_date"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_total_donations",
            "description": "Get total donation amount in a date range",
            "parameters": {
                "type": "object",
                "properties": {
                    "start_date": {"type": "string"},
                    "end_date": {"type": "string"}
                },
                "required": ["start_date", "end_date"]
            }
        }
    }
]

# Implement the actual functions
def get_donor_count(start_date, end_date):
    """Query database for donor count"""
    conn = sqlite3.connect(DB_PATH)
    cursor = conn.cursor()
    cursor.execute("""
        SELECT COUNT(DISTINCT donor_id)
        FROM donations
        WHERE donation_date BETWEEN ? AND ?
    """, (start_date, end_date))
    count = cursor.fetchone()[0]
    conn.close()
    return count

def get_total_donations(start_date, end_date):
    """Query database for total donations"""
    conn = sqlite3.connect(DB_PATH)
    cursor = conn.cursor()
    cursor.execute("""
        SELECT SUM(amount)
        FROM donations
        WHERE donation_date BETWEEN ? AND ?
    """, (start_date, end_date))
    total = cursor.fetchone()[0] or 0
    conn.close()
    return total

# Map function names to actual functions
available_functions = {
    "get_donor_count": get_donor_count,
    "get_total_donations": get_total_donations
}

def chat_with_data(user_question):
    """Main conversation loop with tool use"""
    messages = [
        {"role": "system", "content": "You are a data analyst assistant. You can query our donor database to answer questions. Today's date is " + datetime.now().strftime("%Y-%m-%d")},
        {"role": "user", "content": user_question}
    ]

    # Call OpenAI with tools available
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        tool_choice="auto"
    )

    response_message = response.choices[0].message

    # Check if the AI wants to call a function
    if response_message.tool_calls:
        # Execute the function call
        for tool_call in response_message.tool_calls:
            function_name = tool_call.function.name
            function_args = json.loads(tool_call.function.arguments)

            print(f"AI is calling: {function_name}({function_args})")

            # Execute the function
            function_response = available_functions[function_name](**function_args)

            print(f"Function returned: {function_response}")

            # Send function result back to AI
            messages.append(response_message)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": function_name,
                "content": str(function_response)
            })

        # Get final response with function results
        second_response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages
        )
        return second_response.choices[0].message.content
    else:
        return response_message.content

# Example usage
questions = [
    "How many donors did we have in October 2024?",
    "What was our total donation amount in Q3 2024?",
    "Compare donor numbers between September and October 2024"
]

for question in questions:
    print(f"\nQ: {question}")
    answer = chat_with_data(question)
    print(f"A: {answer}")

Data analyst with Claude tool use

Conversational analyst using Claude tool use. Queries CSV data to answer questions.

import anthropic
import pandas as pd
import json

# Configuration
client = anthropic.Anthropic(api_key='your-api-key')
CSV_PATH = 'donor_data.csv'

# Load data
df = pd.read_csv(CSV_PATH)

# Define tools for Claude
tools = [
    {
        "name": "query_donor_data",
        "description": "Query the donor database. Returns matching records as JSON. Available columns: donor_id, name, total_given, last_donation_date, status",
        "input_schema": {
            "type": "object",
            "properties": {
                "filter_column": {
                    "type": "string",
                    "description": "Column to filter on (e.g., 'status', 'total_given')"
                },
                "filter_value": {
                    "type": "string",
                    "description": "Value to match"
                },
                "aggregate": {
                    "type": "string",
                    "enum": ["count", "sum", "mean", "none"],
                    "description": "Aggregation to perform"
                }
            },
            "required": ["aggregate"]
        }
    }
]

def query_donor_data(filter_column=None, filter_value=None, aggregate="none"):
    """Query donor data from CSV"""
    data = df.copy()

    # Apply filter if specified
    if filter_column and filter_value:
        data = data[data[filter_column].astype(str) == filter_value]

    # Apply aggregation
    if aggregate == "count":
        return len(data)
    elif aggregate == "sum":
        return float(data['total_given'].sum())  # cast from numpy so json.dumps works
    elif aggregate == "mean":
        return float(data['total_given'].mean())
    else:
        return data.to_dict('records')[:10]  # Return first 10 records

def chat_with_claude(user_message):
    """Chat with Claude using tool use"""
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )

        # Check if Claude wants to use a tool
        if response.stop_reason == "tool_use":
            # Extract tool use
            tool_use_block = next(
                block for block in response.content
                if block.type == "tool_use"
            )

            tool_name = tool_use_block.name
            tool_input = tool_use_block.input

            print(f"Claude is calling: {tool_name}({tool_input})")

            # Execute the tool
            if tool_name == "query_donor_data":
                result = query_donor_data(**tool_input)
            else:
                result = {"error": "Unknown tool"}

            print(f"Tool returned: {result}")

            # Append to the history (rather than rebuilding it) so the
            # loop still works when Claude makes several tool calls in a row
            messages.append({"role": "assistant", "content": response.content})
            messages.append({
                "role": "user",
                "content": [
                    {
                        "type": "tool_result",
                        "tool_use_id": tool_use_block.id,
                        "content": json.dumps(result)
                    }
                ]
            })

        else:
            # Final response
            return next(
                block.text for block in response.content
                if hasattr(block, "text")
            )

# Example
print(chat_with_claude("How many active donors do we have?"))
print(chat_with_claude("What's the average donation amount?"))

Tools

n8n · platform · freemium · open source
OpenAI API · service · paid
Anthropic Claude API · service · paid
Python · platform · free · open source

At a glance

Time to implement: days
Setup cost: low
Ongoing cost: low
Cost trend: stable
Organisation size: small, medium, large
Target audience: operations-manager, data-analyst, it-technical, ceo-trustees, fundraising

n8n's free tier is adequate for testing and small teams, and self-hosted n8n is free. API costs depend on usage: typically £0.01-0.05 per conversation. At 100 queries a day that works out at roughly £30-150/month depending on model (GPT-4o-mini is cheaper; GPT-4o more capable). Claude Sonnet is a good middle ground.
