
Build a conversational data analyst with tool use

operations · intermediate · emerging

The problem

Your team asks questions like 'How many donors gave more than £500 last quarter?' or 'Which services had the most referrals in January?' You either answer manually (slow) or they learn SQL (unrealistic). You want an AI assistant they can chat with that actually queries your data and gives accurate answers, rather than guessing from the wording of the question.

The solution

Build an AI assistant with 'tool use' (also called 'function calling'). The LLM can call functions you define - like executing SQL queries, reading CSV files, or searching documents. When asked a question, it decides which tools to use, executes them, and answers based on real data. This is chat as an interface to your systems.

What you get

A conversational assistant (chatbot, Slack bot, or simple web interface) that answers data questions by actually querying your database or files. Users ask questions in plain English, the AI figures out what data to fetch, executes the query, and explains the results. Accuracy depends on data quality, not AI hallucination.

Before you start

  • Data in queryable format (SQL database, CSV files, or structured documents)
  • Clear understanding of what questions users need answered
  • API key from OpenAI or Anthropic (£10-20 to start)
  • Either: n8n account (low-code approach) OR Python/Node skills (custom code approach)
  • Time to define and test the tools/functions the AI can call

When to use this

  • Staff regularly ask data questions you have to answer manually
  • Data exists in structured format (database, spreadsheets, documented APIs)
  • Questions follow patterns (donor stats, service numbers, financial summaries)
  • You want to democratise data access without teaching everyone SQL
  • Data governance allows AI to query your systems (with appropriate access controls)

When not to use this

  • Data is unstructured or poorly documented (AI can't query chaos)
  • Questions require complex judgement, not just data lookup
  • Data contains highly sensitive information without row-level security
  • Fewer than 10-20 queries per week (manual answering is fine)
  • You need audit trails for every query (tool use can be logged but adds complexity)
  • Data changes so frequently that results are immediately stale

Steps

  1. Map out your data and common questions

    List what data you have (donor database, service records, financial data) and where it lives (Salesforce, CSV exports, MySQL database). Write down 10-15 common questions staff ask: 'How many new donors last month?', 'What's our average donation?', 'Which programme had most participants?'. This defines what tools you'll build.
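One lightweight way to run this mapping exercise is to tag each question with the data operation it needs and count which operations recur; the questions and tool names below are illustrative placeholders, not part of any real system:

```python
from collections import Counter

# Hypothetical question-to-tool mapping -- swap in your own questions
# and candidate tool names from the mapping exercise above
question_map = [
    ("How many new donors last month?", "get_donor_count"),
    ("What's our average donation?", "get_donation_stats"),
    ("What was total giving in Q3?", "get_donation_stats"),
    ("Which programme had most participants?", "get_programme_stats"),
]

# Count how many questions each candidate tool would answer
tool_demand = Counter(tool for _, tool in question_map)
for tool, n in tool_demand.most_common():
    print(f"{tool}: covers {n} question(s)")
```

Tools that cover several questions are the ones to build first.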

  2. Choose your approach: n8n (low-code) or custom code

    n8n approach: a visual workflow builder with an AI Agent node. Easier to set up but less flexible; good for connecting common tools (Airtable, Google Sheets, PostgreSQL) and for teams without developers. Custom code approach: more control and room for complex logic, but requires Python or Node skills. Choose based on your technical capacity and data sources.

  3. Define your first tool/function

    Start with one simple tool. Example: 'get_donor_count' that queries your database for donor numbers in a date range. Define: (1) Function name, (2) Parameters it accepts (start_date, end_date), (3) What it returns (a number). In n8n: create a workflow that executes SQL. In custom code: write a Python function that queries your database. Test it works manually first.
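A minimal sketch of this step, using an in-memory SQLite database so it runs standalone; the table and column names are assumptions, so match them to your own schema and real database path. The point is to exercise the function by hand before any AI is involved:

```python
import sqlite3

# Throwaway in-memory database with a few sample rows -- replace with
# a connection to your real database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE donations (donor_id INTEGER, donation_date TEXT, amount REAL)")
conn.executemany("INSERT INTO donations VALUES (?, ?, ?)", [
    (1, "2024-10-03", 50.0),
    (2, "2024-10-15", 500.0),
    (1, "2024-11-02", 25.0),
])

def get_donor_count(start_date: str, end_date: str) -> int:
    """Count distinct donors who gave between start_date and end_date (inclusive)."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT donor_id) FROM donations"
        " WHERE donation_date BETWEEN ? AND ?",
        (start_date, end_date),
    ).fetchone()
    return row[0]

# Manual test before wiring anything into an AI
print(get_donor_count("2024-10-01", "2024-10-31"))  # 2 distinct donors gave in October
```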

  4. Connect the AI with tool use capabilities

    n8n: Use the AI Agent node (LangChain), add your tool as a workflow connection. Reference: https://docs.n8n.io/integrations/builtin/cluster-nodes/root-nodes/n8n-nodes-langchain.agent/. Custom code: Use OpenAI function calling or Anthropic tool use API - provide your function definition to the model. The AI can now decide when to call your tool.

  5. Test with real questions

    Ask your assistant: 'How many donors did we have in October 2024?' The AI should: (1) Recognise it needs data, (2) Call your get_donor_count tool with the right dates, (3) Receive the result, (4) Answer the question. If it fails: check your function definition is clear, the AI has good context about what data you have, and error handling works.

  6. Add more tools iteratively

    Add one tool at a time: get_donation_total, search_case_notes, get_service_stats. Test each thoroughly before adding the next. The AI gets better as it has more tools, but too many at once confuses it. Aim for 3-5 core tools that cover 80% of questions. You can always add more later.

  7. Build a simple interface

    n8n: Can expose as webhook (API endpoint) or connect to Slack/Teams. Custom code: Build a simple Streamlit app for web interface, or integrate with Slack. Start simple - command line is fine for testing. Polish the interface after you've validated the tool use works reliably.
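For the 'command line is fine' starting point, a loop like this is enough; `answer_fn` would be a tool-use function such as `chat_with_data` from the example code in this recipe, and the input/output hooks are there so you can smoke-test the loop with a canned answer function instead of a live model:

```python
# Bare-bones command-line interface: good enough for validating tool use
# before building anything prettier
def run_cli(answer_fn, input_fn=input, output_fn=print):
    output_fn("Ask a question (or 'quit' to exit)")
    while True:
        question = input_fn("> ").strip()
        if question.lower() in {"quit", "exit", ""}:
            break
        try:
            output_fn(answer_fn(question))
        except Exception as e:  # surface tool errors instead of crashing the loop
            output_fn(f"Error: {e}")

# Smoke test with canned input and a fake answer function
replies = iter(["How many donors?", "quit"])
outputs = []
run_cli(lambda q: f"(answer to: {q})",
        input_fn=lambda _: next(replies),
        output_fn=outputs.append)
```

Swap the fake `answer_fn` for your real one and the defaults give you an interactive prompt.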

  8. Add guardrails and monitoring

    Critical: Log all queries the AI makes (what tools it called, with what parameters). Set limits: max queries per user per day, which tables/data the AI can access. Add human confirmation for potentially destructive operations. Monitor for: unusual query patterns, errors, misuse. This is not optional for production use.
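One way to sketch the logging and limits is to route every tool execution through a single guard function; the allow-list contents, quota, and in-memory counter below are illustrative choices, not a production design:

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tool_audit")

ALLOWED_TOOLS = {"get_donor_count", "get_total_donations"}  # explicit allow-list
MAX_CALLS_PER_DAY = 100

call_counts = {}  # (user_id, date) -> count; use persistent storage in production

def guarded_call(user_id, tool_name, args, tool_fn):
    """Log every tool call and enforce an allow-list plus a per-user daily quota."""
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"Tool not allowed: {tool_name}")
    today = datetime.now(timezone.utc).date().isoformat()
    key = (user_id, today)
    call_counts[key] = call_counts.get(key, 0) + 1
    if call_counts[key] > MAX_CALLS_PER_DAY:
        raise PermissionError(f"Daily query limit reached for {user_id}")
    # Audit trail: who called what, with which parameters
    log.info("user=%s tool=%s args=%s", user_id, tool_name, json.dumps(args))
    return tool_fn(**args)
```

Call this wherever the AI's chosen tool is executed, so nothing runs unlogged or outside the allow-list.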

Example code

Simple data analyst with OpenAI function calling

Basic conversational analyst using OpenAI function calling. AI queries your database to answer questions accurately.

from openai import OpenAI
import sqlite3
import json
from datetime import datetime

# Configuration
client = OpenAI(api_key='your-api-key')  # or set the OPENAI_API_KEY env var
DB_PATH = 'charity_data.db'  # Your SQLite database

# Define tools the AI can use
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_donor_count",
            "description": "Get the number of donors in a date range",
            "parameters": {
                "type": "object",
                "properties": {
                    "start_date": {
                        "type": "string",
                        "description": "Start date in YYYY-MM-DD format"
                    },
                    "end_date": {
                        "type": "string",
                        "description": "End date in YYYY-MM-DD format"
                    }
                },
                "required": ["start_date", "end_date"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_total_donations",
            "description": "Get total donation amount in a date range",
            "parameters": {
                "type": "object",
                "properties": {
                    "start_date": {"type": "string"},
                    "end_date": {"type": "string"}
                },
                "required": ["start_date", "end_date"]
            }
        }
    }
]

# Implement the actual functions
def get_donor_count(start_date, end_date):
    """Query database for donor count"""
    conn = sqlite3.connect(DB_PATH)
    cursor = conn.cursor()
    cursor.execute("""
        SELECT COUNT(DISTINCT donor_id)
        FROM donations
        WHERE donation_date BETWEEN ? AND ?
    """, (start_date, end_date))
    count = cursor.fetchone()[0]
    conn.close()
    return count

def get_total_donations(start_date, end_date):
    """Query database for total donations"""
    conn = sqlite3.connect(DB_PATH)
    cursor = conn.cursor()
    cursor.execute("""
        SELECT SUM(amount)
        FROM donations
        WHERE donation_date BETWEEN ? AND ?
    """, (start_date, end_date))
    total = cursor.fetchone()[0] or 0
    conn.close()
    return total

# Map function names to actual functions
available_functions = {
    "get_donor_count": get_donor_count,
    "get_total_donations": get_total_donations
}

def chat_with_data(user_question):
    """Main conversation loop with tool use"""
    messages = [
        {"role": "system", "content": "You are a data analyst assistant. You can query our donor database to answer questions. Today's date is " + datetime.now().strftime("%Y-%m-%d")},
        {"role": "user", "content": user_question}
    ]

    # Call OpenAI with tools available
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        tool_choice="auto"
    )

    response_message = response.choices[0].message

    # Check if the AI wants to call a function
    if response_message.tool_calls:
        # Execute the function call
        for tool_call in response_message.tool_calls:
            function_name = tool_call.function.name
            function_args = json.loads(tool_call.function.arguments)

            print(f"AI is calling: {function_name}({function_args})")

            # Execute the function
            function_response = available_functions[function_name](**function_args)

            print(f"Function returned: {function_response}")

            # Send function result back to AI
            messages.append(response_message)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": function_name,
                "content": str(function_response)
            })

        # Get final response with function results
        second_response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages
        )
        return second_response.choices[0].message.content
    else:
        return response_message.content

# Example usage
questions = [
    "How many donors did we have in October 2024?",
    "What was our total donation amount in Q3 2024?",
    "Compare donor numbers between September and October 2024"
]

for question in questions:
    print(f"\nQ: {question}")
    answer = chat_with_data(question)
    print(f"A: {answer}")

Data analyst with Claude tool use

Conversational analyst using Claude tool use. Queries CSV data to answer questions.

import anthropic
import pandas as pd
import json

# Configuration
client = anthropic.Anthropic(api_key='your-api-key')
CSV_PATH = 'donor_data.csv'

# Load data
df = pd.read_csv(CSV_PATH)

# Define tools for Claude
tools = [
    {
        "name": "query_donor_data",
        "description": "Query the donor database. Returns matching records as JSON. Available columns: donor_id, name, total_given, last_donation_date, status",
        "input_schema": {
            "type": "object",
            "properties": {
                "filter_column": {
                    "type": "string",
                    "description": "Column to filter on (e.g., 'status', 'total_given')"
                },
                "filter_value": {
                    "type": "string",
                    "description": "Value to match"
                },
                "aggregate": {
                    "type": "string",
                    "enum": ["count", "sum", "mean", "none"],
                    "description": "Aggregation to perform"
                }
            },
            "required": ["aggregate"]
        }
    }
]

def query_donor_data(filter_column=None, filter_value=None, aggregate="none"):
    """Query donor data from CSV"""
    data = df.copy()

    # Apply filter if specified
    if filter_column and filter_value:
        data = data[data[filter_column].astype(str) == filter_value]

    # Apply aggregation
    if aggregate == "count":
        return len(data)
    elif aggregate == "sum":
        return float(data['total_given'].sum())  # cast from numpy so json.dumps works
    elif aggregate == "mean":
        return float(data['total_given'].mean())
    else:
        return data.to_dict('records')[:10]  # Return first 10 records

def chat_with_claude(user_message):
    """Chat with Claude using tool use"""
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )

        # Check if Claude wants to use a tool
        if response.stop_reason == "tool_use":
            # Extract tool use
            tool_use_block = next(
                block for block in response.content
                if block.type == "tool_use"
            )

            tool_name = tool_use_block.name
            tool_input = tool_use_block.input

            print(f"Claude is calling: {tool_name}({tool_input})")

            # Execute the tool
            if tool_name == "query_donor_data":
                result = query_donor_data(**tool_input)
            else:
                result = {"error": "Unknown tool"}

            print(f"Tool returned: {result}")

            # Append to the history (rather than rebuilding it) so the
            # loop still works when Claude makes several tool calls in a row
            messages.append({"role": "assistant", "content": response.content})
            messages.append({
                "role": "user",
                "content": [
                    {
                        "type": "tool_result",
                        "tool_use_id": tool_use_block.id,
                        "content": json.dumps(result)
                    }
                ]
            })

        else:
            # Final response
            return next(
                block.text for block in response.content
                if hasattr(block, "text")
            )

# Example
print(chat_with_claude("How many active donors do we have?"))
print(chat_with_claude("What's the average donation amount?"))

Tools

n8n · platform · freemium · open source
OpenAI API · service · paid
Anthropic Claude API · service · paid
Python · platform · free · open source

At a glance

Time to implement: days
Setup cost: low
Ongoing cost: low
Cost trend: stable
Organisation size: small, medium, large
Target audience: operations-manager, data-analyst, it-technical, ceo-trustees, fundraising

n8n's free tier is adequate for testing and small teams, and self-hosted n8n is free. API costs depend on usage: typically £0.01-0.05 per conversation. At 100 queries a day that works out at roughly £30-150/month depending on model (GPT-4o-mini is cheaper; GPT-4o more capable). Claude Sonnet is a good middle ground.
