Automating Invoice Analysis with Needle and Crew AI
Stop wrestling with spreadsheets. Learn how to automate invoice analysis and generate detailed expense reports in minutes.

Key Takeaways
- Needle + Crew AI automates invoice analysis end-to-end - from data retrieval to expense report generation
- The integration uses ~50 lines of Python code and requires only Needle + OpenAI API keys
- 4-step workflow: search invoice data, build AI agent, define analysis task, orchestrate with Crew AI
- Output is a structured markdown report with vendor breakdowns, totals, and cost-saving recommendations
- Replaces hours of manual spreadsheet work with a process that runs in minutes
Let's face it - nobody enjoys spending hours parsing invoices. Beyond being mind-numbingly tedious, it's a massive waste of time that keeps your team from doing what they do best.
We built something at Needle that could be a game-changer: we combined our AI search with Crew AI to analyze invoices automatically. Feed it your raw invoice data, and it spits out detailed expense reports. No more spreadsheet wrestling matches, just clear insights that help you make better decisions.
The Challenge: Transforming Unstructured Invoice Data
Every organization generates invoices, but manually combing through them to extract key expense details is inefficient and error-prone. Our goal was simple:
- Retrieve invoice data quickly: Use Needle's ability to search across unstructured data sources.
- Analyze expenses automatically: Leverage Crew AI to build an "Expense Analyst" agent.
- Generate actionable insights: Output a structured Markdown report highlighting spending patterns and potential cost optimizations.
Manual vs. Automated Invoice Analysis
| Factor | Manual Process | Needle + Crew AI |
|---|---|---|
| Time per batch | 2–4 hours | 2–5 minutes |
| Error rate | High (manual data entry) | Low (AI-extracted, consistent) |
| Output format | Spreadsheet | Structured markdown report |
| Cost-saving insights | Requires separate analysis | Included automatically |
| Setup effort | N/A (recurring manual work) | ~50 lines of Python, one-time |
The Integration: Needle Meets Crew AI
At a high level, our integration involves two components:
- Data Retrieval: Needle's API searches a connected knowledge base (in our case, a collection of invoice text files) and returns relevant data.
- Automated Analysis & Reporting: A Crew AI agent analyzes the retrieved data, categorizes expenses, and outputs a comprehensive report.
Let's break down the key parts of the integration.
Code Walkthrough: 4 Steps
Step 1: Searching Your Invoice Data
We start by defining a tool that leverages Needle's AI search. This function sends a query to our Needle collection and returns the top 20 matching results from our invoice data. Make sure to have OpenAI and Needle API keys set in your .env file.
```python
from needle.v1 import NeedleClient
from crewai.tools import tool


@tool("Search Knowledge Base")
def search_knowledge_base(query: str) -> str:
    """
    Retrieve information from your knowledge base containing unstructured data such as
    invoices, reports, emails, and more.

    Args:
        query (str): The search query to find relevant invoice data.
    """
    ndl = NeedleClient()
    return ndl.collections.search(
        collection_id="clt_01JJVNDYX8CCK8TN8FJ6WYFKH9",  # Replace with your actual collection ID
        text=query,
        top_k=20,
    )
```

This snippet defines the search_knowledge_base function as a tool that our AI agent can use. The NeedleClient is called to search within a specific collection for invoice-related information.
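The tool is annotated as returning `str`. If your client version hands back a list of result objects instead (an assumption worth checking), one option is to join them into plain text before returning. The sketch below is only an illustration and does not call Needle: `Result` is a hypothetical stand-in with assumed `content` and `file_id` fields, `format_results` is a made-up helper name, and the sample data is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in for one search hit; field names are assumptions."""
    content: str
    file_id: str


def format_results(results: list[Result], max_chars: int = 500) -> str:
    """Join search hits into one numbered plain-text block for the agent."""
    if not results:
        return "No matching invoice data found."
    blocks = []
    for i, hit in enumerate(results, start=1):
        # Trim whitespace and cap the length so one long hit cannot crowd out the rest.
        snippet = hit.content.strip()[:max_chars]
        blocks.append(f"[{i}] file={hit.file_id}\n{snippet}")
    return "\n\n".join(blocks)


# Invented sample data, for illustration only.
sample = [
    Result(content="Vendor: Example Vendor A, Total: 100.00", file_id="file_a"),
    Result(content="Vendor: Example Vendor B, Total: 250.00", file_id="file_b"),
]
print(format_results(sample))
```

Capping each snippet with `max_chars` keeps the combined text within a predictable size when `top_k` is large.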
Step 2: Building the Expense Analyst Agent
Next, we configure our Crew AI agent. This agent is given a role, a goal, and even a backstory to guide its actions. The agent uses our search tool to pull in invoice data and then processes it to generate an analysis.
```python
from crewai import Agent, Task, Crew

analyst = Agent(
    role="Expense Analyst",
    goal="Create detailed expense analysis and categorization from invoice data",
    backstory="""
    You are a meticulous expense analyst with expertise in financial data analysis
    and cost categorization. You excel at breaking down expenses, identifying patterns,
    and providing actionable cost-saving insights.
    """,
    verbose=True,
    tools=[search_knowledge_base],
)
```

This setup gives the agent context and purpose, making it more than just a script - it becomes a virtual expense analyst.
Step 3: Defining the Analysis Task
We then define a task that outlines the steps our agent must follow. This includes grouping expenses, calculating totals, and providing recommendations.
```python
analysis_task = Task(
    description="""
    Search, find, and analyze invoices to create a detailed expense drilldown report.

    Steps to follow:
    1. Group expenses and calculate total spend by vendor.
    2. Calculate the gross total spend.
    3. Identify potential cost-saving opportunities.

    The report should include:
    - An executive summary.
    - A vendor-wise breakdown.
    - Recommendations for cost optimization.
    """,
    expected_output="""
    An expense analysis report with clear sections and actionable recommendations in markdown format.
    """,
    agent=analyst,
)
```

This task instructs the agent on how to transform raw invoice data into a structured and meaningful report.
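Because the totals in the report are produced by a language model, it can be worth cross-checking steps 1 and 2 of the task with plain arithmetic. The sketch below assumes you already have invoice amounts as (vendor, amount) pairs. The `vendor_totals` helper and the sample figures are hypothetical and are not taken from the example repository.

```python
from collections import defaultdict
from decimal import Decimal


def vendor_totals(line_items):
    """Return (per-vendor totals, gross total) for (vendor, amount) pairs."""
    totals = defaultdict(Decimal)  # Decimal() starts each vendor at 0
    for vendor, amount in line_items:
        # Decimal avoids the rounding drift that floats introduce with currency.
        totals[vendor] += Decimal(amount)
    gross = sum(totals.values(), Decimal("0"))
    return dict(totals), gross


# Invented sample figures, for illustration only.
items = [("Acme Corp", "120.50"), ("Globex", "80.00"), ("Acme Corp", "29.50")]
by_vendor, gross = vendor_totals(items)
for vendor, total in sorted(by_vendor.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{vendor}: {total}")
print(f"Gross total: {gross}")
```

Comparing these figures with the vendor breakdown in the generated report is a cheap way to catch a miscounted or hallucinated total.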
Step 4: Orchestrating the Process
Finally, we put everything together in our main script. Crew AI orchestrates the agent and task, executing the entire workflow when the script is run.
```python
if __name__ == "__main__":
    crew = Crew(agents=[analyst], tasks=[analysis_task], verbose=True)
    crew.kickoff()
```

To run the project, you simply install the dependencies and execute the script:

```bash
pipenv install
pipenv run python main.py
```

Project Structure & Invoice Files
In our repository (found under needle-examples/invoice_summarizer), you'll notice a directory called invoices containing multiple invoice text files (e.g., INV-11776.txt, INV-24749.txt, etc.). These files serve as the unstructured data that Needle's AI search scans through, simulating a real-world scenario where invoices are stored across various documents.
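Before uploading files to a collection, a quick local check can confirm the directory looks as expected. This sketch assumes the layout described above (an `invoices` directory of `.txt` files named like `INV-11776.txt`). The `list_invoice_files` helper is hypothetical and does not talk to Needle.

```python
from pathlib import Path


def list_invoice_files(directory):
    """Return non-empty INV-*.txt files in `directory`, sorted by name."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.glob("INV-*.txt")
        if p.is_file() and p.stat().st_size > 0  # skip empty placeholders
    )


if __name__ == "__main__":
    files = list_invoice_files("invoices")
    print(f"Found {len(files)} invoice file(s)")
    for path in files:
        print(f"  {path.name}")
```

An empty result here usually means the script was run from the wrong working directory or the files use a different naming pattern.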
Summary
This integration between Needle and Crew AI turns hours of manual invoice parsing into a 2–5 minute automated process. With ~50 lines of Python, you get a system that retrieves invoice data via Needle's AI search, analyzes expenses through a Crew AI agent, and outputs structured reports with vendor breakdowns and cost-saving recommendations. Whether you're dealing with a small set of invoices or a vast repository of financial data, this approach saves time, reduces errors, and provides actionable insights for better expense management.
If you're interested in diving deeper or exploring similar integrations, feel free to reach out or comment below. Happy automating!

