The BeeAI Security Analyst
While CLI agents are powerful, sometimes a visual dashboard is essential for monitoring complex, multi-step reasoning processes. The BeeAI Analyst combines the raw power of local LLMs with a reactive FastAPI web interface.
Below is a simulation of the actual web interface. Watch as the agent receives a high-level security mandate, plans its research using ThinkTool, gathers intelligence via DuckDuckGo and Wikipedia, and synthesizes a final strategic report.
🛡️ BeeAI Analyst (FastAPI)
This "Glass Box" approach allows operators to trust the AI's conclusions by verifying the sources (Wikipedia, DuckDuckGo) and reasoning steps (ThinkTool) used to reach them.
Under the Hood
The BeeAI Analyst is built for performance and privacy. Unlike cloud-based agents, this entire stack runs locally on your machine, ensuring no sensitive data leaves your network.
1. FastAPI & Async Architecture
The backend is powered by FastAPI and Uvicorn, using Python's asyncio to handle multiple concurrent connections without blocking. Server-Sent Events (SSE) stream the agent's thought process to the frontend in real time, giving the user immediate feedback.
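The streaming pattern can be sketched with a plain async generator. The event shapes and names below are illustrative, not the project's actual identifiers; in the real app a generator like this would be wrapped in FastAPI's `StreamingResponse` with `media_type="text/event-stream"`.

```python
# Each agent event is serialized as an SSE frame: "data: <json>\n\n".
import asyncio
import json

def sse_frame(event: dict) -> str:
    """Serialize one event as a Server-Sent Events frame."""
    return f"data: {json.dumps(event)}\n\n"

async def agent_event_stream():
    """Simulated agent lifecycle: a thought, a tool call, a final answer."""
    steps = [
        {"type": "thought", "text": "Plan research steps"},
        {"type": "tool", "name": "DuckDuckGo", "query": "CVE trends"},
        {"type": "answer", "text": "Summary of findings..."},
    ]
    for step in steps:
        await asyncio.sleep(0)  # yield control, as real inference would
        yield sse_frame(step)

async def main():
    async for frame in agent_event_stream():
        print(frame, end="")

asyncio.run(main())
```

Because each frame is flushed as soon as the agent emits it, the browser can render thoughts and tool calls while the model is still generating.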
2. Resource Management with Semaphores
Running large language models (LLMs) locally is GPU-intensive. To prevent Out-Of-Memory (OOM) errors, the system implements an Async Semaphore (gpu_semaphore). This acts as a traffic controller: only one heavy inference task occupies the GPU at a time, while other requests wait their turn in the queue.
3. The BeeAI Framework
At the core lies the BeeAI Framework. It orchestrates the agent's lifecycle:
- ThinkTool: Allows the agent to pause and plan its next steps.
- Research Tools: Integration with DuckDuckGo, Wikipedia, and OpenMeteo for real-world data.
- Memory: Unconstrained memory allows the agent to retain context throughout the session.
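The lifecycle above can be sketched framework-agnostically: plan with a "think" step, call research tools, and keep every step in unconstrained memory. This mirrors the concepts only; it is not BeeAI's actual API, and the tool functions are stubs.

```python
memory = []  # unconstrained: every step is retained for the session

def think(goal: str) -> list:
    """Stand-in for ThinkTool: break a goal into tool calls."""
    plan = [("DuckDuckGo", goal), ("Wikipedia", goal)]
    memory.append(("thought", plan))
    return plan

# Stub research tools; the real ones query live services.
TOOLS = {
    "DuckDuckGo": lambda q: f"web results for {q!r}",
    "Wikipedia": lambda q: f"encyclopedia entry for {q!r}",
}

def run_agent(goal: str) -> str:
    for tool_name, query in think(goal):
        observation = TOOLS[tool_name](query)
        memory.append(("observation", observation))
    report = f"Report on {goal!r}, synthesized from gathered observations"
    memory.append(("answer", report))
    return report

print(run_agent("ransomware trends"))
```

Because memory is never truncated, the final synthesis step can draw on every earlier thought and observation, which is what makes the "Glass Box" audit trail possible.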
4. Local LLM via Ollama
The intelligence is provided by Ollama, running a custom-tuned gemma-agent model. By using the OpenAI-compatible endpoint, we can swap underlying models (Llama 3, Mistral, Gemma) without changing a single line of application code.
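Pointing an OpenAI-style chat completion call at Ollama looks roughly like this. Port 11434 is Ollama's default and `gemma-agent` is the model named in the article; the helper only builds the request so it can be inspected without a running server.

```python
import json
import urllib.request

OLLAMA_BASE = "http://localhost:11434/v1"  # Ollama's OpenAI-compatible API

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a chat-completions request in the OpenAI wire format."""
    payload = {
        "model": model,  # swap in "llama3", "mistral", etc. -- no code changes
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{OLLAMA_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("gemma-agent", "Summarize today's findings.")
# With Ollama running locally, urllib.request.urlopen(req) would return
# an OpenAI-shaped JSON response.
```

Since only the `model` field changes between backends, model swaps are a one-string configuration change rather than a code change.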
5. RAG & Document Intelligence with Docling
The analyst features advanced Retrieval-Augmented Generation (RAG) capabilities. Users can upload various file formats (PDF, DOCX, images), which are processed using Docling for high-quality text extraction. The content is then chunked and stored in a local vector database, allowing the agent to provide context-aware answers based on your private documents.
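A toy sketch of that RAG flow: chunk the extracted text, index it, and retrieve the most relevant chunk for a query. In the real app, Docling handles PDF/DOCX extraction and a proper vector database with learned embeddings replaces the bag-of-words cosine scoring used here purely for illustration.

```python
import math
from collections import Counter

def chunk(text: str, size: int = 8) -> list:
    """Split text into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list) -> str:
    """Return the chunk most similar to the query."""
    q = Counter(query.lower().split())
    return max(chunks, key=lambda c: cosine(q, Counter(c.lower().split())))

doc = ("The firewall policy blocks inbound traffic on port 22. "
       "Quarterly audits review user access levels and credentials.")
chunks = chunk(doc)
print(retrieve("which port does the firewall block", chunks))
```

The agent then receives the retrieved chunk alongside the user's question, grounding its answer in the uploaded document rather than the model's training data.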
Explore the Code
The full source code, including the FastAPI server, agent configuration, and frontend templates, is available on GitHub.
View on GitHub →