The BeeAI Security Analyst
While CLI agents are powerful, sometimes a visual dashboard is essential for monitoring complex, multi-step reasoning processes. The BeeAI Analyst combines the raw power of local LLMs with a reactive FastAPI web interface.
Below is a simulation of the actual web interface. Watch as the agent receives a high-level security mandate, plans its research using ThinkTool, gathers intelligence via DuckDuckGo and Wikipedia, and synthesizes a final strategic report.
🛡️ BeeAI Analyst (FastAPI)
This "Glass Box" approach allows operators to trust the AI's conclusions by verifying the sources (Wikipedia, DuckDuckGo) and reasoning steps (ThinkTool) used to reach them.
Under the Hood
The BeeAI Analyst is built for performance and privacy. Unlike cloud-based agents, this entire stack runs locally on your machine, ensuring no sensitive data leaves your network.
1. FastAPI & Async Architecture
The backend is powered by FastAPI and Uvicorn, using Python's asyncio to handle multiple concurrent connections without blocking. Server-Sent Events (SSE) stream the agent's thought process to the frontend in real time, giving the user immediate feedback.
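The streaming pattern can be sketched with a plain async generator. The event shapes and names below are illustrative, not the project's actual identifiers; in the real app a generator like this would be wrapped in FastAPI's `StreamingResponse` with `media_type="text/event-stream"`.

```python
# Each agent event is serialized as an SSE frame: "data: <json>\n\n".
import asyncio
import json

def sse_frame(event: dict) -> str:
    """Serialize one event as a Server-Sent Events frame."""
    return f"data: {json.dumps(event)}\n\n"

async def agent_event_stream():
    """Simulated agent lifecycle: a thought, a tool call, a final answer."""
    steps = [
        {"type": "thought", "text": "Plan research steps"},
        {"type": "tool", "name": "DuckDuckGo", "query": "CVE trends"},
        {"type": "answer", "text": "Summary of findings..."},
    ]
    for step in steps:
        await asyncio.sleep(0)  # yield control, as real inference would
        yield sse_frame(step)

async def main():
    async for frame in agent_event_stream():
        print(frame, end="")

asyncio.run(main())
```

Because each frame is flushed as soon as the agent emits it, the browser can render thoughts and tool calls while the model is still generating.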
2. Resource Management with Semaphores
Running large language models (LLMs) locally is GPU-intensive. To prevent Out-Of-Memory (OOM) errors, the system implements an Async Semaphore (gpu_semaphore). This acts as a traffic controller: only one heavy inference task occupies the GPU at a time, while other requests wait their turn in the queue.
3. The BeeAI Framework
At the core lies the BeeAI Framework. It orchestrates the agent's lifecycle:
- ThinkTool: Allows the agent to pause and plan its next steps.
- Research Tools: Integration with DuckDuckGo, Wikipedia, and OpenMeteo for real-world data.
- Memory: Unconstrained memory allows the agent to retain context throughout the session.
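The lifecycle above can be sketched framework-agnostically: plan with a "think" step, call research tools, and keep every step in unconstrained memory. This mirrors the concepts only; it is not BeeAI's actual API, and the tool functions are stubs.

```python
memory = []  # unconstrained: every step is retained for the session

def think(goal: str) -> list:
    """Stand-in for ThinkTool: break a goal into tool calls."""
    plan = [("DuckDuckGo", goal), ("Wikipedia", goal)]
    memory.append(("thought", plan))
    return plan

# Stub research tools; the real ones query live services.
TOOLS = {
    "DuckDuckGo": lambda q: f"web results for {q!r}",
    "Wikipedia": lambda q: f"encyclopedia entry for {q!r}",
}

def run_agent(goal: str) -> str:
    for tool_name, query in think(goal):
        observation = TOOLS[tool_name](query)
        memory.append(("observation", observation))
    report = f"Report on {goal!r}, synthesized from gathered observations"
    memory.append(("answer", report))
    return report

print(run_agent("ransomware trends"))
```

Because memory is never truncated, the final synthesis step can draw on every earlier thought and observation, which is what makes the "Glass Box" audit trail possible.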
4. Local LLM via Ollama
The intelligence is provided by Ollama, running a custom-tuned gemma-agent model. By using the OpenAI-compatible endpoint, we can swap underlying models (Llama 3, Mistral, Gemma) without changing a single line of application code.
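Pointing an OpenAI-style chat completion call at Ollama looks roughly like this. Port 11434 is Ollama's default and `gemma-agent` is the model named in the article; the helper only builds the request so it can be inspected without a running server.

```python
import json
import urllib.request

OLLAMA_BASE = "http://localhost:11434/v1"  # Ollama's OpenAI-compatible API

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a chat-completions request in the OpenAI wire format."""
    payload = {
        "model": model,  # swap in "llama3", "mistral", etc. -- no code changes
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{OLLAMA_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("gemma-agent", "Summarize today's findings.")
# With Ollama running locally, urllib.request.urlopen(req) would return
# an OpenAI-shaped JSON response.
```

Since only the `model` field changes between backends, model swaps are a one-string configuration change rather than a code change.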
5. RAG & Document Intelligence with Docling
The analyst features advanced Retrieval-Augmented Generation (RAG) capabilities. Users can upload various file formats (PDF, DOCX, images), which are processed using Docling for high-quality text extraction. The content is then chunked and stored in a local vector database, allowing the agent to provide context-aware answers based on your private documents.
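A toy sketch of that RAG flow: chunk the extracted text, index it, and retrieve the most relevant chunk for a query. In the real app, Docling handles PDF/DOCX extraction and a proper vector database with learned embeddings replaces the bag-of-words cosine scoring used here purely for illustration.

```python
import math
from collections import Counter

def chunk(text: str, size: int = 8) -> list:
    """Split text into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list) -> str:
    """Return the chunk most similar to the query."""
    q = Counter(query.lower().split())
    return max(chunks, key=lambda c: cosine(q, Counter(c.lower().split())))

doc = ("The firewall policy blocks inbound traffic on port 22. "
       "Quarterly audits review user access levels and credentials.")
chunks = chunk(doc)
print(retrieve("which port does the firewall block", chunks))
```

The agent then receives the retrieved chunk alongside the user's question, grounding its answer in the uploaded document rather than the model's training data.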
Explore the Code
The full source code, including the FastAPI server, agent configuration, and frontend templates, is available on GitHub.
View on GitHub →