Remember when smart AI agents were mostly just a cool theory paper? Well, those days are over.
In Part 1, we explored why the right infrastructure—especially databases built for more than just plain text—is foundational for building capable smart AI agents. Because let’s face it: if your agent can’t process images, video, PDFs, or other multimodal data, it’s not exactly ready for the real world.
In Part 2, we cracked open the “brain” of an agent and showed how multimodal data fuels richer reasoning, contextual understanding, and better decision-making. TL;DR: Smart data leads to smart actions.
Now in Part 3, we go beyond architecture and design into actual deployment—Agents in the wild. These aren’t cherry-picked demos or academic one-offs, but real-world projects, built by teams pushing the boundaries of what smart AI agents can do with ApertureDB as a core part of the stack.
Let’s take a closer look at how teams are making it happen.
🧠 Agentic RAG with SmolAgents: Smarter Search That Thinks for Itself
— Haziqa Sajid, Data Scientist
Vanilla RAG has a fatal flaw: once it retrieves bad results, it’s game over. No retries, no corrections, no hope. Enter Agentic RAG—a smarter way to search. Built using Hugging Face’s SmolAgents and powered by ApertureDB, this project by community member Haziqa Sajid tackles academic paper overload.
By giving RAG a brain and the ability to reason, it brings structure and clarity to a sea of information. Instead of relying on brittle keyword search, this smart AI agent refines queries, reruns searches, and even shifts strategies if results fall short.
The Workflow
- Extracts information from complex, multimodal documents (like PDFs).
- Allows agents to re-ask, refine, and retry when they hit a dead end.
- Links semantic memory with structured metadata—crucial for good retrieval.
How the System Uses ApertureDB as the Brain Behind the Agent
- Store vector embeddings from academic PDFs
- Enable multimodal retrieval (text, structure, even metadata)
- Provide context for LLMs to reason more accurately
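To ground this, here is a minimal sketch of how such a loop can be wired together with SmolAgents and the ApertureDB Python client. The descriptor set name ("paper_chunks"), connection details, embedding model, and returned property names are illustrative assumptions, not details from the project itself.

```python
# A minimal sketch of an agentic retrieval loop: a SmolAgents CodeAgent
# that can call, judge, and re-call an ApertureDB vector search tool
# instead of settling for one bad retrieval.
from aperturedb import Connector
from sentence_transformers import SentenceTransformer
from smolagents import CodeAgent, InferenceClientModel, tool

# Assumed connection details and descriptor set name ("paper_chunks").
db = Connector.Connector(host="localhost", user="admin", password="admin")
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

@tool
def search_papers(query: str) -> str:
    """Semantic search over academic-paper chunks stored in ApertureDB.

    Args:
        query: A natural-language description of what to look for.
    """
    vec = embedder.encode(query).astype("float32").tobytes()
    response, _ = db.query(
        [{"FindDescriptor": {
            "set": "paper_chunks",
            "k_neighbors": 5,
            "distances": True,
            "results": {"list": ["text", "title"]},  # assumed properties
        }}],
        [vec],  # the query vector rides along as a blob
    )
    entities = response[0]["FindDescriptor"].get("entities", [])
    if not entities:
        return "No results. Try a broader or rephrased query."
    return "\n\n".join(f"[{e['title']}] {e['text']}" for e in entities)

# The agent decides when to re-query: if a search comes back empty or
# off-topic, it rewrites the query and calls the tool again.
# (Model wrapper names vary across smolagents versions.)
agent = CodeAgent(tools=[search_papers], model=InferenceClientModel())
print(agent.run("What do recent papers say about Type II Diabetes care?"))
```

Because retrieval is just a tool the agent can invoke as many times as it needs, a dead end becomes a signal to rephrase and retry rather than a final answer.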
The Result
A research assistant that doesn’t hallucinate—it finds actual answers, complete with citations. Whether you are looking for papers on Type II Diabetes or digging into obscure medical treatments, this agent can crawl, adapt, and deliver.
This is not a trivial demo, but a production-grade, agent-powered research system. A template for anyone building smarter knowledge copilots, with a unified multimodal memory at its core. Exactly what ApertureDB was purpose-built for.
📖 Read the blog
🛋️ People Coming Over – The Personalized Agent That Shops So You Don’t Have To
— Team People Coming Over
We have all been there: guests are coming over and your room looks sad. You are not alone; there is even a subreddit about it: r/malelivingspace.
But instead of browsing IKEA or hiring an interior decorator, you snap a photo and let your AI agent handle it.
This is People Coming Over, an image-to-action agent that analyzes your room and makes it better.
The Workflow
- Perceive: You upload a photo of your space.
- Understand: The agent uses vision models to detect layout, objects, and overall vibe — clean, cluttered, cozy, chaotic?
- Plan: It reasons about improvements: “Add a floor lamp here,” “Move the couch closer to the window,” or “Swap the art for something brighter.”
- Act: It generates a shortlist of products and links to buy them, based on your preferences and budget.
Why ApertureDB?
Behind the scenes, this agent needs to store and correlate:
- Input photos
- Object detections and layouts
- Embeddings for visual similarity search
- LLM-generated suggestions and reasoning chains
- Product metadata and user preferences
That is a lot of heterogeneous, linked data, and it has to be queried in memory and in real time as the agent reasons and responds. With ApertureDB:
- Images, metadata, embeddings, and text are stored together in a single multimodal graph.
- The smart agent performs cross-modal search — e.g., “Find couches that look like this and match this vibe.”
- Latency stays low, because data is retrieved in milliseconds, not stitched together from S3, Redis, Postgres, or other third-party tools.
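As a sketch of what that unified storage can look like, the snippet below writes a photo, a stand-in embedding, and an agent suggestion into ApertureDB as one linked transaction. The property names, descriptor set ("room_looks"), and connection classes are illustrative assumptions, not the team's actual schema.

```python
# A minimal sketch: store the room photo, its embedding, and an agent
# suggestion as linked nodes in a single ApertureDB transaction.
import numpy as np
from aperturedb import Connector

db = Connector.Connector(host="localhost", user="admin", password="admin")

with open("room.jpg", "rb") as f:
    photo = f.read()
# Stand-in for a real vision-model embedding of the photo.
embedding = np.random.rand(512).astype("float32").tobytes()

query = [
    # 1. The uploaded photo, tagged with user context.
    {"AddImage": {"_ref": 1,
                  "properties": {"user_id": "u42", "vibe": "cluttered"}}},
    # 2. Its visual embedding, linked to the photo for similarity search.
    {"AddDescriptor": {"set": "room_looks",
                       "connect": {"ref": 1, "class": "embedding_of"}}},
    # 3. The agent's suggestion as a first-class, queryable entity.
    {"AddEntity": {"class": "Suggestion",
                   "properties": {"text": "Add a floor lamp near the couch",
                                  "budget_usd": 80},
                   "connect": {"ref": 1, "class": "suggested_for"}}},
]
# Blobs are consumed in command order: the image bytes, then the vector.
response, _ = db.query(query, [photo, embedding])
print(response)
```

Because the photo, embedding, and suggestion form one connected subgraph, a later cross-modal query ("find rooms that look like this, and their suggestions") is a single vector search plus graph traversal rather than three round trips to three separate stores.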
The Result
A multimodal shopping experience powered by ApertureDB. It is like having a design expert, research assistant, and fulfillment coordinator rolled into one. But instead of juggling three apps and ten browser tabs, a single agent handles everything—from analyzing your space to recommending products and managing orders—at a fraction of the cost. No duct tape. No hacks. Just fast, intelligent decision-making—built on ApertureDB, the data foundation purpose-built for Agentic AI.
🔗 Explore the project
💻 Metis AI – A Browser Agent That Automates SaaS Tasks
— Team Metis
SaaS tools are everywhere—but learning how to use them still feels like a chore. Metis AI aims to change that.
The Workflow
- Scrape how-to docs, screen recordings, and video tutorials.
- Convert them into multimodal instructions using ScribeAI.
- Store those instructions in ApertureDB as vectorized memory.
- Use Gemini to match user intent to the right actions, then execute them.
Unlike other agents, Metis can perform complex, multi-step actions on SaaS platforms by grounding its understanding in prior examples.
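A minimal sketch of that grounding step might look like the following, with Gemini turning a retrieved tutorial into concrete UI actions. The retrieve_guide helper is a hypothetical stand-in for the ApertureDB vector lookup, and the model name and prompt are assumptions.

```python
# A minimal sketch of intent matching: ground the user's request in a
# stored tutorial, then ask Gemini for an executable action plan.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model name

def retrieve_guide(request: str) -> str:
    # Hypothetical stand-in for the ApertureDB vector search over
    # ScribeAI-generated instructions.
    return ("1. Click 'New Project'. 2. Name the project. "
            "3. Open 'Members' and enter teammate emails. 4. Click 'Invite'.")

def plan_actions(user_request: str) -> str:
    guide = retrieve_guide(user_request)
    prompt = (
        "You control a browser on a SaaS app. Using the tutorial below as "
        "grounding, output a numbered list of UI actions (click, type, "
        "select) that fulfills the user's request.\n\n"
        f"Tutorial:\n{guide}\n\nRequest: {user_request}"
    )
    return model.generate_content(prompt).text

print(plan_actions("Create a new project and invite two teammates."))
```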
ApertureDB Plays a Critical Role In:
- Storing and retrieving image+text-based guides for the web agent.
- Enhancing agent memory to complete tasks more accurately.
- Tracking agent performance using the Weights & Biases (W&B) Weave evaluator.
The Result
By leveraging ApertureDB, the Metis team enhanced the performance and capabilities of its web agent by augmenting it with prior knowledge of the platform. From tutorial videos, the team generated structured instructions for a multimodal agent, enabling it to:
- Execute multiple steps in response to a single customer query
- Incorporate prior context to improve task execution
- Track accuracy using an evaluator component integrated with W&B Weave
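For the evaluation piece, here is a sketch of how W&B Weave can trace and score an agent step. The project name, stand-in agent function, and exact-match scorer are illustrative assumptions, not Metis' actual evaluator.

```python
# A minimal sketch of tracking agent accuracy with W&B Weave: wrap the
# task runner in a traced op, then score it against expected outcomes.
import asyncio
import weave

weave.init("metis-agent-eval")  # assumed project name

@weave.op()
def run_task(instruction: str) -> str:
    # Hypothetical stand-in for the real browser agent; Weave records
    # every call's inputs and outputs.
    return "settings_page" if "settings" in instruction.lower() else "unknown"

@weave.op()
def exact_match(expected: str, output: str) -> dict:
    # Scorer: Weave passes the dataset row's fields plus the model output.
    return {"correct": expected.strip() == output.strip()}

examples = [
    {"instruction": "Open the settings page", "expected": "settings_page"},
    {"instruction": "Export the billing report", "expected": "billing_csv"},
]
evaluation = weave.Evaluation(dataset=examples, scorers=[exact_match])
asyncio.run(evaluation.evaluate(run_task))  # logs a scored run to Weave
```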
🔗 See the tech behind Metis
ApertureDB Makes Agentic AI Work—At Scale
What once felt like science fiction—agents that see, reason, and act—is now very real. Across these projects, we have seen how developers are moving beyond prototypes and academic papers to deploy production-grade AI agents that actually do the work.
From automating SaaS workflows to decoding research papers, redesigning living rooms, and navigating complex interfaces like a human, one common thread runs through them all: ApertureDB.
It is the memory layer that enables agents to retain, retrieve, and reason across multimodal data—videos, images, text, structure, embeddings, and metadata—at scale and in real time. No brittle pipelines. No fragmented stores. Just a unified, queryable system built for agents that need to think fast and act smarter.
The future of AI isn't just large models—it is intelligent agents with the power to reason, recall, and respond. ApertureDB is how you build them.
👉 Start your free trial of ApertureDB Cloud and give your agents the memory they deserve.