Sometimes, our users are surprised: “What? You have a database but I cannot use SQL to query it?” (OK, that has now changed, more on it later.) Every once in a while, a few of them follow that up with: “Well then you must support Cypher or GraphQL or Gremlin?” I usually take a deep breath and launch into all the reasons why none of those worked for us, and I have had reasonable success convincing people to give the ApertureDB query language a chance. I have even had some users come back and say: “Given all that your database supports, we now see why you needed to define a different query language!”
Before we can talk about a query language for a multimodal AI database, we should ask: what is a multimodal AI database? We did a deep dive on the true meaning of multimodality in this article. Fundamentally, a true multimodal AI database not only allows you to search with multimodal vector indexes but also helps you connect the dots between various modalities of data, manage that data at scale, and prepare or process it in the format you need. Now let’s take a look at why existing database query languages were not enough for a multimodal AI database like ApertureDB, and why we built the ApertureDB query language the way we did.
Popular Database Languages and APIs
SQL’s Staying Power Challenged By Multimodality
SQL has been around forever. People “claim” to love it, and some even swear off any other database query language, refusing to use databases that don’t offer a SQL query interface. Some convert other data models to SQL tables before retrieving the data they need! Then why did we even consider an alternative? What made us think of a different query language? Let’s travel back in time and analyze the reasons that led to this decision. We can reuse some of the example queries we considered back when we started building ApertureDB and had to make this choice, as well as the ones we encounter now with Generative AI applications.
Imagine how you would implement the following queries:
- A user has shared a handbag image they like. Let’s show them similar handbags, but the vendor wants to include only handbags from the summer 2025 catalog priced between $50 and $150. The images should show the best angle and be cropped for display.
- A user typed a natural language query asking for highlights, pictures, video snippets, and a list of performers from a Taylor Swift concert attended by more than 20,000 people in the American Midwest. We need to prepare all of this data for display in the user’s browser.
Clearly, the first part of both queries maps to a multimodal vector search (you can learn more about that here), which is now supported not only by some standalone vector databases but also by most of the popular relational databases. However, if you have a large catalog and tens of millions of embeddings to search over, this can become a scaling issue for some incumbent databases. Setting that aside, the next step would be filtering the results of the vector search to apply the additional metadata constraints required by those queries. For a normalized relational database (which would be standard practice, especially at large organizations), that means joins: likely two, going into a catalog table and a product table for the first query, and a concert table followed by a performers table for the second. If there were no further constraints on how to choose the image or the other media, we would just retrieve the URLs and the display client (or a browser) could then fetch the multimodal data. However, we need to choose the best image and crop it for the first query, as well as prepare various media types for display in the second. This leads to another join against the best-image table, followed by some OpenCV / FFMPEG work to prepare the data for display.
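To make that sequence concrete, here is a minimal sketch of what the DIY version of the handbag query might look like when an application team stitches the pieces together themselves. The vector index, SQL connection, table names, and helper functions are all hypothetical placeholders; the point is how many systems the application has to coordinate for a single user-facing query.

```python
import cv2
import numpy as np

def similar_handbags_diy(handbag_embedding, vector_index, sql_conn, fetch_bytes):
    """Hypothetical DIY pipeline: vector search, then SQL joins, then OpenCV cropping.
    vector_index, sql_conn, and fetch_bytes stand in for whatever vector store,
    relational client, and object-store access the team happens to run."""
    # 1. Nearest-neighbor search over product embeddings (one system).
    candidate_ids = vector_index.search(handbag_embedding, k=100)

    # 2. Metadata constraints applied via SQL joins (a second system);
    #    exact parameter binding depends on the driver in use.
    rows = sql_conn.execute(
        """
        SELECT p.id, p.price, b.image_url
        FROM products p
        JOIN catalogs    c ON p.catalog_id = c.id     -- summer 2025 catalog only
        JOIN best_images b ON b.product_id = p.id     -- vendor-curated "best angle"
        WHERE p.id = ANY(%(ids)s)
          AND c.season = 'summer_2025'
          AND p.price BETWEEN 50 AND 150
        """,
        {"ids": candidate_ids},
    ).fetchall()

    # 3. Fetch and crop each image for display (a third step, in OpenCV).
    results = []
    for product_id, price, url in rows:
        raw = np.frombuffer(fetch_bytes(url), dtype=np.uint8)
        img = cv2.imdecode(raw, cv2.IMREAD_COLOR)
        results.append((product_id, price, img[0:400, 0:400]))  # naive display crop
    return results
```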
It’s this sequence of steps, and the onus it places on AI teams, that made us question the current solutions. There had to be something better! After all, removing the need to build large Frankenstein DIY data solutions was what led us to build ApertureDB in the first place.

Even so, why did we not try to make it work with SQL? JOINs already introduce significant complexity into SQL (ever tried implementing graph traversals in SQL?). On top of that, we would have needed to introduce the ability to preprocess data, with a wide range of preprocessing operations (along the lines of this research), which already meant breaking away from common SQL syntax. While extensions like PostGIS introduce new data types in SQL, we were looking at supporting a subset of existing operations that were really necessary for AI/ML applications on one hand, and introducing a whole set of new data types on the other. It wasn’t just about retrieving a document, image, or video, but about the ability to look inside a complex data type, express interesting components like bounding boxes in images or clips in audio/video, and process those. Together, that would have meant a far more extensive set of updates to the SQL standard.
Every new data type we needed to support would have meant modifying SQL syntax. That would not only have slowed down our database development, it would have done so without remaining fully compatible with SQL, and at the cost of continuing to propagate multiple JOINs as the best way of solving such queries, thus requiring expensive optimizations or hurting performance (after all, there is a reason there is so much research on optimizing JOIN performance through views, query planning, and so on). Multimodality was the final nail in the SQL coffin for us.
TL;DR: We were not willing to give up performance, scalability, or our ability to build fast for the sake of retrofitting multimodal AI search and processing operations into a language built for structured row- and column-oriented data!
Graph Query Languages
Graph query languages like Cypher, Gremlin, GraphQL, and SPARQL have also been getting attention lately and are really good with connected data. Cypher in particular does a decent job of looking like SQL while adding ways to express traversals, and the others fall back on familiar object-based access patterns. These languages don’t limit complex searches, which can give users a great way to do data analytics, provided the underlying graph database can scale and perform well (not the concern of this blog). Why, then, did we not adopt one of these query languages? Because we weren’t just building a graph database. The graph, or the ability to store metadata, was one of the requirements, but everything we described above regarding vector search, data access, and data processing was just as important. While common graph databases have been introducing vector search extensions, many of those would have required our users to implement support for these functions themselves.
Pythonic Query Interfaces
Pythonic querying, seen in tools like Pixeltable, Pandas, and SQLAlchemy, uses native Python syntax to build queries through method chaining and object-oriented logic. It’s flexible for handling multimodal or nested data and integrates well with ML workflows. In contrast, database query languages like SQL, SPARQL, or Cypher are declarative and optimized for engine-side execution, offering better performance for joins, indexing, and large-scale structured data. Pythonic methods are often slower and memory-bound unless backed by optimized runtimes, but they excel in downstream AI pipelines and schema-fluid environments. The choice depends on where computation happens and whether flexibility or throughput is the priority. Of course, most applications rely on a combination of the two whenever working with complex data at scale.
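For contrast, here is a rough sketch of the Pythonic, method-chaining style using pandas; the DataFrame and its columns are invented for this example, but the pattern of composing filters in client-side Python is the point.

```python
import pandas as pd

# Hypothetical product metadata already pulled into client memory.
products = pd.DataFrame(
    {
        "sku": ["A1", "B2", "C3"],
        "season": ["summer_2025", "winter_2024", "summer_2025"],
        "price": [120.0, 80.0, 45.0],
    }
)

# Method-chained, object-oriented filtering: expressive and ML-friendly,
# but executed in client memory rather than pushed down to a database engine.
summer_picks = (
    products
    .query("season == 'summer_2025' and price >= 50 and price <= 150")
    .sort_values("price")
    .head(10)
)
print(summer_picks)
```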
Evolving Communication Standards via Custom JSON
As machine learning and LLM pipelines increasingly rely on flexible, schema-light data flows, SQL’s rigid tabular structure becomes a bottleneck rather than a backbone. JSON, by contrast, has emerged as the lingua franca of evolving communication, powering everything from ML labeling tasks to ETL workflows for LLM fine-tuning. Even the Model Context Protocol (MCP), designed to standardize context injection and memory across agentic systems, adopts JSON as its foundational format, reinforcing its role in scalable, interoperable AI infrastructure. Its nested, expressive structure aligns naturally with multimodal data and memory architectures, enabling dynamic updates, sparse fields, and semantic richness that SQL simply can’t accommodate without cumbersome workarounds. Of course, we can’t throw just any JSON structure at the service; we still need to conform to a protocol defined in a verifiable JSON schema, but such a schema is far more easily customizable to a shape that can be validated.
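As a small illustration of what “verifiable JSON” means in practice, here is a minimal sketch using the jsonschema package; the schema and payload below are invented for this example and are not ApertureDB’s actual protocol.

```python
from jsonschema import validate

# Invented schema: the shape a hypothetical JSON-based service might accept.
request_schema = {
    "type": "object",
    "properties": {
        "command": {"type": "string"},
        "constraints": {"type": "object"},
        "limit": {"type": "integer", "minimum": 1},
    },
    "required": ["command"],
    "additionalProperties": True,  # sparse or extra fields remain allowed
}

# A sparse payload: only the fields this particular request needs.
payload = {"command": "find_images", "limit": 5}

# Raises jsonschema.ValidationError if the payload drifts outside the protocol.
validate(instance=payload, schema=request_schema)
```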
Query Language and API for Multimodal Data
As we discussed earlier, legacy query interfaces were built for rows and columns, not for clips, frames, polygons, or multimodal context. If we are ultimately moving to natural language queries or voice AI interfaces, then why not use the interface that gives us the most flexibility and expressiveness to represent the various modalities of data, search, and operations in the database and build from there?
Breaking the Limitations of Legacy Interfaces – ApertureDB Query Language
We designed ApertureDB’s Query Language (AQL) to break free from traditional language constraints by adopting JSON as its native format. It is easy to read and extensible, and its verbosity was not a problem because the larger data types we deal with are responsible for most of the query response time anyway. It has allowed us to introduce data types and functionalities like images, videos, embeddings, clips, frames, polygons, and intersection over union (IoU) to find overlaps between interesting objects, leaving room for many other modalities in the future. JSON also aligns with agentic workflows, supports sparse fields, and enables dynamic updates across modalities, as we saw earlier.
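To give a feel for the format, here is a rough sketch of an AQL-style query for the handbag example, written as the Python list of JSON commands a client might send. The command names, constraint syntax, and operation parameters here are illustrative and should be checked against the ApertureDB query documentation rather than copied from this sketch.

```python
# Illustrative AQL-style payload: a JSON array of commands, shown as Python dicts.
query = [
    {
        "FindImage": {
            "constraints": {
                "season": ["==", "summer_2025"],      # metadata filter
                "price": [">=", 50, "<=", 150],
            },
            "operations": [
                # Ask the server to crop before the image ever leaves the database.
                {"type": "crop", "x": 100, "y": 50, "width": 640, "height": 480}
            ],
            "results": {"list": ["sku", "price"]},
        }
    }
]
```

Because the payload is plain JSON, it is equally easy for a wrapper library, a UI, or an agent to generate.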
Is it really that easy to gain adoption when it supports so much? Do new users not balk when they first see queries in AQL? Yes, they sometimes do. However, as people gain familiarity with the syntax, it becomes simpler, and they often end up building their own domain-specific wrappers on top of it anyway (e.g. get_product_by_skuid() is a common e-commerce wrapper). Besides, we also offer other ways of accessing the database that meet our users and applications where they are, offering simpler interfaces built on this more powerful JSON-based query layer. JSON is the format we predicted modern AI systems would speak, so we built our query language to speak it fluently. In this context, choosing JSON-first infrastructure isn’t just a technical preference; it’s a strategic shift toward systems that speak the language of modern AI.
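A domain-specific wrapper like the get_product_by_skuid() mentioned above can be little more than a thin function that fills in the JSON for a common lookup. The sketch below is hypothetical and not part of any ApertureDB SDK; it assumes a client object exposing a query() method that ships the JSON to the database and returns its JSON response.

```python
def get_product_by_skuid(client, sku_id):
    """Hypothetical e-commerce wrapper that hides the JSON behind a one-liner.
    `client.query()` is assumed to send a list of JSON commands and return
    the server's JSON response."""
    query = [
        {
            "FindEntity": {
                "with_class": "Product",                   # illustrative class name
                "constraints": {"sku_id": ["==", sku_id]},
                "results": {"all_properties": True},
            }
        }
    ]
    return client.query(query)
```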

Verticalization and Simplification of AQL
As discussed just above, AQL’s expressiveness means that first-time users may feel overwhelmed. That’s why we have a layered interface strategy: graphical frontends for visual query construction, Python object wrappers tailored to abstract classic AI operations, and simplified APIs that hide complexity for application verticals. For developers and analysts, our Python wrappers offer object-oriented access to datasets, simplifying query logic and accelerating iteration. Together, these verticalized interfaces form a flexible ecosystem that lets users engage at their comfort level while preserving the full power of AQL and our JSON-first architecture underneath.
Multimodal UI
Multimodal data is not easy to navigate beyond just passing URLs around. If you want to browse multimodal datasets and search them with natural language queries, without fighting your tooling ecosystem just to display the results, you need a frontend that is built with multimodal data in mind. That’s exactly what the ApertureDB UI is intended to do. You can run no-code searches over your images, videos, and PDFs, run semantic searches across these data types, or use the graph schema to build your own custom queries and visualize the results!

Wrappers for the Reluctant – SQL and SPARQL
Not everyone wants to write JSON, and many users work with tools like BI dashboards that have better support for SQL. Some come from SQL-heavy backgrounds or prefer declarative syntax for familiar workflows. We meet them where they are. ApertureDB supports SQL-style wrappers for structured queries, enabling users to interact with multimodal data using familiar paradigms. These wrappers translate SQL-like inputs into AQL under the hood, bridging the tools and syntax they already know while seamlessly extending into multimodal search and processing. For users from the RDF graph world, we also have SPARQL interfaces. As of now, these are read-only, but that is where most of the user requirements are anyway.
Next Step – Natural Language
With JSON as the foundation and layered interfaces in place, the next frontier is natural language, and ApertureDB is already there. We’ve built retrieval-augmented generation (RAG) and GraphRAG chatbots that serve as natural language interfaces to ApertureDB. Our MCP server plugin enables agents and chatbots to inject structured memory and multimodal context directly into queries, grounding natural language interactions in precise, queryable semantics. This natural language layer doesn’t replace AQL; it builds on it. With its suite of JSON-based query language, Python wrappers, UI, and now natural language retrieval methods, ApertureDB has made access to multimodal data for AI pretty straightforward. Going forward, we will be focusing on improving support for multimodality and knowledge graphs through our MCP server as well as building an agentic memory layer to power future AI applications.
In closing, I would add: let the problem and its perfect solution prevail over a perceived language barrier!
Improved with feedback from Nolan Nichols (Stealth Biotech Startup), Drew Ogle, Sonam Gupta (Telnyx), Deniece Moxy
Images by Volodymyr Shostakovych, Senior Graphic Designer