Automating Knowledge Graph Creation with Gemini 2.5 and ApertureDB - Part 2

June 13, 2025
Ayesha Imran

Introduction

Welcome back to Part 2 of our tutorial series on automating knowledge graph creation using Google's Gemini 2.5 and ApertureDB! In Part 1, we established a strong foundation by leveraging Gemini's contextual understanding to extract structured entities from textual documents, deduplicate them, and store them securely within ApertureDB - a powerful multimodal database optimized for managing complex, interconnected data.

Knowledge graphs offer tremendous value by making data relationships explicit and visually intuitive, significantly improving systems like Retrieval-Augmented Generation (RAG) for more accurate and transparent outcomes. Building on the groundwork from Part 1, we’ll now focus on defining clear relationships among the stored entities and visualizing the complete knowledge graph interactively. By the end of this tutorial, you'll have a dynamic and insightful graph visualization ready to enhance your data-driven applications and analyses.

Please note that the code snippets in this blog have been shortened for brevity. You can find the complete code in the Colab Notebook.

Understanding the Components

To build our knowledge graph, we'll use:

  • ApertureDB: A specialized multimodal database designed to handle structured metadata along with text, embeddings, images, videos, and other rich media. It excels in rapidly querying and managing complex inter-entity relationships.
  • Gemini 2.5 Flash: Google's cutting-edge large language model (LLM) providing extensive contextual memory and speedy response times, ideal for detailed content extraction and analysis from lengthy documents.
  • LangChain: An efficient workflow orchestration tool that enables parallel execution of tasks through the RunnableLambda .batch() method, significantly speeding up our entire pipeline.
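For intuition, `.batch()` fans work out over a bounded pool of workers. The stdlib sketch below is a simplified stand-in (not LangChain's actual implementation, and `process_chunk` is a toy placeholder for an LLM call) that shows the same pattern of bounded concurrency:

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk: str) -> str:
    # Stand-in for an LLM call on one chunk.
    return chunk.upper()

chunks = ["alpha", "beta", "gamma"]

# Bounded concurrency, like .batch(..., config={"max_concurrency": 6}).
with ThreadPoolExecutor(max_workers=6) as pool:
    results = list(pool.map(process_chunk, chunks))

print(results)  # ['ALPHA', 'BETA', 'GAMMA']
```

Like `.batch()`, `pool.map` preserves input order even though the calls run concurrently.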

Workflow Overview

Our knowledge graph creation follows a structured workflow:

  1. Entity Class Schema Extraction: Identify general classes and their properties using Gemini.
  2. Entity Instance Extraction: Extract specific instances of these classes in parallel.
  3. Deduplication and ID Assignment: Clean up duplicates and assign unique IDs for each entity.
  4. Insert Entities in ApertureDB: Insert entities into the ApertureDB instance along with all their properties.
  5. Relationship Extraction: Clearly define explicit relationships between entities through Gemini.
  6. Knowledge Graph Creation in ApertureDB: create connections between the entities in ApertureDB.
  7. Visualization: Interactively visualize the constructed knowledge graph using PyVis.

In Part 1, we covered steps 1-4. In this part, we'll cover steps 5-7, providing detailed insights and practical code snippets to guide you through the creation of your own powerful knowledge graph.

Step 5: Relationship Extraction

Once entities are clearly defined and deduplicated, extracting explicit relationships between them is crucial for constructing a meaningful knowledge graph. Leveraging Gemini's deep contextual awareness, we identify and describe relationships such as employment, education, or location-based connections. Again, we use chunking and parallel processing, because extracting relationships from the entire document in one pass is slow and can produce lower-quality results. We reference entities by their previously assigned unique IDs along with their class names. The capabilities of Gemini 2.5 shine brightest here.

Pydantic Models for Structured Outputs

Once more, we use Pydantic models to define a clear output format, ensuring consistent and accurate output from Gemini:

from typing import List

from pydantic import BaseModel


class EntityReference(BaseModel):
    class_type: str
    id: int


class Relationship(BaseModel):
    relationship: str
    source: EntityReference
    destination: EntityReference


class RelationshipExtractionResult(BaseModel):
    relationships: List[Relationship]
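To see how these models constrain Gemini's output, here is a quick check (assuming Pydantic v2, with a made-up payload) that a JSON response in the expected format parses into typed objects:

```python
from typing import List

from pydantic import BaseModel

class EntityReference(BaseModel):
    class_type: str
    id: int

class Relationship(BaseModel):
    relationship: str
    source: EntityReference
    destination: EntityReference

class RelationshipExtractionResult(BaseModel):
    relationships: List[Relationship]

# A hypothetical response matching the prompt's format instructions.
raw = ('{"relationships": [{"relationship": "works_at", '
       '"source": {"class_type": "Person", "id": 1}, '
       '"destination": {"class_type": "Company", "id": 3}}]}')

result = RelationshipExtractionResult.model_validate_json(raw)
print(result.relationships[0].relationship)  # works_at
```

If Gemini returns malformed or incomplete JSON, validation raises an error instead of silently propagating bad data into the graph.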

Prompt Template

prompt = PromptTemplate(
    template="""
    You are the fourth agent in a multi-step workflow to build a Knowledge Graph from raw text.


    Workflow Steps Overview:
    1. Extract high-level class types and their properties. [DONE]
    2. Extract specific entities and their properties. [DONE]
    3. Deduplicate and assign IDs. [DONE]
    4. Identify relationships between entities. [CURRENT]
    5. Build the graph.


    YOUR TASK:
    - Extract relationships explicitly stated in the text.
    - Use known entity IDs and class types.
    - Common relationships: "works_at", "part_of", "connected_to", etc.
    - Do not infer or guess.


    ORIGINAL TEXT:
    {input_text}


    CLASS ENTITIES:
    {class_entities}


    FORMAT YOUR RESPONSE:
    {{
      "relationships": [
        {{
          "relationship": "works_at",
          "source": {{"class_type": "Person", "id": 1}},
          "destination": {{"class_type": "Company", "id": 3}}
        }}
      ]
    }}


    {format_instructions}
    Begin your extraction now:
    """,
    input_variables=["input_text", "class_entities"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

Parallel Processing

import json

from langchain_core.runnables import RunnableLambda


def extract_relationships_from_chunk(data):
    text, entities = data
    entity_refs = [
        {"name": name, "class": obj["Class"], "id": props["id"]}
        for obj in entities for name, props in obj["Entities"].items()
    ]
    return retry_llm_call(lambda: chain.invoke({
        "input_text": text,
        "class_entities": json.dumps(entity_refs, indent=2)
    }))


data = [(chunk.page_content, deduplicated_entities) for chunk in chunks]
results = RunnableLambda(extract_relationships_from_chunk).batch(data, config={"max_concurrency": 6})
extracted_relationships = merge_relationships(results)
save_json({"relationships": extracted_relationships}, "step4_output.json")
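The snippet above calls `retry_llm_call` and `merge_relationships`, which were elided for brevity. Plausible sketches (our assumptions about their behavior, not the notebook's exact code) might look like this:

```python
import time


def retry_llm_call(fn, retries=3, delay=2.0):
    """Call fn, retrying with a fixed delay on transient failures."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(delay)


def merge_relationships(results):
    """Flatten per-chunk results and drop exact duplicate edges."""
    seen, merged = set(), []
    for res in results:
        for rel in res.relationships:
            key = (rel.relationship,
                   rel.source.class_type, rel.source.id,
                   rel.destination.class_type, rel.destination.id)
            if key not in seen:
                seen.add(key)
                merged.append({
                    "relationship": rel.relationship,
                    "source": {"class_type": rel.source.class_type,
                               "id": rel.source.id},
                    "destination": {"class_type": rel.destination.class_type,
                                    "id": rel.destination.id},
                })
    return merged
```

Deduplicating here matters because overlapping chunks can report the same relationship more than once, which would otherwise create parallel edges in the graph.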

Here’s a sample taken from the relationship extraction step:

{
  "relationships": [
    {
      "relationship": "is_an_example_of",
      "source": {
        "class_type": "Computing System",
        "id": 3
      },
      "destination": {
        "class_type": "Computing System",
        "id": 1
      }
    },...

Gemini's context-awareness ensures the relationships identified are accurate and explicitly supported by the text, creating a trustworthy foundation for our knowledge graph.

Step 6: Creating Relationships in ApertureDB

We have already inserted all the entities into our ApertureDB instance previously. In this step, we will create connections between entities in ApertureDB using the extracted relationships.

Creating Relationships (aka Connections):

This step creates edges between entity nodes to form our knowledge graph, using ApertureDB connections. We reference each endpoint by its class type and unique ID from the extracted relationship dicts. Again, we use ApertureDB's ParallelLoader to speed things up.

import time

from aperturedb.ParallelLoader import ParallelLoader


def prepare_relationship_data(relationships):
    data = []
    for r in relationships:
        query = [
            {"FindEntity": {"with_class": r["source"]["class_type"], "constraints": {"id": ["==", r["source"]["id"]]}, "_ref": 1}},
            {"FindEntity": {"with_class": r["destination"]["class_type"], "constraints": {"id": ["==", r["destination"]["id"]]}, "_ref": 2}},
            {"AddConnection": {
                "class": r["relationship"],
                "src": 1,
                "dst": 2,
                "properties": {"created_at": time.strftime("%Y-%m-%d %H:%M:%S")}
            }}
        ]
        data.append((query, []))
    return data


def insert_relationships_parallel_loader(client, relationships):
    data = prepare_relationship_data(relationships)
    ParallelLoader(client).ingest(generator=data, batchsize=20, numthreads=4, stats=True)
    print(f"Created {len(data)} relationships.")


# Run
insert_relationships_parallel_loader(client, extracted_relationships)

We can see on our ApertureDB dashboard that the connections between entities have indeed been made:
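You can also spot-check the result programmatically. Assuming a connected `client`, a query like the following (a sketch based on ApertureDB's FindConnection command; the relationship class is illustrative) counts edges of a given class:

```python
# Count "works_at" edges; the response layout mirrors other Find* commands.
query = [{
    "FindConnection": {
        "with_class": "works_at",
        "results": {"count": True},
    }
}]

# response, _ = client.query(query)  # uncomment with a live connection
print(query[0]["FindConnection"]["with_class"])  # works_at
```

Comparing this count against `len(extracted_relationships)` per relationship class is a quick sanity check that the parallel ingest completed fully.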


Optional: Connect Entities to PDF Source

If you inserted the entire document as a blob into your database instance previously, you can now connect it to every entity using a "belongs_to_data_source" relationship:

def connect_entities_to_source(client, entities, pdf_blob_id=0):
    data = []


    for obj in entities:
        cls = obj["Class"]
        for _, props in obj["Entities"].items():
            eid = props["id"]
            query = [
                {"FindEntity": {"with_class": cls, "constraints": {"id": ["==", eid]}, "_ref": 1}},
                {"FindBlob": {"constraints": {"id": ["==", pdf_blob_id]}, "_ref": 2}},
                {"AddConnection": {
                    "class": "belongs_to_data_source",
                    "src": 1,
                    "dst": 2,
                    "properties": {"created_at": time.strftime("%Y-%m-%d %H:%M:%S")}
                }}
            ]
            data.append((query, []))


    ParallelLoader(client).ingest(generator=data, batchsize=50, numthreads=4, stats=True)
    print(f"Linked {len(data)} entities to the source document.")


Performance Highlights:

  • Parallel insertion dramatically reduces processing time.
  • Entity and relationship insertion logic utilizes efficient batch processing and threading.
  • Proper indexing in ApertureDB (on entity IDs) further enhances query performance.
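For example, an index on the `id` property can be created up front with ApertureDB's CreateIndex command. This is a sketch (the class name is illustrative; repeat for each entity class in your schema):

```python
# One CreateIndex per entity class speeds up the FindEntity
# lookups used when creating connections above.
index_query = [{
    "CreateIndex": {
        "index_type": "entity",
        "class": "Person",        # repeat for each entity class
        "property_key": "id",
    }
}]

# client.query(index_query)  # uncomment with a live connection
print(index_query[0]["CreateIndex"]["property_key"])  # id
```

Without an index, each FindEntity constraint on `id` scans all instances of the class, which adds up quickly during parallel relationship ingestion.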

Step 7: Visualizing the Knowledge Graph

To intuitively explore and understand our knowledge graph, we use PyVis, a powerful Python library for interactive visualization. We first query ApertureDB to fetch entities and their relationships, construct a NetworkX graph from the data, and then use PyVis to generate an interactive HTML visualization, complete with a color-coded legend for entity classes:

import networkx as nx
from pyvis.network import Network


def visualize_graph_from_aperturedb(client):
    query = [
        {"FindEntity": {"results": {"all_properties": True}}},
        {"FindConnection": {"results": {"all_properties": True}}}
    ]
    response, _ = client.query(query)
    entities = response[0]["FindEntity"]["entities"]
    connections = response[1]["FindConnection"]["connections"]


    id_map = {int(e["id"]): e["_uniqueid"] for e in entities if "id" in e}
    class_map = {e["class"]: [] for e in entities if "class" in e}


    G = nx.Graph()
    colors = ["#4287f5", "#f54242", "#42f551", "#f5d142"]
    color_map = {cls: colors[i % len(colors)] for i, cls in enumerate(class_map)}


    for e in entities:
        uid = e["_uniqueid"]
        label = e.get("name", f"Entity_{e.get('id', 'unknown')}")
        G.add_node(uid, label=label, color=color_map.get(e.get("class", "Unknown")), title=str(e))


    for c in connections:
        src = id_map.get(c.get("src_id"))
        dst = id_map.get(c.get("dst_id"))
        if src and dst:
            G.add_edge(src, dst, label=c.get("type", "related"))


    net = Network(height="600px", width="100%", bgcolor="#222222", font_color="white")
    net.from_nx(G)
    net.save_graph("aperturedb_knowledge_graph.html")

Result:


Interacting with the generated graph allows users to discover entity clusters, navigate relationship paths, and explore insights from the data effortlessly.

Practical Use Cases and Next Steps

The knowledge graph constructed through this workflow has broad applications across multiple domains:

  • Enhanced Information Retrieval: Query entities and their relationships with precision, enabling structured semantic search over large textual corpora.

  • Customer Support Systems: Automatically extract and link entities such as products, services, issues, and solutions from support documents, forming the backbone of intelligent FAQ systems.

  • Educational Tools: Organize and visualize learning materials by topics, concepts, and their interdependencies—ideal for building interactive curricula or study guides.

  • Data Integration: Merge semi-structured data from disparate sources into a unified graph, simplifying analysis and downstream reasoning tasks.

This tutorial focused on building the graph. In a follow-up post, we'll explore how to integrate it with Retrieval-Augmented Generation (RAG) pipelines, enhancing the factual accuracy and contextual relevance of LLM responses by grounding them in structured knowledge. ApertureDB further facilitates RAG through its support for vector embedding storage and vector search. If you stored the source document alongside the extracted information, you can use it to validate content in the graph, or later embed it for fast semantic search, also supported by ApertureDB.

Conclusion

This end-to-end pipeline demonstrates how combining the graph and multimodal capabilities of ApertureDB with the reasoning capabilities of Gemini 2.5 Flash results in an efficient and scalable method for knowledge graph generation. From schema extraction to entity resolution and relationship mapping, each step is optimized for clarity, robustness, and performance.

The resulting knowledge graph is not only immediately useful for querying and visualization, but it also lays the foundation for advanced applications in AI-driven retrieval, summarization, and question answering. And this is just a textual use case; you can use largely the same workflow - albeit with some added complexities - to make knowledge graphs from image, video, and/or audio data thanks to ApertureDB’s multimodal data support.

We encourage you to experiment with your own documents, adapt this pipeline to your needs, and build domain-specific knowledge graphs that support your unique data and objectives.


Appendix

For your reference and further exploration:

  1. Complete Colab Notebook
  2. ApertureDB documentation
  3. GraphRAG with ApertureDB
  4. Google’s Gemini documentation

👉 ApertureDB is available on Google Cloud. Subscribe Now

Author: Ayesha Imran | LinkedIn | Github

I am a software engineer passionate about AI/ML, Generative AI, and secure full-stack app development. I’m experienced in RAG and Agentic AI systems, LLMOps, full-stack development, cloud computing, and deploying scalable AI solutions. My ambition lies in building and contributing to innovative projects with real-world impact. Endlessly learning, perpetually building.


