Automating Knowledge Graph Creation with Gemini 2.5 and ApertureDB - Part 2

June 13, 2025
Ayesha Imran

Introduction

Welcome back to Part 2 of our tutorial series on automating knowledge graph creation using Google's Gemini 2.5 and ApertureDB! In Part 1, we established a strong foundation by leveraging Gemini's contextual understanding to extract structured entities from textual documents, deduplicate them, and store them securely within ApertureDB - a powerful multimodal database optimized for managing complex, interconnected data.

Knowledge graphs offer tremendous value by making data relationships explicit and visually intuitive, significantly improving systems like Retrieval-Augmented Generation (RAG) for more accurate and transparent outcomes. Building on the groundwork from Part 1, we’ll now focus on defining clear relationships among the stored entities and visualizing the complete knowledge graph interactively. By the end of this tutorial, you'll have a dynamic and insightful graph visualization ready to enhance your data-driven applications and analyses.

Please note that the code snippets in this blog have been shortened for brevity. You can find the complete code in the Colab Notebook.

Understanding the Components

To build our knowledge graph, we'll use:

  • ApertureDB: A specialized multimodal database designed to handle structured metadata along with text, embeddings, images, videos, and other rich media. It excels in rapidly querying and managing complex inter-entity relationships.
  • Gemini 2.5 Flash: Google's cutting-edge large language model (LLM) providing extensive contextual memory and speedy response times, ideal for detailed content extraction and analysis from lengthy documents.
  • LangChain: An efficient workflow orchestration tool that enables parallel execution of tasks through the RunnableLambda .batch() method, significantly speeding up our entire pipeline.
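For intuition, `.batch()` fans work out over a bounded pool of workers. The stdlib sketch below is a simplified stand-in (not LangChain's actual implementation, and `process_chunk` is a toy placeholder for an LLM call) that shows the same pattern of bounded concurrency:

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk: str) -> str:
    # Stand-in for an LLM call on one chunk.
    return chunk.upper()

chunks = ["alpha", "beta", "gamma"]

# Bounded concurrency, like .batch(..., config={"max_concurrency": 6}).
with ThreadPoolExecutor(max_workers=6) as pool:
    results = list(pool.map(process_chunk, chunks))

print(results)  # ['ALPHA', 'BETA', 'GAMMA']
```

Like `.batch()`, `pool.map` preserves input order even though the calls run concurrently.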

Workflow Overview

Our knowledge graph creation follows a structured workflow:

  1. Entity Class Schema Extraction: Identify general classes and their properties using Gemini.
  2. Entity Instance Extraction: Extract specific instances of these classes in parallel.
  3. Deduplication and ID Assignment: Clean up duplicates and assign unique IDs for each entity.
  4. Insert Entities in ApertureDB: Insert entities into the ApertureDB instance along with all their properties.
  5. Relationship Extraction: Clearly define explicit relationships between entities through Gemini.
  6. Knowledge Graph Creation in ApertureDB: create connections between the entities in ApertureDB.
  7. Visualization: Interactively visualize the constructed knowledge graph using PyVis.

In Part 1, we covered steps 1-4. In this part, we'll cover steps 5-7, providing detailed insights and practical code snippets to guide you through the creation of your own powerful knowledge graph.

Step 5: Relationship Extraction

Once entities are clearly defined and deduplicated, extracting explicit relationships between them is crucial for constructing a meaningful knowledge graph. Leveraging Gemini's deep contextual awareness, we identify and describe relationships such as employment, education, or location-based connections. Again, we use chunking and parallel processing, because extracting relationships from the entire document in one pass is slow and can produce lower-quality results. We reference entities by their previously assigned unique IDs along with their class names. The capabilities of Gemini 2.5 shine brightest here.

Pydantic Models for Structured Outputs

Once more, we use Pydantic models to define a clear output format, ensuring consistent and accurate output from Gemini:

from typing import List

from pydantic import BaseModel


class EntityReference(BaseModel):
    class_type: str
    id: int


class Relationship(BaseModel):
    relationship: str
    source: EntityReference
    destination: EntityReference


class RelationshipExtractionResult(BaseModel):
    relationships: List[Relationship]
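To see how these models constrain Gemini's output, here is a quick check (assuming Pydantic v2, with a made-up payload) that a JSON response in the expected format parses into typed objects:

```python
from typing import List

from pydantic import BaseModel

class EntityReference(BaseModel):
    class_type: str
    id: int

class Relationship(BaseModel):
    relationship: str
    source: EntityReference
    destination: EntityReference

class RelationshipExtractionResult(BaseModel):
    relationships: List[Relationship]

# A hypothetical response matching the prompt's format instructions.
raw = ('{"relationships": [{"relationship": "works_at", '
       '"source": {"class_type": "Person", "id": 1}, '
       '"destination": {"class_type": "Company", "id": 3}}]}')

result = RelationshipExtractionResult.model_validate_json(raw)
print(result.relationships[0].relationship)  # works_at
```

If Gemini returns malformed or incomplete JSON, validation raises an error instead of silently propagating bad data into the graph.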

Prompt Template

prompt = PromptTemplate(
    template="""
    You are the fourth agent in a multi-step workflow to build a Knowledge Graph from raw text.


    Workflow Steps Overview:
    1. Extract high-level class types and their properties. [DONE]
    2. Extract specific entities and their properties. [DONE]
    3. Deduplicate and assign IDs. [DONE]
    4. Identify relationships between entities. [CURRENT]
    5. Build the graph.


    YOUR TASK:
    - Extract relationships explicitly stated in the text.
    - Use known entity IDs and class types.
    - Common relationships: "works_at", "part_of", "connected_to", etc.
    - Do not infer or guess.


    ORIGINAL TEXT:
    {input_text}


    CLASS ENTITIES:
    {class_entities}


    FORMAT YOUR RESPONSE:
    {{
      "relationships": [
        {{
          "relationship": "works_at",
          "source": {{"class_type": "Person", "id": 1}},
          "destination": {{"class_type": "Company", "id": 3}}
        }}
      ]
    }}


    {format_instructions}
    Begin your extraction now:
    """,
    input_variables=["input_text", "class_entities"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

Parallel Processing

import json

from langchain_core.runnables import RunnableLambda


def extract_relationships_from_chunk(data):
    text, entities = data
    entity_refs = [
        {"name": name, "class": obj["Class"], "id": props["id"]}
        for obj in entities for name, props in obj["Entities"].items()
    ]
    return retry_llm_call(lambda: chain.invoke({
        "input_text": text,
        "class_entities": json.dumps(entity_refs, indent=2)
    }))


data = [(chunk.page_content, deduplicated_entities) for chunk in chunks]
results = RunnableLambda(extract_relationships_from_chunk).batch(data, config={"max_concurrency": 6})
extracted_relationships = merge_relationships(results)
save_json({"relationships": extracted_relationships}, "step4_output.json")
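The snippet above calls `retry_llm_call` and `merge_relationships`, which were elided for brevity. Plausible sketches (our assumptions about their behavior, not the notebook's exact code) might look like this:

```python
import time


def retry_llm_call(fn, retries=3, delay=2.0):
    """Call fn, retrying with a fixed delay on transient failures."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(delay)


def merge_relationships(results):
    """Flatten per-chunk results and drop exact duplicate edges."""
    seen, merged = set(), []
    for res in results:
        for rel in res.relationships:
            key = (rel.relationship,
                   rel.source.class_type, rel.source.id,
                   rel.destination.class_type, rel.destination.id)
            if key not in seen:
                seen.add(key)
                merged.append({
                    "relationship": rel.relationship,
                    "source": {"class_type": rel.source.class_type,
                               "id": rel.source.id},
                    "destination": {"class_type": rel.destination.class_type,
                                    "id": rel.destination.id},
                })
    return merged
```

Deduplicating here matters because overlapping chunks can report the same relationship more than once, which would otherwise create parallel edges in the graph.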

Here’s a sample taken from the relationship extraction step:

{
  "relationships": [
    {
      "relationship": "is_an_example_of",
      "source": {
        "class_type": "Computing System",
        "id": 3
      },
      "destination": {
        "class_type": "Computing System",
        "id": 1
      }
    },...

Gemini's context-awareness ensures the relationships identified are accurate and explicitly supported by the text, creating a trustworthy foundation for our knowledge graph.

Step 6: Creating Relationships in ApertureDB

We have already inserted all the entities into our ApertureDB instance previously. In this step, we will create connections between entities in ApertureDB using the extracted relationships.

Creating Relationships (aka Connections):

This step creates edges between entity nodes to form our knowledge graph, using ApertureDB connections. We reference each endpoint by its class type and unique ID from the extracted relationship dicts. Again, we use ApertureDB's ParallelLoader to speed things up.

import time

from aperturedb.ParallelLoader import ParallelLoader


def prepare_relationship_data(relationships):
    data = []
    for r in relationships:
        query = [
            {"FindEntity": {"with_class": r["source"]["class_type"], "constraints": {"id": ["==", r["source"]["id"]]}, "_ref": 1}},
            {"FindEntity": {"with_class": r["destination"]["class_type"], "constraints": {"id": ["==", r["destination"]["id"]]}, "_ref": 2}},
            {"AddConnection": {
                "class": r["relationship"],
                "src": 1,
                "dst": 2,
                "properties": {"created_at": time.strftime("%Y-%m-%d %H:%M:%S")}
            }}
        ]
        data.append((query, []))
    return data


def insert_relationships_parallel_loader(client, relationships):
    data = prepare_relationship_data(relationships)
    ParallelLoader(client).ingest(generator=data, batchsize=20, numthreads=4, stats=True)
    print(f"Created {len(data)} relationships.")


# Run
insert_relationships_parallel_loader(client, extracted_relationships)

We can see on our ApertureDB dashboard that the connections between entities have indeed been made:
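You can also spot-check the result programmatically. Assuming a connected `client`, a query like the following (a sketch based on ApertureDB's FindConnection command; the relationship class is illustrative) counts edges of a given class:

```python
# Count "works_at" edges; the response layout mirrors other Find* commands.
query = [{
    "FindConnection": {
        "with_class": "works_at",
        "results": {"count": True},
    }
}]

# response, _ = client.query(query)  # uncomment with a live connection
print(query[0]["FindConnection"]["with_class"])  # works_at
```

Comparing this count against `len(extracted_relationships)` per relationship class is a quick sanity check that the parallel ingest completed fully.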


Optional: Connect Entities to PDF Source

If you inserted the entire document as a blob into your database instance previously, you can now connect it to every entity using a "belongs_to_data_source" relationship:

def connect_entities_to_source(client, entities, pdf_blob_id=0):
    data = []


    for obj in entities:
        cls = obj["Class"]
        for _, props in obj["Entities"].items():
            eid = props["id"]
            query = [
                {"FindEntity": {"with_class": cls, "constraints": {"id": ["==", eid]}, "_ref": 1}},
                {"FindBlob": {"constraints": {"id": ["==", pdf_blob_id]}, "_ref": 2}},
                {"AddConnection": {
                    "class": "belongs_to_data_source",
                    "src": 1,
                    "dst": 2,
                    "properties": {"created_at": time.strftime("%Y-%m-%d %H:%M:%S")}
                }}
            ]
            data.append((query, []))


    ParallelLoader(client).ingest(generator=data, batchsize=50, numthreads=4, stats=True)
    print(f"Linked {len(data)} entities to the source document.")


Performance Highlights:

  • Parallel insertion dramatically reduces processing time.
  • Entity and relationship insertion logic utilizes efficient batch processing and threading.
  • Proper indexing in ApertureDB (on entity IDs) further enhances query performance.
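For example, an index on the `id` property can be created up front with ApertureDB's CreateIndex command. This is a sketch (the class name is illustrative; repeat for each entity class in your schema):

```python
# One CreateIndex per entity class speeds up the FindEntity
# lookups used when creating connections above.
index_query = [{
    "CreateIndex": {
        "index_type": "entity",
        "class": "Person",        # repeat for each entity class
        "property_key": "id",
    }
}]

# client.query(index_query)  # uncomment with a live connection
print(index_query[0]["CreateIndex"]["property_key"])  # id
```

Without an index, each FindEntity constraint on `id` scans all instances of the class, which adds up quickly during parallel relationship ingestion.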

Step 7: Visualizing the Knowledge Graph

To intuitively explore and understand our knowledge graph, we use PyVis, a powerful Python library for interactive visualization. We first query ApertureDB to fetch entities and their relationships, construct a NetworkX graph from the data, and then use PyVis to generate an interactive HTML visualization, complete with a color-coded legend for entity classes:

import networkx as nx
from pyvis.network import Network


def visualize_graph_from_aperturedb(client):
    query = [
        {"FindEntity": {"results": {"all_properties": True}}},
        {"FindConnection": {"results": {"all_properties": True}}}
    ]
    response, _ = client.query(query)
    entities = response[0]["FindEntity"]["entities"]
    connections = response[1]["FindConnection"]["connections"]


    id_map = {int(e["id"]): e["_uniqueid"] for e in entities if "id" in e}
    class_map = {e["class"]: [] for e in entities if "class" in e}


    G = nx.Graph()
    colors = ["#4287f5", "#f54242", "#42f551", "#f5d142"]
    color_map = {cls: colors[i % len(colors)] for i, cls in enumerate(class_map)}


    for e in entities:
        uid = e["_uniqueid"]
        label = e.get("name", f"Entity_{e.get('id', 'unknown')}")
        G.add_node(uid, label=label, color=color_map.get(e.get("class", "Unknown")), title=str(e))


    for c in connections:
        src = id_map.get(c.get("src_id"))
        dst = id_map.get(c.get("dst_id"))
        if src and dst:
            G.add_edge(src, dst, label=c.get("type", "related"))


    net = Network(height="600px", width="100%", bgcolor="#222222", font_color="white")
    net.from_nx(G)
    net.save_graph("aperturedb_knowledge_graph.html")

Result:


Interacting with the generated graph allows users to discover entity clusters, navigate relationship paths, and explore insights from the data effortlessly.

Practical Use Cases and Next Steps

The knowledge graph constructed through this workflow has broad applications across multiple domains:

  • Enhanced Information Retrieval: Query entities and their relationships with precision, enabling structured semantic search over large textual corpora.

  • Customer Support Systems: Automatically extract and link entities such as products, services, issues, and solutions from support documents, forming the backbone of intelligent FAQ systems.

  • Educational Tools: Organize and visualize learning materials by topics, concepts, and their interdependencies—ideal for building interactive curricula or study guides.

  • Data Integration: Merge semi-structured data from disparate sources into a unified graph, simplifying analysis and downstream reasoning tasks.

This tutorial focused on building the graph. In a follow-up post, we'll explore how to integrate it with Retrieval-Augmented Generation (RAG) pipelines, enhancing the factual accuracy and contextual relevance of LLM responses by grounding them in structured knowledge. ApertureDB further facilitates RAG through its support for vector embedding storage and vector search. If you stored the source document alongside the extracted information, you can use it to validate content in the graph, or later embed it for fast semantic search, also supported by ApertureDB.

Conclusion

This end-to-end pipeline demonstrates how combining the graph and multimodal capabilities of ApertureDB with the reasoning capabilities of Gemini 2.5 Flash results in an efficient and scalable method for knowledge graph generation. From schema extraction to entity resolution and relationship mapping, each step is optimized for clarity, robustness, and performance.

The resulting knowledge graph is not only immediately useful for querying and visualization, but it also lays the foundation for advanced applications in AI-driven retrieval, summarization, and question answering. And this is just a textual use case; you can use largely the same workflow - albeit with some added complexities - to make knowledge graphs from image, video, and/or audio data thanks to ApertureDB’s multimodal data support.

We encourage you to experiment with your own documents, adapt this pipeline to your needs, and build domain-specific knowledge graphs that support your unique data and objectives.


Appendix

For your reference and further exploration:

  1. Complete Colab Notebook
  2. ApertureDB documentation
  3. GraphRAG with ApertureDB
  4. Google’s Gemini documentation

👉 ApertureDB is available on Google Cloud. Subscribe Now

Author: Ayesha Imran | LinkedIn | Github

I am a software engineer passionate about AI/ML, Generative AI, and secure full-stack app development. I’m experienced in RAG and Agentic AI systems, LLMOps, full-stack development, cloud computing, and deploying scalable AI solutions. My ambition lies in building and contributing to innovative projects with real-world impact. Endlessly learning, perpetually building.


