Introduction
Welcome back to Part 2 of our tutorial series on automating knowledge graph creation using Google's Gemini 2.5 and ApertureDB! In Part 1, we established a strong foundation by leveraging Gemini's contextual understanding capabilities to extract structured entities from textual documents, deduplicating these entities, and securely storing them within ApertureDB - a powerful multimodal database optimized for managing complex, interconnected data.
Knowledge graphs offer tremendous value by making data relationships explicit and visually intuitive, significantly improving systems like Retrieval-Augmented Generation (RAG) for more accurate and transparent outcomes. Building on the groundwork from Part 1, we’ll now focus on defining clear relationships among the stored entities and visualizing the complete knowledge graph interactively. By the end of this tutorial, you'll have a dynamic and insightful graph visualization ready to enhance your data-driven applications and analyses.
Please note that the code snippets in this blog have been shortened for brevity. You can find the complete code in the Colab Notebook.
Understanding the Components
To build our knowledge graph, we'll use:
- ApertureDB: A specialized multimodal database designed to handle structured metadata along with text, embeddings, images, videos, and other rich media. It excels in rapidly querying and managing complex inter-entity relationships.
- Gemini 2.5 Flash: Google's cutting-edge large language model (LLM) providing extensive contextual memory and speedy response times, ideal for detailed content extraction and analysis from lengthy documents.
- LangChain: An efficient workflow orchestration tool that enables parallel execution of tasks via the RunnableLambda.batch() functionality, significantly speeding up our entire pipeline.
Workflow Overview
Our knowledge graph creation follows a structured workflow:

- Entity Class Schema Extraction: Identify general classes and their properties using Gemini.
- Entity Instance Extraction: Extract specific instances of these classes in parallel.
- Deduplication and ID Assignment: Clean up duplicates and assign unique IDs for each entity.
- Insert Entities in ApertureDB: Insert entities into the ApertureDB instance along with all their properties.
- Relationship Extraction: Clearly define explicit relationships between entities through Gemini.
- Knowledge Graph Creation in ApertureDB: create connections between the entities in ApertureDB.
- Visualization: Interactively visualize the constructed knowledge graph using PyVis.
In Part 1, we covered steps 1 - 4. In this part we’ll cover steps 5 - 7, providing detailed insights and practical code snippets to guide you through the creation of your own powerful knowledge graph.
Step 5: Relationship Extraction
Once entities are clearly defined and deduplicated, extracting explicit relationships between them is crucial for constructing a meaningful knowledge graph. Leveraging Gemini’s deep contextual awareness, we identify and describe relationships such as employment, education, or location-based connections. Again we use chunking and parallel processing: extracting relationships from the entire document in a single call is slow and tends to produce lower-quality results. Entities are referenced by their previously assigned unique IDs along with their class names. The powerful capabilities of Gemini 2.5 shine brightest here.
Pydantic Models for Structured Outputs
Once more we utilize Pydantic models to define a clear output format for consistent and accurate output from Gemini:
class EntityReference(BaseModel):
    class_type: str
    id: int

class Relationship(BaseModel):
    relationship: str
    source: EntityReference
    destination: EntityReference

class RelationshipExtractionResult(BaseModel):
    relationships: List[Relationship]
Prompt Template
prompt = PromptTemplate(
    template="""
You are the fourth agent in a multi-step workflow to build a Knowledge Graph from raw text.

Workflow Steps Overview:
1. Extract high-level class types and their properties. [DONE]
2. Extract specific entities and their properties. [DONE]
3. Deduplicate and assign IDs. [DONE]
4. Identify relationships between entities. [CURRENT]
5. Build the graph.

YOUR TASK:
- Extract relationships explicitly stated in the text.
- Use known entity IDs and class types.
- Common relationships: "works_at", "part_of", "connected_to", etc.
- Do not infer or guess.

ORIGINAL TEXT:
{input_text}

CLASS ENTITIES:
{class_entities}

FORMAT YOUR RESPONSE:
{{
  "relationships": [
    {{
      "relationship": "works_at",
      "source": {{"class_type": "Person", "id": 1}},
      "destination": {{"class_type": "Company", "id": 3}}
    }}
  ]
}}

{format_instructions}

Begin your extraction now:
""",
    input_variables=["input_text", "class_entities"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)
Parallel Processing
def extract_relationships_from_chunk(data):
    text, entities = data
    entity_refs = [
        {"name": name, "class": obj["Class"], "id": props["id"]}
        for obj in entities for name, props in obj["Entities"].items()
    ]
    return retry_llm_call(lambda: chain.invoke({
        "input_text": text,
        "class_entities": json.dumps(entity_refs, indent=2)
    }))

data = [(chunk.page_content, deduplicated_entities) for chunk in chunks]
results = RunnableLambda(extract_relationships_from_chunk).batch(data, config={"max_concurrency": 6})
extracted_relationships = merge_relationships(results)
save_json({"relationships": extracted_relationships}, "step4_output.json")
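The helpers retry_llm_call and merge_relationships used above are defined in the full notebook. Here is a minimal sketch of what they might look like, assuming each chunk result is available as a plain dict (in the notebook they may be Pydantic objects instead):

```python
import time

def retry_llm_call(fn, max_retries=3, delay=2.0):
    """Call fn(), retrying with a simple linear backoff on transient failures."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error
            time.sleep(delay * (attempt + 1))

def merge_relationships(results):
    """Flatten per-chunk results into one list, dropping exact duplicates."""
    seen, merged = set(), []
    for res in results:
        for rel in res["relationships"]:
            key = (rel["relationship"],
                   rel["source"]["class_type"], rel["source"]["id"],
                   rel["destination"]["class_type"], rel["destination"]["id"])
            if key not in seen:
                seen.add(key)
                merged.append(rel)
    return merged
```

Deduplicating by the (relationship, source, destination) triple matters because overlapping chunks often surface the same relationship more than once.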
Here’s a sample taken from the relationship extraction step:
{
  "relationships": [
    {
      "relationship": "is_an_example_of",
      "source": {
        "class_type": "Computing System",
        "id": 3
      },
      "destination": {
        "class_type": "Computing System",
        "id": 1
      }
    },
    ...
Gemini's context-awareness ensures the relationships identified are accurate and explicitly supported by the text, creating a trustworthy foundation for our knowledge graph.
Step 6: Creating Relationships in ApertureDB
We have already inserted all the entities into our ApertureDB instance previously. In this step, we will create connections between entities in ApertureDB using the extracted relationships.
Creating Relationships (aka Connections):
This step creates edges between entity nodes to form our knowledge graph, using ApertureDB connections. Each extracted relationship dict references its source and destination entities by class name and unique ID, which makes the lookups straightforward. Again we use ApertureDB’s ParallelLoader to speed things up.
def prepare_relationship_data(relationships):
    data = []
    for r in relationships:
        query = [
            {"FindEntity": {"with_class": r["source"]["class_type"], "constraints": {"id": ["==", r["source"]["id"]]}, "_ref": 1}},
            {"FindEntity": {"with_class": r["destination"]["class_type"], "constraints": {"id": ["==", r["destination"]["id"]]}, "_ref": 2}},
            {"AddConnection": {
                "class": r["relationship"],
                "src": 1,
                "dst": 2,
                "properties": {"created_at": time.strftime("%Y-%m-%d %H:%M:%S")}
            }}
        ]
        data.append((query, []))
    return data

def insert_relationships_parallel_loader(client, relationships):
    data = prepare_relationship_data(relationships)
    ParallelLoader(client).ingest(generator=data, batchsize=20, numthreads=4, stats=True)
    print(f"Created {len(data)} relationships.")

# Run
insert_relationships_parallel_loader(client, extracted_relationships)
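As a quick sanity check, you can also count the edges directly from code. A small sketch built on the standard query interface (count_connections is our own helper, not an ApertureDB API):

```python
def count_connections(client, connection_class=None):
    """Return the number of connections in the graph, optionally filtered by class."""
    find = {"results": {"count": True}}
    if connection_class is not None:
        find["with_class"] = connection_class  # e.g. "works_at"
    response, _ = client.query([{"FindConnection": find}])
    return response[0]["FindConnection"]["count"]
```

For a fully loaded graph, count_connections(client) should match the number of relationships ingested above.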
We can see on our ApertureDB dashboard that the connections between entities have indeed been made:

Optional: Connect Entities to PDF Source
If you previously inserted the entire document as a blob into your database instance, you can now connect every entity to it with a "belongs_to_data_source" relationship:
def connect_entities_to_source(client, entities, pdf_blob_id=0):
    data = []
    for obj in entities:
        cls = obj["Class"]
        for _, props in obj["Entities"].items():
            eid = props["id"]
            query = [
                {"FindEntity": {"with_class": cls, "constraints": {"id": ["==", eid]}, "_ref": 1}},
                {"FindBlob": {"constraints": {"id": ["==", pdf_blob_id]}, "_ref": 2}},
                {"AddConnection": {
                    "class": "belongs_to_data_source",
                    "src": 1,
                    "dst": 2,
                    "properties": {"created_at": time.strftime("%Y-%m-%d %H:%M:%S")}
                }}
            ]
            data.append((query, []))
    ParallelLoader(client).ingest(generator=data, batchsize=50, numthreads=4, stats=True)
    print(f"Linked {len(data)} entities to the source document.")
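This assumes the PDF was stored as a blob with a numeric id property back in Part 1. If you skipped that step, a minimal sketch of such an insertion might look like the following (the id and source property names are our own choice here, not a fixed schema):

```python
def insert_pdf_blob(client, pdf_path, blob_id=0):
    """Store the raw PDF bytes as an ApertureDB blob, tagged with an id property."""
    with open(pdf_path, "rb") as f:
        pdf_bytes = f.read()
    # AddBlob attaches the properties to the blob; the bytes travel alongside the query.
    query = [{"AddBlob": {"properties": {"id": blob_id, "source": pdf_path}}}]
    response, _ = client.query(query, [pdf_bytes])
    return response
```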
Performance Highlights:
- Parallel insertion dramatically reduces processing time.
- Entity and relationship insertion logic utilizes efficient batch processing and threading.
- Proper indexing in ApertureDB (on entity IDs) further enhances query performance.
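For example, indexes on the numeric id property can be created up front with ApertureDB's CreateIndex command. A sketch (check the documentation for the exact parameters supported by your version):

```python
def create_id_indexes(client, entity_classes):
    """Ask ApertureDB to index the 'id' property of each entity class."""
    query = [
        {"CreateIndex": {
            "index_type": "entity",     # indexing entity nodes, not connections
            "class": cls,
            "property_key": "id",
        }}
        for cls in entity_classes
    ]
    response, _ = client.query(query)
    return response
```

Running this once before the FindEntity-heavy relationship loading keeps the constraint lookups fast.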
Step 7: Visualizing the Knowledge Graph
To intuitively explore and understand our knowledge graph, we use PyVis, a powerful Python library for interactive visualization. We first query ApertureDB to fetch entities and their relationships, construct a NetworkX graph from the data, and then use PyVis to generate an interactive HTML visualization, complete with a color-coded legend for entity classes:
def visualize_graph_from_aperturedb(client):
    query = [
        {"FindEntity": {"results": {"all_properties": True}}},
        {"FindConnection": {"results": {"all_properties": True}}}
    ]
    response, _ = client.query(query)
    entities = response[0]["FindEntity"]["entities"]
    connections = response[1]["FindConnection"]["connections"]

    id_map = {int(e["id"]): e["_uniqueid"] for e in entities if "id" in e}
    class_map = {e["class"]: [] for e in entities if "class" in e}

    G = nx.Graph()
    colors = ["#4287f5", "#f54242", "#42f551", "#f5d142"]
    color_map = {cls: colors[i % len(colors)] for i, cls in enumerate(class_map)}

    for e in entities:
        uid = e["_uniqueid"]
        label = e.get("name", f"Entity_{e.get('id', 'unknown')}")
        G.add_node(uid, label=label, color=color_map.get(e.get("class", "Unknown")), title=str(e))

    for c in connections:
        src = id_map.get(c.get("src_id"))
        dst = id_map.get(c.get("dst_id"))
        if src and dst:
            G.add_edge(src, dst, label=c.get("type", "related"))

    net = Network(height="600px", width="100%", bgcolor="#222222", font_color="white")
    net.from_nx(G)
    net.save_graph("aperturedb_knowledge_graph.html")
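PyVis doesn't render a class legend out of the box; one lightweight option is to inject a small HTML snippet into the saved file after the fact. This helper is our own addition, not part of PyVis, and expects the same class-to-color mapping built inside the function above (you would need to return or recompute color_map to pass it in):

```python
def add_legend_html(html_path, color_map):
    """Append a fixed-position class/color legend to the saved PyVis HTML file."""
    items = "".join(
        f'<div><span style="display:inline-block;width:12px;height:12px;'
        f'margin-right:6px;background:{color};"></span>{cls}</div>'
        for cls, color in color_map.items()
    )
    legend = ('<div style="position:fixed;top:10px;right:10px;z-index:10;'
              'color:white;background:#333;padding:8px;border-radius:4px;">'
              f'{items}</div>')
    with open(html_path, "r", encoding="utf-8") as f:
        html = f.read()
    # Insert the legend just before the closing body tag.
    with open(html_path, "w", encoding="utf-8") as f:
        f.write(html.replace("</body>", legend + "</body>", 1))
```

Calling add_legend_html("aperturedb_knowledge_graph.html", color_map) right after net.save_graph(...) yields the color-coded legend seen in the result below.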
Result:


Interacting with the generated graph allows users to discover entity clusters, navigate relationship paths, and explore insights from the data effortlessly.
Practical Use Cases and Next Steps
The knowledge graph constructed through this workflow has broad applications across multiple domains:
- Enhanced Information Retrieval: Query entities and their relationships with precision, enabling structured semantic search over large textual corpora.
- Customer Support Systems: Automatically extract and link entities such as products, services, issues, and solutions from support documents, forming the backbone of intelligent FAQ systems.
- Educational Tools: Organize and visualize learning materials by topics, concepts, and their interdependencies—ideal for building interactive curricula or study guides.
- Data Integration: Merge semi-structured data from disparate sources into a unified graph, simplifying analysis and downstream reasoning tasks.
This tutorial focused on building the graph. In a follow-up post, we’ll explore how to integrate it with Retrieval-Augmented Generation (RAG) pipelines, grounding LLM responses in structured knowledge to improve their factual accuracy and contextual relevance. ApertureDB further facilitates RAG through its support for vector embedding storage and vector search. If you stored the source document alongside the extracted information, you can use it to validate content in the graph or eventually embed it for fast semantic search, also supported by ApertureDB.
Conclusion
This end-to-end pipeline demonstrates how combining the graph and multimodal capabilities of ApertureDB with the reasoning capabilities of Gemini 2.5 Flash results in an efficient and scalable method for knowledge graph generation. From schema extraction to entity resolution and relationship mapping, each step is optimized for clarity, robustness, and performance.
The resulting knowledge graph is not only immediately useful for querying and visualization, but it also lays the foundation for advanced applications in AI-driven retrieval, summarization, and question answering. And this is just a textual use case; you can use largely the same workflow - albeit with some added complexities - to make knowledge graphs from image, video, and/or audio data thanks to ApertureDB’s multimodal data support.
We encourage you to experiment with your own documents, adapt this pipeline to your needs, and build domain-specific knowledge graphs that support your unique data and objectives.
Appendix
For your reference and further exploration:
- Complete Colab Notebook
- ApertureDB documentation
- GraphRAG with ApertureDB
- Google’s Gemini documentation
👉 ApertureDB is available on Google Cloud. Subscribe Now
Author: Ayesha Imran | LinkedIn | Github
I am a software engineer passionate about AI/ML, Generative AI, and secure full-stack app development. I’m experienced in RAG and Agentic AI systems, LLMOps, full-stack development, cloud computing, and deploying scalable AI solutions. My ambition lies in building and contributing to innovative projects with real-world impact. Endlessly learning, perpetually building.