23 commits
2c90a5e
Update .gitignore to exclude .qodo files
owlmoonss Mar 14, 2025
ee93447
Add secrets configuration and update to use Ollama for embeddings
owlmoonss Mar 15, 2025
ce16a69
Update pydantic and pydantic-core versions in requirements.txt
owlmoonss Mar 15, 2025
db74ab9
Refactor to use ChatOllama for embeddings in graph_cypher_chain.py
owlmoonss Mar 20, 2025
0e4eaa8
Refactor to use ChatOllama for language model in rag_agent.py
owlmoonss Mar 20, 2025
50d6bca
Update embedding model in vector_chain.py to use Ollama with llama2
owlmoonss Mar 20, 2025
7205ece
Update vector_graph_chain.py to use ChatOllama with llama2 for embedd…
owlmoonss Mar 20, 2025
b0d66c9
Update models in graph_cypher_chain.py, rag_agent.py, vector_chain.py…
owlmoonss Mar 20, 2025
e5555f2
Update models in graph_cypher_chain.py, rag_agent.py, vector_chain.py…
owlmoonss Mar 27, 2025
9fc2031
Refactor graph_cypher_chain and rag_agent to improve result extractio…
owlmoonss Apr 3, 2025
8fa12db
Update Cypher generation template to enhance clarity and remove examp…
owlmoonss Apr 3, 2025
312c855
Add JSON logging for chain results in graph_cypher_chain.py
owlmoonss Apr 3, 2025
84b627b
Refactor Cypher generation template and improve schema clarity in gra…
owlmoonss Apr 8, 2025
5ce60b3
Enhance relationship context in Cypher schema and clean up logging st…
owlmoonss Apr 9, 2025
b60ca17
Remove redundant comments in graph_cypher_chain.py for improved clarity
owlmoonss Apr 9, 2025
3e261dc
Refactor Cypher generation to use schema from graph and remove unused…
owlmoonss Apr 12, 2025
a77e2ec
Refactor Cypher generation template for clarity and enhance question …
owlmoonss Apr 13, 2025
757305b
Refactor Cypher generation template for improved clarity and detail; …
owlmoonss Apr 13, 2025
bbc44b2
Add .secrets.toml to .gitignore to prevent sensitive information from…
owlmoonss Apr 13, 2025
495b338
Refactor comments for clarity and consistency; remove unused graph_to…
owlmoonss Apr 13, 2025
8e1450b
Enhance Cypher generation template with new guidelines for matching r…
owlmoonss Apr 14, 2025
5b5b2be
Refactor graph_cypher_chain.py and rag_agent.py for improved LLM inte…
owlmoonss Apr 17, 2025
42978e4
Refactor rag_agent.py for improved LLM processing; enhance conversati…
owlmoonss Apr 17, 2025
4 changes: 3 additions & 1 deletion .gitignore
@@ -13,4 +13,6 @@ __pycache__
# secrets.toml

# Dependency managers
*.lock
*.lock
.qodo
.secrets.toml
5 changes: 5 additions & 0 deletions .streamlit/secrets.toml
@@ -0,0 +1,5 @@
NEO4J_URI="neo4j://localhost:7687"
NEO4J_USERNAME="neo4j"
NEO4J_PASSWORD="neo4j123"
SEGMENT_WRITE_KEY=""
OPENAI_API_KEY=""
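This file stores a Neo4j password in the repository. The new `.gitignore` entry `.secrets.toml` only matches files named exactly `.secrets.toml`, so `.streamlit/secrets.toml` is not covered, and a file that is already tracked stays tracked regardless of `.gitignore`. A common alternative, sketched here as an assumption rather than the project's own approach, is to read credentials from environment variables first and fall back to `st.secrets`. `load_setting` is a hypothetical helper; it treats empty values such as `OPENAI_API_KEY=""` as missing so the failure surfaces at startup.

```python
import os
from typing import Mapping, Optional


def load_setting(
    name: str,
    secrets: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return a setting from the environment, falling back to a secrets mapping.

    Hypothetical helper: empty strings (e.g. OPENAI_API_KEY="") count as missing.
    """
    env = os.environ if env is None else env
    # Environment variables take precedence, so real credentials need not live in the repo
    value = env.get(name) or secrets.get(name)
    if not value:
        raise KeyError(f"Missing required setting: {name}")
    return value
```

In `graph_cypher_chain.py` this might read `password = load_setting("NEO4J_PASSWORD", st.secrets)`, assuming `st.secrets` can be used as a read-only mapping.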
2,532 changes: 0 additions & 2,532 deletions poetry.lock

This file was deleted.

228 changes: 151 additions & 77 deletions rag_demo/graph_cypher_chain.py
@@ -1,47 +1,114 @@
import json
import logging
import streamlit as st
import urllib.parse
from retry import retry
from langchain.chains import GraphCypherQAChain
from langchain.chains.conversation.memory import ConversationBufferMemory
from langchain_community.graphs import Neo4jGraph
from langchain.prompts.prompt import PromptTemplate
from langchain_ollama import ChatOllama
from langchain_openai import ChatOpenAI
from retry import retry
import logging
import streamlit as st

CYPHER_GENERATION_TEMPLATE = """Task: Generate Cypher statement to query a graph database strictly based on the schema and instructions provided.
Instructions:
1. Use only nodes, relationships, and properties mentioned in the schema.
2. Always enclose the Cypher output inside 3 backticks. Do not add 'cypher' after the backticks.
3. Always do a case-insensitive and fuzzy search for any properties related search. Eg: to search for a Company name use `toLower(c.name) contains 'neo4j'`
4. Always use aliases to refer the node in the query
5. Always return count(DISTINCT n) for aggregations to avoid duplicates
6. `OWNS_STOCK_IN` relationship is syonymous with `OWNS` and `OWNER`
7. Use examples of questions and accurate Cypher statements below to guide you.

Schema:

CYPHER_GENERATION_TEMPLATE = """
You are a Cypher expert who translates natural language questions into Cypher queries for a Neo4j graph database.
The database contains entities such as:
- Paper (p)
- Location (l)
- OceanCirculation (oc)
- WeatherEvent (we)
- Teleconnection (t)
- Model or Project (m)

Relationships include:
- :Mention (from Paper to another node), with property: Mention_Sentence
- :TargetsLocation (from a concept like OceanCirculation or WeatherEvent to a Location)

Properties include:
- Name (for all nodes)
- Mention_Sentence (in the :Mention relationship)
- wikidata_description (for Location)

- Ocean circulation processes often target specific oceanic locations, such as the Southern Ocean.
- Mentions of concepts in papers are linked via the :Mention relationship, which includes a Mention_Sentence field.
- To filter by concepts like "upwelling", check if the Mention_Sentence contains that word.
- A common type of question is: "What [scientific concept] are discussed in relation to [location] and involving [mechanism/phenomenon]?"
- The Cypher query often starts by matching a domain concept (e.g., OceanCirculation) and the location it's associated with.
- Then it retrieves papers mentioning that concept, filtering by keywords in the mention sentence.
- **Always use `[m:Mention]` when matching the mention relationship, never `[m]` or `[:Mention]`.**
- **Use labels like `:WeatherEvent`, `:OceanCirculation`, etc., only when the natural language question explicitly refers to the concept. Otherwise, leave the node unlabeled.**
- **Wrap multiple conditions in WHERE clauses (e.g., with OR/AND) inside parentheses to preserve logic clarity.**
- **When using a Location name in a Cypher match, convert it to all uppercase and replace spaces with underscores. (e.g., "North Atlantic" → "NORTH_ATLANTIC")**

Important: Never use [:Mention] in query and Name of Location always uppercase and replace space with _ (example "North Atlantic" becomes "NORTH_ATLANTIC").

The following is the schema of the Neo4j database. The schema is a simplified representation of the graph database, showing the types of nodes and relationships present in the database. The schema includes nodes for Paper, Location, OceanCirculation, WeatherEvent, Teleconnection, and Model or Project, along with their respective properties and relationships.


Schema database is:
{schema}

Examples: Here are a few examples of generated Cypher statements for particular questions:
Here are some examples:

### Example 1
Natural Language Question:
Which papers mention anomalous temperature regimes such as cold air outbreaks (CAOs) or warm waves (WWs) in relation to North America, specifically in the sentences where these terms appear?

Cypher:
MATCH (we)-[:TargetsLocation]-(l{{Name:"NORTH_AMERICA"}})
MATCH (p:Paper)-[m:Mention]-(we)
WHERE m.Mention_Sentence CONTAINS 'WW' OR m.Mention_Sentence CONTAINS 'CAOs'
RETURN p,l,we;

---

### Example 2
Natural Language Question:
Which papers discuss ocean circulation processes—such as thermohaline circulation—in oceanic regions that include either “North” or “South” in their names?

Cypher:
MATCH (n:Location)
WHERE n.Name CONTAINS 'OCEAN' AND (n.Name CONTAINS 'NORTH' OR n.Name CONTAINS 'SOUTH')
MATCH (oc:OceanCirculation)-[:TargetsLocation]-(n)
MATCH (p:Paper)-[m:Mention]-(oc)
WHERE m.Mention_Sentence CONTAINS 'thermohaline circulation'
RETURN n,oc,p;

---

### Example 3
Natural Language Question:
Which papers mention CMIP5 models and the North Atlantic Oscillation (NAO) in the context of the Southeast United States?

Cypher:
MATCH (p:Paper)-[r:Mention]->(m:Model|Project)
WHERE m.Name CONTAINS 'CMIP_5'
MATCH (p)-[t:Mention]-(n:Teleconnection{{Name:"NORTH_ATLANTIC_OSCILLATION"}})
WHERE t.Mention_Sentence CONTAINS 'Southeast'
RETURN p,m,n;

# How many Managers own Companies?
MATCH (m:Manager)-[:OWNS_STOCK_IN]->(c:Company)
RETURN count(DISTINCT m)
---

# How many companies are in filings?
MATCH (c:Company)
RETURN count(DISTINCT c)
### Example 4
Natural Language Question:
Which papers mention the Pacific-North American (PNA) pattern in connection with locations in the United States?

# Which companies are vulnerable to material shortage?
MATCH (co:Company)-[fi]-(f:Form)-[po]-(c:Chunk)
WHERE toLower(c.text) CONTAINS "material"
RETURN DISTINCT count(c) as chunks, co.name ORDER BY chunks desc
Cypher:
MATCH (p:Paper)-[z:Mention]->(t:Teleconnection{{Name:"PACIFIC_NORTH_AMERICAN_PNA_PATTERN"}})
MATCH (t)-[:TargetsLocation]-(l:Location)
MATCH (p)-[z2:Mention]-(l)
WHERE l.wikidata_description CONTAINS "United States"
RETURN p,t,l;


---

Now generate a Cypher query for:
{question}
"""

# Which companies are in a specific industry?
MATCH (co:Company)-[fi]-(f:Form)-[po]-(c:Chunk)
WHERE toLower(c.text) CONTAINS "industryName"
RETURN DISTINCT count(c) as chunks, co.name ORDER BY chunks desc

The question is:
{question}"""

CYPHER_GENERATION_PROMPT = PromptTemplate(
input_variables=["schema", "question"], template=CYPHER_GENERATION_TEMPLATE
@@ -51,79 +118,86 @@
memory_key="chat_history",
input_key='question',
output_key='answer',
return_messages=True)
return_messages=True
)

# Neo4j connection
url = st.secrets["NEO4J_URI"]
username = st.secrets["NEO4J_USERNAME"]
password = st.secrets["NEO4J_PASSWORD"]

if "USER_OPENAI_API_KEY" in st.session_state:
openai_key = st.session_state["USER_OPENAI_API_KEY"]
else:
openai_key = st.secrets["OPENAI_API_KEY"]

graph = Neo4jGraph(
url=url,
username=username,
password=password,
sanitize = True
sanitize=True
)

# Official API doc for GraphCypherQAChain at: https://api.python.langchain.com/en/latest/chains/langchain.chains.graph_qa.base.GraphQAChain.html#

graph_chain = GraphCypherQAChain.from_llm(
#cypher_llm=ChatOllama(model="qwen2", temperature=0),
#qa_llm=ChatOllama(model="qwen2", temperature=0),
cypher_llm=ChatOpenAI(
openai_api_key=openai_key,
temperature=0,
model_name="gpt-4"
),
qa_llm=ChatOpenAI(
openai_api_key=openai_key,
temperature=0,
model_name="gpt-4"
),
validate_cypher= True,
openai_api_key=st.secrets["OPENAI_API_KEY"],
temperature=0,
model_name="gpt-4o-mini"
),
qa_llm=ChatOpenAI(
openai_api_key=st.secrets["OPENAI_API_KEY"],
temperature=0,
model_name="gpt-4o-mini"),
graph=graph,
verbose=True,
# return_intermediate_steps = True,
return_direct = True
cypher_prompt=CYPHER_GENERATION_PROMPT,
validate_cypher=True,
return_direct=True,
verbose=True,
allow_dangerous_requests=True,
return_intermediate_steps=True,
)


@retry(tries=2, delay=12)
def get_results(question) -> str:
"""Generate a response from a GraphCypherQAChain targeted at generating answered related to relationships.

Args:
question (str): User query

Returns:
str: Answer from chain
"""

logging.info(f'Using Neo4j database at url: {url}')

"""Generate a response from the GraphCypherQAChain using a cleaned schema and improved prompt."""

logging.info(f'Using Neo4j database at URL: {url}')
graph.refresh_schema()

prompt=CYPHER_GENERATION_PROMPT.format(schema=graph.get_schema, question=question)
print('Prompt:', prompt)
#Log full Neo4j schema in terminal
#print("\n========= Raw Schema from Neo4j =========\n")
#print(graph.get_schema)

chain_result = None
prompt = CYPHER_GENERATION_PROMPT.format(schema=graph.get_schema, question=question)
print('\n========= Prompt to LLM =========\n')
print(prompt)

try:
chain_result = graph_chain.invoke({
"query": question},
chain_result = graph_chain.invoke(
{"query": question},
prompt=prompt,
return_only_outputs = True,
return_only_outputs=True,
)
except Exception as e:
# Occurs when the chain can not generate a cypher statement
# for the question with the given database schema
logging.warning(f'Handled exception running graphCypher chain: {e}')

logging.debug(f'chain_result: {chain_result}')

if chain_result is None:
logging.warning(f'Handled exception running GraphCypher chain: {e}')
return "Sorry, I couldn't find an answer to your question"

if chain_result is None:
print('No answer was generated.')
return "No answer was generated."

# Debug: show Cypher used
cypher_query = chain_result.get("cypher", "No Cypher returned")
print("\n========= Generated Answer=========\n")
print(cypher_query)

result = chain_result.get("result", None)
print("\n========= Final Result =========\n")
print(json.dumps(chain_result, indent=2))

try:
query = chain_result["intermediate_steps"][-1]["query"].replace("cypher", "", 1).strip()
chain_result["intermediate_steps"][-1]["query"] = urllib.parse.quote(query)
except Exception as e:
pass

return result
return chain_result
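Two details of `get_results` may not behave as the debug output suggests, at least in the LangChain versions we have checked. `GraphCypherQAChain` returns its answer under `result` and, with `return_intermediate_steps=True`, the generated Cypher as `intermediate_steps[0]["query"]`; there is no `cypher` key, so the "Generated Answer" print would always show the fallback text. Also, `Chain.invoke` appears to read only `return_only_outputs` and `include_run_info` from extra keyword arguments, so `prompt=prompt` is ignored and the chain formats its own `cypher_prompt`. The sketch below uses hypothetical helper names. It reads the query from the first intermediate step, which is present whether or not `return_direct` is set, and drops a leftover `cypher` fence tag only when it sits alone on the first line instead of replacing the first occurrence anywhere in the query.

```python
import json
from typing import Any, Dict, Optional, Tuple


def extract_query_and_result(chain_result: Optional[Dict[str, Any]]) -> Tuple[Optional[str], Any]:
    """Split a GraphCypherQAChain output dict into (cypher_query, result).

    Assumes the chain was built with return_intermediate_steps=True, so the
    generated Cypher is stored as intermediate_steps[0]["query"].
    """
    if not chain_result:
        return None, None
    steps = chain_result.get("intermediate_steps") or []
    query = steps[0].get("query") if steps and isinstance(steps[0], dict) else None
    if query:
        lines = query.strip().splitlines()
        # Drop a leftover ```cypher fence tag only when it is alone on the first line
        if lines and lines[0].strip().lower() == "cypher":
            lines = lines[1:]
        query = "\n".join(lines).strip()
    return query, chain_result.get("result")


def to_log_json(chain_result: Dict[str, Any]) -> str:
    """Serialize for debug output; default=str stringifies values json cannot encode."""
    return json.dumps(chain_result, indent=2, default=str)
```

For instance, `cypher_query, result = extract_query_and_result(chain_result)` could stand in for the `.get("cypher", ...)` lookup, and `to_log_json` for the bare `json.dumps` call, in case a returned property (such as a Neo4j temporal value) is not JSON-serializable.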
8 changes: 0 additions & 8 deletions rag_demo/graph_tool.py

This file was deleted.

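The `CYPHER_GENERATION_TEMPLATE` in `rag_demo/graph_cypher_chain.py` asks the LLM to rewrite location names into their stored form: uppercase, with spaces replaced by underscores, so "North Atlantic" becomes "NORTH_ATLANTIC". Where a name is already known before the prompt is built, the same rule can be applied deterministically instead of relying on the model. A minimal sketch follows; `to_location_name` is a hypothetical helper, and treating punctuation like spaces is an assumption that goes beyond the template's stated rule, although it does reproduce the stored name `PACIFIC_NORTH_AMERICAN_PNA_PATTERN` used in Example 4.

```python
import re


def to_location_name(text: str) -> str:
    """Normalize a place name to the UPPER_SNAKE style used for Name values.

    Hypothetical helper: runs of whitespace, punctuation, or underscores become a
    single underscore, then the result is uppercased.
    """
    # [\W_] matches any non-word character plus "_", so "North  Atlantic" and
    # "Pacific-North American (PNA)" both collapse to single underscores
    return re.sub(r"[\W_]+", "_", text).strip("_").upper()
```

The normalized value could then be passed as a query parameter, e.g. `graph.query("MATCH (l:Location {Name: $name}) RETURN l", params={"name": to_location_name("North Atlantic")})`, which keeps user text out of the Cypher string itself.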
5 changes: 3 additions & 2 deletions rag_demo/main.py
@@ -10,6 +10,7 @@
from constants import TITLE
import logging
import rag_agent
import graph_cypher_chain
import streamlit as st
from sidebar import sidebar

@@ -39,7 +40,7 @@
st.session_state.messages = [
{
"role": "ai",
"content": f"This is a Proof of Concept application which shows how GenAI can be used with Neo4j to build and consume Knowledge Graphs using both vectors and structured data.\nSee the sidebar for more information!",
"content": f"This is a Proof of Concept agpplication which shows how GenAI can be used with Neo4j to build and consume Knowledge Graphs using both vectors and structured data.\nSee the sidebar for more information!",
},
# {"role": "ai", "content": f"""This the schema in which the EDGAR filings are stored in Neo4j: \n <img style="width: 70%; height: auto;" src="{SCHEMA_IMG_PATH}"/>"""},
# {"role": "ai", "content": f"""This is how the Chatbot flow goes: \n <img style="width: 70%; height: auto;" src="{LANGCHAIN_IMG_PATH}"/>"""}
@@ -88,7 +89,7 @@
# StreamlitCcallbackHandler api doc: https://api.python.langchain.com/en/latest/callbacks/langchain_community.callbacks.streamlit.streamlit_callback_handler.StreamlitCallbackHandler.html

agent_response = rag_agent.get_results(
question=user_input, callbacks=[]
question=user_input,
)

if isinstance(agent_response, dict) is False: