Runs completely free and locally
Accepts MCP-style { action, params } via a Flask API
Routes to a local LLM (via Ollama) for answers
.
│
├── app.py           ← Flask API (MCP-style server)
├── mcp_agent.py     ← LangChain logic: interprets actions
├── requirements.txt ← Python dependencies
└── README.md        ← Project overview
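To make the layout concrete, here is a minimal sketch of the two Python files. The `/mcp` route, the `get_time` and `ask` action names, and the `handle_action` helper are illustrative assumptions, not fixed by the project; the Ollama model name can be any model you have pulled locally.

```python
# mcp_agent.py — LangChain logic: interprets MCP-style actions
from datetime import datetime

from langchain_community.llms import Ollama

llm = Ollama(model="mistral")  # any locally pulled Ollama model works here

def handle_action(action: str, params: dict) -> str:
    """Dispatch an MCP-style action to local logic or the local LLM."""
    if action == "get_time":
        return datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    if action == "ask":
        # Free-form questions are routed to the local Ollama model
        return llm.invoke(params.get("question", ""))
    raise ValueError(f"Unknown action: {action}")
```

```python
# app.py — Flask API (MCP-style server)
from flask import Flask, jsonify, request

from mcp_agent import handle_action

app = Flask(__name__)

@app.route("/mcp", methods=["POST"])
def mcp():
    # Expects an MCP-style body: { "action": "...", "params": { ... } }
    body = request.get_json(force=True)
    try:
        result = handle_action(body.get("action", ""), body.get("params", {}))
        return jsonify({"result": result})
    except ValueError as exc:
        return jsonify({"error": str(exc)}), 400

if __name__ == "__main__":
    app.run(port=5000)
```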
The LangChain agent received the correct action
It executed it using your logic (from mcp_agent.py)
It sent the correct time back to Postman
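For reference, the same test can be reproduced from Python instead of Postman (assuming the `/mcp` route and `get_time` action from the sketch above, with `requests` installed):

```python
# quick smoke test against the running Flask server
import requests

resp = requests.post(
    "http://localhost:5000/mcp",
    json={"action": "get_time", "params": {}},
)
print(resp.json())  # e.g. {"result": "2025-01-01 12:34:56"} (illustrative output)
```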
Next Steps: Test with other action values.
Add new functions to mcp_agent.py (see the sketch just below).
Or integrate this with a frontend.
Try adding more intelligent tasks or connecting it with a UI!
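As a sketch of that kind of extension (the `echo` action and its params are made up for illustration), each new function in mcp_agent.py can be just another entry in a dispatch table:

```python
# mcp_agent.py — a dict dispatch keeps the action list flat as it grows
ACTIONS = {
    # hypothetical example action: return whatever text the caller sent
    "echo": lambda params: params.get("text", ""),
}

def handle_action(action: str, params: dict) -> str:
    if action in ACTIONS:
        return ACTIONS[action](params)
    raise ValueError(f"Unknown action: {action}")
```

A dict dispatch is one design option; an if/elif chain works just as well for a handful of actions.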
Explore https://ollama.com/library
ollama run gemma:2b
gemma:2b uses just ~2–3 GB of RAM.
ollama run mistral
mistral only needs ~4 GB of RAM and still performs quite well for general tasks.
ollama run llama3
Ollama runs models like llama3 entirely on your machine, in memory.
Even though it's optimized, LLaMA 3 still needs at least ~6 GB of RAM free, and ideally more (8–12 GB total system RAM recommended).
ollama list
ollama pull mistral
ollama pull codellama
ollama stop <model>
python app.py
You need to install the latest LangChain with community modules, specifically:
pip install langchain-community
If you're using LangChain with Ollama, it's best to ensure these are installed:
pip install langchain langchain-community langchain-core langchainhub
pip install ollama
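A quick way to confirm the installs work end to end (assuming you have already pulled a model, e.g. with ollama pull mistral, and the Ollama server is running):

```python
# sanity check: LangChain talking to the local Ollama server
from langchain_community.llms import Ollama

llm = Ollama(model="mistral")
print(llm.invoke("Say hello in one short sentence."))
```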
Static Knowledge: These models are trained on data available up to a certain point (e.g., mid-2023 for many models).
They don’t know anything that happened after their training cut-off date.
You can combine the local model with:
RAG (Retrieval-Augmented Generation), where you fetch live news via an API (like NewsAPI or Google News),
then pass that info as context to the LLM to summarize or answer based on it.
LangChain + News API integration using a local Ollama model:
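Here is a rough sketch of that pattern. The NewsAPI endpoint and parameters are real, but you need your own API key from https://newsapi.org; the prompt wording and helper names are illustrative assumptions:

```python
# RAG-lite: fetch live headlines, then let the local model answer over them
import requests
from langchain_community.llms import Ollama

NEWSAPI_KEY = "YOUR_NEWSAPI_KEY"  # placeholder: substitute your own key

def top_headlines(query: str, limit: int = 5) -> list[str]:
    """Fetch recent article titles matching the query from NewsAPI."""
    resp = requests.get(
        "https://newsapi.org/v2/everything",
        params={"q": query, "pageSize": limit, "apiKey": NEWSAPI_KEY},
        timeout=10,
    )
    resp.raise_for_status()
    return [a["title"] for a in resp.json().get("articles", [])]

def answer_with_context(question: str) -> str:
    """Pass fresh headlines as context so the local model can answer."""
    context = "\n".join(f"- {h}" for h in top_headlines(question))
    prompt = (
        f"Using only these recent headlines:\n{context}\n\n"
        f"Answer the question: {question}"
    )
    return Ollama(model="mistral").invoke(prompt)

print(answer_with_context("What happened in AI this week?"))
```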