LLMs are good at deciding what to do but can't touch real software on their own. Automation tools are the opposite: they execute reliably but can't reason about anything. The obvious move is to bolt the two together: let the model plan, and let deterministic automation carry out each step. The question is how you connect them cleanly.
The answer that's become a standard is MCP — the Model Context Protocol. It's a common language for exposing "tools" to a model. Wrap your automation actions as MCP tools, point a reasoning model at them, and the model can drive real software. This is the exact pattern under MeBot's agentic RPA. Let's build a small version of it.
Step 1 — Install the pieces
pip install "mcp[cli]" playwright
playwright install chromium
Step 2 — Stand up an MCP server
FastMCP (bundled with the MCP SDK) turns plain Python functions into tools. Any function you decorate becomes something a model can call: the docstring becomes the tool's description and the type hints define its input schema, so the model knows when and how to use it.
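from mcp.server.fastmcp import FastMCP
from playwright.async_api import Page, async_playwright

mcp = FastMCP("mebot-browser")

# FastMCP runs tools inside an asyncio event loop, where Playwright's sync API
# cannot be used. So this uses the async API and launches the browser lazily,
# on the first tool call, inside the server's own loop.
_page: Page | None = None


async def get_page() -> Page:
    """Return the shared page, launching Chromium on first use."""
    global _page
    if _page is None:
        pw = await async_playwright().start()
        browser = await pw.chromium.launch(headless=False)
        _page = await browser.new_page()
    return _page
Step 3 — Expose automation actions as tools
Here's the key design decision: the tools themselves stay strictly deterministic. Each one does exactly one concrete thing, with no cleverness and no improvising. You never want an LLM "creatively interpreting" a click on a banking screen. The intelligence lives in the model's choice of which tool to call; the execution is boringly reliable.
@mcp.tool()
async def open_url(url: str) -> str:
    """Navigate the browser to a URL."""
    page = await get_page()
    await page.goto(url)
    return f"Opened {url}"


@mcp.tool()
async def click(text: str) -> str:
    """Click the visible element matching the given text."""
    page = await get_page()
    # Substring match; .first picks the first match if several elements qualify.
    await page.get_by_text(text, exact=False).first.click()
    return f"Clicked '{text}'"


@mcp.tool()
async def type_text(label: str, value: str) -> str:
    """Type text into the field with the given label."""
    page = await get_page()
    await page.get_by_label(label).fill(value)
    return f"Typed into '{label}'"


@mcp.tool()
async def read_screen() -> str:
    """Return the visible text on the current page."""
    page = await get_page()
    return await page.inner_text("body")


if __name__ == "__main__":
    mcp.run()  # serves over stdio by default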
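To make the Step 2 point about docstrings and type hints concrete, here is a rough, standard-library-only sketch of how a tool description could be derived from a function. It illustrates the idea only and is not FastMCP's actual implementation; `describe_tool` and the shape of its output are hypothetical, and the `type_text` below is a simplified stand-in for the real tool.

```python
import inspect
from typing import get_type_hints


def describe_tool(fn) -> dict:
    """Build a minimal, hypothetical tool descriptor from a function."""
    hints = get_type_hints(fn)
    hints.pop("return", None)  # only parameters belong in the input description
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {name: tp.__name__ for name, tp in hints.items()},
    }


def type_text(label: str, value: str) -> str:
    """Type text into the field with the given label."""
    return f"Typed into '{label}'"  # stand-in body, no browser involved


print(describe_tool(type_text))
```

The practical takeaway is that a vague docstring or a missing type hint leaves the model guessing, so it pays to write both as if they were the tool's only documentation.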
Step 4 — Let a reasoning model drive
Now connect a model through an MCP client. The client requests the tool list and hands it to the model, which then works toward a goal by calling tools and reading results: a loop of observe → decide → act. Give it an objective like:
Goal: "Log into the portal and download this month's invoice."
The model plans and calls tools in sequence:
open_url("https://portal.example.com")
read_screen() -> sees a login form
type_text("Username", "...")
type_text("Password", "...")
click("Sign in")
read_screen() -> sees the dashboard
click("Invoices")
...
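The sequence above is one pass through an observe → decide → act loop. Here is a minimal sketch of that loop, assuming fake in-memory tools in place of the browser and a hard-coded `scripted_policy` in place of the model. Every name in it is hypothetical; a real setup would ask an LLM for the next call and send it to the MCP server.

```python
from typing import Callable, Optional


def make_fake_tools() -> dict[str, Callable[..., str]]:
    """Stand-in tools that mutate a fake screen instead of driving a browser."""
    state = {"screen": "blank"}

    def open_url(url: str) -> str:
        state["screen"] = "Login form: Username, Password, Sign in"
        return f"Opened {url}"

    def click(text: str) -> str:
        if text == "Sign in":
            state["screen"] = "Dashboard: Invoices, Settings"
        return f"Clicked '{text}'"

    def read_screen() -> str:
        return state["screen"]

    return {"open_url": open_url, "click": click, "read_screen": read_screen}


def scripted_policy(goal: str, observation: str) -> Optional[tuple]:
    """Hypothetical stand-in for the model: maps what it sees to the next call."""
    if "Dashboard" in observation:
        return None  # goal reached, stop
    if "Login form" in observation:
        return ("click", {"text": "Sign in"})
    return ("open_url", {"url": "https://portal.example.com"})


def run_agent(goal: str, tools: dict, decide: Callable, max_steps: int = 10) -> list[str]:
    """Loop until the policy says stop or the step budget runs out."""
    trace = []
    for _ in range(max_steps):
        observation = tools["read_screen"]()   # observe
        action = decide(goal, observation)     # decide
        if action is None:
            break
        name, args = action
        trace.append(tools[name](**args))      # act
    return trace


trace = run_agent("Log into the portal", make_fake_tools(), scripted_policy)
print(trace)
```

The `max_steps` cap is a deliberate choice: a loop driven by a model should always have a hard stop, so that a confused plan cannot run forever.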
Crucially, the model reads the screen between steps. When a button moves or a field is renamed, it adapts — because it's reasoning about what's actually there, not replaying recorded coordinates. That's the difference between a bot that breaks on the first UI change and one that copes.
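A toy contrast makes the point about moved buttons tangible. The sketch below models a screen as a hypothetical mapping from visible label to position, then compares replaying a recorded coordinate with looking the target up by its text on the current screen. The data and helper names are invented for illustration.

```python
from typing import Optional

# Two hypothetical snapshots of the same screen: label -> (x, y) position.
screen_v1 = {"Sign in": (100, 200), "Help": (300, 200)}
screen_v2 = {"Help": (100, 200), "Sign in": (300, 260)}  # layout changed


def element_at(screen: dict, pos: tuple) -> Optional[str]:
    """Return the label sitting at a position, as a coordinate replay would hit it."""
    for label, where in screen.items():
        if where == pos:
            return label
    return None


def find_by_text(screen: dict, text: str) -> Optional[tuple]:
    """Look the target up by its visible text on the current screen."""
    return screen.get(text)


recorded_pos = screen_v1["Sign in"]               # captured at recording time
replayed = element_at(screen_v2, recorded_pos)    # lands on the wrong element
resolved = find_by_text(screen_v2, "Sign in")     # still finds the right one
print(replayed, resolved)
```

On the changed layout the replay clicks "Help", while the text lookup still resolves "Sign in". Reading the screen before each step is what lets an agent do the second thing instead of the first.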
Step 5 — Guardrails
Autonomy without brakes is a liability, so real deployments add two things. The first is an approvals gate that pauses for a human before irreversible or high-risk actions (payments, deletions, submissions). The second is vision (OCR and UI detection) for legacy screens where there's no clean text or label to target. Together they make the agent much safer to point at production systems, including the old Windows apps that classic RPA struggles with most.
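An approvals gate can be as simple as a wrapper around tool dispatch. The sketch below is one possible shape, not MeBot's implementation: the `HIGH_RISK` set, the tool names, and the `approve` callback are all assumptions made for illustration. In a real deployment `approve` would block until a human responds.

```python
from typing import Callable

HIGH_RISK = {"submit_payment", "delete_record"}  # hypothetical tool names


def gated_call(
    name: str,
    args: dict,
    tools: dict[str, Callable[..., str]],
    approve: Callable[[str, dict], bool],
) -> str:
    """Run a tool, asking for approval first if it is marked high-risk."""
    if name in HIGH_RISK and not approve(name, args):
        return f"Blocked '{name}': approval denied"
    return tools[name](**args)


# Stand-in tools and reviewers so the sketch runs without a browser or a human.
tools = {
    "read_screen": lambda: "Invoices page",
    "submit_payment": lambda amount: f"Paid {amount}",
}


def deny_all(name: str, args: dict) -> bool:
    return False


def approve_all(name: str, args: dict) -> bool:
    return True


print(gated_call("read_screen", {}, tools, deny_all))                # low-risk, runs
print(gated_call("submit_payment", {"amount": 120}, tools, deny_all))     # blocked
print(gated_call("submit_payment", {"amount": 120}, tools, approve_all))  # runs
```

Returning the refusal as an ordinary tool result, rather than raising, means the model sees it on its next observation and can report back or choose a different step.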
Deterministic execution, reasoning on top
That's the whole philosophy in one line: keep the doing precise, put the thinking in a layer above it. Wrapping actions as MCP tools is what lets a model orchestrate real work without you hard-coding every branch — and it's exactly how MeBot turns brittle scripts into automation that adapts.
If you'd rather deploy this than assemble it — with the vision layer, approvals inbox, and legacy-app support already built — that's what we ship → takemebot.com