Concepts · 14 min read

How AI Agents Choose Tools: Inside the Resolution Engine

When an AI agent needs a tool, how does it decide which one to use? This deep dive into AgentNode's resolution engine explains capability matching, trust scoring, compatibility filtering, and the algorithm behind autonomous tool selection.

By agentnode

An AI agent receives a task: "Find the cheapest flight from New York to London next Tuesday and summarize the options." The agent has no flight search tool installed. In a traditional setup, the agent fails — it can only use tools its developer hardcoded at build time. But in a dynamic tool ecosystem, the agent searches for a flight search capability, evaluates multiple candidates, selects the most trustworthy one, installs it, and completes the task.

How does that selection happen? What algorithm decides that Tool A is better than Tool B for a given request? And how can the system make these decisions safely, without installing malicious or broken tools?

This article goes deep into the mechanics of tool resolution — the process by which AI agents discover, evaluate, and select tools at runtime. We will walk through AgentNode's resolution engine step by step, covering the data structures, scoring algorithms, and safety checks that make autonomous tool acquisition possible.

The Problem with Hardcoded Tools

Most AI agent frameworks today use hardcoded tool lists. When you build a LangChain agent, you define a list of tools at initialization:

tools = [
    WikipediaTool(),
    CalculatorTool(),
    WebSearchTool(),
]
agent = initialize_agent(tools=tools, llm=llm)

This approach has three fundamental limitations:

Static Capabilities

The agent can only use tools its developer anticipated. If a user asks the agent to convert a PDF to markdown and no PDF tool was included, the agent cannot help — even if a perfect PDF conversion tool exists somewhere.

Manual Curation

Every tool requires a human developer to find it, evaluate it, write integration code, and add it to the tool list. This does not scale. As the number of available tools grows from dozens to thousands, manual curation becomes a bottleneck.

No Quality Signal

When a developer picks a tool, they rely on GitHub stars, PyPI downloads, or word of mouth. There is no programmatic way for the agent itself to assess whether a tool is safe, reliable, or well-maintained.

Dynamic tool resolution solves all three problems. Instead of hardcoding tools, the agent queries a registry at runtime, and the resolution engine returns the best match.

What Is Tool Resolution?

Tool resolution is the process of translating a capability need into a specific tool installation. It answers the question: "Given that I need X capability, which tool should I install and use?"

The resolution process has four stages:

  1. Capability matching — find all tools that claim to provide the requested capability
  2. Trust scoring — rank candidates by verification status and trust signals
  3. Compatibility filtering — eliminate tools that are incompatible with the current runtime environment
  4. Version selection — pick the optimal version of the winning candidate

Let us examine each stage in detail.

Stage 1: Capability Matching

When an agent calls resolve_and_install, it passes a capability descriptor — a string or structured query describing what it needs. For example:

from agentnode_sdk import AgentNodeClient

client = AgentNodeClient()
client.resolve_and_install(["web-scraping"])

The resolution engine searches the registry for all packages whose capability declarations match the query. Matching uses three strategies in parallel:

Exact Capability ID Match

Every ANP capability has a capability_id field (e.g., web-scraping.extract, text-analysis.sentiment). If the query exactly matches a capability ID, those packages are returned with the highest relevance score. This is the most precise match type.

Semantic Description Match

The engine also performs semantic similarity matching against capability descriptions. If you search for "extract text from web pages," the engine compares this against every capability's description field using embedding-based similarity. Tools with descriptions like "Scrape and parse a web page into structured text" score highly even if the capability ID does not exactly match the query.

Tag and Category Match

Packages include tags and category metadata. A search for "PDF" matches tools tagged with "pdf", "document-processing", or categorized under document tools. This catches tools that might have different names but serve the same purpose.

The result of Stage 1 is a candidate list — all packages that plausibly provide the requested capability, each with a relevance score from 0 to 1.
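To make the three strategies concrete, here is a toy matcher. The scoring constants, registry schema, and the token-overlap stand-in for embedding-based similarity are illustrative assumptions, not AgentNode's actual implementation.

```python
# Toy sketch of Stage 1: three matching strategies, highest score wins.
# Token overlap stands in here for embedding-based semantic similarity.
def token_overlap(query, text):
    q = set(query.lower().replace("-", " ").split())
    t = set(text.lower().replace("-", " ").split())
    return len(q & t) / len(q) if q else 0.0

def capability_match(query, registry):
    candidates = []
    for pkg in registry:
        if query in pkg["capability_ids"]:                # exact capability ID match
            score = 1.0
        elif (sim := token_overlap(query, pkg["description"])) > 0:
            score = 0.5 + 0.4 * sim                       # semantic description match
        elif query in pkg["tags"]:                        # tag/category match
            score = 0.6
        else:
            continue
        candidates.append((pkg["name"], score))
    return sorted(candidates, key=lambda kv: kv[1], reverse=True)

registry = [
    {"name": "pdf-parser", "capability_ids": ["pdf-text-extraction"],
     "description": "Parse PDF documents", "tags": ["pdf"]},
    {"name": "doc-extract", "capability_ids": ["doc.extract"],
     "description": "PDF text extraction toolkit", "tags": []},
]
print(capability_match("pdf-text-extraction", registry))
# [('pdf-parser', 1.0), ('doc-extract', 0.9)]
```

Note how the exact ID match outranks even a perfect description match, mirroring the "most precise match type" ordering described above.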

Stage 2: Trust Scoring

Having multiple candidates is common. A search for "web scraping" might return 15 different tools. Trust scoring determines which candidates are safe and reliable.

AgentNode computes a composite trust score for each candidate using these signals:

Verification Score (Weight: 40%)

The verification pipeline score (0-100) is the strongest trust signal. A tool that passes all four verification stages (install, import, smoke test, unit tests) with a Gold tier score of 95 is dramatically more trustworthy than an unverified tool scoring 30.

Publisher Reputation (Weight: 20%)

Publishers build reputation over time through consistent, high-quality releases. A publisher with 20 verified packages has earned more trust than a brand-new account publishing for the first time. Publisher reputation is computed from their average verification score, total packages, and history of flagged or removed packages.

Usage Signals (Weight: 20%)

Install counts, active usage metrics, and retention rates provide social proof. A tool installed by 5,000 agents with a 95% success rate is a strong signal. However, usage alone is not sufficient: the ClawHavoc incident demonstrated that popular tools can still be malicious, which is why the verification score carries the highest weight.

Recency and Maintenance (Weight: 10%)

Recently updated packages score higher than abandoned ones. A tool last updated three years ago may have compatibility issues with current Python versions or framework APIs. The engine looks at last publish date, update frequency, and version history.

Community Signals (Weight: 10%)

User ratings, reviews, and issue reports contribute to the trust score. A tool with consistently positive reviews and resolved issues demonstrates active maintenance and user satisfaction.

The trust score produces a ranked list of candidates, from most to least trustworthy.
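The weighted combination above can be written down directly. The weights come from the section; the normalized signal values for the example candidate are made up for illustration.

```python
# Composite trust score from the five weighted signals described above.
# Each signal is assumed to be normalized to 0..1 before weighting.
WEIGHTS = {
    "verification": 0.40,  # verification pipeline score
    "publisher":    0.20,  # publisher reputation
    "usage":        0.20,  # installs, success rate, retention
    "recency":      0.10,  # last publish date, update frequency
    "community":    0.10,  # ratings, reviews, issue resolution
}

def compute_trust_score(signals):
    return sum(weight * signals[name] for name, weight in WEIGHTS.items())

candidate = {"verification": 0.94, "publisher": 0.80,
             "usage": 0.90, "recency": 0.70, "community": 0.85}
print(round(compute_trust_score(candidate), 3))  # 0.871
```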

Stage 3: Compatibility Filtering

Not every trustworthy tool is usable in every environment. Compatibility filtering removes candidates that cannot run in the current runtime:

  • Python version — if the tool requires Python 3.11+ and the agent runs on 3.9, it is filtered out.
  • Framework compatibility — if the agent uses CrewAI and the tool only declares LangChain compatibility, it may be deprioritized (though AgentNode's adapter layer handles most cross-framework cases).
  • Permission policies — if the agent's runtime policy forbids tools with filesystem write access, tools declaring "filesystem": "write" are excluded.
  • Dependency conflicts — if the tool requires a dependency version that conflicts with already-installed packages, it is flagged for manual resolution or filtered.
  • Platform constraints — some tools are Linux-only or require specific system libraries. These are filtered on incompatible platforms.

Compatibility filtering is a hard filter — incompatible tools are removed entirely, not just deprioritized.
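A minimal sketch of that hard filter follows; the manifest and policy field names are assumptions for illustration, not the real ANP schema.

```python
import sys

# Hard compatibility filter: any single failed check removes the candidate.
def check_compatibility(manifest, policy):
    if sys.version_info[:2] < tuple(manifest.get("min_python", (3, 0))):
        return False  # runtime Python is older than the tool requires
    forbidden = set(policy.get("forbidden_permissions", []))
    if set(manifest.get("permissions", [])) & forbidden:
        return False  # tool requests a permission the runtime policy forbids
    platforms = manifest.get("platforms")
    if platforms and sys.platform not in platforms:
        return False  # e.g. a Linux-only tool on macOS or Windows
    return True

policy = {"forbidden_permissions": ["filesystem:write"]}
print(check_compatibility({"min_python": (3, 8)}, policy))                 # True on Python 3.8+
print(check_compatibility({"permissions": ["filesystem:write"]}, policy))  # False
```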

Stage 4: Version Selection

Once the winning package is identified, the engine selects the optimal version. This is not always the latest version. The selection algorithm considers:

  • Verification status per version — each version has its own verification score. If version 2.1.0 scored Gold but 2.2.0 dropped to Partial (perhaps due to a new dependency issue), the engine may recommend 2.1.0.
  • Stability preference — agents can specify whether they prefer the latest version or the most stable verified version. Production agents typically prefer stability.
  • Constraint satisfaction — if the agent specified version constraints (e.g., >=1.0,<3.0), only matching versions are considered.

The version selection algorithm mirrors best practices from package managers like pip and npm, but adds verification awareness as a first-class signal.
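The per-version logic can be sketched as follows, assuming each version record carries its own verification tier and score (the record shape and helper names are illustrative):

```python
# Verification-aware version selection: "stable" prefers the best-verified
# release, "latest" prefers the newest; ties break toward newer versions.
TIER_RANK = {"Gold": 3, "Verified": 2, "Partial": 1, "Unverified": 0}

def parse_version(v):
    return tuple(int(part) for part in v.split("."))

def select_version(versions, stability_preference="stable"):
    if stability_preference == "latest":
        return max(versions, key=lambda r: parse_version(r["version"]))
    return max(versions, key=lambda r: (TIER_RANK[r["tier"]],
                                        r["score"],
                                        parse_version(r["version"])))

versions = [
    {"version": "2.1.0", "tier": "Gold",    "score": 95},
    {"version": "2.2.0", "tier": "Partial", "score": 55},  # regressed release
]
print(select_version(versions)["version"])            # 2.1.0
print(select_version(versions, "latest")["version"])  # 2.2.0
```

This reproduces the 2.1.0-over-2.2.0 scenario above: under the stable preference, the Gold-tier 2.1.0 wins even though 2.2.0 is newer.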

The resolve_and_install Algorithm

Putting it all together, here is the complete flow when an agent calls resolve_and_install:

# Pseudocode for the resolution algorithm
def resolve_and_install(capability_queries, policy=None):
    policy = policy or default_policy()
    results = []
    for query in capability_queries:
        # Stage 1: Find candidates
        candidates = capability_match(query)

        # Stage 2: Score trust and sort best-first
        scored = [(c, compute_trust_score(c)) for c in candidates]
        scored.sort(key=lambda pair: pair[1], reverse=True)

        # Stage 3: Filter compatibility (drop scores, keep the ranking order)
        compatible = [c for c, _score in scored if check_compatibility(c, policy)]

        if not compatible:
            raise NoCompatibleToolError(query)

        # Stage 4: Select version of the top-ranked candidate
        best = compatible[0]
        version = select_version(best, policy.stability_preference)

        # Install
        install(best.package_name, version)
        results.append(best)

    return results

The entire process typically completes in under two seconds — fast enough for real-time agent workflows. The SDK caches resolution results, so repeated queries for the same capability return instantly.

How Agents Rank Multiple Candidates

When several tools compete for the same capability slot, the final ranking combines relevance and trust into a single score:

final_score = (relevance_score * 0.4) + (trust_score * 0.6)

Trust is weighted more heavily than relevance because a highly relevant but untrustworthy tool is worse than a slightly less relevant but Gold-verified tool. This weighting reflects a security-first philosophy: it is better to use a solid tool that mostly fits than a perfect tool that might be malicious.

In practice, this means:

  • A Gold-verified tool with moderate relevance (0.7) beats an Unverified tool with perfect relevance (1.0)
  • Among equally verified tools, relevance breaks the tie
  • Among equally relevant tools, trust breaks the tie

Developers can adjust these weights through the SDK's policy configuration for specialized use cases.

Real-World Resolution Example

Let us trace through a concrete example. An agent needs to extract text from PDF files and searches for this capability:

client.resolve_and_install(["pdf-text-extraction"])

Stage 1 returns five candidates: pdf-parser (exact capability ID match), document-toolkit (semantic match on description), pdf-reader-pro (tag match), doc-extract (semantic match), and all-in-one-converter (partial tag match).

Stage 2 scores trust: pdf-parser (Gold, 94), document-toolkit (Verified, 82), pdf-reader-pro (Gold, 91), doc-extract (Partial, 55), all-in-one-converter (Unverified, 28).

Stage 3 filters: all-in-one-converter requires Python 3.12 (agent runs 3.10) — removed. doc-extract requires filesystem write access (policy forbids it) — removed.

Stage 4 computes final scores: pdf-parser (relevance 1.0, trust 0.94, final 0.964), pdf-reader-pro (relevance 0.85, trust 0.91, final 0.886), document-toolkit (relevance 0.72, trust 0.82, final 0.780).

Winner: pdf-parser. It is installed, loaded, and ready to use — all within two seconds of the agent's request.
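The arithmetic behind those final scores can be checked directly against the published weights (final = 0.4 × relevance + 0.6 × trust):

```python
# Final-score arithmetic for the PDF trace above.
candidates = {
    "pdf-parser":       (1.00, 0.94),  # (relevance, trust)
    "pdf-reader-pro":   (0.85, 0.91),
    "document-toolkit": (0.72, 0.82),
}
finals = {name: round(0.4 * rel + 0.6 * trust, 3)
          for name, (rel, trust) in candidates.items()}
print(finals)  # {'pdf-parser': 0.964, 'pdf-reader-pro': 0.886, 'document-toolkit': 0.78}
print(max(finals, key=finals.get))  # pdf-parser
```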

Safety Guardrails

Autonomous tool installation creates obvious security concerns. What prevents a malicious tool from being selected? AgentNode's resolution engine includes multiple safety layers:

  • Minimum trust threshold — by default, the engine will not install tools below the Partial tier (score < 50). Agents can raise this threshold.
  • Permission review — tools requesting sensitive permissions (network external, filesystem write, code execution) trigger a permission review step that agents can surface to users.
  • Sandbox execution — newly resolved tools run in a sandboxed environment for their first invocation, allowing the runtime to detect unexpected behavior.
  • Blocklist — known malicious packages are blocklisted at the registry level and never appear in resolution results.
  • Human approval mode — agents can be configured to require human approval before installing any new tool, using resolution results as recommendations rather than automatic installations.

These guardrails mean that even in a fully autonomous agent workflow, tool installation passes through multiple safety checks. For a deeper analysis of the threat landscape, read about building an agent that finds its own tools safely.
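As a rough sketch of how the threshold, blocklist, and permission-review guardrails might compose before any install (all names and the callback signature are assumptions, not AgentNode's API):

```python
# Pre-install guardrails: every check must pass or the install is refused.
SENSITIVE = {"network:external", "filesystem:write", "code:execute"}
BLOCKLIST = {"known-bad-package"}  # maintained at the registry level

def guard_install(pkg, min_trust=50, approve=None):
    """Raise PermissionError unless every guardrail passes.
    approve: optional (pkg, perms) -> bool callback for human-approval mode."""
    if pkg["name"] in BLOCKLIST:
        raise PermissionError(f"blocklisted: {pkg['name']}")
    if pkg.get("verification_score", 0) < min_trust:
        raise PermissionError(f"below trust threshold ({min_trust}): {pkg['name']}")
    sensitive = set(pkg.get("permissions", [])) & SENSITIVE
    if sensitive and (approve is None or not approve(pkg, sensitive)):
        raise PermissionError(f"sensitive permissions not approved: {sorted(sensitive)}")
    return True

print(guard_install({"name": "pdf-parser", "verification_score": 94}))  # True
```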

The Future: Fully Autonomous Tool Acquisition

Today, most agents use resolution in a semi-automated way — a developer configures capability requirements, and the engine resolves them at startup or on demand. The next frontier is fully autonomous tool acquisition, where agents independently decide they need a capability, search for it, evaluate options, and install — all without human intervention.

This requires advances in three areas:

Intent Recognition

Agents need to recognize when they lack a capability for a given task. Current LLM-based agents can often articulate "I need a tool that does X" — the gap is connecting that articulation to a resolution query programmatically.

Risk Assessment

Autonomous installation needs better risk models. An agent should understand the difference between installing a text analysis tool (low risk) and a tool that requires full filesystem access (high risk), and make proportional decisions.

Composition Planning

Complex tasks often require multiple tools working together. Future resolution engines will need to resolve tool sets rather than individual tools, understanding that a "data analysis pipeline" requires data loading, transformation, visualization, and export capabilities working in concert.

These capabilities are actively being developed. The resolution engine architecture is designed to support them as the trust and safety infrastructure matures. To learn more about how AgentNode's architecture enables this future, read the resolution API documentation or explore the agent skill catalog to see the breadth of capabilities already available for resolution.

Building Agents That Leverage Resolution

If you are building an agent today and want to take advantage of dynamic tool resolution, the path is straightforward. Understanding how AgentNode works will give you the conceptual foundation, and the SDK provides the practical tools:

from agentnode_sdk import AgentNodeClient, load_tool

client = AgentNodeClient(
    policy={
        "min_trust_level": "verified",
        "stability_preference": "stable",
        "permission_review": True,
        "max_resolution_time": 5000  # ms
    }
)

# Resolve capabilities dynamically
client.resolve_and_install([
    "web-scraping",
    "text-summarization",
    "sentiment-analysis"
])

# Load and use resolved tools
scraper = load_tool("web-scraper")
summarizer = load_tool("text-summarizer")
sentiment = load_tool("sentiment-analyzer")

The policy configuration gives you fine-grained control over what the resolution engine is allowed to do, balancing autonomy with safety for your specific use case.

Frequently Asked Questions

How do AI agents find tools?

AI agents find tools through capability-based resolution. Instead of knowing specific tool names, an agent describes what it needs (e.g., "web scraping" or "PDF text extraction"), and a resolution engine searches a registry of published tools. The engine matches the request against capability declarations, descriptions, and tags to find all tools that can fulfill the need. This allows agents to discover tools they were never explicitly programmed to use.

What is tool resolution?

Tool resolution is the four-stage process of translating a capability need into a specific tool installation. The stages are capability matching (finding tools that claim to provide the needed function), trust scoring (ranking candidates by verification status and safety signals), compatibility filtering (removing tools that cannot run in the current environment), and version selection (choosing the optimal version of the winning tool). The entire process typically completes in under two seconds.

How does AgentNode rank agent tools?

AgentNode ranks tools using a weighted combination of relevance and trust. Trust accounts for 60% of the final score and includes verification pipeline results (40% weight), publisher reputation (20%), usage signals (20%), recency (10%), and community signals (10%). Relevance accounts for 40% and is based on how closely the tool's capabilities match the query. This trust-heavy weighting ensures that verified, safe tools are preferred even if a slightly more relevant but unverified option exists.

Can agents install tools automatically?

Yes, agents can install tools automatically through the resolve_and_install SDK method. However, multiple safety guardrails protect against malicious installations: minimum trust thresholds prevent low-quality tools from being installed, permission review flags tools requesting sensitive access, sandbox execution isolates first-time tool runs, and a blocklist excludes known malicious packages. Agents can also be configured in human-approval mode where resolution results are presented as recommendations requiring explicit user consent.

LLM Runtime: Let the Model Handle It

If your agent uses OpenAI or Anthropic tool calling, AgentNodeRuntime handles tool registration, system prompt injection, and the tool loop automatically. The LLM discovers, installs, and runs AgentNode capabilities on its own — no hardcoded tool calls needed.

from openai import OpenAI
from agentnode_sdk import AgentNodeRuntime

runtime = AgentNodeRuntime()

result = runtime.run(
    provider="openai",
    client=OpenAI(),
    model="gpt-4o",
    messages=[{"role": "user", "content": "your task here"}],
)
print(result.content)

The Runtime registers 5 meta-tools (agentnode_capabilities, agentnode_search, agentnode_install, agentnode_run, agentnode_acquire) that let the LLM search the registry, install packages, and execute tools autonomously. Works with Anthropic too — just change provider="anthropic" and pass an Anthropic client.

See the LLM Runtime documentation for the full API reference, trust levels, and manual tool calling.
