Webpage Extractor Pack

Name: Webpage Extractor Pack
Author: AgentNode

★ Trusted · ◇ Sandbox optional · v1.0.0 · MIT · ✔ Verified (80/100)

by AgentNode · published 3 months ago · toolpack

Extract clean text and metadata from any webpage.

Wraps trafilatura to provide reliable webpage content extraction.

langchain · crewai · generic

Quick Start

bash

agentnode install webpage-extractor-pack

Runs in a subprocess with a filtered environment by default. Declared permissions are policy-checked, not sandboxed.
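To illustrate what "a subprocess with a filtered environment" means, here is a minimal sketch using Python's standard `subprocess` module. The child process receives only an explicit allow-list of environment variables, so secrets such as API keys in the parent environment are not inherited. The allow-list `ALLOWED_ENV_VARS` and the helpers `filtered_env` and `run_filtered` are hypothetical illustrations, not AgentNode's actual runner.

```python
import os
import subprocess
import sys

# Hypothetical allow-list: only these variables are passed to the child process.
ALLOWED_ENV_VARS = ("PATH", "HOME", "LANG", "SYSTEMROOT")


def filtered_env(parent_env=None):
    """Return a copy of the environment containing only allow-listed variables."""
    parent_env = os.environ if parent_env is None else parent_env
    return {k: v for k, v in parent_env.items() if k in ALLOWED_ENV_VARS}


def run_filtered(code: str) -> str:
    """Run a Python snippet in a subprocess with a filtered environment; return its stdout."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        env=filtered_env(),  # the child sees only the allow-listed variables
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


if __name__ == "__main__":
    os.environ["DEMO_SECRET"] = "do-not-leak"
    # The child cannot see DEMO_SECRET because it is not on the allow-list.
    print(run_filtered("import os; print(os.environ.get('DEMO_SECRET'))"))
```

An allow-list is used rather than a deny-list so that any newly added secret is excluded by default.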

Usage

From package

python

from webpage_extractor_pack.tool import run

result = run(
    url="https://openai.com/research/gpt-4-technical-report",
    output_format="markdown"
)

print(f"Title: {result['title']}")
print(f"Author: {result['author']}")
print(f"Date: {result['date']}")
print(f"Word count: {result['word_count']}")
print("\n--- Content Preview ---")
print(result["text"][:500])

# Save as Markdown (UTF-8, so non-ASCII characters are written correctly on every platform)
with open("extracted_article.md", "w", encoding="utf-8") as f:
    f.write(f"# {result['title']}\n\n{result['text']}")
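The example above writes the title and body straight into the file. If the tool returns `None` for metadata it cannot find on a page (an assumption; check the package's actual output), that would produce a heading like `# None`. The sketch below assumes `result` is a dict with the `title`, `author`, `date`, and `text` keys used above; the helper name `to_markdown` and the `"Untitled"` fallback are hypothetical.

```python
def to_markdown(result: dict, fallback_title: str = "Untitled") -> str:
    """Build a Markdown document from an extraction result dict.

    Assumes the keys used in the usage example (title, author, date, text);
    missing or None values are skipped instead of being written as "None".
    """
    title = result.get("title") or fallback_title
    lines = [f"# {title}", ""]
    # Only emit metadata lines for fields that actually have a value.
    for label, key in (("Author", "author"), ("Date", "date")):
        value = result.get(key)
        if value:
            lines.append(f"*{label}: {value}*")
    if len(lines) > 2:
        lines.append("")  # blank line between metadata and body
    lines.append(result.get("text") or "")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = {"title": "Example", "author": None, "date": "2023-03-14", "text": "Body text."}
    print(to_markdown(sample))
```

With a real result, `f.write(to_markdown(result))` could replace the f-string in the save step above.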

Runs locally on your machine. No execution data is sent to AgentNode. Permissions are checked before execution.

Verification

High confidence · 80/100 · ✔ Verified

smoke: Returned valid result (+25/25)
tests: Tests failed (0/15)
import: All tools imported successfully (+15/15)
install: Installed in 2.6s (+15/15)
contract: All contract checks passed (+10/10)
determinism: Consistent output across runs, normalized (+5/5)
reliability: 3/3 runs passed (+10/10)
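The headline score of 80/100 matches the sum of the points earned by the individual checks in the verification list. A quick check in Python follows; the dictionary simply restates the numbers from that list and is not an AgentNode API.

```python
# Points earned and points available per check, copied from the verification list.
checks = {
    "smoke": (25, 25),
    "tests": (0, 15),
    "import": (15, 15),
    "install": (15, 15),
    "contract": (10, 10),
    "determinism": (5, 5),
    "reliability": (10, 10),
}

earned = sum(points for points, _ in checks.values())
available = sum(maximum for _, maximum in checks.values())
print(f"earned={earned}, listed maximum={available}")
```

The earned points add up to 80, with the failed test suite costing 15. The listed maxima add up to 95, so this page does not show where the remaining 5 points of the 100-point scale come from.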

Package installs and imports correctly, and runtime checks passed. The test suite failed (0/15).

✔ install: 2.6s
✔ import: 73ms
✔ smoke: 2.1s
✖ tests: 1.2s

This package was executed and validated by AgentNode before listing. Install, import, and runtime checks passed.

Verified in real_auto mode

Python 3.12.3 · ffmpeg · poppler · tesseract · uv

Last verified 29d ago · Runner v2.0.0
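The determinism and reliability checks in the verification list compare repeated runs of the same tool. One plausible approach, sketched here on the assumption that "normalized" means collapsing whitespace and ignoring volatile fields such as timestamps, is to run the tool several times, normalize each output, and compare hashes. The names `normalize`, `check_determinism`, and `VOLATILE_KEYS` are hypothetical; AgentNode's actual normalization rules are not documented on this page.

```python
import hashlib
import json
import re
from typing import Any, Callable, Dict

# Hypothetical: keys whose values are expected to change between runs.
VOLATILE_KEYS = {"fetched_at", "elapsed_ms"}


def normalize(output: Dict[str, Any]) -> str:
    """Drop volatile keys, collapse whitespace in strings, and serialize deterministically."""
    cleaned = {}
    for key, value in output.items():
        if key in VOLATILE_KEYS:
            continue
        if isinstance(value, str):
            value = re.sub(r"\s+", " ", value).strip()
        cleaned[key] = value
    # sort_keys gives a stable key order; default=str handles e.g. datetime values.
    return json.dumps(cleaned, sort_keys=True, default=str)


def check_determinism(tool: Callable[[], Dict[str, Any]], runs: int = 3) -> bool:
    """Run the tool several times and report whether all normalized outputs match."""
    digests = {
        hashlib.sha256(normalize(tool()).encode("utf-8")).hexdigest()
        for _ in range(runs)
    }
    return len(digests) == 1
```

With the package installed, `check_determinism(lambda: run(url="https://example.com", output_format="markdown"))` would apply this to the tool from the usage example, assuming `run` returns a dict as shown there.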

Use this when you need to...

› Extract article text and metadata from news website URLs
› Pull structured product information from e-commerce pages
› Scrape blog post content for summarization or analysis pipelines
› Extract author, date, and title metadata from research publications
› Clean HTML pages into plain Markdown for knowledge base ingestion
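Several of these use cases depend on pulling a title, author, and date out of a page's HTML. The package delegates this to trafilatura, whose heuristics are far more thorough. The sketch below only illustrates the basic idea with Python's standard `html.parser`, reading the `<title>` element and a few common `<meta>` tags. The class name `MetadataParser`, the function `extract_metadata`, and the particular meta names checked are illustrative choices, not what trafilatura does internally.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class MetadataParser(HTMLParser):
    """Collect the <title> text and a few common <meta> values from an HTML page."""

    # Illustrative mapping from <meta> name/property values to output fields.
    META_FIELDS = {
        "author": "author",
        "article:published_time": "date",
        "date": "date",
    }

    def __init__(self) -> None:
        super().__init__()
        self.metadata: Dict[str, Optional[str]] = {"title": None, "author": None, "date": None}
        self._in_title = False
        self._title_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = (attributes.get("name") or attributes.get("property") or "").lower()
            field = self.META_FIELDS.get(key)
            # Keep the first value found for each field.
            if field and self.metadata[field] is None:
                self.metadata[field] = attributes.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
            self.metadata["title"] = "".join(self._title_parts).strip() or None

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def extract_metadata(html: str) -> Dict[str, Optional[str]]:
    """Parse an HTML string and return its title, author, and date (None if absent)."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return parser.metadata
```

Real pages often carry this information in JSON-LD or in visible bylines instead, which is one reason a dedicated extractor is worth wrapping.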


Version History

v1.0.0 · latest · verified

3/13/2026

Capabilities

webpage_extraction · extract_webpage · tool

Permissions

◇ Sandbox optional: this package comes from a trusted publisher and runs on the host by default. You can require isolation with sandbox.host_trust_policy.

Declared by the publisher. Checked before execution by the policy gate.

Network: unrestricted
Filesystem: none
Code Execution: none
Data Access: input_only
User Approval: never

Permissions are policy-checked before execution. For trusted and curated packages that run on the host, network and filesystem access are policy-checked but not OS-sandboxed. When runtime isolation is required for untrusted or community code, AgentNode applies a sandbox-or-fail-closed rule: the tool runs in a sandbox only if the required container runtime and pinned image are available, and otherwise it does not run.
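The permission list and the paragraph above describe a policy gate: before a tool runs, its declared permissions are compared with what the host allows. A minimal sketch of that comparison follows. The permission names and the declared values mirror this package's list, but the ordering of levels (including the intermediate levels `restricted`, `read_only`, `limited`, and `full`), the host policy, and the `check_permissions` function are hypothetical and do not reflect AgentNode's actual policy format; `sandbox.host_trust_policy` is not modeled here.

```python
from typing import Dict, List

# Hypothetical ordering of permission levels, from least to most permissive.
LEVELS = {
    "network": ["none", "restricted", "unrestricted"],
    "filesystem": ["none", "read_only", "read_write"],
    "code_execution": ["none", "limited", "unrestricted"],
    "data_access": ["none", "input_only", "full"],
}

# Declared permissions, as listed for this package.
DECLARED = {
    "network": "unrestricted",
    "filesystem": "none",
    "code_execution": "none",
    "data_access": "input_only",
}


def check_permissions(declared: Dict[str, str], policy: Dict[str, str]) -> List[str]:
    """Return the permissions whose declared level exceeds the host policy's maximum.

    A permission missing from the policy is treated as "none" (deny by default).
    """
    violations = []
    for name, level in declared.items():
        order = LEVELS[name]
        if order.index(level) > order.index(policy.get(name, "none")):
            violations.append(name)
    return violations


if __name__ == "__main__":
    host_policy = {"network": "restricted", "filesystem": "read_only",
                   "code_execution": "none", "data_access": "input_only"}
    print(check_permissions(DECLARED, host_policy))  # ['network']
```

A gate like this only decides whether the call may proceed; as the paragraph above notes, it does not sandbox network or filesystem access at the OS level.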

Privacy

All tool execution happens locally on your machine. AgentNode never receives:

• Tool inputs or outputs
• Execution logs
• Data your agent processes

Only install events and search queries are sent to the registry.


Files (5)

License

MIT

Stats

Downloads: 2
Installs: 0
Version: v1.0.0
Published: 3/13/2026
Channel: stable
Type: toolpack
Entrypoint: webpage_extractor_pack.tool
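The Entrypoint field names the module that exposes the tool, and the usage example imports `run` from it directly. A host that only knows the entrypoint string could resolve the callable at runtime with `importlib`. The `load_tool` helper and the assumption that the callable is named `run` by default are illustrative, not AgentNode's loader.

```python
import importlib
from typing import Any, Callable


def load_tool(entrypoint: str, attribute: str = "run") -> Callable[..., Any]:
    """Import the entrypoint module and return the named callable from it."""
    module = importlib.import_module(entrypoint)
    tool = getattr(module, attribute)
    if not callable(tool):
        raise TypeError(f"{entrypoint}.{attribute} is not callable")
    return tool


if __name__ == "__main__":
    # Stand-in demo with a standard-library module; with the package installed,
    # load_tool("webpage_extractor_pack.tool") would return its run function.
    dumps = load_tool("json", "dumps")
    print(dumps({"ok": True}))
```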

Compatibility

Frameworks

langchain · crewai · generic

Runtime

python

Python Version

>=3.10

Trust & Security

Publisher: ★ Trusted
Signature: None
Provenance: None
Security Issues: 0

Publisher

AgentNode

@agentnode
