Webpage Extractor Pack
Trusted · v1.0.0 · MIT · Gold Verified 95 · by AgentNode · published 2 months ago · toolpack
Extract clean text and metadata from any webpage.
Wraps trafilatura to provide reliable webpage content extraction.
Quick Start
agentnode install webpage-extractor-pack
Runs in a subprocess with a filtered environment by default. Declared permissions are policy-checked, not sandboxed.
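As a rough illustration of what running a tool in a subprocess with a filtered environment can mean in general, the sketch below starts a child Python process that only sees an allow-listed set of environment variables. This is not AgentNode's actual runner: the allow-list `ALLOWED_ENV_VARS` and the helpers `filtered_env` and `run_with_filtered_env` are hypothetical names chosen for this example.

```python
import os
import subprocess
import sys

# Hypothetical allow-list; the variables AgentNode really passes through are not documented here.
ALLOWED_ENV_VARS = {"PATH", "HOME", "LANG", "SYSTEMROOT", "TEMP", "TMP"}


def filtered_env(source=None):
    """Return a copy of the environment containing only allow-listed variables."""
    source = os.environ if source is None else source
    return {k: v for k, v in source.items() if k in ALLOWED_ENV_VARS}


def run_with_filtered_env(code, source_env=None):
    """Run a Python snippet in a child process that sees only the filtered environment."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        env=filtered_env(source_env),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()
```

Filtering the environment keeps secrets such as API tokens out of the child process, but it does not restrict network or filesystem access, which matches the "policy-checked, not sandboxed" wording above.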
Usage
From package
from webpage_extractor_pack.tool import run

result = run(
    url="https://openai.com/research/gpt-4-technical-report",
    output_format="markdown",
)

# Print the extracted metadata
print(f"Title: {result['title']}")
print(f"Author: {result['author']}")
print(f"Date: {result['date']}")
print(f"Word count: {result['word_count']}")

# Show the first 500 characters of the extracted text
print("\n--- Content Preview ---")
print(result["text"][:500])

# Save as markdown
with open("extracted_article.md", "w", encoding="utf-8") as f:
    f.write(f"# {result['title']}\n\n{result['text']}")
Runs locally on your machine. No execution data is sent to AgentNode. Permissions are checked before execution.
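The result in the usage example is read like a dictionary with `title`, `author`, `date`, `word_count`, and `text` keys. The sketch below shows one way to turn such a dictionary into a Markdown document with a short metadata header. It assumes missing metadata arrives as `None` or an empty string, which the listing does not confirm; `render_markdown` and the sample data are hypothetical and stand in for a real `run(...)` result.

```python
def render_markdown(result):
    """Build a Markdown document with a small metadata header from a result dict."""
    title = result.get("title") or "Untitled"
    lines = [f"# {title}", ""]
    # Only emit metadata lines for fields that are present and non-empty.
    for label, key in (("Author", "author"), ("Date", "date"), ("Word count", "word_count")):
        value = result.get(key)
        if value not in (None, ""):
            lines.append(f"- {label}: {value}")
    if len(lines) > 2:
        lines.append("")  # blank line between the metadata list and the body
    lines.append(result.get("text") or "")
    return "\n".join(lines)


# Sample data standing in for a real extraction result (assumed shape).
sample = {
    "title": "Example Article",
    "author": None,
    "date": "2024-01-15",
    "word_count": 3,
    "text": "Hello extracted world.",
}
print(render_markdown(sample))
```

Using `.get()` instead of direct indexing keeps the helper from raising `KeyError` if a page has no author or date.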
Verification
Package installs and imports correctly. Runtime checks passed. Publisher tests passed.
This package was executed and validated by AgentNode before listing. Install, import, and runtime checks passed.
Last verified 13d ago · Runner v2.0.0
Use this when you need to...
- Extract article text and metadata from news website URLs
- Pull structured product information from e-commerce pages
- Scrape blog post content for summarization or analysis pipelines
- Extract author, date, and title metadata from research publications
- Clean HTML pages into plain Markdown for knowledge base ingestion
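For the pipeline and knowledge base use cases, a small batch loop is often enough. The sketch below takes any `extract` callable that returns a dictionary with `title` and `text` keys and writes one Markdown file per URL. The helpers `slugify` and `ingest` are hypothetical, and how the pack reports failures (exception or error field) is not documented here, so the sketch assumes an exception.

```python
import re
from pathlib import Path


def slugify(title, fallback="untitled"):
    """Turn a title into a filesystem-friendly file stem."""
    slug = re.sub(r"[^a-z0-9]+", "-", (title or "").lower()).strip("-")
    return slug or fallback


def ingest(urls, extract, out_dir):
    """Extract each URL with `extract` and write one Markdown file per page.

    Returns (written_paths, failures), where failures maps URL -> error message.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written, failures = [], {}
    for url in urls:
        try:
            result = extract(url)
        except Exception as exc:  # keep going if a single page fails
            failures[url] = str(exc)
            continue
        title = result.get("title") or "Untitled"
        path = out_dir / f"{slugify(result.get('title'))}.md"
        path.write_text(f"# {title}\n\n{result.get('text') or ''}", encoding="utf-8")
        written.append(path)
    return written, failures
```

With the pack installed, `extract` could plausibly be `lambda url: run(url=url, output_format="markdown")`, mirroring the call in the Usage section. Pages that share a title map to the same file name and overwrite each other, so a real pipeline would likely add a URL hash or counter to the slug.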
Permissions
Permissions are declared by the publisher and checked by the policy gate before execution. Network and filesystem access are not sandboxed at runtime.
Privacy
All tool execution happens locally on your machine. AgentNode never receives:
- Tool inputs or outputs
- Execution logs
- Data your agent processes
Only install events and search queries are sent to the registry.
Files (5)
License
MIT
Compatibility
Runtime: python
Python Version: >=3.10
Publisher
AgentNode (@agentnode)