
Webpage Extractor Pack

Trusted · v1.0.0 · MIT · Gold Verified · 95

by AgentNode · published 2 months ago · toolpack

Extract clean text and metadata from any webpage.

Wraps trafilatura to provide reliable webpage content extraction.
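To make "clean text and metadata" concrete, here is a deliberately naive sketch that uses only Python's standard-library `html.parser`. It is not how this pack or trafilatura works. Trafilatura applies heuristics to separate the main content from boilerplate such as navigation and footers, which this sketch does not attempt (the `<nav>` text survives below). `NaiveExtractor` and `naive_extract` are hypothetical names, and the result keys only loosely mirror those in the usage example.

```python
from html.parser import HTMLParser


class NaiveExtractor(HTMLParser):
    """Collect the <title>, named <meta> tags, and visible text of a page.

    Deliberately naive: there is no boilerplate detection, so navigation,
    footers, and ads end up in the text alongside the main content.
    """

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.chunks = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            a = dict(attrs)
            if a.get("name") and a.get("content"):
                self.meta[a["name"].lower()] = a["content"]

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        # Ignore whitespace-only runs and anything inside script/style/noscript.
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def naive_extract(html: str) -> dict:
    """Return a dict loosely shaped like an extraction result (hypothetical keys)."""
    parser = NaiveExtractor()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.chunks)
    return {
        "title": parser.title or None,
        "author": parser.meta.get("author"),
        "text": text,
        "word_count": len(text.split()),
    }


page = """<html><head><title>Hello</title>
<meta name="author" content="Ada"><script>var x = 1;</script></head>
<body><nav>Home | About</nav><p>First paragraph.</p><p>Second one here.</p></body></html>"""
result = naive_extract(page)
print(result["title"], result["author"], result["word_count"])  # → Hello Ada 8
```

The word count of 8 includes "Home", "|", and "About" from the navigation bar, which is exactly the kind of noise a content extractor like trafilatura is meant to remove.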

langchain · crewai · generic

Quick Start

```bash
agentnode install webpage-extractor-pack
```

Runs in a subprocess with a filtered environment by default. Declared permissions are policy-checked, not sandboxed.
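A minimal sketch of what running a tool "in a subprocess with a filtered environment" can mean, assuming an allowlist of environment variables. The allowlist and the `filtered_env` / `run_in_subprocess` helpers are hypothetical, not AgentNode's implementation. As the listing notes, this kind of filtering only hides environment variables (such as API keys) from the child process; it does not restrict network or filesystem access.

```python
import os
import subprocess
import sys

# Hypothetical allowlist; the variables AgentNode actually passes through are not documented here.
ALLOWED_ENV = {"PATH", "HOME", "LANG", "SYSTEMROOT"}


def filtered_env(allowed=ALLOWED_ENV) -> dict:
    """Return a copy of os.environ restricted to allowlisted names."""
    return {k: v for k, v in os.environ.items() if k in allowed}


def run_in_subprocess(code: str) -> str:
    """Run a Python snippet in a child process that only sees the filtered environment."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        env=filtered_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout.strip()


os.environ["DEMO_API_KEY"] = "secret"  # simulate a secret in the parent process
print(run_in_subprocess("import os; print(os.environ.get('DEMO_API_KEY'))"))  # → None
```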

Usage

From package
```python
from webpage_extractor_pack.tool import run

result = run(
    url="https://openai.com/research/gpt-4-technical-report",
    output_format="markdown",
)

print(f"Title: {result['title']}")
print(f"Author: {result['author']}")
print(f"Date: {result['date']}")
print(f"Word count: {result['word_count']}")
print("\n--- Content Preview ---")
print(result["text"][:500])  # first 500 characters of the extracted text

# Save as Markdown (explicit UTF-8 so non-ASCII text is written correctly)
with open("extracted_article.md", "w", encoding="utf-8") as f:
    f.write(f"# {result['title']}\n\n{result['text']}")
```
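Not every page exposes an author or a date. The sketch below renders a result dict into Markdown while skipping metadata that is missing or empty. It assumes the result uses the keys shown in the usage example (`title`, `author`, `date`, `word_count`, `text`); the full return schema of `run` is not documented here, and `result_to_markdown` is a hypothetical helper.

```python
# Metadata rendered when present; key names follow the usage example.
FIELDS = (("Author", "author"), ("Date", "date"), ("Words", "word_count"))


def result_to_markdown(result: dict) -> str:
    """Render an extraction result as Markdown, skipping absent or empty metadata."""
    title = result.get("title") or "Untitled"
    meta = [
        f"- {label}: {result[key]}"
        for label, key in FIELDS
        if result.get(key) not in (None, "")
    ]
    parts = [f"# {title}"]
    if meta:
        parts.append("\n".join(meta))
    parts.append(result.get("text") or "")
    return "\n\n".join(parts)


sample = {"title": "Example", "author": None, "date": "2024-01-01",
          "word_count": 3, "text": "One two three."}
markdown = result_to_markdown(sample)
print(markdown)
```

The returned string can then be written to disk with `open(..., "w", encoding="utf-8")` in the same way as the usage example.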

Runs locally on your machine. No execution data is sent to AgentNode. Permissions are checked before execution.

Verification

High confidence · 95/100 · ★ Gold Verified
  • smoke: Returned valid result (+25/25)
  • tests: Publisher-provided tests passed (+15/15)
  • import: All tools imported successfully (+15/15)
  • install: Installed in 2.4s (+15/15)
  • contract: All contract checks passed (+10/10)
  • determinism: Consistent output across runs, normalized (+5/5)
  • reliability: 3/3 runs passed (+10/10)
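The per-check points account for the headline score: 25 + 15 + 15 + 15 + 10 + 5 + 10 = 95, matching the 95/100 shown. As a quick arithmetic check:

```python
# Points awarded by each check in the verification breakdown.
awarded = {
    "smoke": 25,
    "tests": 15,
    "import": 15,
    "install": 15,
    "contract": 10,
    "determinism": 5,
    "reliability": 10,
}
total = sum(awarded.values())
print(f"Total: {total}/100")  # → Total: 95/100
```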

Package installs and imports correctly. Runtime checks passed. Publisher tests passed.

  • install: 2.4s
  • import: 695ms
  • smoke: 2.5s
  • tests: 3.5s
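The listing does not say how the runner measures these timings. One rough way to time a cold import locally is to start a fresh interpreter and time `import <module>`. The figure includes interpreter startup, so it overstates the cost of the import itself. `time_import` is a hypothetical helper; pass `"webpage_extractor_pack.tool"` to time this pack once it is installed. Python's own `-X importtime` flag gives a per-module breakdown instead.

```python
import subprocess
import sys
import time


def time_import(module: str) -> float:
    """Seconds for a fresh interpreter to run `import <module>`.

    Includes interpreter startup, so it overstates the pure import cost.
    """
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
    return time.perf_counter() - start


print(f"json: {time_import('json') * 1000:.0f} ms")
```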

This package was executed and validated by AgentNode before listing. Install, import, and runtime checks passed.
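The listing does not say how outputs are "normalized" for the determinism check or how reliability runs are counted. A minimal sketch of the general idea, assuming volatile fields (such as timings) are dropped and whitespace is collapsed before outputs are hashed and compared. `VOLATILE_KEYS`, `normalize`, `check_runs`, and `fake_tool` are all hypothetical.

```python
import hashlib
import itertools
import json
from typing import Callable

# Hypothetical: keys assumed to change between runs, so they are ignored.
VOLATILE_KEYS = {"elapsed_ms", "fetched_at"}


def normalize(output: dict) -> str:
    """Drop volatile keys, collapse whitespace in strings, serialize with sorted keys."""
    cleaned = {
        k: " ".join(v.split()) if isinstance(v, str) else v
        for k, v in output.items()
        if k not in VOLATILE_KEYS
    }
    return json.dumps(cleaned, sort_keys=True, default=str)


def check_runs(fn: Callable[[], dict], runs: int = 3) -> tuple[int, bool]:
    """Call fn `runs` times; return (successful runs, whether all successful outputs match)."""
    digests = []
    for _ in range(runs):
        try:
            output = fn()
        except Exception:
            continue  # a failed run counts against reliability, not determinism
        digests.append(hashlib.sha256(normalize(output).encode()).hexdigest())
    return len(digests), len(set(digests)) == 1


counter = itertools.count()


def fake_tool() -> dict:
    # Same content on every call, but the spacing and a timing field change.
    n = next(counter)
    return {"title": "Demo", "text": "Hello" + " " * (n + 1) + "world", "elapsed_ms": 10 * n}


print(check_runs(fake_tool))  # → (3, True)
```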

Python 3.12.3 · ffmpeg · poppler · tesseract · uv

Last verified 13d ago · Runner v2.0.0

Use this when you need to...

  • Extract article text and metadata from news website URLs
  • Pull structured product information from e-commerce pages
  • Scrape blog post content for summarization or analysis pipelines
  • Extract author, date, and title metadata from research publications
  • Clean HTML pages into plain Markdown for knowledge base ingestion
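For pipelines like these, a batch wrapper keeps one failing page from stopping the rest. A minimal sketch, assuming the extractor raises an exception on failure (the listing does not document whether `run` raises or returns an error field). `extract_many` and `fake_extract` are hypothetical; in real use you would pass `run` from `webpage_extractor_pack.tool` as the `extract` argument.

```python
from typing import Callable, Iterable


def extract_many(urls: Iterable[str], extract: Callable[..., dict],
                 output_format: str = "markdown") -> tuple[dict, dict]:
    """Call `extract` for each URL, collecting results and errors separately."""
    results, errors = {}, {}
    for url in urls:
        try:
            results[url] = extract(url=url, output_format=output_format)
        except Exception as exc:  # one bad page should not stop the batch
            errors[url] = repr(exc)
    return results, errors


# Stand-in for the pack's `run`, so the sketch works without network access.
def fake_extract(url: str, output_format: str) -> dict:
    if "bad" in url:
        raise ValueError("fetch failed")
    return {"title": url, "text": "body", "format": output_format}


results, errors = extract_many(["https://a.example", "https://bad.example"], fake_extract)
print(list(results), errors)
```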


Capabilities

webpage_extraction · extract_webpage · tool

Permissions

Declared by the publisher. Checked before execution by the policy gate.

Network: unrestricted
Filesystem: none
Code Execution: none
Data Access: input_only
User Approval: never

Permissions are policy-checked before execution. Network and filesystem access are not sandboxed at runtime.
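A policy gate of this kind compares declared permissions against what a local policy allows, before anything runs. A minimal sketch, assuming each category has an ordered set of levels. Only the declared values below come from this listing; the other level names, `LEVELS`, and `policy_violations` are hypothetical, and user approval is left out. Like the gate described above, this check enforces nothing at runtime.

```python
# Declared permissions, copied from this listing.
DECLARED = {
    "network": "unrestricted",
    "filesystem": "none",
    "code_execution": "none",
    "data_access": "input_only",
}

# Hypothetical ordering of levels, least to most permissive; not AgentNode's schema.
LEVELS = {
    "network": ["none", "restricted", "unrestricted"],
    "filesystem": ["none", "read", "write"],
    "code_execution": ["none", "sandboxed", "unrestricted"],
    "data_access": ["none", "input_only", "full"],
}


def policy_violations(declared: dict, policy: dict) -> list[str]:
    """Return one message per category where the declared level exceeds the policy maximum."""
    violations = []
    for category, level in declared.items():
        order = LEVELS[category]
        if order.index(level) > order.index(policy[category]):
            violations.append(
                f"{category}: declared {level!r}, policy allows up to {policy[category]!r}"
            )
    return violations


strict = {"network": "restricted", "filesystem": "none",
          "code_execution": "none", "data_access": "input_only"}
print(policy_violations(DECLARED, strict))
```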

Privacy

All tool execution happens locally on your machine. AgentNode never receives:

  • Tool inputs or outputs
  • Execution logs
  • Data your agent processes

Only install events and search queries are sent to the registry.


Files (5)

License

MIT

Stats

Downloads: 2
Installs: 0
Version: v1.0.0
Published: 3/13/2026
Channel: stable
Type: toolpack
Entrypoint: webpage_extractor_pack.tool
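The entrypoint names the module `webpage_extractor_pack.tool`, which is where the usage example imports `run` from. A framework adapter could resolve it dynamically. A minimal sketch using `importlib`, where `load_entrypoint` is hypothetical and the default attribute name `run` is taken from the usage example; standard-library modules stand in so the sketch runs without the pack installed.

```python
import importlib
from typing import Callable


def load_entrypoint(module_path: str, attr: str = "run") -> Callable:
    """Import `module_path` and return its `attr`, which must be callable."""
    module = importlib.import_module(module_path)
    fn = getattr(module, attr)
    if not callable(fn):
        raise TypeError(f"{module_path}.{attr} is not callable")
    return fn


# With the pack installed: run = load_entrypoint("webpage_extractor_pack.tool")
dumps = load_entrypoint("json", "dumps")
print(dumps({"ok": True}))  # → {"ok": true}
```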

Compatibility

Frameworks

langchain · crewai · generic

Runtime

python

Python Version

>=3.10

Trust & Security

Publisher: Trusted
Signature: None
Provenance: None
Security issues: 0

Publisher


AgentNode

@agentnode