DOMHarvest

Playwright-powered web scraping

Extract DOM elements with precision and speed

Quick Example

javascript
import { harvest, text, array } from 'domharvest-playwright'

// Extract quotes using the declarative DSL
const quotes = await harvest(
  'https://quotes.toscrape.com/',
  '.quote',
  {
    text: text('.text'),
    author: text('.author'),
    tags: array('.tag', text())
  }
)

console.log(quotes)
// [{ text: "The world as we...", author: "Albert Einstein", tags: ["change", "world"] }, ...]
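
For comparison, here is a rough sketch of the same extraction written by hand against plain Playwright. It is not DOMHarvest's internal code: the function names `extractQuotes` and `scrapeQuotes` are made up for illustration, and trimming whitespace and returning `null` for a missing element are assumptions about reasonable behavior rather than documented DOMHarvest semantics.

```javascript
// Hypothetical helper: runs inside the page and turns each matched
// `.quote` element into a plain object with the same shape as the schema.
function extractQuotes(nodes) {
  // Assumption: trim whitespace, and use null when an element is missing.
  const textOf = (el) => (el ? el.textContent.trim() : null)
  return nodes.map((node) => ({
    text: textOf(node.querySelector('.text')),
    author: textOf(node.querySelector('.author')),
    tags: Array.from(node.querySelectorAll('.tag'), (el) => textOf(el))
  }))
}

// Hypothetical wrapper: launch a browser, load the page, extract, clean up.
async function scrapeQuotes(url) {
  // Imported lazily so extractQuotes can be reused without Playwright loaded.
  const { chromium } = await import('playwright')
  const browser = await chromium.launch()
  try {
    const page = await browser.newPage()
    await page.goto(url)
    // evaluateAll passes every element matching the locator to the function.
    return await page.locator('.quote').evaluateAll(extractQuotes)
  } finally {
    // Always release the browser, even if navigation or extraction fails.
    await browser.close()
  }
}
```

Calling `await scrapeQuotes('https://quotes.toscrape.com/')` should yield objects with `text`, `author`, and `tags` fields. The declarative schema replaces the selector plumbing and the browser lifecycle handling shown here.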

Why DOMHarvest?

DOMHarvest makes web scraping simple and reliable by leveraging Playwright's battle-tested browser automation. Whether you're building a data pipeline, monitoring websites, or extracting content for analysis, DOMHarvest provides the tools you need with minimal setup.

Features at a Glance

  • Declarative DSL: Use text(), attr(), array(), exists(), html(), count() for clean, readable extraction
  • Authentication helpers: login(), fillLoginForm(), SessionManager for authenticated scraping
  • Rate limiting: Built-in global and per-domain rate limiting to avoid overwhelming servers
  • Retry with backoff: Automatic retries with exponential or linear backoff strategies
  • Batch harvesting: Process multiple URLs concurrently with harvestBatch()
  • Custom extractors: Mix DSL with custom functions when you need complex logic
  • Screenshot capture: Take screenshots during or after scraping operations
  • Proxy support: Route requests through proxy servers with authentication
  • Full Playwright access: Direct access to browser, context, and page for advanced use cases
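
To make the "retry with backoff" item concrete, the sketch below shows how exponential and linear backoff differ in general. It is a standalone illustration, not DOMHarvest's implementation or configuration: `backoffDelay`, `withRetry`, and the option names `strategy`, `baseMs`, `maxMs`, `retries`, and `wait` are all hypothetical.

```javascript
// Illustrative only: these names are not part of the DOMHarvest API.

// Delay in milliseconds before retry number `attempt` (1-based).
function backoffDelay(attempt, { strategy = 'exponential', baseMs = 500, maxMs = 30000 } = {}) {
  const raw = strategy === 'linear'
    ? baseMs * attempt             // 500, 1000, 1500, ...
    : baseMs * 2 ** (attempt - 1)  // 500, 1000, 2000, ...
  // Cap the delay so a long run of failures cannot stall indefinitely.
  return Math.min(raw, maxMs)
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Run `task` once, then retry up to `retries` more times on failure.
// `wait` is injectable so the delay can be replaced in tests.
async function withRetry(task, { retries = 3, wait = sleep, ...backoff } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await task(attempt)
    } catch (error) {
      // Out of retries: surface the last error to the caller.
      if (attempt > retries) throw error
      await wait(backoffDelay(attempt, backoff))
    }
  }
}
```

A call such as `withRetry(() => harvest(url, selector, schema), { retries: 3 })` would wrap the `harvest` call from the quick example. Exponential backoff eases off quickly when a server is struggling, while linear backoff keeps the total wait shorter for brief, transient failures.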

Released under the MIT License.