DigiNews

Tech Watch by Johan Denoyer


Show HN: Robust LLM Extractor for Websites in TypeScript

Quality: 8/10 Relevance: 9/10

Summary

Lightfeed Extractor is a TypeScript library for robust web data extraction that combines LLMs with Playwright. It provides stealth-mode browser automation, AI-driven page navigation, HTML-to-Markdown conversion, LLM-based extraction validated against Zod schemas with JSON recovery, and URL cleaning for extracted links. The README includes practical e-commerce extraction examples and setup instructions for local, serverless, and remote browser deployments, which makes the library a practical fit for production data pipelines.
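To make the URL cleaning step more concrete, here is a minimal sketch of what such a step might look like. It is not the library's implementation: the function name `cleanUrl` and the list of tracking parameters are assumptions chosen for illustration, and only the standard `URL` API is used.

```typescript
// Hypothetical set of click-ID parameters to strip; not taken from Lightfeed Extractor.
const TRACKING_PARAMS = new Set<string>(["fbclid", "gclid", "msclkid"]);

// Returns the URL with utm_* parameters and known click IDs removed.
// All other query parameters, the path, and the fragment are left untouched.
function cleanUrl(raw: string): string {
  const url = new URL(raw);
  // Copy the keys first so deleting entries does not disturb the iteration.
  for (const key of Array.from(url.searchParams.keys())) {
    if (key.startsWith("utm_") || TRACKING_PARAMS.has(key)) {
      url.searchParams.delete(key);
    }
  }
  return url.toString();
}

console.log(
  cleanUrl("https://shop.example.com/item/42?utm_source=news&color=red&fbclid=abc")
);
```

Shorter links of this kind use fewer tokens when passed to an LLM and are easier to deduplicate downstream, which is one plausible reason for including such a step in an extraction pipeline.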

🚀 Service built by Johan Denoyer