markdown.new + Jina Reader: Stop Feeding Your LLM Raw HTML
Two tools for converting web pages to clean markdown. markdown.new runs on Cloudflare's edge; Jina Reader works as a URL prefix. Both can slash token usage by 80% or more.
Same problem, two fixes
Raw HTML is a token furnace. Feed a web page into your context window and around 80% of the budget goes on <div> tags and inline styles that add absolutely nothing. Rubbish way to spend your tokens.
markdown.new runs on Cloudflare's edge. Paste a URL and you get clean markdown back in under a second. No signup, and 500 requests a day free. Their own numbers: 16,180 tokens of HTML down to 3,150 as markdown. That's a cut of roughly 80%, or about five times more content per context window. Proper mint.
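A minimal Python sketch covering both halves of that: fetching a page through markdown.new, and checking the quoted savings. It assumes markdown.new accepts the target URL appended after its domain (https://markdown.new/https://example.com). That URL shape and the helper names markdown_new_url, fetch_markdown and token_savings are illustrative assumptions, not a documented API, so check the service before leaning on it.

```python
import urllib.request

# ASSUMPTION: markdown.new takes the page URL appended as a path,
# e.g. https://markdown.new/https://example.com. Verify against the service.
MARKDOWN_NEW_BASE = "https://markdown.new/"


def markdown_new_url(target: str) -> str:
    """Build the (assumed) markdown.new conversion URL for a page."""
    return MARKDOWN_NEW_BASE + target


def fetch_markdown(target: str, timeout: float = 10.0) -> str:
    """Fetch the markdown version of a page via markdown.new (makes a network call)."""
    with urllib.request.urlopen(markdown_new_url(target), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def token_savings(html_tokens: int, md_tokens: int) -> tuple[float, float]:
    """Return (percent reduction, content multiplier) for an HTML-to-markdown conversion."""
    reduction = 100 * (1 - md_tokens / html_tokens)
    multiplier = html_tokens / md_tokens
    return reduction, multiplier


if __name__ == "__main__":
    # The figures quoted by markdown.new: 16,180 HTML tokens vs 3,150 markdown tokens.
    pct, mult = token_savings(16_180, 3_150)
    print(f"{pct:.1f}% fewer tokens, {mult:.1f}x more content per window")
```

Running it prints `80.5% fewer tokens, 5.1x more content per window`, which is where the 80% and five-times figures come from. Call fetch_markdown only when you actually want to hit the service.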
Jina Reader is even simpler. Stick r.jina.ai/ in front of any URL and it does the conversion. r.jina.ai/https://example.com gives you markdown straight back. Handles PDFs too, does image captioning, and you can target specific CSS selectors if you only want part of a page.
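The prefix trick is easy to wrap. This sketch builds the r.jina.ai URL and maps the extras onto request headers. The header names (X-Target-Selector for CSS selectors, X-With-Generated-Alt for image captions, Accept: application/json for JSON output) come from Jina Reader's documentation as I understand it; treat them as assumptions and check the current docs. The function names are hypothetical.

```python
import urllib.request
from typing import Optional

JINA_PREFIX = "https://r.jina.ai/"


def jina_url(target: str) -> str:
    """Prefix the page URL with r.jina.ai/ so Jina Reader converts it."""
    return JINA_PREFIX + target


def jina_headers(
    selector: Optional[str] = None,
    caption_images: bool = False,
    as_json: bool = False,
) -> dict[str, str]:
    """Map optional extras onto request headers.

    ASSUMPTION: header names as described in Jina Reader's docs; verify before use.
    """
    headers: dict[str, str] = {}
    if selector:
        # Only return the part of the page matching this CSS selector.
        headers["X-Target-Selector"] = selector
    if caption_images:
        # Ask for generated alt text (image captions).
        headers["X-With-Generated-Alt"] = "true"
    if as_json:
        # Ask for a JSON envelope instead of plain markdown.
        headers["Accept"] = "application/json"
    return headers


def fetch_via_jina(target: str, timeout: float = 30.0, **options) -> str:
    """Fetch a page (or PDF) through Jina Reader (makes a network call)."""
    req = urllib.request.Request(jina_url(target), headers=jina_headers(**options))
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

PDFs need no special handling in this sketch: pass the PDF's URL as the target and the prefix does the rest. For example, `fetch_via_jina("https://example.com", selector="article")` would ask for just the article element.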
I reach for Jina when I need the extras like PDF parsing or JSON extraction. For quick page grabs where speed matters, markdown.new is the one. Either way, stop chucking raw HTML at your models.
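If you want that rule of thumb in code, here is a hypothetical router: Jina Reader when the job needs PDF parsing, JSON output, a CSS selector or image captions, markdown.new otherwise. Spotting PDFs by a `.pdf` path suffix is a crude heuristic assumed for the sketch, and the prefix-style markdown.new URL is an assumption too, not a documented endpoint.

```python
from urllib.parse import urlparse

# ASSUMPTION: markdown.new accepts the page URL appended as a path.
MARKDOWN_NEW = "https://markdown.new/"
JINA_READER = "https://r.jina.ai/"


def choose_converter(
    target: str,
    need_json: bool = False,
    need_selector: bool = False,
    need_captions: bool = False,
) -> str:
    """Return the conversion URL for target, preferring markdown.new for plain pages."""
    # Heuristic (assumption): treat URLs whose path ends in .pdf as PDFs.
    is_pdf = urlparse(target).path.lower().endswith(".pdf")
    if is_pdf or need_json or need_selector or need_captions:
        # Jina Reader covers the extras: PDFs, JSON, selectors, captions.
        return JINA_READER + target
    # Quick page grab: markdown.new.
    return MARKDOWN_NEW + target
```

Keeping the choice in one function means swapping the default later is a one-line change.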