markitdown: Convert Anything to Markdown for Your Agent
Microsoft tool that converts PDF, DOCX, PPTX, images, audio, HTML, and more into clean markdown. Perfect for feeding documents into your coding agent context.
What It Does
Your agent needs to read a PDF. Or a Word doc. Or a PowerPoint. Or an Excel spreadsheet. Or an image with text on it. You could faff about with custom parsers for each format, or you could use markitdown and convert all of them to markdown with one command.
pip install 'markitdown[all]'
markitdown path/to/document.pdf > output.md
Roughly 95,000 GitHub stars. Microsoft-backed. Handles PDF, DOCX, XLSX, PPTX, HTML, images (OCR via LLM), audio (transcription), ZIP archives, EPUBs, and more. The output is clean markdown your agent can read without choking on binary formats.
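The same conversion is available from Python, which helps when the agent pipeline is already a script. Here is a minimal sketch, assuming markitdown is installed with the extras your formats need. MarkItDown, convert() and text_content come from the project's README; the to_markdown helper and its converter parameter are hypothetical names made up for this example.

```python
def to_markdown(path, converter=None):
    """Return the markdown text for a single document.

    `converter` is a hypothetical hook, not part of markitdown: any object
    with a .convert(path) method whose result exposes .text_content.
    When it is omitted, a real MarkItDown instance is created.
    """
    if converter is None:
        # Imported lazily so the helper can be exercised with a stand-in
        # converter even when the package is not installed.
        from markitdown import MarkItDown
        converter = MarkItDown()
    result = converter.convert(str(path))
    return result.text_content
```

Calling to_markdown("spec.pdf") should give you the same text the command-line tool prints to stdout.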
Why It Matters for Agents
Every document your agent can't read is context it's missing. Client sends a spec as a PDF? Run markitdown spec.pdf and pipe the output into the conversation. Legacy docs in Word? Same thing. The conversion is fast, and the markdown output is more token-efficient than raw HTML or extracted text full of formatting artefacts.
I use it as a preprocessing step whenever I'm feeding documents into agent context. Beats copying and pasting from Preview like a caveman.
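For more than one document, the preprocessing step could look something like this. It is a sketch, not a recommended pipeline: build_context, its convert callable and the max_chars cap are all hypothetical names, and the character cap is only a crude stand-in for a real token budget.

```python
def build_context(paths, convert, max_chars=20_000):
    """Convert each document and join the results into one context string.

    `convert` is any callable that maps a path to markdown text.
    `max_chars` is a hypothetical per-document cap that keeps one huge
    file from crowding out everything else.
    """
    sections = []
    for path in paths:
        text = convert(path).strip()
        if len(text) > max_chars:
            # Cut the document and leave a visible marker for the agent.
            text = text[:max_chars].rstrip() + "\n\n[truncated]"
        # Label each section so the agent knows which file it came from.
        sections.append(f"## Source: {path}\n\n{text}")
    return "\n\n---\n\n".join(sections)
```

With markitdown installed, you could create one MarkItDown instance as md and pass lambda p: md.convert(p).text_content as the convert argument.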