Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
Python · opendatalab/MinerU

MinerU

Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

91.4/100
55.7K · Forks: 4.6K

Similar Projects

wdoc

84

Summarize and query a large number of heterogeneous documents. Any LLM provider, any filetype, advanced RAG, advanced summaries, scriptable, etc.

Python · 509

text-extract-api

66

Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents. Remove PII. Convert any document or picture to structured JSON or Markdown

Python · 3.0K

markpdfdown

70

A high-quality PDF to Markdown tool based on large language model visual recognition.

Python · 1.7K

thepipe

74

Get clean data from tricky documents, powered by vision-language models ⚡

Python · 1.5K