Python · opendatalab/MinerU

MinerU

Transforms complex documents such as PDFs and Office files into LLM-ready Markdown/JSON for agentic workflows.

93.4/100
Stars: 60.9K · Forks: 5.1K

Similar Projects

wdoc

81

Summarize and query large collections of heterogeneous documents. Supports any LLM provider and any file type, with advanced RAG, advanced summaries, scripting, and more.

Python · 517

json_repair

86

Repair malformed JSON from LLMs, APIs, logs, and user input in Python.

Python · 4.7K

text-extract-api

62

Document (PDF, Word, PPTX, ...) extraction and parsing API using state-of-the-art OCR and Ollama-supported models. Anonymize documents, remove PII, and convert any document or image to structured JSON or Markdown.

Python · 3.1K

docext

71

An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)

Python · 2.0K