Notice: This resource is provided by a third-party author. Review the code manually or with AI tools before use to confirm that it is secure and compatible.
Language: Python
Repository: PaddlePaddle/PaddleOCR

PaddleOCR

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

96.5/100
Stars: 81.7K, Forks: 10.7K

Similar Projects

MinerU

93

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

Python, 67.0K

unstract

88

LLM-driven extraction of unstructured data, built for API deployments and ETL pipeline workflows.

Python, 6.6K

EvoScientist

85

🔬 Harness Vibe Research with Self-evolving AI Scientists

Python, 3.5K

llm_aided_ocr

57

Enhances Tesseract OCR output using LLMs (local or API) for error correction, smart chunking, and markdown formatting of scanned PDFs.

Python, 2.9K