Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
Python | allenai/dolma

dolma

Data and tools for generating and inspecting OLMo pre-training data.

68.2/100
1.4K | Forks: 175

Similar Projects

langextract

88

A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.

Python | 34.4K

LightRAG

92

[EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"

Python | 29.1K

storm

71

An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.

Python | 28.0K

Qwen

76

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python | 20.6K