LLM2CLIP significantly improves already state-of-the-art CLIP models.
🥂 Gracefully face hCaptcha challenges with a multimodal large language model.
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
TrailSnap (行影集) | AI-powered open-source photo album for treasuring travel and life memories.
The agent that grows with you