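A non-profit research initiative advancing the frontier of artificial intelligence. We focus on omni-modal AI systems, efficient architectures, and large-scale synthetic data.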
Retrievatar is a multimodal dataset designed to enhance the retrieval-augmented generation capabilities of vision-language models, specifically focusing on fictional anime characters and real-world celebrities.
Retrieval-Based Multi-Turn Chat SFT Synthetic Data is a new 100k-entry, multi-turn synthetic dialogue dataset for supervised fine-tuning (SFT), building on our work with CausalLM/Refined-Anime-Text.
We introduce our unique recipe for generating high-quality synthetic datasets to boost LLM performance, featuring our new anime dataset of over 1M entries as a proof of concept.