<aside> 💡 AI Generated Summary
KL3M is a family of models trained on clean, high-quality content with clear provenance, avoiding copyright issues, breaches of contract, synthetic data from other models, and toxic sources. The first two models, kl3m-170m and kl3m-1.7b, have shown efficient performance on business content and low toxicity scores. They are already in use for tasks like drafting and revising time entries, invoices, contract clauses, and SEC filings. KL3M can be further trained on specific content, fine-tuned for safe conversational AI or specific tasks, and its underlying training data can be licensed for personal use.
</aside>
🍊 KL3M
KL3M is a family of models built on a shared set of training data principles.
🍊 We know where every word in our training data came from and have clear documentation to support it.
🍊 KL3M is trained on content that is higher quality than the vast majority of the Internet.
🍊 KL3M doesn't rely on "fair use" or violate copyright.
🍊 KL3M doesn't "scrape" websites in violation of their terms of service or use policies.
🍊 KL3M does not contain synthetic data generated by other models like GPT, Claude, Llama 2, or Mistral.
🍊 KL3M does not contain data from sources that are known to be toxic.