https://www.kl3m.ai/

<aside> 💡 AI Generated Summary

KL3M is a family of models trained on clean, high-quality content with clear provenance, avoiding copyright issues, breaches of contract, synthetic data from other models, and toxic sources. The first two models, kl3m-170m and kl3m-1.7b, have shown efficient performance on business content and low toxicity scores. They are already in use for tasks like drafting and revising time entries, invoices, contract clauses, and SEC filings. KL3M can be further trained on specific content, fine-tuned for safe conversational AI or specific tasks, and its underlying training data can be licensed for personal use.

</aside>

🍊 KL3M

How is KL3M trained?

KL3M is a family of models built on a shared set of training data principles.

Clean provenance

🍊 We know where every word in our training data came from and have clear documentation to support it.

High-quality content

🍊 KL3M is trained on content that is higher quality than the vast majority of the Internet.

No copyright issues

🍊 KL3M doesn't rely on "fair use" or violate copyright.

No breach of contract

🍊 KL3M doesn't "scrape" websites in violation of their terms of service or use policies.

No LLM synthetic data

🍊 KL3M does not contain synthetic data generated by other models like GPT, Claude, Llama 2, or Mistral.

No toxic sources

🍊 KL3M does not contain data from sources that are known to be toxic.

https://www.kl3m.ai/assets/images/ft_cert_transparent.png