Classifiers ASL - Search News

'Constitutional Classifiers' Technique Mitigates GenAI Jailbreaks

Researchers at Anthropic, the company behind the Claude AI assistant, have developed an approach they believe provides a practical, scalable method to make it harder for malicious actors to jailbreak ...

VentureBeat

From static classifiers to reasoning engines: OpenAI’s new model rethinks content moderation

Enterprises, eager to ensure any AI models they use adhere to safety and safe-use policies, fine-tune LLMs so they do not respond to unwanted queries. However, much of the safeguarding and red teaming ...
