Hacker News

What the hell is a "safety score for violence"?



It's making sure AI condemns violence perpetrated by people without power and sanctifies the violence of those who have it.

So long as those who have it deem it legal to perpetrate.

They define what's legal.

States are the most prolific users of violence by far.


ChatGPT will gladly defend any actions of the 'US government' from my testing.

Just as an unscientific anecdata point: from a quick test using the same prompt about being an independent journalist wanting to cover a report of the US/Israel/Iran double-tapping a refugee camp, ChatGPT consistently gave advice to beware disinfo, check my sources and be transparent about verifiability and sourcing of the claims.

However when the prompt was phrased to make it appear as an action of the US military it did push back a little bit more by emphasizing that it couldn't find any news coverage from today about this story and therefore found it hard to believe. In the other cases it did not add such context. Other than that the results were very similar. Make of that what you will.

EDIT: To be fair, when it was phrased as an action of the Israeli military it did include a link to an article alleging an Israeli "double tap" on journalists from Mondoweiss (an anti-Zionist American news site) as an example of how such allegations have been framed in the past.



I was sure the parent comment was a joke about OpenAI's recent deal with the DoD. But no, there it is: the rate of disallowing violence dropped from 90.9% to 83.1%.

No, I was just remarking how ridiculous it is to pretend to do violence safely. It's like a fat score for butter.

Sorry, I meant the grandparent comment, by theParadox42.

It's how safely it can commit violence.

I asked an AI. I thought they would know.

What the hell is a "safety score for violence"?

A “safety score for violence” is usually a risk rating used by platforms, AI systems, or moderation tools to estimate how likely a piece of content is to involve or promote violence. It’s not a universal standard—different companies use their own versions—but the idea is similar everywhere.

What it measures

A safety score typically evaluates whether text, images, or videos contain things like:

- Threats of violence ("I'm going to hurt someone.")
- Instructions for harming people
- Glorifying violent acts
- Descriptions of physical harm or abuse
- Planning or encouraging attacks
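To make the idea concrete, here's a toy sketch of such a scorer. Real moderation systems use trained classifiers, not keyword matching, and the cue words, weights, and threshold below are made-up assumptions purely for illustration (note the convention here: higher score means more likely violent).

```python
# Toy illustration only: production moderation uses trained ML models,
# not keyword lookup. These cues and weights are invented for the example.
VIOLENCE_CUES = {
    "hurt": 0.4,
    "attack": 0.5,
    "kill": 0.7,
}

def violence_score(text: str) -> float:
    """Return a score in [0, 1]; higher = more likely to involve violence."""
    words = set(text.lower().split())
    score = sum(w for cue, w in VIOLENCE_CUES.items() if cue in words)
    return min(score, 1.0)

def is_allowed(text: str, threshold: float = 0.5) -> bool:
    """Content passes moderation if its score stays below the threshold."""
    return violence_score(text) < threshold

print(violence_score("I'm going to hurt someone"))   # 0.4
print(violence_score("the weather is nice today"))   # 0.0
```

Whether a platform reports this number directly ("violence score") or its inverse ("safety score") is a product choice, which is exactly why the direction of the scale is often ambiguous.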


I still can't tell which direction this score goes... Does a decreasing score mean it is "less safe" (i.e. "more violent") or does it mean it is "less violent" (i.e. "more safe")?


