bias is some normative lens that some people came up with, but it is purely subj...

semi-extrinsic · on Sept 26, 2024

If you forget about the social justice stuff for a minute, there are many other types of bias relevant for an LLM.

One example is US-centric bias. If I ask the LLM a question where the answer is one thing in the US and another thing in Germany, you can't really de-bias the model. But ideally you can have it request more details in order to give a good answer.

Al-Khwarizmi · on Sept 26, 2024

Yes, but that bias has been present in everything related to computers for decades.

As someone from outside the US, it is quite common to face annoyances like address fields expecting addresses in US format, systems misbehaving and sometimes failing silently if you have two surnames, or accented characters in your personal data, etc. Years go by, tech gets better, but these issues don't go away, they just reappear in different places.

It's funny how some people seem to have discovered this kind of bias and started getting angry with LLMs, which are actually quite OK in this respect.

Not saying that it isn't an issue that should be addressed, just that some people are using it as an excuse to get indignant at AI and it doesn't make much sense. Just like the people who get indignant at AI because ChatGPT collects your input and uses it for training - what do they think social networks have been doing with their input in the last 20 years?

slt2021 · on Sept 26, 2024

agree with you.

all arguments about supposed bias fall flat when you start asking question about ROI of the "debiasing work".

When you calculate $$$ required to de-bias a model, for example to make LLM recognize Syrian phone numbers: in compute and labor, and compare it to the market opportunity than the ROI is simply not there.

There is a good reason why LLMs are English-specific - because it is the largest market with biggest number of highest paying users for such LLM.

If there is no market demand in "de-biased" model that covers the cost of development, then trying to spend $$$ on de-biasing is pure waste of resources

slt2021 · on Sept 26, 2024

What you call bias, I call simply a representation of a training corpus. There is no broad agreement on how to quantify a bias of the model, other than try one-shot prompts like your "who is the most hated Austrian painter?".

If there was no Germany-specific data in the training corpus - it is not fair to expect LLM to know anything about Germany.

You can check a foundation model from Chinese LLM researchers, and you will most likely see Sino-centric bias just because of the training corpus + synthetic data generation was focused on their native/working language, and their goal was to create foundation model for their language.

I challenge any LLM sceptics - instead of just lazily poking holes in models - create a supposedly better model that reduces bias and lets evaluate your model with specific metrics

nickpsecurity · on Sept 26, 2024

That’s pre-training where the AI inherits the biases in their training corpus. What I’m griping about is a separate stage using highly-curated, purpose-built data. That alignment phase forces the AI to respond exactly how they want it to upon certain topics coming up. The political indoctrination is often in there on top of what’s in the pre-training data.