> threads being penalised now for activity from several seconds ago, Exactly... ...

ghusbands · on Dec 26, 2021

I think you're suffering hindsight bias, here. A trace is rarely as clear as that, and it's hard to see the details it's not designed to expose.

Your original message would probably be better received if you'd omitted the "I think this problem would have been debugged and solved much quicker [...]" and its insulting implications and instead started with "Sometimes, I find that CPU activity traces can really help with diagnosing this sort of problem".

The_rationalist · on Dec 26, 2021

Please stop advocating for politeness over correctness. Sure, hindsight helps, but regardless, a company such as Twitter should have tracing experts whose tools and knowledge go beyond the average developer's understanding of tracing methodologies. Excusing that is an appeal to lowering technical excellence worldwide, which is majorly important and matters more than hypothetical feelings.

londons_explore · on Dec 26, 2021

> a company such as Twitter should have experts at tracing

In a big company, getting the person with the most skills to solve a problem to be the one actually tasked with solving the problem is very hard. This particular problem had many avenues to find a solution - and while I think my proposed route would have been quicker, if you aren't aware of those tools or techniques, then other avenues might be much quicker. When starting an investigation like this, you don't know where you're going to end up either - if it turned out that the performance cliff was caused by CPU thermal throttling, it would be hard to see in a scheduling trace - everything would just seem universally slow all of a sudden.

neerajsi · on Dec 26, 2021

On Windows, we have the xperf and WPA toolset, which makes looking at holistic scheduling performance, including processor power management and device I/O, tractable. Even then, the skillset to analyze an issue like the one presented here takes months to acquire, and only a few engineers can do it. We have dedicated teams to do this performance analysis work, and they're always in high demand.

jeffbee · on Dec 26, 2021

I completely agree. KUTrace would have been ideal for this and indeed KUTrace was developed to diagnose this exact problem.

piyh · on Dec 28, 2021

What tools would you use to start going down this route? I'm completely unfamiliar but would like to learn more.