Supercomputers are exactly the wrong sort of tool to use for this- nearly every ...

godelski · on Aug 13, 2019

I'm confused by this comment, especially since you seem to work with supercomputers. One of the biggest challenges in the ECP (Exascale Computing Project) is disk IO. So much so that they were inventing complicated heterogeneous architectures. In fact, many teams are just skipping disk altogether and using in situ methods, only saving results. I would think that would be needed here.

But the question is: what heavy-processing computer doesn't have IO issues? Also, Blue Waters isn't really a GPU supercomputer like Summit and Sierra are (or the upcoming Aurora and Frontier). It has 4228 nodes (out of ~27000 nodes) that have GPUs on them, each of those has only one GPU, and they are Keplers. Those aren't great GPUs and aren't going to do very well in parallel either. There's a big bottleneck in GPU IO. I think this program will not be utilizing GPUs very heavily. Worse, they don't have many CPUs per node. It's 8-16 cores and 32-64 GB memory per node. There's going to be a lot of time spent in communication.

I'll admit that BW doesn't seem like the best computer for the job, but you use what you've got. I'll buy the argument that this is the wrong computer for the specific job, but what would you use besides a supercomputer? (I think Summit would be a good computer for this job.)

HenryKissinger · on Aug 13, 2019

[flagged]

dekhn · on Aug 13, 2019

I'm a supercomputer expert - that's a part of my job - who works on data processing full time. I know these codes, I know the hardware architecture, and my statement above is technically correct. I'm not missing anything.

throwawaye373 · on Aug 13, 2019

I haven't read the details of the implementation, but I used to be a developer on Google Earth, and you are exactly right that disk IO is the main bottleneck when building large terrain datasets for 3D globes.

HenryKissinger · on Aug 13, 2019

[flagged]

dekhn · on Aug 13, 2019

I direct you to my literature. https://scholar.google.com/citations?user=TFgipkIAAAAJ&hl=en...

You may find especially interesting an article I published that used an embarrassingly parallel computing system I built, which ran on Google's internal infrastructure (not a supercomputer), in response to my codes not running well on supercomputers.

If you wish to have a substantive discussion about my statements (rather than flinging insults), I'd be happy to.