Hacker News

As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software, say computer scientists.

Just stop doing that!

Seriously, testing is not wasted effort. For a very small, simple project it might slow you down, but for any project that's large enough, testing makes you faster. The same goes for documentation. And full source code should be part of every paper.
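To make the "testing makes you faster" point concrete: even a throwaway analysis script can carry a couple of checks against known answers. A minimal sketch (the integrator and its test are hypothetical, just the kind of routine a paper's pipeline might contain):

```python
import math

def simpson_integrate(f, a, b, n=100):
    """Approximate the integral of f over [a, b] with Simpson's rule."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate weights 4 (odd index) and 2 (even index)
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def test_simpson_integrate():
    # Known analytic results act as regression tests:
    assert abs(simpson_integrate(math.sin, 0, math.pi) - 2.0) < 1e-6
    assert abs(simpson_integrate(lambda x: x ** 2, 0, 1) - 1 / 3) < 1e-9

test_simpson_integrate()
```

A test like this takes minutes to write and catches the "quick fix" that silently changes the numbers six months later.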

Many programmers in industry are also trained to annotate their code clearly, so that others can understand its function and easily build on it.

No, you document code primarily so YOU can understand it yourself. Debugging is twice as hard as writing the code in the first place, so if you were only just smart enough to write it, you have no hope of debugging it.
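What "documenting for yourself" buys you is mostly recorded units and assumptions, exactly the things future-you forgets first. A minimal sketch (the function and its domain are made up for illustration):

```python
import math

def decay_constant(half_life_s):
    """Convert a half-life to a decay constant.

    Parameters
    ----------
    half_life_s : float
        Half-life in seconds; must be positive.

    Returns
    -------
    float
        Decay constant (1/seconds), from lambda = ln(2) / t_half.
    """
    if half_life_s <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / half_life_s
```

Without the docstring, "is that in seconds or years?" becomes a debugging session instead of a one-line lookup.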



The point is that since software development is not their main goal or background, their practices tend to be ad-hoc. We know the value of testing and documentation, but they do not. People don't know to stop doing something until they know it's a bad practice. And they're not going to know it's a bad practice until they discover that fact on their own (which can be a slow process) or someone teaches them (faster, but potential cultural problems).

That they should is basically a given in the article. The question is how to make it happen.


The mindset of a scientist is that the code is a one-time thing to achieve a separate goal - data for a paper. The code isn't supposed to last, it's simply a stepping stone. For a lot of folks, whose research areas tend to move around, there isn't always the expectation that you'll get to a 2nd or 3rd paper on the same data.

Now, all of this is different if your research actually is building the model. But I'm speaking from experience on the rest: I've built plenty of software tools that I need "right now" to get a set of data.


It may not be intended to last, but it's still supposed to be correct. And, of course, there are probably gobs of software out there that was not intended to last, yet did.


The thing about scientific code is that it's often a potential dead end. The maintenance phase of the software life cycle is not as assured as it is in industry.

Writing good engineering software is not the scientist's goal so much as demonstrating that someone else with a greater tolerance for tedium (also someone better-paid) could write good engineering software.


Exactly. I'd go further: in industry, the software is typically the end product, and the quality of the software is inherently relevant. In science, the output (the prediction of the simulation, the result of the analysis, etc.) is typically the end product, and the quality of the software is relevant only insofar as it affects the quality of the output.

In practice, of course, the quality of the software often does affect the quality of the output, but time spent on software quality creates less immediate value than it does in industry.


You won't see any clean code written by scientists until (major) journals make it mandatory to submit code for peer review and publication.

When it happens, I hope that they'll manage to agree on a sensible license (even though I won't set my hopes too high).


I have all of my code on github under a CRAPL license [1]. It assumes a certain amount of good faith from others, but I feel that if you're worrying about getting scooped, your problem isn't ambitious enough. Luckily, my adviser agrees, and is very much in favor of open releases of data [2].

[1] http://matt.might.net/articles/crapl/
[2] http://www.michaeleisen.org/blog/?p=440


The terms of the license are good, but its name is literally crappy :-/


That will never happen, because the universities are bigger than the journals and will push back. The universities want to own the code if there's money to be made. Stanford made a small fortune from Google, for example. If journals required code review, other journals would pop up that wouldn't require it.



