
> One of the things we learned very quickly was that having generated source code in the same repository as actual source code was not sustainable.

Keeping the prompts or other commands in a separate repository is fine, but not committing the generated code at all strikes me as questionable at best.



If you can 100% reproduce the same generated code from the same prompts, even 5 years later, given the same versions and everything, then I'd say "Sure, go ahead and don't save the generated code, we can always regenerate it". As someone who spent some time in frontend development: we've been doing it like that for a long time with (MB+) generated code, and keeping it in scm just isn't feasible long-term.

But given this is about LLMs, which people tend to run with temperature > 0, this is unlikely to be true, so I'd really urge anyone to actually store the results (somewhere, maybe not in scm specifically), as otherwise you won't have any record of what the code was in the future.


> If you can 100% reproduce the same generated code from the same prompts, even 5 years later

Reproducible builds with deterministic stacks and local compilers are far from solved. Throwing in LLM randomness just makes for a spicier environment to not commit the generated code.


Temperature > 0 isn’t a problem as long as you can specify/save the random seed and everything else is deterministic. Of course, “as long as” is still a tall order here.
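The parent's point can be sketched with a toy sampler (the logits and function here are made up for illustration, not a real model): temperature > 0 just means sampling from a softened distribution, and sampling from a seeded RNG is reproducible. The "everything else is deterministic" part is what's hard in practice.

```python
import math
import random

def sample_token(logits, temperature, seed):
    """Toy temperature sampling, made reproducible by saving the RNG seed."""
    rng = random.Random(seed)  # the saved seed is the key to reproducibility
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    # rng.choices is deterministic for a given seed and weights
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.5, 0.1]
first = sample_token(logits, temperature=0.8, seed=1234)
second = sample_token(logits, temperature=0.8, seed=1234)
assert first == second  # temperature > 0, yet fully reproducible
```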


My understanding is that the implementation of modern hosted LLMs is nondeterministic even with known seed because the generated results are sensitive to a number of other factors including, but not limited to, other prompts running in the same batch.


Gemini, for example, launched implicit caching on or about 2025-05-08: https://developers.googleblog.com/en/gemini-2-5-models-now-s... :

> Now, when you send a request to one of the Gemini 2.5 models, if the request shares a common prefix as one of previous requests, then it’s eligible for a cache hit. We will dynamically pass cost savings back to you, providing the same 75% token discount.

> In order to increase the chance that your request contains a cache hit, you should keep the content at the beginning of the request the same and add things like a user's question or other additional context that might change from request to request at the end of the prompt.

From https://news.ycombinator.com/item?id=43939774 re: same:

> Does this make it appear that the LLM's responses converge on one answer when actually it's just caching?


Have any of the major hosted LLMs ever shared the temperature parameters that prompts were generated with?


I didn't read it as that - if I understood correctly, generated code must be quarantined very tightly. And inevitably you need to edit/override generated code, and the manner in which you alter it must go through some kind of process, so the alteration is auditable and can again be clearly distinguished from generated code.

Tbh this all sounds very familiar and like classic data management/admin systems for regular businesses. The only difference is that the data is code and the admins are the engineers themselves so the temptation to "just" change things in place is too great. But I suspect it doesn't scale and is hard to manage etc.


I feel like using a compiler is in a sense a code generator where you don't commit the actual output


> I feel like using a compiler is in a sense a code generator where you don't commit the actual output

Compilers are deterministic. Given the same input you always get the same output so there's no reason to store the output. If you don't get the same output we call it a compiler bug!

LLMs do not work this way.

(Aside: am I the only one who feels that the entire AI industry is predicated on replacing only development positions? We're looking at, what, 100bn invested, with almost no reduction in customers' operating costs unless the customer has developers.)


> Compilers are deterministic. Given the same input you always get the same output

Except when they aren't. See for instance https://gcc.gnu.org/onlinedocs/gcc-15.1.0/gcc/Developer-Opti... or the __DATE__/__TIME__ macros.


From the link:

> You can use the -frandom-seed option to produce reproducibly identical object files.

Deterministic.

Also, with regard to __DATE__/__TIME__ macros, those are deterministic, because the current date and time are part of the inputs.


Determinism is predicated on what you consider to be the relevant inputs.

Many compilers are not deterministic when only considering the source files or even the current time. For example, any output produced by iterating over a hash table with pointer keys is likely to depend on ASLR and thus be nondeterministic unless you consider the ASLR randomization to be one of the inputs. Any output that depends on directory iteration order is likely to be consistent on a single computer but vary across computers.
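A small sketch of the same effect in Python (string hashing is randomized per process, conceptually similar to ASLR perturbing pointer-keyed hash tables): iteration order over a set of strings is a hidden input, and pinning the hash seed turns it into an explicit one.

```python
import os
import subprocess
import sys

# Hashing of strings is randomized per interpreter process, so the
# iteration order of this set can differ between otherwise identical runs.
snippet = "print(list({'alpha', 'beta', 'gamma', 'delta'}))"

def run(hash_seed):
    # PYTHONHASHSEED pins the hash randomization, making the hidden
    # input explicit and the output reproducible.
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    out = subprocess.run([sys.executable, "-c", snippet],
                         capture_output=True, text=True, env=env)
    return out.stdout

assert run("0") == run("0")  # same pinned seed -> same iteration order
```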

LLMs aren’t magic. They’re software running on inputs like anything else, which means they’re deterministic if you constrain all the inputs.


LLMs are 100% absolutely not deterministic even if you constrain all of their inputs. This is obvious from even cursory experimentation with any LLM available today. Equating the determinism of a compiler given some source code as input with the determinism of an LLM given some user prompt as input is disingenuous in the extreme!


Most LLM software isn’t deterministic, sure. But LLMs are just doing a bunch of arithmetic. They can be 100% deterministic if you want them to be.


In practice, they definitely are not.


Only because nobody cares to. Just like compilers were not deterministic in practice until reproducible builds started getting attention.


Why does it matter to you if the code generator is deterministic? The code is.

If LLM generation was like a Makefile step, part of your build process, this concern would make a lot of sense. But nobody, anywhere, does that.


> If LLM generation was like a Makefile step, part of your build process, this concern would make a lot of sense. But nobody, anywhere, does that.

Top level comment of this thread, quoting the article:

> Reading through these commits sparked an idea: what if we treated prompts as the actual source code? Imagine version control systems where you commit the prompts used to generate features rather than the resulting implementation.


Ohhhhhhh. Thanks for clearing this up for me. I felt like I was going a little crazy (because, having missed that part of the thread, I sort of was). Appreciated!


LLMs CAN be deterministic. You can control the temperature to get the same output repeatedly.
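In the limit, temperature -> 0 collapses sampling to argmax: with no randomness left in the decode step, the same logits always pick the same token (toy logits only; the serving-side effects discussed upthread can still vary in practice).

```python
def greedy_token(logits):
    # Greedy decoding: always pick the index of the largest logit.
    return max(range(len(logits)), key=logits.__getitem__)

logits = [0.1, 2.3, 1.7]
assert greedy_token(logits) == 1                    # index of the largest logit
assert greedy_token(logits) == greedy_token(logits)  # trivially repeatable
```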

Although I don’t really understand why you’d only want to store prompts…

What if that model is no longer available?


They’re typically not, since they rely on operations that aren’t deterministic (e.g. atomics).
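One concrete reason atomics matter: floating-point addition isn't associative, so a reduction whose order depends on scheduling (which is what atomic adds give you) can change the result from run to run. A minimal sketch:

```python
# Floating-point addition is not associative: summing the same three
# values in a different order gives a different result.
vals = [1e16, 1.0, 1.0]
left_to_right = (vals[0] + vals[1]) + vals[2]  # each 1.0 is absorbed by 1e16
paired_small = vals[0] + (vals[1] + vals[2])   # the 2.0 survives rounding
assert left_to_right != paired_small
```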


Sure, but compilers are arguably deterministic: same code input, same output. LLMs certainly are not.


Yeah, I fully agree (in the other comments here, no less); I just think "I don't commit my code" reflects a specific mindset about what code actually is.



