Dependencies Belong in Version Control (forrestthewoods.com)
59 points by crummy on Nov 26, 2023 | 114 comments


>“My background is C++ gamedev”

This is why you think this way. Your proposal is not new. The issue is not in the fact that you don’t have your dependencies, it’s the fact that you are coming from a world that doesn’t/hasn’t had support for it. Every other language has had package managers for this very reason.

Where do you draw the line? OS libs and package manager is ok but it’s not for a developer?

Go learn vcpkg and come back to us when you learn why everyone does it this way.


vcpkg may expire assets after 1.5 years, so to achieve long-term reproducibility you will need to cache your dependencies somewhere. Not sure what the expected solution is.

https://github.com/microsoft/vcpkg/pull/30546#issuecomment-1...


Microsoft needs to fund Vcpkg more, the developer experience (especially installation) still has room for improvement.


You see the sentiment from C++ folks. "Why do I need a package manager?". So until they realize they've been hamstrung for years, it will continue to rot on the vine.


> Every other language has had package managers for this very reason

Nah. Package managers are nice, but they only solve Problem 1 (Usability). If you have any business continuity concerns, you'll at least cache and archive the dependencies, and your package management will effectively become a binary blob extension of your VCS.


OP here.

> Your proposal is not new.

Good thing I never claimed it was? I merely claimed it was "the right thing".

I'm exceedingly familiar with vcpkg. And many other package managers in many other languages. Some good (Rust Cargo) and some bad (Python pip/Conda).

Have you ever worked in a repo that committed all of its dependencies? I have. It's quite nice! Maybe you should have an open mind and come back to us when you've tried the other side.


I don't want to download 3TB of dependencies from git. What is your gripe with vcpkg? What shortcomings does it have? Do you need to keep archives of your dependencies? That's an IT problem. I simply don't agree that it's "The Right Thing". The only thing that is right is making sure you can build, whether that's pulling dependencies from the internet, pulling dependencies from your OS, or pulling dependencies off a USB stick authorized by the government for use in the clean room you must build and install the software in.

I don't want a world where 1 CVE requires 100,000 commits to all the repos of the world. I want one where 1 CVE requires 1 commit to 1 repo and everyone else gets their share.

Bifurcating your software out of the gate is not a sound software development strategy. It's just laziness. If you need to build on 10-year-old hardware, you can check out 10-year-old Linux. You can build for 10-year-old Windows. You can build for 10-year-old Mac (time limiting...). You can build Win 3.1 apps today with vcpkg. Storage is cheap. Maybe not so much on GitHub, but GitHub isn't the only place you can run git. I've worked at organizations that archived and kept every build, every dependency, every configuration, for every version, since the inception of the software project. Nowhere in the repo is all of that information.


> I don't want to download 3TB of dependencies from git.

Oh you didn’t read the post. Got it.


I beg your pardon, but I did read it and your argument is flawed. All of it comes down to the fact that you don't want to spend step 0 configuring your build environment, and I completely agree with that statement. What I don't agree with is you jumping to the conclusion that it's the VCS's fault. It's not. It's C++'s fault. C/C++ needs to be opinionated to keep people on the rails, otherwise you end up in autoconfigure hell. CMake is the most widely adopted build processor; GCC or Clang, don't choose but let the toolchain decide. If you have specific reasons for clang over gcc or whatever, then that responsibility is up to you. Don't go preaching VCS anti-patterns because you can't make -j8 on first clone.

You simply can’t just download a zip of master, with all of the dependencies, disconnect from the internet, and hit build. You don’t have enough disk space. I’d LOVE to see you try doing that on an embedded environment.


We agree that it is good to not force every user to spend time configuring their build environment. That’s a start!

I propose solving this problem by storing the necessary environment components in VCS. This eliminates the vast majority of environment setup issues.

Many many many companies do exactly what I propose. It works great. The primary reason that more projects don’t use this approach, imho, is because Git doesn’t have the necessary features. Other VCS tools do, Git does not. I think it would be good to have a popular VCS tool, Git or otherwise, with this capability.

> You don’t have enough disk space.

Sure I do. Downloading dependencies from a package manager or downloading dependencies from VCS requires the exact same amount of disk space.

The nice thing about VCS is it works for all platforms, for all languages, for all dependencies.


> You simply can’t just download a zip of master, with all of the dependencies, disconnect from the internet, and hit build. You don’t have enough disk space. I’d LOVE to see you try doing that on an embedded environment.

Huh? What does this have to do with anything? It looks like you just moved the goalposts, hard.


> Go learn vcpkg and come back to us when you learn why everyone does it this way.

IMO it's mostly because of a culture of laziness among systems programmers. They like to kick the ball down the road and make dependency management the distro's problem. Too many embedded engineers not being paid enough. When you are building a product you shouldn't be wasting your time on this sort of "make work" that reinvents the wheel multiple times. Come to the machine learning world, and everything, including the kitchen sink and graphics drivers, is bundled in the build, because otherwise your deployment won't work, and you can't afford to pay engineers who cost half a mil a year to spend their time fiddling with deployment binaries.


>"you can't afford to pay engineers that cost half a mil a year to spend their time fiddling with deployment binaries"

Maybe because you're spending half a mil a year on folks doing matrix and vector math? I agree the reasoning is flawed. In order to have a decent ML shop, you need devops. DevOps will make sure your builds work and can be deployed into the architecture of whatever product you're building. Not saying this is a person, but you should be practicing good software engineering first, before trying to AI things. You can grow both skill sets at the same time but one definitely comes before the other. Any company who is just deploying ML models with the kitchen sink and throwing everything over the fence is going to need to brush up on their investigative skills because they will be hacked.


Do you have any links you can recommend for how this is done in machine learning projects?


The short version is to adopt modern tooling (the vcpkg suggestion is an excellent one) and dependency management rather than using OS specific tools (unless you are on Windows). Part of the reason for this mess is because the Unix world operates on an evergreen philosophy and nothing is truly backwards compatible out of the box without manual intervention. The modern web development and machine learning world runs on the opposite doctrine that programmer time is the most expensive commodity above all else; bandwidth is cheap, storage has a negligible cost, and horizontal scaling can sometimes fix compute bound problems. Deployment processes are thus optimized for reliably reproducible builds. Docker is the classic example: bundle literally every dependency possible just to ensure that the build always succeeds, anywhere, anytime. It has its downsides but it is still one of the most widely used deployment methods for a reason.

In the Windows world, you often find desktops with ten different copies of the "Windows C++/.NET redistributable" (the Windows version of the C++/CLR standard library's dynamically loaded artefacts) installed, because each individual app has its own specific dependencies and it's better for them to bundle/specify it rather than rely on the OS to figure out what to load. The JavaScript, Julia, Rust, and Go ecosystems all have first-party support for pulling in driver binaries that may be hundreds of gigabytes in size (because Nvidia is about as cooperative as a 3-year-old child). You don't waste time fiddling with autotools and ./configure and praying that everything will run. Just run `npm install` and most if not all of the popular dependency-heavy libraries will work out of the box.


To further these suggestions: act as if your "installation" is your "deployment" and perform all the necessary checks to ensure your dependencies are there (and are the correct versions) before running. In .NET, this is handled for you mostly by the framework. In Go, everything is mostly compiled together, so you (again) don't have to worry about it. In JavaScript or Python it's assumed that you can npm install or pip install your requirements and that the versions will match. From there, you can treat that as your final build and run it.

As a C++ game developer myself, I make sure that my dependencies are part of my repo as submodules so that I can update/pull and build the version I need to from git tag versions.

So if you are tagging your releases, your final outputs, in your git source tree, then going back to a version from 20 years ago is just as simple as git checkout v0.0.1
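The submodule-plus-tag workflow above can be sketched end to end. This is a minimal, self-contained example using throwaway local repos (all names and paths are illustrative, not from the thread); the `protocol.file.allow` override is only needed because the "upstream" here is a local path rather than a real remote:

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"

# Stand-in for an upstream dependency (normally a remote URL).
git init -q dep
git -C dep -c user.email=a@b -c user.name=a commit -q --allow-empty -m "dep v1"

# The app repo: record the dependency as a submodule, then tag a release.
git init -q app
cd app
git -c protocol.file.allow=always submodule add -q "$tmp/dep" third_party/dep
git -c user.email=a@b -c user.name=a commit -q -m "vendor dep as submodule"
git tag v0.0.1
cd ..

# Years later: clone, check out the old tag, and restore the exact
# dependency commit that release was built against.
git -c protocol.file.allow=always clone -q --recursive app app2
cd app2
git checkout -q v0.0.1
git -c protocol.file.allow=always submodule update --init --recursive
git -C third_party/dep log --oneline
```

The key property is that the submodule pointer is part of the tagged tree, so the old release and its exact dependency revision come back together.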

Vcpkg for C++ dependencies is another option (my preferred if you don't go git submodule route) and ALWAYS USE CMAKE! Don't opt for some crazy build setup, or some internal build tool used by <insert FAANG here> that they force you to use (V8 team, if you're reading this, fix your build pipeline).

KISS. Keep it simple slick. If your package isn't available in the OS package manager, it's time to adopt a package manager or adopt a devops practice that allows you to revert to any version of the code you need (git submodule route).


> Go learn vcpkg and come back to us when you learn why everyone does it this way.

What reason do you have in mind? Can you articulate it in a way that can be quantified/falsified?

The last time I pressed someone on HN about this[1], it quickly devolved into vague handwaving[2] of the same sort—that we all already know why so we don't have to talk about it.

(Asking because I thought I knew, and then one day I realized I couldn't actually explain it, at least not in a way that I could be confident that everyone else would actually agree with. It feels a lot like the hand of Ra[3] is at play.)

I'm curious to hear from anyone who thinks they know why or can perhaps point to where they've already explained it. I'm particularly interested in responses from people who deal with NPM. (What is the "NPM hypothesis", i.e. the value proposition for why having a skeleton source tree in Git that delegates the bulk of the work to `npm install` is objectively better—or subjectively better, even? Can you state it in the form of a falsifiable claim? If you're able to state a testable hypothesis and you have tested it, can you show your work?)

1. <https://news.ycombinator.com/item?id=37605966>

2. <https://news.ycombinator.com/item?id=37607202>

3. <https://srconstantin.wordpress.com/2016/10/20/ra/>


In case you are serious, and I am not super sure about that:

- dependencies include blobs, or other blob-like things (big code bases, think React)

- there is little value in seeing changes in a blob or blob-like things, as changes in those have a bad signal-to-noise ratio. Version information (minor/major) + changelog is much more interesting for the average dev

- there is a performance penalty for having too many things/big things in git

- there is no nice way to keep dependencies up to date in git

- the discoverability of git/GitHub has not been great

You can use git as a package provider, see pip for example, but using git alone is painful.


This is an argument that Git has serious limitation (it does). It's not an argument against the theory of dependencies in VCS.


Here's an argument for not having your dependencies in your repo as hard source. Any CVEs will need to be patched by you, or you'll need to check in a huge diff tree for the latest patched version and then figure out if you've made modifications to your local blob of libidiot that you now have to rectify.

If you want to use git for version control of dependencies, for the love of god use git submodules. It's what they were designed for. If you want to check in a single-file stb_lib, fine, but checking MESA into your source tree is just plain stupid.


Git Submodules are a well known source of pain and suffering.

For CVEs, if you use docker images you need to rebuild and redeploy all images which isn’t particularly different from updating committed code.

If you’ve modified a library you have to merge changes any which way. So not sure what difference it makes.

Checking in GPU drivers probably doesn’t make sense. That’s closer to Docker territory at least. Haven’t run into a case where I’ve wanted/needed to do that so no strong opinion.


Git submodules work flawlessly for me because I use them correctly. So YMMV. There's plenty of documentation out there on how to do a `git clone --recursive` or `git submodule update --init --recursive`.

Never edit the code in the submodules; edit it in the parent repos where it properly lives. Your repo is for your code only. Don't go wandering/editing in those submodules.


Points 1 and 2 are pretty universal, and would make any VCS which includes dependencies basically look like any other package manager today.

Every other existing VCS, btw, to my knowledge has the same limits.


> In case you are serious, and I am not super sure about that.

What's unclear? I linked directly to a comment thread where we had this exact exchange. Yes, I'm being serious. Do you have solid backing for your position or not? Can you state it clearly or not?

"Seriously?" is not an argument. And in this context it is—ironically—a very unserious way to respond. If I had to guess, I'd say it's probably geared more towards being a vehicle for communicating some sort of subtext in the vein of, "You should abstain from asking these sorts of questions because the reasons are so obvious and so well-known that it is a foolish and embarrassing thing to ask, which makes anyone who is asking those questions look foolish and embarrassing."

> dependencys include [...] other blob like things(big code bases, think react) [...] there is a performance penalty for having to many things/big things in git

This is not a self-contained argument. A performance penalty compared to what?

> there is no nice way to keep dependencies up to date in git

Huh? (Alternatively: "How is this relevant?") When you check your dependencies in to the Git repo, you have all the same tools to update your dependencies that you have when you don't check them in.


>What's unclear?

You linking to Ra and going off on other tangents makes you seem very trollish/edgelord-like.

>This is not a self-contained argument. A performance penalty compared to what?

Compared to not checking them in? I don't understand what you are trying to say here.

>Git repo, you have all the same tools to update

So I still need a package manager of my choice & a dependency file to install & update dependencies?

That also makes it easy to modify single source files which then only get overwritten on a dependency update, as you have no clear separation between source code and dependencies.


> You linking to Ra and going on other tangends make you seem very trollish/edgelord like.

Huh? Did you read the essay? There's nothing trollish or edgy about it. It describes exactly the type of response I got in the other thread (and now from you here).

> Compared to not checking them in?

I don't think you're thinking clearly. You can opt not to check your dependencies in—there's certainly less overhead involved when you don't have your dependencies around compared to having them checked into your repo. But the question then becomes, "How are you going to get your app to build/run if you don't have your dependencies?" We are not talking about building projects with dependencies versus projects that don't have dependencies. Same project, same dependencies. You need these dependencies. That's why they're dependencies.

When you say there are performance problems when you have a big Git repo, what are you actually comparing that _to_?

> I don't understand what you are trying to say here.

I haven't strayed from my goal. I am looking for someone to provide a clear answer to the question, "What is the NPM hypothesis?" In other words, state clearly in quantitative terms how it is that keeping e.g. 12% of your source code in Git and leaving the other 88% (or whatever) to be fetched after you clone the repo and run `npm install` is better than keeping 100% of the source code, dependencies included, in Git?

I'll give you an example by what I mean: if I see that you're trying to email someone some artwork and it's a bunch of Windows BMP files or raw TIFFs, I could say something like, "If you save these as PNGs and JPEGs, they will take up less space on disk and will be quicker to download." We can test this. We can figure out if it's true or not. We have a hypothesis that is falsifiable. This is the basis of science.

We're doing computer science. We are not asking people their favorite cheeses. We are not even asking them whether static types are better than dynamic types or if functional programming is good and object-oriented is bad. We are talking about version control systems. There are beliefs involved here, but they are not mere beliefs—I'd guess that you have some sort of belief about what makes NPM + Git the "right" way to do things and just checking the dependencies into Git the "wrong" way that can be stated in the form of one or more fact claims that we can scrutinize, do some measurements, work out the math, etc. and figure out whether those claims are true or untrue. I'm asking you to state clearly what your claims are. What are they?

> So I still need a package manager of my choice & a dependency file to install & update dependencys?

I don't understand this question. Is it a question? It looks like a declarative statement followed by a question mark instead of a period.


> Huh? Did you read the essay?

Did I read a multi-page, completely irrelevant metaphor... No, I did not.

> I haven't strayed from my goal...

Irrelevant tangent

> How are you going to get your app to build/run?

Already mentioned: lockfiles.

>We have a hypothesis that is falsifiable.

Already delivered multiple:

You can test it... Git gets slow when you check in too many changes, esp. changed blobs.

Diffs on blobs are unreadable / the version change alone is more readable.

Separation makes it more difficult to change your dependencies, which is good because you can update easily / will prefer to use public interfaces. (This one might need a study to verify, but based on experience... forks are a ton of work.)

> I don't understand this question.

It's an assumption with an ask for clarification. I would expect the answer to be yes, or no because...


I'm sorry, but I'm not going to continue the discussion this way. It's not productive.


Sure, if you want.

Would have really liked an explanation why:

"Git gets slow when you check in too many changes, esp. changed blobs."

is not provable, but oh well.


The value proposition of only checking in your code is that the more dependencies you add to your source, the more you have to keep it all up to date. As your boss, I pay you for features/value, not to maintain dependencies. I get that Mozilla stuff is cool and that you want to use React or Next or whatever, but I won't pay you to work on that stuff. I'll pay you to work on <SaaS-Of-Tomorrow>. Also, I want to have smaller deployments because our cloud bill is enormous. So make sure any dependency we are using is using the same versions and that we only include 1 of them at build time. Vendoring or including your version of structlog (then npm installing someone else's version of structlog, and so on, and so on) will end up with 1024 instances of the structlog package. Totaling 400MB. For a ~3MB library.

We use package managers to explicitly say "These things are required to build and run but don't belong to me, my project, or even my organization". You don't want to write Yet Another Markup Language ;) or write another JSX. So you use a package. If you include that package into your source code, and that package is updated, and other dependencies or even - users of your package, will end up with version mismatch and will come with pitchforks and torches to your door.


Thanks for engaging. Unfortunately this is still a little bit too handwavy and vague.

I think it would help to flesh out your 3MB/400MB example. Something but not quite necessarily conforming to a set of steps to reproduce would be really reassuring in the, "Look, really, there's nothing up my sleeve" department.

Again, what I'm after is a hypothesis that can be clearly stated and subsequently proven or disproven. We're trying to do science here[1].

> the more dependencies you add to your source, the more you have to keep it all up to date

There's something unspoken here that you're leaving out, because that doesn't follow. (Or, to put it another way, "Compared to what?")

> We use package managers to explicitly say "These things are required to build and run but don't belong to me, my project, or even my organization"

I know what you're getting at, but that is neither explicit nor is being able to "say" that the reason why. (If that were the true reason why, a file that simply says exactly that would suffice. And indeed, continuing to use your existing package.json as-is is not something that's precluded by the commit-all-your-dependencies approach.)

1. <https://news.ycombinator.com/item?id=38427548>


If you want to store your whole dependency tree in git, be my guest, all 3rd party packages vendor into your repo. After a few months you’ll be spending more time keeping them all up to date than building features. This is something measurable. Something falsifiable. Something that has bit us in the past and is the reason why package managers exist.


> This is something measurable. Something falsifiable.

Well, not quite. We're sort of getting there—much closer than before, at least. We were also kind of getting there, though, with your 3MB/400MB example, too. But you abandoned it. Why? A terse off-the-cuff, still-too-unclear comment like "you'll be spending more time keeping them all up to date" isn't exactly a rigorously specified experiment; again, this is way too vague. I don't understand why this is so excruciatingly difficult, esp. in the defense of a practice that is purportedly well-founded and presumably well-understood. It should be as simple as this:

"Hello, I see that you have a project and up 'til now have been storing everything in Git. Try doing this instead: [...] What you will observe is that compared to having everything managed as first-class objects in your version control system, this package management discipline affords you $X and $Y and quantitatively it results in $Z [&c...]"

It is your job to fill in the blanks. What _exactly_ are you comparing? What _exactly_ are the benefits, in quantitative terms? What are we going to see when we subject these claims to scrutiny? (And what _exactly_ should we do in order to subject these claims to scrutiny? What is the experiment?) In other words, please clearly state the hypothesis. It needs to be testable.

Being vague instead provides way too much cover for bad, not-well-founded ideas.

(If there's an argument that going to such lengths to state these things is needlessly tedious, there are a few things to take into account. 1. in science, you don't get to just wave your hands and skip over crucial stuff, so if you're attacking this, you're attacking the entire scientific process; 2. programmers are (or should be, at least) particularly adept at this level of tedium—programming a computer is like explaining things to a really, really dumb person by saying exactly what you mean, and providing testcases and/or written steps to reproduce an issue just an ordinary day (or should be, at least); 3. programmers are responsible for tons of bugs over their careers, most of them probably not released to the public because they've been spotted and debugged on the workbench (before any in-progress work is released into the wild) and in doing so are confronted with a bunch of misapprehensions during debugging sessions about what the code should be doing versus what it is actually doing; 4. for all the words written and energy otherwise spent in these nearly 100 comments, we could have been there by now, and ditto even if we narrowly focus only on this subthread.)

And to the extent that I can make sense of what you are saying ("you’ll be spending more time keeping them all up to date"), it looks like you might actually (maybe unintentionally) be doing a bait-and-switch here.

If today your workflow consists of:

1. git clone; 2. running your $PKGMGR's install step; 3. running `$PKGMGR update` from time to time + bumping the version strings in the JSON and/or lockfile

... then the suggestion to keep your dependencies in version control doesn't really change wrt updates—keeping the packages up to date remains as easy as ever (you just... run `$PKGMGR update` like before), the only difference being that it does change your workflow slightly: step 2 is no longer required, and step 3 necessarily involves an eventual push where you're not only pushing a change to a file of metadata that contains the version strings of your dependencies (you're pushing the contents of the dependencies themselves).
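That workflow delta can be shown concretely. A toy sketch (the `node_modules/leftpad` directory below is a made-up stand-in for real package-manager output): the update commands stay whatever your package manager provides; the only change is that the fetched tree is committed instead of ignored:

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q .

# Stand-in for the output of `$PKGMGR install` (names are illustrative).
mkdir -p node_modules/leftpad
echo "1.0.0" > node_modules/leftpad/VERSION

# Step 2 ("install after clone") goes away: the tree itself is committed.
git add node_modules
git -c user.email=a@b -c user.name=a commit -q -m "vendor dependencies"

# Step 3 is unchanged: run the update, then commit the contents together
# with the metadata instead of just a lockfile bump.
echo "1.1.0" > node_modules/leftpad/VERSION   # stand-in for `$PKGMGR update`
git add node_modules
git -c user.email=a@b -c user.name=a commit -q -m "bump leftpad to 1.1.0"
git log --oneline
```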

It sounds like you're saying that if you keep your dependencies in version control, you necessarily have to stop using automation for your updates, which is where the bait-and-switch comes in: you're now talking about a different argument entirely. We're talking about keeping a copy of the packages right there in the version control system—the one that you're already using—instead of having it half in the bag and delegating the rest to package infrastructure associated with (maybe even operated by) the package manager you're using. Am I misreading you? Are you actually saying something else—that the simple process of syncing the updates to your cloud-hosted repo (pushing those changes) is slow to complete?

(I also can't help but notice that in your other comments, you're actually strenuously arguing for the use of Git submodules—you don't actually disagree with the proposition that dependencies belong in version control... so what are we even doing here...?)


Yeah vcpkg.json is the appropriate solution. That plus one git commit hash pins and verifies everything. Make port overlays if you need em.
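For reference, a manifest along these lines (the package names, versions, and baseline hash below are placeholders, not taken from the thread): `builtin-baseline` pins the vcpkg registry to one commit, and `overrides` forces exact versions where needed:

```json
{
  "name": "my-app",
  "version": "0.1.0",
  "dependencies": [
    "fmt",
    { "name": "zlib", "version>=": "1.2.13" }
  ],
  "overrides": [
    { "name": "fmt", "version": "10.1.1" }
  ],
  "builtin-baseline": "0123456789abcdef0123456789abcdef01234567"
}
```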

And you really shouldn't get married to one particular version of a compiler toolchain unless you know for sure that something is going to break if you update. That just leads to a lot of annoyed programmers stuck using ancient tools for no reason.


> The issue is not in the fact that you don’t have your dependencies, it’s the fact that you are coming from a world that doesn’t/hasn’t had support for it.

Exactly. With Rust you can commit your Cargo.lock file and you will then be able to rebuild your project with the exact same version of all your deps in the future. No need to commit the deps themselves.


And cargo-vendor is available out of the box as well (although it doesn't go quite so far as to add the entire Rust compiler toolchain to your repo): https://doc.rust-lang.org/cargo/commands/cargo-vendor.html
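For context, `cargo vendor` copies every crate source into a local `vendor/` directory and prints a config snippet like the one below; committing both `vendor/` and this snippet (conventionally in `.cargo/config.toml`) lets `cargo build --offline` work from the repo alone:

```toml
[source.crates-io]
replace-with = "vendored-sources"

[source.vendored-sources]
directory = "vendor"
```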


Exactly. In case you need to transport your dependencies and build air-gapped (happens), or you need to make some modifications (keep them as patches please, to be re-applied). Not so you can check them all into your repo. You should .gitignore your vendor folder.


Of course, just rewrite everything in Rust! It's so easy!


That's not what was said. What was said was "This is how rust does it, take note".


Nix (https://nix.dev/) can provide all of this, although in a smarter way than just through dumping everything in the VCS. Some projects use it already to provide a reproducible development environment and if done right a clean build is just a `nix-build` away.


It does not provide "all of this" and would just make the experience worse for everyone.


What is it missing? Using nix with nixpkgs you can pin all dependencies to a specific version identified by a hash value of that dependencies source code and dependencies, recursively building a dependency graph down to nixpkgs' bootstrap toolchains. That is deeper down than you could reasonably go by dumping everything in the VCS. Building becomes a single command that fetches everything it needs. Even the offline requirement is satisfiable if you prefetch all dependencies, it is just not happening by default (and for good reason, you might already have the toolchain from another project or some other dependencies are already on your system).
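A minimal sketch of what that pinning looks like in practice (the revision and hash are placeholders you would fill in for a real pin): a `default.nix` that locks nixpkgs to one commit and provides a shell with the toolchain:

```nix
# Pin nixpkgs to an exact revision so every build sees the same toolchain.
# <rev> and <sha256> are placeholders for a real commit and its hash.
let
  pkgs = import (builtins.fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/<rev>.tar.gz";
    sha256 = "<sha256>";
  }) { };
in
pkgs.mkShell {
  # Everything a contributor needs, fetched by hash, nothing from the host.
  packages = [ pkgs.cmake pkgs.gcc ];
}
```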


Honestly surprised the article didn't mention Nix or Guix. Seems like functional package management solves the exact problems the author is worried about.


The article starts with a gamedev disclaimer. Most gamedev folks would rather die on that Microsoft hill than use another OS.


Seems like all we would need on the VCS side is something like Git LFS but with global chunk-based deduplication, binary file patching algorithms, and serving the actual files over BitTorrent.

The last part is essential, because GitHub LFS is too expensive for anyone to just try out on their own.

But then, on the dev tool side, we would need automated ways to get all that stuff in the repo in the first place, and make sure that the IDE linting and autocomplete knows about it.

I used to put my python dependencies in a thirdparty folder, and have a line that alters the sys.path to point at it.

I just spent a week getting rid of all that and using PyPI, because it didn't play nice with linters, I couldn't swap different versions of dependencies with virtualenv, updating required manual work, and there was no management to make sure there weren't subtle version conflicts.

I like the idea of deps in VCS, but not as much as I like using tools as the devs intended and sticking to the official workflow everyone else uses.


This is horrible. Just use vcpkg or any number of other c++ package managers. Pypi exists for a reason. Maven and gradle exist for a reason. Nuget exists for a reason. NPM/Yarn as well.

Storing your dependencies with your code ensures you will be out of date, vulnerable to whatever vulnerabilities have been patched since then, and that your build will produce a different hash, so Windows Defender will do a full binary scan on you every time. Not to mention an all-hands-on-deck weekend holiday to upgrade.


> Storing your dependencies with your code ensures you will be out of date, vulnerable to whatever vulnerabilities have been patched since then

I missed the part where having a copy of the dependencies you last built with means that you are required to never ever ever make use of any bugfixes that have come out since the first time those dependencies were committed.


If you don't have automated management, you will probably not regularly check for updates on every one of dozens of dependencies manually.


Sorry, what? I genuinely can't understand what you're saying/talking about.


There's a John Cleese quote that comes to mind. What they are saying is: if your dependencies are manually checked in, are you going to be on top of keeping them up to date for all the sev-1 CVEs that are found? Patches that are backported? Bugs that are fixed? And keep them all in sync? If your answer is yes to any of these, go look up the John Cleese quote. That is, unless you have an army of devs and maintainers to do it and get paid for it (hi Red Hat!).


> if your dependencies are manually checked in, are you going to be on top of keeping them up to date for all the sev-1 CVEs that are found? Patches that are backported? Bugs that are fixed? And keep them all in sync?

Compared. To. What? (Alternatively: "uh... yes?")

This is something like the third time that I've had to confront a comment alluding to the idea that having dependencies checked in to your repo means that updating them becomes, through some unspecified means, extremely difficult. And not just difficult, but intractably difficult. How exactly? Who knows—I've asked, but all I get are the same sort of continual allusions, as if it's some foregone conclusion on which there is common agreement, or it's a self-evident truth or something, but the actual thought process behind the remarks remains as impenetrable as the first time it was said. Please show your work. Please.

What precisely is the mechanism by which this is supposed to happen and that forms the basis for your position? What two things precisely are you comparing to one another? Be specific. Don't be vague.

This conversation shouldn't be this exasperatingly difficult to have.


The precise mechanism is that if your update mechanism involves a manual step per dependency (you have manually copied and pasted things into your repo), then the time needed to check for updates will scale with the number of dependencies.

Manually checking all of them just to see what has updates available will take at least 10 to 20 minutes searching one by one, and possibly several hours, even at 30 seconds per manual check.

The odds are high that laziness or tight schedules will take over and nobody ever actually does this.

If one does actually do this, then they will be wasting 20 minutes regularly.


> The precise mechanism is that if your update mechanism involves a manual step per-dependency(You have manually copied and pasted things into your repo), then the time needed to check for updates will scale with the number of dependencies.

I don't know what you're referring to. The resolution up for debate is "Dependencies belong in version control". You seem to be having a totally different conversation (where you "manually copied and pasted things into your repo").

What "Dependencies belong in version control" means is:

- DO NOT add them to .gitignore.

- DO `git add` them just like any other code.

Updating things doesn't change; you update the packages that you depend on the same way you do if you aren't checking your dependencies in—e.g. by running `npm update` (or whatever).
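Concretely, that workflow is nothing more than treating the dependency directory as ordinary tracked files. A sketch using a throwaway repo and a stand-in `node_modules` (package name and contents are illustrative; normally `npm install` would populate the directory):

```shell
# Create a throwaway repo to demonstrate vendoring dependencies.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name Dev

# Stand-in for what `npm install` would fetch.
mkdir -p node_modules/leftpad
echo 'module.exports = s => s;' > node_modules/leftpad/index.js

# The whole point: node_modules is NOT in .gitignore; just track it.
git add node_modules
git commit -q -m "Vendor dependencies"
git ls-files node_modules
```

Updating later stays the usual `npm update`, followed by `git add` and a commit, same as any other change.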


True, if the language tooling supports that kind of workflow well and you do it right, then it is indeed a different case.

Things like this SO thread seem to suggest that manually copying dependencies is a somewhat common (and bad) interpretation of the concept, though:

https://softwareengineering.stackexchange.com/questions/3724...


You don't need special language tooling support. `git add` works regardless of which language you're using; we're shedding excessive tooling here (read: disentangling it from places it shouldn't be).

"Why prefer a package manager over a library folder?" is a false dichotomy. Package managers are still responsible for managing packages—when your dependencies are checked into the repo, the package manager just operates on the packages that are already on disk instead of fetching them in a separate step and/or at the very last minute and being intertwined with the build process.


[flagged]


Whoa - you can't attack another user like that, no matter how wrong they are or you feel they are. We ban accounts that do this. Moreover we've had to warn you at least once before: https://news.ycombinator.com/item?id=25772871.


It worked well enough for many years, but was really a time drain and in fact quite horrible.

I still have some JS code stashed right in the repo, but I think that should be fixed as part of a more comprehensive npm-ification.

There's lots of pretty much vanilla HTML/JS Mako templates, with a few one-page Vue applets, maybe it should all be ported to SvelteKit or just a proper Vue project, so there's a proper framework managing everything.

I didn't know C++ had general-purpose package management now! I've only ever used C++ for embedded, where we've got PlatformIO (which I love).


>"It worked well for many years"

Those years when the web was in its infancy? Or the years prior to that, of the PDP days? Because since the WWW, the industry has seen some serious growth. Not only in profits (yey) but in people (oh no!) and having to manage it all (what to do when there are two "pch.h"s?). The problem reared its head in Linux's early days when trying to amalgamate everyone's patches and code into the monolith that is KERNEL. This is why git exists. Lessons learned from falling flat on our faces and having no solution but to come up with a solution (our best talent!).

However, we have amnesia. We forget what we did, let alone why we did it. So we are repeating ourselves, only this time it's amplified by Web 2.0 (3.0?), GitHub, OSS is cool, OSS is how to go from garage to Meta. So why do we balk at the idea of applying the same management to our source code repositories? Why must we have all the things in our repo to look at? Why can't we trust the ABI? There are legit reasons for these questions; however, saying "check everything into your repo, balloons and all" is not the way.


Trusting the ABI is not a thing when you don't have an ABI (because you're not using a compiled language) and any or all of your dependencies could make breaking changes (either accidentally or just for fun, because devs like to tweak function names) at any moment.

Package management is great (for development; for release and distribution I greatly prefer static linking, or things like snap packages or APKs that trust the OS and bring everything else included), but only because you can pin stuff.

Pinning is basically the same concept as checking everything in, except the repo doesn't physically hold the data, and you can update with automated tools.


> Source code, binary assets, third-party libraries, and even compiler toolchains. Everything.

How far down the stack of turtles do you go, though?

Should you include the libc that the compiler requires to run? Other parts of the operating system? The kernel? An emulator for hardware that can run that kernel?

Eventually, all of those things will stop being produced or easy to find. Even if you have the libraries and compiler in version control, can you build a game that ran on MS-DOS 5.x or CP/M or DEC VMS?

My point is that you may want to just designate a stable interface somewhere (a language standard, some libc ABI, etc.) as the part you expect to not change. Be aware of what it is, and account for it in your plans. If a language is the interface that you expect to be stable, then don't upgrade to a compiler version that breaks compatibility. Or do upgrade, but do it in an orderly manner in conjunction with porting your code over.

If you want your code to be runnable by our descendants 1000 years from now, you should probably have an incredibly simple VM that can be described in a picture and a binary that will run on that VM. (In other words, you go down so many turtles that you reach "anyone who knows how to code or build machines can implement this VM".)


VMs have a reasonably standard interface, as do container images, so kinda either could work as your ‘everything’.

Alternatively, just make a system image of the entire PC, set up exactly how you want it, with some common hardware that you can expect to be available in 10 years, like a standard Intel CPU.


Everything! Check in your OS's system files! Build a new PC with parts identical to the one you're building on! Build many of them, one for each commit, and put them all in a warehouse!


I mean, if you're the military, this might be the right answer. If you're making a personal web site, probably not.


I've been thinking for a while about how fragile computers are, from the software down to the hardware. I've been wondering if maybe it would be better to keep some things on paper, but paper doesn't last long either, and if it does, then you have the ink to think about. This drives me crazy sometimes.


Computers are fragile, but they are copyable. Quite easily and exactly copyable.

Consider how books written 1000 years ago still survive today. They were diligently copied. Now compare the effort (and accuracy) of such book copying to computer copying. Next, consider the error correction and error detection that is trivially built into digital data.

Nothing physical is permanent, but with just a little upkeep, information can live indefinitely.


This is a strawman. You're extending their argument to an extreme so it sounds silly when it's not what has been proposed.

The argument is clearly to keep direct dependencies required for building in source control so that if you have a working build system, you can build the software indefinitely and independently from the internet.

Build systems and operating systems don't disappear overnight. Leftpad does.


Well, I wasn't trying to mock them or disagree, although I can see how it might have sounded that way.

What I am trying to do is point out that you can't just say "everything" without defining what that means and knowing why. Maybe you need less or maybe you need more. IMHO, you should do it by looking for places that might be stable and then (consciously) choosing one that meets your needs.

I've actually been in a situation where our source, all the libraries in our build, and the toolchain were not enough. It turned out that one of the libraries we used was itself dependent on another library, a certain dynamic library (which I'll call libfoo.so) included in the Linux install. We got new hardware, but it would not boot the same version of that Linux distro, probably due to lack of drivers. It would boot a newer version of the same Linux distro, but that included libfoo-v2.so, not libfoo.so. Since libfoo-v2.so was a different major version with breaking API changes, our project wouldn't compile.


This is both right and wrong:

If you're shipping shrink-wrap product, or equivalent, then you should freeze everything in carbonite so you can later recreate the thing you released. The article is written as if this is a novel idea but it isn't. Decades ago when I worked on products shipped on CD it was standard procedure to archive the entire build machine to tape and put that in a fire safe. In fact I subsequently (decades later) worked for law firms on patent cases where they were able to get those backup tapes to prove something about the content of software shipped on a specific date.

otoh for the typical present-day software project you don't want to re-create an identical build result as someone else got six months ago. For example if it's a JavaScript project then Node is going to be two versions out of date and probably full of security bugs from that time. So you actually want "code that behaves as expected but built with current dependencies and toolchain". Admittedly experience shows that for some languages this is an unreasonable expectation. Some level of ongoing maintenance to the codebase is often required just to keep it building with security-fixed dependencies.


> otoh for the typical present-day software project you don't want to re-create an identical build result as someone else got six months ago

According to whom? I absolutely want to be able to do this.

> if it's a JavaScript project then Node is going to be two versions out of date and probably full of security bugs from that time

Sounds like building on something as unstable as Node is a significant drawback. That's a Node problem. (I missed how we got to talking about Node, anyway—you said "JavaScript project". NodeJS does not have a monopoly on JS. It's not even a great example of a particularly good JS runtime; for the reasons you mention—instability—it's actually a pretty good example of a bad JS runtime.) Fortunately if you're doing JS, there's an incredibly stable set of interfaces available for you to program against, and the best part is you don't need to download anything extra because every computer already has a JS runtime installed.


If you're concerned about bloating your Git repository with non-unique binary files (as you should be), a trick I've used in the past that worked really well was to have a separate project-dependencies Git repository that all of my dependencies lived in.

I was working with Python, so this was effectively a Git-backed cache of the various .tar.gz and .whl files that my project depended on.

This worked, and it gave me reassurance that I'd still be able to build my project if I couldn't download packages from PyPI... but to be honest these days I don't bother, because PyPI has proven itself robust enough that it doesn't feel worth the extra effort.

I keep meaning to set myself up a simple S3 bucket somewhere with my PyPI dependencies mirrored there, just in case I ever need to deploy a fix in a hurry while PyPI is having an outage.
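A minimal version of that Git- or S3-backed cache, assuming pip and a requirements file (the empty file here is just a stand-in for your real pinned requirements; directory names are illustrative):

```shell
# Download sdists/wheels into a directory you control (committed to git,
# mirrored to S3, whatever), then install from it with --no-index so pip
# never contacts PyPI.
set -e
mkdir -p thirdparty
touch requirements.txt   # stand-in; normally lists your pinned dependencies
python3 -m pip download --dest thirdparty -r requirements.txt
python3 -m pip install --no-index --find-links thirdparty -r requirements.txt
```

The second command works entirely offline once `thirdparty/` is populated, which is the whole point of the mirror.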


This top-down, prescriptive suggestion is wrong. The truth is that it depends on project construction, language, build toolchain, operating system support, and libraries.

Python projects are a particular hell as the multiple attempts to solve dependencies didn't capture transitive dependencies well. Python also builds against native dynamically linked libraries, which introduces additional hell. But that's Python.

The author is trying to use ML projects on Windows, and he probably hasn't realized that academics authoring this code aren't familiar with best practices, aren't writing multi-platform support, and aren't properly and hermetically packaging their code.

To compare Python to a few other languages:

- Rust projects are almost always incredibly hermetic. You really only need to vendor something in if you want to make changes to it.

- Modern JavaScript is much better than the wild west of ten years ago. Modern tools capture repeatable builds.

Don't go checking in built binaries unless you understand that your particular set of software needs it. Know the right tools for the job.


Isn't this the same kind of attitude as around containers? "I don't want to think about or document the dependencies," so let's just throw it all in a container full of crap that no one fully understands, and it will "just work" because it worked for someone once before.

One of the things that I find very useful is to start from a base install of a particular OS and then be very meticulous about documenting each package I need to install to get software to build. You can even put this into the documentation and automate checking the dependencies are there with the system package manager. The dependencies and how you check them will be different across different distros and versions but at least you had an understanding at one point to work from if you need to figure it out going forward.
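That automated check could be as small as the following sketch (package names are illustrative, Debian/Ubuntu-style; on a non-dpkg system this simply reports everything as missing rather than erroring out):

```shell
# Verify that each documented build dependency is installed,
# reporting anything that is missing.
for pkg in build-essential libssl-dev pkg-config; do
    if ! dpkg -s "$pkg" >/dev/null 2>&1; then
        echo "missing: $pkg"
    fi
done
```

The same idea ports to `rpm -q` on Fedora or `pacman -Qi` on Arch, with a different package list per distro, as described above.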


> lets just throw it all in a container full of crap that no one fully understands

I mean, my usage of containers is very tightly controlled. I prefer to start with the scratch container in most cases. If I'm using software that needs more, then I'm very deliberate in what packages make it into the build container and the deploy container.

So, maybe some just toss in the kitchen sink, others are deliberate about build and runtime deps.


Putting things into containers is a repeatable recipe.

One would hope that everything you put in is also built from a repeatable recipe.

The point is that you can reuse the container - that environment - and know that it was always built the same as any other copy. Because, especially with the way some people use package managers, those "repeatable" recipes are not always repeatable.


"Security" would be a useful benefit/section to add to this post:

A.) If maintainers of your dependencies edited an existing/previous version, or

B.) If your dependencies did not pin their dependencies.

For instance, if you installed vue-cli in May of last year from NPM with --prefer-offline (using the cache / basically the same as checking in your node_modules), you were fine. But because vue-cli doesn't pin its dependencies ("node-ipc"), installing fresh/online would create WITH-LOVE-FROM-AMERICA.txt on your desktop [1], which was at the very least a scare, but for some, incredibly problematic.

[1] https://github.com/vuejs/vue-cli/issues/7054
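One common mitigation for an unpinned transitive dependency like this (not something the comment above proposes, just a standard option) is npm's `overrides` field (npm 8+); the version shown is illustrative:

```json
{
  "overrides": {
    "node-ipc": "9.2.1"
  }
}
```

Combined with a committed package-lock.json, this keeps the resolved version fixed even when an intermediate package like vue-cli declares a loose range. Yarn has the analogous `resolutions` field.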


I don't think you need to go quite so far as checking gigabytes of executables into version control. If you download some dependencies at build time, that's fine as long as you know exactly what they are ahead of time. "Exactly what they are" means a hash, not a name and version tag.

The dockerized build approach is actually a good strategy; unfortunately, in practice it's done by image name instead of image hash.

Upgrading dependencies, or otherwise resolving a name and version to a hash is a pre-source task, not a from-source task. Maybe it can be automated, and a bot can generate pull requests to bump the versions, but that happens as a proposed change to the source, not in a from-source task like build, test, publish, or deploy.


You're better off just taking your build artifacts and shoving them into some artifact repo, which is an already solved problem. If you want to ship all your runtime deps, use an OCI container image, or something like Flatpak which allows shipping bundled libraries. Those are the artifacts which should flow through your CI/CD pipeline (which may be as simple as just an automated test stage before manual deployment, if you don't like those abbreviations I just used). Beyond that there's not really much utility to being able to rebuild exactly the same thing, because that's what you already shipped. The problem becomes pulling in new stuff and bumping the build, in which case you always have the cost of integration testing all the new code together. I do fully support Gemfile.lock/Cargo.lock/flake.lock/etc. files being checked into git, and the build process should be bumping those and building and shipping the artifacts off to a repo -- and that's good enough without actually checking all the blobs into git.

And the reason why git is bad at LFS and binary blob / artifact support is likely that everyone has adopted this method and so making git better for the use case isn't a very high priority.


> You're better off just taking your build artifacts and shoving them into some artifact repo

Better off how? What does "better" refer to here? Size on disk? Data transfer? Time? CPU cycles? Brain cycles? Body health? Please be specific.

> there's not really much utility to being able to rebuild exactly the same thing, because that's what you already shipped

This strikes me as profoundly myopic (maybe even willfully). The unspoken benefit here is related to a general understanding that it's good to be able to establish a base case showing that you can build the same thing in the first place, so that when you make slight changes to any part of it, you end up with exactly the same thing as before plus your changes, rather than being the casualty of hundreds or thousands of other uncontrolled inputs that have nothing to do with what you changed. You're casting aspersions on and summarily dismissing the basic tenet of having a control[1].

> The problem becomes pulling in new stuff and bumping the build, in which case you always have the cost of integration testing all the new code together.

Just to emphasize: you are correct to say that this is a cost you always have, whether or not you're checking the results of package installation into your repo. Not checking them in doesn't ipso facto make any issues related to upgrades go away, and checking them in doesn't ipso facto increase the amount of integration work you have to do. There's nothing magical about pulling the dependencies in over the network at the extreme last moment versus reading them from some other place on your own disk.

1. <https://en.wikipedia.org/wiki/Scientific_control>


> Better off how? What does "better" refer to here? Size on disk? Data transfer? Time? CPU cycles? Brain cycles? Body health? Please be specific.

Getting shit done. Git doesn't support checking blobs into git very well, so you're pissing into the wind trying to do that because of philosophical opinions. It won't help you deliver your business objectives at the end of the day/week/month/year.

> This strikes me as profoundly myopic (maybe even willfully).

Based on a decade of experience, you do not need that incredibly strict level of control over reproducibility.

The bigger problem is just that your dependencies will change with new versions and you need to pull in those versions and deal with the tech debt of integration. Even if you sit down and perfectly and accurately solve the problem of exact reproducibility of yesterday's builds, you will always have the problem of bumping versions and dealing with tomorrow's builds. Dealing with the constant incoming flow of technical debt starts to dwarf any small problems caused by slightly inconsistent "rebuildability", while the cost of pursuing the perfect ability to rebuild starts to climb. What you're dealing with is a multi-objective optimization problem[1] where the cost function is the finite resource constraints available to you, where the two problems cannot be solved 100% under those resource constraints, and where overall velocity suffers if the costs become too great; so you are looking for something close to a Pareto-optimal[2] tradeoff between the different objectives rather than strict perfection along any one axis.

But yeah, keeping things pretty consistent is useful for dealing with tomorrow's builds as well, that is why I suggested all the different lockfile approaches. The optimization problem is nonlinear and up to a point solving yesterday's build problem does help tomorrow's build problem as well.

> Just to emphasize: you are correct to say that this is a cost you always have, whether or not you're checking the results of package installation into your repo. Not checking them in doesn't ipso facto cause any issues related to upgrades to go away, and checking them in doesn't ipso facto increase the number of integration work you have to do. There's nothing magical about pulling the dependencies in over the network at the extreme last moment versus reading them from some other place on your own disk.

The problem is inherently in the bumping of versions, and pulling in new code from upstream and dealing with emerging tech debt and integration problem. Once you have builds reasonably reproducible (lockfiles, etc) then just dealing with all the new code you're pulling in dwarfs the benefits of iterating on _perfect_ reproducibility of yesterday's builds.

And in fact if you are checking your artifacts into git _and then not bumping them_ you are just kicking the can down the road and hiding your future tech debt. Yeah, you can try to freeze time and build on 8 year old versions of your build tools, but then one day you're going to have a business requirement that requires a newer toolchain (new O/S targets, new chipsets, whatever) and then you're stuck with solving 8 years of technical debt.

> 1. <https://en.wikipedia.org/wiki/Scientific_control>

Do you really think I'm dumb enough to need you to condescendingly cite this wikipedia page?

[1] https://en.wikipedia.org/wiki/Multi-objective_optimization

[2] https://en.wikipedia.org/wiki/Pareto_efficiency


Thank you for your response. Please do not assume ill intent on my part; there's nothing condescending (or passive aggressive, etc.) in linking the word "control" to the article on scientific controls. (There is also no attempt to seek philosophical purity on my part at the expense of productivity. I am _only_ interested in productivity and reason here.) However, I don't think you are thinking clearly. You're certainly not speaking clearly.

A reminder that the resolution up for debate is "Dependencies belong in version control". (Specifically, the author is making the case that they belong in the same version control system you're using to store your own application code which is relying on those dependencies—i.e. dependencies should not be in e.g. some ZIP files "over there", nor should they be versioned in an orthogonal version control system like Cargo, NPM, RubyGems, etc.—to the exclusion of your primary version control system. In other words, they should not be late-fetched from a package host immediately after cloning or at some point leading up to and including immediately prior to the build. The author is saying you should check them into your repo. By doing this, once you've cloned your repo, then you have the dependencies corresponding to the last version that you built/deployed/whatever at the time that you built/deployed it or whatever.)

I am not saying this to be condescending. I am saying this because it is important to be clear and for us to stay on topic rather than straying into totally unrelated territory, talking about totally unrelated things, or bringing hidden assumptions into the discussion, etc[1]. I believe that you are either responding to what could be reasonably characterized as perceived constraints that are unstated on my part (i.e. that I'm failing to disclose them but you understand them nonetheless—I'm not, and there are none there), or you are taking liberties in making assumptions of your own without stating them. This is bad, and it doesn't lead to clear thinking or useful conversation for the purpose of getting there.

> The problem is inherently in the bumping of versions, and pulling in new code from upstream and dealing with emerging tech debt and integration problem. Once you have builds reasonably reproducible (lockfiles, etc) then just dealing with all the new code you're pulling in dwarfs the benefits

Okay, that's nice, but... compared to what? How does late-fetching your dependencies from a third-party package repo sometime between cloning and build time solve the integration problem? (Alternatively, you can interpret this question as, "How does keeping a copy of those dependencies directly in your repo exacerbate integration problems?") Your position seems to hinge on some magical property where in the latter case, when your repo has its own copies of those dependencies and you don't have to fetch them in a separate step, integration somehow becomes hard. (NB: I'm not trying to be reductive. I'm trying to make sense of your position.) It doesn't; there is no such magical property. You were right the first time when you said "you _always_ have the cost of integration", with special emphasis on "always".

I'll repeat myself from before: if you're running into upgrade issues because of a change to a dependency, then not checking in your dependencies doesn't ipso facto solve any of those issues, and checking in your dependencies doesn't inexplicably cause any more. Where the bits are coming from—whether some location on your own disk, or streamed in at the last second over the network—doesn't matter in this regard. If there's a breaking change upstream, it's going to break no matter what.

(Am I being uncharitable? Am I misunderstanding what you're actually arguing for/against? I'm not saying this out of convenience for myself, that is, that it would be convenient for me if you were. It strikes me as something that nobody would actually advocate for because of how obviously untrue it is. Again, I'm trying to make sense of your position based on what you've actually said. This is exactly why I pre-emptively pleaded that you be specific in your response.)

> Based on a decade of experience

I don't know why you think this is relevant. I have two decades of experience to your one. I am not a neophyte programmer who e.g. picked up React last year and is stymied or frustrated by all this tooling and looking for excuses to assure myself that it's not really all that important after all. I was around before any of this tooling was. I adopted this tooling when it showed up. But then I had a realization one day and stopped and said, "Waitaminute. What the hell are we doing? What _actual_ problem, in concrete terms, are we saying this solves, exactly?" My position is that we've sleepwalked into adopting complex tooling for which we can't clearly articulate the tangible benefit it's supposed to deliver when we use it this way. I can't articulate "The NPM hypothesis" myself, and in my post-realization prodding, I've been unsuccessful at getting anyone else to, either. I covered this all in my first comment to this submission[2]. But most importantly, though, none of this matters.

You seem to be approaching this as if it does matter—at least that's how I'm reading things. I could be wrong. (I'm just as open to the possibility that this is a poor reading as I am open to the possibility that I've overlooked something on the subject of package management—or anything else. I'm a programmer after all. We write bugs and have to debug them and are shown every single day the things that we were wrong about. There's no reason programmers shouldn't be among the most humble people in the world, having gotten used to consistently being told they're wrong like this.)

The reality is it doesn't matter. There are fact claims in play here that are within the realm of science, and they are either true or untrue. We should be able to subject them to scrutiny. Science doesn't care who's asking the questions, nor should it.

Feel free to point out anything that I'm overlooking or anywhere I've misrepresented what you're saying. If that has happened, I hope you do.

1. e.g. <https://news.ycombinator.com/item?id=38438530#38437340>

2. <https://news.ycombinator.com/item?id=38426051>


I think something like this, if someone could build it, would be amazing. Essentially, this would bring the benefits of a BigCo monorepo (and all the SWE-time performance benefits it has) to the rest of the world. Lots of nice things could come from tech like this being adopted.

I don't think it would get mass adoption though. Git+GitHub+$PackageManager is "good enough" and this approach wouldn't be significantly better for every use case.


> Git+GitHub+$PackageManager is "good enough"

Weird way to frame it. Surely if that's good enough, then Git + repo host alone, sans package manager, satisfies the same criterion. It didn't/doesn't become more complicated by omitting package managers like npm, cargo, etc., and their associated methodology from the equation. It's the other way around. Adding their package management philosophy/methodology into the fray is strictly more complicated than not. It's extra.


> This is exactly how Google and Meta operate today

I wish it was that great. Works until you need to have a different version than some other piece of code, import a dependency that requires 100 others, or you need to build for some other platform than the "blessed" target. I choose google3 only when I have to or when I am certain of the scope. (I am in DevRel, so I have more options than typical engineers for my work)


True in some situations, but a fundamentally flawed approach to FOSS.

Indeed, if you are statically linking noncritical code, then for maintainability it is easier to version-lock an entire OS with the build tree in a VM. Thus, the same input spits out the same audited binary objects every time. In some situations it is unavoidable (see Microsoft or the FPGA ecosystem).

However, a shared object library ecosystem is arguably a key part of FOSS when it works properly on *nix, as it is fundamentally important to keep versioned interoperability with other projects to minimize library RAM footprints etc. Additionally, all projects have a shared interest in maintaining each other's security, rather than wasting resources on every application that ships legacy, stripped, static, obfuscated, vulnerable, leaky objects.

"Reuse [shared objects], Reduce [resource costs], and Recycle [to optimize]."

Sounds like familiar advice... =) like how some git solutions don't handle large binary artifacts well or simply implode.

Good luck, and maybe one can prove all those cautionary stories wrong. YMMV =)


> Have you ever had a build fail because of a network error on some third-party server? Commit your dependencies and that will never happen.

And here is the true root of the problem - people use build systems as if they were package managers.

Use a real package manager (and install your dependencies before the build) and suddenly it is clear why dependencies do not, in fact, belong in version control.


I don't think a package manager solves this case. If you own a repo with software in it you'll make sure to back up everything in that repo. If part of the build of your code in your repo fetches something from the network that will eventually fail and you will no longer be able to build your software again.

I recently had someone ping me for a build of software that I wrote >10 years ago. I couldn't build the code in the repo since I hadn't kept the deps in my repo and the servers my build scripts reached out to were down.


It’s worse than just that. Say I pick Fedora as my base. Now my project is a pile of SRPMs, that need a nasty bootstrap to build (okay, this is hard to avoid), and I’m stuck using the Fedora build system. Which sucks, sorry. (Okay, at least there is only one Fedora build system.) Also a whole OS is dragged along. If I want to edit a dependency, eww, I have to commit patch files and muck with .spec files. Good times.

Good tooling like the author describes could exist. No distro I’ve ever used is it. Gentoo is probably closer than Fedora, as Gentoo is actually intended to build packages as opposed to just running them.


If you mean an OS package manager, now your dependencies are all really old and you have only one version to choose from. There’s a reason Rust and Go built their own dependency fetching systems and other languages like Java has had them for years.


And you should use them instead of checking in Jar files into your repo…


So your builds fail if a remote site goes down, and the speed of that site rate limits your CI.


Nexus/Artifactory my friend. If your site is down, you have bigger issues on your hands. If the remote end is down, cache is there to save you.


vcpkg is really awesome, but the versioning system leaves something to be desired


“Vendoring” dependencies was a good idea that fell out of favor because it wasn’t quite implemented right. It should be revisited.


I think you forgot to include:

  toolchains
        \win arm
        \linux x86
        \linux arm
        \macos x86
        \macos arm

What I do think is that dependencies should be versioned, and their artifacts should be immutable.

Dependency management is not the only thing wrong with gamedev


> What I do think is that dependencies should be versioned, and their artifacts should be immutable.

And you should keep a copy of those artifacts, and build from it.

Having a hash is not enough. People have this weird assumption that you can just fetch all your dependencies by their hashes from the cloud. But cloud is just other people's servers; no one is going to be hosting build artifacts indefinitely just for you.
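To make the point concrete, here's a minimal Python sketch (function and parameter names are hypothetical) of what checking a locally archived dependency against a pinned hash looks like. The hash in a lockfile can only confirm bytes you already possess; it cannot recreate them once the remote host is gone, so the archived copy itself is the thing worth keeping.

```python
import hashlib
from pathlib import Path

def verify_vendored(path, expected_sha256):
    """Check a locally archived dependency against its pinned hash.

    The pinned hash only lets you *verify* the artifact; it cannot
    recreate the bytes, so you must keep the copy yourself.
    """
    data = Path(path).read_bytes()
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise ValueError(f"{path}: hash mismatch ({actual} != {expected_sha256})")
    return data
```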


I agree in principle, but you also need the same machine too otherwise new OS and hardware might introduce issues too. I resurrected an old project that needed to convert LESS to CSS and the node version required couldn’t run on my machine. Upgrading it to a version that could introduced filesystem changes that broke the packages that it was looking for.

Now just imagine businesses in the middle of a platform shift like Macs going from Intel to Apple's own ARM chips. Eventually you're going to be missing something, and all this work of bundling everything will end up being busy work.


Is it going to be a problem for your Windows/Linux developers that someone committed a node_modules with macOS-only binaries inside?


The challenge when I see posts like this is the people in charge of building this "check it all in" ecosystem usually forget about the developer experience and basically just implement a CI system. Cool, you can 're-run' an old build cleanly, which is good, but not enough.

How about commercial IDEs? Cloud environments? A lot of developer environments these days include a ton of stuff that likely doesn't make sense to check in, usually licensing config is annoying, or because you're relying on runtime services. And all this time engineers spend on their own machines is basically time wasted, which isn't really a great solution to pitch to a business.

Side note: I used to work for Perforce until the private equity sale. If there was a platform to vendor everything like this, it would be Perforce, because you could already do this kind of thing for years. AFAIK not many Perforce customers ever did this, and I don't think it was because Perforce wasn't capable. It's just a subtly wicked problem. Getting this right - just check out and go across different software development stacks - requires a lot of investment. It does look like Perforce has been acquiring many other aspects related to the application lifecycle, so in theory, they should be better positioned to be the "vendor everything on our stack" solution, but I'm not convinced this is going to work out well.

Cloud development environment vendors seem to be the best positioned as a product for solving this problem, because there is less of that "go figure out your DX" aspect left to the customer. But the right CDE would have to have a lot of enterprise-style controls. This is so new that I'm not sure who will get it right first, but my guess is that we'll get to a more "development to delivery" integrated environment, and away from a hodgepodge of tools managed per project.



There is also this - https://github.com/microsoft/p4vfs and several other solutions - just need to dig around.


Microsoft also has VFSforGit. Sadly they abandoned it to pursue sparse clones. I'm not sure the full story why. :(


What about working over a way to build a dev. env. correctly? ansible, container, dev-container..

Git shouldn't be used for backing all dependencies, tooling, and OS.. And while we are at it, do we track git itself in the "toolchain"?


This wouldn’t be a problem if the default in VCS systems was to use something like S3 blob storage by default for large binary files. Just store the torrent-like Merkle tree hash in the VCS database.
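This is roughly what Git LFS does with its pointer files. A toy Python sketch of the idea, with a local directory standing in for the blob store and a small pointer string standing in for the VCS entry (all names here are hypothetical):

```python
import hashlib
from pathlib import Path

def write_pointer(blob_path, store_dir):
    """Store a large binary in a content-addressed directory (standing in
    for S3) and return a small pointer string suitable for committing to
    the VCS in place of the binary itself."""
    data = Path(blob_path).read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    store = Path(store_dir)
    store.mkdir(parents=True, exist_ok=True)
    # Content-addressed: the file's name in the store is its own hash.
    (store / digest).write_bytes(data)
    return f"oid sha256:{digest}\nsize {len(data)}\n"
```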


I wish packages managers had P2P fetching from machines on the LAN as a fully supported built-in feature.

Unfortunately, if they did so, they'd probably use IPFS, and I don't know when they're going to fix their idle bandwidth issues, a lot of people seem to just give up and use web gateways, completely defeating the purpose of working without single points of failure.




No. No, they don't. Specifically, binary dependencies don't belong in a repo and you want to use binary dependencies rather than source dependencies where possible.

Once again, we see 20 years of dependency systems that have failed to do what Maven established as the bare minimum. Specifically:

1. Create signed, immutable versions of dependencies so a given release can't be changed out from under you;

2. Allow you to specify a specific version, the latest version or even the latest minor version of a specific major version;

3. Allow you to run internal repos in corporate environments where you might want to publish private libraries; and

4. Version information is nowhere near any source code. Putting github URLs in Go source files is the most egregious example of bad dependency management from a language in recent history.
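As a rough illustration of point 2, here is a toy resolver in Python. The "1.*" spec syntax is made up for brevity; Maven's real version ranges look like [1.0,2.0), and real resolvers are far more involved.

```python
def pick_version(available, spec):
    """Resolve a version spec over a list of 'x.y.z' strings.

    Supports an exact version ('1.4.2') or the newest release of a
    given major ('1.*'). A toy sketch, not a real resolver.
    """
    def key(v):
        return tuple(int(part) for part in v.split("."))

    if spec.endswith(".*"):
        major = int(spec[:-2])
        matches = [v for v in available if key(v)[0] == major]
        return max(matches, key=key) if matches else None
    return spec if spec in available else None
```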

Every line of source code, whether it's yours or third-party, comes at a cost. Depending on your toolchain, this may well increase compilation time and required resources.

You want reproducible builds. If you can do that without putting every dependency in a repo then you should. If you can't then you have a bad dependency system.


I definitely agree about URL's in go.

Maven comes up all the time as an example of packaging done correctly. It just does JVM stuff though, right? Seems like it's winning at an easier game.

Reproducible builds are a hard thing to achieve in the general case. Even something as simple as packaging your files in a tar will blow your determinism.
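For what it's worth, the tar case is recoverable if you normalize everything that varies between runs and machines: entry order, timestamps, and ownership. A small Python sketch under those assumptions (plain tar, no gzip, since gzip embeds its own timestamp):

```python
import io
import os
import tarfile

def deterministic_tar_bytes(root):
    """Pack a directory into a tar archive whose bytes are reproducible."""
    # Collect paths in sorted order; os.walk order is not guaranteed stable.
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()
        for name in sorted(filenames):
            paths.append(os.path.join(dirpath, name))

    def scrub(info):
        # Zero out metadata that varies between machines and runs.
        info.mtime = 0
        info.uid = info.gid = 0
        info.uname = info.gname = ""
        return info

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for p in paths:
            tar.add(p, arcname=os.path.relpath(p, root), filter=scrub)
    return buf.getvalue()
```

The same trick is available on the command line via GNU tar's --sort=name, --mtime, --owner, and --group flags.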

I think retooling for determinism is worth spending time on, but I'm not sure I could convince my boss of that.

So far as I'm aware, all dependency systems are bad. Nix is the least bad one I know.


So Maven is for JVM languages and binaries. That includes things people probably don't care about anymore (eg Groovy, Clojure, Kotlin, Scala) because they all compile to JVM bytecode, so as far as Maven is concerned they're indistinguishable.

You can include any static assets you want (eg css, js, html files). I honestly haven't looked into that. Putting static assets in a project is relatively straightforward, but a JS dependency? Yeah, that's probably a no go. Or it's really awkward.

For hermetic builds, given a blank slate, I'd probably start with Bazel.


I think the only C/C++ package manager that is even close to maven is vcpkg


Header files are both a huge strength and weakness of C/C++ and are really an anachronism now. They're very general purpose text substitution, which is very powerful, but means you have to include .h files to get types, signatures and so on. This kills binary compatibility (specifically, if you're linking to static or shared libraries, you still need a .h file).

Even C++ templates aren't really much better and they're still text replacement.

More modern languages have taken the use cases for .h files and incorporated them without the general purpose text substitution. Rust's macros are superior to #define macros, and type aliases (in various languages) are better than #define uint32 unsigned int.


Modern C++ also has constexpr, consteval, and using aliases


.exe may belong in p4, .pdb's not! :)


I'm annoyed that every few months I'll start something new that I know will eventually run on Kudu and Impala (per my employer), but the local build requirements are such that it's more effective to start with Postgres and figure out porting later on. As a NixOS user, I know the answer, but I just haven't allocated the time. Maybe this holiday season... Advent of Nix or something.



