Good point, but that may not be in scope either... since this is not even something you can get from Debian easily: not just by looking at a Debian pool or diving into a package's control files, AFAIK?
Say I rebuild a Debian package with some new build options.
Is this the same or a new package? I'd say a new one.
Is this the same name? I'd say a new one.
Is this distributed by Debian? Nope, so this comes from another repo and pool, right?
The idea with PURL is to have simple and short PURLs for the common case, and make it possible to handle less common cases. Rebuilding a package and sharing it on another repo would be a less common case to me? WDYT?
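To make the "common case vs. less common case" split concrete, here is a rough sketch. It assumes the rebuilt package keeps the same type, namespace and name and is only distinguished by the `repository_url` qualifier, which is just one possible reading of the above. `make_purl` is a hypothetical helper (not a real library API), `repo.example.org` is a made-up host, and the percent-encoding is simplified.

```python
from urllib.parse import quote


def make_purl(type_, namespace, name, version, qualifiers=None):
    """Assemble a PURL string from its parts (simplified sketch, no subpath)."""
    purl = (
        f"pkg:{type_}/{quote(namespace, safe='')}/"
        f"{quote(name, safe='')}@{quote(version, safe='')}"
    )
    if qualifiers:
        # Emit qualifiers sorted by key, with percent-encoded values.
        pairs = [f"{k}={quote(v, safe='')}" for k, v in sorted(qualifiers.items())]
        purl += "?" + "&".join(pairs)
    return purl


# Common case: short PURL for a package as distributed by Debian.
common = make_purl("deb", "debian", "curl", "7.50.3-1",
                   {"arch": "i386", "distro": "jessie"})

# Less common case: same package rebuilt and served from another repo
# (hypothetical URL), flagged with the repository_url qualifier.
rebuilt = make_purl("deb", "debian", "curl", "7.50.3-1",
                    {"arch": "i386",
                     "repository_url": "https://repo.example.org/debian"})

print(common)
print(rebuilt)
```

Whether the rebuilt one should also get a different namespace or name is exactly the open question here; the sketch does not settle it.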
I've worked with ingesting and generating SBOMs a bit, which is where my experience with PURLs comes from. I loved the idea because it gets about 80% of the way to usefully identifying software components. So just to be clear, I don't dislike them and I think you've done good work.
I don't necessarily agree that a site-built package is a different package. It's just that a single line of text might not be enough to encode build configurations.
A binary package built by Debian's build fleet is a unique artifact signed by the project's keys. It's a thing with a canonical identifier. A deb-src, Gentoo package, or FreeBSD port might have a canonical identifier for the original source, but that isn't canonical once it's built on a machine. In many cases the difference is immaterial, but there are a lot of #ifdefs in a lot of code. Then there's whatever autoconf generates for a given system.
The canonical source distribution is useful information but then so is the build information. I'm not sure this can be captured via qualifiers, at least I can't think of a way to do it.
Maybe just a source package is enough. For reporting a bug or CVE knowing something came from a particular source package is a start to triaging an issue. But you'd want a distinct namespace for source packages. A source package namespace at least tells you "in summary this package contains all the diffs Debian uses" versus the PURL for the upstream source package (from GitHub etc).
Nitpick: Debian does not sign binary packages, they sign Release files, which contain hashes of Packages files, which contain hashes of .deb binary packages.
Debian uses .buildinfo files for builders to record the information about the inputs to building a binary package, including the source hashes, environment variables etc.
A site-built package could be a different package, but it could also be a bit-identical package, due to Debian working on Reproducible Builds.
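The hash chain and the bit-identical point can be sketched roughly like this: take the `SHA256` field from a Packages stanza and compare it with the hash of the .deb you have (downloaded or rebuilt locally). This assumes the Packages file itself was already checked against a signed Release file, which is not shown. The stanza and the bytes below are synthetic, and `parse_stanza` and `deb_matches_index` are hypothetical helper names.

```python
import hashlib


def parse_stanza(text):
    """Parse one deb822-style stanza into a dict, folding continuation lines."""
    fields, key = {}, None
    for line in text.splitlines():
        if line[:1] in (" ", "\t") and key:
            fields[key] += "\n" + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


def deb_matches_index(deb_bytes, stanza_text):
    """True if the SHA-256 of deb_bytes equals the stanza's SHA256 field."""
    fields = parse_stanza(stanza_text)
    return hashlib.sha256(deb_bytes).hexdigest() == fields.get("SHA256")


# Synthetic stand-in for a .deb and its Packages entry (not real data).
fake_deb = b"illustrative bytes standing in for a .deb"
stanza = (
    "Package: hello\n"
    "Filename: pool/main/h/hello/hello_amd64.deb\n"
    f"SHA256: {hashlib.sha256(fake_deb).hexdigest()}\n"
)

print(deb_matches_index(fake_deb, stanza))         # bit-identical artifact
print(deb_matches_index(fake_deb + b"x", stanza))  # modified artifact
```

If a site rebuild is reproducible, the first check passes and the rebuilt artifact is indistinguishable from the one Debian indexed.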
> It's just that a single line of text might not be enough to encode build configurations.
That's the tough part, and IMHO outside of PURL? ... Note that for C/C++ code ... @alcroito mentions cps in the same comment page at https://news.ycombinator.com/item?id=44196246 ... and at a quick glance it attempts to capture these details, maybe?
IMHO, bare git stuff would be a git URL as specified in pip and SPDX, not a PURL... I would be interested to know more about your use case. Feel free to drop a note at pombredanne@aboutcode.org
For "generic" interface-based dependencies, that's tougher.
This is a problem with a few ecosystems. Off the top of my head: RPMs, debs and Java OSGi... and maybe a few more. We need to survey these to find out whether we can solve that and whether this is a PURL problem at all.
Can I rope you in and interest you in filing an issue in the spec so we can move the discussion there? :P This would be great.
Well, for one thing a dependency on an interface could not have a hash to bind the provider(s). But one could have a dependency on an interface plus associated dependencies on one-of-N providers of the interface, and then the latter could have hashes.
Basically you need a way to indicate "this package is an interface and requires providers of it" and also you need a way to indicate which packages are the associated providers (either as attributes of the interface PURLs, as attributes of the provider PURLs, or both).
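As a rough data-model sketch of that idea: the interface carries no hash, while each candidate provider can. The class names, the PURL used for the interface, the versions and the hashes are all hypothetical and only for illustration. The one real detail is that Debian's `mail-transport-agent` virtual package is provided by packages such as postfix and exim4-daemon-light.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Provider:
    purl: str
    sha256: Optional[str] = None  # concrete artifacts can be pinned by hash


@dataclass
class InterfaceDependency:
    interface: str  # abstract name: nothing concrete to hash
    providers: List[Provider] = field(default_factory=list)

    def resolve(self, installed_purls):
        """Return the one-of-N providers that are actually installed."""
        return [p for p in self.providers if p.purl in installed_purls]


def placeholder_hash(label):
    # Placeholder digest so the example is self-contained; not a real package hash.
    return hashlib.sha256(label.encode()).hexdigest()


mta = InterfaceDependency(
    interface="pkg:deb/debian/mail-transport-agent",  # hypothetical interface PURL
    providers=[
        Provider("pkg:deb/debian/postfix@0.0-illustrative",
                 placeholder_hash("postfix")),
        Provider("pkg:deb/debian/exim4-daemon-light@0.0-illustrative",
                 placeholder_hash("exim4")),
    ],
)

installed = {"pkg:deb/debian/postfix@0.0-illustrative"}
print([p.purl for p in mta.resolve(installed)])
```

Here the link goes from the interface to its providers; recording it on the provider side instead (or on both) would work the same way.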
All abstractions leak eventually, so we need that escape hatch IMHO. Otherwise you end up with the other issue, which is that there is stuff you cannot track with PURL?
> isn't the issue that sometimes a given scanner can't know from where the package is sourced?
That's the problem: there is no metadata with or in libssl.so.1 that I can reliably use to tell what this is.
Eventually I can see a solution made of
1. create the metadata, say a simple YAML or deb822 key-value pair file that can then be included upstream or as an overlay
2. define a simple spec for binary formats to include a PURL (say in an ELF section or a WinPE string of sorts, where many of these are already stored)
3. create content-based tools like we have in PurlDB to match code, but maybe more like a bunch of generated YARA rules that would match symbols and strings from source to binaries and can recognize that libssl.so.1 is from OpenSSL 1.1.1g.
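A toy version of the third item, just to show the shape of it: pull printable strings out of a binary blob and match a version pattern to guess a PURL. This is plain regex matching, not YARA and not how PurlDB works. The pattern, the `guess_purls` name and the synthetic bytes are assumptions for illustration, and a real matcher would need many rules plus symbol matching.

```python
import re

# Runs of 6+ printable ASCII bytes, similar in spirit to the `strings` tool.
PRINTABLE = re.compile(rb"[\x20-\x7e]{6,}")
# Illustrative rule: assumes the library embeds a banner like "OpenSSL 1.1.1g".
OPENSSL_VERSION = re.compile(rb"OpenSSL (\d+\.\d+\.\d+[a-z]*)")


def guess_purls(blob):
    """Return candidate PURLs inferred from version banners found in blob."""
    found = set()
    for text in PRINTABLE.findall(blob):
        match = OPENSSL_VERSION.search(text)
        if match:
            version = match.group(1).decode("ascii")
            found.add(f"pkg:generic/openssl@{version}")
    return sorted(found)


# Synthetic bytes standing in for the contents of libssl.so.1 (not a real file).
blob = b"\x7fELF\x00\x00OpenSSL 1.1.1g  banner\x00\x01libssl\x00"
print(guess_purls(blob))
```

The guess is only as good as the rule, which is why an embedded PURL (item 2) or shipped metadata (item 1) would be more reliable where available.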
And based on that approach we can either:
1. create new, sensible types as needed
2. and/or maintain a last resort open registry of generic types at least so we get some sanity in the process.