On relative paths: the current behavior is intentional for simplicity. µJS checks whether the URL starts with a slash (but not a double slash) to identify internal links. No one has reported this as an issue so far, but it's a valid feature request and I'll keep it in mind for a future version.
On the integrity attribute: the reason it was missing is that the library was evolving quickly and the hash would have changed with every release. Now that it's stable, I'll add it.
That's a valid use case. For opt-in per link, you can disable global link interception with `mu.init({ processLinks: false })` and then add `mu-url` explicitly to the links you want µJS to handle. For the variable base path scenario, the `urlPrefix` option in `mu.init()` might also help. It prepends a prefix to all fetched URLs.
That said, proper support for relative paths is on the roadmap.
On the flip side, it’s easy to get a bit stuck down the road by the mere fact that you have a singleton. Maybe you have amazing performance and very carefully managed safety, but you still have a single object that is inherently shared by all users in the same process, and it’s very very easy to end up regretting the semantic results. Been there, done that.
One thing I found confusing about the Nature article is that it mostly discusses conventional linear accelerator + bremsstrahlung X-ray radiation versus very high dose rate FLASH in the form of electron beams, proton beams, or even carbon ion beams.
Do we know what the chemical mechanism for damage from charged particle beams is? Is it similar enough to compare directly like this? Are the timescales short enough that charge deposition might matter?
The article is a bit unclear, but we have both a very wide range of X-ray vs charged particle studies and, increasingly, conventional vs FLASH studies with a range of modalities (e.g. the seminal FLASH paper was FLASH electrons vs conventional electrons). FLASH photon vs conventional photon studies are also increasingly being done, although FLASH photons have been more of a pain to generate.
So it's clear there is a temporal FLASH effect, which is not purely a question of radiation type.
That's not to say it's necessarily exactly the same effect - we still don't have a perfect quantitative understanding of the effects of different radiation types even at normal dose rates, let alone when FLASH differences are added into the mix.
To add more context: yes, US Treasuries are exempt from state tax, and municipal bonds are tax exempt too. It's pretty rare for startups to hold them directly; they usually hold money market funds. It varies between different MMFs, but they can be partially state tax-exempt depending on what percentage of the underlying assets are federal bonds.[1] For instance, Vanguard shows you how much of each of their funds is tax-exempt here: https://investor.vanguard.com/content/dam/retail/publicsite/...
However, this tax exemption is usually priced in: muni bond funds, and MMFs that hold lots of tax-exempt assets, tend to return less than funds which are not tax exempt. For the majority of startups that operate at a net loss, tax-exempt funds are probably a bad choice, since you're earning less yield and the tax exemption likely doesn't affect you.
[1] The rules around this also vary from state to state; for instance, in CA, CT, and NY, you only get any tax exemption if the fund is at least 50% tax-exempt in each quarter of a given year.
There are many systems that take a native data structure in your favorite language and, using some sort of reflection, make an on-disk structure that resembles it. Python pickles and Java’s serialization system are infamous examples, and rkyv is a less alarming one.
I am quite strongly of the opinion that one should essentially never use these for anything that needs to work well at any scale. If you need an industrial strength on-disk format, start with a tool for defining on-disk formats, and map back to your language. This gives you far better safety, portability across languages, and often performance as well.
Depending on your needs, the right tool might be Parquet or Arrow or protobuf or Cap’n Proto or even JSON or XML or ASN.1. Note that there are zero programming languages in that list. The right choice is probably not C structs or pickles or some other language’s idea of pickles or even a really cool library that makes Rust do this.
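To make that concrete, here's a minimal Rust sketch of "start with the format" (purely illustrative: the magic bytes, field names, and layout are all made up). The byte layout is defined explicitly and the language-side struct is mapped onto it, rather than the layout being derived from the struct:

```rust
// Toy on-disk format, defined byte-by-byte: 4-byte magic, 1-byte format
// version, length-prefixed UTF-8 name, little-endian u32 count.

#[derive(Debug, PartialEq)]
struct Record {
    name: String,
    count: u32,
}

const MAGIC: &[u8; 4] = b"REC1";

fn encode(r: &Record) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(MAGIC);
    out.push(1); // format version
    out.extend_from_slice(&(r.name.len() as u32).to_le_bytes());
    out.extend_from_slice(r.name.as_bytes());
    out.extend_from_slice(&r.count.to_le_bytes());
    out
}

fn decode(buf: &[u8]) -> Option<Record> {
    // Any malformed input is rejected, not trusted.
    if buf.len() < 9 || &buf[0..4] != MAGIC || buf[4] != 1 {
        return None; // wrong magic or unsupported version
    }
    let name_len = u32::from_le_bytes(buf[5..9].try_into().ok()?) as usize;
    let name_end = 9usize.checked_add(name_len)?;
    if buf.len() < name_end + 4 {
        return None; // truncated input
    }
    let name = String::from_utf8(buf[9..name_end].to_vec()).ok()?;
    let count = u32::from_le_bytes(buf[name_end..name_end + 4].try_into().ok()?);
    Some(Record { name, count })
}
```

The format stays the same no matter which language reads it; porting it means porting two small functions, not a reflection system.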
(OMG I just discovered rkyv_dyn. boggle. Did someone really attempt to reproduce the security catastrophe that is Java deserialization in Rust? Hint: Java is also memory-safe, and that has not saved users of Java deserialization from all the extremely high severity security holes that have shown up over the years. You can shoot yourself in the foot just fine when you point a cannon at your foot, even if the cannon has no undefined behavior.)
Fully agreed. rkyv looks like something that is hyper optimizing for a very niche case, but doesn't actually admit that it is doing so. The use case here is transient data akin to swapping in-memory data to disk.
"However, while the former have external schemas and heavily restricted data types, rkyv allows all serialized types to be defined in code and can serialize a wide variety of types that the others cannot."
At a first glance, it might sound like rkyv is better, after all, it has less restrictions and external schemas are annoying, but it doesn't actually solve the schema issue by having a self describing format like JSON or CBOR. You won't be able to use the data outside of Rust and you're probably tied to a specific Rust version.
> You won't be able to use the data outside of Rust and you're probably tied to a specific Rust version.
This seems false after reading the book, the doc, and a cursory reading of the source code.
It is definitely independent of Rust version. The code makes use of repr(C) on structs (field order follows the source code) and every field gets its own alignment (making it independent of the C ABI alignment). The format is indeed portable. It is also versioned.
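A tiny illustration of the repr(C) point (my own toy struct, not rkyv's actual archived types): with #[repr(C)] the compiler lays fields out in declaration order at predictable offsets, whereas the default Rust layout leaves the order unspecified:

```rust
// Offsets below assume a typical platform where u32 has 4-byte alignment.
#[repr(C)]
struct Archived {
    a: u8,  // offset 0
    b: u32, // offset 4 (3 bytes of padding before it)
    c: u16, // offset 8
}

fn offsets() -> (usize, usize, usize) {
    let v = Archived { a: 0, b: 0, c: 0 };
    let base = &v as *const Archived as usize;
    (
        (&v.a as *const u8 as usize) - base,
        (&v.b as *const u32 as usize) - base,
        (&v.c as *const u16 as usize) - base,
    )
}
```

This only shows the ordering guarantee; as noted above, rkyv additionally pins down per-field alignment itself rather than relying on the target's C ABI.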
The schema of the user structs is in Rust code. You can make this work across languages, but that's a lot of work and code to support. And this project appears to be in Rust for Rust.
On a side note, I find the code really easy to understand and follow. In my not so humble opinion, it is carefully crafted for performance while being elegant.
> Depending on your needs, the right tool might be Parquet or Arrow or protobuf or Cap’n Proto
I think parquet and arrow are great formats, but ultimately they have to solve a similar problem that rkyv solves: for any given type that they support, what does the bit pattern look like in serialized form and in deserialized form (and how do I convert between the two).
However, it is useful to point out that parquet/arrow on top of that solve many more problems needed to store data 'at scale' than rkyv (which is just a serialization framework after all): well defined data and file format, backward compatibility, bloom filters, run length encoding, compression, indexes, interoperability between languages, etc. etc.
> (OMG I just discovered rkyv_dyn. boggle. Did someone really attempt to reproduce the security catastrophe that is Java deserialization in Rust?
Trusting possibly malicious inputs is a universal problem.
Here is a simple example:
echo "rm -rf /" > cmd
sh cmd
And this problem is no different in rkyv than rkyv_dyn or any other serialization format on the planet. The issue is trusting inputs. This is also called a man in the middle attack.
The solution is to add a cryptographic signature to detect tampering.
This is an unhelpful interpretation. With a decent memory-safe parser, it’s perfectly safe [1] to deserialize JSON or (most of) XML [0], protobuf or Cap’n Proto or HTTP requests, etc. Or to query a database containing untrusted data. You need to be careful that you don’t introduce a vulnerability by doing something unwise with the deserialized result, but a good deserializer will safely produce a correctly typed output given any input, and the biggest risk is that the output is excessively large.
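A toy deserializer in that spirit (made-up format, just to show the property): given arbitrary untrusted input it returns either a correctly typed value or an error, never behavior, and it caps the output size since that's the remaining risk:

```rust
const MAX_ITEMS: usize = 1024; // guard against attacker-controlled blowup

// Parse a comma-separated list of u32s. Any input at all yields either
// Ok(Vec<u32>) -- pure data -- or an error. Nothing from the input is
// ever executed.
fn parse_u32_list(input: &str) -> Result<Vec<u32>, String> {
    let mut out = Vec::new();
    for tok in input.split(',') {
        if out.len() >= MAX_ITEMS {
            return Err("too many items".into());
        }
        match tok.trim().parse::<u32>() {
            Ok(n) => out.push(n),
            Err(_) => return Err(format!("not a u32: {tok:?}")),
        }
    }
    Ok(out)
}
```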
But tools like Pickle or Java deserialization or, most likely, rkyv_dyn will happily give you outputs that contain callables and that contain behavior, and the result is not safe to access. (In Python, it’s wildly unsafe to access, as merely reading a field of a Python object calls functions encoded by the class, and the class may be quite dynamic.)
[0] The world is full of infamously dangerous XML parsers. Don’t use them, especially if they’re written in C or C++ or they don’t promise that they will not access the network.
> The solution is to add a cryptographic signature to detect tampering.
If you don’t have a deserializer that works on untrusted input, how do you verify signatures? Also, do you really think it’s okay to do “sh $cmd” just because you happen to have verified a signature?
> This is also called a man in the middle attack.
I suggest looking up what a man in the middle attack is.
Ah, I see the confusion. rkyv_dyn doesn't serialize code. Rust is compiled to machine code. It would be quite a feat to accomplish.
I was a bit confused when you compared it to Python pickle and assumed you were talking about general input validation somehow.
I agree that pickle and similar are profoundly surprising and error prone. I struggle to think of any reasonable case where one would want that.
As for the man-in-the-middle attack, I meant that if somebody intercepts the serialized form, they can mutate it. And without a cryptographic signature, you wouldn't know.
> rkyv_dyn doesn't serialize code. Rust is compiled to machine code.
Java is compiled to bytecode, and Obj-C is compiled to machine code. Yet both Android and iOS have had repeated severe vulnerabilities related to deserializing an object that contains a subobject of an unexpected type that pulls code along with it. It seems to be that rkyv_dyn has exactly the same underlying issue.
Sure, Rust is “safe”, and if all the unsafe code is sufficiently careful, it ought to be impossible to get the type of corruption that results in direct code execution, memory writes, etc. But systems can be fully compromised by semantic errors, too.
If I’m designing a system that takes untrusted input and produces an object of type Thing, I want Thing to be pure data. Once you start allowing an open set of methods on Thing or its subobjects, you have lost control of your own control flow. So doing:
thing.a.func()
may call a function that wasn’t even written at the time you wrote that line of code or even a function that is only present in some but not all programs that execute that line of code.
Exploiting this is considerably harder than exploiting pickle, but considerably harder is not the same as impossible.
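A rough sketch of that hazard (my own toy, not how rkyv_dyn is actually implemented): once untrusted input selects which impl of a trait you get back, calling a method on the result runs whichever code that impl contains:

```rust
trait Action {
    fn run(&self) -> &'static str;
}

struct Benign;
impl Action for Benign {
    fn run(&self) -> &'static str { "benign" }
}

// Stand-in for an impl pulled in by some unrelated crate that merely got
// linked into the project.
struct Surprising;
impl Action for Surprising {
    fn run(&self) -> &'static str { "surprising" }
}

// "Deserializer": an attacker-controlled type tag picks the concrete type.
fn deserialize(tag: u8) -> Box<dyn Action> {
    match tag {
        0 => Box::new(Benign),
        _ => Box::new(Surprising),
    }
}
```

Nothing at the call site `deserialize(tag).run()` hints that code the caller never anticipated can run; the input decided.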
You know very well what I meant by "compile to machine code". But you decided to interpret it in a combative way. Even though you seem very knowledgeable, this makes me want to stop discussing with you.
Ultimately you should read the code of rkyv_dyn to understand what it does instead of making random claims.
It will be faster for you to read the code than for me to attempt explaining how it works. Especially since you will most likely choose the least charitable interpretation of everything I say. There is very little code, it won't take long.
> You know very well what I meant by "compile to machine code".
I really don't. I think you mean that Rust compiles to machine code and neither loads executable code at runtime nor contains a JIT, so you can't possibly open a file and deserialize it and end up with code or particularly code-like things from that file being executed in your process.
My point is that there's an open-ended global registry of objects that implement a given trait, and it's possible (I think) to deserialize and get an unexpected type out, and calling its methods may run code that was not expected by whoever wrote the calling code. And the set of impls and thus the set of actual methods may expand by the mere fact of linking something else into the project.
This probably won't blow up quite as badly as NSCoding does in ObjC because Rust is (except when unsafe is used) memory-safe, so use-after-free just from deserializing is pretty unlikely. But I would still never use a mechanism like this if there was any chance of it consuming potentially malicious input.
> even a really cool library that makes Rust do this.
The first library that comes to mind when I think of this is `serde` with `#[derive(Serialize, Deserialize)]`, but that gives persistence-format output, which, as you describe, is preferable to the former case. I usually use it with JSON.
Maybe a little bit. But serde works with JSON (among other formats), and you can use it to read and write JSON that interoperates with other libraries and languages just fine. Kind of like how SQLAlchemy looks kind of like you’re writing normal Python code, but it interoperates with SQL.
I know "serde" is a take on "codec" but *rewrite* was right there! Also, as long as I'm whinging about naming? 'print' and 'parse' are five letter p words in a bidirectional relationship. Oh! Oh! push, peek, poke, ... pull! It even makes more sense than pop! And it's four letters!
But if you use complicated serialisation formats you can't mmap a file into memory and use it directly. Which is quite convenient if you don't want to parse the whole file and allocate it to memory because it's too large compared to the amount of memory or time you have.
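Sticking with std to stay self-contained (in practice you'd likely use an mmap crate such as memmap2), a fixed-size-record layout shows the access pattern: you can jump straight to record i without parsing the rest of the file:

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom, Write};

const RECORD_SIZE: u64 = 8; // each record: one little-endian u64

fn write_records(path: &std::path::Path, values: &[u64]) -> std::io::Result<()> {
    let mut f = File::create(path)?;
    for v in values {
        f.write_all(&v.to_le_bytes())?;
    }
    Ok(())
}

fn read_record(path: &std::path::Path, index: u64) -> std::io::Result<u64> {
    let mut f = File::open(path)?;
    // Seek straight to the record; everything else in the file is skipped,
    // which is exactly what mmap + pointer arithmetic would give you.
    f.seek(SeekFrom::Start(index * RECORD_SIZE))?;
    let mut buf = [0u8; 8];
    f.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}
```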
Actually, it's you who is giving that impression with an ultra vague "doesn't solve the problems described".
The only problem in the blog post is efficient coding of optional fields, and all they did was introduce a bitmap. From that perspective, JSON and XML solve the optional fields problem to perfection, since an absent field costs exactly nothing.
I guess you missed the part where the size of the data stored on disk and efficient deserialization are also critically important performance characteristics that neither JSON nor XML have?
Capnproto doesn’t support transform on serialize - the optional fields still take up disk space unless you use the packed representation, which has some performance drawbacks. Also, the generated capnproto Rust code is quite heavy on compile times, which is probably an important consideration for compiling queries.
Even completely ignoring the issues of language-centric vs data-format-centric serializers, your list is missing two very notable entries from my list: Arrow and Parquet. Both of them go to quite some lengths to handle optional/missing data efficiently. (I haven’t personally used either one for large data sets, but I have played with them. I think you’ll find that Arrow IPC / Feather (why can’t they just pick one name?) has excellent performance for the actual serialization and deserialization part as long as you do several rows at a time, but Parquet might win for table scans depending on the underlying storage medium.) Both of them are, quite specifically, the result of years of research into efficiently storing longish arrays of wide structures with potentially complex shapes and lots of missing data. (Logical arrays. They’re really struct-of-arrays formats, and I personally have a use case I kind of want to use Feather for, except that Feather is not well tuned for emitting one row at a time.)
> Protobufs definitely doesn’t solve the problems described. Capnproto may solve it but I’m not 100% sure. JSON/XML/ASN.1 definitely don’t.
I'm not sure you are serious. What open problem do you have in mind? Support for persisting and deserializing optional fields? Mapping across data types? I mean, some JSON deserializers support deserializing sparse objects even to dictionaries. In .NET you can even deserialize random JSON objects to a dynamic type.
Can you be a little more specific about your assertion?
The space overhead and the overhead of serialization/deserialization. Rkyv is zero overhead - it’s random access without needing to deserialize and can even be memory mapped.
The whole “zero overhead” thing is IMO a red herring. I care about a few things: stability across versions and languages, space efficiency (sometimes) and performance. I do not care about “overhead” — performance trumps overhead every time.
Your deserializer is probably running on a CPU, and that CPU probably has a very fast L1 cache and might be targeted by a compiler that can do scalar replacement of aggregates and such. A non-zero-overhead deserializer can run very quickly and result in the output being streamed efficiently from its source and ending up hot in L1 in a useful format. A zero-overhead deserializer might do messy reads in a bad order without streaming hints and run much slower.
And then you get to very, very large records, as in the OP, where getting a good on-disk layout may require thought. And, frequently, the right layout isn’t even array-of-structs, which is why there are so many tools designed to query column stores like Parquet efficiently.
Serdes time can be significant. There are use cases for the zero copy formats even though they use more space. Likewise bit-packed asn1 is often slower than byte-aligned.
If you care about space, you're almost certainly going to compress your output (unless, like, you're literally storing random noise) and so you'll necessarily have overhead from that.
Unless the reason you care about space is because it's some sort of wire protocol for a slow network (like LoRaWAN or Iridium packets or a binary UART protocol), where compression probably doesn't make sense because the compression overhead is too large. But even here, just defining the data layout makes sense, I think.
This could take the form of a C struct with __attribute__((packed)), but that is fragile if you care about more platforms than one. (I generally don't, so that works for me!)
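For comparison, the Rust spelling of the same trade-off (sizes below assume a typical platform where u32 has 4-byte alignment):

```rust
// Natural C layout: 3 bytes of padding between tag and value.
#[repr(C)]
struct Padded {
    tag: u8,
    value: u32,
}

// Packed layout: no padding, smaller on disk, but the byte layout (and
// unaligned field access) is now your problem on every target you support.
#[repr(C, packed)]
struct Packed {
    tag: u8,
    value: u32,
}
```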
I have zero doubt that you’re on some ‘no true Scotsman’-style “you’re not doing Real Development if you are using these technologies to solve these problems” thing. Let’s just drop that. There are myriad ‘real man webscale development’ scenarios where these are more than acceptable.
Pretty sure protobuf used a header to track field presence within a message, similarly to what this article does. That does have its own overhead you could avoid if you knew all fields were present, but that's not the assumption it makes.
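A presence bitmap like the one in the article is easy to sketch (toy encoding with up to eight optional u32 fields, not protobuf's actual wire format): one header byte records which fields are present, and absent fields cost nothing beyond their bit:

```rust
fn encode(fields: &[Option<u32>; 8]) -> Vec<u8> {
    let mut bitmap = 0u8;
    let mut out = vec![0]; // placeholder for the bitmap byte
    for (i, field) in fields.iter().enumerate() {
        if let Some(v) = field {
            bitmap |= 1 << i;
            out.extend_from_slice(&v.to_le_bytes());
        }
    }
    out[0] = bitmap;
    out
}

fn decode(buf: &[u8]) -> Option<[Option<u32>; 8]> {
    let bitmap = *buf.first()?;
    let mut fields = [None; 8];
    let mut pos = 1;
    for i in 0..8 {
        if bitmap & (1 << i) != 0 {
            let bytes = buf.get(pos..pos + 4)?;
            fields[i] = Some(u32::from_le_bytes(bytes.try_into().ok()?));
            pos += 4;
        }
    }
    Some(fields)
}
```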
Sure, if your structure doesn't contain any pointers and you only ever want to support one endianness and you trust your compiler to fix the machine layout of the struct forever.
Pretending your laptop is a screaming fast workstation and compiling C++ code on all cores can use quite a bit of RAM.
(I have a MacBook Pro that is only around 10% slower at this than an AMD workstation. The workstation has considerably higher TDP. I’m quite impressed.)
On a quick read of the paper, it's incoherent (pun intended). It seems to conflate quantum states with classical vectors, which thoroughly loses both the source of the exponential speedup in Shor's algorithm and the difficulty of quantum algorithm design.
The paper doesn't actually give a clear description of its own algorithm, and there are two specific problems that are apparent even without much of a description:
1. It confuses quantum state vectors with classical vectors or vectors of values. Classically, or on a quantum computer, you can have n values stored in registers or in memory or on a piece of paper or whatever and you have an n-element vector. But on a quantum computer, if you have n qubits and write down their state, you have a 2^n-element vector of complex numbers. These are not the same thing.
So you can have the quantum Fourier transform, which Fourier transforms the coefficients in the state vector of n qubits, which is not at all the same thing as taking 2^n logical numbers and Fourier transforming the numbers.
But this paper very glibly discusses how the QNTT (Quantum Number Theoretic Transform) is nicer than the QFT, but as far as I can tell, the "QNTT" is described in one single paper, doesn't really have much to say for itself, and is actually just an algorithm, supposedly optimized to run on current quantum hardware, that transforms n numbers stored in n registers. (And if a paper wants to claim to number-theoretic-transform the coefficients of a quantum state vector, it should start by explaining how the coefficients of said state vector are to be viewed as elements of a finite ring or field, which these papers do not even pretend to do.)
I think they're using the QNTT to optimize modular exponentiation, which is at least vaguely plausible, but that's using the QNTT for a purpose completely unrelated to what Shor's algorithm uses the QFT for.
2. The replacement of quantum modular exponentiation with classical modular exponentiation is just weird and is completely missing an explanation. Modular exponentiation is just a classical function, like f(r) = 2^r mod p. You can make it reversible (where all operations have inverses) by instead doing something like (z, r) -> (z + 2^r mod p, r) -- if you start with z = 0, you get the answer, and if you start with z != 0, you get z added to the answer.
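In code, that reversible construction looks like this (classical sketch, small moduli only, no overflow handling):

```rust
// 2^r mod p by repeated squaring. Assumes p is small enough that p*p
// fits in a u64.
fn modpow2(r: u64, p: u64) -> u64 {
    let mut base = 2 % p;
    let mut exp = r;
    let mut acc = 1 % p;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % p;
        }
        base = base * base % p;
        exp >>= 1;
    }
    acc
}

// (z, r) -> (z + 2^r mod p, r). Starting from z = 0 yields 2^r mod p.
fn forward(z: u64, r: u64, p: u64) -> (u64, u64) {
    ((z + modpow2(r, p)) % p, r)
}

// The map is invertible: subtract 2^r mod p back off.
fn inverse(z_out: u64, r: u64, p: u64) -> (u64, u64) {
    ((z_out + p - modpow2(r, p)) % p, r)
}
```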
Quantum computers can evaluate quantum functions where the input is qubits instead of classical bits, and they do it by running reversible calculations as above, and many algorithms require doing exactly this while carefully avoiding entangling the inputs or outputs with anything else. So if you start with two quantum registers, you can write the state as complex number times each possible input state (all 2^b of them where b is the total number of bits in all the registers) and you get those same complex numbers times the output states. [0]
The paper claims, with no explanation that I can see, that somehow you can instead do the modular exponentiation on a regular computer and encode those exponents into the quantum circuit. If you are willing to do all 2^b of them, then fine [1], but remember, b is larger than 2048, and this isn't going to work. So maybe they're approximating the modular exponentiation by somehow extrapolating it from a very, very sparse set of samples? If that works, that would be quite nifty, but again the paper doesn't appear to so much as acknowledge any complication here. On the other hand, I can easily imagine factoring a number like 15 this way, since the number of samples needed to completely capture the function is rather small.
(I hope I did an okay job of making this both correct and somewhat accessible.)
[0] The calculation is reversible, each input state maps to exactly one output state and vice versa, so each coefficient appears in front of a different logical output state, which makes the math work.
[1] Not fine, because the resulting circuit will be so large that you will never finish running it. But mathematically fine in the sense that you'll get the right answer. Also, by the time you have classically sampled the entire search space for a problem like modular exponentiation, you have already brute forced all possible discrete logs, at which point you don’t need the quantum computer!
Because there isn't such a relation. It's a thing people believe when they don't have actual experience with peer review. If anything, predatory journals and low-quality pubs can charge more, since publication is more guaranteed (and researchers reaching for these pay-to-publish journals are more desperate).
It's a reputation economy. Like review sites. They start off truthful, and then as time goes on incentives shift to bad actors to subvert it. Or they just sell out their reputation.
Yelp, TripAdvisor, wire cutter, hell even Google results themselves.
Once you start poisoning that well, it's difficult if not impossible to claw it back.
I tend to agree, but keep in mind that most likely you just don't even bother reading the shittiest of the shittiest papers just based on title and abstract. And for every good article there are like 10 unindexed shitty ones.
“Covered application store” means a publicly available internet website, software application, online service, or platform that distributes and facilitates the download of applications from third-party developers to users of a computer, a mobile device, or any other general purpose computing that can access a covered application store or can download an application.
So… DNS servers are “covered application stores”, right? As is PyPI or GitHub or any other such service. S3 and such, too — lots of facilitating going on.
And I’m wondering… lots of things are general purpose computers. Are servers covered? How about embedded systems? Lots of embedded systems are quite general purpose.
edit: Yikes, whoever wrote the text of the law seems to have failed to think at all.
> (b) (1) A developer shall request a signal with respect to a particular user from an operating system provider or a covered application store when the application is downloaded and launched.
The developer shall request? Not the application? So if I write an application and you download it and run it on an operating system, then I need to personally ask your OS how old you are? This makes no sense.
> (2) (A) A developer that receives a signal pursuant to this title shall be deemed to have actual knowledge of the age range of the user to whom that signal pertains across all platforms of the application and points of access of the application even if the developer willfully disregards the signal.
Did they forget to make this conditional on getting the right answer? If I develop an application used by a 12-year-old and the OS says the user is 18+ (which surely will happen all the time even if no one lies because computers have multiple users), and the OS answers my query, then courts are directed to deem that I have actual knowledge that the user is under 13? Excuse me?
My reading of 2A is that devs can take the word of the OS or App Store. If they say the user’s 18, and the user’s really 13, then the developer’s in the clear for serving adult content to them because they took the word of the certifying entity.
Conversely, if the OS says the user’s 13, then they can’t say they thought the user was actually 18. Guess sucks to suck if you want to buy a movie ticket from your kid’s phone, or if you mistyped your age when you set yours up because you didn’t have your passport nearby.
2A just says that if the e.g. client request headers say the age bracket, the server (dev) can trust the reported age, but also shall not ignore it on purpose. No "just ignore the do-not-track flag" escape hatch here. "A bartender can't willfully refuse to check someone's ID if they are presented with it."
For incorrect OS answers, keep reading. 3B covers what happens if there's clear and convincing evidence that the age covered in 2A is inaccurate. (Reported profile birthday, for instance)
This is "if someone shows a bartender a valid drinking-age ID but says they're celebrating their 17th birthday, this can't be ignored".
Nothing there responds to the question. If my 17 year old answers “I'm 23”, what exactly prevents them from posting to /r/nsfw? What constitutes “clear and convincing evidence”? If there's no answer here, then there appears to be no purpose to this law as this sort of thing is precisely what it's supposed to be preventing.
The difference is a bartender has a handy thing called a human brain that can integrate all the evidence and priors without explicit handling, which a computer program cannot. Now we have another "legitimate interest", potentially _forcing_ us to collect biometric and behavioural data we definitely wouldn't monetize, just to cover its cost.
> “Covered application store” means a publicly available internet website, software application, online service, or platform that distributes and facilitates the download of applications from third-party developers to users of a computer, a mobile device, or any other general purpose computing that can access a covered application store or can download an application.
So OpenWRT would be covered since they allow the user to download packages (ie software) via apk/opkg.
Quite possibly, yes. Though maybe a router wouldn't qualify as a general purpose computing device, and maybe the packages wouldn't qualify as being from third-party developers when the binaries that get downloaded are both built and distributed by OpenWRT.
I’ve only tried doing a phone repair per iFixit’s instructions once, and the instructions sucked. They explained in excruciating detail how to take the phone apart and then the instructions just ended. No details on reassembly.
Also, perhaps the CDN script snippets in the getting started page should include the integrity attribute.