T O P

  • By -

wwabbbitt

I see someone has reinvented the https://en.wikipedia.org/wiki/XOR\_linked\_list


tialaramex

You don't need those backslashes. I also expect that this is Undefined Behaviour in standard C++ although I'd be pleased if I'm wrong and someone has a citation.


slither378962

Casting pointers to integers is a big can of worms labelled "pointer provenance".


tialaramex

Presumably you mean casting integers to pointers? The other way around is fine, clearly we can make some integer from a pointer - the problem (which is where provenance comes in) is whether that integer is enough to reconstitute a pointer and if not, why. Optimisers do not like the "bag of bits" aka Provenance Via Integers model of pointers in which this "obviously" just works. Aria's Strict Provenance Experiment (on Rust) examines what you can still do with a model where pointers aren't just integers and as I understand it you can do these XOR lists in that model, because we had a pointer (with the correct provenance) and we're just XORing some of its bits to get a different pointer with the same provenance, which is fine - similarly squirrelling away bit flags in the low bits of aligned pointers works in Aria's experiment. Some things the bag-of-bits people like cannot work, and you also can't explain bare metal memory mapped I/O with strict provenance, it cannot be the whole world - but Aria's question was essentially is it enough of the world to be worth giving people these simpler and easier to understand guarantees. It'd be nice to give the LLVM people hiding bit flags in pointers safety tools that couldn't possibly work for somebody who is writing values to address 0x3FD00 because that's what it said in the manual. The compiler has no way to check that's correct, in the real world maybe it runs the attached fan motor or, maybe it's completely wrong and trashes actual RAM with important stuff in it. Whereas the compiler absolutely can check that the bits you are "stealing" to put flags in them weren't needed for something else.


PrimozDelux

It's a bug with reddit causing the backslashes


kog

Nope, here's the link without backslashes, no special handling required to post it. https://en.wikipedia.org/wiki/XOR_linked_list


wrosecrans

The user probably isn't adding them intentionally or doing any special handling. "New" Reddit just does things differently than "Old" Reddit which results in the bug depending on which UI version you use.


domiran

If I’m understanding this correctly, isn’t this going to break if the whole 64-bit address space is used? [Edit] Nope, I’m dumb. Interesting trick.


patstew

No, it's not using any tag bits, just xoring the integer version of prev* and next* which means you can recover next* given prev* or vice versa, so it's bidirectionally iterable.


no-sig-available

> isn’t this going to break if the whole 64-bit address space is used? No, that is not a problem. The article mentions the real problem - with three values xor'd together, you need to know *two* of them to get the third. If you have a Node, the `this` value is easy, but to get `next` you also have to already know `previous` (and vice versa). How do you do that if starting in the middle? So the limitation is that you *can* traverse the list, but you have to start at either end. Or possibly know two *consecutive* nodes in the middle (but how did you get those?!),


domiran

Yeah, seems only necessary if memory is really at a premium, or you really only need to go from the ends.


skullt

The advantage of a doubly linked list is that you can start iterating from any node as well as insert at or delete an arbitrary node in constant time. This trick requires you to always iterate from one end of the list. You can't do anything to the list just given an arbitrary node. You can't even insert after a node without first scanning through the list for it, which I think actually makes this worse than a singly linked list.


epostma

A "handle" to an entry of this type of list (iterator, "fat" pointer (though that usually means something else)) needs to consist of two raw pointers. You can't find an entry in any linked list other than by iterating over the list anyway; you just need to make sure to retain both pointers when you do so. The only tricky thing is if you want to retain such a handle while modifying the list through other handles. That would be supported by a traditional double linked list but not this type.


skullt

> You can't find an entry in any linked list other than by iterating over the list anyway Not talking about finding entries, talking about when you already have saved node pointers. The only time I would consider using a doubly linked list is when the list needs to be updated quickly at arbitrary points cached in other structures. > The only tricky thing is if you want to retain such a handle while modifying the list through other handles. That a pretty big disadvantage. If a single insert potentially invalidates every other fat pointer, they don't seem like a viable solution.


epostma

You're not wrong, but if you do this in full generality in the "typical" case, then there's also a risk that you delete the subject of one pointer using another. (Of course in some situations there's some invariant that prevents this, e.g. if each pointer only deletes its own subject.)


hmoein

The most powerful action on doubly linked list is splice in constant time which this trick cannot do. But it is interesting


NilacTheGrim

This is a bad idea. If you really need a doubly-linked list, then don't do this. Saving 8 bytes per node on modern hardware likely doesn't matter. And you may be opening up cans of UB worms.


matthieum

Or if you _really_ need to save 8 bytes per node, and can't use anything else than a linked-list (not even an unrolled one), look into using indexes/offsets instead of pointers. For example, with a custom allocator guaranteeing that all nodes are within 2^31 of each others' _size_, you could record the offset between your node and the predecessor/successor nodes, 4 bytes each. It _would_ require that the custom allocator provides all allocations from within a single memory slab, otherwise UB ensues, but it'd be doable.


NilacTheGrim

Sounds like a good idea but in practice I think writing such an allocator would be tricky.


matthieum

An allocator for fixed-size, fixed-alignment requests, which can likely be single-threaded? This is the easiest allocator in the world :) The one tricky aspect is that there's an upper bound on the maximum number of elements which can be allocated at once, and even the allocator can leverage virtual memory by using an anonymous mmap of very large size, and counting on the OS to lazily map the virtual pages to actual RAM.


ack_error

Just store the list nodes in a vector -- indices/offsets would make the list structure relocatable.


NilacTheGrim

Umm. That would mean it's no longer got the attractive properties of a doubly-linked list. It would make splicing no longer be O(1).


ack_error

I'm not sure what you mean...? It's still a doubly linked list, just using a vector as the backing store for nodes similar to a linear allocator. Splicing still takes an O(1) number of operations when the list uses offsets or indices, as long as both lists have their nodes in the same vector.


NilacTheGrim

Oh I see. And deletion I guess could swap the current element with the last one, and update adjacent "pointers" appropriately. That would work. So basically the vector is like an allocated slab except it has the crappy property of occasionally requiring O(n) traversal as you add elements. Meh. I'd still rather just use a regular doubly linked list and not bother. Unless you're on some super memory constrained arch where saving 8 bytes matters (or 4 bytes in that case since on such arch's you'd rarely be 64 bit).. I wouldn't bother. But yes you can mimic a slab of memory with a vector then constrain that memory to use as few bits as you want for each "pointer".


justinhj

This is the way. Same memory footprint but keeps the original properties of the double linked list. In addition you benefit from data cache locality of the list members.


HolyGarbage

This is really clever, and the requirement to keep track of the pointer to the previous element can be easily abstracted away with iterators.


tinylittlenormous

No article when I open the site on my phone ?


Beosar

It works in Chrome on Android. Maybe try deleting the cache or cookies?


tinylittlenormous

Safari and Firefox have the same issue.


flutterdro

I remember reading that this kind of pointer manipulation is very dangerous, because, from compilers viewpoint, pointers are way more than just integers and casting from pointer to integer and back can lead compilers to incorrect optimizations. I am terrified to even think of this kind of tricks, let alone use them.


ack_error

Both the C and C++ standards guarantee that a pointer can be converted to a suitably sized integer (i.e. intptr_t) and back, and the resulting pointer is guaranteed to be equal to the original. It's true that there have been compiler bugs related to this, and there is debate on whether the XOR list in particular is or should be permitted. But temporarily tunneling pointers through pointer-sized ints is fairly common, and simple pointer-int-pointer should be considerably safer than mixing the integer values of pointers.