companion_cube changed the topic of #ocaml to: Discussion about the OCaml programming language | http://www.ocaml.org | OCaml 4.12 released: https://ocaml.org/releases/4.12.0.html | Try OCaml in your browser: https://try.ocamlpro.com | Public channel logs at https://libera.irclog.whitequark.org/ocaml/
waleee has quit [Ping timeout: 240 seconds]
Haudegen has quit [Ping timeout: 248 seconds]
average has joined #ocaml
vicfred has joined #ocaml
zebrag has quit [Quit: Konversation terminated!]
favonia has quit [Ping timeout: 250 seconds]
favonia has joined #ocaml
gravicappa has joined #ocaml
mbuf has joined #ocaml
cross has quit [Ping timeout: 240 seconds]
cross has joined #ocaml
bartholin has joined #ocaml
bartholin has quit [Quit: Leaving]
vicfred has quit [Quit: Leaving]
Serpent7776 has joined #ocaml
hendursa1 has joined #ocaml
hendursaga has quit [Ping timeout: 276 seconds]
favonia has quit [Ping timeout: 240 seconds]
olle has joined #ocaml
average has quit [Quit: Connection closed for inactivity]
salkin has quit [Quit: salkin]
olle has quit [Ping timeout: 240 seconds]
Haudegen has joined #ocaml
waleee has joined #ocaml
quernd has joined #ocaml
favonia has joined #ocaml
PinealGlandOptic has joined #ocaml
bartholin has joined #ocaml
bartholin has quit [Ping timeout: 252 seconds]
bartholin has joined #ocaml
olle has joined #ocaml
elf_fortrez has joined #ocaml
elf_fortrez has quit [Quit: Client closed]
bartholin has quit [Ping timeout: 252 seconds]
olle has quit [Ping timeout: 252 seconds]
bartholin has joined #ocaml
zebrag has joined #ocaml
Serpent7776 has quit [Read error: Connection reset by peer]
Serpent7776 has joined #ocaml
cross has quit [Quit: leaving]
cross has joined #ocaml
elf-fortrez has joined #ocaml
Soni has joined #ocaml
<Soni> hey uh, does ocaml have anything like this? https://doc.rust-lang.org/std/collections/struct.BTreeMap.html
<Corbin> Sure. https://ocaml.org/api/Hashtbl.html is a good starting place. "ocaml b-tree" and "ocaml btree" are good queries in a search engine to see what folks have built.
<Corbin> Because B-trees are usually chosen for speed and customized to fit a cache layout, there likely is not a single pre-existing implementation which perfectly fits your needs; you'll have to experiment a bit.
<Soni> they're also useful for order of iteration and predictable/(mostly) deterministic (cache effects aside) performance, but yeah we didn't really find a BTreeMap equivalent of Hashtbl...
bartholin has quit [Ping timeout: 250 seconds]
<Corbin> This is a common difficulty with data structures; a data structure *implements* an interface, but it also *is implemented* in terms of other data structures with their own interfaces. So, the question must be more specific: Do you want the interface of a B-tree, or the implementation of a B-tree?
<Corbin> Soni: For example, the top two hits on my search look like fine implementations to start from, but they both assume that you'll be saving your B-trees to block devices. This might be what you want, but maybe not!
<Corbin> (https://github.com/mransan/btree in particular might be useful.)
<Soni> nope, we just want type-safe btrees with no regards for concepts such as bytes
<Corbin> Then try Hashtbl; it's the same sort of mutable in-place structure. You can override the hash function with roughly the same level of control as in Rust.
<Soni> <Soni> they're also useful for order of iteration and predictable/(mostly) deterministic (cache effects aside) performance,
<Soni> (we're gonna need these properties, and Hashtbl doesn't provide them)
<Corbin> I haven't read the implementation yet, but Rust's BTreeMap is a dynamically-allocated structure, isn't it?
PinealGlandOptic is now known as Everyone
<Corbin> It's been a few years, but is "we" a group of blockchain developers? In that case, I imagine that y'all could buy whatever you "need". (If this is about IRC clients, then this sounds like a great story for the channel, but I imagine that you're usually I/O-bound!)
Everyone is now known as Everything
<companion_cube> Soni: just use Map
<Corbin> companion_cube: Wouldn't that give sorted order under iteration, instead of insertion order? And also it's not mutable. But maybe those are preferable.
<d_bot> <dinosaure> if you want something more fast than `Hashtbl` and if you key is a `string`, you can look the `art` project: https://github.com/dinosaure/art
<companion_cube> It's much closer to a btree than Hashtbl
Everything has quit [Quit: leaving]
Everything has joined #ocaml
<Soni> nah this is for #soupault -related things
<Soni> (and it involves untrusted input)
<Corbin> Oh! Then definitely Map. Don't hash untrusted keys if you can avoid it.
<companion_cube> ?!
<Soni> uh but Map copies on write?
<companion_cube> No
<companion_cube> It's purely functional
bartholin has joined #ocaml
<Corbin> companion_cube: Hashing untrusted keys for a hashmap is not security-dangerous, but it does endanger average-time data-structure performance. In general, we should assume that untrusted keys will degrade hash tables to worst-case behavior.
<companion_cube> Corbin: ocaml hashtables can be randomized
<Corbin> companion_cube: That's not a real mitigation, unfortunately. I know the feeling and the argument (I was there for the Python experience a few years ago) but ultimately the right move is to provide a red-black tree or similar, and tell app developers to use it when appropriate.
<companion_cube> Why is it not a mitigation?!
<companion_cube> I've never read that anywhere
<Corbin> Because an attacker can use timing attacks to attack the random seed, and the random choice of hash function is usually just setting a constant within a single family (e.g. in PCG)
<Soni> this is generally not interactive (which mostly defeats timing attacks) and the sorted order of iteration is a more exciting property, but anyway
<Soni> so Map moves on write?
<companion_cube> No? This is not rust
<Soni> so Map is slow when doing lots of insertions?
<companion_cube> It just allocates a new tree, which shares most of its structure with the old tree
<companion_cube> Well, benchmark first
<companion_cube> Depends what you call "a lot"
<Corbin> Soni: The rule of thumb for persistent data structures is that logarithmically many new structs are allocated on write, if you're doing asymptotic analysis.
<Tardigreat[m]> if map is implemented as a RB tree, then it's just RB tree complexity + the lack of caching you'd get with a tree
<fluxm> let's say though HashTbl would generate a new seed when it needs to grow/shrink, wouldn't that be pretty effective mitigation against timing attacks?
<companion_cube> Tardigreat[m]: it's an AVL tree
<Tardigreat[m]> close enough :P
<Tardigreat[m]> (but good to know)
<Tardigreat[m]> companion_cube: not very relevant, but i'm curious as to why AVL was chosen over red-black?
<companion_cube> It's simpler and faster
<companion_cube> Oops
<Tardigreat[m]> :D
<companion_cube> It's an AVL with a bit more slack than usual, too
<Corbin> Tardigreat[m]: It may have been written before the "left-leaning" flavor of red-black tree was discovered, which simplified the implementation a lot.
<companion_cube> Much simpler than RB in any case
<fluxm> because the know-how about how to remove nodes from RB tree is arcane knowledge 👀
<companion_cube> It also has merge operations, for example
<Tardigreat[m]> have i started a holy war of the red-black crusaders and avl insurgents?
<Corbin> fluxm: There's two threat models. I'll say that the "transparent" model lets the attacker directly call hash() on chosen keys, while the "opaque" model only lets the attacker measure the relative time taken for insertions and resizes.
<fluxm> btw, the OCaml Workshop 2021 had a demo of GOSPEL being applied to make a proof for Set
<Corbin> The transparent model is surprisingly common, and in that model, reseeding doesn't help. Reseeding does help in the opaque model.
<fluxm> cobin: there is also the case where the attacker has no interactiev access whatsoever, as the processing is performed e.g. on files by a user. not all processing is yet in the cloud.. :)
<Soni> so much for allocation-free* inserts (except when resizing) and sorted order...
<companion_cube> Soni: this is not rust
<companion_cube> Forget about allocation - free
<Tardigreat[m]> rewrite it in rust then!
* Tardigreat[m] hides
<fluxm> personally I prefer trees, but if you need speed, it's dificult to beat the constant factor of hash tables
<Corbin> Tardigreat[m]: The left-leaning paper seems to have come out in 2008: https://www.cs.princeton.edu/~rs/talks/LLRB/LLRB.pdf OCaml's quite a bit older than that. It just wasn't an option back then.
<Soni> dmbaturin: hey wanna rewrite soupault in rust? XD
<fluxm> there's two projects for ocaml-rust-interop, btw.
<Tardigreat[m]> Corbin: i mean it doesn't really matter all that much. it's an implementation that can change at any point depending on what's most convenient. if it works, leave it alone :P
<Armael> isn't soupault a website generator? why would you need to fine-tune datastructures for that
<fluxm> I've had this dream (obviously done nothing for it) how "better typed" languages should join their forces. Given their expressive type systems, type-safe interop _seems_ like a thing they could pull off smoothly.
<companion_cube> If you want to use modern trees, go HAMT/RRB, not RB trees
<Soni> Armael: because we're trying to use it with git
<Soni> and not in the way you'd usually do so
<Armael> hmm, what do you mean "with git"?
<companion_cube> There's a git implementation in ocaml
<Corbin> Soni: AIUI OCaml is implemented with a *nursery*. This means that the GC does not actually allocate, in the Rust/C sense, when creating small new objects. Allocation only happens when long-lived objects survive a garbage collection.
<Soni> Armael: this http://sprunge.us/Br1CgX and some stuff about symlinks :p
<Armael> fluxm: agreed, but my feeling is that it's hard to do; different languages can have widely different runtimes and/or types
<fluxm> armael: yeah, I guess the proof is in the lack of smooth solutions :)
<Soni> (this is the input to soupault)
<dmbaturin> Soni: I'll first need an HTML manipulation library on par with lambdasoup rewritten in Rust. :)
<fluxm> on the other hand, maybe compiler developers would be better positioned to solve this issue
<Tardigreat[m]> Soni: where specifically have you identified bottlenecks
<Tardigreat[m]> do you have any numbers?
cedric has joined #ocaml
<Armael> right, I would start with that as well
<Soni> anyway, lua is awesome but lua 2.5 is sad. and lua doesn't require hashtables, it can be implemented with btreemaps.
jess has joined #ocaml
<Tardigreat[m]> FWIW, guix walks massive git DAGs in guile. i find it hard to believe that the bottleneck would be ocaml when guile can do it just fine
<Soni> and for whatever reason soupault's lua implementation has something about sorted keys
olle has joined #ocaml
<Armael> mm so what's the issue with the current implementation?
<Soni> it doesn't have integers, mostly
<Corbin> companion_cube: I don't want to sign you up for a long read, so I'll quote the only important bit from https://bugs.python.org/issue13703#msg150592: "There simply aren't any cryptographic hashing algorithms available that are in the performance class we need. My proposal does make the collision attack quite difficult to carry out, even if the raw output values from the hash are available to the attacker (they should not usually be)."
<dmbaturin> This may be flameware-inducing, but I came to believe that "everything is an unordered dictionary" is a deeply flawed approach for a language.
<Soni> so the idea was to make a lua 5.3 impl with btrees and an array optimization
<dmbaturin> I sure do appreciate any help with Lua-ML in any case, because there's only that much I can do maintaining it alone.
<Corbin> This entire experience was just terrible, and I hated it. I *did* learn about the concept of "nemesis"; there's an algorithm which provably takes any quicksort implementation and generates worst-case inputs.
<dmbaturin> Soni: Adding integers would be way simpler and independent from the hash table implementation actually. I'll need to examine the current Lua 5.3 behaviour first of course, but I think it will be a matter of adding a new internal type and some conversions in the built-ins.
<Soni> also the lua reference implementation is vulnerable to hashtable attacks, altho soupault generally doesn't run untrusted lua
<Corbin> dmbaturin: I agree. I personally took a page from the E language: We should use insertion-ordered dictionaries. They are far more predictable under certain kinds of distributed and concurrent assumptions.
<Soni> dmbaturin: it also has all the bitwise operators
<dmbaturin> Wait, is a hash table attack more dangerous than simply degrading it to the worst case performance?
<Soni> also, varargs are awesome
cedric has quit [Quit: Konversation terminated!]
<Soni> dmbaturin: nope
<Tardigreat[m]> dmbaturin: no, that's basically what it is
<Soni> except the worst case is O(n)
<Corbin> (Another big data-structure improvement makes insertion-ordered mutable dictionaries practical: https://www.pypy.org/posts/2015/01/faster-more-memory-efficient-and-more-4096950404745375390.html)
<Soni> (imagine if all your operations were about looking things up in an unordered array)
<Soni> dmbaturin: one thing that lua 5.1+ have, that isn't possible in lua 2.5, is a vararg-based continuation-passing interpreter, where the varargs are basically a stack
<dmbaturin> 2.5 doesn't even have anonymous functions. That's a nightmare for iter/map.
<dmbaturin> Well, not a nightmare since the workaround is trivial—create a named function, but a hassle for sure.
<Soni> hmm, does lua 2.5 even have tail call optimization?
<companion_cube> Wait, lua 2.5, not 5.2?
<Soni> companion_cube: https://github.com/lindig/lua-ml/
<companion_cube> Damn
<companion_cube> I'd rather use the bindings to lua
<Soni> (or just run lua in a wasm VM)
<Soni> (but also reimplementing lua 5.3 sounds fun)
<dmbaturin> companion_cube: Bindings are to specific Lua version, would need you to build that Lua version, and also lack the type safety of Lua-ML (which uses functors to build the stdlib of the embedded interpreter).
<dmbaturin> Actually, the reference implementation of WASM is in OCaml. It can potentially be made an embeddable runtime...
<companion_cube> use wasmer or sth like thar
<companion_cube> that*
olle has quit [Ping timeout: 250 seconds]
<Soni> do note that setjmp/longjmp support is still WIP (waiting on wasm exceptions spec), altho you can use emscripten (which emulates setjmp/longjmp with native/usually javascript exceptions) and provide your own native/ocaml components for it
<companion_cube> dmbaturin: bindings embed lua afaik
<Soni> but anyway, <Soni> (but also reimplementing lua 5.3 sounds fun)
<companion_cube> You lose type safety but you get a better implementation
<Soni> (especially if you've played with lua internals before...)
<dmbaturin> companion_cube: Those in the OPAM repo, they don't. I can potentially make them embed it... but my own plan out of that situation is to make a subset of Scheme on the ideas of Lua-ML. :)
<Soni> can ocaml export a C API?
<dmbaturin> Yes.
<Soni> so a lua 5.3 in ocaml could load C modules?
<dmbaturin> Soni: https://github.com/vyos/libvyosconfig/ This is a shared library in OCaml. We use it from Python.
<Soni> how do you deal with 64-bit integers?
mbuf has quit [Quit: Leaving]
bartholin has quit [Ping timeout: 245 seconds]
<dmbaturin> ZArith or similar.
gravicappa has quit [Ping timeout: 250 seconds]
<companion_cube> But you can only do it once per process, no?
<companion_cube> I think the rust approach is better, you could have a deriving lua to make interop type safe
<companion_cube> And use the normal lua, possibly vendored
bartholin has joined #ocaml
elf-fortrez has quit [Quit: Client closed]
gravicappa has joined #ocaml
bartholin has quit [Ping timeout: 248 seconds]
<d_bot> <darrenldl> im curious when one would need (hard) real time navigation of git, but anyhow
bartholin has joined #ocaml
salkin has joined #ocaml
vicfred has joined #ocaml
jess has quit []
Everything is now known as Everyone
Everyone has quit [Quit: leaving]
olle has joined #ocaml
gravicappa has quit [Ping timeout: 248 seconds]
bartholin has quit [Quit: Leaving]
mro has joined #ocaml
hendursa1 has quit [Quit: hendursa1]
hendursaga has joined #ocaml
mro has quit [Read error: Connection reset by peer]
mro has joined #ocaml
zebrag has quit [Quit: Konversation terminated!]
oriba has joined #ocaml
Haudegen has quit [Quit: No Ping reply in 180 seconds.]
Haudegen has joined #ocaml
salkin has quit [Quit: salkin]
salkin has joined #ocaml
salkin has quit [Quit: salkin]
oriba has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
salkin has joined #ocaml
olle has quit [Ping timeout: 240 seconds]
Tuplanolla has quit [Quit: Leaving.]
salkin has quit [Ping timeout: 240 seconds]
Haudegen has quit [Ping timeout: 240 seconds]
sleepydog has quit [Ping timeout: 240 seconds]
theblatte has quit [Ping timeout: 240 seconds]
theblatte has joined #ocaml
sleepydog has joined #ocaml
vb has quit [Ping timeout: 240 seconds]
vb has joined #ocaml
Chouhartem has quit [Ping timeout: 240 seconds]
nfc_ has quit [Ping timeout: 240 seconds]
nfc_ has joined #ocaml
mro has quit [Quit: Leaving...]
mro has joined #ocaml
mro has quit [Client Quit]
Chouhartem has joined #ocaml
Serpent7776 has quit [Ping timeout: 240 seconds]
Serpent7776 has joined #ocaml