companion_cube changed the topic of #ocaml to: Discussion about the OCaml programming language | http://www.ocaml.org | OCaml 5.0 released(!!1!): https://ocaml.org/releases/5.0.0.html | Try OCaml in your browser: https://try.ocamlpro.com | Public channel logs at https://libera.irclog.whitequark.org/ocaml/
TakinOver has quit [Ping timeout: 248 seconds]
<dh`> perf is linux-only, isn't it?
<dh`> also the basic blocks question stands
<companion_cube> Wdym by basic blocks?
xd1le has quit [Quit: xd1le]
TakinOver has joined #ocaml
waleee has quit [Ping timeout: 240 seconds]
waleee has joined #ocaml
waleee has quit [Ping timeout: 255 seconds]
chrisz has quit [Ping timeout: 255 seconds]
chrisz has joined #ocaml
TakinOver has quit [Ping timeout: 276 seconds]
ansiwen has quit [Quit: ZNC 1.7.1 - https://znc.in]
ansiwen has joined #ocaml
Haudegen has joined #ocaml
<dh`> in order to get useful profile results a profiler has to be able to connect object code locations to source locations
<dh`> in C code you can do this on a per-function basis and still get useful results (hence gprof) but that's not so true in ocaml
mbuf has joined #ocaml
TakinOver has joined #ocaml
_whitelogger has joined #ocaml
czy has quit [Remote host closed the connection]
<discocaml> <geoff> a
mbuf has quit [Remote host closed the connection]
mbuf has joined #ocaml
bartholin has joined #ocaml
gentauro has quit [Read error: Connection reset by peer]
olle has joined #ocaml
gentauro has joined #ocaml
Serpent7776 has joined #ocaml
trillion_exabyte has quit [Ping timeout: 255 seconds]
trillion_exabyte has joined #ocaml
drewolson has quit [Quit: Ping timeout (120 seconds)]
drewolson has joined #ocaml
trillion_exabyte has quit [Ping timeout: 240 seconds]
trillion_exabyte has joined #ocaml
mro has joined #ocaml
mro has quit [Client Quit]
bgs has joined #ocaml
bartholin has quit [Quit: Leaving]
<zozozo> dh`: I'm not sure what you mean, but if you build with ocamlopt -g, there is enough information for perf to map back to the source ocaml code
olle has quit [Remote host closed the connection]
<dh`> good to know
<dh`> not that perf does me any good since I'm not running on linux
jmiven has quit [Quit: reboot]
jmiven has joined #ocaml
rwmjones has quit [Read error: Connection reset by peer]
rwmjones has joined #ocaml
xd1le has joined #ocaml
czy has joined #ocaml
<adrien> which logging library should be used? I'm looking for one that is as light as possible at runtime and none mentions the runtime cost for disabled logging
<adrien> (seriously, there are more logging libraries than there are libraries to replace the stdlib)
<Armael> logs by dbuenzli?
<zozozo> the logs library was indeed made to be as light as possible when logging is disabled
<adrien> thanks
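A minimal sketch of why a message-function logging style can be cheap when logging is disabled. It uses only the standard library and is not the Logs API: `debug_enabled`, `debug`, and `expensive_summary` are hypothetical names chosen for illustration. The idea is that the caller passes a closure, and the format arguments inside it are evaluated only if the level check passes.

```ocaml
(* Hypothetical stand-in for a log-level check; this is not the Logs API. *)
let debug_enabled = ref false

(* [debug msgf] calls [msgf] only when debug logging is enabled, so the
   format arguments written inside [msgf] are not evaluated otherwise. *)
let debug msgf =
  if !debug_enabled then
    msgf (fun fmt -> Printf.ksprintf prerr_endline fmt)

(* Counts how often the "expensive" argument is actually computed. *)
let evaluations = ref 0

let expensive_summary () =
  incr evaluations;
  42

let () =
  (* Disabled: the closure is never called, [expensive_summary] never runs. *)
  debug (fun m -> m "summary = %d" (expensive_summary ()));
  (* Enabled: the message is formatted and written to stderr. *)
  debug_enabled := true;
  debug (fun m -> m "summary = %d" (expensive_summary ()))
```

In this sketch the cost of a disabled call is roughly one check of a reference plus passing a closure, whatever the arguments would have cost to compute.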
Haudegen has quit [Ping timeout: 276 seconds]
dh` has quit [Ping timeout: 276 seconds]
xd1le has quit [Quit: xd1le]
Anarchos has joined #ocaml
John_Ivan_ has joined #ocaml
John_Ivan has quit [Ping timeout: 240 seconds]
alexherbo2 has joined #ocaml
Anarchos has quit [Ping timeout: 265 seconds]
Anarchos has joined #ocaml
<companion_cube> adrien: logs
czy has quit [Remote host closed the connection]
<Anarchos> hello companion_cube
alexherbo2 has quit [Remote host closed the connection]
alexherbo2 has joined #ocaml
<companion_cube> Hi
<Anarchos> companion_cube I saw that a port of gsourceview4 for lablgtk was stuck in limbo...
alexherbo2 has quit [Remote host closed the connection]
alexherbo2 has joined #ocaml
alexherbo2 has quit [Remote host closed the connection]
alexherbo2 has joined #ocaml
alexherbo2 has quit [Remote host closed the connection]
alexherbo2 has joined #ocaml
alexherbo2 has quit [Remote host closed the connection]
terrorjack has quit [Quit: The Lounge - https://thelounge.chat]
alexherbo2 has joined #ocaml
terrorjack has joined #ocaml
<adrien> switched to Logs but I'm still getting a huge slowdown when I extract my not-so-hot-loop to a dedicated function no matter how much code I move; unrolling a for loop by hand makes the code much faster however (2-way makes it >20% faster and 4-way makes it 35% faster)
alexherbo2 has quit [Remote host closed the connection]
<adrien> with opam, is there a way to create a flambda switch that has the same set of packages as an existing switch?
alexherbo2 has joined #ocaml
<companion_cube> probably with export/import
<companion_cube> I don't recall the exact command
<adrien> thanks; I've spotted that in the manpages but I was hoping there was something more direct (although export/import isn't complicated either)
<companion_cube> you could also look at the assembly I suppose
<companion_cube> even in perf, it can show which instructions are hot
<octachron> "opam install ocaml-option-flambda --update-invariant" will replace your current switch with an flambda one.
alexherbo2 has quit [Remote host closed the connection]
bartholin has joined #ocaml
alexherbo2 has joined #ocaml
rf has joined #ocaml
TakinOver has quit [Ping timeout: 246 seconds]
<adrien> octachron: interesting, thanks; I'm keeping my current switch though so that I can compare performance
<adrien> and I'm going to look at assembly and perf but I was hoping to make the code more readable first because that would make them more readable :D
<companion_cube> it really is a pity OCaml isn't relocatable
<companion_cube> otherwise you could just copy the switch
alexherbo2 has quit [Remote host closed the connection]
<adrien> I had to tweak the output of opam switch export because the compiler options (or their lack) are exported too and that conflicted, but I only had to remove a couple of lines
<adrien> the nice thing with ocaml is that the switch almost finished rebuilding by the time I had written that message
alexherbo2 has joined #ocaml
TakinOver has joined #ocaml
alexherbo2 has quit [Remote host closed the connection]
alexherbo2 has joined #ocaml
alexherbo2 has quit [Remote host closed the connection]
alexherbo2 has joined #ocaml
rf has quit [Ping timeout: 250 seconds]
rf has joined #ocaml
alexherbo2 has quit [Remote host closed the connection]
alexherbo2 has joined #ocaml
<adrien> flambda performs significantly worse it seems
<zozozo> adrien: that's interesting, could you share the code ?
<adrien> zozozo: https://gitlab.com/adrien-n/compsort/-/blob/main/analysis.ml#L194 (this is the core of the hot loop) but this project still relies on a slightly patched xz unfortunately and it complicates usage by others
alexherbo2 has quit [Remote host closed the connection]
<adrien> but anyway, if I unroll that loop, I get major speedups (although I skip up to 7 iterations, that should make little difference) and if I move some of the code to another function, it gets much slower
<zozozo> adrien: ow, when you extract to a separate function, do you add type annotations ?
<zozozo> operations on bigarrays are specialised by the compiler when the types are known and concrete, and that can bring a big speedup
<zozozo> and if you extract to a function that ends up being polymorphic, then you lose that specialisation
<zozozo> (I suppose BA1 is a module alias to some bigarray module)
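A small sketch of the annotation point above, assuming an OCaml version where `Bigarray` is part of the standard library. The function names are hypothetical and this is not adrien's code. `sum_generic` leaves the element kind polymorphic, so the compiler cannot pick a specialised access at that site; `sum_specialised` fixes the full type, which is the situation where specialisation can apply.

```ocaml
open Bigarray

(* Element kind ['k] is left polymorphic: the accesses below cannot be
   specialised for a concrete representation at compile time. *)
let sum_generic (a : (int, 'k, c_layout) Array1.t) =
  let acc = ref 0 in
  for i = 0 to Array1.dim a - 1 do
    acc := !acc + Array1.get a i
  done;
  !acc

(* Fully concrete type: kind and layout are known at the access site. *)
let sum_specialised (a : (int, int_elt, c_layout) Array1.t) =
  let acc = ref 0 in
  for i = 0 to Array1.dim a - 1 do
    acc := !acc + Array1.get a i
  done;
  !acc

let () =
  let a = Array1.of_array int c_layout [| 1; 2; 3; 4 |] in
  (* Same result either way; only the generated code differs. *)
  assert (sum_generic a = sum_specialised a)
```

Both functions compute the same value; any speed difference would have to be measured on the actual workload.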
Anarchos has quit [Quit: Vision[]: i've been blurred!]
<adrien> let me try; I had hypothesized the code got slower if I separate the use of the bigarrays from their creation but I wasn't able to actually test that (unfortunately I need to afk for a few minutes at least)
alexherbo2 has joined #ocaml
<adrien> zozozo: that was it! (and BA1 is a thin wrapper around Bigarray.Array1 to provide create* functions tailored to my needs and it was natural to add the corresponding types)
<zozozo> adrien: yeah, when using bigarrays, you should be careful and add as many type annotations as possible to ensure that accesses and reads are correctly specialised
<zozozo> also, and very sadly, this specialisation (which can be critical for performance as you saw) happens during typechecking iirc, and thus before flambda, so even inlining annotations would not help if I remember correctly
alexherbo2 has quit [Remote host closed the connection]
alexherbo2 has joined #ocaml
<adrien> ah, ok; I had tried to use high thresholds for inline without effect
<adrien> and with annotations it's still a bit slower than when the code isn't moved but it's around 15% slower and not something like 1000% slower (I don't have exact numbers because I never waited long enough)
<adrien> (otoh, it's stupid to "for ... do f () done" rather than put the loop in the function itself so that's maybe not surprising)
mbuf has quit [Quit: Leaving]
<companion_cube> ahhhh, yeah, bigarrays are dangerous
<companion_cube> (for inlining)
Haudegen has joined #ocaml
TakinOver has quit [Remote host closed the connection]
TakinOver has joined #ocaml
<adrien> I just moved the "for" inside the function and got all the performance back
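A hedged illustration of the two shapes being compared here: calling a helper once per iteration versus putting the `for` loop inside the function. The histogram workload, the `buf` type, and all function names are assumptions made up for this sketch; adrien's actual loop lives in the linked repository.

```ocaml
open Bigarray

(* Concrete bigarray type so that element accesses can be specialised. *)
type buf = (int, int8_unsigned_elt, c_layout) Array1.t

(* Shape 1: a helper handles one element; the caller owns the loop. *)
let step (a : buf) (hist : int array) i =
  let b = a.{i} in
  hist.(b) <- hist.(b) + 1

let histogram_call_per_iteration (a : buf) =
  let hist = Array.make 256 0 in
  for i = 0 to Array1.dim a - 1 do
    step a hist i
  done;
  hist

(* Shape 2: the loop itself lives inside the extracted function. *)
let histogram_loop_inside (a : buf) =
  let hist = Array.make 256 0 in
  for i = 0 to Array1.dim a - 1 do
    let b = a.{i} in
    hist.(b) <- hist.(b) + 1
  done;
  hist

let () =
  let a = Array1.of_array int8_unsigned c_layout [| 1; 1; 2; 255 |] in
  assert (histogram_call_per_iteration a = histogram_loop_inside a)
```

The second shape avoids depending on the compiler inlining `step`, which is one plausible reason it was faster in the discussion above.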
<adrien> I guess I'll try flambda again too
<adrien> but flambda is still noticeably slower even with -O3
<adrien> I'm going to improve some stuff to reduce the size of one of the intermediate files so that it's almost sane to share
alexherbo2 has quit [Remote host closed the connection]
alexherbo2 has joined #ocaml
Tuplanolla has joined #ocaml
Stumpfenstiel has joined #ocaml
alexherbo2 has quit [Remote host closed the connection]
alexherbo2 has joined #ocaml
bgs has quit [Remote host closed the connection]
wingsorc has joined #ocaml
alexherbo2 has quit [Remote host closed the connection]
John_Ivan_ has quit [Quit: Phantom of the future.]
John_Ivan has joined #ocaml
bartholin has quit [Quit: Leaving]
oriba has joined #ocaml
Serpent7776 has quit [Ping timeout: 255 seconds]
Tuplanolla has quit [Quit: Leaving.]
<zozozo> adrien: ah, the manually inlined for loop is better because it allows the compiler to unbox the reference
<zozozo> With a recursive function emulating the loop, even with inlining, the ref would not be unboxed (however, flambda2 would be able to do that)
Haudegen has quit [Ping timeout: 240 seconds]
Stumpfenstiel has quit [Ping timeout: 260 seconds]
TakinOver has quit [Ping timeout: 250 seconds]
<companion_cube> But with a for loop it would, right?
mauke_ has joined #ocaml
mauke has quit [Ping timeout: 252 seconds]
mauke_ is now known as mauke
<zozozo> Yup
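A sketch of the two loop styles discussed above, using a float accumulator as an assumed example. The function names are hypothetical. In `sum_for` the reference is local and only read and assigned, which is the case where the compiler may keep it as a plain mutable variable; in `sum_rec` the reference is captured by the closure of the recursive function. Whether unboxing actually happens depends on the compiler version and flags, so checking the generated assembly is the way to confirm it.

```ocaml
(* Local ref used only through [!] and [:=] inside a for loop. *)
let sum_for (a : float array) =
  let acc = ref 0.0 in
  for i = 0 to Array.length a - 1 do
    acc := !acc +. a.(i)
  done;
  !acc

(* Recursive function emulating the loop: [acc] is captured by [go],
   so it has to exist as a real heap-allocated reference. *)
let sum_rec (a : float array) =
  let acc = ref 0.0 in
  let rec go i =
    if i < Array.length a then begin
      acc := !acc +. a.(i);
      go (i + 1)
    end
  in
  go 0;
  !acc

let () =
  let a = [| 1.0; 2.0; 3.0 |] in
  (* Both versions add the elements in the same order. *)
  assert (sum_for a = sum_rec a)
```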