rgrinberg has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
rgrinberg has joined #ocaml
pi3ce has joined #ocaml
drobban has quit [Ping timeout: 256 seconds]
azimut has quit [Ping timeout: 240 seconds]
rgrinberg has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
drobban has joined #ocaml
rgrinberg has joined #ocaml
trev has joined #ocaml
waleee has quit [Ping timeout: 256 seconds]
migalmoreno has quit [Ping timeout: 256 seconds]
_alix has quit [Ping timeout: 276 seconds]
ymherklotz has quit [Ping timeout: 276 seconds]
brettgilio has quit [Ping timeout: 276 seconds]
sleepydog has quit [Ping timeout: 276 seconds]
henrytill has quit [Ping timeout: 256 seconds]
seeg has quit [Read error: Connection reset by peer]
soni_ has quit [Read error: Connection reset by peer]
kuruczgy has quit [Ping timeout: 260 seconds]
jakzale has quit [Read error: Connection reset by peer]
patrick has quit [Read error: Connection reset by peer]
toastal has quit [Read error: Connection reset by peer]
arya_elfren has quit [Read error: Connection reset by peer]
richardhuxton has quit [Ping timeout: 276 seconds]
philipwhite has quit [Ping timeout: 276 seconds]
immutable has quit [Ping timeout: 256 seconds]
whereiseveryone has quit [Ping timeout: 256 seconds]
Ankhers has quit [Ping timeout: 276 seconds]
pluviaq has quit [Ping timeout: 276 seconds]
b0o has quit [Ping timeout: 276 seconds]
ggb has quit [Ping timeout: 276 seconds]
rgrinberg has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
<adrien>
companion_cube: 1GB here
rgrinberg has joined #ocaml
<adrien>
one aspect is that I don't know how to combine a fast input, possibly on a bigarray or bytes, with Re and I don't know if it could be done without a fairly different API
<adrien>
so I have to read the whole strings
<adrien>
string*
<discocaml>
<darrenldl> whats the regex in question?
<discocaml>
<darrenldl> also are you reading the entire string in before scanning?
rgrinberg has quit [Client Quit]
<discocaml>
<darrenldl> i think im primarily confused by the 1GB buffer - is the file not line based?
<adrien>
the regex is '/zsys$' and yes, I'm reading the entire string first because the speed is limited by the time it takes to read the data
<adrien>
I'm actually not doing any processing in the ocaml code at the moment
rgrinberg has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
<adrien>
I can read the data 30% faster if I Unix.read to the beginning of the string (i.e. pos = 0)
<adrien>
I strace'd rg which only issues read() calls: no vmsplice (I know they can be a bit tricky to use but I'm curious)
<adrien>
I guess there are virtual memory/cache things so I need to perf that to confirm which I'll do later on unless someone beats me to it (I guess this applies to /dev/zero too)
<adrien>
(it does)
bartholin has quit [Quit: Leaving]
Tuplanolla has joined #ocaml
czy has joined #ocaml
<discocaml>
<lukstafi> Does `ctypes` & more complex C FFI work with bytecode?
azimut has quit [Remote host closed the connection]
azimut has joined #ocaml
<companion_cube>
rg is going to be extremely fast on this regex, for sure
<companion_cube>
It uses simd and aho-corasick, iirc, to process multiple bytes at a time since this is basically just substring search
whereiseveryone has joined #ocaml
immutable has joined #ocaml
sleepydog has joined #ocaml
pluviaq has joined #ocaml
patrick has joined #ocaml
jakzale has joined #ocaml
brettgilio has joined #ocaml
Ankhers has joined #ocaml
arya_elfren has joined #ocaml
soni_ has joined #ocaml
b0o has joined #ocaml
philipwhite has joined #ocaml
ymherklotz has joined #ocaml
henrytill has joined #ocaml
kuruczgy has joined #ocaml
migalmoreno has joined #ocaml
_alix has joined #ocaml
toastal has joined #ocaml
richardhuxton has joined #ocaml
seeg has joined #ocaml
ggb has joined #ocaml
philipwhite has quit [Ping timeout: 260 seconds]
<discocaml>
<darrenldl> adrien: if you handle it as a line stream, you can do pipelining for some level of parallelism (though maybe not worth it for short lines and simple regex), and also avoid huge allocation of 1GB up front
ggb has quit [Ping timeout: 260 seconds]
henrytill has quit [Ping timeout: 260 seconds]
ymherklotz has quit [Ping timeout: 260 seconds]
b0o has quit [Ping timeout: 260 seconds]
brettgilio has quit [Ping timeout: 260 seconds]
pluviaq has quit [Ping timeout: 260 seconds]
whereiseveryone has quit [Ping timeout: 260 seconds]
kuruczgy has quit [Ping timeout: 260 seconds]
Ankhers has quit [Ping timeout: 260 seconds]
jakzale has quit [Ping timeout: 260 seconds]
patrick has quit [Ping timeout: 260 seconds]
<discocaml>
<darrenldl> that being said, i have no idea where the slowdown is precisely, so maybe this doesn't speed anything up
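A minimal sketch of the line-stream approach suggested above, not adrien's actual code. It assumes the pattern `/zsys$` can be treated as a plain suffix test on each line, so it uses `String.ends_with` from the standard library instead of Re. The names `line_matches` and `count_matches` are made up for illustration, and nothing here has been benchmarked against the bulk-read variants discussed in the channel.

```ocaml
(* '/zsys$' has no metacharacters apart from the end anchor, so on a
   single line it amounts to a suffix test. *)
let line_matches (line : string) : bool =
  String.ends_with ~suffix:"/zsys" line

(* Count matching lines, reading one line at a time so that only the
   current line is held in memory. *)
let count_matches (ic : in_channel) : int =
  let rec loop n =
    match input_line ic with
    | line -> loop (if line_matches line then n + 1 else n)
    | exception End_of_file -> n
  in
  loop 0
```

Calling `count_matches stdin` would scan a pipe this way. Whether it is fast enough is a separate question: line-by-line `in_channel` reading is reported as slow later in this log.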
<adrien>
reading older docs, it uses memchr() so for this regex it would maybe/probably search for \n and memchr definitely SIMDs this
_alix has quit [Ping timeout: 260 seconds]
migalmoreno has quit [Ping timeout: 260 seconds]
soni_ has quit [Ping timeout: 260 seconds]
arya_elfren has quit [Ping timeout: 260 seconds]
seeg has quit [Ping timeout: 260 seconds]
richardhuxton has quit [Ping timeout: 260 seconds]
toastal has quit [Ping timeout: 260 seconds]
immutable has quit [Ping timeout: 260 seconds]
sleepydog has quit [Ping timeout: 260 seconds]
<adrien>
and if it does line-by-line, it doesn't have to accumulate buffers as long as it finds one \n in each
<adrien>
so I would expect something quite fast there, and I'm not trying to replicate that: I expect that part to be really fast anyway
<adrien>
my main concern was speed of reading from the pipe and then having a type that is appropriate for further processing
<adrien>
Buffer.add_channel was very slow and line-by-line in_channel was slow too IIRC but I don't have numbers for that anymore
<adrien>
I'm not trying to beat speed records for that, but mostly to understand better and see where performance boundaries lie
<discocaml>
<darrenldl> my impression of large allocation in GBs is you pay a relatively big up front cost, but it's likely my knowledge is outdated
<adrien>
I don't think the allocation of 1GB was an issue because I could see it being fast; however, dirtying all pages and moving memory repeatedly in and out of cache is likely expensive
<discocaml>
<darrenldl> yeah you're right, just tried Buffer.create in utop, just an additional second
<adrien>
I shall get hard numbers for that but a large allocation _without_ initializing the memory should be very inexpensive
<discocaml>
<darrenldl> is your code online?
<adrien>
no but it's just either Bytes.create and a recursive function that calls Unix.read while moving the offset, or Buffer.create and Buffer.add_channel
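A rough sketch of the first variant described above (a preallocated `Bytes.t` filled by a recursive function that advances the offset). This is a guess at the shape, not the code being benchmarked. To keep it free of the `unix` library, the reader is passed in as a function; `read_all` and its `~read` parameter are hypothetical names.

```ocaml
(* [read] is any function shaped like [Unix.read fd]: it writes at most
   [len] bytes into [buf] starting at [pos] and returns the number of
   bytes read, with 0 meaning end of input. *)
let read_all ~(read : bytes -> int -> int -> int) (buf : bytes) : int =
  let cap = Bytes.length buf in
  let rec go pos =
    if pos >= cap then pos (* buffer full *)
    else
      match read buf pos (cap - pos) with
      | 0 -> pos (* end of input *)
      | n -> go (pos + n)
  in
  go 0
```

With the `unix` library linked, `read_all ~read:(Unix.read Unix.stdin) (Bytes.create size)` would fill the buffer from a pipe and return the number of bytes read. The sketch stops silently once the buffer is full, so the caller has to pick `size` large enough.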
<discocaml>
<darrenldl> gotcha
<adrien>
I'll maybe publish something later on but right now computing alternatives is based on uncommenting the corresponding block of code
<discocaml>
<darrenldl> yeah this is interesting, not obvious why it's much slower
ggb has joined #ocaml
Ankhers has joined #ocaml
sleepydog has joined #ocaml
henrytill has joined #ocaml
b0o has joined #ocaml
_alix has joined #ocaml
ymherklotz has joined #ocaml
philipwhite has joined #ocaml
<discocaml>
<darrenldl> does rg do any mmap to file type of stuff?
<discocaml>
<darrenldl> oh wait you checked rg strace already, nvm
<adrien>
it's reading from a pipe so mmap is impossible
<adrien>
I still haven't tested but I think that 64KB of data fits in the CPU cache nicely
<adrien>
I tried reading more and 1MB at a time is a kind of maximum after which performance decreases
<adrien>
4KB isn't enough, 2MB is too much, and I would have to plot in-between but I should also have a machine that is otherwise silent so I get good benchmark numbers
<adrien>
too much involvement right now
ymherklotz has quit [Ping timeout: 260 seconds]
philipwhite has quit [Ping timeout: 260 seconds]
sleepydog has quit [Ping timeout: 260 seconds]
_alix has quit [Ping timeout: 276 seconds]
b0o has quit [Ping timeout: 276 seconds]
henrytill has quit [Ping timeout: 276 seconds]
Ankhers has quit [Ping timeout: 276 seconds]
ggb has quit [Ping timeout: 276 seconds]
czy has joined #ocaml
rgrinberg has joined #ocaml
fweht has quit [Quit: Connection closed for inactivity]
opus has joined #ocaml
Serpent7776 has quit [Ping timeout: 256 seconds]
a51 has joined #ocaml
bartholin has joined #ocaml
rgrinberg has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
rgrinberg has joined #ocaml
rgrinberg has quit [Client Quit]
<discocaml>
<darrenldl> reading ripgrep author's blogpost, and indeed a sliding window is used, tho i guess ill need to dig into the source code for threading details
rgrinberg has joined #ocaml
szkl has quit [Quit: Connection closed for inactivity]
motherfsck has joined #ocaml
<adrien>
I wouldn't be surprised if the code has changed quite a lot since then
famubu has joined #ocaml
<famubu>
Hi. Is there a way to use an infix operator as a prefix operator?
<discocaml>
<._null._> Not exactly as a prefix operator, but you can get it as a regular ident
<discocaml>
<._null._> Wrap it in parentheses and spaces
<famubu>
Specifically, I was trying to see if I could shorten `List.map (fun s -> "a" ^ s) str_list` to `List.map (^ "a") str_list`
<discocaml>
<._null._> That doesn't mke it much more readable
<discocaml>
<._null._> make*
<discocaml>
<Kali> you can also just use the named version (`String.cat "a"`)
<famubu>
Yeah.. `(^) "a"` doesn't really make it more readable. `String.cat` is better. Thank you.
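For reference, the three spellings discussed above side by side. `str_list` is just sample data. Note that OCaml has no operator sections, so `(^ "a")` is not valid syntax; the parenthesized operator has to come first.

```ocaml
let str_list = [ "b"; "c" ]

(* Explicit lambda: prepend "a" to every element. *)
let with_lambda = List.map (fun s -> "a" ^ s) str_list

(* The operator wrapped in parentheses is an ordinary two-argument
   function, so it can be partially applied to its left operand. *)
let with_operator = List.map (( ^ ) "a") str_list

(* Named equivalent of ( ^ ) from the String module. *)
let with_named = List.map (String.cat "a") str_list
```

All three apply the same function. Partial application only fixes the left operand, so appending (`s ^ "a"`) still needs the lambda.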
waleee has joined #ocaml
azimut has quit [Ping timeout: 240 seconds]
azimut has joined #ocaml
azimut has quit [Ping timeout: 240 seconds]
azimut has joined #ocaml
azimut has quit [Ping timeout: 240 seconds]
rgrinberg has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
azimut has joined #ocaml
neuroevolutus has joined #ocaml
czy has quit [Read error: Connection reset by peer]
czy has joined #ocaml
azimut has quit [Ping timeout: 240 seconds]
azimut has joined #ocaml
rgrinberg has joined #ocaml
<discocaml>
<ypkl> hi there! is there a way to log what the garbage collector is doing (as in all performed operations, ideally with timestamps)?
neuroevolutus has quit [Quit: Client closed]
<companion_cube>
with OCaml 5 you can, with `Runtime_events`
trev has quit [Quit: trev]
neuroevolutus has joined #ocaml
szkl has joined #ocaml
rgrinberg has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
neuroevolutus has quit [Quit: Client closed]
bartholin has quit [Quit: Leaving]
rgrinberg has joined #ocaml
azimut has quit [Ping timeout: 240 seconds]
<adrien>
you can also increase verbosity in the Gc module and timestamp externally (like with ts from moreutils)
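A small sketch of the `Gc` verbosity route, assuming the standard `Gc.control` record. The helper name `with_gc_verbose` is made up. According to the `Gc` module documentation, flag `0x01` reports the start and end of major GC cycles and `0x02` reports minor collections and major slices. Which messages are actually emitted may differ between OCaml versions.

```ocaml
(* Run [f] with the GC verbosity set to [flags], then restore the
   previous GC parameters even if [f] raises. GC messages go to stderr. *)
let with_gc_verbose (flags : int) (f : unit -> 'a) : 'a =
  let saved = Gc.get () in
  Gc.set { saved with Gc.verbose = flags };
  Fun.protect ~finally:(fun () -> Gc.set saved) f
```

Since the messages are written to stderr without timestamps, running the program as `./prog 2>&1 | ts` (with `ts` from moreutils, `./prog` being a placeholder) would add them externally. Setting `OCAMLRUNPARAM=v=0x01` in the environment should have a similar effect without changing the code.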
tizoc has joined #ocaml
<discocaml>
<tjammer> dune-release or opam-publish: Which one should I use again?
tizoc has quit [Client Quit]
<discocaml>
<rgrinberg> if you're using dune, dune-release is usually simpler. if you're using something other than dune, opam-publish is your main option
azimut has joined #ocaml
rgrinberg has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
<discocaml>
<tjammer> got you
rgrinberg has joined #ocaml
rgrinberg has quit [Client Quit]
wingsorc has joined #ocaml
<discocaml>
<regularspatula> What's with the suffixes in the js_of_ocaml change log? Eg `5.1.1 (2023-03-15) - Lille` and `5.1.0 (2023-03-07) - Otari` (the Lille and Otari parts)?