companion_cube changed the topic of #ocaml to: Discussion about the OCaml programming language | http://www.ocaml.org | OCaml 5.0 released(!!1!): https://ocaml.org/releases/5.0.0.html | Try OCaml in your browser: https://try.ocamlpro.com | Public channel logs at https://libera.irclog.whitequark.org/ocaml/
oriba has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
Haudegen has quit [Ping timeout: 240 seconds]
<discocaml> <darrenldl> can you do any lightweight compression of data?
chrisz has quit [Ping timeout: 240 seconds]
chrisz has joined #ocaml
terrorjack has quit [Quit: The Lounge - https://thelounge.chat]
terrorjack has joined #ocaml
wingsorc has quit [Remote host closed the connection]
wingsorc has joined #ocaml
Haudegen has joined #ocaml
spip has quit [Quit: Konversation terminated!]
bgs has joined #ocaml
motherfsck has quit [Quit: quit]
Serpent7776 has joined #ocaml
bartholin has joined #ocaml
m5zs7k has quit [Ping timeout: 240 seconds]
m5zs7k has joined #ocaml
Serpent7776 has quit [Ping timeout: 240 seconds]
mro has joined #ocaml
wingsorc__ has joined #ocaml
wingsorc has quit [Ping timeout: 265 seconds]
bartholin has quit [Quit: Leaving]
Serpent7776 has joined #ocaml
olle has joined #ocaml
xd1le has joined #ocaml
<adrien> for each byte of [0..200M] I might currently store int*int*int; I'm on a 64-bit machine, so that means at least 24 bytes per entry and therefore at least 10GB of data. This could be int32*int32*int16, which would use roughly half the space, except I don't know if that would work due to boxing
<octachron> With boxing? Are you thinking of using Int32.t? This is not the right solution: you have to do the packing yourself. With Bytes and a custom operator that should not be too painful.
<adrien> I've also managed to bring the memory use down to 17GB or less (or 14GB but at a much higher CPU cost) by avoiding loading up all the data upfront; that data didn't use that much memory, but it had to be kept around during the most intensive operation
<adrien> ok, thanks, that's what I thought; I'll probably get around to doing it at some point today
<adrien> indeed, I think it's going to be doable because I only have a few places that need to be changed (but I need to clean the surrounding code first)
<adrien> and a possible subsequent optimization would be to re-implement BatIMap on top of a fixed-size array
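The "packing yourself with Bytes and a custom operator" suggestion above could look roughly like the sketch below. It is only an illustration, not adrien's actual code: the 10-byte little-endian layout (int32, int32, uint16), the names `record_size`, `create`, `get` and `set`, and the choice of `.%{}` as the indexing operator are all assumptions. It relies on the `Bytes.set_int32_le` family from the standard library (OCaml 4.08 or later).

```ocaml
(* Hypothetical layout, 10 bytes per record, little-endian:
   bytes 0-3: first int as int32
   bytes 4-7: second int as int32
   bytes 8-9: third int as unsigned 16-bit *)
let record_size = 10

(* One flat, unboxed buffer holding [n] zero-initialised records. *)
let create n = Bytes.make (n * record_size) '\000'

let get buf i =
  let off = i * record_size in
  ( Int32.to_int (Bytes.get_int32_le buf off),
    Int32.to_int (Bytes.get_int32_le buf (off + 4)),
    Bytes.get_uint16_le buf (off + 8) )

(* Note: values outside the int32 / uint16 ranges are silently truncated. *)
let set buf i (a, b, c) =
  let off = i * record_size in
  Bytes.set_int32_le buf off (Int32.of_int a);
  Bytes.set_int32_le buf (off + 4) (Int32.of_int b);
  Bytes.set_uint16_le buf (off + 8) c

(* Custom indexing operators so call sites read like array accesses. *)
let ( .%{} ) buf i = get buf i
let ( .%{}<- ) buf i v = set buf i v

let () =
  let buf = create 4 in
  buf.%{2} <- (1 lsl 30, 123456, 273);
  let a, b, c = buf.%{2} in
  Printf.printf "%d %d %d\n" a b c
```

No `Int32.t` value is ever stored here: the boxed int32 only exists transiently while reading or writing, so the per-record cost stays at 10 bytes.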
kakadu has joined #ocaml
azimut has quit [Ping timeout: 255 seconds]
waleee has quit [Quit: WeeChat 3.8]
waleee has joined #ocaml
<adrien> my type is "type t = | Literal | Match of int * int * int | Padding"; shall I re-encode everything or only the int*int*int part?
<octachron> How is the data stored?
<octachron> A first change might be to replace `Match of int * int * int` by `Match of int * int`.
<adrien> Batteries' BatIMap, which I think is an AVL-tree
<adrien> I've packed the three ints into 10 bytes and it's currently running; unfortunately the first two ints can easily reach 2^30
<adrien> the last one is <= 273
<adrien> (this is LZMA's match finder btw)
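A quick check of the bit budget behind the 10-byte figure, as a sketch: the bounds are the ones quoted above (two values that can reach 2^30, one value <= 273), and the helper `bits_needed` is a hypothetical name introduced here.

```ocaml
(* Number of bits needed to represent every value in 0..n. *)
let bits_needed n =
  let rec go acc n = if n = 0 then acc else go (acc + 1) (n lsr 1) in
  go 0 n

(* Two fields reaching 2^30, one field bounded by 273. *)
let total_bits =
  bits_needed (1 lsl 30) + bits_needed (1 lsl 30) + bits_needed 273

(* Does the whole record fit in one unboxed native int? *)
let fits_in_native_int = total_bits <= Sys.int_size

(* Smallest whole number of bytes for a tightly bit-packed record. *)
let min_bytes = (total_bits + 7) / 8

let () =
  Printf.printf "bits: %d, fits in a native int: %b, min bytes: %d\n"
    total_bits fits_in_native_int min_bytes
```

Under those bounds the record needs 31 + 31 + 9 = 71 bits, which is more than the 63 bits of a native int on a 64-bit machine, so a single unboxed int cannot hold it. A tight bit-packing would take 9 bytes; the byte-aligned 4 + 4 + 2 layout costs one byte more.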
mro has quit [Remote host closed the connection]
mro has joined #ocaml
<octachron> Hm, it should be possible to compress further by storing native ints in the map to represent either Literal | Padding or an offset in a bytes array, which avoids the block header.
<adrien> I would only have the array after I've built the map I think
<adrien> also, I messed up and I'm currently using int*int*int where all of them are offsets; I wanted to switch to what amounts to "pos, pos+len, other_pos" but I haven't done it yet and it requires a bit of care (lzma offsets are not really 0-based)
<adrien> and I have a concern with the array: I'm not sure I would be able to allocate one that is large enough
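One way to read octachron's suggestion, as a minimal sketch: the map would hold plain ints, with negative codes for the constant cases and non-negative codes indexing 10-byte records in one shared Bytes buffer. The type `t` is the one quoted above; everything else (`literal_code`, `padding_code`, the `store` record, the pre-sized buffer) is an assumption made for illustration. On the allocation concern: on a 64-bit runtime `Sys.max_string_length` is far above a few gigabytes, so the limit would come from available memory rather than from OCaml itself.

```ocaml
type t = Literal | Match of int * int * int | Padding

(* Hypothetical codes: negative ints for the constant cases,
   non-negative ints for a record index into [store.data]. *)
let literal_code = -1
let padding_code = -2
let record_size = 10

type store = { data : Bytes.t; mutable next : int }

(* Pre-sized store; a growable buffer would be needed if the
   number of matches is not known in advance. *)
let create_store capacity =
  { data = Bytes.create (capacity * record_size); next = 0 }

(* Returns the int to put in the map. Raises Invalid_argument
   (from Bytes) if the store is already full. *)
let encode store = function
  | Literal -> literal_code
  | Padding -> padding_code
  | Match (a, b, c) ->
    let idx = store.next in
    let off = idx * record_size in
    Bytes.set_int32_le store.data off (Int32.of_int a);
    Bytes.set_int32_le store.data (off + 4) (Int32.of_int b);
    Bytes.set_uint16_le store.data (off + 8) c;
    store.next <- idx + 1;
    idx

(* Rebuilds the variant on demand; the Match block is short-lived. *)
let decode store code =
  if code = literal_code then Literal
  else if code = padding_code then Padding
  else
    let off = code * record_size in
    Match
      ( Int32.to_int (Bytes.get_int32_le store.data off),
        Int32.to_int (Bytes.get_int32_le store.data (off + 4)),
        Bytes.get_uint16_le store.data (off + 8) )
```

Because records are appended as matches are discovered, the buffer can be filled while the map is being built rather than afterwards, provided an upper bound on the number of matches is known.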
Haudegen has quit [Ping timeout: 250 seconds]
<adrien> I tried with 12 bytes and I'm not getting a lower memory usage (but I'm getting assert failures :) )
<adrien> I'll stop there for now
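A possible explanation for 12 bytes not using less memory than 10, assuming (this is not stated in the log) that each packed record is its own Bytes or string value stored in the map: a heap-allocated byte sequence costs one header word plus its payload rounded up to whole words, with at least one padding byte. The sketch below computes that size; `bytes_block_words` and `tuple_block_words` are hypothetical helper names.

```ocaml
let word_bytes = Sys.word_size / 8

(* Heap words for a Bytes/string of length [n]:
   one header word, plus the payload rounded up so that
   at least one padding byte is always present. *)
let bytes_block_words n = 1 + (n / word_bytes + 1)

(* Heap words for a block with [fields] immediate fields,
   such as Match of int * int * int: one header word plus the fields. *)
let tuple_block_words fields = 1 + fields

let () =
  List.iter
    (fun n ->
      Printf.printf "Bytes of length %2d: %d bytes on the heap\n"
        n (bytes_block_words n * word_bytes))
    [ 8; 10; 12; 15; 16 ];
  Printf.printf "int * int * int block: %d bytes on the heap\n"
    (tuple_block_words 3 * word_bytes)
```

On a 64-bit machine, every length from 8 to 15 bytes occupies the same 24 bytes, against 32 bytes for the three-int block. Under that assumption, per-record packing saves only a quarter of the space, and moving between 10 and 12 bytes changes nothing; a single shared buffer avoids the per-record header and padding entirely.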
wingsorc__ has quit [Ping timeout: 255 seconds]
spip has joined #ocaml
mro has quit [Remote host closed the connection]
mro has joined #ocaml
emp_ has quit [Ping timeout: 252 seconds]
John_Ivan has quit [Remote host closed the connection]
John_Ivan has joined #ocaml
emp has joined #ocaml
mro_ has joined #ocaml
mro has quit [Ping timeout: 240 seconds]
mro_ has quit [Remote host closed the connection]
mro has joined #ocaml
neiluj has joined #ocaml
mro has quit [Remote host closed the connection]
<neiluj> hi!
mro has joined #ocaml
Geekingfrog has quit [Quit: ZNC 1.8.2 - https://znc.in]
mro has quit [Ping timeout: 246 seconds]
Geekingfrog has joined #ocaml
Serpent7776 has quit [Quit: WeeChat 1.9.1]
bartholin has joined #ocaml
azimut has joined #ocaml
notnotdan has quit [Quit: bye]
szkl has joined #ocaml
Serpent7776 has joined #ocaml
neiluj has quit [Quit: WeeChat 3.7.1]
terrorjack has quit [Quit: The Lounge - https://thelounge.chat]
terrorjack has joined #ocaml
bgs has quit [Remote host closed the connection]
olle has quit [Ping timeout: 246 seconds]
waleee has quit [Quit: WeeChat 3.8]
waleee has joined #ocaml
bartholin has quit [Quit: Leaving]
mro has joined #ocaml
Tuplanolla has joined #ocaml
alexherbo2 has joined #ocaml
szkl has quit [Quit: Connection closed for inactivity]
gdd has quit [Ping timeout: 255 seconds]
azimut_ has joined #ocaml
azimut has quit [Ping timeout: 255 seconds]
John_Ivan has quit [Quit: Phantom of the future.]
mro has quit [Quit: Leaving...]
alexherbo2 has quit [Remote host closed the connection]
olle has joined #ocaml
Serpent7776 has quit [Ping timeout: 255 seconds]
John_Ivan has joined #ocaml
olle has quit [Ping timeout: 240 seconds]
terrorjack has quit [Quit: The Lounge - https://thelounge.chat]
terrorjack has joined #ocaml
terrorjack has quit [Quit: The Lounge - https://thelounge.chat]
terrorjack has joined #ocaml
Tuplanolla has quit [Quit: Leaving.]
wingsorc has joined #ocaml
czy has quit [Remote host closed the connection]