companion_cube changed the topic of #ocaml to: Discussion about the OCaml programming language | http://www.ocaml.org | OCaml 5.0 released(!!1!): https://ocaml.org/releases/5.0.0.html | Try OCaml in your browser: https://try.ocamlpro.com | Public channel logs at https://libera.irclog.whitequark.org/ocaml/
pieguy128 has quit [Quit: ZNC 1.8.2 - https://znc.in]
pieguy128 has joined #ocaml
waleee has joined #ocaml
Soni has quit [Ping timeout: 255 seconds]
pieguy128 has quit [Quit: ZNC 1.8.2 - https://znc.in]
pieguy128 has joined #ocaml
hsw has quit [Remote host closed the connection]
hsw has joined #ocaml
Soni has joined #ocaml
chrisz has quit [Ping timeout: 255 seconds]
chrisz has joined #ocaml
barak has joined #ocaml
barak has quit [Remote host closed the connection]
barak has joined #ocaml
pieguy128 has quit [Ping timeout: 252 seconds]
pieguy128 has joined #ocaml
barak has quit [Ping timeout: 248 seconds]
barak has joined #ocaml
szkl has quit [Quit: Connection closed for inactivity]
barak has quit [Ping timeout: 265 seconds]
spip has quit [Quit: Konversation terminated!]
barak has joined #ocaml
Haudegen has joined #ocaml
mbuf has joined #ocaml
motherfsck has joined #ocaml
barak has quit [Remote host closed the connection]
barak has joined #ocaml
barak_ has joined #ocaml
barak has quit [Ping timeout: 246 seconds]
bgs has joined #ocaml
Serpent7776 has joined #ocaml
wingsorc__ has quit [Quit: Leaving]
barak_ has quit [Remote host closed the connection]
barak_ has joined #ocaml
barak_ has quit [Remote host closed the connection]
barak_ has joined #ocaml
Techcable has quit [Ping timeout: 250 seconds]
waleee has quit [Quit: WeeChat 3.8]
barak__ has joined #ocaml
barak_ has quit [Ping timeout: 252 seconds]
bartholin has joined #ocaml
bgs has quit [Remote host closed the connection]
barak__ has quit [Ping timeout: 256 seconds]
barak has joined #ocaml
famubu has quit [Ping timeout: 248 seconds]
barak_ has joined #ocaml
barak has quit [Ping timeout: 250 seconds]
Techcable has joined #ocaml
szkl has joined #ocaml
mro has joined #ocaml
olle has joined #ocaml
azimut has quit [Ping timeout: 255 seconds]
bartholin has quit [Quit: Leaving]
barak_ has quit [Ping timeout: 240 seconds]
<adrien> that's probably the epilogue of my memory optimization journey for this: I think the intervals in the interval tree were only 2 or 3 bytes long; that's pretty terrible and the interval tree added a huge cost for tiny savings
<adrien> I switched to a bigarray that's the same length as the file and holds Int32; that makes the usage 4*input_size and the algorithm simpler (although I wouldn't have arrived at that algorithm without first using BatIMap)
<adrien> and I populate a map of BatISet.t as before so that I can easily compute their intersection
<adrien> I was kind of reluctant to actively share code for which the usage instruction started like "step 0: get 64GB of RAM"
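The 4*input_size figure above follows directly from storing one Int32 per input byte. A minimal sketch, assuming the standard library's Bigarray module; the name `make_cells` and the zero fill are illustrative and not taken from adrien's code:

```ocaml
open Bigarray

(* Hypothetical helper: one Int32 cell per input byte.
   [input_size] is the length of the input file in bytes. *)
let make_cells input_size =
  let cells = Array1.create int32 c_layout input_size in
  Array1.fill cells 0l;
  cells

let () =
  let input_size = 1_000 in
  let cells = make_cells input_size in
  (* Each cell takes 4 bytes, hence 4 * input_size overall. *)
  Printf.printf "%d cells, %d bytes\n"
    (Array1.dim cells) (Array1.size_in_bytes cells)
```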
kakadu has joined #ocaml
spip has joined #ocaml
<olle> adrien: What about moving data to database? Not possible?
<adrien> well, for starters I don't need to optimize things further because usage is down to a few gigabytes at most (and I might have some low-hanging fruit which is now more noticeable)
<adrien> but more importantly, I need to loop over the array in a quadratic fashion
waleee has joined #ocaml
<olle> Pre-fetch the parts you need from db before you arrive at them? :)
<olle> But yeah, few gigs is ok then
<adrien> one possible improvement is to split that bigarray into several smaller ones (up to 6000 with my current testcases, but of uneven sizes)
<adrien> which would allow moving 90% of that into cold storage
<adrien> (cold-ish but it could be refined)
<olle> Yeah, but doesn't sound necessary anymore
<adrien> I'll probably be working on parallelization next
<adrien> but not this month
Haudegen has quit [Ping timeout: 276 seconds]
<adrien> but I'll add a progress indicator
<discocaml> <darrenldl> adrien: i'm going to ask a silly question which you may have already answered long ago - is the repo online?
<companion_cube> adrien: what was the max integer? Could bitvectors work?
mro has quit [Remote host closed the connection]
mro has joined #ocaml
waleee has quit [Ping timeout: 240 seconds]
waleee has joined #ocaml
waleee has quit [Quit: WeeChat 3.8]
szkl has quit [Quit: Connection closed for inactivity]
waleee has joined #ocaml
<adrien> companion_cube: for intersection of BatISet.t? that's possible, in any case I'll probably do something for performance fairly soon because O(n²) is too painful on one of my tests and counting the number of elements in both sets can be much faster
<companion_cube> I guess you could also give a try to https://roaringbitmap.org/
<companion_cube> bitvectors, only, work better for a large set of values
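The bitvector idea for counting the elements common to two sets can be sketched as below. This is only an illustration under stated assumptions: the bitvector is a hand-rolled one backed by `Bytes`, all function names (`bv_create`, `bv_set`, `popcount8`, `inter_cardinal`) are hypothetical, and it is neither BatISet nor a roaring bitmap.

```ocaml
(* Hypothetical minimal bitvector: bit [i] lives in byte [i / 8]. *)
let bv_create n = Bytes.make ((n + 7) / 8) '\000'

let bv_set bv i =
  let byte = Char.code (Bytes.get bv (i lsr 3)) in
  Bytes.set bv (i lsr 3) (Char.chr (byte lor (1 lsl (i land 7))))

(* Number of set bits in a value in the range 0..255. *)
let popcount8 x =
  let rec go x acc = if x = 0 then acc else go (x land (x - 1)) (acc + 1) in
  go x 0

(* Size of the intersection: AND the two vectors byte by byte
   and count the surviving bits, without building a result set. *)
let inter_cardinal a b =
  let n = min (Bytes.length a) (Bytes.length b) in
  let count = ref 0 in
  for i = 0 to n - 1 do
    let x = Char.code (Bytes.get a i) land Char.code (Bytes.get b i) in
    count := !count + popcount8 x
  done;
  !count
```

The cost is proportional to the largest possible value rather than to the number of elements, which is why this only pays off when the sets are dense enough.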
<adrien> and the code is at https://gitlab.com/adrien-n/compsort/ but it's completely lacking documentation, is ugly, contains various weird things/attempts to limit memory usage, and you can still see the first version was shell script
<adrien> oh, I could look at them indeed; I remember seeing them a few months ago and thinking I definitely didn't have a use for them :P
<adrien> but I might not have concerns about memory usage here
<adrien> the first step is a modified run of "xz" which takes a lot of memory (10 * input_size) and if my program doesn't use more, it's not going to be a bottleneck
<adrien> (a future change is to implement a match finder; this might reduce the memory usage, in which case it might make sense to revisit other steps)
waleee has quit [Ping timeout: 246 seconds]
Anarchos has joined #ocaml
<adrien> and basically, the program is about optimizing the order of files in archives that are to be compressed; there can be some significant gains and for distributions, it is an operation that can be done very infrequently
barak has joined #ocaml
<adrien> companion_cube: but roaring bitmaps could reduce memory usage by a lot; I'll have to compare the CPU time
<companion_cube> reducing memory => reducing time, often
<adrien> yup but in this case I think I need a bigarray of the same size as my first bigarray, then for a small-ish range of the first bigarray, store 1 in corresponding cells of the second bigarray, then for another small-ish range of the first bigarray, look for 1s in corresponding cells of the second bigarray and count how many matches there are
<adrien> that's 2*k where k is the average size of files in the archive, which is fairly small
<adrien> and overall that should be like going through one of these bigarrays twice only
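The mark-then-count scheme described above might look roughly like the following. It is a sketch under assumptions that the log does not confirm: the values stored in the first bigarray are taken to be valid indices into the second one, the second one uses 8-bit cells, and the names `count_shared`, `ids` and `marks` are hypothetical.

```ocaml
open Bigarray

(* Assumption: every value in [ids] is a valid index into [marks],
   and [marks] is all zeroes on entry. *)
let count_shared
    (ids : (int32, int32_elt, c_layout) Array1.t)
    (marks : (int, int8_unsigned_elt, c_layout) Array1.t)
    ~a_pos ~a_len ~b_pos ~b_len =
  (* Step 1: store 1 in the cells corresponding to range A. *)
  for i = a_pos to a_pos + a_len - 1 do
    marks.{Int32.to_int ids.{i}} <- 1
  done;
  (* Step 2: count the positions of range B whose cell is marked. *)
  let count = ref 0 in
  for i = b_pos to b_pos + b_len - 1 do
    if marks.{Int32.to_int ids.{i}} = 1 then incr count
  done;
  (* Step 3: clear only the cells touched in step 1, so that [marks]
     can be reused for the next pair of ranges. *)
  for i = a_pos to a_pos + a_len - 1 do
    marks.{Int32.to_int ids.{i}} <- 0
  done;
  !count
```

Each call touches about 2*a_len + b_len cells instead of the whole array. A value that appears several times in range B is counted once per occurrence in this sketch.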
mro has quit [Quit: Leaving...]
Myrl-saki has quit [Ping timeout: 265 seconds]
Anarchos has quit [Quit: Vision[]: i've been blurred!]
chrisz has quit [Ping timeout: 248 seconds]
barak has quit [Ping timeout: 246 seconds]
chrisz has joined #ocaml
motherfsck has quit [Ping timeout: 255 seconds]
motherfsck has joined #ocaml
Haudegen has joined #ocaml
olle has quit [Ping timeout: 248 seconds]
Anarchos has joined #ocaml
motherfsck has quit [Ping timeout: 248 seconds]
oriba has joined #ocaml
berberman_ has quit [Ping timeout: 256 seconds]
Exa has quit [Quit: see ya!]
joseemds has joined #ocaml
mbuf has quit [Quit: Leaving]
joseemds has quit [Client Quit]
Anarchos has quit [Quit: Vision[]: i've been blurred!]
Serpent7776 has quit [Quit: leaving]
bartholin has joined #ocaml
berberman has joined #ocaml
alexherbo2 has joined #ocaml
<discocaml> <darrenldl> oh wew, getting started with eio takes quite a bit of elbow grease
Exa has joined #ocaml
szkl has joined #ocaml
waleee has joined #ocaml
alexherbo2 has quit [Ping timeout: 260 seconds]
alexherbo2 has joined #ocaml
olle has joined #ocaml
Stumpfenstiel has joined #ocaml
olle has quit [Ping timeout: 240 seconds]
barak has joined #ocaml
gentauro has quit [Read error: Connection reset by peer]
dstein64 has quit [Ping timeout: 255 seconds]
gentauro has joined #ocaml
Tuplanolla has joined #ocaml
oriba has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
bartholin has quit [Quit: Leaving]
barak has quit [Remote host closed the connection]
alexherbo2 has quit [Remote host closed the connection]
barak has joined #ocaml
Haudegen has quit [Ping timeout: 248 seconds]
Tuplanolla has quit [Quit: Leaving.]
Stumpfenstiel has quit [Ping timeout: 276 seconds]
barak has quit [Remote host closed the connection]
barak has joined #ocaml
barak has quit [Ping timeout: 248 seconds]
barak_ has joined #ocaml
barak has joined #ocaml
barak_ has quit [Read error: Connection reset by peer]
noxp has joined #ocaml
barak has quit [Ping timeout: 260 seconds]