beneroth changed the topic of #picolisp to: PicoLisp language | The scalpel of software development | Channel Log: | Check for more information
alexshe35 has quit [Remote host closed the connection]
seninha has quit [Quit: Leaving]
<tankf33der> morning all
<tankf33der> still thinking about this one
<abu[7]> Hi tankf33der
<abu[7]> Does it have to be in Java? Or not?
<tankf33der> it is already implemented on different languages
<tankf33der> common lisp - 43sec
<tankf33der> go - 4s (with parallelizations)
<abu[7]> Nice task
<abu[7]> How many stations are in the file?
<tankf33der> demo file has 44k
<tankf33der> but every language generates it own 1_000_000_000 lines
<abu[7]> By repeatirg?
<tankf33der> no, random
<tankf33der> city names are repeating, yes
<abu[7]> I think using many processes is uninteresting. It becomes a matter of hardware. I would use two processes (i.e. a pipe for preprocessing)
<tankf33der> Does it increase performance?
<abu[7]> Yes
<tankf33der> Even single process version is tricky
<tankf33der> to be fast
<abu[7]> Would you use 'idx'?
<tankf33der> Yeap
<tankf33der> hard modified accu variation
<abu[7]> (idx 'X (line T) ...
<abu[7]> I would split at ";" in the pipe
<abu[7]> then (line T) and (format (line) 2)
<tankf33der> Yea
<abu[7]> Is it too slow?
<tankf33der> Still thinking, no code
<tankf33der> :)
<abu[7]> Good :)
<tankf33der> i belive in many-many minutes
<abu[7]> No hurry, just fun
<tankf33der> done
<tankf33der> output is trivial.
<tankf33der> now i will generate on 43k cities 1M row of values.
<abu[7]> I would not use 'lup'. This way you travers the tree twice each time. Better just 'idx' once and decide on the return value.
<tankf33der> ok
<abu[7]> Are the input city names sorted? If so, performance gets bad.
<abu[7]> Random order is better
<abu[7]> or, if sorted, even better because then you can use 'balance' for an optimal tree
<tankf33der> not sorted
<abu[7]> ok, fine
<abu[7]> That's easiest
<tankf33der> now i understood idx is smart. ok
<abu[7]> yeap, very convenient here
<abu[7]> Something like (ifn (idx 'Var "str
<abu[7]> oops
<abu[7]> (ifn (idx 'Var City T) (set City (list N N N)) ... inc and keep min/max
<tankf33der> aha, i am using idx wrong
<tankf33der> i felt it
<abu[7]> If the return value is NIL, you initialize the transient symbol, else you use the symbol in the CAAR of the return value (not the newly read transient symbol!) and increment etc. its value.
<tankf33der> ok
<tankf33der> prop list is ok here/
<tankf33der> prop list is ok here?
<abu[7]> I thirk too expersiwe
<abu[7]> Better a list (list N N N)
<abu[7]> for avg, max, min for example
<tankf33der> ok
<abu[7]> Then (inc L) for counter, compare (cdr L) etc.
<tankf33der> if so then (list 1 N N N) should be
seninha has joined #picolisp
<abu[7]> yeah, the count
<tankf33der> fighting
<abu[7]> So four list elements?
<tankf33der> yeap
<abu[7]> counter, sum, max, min
<tankf33der> for example, yes
<abu[7]> Perhaps better ((cnt . min) . (sum . max)) for shorter C...Rs
<tankf33der> ah
<tankf33der> next level
<abu[7]> but we have cadddr for the last
<abu[7]> so a flat list should be ok
<abu[7]> For 5 elements we would need 2 calls
<abu[7]> (cdr (cddddr))
<abu[7]> (car (cddddr L)) I mean
<tankf33der> i am stuck in idx, completly.
<tankf33der> i knew it
<abu[7]> Perhaps look at 'cache', iirc it does somthing similar
<tankf33der> what command should be next if i want (set "A" (list 1 2))
<tankf33der> ?
<tankf33der> lurkingf
<tankf33der> lurking
<abu[7]> 'off' without quote
<abu[7]> (idx 'R "A" T) is good
<tankf33der> ok
<abu[7]> (let S '"A" (if (idx 'R S T) (inc (caar @)) (set S (list 0))))
<abu[7]> Just counter
<tankf33der> running
<abu[7]> (caar @) is a shortcut for (val (car @))
<abu[7]> legal
<tankf33der> :/
<tankf33der> done
<tankf33der> how to print val of "Delhi" ?
<abu[7]> you need a sorted print, so simply (for S (idx 'R) (... (val S) ...
<abu[7]> (get Lst 2) is a little slower than (cadr L)
<tankf33der> nono, one city only
<abu[7]> The city is the symbol
<abu[7]> ah
<abu[7]> (car (idx 'R "Delhi"))
<tankf33der> it prints Delhi
<abu[7]> (caar (idx 'R "Delhi"))
<abu[7]> is the value then
<abu[7]> And instead of (put Lst 2 B) is (set (cdr Lst) B) better
<abu[7]> very little faster though ;)
<abu[7]> And if there are no empty lines, you don't need a separate (eof)
<abu[7]> (while (till ...
<abu[7]> Is the 'skip' needed? I mean do we need to support comments?
<tankf33der> :)
<abu[7]> heven more important:
<abu[7]> (put Lst 3 (+ B (get Lst 3)))
<abu[7]> (should be (inc (cddr Lst) B)
<tankf33der> ok
<abu[7]> We could also try if this is faster:
<abu[7]> (when (< B (get Lst 2)) (put Lst 2 B))
<abu[7]> (set (cdr Lst) (min (cdr Lst B)))
<abu[7]> hmm
<tankf33der> too much info i can handle all this :/
<abu[7]> Not much difference I think
<tankf33der> too much info i can noty handle all this :/
<tankf33der> too much info i can not handle all this :/
<abu[7]> No hurry :)
<abu[7]> You can try these things later to see if it gets a little faster :)
<abu[7]> I think the main bottleneck is 'till'
<tankf33der> split ?
<abu[7]> Even more expensive, as it creates a lot of garbage
<abu[7]> I would use a pipe
<abu[7]> and make two lines of each
<abu[7]> one line name, then one line value
<abu[7]> then (while (line T) (idx ..) (format (line) 4) ...
<abu[7]> (pipe (in "file" (prinl (till ..) (prinl))) (while (line T) (idx ..) (format
<tankf33der> insane.
<abu[7]> Not sure if it gets much faster
<abu[7]> but then the parsing is in the piped process
<tankf33der> two cpu's instead of one ?
<abu[7]> yeah
<abu[7]> Depends on where the bottleneck is
<abu[7]> probably it is 'idx' anyway
<tankf33der> i think so
<abu[7]> So not much gain if parsing is cheaper
<abu[7]> Better keep it simple first
<tankf33der> removed
<tankf33der> put-get
<tankf33der> 1B rows file is 13GB in size
<abu[7]> I think (set (cddr Lst) (+ B (caddr Lst))) is better (inc (cddr Lst) B)
<tankf33der> sure, forgot to replace
<abu[7]> 13 GiB means average 12 chars per line
<tankf33der> i found this info in internet
<abu[7]> reasonable
<abu[7]> ok
<tankf33der> i did not play with generation yet
<abu[7]> Which generation do you mean?
<tankf33der> there is no 1B file
<tankf33der> you must handle this yourself
<abu[7]> I understand
<tankf33der> exhausted
<abu[7]> :)
<tankf33der> thinking how to remove (eof)
<abu[7]> good
<tankf33der> ungly
teddydd has quit [Ping timeout: 268 seconds]
<abu[7]> Looks very good I think
<tankf33der> ok then. commited.
<abu[7]> 😎👍
<tankf33der> lets play on one million rows
<abu[7]> good
<abu[7]> Perhaps an initial (gc <size>) is also good to speed up a little
<abu[7]> If you check how much memory is used *after* the run with (heap)
<tankf33der> maybe instead of cities lets use numbers or random 8 bytes random string /
<tankf33der> ?
<abu[7]> I though you have a short file with the cities (?)
<tankf33der> found 78k list on github
<tankf33der> already have around 40k
<abu[7]> How did other solvers of the task do it?
<abu[7]> Concatenated the short file several times?
<tankf33der> no energy to spend time on another projects
<abu[7]> I wonder why using random strings is an option
<tankf33der> 1M - 10sec
<tankf33der> shoked
<abu[7]> uh
<tankf33der> on simple laptop
<abu[7]> Did you pre-alloc heap?
<abu[7]> So that no gc runs all the time
<tankf33der> forgot
<tankf33der> 1sec
<tankf33der> aaaa
<abu[7]> Found something?
<tankf33der> no, 1sec is final run without gc
<tankf33der> too fast
<tankf33der> :)
<abu[7]> wow
<tankf33der> trying 100m
<tankf33der> calibrating gc
<abu[7]> ok
<tankf33der> 10m - 10sec
<tankf33der> 100m - 100secs
<tankf33der> right ?
<abu[7]> yes, linear
<tankf33der> ok
<tankf33der> enough for today
<abu[7]> ok
<tankf33der> thanks
<abu[7]> Thank you too :)
<abu[7]> It all depends on how deep the idx tree gets, i.e. how many different cities there are
<abu[7]> Next time, can you print the number of citiet, and (depth R) to see how big it is and how well it is balanced?
<tankf33der> I belive uniq cities are not 1M
<tankf33der> maybe 40-80k
<abu[7]> (lenghh (idx 'R))
<abu[7]> and (depth R) should ideally be in the order of the 2-log of that
<abu[7]> i.e. if 1 M cities than 20, but typically 30 of 40 (but not 1000)
<abu[7]> s/than/then
<abu[7]> Ah, the count is in '@@' after (depth R)
<abu[7]> afp
seninha has quit [Ping timeout: 260 seconds]
seninha has joined #picolisp
clacke has quit [K-Lined]
seninha has quit [Ping timeout: 264 seconds]