#picolisp on 2024-03-03 — irc logs at libera.irclog.whitequark.org

2021-05-27 09:06 beneroth changed the topic of #picolisp to: PicoLisp language | The scalpel of software development | Channel Log: https://libera.irclog.whitequark.org/picolisp | Check www.picolisp.com for more information

00:51 alexshe35 has quit [Remote host closed the connection]

03:54 seninha has quit [Quit: Leaving]

07:17 <tankf33der> morning all

07:17 <tankf33der> still thinking about this one

07:17 <tankf33der> https://www.morling.dev/blog/one-billion-row-challenge/

07:21 <abu[7]> Hi tankf33der

07:23 <abu[7]> Does it have to be in Java? Or not?

07:25 <tankf33der> it is already implemented on different languages

07:26 <tankf33der> common lisp - 43sec

07:26 <tankf33der> go - 4s (with parallelizations)

07:28 <abu[7]> Nice task

07:31 <abu[7]> How many stations are in the file?

07:39 <tankf33der> demo file has 44k

07:40 <tankf33der> but every language generates it own 1_000_000_000 lines

07:41 <abu[7]> By repeatirg?

07:41 <tankf33der> no, random

07:42 <tankf33der> city names are repeating, yes

08:25 <abu[7]> I think using many processes is uninteresting. It becomes a matter of hardware. I would use two processes (i.e. a pipe for preprocessing)

08:26 <tankf33der> Does it increase performance?

08:27 <abu[7]> Yes

08:56 <tankf33der> Even single process version is tricky

08:56 <tankf33der> to be fast

08:57 <abu[7]> Would you use 'idx'?

08:58 <tankf33der> Yeap

08:59 <tankf33der> hard modified accu variation

08:59 <abu[7]> (idx 'X (line T) ...

09:00 <abu[7]> I would split at ";" in the pipe

09:00 <abu[7]> then (line T) and (format (line) 2)

09:01 <tankf33der> Yea

09:03 <abu[7]> Is it too slow?

09:17 <tankf33der> Still thinking, no code

09:18 <tankf33der> :)

09:18 <abu[7]> Good :)

09:18 <tankf33der> i belive in many-many minutes

09:19 <abu[7]> No hurry, just fun

13:10 <tankf33der> done

13:10 <tankf33der> http://pb1n.de/?0a5ff0

13:11 <tankf33der> output is trivial.

13:12 <tankf33der> now i will generate on 43k cities 1M row of values.

13:18 <abu[7]> I would not use 'lup'. This way you travers the tree twice each time. Better just 'idx' once and decide on the return value.

13:22 <tankf33der> ok

13:27 <abu[7]> Are the input city names sorted? If so, performance gets bad.

13:27 <abu[7]> Random order is better

13:28 <abu[7]> or, if sorted, even better because then you can use 'balance' for an optimal tree

13:28 <tankf33der> not sorted

13:29 <abu[7]> ok, fine

13:29 <abu[7]> That's easiest

13:36 <tankf33der> now i understood idx is smart. ok

13:40 <abu[7]> yeap, very convenient here

13:40 <abu[7]> Something like (ifn (idx 'Var "str

13:40 <abu[7]> oops

13:41 <abu[7]> (ifn (idx 'Var City T) (set City (list N N N)) ... inc and keep min/max

13:45 <tankf33der> aha, i am using idx wrong

13:45 <tankf33der> i felt it

14:01 <abu[7]> If the return value is NIL, you initialize the transient symbol, else you use the symbol in the CAAR of the return value (not the newly read transient symbol!) and increment etc. its value.

14:02 <tankf33der> ok

14:15 <tankf33der> prop list is ok here/

14:15 <tankf33der> prop list is ok here?

14:16 <abu[7]> I thirk too expersiwe

14:16 <abu[7]> Better a list (list N N N)

14:16 <abu[7]> for avg, max, min for example

14:17 <tankf33der> ok

14:17 <abu[7]> Then (inc L) for counter, compare (cdr L) etc.

14:18 <tankf33der> if so then (list 1 N N N) should be

14:18 seninha has joined #picolisp

14:18 <abu[7]> yeah, the count

14:18 <tankf33der> fighting

14:19 <abu[7]> So four list elements?

14:19 <tankf33der> yeap

14:19 <abu[7]> counter, sum, max, min

14:19 <tankf33der> for example, yes

14:21 <abu[7]> Perhaps better ((cnt . min) . (sum . max)) for shorter C...Rs

14:21 <tankf33der> ah

14:22 <tankf33der> next level

14:22 <abu[7]> but we have cadddr for the last

14:22 <abu[7]> so a flat list should be ok

14:23 <abu[7]> For 5 elements we would need 2 calls

14:23 <abu[7]> (cdr (cddddr))

14:25 <abu[7]> (car (cddddr L)) I mean

14:29 <tankf33der> i am stuck in idx, completly.

14:29 <tankf33der> i knew it

14:30 <tankf33der> http://pb1n.de/?5dc369

14:30 <abu[7]> Perhaps look at 'cache', iirc it does somthing similar

14:30 <tankf33der> what command should be next if i want (set "A" (list 1 2))

14:30 <tankf33der> ?

14:31 <tankf33der> lurkingf

14:31 <tankf33der> lurking

14:31 <abu[7]> 'off' without quote

14:31 <abu[7]> (idx 'R "A" T) is good

14:31 <tankf33der> ok

14:43 <abu[7]> (let S '"A" (if (idx 'R S T) (inc (caar @)) (set S (list 0))))

14:43 <abu[7]> Just counter

14:43 <tankf33der> running

14:45 <abu[7]> (caar @) is a shortcut for (val (car @))

14:45 <abu[7]> legal

14:45 <tankf33der> :/

15:13 <tankf33der> done

15:13 <tankf33der> http://pb1n.de/?c6d7a2

15:13 <tankf33der> how to print val of "Delhi" ?

15:15 <abu[7]> you need a sorted print, so simply (for S (idx 'R) (... (val S) ...

15:15 <abu[7]> (get Lst 2) is a little slower than (cadr L)

15:15 <tankf33der> nono, one city only

15:16 <abu[7]> The city is the symbol

15:16 <abu[7]> ah

15:16 <abu[7]> (car (idx 'R "Delhi"))

15:17 <tankf33der> it prints Delhi

15:17 <abu[7]> (caar (idx 'R "Delhi"))

15:17 <abu[7]> is the value then

15:19 <abu[7]> And instead of (put Lst 2 B) is (set (cdr Lst) B) better

15:19 <abu[7]> very little faster though ;)

15:20 <abu[7]> And if there are no empty lines, you don't need a separate (eof)

15:20 <abu[7]> (while (till ...

15:21 <abu[7]> Is the 'skip' needed? I mean do we need to support comments?

15:22 <tankf33der> :)

15:24 <abu[7]> heven more important:

15:24 <abu[7]> (put Lst 3 (+ B (get Lst 3)))

15:24 <abu[7]> (should be (inc (cddr Lst) B)

15:24 <tankf33der> ok

15:26 <abu[7]> We could also try if this is faster:

15:26 <abu[7]> (when (< B (get Lst 2)) (put Lst 2 B))

15:27 <abu[7]> (set (cdr Lst) (min (cdr Lst B)))

15:27 <abu[7]> hmm

15:27 <tankf33der> too much info i can handle all this :/

15:27 <abu[7]> Not much difference I think

15:27 <tankf33der> too much info i can noty handle all this :/

15:27 <tankf33der> too much info i can not handle all this :/

15:27 <abu[7]> No hurry :)

15:28 <abu[7]> You can try these things later to see if it gets a little faster :)

15:30 <abu[7]> I think the main bottleneck is 'till'

15:30 <tankf33der> split ?

15:31 <abu[7]> Even more expensive, as it creates a lot of garbage

15:31 <abu[7]> I would use a pipe

15:31 <abu[7]> and make two lines of each

15:31 <abu[7]> one line name, then one line value

15:32 <abu[7]> then (while (line T) (idx ..) (format (line) 4) ...

15:34 <abu[7]> (pipe (in "file" (prinl (till ..) (prinl))) (while (line T) (idx ..) (format

15:34 <tankf33der> insane.

15:34 <abu[7]> Not sure if it gets much faster

15:35 <abu[7]> but then the parsing is in the piped process

15:35 <tankf33der> two cpu's instead of one ?

15:35 <abu[7]> yeah

15:35 <abu[7]> Depends on where the bottleneck is

15:36 <abu[7]> probably it is 'idx' anyway

15:36 <tankf33der> i think so

15:36 <abu[7]> So not much gain if parsing is cheaper

15:36 <abu[7]> Better keep it simple first

15:51 <tankf33der> removed

15:51 <tankf33der> put-get

15:51 <tankf33der> http://pb1n.de/?c46f6b

15:52 <tankf33der> 1B rows file is 13GB in size

15:53 <abu[7]> I think (set (cddr Lst) (+ B (caddr Lst))) is better (inc (cddr Lst) B)

15:53 <tankf33der> sure, forgot to replace

15:54 <abu[7]> 13 GiB means average 12 chars per line

15:54 <tankf33der> i found this info in internet

15:54 <abu[7]> reasonable

15:54 <abu[7]> ok

15:54 <tankf33der> i did not play with generation yet

15:55 <abu[7]> Which generation do you mean?

15:55 <tankf33der> there is no 1B file

15:55 <tankf33der> you must handle this yourself

15:56 <abu[7]> I understand

15:56 <tankf33der> exhausted

15:57 <abu[7]> :)

15:59 <tankf33der> thinking how to remove (eof)

15:59 <abu[7]> good

16:17 <tankf33der> ungly

16:23 <tankf33der> http://pb1n.de/?978e9e

16:23 teddydd has quit [Ping timeout: 268 seconds]

16:24 <abu[7]> Looks very good I think

16:25 <tankf33der> ok then. commited.

16:25 <abu[7]> 😎👍

16:26 <tankf33der> lets play on one million rows

16:26 <abu[7]> good

16:28 <abu[7]> Perhaps an initial (gc <size>) is also good to speed up a little

16:28 <abu[7]> If you check how much memory is used *after* the run with (heap)

16:29 <tankf33der> maybe instead of cities lets use numbers or random 8 bytes random string /

16:29 <tankf33der> ?

16:30 <abu[7]> I though you have a short file with the cities (?)

16:30 <tankf33der> found 78k list on github

16:30 <tankf33der> already have around 40k

16:31 <abu[7]> How did other solvers of the task do it?

16:31 <abu[7]> Concatenated the short file several times?

16:36 <tankf33der> no energy to spend time on another projects

16:37 <abu[7]> I wonder why using random strings is an option

16:45 <tankf33der> 1M - 10sec

16:45 <tankf33der> shoked

16:46 <abu[7]> uh

16:48 <tankf33der> on simple laptop

16:48 <abu[7]> Did you pre-alloc heap?

16:48 <abu[7]> So that no gc runs all the time

16:49 <tankf33der> forgot

16:50 <tankf33der> 1sec

16:50 <tankf33der> aaaa

16:58 <abu[7]> Found something?

16:59 <tankf33der> no, 1sec is final run without gc

16:59 <tankf33der> too fast

16:59 <tankf33der> :)

16:59 <abu[7]> wow

17:00 <tankf33der> trying 100m

17:00 <tankf33der> calibrating gc

17:00 <abu[7]> ok

17:09 <tankf33der> 10m - 10sec

17:09 <tankf33der> 100m - 100secs

17:09 <tankf33der> right ?

17:10 <abu[7]> yes, linear

17:10 <tankf33der> ok

17:14 <tankf33der> enough for today

17:14 <abu[7]> ok

17:14 <tankf33der> thanks

17:15 <abu[7]> Thank you too :)

17:53 <abu[7]> It all depends on how deep the idx tree gets, i.e. how many different cities there are

18:01 <abu[7]> Next time, can you print the number of citiet, and (depth R) to see how big it is and how well it is balanced?

18:29 <tankf33der> I belive uniq cities are not 1M

18:29 <tankf33der> maybe 40-80k

18:44 <abu[7]> (lenghh (idx 'R))

18:45 <abu[7]> and (depth R) should ideally be in the order of the 2-log of that

18:45 <abu[7]> i.e. if 1 M cities than 20, but typically 30 of 40 (but not 1000)

18:46 <abu[7]> s/than/then

18:49 <abu[7]> Ah, the count is in '@@' after (depth R)

18:50 <abu[7]> afp

18:55 seninha has quit [Ping timeout: 260 seconds]

19:39 seninha has joined #picolisp

20:17 clacke has quit [K-Lined]

22:10 seninha has quit [Ping timeout: 264 seconds]