00:51
alexshe35 has quit [Remote host closed the connection]
03:54
seninha has quit [Quit: Leaving]
07:17
<
tankf33der >
morning all
07:17
<
tankf33der >
still thinking about this one
07:21
<
abu[7] >
Hi tankf33der
07:23
<
abu[7] >
Does it have to be in Java? Or not?
07:25
<
tankf33der >
it is already implemented on different languages
07:26
<
tankf33der >
common lisp - 43sec
07:26
<
tankf33der >
go - 4s (with parallelizations)
07:31
<
abu[7] >
How many stations are in the file?
07:39
<
tankf33der >
demo file has 44k
07:40
<
tankf33der >
but every language generates it own 1_000_000_000 lines
07:41
<
abu[7] >
By repeatirg?
07:41
<
tankf33der >
no, random
07:42
<
tankf33der >
city names are repeating, yes
08:25
<
abu[7] >
I think using many processes is uninteresting. It becomes a matter of hardware. I would use two processes (i.e. a pipe for preprocessing)
08:26
<
tankf33der >
Does it increase performance?
08:56
<
tankf33der >
Even single process version is tricky
08:56
<
tankf33der >
to be fast
08:57
<
abu[7] >
Would you use 'idx'?
08:59
<
tankf33der >
hard modified accu variation
08:59
<
abu[7] >
(idx 'X (line T) ...
09:00
<
abu[7] >
I would split at ";" in the pipe
09:00
<
abu[7] >
then (line T) and (format (line) 2)
09:03
<
abu[7] >
Is it too slow?
09:17
<
tankf33der >
Still thinking, no code
09:18
<
tankf33der >
i belive in many-many minutes
09:19
<
abu[7] >
No hurry, just fun
13:11
<
tankf33der >
output is trivial.
13:12
<
tankf33der >
now i will generate on 43k cities 1M row of values.
13:18
<
abu[7] >
I would not use 'lup'. This way you travers the tree twice each time. Better just 'idx' once and decide on the return value.
13:27
<
abu[7] >
Are the input city names sorted? If so, performance gets bad.
13:27
<
abu[7] >
Random order is better
13:28
<
abu[7] >
or, if sorted, even better because then you can use 'balance' for an optimal tree
13:28
<
tankf33der >
not sorted
13:29
<
abu[7] >
That's easiest
13:36
<
tankf33der >
now i understood idx is smart. ok
13:40
<
abu[7] >
yeap, very convenient here
13:40
<
abu[7] >
Something like (ifn (idx 'Var "str
13:41
<
abu[7] >
(ifn (idx 'Var City T) (set City (list N N N)) ... inc and keep min/max
13:45
<
tankf33der >
aha, i am using idx wrong
13:45
<
tankf33der >
i felt it
14:01
<
abu[7] >
If the return value is NIL, you initialize the transient symbol, else you use the symbol in the CAAR of the return value (not the newly read transient symbol!) and increment etc. its value.
14:15
<
tankf33der >
prop list is ok here/
14:15
<
tankf33der >
prop list is ok here?
14:16
<
abu[7] >
I thirk too expersiwe
14:16
<
abu[7] >
Better a list (list N N N)
14:16
<
abu[7] >
for avg, max, min for example
14:17
<
abu[7] >
Then (inc L) for counter, compare (cdr L) etc.
14:18
<
tankf33der >
if so then (list 1 N N N) should be
14:18
seninha has joined #picolisp
14:18
<
abu[7] >
yeah, the count
14:18
<
tankf33der >
fighting
14:19
<
abu[7] >
So four list elements?
14:19
<
abu[7] >
counter, sum, max, min
14:19
<
tankf33der >
for example, yes
14:21
<
abu[7] >
Perhaps better ((cnt . min) . (sum . max)) for shorter C...Rs
14:22
<
tankf33der >
next level
14:22
<
abu[7] >
but we have cadddr for the last
14:22
<
abu[7] >
so a flat list should be ok
14:23
<
abu[7] >
For 5 elements we would need 2 calls
14:23
<
abu[7] >
(cdr (cddddr))
14:25
<
abu[7] >
(car (cddddr L)) I mean
14:29
<
tankf33der >
i am stuck in idx, completly.
14:29
<
tankf33der >
i knew it
14:30
<
abu[7] >
Perhaps look at 'cache', iirc it does somthing similar
14:30
<
tankf33der >
what command should be next if i want (set "A" (list 1 2))
14:31
<
tankf33der >
lurkingf
14:31
<
tankf33der >
lurking
14:31
<
abu[7] >
'off' without quote
14:31
<
abu[7] >
(idx 'R "A" T) is good
14:43
<
abu[7] >
(let S '"A" (if (idx 'R S T) (inc (caar @)) (set S (list 0))))
14:43
<
abu[7] >
Just counter
14:43
<
tankf33der >
running
14:45
<
abu[7] >
(caar @) is a shortcut for (val (car @))
15:13
<
tankf33der >
how to print val of "Delhi" ?
15:15
<
abu[7] >
you need a sorted print, so simply (for S (idx 'R) (... (val S) ...
15:15
<
abu[7] >
(get Lst 2) is a little slower than (cadr L)
15:15
<
tankf33der >
nono, one city only
15:16
<
abu[7] >
The city is the symbol
15:16
<
abu[7] >
(car (idx 'R "Delhi"))
15:17
<
tankf33der >
it prints Delhi
15:17
<
abu[7] >
(caar (idx 'R "Delhi"))
15:17
<
abu[7] >
is the value then
15:19
<
abu[7] >
And instead of (put Lst 2 B) is (set (cdr Lst) B) better
15:19
<
abu[7] >
very little faster though ;)
15:20
<
abu[7] >
And if there are no empty lines, you don't need a separate (eof)
15:20
<
abu[7] >
(while (till ...
15:21
<
abu[7] >
Is the 'skip' needed? I mean do we need to support comments?
15:24
<
abu[7] >
heven more important:
15:24
<
abu[7] >
(put Lst 3 (+ B (get Lst 3)))
15:24
<
abu[7] >
(should be (inc (cddr Lst) B)
15:26
<
abu[7] >
We could also try if this is faster:
15:26
<
abu[7] >
(when (< B (get Lst 2)) (put Lst 2 B))
15:27
<
abu[7] >
(set (cdr Lst) (min (cdr Lst B)))
15:27
<
tankf33der >
too much info i can handle all this :/
15:27
<
abu[7] >
Not much difference I think
15:27
<
tankf33der >
too much info i can noty handle all this :/
15:27
<
tankf33der >
too much info i can not handle all this :/
15:27
<
abu[7] >
No hurry :)
15:28
<
abu[7] >
You can try these things later to see if it gets a little faster :)
15:30
<
abu[7] >
I think the main bottleneck is 'till'
15:30
<
tankf33der >
split ?
15:31
<
abu[7] >
Even more expensive, as it creates a lot of garbage
15:31
<
abu[7] >
I would use a pipe
15:31
<
abu[7] >
and make two lines of each
15:31
<
abu[7] >
one line name, then one line value
15:32
<
abu[7] >
then (while (line T) (idx ..) (format (line) 4) ...
15:34
<
abu[7] >
(pipe (in "file" (prinl (till ..) (prinl))) (while (line T) (idx ..) (format
15:34
<
tankf33der >
insane.
15:34
<
abu[7] >
Not sure if it gets much faster
15:35
<
abu[7] >
but then the parsing is in the piped process
15:35
<
tankf33der >
two cpu's instead of one ?
15:35
<
abu[7] >
Depends on where the bottleneck is
15:36
<
abu[7] >
probably it is 'idx' anyway
15:36
<
tankf33der >
i think so
15:36
<
abu[7] >
So not much gain if parsing is cheaper
15:36
<
abu[7] >
Better keep it simple first
15:51
<
tankf33der >
removed
15:51
<
tankf33der >
put-get
15:52
<
tankf33der >
1B rows file is 13GB in size
15:53
<
abu[7] >
I think (set (cddr Lst) (+ B (caddr Lst))) is better (inc (cddr Lst) B)
15:53
<
tankf33der >
sure, forgot to replace
15:54
<
abu[7] >
13 GiB means average 12 chars per line
15:54
<
tankf33der >
i found this info in internet
15:54
<
abu[7] >
reasonable
15:54
<
tankf33der >
i did not play with generation yet
15:55
<
abu[7] >
Which generation do you mean?
15:55
<
tankf33der >
there is no 1B file
15:55
<
tankf33der >
you must handle this yourself
15:56
<
abu[7] >
I understand
15:56
<
tankf33der >
exhausted
15:59
<
tankf33der >
thinking how to remove (eof)
16:23
teddydd has quit [Ping timeout: 268 seconds]
16:24
<
abu[7] >
Looks very good I think
16:25
<
tankf33der >
ok then. commited.
16:26
<
tankf33der >
lets play on one million rows
16:28
<
abu[7] >
Perhaps an initial (gc <size>) is also good to speed up a little
16:28
<
abu[7] >
If you check how much memory is used
*after* the run with (heap)
16:29
<
tankf33der >
maybe instead of cities lets use numbers or random 8 bytes random string /
16:30
<
abu[7] >
I though you have a short file with the cities (?)
16:30
<
tankf33der >
found 78k list on github
16:30
<
tankf33der >
already have around 40k
16:31
<
abu[7] >
How did other solvers of the task do it?
16:31
<
abu[7] >
Concatenated the short file several times?
16:36
<
tankf33der >
no energy to spend time on another projects
16:37
<
abu[7] >
I wonder why using random strings is an option
16:45
<
tankf33der >
1M - 10sec
16:45
<
tankf33der >
shoked
16:48
<
tankf33der >
on simple laptop
16:48
<
abu[7] >
Did you pre-alloc heap?
16:48
<
abu[7] >
So that no gc runs all the time
16:49
<
tankf33der >
forgot
16:58
<
abu[7] >
Found something?
16:59
<
tankf33der >
no, 1sec is final run without gc
16:59
<
tankf33der >
too fast
17:00
<
tankf33der >
trying 100m
17:00
<
tankf33der >
calibrating gc
17:09
<
tankf33der >
10m - 10sec
17:09
<
tankf33der >
100m - 100secs
17:09
<
tankf33der >
right ?
17:10
<
abu[7] >
yes, linear
17:14
<
tankf33der >
enough for today
17:14
<
tankf33der >
thanks
17:15
<
abu[7] >
Thank you too :)
17:53
<
abu[7] >
It all depends on how deep the idx tree gets, i.e. how many different cities there are
18:01
<
abu[7] >
Next time, can you print the number of citiet, and (depth R) to see how big it is and how well it is balanced?
18:29
<
tankf33der >
I belive uniq cities are not 1M
18:29
<
tankf33der >
maybe 40-80k
18:44
<
abu[7] >
(lenghh (idx 'R))
18:45
<
abu[7] >
and (depth R) should ideally be in the order of the 2-log of that
18:45
<
abu[7] >
i.e. if 1 M cities than 20, but typically 30 of 40 (but not 1000)
18:46
<
abu[7] >
s/than/then
18:49
<
abu[7] >
Ah, the count is in '@@' after (depth R)
18:55
seninha has quit [Ping timeout: 260 seconds]
19:39
seninha has joined #picolisp
20:17
clacke has quit [K-Lined]
22:10
seninha has quit [Ping timeout: 264 seconds]