<
tankf33der>
Morning
<
abu[m]>
Good morning tankf33der!
<
tankf33der>
Now I think we should fix the fork issue on Solaris, because it makes the Linux platform more correct too. Double win.
<
abu[m]>
Also because it may be a hidden bug in Pil21 (as the other cases yesterday)
<
abu[m]>
Something is wrong with the global '$Child'
<
abu[m]>
in the parent fork
<
tankf33der>
When can we start?
<
abu[m]>
: (vi 'llvm~forkLisp)
<
abu[m]>
Probably it is wrong already after the first fork
<
tankf33der>
rebuilding with latest
<
tankf33der>
bug is here
<
abu[m]>
Crash on second fork?
<
tankf33der>
crash on the second run of this:
<
tankf33der>
(unless (fork) (wait 60000) (bye))
<
abu[m]>
Yesterday it crashed here: (Cld: buf null) # No buffer yet
<
tankf33der>
remember
<
abu[m]>
So Cld: is probably null
<
tankf33der>
pil21 passed all tests on linux 6.0
<
tankf33der>
pil21 passed all tests on kernel linux 6.0
<
tankf33der>
Do you need me to debug something on Solaris?
<
abu[m]>
Yes, I have no idea what goes wrong
<
abu[m]>
I'm back in 30 minutes. Then I can send test dbg's
<
abu[m]>
Back in 20 min
<
tankf33der>
Back in 1h
<
abu[m]>
ok, so (Cld:) is not null
<
abu[m]>
and the place of crash is the same
<
abu[m]>
reproducible
<
abu[m]>
Size 28 (difference between the two pointers) is also correct for 'child' structure
<
abu[m]>
But (Cld: buf null) fails
<
abu[m]>
This just sets the first field in the struct to null
<
abu[m]>
Perhaps something wrong with alloc()?
<
tankf33der>
easy to test
<
abu[m]>
This prints the size it tries to alloc
<
abu[m]>
I see "224 N"
<
abu[m]>
on the first call
<
abu[m]>
The second call does not need to allocate then
<
abu[m]>
The crash makes no sense then
<
abu[m]>
Can you inspect the core dump and paste the instruction?
<
tankf33der>
what command in gdb ?
<
abu[m]>
yes, gdb on the core dump
<
abu[m]>
I need the instruction which caused the crash
<
abu[m]>
Perhaps "l" gives a listing of the current position?
<
abu[m]>
or some "print $pc" (I forgot the syntax)
<
abu[m]>
"x" command perhaps? With program counter register
<
tankf33der>
(gdb) print $pc
<
tankf33der>
$1 = (void (*)()) 0x100058dc8 <forkLisp+584>
<
abu[m]>
Getting close!
<
abu[m]>
We know the position 584
<
abu[m]>
So please do "disas forkLisp" (very long)
<
abu[m]>
Somehow redirect to file and paste ;)
<
abu[m]>
For "disas" you have to keep pressing Enter for each page
<
tankf33der>
downloading latest possible patches for solaris 11 os
<
tankf33der>
upgrade will end in several days
<
abu[m]>
Good, <+584>: clrx [ %i4 + %i5 ]
<
abu[m]>
Looks like the right place, stores null
<
abu[m]>
I cannot see any reason why it crashes
<
tankf33der>
good you checked
<
abu[m]>
Strange it gives a bus error
<
abu[m]>
'dbg' output looks like a valid address
<
abu[m]>
What exactly does bus error mean? Non-existing address?
<
abu[m]>
Segfault means illegal access of existing memory I think
<
abu[m]>
So it looks a lot like an alignment problem!
<
abu[m]>
Yes! That's it!
<
abu[m]>
The second one, 4297738620, is not a multiple of 8
<
abu[m]>
Solaris requires 8-byte-alignment here
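The alignment check described here can be reproduced with plain integer arithmetic. The sketch below is illustrative Python, not pil21 code: `is_aligned` is a hypothetical helper, and the first pointer value is not quoted in the log but inferred by subtracting the 28-byte structure size from the second pointer.

```python
def is_aligned(addr: int, alignment: int = 8) -> bool:
    """True if addr is a multiple of alignment (alignment must be a power of two)."""
    return addr & (alignment - 1) == 0

SECOND_CHILD = 4297738620          # pointer quoted in the log
FIRST_CHILD = SECOND_CHILD - 28    # inferred: the two pointers differ by 28

print(is_aligned(FIRST_CHILD))     # True
print(is_aligned(SECOND_CHILD))    # False: the remainder modulo 8 is 4
```

A store of a 64-bit value to the second address is the kind of access that a strict-alignment CPU rejects, which would match the bus error seen on the second fork.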
<
abu[m]>
Can be solved easily, but I should check the other structures too (in @src/dec.l)
<
abu[m]>
dbFile has the same requirement
<
abu[m]>
I think it would crash too
<
abu[m]>
Can you test that?
<
abu[m]>
Make a DB with more than one file
<
abu[m]>
Then access the second file
<
tankf33der>
damn :)
<
abu[m]>
I make an example
<
tankf33der>
Please
<
tankf33der>
i can do it without example
<
tankf33der>
i do not understand db completely
<
abu[m]>
no problem
<
abu[m]>
(new 2) creates a symbol in the second DB file
<
abu[m]>
I think it will crash on Solaris
<
tankf33der>
$ ../pil
<
tankf33der>
Bus Error (core dumped)
<
tankf33der>
: (pool "xxx" (2 2))
<
abu[m]>
I change the structure definitions in @src/dec.l to be multiples of 8
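Why padding the structure size to a multiple of 8 helps can be seen by listing the addresses of entries stored back to back. This is a minimal sketch under stated assumptions: `padded_size` and `element_addresses` are hypothetical helpers, the base address is inferred from the pointer quoted earlier, and the actual definitions in @src/dec.l are not shown in this log.

```python
def padded_size(size: int, alignment: int = 8) -> int:
    """Round size up to the next multiple of alignment (a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)

def element_addresses(base: int, stride: int, count: int) -> list:
    """Addresses of `count` structures stored back to back starting at `base`."""
    return [base + i * stride for i in range(count)]

BASE = 4297738620 - 28       # inferred address of the first 'child' entry
OLD_SIZE = 28                # size reported in the log
NEW_SIZE = padded_size(OLD_SIZE)

# With a 28-byte stride every odd-numbered entry lands 4 bytes off.
old_ok = [addr % 8 == 0 for addr in element_addresses(BASE, OLD_SIZE, 4)]
# With the padded stride every entry stays on an 8-byte boundary.
new_ok = [addr % 8 == 0 for addr in element_addresses(BASE, NEW_SIZE, 4)]

print(NEW_SIZE, old_ok, new_ok)
```

The cost is 4 bytes of padding per entry. The "224" allocation seen earlier is consistent with eight 28-byte entries, though the log does not state that explicitly.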
<
tankf33der>
do it for solaris only ?
<
abu[m]>
The others should be ok with this change
<
tankf33der>
but @lib/test.l crashes with bus error anyway
<
abu[m]>
Which test is it?
<
tankf33der>
digging
<
tankf33der>
### kids ###
<
tankf33der>
(link (or (fork) (wait 2000) (bye))) ) )
<
tankf33der>
(flip (kids)) )
<
tankf33der>
this one
<
abu[m]>
again in (fork)?
<
tankf33der>
(unless (fork) (wait 6000) (bye))
<
tankf33der>
^^^^ this works
<
abu[m]>
Do you still have the 'dbg' output of the pointers?
<
tankf33der>
i will create then
<
abu[m]>
good, let's check the sizes again
<
abu[m]>
The pointers are good now. But what is that "0"?
<
tankf33der>
this is my 0, i missed
<
abu[m]>
Just an output?
<
tankf33der>
(test NIL (pipe (prog (kill *Pid) (pr 7)) (rd)))
<
abu[m]>
So now it crashes on the 8th time
<
abu[m]>
ok, pipe calls forkLisp
<
abu[m]>
I try here too to fork more than 8 times
<
tankf33der>
(test 7 (pipe (protect (kill *Pid) (pr 7)) (rd)))
<
tankf33der>
here crashes too
<
abu[m]>
No idea again ;)
<
abu[m]>
The crash is in another place now
<
abu[m]>
"9 T" is printed
<
abu[m]>
Can you find out from core dump where it crashes now?
<
tankf33der>
(pipe (call *CMD "-prog (println (argv)) (bye)" "abc" 123) (read))
<
tankf33der>
4297607744
<
tankf33der>
Bus Error (core dumped)
<
tankf33der>
<tankf33der> 4297607744
<
tankf33der>
is this address ok?
<
abu[m]>
Not a multiple of 8
<
abu[m]>
Where is that?
<
tankf33der>
line above
<
tankf33der>
pipe call again
<
abu[m]>
I mean in the base source. forkLisp()
<
abu[m]>
It is after "9 T", so probably after forkLisp returned
<
abu[m]>
Perhaps some other unaligned data
<
tankf33der>
output and code
<
abu[m]>
yes, this was clear
<
abu[m]>
But *where* in 'pipe'?
<
abu[m]>
after "9 T" this time, so forkLisp exited
<
tankf33der>
$ cat pipe1.l
<
tankf33der>
(test NIL (pipe (prog (kill *Pid) (pr 7)) (rd)))
<
tankf33der>
(msg 'ok)
<
tankf33der>
backtrace
<
tankf33der>
#0 0x000000010004a2b4 in pushOutFile ()
<
abu[m]>
So we are in the child process this time
<
abu[m]>
pushOutFile operates on structures on the stack
<
abu[m]>
so this means we also have to align all stack structures
<
abu[m]>
I think it is not aligned, because the pil21 llvm compiler does not align on the stack
<
abu[m]>
The 'T' ones are from pushOutFile?
<
tankf33der>
i think so
<
abu[m]>
The last one is it
<
abu[m]>
18446744071562062524
<
abu[m]>
This is not aligned
<
tankf33der>
this is clean output, I disabled dbg in forkLisp
<
abu[m]>
All are from pushOutFile, right?
<
abu[m]>
These structures are allocated on the stack with alloca(). I have no control over the alignment
<
abu[m]>
This is tough
<
tankf33der>
you can release dec.l changes anyway
<
tankf33der>
good that we check as much as we can
<
abu[m]>
But it would be better also for the other CPUs if these structures were properly aligned
<
abu[m]>
ok, I release first, then think about a rewrite of some stack structures
<
tankf33der>
x64 ok
<
tankf33der>
s390x ok
<
abu[m]>
Great! But I must give up for now. Need to think more about it, needs a different strategy for all those stack structures
<
abu[m]>
I must call alloca() sometimes with an alignment arg, so I must extend @src/lib/llvm.l
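One common way to get an aligned block from a downward-growing stack is to subtract the requested size from the stack pointer and then round the result down to the alignment boundary. The sketch below models this with integers only. The stack pointer and size are hypothetical values, chosen so that the unaligned result equals the address 18446744071562062524 quoted earlier, and this is not a description of how @src/lib/llvm.l actually implements it.

```python
def alloca_unaligned(sp: int, size: int) -> int:
    """Model a stack allocation that only subtracts the size."""
    return sp - size

def alloca_aligned(sp: int, size: int, alignment: int = 8) -> int:
    """Subtract the size, then round down to a multiple of alignment (power of two)."""
    return (sp - size) & ~(alignment - 1)

SP = 18446744071562062560    # hypothetical stack pointer (8-byte aligned)
SIZE = 36                    # hypothetical structure size, not a multiple of 8

plain = alloca_unaligned(SP, SIZE)
aligned = alloca_aligned(SP, SIZE)

print(plain, plain % 8)      # lands 4 bytes off an 8-byte boundary
print(aligned, aligned % 8)  # on an 8-byte boundary, using 4 extra bytes of stack
```

Rounding down never overlaps the caller's data because the stack grows toward lower addresses. It only wastes up to `alignment - 1` bytes per allocation.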
<
tankf33der>
Ok, enough for today
<
tankf33der>
too much picolisp in my life.
<
tankf33der>
See you
<
abu[m]>
Thanks a lot!!! ☺
<
abu[m]>
Now I found some time ☺
<
abu[m]>
Changed the relevant stack structures
<
abu[m]>
And I will release it now.
<
abu[m]>
Though this alignment was not required by the hardware on the architectures we tested before, it is better in any case.
<
abu[m]>
Uses a little more stack space but is faster.
<
abu[m]>
tankf33der: When you have time, please test! I hope very much that Solaris passes now too.