04:56
pablo_escoberg has joined #picolisp
05:05
aw- has quit [Quit: Leaving.]
05:05
aw- has joined #picolisp
06:36
aw- has quit [Ping timeout: 240 seconds]
06:51
aw- has joined #picolisp
10:03
pablo_escoberg has quit [Quit: Client closed]
11:35
stultulo is now known as f8l
12:10
pablo_escoberg has joined #picolisp
12:48
gahr has joined #picolisp
12:49
<
gahr >
hi, I'm trying to build picolisp on FreeBSD 13.2-RELEASE, amd64. './bin/picolisp' terminated by signal SIGBUS (Misaligned address error)
12:49
<
gahr >
I have tried with both llvm14 and llvm15
12:49
<
gahr >
is this something known?
12:55
<
abu[7] >
Hmm, can you try to find the exact location?
12:57
<
abu[7] >
e.g. bt in the core dump
13:09
<
gahr >
let me rebuild with debug symbols
13:10
<
abu[7] >
Not needed probably
13:11
<
abu[7] >
Just backtrace usually works in gdb
13:16
<
abu[7] >
oh, in a system lib
13:17
<
abu[7] >
tankf33der here?
13:17
<
abu[7] >
FreeBSD 13?
13:17
abu[7] has left #picolisp [#picolisp]
13:18
abu[7] has joined #picolisp
13:20
abu[7] has left #picolisp [#picolisp]
13:20
abu[7] has joined #picolisp
13:21
<
abu[7] >
The question was for tankf33der, he tested many systems
13:23
<
tankf33der >
Freebsd was not working
13:23
<
tankf33der >
you can try yourself on newer system
13:26
<
gahr >
I have tried on 13.2, it still SIGBUS's
13:27
<
gahr >
#0 0x000000000023db01 in evSym ()
13:27
<
gahr >
#1 0x0000000000223f0d in _open ()
13:27
<
gahr >
now I get this
13:27
<
gahr >
I have changed ASM to opt15 --debugify
13:28
<
tankf33der >
this is because freebsd is not in supported systems list
13:29
<
gahr >
what's missing?
13:29
<
gahr >
I thought any system that supported llvm would automatically be supported :)
13:30
<
gahr >
perhaps a bit naively
13:30
abu[7] has left #picolisp [#picolisp]
13:30
abu[7] has joined #picolisp
13:31
<
abu[7] >
Perhaps some 64bit assumption in pil which does not hold for FreeBSD
13:34
<
tankf33der >
Solaris and openbsd works at the same time
13:35
<
gahr >
how do I build in debug mode so I can gdb through it?
13:36
<
gahr >
I'm not familiar with using llvm except from clang
13:41
<
tankf33der >
Just add -g to clang call
13:41
<
tankf33der >
debug will not help to port to freebsd
13:42
<
abu[7] >
I never build in debug mode, but used gdb (or lldb) occasionally
13:42
<
abu[7] >
The source is not C, so normal source debugging won't work
13:43
<
abu[7] >
*source level
13:46
<
gahr >
alright, I think I'll give up, unless you have some ideas where I should look
13:47
<
beneroth >
so pil works on OpenBSD and Solaris, but not FreeBSD? I wouldn't have expected such differences between OpenBSD and FreeBSD
13:47
<
beneroth >
interesting
13:48
<
gahr >
yeah, it's weird
13:48
<
gahr >
uhm.. I might be using a clang version != than llvm version
13:49
<
abu[7] >
_open is not very long
13:49
<
gahr >
let me try with everything on the same version
13:49
<
gahr >
ok I can try to strip down / remove parts of _open until I find the culprit
13:50
<
abu[7] >
Can you look at the core dump again, and list _open to get the addresses?
13:51
<
gahr >
commented on the gist
13:54
<
abu[7] >
thanks, still have in tmux backlog :)
13:54
<
abu[7] >
It is quite early in _open
13:54
<
abu[7] >
I check the asm
13:58
<
abu[7] >
mov 0x8(%rdi),%rbx is (let (X (cdr Exe)
13:59
<
abu[7] >
So just fetching the argument fails
14:00
<
gahr >
what's mov (%rbx),%rdi ?
14:01
<
abu[7] >
yeah, this is strange
14:01
<
abu[7] >
It takes the CAR
14:01
<
abu[7] >
but the next should be (evSym X)
14:01
<
abu[7] >
in llvm it is %4 = call i64 @evSym(i64 %3)
14:02
<
abu[7] >
line 51771 in src/base.ll
14:02
<
abu[7] >
base.ll is in the distro
14:02
<
abu[7] >
then llvm generates target assembly (here x86-64)
14:03
<
abu[7] >
mov (%rbx),%rdi does not match here
14:04
<
abu[7] >
%rbx should be 'X'
14:04
<
abu[7] >
why the CAR then?
14:06
<
gahr >
do you want to see more of the assembly of that function?
14:06
<
abu[7] >
Looks like evSym is expanded inline
14:07
<
tankf33der >
installing freebsd 13.2
14:07
<
abu[7] >
This is ok, as it crashed here
14:07
<
abu[7] >
thanks tankf33der!
14:08
<
abu[7] >
yes, evSym is inline, also here on ARMv8
14:08
<
abu[7] >
ok, so the asm makes sense
14:08
<
abu[7] >
_open has a single argument in %rdi
14:09
<
abu[7] >
%rbx is the CDR
14:09
<
abu[7] >
i.e. (cdr Exe) -> X
14:09
<
abu[7] >
then taking the CAR crashes
14:09
<
abu[7] >
(%rbx),%rdi
14:10
<
abu[7] >
%rbx is not properly aligned and gives a BUS error
14:11
<
abu[7] >
Probably aligned to 32 bits but the hardware needs 64 bit alignment?
14:11
<
abu[7] >
Shouldn't LLVM take care of that?
14:13
abu[7] has left #picolisp [#picolisp]
14:14
abu[7] has joined #picolisp
14:20
<
abu[7] >
I wonder why (open) is called at all. It is a seldom used function
14:21
<
abu[7] >
This is in the build?
14:22
<
abu[7] >
I think there is no place in the build process that calls (open)
14:24
<
tankf33der >
i got freebsd
14:25
<
tankf33der >
the same issue
14:25
<
abu[7] >
Can you try to find where it happens?
14:26
<
abu[7] >
but why _open?
14:27
<
abu[7] >
@test/src/io.l is the only place where (open) is called
14:27
<
abu[7] >
But the test suite does not run in build, right?
14:28
<
gahr >
it's not while building, it's when I run picolisp after it's been built
14:29
<
abu[7] >
then (open) is called in DB code
14:29
<
abu[7] >
but this does not run here yet
14:29
<
abu[7] >
You just did $ ./pil + ?
14:30
<
gahr >
I did ../bin/picolisp
14:30
<
gahr >
from within the src dir where I gmake'd
14:31
<
abu[7] >
ok, just the absolute minimum
14:31
<
abu[7] >
(open) which is _open() should not be called at all
14:31
<
abu[7] >
So the code must be completely lost and jumps somewhere
14:32
<
abu[7] >
_open is a Lisp level function
14:33
<
abu[7] >
but plain bin/picolisp does not call any Lisp fun yet
14:33
<
abu[7] >
Very strange
14:34
calle has joined #picolisp
14:42
calle has quit [Ping timeout: 260 seconds]
14:44
<
gahr >
sorry for having puzzled you :)
14:45
<
gahr >
on a Monday, even
14:45
<
abu[7] >
No problem! Good that you brought up the issue
14:45
<
abu[7] >
Maybe we solve it one day ;)
14:50
<
gahr >
no clue whatsoever?
14:51
<
abu[7] >
No idea :(
14:52
<
abu[7] >
Needs a step by step monitoring to see where it goes wrong
14:52
<
abu[7] >
Somehow it jumps to _open
14:52
<
abu[7] >
Perhaps a wrong pointer somewhere
14:53
<
abu[7] >
then _open has of course also a bad argument, so accessing that crashes
14:57
<
tankf33der >
gahr: pil21 passed all tests under all sanitizers, so in general everything is fine or should work everywhere
14:58
<
abu[7] >
Perhaps it is just some build option?
14:58
<
tankf33der >
nope, i tried different and also disable optimizations
15:00
<
gahr >
tankf33der: it passes sanitizers -> ship it :)
15:36
calle has joined #picolisp
15:38
<
gahr >
I changed _open to just "ret i64 %0", now it segfaults in _close. Changed _close the same, now it segfaults in _read
15:39
<
gahr >
so it *does* look like it's trying to do some IO using those functions
15:39
<
gahr >
not just jumping around in code randomly
15:40
<
abu[7] >
Or it is just because these functions are near each other?
15:41
<
abu[7] >
Very interesting
15:42
<
abu[7] >
Question is, where does it come from when it hits _open?
15:42
<
gahr >
how do I get a stack trace with function names?
15:43
<
gahr >
ah yeah, but everything but _open is in libc
15:43
<
abu[7] >
break _open
15:44
<
abu[7] >
oh, it is called from libc?
15:44
<
gahr >
yeah, clock_gettime then 3 frames, then _open
15:44
<
gahr >
that suggests something is wrong, yeah?
15:45
<
abu[7] >
Where is clock_gettime called from?
15:46
<
abu[7] >
main() calls (set $USec (getUsec YES))
15:46
<
abu[7] >
and that calls gettimeofday()
15:47
<
abu[7] >
Seems the only thing involving time
15:49
<
gahr >
I don't hit gettimeofday
15:50
<
abu[7] >
main() gets started?
15:50
<
gahr >
I don't see the name of pil functions
15:51
<
gahr >
but I'm not sure it hit pil's main yet
15:51
<
gahr >
do you have any static initialization that run before main?
15:51
<
gahr >
like ctors for static objects in c++
15:51
<
abu[7] >
Only what clang puts there, the standard binary startup
15:52
<
abu[7] >
break main
15:52
<
abu[7] >
should work
15:53
<
gahr >
it dies before
15:54
<
gahr >
perhaps I could try to get the string containing the file it's trying to open?
15:54
<
gahr >
would "info reg" help you?
15:55
calle has quit [Ping timeout: 260 seconds]
15:55
<
abu[7] >
I don't think it is intended to open a file
15:56
<
abu[7] >
_open is called only from Lisp code, with a list of cells as arguments
15:56
<
abu[7] >
There is not even a heap yet, thus no cells
15:56
<
abu[7] >
I think it is a stray jump at random
15:56
<
abu[7] >
hmm, but then why does it hit exactly those functions?
15:57
<
abu[7] >
main() calls (heapAlloc) and later:
15:57
<
abu[7] >
(let P $Nil # Init internal symbols
15:58
<
abu[7] >
Here is where symbols like 'open' are initialized, receiving the name "open" and a function pointer to _open
15:58
<
abu[7] >
But if it does not even hit main ... ?
15:59
<
abu[7] >
And even there, the "function pointer to _open" is not *called*, only assigned
16:04
<
abu[7] >
Oh, an idea!
16:04
<
abu[7] >
Could it be a name collision?
16:05
<
abu[7] >
That libc in FreeBSD uses names like _open, _close and _read?
16:06
<
tankf33der >
strings picolisp | grep open
16:06
<
tankf33der >
i am out of freebsd
16:06
<
gahr >
85:#define open _open
16:06
<
gahr >
in a libc header
16:07
<
abu[7] >
That's the problem
16:08
<
gahr >
arguably, _open is reserved for the system
16:08
<
gahr >
well, anything starting with underscores
16:08
<
gahr >
can we bulk rename your symbols to picolisp__open or something
16:09
<
abu[7] >
Then there is another problem
16:09
<
abu[7] >
Pil calls open()
16:09
<
abu[7] >
but it must call _open
16:09
<
gahr >
where is that mapped?
16:10
<
abu[7] >
As you showed, on the C include level
16:10
ello has quit [Ping timeout: 264 seconds]
16:10
<
gahr >
in the symTab?
16:10
<
abu[7] >
src/glob maps names to built-ins
16:10
<
abu[7] >
"open" to a symbol with function pointer to _open
16:11
ello has joined #picolisp
16:11
<
gahr >
sorry I lost you
16:11
<
abu[7] >
But the code calls libc and other libs with names like open() and read()
16:11
<
gahr >
what's the other problem you're mentioning?
16:11
<
abu[7] >
*Not* _open()
16:11
<
abu[7] >
it is what I just said
16:12
<
abu[7] >
Pil code, also 'native' via ffi calls the documented names
16:12
<
abu[7] >
*not* the ones mapped by include files
16:12
<
abu[7] >
So *everything* needs to be changed
16:13
<
abu[7] >
also existing Lisp code
16:13
<
abu[7] >
_open vs open is just the tip of the iceberg ;)
16:13
<
gahr >
ok I fail to see the bottom :)
16:14
<
gahr >
let's say we renamed the internal picolisp _open (and the others)
16:14
<
gahr >
to something that doesn't start with an underscore
16:14
<
gahr >
and adjust the code like the symbol map
16:14
<
abu[7] >
This is the first step
16:15
<
abu[7] >
But libc is called also directly
16:15
<
abu[7] >
from Lisp code
16:15
<
abu[7] >
(native "@" "unlink" 'I ...
16:15
<
gahr >
ah that's fine
16:15
<
gahr >
you can call open() on freebsd
16:15
<
gahr >
like, you can dlopen and call "open"
16:15
<
gahr >
if that's the thing
16:16
<
abu[7] >
(signal (val SIGINT Sig) (val SigIgn))
16:16
<
gahr >
I just think that there's some _open underneath
16:16
<
abu[7] >
The sources call libc functions
16:16
<
abu[7] >
without C in between
16:16
<
abu[7] >
Look at src/dec.l
16:17
<
abu[7] >
All those functions are called directly by C name
16:17
<
gahr >
that's fine, we *do* expose open() proper in libc
16:18
<
abu[7] >
I thought the internal name is _open
16:18
<
abu[7] >
And the C preprocessor translates open() to _open()
16:19
<
gahr >
I think at a certain point there's an _open in the implementation of libc
16:19
<
gahr >
we can very well dlopen and call "open"
16:19
<
gahr >
like, the symbol name is "open" proper
16:19
<
abu[7] >
See e.g. malloc()
16:19
<
abu[7] >
Pil calls malloc
16:19
<
abu[7] >
but it must call _malloc
16:20
<
abu[7] >
Sorry, must hurry
16:20
<
abu[7] >
Must go, back later
16:21
<
gahr >
btw, I changed _open, _read, and _close to Pil_open etc.. now picolisp starts up and I get a prompt :)
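[Editor's note] A minimal sketch of the kind of bulk rename described in the message above, assuming the built-ins show up in the generated LLVM IR as global names such as `@_open`. The helper name `rename_builtins`, the `Pil_` prefix and the list of names are illustrative assumptions taken from this conversation, not the project's actual fix.

```python
import re

# Hypothetical list of built-ins whose leading-underscore names collide with libc.
COLLIDING = ("open", "close", "read")

def rename_builtins(ir_text, names=COLLIDING, prefix="Pil_"):
    """Rewrite LLVM IR globals like @_open to @Pil_open, leaving other names alone."""
    pattern = re.compile(r"@_(" + "|".join(map(re.escape, names)) + r")\b")
    return pattern.sub(lambda m: "@" + prefix + m.group(1), ir_text)

# Made-up IR fragment; @_opendir is only there to show that longer names are not touched.
sample = "define i64 @_open(i64 %0) {\n  %2 = call i64 @_opendir(i64 %0)\n}"
print(rename_builtins(sample))
```

The word boundary after the name keeps a longer identifier that merely starts with `_open` from being renamed by accident.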
16:22
<
gahr >
be back tomorrow
16:23
<
tankf33der >
Huge progress
16:27
<
abu[7] >
The question is if libc exposes names like 'malloc' or '_malloc'
16:28
chexum_ has quit [Ping timeout: 240 seconds]
16:29
chexum has joined #picolisp
16:35
<
abu[7] >
tankf33der, can you 'nm' on libc?
16:35
<
abu[7] >
grep for malloc
16:36
<
tankf33der >
I can not today, only tomorrow
16:38
<
abu[7] >
Good, I'm also not free now
19:12
<
gahr >
it's malloc, and it's open. In addition, there's _open too, but that's not an issue: the posix api is exposed with proper names :)
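[Editor's note] One hedged way to confirm which names the C library exports at run time, roughly what a dynamic lookup by name has to rely on. This is a sketch assuming a Unix-like system where `ctypes.CDLL(None)` gives a handle to the symbols already loaded into the process; the helper name `exported` is made up for illustration.

```python
import ctypes

def exported(name):
    """Return True if the process's global symbol table resolves `name`."""
    libc = ctypes.CDLL(None)  # assumption: Unix-like platform, dlopen(NULL) semantics
    try:
        getattr(libc, name)   # ctypes raises AttributeError when the lookup fails
        return True
    except AttributeError:
        return False

# Whether an underscore variant such as "_open" also resolves depends on the platform.
for name in ("open", "malloc", "_open"):
    print(name, exported(name))
```

If `open` and `malloc` resolve under their documented names, calls made by name keep working after the built-ins are renamed.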
19:15
calle has joined #picolisp
19:18
<
abu[7] >
Ah, ok, cool. Then all should work
19:18
<
abu[7] >
Just needs a rename of built-in functions
19:20
<
gahr >
yep :) weird that you've never been hit by anything like this.. some
19:20
<
abu[7] >
I'll think of some nice pattern
19:20
<
gahr >
of those functions have pretty common names
19:21
<
abu[7] >
Why is this? why have 'malloc' and '_malloc'?
19:21
<
gahr >
thanks abu[7] , I'm looking forward to exploring picolisp :)
19:21
<
gahr >
well, I can imagine some system
19:22
<
gahr >
having some _open in libc
19:22
<
gahr >
I mean, it's not totally alien to think of
19:22
<
abu[7] >
true, must have some reason
19:23
<
abu[7] >
Previous versions of PicoLisp (pil32, pil64 and mini) used patterns like doOpen()
19:23
<
gahr >
I'll look into our _open tomorrow. Maybe it should be static in some file, but I guess the libc implementation also needs internal names which are shared across compilation units
19:23
<
abu[7] >
What if _open etc. were declared "static"?
19:24
<
gahr >
heh yep that
19:24
<
abu[7] >
Let's try tomorrow
19:24
<
abu[7] >
I'm still with a beer here with friends :)
19:24
<
gahr >
I can tell you in the morning.. I'm on a phone now, not about to start grepping source code :)
19:25
<
gahr >
enjoy, tty tomorrow
19:25
<
abu[7] >
Thanks! No hurry anyway :)
19:33
pablo_escoberg has quit [Quit: Client closed]
20:14
<
beneroth >
congrats, good you found that out, impressive!
23:00
calle has quit [Ping timeout: 246 seconds]