beneroth changed the topic of #picolisp to: PicoLisp language | The scalpel of software development | Channel Log: https://libera.irclog.whitequark.org/picolisp | Check www.picolisp.com for more information
pablo_escoberg has joined #picolisp
aw- has quit [Quit: Leaving.]
aw- has joined #picolisp
aw- has quit [Ping timeout: 240 seconds]
aw- has joined #picolisp
pablo_escoberg has quit [Quit: Client closed]
stultulo is now known as f8l
pablo_escoberg has joined #picolisp
gahr has joined #picolisp
<gahr> hi, I'm trying to build picolisp on FreeBSD 13.2-RELEASE, amd64. './bin/picolisp' terminated by signal SIGBUS (Misaligned address error)
<gahr> I have tried with both llvm14 and llvm15
<gahr> is this something known?
<abu[7]> Hmm, can you try to find the exact location?
<abu[7]> e.g. bt in the core dump
<gahr> let me rebuild with debug symbols
<abu[7]> Not needed probably
<abu[7]> Just backtrace usually works in gdb
<abu[7]> oh, in a system lib
<abu[7]> wait, no
<abu[7]> _open
<abu[7]> tankf33der here?
<abu[7]> FreeBSD 13?
<abu[7]> brb
abu[7] has left #picolisp [#picolisp]
abu[7] has joined #picolisp
<gahr> 13.2, yeah
<abu[7]> brb
abu[7] has left #picolisp [#picolisp]
abu[7] has joined #picolisp
<abu[7]> The question was for tankf33der, he tested many systems
<tankf33der> Here
<tankf33der> Freebsd was not working
<tankf33der> you can try yourself on newer system
<gahr> I have tried on 13.2, it still SIGBUS's
<tankf33der> Yeah
<gahr> (gdb) bt
<gahr> #0 0x000000000023db01 in evSym ()
<gahr> #1 0x0000000000223f0d in _open ()
<gahr> now I get this
<gahr> I have changed ASM to opt15 --debugify
<tankf33der> this is because freebsd is not in supported systems list
<gahr> what's missing?
<abu[7]> I see
<gahr> I thought any system that supported llvm would automatically be supported :)
<gahr> perhaps a bit naively
<tankf33der> :)
<abu[7]> brb
abu[7] has left #picolisp [#picolisp]
abu[7] has joined #picolisp
<abu[7]> Perhaps some 64bit assumption in pil which does not hold for FreeBSD
<tankf33der> T
<tankf33der> Solaris and OpenBSD work, though
<gahr> how do I build in debug mode so I can gdb through it?
<gahr> I'm not familiar with using llvm except from clang
<tankf33der> Just add -g to clang call
<tankf33der> debug will not help to port to freebsd
<abu[7]> I never build in debug mode, but used gdb (or lldb) occasionally
<tankf33der> ok
<abu[7]> The source is not C, so normal source debugging won't work
<abu[7]> *source level
<gahr> true that
<gahr> alright, I think I'll give up, unless you have some ideas where I should look
<beneroth> so pil works on OpenBSD and Solaris, but not FreeBSD? I wouldn't have expected such differences between OpenBSD and FreeBSD
<beneroth> interesting
<gahr> yeah, it's weird
<gahr> uhm.. I might be using a clang version != than llvm version
<abu[7]> _open is not very long
<gahr> let me try with everything on the same version
<abu[7]> good
<gahr> no luck :(
<gahr> ok I can try to strip down / remove parts of _open until I find the culprit
<abu[7]> Can you look at the core dump again, and list _open to get the addresses?
<gahr> commented on the gist
<gahr> https://gist.github.com/gahr/467849a77d49e8dc0d652522822a94e0 in case you lost it after reconnection
<abu[7]> thanks, still have in tmux backlog :)
<gahr> :)
<abu[7]> It is quite early in _open
<abu[7]> I check the asm
<abu[7]> mov 0x8(%rdi),%rbx is (let (X (cdr Exe)
<abu[7]> So just fetching the argument fails
<gahr> what's mov (%rbx),%rdi ?
<abu[7]> yeah, this is strange
<abu[7]> It takes the CAR
<abu[7]> but the next should be (evSym X)
<abu[7]> in llvm it is %4 = call i64 @evSym(i64 %3)
<abu[7]> line 51771 in src/base.ll
<abu[7]> base.ll is in the distro
<abu[7]> then llvm generates target assembly (here x86-64)
<abu[7]> mov (%rbx),%rdi does not match here
<abu[7]> %rbx should be 'X'
<abu[7]> why the CAR then?
<gahr> do you want to see more of the assembly of that function?
<abu[7]> Looks like evSym is expanded inline
<tankf33der> installing freebsd 13.2
<abu[7]> This is ok, as it crashed here
<abu[7]> thanks tankf33der!
<abu[7]> yes, evSym is inline, also here on ARMv8
<abu[7]> ok, so the asm makes sense
<abu[7]> _open has a single argument in %rdi
<abu[7]> %rbx is the CDR
<abu[7]> i.e. (cdr Exe) -> X
<abu[7]> then taking the CAR crashes
<abu[7]> (%rbx),%rdi
<abu[7]> %rbx is not properly aligned and gives a BUS error
<abu[7]> Probably aligned to 32 bits but the hardware needs 64 bit alignment?
<abu[7]> Shouldn't LLVM take care of that?
<abu[7]> brb
abu[7] has left #picolisp [#picolisp]
abu[7] has joined #picolisp
<abu[7]> I wonder why (open) is called at all. It is a seldom used function
<abu[7]> This is in the build?
<abu[7]> I think there is no place in the build process that calls (open)
<tankf33der> i got freebsd
<tankf33der> the same issue
<abu[7]> Can you try to find where it happens?
<tankf33der> http://ix.io/4CJA
<abu[7]> yeah
<abu[7]> but why _open?
<abu[7]> @test/src/io.l is the only place where (open) is called
<abu[7]> But the test suite does not run in build, right?
<gahr> it's not while building, it's when I run picolisp after it's been built
<abu[7]> ah
<abu[7]> then (open) is called in DB code
<abu[7]> but this does not run here yet
<abu[7]> You just did $ ./pil + ?
<gahr> I did ../bin/picolisp
<gahr> from within the src dir where I gmake'd
<abu[7]> ok, just the absolute minimum
<abu[7]> (open) which is _open() should not be called at all
<tankf33der> afk.
<abu[7]> So the code must be completely lost and jumps somewhere
<abu[7]> _open is a Lisp level function
<abu[7]> but plain bin/picolisp does not call any Lisp fun yet
<abu[7]> Very strange
calle has joined #picolisp
calle has quit [Ping timeout: 260 seconds]
<gahr> sorry for having puzzled you :)
<gahr> on a Monday, even
<abu[7]> No problem! Good that you brought up the issue
<abu[7]> Maybe we solve it one day ;)
<gahr> no clue whatsoever?
<abu[7]> No idea :(
<abu[7]> Needs a step by step monitoring to see where it goes wrong
<abu[7]> Somehow it jumps to _open
<abu[7]> Perhaps a wrong pointer somewhere
<abu[7]> then _open has of course also a bad argument, so accessing that crashes
<tankf33der> gahr: pil21 passed all tests under all sanitizers, so in general everything is fine or should work everywhere
<abu[7]> Perhaps it is just some build option?
<tankf33der> nope, i tried different and also disable optimizations
<abu[7]> ok
<gahr> tankf33der: it passes sanitizers -> ship it :)
calle has joined #picolisp
<gahr> I changed _open to just "ret i64 %0", now it segfaults in _close. Changed _close the same, now it segfaults in _read
<gahr> so it *does* look like it's trying to do some IO using those functions
<gahr> not just jumping around in code randomly
<abu[7]> Or it is just because these functions are near each other?
<abu[7]> Very interesting
<abu[7]> Question is, where does it come from when it hits _open?
<gahr> how do I get a stack trace with function names?
<abu[7]> bt
<gahr> ah yeah, but everything but _open is in libc
<abu[7]> break _open
<abu[7]> oh, it is called from libc?
<gahr> yeah, clock_gettime then 3 frames, then _open
<gahr> that suggests something is wrong, yeah?
<abu[7]> T
<abu[7]> Where is clock_gettime called from?
<abu[7]> main() calls (set $USec (getUsec YES))
<abu[7]> and that calls gettimeofday()
<abu[7]> Seems the only thing involving time
<gahr> I don't hit gettimeofday
<abu[7]> ok
<abu[7]> main() gets started?
<gahr> I don't see the name of pil functions
<gahr> but I'm not sure it hit pil's main yet
<gahr> do you have any static initialization that runs before main?
<gahr> like ctors for static objects in c++
<abu[7]> Only what clang puts there, the standard binary startup
<abu[7]> break main
<abu[7]> should work
<gahr> it dies before
<gahr> perhaps I could try to get the string containing the file it's trying to open?
<gahr> would "info reg" help you?
calle has quit [Ping timeout: 260 seconds]
<abu[7]> I don't think it is intended to open a file
<abu[7]> _open is called only from Lisp code, with a list of cells as arguments
<abu[7]> There is not even a heap yet, thus no cells
<abu[7]> I think it is a stray jump at random
<abu[7]> hmm, but then why does it hit exactly those functions?
<abu[7]> main() calls (heapAlloc) and later:
<abu[7]> (let P $Nil # Init internal symbols
<abu[7]> Here is where symbols like 'open' are initialized, receiving the name "open" and a function pointer to _open
<abu[7]> But if it does not even hit main ... ?
<abu[7]> And even there, the "function pointer to _open" is not *called*, only assigned
<abu[7]> Oh, an idea!
<abu[7]> Could it be a name collision?
<gahr> ah!
<abu[7]> That libc in FreeBSD uses names like _open, _close and _read?
<tankf33der> gahr
<gahr> yeah
<tankf33der> strings picolisp | grep open
<tankf33der> i am out of freebsd
<gahr> 85:#define open _open
<gahr> in a libc header
<abu[7]> ha!!
<abu[7]> That's the problem
<gahr> arguably, _open is reserved for the system
<gahr> well, anything starting with underscores
<abu[7]> yeah
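
The point about reserved names can be sketched as a small check. This is a minimal illustration, not part of PicoLisp: it assumes the C rule that any identifier starting with an underscore is reserved for the implementation at file scope, and the list of names is a hypothetical sample (the first three come from the log, `doOpen` from the older naming pattern mentioned later).

```python
def is_reserved_c_global(name: str) -> bool:
    """True if `name` is reserved for the C implementation as a global symbol.

    C reserves every identifier beginning with an underscore at file scope,
    so a program that defines such a global may clash with libc internals.
    """
    return name.startswith("_")


# Hypothetical sample of built-in function names to audit.
builtins = ["_open", "_close", "_read", "Pil_open", "doOpen"]

# Collect the names that fall into the reserved namespace.
risky = [n for n in builtins if is_reserved_c_global(n)]
print(risky)
```

Running such a check over the full list of built-ins would show how many symbols need a new prefix.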
<gahr> can we bulk rename your symbols to picolisp__open or something
<abu[7]> Then there is another problem
<abu[7]> Pil calls open()
<abu[7]> but it must call _open
<gahr> where is that mapped?
<gahr> src/glob?
<abu[7]> As you showed, on the C include level
ello has quit [Ping timeout: 264 seconds]
<gahr> in the symTab?
<abu[7]> src/glob maps names to built-ins
<abu[7]> "open" to a symbol with function pointer to _open
ello has joined #picolisp
<gahr> sorry I lost you
<abu[7]> But the code calls libc and other libs with names like open() and read()
<gahr> what's the other problem you're mentioning?
<abu[7]> *Not* _open()
<abu[7]> it is what I just said
<abu[7]> Pil code, also 'native' via ffi, calls the documented names
<abu[7]> *not* the ones mapped by include files
<abu[7]> So *everything* needs to be changed
<abu[7]> also existing Lisp code
<abu[7]> _open vs open is just the tip of the iceberg ;)
<gahr> ok I fail to see the bottom :)
<gahr> let's say we renamed the internal picolisp _open (and the others)
<gahr> to something that doesn't start with an underscore
<gahr> and adjust the code like the symbol map
<abu[7]> This is the first step
<abu[7]> But libc is also called directly
<abu[7]> from Lisp code
<abu[7]> (native "@" "unlink" 'I ...
<gahr> ah that's fine
<gahr> you can call open() on freebsd
<gahr> like, you can dlopen and call "open"
<gahr> if that's the thing
<abu[7]> no
<abu[7]> (signal (val SIGINT Sig) (val SigIgn))
<gahr> I just think that there's some _open underneath
<abu[7]> in main
<abu[7]> The sources call libc functions
<abu[7]> without C in between
<abu[7]> Look at src/dec.l
<abu[7]> # libc
<abu[7]> All those functions are called directly by C name
<gahr> that's fine, we *do* expose open() proper in libc
<abu[7]> I thought the internal name is _open
<abu[7]> And the C preprocessor translates open() to _open()
<gahr> no no
<gahr> I think at a certain point there's an _open in the implementation of libc
<abu[7]> right
<gahr> we can very well dlopen and call "open"
<gahr> like, the symbol name is "open" proper
<abu[7]> See e.g. malloc()
<abu[7]> Pil calls malloc
<gahr> that's fine
<abu[7]> but it must call _malloc
<gahr> why?
<abu[7]> Sorry, must hurry
<abu[7]> Must go, back later
<gahr> btw, I changed _open, _read, and _close to Pil_open etc.. now picolisp starts up and I get a prompt :)
<gahr> be back tomorrow
<tankf33der> Huge progress
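
A rename like the one gahr describes could be scripted roughly as follows. This is only a sketch under assumptions: it treats the generated LLVM IR as plain text, it covers just the three names from the log, and the `Pil` prefix is the one gahr tried, not a settled convention. The helper `rename_globals` and the `@_openDir` sample name are hypothetical.

```python
import re

# Names taken from the log; other built-ins may clash as well.
CLASHING = ("_open", "_read", "_close")

# Match "@_open" etc. only as a whole LLVM identifier, so that a longer
# (hypothetical) name such as "@_openDir" is left alone.
_PATTERN = re.compile(
    r"@(" + "|".join(re.escape(n) for n in CLASHING) + r")(?![-\w$.])"
)


def rename_globals(ir_text: str, prefix: str = "Pil") -> str:
    """Rewrite @_open, @_read and @_close to @Pil_open etc. in LLVM IR text."""
    return _PATTERN.sub(lambda m: "@" + prefix + m.group(1), ir_text)


sample = "define i64 @_open(i64 %0) {\n  %4 = call i64 @evSym(i64 %3)\n}\n"
print(rename_globals(sample))
```

A lasting fix would more likely change the names where the IR is generated, together with the symbol table in src/glob, rather than patching the output.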
<abu[7]> Not sure
<abu[7]> The question is if libc exposes names like 'malloc' or '_malloc'
chexum_ has quit [Ping timeout: 240 seconds]
chexum has joined #picolisp
<abu[7]> arrived
<abu[7]> tankf33der, can you run 'nm' on libc?
<abu[7]> grep for malloc
<tankf33der> I can not today, only tomorrow
<abu[7]> Good, I'm also not free now
<abu[7]> afp
<gahr> it's malloc, and it's open. In addition, there's _open too, but that's not an issue: the posix api is exposed with proper names :)
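
What gahr reports here can also be checked without 'nm', by asking the dynamic linker directly. The following is a minimal sketch, assuming a Unix-like system where `ctypes.CDLL(None)` gives a handle to the symbols already loaded into the process; the helper name `exported` is made up for illustration.

```python
import ctypes

# Handle for the symbols visible in the running process (libc included).
libc = ctypes.CDLL(None)


def exported(name: str) -> bool:
    """Return True if the dynamic linker can resolve `name`."""
    try:
        getattr(libc, name)  # performs a dlsym()-style lookup
        return True
    except AttributeError:
        return False


# The documented POSIX/C names that Pil calls directly.
for name in ("open", "read", "close", "malloc"):
    print(name, exported(name))
```

If the documented names resolve, calls such as (native "@" "unlink" ...) keep working, and only the built-in function names inside the picolisp binary need to change.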
calle has joined #picolisp
<abu[7]> Ah, ok, cool. Then all should work
<abu[7]> Just needs a rename of built-in functions
<gahr> yep :) weird that you've never been hit by anything like this.. some
<abu[7]> I'll think of some nice pattern
<gahr> of those functions have pretty common names
<abu[7]> Why is this? why have 'malloc' and '_malloc'?
<gahr> thanks abu[7] , I'm looking forward to exploring picolisp :)
<gahr> well, I can imagine some system
<gahr> having some _open in libc
<gahr> I mean, it's not totally alien to think of
<abu[7]> true, must have some reason
<abu[7]> Previous versions of PicoLisp (pil32, pil64 and mini) used patterns like doOpen()
<gahr> I'll look into our _open tomorrow. Maybe it should be static in some file, but I guess the libc implementation also needs internal names which are shared across compilation units
<abu[7]> What if _open etc. were declared "static"?
<gahr> heh yep that
<abu[7]> Let's try tomorrow
<abu[7]> I'm still with a beer here with friends :)
<gahr> I can tell you in the morning.. I'm on a phone now, not about to start grepping source code :)
<gahr> enjoy, tty tomorrow
<abu[7]> Thanks! No hurry anyway :)
pablo_escoberg has quit [Quit: Client closed]
<beneroth> congrats, good you found that out, impressive!
calle has quit [Ping timeout: 246 seconds]