f_ changed the topic of ##raspberrypi-internals to: The inner workings of the Raspberry Pi (Low level VPU/HW) -- for general queries please visit #raspberrypi -- open firmware: https://librerpi.github.io/ -- VC4 VPU Programmers Manual: https://github.com/hermanhermitage/videocoreiv/wiki -- chat logs: https://libera.irclog.whitequark.org/~h~raspberrypi-internals -- bridged to matrix and discord
dolphinana has quit [Remote host closed the connection]
dolphinana has joined ##raspberrypi-internals
dolphinana has quit [Quit: Leaving]
jn has quit [Ping timeout: 256 seconds]
jn has joined ##raspberrypi-internals
jn has joined ##raspberrypi-internals
jn has quit [Changing host]
Stromeko has quit [Ping timeout: 256 seconds]
Stromeko has joined ##raspberrypi-internals
bonda_000 has joined ##raspberrypi-internals
<bonda_000> Yo
<bonda_000> clever u here?
f_ has joined ##raspberrypi-internals
<clever> bonda_000: morning
<bonda_000> Good morning
<bonda_000> so it looks like Minix has its own cut-down version of libc in the source files
<clever> thats what i expected
<bonda_000> but I also tried to build full glibc-2.39 with these flags
<bonda_000> ../glibc-2.39/configure CC="vc4-elf-gcc" LD="vc4-elf-ld" --prefix=/home/pi/Downloads/libc-vc4-minix --exec-prefix=/home/pi/Downloads/libc-vc4-minix --with-headers=/home/pi/Downloads/minix/minix/include/minix --with-binutils=/home/pi/Desktop/vc4-toolchain/prefix/bin
<bonda_000> and it says
<bonda_000> *** These critical programs are missing or too old: GNU ld gawk
<bonda_000> *** Check the INSTALL file for required versions
<clever> is gawk installed? what does "which gawk" say?
<bonda_000> says "no"
<clever> then that would be why its complaining about gawk
<bonda_000> ok I thought it's part of binutils, it's installing now
<clever> nope, gawk is part of gawk
<bonda_000> checking version of vc4-elf-ld... 2.23.51.20121030, bad
<bonda_000> *** These critical programs are missing or too old: GNU ld
<clever> the vc4 binutils is based on a fairly old fork of binutils
<bonda_000> I should get older glibc then?
<clever> you have a few choices
<clever> 1: rebase the changes on a newer binutils
<clever> 2: use an older glibc
<clever> 3: use the minix libc
<clever> 4: fix the new glibc to use an older ld
<clever> 3 has the best chance of actually working with the minix kernel
<bonda_000> yeah that's what I'm thinking too
<bonda_000> it says the default compiler to use is LLVM clang but gcc is also an option
<clever> there is another repo (i lost the link) that adds vc4 support to llvm
jcea has joined ##raspberrypi-internals
bonda_000 has quit [Ping timeout: 268 seconds]
bonda_000 has joined ##raspberrypi-internals
<bonda_000> and the task switching is done using that SystemTimer you told me earlier?
<clever> bonda_000: timer and other irq handlers
<clever> if the system is idle for example, and something arrives on the uart, you want to switch to a task that was waiting for the uart
<clever> the timer just helps when multiple things want cpu, and forces them to share
<bonda_000> yep I'm just looking at the decompile and there's a function
<bonda_000> systimer_init
<bonda_000> it's tricky somewhat, branches to core_get_func_table
<bonda_000> then the return value of that, stores to gp+0x405054
<bonda_000> actually does
<bonda_000> bl memset
<bonda_000> bl core_get_func_table
<clever> bonda_000: i do have working context switching in little-kernel, which is probably easier to understand than the decompile
<bonda_000> what file is it
<clever> bonda_000: when the LK thread core (in upstream LK) wants to context switch, it calls arch_context_switch
<clever> that may print a bunch of debug, then it calls vc4_context_switch
<clever> lines 7-9 saves the full cpu state to the stack
<clever> lines 12-19 swaps the stack pointer, saving the old one to the old thread, and restoring the new one from the new thread
<clever> lines 22-30 then restores all of the state that was saved to the new stack
<clever> so basically 7-15 saves the current state
<clever> and whenever this thread gets resumed and the sp restored, 22-31 restores it back
<clever> new threads work via arch_thread_initialize, which creates a fake "saved state"
<clever> so you can then restore that, the first time you enter the thread
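A toy model of the save / swap-sp / restore sequence described above, plus the fake "saved state" for a new thread. This is not LK code: the `Thread` and `Cpu` classes, the register list and the frame layout are all assumptions made up for illustration, and real code does this in assembly.

```python
# Toy model of a stack-based context switch (illustrative, not LK source).
SAVED_REGS = ["r6", "r7", "r8", "lr"]  # hypothetical set of saved registers


class Thread:
    def __init__(self, name):
        self.name = name
        self.stack = []     # private stack, top of stack = end of list
        self.saved_sp = 0   # stack depth recorded while switched out


class Cpu:
    def __init__(self):
        self.regs = {r: 0 for r in SAVED_REGS}
        self.stack = None   # stack the cpu is currently running on


def thread_initialize(thread, entry):
    """Build a fake saved state so the first switch 'returns' into entry."""
    for r in SAVED_REGS:
        thread.stack.append(entry if r == "lr" else 0)
    thread.saved_sp = len(thread.stack)


def context_switch(cpu, old, new):
    # 1. save the cpu state onto the stack we are currently running on
    for r in SAVED_REGS:
        cpu.stack.append(cpu.regs[r])
    # 2. swap stack pointers: remember the old one, load the new one
    old.saved_sp = len(cpu.stack)
    cpu.stack = new.stack
    assert len(cpu.stack) == new.saved_sp
    # 3. restore whatever was saved on the new stack, in reverse order
    for r in reversed(SAVED_REGS):
        cpu.regs[r] = cpu.stack.pop()


cpu = Cpu()
a, b = Thread("a"), Thread("b")
cpu.stack = a.stack
cpu.regs.update(r6=1, r7=2, r8=3, lr=0x1000)
thread_initialize(b, entry=0x2000)
context_switch(cpu, a, b)   # first entry into b goes through the fake frame
first_lr_in_b = cpu.regs["lr"]
context_switch(cpu, b, a)   # a resumes with the registers it had before
```

The only difference between a fresh thread and a resumed one is who wrote the frame: `thread_initialize` or an earlier `context_switch`.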
<bonda_000> is that like setjmp?
<clever> not entirely, setjmp doesnt change the stack
<clever> here is setjmp
<bonda_000> I'm writing the hardware glue for Minix RS232 but they seem to be using UART interrupts
<bonda_000> I've seen hermanhermitage never does that in his dump programs, he just spins in a polling loop
<bonda_000> is lk using UART interrupts or you also sit in a busy loop waiting?
<clever> bonda_000: its using the PL011 uart with irq support
<bonda_000> are you using the FIFO or reading writing char by char?
<clever> there is both a hw and sw FIFO
<clever> the hw FIFO allows small bursts and lets the irq be a bit late
<clever> the sw FIFO then greatly expands the buffer
<bonda_000> it says mini uart has 8 char fifo
<clever> but i'm using the PL011 uart
<bonda_000> I know but is the buffer much bigger there?
<clever> i think it has a 16 char fifo
<bonda_000> ok
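A rough sketch of the hw FIFO plus sw FIFO arrangement, as a plain Python model rather than driver code. The 16-entry depth is the figure recalled in the chat ("i think"), the 1024-entry software buffer is an arbitrary choice, and the class and method names are hypothetical.

```python
from collections import deque

HW_FIFO_DEPTH = 16     # PL011 depth as recalled above, not checked against docs
SW_FIFO_DEPTH = 1024   # arbitrary size picked for this sketch


class UartRx:
    def __init__(self):
        self.hw_fifo = deque()
        self.sw_fifo = deque()
        self.overruns = 0

    def wire_byte(self, byte):
        """A byte arrives on the wire; it is lost if the hw FIFO is full."""
        if len(self.hw_fifo) >= HW_FIFO_DEPTH:
            self.overruns += 1
        else:
            self.hw_fifo.append(byte)

    def irq_handler(self):
        """Drain everything the hardware collected into the sw FIFO."""
        while self.hw_fifo:
            byte = self.hw_fifo.popleft()
            if len(self.sw_fifo) < SW_FIFO_DEPTH:
                self.sw_fifo.append(byte)
            else:
                self.overruns += 1

    def read(self, n):
        """What a task waiting on the uart would call."""
        out = bytearray()
        while self.sw_fifo and len(out) < n:
            out.append(self.sw_fifo.popleft())
        return bytes(out)
```

The hw FIFO only has to cover the gap between two runs of `irq_handler`; the sw FIFO covers the gap between two calls to `read`.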
<bonda_000> the code here seems to be written for a 16550 UART like the mini uart uses on bcm
<bonda_000> I just replace the TI stuff with Broadcom stuff
<bonda_000> this minix arch is for BeagleBone
<bonda_000> has some kind of "oxoff" stop byte
<bonda_000> to let it know when to stop receiving
<bonda_000> is that the "Enter" keystroke?
<clever> nope
<bonda_000> then what is it
<clever> xon and xoff are dedicated bytes
<clever> i forget what they are, but google should know
<bonda_000> okay
<clever> its part of software flow control
<bonda_000> yes it looks like a state machine here
<clever> hw flow control is far simpler and much better
<clever> when the receive fifo is full, the hw flow control tells the remote end to stop, entirely under hw control
<clever> and if the remote end is configured for hw flow control, it will stop transmitting
<clever> so you never lose a byte, and the fifo is always kept full
<clever> but that comes at the cost of needing 4 wires, rx/tx, and rts/cts
<bonda_000> yeah i've seen that on the datasheet
<clever> sw flow control is just the receiver sending a special xoff byte, to tell the remote end to stop
<clever> but that can only happen inside the irq handler
<clever> and then the remote end has to receive that, and stop sending data
<bonda_000> that's what I'm doing right now the irq handler
<clever> and what happens if there is already 16 bytes in the remote tx fifo?
<clever> it cant just stop on a dime
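For reference, XON is ASCII DC1 (0x11) and XOFF is ASCII DC3 (0x13). The "cant stop on a dime" problem can be sketched with a small model: the receiver has to send XOFF while it still has room for whatever the remote end pushes out before it reacts. The function name and its parameters are made up for this sketch.

```python
XON = 0x11    # ASCII DC1, "resume sending" (sent once the buffer drains)
XOFF = 0x13   # ASCII DC3, "stop sending"


def receive_with_xoff(total, rx_capacity, high_water, remote_in_flight):
    """Model a receiver that has stopped draining its buffer.

    total            -- bytes the remote end wants to send
    rx_capacity      -- size of the receive buffer
    high_water       -- fill level at which the irq handler sends XOFF
    remote_in_flight -- bytes the remote still sends after XOFF goes out
                        (its tx fifo plus its reaction time)
    Returns (bytes buffered, bytes lost, control bytes sent).
    """
    buffered = 0
    lost = 0
    sent = []
    budget = remote_in_flight
    for _ in range(total):
        if sent:
            if budget == 0:
                break           # the remote end has finally stopped
            budget -= 1
        if buffered < rx_capacity:
            buffered += 1
        else:
            lost += 1           # buffer full, byte dropped
        if not sent and buffered >= high_water:
            sent.append(XOFF)
    return buffered, lost, sent
```

With a 64-byte buffer and 16 bytes in flight, a high-water mark of 48 loses nothing, while a mark of 60 drops 12 bytes.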
<bonda_000> well I have no idea the chip on the uart dongle I have is some silabs
<bonda_000> is that important?
<clever> you mentioned it only has 3 wires, so it cant do hw flow control
<bonda_000> I thought minicom handles
<bonda_000> this stuff for me the fifos
<bonda_000> if minicom does it in timely manner read its fifos at the agreed baud rate
<bonda_000> then there shouldnt be a problem no?
<clever> that only handles the vpu->minicom direction
<bonda_000> unless my remote is dead
<clever> if minix cant read the fifo fast enough, for whatever reason, then you start dropping bytes in that direction
<bonda_000> yeah so the irq should be handled fast enough
<bonda_000> there is also a lot of typos in the mini UART register map
<bonda_000> IIR and IER they mixed up descriptions
<bonda_000> AUX_MU_IIR and AUX_MU_IER which one is which
<bonda_000> I'm copying the code from hermanhermitage that worked but its hard to read that section
<bonda_000> the bits
<clever> i would just get the official PL011 uart docs from ARM
<bonda_000> there's less coding for me all the register names here are same to what BCM mini uart has
<bonda_000> I still have to do the rest of it like figure out the threads and where are the two cores after the bootrom
<clever> there are many issues with the mini-uart
<clever> so i would recommend, just forget it exists
<clever> the only reason you need to even consider it, is when you get around to bluetooth support
<bonda_000> serial_in(rs, OMAP3_LCR); this is the only type of line I'm replacing, the second arg should be the BCM AUX_MU register offset
<bonda_000> and comment out this line offset <<= rs->reg_offset; from serial_in() and serial_out() functions
<bonda_000> idk there seems to be a lot, I don't know where exactly to start
<clever> bonda_000: i think you should start by getting more familiar with the hw first, and ignore minix for now
<bonda_000> yeah the grey box AUX_MU_IIR_REG and AUX_MU_IER_REG are mixed up
<bonda_000> in the datasheet
<clever> bbl
<clever> back
<bonda_000> what does this in arm do
<bonda_000> ldm r9, {r0-r7}
<bonda_000> how is it going to load 38 bytes into a 4 byte register
<bonda_000> 32*
<bonda_000> oh I see now
<bonda_000> its like a stack pointer in r9
bonda_000 has quit [Ping timeout: 260 seconds]
bonda_000 has joined ##raspberrypi-internals
<bonda_000> is this the ldm analogue of vc4?
<bonda_000> v32ld H32(0x0,0x0),(r1)
<clever> bonda_000: thats a vector load, it will get an entire uint32_t[16] from the addr in r1, and load it to 0,0 in the vector registers
<clever> only other vector opcodes can then interact with it
bonda_000 has quit [Read error: Connection reset by peer]
bonda_000 has joined ##raspberrypi-internals
<bonda_000> ?
<clever> bonda_000: i would just use normal memcpy if all you want is to copy things
<bonda_000> I just can't see the ldm instruction
<bonda_000> in the decompile
<bonda_000> I see stm used as push
<bonda_000> but yeah I have memcpy
<bonda_000> nvm found it
<bonda_000> 0ed021a8 23 02 ldm r6-r9,(sp++)
<bonda_000> that's gonna load 16 bytes from where sp points at and increment it
<clever> looks like a normal ldm to pop from the stack
<clever> i count 32 bytes
<clever> r6, r7, r8, r9, 4 registers, 32bits(4bytes) each, thats 16
<clever> i somehow got an 8 in my math, lol
<bonda_000> okay
<bonda_000> do u know what's 'lea?
<bonda_000> 'lea'?
<clever> load effective address
<bonda_000> it's all over the place is it also some kind of load?
<clever> you use it like `lea r1, _start`
<clever> and the assembler/linker will store the relative offset between that opcode and _start
<clever> the cpu will then add that offset to PC to get the address of _start
<clever> and put the addr of _start into r1
<bonda_000> so its a pseudo instruction?
<clever> its a pc-relative thing
<bonda_000> ok
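The arithmetic behind `lea r1, _start` can be sketched in two steps. The addresses are invented, and measuring the offset from the opcode's own address follows the wording above; the exact reference point on real hardware is an assumption here.

```python
def assemble_lea(opcode_addr, target_addr):
    """Assembler/linker side: store only the distance to the target."""
    return target_addr - opcode_addr


def execute_lea(pc, offset):
    """CPU side: add the stored offset to the current pc (32-bit wrap)."""
    return (pc + offset) & 0xFFFFFFFF


# Hypothetical link-time layout: _start at 0x80000000, the lea 0x10 later.
offset = assemble_lea(opcode_addr=0x80000010, target_addr=0x80000000)

# Run in place, then run the same bytes after loading them somewhere else.
r1_linked = execute_lea(pc=0x80000010, offset=offset)
r1_moved = execute_lea(pc=0xC0000010, offset=offset)
```

Because only the distance is stored, the same code yields the right address of `_start` wherever it ends up being loaded.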
<bonda_000> so
<bonda_000> in ARM we park cores1,2,3 and let core0 go into the kernel
<bonda_000> and then send the message where's the entry point for parked cores
<clever> VPU core1 is already parked when things start
<bonda_000> so you end up in bootcode with just one core?
<clever> yes
<bonda_000> and then how do you un-park it
<bonda_000> I've seen 64-bit instructions had that
<bonda_000> from the VCIV manual
<clever> /home/clever/apps/rpi/rpi-open-firmware/common/broadcom/bcm2708_chip/intctrl1.h:#define IC1_WAKEUP HW_REGISTER_RW( 0x7e002834 )
<bonda_000> IC1 thats interrupt controller 1?
<clever> IC1 is just the name of core1 in some places
<clever> there it is
<clever> line 76 puts the top of the stack into a global variable, line 78 sets the 2nd core loose, at the core2_start function
<clever> core2_start then loads that global variable into sp, and jumps to core2_entry
<clever> which then just starts counting like mad and never exits
Herc has left ##raspberrypi-internals [Leaving]
<bonda_000> btest r0, 0x10
<bonda_000> version r0
<bonda_000> I thought thats the hardware identifier
<clever> one bit of the version register is the core id
<clever> so core0 and core1 have slightly different identifiers
<bonda_000> mine read 0x04000140
<bonda_000> I see
<bonda_000> other bits are also useful?
<clever> undocumented
<bonda_000> it often compares version to 0x10000
<clever> that might be the core bit
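If the guess above holds, `btest r0, 0x10` and the comparisons against 0x10000 look at the same thing, bit 16. A small sketch under that assumption (the bit position is not confirmed by any documentation quoted here, and the helper name is made up):

```python
CORE_BIT = 0x10   # assumption: btest takes a bit number, 0x10 == bit 16


def core_id(version):
    """Extract the suspected core-id bit from a version register value."""
    return (version >> CORE_BIT) & 1


mask = 1 << CORE_BIT           # the 0x10000 seen in the decompile
on_core0 = core_id(0x04000140)  # the value read in the chat
on_core1 = core_id(0x04000140 | mask)
```

Under this reading the value 0x04000140 has the bit clear, which would mean it was read on core 0.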
<bonda_000> so I guess vc4 has nothing like ARM's ldm r9, {r0-r7}
<bonda_000> that would load 32 bytes from r9 points at and fill r0 through r7 with those bytes
<bonda_000> all the ldms I see are stack-related
<bonda_000> 0000 0010 0bbm mmmm ldm rb-rm,(sp++) Load registers from stack (highest first).
<bonda_000> 0000 0011 0bbm mmmm ldm rb-rm,pc,(sp++) Load registers from stack and final value into pc.
<bonda_000> 0000 0011 1bbm mmmm stm rb-rm,lr,(--sp) Store lr followed by registers onto stack.
<bonda_000> 0000 0010 1bbm mmmm stm rb-rm,(--sp) Store registers to stack (lowest first).
<bonda_000> where:
<bonda_000> - rb is r0, r6, r16, or r24 for bb == 00, 01, 10, 11.
<bonda_000> - rm = (rb+m)&31
<bonda_000> If sp is stored, then the value after the store is stored.
<bonda_000> If mmmmm is 31 and pc/lr are stored/loaded, then no register
<bonda_000> but pc/lr is stored/loaded ("stm lr/ldm pc"). The same
<bonda_000> applies at least to "stm r24-r7, lr, (--sp)".
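The four encodings above can be turned into a small decoder to check the byte counts by hand. This is only a sketch of the table as quoted: it assumes the two bytes shown in the listing (`23 02`) form a little-endian halfword, and the function name and result format are invented.

```python
BASES = {0b00: 0, 0b01: 6, 0b10: 16, 0b11: 24}   # rb for each bb value


def decode_ldm_stm(halfword):
    """Decode 0000 001p sbbm mmmm (p: pc/lr included, s: store)."""
    if halfword >> 9 != 0b0000001:
        return None                       # not one of the four encodings
    with_pc_lr = (halfword >> 8) & 1
    is_store = (halfword >> 7) & 1
    rb = BASES[(halfword >> 5) & 0b11]
    m = halfword & 0b11111
    rm = (rb + m) & 31
    count = m + 1
    if m == 31 and with_pc_lr:
        count = 0                         # documented special case: pc/lr only
    return {
        "op": "stm" if is_store else "ldm",
        "first": rb,
        "last": rm,
        "extra": ("lr" if is_store else "pc") if with_pc_lr else None,
        "bytes": 4 * (count + with_pc_lr),
    }


insn = int.from_bytes(bytes([0x23, 0x02]), "little")
decoded = decode_ldm_stm(insn)
```

For the `23 02` bytes this gives an `ldm` of r6 through r9, four registers of 4 bytes each, 16 bytes popped from the stack.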
<clever> just use normal ld several times
<bonda_000> 0000 010o oooo dddd ld rd, (sp+o*4) Load from memory relative to stack pointer.
<bonda_000> 0000 011o oooo dddd st rd, (sp+o*4) Store to memory relative to stack pointer.
<bonda_000> 0000 1ww0 ssss dddd ld<w> rd, (rs) Load from memory.
<bonda_000> 0000 1ww1 ssss dddd st<w> rd, (rs) Store to memory.
<bonda_000> 0001 0ooo oood dddd add rd, sp, o*4 rd = sp + o*4
<clever> also, you're flooding again
<bonda_000> 0001 1ccc cooo oooo b<cc> $+o*2 Branch on condition to target.
<bonda_000> 0010 uuuu ssss dddd ld rd, (rs+u*4) rd = *(rs + u*4)
<bonda_000> 0011 uuuu ssss dddd st rd, (rs+u*4) *(rs + u*4) = rd
<bonda_000> my bad
<clever> that message took 30 seconds to go thru
<bonda_000> you said it doesn't handle too much text
<bonda_000> so
<bonda_000> ld<w> rd, (rs)
<clever> you can just do `ld r0, (r1+4)` for example, i believe
<bonda_000> the <w> I have no clue what that is
<clever> the width of the load
<clever> 8/16/32 bits
<bonda_000> ld8 ld16 or ld32?
<clever> ldb is byte, 8 bits
<bonda_000> ldh is 16bits?
<clever> probably
<clever> whenever i'm in doubt, i just ask gcc to compile something for me
<bonda_000> I tried compiling helloworld from vc4-toolchain and objdump showed me pretty much nothing
<bonda_000> on a Pi
<clever> what did it show?
<clever> and what was the input source? how did you make helloworld.o ?
<bonda_000> do you have vc4-toolchain folder?
<clever> i just build things with nix, i have all of the needed tools in $PATH
<bonda_000> vc4-toolchain has helloworld.c
<bonda_000> I compiled that
<bonda_000> with vc4-elf-gcc
<bonda_000> although not sure how, if printf is a part of libc which you said vc4-toolchain doesn't have
<clever> try just `vc4-elf-objdump -dr helloworld.o`
<clever> yep, now it works fine
<bonda_000> but you said
<clever> line 9, it pushes the link register to the stack
<bonda_000> it doesnt have libc
<clever> line 10, it loads the addr of the string in .rodata
<bonda_000> so how does it know about printf
<clever> vc4-toolchain includes newlib, which is a libc that has partial vc4 support
<clever> but its a mess, and ive stopped using it
<clever> LK includes its own libc
<clever> line 11 of the gist, tells the linker to shove in a 32bit addr of the string from .rodata, so that 0x0 isnt the truth
<clever> line 12 is a branch&link to a function
<clever> line 13 says to fill in the address of puts, the compiler got sneaky, and realized you're not using any printf features
<clever> line 14 is setting the return value to 0, and 15 returns
<clever> adjust helloworld.c to do things, like loading a 16bit value from memory, compile again, and gcc will answer your questions
<bonda_000> okay I will try
<bonda_000> and what about the supervisor call? do you use it in your code?
<clever> nope
<bonda_000> minix tells me it uses user mode and supervisor mode Idk if I really need that since I'm the only user and supervisor
<bonda_000> okay so you just go with whatever state the vpu is after the initialization?
<clever> yeah
<clever> but it does have user and supervisor
<clever> ive just not investigated the details of it
<bonda_000> look
<bonda_000> something weird
<clever> you want to use `vc4-elf-gcc -c helloworld.c -o helloworld.o`
<bonda_000> ok
<clever> without -c, it will try to link a complete binary, and then it cant find things
f_ has quit [Quit: To contact me, send a memo using MemoServ, PM f_[xmpp], or send an email. See https://vitali64.duckdns.org/.]
<bonda_000> if you try to compile it
<bonda_000> do you see it's not storing lr
<bonda_000> if there is an exception the original lr is lost
<bonda_000> here it tells me:
<clever> bonda_000: leaf functions (those not calling another function) dont have to save the lr, because nothing modifies the lr
<clever> arm does the same thing
<clever> thats the same gist as before with no changes
<bonda_000> the __user_copy_msg_pointer_failure
<bonda_000> my bad
<bonda_000> so if one of the pointers is bad, the exception handler will send me to __user_copy_msg_pointer_failure and I won't be going back to copy_msg_to/from_user
<clever> ive not looked into getting exceptions working on the VPU
<clever> so i dont know what is missing there
<bonda_000> I mean it won't hurt if I push the lr on the stack
<clever> depends on what you're doing,
<clever> you'll need to understand the context better
<bonda_000> I don't think its good that it compiled this way
<bonda_000> without saving the lr
<bonda_000> or
<bonda_000> it saw there is no further function calls
<bonda_000> and figured it's not necessary to save the lr
<bonda_000> also the memory map
<bonda_000> VPU has no MMU but that doesn't mean we can't do it in software right?
<clever> bonda_000: you would need to replace every load and store opcode with a function call
<clever> which would require re-writing all asm, and major overheads
<clever> at that point, you're basically making an emulator
<bonda_000> wish we knew how big these "SDRAM" partitions are in the alias tabl
<bonda_000> table
<bonda_000> and whats that other unnamed rectangle
<bonda_000> in each of the four aliases
<bonda_000> in blue
<clever> bonda_000: the sdram partition is up to 1gig in size, it depends on how big the ram is
<bonda_000> say, 1GB
<clever> let me draw up a diagram....
<clever> read this while i draw one up...
<clever> basically, any access first goes thru that overlay layer, if its in one of those 3 windows, its a hit, and it does the listed thing
<clever> if its not in any of those 3 windows, it falls thru to the base layer
<clever> all 4 aliases in the base layer, refer to the same 1gig of ram
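The overlay-then-base lookup can be sketched as a function. The real three windows come from a diagram that is not part of this log, so the window list is a parameter; the single peripheral window used in the example (0x7e000000 to 0x7effffff) is an assumption based on the 0x7e002834 register address quoted earlier. Splitting the address into a 2-bit alias and a 30-bit offset follows from "4 aliases of the same 1gig".

```python
RAM_MASK = 0x3FFFFFFF   # low 30 bits: offset inside the 1gig of ram


def resolve(addr, overlay_windows):
    """Return where an access lands: an overlay window or the sdram base layer."""
    for start, end, name in overlay_windows:
        if start <= addr <= end:
            return ("overlay", name, addr - start)
    # miss in every window: fall thru to the base layer
    return ("sdram", addr >> 30, addr & RAM_MASK)


# Hypothetical window list with only the peripheral range filled in.
WINDOWS = [(0x7E000000, 0x7EFFFFFF, "peripherals")]

hit = resolve(0x7E002834, WINDOWS)
alias0 = resolve(0x00001000, WINDOWS)
alias3 = resolve(0xC0001000, WINDOWS)
```

`alias0` and `alias3` differ only in the alias number; the offset into ram is the same for both.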
<bonda_000> i see
<bonda_000> pvVar1 = (void *)rtos_malloc_priority(__size,0x20,1,unaff_lr | 0x80000000);
<bonda_000> is what they do in vcos
<clever> thats just tagging the allocation with a return addr, so you can know what function to blame for the heap usage
<clever> its just a performance debug thing
<bonda_000> well but with binary loading thing, kernel is gonna malloc memory for them to operate on
<bonda_000> isn't that how it's done on the low level of OS
<clever> yeah, but lr doesnt have anything to do with that
<bonda_000> and from the program side of view that memory should look contiguous
<bonda_000> I understand
<clever> thats only possible if you have an mmu
<bonda_000> it usually saves the lr though in the decompile. I think the program I wrote is just too trivial so no nested function calls it didn't bother saving the lr
<bonda_000> but why do you call a software MMU an emulator
<bonda_000> all it needs is add some arithmetic to each ld/st
<clever> the hardware doesnt support an MMU of any form
<clever> so you need to intercept every load/store
<clever> and there is no way to intercept just load/store
<clever> so you have to pass every single opcode thru software
<clever> and thats what an emulator does
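A minimal sketch of why the "software MMU" ends up as an emulator: the translation itself is one addition, but to apply it to every ld/st the whole program has to run inside a loop like this. The three-opcode instruction format and the `run` helper are invented for illustration and are not VC4 encodings.

```python
def run(program, regs, ram, base, limit):
    """Interpret a toy instruction list, relocating every ld/st in software."""
    def translate(vaddr):
        if not 0 <= vaddr < limit:
            raise MemoryError(f"fault at {vaddr:#x}")
        return base + vaddr              # the 'MMU': one add per access

    for op, rd, rs in program:           # every opcode passes thru here
        if op == "ld":
            regs[rd] = ram[translate(regs[rs])]
        elif op == "st":
            ram[translate(regs[rs])] = regs[rd]
        elif op == "add":
            regs[rd] += regs[rs]
        else:
            raise ValueError(op)
    return regs


ram = [0] * 64
ram[32 + 4] = 7                          # the task sees this as address 4
regs = {"r0": 0, "r1": 4, "r2": 5}
program = [("ld", "r0", "r1"), ("add", "r0", "r2"), ("st", "r0", "r2")]
run(program, regs, ram, base=32, limit=16)
```

Even the `add`, which never touches memory, has to go through the loop, because there is no way to trap only the loads and stores.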
bonda_000 has quit [Quit: Leaving]