f_ changed the topic of ##raspberrypi-internals to: The inner workings of the Raspberry Pi (Low level VPU/HW) -- for general queries please visit #raspberrypi -- open firmware: https://librerpi.github.io/ -- VC4 VPU Programmers Manual: https://github.com/hermanhermitage/videocoreiv/wiki -- chat logs: https://libera.irclog.whitequark.org/~h~raspberrypi-internals -- bridged to matrix and discord
dolphinana has quit [Quit: Leaving]
jcea has quit [Ping timeout: 268 seconds]
Stromeko has quit [Ping timeout: 264 seconds]
Stromeko has joined ##raspberrypi-internals
clever has quit [Ping timeout: 256 seconds]
clever has joined ##raspberrypi-internals
bonda_000 has joined ##raspberrypi-internals
<bonda_000> So did you get the Ghidra working with start_x.elf?
<bonda_000> The ISP stuff is at virtual offset 0ED85CCE, that's where it's referencing 7EA00000 a lot
<clever> bonda_000: all MMIO is at the 0x7e00_0000 range, when accessed by the VPU
<clever> 0ED8_5CCE would just be regular old ram, for the VPU
<bonda_000> its the code that accesses 7EA00000 which seem to be ISP regs
<clever> broadcom/bcm2708_chip/isp.h:#define ISP_BASE 0x7ea00000
<bonda_000> the first mention of 7EA00000
<clever> yep, that is the ISP area
<clever> the official headers that broadcom released years ago, only mention a single register, at that addr, and 32 bits in size
<bonda_000> its a lie
<bonda_000> references up to 7ea035xx
<clever> yeah
<bonda_000> there is a load of registers
<clever> [clever@amd-nixos:~/apps/rpi/firmware]$ git show 4f63c3bdd4d9ca4f308e683fe7e70501b4ac3232
<clever> in this commit of the official firmware, RPF messed up, and forgot to strip the binaries
<clever> so the .elf's in there, have all of the debug info still present
<clever> somebody then complained that the files had doubled in size, and RPF fixed the issue :P
<clever> bonda_000: with that, i can skip to the ISP area in ghidra, and see references from dozens of functions, isp_plat_init, isp_plat_stop, isp_plat_abort
<bonda_000> yeah I don't see too many labels but there are some strings left in it
<clever> if you checkout that version of the firmware, and load the old `start_x.elf`, youll get a lot more details
<bonda_000> i can't, it says fatal bad object
<clever> you need to `git fetch origin <hash>` first
<bonda_000> oh you mean I need to clone that repo first?
<clever> yeah
<clever> or just let github serve the file
<clever> just replace master in the url, with the above commit
<bonda_000> I'm currently on some 2019 distribution it may have the same strings
<clever> and now you can just download the unstripped start_x.elf from there
<bonda_000> but I will try see if theres a difference
<clever> another thing that massively helps, rename binaries as you import them into ghidra
<clever> start_x-4f63c3bdd4d9ca4f308e683fe7e70501b4ac3232-unstripped.elf for example
<clever> now you have a reminder, on what git revision it came from, and that its the unstripped binary
<clever> that will help when you later have 20 start_x.elf's loaded into ghidra
<bonda_000> huh where am I gonna get this many from
<clever> RPF releases new files several times a month
<clever> and sometimes the new file may have useful data
<clever> so every now and then i load up the latest one, and decompile it some more
<bonda_000> okay that makes sense
<bonda_000> but see there is a thing in all these bootloaders
<bonda_000> you only seem to be able to access ISP regs at 2nd or 3rd bootloader stage otherwise they are unreadable
<bonda_000> its weird
<clever> probably power management stuff
<clever> oh, one problem with the unstripped binary
<clever> every string is tagged with the $S symbol
<clever> ghidra is convinced thats code
<clever> select the whole range (click first, shift click last), hit c to clear the type, then right click the first, data, string
<clever> and for the unstripped binary, at offset ef2ed68, you have the `isp_module` symbol, change the type (t) to `void*[97]` and it will make far more sense
<clever> that is sort of a vtable, for the isp class
<bonda_000> I have it in my decompile as well
<bonda_000> they do have something like a function table for each module and that's where these strings come I suppose
<bonda_000> like if I scroll all the way up in the main decompilation file the first thing in there is a pointer PTR_PTR_s__CAMERA_POOL_END_0ef3a280 its at code address 0ec02800
<bonda_000> thats the beginning of my decompiled .text section
<bonda_000> and then I search for the start_x entry point within the bootcode (my start_x entry address is cec00200) so I search for a scalar 3468689920 and it's not in the bootcode so it must be constructed somewhere in the code
<clever> bonda_000: bootcode.bin understands elf files, and will respect the entry-point defined in the elf headers
<clever> that is very handy when running custom .elf files
<clever> let me load up bootcode-vc4-2022-11-04-f3db2058e9d1f0cefb4c1aff5dbd9fd2d2749233.bin...
<clever> there it is, at offset 800065d0, is a call to what i named fixup_apply, that takes fixup.dat (a binary patch), and directly applies it over the whole damn start.elf file, headers and all
<clever> then it uses memcpy to copy a chunk of code to 0x60008000, search for that!, this is the stub, that changes L2 cache settings, and then jumps to start.elf
<clever> broadcom/bcm2708_chip/vpu_l1_cache_ctrl.h:#define L1_BASE 0x7ee02000
<clever> and once i add a volatile 0x1000 long block, at this offset, it decompiles much better
<clever> line 6 applies the fixup.dat patch, based on the desired gpu_mem, and where the top of ram actually is
<clever> something from 9-14 parses the new elf file (i figured this out on a different .bin version)
<clever> 16 copies the stub to some special sram
<clever> i think 18-21 is an L2 cache flush?
<bonda_000> well part of that is an armstub which is arm code
<clever> and i think 22, is where it goes off to the stub in sram, and uvar8 is the elf entry-point, which came from 10
<clever> the armstub is in start.elf
<clever> bootcode.bin is pure VPU assembly
<clever> bootcode.bin's only job, is to bring the lpddr2 controller online, then load start.elf into dram, and execute it
<clever> start.elf's job, is to boot the rest of the system, drop an armstub at 0, turn on the arm core, and then provide runtime services (like the ISP) to the arm
<bonda_000> yeah mine didn't decompile to the point of recognizing memcpy
<clever> FUN_800002e8 from line 22, is just a `j r1`, its very obvious that the pastebin half was c, and FUN_800002e8 is asm in a second file
<clever> yeah, memcpy isnt labeled at all in the .bin files
<clever> you just have to find it manually
<clever> i forget how, but ghidra can generate a database of function patterns and names
<bonda_000> so that last link you sent is your bootcode entry code?
<clever> and then auto-apply that to future decompiles
<clever> the last pastebin is the end of bootcode.bin, right as it passes control off to start.elf
<clever> this is the entry-point, from 0x8000_0200
<clever> tools->function id, from here, you can generate the database i mentioned
<clever> that helps massively, to apply what youve figured out before to a new .bin file
<bonda_000> create new empty FiDB?
<clever> yeah
<clever> and after youve named a bunch of functions, you can populate the db
<clever> and then when you decompile something else, you can go under analysis options, and enable function id
<clever> and it will try to match the names of everything
<bonda_000> okay but for library calls like memcpy strcmp memset etc you need to identify them yourself right?
<clever> yeah
<clever> but there is one handy trick, thanks to the leak
<clever> load the unstripped start.elf, and populate the FID from that
<clever> if bootcode.bin uses the same functions, it will now carry over
<bonda_000> I dont think I have enough memory to clone the whole firmware repo
<bonda_000> is there a way to get one single branch or whatever
<bonda_000> that particular hash code you sent
<clever> 2024-04-28 04:47:57 < clever> https://github.com/raspberrypi/firmware/tree/master/boot is /boot on master
<clever> 2024-04-28 04:48:04 < clever> just replace master in the url, with the above commit
<clever> and yes, it is huge, my .git is 24gig in size
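The "replace master in the url" step can be sketched in a few lines of Python. This is only an illustration: `pin_to_commit` is a hypothetical helper name, and it assumes the GitHub tree URL layout quoted above (`/tree/<ref>/...`) and the commit hash mentioned earlier in the log.

```python
# Hypothetical helper: swap the branch name in a GitHub tree URL for a commit hash,
# so the page shows the files as they were at that commit.
MASTER_URL = "https://github.com/raspberrypi/firmware/tree/master/boot"
COMMIT = "4f63c3bdd4d9ca4f308e683fe7e70501b4ac3232"  # the unstripped commit from the log


def pin_to_commit(url: str, commit: str, branch: str = "master") -> str:
    marker = f"/tree/{branch}/"
    if marker not in url:
        raise ValueError(f"URL does not reference branch {branch!r}: {url}")
    # Replace only the first occurrence, in case the path itself contains the name.
    return url.replace(marker, f"/tree/{commit}/", 1)


if __name__ == "__main__":
    print(pin_to_commit(MASTER_URL, COMMIT))
```

This avoids cloning the repository at all, which matters given the size mentioned above.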
<bonda_000> so thats the firmware you are decompiling right?
<clever> one of them, yes
<clever> that link takes you directly to the unstripped version
<bonda_000> oh damn thats 10 year old firmware
<clever> yeah, thats how long ago they messed up and leaked all the symbols
<clever> as far as i know, they never repeated that mistake
<clever> but it still supported the isp and camera stuff, so the answers should be in there
<bonda_000> have you generated the pattern folders for your vc4 processor or is it what you also have to do manually?
<clever> when i import a file into ghidra, i rename it manually
<bonda_000> oh yeah it is unstripped
<bonda_000> all the paths of source files are there
<clever> and names for every symbol
<clever> and if we knew how the debug data worked, it may even be possible to recover the source
<bonda_000> ok but then why the bootcode you were telling me about you picked a 2022 one
<bonda_000> and not the same one from that 10 year old unstripped drop. I'm trying to follow what you do
<bonda_000> ghidra didn't recognize any of the bins I fed it, so I had to wrap mine into .elfs manually, a la start_x.elf
f_ has joined ##raspberrypi-internals
<bonda_000> the only problem I see is that this unstripped code is for VideoCore III
<bonda_000> isn't it?
<bonda_000> 10 years ago is before RPi3 came out
<bonda_000> that's why I'm thinking populating some patterns using this unstripped version and trying to apply it to the one that I know works on my 'puter
<clever> oops, i keep getting distracted with virtio code elsewhere
<clever> bonda_000: the pi0-pi3 are all virtually identical internally
<clever> the only real difference is which arm core is jammed on the side, and the VPU boot rom
<clever> so the isp, hvs, v3d, and all that stuff, performs the same way on everything in that range
<clever> the only major difference, is that the bcm2835 clocked some things at 250mhz, but the bcm2836(pi2) and newer, used a different process node, and clock at 500mhz
<clever> bonda_000: as for why i picked a recent bootcode.bin, i just picked a random one in ghidra, to use as an example
<clever> i named the hash, because i was initially using offsets into the binary, which are only valid for that version
<clever> but once i found 0x60008000, i realized you can search for that, and find the same code in any binary
dolphinana has joined ##raspberrypi-internals
<bonda_000> i guess i will just stick to wrapping the .bin into elfs since its pretty much the same thing
<bonda_000> i'm just new to the ghidra interface and getting a bit lost in the features atm
<clever> yeah, i was planning on moving to elf only as well
<clever> and i did struggle with ghidra too, but after using it for several years, its all easy now
<bonda_000> and these excerpts that you shared with git gists
<bonda_000> they kind of resemble what I see in my bootcode except for the functions that you manually named I guess
<clever> yep
<bonda_000> at the end you have
<bonda_000> do {blink_failure4();}while(true);
<bonda_000> mine says
<clever> yep, thats what causes the bootcode.bin to blink the led 4 times, when start.elf cant be found
<bonda_000> do {pcVar1 = (code *) swi(0xe); uVar4 = (*pcVar1)((int)uVar4,(int)((ulonglong)uVar4 >> 0x20));} while(true);
<clever> that could be something different, hard to tell from just that chunk
<bonda_000> well it writes something right shifted 32 bits to a variable which could be GPIOLEV
<clever> GPIOLEV should be easy to find, thats in the official docs, but its also read-only i believe
<bonda_000> GPSET I mean
<clever> #define GP_LEV0 HW_REGISTER_RO( 0x7e200034 )
<clever> #define GP_SET0 HW_REGISTER_RW( 0x7e20001c )
<clever> both LEV and SET are an array of 3 x 32bit
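As a rough sketch of what "an array of 3 x 32bit" means in practice, the Python below maps a GPIO pin number to the register address and bit mask that would hold it. The two base addresses come from the header lines quoted above; `gpio_reg` is a hypothetical helper, and the 96-pin limit simply follows from the three-word array described in the log, not from any pin count of the actual chip.

```python
# Base addresses taken from the #define lines quoted above (VPU bus addresses).
GP_SET0 = 0x7E20001C  # read/write: writing a 1 bit drives that pin high
GP_LEV0 = 0x7E200034  # read-only: current pin levels


def gpio_reg(base: int, pin: int) -> tuple:
    """Return (register address, bit mask) for `pin` in a 3 x 32-bit register array."""
    if not 0 <= pin < 3 * 32:
        raise ValueError(f"pin {pin} does not fit in three 32-bit registers")
    word = pin // 32   # which of the three registers
    bit = pin % 32     # which bit inside that register
    return base + 4 * word, 1 << bit


if __name__ == "__main__":
    addr, mask = gpio_reg(GP_SET0, 47)
    print(hex(addr), hex(mask))
```

A write that targets one of these addresses with a single-bit mask is what a GPIO action would look like in a decompile, which is a quick way to rule a snippet in or out.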
<bonda_000> ulonglong is what
<bonda_000> 128bit?
<clever> i think so, but i rarely see that in the rpi code
<clever> that doesnt sound like gpio actions
<clever> can you pastebin the entire function?
<bonda_000> so make it a github file and send you a gist?
<clever> you can make a new gist by just pasting it into the box on here
<dolphinana> Hello! o/
<dolphinana> how's it going here?
<clever> dolphinana: good
<bonda_000> hey
<clever> bonda_000: ah yes, the entry-point, let me compare that to mine...
<clever> bonda_000: https://gist.github.com/cleverca22/baa4f603b5754d309ba9bd8c67c06a89 mine is on the top, yours on the bottom
<clever> but i suspect everything after the swi is garbage
<clever> swi will cause it to jump to an exception handler, and i dont think its ever meant to reach that code
<bonda_000> looks about the same
<bonda_000> where did you get this from
<clever> _start is just at 0x80000200, and for lines 45-76, search your binary for uVar8,0x60008000
<clever> oops, just 0x60008000
<clever> hit s in ghidra, to search for hex values
<bonda_000> i have
<bonda_000> 80003f7c 03 e8 00 80 00 60 mov r3,0x60008000
<bonda_000> and second occurrence
<bonda_000> 800063ac 01 e8 00 80 00 60 mov r1,0x60008000
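The same search can be done outside ghidra. The sketch below looks for the little-endian bytes of a 32-bit immediate in a raw blob; the byte pattern `00 80 00 60` matches the two listings above. `find_imm32` is a hypothetical helper, the sample buffer is synthetic rather than a real bootcode.bin, and the load address of 0x80000000 is an assumption based on the 0x80000200 entry point mentioned in the log.

```python
import struct


def find_imm32(blob: bytes, value: int, load_addr: int = 0) -> list:
    """Return the addresses of every little-endian occurrence of `value` in `blob`."""
    needle = struct.pack("<I", value)  # 0x60008000 -> 00 80 00 60
    hits = []
    pos = blob.find(needle)
    while pos != -1:
        hits.append(load_addr + pos)
        pos = blob.find(needle, pos + 1)
    return hits


# Synthetic stand-in for a bootcode.bin: two `mov rN,0x60008000` encodings
# copied from the listings above, separated by filler bytes.
sample = bytes.fromhex("0000" "03e800800060" "ffff" "01e800800060")

if __name__ == "__main__":
    for addr in find_imm32(sample, 0x60008000, load_addr=0x80000000):
        # In these listings the immediate starts 2 bytes into the instruction.
        print(hex(addr), "-> instruction at", hex(addr - 2))
```

Because the match is on the constant rather than on an offset, the same search should work across firmware versions, which is the point made elsewhere in the log about 0x60008000.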
<clever> i need to go eat, i'll be back within an hour
<bonda_000> ok
<clever> bonda_000: back
<bonda_000> same
<bonda_000> I also noticed there's something that r24(base pointer or gp in my decomp) points to that these programs refer to every now and then
<clever> yep, thats a trick to make relocation simpler
<clever> set `gp` to point to something on startup (look at your entry function), and then just do gp-relative load/stores
<bonda_000> if you look at hermanhermitage/dumpbootenv that holds 60008000 huh
<clever> and you dont have to patch 2000 locations when the binary loads to a diff addr
<clever> basically, you just point gp to .data, and then tell gcc that gp is always .data
<bonda_000> thats in the ROM right?
<clever> 0x60008000 is sram
<clever> 0x60000000 is rom
<clever> `mov gp,0x8000c000` is what i find in my bootcode.bin
<bonda_000> isn't boot rom 60010000 down to 60000000
<clever> the rom isnt that big
<bonda_000> 64KB
<clever> my pi1 rom ends at 0x47d0
<clever> only 18kb
<bonda_000> my bootrom is 65536 bytes exactly
<clever> you likely dumped the ram that follows after the rom
<clever> the dump util doesnt know the difference, its all just addresses
<bonda_000> i see yeah maybe
<bonda_000> that's what it was programmed for to dump 64KB
<clever> look at the gap just below 60008000
<bonda_000> at 60008000 in my dump I have
<bonda_000> sleep; bkpt; st r4, (r3+0x10); mov r0, 0x0; bkpt; bkpt; bkpt; bkpt; rti;
<clever> that sounds like garbage
<clever> look up, in the other direction
<clever> youll see a big patch of zeros
<bonda_000> yeah like starting from 60007595
<clever> thats where your rom ends
<clever> 00007720 3e 63 00 60 e2 60 00 60 12 62 00 60 40 00 00 00 |>c.`.`.`.b.`@...|
<clever> 00007730 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
<clever> 00007740 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
<clever> this is the end of my pi3 boot rom
<bonda_000> yeah exactly the same
<clever> the rest of the space up until 60008000 is just unused space where the rom could be
<clever> as the pi2 to pi3, did increase the rom size
<clever> 18kb to 30kb
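Finding where the ROM ends in a 64KB dump can be automated along the lines described above: look backwards from offset 0x8000 for the last non-zero byte. This is a sketch with a hypothetical `rom_end` helper; it assumes the dump starts at 0x60000000 and that the ROM itself does not end in zero bytes, so the result is a lower bound.

```python
def rom_end(dump: bytes, limit: int = 0x8000) -> int:
    """Offset just past the last non-zero byte below `limit` (0x60008000 in the dump)."""
    region = dump[:limit]
    # Trailing zero padding is the unused space where a larger ROM could have been.
    return len(region.rstrip(b"\x00"))


if __name__ == "__main__":
    # Synthetic 64KB dump: 13 bytes of "ROM", zero padding up to 0x8000, then junk.
    fake = b"\x01\x02" + b"\x00" * 10 + b"\x05" + b"\x00" * (0x8000 - 13) + b"\xff" * 0x8000
    print(hex(rom_end(fake)))
```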
<bonda_000> so is GP always pointing at 60008000? it's in the isp code too
<clever> nope
<clever> the gp is pointing to whatever the entry function set it to
<clever> 2024-04-28 07:48:36 < clever> `mov gp,0x8000c000` is what i find in my bootcode.bin
<clever> i tend to create a `gp_area` symbol at that address
<clever> but i have yet to figure out how to tell ghidra that gp points there
<clever> so i just create a `struct gp_area`, and populate that
<bonda_000> yeah mine is also mov gp, 0x8000c000 straight away
<clever> the exact number might be set in a linker script
<clever> that would explain why its so nicely aligned
<clever> oh, looking closer, definitely
<clever> there is a giant hole of nulls, from 9e1a to c000
<clever> nearly 8kb of padding, because they didnt adjust things down
<clever> it makes developing simpler, but means bootcode.bin has 8kb of fat that could have been trimmed
<bonda_000> a bunch of strings slightly past that address is what I see
<clever> yeah, the exact location of the strings will depend on how much code is in the binary
<bonda_000> so the bootcode looks something up in the fixup and patches the memory accordingly or patches the start_x elf?
<clever> fixup.dat is a binary patch, that modifies things in start.elf
<clever> and fixup_x.dat is a patch for start_x.elf
<clever> they are a matched pair, and must come from the exact same build
<bonda_000> sorry, I meant fixup_x
<clever> the inputs to the patch, are the gpu_mem setting, and the addr for the top of ram
<clever> originally, the rpi firmware had no fixup.dat
<clever> start.elf just loaded once, to the address in the elf, and that was it
<clever> and there was only 256mb models of pi
<clever> then a 512mb model came out, and RPF released 2 start.elf files, linked for different addresses
<clever> then gpu_mem and more ram sizes, made things too complex
<clever> so they switched to the patching method
<bonda_000> so what I am looking at in start_x elf is not the real picture
<clever> the only real difference, is that the address of everything is shifted by some number
<bonda_000> my fixup_x on the running pi is 10232 bytes
<clever> [clever@amd-nixos:~/apps/cnlohr/mini-rv32ima/mini-rv32ima]$ readelf -l ./fat-rv32ima.fatter
<clever> Type Offset VirtAddr PhysAddr
<clever> FileSiz MemSiz Flags Align
<clever> LOAD 0x0000000000005000 0x0000000000405000 0x0000000000405000
<clever> 0x0000000000001a50 0x0000000000001a50 R 0x1000
<clever> bonda_000: i'm sure you're familiar with these headers, having converted your bins to elf
<clever> ah, my toolchain is ready, redoing the example
<clever> [nix-shell:~/apps/rpi/lk-overlay]$ vc4-elf-readelf -l ../firmware/boot/start.elf
<bonda_000> no mine starts with
<clever> Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
<clever> LOAD 0x0004e0 0xcec00100 0xcec00100 0x00100 0x00100 RW 0x20
<clever> > (0xcec00100 - 0xc0000000)/1024/1024
<clever> 236.000244140625
<clever> that first LOAD, starts at about 236mb
<clever> 256-236, is 20, the default gpu_mem
<clever> so, this start.elf, was designed to run 20mb from the "top of ram", on a 256mb model pi
<clever> but if you have a 512mb pi, you need to add 256mb to every address in the elf headers
<clever> and add 256mb to every address in the entire binary
<bonda_000> yeah that 0xcecxxxxx is something I recognize
<clever> fixup.dat, contains a list of where to find those addresses
<bonda_000> i see
<clever> now, lets say you changed it to gpu_mem=30, now you need to subtract 10mb from everything
<clever> this is a big ugly hack, caused by RPF not having elf relocation support :P
<clever> [nix-shell:~/apps/rpi/lk-overlay]$ vc4-elf-objdump -dr build-rpi3-bootcode/platform/bcm28xx/platform.o
<clever> 20: 00 e8 00 00 00 00 mov r0,0x0
<bonda_000> so if I have 1GB ram on my pi
<clever> 20: R_VC4_IMM32 .rodata
<clever> 26: 80 90 00 00 bl 26 <cmd_what_are_you+0x14>
<clever> 26: R_VC4_PCREL27_MUL2 puts
<clever> then the bootcode.bin, has to go thru the entire start(_x).elf file, and add 768mb to every address
<bonda_000> to EVERY address?
<clever> yes
<clever> everything that is pointing within the elf
<clever> but not MMIO
<bonda_000> I mean
<bonda_000> gpu_mem=30 means first 30 MB of RAM are used by VC4
<clever> nope, LAST 30mb
<clever> thats the reason its such a mess
<clever> everything from 0 to (1024-30) belongs to the arm
<clever> and everything from (1024-30) to 1024, belongs to the gpu
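The arithmetic above can be written out as a small Python sketch. It assumes, as described in the log, that the image was linked for a 256mb pi with gpu_mem=20 and that the patch is a single constant added to every in-image address; `relocation_delta` is a hypothetical name, and the real fixup.dat format is not modelled here.

```python
MB = 1024 * 1024

LINKED_LOAD = 0xCEC00100   # first LOAD VirtAddr from the readelf output above
ALIAS_BASE = 0xC0000000    # the image is linked against the 0xc alias
LINKED_RAM_MB = 256        # assumption from the log: built for a 256mb model
LINKED_GPU_MB = 20         # assumption from the log: default gpu_mem


def relocation_delta(ram_mb: int, gpu_mem_mb: int) -> int:
    """Bytes to add to every address that points inside the image (not MMIO)."""
    linked_start_mb = LINKED_RAM_MB - LINKED_GPU_MB   # 236
    wanted_start_mb = ram_mb - gpu_mem_mb             # gpu_mem is the LAST part of ram
    return (wanted_start_mb - linked_start_mb) * MB


if __name__ == "__main__":
    print((LINKED_LOAD - ALIAS_BASE) / MB)        # roughly 236
    print(relocation_delta(512, 20) // MB)        # 512mb pi, default gpu_mem
    print(relocation_delta(1024, 20) // MB)       # 1gb pi, default gpu_mem
    print(relocation_delta(256, 30) // MB)        # 256mb pi, gpu_mem=30
```

The three calls reproduce the cases from the conversation: add 256mb, add 768mb, and subtract 10mb.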
<bonda_000> its a bit confusing how on Broadcom data sheet SDRAM is split in four slice one slice per 'alias region'
<bonda_000> what from arm is just straightforward 0x0 - 0x3fffffff
<clever> that is just repeating the same 1gig of ram 4 times
<bonda_000> but thats the awkward VC4 memory map is why its sliced like that
<clever> the top 2 bits of the address, are cache control flags
<clever> any read from 0 to 0x3fffffff, is allowed to access both the L1 and L2 cache
<clever> any read from 0x4000_0000 to 0x7fff_ffff is allowed to read from the L2 cache, but a cache miss wont load the data into the L2 cache
<clever> anything from 0x8000_0000 to 0xbfff_ffff will use the L2 cache normally, but ignore the L1 cache
<clever> and anything from 0xc000_0000 and up, ignores both L1 and L2
<clever> the reason for all of that, is that you can write to something in the "0 alias" (L1/L2 work normally), then you can do dma from the "4 alias" (L2 reads, but doesnt allocate)
<clever> now the dma will respect what is in the L2 cache, so you dont have to flush the cache
<clever> but the dma wont trash your L2, by filling it with data
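A small sketch of the alias scheme as described above: the top two bits of a VPU bus address select the cache behaviour, and the low 30 bits are the offset into the same 1gig of ram. The descriptions are paraphrased from the log, `decode_alias` is a hypothetical helper, and the sketch deliberately ignores the MMIO window at 0x7e00_0000, which sits inside the "4 alias" range but is not ram.

```python
# Cache behaviour per alias, paraphrased from the description above.
ALIASES = {
    0x0: "0 alias: L1 and L2 both used",
    0x1: "4 alias: reads hit L2, but a miss does not allocate in L2",
    0x2: "8 alias: L2 used normally, L1 ignored",
    0x3: "c alias: L1 and L2 both ignored",
}


def decode_alias(addr: int) -> tuple:
    """Split a 32-bit VPU bus address into (alias description, offset into ram)."""
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit address")
    return ALIASES[(addr >> 30) & 0x3], addr & 0x3FFFFFFF


if __name__ == "__main__":
    desc, offset = decode_alias(0xCEC00100)
    print(desc, hex(offset))
```

Under this model the elf's 0xcec00100 load address and a 0x0ecxxxxx address seen in a decompile refer to the same ram, just through different aliases.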
<clever> originally, the VPU ran the entire system, and the videocore was arm-less
<bonda_000> thats what I want to do run it arm-less
<bonda_000> so that the 4 cores dont waste power
<clever> you may want to look at my lk-overlay stuff then
<bonda_000> so I'm with you on the split but my VC4 should have all 1GB
<bonda_000> brb
<bonda_000> isn't dma only supposed to work on RAM
<clever> and all 4 of those aliases, are the same 1gig of ram
<bonda_000> and L1 and L2 caches are filled as the processor accesses memory
<clever> the problem, is when the cpu writes something, and the data only lives in the cache
<clever> you want dma to read the most recent version of the data
<bonda_000> yeah thats what write through is for
<clever> you can either do a costly cache flush, or you can just give dma access to the cache
<clever> on x86, the pcie controller is part of the cpu, and it has permission to snoop on the caches, so you never have to flush
<clever> on the vpu, the dma can access the L2 cache, but not L1, and you use the top 2 bits of the address, to control what dma will do
<clever> but dma cant access the arm L1/L2 cache at all (design flaw by broadcom)
<clever> so the arm must flush caches all the time
<bonda_000> so on VC4 if the dma sees the L1/2 data and RAM data differ its going to take the cache version of it?
<clever> not exactly
<clever> one min
<clever> bonda_000: look at fib 2
<clever> the L2 cache is just another slave on the main bus
<clever> so any access to the 0/4/8 alias, will just route the request to the L2 cache
<clever> if the L2 cache has the data, thats a cache hit, it returns the data, and jobs done
<clever> if the L2 cache doesnt have the data, it will then set bits 30/31 to 1 (moving it to the c alias), and send the request back out onto the bus, where dram will respond
<clever> if the original request was to the 4 alias, then the L2 cache doesnt cache the response, and only forwards it on
<clever> but if the request was for the 0 or 8 alias, the L2 cache will fetch an entire cache line, and cache it
<clever> while also forwarding the answer to the original request
<clever> s/fib 2/fig 2/
<bonda_000> thats what I'm looking at
<bonda_000> figure 2
<clever> yep
<bonda_000> whats fib
<bonda_000> ?
<clever> a typo
<bonda_000> oh ok
<clever> fib 3b, is showing the VPU core itself
<clever> dang, i did the same typo again! lol
<bonda_000> thats all videocore?
<clever> that specific example was using the arm, but i have done the same thing entirely on videocore
<clever> this is entirely using the VPU, HVS, and V3D blocks
<clever> the V3D runs shaders to produce the triangle in dram
<clever> the HVS takes several 2d images in dram, and outputs them to video
<clever> and the VPU changes the config params to everything, and issues commands
<bonda_000> thats nuts
<clever> and here is 20 sprites, using the VPU and HVS
<clever> its underclocked a bit, so it struggles when too many sprites are on the same scanline
<clever> and this is the most extreme example
<clever> the HVS is drawing 4 sprites
<clever> the top half of the screen is the VPU framebuffer, for VPU logs
<bonda_000> what is that glitch
<clever> the bottom half of the screen is the linux framebuffer
<clever> bonda_000: the HVS is constantly racing the electron beam, like old 8bit consoles
<clever> and it has to fetch an entire scanline worth of image data, within a certain deadline, at the full hsync rate
<clever> and if you have too many sprites on the same scanline, it cant keep up
<bonda_000> is that VGA?
<clever> most of the examples are ntsc
<clever> but i can do vga
<clever> that one also shows the VPU usage, at 0%
<clever> its using so little cpu, that it doesnt even register
<bonda_000> so mostly done on hardware accelerators
<clever> yep
<clever> this is just regular old c code for mandelbrot, compiled to the VPU and running normally
<clever> it takes about 10 seconds to render a frame
<clever> this is a vectorized mandelbrot, running on the VPU, i got it down to 90ms to render a frame
<clever> https://www.youtube.com/watch?v=l7lIewA9fm4, oh even faster in this version
<bonda_000> and do you know where in the bootcode it releases the reset on ARM? is it through PowerManager or something else?
<clever> never
<clever> bootcode.bin doesnt turn on the arm
<clever> its start.elf that turns the arm on
<clever> this is the source that my open firmware uses
<bonda_000> well then I think what I'll try to do now is try to put the regdump at the exit point of the bootcode and see it if hasn't enabled ISP by that time
<bonda_000> so after it finished writing start_x.elf to SDRAM and before it branches there
<bonda_000> at bootcode stage everything is in 128KB L2 Cache
<clever> the only job bootcode.bin has, is to bring the lpddr2 controller online
<clever> so its not going to be turning the ISP on at any point
<bonda_000> so when the system is not yet booted you cant see these registers, as well as when its fully booted you can't see the ISP registers also
<clever> you need to ask the firmware to turn on the ISP first
<bonda_000> idk I tried playing with PWRMAN registers IMAGE register and it didn't go well
<bonda_000> instead of flipping the bit on ISPRST it killed HDMI
<clever> ive had trouble getting hdmi to even start working
<clever> so thats useful to know!
<bonda_000> thats when Linux fully booted
<clever> first thing i would do, is check what the firmware does here
<clever> #define PM_IMAGE HW_REGISTER_RW( 0x7e100108 )
<bonda_000> idk I guess something in that CSI port probably when you plug in the camera it issues an interrupt and thats when the ISP block goes on
<clever> CSI is completely separate from ISP
<clever> and i do roughly know how to turn CSI on
<clever> so, for PM_IMAGE, first i go into ghidra, in the memory window, and i create a PM block at 0x7e100000, 0x1000 long, volatile
<clever> then i can use `g` to goto 0x7e100108
<clever> and `l` to label it as `PM_IMAGE`
<clever> then `t` to set the type to `uint`
<clever> now i look at the XREFs, and i can see power_disable_domain, clock_undo_cksm, and clock_suspend_*
<clever> but the list doesnt look complete, so i right click, references, to PM_IMAGE
<bonda_000> as in you can make a typedef struct?
<clever> that can also be done, but i didnt do it this time
<clever> yep, as i expected, there is also a power_enable_domain(), that wasnt visible
<bonda_000> huh
<bonda_000> in that unstripped elf file?
<clever> yep
<bonda_000> brb 5 min
<clever> _DAT_7e101020 = 0x5a000000;
<clever> i can also see this
<clever> ghidra does that, when it doesnt know about the memory range
<clever> [clever@amd-nixos:~/apps/rpi/rpi-open-firmware]$ git grep 7e101020
<clever> common/broadcom/bcm2708_chip/cpr_clkman.h:#define CM_PERIICTL HW_REGISTER_RW( 0x7e101020 )
<clever> so i do this, and find that its the clock area
<clever> so then i add that as CM
<clever> 0x7e101000 is the base of clock
<clever> _DAT_7e806214 = 0x7e;
<clever> common/broadcom/bcm2708_chip/vec.h:#define VEC_DAC_MISC HW_REGISTER_RW( 0x7e806214 )
<clever> } while ((_DAT_7e00a018 & 2) != 0);
<clever> common/broadcom/bcm2708_chip/apb_async_bridge_ctrl.h:#define ASB_H264_S_CTRL HW_REGISTER_RW( 0x7e00a018 )
<clever> _DAT_7e00b440 = _DAT_7e00b440 & 0xfffffdff;
<clever> common/broadcom/bcm2708_chip/arm_control.h:#define ARM_BASE 0x7E00B000
<clever> case sensitive search tripped me up
<clever> _DAT_7e102080 = uStack48;
<clever> common/broadcom/bcm2708_chip/cpr_apb2wtap.h:#define A2W_HDMI_CTL0 HW_REGISTER_RW( 0x7e102080 )
<clever> bonda_000: so, after labeling every MMIO it used as volatile, the code becomes https://gist.github.com/cleverca22/6b23d57eb6c8d8df3ef4ddb052e5f288
<clever> #define ARM_CONTROL1 HW_REGISTER_RW(ARM_BASE+0x440)
<clever> the arm stuff is defined differently, so you have to read the file to find the addr's
<bonda_000> so I did that added the PM block at 7e100000
<clever> bonda_000: aha, i found a ASB_ISP_M_CTRL, that register will be critical, and the isp will never work without it
<clever> but i have an example of how to use it in the v3d code
<bonda_000> it doesnt show me that function power_enable_domain()
<clever> do you have the symbol tree on the left?
<bonda_000> yeah
<bonda_000> there's still three that start with a dollar sign you mentioned earlier I need to expand
<clever> there is a search box at the bottom of the symbol tree
<clever> and here is an example of those ASB regs being used
<bonda_000> oh I see now
<bonda_000> if I scroll all the way down to 7e100000 it shows what functions refer to this area in memory
<clever> you can also just hit `g` and give it an address
<clever> to goto that addr
<clever> but the list is rather long, and it was fishy that i saw disable, but not enable
<clever> 2024-04-28 09:17:19 < clever> but the list doesnt look complete, so i right click, references, to PM_IMAGE
<clever> so i asked for a complete list
<clever> bonda_000: i went over the entire function, and labeled every MMIO address
<clever> you can click `code` to see the non-diff version
<bonda_000> I have references to DAT_7e100108 and how do you give it a name?
<clever> 2024-04-28 09:15:46 < clever> so, for PM_IMAGE, first i go into ghidra, in the memory window, and i create a PM block at 0x7e100000, 0x1000 long, volatile
<clever> re-read the ~5 messages below that one
<clever> then repeat that dance, on every address you see in the decompile
<bonda_000> yeah so it doesnt replace it to PM_Image in the code but still useful
<clever> adding it to the memory window, just changes it to read_volatile()
<clever> 2024-04-28 09:16:05 < clever> then i can use `g` to goto 0x7e100108
<clever> 2024-04-28 09:16:13 < clever> and `l` to label it as `PM_IMAGE`
<clever> 2024-04-28 09:16:23 < clever> then `t` to set the type to `uint`
<bonda_000> ah
<clever> this then creates a symbol, and it becomes read_volatile(PM_IMAGE)
<bonda_000> forgot the t
<clever> reading the decompile, now that its done, i can see that if param2 is 2, it will do something in CM_PERIICTL, PM_IMAGE, and VEC_DAC_MISC, so thats likely turning on the VEC (composite video encoder)
<clever> 4, it will do some things with H264
<clever> 8, the ISP
<clever> 0x10, i'm not sure
<clever> 0x40, the v3d, i think
<clever> 0x2000, the CCP2TX (one of the csi things)
<clever> 0x8000, DSI1
<clever> 0x10000, usb
<clever> 0x400, hdmi
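The list above can be kept as a lookup table while reading other decompiles. The names below are just the guesses from the conversation, not official identifiers, and treating param2 as a bitmask (rather than a single value per call) is an assumption suggested only by the values all being powers of two. `describe_domains` is a hypothetical helper.

```python
# Guessed meaning of each param2 value, copied from the list above.
DOMAIN_GUESSES = {
    0x2: "VEC (composite video encoder)",
    0x4: "H264",
    0x8: "ISP",
    0x10: "unknown",
    0x40: "V3D (unsure)",
    0x400: "HDMI",
    0x2000: "CCP2TX",
    0x8000: "DSI1",
    0x10000: "USB",
}


def describe_domains(mask: int) -> list:
    """Names of every guessed domain whose bit is set in `mask`, lowest bit first."""
    return [name for bit, name in sorted(DOMAIN_GUESSES.items()) if mask & bit]


if __name__ == "__main__":
    print(describe_domains(0x8))
    print(describe_domains(0x408))
```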
<bonda_000> uint or uint32_t* ?
<clever> just `uint` for MMIO
<clever> its not a pointer
<bonda_000> there's like 5 different uints
<clever> just hit enter, and it usually picks the right one
<clever> another issue with the unstripped elf, every string has a $S symbol on it, which makes them show up weirdly
<bonda_000> it didnt become read_volatile(PM_IMAGE) for me but rather PM_IMAGE = 0x5a001000
<clever> i see a ldconfig_get_config($S) in the decompile
<clever> screenshot?
<clever> if i double-click on the $S, i can then use `c` and right click->data->string, to fix the types up
<clever> and it becomes ldconfig_get_config("config_hdmi_boost")
<bonda_000> no I mean the code using this address turned from Ram7e100108 = 0x5a001000 is now PM_IMAGE = 0x5a001000
<clever> sounds like you forgot to check volatile in the memory window
<bonda_000> for the whole PM block?
<clever> yes
<clever> all MMIO is always volatile
<bonda_000> it is RWxV
<clever> https://imgur.com/a/Ry6PsHs your memory window should look like this
<bonda_000> i did make it volatile
<clever> now that ive labeled everything, i can see that this block is clearly in control of the ISP power
<bonda_000> so it put the write_volatile and read_volatile for you?
<clever> yep
<clever> the instant you tag a region as volatile, it does that
<clever> if i then use `s` to search for the addr of power_enable_domain(), it points to a big undefined1[52] array, called power_func_table
<clever> 52/4 is 13, so this is actually a void*[13]
<clever> `t` to fix that type up, and now its clearly an array of power_ functions
<clever> and that is referenced by `power_get_func_table()`
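The 52/4 = 13 arithmetic above can be checked by reinterpreting the raw bytes as 32-bit pointers. A minimal sketch, assuming little-endian 32-bit pointers; `split_pointer_table` is a hypothetical helper and the input is placeholder data, not bytes from the firmware.

```python
import struct

POINTER_SIZE = 4  # assumption: 32-bit pointers on the VPU


def split_pointer_table(raw):
    """Interpret raw bytes as an array of little-endian 32-bit pointers."""
    if len(raw) % POINTER_SIZE:
        raise ValueError("table length is not a multiple of the pointer size")
    count = len(raw) // POINTER_SIZE
    return list(struct.unpack("<%dI" % count, raw))


# Placeholder bytes standing in for the undefined1[52] array.
fake_table = bytes(52)
print(len(split_pointer_table(fake_table)))
```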
<bonda_000> and it doesnt show up as read_ or write_volatile when accessed
<clever> what does the decompile look like?
<clever> thats strange
<bonda_000> maybe it wants volatile uint or something there's ?? too
<bonda_000> ghidra v11.03
<clever> that ?? is because you didnt flag the PM region as initialized
<clever> so ghidra has no idea what value is in those bytes
<bonda_000> 11.0.3
<clever> my OCD requires it to all say 0, so i check the initialized box
<clever> thats odd, i have version 9.1 of ghidra
<clever> and the PR i'm running from has never been merged
<clever> where did you get ghidra from?
<bonda_000> a .zip from NSA github
<clever> strange
<clever> i built mine from source, using the PR
<clever> let me investigate that...
<bonda_000> it may not really matter
<clever> ive been using the version from here
<bonda_000> whats painful is doing all the hw regs by hand all over again
<bonda_000> oh okay I just got the most recent one and took his language folder
<clever> ?
<clever> just copied the vc4.slaspec file into the latest ghidra?
<clever> and all of the related files
<bonda_000> well that pull request with vc4 support didnt have a Windows .bat so I didn't want to
<clever> ahh
<bonda_000> yeah almost
<clever> i never considered trying that
<bonda_000> ran it through the Eclipse and took a Module file
<clever> i'm on linux, so i just built all of ghidra, directly from the pr
<clever> so you basically just ported the vc4 stuff to the latest ghidra, nice
<clever> i should do the same at some point, my ghidra is pretty old
<bonda_000> the whole data folder plus it needs Module.symvers or something that I got from Eclipse with Ghidra extensions
<clever> i'm going to just assume that PM_IMAGE being in red means volatile, and assume its working fine
<clever> https://postimg.cc/kDNkvWWc but down on line 98, you have a _DAT_7e..., so you need another memory range
<clever> common/broadcom/bcm2708_chip/apb_async_bridge_ctrl.h:#define ASB_H264_M_CTRL HW_REGISTER_RW( 0x7e00a01c )
<bonda_000> yeah that's that bus controller from the patent Fig2 right?
<clever> yeah
<bonda_000> not sure what bridge means in this case
<clever> in this case, there is a port/bridge connecting the ISP to the bus, 2 of them
<clever> ASB_ISP_S_CTRL controls the "slave port", where MMIO on the ISP is exposed to the bus
<bonda_000> oh yeah thats why its unreadable
<clever> ASB_ISP_M_CTRL controls the "master" port, where the ISP is able to perform its own bus transfers (dma)
<clever> both must be enabled for the ISP to function
<clever> but if you enable that while the power is off, the axi bus will malfunction and crash the entire system
<bonda_000> and its just 8 registers long this peripheral memory map
<clever> this is what the decompile gave me, for turning the isp on
<bonda_000> i see it
<clever> lines 158/159, it clears bit0 in the slave register
<clever> #define ASB_ISP_S_CTRL_CLR_REQ_SET 0x00000001
<clever> that is the "clear request" bit
<clever> 160-162, it waits for bit1 to clear itself
<clever> #define ASB_ISP_S_CTRL_CLR_ACK_SET 0x00000002
<clever> thats the ack bit
<clever> 163-167, it does the same thing on the master port
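The handshake described above (drop the clear-request bit, then poll until the ack bit drops) can be modeled like this. It is a sketch only: `FakeAsbCtrl` is a made-up stand-in for the MMIO register, the 3-read delay is arbitrary, and any password bits the real register may require on writes are ignored. Only the two bit values come from the header quoted in the chat.

```python
CLR_REQ = 0x00000001  # ASB_ISP_S_CTRL_CLR_REQ_SET, from the header
CLR_ACK = 0x00000002  # ASB_ISP_S_CTRL_CLR_ACK_SET, from the header


class FakeAsbCtrl:
    """Stand-in for an ASB control register; real hardware is MMIO."""

    def __init__(self):
        self.value = CLR_REQ | CLR_ACK  # bridge starts held in "clear"
        self._delay = 3                 # reads before the fake ack drops

    def read(self):
        # Once the request is released, the ack drops after a few reads.
        if not (self.value & CLR_REQ) and (self.value & CLR_ACK):
            self._delay -= 1
            if self._delay <= 0:
                self.value &= ~CLR_ACK
        return self.value

    def write(self, v):
        # Only the request bit is software-writable in this model.
        self.value = (self.value & ~CLR_REQ) | (v & CLR_REQ)


def enable_bridge(reg, max_polls=1000):
    """Clear CLR_REQ, then poll until the hardware drops CLR_ACK."""
    reg.write(reg.read() & ~CLR_REQ)
    for _ in range(max_polls):
        if not (reg.read() & CLR_ACK):
            return True
    return False
```

The same sequence would then be repeated on the master port, and only after the power domain is already up.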
<bonda_000> offtopic
<bonda_000> I wonder if they left all the functions unstripped in the bootcode.bin also?
<clever> not possible
<clever> the symbols are in the elf file
<bonda_000> the 10year old one
<clever> a .bin lacks all of the elf headers
<clever> thats why you needed a custom tool, to put the elf back on
<bonda_000> there still are a bunch of strings laying around .bin even the new one that I have
<clever> those are just log messages it can print
<bonda_000> true
<clever> have you seen how to enable it?
<bonda_000> enable what? logging?
<clever> yeah
<bonda_000> like through uart?
<clever> sed -i -e "s/BOOT_UART=0/BOOT_UART=1/" bootcode.bin
<clever> yep
<clever> just run sed on the bootcode.bin file, to change a 0 to a 1
<clever> and it will print all kinds of logs to the uart
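For anyone without sed (for example on Windows), the same patch can be sketched in Python. `enable_boot_uart` is a hypothetical helper name; the only facts taken from the chat are the two strings being swapped.

```python
def enable_boot_uart(blob):
    """Return a copy of blob with every BOOT_UART=0 changed to BOOT_UART=1."""
    old, new = b"BOOT_UART=0", b"BOOT_UART=1"
    patched = blob.replace(old, new)
    # Same-length replacement, so no offsets inside the binary move.
    assert len(patched) == len(blob)
    return patched


# Demo on placeholder bytes; a real run would read and rewrite bootcode.bin.
sample = b"\x00\x01BOOT_UART=0\x00junk"
print(enable_boot_uart(sample))
```

Because the replacement has the same length as the original, the rest of the file stays byte-for-byte where it was.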
<bonda_000> that's sick
<bonda_000> i mean only after a certain point the firmware initializes the UART and anything before that won't show up on the terminal
<clever> its only 2 levels down the callstack from _entry
<bonda_000> brb
<clever> and i think it brings the uart online before it brings the dram online
<bonda_000> do you think its possible to boot from an off-board EEPROM?
<clever> yep
<bonda_000> or it fails if it doesnt find the sd card
<clever> ive decompiled the boot rom, and documented that
<clever> bonda_000: the rom can load a .bin file from these 8 sources
<clever> usb-device is a tricky one though, it wont "fail" until an attached usb host says the file isnt found
<clever> so if usb-device is enabled, nothing after it will function
<clever> but everything from 1-5 can fail and try the next one
<bonda_000> so in the forums I saw someone say the ISP will use the GPU slices
<bonda_000> but our decompiler setup does not include those 64 bit instructions
<bonda_000> the ones from VideoCore IV manual
<clever> ISP and V3D are entirely different hw blocks
<bonda_000> sorry I mean QPU slices
<clever> and 6by9 (the gpu expert) said ISP doesnt have an instruction set
<clever> so its likely just a bunch of addresses and config params
<clever> which reminds me, let me just look at that in disassembly...
Guest30 has joined ##raspberrypi-internals
<bonda_000> well but all the denoise stuff must have to do with vector math
<bonda_000> which is what QPUs are for. so that code is embedded in ISP and we don't get to see it?
<clever> i think the ISP has its own dedicated hw blocks, for doing the vector math
<clever> there are at least 4 different hw blocks, that can all do different kinds of vector math
<clever> the VPU has vector opcodes, that can act on a [16] of ints at a time, and it can hold 4096 bytes in the vector registers
<clever> the VCE is a turing complete block, that manages the h264/vc1/mpeg2 hw
<clever> the QPU within the V3D is a vector only cpu
<clever> and the ISP loads image data from ram in bulk, does something, and writes it back to ram
<clever> and the HVS also could be said to be a vector core, it loads image data from ram, scales it, and writes it to a FIFO
<clever> write_volatile_4(UINT_7ea00050,*(uint *)(L348.mosaic_patte + uVar3 * 4));
<clever> looking at a random ISP function, i can find this
<clever> L348.mosaic_patte is an array, and from the decompile, its 32bits each, so if i fix the type...
<clever> write_volatile_4(UINT_7ea00050,L348.mosaic_patte[uVar3]);
<clever> it decompiles to this instead
<clever> and uVar3 came from a struct, param_1
<bonda_000> offset 0x50 is a Image Sensor Pipeline Frame Mosaic Register, bits 31:0 MOSAIC
<bonda_000> I put together an xls file a la Broadcom datasheet with SMI and ISP regs
<bonda_000> let me upload it
<clever> where did you get that from??
<clever> having descriptions like that will massively speed things up
<bonda_000> one of the android drops from Broadcom header files
<clever> ah, what was the path within that?
<clever> i must have missed that file
<bonda_000> its on google git, android open stack, architectures Rhea, Hawaii, and something else seem to match our proc
<clever> link? i have a complete copy of the header drop, and its not in here
<clever> i also cant find that on google
<clever> nice, but i'm still puzzled on where you found this information
<bonda_000> it was in the thread I made on the raspberry forum
<clever> thats a different android drop from the one i got, interesting
<bonda_000> I put it into excel file to further fill in
<bonda_000> I put bit field names they provided but almost no descriptions and R/W access
<bonda_000> the actual start_x.elf accesses more than they have listed in that header
<clever> its probably a different version of the ISP
<clever> but thats still a huge help
<clever> cross-referencing to that, and labeling things, i can immediately see useful details
<clever> boom, its setting the file address, to param1, with the top 2 bits cleared, and then set to a 4
<clever> thats the 4 alias i mentioned before
<clever> l2 coherent, non-allocating
<clever> so the ISP will be able to read the L2 cache as it does its thing, but it wont flood the L2 with useless data
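The address fix-up described above (clear the top 2 bits, then select the 4 alias) looks roughly like this. It is a sketch assuming the alias lives in the top 2 bits of a 32-bit bus address and that the "4 alias" means 0x4000_0000; the constant and function names are made up for illustration.

```python
ALIAS_MASK = 0xC0000000         # assumption: top 2 bits select the alias
ALIAS_L2_NONALLOC = 0x40000000  # the "4 alias": L2 coherent, non-allocating


def to_l2_nonalloc_alias(addr):
    """Strip whatever alias addr carries and apply the 4 alias instead."""
    return (addr & 0xFFFFFFFF & ~ALIAS_MASK) | ALIAS_L2_NONALLOC


print(hex(to_l2_nonalloc_alias(0xC1234560)))
```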
Guest30 has quit [Quit: Client closed]
<clever> but, nothing in that android repo refers to things like the mosaic register
<clever> drivers/char/broadcom/isp/isp.c:#include <mach/rdb/brcm_rdb_isp.h>
<clever> oh, wait
<clever> that smells like a linux driver!!
<clever> it is a linux driver!
<bonda_000> there must be
<bonda_000> as well as there is thread control
<bonda_000> I've seen strings like "ISPTASKRESPOND"
<clever> mmap() on the character device, will just map the peripheral directly into userland
<clever> so they dont have to agree with linux GPL, and can hide all the juicy things in userland
<clever> the GPL escape-hatch :P
<clever> but the driver still deals with IRQ's for them
<clever> and they "messed up" and included that full header file, advertising every single register
<bonda_000> well everything there seems to be Linux-driver compatible
<bonda_000> OMX interfaces both on VC4 and Linux side
<clever> yeah, but the driver is missing critical bits of control logic
<clever> but what is isp_ioctl doing...
<clever> ah, just 3 things, wait for irq, release irq, and clock reset
<clever> nothing too interesting
<bonda_000> finding the ISP irq body would be interesting
<clever> static irqreturn_t isp_isr(int irq, void *dev_id)
<clever> thats what this is doing
<clever> upon receiving an irq, it reads the STATUS register, then writes it right back into the status register
<clever> a typical write 1 to clear pattern
<bonda_000> so in that Android document it says there are two ISP blocks present. Maybe its for the phones though with front and back cameras
<clever> it then signals for ioctl() to return from the ISP_IOCTL_WAIT_IRQ
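The write-1-to-clear pattern mentioned above can be modeled in a few lines. This is a simulation, not the driver code: `W1CStatus` and `isr` are hypothetical names, and the register is a plain Python object rather than MMIO.

```python
class W1CStatus:
    """Model of a write-1-to-clear status register."""

    def __init__(self, pending=0):
        self.pending = pending

    def read(self):
        return self.pending

    def write(self, value):
        # Writing a 1 clears that bit; writing a 0 leaves it alone.
        self.pending &= ~value


def isr(status_reg):
    """Acknowledge everything that is pending and return what was seen."""
    seen = status_reg.read()
    status_reg.write(seen)
    return seen
```

Writing back exactly what was read means an interrupt that fires between the read and the write is not acknowledged by accident.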
<clever> on the rpi, a single ISP can service 2 cameras
<bonda_000> 0x75000 apart from each other
<clever> the only reason to have an extra, is to increase the bandwidth
<clever> more pixels/sec
<clever> but, the rpi does also use the ISP in h264 encode/decode
<clever> the h264 stuff requires the pixels to be in a really strange order
<clever> and the ISP is good at copying data, and changing the order around
<bonda_000> its beyond me what is Linux ioctl doing on the VC4 side
<bonda_000> its like a mini Linux on the VC4 too you are trying to say?
<bonda_000> or, to be more exact, the ThreadX OS that runs has drivers and Isp is one of them
<clever> i only just now noticed, the repo you linked is a linux fork, lol
<clever> the start.elf firmware is based on threadx, and contains drivers for the isp, hvs, v3d, vce, and much more
<clever> and it also contains services like mmal, openmax, vchiq, and more
<clever> and the firmware acts as a middle-man, for any requests to the hw
<clever> line 89, the isr (interrupt service routine), it will read the status reg, write it back to status, stash that in a struct, and signal completion of irq_sem
<bonda_000> yeah that repo is not much different from what we have aside from more register definitions, it still talks to the VC4 through mmal-vchiq
<clever> line 212, the mmap function for this char device, it will map RHEA_ISP_BASE_PERIPHERAL_ADDRESS when you try to mmap the char device
<clever> think of this like mmap on /dev/ttyS0
<clever> the driver is free to map whatever it wants
<bonda_000> well, I guess if you could enable ISP on RPi then that driver would work on RPi too
<clever> but that driver is only 2% of the code
<clever> they hid the real logic in userland
<clever> likely a closed-source library
<clever> and because its not linking with linux, it doesnt have to respect the GPL and reveal the source
<clever> loop-holes
<clever> the only thing the driver does, is manage the clock, the irq, and expose things via mmap
<clever> write_volatile_4(ISPII_FIFO_OFFSET,uVar2 << 8 | uVar1);
<clever> ah, ive seen very similar code in the HVS stuff
<clever> the fifo is a chunk of ram, and these 2 vars are the start/end of where a fifo lives in the ram
<clever> in one case, its spanning from 0x18 to 0x24
<clever> i'm not sure why the ranges are so short, but things will make more sense with time
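The `uVar2 << 8 | uVar1` packing can be sketched as below. Which variable is the start and which is the end is not stated in the chat, so this assumes the low byte is the start and the next byte is the end; the helper names are hypothetical.

```python
def pack_fifo_range(start, end):
    """Pack two 8-bit values the way the decompile shows: hi << 8 | lo.

    Assumption: start goes in the low byte, end in the next byte.
    """
    if not (0 <= start <= 0xFF and 0 <= end <= 0xFF):
        raise ValueError("both values must fit in 8 bits")
    return (end << 8) | start


def unpack_fifo_range(value):
    """Inverse of pack_fifo_range: return (start, end)."""
    return value & 0xFF, (value >> 8) & 0xFF


print(hex(pack_fifo_range(0x18, 0x24)))
```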
<clever> 2 more tips, then i gotta run for a bit more
<bonda_000> linux/drivers/media/platform/bcm2835/vc4-regs-unicam.h
<clever> first, look at isp_write_internal in the unstripped binary
<bonda_000> in the rpi kernel
<clever> yep, ive dealt with unicam before
<clever> unicam takes the raw CSI from a camera, and dumps raw bayer into dram
<bonda_000> comments have links in them
<clever> and thats all it can do
<clever> the ISP then converts that bayer into yuv or rgb, and applies corrections to it
<clever> the ISP also generates stats, like the average brightness
<clever> and then other software will adjust the gain, both analog and digital
<clever> 2024-04-28 11:23:00 < clever> 2 more tips, then i gotta run for a bit more
<clever> 2024-04-28 11:23:10 < clever> first, look at isp_write_internal in the unstripped binary
<clever> its got a giant series of if's, checking a bit in param_4, and then calling a function that writes 2 or 3 ISP registers
<clever> this is the open source camera stack for the rpi
<clever> it talks to the unicam (via linux and v4l)
<clever> it talks to the isp (via v4l again, but then linux goes thru the mmal and into start.elf)
<clever> and ive been told before, that the start.elf stuff there, is a very thin wrapper
<clever> try comparing the libcamera code, to the isp_write_internal function
<clever> for example, isp_write_internal will call isp_plat_write_lens_shading, if `param_2 & 8 != 0`
<clever> src/ipa/rpi/vc4/vc4.cpp: bcm2835_isp_lens_shading ls = {
<clever> src/ipa/rpi/vc4/vc4.cpp: ctrls.set(V4L2_CID_USER_BCM2835_ISP_LENS_SHADING, c);
<clever> include/linux/bcm2835-isp.h:#define V4L2_CID_USER_BCM2835_ISP_LENS_SHADING \
<clever> (V4L2_CID_USER_BCM2835_ISP_BASE + 0x0002)
<clever> dang, a 2 rather than an 8
<clever> but, now that we know the names of every register, we can apply that to a new start.elf
<bonda_000> that is a fictional structure for communicating with mmal
<clever> and use FID to carry the unstripped function names over
<bonda_000> as far as my research it got nothing to do with hardware registers
<clever> and then see how everything fits together
<clever> yeah, but something in start.elf must read that
<clever> src/ipa/rpi/vc4/vc4.cpp: bcm2835_isp_lens_shading ls = {
<clever> src/libcamera/pipeline/rpi/vc4/vc4.cpp: reinterpret_cast<bcm2835_isp_lens_shading *>(s.data());
<clever> ah yep, that second file, is creating a dmabuf containing the struct
<clever> so its passing the entire struct to the kernel
<clever> ls->dmabuf = lsTable_.get();
<clever> isp_[Isp::Input].dev()->setControls(&ctrls);
<clever> so it stores the lens shading table, into the lens shading dma buf
<clever> src/libcamera/v4l2_device.cpp:int V4L2Device::setControls(ControlList *ctrls)
<bonda_000> on the vc4 side
<bonda_000> undefined __stdcall isp_init(void) at 0ed71d5c
<bonda_000> three lines of code
<bonda_000> but there's something going on in the IRQ handler for the ISP. I wonder what the source of that would be
<clever> first thing i notice in isp_init, is that its creating a task to run isp_task()
<bonda_000> finished processing a tile/frame or data request
<clever> thread*
<clever> that contains a big do/while loop
<clever> but first, it runs isp_plat_init()
<clever> and in there, i can see it grabbed the intctrl_get_func_table, the interrupt controller
<bonda_000> void isp_init(void)
<bonda_000> {
<bonda_000> int unaff_gp;
<bonda_000>
<bonda_000> *(undefined4 *)(unaff_gp + 0x1fb8) = 0;
<bonda_000> *(undefined4 *)(unaff_gp + 0x14ec) = 1;
<bonda_000> /* WARNING: Subroutine does not return */
<bonda_000> memset((uint *)(unaff_gp + 0x1fbc),0,8);
<bonda_000> }
<bonda_000> is what I see
<clever> that didnt decompile right
<clever> i think your memset is configured improperly
<clever> right click memset, and edit function
<clever> it should not have "does not return" checked
<bonda_000> yeah then 06 ef a4 12 1a 00 it did not disassemble
<clever> uncheck "no return" on the memset function
<bonda_000> Edit Function Signature
<clever> yep, that one
<bonda_000> has a tick on No return
<clever> untick it
<clever> something incorrectly said that memset doesnt return
<clever> and if memset doesnt return, the code after it doesnt matter
<clever> an interrupt table
<clever> 72/8 is the ISP
<clever> and bingo
<clever> bonda_000: isp_plat_init, calls rtos_register_lisr, with 72, and isp_int_handler!
<bonda_000> still doesnt decompile right
<clever> just like linux, it reads ISP_STATUS, then writes it right back to ISP_STATUS
<bonda_000> or do I have to explicitly say start all over with new memset setting
<clever> select all of the asm in that function, and hit c to clear it
<clever> then select the first byte, and hit d to disassemble it again
<bonda_000> oh yeah now its all good
<bonda_000> yeah its a thread
<clever> and do you see the register_lisr in isp_plat_init?
<bonda_000> it keeps doing this iVar2 = *(int *)(unaff_gp + 0x1fc8);
<bonda_000> and only that affects the return value
<bonda_000> some unknown flag somewhere
<bonda_000> in the isp_init()
<clever> down in the data type manager, under your elf file, create a new struct, call it gp_area, and make the size at least 0x2000
<clever> and hit save
<clever> then select that `unaff_gp`, hit ctrl+l, and set it to `gp_area*`
<clever> and then select the new variable, and use `l` to rename it to `gp`
<clever> it will turn into things like `*(uint*)&gp->field_0x1234`
<bonda_000> what does "plat" stand for?
<clever> platform
<clever> isp_init is meant to run on many platforms (different chips)
<bonda_000> yeah isp_plat_init refers to 7ea00000 isp regs
<clever> and isp_plat_init is the bcm2835 specific glue code, that deals with irq handlers, irq#, and other things
<clever> if you label things properly (based on the headers you found), youll get this
<clever> thats checking the ISP_ID_SIGNATURE
<bonda_000> wait but if gp is a runtime thing its not going to give me any sensible still
<bonda_000> anything*
<clever> but it will help you when 2 different functions refer to the same offset
<bonda_000> right
<bonda_000> brb
<clever> as an example, isp_signal_task() gets ran by the irq handler, and it does _tx_event_flags_set(&gp->field_0x1a12a4,param_1,0);
<clever> search->program text, check only instruction operands, search for 1a12a4
<clever> it searches starting from the cursor, so scroll all the way to the top, and click there
<clever> first hit is isp_exit, as it destroys the event
<clever> isp_init is creating it, and calls it ISPTASKEVENT
<clever> isp_task also refers to it
<clever> you can now clearly see, that isp_init creates the event flags, the thread, and then gets the event flags
<clever> isp_exit will delete the event flags
<clever> and isp_task, will set the flags (the very same flag init gets), so that means isp_init wont finish, until the isp_task() thread has reached this point
<clever> and the do/while loop, then gets flags
<clever> and the int handler sets the flags
<clever> so basically, the int handler will unblock the thread, and then the thread takes action
<bonda_000> i see that it takes some local_34 variable and compares it with bit fields
<clever> exactly the same thing linux was doing with its complete(&dev->irq_sem);
<bonda_000> thats the flags variable
<clever> and it gave the addr of local_34 to flags_get
<clever> so local_34 likely contains the flags the irq handler signaled
<clever> and then based on those flags, isp_task() does different things
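The handler-unblocks-thread flow described above can be imitated with ordinary Python threading. This is a sketch of the pattern only: `EventFlags` is a tiny stand-in and not the ThreadX API, and the flag value 0x1 is arbitrary.

```python
import threading


class EventFlags:
    """Tiny stand-in for an RTOS event-flags group (not the ThreadX API)."""

    def __init__(self):
        self._flags = 0
        self._cond = threading.Condition()

    def set(self, bits):
        with self._cond:
            self._flags |= bits
            self._cond.notify_all()

    def get(self, mask, timeout=None):
        """Block until any bit in mask is set, then return and clear those bits."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._flags & mask, timeout):
                return 0
            hit = self._flags & mask
            self._flags &= ~hit
            return hit


def run_demo():
    flags = EventFlags()
    handled = []

    def task():
        # Plays the role of the do/while loop in isp_task().
        handled.append(flags.get(0xFF, timeout=5))

    worker = threading.Thread(target=task)
    worker.start()
    flags.set(0x1)  # what the interrupt handler would do
    worker.join()
    return handled


print(run_demo())
```

The task gets back the bits that were set, which matches the role `local_34` plays in the decompile.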
<bonda_000> isp_task = &gp->isp_task;
<bonda_000> so gp here is some global struct that holds RTOS tasks
<bonda_000> their handles and stats and so on
<clever> gp is basically just .data
<clever> it holds everything
<clever> so, as an example, if the flags contains 0x40, it calls isp_sw_stage_handler
<clever> but isp_int_handler never sets that bit directly
<clever> 0x30...
<clever> not found either
<bonda_000> line 100 void isp_task(void)
<clever> this may take days to unravel, but the header you found is massively valuable
<bonda_000> line 120 isp_task = &gp->isp_task;
<bonda_000> isnt it a name conflict
<clever> it is
<clever> ghidra doesnt care
<clever> but if you tried to paste this into a file and compile, it will fail
<bonda_000> undefined1 *isp_task has the same name as the function name and its all pointers
<clever> i dont always check for collisions when naming things
<bonda_000> yeah i guess so
<clever> ghidra even lets you put invalid characters in a symbol name
<bonda_000> but what does that mean though
<bonda_000> you didnt name it like that manually did you?
<clever> i did name everything in that line
<bonda_000> you wrote isp_task = &gp->isp_task;?
<clever> i used `l` to rename the variables
<clever> and i created a gp_area struct
<clever> as i explained above
<clever> 2024-04-28 11:44:20 < clever> down in the data type manager, under your elf file, create a new struct, call it gp_area, and make the size at least 0x2000
<bonda_000> so what was it before
<clever> re-read everything starting here
<bonda_000> yeah i remember the gp part
<clever> it was just `unaff_gp + 0x1234`
<clever> creating the struct, and changing the type, turned it into `local_gp->field_0x1234`
<clever> then i renamed everything, to make more sense
<clever> you want a single `struct gp_area` for the entire binary
<clever> and just change every `unaff_gp` into a `gp_area*`
<bonda_000> ah I still have these dollar signs and something still isn't decompiling correctly
<clever> as i said, $S is a string constant
<clever> use `c` to clear that, then right click the first character, data, string
<clever> the official compiler puts a $S on every single string, and ghidra blindly assumes symbols in .text are functions
<clever> yep, those are strings
<clever> > Buffer("4953505441534b","hex").toString("ascii")
<clever> 'ISPTASK'
<clever> the first one says this
<clever> select every line from $S to the end of that block, hit c to clear it
<clever> and youll see the char's pop up
<clever> then right click the first character, data, string
<bonda_000> what about this $t do you have it like that also?
<clever> that one is on the same line as the function entry, it can be ignored
<clever> $c is the offset table for a switch-case block
<clever> i think $t is the end of the offset table
<clever> switch.b is always followed by a series of 8bit ints (t, byte)
<clever> while switch is followed by 16bit ints (t, short)
<bonda_000> okay I did convert it to a string but there's still two ?? ??
<clever> ghidra will get that wrong a lot
<clever> the compiler decided to make the "ISPTASK" string 32bit aligned
<clever> so it just put 2 junk bytes at the end of the function to pad it out
<clever> those can be ignored
<clever> x86 does the same thing
<clever> yep, thats just padding, ignore it
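The number of junk bytes follows directly from the alignment. A minimal sketch, with made-up example addresses; `padding_needed` is a hypothetical helper.

```python
def padding_needed(addr, alignment=4):
    """Bytes to skip so the next item starts on an alignment boundary."""
    return (-addr) % alignment


# Example addresses are made up: code ending 2 bytes short of a 32-bit boundary.
print(padding_needed(0x1002))
```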
<clever> you also have a second $S string in that screenshot, that needs fixing
<bonda_000> okay because it looked like an instruction
<bonda_000> yeah still undecoded bytes
<clever> it is a valid instruction, a nop
<clever> but its unreachable
<clever> the ldm right above it, changes pc, returning from the function
<bonda_000> nvm I was looking at your isp_task and was wondering why mine is so short in comparison
<clever> `ldm r6-r9, pc, (sp++)`, this says to load many, r6, r7, r8, r9, and pc, from the stack, and increment the sp as it goes
<bonda_000> and its isp_init that I'm looking at
<bonda_000> I thought that those bytes and strings wouldnt let decompiler put it all into one function
<clever> the $c and $t can mess that up a lot
<bonda_000> so logically its still good just looks nicer
<clever> the function ends early, and ghidra doesnt correctly assign opcodes to the function
<clever> the only solution ive found, is to hit the delete key on every $c and $t, to untag it as a function
<clever> and then delete the function tag for the function itself, and remake it with `f`
<bonda_000> yeah load pc that would be the jump/branch
<clever> now look at the top of the same function, youll see `stm r6-r9, lr, (--sp)`
<clever> thats storing r6-r9, and the link register
<bonda_000> yeah pushed the frame
<clever> the `bl` opcode saves the old pc in lr, the link register, because its branch and link
<clever> the stm stores that lr to the stack
<clever> and the ldm loads the old lr directly into pc, skipping a step and making the return faster
<bonda_000> and what about these constants IMAGE_FORMAT that it can't identify
<bonda_000> says somewhere in the low ram 0x00040000
<clever> i think IMAGE_FORMAT was a #define constant
<clever> and ghidra is confused, and thinking every 0x00040000 (bit 18 set) is an IMAGE_FORMAT
<bonda_000> so the 01 00 is an unimplemented "nop" I guess
<bonda_000> wasn't it also mentioned that you have to pad each branch with 3 nops on VC4?
<bonda_000> to avoid data corruption
<clever> i dont know of that
<bonda_000> because it doesnt flush the pipeline immediately
<bonda_000> and whats after the branch begins to execute
<clever> ive never ran into trouble with that
<bonda_000> i'm not sure if thats the QPU side or VPU side but I've seen it somewhere
<clever> the QPU avoids that whole mess, by only starting an opcode on every 4th clock cycle
<clever> so you get a free 3 cycles of delay between every opcode
<bonda_000> and to think of it if there is a FIQ in the interrupt controller that VC4 and ARM share then we have FIQ in the VC4 also?
<bonda_000> despite that FIQ is the ARM feature
<bonda_000> probably no
<bonda_000> yeah GPU has its own interrupt handling scheme
<clever> no FIQ on the GPU
<clever> for arm32, there are only 2 irq entry points, normal and fast
<bonda_000> if thats entry 72 in the IRQ table there should just be a branch to the handler
<clever> in both cases, it just sets pc to an offset from the vector table start
<clever> the vector table on the VPU, is just an array of 128 function pointers
<clever> slots 0-13 are various cpu exceptions, like divide by zero
<clever> slots 32-63 are software interrupts, `swi 3` will jump to the addr in slot (32+3)
<clever> slots 64-127 are hw interrupts, one per peripheral
<clever> this code just sets the hw to jump to fleh_irq on any interrupt
<clever> that then saves all registers to the stack, and jumps back into C land
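The slot layout listed above can be written down as a small classifier. The ranges are the ones given in the chat; the function name and the wording of the labels are made up, and slots the chat does not describe are reported as such.

```python
def classify_vector(slot):
    """Map a VPU vector table slot (0-127) to (kind, index within kind)."""
    if not 0 <= slot < 128:
        raise ValueError("the table has 128 slots")
    if slot >= 64:
        return ("hw interrupt", slot - 64)
    if slot >= 32:
        return ("software interrupt", slot - 32)
    if slot <= 13:
        return ("cpu exception", slot)
    return ("not described in the chat", slot)


print(classify_vector(72))  # the ISP slot mentioned earlier
```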
<bonda_000> so what seems like a "shared" interrupt controller between ARM and GPU probably has strict distinctions within what goes where
<bonda_000> like the ARM cannot service ISP interrupt or VC4 can't service Arm SysTick timer (obviously)
<clever> the arm can service isp interrupts just fine
<bonda_000> because its still an electrical signal to the processor
<clever> all 64 peripheral irq's go into the arm irq controller
<clever> and then you can just follow the official bcm2835 datasheet, to configure that irq controller
<clever> and the irq's you enable, will then interrupt the arm
<bonda_000> well to be exact its 31 processor internal exceptions (which processor though), 31 software interrupt types, and 64 peripheral interrupts
<bonda_000> huh isp is one of the processor internal exceptions
<bonda_000> clearly applies to the VC4
<clever> yeah, i have no clue what exception 8 is doing
<bonda_000> arm reset vector is like Reset, BadData, BadInstruction, IRQ, FIQ
<clever> but 72 is clearly the irq
<clever> bonda_000: need to run for a bit, i'll be back in a few hours maybe
<clever> try to hang around, we can work on things more later
<bonda_000> yeah same
Stromeko has quit [Ping timeout: 260 seconds]
Stromeko has joined ##raspberrypi-internals
bonda_000_ has joined ##raspberrypi-internals
bonda_000 has quit [Ping timeout: 264 seconds]
dolphinana has quit [Ping timeout: 268 seconds]
<bonda_000_> im back
<clever> bonda_000_: as am i
<clever> ah, i see a isp_plat_write_denoise, you mentioned that
<clever> i'm seeing some interesting patterns
<clever> like, isp_plat_write_hresize, is writing to registers that start with ISPHR_
<bonda_000_> to be honest most of this stuff is what probably a photographer knows better than a programmer
<clever> yeah
<clever> ive had the same issue with the 3d and audio stuff
<clever> it took years to figure out some of the v3d hw
<clever> even when i had complete docs
<clever> the problem, was just wrapping my head around how certain shaders work
<bonda_000_> its mind boggling Pi has been around for decades and I only discovered it for myself in 2019
<clever> have you tried using the lk-overlay repo yet?
<bonda_000_> I still have all your gists open for further analysis
<bonda_000_> thats where your vc4 programs are right?
<clever> do a `git clone https://github.com/librerpi/lk-overlay --recurse-submodules` and then try building it with...
<clever> `make PROJECT=rpi3-bootcode`
<clever> yeah, ive put all of my vpu code into that repo
<clever> -rwxr-xr-x 1 clever users 68K Oct 27 2023 build-rpi3-bootcode/lk.bin
<clever> when the build is done, you should get this file
<clever> copy it to an sdcard, rename it to bootcode.bin, and then fire up that pi!, with a uart adapter hooked up
vriska has quit [Ping timeout: 256 seconds]
vriska has joined ##raspberrypi-internals
<bonda_000_> is it a kernel? I'm trying not to gaze at the code for very long and strap the bare minimum of setting up the isp.
<bonda_000_> i think clocking is the next important thing to figure out
<clever> bonda_000_: its based on little-kernel, from https://github.com/littlekernel/lk
<clever> so you get threads, a heap, a command prompt, and a bunch of other fun toys
<clever> that is the source code that powers https://www.youtube.com/watch?v=suswjbpR1HU
<bonda_000_> wait its a kernel that run on the VPU solely?
<bonda_000_> with a GUI or at least a terminal?
<clever> its a kernel that runs entirely on the vpu, it has graphics support, but no gui
<clever> the command prompt is over the uart, but it can print the output to the monitor as well
<clever> usb input doesnt work currently, split transaction issues
<clever> for complex reasons, if you enable usb2, then you only get usb2, no usb1
<clever> and all keyboards/mice are usb1 based
<clever> ive not tested it, but i think forcing it to usb1 mode would work, but then you obviously lose the usb2 speeds
<bonda_000_> is hermanhermitage/tinyasm gonna work to compile it?
<clever> you need the gcc from https://github.com/itszor/vc4-toolchain
<clever> and that gcc needs the binutils, from the same link
<bonda_000_> so build that gcc?
<clever> yep
<clever> you can also use the nix package manager to automate building everything
<clever> if you're on linux, it should be as simple as these 2 commands:
<clever> $ sh <(curl -L https://nixos.org/nix/install) --daemon
<clever> [clever@amd-nixos:~/apps/rpi/lk-overlay]$ nix-shell
<clever> [nix-shell:~/apps/rpi/lk-overlay]$ make PROJECT=rpi3-bootcode
<clever> [nix-shell:~/apps/rpi/lk-overlay]$ ls -lhs build-rpi3-bootcode/lk.bin
<clever> 45K -rwxr-xr-x 1 clever users 53K Apr 28 15:28 build-rpi3-bootcode/lk.bin
<clever> ok, 3 commands, if you dont count ls, lol
<clever> https://nixos.org/download/ has docs on installing nix, if you want to read more about it
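The build steps clever lists above can be wrapped in a small script. This is only a sketch: it assumes Nix is already installed and that it is run from inside the lk-overlay checkout. The helper names `lk_output_path` and `build_lk_project` are hypothetical, and the `build-<project>/lk.bin` pattern is inferred from the `ls` output in the log.

```shell
#!/bin/sh
# Sketch only: wraps the nix-shell + make steps quoted in the log.
# Assumes Nix is installed and the current directory is the lk-overlay checkout.

# Hypothetical helper: path of the image that `make PROJECT=<name>` produces,
# following the build-<project>/lk.bin pattern seen in the ls output.
lk_output_path() {
    printf 'build-%s/lk.bin\n' "$1"
}

# Hypothetical helper: run make inside nix-shell, then list the resulting image.
build_lk_project() {
    project="$1"
    nix-shell --run "make PROJECT=$project" || return 1
    out="$(lk_output_path "$project")"
    [ -f "$out" ] && ls -lhs "$out"
}
```

Calling `build_lk_project rpi3-bootcode` would then correspond to the `make PROJECT=rpi3-bootcode` line above. The first run will be slow, since nix-shell has to build gcc.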
dolphinana has joined ##raspberrypi-internals
<clever> the rpi3-bootcode project, is more of just a demo of a basic bootcode.bin, which you could then expand upon
<clever> vc4-stage1 is a bootcode.bin that functions similarly to the official one, it brings dram online, then loads lk.elf, from ext4!! (it lacks fat support, lol)
<bonda_000_> gives me some error
<clever> what error?
<bonda_000_> trying to build vc4-toolchain
<bonda_000_> something about buggy make lol
<clever> building it yourself, or with nix?
<bonda_000_> myself
<clever> try it with nix
<dolphinana> hi o/
<clever> the rpi3-start project, builds a start.elf, which is compatible with the whole pi0-pi3 range
<dolphinana> (I use vc4-toolchain which I built by myself)
<clever> you can either use the closed bootcode.bin to load it
<clever> or you can use the open vc4-stage1 to load it
<bonda_000_> okay its unpacking 1 channels
<bonda_000_> taking its time
<clever> it will be a bit slow the first time, since it has to build gcc, but its entirely automated, so you will have far fewer issues
<bonda_000_> okay its done
<bonda_000_> the nix part
<clever> now try `nix-shell` in the lk-overlay directory
<bonda_000_> yeah its like hung
<bonda_000_> no prompt
<clever> its probably doing network and cpu stuff
<clever> check `top` in another window
<bonda_000_> yeah its busy
<bonda_000_> using up CPU
<clever> sounds good, just wait
<bonda_000_> its installing a lot of things
<clever> yep
<bonda_000_> is that all dependencies for your OS?
<clever> most of them are just for building gcc
<bonda_000_> i see
<clever> i need to get some sleep, but once its done it will drop you back into a shell, that says nix-shell, like i showed above
<clever> just run make as i showed above, and youll get a .bin out
<clever> just copy that bin to an sd card, and rename it to bootcode.bin
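The copy-and-rename step could look roughly like this. It is a minimal sketch: the function name `install_bootcode` is hypothetical, and the SD card's boot partition is assumed to be already mounted at the directory passed in.

```shell
#!/bin/sh
# Sketch only: copy a built image onto a mounted sd card as bootcode.bin.
# Both arguments are supplied by the caller; nothing here locates or mounts the card.
install_bootcode() {
    src="$1"        # e.g. build-rpi3-bootcode/lk.bin
    dest_dir="$2"   # mount point of the sd card's boot partition (assumed mounted)
    [ -f "$src" ] || { echo "missing image: $src" >&2; return 1; }
    [ -d "$dest_dir" ] || { echo "sd card not mounted at: $dest_dir" >&2; return 1; }
    # The file must be named bootcode.bin on the card, so rename during the copy.
    cp "$src" "$dest_dir/bootcode.bin" && sync
}
```

For example, `install_bootcode build-rpi3-bootcode/lk.bin /mnt/sd`, where `/mnt/sd` is a placeholder mount point.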
<bonda_000_> alright
<bonda_000_> have a good night
<bonda_000_> it failed though
<bonda_000_> says I need a x64 linux
<bonda_000_> I'm gonna get that going just in case we need it tomorrow
f_ has quit [Ping timeout: 260 seconds]
f_ has joined ##raspberrypi-internals
f_ has quit [Remote host closed the connection]
f_[xmpp] is now known as f_[not-xmpp]
f_[not-xmpp] is now known as f_[xmpp]
Ad0 has quit [Ping timeout: 268 seconds]
Ad0 has joined ##raspberrypi-internals
jcea has joined ##raspberrypi-internals
<bonda_000_> I keep getting the error: a 'x86_64-linux' with features {} is required to build '/nix/store/ijvhxlzfpigvmgzg3bmg9a11wc7kp106-source.drv', but I am a 'aarch64-linux' with features {benchmark, big-parallel, kvm, nixos-test, uid-range}
<bonda_000_> I have windows on my main machine so I guess I will have to do it manually
<clever> bonda_000_: yeah, there are some issues in the nix code, and it can only build the cross-compiler on x86
<clever> bonda_000_: the nix page i linked, includes directions for WSL2, which is windows based
<clever> so you can try things there
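Since the nix expressions can only build the cross-compiler on x86, a quick check before starting the long nix-shell run might save some time. This is a sketch under assumptions: the function name `nix_build_supported` is hypothetical, and it only compares the machine name reported by `uname -m` against `x86_64`.

```shell
#!/bin/sh
# Sketch only: succeed when the given machine name (as printed by `uname -m`)
# is x86_64, the only host the log says the nix cross-compiler build supports.
nix_build_supported() {
    case "$1" in
        x86_64) return 0 ;;
        *)      return 1 ;;
    esac
}
```

A possible use is `nix_build_supported "$(uname -m)" || echo "build the toolchain manually or use an x86_64 linux"`.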
<bonda_000_> can't I just build binutils and then build lk?
<bonda_000_> like manually
<clever> bonda_000_: yeah, you can still try that route as well
<bonda_000_> i don't see any windows instructions only the bash script
<clever> bonda_000_: WSL2 lets you get a full linux while on windows
<clever> or you can just install ubuntu under virtualbox
<dolphinana> hii, I'll go sleep now
<dolphinana> good night ^^
<bonda_000_> good night
<dolphinana> and happy hacking! ^^
<bonda_000_> wait I only can have wsl1 with my windows version
<bonda_000_> i may just get a partition on a hard drive for that
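To tell which WSL version a shell is running under, one rough heuristic is the kernel release string from `uname -r`. The sketch below assumes the usual naming: stock WSL2 kernels contain `microsoft-standard`, while WSL1 reports a release ending in `Microsoft`. Custom kernels may not match. The function name `wsl_flavor` is hypothetical.

```shell
#!/bin/sh
# Sketch only: classify a kernel release string (as printed by `uname -r`).
# The patterns are a heuristic based on stock WSL kernel names, not a guarantee.
wsl_flavor() {
    case "$1" in
        *microsoft-standard*|*WSL2*) echo wsl2 ;;
        *Microsoft*)                 echo wsl1 ;;
        *)                           echo native ;;
    esac
}
```

Running `wsl_flavor "$(uname -r)"` inside the linux shell prints one of `wsl2`, `wsl1`, or `native`.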
dolphinana has quit [Quit: Leaving]