f_ changed the topic of ##raspberrypi-internals to: The inner workings of the Raspberry Pi (Low level VPU/HW) -- for general queries please visit #raspberrypi -- open firmware: https://librerpi.github.io/ -- VC4 VPU Programmers Manual: https://github.com/hermanhermitage/videocoreiv/wiki -- chat logs: https://libera.irclog.whitequark.org/~h~raspberrypi-internals -- bridged to matrix and discord
jcea has quit [Ping timeout: 256 seconds]
Stromeko has quit [Quit: Going… gone.]
Stromeko has joined ##raspberrypi-internals
ungeskriptet7 has joined ##raspberrypi-internals
ungeskriptet has quit [Ping timeout: 246 seconds]
ungeskriptet7 is now known as ungeskriptet
f_ridge has quit [Remote host closed the connection]
f_ridge has joined ##raspberrypi-internals
Stromeko has quit [Quit: Going… gone.]
Stromeko has joined ##raspberrypi-internals
<f_ridge> <x​2x6_/D> Hello, clever and everyone. I have been on a long journey by car: I moved from Lithuania to France.
<f_ridge> <x​2x6_/D> Along the way I started using my Raspberry Pi 3B+ OS to test how it works as a GoPro or a car dash cam.
<f_ridge> <x​2x6_/D> I was using MMAL for a graph that gives me output both to an SPI display and to the sdcard.
<f_ridge> <x​2x6_/D> This is the video of a walk in Brussels; I recorded while walking for 10 minutes or so.
<f_ridge> <x​2x6_/D> Camera is imx219.
<f_ridge> <x​2x6_/D> I have an SPI 320x240 display, and as I run my own kernel.img, I have only 2 partitions on the sdcard: 1. the standard FAT32 boot filesystem and 2. just a raw partition, as the destination for encoded video frames.
<f_ridge> <x​2x6_/D> The graph that I use:
<f_ridge> <x​2x6_/D> 'ril.camera' , set to size 1920x1088 with 30 fps.
<f_ridge> <x​2x6_/D> 'ril.video_encoder' set to encoding H264
<f_ridge> <x​2x6_/D> 'ril.isp' acting as a resizer to convert camera preview port from 1920x1088 to 320x240
<f_ridge> <x​2x6_/D> ril.camera.video port is connected to ril.video_encoder.input
<f_ridge> <x​2x6_/D> ril.camera.preview_port is connected to ril.isp.input
<f_ridge> <x​2x6_/D> ril.isp.output is a sequence of images resized to 320x240; they get drawn on my SPI display as a viewfinder
<f_ridge> <x​2x6_/D> ril.video_encoder.output is a sequence of H264 data packets: the first packet is a kind of file header with all the stream description that players require, and all subsequent packets are one of 2 types of H264 video frames - there are KEYFRAMES, which are like the initial commit, and then the follow-up diffs, which can be applied to the initial KEYFRAME to get the actual changed vide
<f_ridge> <x​2x6_/D> So this encoder output is managed by my OS and at some point gets written to the SD card. I use the bcm2835_emmc peripheral.
<f_ridge> <x​2x6_/D> I used two configurations during my trip (I actually had 2 sdcards that I used at different times). On one I had a smaller frame size set in ril.camera - 1280x1080. This gave an even lower bitrate.
<f_ridge> <x​2x6_/D> If you click on the video, you will see it is very choppy, but not glitchy. The problem is that VCOS is trying to feed me the video buffers at the FPS that I have requested (25 or 30, I don't remember now), so it sends the buffers 30 times per second, but I fail to process the buffers quickly enough and return them to VCOS too late. VCOS is starved of buffers and has to skip frames - s
<f_ridge> <x​2x6_/D> For the encoder I give VCOS 128 buffers of 256 kbytes each, so when video data starts being generated, VideoCore at first has plenty of memory to put this data into. Then it starts to give me the buffers it has already filled, and the number of available buffers on the VideoCore side starts to shrink quickly. I have tested: if I just wait until VideoCore gives me all those buffers, then I hit a breakpoint an
<f_ridge> <x​2x6_/D> Eventually I have to write the video data out to the SD card, that's the whole idea, but the SD card is written too slowly.
<f_ridge> <x​2x6_/D> I tried different ways to manage that.
<f_ridge> <x​2x6_/D> One thing I do is maintain 2 buffers and switch between them. One is used as the source for the write-out to the SD card and the other as the destination to memcpy-collect video data from VCOS, so I memcpy the buffer from VCOS and give the buffer back quickly. When a buffer is full, I mark it as filled and send a write-out request to the sdcard thread, which handles DMA writes to the sdcard. And while it is d
<f_ridge> <x​2x6_/D> I tried to make these two interleaving buffers big enough, like 4 megabytes each, so that I would not have too much overhead per SD card write request and most of it would be pure IO.
<f_ridge> <x​2x6_/D> But I clearly see that when DMA IO starts to write to the sdcard, memcopies also become slower and I see dropped frames. So there is huge memory bus contention, but I am not sure.
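The two-buffer scheme described above can be sketched roughly as follows. This is only an illustration, not x2x6's actual code: the names `struct pingpong`, `pp_push` and `pp_write_done` are hypothetical, and a real implementation would need proper synchronisation between the encoder callback and the sdcard thread (here `write_busy` is a plain flag).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STAGE_SIZE (4u * 1024u * 1024u)  /* 4 MiB per staging buffer, as in the chat */

struct stage {
    uint8_t data[STAGE_SIZE];
    size_t  used;                /* bytes collected so far */
};

struct pingpong {
    struct stage bufs[2];
    int fill;                    /* index of the buffer collecting encoder data */
    int write_busy;              /* nonzero while the other buffer is being written */
};

void pp_init(struct pingpong *pp)
{
    pp->bufs[0].used = 0;
    pp->bufs[1].used = 0;
    pp->fill = 0;
    pp->write_busy = 0;
}

/*
 * Copy one encoder packet into the fill buffer so that the VCOS buffer can
 * be returned immediately.
 * Returns  0: packet stored, nothing to write yet;
 *          1: packet stored and *ready / *ready_len describe a full buffer
 *             that should now be handed to the SD write thread;
 *         -1: packet dropped (too large, or the SD write of the other
 *             buffer has not finished, so there is nowhere to switch to).
 */
int pp_push(struct pingpong *pp, const uint8_t *pkt, size_t len,
            const uint8_t **ready, size_t *ready_len)
{
    struct stage *s = &pp->bufs[pp->fill];
    int handed_over = 0;

    if (len > STAGE_SIZE)
        return -1;

    if (s->used + len > STAGE_SIZE) {
        if (pp->write_busy)
            return -1;           /* both buffers occupied: frame is lost */
        *ready = s->data;
        *ready_len = s->used;
        pp->write_busy = 1;
        pp->fill ^= 1;           /* switch to the other buffer */
        s = &pp->bufs[pp->fill];
        s->used = 0;
        handed_over = 1;
    }

    memcpy(s->data + s->used, pkt, len);
    s->used += len;
    return handed_over;
}

/* Called by the SD write thread once the DMA write has completed. */
void pp_write_done(struct pingpong *pp)
{
    pp->write_busy = 0;
}
```

The `-1` path is where the dropped frames show up in this model: whenever the SD write takes longer than it takes the encoder to fill the other 4 MiB buffer.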
<f_ridge> <c​lever___/D> that reminds me, when the linux CSI driver is active, it requests a higher VPU/core clock
<f_ridge> <c​lever___/D> did you pin yours to 500mhz?
<f_ridge> <x​2x6_/D> I also already tried (as we discussed with clever) to DMA directly from the buffer provided by VCOS to the sdcard. This does not look very feasible because of alignment issues, as the buffers from VCOS are all of different lengths.
<f_ridge> <x​2x6_/D> Well, I don't know how to do this
<f_ridge> <x​2x6_/D> First what is the Core clock?
<f_ridge> <c​lever___/D> the core clock is the clock rate the main AXI bus runs at
<f_ridge> <x​2x6_/D> That's the device btw) JTAG and UART headers are hanging out
<f_ridge> <c​lever___/D> nice
<f_ridge> <x​2x6_/D> The second issue is that the preview kind of lags. The buffers going through 'ril.isp' for the resize procedure arrive half a second late, and the display lags badly.
<f_ridge> <x​2x6_/D> Can you point me to the code, where CSI driver requests a higher core clock?
<f_ridge> <x​2x6_/D> and if it is required to be higher, why don't they set the clock to be higher by default?
<f_ridge> <x​2x6_/D> I am starting to think I am pursuing an impossible requirement, writing 1920x1088 at 30 fps to the sdcard; maybe this is just not possible at that data rate.
<f_ridge> <c​lever___/D> power saving reasons, when the camera isnt active, it runs at a lower clock rate
<f_ridge> <x​2x6_/D> The description of the Raspberry Pi camera sold me on the idea that it can easily operate at such a frame size/fps.
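A rough back-of-the-envelope check of the data-rate worry above, as a sketch only. The 16 Mbit/s encoder bitrate is a hypothetical figure (the chat does not state the actual bitrate), "256 kbytes" is assumed to mean 256 KiB, and the 25 MB/s figure is just the theoretical peak of a 4-bit SD bus at 50 MHz; sustained write speed of a real card is lower and card-dependent. The helper names are made up for illustration.

```c
#include <stdint.h>

/* Bytes per second produced by an encoder running at bitrate_bps (bits/s). */
uint64_t encoder_bytes_per_sec(uint64_t bitrate_bps)
{
    return bitrate_bps / 8;
}

/* Theoretical peak SD bus rate: one bit per data line per clock cycle. */
uint64_t sd_bus_peak_bytes_per_sec(uint64_t clock_hz, unsigned bus_width_bits)
{
    return clock_hz * bus_width_bits / 8;
}

/*
 * Milliseconds until a pool of n_buffers * buf_bytes is exhausted if the
 * encoder keeps producing and nothing is returned. This is an upper bound:
 * packets rarely fill a buffer completely, so the pool drains faster.
 */
uint64_t pool_fill_ms(unsigned n_buffers, uint64_t buf_bytes, uint64_t bitrate_bps)
{
    uint64_t bytes_per_sec = encoder_bytes_per_sec(bitrate_bps);
    if (bytes_per_sec == 0)
        return 0;                /* no data produced: treated as invalid input */
    return (uint64_t)n_buffers * buf_bytes * 1000u / bytes_per_sec;
}
```

With these assumptions the encoder produces about 2 MB/s against a 25 MB/s bus peak, and `pool_fill_ms(128, 256 * 1024, 16000000)` comes to roughly 16.8 seconds, so the raw data rate alone would not rule out 1080p30 recording.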
<f_ridge> <x​2x6_/D> Does this mean I can theoretically just write this to config.txt?
<f_ridge> <c​lever___/D> yeah
<f_ridge> <G​itHub Lines/D> ```c
<f_ridge> <G​itHub Lines/D> ret = clk_set_min_rate(dev->vpu_clock, MIN_VPU_CLOCK_RATE);
<f_ridge> <G​itHub Lines/D> ```
<f_ridge> <c​lever___/D> thats where the linux driver manages the clock
<f_ridge> <x​2x6_/D> found this
<f_ridge> <c​lever___/D> yeah, thats the default max for each model
<f_ridge> <c​lever___/D> but there is also a min, and if you set both min and max to 500mhz, it will just never change clock again
<f_ridge> <x​2x6_/D> #define MIN_VPU_CLOCK_RATE (250 * 1000 * 1000)
<f_ridge> <c​lever___/D> ```
<f_ridge> <c​lever___/D> core_freq
<f_ridge> <c​lever___/D> core_freq_min
<f_ridge> <c​lever___/D> ```
<f_ridge> <c​lever___/D> set both of those the same, and the code wont need to change
<f_ridge> <c​lever___/D> which model of pi are you on again?
<f_ridge> <x​2x6_/D> here VPU and CORE are synonyms?
<f_ridge> <c​lever___/D> yep
<f_ridge> <x​2x6_/D> 3b+
<f_ridge> <c​lever___/D> then just set both settings to 500
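For reference, what clever describes would look something like this in `config.txt`. The values are in MHz; this is a sketch of the suggestion as stated in the chat, not a tested configuration, and whether 500 is stable on a given 3B+ is an assumption.

```
# pin the core (VPU) clock so it never changes
core_freq=500
core_freq_min=500
```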
<f_ridge> <x​2x6_/D> Hm, Ok
<f_ridge> <x​2x6_/D> what about the emmc / arm clocks?
<f_ridge> <x​2x6_/D> do you suggest boosting them ?
<f_ridge> <c​lever___/D> arm clock you could also try boosting, emmc i think needs to stay at 50mhz, anything more, and you're overclocking the SD card itself
<f_ridge> <x​2x6_/D> Is that also from config.txt? I am asking because I need info that in practice this worked. Some time ago I tried mailbox API altering clocks, it did not work - get clock frequency with the same ID i have just modified returned same values as initial (non-modified)
<f_ridge> <c​lever___/D> arm clock is also in `config.txt`
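For comparison with the mailbox attempt mentioned above, here is a minimal sketch of how a "set clock rate" property message is usually laid out. The tag and clock ID values follow the publicly documented mailbox property interface; the function and macro names are hypothetical, and actually sending the buffer (16-byte aligned, via mailbox channel 8) is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define MBOX_TAG_GET_CLOCK_RATE 0x00030002u
#define MBOX_TAG_SET_CLOCK_RATE 0x00038002u

#define MBOX_CLOCK_ID_EMMC 1u
#define MBOX_CLOCK_ID_ARM  3u
#define MBOX_CLOCK_ID_CORE 4u

#define MBOX_RESPONSE_OK   0x80000000u

/*
 * Fill msg with a single-tag "set clock rate" request.
 * The buffer must be 16-byte aligned when it is handed to the mailbox.
 * Returns the total message size in bytes.
 */
size_t mbox_build_set_clock_rate(uint32_t msg[9], uint32_t clock_id, uint32_t rate_hz)
{
    msg[0] = 9 * sizeof(uint32_t);      /* total buffer size in bytes */
    msg[1] = 0;                         /* 0 = this is a request */
    msg[2] = MBOX_TAG_SET_CLOCK_RATE;   /* tag identifier */
    msg[3] = 12;                        /* value buffer size in bytes */
    msg[4] = 0;                         /* tag request code */
    msg[5] = clock_id;                  /* which clock to change */
    msg[6] = rate_hz;                   /* requested rate in Hz */
    msg[7] = 0;                         /* 0 = do not skip setting turbo */
    msg[8] = 0;                         /* end tag */
    return msg[0];
}

/* After the call returns, the firmware reports the rate it actually set. */
int mbox_set_clock_rate_result(const uint32_t msg[9], uint32_t *rate_hz)
{
    if (msg[1] != MBOX_RESPONSE_OK)
        return -1;
    *rate_hz = msg[6];
    return 0;
}
```

Checking `msg[1]` and the returned rate in `msg[6]` is one way to tell whether the firmware accepted the request or clamped it.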
<f_ridge> <x​2x6_/D> About sdcard clocks. It's nice in the STM32 Cube program to see the whole clock topology. I don't exactly get how the clocks work on a Raspberry Pi.
<f_ridge> <x​2x6_/D> Comparing with STM32 - the datasheet shows that peripherals are basically clocked by the bus on which they are located - AHB / APB1 / APB2, etc. and then you see what clocks these buses and with which divisors, etc.
<f_ridge> <x​2x6_/D> Mapping this knowledge to the Raspberry Pi: let's say I want to understand whether my EMMC controller's performance is limited from the Raspberry Pi side, and whether I can do something about that.
<f_ridge> <x​2x6_/D> So I start by trying to understand what its source clock and divisor are. But probably VCOS is in charge of that and will not let me mess with the divisors.
<f_ridge> <x​2x6_/D> But also I see that in bcm2835_emmc peripheral there are control registers that also set divisors.
<f_ridge> <x​2x6_/D> But also my mbox_get_clock_frequency(EMMC) shows 200MHz, not 50..
<f_ridge> <c​lever___/D> each label in the green region has its own clock mux (a duplicate of the nearby white box), and its own divider, and some dividers can be fractional
<f_ridge> <c​lever___/D> ```
<f_ridge> <c​lever___/D> #define CM_EMMCCTL HW_REGISTER_RW( 0x7e1011c0 )
<f_ridge> <c​lever___/D> #define CM_EMMCDIV HW_REGISTER_RW( 0x7e1011c4 )
<f_ridge> <c​lever___/D> ```
<f_ridge> <c​lever___/D> this controls the sdhci controller i believe
<f_ridge> <c​lever___/D> sdhost i think is on the core clock
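One possible reading of the 200 MHz vs 50 MHz question above: the mailbox reports the clock fed into the EMMC controller, and the controller's own divider then brings it down to the SD bus clock. Under that assumption (not confirmed in the chat) the arithmetic is just an integer division, sketched below. The helper name is hypothetical, and how the divisor is encoded in the controller's control register is deliberately left out.

```c
#include <stdint.h>

/*
 * Smallest integer divisor d >= 1 such that base_hz / d <= target_hz.
 * Returns 0 if target_hz is 0 (invalid request).
 */
uint32_t sd_clock_divisor(uint32_t base_hz, uint32_t target_hz)
{
    uint64_t d;

    if (target_hz == 0)
        return 0;
    /* Round up so the resulting SD clock never exceeds the target. */
    d = ((uint64_t)base_hz + target_hz - 1) / target_hz;
    return d ? (uint32_t)d : 1;
}
```

For a 200 MHz input and a 50 MHz target this gives a divisor of 4; for the 400 kHz clock used during card identification it gives 500.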