f_ changed the topic of ##raspberrypi-internals to: The inner workings of the Raspberry Pi (Low level VPU/HW) -- for general queries please visit #raspberrypi -- open firmware: https://librerpi.github.io/ -- VC4 VPU Programmers Manual: https://github.com/hermanhermitage/videocoreiv/wiki -- chat logs: https://libera.irclog.whitequark.org/~h~raspberrypi-internals -- bridged to matrix and discord
jcea has quit [Ping timeout: 268 seconds]
Stromeko has quit [Quit: Going… gone.]
Stromeko has joined ##raspberrypi-internals
ungeskriptet has quit [Quit: Contact links: https://david-w.eu]
ungeskriptet has joined ##raspberrypi-internals
wael has joined ##raspberrypi-internals
user_user has quit [Read error: Connection reset by peer]
jcea has joined ##raspberrypi-internals
Ad0 has joined ##raspberrypi-internals
Stromeko has quit [Quit: Going… gone.]
Stromeko has joined ##raspberrypi-internals
user_user has joined ##raspberrypi-internals
<f_ridge> <x​2x6_/D> Ok, now we're talking!
<f_ridge> <x​2x6_/D> 1. I only use ST_CLO to measure time of writing
<f_ridge> <x​2x6_/D> 2. I am not running any MMAL activity in parallel - I just write two 64 x 512 buffers at emmc speed.
<f_ridge> <x​2x6_/D> 3. I have placed the timestamps more precisely: the last timestamp is now taken in the DMA completed IRQ, when the last data byte lands on the sdcard. Previously I checked the time when the sleeping task woke up after being notified by my kernel.
<f_ridge> <x​2x6_/D> 4. I am trying different values of the emmc divisor and actually see different speeds depending on the divisor value.
<f_ridge> <x​2x6_/D> 5. In this screenshot the divisor is 0x80, which is mapped to /256.
<f_ridge> <x​2x6_/D> I will show more screens to see the difference with smaller divisors
<f_ridge> <x​2x6_/D> divisor = 0x10 which maps to /32
<f_ridge> <x​2x6_/D> And this is the time with div=4 (so divided by 8).
<f_ridge> <x​2x6_/D> Anything below 4 does not work.
<f_ridge> <x​2x6_/D> So there is a constant part, some 100ms, that is always added on top. Not sure what that is...
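For reference, the divisor values quoted above (0x80 maps to /256, 0x10 to /32, 4 to /8) are consistent with the clock being divided by twice the divisor field. A minimal sketch of the raw-transfer arithmetic follows. The base clock value and the 4-bit bus width are assumptions that are not stated in the chat, and `sd_clock_hz` and `raw_transfer_seconds` are hypothetical helper names. The estimate ignores CRC, command overhead, and card busy time, so it covers only the part of the write time that scales with the divisor.

```python
BLOCK_SIZE = 512        # bytes per block
BLOCKS_PER_WRITE = 64   # block count per write, as mentioned in the chat
WRITES = 2              # two buffers are written


def sd_clock_hz(base_clock_hz, divisor_field):
    """SD clock for divisor field N, assuming clock = base / (2 * N)."""
    if divisor_field <= 0:
        raise ValueError("divisor field must be positive")
    return base_clock_hz / (2 * divisor_field)


def raw_transfer_seconds(base_clock_hz, divisor_field, bus_width_bits=4):
    """Time to clock out the payload bits only (no CRC, no busy time)."""
    total_bits = WRITES * BLOCKS_PER_WRITE * BLOCK_SIZE * 8
    clocks = total_bits / bus_width_bits
    return clocks / sd_clock_hz(base_clock_hz, divisor_field)


if __name__ == "__main__":
    ASSUMED_BASE_CLOCK_HZ = 100_000_000  # assumption, not taken from the chat
    for div in (0x80, 0x10, 0x04):
        ms = raw_transfer_seconds(ASSUMED_BASE_CLOCK_HZ, div) * 1000
        print(f"divisor 0x{div:02x}: /{2 * div}, raw transfer ~{ms:.2f} ms")
```

Under these assumptions, subtracting the estimate from a measured write time gives a rough idea of how much of the total does not depend on the divisor.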
<f_ridge> <c​lever___/D> let me read the scrollback...
<f_ridge> <c​lever___/D> there will be 2 things controlling the speed of a write
<f_ridge> <c​lever___/D> first is just the raw data transfer: how fast you can get a block from the pi ram to the SD ram. that is controlled by the divisor you've been messing with
<f_ridge> <c​lever___/D> second, is how long the card takes to write that to flash and send the all-good signal, that can only be improved by getting better cards
<f_ridge> <c​lever___/D> running `fstrim` on the card from linux can improve the second one, for some cards
<f_ridge> <x​2x6_/D> Just additional thing I've noticed.
<f_ridge> <x​2x6_/D> If I put sleeps between two consecutive writes - then the writes themselves are much faster.
<f_ridge> <x​2x6_/D> I think this happens because the emmc fifo first gets filled with data, and then DMA and the waiting code can proceed to the next write IO. When they do, the next write might overflow the FIFO, so DMA must wait until the fifo frees up by 1 item, and so on
<f_ridge> <c​lever___/D> how does dma work with emmc?
<f_ridge> <c​lever___/D> and are you using write-multiple?
<f_ridge> <x​2x6_/D> I am using multiple writes, so a write is actually split into two commands - the first one is CMD23 (SET_NUMBER_OF_BLOCKS) (=64), the next is CMD25 (WRITE_MULTIPLE)
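The two-command write described above can be sketched as a list of (command index, argument) pairs. This is only an illustration, not the code from the chat: `write_multiple_sequence` is a hypothetical helper, and it assumes a block-addressed card, so the CMD25 argument is a block number rather than a byte offset.

```python
CMD23 = 23  # sets the block count for the following multi-block transfer
CMD25 = 25  # multi-block write


def write_multiple_sequence(start_block, block_count):
    """Return the commands for one pre-counted multi-block write.

    Assumes block addressing: the CMD25 argument is the first block number.
    """
    if block_count <= 0:
        raise ValueError("block_count must be positive")
    if start_block < 0:
        raise ValueError("start_block must not be negative")
    return [
        (CMD23, block_count),  # announce how many blocks will follow
        (CMD25, start_block),  # start writing at this block
    ]


if __name__ == "__main__":
    for index, arg in write_multiple_sequence(start_block=2048, block_count=64):
        print(f"CMD{index} arg=0x{arg:08x}")
```

Because the count is announced up front, the transfer ends after the last block without a separate stop command.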
<f_ridge> <c​lever___/D> ah, good
<f_ridge> <c​lever___/D> i noticed a decent speed improvement, when i switched to read-multiple, and larger fat clusters
<f_ridge> <x​2x6_/D> Then CMD25 is actually run with DMA enabled, so the EMMC peripheral paces the DMA
<f_ridge> <c​lever___/D> do you tell emmc the dma addr, or do you configure the main soc dma?
<f_ridge> <x​2x6_/D> I don't actually need fstrim because I don't use a filesystem, I just write to the partition. btw, this is a great hint, because fat32 has to allocate clusters and do lots of lookups
<f_ridge> <x​2x6_/D> of course they are cached as much as possible, but it still can't cache all of it
<f_ridge> <c​lever___/D> ah, then you might want to try the discard/erase commands
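A rough sketch of what an erase of a block range looks like at the command level, assuming an SD card (not an eMMC chip) with block addressing. On SD cards the sequence is CMD32 (first block), CMD33 (last block, inclusive), then CMD38 to start the erase. `erase_sequence` is a hypothetical helper, and the CMD38 argument of 0 selects a plain erase here; whether a given card supports a discard variant is not covered by this sketch.

```python
CMD32 = 32  # set first write block to erase
CMD33 = 33  # set last write block to erase (inclusive)
CMD38 = 38  # start erasing the selected range


def erase_sequence(first_block, last_block):
    """Return the commands that erase blocks first_block..last_block."""
    if first_block < 0 or last_block < first_block:
        raise ValueError("need 0 <= first_block <= last_block")
    return [
        (CMD32, first_block),
        (CMD33, last_block),
        (CMD38, 0),  # argument 0: plain erase (assumption for this sketch)
    ]


if __name__ == "__main__":
    # Erase the 128 blocks that two 64-block writes would cover.
    for index, arg in erase_sequence(2048, 2048 + 127):
        print(f"CMD{index} arg=0x{arg:08x}")
```

Erasing the target range ahead of time may shorten the card's busy phase on later writes, but the effect depends on the card.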