f_ changed the topic of ##raspberrypi-internals to: The inner workings of the Raspberry Pi (Low level VPU/HW) -- for general queries please visit #raspberrypi -- open firmware: https://librerpi.github.io/ -- VC4 VPU Programmers Manual: https://github.com/hermanhermitage/videocoreiv/wiki -- chat logs: https://libera.irclog.whitequark.org/~h~raspberrypi-internals -- bridged to matrix and discord
jcea has quit [Ping timeout: 256 seconds]
Stromeko has quit [Quit: Going… gone.]
Stromeko has joined ##raspberrypi-internals
jcea has joined ##raspberrypi-internals
wael has quit [Ping timeout: 256 seconds]
wael has joined ##raspberrypi-internals
user_user has joined ##raspberrypi-internals
wael has quit [Ping timeout: 264 seconds]
<user_user> Hi. Anyone here?
<Bitweasil> user_user, I'll suggest sticking around for a while. This is a pretty low traffic international channel.
<user_user> Ok, I am not very used to irc's)
<user_user> Wanted to find someone who can lend brains to discuss baremetal sdcard speed optimization
<user_user> This is mostly Clever though) But maybe someone else also active here
<user_user> So we have a mailbox interface. In my baremetal project, I query the clock rates of all peripherals. The EMMC clock has ID==1; when I ask about this clock, I get a value of 200MHz. I want to change it with the set_clock_rate API. For simplicity I first want to lower it and see the result by requesting get_clock_rate again. But I still see 200MHz.
<Bitweasil> That's mostly a question for clever, yes. What gen Pi? The earlier ones suck, there's just nothing you can do about the interface being glacial.
<user_user> I want to stick to 3B+ for just a little bit more time
<Bitweasil> IMO, not worth the hassle, they don't really overclock the card well, they don't support the low voltage high speed modes, etc. Just let it run at whatever it can and call it good, cache stuff if you need.
<clever> user_user: *waves*
<clever> i would start by trying to probe the actual clock pin on the sd socket itself, and confirm if its running at 50mhz
<user_user> I was thinking that too, but at this point I want at least the performance of stock linux kernel. currently I write to SD card at speed of 2.5Mbits per second , but! what makes me still try is that I have two classes of SDcards - v10 and v30, they have different speeds according to the classification, but in my code the writing speed is the same for both
<user_user> I assume it is possible to hit the limit of writing speed on the SDcard itself and then to consider this a good result
<clever> there are ~3 speeds to be aware of
<user_user> Hi, clever
<Bitweasil> The Pi3 SD interface, as I recall, can't run most of them anywhere near "competent or capable speeds."
<clever> 1st, is the speed of the SD bus, i believe the pi0-pi3 family top out at 50mhz 4bit SDR, so 200mbit
<user_user> Probing the clock pin is a good advice..
<clever> 2nd is the clock speed of the sd/emmc peripheral, which is divided down to create that 50mhz
<clever> and 3rd is the speed of the internal axi bus that the data flows over
<user_user> I don't have an oscilloscope at this time though
<clever> what about a logic analyzer?
<clever> or an rp2040/pico
<user_user> nothing, I am in the middle of relocation process. All stuff in storage
<user_user> )
<clever> ah
<user_user> ok, what about rp2040 ?
<clever> the rp2040 has a frequency counter in its PWM block
<clever> so you could measure the frequency of the clk pin
<user_user> is it running during communication or always?
<clever> i'm not actually sure
<clever> another thing, is how does emmc handle the clock divisor
<user_user> So , do you think that emmc clock + the clock divisor set in one of emmc control registers actually form the 50MHz CLK signal on the sdcard pin?
<clever> yeah
<clever> its the same in sdhost
<clever> the SH_CDIV register sets the divisor, to go from i think core0 to the SD bus clock
<user_user> Easiest is to check this part first. I already do 64*512-byte writes to the sdcard and measure them. So if I lower the EMMC clock without touching the divisors, we should see the difference
<clever> and this code decides what clock to use, based on the version the card claims to support
<user_user> So in your code line 476 - does this mean that 200MHz are divided by 10 to receive 20MHz on sdcard pin?
<clever> assuming core0 was still at 200mhz, yeah
<clever> here is similar code for emmc
<clever> so the question then becomes, what are you writing to CONTROL1 ?
<user_user> Haha line 235 thats ugly)
<clever> thats what a lack of printf does to people :P
<clever> checking the header files, these are the fields in CONTROL1
wael has joined ##raspberrypi-internals
<clever> bits 6/7 and 8-15 hold the clock
<user_user> What's 'c=41666666/f'?
<clever> i assume f is the desired frequency, and 41666666 is what the code assumes the EMMC clock is set to
<clever> so 41666666 / 25mhz gives 1.66
<user_user> In bcm2835 doc page 74 . 15:8 - SD clock base divider LSB.
<clever> yeah, it looks like 15:8 is the lower 8 bits of the divisor
<clever> and 7:6 is the upper 2 bits
<clever> forming a 10bit divisor
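A minimal sketch of the field layout described above, assuming the 10-bit split quoted from the BCM2835 document (bits 15:8 hold the low 8 bits of the divisor, bits 7:6 the upper 2). The macro and function names are hypothetical and do not come from any of the code discussed in the channel.

```c
#include <stdint.h>

/* Assumed positions of the clock divisor inside EMMC CONTROL1. */
#define CONTROL1_DIV_LO_SHIFT 8u          /* bits 15:8, low 8 bits   */
#define CONTROL1_DIV_HI_SHIFT 6u          /* bits 7:6, upper 2 bits  */
#define CONTROL1_DIV_MASK     0x0000FFC0u /* all ten divisor bits    */

/* Return control1 with its divisor bits replaced by div10 (0..1023);
 * every other bit of the register value is left untouched. */
uint32_t control1_set_divisor(uint32_t control1, uint32_t div10)
{
    div10 &= 0x3FFu;
    control1 &= ~CONTROL1_DIV_MASK;
    control1 |= (div10 & 0xFFu) << CONTROL1_DIV_LO_SHIFT;
    control1 |= ((div10 >> 8) & 0x3u) << CONTROL1_DIV_HI_SHIFT;
    return control1;
}

/* Extract the 10-bit divisor from a CONTROL1 value. */
uint32_t control1_get_divisor(uint32_t control1)
{
    uint32_t lo = (control1 >> CONTROL1_DIV_LO_SHIFT) & 0xFFu;
    uint32_t hi = (control1 >> CONTROL1_DIV_HI_SHIFT) & 0x3u;
    return (hi << 8) | lo;
}
```

Keeping the set operation as a pure function on the register value makes it easy to combine with whatever read and write accessors the firmware already has.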
<user_user> clock 0 rate: 00000000 (0 KHz)
<user_user> clock 1 rate: 0bebc200 (200000 KHz)
<user_user> clock 2 rate: 02dc6c00 (48000 KHz)
<user_user> clock 3 rate: 23c34600 (600000 KHz)
<user_user> clock 4 rate: 1dcd6500 (500000 KHz)
<user_user> clock 5 rate: 0ee6b280 (250000 KHz)
<user_user> clock 6 rate: 0ee6b280 (250000 KHz)
<user_user> In my FW I use GET_CLOCK_RATE mailbox API and get this
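For reference, a sketch of how a GET_CLOCK_RATE request buffer could be laid out, assuming the tag value 0x00030002 and the word layout from the Raspberry Pi firmware mailbox property interface documentation. The function name is hypothetical, and actually submitting the buffer to the mailbox (and the 16-byte alignment it needs) is left out.

```c
#include <stdint.h>

#define MBOX_TAG_GET_CLOCK_RATE 0x00030002u /* assumed from the firmware docs */
#define MBOX_REQUEST_CODE       0u
#define MBOX_TAG_END            0u

/* Fill an 8-word property buffer asking for the rate of clock_id.
 * The caller must place buf at a 16-byte aligned address before
 * handing it to the mailbox. Returns the buffer size in bytes. */
uint32_t mbox_build_get_clock_rate(uint32_t buf[8], uint32_t clock_id)
{
    buf[0] = 8u * (uint32_t)sizeof(uint32_t); /* total buffer size        */
    buf[1] = MBOX_REQUEST_CODE;               /* this is a request        */
    buf[2] = MBOX_TAG_GET_CLOCK_RATE;         /* tag identifier           */
    buf[3] = 8u;                              /* value buffer size, bytes */
    buf[4] = 0u;                              /* tag request code         */
    buf[5] = clock_id;                        /* e.g. 1 for EMMC          */
    buf[6] = 0u;                              /* rate in Hz, set by VPU   */
    buf[7] = MBOX_TAG_END;
    return buf[0];
}
```

After the firmware answers, word 6 would hold the rate in Hz, which is where values such as 0x0bebc200 in the listing above would come from.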
<user_user> which of these stands for 416666... in that code?
<clever> that code may have assumed wrong
<clever> let me see what linux does...
<clever> linux tends to give the cleanest answers
<user_user> So, what I can try to do - is then change the divisor to bigger value and see if my write speed got slower
<clever> yep
<clever> in linux, the sdhci core will write a 0 to the SDHCI_CLOCK_CONTROL register, compute the proper divisor, and then enable the clock
<clever> and enabling the clock involves writing the internal clock enable flag, waiting for the internal clock to be stable, maybe writing to the PLL enable register (depending on which sdhci core you have), slapping the "clock card en" bit onto the divisor, and writing it to SDHCI_CLOCK_CONTROL
<user_user> well, we have all that more or less
<clever> bcm2835_mmc_writew(host, clk, SDHCI_CLOCK_CONTROL);
<clever> drivers/mmc/host/sdhci.h:#define SDHCI_CLOCK_CONTROL 0x2C
<clever> and 2c, is called CONTROL1 in the rpi headers
<clever> its not the command or transfer-mode register, so that is just a plain 32bit write
<clever> i believe
<clever> that just leaves the question of what the reference clock is
<user_user> ok, thats interesting
<user_user> I've now tried with 0,1,2,3 and 200 as divisor in bits 8:15
<clever> { .compatible = "brcm,bcm2835-mmc" },
<clever> this driver claims to use this name
<user_user> So with 0 as a divisor value - it doesn't work at all)
<clever> arch/arm/boot/dts/bcm270x.dtsi: compatible = "brcm,bcm2835-mmc", "brcm,bcm2835-sdhci";
<clever> so its one of these entries
<clever> ah, a 0 means its doing something!
<user_user> With 1 same thing - so probably 200 / 1 MHz is too much
<user_user> With 2 the driver failed to initialize at the point of SEND_CSD, whatever it is
<user_user> With 3 the driver actually initialized and started writing to the SD card at a better speed. Previously it was 98ms, now it's 23.4ms
<clever> and in here, we can see that its using the BCM2835_CLOCK_EMMC clock from the `clocks: cprman@7e101000` node
<user_user> With 200 the driver is writing at speeds close to 198ms
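To compare these timings, they can be turned into throughput. This is a small sketch assuming each measurement covers one 64*512-byte (32768-byte) write, as stated earlier in the conversation; the function name is made up for illustration.

```c
#include <stdint.h>

/* Convert "bytes written in micros microseconds" to kbit/s.
 * Integer arithmetic only, result rounded down; returns 0 when
 * the elapsed time is 0 to avoid dividing by zero. */
uint64_t throughput_kbit_per_s(uint64_t bytes, uint64_t micros)
{
    if (micros == 0u) {
        return 0u;
    }
    /* bits per microsecond is Mbit/s; multiply by 1000 for kbit/s. */
    return bytes * 8u * 1000u / micros;
}
```

With 32768 bytes per write, 98 ms works out to roughly 2.7 Mbit/s and 23.4 ms to roughly 11.2 Mbit/s, both far below the 200 Mbit/s bus ceiling mentioned earlier.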
<user_user> Ok, let me focus to follow your mindpath.
<clever> and bingo, its CM_EMMCDIV and CM_EMMCCTL
<user_user> Why you have decided to look at sdhci in the first place?)
<user_user> Just let me understand
<clever> user_user: the emmc controller is based on the sdhci standard
<clever> while the sdhost controller is entirely broadcom custom
<user_user> Ok, what then this means "clocks = <&clocks BCM2835_CLOCK_EMMC>;"?
<clever> linux has a central sdhci driver, that does sdhci_readw and sdhci_writew to access all SDHCI registers
<clever> user_user: that says to use the clock defined here
<user_user> So sdhci is generic object , of a kind, and we feed bcm2835_emmc as device complying to sdhci standard?
<clever> yeah
<clever> bcm2835_emmc deals with the problems the rpi has, like you cant do 16bit writes to the emmc
<user_user> Ok, starting to understand, so sdhci probably also dictates a certain register layout, offsets and bits?
<clever> yeah
<clever> this function gets called when the sdhci core wants to do a write
<clever> it does a 32bit read, grabs the 16bit half you're not changing, merges it with the 16bits you are changing, and then does a 32bit write
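The read-merge-write described above can be sketched as follows. This is an illustration of the technique under the assumption that registers are addressed by byte offset and only 32-bit accesses are allowed; it is not the Linux driver's code, and the function names are hypothetical.

```c
#include <stdint.h>

/* Merge a 16-bit value into the correct half of a 32-bit word.
 * reg_offset is the byte offset of the 16-bit register; bit 1 of the
 * offset selects the low (0) or high (1) half of the containing word. */
uint32_t merge_halfword(uint32_t old_word, uint16_t val, uint32_t reg_offset)
{
    uint32_t shift = ((reg_offset >> 1) & 1u) * 16u;
    uint32_t mask = 0xFFFFu << shift;
    return (old_word & ~mask) | ((uint32_t)val << shift);
}

/* Emulate a 16-bit register write using one 32-bit read and one
 * 32-bit write. base points at the first 32-bit register. */
void emmc_write16(volatile uint32_t *base, uint32_t reg_offset, uint16_t val)
{
    volatile uint32_t *word = base + (reg_offset >> 2);
    *word = merge_halfword(*word, val, reg_offset);
}
```

Splitting the merge into its own pure function keeps the bit manipulation testable without touching hardware.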
<clever> user_user: so, the next thing i would do, is print the contents of CM_EMMCDIV and CM_EMMCCTL
<user_user> ok, give me a minute. So 0x7e1011c0, but I should actually read at 0x3f1011c0, right?
<clever> yep
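The address translation confirmed here can be written down as a one-liner. A sketch assuming the Pi 2/3 ARM physical peripheral base of 0x3F000000 (earlier boards use a different base); the names are hypothetical.

```c
#include <stdint.h>

#define BUS_PERIPH_BASE      0x7E000000u /* VPU bus address of peripherals  */
#define ARM_PERIPH_BASE_PI23 0x3F000000u /* ARM physical base, Pi 2 / Pi 3  */

/* Translate a 0x7Exxxxxx peripheral bus address, as written in the
 * headers, to the ARM physical address used on a Pi 3. */
uint32_t bus_to_arm_phys(uint32_t bus_addr)
{
    return bus_addr - BUS_PERIPH_BASE + ARM_PERIPH_BASE_PI23;
}
```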
<user_user> For CM_EMMCCTL its 0x295, for CM_EMMCDIV its 0x5000
<clever> the lower 4bits of CTRL are the source, 5 means PLLC_PER, https://elinux.org/The_Undocumented_Pi#Clocks
<clever> CTL*
<clever> the next 4 bits, 9, means you have enable set, and busy set, so it is active
<clever> and bit 9 is set, thats a 1 bit fractional divisor
<user_user> How do you know the layout? I am looking at the doc you provide, don't see it in plaintext)
<clever> or maybe fractional enable
<clever> this header
<clever> clk-bcm2835.c then says that its 4bits of int, and 8 bits of fractional
<clever> so 0x5000 means /0x50.00 i believe
<user_user> hm
<user_user> so its the divisor from PLLC_PER?
<clever> also, CM_GP0CTL and CM_GP0DIV have nearly the identical layout, and are documented in the bcm2835 pdf
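A sketch of decoding the two values read above, assuming CM_EMMCCTL and CM_EMMCDIV really do share the CM_GP0CTL / CM_GP0DIV layout from the BCM2835 peripherals document (SRC in bits 3:0, ENAB bit 4, KILL bit 5, BUSY bit 7, FLIP bit 8, MASH bits 10:9; DIVI in bits 23:12, DIVF in bits 11:0). The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a clock manager control register (GP0CTL layout assumed). */
struct cm_ctl {
    uint32_t src;  /* bits 3:0, clock source select */
    bool enab;     /* bit 4, clock enabled          */
    bool kill;     /* bit 5, clock killed           */
    bool busy;     /* bit 7, generator running      */
    bool flip;     /* bit 8, output inverted        */
    uint32_t mash; /* bits 10:9, MASH filter stage  */
};

struct cm_ctl cm_ctl_decode(uint32_t v)
{
    struct cm_ctl c;
    c.src = v & 0xFu;
    c.enab = ((v >> 4) & 1u) != 0u;
    c.kill = ((v >> 5) & 1u) != 0u;
    c.busy = ((v >> 7) & 1u) != 0u;
    c.flip = ((v >> 8) & 1u) != 0u;
    c.mash = (v >> 9) & 0x3u;
    return c;
}

/* Integer part of the divisor, bits 23:12. */
uint32_t cm_div_int(uint32_t div) { return (div >> 12) & 0xFFFu; }

/* Fractional part of the divisor, bits 11:0. */
uint32_t cm_div_frac(uint32_t div) { return div & 0xFFFu; }
```

Under that assumed layout, 0x295 decodes to source 5 with enable and busy set, and 0x5000 decodes to an integer divisor of 5 with no fractional part.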
<user_user> like EMMC_FREQ = PLLC_PER_FREQ/50?
<clever> PLLC_PER_FREQ/0x50 i think
<user_user> yes,,
<user_user> right
<clever> you can then either ask the mailbox what PLLC_PER_FREQ is, or peek at more control registers
<user_user> so if EMMC CLK shown as 200MHZ then PLLC_PER runs at 16000MHz?
<user_user> How do we ask about PLLC_PER_FREQ?
<f_ridge> <x​2x6_/D> #define MBOX_CLOCK_ID_RESERVED 0
<f_ridge> <x​2x6_/D> #define MBOX_CLOCK_ID_EMMC 1
<f_ridge> <x​2x6_/D> #define MBOX_CLOCK_ID_UART 2
<f_ridge> <x​2x6_/D> #define MBOX_CLOCK_ID_ARM 3
<f_ridge> <x​2x6_/D> #define MBOX_CLOCK_ID_CORE 4
<f_ridge> <x​2x6_/D> #define MBOX_CLOCK_ID_V3D 5
<f_ridge> <x​2x6_/D> #define MBOX_CLOCK_ID_H264 6
<f_ridge> <x​2x6_/D> #define MBOX_CLOCK_ID_ISP 7
<f_ridge> <x​2x6_/D> #define MBOX_CLOCK_ID_SDRAM 8
<f_ridge> <x​2x6_/D> #define MBOX_CLOCK_ID_PIXEL 9
<f_ridge> <x​2x6_/D> #define MBOX_CLOCK_ID_PWM 10
<clever> yeah, the PLL isnt in that list
<f_ridge> <x​2x6_/D> #define MBOX_CLOCK_ID_EMMC2 11
<f_ridge> <x​2x6_/D> This comes from my mbox.h
<clever> probably best to just peek at more control registers
<clever> A2W_PLLC_PER has the PLLC_PER divisor
<clever> #define A2W_PLLC_PER HW_REGISTER_RW( 0x7e102520 )
<clever> and then there is one more you want to dump
<clever> A2W_PLLC_CTRL
<clever> #define A2W_PLLC_CTRL HW_REGISTER_RW( 0x7e102120 )
<clever> so basically, emmc_clk = (19.2 * pll_div) / pllc_per_div / emmc_div
<clever> pllc_div, and mhz
<user_user> ok, lets check
<user_user> PLLC_PER = 0x2, PLLC_CTRL = 0x21034
<clever> > 19.2 * 0x34
<clever> 998.4
<clever> lets just call that 1ghz, we didnt print the fractional part
<user_user> Ok, so it's from looking at the picture that you figured out that formula? emmc_clk =...?
<clever> so PLLC == 1ghz
ungeskriptet has quit [Ping timeout: 255 seconds]
<clever> and with _PER set to 2, thats 1ghz/2, or 500mhz
<clever> which is pretty typical
<clever> so thats just 500mhz / 0x50, based on what you saw in CM_EMMCDIV
<clever> or maybe 0x5
<clever> 0x5 sounds more right than 0x50
<clever> so we can assume the emmc clock is 100mhz
<user_user> While mbox api returns 200
<clever> oh, but the double flag
<clever> let me check that
<clever> *REG32(A2W_PLLC_ANA1) = A2W_PASSWORD | (prediv ? ANA1_DOUBLE : 0) | KI(2) | KP(8);
<clever> #define A2W_PLLC_ANA1 HW_REGISTER_RW( 0x7e102034 )
<clever> #define ANA1_DOUBLE (1<<14)
<clever> is bit 14 set on that register?
<clever> https://elinux.org/File:Raspberry_Pi_PLL.svg that enables the "ANA1 Prescaler" in here
<user_user> bit 14 is set
<clever> ah, that explains it then
<clever> > 19.2 * 0x34 * 2
<clever> 1996.8
<clever> so PLLC is then 2ghz, via this formula
<clever> and PLLC_PER is 1ghz, and EMMC is 1ghz/5, or 200mhz
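The whole chain worked out above can be checked with a few lines of integer arithmetic. This is a sketch assuming a 19.2 MHz crystal, an integer-only PLL multiplier (the fractional part was not printed in the conversation), and the ANA1 prescaler doubling the result when bit 14 is set; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define XOSC_KHZ 19200u /* 19.2 MHz crystal, in kHz */

/* PLLC output in kHz from the integer multiplier (low bits of
 * A2W_PLLC_CTRL) and the ANA1 "double" flag. */
uint64_t pllc_khz(uint32_t ndiv, bool ana1_double)
{
    uint64_t khz = (uint64_t)XOSC_KHZ * ndiv;
    return ana1_double ? khz * 2u : khz;
}

/* EMMC reference clock in kHz: PLLC / PLLC_PER divisor / CM_EMMCDIV
 * integer divisor. Returns 0 if either divisor is 0. */
uint64_t emmc_clk_khz(uint32_t ndiv, bool ana1_double,
                      uint32_t per_div, uint32_t emmc_div)
{
    if (per_div == 0u || emmc_div == 0u) {
        return 0u;
    }
    return pllc_khz(ndiv, ana1_double) / per_div / emmc_div;
}
```

With the values read in this session (multiplier 0x34, double flag set, PLLC_PER divisor 2, EMMC divisor 5) the integer-only estimate is 199680 kHz, close to the 200 MHz the mailbox reports.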
<user_user> Cool
<clever> so now we can go back to SDHCI_CLOCK_CONTROL/EMMC_CONTROL1
<clever> #define EMMC_CONTROL1 HW_REGISTER_RW( 0x7e30002c )
<user_user> I observe a very interesting thing now in parallel. Since I've set the divisor in the EMMC_CONTROL1 register to 3, the writing speed has increased, but it's not consistent. I would say some of it is due to my scheduling, but it jumps from 14ms to 98ms
<user_user> I use DMA to write out 64*512 byte blocks
<user_user> and that is probably bus contention? Because probably videocore writes something in parallel
<user_user> ?
<clever> there should be a FIFO in the emmc, to handle that
<user_user> Yes, its 16Kbytes size, but I write continuously
<clever> trying to find the right pdf on the above site...
<user_user> I will measure how much bytes videocore provides to me in buffers per second. Tomorrow.
<clever> you can also try doing a write without videocore streaming things
<clever> and see how fast you can write when not interrupted by things
<user_user> Good idea about doing writes without ext distractions!!!
<user_user> yep
<user_user> Ok, will keep you updated. Big thanks!
<clever> ah, found a doc
<clever> this one claims bits 6/7 are unused, and only 8-15 are clock div
<user_user> Yep, looks the same
<clever> but i notice its not a plain division, its got a lookup table, when in 8bit mode
<clever> what did you have in that register?
<user_user> git@github.com:zombie-engineer/t-799.git
<clever> ah nice, youve got some fancy gdb functions!
<user_user> Oww, why it has added commit
<user_user> Looks like I dont know what a permalink is)
<user_user> Not so fancy as openocd scripts I was doing yesterday)
<user_user> ok, see you later, have to sleep before work a little bit)
<user_user> thanks
<clever> so if you want the setup clock, you put 64 into there, 0x40, which means /128
<clever> so 1.5625mhz
<clever> and for normal mode, you put in 4, which means /8, or 25mhz
<clever> so you're running the SD bus at half its normal rate
<user_user> you mean to check the slow down?
<user_user> I put 200 there it was slowed down but not by 200
<clever> change that 4 into a 2, and it will double the speed, and run at 50mhz
<clever> on line 20
<clever> thats the confusing lookup table from the pdf i linked
<clever> page 15 of 57_SDHC_60001334A.pdf
<clever> near the bottom of the page
<clever> a 4 means 200mhz/8, while a 2 means 200mhz/4
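The lookup table being described can be sketched like this, assuming the 8-bit divided-clock encoding where a field value N with exactly one bit set means "divide by 2*N" and 0 means the undivided base clock. That matches the examples given in the conversation (0x40 is /128, 4 is /8, 2 is /4), but whether the Pi's controller follows this table is exactly what is under discussion, so treat it as an assumption. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A field value is valid in this encoding if it is 0 or a single
 * power of two no larger than 0x80. */
bool sdhci_div8_is_valid(uint32_t field)
{
    return field <= 0x80u && (field & (field - 1u)) == 0u;
}

/* Division ratio for a field value, or 0 if the value is not in the table. */
uint32_t sdhci_div8_ratio(uint32_t field)
{
    if (!sdhci_div8_is_valid(field)) {
        return 0u;
    }
    return field == 0u ? 1u : 2u * field;
}

/* Resulting SD clock in Hz, or 0 for an invalid field value. */
uint32_t sdhci_div8_clock_hz(uint32_t base_hz, uint32_t field)
{
    uint32_t ratio = sdhci_div8_ratio(field);
    return ratio != 0u ? base_hz / ratio : 0u;
}
```

With a 200 MHz base clock this gives 1.5625 MHz for 0x40, 25 MHz for 4 and 50 MHz for 2, and it rejects 3 and 200 as values outside the table.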
<user_user> aa ok
<clever> and 200 is not a valid value, it could do anything
<user_user> When I put 2 , it did not work, so I've put 3, I will check tomorrow more precisely with this new info
<clever> 3 is also not valid
<clever> but 50mhz mode, may also require switching voltage levels
<clever> i'm not familiar with those levels of SD
<clever> another pdf, says 25mhz 3.3v, is "default speed", and 50mhz 3.3v is "high speed"
<clever> but 50mhz 1.8v is "SDR25"