Tartarus changed the topic of #u-boot to: SOURCE MOVED TO https://source.denx.de/u-boot/u-boot.git / U-Boot v2021.04, v2021.07-rc4 are OUT / Merge Window is CLOSED and next is OPEN / Release v2021.07 is scheduled for 05 July 2021 / http://www.denx.de/wiki/U-Boot / Channel archives at https://libera.irclog.whitequark.org/u-boot
akaWolf has quit [Ping timeout: 268 seconds]
akaWolf has joined #u-boot
vagrantc has joined #u-boot
Pali has quit [Ping timeout: 246 seconds]
LeSpocky_ has joined #u-boot
LeSpocky has quit [Ping timeout: 256 seconds]
akaWolf has quit [Ping timeout: 265 seconds]
akaWolf has joined #u-boot
mmu_man has joined #u-boot
akaWolf has quit [Ping timeout: 272 seconds]
akaWolf has joined #u-boot
vagrantc has quit [Ping timeout: 256 seconds]
mmu_man has quit [Ping timeout: 246 seconds]
akaWolf has quit [Ping timeout: 272 seconds]
alpernebbi has joined #u-boot
akaWolf has joined #u-boot
redbrain has joined #u-boot
Pali has joined #u-boot
jwillikers has joined #u-boot
redbrain has quit [Ping timeout: 272 seconds]
urja has quit [Read error: Connection reset by peer]
urja has joined #u-boot
<rfried> There's a method used by SOCs, utilizing the L1/L2 caches to load the boot-loader before DDR is initialized.
<rfried> Does it have an official name ? I'm trying to find reference to it in google and can't find any.
alan_o has quit [Ping timeout: 250 seconds]
alan_o has joined #u-boot
jwillikers has quit [Remote host closed the connection]
ecrn has joined #u-boot
vagrantc has joined #u-boot
<ecrn> hi, any idea why booting the Linux kernel on i.MX6DL with one u-boot version (provided by my SoM manufacturer, based on the freescale tree) could give different results on a specific benchmark from ssvb's tinymembench than booting it with the mainline u-boot which I ported for the SoM?
<ecrn> I suspected a misconfigured DDR controller in SPL, but I checked it twice, and using the SoM manufacturer's SPL with just the u-boot image from mainline also causes the issue
<ecrn> same thing for using mainline SPL and SoM manufacturer's u-boot image
<ecrn> only the SoM manufacturer's SPL + u-boot combination gives the better benchmark results
<ecrn> the benchmark I'm using is https://github.com/ssvb/tinymembench, and the specific test with strikingly different results is C fill / memset
<ecrn> I got C fill : 1978.2 MB/s (1.0%) and standard memset : 1974.9 MB/s (3.4%)
<ecrn> on the SoM manufacturer's combination
<ecrn> and about 500MB/s using mine, from mainline
LeSpocky_ is now known as LeSpocky
alpernebbi has quit [Ping timeout: 256 seconds]
mmu_man has joined #u-boot
<cyrozap> rfried: The coreboot project calls that "cache as ram", I think.
<marex> ecrn: PL310 cache config ?
<marex> ecrn: there is this fill full line of zeroes bit in the L2 cache controller
<marex> one u-boot might configure it, the other not
vagrantc has quit [Quit: leaving]
<ecrn> ok, but isn't the kernel booted with data caching off, and then it configures it itself?
<marex> see arch/arm/mach-imx/cache.c
<marex> ecrn: a lot of the l2x0 config depends on what bootloader configured there, although you can tweak some of it in DT
<marex> see Documentation/devicetree/bindings/arm/l2c2x0.yaml
<ecrn> my u-boot configuration sets CONFIG_SYS_L2_PL310 (from ./include/configs/mx6_common.h included from the board .h file), and CONFIG_SYS_L2CACHE_OFF is not set in the config
<marex> ecrn: git grep -i line.of.zero in both u-boot and linux
<marex> that is I think what you are looking for
<marex> /* enable BRESP, instruction and data prefetch, full line of zeroes */
<marex> setbits_le32(&pl310->pl310_aux_ctrl,
<marex> L310_AUX_CTRL_INST_PREFETCH_MASK |
<marex> L310_AUX_CTRL_DATA_PREFETCH_MASK |
<marex> L310_SHARED_ATT_OVERRIDE_ENABLE);
<marex> this is what socfpga does ^ , I think imx6 does something similar
<marex> ecrn: well, verify v7_outer_cache_enable() is called in your u-boot port at all
<marex> and you should be able to read out the pl310 registers to verify the bits are all set as you expect them to be set (in both u-boot variants)
<marex> ecrn: you might also want to dump ACTLR and CPACR settings, see arch/arm/mach-socfpga/board.c s_init() , but that's unlikely the problem on your side ; it is however related to the floz
<ecrn> marex: the v7_outer_cache_enable() gets called, and the registers are: ACTLR: 0x00000000, CPACR: 0x00000000, if asm volatile("mrc p15, 0, %0, c1, c0, 1\n":"=r"(result):); is the right way
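For reference, the `mrc` quoted in the message above uses opc2 = 1, which reads ACTLR only; on ARMv7-A, CPACR is the same CP15 access with opc2 = 2. A minimal sketch, assuming a 32-bit ARM build running in a privileged mode (as in U-Boot); the helper names `read_actlr` and `read_cpacr` are hypothetical and not taken from either U-Boot tree:

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not part of U-Boot.
 * Each returns 1 and stores the register value when compiled for 32-bit ARM,
 * or returns 0 and leaves *val untouched on targets without CP15.
 * The MRC accesses are privileged, so they only work in bootloader/kernel context.
 */
static inline int read_actlr(uint32_t *val)
{
#if defined(__arm__)
	uint32_t v;

	/* ACTLR: CRn = c1, CRm = c0, opc2 = 1 */
	__asm__ volatile("mrc p15, 0, %0, c1, c0, 1" : "=r"(v));
	*val = v;
	return 1;
#else
	(void)val;
	return 0;
#endif
}

static inline int read_cpacr(uint32_t *val)
{
#if defined(__arm__)
	uint32_t v;

	/* CPACR: CRn = c1, CRm = c0, opc2 = 2 */
	__asm__ volatile("mrc p15, 0, %0, c1, c0, 2" : "=r"(v));
	*val = v;
	return 1;
#else
	(void)val;
	return 0;
#endif
}
```

If both registers are read with the same opc2 = 1 instruction, the "CPACR" value is really a second copy of ACTLR, so the two dumps would need separate instructions to be meaningful.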
<marex> ecrn: then look at the pl310 aux control register
<ecrn> marex: the other u-boot doesn't seem to set pl310_aux_ctrl at all, at least grepping the source code didn't return any writes to it
<ecrn> and the mainline one does: setbits_le32(&pl310->pl310_aux_ctrl, L310_SHARED_ATT_OVERRIDE_ENABLE);
<rfried> cyrozap: Thanks!
<ecrn> marex: CONFIG_L2X0_CACHE (the u-boot dt driver) was disabled, but enabling it didn't seem to make a difference, and also disabling it and commenting out the setbits_le32(&pl310->pl310_aux_ctrl, L310_SHARED_ATT_OVERRIDE_ENABLE); so that there are no writes to the pl310_aux_ctrl doesn't seem to make a difference
<marex> ecrn: dump the l2x0 registers in both downstream fork and mainline and compare them, maybe there is a difference
<ecrn> looks the same
<ecrn> I dumped it from board_late_init from u-boot, not SPL
<ecrn> hmm, well, I seemingly forgot the aux_ctrl :)
<ecrn> so pl310_aux_ctrl: 0x32450000 vs pl310_aux_ctrl: 0x32050000, and #define L310_SHARED_ATT_OVERRIDE_ENABLE (1 << 22) is 0x00400000
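The comparison of the two `pl310_aux_ctrl` dumps above can be sketched as an XOR plus a per-flag decode. This is a rough illustration only: `L310_SHARED_ATT_OVERRIDE_ENABLE` comes from the log, while the other bit positions are assumptions recalled from Linux's `cache-l2x0.h` and should be checked against the PL310 TRM. The function names `aux_ctrl_diff` and `aux_ctrl_print` are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* From the log: the bit mainline U-Boot sets on top of the downstream value. */
#define L310_SHARED_ATT_OVERRIDE_ENABLE (1u << 22)

/* Assumed bit positions (recalled from Linux cache-l2x0.h); verify in the PL310 TRM. */
#define AUX_FULL_LINE_ZERO (1u << 0)
#define AUX_DATA_PREFETCH  (1u << 28)
#define AUX_INSTR_PREFETCH (1u << 29)
#define AUX_EARLY_BRESP    (1u << 30)

/* Return the bits that differ between two register dumps. */
static uint32_t aux_ctrl_diff(uint32_t a, uint32_t b)
{
	return a ^ b;
}

/* Print one line per known flag so two dumps can be compared by eye. */
static void aux_ctrl_print(const char *label, uint32_t v)
{
	printf("%s: 0x%08x\n", label, (unsigned int)v);
	printf("  full line of zero : %d\n", !!(v & AUX_FULL_LINE_ZERO));
	printf("  shared override   : %d\n", !!(v & L310_SHARED_ATT_OVERRIDE_ENABLE));
	printf("  data prefetch     : %d\n", !!(v & AUX_DATA_PREFETCH));
	printf("  instr prefetch    : %d\n", !!(v & AUX_INSTR_PREFETCH));
	printf("  early BRESP       : %d\n", !!(v & AUX_EARLY_BRESP));
}
```

`aux_ctrl_diff(0x32450000, 0x32050000)` yields `0x00400000`, so only bit 22 differs between the two dumps. Under the assumed layout, bit 0 (full line of zero) is clear in both values as dumped from U-Boot.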
<marex> so mainline sets extra bit ?
<ecrn> yes
<marex> well, clear it and retry ?
<marex> and if that makes the difference, try to understand what it is all about
<ecrn> so the register is now pl310_aux_ctrl: 0x32050000, and C fill : 499.9 MB/s (0.5%)
<ecrn> standard memset : 499.5 MB/s (0.5%)
<ecrn> so didn't make a difference
<marex> so, it's not dram and it's not cache
<marex> CPU frequency ?
<ecrn> cpufreq says scaling_cur_freq is 996000
<ecrn> and also was the same on the other u-boot
<ecrn> kernel and userspace are exactly the same
<marex> ecrn: do other tests also show this performance degradation ?
<ecrn> doesn't seem so, but I didn't compare them carefully enough, the one with fill/memset was very evident
<ecrn> I'll run the full benchmark
<marex> ecrn: this memset test would likely benefit from this floz optimization, so I would dig around that some more
<marex> except you have to turn it on for both CPU and L2 cache controller
<marex> then when the CPU writes 32 bytes of zeroes into a cacheline, it gets flushed as burst write into DRAM or somesuch
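The "both sides" requirement above can be expressed as a small consistency check over the two register values. A minimal sketch, assuming a Cortex-A9 (where ACTLR bit 3 is the "write full line of zeros mode" bit) and a PL310 whose auxiliary control bit 0 is the full-line-of-zero enable. Both bit positions are stated from memory and should be confirmed in the respective TRMs; the names `floz_state` and the `FLOZ_*` values are hypothetical.

```c
#include <stdint.h>

/* Assumed bit positions; confirm against the Cortex-A9 and PL310 TRMs. */
#define A9_ACTLR_FLOZ  (1u << 3) /* CPU side: write full line of zeros mode */
#define PL310_AUX_FLOZ (1u << 0) /* L2 side: full line of zero enable */

enum floz_state {
	FLOZ_OFF,      /* disabled on both sides */
	FLOZ_ON,       /* enabled on both sides */
	FLOZ_MISMATCH  /* enabled on only one side, which is not a valid setup */
};

/* Classify the full-line-of-zeros setup from ACTLR and PL310 aux control dumps. */
static enum floz_state floz_state(uint32_t actlr, uint32_t aux_ctrl)
{
	int cpu = (actlr & A9_ACTLR_FLOZ) != 0;
	int l2 = (aux_ctrl & PL310_AUX_FLOZ) != 0;

	if (cpu && l2)
		return FLOZ_ON;
	if (!cpu && !l2)
		return FLOZ_OFF;
	return FLOZ_MISMATCH;
}
```

With the values reported earlier in the log (ACTLR 0x00000000, aux control 0x32050000 or 0x32450000), this classifies the state at U-Boot time as `FLOZ_OFF`. It says nothing about what Linux configures later, and ACTLR would have to be checked on each core.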
<marex> ecrn: oh, btw, SCTLR is per-core, U-Boot only configures it for one core (0)
<ecrn> marex: https://nopaste.net/6XLP6yuKYX downstream
<ecrn> so other ones are usually better in the downstream
<marex> ecrn: the x4 difference in fill is odd
vagrantc has joined #u-boot
<marex> ecrn: does dmesg say something about L2C-310 ?
<marex> dig around linux arch/arm/mm/cache-l2x0.c and possibly add printks
<ecrn> yes, but no difference in dmesg output
<ecrn> odd thing is that both SPL and u-boot.img have to be from downstream to get the better performance, so maybe whatever it is, both the SPL and the u-boot.img from mainline set it, and the ones from downstream don't
<ecrn> marex: I will try to dig into it on monday, got to go for now, thank you for support
ecrn has quit [Quit: Client closed]
redbrain has joined #u-boot
redbrain has quit [Ping timeout: 258 seconds]
mthall has joined #u-boot
matthewcroughan_ has joined #u-boot
mthall has quit [Client Quit]
matthewcroughan has quit [Read error: Connection reset by peer]
mthall has joined #u-boot