<abws>
Kwiboo: yes, helios64. Until now I have been unable to use u-boot TPL (i.e. without the rockchip DDR blob), because with u-boot TPL, running "for i in $(seq 1 100);do python3 -c "import pkg_resources; pkg_resources.parse_version('1')" || break;done" a few times under Linux always gets me a segfault
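The shell loop above can also be sketched in Python, which makes it easier to see whether a run died from a signal such as SIGSEGV. This is only an illustration: `run_until_failure` and `PKG_CMD` are hypothetical names, and the payload is simply the one quoted in the message.

```python
import subprocess
import sys


def run_until_failure(cmd, iterations=100):
    """Run cmd up to `iterations` times.

    Returns (run_index, returncode) for the first failing run, or None if
    every run exited with status 0. On POSIX a negative returncode means
    the child was killed by that signal number (-11 is SIGSEGV).
    """
    for i in range(1, iterations + 1):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return i, result.returncode
    return None


# Same payload as the shell one-liner; assumes pkg_resources is installed.
PKG_CMD = [
    sys.executable,
    "-c",
    "import pkg_resources; pkg_resources.parse_version('1')",
]
```

Calling `run_until_failure(PKG_CMD)` a few times in a row would mirror the test described above.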
<abws>
do you have a stable helios64 while switching cpu frequencies? (little cpu frequency switching is stable for me, but I cannot switch the big cpu frequency more than a hundred times in quick succession)
<abws>
or did you make a code port of the existing helios64 code from armbian without the HW?
mmu_man has quit [Ping timeout: 255 seconds]
<abws>
because my main worry with this board is that the instability with the big cpu frequency switching is a hardware issue (i.e. helios64 seems to be the only board where users are still complaining about this instability ... but maybe the other boards have no users anymore, or all those users disabled frequency switching as the helios64 users did).
<abws>
I still have to find a user of an rk3399 board to try my cpu frequency switching stress test, to see if their boards are really free of this issue (I can reproduce the crash on helios64 quite easily)
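The stress test itself is not shown in the log. A minimal sketch of that kind of test, assuming the standard Linux cpufreq sysfs files and a kernel with the `userspace` governor available, could look like this. The function names are hypothetical, and which policy directory belongs to the big cluster depends on the board (check `related_cpus` in each policy directory).

```python
import itertools
from pathlib import Path

# Standard location of the per-policy cpufreq directories on Linux.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu/cpufreq")


def available_frequencies(policy_dir):
    """Return the frequencies (in kHz) that the policy advertises."""
    text = (Path(policy_dir) / "scaling_available_frequencies").read_text()
    return [int(khz) for khz in text.split()]


def stress_policy(policy_dir, rounds=1000):
    """Cycle through every available frequency `rounds` times in total.

    Needs root on real hardware. Returns the number of switches requested.
    """
    policy_dir = Path(policy_dir)
    freqs = available_frequencies(policy_dir)
    # The userspace governor lets us pick the frequency by hand.
    (policy_dir / "scaling_governor").write_text("userspace")
    switches = 0
    for khz in itertools.islice(itertools.cycle(freqs), rounds):
        (policy_dir / "scaling_setspeed").write_text(str(khz))
        switches += 1
    return switches
```

For example, `stress_policy(CPUFREQ_ROOT / "policy4")` would exercise whichever cluster `policy4` covers on the board under test. If the board is affected, the expectation from the discussion is a crash or hang rather than a clean return.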
vagrantc has quit [Ping timeout: 255 seconds]
jclsn has quit [Ping timeout: 264 seconds]
jclsn has joined #u-boot
enok has joined #u-boot
naoki has quit [Quit: naoki]
enok has quit [Quit: enok]
enok71 has joined #u-boot
enok71 is now known as enok
ikarso has joined #u-boot
enok has quit [Ping timeout: 272 seconds]
vagrantc has joined #u-boot
sakman has joined #u-boot
vagrantc has quit [Quit: leaving]
qqq has joined #u-boot
stefanro has joined #u-boot
redbrain has quit [Read error: Connection reset by peer]
redbrain has joined #u-boot
monstr has joined #u-boot
ikarso has quit [Quit: Connection closed for inactivity]
goliath has joined #u-boot
gsz has joined #u-boot
prabhakar has quit [Ping timeout: 255 seconds]
mckoan|away is now known as mckoan
ikarso has joined #u-boot
ldevulder has joined #u-boot
zear_ has joined #u-boot
zear has quit [Ping timeout: 268 seconds]
zear_ is now known as zear
sszy has joined #u-boot
naoki has joined #u-boot
naoki has quit [Client Quit]
<Kwiboo>
abws: correct, I have no helios64 hw, I only tried to import device tree from linux and apply similar tweaks and fixes I have prepared for other rk3399 boards, sounds like using rockchip tpl/ddrbin solves your issue?
<Kwiboo>
using CONFIG_ROCKCHIP_EXTERNAL_TPL=y instead of CONFIG_TPL=y should make the build ask for a ROCKCHIP_TPL env variable with a path to a ddrbin blob instead of using u-boot TPL
enok has joined #u-boot
tgamblin has joined #u-boot
joeskb7 has joined #u-boot
matthias_bgg has joined #u-boot
enok has quit [Ping timeout: 255 seconds]
enok has joined #u-boot
ladis has joined #u-boot
naoki has joined #u-boot
enok has quit [Ping timeout: 255 seconds]
enok has joined #u-boot
tgamblin has quit [Ping timeout: 255 seconds]
slobodan has joined #u-boot
enok has quit [Ping timeout: 268 seconds]
enok has joined #u-boot
naoki has quit [Quit: naoki]
enok has quit [Ping timeout: 264 seconds]
<mwalle>
btw, does anyone else notice that more and more bootloader/tf-a/vendor crap relies on GPT partitions? E.g. some store their bootloader there and (apparently) hardcode the partition name in their bootrom. or some store their ram calibration data there. I'm not sure that is really a good UX, think of "Use entire disk" in the installer for example.
naoki has joined #u-boot
naoki has quit [Client Quit]
tgamblin has joined #u-boot
mckoan is now known as mckoan|away
enok has joined #u-boot
dsimic has quit [Ping timeout: 260 seconds]
dsimic has joined #u-boot
monstr has quit [Ping timeout: 252 seconds]
<calebccff>
mwalle: qualcomm do this, kinda, but the bootloader is stored on dedicated UFS boot LUNs, and most of the annoying other vendor stuff is stored on a small LUN, so you can usually wipe the big one where your OS is entirely and reformat it
ladis has quit [Quit: Leaving]
mmu_man has joined #u-boot
<mwalle>
calebccff: but not everyone has UFS :)
<calebccff>
mwalle: right, mmc has rpmb but not the same ability to split the storage in LUs...
<mwalle>
calebccff: you mean the boot partitions, yeah
<calebccff>
yeah
<mwalle>
i'm not sure what I should think of that development though
<calebccff>
mwalle: we have the exact same issues with postmarketOS, and the solution was basically to just make our own disk image and flash it to one of the partitions, then the initramfs goes through running kpartx until it finds the right disk image
<mwalle>
calebccff: ok, but those are mobile phones, right?
<calebccff>
yeah
<calebccff>
which have a bunch of whacky partitions
<mwalle>
I'm speaking of "normal" PC-like ARM boards
<calebccff>
what's the difference?
<mwalle>
calebccff: basically i want to have the whole storage available for the user and there should be some kind of bios/bootloader regardless of what's in the emmc
<mwalle>
that's what the emmc boot partition is for
<mwalle>
calebccff: i.e. if you wipe your HDD you still have your bios on your pc
<mwalle>
and you are able to install a new OS
<calebccff>
mwalle: well yeah we want the same thing on the phones, it's very dumb to have all these extra partitions
<mwalle>
calebccff: ah ok, yes i agree .)
<calebccff>
the phone vs "PC-like ARM board" dichotomy is a myth invented by big Apple to sell you more macbooks
<calebccff>
the sad part is it's near impossible to actually do anything about this
<calebccff>
qcom phones have a locked down unmodifiable bootloader that reads at least 6 partitions before handing over control to Linux, if any of those is corrupt, missing, or modified then the device won't boot
<marex>
mwalle: ST ?
<mwalle>
marex: which one? mediatek and microchip
<marex>
mwalle: all of the MP use some sort of GPT shenanigans
<marex>
mwalle: tbh I am really disappointed in TFA development, I think the best course of action would be to assimilate pieces from it into U-Boot and stop depending on it
<mwalle>
marex: i mean it might make sense for banana pi boards etc, where you have some kind of sdcard..
<mwalle>
marex: i can feel you, but arm is pushing hard..
<marex>
mwalle: yes, too hard, which is the actual problem here
<marex>
mwalle: I am starting to feel the disregard for any sort of community input, which really rubs me the wrong way
<mwalle>
marex: but apparently ARM wants to achieve the same UX as PCs, with their systemready stuff
<mwalle>
but.. I'm not sure it really helps to put every crap into the storage which a user could delete and then render the system unusable
monstr has joined #u-boot
<calebccff>
marex: think we could pull hypervisor stuff into U-Boot as well? Xen is cool and all but it would be nice to just have something more minimal (... and that can do the m1n1 fancy debugging stuff :D)
<calebccff>
./hj
<marex>
calebccff: I think we need something like U-Boot SPL running as BL2, U-Boot as BL31 in EL3, and then U-Boot should start the next stages, like PSCI provider stage or OpTee before booting Linux
<marex>
calebccff: i.e. BootROM->SPL->U-Boot->PSCI (can be some U-Boot xPL stuff too)->OpTeeOS->Linux
<marex>
calebccff: that way, U-Boot can still be used as debug tool in EL3
<calebccff>
what is BL? how is it different to EL?
<calebccff>
sounds cool though
* calebccff
would love to dig deeper into this stuff on qcom boards, the ChromeOS folks did some interesting things with Coreboot support
<marex>
BL2 is ~SPL stage, BL31 is ~U-Boot stage
<marex>
EL is exception level
<calebccff>
ah ok, was getting confused because they look so similar :P
<calebccff>
qcom bootflow atm for me is like BROM->(secret special EL3 stuff)->SBL->TZ/el3 init->SBL->hyp->EDKII->ABL (Android bootloader EFI app)->U-Boot->Linux
<calebccff>
it can also be BROM->qc_sec->sbl->tz->U-Boot and then we run in EL2, but the DSPs stop working in Linux because they rely on the hypervisor to kickstart them usually
<marex>
jeezes
<mwalle>
that looks really KISS
<marex>
layers on top of layers on top of layers of seeekrit sauce
<calebccff>
yeah ):
<calebccff>
on the production phones that first one is the best we'll ever get
<calebccff>
but on SB-off boards (that don't have OEM keys fused into them) we can at least theoretically replace most of it
<calebccff>
although it's only really *easy* to replace UEFI, or gain EL2 by flashing U-Boot to the hypervisor partition
<calebccff>
yes, there is a partition which is just an ELF file containing the hypervisor
<calebccff>
would be very cool to flash U-Boot SPL there
* mwalle
is curious what does the hyp do?
<calebccff>
prior to ~2019 mostly just security stuff, it partially emulates the SMMU (hence https://source.denx.de/u-boot/u-boot/-/blob/master/drivers/iommu/qcom-hyp-smmu.c?ref_type=heads), prevents you from touching certain registers, and even blocks writes to certain regions on the SPMI bus (so you can't configure the regulators on the PMIC yourself, you have to go through the RPMh co-processor)
<calebccff>
the modern one is Gunyah though and can actually run multiple VMs, so it can do the Android microVM stuff, and whatever automotive needs I guess
<marex>
mwalle: builds a walled garden on the platform, so you cannot access the hardware directly without qcom approving it
<calebccff>
^^
<calebccff>
unless your board is unfused in which case you can just flash U-Boot :D
<marex>
calebccff: so why not stick U-Boot under all these layers of goo and start all those layers from U-Boot instead ?
<calebccff>
well mostly cuz I'm not smart enough to figure out how to make that work
<calebccff>
but really it's hard to do without side effects, everything is so tightly integrated
<calebccff>
you can't "just replace" EL3, a whole bunch of linux drivers make SCM calls to configure like, shared memory permissions
<calebccff>
also TZ is responsible for configuring a bunch of the XPUs on boot, those are little critters that sit on the bus and protect peripherals from accessing memory regions
<calebccff>
if you don't configure them you'll either have a very insecure system or more likely one that doesn't work at all
monstr has quit [Remote host closed the connection]
Leopold has joined #u-boot
abws has joined #u-boot
<abws>
Kwiboo: rockchip DDR bin does not solve the board instability while doing cpufreq switching on the big cpus. Only the userspace python3 segfault. I want to try with u-boot TPL to see if the instability could be related to the rockchip DDR bin (shot in the dark)
slobodan has quit [Read error: Connection reset by peer]
slobodan has joined #u-boot
sakman has quit [Ping timeout: 272 seconds]
slobodan has quit [Read error: Connection reset by peer]
slobodan has joined #u-boot
shashankx86 has quit [Ping timeout: 250 seconds]
___nick___ has quit [Ping timeout: 255 seconds]
slobodan has quit [Read error: Connection reset by peer]
gsz has quit [Ping timeout: 252 seconds]
slobodan has joined #u-boot
mmu_man has quit [Ping timeout: 268 seconds]
ikarso has joined #u-boot
mmu_man has joined #u-boot
___nick___ has joined #u-boot
___nick___ has quit [Client Quit]
___nick___ has joined #u-boot
tgamblin has quit [Ping timeout: 255 seconds]
___nick___ has quit [Ping timeout: 256 seconds]
<abws>
Kwiboo: your branch gives me "Returning to boot ROM...\n SPL spl_early_init() failed: -96"
<abws>
I am trying to enable the debug log but somehow failing to
<manawyrm>
abws: linux has a built-in memtest functionality (via cmdline)
<manawyrm>
try that first and let it do 100 cycles or so
<manawyrm>
if that completes, it's not the ram
<abws>
manawyrm: thanks, I already did (and my instability test case was reproduced by at least one other user who thought his board was stable).
<abws>
could the ddr and cpu_b clocks be related in some way?
<manawyrm>
Why are you suspecting the RAM if memtest passes?
<abws>
python3 somehow segfaulting when using u-boot TPL (below v2023.01 at least), though I don't remember if I tested memtest with u-boot TPL
<abws>
maybe I am wrong in thinking the instability could be related to ddr timings, but in a way this work would be of use. Currently python3 segfaults with u-boot TPL and not with the rockchip DDR blob. I also wanted to find out if the fixes that I believe were included above v2023.04 (training the ddr at 400MHz, or was it before?) helped with this TPL issue
<cambrian_invader>
the linux memtest is not so comprehensive, since it does not test memory used by the kernel itself
<abws>
about the cpufreq instability when switching the big cpu of rk3399 on helios64, I am beginning to lose all hope of getting this sorted out. Could it be that the regulator for the big cpu is faulty? (these boards were built during Covid ...)
<cambrian_invader>
that said, the POST in U-Boot is not too much better (and it is hard to configure)
<cambrian_invader>
and tbh booting linux is the best memtest...
<abws>
the thing is, another board with rk3399 and lpddr4 was unstable a long time ago. I don't know if it is still unstable. I thought that maybe something was wrong only with the lpddr4 timings for rk3399
<abws>
this was the nano pi m4v2 I believe. Since then I cooked up a testcase that switches between the available frequencies a lot of times and can reproduce the issue easily. I believe I reproduced it with the 4.4 kernel. I doubt this is a kernel issue, or at least if so it would be in something only helios64 uses
<abws>
my knowledge is pretty limited. So I tend to try hypotheses that might be absurd (for one, I wonder if changing the cpu frequency could make the ddr access unreliable by affecting the ddr clock)
<manawyrm>
... did you check the voltage rails already? under load/switching conditions with a sufficiently fast scope?
<abws>
thus the memtest would succeed at boot because the cpus are at a specific freq, but it would not when the cpu frequency changes
<abws>
manawyrm: with a multimeter ?
<manawyrm>
abws: no chance
<manawyrm>
well... if a multimeter already shows a problem, then your problem is very big
<manawyrm>
:P
<manawyrm>
"boards were built during Covid" sounds a lot like custom boards?
<abws>
oh a scope, no I do not have that
<manawyrm>
were these properly qualified?
<manawyrm>
as in: are the voltage regulators "known good"?
<abws>
not custom boards, but helios64 from Kobol
<abws>
the boards were tested before shipping
<abws>
they did not give the schematics (well, only tiny portions for specific chips in their wiki)
<manawyrm>
*sigh*
<manawyrm>
and those boards are problematic in general?
<manawyrm>
as in others also have that problem?
slobodan has quit [Remote host closed the connection]
<manawyrm>
wow, that's really not a lot of info
<abws>
from the armbian forum, a lot. They were told to set cpufreq to a specific range, if not a fixed frequency
<manawyrm>
... from what I can tell, they're using the "normal" RK808 PMIC for the voltage supply
slobodan has joined #u-boot
<manawyrm>
and it's apparently connected via i2c
<abws>
I managed to find photos of the naked boards where we can see the markings on the ICs
<manawyrm>
nice
<manawyrm>
the RK808 has several supply rails (which are individually controllable)
<manawyrm>
I think I'd try to identify which is which first (for example from a device tree if available)
<manawyrm>
and then try and measure those rails
<abws>
I suspect the big cpu has a regulator that is bad, someone on ##electronics helped me
<manawyrm>
There's nothing you can do if they messed their design up badly, but it's "just" an RK3399
<manawyrm>
Those are tough :P
<manawyrm>
a) is the voltage correct at all
<abws>
how do you measure rails (I mean how do you find where to measure them)?
<manawyrm>
hm, yeah, needs a bit of electronics experience. Typically, you'd measure caps around the voltage regulators against GND
<manawyrm>
ideally, this would be done with a scope to catch dropouts, oscillations, etc.
slobodan has quit [Read error: Connection reset by peer]
<manawyrm>
aren't these full schematics and PCB component lists?!
slobodan has joined #u-boot
<abws>
manawyrm: thanks a lot!
<manawyrm>
ohhhh
<manawyrm>
from a quick look into those schematics, I would double-down on my suspicions regarding voltage rails
<abws>
I found my log from ##electronics, he told me there ought to be a 12>5V converter before the RK808D as its maximum input is 5.5V, and that I should check it
<manawyrm>
they're using separate, discrete regulators for the big, little and GPU cores
<manawyrm>
all of them are software controlled
<manawyrm>
typically, you would tell the kernel about them and it would dynamically adjust the voltage on those regulators for each CPU frequency
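To see what the kernel currently requests from each regulator, the standard `/sys/class/regulator` sysfs attributes can be read. The sketch below assumes the `name` and `microvolts` attributes are present; `read_regulators` is a hypothetical helper name. Note that this only shows the setpoint the kernel asked for, not what is actually on the rail, so it does not replace a measurement with a scope.

```python
from pathlib import Path


def read_regulators(root="/sys/class/regulator"):
    """Return a list of (name, microvolts) for regulators exposing a voltage.

    Regulators without a readable `microvolts` attribute are skipped.
    """
    readings = []
    for reg_dir in sorted(Path(root).glob("regulator.*")):
        try:
            name = (reg_dir / "name").read_text().strip()
            microvolts = int((reg_dir / "microvolts").read_text().strip())
        except (OSError, ValueError):
            # Attribute missing, unreadable, or not a plain integer.
            continue
        readings.append((name, microvolts))
    return readings
```

Printing `read_regulators()` before and after a frequency change would show whether the big-core regulator setpoint follows the operating point the kernel selected.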
<abws>
the schematics page must be new (google says January 2024 but that might be the last update). They were asked for for years, and the reply was always that the only available parts were in the wiki. Indeed, the schematics are now second in the google results
<manawyrm>
side note: there's also #linux-rockchip on this server, which might be more fitting
<manawyrm>
I don't think u-boot is at fault here :D
<manawyrm>
i mean -- that device tree has all the voltage regulators defined (whether they work/can be configured by the kernel is another question)
<manawyrm>
but you might as well disable all dynamic voltage control for now and set every voltage fixed at the max (don't fuck this up, wrong values can fry stuff!)
<manawyrm>
and then see if it's stable (cool the thing well while you do that)
<manawyrm>
if it isn't stable then, the board is fucked electrically (and needs some rework/reengineering)
<manawyrm>
if it is, you need to look into the operating power points
<abws>
mind the discrete regulators should already be setup correctly ("silergy,syr827" for cpub) and the voltage are defined globally for all rk3399 boards. I don't see how the dts could be badly configured.
<abws>
I suspected that somehow some converter was faulty and the voltage set in the dts was not supplied to the devices
<abws>
either way thanks for the schematics
<manawyrm>
as I said: Check voltages :)
<manawyrm>
if the voltages are wrong: There's your problem :)
<abws>
as of now I checked the voltage with a multimeter on the board after the power supply (I had a little above 12V for 12V)
<manawyrm>
"the voltage are defined globally for all rk3399 boards" -- yes, but that doesn't help if your board is badly designed
<abws>
but indeed if the issue is voltage that is not related to u-boot.
<abws>
manawyrm: agreed about the design
<manawyrm>
I've only had a very, very, very brief look at the schematics and they're already violating a number of things that the manufacturer of the regulator IC recommends.
<manawyrm>
That can be OK -- if you know what you're doing -- but other things in the design give me the idea that they don't
<abws>
sad, will have to find another NAS. It is not that easy to find a >=5 bay NAS that consumes less than 40W and enables you to install a custom OS :-/
<abws>
but that is OT
<manawyrm>
true. the other RK3399 platforms are worth a look, though. the new N100 boards as well.
<abws>
that is why I put so much energy in finding out if this board was really faulty. Maybe I can up the voltages for the big cpu for the OPP.
naoki has joined #u-boot
<manawyrm>
raise all of them to the allowed max
<manawyrm>
just wastes power
<manawyrm>
(for now)
<abws>
I already started looking for a new NAS. The Intel boards seemed great (especially since I want video HW accel) but those NAS seem to have issues with the low PCIe lane count, so they tend to make tradeoffs
<abws>
manawyrm: ? you mean I should try to raise them all to max or that is useless and a waste of power?
<manawyrm>
abws: you should try to raise them all to the max for a test
<manawyrm>
and see if it gets stable then
<manawyrm>
(which will waste power, but is a good test)
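As a way to plan that test, the sketch below takes an operating-point table and pins every entry to one allowed maximum voltage. Everything here is illustrative: `pin_opps_to_voltage` is a hypothetical helper, and the allowed maximum must come from the SoC and regulator datasheets, not from this example.

```python
def pin_opps_to_voltage(opp_table_uv, allowed_max_uv):
    """Return a copy of the table with every frequency at allowed_max_uv.

    opp_table_uv maps frequency in Hz to voltage in microvolts.
    Raises ValueError if the existing table already exceeds the allowed
    maximum, since that suggests the limit or the table is wrong.
    """
    too_high = {f: uv for f, uv in opp_table_uv.items() if uv > allowed_max_uv}
    if too_high:
        raise ValueError(f"entries above allowed maximum: {too_high}")
    return {freq_hz: allowed_max_uv for freq_hz in opp_table_uv}
```

The resulting table would then have to be applied through the board's device tree or regulator constraints, with good cooling in place, as advised above.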
slobodan has quit [Ping timeout: 268 seconds]
<abws>
thanks
<abws>
by the way can I completely exclude the option of bad ddr timing (or maybe bad ddr voltage)?
<abws>
in u-boot I mean
<abws>
seems there were very few rk3399 boards with lpddr4
<abws>
later on I will still want to fix u-boot TPL giving python3 segfaults while rockchip DDR bin does not, but indeed that is less critical (if the board is broken by design maybe it is better to let it die)
<manawyrm>
iirc the Pinebook Pro was LPDDR4 as well
<abws>
do I risk damaging the cpu by running it constantly at or close to its max voltage?
<manawyrm>
no, just keep it cool
<manawyrm>
in the Pinebook Pro we constantly overclocked all the boards (for everyone)