#u-boot on 2022-03-10 — irc logs at libera.irclog.whitequark.org

2022-02-14 22:05 Tartarus changed the topic of #u-boot to: SOURCE MOVED TO https://source.denx.de/u-boot/u-boot.git / U-Boot v2022.01, v2022.04-rc2 are OUT / Merge Window is CLOSED / Release v2022.04 is scheduled for 4 April 2022 / http://www.denx.de/wiki/U-Boot / Channel archives at https://libera.irclog.whitequark.org/u-boot

00:00 cJ has joined #u-boot

00:18 GNUtoo has quit [Write error: Connection reset by peer]

00:18 prabhakarlad has quit [Quit: Ping timeout (120 seconds)]

00:19 swiftgeek has joined #u-boot

00:23 GNUtoo has joined #u-boot

00:53 prabhakarlad has joined #u-boot

01:49 camus has joined #u-boot

02:02 swiftgeek has quit [Remote host closed the connection]

02:10 swiftgeek has joined #u-boot

02:22 flyback has quit [Quit: Leaving]

02:22 swiftgeek has quit [Remote host closed the connection]

02:25 swiftgeek has joined #u-boot

02:28 flyback has joined #u-boot

02:31 thopiekar_ has joined #u-boot

02:31 thopiekar is now known as Guest9315

02:31 thopiekar_ is now known as thopiekar

02:31 Guest9315 has quit [Killed (osmium.libera.chat (Nickname regained by services))]

02:32 flyback has quit [Client Quit]

02:39 flyback has joined #u-boot

03:02 <Tartarus> sjg1: does the event test have some special requirements? it runs in CI but fails on bill-the-cat.

03:38 jclsn78 has joined #u-boot

03:41 jclsn7 has quit [Ping timeout: 252 seconds]

04:04 urja has quit [Ping timeout: 250 seconds]

04:18 urja has joined #u-boot

04:29 mmu_man has quit [Ping timeout: 252 seconds]

04:39 camus has quit [Read error: Connection reset by peer]

04:40 camus has joined #u-boot

05:33 vagrantc has quit [Quit: leaving]

05:59 mwalle has quit [Quit: WeeChat 3.0]

06:00 macromorgan_ has joined #u-boot

06:00 macromorgan is now known as Guest8877

06:00 Guest8877 has quit [Killed (silver.libera.chat (Nickname regained by services))]

06:00 macromorgan_ is now known as macromorgan

06:08 michalkotyla has joined #u-boot

06:09 sbach has quit [Read error: Connection reset by peer]

06:11 sbach has joined #u-boot

06:33 swiftgeek has quit [Ping timeout: 272 seconds]

06:42 stefanro has quit [Quit: Leaving.]

06:48 stefanro has joined #u-boot

07:02 stefanro has quit [Quit: Leaving.]

07:05 stefanro has joined #u-boot

07:16 guillaume_g has joined #u-boot

07:17 Rahix has quit [Quit: ZNC - https://znc.in]

07:23 Rahix has joined #u-boot

07:23 sszy has joined #u-boot

07:36 Guest3473 has joined #u-boot

07:37 mckoan|away is now known as mckoan

07:38 <Guest3473> Good Morning @ALL

07:39 lucaceresoli_ has joined #u-boot

07:39 <Guest3473> Is it possible to change the U-Boot env so that Android opens a shell after boot, when the only possible connection is over a USB-serial adapter? And if yes, how do I do this?

07:43 swiftgeek has joined #u-boot

07:48 Guest3473 has quit [Quit: Client closed]

07:48 Guest34 has joined #u-boot

07:50 frieder has joined #u-boot

07:56 ac_slater has quit [Quit: WeeChat 3.4]

08:00 milkylainen has joined #u-boot

08:01 zibolo has joined #u-boot

08:23 Harm has quit [Ping timeout: 240 seconds]

08:39 matthias_bgg has joined #u-boot

08:42 user_name has joined #u-boot

08:42 user_name has quit [Client Quit]

08:45 nacre has joined #u-boot

08:54 mwalle has joined #u-boot

09:00 cpackham[m] has quit [Quit: You have been kicked for being idle]

09:00 maxim[m] has quit [Quit: You have been kicked for being idle]

09:34 Thorn has quit [Ping timeout: 256 seconds]

09:37 Thorn has joined #u-boot

09:38 lucaceresoli__ has joined #u-boot

09:41 lucaceresoli_ has quit [Ping timeout: 256 seconds]

09:47 nacre has quit [Quit: leaving]

10:11 apritzel has joined #u-boot

10:12 mauro_anjo has joined #u-boot

10:24 mmu_man has joined #u-boot

10:28 lucaceresoli_ has joined #u-boot

10:31 lucaceresoli__ has quit [Ping timeout: 272 seconds]

11:54 tre has joined #u-boot

12:38 Guest34 has quit [Quit: Client closed]

12:43 tre has quit [Remote host closed the connection]

12:49 matthias_bgg has quit [Read error: Connection reset by peer]

12:50 matthias_bgg has joined #u-boot

12:56 sughosh has joined #u-boot

13:17 lucaceresoli__ has joined #u-boot

13:20 lucaceresoli_ has quit [Ping timeout: 272 seconds]

13:21 nacre has joined #u-boot

13:48 sughosh has quit [Ping timeout: 272 seconds]

13:56 Pali has joined #u-boot

13:57 <Pali> Hello! Is there any reason why aarch64 U-Boot masks System Error exceptions but does not mask Synchronous Abort exceptions?

14:00 <Pali> Masking of System Error exceptions in U-Boot is a big issue... if (buggy) U-Boot code causes a System Error, U-Boot does not see it and continues its operations as if nothing happened.

14:01 <Pali> And once U-Boot starts booting the kernel (which unmasks System Errors), the kernel is immediately killed by the pending System Error caused by U-Boot.

14:02 sughosh has joined #u-boot

14:02 <Pali> But this System Error is reported from kernel context... which makes it really hard to figure out that the bootloader, not the kernel, caused it.

14:02 <Pali> On the other hand, Synchronous Abort exceptions are not masked by U-Boot, and if buggy code triggers one, U-Boot runs its do_sync() handler, which resets the CPU.

14:16 <apritzel> Pali: that is already fixed in -next, I believe

14:17 <apritzel> Pali: https://lore.kernel.org/u-boot/20220302224113.GV5020@bill-the-cat/

14:18 <apritzel> Pali: and yeah, we ran into the exact same issue multiple times already ...

14:20 <Pali> apritzel: Perfect, I'm going to test those patches!

14:25 <Pali> apritzel: Patches are working fine! System Error is unmasked and u-boot's handler is called, thanks!

14:25 <apritzel> Pali: great, thanks for the test! They should end up in mainline in April, I guess

14:43 <Pali> apritzel: Btw, do you know if there is some way/hack to force an aarch64 core to report aborts caused by store instructions as Synchronous External Aborts?

14:45 <apritzel> if you mean to catch "bus errors" synchronously: I don't think so, at least not architecturally

14:45 <Pali> external aborts caused by loads are reported as synchronous (as the core has to wait until the data is ready), but aborts caused by stores are reported as asynchronous, as the abort itself is delivered later...

14:45 <Pali> I mean data aborts caused by AXI slaves

14:45 <apritzel> yes, that's a general problem, but hard to fix if you care about performance

14:46 <apritzel> you might get lucky on something simple like an A53

14:46 <sjg1> Tartarus: Not that I know of... what is the failure?

14:46 <Pali> apritzel: and if I did not care about performance on A53, would it be possible?

14:47 <Tartarus> sjg1: Lost it in my scrollback, but it just doesn't like the output from the test at all

14:47 <Pali> I have tried to find something in public ARM A53 doc, but I was not able to find anything on this topic

14:48 <sjg1> Tartarus: Well one of the tests looks in the 'u-boot' elf file to find the event-spy linker list. That was the issue I had with LTO (it was dropping some entries)

14:48 <apritzel> Pali: I meant architecturally we don't guarantee synchronous aborts, because that would seriously limit implementations

14:48 <Tartarus> Hmm

14:49 <sjg1> Tartarus: So if the entry is not present it will fail

14:49 <apritzel> Pali: in my experience on an A53, in single-core U-Boot you will get those SErrors quick enough, though

14:49 <sjg1> Tartarus: Like this:

14:49 <sjg1> https://www.irccloud.com/pastebin/C5JcKV83/

14:50 <apritzel> Pali: I don't know if there is something in some IMPDEF sysreg to wait for writes

14:52 marc2 has quit [Ping timeout: 268 seconds]

14:53 marc2 has joined #u-boot

14:54 <marex> is this yet again related to PCIe ?

14:54 <Pali> yes, I'm trying to find out how to handle these errors...

14:55 <Pali> They are common across ten different platforms, all of them aarch64

14:57 torez has joined #u-boot

14:57 <Tartarus> sjg1: No, https://pastebin.com/6cnmUBtC is the failure I see and "00000000006703a0 d _u_boot_list_2_evspy_info_2_EVT_MISC_INIT_F" is in /tmp/.bm-work/sandbox/u-boot

14:58 <Tartarus> The same toolchain as we have for CI is being used

14:58 <sjg1> sugosh: Re v4 of the TPM RNG series, yes I see it, hope to get to it on the weekend

15:05 <Pali> apritzel: I looked into this table https://developer.arm.com/documentation/ddi0500/e/system-control/aarch64-register-summary/aarch64-implementation-defined-registers but seems that there is no impdef register for it

15:06 <sjg1> Tartarus: Perhaps drop the ELF file somewhere so I can look? The '?' is supposed to be the function name, i.e. 'f:sandbox_misc_init_f'

15:06 <Tartarus> sjg1: You should still be able to pop over to bill-the-cat, if you have a moment :)

15:06 <Tartarus> otherwise I'll go poke something

15:07 <sughosh> sjg1: Thanks Simon. Btw, you made a comment on not needing malloc for reading the random bytes

15:08 <kettenis> Pali: the traditional way to handle this is to do the PCIe access in a critical section with the appropriate barrier before and after the access

15:08 <sughosh> but we do need memory to read the random bytes from the rng device. How does it work without allocating memory, whether on the heap or the stack?

15:08 <kettenis> if you detect an asynchronous fault while inside that critical section and the reported address matches, you can be sure the fault was caused by the PCIe access

15:09 <kettenis> of course this makes things slow

15:09 <Pali> kettenis: But this is not how PCIe drivers are written, neither in the kernel nor in U-Boot

15:10 <apritzel> Pali: well, you shouldn't encounter SErrors, normally

15:10 <apritzel> if so, it's a bug in the driver (or somewhere else)

15:10 <Pali> yes, but if PCIe IPs are broken, then SErrors are normal

15:10 <kettenis> I think the sparc64 implementation of pci_config_read/write uses this strategy

15:11 <apritzel> Pali: yes, but then your PCIe IP is broken, and you need some other workaround

15:11 nacre has quit [Quit: leaving]

15:11 <Pali> Lots of PCIe IPs incorrectly map PCIe CA or CRS responses to AXI SLVERR

15:11 <apritzel> Pali: is this about config space probes returning SErrors instead of 0xff?

15:12 <Pali> not only the response to config space requests, but the response to any PCIe request

15:12 mmu_man has quit [Ping timeout: 240 seconds]

15:12 <apritzel> Pali: yes, and this is a serious hardware issue, and should be fixed there

15:12 <sjg1> sughosh: yes, should not use malloc() willy nilly. Just use 'char buf[64]' or something like that

15:12 <apritzel> Pali: you cannot reliably catch SErrors and reason about them. Believe me, I tried, and ran this idea by quite a few people

15:13 <kettenis> Pali: the idea is that once you know the device is actually there and functioning you know which registers are implemented and those should never fault

15:13 <Pali> apritzel: so what to do with all the hw/SoCs that are affected by these PCIe IP implementation errors?

15:13 <apritzel> indeed

15:13 <apritzel> Pali: you respin ;-)

15:13 <Pali> kettenis: This is not true, they can fault again

15:13 <sughosh> sjg1: okay. will do that in v5. will wait for your comments on v4 before posting v5.

15:13 <apritzel> or abandon the platform

15:13 GNUtoo has quit [Quit: leaving]

15:14 <kettenis> Pali: what apritzel says ;)

15:14 <Pali> kettenis: at any time when the LTSSM changes to a non-L state

15:14 <apritzel> Pali: been there, done that: https://lore.kernel.org/lkml/20191209162645.GA7489@willie-the-truck/T/

15:15 <kettenis> then you need to disable LTSSM and declare hotplugging broken

15:15 <Pali> kettenis: this does not make sense, disabling LTSSM makes PCIe link to go down and you cannot access PCIe card.

15:16 <apritzel> Pali: in this particular case there is some halfway working workaround by catching the SErrors very early (by mgmt processor firmware), then build a table of allowed BDFs, and filter by that

15:16 <Pali> apritzel: this does not help when LTSSM changes state

15:16 <apritzel> which doesn't catch everything, for instance SR-IOV

15:16 <kettenis> what I mean is you have to make sure LTSSM doesn't change the state

15:17 <Pali> once it drops from an L* state to Configuration or Recovery, those SErrors are back

15:17 <kettenis> bring up the link and make sure it stays up

15:17 <apritzel> Pali: as I said: fix the hardware ;-)

15:17 <Pali> kettenis: "have to make sure LTSSM doesn't change the state" --> this is not possible by PCIe design

15:17 <Pali> other side of the link may and in some cases must change state

15:18 <kettenis> well, if you combine such hardware with PCIe IP like that, your system is broken

15:19 <kettenis> however, does mapping the relevant address space as nGnRnE make the faults synchronous?

15:19 redbrain has quit [Read error: Connection reset by peer]

15:19 GNUtoo has joined #u-boot

15:19 <apritzel> kettenis: not for writes, AFAIK

15:20 <Pali> it is already mapped as MT_DEVICE_NGNRNE

15:20 <apritzel> because the "nE" goes only so far into the interconnect

15:22 <apritzel> Pali: what CPU core is this? something ARMv8.2? You can hack something up with an "esb" instruction then to contain the SError, but that's neither reliable nor upstreamable

15:22 <Pali> In my case it is A53

15:22 <kettenis> should work for config space access, but I guess mmio access is inherently posted

15:23 <Pali> But this error is general, which I see on lot of different platforms

15:24 <Pali> And people are periodically reporting these issues with different PCIe IPs

15:25 redbrain has joined #u-boot

15:25 <Pali> PCIe MEM write commands are posted

15:25 <Pali> Writes which are not posted are only IO and config

15:25 <kettenis> yes

15:26 <kettenis> I forgot about that

15:26 <Pali> Nature of posted/non-posted is on the PCIe bus.

15:26 <Pali> But those AXI errors are reported by the PCIe controller before the commands are sent from the controller to the bus

15:28 <Pali> So something like armv7 "strongly-ordered" memory mapping could help with this issue...

15:28 <apritzel> I think the issue is that you should never signal an SError to the CPU side unless it's a fatal problem

15:28 <Pali> Yes, this is the issue -- the bug in PCIe IP

15:29 <apritzel> and a good part of the problem is that most so called root complexes are tweaked end points

15:29 <Pali> In my opinion, PCIe IP designers misunderstood what those PCIe errors mean and that they cannot be mapped to SLVERR

15:29 <apritzel> exactly

15:29 <apritzel> the PCIe term #SErr does not help here ;-)

15:30 <Pali> But because this is not an issue of just one PCIe IP, but of at least 4 or 5 different ones from different companies, it is really a big problem.

15:30 <sjg1> Tartarus: addr2line: DWARF error: can't find .debug_ranges section.

15:30 <sjg1> Tartarus: The image seems to be missing the line-number info in the debug tables

15:32 <apritzel> Pali: at least for config space accesses there is an SMCCC firmware standard to be able to deploy workarounds in firmware: https://developer.arm.com/documentation/den0115/latest

15:32 <kettenis> for config space access you can easily implement a workaround in the host bridge driver

15:33 <apritzel> (with the major caveat that this SMC is not implemented by Linux)

15:33 <Pali> I know, but this does not help for dynamic reconfiguration of config space memory mapping, plus it was rejected by linux-pci people

15:33 <kettenis> you can't do that in linux and u-boot for mmio and io access because drivers use readl/writel directly

15:34 <apritzel> and rightly so

15:34 torez has quit [Ping timeout: 260 seconds]

15:35 torez has joined #u-boot

15:35 <Pali> (btw, in my case, I can access config space via a different PIO method which does not cause SErrors, but this does not help MMIO access, nor does it help other platforms)

15:40 sughosh has quit [Read error: Connection reset by peer]

15:43 <apritzel> Pali: so why do those other SErrors happen during normal operation? Is that because the PCIe endpoint is something special?

15:44 <Pali> it happens any time you issue a request and the endpoint is not in an L* state

15:44 <Pali> if the endpoint is in the L1 or L2 state, it first has to switch to L0 before it can accept requests... but changing from L2 to L0 goes via the Configuration or Recovery state

15:45 <Pali> and if you do readl()/writel() while it is in the Recovery state, you get an SError

15:46 <kettenis> under what circumstances does the device decide to switch out of the L0 state?

15:46 <Pali> for example for power saving

15:46 <Pali> and also when card is buggy

15:46 <kettenis> well, you disable that

15:46 <Pali> and does its internal reset

15:47 <Pali> wifi cards are known to be buggy, and their firmware often crashes and the card itself "reboots"

15:47 <kettenis> more broken hardware

15:47 <Pali> during card reboot PCIe link is down

15:48 <Pali> Or it happens if kernel explicitly want to reset card (either via in-band method, e.g. FLR or Hot-Reset, or out-of-band e.g. Warm-Reset)

15:48 <Pali> Or also when the kernel explicitly re-issues Link Retraining

15:48 <kettenis> well, if the kernel initiates the reset, it should know not to access the device until it is back up

15:49 <Pali> "until it is back up" --> but this check is done by reading config space

15:49 <Pali> PCIe mandates that the card should return a CRS response

15:49 <Pali> and we are back at the beginning: broken PCIe IPs map CRS to SError

15:49 <kettenis> but you can work around that aspect

15:51 <Pali> ok, if config space is somehow worked around... there are still those problems with MMIO

15:53 <kettenis> send your pcie device back to the vendor and ask for a refund

15:55 <kettenis> between the linux pcie maintainers being somewhat unreasonable, your pcie host bridge IP being broken and your pcie devices being broken, you'll have to come up with a compromise that makes most hardware work

15:57 <Pali> Show me one pcie wifi card which is not broken... I have tested lot of them and I have not found any non-broken.

15:57 <Pali> Just kidding...

15:58 <Pali> But the situation is really bad, as there is no working hw... and I'm trying to find something useful in general, not just for one PCIe controller.

16:07 zibolo has quit [Ping timeout: 256 seconds]

16:25 frieder has quit [Remote host closed the connection]

16:30 adeepv has quit [Quit: %exit%]

17:09 apritzel has quit [Ping timeout: 252 seconds]

17:09 vagrantc has joined #u-boot

17:22 sszy has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]

17:22 ___nick___ has joined #u-boot

17:23 mckoan is now known as mckoan|away

17:33 vagrantc has quit [Ping timeout: 240 seconds]

17:36 ___nick___ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

17:38 ___nick___ has joined #u-boot

17:40 vagrantc has joined #u-boot

17:49 baltazar has quit [Ping timeout: 256 seconds]

17:50 baltazar has joined #u-boot

18:00 matthias_bgg has quit [Ping timeout: 272 seconds]

18:50 sobkas has joined #u-boot

19:09 <Tartarus> Sigh, https://source.denx.de/u-boot/u-boot/-/jobs/402349 means the check has been broken for a while

19:09 <Tartarus> I've figured out how to fix the check, but I need to re-migrate a ton of stuff again first

19:09 * Tartarus hunts for brown paper bags

19:38 torez has quit [Ping timeout: 240 seconds]

20:11 <marex> kettenis: pcie maintainers being unreasonable ? :)

20:13 <kettenis> somewhat unreasonable ;)

20:16 <marex> kettenis: thank you, this made me laugh

20:39 mauro_anjo has quit [Ping timeout: 252 seconds]

20:40 mthall has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

20:41 mthall has joined #u-boot

20:46 mthall has quit [Client Quit]

20:46 mthall has joined #u-boot

20:59 grgy has quit [Ping timeout: 240 seconds]

21:04 ___nick___ has quit [Ping timeout: 250 seconds]

21:29 mmu_man has joined #u-boot

21:32 <sjg1> Tartarus: Ah I didn't know about that check. Should we move it to the build?

21:43 grgy has joined #u-boot

21:44 lucaceresoli__ has quit [Quit: Leaving]

21:47 grgy has quit [Client Quit]

21:47 grgy has joined #u-boot

22:02 prabhakarlad has quit [Quit: Client closed]

22:07 sobkas has quit [Quit: sobkas]

22:14 <Tartarus> sjg1: I'm fixing the test so that it works correctly, and a small series of re-migrating stuff again

22:15 <Tartarus> But no, it's not worth making part of every build

22:30 <sjg1> Tartarus: OK

23:02 Pali has left #u-boot [#u-boot]

23:37 prabhakarlad has joined #u-boot

23:51 heijligen has quit [Quit: WeeChat 3.2]