<Guest3473>
Is it possible to change the U-Boot env so that Android opens a shell after boot, when the only possible connection is over a USB-serial adapter? And if yes, how do I do this?
swiftgeek has joined #u-boot
Guest3473 has quit [Quit: Client closed]
Guest34 has joined #u-boot
frieder has joined #u-boot
ac_slater has quit [Quit: WeeChat 3.4]
milkylainen has joined #u-boot
zibolo has joined #u-boot
Harm has quit [Ping timeout: 240 seconds]
matthias_bgg has joined #u-boot
user_name has joined #u-boot
user_name has quit [Client Quit]
nacre has joined #u-boot
mwalle has joined #u-boot
cpackham[m] has quit [Quit: You have been kicked for being idle]
maxim[m] has quit [Quit: You have been kicked for being idle]
Thorn has quit [Ping timeout: 256 seconds]
Thorn has joined #u-boot
lucaceresoli__ has joined #u-boot
lucaceresoli_ has quit [Ping timeout: 256 seconds]
nacre has quit [Quit: leaving]
apritzel has joined #u-boot
mauro_anjo has joined #u-boot
mmu_man has joined #u-boot
lucaceresoli_ has joined #u-boot
lucaceresoli__ has quit [Ping timeout: 272 seconds]
tre has joined #u-boot
Guest34 has quit [Quit: Client closed]
tre has quit [Remote host closed the connection]
matthias_bgg has quit [Read error: Connection reset by peer]
matthias_bgg has joined #u-boot
sughosh has joined #u-boot
lucaceresoli__ has joined #u-boot
lucaceresoli_ has quit [Ping timeout: 272 seconds]
nacre has joined #u-boot
sughosh has quit [Ping timeout: 272 seconds]
Pali has joined #u-boot
<Pali>
Hello! Is there any reason why aarch64 U-Boot masks System Error exceptions but does not mask Synchronous Abort exceptions?
<Pali>
Masking of System Error exceptions in U-Boot is a big issue... if (buggy) U-Boot code causes a System Error then U-Boot does not see it and continues its operations as if nothing happened.
<Pali>
And once U-Boot starts booting the kernel (which unmasks System Error) then the kernel is immediately killed by the pending System Error caused by U-Boot.
sughosh has joined #u-boot
<Pali>
But this System Error is reported from kernel context... which makes it really hard to debug, because it was not the kernel that caused it but rather the bootloader.
<Pali>
On the other hand Synchronous Abort exceptions are not masked by U-Boot, and if buggy code triggers one then U-Boot runs its do_sync() handler which resets the CPU.
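The masking described above comes down to the A bit of the AArch64 DAIF register: while it is set, a pending SError is not delivered, whereas synchronous aborts have no mask bit. The following is a minimal sketch of that bit layout in plain C so it can be read off-target. The helper names are hypothetical and are not U-Boot functions; on real hardware the A bit would be cleared with `msr daifclr, #4`.

```c
#include <stdint.h>

/* Mask bits as laid out in the AArch64 DAIF system register. */
#define DAIF_D (1u << 9)  /* debug exceptions */
#define DAIF_A (1u << 8)  /* SError (asynchronous abort) */
#define DAIF_I (1u << 7)  /* IRQ */
#define DAIF_F (1u << 6)  /* FIQ */

/* Hypothetical helper: return the DAIF value with SError unmasked. */
static inline uint32_t daif_unmask_serror(uint32_t daif)
{
    return daif & ~DAIF_A;
}

/* Hypothetical helper: report whether SError delivery is masked. */
static inline int daif_serror_masked(uint32_t daif)
{
    return (daif & DAIF_A) != 0;
}
```

With all four bits set (0x3c0), clearing A leaves 0x2c0: SError is delivered to the current exception level while debug, IRQ and FIQ stay masked.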
<apritzel>
Pali: that is already fixed in -next, I believe
<apritzel>
Pali: and yeah, we ran into the exact same issue multiple times already ...
<Pali>
apritzel: Perfect, I'm going to test those patches!
<Pali>
apritzel: Patches are working fine! System Error is unmasked and u-boot's handler is called, thanks!
<apritzel>
Pali: great, thanks for the test! They should end up in mainline in April, I guess
<Pali>
apritzel: Btw, do you know if there is some way/hack to force an aarch64 core to report aborts caused by store instructions as Synchronous External aborts?
<apritzel>
if you mean to catch "bus errors" synchronously: I don't think so, at least not architecturally
<Pali>
external aborts caused by loads are reported as synchronous (as the core has to wait until the data is ready) but aborts caused by stores are reported as asynchronous, as the abort itself is delivered later...
<Pali>
I mean data aborts caused by AXI slaves
<apritzel>
yes, that's a general problem, but hard to fix if you care about performance
<apritzel>
you might get lucky on something simple like an A53
<sjg1>
Tartarus: Not that I know of... what is the failure?
<Pali>
apritzel: and if I did not care about performance on A53, would it be possible?
<Tartarus>
sjg1: Lost it in my scrollback, but it just doesn't like the output from the test at all
<Pali>
I have tried to find something in public ARM A53 doc, but I was not able to find anything on this topic
<sjg1>
Tartarus: Well one of the tests looks in the 'u-boot' elf file to find the event-spy linker list. That was the issue I had with LTO (it was dropping some entries)
<apritzel>
Pali: I meant architecturally we don't guarantee synchronous aborts, because that would seriously limit implementations
<Tartarus>
Hmm
<sjg1>
Tartarus: So if the entry is not present it will fail
<apritzel>
Pali: in my experience on an A53, in single-core U-Boot you will get those SErrors quick enough, though
<apritzel>
Pali: I don't know if there is something in some IMPDEF sysreg to wait for writes
marc2 has quit [Ping timeout: 268 seconds]
marc2 has joined #u-boot
<marex>
is this yet again related to PCIe ?
<Pali>
yes, I'm trying to find out how to handle these errors...
<Pali>
They are common across ten different platforms and are aarch64
torez has joined #u-boot
<Tartarus>
sjg1: No, https://pastebin.com/6cnmUBtC is the failure I see and "00000000006703a0 d _u_boot_list_2_evspy_info_2_EVT_MISC_INIT_F" is in /tmp/.bm-work/sandbox/u-boot
<Tartarus>
The same toolchain as we have for CI being used
<sjg1>
sughosh: Re v4 of the TPM RNG series, yes I see it, hope to get to it on the weekend
<sjg1>
Tartarus: Perhaps drop the ELF file somewhere so I can look? The '?' is supposed to be the function name, i.e. 'f:sandbox_misc_init_f'
<Tartarus>
sjg1: You should still be able to pop over to bill-the-cat, if you have a moment :)
<Tartarus>
otherwise I'll go poke something
<sughosh>
sjg1: Thanks Simon. Btw, you made a comment on not needing malloc for reading the random bytes
<kettenis>
Pali: the traditional way to handle this is to do the PCIe access in a critical section with the appropriate barrier before and after the access
<sughosh>
but we do need memory to read the random bytes from the rng device. how does it work without allocating memory, whether on heap or stack?
<kettenis>
if you detect an asynchronous fault while inside that critical section and the reported address matches, you can be sure the fault was caused by the PCIe access
<kettenis>
of course this makes things slow
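The critical-section strategy described above can be sketched as a small bookkeeping model. This is an illustration only, under the assumption that the asynchronous fault handler can see a faulting address (which is not guaranteed on every platform). All names are hypothetical, and the barrier comments mark where a real implementation would place something like `dsb sy`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state shared by the accessor and the fault handler. */
struct guarded_access {
    bool      active;   /* currently inside the critical section */
    uintptr_t addr;     /* address being accessed */
    bool      faulted;  /* set by the handler when the address matches */
};

static struct guarded_access guard;

/* Enter the critical section; a barrier would go here on hardware. */
static void guard_begin(uintptr_t addr)
{
    guard.active = true;
    guard.addr = addr;
    guard.faulted = false;
}

/* Called from the (hypothetical) asynchronous fault handler.
 * Returns true when the fault is attributed to the guarded access. */
static bool guard_claim_fault(uintptr_t fault_addr)
{
    if (guard.active && guard.addr == fault_addr) {
        guard.faulted = true;
        return true;
    }
    return false;
}

/* Leave the critical section (after a barrier on hardware) and
 * report whether the guarded access faulted. */
static bool guard_end(void)
{
    guard.active = false;
    return guard.faulted;
}
```

A fault that arrives outside the section, or with a different address, is not claimed and would still have to be treated as fatal.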
<Pali>
kettenis: But this is not how PCIe drivers are written, neither in kernel nor in u-boot
<apritzel>
Pali: well, you shouldn't encounter SErrors, normally
<apritzel>
if so, it's a bug in the driver (or somewhere else)
<Pali>
yes, but if PCIe IPs are broken, then SErrors are normal
<kettenis>
I think the sparc64 implementation of pci_config_read/write uses this strategy
<apritzel>
Pali: yes, but then your PCIe IP is broken, and you need some other workaround
nacre has quit [Quit: leaving]
<Pali>
A lot of PCIe IPs incorrectly map PCIe CA or CRS responses to AXI SLVERR
<apritzel>
Pali: is this about config space probes returning SErrors instead of 0xff?
<Pali>
not only to config space requests, but in response to any PCIe request
mmu_man has quit [Ping timeout: 240 seconds]
<apritzel>
Pali: yes, and this is a serious hardware issue, and should be fixed there
<sjg1>
sughosh: yes, should not use malloc() willy nilly. Just use 'char buf[64]' or something like that
<apritzel>
Pali: you cannot reliably catch SErrors and reason about them. Believe me, I tried, and ran this idea by quite some people
<kettenis>
Pali: the idea is that once you know the device is actually there and functioning you know which registers are implemented and those should never fault
<Pali>
apritzel: so what to do with the lot of hw/SoCs which are affected by these PCIe IP implementation errors?
<apritzel>
indeed
<apritzel>
Pali: you respin ;-)
<Pali>
kettenis: This is not true, they can cause a fault again
<sughosh>
sjg1: okay. will do that in v5. will wait for your comments on v4 before posting v5.
<apritzel>
or abandon the platform
GNUtoo has quit [Quit: leaving]
<kettenis>
Pali: what apritzel says ;)
<Pali>
kettenis: at any time when LTSSM changes state to non-L
<kettenis>
then you need to disable LTSSM and declare hotplugging broken
<Pali>
kettenis: this does not make sense, disabling LTSSM makes the PCIe link go down and you cannot access the PCIe card.
<apritzel>
Pali: in this particular case there is some halfway working workaround by catching the SErrors very early (by mgmt processor firmware), then build a table of allowed BDFs, and filter by that
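The allowed-BDF table mentioned above can be sketched as a simple allow-list lookup. The encoding below is the usual bus/device/function packing (8, 5 and 3 bits); the macro and function names are hypothetical, and how the table gets populated by firmware is outside this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Pack bus (8 bits), device (5 bits) and function (3 bits) into 16 bits. */
#define PCI_BDF(bus, dev, fn) \
    ((uint16_t)((((bus) & 0xffu) << 8) | (((dev) & 0x1fu) << 3) | ((fn) & 0x7u)))

/* Hypothetical filter: only BDFs recorded in the table during early
 * probing may be accessed later; everything else is rejected up front. */
static bool bdf_allowed(const uint16_t *table, size_t count, uint16_t bdf)
{
    for (size_t i = 0; i < count; i++) {
        if (table[i] == bdf)
            return true;
    }
    return false;
}
```

A config accessor built on this would return all ones for a rejected BDF instead of touching the hardware. Functions that appear only after the table was built are rejected too.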
<Pali>
apritzel: this does not help when LTSSM changes state
<apritzel>
which doesn't catch everything, for instance SR-IOV
<kettenis>
what I mean is you have to make sure LTSSM doesn't change the state
<Pali>
once it drops to config or recovery from an L* state those SErrors are back
<kettenis>
bring up the link and make sure it stays up
<apritzel>
Pali: as I said: fix the hardware ;-)
<Pali>
kettenis: "have to make sure LTSSM doesn't change the state" --> this is not possible by PCIe design
<Pali>
other side of the link may and in some cases must change state
<kettenis>
well, if you combine such hardware with PCIe IP like that, your system is broken
<kettenis>
however, does mapping the relevant address space as nGnRnE make the faults synchronous?
redbrain has quit [Read error: Connection reset by peer]
GNUtoo has joined #u-boot
<apritzel>
kettenis: not for writes, AFAIK
<Pali>
it is already mapped as MT_DEVICE_NGNRNE
<apritzel>
because the "nE" goes only so far into the interconnect
<apritzel>
Pali: what CPU core is this? something ARMv8.2? You can hack something up with an "esb" instruction then to contain the SError, but that's not reliable nor upstreamable
<Pali>
In my case it is A53
<kettenis>
should work for config space access, but I guess mmio access is inherently posted
<Pali>
But this error is general, and I see it on a lot of different platforms
<Pali>
And people are periodically reporting these issues with different PCIe IPs
redbrain has joined #u-boot
<Pali>
PCIe MEM write commands are posted
<Pali>
Writes which are not posted are only IO and config
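The posted/non-posted split stated above can be written down as a small classifier. This is a sketch with hypothetical enum and function names. Message requests are not mentioned in the discussion; they are included here because they are posted as well.

```c
#include <stdbool.h>

/* Hypothetical enumeration of PCIe request types. */
enum pcie_request {
    PCIE_MEM_READ,
    PCIE_MEM_WRITE,
    PCIE_IO_READ,
    PCIE_IO_WRITE,
    PCIE_CFG_READ,
    PCIE_CFG_WRITE,
    PCIE_MESSAGE,
};

/* Posted requests get no completion, so the requester never receives
 * a per-request status back over the link. */
static bool pcie_request_is_posted(enum pcie_request req)
{
    return req == PCIE_MEM_WRITE || req == PCIE_MESSAGE;
}
```

Only the non-posted types (all reads, plus I/O and config writes) return a completion whose status a controller could report back to the CPU for that specific access.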
<kettenis>
yes
<kettenis>
I forgot about that
<Pali>
Nature of posted/non-posted is on the PCIe bus.
<Pali>
But those AXI errors are reported by the PCIe controller before the commands are sent from the controller to the bus
<Pali>
So something like armv7 "strongly-ordered" memory mapping could help with this issue...
<apritzel>
I think the issue is that you should never signal an SError to the CPU side unless it's a fatal problem
<Pali>
Yes, this is the issue -- the bug in PCIe IP
<apritzel>
and a good part of the problem is that most so called root complexes are tweaked end points
<Pali>
In my opinion PCIe IP designers misunderstood what those PCIe errors mean and that they cannot be mapped to SLVERR
<apritzel>
exactly
<apritzel>
the PCIe term #SErr does not help here ;-)
<Pali>
But because this is not an issue of just one PCIe IP, but of at least 4 or 5 different ones from different companies, it is really a big problem.
<kettenis>
for config space access you can easily implement a workaround in the host bridge driver
<apritzel>
(with the major caveat that this SMC is not implemented by Linux)
<Pali>
I know, but this does not help for dynamic reconfiguration of config space memory mapping, plus it was rejected by linux-pci people
<kettenis>
you can't do that in linux and u-boot for mmio and io access because drivers use readl/writel directly
<apritzel>
and rightly so
torez has quit [Ping timeout: 260 seconds]
torez has joined #u-boot
<Pali>
(btw, in my case, I can access config space via a different PIO method which does not cause SErrors, but this does not help MMIO access, nor does it help other platforms)
sughosh has quit [Read error: Connection reset by peer]
<apritzel>
Pali: so why do those other SErrors happen during normal operation? Is that because the PCIe endpoint is something special?
<Pali>
it happens any time you issue a request and the endpoint is not in an L* state
<Pali>
if endpoint is in L1 or L2 state it has to first switch to L0 before it can accept requests... but changing from L2 to L0 is via config or recovery state
<Pali>
and if you do readl()/writel() at the time when it is in the recovery state, you get an SError
<kettenis>
under what circumstances does the device decide to switch out of the L0 state?
<Pali>
for example for power saving
<Pali>
and also when card is buggy
<kettenis>
well, you disable that
<Pali>
and does its internal reset
<Pali>
wifi cards are known to be buggy; their firmware crashes a lot of the time and the card itself "reboots"
<kettenis>
more broken hardware
<Pali>
during card reboot PCIe link is down
<Pali>
Or it happens if the kernel explicitly wants to reset the card (either via an in-band method, e.g. FLR or Hot-Reset, or out-of-band, e.g. Warm-Reset)
<Pali>
Or also when the kernel explicitly re-issues Link Retraining
<kettenis>
well, if the kernel initiates the reset, it should know not to access the device until it is back up
<Pali>
"until it is back up" --> but this check is done by reading config space
<Pali>
PCIe mandates that the card should return a CRS response
<Pali>
and we are back at the beginning: broken PCIe IPs map CRS to SError
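The "is it back up" check discussed above can be sketched as a polling loop on the Vendor ID. This assumes CRS Software Visibility is enabled, in which case a well-behaved root complex returns 0x0001 as the Vendor ID while the device answers with CRS. The simulated endpoint and all names are hypothetical, and the delay between polls that a real loop needs is left out.

```c
#include <stdint.h>

/* Vendor ID value reported while the device still answers with CRS
 * (requires CRS Software Visibility in the root complex). */
#define PCI_VENDOR_ID_CRS 0x0001u

/* Hypothetical simulated endpoint: answers with CRS a few times,
 * then returns its real Vendor ID. */
struct sim_endpoint {
    int      crs_left;
    uint16_t vendor_id;
};

static uint16_t sim_read_vendor_id(struct sim_endpoint *ep)
{
    if (ep->crs_left > 0) {
        ep->crs_left--;
        return PCI_VENDOR_ID_CRS;
    }
    return ep->vendor_id;
}

/* Poll until the endpoint stops reporting CRS. Returns the number of
 * polls used, or -1 if it was still not ready after max_polls. */
static int wait_until_ready(struct sim_endpoint *ep, int max_polls)
{
    for (int i = 1; i <= max_polls; i++) {
        if (sim_read_vendor_id(ep) != PCI_VENDOR_ID_CRS)
            return i;
    }
    return -1;
}
```

On the broken controllers described here the CRS completion never reaches software as 0x0001 and raises an SError instead, so this loop only works where the controller handles CRS correctly.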
<kettenis>
but you can work around that aspect
<Pali>
ok, if config space is somehow worked around... there are still those problems with MMIO
<kettenis>
send your pcie device back to the vendor and ask for a refund
<kettenis>
between the linux pcie maintainers being somewhat unreasonable, your pcie host bridge IP being broken and your pcie devices being broken you'll have to come up with a compromise that makes most hardware work
<Pali>
Show me one pcie wifi card which is not broken... I have tested a lot of them and I have not found any non-broken one.
<Pali>
Just kidding...
<Pali>
But the situation is really bad as there is no working hw... and I'm trying to find something useful in general, not just for one PCIe controller.
zibolo has quit [Ping timeout: 256 seconds]
frieder has quit [Remote host closed the connection]