ChanServ changed the topic of #armlinux to: ARM kernel talk [Upstream kernel, find your vendor forums for questions about their kernels] | https://libera.irclog.whitequark.org/armlinux
djrscally has quit [Ping timeout: 256 seconds]
System_Error has joined #armlinux
Misotauros has quit [Ping timeout: 272 seconds]
System_Error has quit [Remote host closed the connection]
<arnd>
preview from my set_fs() cleanup series, which I want to post later today. arm64 already had removed set_fs a while ago, but the access_ok() check still seemed overly complex
<geertu>
arnd: Nice, 28 KiB saved.
<arnd>
geertu: it's out of 22MB .text for the defconfig, but I'm sure some of the removed code is is fast path syscalls
<arnd>
I guess is it's still 0.12% of the total .text, which feels like a lot for this kind of change
<arnd>
I still need to go through all architectures once more to fix the way they handle the last addressable byte. On some architectures, TASK_SIZE is inclusive (0xbfffffff), on others it's exclusive (0xc0000000) when looking at the same value
Pali has joined #armlinux
frieder has joined #armlinux
alexels has joined #armlinux
nsaenz has joined #armlinux
macromorgan has quit [Read error: Connection reset by peer]
System_Error has joined #armlinux
headless has joined #armlinux
frieder has quit [Ping timeout: 240 seconds]
monstr has joined #armlinux
<milkylainen>
Hrrm. This is x86, but getting a lot more hpet clocksource weirdness after upgrading to glibc 2.35 (No idea why. RSEQ stuff?). Kernel 5.16+ in both host and guest, x86_64. Can't really make sense of it either. Both hpet and tsc wd is complaining? But then in the end kernel spews even more strangeness but decides to switch to hpet?
<arnd>
mrutland: should I leave my patch out then or just use my version? The generic __access_ok() now does 'return (size < limit) && (addr <= (limit - size))'.
<mrutland>
arnd: I just saw that, and tbh, yours looks nicer source code wise!
<mrutland>
arnd: Is this on the list already?
<mrutland>
I'm happy to go check the codegen / test
<arnd>
no, I got distracted and still need to go through most architectures to make sure that 'limit' consistently works for the last addressable byte
<mrutland>
Ok
<mrutland>
No need to drop, just a heads-up that I'm interested, and happy to test, etc :)
<arnd>
there are probably other cases where the difference is bigger because the old version required spilling some local variables to the stack to meet the asm constraints
<mrutland>
mhmm
<arnd>
mrutland: your version has a special case for size==0, and I saw other architectures have that as well, but I did not include it in the generic version because most architectures don't have it, and I could not find a case where returning success for size=0 on an invalid pointer is actually correct
<arnd>
do you know where this comes from?
<arnd>
ah, your size==0 check is different, it still returns false for an invalid pointer
jlinton has quit [Quit: Client closed]
<mrutland>
arnd: IIRC in mine that was just bodging around the compiler to not generate redundant code in that case
<mrutland>
It shoukd be possible to delete that with no functional change
<arnd>
ok
<mrutland>
I should have commented that, becuase now I'm confused/unsure
<arnd>
or maybe it came from an earlier version where size was signed?
<mrutland>
AAAh
<mrutland>
That was from an earlier version when we still had set_fs(), and was used to ensure we wouldn't wrap the limit for KERNEL_DS
<arnd>
ok, in that case the version I have is probably best. I should try to find out who first wrote it like this and give proper credit
<mrutland>
Sounds good!
<ardb>
arnd: doesn't the uint128 arithmetic result in better codegen on arches that support it?
<mrutland>
It does for the non-constant size cases
<ardb>
i'd assume so
<mrutland>
... but I can't recall by exactly how much, and it looks like I didn't commit my test file with a bunch of cases
<mrutland>
It might also be that we can use __builtin_uaddl_overflow() to achieve the same result now that the limit is constant and (for most arches) less than ~0UL
<arnd>
ardb: I've seen a number of cases where double-word arithmetic (u64 on 32-bit, u128 on 64-bit architectures) results in horrible codegen from compilers when you'd expect them to do something decent. My feeling is that across architectures, my version would produce better code on average even for non-constant length, but I have not tried that
<mrutland>
Maybe it's worth splitting out the generic access_ok bit, and sending as an RFC?
<mrutland>
Otherwise, I can go rebuild my test harness and try a few approaches for comparison
<mrutland>
Ah! my test harness was in userspace
<mrutland>
Lemme clean that up a bit, and I can share
<mrutland>
(all at -O2 with GCC 11.1.0 from the kernel.org crosstool pages)
<arnd>
mrutland: I just wrote the same thing before I saw yours, but my testcase shows the u128 version being worse for constant size: https://www.godbolt.org/z/3qv3bTM3Y maybe I did something wrong there though
<arnd>
that is worse with gcc, both are good with clang regardless of the architecture
<mrutland>
arnd: in mine I have two versions of the uint128 form, one for a constant-size
<arnd>
ah, of course
<mrutland>
arnd: the gist is constant-size works better as yours, non-constant size is *slightly* better using uint128
<arnd>
slightly better because you have one correctly predicted branch instead of two, or something else?
<mrutland>
arnd: I was just going by number of insns, and recalling thinking this was bette in the past; I could be wrong
<mrutland>
I also didn't use __bultin_expect, so maybe with that the difference disappears
prabhakarlad has joined #armlinux
<mrutland>
Hmm... that seems to stem from `size < limit` rather than `size <= limit` on the LHS
<mrutland>
Shouldn't it be the latter in case the base is 0 ?
<mrutland>
for arm64 that also avoids generating TASK_SIZE_MAX and TASK_SIZE_MAX-1 separately into GPRs
<mrutland>
... in the non-constant size case
prabhakarlad has quit [Quit: Client closed]
headless has quit [Quit: Konversation terminated!]
matthias_bgg has quit [Read error: Connection reset by peer]
matthias_bgg has joined #armlinux
* arnd
is back from lunch
<arnd>
mrutland, ardb: it looks like the limit version needs fewer registers, which may be important on some architectures, especially inbetween inline asm
<arnd>
I'm still looking through what the limits should be, most architectures I've seen so var use the exclusive number for TASK_SIZE, but notably mips uses one byte lower and x86 seems to use one page lower
<arnd>
the worst case I've run into now is coldfire, which fails to check the limit completely
<mrutland>
oh no
<mrutland>
Was "the limit version" the one in your patches? If it's using fewer regs I agree that's better!
<arnd>
yes
<arnd>
classic m68k has separate address spaces, so they don't need to check, coldfire started out as nommu and didn't need to check, but later coldfire have a new MMU that does need it.
<arnd>
having the same comparison does produce better code, but I have to think about it carefully to ensure I'm not opening a new corner case that way. the version I used is the same that a number of architectures already had
<mrutland>
I think the mismatched form is a fencepost bug
<mrutland>
If you consider size==limit, addr==0, the mismatched check rejects it, the matched form accepts it
<mrutland>
I think that's jsut a latent bug that people get away with because no-one ever does that specifically
<mrutland>
... but logically you should be able to access every user byte below the limit
<mrutland>
which at addr 0 means accepting up to size==limit
Misotauros has quit [Ping timeout: 272 seconds]
<mrutland>
Definitely worth having tests for though
<mrutland>
another way of thinking about it is that if limit is exclusive the LHS rejects a byte it shouldn't, and if the limit is inclusive the RHC accepts a byte it shouldn't
<arnd>
I see the matched comparisons on riscv, openrisc and nds32, but the mismatched form only on powerpc. I probably just picked the wrong one to copy from ;-)
<mrutland>
Heh
<mrutland>
I've also confused myself here, so I'll go write sound boundary tests
prabhakarlad has joined #armlinux
<arnd>
ah, the powerpc version also tested addr<TASK_SIZE_MAX, rather than testing size, so it's not the same anyway, and it's fairly recent as well, hch wrote it in 2020
<arnd>
the version before that did "(((addr) <= (segment).seg) && ((size) <= (segment).seg))", with a comment explaining that segment.seg is small enough on ppc64 that this check is sufficient to stay out of kernel space
<mrutland>
Ah; so `addr` should *ways* be below TASK_SIZE_MAX, it's just that addr+size can be equal to it, and so size can be equal to it
<mrutland>
s/ways/always/
<mrutland>
... and thinking some more, a size must be an exclusive limit, so we should be using <=
<mrutland>
(sorry if this seems like i'm going roung in circles)
<mrutland>
SO if people are using size as inclusive, I think they're doing something wrong
<arnd>
those are the possible values for TASK_SIZE_MAX (ignoring some that are similar)
<arnd>
so these are always exclusive, or they are 0xffffffff on 32-bit nommu architectures
<arnd>
in theory we should make a special version for nommu that allows writing the last byte of memory but disallows wrapping around to the first byte of the addres space, but I don't think anyone cares
<mrutland>
It might be worth just having a NOMMU_TASK_SIZE_MAX with a comment to that effect, but otherwise agreed!
<mrutland>
(I mean, to centralize the definition for all the NOMMU arches)
<mrutland>
It sounds like the common logic should treat that as exclusive anyhow
torez has joined #armlinux
<ardb>
i don't understand how a size could be interpreted in two different ways
<ardb>
with the exception of using ULONG_MAX as ULONG_MAX + 1 does not fit in a ulong
<arnd>
ardb: I assume mrutland meant the limit, i.e. TASK_SIZE, which could be interpreted as the highest valid address, or the lowest invalid address
<arnd>
the size argument to access_ok() should be unambiguous though
<arnd>
one corner case that implementations definitely disagree on is addr=TASK_SIZE, size=0
<arnd>
mips explicitly makes that invalid, while x86 and arm64 accept it as a side-effect of the implementation
<mrutland>
TASK_SIZE, since it's the _size_, must be an exclusive limit. e.g. like malloc of size 8 giving addrs 0,1,2,4,5,6,7, but not 8
shailangsa has joined #armlinux
<mrutland>
THe problem is that a size of the entire address space cannot fir in unsigned long
<ardb>
mrutland: exactly
<ardb>
arnd: so what does size=0 mean? if it really means 'access of size 0', does it really matter?
<mrutland>
I think those who have it as inclusive have a bug, as earlier, but they're kinda forced into that
<arnd>
right. I assumed that there were others that got it wrong, but only the TASK_SIZE=ULONG_MAX ones seem to be left now
<mrutland>
size=0 doesn't matter, but e.g. if base=1, size=0xffffffff, and TASK_SIZE is (ULL(1) << 32), that should work
<mrutland>
i.e. if I want to access every addressable user byte, I should be able to
<mrutland>
if *some* combination of addr+size supports that, *all* should
<ardb>
agreed
<mrutland>
which is why I was arguing the LHS and RHS should use the same check
<arnd>
right, more importantly base=0xffffffff, size=1 should work -- any larger sizes are pretty much irrelevant because no kernel code should ever pass that (it needs to fit into a local buffer if you want a copy of it)
<ardb>
but ULONG_MAX is also a special case because it never fails, right? the wraparound may be an unexpected case, but it doesn't mean the access is not ok
<arnd>
IIRC ppc at some point outright rejected any access of more than 1 megabyte, but eventually stopped doing that because the limit was arbitrary
<mrutland>
IIUC what we actually want is an inclusive USER_ADDR_MAX, because TASK_SIZE_MAX is used elsewhere in ways that behave oddly when it's inclusive
balrog has quit [Quit: Bye]
<arnd>
it looks like 31b1e940c5d4 ("arm64: Fix __range_ok macro") fixed this in the past, and 51369e398d0d ("arm64: Make USER_DS an inclusive limit") changed USER_DS to be inclusive but kept TASK_SIZE as exclusive, which is the common but slightly confusing bit
<arnd>
at least my series gets rid of both KERNEL_DS and USER_DS, so there are fewer things to confuse
<arnd>
(it's already gone on arm64)
<mrutland>
arnd: yeah, and the latter was there precisely because we needed TASK_SIZE to be exclsuive, but it was easier to wite the check logic with inclusive addrs because otherwiwe we couldn't correctly represent KERNEL_DS
<mrutland>
THat's one reason I was glad to remove those entirely!
balrog has joined #armlinux
macromorgan has joined #armlinux
<mrutland>
Maybe the s390 approach is the best one to follow, saying the last page of the addr space can't be used?
<mrutland>
That way we can represent an exclusive limit, and it means it's PAGE-aligned, which seems more intuitive
<arnd>
I'm not sure, that is possibly a page that is actually inaccessible on s390, either because of the way they do vsyscalls, or their per-cpu physical "lowcore" page
<mrutland>
IIRC the lowcore stuff appears at low VAs
<arnd>
in any case, I would try not to change the behavior too much for the moment, once we have a single function that is used everywhere it becomes easier to argue about changes
<arnd>
lowcore is a low PA, it could get mapped anywhere IIRC
<mrutland>
I thought it was an arbitrary PA configured through a register that got aliased at a low VA?
<mrutland>
I'm probably wrong
<arnd>
there is a per-cpu register that swaps the the page at zero PA with another PA. I don't think it has to be mapped at any VA, but maybe they had a reason to map it after all
<arnd>
no, I think you were right: it's just a different way to get around the -1ul address issue
<arnd>
access_ok() on s390 is always true, 1aea9b3f9210 ("s390/mm: implement 5 level pages tables") introduced the constant
<arnd>
I'm glad hch poked me about fixing nds32, otherwise I would have missed the major bug they have:
frieder has quit [Remote host closed the connection]
shailangsa has quit [Remote host closed the connection]
Tokamak has quit [Read error: Connection reset by peer]
Tokamak has joined #armlinux
prabhakarlad has quit [Quit: Client closed]
Misotauros has quit [Ping timeout: 272 seconds]
monstr has quit [Remote host closed the connection]
matthias_bgg has quit [Ping timeout: 272 seconds]
shailangsa has joined #armlinux
TheCoffeMaker_ has quit [Quit: So long and thanks for all the fish]
TheCoffeMaker has joined #armlinux
alexels has quit [Quit: WeeChat 3.4]
Misotauros has joined #armlinux
jlinton has quit [Quit: Client closed]
amitk has quit [Ping timeout: 250 seconds]
nsaenz has quit [Remote host closed the connection]
headless has quit [Ping timeout: 252 seconds]
headless_ has joined #armlinux
headless_ is now known as headless
headless has quit [Quit: Konversation terminated!]
cbeznea has quit [Ping timeout: 252 seconds]
djrscally has joined #armlinux
iivanov has quit [Remote host closed the connection]
iivanov has joined #armlinux
iivanov has quit [Ping timeout: 272 seconds]
<marex>
robher: sboyd: hey, is anyone working on yaml conversion of Documentation/devicetree/bindings/clock/clock-bindings.txt to your knowledge ?
djrscally has quit [Quit: Konversation terminated!]
Grimler has quit [Ping timeout: 256 seconds]
Grimler has joined #armlinux
Grimler has quit [Ping timeout: 272 seconds]
Grimler has joined #armlinux
elastic_dog has quit [Ping timeout: 250 seconds]
elastic_dog has joined #armlinux
<robher>
marex: IIRC, it is all or mostly done. It just lives in dt-schema now. There may be some bits of description that need relicensing that did not get moved.