<vmeson>
451ee2df9a3 (linux-yocto/5.4: update to v5.4.117) -> 20 tests, no "BUG:"; on to 397b549a2e0 (linux-yocto/5.10: update to v5.10.36)
<vmeson>
ant_: I've been rambling far longer than I've been bisecting! ;-)
<vmeson>
nice to see the WR builder stats in good shape. The last 12 hrs: 801 builds, 6 fail, mostly on master: chromium, opencl-clang. Could have done ~850 without sstate ;-)
<JPEW>
vmeson: are you generating it but not sharing?
<vmeson>
JPEW: yes, right now, sstate-cache is generated to the local disk and then removed. I'd like to avoid the useless work but I understand that this could be problematic for bitbake.
<JPEW>
Fair... You could look at nullfs FUSE driver. Not sure if it would be "better"
<vmeson>
so to be clear: Could have done ~850 or more builds if we didn't have to sstate-cache and then remove it when cleaning up the successful builds ;-)
<JPEW>
Do you not want sstate, or you just don't have anywhere to put it?
<vmeson>
JPEW: I'll keep that in mind if it can't be disabled completely.
<vmeson>
Some of our builds just always build from source so I'd prefer to disable sstate generation in that case.
<JPEW>
Ah
<vmeson>
less work, less clean-up.
* JPEW
is curious because he's fighting tooth and nail to *get* sstate currently
<vmeson>
heh
qschulz has quit [Quit: qschulz]
qschulz has joined #yocto
sakoman has quit [Quit: Leaving.]
ecdhe has joined #yocto
ecdhe_ has quit [Ping timeout: 252 seconds]
Vonter has quit [Ping timeout: 272 seconds]
Vonter has joined #yocto
hpsy1 has quit [Ping timeout: 272 seconds]
hpsy has joined #yocto
camus has joined #yocto
camus has quit [Ping timeout: 268 seconds]
camus has joined #yocto
davidinux has quit [Ping timeout: 245 seconds]
davidinux has joined #yocto
vmeson has quit [Remote host closed the connection]
vmeson has joined #yocto
Vonter has quit [Ping timeout: 272 seconds]
Vonter has joined #yocto
<ant_>
RP: in meta-initramfs we pushed your IMAGE_FSTYPES python fix
<ant_>
no failures here with limited build-testing
Vonter has quit [Ping timeout: 252 seconds]
Vonter has joined #yocto
Vonter has quit [Ping timeout: 264 seconds]
Vonter has joined #yocto
<paulg>
remind me in a couple days to patch runqemu to pre-run "--version" so that the qemu version gets captured in the qemu boot log. I'll sell it as a sanity test to make sure the qemu is sane and executable...
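paulg's idea above, pre-running "--version" so the QEMU version lands in the boot log, could be sketched roughly like this (a hypothetical helper, not the actual runqemu patch; the function name and arguments are made up for illustration):

```shell
# Hypothetical helper (not the real runqemu change): record the QEMU
# version at the top of the boot log. Running --version first also
# doubles as a sanity check that the binary exists and is executable
# before we try to boot with it.
log_qemu_version() {
    qemubin=$1    # e.g. qemu-system-x86_64
    logfile=$2    # the qemu boot log for this run
    if "$qemubin" --version >> "$logfile" 2>&1; then
        return 0
    fi
    echo "ERROR: $qemubin failed to run --version" >> "$logfile"
    return 1
}
```

runqemu would then call something like this before launching the guest, so every qemu_boot_log.* records which QEMU produced it.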
lexano has quit [Quit: Leaving]
goliath has joined #yocto
<vmeson>
at 7c4c016a3dd (linux-yocto/5.10: update to v5.10.37) -- I ran more tests overnight and 2 out of 125 hit the "BUG:"
<vmeson>
last "going through the motions" tests of the bisect: 03b9b16598 linux-yocto/5.4: update to v5.4.118 -- but given how hard the bug is to reproduce, I wouldn't draw any conclusions, of course.
<paulg>
everything I've seen has dcache signatures on it... from the 1st RIP following the BUG; the BUG line itself is rather meaningless from a diagnostic POV.
<paulg>
I can reproduce it on vanilla v5.4, built with the distro-native gcc-7, and with qemu-5.1 also built outside of yocto - leaving largely only yocto's rootfs/LTP content in play.
<paulg>
I'm leaning heavily towards a dentry UAF or general dentry_cache [kmem_cache_alloc] corruption.
<paulg>
not saying it still couldn't be caused by qemu or cosmic rays - just that it seems to consistently manifest itself in dentry mangling; and there are some complex workings in there: bit spinlock hash lists, RCU-walk with fallback to REF-walk, etc.
<paulg>
oh, and RP gets a pass on his rubber-chicken eye-of-newt boot arg change; I reproduced it with that reverted (i.e. no tsc= and no rcu_expedited) but on the off chance that rcu_expedited increases our odds, I put it back.
<paulg>
I guess since this is fs related, I shoulda taken a page from the Viro handbook and used the phrase "cargo cult" instead of rubber-chicken waved over head 3x.
<paulg>
[ 211.052076] CPU: 3 PID: 7969 Comm: mount Not tainted 5.2.0-yocto-standard #1
<paulg>
[ 211.051658] Modules linked in:
<paulg>
so there. Reproduced as far back as v5.2 kernel ; which was used for whatever the 2019 yocto release was called.
<paulg>
I've walked the v5.10 ".config" backwards, one kernel at a time, by "make oldconfig" and accepting defaults for "new" (in this case, since hidden/expired) options.
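The walk-backwards procedure paulg describes might look something like this (a sketch, not his actual commands; the tree path, tag list, and file names are assumptions, and olddefconfig is the non-interactive equivalent of answering every oldconfig prompt with the default):

```shell
# Sketch of walking a .config back one release at a time, keeping each
# intermediate config for later comparison. Paths and tags are
# illustrative assumptions.
walk_config_back() {
    ksrc=$1; shift           # kernel source tree, .config already seeded
    for tag in "$@"; do      # e.g. v5.9 v5.8 ... down to v5.2
        git -C "$ksrc" checkout -q "$tag" || return 1
        make -C "$ksrc" olddefconfig || return 1   # accept all defaults
        cp "$ksrc/.config" "config.$tag" || return 1
    done
}
# Usage (hypothetical):
#   cp v5.10-qemux86-64.config ~/linux/.config
#   walk_config_back ~/linux v5.9 v5.8 v5.7 v5.6 v5.5 v5.4 v5.3 v5.2
#   ~/linux/scripts/diffconfig config.v5.3 config.v5.2
```

Keeping a config.$tag per step makes it easy to diff any two releases and spot an option that silently flipped along the way.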
<paulg>
What would be educational would be to build the 2019 v5.2 based yocto and see what the qemu-x86-64 .config looks like and compare that to the "walked back one" [and/or test that bzimage/config]
lexano has joined #yocto
<paulg>
v5.1 is the same as v5.2 ; guess I'll try v5.0 and let that run while I look at other angles.
fury has joined #yocto
Vonter has quit [Ping timeout: 245 seconds]
<vmeson>
fwiw: git bisect with between 20 and 100 tests per commit, so not at all conclusive: ends up at 7c4c016a3d as the first bad commit -- linux-yocto/5.10: update to v5.10.37
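A flaky failure like this can still be driven through "git bisect run" if each step repeats the test many times and flags the commit bad on the first "BUG:". A sketch (bisect_step and run_one_test are hypothetical names, not vmeson's setup; and as noted, N clean runs still don't prove a commit is good):

```shell
# Hypothetical bisect step for an intermittent bug: run the given test
# command up to $1 times; any "BUG:" in its output marks this commit bad.
bisect_step() {
    n=$1; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        if "$@" 2>&1 | grep -q 'BUG:'; then
            return 1         # non-zero exit => git bisect marks commit bad
        fi
        i=$((i + 1))
    done
    return 0                 # never reproduced => treated as good
}
# Usage (hypothetical), where run_one_test boots the image and runs LTP:
#   git bisect run sh -c '. ./bisect-lib.sh; bisect_step 100 ./run_one_test'
```

git bisect run treats exit 0 as good and 1-124 as bad (125 means skip), so returning 1 on the first reproduction is enough.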
camus has quit [Quit: camus]
Vonter has joined #yocto
xmn has joined #yocto
<paulg>
that isn't going back very far.
<paulg>
time for me to find somewhere to build old stuff and capture the kernel .config files for comparison, in case something slid in that way.
<paulg>
when I've seen "questionable" dentries, I've also seen paths for non-"real" filesystems; mix that with LTP doing such tests on cgroup (i.e. kernfs) filesystems like this...
<paulg>
while true; do
<paulg>
mount -t cgroup xxx cgroup/ > /dev/null 2>&1
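The quoted loop is truncated; a bounded reconstruction of such a mount stress loop might look like this (a guess at the shape only — the umount half and the iteration bound are assumptions, and on a real target it needs root):

```shell
# Hypothetical bounded version of the cgroup mount stress loop quoted
# above; mount/umount failures are ignored, as in the original snippet.
stress_cgroup_mount() {
    n=$1
    mkdir -p cgroup || return 1
    i=0
    while [ "$i" -lt "$n" ]; do
        mount -t cgroup xxx cgroup/ > /dev/null 2>&1
        umount cgroup/ > /dev/null 2>&1
        i=$((i + 1))
    done
}
```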
<paulg>
00-old/qemu_boot_log.20210613034535:[ 200.120908] dirat: no inode: --0-- (p=cgroup/0) rc=1
<paulg>
-------------------------
<paulg>
The '||' is my marker for d_alloc_parallel()
<paulg>
"release_agent" belongs to cgroup.
<paulg>
mountinfo is presumably /proc/<PID>/mountinfo
<paulg>
the "no inode" messages are checking for dentry->d_inode in places where not checking would presumably trigger the null deref we've seen in the BUG/OOPS.
<paulg>
dirat is checking for null where we've been seeing RIP: 0010:do_mkdirat+0x6c/0x130
leon-anavi has joined #yocto
<paulg>
RP, the above might give a clue as to which of the tests in your group are probably "key"
<RP>
paulg: I am pleased to say I spent the day away from the computer (mountain biking in a forest that was unfortunately on fire) and should stay away from the computer now due to beer (first time in months) :)
<ant_>
go ahead
* paulg
intends to take some AFK time once I can afford to context flush everything about this from my noodle.
<ant_>
you guys have contributed to global warming
<paulg>
you mean the CPU heating from the test boxes, or the methane from us windbags?
<RP>
paulg: I'll just have to try and page it back in, but tomorrow ;-)
<paulg>
RP, All my knowledge is wrapped in READ_ONCE(...)
<paulg>
In any case, this is interesting. In v5.1-vanilla, I got the dput WARN_ON above 3x in 19 runs.
<paulg>
Identical everything but v5.0-vanilla and I got *zero* instances in 24 testimage runs.
<paulg>
Amusingly there are zero changes to fs/dcache.c in v5.0 --> v5.1 :-/
prabhakarlad has joined #yocto
Vonter has quit [Ping timeout: 245 seconds]
<RP>
paulg: still feels like we're missing something
<paulg>
yah, hence I keep playing all angles.
Vonter has joined #yocto
<paulg>
vmeson, yep - those are two of the three usual suspects
<paulg>
3rd is do_mkdirat+0x6a/0xf0
<paulg>
sorry, I lied - there are 4.
<paulg>
do_readlinkat+0x86/0x120
<paulg>
normally I'd rule out qemu doing something this specific/reproducible in the same kernel area (vs just randomly crashing the guest)... but we do have smp_store_release() and similar "less frequently used" barrier foo in use in this area of dentry magic.
kpo_ has quit [Read error: Connection reset by peer]
kpo_ has joined #yocto
vmeson has quit [Ping timeout: 272 seconds]
hpsy1 has joined #yocto
vmeson has joined #yocto
hpsy has quit [Ping timeout: 272 seconds]
hpsy1 has quit [Read error: Connection reset by peer]