dl9pf changed the topic of #yocto to: Welcome to the Yocto Project | Learn more: http://www.yoctoproject.org | Join the community: http://www.yoctoproject.org/community | Channel logs available at https://www.yoctoproject.org/irc/ and https://libera.irclog.whitequark.org/yocto/ | Having difficulty on the list, or with someone on the list? Contact YP community mgr Nicolas Dechesne (ndec)
<vmeson> 451ee2df9a3 linux-yocto/5.4: update to v5.4.117 -> 20 tests, no BUG: on to: 397b549a2e0 linux-yocto/5.10: update to v5.10.36
<vmeson> ant_: I've been rambling far longer than I've been bisecting! ;-)
<vmeson> nice to see the WR builder stats in good shape. The last 12 hrs: 801 builds, 6 fail, mostly on master: chromium, opencl-clang. Could have done ~850 without sstate ;-)
<JPEW> vmeson: are you generating it but not sharing?
<vmeson> JPEW: yes, right now, sstate-cache is generated to the local disk and then removed. I'd like to avoid the useless work but I understand that this could be problematic for bitbake.
<JPEW> Fair... You could look at nullfs FUSE driver. Not sure if it would be "better"
<vmeson> so to be clear: Could have done ~850 or more builds if we didn't have to sstate-cache and then remove it when cleaning up the successful builds ;-)
<JPEW> Do you not want sstate, or you just don't have anywhere to put it?
<vmeson> JPEW: I'll keep that in mind if it can't be disabled completely.
<vmeson> Some of our builds just always build from source so I'd prefer to disable sstate generation in that case.
<JPEW> Ah
<vmeson> less work, less clean-up.
* JPEW is curious because he's fighting tooth and nail to *get* sstate currently
<vmeson> heh
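A rough sketch of the throwaway-sstate idea discussed above: back SSTATE_DIR with tmpfs so the cache never touches disk and cleanup is a single unmount. The paths, size and local.conf assignment here are assumptions for illustration, not an actual setup from the channel:

    # hypothetical: keep sstate purely in RAM for always-build-from-source builders
    mkdir -p /build/throwaway-sstate
    mount -t tmpfs -o size=50G tmpfs /build/throwaway-sstate
    # local.conf would then point at it:  SSTATE_DIR = "/build/throwaway-sstate"
    # ... run the builds ...
    umount /build/throwaway-sstate    # discards the whole cache at once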
qschulz has quit [Quit: qschulz]
qschulz has joined #yocto
sakoman has quit [Quit: Leaving.]
ecdhe has joined #yocto
ecdhe_ has quit [Ping timeout: 252 seconds]
Vonter has quit [Ping timeout: 272 seconds]
Vonter has joined #yocto
hpsy1 has quit [Ping timeout: 272 seconds]
hpsy has joined #yocto
camus has joined #yocto
camus has quit [Ping timeout: 268 seconds]
camus has joined #yocto
davidinux has quit [Ping timeout: 245 seconds]
davidinux has joined #yocto
vmeson has quit [Remote host closed the connection]
vmeson has joined #yocto
Vonter has quit [Ping timeout: 272 seconds]
Vonter has joined #yocto
<ant_> RP: in meta-initramfs we pushed your IMAGE_FSTYPES python fix
<ant_> no failures here with limited build-testing
Vonter has quit [Ping timeout: 252 seconds]
Vonter has joined #yocto
Vonter has quit [Ping timeout: 264 seconds]
Vonter has joined #yocto
<paulg> remind me in a couple days to patch runqemu to pre-run "--version" so that the qemu version gets captured in the qemu boot log. I'll sell it as a sanity test to make sure the qemu is sane and executable...
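Until such a runqemu patch lands, a stopgap along these lines would capture the same information; this is a hypothetical wrapper script, not the actual runqemu change being proposed:

    #!/bin/sh
    # hypothetical wrapper: record the qemu version before booting so the
    # boot log shows which binary actually ran
    QEMU_BIN=${QEMU_BIN:-qemu-system-x86_64}
    "$QEMU_BIN" --version
    exec runqemu "$@"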
lexano has quit [Quit: Leaving]
goliath has joined #yocto
<vmeson> at: 7c4c016a3dd linux-yocto/5.10: update to v5.10.37 -- I ran more tests overnight and 2 out of 125 tests hit the "BUG:"
<vmeson> last "going through the motions" tests of the bisect: 03b9b16598 linux-yocto/5.4: update to v5.4.118 -- but given how the bug is hard to reproduce, I wouldn't draw any conclusions of course.
<paulg> everything I've seen has dcache signatures on it.... from the 1st RIP following the BUG ; the BUG line itself is rather meaningless from a diagnostic POV.
<paulg> I can reproduce it on vanilla v5.4, built with the distro-native gcc-7, and with qemu-5.1 also built outside of yocto - leaving largely only yocto's rootfs/LTP content in the mix.
<paulg> I keep finding myself coming back to looking at d_alloc_parallel() --- see https://lwn.net/Articles/692546/
<paulg> debug patches on v5.10 are showing me dentries with no d_inode where it's assumed they'd have one (i.e. not a negative dentry)
<paulg> vanilla v5.4 kernel just tripped this WARN 5m ago...
<paulg> static inline bool retain_dentry(struct dentry *dentry)
<paulg> {
<paulg> WARN_ON(d_in_lookup(dentry));
<paulg> I'm leaning heavily towards a dentry UAF or general dentry_cache [kmem_cache_alloc] corruption.
<paulg> not saying it still couldn't be caused by qemu or cosmic rays - just that it consistently manifests itself in dentry mangling; and there are some complex workings in there: bit-spinlock hash lists, RCU-walk with fallback to ref-walk, etc.
goliath has quit [Quit: SIGSEGV]
<paulg> oh, and RP gets a pass on his rubber-chicken eye-of-newt boot arg change; I reproduced it with that reverted (i.e. no tsc= and no rcu_expedited) but on the off chance that rcu_expedited increases our odds, I put it back.
<paulg> I guess since this is fs related, I shoulda taken a page from the Viro handbook and used the phrase "cargo cult" instead of rubber-chicken waved over head 3x.
goliath has joined #yocto
sakoman has joined #yocto
<paulg> [ 211.049714] WARNING: CPU: 3 PID: 7969 at /ala-lpggp31/paul/poky/build/tmp/work-shared/qemux86-64/kernel-source/fs/dcache.c:637 dput+0x105/0x140
<paulg> [ 211.052076] CPU: 3 PID: 7969 Comm: mount Not tainted 5.2.0-yocto-standard #1
<paulg> [ 211.051658] Modules linked in:
<paulg> so there. Reproduced as far back as v5.2 kernel ; which was used for whatever the 2019 yocto release was called.
<paulg> I've walked the v5.10 ".config" backwards, one kernel release at a time, via "make oldconfig", accepting the defaults for "new" (in this case, since-hidden/expired) options.
<paulg> What would be educational is to build the 2019 v5.2-based yocto, see what the qemux86-64 .config looks like, and compare it to the "walked back" one [and/or test that bzImage/config]
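The config walk described above amounts to roughly the following; the tag list, file names and the 2019 reference config path are assumptions for illustration:

    # walk a v5.10 .config back one release at a time, accepting the defaults,
    # keeping each intermediate config for comparison
    cp config-v5.10 .config
    for tag in v5.9 v5.8 v5.7 v5.6 v5.5 v5.4 v5.3 v5.2; do
        git checkout "$tag"
        yes "" | make oldconfig
        cp .config "config-walked-$tag"
    done
    # then compare against the config from a 2019 v5.2-based yocto build:
    diff config-walked-v5.2 /path/to/2019-yocto-qemux86-64.config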
lexano has joined #yocto
<paulg> v5.1 is the same as v5.2 ; guess I'll try v5.0 and let that run while I look at other angles.
fury has joined #yocto
Vonter has quit [Ping timeout: 245 seconds]
<vmeson> fwiw: git bisect with between 20 and 100 test runs per commit, so not conclusive at all, ends up at: 7c4c016a3d is the first bad commit -- linux-yocto/5.10: update to v5.10.37
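With a reproduction rate this low, each bisect step needs many boots before "good" can be trusted. A hypothetical helper for git bisect run might look like this; the image name, run count and boot-log paths are assumptions:

    #!/bin/sh
    # hypothetical bisect helper: only call a commit good after N clean runs;
    # any BUG: in a boot log marks it bad, a build failure skips the commit
    N=${N:-50}
    i=0
    while [ "$i" -lt "$N" ]; do
        bitbake core-image-sato -c testimage || exit 125    # 125 = skip this commit
        if grep -q "BUG:" tmp/work/qemux86_64-poky-linux/core-image-sato/*/testimage/qemu_boot*; then
            exit 1    # bad: reproduced
        fi
        i=$((i + 1))
    done
    exit 0    # good, or at least: did not reproduce in N runs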
camus has quit [Quit: camus]
Vonter has joined #yocto
xmn has joined #yocto
<paulg> that isn't going back very far.
<paulg> time for me to find somewhere to build old stuff and capture the kernel .config files for comparison, in case something slid in that way.
<paulg> when I've seen a "questionable" dentry, I've also seen paths for non-"real" filesystems; mix that with LTP doing such tests on cgroup (i.e. kernfs) filesystems like this...
<paulg> while true; do
<paulg> mount -t cgroup xxx cgroup/ > /dev/null 2>&1
<paulg> mkdir cgroup/0 > /dev/null 2>&1
<paulg> umount cgroup/ > /dev/null 2>&1
<paulg> rmdir cgroup/0 > /dev/null 2>&1
<paulg> done
<paulg> [ ./testcases/bin/cgroup_regression_5_1.sh ]
<paulg> For example:
<paulg> -----------------
<paulg> 00-old/qemu_boot_log.20210612231906:[ 229.827146] || no inode: flags=4, release_agent (n=release_agent) rc=1
<paulg> 00-old/qemu_boot_log.20210613011638:[ 212.107046] || no inode: flags=4, release_agent (n=release_agent) rc=1
<paulg> 00-old/qemu_boot_log.20210613013254:qemux86-64 login: [ 185.157666] dirat: no inode: --0-- (p=cgroup/0) rc=1
<paulg> 00-old/qemu_boot_log.20210613011019:[ 219.415092] || no inode: flags=0, mountinfo (n=mountinfo) rc=1
<paulg> 00-old/qemu_boot_log.20210613013912:qemux86-64 login: [ 185.721641] dirat: no inode: --0-- (p=cgroup/0) rc=1
<paulg> 00-old/qemu_boot_log.20210613034535:[ 200.120908] dirat: no inode: --0-- (p=cgroup/0) rc=1
<paulg> -------------------------
<paulg> The '||' is my marker for d_alloc_parallel()
<paulg> "release_agent" belongs to cgroup.
<paulg> mountinfo is presumably /proc/<PID>/mountinfo
<paulg> the "no inode" are checking for dentry->d_inode in places where not checking would presumably trigger the null deref we've seen in BUG/OOPS.
<paulg> dirat is checking for null where we've been seeing RIP: 0010:do_mkdirat+0x6c/0x130
leon-anavi has joined #yocto
<paulg> RP, the above might give a clue as to which of the tests in your group are probably "key"
<RP> paulg: I am pleased to say I spent the day away from the computer (mountain biking in a forest that was unfortunately on fire) and should stay away from the computer now due to beer (first time in months) :)
<ant_> go ahead
* paulg intends to take some AFK time once I can afford to context flush everything about this from my noodle.
<ant_> you guys have contributed to global warming
<paulg> you mean the CPU heating from the test boxes, or the methane from us windbags?
<RP> paulg: I'll just have to try and page it back in, but tomorrow ;-)
<paulg> RP, All my knowledge is wrapped in READ_ONCE(...)
<paulg> In any case, this is interesting. In v5.1-vanilla, I got the dput WARN_ON above 3x in 19 runs.
<paulg> Identical everything but v5.0-vanilla and I got *zero* instances in 24 testimage runs.
<paulg> Amusingly there are zero changes to fs/dcache.c in v5.0 --> v5.1 :-/
prabhakarlad has joined #yocto
Vonter has quit [Ping timeout: 245 seconds]
<RP> paulg: still feels like we're missing something
<paulg> yah, hence I keep playing all angles.
Vonter has joined #yocto
halstead has quit [Ping timeout: 264 seconds]
barath has quit [Remote host closed the connection]
Emantor[m] has quit [Remote host closed the connection]
shoragan[m] has quit [Read error: Connection reset by peer]
cody has quit [Remote host closed the connection]
ndec[m] has quit [Remote host closed the connection]
janvermaete[m] has quit [Remote host closed the connection]
AlessandroTaglia has quit [Remote host closed the connection]
dwagenk has quit [Remote host closed the connection]
Saur[m] has quit [Remote host closed the connection]
fabatera[m] has quit [Remote host closed the connection]
asus_986_gpu[m] has quit [Remote host closed the connection]
khem has quit [Remote host closed the connection]
Spectrejan[m] has quit [Remote host closed the connection]
jordemort has quit [Read error: Connection reset by peer]
alex88[m] has quit [Write error: Connection reset by peer]
shoragan|m has quit [Remote host closed the connection]
kayterina[m] has quit [Remote host closed the connection]
Pierre-jeanTexie has quit [Remote host closed the connection]
Jari[m] has quit [Remote host closed the connection]
ejoerns[m] has quit [Remote host closed the connection]
Andrei[m] has quit [Remote host closed the connection]
hpsy1 has joined #yocto
Andrei[m] has joined #yocto
kayterina[m] has joined #yocto
sakoman has quit [Ping timeout: 245 seconds]
hpsy has quit [Ping timeout: 245 seconds]
goliath has quit [Ping timeout: 245 seconds]
goliath_ has joined #yocto
Fanfwe42 has joined #yocto
ant_ has quit [Ping timeout: 245 seconds]
Fanfwe has quit [Quit: ZNC 1.8.2+deb2+b1 - https://znc.in]
gjohnson has quit [Read error: Connection reset by peer]
fray has quit [Ping timeout: 252 seconds]
dev1990_ has quit [Remote host closed the connection]
davidinux has quit [Ping timeout: 252 seconds]
qschulz has quit [Ping timeout: 252 seconds]
stkw0 has quit [Ping timeout: 252 seconds]
nerdboy has quit [Ping timeout: 252 seconds]
Dracos-Carazza has quit [Ping timeout: 252 seconds]
davidinux has joined #yocto
dev1990_ has joined #yocto
stkw0 has joined #yocto
qschulz has joined #yocto
ant_ has joined #yocto
bluelightning has joined #yocto
Dracos-Carazza has joined #yocto
ecdhe_ has joined #yocto
jordemort has joined #yocto
janvermaete[m] has joined #yocto
Emantor[m] has joined #yocto
ejoerns[m] has joined #yocto
Jari[m] has joined #yocto
ndec[m] has joined #yocto
khem has joined #yocto
Pierre-jeanTexie has joined #yocto
Saur[m] has joined #yocto
cody has joined #yocto
shoragan[m] has joined #yocto
barath has joined #yocto
dmoseley_ has joined #yocto
shoragan|m has joined #yocto
AlessandroTaglia has joined #yocto
fabatera[m] has joined #yocto
asus_986_gpu[m] has joined #yocto
alex88[m] has joined #yocto
Spectrejan[m] has joined #yocto
dwagenk has joined #yocto
mattsm has quit [Killed (zirconium.libera.chat (Nickname regained by services))]
mattsm has joined #yocto
mckoan_ has joined #yocto
Shaun_ has joined #yocto
Fanfwe has joined #yocto
abelloni_ has joined #yocto
leonanavi has joined #yocto
dlan_ has joined #yocto
fullstop_ has joined #yocto
nerdboy has joined #yocto
nerdboy has joined #yocto
nerdboy has quit [Changing host]
ecdhe has quit [Ping timeout: 272 seconds]
leonanavi has quit [Client Quit]
goliath__ has joined #yocto
Fanfwe42 has quit [Ping timeout: 264 seconds]
leon-anavi has quit [Ping timeout: 264 seconds]
zedd has quit [Ping timeout: 264 seconds]
mckoan|away has quit [Ping timeout: 264 seconds]
Shaun has quit [Ping timeout: 264 seconds]
abelloni has quit [Ping timeout: 264 seconds]
hpsy1 has quit [Ping timeout: 264 seconds]
iokill has quit [Ping timeout: 264 seconds]
dlan has quit [Ping timeout: 264 seconds]
fullstop has quit [Ping timeout: 264 seconds]
dmoseley has quit [Ping timeout: 264 seconds]
goliath_ has quit [Ping timeout: 264 seconds]
iokill has joined #yocto
zeddii has joined #yocto
ecdhe has joined #yocto
hpsy has joined #yocto
fray has joined #yocto
Dracos-Carazza_ has joined #yocto
qschulz_ has joined #yocto
sakoman has joined #yocto
Vonter has quit [Ping timeout: 252 seconds]
jsbronder has quit [Ping timeout: 265 seconds]
dmoseley has joined #yocto
davidinux1 has joined #yocto
jsbronder has joined #yocto
zeddii has quit [Ping timeout: 265 seconds]
warthog9 has quit [Ping timeout: 265 seconds]
nohit has quit [Ping timeout: 265 seconds]
zeddii has joined #yocto
ant_home has joined #yocto
warthog9 has joined #yocto
droman has joined #yocto
xantoz has quit [Ping timeout: 265 seconds]
<vmeson> previously "good" 03b9b16598 qemu_boot_log.20210613193204:[ 220.854269] BUG: kernel NULL pointer dereference, address: 0000000000000008
xantoz has joined #yocto
<vmeson> $ grep -m 1 -A 7 "BUG:" `pwd`/../b/yp-corrupt*/tmp/work/qemux86_64-poky-linux/core-image-sato/1.0-r0/testimage/qemu_boot* | grep RIP | cut -d":" -f3| uniq -c
<vmeson> 4 kernfs_sop_show_path+0x1c/0x60
<vmeson> 1 d_alloc_parallel+0xd5/0x570
dmoseley_ has quit [*.net *.split]
ecdhe_ has quit [*.net *.split]
Dracos-Carazza has quit [*.net *.split]
ant_ has quit [*.net *.split]
qschulz has quit [*.net *.split]
stkw0 has quit [*.net *.split]
dev1990_ has quit [*.net *.split]
davidinux has quit [*.net *.split]
nohit has joined #yocto
Shaun_ is now known as Shaun
leon-anavi has joined #yocto
xmn has quit [Quit: ZZZzzz…]
leon-anavi has quit [Quit: Leaving]
dlan_ is now known as dlan
dlan has quit [Changing host]
dlan has joined #yocto
<paulg> vmeson, yep - those are two of the three usual suspects
<paulg> 3rd is do_mkdirat+0x6a/0xf0
<paulg> sorry, I lied - there is 4.
<paulg> do_readlinkat+0x86/0x120
<paulg> normally I'd rule out qemu doing something that is this specific/reproducible in the same kernel area (vs just randomly crashing the guest)... but we do have smp_store_release() and similar "less frequently used" barrier foo in use in this area of dentry magic.
kpo_ has quit [Read error: Connection reset by peer]
kpo_ has joined #yocto
vmeson has quit [Ping timeout: 272 seconds]
hpsy1 has joined #yocto
vmeson has joined #yocto
hpsy has quit [Ping timeout: 272 seconds]
hpsy1 has quit [Read error: Connection reset by peer]
hpsy has joined #yocto
prabhakarlad has quit [Quit: Client closed]
goliath__ has quit [Quit: SIGSEGV]