#fedora-riscv on 2021-12-09 — irc logs at libera.irclog.whitequark.org

2021-06-01 15:14 dgilmore changed the topic of #fedora-riscv to: Fedora on RISC-V https://fedoraproject.org/wiki/Architectures/RISC-V || Logs: https://libera.irclog.whitequark.org/fedora-riscv || Alt Arch discussions are welcome in #fedora-alt-arches

03:04 bkeys has joined #fedora-riscv

07:33 jcajka has joined #fedora-riscv

10:35 <rwmjones> morning

12:34 masami has joined #fedora-riscv

14:00 masami has quit [Quit: Leaving]

14:22 <davidlt[m]> rwmjones: any ideas if this was fixed? https://github.com/NetworkBlockDevice/nbd/issues/51

14:54 davidlt has joined #fedora-riscv

14:55 zsun has joined #fedora-riscv

15:00 <rwmjones> davidlt[m]: looking

15:00 <davidlt[m]> From what I understand, there is nothing to be done here

15:01 <davidlt[m]> systemd will do kill(-1, SIGSTOP) that will trigger NBD client to drop the connection

15:01 <davidlt[m]> That generates some IO errors (very few) until systemd is completely finished unmounting

15:02 <rwmjones> it's not one I've seen, but definitely dracut + nbdroot is generally not in a good place and needs fixing

15:02 <davidlt[m]> My initramfs is BusyBox + NBD client, not dracut + systemd, so I assume it works differently

15:02 <rwmjones> it's also true that the kernel client cannot recover from kill (or dropped connection)

15:03 <davidlt[m]> I assume that this particular issue is that I don't use systemd in initramfs

15:03 <davidlt[m]> Yeah, I have another problem

15:03 <rwmjones> if you're using busybox then you've got better control over things

15:03 <davidlt[m]> My initramfs doesn't have NetworkManager and it doesn't like externally configured connection

15:03 <rwmjones> I would say that if you're shutting down, you don't care about nbd-client, just that you unmount the filesystems before shutdown

15:04 <rwmjones> if you really want to be "clean" use nbd-client -d

15:04 <davidlt[m]> So I have to quickly reconnect while running from NBD :)

15:04 <rwmjones> but otherwise as long as the fses are unmounted, nothing bad can happen

15:04 <davidlt[m]> I am on NBD root :)

15:04 <rwmjones> but this is on the shutdown path?

15:05 <davidlt[m]> systemd syncs fs, switches them to read-only (IIRC), then kills processes, then syncs, unmounts

15:05 <davidlt[m]> There are two issues, this SIGSTOP is at shutdown, doesn't seem to be a big issue right now

15:05 <davidlt[m]> NetworkManager is at the boot

15:05 <rwmjones> the unmount will send NBD_CMD_FLUSH which will write everything to persistent storage

15:06 <rwmjones> who is sending SIGSTOP?

15:06 <davidlt[m]> systemd

15:06 <rwmjones> but I thought there's no systemd here?

15:06 <davidlt[m]> Basically this still exists: https://bugs.devuan.org/cgi/bugreport.cgi?bug=227

15:07 <davidlt[m]> Initramfs doesn't have systemd, but the rootfs has.

15:07 <rwmjones> I see

15:07 <davidlt[m]> nbd has systemd services, I assume somehow those start in rootfs and tell systemd not to send anything

15:08 <davidlt[m]> I think I saw that, but I don't know what would start them (I assume systemd based on detecting NBD drive connected)

15:08 <rwmjones> so I guess what might happen is nbd-client receives SIGSTOP and (correctly) stops, but that deadlocks something which is trying to write to disk

15:08 <rwmjones> the solution is don't send SIGSTOP to nbd-client :-/

15:08 <davidlt[m]> No it works, producing 5 IO errors :)

15:09 <davidlt[m]> systemd does that unconditionally

15:09 <rwmjones> for some definition of "works" :-)

15:09 <davidlt[m]> https://github.com/systemd/systemd/blob/e18f21e34924d02dd7c330a644149d89fcc38042/src/shared/killall.c#L269

15:10 <davidlt[m]> https://github.com/NetworkBlockDevice/nbd/blob/master/systemd/nbd%40.service.sh.in

15:10 <davidlt[m]> This has RemainAfterExit=yes

15:10 <rwmjones> I think ignore_proc probably needs to be modified to ignore nbd-client? or nbd-client needs to set its first char to @

15:10 <davidlt[m]> It does that
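
The '@' convention mentioned above can be sketched roughly as follows. This is a simplified illustration, not systemd's code: the helper names `has_at_prefix` and `read_cmdline` are hypothetical, and the real ignore_proc performs additional checks that are left out here.

```python
def has_at_prefix(cmdline: bytes) -> bool:
    """Return True if argv[0] of a NUL-separated cmdline starts with '@'.

    Hypothetical helper: it only models the first-character test
    discussed in the log, nothing else.
    """
    return cmdline[:1] == b"@"


def read_cmdline(pid: int) -> bytes:
    """Read /proc/<pid>/cmdline, returning b"" if it cannot be read."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            return f.read()
    except OSError:
        return b""


if __name__ == "__main__":
    import os

    # Check the current process; a normal Python interpreter is not marked.
    print(has_at_prefix(read_cmdline(os.getpid())))
```

A process marked this way would be skipped by the filtering step, which is why the marker alone does not help if a signal is delivered before the filter runs.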

15:11 <davidlt[m]> the problem is kill(-1, SIGSTOP) happens before that filtering :)

15:11 <davidlt[m]> I checked that part :)

15:11 <davidlt[m]> SIGSTOP happens unconditionally for all processes before ignore_proc is used to filter which of them get killed

15:11 <davidlt[m]> So it never gets to that as the connection is already dropped by SIGSTOP
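
The ordering described here can be modelled with a small simulation. It is a sketch under stated assumptions, not systemd's implementation: `simulate_broadcast` and the process table are hypothetical, and the only behaviour modelled is that the blanket SIGSTOP reaches every process while the later per-process kill skips those whose argv[0] starts with '@'.

```python
def simulate_broadcast(procs: dict[str, str]) -> tuple[list[str], list[str]]:
    """Model the two phases discussed in the log.

    procs maps a process name to its argv[0] (hypothetical data).
    Returns (stopped, killed) as sorted lists of process names.
    """
    # Phase 1: kill(-1, SIGSTOP) is not filtered, so everything is stopped.
    stopped = sorted(procs)
    # Phase 2: the per-process kill honours the '@' marker.
    killed = sorted(
        name for name, argv0 in procs.items() if not argv0.startswith("@")
    )
    return stopped, killed


if __name__ == "__main__":
    table = {"nbd-client": "@nbd-client", "sshd": "sshd"}
    stopped, killed = simulate_broadcast(table)
    print("stopped:", stopped)
    print("killed: ", killed)
```

In this model nbd-client is spared by the second phase but still appears in the stopped list, which matches the observation that the connection is gone before the filter is ever consulted.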

15:12 <rwmjones> oh I see

15:12 <davidlt[m]> So this: https://github.com/NetworkBlockDevice/nbd/blob/master/systemd/nbd%40.service.tmpl

15:12 <rwmjones> this is basically systemd being wrong, but maybe nbd-client should ignore SIGSTOP or have a flag to do that

15:12 <davidlt[m]> Is most likely what needs to run in rootfs (not initramfs) to block systemd from touching it

15:13 <davidlt[m]> Nah, this topic was discussed in NBD land for many years, SIGSTOP handling will not happen

15:13 <rwmjones> there must be other X-on-root userspace services which have the same problem surely?

15:14 <rwmjones> I guess nfsroot is in the kernel

15:14 <rwmjones> is there such a thing as iscsiroot?

15:14 <davidlt[m]> No idea, didn't check.

15:14 <davidlt[m]> Another problem is NetworkManager in rootfs doesn't like ip=dhcp, the state gets into "connected (externally)" and it never inits resolv.conf, NTP, blah, ...

15:15 <davidlt[m]> So I am risking a bit and manually doing nmcli con up on a wired connection, but that's very risky.

15:15 <davidlt[m]> If not enough crap is cached from NBD it will hang :)

15:50 zsun has quit [Remote host closed the connection]

17:18 jcajka has quit [Quit: Leaving]

17:42 jimwilson has quit [Quit: Leaving]

18:34 jimwilson has joined #fedora-riscv

20:42 jimwilson has quit [Quit: Leaving]

20:48 defolos has quit [Ping timeout: 250 seconds]

20:48 davidlt[m] has quit [Ping timeout: 240 seconds]

20:48 nomnp[m] has quit [Ping timeout: 240 seconds]

20:48 pierce has quit [Ping timeout: 240 seconds]

20:50 organizedglobals has quit [Ping timeout: 260 seconds]

20:50 davidlt has quit [Ping timeout: 256 seconds]

20:50 CarlosEDP has quit [Ping timeout: 250 seconds]

21:07 CarlosEDP has joined #fedora-riscv

21:08 organizedglobals has joined #fedora-riscv

21:34 jimwilson has joined #fedora-riscv

21:41 pierce has joined #fedora-riscv

22:01 davidlt[m] has joined #fedora-riscv

22:10 nomnp[m] has joined #fedora-riscv

22:22 nomnp[m] has quit [Remote host closed the connection]

22:22 pierce has quit [Remote host closed the connection]

22:22 CarlosEDP has quit [Read error: Connection reset by peer]

22:22 davidlt[m] has quit [Write error: Connection reset by peer]

22:22 organizedglobals has quit [Write error: Connection reset by peer]

22:23 defolos has joined #fedora-riscv

22:32 CarlosEDP has joined #fedora-riscv

22:32 pierce has joined #fedora-riscv

22:32 davidlt[m] has joined #fedora-riscv

22:32 organizedglobals has joined #fedora-riscv

22:32 nomnp[m] has joined #fedora-riscv

22:46 defolos has quit [Quit: Client limit exceeded: 20000]

22:47 pierce has quit [Quit: Client limit exceeded: 20000]

22:47 davidlt[m] has quit [Quit: Client limit exceeded: 20000]

22:48 CarlosEDP has quit [Quit: Client limit exceeded: 20000]