dgilmore changed the topic of #fedora-riscv to: Fedora on RISC-V https://fedoraproject.org/wiki/Architectures/RISC-V || Logs: https://libera.irclog.whitequark.org/fedora-riscv || Alt Arch discussions are welcome in #fedora-alt-arches
bkeys has joined #fedora-riscv
jcajka has joined #fedora-riscv
<rwmjones> morning
masami has joined #fedora-riscv
masami has quit [Quit: Leaving]
<davidlt[m]> rwmjones: any ideas if this was fixed? https://github.com/NetworkBlockDevice/nbd/issues/51
davidlt has joined #fedora-riscv
zsun has joined #fedora-riscv
<rwmjones> davidlt[m]: looking
<davidlt[m]> From what I understand, there is nothing to be done here
<davidlt[m]> systemd will do kill(-1, SIGSTOP), which will trigger the NBD client to drop the connection
<davidlt[m]> That generates some IO errors (very few) until systemd is completely finished unmounting
<rwmjones> it's not one I've seen, but definitely dracut + nbdroot is generally not in a good place and needs fixing
<davidlt[m]> My initramfs is BusyBox + NBD client, not dracut + systemd, so I assume it works differently
<rwmjones> it's also true that the kernel client cannot recover from kill (or dropped connection)
<davidlt[m]> I assume that this particular issue is that I don't use systemd in initramfs
<davidlt[m]> Yeah, I have another problem
<rwmjones> if you're using busybox then you've got better control over things
<davidlt[m]> My initramfs doesn't have NetworkManager, and NetworkManager in the rootfs doesn't like an externally configured connection
<rwmjones> I would say that if you're shutting down, you don't care about nbd-client, just that you unmount the filesystems before shutdown
<rwmjones> if you really want to be "clean" use nbd-client -d
<davidlt[m]> So I have to quickly reconnect while running from NBD :)
<rwmjones> but otherwise as long as the fses are unmounted, nothing bad can happen
<davidlt[m]> I am on NBD root :)
<rwmjones> but this is on the shutdown path?
<davidlt[m]> systemd syncs the filesystems, switches them to read-only (IIRC), then kills processes, then syncs and unmounts
<davidlt[m]> There are two issues, this SIGSTOP is at shutdown, doesn't seem to be a big issue right now
<davidlt[m]> NetworkManager is at the boot
<rwmjones> the unmount will send NBD_CMD_FLUSH which will write everything to persistent storage
<rwmjones> who is sending SIGSTOP?
<davidlt[m]> systemd
<rwmjones> but I thought there's no systemd here?
<davidlt[m]> Basically this still exists: https://bugs.devuan.org/cgi/bugreport.cgi?bug=227
<davidlt[m]> Initramfs doesn't have systemd, but the rootfs has.
<rwmjones> I see
<davidlt[m]> nbd has systemd services, I assume somehow those start in rootfs and tell systemd not to send anything
<davidlt[m]> I think I saw that, but I don't know what would start them (I assume systemd based on detecting NBD drive connected)
<rwmjones> so I guess what might happen is nbd-client receives SIGSTOP and (correctly) stops, but that deadlocks something which is trying to write to disk
<rwmjones> the solution is don't send SIGSTOP to nbd-client :-/
<davidlt[m]> No it works, producing 5 IO errors :)
<davidlt[m]> systemd does that unconditionally
<rwmjones> for some definition of "works" :-)
<davidlt[m]> This has RemainAfterExit=yes
<rwmjones> I think ignore_proc probably needs to be modified to ignore nbd-client? or nbd-client needs to set its first char to @
<davidlt[m]> It does that
<davidlt[m]> the problem is that kill(-1, SIGSTOP) happens before that filtering :)
<davidlt[m]> I checked that part :)
<davidlt[m]> SIGSTOP happens unconditionally for all processes before ignore_proc is used to decide which of them to kill
<davidlt[m]> So it never gets to that, as the connection is already dropped by SIGSTOP
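The ordering described above can be sketched as a small simulation. This is a hedged model, not systemd's code: `Proc`, `is_protected`, and `broadcast` are hypothetical names, the '@' rule is taken from the discussion (a process whose argv[0] starts with '@' is skipped by the filter), and the final resume step is an assumption that is not stated in the discussion. It illustrates why the '@' marker does not help here: the blanket stop reaches every process before the filter is consulted.

```python
from dataclasses import dataclass, field


@dataclass
class Proc:
    """Hypothetical stand-in for a running process."""
    pid: int
    argv0: str
    received: list = field(default_factory=list)


def is_protected(proc):
    # Models the ignore_proc-style filter from the discussion:
    # a process whose argv[0] starts with '@' is left alone.
    return proc.argv0.startswith("@")


def broadcast(procs, sig="SIGTERM"):
    # Step 1: models kill(-1, SIGSTOP). No filtering happens here,
    # so even '@'-marked processes are stopped.
    for p in procs:
        p.received.append("SIGSTOP")
    # Step 2: the filter is only applied when the real signal is sent.
    for p in procs:
        if not is_protected(p):
            p.received.append(sig)
    # Step 3 (assumption): everything is resumed afterwards.
    for p in procs:
        p.received.append("SIGCONT")


procs = [Proc(100, "@nbd-client"), Proc(101, "sshd")]
broadcast(procs)
for p in procs:
    print(p.pid, p.argv0, p.received)
```

In this model the '@'-marked client is spared the terminating signal but still receives the stop, which matches the behaviour reported above.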
<rwmjones> oh I see
<rwmjones> this is basically systemd being wrong, but maybe nbd-client should ignore SIGSTOP or have a flag to do that
<davidlt[m]> It is most likely what needs to run in rootfs (not initramfs) to block systemd from touching it
<davidlt[m]> Nah, this topic was discussed in NBD land for many years, SIGSTOP handling will not happen
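One reason SIGSTOP handling is a dead end in any userspace client: POSIX does not allow SIGSTOP (or SIGKILL) to be caught, blocked, or ignored. A minimal sketch that probes this from Python follows; `can_ignore` is a hypothetical helper name, and the check is assumed to run in the main thread on a POSIX system.

```python
import signal


def can_ignore(signum):
    """Return True if the OS lets this process ignore signum."""
    try:
        previous = signal.signal(signum, signal.SIG_IGN)
    except OSError:
        # The kernel refuses to change the disposition (EINVAL).
        return False
    # Put the earlier disposition back so the probe has no side effects.
    if previous is not None:
        signal.signal(signum, previous)
    return True


print("SIGTERM ignorable:", can_ignore(signal.SIGTERM))
print("SIGSTOP ignorable:", can_ignore(signal.SIGSTOP))
```

On Linux this is expected to report that SIGTERM can be ignored and SIGSTOP cannot, so a client cannot simply opt out of the stop on its own.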
<rwmjones> there must be other X-on-root userspace services which have the same problem surely?
<rwmjones> I guess nfsroot is in the kernel
<rwmjones> is there such a thing as iscsiroot?
<davidlt[m]> No idea, didn't check.
<davidlt[m]> Another problem is NetworkManager in rootfs doesn't like ip=dhcp, the state gets into "connected (externally)" and it never inits resolv.conf, NTP, blah, ...
<davidlt[m]> So I am risking a bit and manually doing nmcli con up on a wired connection, but that's very risky.
<davidlt[m]> If not enough crap is cached from NBD it will hang :)
zsun has quit [Remote host closed the connection]
jcajka has quit [Quit: Leaving]
jimwilson has quit [Quit: Leaving]
jimwilson has joined #fedora-riscv
jimwilson has quit [Quit: Leaving]
defolos has quit [Ping timeout: 250 seconds]
davidlt[m] has quit [Ping timeout: 240 seconds]
nomnp[m] has quit [Ping timeout: 240 seconds]
pierce has quit [Ping timeout: 240 seconds]
organizedglobals has quit [Ping timeout: 260 seconds]
davidlt has quit [Ping timeout: 256 seconds]
CarlosEDP has quit [Ping timeout: 250 seconds]
CarlosEDP has joined #fedora-riscv
organizedglobals has joined #fedora-riscv
jimwilson has joined #fedora-riscv
pierce has joined #fedora-riscv
davidlt[m] has joined #fedora-riscv
nomnp[m] has joined #fedora-riscv
nomnp[m] has quit [Remote host closed the connection]
pierce has quit [Remote host closed the connection]
CarlosEDP has quit [Read error: Connection reset by peer]
davidlt[m] has quit [Write error: Connection reset by peer]
organizedglobals has quit [Write error: Connection reset by peer]
defolos has joined #fedora-riscv
CarlosEDP has joined #fedora-riscv
pierce has joined #fedora-riscv
davidlt[m] has joined #fedora-riscv
organizedglobals has joined #fedora-riscv
nomnp[m] has joined #fedora-riscv
defolos has quit [Quit: Client limit exceeded: 20000]
pierce has quit [Quit: Client limit exceeded: 20000]
davidlt[m] has quit [Quit: Client limit exceeded: 20000]
CarlosEDP has quit [Quit: Client limit exceeded: 20000]