<davidlt[m]> rwmjones: seems five is stuck, probably needs a reboot
<davidlt[m]> jive probably too
<davidlt[m]> both boards seem to have had their last ping 20+ minutes ago
<davidlt[m]> yeah, double checked, both boards went offline within 1-2 minutes of each other
<rwmjones> davidlt[m]: hey I'm going to the office later today, will have a look
<rwmjones> actually they seem ok ...
<rwmjones> I'll try restarting kojid remotely
<davidlt[m]> yeah, it's back
<davidlt[m]> it picked up the job
<rwmjones> looks like kojid had crashed (python exceptions) on both
<davidlt[m]> you can modify the kojid systemd unit file to auto-restart on failure after 10 minutes or so
<davidlt[m]> This is what I have:
<davidlt[m]> RestartSec=10min
<davidlt[m]> Restart=always
<davidlt[m]> So it will keep trying to restart every 10 minutes on failure
<davidlt[m]> Sometimes kojid "likes" to fail due to networking issues (local or on the server side, e.g. timeouts under high server load)
<davidlt[m]> It's annoying enough to manually restart kojid with 170 QEMU instances. Much easier to just modify kojid.service to do that on failure.
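[editor's note] The two settings quoted above belong in the [Service] section of the unit. A minimal sketch of one way to apply them without editing the packaged kojid.service, assuming a systemd drop-in is acceptable on the builders; the drop-in file name restart.conf is hypothetical and was not mentioned in the channel:

```ini
# /etc/systemd/system/kojid.service.d/restart.conf
# (hypothetical drop-in path; only the two settings below come from the log)
[Service]
# Restart kojid whenever it exits, including after a clean exit.
# Restart=on-failure would limit this to unclean exits only.
Restart=always
# Wait 10 minutes before each restart attempt.
RestartSec=10min
```

After creating or changing the file, `systemctl daemon-reload` makes systemd reread its unit files.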
<rwmjones> ok, made that change, but didn't restart kojid to pick up the change because it seems both nodes are busy with jobs
<davidlt[m]> yeah, I pushed some rebuilds