dgilmore changed the topic of #fedora-riscv to: Fedora on RISC-V https://fedoraproject.org/wiki/Architectures/RISC-V || Logs: https://libera.irclog.whitequark.org/fedora-riscv || Alt Arch discussions are welcome in #fedora-alt-arches
pjw has joined #fedora-riscv
zsun has joined #fedora-riscv
zsun has quit [Quit: Leaving.]
<davidlt[m]> djdelorie: Koji killed GCC (timeout was reached)
<davidlt[m]> djdelorie: could you reboot the board, and I would restart the GCC build
<davidlt[m]> LLVM 15 is incoming too
jcajka has joined #fedora-riscv
zsun has joined #fedora-riscv
zsun has quit [Remote host closed the connection]
<davidlt[m]> I will continue with the Perl bootstrap, but it's getting close to the point where I will disable the perl_bootstrap macros and start rebuilding again
<davidlt[m]> Majority of direct perl packages (perl-*) are already in.
esv_ is now known as esv
zsun has joined #fedora-riscv
masami has joined #fedora-riscv
<zsun> davidlt[m]: hi, I see you already have the basic riscv kernel config in your gitea. Do you have a plan for when to submit it to the Fedora kernel-ark?
<zsun> I am thinking that submitting to the Fedora kernel-ark will make it easier for people to collaborate
<davidlt[m]> It's on the TODO list, all the bits are kinda in place (just some minor updates)
<zsun> that's great
<davidlt[m]> There is a bugzilla ticket for that, and I did chat with the kernel maintainer some time ago.
<davidlt[m]> Is that something you would need sooner rather than later?
<davidlt[m]> If there is a need I could prioritize that maybe next week.
<zsun> davidlt[m]: not in a hurry. I am helping tekkamannijia generate the config files to be added to kernel-ark and just realized you already have most of them
<zsun> as this is already on your plan, I'll do my work on top of yours, which is much easier I believe
<davidlt[m]> Cool, OK. I will try to do it sooner rather than later. I'm hoping to finish the Perl bootstrap this week.
<zsun> cool, thanks
masami has quit [Quit: Leaving]
jcajka has quit [Quit: Leaving]
zsun has quit [Quit: Leaving.]
<djdelorie> davidlt[m]: rebooted
<davidlt[m]> djdelorie: understood. resubmitting GCC
<djdelorie> note it was busy building srpms at the time ;-)
<davidlt[m]> djdelorie: not a problem, I will force a build on it :)
<djdelorie> I figured you'd disable general job-getting so it was only running one
<davidlt[m]> Nah, the machine capacity is enough to lock it at one build
<djdelorie> oh good
<davidlt[m]> So I just need to get it to build GCC and no other job will attempt to land.
<davidlt[m]> I believe the capacity for the node is 2.0 and GCC build weight is 6.0
<davidlt[m]> I just use my god mode in Koji to shuffle things manually a bit ;)
<nirik> assign-task --force is fun. ;)
<davidlt[m]> An alternative would be to configure different channels and some logic in the hub, but that's not needed (for now).
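The capacity trick described above can be sketched as follows. This is a hypothetical re-implementation of the scheduler's load check, not Koji's actual kojid code: a host only accepts new tasks while its current load is below its capacity, so one GCC build (weight 6.0) on a capacity-2.0 host keeps anything else from landing.

```shell
# Hypothetical sketch of Koji's host load check (not the real kojid code):
# a host accepts new tasks only while its current task load is below its
# capacity, so a single heavy task effectively "locks" a small host.
host_can_take_task() {
  # usage: host_can_take_task <capacity> <current_load>
  awk -v cap="$1" -v load="$2" 'BEGIN { exit !(load < cap) }'
}

# Idle host (capacity 2.0, load 0.0) accepts the GCC task despite its 6.0 weight...
host_can_take_task 2.0 0.0 && echo "GCC build lands"
# ...but while GCC runs (load 6.0 > capacity 2.0), no other job gets in.
host_can_take_task 2.0 6.0 || echo "no other job lands"
```

The manual "god mode" shuffle mentioned in the log would then be the admin-side `koji assign-task --force` that nirik names below it.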
<davidlt[m]> nirik: it is :)
<davidlt[m]> nirik: your board has processed ~650 tasks already
<nirik> I had a short network offline time yesterday, but I don't think it affected the builder here much
<nirik> excellent. ;)
<nirik> temp is staying pretty good.
<nirik> CPU Temp: +39.8°C
<davidlt[m]> That's very good
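A reading like the one nirik pasted is just the kernel's hwmon value formatted: sysfs exposes temperatures in millidegrees Celsius (the exact path varies per board). A minimal sketch of the formatting:

```shell
# Sketch: lm_sensors-style formatting of a sysfs hwmon reading.
# hwmon exposes millidegrees Celsius, e.g. in
# /sys/class/hwmon/hwmon0/temp1_input (path varies per board).
format_temp() {
  # usage: format_temp <millidegrees>
  awk -v t="$1" 'BEGIN { printf "+%.1f°C\n", t / 1000 }'
}

format_temp 39800   # prints "+39.8°C"
```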
<davidlt[m]> nirik: your board produced 579 RPMs so far
<nirik> great. Glad it's getting use... sorry I took so long to finish getting it setup
<davidlt[m]> nirik: we will need to rebuild our Koji server and slowly bring the infrastructure closer to what upstream Fedora does. Would you be willing to look into that, maybe take the lead and/or at least define the phases of what needs to be done?
<davidlt[m]> This way I could take myself a bit out of the loop on all the things.
<davidlt[m]> Otherwise I will just do whatever I do :) But I prefer for things to happen faster, thus having more people take part would be nice.
<davidlt[m]> neil showed an interest in a few parts like pungi and disk image generation (which we don't do the way Fedora does).
<nirik> davidlt[m]: possibly. I'm pretty busy, but I can see... I can at least write up what plans I think make sense and try to work on it?
<nirik> where's the best place to discuss? mailing list?
<davidlt[m]> Whatever you prefer. Mailing list might be good as we haven't sent too many emails there :)
<neil> i need to make my fedoraproject.org email alias work some day...
<davidlt[m]> We could also use: https://discussion.fedoraproject.org/tag/risc-v
<nirik> neil: are you in more than one group?
<nirik> davidlt[m]: I can arrange that.
<davidlt[m]> If you want I could describe what we have, what we considered, etc.
<davidlt[m]> Realistically our biggest issue is storage. We have ~20TB of fast flash storage, not used in an efficient way. Like we even keep backups on the same drives. We have 100+TB of HDDs that don't have a physical server (never got funding to get it going, but the drives exist).
<davidlt[m]> We are running out of /mnt/koji space. A quick solution would be to pool all 3 NVMes (~20TB of flash) with no redundancy and keep going for 2-3 years. We'd depend on backups, but eventually build a local physical server.
<nirik> my thought was to spin up a hub in aws... and import f38 once the current koji builds it... then we only have 38+ in there and can try and start keeping up with mainline...
<davidlt[m]> Our Koji is located in SF, but the majority of the boards will be at my place. Koji is a data movement challenge. We moved over a petabyte of data. I am in Lithuania, thus that's a long way to ship data without a local cache, but I do have fiber.
<nirik> but that doesn't take much advantage of your current hw
<davidlt[m]> We will run out of storage before we can get to full Rawhide.
<davidlt[m]> Oh yeah, our flash storage was expensive, but I could do repos very fast ;)
* djdelorie wonders if "move to SF" is one of davidlt[m]'s options ;-)
<davidlt[m]> And it had no problems feeding ~170 QEMU builders.
<davidlt[m]> Yeah, but I am not living alone thus not my own personal decision :)
<nirik> the current hub... what space does it have for storage expansion?
<neil> nirik: not yet
<davidlt[m]> The main drive is too small, 256G. <20TB of PCIe x8 NVMe, all slots filled. There is support for 8 SAS/SATA drives IIRC.
<nirik> neil: ok, that's needed for the alias to kick in. I can add you to some group if you like...
<davidlt[m]> Thus I can get it to limp towards full Rawhide by pooling all NVMes to ~20TB (no redundancy, that's fine with me). Update the main drive to a 1-2T NVMe (M.2). Use the same M.2 for the Postgres DB.
<davidlt[m]> Then have a small machine with those 100+TB raw HDD storage with some RAID6 or something and depend on that.
<davidlt[m]> Currently backup repository sits here locally, but that's running out of space too.
<davidlt[m]> I do have a replacement for the external drive: a NAS (50TB).
<davidlt[m]> And at some point I might switch to 2Gbps fiber.
<davidlt[m]> I have 16 Unmatched boards that will be connected to Koji. Still need to buy a few parts.
<davidlt[m]> Our Koji runs on Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
<davidlt[m]> That's 2S, 20C, 40T.
<davidlt[m]> 128G of RAM
<nirik> cool.
<nirik> so it sounds like: limp along to rawhide parity or close... then discuss and figure out plan after that?
<davidlt[m]> Current config is RAID1 with 2 NVMe for /mnt/koji.
<davidlt[m]> and one NVMe is used for backups and the Postgres DB.
<neil> i suspected I'd be a member of at least one group due to signing the contributor agreement, but I think maybe the signed_fpca group isn't working or needed anymore
<nirik> and do you need $'s for the first part now (disclaimer: I don't have any, but I can talk to mgmt)
<nirik> neil: yeah, it has to be one in addition to that.
<davidlt[m]> I can work on the funding.
<neil> I too can inquire about funding via $dayjob and/or Rocky Enterprise Software Foundation
<davidlt[m]> I have some secured for buying the missing parts for Unmatched to get 16 boards connected.
<neil> very nice :)
<davidlt[m]> The rest depends on how you would like to handle our Koji infra :)
<davidlt[m]> So if we decide to limp along for some time until we fully catch up, that's fine. In that case minimal to no investment is needed, probably.
<davidlt[m]> But we still need a plan for afterwards.
<neil> and "wing it" doesn't count as a plan I guess?
davidlt has joined #fedora-riscv
<davidlt[m]> You might want to ping Al Stone at RH about this.
<davidlt[m]> Well, the current stuff only gives us the ability to produce RPMs and disk images, but not in a proper way.
<davidlt[m]> Proper content, but cooked in the old ARMv7 way.
<davidlt[m]> No pungi, koschei, koji-shadow, modularity, RPM sign infra, CI gating, etc.
<davidlt[m]> If I do that then I don't have to look into packages :)
<neil> :)
<nirik> I think a lot of that could come after mainline/primary and isn't so important for secondary.
<nirik> pungi/composes might be good tho.
<davidlt[m]> That's not gonna happen for quite some time.
<davidlt[m]> We have been discussing this for years now. Until the proper standards based hardware arrives riscv64 will not be in the official Fedora koji instance.
<davidlt[m]> So it's always gonna be a secondary arch with a separate koji infra.
<nirik> right, it has to be able to reasonably keep up and have hw that doesn't need a lot of handholding
<djdelorie> "keep up" is a separate problem
<djdelorie> "server grade" is more like remote management etc
<davidlt[m]> Well, if SiFive / Intel P550 with Intel 7nm happens at some point that shouldn't be an issue I guess :)
<nirik> right.
<davidlt[m]> That is/was supposed to be released in 2022.
<nirik> year's not over yet. ;)
<davidlt[m]> StarFive Tech JH7110 will give a nice boost (but 8GB of RAM, way cheaper too).
<davidlt[m]> Servers are some years out, but work is WIP.
<davidlt[m]> Specs are going forward. Ventana is upstreaming their stuff into GNU toolchain.
<davidlt[m]> Until then we support e-waste hardware (not built based on standards) like good old armv7hl :)
<nirik> right, so it might be that we have 1 gen of secondary infra we need to run before mainline... anyhow, I can start a thread on this on the list and we can see if there's consensus
<nirik> our first 32 bit arm hw was the lovely calxeda... 24 (I think) armv7 boards in a 4U chassis
<davidlt[m]> Note, we will run out of storage sooner rather than later, thus some rebuilding will need to happen :)
<davidlt[m]> I actually already ran out of space one night, but I found an extra ~140G I could delete :)
<davidlt[m]> I never touched that. I was an ARMv8/aarch64 boy. All my toys were from other brands :)
<nirik> yep. step1: more storage to limp along more. step2: new hub/storage setup with more room to grow/other services. step3: mainline
<nirik> (at least in my mind)
<davidlt[m]> So we might need a "temporary plan" before anything else.
<davidlt[m]> Basically how to keep it afloat until we catch up on RPMs side.
<nirik> yep
<davidlt[m]> My suggestion: pool all the PCIe x8 NVMes to form a ~20TB /mnt/koji (no redundancy). Update the main drive (main OS + Postgres DB). And/or look into SAS drives for the Postgres DB.
<nirik> that sounds fine to me. Does that mean you will have to reformat and sync data back from backup?
<davidlt[m]> Use those 100+TB RAW HDDs with high redundancy for local backups (we use restic for that).
<davidlt[m]> Would require buying a server for that.
<davidlt[m]> I would consider switching to Btrfs for /mnt/koji, for snapshots.
<davidlt[m]> Yes, either local or remote (directly from home).
<davidlt[m]> It's probably ~6TB.
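The plan being discussed (pool the NVMes with no redundancy, Btrfs for snapshots, restic to the HDD box) could look roughly like the sketch below. The device names and repository path are made up; mdadm RAID0 is just one way to pool drives (LVM striping would work too), so treat this as an illustration of the idea, not a procedure.

```shell
# DESTRUCTIVE sketch, do not run as-is. Device names and paths are examples.

# 1. Pool the three NVMes into one ~20TB stripe (no redundancy, as discussed).
mdadm --create /dev/md0 --level=0 --raid-devices=3 \
    /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1

# 2. Btrfs on top, so /mnt/koji can be snapshotted cheaply.
mkfs.btrfs -L koji /dev/md0
mount /dev/md0 /mnt/koji

# 3. Read-only snapshot, then restic backup of the snapshot to the HDD server.
btrfs subvolume snapshot -r /mnt/koji "/mnt/koji/.snap-$(date +%F)"
restic -r /srv/backup/koji backup "/mnt/koji/.snap-$(date +%F)"
```

Backing up the read-only snapshot rather than the live tree gives restic a consistent view of /mnt/koji even while builds are writing to it.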
<nirik> so, perhaps we should look for funding for a new server (with no or limited drives, since you have a bunch)
<davidlt[m]> This is what we have right now in Koji hub:
<davidlt[m]> Node SN Model Namespace Usage Format FW Rev... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/124131ea766c70e8ce6bb53153a2c945a27f2164)
<davidlt[m]> Yeah, server also needs a new home. The current Colo in Fremont should be replaced.
<davidlt[m]> Alternative would be to build a local cheaper server at my place as that's where majority of boards will be.
<nirik> or aws... ;) (could be in a region near your boards)
<davidlt[m]> I am not sure moving such amounts of data over such distance makes sense.
<nirik> we don't really have a place for a machine right now. I would have said our community cage in RDU, but it's supposedly moving at some point before too long... but I guess it's possible to have something there.
<nirik> hum.
<neil> i'll check w/ some contacts to see if I can scrounge some hw. colo space is a bit harder. presumably something lower power would be nice if it's gonna be in your home
* nirik just remembered a blade center he was trying to find a good use for.
<davidlt[m]> The thing about AWS is that it's not NVMe and moving data out is expensive :)
<nirik> it can be nvme. You just need to specify... :) and amazon is comping our fedora account right now at least...
<neil> 👆
<neil> :)
<neil> i'll also throw out that Rocky has a new build system that might be useful, at least insofar as it doesn't require NFS at all--just object storage
<neil> obviously it needs to go into koji at the end of the day, though
<davidlt[m]> Well as I said before I can leave the best course of action for you both to figure out :)
<davidlt[m]> Of course we could move it to AWS ASAP and just forget about it, or Rocky infra.
<davidlt[m]> djdelorie: check your board
<davidlt[m]> djdelorie: I don't see a ping on Koji side
<djdelorie> right, the usual "won't start kojid on boot" problem
<davidlt[m]> you can modify service file ;)
<davidlt[m]> It's back
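The usual fix for a "won't start on boot" service is a systemd drop-in override. A sketch, assuming the failure is kojid racing the network at boot (the actual cause on this board may differ, and the directive values are illustrative):

```shell
# Sketch: make kojid wait for the network and retry on failure.
# Assumes the boot failure is a network race; adjust if the cause differs.
sudo mkdir -p /etc/systemd/system/kojid.service.d
sudo tee /etc/systemd/system/kojid.service.d/override.conf <<'EOF'
[Unit]
After=network-online.target
Wants=network-online.target

[Service]
Restart=on-failure
RestartSec=30
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now kojid
```

A drop-in keeps the change separate from the packaged unit file, so it survives kojid package updates.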
<neil> nirik want me to start an ether pad or something and we can jot down ideas about steps/plans/phases?
<nirik> neil: if you like. I was gonna start a list thread, but we could use something like that to organize the discussion.
<neil> the openinfra folks infected me with ether pad ☺
<nirik> I need to drop off for a bit... in town today waiting for my car to get repaired and I need to move and find a place with power. ;)
<nirik> back in a bit.
<davidlt[m]> I will be sleeping, but you can leave me any questions in a thread, IRC, or ether pad :)
<neil> :) sounds good. ty
davidlt has quit [Ping timeout: 244 seconds]