sorear changed the topic of #riscv to: RISC-V instruction set architecture | https://riscv.org | Logs: https://libera.irclog.whitequark.org/riscv
<conchuod> I really do want something "server" grade, by which I mean honeycomb lx2 level. So not really server grade at all but a vast step up from SBC/embedded SoCs we have now...
epony has quit [Ping timeout: 268 seconds]
<rneese> sifive unmatched is about it currently
<rneese> if you can find one
<conchuod> sifive unmatched is not even in the same ballpark I'm afraid.
<conchuod> I have one ;)
<conchuod> There's just no way that I am going to waste my time compiling anything significant natively on it!
<rneese> well cross-compile on an ampere
epony has joined #riscv
Gravis has joined #riscv
<another|> intel horse creek might be interesting
<another|> but actual server hw is still a few years away
elastic_dog has quit [Remote host closed the connection]
elastic_dog has joined #riscv
jacklsw has joined #riscv
zjason` is now known as zjason
zjason has left #riscv [ERC (IRC client for Emacs 28.0.50)]
zjason has joined #riscv
frkzoid has joined #riscv
freakazoid332 has quit [Ping timeout: 260 seconds]
jacklsw has quit [Quit: Back to the real world]
jacklsw has joined #riscv
BootLayer has joined #riscv
junaid_ has joined #riscv
Noisytoot has quit [Quit: ZNC 1.8.2 - https://znc.in]
Noisytoot has joined #riscv
vagrantc has quit [Quit: leaving]
junaid_ has quit [Ping timeout: 268 seconds]
junaid_ has joined #riscv
bauruine has joined #riscv
jacklsw has quit [Ping timeout: 264 seconds]
TMM_ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM_ has joined #riscv
___nick___ has joined #riscv
junaid_ has quit [Ping timeout: 248 seconds]
junaid_ has joined #riscv
jmdaemon has quit [Ping timeout: 246 seconds]
junaid_ has quit [Remote host closed the connection]
crest has quit [Read error: Connection reset by peer]
<conchuod> How is cross compiling on an ampere any different to cross compiling on my x86 box?
crest has joined #riscv
billchenchina has quit [Quit: Leaving]
billchenchina has joined #riscv
billchenchina has quit [Read error: Connection reset by peer]
<mps> rneese: I updated https://dev.alpinelinux.org/~mps/riscv64/visionfive.img.xz image with tzdata and newest kernel
<q66> i native-compile with qemu-user and it's several times faster than hifive unmatched :P
<mps> selecting time zone now works with setup-alpine or setup-timezone
<q66> conchuod: it's not
<conchuod> Yah, that's what I was getting at.
<conchuod> It's not grunt I want something like an lx2 for
<conchuod> Or at least, not *pure* grunt
<q66> i want faster hardware for builds
<q66> because qemu-user is a pain in the ass and cross-compiling sucks for distro building
<q66> too much jank, not everything builds that way without major workarounds, and you can't run tests
<q66> qemu-user is *mostly* transparent but still means one can't run tests and rely on their results
<q66> plus it wastes massive amounts of power
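A minimal illustration of the "mostly transparent" point: under qemu-user (with binfmt_misc registration) an unmodified riscv64 binary runs on a foreign host, and even uname(2) reports the emulated machine, which is why ordinary builds work but test results run this way are only as trustworthy as the emulation itself. A sketch, assuming a riscv64 cross toolchain and qemu-riscv64 on an x86 host:

/* uname_demo.c - what the (possibly emulated) process believes it runs on.
 * Built with a riscv64 cross compiler and run through qemu-user on an x86
 * host, utsname.machine still reports "riscv64". */
#include <stdio.h>
#include <sys/utsname.h>

int main(void)
{
    struct utsname u;
    if (uname(&u) != 0) {
        perror("uname");
        return 1;
    }
    printf("sysname: %s\n", u.sysname);
    printf("machine: %s\n", u.machine);   /* "riscv64" under qemu-riscv64 */
    return 0;
}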
<conchuod> Right. Pure grunt on its own is not what is important, it's grunt *plus* all of the above
bauruine has quit [Remote host closed the connection]
<rneese> I build on arm64 for riscv and it's a lot faster than x86
<rneese> you might look at learning docker
<rneese> it's a lot faster for builds
<rneese> I currently build on a mac m1 in docker in 5 min for a cli img
<rneese> and 20 min for a full on desktop img
<q66> you are not making any sense
<q66> whether you build on arm64 or on x86 does not matter, if you have x86 that is faster than the arm it will build faster
<q66> and docker is just a wrapper around namespaces and similar
<rneese> but it speeds things up if done correctly
<q66> it doesn't speed anything up
<rneese> it removes using caches which slow things down
<q66> ????
<rneese> it's all about the settings in a docker
<q66> docker does not speed anything up, all it potentially does is provide a different host environment
<q66> there are many ways you can get that
<q66> it likewise does not help if you want to do native (non-cross) riscv builds
<rneese> explain how I went from a 45 min build to a 30 min build for a full desktop img for riscv64 in our builder in a docker, and that includes a kernel build. on a mac m1
<rneese> I will get the guy who setup our docker to outline it
<q66> compared to doing what
<rneese> same build on a x86 server with 64 gig ram and 24 cores
<rneese> docker on m1 speeds things up a lot
<q66> you are not making any sense
<rneese> 1 min
<q66> you are comparing build speeds between two completely different computers
<q66> docker has nothing to do with it
<rneese> building without docker on arm64 is only 5% faster
<q66> then you need to describe how you build on arm64 without docker
<rneese> not going to fight about this. I will run a full cli build in a docker with time output; it builds a kernel and the base cli img
<q66> docker on mac is pretty much just a vm (it needs to be, because the mac host does not have the linux syscall layer), while on linux it's a bunch of tooling around kernel namespaces
<q66> it never makes anything faster by itself
<q66> if anything the goal is so that it's not much slower than a native run
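To make the point above concrete: on Linux, the container primitive Docker wraps is a handful of kernel calls, and nothing in it can make a compiler run faster. A minimal sketch using only a UTS namespace (Docker additionally uses mount, PID and network namespaces plus cgroups); it needs root or CAP_SYS_ADMIN:

/* ns_demo.c - a "container", minus the tooling: one unshare(2) call.
 * The hostname change is visible only inside the new UTS namespace;
 * the host keeps its own. There is no speed-up mechanism in here. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/utsname.h>

int main(void)
{
    if (unshare(CLONE_NEWUTS) != 0) {      /* new UTS (hostname) namespace */
        perror("unshare");
        return 1;
    }
    if (sethostname("container", 9) != 0) {
        perror("sethostname");
        return 1;
    }
    struct utsname u;
    uname(&u);
    printf("hostname inside namespace: %s\n", u.nodename);
    return 0;
}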
BootLayer has quit [Quit: Leaving]
<rneese> [🐳|🌱] Runtime [ 3 min ]
<q66> if something is faster inside, it can be due to various factors, e.g. the toolchain within the container environment being different
<q66> or the tooling having some overhead within a mac host
<rneese> and I have a full cli img ready to go
<conchuod> Also
<conchuod> >same build on a x86 server with 64 gig ram and 24 cores
<conchuod> Is the m1 faster than said server?
<conchuod> Just out of curiosity
<q66> it could be depending on the cores
<rneese> the m1 is a mac mini 8 cores 16 gig ram
<q66> the build does not always use all of them, if you are building a whole system image it likely spends considerable time in i/o and single-threaded stuff like running configure scripts and installing
<q66> mac arm hardware is supposed to have pretty good single-core performance
<q66> in any case saying that docker somehow on its own magically improves performance or is even capable of doing so is completely off the mark and misleading
<conchuod> I don't have any decent arm64 hardware, but I would be pretty sure that config for config my x86 box is going to be faster than anything shy of several 1000$
<conchuod> Certainly it'd be faster than running docker in a vm from macos
<q66> conchuod: it really depends, like, lots of x86 server hw is xeon systems, say, 2 cpus each 12 cores at 2 ghz
<q66> and then the performance can be pretty variable
<q66> anything relying on single-threaded perf is not gonna be great because low clocks
<conchuod> Yah, I dunno what x86 box rneese is using
<q66> and then if you have a multisocket system you can have e.g. NUMA memory affinity stuff screw it up
<conchuod> But my x86 box is pretty good, so I don't think building on arm64 would serve any purpose other than making things clunkier.
<q66> probably not yeah
<q66> there is no inherent advantage to using arm64, might as well use the fastest hw you have regardless of architecture
<q66> if you are cross-compiling anyway
<q66> if doing emulation x86 is gonna be the fastest again because qemu's tcg jit is usually fastest on x86 hosts
<q66> it's definitely quite a bit faster than on my power9 boxes for instance
<rneese> CPU: Intel Xeon E5-2680 v4 (40) @ 2.399GHz Memory: 469MiB / 39164MiB
<rneese> this is one of the 2 builders
<rneese> the other builder is in use and access blocked
<q66> yeah so it's an old xeon with low clocks and multisocket NUMA config
<rneese> that one is a xeon, the other is a diff unit
<rneese> I just built a xfce riscv img in a docker
<rneese> [🐳|🌱] Runtime [ 5 min ]
<rneese> the whale is how we mark we are using docker
<conchuod> tbh, without knowing *exactly* what was compiled/recompiled that 5 mins is pretty meaningless
<mps> building the kernel on an M1 Pro with linux installed is really fast, 3 minutes
<mps> on Ampere Altra it takes a lot more time
<conchuod> The apple arm64 stuff is great, I don't think there's much point denying that.
<conchuod> But whether it is or not is mostly unrelated to wanting native build infra
<mps> conchuod: right, I agree
<mps> I expected that building riscv with qemu-user would be significantly faster on arm64 than on x86_64, but in my test no big difference
<q66> llvm build in my distro in qemu-user on ryzen 5950x still takes all night and it's not even lto
<q66> on the same cpu for x86 it takes like 30 mins and that's with lto :)
<q66> on my power9 boxes it's between an hour and two, depending on the cpu
<conchuod> And natively on a 5950x that takes like 30, yeah
<q66> on the aarch64 builder i have for arm native packages (which is ampere emag) it takes about 4
<qwestion> is there a sunxi-like chart for mainlining effort of c910 (and others?)? how can one get paid to contribute to upstreaming foss drivers and such?
<q66> it used to take longer, then i spent a bunch of time digging around musl and ended up patching in scudo as the default libc allocator
<q66> and everything got faster
<conchuod> smaeul: or icenowy might know about that sort of thing qwestion
<mps> q66: so scudo is faster than the default musl allocator, iiuc
<conchuod> qwestion: do people get paid for upstreaming that stuff?
<jn> via employment at companies like Bootlin, maybe
<qwestion> does icenowy chat here?
<conchuod> Pretty sure I've seen them here, although I may be mis-remembering
<rneese> sorry got called away found a bug in our builder for desktops
<rneese> bbl
pedja has joined #riscv
Starfoxxes has quit [Ping timeout: 265 seconds]
Starfoxxes has joined #riscv
BootLayer has joined #riscv
<smaeul> conchuod: there have been a few crowdfunded or manufacturer-sponsored projects in the past, like the video decoder driver, but otherwise nobody doing the upstreaming gets paid
<smaeul> qwestion: very few people have C910 hardware (and I have none), so I don't think any community upstreaming effort has started. you could ask Guo Ren if Alibaba has plans to do any first-party upstreaming
<q66> mps: significantly, yeah
<q66> especially with threads
<q66> in musl i have a bit of a special configuration that is tuned to be even faster and somewhat leaner
<q66> which implements the configuration and frontend
<mps> q66: ah. chimera linux uses it by default?
<q66> yeah i made it default because it performed universally well
<mps> interesting. maybe worth a look at whether it could be added to alpine
<q66> i have it default to primary32 allocator (which is normally only used on 32-bit archs in default configs, but primary64 reserves tons of virtual memory and i didn't find the performance to be much if at all better)
<q66> and disabled secondary cache
<q66> because secondary cache was causing some janky behavior with qemu-user
<q66> actually using primary64 with qemu-user is kind of a nope too
ldevulder has joined #riscv
<q66> because reserving 130GB of virtual memory means each qemu startup takes like 500ms
<q66> which results in degraded performance with anything emulated
<q66> the number can be tuned, but i couldn't get it under 8GB without it becoming unreliable
<q66> at 8GB the cost wasn't huge but i still didn't like it
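For readers unfamiliar with "reserving virtual memory": allocators like scudo's primary64 mmap a huge PROT_NONE region up front and commit pages into it later. A hedged sketch of just that reservation step (the size is illustrative, not scudo's actual region size); run natively it is close to free, and the startup cost described above comes from doing the equivalent through qemu-user:

/* reserve_demo.c - time a large address-space reservation of the kind a
 * primary64-style allocator performs at startup. PROT_NONE + MAP_NORESERVE
 * means no physical memory is committed yet. */
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <sys/mman.h>

int main(void)
{
    const size_t sz = (size_t)8 << 30;    /* 8 GiB, illustrative */
    struct timespec a, b;

    clock_gettime(CLOCK_MONOTONIC, &a);
    void *p = mmap(NULL, sz, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    clock_gettime(CLOCK_MONOTONIC, &b);

    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    double ms = (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
    printf("reserved %zu GiB in %.3f ms\n", sz >> 30, ms);
    munmap(p, sz);
    return 0;
}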
<mps> q66: thanks for explanation
<q66> mps: alpine is currently using scudo with lld
<q66> because standard allocator performance makes linking take 3x as long
<q66> but it uses the .so that comes with llvm
<q66> so it's unusable on arm apparently
<conchuod> smaeul: yah, I figured none of it was first party - but I figured at least some of the driver support would be via the various linux consultancy places
<q66> i successfully use my in-libc version on all archs
<q66> which is currently aarch64, ppc64le, riscv64 and x86_64
<mps> q66: yes, I see scudo-malloc pkg in alpine but never looked for what is used
<q66> replacing allocator in musl is kind of a tricky thing btw, you can't do it with just any allocator
<q66> most of them seem to rely on thread_local being functional
<mps> hm, right
<q66> which is not the case in musl, libc.so is not allowed to contain tls because the dynamic linker does not set it up till later (and the tls is itself malloc'd) and does not handle ELF TLS for itself
<q66> scudo can be made to work because it's very configurable at build-time and consists of several components that you can mix and match
<q66> so all it takes is implementing a custom tsd registry that does not rely on thread_local
<q66> in my case i just mmap the memory for it and then store a pointer in each struct pthread
<mps> aha, looks too complicated for me
<q66> well, i allocate a 64k-sized chunk and then split it into several tsd's and give them out/recycle as needed
<q66> 64k because that's the largest standard page size you will run into
<q66> and mmap can only deal with pages
<mps> yes
<q66> the actual size of the registry struct is maybe 6k or so
<q66> might as well pack it and reduce waste
<mps> q66: thanks again for explanation. now I understand from "birds point of view"
jacklsw has joined #riscv
<q66> (and recycling means not having to map and unmap all the time)
<q66> np
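The TSD-registry idea above, sketched in C under the stated constraint that libc.so cannot rely on ELF TLS: mmap one 64 KiB chunk, carve it into slots, hand a slot to each thread, and recycle slots instead of unmapping. This is a simplified, single-threaded illustration (no locking, a single chunk, and a plain pointer standing in for the slot that musl would keep in struct pthread); names and sizes are illustrative, not the actual scudo or musl code:

/* tsd_registry_sketch.c - shape of a TSD registry that avoids thread_local. */
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define CHUNK_SIZE  (64 * 1024)   /* largest common page size, per the log */
#define TSD_SIZE    512           /* illustrative per-thread state size */
#define TSD_COUNT   (CHUNK_SIZE / TSD_SIZE)

struct tsd {
    struct tsd *next_free;        /* free-list link while the slot is unused */
    /* ... the allocator's per-thread caches would live here ... */
};

static struct tsd *free_list;     /* a real registry would lock around this */

static int registry_init(void)
{
    unsigned char *chunk = mmap(NULL, CHUNK_SIZE, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (chunk == MAP_FAILED)
        return -1;
    for (size_t i = 0; i < TSD_COUNT; i++) {
        struct tsd *t = (struct tsd *)(chunk + i * TSD_SIZE);
        t->next_free = free_list;
        free_list = t;
    }
    return 0;
}

/* Thread creation: hand out a slot (the pointer would be stored in the
 * thread's struct pthread). NULL means the chunk is exhausted and another
 * one would be mapped. */
static struct tsd *tsd_acquire(void)
{
    struct tsd *t = free_list;
    if (t) {
        free_list = t->next_free;
        memset(t, 0, TSD_SIZE);
    }
    return t;
}

/* Thread exit: recycle the slot instead of unmapping it. */
static void tsd_release(struct tsd *t)
{
    t->next_free = free_list;
    free_list = t;
}

int main(void)
{
    if (registry_init() != 0)
        return 1;
    struct tsd *t = tsd_acquire();
    printf("got tsd slot at %p\n", (void *)t);
    tsd_release(t);
    return 0;
}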
jack_lsw has joined #riscv
jack_lsw has quit [Client Quit]
john1 has joined #riscv
epony has quit [Ping timeout: 268 seconds]
aburgess has quit [Ping timeout: 272 seconds]
john1 has quit [Quit: Leaving]
epony has joined #riscv
jacklsw has quit [Read error: Connection reset by peer]
aburgess has joined #riscv
Andre_H has joined #riscv
EchelonX has joined #riscv
___nick___ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
___nick___ has joined #riscv
___nick___ has quit [Client Quit]
___nick___ has joined #riscv
Andre_H has quit [Ping timeout: 252 seconds]
wingsorc has quit [Quit: Leaving]
wingsorc has joined #riscv
vagrantc has joined #riscv
Gravis_ has joined #riscv
Gravis has quit [Ping timeout: 268 seconds]
BootLayer has quit [Quit: Leaving]
cousteau has joined #riscv
Noisytoot has quit [Excess Flood]
Noisytoot has joined #riscv
___nick___ has quit [Ping timeout: 272 seconds]
<rneese> anyone here using extlinux and not grub that has an img we can look at?
<rneese> we are debating extlinux vs grub
<rneese> and how to set extlinux correctly with the partition layout we have
<rneese> or is it better to go extlinux/grub boot
<mps> rneese: I posted my mmc image link to you, and today posted an updated one
<rneese> is that extlinux
<rneese> or grub
<rneese> I forget
<rneese> I have it and will boot it shortly, cleaning up the living room mess from xmas crap
<mps> extlinux, and I updated it today with new kernel and added tzdata
<rneese> ok
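For context on the extlinux question above: with U-Boot's distro boot, "extlinux" usually means nothing more than a syslinux-style text file the firmware scans for at extlinux/extlinux.conf on the boot partition, so no second-stage loader like GRUB is required. A hedged sketch of such a file; every path, device name and filename below is illustrative (and paths are relative to the partition U-Boot scans, so drop the /boot prefix if /boot is its own partition), not taken from mps's image:

default alpine
menu title riscv64 boot menu
timeout 30

label alpine
    kernel /boot/vmlinuz
    initrd /boot/initramfs
    fdtdir /boot/dtbs
    append root=/dev/mmcblk0p2 rootwait rw console=ttyS0,115200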
<rneese> I hate cleaning up after holidays
<rneese> just a pain to put everything back in the attic
pabs3 has quit [Remote host closed the connection]
pabs3 has joined #riscv
pabs3 has quit [Read error: Connection reset by peer]
pabs3 has joined #riscv
* rneese velcro's pabs3 to the wall
<solrize> what is DAP debugging?
<solrize> ah nm
<rneese> debug adapter protocol
<solrize> yeah i'll check into it thanks
<solrize> like remote gdb?
<rneese> normally used on jtag ports
EchelonX has quit [Quit: Leaving]
<solrize> i hope there is a gdb target for it
<solrize> will risc-v parts have it or is it arm specific?
<rneese> why do you need the debug port and not just the uart ?
<rneese> do you have a frozen board?
prabhakarlad has joined #riscv