<enyc>
but wondering if there are names/terms/... I should be looking for
mlw has quit [Ping timeout: 240 seconds]
Andre_Z has joined #riscv
mlw has joined #riscv
KREYREN has quit [Remote host closed the connection]
KREYREN has joined #riscv
billchenchina has quit [Remote host closed the connection]
wingsorc has quit [Remote host closed the connection]
billchenchina has joined #riscv
bitoff has joined #riscv
vgtw has quit [Ping timeout: 246 seconds]
billchenchina has quit [Ping timeout: 268 seconds]
vgtw has joined #riscv
bjoto1 has quit [Quit: WeeChat 4.0.5]
mlw has quit [Ping timeout: 268 seconds]
mlw has joined #riscv
bitoff has quit [Ping timeout: 245 seconds]
MaxGanzII__ has quit [Remote host closed the connection]
bitoff has joined #riscv
Tenkawa has joined #riscv
vgtw has quit [Ping timeout: 256 seconds]
bitoff has quit [Ping timeout: 268 seconds]
vgtw has joined #riscv
jacklsw has quit [Ping timeout: 260 seconds]
davidlt has joined #riscv
MaxGanzII__ has joined #riscv
Stat_headcrabed has joined #riscv
arcade_droid has quit [Quit: Quit]
arcade_droid has joined #riscv
junaid_ has joined #riscv
hightower2 has joined #riscv
<muurkha>
https://arxiv.org/abs/2309.00381 "Is RISC-V ready for HPC prime-time: Evaluating the 64-core Sophon SG2042 RISC-V CPU" says, "Leveraging the RAJAPerf benchmarking suite, we discover that on average, the SG2042 delivers, per core, between five and ten times the performance compared to the nearest widely available RISC-V hardware. We found that, on average, the x86 high performance CPUs under test outperform
<muurkha>
the SG2042 by between four and eight times for multi-threaded workloads, although some individual kernels do perform faster on the SG2042."
<muurkha>
this is a pretty huge development
prabhakar has quit [Quit: Connection closed]
Stat_headcrabed has quit [Quit: Stat_headcrabed]
prabhakar has joined #riscv
prabhakarlad has joined #riscv
<courmisch>
I read it as: it's good progress, but it needs to get even better, and RVV 1.0
<gurki>
it can barely compete with sandybridge with 4 cores. while that is an impressive feat thats not exactly competitive (yet :) )
<gurki>
given how much effort went into optimizing x86 id say theres hope
<muurkha>
courmisch: sounds like a fair summary
<courmisch>
also the FOSDEM talk about my RVV 0.7.1 adventures was rejected
<gurki>
sorry to hear that, im always curious about these
<muurkha>
me too
<courmisch>
gurki: IIUC, RISC has an advantage over CISC for scalar, but the flip side is that x86 is actually pretty damn good at SIMD
<courmisch>
I don't know much about HPC, but FFmpeg typically gets much better speed-ups from SIMD on x86 than on Arm or RV
<courmisch>
and I suspect it is in no small part because x86 just sucks at scalar
<muurkha>
SIMD and especially scatter-gather necessarily departs rather far from by-the-book RISC
<gurki>
ffmpeg is quite the can of worms
<gurki>
like. its mostly a frontend to various libraries doing stuff
<courmisch>
I mean the own stuff, not the wrappers obviously
<gurki>
intel has tried to optimize parts for avx512 when it was new and they had a pretty rough time
<gurki>
i assumed so, but still ... all these wrappers make it pretty .. hard
<courmisch>
would be interesting to see how dav1d fares, as it does not have the history of FFmpeg
<courmisch>
gurki: FFmpeg very much welcomes internalised decoders. It's just that nobody wants to write them
<gurki>
:D
<muurkha>
well, not *nobody*, just, not that many people
<gurki>
thats a nice way to put things
<courmisch>
Well, even AV-1 got a separate decoder, just so that it could be MIT instead of LGPL
<muurkha>
people who optimize assembly code for fun are not nonexistent, just relatively sparse in number, gurki
<gurki>
muurkha: i am such a person
<gurki>
:3
<courmisch>
that's actually a separate problem from writing the internal decoder
<gurki>
but honestly i dont want to deal with it for rv until that rvv stuff is sosrted
<muurkha>
gurki: oh, I thought you only did it for money! welcome, brother ;)
<gurki>
sorted*
<courmisch>
expectation is that somebody writes a C decoder, and then other people write DSP functions for various archs
<courmisch>
but nowadays, there's nobody to write the C decoders anymore
<sorear>
you'd think that by now there'd be decent open hardware designs for av1 and vp9
<courmisch>
gurki: what do you mean by "sorted RVV" ?
<courmisch>
sorear: hardware, yes, open...
<courmisch>
not many people write open hardware, and I guess that they are not into video DSPs
<muurkha>
courmisch: there was a long period of time where nobody wrote free-software C decoders either. like, Quicktime launched in 01991, but MPlayer wasn't usable until about 02005
<gurki>
sorear: theres a bunch of dw stuff. youll have a hard time convincing ppl to move off it though
vgtw has quit [Ping timeout: 256 seconds]
<gurki>
since. well. nobody ver got fired for using dw parts
<gurki>
ever*
<muurkha>
dw?
<gurki>
synopsys designware
<courmisch>
muurkha: I was watching IPTV in my dorm using OSS in 2003 already
<muurkha>
aha
<gurki>
quite the collection of parts you can plug into your asic
<courmisch>
that was MPEG-2 with MMX optimisations
<courmisch>
or maybe SSE2 already, not sure
<gurki>
if you would proceed to fab a riscv youd probably fab something thats about 2/3 dw parts
<muurkha>
courmisch: yeah, I played videos with mpeg_play_motif in 01995, too
<muurkha>
but the free-software video player scene was close to unusable until about 02005
<gurki>
(assuming youd have any reasonable number of peripherals)
<muurkha>
missing codecs, unusable user interfaces, buggy crap
<courmisch>
err, in 1995 no reasonable computer would decode video at any useful resolution, even with proprietary software
<muurkha>
I was using an Indigo2, which was not a reasonable computer
<sorear>
i don't think it helps that the av1 bitstream spec is half the length of c++11 and just as dense
<muurkha>
and it was postage-stamp-sized resolution
<gurki>
courmisch: i want to be able to write asm for avx2. or rv42. i dont care what comes after rv, but i really dont want to deal with two
<gurki>
since that is really, really painful on the asm level
<courmisch>
sorear: probably not indeed
<sorear>
what's rv42?
<gurki>
sorear: a riscv vector extension equivalent to avx2
<courmisch>
42 is a typo of 10 if you are one row too high on the numpad
<muurkha>
courmisch: in 01998 I was watching full-screen video on my Windows NT workstation; a Star Trek screensaver that played a bunch of video clips from the latest movie. but that wasn't a reasonable computer either; it was a dual-processor Pentium Pro because I was developing hyperspectral image processing algorithms for NAIC
<gurki>
i made that name up, sorry, i thought its obvious
<muurkha>
it took a while for the free-software world to catch up
<sorear>
gurki: a hypothetical future extension, like zvfh?
<courmisch>
Zvfh is supported by the K230 board right new to me
<gurki>
yes
<courmisch>
it's very much not hypothetical
<courmisch>
s/new/next/
<sorear>
naic, nice
<gurki>
what are your impressions thus far?
<courmisch>
muurkha: sure but then again, there was not much digital video content then, other than video game cutscenes and in some countries, VCDs
<muurkha>
courmisch: there was a lot of porn
<courmisch>
in fact, IPTV pretty much started with OSS
<muurkha>
also there was CU-SeeMe
<courmisch>
well, I'm too young for those
<muurkha>
which I didn't use, but from my friends who did, there was a lot of porn there too
<muurkha>
"I asked him to show me his penis and he did!"
<courmisch>
gurki: I don't really follow. What's the problem with RVV 1.0 now?
<gurki>
i cannot write optimized asm "for riscv"
<muurkha>
AVI had a feature that allowed you to loop a small number of frames for a while, which made it possible to download useful amounts of porn even over a 56k modem with early-90s codecs
<courmisch>
why not?
<gurki>
since ppl use 0.7 and 1.0 now and probably will for quite a while
<another|>
someone got a trustworthy source on the RVV version on C920?
<courmisch>
who uses 0.7?
<courmisch>
another|: 0.7.1
<sorear>
the c920 and the c910 are the same core
<another|>
courmisch: I was hoping for a more quotable source ;)
<muurkha>
the thing I most remember downloading to watch with mpeg_play_motif in the 90s were videos produced by genetic algorithms according to user votes
<courmisch>
sorear: isn't the C920 much beefier than the C910?
<gurki>
courmisch: unfortunately the "big" stuff like the sg2042
<sorear>
I first heard c920 described as a marketing name for c910 with the vector option but my information is sparse and inconsistent, if you have better information
<courmisch>
apart from T-Head/RevyOS kernel and libc code, I'm not seeing much 0.7.1 software optimisations
<gurki>
i feel like we had this discussion already though :3
<courmisch>
sorear: but C910 in LPi4a has vectors too
<sorear>
hence, inconsistent
<courmisch>
another|: the arxiv link above
<sorear>
nothing I've heard paints sg2042 as anything other than 16x th1520 with an inadequately scaled memory system
<courmisch>
it's published in ACM so it's as quotable as it gets
<gurki>
i feel like i should try to get access to one of them to play around a little :)
<another|>
thx =)
<muurkha>
sure
<courmisch>
didn't you just say you wanted to avoid the versioning mess? playing with C920 seems like the last thing you should be doing then
<muurkha>
might be the only way to learn though
<gurki>
courmisch: i wouldnt want to write asm, but i would enjoy throwing a few random benchmarks at ath sg2042
<gurki>
that*
<courmisch>
muurkha: what's wrong with K230 boards that you can't learn with them?
<courmisch>
sure won't build an HPC cluster with them, but it runs Linux
<gurki>
the hard part about simd is keeping the cores fed. so you kinda need a few cores to get a reasonable metric whether your idea was good
<muurkha>
courmisch: sure, if your objective is to debug your compiler and get things to work at all, Kendryte chips should be fine
<gurki>
you can grab that via performance counters and performance modelling, but nothing beats trying it on the real iron
<muurkha>
but if you're trying to get a handle on performance bottlenecks and solutions to them, you probably want something that's as close as possible to the performance characteristics of the real thing
<gurki>
^^ thats a way better phrasing of what i tried to say
<courmisch>
until there's hardware with Zicbop, not sure you can do much about that part
<muurkha>
well, anything you can do today will still be limited
KREYREN has quit [Remote host closed the connection]
<courmisch>
my point is that you're better off testing on RVV 1.0 hardware so at least you can run that code on whatever adequate hardware the future holds
KREYREN has joined #riscv
<courmisch>
if C920 is anything like C910, its RVV implementation is all over the place. C908 actually improves performance (relative to scalar), and I'd expect that future beefier T-Head designs will do too
KREYREN has quit [Remote host closed the connection]
KREYREN has joined #riscv
<another|>
curiously the c920 is not listed on t-head's website
<another|>
found a spec pdf somewhere on the internet that looks official which clearly says 0.7.1
<courmisch>
it also does not have the C908
heat has quit [Remote host closed the connection]
<courmisch>
(except in the Chinese version)
heat has joined #riscv
KREYREN has quit [Ping timeout: 240 seconds]
hightower2 has quit [Ping timeout: 255 seconds]
<another|>
now there's some weird "news" reporting c920 has been "upgraded" to 1.0