michaelni changed the topic of #ffmpeg-devel to: Welcome to the FFmpeg development channel | Questions about using FFmpeg or developing with libav* libs should be asked in #ffmpeg | This channel is publicly logged | FFmpeg 7.1.1 has been released! | Please read ffmpeg.org/developer.html#Code-of-conduct
LainExperiments has quit [Ping timeout: 240 seconds]
Traneptora has quit [Quit: Quit]
LainExperiments has joined #ffmpeg-devel
IndecisiveTurtle has joined #ffmpeg-devel
thilo has quit [Ping timeout: 265 seconds]
thilo has joined #ffmpeg-devel
Marth64[m] has joined #ffmpeg-devel
Marth64 has quit [Ping timeout: 246 seconds]
LainExperiments has quit [Quit: Client closed]
IndecisiveTurtle has quit [Ping timeout: 272 seconds]
realies has quit [Quit: ~]
realies has joined #ffmpeg-devel
abdu has joined #ffmpeg-devel
<cone-197> ffmpeg Michael Niedermayer master:0e917389fe73: avcodec/exr: do not output 32bit floats when a file stores 16bit floats
LainExperiments has joined #ffmpeg-devel
Aadil has quit [Ping timeout: 240 seconds]
abdu has quit [Ping timeout: 240 seconds]
LainExperiments has quit [Ping timeout: 240 seconds]
^Neo has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
^Neo has joined #ffmpeg-devel
jamrial has quit []
Tanay has quit [Remote host closed the connection]
Tanay has joined #ffmpeg-devel
ahmedhamed has quit [Quit: Connection closed for inactivity]
^Neo has quit [Ping timeout: 244 seconds]
Aadil has joined #ffmpeg-devel
Martchus_ has joined #ffmpeg-devel
Martchus has quit [Ping timeout: 260 seconds]
mkver has quit [Quit: Leaving]
HarshK23 has joined #ffmpeg-devel
twelve has joined #ffmpeg-devel
twelve has quit [Remote host closed the connection]
twelve has joined #ffmpeg-devel
cone-197 has quit [Quit: transmission timeout]
Aadil has quit [Ping timeout: 240 seconds]
System_Error has quit [Remote host closed the connection]
System_Error has joined #ffmpeg-devel
Coinflipper has quit [Quit: ​]
Coinflipper has joined #ffmpeg-devel
_av500_ has quit [Remote host closed the connection]
derpydoo has joined #ffmpeg-devel
av500 has joined #ffmpeg-devel
<haasn> are we allowed to use GCC vector extensions?
<haasn> afaict they are not portable to MSVC etc
<compnn> the gcc police will get you /s
<nevcairiel> code has to compile on msvc, and if it's substantially slower due to disabled code, that would be yet another argument for using x86inc asm :P
<JEEB> > translate vector optimization results via godbolt or local dumping into x86inc
<haasn> well
<haasn> the reason I ask is because there are optimizations we could do with vectors that would be plain impossible even with hand written SIMD (unless we want to abandon standard calling convention)
<haasn> in particular, if functions accept and return vectors, gcc will directly return results in %ymm0 etc
<haasn> this saves a memory roundtrip
<haasn> let me estimate the speedup
<nevcairiel> sounds like you are already abandoning standard calling conventions
twelve has quit [Ping timeout: 265 seconds]
<haasn> well okay, but the point is that this would retain compatibility with the C code
<nevcairiel> msvc actually specifies the __vectorcall calling convention which supports up to 4 vector registers as return value
<nevcairiel> not sure how compatible that is with gcc's extension, but you are still stuck with many other compilers i'm sure
<haasn> Clearly we need a C language extension for vectors
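For readers unfamiliar with the extension under discussion, here is a minimal sketch of a function that takes and returns vectors by value using the GCC/Clang `vector_size` attribute. The type name `u16x8` and the function `madd_u16x8` are hypothetical and not taken from any FFmpeg code. A 128-bit vector is used so that no extra target flags are assumed; on x86-64 with the System V ABI such a value is typically returned in %xmm0, while the %ymm0 case mentioned above would additionally assume AVX. MSVC does not accept this syntax.

```c
#include <stdint.h>

/* GCC/Clang vector extension: eight 16-bit lanes in one 128-bit vector.
 * This syntax is not supported by MSVC. */
typedef uint16_t u16x8 __attribute__((vector_size(16)));

/* Hypothetical helper: per-lane a * b + c, wrapping modulo 2^16.
 * Arguments and result are passed by value, so the compiler is free to
 * keep them in SIMD registers instead of going through memory. */
static inline u16x8 madd_u16x8(u16x8 a, u16x8 b, u16x8 c)
{
    return a * b + c;
}
```

Calling `madd_u16x8` on `{1,...,8}`, a vector of 2s and a vector of 1s yields lanes `3, 5, ..., 17`; individual lanes can be read with ordinary subscripting such as `r[0]`.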
th3synth4x has joined #ffmpeg-devel
twelve has joined #ffmpeg-devel
twelve has quit [Remote host closed the connection]
<haasn> Admittedly the vectorization scheme would break down in any case because you can’t then also return _multiple_ vectors from a single function
<haasn> So I can’t even benchmark the hypothetical speedup
<haasn> I think what we will want to do to push maximum performance in practice is devise our own internal calling convention for the pure asm impl
<haasn> Where we just keep four vectors reserved for I/O
<wbs> custom calling conventions are fine between two asm functions where we control everything, but I wouldn't attempt doing that between asm and C
<haasn> Right
<haasn> That would require calling into only asm functions, so it’s something we could only attempt once we have 100% coverage anyway
<JEEB> dav1d basically had something like that, I think? (custom calling convention within asm)
<wbs> yes, within the itx functions
<haasn> For now the round trip through L1 is not the end of the world
<haasn> The overhead of going from say 4 to 5 function calls is almost nothing compared to the overhead of the functions themselves
<JEEB> :)
<haasn> The fastest speedup comes from defining subsets of functions that operate only on some components
<haasn> Maybe it would have made sense to define per-component and per-pixel operations separately
<haasn> But without vector extensions that wouldn’t be worth it anyway
<haasn> wbs: btw, maybe you have an idea about how to do an efficient swizzle in asm? Imagine you have a void func(pixel *a, pixel *b, pixel *c, pixel *d, swizzle_t mask); which should permute the contents of the pointers according to the swizzle mask
<haasn> For example swapping *a and *b
<wbs> haasn: swapping the pointers themselves, or the contents they point to?
<haasn> The contents
<wbs> hmm, not sure really, that sounds quite non-idiomatic
<haasn> Assume fixed size, eg 16 elements or w/e
<haasn> Well in reality it is a SoA representation of a block
<haasn> So you have struct chunk { pixel x[], y[], z[], w[] };
<haasn> And I need some routine for swizzle(chunk *x, mask)
<haasn> Obviously when the final input / output is planar we could just swap the pointers directly
<haasn> But it’s not always possible to eliminate these swizzle operations
<haasn> For example, on decoding rgb30 vs bgr30; we have one unpack101010 operation which unpacks the 32 bit pixel into four 16 bit values in a fixed order in the chunk struct
<haasn> And then I may need to rearrange this to repack it differently (eg as rgb565)
<haasn> Sure we could jump through hoops to try and imbue the unpack itself with an extra swizzle mask
<haasn> But that just makes the problem difficult elsewhere
twelve has joined #ffmpeg-devel
<haasn> what I did in my implementation now is essentially having one routine for each possible swizzle, or at least each type possible in practice (currently there are 13)
<haasn> and then choosing the right one at init time
<haasn> it's not too bad because they are quite small, each one is only a few opcodes
<haasn> but I wanted to expand this to allow repetitions in the swizzle mask as well which would raise it to 256 possibilities in theory
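The swizzle scheme described above can be sketched in C roughly as follows. This is an illustration under stated assumptions, not the actual implementation: the block size of 16, the `pixel` type, the 2-bits-per-component mask layout, and the names `SWIZZLE`, `swizzle_generic`, `swizzle_yxzw` and `pick_swizzle` are all hypothetical. With repetitions allowed, each of the four outputs can pick any of four sources, which is where 4^4 = 256 possible masks comes from.

```c
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 16        /* assumed fixed block size */
typedef uint16_t pixel;      /* assumed component type */

/* SoA block: one array per component. */
struct chunk { pixel x[CHUNK_SIZE], y[CHUNK_SIZE], z[CHUNK_SIZE], w[CHUNK_SIZE]; };

/* Hypothetical mask layout: 2 bits per output component, x in the low bits.
 * Each field names the source component (0 = x, 1 = y, 2 = z, 3 = w). */
#define SWIZZLE(x, y, z, w) ((x) | ((y) << 2) | ((z) << 4) | ((w) << 6))

/* Generic fallback: handles all 256 masks, including repetitions,
 * at the cost of copying the whole chunk first. */
static void swizzle_generic(struct chunk *c, unsigned mask)
{
    struct chunk tmp = *c;
    const pixel *src[4] = { tmp.x, tmp.y, tmp.z, tmp.w };
    pixel *dst[4] = { c->x, c->y, c->z, c->w };
    for (int i = 0; i < 4; i++)
        memcpy(dst[i], src[(mask >> (2 * i)) & 3], sizeof(tmp.x));
}

/* One specialized routine: swap the contents of x and y only. */
static void swizzle_yxzw(struct chunk *c)
{
    pixel tmp[CHUNK_SIZE];
    memcpy(tmp,  c->x, sizeof(tmp));
    memcpy(c->x, c->y, sizeof(tmp));
    memcpy(c->y, tmp,  sizeof(tmp));
}

typedef void (*swizzle_fn)(struct chunk *);

/* Chosen once at init time; returns NULL when no specialized routine
 * exists, in which case the caller would use swizzle_generic(). */
static swizzle_fn pick_swizzle(unsigned mask)
{
    switch (mask) {
    case SWIZZLE(1, 0, 2, 3): return swizzle_yxzw;
    default:                  return NULL;
    }
}
```

The trade-off is the one raised in the discussion: specialized routines are tiny and fast, but their number grows quickly once repeated components are allowed, so a generic path keyed on the mask is the obvious fallback.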
th3synth4x has quit [Ping timeout: 240 seconds]
derpydoo has quit [Ping timeout: 268 seconds]
th3synth4x has joined #ffmpeg-devel
<kurosu> (late because stuck in buffer) For #11363, I think ubitux' colleague might be of some help, as he contributed and was interested some years ago in this topic. mateo` maybe ?
diniboy has joined #ffmpeg-devel
Gramner has joined #ffmpeg-devel
j45_ has joined #ffmpeg-devel
j45 has quit [Ping timeout: 260 seconds]
j45_ is now known as j45
j45 has quit [Changing host]
j45 has joined #ffmpeg-devel
SohamK has joined #ffmpeg-devel
SohamK has quit [Changing host]
SohamK has joined #ffmpeg-devel
twelve has quit [Ping timeout: 268 seconds]
diniboy has quit [Ping timeout: 240 seconds]
^Neo has joined #ffmpeg-devel
diniboy has joined #ffmpeg-devel
vjaquez has joined #ffmpeg-devel
jamrial has joined #ffmpeg-devel
mkver has joined #ffmpeg-devel
lemourin has quit [Quit: The Lounge - https://thelounge.chat]
lemourin has joined #ffmpeg-devel
th3synth4x has quit [Quit: Client closed]
Traneptora has joined #ffmpeg-devel
MetaNova has quit [Ping timeout: 276 seconds]
MetaNova has joined #ffmpeg-devel
Sean_McG has joined #ffmpeg-devel
th3synth4x has joined #ffmpeg-devel
diniboy has quit [Quit: Client closed]
SohamK has quit [Quit: Client closed]
kasper93 has quit [Remote host closed the connection]
th3synth4x has quit [Quit: Client closed]
abdu has joined #ffmpeg-devel
<Lynne> haasn: does vf_libplacebo implement proper motion blur for fps conversions?
<Lynne> speedups in particular
<haasn> Lynne: probably not
<haasn> it just blends frames together linearly
<haasn> no motion analysis, no optical flow
<haasn> patches welcome
<haasn> I saw that there is some optical flow stuff in vulkan these days, is it available in mesa yet?
<Lynne> nope, it's nvidia-only, no interest in extending it
<Lynne> blending is fine, skips having to use tblend
<Lynne> (I meant blending, I think it's good enough)
th3synth4x has joined #ffmpeg-devel
abdu has quit [Ping timeout: 240 seconds]
ccawley2011 has joined #ffmpeg-devel
th3synth4x has quit [Quit: Client closed]
jamrial has quit [Read error: Connection reset by peer]
kasper93 has joined #ffmpeg-devel
jamrial has joined #ffmpeg-devel
<Sean_McG> michaelni: 2 of the EXR tests seem to be failing everywhere on FATE
<Sean_McG> oh woops, already noted on the ML
minimal has joined #ffmpeg-devel
<jamrial> michaelni: https://fate.ffmpeg.org/report.cgi?slot=x86_64-linux-gcc-14.2-asan&time=20250305120324 this should help you debug the issue
ccawley2011 has quit [Ping timeout: 252 seconds]
<Traneptora> haasn: is it locked to linear? or can you set tscale (like mpv does it)
<haasn> same as mpv
<Traneptora> iirc mpv uses catrom by default
<Traneptora> might be mitchell tho
<fflogger> [editedticket] bubbleguuum: Ticket #10869 ([avformat] [Regression] Failed parsing of "Content-Type: audio/L16"-like HTTP streams) updated https://trac.ffmpeg.org/ticket/10869#comment:8
<jamrial> michaelni: http://pastie.org/p/3dkYLjUMDmz2Y12XgSnvjb looks like this fixes it. can you confirm it's ok?
<michaelni> jamrial, i think so, i am doing some more tests ATM. but feel free to apply
abdu has joined #ffmpeg-devel
<michaelni> exr depends on zlib and my mips and arm testcases have no zlib
tufei_ has joined #ffmpeg-devel
ccawley2011 has joined #ffmpeg-devel
tufei__ has quit [Ping timeout: 264 seconds]
<michaelni> jamrial, also fate checksums need an update for the 2. LGTM otherwise, please apply
abdu has quit [Ping timeout: 240 seconds]
abdu has joined #ffmpeg-devel
cone-806 has joined #ffmpeg-devel
<cone-806> ffmpeg James Almer master:5560a20d770e: avcodec/exr: use the correct step value for plane pointers
<haasn> okay, I identified 7 slow paths in nuscale where it currently still regresses performance
<haasn> about half of which should be easy to fix
<haasn> and surprisingly, only one of which is due to bad code generation versus asm
IndecisiveTurtle has joined #ffmpeg-devel
abdu91 has joined #ffmpeg-devel
abdu has quit [Ping timeout: 240 seconds]
tufei_ has quit [Quit: Leaving]
abdu62 has joined #ffmpeg-devel
abdu91 has quit [Ping timeout: 240 seconds]
ccawley2011 has quit [Ping timeout: 244 seconds]
tufei has joined #ffmpeg-devel
tufei has quit [Ping timeout: 264 seconds]
<cone-806> ffmpeg Andreas Rheinhardt master:431805c09673: avcodec/exr: Remove write-only gamma_table
<cone-806> ffmpeg Andreas Rheinhardt master:72cff47be73f: avcodec/exr: Fix potential effective-type violation
abdu62 has quit [Quit: Client closed]
diniboy has joined #ffmpeg-devel
delewis has quit [Remote host closed the connection]
delewis has joined #ffmpeg-devel
iive has joined #ffmpeg-devel
ccawley2011 has joined #ffmpeg-devel
abdu62 has joined #ffmpeg-devel
ccawley2011_ has joined #ffmpeg-devel
ccawley2011 has quit [Ping timeout: 244 seconds]
diniboy has quit [Ping timeout: 240 seconds]
<haasn> Hmmm
<haasn> technically, the "full range bit depth upconversion" trick everybody is using is not mathematically sound
<haasn> e.g. converting 8 bit RGB to 10 bit by doing (x << 2) | (x >> 6)
<Traneptora> why would that work?
<haasn> because for example this turns a value of 50 = 0x32 into 200 = 0xC8
ccawley2011 has joined #ffmpeg-devel
<haasn> but the correct result is 50 / 255 * 1023 = 200.58823...
<haasn> so it should be dithered to a 58%/42% mix of 201 and 200
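The arithmetic in the example above can be checked with a small C sketch that compares the bit-replication trick against an exact full-range rescale rounded to nearest. The function names `up10_shift` and `up10_round` are hypothetical, and rounding stands in for the dithering mentioned above, which is not shown here.

```c
#include <stdint.h>

/* Bit-replication trick: 8-bit -> 10-bit, per component. */
static inline unsigned up10_shift(uint8_t x)
{
    return (x << 2) | (x >> 6);
}

/* Exact full-range scale x * 1023 / 255, rounded to nearest.
 * 255 is odd, so there are no ties and adding 127 is sufficient. */
static inline unsigned up10_round(uint8_t x)
{
    return (x * 1023u + 127u) / 255u;
}
```

For an input of 50 the shift version gives 200, while the rounded exact value is 201 (from 200.588...). Both agree at the endpoints 0 and 255.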
ccawley2011__ has joined #ffmpeg-devel
<Traneptora> I still don't see how (x << 2) | (x >> 6) would work
<Traneptora> bits 0 and 1 would overlap with bits 8 and 9 after shifting
ccawley2011_ has quit [Ping timeout: 260 seconds]
abdu62 has quit [Quit: Client closed]
<BtbN> Where do bits 8 and 9 come from in an 8 bit value?
abdu62 has joined #ffmpeg-devel
<Traneptora> well they mentioned RGB so I'm assuming packed I guess
<Traneptora> if it was planar then "RGB" wouldn't be a thing
<BtbN> That's the operation you do on one 8 bit color value to up-convert it to 10 bit
<haasn> this is per component, e.g. gray8 -> gray10
<Traneptora> ah, per component makes more sense
<haasn> the shift trick is only precise when the scale factor is an integer
<haasn> and then you might as well just do an integer multiply
<Traneptora> iirc it's only an integer for 257 right
<haasn> fewer ops than two shifts and an or
<haasn> yeah
<haasn> I guess it's obvious when you think about it; it's only precise when you're not shifting away bits
<JEEB> yea
<haasn> such as in the case of x << 8 | x
<haasn> noted, then I don't actually need an implementation for this operation ever
ccawley2011 has quit [Ping timeout: 265 seconds]
<haasn> unless (x << 8) | x is somehow faster than x * 257 but i doubt it
<haasn> time to bench
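The 8-bit to 16-bit case is the one where the scale factor is an integer, since 65535 / 255 = 257, so the shift trick and the multiply are the same operation. A minimal sketch that verifies this exhaustively, with hypothetical function names:

```c
#include <stdint.h>

/* 8-bit -> 16-bit by bit replication; no bits are shifted away. */
static inline unsigned up16_shift(uint8_t x)
{
    return (x << 8) | x;
}

/* 8-bit -> 16-bit by exact integer scale: 65535 / 255 == 257. */
static inline unsigned up16_mul(uint8_t x)
{
    return x * 257u;
}

/* Returns 1 if both forms agree for every 8-bit input, 0 otherwise. */
static int up16_equivalent(void)
{
    for (unsigned x = 0; x < 256; x++)
        if (up16_shift((uint8_t)x) != up16_mul((uint8_t)x))
            return 0;
    return 1;
}
```

Because the low byte of `x << 8` is always zero, OR-ing in `x` is the same as adding it, which gives `x * 256 + x = x * 257`.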
<JEEB> as a fun anecdote
<JEEB> x264 in like 2010 or 2011 had a boog in their bit depth logic
<JEEB> so as it would go from X to 16 to OUT_DEPTH
<JEEB> there was a green tint
<JEEB> IIRC got fixed in the end
ccawley2011_ has joined #ffmpeg-devel
<haasn> nice, clang compiles "x * 257" into "x << 8 | x"
<haasn> so much for that theory
<haasn> I'm sure they know what they're doing :)
<haasn> gcc even does the same
ccawley2011__ has quit [Ping timeout: 260 seconds]
<haasn> ah, it's fewer op codes actually, because of widening shifts
abdu62 has quit [Quit: Client closed]
abdu62 has joined #ffmpeg-devel
<Lynne> binary matters more for this optimization, with a *257 you need either a gpr load with immediate (wastes l1i), or a rodata (wastes l1c), and for vectors, even worse
SohamK has joined #ffmpeg-devel
SohamK has quit [Quit: Client closed]
abdu62 has quit [Ping timeout: 240 seconds]
jamrial has quit [Read error: Connection reset by peer]
keith has quit [Remote host closed the connection]
keith has joined #ffmpeg-devel
<haasn> Overall speedup=1.870x faster, min=0.235x max=23.406x // another 11% speedup on average \o/
<haasn> and now all of the weird low bit depth format conversions are faster than swscale again
jamrial has joined #ffmpeg-devel
keith has quit [Remote host closed the connection]
cone-806 has quit [Quit: transmission timeout]
<compnn> did you test 160x120 and 320x240 res conversions? /s
keith has joined #ffmpeg-devel
<compnn> i guess there are a bunch of low bit depths too
<compnn> 15bpp :D
<compnn> i'm not sure we have organized bit depth sample collection though
<compnn> but if you need one, mostly video game and early codecs, it could be collected.
<compnn> i remember 8bpp was that gif ?
ccawley2011 has joined #ffmpeg-devel
Sean_McG has quit [Quit: leaving]
ccawley2011_ has quit [Ping timeout: 252 seconds]
ccawley2011_ has joined #ffmpeg-devel
ccawley2011 has quit [Ping timeout: 252 seconds]
ccawley2011 has joined #ffmpeg-devel
ccawley2011__ has joined #ffmpeg-devel
ccawley2011_ has quit [Ping timeout: 260 seconds]
ccawley2011 has quit [Ping timeout: 265 seconds]
ccawley2011__ has quit [Ping timeout: 252 seconds]
ccawley2011__ has joined #ffmpeg-devel
SuperFashi has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
SuperFashi has joined #ffmpeg-devel
<haasn> Overall speedup=1.931x faster, min=0.245x max=24.129x almost at the 2x
<haasn> Now the only slow paths left are missing LUT optimizations (e.g. for treating rgb8 as pal8), missing optimized packed read/write fast paths (the MMX code in swscale vastly outperforms our code on these), and one corner case where we do unnecessary swizzling on planar read/writes
<haasn> I'm calling that a major success overall
<haasn> some cases are almost 6x sped up
<haasn> gray 1920x1080 -> gbrp 1920x1080, flags=0 dither=1, SSIM {Y=0.999977 U=1.000000 V=1.000000 A=1.000000}
<haasn> time=554 us, ref=3284 us, speedup=5.923x faster
ccawley2011 has joined #ffmpeg-devel
ccawley2011_ has joined #ffmpeg-devel
ccawley2011__ has quit [Ping timeout: 248 seconds]
ccawley2011 has quit [Ping timeout: 248 seconds]
ccawley2011_ has quit [Ping timeout: 244 seconds]
ccawley2011 has joined #ffmpeg-devel
ccawley2011_ has joined #ffmpeg-devel
ccawley2011 has quit [Ping timeout: 245 seconds]
ccawley2011 has joined #ffmpeg-devel
ccawley2011_ has quit [Ping timeout: 252 seconds]
ccawley2011_ has joined #ffmpeg-devel
ccawley2011 has quit [Ping timeout: 265 seconds]
ccawley2011__ has joined #ffmpeg-devel
ccawley2011_ has quit [Ping timeout: 276 seconds]
thilo has quit [Ping timeout: 272 seconds]
thilo has joined #ffmpeg-devel
thilo has quit [Changing host]
thilo has joined #ffmpeg-devel
ccawley2011__ has quit [Read error: Connection reset by peer]
labnan has quit [Quit: fBNC - https://bnc4free.com]
<jamrial> Lynne: can you check dale's aac patch
cone-528 has joined #ffmpeg-devel
<cone-528> ffmpeg Dale Curtis master:696ea1c22368: Don't attempt to parse ADTS from USAC packets.
<Lynne> didn't have to, the patch was already forwarded to me from the chromium folks, I told them to send it over so I can apply it
<Lynne> just didn't realize they had sent it over, despite checking a few times
labnan has joined #ffmpeg-devel