#ffmpeg-devel on 2025-03-05 — irc logs at libera.irclog.whitequark.org

2025-03-03 01:04 michaelni changed the topic of #ffmpeg-devel to: Welcome to the FFmpeg development channel | Questions about using FFmpeg or developing with libav* libs should be asked in #ffmpeg | This channel is publicly logged | FFmpeg 7.1.1 has been released! | Please read ffmpeg.org/developer.html#Code-of-conduct

00:40 LainExperiments has quit [Ping timeout: 240 seconds]

00:47 Traneptora has quit [Quit: Quit]

00:51 LainExperiments has joined #ffmpeg-devel

01:08 IndecisiveTurtle has joined #ffmpeg-devel

01:10 thilo has quit [Ping timeout: 265 seconds]

01:11 thilo has joined #ffmpeg-devel

01:46 Marth64[m] has joined #ffmpeg-devel

01:48 Marth64 has quit [Ping timeout: 246 seconds]

01:51 LainExperiments has quit [Quit: Client closed]

01:52 IndecisiveTurtle has quit [Ping timeout: 272 seconds]

02:04 realies has quit [Quit: ~]

02:04 realies has joined #ffmpeg-devel

02:15 abdu has joined #ffmpeg-devel

02:34 <cone-197> ffmpeg Michael Niedermayer master:0e917389fe73: avcodec/exr: do not output 32bit floats when a file stores 16bit floats

02:56 LainExperiments has joined #ffmpeg-devel

03:03 Aadil has quit [Ping timeout: 240 seconds]

03:05 abdu has quit [Ping timeout: 240 seconds]

03:05 LainExperiments has quit [Ping timeout: 240 seconds]

03:05 ^Neo has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

03:06 ^Neo has joined #ffmpeg-devel

03:20 jamrial has quit []

03:23 Tanay has quit [Remote host closed the connection]

03:23 Tanay has joined #ffmpeg-devel

03:35 ahmedhamed has quit [Quit: Connection closed for inactivity]

03:40 ^Neo has quit [Ping timeout: 244 seconds]

04:13 Aadil has joined #ffmpeg-devel

04:22 Martchus_ has joined #ffmpeg-devel

04:23 Martchus has quit [Ping timeout: 260 seconds]

04:38 mkver has quit [Quit: Leaving]

04:38 HarshK23 has joined #ffmpeg-devel

04:41 twelve has joined #ffmpeg-devel

05:15 twelve has quit [Remote host closed the connection]

05:19 twelve has joined #ffmpeg-devel

05:34 cone-197 has quit [Quit: transmission timeout]

05:35 Aadil has quit [Ping timeout: 240 seconds]

06:00 System_Error has quit [Remote host closed the connection]

06:07 System_Error has joined #ffmpeg-devel

07:42 Coinflipper has quit [Quit: ]

07:44 Coinflipper has joined #ffmpeg-devel

08:07 _av500_ has quit [Remote host closed the connection]

08:10 derpydoo has joined #ffmpeg-devel

08:29 av500 has joined #ffmpeg-devel

08:36 <haasn> are we allowed to use GCC vector extensions?

08:36 <haasn> afaict they are not portable to MSVC etc

08:39 <compnn> the gcc police will get you /s

08:40 <nevcairiel> code has to compile on msvc, and if its substantially slower due to disabled code, that would be yet another argument for using x86inc asm :P

08:40 <JEEB> > translate vector optimization results via godbolt or local dumping into x86inc

08:41 <haasn> well

08:42 <haasn> the reason I ask is because there are optimizations we could do with vectors that would be plain impossible even with hand written SIMD (unless we want to abandon standard calling convention)

08:42 <haasn> in particular, if functions accept and return vectors, gcc will directly return results in %ymm0 etc

08:42 <haasn> this saves a memory roundtrip

08:43 <haasn> let me estimate the speedup

08:43 <nevcairiel> sounds like you are already abandoning standard calling conventions

08:43 twelve has quit [Ping timeout: 265 seconds]

08:44 <haasn> well okay, but the point is that this would retain compatibility with the C code

08:47 <nevcairiel> msvc actually specifies the __vectorcall calling convention which supports up to 4 vector registers as return value

08:48 <nevcairiel> not sure how compatible that is with gccs extension, but you are still stuck with many other compilers i'm sure

08:50 <haasn> Clearly we need a C language extension for vectors
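
A minimal sketch of the vector-extension idea discussed above, assuming GCC or Clang (it will not build with MSVC, which is the portability concern raised). The names `v4i32`, `scale_add` and `lane` are hypothetical and not from FFmpeg. It uses 16-byte vectors so it builds without extra flags; on x86-64 SysV such a return value typically travels in %xmm0, and a 32-byte variant built with AVX enabled would be expected to use %ymm0 as haasn describes.

```c
#include <assert.h>
#include <stdint.h>

/* GCC/Clang vector extension: four 32-bit lanes in one 128-bit value. */
typedef int32_t v4i32 __attribute__((vector_size(16)));

/* Accepts and returns vectors by value, so the compiler is free to keep
 * the operands and the result in vector registers instead of memory. */
static v4i32 scale_add(v4i32 x, v4i32 mul, v4i32 add)
{
    return x * mul + add;   /* element-wise arithmetic on all lanes */
}

/* Reads a single lane; vector subscripting is part of the extension. */
static int32_t lane(v4i32 v, int i)
{
    assert(i >= 0 && i < 4);
    return v[i];
}
```

Usage would be along the lines of `v4i32 r = scale_add(x, m, a);` with each operand initialised as `{a, b, c, d}`.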

09:09 th3synth4x has joined #ffmpeg-devel

09:09 twelve has joined #ffmpeg-devel

09:12 twelve has quit [Remote host closed the connection]

09:17 <haasn> Admittedly the vectorization scheme would break down in any case because you can’t then also return _multiple_ vectors from a single function

09:18 <haasn> So I can’t even benchmark the hypothetical speedup

09:18 <haasn> I think what we will want to do to push maximum performance in practice is devise our own internal calling convention for the pure asm impl

09:19 <haasn> Where we just keep four vectors reserved for I/O

09:19 <wbs> custom calling conventions are fine between two asm functions where we control everything, but I wouldn't attempt doing that between asm and C

09:19 <haasn> Right

09:19 <haasn> That would require calling into only asm functions, so it’s something we could only attempt once we have 100% coverage anyway

09:20 <JEEB> dav1d basically had something like that, I think? (custom calling convention within asm)

09:20 <wbs> yes, within the itx functions

09:20 <haasn> For now the round trip through L1 is not the end of the world

09:22 <haasn> The overhead of going from say 4 to 5 function calls is almost nothing compared to the overhead of the functions themselves

09:22 <JEEB> :)

09:22 <haasn> The fastest speedup comes from defining subsets of functions that operate only on some components

09:22 <haasn> Maybe it would have made sense to define per-component and per-pixel operations separately

09:23 <haasn> But without vector extensions that wouldn’t be worth it anyway

09:25 <haasn> wbs: btw, maybe you have an idea about how to do an efficient swizzle in asm? Imagine you have a void func(pixel *a, pixel *b, pixel * c, pixel *d, swizzle_t mask); which should permute the contents of the pointers according to the swizzle mask

09:25 <haasn> For example swapping *a and *b

09:26 <wbs> haasn: swapping the pointers themselves, or the contents they point to?

09:26 <haasn> The contents

09:26 <wbs> hmm, not sure really, that sounds quite non-idiomatic

09:26 <haasn> Assume fixed size, eg 16 elements or w/e

09:27 <haasn> Well in reality it is a SoA representation of a block

09:27 <haasn> So you have struct chunk { pixel x[], y[], z[], w[] };

09:28 <haasn> And I need some routine for swizzle(chunk *x, mask)

09:28 <haasn> Obviously when the final input / output is planar we could just swap the pointers directly

09:29 <haasn> But it’s not always possible to eliminate these swizzle operations

09:31 <haasn> For example, on decoding rgb30 vs bgr30; we have one unpack101010 operation which unpacks the 32 bit pixel into four 16 bit values in a fixed order in the chunk struct

09:32 <haasn> And then I may need to rearrange this to repack it differently (eg as rgb565)

09:32 <haasn> Sure we could jump through hoops to try and imbue the unpack itself with an extra swizzle mask

09:33 <haasn> But that just makes the problem difficult elsewhere

09:44 twelve has joined #ffmpeg-devel

09:47 <haasn> what I did in my implementation now is essentially having one routine for each possible swizzle, or at least each type possible in practice (currently there are 13)

09:47 <haasn> and then choosing the right one at init time

09:47 <haasn> it's not too bad because they are quite small, each one is only a few opcodes

09:48 <haasn> but I wanted to expand this to allow repetitions in the swizzle mask as well which would raise it to 256 possibilities in theory
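
A rough C sketch of the swizzle problem described above, not haasn's actual implementation. The block length `CHUNK_LEN`, the 2-bits-per-component encoding of `swizzle_t`, the `SWIZZLE` macro and `swizzle_generic` are all assumptions made for illustration. With 2 bits per output component there are 4^4 = 256 masks, which matches the count mentioned once repetitions are allowed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_LEN 16                /* hypothetical fixed block size */
typedef uint16_t pixel;

/* SoA block: one fixed-size row per component (x, y, z, w). */
typedef struct chunk {
    pixel comp[4][CHUNK_LEN];
} chunk;

/* Hypothetical encoding: bits 2i..2i+1 name the source component that
 * output component i is read from. Repetitions are allowed. */
typedef uint8_t swizzle_t;
#define SWIZZLE(x, y, z, w) \
    ((swizzle_t)((x) | ((y) << 2) | ((z) << 4) | ((w) << 6)))

/* Generic fallback: permute the rows of the block according to mask.
 * A specialised routine per mask, picked at init time, would avoid the
 * temporary copy; this version only shows the semantics. */
static void swizzle_generic(chunk *c, swizzle_t mask)
{
    assert(c != NULL);
    chunk tmp = *c;                 /* keep sources intact while writing */
    for (int i = 0; i < 4; i++) {
        unsigned src = (mask >> (2 * i)) & 3;
        memcpy(c->comp[i], tmp.comp[src], sizeof(c->comp[i]));
    }
}
```

For example, `swizzle_generic(&c, SWIZZLE(1, 0, 2, 3))` swaps the first two components, and `SWIZZLE(0, 0, 0, 3)` broadcasts the first component into the first three.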

10:33 th3synth4x has quit [Ping timeout: 240 seconds]

10:47 derpydoo has quit [Ping timeout: 268 seconds]

10:49 th3synth4x has joined #ffmpeg-devel

10:49 <kurosu> (late because stuck in buffer) For #11363, I think ubitux' colleague might be of some help, as he contributed and was interested some years ago in this topic. mateo` maybe ?

10:49 diniboy has joined #ffmpeg-devel

11:20 Gramner has joined #ffmpeg-devel

11:26 j45_ has joined #ffmpeg-devel

11:28 j45 has quit [Ping timeout: 260 seconds]

11:28 j45_ is now known as j45

11:28 j45 has quit [Changing host]

11:28 j45 has joined #ffmpeg-devel

11:29 SohamK has joined #ffmpeg-devel

11:33 SohamK has quit [Changing host]

11:33 SohamK has joined #ffmpeg-devel

11:35 twelve has quit [Ping timeout: 268 seconds]

11:50 diniboy has quit [Ping timeout: 240 seconds]

12:04 ^Neo has joined #ffmpeg-devel

12:06 diniboy has joined #ffmpeg-devel

12:11 vjaquez has joined #ffmpeg-devel

12:28 jamrial has joined #ffmpeg-devel

12:30 mkver has joined #ffmpeg-devel

12:37 lemourin has quit [Quit: The Lounge - https://thelounge.chat]

12:38 lemourin has joined #ffmpeg-devel

12:40 th3synth4x has quit [Quit: Client closed]

12:43 Traneptora has joined #ffmpeg-devel

12:50 MetaNova has quit [Ping timeout: 276 seconds]

12:56 MetaNova has joined #ffmpeg-devel

12:59 Sean_McG has joined #ffmpeg-devel

12:59 th3synth4x has joined #ffmpeg-devel

13:14 diniboy has quit [Quit: Client closed]

13:24 SohamK has quit [Quit: Client closed]

13:36 kasper93 has quit [Remote host closed the connection]

13:37 th3synth4x has quit [Quit: Client closed]

13:51 abdu has joined #ffmpeg-devel

13:56 <Lynne> haasn: does vf_libplacebo implement proper motion blur for fps conversions?

13:56 <Lynne> speedups in particular

13:56 <haasn> Lynne: probably not

13:56 <haasn> it just blends frames together linearly

13:56 <haasn> no motion analysis, no optical flow

13:57 <haasn> patches welcome

13:57 <haasn> I saw that there is some optical flow stuff in vulkan these days, is it available in mesa yet?

13:57 <Lynne> nope, its nvidia-only, no interest in extending it

13:58 <Lynne> blending is fine, skips having to use tblend

13:58 <Lynne> (I meant blending, I think its good enough)

14:01 th3synth4x has joined #ffmpeg-devel

14:05 abdu has quit [Ping timeout: 240 seconds]

14:09 ccawley2011 has joined #ffmpeg-devel

14:16 th3synth4x has quit [Quit: Client closed]

14:17 jamrial has quit [Read error: Connection reset by peer]

14:17 kasper93 has joined #ffmpeg-devel

14:18 jamrial has joined #ffmpeg-devel

14:19 <Sean_McG> michaelni: 2 of the EXR tests seem to be failing everywhere on FATE

14:23 <Sean_McG> oh woops, already noted on the ML

14:31 minimal has joined #ffmpeg-devel

14:35 <jamrial> michaelni: https://fate.ffmpeg.org/report.cgi?slot=x86_64-linux-gcc-14.2-asan&time=20250305120324 this should help you debug the issue

14:37 ccawley2011 has quit [Ping timeout: 252 seconds]

14:41 <Traneptora> haasn: is it locked to linear? or can you set tscale (like mpv does it)

14:42 <haasn> same as mpv

14:42 <Traneptora> iirc mpv uses catrom by default

14:42 <Traneptora> might be mitchell tho

14:49 <fflogger> [editedticket] bubbleguuum: Ticket #10869 ([avformat] [Regression] Failed parsing of "Content-Type: audio/L16"-like HTTP streams) updated https://trac.ffmpeg.org/ticket/10869#comment:8

15:01 <jamrial> michaelni: http://pastie.org/p/3dkYLjUMDmz2Y12XgSnvjb looks like this fixes it. can you confirm it's ok?

15:36 <michaelni> jamrial, i think so, i am doing some more tests ATM. but feel free to apply

15:38 abdu has joined #ffmpeg-devel

15:41 <michaelni> exr depends on zlib and my mips and arm testcases have no zlib

15:49 tufei_ has joined #ffmpeg-devel

15:49 ccawley2011 has joined #ffmpeg-devel

15:50 tufei__ has quit [Ping timeout: 264 seconds]

15:53 <michaelni> jamrial, also fate checksums need an update for the 2 tests. LGTM otherwise, please apply

15:56 abdu has quit [Ping timeout: 240 seconds]

15:56 abdu has joined #ffmpeg-devel

16:00 cone-806 has joined #ffmpeg-devel

16:00 <cone-806> ffmpeg James Almer master:5560a20d770e: avcodec/exr: use the correct step value for plane pointers

16:04 <haasn> okay, I identified 7 slow paths in nuscale where it currently still regresses performance

16:05 <haasn> about half of which should be easy to fix

16:05 <haasn> and surprisingly, only one of which is due to bad code generation versus asm

16:15 IndecisiveTurtle has joined #ffmpeg-devel

16:20 abdu91 has joined #ffmpeg-devel

16:21 abdu has quit [Ping timeout: 240 seconds]

16:29 tufei_ has quit [Quit: Leaving]

16:45 abdu62 has joined #ffmpeg-devel

16:45 abdu91 has quit [Ping timeout: 240 seconds]

16:53 ccawley2011 has quit [Ping timeout: 244 seconds]

16:53 tufei has joined #ffmpeg-devel

16:58 tufei has quit [Ping timeout: 264 seconds]

17:01 <cone-806> ffmpeg Andreas Rheinhardt master:431805c09673: avcodec/exr: Remove write-only gamma_table

17:01 <cone-806> ffmpeg Andreas Rheinhardt master:72cff47be73f: avcodec/exr: Fix potential effective-type violation

17:12 abdu62 has quit [Quit: Client closed]

17:23 diniboy has joined #ffmpeg-devel

17:29 delewis has quit [Remote host closed the connection]

17:30 delewis has joined #ffmpeg-devel

17:40 iive has joined #ffmpeg-devel

17:51 ccawley2011 has joined #ffmpeg-devel

17:59 abdu62 has joined #ffmpeg-devel

18:03 ccawley2011_ has joined #ffmpeg-devel

18:04 ccawley2011 has quit [Ping timeout: 244 seconds]

18:08 diniboy has quit [Ping timeout: 240 seconds]

18:23 <haasn> Hmmm

18:23 <haasn> technically, the "full range bit depth upconversion" trick everybody is using is not mathematically sound

18:23 <haasn> e.g. converting 8 bit RGB to 10 bit by doing (x << 2) | (x >> 6)

18:25 <Traneptora> why would that work?

18:25 <haasn> because for example this turns a value of 50 = 0x32 into 200 = 0xC8

18:25 ccawley2011 has joined #ffmpeg-devel

18:25 <haasn> but the correct result is 50 / 255 * 1023 = 200.58823...

18:26 <haasn> so it should be dithered to a 58%/42% mix of 201 and 200
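
The arithmetic above can be checked with a small C sketch. The function names are hypothetical; `up8to10_round` is included only for comparison and is not something proposed in the discussion, where the suggestion is to dither between the two neighbouring values.

```c
#include <assert.h>

/* The common bit-replication trick for one 8-bit component. */
static unsigned up8to10_shift(unsigned x)
{
    assert(x <= 255);
    return (x << 2) | (x >> 6);
}

/* Exact full-range rescale, x / 255 * 1023, kept as a real number. */
static double up8to10_exact(unsigned x)
{
    assert(x <= 255);
    return x * 1023.0 / 255.0;
}

/* Exact rescale rounded to the nearest integer. No ties can occur
 * because the denominator 255 is odd. */
static unsigned up8to10_round(unsigned x)
{
    assert(x <= 255);
    return (x * 1023 + 127) / 255;
}
```

For x = 50 the shift version gives 200, the exact value is about 200.588, and the rounded value is 201. The endpoints 0 and 255 map to 0 and 1023 with either method.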

18:26 ccawley2011__ has joined #ffmpeg-devel

18:27 <Traneptora> I still don't see how (x << 2) | (x >> 6) would work

18:27 <Traneptora> bits 0 and 1 would overlap with bits 8 and 9 after shifting

18:28 ccawley2011_ has quit [Ping timeout: 260 seconds]

18:28 abdu62 has quit [Quit: Client closed]

18:28 <BtbN> Where do bits 8 and 9 come from in an 8 bit value?

18:28 abdu62 has joined #ffmpeg-devel

18:28 <Traneptora> well they mentioned RGB so I'm assuming packed I guess

18:28 <Traneptora> if it was planar then "RGB" wouldn't be a thing

18:28 <BtbN> That's the operation you do on one 8 bit color value to up-convert it to 10 bit

18:28 <haasn> this is per component, e.g. gray8 -> gray10

18:29 <Traneptora> ah, per component makes more sense

18:29 <haasn> the shift trick is only precise when the scale factor is an integer

18:29 <haasn> and then you might as well just do an integer multiply

18:29 <Traneptora> iirc it's only an integer for 257 right

18:29 <haasn> fewer ops than two shifts and an or

18:29 <haasn> yeah

18:29 <haasn> I guess it's obvious when you think about it; it's only precise when you're not shifting away bits

18:30 <JEEB> yea

18:30 <haasn> such as in the case of x << 8 | x

18:30 <haasn> noted, then I don't actually need an implementation for this operation ever

18:30 ccawley2011 has quit [Ping timeout: 265 seconds]

18:30 <haasn> unless (x << 8) | x is somehow faster than x * 257 but i doubt it
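
A small C sketch of why the 8 to 16 bit case is exact: 65535 / 255 = 257, and x * 257 = x * 256 + x, which is the same as (x << 8) | x because x fits in the low 8 bits. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 8 -> 16 bit upconversion via bit replication. */
static uint16_t up8to16_shift(uint8_t x)
{
    return (uint16_t)((x << 8) | x);
}

/* 8 -> 16 bit upconversion via the exact integer scale factor. */
static uint16_t up8to16_mul(uint8_t x)
{
    return (uint16_t)(x * 257);
}

/* Exhaustively checks that both forms agree for every 8-bit input. */
static void check_8to16_equivalence(void)
{
    for (unsigned x = 0; x <= 255; x++)
        assert(up8to16_shift((uint8_t)x) == up8to16_mul((uint8_t)x));
}
```

Which form is faster depends on the compiler and target, so this only demonstrates that the two are interchangeable in terms of results.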

18:30 <haasn> time to bench

18:34 <JEEB> as a fun anecdote

18:34 <JEEB> x264 in like 2010 or 2011 had a boog in their bit depth logic

18:34 <JEEB> so as it would go from X to 16 to OUT_DEPTH

18:35 <JEEB> there was a green tint

18:35 <JEEB> IIRC got fixed in the end

18:46 ccawley2011_ has joined #ffmpeg-devel

18:46 <haasn> nice, clang compiles "x * 257" into "x << 8 | x"

18:46 <haasn> so much for that theory

18:46 <haasn> I'm sure they know what they're doing :)

18:46 <haasn> gcc even does the same

18:48 ccawley2011__ has quit [Ping timeout: 260 seconds]

18:48 <haasn> ah, it's fewer op codes actually, because of widening shifts

18:50 abdu62 has quit [Quit: Client closed]

18:50 abdu62 has joined #ffmpeg-devel

19:03 <Lynne> binary size matters more for this optimization, with a *257 you need either a gpr load with immediate (wastes l1i), or a rodata load (wastes l1d), and for vectors, even worse

19:09 SohamK has joined #ffmpeg-devel

19:17 SohamK has quit [Quit: Client closed]

19:21 abdu62 has quit [Ping timeout: 240 seconds]

19:23 jamrial has quit [Read error: Connection reset by peer]

19:38 keith has quit [Remote host closed the connection]

19:41 keith has joined #ffmpeg-devel

19:42 <haasn> Overall speedup=1.870x faster, min=0.235x max=23.406x // another 11% speedup on average \o/

19:43 <haasn> and now all of the weird low bit depth format conversions are faster than swscale again

19:43 jamrial has joined #ffmpeg-devel

19:47 keith has quit [Remote host closed the connection]

20:01 cone-806 has quit [Quit: transmission timeout]

20:05 <compnn> did you test 160x120 and 320x240 res conversions? /s

20:06 keith has joined #ffmpeg-devel

20:06 <compnn> i guess there are a bunch of low bit depths too

20:06 <compnn> 15bpp :D

20:08 <compnn> i'm not sure we have organized bit depth sample collection though

20:09 <compnn> but if you need one, mostly video game and early codecs, it could be collected.

20:14 <compnn> i remember 8bpp was that gif ?

20:40 ccawley2011 has joined #ffmpeg-devel

20:43 Sean_McG has quit [Quit: leaving]

20:43 ccawley2011_ has quit [Ping timeout: 252 seconds]

20:44 ccawley2011_ has joined #ffmpeg-devel

20:46 ccawley2011 has quit [Ping timeout: 252 seconds]

21:10 ccawley2011 has joined #ffmpeg-devel

21:11 ccawley2011__ has joined #ffmpeg-devel

21:13 ccawley2011_ has quit [Ping timeout: 260 seconds]

21:16 ccawley2011 has quit [Ping timeout: 265 seconds]

21:16 ccawley2011__ has quit [Ping timeout: 252 seconds]

21:20 ccawley2011__ has joined #ffmpeg-devel

21:34 SuperFashi has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

21:34 SuperFashi has joined #ffmpeg-devel

22:01 <haasn> Overall speedup=1.931x faster, min=0.245x max=24.129x almost at the 2x

22:05 <haasn> Now the only slow paths left are either missing LUT optimizations (e.g. for treating rgb8 as pal8), missing optimized packed read/write fast paths (the MMX code in swscale vastly outperforms our code on these), and one corner case where we do unnecessary swizzling on planar read/writes

22:05 <haasn> I'm calling that a major success overall

22:05 <haasn> some cases are almost 6x sped up

22:06 <haasn> gray 1920x1080 -> gbrp 1920x1080, flags=0 dither=1, SSIM {Y=0.999977 U=1.000000 V=1.000000 A=1.000000}

22:06 <haasn> time=554 us, ref=3284 us, speedup=5.923x faster

22:10 ccawley2011 has joined #ffmpeg-devel

22:11 ccawley2011_ has joined #ffmpeg-devel

22:13 ccawley2011__ has quit [Ping timeout: 248 seconds]

22:15 ccawley2011 has quit [Ping timeout: 248 seconds]

22:22 ccawley2011_ has quit [Ping timeout: 244 seconds]

22:28 ccawley2011 has joined #ffmpeg-devel

22:29 ccawley2011_ has joined #ffmpeg-devel

22:33 ccawley2011 has quit [Ping timeout: 245 seconds]

22:35 ccawley2011 has joined #ffmpeg-devel

22:37 ccawley2011_ has quit [Ping timeout: 252 seconds]

22:44 ccawley2011_ has joined #ffmpeg-devel

22:46 ccawley2011 has quit [Ping timeout: 265 seconds]

22:47 ccawley2011__ has joined #ffmpeg-devel

22:48 ccawley2011_ has quit [Ping timeout: 276 seconds]

23:00 thilo has quit [Ping timeout: 272 seconds]

23:02 thilo has joined #ffmpeg-devel

23:02 thilo has quit [Changing host]

23:02 thilo has joined #ffmpeg-devel

23:20 ccawley2011__ has quit [Read error: Connection reset by peer]

23:21 labnan has quit [Quit: fBNC - https://bnc4free.com]

23:30 <jamrial> Lynne: can you check dale's aac patch

23:31 cone-528 has joined #ffmpeg-devel

23:31 <cone-528> ffmpeg Dale Curtis master:696ea1c22368: Don't attempt to parse ADTS from USAC packets.

23:32 <Lynne> didn't have to, the patch was already forwarded to me from the chromium folks, I told them to send it over so I can apply it

23:32 <Lynne> just didn't realize they had sent it over, despite checking a few times

23:55 labnan has joined #ffmpeg-devel