#ffmpeg-devel on 2025-03-14 — irc logs at libera.irclog.whitequark.org

2025-03-03 01:04 michaelni changed the topic of #ffmpeg-devel to: Welcome to the FFmpeg development channel | Questions about using FFmpeg or developing with libav* libs should be asked in #ffmpeg | This channel is publicly logged | FFmpeg 7.1.1 has been released! | Please read ffmpeg.org/developer.html#Code-of-conduct

00:10 <haasn> ramiro: the DECL_ is really just to save myself from going insane repeating these signatures a million times

00:10 <haasn> being able to redefine them to something else is a very unintended side effect

00:10 <haasn> that said, you can already overload the SwsOpFunc to whatever signature you want

00:10 <haasn> for custom calling conventions

00:10 <haasn> I imagine that a fully asm backend will not be reusing these macros at all

00:11 <haasn> it's just for the generic C code, which maybe even will be deleted again at some point in the future

00:12 <haasn> ramiro: something I really want to see somebody (me?) try is adding a SPIR-V backend

00:12 <haasn> and doing some GPU scaling :)

00:12 <haasn> just to get some numbers

00:12 <haasn> it should be actually relatively straightforward, probably easier than the C backend

00:12 <haasn> ramiro: btw, one quirk I noticed today is that SWS_OP_LSHIFT doesn't work for changing the bitdepth of yuva444p

00:12 <haasn> because the alpha channel is full range, not left shift

00:13 <haasn> in theory we need some sort of LSHIFT_AND_EXPAND_ALPHA hybrid op; though what happens in practice right now is that we go via float and do a vec4() scale operation

00:13 <haasn> which is technically more correct, as I will remind you that the alpha channel technically requires dithering in this instance

00:14 <haasn> but we almost surely want some sort of performance flag to skip dithering full range expansions above a certain bit depth

00:15 <haasn> and in that case we will need some sort of hybrid narrow-and-full range shift; maybe something like a mask of components to do an expanding shift on
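
The difference between a plain left shift and a full-range expansion can be sketched in C as follows. This is only an illustration under assumptions: `expand_shift` and `expand_full` are hypothetical helper names, not swscale functions, and the rounding used here (round to nearest, no dithering) is one possible choice rather than what the pipeline actually does.

```c
#include <stdint.h>

/* Plain left shift: exact for limited-range data (e.g. 16..235 -> 64..940),
 * but the maximum code value 255 only reaches 1020 instead of 1023. */
static inline uint16_t expand_shift(uint16_t v, int src_bits, int dst_bits)
{
    return (uint16_t)(v << (dst_bits - src_bits));
}

/* Full-range expansion: rescale so that the source maximum maps onto the
 * destination maximum, rounding to nearest (no dithering in this sketch). */
static inline uint16_t expand_full(uint16_t v, int src_bits, int dst_bits)
{
    const uint32_t src_max = (1u << src_bits) - 1;
    const uint32_t dst_max = (1u << dst_bits) - 1;
    return (uint16_t)((v * dst_max + src_max / 2) / src_max);
}
```

For an opaque 8-bit alpha value of 255 going to 10 bits, the shift yields 1020 while the full-range rescale yields 1023, which is why a shift alone is not enough for the alpha plane.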

00:17 IndecisiveTurtle has quit [Ping timeout: 245 seconds]

01:06 cone-526 has joined #ffmpeg-devel

01:06 <cone-526> ffmpeg softworkz master:0978fea7fa78: avcodec/vlc: Reduce debug logging

01:06 <cone-526> ffmpeg softworkz master:e0aedeb72e05: avformat/webvttdec: Add webvtt extension and MIME type

01:06 <cone-526> ffmpeg Michael Niedermayer master:a90ff8128772: avcodec/ffv1enc: Factor set_micro_version() out of ff_ffv1_write_extradata()

01:06 <cone-526> ffmpeg Michael Niedermayer master:dcf614279426: avcodec/ffv1enc: add space for the remap table to max_size

01:06 <cone-526> ffmpeg Michael Niedermayer master:437cbd25e089: avcodec/ffv1: Implement jeromes idea of making remap flip optional

01:13 ^Neo has joined #ffmpeg-devel

01:13 ^Neo has quit [Changing host]

01:13 ^Neo has joined #ffmpeg-devel

01:23 Thulinma has joined #ffmpeg-devel

01:51 thilo has quit [Ping timeout: 276 seconds]

01:52 thilo has joined #ffmpeg-devel

01:53 ^Neo has quit [Ping timeout: 260 seconds]

02:05 <haasn> ramiro: I'm getting amazing numbers

02:05 <haasn> the new CPS approach matches or even beats the previous approach's fused fast paths

02:05 <haasn> without even any fast paths..

02:06 <haasn> speedup=2.124x on a simple 16-bit endian swap

02:06 <haasn> time=1618 us, ref=3423 us, speedup=2.116x faster on yuv444p10le -> yuv444p16be (the only two operations I have implemented atm are byte swapping and shifting)

02:06 <haasn> compare to ~2000 us on the previous approach

02:07 <haasn> the per-op overhead is now next to nothing

02:07 <haasn> one pointer dereference and a jump instruction
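
A rough sketch of what a continuation-passing op chain can look like in C is given below. Everything in it is an assumption made for illustration: `OpLink`, `OpFunc`, `CONTINUE`, `CHUNK` and the op names are hypothetical and are not the actual SwsOpFunc or SwsOpExec definitions, and the pixels live in a memory buffer here rather than in vector registers passed between functions.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK 8 /* hypothetical chunk size for this sketch */

typedef struct OpLink OpLink;
typedef void (*OpFunc)(const OpLink *link, uint16_t *px);

struct OpLink {
    OpFunc func; /* kernel implementing this op */
    int    arg;  /* per-op constant, e.g. a shift amount */
};

/* Hand control to the next op: one pointer load plus an indirect call,
 * which an optimizing compiler can usually turn into a tail jump. */
#define CONTINUE(link, px) ((link)[1].func(&(link)[1], (px)))

static void op_lshift(const OpLink *link, uint16_t *px)
{
    for (int i = 0; i < CHUNK; i++)
        px[i] = (uint16_t)(px[i] << link->arg);
    CONTINUE(link, px);
}

static void op_bswap16(const OpLink *link, uint16_t *px)
{
    for (int i = 0; i < CHUNK; i++)
        px[i] = (uint16_t)((px[i] << 8) | (px[i] >> 8));
    CONTINUE(link, px);
}

/* Terminator: ends the chain without continuing. */
static void op_end(const OpLink *link, uint16_t *px)
{
    (void)link;
    (void)px;
}

static void run_chain(const OpLink *chain, uint16_t *px)
{
    chain[0].func(&chain[0], px);
}
```

A chain such as `{ {op_lshift, 6}, {op_bswap16, 0}, {op_end, 0} }` would then model a 10-bit little-endian to 16-bit big-endian conversion on one chunk, with no dispatch loop between the ops.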

02:10 <haasn> in fact, the overhead is so low that we can almost go down to SWS_CHUNK_SIZE 8 without significant performance drop

02:10 <haasn> allowing us to fit 32 bit vectors into AVX2 registers

02:10 <haasn> which may even end up faster in the general case (where floats are being used)

02:13 <haasn> it also has some other nice benefits; since it's so cheap to just clear a vector to zero we can also rely on unread components being initialized to 0 (or perhaps 255 for alpha)

02:13 <haasn> and omit the clear operations in those cases

02:13 <haasn> maybe clearing always to -1 makes the most sense, since clearing alpha is the most common case

02:14 <haasn> and in the other cases involving clears we want to clear to 128 anyways

03:26 philipl has quit [Quit: leaving]

03:31 philipl has joined #ffmpeg-devel

03:57 jamrial has quit []

04:06 cone-526 has quit [Quit: transmission timeout]

04:23 amberjojo has joined #ffmpeg-devel

04:41 Martchus has joined #ffmpeg-devel

04:42 Martchus_ has quit [Ping timeout: 276 seconds]

05:00 System_Error has quit [Remote host closed the connection]

05:06 System_Error has joined #ffmpeg-devel

05:11 <fflogger> [editedticket] Balling: Ticket #11505 ([avcodec] Cuvid decoders do not work with CUDA hwaccel anymore) updated https://trac.ffmpeg.org/ticket/11505#comment:10

05:28 ukn_unknown has joined #ffmpeg-devel

05:44 amberjojo has quit [Quit: Konversation terminated!]

05:54 ukn_unknown has quit [Ping timeout: 240 seconds]

06:35 amberjojo has joined #ffmpeg-devel

06:35 amberjojo has quit [Changing host]

06:35 amberjojo has joined #ffmpeg-devel

06:41 amberjojo has quit [Read error: Connection reset by peer]

07:16 jdek has quit [Ping timeout: 272 seconds]

07:17 jdek has joined #ffmpeg-devel

08:36 ngaullier has joined #ffmpeg-devel

08:36 ngaullier has quit [Remote host closed the connection]

08:39 ngaullier has joined #ffmpeg-devel

08:47 ^Neo has joined #ffmpeg-devel

08:47 ^Neo has quit [Changing host]

08:47 ^Neo has joined #ffmpeg-devel

08:57 MyNetAz has quit [Remote host closed the connection]

09:02 MyNetAz has joined #ffmpeg-devel

09:08 ^Neo has quit [Ping timeout: 272 seconds]

09:40 <fflogger> [newticket] RandomPerson: Ticket #11508 ([undetermined] MediaCodec as a whole is broken) created https://trac.ffmpeg.org/ticket/11508

09:41 MyNetAz has quit [Remote host closed the connection]

09:45 MyNetAz has joined #ffmpeg-devel

10:12 mkver has quit [Quit: Leaving]

10:13 <fflogger> [editedticket] joe: Ticket #7059 ([avfilter] Compile error in ffmpeg git master with opencv 3.4.1) updated https://trac.ffmpeg.org/ticket/7059#comment:3

10:23 <fflogger> [newticket] david-metrica: Ticket #11509 ([avcodec] Failed setup for format videotoolbox_vld - Video Decoding (H264 yuv420p)) created https://trac.ffmpeg.org/ticket/11509

10:24 <JEEB> BtbN: btw just noticed gnome's gitlab seemingly put https://xeiaso.net/blog/2025/anubis/ in front of their instance

10:37 <fflogger> [newticket] david-metrica: Ticket #11510 ([undetermined] Hardware accelerator failed to decode picture - Videotoolbox HEVC) created https://trac.ffmpeg.org/ticket/11510

10:40 Anthony_ZO has joined #ffmpeg-devel

10:45 mkver has joined #ffmpeg-devel

10:49 <Lynne> everyone I've heard from complain about scrapers recently has been happy with blocking alibaba's

11:05 ^Neo has joined #ffmpeg-devel

11:05 ^Neo has quit [Changing host]

11:05 ^Neo has joined #ffmpeg-devel

11:05 rvalue- has joined #ffmpeg-devel

11:06 cone-467 has joined #ffmpeg-devel

11:06 <cone-467> ffmpeg Andreas Rheinhardt master:e4c8e80a2efe: avcodec/x86/constants: Move constants only used by cavsdsp to it

11:06 rvalue has quit [Ping timeout: 252 seconds]

11:08 ahmedhamed has joined #ffmpeg-devel

11:09 rvalue- is now known as rvalue

11:55 jamrial has joined #ffmpeg-devel

12:01 <jamrial> wbs: weird, i can't reproduce the failure

12:01 <jamrial> i see only https://fate.ffmpeg.org/report.cgi?slot=x86_64-mingw32-clang-trunk&time=20250314013244 passes

12:02 <wbs> jamrial: it seems to pass on x86_64, but fail on i686, arm and aarch64

12:02 <jamrial> no, https://fate.ffmpeg.org/report.cgi?slot=x86_64-mingw-w64-windows-native&time=20250314054300 for example fails

12:03 <wbs> right

12:04 <wbs> fails for me in a linux/gcc/i386 setup too

12:07 ccawley2011 has joined #ffmpeg-devel

12:09 ccawley2011 has quit [Read error: Connection reset by peer]

12:12 ccawley2011 has joined #ffmpeg-devel

12:34 <jamrial> wbs: the problem is probably in the unscaled path for semiplanar copy

12:34 <jamrial> so the test did its job at finding something :p

12:35 <wbs> of course! that's usually the case when there's lacking test coverage :)

12:51 <ramiro> haasn: I'm kind of hoping you will write the spir-v backend :P

12:52 <ramiro> I know nothing about spir-v, but I want to play around with emitting assembly directly.

12:57 <haasn> I'm kind of hoping Lynne will :P but yeah I'll get to it eventually

13:13 rvalue has quit [Read error: Connection reset by peer]

13:14 MyNetAz has quit [Remote host closed the connection]

13:14 rvalue has joined #ffmpeg-devel

13:18 ahmedhamed has quit [Quit: Connection closed for inactivity]

13:18 MyNetAz has joined #ffmpeg-devel

13:34 realies has joined #ffmpeg-devel

13:38 ccawley2011 has quit [Ping timeout: 268 seconds]

13:51 <toots5446> Lynne: I started the next phase of the ogg work. Turns out vorbis has frames that need several packets to be decoded it seems.

13:52 <toots5446> There are competing mechanisms in place for metadata-related updates with packets: what's the difference between AV_PKT_DATA_METADATA_UPDATE and AV_PKT_DATA_STRINGS_METADATA ?

14:02 ccawley2011 has joined #ffmpeg-devel

14:05 ccawley2011_ has joined #ffmpeg-devel

14:06 cone-467 has quit [Quit: transmission timeout]

14:08 ccawley2011 has quit [Ping timeout: 260 seconds]

14:22 <Lynne> toots5446: use metadata_update

14:22 <Lynne> jamrial: are you going to send a patch for ffv1 to skip decoding on correct parsing?

14:23 <jamrial> yeah

14:25 <Lynne> thanks

14:30 <haasn> ramiro: oh no...

14:30 <haasn> I ran into a GCC codegen bug

14:31 <haasn> this is terrible

14:36 <Lynne> are you using the native vectors?

14:37 <haasn> yeah :/

14:37 <haasn> actually, it seems it's not an outright bug, just suboptimal code

14:37 <haasn> nvm then

14:40 minimal has joined #ffmpeg-devel

14:45 <Lynne> yeah, they've been broken as hell ever since I first thought using them is a good idea

14:45 <Lynne> 10 years ago

14:45 <Lynne> it doesn't help that clang and gcc have different ideas on what they should be named and how they should work

14:52 ccawley2011__ has joined #ffmpeg-devel

14:53 <jamrial> wbs: should be fixed by the patch i just sent

14:53 ccawley2011_ has quit [Ping timeout: 260 seconds]

14:53 <jamrial> guess nobody ever converted semiplanar to semiplanar with different bitdepths until now

14:57 s55 has quit [Ping timeout: 276 seconds]

15:01 Anthony_ZO has quit [Ping timeout: 268 seconds]

15:11 ccawley2011__ has quit [Ping timeout: 260 seconds]

15:12 <wbs> jamrial: thanks!

15:14 s55 has joined #ffmpeg-devel

15:41 <ramiro> haasn: it's good that every few years people try again to autovectorize stuff with gcc/clang, but the conclusion has pretty much always been "not worth it". I don't mean to throw cold water on your work, but I think the end result will be pretty much hit and miss across architectures and gcc versions. we might even end up with code that's slower than the c reference.

15:41 <haasn> time=707 us, ref=704 us, speedup=0.996x slower # gray8 -> yuvj444p

15:42 <haasn> down from time=738 us, ref=702 us, speedup=0.951x slower for the old approach

15:42 <haasn> so we now match hand-written asm for clearing chroma

15:43 <haasn> even with the extra function call overhead on top

15:43 <ramiro> I think it will on average give good enough results, which will already be better than what we currently have for non-optimized code. also having a description of the steps taken in the scaler is a huge bonus, since it allows us to check the correctness of each conversion and not have subtle differences in each specific x->y converter.

15:43 <haasn> ramiro: sure, I think people are misunderstanding; I'm never saying we don't need hand written asm

15:44 <ramiro> haasn: I understand that. but I think there's only so much effort that should be spent in autovectorization. other parts of your work are much more valuable imo.

15:44 <haasn> actually, the main reason I am investing so much time into the C code is because it allows me to iterate on the design and get performance numbers that will be representative of an eventual asm rewrite

15:45 <haasn> it's still an order of magnitude easier to get good autovectorized code than hand written asm

15:45 <ramiro> haasn: might be, but hand writing asm is more fun and more predictable :)

15:46 <haasn> well in any case, I think the current approach is closer to final

15:46 <haasn> so if you want to have fun writing some asm routines, the very big offender currently is read/write_packed2/3/4

15:47 <ramiro> haasn: on neon there are single instructions for that

15:47 <haasn> I think I will refactor SwsOpExec slightly to make it more asm friendly but apart from that the current signature should be pretty close to final

15:48 <ramiro> do you plan on adding downsampling with easy chroma offsets (like central for example)?

15:48 <ramiro> (as an SwsOp I mean)

15:49 <haasn> not at the moment

15:50 <haasn> I want to rewrite all of the missing ops from swscale3 on top of swscale4 and then submit that for review again

15:50 <ramiro> ok

15:50 <haasn> but yes the goal is eventually to have dedicated up/downsamplers for these cases, I think

15:54 <ramiro> haasn: currently it only does one line at a time, right? so for now there's no way to have 2 lines input in vectors

15:56 <haasn> not necessarily

15:56 <haasn> the calling code runs the opchain in an access pattern that is striped over luma rows

15:56 <haasn> but the implementation could internally do whatever it wants

15:57 <haasn> so something we could do is define a separate UNSUBSAMPLE() op that has its own signature where it gets two chroma lines

15:57 <haasn> or something like that

16:00 <haasn> yet another advantage of the cps approach

16:00 <haasn> you can add or remove vectors inside the operations chain as long as each function's output matches the signature of the next input

16:07 <haasn> ramiro: https://github.com/haasn/FFmpeg/commits/swscale4 current WIP if you want to take a look at it

16:08 <haasn> I like it a lot more than the previous approach; it's also less bloated as we don't really need to define so many "fused" variants

16:09 <haasn> so in contrast, we can define more "subset" variants of common ops, e.g. ignoring alpha when doing an op like convert u8 -> f32

16:09 <haasn> and on top of this, ops code is quite a bit smaller

16:23 kasper93 has quit [Remote host closed the connection]

16:28 <fflogger> [newticket] kierank: Ticket #11511 ([undetermined] Haivision Pro360 device ships FFmpeg nonfree) created https://trac.ffmpeg.org/ticket/11511

16:36 minimal has quit [Quit: Leaving]

16:55 ccawley2011 has joined #ffmpeg-devel

17:02 jdek has quit [Ping timeout: 248 seconds]

17:03 jdek has joined #ffmpeg-devel

17:16 ccawley2011 has quit [Ping timeout: 276 seconds]

17:34 IndecisiveTurtle has joined #ffmpeg-devel

17:38 ccawley2011 has joined #ffmpeg-devel

17:53 IndecisiveTurtle has quit [Quit: IndecisiveTurtle]

18:09 ngaullier has quit [Remote host closed the connection]

18:14 LainExperiments has joined #ffmpeg-devel

18:32 ccawley2011 has quit [Ping timeout: 248 seconds]

18:51 ccawley2011 has joined #ffmpeg-devel

18:57 cone-515 has joined #ffmpeg-devel

18:57 <cone-515> ffmpeg Niklas Haas master:ae84aa775fa7: swscale/utils: split off format code into new file

19:15 LainExperiments has quit [Quit: Client closed]

19:16 HarshK23 has quit [Quit: Connection closed for inactivity]

19:17 minimal has joined #ffmpeg-devel

19:25 LainExperiments has joined #ffmpeg-devel

19:38 kasper93 has joined #ffmpeg-devel

20:05 <haasn> c->xyzgammainv[i] = round(pow(i / 65535.0, xyzgammainv) * 4095.0);

20:05 <haasn> is there any reason why this would give a different result on 32 bit and 64 bit?

20:06 <haasn> it seems that our XYZ conversion tables are subtly different on 32 bit; but they are all declared as double

20:09 <haasn> seems like the 32 bit code uses 80 bit extended precision

20:11 <jamrial> haasn: x87 has 80 bit intermediates, yes. it's why those tests are failing. force sse math and it's "fixed"

20:14 <haasn> magic value to break it is i == 3415

20:14 <haasn> jamrial: how would you recommend doing that?

20:14 <haasn> do we have a precedent?

20:15 <jamrial> don't think so

20:15 <jamrial> it would require forcing sse2 (for doubles) on an arch that technically (but not realistically) may lack it

20:16 <jamrial> "-msse2 -mfpmath=sse" for gcc

20:16 <jamrial> mkver tried it, and some other test failed in turn

20:17 <haasn> annoying

20:20 ahmedhamed has joined #ffmpeg-devel

20:23 <haasn> GCC man page claims that -fexcess-precision=standard should be implied by -std=c99

20:24 <haasn> and presumably above

20:26 <haasn> but even with explicitly setting that option I get differing results

20:30 <nevcairiel> standard only ensures the right precision on assignments, intermediates will still be in the full 80 bit registers

20:33 <haasn> I tried casting (and storing) after every single op and the result is still different

20:35 <nevcairiel> i imagine it won't stop it from optimizing those away
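
One commonly used workaround for casts and stores being optimized away is to route each intermediate through a `volatile` object, which the compiler must actually write to memory at the declared width. The sketch below assumes nothing about the swscale code: `force_double`, `mul_rounded` and `double_may_have_excess_precision` are hypothetical names, and `FLT_EVAL_METHOD` is the standard C99 macro from `<float.h>`. Note that this only removes the excess precision; the value is still rounded twice on x87 (once to 80 bits, once to 64), so results can occasionally still differ from a pure SSE2 build.

```c
#include <float.h>

/* Nonzero when double expressions may be evaluated in a wider format:
 * FLT_EVAL_METHOD == 2 means long double (typical for x87 builds),
 * a negative value means the behaviour is indeterminate. */
static int double_may_have_excess_precision(void)
{
    return FLT_EVAL_METHOD == 2 || FLT_EVAL_METHOD < 0;
}

/* Force a value to be rounded to a 64-bit double by storing it through
 * a volatile object, which the optimizer is not allowed to elide. */
static double force_double(double x)
{
    volatile double tmp = x;
    return tmp;
}

/* Example: a multiplication whose result carries no excess precision. */
static double mul_rounded(double a, double b)
{
    return force_double(a * b);
}
```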

20:36 <haasn> urgh

20:40 <wbs> yes, with floats, it requires a lot of effort and compiler/toolchain specific twiddling to get bitexact results. in practice, you need to allow some tolerance in the end if you're using floats

20:44 <haasn> that would amount to disabling the test

20:45 <haasn> maybe we can just ignore the regression until we switch to the new pipeline I'm working on

20:45 <haasn> it will handle xyz conversion much more efficiently as a side effect

20:48 <fflogger> [editedticket] cgbug: Ticket #11490 ([avformat] [Regression] Audio silent for long MOV file) updated https://trac.ffmpeg.org/ticket/11490#comment:9

20:51 <mkver> haasn: What makes you believe that the new pipeline will be bitexact?

20:54 LainExperiments has quit [Quit: Client closed]

20:55 LainExperiments has joined #ffmpeg-devel

20:59 <haasn> mkver: well, if -fexcess-precision=standard does what it claims at all, we can make our float ops bit exact; with significantly fewer cases to worry about, as long as we make all float ops bit exact, the result of concatenating them has to be bit exact as well

20:59 <haasn> is what I would say, but it seems that -fexcess-precision=standard does not even do anything at all

21:00 <haasn> ah it does, it inserts extra fstp / fldp sequences

21:01 <haasn> but what I'm more worried about is the possibility of us having to define a canonical order for how to do float summation (for filtering)

21:02 <haasn> maybe we will need a fixed precision path in the end to respect BITEXACT

21:10 <wbs> that's a bit problematic though, if all our tests run in bitexact mode, it'd mean that the real-world used performance codepath would be essentially uncovered by tests

21:11 <haasn> true

21:12 <haasn> well, we can still have a bit exact reference C path

21:12 <wbs> for e.g. many audio decoders, we keep a full reference file, and do off-by-one (or off-by-few) test comparisons against that

21:12 <haasn> even with float math

21:13 <haasn> as long as we don't enable things like -fassociative-math

21:13 <haasn> and then non bit exact asm routines could choose a different summation order for things like convolutions
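
The reason summation order matters is that float addition is not associative, so two routines that add the same terms in a different order can legitimately return different results. A minimal sketch, with hypothetical function names and deliberately extreme inputs:

```c
/* Sum the array front to back. */
static float sum_forward(const float *v, int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Sum the same array back to front. */
static float sum_backward(const float *v, int n)
{
    float acc = 0.0f;
    for (int i = n - 1; i >= 0; i--)
        acc += v[i];
    return acc;
}
```

With the input `{1e20f, -1e20f, 1.0f}`, the forward sum cancels the two large terms first and returns 1, whereas the backward sum absorbs the 1 into -1e20 and returns 0. Real filter taps are far less extreme, but the same effect shows up in the last bits.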

21:22 <jannau> if there is a known (and tested) bit-exact path you could use that as reference and do float ulp based comparisons
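
A ULP-based comparison can be sketched as below, assuming IEEE 754 binary32 floats and finite, non-NaN inputs. The names `float_to_ordered` and `ulp_distance` are hypothetical and this is not an existing FFmpeg test helper; it only illustrates the idea of counting representable floats between a reference and a test value.

```c
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern onto an integer scale that is monotonic in
 * the float's value, so that +0.0 and -0.0 both map to 0. */
static int32_t float_to_ordered(float f)
{
    int32_t i;
    memcpy(&i, &f, sizeof(i)); /* well-defined type punning */
    return i < 0 ? INT32_MIN - i : i;
}

/* Number of representable floats between a and b (0 if identical). */
static int64_t ulp_distance(float a, float b)
{
    const int64_t d = (int64_t)float_to_ordered(a) - float_to_ordered(b);
    return d < 0 ? -d : d;
}
```

A test would then accept an output sample if `ulp_distance(ref, out)` is at or below some small tolerance, instead of requiring bit-identical results.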

21:23 <wbs> the really annoying thing wrt float exactness is if you need to invoke math functions, as they can have much more differing precision than a couple ulps

21:24 <wbs> I remember cases (from ~20 years ago) in trying to get a float based game engine to give identical simulation results across various systems (windows/i386, linux/i386 and macos/ppc at the time)

21:25 <wbs> there were cases like sin(almost_pi) which should be close to zero, but on some system (don't remember if it was i386 or ppc) you'd get a float with over half of the mantissa just being zero bits

21:28 <haasn> I think that this heavily applies to the code inside cms.c fwiw

21:28 <haasn> I think our gamut/tone mapping code is 100% not bitexact

21:29 <haasn> but this is already the status quo

21:29 <haasn> I am pretty hesitant to introduce math functions to the format ops pipeline at this stage

21:30 <haasn> I'd rather pregenerate LUTs with AVRational for any sort of gamma conversion

21:30 <haasn> if only to avoid locking us out of a fixed precision path in the future

21:39 Teukka has quit [Read error: Connection reset by peer]

21:44 <toots5446> I've got the next step with chained ogg stuff. I've got it where I can do a whole ffmpeg -c copy and keep chained streams w/ metadata with consistent PTS/DTS, pretty cool.

21:45 <toots5446> Would be nice to get the first round committed w/ the samples in FATE so we can finish this. I might send a WIP patch series over the weekend to get some early feedback.

21:45 Teukka has joined #ffmpeg-devel

21:45 Teukka has quit [Changing host]

21:45 Teukka has joined #ffmpeg-devel

21:46 <toots5446> Also meant to add: also removed the ogg header packets from subsequent chained streams from the demuxer. Turns out that the ogg muxer is able to reconstruct those headers so there wasn't much to do on that front except remove them :-)

21:56 <Lynne> every repository I try to pull out of gets me a 502

21:56 <Lynne> all this scraping is getting heavy

21:56 Mirarora has joined #ffmpeg-devel

21:57 <Lynne> and it's not like any of this goes to search engines

21:57 cone-515 has quit [Quit: transmission timeout]

21:59 <BtbN> pull via ssh

22:02 Mirarora has quit [Quit: Mirarora encountered a fatal error and needs to close]

22:09 Mirarora has joined #ffmpeg-devel

22:21 blb has quit [Ping timeout: 260 seconds]

22:22 blb has joined #ffmpeg-devel

22:30 ahmedhamed has quit [Quit: Connection closed for inactivity]

22:36 realies has quit [Quit: Ping timeout (120 seconds)]

22:37 realies has joined #ffmpeg-devel

22:38 kasper93 has quit [Ping timeout: 248 seconds]

22:40 kasper93 has joined #ffmpeg-devel

22:50 ccawley2011 has quit [Read error: Connection reset by peer]

22:50 lemourin has quit [Quit: The Lounge - https://thelounge.chat]

22:51 lemourin has joined #ffmpeg-devel

22:54 lemourin has quit [Client Quit]

22:58 lemourin has joined #ffmpeg-devel

23:00 witchymary has quit [Remote host closed the connection]

23:01 witchymary has joined #ffmpeg-devel

23:28 derpydoo has joined #ffmpeg-devel

23:41 derpydoo has quit [Remote host closed the connection]

23:44 derpydoo has joined #ffmpeg-devel