#ffmpeg-devel on 2025-04-13 — irc logs at libera.irclog.whitequark.org

2025-03-03 01:04 michaelni changed the topic of #ffmpeg-devel to: Welcome to the FFmpeg development channel | Questions about using FFmpeg or developing with libav* libs should be asked in #ffmpeg | This channel is publicly logged | FFmpeg 7.1.1 has been released! | Please read ffmpeg.org/developer.html#Code-of-conduct

00:08 abdu has joined #ffmpeg-devel

00:16 thilo has quit [Ping timeout: 260 seconds]

00:17 thilo has joined #ffmpeg-devel

01:24 derpydoo has joined #ffmpeg-devel

01:27 philipl has joined #ffmpeg-devel

01:45 blut has joined #ffmpeg-devel

01:53 \\Mr_C\\ has quit [Remote host closed the connection]

02:09 blut has left #ffmpeg-devel [#ffmpeg-devel]

02:23 Anthony_ZO has joined #ffmpeg-devel

02:29 ^Neo has quit [Ping timeout: 252 seconds]

02:40 abdu has quit [Quit: Client closed]

02:53 jamrial has quit []

03:29 Traneptora has joined #ffmpeg-devel

04:45 kasper93 has quit [Ping timeout: 276 seconds]

04:46 kasper93 has joined #ffmpeg-devel

04:55 kasper93 has quit [Ping timeout: 252 seconds]

04:56 kasper93 has joined #ffmpeg-devel

06:09 mkver has joined #ffmpeg-devel

07:01 cone-291 has joined #ffmpeg-devel

07:01 <cone-291> ffmpeg Andreas Rheinhardt master:18309fba3c82: avcodec/hq_hqadata: Avoid relocations

07:01 <cone-291> ffmpeg Andreas Rheinhardt master:e38616c4acc5: avcodec/hq{xvlc,_hqadata}: Deduplicate and hardcode cbp table

07:01 <cone-291> ffmpeg Andreas Rheinhardt master:12c9ffa569a0: avcodec/hq: Include alpha in cbp VLC table

07:01 <cone-291> ffmpeg Andreas Rheinhardt master:ce0074f97bdc: avcodec/hq_hqa: Use RL-VLC table

07:01 <cone-291> ffmpeg Andreas Rheinhardt master:9c0d6145c9e0: avcodec/hq_hqa: Include implicit +1 run in RL VLC table

07:01 <cone-291> ffmpeg Andreas Rheinhardt master:c12108cdaa70: avcodec/hq_hqa: Don't zero in small chunks, don't zero twice

07:01 <cone-291> ffmpeg Andreas Rheinhardt master:c1f124f3f03c: avcodec/hq_hqa: Use ff_vlc_init_from_lengths()

07:01 <cone-291> ffmpeg Andreas Rheinhardt master:c39e23cc919b: avcodec/hq_hqa: Check available date before allocating frame

07:01 <cone-291> ffmpeg Andreas Rheinhardt master:16943876f877: avcodec/hq_hqa: Remove implicit always-false checks

07:01 <cone-291> ffmpeg Andreas Rheinhardt master:bf327ac6762d: avcodec/hq_hqa: Check size before initializing GetByteContext

07:37 System_Error has quit [Ping timeout: 264 seconds]

07:41 System_Error has joined #ffmpeg-devel

09:41 Anthony_ZO has quit [Remote host closed the connection]

09:44 derpydoo has quit [Quit: derpydoo]

10:01 cone-291 has quit [Quit: transmission timeout]

10:04 abdu has joined #ffmpeg-devel

10:15 pross has quit [Read error: Connection reset by peer]

11:00 mkver has quit [Ping timeout: 244 seconds]

11:01 mkver has joined #ffmpeg-devel

11:45 ^Neo has joined #ffmpeg-devel

11:45 ^Neo has quit [Changing host]

11:45 ^Neo has joined #ffmpeg-devel

12:36 jamrial has joined #ffmpeg-devel

12:47 <haasn> ramiro: sad, avx FMA instructions are not bit exact :( I guess this is where we start hitting real problems

12:48 <jamrial> yeah, fma has high precision intermediates

13:08 microchip_ has quit [Quit: There is no spoon!]

13:09 microchip_ has joined #ffmpeg-devel

13:17 <haasn> gbrp 1920x1080 -> yuv444p 1920x1080, flags=0x100000 dither=1, SSIM {Y=0.999993 U=0.999994 V=0.999993 A=1.000000}

13:17 <haasn> time=738 us, ref=1785 us, speedup=2.419x faster
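
The reported speedup is consistent with reference time divided by new time (an inference from the numbers, not something stated in the log):

```latex
\text{speedup} = \frac{t_\text{ref}}{t_\text{new}}
= \frac{1785\ \mu\text{s}}{738\ \mu\text{s}} \approx 2.419
```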

13:17 <haasn> regardless of precision differences

13:25 Anthony_ZO has joined #ffmpeg-devel

13:32 averne has quit [Quit: quit]

13:33 averne has joined #ffmpeg-devel

13:36 averne has quit [Read error: Connection reset by peer]

13:36 averne_ has joined #ffmpeg-devel

13:37 averne_ is now known as averne

13:40 averne has quit [Read error: Connection reset by peer]

13:40 averne has joined #ffmpeg-devel

13:42 abdu59 has joined #ffmpeg-devel

13:45 abdu has quit [Ping timeout: 240 seconds]

14:10 <haasn> time=80 us, ref=847 us, speedup=10.587x faster // gray -> gbrp on x86 asm backend

14:10 <haasn> versus 4.2x speedup on the C implementation

14:10 <haasn> I am speed

14:14 <haasn> seems like the x86 asm backend is about 2x-3x faster than the autovectorized C reference across the board

14:16 <haasn> gray10 -> gray16 is 4.6x faster even

14:26 <fflogger> [editedticket] Balling: Ticket #11542 ([ffmpeg] gdigrab sometimes fails to capture specific windows on some machines) updated https://trac.ffmpeg.org/ticket/11542#comment:14

14:35 averne has quit [Ping timeout: 244 seconds]

14:35 averne has joined #ffmpeg-devel

14:36 averne has quit [Client Quit]

14:38 averne has joined #ffmpeg-devel

14:43 averne has quit [Quit: quit]

14:43 <fflogger> [editedticket] hackerfactor: Ticket #6763 ([swscale] swscale: Out-of-bounds memory accesses) updated https://trac.ffmpeg.org/ticket/6763#comment:6

14:49 minimal has joined #ffmpeg-devel

15:01 averne has joined #ffmpeg-devel

15:01 averne has quit [Client Quit]

15:08 averne has joined #ffmpeg-devel

15:20 <toots5446> michaelni: thanks for the review. Do you have any suggestion on the logic that should be applied to that decoder patch? I would really like to move forward with it and I'm not sure what the most advisable path is. I'm all about a solution that is satisfactory for now and can be refined once we actually have code using it.

15:21 <haasn> Is there an explanation anywhere of what the SBUTTERFLY and TRANSPOSE macros in x86util are doing, or how to use them?

15:22 <haasn> cc BBB

15:22 <toots5446> Maybe we should clear all pending metadata that has a PTS lower than the latest decoded frame's? Or simply keep the latest pending metadata and clear it as soon as another one is submitted or a frame with a higher PTS is decoded.
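
A minimal sketch of the first policy floated here: drop pending metadata whose PTS is below that of the latest decoded frame. The decoder patch itself is not shown in the log, so `PendingMeta`, `prune_pending()` and the integer payload are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pending metadata entry; the real patch's structures are not shown. */
typedef struct PendingMeta {
    int64_t pts;
    int     payload;   /* placeholder for the metadata itself */
} PendingMeta;

/* Drops every pending entry whose pts is lower than the pts of the frame
 * just decoded, keeping the survivors in their original order.
 * Returns the new number of entries. */
static size_t prune_pending(PendingMeta *list, size_t n, int64_t decoded_pts)
{
    size_t kept = 0;

    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (list[i].pts >= decoded_pts)
            list[kept++] = list[i];
    }
    return kept;
}
```

Entries at or after the decoded PTS survive, so metadata submitted ahead of its frame is kept until that frame comes out.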

15:26 <jamrial> haasn: https://pastebin.com/raw/9cfdnAmZ

15:31 <fflogger> [editedticket] MasterQuestionable: Ticket #11430 ([avformat] [Regression] Data stream in output may glitch "-stats" display since 7.0) updated https://trac.ffmpeg.org/ticket/11430#comment:14

15:33 <kurosu> haasn: it's DCT-like stuff

15:35 <haasn> jamrial: I'm guessing that's for 8x8. What do the variants like 4x4B do to the extra elements beyond the first 4 bytes in each register? Just ignore them / put random data?

15:35 Anthony_ZO has quit [Ping timeout: 252 seconds]

15:36 <haasn> How does TRANSPOSE8x8W work when mmsize < 32? Or does it just not work in that case?

15:37 <jamrial> 8 words is 16 bytes, so it works fine with mmsize 16

15:37 <jamrial> it uses an extra reg on x86_64, or stack on x86_32, for temporary storage

15:38 <jamrial> and yes, afaik, the ones smaller than the destination reg just end up with garbage in the upper bits, which can be ignored

15:38 <haasn> oh, right - and there isn't an 8x8D variant. hrm

15:39 <haasn> I have a scenario where I effectively need to transpose 32x8B. I can pshufb this to turn it into a 16x8D transpose instead

15:39 <haasn> but none of those variants exist, it seems

15:39 <BBB> haasn: I'm wondering if you're thinking about this the wrong way

15:39 <haasn> (context: reading RGBA into separate registers for R, G, B, A)

15:39 <jamrial> 16x16w uses 16 regs of 16 bytes each, so no mmsize 32 version

15:39 <BBB> don't take this the wrong way

15:40 <BBB> but these macros are not meant to be useful API

15:40 <BBB> they are just repeated sets of instructions

15:40 <BBB> that we turn into a macro so we don't have to explicitly write it out every time

15:42 <BBB> transpose8x8w, for example, likely works for any register size, it's just not necessarily a full cross-lane transpose, just an in-lane one

15:42 <BBB> is that useful? depends on what you're trying to do...

15:42 <jamrial> yeah, if there isn't a specific macro, it's because no decoder needed it

15:43 <jamrial> you can just go and mix SBUTTERFLYs and SWAPs to get 8x8d

15:44 <haasn> what is SBUTTERFLY doing on a conceptual level?

15:44 <haasn> {abcd} {xyzw} -> {axby} {czdw}?

15:44 <jamrial> there's a small explanation above its definition

15:45 <haasn> right, I think I get it now
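
A scalar C model of the interleave discussed above, assuming dword elements in a 128-bit register: `SBUTTERFLY dq` leaves the low-half interleave in the first register and the high-half interleave in the second, i.e. {abcd} {xyzw} -> {axby} {czdw}. Two rounds of butterflies plus a register swap give a 4x4 dword transpose. As far as I can tell this mirrors the sequence in x86util's `TRANSPOSE4x4D`, but the C types and function names here are illustrative only:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-in for an xmm register holding four 32-bit elements. */
typedef struct { uint32_t d[4]; } Xmm;

/* Models SBUTTERFLY dq (punpckldq + punpckhdq):
 * a={a0 a1 a2 a3}, b={b0 b1 b2 b3} -> a={a0 b0 a1 b1}, b={a2 b2 a3 b3} */
static void sbutterfly_dq(Xmm *a, Xmm *b)
{
    Xmm lo = {{ a->d[0], b->d[0], a->d[1], b->d[1] }};
    Xmm hi = {{ a->d[2], b->d[2], a->d[3], b->d[3] }};
    *a = lo;
    *b = hi;
}

/* Models SBUTTERFLY qdq (punpcklqdq + punpckhqdq): same idea on 64-bit halves. */
static void sbutterfly_qdq(Xmm *a, Xmm *b)
{
    Xmm lo = {{ a->d[0], a->d[1], b->d[0], b->d[1] }};
    Xmm hi = {{ a->d[2], a->d[3], b->d[2], b->d[3] }};
    *a = lo;
    *b = hi;
}

/* 4x4 dword transpose of rows a, b, c, d held in r[0..3]. */
static void transpose4x4d(Xmm r[4])
{
    sbutterfly_dq(&r[0], &r[1]);   /* r0={a0 b0 a1 b1}  r1={a2 b2 a3 b3} */
    sbutterfly_dq(&r[2], &r[3]);   /* r2={c0 d0 c1 d1}  r3={c2 d2 c3 d3} */
    sbutterfly_qdq(&r[0], &r[2]);  /* r0={a0 b0 c0 d0}  r2={a1 b1 c1 d1} */
    sbutterfly_qdq(&r[1], &r[3]);  /* r1={a2 b2 c2 d2}  r3={a3 b3 c3 d3} */
    Xmm t = r[1];                  /* the final SWAP puts columns in order */
    r[1] = r[2];
    r[2] = t;
}
```

Each butterfly only interleaves halves, so building a larger transpose (such as 8x8d) is a matter of chaining more rounds at progressively wider element sizes, which is what jamrial suggests.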

15:48 <haasn> I think the problem I'm facing is that I have contiguous elements in separate _lanes_ of the same register

15:49 <haasn> whereas all of these interleaving instructions will end up placing elements from different registers adjacent to each other

15:50 <haasn> maybe what I should be doing is loading xmm sized registers at a time then

15:50 <jamrial> SBUTTERFLY dqqq is crosslane

15:51 <jamrial> and like BBB said, they are just macros for specific combinations of instructions, to make code shorter

15:51 <jamrial> you don't need to use them

15:52 <haasn> e.g. movu xm0, [r2]; movu xm1, [r2+16]; ... vinserti128 m0, m0, [r2+64], 1; vinserti128 m1, m1, [r2+80], 1

15:52 <jamrial> just write the shuffles with punpck*, vperm and such as needed

15:53 Marth64 has joined #ffmpeg-devel

15:53 Marth64[m] has quit [Ping timeout: 276 seconds]

15:54 <haasn> and then I can unpack the lanes individually and it will all be in the correct order at the end

15:54 <haasn> yeah I think that's the approach I'll go with

15:54 <haasn> I don't imagine movu xm0 + vinserti128 ym0 is significantly slower than movu ym0, the bottleneck is going to be reading data either way

15:56 DauntlessOne4 has quit [Ping timeout: 252 seconds]

15:58 <jamrial> haasn: maybe a gather

15:59 <haasn> will try that also

16:00 abdu59 has quit [Ping timeout: 240 seconds]

16:00 <haasn> out of interest I did benchmark vinserti128 vs just fixing the lane order afterwards using vpermq for read_packed2 and the former was 8.1 cycles vs the latter 7.9 cycles
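
On AVX2, the punpck* instructions operate within each 128-bit lane, which is the problem haasn describes above. A scalar model (hypothetical types, dword elements) shows the in-lane result and one lane-crossing fix: permuting each input's qwords into order 0,2,1,3 (vpermq with imm 0xD8) before unpacking makes the in-lane unpacks produce the full-width interleave. Whether this is exactly what the vpermq variant of read_packed2 does is not shown in the log:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-in for a ymm register: elements 0-3 are the low 128-bit
 * lane, elements 4-7 the high lane. */
typedef struct { uint32_t d[8]; } Ymm;

/* Models vpunpckldq ymm: interleaves the low two dwords of each lane
 * separately; nothing crosses the lane boundary. */
static Ymm unpacklo_dq(Ymm a, Ymm b)
{
    Ymm r;
    for (int lane = 0; lane < 8; lane += 4) {
        r.d[lane + 0] = a.d[lane + 0];
        r.d[lane + 1] = b.d[lane + 0];
        r.d[lane + 2] = a.d[lane + 1];
        r.d[lane + 3] = b.d[lane + 1];
    }
    return r;
}

/* Models vpunpckhdq ymm: same, using the high two dwords of each lane. */
static Ymm unpackhi_dq(Ymm a, Ymm b)
{
    Ymm r;
    for (int lane = 0; lane < 8; lane += 4) {
        r.d[lane + 0] = a.d[lane + 2];
        r.d[lane + 1] = b.d[lane + 2];
        r.d[lane + 2] = a.d[lane + 3];
        r.d[lane + 3] = b.d[lane + 3];
    }
    return r;
}

/* Models vpermq ymm, ymm, imm8: destination qword i takes source qword
 * (imm >> 2*i) & 3, which may cross lanes. */
static Ymm permq(Ymm a, unsigned imm)
{
    Ymm r;
    assert(imm <= 0xFF);
    for (int i = 0; i < 4; i++) {
        unsigned src = (imm >> (2 * i)) & 3;
        r.d[2 * i]     = a.d[2 * src];
        r.d[2 * i + 1] = a.d[2 * src + 1];
    }
    return r;
}
```

For inputs {0..7} and {10..17}, the plain unpack gives {0 10 1 11 | 4 14 5 15}, while unpacking the 0xD8-permuted inputs gives {0 10 1 11 | 2 12 3 13}. The vinserti128-based loading haasn proposes instead arranges the lanes while loading, avoiding the extra shuffle.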

16:06 abdu has joined #ffmpeg-devel

16:10 <kurosu> Is looking at anything below 0.5 cycles meaningful? It doesn't exist?

16:13 <kurosu> And I stopped looking at vpgather for anything that is stride loading. Even if dav1d using them would indicate they're beneficial

16:26 <fflogger> [editedticket] MasterQuestionable: Ticket #11435 ([avformat] Added "-extension_picky" breaks various applications) updated https://trac.ffmpeg.org/ticket/11435#comment:30

16:31 <fflogger> [newticket] kasper93: Ticket #11545 ([avcodec] [dec:libzvbi_teletextdec] Error while opening decoder: Internal bug, should not have happened) created https://trac.ffmpeg.org/ticket/11545

16:33 <michaelni> toots5446, really, whatever is simple and clean. Whatever is chosen might turn out to fail in some corner case (maybe timestamp resets, maybe a corrupted frame); it's easier to adjust when it's in git and we have testcases than by just thinking about theory

16:33 <mkver> kasper93: Why don't you just ping these patches?

16:37 zsoltiv has joined #ffmpeg-devel

16:37 zsoltiv_ has joined #ffmpeg-devel

16:38 <kasper93> I did, I'm trying a new strategy to track patches that no one wants to merge

16:39 <fflogger> [editedticket] MasterQuestionable: Ticket #11542 ([ffmpeg] gdigrab sometimes fails to capture specific windows on some machines) updated https://trac.ffmpeg.org/ticket/11542#comment:15

16:45 <kasper93> would it be possible to add v7 to trac? The latest version you can tag is 6.1.1

16:45 <kasper93> nvm, it's not sorted ;p

16:45 <kasper93> my bad

16:45 <fflogger> [editedticket] MasterQuestionable: Ticket #11217 ([ffmpeg] Output "-ss" memory consumption regression) updated https://trac.ffmpeg.org/ticket/11217#comment:26

17:09 <fflogger> [editedticket] MasterQuestionable: Ticket #11271 ([undetermined] How to choose the best hwaccel?) updated https://trac.ffmpeg.org/ticket/11271#comment:9

17:13 k777 has joined #ffmpeg-devel

17:18 <jkqxz> Is there an explanation anywhere of how the frame/slice threading for an lavc decoder works?

17:19 <jkqxz> I seem to have slice threading working trivially by just calling execute2 at the right moment, but I'm not seeing how to connect the pieces together to get frame threading by similar magic.

17:21 <jkqxz> (And I'm pretty sure for frame threading I need to do more to manage context structures.)

17:25 <mkver> jkqxz: If it is an intra-only codec (with no dependencies between frames), then just use s/ff_get_buffer/ff_thread_get_buffer/ and add the AV_CODEC_CAP_FRAME_THREADS.

17:27 <jkqxz> How is that serialising use of private context structures?

17:27 <mkver> If not, you will need an update_thread_context callback. The paradigm here is as follows: Thread A parses the header and sets ctx fields appropriately, then calls ff_thread_get_buffer() and finally ff_thread_finish_setup(). After this, thread A decodes its frame and signals decoding progress, typically via ff_progress_frame_report(). Thread A must not modify any field read by update_thread_context after that.

17:27 <mkver> Every worker thread has its own private context.

17:28 <mkver> FFCodec.init is called on every worker thread.

17:28 <jkqxz> decoder->init got called N times?

17:28 <mkver> Yes.

17:28 <jkqxz> Huh. Ok.

17:29 <mkver> (Decoders can check whether they are the first frame worker thread in order to parse extradata only once.)
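
A toy pthreads model of what mkver describes for intra-only codecs: N private contexts, each initialized separately, decoding independent packets handed out round-robin. This is not lavc's frame-threading code; `WorkerCtx`, `decode_packet()` and `decode_frame_threaded()` are hypothetical, and the model reads from a pre-filled array instead of a packet queue:

```c
#include <assert.h>
#include <pthread.h>

#define MAX_WORKERS 16

/* Hypothetical private context; each worker owns one and runs its own init. */
typedef struct WorkerCtx {
    int        init_calls;
    const int *packets;
    int       *frames;
    int        first, step, count;
} WorkerCtx;

static void worker_init(WorkerCtx *w) { w->init_calls++; }

/* Hypothetical intra-only decode: the output depends only on its own packet. */
static int decode_packet(int pkt) { return pkt * 2 + 1; }

static void *worker_main(void *arg)
{
    WorkerCtx *w = arg;
    for (int i = w->first; i < w->count; i += w->step)
        w->frames[i] = decode_packet(w->packets[i]);  /* disjoint slots, no lock */
    return NULL;
}

/* Decodes `count` independent packets with `nworkers` private contexts.
 * Returns the total number of init calls, or -1 if a thread failed to start. */
static int decode_frame_threaded(const int *packets, int *frames, int count, int nworkers)
{
    WorkerCtx ctx[MAX_WORKERS] = {0};
    pthread_t tid[MAX_WORKERS];
    int started = 0, inits = 0;

    assert(nworkers > 0 && nworkers <= MAX_WORKERS);
    for (int k = 0; k < nworkers; k++) {
        worker_init(&ctx[k]);          /* init runs once per worker context */
        ctx[k].packets = packets;
        ctx[k].frames  = frames;
        ctx[k].first   = k;            /* round-robin: k, k+N, k+2N, ... */
        ctx[k].step    = nworkers;
        ctx[k].count   = count;
        if (pthread_create(&tid[k], NULL, worker_main, &ctx[k]) != 0)
            break;
        started++;
    }
    for (int k = 0; k < started; k++) {
        pthread_join(tid[k], NULL);
        inits += ctx[k].init_calls;
    }
    return started == nworkers ? inits : -1;
}
```

Each worker writes only its own output slots and shares no mutable state, which is why an intra-only decoder without update_thread_context has nothing to race on. Compile with `-pthread`.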

17:30 <mkver> What decoder are we talking about?

17:30 <jkqxz> Making that change does seem to run. Disabling slice threads and decoding a three-frame file gets close to a 3x speedup.

17:30 <jkqxz> I will run in tsan to make sure I haven't got some other bad interaction.

17:31 <mkver> You can't really have races with intra-only codecs without update_thread_context.

17:31 <mkver> What decoder are we talking about?

17:31 <mkver> It's not mjpeg, is it?

17:31 <jkqxz> If the approach is essentially to make N instances of the decoder and use them completely independently then indeed it should be good.

17:32 <jkqxz> APV.

17:44 <jkqxz> How do frame and slice threads interact here?

17:45 <jkqxz> On an input with strong slice threading opportunities enabling both makes it much slower.

17:45 <mkver> Only one threading type is active for any AVCodecContext.

17:46 <mkver> Frame threading is generally preferred over slice threading, see lavc/pthread.c

17:46 <fflogger> [editedticket] MasterQuestionable: Ticket #11544 ([avcodec] Speex playback stutters) updated https://trac.ffmpeg.org/ticket/11544#comment:4

17:46 <mkver> (Hypothetically users can select to use frame threads and override the execute/execute2 callbacks to use both.)

17:49 <jkqxz> Is there any underlying reason for that choice or is it just it hasn't been implemented?

17:51 <mkver> It would not work with the way progress is signaled for frame-threaded decoders (progress is a simple number, typically meaning "the number of macroblock rows that have been decoded and can be referenced").
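
A minimal pthreads sketch of progress as "a simple number": the thread decoding a reference frame reports how many rows are done, and a thread decoding a dependent frame blocks until the rows it needs are available. All names are hypothetical; in lavc the reporting side is ff_progress_frame_report(), as mkver mentions above. One plausible reading of his point: if slices finished out of order, a single row count could only advance once every earlier row was done.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical tracker: one integer meaning "rows of this frame decoded
 * so far that other frames may reference". */
typedef struct Progress {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             rows_done;
} Progress;

static void progress_init(Progress *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->cond, NULL);
    p->rows_done = 0;
}

static void progress_destroy(Progress *p)
{
    pthread_cond_destroy(&p->cond);
    pthread_mutex_destroy(&p->lock);
}

/* Decoding side: publish that `rows` rows are finished (never moves backwards). */
static void progress_report(Progress *p, int rows)
{
    assert(rows >= 0);
    pthread_mutex_lock(&p->lock);
    if (rows > p->rows_done) {
        p->rows_done = rows;
        pthread_cond_broadcast(&p->cond);
    }
    pthread_mutex_unlock(&p->lock);
}

/* Dependent side: block until at least `rows` rows are available. */
static void progress_await(Progress *p, int rows)
{
    pthread_mutex_lock(&p->lock);
    while (p->rows_done < rows)
        pthread_cond_wait(&p->cond, &p->lock);
    pthread_mutex_unlock(&p->lock);
}

static int progress_get(Progress *p)
{
    pthread_mutex_lock(&p->lock);
    int rows = p->rows_done;
    pthread_mutex_unlock(&p->lock);
    return rows;
}

/* Hypothetical reference-frame decoder: finishes rows in order, reporting each. */
typedef struct RowJob {
    Progress *progress;
    int       total_rows;
} RowJob;

static void *decode_rows(void *arg)
{
    RowJob *job = arg;
    for (int row = 1; row <= job->total_rows; row++)
        progress_report(job->progress, row);
    return NULL;
}
```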

17:53 ^Neo has quit [Ping timeout: 248 seconds]

17:56 <jkqxz> Right, for other decoders which do have interdependency. But if I do have a decoder where every slice is independent then it could be implemented.

17:57 <jkqxz> AV_CODEC_CAP_SLICE_THREADS_AND_FRAME_THREADS_AT_THE_SAME_TIME

18:01 <mkver> But hasn't.

18:03 <jkqxz> Yep. And it would need the new cap to distinguish existing codecs which support slice xor frame from codecs which can support slice or frame.

18:04 <mkver> I am not sure it would need a new cap. Users can already choose to use both, but lavc will just use one (there is a field "active_thread_type" (set by lavc) and thread_type (or so)).
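
A sketch of the selection rule described in this exchange: the user may allow both types via thread_type, but only one becomes active, and frame threading wins when the decoder supports it. The flag names and values are hypothetical stand-ins, and the real logic in libavcodec may weigh further conditions that this ignores:

```c
#include <assert.h>

/* Hypothetical flags for this sketch, not lavc's actual definitions. */
enum { THREAD_FRAME = 1 << 0, THREAD_SLICE = 1 << 1 };  /* user's allowed types */
enum { CAP_FRAME    = 1 << 0, CAP_SLICE    = 1 << 1 };  /* decoder capabilities */

/* Returns the single active threading type (0 means no threading).
 * thread_count == 1 disables threading in this model. */
static int pick_active_thread_type(int thread_type, int caps, int thread_count)
{
    assert(thread_count >= 0);
    if (thread_count == 1)
        return 0;
    if ((caps & CAP_FRAME) && (thread_type & THREAD_FRAME))
        return THREAD_FRAME;           /* frame threads take precedence */
    if ((caps & CAP_SLICE) && (thread_type & THREAD_SLICE))
        return THREAD_SLICE;
    return 0;
}
```

Running both at once, as jkqxz proposes for decoders with fully independent slices, would need the rule to return a combination instead of a single type, gated by some (possibly internal-only) capability.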

18:04 <mkver> s/new cap/new public cap/

18:05 <jkqxz> True, it could be internal-only.

18:09 jamrial has quit []

18:14 jamrial has joined #ffmpeg-devel

18:59 TheAIDev has joined #ffmpeg-devel

19:01 <TheAIDev> any plans for subtitle rework/patches soon?

19:15 ___nick___ has joined #ffmpeg-devel

19:39 kasper93 has quit [Quit: kasper93]

19:39 kasper93 has joined #ffmpeg-devel

19:47 ___nick___ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

19:49 abdu has quit [Ping timeout: 240 seconds]

19:49 ___nick___ has joined #ffmpeg-devel

19:49 ___nick___ has quit [Client Quit]

19:52 ___nick___ has joined #ffmpeg-devel

19:54 abdu has joined #ffmpeg-devel

19:57 abdu76 has joined #ffmpeg-devel

20:00 abdu has quit [Ping timeout: 240 seconds]

20:02 abdu has joined #ffmpeg-devel

20:04 ___nick___ has quit [Ping timeout: 265 seconds]

20:04 abdu76 has quit [Ping timeout: 240 seconds]

20:38 abdu has quit [Ping timeout: 240 seconds]

20:56 abdu has joined #ffmpeg-devel

21:04 IndecisiveTurtle has quit [Ping timeout: 248 seconds]

21:15 TheAIDev has quit [Quit: Client closed]

21:17 ^Neo has joined #ffmpeg-devel

21:17 ^Neo has quit [Changing host]

21:17 ^Neo has joined #ffmpeg-devel

21:18 cone-553 has joined #ffmpeg-devel

21:18 <cone-553> ffmpeg Andreas Rheinhardt master:516bcfc169ac: avutil/aes: Use #if checks instead of if (ARCH_X86)

21:18 <cone-553> ffmpeg Andreas Rheinhardt master:f81ace52f8e7: avutil/aes: Make aes_init_static() av_cold

21:18 <cone-553> ffmpeg Andreas Rheinhardt master:ab1bc2f74574: avcodec/aacenc: Remove always-false check

21:18 <cone-553> ffmpeg Andreas Rheinhardt master:c8c4e55b2b56: avcodec/motionpixels: Avoid av_unused

21:18 <cone-553> ffmpeg Andreas Rheinhardt master:35fcdb21322d: swscale/x86/rgb2rgb: Deduplicate ASM constants

21:18 <cone-553> ffmpeg Andreas Rheinhardt master:413905bff2ea: avcodec/opus/tab: Deduplicate arrays

21:18 <cone-553> ffmpeg Andreas Rheinhardt master:044bfc778565: avcodec/aac{enc,}tab: Deduplicate swb tables

21:18 <cone-553> ffmpeg Andreas Rheinhardt master:66310f8a2204: avformat/asf_tags: Deduplicate tags

21:18 <cone-553> ffmpeg Andreas Rheinhardt master:13a0d0ade168: avcodec/mpegaudioenc_template: Remove always-false branch

21:18 <cone-553> ffmpeg Andreas Rheinhardt master:db75955d6023: avcodec/mpegaudioenc_{fixed,float}: Merge encoders

21:18 <cone-553> ffmpeg Andreas Rheinhardt master:6f7ebeff708b: avcodec/mpegaudioenc: Combine writing scale factors

21:18 <cone-553> ffmpeg Andreas Rheinhardt master:87f3e2093144: avcodec/mpegaudioenc: Avoid intermediate buffer

21:18 <cone-553> ffmpeg Andreas Rheinhardt master:62e1abcf0dfd: avcodec/mpegaudioenc: Don't pad one bit at a time

21:18 <cone-553> ffmpeg Andreas Rheinhardt master:7915e2a09586: avcodec/mpegaudioenc: Move PutBitContext to stack

21:18 <cone-553> ffmpeg Andreas Rheinhardt master:75d5672b4bff: avcodec/mpegaudioenc: Rename MPA_encode_* -> mpa_encode_*

21:27 IndecisiveTurtle has joined #ffmpeg-devel

21:42 k777_ has joined #ffmpeg-devel

21:44 IndecisiveTurtle has quit [Quit: IndecisiveTurtle]

21:44 k777 has quit [Read error: Connection reset by peer]

21:44 k777__ has joined #ffmpeg-devel

21:46 abdu has quit [Ping timeout: 240 seconds]

21:47 k777_ has quit [Ping timeout: 252 seconds]

21:55 <toots5446> michaelni: 100% agreed here. I'll revert to the simple solution and we can adjust when we have a more specific use-case. Appreciate it.

21:56 abdu has joined #ffmpeg-devel

22:05 DauntlessOne4 has joined #ffmpeg-devel

22:14 mkver has quit [Ping timeout: 276 seconds]

22:17 System_Error has quit [Ping timeout: 264 seconds]

22:30 IndecisiveTurtle has joined #ffmpeg-devel

22:30 <IndecisiveTurtle> Lynne: Compiled the test under msys2 and it appears to work without crashing on my Vega 8

22:30 <IndecisiveTurtle> https://pastebin.com/Dd20fYME

22:36 System_Error has joined #ffmpeg-devel

22:57 abdu has quit [Quit: Client closed]

23:54 <fflogger> [editedticket] Balling: Ticket #11271 ([undetermined] How to choose the best hwaccel?) updated https://trac.ffmpeg.org/ticket/11271#comment:10