#ffmpeg-devel on 2025-03-21 — irc logs at libera.irclog.whitequark.org

2025-03-03 01:04 michaelni changed the topic of #ffmpeg-devel to: Welcome to the FFmpeg development channel | Questions about using FFmpeg or developing with libav* libs should be asked in #ffmpeg | This channel is publicly logged | FFmpeg 7.1.1 has been released! | Please read ffmpeg.org/developer.html#Code-of-conduct

00:03 IndecisiveTurtle has joined #ffmpeg-devel

00:13 kasper93_ is now known as kasper93

00:17 abdu67 has joined #ffmpeg-devel

00:18 twelve has quit [Remote host closed the connection]

00:19 aaabbb has quit [Ping timeout: 248 seconds]

00:21 abdu has quit [Ping timeout: 240 seconds]

00:24 LainExperiments has joined #ffmpeg-devel

00:29 LainExperiments4 has joined #ffmpeg-devel

00:30 LainExperiments4 has quit [Client Quit]

00:31 LainExperiments has quit [Ping timeout: 240 seconds]

00:31 aaabbb has joined #ffmpeg-devel

00:48 iive has quit [Quit: They came for me...]

00:48 twelve has joined #ffmpeg-devel

01:06 minimal has quit [Quit: Leaving]

01:11 twelve has quit [Remote host closed the connection]

01:34 abdu67 has quit [Quit: Client closed]

01:34 abdu67 has joined #ffmpeg-devel

01:43 thilo has quit [Ping timeout: 260 seconds]

01:45 thilo has joined #ffmpeg-devel

02:00 zeezie01 has quit [Quit: Leaving]

02:23 cone-371 has joined #ffmpeg-devel

02:23 <cone-371> ffmpeg Peter Ross release/7.1:276bd388f33b: avcodec/Makefile: include aom_film_grain.o file for h264_sei component

02:33 <fflogger> [editedticket] jamrial: Ticket #11491 ([avcodec] n7.1.1 fails to build with some options (but not on master), backport needed) updated https://trac.ffmpeg.org/ticket/11491#comment:2

03:02 ^Neo has quit [Ping timeout: 245 seconds]

03:12 JackJ30 has joined #ffmpeg-devel

03:12 JackJ30 has quit [Remote host closed the connection]

03:12 JackJ30 has joined #ffmpeg-devel

03:16 abdu67 has quit [Ping timeout: 240 seconds]

03:16 rvalue has quit [Read error: Connection reset by peer]

03:17 rvalue has joined #ffmpeg-devel

03:18 sudden has quit [Ping timeout: 248 seconds]

03:20 sudden has joined #ffmpeg-devel

03:24 abdu67 has joined #ffmpeg-devel

03:30 jamrial_ has quit []

03:30 JackJ30 has quit [Remote host closed the connection]

03:32 abdu67 has quit [Ping timeout: 240 seconds]

03:35 <cone-371> ffmpeg Andreas Rheinhardt master:dff498fddfae: avutil/csp: Improve enum range comparisons

03:35 <cone-371> ffmpeg Andreas Rheinhardt master:65154ba99442: swscale/tests/swscale: Fix potential buffer overflow

03:35 <cone-371> ffmpeg Andreas Rheinhardt master:94fd222235a9: avcodec/mathtables: Fix inaccurate macro name

03:35 <cone-371> ffmpeg Andreas Rheinhardt master:e5d62e20c8d8: avdevice/sdl2: Suppress macro redefinition warning

04:21 <Lynne> JEEB: wtf

04:21 <Lynne> I thought I'd seen everything

04:22 <Lynne> tver switched its backend a few days ago from brightcove to streaks

04:22 <Lynne> no more easy geolock bypassing, which is very annoying

04:23 <Lynne> but worse, the new streaks backend, uses AAC at 64kbps... LTP!

04:24 <Lynne> AAC-LTP! AT 64KBPS!

04:26 <Lynne> I don't even know where they got the encoder from... I don't think fraunhofer, so... us?

04:36 zsoltiv has quit [Ping timeout: 246 seconds]

04:36 <Lynne> ...I think it's ours, similar artifacts, same bitrate behaviour

04:37 zsoltiv_ has quit [Ping timeout: 252 seconds]

04:40 <Lynne> so many questions, from which version they used, to enabling -strict -2, to even considering it's production-quality, let alone that it's an appropriate profile when everyone gave up on it as soon as it made it into the aac spec

05:00 System_Error has quit [Remote host closed the connection]

05:06 System_Error has joined #ffmpeg-devel

05:37 System_Error has quit [Remote host closed the connection]

05:43 System_Error has joined #ffmpeg-devel

06:00 av500 has quit [Remote host closed the connection]

06:05 derpydoo has joined #ffmpeg-devel

06:22 System_Error has quit [Remote host closed the connection]

06:29 System_Error has joined #ffmpeg-devel

06:35 cone-371 has quit [Quit: transmission timeout]

06:57 MisterMinister has quit [Ping timeout: 260 seconds]

07:14 RuMi_Nos has joined #ffmpeg-devel

07:56 mlauss2 has joined #ffmpeg-devel

08:04 mlauss2 has quit [Quit: Client closed]

08:43 ngaullier has joined #ffmpeg-devel

09:12 twelve has joined #ffmpeg-devel

09:47 Guest11 has joined #ffmpeg-devel

09:52 Guest11 has quit [Quit: Client closed]

09:57 twelve has quit [Ping timeout: 252 seconds]

10:21 microchip_ has quit [Ping timeout: 244 seconds]

10:31 microchip_ has joined #ffmpeg-devel

10:38 derpydoo has quit [Quit: derpydoo]

10:45 abdu67 has joined #ffmpeg-devel

10:46 System_Error has quit [Ping timeout: 264 seconds]

10:47 compnn has joined #ffmpeg-devel

10:47 Marth64 has quit [Remote host closed the connection]

10:48 ccawley2011 has joined #ffmpeg-devel

10:48 natto17 has joined #ffmpeg-devel

10:49 natto has quit [Ping timeout: 260 seconds]

10:49 compnnn has quit [Read error: Connection reset by peer]

10:49 RuMi_Nos has quit [Ping timeout: 260 seconds]

10:50 RuMi_Nos has joined #ffmpeg-devel

10:55 j45_ has joined #ffmpeg-devel

10:55 j45 has quit [Ping timeout: 260 seconds]

10:55 j45 has joined #ffmpeg-devel

10:55 j45_ is now known as j45

10:55 j45 has quit [Changing host]

10:57 System_Error has joined #ffmpeg-devel

10:58 twelve has joined #ffmpeg-devel

11:08 <JEEB> Lynne: fun that someone actually utilized that. and if I recall correctly you recently removed the support for that in the avcodec encoder?

11:11 ^Neo has joined #ffmpeg-devel

11:11 ^Neo has quit [Changing host]

11:16 ccawley2011 has quit [Ping timeout: 248 seconds]

11:26 ccawley2011 has joined #ffmpeg-devel

12:06 <Lynne> yeah, 2 weeks ago or so

12:09 rvalue has quit [Read error: Connection reset by peer]

12:10 rvalue has joined #ffmpeg-devel

12:23 <BBB> I think the concept of quality comparisons isn't well-understood in some places

12:23 <BBB> "it works!" is what they're looking for

12:23 <BBB> where "works" is mostly a technical concept, not one involving actual QoE

12:47 jamrial has joined #ffmpeg-devel

12:50 abdu67 has quit [Ping timeout: 240 seconds]

12:55 <JEEB> so on zlib-ng hosts `IGNORE_TESTS="copy-apng cover-art-aiff-id3v2-remux cover-art-flac-remux cover-art-mp3-id3v2-remux mov-cover-image png-icc png-mdcv shortest-sub apng png lavf-png vsynth_lena-flashsv vsynth1-flashsv vsynth1-mpng vsynth1-zlib vsynth2-flashsv vsynth2-mpng vsynth2-zlib vsynth3-flashsv vsynth3-mpng vsynth3-zlib vsynth_lena-flashsv vsynth_lena-mpng vsynth_lena-zlib"` seems to work with

12:55 <JEEB> FATE

12:56 abdu67 has joined #ffmpeg-devel

12:56 <jamrial> JEEB: nobody willing to write a native no compression deflate impl?

12:57 <jamrial> i think mkver mentioned there was one already in some encoder

12:57 <JEEB> yea

13:10 <haasn> is there a way to load the image edge into xmm without risking a segfault from overread?

13:10 <haasn> or do I have to fall back to scalar code (or pre-packing the image border)?

13:11 <jamrial> haasn: pad the buffer

13:11 <jamrial> always allocate at least 64 bytes more than you need

13:12 <haasn> I guess in the case of read it's trivial to just memcpy the last chunk into a temporary buffer

13:13 <haasn> and for write I imagine there's some sort of masked write operation?

13:13 <BtbN> There's a whole mechanism already to pad buffers so no such workarounds are needed

13:14 <BtbN> and a whole bunch of code that relies on buffers being padded already exists
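A minimal C sketch of the "pad the buffer" convention suggested above. It assumes a 64-byte pad, matching the "at least 64 bytes" figure (libavcodec uses the same value for AV_INPUT_BUFFER_PADDING_SIZE); alloc_padded() and PAD_SIZE are hypothetical names, not FFmpeg API. Zeroing the slack keeps any SIMD overread deterministic.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed pad size; libavcodec uses 64 for AV_INPUT_BUFFER_PADDING_SIZE. */
#define PAD_SIZE 64

/* Hypothetical helper: allocate `size` usable bytes plus PAD_SIZE bytes of
 * zeroed slack, so a vector load that runs past `size` stays in bounds. */
static uint8_t *alloc_padded(size_t size)
{
    uint8_t *buf = malloc(size + PAD_SIZE);
    if (!buf)
        return NULL;
    memset(buf + size, 0, PAD_SIZE); /* overreads see zeros, not garbage */
    return buf;
}
```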

13:14 <haasn> we currently have several open, critical issues on trac about swscale causing segfaults due to overread/overwrite

13:15 <wbs> unfortunately, with libswscale, you can't assume much about how the caller has allocated/padded the buffer it passes to you

13:15 <haasn> I don't think there is any clearly documented padding requirement for swscale and if so, it isn't being enforced

13:16 <haasn> in my rewrite I detect if the buffer is properly aligned/padded (via linesize)

13:16 <haasn> and if not, I want to handle the fallback somehow

13:16 <haasn> I guess memcpy into a padded buffer is the hammer solution

13:17 <JEEB> yea

13:17 <haasn> really we need three separate cases: 1) aligned, padded, 2) unaligned, padded, 3) unaligned, unpadded

13:18 <haasn> for case 1) we can use normal vector load/store; for case 2) we can use unaligned loads/stores

13:19 <haasn> for case 3) we can use unaligned loads for the main loop (we will load extra pixels from the next line), and a memcpy of the tail into a padded buffer for the last line

13:19 <haasn> but for unpadded store we need a masked write or something
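The three cases listed above could be told apart roughly as follows. This is a sketch, not haasn's swscale code: the enum and classify_buffer() are hypothetical, `block` is assumed to be the vector width in bytes (a power of two), linesize is assumed positive, and "padded" is approximated as "linesize covers the row rounded up to a whole block", which also assumes the allocation spans a full linesize for the last row.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the three cases; not swscale's actual API. */
enum buf_case {
    BUF_ALIGNED_PADDED,     /* case 1: aligned vector loads/stores */
    BUF_UNALIGNED_PADDED,   /* case 2: unaligned loads/stores, no tail handling */
    BUF_UNALIGNED_UNPADDED, /* case 3: the final block needs a fallback */
};

/* block: vector width in bytes, assumed to be a power of two.
 * linesize: assumed positive (no vertically flipped images). */
static enum buf_case classify_buffer(const uint8_t *data, ptrdiff_t linesize,
                                     size_t row_bytes, size_t block)
{
    size_t rounded = (row_bytes + block - 1) & ~(block - 1);
    int aligned = (uintptr_t)data % block == 0 && linesize % (ptrdiff_t)block == 0;
    int padded  = linesize >= (ptrdiff_t)rounded;

    if (padded && aligned)
        return BUF_ALIGNED_PADDED;
    if (padded)
        return BUF_UNALIGNED_PADDED;
    return BUF_UNALIGNED_UNPADDED; /* aligned-but-unpadded also lands here */
}
```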

13:24 <haasn> I think I will force a memcpy on unpadded writes

13:24 <haasn> instead of forcing the poor assembly code to worry about it

13:24 <haasn> in practice it's probably not different, the compiler seems to output the same code actually

13:24 <haasn> (write xmm0 to stack and jump into memcpy)
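A sketch of the memcpy fallback for an unpadded tail: full blocks go straight to the kernel, and the leftover bytes are bounced through stack buffers for both the read and the write, similar to the compiler output described above (store the vector to the stack, then memcpy). kernel_block() is a hypothetical stand-in for an asm kernel that always touches exactly BLOCK bytes, and the 16-byte BLOCK is an assumption. Unlike the plan above, this simplified version does not let the main loop read into the next line.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16 /* assumed vector width in bytes */

/* Hypothetical stand-in for an asm kernel that always reads and writes
 * exactly BLOCK bytes; here it simply inverts each byte. */
static void kernel_block(uint8_t *dst, const uint8_t *src)
{
    for (int i = 0; i < BLOCK; i++)
        dst[i] = (uint8_t)~src[i];
}

/* Process one unpadded row of n bytes without touching memory past n. */
static void process_row(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t full = n - n % BLOCK;
    for (size_t x = 0; x < full; x += BLOCK)
        kernel_block(dst + x, src + x);

    size_t tail = n - full;
    if (tail) {
        uint8_t in[BLOCK] = {0}, out[BLOCK];
        memcpy(in, src + full, tail);  /* padded read of the tail */
        kernel_block(out, in);
        memcpy(dst + full, out, tail); /* write only the valid bytes */
    }
}
```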

13:26 abdu45 has joined #ffmpeg-devel

13:26 abdu67 has quit [Ping timeout: 240 seconds]

13:27 <JEEB> I like how ignored tests have `IGNORE\t` in the terminal output :D

13:27 <haasn> is there any platform where unaligned load/store on aligned pointers is slower than aligned load/store?

13:28 <haasn> because if not, we may not need to bother with a separate code path

13:28 <nevcairiel> old x86, maybe some arms

13:31 <JEEB> ah, lavf-gray16be.png I forgot

13:39 <JEEB> `fate-lavf-images` requires `lavf-gray16be.png lavf-rgb48be.png` being ignored

13:39 <ramiro> haasn: I think it might be simpler to always call the c version (one element at a time) at the border of unpadded images.

13:39 <JEEB> alright, this should hopefully get me the full one

13:40 <haasn> ramiro: that is unfortunately pretty difficult, would require maintaining a whole separate ops chain; I also don't really see the point as I strongly doubt it would be faster than a memcpy + calling the asm version

13:40 <haasn> especially given that the C version is now just a reference implementation without any regards for performance

13:41 <haasn> I think it makes much more sense to just memcpy the last block where absolutely necessary, I'm just trying to see if we should bother caring about aligned load/store

13:41 <ramiro> haasn: it won't be faster, but it will be much simpler. simd optimizations do the best they can do with their vector sizes, and not have to worry about edge cases.

13:42 <haasn> that is the argument for doing a memcpy also

13:42 <haasn> I think I will make unaligned load/store the default assumption for asm backends for now

13:43 <haasn> and then later add the ability to make an aligned fast path

13:43 <haasn> only on platforms where we can measure an actual improvement

13:44 <haasn> neon doesn't even seem to have aligned loads

13:45 <wbs> haasn: 32 bit arm has got aligned loads, 64 bit doesn't

13:45 <wbs> (for neon)

13:45 <Lynne> arm's unaligned accesses have been historically slow, IIRC

13:45 <Lynne> but yeah, you shouldn't worry about it these days

13:45 <wbs> in the C intrinsics, you can't request aligned loads though, iirc

13:46 <haasn> rvv also doesn't have aligned loads

13:46 <wbs> Lynne: from what I remember, aligned vs non-aligned loads only made a checkasm measurable difference on the very very earliest armv7 cores

13:46 <haasn> Lynne: unaligned access on aligned pointers, right?

13:46 <haasn> and even then the difference was like, what, 5% on a purely memory constrained kernel?

13:54 <Lynne> no, unaligned on unaligned

13:55 <haasn> but I'm talking about the performance penalty of using unaligned load/store on aligned pointers

13:58 twelve has quit [Remote host closed the connection]

14:34 Anthony_ZO has quit [Read error: Connection reset by peer]

14:35 Anthony_ZO has joined #ffmpeg-devel

14:39 Anthony_ZO has quit [Ping timeout: 276 seconds]

15:00 <averne> Hi all, I'm hitting a bit of a roadblock while working on a multithreaded hwaccel

15:00 <averne> Basically, for interlaced content, I would need to have per-field command buffers etc, except the usual mechanisms such as AVFrame->private_ref->data and hwaccel_picture_private are per-frame

15:01 <averne> The issue is that by proceeding on the two fields simultaneously, I would overwrite codec setup and command list memory. However, if I wait for the first field to have completed before kicking off the second, I run into another issue because subsequent frames, depending on this one, get submitted in-between and try to do inter-pred on undecoded data

15:01 <averne> So essentially, I need to either 1) have per-field acceleration structures or 2) force serialization of the entire decoding process

15:02 <Lynne> averne: vulkan is already multithreaded, did you take a look at it?

15:03 <averne> Yes, I mostly referenced your code while working on mine

15:04 <Lynne> could you simply submit 2 command buffers for each frame sequentially?

15:04 <nevcairiel> the existing multithreading dependency tracking should already track field progress, no?

15:04 <Lynne> ah, right, yup, each start/end_frame gets called once per slice

15:06 <averne> That's not what I've been experiencing with the MR8_BT_B sample. It calls start_frame on the second field before the first field got to end_frame

15:07 <nevcairiel> hwaccel is sometimes special cased to avoid these things since it doesnt typically work on a slice-by-slice basis

15:09 <averne> "could you simply submit 2 command buffers for each frame sequentially?" -> that's what I tried, make the second field wait on the first's completion. However it breaks since other frames get submitted in-between

15:18 <Lynne> are you relying on submission order?

15:18 <Lynne> you should make the hardware wait, not the program

15:19 <Lynne> via semaphores

15:20 abdu45 has quit [Ping timeout: 240 seconds]

15:22 twelve has joined #ffmpeg-devel

15:24 mindfreeze has quit [Ping timeout: 272 seconds]

15:26 mindfreeze has joined #ffmpeg-devel

15:27 <averne> Well since all the decoder instances share the same queue, that would deadlock

15:29 <Lynne> it's a queue, though?

15:29 <Lynne> ah, you have no scheduler?

15:30 <Lynne> desktop GPUs have the GSP which handles scheduling and buffer creation for you, I guess the tegra doesn't

15:31 <Lynne> without scheduling, you should disable threading, no way around that sadly without big modifications

15:35 <averne> No I'm actually on desktop right now, but I don't think scheduling is relevant when you have a single queue. Queues on nvidia are basically wrappers around the host engine (also called gpfifo/pbdma sometimes) which processes commands linearly

15:36 <Lynne> you don't get to use the GSP by default on desktop unless you use nouveau/nvk

15:37 <Lynne> scheduling should be relevant, if one submission waits on the result of another, later submission, the hardware should pause the current submission and execute the second one

15:37 twelve has quit [Remote host closed the connection]

15:40 <averne> I'm on nvidia-open which is GSP-only afaik

15:42 ccawley2011 has quit [Ping timeout: 245 seconds]

15:43 <Lynne> it used to be, not sure if it is now

15:43 <Lynne> nvk is pretty good tbh, if you're not using cuda its worth a try

15:44 ccawley2011 has joined #ffmpeg-devel

15:44 <Lynne> it even ran the ffv1 code with no issues

15:45 <averne> Everything gets submitted to a single hardware channel. If you tell it to acquire a semaphore, it will not process further commands until that happens https://github.com/NVIDIA/open-gpu-doc/blob/master/manuals/turing/tu104/dev_pbdma.ref.txt#L1629-L1631. It might switch to a different *hardware channel*

15:46 <averne> "nvk is pretty good tbh, if you're not using cuda its worth a try" -> well I'm not really using cuda, I'm just doing this for fun basically

15:47 <averne> At this point if I switch to nvk I'd rather fix up the vulkan-video MR

15:50 <averne> Well I'll see if I can put something together. Otherwise, I guess multi-threading is more trouble than it's worth

15:50 <Lynne> something together?

15:51 <Lynne> the gain from multithreading isn't that significant tbh, it's very situational

15:51 <averne> Try to have per-field acceleration structures in a way that's not too hacky

15:52 <averne> And yeah that's my experience too. It only really matters when your decoding is cpu-bound because you can parse headers and push commands faster

15:53 <Lynne> what would you store there?

15:55 IndecisiveTurtle has quit [Quit: IndecisiveTurtle]

15:59 <averne> Everything that the hardware needs (codec setup, ...). If I have a single instance of these structures per-frame, and don't wait on the first field, I would overwrite its data

16:03 <Lynne> that's a lot

16:03 <Lynne> and it's for interlaced...

16:04 <Lynne> I think you should just somehow hack in support fully inside the hwaccel

16:04 <ramiro> haasn: yuv444p -> yuva444p, this could be optimized to memcpy+memset

16:04 <haasn> ramiro: I was thinking about this also

16:05 <Lynne> just ref both field, keep track of it somewhere, and once you have both, submit two command buffers

16:05 <Lynne> it's interlaced, which no one should be caring about these days
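One way to read the suggestion above, sketched in C: keep per-frame state that records which fields have arrived, and submit both command buffers only from the call that completes the pair. FieldPair, field_done() and submit_both() are hypothetical; a real hwaccel would also hold references to both fields and record actual command buffers.

```c
/* Hypothetical per-frame state; not an FFmpeg structure. */
typedef struct FieldPair {
    int ready[2];  /* ready[0]: first field parsed, ready[1]: second field */
    int submitted; /* command buffers submitted for this frame so far */
} FieldPair;

/* Hypothetical hook: a real implementation would submit one command
 * buffer per field here, first field before second. */
static void submit_both(FieldPair *fp)
{
    fp->submitted += 2;
}

/* Called once per field; returns 1 only for the call that completes the
 * pair and triggers submission of both fields back to back. */
static int field_done(FieldPair *fp, int second)
{
    fp->ready[second] = 1;
    if (fp->ready[0] && fp->ready[1] && fp->submitted == 0) {
        submit_both(fp);
        return 1;
    }
    return 0;
}
```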

16:05 <haasn> ramiro: one idea would be to just have a planar memcpy backend

16:06 <haasn> that can compile READ {planar = true}; CLEAR; SWIZZLE; WRITE {planar = true} into a sequence of memcpy and memset calls

16:06 <haasn> something I would like to ultimately be able to do is to solve that with plane refs at a higher level, if we have separate buffers per plane
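For the yuv444p -> yuva444p case discussed above, a planar memcpy/memset path might look like this. The sketch assumes 8-bit planes with independent linesizes and an opaque alpha value of 0xFF; the function and parameter names are illustrative, not swscale API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the three 8-bit source planes row by row (linesizes may differ)
 * and fill the new alpha plane with 0xFF (fully opaque). */
static void yuv444p_to_yuva444p(uint8_t *const dst[4], const ptrdiff_t dst_ls[4],
                                const uint8_t *const src[3], const ptrdiff_t src_ls[3],
                                int w, int h)
{
    for (int p = 0; p < 3; p++)
        for (int y = 0; y < h; y++)
            memcpy(dst[p] + y * dst_ls[p], src[p] + y * src_ls[p], (size_t)w);
    for (int y = 0; y < h; y++)
        memset(dst[3] + y * dst_ls[3], 0xFF, (size_t)w);
}
```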

16:07 <averne> Lynne: a lot of memory? not that much tbh, for 1080p it's a few 100s of kbs at most. And most of that is the compressed bitstream

16:07 <Lynne> not memory, state

16:07 <averne> Ah, right

16:08 <Lynne> with the new changes I made, you can ref the incoming packet

16:08 <Lynne> if you need it

16:09 <averne> Can you link the patch? Maybe this might help

16:11 <Lynne> all it does is it adds an AVBufferRef *pkt to start_frame()

16:11 <Lynne> caff29dbb18feeb87cb00fc4c33d20cf01667be0

16:15 <averne> Thanks, I'll try to see if I can use this

16:16 <averne> But yeah, submitting the two fields at once might be the right way to go about this

16:28 jack has joined #ffmpeg-devel

16:28 swarup3204 has joined #ffmpeg-devel

16:37 <haasn> ramiro: heads up: I refactored the ops_internal.h ABI quite substantially in my branch (not yet pushed); one of the changes is to replace the const void *priv pointer by a small (16 bytes) union

16:37 <haasn> so you can directly store e.g. 4 floats without requiring a second indirection

16:38 <haasn> though I imagine you don't care about this in your JIT impl since you just compile them down to constants
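An illustration of the kind of 16-byte inline payload described above. The member names and layout are assumptions for illustration, not the actual ops_internal.h definition; the point is that four floats fit directly in the op, and static_assert guards the size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-byte payload replacing a `const void *priv` pointer. */
typedef union OpPriv {
    float    f32[4];
    uint32_t u32[4];
    uint16_t u16[8];
    uint8_t  u8[16];
    void    *ptr; /* escape hatch for data that does not fit inline */
} OpPriv;

static_assert(sizeof(OpPriv) == 16, "OpPriv must stay 16 bytes");

/* Store four float constants inline, with no second indirection. */
static OpPriv make_scale(float r, float g, float b, float a)
{
    OpPriv priv = { .f32 = { r, g, b, a } };
    return priv;
}
```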

16:40 cone-022 has joined #ffmpeg-devel

16:40 <cone-022> ffmpeg Dmitrii Ovchinnikov master:5b460bde8b31: libavutil/hwcontext_amf: add format validation in transfer_data functions

16:40 <cone-022> ffmpeg Evgeny Pavlov master:079110238a86: avcodec/amfenc: add smart access video option

16:44 jack has quit [Remote host closed the connection]

16:45 <fflogger> [newticket] iyesin: Ticket #11523 ([ffmpeg] Conflicting requirements while encoding with SVTAV1) created https://trac.ffmpeg.org/ticket/11523

16:45 <fflogger> [newticket] Lastique: Ticket #11524 ([avcodec] g726(le) produces clicks in encoded audio) created https://trac.ffmpeg.org/ticket/11524

16:49 <fflogger> [editedticket] galad: Ticket #11523 ([ffmpeg] Conflicting requirements while encoding with SVTAV1) updated https://trac.ffmpeg.org/ticket/11523#comment:1

16:53 <fflogger> [editedticket] iyesin: Ticket #11523 ([ffmpeg] Conflicting requirements while encoding with SVTAV1) updated https://trac.ffmpeg.org/ticket/11523#comment:2

16:55 <fflogger> [editedticket] Lastique: Ticket #11524 ([avcodec] g726(le) produces clicks in encoded audio) updated https://trac.ffmpeg.org/ticket/11524#comment:1

16:55 <fflogger> [editedticket] galad: Ticket #11523 ([ffmpeg] Conflicting requirements while encoding with SVTAV1) updated https://trac.ffmpeg.org/ticket/11523#comment:3

16:56 <fflogger> [editedticket] iyesin: Ticket #11523 ([ffmpeg] Conflicting requirements while encoding with SVTAV1) updated https://trac.ffmpeg.org/ticket/11523#comment:4

16:58 <fflogger> [editedticket] iyesin: Ticket #11523 ([ffmpeg] Conflicting requirements while encoding with SVTAV1) updated https://trac.ffmpeg.org/ticket/11523#comment:5

16:58 ^Neo has quit [Ping timeout: 268 seconds]

17:15 abdu45 has joined #ffmpeg-devel

17:25 abdu45 has quit [Ping timeout: 240 seconds]

17:35 <swarup3204> Hey all, I am Swarup, a final year undergraduate from IIT KGP. I recently joined this chat and am interested in the projects. Could anyone point to a good starting point, especially some example bugs and their patches along with new bugs

17:42 <frankplow> swarup3204: You can find open and fixed bugs on https://trac.ffmpeg.org/ When people close the trac ticket they will tend to refer to a commit SHA if a code change was made.

17:43 swarup3204 has quit [Quit: Client closed]

17:56 <haasn> time=770 us, ref=893 us, speedup=1.159x faster

17:56 <haasn> first numbers from x86 SIMD (yuv420p -> yuva420p)

17:56 <haasn> ramiro: I wonder if memcpy+memset would even be faster than this

18:04 MisterMinister has joined #ffmpeg-devel

18:09 Marth64 has joined #ffmpeg-devel

18:11 <nevcairiel> memcpy is pretty optimized on most systems, for a plain transfer of a plane, it might be

18:13 abdu45 has joined #ffmpeg-devel

18:17 <Lynne> haasn: I'm guessing the compiler wasn't smart enough to do this in blocks

18:17 <Lynne> since you started off with compiler-generated asm

18:17 <haasn> I'm not using compiler generated asm

18:17 ngaullier has quit [Remote host closed the connection]

18:17 <Lynne> I thought you started off with compiler-generated asm

18:18 iive has joined #ffmpeg-devel

18:19 <Lynne> oh, right, yuv->yuva isn't interleaved, you can't do this in blocks since it's just a memcpy

18:19 <Lynne> you should handle this in the C layer by reffing the input onto the output and creating a new buffer for just the alpha channel

18:34 abdu1 has joined #ffmpeg-devel

18:38 abdu45 has quit [Ping timeout: 240 seconds]

19:03 abdu35 has joined #ffmpeg-devel

19:04 abdu1 has quit [Quit: Client closed]

19:05 abdu5 has joined #ffmpeg-devel

19:07 <haasn> time=353 us, ref=3354 us, speedup=9.485x faster # gray -> gbrp

19:07 <haasn> versus time=449 us, ref=3419 us, speedup=7.609x faster for the naive C reference implementation

19:07 IndecisiveTurtle has joined #ffmpeg-devel

19:08 abdu35 has quit [Ping timeout: 240 seconds]

19:09 <IndecisiveTurtle> Lynne: I hacked some renderdoc integration into ffmpeg and noticed that with yuv444p16 all values in input buffer have 1 extra nibble. Shifting fetched values by 4 in upload shader fixes the graphics. Any idea why that could be?

19:09 <haasn> versus time=560 us, ref=3341 us, speedup=5.960x faster from the initial prototype I had on the ML :)

19:10 <IndecisiveTurtle> Also I would be curious if you think it would be possible to somehow get rdoc API integration into upstream. It makes debugging filters much more enjoyable. Though I'm also fine keeping it a local patch

19:11 <Lynne> nibble?

19:11 <IndecisiveTurtle> On cpu vc2 a value might be 0x1d0 on vc2_vulkan it is 0x1d00
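The "extra nibble" is consistent with a left shift by 4 bits (a factor of 16): 0x1d0 << 4 == 0x1d00. The helper below only mirrors the shader-side workaround in C; it does not identify where the shift is introduced, and its name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: undo a 4-bit (one hex digit) left shift,
 * e.g. the GPU-side 0x1d00 becomes the CPU-side 0x1d0. */
static uint16_t undo_nibble_shift(uint16_t v)
{
    return (uint16_t)(v >> 4);
}
```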

19:13 <Lynne> can you test if ffv1 works fine?

19:13 <Lynne> where's the integration?

19:15 <IndecisiveTurtle> Sure. What should I pass for -c:v to use ffv1 vulkan encoder?

19:15 <IndecisiveTurtle> The rdoc integration I did only for vc2, as I need to insert the points where rdoc begins/ends capture manually.

19:16 <IndecisiveTurtle> Also rdoc does not seem to work with shader object extension

19:16 <IndecisiveTurtle> I had to disable it, otherwise you could not view any resources bound to shaders

19:17 <Lynne> ffv1_vulkan

19:17 <Lynne> I'd like to see the renderdoc code

19:19 <IndecisiveTurtle> It's a bit of a hack atm but sure https://github.com/raphaelthegreat/FFmpeg/commit/898893af

19:19 <IndecisiveTurtle> I would be open to working on it more for an actual implementation that could go upstream if you are open to it

19:20 <IndecisiveTurtle> Making it work with all vk code/adding proper input handling etc

19:24 dionisis has quit [Quit: WeeChat 3.8]

19:29 <BtbN> Is it just me or have spam filters recently gotten much worse? And not in the sense of missing a lot of spam, that actually seems fine

19:29 <IndecisiveTurtle> ffv1 vulkan appears to be working fine

19:29 <BtbN> but looking at my personal and my work spam filter, there's a bloody 50% false-positive rate

19:30 <BtbN> at this kind of rate, I might as well not have a spam filter, cause I need to look at all those mails anyway, to make sure it's actually spam...

19:32 RuMi_Nos has quit [Quit: RuMi_Nos]

19:36 <Lynne> IndecisiveTurtle: odd, how come vc2 reads 4 bits off

19:38 <IndecisiveTurtle> I also checked ffv1_vulkan and it also receives the same "expanded" value yet is fine

19:38 <IndecisiveTurtle> https://imgur.com/a/6en7PDM

19:39 <IndecisiveTurtle> Yet when I print the input buff in vc2 it says pix[0] = 0x1d0

19:39 <IndecisiveTurtle> Where pix[0] is this https://github.com/FFmpeg/FFmpeg/blob/079110238a869b7ca1f3756814e5660c12581d7b/libavcodec/vc2enc.c#L882

19:40 cone-022 has quit [Quit: transmission timeout]

19:47 <IndecisiveTurtle> I wonder if there is some hidden conversion going on, but I'm not super sure where to look

19:58 realies has quit [Quit: ~]

19:58 realies has joined #ffmpeg-devel

20:04 thilo has quit [Ping timeout: 244 seconds]

20:05 <haasn> BBB: x86inc unconditionally turns TAIL_CALL %1 into "call %1; vzeroupper; ret" on AVX2; why is this?

20:05 abdu67 has joined #ffmpeg-devel

20:06 thilo has joined #ffmpeg-devel

20:06 <BBB> haasn: I think gramner might be a better person to ask that question to than me

20:06 <haasn> replacing it by a simple "jmp" brings this function from 354 us to 264 us

20:06 <haasn> cc Gramner

20:07 <BBB> I think you shouldn't over-think some of the abstractions in x86inc.asm

20:07 <BBB> if they're helpful, use them. but you should know what they do and not use them blindly

20:07 <BBB> if they don't work, don't use them. they're not always universally helpful

20:07 <haasn> fair

20:07 <BBB> I admit that's not helpful but that's how I use it

20:08 abdu5 has quit [Ping timeout: 240 seconds]

20:11 <haasn> does x86inc define a helper for the number of caller-saved registers I can assign before it would spill onto the stack? I might simplify my code by just always allocating the maximum number of registers and vector registers

20:12 <Gramner> TAIL_CALL turns into a jmp when there's no epilogue. if a vzeroupper is required there is an epilogue

20:14 realies has quit [Quit: ~]

20:14 microchip_ has quit [Quit: There is no spoon!]

20:14 microchip_ has joined #ffmpeg-devel

20:15 <Gramner> haasn: it does not, but fwiw that number of registers is 3 on x86-32, 7 on win64, and 9 on everything else
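If such a helper were wanted, a compile-time lookup could encode the figures above. This is a hypothetical function (x86inc does not provide one, per the answer above); the register lists in the comments are the volatile (caller-saved) registers of each calling convention.

```c
/* Hypothetical helper: general-purpose registers usable without saving. */
static int free_gprs(void)
{
#if defined(_WIN64) && (defined(_M_X64) || defined(__x86_64__))
    return 7;  /* Win64: rax, rcx, rdx, r8-r11 */
#elif defined(__x86_64__) || defined(_M_X64)
    return 9;  /* SysV x86-64: rax, rcx, rdx, rsi, rdi, r8-r11 */
#elif defined(__i386__) || defined(_M_IX86)
    return 3;  /* x86-32: eax, ecx, edx */
#else
    return -1; /* not an x86 target */
#endif
}
```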

20:16 delewis has quit [Remote host closed the connection]

20:16 <haasn> great, 7 is the max I need

20:17 <haasn> I guess it's unlikely we'll see new, radically different x86 ABIs at this point; maybe I'm overthinking this

20:17 <Gramner> if you're going to jump into another avx2 function at the end you can just use the jmp instruction directly

20:17 <Gramner> APX is a thing :D

20:18 <Gramner> adds another 16 regs, I would assume they will all be free to use without saving

20:18 <haasn> right, the idea here is for each kernel to tail call into the next kernel

20:19 <haasn> ramiro: I think I will hand write SIMD for x86_64 only and rely on the C templates for 32-bit

20:19 <haasn> they are still faster than legacy swscale as we know from my past benchmarks

20:20 <Gramner> ignoring 32bit is perfectly reasonable

20:20 <Gramner> makes life so much easier

20:20 delewis has joined #ffmpeg-devel

20:21 <haasn> especially given that I _already_ have a set of C templates that allow the compiler to generate reasonable code

20:21 <haasn> especially with the GCC vectors code path enabled

20:28 realies has joined #ffmpeg-devel

20:39 dionisis has joined #ffmpeg-devel

20:50 BtbN has quit [Remote host closed the connection]

20:53 BtbN has joined #ffmpeg-devel

20:53 <ramiro> haasn: I was thinking the same thing about 32-bit

20:53 IndecisiveTurtle has quit [Ping timeout: 265 seconds]

21:19 Mirarora has quit [Quit: Mirarora encountered a fatal error and needs to close]

21:27 Riviera has quit [Quit: leaving]

21:33 ccawley2011 has quit [Read error: Connection reset by peer]

21:37 HarshK23 has quit [Quit: Connection closed for inactivity]

21:46 Mirarora has joined #ffmpeg-devel

21:48 HarshK23 has joined #ffmpeg-devel

21:49 Warcop has quit [Remote host closed the connection]

22:00 abdu67 has quit [Quit: Client closed]

22:27 Marth64 has quit [Ping timeout: 252 seconds]

22:37 IndecisiveTurtle has joined #ffmpeg-devel

23:01 minimal has joined #ffmpeg-devel

23:23 Mirarora has quit [Quit: Mirarora encountered a fatal error and needs to close]

23:27 derpydoo has joined #ffmpeg-devel

23:41 uau has quit [Quit: ZNC 1.9.1+deb2+b2 - https://znc.in]

23:42 uau has joined #ffmpeg-devel

23:57 iive has quit [Quit: They came for me...]