michaelni changed the topic of #ffmpeg-devel to: Welcome to the FFmpeg development channel | Questions about using FFmpeg or developing with libav* libs should be asked in #ffmpeg | This channel is publicly logged | FFmpeg 7.1.1 has been released! | Please read ffmpeg.org/developer.html#Code-of-conduct
^Neo_ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
^Neo has joined #ffmpeg-devel
^Neo has quit [Changing host]
<fflogger> [editedticket] Aleksoid1978: Ticket #11505 ([avcodec] Cuvid decoders do not work with CUDA hwaccel anymore) updated https://trac.ffmpeg.org/ticket/11505#comment:7
minimal has quit [Quit: Leaving]
<fflogger> [editedticket] jamrial: Ticket #11490 ([avformat] [Regression] Audio silent for long MOV file) updated https://trac.ffmpeg.org/ticket/11490#comment:7
<cone-049> ffmpeg Timo Rothenpieler master:4c7d0f88f507: avcodec/Makefile: remove redundant object
<cone-049> ffmpeg Timo Rothenpieler master:fed6612415c9: avcodec/cuviddec: use pre-existing chroma format information
<fflogger> [editedticket] Timo Rothenpieler <timo@rothenpieler.org>: Ticket #11505 ([avcodec] Cuvid decoders do not work with CUDA hwaccel anymore) updated https://trac.ffmpeg.org/ticket/11505#comment:8
<fflogger> [editedticket] bermond: Ticket #11505 ([avcodec] Cuvid decoders do not work with CUDA hwaccel anymore) updated https://trac.ffmpeg.org/ticket/11505#comment:9
<fflogger> [editedticket] Wallboy: Ticket #11503 ([avcodec] AC-3 downmix levels defaulting to 1.414 with recent decoder changes) updated https://trac.ffmpeg.org/ticket/11503#comment:4
<haasn> ramiro: wget https://0x0.st/8SdV.c -O swsbench.c && gcc swsbench.c -O3 -mavx2 `pkg-config --cflags --libs libavutil` -o swsbench && ./swsbench
<haasn> Gave all the crazy ideas a try
<haasn> I think based on this I want to use a hybrid array/vector CPS approach, since the same code works for both; that way we will get the (close to optimal) vector performance on GCC while falling back to still-decent array code on other compilers
<haasn> (the two relevant lines in the benchmark are "vector" and "pointer")
<haasn> sadly clang shits the bed for all of these approaches
<haasn> tomorrow I will have to take a look at what happens when we start going up to float sized elements though
<haasn> before committing to any of this
<haasn> but I'm hesitantly optimistic that we can do something like #if VECTOR_SIZE >= sizeof(float[SWS_CHUNK_SIZE]) use f32vec_t #else use float[SWS_CHUNK_SIZE];
<haasn> to not sacrifice too much performance when using floats
<haasn> (as opposed to spilling the f32vec_t all over the stack)
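A minimal sketch of the type selection haasn describes above, assuming GCC/clang vector extensions. SWS_CHUNK_SIZE and VECTOR_SIZE are named in the discussion, but the values used here are placeholders; f32block, F32BLOCK_IS_VECTOR and scale_block are hypothetical names. Since sizeof cannot be evaluated in #if, the block size is spelled out as 4 bytes per float.

```c
#include <assert.h>
#include <stddef.h>

#define SWS_CHUNK_SIZE 8   /* placeholder value, not taken from the log */
#ifndef VECTOR_SIZE
#define VECTOR_SIZE 32     /* assumed native vector width in bytes (e.g. AVX2) */
#endif

#if defined(__GNUC__) && VECTOR_SIZE >= SWS_CHUNK_SIZE * 4
/* The whole chunk fits in one native vector: use a compiler vector type. */
typedef float f32block __attribute__((vector_size(SWS_CHUNK_SIZE * 4)));
#define F32BLOCK_IS_VECTOR 1
#else
/* Otherwise fall back to a plain array, which the compiler may keep in
 * several registers or on the stack as it sees fit. */
typedef float f32block[SWS_CHUNK_SIZE];
#define F32BLOCK_IS_VECTOR 0
#endif

/* Both variants support subscripting through a pointer, so the kernel body
 * is identical for the vector and the array case. */
static void scale_block(f32block *x, float k)
{
    assert(x != NULL);
    for (int i = 0; i < SWS_CHUNK_SIZE; i++)
        (*x)[i] *= k;
}
```

Building with -DVECTOR_SIZE=16 selects the array fallback without touching scale_block, which is the "same code works for both" property mentioned earlier in the log.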
<haasn> what I really like about the CPS approach is it is very flexible; because we can also call the tail multiple times it can even deal with changing element sizes
<haasn> changing chunk sizes rather; e.g. if you want to embed a 2x upscale _inside_ a pipeline
<haasn> so we're not forced to go via memory when scaling
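A rough sketch of the continuation-passing idea, assuming nothing about the actual swscale code: every name below (Op, Kernel, CHUNK, op_scale, op_upscale2x, op_write_u8, run_chain) is hypothetical, and CONTINUE only borrows the macro name mentioned later in the log; its definition here is a guess. Each kernel processes one chunk and then calls the next entry in the op list. The 2x upscale calls its tail twice, so the doubled data never goes through an intermediate buffer in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK 8 /* hypothetical chunk size */

typedef struct Op Op;
/* A kernel gets its own entry in the op list, one chunk of pixels and the
 * output cursor; it ends by invoking the next entry (its continuation). */
typedef void (*Kernel)(const Op *op, float *px, uint8_t **out);

struct Op {
    Kernel fn;
    float  arg; /* small per-op constant */
};

#define CONTINUE(op, px, out) ((op)[1].fn(&(op)[1], (px), (out)))

static void op_scale(const Op *op, float *px, uint8_t **out)
{
    for (int i = 0; i < CHUNK; i++)
        px[i] *= op->arg;
    CONTINUE(op, px, out);
}

/* Nearest-neighbour 2x upscale: one input chunk becomes two output chunks,
 * so the rest of the chain is called twice. */
static void op_upscale2x(const Op *op, float *px, uint8_t **out)
{
    float lo[CHUNK], hi[CHUNK];
    for (int i = 0; i < CHUNK; i++) {
        lo[i] = px[i / 2];
        hi[i] = px[CHUNK / 2 + i / 2];
    }
    CONTINUE(op, lo, out);
    CONTINUE(op, hi, out);
}

/* Terminal op: stores the chunk and advances the output cursor. */
static void op_write_u8(const Op *op, float *px, uint8_t **out)
{
    (void) op;
    for (int i = 0; i < CHUNK; i++)
        (*out)[i] = (uint8_t) px[i];
    *out += CHUNK;
}

/* Reads n input bytes chunk by chunk and feeds each chunk into the chain.
 * With op_upscale2x in the chain, out must have room for 2 * n bytes. */
static void run_chain(const Op *ops, const uint8_t *in, size_t n, uint8_t *out)
{
    assert(n % CHUNK == 0);
    for (size_t x = 0; x < n; x += CHUNK) {
        float px[CHUNK];
        for (int i = 0; i < CHUNK; i++)
            px[i] = in[x + i];
        ops[0].fn(&ops[0], px, &out);
    }
}
```

A chain is then just an array such as { {op_scale, 2.0f}, {op_upscale2x, 0}, {op_write_u8, 0} } passed to run_chain.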
<cone-049> ffmpeg Andreas Rheinhardt master:3e19e5062c42: avcodec/decode: Move is_open check to avcodec_receive_frame()
<cone-049> ffmpeg Andreas Rheinhardt master:47d7c6cd1571: avcodec/codec_internal: Add dedicated is_decoder flag to FFCodec
<cone-049> ffmpeg Andreas Rheinhardt master:c8be309719df: avcodec/codec_internal: Add inlined version of av_codec_is_(de|en)coder
<cone-049> ffmpeg Andreas Rheinhardt master:bfbceb7d554f: avcodec/tests/avcodec: Silence deprecation warnings
<cone-049> ffmpeg Andreas Rheinhardt master:ed1b76cdb79c: avcodec/allcodecs: Don't wrap supported_framerates
<cone-049> ffmpeg Andreas Rheinhardt master:958c46800e68: avcodec/mjpegenc: Reconstify mjpeg encoder
<haasn> what I don't like about the CPS approach is that it slightly hinders our ability to swap out unaligned versions of the read/write callbacks for the edge case, but that's a minor thing to work around by just patching the cps ops list before calling into it
<haasn> what I also don't like is the two levels of indirection for *priv but given that we need to pass the global ctx (for image pointers) and the per-op ctx (for cps) we're a bit short on registers; it's a double indirection one way or the other
<haasn> and *priv is considerably less useful; in theory we could stick some extra data inside the per-op context to allow implementations to store up to e.g. 64 bits without needing to load a pointer
thilo has quit [Ping timeout: 244 seconds]
^Neo has quit [Ping timeout: 252 seconds]
thilo has joined #ffmpeg-devel
thilo has quit [Changing host]
thilo has joined #ffmpeg-devel
^Neo has joined #ffmpeg-devel
Mirarora has quit [Quit: Mirarora encountered a fatal error and needs to close]
pross has joined #ffmpeg-devel
Mirarora has joined #ffmpeg-devel
^Neo has quit [Ping timeout: 246 seconds]
<Lynne> do we have anything in the code that produces a software YUV frame but where a single buffer in AVFrame holds all planes?
<Lynne> by default we allocate one buffer per plane
<Lynne> just want to know if it's worth having a fast path for the case where someone packs all planes in a single buffer
<jamrial> Lynne: av_frame_get_buffer()
<Lynne> huh, I was sure we allocated one buffer per plane
<toots5446> Lynne: do you like the new ogg patch series? I think it's got your last recommendations in it!
<Lynne> the files still need to be added to fate before it can be pushed
<toots5446> okay! Any way I can help with that?
<jamrial> Lynne: lavc's get_buffer2() callback does
<Lynne> cool
<Lynne> thanks
<jamrial> the default one, at least
<jamrial> as in, it allocates one buffer per plane, unlike av_frame_get_buffer()
<Lynne> ah
jamrial has quit []
ukn_unknown has joined #ffmpeg-devel
cone-049 has quit [Quit: transmission timeout]
Kei_N has quit [Read error: Connection reset by peer]
Kei_N has joined #ffmpeg-devel
Martchus has joined #ffmpeg-devel
Martchus_ has quit [Ping timeout: 252 seconds]
System_Error has quit [Remote host closed the connection]
System_Error has joined #ffmpeg-devel
ccawley2011 has joined #ffmpeg-devel
Kwiboo has quit [Quit: .]
Kwiboo has joined #ffmpeg-devel
ccawley2011 has quit [Ping timeout: 260 seconds]
ukn_unknown has quit [Ping timeout: 240 seconds]
Martchus_ has joined #ffmpeg-devel
Martchus has quit [Ping timeout: 272 seconds]
ngaullier has joined #ffmpeg-devel
mlauss2 has joined #ffmpeg-devel
^Neo has joined #ffmpeg-devel
^Neo has joined #ffmpeg-devel
^Neo has quit [Changing host]
ahmedhamed has quit [Quit: Connection closed for inactivity]
^Neo has quit [Ping timeout: 276 seconds]
ngaullier has quit [Remote host closed the connection]
ngaullier has joined #ffmpeg-devel
<fflogger> [newticket] redstone: Ticket #11506 ([undetermined] FFMPEG's AV1 hardware decoding is completely broken on Mac) created https://trac.ffmpeg.org/ticket/11506
<JEEB> lol, that is more that ffplay is built without libplacebo in that binary, and he isn't testing just decoding
<ePirat> michaelni, maybe you can help get the files added to fate for toots5446's chained ogg metadata patchset?
<fflogger> [editedticket] quinkblack: Ticket #11506 ([undetermined] FFMPEG's AV1 hardware decoding is completely broken on Mac) updated https://trac.ffmpeg.org/ticket/11506#comment:1
<fflogger> [editedticket] tomwillow: Ticket #7558 ([undetermined] Ignore coded resolutions in using -c:v copy?) updated https://trac.ffmpeg.org/ticket/7558#comment:1
j45_ has joined #ffmpeg-devel
j45 has quit [Ping timeout: 260 seconds]
j45_ is now known as j45
j45 has quit [Changing host]
j45 has joined #ffmpeg-devel
mlauss2 has quit [Quit: Client closed]
MyNetAz has quit [Remote host closed the connection]
<fflogger> [editedticket] Gyan: Ticket #11506 ([undetermined] FFMPEG's AV1 hardware decoding is completely broken on Mac) updated https://trac.ffmpeg.org/ticket/11506#comment:2
MyNetAz has joined #ffmpeg-devel
cone-926 has joined #ffmpeg-devel
<cone-926> ffmpeg wang-bin master:154c00514d88: lavc/videotoolboxenc: add hevc main42210 and p210
Anthony_ZO has joined #ffmpeg-devel
System_Error has quit [Ping timeout: 264 seconds]
^Neo has joined #ffmpeg-devel
^Neo has quit [Changing host]
^Neo has joined #ffmpeg-devel
System_Error has joined #ffmpeg-devel
<wbs> lol, that issue about av1 sw decoding being too slow, and hardware decoding being broken on an M4 ... is using an x86_64 build of ffmpeg
<kierank> loool
<wbs> (and sw decoding of av1 should be plenty fast; the only thing that hwdec gains you is a bit lower power consumption and cpu usage)
<kierank> wbs: does the rosetta simd compiler convert x86 simd to arm simd?
<kierank> or does it just implement it in scalar?
<wbs> kierank: no idea, but I think it does map to NEON in some form (and it doesn't do AVX, only SSE variants, iirc)
_whitelogger has quit [Remote host closed the connection]
_whitelogger_ has joined #ffmpeg-devel
j45 has joined #ffmpeg-devel
ngaullier has quit [Ping timeout: 246 seconds]
<JEEB> wbs: lol, didn't even notice they were using the wrong arch since their testing methodology had issues on a whole different level
Thul has quit [Ping timeout: 245 seconds]
minimal has joined #ffmpeg-devel
<fflogger> [editedticket] cgbug: Ticket #11490 ([avformat] [Regression] Audio silent for long MOV file) updated https://trac.ffmpeg.org/ticket/11490#comment:8
jamrial has joined #ffmpeg-devel
paulk has quit [Ping timeout: 252 seconds]
paulk has joined #ffmpeg-devel
ngaullier has joined #ffmpeg-devel
av500 has quit [Ping timeout: 246 seconds]
HarshK23 has joined #ffmpeg-devel
<ramiro> haasn: have you tested with neon as well? it seems gcc (and clang even worse) don't like passing arguments in vector registers
<ramiro> apparently we'd need to use neon's intrinsic vector types for that
cone-926 has quit [Quit: transmission timeout]
<haasn> don't see any issue here
<haasn> it only breaks for chunk sizes exceeding the vector length
<haasn> but that's a given
<haasn> and by breaks I mean spills to stack
<haasn> RVV codegen completely breaks but that's a given, on RVV we need to determine the vector size dynamically
<haasn> and probably use hand written asm for it
Anthony_ZO has quit [Ping timeout: 252 seconds]
<ramiro> hmm, that's odd. can you do the whole chain on that godbolt? (read, swizzle, from8, lshift, write). it looks like lshift and write are reading from memory again (?)
ccawley2011 has joined #ffmpeg-devel
<ramiro> haasn: oh, got it. it was a chunk size issue. if I set chunk size to 8 on neon then it also works with the int16 functions.
<haasn> right
<haasn> so
<APic> ☺
<haasn> the approach I'm eyeballing now is to use arrays when the vector size would exceed the native vector size
<haasn> this is just for the C fallback code
<haasn> obviously a hand written asm path can do whatever it wants
<haasn> e.g. passing the high and low halves separately
<haasn> though handling 32 bit float vectors is always gonna be a bit challenging
<haasn> since I'm guessing we will want to go no lower than 16 on the chunk size, that will require storing 512 bits per component, e.g. for four components 8 vectors of size 256 or 16 (!) of size 128
<haasn> at least even on RVV 128 we have 32 vector registers so that's fine I suppose
<haasn> what about NEON?
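The register arithmetic above can be checked with a small helper. This is only a worked example; regs_needed is a hypothetical name, and the component count of four is an assumption inferred from the figures (16 floats x 32 bits = 512 bits per component, and 4 x 512 = 2048 bits gives exactly the 8 and 16 registers quoted). For reference, AArch64 NEON provides 32 vector registers of 128 bits each.

```c
#include <assert.h>

/* Number of vector registers needed to keep one chunk of every component
 * live at once, rounded up to whole registers. */
static int regs_needed(int chunk_size, int elem_bits, int components,
                       int vector_bits)
{
    assert(vector_bits > 0);
    int total_bits = chunk_size * elem_bits * components;
    return (total_bits + vector_bits - 1) / vector_bits;
}
```

regs_needed(16, 32, 4, 256) gives 8 and regs_needed(16, 32, 4, 128) gives 16, while a single component needs 2 or 4 registers respectively.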
<haasn> actually I have big plans for an RVV backend
<haasn> since we control the entire call chain we can do neat tricks like only setting $vtype on SWS_OP_READ and SWS_OP_CONVERT
<haasn> all other operations can just assume the vector type is already implicitly set
ccawley2011_ has joined #ffmpeg-devel
<haasn> and we can just use a static pattern m1 for u8, m2 for u16, m4 for f32 + determining the effective chunk size automatically
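Why the m1/m2/m4 pattern works can be seen from the RVV rule VLMAX = VLEN * LMUL / SEW: doubling the register group size whenever the element width doubles keeps the element count per operation constant. A small sketch, with rvv_vlmax as a hypothetical helper name and VLEN = 128 as an example value:

```c
#include <assert.h>

/* Maximum number of elements per vector operation for a given vector
 * register length (VLEN, bits), register grouping (LMUL) and element
 * width (SEW, bits). Integer LMUL only. */
static int rvv_vlmax(int vlen_bits, int lmul, int sew_bits)
{
    assert(sew_bits > 0);
    return vlen_bits * lmul / sew_bits;
}
```

With VLEN = 128, u8 at m1, u16 at m2 and f32 at m4 all yield 16 elements, so the effective chunk size stays the same across conversions. At m4 the 32 vector registers form 8 register groups.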
ccawley2011 has quit [Ping timeout: 252 seconds]
rvalue has quit [Read error: Connection reset by peer]
rvalue has joined #ffmpeg-devel
ukn_unknown has joined #ffmpeg-devel
ukn_unknown43 has joined #ffmpeg-devel
ukn_unknown has quit [Ping timeout: 240 seconds]
ukn_unknown43 has quit [Ping timeout: 240 seconds]
microchip_ has quit [Quit: There is no spoon!]
microchip_ has joined #ffmpeg-devel
ccawley2011_ has quit [Ping timeout: 244 seconds]
rvalue has quit [Read error: Connection reset by peer]
rvalue has joined #ffmpeg-devel
ccawley2011 has joined #ffmpeg-devel
<haasn> I like this framework a lot better overall
<haasn> and it's way faster :)
<haasn> and we can do things like dynamically choosing the correct chunk size, even based on how many remaining operations there are
<haasn> how large, rather
ngaullier has quit [Remote host closed the connection]
cone-821 has joined #ffmpeg-devel
<cone-821> ffmpeg James Almer master:c3b60e0df73b: tests/fate/pixfmt: add conversion tests with semi planar YUV formats
<cone-821> ffmpeg James Almer master:228713ef5dc7: swscale/input: add support for UYYVYY411
<cone-821> ffmpeg James Almer master:52eb0e18db27: avfilter/vsrc_testsrc: use aligned macros for writing
<welder> How to run the new swscale benchmark locally?
another| is now known as another
ccawley2011 has quit [Ping timeout: 252 seconds]
ccawley2011 has joined #ffmpeg-devel
ccawley2011 has quit [Ping timeout: 252 seconds]
<fflogger> [newticket] nathanf: Ticket #11507 ([avfilter] vpp_qsv tonemapping and color space conversion does not change metadata) created https://trac.ffmpeg.org/ticket/11507
minimal has quit [Quit: Leaving]
<fflogger> [editedticket] nyanmisaka: Ticket #11507 ([avfilter] vpp_qsv tonemapping and color space conversion does not change metadata) updated https://trac.ffmpeg.org/ticket/11507#comment:1
<Lynne> what was the way to disable probing in ffmpeg.c?
Guest47 has joined #ffmpeg-devel
Guest47 has quit [Write error: Broken pipe]
<Lynne> speaking of; jamrial, I thought the ffv1 parser avoided the need to decode upfront to detect the format
Guest95 has joined #ffmpeg-devel
Guest95 has quit [Write error: Broken pipe]
Guest47 has joined #ffmpeg-devel
Guest47 has quit [Write error: Connection reset by peer]
cone-821 has quit [Quit: transmission timeout]
Flat_ has joined #ffmpeg-devel
Flat has quit [Ping timeout: 265 seconds]
<jamrial> Lynne: it should, not sure what else could be missing for the demux code to still attempt to decode a frame
psykose has quit [Remote host closed the connection]
psykose has joined #ffmpeg-devel
<fflogger> [editedticket] redstone: Ticket #11506 ([undetermined] FFMPEG's AV1 hardware decoding is completely broken on Mac) updated https://trac.ffmpeg.org/ticket/11506#comment:3
<fflogger> [editedticket] redstone: Ticket #11506 ([undetermined] FFMPEG's AV1 hardware decoding is completely broken on Mac) updated https://trac.ffmpeg.org/ticket/11506#comment:4
<fflogger> [editedticket] redstone: Ticket #11506 ([undetermined] FFMPEG's AV1 hardware decoding is completely broken on Mac) updated https://trac.ffmpeg.org/ticket/11506#comment:5
<fflogger> [editedticket] nathanf: Ticket #11507 ([avfilter] vpp_qsv tonemapping and color space conversion does not change metadata) updated https://trac.ffmpeg.org/ticket/11507#comment:2
<mkver> jamrial, Lynne: Have a look at FF_CODEC_CAP_SKIP_FRAME_FILL_PARAM
<fflogger> [editedticket] redstone: Ticket #11506 ([undetermined] FFMPEG's AV1 hardware decoding is completely broken on Mac) updated https://trac.ffmpeg.org/ticket/11506#comment:6
<Lynne> mkver: adding that to .caps_internal doesn't seem to do it either
<mkver> Yup, it needs to be combined with a skip_frame check.
<Lynne> yes, that works
<mkver> Why are we actually not generically checking for AVDISCARD_ALL in ff_get_buffer()?
another is now known as another|
^Neo has quit [Ping timeout: 272 seconds]
<ramiro> haasn: nice. this way you can define different backends (DECL_IMPL, DECL_IMPL_VEC, CONTINUE, ...)
<ramiro> haasn: btw, "swscale: fix gray -> grayf32 SIGFPE" looks good to me.
<ramiro> (I guess you should still submit it to the ML though)
Mirarora has quit [Quit: Mirarora encountered a fatal error and needs to close]