michaelni changed the topic of #ffmpeg-devel to: Welcome to the FFmpeg development channel | Questions about using FFmpeg or developing with libav* libs should be asked in #ffmpeg | This channel is publicly logged | FFmpeg 7.1.1 has been released! | Please read ffmpeg.org/developer.html#Code-of-conduct
<fflogger> [newticket] QFox: Ticket #11519 ([avcodec] Undefined symbol: ff_init_float2half_tables from Commit 1eafbf820312d45b31907e16877ae780022598c4) created https://trac.ffmpeg.org/ticket/11519
<BtbN> the hell is dude doing with his builds oO
cone-839 has joined #ffmpeg-devel
<cone-839> ffmpeg Timo Rothenpieler master:d54afd4d6147: avcodec/Makefile: fix build of exr decoder in odd configs
<cone-839> ffmpeg Timo Rothenpieler master:2de14c3e03ed: avcodec/tableprint_vlc: fix build with --enable-hardcoded-tables
<fflogger> [editedticket] Timo Rothenpieler <timo@rothenpieler.org>: Ticket #11519 ([avcodec] Undefined symbol: ff_init_float2half_tables from Commit 1eafbf820312d45b31907e16877ae780022598c4) updated https://trac.ffmpeg.org/ticket/11519#comment:1
<fflogger> [editedticket] Timo Rothenpieler <timo@rothenpieler.org>: Ticket #11518 ([avcodec] Missing ff_tlog due to Commit 0978fea7fa78782377c8b537969f4df1773d82ac) updated https://trac.ffmpeg.org/ticket/11518#comment:3
^Neo has joined #ffmpeg-devel
iive has quit [Quit: They came for me...]
Guest13 has joined #ffmpeg-devel
Guest13 has quit [Client Quit]
jamrial has quit [Read error: Connection reset by peer]
jamrial_ has joined #ffmpeg-devel
IndecisiveTurtle has quit [Ping timeout: 244 seconds]
^Neo has quit [Ping timeout: 268 seconds]
thilo has quit [Ping timeout: 272 seconds]
thilo has joined #ffmpeg-devel
thilo has quit [Changing host]
thilo has joined #ffmpeg-devel
<Traneptora> any changes to opus recently? appear to have a regression trying to parse the Opus packet header
<Traneptora> lemme see if I can bisect
minimal has quit [Quit: Leaving]
abdu75 has joined #ffmpeg-devel
abdu75 has quit [Ping timeout: 240 seconds]
cone-839 has quit [Quit: transmission timeout]
HarshK23 has joined #ffmpeg-devel
<fflogger> [editedticket] Balling: Ticket #11515 ([avcodec] Consider NV12 / P010 output pixel format support) updated https://trac.ffmpeg.org/ticket/11515#comment:4
rvalue has quit [Ping timeout: 252 seconds]
rvalue- has joined #ffmpeg-devel
rvalue- is now known as rvalue
jamrial_ has quit []
Mirarora has quit [Ping timeout: 252 seconds]
System_Error has quit [Ping timeout: 264 seconds]
MetaNova has quit [Read error: Connection reset by peer]
Mirarora has joined #ffmpeg-devel
System_Error has joined #ffmpeg-devel
Martchus has joined #ffmpeg-devel
Martchus_ has quit [Ping timeout: 260 seconds]
System_Error has quit [Remote host closed the connection]
System_Error has joined #ffmpeg-devel
mkver has joined #ffmpeg-devel
twelve has joined #ffmpeg-devel
twelve has quit [Ping timeout: 268 seconds]
twelve has joined #ffmpeg-devel
twelve has quit [Remote host closed the connection]
derpydoo has joined #ffmpeg-devel
twelve has joined #ffmpeg-devel
Sean_McG has quit [Ping timeout: 260 seconds]
MyNetAz has quit [Remote host closed the connection]
MyNetAz has joined #ffmpeg-devel
Sean_McG has joined #ffmpeg-devel
twelve has quit [Ping timeout: 248 seconds]
twelve has joined #ffmpeg-devel
System_Error has quit [Ping timeout: 264 seconds]
twelve has quit [Ping timeout: 244 seconds]
derpydoo has quit [Quit: derpydoo]
Anthony_ZO has joined #ffmpeg-devel
HarshK23 has quit [Quit: Connection closed for inactivity]
Guest18 has joined #ffmpeg-devel
Guest18 has quit [Client Quit]
MetaNova has joined #ffmpeg-devel
ngaullier has joined #ffmpeg-devel
rvalue has quit [Ping timeout: 244 seconds]
abdu has joined #ffmpeg-devel
twelve has joined #ffmpeg-devel
microchip_ has quit [Quit: There is no spoon!]
microchip_ has joined #ffmpeg-devel
pross has quit [Ping timeout: 245 seconds]
IndecisiveTurtle has joined #ffmpeg-devel
System_Error has joined #ffmpeg-devel
abdu has quit [Ping timeout: 240 seconds]
abdu has joined #ffmpeg-devel
cone-325 has joined #ffmpeg-devel
<cone-325> ffmpeg Gyan Doshi master:740d4009656a: ffbuild: use response files only if ar accepts them
rvalue has joined #ffmpeg-devel
<IndecisiveTurtle> Lynne: That yuv444p16 bug is quite strange, the format of image views is basically the same R16_UINT as p10 and even with correct diff_offset it results in complete garbage
<IndecisiveTurtle> I am gonna try to hack a renderdoc API integration, as being able to visualize the images would help out a lot.
^Neo has joined #ffmpeg-devel
twelve has quit [Remote host closed the connection]
twelve has joined #ffmpeg-devel
abdu has quit [Ping timeout: 240 seconds]
abdu has joined #ffmpeg-devel
<ramiro> haasn: I'm writing a poc backend using asmjit with neon. I don't like the fact that asmjit is c++ and that its api seems to change frequently, but it's the quickest option I found so far (and it has register allocation which is a plus). I'd like to write ffjit, but it would take a while...
<haasn> I think we are fine depending on external libraries as long as we have the C templates to fall back on
<ramiro> I've only implemented read+swizzle+write so far, but it gets rgb24->gbrp and stuff like that
<haasn> ramiro: if you do have some numbers though I’d be hugely interested in what’s going on with float performance if we’re forced to pass the high and low halves separately
<haasn> Maybe at that point fixed precision will be better
<haasn> I do think I will try and implement a 16 bit fixed precision backend in the C templates as well as my next step
<ramiro> but now I have a question. suppose we're using chunks of 16. read will read one full vector of data. but then when we're supposed to convert to float32, should we use 4 vector registers? how are you currently dealing with that?
<wbs> note, that relying on a JIT is a complex API - not all environments do allow JITs (iirc iOS doesn't allow you to do that)
<wbs> so a JIT shouldn't be mandatory for decent performance
<haasn> ramiro: my current plan is to use half registers for 8 bit data, full registers for 16 bit data and double registers for 32 bit data
<haasn> On platforms with only 16 vregs
<haasn> On platforms with 32 vregs I think we can go 1/2/4
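A rough sketch of the register budget behind the half/full/double plan haasn describes. It assumes 128-bit vector registers (NEON-like) and up to 4 components per pixel; the helper names are hypothetical and not part of swscale.

```c
#include <assert.h>

/* Hypothetical helper, not part of swscale: number of half-width vector
 * registers needed to hold one component of a chunk of pixels. */
static int half_regs_per_component(int pixels, int elem_bits, int vreg_bits)
{
    assert(pixels > 0 && elem_bits > 0);
    assert(vreg_bits > 0 && vreg_bits % 2 == 0);
    int half_bits = vreg_bits / 2;
    return (pixels * elem_bits + half_bits - 1) / half_bits;
}

/* Full registers needed for all components of a chunk, rounding each
 * component up to a whole number of registers. */
static int full_regs_per_chunk(int components, int pixels,
                               int elem_bits, int vreg_bits)
{
    int halves = half_regs_per_component(pixels, elem_bits, vreg_bits);
    return components * ((halves + 1) / 2);
}
```

Under these assumptions, a chunk of 8 pixels gives half/full/double registers for 8/16/32-bit data, and 4 components of 32-bit data occupy 8 registers, which leaves the other half of a 16-register file for temporaries. A chunk of 16 pixels gives the 1/2/4 layout and occupies 16 registers for the same data, which is only comfortable with 32 registers.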
<haasn> ramiro: actually another possibility is to call the continuation twice, although that requires a slight bit of finesse wrt output pointers
<ramiro> wbs: asmjit mentions "MAP_JIT is also supported on Apple platforms". but yes, we definitely shouldn't rely on jit for decent performance.
<haasn> And also requires storing data on the stack, so it’s likely slower overall
<haasn> (It was slower in C when I tried it)
<wbs> ramiro: probably on apple platforms in general, on macOS it's probably no problem. but on iOS it has been disallowed by policy for a long time (and lately it may be allowed within EU for some reasons, but it requires you to negotiate with them for getting the permission etc)
<ramiro> wbs: another option would be to run the swscale test, collect all jited functions into asm files, and add that to the normal build.
<haasn> ramiro: with a slight bit of finesse we can also use just “ret” instructions to chain together ops
<wbs> ramiro: or just do build time concatenation of asm snippets? :-)
<haasn> I think at the point the overhead will be negligible
<haasn> ramiro: maybe not the entire function but at least common compositions
<ramiro> which would turn just-in-time into ahead-of-time which brings us back to a normal-time build :P
<haasn> I will remind that my original approach in the swscale3 branch allowed an implementation to map to multiple functions
<wbs> ramiro: yes, exactly. build time compilation is usually no problem, but requiring fancy runtime stuff is problematic across many more exotic platforms
<haasn> Rather than N:M input<-> output pairs you could pregenerate one input function and one output function for every pixfmt
<haasn> That gives us N cases instead of N*N cases to worry about
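A minimal sketch of the "one input function and one output function per pixfmt" idea. The format enum, the intermediate `Pixel` struct and all function names are hypothetical stand-ins for illustration; this is not how swscale represents formats or ops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format list and intermediate representation. */
enum Fmt { FMT_RGB24, FMT_BGR24, FMT_0BGR, FMT_NB };
typedef struct Pixel { uint8_t r, g, b; } Pixel;

/* Each function handles one pixel and returns the advanced pointer. */
typedef const uint8_t *(*read_fn)(const uint8_t *src, Pixel *px);
typedef uint8_t *(*write_fn)(uint8_t *dst, const Pixel *px);

static const uint8_t *read_rgb24(const uint8_t *s, Pixel *p)
{ p->r = s[0]; p->g = s[1]; p->b = s[2]; return s + 3; }
static const uint8_t *read_bgr24(const uint8_t *s, Pixel *p)
{ p->b = s[0]; p->g = s[1]; p->r = s[2]; return s + 3; }
static const uint8_t *read_0bgr(const uint8_t *s, Pixel *p)
{ p->b = s[1]; p->g = s[2]; p->r = s[3]; return s + 4; }

static uint8_t *write_rgb24(uint8_t *d, const Pixel *p)
{ d[0] = p->r; d[1] = p->g; d[2] = p->b; return d + 3; }
static uint8_t *write_bgr24(uint8_t *d, const Pixel *p)
{ d[0] = p->b; d[1] = p->g; d[2] = p->r; return d + 3; }
static uint8_t *write_0bgr(uint8_t *d, const Pixel *p)
{ d[0] = 0; d[1] = p->b; d[2] = p->g; d[3] = p->r; return d + 4; }

/* N readers plus N writers cover all N*N format pairs. */
static const read_fn readers[FMT_NB] = {
    [FMT_RGB24] = read_rgb24, [FMT_BGR24] = read_bgr24, [FMT_0BGR] = read_0bgr,
};
static const write_fn writers[FMT_NB] = {
    [FMT_RGB24] = write_rgb24, [FMT_BGR24] = write_bgr24, [FMT_0BGR] = write_0bgr,
};

static void convert(enum Fmt src_fmt, enum Fmt dst_fmt,
                    const uint8_t *src, uint8_t *dst, size_t pixels)
{
    assert(src_fmt < FMT_NB && dst_fmt < FMT_NB);
    for (size_t i = 0; i < pixels; i++) {
        Pixel px;
        src = readers[src_fmt](src, &px);
        dst = writers[dst_fmt](dst, &px);
    }
}
```

The trade-off is the one discussed in the channel: the table stays small, but every conversion goes through the intermediate form even when a fused routine for a specific pair would be faster.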
<ramiro> haasn: but this is pretty much what the non-jit code will do, isn't it?
ccawley2011 has joined #ffmpeg-devel
<haasn> The real reason to use a JIT would be on x86 due to its myriad number of vector extensions
<haasn> The C reference code will likely stick with one function per op
<haasn> If only for sanity and ease of testing
<haasn> Assuming we do add dedicated asm backends
<haasn> Then we could even drop the use of GCC vectors from the reference code and gradually make it smaller/slower
<haasn> wbs: I like this idea actually, basically running the code in swscale/ops.c at compile time to generate the ops list for each pixfmt, and compiling that into an asm function to dump into a big table
<haasn> That way we can guarantee that the implementation will be correct as well, since we’re using the same logic to generate the ops list
<haasn> While avoiding the headache of JIT
<ramiro> haasn: does the current code already negotiate the chunk size based on the conversions that will be performed? for example if it's just read+swizzle+write, we can operate on full vectors. but if it converts to float, then it should read half a vector.
<haasn> ramiro: not yet, but I think that would be an easy change to make
<haasn> Though that would require that eg 8 bit functions will also operate on ymm0 instead of xmm0 even though the second half will often be ignored
abdu has quit [Ping timeout: 240 seconds]
<haasn> For RISC-V I think we are better off operating on M1/M2/M4 and sticking with a fixed chunk size of VLEN/8
<haasn> Since an 8 bit loop operating on M1 elements is executing 4x faster than an 8 bit loop operating on M4 elements
^Neo has quit [Ping timeout: 260 seconds]
<haasn> ramiro: I think that we will also want a dedicated read/write for every possible block multiple then also
<haasn> So one read function to read xmm, one to read ymm and one to read zmm for example
<ramiro> haasn: yes, that makes sense
<haasn> In theory we can have an implementation per block size for the other ops also
<haasn> I don’t see any reason why not except code size
<haasn> On platforms where zmm is slower than xmm for example
<haasn> (Or equivalently, we could have a variant of 8 bit ops operating on M4 in RISC-V)
<haasn> This conversation is suddenly inspiring me with great ideas :D
IndecisiveTurtle has quit [Remote host closed the connection]
<fflogger> [editedticket] jb_alvarado: Ticket #11469 ([ffmpeg] ffmpeg_demux: readrate plays "catch up" if output is blocked, then later resumed) updated https://trac.ffmpeg.org/ticket/11469#comment:6
twelve has quit [Ping timeout: 272 seconds]
jamrial has joined #ffmpeg-devel
<Lynne> ramiro: I don't like depending on external libraries
<Lynne> not for this particular purpose
<Lynne> let alone "its majick vectors lol"
<Lynne> haasn: I still think you're overthinking this way too much
<ramiro> Lynne: I don't like dependencies either. it's a poc.
<Lynne> focus on C, the assembly can come later
<cone-325> ffmpeg Andreas Rheinhardt master:6bd4e8bf76c7: avcodec/vvc/Makefile: Move VVC decoder->h2645data dep to lavc/Makefile
<cone-325> ffmpeg Andreas Rheinhardt master:81c50c33b6f5: avcodec/Makefile: Only compile executor when VVC decoder is enabled
<haasn> ramiro: one gotcha to pay attention to is the fact SWIZZLE can currently also be used to duplicate (fan-out) inputs
<haasn> and in this case you really do want a vmv, not just renaming the vector
twelve has joined #ffmpeg-devel
<haasn> otherwise a sequence like SWIZZLE, SCALE would scale the same vector twice
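A toy model of the fan-out gotcha. Vector registers are modelled as plain float arrays, and "renaming" as an index map from logical components to physical registers. All names and the `Regs` layout are hypothetical; the sketch only illustrates why a duplicating swizzle needs a real copy.

```c
#include <assert.h>
#include <string.h>

#define CHUNK 4

/* Four "physical registers", one per component. */
typedef struct Regs { float v[4][CHUNK]; } Regs;

/* Returns 1 if the mask reads any source component more than once. */
static int swizzle_has_fanout(const int mask[4])
{
    int seen[4] = {0};
    for (int i = 0; i < 4; i++) {
        assert(mask[i] >= 0 && mask[i] < 4);
        if (seen[mask[i]]++)
            return 1;
    }
    return 0;
}

/* Swizzle with real copies: output i becomes a copy of input mask[i]. */
static void swizzle_copy(Regs *r, const int mask[4])
{
    Regs tmp = *r;
    for (int i = 0; i < 4; i++)
        memcpy(r->v[i], tmp.v[mask[i]], sizeof(r->v[i]));
}

static void scale(Regs *r, float k)
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < CHUNK; j++)
            r->v[i][j] *= k;
}

/* Swizzle by renaming only: logical component i lives in register map[i].
 * With a fan-out map, the same physical register is scaled repeatedly. */
static void scale_renamed(Regs *r, const int map[4], float k)
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < CHUNK; j++)
            r->v[map[i]][j] *= k;
}
```

With the mask {0, 0, 0, 0} and a scale factor of 2, the renaming variant multiplies register 0 four times, while the copying variant scales each of the four copies once. A check like `swizzle_has_fanout()` is one way a backend could decide when renaming is safe.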
<ramiro> haasn: then maybe this should be flagged somehow? or separate ops
<haasn> something else I just realized is that 8 vectors may not be enough to implement LINEAR in the worst case without spilling to stack
<haasn> or maybe it's just barely enough
<haasn> should be enough assuming we have vector<->scalar multiplications
<haasn> ramiro: I could add a function to check for it
<haasn> could use separate ops in theory but since the implementation is tied to the exact mask I don't see the benefit in splitting them up
ccawley2011_ has joined #ffmpeg-devel
ccawley2011 has quit [Ping timeout: 245 seconds]
MyNetAz has quit [K-Lined]
^Neo has joined #ffmpeg-devel
Anthony_ZO has quit [Ping timeout: 248 seconds]
rvalue has quit [Read error: Connection reset by peer]
rvalue has joined #ffmpeg-devel
twelve has quit [Ping timeout: 246 seconds]
ccawley2011_ has quit [Ping timeout: 246 seconds]
twelve has joined #ffmpeg-devel
ccawley2011 has joined #ffmpeg-devel
<fflogger> [editedticket] Gyan: Ticket #11469 ([ffmpeg] ffmpeg_demux: readrate plays "catch up" if output is blocked, then later resumed) updated https://trac.ffmpeg.org/ticket/11469#comment:8
System_Error has quit [Ping timeout: 264 seconds]
HarshK23 has joined #ffmpeg-devel
System_Error has joined #ffmpeg-devel
<fflogger> [editedticket] jb_alvarado: Ticket #11469 ([ffmpeg] ffmpeg_demux: readrate plays "catch up" if output is blocked, then later resumed) updated https://trac.ffmpeg.org/ticket/11469#comment:9
IndecisiveTurtle has joined #ffmpeg-devel
<haasn> ramiro: do you think it's sane to process too many pixels when the image width is not a multiple of the chunk size, but the linesize is large enough to have padding for the remaining pixels?
<haasn> this does mean we will read and possibly operate on undefined values
<haasn> I mean we can definitely use it at least for writes
<haasn> but I am worried about some platforms doing strange things when we end up e.g. dividing by zero
twelve has quit [Remote host closed the connection]
twelve has joined #ffmpeg-devel
<Lynne> if you can mask them to a constant value, it would be fine
<haasn> maybe, but that seems nontrivial in this design
<haasn> ramiro: is there ever a use case for having a pointer alignment requirement that's *smaller* than the block width?
<haasn> for example a platform where aligned reads of 256 bit data only requires pointers aligned to 16 bytes
<haasn> I suppose it can be the case on implementations that read e.g. 512 bits as two separate 256 bit vectors, in this case we just require 32 byte alignment on pointers
<fflogger> [editedticket] quinkblack: Ticket #11510 ([undetermined] Hardware accelerator failed to decode picture - Videotoolbox HEVC) updated https://trac.ffmpeg.org/ticket/11510#comment:2
<Lynne> I believe that's the case on aarch64, where e.g. ld1 { v0, v1, v2, v3 }, [x0] loads 512 bits but requires 128 bit alignment
<Lynne> internally the hardware breaks that down to multiple uops, but this instruction is encoded in 32 bits, which saves on decoding
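A small sketch of the case haasn asks about, where the pointer alignment requirement is smaller than the block width. The `BlockReq` struct and function names are hypothetical, and the 64/16 byte figures in the usage note are illustrative only; real alignment rules are specific to the instruction and hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if ptr is aligned to align bytes (align must be a power of two). */
static int is_aligned(const void *ptr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)ptr & (align - 1)) == 0;
}

/* Hypothetical description of one block load: how many bytes it reads,
 * and the (possibly smaller) alignment it needs. */
typedef struct BlockReq {
    size_t block_bytes;
    size_t align_bytes;
} BlockReq;

static int can_use_block_load(const void *ptr, BlockReq req)
{
    assert(req.align_bytes <= req.block_bytes);
    return is_aligned(ptr, req.align_bytes);
}
```

For a 64-byte block that needs only 16-byte alignment, `can_use_block_load()` accepts a pointer at offset 16 from a 64-byte boundary, which a check against the full block width would reject.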
abdu has joined #ffmpeg-devel
cone-325 has quit [Quit: transmission timeout]
abdu42 has joined #ffmpeg-devel
abdu42 has quit [Client Quit]
abdu has quit [Quit: Client closed]
abdu42 has joined #ffmpeg-devel
s55 has quit [Ping timeout: 248 seconds]
<fflogger> [editedticket] Gyan: Ticket #11469 ([ffmpeg] ffmpeg_demux: readrate plays "catch up" if output is blocked, then later resumed) updated https://trac.ffmpeg.org/ticket/11469#comment:10
s55 has joined #ffmpeg-devel
twelve has quit [Remote host closed the connection]
<ramiro> haasn: it should be ok to process into the padding between width and linesize. but why would it be dividing by zero only outside the image data?
<haasn> ramiro: err, right; actually the only circumstance I can think of is if we do a float read and end up reading e.g. NaN, but we need to avoid crashing on NaN float input one way or the other
<haasn> or put another way, if swscale needs to be robust against illegal data in the image it needs to be robust against illegal data in the padding
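A minimal sketch of the check implied here: whether rounding the width up to a whole number of chunks still fits inside the line's padding. The function names are hypothetical, and it assumes a packed format with a fixed number of bytes per pixel.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of mult. */
static size_t round_up(size_t x, size_t mult)
{
    assert(mult > 0);
    return (x + mult - 1) / mult * mult;
}

/* Returns 1 if processing whole chunks (including the partial last one)
 * never touches bytes beyond linesize. */
static int can_overprocess(size_t width, size_t bytes_per_pixel,
                           size_t linesize, size_t chunk_pixels)
{
    size_t padded_width = round_up(width, chunk_pixels);
    return padded_width * bytes_per_pixel <= linesize;
}
```

When the check fails, the last partial chunk needs a separate tail path. When it passes, the extra pixels are read from and written to padding, so their values are undefined and, as noted above, the ops have to tolerate arbitrary input there just as they do inside the image.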
<ramiro> haasn: I don't know much about your alignment question though. I tend to think that the alignment should be the same as the vector size, but I'm probably wrong. if it crashes or is way too slow, then I look into alignment issues.
<haasn> on x86 at least I found that unaligned reads that don't cross a cache line have almost no performance penalty vs aligned reads
<haasn> maybe like a fraction of a percent slower due to extra checking the cpu needs to do?
<haasn> like, the measurement error was higher than the difference
<ramiro> haasn: what happens when there are NaNs on the image data? I suppose we fix it up somehow. so I guess the same could be done to the padding.
<Lynne> movups vs movaps have zero difference if the address is aligned on modern CPUs
<Lynne> exactly none; the CPU is smart enough to optimize aligned loads automatically
<fflogger> [editedticket] jb_alvarado: Ticket #11469 ([ffmpeg] ffmpeg_demux: readrate plays "catch up" if output is blocked, then later resumed) updated https://trac.ffmpeg.org/ticket/11469#comment:11
Sean_McG has quit [Quit: Lost terminal]
Guest77 has joined #ffmpeg-devel
<Guest77> Hello Everyone,
s55 has quit [Read error: Connection reset by peer]
<Guest77> I am planning to participate in Google Summer of Code 2025. However, there is a requirement to submit a patch to ffmpeg. Can you please guide me on where I should start?
<JEEB> start with building FFmpeg, which starts with cloning the git repository, then basically `mkdir -p build && cd build && ../configure` and then if the configure script succeeds, `make -jN` where N is the amount of cores or so
s55 has joined #ffmpeg-devel
HarshK23 has quit [Quit: Connection closed for inactivity]
<Guest77> JEEB, Thanks, I have already done that. I am using ffmpeg for my thesis, so I have it built. My thesis work is on the AV1 codec, where I am working on switching frames.
<Guest77> I noticed that the website mentions the need to submit code fixes for open issues, and I’m looking for guidance on how to proceed with that aspect. Any pointers would be greatly appreciated.
<JEEB> nice re: AV1
<JEEB> https://trac.ffmpeg.org/ is currently our issue tracker
<JEEB> it is quite messy in the sense that I don't think we have a nice newcomer list which would have a curated set of issues
<Guest77> JEEB, Thanks for pointing me to the issue tracker. I will spend some time there. I am interested in the VP6 encoder project as it aligns with the work I am doing in my thesis. Its qualification task is to "Fix a random bug in FFmpeg _or_ extend libavcodec/vpx_rac.h to support encoding". I will start looking into it.
abdu42 has quit [Ping timeout: 240 seconds]
<haasn> why does av_find_best_pix_fmt_of_2 penalize hwfmt <-> hwfmt translations so heavily? (always returns -1/-2 for these)
<haasn> it leads to the strange situation where filters that support both hwfmts and swfmt (currently vf_libplacebo, but possibly soon vf_scale) will always try to output software frames
<haasn> then again, I guess it leads to less surprising behavior
Guest77 has quit [Quit: Client closed]
delewis has quit [Remote host closed the connection]
<ePirat> haasn, so when I chain two filters that handle both hw and sw formats it would still output a sw one to the next filter even if it could handle the hw one?
<haasn> correct
<ePirat> that seems very wrong
delewis has joined #ffmpeg-devel
<ePirat> I would never expect it to do that…
<haasn> e.g. -vf format=yuv420p,libplacebo,libplacebo,format=yuv420p /* will contain a redundant roundtrip through RAM in between the two libplacebo filters */
<haasn> well, the avfiltergraph code does not know that vf_libplacebo is uploading data to VRAM and back internally
<haasn> in this case you would need -vf format=yuv420p,libplacebo,format=vulkan,libplacebo,format=yuv420p to avoid the round trip
<ePirat> that seems very unintuitive tbh
<haasn> but if we imagine a future where vf_scale can do the same, I would argue that vf_scale should not implicitly upload data to VRAM just because it can
<kepstin> does it properly handle hw frames on different hw as being incompatible?
<haasn> I guess the ideal solution would be to add a flag to the filter saying that it prefers hwaccel formats
<haasn> kepstin: this is impossible; the filter link specifies the hwdevice
<haasn> or else I don't understand what you mean
ccawley2011 has quit [Read error: Connection reset by peer]
abdu42 has joined #ffmpeg-devel
SuperFashi has quit [Quit: No Ping reply in 180 seconds.]
<kepstin> i just seem to recall something where hardware frames from different apis weren't distinguished when checking for link compatibility somewhere in the filter chain code? which could have resulted in it thinking that e.g. a vaapi frame and a vulkan frame were compatible and then failing later.
<kepstin> i could be entirely wrong (or at least outdated) about that, tho :(
SuperFashi has joined #ffmpeg-devel
<ePirat> haasn, yeah I guess such a flag would make sense…
BtbN has quit [Remote host closed the connection]
BtbN has joined #ffmpeg-devel
<ramiro> haasn: when optimizing rgb24 -> 0bgr, it seems that the clear is merged into the read (or it expects the value of a vector to already be zero). but in my asmjit code I don't needlessly clear the vectors before starting.
<ramiro> do you remember off the top of your head where I can disable this optimization? (or mark that vectors aren't cleared to start with?)
<haasn> in ops.c:op_list_update_comps() change op->comps.flags[i] |= SWS_COMP_ZERO | SWS_COMP_EXACT; to op->comps.flags[i] = SWS_COMP_GARBAGE
<haasn> for SWS_OP_READ
<ramiro> haasn: thanks, that seems to have worked
ngaullier has quit [Remote host closed the connection]
<haasn> btw, I refactored my framework to allow arbitrary block sizes, e.g. 32x32 in theory
<haasn> not sure if there will be any benefit to it, though a subsampled implementation may want to stripe together multiple luma rows
abdu5 has joined #ffmpeg-devel
abdu42 has quit [Ping timeout: 240 seconds]
realies has quit [Quit: Ping timeout (120 seconds)]
realies has joined #ffmpeg-devel
Guest7821 has quit [Ping timeout: 244 seconds]
realies has quit [Quit: Ping timeout (120 seconds)]
Guest61 has joined #ffmpeg-devel
abdu5 has quit [Quit: Client closed]
abdu5 has joined #ffmpeg-devel
IndecisiveTurtle has quit [Ping timeout: 265 seconds]
realies has joined #ffmpeg-devel
cone-778 has joined #ffmpeg-devel
<cone-778> ffmpeg James Almer master:bf22c4cc3e00: avutil: only duplicate hal2float and float2half in shared builds
Guest61 has quit [Quit: Client closed]
abdu34 has joined #ffmpeg-devel
minimal has joined #ffmpeg-devel
abdu5 has quit [Ping timeout: 240 seconds]
ccawley2011 has joined #ffmpeg-devel
pross has joined #ffmpeg-devel
abdu34 has quit [Quit: Client closed]
abdu34 has joined #ffmpeg-devel
<cone-778> ffmpeg Andreas Rheinhardt master:c0b7f817a4c7: avcodec/Makefile: Skip ffv1_vulkan.h in checkheaders
IndecisiveTurtle has joined #ffmpeg-devel
ccawley2011_ has joined #ffmpeg-devel
ccawley2011 has quit [Ping timeout: 244 seconds]
ccawley2011_ has quit [Read error: Connection reset by peer]
rvalue has quit [Quit: ZNC - https://znc.in]
LainExperiments has joined #ffmpeg-devel
LainExperiments has quit [Quit: Client closed]
rvalue has joined #ffmpeg-devel
abdu34 has quit [Ping timeout: 240 seconds]
LainExperiments has joined #ffmpeg-devel
Marth64 has quit [Remote host closed the connection]
rvalue- has joined #ffmpeg-devel
rvalue has quit [Read error: Connection reset by peer]
rvalue- is now known as rvalue
LainExperiments has quit [Quit: Client closed]