michaelni changed the topic of #ffmpeg-devel to: Welcome to the FFmpeg development channel | Questions about using FFmpeg or developing with libav* libs should be asked in #ffmpeg | This channel is publicly logged | FFmpeg 7.1.1 has been released! | Please read ffmpeg.org/developer.html#Code-of-conduct
<kasper93>
Direct leak of 1 byte(s) /src/ffmpeg/tools/target_swr_fuzzer.c:135:16
<kasper93>
seems to be leaking when av_samples_fill_arrays() fails
<kasper93>
it probably never failed before(?)
<kasper93>
after 46e3bc2ebd21b215edce773de7c498121c1be766 it jumps over av_freep
<kasper93>
though I'm surprised you don't see this leak in your oss-fuzz dashboard
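[editor's note] The failure mode described above (an early error exit that jumps past the free) can be sketched as follows. This is not the fuzzer's actual code: tracked_alloc(), tracked_freep(), fill_arrays() and run_once() are hypothetical stand-ins, with fill_arrays() playing the role of a setup call that may fail. The point is only that the error path must land on the cleanup label rather than after it.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts live allocations so a leak becomes observable in this sketch. */
static int live_allocs = 0;

/* Hypothetical helper: malloc() that tracks the allocation count. */
static void *tracked_alloc(size_t n)
{
    void *p = malloc(n);
    if (p)
        live_allocs++;
    return p;
}

/* Hypothetical helper: frees *p, sets it to NULL, updates the count. */
static void tracked_freep(void **p)
{
    if (*p) {
        free(*p);
        *p = NULL;
        live_allocs--;
    }
}

/* Hypothetical stand-in for a setup step that can fail. */
static int fill_arrays(int should_fail)
{
    return should_fail ? -1 : 0;
}

/* Returns 0 on success, negative on error; frees the buffer on every path. */
static int run_once(int should_fail)
{
    int ret;
    void *buf = tracked_alloc(1);
    if (!buf)
        return -1;

    ret = fill_arrays(should_fail);
    if (ret < 0)
        goto end;           /* jump to the cleanup label, not past it */

    /* ... the real work would happen here ... */
    ret = 0;
end:
    tracked_freep(&buf);    /* runs for both success and failure */
    return ret;
}
```

Calling run_once(1) exercises the failing path; live_allocs should be back at 0 afterwards if nothing leaked.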
usagi_mimi has quit [Quit: WeeChat 4.5.2]
usagi_mimi has joined #ffmpeg-devel
usagi_mimi has quit [Quit: WeeChat 4.5.2]
usagi_mimi has joined #ffmpeg-devel
^Neo has joined #ffmpeg-devel
usagi_mimi has quit [Quit: WeeChat 4.5.2]
usagi_mimi has joined #ffmpeg-devel
usagi_mimi has quit [Client Quit]
usagi_mimi has joined #ffmpeg-devel
<kasper93>
sent a hotfix
<kasper93>
hopefully this is the last thing needed to unblock oss-fuzz
<kasper93>
also libswscale has pretty low coverage: 0% in gamma.c, csputils.c and lut3d.c
usagi_mimi has quit [Quit: WeeChat 4.5.2]
usagi_mimi has joined #ffmpeg-devel
usagi_mimi has quit [Client Quit]
usagi_mimi has joined #ffmpeg-devel
derpydoo has quit [Quit: derpydoo]
mkver has joined #ffmpeg-devel
jamrial has joined #ffmpeg-devel
Grimmauld has joined #ffmpeg-devel
ccawley2011 has joined #ffmpeg-devel
cone-827 has quit [Quit: transmission timeout]
<haasn>
ramiro: I think the cleanest way to handle the multiple block sizes issue on the API level is to re-add the separate block_size() function; and then allow the user to set a lower block size in chain->block_w/h when actually compiling
<mkver>
jamrial: Can you provide a sample for 261cd929e06b716b3?
<haasn>
or, even better idea: the user can set chain->block_w/h before calling compile() to "suggest" a block size
<haasn>
and the implementation will then pick the best compatible one, rounding up or down as needed
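[editor's note] A minimal sketch of the "suggest a block size, let the backend pick a compatible one" idea. The function name pick_block_size(), the list of supported sizes, and the tie-breaking rule (prefer the larger size when two are equally close) are all assumptions made for illustration; the discussion only says the implementation rounds up or down as needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: returns the supported block size closest to the
 * user's suggestion. Ties go to the larger size (an assumption, not
 * something stated in the discussion).
 */
static int pick_block_size(int suggested, const int *supported, size_t n)
{
    assert(n > 0);
    int best = supported[0];
    for (size_t i = 1; i < n; i++) {
        int d_new  = abs(supported[i] - suggested);
        int d_best = abs(best - suggested);
        if (d_new < d_best || (d_new == d_best && supported[i] > best))
            best = supported[i];
    }
    return best;
}
```

With supported sizes {8, 16, 32, 64}, a suggestion of 60 rounds up to 64 and a suggestion of 4 rounds up to 8, while a suggestion of 100 rounds down to 64.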
<jamrial>
mkver: no, i wrote that three years ago. i have no idea how i tested it
<mkver>
I don't think it ever worked: skipped_last_frame is not synced among threads.
<mkver>
May I just nuke it?
Anthony_ZO has quit [Ping timeout: 252 seconds]
rit has joined #ffmpeg-devel
rit has quit [Remote host closed the connection]
rit has joined #ffmpeg-devel
ccawley2011 has quit [Ping timeout: 265 seconds]
<haasn>
Gramner: wbs: do you have an opinion on adjusting checkasm to report cycles per pixel instead of raw cycles? (for functions with fixed/known sizes)
<haasn>
the main reason I am interested in this is because in my swscale implementation, platforms may choose different block sizes; and benchmarking a 64x1 x86 kernel against a 32x1 C kernel misrepresents the speedup
<Gramner>
not a fan of that
<Gramner>
would just make things confusing imo
<haasn>
one alternative work-around I can implement is to call checkasm_get_perf_context() a second time and manually double the number of iters to normalize everything to the C block size
<haasn>
but I'm not a huge fan of that either
<haasn>
so I think what I'll do is just call a spade a spade and name tests with different block sizes differently
<wbs>
that's what we do for e.g. codec DSP functions with different sizes yes
<haasn>
this loses the ability to see speedup numbers directly though
<wbs>
can't you loop the elemental functions to do a fixed size, running more or less iterations of the asm function?
<Gramner>
so the dsp functions do different things on different implementations, or what? e.g. if you have a 64x64 block, do you have to call the dsp function 64 times in one implementation and 128 times in the other?
<Gramner>
that seems kind of awkward
<wbs>
I guess that requires some sort of function wrapper?
<haasn>
Gramner: the wrapper calls it as many times as needed, yes
<Gramner>
can't you just benchmark the wrapper then?
<haasn>
the wrapper is the same on all platforms
<haasn>
check_func() doesn't work that way, it relies on the uniqueness of function pointers
<haasn>
I think comparing it with traditional dsp functions is sort of flawed
<haasn>
a better analogy would be a bytecode interpreter
<Gramner>
i am indeed confused about this dsp design, yes. normally such things are considered internal implementation details within the dsp function itself. e.g. with wider vectors you may process more data per loop and perform fewer loop iterations
<Gramner>
but the caller doesn't know nor care about that
<haasn>
the problem is that we have a custom calling convention between dsp functions; and depending on how many vector registers you have available (and their size) there is a cpu-dependent limit on the amount of data that you can pass between functions
<haasn>
we specifically _avoid_ passing data in memory between function calls, which would be required if you need to process more data than you have available registers
<Lynne>
we had the same issue for lavu/tx
<haasn>
so what we do on e.g. AVX2 is reserve 8 vector registers for passing pixel data between functions
<Lynne>
could you just run the code as-is on a single-line, rather than benchmarking each chunk?
<haasn>
I could
<haasn>
I mean I could change func_new to the wrapper before calling bench_new()
<haasn>
and have the wrapper repeat the actual function the number of times until it covers a set size e.g. 256
<haasn>
Gramner: instead of a dsp table of fixed size kernels for specific operations, I have a compile() function that lets the backend assemble any sequence of primitives it wants; and in practice we use an internal calling convention + CPS to link kernels together
<haasn>
but e.g. ramiro is working on an asmjit backend that just directly generates the appropriate simd function
<haasn>
and I will also write a gpu backend that compiles it down to a compute shader
<Gramner>
if you have asm functions with custom calling conventions, surely those are just called internally from some other asm function right? you wouldn't call such functions directly from C and pray that the compiler doesn't clobber some particular register?
<haasn>
correct
<haasn>
so for testing internal functions we construct a minimal sequence { read from memory; OP_TO_TEST; write to memory }
<haasn>
and compile that to a function callable from C using the backend.compile()
<Gramner>
could you make that constructed function loop over a fixed (instead of implementation-specific) block size?
<Lynne>
you compile the function? don't you use function pointers?
witchymary has joined #ffmpeg-devel
<Gramner>
testing and comparing different implementations against each other would probably be more accurate if you're making them do the exact same thing rather than different things and trying to somehow normalize the results afterwards
usagi_mimi has quit [Quit: WeeChat 4.5.2]
<haasn>
yeah I think I've been thoroughly convinced that we should benchmark it over a fixed, larger number of pixels (say 256)
<haasn>
I can definitely do that without too much hassle
<haasn>
Lynne: don't understand the question
<haasn>
so my plan for now is to write a dedicated wrapper inside checkasm that will loop the platform-chosen block size over the fixed size of the checkasm test array
<haasn>
and then split the checkasm test up into a separate entry for each possible block size
<haasn>
that gives us the most information and also standardizes the benchmark results
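[editor's note] A sketch of the wrapper idea: loop a kernel with a platform-chosen block width over a fixed number of pixels, so every implementation does the same total work per benchmarked call. This does not use the real checkasm macros; block_kernel, add_one_kernel and run_fixed_width are hypothetical names, and 256 is the example size mentioned in the discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BENCH_PIXELS 256    /* fixed test width, as floated in the discussion */

/* Hypothetical kernel signature: processes exactly block_w pixels. */
typedef void (*block_kernel)(uint8_t *dst, const uint8_t *src, int block_w);

/* Example kernel for illustration only: adds 1 to every pixel. */
static void add_one_kernel(uint8_t *dst, const uint8_t *src, int block_w)
{
    for (int i = 0; i < block_w; i++)
        dst[i] = (uint8_t)(src[i] + 1);
}

/*
 * Calls the kernel until BENCH_PIXELS pixels are covered and returns the
 * number of calls made. Timing this wrapper instead of a single kernel call
 * normalizes results across block sizes.
 */
static int run_fixed_width(block_kernel fn, int block_w,
                           uint8_t *dst, const uint8_t *src)
{
    assert(block_w > 0 && BENCH_PIXELS % block_w == 0);
    int calls = 0;
    for (int x = 0; x < BENCH_PIXELS; x += block_w, calls++)
        fn(dst + x, src + x, block_w);
    return calls;
}
```

A 64-wide kernel is invoked 4 times and a 32-wide kernel 8 times, but both touch the same 256 pixels, so their timings are directly comparable.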
<Lynne>
haasn: it's called compile, but it's not like you jit by gluing executable chunks, surely you just set a few function pointers in a context for functions to call whichever stage they need
<haasn>
Lynne: yeah, also assembling data for the ops
wyatt8750 has quit [Remote host closed the connection]
Flat has joined #ffmpeg-devel
jess has joined #ffmpeg-devel
averne_ has joined #ffmpeg-devel
averne has quit [Ping timeout: 246 seconds]
averne_ is now known as averne
wyatt8740 has joined #ffmpeg-devel
wyatt8740 has quit [Ping timeout: 245 seconds]
wyatt8740 has joined #ffmpeg-devel
Anthony_ZO has joined #ffmpeg-devel
<ramiro>
haasn: I think SwsFunc should return an int. it'll probably be a little bit tricky, since only write() would return a value. at first I'm thinking only of doing the horizontal loop, so the return value would be either "exec has been updated" or "exec has not been updated"
derpydoo has quit [Quit: derpydoo]
<ramiro>
at some point later in time, it could return "exec.x has been updated", "exec.x and exec.y have been updated", or "exec has not been updated". but I think when we factor in that the scaler could be downscaling or upscaling, it would probably make more sense if exec.y was not updated
<ramiro>
it would be much trickier to jit a scaler that can downscale and upscale, and at that point it would probably be easier to have the C code take care of it.