michaelni changed the topic of #ffmpeg-devel to: Welcome to the FFmpeg development channel | Questions about using FFmpeg or developing with libav* libs should be asked in #ffmpeg | This channel is publicly logged | FFmpeg 7.1.1 has been released! | Please read ffmpeg.org/developer.html#Code-of-conduct
IndecisiveTurtle has joined #ffmpeg-devel
kasper93_ is now known as kasper93
abdu67 has joined #ffmpeg-devel
twelve has quit [Remote host closed the connection]
aaabbb has quit [Ping timeout: 248 seconds]
abdu has quit [Ping timeout: 240 seconds]
LainExperiments has joined #ffmpeg-devel
LainExperiments4 has joined #ffmpeg-devel
LainExperiments4 has quit [Client Quit]
LainExperiments has quit [Ping timeout: 240 seconds]
aaabbb has joined #ffmpeg-devel
iive has quit [Quit: They came for me...]
twelve has joined #ffmpeg-devel
minimal has quit [Quit: Leaving]
twelve has quit [Remote host closed the connection]
abdu67 has quit [Quit: Client closed]
abdu67 has joined #ffmpeg-devel
thilo has quit [Ping timeout: 260 seconds]
thilo has joined #ffmpeg-devel
zeezie01 has quit [Quit: Leaving]
cone-371 has joined #ffmpeg-devel
<cone-371>
ffmpeg Peter Ross release/7.1:276bd388f33b: avcodec/Makefile: include aom_film_grain.o file for h264_sei component
<fflogger>
[editedticket] jamrial: Ticket #11491 ([avcodec] n7.1.1 fails to build with some options (but not on master), backport needed) updated https://trac.ffmpeg.org/ticket/11491#comment:2
^Neo has quit [Ping timeout: 245 seconds]
JackJ30 has joined #ffmpeg-devel
JackJ30 has quit [Remote host closed the connection]
JackJ30 has joined #ffmpeg-devel
abdu67 has quit [Ping timeout: 240 seconds]
rvalue has quit [Read error: Connection reset by peer]
rvalue has joined #ffmpeg-devel
sudden has quit [Ping timeout: 248 seconds]
sudden has joined #ffmpeg-devel
abdu67 has joined #ffmpeg-devel
jamrial_ has quit []
JackJ30 has quit [Remote host closed the connection]
abdu67 has quit [Ping timeout: 240 seconds]
<cone-371>
ffmpeg Andreas Rheinhardt master:dff498fddfae: avutil/csp: Improve enum range comparisons
<cone-371>
ffmpeg Andreas Rheinhardt master:65154ba99442: swscale/tests/swscale: Fix potential buffer overflow
<cone-371>
ffmpeg Andreas Rheinhardt master:94fd222235a9: avcodec/mathtables: Fix inaccurate macro name
<cone-371>
ffmpeg Andreas Rheinhardt master:e5d62e20c8d8: avdevice/sdl2: Suppress macro redefinition warning
<Lynne>
JEEB: wtf
<Lynne>
I thought I'd seen everything
<Lynne>
tver switched its backend a few days ago from brightcove to streaks
<Lynne>
no more easy geolock bypassing, which is very annoying
<Lynne>
but worse, the new streaks backend uses AAC at 64kbps... LTP!
<Lynne>
AAC-LTP! AT 64KBPS!
<Lynne>
I don't even know where they got the encoder from... I don't think fraunhofer, so... us?
zsoltiv has quit [Ping timeout: 246 seconds]
<Lynne>
...I think it's ours, similar artifacts, same bitrate behaviour
zsoltiv_ has quit [Ping timeout: 252 seconds]
<Lynne>
so many questions, from which version they used, to enabling -strict -2, to even considering it production-quality, let alone that it's an appropriate profile when everyone gave up on it as soon as it made it into the aac spec
System_Error has quit [Remote host closed the connection]
System_Error has joined #ffmpeg-devel
System_Error has quit [Remote host closed the connection]
System_Error has joined #ffmpeg-devel
av500 has quit [Remote host closed the connection]
derpydoo has joined #ffmpeg-devel
System_Error has quit [Remote host closed the connection]
System_Error has joined #ffmpeg-devel
cone-371 has quit [Quit: transmission timeout]
MisterMinister has quit [Ping timeout: 260 seconds]
RuMi_Nos has joined #ffmpeg-devel
mlauss2 has joined #ffmpeg-devel
mlauss2 has quit [Quit: Client closed]
ngaullier has joined #ffmpeg-devel
twelve has joined #ffmpeg-devel
Guest11 has joined #ffmpeg-devel
Guest11 has quit [Quit: Client closed]
twelve has quit [Ping timeout: 252 seconds]
microchip_ has quit [Ping timeout: 244 seconds]
microchip_ has joined #ffmpeg-devel
derpydoo has quit [Quit: derpydoo]
abdu67 has joined #ffmpeg-devel
System_Error has quit [Ping timeout: 264 seconds]
compnn has joined #ffmpeg-devel
Marth64 has quit [Remote host closed the connection]
ccawley2011 has joined #ffmpeg-devel
natto17 has joined #ffmpeg-devel
natto has quit [Ping timeout: 260 seconds]
compnnn has quit [Read error: Connection reset by peer]
RuMi_Nos has quit [Ping timeout: 260 seconds]
RuMi_Nos has joined #ffmpeg-devel
j45_ has joined #ffmpeg-devel
j45 has quit [Ping timeout: 260 seconds]
j45 has joined #ffmpeg-devel
j45_ is now known as j45
j45 has quit [Changing host]
System_Error has joined #ffmpeg-devel
twelve has joined #ffmpeg-devel
<JEEB>
Lynne: fun that someone actually utilized that. and if I recall correctly you recently removed the support for that in the avcodec encoder?
^Neo has joined #ffmpeg-devel
^Neo has joined #ffmpeg-devel
^Neo has quit [Changing host]
ccawley2011 has quit [Ping timeout: 248 seconds]
ccawley2011 has joined #ffmpeg-devel
<Lynne>
yeah, 2 weeks ago or so
rvalue has quit [Read error: Connection reset by peer]
rvalue has joined #ffmpeg-devel
<BBB>
I think the concept of quality comparisons isn't well-understood in some places
<BBB>
"it works!" is what they're looking for
<BBB>
where "works" is mostly a technical concept, not one involving actual QoE
jamrial has joined #ffmpeg-devel
abdu67 has quit [Ping timeout: 240 seconds]
<JEEB>
so on zlib-ng hosts `IGNORE_TESTS="copy-apng cover-art-aiff-id3v2-remux cover-art-flac-remux cover-art-mp3-id3v2-remux mov-cover-image png-icc png-mdcv shortest-sub apng png lavf-png vsynth_lena-flashsv vsynth1-flashsv vsynth1-mpng vsynth1-zlib vsynth2-flashsv vsynth2-mpng vsynth2-zlib vsynth3-flashsv vsynth3-mpng vsynth3-zlib vsynth_lena-flashsv vsynth_lena-mpng vsynth_lena-zlib"` seems to work with
<JEEB>
FATE
abdu67 has joined #ffmpeg-devel
<jamrial>
JEEB: nobody willing to write a native no compression deflate impl?
<jamrial>
i think mkver mentioned there was one already in some encoder
<JEEB>
yea
<haasn>
is there a way to load the image edge into xmm without risking a segfault from overread?
<haasn>
or do I have to fall back to scalar code (or pre-packing the image border)?
<jamrial>
haasn: pad the buffer
<jamrial>
always allocate at least 64 bytes more than you need
<haasn>
I guess in the case of read it's trivial to just memcpy the last chunk into a temporary buffer
<haasn>
and for write I imagine there's some kind of masked write operation?
<BtbN>
There's a whole mechanism already to pad buffers so no such workarounds are needed
<BtbN>
and a whole bunch of code that relies on buffers being padded already exists
<haasn>
we currently have several open, critical issues on trac about swscale causing segfaults due to overread/overwrite
<wbs>
unfortunately, with libswscale, you can't assume much about how the caller has allocated/padded the buffer it passes to you
<haasn>
I don't think there is any clearly documented padding requirement for swscale and if so, it isn't being enforced
<haasn>
in my rewrite I detect if the buffer is properly aligned/padded (via linesize)
<haasn>
and if not, I want to handle the fallback somehow
<haasn>
I guess memcpy into a padded buffer is the hammer solution
<JEEB>
yea
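The linesize-based check mentioned above could look roughly like the following. This is a minimal sketch, not code from the swscale rewrite: `VEC_SIZE`, `struct plane_layout`, and `check_plane` are hypothetical names, a 16-byte vector width and a positive linesize are assumed, and linesize alone says nothing about padding after the last row.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_SIZE 16 /* assumed vector width in bytes (one xmm register) */

/* Hypothetical result type: what a kernel may assume about one plane. */
struct plane_layout {
    bool aligned; /* base pointer and stride are multiples of VEC_SIZE */
    bool padded;  /* stride leaves room to round each row up to VEC_SIZE */
};

/* Hypothetical helper: inspect one plane via its pointer, stride, and the
 * number of payload bytes per row. Assumes a positive linesize; the last
 * row may still lack padding, since linesize only describes row spacing. */
static struct plane_layout check_plane(const uint8_t *data, ptrdiff_t linesize,
                                       size_t row_bytes)
{
    assert(data && linesize > 0 && row_bytes > 0 && (size_t)linesize >= row_bytes);

    /* Row size rounded up to a whole number of vectors. */
    size_t padded_row = (row_bytes + VEC_SIZE - 1) / VEC_SIZE * VEC_SIZE;

    struct plane_layout l;
    l.padded  = (size_t)linesize >= padded_row;
    l.aligned = (uintptr_t)data % VEC_SIZE == 0 && linesize % VEC_SIZE == 0;
    return l;
}
```

A caller would pick the aligned path only when both flags are set, the unaligned path when only `padded` is set, and a fallback otherwise.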
<haasn>
really we need three separate cases: 1) aligned, padded, 2) unaligned, padded, 3) unaligned, unpadded
<haasn>
for case 1) we can use normal vector load/store; for case 2) we can use unaligned loads/stores
<haasn>
for case 3) we can use unaligned loads for the main loop (we will load extra pixels from the next line), and a memcpy of the tail into a padded buffer for the last line
<haasn>
but for unpadded store we need a masked write or something
<haasn>
I think I will force a memcpy on unpadded writes
<haasn>
instead of forcing the poor assembly code to worry about it
<haasn>
in practice it's probably not different, the compiler seems to output the same code actually
<haasn>
(write xmm0 to stack and jump into memcpy)
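The "memcpy the tail" fallback for unpadded reads and writes can be sketched in plain C. This is an illustration under assumptions, not the swscale code: `kernel_block` stands in for an asm kernel that always touches a full 16-byte block (here it just inverts bytes), and `process_row_unpadded` is a hypothetical wrapper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC_SIZE 16 /* assumed vector width in bytes */

/* Stand-in for an asm kernel: always reads and writes a full VEC_SIZE
 * block. The operation (bitwise invert) is arbitrary. */
static void kernel_block(uint8_t *dst, const uint8_t *src)
{
    for (int i = 0; i < VEC_SIZE; i++)
        dst[i] = (uint8_t)~src[i];
}

/* Process one row whose source and destination have no padding.
 * Full blocks go straight through the kernel; the partial tail is
 * bounced through padded temporaries so that no byte outside the row
 * is ever read or written. */
static void process_row_unpadded(uint8_t *dst, const uint8_t *src, size_t row_bytes)
{
    assert(dst && src);

    size_t full = row_bytes - row_bytes % VEC_SIZE;
    size_t tail = row_bytes - full;

    for (size_t x = 0; x < full; x += VEC_SIZE)
        kernel_block(dst + x, src + x);

    if (tail) {
        uint8_t tmp_in[VEC_SIZE] = {0};
        uint8_t tmp_out[VEC_SIZE];

        memcpy(tmp_in, src + full, tail);   /* padded read  */
        kernel_block(tmp_out, tmp_in);
        memcpy(dst + full, tmp_out, tail);  /* clipped write */
    }
}
```

The kernel itself never needs a masked store; only the wrapper knows about the row boundary, which matches the goal of keeping edge handling out of the assembly.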
abdu45 has joined #ffmpeg-devel
abdu67 has quit [Ping timeout: 240 seconds]
<JEEB>
I like how ignored tests have `IGNORE\t` in the terminal output :D
<haasn>
is there any platform where unaligned load/store on aligned pointers is slower than aligned load/store?
<haasn>
because if not, we may not need to bother with a separate code path
<nevcairiel>
old x86, maybe some arms
<JEEB>
ah, lavf-gray16be.png I forgot
<JEEB>
`fate-lavf-images` requires `lavf-gray16be.png lavf-rgb48be.png` being ignored
<ramiro>
haasn: I think it might be simpler to always call the c version (one element at a time) at the border of unpadded images.
<JEEB>
alright, this should hopefully get me the full one
<haasn>
ramiro: that is unfortunately pretty difficult, would require maintaining a whole separate ops chain; I also don't really see the point as I strongly doubt it would be faster than a memcpy + calling the asm version
<haasn>
especially given that the C version is now just a reference implementation without any regards for performance
<haasn>
I think it makes much more sense to just memcpy the last block where absolutely necessary, I'm just trying to see if we should bother caring about aligned load/store
<ramiro>
haasn: it won't be faster, but it will be much simpler. simd optimizations do the best they can do with their vector sizes, and not have to worry about edge cases.
<haasn>
that is the argument for doing a memcpy also
<haasn>
I think I will make unaligned load/store the default assumption for asm backends for now
<haasn>
and then later add the ability to make an aligned fast path
<haasn>
only on platforms where we can measure an actual improvement
<haasn>
neon doesn't even seem to have aligned loads
<wbs>
haasn: 32 bit arm has got aligned loads, 64 bit doesn't
<wbs>
(for neon)
<Lynne>
arm's unaligned accesses have been historically slow, IIRC
<Lynne>
but yeah, you shouldn't worry about it these days
<wbs>
in the C intrinsics, you can't request aligned loads though, iirc
<haasn>
rvv also doesn't have aligned loads
<wbs>
Lynne: from what I remember, aligned vs non-aligned loads only made a checkasm measurable difference on the very very earliest armv7 cores
<haasn>
Lynne: unaligned access on aligned pointers, right?
<haasn>
and even then the difference was like, what, 5% on a purely memory constrained kernel?
<Lynne>
no, unaligned on unaligned
<haasn>
but I'm talking about the performance penalty of using unaligned load/store on aligned pointers
twelve has quit [Remote host closed the connection]
Anthony_ZO has quit [Read error: Connection reset by peer]
Anthony_ZO has joined #ffmpeg-devel
Anthony_ZO has quit [Ping timeout: 276 seconds]
<averne>
Hi all, I'm hitting a bit of a roadblock while working on a multithreaded hwaccel
<averne>
Basically, for interlaced content, I would need to have per-field command buffers etc, except the usual mechanisms such as AVFrame->private_ref->data and hwaccel_picture_private are per-frame
<averne>
The issue is that by proceeding on the two fields simultaneously, I would overwrite codec setup and command list memory. However, if I wait for the first field to have completed before kicking off the second, I run into another issue because subsequent frames, depending on this one, get submitted in-between and try to do inter-pred on undecoded data
<averne>
So essentially, I need to either 1) have per-field acceleration structures or 2) force serialization of the entire decoding process
<Lynne>
averne: vulkan is already multithreaded, did you take a look at it?
<averne>
Yes, I mostly referenced your code while working on mine
<Lynne>
could you simply submit 2 command buffers for each frame sequentially?
<nevcairiel>
the existing multithreading dependency tracking should already track field progress, no?
<Lynne>
ah, right, yup, each start/end_frame gets called once per slice
<averne>
That's not what I've been experiencing with the MR8_BT_B sample. It calls start_frame on the second field before the first field got to end_frame
<nevcairiel>
hwaccel is sometimes special cased to avoid these things since it doesn't typically work on a slice-by-slice basis
<averne>
"could you simply submit 2 command buffers for each frame sequentially?" -> that's what I tried, make the second field wait on the first's completion. However it breaks since other frames get submitted in-between
<Lynne>
are you relying on submission order?
<Lynne>
you should make the hardware wait, not the program
<Lynne>
via semaphores
abdu45 has quit [Ping timeout: 240 seconds]
twelve has joined #ffmpeg-devel
mindfreeze has quit [Ping timeout: 272 seconds]
mindfreeze has joined #ffmpeg-devel
<averne>
Well since all the decoder instances share the same queue, that would deadlock
<Lynne>
it's a queue, though?
<Lynne>
ah, you have no scheduler?
<Lynne>
desktop GPUs have the GSP which handles scheduling and buffer creation for you, I guess the tegra doesn't
<Lynne>
without scheduling, you should disable threading, no way around that sadly without big modifications
<averne>
No I'm actually on desktop right now, but I don't think scheduling is relevant when you have a single queue. Queues on nvidia are basically wrappers around the host engine (also called gpfifo/pbdma sometimes) which processes commands linearly
<Lynne>
you don't get to use the GSP by default on desktop unless you use nouveau/nvk
<Lynne>
scheduling should be relevant, if one submission waits on the result of another, later submission, the hardware should pause the current submission and execute the second one
twelve has quit [Remote host closed the connection]
<averne>
I'm on nvidia-open which is GSP-only afaik
ccawley2011 has quit [Ping timeout: 245 seconds]
<Lynne>
it used to be, not sure if it is now
<Lynne>
nvk is pretty good tbh, if you're not using cuda its worth a try
<averne>
"nvk is pretty good tbh, if you're not using cuda its worth a try" -> well I'm not really using cuda, I'm just doing this for fun basically
<averne>
At this point if I switch to nvk I'd rather fix up the vulkan-video MR
<averne>
Well I'll see if I can put something together. Otherwise, I guess multi-threading is more trouble than it's worth
<Lynne>
something together?
<Lynne>
the gain from multithreading isn't that significant tbh, it's very situational
<averne>
Try to have per-field acceleration structures in a way that's not too hacky
<averne>
And yeah that's my experience too. It only really matters when your decoding is cpu-bound because you can parse headers and push commands faster
<Lynne>
what would you store there?
IndecisiveTurtle has quit [Quit: IndecisiveTurtle]
<averne>
Everything that the hardware needs (codec setup, ...). If I have a single instance of these structures per-frame, and don't wait on the first field, I would overwrite its data
<Lynne>
that's a lot
<Lynne>
and it's for interlaced...
<Lynne>
I think you should just somehow hack in support fully inside the hwaccel
<ramiro>
haasn: yuv444p -> yuva444p, this could be optimized to memcpy+memset
<haasn>
ramiro: I was thinking about this also
<Lynne>
just ref both fields, keep track of it somewhere, and once you have both, submit two command buffers
<Lynne>
it's interlaced, which no one should be caring about these days
<haasn>
ramiro: one idea would be to just have a planar memcpy backend
<haasn>
that can compile READ {planar = true}; CLEAR; SWIZZLE; WRITE {planar = true} into a sequence of memcpy and memset calls
<haasn>
something I would like to ultimately be able to do is to solve that with plane refs at a higher level, if we have separate buffers per plane
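The yuv444p to yuva444p case discussed above reduces to three plane copies plus one fill. A minimal sketch, assuming 8-bit samples, positive strides, and an opaque alpha value of 0xFF; the function names are hypothetical and this is not the planned backend:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy one 8-bit plane row by row, honouring both strides. */
static void copy_plane(uint8_t *dst, ptrdiff_t dst_stride,
                       const uint8_t *src, ptrdiff_t src_stride,
                       size_t row_bytes, int height)
{
    for (int y = 0; y < height; y++)
        memcpy(dst + y * dst_stride, src + y * src_stride, row_bytes);
}

/* Hypothetical yuv444p -> yuva444p conversion: Y, U and V are copied
 * unchanged (all three are full resolution in 4:4:4), and the new alpha
 * plane is filled with 0xFF (assumed to mean fully opaque). */
static void yuv444p_to_yuva444p(uint8_t *const dst[4], const ptrdiff_t dst_stride[4],
                                const uint8_t *const src[3], const ptrdiff_t src_stride[3],
                                int width, int height)
{
    assert(width > 0 && height > 0);

    for (int p = 0; p < 3; p++)
        copy_plane(dst[p], dst_stride[p], src[p], src_stride[p],
                   (size_t)width, height);

    for (int y = 0; y < height; y++)
        memset(dst[3] + y * dst_stride[3], 0xFF, (size_t)width);
}
```

With separate buffers per plane, the three copies could in principle be replaced by references to the source planes, leaving only the memset.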
<averne>
Lynne: a lot of memory? not that much tbh, for 1080p it's a few 100s of kbs at most. And most of that is the compressed bitstream
<Lynne>
not memory, state
<averne>
Ah, right
<Lynne>
with the new changes I made, you can ref the incoming packet
<Lynne>
if you need it
<averne>
Can you link the patch? Maybe this might help
<Lynne>
all it does is it adds an AVBufferRef *pkt to start_frame()
<Lynne>
caff29dbb18feeb87cb00fc4c33d20cf01667be0
<averne>
Thanks, I'll try to see if I can use this
<averne>
But yeah, submitting the two fields at once might be the right way to go about this
jack has joined #ffmpeg-devel
swarup3204 has joined #ffmpeg-devel
<haasn>
ramiro: heads up: I refactored the ops_internal.h ABI quite substantially in my branch (not yet pushed); one of the changes is to replace the const void *priv pointer by a small (16 bytes) union
<haasn>
so you can directly store e.g. 4 floats without requiring a second indirection
<haasn>
though I imagine you don't care about this in your JIT impl since you just compile them down to constants
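A 16-byte private-data union of the kind described might look like this. Only the size and the "4 floats without a second indirection" use come from the discussion; the type name `OpPriv`, its members, and `scale4` are assumptions for illustration and do not reflect the actual ops_internal.h definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-op private data: small constants live inline,
 * larger state can still hang off the pointer member. */
typedef union OpPriv {
    void    *ptr;
    float    f32[4];
    uint32_t u32[4];
    uint16_t u16[8];
    uint8_t  u8[16];
} OpPriv;

_Static_assert(sizeof(OpPriv) == 16, "OpPriv is expected to stay at 16 bytes");

/* Example kernel: multiplies four components by per-component constants
 * read directly from the union, with no extra pointer dereference. */
static void scale4(float px[4], const OpPriv *priv)
{
    assert(px && priv);
    for (int i = 0; i < 4; i++)
        px[i] *= priv->f32[i];
}
```

Compared with a `const void *priv`, the constants sit in the op structure itself, so a kernel can load them with a single vector load.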
cone-022 has joined #ffmpeg-devel
<cone-022>
ffmpeg Dmitrii Ovchinnikov master:5b460bde8b31: libavutil/hwcontext_amf: add format validation in transfer_data functions
<fflogger>
[newticket] iyesin: Ticket #11523 ([ffmpeg] Conflicting requirements while encoding with SVTAV1) created https://trac.ffmpeg.org/ticket/11523
<fflogger>
[newticket] Lastique: Ticket #11524 ([avcodec] g726(le) produces clicks in encoded audio) created https://trac.ffmpeg.org/ticket/11524
<swarup3204>
Hey all, I am Swarup, a final year undergraduate from IIT KGP. I recently joined this chat and am interested in the projects. Could anyone point to a good starting point, especially some example bugs and their patches along with new bugs
<frankplow>
swarup3204: You can find open and fixed bugs on https://trac.ffmpeg.org/ When people close the trac ticket they will tend to refer to a commit SHA if a code change was made.
<haasn>
versus time=449 us, ref=3419 us, speedup=7.609x faster for the naive C reference implementation
IndecisiveTurtle has joined #ffmpeg-devel
abdu35 has quit [Ping timeout: 240 seconds]
<IndecisiveTurtle>
Lynne: I hacked some renderdoc integration into ffmpeg and noticed that with yuv444p16 all values in the input buffer have 1 extra nibble. Shifting fetched values by 4 in the upload shader fixes the graphics. Any idea why that could be?
<haasn>
versus time=560 us, ref=3341 us, speedup=5.960x faster from the initial prototype I had on the ML :)
<IndecisiveTurtle>
Also I would be curious if you think it would be possible to somehow get rdoc API integration into upstream. It makes debugging filters much more enjoyable. Though I'm also fine keeping it a local patch
<Lynne>
nibble?
<IndecisiveTurtle>
On cpu vc2 a value might be 0x1d0; on vc2_vulkan it is 0x1d00
<Lynne>
can you test if ffv1 works fine?
<Lynne>
where's the integration?
<IndecisiveTurtle>
Sure. What should I pass for -c:v to use ffv1 vulkan encoder?
<IndecisiveTurtle>
The rdoc integration I did only for vc2, as I need to insert the points where rdoc begins/ends capture manually.
<IndecisiveTurtle>
Also rdoc does not seem to work with shader object extension
<IndecisiveTurtle>
I had to disable it, otherwise you could not view any resources bound to shaders
<IndecisiveTurtle>
I would be open to working on it more for an actual implementation that could go upstream if you are open to it
<IndecisiveTurtle>
Making it work with all vk code/adding proper input handling etc
dionisis has quit [Quit: WeeChat 3.8]
<BtbN>
Is it just me or have spam filters recently gotten much worse? And not in the sense of missing a lot of spam, that actually seems fine
<IndecisiveTurtle>
ffv1 vulkan appears to be working fine
<BtbN>
but looking at my personal and my work spam filter, there's a bloody 50% false-positive rate
<BtbN>
at this kind of rate, I might as well not have a spam filter, cause I need to look at all those mails anyway, to make sure it's actually spam...
RuMi_Nos has quit [Quit: RuMi_Nos]
<Lynne>
IndecisiveTurtle: odd, how come vc2 reads 4 bits off
<IndecisiveTurtle>
I also checked ffv1_vulkan and it also receives the same "expanded" value yet is fine
<IndecisiveTurtle>
I wonder if there is some hidden conversion going on, but I'm not super sure where to look
realies has quit [Quit: ~]
realies has joined #ffmpeg-devel
thilo has quit [Ping timeout: 244 seconds]
<haasn>
BBB: x86inc unconditionally turns TAIL_CALL %1 into "call %1; vzeroupper; ret" on AVX2; why is this?
abdu67 has joined #ffmpeg-devel
thilo has joined #ffmpeg-devel
<BBB>
haasn: I think gramner might be a better person to ask that question to than me
<haasn>
replacing it by a simple "jmp" brings this function from 354 us to 264 us
<haasn>
cc Gramner
<BBB>
I think you shouldn't over-think some of the abstractions in x86inc.asm
<BBB>
if they're helpful, use them. but you should know what they do and not use them blindly
<BBB>
if they don't work, don't use them. they're not always universally helpful
<haasn>
fair
<BBB>
I admit that's not helpful but that's how I use it
abdu5 has quit [Ping timeout: 240 seconds]
<haasn>
does x86inc define a helper for the number of caller-saved registers I can assign before it would spill onto the stack? I might simplify my code by just always allocating the maximum number of registers and vector registers
<Gramner>
TAIL_CALL turns into a jmp when there's no epilogue. if a vzeroupper is required there is an epilogue
realies has quit [Quit: ~]
microchip_ has quit [Quit: There is no spoon!]
microchip_ has joined #ffmpeg-devel
<Gramner>
haasn: it does not, but fwiw that number of registers is 3 on x86-32, 7 on win64, and 9 on everything else
delewis has quit [Remote host closed the connection]
<haasn>
great, 7 is the max I need
<haasn>
I guess it's unlikely we'll see new, radically different x86 ABIs at this point; maybe I'm overthinking this
<Gramner>
if you're going to jump into another avx2 function at the end you can just use the jmp instruction directly
<Gramner>
APX is a thing :D
<Gramner>
adds another 16 regs, I would assume they will all be free to use without saving
<haasn>
right, the idea here is for each kernel to tail call into the next kernel
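The "each kernel tail calls into the next kernel" structure can be modelled in C as a chain of function pointers. This is a sketch under assumptions: `Op`, `op_fn`, and the individual ops are hypothetical, and whether the final call becomes a plain `jmp` depends on the compiler and optimization level; the hand-written asm would do that jump explicitly.

```c
#include <assert.h>

typedef struct Op Op;
typedef void (*op_fn)(const Op *op, float px[4]);

/* One link in the chain: the kernel to run and one constant for it. */
struct Op {
    op_fn fn;
    float k;
};

/* Terminator: ends the chain by simply returning. */
static void op_end(const Op *op, float px[4])
{
    (void)op;
    (void)px;
}

/* Each kernel does its work, then calls the next op as its last action.
 * Nothing happens after that call, so it is in tail position. */
static void op_add(const Op *op, float px[4])
{
    for (int i = 0; i < 4; i++)
        px[i] += op->k;
    op[1].fn(&op[1], px);
}

static void op_mul(const Op *op, float px[4])
{
    for (int i = 0; i < 4; i++)
        px[i] *= op->k;
    op[1].fn(&op[1], px);
}

/* Entry point: jump into the first kernel; the chain runs itself. */
static void run_chain(const Op *chain, float px[4])
{
    assert(chain && chain->fn);
    chain->fn(chain, px);
}
```

For example, a chain of `{op_add, 1.0f}, {op_mul, 2.0f}, {op_end, 0.0f}` applied to `{1, 2, 3, 4}` adds one and then doubles each component.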
<haasn>
ramiro: I think I will hand write SIMD for x86_64 only and rely on the C templates for 32-bit
<haasn>
they are still faster than legacy swscale as we know from my past benchmarks
<Gramner>
ignoring 32bit is perfectly reasonable
<Gramner>
makes life so much easier
delewis has joined #ffmpeg-devel
<haasn>
especially given that I _already_ have a set of C templates that allow the compiler to generate reasonable code
<haasn>
especially with the GCC vectors code path enabled
realies has joined #ffmpeg-devel
dionisis has joined #ffmpeg-devel
BtbN has quit [Remote host closed the connection]
BtbN has joined #ffmpeg-devel
<ramiro>
haasn: I was thinking the same thing about 32-bit
IndecisiveTurtle has quit [Ping timeout: 265 seconds]
Mirarora has quit [Quit: Mirarora encountered a fatal error and needs to close]
Riviera has quit [Quit: leaving]
ccawley2011 has quit [Read error: Connection reset by peer]
HarshK23 has quit [Quit: Connection closed for inactivity]
Mirarora has joined #ffmpeg-devel
HarshK23 has joined #ffmpeg-devel
Warcop has quit [Remote host closed the connection]
abdu67 has quit [Quit: Client closed]
Marth64 has quit [Ping timeout: 252 seconds]
IndecisiveTurtle has joined #ffmpeg-devel
minimal has joined #ffmpeg-devel
Mirarora has quit [Quit: Mirarora encountered a fatal error and needs to close]