michaelni changed the topic of #ffmpeg-devel to: Welcome to the FFmpeg development channel | Questions about using FFmpeg or developing with libav* libs should be asked in #ffmpeg | This channel is publicly logged | FFmpeg 7.1.1 has been released! | Please read ffmpeg.org/developer.html#Code-of-conduct
<haasn> ramiro: the DECL_ is really just to save myself from going insane repeating these signatures a million times
<haasn> being able to redefine them to something else is a very unintended side effect
<haasn> that said, you can already overload the SwsOpFunc to whatever signature you want
<haasn> for custom calling conventions
<haasn> I imagine that a fully asm backend will not be reusing these macros at all
<haasn> it's just for the generic C code, which maybe even will be deleted again at some point in the future
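A minimal sketch of the signature-macro idea, assuming a simplified execution context. `OpExec`, `DECL_OP`, `OpFunc` and the op names are hypothetical and do not reflect the actual swscale definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical execution context; not the real SwsOpExec layout. */
typedef struct OpExec {
    const uint8_t *in;
    uint8_t *out;
    size_t width;
} OpExec;

/* Hypothetical: one macro spells out the shared op signature, so it is
 * written once and every op definition below stays in sync with it. */
#define DECL_OP(NAME) static void NAME(OpExec *exec, const void *priv)

/* Function pointer type matching DECL_OP, e.g. for tables of ops. */
typedef void (*OpFunc)(OpExec *exec, const void *priv);

DECL_OP(op_copy)
{
    (void) priv; /* this op takes no parameters */
    memcpy(exec->out, exec->in, exec->width);
}

DECL_OP(op_invert)
{
    (void) priv;
    for (size_t i = 0; i < exec->width; i++)
        exec->out[i] = (uint8_t) ~exec->in[i];
}
```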
<haasn> ramiro: something I really want to see somebody (me?) try is adding a SPIR-V backend
<haasn> and doing some GPU scaling :)
<haasn> just to get some numbers
<haasn> it should be actually relatively straightforward, probably easier than the C backend
<haasn> ramiro: btw, one quirk I noticed today is that SWS_OP_LSHIFT doesn't work for changing the bitdepth of yuva444p
<haasn> because the alpha channel is full range, so it can't be expanded with a plain left shift
<haasn> in theory we need some sort of LSHIFT_AND_EXPAND_ALPHA hybrid op; though what happens in practice right now is that we go via float and do a vec4() scale operation
<haasn> which is technically more correct, since, as a reminder, the alpha channel technically requires dithering in this instance
<haasn> but we almost surely want some sort of performance flag to skip dithering full range expansions above a certain bit depth
<haasn> and in that case we will need some sort of hybrid narrow-and-full range shift; maybe something like a mask of components to do an expanding shift on
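To make the difference concrete, a scalar sketch for 10-bit to 16-bit. `shift_hybrid()` and its `expand_mask` are hypothetical illustrations of the "mask of components" idea. The full-range path uses bit replication, a non-dithered approximation of `x * 65535 / 1023` (the kind of shortcut a skip-dithering performance flag might allow), not what swscale currently does.

```c
#include <stdint.h>

/* Plain bit-depth increase by left shift, as SWS_OP_LSHIFT does.
 * Assumes x <= 1023. 10-bit 1023 becomes 65472, so full-range
 * "fully opaque" is not preserved. */
static uint16_t shift_10_to_16(uint16_t x)
{
    return (uint16_t) (x << 6);
}

/* The exact full-range mapping is x * 65535 / 1023. Replicating the top
 * bits into the freed low bits approximates it without dithering or
 * division, and maps 0 -> 0 and 1023 -> 65535 exactly. */
static uint16_t expand_10_to_16(uint16_t x)
{
    return (uint16_t) ((x << 6) | (x >> 4));
}

/* Hypothetical hybrid op: components whose bit is set in expand_mask
 * (e.g. 1 << 3 for alpha) are expanded, the others are plainly shifted. */
static void shift_hybrid(uint16_t px[4], unsigned expand_mask)
{
    for (int c = 0; c < 4; c++)
        px[c] = ((expand_mask >> c) & 1) ? expand_10_to_16(px[c])
                                         : shift_10_to_16(px[c]);
}
```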
IndecisiveTurtle has quit [Ping timeout: 245 seconds]
cone-526 has joined #ffmpeg-devel
<cone-526> ffmpeg softworkz master:0978fea7fa78: avcodec/vlc: Reduce debug logging
<cone-526> ffmpeg softworkz master:e0aedeb72e05: avformat/webvttdec: Add webvtt extension and MIME type
<cone-526> ffmpeg Michael Niedermayer master:a90ff8128772: avcodec/ffv1enc: Factor set_micro_version() out of ff_ffv1_write_extradata()
<cone-526> ffmpeg Michael Niedermayer master:dcf614279426: avcodec/ffv1enc: add space for the remap table to max_size
<cone-526> ffmpeg Michael Niedermayer master:437cbd25e089: avcodec/ffv1: Implement jeromes idea of making remap flip optional
^Neo has joined #ffmpeg-devel
^Neo has quit [Changing host]
^Neo has joined #ffmpeg-devel
Thulinma has joined #ffmpeg-devel
thilo has quit [Ping timeout: 276 seconds]
thilo has joined #ffmpeg-devel
^Neo has quit [Ping timeout: 260 seconds]
<haasn> ramiro: I'm getting amazing numbers
<haasn> the new CPS approach matches or even beats the previous approach's fused fast paths
<haasn> without even any fast paths..
<haasn> speedup=2.124x on a simple 16-bit endian swap
<haasn> time=1618 us, ref=3423 us, speedup=2.116x faster on yuv444p10le -> yuv444p16be (the only two operations I have implemented atm are byte swapping and shifting)
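For reference, the two ops mentioned (shifting and byte swapping), chained for one plane of yuv444p10le -> yuv444p16be, look roughly like this in scalar C. This is an illustrative sketch with a hypothetical function name. The byte swap is done by reading little-endian bytes and writing big-endian bytes explicitly, so it does not depend on host endianness.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: convert n samples of a 10-bit little-endian plane to a
 * 16-bit big-endian plane. Assumes input samples are within 0..1023. */
static void p10le_to_p16be(const uint8_t *src, uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* little-endian read */
        uint16_t v = (uint16_t) (src[2 * i] | (src[2 * i + 1] << 8));
        /* shift op: 10 -> 16 bit */
        v = (uint16_t) (v << 6);
        /* byte-swap op: big-endian write, high byte first */
        dst[2 * i]     = (uint8_t) (v >> 8);
        dst[2 * i + 1] = (uint8_t) (v & 0xFF);
    }
}
```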
<haasn> compare to ~2000 us on the previous approach
<haasn> the per-op overhead is now next to nothing
<haasn> one pointer dereference and a jump instruction
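A minimal continuation-passing sketch, assuming a hypothetical `Op` array in which each entry holds a function pointer and a parameter. Each op transforms the block and then calls the next entry directly, so dispatch costs a pointer load plus an indirect call. When the compiler turns that final call into a tail call (which GCC and Clang commonly do at -O2, but do not guarantee), it becomes a plain jump. `CHUNK`, `Op` and `CONTINUE` are invented for illustration and are not taken from the swscale4 branch.

```c
#include <stdint.h>

#define CHUNK 16 /* hypothetical block size, cf. SWS_CHUNK_SIZE */

typedef struct Op Op;
/* Every op shares one signature: its own chain entry plus the block. */
typedef void (*OpFn)(const Op *op, uint16_t blk[CHUNK]);

struct Op {
    OpFn fn;
    int arg; /* per-op parameter, e.g. a shift amount */
};

/* Continue with the next chain entry: load its pointer, call it. */
#define CONTINUE(op, blk) (op)[1].fn(&(op)[1], (blk))

static void op_lshift(const Op *op, uint16_t blk[CHUNK])
{
    for (int i = 0; i < CHUNK; i++)
        blk[i] = (uint16_t) (blk[i] << op->arg);
    CONTINUE(op, blk);
}

static void op_bswap(const Op *op, uint16_t blk[CHUNK])
{
    for (int i = 0; i < CHUNK; i++)
        blk[i] = (uint16_t) ((blk[i] << 8) | (blk[i] >> 8));
    CONTINUE(op, blk);
}

/* Terminal entry: ends the chain by returning. */
static void op_end(const Op *op, uint16_t blk[CHUNK])
{
    (void) op;
    (void) blk;
}
```

A chain such as `{ op_lshift, 6 }, { op_bswap, 0 }, { op_end, 0 }` runs a 10-bit to byte-swapped 16-bit conversion by calling only the first entry.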
<haasn> in fact, the overhead is so low that we can almost go down to SWS_CHUNK_SIZE 8 without significant performance drop
<haasn> allowing a whole chunk of 32-bit values to fit into a single AVX2 register
<haasn> which may even end up faster in the general case (where floats are being used)
<haasn> it also has some other nice benefits; since it's so cheap to just clear a vector to zero we can also rely on unread components being initialized to 0 (or perhaps 255 for alpha)
<haasn> and omit the clear operations in those cases
<haasn> maybe clearing always to -1 makes the most sense, since clearing alpha is the most common case
<haasn> and in the other cases involving clears we want to clear to 128 anyways
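A sketch of the "clear unread components up front" idea, assuming 8-bit components stored as four planar arrays (hypothetical layout and function name). Filling with all-ones bytes (-1) leaves alpha at 255. Chroma would still need an explicit clear to 128.

```c
#include <stdint.h>
#include <string.h>

#define CHUNK 16 /* hypothetical pixels per block */

/* Hypothetical read op for gray8: only component 0 comes from memory.
 * Components 1-3 are pre-filled with all-ones bytes (-1 as int8_t, 255 as
 * uint8_t), so an 8-bit alpha lane is already opaque without a clear op. */
static void read_gray8_prefilled(const uint8_t *src, uint8_t comp[4][CHUNK])
{
    memcpy(comp[0], src, CHUNK);
    memset(comp[1], 0xFF, CHUNK); /* chroma: would still need a clear to 128 */
    memset(comp[2], 0xFF, CHUNK);
    memset(comp[3], 0xFF, CHUNK); /* alpha: 255 = fully opaque */
}
```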
philipl has quit [Quit: leaving]
philipl has joined #ffmpeg-devel
jamrial has quit []
cone-526 has quit [Quit: transmission timeout]
amberjojo has joined #ffmpeg-devel
Martchus has joined #ffmpeg-devel
Martchus_ has quit [Ping timeout: 276 seconds]
System_Error has quit [Remote host closed the connection]
System_Error has joined #ffmpeg-devel
<fflogger> [editedticket] Balling: Ticket #11505 ([avcodec] Cuvid decoders do not work with CUDA hwaccel anymore) updated https://trac.ffmpeg.org/ticket/11505#comment:10
ukn_unknown has joined #ffmpeg-devel
amberjojo has quit [Quit: Konversation terminated!]
ukn_unknown has quit [Ping timeout: 240 seconds]
amberjojo has joined #ffmpeg-devel
amberjojo has quit [Changing host]
amberjojo has joined #ffmpeg-devel
amberjojo has quit [Read error: Connection reset by peer]
jdek has quit [Ping timeout: 272 seconds]
jdek has joined #ffmpeg-devel
ngaullier has joined #ffmpeg-devel
ngaullier has quit [Remote host closed the connection]
ngaullier has joined #ffmpeg-devel
^Neo has joined #ffmpeg-devel
^Neo has quit [Changing host]
^Neo has joined #ffmpeg-devel
MyNetAz has quit [Remote host closed the connection]
MyNetAz has joined #ffmpeg-devel
^Neo has quit [Ping timeout: 272 seconds]
<fflogger> [newticket] RandomPerson: Ticket #11508 ([undetermined] MediaCodec as a whole is broken) created https://trac.ffmpeg.org/ticket/11508
MyNetAz has quit [Remote host closed the connection]
MyNetAz has joined #ffmpeg-devel
mkver has quit [Quit: Leaving]
<fflogger> [editedticket] joe: Ticket #7059 ([avfilter] Compile error in ffmpeg git master with opencv 3.4.1) updated https://trac.ffmpeg.org/ticket/7059#comment:3
<fflogger> [newticket] david-metrica: Ticket #11509 ([avcodec] Failed setup for format videotoolbox_vld - Video Decoding (H264 yuv420p)) created https://trac.ffmpeg.org/ticket/11509
<JEEB> BtbN: btw just noticed gnome's gitlab seemingly put https://xeiaso.net/blog/2025/anubis/ in front of their instance
<fflogger> [newticket] david-metrica: Ticket #11510 ([undetermined] Hardware accelerator failed to decode picture - Videotoolbox HEVC) created https://trac.ffmpeg.org/ticket/11510
Anthony_ZO has joined #ffmpeg-devel
mkver has joined #ffmpeg-devel
<Lynne> everyone I've heard from complaining about scrapers recently has been happy with blocking alibaba's
^Neo has joined #ffmpeg-devel
^Neo has quit [Changing host]
^Neo has joined #ffmpeg-devel
rvalue- has joined #ffmpeg-devel
cone-467 has joined #ffmpeg-devel
<cone-467> ffmpeg Andreas Rheinhardt master:e4c8e80a2efe: avcodec/x86/constants: Move constants only used by cavsdsp to it
rvalue has quit [Ping timeout: 252 seconds]
ahmedhamed has joined #ffmpeg-devel
rvalue- is now known as rvalue
jamrial has joined #ffmpeg-devel
<jamrial> wbs: weird, i can't reproduce the failure
<wbs> jamrial: it seems to pass on x86_64, but fail on i686, arm and aarch64
<wbs> right
<wbs> fails for me in a linux/gcc/i386 setup too
ccawley2011 has joined #ffmpeg-devel
ccawley2011 has quit [Read error: Connection reset by peer]
ccawley2011 has joined #ffmpeg-devel
<jamrial> wbs: the problem is probably in the unscaled path for semiplanar copy
<jamrial> so the test did its job at finding something :p
<wbs> of course! that's usually the case when there's lacking test coverage :)
<ramiro> haasn: I'm kind of hoping you will write the spir-v backend :P
<ramiro> I know nothing about spir-v, but I want to play around with emitting assembly directly.
<haasn> I'm kind of hoping Lynne will :P but yeah I'll get to it eventually
rvalue has quit [Read error: Connection reset by peer]
MyNetAz has quit [Remote host closed the connection]
rvalue has joined #ffmpeg-devel
ahmedhamed has quit [Quit: Connection closed for inactivity]
MyNetAz has joined #ffmpeg-devel
realies has joined #ffmpeg-devel
ccawley2011 has quit [Ping timeout: 268 seconds]
<toots5446> Lynne: I started the next phase of the ogg work. It seems vorbis has frames that need several packets to be decoded.
<toots5446> There are competing mechanisms in place for metadata-related updates with packets: what's the difference between AV_PKT_DATA_METADATA_UPDATE and AV_PKT_DATA_STRINGS_METADATA ?
ccawley2011 has joined #ffmpeg-devel
ccawley2011_ has joined #ffmpeg-devel
cone-467 has quit [Quit: transmission timeout]
ccawley2011 has quit [Ping timeout: 260 seconds]
<Lynne> toots5446: use metadata_update
<Lynne> jamrial: are you going to send a patch for ffv1 to skip decoding on correct parsing?
<jamrial> yeah
<Lynne> thanks
<haasn> ramiro: oh no...
<haasn> I ran into a GCC codegen bug
<haasn> this is terrible
<Lynne> are you using the native vectors?
<haasn> yeah :/
<haasn> actually, it seems it's not an outright bug, just suboptimal code
<haasn> nvm then
minimal has joined #ffmpeg-devel
<Lynne> yeah, they've been broken as hell ever since I first thought using them is a good idea
<Lynne> 10 years ago
<Lynne> it doesn't help that clang and gcc have different ideas on what they should be named and how they should work
ccawley2011__ has joined #ffmpeg-devel
<jamrial> wbs: should be fixed by the patch i just sent
ccawley2011_ has quit [Ping timeout: 260 seconds]
<jamrial> guess nobody ever converted semiplanar to semiplanar with different bitdepths until now
s55 has quit [Ping timeout: 276 seconds]
Anthony_ZO has quit [Ping timeout: 268 seconds]
ccawley2011__ has quit [Ping timeout: 260 seconds]
<wbs> jamrial: thanks!
s55 has joined #ffmpeg-devel
<ramiro> haasn: it's good that every few years people try again to autovectorize stuff with gcc/clang, but the conclusion has pretty much always been "not worth it". I don't mean to throw cold water on your work, but I think the end result will be pretty much hit and miss across architectures and gcc versions. we might even end up with code that's slower than the c reference.
<haasn> time=707 us, ref=704 us, speedup=0.996x slower # gray8 -> yuvj444p
<haasn> down from time=738 us, ref=702 us, speedup=0.951x slower for the old approach
<haasn> so we now match hand-written asm for clearing chroma
<haasn> even with the extra function call overhead on top
<ramiro> I think it will on average give good enough results, which will already be better than what we currently have for non-optimized code. also having a description of the steps taken in the scaler is a huge bonus, since it allows us to check the correctness of each conversion and not have subtle differences in each specific x->y converter.
<haasn> ramiro: sure, I think people are misunderstanding; I'm never saying we don't need hand written asm
<ramiro> haasn: I understand that. but I think there's only so much effort that should be spent in autovectorization. other parts of your work are much more valuable imo.
<haasn> actually, the main reason I am investing so much time into the C code is because it allows me to iterate on the design and get performance numbers that will be representative of an eventual asm rewrite
<haasn> it's still an order of magnitude easier to get good autovectorized code than hand written asm
<ramiro> haasn: might be, but hand writing asm is more fun and more predictable :)
<haasn> well in any case, I think the current approach is closer to final
<haasn> so if you want to have fun writing some asm routines, the very big offender currently is read/write_packed2/3/4
<ramiro> haasn: on neon there are single instructions for that
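A scalar sketch of what a packed-3 read/write does (e.g. for rgb24), assuming planar byte outputs. The function names are hypothetical and do not mirror the real read_packed3/write_packed3 signatures. On AArch64 the NEON intrinsics `vld3q_u8()` and `vst3q_u8()` perform the same split and merge for 16 pixels in a single instruction each.

```c
#include <stddef.h>
#include <stdint.h>

/* Deinterleave R,G,B,R,G,B,... bytes into three component arrays. */
static void read_packed3_u8(const uint8_t *src, uint8_t *c0, uint8_t *c1,
                            uint8_t *c2, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        c0[i] = src[3 * i + 0];
        c1[i] = src[3 * i + 1];
        c2[i] = src[3 * i + 2];
    }
}

/* Re-interleave three component arrays into packed output. */
static void write_packed3_u8(uint8_t *dst, const uint8_t *c0,
                             const uint8_t *c1, const uint8_t *c2, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[3 * i + 0] = c0[i];
        dst[3 * i + 1] = c1[i];
        dst[3 * i + 2] = c2[i];
    }
}
```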
<haasn> I think I will refactor SwsOpExec slightly to make it more asm friendly but apart from that the current signature should be pretty close to final
<ramiro> do you plan on adding downsampling with easy chroma offsets (like central for example)?
<ramiro> (as an SwsOp I mean)
<haasn> not at the moment
<haasn> I want to rewrite all of the missing ops from swscale3 on top of swscale4 and then submit that for review again
<ramiro> ok
<haasn> but yes the goal is eventually to have dedicated up/downsamplers for these cases, I think
<ramiro> haasn: currently it only does one line at a time, right? so for now there's no way to have 2 lines input in vectors
<haasn> not necessarily
<haasn> the calling code runs the opchain in an access pattern that is striped over luma rows
<haasn> but the implementation could internally do whatever it wants
<haasn> so something we could do is define a separate UNSUBSAMPLE() op that has its own signature where it gets two chroma lines
<haasn> or something like that
<haasn> yet another advantage of the cps approach
<haasn> you can add or remove vectors inside the operations chain as long as each function's output matches the signature of the next input
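As an illustration of an op with its own signature, here is a hypothetical UNSUBSAMPLE-style step that receives two chroma lines and produces chroma for one luma row. It assumes vertically centered 4:2:0 chroma, where the nearer chroma row gets weight 3/4 and the farther one 1/4 (plain bilinear interpolation). The name and signature are invented.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical op: interpolate one output chroma row from the nearer and
 * farther input chroma rows with rounded bilinear 3:1 weights. */
static void unsubsample_rows_u8(const uint8_t *nearer, const uint8_t *farther,
                                uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t) ((3 * nearer[i] + farther[i] + 2) >> 2);
}
```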
<haasn> ramiro: https://github.com/haasn/FFmpeg/commits/swscale4 current WIP if you want to take a look at it
<haasn> I like it a lot more than the previous approach; it's also less bloated as we don't really need to define so many "fused" variants
<haasn> so in contrast, we can define more "subset" variants of common ops, e.g. ignoring alpha when doing an op like convert u8 -> f32
<haasn> and on top of this, ops code is quite a bit smaller
kasper93 has quit [Remote host closed the connection]
<fflogger> [newticket] kierank: Ticket #11511 ([undetermined] Haivision Pro360 device ships FFmpeg nonfree) created https://trac.ffmpeg.org/ticket/11511
minimal has quit [Quit: Leaving]
ccawley2011 has joined #ffmpeg-devel
jdek has quit [Ping timeout: 248 seconds]
jdek has joined #ffmpeg-devel
ccawley2011 has quit [Ping timeout: 276 seconds]
IndecisiveTurtle has joined #ffmpeg-devel
ccawley2011 has joined #ffmpeg-devel
IndecisiveTurtle has quit [Quit: IndecisiveTurtle]
ngaullier has quit [Remote host closed the connection]
LainExperiments has joined #ffmpeg-devel
ccawley2011 has quit [Ping timeout: 248 seconds]
ccawley2011 has joined #ffmpeg-devel
cone-515 has joined #ffmpeg-devel
<cone-515> ffmpeg Niklas Haas master:ae84aa775fa7: swscale/utils: split off format code into new file
LainExperiments has quit [Quit: Client closed]
HarshK23 has quit [Quit: Connection closed for inactivity]
minimal has joined #ffmpeg-devel
LainExperiments has joined #ffmpeg-devel
kasper93 has joined #ffmpeg-devel
<haasn> c->xyzgammainv[i] = round(pow(i / 65535.0, xyzgammainv) * 4095.0);
<haasn> is there any reason why this would give a different result on 32 bit and 64 bit?
<haasn> it seems that our XYZ conversion tables are subtly different on 32 bit; but they are all declared as double
<haasn> seems like the 32 bit code uses 80 bit extended precision
<jamrial> haasn: x87 has 80 bit intermediates, yes. it's why those tests are failing. force sse math and it's "fixed"
<haasn> magic value to break it is i == 3415
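One way to see this from C, as a sketch: C99's `FLT_EVAL_METHOD` macro in `<float.h>` reports how intermediate floating-point results are evaluated. Builds using x87 math typically report 2 (everything in long double), whereas SSE2 math (for example with `-msse2 -mfpmath=sse`) typically reports 0. The helper names are hypothetical.

```c
#include <float.h>

/* Describe a C99 FLT_EVAL_METHOD value: how float and double
 * intermediate results are evaluated by this compiler and target. */
static const char *eval_method_name(int method)
{
    switch (method) {
    case 0:  return "each type in its own precision";
    case 1:  return "float and double evaluated as double";
    case 2:  return "all evaluated as long double (e.g. x87)";
    case -1: return "indeterminable";
    default: return "implementation-defined";
    }
}

/* The evaluation method in effect for this translation unit. */
static int current_eval_method(void)
{
    return FLT_EVAL_METHOD;
}
```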
<haasn> jamrial: how would you recommend doing that?
<haasn> do we have a precedent?
<jamrial> don't think so
<jamrial> it would require forcing sse2 (for doubles) on an arch that technically (but not realistically) may lack it
<jamrial> "-msse2 -mfpmath=sse" for gcc
<jamrial> mkver tried it, and some other test failed in turn
<haasn> annoying
ahmedhamed has joined #ffmpeg-devel
<haasn> GCC man page claims that -fexcess-precision=standard should be implied by -std=c99
<haasn> and presumably above
<haasn> but even with explicitly setting that option I get differing results
<nevcairiel> standard only ensures the right precision on assignments, intermediates will still be in the full 80 bit registers
<haasn> I tried casting (and storing) after every single op and the result is still different
<nevcairiel> i imagine it wont stop it from optimizing those away
<haasn> urgh
<wbs> yes, with floats, it requires a lot of effort and compiler/toolchain specific twiddling to get bitexact results. in practice, you need to allow some tolerance in the end if you're using floats
<haasn> that would amount to disabling the test
<haasn> maybe we can just ignore the regression until we switch to the new pipeline I'm working on
<haasn> it will handle xyz conversion much more efficiently as a side effect
<fflogger> [editedticket] cgbug: Ticket #11490 ([avformat] [Regression] Audio silent for long MOV file) updated https://trac.ffmpeg.org/ticket/11490#comment:9
<mkver> haasn: What makes you believe that the new pipeline will be bitexact?
LainExperiments has quit [Quit: Client closed]
LainExperiments has joined #ffmpeg-devel
<haasn> mkver: well, if -fexcess-precision=standard does what it claims at all, we can make our float ops bit exact; there are significantly fewer cases to worry about, and as long as every float op is bit exact, the result of concatenating them has to be bit exact as well
<haasn> is what I would say, but it seems that -fexcess-precision=standard does not even do anything at all
<haasn> ah it does, it inserts extra fstp / fld sequences
<haasn> but what I'm more worried about is the possibility of us having to define a canonical order for how to do float summation (for filtering)
<haasn> maybe we will need a fixed precision path in the end to respect BITEXACT
<wbs> that's a bit problematic though, if all our tests run in bitexact mode, it'd mean that the real-world used performance codepath would be essentially uncovered by tests
<haasn> true
<haasn> well, we can still have a bit exact reference C path
<wbs> for e.g. many audio decoders, we keep a full reference file, and do off-by-one (or off-by-few) test comparisons against that
<haasn> even with float math
<haasn> as long as we don't enable things like -fassociative-math
<haasn> and then non bit exact asm routines could choose a different summation order for things like convolutions
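To illustrate why summation order matters for bit-exactness: floating-point addition is not associative, so a fixed left-to-right reference path and a pairwise order (as a SIMD convolution might use) can disagree on the same input. Both functions are illustrative sketches and assume compilation without `-fassociative-math` or `-ffast-math`.

```c
/* Reference order: strictly left to right, as a bit-exact C path might fix. */
static double sum_in_order(const double *v, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Pairwise (tree) order, the kind of regrouping a SIMD routine may use. */
static double sum_pairwise(const double *v, int n)
{
    if (n <= 0)
        return 0.0;
    if (n == 1)
        return v[0];
    int mid = n / 2;
    return sum_pairwise(v, mid) + sum_pairwise(v + mid, n - mid);
}
```

With `{1e20, 1.0, -1e20, 1.0}` the in-order sum is 1.0, while the pairwise sum is 0.0, because each 1.0 is absorbed when it is added to a partial sum of magnitude 1e20.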
<jannau> if there is a known (and tested) bit-exact path you could use that as reference and do float ULP-based comparisons
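jannau's suggestion can be sketched as a ULP-distance check against a trusted bit-exact reference. This assumes IEEE-754 doubles and 64-bit two's-complement integers with matching byte order (true on mainstream targets). `within_ulps()` and the other names are hypothetical, not an existing FATE or libavutil helper.

```c
#include <stdint.h>
#include <string.h>

/* Map a double onto an ordered integer line: adjacent doubles differ by 1,
 * and +0.0 / -0.0 both map to 0. NaNs are not handled here. */
static int64_t ordered_bits(double x)
{
    int64_t i;
    memcpy(&i, &x, sizeof i);         /* reinterpret bits without aliasing UB */
    return i < 0 ? INT64_MIN - i : i; /* mirror negatives below zero */
}

/* Number of ULP steps between a and b. */
static uint64_t ulp_distance(double a, double b)
{
    int64_t ia = ordered_bits(a), ib = ordered_bits(b);
    return ia > ib ? (uint64_t) ia - (uint64_t) ib
                   : (uint64_t) ib - (uint64_t) ia;
}

/* Hypothetical test helper: accept x if it is within max_ulps of ref. */
static int within_ulps(double ref, double x, uint64_t max_ulps)
{
    return ulp_distance(ref, x) <= max_ulps;
}
```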
<wbs> the really annoying thing wrt float exactness is if you need to invoke math functions, as they can have much more differing precision than a couple ulps
<wbs> I remember cases (from ~20 years ago) in trying to get a float based game engine to give identical simulation results across various systems (windows/i386, linux/i386 and macos/ppc at the time)
<wbs> there were cases like sin(almost_pi) which should be close to zero, but on some system (don't remember if it was i386 or ppc) you'd get a float with over half of the mantissa just being zero bits
<haasn> I think that this heavily applies to the code inside cms.c fwiw
<haasn> I think our gamut/tone mapping code is 100% not bitexact
<haasn> but this is already the status quo
<haasn> I am pretty hesitant to introduce math functions to the format ops pipeline at this stage
<haasn> I'd rather pregenerate LUTs with AVRational for any sort of gamma conversion
<haasn> if only to avoid locking us out of a fixed precision path in the future
Teukka has quit [Read error: Connection reset by peer]
<toots5446> I've got the next step with chained ogg stuff. I've got it where I can do a while ffmpeg -c copy and keep chained streams w/ metadata with consistent PTS/DTS, pretty cool.
<toots5446> Would be nice to get the first round commited w/ the samples in FATE so we can finish this. I might send a WIP patch series over the weekend to get some early feedback.
Teukka has joined #ffmpeg-devel
Teukka has quit [Changing host]
Teukka has joined #ffmpeg-devel
<toots5446> Also meant to add: also removed the ogg header packets from subsequent chained streams from the demuxer. Turns out that the ogg muxer is able to reconstruct those headers so there wasn't much to do on that front except remove them :-)
<Lynne> every repository I try to pull out of gets me a 502
<Lynne> all this scraping is getting heavy
Mirarora has joined #ffmpeg-devel
<Lynne> and its not like any of this goes to search engines
cone-515 has quit [Quit: transmission timeout]
<BtbN> pull via ssh
Mirarora has quit [Quit: Mirarora encountered a fatal error and needs to close]
Mirarora has joined #ffmpeg-devel
blb has quit [Ping timeout: 260 seconds]
blb has joined #ffmpeg-devel
ahmedhamed has quit [Quit: Connection closed for inactivity]
realies has quit [Quit: Ping timeout (120 seconds)]
realies has joined #ffmpeg-devel
kasper93 has quit [Ping timeout: 248 seconds]
kasper93 has joined #ffmpeg-devel
ccawley2011 has quit [Read error: Connection reset by peer]
lemourin has quit [Quit: The Lounge - https://thelounge.chat]
lemourin has joined #ffmpeg-devel
lemourin has quit [Client Quit]
lemourin has joined #ffmpeg-devel
witchymary has quit [Remote host closed the connection]
witchymary has joined #ffmpeg-devel
derpydoo has joined #ffmpeg-devel
derpydoo has quit [Remote host closed the connection]
derpydoo has joined #ffmpeg-devel