michaelni changed the topic of #ffmpeg-devel to: Welcome to the FFmpeg development channel | Questions about using FFmpeg or developing with libav* libs should be asked in #ffmpeg | This channel is publicly logged | FFmpeg 7.0.1 has been released! | Please read ffmpeg.org/developer.html#Code-of-conduct
System_Error has quit [Remote host closed the connection]
System_Error has joined #ffmpeg-devel
_DuBPiRaTe_ has quit [Remote host closed the connection]
_DuBPiRaTe_ has joined #ffmpeg-devel
thilo has quit [Ping timeout: 255 seconds]
thilo has joined #ffmpeg-devel
_DuBPiRaTe_ has quit [Remote host closed the connection]
_DuBPiRaTe_ has joined #ffmpeg-devel
_DuBPiRaTe_ has quit [Remote host closed the connection]
cone-726 has quit [Quit: transmission timeout]
markh has joined #ffmpeg-devel
lexano has quit [Ping timeout: 268 seconds]
lemourin has quit [Quit: The Lounge - https://thelounge.chat]
lemourin has joined #ffmpeg-devel
arch1t3cht has quit [Read error: Connection reset by peer]
arch1t3cht has joined #ffmpeg-devel
System_Error has quit [Remote host closed the connection]
Kei_N has quit [Quit: leaving]
System_Error has joined #ffmpeg-devel
jamrial has quit []
Martchus_ has joined #ffmpeg-devel
Martchus has quit [Ping timeout: 268 seconds]
<Lynne> they've properly overengineered DRC for xHE-AAC
<Lynne> never thought I'd be writing spline curve interpolation for a non-graphic application
<Lynne> I don't think this belongs in a decoder, but putting it into a filter feels like libpostproc all over again
<Lynne> meh, I'll drop this for now until someone actually complains (they won't, if anything disabling DRC improves quality because there's 16 knobs for an encoder to tweak on a per-frame basis)
Kei_N has joined #ffmpeg-devel
ramiro has quit [Ping timeout: 260 seconds]
ramiro has joined #ffmpeg-devel
zsoltiv__ has quit [Ping timeout: 252 seconds]
sepro has quit [Ping timeout: 256 seconds]
sepro has joined #ffmpeg-devel
mkver has joined #ffmpeg-devel
AbleBacon has quit [Read error: Connection reset by peer]
cone-788 has joined #ffmpeg-devel
<cone-788> ffmpeg Leo Izen master:539d2e989d7c: avcodec/aacdec_lpd: remove unused local variables
System_Error has quit [Remote host closed the connection]
System_Error has joined #ffmpeg-devel
markh has quit [Ping timeout: 256 seconds]
System_Error has quit [Remote host closed the connection]
System_Error has joined #ffmpeg-devel
ngaullier has joined #ffmpeg-devel
Krowl has joined #ffmpeg-devel
markh has joined #ffmpeg-devel
Livio has joined #ffmpeg-devel
Livio has quit [Ping timeout: 256 seconds]
Krowl has quit [Read error: Connection reset by peer]
Livio has joined #ffmpeg-devel
ccawley2011 has joined #ffmpeg-devel
cone-788 has quit [Quit: transmission timeout]
compnn has joined #ffmpeg-devel
compn has quit [Read error: Connection reset by peer]
Kei_N has quit [Ping timeout: 252 seconds]
Krowl has joined #ffmpeg-devel
Kei_N has joined #ffmpeg-devel
System_Error has quit [Remote host closed the connection]
System_Error has joined #ffmpeg-devel
deus0ww has quit [Ping timeout: 268 seconds]
deus0ww has joined #ffmpeg-devel
j45_ has joined #ffmpeg-devel
j45 has quit [Ping timeout: 240 seconds]
j45_ is now known as j45
j45 has quit [Changing host]
j45 has joined #ffmpeg-devel
<Daemon404> isnt drc one of the things that the marketgeneers go on about re: xhe-aac?
<JEEB> yea. and I think at least for some things it's part of the reference output?
<Daemon404> i think that was part of the selling point iirc
<Daemon404> (sort of like av1 with grain)
mkver has quit [Ping timeout: 272 seconds]
mkver has joined #ffmpeg-devel
Krowl has quit [Read error: Connection reset by peer]
<kurosu> I think we do have some people in this community/room who, while behind av1, picked that over opus, for that reason or others
System_Error has quit [Remote host closed the connection]
novaphoenix has quit [Quit: i quit]
novaphoenix has joined #ffmpeg-devel
<haasn> michaelni: is there ever a use case for accurate_rnd and bitexact to be set differently? why not merge the two into a single 'bitexact' option?
jamrial has joined #ffmpeg-devel
<ramiro> haasn: michaelni: on that subject, it would be interesting to properly define what accurate_rnd and bitexact mean in libswscale.
<haasn> for sure
<ramiro> I'm thinking specifically about yuv2rgb, where the C code uses LUTs, which are a bit less accurate than the output from simd code which does the actual yuv to rgb conversion.
<haasn> I will most likely end up having to standardize the post yuv2rgb representation to fix this disparity
<haasn> I'm thinking that we probably want to decide on one internal representation and stick to it for all processing steps
<haasn> or, more likely, two internal representations - HBD and LBD
System_Error has joined #ffmpeg-devel
<haasn> we currently use a mix of 15-bit and 19-bit (for HBD) inside the scaling pipeline
<ramiro> pardon my ignorance, does hbd and lbd mean high bit depth and low bit depth?
<haasn> exactly
<JEEB> :D
<haasn> 15-bit internal for 8-bit input
<haasn> 19-bit internal for 10/12/14/16 bit input
<haasn> but the intermediate format used in between rgb2yuv and hscale is pretty arbitrary and every platform defines its own preference here
Krowl has joined #ffmpeg-devel
<ramiro> I assume that's all properly documented and clearly explained in the implementation, right? /jk
<haasn> I think that we may end up having to scrap the rgb2yuv/yuv2rgb parts and replace them with a more general 3x3 matrix application routine that we can re-use in other places as well
<michaelni> we need 2 internal representations because 16 vs 32 bit based SIMD, for 14+bit input 16bit SIMD is just not accurate enough and for 8bit using 32bit SIMD is half speed
<haasn> for example some conversions are very nontrivial and the current infra doesn't handle it
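The general 3x3 matrix routine haasn floats above could look roughly like the following minimal C sketch. All names and the Q16 fixed-point layout are assumptions for illustration, not actual libswscale code; the point is that one routine can serve yuv2rgb, rgb2yuv, or any other 3x3 color transform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical general 3x3 color matrix routine: coefficients and offsets
 * are Q16 fixed point; src/dst are interleaved 3-component pixels. */
typedef struct SwsMatrix3x3 {
    int32_t m[3][3]; /* Q16 coefficients */
    int32_t off[3];  /* per-component offset, Q16 */
} SwsMatrix3x3;

static void apply_matrix3x3(const SwsMatrix3x3 *mat,
                            const int32_t *src, int32_t *dst, int npixels)
{
    for (int i = 0; i < npixels; i++) {
        const int32_t *s = &src[3 * i];
        for (int c = 0; c < 3; c++) {
            /* accumulate in 64 bits so Q16 * sample cannot overflow */
            int64_t acc = mat->off[c];
            for (int k = 0; k < 3; k++)
                acc += (int64_t)mat->m[c][k] * s[k];
            dst[3 * i + c] = (int32_t)(acc >> 16);
        }
    }
}
```

A yuv2rgb or rgb2yuv path would then just be a particular coefficient set fed to this one routine, which is what makes it reusable elsewhere.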
Livio has quit [Ping timeout: 256 seconds]
<haasn> ramiro: it's properly documented if you consider source code to be proper documentation :)
<michaelni> also please update doc/swscale.txt with anything that's missing in it
<ramiro> haasn: the same way the mame project says that they are not writing emulators, but documentation, in the form of code.
<ramiro> haasn: I'm a bit skeptical about moving from rgb2yuv/yuv2rgb to something more general. I can't possibly imagine this would be faster or at least the same speed as we currently have with the unscaled converters. you'll probably have to write a poc with some benchmarks before moving forward with this idea.
<haasn> ramiro: the idea right now is to have a small set of general purpose routines that can cover all use cases, but allow platforms to define “merged” special case functions in cases where benchmarks justify the added gain
<BtbN> Isn't that how it already works? Specialised paths for high-profile cases, everything else goes to a defined default format first, and from there to the desired output?
<haasn> BtbN: yes in a way, but currently it's pretty ad-hoc and hard-coded, I want to define a system where _any_ two primitives could be trivially merged if desired
<BtbN> yeah, the current system is a royal mess
<BtbN> I've also been looking into something similar for scale_cuda. The idea would be to generate PTX code on the fly, to build a specialised kernel
<BtbN> Cause the current approach of the one mammoth-kernel leads to 20+ seconds of compile time already
<BtbN> but tbf, the advantage is once it's compiled, it's cached and never needs to compile again
<haasn> BtbN: so my idea is to have code that resolves a list of operations to get from A to B, think {OP1, OP2, OP3, OP4, ...}; and platforms can define an arbitrary list of "merged" helper routines, for example {special_op23, {OP2, OP3}}
<haasn> which the dispatch code can then understand and simplify to {OP1, special_op23, OP4, ...}
<haasn> platforms could define as many of these as they want, and also mark them as bitexact vs not-bitexact
<BtbN> yes, that's basically the design of scale_cuda, except it's compile-time
<BtbN> it generates a heapload of kernels, which have such a list of operations in their name
<haasn> but I want to try and keep the "basic" set of primitives as small as possible; which will also define the "bitexact" reference
<haasn> so we might explicitly e.g. double chroma, apply 3x3 matrix, apply 1D LUT, apply 3x3 matrix, hscale, vscale, apply 3x3 matrix, apply 1d lut, dither
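The op-fusion dispatch haasn describes, a resolved op list plus a platform table of merged routines like {special_op23, {OP2, OP3}} being simplified into {OP1, special_op23, OP4, ...}, could be prototyped along these lines. Every identifier here is hypothetical, not an actual libswscale API:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical op identifiers; a real design would enumerate primitives
 * like hscale, vscale, 3x3 matrix, LUT, dither, etc. */
enum SwsOp { OP_NONE, OP1, OP2, OP3, OP4, SPECIAL_OP23 };

/* One entry of a platform's fusion table: "merged" replaces the
 * adjacent pair (a, b) wherever it occurs in the op list. */
typedef struct SwsFusion {
    enum SwsOp merged;
    enum SwsOp a, b;
} SwsFusion;

/* Rewrite ops[] in place, replacing any adjacent pair found in the
 * platform fusion table with its merged op. Returns the new length. */
static int fuse_ops(enum SwsOp *ops, int n,
                    const SwsFusion *table, int ntable)
{
    for (int i = 0; i + 1 < n; i++) {
        for (int t = 0; t < ntable; t++) {
            if (ops[i] == table[t].a && ops[i + 1] == table[t].b) {
                ops[i] = table[t].merged;
                memmove(&ops[i + 1], &ops[i + 2],
                        (n - i - 2) * sizeof(*ops));
                n--;
                break;
            }
        }
    }
    return n;
}
```

A real version would also carry the bitexact/not-bitexact flag per table entry, so the dispatcher can skip non-bitexact fusions when bitexact output is requested.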
<BtbN> I wonder if a cuda scaler could be plugged into that
<BtbN> but it'd be a lot less efficient
<BtbN> scale_cuda is as fast as it is because every conversion is one single CUDA kernel call, and the desired format comes out of it
<haasn> I am thinking the primitives will be some combination of "hfilter, vfilter, hdouble, vdouble, hhalve, vhalve, 3x3 matrix, 3x1D LUT, 3D LUT"
<haasn> BtbN: quite possibly it could
<BtbN> It'd be: run kernel, pass output to another kernel, then to another kernel, and so on though?
<haasn> the part that worries me moreso is the overhead from reading/writing
<haasn> well
<haasn> the idea I have to make this efficient is to only ever operate on an L1-sized chunk of data at a time
<haasn> so for example you take the first 32 pixels, apply each operation in sequence to it
<haasn> then take the next 32 pixels, and so on
<haasn> a write/read through L1 cache should be pretty fast
<BtbN> for CPU processing, yeah
<BtbN> But at least for CUDA, that seems counterproductive
<haasn> well, I was thinking about how to do this with vulkan/libplacebo
<haasn> and the usual approach would be to have the "operations" actually generate a shader
<haasn> so, here's an idea I have
<BtbN> Yeah, that's how scale_cuda works. The shader is just called kernel.
<haasn> suppose that each platform defines a magic constant for how large of a chunk of data it wants to process at a time
<haasn> for example on RISC-V we might define 32 pixels as the system constant
<haasn> or on AVX-512 we might define 64 pixels, or whatever (numbers made up)
<haasn> then on a platform like vulkan/cuda we could define -1 as the system size which means "process the entire image in one go"
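The chunked execution model haasn sketches, with a per-platform chunk-size constant and -1 meaning "process the entire image in one go", might look like this in C. All identifiers and the toy ops are made up for illustration:

```c
#include <assert.h>

/* Hypothetical pipeline op: transforms npixels values in place. */
typedef void (*sws_op_fn)(float *pixels, int npixels);

/* Two toy ops standing in for real primitives (matrix, LUT, ...). */
static void op_add1(float *p, int n)   { for (int i = 0; i < n; i++) p[i] += 1.0f; }
static void op_double(float *p, int n) { for (int i = 0; i < n; i++) p[i] *= 2.0f; }

/* Apply every op to one L1-sized chunk before moving to the next chunk,
 * so the intermediate data stays cache-hot. chunk_size is the platform's
 * magic constant (e.g. 32 on RISC-V, 64 on AVX-512); -1 means process
 * the whole row at once, as a GPU backend would. */
static void run_pipeline(float *row, int width,
                         const sws_op_fn *ops, int nops,
                         int chunk_size)
{
    if (chunk_size < 0)
        chunk_size = width;

    for (int x = 0; x < width; x += chunk_size) {
        int n = width - x < chunk_size ? width - x : chunk_size;
        for (int i = 0; i < nops; i++)
            ops[i](&row[x], n);
    }
}
```

With chunk_size = -1 the loop degenerates to the one-pass-per-op structure a shader/kernel backend wants, while small chunk sizes give the CPU backends the L1-resident behaviour described above.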
<BtbN> For CUDA, the slow part is calling a kernel. You want to minimize Software-Side CUDA API calls
<BtbN> So one humongous kernel which you call exactly once per frame is quite optimal
<haasn> and the actual kernel would literally just construct the shader and then dispatch it in the final operation
<haasn> yeah
<BtbN> The kernel _is_ the shader
<haasn> (I'm just using vulkan terminology)
<BtbN> Does Vulkan have anything called Kernel?
<haasn> nope
<haasn> well, SPIR-V has both shaders and kernels
<haasn> the only difference between the two being that kernels can dispatch other kernels
<haasn> but shaders cannot dispatch other shaders
<haasn> and kernels are only available under OpenCL, not Vulkan
<haasn> (meanwhile shaders are, you guessed it, only available under Vulkan, not OpenCL)
rvalue has quit [Read error: Connection reset by peer]
rvalue has joined #ffmpeg-devel
ngaullie has joined #ffmpeg-devel
ngaullier has quit [Ping timeout: 268 seconds]
ngaullie has quit [Ping timeout: 268 seconds]
ngaullier has joined #ffmpeg-devel
<haasn> ramiro: btw, do you know who uses sws_send_slice/sws_receive_slice? FFmpeg itself doesn't
<michaelni> about accurate_rnd and bitexact, someone needs to go over the code and list what they do exactly. I see for example a case where without bitexact we use the full SIMD register size for filters but with bitexact we zero after the filter length. So bitexact is actually less accurate but matches C in that case, i think
<michaelni> also bitexact disables some native only endianness code
<haasn> can't find anything on github either except wrappers and forks
<haasn> I was wondering about whether we could possibly simplify that design (send+receive) and if so, who would be impacted + what their preference for an API would be
<haasn> michaelni: the more confusing ones are SWS_FULL_CHR_H_INT / SWS_FULL_CHR_H_INP
<michaelni> sws_send/receive_slice was added by anton, maybe he knows about users
<haasn> I was generally thinking that we should start from the PoV of always generating fully accurate output and only selectively re-adding these flags in a documented manner iff a legitimate use case can be demonstrated
<haasn> I remember reading a document somewhere sometime explaining what these flags exactly do
<michaelni> some of the chroma subsampling speed vs quality things should maybe be quality by default in 2024
<haasn> but I lost it
<haasn> yeah exactly; especially now that codecs probably take more like 99% of processing time in a typical encoding pipeline
<haasn> and that on decode scaling is usually done by GPU
<michaelni> IIRC one of them was turning input RGB into 4:2:2 or so instead of 4:4:4 during rgb2yuv i need to RTFS :)
<haasn> yeah something like that
<elenril> I added send+receive because michaelni wanted it
<elenril> I am not aware of it having any users
<ramiro> haasn: you'd have to ask elenril about sws_send_slice/sws_receive_slice, it seems he's the one that added it.
<haasn> so it seems nobody remembers, great
System_Error has quit [Remote host closed the connection]
<elenril> I do remember
<elenril> it has no users
<haasn> fair
<elenril> the idea was to interoperate with apis like draw_horiz_band
System_Error has joined #ffmpeg-devel
<michaelni> a pipeline from decoder over various filters and scale to an output sink that could work at slice granularity could benefit from L2 cache, also would reduce latency with frame multithreading
cone-240 has joined #ffmpeg-devel
<cone-240> ffmpeg Ramiro Polla master:a8e2714d8245: libavcodec/mjpeg: preserve unclipped last_dc value
<haasn> elenril: michaelni: could be done also with the simpler API sws_scale_frame_slice(dst, src, ystart, height)
<haasn> would be synchronous instead of asynchronous
<cone-240> ffmpeg Ramiro Polla master:1fb77347c8d9: checkasm: add tests for yuv2rgb
<haasn> afaict there's literally no reason for send_slice() to ever exist because all it does is add the slices to a list, receive_slice() is what does the actual processing
<haasn> so you might as well have receive_slice() be scale_slice()
<haasn> I think I'll go with that instead for my new API
<elenril> the reason it's like this is to allow making it actually async without breaking compat
<elenril> I have no strong attachment to it though
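The synchronous sws_scale_frame_slice(dst, src, ystart, height) API haasn proposes could be prototyped as follows. The stub types and the toy 1:1 copy stand in for real scaling; nothing here is an actual FFmpeg signature:

```c
#include <assert.h>
#include <stdint.h>

typedef struct SwsContextStub { int unused; } SwsContextStub;
typedef struct FrameStub {
    uint8_t *data;
    int width, height, stride;
} FrameStub;

/* Process only rows [ystart, ystart + height) of src into the matching
 * region of dst, synchronously; no internal slice queue is needed,
 * which is what makes this simpler than send_slice()/receive_slice(). */
static int sws_scale_frame_slice(SwsContextStub *ctx, FrameStub *dst,
                                 const FrameStub *src, int ystart, int height)
{
    (void)ctx;
    if (ystart < 0 || height < 0 || ystart + height > src->height)
        return -1; /* would be AVERROR(EINVAL) in FFmpeg proper */

    /* toy "scaling": 1:1 copy of the requested band */
    for (int y = ystart; y < ystart + height; y++)
        for (int x = 0; x < src->width; x++)
            dst->data[y * dst->stride + x] = src->data[y * src->stride + x];
    return 0;
}
```

As elenril notes, the cost of this shape is that it cannot later be made asynchronous without an API break, which is the trade-off the send/receive split was designed around.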
Traneptora has quit [Quit: Quit]
<wellsakus> @jamrial Yesterday I submitted new patchset for EVC implementation (https://patchwork.ffmpeg.org/project/ffmpeg/list/?series=12261)
<wellsakus> The patchset is important since it follows API changes in libxeve.
<wellsakus> Please review it and if everything is OK please merge it.
Traneptora has joined #ffmpeg-devel
Krowl has quit [Read error: Connection reset by peer]
<cone-240> ffmpeg Frank Plowman master:83b77990c6a3: lavc/vvc: Always set flags for the current picture
<haasn> is it possible to have interlaced bayer frames? :thinking:
<Lynne> Daemon404: yup, DRC is the only reason why anyone would pick xhe-aac over opus
<Lynne> the audio codec is just as efficient as baseline aac-lc, which is to say, 20 years behind opus
Krowl has joined #ffmpeg-devel
<Lynne> they just redid the bitstream and strapped on a terrible entropy coding system
<Daemon404> that's not fair, they may as well pick xhe-aac if they work for fraunhofer.
<Daemon404> that's two whole reasons
<Lynne> funnily enough all encoders are written by fraunhofer devs
<Lynne> oh, also a new stereo prediction system kinda similar to CfL, which was a curious experiment that tried to reduce the issues I/S had
<Lynne> only that Opus did it better again by removing all issues by simply L2 normalizing all bands' coeffs
<Lynne> DRC is another spec entirely and not tied at all to the codec, so it can be easily added to another codec in the same way that DoVi works on av1
<Lynne> but good frame-level DRC can't beat a well-mastered, compressed and normalized stream
<jamrial> wellsakus: ok
Livio has joined #ffmpeg-devel
tufei_ has quit [Remote host closed the connection]
tufei_ has joined #ffmpeg-devel
sgm has quit [Ping timeout: 260 seconds]
sgm has joined #ffmpeg-devel
compn has joined #ffmpeg-devel
compnn has quit [Read error: Connection reset by peer]
mkver has quit [Remote host closed the connection]
mkver has joined #ffmpeg-devel
<cone-240> ffmpeg Dawid Kozinski master:3e6c7948626f: avcodec/evc: Alterations following changes in libxeve
compnn has joined #ffmpeg-devel
compn has quit [Ping timeout: 252 seconds]
<wellsakus> @jamrial Thanks for review and merging
Livio has quit [Quit: Reconnecting]
Livio has joined #ffmpeg-devel
AbleBacon has joined #ffmpeg-devel
Krowl has quit [Read error: Connection reset by peer]
<kurosu> ramiro: I checked JPEG/ITU-T.81, there is no clipping to the DC after DIFF+PRED. People lived in a nice world back then. No fuzzing, overflows, evil bitstreams. We don't, I guess all the clips are for trying to recover from accidents
jess has joined #ffmpeg-devel
ngaullier has quit [Ping timeout: 264 seconds]
compnn has quit [Read error: Connection reset by peer]
compn has joined #ffmpeg-devel
<cosminaught> Lynne: support for seamless ABR switching due to IPF frames is another reason in addition to DRC
<cosminaught> the low bitrate quality around 24-32 kbps also seemed better than Opus from what I recall, at least according to ViSQOL
<cone-240> ffmpeg James Almer master:3478cf2c2dbd: avfilter/vf_showinfo: print more Stereo 3D fields
<cone-240> ffmpeg James Almer master:6da38e11f6a7: avfilter/vf_showinfo: don't use sizeof(AVStereo3D)
<cone-240> ffmpeg James Almer master:778096757dc0: avfilter/vf_showinfo: use av_spherical_projection_name()
<cone-240> ffmpeg James Almer master:beacdbf4b4b8: avfilter/vf_showinfo: only print yaw, pitch, and roll if set
<cone-240> ffmpeg James Almer master:0cb733d27647: avfilter/vf_showinfo: don't use sizeof(AVSphericalMapping)
<cone-240> ffmpeg James Almer master:826f55d5b3dd: avcodec/cbs_sei: add support for Frame Packing Arrangement SEI parsing
<cone-240> ffmpeg James Almer master:e0b574c483db: avcodec/cbs_h266: move decoded_picture_hash to CBS SEI
<cone-240> ffmpeg James Almer master:1c8b32e19f83: avutil/stereo3d: add a Stereo3D type to signal that the packing is unspecified
<cone-240> ffmpeg James Almer master:8af0919cc668: avutil/stereo3d: add a Stereo3D view to signal that the view is unspecified
<cone-240> ffmpeg James Almer master:0b330d8642ee: avformat/mov: set Stereo3D type when parsing eyes box
Livio has quit [Ping timeout: 268 seconds]
MrZeus has joined #ffmpeg-devel
Krowl has joined #ffmpeg-devel
mkver has quit [Ping timeout: 252 seconds]
IndecisiveTurtle has joined #ffmpeg-devel
IndecisiveTurtle has quit [Ping timeout: 264 seconds]
<Lynne> cosminaught: yeah, it's neat how they package extradata and an empty frame, and require you to run it so you can immediately init and output something
<Lynne> but they're a bit rare, usually once every few seconds or so with most streams I found
<Lynne> opus gets away without really needing them by relying on the receiver to keep track of extradata, and on the overlap being barely 2.5ms
mkver has joined #ffmpeg-devel
Krowl has quit [Read error: Connection reset by peer]
System_Error has quit [Remote host closed the connection]
System_Error has joined #ffmpeg-devel
<cosminaught> they're supposed to be rare, because they add overhead, but just like in video we can get away with adaptive switching every 2-5s. And you can make the audio segment size match the video segment size
System_Error has quit [Remote host closed the connection]
System_Error has joined #ffmpeg-devel
<haasn> michaelni: https://git.ffmpeg.org/gitweb/ffmpeg.git/blob/HEAD:/libavfilter/vf_scale.c#l654 `scale->output_is_pal` can only ever be 0?
<haasn> AV_PIX_FMT_PAL8 is the only pixfmt with AV_PIX_FMT_FLAG_PAL and it's replaced by a different pixfmt immediately before this check
<Lynne> cosminaught: 280ish bits isn't going to cost much, particularly in a VBR codec
<Lynne> but in the grand scheme of things, aac (and usac) has A LOT of overhead
<haasn> apart from the interlaced field offset (has an interlaced pal8 frame ever existed in a real media file that a human being has passed through vf_scale?), the only thing it affects is this line: https://git.ffmpeg.org/gitweb/ffmpeg.git/blob/HEAD:/libavfilter/vf_scale.c#l986
compn has quit [Ping timeout: 264 seconds]
compn has joined #ffmpeg-devel
<haasn> michaelni: https://0x1.st/iLh0.txt does this seem like a correct/reasonable description of SWS_FULL_CHR_H_INT / SWS_FULL_CHR_H_INP ?
MrZeus has quit [Read error: Connection reset by peer]
MrZeus has joined #ffmpeg-devel
System_Error has quit [Remote host closed the connection]
cone-240 has quit [Quit: transmission timeout]
compn has quit [Read error: Connection reset by peer]
compn has joined #ffmpeg-devel
compnn has joined #ffmpeg-devel
compn has quit [Read error: Connection reset by peer]
HarshK23 has quit [Quit: Connection closed for inactivity]
<michaelni> haasn, output_is_pal could be true in more cases prior to 985c0dac674846721ec8ff23344c16ac7d1c9a1e
<michaelni> interlaced pal8 should exist if someone converts interlaced input to pal8 without deinterlacing
<michaelni> simple mpeg2 -> any pal8 based codec would do this
<michaelni> haasn, your descriptions sound reasonable. Iam not sure they capture every oddity of the flags but they are a big step ahead from the lack of documentation
<Lynne> is there a point to maintaining palette-based frames these days, rather than letting the decoder handle the conversion?
System_Error has joined #ffmpeg-devel
ccawley2011 has quit [Read error: Connection reset by peer]
compnn has quit [Read error: Connection reset by peer]
compn has joined #ffmpeg-devel
compnn has joined #ffmpeg-devel
compn has quit [Read error: Connection reset by peer]
compn has joined #ffmpeg-devel
compnn has quit [Read error: Connection reset by peer]
MrZeus has quit [Read error: Connection reset by peer]
MrZeus has joined #ffmpeg-devel
mkver has quit [Ping timeout: 255 seconds]
Warcop has joined #ffmpeg-devel
mkver has joined #ffmpeg-devel
zip6como has joined #ffmpeg-devel
haihao has quit [Ping timeout: 268 seconds]
haihao has joined #ffmpeg-devel
compn has quit [Ping timeout: 264 seconds]
compn has joined #ffmpeg-devel
pal_ is now known as pal
ramiro has quit [Ping timeout: 252 seconds]
ramiro has joined #ffmpeg-devel
ramiro has quit [Ping timeout: 268 seconds]
ramiro has joined #ffmpeg-devel
<michaelni> some encoders are palette based, gif for example, so carrying the palette from the decoder, or some RGB -> palette filter, is a use case for these
mkver has quit [Ping timeout: 268 seconds]