michaelni changed the topic of #ffmpeg-devel to: Welcome to the FFmpeg development channel | Questions about using FFmpeg or developing with libav* libs should be asked in #ffmpeg | This channel is publicly logged | FFmpeg 6.1.1 has been released! | Please read ffmpeg.org/developer.html#Code-of-conduct
AbleBacon has joined #ffmpeg-devel
System_Error has quit [Remote host closed the connection]
System_Error has joined #ffmpeg-devel
Marth64 has quit [Ping timeout: 260 seconds]
Marth64 has joined #ffmpeg-devel
System_Error has quit [Remote host closed the connection]
System_Error has joined #ffmpeg-devel
ravenJPL has joined #ffmpeg-devel
Livio has quit [Ping timeout: 255 seconds]
ravenJPL has quit [Quit: Leaving]
paulk has quit [Ping timeout: 272 seconds]
thilo has quit [Ping timeout: 256 seconds]
iive has quit [Quit: They came for me...]
paulk has joined #ffmpeg-devel
paulk has joined #ffmpeg-devel
paulk has quit [Changing host]
thilo has joined #ffmpeg-devel
thilo has quit [Changing host]
thilo has joined #ffmpeg-devel
paulk has quit [Ping timeout: 268 seconds]
paulk has joined #ffmpeg-devel
mkver has quit [Ping timeout: 252 seconds]
<sdc>
sort of an asm question: I've been trying to write a SIMD lut lookup using YMM/AVX2. for the 16bpc version, CTU widths 16 to 128 are handled in a single function that just calculates how many times to loop. I was trying to figure out a good way to do 4 and 8 pixel widths (I was thinking vmaskmov) but it seems like other SIMD versions just have separate functions for each CTU width? is that correct?
<Lynne>
sdc: what's the element size?
<Gramner>
it is common to have a branch or jump table for different widths. you can for example have 3 different code paths, one for w=4, one for w=8, and one for w>8
<Gramner>
you probably don't want to use any of the maskmov instructions as they are non-temporal
<sdc>
16 bits for the 10/12bit version
<Gramner>
actually some are not non-temporal. but they can be really slow, up to 42 µops
<sdc>
yeah when I was benchmarking the maskmov was quite a bit slower
<sdc>
that's when I decided to try and see what the other implementations looked like haha probably should've done that sooner
tufei_ has quit [Remote host closed the connection]
tufei_ has joined #ffmpeg-devel
<Lynne>
oof, 16 bit elements
<Lynne>
if there's nothing to exploit, like the block size being small enough to fit into all registers, or the lookup index being small enough, not much you can do on *that* level
<Lynne>
you could check if the data can be tweaked to be nicer to SIMD before the function call
tufei_ has quit [Remote host closed the connection]
<sdc>
ahhh okay good to know, thank you both for the suggestions!
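A minimal C sketch of the width dispatch Gramner describes above: one code path for w=4, one for w=8, and one for w>8. The names (`lut_fn`, `select_lut_fn`, `lut_w4`, `lut_w8`, `lut_wide`) and the signature are hypothetical, the bodies are plain scalar C standing in for the real SIMD kernels, and the stride is assumed to be counted in 16-bit elements rather than bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel signature: map every 16-bit sample through a table. */
typedef void (*lut_fn)(uint16_t *dst, const uint16_t *src,
                       const uint16_t *lut, int w, int h, ptrdiff_t stride);

/* Scalar reference body; a real implementation would use SIMD here. */
static void lut_rows(uint16_t *dst, const uint16_t *src,
                     const uint16_t *lut, int w, int h, ptrdiff_t stride)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            dst[x] = lut[src[x]];
        dst += stride; /* stride is in elements, not bytes */
        src += stride;
    }
}

/* Fixed-width paths: the width is a constant, so no masked loads/stores. */
static void lut_w4(uint16_t *dst, const uint16_t *src,
                   const uint16_t *lut, int w, int h, ptrdiff_t stride)
{
    (void)w;
    lut_rows(dst, src, lut, 4, h, stride);
}

static void lut_w8(uint16_t *dst, const uint16_t *src,
                   const uint16_t *lut, int w, int h, ptrdiff_t stride)
{
    (void)w;
    lut_rows(dst, src, lut, 8, h, stride);
}

/* Looping path for every width above 8. */
static void lut_wide(uint16_t *dst, const uint16_t *src,
                     const uint16_t *lut, int w, int h, ptrdiff_t stride)
{
    assert(w > 8);
    lut_rows(dst, src, lut, w, h, stride);
}

/* Pick the kernel once per block instead of branching per pixel. */
static lut_fn select_lut_fn(int w)
{
    assert(w > 0);
    if (w == 4)
        return lut_w4;
    if (w == 8)
        return lut_w8;
    return lut_wide;
}
```

A caller would do `select_lut_fn(w)(dst, src, lut, w, h, stride)`; in asm the same choice is typically a jump table at function entry.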
tufei_ has joined #ffmpeg-devel
cone-186 has quit [Quit: transmission timeout]
AbleBacon has quit [Read error: Connection reset by peer]
Marth64 has quit [Remote host closed the connection]
Xaldafax has quit [Quit: Bye...]
tufei__ has joined #ffmpeg-devel
tufei_ has quit [Ping timeout: 260 seconds]
jamrial has quit []
Martchus has quit [Ping timeout: 264 seconds]
Martchus has joined #ffmpeg-devel
<Lynne>
tensorflow was abandoned, wonder how long pytorch will last
<Lynne>
they haven't even updated it to support the latest cuda (12.2), and it's been out for a year now
MisterMinister has quit [Ping timeout: 256 seconds]
qeed has quit [Quit: Leaving]
rooisnoek has joined #ffmpeg-devel
rooisnoek has quit [Client Quit]
deus0ww has quit [Ping timeout: 268 seconds]
deus0ww has joined #ffmpeg-devel
Krowl has joined #ffmpeg-devel
ngaullier has joined #ffmpeg-devel
ngaullie has joined #ffmpeg-devel
Krowl has quit [Read error: Connection reset by peer]
ngaullier has quit [Ping timeout: 256 seconds]
kurosu has joined #ffmpeg-devel
System_Error has quit [Remote host closed the connection]
psykose has quit [Remote host closed the connection]
MisterMinister has joined #ffmpeg-devel
agrosant has joined #ffmpeg-devel
psykose has joined #ffmpeg-devel
Marth64 has quit [Ping timeout: 256 seconds]
Marth64 has joined #ffmpeg-devel
iive has joined #ffmpeg-devel
<Sean_McG>
I just want to take a moment and say thanks to everybody that spent time looking at FATE test failures lately. I know it's un-sexy but at the very least I really appreciate the work.
<Lynne>
still haven't found a broken stream, and I've run it through all I have saved (not a lot, av1 isn't that popular yet, and youtube's av1 is watered down, like all they provide)
rvalue has joined #ffmpeg-devel
jarthur has joined #ffmpeg-devel
tufei_ has quit [Remote host closed the connection]
tufei_ has joined #ffmpeg-devel
tufei_ has quit [Remote host closed the connection]
tufei_ has joined #ffmpeg-devel
Marth64 has quit [Ping timeout: 264 seconds]
Marth64 has joined #ffmpeg-devel
cone-117 has quit [Quit: transmission timeout]
<BtbN>
Re D3D11: Is it possible that "GPU to CPU" memory copies _do_ need explicit syncing or something?
<BtbN>
That'd explain stuff at least a little bit, why it sometimes digs out some super old frame, seemingly not performing the CopyRes/SubRegion calls
Livio has joined #ffmpeg-devel
<Sean_McG>
I don't know personally, but it would surprise me if that requirement was not documented somewhere
<Sean_McG>
but might it vary between vendors?
<BtbN>
well, D3D is not _supposed_ to vary between vendors
<BtbN>
That's kinda its whole selling point over Vulkan/OpenGL
<Sean_McG>
TRU.DAT
<BtbN>
But I honestly don't see what the code is doing wrong
<BtbN>
And it works on Nvidia, for me
agrosant has quit [Ping timeout: 264 seconds]
Krowl has joined #ffmpeg-devel
<Lynne>
vulkan is vendor invariant
<Lynne>
just write all the code to support all cases the spec outlines, it'll work
<JEEB>
BtbN: really sounds like an intel funky problem driver-wise
<Lynne>
until you find out the vendor in question which requested that workaround didn't implement it properly
<BtbN>
The initial reporter says the same thing happens on an Nvidia VM though
MisterMinister has quit [Ping timeout: 260 seconds]
<BtbN>
I just can't reproduce it _at all_ locally
<JEEB>
ouch
<BtbN>
Makes me wonder if that "Nvidia VM" is using some funky Optimus setup, and it's Intel after all
<BtbN>
Someone happens to have AMD on Windows 11?
<JEEB>
yea
<JEEB>
I actually thought the same re: nvidia vm being actually optimus or so
<JEEB>
not sure if you noted that it was on mobile or not, but that would have raised the optimus case percentage much higher
<BtbN>
The thing is... even if this is a broken Intel driver... we kinda do need to account for it. No idea how though
<BtbN>
Nah, definitely not mobile
<BtbN>
he cited some big fat Datacenter Nvidia GPU
<JEEB>
okay
MisterMinister has joined #ffmpeg-devel
<BtbN>
reverting the patch that causes it would make ddagrab unusable for me again
<BtbN>
So kinda gotta pick whom to break it for...
<BtbN>
And even then, it's only working by pure chance without that patch
agrosant has joined #ffmpeg-devel
<Sean_McG>
wow, changes to ATRAC... takes me back to when I used to have a MiniDisc deck
<Lynne>
I still have an encoder I haven't sent to the ML
Livio has quit [Ping timeout: 260 seconds]
<BtbN>
since I'm not at work anymore, I have no way of reproducing it anymore either
<BtbN>
so gotta wait until monday. I'd want to see if just throwing in a Flush() call after the final CopySubRes region works around the issue
Traneptora has quit [Quit: Quit]
Livio has joined #ffmpeg-devel
agrosant has quit [Ping timeout: 264 seconds]
Marth128 has joined #ffmpeg-devel
Marth64 has quit [Killed (NickServ (GHOST command used by Marth128))]
Marth128 is now known as Marth64
<Marth64>
i have 3 nvidias and an arc...happy to help test if needed
<Marth64>
i think one of them is an entry grade Quadro
<Marth64>
but older gen
<Marth64>
no windows tho unless doing vm passthru
HarshK23 has quit [Quit: Connection closed for inactivity]
agrosant has joined #ffmpeg-devel
<BtbN>
Well, that's kinda a requirement for testing D3D11 stuff :D
<Marth64>
i can set it up nbd, have partitions to spare
<BtbN>
I'd say it's not important enough for that
<Marth64>
just meant dont have a metal windows ready to go
<Marth64>
cool
<BtbN>
In theory I _do_ have an AMD GPU in the very system I'm sitting in front of
<BtbN>
but not sure if I can make it do ddagrab
<Marth64>
BtbN: actually just realized, the Quadro is in a real windows machine
<BtbN>
On Nvidia it works for me though. On Intel it doesn't.
<Marth64>
its Pascal generation. feel free to shoot any test tasks my way if that card helps
<Marth64>
ah got it
<BtbN>
I mean, pretty much just try to use ddagrab, draw selection-rects with the mouse, and then look at the footage
<BtbN>
it's very obvious if it's busted
<Marth64>
ok
<Marth64>
Win10 is ok?
<BtbN>
Should be
<Marth64>
will run shortly
Krowl has quit [Read error: Connection reset by peer]
deus0ww has quit [Ping timeout: 272 seconds]
deus0ww has joined #ffmpeg-devel
<Marth64>
BtbN: its busted
<Marth64>
sample upload en route
<BtbN>
On a pure Nvidia system? That's wild
<BtbN>
I don't understand anymore then
<Marth64>
yes its like a flickering effect right?
<BtbN>
Yeah, sometimes an old frame is somehow dug out
<Marth64>
yes exactly
<JEEB>
\o/
<BtbN>
BUT the latest mouse cursor is drawn onto it. On top of the already existing one
<BtbN>
which should be absolutely impossible
<Marth64>
yes the mouse movement is smooth but old frames judder in
<BtbN>
Map texture from DDA -> ID3D11DeviceContext_CopyResource -> Unmap it -> ID3D11DeviceContext_CopySubresourceRegion -> draw onto the frame a bit -> deliver it ---> broken
<BtbN>
whereas Map texture from DDA -> ID3D11DeviceContext_CopySubresourceRegion -> draw onto it a bit -> deliver it ---> works
lemourin has joined #ffmpeg-devel
lemourin is now known as Guest2779
Guest2779 has quit [Killed (lithium.libera.chat (Nickname regained by services))]
<Marth64>
i see the trail now
<BtbN>
Can you try a patch on that machine?
<Marth64>
anything you need
<Marth64>
sorry trying to get the video off to upload
<BtbN>
it'll look similar to the video from earlier
<BtbN>
The patch is basically just adding ID3D11DeviceContext_Flush(dda->device_hwctx->device_context); after the singular ID3D11DeviceContext_CopySubresourceRegion call in vsrc_ddagrab.c
<Marth64>
i don't have compiler on it :(
<Marth64>
but i can figure it out
<Sean_McG>
if you have Docker somewhere, we have a node that can do Windows cross-compiles -- we even use it for the CI jobs
<Marth64>
yes that would make my life much easier
<Marth64>
i has docker
<BtbN>
I'm already building a binary, one sec
<Marth64>
windows machine probably wont let me install dx sdk since its not pro/home
<Marth64>
tyty
<BtbN>
DX SDK does not exist anymore anyway
<Marth64>
makes sense with their modernization. i haven't done win32 in long time
<BtbN>
Just clone my builds repo and run `FFMPEG_REPO_OVERRIDE="https://github.com/BtbN/FFmpeg.git" GIT_BRANCH_OVERRIDE="master" ./build.sh win64 gpl` if you need some kinda special build
<Marth64>
nvidia driver 31.0.15.2849 from 2/2/2023
<BtbN>
That's not a driver version I'm aware of
<Marth64>
let me try once more
<BtbN>
But over a year old is definitely old
<Marth64>
i think that's the .dll version, actual is 528.49
<Marth64>
my bad
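The mapping Marth64 mentions can be sketched in C: take the last digit of the third field and the four digits of the fourth field of the Windows file version, then put the decimal point before the final two digits. This is an assumption based on the single pair quoted in the chat (31.0.15.2849 and 528.49), not a documented API, and `nvidia_display_version` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Convert a Windows driver file version such as "31.0.15.2849" into the
 * NVIDIA-style display version "528.49". Returns 0 on success, -1 if the
 * input does not have four dotted numeric fields.
 * Assumption: the display version is formed from the last five digits.
 */
static int nvidia_display_version(const char *file_version,
                                  char *out, size_t size)
{
    unsigned a, b, c, d;

    assert(out && size > 0);
    if (sscanf(file_version, "%u.%u.%u.%u", &a, &b, &c, &d) != 4)
        return -1;

    /* e.g. c = 15, d = 2849  ->  5 * 10000 + 2849 = 52849 */
    unsigned n = (c % 10) * 10000 + d % 10000;

    /* 52849 -> "528.49" */
    snprintf(out, size, "%u.%02u", n / 100, n % 100);
    return 0;
}
```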
<BtbN>
Yeah, that's old
<BtbN>
I'm on 551.61 locally
<BtbN>
Are you encoding using nvenc, or downloading to libx264 or something?
<Marth64>
it's actually still broken .. i apologize... when i said no cursor, that was over RDP .. just did it on physical display and can reproduce the original issue
<BtbN>
This build should not exhibit the issue, but it's broken for me cause of the random format switches
<Marth64>
running
<Marth64>
yes, that build looks good
<BtbN>
That build fails for me the moment I move the mouse though...
<Marth64>
my cursor is smooth on it
<Marth64>
i noticed, that it took a little longer to start
<BtbN>
Cause on a frame that's "mouse update only, no frame change", when you map the desktop resource from an HDR/10bit screen, for some reason you will get an 8 bit texture
<BtbN>
Which will completely break the input, since it's set up for 10 bit textures
<Marth64>
this is a shot in the dark, but i ran clang-tidy and found something that possibly smells (but also perfectly valid if it's intended)
<BtbN>
And for me, those flags are already all 0, except for the bind flags, which are render res/shader res, which is both fine
<BtbN>
And now I'd love to know what they are on a broken system
<Marth64>
do they get logged or are they something i can pull from dxdiag?
<Marth64>
if dxdiag is even a thing anymore
<BtbN>
I'll just add a log for them
<BtbN>
no other sensible way to get to them
<Marth64>
cool
<Marth64>
makes sense, they seem to be per texture
<BtbN>
Yeah
<BtbN>
The texture they come from in this case is the desktop texture, straight from the compositor
<BtbN>
I really do wonder why so many people, except for me, have non-normal flags in there that'd ruin the day, and what they'd be
<Marth64>
if i recall correctly, when you install graphics drivers, windows runs some kind of capability test and configures the compositor (Aero/DWM?) based on that
<BtbN>
nah, that's not what the flags are about
<Marth64>
wonder if it's a configuration that happens at that point, that causes it to behave differently later
<Marth64>
got it
<BtbN>
they control stuff of who can access the texture in what way. I.e. it's a GPU<->GPU only, or GPU<->CPU, and so on
<BtbN>
Logs this for me: [Parsed_ddagrab_0 @ 0000023682b99480] >>>>>>>>>> Initial texture flags: Usage: 0, BindFlags: 40, CPUAccessFlags: 0, MiscFlags: 0
<BtbN>
This is an X670E Ryzen system, though I don't think the chipset plays any role in this
<BtbN>
this is gonna be some windows or driver setting somewhere, somehow influencing this
<BtbN>
For me, the misc flags are even different per-screen
<BtbN>
i.e. it has 10496 in the misc flags there. Which is D3D11_RESOURCE_MISC_RESTRICT_SHARED_RESOURCE | D3D11_RESOURCE_MISC_SHARED_NTHANDLE | D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX
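The logged values can be decoded by hand. A small C sketch, assuming the numeric values below match the `D3D11_RESOURCE_MISC_FLAG` and `D3D11_BIND_FLAG` enums in `d3d11.h` (they are copied here as plain constants so the snippet builds without the Windows headers); `describe_flags` and the two tables are hypothetical names, and only the flags mentioned in this conversation are listed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct flag_name {
    unsigned value;
    const char *name;
};

/* Subset of D3D11_RESOURCE_MISC_FLAG (values assumed from d3d11.h). */
static const struct flag_name misc_names[] = {
    { 0x0100, "D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX" },
    { 0x0800, "D3D11_RESOURCE_MISC_SHARED_NTHANDLE" },
    { 0x2000, "D3D11_RESOURCE_MISC_RESTRICT_SHARED_RESOURCE" },
};

/* Subset of D3D11_BIND_FLAG (values assumed from d3d11.h). */
static const struct flag_name bind_names[] = {
    { 0x08, "D3D11_BIND_SHADER_RESOURCE" },
    { 0x20, "D3D11_BIND_RENDER_TARGET" },
};

/*
 * Write the names of all set bits found in tab into buf, joined by " | ".
 * Returns the bits that had no entry in tab, so unknown flags are not lost.
 */
static unsigned describe_flags(unsigned flags, const struct flag_name *tab,
                               size_t n, char *buf, size_t size)
{
    assert(buf && size > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (!(flags & tab[i].value))
            continue;
        if (buf[0])
            strncat(buf, " | ", size - strlen(buf) - 1);
        strncat(buf, tab[i].name, size - strlen(buf) - 1);
        flags &= ~tab[i].value;
    }
    return flags;
}
```

With these tables, MiscFlags 10496 is 0x2900 = 0x2000 + 0x0800 + 0x0100, and the BindFlags value 40 from the log line is 0x28 = 0x20 + 0x08, i.e. render target plus shader resource.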