klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
amanita has joined #osdev
dhugo has left #osdev [Leaving.]
Arthuria has quit [Ping timeout: 264 seconds]
<doug16k> riverdc, it's the same thing on x86 vs x86_64. x86_64 is actually easier
<doug16k> doublefault and stack fault are clunky as hell on x86, easy on x86_64
<doug16k> x86 nmi is also even more hell than the x86_64 one
amanita has quit [Quit: Lost terminal]
transistor has quit [*.net *.split]
xenos1984 has quit [*.net *.split]
transistor has joined #osdev
xenos1984 has joined #osdev
iorem has joined #osdev
<riverdc> ok, good to know. i'm pretty green in terms of assembly, didn't know that x86_64 was considered easier
transistor has quit [Ping timeout: 245 seconds]
<kazinsal> I wonder if it would be easier to write a new kernel core for x86_64 and port the modules into it than to try to clone the i686 core and change bits as needed until it works
xenos1984 has quit [*.net *.split]
xenos1984 has joined #osdev
<Mutabah> x86_64 has more warts... but they're easier to use
<GreaseMonkey> [11:54:05] <heat> also arm64 has division, arm32 doesn't I think <-- ARMv7 i think has division which is 32-bit
<GreaseMonkey> i know ARMv7-M has division and that's the cut-down thumb-only microcontroller version
<heat> GreaseMonkey, probably, i'm not the biggest ARM fan here :)
<GreaseMonkey> ARM can be a lot of fun but i want to see more RISC-V stuff
<heat> kazinsal: if you wrote your paging well there are probably not too many changes you need to make
<heat> just use uint64_t and increase your paging levels
Lucretia has quit [Quit: Konversation terminated!]
<merry> ARMv7-A has division yes
<heat> then there's the IDT and GDT which are slightly different but that's also a piece of cake
<heat> the TSS is also slightly different
<heat> for syscalls you probably want to use syscall but int works fine
<heat> and yeah that's about it
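A minimal sketch of the paging change heat describes, assuming nothing beyond the log: entries widen to 64 bits and the walk gains levels. The helper names are invented for illustration.

    #include <stdint.h>

    typedef uint64_t pte_t;  /* x86_64 page-table entries are 64-bit; non-PAE i686 ones were 32-bit */

    /* 4-level walk: 9 bits of index per level, 4 KiB pages */
    static inline unsigned pml4_index(uint64_t va) { return (va >> 39) & 0x1ff; }
    static inline unsigned pdpt_index(uint64_t va) { return (va >> 30) & 0x1ff; }
    static inline unsigned pd_index(uint64_t va)   { return (va >> 21) & 0x1ff; }
    static inline unsigned pt_index(uint64_t va)   { return (va >> 12) & 0x1ff; }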
<Mutabah> GreaseMonkey: riscv is FUN
<Mutabah> it's just so simple
<GreaseMonkey> Mutabah: it's basically a better MIPS, another CPU which takes less than a day to make a basic user-mode-level emulator for
dgb has joined #osdev
isaacwoods has quit [Quit: WeeChat 3.1]
Affliction has joined #osdev
<heat> is it just me or is virtio-blk crap?
<Mutabah> In what way?
<heat> you need to create a separate virtio_blk_request header for each sector you want to read, and each sector is very much its own request
gog has joined #osdev
<Mutabah> Well, each request - you can read sequential sectors in one request
* gog meows
<Mutabah> And you need something to tell the host what address to read
<heat> you can?
<Mutabah> Yep
<heat> ah, I take it back then
<heat> virtio-blk is great
<doug16k> you might fragment it because of scatter gather, but it might be one big range
<doug16k> you can chain them together into one big command
<doug16k> my read and write are largely wrappers around mphysranges
<doug16k> the number of items there determines how many items go in the virtio ring
vdamewood has joined #osdev
<doug16k> if you look at the virtio config struct, it has size_max. use that to cap your scatter/gather fragment size
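A rough sketch of the request shape Mutabah and doug16k describe; the header field layout follows the virtio spec, while the chain layout comment is only an illustration.

    #include <stdint.h>

    struct virtio_blk_req_hdr {
        uint32_t type;      /* VIRTIO_BLK_T_IN (read) or VIRTIO_BLK_T_OUT (write) */
        uint32_t reserved;
        uint64_t sector;    /* first sector of the run; length is implied by the data buffers */
    };

    /* One request = one descriptor chain:
     *   [header, device-readable] -> [data fragment 0] -> ... -> [data fragment N] -> [status byte, device-writable]
     * Sequential sectors go in a single request; each data fragment comes from the
     * scatter/gather list and is capped by blk_config.size_max. */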
<doug16k> I have virtio-blk driver
transistor has joined #osdev
<doug16k> it reminds me of usb block storage. almost impossible to be simpler
<doug16k> virtio-blk blows away usb block storage for one reason: block storage is in order, virtio isn't
<doug16k> each thing completes independently
heat has quit [Ping timeout: 264 seconds]
vdamewood has quit [Ping timeout: 268 seconds]
<doug16k> if you want a guarantee that something runs after something else, then don't issue it until the other thing completes
gog has quit [Ping timeout: 268 seconds]
<doug16k> you already need to be able to stall to do a useful flush
<doug16k> you need to be able to block issue from all threads, drain out the commands until none are pending, issue the flush, and prevent issue until the flush completes
<doug16k> when you say flush you are saying flush what was issued so far. if it hasn't issued everything up to the flush, the flush doesn't flush everything before it
<doug16k> the only way to be certain it was issued is to wait for it to complete
<doug16k> both ahci and nvme need you to do that too
<doug16k> if you just fling a flush in there you don't know for sure what it flushed
<doug16k> not allowed to issue that command NCQ anyway
<doug16k> already need to be able to stall until ncq commands drain, and transition to not-ncq
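A sketch of the quiesce-then-flush sequence doug16k is describing; every name here is hypothetical.

    struct blk_queue;
    /* primitives the driver is assumed to already have */
    void block_new_issue(struct blk_queue *q);
    void wait_until_none_pending(struct blk_queue *q);
    void issue_flush_and_wait(struct blk_queue *q);
    void unblock_issue(struct blk_queue *q);

    void blk_flush_barrier(struct blk_queue *q)
    {
        block_new_issue(q);           /* stall submissions from all threads */
        wait_until_none_pending(q);   /* drain everything already issued */
        issue_flush_and_wait(q);      /* now the flush covers everything before it */
        unblock_issue(q);
    }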
<doug16k> it's like it's a tradition
<doug16k> I wanted to add port multiplier support
<doug16k> I love the idea of port multipliers
<doug16k> one sata can handle 4 spinning drives no problem
<doug16k> too late now, too old
<Affliction> Something I've not thought of - do any hypervisors or emulators emulate them?
<Affliction> doesn't look like qemu or vmware do
<doug16k> I hope there is, I don't think so though
sniff has joined #osdev
<doug16k> I bet it's not even that much work to add it to qemu's ahci
<Affliction> hardware's not that expensive, either
<Affliction> but, my ONLY use for it would be testing my AHCI driver against it
<doug16k> a friend almost went that route and went sas-to-many-sata instead
sniff has left #osdev [#osdev]
<doug16k> he has an array of 7200 rpm drives that keep up with my 960 pro nvme in linear
vdamewood has joined #osdev
<doug16k> so much mechanical it keeps up with solid state :)
<Affliction> way back in the day I wrote a FAT fragmenter, compare their performance after that treatment :)
<doug16k> it's fun to have so much disk throughput, that you need to put the disk host controller in the video card slot so you can get x8 :P
jaevanko has joined #osdev
jaevanko has quit [Client Quit]
<Affliction> speaking of ludicrous disk throughput, does anyone know about directstorage? I've not been able to find a straight answer on if it's only useful for the GPU, or if it's useful on the CPU too.
<Affliction> even now, all I can find is marketing.
<doug16k> I go for storage so direct, I wrote the block driver and the filesystem and the syscalls and the libraries :P
<doug16k> then everything is my fault
<doug16k> fuzzing seems like an excellent idea on syscalls
<Affliction> Well, the nvidia documentation seems to be entirely GPU side.
<Affliction> Dunno, maybe it provided a way to allow applications to map part of a BAR so they can access storage without going through the OS
<Affliction> But that would require the disk to know about the filesystem
<doug16k> newest pcie can map any size
<doug16k> variable bar thing
<Affliction> Which is something I've thought about - move the filesystem to the SSD controller, which can deal with allocating a bunch of numbered logical streams. The OS' filesystem can use some streams for metadata, others for content.
<doug16k> goes all the way up to 64 bit
<Affliction> Program wants to open a file, OS does permission checks, gives it a mapping to a page in one of the BARs that allows it to do read/write commands, without syscalls
<Affliction> mmap gives it a BAR on the disk which can be read and written
<doug16k> if hardware has that bank switching window thing, yeah
<doug16k> oh you mean slide the bar up and down depending on access?
<doug16k> that's probably not right, you aren't intended to be frequently modifying config space
<doug16k> weird if they do that
<Affliction> nah, the BAR could be a fixed sized multiple of the size of the disk.
<doug16k> yeah just bank switch the one bar and not move it
<doug16k> that way makes more sense
<Affliction> the device just creates 'views' of logical streams
<doug16k> it's EMS memory from DOS 6.22 days
<Affliction> effectively, the device deals with fragmentation
<doug16k> with nice big 16MB window
<Affliction> something I kinda want to try with an FPGA, but I've never done anything with an FPGA
<Affliction> seems a bit more complex than blinking a LED
<doug16k> if you want to do that they have amazing stuff
<bslsk05> ​numato.com: Aller Artix-7 FPGA Board with M.2 Interface | Numato Lab
<Affliction> But, being able to read/write files with 0 syscall overhead
<doug16k> M.2 fpga - your code can use pcie bus and do dma and stuff
<Affliction> nice, hardware side sorted
<doug16k> big ass artix
<Affliction> 1 RGB LED for custom use
<Affliction> so I can blink the LED too :D
<doug16k> that is amazing if you want to explore making pcie devices
<doug16k> they give you an led I think lol
<doug16k> just dma the blinking led into the framebuffer through peer-to-peer bus master stores
<doug16k> no you're right, that one is for more advanced user. you can get friendlier fpga stuff
<Affliction> At any rate, effectively building a filesystem in hardware is far above my skill level
<doug16k> it has an led though
<doug16k> you aren't even *allowed* to sell an fpga prototyping tool with no led are you?
<doug16k> the led police make those people disappear
<Affliction> Oh I'm sure I can blink the hell out of that LED :)
<Affliction> maybe I can DMA "Hello World!" into the framebuffer!
<doug16k> you could probably hijack control of the kernel from dma
<doug16k> point it into an mmio window you made, which has code in it
<doug16k> make the cpu jmp to the mmio window
<doug16k> from there you take over easy
<Affliction> Well, if you're booting from my disk, it can just load my code anyway!
<Affliction> Unless you're signing your EFI loader
<doug16k> if there is no iommu, there's nothing stopping you
<doug16k> you can modify ram whenever you want
<doug16k> anywhere
<Affliction> Didn't the IOMMU hve a bit devices can set to ignore it anyway?
<Affliction> Or at least, some version of the spec
<Affliction> Because, that makes sense.
<doug16k> it could be behind something that doesn't support the remapping
<doug16k> everything behind that can see the real thing when they do peer to peer
<Affliction> I was sure there was a bit you could set in your DMA packets to say "nah, don't translate this address, it's completely harmless!"
<doug16k> oh not sure what you can or can't do maliciously
<Affliction> maybe it was in a draft
iorem has quit [Quit: Connection closed]
<doug16k> that device I linked would allow the user to explore dma attacks
<doug16k> I had my iommu on full paranoia. sync evict immediately after each I/O
<doug16k> it was fine, but impacted game engines a bit
<doug16k> iommu.strict
<doug16k> I just played tf2, first time playing it in a while, and it was so weird to just sit at fps_max forever 299fps
<Affliction> found it
<Affliction> ^F for ATS
<Affliction> anyway, gotta go, back in an hour or so
<doug16k> nice
<doug16k> I don't know why I find it so amusing to run older engines at breakneck speed
bradd has quit [Remote host closed the connection]
bradd has joined #osdev
jmpeax has joined #osdev
Arsen has quit [Changing host]
Arsen has joined #osdev
MrBonkers has quit [Changing host]
MrBonkers has joined #osdev
Geertiebear has joined #osdev
Geertiebear has quit [Changing host]
<Affliction> now that I'm back, let's see what this is about...
sprock has quit [Quit: ...]
tenshi has joined #osdev
Terlisimo has quit [Quit: Connection reset by beer]
<Arsen> does the forum have an inline code bbcode? like ``code`` in rST?
sprock has joined #osdev
iorem has joined #osdev
divine has quit [Ping timeout: 264 seconds]
<moon-child> y'all remember colinux?
Terlisimo has joined #osdev
<moon-child> it seems like a cool thing to use, but also wildly impractical to have to build or work with
Terlisimo has quit [Client Quit]
Terlisimo has joined #osdev
MarchHare has quit [Ping timeout: 264 seconds]
<Affliction> I know of colinux, though I've never run it, or looked into how it worked
<Affliction> so it runs the kernel as a usermode process?
<Affliction> alongside drivers to use resources from windows
riverdc has quit [Remote host closed the connection]
SwitchToFreenode has quit [Remote host closed the connection]
SwitchToFreenode has joined #osdev
GeDaMo has joined #osdev
cultpony has quit [Changing host]
cultpony has joined #osdev
jmpeax has quit [Quit: leaving]
zagto has joined #osdev
pretty_dumm_guy has joined #osdev
pretty_dumm_guy has quit [Client Quit]
pretty_dumm_guy has joined #osdev
srjek has quit [Ping timeout: 268 seconds]
mahmutov has joined #osdev
qookie has joined #osdev
Lucretia has joined #osdev
mctpyt has quit [Ping timeout: 268 seconds]
mctpyt has joined #osdev
simpl_e has quit [Remote host closed the connection]
simpl_e has joined #osdev
alexander has joined #osdev
sortie has joined #osdev
lleo has joined #osdev
mctpyt has quit [Ping timeout: 268 seconds]
Lucretia has quit [Read error: Connection reset by peer]
Lucretia has joined #osdev
mctpyt has joined #osdev
dgb has quit [Ping timeout: 268 seconds]
qookie has quit [Ping timeout: 268 seconds]
iorem has quit [Ping timeout: 268 seconds]
qookie_ has joined #osdev
lleo has quit [Ping timeout: 268 seconds]
Lucretia-backup has joined #osdev
Lucretia has quit [Killed (NickServ (GHOST command used by Lucretia-backup))]
Lucretia-backup is now known as Lucretia
simpl_e has quit [Ping timeout: 268 seconds]
transistor has quit [Ping timeout: 268 seconds]
mctpyt has quit [Ping timeout: 268 seconds]
isaacwoods has joined #osdev
maksy has joined #osdev
GeDaMo has quit [Ping timeout: 268 seconds]
zagto has quit [Ping timeout: 268 seconds]
zagto has joined #osdev
GeDaMo has joined #osdev
dennis95 has joined #osdev
gog has joined #osdev
Matt|home has quit [Read error: Connection reset by peer]
mahmutov has quit [Ping timeout: 268 seconds]
junon has quit [Ping timeout: 272 seconds]
junon has joined #osdev
tricklynch has quit [Ping timeout: 268 seconds]
tricklynch has joined #osdev
wgrant has joined #osdev
wgrant has quit [Changing host]
tricklynch has quit [Ping timeout: 268 seconds]
tricklynch has joined #osdev
mahmutov has joined #osdev
chartreuse has quit [Ping timeout: 264 seconds]
heat has joined #osdev
pretty_dumm_guy has quit [Quit: WeeChat 3.2-dev]
pretty_dumm_guy has joined #osdev
iorem has joined #osdev
mahmutov has quit [Remote host closed the connection]
Mids_IRC has joined #osdev
lleo has joined #osdev
bleb has joined #osdev
mahmutov has joined #osdev
ksroot has quit [Ping timeout: 244 seconds]
redeem has quit [Ping timeout: 250 seconds]
mahmutov has quit [Ping timeout: 272 seconds]
transistor has joined #osdev
iorem has quit [Quit: Connection closed]
alexander has quit [Ping timeout: 265 seconds]
redeem has joined #osdev
pretty_dumm_guy has quit [Quit: WeeChat 3.2-dev]
amanita has joined #osdev
gareppa has joined #osdev
gareppa has quit [Remote host closed the connection]
MarchHare has joined #osdev
mahmutov has joined #osdev
tricklynch has quit [Ping timeout: 268 seconds]
tricklynch has joined #osdev
tricklynch has quit [Read error: Connection reset by peer]
tricklynch has joined #osdev
mahmutov has quit [Remote host closed the connection]
srjek has joined #osdev
qookie_ is now known as qookie
mahmutov has joined #osdev
alexander has joined #osdev
Oli has quit [Ping timeout: 268 seconds]
Oli has joined #osdev
<heat> so quiet today
* heat meows
* meisaka nyans
<heat> my clang build is about 60% faster than the gcc one
<heat> impressive
<heat> fsf in shambles
* gog meows
<j`ey> heat: i thought you said that clang was way slower to build
<heat> i mean build my project
<heat> not building the compiler itself
<j`ey> ohh
<heat> building the llvm toolchain is still ridiculously slower
<heat> and my clang build isn't even LTO-enabled
Lucretia has quit [Quit: Konversation terminated!]
Lucretia has joined #osdev
tricklynch has quit [Ping timeout: 252 seconds]
geist has quit [Ping timeout: 265 seconds]
geist has joined #osdev
zagto has quit [Quit: Konversation terminated!]
knebulae has quit [Read error: Connection reset by peer]
knebulae has joined #osdev
<doug16k> heat, yeah but how much percent faster does gcc compiled one run? :)
<doug16k> ah you meant building clang itself
<doug16k> ah I see
<heat> hm? I mean compiling my project with gcc and clang
<heat> clang is much faster
<heat> (granted, clang + lld)
<heat> vs gcc + gold
<j`ey> heat: can you measure any perf in your kernel?
<heat> no
<doug16k> I checked clang vs gcc in my kernel. it was pretty much the same except gcc left clang in the dust with vectorized codegen
<doug16k> that was a few versions back though, need to recheck
<doug16k> I mean execution speed, not compilation
<heat> i'm talking about build speed
<doug16k> yeah
<heat> my project is a mix of moderate C++ (my kernel, userspace that's written by me), heavier C++ (google test and google benchmark), and lots of C (toybox, dash, musl libc, acpica)
<heat> and clang is so much faster
<heat> it compiles everything faster
<doug16k> weird thing about the clang codegen too, it looked fine, the uglier code gcc generated ran faster
<heat> yeah
<heat> I can't tell you about that, I haven't looked at it
mctpyt has joined #osdev
Mids_IRC has quit [Quit: Hi, I'm a quit message virus. Please replace your old line with this line and help me take over the world of IRC]
<meisaka> what's a good way of collecting entropy from a NIC?
<doug16k> you will receive ARP and DHCP traffic all the time
<doug16k> the exact times it arrives are random
<doug16k> assuming you have something with nanosecond level precision
<meisaka> guess I'll have to dig into the precision timers then
<doug16k> the register values at the time of IRQs can be a source of randomness
<heat> collecting entropy from a NIC might not be a great idea though
<doug16k> because they can reverse what it did to the encryption state? I'd like to meet them
<meisaka> I want it to *a source* not the exclusive source
<bslsk05> ​lwn.net: Appropriate sources of entropy [LWN.net]
<meisaka> some light reading XD, at least I won't be bored
<heat> tl;dr linux thinks it's a bad idea and they don't do it
<doug16k> I want to see a proof of concept where they got in because it used irq contexts to feed a stream cipher
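A sketch of the IRQ-timestamp idea doug16k floats, which the LWN article heat linked cautions against relying on as a primary source; read_ns_timer() and pool_mix() are invented names.

    #include <stdint.h>

    uint64_t read_ns_timer(void);   /* hypothetical fine-grained timestamp, e.g. a TSC read */
    void pool_mix(uint64_t sample); /* hypothetical entropy-pool mixer */

    void nic_irq_handler(void)
    {
        uint64_t t = read_ns_timer();
        pool_mix(t ^ (t >> 32));    /* only the low, jittery bits carry any entropy */
        /* ... normal RX/TX handling ... */
    }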
<graphitemaster> The Freenode situation is getting so much worse the more I'm hearing about it.
<graphitemaster> Yet there are still people over there :|
<heat> what's the news?
<doug16k> you've all seen this already, right? https://isfreenodedeadyet.com/
<bslsk05> ​isfreenodedeadyet.com: isfreenodedeadyet
<graphitemaster> New staff are a bunch of horrible people basically. Signing the other RMS document to keep him in power. The other claims I have no proof for but one is apparently a massive tranphobe and has already banned trans people on the network. Another impersonates people and is harassing those who signed the RMS "stand down" document. And also there's something about making FN a "incel inclusive" server because incels are being banned on other
<graphitemaster> networks.
<kazinsal> Lmao, they're masks off about being terrible "well, you see, in *thailand* the age of consent..." types
<heat> oof
<heat> I should delete my account
<doug16k> wow
<kazinsal> if that crystalmath guy shows up here he's probably going to start bragging about freenode taking a stand about cancel culture
<kazinsal> I can feel the weirdo cryptolibre already
<graphitemaster> Yeah I dunno how true the claims are. I'm just picking up comments on Twitter and server mods on other networks.
<bslsk05> ​twitter: <mjg59> Fucking incredible one of the new Freenode staff members is Chris Punches, who stole someone's identity in order to harass people who signed the petition asking for RMS to stand down: <github.com/rms-support-le… https://t.co/YUbDNIuan1>
vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
<kazinsal> I'm generally inclined to believe the libera/oss-twitter people more than the freenode/bitcoin/darknets people
<gog> can't ban the trans people that already fled :p
<gog> also shame about crystalmath
<graphitemaster> It would turn out to be pretty funny if the whole corporate takeover of freenode just turned out to be a bunch of the cancel cancel culture types / RMS did nothing wrong / "actually the age of consent is" types.
<kazinsal> the new owner is a bitcoin billionaire who has run IRCs for offsites started by people who were so atrocious they got banned en masse from reddit
<kazinsal> he was in charge of the voat irc for most of its existence (including reusing its TLS configs for new-freenode), well known as the place that hundreds of thousands of qanon cultists migrated to from reddit
<kazinsal> the man is basically an alt-right-tech financier
<kazinsal> anyone who defends freenode nowadays is as good as fertilizer to me
<gog> lmao voat
<gog> reddit is bad enough, i couldn't imagine having gone to voat
<heat> neckbeard^2
johnjay has joined #osdev
<johnjay> libdl is referring to libc stuff or gcc internals? the former right?
<heat> yeah libc
<heat> you get the dl* family of functions
tricklynch has joined #osdev
<heat> in glibc, that is. some libc's don't have a libdl or even libm, libpthread
<heat> musl for example keeps everything in libc.{a,so}, even the dynamic linker is just a symlink to libc.so
<gog> neat
<heat> fun fact: glibc's libc.so.6 is executable
<doug16k> why?
<heat> because it is lol
<doug16k> lol
<renopt> why not :P
<kazinsal> sometimes you just gotta do weird shit for the oss cred
<heat> you get some info on the glibc's version and features and stuff
<doug16k> yeah I keep thinking I should figure out symbol versioning sooner rather than later
<doug16k> never used it before
<johnjay> GNU C Library (Debian GLIBC 2.31-12) stable release version 2.31.
<johnjay> Copyright (C) 2020 Free Software Foundation, Inc.
<johnjay> haha it is
<doug16k> haha, neat. TCG says it supports "self snoop" because you don't need to flush the cache when changing memory types
<doug16k> I don't know of any that support self snoop in real life
<doug16k> I put in an if (!cpuid_has_self_snoop()) cpu_flush_cache(); after setting PAT and it was surprisingly skipping the flush
lleo has quit [Ping timeout: 272 seconds]
mctpyt has quit [Ping timeout: 272 seconds]
tricklynch has quit [Ping timeout: 265 seconds]
GeDaMo has quit [Excess Flood]
tricklynch has joined #osdev
GeDaMo has joined #osdev
tricklynch has quit [Ping timeout: 264 seconds]
Skyz has joined #osdev
tenshi has quit [Quit: WeeChat 3.1]
<Skyz> Is there anyway to make money with free software?
<j`ey> sell support
<doug16k> printf "int main(){}" > money.cc && make money
<GeDaMo> Do you want to limit yourself to legal means? :|
<geist2> use free software to make a thing to sell
<klysm> if it's really good you could make movies with it
<Skyz> Actually wanted to make a movie once on my own
<gog> open-source plans for a 3d printer and pistol
<Skyz> Realized that would be a colossal effort
<gog> then rob a liquor store
<Skyz> Does it actually have to be free software?
<Skyz> Who heard of free cars or free houses?
<doug16k> free as in freedom
gmacd has joined #osdev
<klysm> if you don't like it go sign up for adobe cloud and make a movie that way
<doug16k> you are free to read it, modify it, and give modified version to your friends
<Skyz> I see, free as in freedom is a good ideal
<Skyz> a good motto
<doug16k> so yeah, I am free to get a car, modify it, and give it to my friends
tricklynch has joined #osdev
<sortie> Making money with printf is considered counterfeit
<Skyz> I don't trust that at all
<Skyz> lol
<Skyz> Would be nice idea though
<Skyz> They had car kits
<geist> making money with sortix though is the highest form of money making
<geist> pinnacle of human development
<sortie> Sortix for US Treasury 2021
<GeDaMo> Well, there's that guy who gave up their job to work on their OS full time
<sortie> printf("$100\n");
<Skyz> The high art of money making
<bslsk05> ​awesomekling.github.io: I quit my job to focus on SerenityOS full time – Andreas Kling – I like computers!
<j`ey> he should work on sortix instead
<GeDaMo> Living the dream slash nightmare :P
<sortie> Think about it. No performance reviews. No management. No corpspeak.
<froggey> hmmmm
<froggey> "corpspeak" sounds like corpspeak
<doug16k> this is why we need klaxons
<heat> m a t e r i a l d e s i g n
<geist> a s t h e t i c s
<Skyz> E L 0 S E C U R I T Y
<klysm> class money* bucks = new money( "$100", 100.0 ); printf( "%s", money.value() );
<doug16k> I just realized my code that sets PAT MSR needs to do way more stuff
<doug16k> you have to freak out and disable cache fill, writeback/invalidate cache, flush tlb, then disable MTRR, then set it, then undo it back to normal
<klysm> s/\./->/
<doug16k> and the other cpu better not be running anything beyond a spinloop, also with its cache not filling and its MTRRs off
<doug16k> that is easy to arrange though, I'll make the cpu that sends IPI go into no-cache before sending SIPI. it won't get notified of SMP ready until after the other cpu set its PAT
<doug16k> AP ready I mean
<doug16k> it's funny how utterly disabled the cache is. it isn't even trying to fill, and the MTRR being off makes all memory UC anyway
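A sketch of the sequence doug16k just walked through (roughly the SDM recipe for changing memory types); the cpu_*() helpers are hypothetical wrappers, and every CPU has to run it while the others sit in a no-cache spinloop.

    #include <stdint.h>

    void cpu_irq_disable(void);  void cpu_irq_enable(void);
    void cpu_cr0_set_cd(void);   void cpu_cr0_clear_cd(void);
    void cpu_wbinvd(void);       void cpu_flush_tlb(void);
    void cpu_mtrr_disable(void); void cpu_mtrr_enable(void);
    void cpu_wrmsr(uint32_t msr, uint64_t value);

    void set_pat(uint64_t new_pat)
    {
        cpu_irq_disable();
        cpu_cr0_set_cd();          /* stop cache fills */
        cpu_wbinvd();              /* writeback/invalidate caches */
        cpu_flush_tlb();
        cpu_mtrr_disable();        /* all memory is effectively UC while this is off */
        cpu_wrmsr(0x277, new_pat); /* IA32_PAT */
        cpu_mtrr_enable();
        cpu_flush_tlb();
        cpu_cr0_clear_cd();
        cpu_irq_enable();
    }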
<Skyz> What store would sell operating systems?
<Skyz> Off-question/
chartreuse has joined #osdev
<Skyz> SMP is something every os needs now
<klysm> skyz, a reseller for chips bought off digi-key
<doug16k> the amount an OS needs to do grows naturally as more hardware needs to be shared and abstracted
<junon> unless it has a driver system, unless you count drivers as part of the OS
<doug16k> picking which thread to run and which page gets to be in ram is the main job of OS
<doug16k> it boils down to fetching instructions that read and write data
mahmutov has quit [Quit: WeeChat 3.1]
<doug16k> we expect lots of programs to be able to share the extremely powerful hardware we have now
mahmutov has joined #osdev
mahmutov has quit [Client Quit]
<Skyz> But what types of programs
<Skyz> Toy applications?
<klysm> apps, databases, networks, systems, configurators, viewers, compilers
<Skyz> I would be really interested to see a hobby os in a car
<geist> yah it is kinda amazing how much things have changed in consumer world over the last 20-30 years re: multitasking
<geist> i remember back when it was novel and neat to just be able to forward/background something in windows 3.1 or even a dos TSR
<geist> though of couse someone will point out their amiga/unix machine/etc already did that
<bslsk05> ​en.wikipedia.org: List of car manufacturers of the United Kingdom - Wikipedia
<Skyz> I find it interesting how many car manufacturers they have
<Skyz> Only a few are known
<Skyz> most of them are small companies
<Skyz> and many defunct
mahmutov has joined #osdev
GeDaMo has quit [Quit: Leaving.]
<clever> doug16k: ive hit a bit of a roadblock with that vectorized softfloat, it's getting too complex for me, combined with the fact that it's not actually going to be any faster than non-vectorized hard floats
<clever> doug16k: so i'm shifting gears, creating some portable c functions, that replicate the effects of the vector opcodes, limitations and all, to act like an SDK, and i could then use inlined functions to do the same task on a VPU
<moon-child> clever: have you seen simde?
<clever> moon-child: nope
<clever> moon-child: for reference, this is what i have: https://github.com/librerpi/lk-overlay/blob/master/app/float-tests/float.c#L34-L59
<bslsk05> ​github.com: lk-overlay/float.c at master · librerpi/lk-overlay · GitHub
<doug16k> clever, I like your plan though - making some intrinsics for yourself
<clever> the biggest roadblock, is that my mult opcode, only accepts 16bit inputs
<moon-child> https://github.com/simd-everywhere/simde it has pure-c implementations of all the simd intrinsics
<bslsk05> ​simd-everywhere/simde - Implementations of SIMD instruction sets for systems which don't natively support them. (115 forks/1087 stargazers/MIT)
<moon-child> might be useful
<clever> doug16k: yeah, the #gcc channel also said intrinsics would be better than proper vector support, because gcc doesn't really understand a matrix
<clever> moon-child: ahh, that sounds almost exactly what i want to do
<clever> implementing the intrinsics twice, as both pure-c (portable) and inline asm (faster)
<clever> the pure-c version, would then let people test out the algos, without needing a vpu toolchain and test target
gog has quit [Quit: bye]
<clever> one question i have on planning, how should the intrinsics accept inputs?
<geist> yah i think the general way of writing simd stuff is to use a huge pile of intrisics. it's kinda like assembly except the compiler is doing the busywork of register allocation and load/stores for you
<doug16k> chunks of that could be reused in telling gcc how to use it natively
<clever> should i give it a coordinate into the matrix, and a pointer to a matrix
<clever> or should i just give it a vector of 16 elements?
<geist> oh reminds me of SH-4: at the time (1999) it was the only cpu that had a straight matrix multiply instruction
<clever> coordinates, would carry over more of the real limitations/power, but require the user to handle register allocations by hand
<geist> though i *think* it was just a 1x4 * 4x4 -> 4x1 right? basically 4 of those in a row was a standard 3d transform
<clever> i think the VPU can do a 16x16 matrix mult in ~4 opcodes, if i understand the formula right
<geist> though actually i think it may have had a full 4 4x4
<clever> smaller, would require loading some constants to mess with the per-lane condition flags
<clever> maybe the matrix pointer, could just be TLS
<doug16k> having the one matrix across a vector like that isn't true vectorization
<clever> simpler api
<doug16k> true vectorization would be N threads each doing scalar things, where N is vector width
<geist> ah yeah FTRV instruction was a 4x4 * 4x1 it looks like
<clever> doug16k: internally, the matrix is basically a uint8_t[64][64], and you give it a coordinate, to select either a 1x16 or a 16x1 slice
<doug16k> yes but proper vector code doesn't slice anything
<doug16k> everything is entire register, hardly ever swizzle
<clever> so it would either span m[r][c] to m[r][c+15] or m[r][c] to m[r+15][c]
<clever> you can basically treat it as a collection of 256 different uint8_t[16]'s
<clever> with some built in rotation
<doug16k> you want "whole vector, whole vector, ..." not "did that little piece of a vector, did that little piece of a vector, ..."
<clever> the immediate encoding, doesnt allow you to pick a non-aligned slice
<clever> so its almost a hard mapping on where those 256 slices fit
<clever> when you bump up to 16bit, then there are only 128 uint16_t[16]'s that you can address
<geist> iirc the SH-4 sttrategy is not to have vector registers but to treat the two banks of 16 single precision fpu registers as either 4 vectors horizontally or vertically, or a single 4x4 matrix
<clever> doug16k: if you ignore its ability to rotate, then that is basically just 128 vector registers, each holding 16 x 16bit ints
<geist> was kinda nice to work with
<clever> doug16k: but knowing how the matrix works, lets you mix&match bit widths, do free high/low halfword slicing, and matrix rotation
<clever> so you could treat each slice, as a self-contained vector reg, but then you're losing features
<doug16k> what is the real width
<clever> bit width or lane width?
<doug16k> if you pipelined back to back mul add that are sufficiently independent, how wide before it all takes more cycles
<doug16k> ya is it actually 4-wide or what
<clever> my original speed test, basically did `a = a * 2;` in a loop, and measured it to be 2 cycles per set of 16 mults
<clever> so thats 16 lanes wide, 2 clock cycles, with each mult directly consuming the previous result
<doug16k> yes but there is a loop dependency
<doug16k> you stall yourself until mul latency elapses each iteration
<clever> ah, but this wasnt repeating right away
<doug16k> it might have been able to fit two or three muls in between
<clever> this was a REP64 opcode
<doug16k> ah
<clever> so it was more like doing an int[16*64] based mult, 16 at a time
alexander has quit [Quit: Goodnight o/]
<clever> and it measured 128 cycles
<doug16k> that's why I ask the real width. you would not generate crazy wide vectors if you were doing fully optimized AoSoA code
<clever> all signs point towards it being 16 wide, because it can only ever operate on vectors of 16 elements
<doug16k> 256 bit seems good and reasonable
<clever> it is capable of 32bit x 16, so each ALU input is 512 bits i think
<clever> but only for adds and basic boolean
<clever> writing up a pure-c implementation should serve as a much simpler way to document it all
<doug16k> yeah, you will be an expert in the ISA by the time you fully debug C code that emulates it :D
<doug16k> and get it to match real thing
<clever> one tricky problem though, is generating the inline asm
<clever> lets say i write a function/macro call like this: foo(1,2,3)
<clever> how can it turn into asm volatile ("foo HX(1,2), 3"); ??
<doug16k> easy
<doug16k> static inline. I have 100 or so in my project
<clever> got a link?
<doug16k> you can do compile time constants
<clever> for the asm version, it needs to be a constant, that becomes a string literal
<doug16k> did you mean registers when you said 1 2 3
<clever> a literal 1,2 in the inline asm
<doug16k> ok, yeah you can do that
<clever> asm volatile ("v32or HY(2, 0), HY(2, 0), HY(4,0) IFNZ");
<clever> i need to generate inline asm like this, from function args
thinkpol has quit [Remote host closed the connection]
<clever> but also accept those args as proper ints, so the pure-c version can index into the matrix
thinkpol has joined #osdev
<bslsk05> ​github.com: dgos/control_regs.h at master · doug65536/dgos · GitHub
<doug16k> it injects the dr number
<doug16k> should work for parameter if caller passes constant
<clever> ahhh, the "i" part!
<clever> for immediates in inline asm!
<doug16k> yeah
<clever> that should work out perfectly, and you solved my long-standing question of named args
<doug16k> I am a named inline asm argument enthusiast
<clever> does dr have to be a template arg?
<doug16k> don't think so
<clever> i'll try things both ways
<doug16k> if you pass a constant and it is inline as hell like that, should work
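A sketch of the pattern doug16k's control_regs.h uses, shown with an x86 debug register since that is what the linked file touches; clever's VPU opcode wrappers would take the same shape. The %c modifier pastes the named "i" operand into the mnemonic, so it has to fold to a compile-time constant after inlining.

    static inline void dr_write(int regno, unsigned long value)
    {
        /* regno must be a compile-time constant at every (inlined) call site */
        asm volatile("mov %0, %%dr%c[num]"
                     :
                     : "r"(value), [num] "i"(regno));
    }
    /* dr_write(3, addr);  // emits something like "mov %rax, %dr3" */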
<clever> templating kinda makes things simpler, since i can use a std::pair
<clever> one thing i didnt mention, is that matrix coords, have both an immediate, and a register component
<clever> HX(0,0)+r0
<heat> inline asm enthusiast is the scariest thing I've ever heard
<clever> the immediate, must be aligned correctly to the bit-width, but r0 then contains a pair of 6bit offsets, for the row/col, allowing non-aligned access
<clever> that also allows programatic movement of the algo
<clever> so i could have a function that operates on a 16x16 chunk of data, and then point it to a specific chunk at runtime
<clever> doug16k: do other cpu's allow you to change what regs a vector op acts on, at runtime?
<doug16k> yes
<doug16k> there have been ones where you can just set an arbitrary number for vector width
<clever> ah, so its not that special
<clever> yeah, ive heard that the new arm specs, allow that
<doug16k> it's very forward compatible
<geist> yep, arm SVE and the new riscv vector stuff uses that sceheme
<doug16k> someday when it is 1024-bit wide, it will run DOOM at 2400fps instead of 1400
<geist> hardware supports vectors up to N, user space can dynamically set the width U <= N and then do a series of ops
<geist> actually one more level of abstraction: hardware has H bits, kernel enables K bits of it (whatever it's willing to context switch) and user space can set U width on the fly
<geist> and U <= K <= H
<clever> yeah, that makes sense
<heat> (x/y/z)mm go brrr
<geist> i forget if ARM has a scheme to remember the highest water mark user space has used since the last time it was cleared, for essentially xsave-like optimizations
<j`ey> https://www.twitch.tv/asahigpu alyssa working on m1 gpu driver
<bslsk05> ​'asahigpu (live) 2021-05-30 21:32' by asahigpu (live)
<geist> riscv i think might? it actually for FPU stuff has a nice 2 bit scheme to remember not only if user space touched the fpu but if they wrote to it
<geist> 4 states I think? disabled, clean, accessed, dirty or something like that
<heat> j`ey, 0/10 not in a hot tub
<doug16k> that matches up with how my kernel sees fpu exactly :)
<geist> doug16k: yah it's nice. though *really* the read vs write states are kinda extraneous, but i guess they're basically 'free' to hardware so may as well
<j`ey> heat: lol
<geist> like, how often does user space code just read from the fpu and not write it
<geist> possible on riscv it may be more than i think, like some access of some fpu condition register in a read only mode in some software patterns
tricklynch has quit [Read error: Connection reset by peer]
tricklynch has joined #osdev
<geist> well, i guess it's less of a hardware point of view and more that read vs dirty states are useful if you don't want to lazy fpu save, but you'd at least like to know if user code never wrote to it
<geist> so you can context restore it, set the 'read' state and then on the next context switch not bother saving it if they never wrote it
<heat> i thought the standard was to not do any lazy fpu?
<geist> but you don't want to bother doing the classic scheme of leaving it disabled, trapping, and restoring it there
<geist> heat: depends on the arch. classically speaking most arches have been lazy fpu saving forever
<doug16k> it works correctly on amd
<geist> intel kinda ruined it recently by having a spectre thing
<geist> but... x86 also has the very rich and powerful xsave stuff which also hyper optimizes it so much that it's kinda 'free'
<heat> I remember linux ditched lazy fpu for x86 in 2015, maybe 2016?
dennis95 has quit [Remote host closed the connection]
<heat> not sure, but around that time
<geist> possibly only if has xsave
<doug16k> yeah they made xsave/xrstor lazy and able to know whole chunks of context are zeros
<geist> or xsaveopt or whatever which one the current one is
<geist> but then spectre says we cant have nice things so i think that nailed that particular coffin
<doug16k> xsaves is best, with xsaveopt close behind
dennis95 has joined #osdev
<geist> but, on risc or pseudo risc arches like arm and riscv you have to do it all manually
<geist> so you probably want at least one level of 'did user space even touch it?' fpu save
<heat> defaulted to no-lazy for every x86 cpu in early 2016
<geist> not necessarily full 'trap and lazy restore' but more like 'leave disabled, trap so i know its dirty'
<heat> before that it was xsave-only
<geist> heat: well okay then!
<clever> doug16k: hmmm, another templating problem, i need 3 versions of a function, 8bit, 16bit, and 32bit, then need to pick the right uint8_t for internal usage, but also insert an 8/16/32 literal in the asm...
<clever> doug16k: maybe sizeof(t)*8 as a const expr?
<doug16k> you could use overload resolution
<doug16k> do you have integral_constant
<clever> no idea
tricklynch has quit [Ping timeout: 268 seconds]
<geist> so on fuchsia for example we do a partial fpu lazy saveon arm. we dont leave old state from previous threads on it, so we always fpu save if dirty
<heat> "It seems that, on any remotely recent hardware, eagerfpu is a win: glibc uses SSE2, so laziness is probably overoptimistic, and, in any case, manipulating TS is far slower that saving and restoring the full state. (Stores to CR0.TS are serializing and are poorly optimized.)"
<geist> but i think we delay the loading until a trap
tricklynch has joined #osdev
<geist> and there's a TODO to see if that's even worth it
<geist> right, it also has a lot to do with how user space uses the fpu. so on x86 yah SSE is used like crazy so there's kinda no point
<geist> arm64 i think the vector bits are used a bit less aggressively...
<geist> OTOH last time i timed the full vector load/store on a recent ARM core it was pretty fast
<geist> like 20 cycles or so? so really it's no big deal
arch is now known as archenoth
<geist> really blatting out a large chunk of sequential registers is what modern cpus crave so
<doug16k> clever, https://github.com/doug65536/dgos/blob/master/kernel/lib/cc/type_traits.h#L28 then add a parameter integral_constant<sizeof(uint32_t)>::type parameter to each variation, and call it with integral_constant<sizeof(whatever)>::type() in that place
<bslsk05> ​github.com: dgos/type_traits.h at master · doug65536/dgos · GitHub
<doug16k> is that what you mean?
<geist> heat: i think the key would be something like avx512 vs SSE. *however* by the time you get cpus with avx512 you have xsave which can optimize for not saving/restoring more than was dirtied
<doug16k> you could make it nice to read by typedefing the different sizes to use for that parameter that is just there for overload selection
<geist> so it's all good
<clever> doug16k: maybe, let me get an example ready...
<geist> on something like ARM where SVE can end up with even more state, if there's not hardware tracking support for how much of the upper registers were touched, then it's back to some sort of trap-n-track solution again
<geist> i can see a scheme where you report to user space that there are 512 byte vectors, but then disabling it to 128, and trapping when user space actually tries to use upper bits
<geist> then, bumping some water mark on the thread, maybe allocating more state, and enabling that much register
<doug16k> sorry, integral_constant<size_t,sizeof(whatever)>
<heat> geist: i'm not entirely sure what linux does for avx512 but I would assume it's the Intel Sanctioned(tm) way to do fpu save/restore with it
<heat> considering they probably had that in mind
<doug16k> this does all kinds of song and dance to boil it down to being so many bits, then does the right l or ll nonsense: https://github.com/doug65536/dgos/blob/master/kernel/lib/bitsearch.h
<bslsk05> ​github.com: dgos/bitsearch.h at master · doug65536/dgos · GitHub
<heat> maybe xsave is still crazy fast with 512? who knows
tricklynch has quit [Ping timeout: 268 seconds]
tricklynch has joined #osdev
<doug16k> like line 103
<geist> heat: i think xsave just does what you want
<geist> it tracks which parts of the registers are dirty, etc
<geist> and code is encouraged to use xzeroupper/etc which xsave can pick up on
<heat> linux doesn't do that though
<doug16k> clever, line 176 magically calls the right one
<heat> I think we had reached that conclusion
<geist> really its all about having to allocate that much save state for each thread. one of the reasons we haven't added support for avx512 yet in zircon
<heat> at least for vzeroupper
<geist> it's a TODO task but
<geist> heat: hmm, in what case?
<heat> geist, on the syscall path
<geist> i'm talking about a generic context switch. like you preempted user space and it was doing something
<clever> doug16k: ah, one template, calling another template, but using sizeof to fill in the gaps
<doug16k> yeah
<geist> ah yeah but context switch works the same if it came out of syscall or a preemption
<doug16k> and parameter type with unused value selecting which overload
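A compressed sketch of the tag-dispatch doug16k is suggesting; the function names and the 8/16/32-bit bodies are placeholders.

    #include <cstddef>
    #include <cstdint>
    #include <type_traits>

    // an unused parameter whose type encodes sizeof(T) picks the overload
    static inline void matrix_write_impl(int, int, uint32_t,
                                         std::integral_constant<std::size_t, 1>) { /* 8-bit path */ }
    static inline void matrix_write_impl(int, int, uint32_t,
                                         std::integral_constant<std::size_t, 2>) { /* 16-bit path */ }
    static inline void matrix_write_impl(int, int, uint32_t,
                                         std::integral_constant<std::size_t, 4>) { /* 32-bit path */ }

    template <typename T>
    static inline void matrix_write(int x, int y, T value)
    {
        matrix_write_impl(x, y, value, std::integral_constant<std::size_t, sizeof(T)>());
    }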
<geist> but you're right, you'd think linux would vzeroupper and they dont which is lame.
<clever> doug16k: let me start the code, and try some things, then maybe ask for help...
<geist> on syscall
<heat> yeah only mentions vzeroupper on crypto code
<geist> right, because that's the only real part where the kernel actively uses the vector bits
<geist> otherwise it just 'passes them through' from user space
<geist> and the context switch routine saves it as basically user state
Matt|home has joined #osdev
<doug16k> right but there is a long stall penalty for transitioning between 128 and 256 bit operation, if it is 256 maybe you should leave it
<doug16k> sometimes that would help too though
<doug16k> I think it has to wait for all the 256 bit aware vex stuff to retire before it can begin the "assume upper is zero" code
<doug16k> maybe zeroing upper wouldn't affect that actually
<doug16k> would just cause better init optimization
<doug16k> my stuff guarantees zero fpu on every syscall return
<clever> i just remembered a weird situation i discovered many months ago
<doug16k> if your syscall got preempted I don't save it
<clever> a process was consuming 100% cpu, and strace said it was doing nothing at all
<clever> and it remained like this for over 10 minutes
<clever> after poking around with gdb and getting a backtrace, i found the cause
<clever> compression
<clever> it was ram->ram compression, with pre-allocated buffers, so it never had to do a single syscall
<Skyz> Was a basic kernel a good idea
<Skyz> or is c the only good choice?
<doug16k> best thing is, when pthreads gives up and futex blocks, it doesn't save it. and when futex wait wakes up, it doesn't restore it
<Skyz> subjective
<doug16k> it zeros it on way back to user
<Skyz> Do you use qemu?
<doug16k> of course it does preserve fcw and mxcsr
<doug16k> I do yes
<doug16k> language doesn't matter
<doug16k> if it mattered then we wouldn't be using the same one so widely for so long
<doug16k> the one that stops you least wins
<doug16k> it's like thinking we can make buildings never collapse if only we make the perfect way for architects to write the design down
<Skyz> Well, I would like a job working on an OS
<Skyz> but I'm kinda C-illiterate
<Skyz> I can't write C code well
<Skyz> Taking some classes to get it down
<Skyz> As for what a user wants
<Skyz> they want a GUI most of the time
<Skyz> I'm thinking of writing some tutorials
dennis95 has quit [Quit: Leaving]
<travisg> please dont.
<Skyz> There's other things besides tutorials I can do
<Skyz> I'm kinda emulating Fravia+
<Skyz> Wanting to see if there is an ultimate destination for applications
<Skyz> I'll hold off
<geist> what the heck is fravia?
<heat> i was super confused
<heat> turns out I /ignore'd skyz
<j`ey> lol
<graphitemaster> Making money with free software is like making money as a musician. It's possible, but most of the time it's not about the content so much as it's the celebrity.
<Skyz> Well
<Skyz> Open SOurce software doesn't have to be free software
<Skyz> free as in free
<moon-child> no, but you're even less likely to make money on non-free oss than on free oss
<Skyz> Somehow I missed the point somewhere
<Skyz> Fravia is a reverse engineer
<Skyz> He is grey hat
<Skyz> Been working on trying to do something that is for the protection of software
<geist> heat: yah i had too
<geist> but my other clients were seeing it and i was like oooh
tricklynch has quit [Read error: Connection reset by peer]
tricklynch has joined #osdev
mahmutov has quit [Ping timeout: 268 seconds]
<klange> Skyz: Your continued endeavour of hopping from platform to platform, community to community, making zero sense and demonstrating zero knowledge of anything you are asking about has reached a new level of annoyance that my local authorities will doubtless qualify as harassment.
<Skyz> No harassment intended
<heat> when are .eh_frame and .eh_frame_hdr relevant?
<geist> skyz is like the libyians in back to the future
<doug16k> heat, stack traces and exception unwind
<geist> you think you lose them and then he shows up in a vw van with a rpg
<heat> doug16k: but in-process unwinding or debugger?
<doug16k> _hdr provides a lookup table that speeds up lookup of relevant cfi records for a given pc
<doug16k> debugger and in process if runtime unwinding like a fancy longjmp that calls landing pads, or full c++ landing pads
<doug16k> you can make it so C can call C++ that has landing pads that calls C, and if that C longjmps right over C++ it will clean up
Skyz has quit [Quit: Client closed]
<doug16k> _Unwind_ForceUnwind
<doug16k> so yeah even C uses it
<heat> I'm seeing zircon does -fno-unwind-tables and linux does -fno-asynchronous-unwind-tables
<doug16k> but if you don't force unwind at runtime ever you could discard it
<heat> i'm struggling to understand why
<doug16k> yeah it is turning off support for what I described
<doug16k> it means don't allow foreign exceptions to propagate through the code
<doug16k> so generate potentially a lot less CFI
<heat> what foreign exceptions?
<heat> C++ exceptions?
<doug16k> yes or any language
<doug16k> the way the abi works, all languages can do their own thing and everyone can invoke it
<doug16k> no-asynchronous-unwind means "please don't support full exception unwind as if I were C++"
<heat> and -fno-unwind-tables?
<doug16k> never heard of it
<heat> all I want is to have so debug info for the debugger to look at, I don't want to use any at runtime
<doug16k> then you want -fno-asynchronous-unwind-tables and -fno-exceptions
<geist> maybe no-unwind-tables is stronger? i don't think it was inherited from LK
<doug16k> maybe arm exception abi thing?
<geist> possible
<geist> heat: is it in one of the two compiler paths and/or arch specific section?
<doug16k> oh I found it
<doug16k> unwind-tables means just generate the data but don't affect codegen with unwind
<heat> geist, no
<bslsk05> ​cs.opensource.google <no title>
<geist> oh well there's a whole comment about it
<heat> says it keeps asynchronous unwind tables but discard eh_frame
qookie has quit [Ping timeout: 265 seconds]
<geist> does actually remind me. after years of completely eschewing C++ exceptions
<geist> how bad is it really? (codegen and usability)
<doug16k> looks like making the unwind data not runtime data makes it go into .debug_frame
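A hedged summary of the flags discussed above (this matched heat's results with GCC; clang apparently kept emitting .eh_frame regardless):

    -fno-exceptions                   no C++ exception handling / landing pads at all
    -fno-asynchronous-unwind-tables   don't emit CFI precise enough to unwind at any instruction
    -fno-unwind-tables                drop the runtime unwind tables; with -g the CFI the
                                      debugger needs can land in .debug_frame instead of .eh_frame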
<clever> doug16k: https://gist.github.com/79143cb23a50d572b9d527c9ea479492 my first pass, its only tested to compile, but the code looks like it should do what i intend
<bslsk05> ​gist.github.com: simple-test.c · GitHub
<heat> doug16k, I tried both options and I still get huge eh_frames
<doug16k> heat, look at the cfi records to see where they come from
<doug16k> objdump --dwarf
<doug16k> pc=fffff.....
<doug16k> look up what
<doug16k> ...in the disassembly
<doug16k> sorry for speaking vertically
<heat> having fno-unwind-tables or not makes no difference in the section's size
<heat> no
<heat> problem
<heat> :)
<doug16k> mine shows the stuff in .eh_frame first
<doug16k> what you want is everything in .debug_frame
<heat> those options seem to make absolutely no difference
* heat tries with gcc
<clever> doug16k: now i need to use templates like your example, to dedup this... https://gist.github.com/cleverca22/79143cb23a50d572b9d527c9ea479492#file-vpu-support-purec-h-L27-L57
<bslsk05> ​gist.github.com: simple-test.c · GitHub
<doug16k> you can force instantiate each variation, so everyone can assume they can call the one instance, so it's as good as preprocessor hacking
<heat> oh it works with gcc, not with clang
<heat> is this gcc specific?
<doug16k> you have -fno-exceptions ?
<heat> clang never complains but the eh_frames are still huge
<heat> yes
<doug16k> that alone should go far to shut up with the cfi
<heat> with gcc I have eh frames of size ~0x30
<doug16k> my man clang only even mentions unwind in fexceptions
<heat> compared to several hundred KB
<doug16k> it isn't expected to even get touched if it is good program that uses exceptions correctly
<doug16k> demand paging would be a good excuse to say screw it and generate it
dutch has quit [Quit: Leaving]
<geist> right it's all about disk usage in a demand paged system
sortie has quit [Quit: Leaving]
dutch has joined #osdev
<clever> doug16k: first thing i notice, `movdqa 0xfb4(%rip),%xmm0`, gcc is vectorizing things for me!
<doug16k> clever, yes, you can make it try hard with -ftree-vectorize
<clever> doug16k: from the code in the gist i linked above, how would i select between MATRIX8_WRITE and MATRIX16_WRITE, based on the type of T?
<clever> just throw in some dumb if statements maybe? if (sizeof(T) == 2), and let const-expr eliminate the negative cases?
gog has joined #osdev
<doug16k> could do that
<clever> and that could itself be another static inline
<clever> with template
<doug16k> in newer C++ you can force it to be compile time with constexpr if
<gog> public static inline private constexpr
<doug16k> v16ld and v8ld could even not be inline
<clever> doug16k: what would i gain from not being inline?
<doug16k> if it is C code, being inline wouldn't help nearly as much as if it is emitting a real vector instruction
<doug16k> you might cause more cache misses than the call overhead saving
<doug16k> the real vector instructions will be compact
<clever> gist updated
<doug16k> they should be inline
<clever> i think its compiling down to just 2 vector opcodes right now, plus the normal prologue/epilog
<bslsk05> ​www.jaist.ac.jp: MOVDQA--Move Aligned Double Quadword
<clever> movdqa 0xfb4(%rip),%xmm0
<clever> doug16k: does the intel vs at&t plague extend even to the mmx opcodes???
<doug16k> no that is the same as normal
<doug16k> intel has movdqa
<clever> according to the docs, the 2nd argument is the source
<doug16k> usually you see movaps - single precision one
<clever> but looking at the asm, the first thing it does, is save an mmx reg (it never write to), to ram!
<doug16k> dqa is integer one
<geist> clever: of course. itd be even weirder if they flipped styles for new instructions
<geist> gotta at least be consistent
<doug16k> the docs are intel syntax. the second argument is normally the source
<clever> geist: so is the above a reg->ram or ram->reg operation? when looking at `objdump -d` with all defaults
<doug16k> last arg is destination even for avx
<geist> left to right in at&t, right to left intel
<geist> though i think it's a bit more subtle
<geist> more like A, B, C at&t
<geist> C, A, B intel
<clever> ah, so that plague does continue, and objdump doesnt agree with the intel docs i linked
<geist> since most opcodes on x86 are 2 address, it's less obvious
<doug16k> tell objdump to use intel syntax if that's a problem
<clever> doug16k: yeah, checking the --help now
<geist> otherwise just deal with it
<clever> -M intel-mnemonic now says `movdqa xmm0,XMMWORD PTR [rip+0xfb4]`
<doug16k> of course
<clever> now it agrees with the docs, and is a bit more verbose
<doug16k> says exact same thing as the at&t you said earlier
<doug16k> just way longer
<clever> yep
<clever> movups XMMWORD PTR [rax],xmm0
<doug16k> the assembler doesn't need your help knowing it is an xmmword ptr
<clever> my first guess, looking at this asm, is that its just a 16 byte ram->reg->ram copy?
<doug16k> where's on earth k thanks bye
<moon-child> ndisasm uses 'oword' rather than 'XMMWORD PTR'
<moon-child> which is a bit nicer
<doug16k> movaps loads 128 bits yes
<doug16k> so does movdqa
<doug16k> loads or stores
<clever> ah, so its more of a dumb uint128_t based mov, and how you interpret those bits, depends on what opcode you use later
<doug16k> must be aligned. there are "u" variants that work unaligned
<doug16k> right
<doug16k> it's a bag of bits in movaps movdqa world
<geist> i do wonder if on modern x86s the aligned/unaligned versions make any difference
<doug16k> no difference
<geist> is there an implied weak memory model on the unaligned stuff maybe?
<doug16k> there's a cpuid bit to see if u is worse
<geist> ah
<doug16k> newer will handle aligned and unaligned the same. older are slower on unaligned
<moon-child> doug16k: iirc movaps and movdqa function as a hint of some sort
<doug16k> tells it which domain it is
<moon-child> like if you're going to actually do floating ops you should use the float instruction, or int ops you should use the int instruction
<moon-child> but if you're just shuffling memory doesn't matter
<doug16k> there is a 1 cycle penalty when transition between integer and float domain
<moon-child> right
<moon-child> oh so in that case it's probably better to prefer the *ps instructions to the others?
<moon-child> because somebody else was most likely already using the simd regs for fp math, so you don't want to transition?
<doug16k> what matters is what domain the upcoming instruction that uses the value is
SlyFawkes has joined #osdev
<doug16k> if you movaps then srli then it's not good
<moon-child> uses the value, but doesn't matter if it writes to it?
<doug16k> if you movdqa then addps it is not good
<moon-child> like if I movdqa xmm3, whatever; addpd xmm3, xmm2, xmm1 does that pay the penalty?
<clever> now to implement a vst function, and test dumping the matrix contents...
<doug16k> yes
<moon-child> so it is better to use the fp instructions for shuffling memory, assuming the fp instructions are generally more common
<doug16k> oh it breaks the dependency though
<doug16k> what matters is the latency between two things in a dependency chain
<doug16k> wrong domain = 1 extra cycle of latency
<doug16k> you can interleave the two domains no problem in instruction scheduling
<doug16k> what matters is what domain that register is
<geist> honestly still surprised with ERMS that there's still some ability to move data faster with AVX in some situations, or so i have heard
<geist> seems like a proper erms internal implementation is just directly fed into the load/store unit
<moon-child> there's a startup cost I think
<moon-child> doug16k: hmm, can simd registers be renamed?
<geist> though i guess it still has to mess with the corresponding integer registers and whatnot because it can still be interrupted
<doug16k> yes
<doug16k> massively renamed
<moon-child> then if you're just writing to the register in the wrong domain, couldn't you rename to avoid the penalty?
<clever> template <typename T> static inline void matrix_write(int x, int y, T value)
<clever> how would i help gcc infer the return type? error: there are no arguments to 'matrix_read' that depend on a template parameter, so a declaration of 'matrix_read' must be available
<clever> oops, for template <typename T> static inline T matrix_write(int x, int y) {
<geist> yah that can't be implicitly deduced because return type
<geist> though some of the newer bits with auto maybe can?
<geist> i never know precisely how you can use auto in function declarations, so i usually just try and see and sometimes it surprises you
<clever> how would i specify it?, since i have the same T param one function call up
<doug16k> you can force it like matrix_write<decltype(some_expression)>(...
<doug16k> 2+2 would make int
<doug16k> *bad_things would be bad_type_t
<doug16k> reference?
<doug16k> or const reference if bad_things is a pointer to const
<moon-child> clever: I don't see how you could infer the return type. But you can make your call be matrix_write<T>(whatever, whatever>
<doug16k> that's why you see std::remove_reference<T>::type
<clever> dst[(stride*r) + i] = matrix_read<decltype(dst[0])>(x,y+i); was accepted by the compiler
<doug16k> you got it
<clever> ah, matrix_read<T> is also accepted
<clever> it wasnt before, due to typos
<doug16k> of course if you have it already, use it :)
<clever> i prefer <T> over dst[0], i want to give it a type, not a random element from an array of that type
<doug16k> if it were auto and your code really didn't know, you could use my decltype trick to escape it
<clever> vpu-support> include/vpu-support-purec.h:30:27: error: cannot bind non-const lvalue reference of type 'unsigned char&' to an rvalue of type 'unsigned char'
<clever> vpu-support> 30 | return matrix[x][y] | (matrix[x][y+16] << 8);
<doug16k> if you find yourself not having a clue what type, but you have an expression that is that type, you can use decltype
<geist> yah decltype is pretty much always 'the type of whatever this is' iirc
<doug16k> that's why I mentioned the remove_reference thing
<clever> yeah, i think i see why that worked now, its just returning the T type back out of T*dst
<clever> oh, i think i kinda see what the above problem is now
iorem has joined #osdev
<clever> the 16bit read, is being compiled, when T is 8bit
<clever> and const-expr hasnt eliminated that branch yet
<clever> the above, is under a case 2, of switch (sizeof(T)) {
<clever> i need a constexpr flag, to make it entirely abort the other case sections?
<klysm> movdqa xmm0,oword [rel 0xfbc]
<doug16k> use constexpr if you can yeah
<doug16k> it authorizes open season taking all assumptions about it
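A sketch of the if-constexpr version doug16k is pointing at, modeled on the matrix_read in clever's gist (the matrix array and widths are assumptions taken from the log). The branch whose condition is false is discarded rather than instantiated, so whatever it returns never has to be converted to a T it wasn't written for.

    #include <cstdint>

    extern uint8_t matrix[64][64];   // backing store, as in clever's gist

    template <typename T>
    static inline T matrix_read(int x, int y)
    {
        if constexpr (sizeof(T) == 1) {
            return matrix[x][y];
        } else if constexpr (sizeof(T) == 2) {
            // discarded (never instantiated) when sizeof(T) != 2
            return matrix[x][y] | (matrix[x][y + 16] << 8);
        } else {
            static_assert(sizeof(T) <= 2, "width not handled in this sketch");
        }
    }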
<heat> hmmm
<heat> why would -O2 generate bad debug info?
<doug16k> each newer version of C++ supports doing more impressive things in constexpr
<heat> -O0 works fine
<clever> `constexpr int s = sizeof(T); switch (s) {` didnt help
<clever> same error as i pasted above
<doug16k> heat, what does this say: your-cross-objdump --dwarf your-thing 2>&1 >/dev/null | wc -l
<heat> doug16k, O0 or with opt?
<doug16k> when bad debug info
<doug16k> that asks for all complaints about dwarf data to be sent to wc
<heat> 1
<heat> "x86_64-onyx-objdump: Warning: Location lists in .debug_loc section start at 0x180"
<doug16k> mine is 0
<heat> this problem only arises with clang
<doug16k> on system objdump, loads of warnings
<heat> gcc works okay
<doug16k> ah
<heat> I don't get a "oh this was optimised out and whatever", I just get garbage
<heat> the stack trace is accurate, the other debug info isn't
<doug16k> what if you use -g3 instead of -g
<heat> when I switched on O0, I get good values
<doug16k> or -ggdb
<heat> hold on
<doug16k> I have coerced screwy debug to work by playing with -g
<heat> nope
<heat> -g3 gives me garbage still
<doug16k> what if you turn off the fancy value tracking stuff
<doug16k> so it doesn't try so hard to always see register variables right
<heat> how do I do that?
<doug16k> -fno-var-tracking -fno-var-tracking-assignments
<clever> yeah, i'm just totally stuck
<doug16k> turns off heroic attempts to track register variables
<clever> no matter what i do, gcc refuses to let me do a <<8 with an uint8_t return type
<doug16k> clever, anything << 8 is zero if uint8_t
<doug16k> shift in 8 zeros from the right
<clever> doug16k: it was the decltype!
<clever> dst[(stride*r) + i] = matrix_read<T>(x,y+i); compiled
<heat> nope
<clever> dst[(stride*r) + i] = matrix_read<decltype(dst[0])>(x,y+i); failed
<clever> doug16k: and that <<8, was in a `if (sizeof(T) == 2)` block, so it would never run for uint8_t
<clever> but with decltype, it was being fussy
<doug16k> that's probably why I do it with overloads
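For reference, the decltype trap clever hit, sketched: dst[0] is an lvalue, so decltype(dst[0]) is T& rather than T, and matrix_read ended up trying to return a reference. remove_reference, which doug16k brought up earlier, strips that back off. The surrounding function is invented and it assumes the matrix_read template from the earlier sketch; only the assignment line mirrors the gist.

    #include <type_traits>

    template <typename T>
    static inline void copy_row_out(T *dst, int stride, int x, int y, int r, int i)
    {
        // decltype(dst[0]) is T&; remove_reference gets back to plain T
        using elem_t = typename std::remove_reference<decltype(dst[0])>::type;
        dst[(stride * r) + i] = matrix_read<elem_t>(x, y + i);
    }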