#osdev on 2022-04-22 — irc logs at libera.irclog.whitequark.org

2021-05-23 01:57 klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books

00:00 <moon-child> they do fuse things, but riscv wants much bigger fusion windows

00:00 * geist nods

00:00 <geist> microblaze deals with the delay slot in a silly way: two jmp instructions, one with and one without

00:01 <geist> pick which one. the one without a delay slot takes an extra cycle. have a nice day.

00:01 skipwich has quit [Remote host closed the connection]

00:01 skipwich has joined #osdev

00:02 thinkpol has quit [Remote host closed the connection]

00:02 <moon-child> I think it was even proposed that jump over a mov should get fused and turned into a cmov

00:02 <moon-child> or something like that

00:02 <heat> isn't fuse everything the main prerequisite for fast risc?

00:03 <heat> there's no way a load-do_op-store arch could be faster than add $10, (mem) arch otherwise

00:03 thinkpol has joined #osdev

00:03 <heat> also the weird hoops you need to jump through to load immediates into addresses

00:04 <moon-child> isn't there?

00:04 <geist> yes but it's basied on the idea that the add $10, (mem) is really just a load-do_op-store inside anyway

00:04 <geist> the fusing thing is more of a pragmatic approach to not quite internally riscifying *everything* that modern designs do

00:04 <moon-child> I mean, bottlenecks are branches and cache. If you can stuff in better branch predictors and more cache in exchange for simpler design, you win

00:05 <geist> yah and i *do* think the 16bit instruction stuff in riscv is a real win

00:05 <moon-child> I don't think this has paid off, but that's the idea

00:05 <geist> it seems to be well utilized by the compiler and keeps the instruction size near x86 size

00:05 <heat> do you need to opt in?

00:05 <moon-child> 16bit instructions are neat, but intel/amd demonstrate that you _can_ do full-on variable width with ok performance

00:05 <moon-child> even if it is a pita

00:05 <heat> i don't think i've seen those instructions

00:06 <geist> you probably have, it's just transparent. disassemble something you compiled and you may have seen it

00:06 <heat> the 'c' extension right?

00:06 <geist> the assembler is even allowed to substitute .c versions

00:06 <geist> moon-child: sure but the 16 bit stuff is a good compromise. it gets pretty close to the same density and is still much easier to decode

00:07 <moon-child> yes

00:07 <geist> i dont think an x86 machine has yet gotten to the same level of parallel decode as has yet happened on risc machines

00:07 <geist> POWER, Apple M1, etc are 8+ at this point

00:07 <geist> and that's a full cache line at a time

00:08 <geist> i think the new tiger lake or zen3 is 5 or maybe 6 now?

00:08 <moon-child> well, they do have uop cache. Though I think agner said refilling uop cache was a bottleneck on some core

00:08 <geist> yah

00:08 <geist> the earlier zens i know were i cache bottlenecked i think. could only pull in 16 bytes at a time, so you could get a 5 way decode but only for very simple instructions

00:08 <heat> geist, are the 16-bit instructions suffixed or something?

00:09 <geist> heat: yep. first bits of the instruction (low bits) basically tell what the instruction size is

00:09 <geist> 00, 01, 10 are 16 bit, 011 are 32 bit, 0111 are 48 bit, 01111 are 64bit and so on

00:09 <geist> i dont think there are any 48 and 64 bit instructions yet defined

00:09 <moon-child> huh I didn't know they had left space for bigger instructions

00:10 <geist> yah it *does* mean you burn 3 bits for a 32bit instruction, as a compromise

00:10 <geist> and it's clever that it burns more bits in larger instructions

00:10 <heat> llvm-objdump -d boot/vmonyx | grep '\.c.*' <-- should work right?

00:10 <geist> but having this mixed in the stream is far more useful than the arm32/thumb2 switch-on-branch scheme

00:10 <heat> cuz i don't see any compressed instructions in my binary

00:11 <geist> can you pastebin it?

00:11 <geist> (i'm sure you're right, just curious)

00:11 <heat> the whole binary? it's 1MB xD

00:11 <geist> hah okay.

00:11 <moon-child> hmm, what if you made an out-of-band rle scheme for instruction lengths?

00:11 <geist> well, how are you compiling it?

00:11 <heat> geist, see "Onyx boot image (riscv64) (llvm)" https://github.com/heatd/Onyx/actions/runs/2204888603

00:11 <bslsk05> github.com: ide: Fix bug in DMA code · heatd/Onyx@1a9e494 · GitHub

00:11 <moon-child> '5 32-bit instructions; 2 16-bit instructions; 1 64-bit instruction; ...'

00:11 <heat> march=rv64imac

00:11 <moon-child> bet that would save some space

00:12 <moon-child> jump would have to take two addresses, though, or have some out-of-band translation

00:12 <heat> and yes that's a zip of a .tar.zst of a directory

00:12 <heat> that's the github actions life

00:12 <heat> everything is a zip

00:12 <geist> oh you're not specifying anything at all on the command line

00:13 <geist> so it's whatever default the compiler is

00:13 <heat> hm?

00:13 <geist> well, i didn't see it looking at a gcc compile in your risc thing

00:13 <geist> didn't see a -march or whatnot

00:14 <heat> https://github.com/heatd/Onyx/blob/master/kernel/arch/riscv64/make.config#L2

00:14 <bslsk05> github.com: Onyx/make.config at master · heatd/Onyx · GitHub

00:14 <geist> already closed the window though (see previous discussion about me always too quickly closing windows)

00:14 <geist> then it should. see the 'c' extension

00:14 <geist> but i looked at your build and didn't see any of those in the compile line

00:14 <geist> maybe the build system isn't working right?

00:14 <geist> or it was building user space differently?

00:15 <heat> it's building userspace differently

00:15 <geist> well, i dunno. you should figure it out

00:15 <heat> the kernel stuff has [CC] or [CXX]

00:15 <geist> also ifyou're using clang it may not support 'c' extension

00:15 <heat> you were probably seeing musl's compiles

00:15 <geist> i think the RV support in clang is pretty far behind

00:15 <geist> okay. anyway, it's your problem to debug :)

00:16 <geist> it *should* be basically totally transparent

00:16 <heat> lets see gcc

00:16 <geist> and a nice code size decrease

00:17 <heat> well, if that regex is correct then it's not using compressed instructions

00:17 <geist> basically except for a few academic or hobby riscv cores everything seems to support the compressed extension and it's well designed such that it shouldn't be any slower

00:17 gog has joined #osdev

00:17 <geist> perhaps the disassembler doesn't put it there. look at it with a text editor and just look at the size of the opcode

00:18 <heat> ah yes I think I'm seeing two byte instructions

00:18 <heat> if those are all compressed then yeah, it's using them

00:19 <geist> probably just llvm-objdump which i've found to be far behind binutils's objdump in the disassembly department

00:20 <heat> 9512 add a0,a0,tp <-- is this compressed?

00:20 <geist> yup

00:20 <heat> ok wonderful

00:20 <heat> you can't see the c. in the disassembler

00:20 <heat> both llvm and gcc are using it

00:20 <geist> the disassembler?

00:20 <heat> yup

00:20 <geist> how are they using the disassembler?

00:20 <geist> you mean the assembler?

00:20 <heat> llvm-objdump/riscv64-onyx-objdump

00:21 <heat> I was looking for compressed instructions in the executable

00:21 <geist> yes. and it didn't show it. i know.

00:21 <geist> the *assembler* should know when to emit compressed vs non based on some rules

00:21 <geist> but then clang has it's own built in assembler, and it should know to

00:21 <geist> and it appears to, unless you've explicitly told it not to use its builtin assembler

00:22 <heat> yes it's working

00:22 <geist> ideally the compiler picks instructions based on whether or not it knows it'll be compressed or not

00:22 <heat> i was just looking for the wrong thing

00:22 <heat> plenty of 16-bit instructions

00:22 <geist> and favors those, etc. it means there are at least two clases of registers, etc

00:22 <geist> since the compressed instructions can only access 8 of the 32 registers

00:23 <geist> it's the main reason the way the a/s/t instructions are split up in weird ways on riscv, they're packed in such that there's a medly of them in the 8 registers compressed gets, which iirc is r8-r15

00:23 <geist> a/s/t registers, not instructions

00:23 <heat> ah geez after 7 years my kernel is still smaller than the kernel-resident ACPICA

00:23 <heat> in LOC

00:24 <heat> we'll see in 2029

00:24 <geist> https://en.wikichip.org/wiki/risc-v/registers has a nice table that also shows the compressed numbering

00:24 <bslsk05> en.wikichip.org: Registers - RISC-V - WikiChip

00:26 <heat> gdb and objdump are insconsistent in the way they show riscv disasm

00:26 <geist> also why sometimes you see thec ompiler really try to stick with a0-a5 when doing calculations whatnot, when it really could expand out and use t0-t6 and whatnot

00:26 <heat> inconsistent*

00:27 <geist> depends on which objdump you're talking about

00:27 <heat> binutils

00:27 <geist> binutils objdump i'd expect to be much more consistent with gdb

00:27 <geist> ah

00:27 <heat> in this case llvm is more consistent with gdb than binutils

00:28 <geist> ah

00:28 <geist> might be some switches you can pass it

00:28 <heat> heh all my context switching instructions are compressed

00:28 <heat> neat

00:29 <heat> (the register load/store ones)

00:29 <geist> at least a lot of them i assume, the ones that line up with the compressed registers

00:30 nyah has quit [Ping timeout: 276 seconds]

00:30 <geist> the neat key thing they did in riscv that's nice is every compressed instruction is hard defined as having a 32bit equivalent

00:30 <heat> https://gist.github.com/heatd/8081c329bffe664a2bd4feb07866f23c

00:30 <bslsk05> gist.github.com: riscool · GitHub

00:30 <heat> all of them

00:30 <nomagno> How do they manage to compress RISC-V into 2 bytes?

00:31 <nomagno> Is that one byte for opcode and another for data, or what?

00:31 <heat> no

00:31 <geist> heat: oh huh, there must be a 5 bit register form for the load/store instructions

00:31 <geist> been a while since i looked at it

00:32 <geist> nomagno: no it's pretty tightly packed. usually the first thing that goes is 3 address instructions

00:32 <geist> ie, no more `add a, b, c` because yo ucan't fit 3 operands in

00:32 <geist> and then the second thing that goes is access to all 32 registers, since that's 5 bits a register (two operands would use up 10 of the 16 bits)

00:32 <geist> so they usually can only encode 3 bits, (r8-r15)

00:32 <geist> oh wait, scratch the 3 operand stuff

00:33 <geist> they can, because they use 3 bits

00:33 <geist> anyway it's well specced out. but basically you cant do as much with the compressed ISA, but it always matches 1:1 with an uncompressed i nstruction

00:34 <nomagno> So essentially the assembler figures out how to translate your code into an insane register pressure, 100% PC-relative-indexed version?

00:34 <geist> so if you can as a compiler favor a subset of the ISA the assembler is free to substitute a smaller form

00:34 <geist> nomagno: assembler can only really replace one instruction at a time. it's not really that complicated

00:34 <geist> basically if the compiler generates an instruction that has a compressed version, the assembler uses that instead

00:35 <nomagno> geist: you'd be surprised what modern assemblers can optimize away.

00:35 <geist> sure, but in thei case they do not.

00:35 <nomagno> Fair

00:35 <geist> it's a simple substitution

00:35 <nomagno> Yeah yeah, it's on the assembly writer, not on the assembler

00:36 <jimbzy> Yeah, that's the same thing my calculus professor said, geist. "It's simple substitution..."

00:36 <geist> heh

00:36 <geist> nomagno: yah that's right the compiler is aware of what will and will not go to compressed and chooses accordingly

00:37 <heat> this is why riscv is a fake arch

00:37 <heat> no 15-byte nops? really?

00:38 <geist> heat: aaaah yes i just double checked. it's *stack pointer* based load/stores

00:38 <geist> those have a special compressed form that omits the base register but has 5 bits of target register

00:39 <heat> unfortunately there's no load/store compressed form for tp

00:39 <geist> right, because tp is outside of the 8

00:39 <geist> table 16.1 in the spec is great, it is a list of all the compressed forms

00:41 <geist> https://usercontent.irccloud-cdn.com/file/3bCJAUVi/riscv-compressed.png

00:41 <geist> the compressed instructions are pretty irregular, but that's what you get when you compress things to less bits

00:41 sonny has quit [Ping timeout: 252 seconds]

00:42 <geist> and also yeah. everything is at best 2 registers

00:42 <geist> so you could only substitute a 3 register add if it were something like `add s0, s0, a0`

00:43 <heat> i find opcodes really confusing still

00:43 <heat> probably because I never really look at them

00:43 <geist> this is good, it means you're not a computer

00:44 <heat> beep boop send me your credit card information beep boop

00:44 <geist> https://usercontent.irccloud-cdn.com/file/VqguMHU2/riscv-instructions.png

00:45 <heat> dear humans what is your favourite prefix

00:45 <geist> the 32bit instructions are somewhat more regular. the parts of riscv that are a bit unclean to me but there's a reason for it is the way immediates are spread around

00:45 <geist> also i cant not think of the game when i see R-Type

00:47 <heat> oh wow GAS macros can start with a .

00:49 <heat> geist, do you know off the top of your head how large an offset can be in riscv?

00:49 <heat> 12 bits?

00:50 sonny has joined #osdev

00:50 <geist> 12 bits i think

00:50 <heat> that sounds reasonable

00:50 <geist> usually the I type instruction in the above image

00:51 <geist> yah it actually slots in with the way the address computation stuff works

00:51 <heat> i really wanted to find a way to get a nice instruction sequence for tp accesses

00:51 <heat> unfortunately inline assembly doesn't want to play along

00:51 <geist> also keep in mind, and this is a little funny with riscv, all immediates are signed

00:52 <geist> so that's 12 bit signed. ie +/- 2048ish

00:53 <sonny> interesting

00:53 <sonny> any rationale?

00:53 <geist> there's some reason for it, like *all* immediate computation in the cpu itself goes through the same logic

00:53 <sonny> ohhhh

00:53 <sonny> that makes sense

00:53 <geist> also the way the immediates are split up in the above table is arranged such that bit 31 in the instruction is always the sign extend bit

00:54 <geist> ie, the high bit of the encoded immediate always lies in bit 31

00:54 <heat> all my percpu accesses in riscv have 3 instructions and this doesn't sit right with me

00:54 <geist> to make the muxing logic a bit more consistent apparently

00:55 <heat> ahh what my add_per_cpu is even worse

00:56 <heat> i should've optimised this

00:56 <geist> anyway, i do recommend reading through the spec, and there's i think an online version of the 'riscv primer' i believe is the name

00:56 Likorn has quit [Quit: WeeChat 3.4.1]

00:56 <heat> i just did get_per_cpu() and then write_per_cpu(value + n)

00:56 <geist> it's a short read, goes through the architecture and has a lot of asides about why this or that decision was made

00:56 <heat> i've read the spec but mostly skipped through the instructions

00:57 <geist> http://riscvbook.com/ is the book

00:57 <bslsk05> riscvbook.com: The RISC-V Reader: An Open Architecture Atlas

00:57 <geist> i think you can get a free version, but it's a nice read

00:57 <heat> i think the only thing I don't have yet is SIMD and quad extension support

00:57 <geist> http://riscvbook.com/greencard-20181213.pdf looks like a good thing to have around

00:58 <heat> oh nice find!

00:58 <sonny> patterson at it again :-)

00:58 <heat> geist, do you think there's any reasoning behind simd != floating point or is it standard in most non-x86 archs?

00:59 <geist> probably just so you can include a smaller version

00:59 <geist> ie make a single precision cpu, double + single, or vector + double + single

00:59 <heat> you also have quad

01:00 <geist> also i think the vector stuff has just now been properly ratified. i'm not sure the vector bits on that greencard pdf are up to date since it seems to be from late 2018

01:00 <geist> aso i'm waiting for dh` to get on my case for spewing nonsense

01:03 <heat> they got clz!

01:04 <heat> also max and min instructions which is pretty interesting

01:05 <moon-child> simd != floating point on x86 too

01:05 <moon-child> also x86 has simd max and min, and clz in avx512

01:06 <geist> yeah atomic min/max too. i think ARM64 just added those too which i find interesting. not a thing i've needed but i haven't built any algorithms behind it

01:06 <heat> if you're using x87 fp you should stop immediately :P

01:06 <moon-child> geist: yeah, atomic or/and seems more useful (assuming you have add already)

01:06 <heat> the p extension seems to only have SIMD for 64-bit (even for rv32)

01:07 <moon-child> (amusingly, min/max _are_ or/and on booleans)

01:07 <geist> heat: were are you seeing the clz stuff? is it in the 'b' extension?

01:07 <geist> not sure what the status of that extension is to be honest

01:07 <heat> https://raw.githubusercontent.com/riscv/riscv-p-spec/master/P-ext-proposal.pdf

01:07 <geist> aah yeah

01:08 <heat> chapter 6

01:09 <heat> clz and max/min are also in the b extension now that i'm seeing

01:09 jhagborg has joined #osdev

01:10 <heat> actually chapter 4 lists the duplicated instructions

01:11 <heat> also, subextensions????????

01:12 <geist> what i dont know is how far along a proposal has to be to have gotten this far

01:13 <geist> ie, is the fact that it exists in the official repo mean it's in some sort of final ratification or is it still just an idea someone is tossing around

01:14 <heat> i know that linux doesn't merge riscv code for extensions that haven't been ratified yet

01:14 <klys> subextensions exist in the repo? this question is probably simpler than it looks

01:16 heat has quit [Quit: Leaving]

01:17 <klys> what's new

01:17 <geist> aww klys you ran heat off

01:17 <geist> ice cold

01:17 <klys> sorry about that, I don't have many predilections about temperaments

01:17 <geist> klys is entropy in action

01:18 <geist> anyway not much, drinking some coffee, should go take a wlak

01:18 <klys> well I've been working

01:18 <klys> on something not technical

01:18 <geist> earning some cheddar?

01:18 <klys> yeah I have to decide what to put it on now

01:19 <klys> anyways just a bit curious

01:19 <klys> and I now own a fun domain just working on database frontends and javascript too

01:19 <geist> noice

01:19 <klys> and I might buy an epyc

01:20 <klys> with my huge tax return

01:21 <geist> i was thinking of trying to get a real(ish) server board for my ryzen

01:21 <geist> not sure its worth it though. asrack has something kinda like this

01:22 <klys> socket?

01:22 <geist> AM4

01:22 <klys> right I was looking at SP3

01:23 <klys> so eh, are there new changes to lk?

01:23 <geist> see, ported to 68k. ported to a board i got

01:24 <geist> added pluggable network support, tcp outgoing sockets, a cheesy IRC client (To talk to sortix)

01:24 <geist> working on spiffying up the FAT driver to properly support RW

01:24 <klys> wew fun I hope sortie is enjoying this

01:24 <geist> added e1000 driver and the start of an AHCI driver a while back

01:24 <geist> need to finish up the AHCI driver

01:24 <klys> neat

01:25 <klys> ahci is for sata correct?

01:25 <geist> yes

01:25 <klys> so you've been doing cluster math? how's that coming for ye?

01:26 <klys> with the FAT

01:26 FragmentedCurve has left #osdev [#osdev]

01:27 <klys> and I guess the board you got is that m68010 board I saw last week

01:28 <geist> heh cluster math

01:29 <geist> found a better version of the FAT spec from MSFT laterthan the 1.03 version where the writer is being an ass the whole time

01:29 <geist> someone cleaned it up to include it in the SD card spec i think

01:29 <klys> link

01:29 <geist> made it not so condescending

01:29 <geist> oh i forget where

01:29 <klys> okok

01:29 <geist> i dont save links to things, i make my own copy so i never have to

01:30 <geist> i think its this one: https://academy.cba.mit.edu/classes/networking_communications/SD/FAT.pdf

01:30 <klys> I've had a lot of new browser fun since purchasing this 32GB-RAM dell in november

01:30 <geist> ie, the first hit on google

01:31 <geist> i had an older version of that which is written in this overtly condescending way

01:31 <klys> oh, not too long either

01:31 <geist> like 'all you idiots that dont understand FAT here's how it is and listen up and quit fucking it up'

01:31 <klys> right hehe

01:32 <geist> but yeah i was just reading it closely last night and i think i finally understand cluster math. what a PITA

01:32 <klys> I'd been pushing for changes to the qemu docs to include lcyls= lheads= lsecs=

01:32 <klys> even though that's a seabios feature

01:32 <klys> those are still qemu options

01:32 sonny has quit [Ping timeout: 252 seconds]

01:32 <klys> and qemu should document them

01:33 <klys> you must have an equation then involving "reserved sectors"

01:33 <geist> sure

01:34 <geist> the spec actually describes it pretty precisely, it's just non intuitive

01:34 <klys> yeah on page 29/37

01:34 <geist> as in cluster 2 is actually the first data block after the reserved sectors after the fats after the root dir (if on fat 12/16)

01:35 <geist> and the first 2 clusters of the FAT are used for other purposes, but instead of burning two clusters of allocation, they offset everything from that

01:35 <geist> *eyes roll*

01:35 <geist> like you can just see how it's hack upon hack

01:35 <klys> so do the reserved sectors come after the FAT table?

01:35 <geist> and every step of the way someone could have designed something a little cleaner

01:35 <geist> before

01:35 <klys> I was somehow under the impression it was before, yeah

01:36 <geist> theyr'e basically there so you can pad out the first FAT such that yuor first cluster arrives on a proper boundary

01:36 <klys> ah cool

01:36 <geist> FAT12 stuff tends to pack it all in with no wasted space, whcih makes sense for floppy disks, but then i think you'd never use anything but 512 byte clustesr on a floppy disk

01:37 <geist> so it doesn't matter if the clusters are unaligned

01:37 <klys> so that kind of math exists in the format tool, to cluster align the FAT

01:37 <klys> does that sound right?

01:37 <geist> right

01:38 <klys> okay and the FAT32 is different then too?

01:38 <geist> also since the FAT length in sectors is a field in the BPB you can pad out your FATs such that they align if you want

01:38 <geist> no. FAT32 is fundamentally the same as FAT16 in layout *except* the root dir is a regular file now, instead of being a fixed length right after the fat

01:38 <geist> so that means cluster 2 (the 1st cluster) starts immediately after the fats since there is no reserved root dir space

01:39 <klys> ah, so the root takes FAT entries, and \ is cluster two.

01:39 <klys> er

01:39 <geist> yah, though it's actually specced such that it doesn't have to be cluster 2

01:39 <geist> there's a field that says what the starting cluster is for the root dir

01:39 <klys> oh because that field exists

01:40 <klys> except you add that field to the reserved sectors

01:40 <geist> but i think the spec says it should be 2, unless that's a bad sector, in which case pick the first available cluster that doesn't have a bad sector in it

01:40 <geist> no. you dont. reserved sectors are before the FAT(s) and in sectors

01:40 <klys> oh then I see thanks

01:41 <geist> cluster2 is the first available Data Cluster, which starts after the last FAT and after the root dir sectors

01:41 <geist> but in the case of FAT32 the root dir sectors == 0

01:41 <geist> anyway it's silly, but onc yeou grok it it makes sense. the pdf from above has a decent graphic pretty early on

01:41 <klys> because cluster two is at location zero

01:42 <geist> cluster two is at data block 0. and data block zero is defined as ...

01:42 <klys> that is, the math forces cluster two to the start, block zero

01:42 <geist> data block zero, to be precise. not block zero of the device/partition

01:42 <klys> yeah

01:43 <geist> iirc ext* does a more regular job of this and defines the first block to be block 0 of the device, and simply marks those things as occupied in the bitmap

01:43 <geist> since that's where the bootsector/etc is

01:43 <geist> so it's at least consistently numbered from the 0th byte of the volume itself

01:44 <geist> which is i think how most sane things do it

01:44 <klys> well is your code looking fairly stable from learning this?

01:44 <geist> i'm starting with some existing, mostly broken code, so it's not yet morphed into a good place

01:44 <geist> but this part of the code was mostly okay

01:45 <geist> but i'm basically rewriting it more or less completely as i go

01:45 <klys> okay this is enlightening, do you often start with code that has become messy and forgotten?

01:45 gog has quit [Ping timeout: 246 seconds]

01:46 <geist> not really, but in this case it's pretty clean and forgotten code, just incomplete

01:46 <klys> ah is it thinfs?

01:46 <geist> so it was a reasonable starting point, since it basically works, just needs to be redone

01:46 <geist> hmm? no. it's fat.

01:46 <klys> I remember thinfs

01:46 <klys> it didn't work

01:46 <geist> https://github.com/littlekernel/lk/tree/master/lib/fs/fat32 just some implementation someone tossed in LK years ago

01:47 <bslsk05> github.com: lk/lib/fs/fat32 at master · littlekernel/lk · GitHub

01:47 <geist> it basically worked for fat32 for simple 'read file all at once'

01:47 xenos1984 has quit [Read error: Connection reset by peer]

01:47 <geist> but i've started to redo it, so it'll probably morph into something else by the time i'm done

01:47 <geist> i haven't pushed my changes into it

01:47 sonny has joined #osdev

01:48 mahmutov has joined #osdev

01:51 <klys> oh yeah that source tree looks trivial

01:53 <klys> I think some cameras hardwire the location of FAT data structures

01:56 <klys> all the math was right here https://github.com/littlekernel/lk/blob/master/lib/fs/fat32/fat.cpp#L145

01:56 <bslsk05> github.com: lk/fat.cpp at master · littlekernel/lk · GitHub

01:58 sonny has quit [Ping timeout: 252 seconds]

01:58 <klys> mounting FAT without running a r-o dosfsck on it could potentially result in corruption in some cases

01:59 <klys> or at least having some sanity checks (eg. the FATs match)

02:02 sonny has joined #osdev

02:04 xenos1984 has joined #osdev

02:11 sonny has quit [Quit: Client closed]

02:17 mahmutov has quit [Ping timeout: 246 seconds]

02:32 <wxwisiasdf> osdev unit testing? :)

02:33 <wxwisiasdf> is there any os that has unit testing?

02:33 <wxwisiasdf> like i don't know, `kdebug` for a micro and then like some kind of stress tester hypervisualizer that brings drivers to their limits or so

02:36 <Mutabah> I use the rust unit testing framework for a little bit of testing

02:52 <zid> turns out we have really good integration testing

02:52 <zid> you boot it then run doom

02:52 <zid> and wait for it to crash

03:00 jhagborg has quit [Remote host closed the connection]

03:00 jhagborg has joined #osdev

03:10 Ali_A has joined #osdev

03:20 <vdamewood> Does an x86(_64) CPU need any features to support UEFI, or is that all in the startup code in the firmware?

03:21 <moon-child> I expect the latter

03:23 <vdamewood> I as well.

03:27 Ali_A has quit [Quit: Connection closed]

03:28 <geist> yah nothing other than whats in teh base x86_64 feature set

03:28 <geist> like paging, 64bit, etc

03:31 <vdamewood> So, no magic instructions found only in extension foo, or anything like that?

03:33 <geist> no

03:34 <geist> wouldn't make any sense for it to do so

03:34 <geist> i suppose it could test for an decide ot use 1GB pages, for example, but it could always test for it

03:34 Ali_A has joined #osdev

03:34 <geist> OTOH if it has to test for it it implies it has the fallback code in the implementation, so since it's not intended to be performant, may as well just assume the feature isn't present

03:35 <geist> and things llike AVX512 or whatot it has no need for

03:35 heat has joined #osdev

03:35 <heat> vdamewood, no.

03:35 <heat> UEFI firmware boots the same as BIOS firmware

03:36 <heat> same address, same everything

03:36 <heat> it's everything after switching to protected mode (SEC phase in UEFI) that's different

03:37 <clever> in the case of coreboot, you have a seperate "init the system" and payload

03:37 <heat> yeah

03:37 <clever> the payload can be seabios (legacy bios api), tianocore (uefi api), or just raw grub!

03:37 <heat> but coreboot's UEFI payload is way different

03:38 <heat> after SEC nothing's initialised except for really basic stuff so you're in 32-bit mode and possibly TDX

03:38 <heat> it ends by finding a firmware volume that has the pre-efi initialisation (PEI phase) code in SPI flash

03:39 <heat> the SPI flash is memory mapped of course

03:39 <heat> you don't even have RAM yet, all cache as ram

03:40 <geist> none of this means you couldn't compile an implementaiton of UEFI to require some newer cpu features if you wanted to, but then that particular binary wont run on older stuff

03:40 <geist> but that may be fine. however it's not baked into the API or whatnot

03:40 jhagborg has quit [Remote host closed the connection]

03:40 <heat> yup

03:40 <clever> i read something about how 17h? family amd chips use a PSP (like intel me) to bring ram up

03:41 jhagborg has joined #osdev

03:41 sonny has joined #osdev

03:41 <clever> and that copies the "bios" to the dram, and maps it to the reset vector

03:41 <heat> it's probably safe to assume that the platform's chipset you're going to run on can have a base feature set but afaik that code is totally generic

03:41 <clever> so those dont have to deal with the whole cache-as-ram stuff, but the ram config is outside of the "bios"'s control

03:41 <geist> yah, no particular reason to specialize it since it's not performance critical

03:42 <clever> the only time ive ever seen the rpi firmware using vector opcodes during init, was for the cache-as-ram setup

03:42 <clever> using a vector store, to write to an entire cacheline at once, so it doesnt try to fetch the missing bits from ram

03:43 <heat> some firmware engineer on the edk2 mailing list was saying that he's seen memory reference code with like a megabyte of debug code that prints histograms and whatnot

03:43 <clever> it feels less like proper cache-as-ram, and more like just avoiding a cache miss/eviction

03:43 <clever> and there are a lot of places where i might have used vector opcodes, but the official firmware doesnt

03:44 <geist> yah if nothing else the uefi stuff might be optimized for space, since flash chips aren't free

03:44 <geist> but then i think it's probably not too tight on the average mobo implementation, especially if there's lots of gui code in the bios setup stuff

03:45 <geist> which would most likely dominate the flash usage

03:45 <heat> i've seen some people with concerns yes

03:45 <heat> i think it's tighter than you'd think

03:46 <heat> might be possible that we get dynamic linking in edk2 this gsoc

03:46 * geist nods

03:46 <heat> right now the whole build statically links libraries which is just wasted space

03:47 <heat> except the boot services and protocols and all that

03:47 <heat> cuz like everything's super modular and all in separate .efi executables everywhere, even before you have RAM up

03:48 <clever> damn

03:48 <clever> i would have assumed it would be a bit of a race to get ram up first

03:48 <heat> no

03:48 <heat> i think you even have a heap before ram is up

03:49 srjek has quit [Ping timeout: 240 seconds]

03:51 <geist> clever: it's also entirely possible all of that ram up stuff happens before the bulk of the big stuff

03:51 <geist> ie, the AMD AEGSA may start and do stuff then pass handoff to something else maybe

03:51 <geist> though i guess heat just says something that counteracts that so dunno

03:52 <geist> thinking of the fairly standardized notion in ARM world of the whole BL1, BL2, BL3x stuff, etc

03:52 <heat> i'm talking about intel platforms

03:52 <geist> yah i know. thinking that stuff would be done in a fairly similar way

03:52 <geist> but i guess there's no reason to think that'd be the case really

03:52 <geist> probably all sorts of history there

03:52 <heat> ram init is done in the intel FSP

03:52 <heat> which is intel's AGESA

03:52 <geist> okay, so that'd be kinda like BL1 or so in arm world

03:53 <clever> or SPL?

03:53 <geist> ie, the highly machine specific stuff that gets you into a runnable space

03:53 <geist> clever: what is SPL?

03:53 <clever> secondary program loader

03:53 <clever> ive seen that on a number of arm boards

03:53 <geist> SPL would e a bit later, hence 'secondary'

03:54 <clever> you typically prepend the SPL to the uboot binary, and write the combined pair at a fixed offset on the SD card

03:54 <geist> BL2 or one of the BL3s in arm parlance

03:54 <clever> the rom loads the SPL, the SPL brings ram online and runs the "secondary" program(uboot)

03:54 <geist> yah that's some rpi nonsense

03:54 <clever> this is done on a lot of non-rpi boards

03:54 <geist> that's all BL31 to arm

03:54 <clever> maybe more the armv7 era

03:54 <geist> sure. again thats 'application level bootloader' after stuff is brought up if you're following ARM's world

03:55 <geist> and yes, you dont have to follow it, but you also can't have a secure boot environment if you dont

03:55 <heat> i've been looking more closely and I think that they prelink PEI core to run at a specific image base

03:55 <heat> because they can't actually relocate the image in SPI flash

03:55 <geist> i've seen *tons* of implementationso f this stacked stuff in ARM, but sicne v8 ARM has tried to standardize it, and they're largely okay

03:55 <clever> i can see how the secondary program could also be ATF, but it sounds like ATF has its own names

03:55 <geist> clever: yes. BL1, BL2, BL3x, etc

03:56 <geist> it's all specced out as generic names for 'blob of code here that has this responsibility'

03:56 <clever> yeah, that at least makes it easier to talk about each phase

03:56 <geist> so someone can call it something like uboot or uefi, etc but it ends up being a phase

03:56 <geist> which i think makes sense once you grok it

03:57 <clever> similar with SPL, the secondary program can be either uboot or uefi

03:57 <geist> something like https://ohwr.org/project/soc-course/wikis/ARM-Trusted-Firmware-(ATF)#arm-trusted-firmware-boot-flow

03:57 <bslsk05> ohwr.org: Arm trusted firmware (atf) · Wiki · Projects / SoC Course with Reference Designs · Open Hardware Repository

03:57 <clever> and i should probably yoink a few of those names for my rpi firmware

03:57 <geist> basically by the time you get to BL33 you're in EL2 or EL1 and you're running in non secure world, and then you can build whatever stack of firmware you want there

03:58 <clever> ive just been calling things bootcode.bin and lk.elf

03:58 <geist> but you've already gone through at least BL1 and BL2

03:58 <clever> but lk.elf is confusing, given that every project creates one

03:58 <geist> yah

03:59 <clever> BL31 most closely fits what ive been calling lk.elf

03:59 <geist> anyway, didn't want to hijack the conversation

03:59 <geist> yah normally this is where you insert the ATF binary, which is designed to run at EL3 and stick around

04:00 <geist> and then the apps bootloader (uboot, uefi, etc) would be BL33

04:00 <heat> btw PEI is going to run in 64-bit mode in new intel platforms

04:00 <clever> but in my case, its not even on the arm core

04:00 <geist> BL32 is when you have some secure OS that you run on the side

04:00 <heat> which means that at least their CPUs can have page tables in cache as ram

04:00 <geist> heat: hmm, what's PEI?

04:01 <geist> besides prince edward island

04:01 <heat> pre-efi initialisation. it's totally efi but before the standard UEFI spec environment and it's the part of the firmware that inits the platform, PCI, ram, etc

04:01 * geist tosses one out for the canadians

04:01 <clever> :D

04:01 <heat> part of PEI still runs in temporary cache-as-ram

04:02 <heat> it running in 64-bit mode means that at least new intel CPUs can have their page tables exclusively in cache, which is pretty interesting

04:03 <heat> guess it works the same as the IDT and GDT which always could be in cache-as-ram

04:04 <geist> i'd think PEI runs on potatoes if nothing else

04:06 <heat> it runs on top of cpu magic like everything else

04:09 <heat> geist, https://wiki.osdev.org/Creating_a_64-bit_kernel#Kernel_Virtual_Memory <-- was this ever the case?

04:09 <bslsk05> wiki.osdev.org: Creating a 64-bit kernel - OSDev Wiki

04:10 <geist> yeah i just dont think it's explained very well

04:11 <clever> i think thats a side-effect of the cpu having a limited number of usable addr bits?

04:11 <heat> it's definitely not true now

04:11 <geist> it seems to be a roundabout way of saying you can put a physmap at the bottom of the kernel, but you can also start off by easily unity mapping 0-2GB to -2GB

04:11 <clever> address sizes : 48 bits physical, 48 bits virtual

04:11 <geist> and then just run the kernel out of where it was loaded

04:11 <clever> so you cant freely use the entire 64bit range

04:11 <geist> ie, if the kernel got loaded to 1MB physical, then you could link it to run at (-2GB + 1MB) and then the simple map at startup would Just Work

04:12 <clever> and to allow for a kernel in the upper half, you have 47 bits that function normally, and then everything else acts as 1 massive BIT

04:12 <geist> i think linux did something like ths for a while, though it's probably more sophisticaed now

04:12 <heat> whoever wrote this seems to have confused linux i386 with linux x86_64

04:12 <geist> yes, i think so

04:12 <heat> they totally ditched high memory in x86_64, and the -2GB mapping isn't linear

04:12 <geist> yeah most likely

04:13 <geist> it still is a convenient way to bootstrap paging if you're okay with the kernel being loaded at a fixed spot phyusically though

04:13 <geist> just set up a simple unity map and then get going

04:14 <geist> but a little bit more effort and you can be more flexible

04:14 <geist> i was probably just pontificating at the time some simple way to bootstrap, not necessarily that it was The WAy

04:15 <heat> ah wait it does indeed still have a linear map of 512MB in -2GB

04:15 <heat> if the docs are up to date that is

04:16 <geist> acgtyually kinda easy to verify with qemu

04:16 <heat> I was looking for a random claim in the osdev wiki that I remember reading when I started out with x86_64 that said you couldn't go from long mode to protected mode without rebooting

04:16 <heat> which is definitely false

04:20 Ali_A has quit [Quit: Connection closed]

04:20 <heat> can't find it, hopefully it's not there anymore

04:20 jhagborg has quit [Ping timeout: 276 seconds]

04:22 jhagborg has joined #osdev

04:27 <CompanionCube> `

04:27 Ali_A has joined #osdev

04:38 <wxwisiasdf> i just disabled mutexes and my os stopped crashing

04:38 <wxwisiasdf> woah it's magic

04:39 <CompanionCube> heat: 64-bit PEI sounds cool, do you know if intel ever made good on their intention to kill off the CSM for new things in 3030?

04:39 <CompanionCube> *2020

04:55 <gorgonical> I really need a USB hub. I have two serial USBs plugged in, headphone DAC, sd card reader USB. Too many things

04:58 sonny has quit [Quit: Client closed]

04:59 papaya has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

05:00 <heat> CompanionCube, well, I can't actually find the CSM code

05:00 <klys> gorgonical, https://www.ebay.com/itm/124349237640

05:00 <bslsk05> www.ebay.com: 7 Port Aluminum USB 3.0 HUB 5Gbps High Speed +AC Power Adapter For PC Laptop Mac | eBay

05:00 <heat> so it looks like it was scrapped around 2019

05:00 <heat> OVMF still has a copy though

05:01 <gorgonical> What a strange design

05:01 papaya has joined #osdev

05:01 <gorgonical> But very enticing

05:01 <gorgonical> Too enticing. I got one

05:02 <gorgonical> thanks klys

05:04 <klys> yw!

05:05 <klys> main reason I favored the design in the first place was to accomodate some of those: https://www.ebay.com/itm/194988974944

05:05 <bslsk05> www.ebay.com: HDMI to USB Video Capture Card 1080P 60fps Recorder Phone Game Live Streaming 4K | eBay

05:05 <CompanionCube> 2019 would make sense, yes

05:05 <heat> yeah it looks pretty dead, only OVMF still has CSM unless there's a considerable amount of CSM code that is maintained by vendors off the tree

05:05 <gorgonical> I see. Yeah you definitely need the space. I had too many of those 4-ganged plugs where even thicker USB drives will jam

05:05 <klys> and then I found out there is a delay in the signal so I can't keep up in real time

05:09 <CompanionCube> i could see AMI or the like maintaining CSM if they wanted/needed, i guess

05:15 <wxwisiasdf> usb toothpaste, nom nom

05:15 <wxwisiasdf> gimme some universal toothbrush, ehci contoller for max dental care

05:17 <CompanionCube> xhci, because 'x' makes anything sound cooler

05:20 john has joined #osdev

05:23 heat has quit [Ping timeout: 260 seconds]

05:29 zaquest has quit [Remote host closed the connection]

05:31 zaquest has joined #osdev

05:34 jhagborg has quit [Remote host closed the connection]

05:34 jhagborg has joined #osdev

05:50 john has quit [Ping timeout: 276 seconds]

05:57 Ali_A has quit [Quit: Connection closed]

07:00 vdamewood has quit [Ping timeout: 246 seconds]

07:04 vdamewood has joined #osdev

07:18 jimbzy has quit [Quit: ZNC 1.7.5+deb4 - https://znc.in]

07:19 jimbzy has joined #osdev

07:34 Reinhilde is now known as MelMalik

07:38 jhagborg has quit [Ping timeout: 240 seconds]

07:42 mniip_ has quit [Ping timeout: 620 seconds]

07:53 <corecode> is there a channel on simd/bit twiddling optimization?

07:57 <zid> ask fuz in ##asm

08:05 wxwisiasdf has quit [Quit: Lost terminal]

08:59 GeDaMo has joined #osdev

09:06 nyah has joined #osdev

09:24 eroux has joined #osdev

09:35 marshmallow has joined #osdev

09:59 Burgundy has joined #osdev

10:23 eroux has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

10:26 gog has joined #osdev

10:49 eroux has joined #osdev

11:04 dennis95 has joined #osdev

11:05 gog has quit [Ping timeout: 272 seconds]

11:05 gog has joined #osdev

11:25 gog has quit [Ping timeout: 240 seconds]

11:37 eroux has quit [Ping timeout: 256 seconds]

12:55 Ram-Z has quit [Ping timeout: 256 seconds]

13:02 nyah has quit [Quit: leaving]

13:04 nyah has joined #osdev

13:11 nyah has quit [Ping timeout: 246 seconds]

13:13 nyah has joined #osdev

13:16 vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

13:33 nyah has quit [Ping timeout: 246 seconds]

13:36 Ram-Z has joined #osdev

13:40 srjek has joined #osdev

13:45 Ali_A has joined #osdev

14:07 nyah has joined #osdev

14:30 Teukka has quit [Read error: Connection reset by peer]

14:35 Teukka has joined #osdev

14:40 les has quit [Quit: Adios]

14:40 les has joined #osdev

14:42 terminalpusher has joined #osdev

14:57 Ram-Z has quit [Ping timeout: 246 seconds]

15:00 diamondbond has joined #osdev

15:06 vdamewood has joined #osdev

15:12 Ram-Z has joined #osdev

15:18 Likorn has joined #osdev

15:18 pg12 has quit [Ping timeout: 240 seconds]

15:20 diamondbond has quit [Quit: Leaving]

15:31 knusbaum has quit [Ping timeout: 246 seconds]

15:32 pg12 has joined #osdev

15:36 knusbaum has joined #osdev

15:43 knusbaum has quit [Quit: ZNC 1.8.2 - https://znc.in]

15:50 gxt has quit [Quit: WeeChat 3.4.1]

15:51 knusbaum has joined #osdev

15:53 gxt has joined #osdev

16:05 wootehfoot has joined #osdev

16:07 wxwisiasdf has joined #osdev

16:07 <wxwisiasdf> hello people :>

16:07 <wxwisiasdf> & good morning

16:12 sonny has joined #osdev

16:13 gog has joined #osdev

16:17 knusbaum has quit [Ping timeout: 240 seconds]

16:23 dennis95 has quit [Quit: Leaving]

16:34 sonny has quit [Ping timeout: 252 seconds]

16:39 heat has joined #osdev

16:40 Ali_A has quit [Quit: Connection closed]

16:43 kingoffrance has quit [Ping timeout: 250 seconds]

16:49 wootehfoot has quit [Quit: Leaving]

16:50 sonny has joined #osdev

17:09 heat has quit [Remote host closed the connection]

17:24 kingoffrance has joined #osdev

17:25 sonny has quit [Ping timeout: 252 seconds]

17:29 sonny has joined #osdev

17:34 dennis95 has joined #osdev

17:59 knusbaum has joined #osdev

18:12 knusbaum has quit [Quit: ZNC 1.8.2 - https://znc.in]

18:18 knusbaum has joined #osdev

18:30 wxwisiasdf has quit [Ping timeout: 246 seconds]

18:30 knusbaum has quit [Ping timeout: 240 seconds]

18:36 Matt|home has quit [Ping timeout: 256 seconds]

18:38 jhagborg has joined #osdev

18:39 knusbaum has joined #osdev

18:44 knusbaum has quit [Ping timeout: 272 seconds]

18:49 wxwisiasdf has joined #osdev

18:49 <wxwisiasdf> hmmm

18:49 knusbaum has joined #osdev

18:51 <sonny> wxwisiasdf: what are you working on?

18:57 knusbaum- has joined #osdev

18:57 knusbaum has quit [Ping timeout: 272 seconds]

18:57 <wxwisiasdf> sonny: separated the shell from the kernel

18:58 <sonny> oh nice

18:58 <wxwisiasdf> and added spooler manager for spooling devices

18:58 mahmutov has joined #osdev

18:58 <wxwisiasdf> and added more japanesse translations that i totally understand (sarcasm)

18:59 <GeDaMo> It's a long time since I even heard of spooling :|

18:59 <wxwisiasdf> my os is for mainframe

18:59 <wxwisiasdf> i've heard stuff like "storage load" -> "load thing from ram"

19:03 <kingoffrance> yes, primary storage

19:03 <wxwisiasdf> GeDaMo: abend, punch card line number, JES2, virtual storage, storage, 3270 data stream

19:03 <wxwisiasdf> etc etc ibm terminology

19:03 <GeDaMo> I just associate spooling with printers :P

19:03 knusbaum- is now known as knusbaum

19:03 <wxwisiasdf> spooling is basically

19:04 <wxwisiasdf> some random userland thing minding it's own business:"hey yo, queue me this device request",kernel: "k sure bud"

19:06 <kingoffrance> sounds a bit like batch :)

19:06 <wxwisiasdf> it is a fancy name for queue+polling

19:07 <wxwisiasdf> well not so much polling, you can just yield until interruption comes

19:07 * geist yawns

19:09 Matt|home has joined #osdev

19:13 Likorn has quit [Quit: WeeChat 3.4.1]

19:16 <jimbzy> yo

19:23 * kingoffrance gives jimbzy coffee

19:24 <jimbzy> Cheers

19:24 Likorn has joined #osdev

19:29 knusbaum has quit [Ping timeout: 240 seconds]

19:29 diamondbond has joined #osdev

19:29 * mjg burps

19:31 knusbaum has joined #osdev

19:32 <geist> was productive last night. felt nice. hacked a lot of FAT code

19:33 <geist> one of those rare times when you just get in the zone for a few hours and bash out code

19:33 <jimbzy> Yeah those are good times

19:34 <mjg> the zone == $$

19:35 <geist> yah it's pretty much impossible to get into the Zone for work stuff anymore. too many distractions, or dependencies on tools/processes that break the flow

19:35 <mjg> let's be real though, everyone would be more productive if they had better life/owrk balance, rest, nutrition etc.

19:35 <geist> oh totally i've actually been much more focused and productive the last month or so

19:35 <mjg> hacking until 2 am sounds fucking great on paper

19:35 <mjg> but it is detrimental to actual productivity

19:36 <mjg> 's my point

19:36 <geist> depends on *whos* productivity you're dipping into

19:36 <jimbzy> Yeah. I almost have to go on a strange schedule due to family stuff.

19:36 <geist> ie, is this for work or for your own personal stuff

19:36 diamondbond has quit [Quit: Leaving]

19:36 <j`ey> geist: LK FAT?

19:36 <Griwes> mhmmmm, fat code

19:36 <geist> yah. i thik i pretty uch fully grok FAT now

19:36 <jimbzy> I did pick up a new soldering iron today, tho, so that feels productive ;)

19:36 <geist> not that it's that difficult but there were lttle details that i hadn't fully grokked

19:37 <jimbzy> Well, ordered one rather.

19:37 <geist> cool, a big one of one of the smaller battery powered ones? the latter looks pretty great, i've seen a few

19:37 <jimbzy> https://www.weller-tools.com/we1010na/

19:37 <bslsk05> www.weller-tools.com: WE 1010NA Soldering Station | Weller

19:37 <geist> i have a pretty good workhorse Hakko that i've using for years but thinking of getting one of the battery ones for smaller work

19:38 <mjg> geist: fair point

19:38 <geist> oh nice. weller makes good stuff

19:38 <mjg> i got a semi-vanity project at work right now

19:38 <mjg> kind of forgot the distinction :D

19:38 <geist> The Flow for work i think i've decided it never was a good idea to waste it on work. best thing you can do there is slow and steady progress, all the time, inefficiently

19:39 <geist> and then find a way to be happy with that

19:39 <jimbzy> I almost got a Hakko FX888 but decided tog o with the Weller instead. I've had good luck with them.

19:39 <geist> jimbzy: yah i have the 888

19:39 <geist> been happy with, but i think the weller is jut as good

19:39 <jimbzy> As long as it works I'm not too picky.

19:40 <geist> yah and has a nice array of replacable tips. but i tink it's all semi standard at that size anyway

19:40 <geist> my only complaint with the hakko (depends on the exact model) is one of them doesn't have a light that says its on

19:40 <geist> it only has a led if it's heating

19:40 <geist> so it's really easy ti accidentally leave it on. a common mod is to add another led for that

19:40 <jimbzy> That's kinda strange.

19:41 <zid> A common mod is to your own hands with severe burns.

19:41 <geist> burned my finger with it exactly once

19:42 <zid> I managed somehow to to stand on a hot soldering iron once

19:42 <geist> jimbzy: not so sure about the weller, aside from the toggle switch it may be hard to tell it's on

19:42 <zid> I was doing a quick fix on a torn off wire on the floor rather than cleaning up my desk

19:42 <jimbzy> I'm used to working with extreme heat, so I should be ok.

19:42 <geist> anywaty, just a thing i've found is useful. if it's not in obvious sight, i've found it easy to accidentally it on

19:43 <jimbzy> that sounds terrible zid.

19:43 <mjg> geist: checked out bare minimum worker? i can dig that

19:43 <mjg> geist: had my own period :-P

19:43 <geist> re: burning myself with the soldering iron, i do use a binocular scope for most soldering work. really convenient and makes for good results, but.... easy to accidentally your finger

19:44 <geist> since you only see a narrow field of view you dont see the tip until you get it right in the right spot

19:44 <geist> so you have to be doubleplus careful and be very aware of where the iron is at all time

19:44 <jimbzy> Yeah I can see that being an issue for sure.

19:44 <geist> but otherwise i highly recommend, especially for peoeple with shitty vision

19:45 <geist> or doing SMT work

19:45 <geist> and not really that expensive. I think i have a fairly low end AmScope binocular thing, got it like 15 years ago so might e ore pricey now

19:45 <jimbzy> I have shitty vision! I can look through a keyhole with both eyes.

19:45 <geist> 30x zoom i think?

19:45 <zid> 30x zoom is pretty good for a soldering iron

19:45 <geist> yah just about the sweet spot

19:45 <zid> could kill some pretty far away animals with that

19:46 <zid> I do wonder if the 30x zoom is more range than you'd be able to sufficiently use without some kind of high pressure tip delivery system, maybe even black powder

19:48 <geist> yah looks like it's a bit priceier now, but it's basically something like https://smile.amazon.com/AmScope-SM-3BZ-80S-Microscope-Magnification-Ring-Style/dp/B006QN5T5G

19:48 <bslsk05> redirect -> www.amazon.com: Amazon.com: AmScope SM-3BZ-80S Binocular Stereo Microscope, WF10x Eyepieces, 3.5X-90X Magnification, 0.7X-4.5X Objective Power, 0.5X and 2.0X Barlow Lenses, 80-Bulb Ring-Style LED Light Source, Single-Arm Boom Stand, 110V : Electronics

19:48 <zid> https://shotkit.com/wp-content/uploads/2018/06/Hilarious-SIgma-Lens-reviews.jpg This is my soldering iron

19:48 <geist> i think i got it back when it was like $250

19:49 <jimbzy> That's pretty wild. I was just gonna get one of those big magnifiers with the LED lights.

19:49 <geist> ah https://smile.amazon.com/dp/B004TOZ6AW is more reasonable

19:49 <bslsk05> redirect -> www.amazon.com: Amazon.com: AmScope SW-3T24Z Trinocular Stereo Microscope, WH10x Eyepieces, 20X/40X/80X Magnification, 2X/4X Objective, Single-Arm Boom Stand, Includes 2.0x Barlow Lens : Electronics

19:49 <geist> though you'll want to get a led ring for it

19:50 <geist> problem being that amscope makes like 150 versions of everything so it's really hard to tell

19:51 <geist> and yeah the round magifiers with a led ring are pretty good too

19:51 <geist> i dont really like em, but my dad has one and it works well

19:52 <jimbzy> That's what I learned with

19:52 <geist> i think i learned with the scope because we always had a lab at a lot of the startups i worked at early on and they always had a similar setup

19:52 <jimbzy> I used to use one for sharpening tools, too, so I could see the edge clearly.

19:53 <geist> oh yah for non electronics or fine work the big things are definitely more useful

19:53 <geist> and the scopes use up so much desk space

19:53 <jimbzy> Gonna be building one of these soon for 20m https://qrp-labs.com/qcxp.html

19:53 <bslsk05> qrp-labs.com: QCX+ 5W CW transceiver kit

19:53 <jimbzy> The SMD components come mounted, so I should be good with the rest.

19:59 <geist> oh looks nice

20:01 <jimbzy> Yeah, it looks like a fun project.

20:06 vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

20:07 elastic_dog has quit [Ping timeout: 240 seconds]

20:13 elastic_dog has joined #osdev

20:14 GeDaMo has quit [Remote host closed the connection]

20:14 vdamewood has joined #osdev

20:27 <gorgonical> I have been silently wondering for a long time geist if you are a "i'm getting on irc now" yawner or "i'm getting out of bed now" yawner

20:28 <geist> mostly the 'i'm outta bed and prepared to interface with the world' yawner

20:28 <geist> i had to get up super early today because one of my fire alarms started chirping at 6:30 this morning

20:28 <geist> so had to get up on a ladder and fix that while half asleep then went back to bed and didn't reset the alarm

20:28 <geist> so got up extra late today

20:28 <gorgonical> I see

20:28 <gorgonical> Reasonable

20:38 <kazinsal> what's up fire alarm chirpin all night buddy

20:38 <kazinsal> except in my case it's the one in the unoccupied apartment below mine so I had to ping building management

20:42 <geist> yah this one is 'smart' too so it's even worse

20:43 <geist> it chirps and then says "battery low" every minute

20:43 <kazinsal> oh god

20:43 <geist> in a pleasant female voice, but it still is some voice in the other room

20:43 <kazinsal> this one just pings every 30 seconds, but because it's down a floor it echoes

20:43 <geist> and since it's hard wired you can't just remove the battery and deal with it later

20:43 <geist> since the battery is simply the backup

20:45 <geist> i am vaguely curiuos what the protocol it uses to talk to the other fire alarms. i saw at least 3 wires going into it. so 2 are power/gnd i'm sure, so probably at least one loop or shared bus

20:49 <jimbzy> Probably something like HART if I had to guess.

20:50 <geist> hmm, interesting. didn't know about it

20:51 sonny has quit [Quit: Client closed]

20:52 <geist> this looks a bit master/slave though, so may be something even simpler where it's just someone broadcasting their status on the bus

20:53 <geist> but i do have an action item now to find where the main plug is for it. unless the batteries of each of them power all the rest of the, there's probably a power brick somewhere. probably in the attic

20:54 epony has quit [Ping timeout: 260 seconds]

21:00 <mjg> huh. i wonder if there is any real-world hardware in actual use where linux fails the X86_FEATURE_REP_GOOD test

21:01 <mjg> not to be confused with a troll box in a garage kept only for this purpose

21:01 <geist> hmm what does it test?

21:01 <geist> like there's some x86 hardware that has a broken rep prefix?

21:02 biblio has joined #osdev

21:03 <geist> hard to tell exactly when the feature showed up since it has moved around a bit in the source tree

21:05 <geist> yah there is various places in various cpu detection logic in the kernel that sets that bit

21:05 <geist> usually beteween stepping X and Y of some particular thing

21:06 <geist> doesn't seem to mean that rep prefix is broken as much as it's not as good as a non rep version for memcpy

21:06 <geist> and that's distinctly different from ERMS

21:06 <geist> one quick example for example, is some range of steppings on K8 hardware

21:08 <mjg> ye, ye, but is that really used today?

21:09 <geist> what is really used today?

21:09 <mjg> you know, in that spirit, i can tell you that some variants of amd athlon cpus have a bug where they require an explicit fence after atomic ops

21:09 <geist> that the feature flag? looks like it basically defaults to on, and is unset in particular situations

21:09 <mjg> solaris hotpatches this on boot

21:09 <mjg> no other system that i know of gives a shit today

21:10 <geist> sure. it's like that. whether or not it's used today is whether or not folks have hardware that the feature helps with, and linux seems to go pretty far back all in all

21:10 <mjg> my point being that the above is probably something which should have gotten removed years ago

21:10 <geist> if they're just now dropping 486 in the last few years, etc then they have a long way to go until they get to mid 2000s era AMD hardware

21:10 <geist> i dunno, K8 is only like 15 years old

21:10 <geist> for example

21:11 <geist> but anyway, it's clear linux has the policy of maintaining this stuff until it eventually becomes too difficult to maintain, and tis switch seems to be pretty low maintenance

21:11 <mjg> well see above, in that spirit, they literally have a critical bug for some cpus

21:11 <geist> whereas i can see how keeping 486 around would be a burden. if nothign else maintaining the whole soft x87 emulation stuff could go possibly

21:12 <mjg> all while the rep stuff is some perf in the worst case

21:12 <geist> no no, that's my point. the few places i just looked in the kernel didn't seem like it was critical at all

21:12 <CompanionCube> iirc they only dropped 386?

21:12 <geist> it was simply 'its faster if you avoid using rep'

21:12 dennis95 has quit [Quit: Leaving]

21:12 <geist> CompanionCube: yeah i dunno.

21:12 <mjg> geist: i mean the amd athlon cpus which require a fence after atomic for locking primitives, which the linux kernel does not bother with

21:12 <geist> mjg: okay. i dunno what we're arguing about to be honest

21:12 <kazinsal> quick lxr look at how it works, it does seem that if REP_GOOD is set it just NOPs out a `jmp memcpy_orig` at the entry point of `memcpy`, or if ERMS is set it changes it to `jmp memcpy_erms`

21:12 <kazinsal> so that's clever

21:12 wxwisiasdf has quit [Ping timeout: 240 seconds]

21:12 <geist> i mean because it is the way it is doesn't mean there's some hard core logic governing it

21:13 <mjg> my point is that X86_FEATURE_REP_GOOD should probably get whacked as it adds complexity elsewhere

21:13 <mjg> i happen to be writing a pach right now which has to fuck with it

21:13 <kazinsal> honestly it doesn't seem like it has much overhead

21:13 <geist> yah it just fiddles with selecting a third variant of memcpy, one that seems to avoid all forms of rep

21:14 <mjg> there is more

21:14 <CompanionCube> as of last august linux still had 486

21:14 <CompanionCube> so seems they didn't drop that

21:14 <geist> CompanionCube: ah okay.

21:14 <mjg> i would argue freebsd has this solved better -- you ifunc to the expected variant

21:14 <CompanionCube> also 386 was dropped all the way in 2012

21:14 <geist> yah i was thinking the big drop would be to ditch the old x87 emulation code and require linux run on x86s that have a fpu

21:14 <mjg> and it does not use any indirect calls either, all callsites get relocated

21:15 <mjg> so there is literally 0 overhead from existence of numerous variants

21:15 <mjg> not even a nop sled

21:15 * geist nods

21:15 wxwisiasdf has joined #osdev

21:15 * CompanionCube has no ideas which distros still do 486, but i expect it's a very small list.

21:16 <mjg> btw, interestingly

21:16 <geist> yah dunno where i remember reading it. probably made it up

21:16 <mjg> SYM_FUNC_START(copy_page) ALTERNATIVE "jmp copy_page_regs", "", X86_FEATURE_REP_GOOD movl $4096/8, %ecx rep movsq RET

21:16 <mjg> SYM_FUNC_END(copy_page)

21:16 <geist> there were also a bunch of 486 equivalents at the time that the kernel still has support for

21:16 <geist> nexgen, etc

21:16 <mjg> that is, they don't use ERMS for page copying

21:17 <mjg> i asked intel once what's up with that, they told me about some uarchs being fucked and requiring a sfence afterewards

21:17 <geist> question there is does a rep movsb work better even with erms for a page aligned copy

21:17 <geist> or is it really that rep movsb is better in on the average most cases with ERMS

21:17 <geist> and ah, what mjg just said

21:18 <geist> also reminds me, i was thinking i should see about putting together a more optimal page copy for ARM

21:18 <mjg> afair they used to claim (apart from the nugget above) that using erms is always better

21:18 <geist> with full alignment like that can factor out at least a few levels of memcpy

21:18 gog has quit [Ping timeout: 272 seconds]

21:18 <mjg> makes me wonder if that's true for newer uarchs

21:18 <mjg> as long as the size is a multiple of 8

21:19 <geist> but it may be that it's always better or at least parity with rep movsq. but in the case of a fully aligned copy address that's a multiple of 8 rep movsq is just as good

21:19 <geist> and thus doesn't really require a switch

21:20 <mjg> intel optimization manual used to claim that rep stosb + size of 4096 + eax == 0 is special-cased

21:20 * geist nods

21:20 <geist> reminds me, i should finally add the clzero feature for AMD. just cause

21:20 <mjg> it is plausible rep movsb with 4096 is also special-cased, but they fucked it up above

21:21 <mjg> geist: ha, clzero is something i benchmarked in one real case: the venerable kernel build test

21:21 <geist> OTOH while it is generally a good assuption that whatever is in linux is the optimal case, many times you will be dissapoint, as i'm sure you're aware

21:21 <mjg> key point being clzero uses non-temporal stores

21:21 <geist> mjg: yah i believe it was you telling me in a multi socket/numa AMD machine it's not necessarily faster

21:21 <mjg> it is slower

21:21 <geist> because of the cross-node zeros

21:21 <mjg> not only cross-node

21:21 <mjg> you just end up generating more traffic to the memory controller

21:22 <geist> but it doesn't trash the cache, so it *may* be faster in some case

21:22 <geist> but yeah, i think you were the one that talked me out of it

21:22 <mjg> oh sure, i'm confident there are some cases where it ends up being faster

21:22 <geist> it also raises an interesting point: what about the ARM equivalent of it. clzero is clearly the equivalent of `dc zva` on ARM and the backend support must have fallen out of the K12 development

21:23 <geist> and ARM basically suggests using `dc zva` extremely aggressively. they even use it opportunistically in their memset implementation

21:23 <mjg> to be clear, in the kernel build test you had less system time, but got way more cache misses and econsequently more user time

21:23 <geist> but i do question that sometimes, because of precisely wehat you just said

21:23 <mjg> and did not get faster in total

21:23 <geist> since dc zva is also specced as being cache bypassing

21:23 * geist nods

21:23 <mjg> the way i see it, the kernel tries to reuse pages as much as possible

21:23 <mjg> and when you zero them out with nt-stores, you actively combat that behavior

21:24 <geist> right if the page is just about to be used then it makes sense to go ahead and cacache allocate in the L1

21:24 <mjg> as for linux doing stuff reasonably fast, i used to think that';s true

21:24 <mjg> but i know for a fact it is not :)

21:24 <mjg> and i mean some cases are weirdly bad

21:24 <geist> i guess it may depend on the situation. allocating a new zero page for some user space page fault. does it make sense to bring the page into the cache? you dont know what user space is about to do with it

21:25 <geist> in the case of say a page table, are you about to overwrite most of it?

21:25 <geist> if so, bringing it into the L1 makes sense

21:25 <mjg> if there was a way to tell how much of the page happens to be cached already, i would agree

21:25 <geist> a lot of the other situations like allocating a page for COW, you dont zero it because you're about to overwrite it

21:26 <geist> yah and the cost of flushing the page is almost certainly much more than just delaing with it

21:26 <mjg> linux folks did some benchmarks years ago, i don't remember what was tested apart from building stuff, but the conclusion was that nt stores on page zeroing suck

21:26 <mjg> i did my own benches, got the same result fwiw :-P

21:26 <geist> yah. it also may be that it's a bad implementation of a good idea, especially since ARM is extremely bullish on it

21:26 <geist> but i guess AMD ruined it forever

21:27 <mjg> afair all the nt store zeroing is a mid-2000s ideas

21:27 <mjg> idea

21:27 <geist> especially if AMD say sorts the NT stores as super high priority that starves out other things

21:27 <mjg> i find it mildly plausible it was ok at the time as caches were smaller

21:27 <geist> yah, i think that has a fair aount to do with it. relative cache sizes nowadays change where these tuning points come in for various things

21:27 <mjg> so the current effect of evicting shit was less likely to be happening

21:28 <mjg> but i'm also confident people made the change mostly based on their DEEP BELIEF as opposed to serous measurements

21:28 <mjg> which used to be a plague

21:28 <geist> i *do* suppose the whole background zeroing thing could be *more* plausible with clzero, because then you're at least not trashing the regular cache for it

21:28 <geist> except of course the NT stores probably get in the way of other things and generate unnecessary traffic

21:29 <geist> if you could hint that it was low priority maybe

21:29 <mjg> i don't know how background zeroing came to be. i suspect it was the result of "idle loop" only burning cpu

21:29 <mjg> so they came up with shit to do isntead

21:29 <geist> probably same reason: L1 caches were smaller, so you didn't want to tolerate a memset on every page allocation

21:29 <mjg> i did some tests, anything simple like building the kernel shreds whatever reserve of pre-zeroed pages you might have within seconds

21:29 <geist> whereas now the 4K isn't that big of a deal

21:30 <mjg> and in fact you would not want to keep going to the pre-zerored list either

21:30 <geist> yah also that too, i suspect modern systems are expected to be able to chew through more percentage of physical pages per unit time

21:30 <geist> and workload on older systems was relatively different, perhaps

21:30 <geist> that's sort of a thing lost in time, but i do wonder about that

21:31 <mjg> the metric fuckton of forks + execs results in gigabytes of memory chewed through

21:31 <geist> OTOH, that may also be an indicator that that's not necessarily the Best Test for this stuff

21:31 <geist> its a valid benchmark, but it benchmarks a particular corner of the envelope

21:31 <mjg> no argument here

21:31 <geist> but it's the easiest one to run, so it tends to be the one that folks fall back to

21:31 <mjg> it highlights the crucial point though

21:32 <mjg> let's say you need a new page. say you grabbed a pre-zeroed one, so you got it faster

21:32 <geist> also my guess is the background zeroing is just less useful nowadays with modern (ie, last 20 years) style file caching where in the steady state pretty much all pages should be full of something eventually

21:32 <geist> so there's really not a lot to background zero

21:32 <mjg> but now you freed a page and need a new one zeroed out

21:32 <mjg> is it faster to grab the next pre-zeroed or zero out the one you just freed?

21:33 <mjg> factor smp and i think we have a winner

21:33 <geist> i do think about it when running a bunch of VMs on my overcommitted server. linux host can do KSM page merging and zero page detection, but it only helps if the VMs are actually holding onto zeroed pages

21:33 <geist> which they dont, because you can't tell linux/other oses to do that

21:33 <geist> so i can tell a guest to dump the file cache, but i cant force linxu to then fill those pages with zeros

21:34 <geist> you can of course write code in the guest to attempt to fill most of guest ram with zeros, but it's a hack at best (though it does appear to work)

21:34 <mjg> i have to note that zeroing on demand is slow enough that it does show up as a significant factor on both linux and freebsd

21:35 <mjg> [or perhaps other parts of the respective kernels suck least enough to make it stand out]

21:35 <geist> it's possible that at the end it's just more Fair in the sense that the user that initiated the page allocation pays for it

21:35 <geist> which makes sense

21:36 <mjg> makes you wonder how much faster things would be if pages reused in the same security domain were allowed to be allocated without zeroing

21:36 fkrauthan has quit [Quit: ZNC - https://znc.in]

21:36 <mjg> i mean we can't do that now without everything exploding in userspace

21:37 <geist> would be cool if you could, for example, store a bitmap of all pages in the memory controller that simply says 'this is zeroed'

21:37 <geist> then you could just flip bits as you zero the pages

21:37 <geist> basically like TRIM but for DRAM

21:38 <mjg> ye, weird there is no dedicated support. i guess OS fuckery is niche enough even for processor vendors

21:38 <mjg> :-P

21:38 fkrauthan has joined #osdev

21:38 <mjg> "here is the bare minimum to get teh thing running and go fuck yourself"

21:38 <geist> i know that AXI at least has the ability to push through transactions where the data is explicitly zeroed, and i'd assume at some point intel/amd has the same thing in their private busses to the mem controller

21:39 <geist> but i guess if they did do that it'd be just some private thing, and then suddenly things like writing zeros appears to be even faster than before, but only if you do it to an entire page... so that'd be tricky, since they'd have to observe an entire page clear

21:40 <geist> it's one thing to do it at a cache line granularity, but i guess page level would be difficult

21:40 <geist> or at least more difficult

21:41 sonny has joined #osdev

21:44 <mjg> oh also note amd did not patch linux to zero_page with clzero

21:44 <mjg> i tried to asking them about it but got some weird non-response and dropped the subject

21:44 tds has quit [Read error: Connection reset by peer]

21:45 terminalpusher has quit [Remote host closed the connection]

21:45 tds has joined #osdev

21:53 vdamewood has quit [Remote host closed the connection]

21:54 vdamewood has joined #osdev

21:59 tds has quit [Read error: Connection reset by peer]

21:59 hodbogi has joined #osdev

21:59 tds has joined #osdev

22:00 <hodbogi> It is time for fun

22:00 <hodbogi> fun means getting vscode + gdb to work or something with qemu

22:04 wxwisiasdf has quit [Ping timeout: 256 seconds]

22:24 Burgundy has quit [Ping timeout: 272 seconds]

22:25 dude12312414 has joined #osdev

22:25 vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

22:28 toulene has quit [Quit: The Lounge - https://thelounge.chat]

22:29 dude12312414 has quit [Client Quit]

22:36 <nomagno> I just implemented a quasi stack macro to do nested looping in my VM. 33 instructions when setting up loop, 5 instructions when exiting, 1 instruction to do the actual looping since it's just jcz

22:37 <nomagno> Hm... I'd say it's worth it over my previous 6-instruction-to-setup pseudoinstruction, you loose more time looping than pre-looping usually don't you :P

22:42 toluene has joined #osdev

22:42 <hodbogi> I foind this after I had it half working https://wiki.osdev.org/User:TheCool1Kevin/VSCode_Debug

22:42 <bslsk05> wiki.osdev.org: User:TheCool1Kevin/VSCode Debug - OSDev Wiki

23:10 gog has joined #osdev

23:13 jhagborg has quit [Ping timeout: 240 seconds]

23:21 hodbogi has quit [Quit: Lost terminal]

23:30 Likorn has quit [Quit: WeeChat 3.4.1]

23:37 <geist> mjg: boo on team AMD

23:38 <geist> but like i said i bet it was a freebie because they already implemented it for K12, since it looks *precisely* like the `dc zva` instruction

23:38 <geist> the same way their new global TLB shootdown stuff looks exactly like ARM's

23:39 <geist> somewhere i found a more or less official quote from the main AMD designer guy that said yeah Zen was absolultely a late in the game retarget of the existing K12 work

23:54 jhagborg has joined #osdev