klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
<doug16k> it suddenly has essentially infinite decode bandwidth in loops
<mrvn> if it can fit it in the cache
<doug16k> yeah but that is like 1000+ insns now
<doug16k> it's amazing
<mrvn> why is the simple "loop" slower than if it just streamed the decoded opcodes?
<doug16k> loop is deliberately bad. I have been avoiding loop since pentium
<doug16k> the story I heard was, it is used in delay loops, so intel deliberately kept it bad
<doug16k> since pentium, the simpler instructions are faster
<doug16k> that is a microcoded instruction
<mrvn> count down loops are such a common thing the cpu really should have a fast opcode for them.
<doug16k> well they kind of do - there is macroop fusion that fuses dec jnz to one op
<mrvn> doug16k: dec + jnz == loop, makes no sense
<doug16k> it's just super flexible so you just use the existing test inc dec add sub whatever followed by branch to encode one fused op
<mrvn> it's not like a delay loop would work with all the MHz changes the cpu went through
<moon-child> what I heard is that loop is bad because it has to deal with being interrupted
<moon-child> you decrement rcx, then potentially jump somewhere; if that somewhere isn't mapped, you #pf, but you also need to restore the previous value of rcx
<doug16k> you are right, by now it doesn't even slow it down that bad if it was serializing per iteration. it gradually became almost irrelevant how fast it is
<mrvn> like an IRQ between dec and jnz?
<moon-child> mrvn: no, you can be atomic wrt that
<mrvn> ahh, #pf. yeah, that might be tricky.
<mrvn> With dec + jnz the #pf would have an IP pointing at the jump.
<moon-child> yep
<doug16k> interrupts are serializing, so it will wait until everything is retired, then start the interrupt
xenos1984 has quit [Read error: Connection reset by peer]
<mrvn> It's just so annoyingly complex to use dec+jnz. Adds 6 opcodes to the 6 the loop itself is.
<mrvn> or 4 actually, dec only changes OF, not CF
<moon-child> yeah, no problem with interrupts. Just exceptions from the loop insn itself
<mrvn> it's stupid that x86 always changes the flags. Other archs have a bit for that.
<moon-child> agreed
<doug16k> because x86 instructions are small
<doug16k> other arch have wasted bits they want to use
<mrvn> although dec+jnz needs to change the flags. :(
<moon-child> well. Ideally you could ask dec to only set the zero flag
<moon-child> doug16k: what if it were a prefix you could apply, optionally, only where you could profit?
<doug16k> you can use lea to inc dec
<doug16k> no flag change at all
<doug16k> then what :D
<moon-child> then it wouldn't waste bits
<mrvn> moon-child: dec is so nice and keeps the CF flag. But it overwrites OF.
<doug16k> no flag inc rax -> lea 1(%rax),%rax
<moon-child> yes but we want only-zero-flag inc rax
<doug16k> I know - then what. how do you branch or not. you could jmp *%xxx and cmov to that
<mrvn> doug16k: I need something that jcc can use that leaves CF/OF alone.
<doug16k> then you need to wreck flags lol
<mrvn> hence the wish to set the Zero flag
<doug16k> I think you should try setc then cmp that reg at the top to get it back
<doug16k> it should overlap your loop inc jcc for free
<mrvn> doug16k: I need OF, dec doesn't change CF
<doug16k> oh
<mrvn> Or maybe the whole idea of using adcx/adox for additions is crazy and it's only useful for multiplications.
<doug16k> if you did seto then neg that, then at top, cmp that with something that makes it OF or not?
<moon-child> mrvn: I think the original impetus was for crypto. Not gonna have branches in crypto code anyway (not constant time), and you know the sizes, so you can just fully unroll
<doug16k> ah crap - trashes CF. I see the problem
<mrvn> moon-child: you aren't going to unroll a 4096-bit key.
<doug16k> can't use lahf sahf ?
<moon-child> use neg to set carry?
<mrvn> "It is valid in 64-bit mode only if CPUID.80000001H:ECX.LAHF-SAHF[bit 0] = 1." Not sure.
<mrvn> moon-child: carry is unaffected by "dec"
<doug16k> moon-child, the idea was to make "1" from "seto" to become all 1's so it is negative, then you can cmp that with something that makes OF be set
<doug16k> I didn't sit down and figure out the polarity, but it's useless because it would ruin CF we are trying to carry around the loop
<mrvn> doug16k: adox -1, reg should do
<mrvn> lahf works here
<doug16k> 3950x supports it
<doug16k> yours is intel?
<mrvn> that would make it: sahf; N times 6 opcodes; lahf; 4x lea; dec; jnz
<mrvn> amd
xenos1984 has joined #osdev
<mrvn> and whatever compiler explorer has
<mrvn> I should start measuring. It might all be moot because the limiting factor is ram.
<moon-child> most people have no use for numbers so large they don't fit in cache :P
<doug16k> you should unroll it enough that the iteration count is low enough for the predictor to learn the pattern
<doug16k> for the final branch
<doug16k> if you can
<doug16k> if you can get it to speculate right into the return, can't beat that
<doug16k> then it won't speculate into excessive iterations and not realize until the last dependency chain completely retires
<moon-child> isn't 'learning the pattern' mainly a function of whether your bigints actually have predictable sizes, which is an application concern?
knusbaum has quit [Quit: ZNC 1.8.2 - https://znc.in]
<doug16k> if a loop branch is taken a predictable number of times, and it's not too many, it can correctly predict the final not taken and speculate correctly and return and start speculating correctly there, instead of it speculating into the weeds and not realizing until the very last iteration retires
<doug16k> then flushing pipeline and starting all over
<doug16k> it can learn taken taken taken taken ... not taken and be perfect
<mrvn> doug16k: the problem is that this will often be called from "mul", which recursively divides. So you get size 2, 4, 8, 16, 32, ...
knusbaum has joined #osdev
<doug16k> if you are lucky, the branch history that leads to that can make it have another separate history remembered for that size
<mrvn> Sizes 2, 4, 8 should probably be special and fully unrolled. And then 16, 32, ... as loops
<doug16k> there can be one branch that mispredicts, but it's right from then on, because that is a different branch history, and the following stuff is using different history values
<doug16k> and it's right from then on for a while
<doug16k> the sequence of taken/not taken that have recently occurred cause it to select a different set of history memories
<doug16k> you hope
lainon has quit [Quit: Textual IRC Client: www.textualapp.com]
<doug16k> there is aliasing that can cancel it
<doug16k> usually it works
<doug16k> for example, if you had if (debug_enabled) print("stuff"); repeatedly, then when that is taken or not, it uses different history memory for the following branches, and learns the "if is always false" or "if is always true" pattern
<mrvn> One hope with the tiny loop is that if the loop runs 1024 times the one mispredict at the end is irrelevant.
<doug16k> and all the following ifs that go the same way predict correctly
<mrvn> If you unroll then it's taken less often and might be worse in predicting
<doug16k> I mean, even if you flicked debug_enabled on and off, the first if's mispredict would cause it to select the other branch histories and predict the rest correctly
<doug16k> the mispredict at the end isn't irrelevant if it would have speculated into a load in the caller, and got started on it sooner
<doug16k> way sooner
<doug16k> if you keep it speculating well, it doesn't matter what the instructions do very much, it will decompose everything into a dataflow graph and start everything asap
<doug16k> I would think of it as the pipeline having excessive integer execution pipelines, and one thread can't even saturate them with realistic instructions
<doug16k> I picture the carry dependency chain to be the determining factor, and everything else goes through for free in other execution units
<doug16k> and they have no effect
<mrvn> doug16k: now rethink that again with all cores running the same loop
<doug16k> it must be an epic amount of adding to have two dependency chains back to back on a modern amd
<doug16k> why not avx2?
<mrvn> doug16k: because it has no addx. Getting the carry is complex.
<doug16k> yeah but you can probably get so many carries at once that it isn't bad
<mrvn> only 4
<doug16k> ok, now do 2 dependency chains of that like you did with adc/adx
<doug16k> that would be insane
<doug16k> carry is just compare less than, then subtract that from destination and it will subtract -1 or 0, adding one or not
Teukka has quit [Read error: Connection reset by peer]
<doug16k> unsigned
<mrvn> gcc/clang are a bit stupid there. They compare and then mask it with 1 so they can later add it.
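The compare-and-subtract carry trick doug16k describes can be sketched in portable C. This is only an illustration under stated assumptions: `add_limbs` is a hypothetical name, limbs are 64-bit and stored least significant first, and the comparison result is turned into an all-ones mask (as a SIMD compare would produce) so that subtracting it adds one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst = a + b over n little-endian 64-bit limbs.
   Returns the carry out of the top limb (0 or 1). */
uint64_t add_limbs(uint64_t *dst, const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t carry_mask = 0;                  /* 0 or all ones, like a SIMD compare result */
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + b[i];             /* wraps modulo 2^64 */
        uint64_t m1 = -(uint64_t)(s < a[i]);  /* unsigned "less than" detects the wrap */
        uint64_t t = s - carry_mask;          /* subtracting -1 adds one, subtracting 0 adds nothing */
        uint64_t m2 = -(uint64_t)(t < s);     /* wrap caused by the incoming carry */
        dst[i] = t;
        carry_mask = m1 | m2;                 /* both cannot be set for the same limb */
    }
    return carry_mask & 1;
}
```

Adding {UINT64_MAX, UINT64_MAX} and {1, 0} with this sketch gives {0, 0} and a carry out of 1.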
<moon-child> can probably do as many as 4 at once
<mrvn> moon-child: but will it be faster than doing them in sequence?
<moon-child> try it and see
<doug16k> I think you will get the adc and adx going through in the same cycle and everything else is nothing
<doug16k> assuming cache hits
<mrvn> mixing adc + avx is another idea.
<moon-child> I think adc will steal execution ports from avx
Teukka has joined #osdev
<doug16k> why?
<moon-child> so if avx is faster, there'd be no point
<doug16k> I don't think it using the avx opcode space means it uses fpu pipelines
<doug16k> if it did then there would be bypass delays that aren't mentioned in bmi
<mrvn> but if I only need to add 5 numbers I can do avx for 4 and adc for the last.
<moon-child> doug16k: yeah but it's int ops
<moon-child> don't scalar and vector int ops use the same ports?
<doug16k> amd has completely decoupled integer and float
<moon-child> there's no float here though
<doug16k> avx would be
<doug16k> I think we agree anyway
<mrvn> is avx with integer a float or int operation?
<moon-child> if you use avx to do bigint addition, then you're doing int ops
<doug16k> adx is a 3-operand instruction isn't it?
<moon-child> 'I think we agree anyway' maybe :P
<doug16k> it's floating point pipelines if integer avx yeah
<doug16k> they obviously don't have 256 bit alus in the integer ones
<doug16k> and no gigantic registers
<moon-child> oh hmmm
<moon-child> mrvn: what if you just repeat the adox?
<doug16k> if one thread was avx and other was integer, it would be glorious
<moon-child> wait no that doesn't make sense
<mrvn> moon-child: then I can just adc, it's a shorter opcode
<moon-child> no I meant from the previous iteration
<moon-child> but that doesn't work
<doug16k> avx can do a hell of a lot fewer loads/stores
<doug16k> that alone is huge
<mrvn> doug16k: fewer opcodes, same volume.
<mrvn> if you are waiting for memory it doesn't matter
<doug16k> you can fit way more bandwidth into the same amount of reorder buffer slots
<doug16k> speculate further
<moon-child> mrvn: a simd load and a scalar load have the same throughput
<doug16k> one load can be a byte or 256 bits. which one is faster
<moon-child> in terms of # loads/cycle
<moon-child> but the former does a lot more work
<moon-child> (ditto store)
<mrvn> both take 200 cycles to fetch a cache line from memory
<moon-child> if you're waiting for memory then nothing else matters anyway
<mrvn> that's what I said.
<moon-child> but you want to optimise
<moon-child> optimisations only matter when you don't hit memory. So focus on that case
<doug16k> you said it was AMD so that means it is going to hit the cache all the time, unless you have more than the huge L3 of bigints
<doug16k> you have gigantic caches
<mrvn> That totally depends on the size of the Bignums you have. If the numbers have a million bits then cache become a bit limited.
<moon-child> then you're bandwidth limited
<mrvn> If you do 4 AVX streams in parallel that's 32MBit or 4MB of data for a million bit numbers.
<moon-child> memory is p fast
<doug16k> one CCX L3 is 16MB
<mrvn> 6MB for a + b instead of a += b
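The 4MB and 6MB figures can be reproduced with a little arithmetic. The sketch below assumes one reading of mrvn's numbers: each AVX2 register holds four 64-bit lanes, each lane carries a separate bignum, and a megabit is 10^6 bits. `working_set_bytes` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes touched by `streams` parallel AVX2 additions.
   Assumption: 4 x 64-bit lanes per register, one independent bignum per lane.
   operands is 2 for a += b and 3 for c = a + b. */
uint64_t working_set_bytes(uint64_t bits_per_number, unsigned streams, unsigned operands)
{
    const unsigned lanes = 4;
    assert(bits_per_number % 8 == 0);
    return bits_per_number / 8 * lanes * streams * operands;
}
```

With million-bit numbers and 4 streams this gives 4,000,000 bytes for `a += b` and 6,000,000 bytes for `a + b` into a third buffer, which matches the numbers quoted in the discussion.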
<gamozo> what? memory is so slow!
<doug16k> guessing which gen though
<heat> i know some of these words
<heat> computer go brrr
<moon-child> gamozo: bandwidth
<gamozo> that's fair!
<doug16k> if you know for a fact that it won't fit in the cache, then you should be using non-temporal loads
<doug16k> and stores
<moon-child> ^ that too
<gamozo> memory bandwidth got so much better
<gamozo> tbh, non-temporal is kinda spotty? I've yet to find many good situations for it, even with streaming writes
<gamozo> I don't understand computers
<zid> NT's just very likely to make things worse unless it DEFINITELY makes them better
<zid> because of prediction and caches and stuff
<zid> it's just hard to use in real programs
<moon-child> a colleague recently worked out how to use nt stores in matrix multiplication
<gamozo> the main issue is that for _most_ compute you can batch results and keep them in cache rather than going to non-temporal memory
<zid> Like, when was the last time you did a prefetchw
<heat> is nt defined if you use nt and non-nt accesses?
<moon-child> haven't implemented it yet, but I made a model of it. Didn't seem to help. But there are second order effects
<doug16k> gamozo, yeah, it has to be a perfect use case for it to win. everyone uses the data too soon after and that makes it look awful
<zid> and NT is arguably harder
<moon-child> heat: I think you can get either the stored value or the previous value
<heat> zid, most prefetchws are wrong anyway
<zid> yup
<moon-child> and if you do a write, not specified which one gets written
<moon-child> until the next sfence
<doug16k> gamozo, if you know that your multiword adc chain is over 16MB though
<gamozo> :gasp:
<gamozo> That fits in l3!
<zid> Not in my l3 :(
<gamozo> :(
<zid> I have 10M or 12M available
<zid> unless I figure out how to do dual socket with a pair of 1xxx xeons
<zid> of different skus
<doug16k> my gen is 16MB per 4 core CCX, so 64MB total
<gamozo> I just got new procs with 48 KiB of l1 and it's HOT
<zid> did you get the one with 768MB of L3 yet
<zid> Imagine paying £7000 for a 3.8GHz turbo cpu
<doug16k> what cpu is that? some epyc variant?
<zid> 7373x
<gamozo> I only have epycs in my storage server, I need avx-512 :(
<zid> no you don't
<gamozo> YES I DO!
<doug16k> gamozo, zen4 will have it
<doug16k> soon
<zid> unless you happen to be doing *exactly* an avx-512 load on that cpu, all day, you don't :P
<gamozo> zen4 wont have it right?
<gamozo> they only will have the 16-bit float stuff, but not even AVX-512F
<gamozo> it's gonna be a scuffed implementation I bet
<gamozo> (at least, that's how I read it)
<gamozo> They've been really slippery on answering questions
<doug16k> yeah I am just going by rumor bs
<zid> avx-512 is a whole family of shit
<zid> so god knows what you'd get even if they did add it
<gamozo> yeah
<doug16k> I am half expecting it to be 2 256-bit ops, but the mask regs would help, if it had avx512f
<gamozo> I mainly want avx-512f and avx-512bw
<gamozo> the mask regs are largely what I want, but I really would like all 512 bits
<mrvn> If your Bignum is 16MB then the adc chain will read 32MB and write 16MB.
<doug16k> it's a 3-operand add?
<mrvn> doug16k: frequently.
<doug16k> write allocate could make one load free
<mrvn> If you multiply then some of the sub-terms you need multiple times. So you need a non-destructive add
<mrvn> But with a+b = b+a you can probably shuffle stuff around a lot to use 2-operand add a lot.
<doug16k> I think CPUs are unnecessarily fast already
<doug16k> to the extreme
<zid> play dwarf fortress and say that
* mrvn throws a 6502 at doug16k
<doug16k> try portal on 2K@165Hz. it's so smooth and perfect, it's distracting
<doug16k> every time I do a 180 I am like "whoa that was sooo smooth... geez"
<zid> framerate is a hell of a drug
<gamozo> 165hz? where do you get 165hz monitors?
<gamozo> that's new to me~
<heat> stores
<heat> you even have 240hz ones lol
<heat> also 360hz as well I think
<bslsk05> ​www.amazon.ca: LG UltraGear 32GN600-B 32 Inch(31.5) QHD VA 5ms with 1ms MBR 144Hz 165Hz Gaming Monitor AMD FreeSync, Black : Amazon.ca: Electronics
<doug16k> almost cheap now
pretty_dumm_guy has quit [Quit: WeeChat 3.5]
<doug16k> it's about 4GB/s just to do the scanout for dual monitor
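The scanout figure is easy to sanity-check. The sketch below assumes 2560x1440 panels (the QHD monitor linked above), 4 bytes per pixel, 165 Hz and two monitors; none of these parameters besides the refresh rate are stated outright in the log, and `scanout_bytes_per_sec` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second the display engine must read to refresh the screens. */
uint64_t scanout_bytes_per_sec(uint64_t width, uint64_t height,
                               uint64_t bytes_per_pixel, uint64_t hz,
                               uint64_t monitors)
{
    assert(bytes_per_pixel > 0 && monitors > 0);
    return width * height * bytes_per_pixel * hz * monitors;
}
```

Under those assumptions the result is 4,866,048,000 bytes per second, so "about 4GB/s" is the right order of magnitude, if slightly low.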
<zid> Give me my ramdac backs damnit
<zid> I don't care about 4k I want 1080 and a ramdac
Likorn has quit [Quit: WeeChat 3.4.1]
Ali_A has joined #osdev
<doug16k> I realized something funny yesterday. if I put a word processor on zoom to whole page, then the page on the screen is more than 8.5x11 lol
<zid> What's that in ISO
<doug16k> A4
<doug16k> I think
<zid> I think my monitor's about A4 tall
<zid> but i'm not sure it has amazing dpi
<zid> 72?
<zid> 112? I forget and I'm lazy
<zid> I'd have to do trig
<doug16k> sounds right
<doug16k> mine is 96 or near that IIRC
<doug16k> 2K
<zid> H = 23", whatever angle 16/9 makes.. err.. something something, dpi.
<doug16k> 2K is too much for 31", so I had to cancel out some of the extra room anyway, just slight better font rendering, mostly
<doug16k> cancel out with increasing font size I mean
<zid> yea, I'd like that, given I can't turn off truetype
<zid> I had an anti-aliased font hook program at one point but it was pretty unreliable
_xor has joined #osdev
heat has quit [Ping timeout: 272 seconds]
theruran has quit [Quit: Connection closed for inactivity]
terrorjack has quit [Quit: The Lounge - https://thelounge.chat]
terrorjack has joined #osdev
Ali_A has quit [Quit: Connection closed]
pounce is now known as pounce_
srjek has quit [Ping timeout: 255 seconds]
vdamewood has joined #osdev
pounce has joined #osdev
Arsen has quit [Ping timeout: 260 seconds]
qookie has quit [Ping timeout: 244 seconds]
qookie has joined #osdev
Arsen has joined #osdev
jack_rabbit has joined #osdev
knusbaum has quit [Ping timeout: 260 seconds]
air has quit [Ping timeout: 256 seconds]
pounce has quit [Remote host closed the connection]
air has joined #osdev
pounce has joined #osdev
<Jari--> hi all
<klys> hi jari--
foudfou has quit [Remote host closed the connection]
foudfou has joined #osdev
the_lanetly_052_ has joined #osdev
Ali_A has joined #osdev
Celelibi has quit [Ping timeout: 258 seconds]
Celelibi has joined #osdev
kingoffrance has quit [Ping timeout: 265 seconds]
<Jari--> klys: so hows OS business
<Jari--> All kernels seem to have this file system, even drivers access the root file system with open close read write lseek.
<Jari--> I am still manually poking with readwrite block getsize etc.
<Jari--> vfs
<Jari--> I sometimes wonder what parts of the kernel should be using the internal LIBC and what parts should have direct access.
<Jari--> Drivers for example would probably be better with using internal device API.
sortie has quit [Ping timeout: 240 seconds]
sortie has joined #osdev
Ali_A has quit [Quit: Connection closed]
sortie has quit [Ping timeout: 248 seconds]
sortie has joined #osdev
zaquest has quit [Remote host closed the connection]
bliminse has quit [Quit: leaving]
zaquest has joined #osdev
heat has joined #osdev
bliminse has joined #osdev
<mrvn> Jari--: Join us in the microkernel world. None of them should have direct access.
<heat> D:
<mrvn> Jari--: If you are talking about firmware loading then maybe rethink the approach. Supply the firmware blob from userspace like Linux does. If you are talking about FSes then they kind of need block read/write but that usually should go through the block cache and have some protection against writing outside the partition the FS is on.
<mrvn> or the FS is on raid or lvm and needs to access a virtual device.
<heat> no it's defo not firmware loading
<heat> you're overthinking this :P
<mrvn> heat: what other than firmware loading would access files?
<heat> who said anything about accessing files
<mrvn> open close read write lseek.
<heat> he's talking about the vfs
<heat> also seems confused
<heat> very unclear question
<Jari--> I want to run my file system driver on Linux text console, thats why I am thinking of adding some features to my VFS.
<heat> what's the linux text console, to you
<heat> what features are you lacking
<Jari--> a terminal
<heat> the terminal is just a pipe of text
<Jari--> lots of non-POSIX dependencies
<heat> user process reads, user process writes
<heat> that's how the terminal works
<heat> the kernel just displays it
<Jari--> heat: console instead of virtual machine
<heat> oh so you want to run a driver as a userspace program?
<Jari--> heat: yes
<heat> ok, that's doable
<heat> wrap your internal API into libc functions
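One way to read heat's suggestion: keep the file system driver talking to a small block-device interface, and provide a host backend for that interface built on libc. The sketch below is a guess at what that could look like, not Jari--'s actual API; `struct blockdev`, `host_blockdev` and the 512-byte block size are all assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 512

/* Hypothetical kernel-side interface: the FS driver only ever calls these. */
struct blockdev {
    void *ctx;
    int (*read_block)(void *ctx, uint64_t lba, void *buf);
    int (*write_block)(void *ctx, uint64_t lba, const void *buf);
};

/* Host backend: a disk image accessed through stdio. */
static int host_read(void *ctx, uint64_t lba, void *buf)
{
    FILE *f = ctx;
    if (fseek(f, (long)(lba * BLOCK_SIZE), SEEK_SET) != 0)
        return -1;
    return fread(buf, BLOCK_SIZE, 1, f) == 1 ? 0 : -1;
}

static int host_write(void *ctx, uint64_t lba, const void *buf)
{
    FILE *f = ctx;
    if (fseek(f, (long)(lba * BLOCK_SIZE), SEEK_SET) != 0)
        return -1;
    if (fwrite(buf, BLOCK_SIZE, 1, f) != 1)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}

/* Binds the interface to an already opened disk image. */
struct blockdev host_blockdev(FILE *image)
{
    struct blockdev dev = { image, host_read, host_write };
    assert(image != NULL);
    return dev;
}
```

The same driver code can then be linked either against the in-kernel backend or against this one and run as an ordinary process from a terminal.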
Ali_A has joined #osdev
<Jari--> heat: my kernel is MS-DOS like, more than a microkernel
<Jari--> although it is linear memory space, non-segmented
<heat> how is it MS-DOS like?
<Jari--> heat: well I wrote API to be MS-DOS compliant
<Jari--> MS-DOS and C applications
<heat> you might be screwed
<Jari--> DJGPP really
Ali_A has quit [Client Quit]
<mrvn> You can port your kernel to posix as "hardware", using signals, mmap, mprotect, setitimer, ... to emulate all the hardware stuff. But it's a major undertaking. Or add a qemu-user-your-kernel backend to qemu.
<mrvn> having drivers access the hardware directly will make it basically impossible to do any of it though. You want to go through the API.
<Jari--> Sorry guys, I get migraine attacks so my talking is probably not the most consistent ever right now.
<mrvn> coding with a migraine is a bad idea. makes it worse and produces crap. better sleep it off.
<Jari--> mrvn: I keep rewriting the same functionalities, so it is sort of spaghetti code at worst.
<Jari--> Especially writing interpreters is difficult.
<Jari--> mrvn: I want my OS able to run Commodore Basic token binary programs.
<Jari--> Basically what I am now writing on kernel is it to be Linux like as much as possible.
<Jari--> UN*X OS does not have to be enormous to function, like 386BSD kernel f.e.
<heat> 386BSD was already pretty complex
<heat> same with all the previous BSDs
<Jari--> heat: if I drink coffee, my migraine vaporizes instantly
<Jari--> must be lack of caffeine
GeDaMo has joined #osdev
pretty_dumm_guy has joined #osdev
gorgonical_ has quit [Quit: Client closed]
gildasio1 has quit [Remote host closed the connection]
Jari-- has quit [Ping timeout: 256 seconds]
nyah has joined #osdev
pretty_d1 has joined #osdev
pretty_d1 has quit [Client Quit]
pretty_dumm_guy has quit [Ping timeout: 258 seconds]
pretty_dumm_guy has joined #osdev
<dostoyevsky2> isn't linux just like 250 syscalls?
Burgundy has joined #osdev
<heat> 400 and something but yeah
<Mutabah> plus ioctl/etc
<heat> plus ioctl, plus pseudo fses, setsockopts, etc
<heat> and probably more that I can't think of right now :P
<heat> glorified eBPF interpreter? :P
<mrvn> L4 has 6 syscalls
<mrvn> just for comparison :)
<dostoyevsky2> if you have a C program that implements a couple of syscalls, how difficult is it to get that C program boot up in qemu? Do you need to write your own boot code in asm, or could you just reuse something?
<mrvn> use the multiboot format and you can use qemu --kernel mykernel.elf
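For the Multiboot (version 1) format mrvn mentions, the loader looks for a small header near the start of the kernel image: a magic value, a flags word, and a checksum that makes the three fields sum to zero modulo 2^32. The sketch below only shows how the fields relate; placing the header in the right section of the ELF file is left to the linker script, and `make_multiboot_header` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define MULTIBOOT_MAGIC       0x1BADB002u
#define MULTIBOOT_PAGE_ALIGN  (1u << 0)  /* align loaded modules on page boundaries */
#define MULTIBOOT_MEMORY_INFO (1u << 1)  /* ask the loader to pass memory information */

struct multiboot_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum;
};

/* Fills in a header whose three fields sum to zero modulo 2^32. */
struct multiboot_header make_multiboot_header(uint32_t flags)
{
    struct multiboot_header h;
    h.magic = MULTIBOOT_MAGIC;
    h.flags = flags;
    h.checksum = 0u - (MULTIBOOT_MAGIC + flags);
    assert((uint32_t)(h.magic + h.flags + h.checksum) == 0);
    return h;
}
```

A real kernel normally emits these three words as constant data (often from assembly, as the Bare Bones tutorial does) rather than computing them at run time.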
<heat> dostoyevsky2, there's significant code behind loading a program
<heat> even more significant if you're doing it properly with the vfs and all that
arch-angel has joined #osdev
Clockface has quit [Ping timeout: 240 seconds]
arminweigl has joined #osdev
<dostoyevsky2> heat: couldn't you just compile a -fPIE/PIC program and thereby be able to simply load that blob into your memory and just jump to the code without any fancy loading?
<mrvn> with or without -fPIE/PIC makes no difference
<heat> those still need to be loaded
<zid> problem imo is cpu modes
<heat> a PIC program isn't just a blob you can run directly
<mrvn> And you need to setup the C runtime environment, meaning you need a stack.
<dostoyevsky2> if you don't have position independent code you'd need to setup proper virtual memory addresses, no?
<mrvn> dostoyevsky2: -fPIC is not position independent code
<zid> It's just as easy with or without, if you can specify the load address
<zid> I do wonder what you intend to provide the syscalls for though if you're not expecting to be loading stuff properly
<mrvn> and what will the syscalls do without malloc or printf or anything else
<mrvn> dostoyevsky2: you might want to read https://wiki.osdev.org/Barebones
<bslsk05> ​wiki.osdev.org: Bare Bones - OSDev Wiki
<heat> syscalls are another question
<heat> depends on the syscall, of course ;)
<dostoyevsky2> does tcc -run/libtcc actually create an executable or does it just directly generate the executable in memory and jumps to it?
<heat> idk
<mrvn> dostoyevsky2: depends on what you accept as executable.
<mrvn> is a bash script an executable?
<heat> why are you being obtuse
<heat> it's clearly an ELF
<mrvn> heat: I think he is talking about "#!/usr/bin/tcc -run/libtcc"
<dostoyevsky2> The simplest loadable binary format could be like: https://dginasa.blogspot.com/2012/10/brainfuck-jit-compiler-in-around-155.html
<bslsk05> ​dginasa.blogspot.com: A Dumb Guy in a Smart Age: Brainfuck JIT Compiler in Around 155 Lines with GNU C
<zid> The simplest is just .com
<zid> but requires.. a loader
<heat> a.out?
<mrvn> coff coff
<heat> yesterday I saw a hobby OS that used a.out
<heat> that was weird
<zid> that is infact, weird
<mrvn> in name or for real?
<heat> for real
<heat> a.out loader
<heat> it's very 80s of them
gildasio1 has joined #osdev
dude12312414 has joined #osdev
gog has quit [Ping timeout: 248 seconds]
<mrvn> I have the best format of them all: bin
<dostoyevsky2> > #define TCC_OUTPUT_MEMORY 0 /* output will be ran in memory (no output file) (default) */
<dostoyevsky2> tcc -run does not bother with a binary format
vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
<heat> you sure?
<heat> that just means you don't write to a file, you write to memory
<mrvn> and then mprotect it to make it executable and call it
<bslsk05> ​github.com: Search · TCC_OUTPUT_MEMORY · GitHub
<dostoyevsky2> heat: here for a use case, so no fancy loading necessary as tcc already arranged everything in memory for you to just jump to the code: https://github.com/TinyCC/tinycc/blob/82b0af74501bf46b16bc2a4a9bd54239aa7b7127/tests/libtcc_test.c#L104
<bslsk05> ​github.com: tinycc/libtcc_test.c at 82b0af74501bf46b16bc2a4a9bd54239aa7b7127 · TinyCC/tinycc · GitHub
lg has quit [Ping timeout: 240 seconds]
lg has joined #osdev
elastic_dog has quit [Ping timeout: 248 seconds]
elastic_dog has joined #osdev
torresjrjr has joined #osdev
vai has joined #osdev
vai is now known as Jari--
jstoker has quit [Quit: *disappears in a cloud of bits*]
jstoker has joined #osdev
<Jari--> I managed to crash lynx browser
<ddevault> this is bizarre
<ddevault> the page at 2000 gets overwritten when I zero out 120000
henistein has joined #osdev
LittleFox has quit [Quit: ZNC 1.8.2+deb2+b1 - https://znc.in]
<j`ey> mapped it twice?
<ddevault> I can't imagine so
<ddevault> it's part of the first 64G I have identity mapped
<heat> is 0x120000 there?
<ddevault> yep
<heat> info tlb pls
<ddevault> wut
<heat> info tlb
<j`ey> in qemu
<heat> the qemu command
* ddevault tries to remember how to summon the qemu console
<heat> tip: use -monitor stdio so you can easily copy stuff
<ddevault> I use stdio for serial
<j`ey> ctrl-a-c
<j`ey> iirc?
<heat> maybe
the_lanetly_052_ has quit [Ping timeout: 260 seconds]
<ddevault> massive dump
<heat> pastebin it
<ddevault> overflows my terminal buffer
<j`ey> post it directly into irc
<ddevault> perfect
<heat> lol
<heat> tee it to a file and pastebin that?
<ddevault> trying to figure out how to capture it
<heat> ^^
LittleFox has joined #osdev
<bslsk05> ​paste.sr.ht: paste.txt — paste.sr.ht
<ddevault> looks fine to me :<
<heat> where is any of that mapped?
<ddevault> ffffff8000000000: 0000000000000000 --PDA---W
<ddevault> the region with the address being overwritten
<ddevault> ffffff8001200000: 0000000001200000 --P-----W
<ddevault> the region I'm supposed to be writing to
<heat> why and when are you zeroing memory
<ddevault> it's a page table
<heat> are you sure you're not overwriting something you're using by accident
<ddevault> not yet
<ddevault> ruling something else out now
toluene has quit [Ping timeout: 244 seconds]
<mrvn> don't map ffffff8000000000 so nullptr mapped to virtual crashes
<ddevault> good call, will follow up on that later
<mrvn> and map text and rodata read-only
<ddevault> yeah, our elf loader is very basic
<ddevault> will improve that later
<heat> huh?
<heat> 0xffffff8000000000 has nothing to do with nullptr
<mrvn> why is anything in lower half still mapped?
<mrvn> heat: nullptr mapped to virtual
<ddevault> got it
<heat> mrvn, what's the problem?
<ddevault> my userspace page allocator was clearing pages regardless of if they are device memory or not
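The fix ddevault describes amounts to checking what kind of page is being handed out before scrubbing it. A minimal sketch of that check, with hypothetical names (`struct page`, `prepare_user_page`) that are not taken from the Helios source:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096

enum page_kind { PAGE_RAM, PAGE_DEVICE };

struct page {
    enum page_kind kind;
    unsigned char *mem;   /* kernel-visible mapping of the page */
};

/* Prepares a page before it is given to userspace.
   Ordinary RAM is zeroed so no stale kernel data leaks; device memory is
   left alone, since writing to it can have side effects.
   Returns true if the page was cleared. */
bool prepare_user_page(struct page *pg)
{
    assert(pg != NULL && pg->mem != NULL);
    if (pg->kind == PAGE_DEVICE)
        return false;
    memset(pg->mem, 0, PAGE_SIZE);
    return true;
}
```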
<heat> userspace what
<ddevault> err
<mrvn> heat: it's just one of those addresses you can end up by accident
<ddevault> the code which gives pages to userspace
<heat> :whew:
<heat> mrvn, nothing wrong happens if you touch the null page
<heat> it's a page like any other
<ddevault> there is nothing useful there, though
<mrvn> heat: that's not what this is about.
<ddevault> so might as well take the extra defense against errors
<heat> a real worry would be to have the 0x0 page mapped
<mrvn> although mapping stuff below 1MB is risky
<heat> the page frame? no, not a problem
Gooberpatrol66 has joined #osdev
<heat> risky how?
<mrvn> heat: there is reserved ram and mmio there you shouldn't mess with accidentally.
<heat> if you have a read/write primitive in the kernel you've already won
<heat> and then you're not looking to crash the machine, but to take it over
<mrvn> ddevault: do you have anything that parses the memory map and maps just the available parts of memory?
<heat> also important to note that you'll get a nice speed advantage if you map everything
<mrvn> heat: if you are taking over the machine then you write your own page tables and none of this matters anyway.
<heat> huge pages are fast
<ddevault> mrvn: yes
<mrvn> that depends on your cpu. Some have only a few TLB entries for huge pages.
<mrvn> ddevault: then why is ffffff8000000000 a 2MB page? That shouldn't be available in its entirety.
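mrvn's point is that a 2MB mapping at physical address 0 necessarily covers reserved areas below 1MB. If the mapper only used huge pages where the whole frame lies inside usable RAM, that first frame would fall back to smaller pages or stay unmapped. A sketch of such a check, assuming a simplified memory map format (`struct mem_region` and `huge_page_ok` are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HUGE_PAGE_SIZE (2ull << 20)   /* 2 MiB */

struct mem_region {
    uint64_t base;
    uint64_t length;
    bool usable;
};

/* True if the 2 MiB frame starting at `frame` lies entirely inside a single
   usable region. Frames that straddle two adjacent usable regions are
   rejected by this simple version. */
bool huge_page_ok(const struct mem_region *map, size_t count, uint64_t frame)
{
    assert(frame % HUGE_PAGE_SIZE == 0);
    for (size_t i = 0; i < count; i++) {
        uint64_t end = map[i].base + map[i].length;
        if (map[i].usable && frame >= map[i].base &&
            frame + HUGE_PAGE_SIZE <= end)
            return true;
    }
    return false;
}
```

With a typical PC memory map, the frame at 0 is rejected because usable low memory ends below 640KB, while the frame at 2MB is accepted.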
bliminse has quit [Quit: leaving]
<ddevault> hm
<ddevault> no clue
<bslsk05> ​todo.sr.ht: ~sircmpwn/helios#30: Why is ffffff8000000000 mapped as a 2MB page? — sourcehut todo
<heat> because you did it probably
* ddevault shrugs
<ddevault> will investigate later
<heat> in fact, i bet your page table mapping code is interpreting the huge page as a page table and writing to it
<heat> that makes so much sense and explains so much
<ddevault> feel free to dig into the code if you want
<ddevault> busy
bliminse has joined #osdev
<Jari--> ohh the simplicity of 32-bit memory management
<mrvn> how is that simpler?
<Jari--> its also easier test sandbox for a kmalloc
<Jari--> mrvn: well I am accustomed to it, no idea about 64-bit
<heat> 32-bit memory management is hell
<heat> it's the opposite of simple lol
<mrvn> gets bad when you have more than 1GB
<Jari--> yeah, I/O mapped memory crashes on more than 1 gig RAM
<Jari--> mrvn: heat: yeah
<Jari--> Ask UEFI for I/O memory mapped memory or PCI bus?
<heat> huh?
<Jari--> Where do you usually get the memory map on your system.
<Jari--> PCI gives I/O and memory addresses on my kernel.
<mrvn> from the bootloader
<heat> yes
<heat> what's the problem
<Jari--> multiple bridges might have issues
<heat> how so
<Jari--> heat, e.g. does AGP register up on PCI bridge?
<heat> you're talking to me about an old ass technology but yes afaik
<Jari--> Okay so PCI is really big business.
<Jari--> heat: on VMware I have run the kernel with 3 gigs of RAM, with luck.
<mrvn> how do you even fit the PCI devices into 32bit address space? How do you handle a GPU with 8GB ram?
<Jari--> mrvn: memory extension might support up to 8 gigs of RAM, supported in standard 32-bit Linux kernels
<heat> usually, it's not all mapped as a BAR
<heat> see: resizable BAR extensions
<mrvn> Jari--: with PAE? Now you made bad even worse.
Likorn has joined #osdev
<heat> no, PAE is 64GB
<Jari--> really thats awesome
<Jari--> I havent seen many 128 gig systems so far
<heat> this is probably the opposite of awesome
arch-angel has quit [Ping timeout: 252 seconds]
<mrvn> Does anyone still produce systems with PAE and no long mode?
<heat> the answer must be yes
<heat> :)
<Jari--> 64K limited DMA?
<Jari--> Lol should upgrade my drivers soon.
<heat> hm?
<heat> that's some old ass DMA
<heat> ISA DMA? something like that
divine has quit [Read error: Connection reset by peer]
<mrvn> Jari--: with PAE and 64GB ram you already have a problem because you need some space for PCI:
<mrvn>                total        used        free      shared  buff/cache   available
<mrvn> Mem:        64851252    45326304    16804872     1968532     2720076    16868532
<mrvn> Swap:       67108860      602112    66506748
divine has joined #osdev
<heat> free -h pls
<mrvn>                total        used        free      shared  buff/cache   available
<mrvn> Swap:           63Gi       588Mi        63Gi
<mrvn> Mem:            61Gi        43Gi        15Gi       1.9Gi       2.6Gi        15Gi
<heat> thank
<heat> is that a 32-bit machine you're running 64GB of ram on?
<mrvn> no. That would be insane.
<heat> why 61 then?
<mrvn> stupid bios, pci memory hole, shared memory with the gpu
<heat> the PCI memory hole won't take away your ram though
<mrvn> heat: with a stupid bios it does
<heat> /unless/ you're running a 32-bit kernel
<heat> huh
Likorn has quit [Quit: WeeChat 3.4.1]
<heat>                total        used        free      shared  buff/cache   available
<heat> Mem:           7.7Gi       5.1Gi       293Mi       1.1Gi       2.3Gi       1.2Gi
<heat> Swap:          8.4Gi       2.7Gi       5.7Gi
<heat> all good here
<qookie> with a stupid bios everything is possible :^)
<mrvn> why 7.7 and not 8?
<heat> fuck do I know
<heat> pre-used memory probably
<mrvn> probably the same, a hole below 4GB for 32bit pci
<heat> hmm no
<heat> I don't think so
<heat> lets see
<mrvn> do you still have the memory map in the dmesg output?
<heat> let me journalctl
<qookie> on my system i have 15.5G available to the OS according to lsmem, and 14G usable according to free
<qookie> and that matches up with 512M of stolen memory for the igpu
<qookie> 0x0000000000000000-0x00000000cfffffff 3.3G online yes 0-25
<qookie> 0x0000000100000000-0x000000040fffffff 12.3G online yes 32-129
<heat> oh yessss
<heat> probably stolen memory
<bslsk05> ​gist.github.com: gist:11e38ddc8df738bf058406f00f221512 · GitHub
<mrvn> it's odd, 0x400000000 is 16GB. So it didn't just punch a hole where the PCI region is but remapped the ram. But then it should go up to 0x43fffffff. So something is stealing from that.
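[editor's note] The sizes behind qookie's two lsmem ranges can be checked with a few lines of arithmetic. This is only a sketch: the 16 GiB installed figure is an assumption taken from mrvn's reading, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ull * 1024ull)
#define GiB (1024ull * MiB)

/* Size in bytes of an inclusive physical address range [start, end]. */
static uint64_t range_size(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return end - start + 1;
}

/* Sum of the two online ranges quoted from lsmem above. */
static uint64_t online_total(void)
{
    return range_size(0x0000000000000000ull, 0x00000000cfffffffull) +  /* 3.25 GiB  */
           range_size(0x0000000100000000ull, 0x000000040fffffffull);   /* 12.25 GiB */
}
```

The two ranges add up to 15.5 GiB, so under the 16 GiB assumption the shortfall is 512 MiB, the same figure qookie gives for memory stolen by the iGPU.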
<mrvn> heat: you have a huge hole there in the lower 4GB
xj0hn has quit [Ping timeout: 240 seconds]
ethrl has joined #osdev
ethrl has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
ethrl has joined #osdev
ethrl has quit [Client Quit]
<ddevault> I bet I can port doom fairly soon
<ddevault> without audio, that is
ethrl has joined #osdev
ethrl has quit [Client Quit]
lg has quit [Ping timeout: 260 seconds]
Likorn has joined #osdev
<mrvn> ddevault: do you have pong? Snake? frogger?
<ddevault> no, but why walk when you can run
mahmutov has joined #osdev
mahmutov_ has joined #osdev
mahmutov_ has quit [Client Quit]
heat has quit [Ping timeout: 248 seconds]
henistein has quit [Quit: Client closed]
<doug16k> neat, I didn't know my system had RAM all the way up to 0xdfffffff. I wonder if the hole would be bigger if I booted with CSM
ethrl has joined #osdev
ethrl has quit [Client Quit]
lg has joined #osdev
vdamewood has joined #osdev
ethrl has joined #osdev
ethrl has quit [Client Quit]
vinleod has joined #osdev
vdamewood has quit [Killed (lead.libera.chat (Nickname regained by services))]
vinleod is now known as vdamewood
ethrl has joined #osdev
ethrl has quit [Client Quit]
Burgundy has quit [Ping timeout: 240 seconds]
srjek has joined #osdev
<geist> also if you're using a discrete GPU it tends to be bigger, since 'stolen' graphics ram tends to be just off the top of what appears to be the end of ram
<geist> sometimes you can probe past it and actually find the framebuffer
<geist> er smaller. . okay to rephrase, integrated graphics tend to steal ram off the top of where ram appears to stop
<geist> ie, it'll say ram goes up to 0xb000.0000 but actually there's a chunk at b... to 0xc000.0000 that's just not accounted for
<geist> but it actually ends up being a chunk of ram that is given to the graphics card
<qookie> do dgpus actually steal any CPU memory?
<geist> not that i know of, aside from whatever stuff the driver might allocate locally
<geist> dgpus have their own little address space in their universe, and their own mmu to see their own stuff
<qookie> ah I misunderstood what you said
<geist> yah that's cause i wrote it backwards on the first line. igpus are the one that steals cpu memory
<qookie> yeah
<qookie> but these days afaik they don't steal much, most memory is mapped in via the GTT
<qookie> on my system (integrated AMD Vega 6 or 7) only 512M is stolen, but games can use way more
<geist> yah, though usually something like 64-256MB in my experience
<geist> yah exactly. enough that it causes your end of lower ram to appear to stop shorter than it should
xenos1984 has quit [Read error: Connection reset by peer]
<geist> considering where PCI space starts up, etc
<qookie> yeah
<geist> i found a lot of this by fiddling with the TOLUD MSRs and whatnot on AMD and the intel equivalent
<geist> it's how some of that sausage is made basically
<geist> the registers that control where the cpu stops trying to decode DRAM and starts trying to decode mmio space
sonny has joined #osdev
xenos1984 has joined #osdev
<geist> awww crap. a week after installing the new motherboard: server locked up, exactly the same way
<zid> Turns out once a week, the cleaner comes past the server, plugs a vacuum cleaner into the same outlet, and vacuums the floor
<geist> pretty much
<geist> this mobo is kinda neat though: it has the built-in aspeed thing so i can log into a web page and see the console and reset it and whatnot
<geist> nothing interesting on the event log though
<zid> oh yea I've seen those controllers
<zid> It's a bit like ME but vendor specific I guess
<zid> https://www.aspeedtech.com/server_ast2500/#:~:text=AST2500%20is%20ASPEED's%206th%20generation,best%20performance%20server%20management%20solution
<zid> I happened to have looked at this one last week
<geist> yah i think they're fairly ubiquitous. has a little OS on it that is running some thing. yep exactly that one
<zid> I think the main reason is that if you're adding a 2MB VGA framebuffer from matrox, you might as well get this thing instead
<geist> if you have a DGPU on board it doesn't show up on PCI, but somehow the bios has some sort of knowledge to enable its VGA feature if it doesn't see a DGPU
<geist> 100% and you can then be actually headless with it and still get a console
<geist> though the little web page it serves leaves a bit to be desired, and i would not expose it to anything from a security point of view
Burgundy has joined #osdev
<zid> yea nor ME, as it turns out
<zid> it's had exploits before in its various stacks, which always ends up making the news
<geist> so now i'm starting to believe that zen 2s are simply not stable as a long term system. i've now seen a 3900x and 3950x fail the same way. they're running kinda hot but not super hot
<zid> I wonder how it gets its firmware and stuff, I guess the bios is just adapted to knowing it's there and blats it in at device discovery time or something
<geist> i suppose it could be the PSU or memory though, so i guess i can start popping pairs of ram and see. need to establish what the new MTBF is in the new regime. ran for 8 days before locking up this time
<zid> You could still blame your PSU if you like
<zid> or ram yea
<geist> it just seems so unlikely
<zid> eh my machine is perfectly stable until you load enough of the cores for long enough
<zid> which on a "random server setup" is actually very unlikely
<zid> (the 3.3V rail issue I discussed before)
<geist> in this case it doesn't seem to be load related, or even related to being a VM host or not. seems to fail equally fast if i am running a bunch of qemu instances or not. usually fails when it's not loaded at all
<zid> Could be failing to come out of a sleep state then?
<zid> asks the VRMs for more juice, they try to draw it from the psu, psu ramps it too slow
<geist> i suppose it's possible it could simply be a linux bug, but that seems highly unlikely
<zid> and everything is undervolted for a bit
<geist> maybe?
<zid> I can't rule it out, at least
<zid> so a psu is a thing you could certainly try
<geist> yah, actually plan on moving to another case tomorrow anyway, so i'll switch to another equal but different PSU at the same time
<geist> have basically the same case for a test machine that has more vents, so going to move it there and install some more fans so it can hopefully run cooler
<geist> it's a nice cheapo case. corsair 100r case. nice solid cheap case
<geist> lots of drive bays and can hold a full atx
<zid> My case is a bit of cheap whatever that I took the window out of to make the cooler fit :P
<zid> the ssds are hanging down from their psu cables
<geist> yeaaaaaaah
kingoffrance has joined #osdev
sonny has quit [Ping timeout: 252 seconds]
sonny has joined #osdev
GeDaMo has quit [Quit: There is as yet insufficient data for a meaningful answer.]
sonny has left #osdev [#osdev]
heat has joined #osdev
<heat> geist, TOLUD is a thing on chipsets for intel
<heat> you'll see it in your chipset docs and there are a bunch of references to it in i915 docs
<geist> yeah there's some AMD equivalent
<geist> like many things i think the AMD one is more straightforward, but its called something vaguely similar
<geist> ah TOPMEM
<heat> i've seen those "hidden" devices for my chipset
<heat> mine has a device that gets hidden after booting (somewhere in the SEC phase IIRC)
<bslsk05> ​fuchsia.googlesource.com: zircon/kernel/platform/pc/pcie_quirks.cc - fuchsia - Git at Google
<heat> completely stops responding to PCI accesses
<heat> it's fascinating
<geist> that's the AMD equivalent. basically trying to read where the PCI allocation space starts, for both the 32 and 64bit regions
<heat> what for?
<geist> i think the idea at the time is if we have to allocate space for PCI devices we need to compute what the aperture is available to us
<geist> i dont think it's really used, but we went ahead and wrote the code anyway, in case it was needed
<geist> but looking at the proper TOLUD/TOPMEM was needed because of the stolen graphics memory thing i was talking about before
<geist> RAM may appear to stop at some address, but it may actually extend past it in stolen graphics that appears as a unused chunk in the e820 stuff
<geist> so if you check TOLUD/TOPMEM you can find where the proper end of DRAM is
<heat> TOPMEM just the lower part?
<heat> is TOPMEM*
<geist> yah and there's a TOPMEM2 that is the end of the 64bit mapping
<geist> i dont know how you find that second spot on intel hardware. this is where AMDs is much more straightforward
<geist> just a pair of MSRs that tell you precisely what you want to know
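[editor's note] A rough sketch of the pair of MSRs geist mentions. The MSR numbers are AMD's TOP_MEM and TOP_MEM2; the bit-field mask (address in bits 51:23) is stated here as an assumption to verify against the AMD programmer's manual, and the helper names are hypothetical. The `rdmsr` wrapper only works in ring 0, so it is shown but not exercised.

```c
#include <assert.h>
#include <stdint.h>

/* AMD MSR numbers for what geist calls TOPMEM and TOPMEM2. */
#define MSR_AMD_TOP_MEM   0xC001001Au
#define MSR_AMD_TOP_MEM2  0xC001001Du

/* Assumption: the address field occupies bits 51:23 (8 MiB granularity). */
#define TOP_MEM_ADDR_MASK 0x000FFFFFFF800000ull

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Ring 0 only: rdmsr raises #GP in user mode or when the MSR is absent. */
static inline uint64_t rdmsr(uint32_t msr)
{
    uint32_t lo, hi;
    __asm__ volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
    return ((uint64_t)hi << 32) | lo;
}
#endif

/* Strip reserved bits from a raw TOP_MEM / TOP_MEM2 value. */
static uint64_t top_mem_decode(uint64_t raw)
{
    return raw & TOP_MEM_ADDR_MASK;
}

/* DRAM that sits between the end of RAM reported by e820 and the real
 * top of low DRAM, for example memory stolen for integrated graphics. */
static uint64_t hidden_below_top(uint64_t top_of_dram, uint64_t e820_ram_end)
{
    assert(top_of_dram >= e820_ram_end);
    return top_of_dram - e820_ram_end;
}
```

With geist's earlier example, where e820 says RAM ends at 0xb0000000 but DRAM is decoded up to 0xc0000000, `hidden_below_top(0xc0000000, 0xb0000000)` comes out to 256 MiB.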
<heat> btw I've seen AMD is adding some stuff to the EFI memory map on their new supercomputer platform
<heat> basically the gpu mem gets put in the memory map as well
theruran has joined #osdev
<geist> yah, wonder if that generally starts >4GB above TOPMEM2?
<geist> that would be fair game, since that's not decoded as ram
<heat> does TOLUD/TOPMEM include SMRAM?
<geist> dunno what SMRAM is
<heat> system management ram
<geist> dunno
<geist> does SMRAM even show up in the cpu's address space at all?
<heat> the fw steals a good chunk of memory to have as smm state and smm code
<heat> I believe so, you just can't touch it
* geist nods
<geist> in that case it's probably contained within TOLUD since the idea is that's where the cpu stops trying to decode these as memory controller addresses
<heat> fun fact: it grows based on the number of cpus
<geist> (640k hole notwithstanding)
<geist> i remember on AMD at least there's a set of MSRs that control the 640k hole. iirc a bitmap of 64k chunks. you can configure it such that the 640k hole doesn't exist if you want
<heat> how are those MSRs?
<heat> those should be chipset details afaik
<geist> yes, but all modern x86s are chipsets as well
<geist> this is the SOC side of the world
<heat> intel exposes everything on the PCI bus
<geist> AMD has fully embraced this and simply created a bunch of MSRs to configure stuff like this, root pci bus stuff, even the memory controller itself
<heat> just pci device registers everywhere
<geist> intel sticks to the old model and exposes it as pci stuff
<geist> yeah this is where AMDs solution is far more straightforward
<heat> actually shouldn't you have to catch any exceptions when reading those?
<heat> how do you know they're there?
<geist> when you look at it through that lens an AMD SOC looks a hell of a lot like a standard ARM SOC. a cpu with a bunch of system control registers to set up the world
<geist> intel at least pretends that there's some chip on the other side of the bus that configures everything
<geist> the pci bus that is
<heat> you're just checking if its an AMD cpu. how do you know if that specific platform has it?
<geist> you are the bios, you probably simply know
<geist> this is bios level stuff
<geist> you can read the cpuid and see what cpu it is
<heat> I mean, this particular fuchsia code
<geist> checks for vendor AMD
<heat> and all AMDs have it?
<geist> all AMDs we care about, but we also have a safe msr routine that catches the trap
<geist> it's not foolproof for sure. but i also am not sure this code is even called anymore since we moved PCI into user space
<geist> this particular routine may be vestigial
<geist> the general concern in my mind is when you boot on a virtual machine that exposes AMD but acts like something else
<geist> but in general not having a safe msr routine is annoying. i think down in the exception code we have some sort of mechanism for that
<geist> like if the #GP address is a particular non inlined msr instruction, set an error code and return
<geist> annoying but basically necessary at some point
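[editor's note] One possible shape for the "safe msr" mechanism geist describes is a small fixup table consulted by the #GP handler. This is only an illustrative sketch and not the Zircon code: `trap_frame`, `msr_fixup` and `gp_try_fixup` are hypothetical names, and using rax to carry the error is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of the saved register state. */
struct trap_frame {
    uint64_t ip;    /* address of the faulting instruction */
    uint64_t rax;   /* used here to carry an error code back */
};

/* One entry per non-inlined rdmsr/wrmsr instruction that is allowed to fault. */
struct msr_fixup {
    uint64_t fault_ip;   /* address of the rdmsr/wrmsr instruction */
    uint64_t resume_ip;  /* where to continue after a fault */
};

/* Called from the #GP handler. If the fault came from a known MSR access,
 * flag an error, redirect execution past it, and report it as handled. */
static bool gp_try_fixup(struct trap_frame *frame,
                         const struct msr_fixup *table, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (table[i].fault_ip == frame->ip) {
            frame->rax = (uint64_t)-1;          /* error code for the caller */
            frame->ip = table[i].resume_ip;     /* skip the faulting access */
            return true;
        }
    }
    return false;   /* not ours: treat as a genuine #GP */
}
```

Keeping the MSR access in a non-inlined routine means there is exactly one known instruction address to match, which is why geist singles out "a particular non inlined msr instruction".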
eroux has quit [Ping timeout: 244 seconds]
eroux has joined #osdev
<qookie> geist: instead of poking at platform regs, i think you can just ask acpi about the root bus resources if you need to allocate bar space for devices etc
<geist> yah i think that's the actual correct way, i think. problem is of course i think that involves the complex, bytecode parsing parts of ACPI
<geist> which we dont want in the kernel
<geist> but now that we moved pci driver into user space it's possible to get that stuff
<qookie> and besides without driver support for having bars move somewhere else, you're at the mercy of the fw for how much space it assigned to the bridge if you're allocating to a device behind one
<qookie> linux just bails if it's not enough
gorgonical has joined #osdev
<gorgonical> Porting this OS is a lot of work
<gorgonical> When do I get my certificate of genuine hackerman from geist
<geist> whatcha porting?
<gorgonical> A linux-ish kernel used in hpc stuff
<gorgonical> The ARM64 port that I half-did is also wildly incomplete in some areas. E.g. I'm 80% sure processes cannot receive signals on the arm64 port lol
<geist> ah
<gorgonical> We're doing some in-house risc-v design and having a kernel we can modify simply would be nice
<gorgonical> Linux works on a lot of the boards but good luck modifying any major subsystem
<geist> yah makes sense
<gorgonical> It's obvious to any of you I'm sure but there's just a lot of small details to resolve -- where does the TLS pointer go? How is the kernel stack arranged? What does context save/restore look like? What about traps/exceptions? All the hand-asm for things like atomics, etc. And they all vary by architecture
<qookie> speaking of TLS, is there any concrete docs on how it's supposed to work on aarch64? i only found a fuchsia.dev page about it and got the rest of the info i needed from looking at musl, guessing, and looking at our existing code for x86
<qookie> (the last part to figure out how TLS works in general :P)
<gorgonical> my understanding is that the TLS ptr is stored in tpidr_el0
<gorgonical> I don't know of any concrete docs. Is it just agreed-on convention between libc and the kernel? "Let's use tpidr_el0 and both of us agree not to clobber it?"
<geist> yah that's to me the fun part. figuring out all the arch specific details
<geist> seeing how one thing maps to another thing
<qookie> yeah that much i figured, i mean the docs on how userspace expects it to be laid out
<geist> qookie: pretty sure it's well documented in the arm docs github
<geist> thats where the official ELF specs and whatnot exist
<qookie> i haven't found anything there about that in the elf abi supplement
<gorgonical> geist: it's definitely very exciting and I love the feeling of programming the machine itself. But each file I have to adapt and don't get to test makes me more afraid to run it
<qookie> for example, userspace expects that tls blocks start at TPIDR_EL0+0x10
<geist> i think it may be a similar one
<geist> ah yes. that may be where the ELF spec stops and the OS specific spec begins
<geist> or at least libc specific spec begins
<qookie> and the linker will sometimes hardcode an offset based on that assumption, for example in the local-exec model
<geist> yep
<gorgonical> and there's just a certain amount of work until the kernel will build at all, much less run
<qookie> but i have not found an official document from arm (nor GNU or anyone who makes toolchains) about that layout requirement
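[editor's note] The layout qookie describes (TLS blocks starting at TPIDR_EL0+0x10, with the linker hardcoding offsets for local-exec) can be sketched as a small offset calculation. The assumptions here: a 16-byte reserved area above the thread pointer, rounded up to the TLS segment's alignment, with the segment's start address itself aligned. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reserved bytes above the thread pointer before the first TLS block,
 * matching the TPIDR_EL0+0x10 figure quoted above. */
#define AARCH64_TCB_SIZE 16u

/* Round 'v' up to a power-of-two 'align'. */
static uint64_t align_up(uint64_t v, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (v + align - 1) & ~(align - 1);
}

/* Offset from the thread pointer that a local-exec access would use for a
 * variable at 'sym_off' inside the executable's TLS segment, assuming the
 * segment has alignment 'p_align' and starts at an aligned address. */
static uint64_t aarch64_le_tprel(uint64_t sym_off, uint64_t p_align)
{
    return align_up(AARCH64_TCB_SIZE, p_align) + sym_off;
}
```

For the common case of alignment 16 or less, the first variable lands at thread pointer + 16; a 64-byte aligned segment pushes it to thread pointer + 64, which is why a libc has to take the segment alignment into account when placing the block.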
gorgonical has quit [Quit: Client closed]
gorgonical has joined #osdev
<gorgonical> Very rude network
<geist> but yes, FWIW TPIDR_EL0 is the user space TLS root
<geist> there's a TPIDRRO_EL0 which as far as i know has no real use anywhere. not even sure linux uses it for anything
<geist> may do something like put the cpu # in it or whatnot
<qookie> linux uses it to stash a register in the interrupt handler in some code path
<geist> TPIDR_EL1 is of course up to the kernel to use, but generally it holds a pointer to either the current cpu structure or the current thread structure
<qookie> but yeah, we have tls more or less working in our libc, but i'm just annoyed i couldn't find any documentation about it
<geist> also note that x18 is free for the platform to use, in both kernel and user space. the abi says either it's temporary or platform use
<j`ey> qookie: its in the stack overflow path
<gorgonical> klange was saying you have to pass a compiler flag to make sure the compiler doesn't use it, right?
<gorgonical> about x18
<geist> -ffixed-x18 yes. otherwise it's up to the triple to default it to whatever use it has
<gorgonical> Ah which may be just a gpr
<geist> ya if the platform doesn't use it then it's another temporary
<geist> since x16, x17 are otherwise just interprocedural temporaries
<geist> x18 is too if it has no other use. it's basically the highest temporary, since x19 has some use
<geist> it's the first of the callee saved ones
<heat> I also want a certificate of genuine hackerman from geist
<heat> what's the final test
_xor has quit [Ping timeout: 246 seconds]
<kingoffrance> run on vax :D
<heat> "get a vax" sounds pay2win to me
<kazinsal> relax, send a fax to a vax; certified hax
<geist> noice
<mjg_> vaxcine
<mjg_> get it
<heat> hahahahahaha
<heat> omg
<heat> so funny
<mjg_> ikr
<heat> 😂😂😂😂😂😂
<heat> i should use emojis more
<heat> see if your shitty systems break
<mjg_> i'm on ubuntu so you are not far off
<heat> 🚫Ubuntu, 👍👌Arch linux, which I do use
<j`ey> btw
<psykose> btw
mahmutov has quit [Quit: mahmutov]
mahmutov has joined #osdev
<Ermine> btw
gorgonical has quit [Quit: Client closed]
mahmutov has quit [Remote host closed the connection]
mahmutov has joined #osdev
mahmutov has quit [Remote host closed the connection]
mahmutov has joined #osdev
mahmutov_ has joined #osdev
mahmutov has quit [Remote host closed the connection]
<klange> < geist> there's a TPIDRRO_EL0 which as far as i know has no real use anywhere. not even sure linux uses it for anything ← it's the thread pointer on macOS, contrary to everyone else :)
heat has quit [Remote host closed the connection]
Brnocrist has quit [Ping timeout: 244 seconds]
heat has joined #osdev
Brnocrist has joined #osdev
mahmutov_ has quit [Ping timeout: 240 seconds]
opal has joined #osdev
bradd has quit [Ping timeout: 248 seconds]
pretty_dumm_guy has quit [Quit: WeeChat 3.5]
vdamewood has quit [Quit: Life beckons]
nyah has quit [Ping timeout: 276 seconds]
bradd has joined #osdev
nickster has quit [Ping timeout: 240 seconds]
dragestil has quit [Ping timeout: 248 seconds]
acidx has quit [Ping timeout: 248 seconds]
dzwdz has quit [Ping timeout: 260 seconds]
Ameisen has quit [Ping timeout: 260 seconds]
thaumavorio has quit [Ping timeout: 260 seconds]
Brnocrist has quit [Ping timeout: 276 seconds]
merry has quit [Ping timeout: 276 seconds]
gruetzkopf has quit [Ping timeout: 248 seconds]
Emil has quit [Ping timeout: 256 seconds]
Brnocrist has joined #osdev
varad has quit [Ping timeout: 244 seconds]
Vercas has quit [Remote host closed the connection]
Vercas has joined #osdev
dragestil has joined #osdev
Ameisen has joined #osdev
thaumavorio has joined #osdev
nickster has joined #osdev
<graphitemaster> geist, see the new floppotron?
acidx has joined #osdev
<bslsk05> ​'The Floppotron 3.0 - Computer Hardware Orchestra' by Paweł Zadrożniak (00:03:25)
Burgundy has quit [Ping timeout: 248 seconds]
<mjg_> solid t-shirt
merry has joined #osdev
gruetzkopf has joined #osdev