<klange>
I just replied to someone on the forum by quoting one part of their post, and then quoting another part. I love when people answer their own questions but don't realize it.
<zid>
My favourite is when they researched something for too long without understanding any of it, and have like 30 contradictory made-up things they expect you to dismantle
<geist>
yah i think there were about 1500 beboxes total. i dont remember what the split was between 66 and 133 mhz versions
<gog>
which do you have?
<geist>
and i do remember there were some ones with black plastic that were 200mhz, but they were never sold
<geist>
i dont think they were totally stable
<geist>
133s
<gog>
nice
<geist>
they're not really that fast. iirc they were 603s, not 603es, and the issue was either the L1 or L2 cache had to be disabled in the chipset. i think L2
<gog>
dang :(
<geist>
because of SMP issues, so i do remember there were a lot of situations where disabling one cpu was faster
<zid>
still true today
<geist>
but there was the Pulse app on beos that showed instantaneous load of the cpus, and had a little button next to each cpu
<geist>
so you could easily just turn one or more of the cpus off
<zid>
My CPU is significantly faster if you turn all the APs off and up the multiplier :p
<gog>
was this a soft off or did the cpu actually power down?
<geist>
soft off, just was some syscall that descheduled particular cpus
<gog>
ah ok
<geist>
not really any reason one couldn't build an app like that on windows or linux, assuming it has the appropriate permission to do so, but with beos you were basically root
<gog>
yeh
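(A minimal sketch of the kind of tool geist describes, assuming Linux's sysfs CPU-hotplug interface; the path is the real kernel knob, but error handling is trimmed and it needs root.)

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Soft-offline CPU 1 by writing "0" to the Linux hotplug knob;
 * writing "1" brings it back. The scheduler stops using the core,
 * much like the per-CPU buttons in BeOS's Pulse. */
int main(void)
{
    int fd = open("/sys/devices/system/cpu/cpu1/online", O_WRONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (write(fd, "0", 1) != 1)
        perror("write");
    close(fd);
    return 0;
}
```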
<geist>
beos really was a pretty slick system at the time. especially compared to win9x and macos classic
<geist>
it seemed much more like an advanced irix or nextstep
<gog>
i used to use this one exec wrapper on windows to force cpu affinity because simcity 4 had issues with it
<gog>
back when i got my first dual core
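(A sketch of the wrapper trick gog mentions, assuming the Win32 API; the executable name and the single-CPU mask are illustrative. The child is created suspended so the affinity mask is in place before it ever runs.)

```c
#include <windows.h>

int main(void)
{
    STARTUPINFOA si = { sizeof(si) };
    PROCESS_INFORMATION pi;

    /* start the game suspended so we can pin it before it executes */
    if (!CreateProcessA("SimCity4.exe", NULL, NULL, NULL, FALSE,
                        CREATE_SUSPENDED, NULL, NULL, &si, &pi))
        return 1;
    SetProcessAffinityMask(pi.hProcess, 1); /* bit 0 set = CPU 0 only */
    ResumeThread(pi.hThread);
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 0;
}
```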
<geist>
heh yeah i remember that too. had a dual ppro back in college that i ran NT 4 on and was always tinkering with SMP
<geist>
early exploration of OS kernel stuff basically
<gog>
it wasn't that far back, this was like core duo :P
<geist>
yah now that i think about it i guess that q6600 i got somewhere in the mid 2000s was pretty much the first > 2 cpu machine i ever owned. that thing was a real trooper
<geist>
though i ended up replacing it pretty quickly with some sort of 4 core athlon
rwxr-xr-x has quit [Read error: Connection reset by peer]
<geist>
but i did always make a note of always having a SMP machine as my main box since about 1996-1997 when i got the dual ppro machine
<geist>
only real step back was the brief period where i had a hyperthreaded P4 (the box i was fiddling with last month) though it was still technically SMP
gog` has joined #osdev
<geist>
didja get kicked off?
<gog`>
no I packed up my computer for the night
andreas303 has joined #osdev
nvmd has joined #osdev
gog has quit [Ping timeout: 252 seconds]
dennis95 has joined #osdev
andreas303 has quit [Ping timeout: 264 seconds]
dennis95 has quit [Ping timeout: 260 seconds]
heat has quit [Remote host closed the connection]
heat has joined #osdev
gog` has quit [Read error: Connection reset by peer]
gog has joined #osdev
andreas303 has joined #osdev
gildasio has quit [Remote host closed the connection]
dennis95 has joined #osdev
gildasio has joined #osdev
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
epony has joined #osdev
<heat>
geist, swapcontext is this old UNIX/POSIX thing you could use to swap user contexts
<heat>
for e.g. makeshift user threads
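(A minimal sketch of the makeshift-user-threads trick heat describes, using the real <ucontext.h> calls; swapcontext is obsolescent in POSIX but still available on Linux and the BSDs.)

```c
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;

static void coroutine(void)
{
    puts("in makeshift user thread");
    swapcontext(&co_ctx, &main_ctx);   /* yield back to main */
    puts("coroutine resumed");
}                                      /* returning follows uc_link */

int main(void)
{
    static char stack[64 * 1024];
    getcontext(&co_ctx);               /* must precede makecontext */
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = sizeof(stack);
    co_ctx.uc_link = &main_ctx;        /* where to go when it returns */
    makecontext(&co_ctx, coroutine, 0);
    swapcontext(&main_ctx, &co_ctx);   /* run until the first yield */
    swapcontext(&main_ctx, &co_ctx);   /* resume it to completion */
    puts("back in main");
    return 0;
}
```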
heat has quit [Ping timeout: 260 seconds]
dude12312414 has joined #osdev
gildasio has quit [Ping timeout: 255 seconds]
remexre has quit [Remote host closed the connection]
remexre has joined #osdev
Clockface has joined #osdev
<Clockface>
does x86 have an instruction for exponentiation?
<Clockface>
if it does, can i trust the compiler to use it when i call pow()
<zid>
no, no
<zid>
exponentiation has a pretty nice software trick to compute it fast though
<Clockface>
what is it?
<Clockface>
i cant use the bit shift one since im not doing ^2
<zid>
You just decompose it into powers of two
<zid>
n^7 is n^4 * n^2 * n^1 and n^4 is n^2 * n^2, and n^2 is n^1 * n^1
<zid>
so for ^129 you can do it in a couple of multiplies instead of.. 129
<zid>
etc
<Clockface>
thanks
<zid>
so basically just shift out the exponent, multiplying a running total by n each shift, and multiply the result by that total if the bit was 1
<zid>
few lines total
<zid>
for floating point.. hope the cpu can do it for you :D
<mrvn>
for floating point it uses a power series.
<mrvn>
Clockface: google for square and multiply algorithm
<Clockface>
alright, im not using floating points this time around
<mrvn>
f^f or f^i?
<mrvn>
Note that even for long long int the exponent has to be really small. 0^x is 0, 1^x is 1, and for everything else the exponent has to be < 64 or it overflows. So there are really few factors you ever need.
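(A sketch of the square-and-multiply loop zid outlines; ipow is an illustrative name, not a standard function. As mrvn notes, with a 64-bit result and a base of 2 or more, exponents of 64 and up always overflow.)

```c
#include <stdint.h>

/* Square-and-multiply: shift out the exponent bit by bit, squaring a
 * running power of n each step and folding it into the result when
 * the bit is set. O(log e) multiplies instead of O(e). */
static uint64_t ipow(uint64_t n, unsigned e)
{
    uint64_t result = 1;
    while (e) {
        if (e & 1)
            result *= n;   /* this exponent bit is set */
        n *= n;            /* n, n^2, n^4, n^8, ... */
        e >>= 1;
    }
    return result;
}
```

For n^129 that works out to eight squarings plus two multiplies into the result, rather than 128 sequential multiplies.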
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
smeso has quit [Quit: smeso]
eryjus has joined #osdev
eryjus has quit [Read error: Connection reset by peer]
smeso has joined #osdev
bradd has joined #osdev
<moon-child>
avx512 has an instruction for exponentiation
<moon-child>
(a fact which I'm sure will give zid great joy)
<moon-child>
(hah, x87 has--well, not an exp instruction, but a log instruction. We really have come full circle)
<zid>
x87 has trig, so avx is still bad
<moon-child>
avx software trig is faster than x87 trig I don't make the rules
gildasio has quit [Ping timeout: 255 seconds]
gildasio has joined #osdev
<geist>
yah in general i think you can do trig faster with modern instructions than using builtins
<geist>
i think that's why sse/avx doesn't have it. nor does ARM
<zid>
at 32bit or 80bit?
* geist
shrugs
<zid>
(32bit is arguably more useful though)
<geist>
i just remember looking at the benchmark numbers for a discrete vs builtin trig function thing on x86 and ARM and was surprised
<zid>
32bit probably needs so few rounds that it's trivial
<geist>
yah
<geist>
and really the builtins are just running microcode anyway, they dont provide a number in one cycle
<zid>
It'd be interesting to see an alternate reality cpu where the entire ISA was optimized down to 1 cycle, what the die size change would be, what beats avx now, blah blah, etc etc
<moon-child>
I heard back in the early days somebody was able to do popcnt faster than the hardware
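(For the curious, the classic SWAR bit-trick below is the sort of software popcount meant; a generic sketch, not the specific implementation moon-child recalls.)

```c
#include <stdint.h>

/* Count set bits without a POPCNT instruction: sum adjacent bits in
 * 2-, 4-, then 8-bit groups, then fold the per-byte sums together
 * with a multiply so the total lands in the top byte. */
static inline int popcount64(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (int)((x * 0x0101010101010101ULL) >> 56);
}
```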
<zid>
I wonder if it's basically free in terms of the former, but it's like any big software project and nobody is allowed to mess with legacy features, they're all on teams to add avx etc
<moon-child>
what would be the point of taking everything down to 1 cycle?
<geist>
trouble is, if you optimize everything to 1 cycle, now that one cycle takes too long
<geist>
so you trade IPC for CPS
<zid>
I'd like to see a *current* breakdown of which ops break first under higher clocks too
<geist>
or you mega pipeline stuff, ie the P4 method
<moon-child>
zid: overclock your cpu, try out a bunch of instructions, and see what happens
gildasio has quit [Remote host closed the connection]
gildasio has joined #osdev
<\Test_User>
zid: easy, reduce clock to 1mhz, and do several parts in sequence within the 1mhz clock. 1 cycle :P
<zid>
moon-child: It's hard to do that in isolation, lots of bios code and stuff runs first
<zid>
and even beyond that, you need to do all the power management code + bench code yourself, bare metal
<moon-child>
\Test_User: .oO( vliw time? )
<\Test_User>
moon-child: multi-cycle vliw so future cpus can be happy to run more on the same code ;)
bauen1 has quit [Ping timeout: 265 seconds]
bgs has quit [Remote host closed the connection]
raggi has quit [Ping timeout: 255 seconds]
<mrvn>
geist: wasn't that kind of the aim of RISC? Throw out the complex multi-stage/cycle opcodes and do them in software instead?
<mrvn>
Also isn't anything (integer) but load/store/mul a single cycle?
k8yun has quit [Quit: Leaving]
k8yun has joined #osdev
<mrvn>
zid: considering ram access when uncached is somewhere around 200 cycles, how are you going to get a load down to 1? Make super fast ram? Slow down the cpu?
k8yun has quit [Quit: Leaving]
<geist>
mrvn: right
<geist>
and also avoid microcode because you can directly decode
<geist>
note risc doesn't mean you can't pipeline it, but its effective throughput is one per cycle
<geist>
also sure, if the cache isn't available you can't get it to 1, but you design it such that there is a cache and then try to get the cache hits high
<geist>
also note there was risc as it was in the 80s and risc as it is now, etc. a lot of the ideas are more general principles now than the hard rules they were initially, which at the time was intended to make implementations easier because of this or that
heat has joined #osdev
<heat>
morning operating system professionals
gog has quit [Ping timeout: 265 seconds]
gog has joined #osdev
raggi has joined #osdev
<moon-child>
mul is 1 cycle throughput, if you count that
<mrvn>
For my own CPU I have the 74181+74182 ALU, which is 6 or 7 gates deep, and the barrel shifter, which is 13 gates deep I think. So I could make everything 1 cycle except for shift being 2 cycles.
<mrvn>
A multiplication unit has a far greater depth than that.
<moon-child>
I mean, obviously an adder will have more span than, say, an xor
<mrvn>
How do you do mul efficiently? You don't just take the first arg shifted, AND it with each bit of the second arg, and then sum them all up?
<mrvn>
moon-child: not really. I mean you could have a dedicated XOR unit that's just 16 gates in parallel. But the 74181 has all the logic functions included.
<moon-child>
'how do you' dunno! I am not an rtl expert
<moon-child>
but I know intel, amd, and arm managed it somehow
<mrvn>
You could do the N shifts and then an ALU tree. So that would only be log N deep. Or divide&conquer, doing a 16x16 mul by splitting it into 8x8 muls and so on.
<zid>
1 cycle mul is actually really easy
<zid>
it just takes a bunch of wiring
<zid>
It's the same as the log trick for pow except it's a tall af adder instead of having to shift and mul
<zid>
(which makes sense, mul is to add as pow is to mul)
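(A sketch of the shift-and-add scheme mrvn describes: one partial product per set bit, summed up. Hardware generates all the partial products at once and feeds them into a log-depth adder tree, which is the "bunch of wiring" zid means.)

```c
#include <stdint.h>

/* Naive sequential multiply: for each set bit of b, add the
 * correspondingly shifted a into the sum. A hardware multiplier
 * computes these partial products in parallel instead. */
static uint64_t mul_shift_add(uint64_t a, uint64_t b)
{
    uint64_t sum = 0;
    while (b) {
        if (b & 1)
            sum += a;   /* partial product for this bit */
        a <<= 1;
        b >>= 1;
    }
    return sum;
}
```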
nyah has joined #osdev
Burgundy has joined #osdev
kof123 has joined #osdev
GeDaMo has joined #osdev
mykernel has joined #osdev
<mykernel>
getting gpf(ring3 data) on iretq(timer irq) - ONLY ON AMD CPU (on hardware)
<mykernel>
on bochs, qemu and two intel cpus everything is working
TkTech has joined #osdev
<mykernel>
i have syscalls working(syscall/sysret works great) but if an interrupt fires (ONLY ON AMD CPU) i am fcked
<heat>
this is the part where you link your interrupt handler asm
<bslsk05>
github.com: mykernel64/irq_stub.S at master · aleksav013/mykernel64 · GitHub
<heat>
you're not switching segments properly, you're not saving every register (which makes you rely on the compiler to do it for you in C code, which creates a very random register frame layout), you're pushing flags for no reason
<mykernel>
ty
<heat>
your interrupt stubs are huge, all they should be doing is pushing something and jmping to a common thing
<heat>
s/something/the interrupt number and account for the error code/
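(A minimal sketch of the stub-plus-common-path shape heat describes, as file-scope GNU C asm for x86-64; isr_common and irq_handler are illustrative names, not from mykernel64, and the segment reloading heat mentions later is left out.)

```c
/* Two tiny stubs and one common entry. Vectors where the CPU pushes
 * no error code push a dummy 0 so the frame layout stays uniform. */
__asm__(
    ".globl isr32, isr13\n"
    "isr32:\n\t"                /* timer: no CPU error code */
    "pushq $0\n\t"              /* dummy error code */
    "pushq $32\n\t"             /* vector number */
    "jmp isr_common\n\t"
    "isr13:\n\t"                /* #GP: CPU already pushed an error code */
    "pushq $13\n\t"
    "jmp isr_common\n\t"
    "isr_common:\n\t"
    "pushq %rax\n\tpushq %rcx\n\tpushq %rdx\n\tpushq %rbx\n\t"
    "pushq %rbp\n\tpushq %rsi\n\tpushq %rdi\n\t"
    "pushq %r8\n\tpushq %r9\n\tpushq %r10\n\tpushq %r11\n\t"
    "pushq %r12\n\tpushq %r13\n\tpushq %r14\n\tpushq %r15\n\t"
    "cld\n\t"                   /* SysV ABI wants DF clear for C code */
    "movq %rsp, %rdi\n\t"       /* hand the saved register frame to C */
    "call irq_handler\n\t"      /* hypothetical C-level dispatcher */
    "popq %r15\n\tpopq %r14\n\tpopq %r13\n\tpopq %r12\n\t"
    "popq %r11\n\tpopq %r10\n\tpopq %r9\n\tpopq %r8\n\t"
    "popq %rdi\n\tpopq %rsi\n\tpopq %rbp\n\t"
    "popq %rbx\n\tpopq %rdx\n\tpopq %rcx\n\tpopq %rax\n\t"
    "addq $16, %rsp\n\t"        /* drop vector number + error code */
    "iretq\n\t"
);
```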
GeDaMo has quit [Ping timeout: 265 seconds]
Burgundy has quit [Ping timeout: 268 seconds]
<heat>
I also vaguely recall some tiny behavior being different in AMD vs Intel wrt segments
<heat>
does it crash if you don't enter user mode?
<heat>
wait, gpf ring3 data eh? I think you need to properly switch the ds, ss, es segments
<bslsk05>
cgit.freebsd.org: src - FreeBSD source tree
<mjg>
i'm sure the bug was deliberate and now the patch is breaking functionality
<kaichiuchi>
god -fsanitize=address is the best thing to ever happen
<kaichiuchi>
just had to deal with a segfault and the call stack was nowhere near where asan detected a fault
<heat>
facs
<gog>
i need to play with that
<kaichiuchi>
and I was pulling my hair out
<heat>
step 1) -fsanitize=address
<heat>
step 2) compiler do magik
<heat>
step 3) runnnnnnnnnnnnn
<gog>
do you need any support code in kernel?
<gog>
or does the compiler do all of it
<heat>
for a kernel or for userspace?
<gog>
kernel
<heat>
oh sweetheart, you need plenty of support code
<gog>
oh
<heat>
so basically you can lookup my AWESOME SUPER NICE KERNEL for some codez
<heat>
but the gist of the thing is that the compiler expects some functions to be present: __asan_{load,store}{1,2,4,8,N}
<gog>
ook
<heat>
but ofc it's more complicated
<heat>
compilers have inline asan instrumentation (which FWIW you can fully disable) which expects a linear shadow map at $ADDRESS and just codegens the whole check, which gets you a bigger binary but some really nice perf boosts
<heat>
oh yeah note that the shadow map is a (1:8) map that says if it's fully accessible (byte = 0), partially accessible (1...7 for the first N bytes) and special not-accessible values (redzones, freed memory, etc, all up in the negative int8_t space)
<heat>
this also means that you can't directly touch non-kernel (lower half) addresses in KASAN instrumented code because the shadow map will only cover the upper 128TB
<heat>
oh and -fsanitize=kernel-address is usually what you use for the kernel, it's lighter and more geared towards (linux) kernels
<heat>
the whole complex shadow map shit goes away if you look at userspace because you can just mmap (128/8)TB and rely on overcommit
<gog>
yeh makes sense
<gog>
and the compiler can rely on a lot of library being present
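(A minimal sketch of the 1:8 shadow check heat describes; the shadow offset and kasan_report are placeholders, not values or names from any particular kernel.)

```c
#include <stddef.h>
#include <stdint.h>

#define KASAN_SHADOW_OFFSET 0xdffffc0000000000UL  /* placeholder base */

void kasan_report(uintptr_t addr, size_t size, int write); /* hypothetical hook */

/* addr >> 3 picks the shadow byte covering this 8-byte granule. */
static inline int8_t *kasan_shadow(uintptr_t addr)
{
    return (int8_t *)((addr >> 3) + KASAN_SHADOW_OFFSET);
}

/* One of the __asan_load/store family the compiler emits calls to.
 * Shadow semantics per the discussion above: 0 = whole granule ok,
 * 1..7 = only the first N bytes ok, negative = poisoned. */
void __asan_load1(uintptr_t addr)
{
    int8_t s = *kasan_shadow(addr);
    /* the low 3 address bits are the offset within the granule */
    if (s != 0 && (int8_t)(addr & 7) >= s)
        kasan_report(addr, 1, /*write=*/0);
}
```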