#osdev on 2022-03-29 — irc logs at libera.irclog.whitequark.org

2021-05-23 01:57 klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books

00:13 heat has quit [Remote host closed the connection]

00:13 heat has joined #osdev

00:14 Bitweasil has quit [Remote host closed the connection]

00:15 Bitweasil has joined #osdev

00:16 flx-- has quit [Remote host closed the connection]

00:17 flx-- has joined #osdev

00:21 dude12312414 has joined #osdev

00:21 Piraty has quit [Ping timeout: 240 seconds]

00:21 dude12312414 has quit [Remote host closed the connection]

00:21 Piraty has joined #osdev

00:30 <kazinsal> forgive me #osdev for I have sinned

00:30 <kazinsal> I wrote python for work today

00:30 <papaya> ngl I don't see a sin in that

00:30 <kazinsal> I wasn't told to do something in Python. I just thought, maybe it'd be easiest to write a script in Python instead of doing all of this by hand

00:35 <heat> i rate python

00:35 <eryjus> nothing like a pissed off snake....

00:37 <klange> kaz i wrote _a_ python for my os, ain't nothin evil about writing python!

00:39 <heat> the only thing I like more than writing C/C++ is not writing C/C++ because I can do it in python

00:39 <GreaseMonkey> i work on embedded software, and i never, ever call it "C/C++"

00:40 <GreaseMonkey> they're two different languages, i did end up liking C++ temporarily but hated it again once i realised i had a mess i revisited that i couldn't clean up without a lot of drudgery

00:40 <gog> i'm sinning right now and wrote a fake plugin for a unit test because i thought it would satisfy the gitlab ci service

00:40 <gog> it did not

00:40 <gog> WONTFIX:WORKSFORME

00:40 <heat> GreaseMonkey, they're two very similar languages in a lot of aspects

00:41 <GreaseMonkey> in practice, however, they're quite different

00:41 <papaya> I work mostly with Java/Spring on web applications

00:41 kkd has quit [Remote host closed the connection]

00:41 <heat> if you go into template hell? sure, super different

00:41 <papaya> I agree C and C++ have diverged too much to call it "C/C++"

00:41 <GreaseMonkey> C also doesn't use the C++ smart pointers thing

00:42 <GreaseMonkey> and quite frankly for personal projects i prefer to use Zig these days

00:42 <moon-child> C/C++ is undefined behaviour

00:42 <heat> neither does C++?

00:42 <moon-child> because / is not a sequence point

00:42 <papaya> C++ is a mess I like C more

00:42 <moon-child> and C++ mutates C

00:42 <GreaseMonkey> C++11 has the shared_ptr thing or whatever's current

00:42 <heat> C code is a mess, I like C++ more

00:42 <moon-child> ergo, if you write C/C++, I get to put demons in your nose

00:42 <heat> raii is highly superior to goto hell

00:42 <kazinsal> I am a mess, therefore I enjoy C

00:46 <gog> i'm sharpening my C++ teeth again

00:46 <gog> enjoying it

00:49 * papaya hands gog a tooth sharpener.

00:50 eddof13 has joined #osdev

00:50 <gog> ^w^

00:50 <no-n> uwu

00:50 <heat> weird that your teeth are C++

00:51 <heat> mine are based and rustpilled

00:51 <gog> cat::teeth

00:51 <klange> oops i had a nested lock issue in kuroko and it didn't show up on linux because glibc's pthread locks support nesting...

00:52 <heat> don't you need to enable them explicitly?

00:53 <heat> oh, TIL EDEADLK

00:53 <heat> I was thinking of PTHREAD_MUTEX_RECURSIVE

00:54 <klange> not mutexes

00:55 Mutabah has quit [Ping timeout: 246 seconds]

00:55 Mutabah has joined #osdev

00:56 <heat> hmm, EDEADLK in glibc/musl is only returned for PTHREAD_MUTEX_ERRORCHECK mutexes

00:56 nyah has quit [Remote host closed the connection]

00:58 <klange> this whole thing is probably broken anyway, need to fix a lot of thread support stuff in kuroko that plays it too fast and loose

00:58 <klange> also need to write better reader-writer locks for toaru anyway

01:01 eddof13 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

01:03 <heat> reject fine grained locking

01:04 <heat> embrace the GIL like a good python interpreter

01:04 <klange> I refuse to take the (GI)L.

01:06 <gog> welp

01:06 <gog> i got the unit test to work

01:07 <gog> on the CI server

01:07 <gog> all i needed to do was put the plugins in a build directory where the test binary could find them

01:08 tomaw_ has joined #osdev

01:09 bauen1_ has joined #osdev

01:09 bauen1 has quit [Ping timeout: 240 seconds]

01:09 Dreg has quit [Quit: Dreg]

01:10 smeso has quit [Ping timeout: 272 seconds]

01:10 ckie has quit [Ping timeout: 272 seconds]

01:10 ckie has joined #osdev

01:11 Dreg has joined #osdev

01:13 smeso has joined #osdev

01:15 tomaw has quit [Ping timeout: 604 seconds]

01:15 tomaw_ is now known as tomaw

01:16 gog has quit [Quit: byee]

01:22 <moon-child> who the fuck is outside scraeming global interpreter lock. show yourself coward. i will never lock the interpreter

01:23 <heat> GLOBAL INTERPRETER LOCK

01:27 <klange> IF PYTHON BANS ME FOR NOT LOCKING THE INTERPRETER I WILL FACE GUIDO AND WALK BACKWARDS INTO HELL.

01:37 Arthuria has quit [Ping timeout: 260 seconds]

01:50 <kazinsal> what guido doesn't want you to know is that p-langs work on highlander rules so if you successfully overtake python in popularity, even for a moment, you steal his powers

01:50 <gorgonical> and from that point on only you have the power to write PEPs

01:52 <klange> we call them KEKs - Kuroko Enhancement Krokosals

01:54 <heat> >kek

01:54 * heat keks in uefi

01:54 <klange> oh right let's rebuild that

01:56 <klange> I should do module builds for that...

01:56 <heat> is KEK-001 an interface for manipulating the KEK?

01:56 <kazinsal> proposal for how to submit proposals

01:57 <heat> no

01:57 <heat> the key exchange key

01:57 <heat> the best key ever

01:57 <heat> the keyest of keys

01:57 <kazinsal> (ten minutes later) we regret to inform you that the key exchange key has been compromised

02:02 <heat> top kek

02:05 srjek has quit [Ping timeout: 240 seconds]

02:34 <Clockface> are linux signals implemented as an interrupt?

02:35 <Clockface> well, likely not now that i think about it?

02:35 <Mutabah> yes and no...

02:35 <Clockface> does it jump somewhere else next-timeslice?

02:35 <Mutabah> Some signals are triggered by an interrupt (e.g. timer based ones)

02:36 <Clockface> that makes sense, i guess its indirectly based on an interrupt since everything shares a hardware interrupt

02:36 <Clockface> well, sometimes its a hardware interrupt

02:37 <Mutabah> I've not looked at how linux does it, but I'd implement signals by having a check in the syscall stub that checks if a signal is pending, and returns to a different userland location if it is

02:37 <Clockface> im not implementing them yet, i just have been using them recently and started thinking how linux does it

02:41 <heat> you check for pending signals at specific moments

02:41 <klange> While signals and interrupts are conceptually similar, they have nothing to do with each other in implementation.

02:41 <heat> inside/around interruptible waits, in irq entries and exits, in all other kernel entry points and exits, etc

02:42 <heat> you can theoretically IPI another processor to reduce signal latency but I don't think anyone does that

02:42 <klange> Mutabah: Yeah, that's basically how Linux does it, though any return to userspace can trigger a signal, not just syscall returns - means you can signal something that isn't making syscalls, eg that broken thing doing an infinite loop.
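The "check for pending signals on every return to userspace" idea described here can be sketched as a toy model. This is not Linux's code: `Task`, `send_signal`, `return_to_user`, and `sigreturn` are hypothetical names, addresses are plain integers, and default actions for unhandled signals are left out.

```python
class Task:
    """Toy userspace task: a program counter, a user stack, pending signals."""

    def __init__(self, pc):
        self.pc = pc
        self.user_stack = []   # stands in for the task's userland stack
        self.pending = []      # signal numbers waiting for delivery
        self.handlers = {}     # signo -> handler address (hypothetical)


def send_signal(task, signo):
    # Sending only marks the signal pending; nothing runs yet.
    task.pending.append(signo)


def return_to_user(task):
    """Run on every kernel exit path (syscall return, IRQ return, ...)."""
    if task.pending:
        signo = task.pending.pop(0)
        handler = task.handlers.get(signo)
        if handler is not None:
            task.user_stack.append(task.pc)  # remember where to resume
            task.pc = handler                # resume in the handler instead
    return task.pc


def sigreturn(task):
    # The handler finished: restore the interrupted location.
    task.pc = task.user_stack.pop()
    return task.pc


task = Task(pc=0x4000)
task.handlers[2] = 0x5000          # pretend a handler is installed for signal 2
send_signal(task, 2)
entered = return_to_user(task)     # e.g. on the way out of a timer IRQ
resumed = sigreturn(task)
```

Because the check sits on the interrupt exit path as well as the syscall exit path, a task spinning in an infinite loop still picks up the signal the next time a timer interrupt returns to it.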

02:43 terrorjack has quit [Quit: The Lounge - https://thelounge.chat]

02:44 <heat> the only way a signal triggers an interrupt is if it wakes an interruptible thread that has a higher prio than the current thread (so it wants to schedule it out)

02:44 <heat> the canonical way to do that scheduling out is through an IPI

02:44 <Mutabah> klange: Yeah, I was too lazy to add in "and interrupt handlers"

02:45 <klange> I just redid my signal stuff recently.

02:45 terrorjack has joined #osdev

02:46 <heat> the only big thing I'm missing is restartable syscalls

02:47 <heat> re: restartable, something cool I wanted to add: restartable sequences

02:47 <klange> I've got those for a couple of cases. Haven't done the fancy time-based stuff Linux supports.

02:51 knusbaum has quit [Ping timeout: 260 seconds]

02:53 knusbaum has joined #osdev

03:00 skipwich_ has quit [Quit: DISCONNECT]

03:02 <heat> i should revisit my scheduler

03:02 skipwich has joined #osdev

03:02 <heat> it's just a round robin with priorities

03:02 kkd has joined #osdev

03:02 <klange> I don't even have priorities.

03:03 <klange> Granted, I don't really want to add them or improvement as at this point it exists as a demonstration of "round robin with no priorities is actually good enough".

03:03 <klange> improve it*

03:03 <klange> It's my response to multiple semesters of operating systems / systems design courses spending way too much time on scheduling back in my uni days.

03:04 <klange> The _one_ improvement I would consider is something with core affinity.

03:05 <heat> i was thinking of something like a multilevel feedback queue

03:05 <heat> it would be a simple-ish thing

03:05 <heat> and better than what I have

03:06 <heat> i could also look into fuchsia's fair scheduler

03:06 <klange> oh there is one other thing I should probably fix which is that my time slicing for preemption is totally yolo

03:10 <klange> on x86 at least, it's not "you get 'n' units of time before you're getting scheduled out", it's "every 'n' units of time, something gets scheduled out" because it's periodic timers that aren't getting reset

03:11 <heat> hm?

03:12 <heat> my scheduler's heartbeat just decrements the current quantum

03:12 <heat> on a switch it's reset to 10ms
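A minimal sketch of the quantum scheme heat describes, assuming a 1 ms heartbeat and leaving priorities out. `RoundRobin`, `heartbeat`, and `switch` are hypothetical names, not taken from either kernel.

```python
from collections import deque

QUANTUM_MS = 10  # reset value on every switch, as described above


class RoundRobin:
    def __init__(self, tasks):
        self.ready = deque(tasks)
        self.current = self.ready.popleft()
        self.quantum = QUANTUM_MS

    def heartbeat(self, elapsed_ms=1):
        # Timer tick: charge the running task, preempt once its quantum is spent.
        self.quantum -= elapsed_ms
        if self.quantum <= 0:
            self.switch()

    def switch(self):
        # Also used for voluntary yields; the next task always starts
        # with a full quantum.
        self.ready.append(self.current)
        self.current = self.ready.popleft()
        self.quantum = QUANTUM_MS


rr = RoundRobin(["A", "B"])
for _ in range(3):
    rr.heartbeat()           # A runs for 3 ms ...
rr.switch()                  # ... then blocks or yields early
for _ in range(9):
    rr.heartbeat()
still_running = rr.current   # B has used 9 ms of its own 10 ms
rr.heartbeat()
after_expiry = rr.current    # B's quantum is spent, back to A
```

With a free-running periodic timer that is never reset, B would instead have been preempted 7 ms after it started, which is the "every n units of time, something gets scheduled out" behaviour klange mentions.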

03:15 <klange> that sounds better than what I do

03:16 <klange> I don't even remember off-hand what my periodic timers are set to

03:25 smeso has quit [Quit: smeso]

03:35 smeso has joined #osdev

03:43 <kazinsal> I think my PIT is set to 60 Hz despite my OS not having any concept of graphics output...

03:48 <geist> yah i do something pretty similar to heat in LK. works pretty well

03:48 <geist> depending on the platform it may only have periodic timers or dynamic ones, but it's the same result

03:49 <heat> all my timers have periodic support but its never used anywhere

03:50 <heat> mostly because all of the timers I've found have oneshot

03:57 <geist> yah i've used periodic from time to time on various platforms. generally very low end ones where the cost of recomputing the next event is generally more expensive than just a periodic timer

03:57 <geist> but it's a build option, so easy enough to switch back and forth

04:09 bradd has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

04:10 bradd has joined #osdev

04:11 freakazoid12345 has quit [Ping timeout: 256 seconds]

04:42 heat has quit [Ping timeout: 252 seconds]

04:45 Oli_ has quit [Quit: leaving]

04:47 Oli has quit [Quit: leaving]

04:49 no-n has left #osdev [#osdev]

05:14 Likorn has joined #osdev

05:19 sonny has joined #osdev

05:23 Arthuria has joined #osdev

05:37 skipwich has quit [Ping timeout: 260 seconds]

05:39 skipwich has joined #osdev

05:45 Likorn has quit [Quit: WeeChat 3.4]

06:02 kkd has quit [Remote host closed the connection]

06:30 Arthuria has quit [Killed (NickServ (GHOST command used by guest2795))]

06:30 Arthuria has joined #osdev

06:34 Arthuria has quit [Ping timeout: 246 seconds]

06:39 sonny has quit [Ping timeout: 250 seconds]

07:04 nur has joined #osdev

07:05 the_lanetly_052_ has joined #osdev

07:27 <mrvn> klange: I found that round-robin improves a lot when you add move-to-front. Like when some task gets woken up that's been sleeping longer than one round it gets put to the front.

07:29 <mrvn> geist: Not sure I ever had a case where "computing" the time till the next event is costly. That's just looking at the top of the heap. But reprogramming a timer to fire at the right time might be costly or, like the PIT, plain impossible.

07:33 <mrvn> Well, strike that, just thought of a case. :) For the network code I have a timeout algorithm with bins: 2 ticks, 4 ticks, 8 ticks, 16 ticks. Every tick one of them is processed fully and each item put into the bin for the remaining time. No way to know when the next event happens there except it's longer than a tick.

07:33 <mrvn> It's kind of a bucket sort amortized over time.

07:36 <mrvn> The idea is that most timers will be removed before they even get processed once, >99% get removed before they expire. So add/remove is O(1) and expiring is O(log duration).

07:37 <mrvn> Linux has pretty much the same.
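One way the binned timeouts could look, as a rough sketch rather than mrvn's or Linux's actual code. It assumes timeouts of at least one tick, picks the bin to scan from the trailing zero bits of the tick counter (so exactly one bin is scanned per tick), and accepts that a timer may fire one tick late. `TimeoutBins` and its methods are hypothetical names.

```python
class TimeoutBins:
    """Power-of-two timeout bins; bin k is meant to be scanned every 2**(k+1) ticks."""

    def __init__(self, levels=4):            # periods 2, 4, 8, 16 ticks
        self.now = 0
        self.bins = [dict() for _ in range(levels)]  # timer id -> deadline
        self.where = {}                              # timer id -> bin index

    def _bin_for(self, remaining):
        # Largest bin whose period does not exceed the remaining time,
        # so the next scan of that bin comes no later than the deadline.
        k = max(remaining, 2).bit_length() - 2
        return min(k, len(self.bins) - 1)

    def add(self, timer_id, timeout):         # O(1)
        k = self._bin_for(timeout)
        self.bins[k][timer_id] = self.now + timeout
        self.where[timer_id] = k

    def cancel(self, timer_id):               # O(1), the common case
        k = self.where.pop(timer_id, None)
        if k is not None:
            del self.bins[k][timer_id]

    def tick(self):
        self.now += 1
        # Trailing zero bits of the tick number pick the one bin to scan.
        tz = (self.now & -self.now).bit_length() - 1
        k = min(tz, len(self.bins) - 1)
        pending, self.bins[k] = self.bins[k], {}
        fired = []
        for timer_id, deadline in pending.items():
            del self.where[timer_id]
            if deadline <= self.now:
                fired.append(timer_id)
            else:
                # Not due yet: move it to the bin matching its remaining time.
                j = self._bin_for(deadline - self.now)
                self.bins[j][timer_id] = deadline
                self.where[timer_id] = j
        return fired


tb = TimeoutBins()
tb.add("a", 5)
tb.add("b", 2)
tb.add("c", 9)
tb.cancel("c")               # most timers are removed before they expire
fired_at = {}
for _ in range(20):
    for tid in tb.tick():
        fired_at[tid] = tb.now
```

A cancelled timer is never touched by a scan, and a surviving one is only re-binned a handful of times as its remaining time shrinks, which is where the cheap add/remove and the logarithmic expiry cost come from.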

07:47 bauen1_ is now known as bauen1

07:58 gog has joined #osdev

08:03 Likorn has joined #osdev

08:04 Likorn has quit [Client Quit]

08:06 no-n has joined #osdev

08:21 C-Man has quit [Ping timeout: 260 seconds]

08:34 wand has quit [Ping timeout: 240 seconds]

09:11 GeDaMo has joined #osdev

09:27 Vercas has quit [Quit: buh bye]

09:29 Vercas has joined #osdev

09:31 gog has quit [Quit: byee]

09:44 bauen1 has quit [Ping timeout: 252 seconds]

10:36 the_lanetly_052_ has quit [Ping timeout: 245 seconds]

10:55 dennis95 has joined #osdev

11:01 Vercas has quit [Remote host closed the connection]

11:01 Vercas has joined #osdev

11:18 wereii has quit [Ping timeout: 240 seconds]

11:18 Burgundy has joined #osdev

11:21 C-Man has joined #osdev

11:23 Payam19 has joined #osdev

11:29 wereii has joined #osdev

11:42 srjek has joined #osdev

12:10 Payam19 has quit [Quit: Client closed]

12:11 Likorn has joined #osdev

12:15 heat has joined #osdev

12:23 atrapa has joined #osdev

12:39 nomagno has joined #osdev

12:50 X-Scale` has joined #osdev

12:52 X-Scale has quit [Ping timeout: 256 seconds]

12:52 X-Scale` is now known as X-Scale

13:03 heat has quit [Ping timeout: 252 seconds]

13:12 pretty_dumm_guy has joined #osdev

13:19 MiningMarsh has quit [Ping timeout: 240 seconds]

13:24 MiningMarsh has joined #osdev

13:29 gildasio has quit [Remote host closed the connection]

13:29 gildasio has joined #osdev

13:41 Jari-- has quit [Ping timeout: 250 seconds]

14:21 Likorn has quit [Quit: WeeChat 3.4]

14:39 freakazoid343 has joined #osdev

15:05 wand has joined #osdev

15:08 amazigh has quit [Quit: WeeChat 2.8]

15:10 amazigh has joined #osdev

15:13 atrapa has quit [Quit: atrapa]

15:21 Oli has joined #osdev

15:28 vancz_ has quit [Quit: vancz_]

15:28 pie__ has quit [Quit: pie__]

15:28 Arthuria has joined #osdev

15:29 vancz has joined #osdev

15:29 pie_ has joined #osdev

15:34 vancz has quit [Quit: vancz]

15:34 pie_ has quit [Quit: pie_]

15:37 pie_ has joined #osdev

15:37 vancz has joined #osdev

15:38 mahmutov has joined #osdev

15:41 bauen1 has joined #osdev

15:43 Likorn has joined #osdev

15:44 gdd has quit [Ping timeout: 272 seconds]

15:45 elastic_dog has quit [Ping timeout: 260 seconds]

15:46 gdd has joined #osdev

15:48 elastic_dog has joined #osdev

15:57 gog has joined #osdev

15:59 heat has joined #osdev

16:08 Jari-- has joined #osdev

16:14 Bonstra has quit [Ping timeout: 252 seconds]

16:17 <catern> are there any operating systems on any architectures that can do a process switch *while the process is stalled mid-instruction waiting on a load from main memory*?

16:18 <Bitweasil> Nothing I'm aware of, it would be false savings.

16:18 <Bitweasil> That's what SMT/Hyperthreading does, though.

16:18 <Bitweasil> While one thread is stalled on a main memory load, it will use the execution units to go make progress on the other thread.

16:18 <Bitweasil> (among other things it can do)

16:19 <Bitweasil> If you're stalled on RAM, blowing your cache out with a kernel trip is the wrong thing to do.

16:19 gdd has quit [Ping timeout: 260 seconds]

16:19 <catern> indeed indeed, I partially ask this from inspiration by hyperthreading

16:19 <Bitweasil> A quick trip to the kernel is a few thousand cycles, more if you have to do things like schedule another task.

16:20 gdd has joined #osdev

16:20 <catern> yes, probably any OS/arch which did this would need to have an extremely fast transition to the kernel and scheduler

16:21 <geist> yah i think SMT is the closest to what you're asking for

16:21 <Bitweasil> And it's almost certain that after some period of operation, the kernel entry/scheduler/etc will not be in L2 cache.

16:21 <catern> you already want to schedule processes that mostly share their cache on pair hyperthreads though :) so this would just be an extension of that desire in the scheduler

16:21 <Bitweasil> SO now you're right back to waiting on memory.

16:21 <catern> geist: oh for sure, I just am curious if anyone ever, basically, did "SMT in software"

16:21 <Bitweasil> Or you want to disable hyperthreads because of all the impossible-to-plug leaks between them. ;)

16:22 <Bitweasil> I've stopped buying HT processors, I just disable them anyway.

16:22 <geist> yah i dont think so, since there's not a good way for software to work between the instructions

16:22 <geist> unless it's, say, some sort of microcoded thing

16:22 <catern> yeah you'd need an extremely fancy architecture

16:22 <heat> is SMT done in microcode?

16:22 <Bitweasil> And an i5 without HT is usually a lot cheaper than an i7 with, for ~the same single core, and disturbingly close to the same on multithreaded loads, unless you're heavily stalled.

16:22 <geist> ie, the microcode decides that its going to block so it switches tasks

16:22 <Bitweasil> It's the scheduler.

16:22 <Bitweasil> You just have two threads (or more) feeding into a common set of execution units.

16:23 pie_ has quit [Quit: pie_]

16:23 vancz has quit [Quit: vancz]

16:23 <Bitweasil> It's not really "switching between threads" so much as "Just running both threads at the same time, and if one stalls, the other will have more EUs to execute on for a while."

16:23 <geist> but there are cpus that have existed where the user facing 'cpu' was just microcode running an emulator

16:23 <Bitweasil> True, that... uh...

16:23 <Bitweasil> Linus was involved in it.

16:23 <Bitweasil> Laptop CPU from the early 2000s?

16:23 <Bitweasil> Transmeta?

16:23 <geist> and the microcode itself can context switch. the Xerox Alto is the usual early example

16:24 <geist> and then things like transmeta or i think nvidia Denver

16:24 <Bitweasil> I've *still* never figured out what Denver was/is.

16:24 <Bitweasil> It sounded like "Bolt a couple beefy ARM cores to the GPU."

16:24 <Bitweasil> But everyone seems to think it's something slightly different.

16:24 <geist> same. i *think* some transmeta folks ended up there and convinced nvidia to have another go at it

16:24 <Bitweasil> I don't think it's ARM emulated on GPU, though.

16:24 <geist> the soc maybe, but the Denver core is its own thing

16:24 <geist> but it's just got some microcode going on

16:25 <Bitweasil> For a bunch of compute tasks that are GPU heavy, nVidia just needs to bolt some ARM cores and high bandwidth networking on the GPU, presto, standalone node.

16:25 <Bitweasil> Netboot it.

16:25 <geist> sure

16:25 <Bitweasil> Oh, I didn't realize Denver was specifically a core.

16:25 <geist> side note, they just announced their Grace ARM cpu

16:25 <geist> goes with their new Hopper GPU arch

16:25 <geist> cute!

16:25 <Bitweasil> I saw something like that, hadn't ... OH!

16:26 <Bitweasil> lol.

16:26 <Bitweasil> Hadn't paid much attention.

16:26 <Bitweasil> More interested in the new Rockchip SoC lately.

16:26 <geist> oh? which one?

16:26 <Bitweasil> RK3588. Should be good!

16:26 <geist> oh neat

16:26 <Bitweasil> Quad A76, quad A55, up to 32GB RAM, NVMe, couple GPU ports, and a HDMI in.

16:27 <catern> geist: so interesting point about the microcode context switching. I guess it would be possible in theory for such a system to context switch on memory stall

16:27 <catern> the question is whether any of them ever actually did...

16:27 <Bitweasil> catern, again, SMT does that in practice, very well.

16:27 <Bitweasil> Spending thousands of cycles going to the OS instead of "Oh, hey, I'll just run from this queue over here..." doesn't seem to make any sense.

16:27 <Bitweasil> Memory stalls aren't *that* slow.

16:28 <Bitweasil> And a kernel trip requires a ton of memory accesses that aren't likely to be cached.

16:28 <Bitweasil> Unless you start locking OS stuff into cache.

16:28 <catern> Bitweasil: well, alternative idea: maybe if there was a way to dynamically scale up SMT, haha - ask the core to give you another 10 register sets and start SMT-ing between them

16:28 <Bitweasil> Now you've just eliminated a bunch of cache that could have prevented the stall in the first place.

16:28 <Bitweasil> There are some SMT4 and I think a few SMT8 cores.

16:28 <Bitweasil> At some point, the gains stop happening for almost all workloads, and, again, you're splitting cache.

16:29 <Bitweasil> SMT2, generally, you've got about half the L2 per thread. SMT4, 1/4. So you hit diminishing returns in a hurry.

16:29 <Bitweasil> Meanwhile, Apple bolted insane giant gobs of L1 on their cores instead, and seems to be doing a solid job with it.

16:29 <Bitweasil> It's... 192kb L1I and 128kb L1D *per* core? Or maybe flip those.

16:29 <Bitweasil> (on the M1)

16:30 <geist> yah the SMT4 in the cavium i can say is clearly a diminishing returns

16:30 <Bitweasil> Plus gigantic L2s, and a huge L3 system cache, on top of fantastic latency to DRAM.

16:30 <geist> i can configure it to SMT2 in firmware and really it performs just as well

16:30 <geist> but the novelty of SMT4 is too great

16:30 <Bitweasil> There are plenty of workloads in which SMT2 is slower, total system throughput-wise, than no SMT, because the bigger effective cache per core is more useful than filling the bubbles.

16:31 <catern> (i'm not too concerned about losing cache because I was thinking about this in the context of explicitly-software-managed scratchpad memory anyway, where you could just make the tradeoff of "larger scratchpad" vs "more threads" explict to software)

16:31 <Bitweasil> I improved wall clock time on some Java stuff by reducing the total threads involved in stop the world GC, because it was thrashing the cache.

16:31 fkrauthan has quit [Quit: ZNC - https://znc.in]

16:31 <Bitweasil> (and reduced total CPU time massively - cores thrashing the cache aren't doing anything useful)

16:32 <Bitweasil> That sounds more like microcontroller space, where you often enough have single cycle SRAM anyway.

16:32 <Bitweasil> So you won't stall on memory accesses in the first place.

16:32 fkrauthan has joined #osdev

16:34 MiningMarsh has quit [Quit: ZNC 1.8.2 - https://znc.in]

16:37 <geist> and there are a few SMT like microcontroller things. propeller

16:37 <geist> and there was a network processor i remember years ago bumping into that was kinda neat

16:38 <geist> had 8 threads, and you could bind IRQs to threads if you wanted, so it had real time guarantees of task switching

16:38 <Bitweasil> I think a Defcon badge from years back was a propeller?

16:38 <geist> yah somewhere i have a propeller dev board that i never futzed with

16:38 <Bitweasil> High end NICs seem like something that could make use of that too, there's not a lot of processing, but when you get into the virtual NIC splitting, having "separate cores" would be useful.

16:38 <Bitweasil> Well, a lot of processing, but most of it is hardware accelerated stuff.

16:38 <Bitweasil> Checksum calculations and the like.

16:39 MiningMarsh has joined #osdev

16:39 <Bitweasil> I know it gets crazy complex, I've never had network links where it mattered.

16:39 <Bitweasil> I still think gigabit is pretty cool.

16:39 pie_ has joined #osdev

16:39 vancz has joined #osdev

16:57 rustyy has quit [Quit: Lost terminal]

16:59 Bonstra has joined #osdev

17:02 rustyy has joined #osdev

17:09 heat_ has joined #osdev

17:09 heat has quit [Read error: Connection reset by peer]

17:26 dennis95 has quit [Quit: Leaving]

17:29 <catern> tangentially I wrote https://catern.com/compdist.html comments anyone?

17:29 <bslsk05> catern.com: Your computer is a distributed system

17:32 Oli_ has joined #osdev

17:32 Oli has quit [Ping timeout: 260 seconds]

17:34 <Bitweasil> > This is especially painful on NUMA architectures, where different memory accesses can have radically different relative costs.

17:34 <Bitweasil> It's not *that* bad between sockets.

17:34 <Bitweasil> Extra couple cycles, typically.

17:35 <Bitweasil> I make my money in the weeds of all that complexity and have no interest in anything much higher level than C, except for my small board ARM hobby of "discovering what doesn't build on AArch64 because it believes that x86 is the only 64-bit processor out there so downloads x86 binaries to run."

17:36 <Bitweasil> And the reality is that a lot of the stuff you're talking about is well hidden, and... you seem to have an exceedingly pessimal view of the latency of cache misses.

17:36 <clever> Bitweasil: i recently found that even a single-socket design can be numa

17:36 <clever> https://randomascii.wordpress.com/2022/01/12/5-5-mm-in-1-25-nanoseconds/

17:36 <bslsk05> randomascii.wordpress.com: 11 mm in 1.25 nanoseconds | Random ASCII – tech blog of Bruce Dawson

17:37 <clever> Bitweasil: the bus connecting the cores to the L2 cache, has a longer path for some cores

17:38 <Bitweasil> Ok, and at 4GHz, 1.25ns is 5 cycles.

17:39 <clever> the thing i find a bit odd, is that nearly all logic in a cpu is deterministic

17:39 <clever> always triggering on a clock edge

17:39 <catern> Bitweasil: lol I added that line because someone else complained I didn't mention NUMA

17:39 <clever> so its not simply a matter of the wire being longer, and having a speed of light imposed latency

17:40 <clever> somebody had to choose to add flip-flops into the datapath, that would delay it for a clock cycle

17:40 <Bitweasil> I don't recall how many of the late Netburst era pipeline stages were literally just drive stages, pushing the signal on, but it was a non-trivial number.

17:40 <catern> Bitweasil: and was like "well SMP is not NUMA so this doesn't apply!!!"

17:40 <Bitweasil> ... I never said that.

17:40 <Bitweasil> I said the latency difference between sockets isn't that significant compared to DRAM latency in general.

17:40 <catern> Bitweasil: no no I know I'm just saying why I added it

17:41 <Bitweasil> And SMP is *typically* single socket, per-core L2 vs shared global L2.

17:41 <catern> nothing about you

17:41 <Bitweasil> Ok.

17:41 <catern> (really I'm just thinking aloud about how to remove it or edit it...)

17:41 <Bitweasil> L3 will see that sort of behavior, depending on how many ring stops away it is.

17:43 <catern> I think I'll just remove the mention of NUMA

17:44 <catern> still! if that's the only complaint of someone as knowledgeable as you, I think I'm pretty set

17:50 <Bitweasil> Oh, I think you're whining about problems that mostly don't meaningfully exist without offering solutions, but it's not *wrong.*

17:50 <Bitweasil> I just don't play in those spaces where it matters.

18:03 <catern> that's good enough for me

18:04 the_lanetly_052_ has joined #osdev

18:05 <Bitweasil> You just need to spend some time doing latency tests and understanding how things relate to each other.

18:06 <Bitweasil> The Anandtech review articles are a good start for memory vs cache latency.

18:06 <Bitweasil> Log scale charts, usually, but you're on the order of ~1ns for cache, ~100ns for DRAM.

18:06 <Bitweasil> That's only 400 cycles, and a syscall/sysret pair isn't free. Plus, kernel not being in cache if the thread has been running for a while.

18:07 <Bitweasil> So you're back to waiting on DRAM for the kernel, to save time... waiting on DRAM. You probably could design a chip that it would work on, but I don't think that's the best use of design and transistor resources.
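The cycle counts quoted in this discussion are simply latency multiplied by clock frequency. A small sketch, assuming the 4 GHz clock used above (`ns_to_cycles` is a hypothetical helper):

```python
def ns_to_cycles(latency_ns, clock_ghz):
    """Cycles that elapse during latency_ns on a core clocked at clock_ghz."""
    return latency_ns * clock_ghz


CLOCK_GHZ = 4.0
extra_hop = ns_to_cycles(1.25, CLOCK_GHZ)   # the longer L2 path from the linked article
cache_hit = ns_to_cycles(1, CLOCK_GHZ)      # ~1 ns cache access
dram_miss = ns_to_cycles(100, CLOCK_GHZ)    # ~100 ns DRAM access
print(extra_hop, cache_hit, dram_miss)
```

Set against a kernel round trip estimated earlier in the log at a few thousand cycles, a 400-cycle DRAM stall is too short to be worth a software context switch.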

18:07 <Bitweasil> What I would *love* to see, and don't think exists, is when hyperthreading is disabled, the core gets double the L1 cache...

18:07 <clever> and that leads back into a second bug from the page i linked above

18:07 <clever> the cpu had a prefetch opcode, so you could tell the cpu what data you're going to want

18:07 <clever> so the latency is hidden, and its already in the L1 when you do need it

18:08 <clever> but, if there is a TLB miss, it just doesnt prefetch

18:08 <clever> so you would randomly have to pay that cache-miss cost

18:08 <Bitweasil> *Proper* processors will let you pin TLB entries, and you can often do something sane with them if you care, but... yeah.

18:08 <clever> the solution in this case, was switching from 4k pages to 64k pages

18:08 <Bitweasil> Of course, Intel prefetch will fail depending on where in the process it loses the rails, so you can probe things with it.

18:08 <clever> so a single TLB entry covered more data

18:09 <Bitweasil> Yeah, large pages are nice.

18:09 <clever> and large consecutive memcpy's would need fewer TLB slots

18:09 <Bitweasil> IIRC Apple uses 16kb pages on most of their ARM stuff.

18:09 <j`ey> YRC

18:10 <clever> another factor ive discovered on the rpi, ive got access to an opcode that can load 4096 bytes in a single shot

18:10 <clever> i suspect thats making use of burst transfers, so while you may have a high latency to start the transfer, you only pay that latency once per 4kb

18:11 <clever> and that lets me nearly saturate the dram, without any cache involvement

18:11 Arthuria has quit [Read error: Connection reset by peer]

18:12 <Bitweasil> Where's it load the 4k to?

18:12 <clever> the vector registers

18:12 <Bitweasil> Oh, it's got enough space for that?

18:12 <clever> yeah

18:12 Brnocrist has quit [Ping timeout: 256 seconds]

18:12 <Bitweasil> I've not touched the vector stuff yet. Neat!

18:13 <clever> its only on the VPU side of things, the arm cant access it

18:13 <Bitweasil> ... which would be why it didn't sound familiar, got it. :D

18:13 <clever> basically, its an uint8_t[64][64]

18:13 <Bitweasil> I was doing the math on the ARM vector registers in my head thinking I was coming up an awful lot short.

18:13 <clever> at the cost of columns, you can join 2 or 4 vectors of 16, to form a 16bit or 32bit field

18:13 <clever> so it can also act as an uint32_t[64][16]

18:14 <Bitweasil> *nods*

18:14 <clever> you then specify an xy coord, a bit width(8/16/32) and a direction (horizontal or vertical) to make a vector of T[16]

18:14 Brnocrist has joined #osdev

18:14 <clever> and you can optionally specify it to repeat an operation a power of 2 times (1 to 64), while incrementing either x or y

18:16 <clever> so the most extreme limit, is copying an uint32_t[1024] into/outof the vector core
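A toy model of the register file as described, covering only the 8-bit case. The wrap-around at the edges and the names `new_vrf` and `read_vector8` are assumptions made for illustration; how the 16-bit and 32-bit modes combine columns is not modelled.

```python
ROWS, COLS, LANES = 64, 64, 16   # uint8_t[64][64], vectors of 16 lanes


def new_vrf():
    """A 64x64 grid of bytes standing in for the vector register file."""
    return [[0] * COLS for _ in range(ROWS)]


def read_vector8(vrf, x, y, vertical=False):
    """Return 16 bytes starting at (x, y), walking along a row or down a column."""
    if vertical:
        return [vrf[(y + i) % ROWS][x] for i in range(LANES)]
    return [vrf[y][(x + i) % COLS] for i in range(LANES)]


vrf = new_vrf()
for row in range(ROWS):
    for col in range(COLS):
        vrf[row][col] = (row * COLS + col) & 0xFF   # recognisable test pattern

horizontal = read_vector8(vrf, 0, 2)
column = read_vector8(vrf, 3, 0, vertical=True)

# Size check for the figures in the discussion:
total_bytes = ROWS * COLS        # whole register file
as_u32_bytes = 64 * 16 * 4       # viewed as uint32_t[64][16]
max_repeat_bytes = 64 * LANES * 4  # 64 repeats of a 32-bit T[16] vector
```

All three sizes come out to 4096 bytes, which matches the single 4096-byte load and the `uint32_t[1024]` transfer mentioned above.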

18:18 <catern> (that's almost big enough to be scratchpad memory)

18:18 <clever> but i dont think you can do any scalar access into the vector file

18:19 <clever> the only vector->register options you even have, is to store the sum of a vector into a scalar reg

18:19 <clever> otherwise, its almost entirely vector<->ram or scalar->vector

18:19 <clever> or vector<->vector of course

18:27 Ali_A has joined #osdev

18:30 atrapa has joined #osdev

18:34 Arthuria has joined #osdev

18:36 k8yun has joined #osdev

18:38 nyah has joined #osdev

18:45 dude12312414 has joined #osdev

18:47 <Bitweasil> Sounds like a GPU to me!

18:48 <clever> Bitweasil: yep, but its entirely separate from the 3d core

18:48 <Bitweasil> It's a Raspberry Pi. Of course it makes no sense. :p

18:49 <clever> i suspect this feature pre-dates the 3d core being on the chip

18:49 <clever> the official rpi firmware uses it for complex FFT operations

18:49 <clever> and i just had an idea!, a sampling cpu profiler, for the closed firmware

18:50 <clever> knowing the hot-spots in the code, would point towards what code is doing what tasks

18:51 <CompanionCube> does sound cool

18:52 <clever> basically, there are 4 32bit wide compare registers in the rpi's timer

18:52 <clever> i think the official firmware only uses 1

18:52 <clever> and with the arm having its own generic timer, the other 3 channels are unused

18:52 <clever> the VPU's vector table also allows a unique handler for every irq

18:53 <clever> so i could configure 2 timer interrupts, that fire at whatever delay i want (resolution of 1 uSec), and record the return addr i just interrupted
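
The post-processing side of such a sampling profiler could look roughly like this. It is a sketch under assumptions: the sampled return addresses are taken as already collected into a list, and because closed firmware has no symbols, samples are grouped into fixed-size address buckets. `BUCKET`, `hot_spots` and the sample values are hypothetical.

```python
from collections import Counter

BUCKET = 0x100  # hypothetical bucket size in bytes


def hot_spots(samples, bucket=BUCKET):
    """Count samples per address bucket, most frequently hit bucket first."""
    counts = Counter(pc - (pc % bucket) for pc in samples)
    return counts.most_common()


# Made-up return addresses, as a timer interrupt handler might record them.
samples = [0x80001546, 0x80001582, 0x80001500, 0x80004010, 0x80001550]
for start, hits in hot_spots(samples):
    print(f"{start:#010x}: {hits}")
```

Buckets with many hits are the places worth disassembling first.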

18:57 Ali_A has quit [Quit: Connection closed]

18:57 xenos1984 has quit [Read error: Connection reset by peer]

19:04 freakazoid343 has quit [Ping timeout: 256 seconds]

19:04 GeDaMo has quit [Remote host closed the connection]

19:04 Ali_A has joined #osdev

19:06 Oli_ has quit [Ping timeout: 272 seconds]

19:15 xenos1984 has joined #osdev

19:21 dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]

19:33 jimbzy has joined #osdev

19:46 Ali_A has quit [Quit: Connection closed]

20:03 heat_ has quit [Remote host closed the connection]

20:03 heat_ has joined #osdev

20:06 asocialblade has quit []

20:11 dude12312414 has joined #osdev

20:14 heat_ is now known as heat

20:14 bliminse has joined #osdev

20:37 asocialblade has joined #osdev

20:45 atrapa has quit [Quit: atrapa]

20:46 immibis has quit [Remote host closed the connection]

20:47 immibis has joined #osdev

20:49 Burgundy has quit [Ping timeout: 272 seconds]

20:50 asocialblade has quit []

20:51 <Clockface> when a signal handler is invoked in linux, where does it store where to return to when its done signal handling?

20:51 <zid> the stack

20:51 <Clockface> nice

20:52 <Clockface> ty

20:52 <Clockface> so ill find it at the top of the stack?

20:52 <zid> don't forget to add 128 to rsp on x86_64

20:52 <Clockface> why?

20:52 <zid> redzone

20:53 <Clockface> thank you

20:56 Oli has joined #osdev

20:57 <heat> Clockface, linux (and others) store the whole state in the stack

20:57 <heat> the siginfo_t, the mcontext_t, etc

20:58 pretty_d1 has joined #osdev

20:58 pretty_d1 has quit [Client Quit]

20:58 <heat> you can actually change the return state by changing those and sigreturning

20:59 immibis has quit [Remote host closed the connection]

20:59 immibis has joined #osdev

21:01 pretty_dumm_guy has quit [Ping timeout: 272 seconds]

21:02 <mrvn> since when do signal handlers return anything? typedef void (*sighandler_t)(int);

21:03 <zid> nobody said they did

21:03 C-Man has quit [Ping timeout: 272 seconds]

21:03 <mrvn> ups, read that as "what to return"

21:03 <mrvn> Does the stack have the return address on it when you have a signal stack?

21:04 <heat> yes

21:04 <heat> as I said, not just the return address but the whole state is stored there

21:04 <mrvn> one that isn't in the crt.o of the libc?

21:05 <heat> hm?

21:05 <zid> it's a software exception handler

21:05 <zid> so it does what hardware exception handlers do, but in software

21:05 <mrvn> heat: the return address on the stack will be the libc function that switches the stack back to non-signal context.

21:05 <heat> that's not how signals work

21:06 <heat> the libc doesn't dispatch signals

21:06 <heat> the kernel dispatches signals

21:06 * kingoffrance watches catern slowly get a migraine as "what is an interrupt?" takes hold again

21:06 <zid> does the kernel do the filtering and stuff too?

21:06 <zid> I never looked at it

21:06 <mrvn> heat: and you are sure that doesn't go through the libc first?

21:06 <heat> mrvn, yes

21:06 <heat> zid, filtering of what?

21:06 <zid> My naive assumption would have just been that posix registers the 'default' handler and does the processing in software

21:07 <zid> heat: which handlers are registered or ignored etc

21:07 <heat> the only fancy thing the libc does is it sets sa_restorer to a nice function that just syscalls sigreturn

21:07 <heat> zid, no, that's all kernel

21:07 <zid> fair enough, I never looked

21:07 <mrvn> zid: mine too. especially when the libc has to posix-ify the kernel interface

21:07 <zid> I guess linux and posix just match in this case

21:07 <zid> on purpose

21:08 <zid> there's nothing saying you couldn't do a non-posix kernel interface and then need a trampoline to make it posixy

21:08 <heat> the kernel keeps two signal masks (blocked and pending) plus signal queues (for rt signals) and essentially an array of struct sigaction

21:09 <mrvn> zid: often the kernel interface can do more or less than POSIX or the format changed between kernel versions and the libc unifies that into a stable interface.

21:09 <clever> [root@amd-nixos:~]# grep Sig /proc/self/status

21:09 <clever> SigBlk: 0000000000000000

21:09 <clever> SigIgn: 0000000000000000

21:09 <clever> and this lets you view those states for any thread on the system

21:11 <heat> fun fact: even though userspace sigsets have space for 1024 signals, the kernel just uses 64 bit ints

21:11 <heat> that's why the max rt signal is 64
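
The hex masks in clever's paste can be decoded by hand: bit n-1 of the mask stands for signal number n, so a 64-bit mask covers signals 1 through 64. A minimal sketch, where `signals_in_mask` is a made-up helper name:

```python
def signals_in_mask(hex_mask):
    """Decode a SigBlk/SigIgn-style hex mask into a list of signal numbers.

    Bit n-1 set means signal number n is in the set.
    """
    mask = int(hex_mask, 16)
    return [n for n in range(1, 65) if mask & (1 << (n - 1))]


print(signals_in_mask("0000000000000000"))  # the empty masks from the paste
print(signals_in_mask("0000000000010000"))  # bit 16 set, i.e. signal 17
```

Signal numbers are left as plain integers here because the number-to-name mapping differs between architectures.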

21:11 <catern> kingoffrance: lol

21:13 * mrvn wonders where the "struct sigaction *oldact" comes from when the previous handler was set through signal() instead of sigaction().

21:14 <clever> mrvn: i believe the kernel has a copy of the current handler for each signal

21:14 <clever> so when you tell the kernel to set, the kernel can just give you the old value

21:14 <heat> mrvn, signal(2) is not a thing

21:15 <heat> it's just emulated over sigaction

21:15 <heat> https://elixir.bootlin.com/musl/latest/source/src/signal/signal.c#L4

21:15 <bslsk05> elixir.bootlin.com: signal.c - src/signal/signal.c - Musl source code (v1.2.2) - Bootlin

21:16 <mrvn> heat: sure. but does it allocate a struct sigaction on the heap then? Who cleans that up?

21:16 <clever> oh, good point

21:16 <heat> mrvn, huh?

21:16 <heat> you can use the stack?

21:16 <clever> its a pointer, so somebody has to allocate it

21:16 <clever> int sigaction(int sig, const struct sigaction *restrict act, struct sigaction *restrict oact);

21:16 <clever> oh but wait, you're passing the pointer to the syscall!

21:16 <mrvn> heat: a call to sigaction() returns the "struct sigaction *" of the previous call

21:16 <clever> so the syscall cant modify where it points

21:16 <heat> no it doesn't

21:16 <heat> it returns the old struct sigaction

21:17 <heat> you give it a pointer

21:17 <clever> mrvn: you must pass it a sigaction, and the kernel will then populate that for you

21:17 <heat> it doesn't return the previous one

21:17 <clever> with the old value

21:17 <mrvn> so signal() makes one on the stack, the kernel copies it and then copies it back? makes sense.

21:18 <clever> just the contents of the struct describing things gets copied

21:18 <mrvn> clever: yes, it's not a **

21:19 <mrvn> Have I mentioned that out parameters suck?

21:19 <clever> at least its not an m68k :P

21:19 <mrvn> clever: huh?

21:19 <clever> where parameters are either in a or d registers

21:19 <clever> depending on if its a pointer or data

21:19 <clever> pointers go into a registers, like a0, a1, a2, so the cpu can use it as an address immediately

21:20 <clever> data goes into d registers like d0/d1/d2

21:20 <mrvn> clever: you would have something similar on x86 if it didn't put everything on the stack

21:20 <heat> praise be the stack

21:20 <clever> but at least on x86, isnt it always managed by gcc?

21:20 <clever> amiga is the only time ive seen that being tagged in the args of a function

21:20 <mrvn> clever: how would gcc manage rust code?

21:21 <clever> where every function basically has a unique abi

21:21 <clever> mrvn: well, the compiler

21:21 <mrvn> clever: same thing with float, double, mxx, sse registers

21:21 <mrvn> mmx

21:21 <j`ey> gcc rust frontend coming soon(tm)

21:21 <clever> yeah

21:22 <clever> let me get an example though

21:22 <mrvn> Having different types passed in different registers isn't really anything uncommon.

21:22 <heat> gcc rust frontend comes in valve time

21:22 <clever> https://github.com/michalsc/Emu68-tools/blob/master/sdcard.device/src/open.c#L12

21:22 <bslsk05> github.com: Emu68-tools/open.c at master · michalsc/Emu68-tools · GitHub

21:22 <clever> void SD_Open(struct IORequest * io asm("a1"), LONG unitNumber asm("d0"), ULONG flags asm("d1"))

21:22 <clever> mrvn: more, that its being explicitly stated like this, for every function in the entire program

21:23 <mrvn> clever: that's basically just a workaround the compiler being too stupid to implement the calling convention.

21:23 <mrvn> +for

21:23 <clever> yeah

21:23 <clever> and there may be exceptions in the official libraries, that store pointers in d regs

21:24 <mrvn> The compiler should know that pointers go to A0/1 and integers to D0/1

21:24 <mrvn> clever: odd that your example uses A1 though.

21:25 <clever> certain a registers (i forget) are reserved, to act like a `this` pointer

21:25 <mrvn> clever: A6

21:25 <nomagno> I lost the reply to my last message

21:26 <nomagno> It was 1-2 days back. Anyone have it archived?

21:26 <clever> http://d0.se/autodocs/exec.library/OpenLibrary

21:26 <bslsk05> d0.se: d0 - The Classic M68k Amiga Developer Resource

21:26 <clever> mrvn: and this is an index of all of the official functions, and the abi for each

21:26 <heat> nomagno, the logs do

21:26 <mrvn> clever: I have that in dead tree

21:26 <clever> heh

21:27 <nomagno> heat: which logs? I'm not very used to IRC

21:27 <heat> https://libera.irclog.whitequark.org/osdev

21:27 <heat> in the topic

21:27 <bslsk05> libera.irclog.whitequark.org: #osdev on 2022-03-29 — irc logs at whitequark.org

21:27 <nomagno> aah

21:27 <mrvn> The Amiga RKRMs are great. Big book with all the interfaces, big book explaining how to use them and the concepts behind them and so on.

21:28 <clever> mrvn: the more i learn about the amiga, the more dos looks like a step backwards

21:28 <clever> but they are also pretty close in age

21:28 <mrvn> clever: dos is a joke

21:28 <clever> exactly

21:28 <mrvn> clever: AmigaOS in 85 was better than Windows 3.1 or even later.

21:29 <clever> and it even had auto-configuration of cards like pci, in the form of the zorro expansion cards

21:29 <clever> back when x86 was still on isa i think

21:30 <nomagno> Can't find it...

21:30 <mrvn> You can take an Amiga 500, plug in a modern IDE drive (if you have a controller) and have it boot from ZFS by providing the filesystem driver in the harddisk.

21:31 <nomagno> It didn't send, great

21:31 <clever> emu68 is using a virtual zorro3 card, to inject a rom into the initialization sequence

21:31 <nomagno> Is it more RISCy to have fixed-size instructions where space might be wasted depending on opcode, or more optimal variable-size instructions with length depending on opcode?

21:31 <clever> mrvn: that then injects sdcard.device into the driver array, so tagged SD card partitions appear as whole drives in amiga

21:31 <heat> i feel like fixed size instructions are more in the spirit of risc

21:31 <heat> as it simplifies decoding

21:32 <heat> vs the crapshoot you have in x86

21:32 <mrvn> heat: x86 is probably the worst there

21:32 <clever> i dont want to even think about how i would do an x86 decoder in verilog

21:33 <nomagno> Issue is these instructions are 5-bytes long

21:33 <nomagno> Which is pretty big for an 8-bit system with 16-bit address space

21:33 <mrvn> m68k has one 16bit opcode followed by optional immediates. None of those prefix bytes where you have to parse each before you even get the opcode.

21:33 <nomagno> clever: U don't cuz it's patented :O

21:33 <heat> no it's not

21:33 <nomagno> Depends on which instruction set you take to be x86

21:33 <heat> 32-bit x86 isn't

21:34 <mrvn> nomagno: why would you have 5 bytes?

21:34 <heat> and decoding x86 definitely isn't since emulators exist

21:34 <mrvn> heat: something can be patented and not be enforced

21:35 <nomagno> mrvn: Well operands may be addresses or literals, and instructions may have up to two operands. It's not very RISCy I agree, but that's beside the point

21:35 <nomagno> I explicitly rejected load/store architecture

21:35 <mrvn> nomagno: so you want to make CISC except for the variable opcode length?

21:36 <clever> nomagno: what is a load/store architecture ?

21:36 <mrvn> clever: the part that I would say makes a RISC system

21:36 <clever> as-in, only a load/store opcode can access ram, and mov/alu-stuff cant?

21:36 <heat> if youre skipping load/store that kinda stops being a RISC as instructions get complex as hell

21:36 <mrvn> clever: other than load and store opcodes operate on registers only.

21:37 <clever> ahhh

21:37 <heat> fixed size is also impossible there

21:37 <clever> but half of that, could just be naming

21:37 <heat> because instructions get very big

21:37 <clever> what if i patch the x86 objdump, so that `mov a, [b]` gets disassembled as `ld a, b` ?

21:37 <clever> now its a load/store arch!

21:38 <nomagno> heat: Well yeah I figured. I'll just deal with the 5 bytes per instruction, honestly

21:38 <mrvn> clever: no, because you have add r, #mem.

21:38 <nomagno> It gets too complex to implement otherwise

21:38 <nomagno> Or well, it gets too complex to read the bytecode

21:38 <clever> yeah, so its the alu directly from memory, that makes it not load/store

21:38 <mrvn> nomagno: 5 bytes is a horrible number

21:38 <mrvn> clever: jmp #mem

21:38 <clever> and i do agree that alu from memory, is a bad idea

21:39 <nomagno> mrvn: Because it's big or because it's odd?

21:39 <clever> you have no control over when the fetch happens, and it harms your latency when it cache misses

21:39 <mrvn> clever: usually when you have one component like the ALU with mem access then all parts have mem access

21:39 <clever> with load/store stuff, you can load it ahead of the alu op, and hope out-of-order cores run it faster

21:39 <mrvn> nomagno: the latter

21:40 <nomagno> mrvn: there are no real drawbacks to 4 vs 5 aside from aesthetics. No system ever is going to have issues with it

21:40 <mrvn> clever: that hardly matters in modern pipelines.

21:40 <clever> mrvn: what about an opcode like `switch r0`, where r0 is an int, and the opcode is immediately followed by an int16_t[] of pc-relative offsets?

21:40 <mrvn> nomagno: bus size, cache line size, DRAM sizes

21:40 <clever> thats not exactly `jmp #mem`, because the table is at a predictable offset, and likely in your i-cache

21:40 <nomagno> mrvn: it's really just for the byte code. The first byte gets split into two internally very easily

21:41 <mrvn> clever: up to which size?

21:41 sonny has joined #osdev

21:42 <mrvn> clever: is the int16_t[] part of the opcode? Or do you do the ARM jump table thing where each entry is a jump instruction on its own?

21:42 <clever> mrvn: ive yet to see an upper limit, but it can only jump up to +64kb forwards, so if the table goes over ~32768 entries, all forward jumps become impossible

21:42 <mrvn> switch r0 == add PC, 4 * r0

21:43 <clever> mrvn: its not really `add PC, 4 * r0`, its more of a `ld pc, pc + (r0 * 2)`

21:43 <clever> hmmm no

21:43 <mrvn> clever: lea

21:44 <mrvn> clever: On ARM it's 32bit per opcode so 4*r0 there.

21:44 <clever> ld t, r0 * 2; t2 = t*2; pc = pc + t2;

21:44 <clever> thats the best way to describe it

21:44 <clever> all opcodes must be a multiple of 16bits on here, so bit0 is assumed to be 0, and the offset table contains bits 16:1

21:45 <clever> but it also has negative offsets

21:45 <clever> so each slot is an offset +/- 64kb, forwards or backwards

21:46 <clever> let me get an example

21:46 <mrvn> clever: you can always design something that breaks any precondition

21:47 <nomagno> I mean, I can't do much about the 5 bytes. I need 4 bits for the complex addressing mask, 4 for the opcode (yes, I have 16 instructions. Surprisingly nice to work with), and 16 for each operand

21:47 <mrvn> nomagno: not every opcode needs the same number of operands or the same addressing modes.

21:48 <clever> https://gist.github.com/cleverca22/8dc0eab6a27bf2e92f81b465df3fa7cb

21:48 <bslsk05> gist.github.com: gist:8dc0eab6a27bf2e92f81b465df3fa7cb · GitHub

21:48 <clever> > (0x80001546 + (0x1e * 2)).toString(16)

21:48 <clever> '80001582'

21:48 k8yun_ has joined #osdev

21:48 <nomagno> mrvn: Which is why it's really variable length

21:48 <nomagno> But the implementation cost of that doesn't really outweigh the simplicity of 5 bytes

21:49 <nomagno> I prefer 5 byte chunks to a singly linked list.

21:49 <clever> mrvn: the 1e on line 6, is an offset to pc + (0x1e * 2), and with line 6 being the 2nd index (0 based), it will jump to 80001582 if r2==2
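
The target calculation clever describes can be written out as a short sketch. Assumptions: `base` stands for whichever pc value the hardware uses as the reference (0x80001546 in the paste), table entries are signed and count 16-bit units, and every table entry except 0x1e is invented for illustration. `switch_target` is a hypothetical name.

```python
def switch_target(base, offsets, index):
    """Jump target for table slot `index`.

    Offsets are signed and measured in 16-bit units, so they are doubled
    before being added to the base address.
    """
    return (base + offsets[index] * 2) & 0xFFFFFFFF


# Only the 0x1e entry at index 2 comes from the log; the rest is made up.
table = [0x10, 0x18, 0x1E, -0x04]
base = 0x80001546

print(hex(switch_target(base, table, 2)))  # the r2 == 2 case from the log
print(hex(switch_target(base, table, 3)))  # a negative offset jumps backwards
```

Because every slot is relative to the base, the table needs no relocation when the code is loaded at a different address.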

21:49 <mrvn> nomagno: why not 4 bytes?

21:49 <nomagno> mrvn: what am I supposed to shave off?

21:49 <mrvn> nomagno: one of the later 4 bytes

21:50 <nomagno> 4 bits for complex addressing, 4 bits for opcode, 16 bits for arg1, 16 bits for arg2
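
The 5-byte layout nomagno lists can be sketched with the `struct` module. The field order within the first byte (addressing mask in the high nibble, opcode in the low nibble) and the big-endian operands are assumptions, since the log does not specify them; `encode` and `decode` are hypothetical names.

```python
import struct

# Assumed layout: 1 byte (mask << 4 | opcode), then two 16-bit operands.
# ">" selects big-endian with no padding, so the result is exactly 5 bytes.
FORMAT = ">BHH"


def encode(mask, opcode, arg1, arg2):
    """Pack one instruction into 5 bytes."""
    if not (0 <= mask < 16 and 0 <= opcode < 16):
        raise ValueError("mask and opcode must each fit in 4 bits")
    return struct.pack(FORMAT, (mask << 4) | opcode, arg1, arg2)


def decode(raw):
    """Unpack 5 bytes into (mask, opcode, arg1, arg2)."""
    head, arg1, arg2 = struct.unpack(FORMAT, raw)
    return head >> 4, head & 0xF, arg1, arg2


raw = encode(0x3, 0xA, 0x1234, 0xBEEF)
print(raw.hex(), decode(raw))
```

A fixed 5-byte record keeps the decoder trivial, at the price of wasted operand bytes for instructions that take fewer than two operands.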

21:50 <mrvn> why 16bit?

21:50 <mrvn> you don't have 65536 registers, do you?

21:51 <nomagno> 16 bit address space, 8-bit words, no load-store architecture. You just read that though

21:51 <clever> mrvn: in the example i linked, there is a compare against 0xe, and a "branch if higher" right before the switch, so its enforcing an upper limit on the offset-table size

21:51 <mrvn> nomagno: most cpus only allow one memory operand per opcode.

21:51 <nomagno> mrvn: Well, uh... Good for them.

21:51 k8yun has quit [Ping timeout: 260 seconds]

21:52 asocialblade has joined #osdev

21:52 <nomagno> It's a programmer-oriented machine code, not a compiler oriented machine code

21:52 <mrvn> nomagno: normally machine code is hardware oriented

21:53 <nomagno> It's P-code

21:53 <nomagno> No need

21:53 <heat> if you want risc and easier programming, just add pseudo instructions

21:53 <clever> mrvn: gist updated to annotate things better

21:54 the_lanetly_052 has joined #osdev

21:54 <clever> does it make sense now?

21:54 <mrvn> clever: it made sense an hour ago. Everybody knows what a jump table is

21:54 <clever> but its also pc-relative, and immediately after the opcode

21:55 <clever> so its more predictable than `jmp #r0` from x86

21:55 <clever> and also PIC friendly

21:55 the_lanetly_052_ has quit [Ping timeout: 260 seconds]

21:55 <mrvn> clever: ARM uses that in hardware

21:55 <nomagno> PC-relative addressing feels to me like you have to be actually superhuman to be able to do many memory operations without messing up

21:56 <mrvn> nomagno: that's why you have an assembler and not just a hex editor.

21:56 <clever> mrvn: how did arm do this?

21:57 <mrvn> clever: the jump table, or interrupt vector or whatever is an uint32_t[] where each entry is a branch instruction.

21:57 <clever> ah, but thats different, its more setting PC to the addr of that slot

21:57 <clever> and that slot must contain a branch opcode

21:57 alpha2023 has quit [Ping timeout: 272 seconds]

21:57 <clever> vs the slot just being an offset relative to the source

21:57 <nomagno> mrvn: 'assembly is 1-to-1 coding of machine code' goes brrr

21:57 <mrvn> clever: just a different encoding

21:58 <clever> it starts to differ, when you introduce the switch.b opcode

21:58 <clever> which is followed by an int8_t[] array of offsets

21:58 <clever> which now allows denser packing, when you only need short offsets

21:58 <mrvn> clever: still not anything conceptually new

21:59 <nomagno> Is the relationship between time and memory price more pronounced than Moore's law?

21:59 <heat> assembly is not 1-to-1 coding of machine code

22:00 <clever> ldr could be either a mov or a ld i think?

22:00 <clever> depending on the size of the immediate

22:00 * mrvn wonders how "#include" translate into machine code

22:00 <nomagno> heat: Well did I make a programming language and not a virtual machine then?

22:01 <mrvn> nomagno: you made whatever you made. C++ has a virtual machine too.

22:01 k8yun_ has quit [Quit: Leaving]

22:02 <nomagno> Then that statement was kinda void. Assembly that is 1-to-1 coding still needs to be assembled

22:02 alpha2023 has joined #osdev

22:02 <mrvn> nomagno: Assembly that is 1-to-1 coding is a pretty stupid assembly and assembler.

22:02 <Griwes> nonono, C++ has an *abstract* machine :P

22:03 <mrvn> Griwes: that too

22:05 <mrvn> nomagno: last time I had a 1-to-1 coding was the machine code editor on my C64.

22:05 <mrvn> I wouldn't even call it an assembler.

22:11 C-Man has joined #osdev

22:18 onering has quit [Ping timeout: 250 seconds]

22:20 Beato has joined #osdev

22:55 <klange> Hot take: In the academic sense, assembly languages are not programming languages. Sure, they're "languages" that you can "program" in, but they aren't "PLs".

22:55 <moon-child> why?

22:56 <moon-child> they have well-defined semantics, and formal models have been constructed of them

22:56 <moon-child> probably _more_ formal models are made of asm (or, machine code) than of any other sort of language, by cpu vendors

22:57 <Bitweasil> I would agree with klange here. Assembly, in almost all cases, is just mnemonics for the machine opcodes, with a few nice bits added (macros).

22:57 <Bitweasil> I would generally define a programming language as something more abstracted and human-focused that can be converted to something the machine can execute.

22:58 <moon-child> per church-turing hypothesis, anything can be converted to something the machine can execute

22:58 <sonny> assembly is just a low level programming language

22:58 <moon-child> 'abstracted and human-focused' is not an interesting definition in an 'academic' sense

23:00 Likorn has quit [Quit: WeeChat 3.4]

23:01 <Bitweasil> Yeah, good thing I'm not in academia. ;)

23:01 <Bitweasil> My desire for a PhD has dropped off very rapidly after my Masters.

23:02 <klange> Academia is full of squares. That's why the hats are shaped the way they are.

23:10 <graphitemaster> Everyone knows the only language that actually classifies as a programming language is HTML

23:11 <klange> /kickban graphitemaster

23:12 <heat> html bad

23:12 <heat> css bad

23:13 <heat> vote for generating pages using javascript

23:13 <klange> generate your pages with kuroko

23:13 <heat> kuroko DOM when?

23:14 <klange> when wasm has dom stuff

23:14 <klange> quite a bit of kuroko-lang.github.io is written in kuroko running in your browser

23:14 Likorn has joined #osdev

23:15 <klange> especially the IDE, that's almost entirely kuroko with just a few little JS bridges to do DOM stuff

23:35 vdamewood has quit [Read error: Connection reset by peer]

23:36 vdamewood has joined #osdev

23:37 Arthuria has quit [Read error: Connection reset by peer]

23:40 Oli has quit [Ping timeout: 272 seconds]

23:41 HardWall has quit [Read error: Connection reset by peer]

23:47 skipwich has quit [Quit: DISCONNECT]

23:47 skipwich has joined #osdev