klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
rustyy has quit [Quit: leaving]
fwg has joined #osdev
knusbaum has quit [Quit: ZNC 1.8.2 - https://znc.in]
knusbaum has joined #osdev
Ali_A has quit [Quit: Connection closed]
<Clockface> are there any ways to declare "local" labels in NASM?
<Clockface> eh
<Clockface> i just thought of something that works instead, putting a %macro in my start macro and %endmacro in my ending macro so i can %% everything local within there, even if the "macro" is used for nothing
<Clockface> *used only once
<Clockface> well, how would i have a macro end/begin a macro that is a small part of?
<Clockface> *it is
Likorn has quit [Quit: WeeChat 3.4.1]
<geist> no clue, i dont use nasm, sorry
<nur> gas 4eva
<geist> gas 4 lyfe
JanC has quit [Remote host closed the connection]
JanC has joined #osdev
gog has quit [Ping timeout: 276 seconds]
<kingoffrance> eh? i thought local label is just .foo (period prefix)
<kingoffrance> then you can refer outside with xyz.foo IIRC
<kingoffrance> this is one of those "how can you have an older qemu version than me" "how can you know less nasm than me" :/
nyah has quit [Ping timeout: 256 seconds]
<kingoffrance> yes, i believe the %% thing also does period prefix
pretty_dumm_guy has quit [Quit: WeeChat 3.5]
<Clockface> thanks
<kingoffrance> i had some weird issue calling <var args macro> from <var args macro> where it would complain about %0 :/ other than that, i trust nasm pp more than C preprocessor :)
Ali_A has joined #osdev
knusbaum has quit [Quit: ZNC 1.8.2 - https://znc.in]
knusbaum has joined #osdev
lainon has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
lainon has joined #osdev
Oshawott has quit [Read error: Connection reset by peer]
qubasa_ has joined #osdev
lainon has quit [Ping timeout: 250 seconds]
qubasa_ is now known as qubasa
Ali_A has quit [Quit: Connection closed]
mctpyt has joined #osdev
flx-- has quit [Ping timeout: 256 seconds]
Burgundy has joined #osdev
flx-- has joined #osdev
Burgundy has quit [Ping timeout: 272 seconds]
archenoth has joined #osdev
ThinkT510 has quit [Quit: WeeChat 3.5]
gxt has quit [Remote host closed the connection]
gxt has joined #osdev
ThinkT510 has joined #osdev
Likorn has joined #osdev
pg12 has quit [Quit: pg12]
pg12 has joined #osdev
wootehfoot has joined #osdev
pg12 has quit [Quit: pg12]
pg12 has joined #osdev
GeDaMo has joined #osdev
doorzan has joined #osdev
mctpyt has quit [Read error: Connection reset by peer]
vdamewood has joined #osdev
gog has joined #osdev
doorzan has quit [Ping timeout: 276 seconds]
pretty_dumm_guy has joined #osdev
gog has quit [Ping timeout: 276 seconds]
nyah has joined #osdev
jjuran has quit [Quit: Killing Colloquy first, before it kills me…]
jjuran has joined #osdev
Burgundy has joined #osdev
wootehfoot has quit [Read error: Connection reset by peer]
pretty_d1 has joined #osdev
pretty_d1 has quit [Client Quit]
pretty_d1 has joined #osdev
pretty_dumm_guy has quit [Ping timeout: 246 seconds]
kspalaiologos has joined #osdev
wootehfoot has joined #osdev
mahmutov has joined #osdev
rustyy has joined #osdev
pretty_d1 has quit [Quit: WeeChat 3.5]
lg has quit [Read error: Connection reset by peer]
lg has joined #osdev
yasar11732 has joined #osdev
wootehfoot has quit [Ping timeout: 272 seconds]
yasar11732 has quit [Ping timeout: 272 seconds]
yasar11732 has joined #osdev
lg has quit [Ping timeout: 272 seconds]
yasar11732 has quit [Read error: Connection reset by peer]
wootehfoot has joined #osdev
vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
gog has joined #osdev
vdamewood has joined #osdev
yasar11732 has joined #osdev
wootehfoot has quit [Ping timeout: 248 seconds]
yasar11732 has quit [Ping timeout: 246 seconds]
wootehfoot has joined #osdev
wootehfoot has quit [Quit: Leaving]
wootehfoot has joined #osdev
kspalaiologos has quit [Quit: Leaving]
pretty_dumm_guy has joined #osdev
Likorn has quit [Quit: WeeChat 3.4.1]
vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
heat has joined #osdev
<bslsk05> ​datatracker.ietf.org: RFC 1122 - Requirements for Internet Hosts - Communication Layers
<heat> ^^really useful rfc of stuff all around the stack
Likorn has joined #osdev
<geist> indeed
<geist> also yawn
<bslsk05> ​github.com: lk/inetsrv.c at master · littlekernel/lk · GitHub
<mahmutov> sorry wrong place
<heat> hey accidentally not that offtopic heh
vdamewood has joined #osdev
<heat> also funnily enough they assume packets that are waiting for an arp reply should be queued
<heat> i guess that was already the way of doing things back then
<geist> oh versus dropping it and having the higher level try again later?
<geist> for TCP it'd probably work, but UDP would be pretty annoying
<heat> no, versus just waiting synchronously for the arp reply (or failure)
<heat> my code right now is doing "neighbour *n = resolve(...); send_packet(n->hw_address, packet)" which blocks on resolve
<heat> this is not a correct solution
<geist> pretty sure i do something equally bad
<heat> at least not in my design (a-la linux) where there's a good chunk of code that runs under softirq
<heat> the correct solution is to queue packets under the neighbour and pretend you sent it, then if you get an arp reply, you send everything that's queued. if you don't, you signal an error
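The queue-under-the-neighbour scheme heat describes can be sketched roughly as follows. This is only an illustration: every name in it (`struct neighbour`, `neigh_output`, `neigh_dequeue`, `NEIGH_QUEUE_MAX`, and so on) is hypothetical and taken from neither heat's kernel nor Linux or LK, and the fixed-size queue is an assumption made to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NEIGH_QUEUE_MAX 4   /* arbitrary cap (assumption) so a dead host cannot pin memory */

/* Placeholder packet type for the sketch; a real stack has its own buffer type. */
struct packet { const void *data; size_t len; };

enum neigh_state   { NEIGH_INCOMPLETE, NEIGH_REACHABLE, NEIGH_FAILED };
enum neigh_verdict { NEIGH_SEND_NOW, NEIGH_QUEUED, NEIGH_DROPPED };

struct neighbour {
    enum neigh_state state;
    uint8_t hw_address[6];
    struct packet *queue[NEIGH_QUEUE_MAX];  /* FIFO of packets awaiting the ARP reply */
    size_t head, count;
};

/* Send path: never blocks, so it is safe to call from softirq-like context. */
enum neigh_verdict neigh_output(struct neighbour *n, struct packet *p)
{
    if (n->state == NEIGH_REACHABLE)
        return NEIGH_SEND_NOW;              /* caller transmits to n->hw_address */
    if (n->state == NEIGH_FAILED || n->count == NEIGH_QUEUE_MAX)
        return NEIGH_DROPPED;               /* caller signals the error */
    size_t tail = (n->head + n->count) % NEIGH_QUEUE_MAX;
    n->queue[tail] = p;
    n->count++;
    return NEIGH_QUEUED;                    /* "pretend you sent it" */
}

/* Pop the oldest queued packet, or NULL when nothing is pending. */
struct packet *neigh_dequeue(struct neighbour *n)
{
    if (n->count == 0)
        return NULL;
    struct packet *p = n->queue[n->head];
    n->head = (n->head + 1) % NEIGH_QUEUE_MAX;
    n->count--;
    return p;
}

/* ARP reply arrived: record the address; the caller then drains the
 * queue with neigh_dequeue() and transmits each packet. */
void neigh_resolved(struct neighbour *n, const uint8_t hw[6])
{
    memcpy(n->hw_address, hw, 6);
    n->state = NEIGH_REACHABLE;
}

/* ARP retries exhausted: the caller drains the queue and reports an
 * error for each packet instead of sending it. */
void neigh_failed(struct neighbour *n)
{
    n->state = NEIGH_FAILED;
}
```

Returning a verdict instead of calling into the driver keeps the sketch free of callbacks; a real stack would more likely transmit or free the packets directly from the resolve and timeout paths.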
<bslsk05> ​github.com: lk/minip.c at master · littlekernel/lk · GitHub
<geist> sends an arp packet, sets a timer, waits a bit, tries again
<geist> all synchronously
<geist> and yeah i should probably do something like build an arp queue object as part of the socket that can be used to hold pending things as you say
<heat> the neighbour isn't part of the socket, it's part of the cache
<geist> yah, but then that means i dont have to malloc anything at run time
<geist> ie, have the queue structure as part of the socket in case it needs it
<clever> random note on the subject of network stacks
<geist> since the socket probably only has one neighbor ever it needs to arp
<clever> the official rpi firmware, in its tftp client, is oddly deterministic in its choice of source port
<geist> but, then yeah that also violates some layering, since it assumes arp is there
<clever> and xinetd gets upset if you reuse a source port for multiple requests
<clever> i assume your at least using a new source port for each udp "connection" ?
<geist> or need to wait some period of time because of usual socket timeouts
<geist> 2 minutes is i think the classic default?
<heat> tcp is 900 seconds iirc
<heat> hmm maybe I'm wrong
<geist> so it's possible xinetd is doing the usual 'i've seen a thing from this host at this socket and i'll keep a structure around for N seconds and assume it's the same client'
<geist> especially with UDP where it's connectionless
<clever> except the client has terminated because the request finished
<geist> so if the client is rebooted, using the same source/port it can't tell
<clever> so xinetd assumes its junk at the end of a session, and doesnt spawn a new client
<clever> the rpi firmware also reuses the same port within a single boot, when fetching multiple files back to back
<geist> maybe there's some sort of termination opcode the client is supposed to send?
<clever> and xinetd just silently drops those packets
<geist> also possible xinetd is simply not a good tftp server
<clever> yeah, i believe tftp does have such a termination code
<heat> yeah it looks like it's 2 minutes, dunno where i pulled this 900s out of
<heat> probably my ass
<geist> yah i vaguely remember 2 minutes because i think it was mentioned in the tcp/ip illustrated book that the number was picked because there was some notion at the time that clearly no computer can reboot within 2 minutes, so it was a good timeout
<heat> hahahaha
<clever> lol
<clever> and now you have systems doing udp within seconds of power-on
<geist> which is of course what it was really trying to deal with: computer restarting and reusing existing address/port pairs
<Griwes> lol
<Griwes> now you have systems probably doing udp while turned off, too
<clever> +++ xinetd-2.3.15/xinetd/util.c 2021-09-06 18:15:32.671053769 -0300
<clever> void drain( int sd )
<geist> so you hold the closed connections in a purgatory for 2 minutes
<clever> + return;
<clever> yeah, here it is
<heat> geist, i think they say they picked something that's kind of representative of how long a packet can be stuck in transit
<geist> possible
<clever> if xinetd detects traffic on an FD after the child daemon exited, it will just drain all data to /dev/null
<clever> and i had to jam a `return;` in there to make things work properly
<geist> clever: yeah may not be valid to spec
<geist> kinda makes sense to a point, dealing with chained transfers on a protocol like UDP is probably much more complicated
<clever> it may be useful to keep this in mind, and throw in some rng on the source port?
<geist> because you have no guarantee of ordering. you could start receiving data before the opcode, etc
<geist> though if the tftp protocol itself has some in-band notion of a random session id it'd probably solve it
<clever> so you dont wind up with the same bugs
<geist> but yes, rng on the source port i think would be pretty sufficient
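A minimal sketch of "rng on the source port", assuming the caller already has some source of random bits. The function name `pick_source_port` and its parameters are made up for illustration; the 49152..65535 range is the IANA dynamic port range, not anything the rpi firmware or LK is known to use.

```c
#include <stdint.h>

#define EPHEMERAL_FIRST 49152u   /* start of the IANA dynamic/private port range */
#define EPHEMERAL_COUNT 16384u   /* 49152..65535 inclusive */

/* Map caller-supplied random bits onto the dynamic range, and never hand
 * back the port used by the previous transfer, so a server still holding
 * state for the old (address, port) pair does not mistake the new request
 * for trailing junk. */
uint16_t pick_source_port(uint32_t random_bits, uint16_t last_port)
{
    uint16_t port = (uint16_t)(EPHEMERAL_FIRST + random_bits % EPHEMERAL_COUNT);
    if (port == last_port)
        port = (uint16_t)(EPHEMERAL_FIRST +
                          (port - EPHEMERAL_FIRST + 1u) % EPHEMERAL_COUNT);
    return port;
}
```

Because the range size is a power of two, the modulo introduces no bias; the quality of the result is entirely down to the random bits passed in.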
<clever> *looks*
<clever> Apr 30 16:33:57 router dhcpd4[1012]: DHCPOFFER on 192.168.2.51 to b8:27:eb:0a:ad:04 via lan
<clever> [root@router:~]# tcpdump -i lan src 192.168.2.51 or dst 192.168.2.51 -w pi1-netboot.pcap
<clever> reading it in wireshark, i can see that the tftp session begins with a 16bit opcode, initially "read request(1)", a null terminated path, and a type of "octet"
<heat> what's tftp good for?
<clever> the server responds with opcode=3 (data packet), a 16bit block#, and 512 bytes of data
<clever> and the client responds with opcode=4 ack, and a block#
<clever> heat: transferring files without a proper tcp stack
<heat> but you're ending up implementing a small tcp stack to ensure reliability
<heat> vs HTTP 1.1: GET /vmlinux HTTP/1.1\r\n
<clever> tftp lacks things like dynamic window scaling, or even having a window at all
<heat> right, but you don't need that to communicate
<geist> heat: it's very very simple
<geist> basically it was designed for things like diskless terminals to fetch their firmware
<clever> and i'm not sure how wireshark or the client can tell, but the last block seems to entirely lack any signs its the last block
<clever> its just not the expected block size
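The packets clever describes can be sketched as below. The opcodes and layout match what he reads out of wireshark, and the short final block is the end-of-transfer signal defined by RFC 1350 (a DATA packet with fewer than 512 bytes of payload is the last one). The function names are hypothetical and the sketch ignores error packets, retransmission and option negotiation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { TFTP_RRQ = 1, TFTP_DATA = 3, TFTP_ACK = 4, TFTP_BLOCK_SIZE = 512 };

/* All TFTP header fields are 16-bit big-endian. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFF);
}

static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read request: opcode 1, NUL-terminated path, NUL-terminated mode "octet".
 * Returns the packet length, or 0 if buf is too small. */
size_t tftp_build_rrq(uint8_t *buf, size_t cap, const char *path)
{
    static const char mode[] = "octet";
    size_t plen = strlen(path) + 1;          /* include the NUL */
    size_t need = 2 + plen + sizeof mode;    /* sizeof counts mode's NUL too */
    if (need > cap)
        return 0;
    put_be16(buf, TFTP_RRQ);
    memcpy(buf + 2, path, plen);
    memcpy(buf + 2 + plen, mode, sizeof mode);
    return need;
}

/* Acknowledgement: opcode 4 followed by the block number being acked. */
size_t tftp_build_ack(uint8_t *buf, size_t cap, uint16_t block)
{
    if (cap < 4)
        return 0;
    put_be16(buf, TFTP_ACK);
    put_be16(buf + 2, block);
    return 4;
}

/* Data packet: opcode 3, block number, then up to 512 payload bytes.
 * Returns -1 if malformed, 1 if this is the final block (payload shorter
 * than 512 bytes), 0 if more blocks follow. */
int tftp_parse_data(const uint8_t *pkt, size_t len,
                    uint16_t *block, size_t *payload_len)
{
    if (len < 4 || get_be16(pkt) != TFTP_DATA || len - 4 > TFTP_BLOCK_SIZE)
        return -1;
    *block = get_be16(pkt + 2);
    *payload_len = len - 4;
    return *payload_len < TFTP_BLOCK_SIZE;
}
```

A file whose size is an exact multiple of 512 bytes therefore ends with a zero-length DATA packet, which is how the receiver tells "done" apart from "more to come".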
<geist> can be implemented along with a lot of a UDP stack in hundreds of lines of code, basically
<clever> the bloody bootrom on a pi3 has a full dhcp&tftp client
<clever> along with usb-host and usb-nic drivers
<geist> from a reliability point of view the block # + ack is not fast but pretty reliable, since it simply retransmits
<clever> its just 30kb of compiled binary code, to implement usb-msd, usb-nic, udp, dhcp, tftp, and SD card
<heat> in theory i think you could half-ass tcp pretty easily too
* geist shrugs
<geist> just sayin, that's why it exists
<heat> yup, not doubting you
<geist> usually useful for firmware like stuff that's really dumb
<geist> can even talk to other firmware like stuff that's really dumb
<geist> also at the time it was invented, DHCP hadn't yet morphed out of BOOTP, which is also simpler
<geist> and a pretty straightforward way to get an ip address + the file you're supposed to go fetch from the tftp server + the tftp server
<clever> and the rpi just ignores those standards :P
<clever> it tftp's the dhcp server, ignoring whatever next-server you had set
<clever> and always fetches bootcode.bin, not the file you set
<clever> but, only if the dhcp server responds with a magic option code!
<heat> standards are optional
<geist> ah well, thats good to know if i ever futzed with it, because i *specifically* dont have a tftp server on my firewall which is handing out dhcp
<clever> option rpiboot code 43 = text;
<heat> rfc = request for comments
<clever> option rpiboot "Raspberry Pi Boot ";
<heat> my comment is that your standard is stupid
<clever> geist: you can also configure a 2nd dhcp server on the lan, that responds with the above magic handshake
<clever> and the rpi will only listen to the 2nd server
* geist nods
<clever> pi4 greatly improves things
<clever> you can configure the tftp ip right in the eeprom config file
<geist> well, honestly i probably wont be futzing with that any time soon because a certain person keeps not rolling back changes into my repository
<clever> so it can now differ, and the dhcp server doesnt have to co-operate
<clever> yeah, i need to finish that pl011 and ext4 pr's, and look into bcm2711 support
<geist> but actually my last free rpi4 has developed a fan problem so i have to actually fix it first
<geist> i turned it off a few months ago because it was making a ton of noise
<geist> i like the little case i have for it, but the fans keep crapping out
<geist> and then it's a pain to disassemble
<heat> clever, if you get edk2 on your pi you'll get an actual network stack with proper network booting
<bslsk05> ​github.com: Releases · pftf/RPi4 · GitHub
<clever> but, you need to either include the start4.elf + tianocore on an SD card, or netboot them via the non-standard path
<heat> i know
<heat> it's a bit useless if you just want to boot
<geist> oh on that note, i should buy one of the vim4s and see what it does
<heat> only really useful if you want to boot windows 10
<heat> vim4 probably goes brrrr
<clever> heat: i do have rough plans on how to bake tianocore into an SPI flash chip, like a proper motherboard
<clever> it should be possible on the entire pi model range
<geist> it's probably unobtanium like everything else
<geist> gosh i really wish i could figure out how to compile gcc for mac
<geist> 11.3 was released, but it still doesn't natively build. must be some trick or hack i need to do
<clever> but tianocore wont fit on the pi4's stock 512kbyte flash chip
<heat> geist, ? my mac toolchains compile fine
<clever> and the pi4 firmware explicitly checks the spi flash size, and refuses to operate if its not 128k or 512k
<heat> 11.2.0
<geist> heat: oh? gcc 11? on arm mac?
<clever> so, i would need to include tianocore on a 2nd flash chip
<geist> i should have been more clear: arm mac has some linkage problem
<heat> ah no, not arm
<bslsk05> ​IRCCloud pastebin | Raw link: https://irccloud.com/pastebin/raw/85asvF1v
<geist> etc
<heat> do you have brew on the arm macs?
<geist> i do, well macports
<geist> but it's some sort of gcc build system doesn't do the right thing for arm problem
<bslsk05> ​github.com: RISC-V cross compiler on m1 fails to build; kludgey workaround included #46 · Issue #47 · riscv-software-src/homebrew-riscv · GitHub
xenos1984 has quit [Read error: Connection reset by peer]
<bslsk05> ​github.com: patches for building gcc on Apple Silicon by richard-vd · Pull Request #129 · richfelker/musl-cross-make · GitHub
<heat> these seem to be "proper" patches
<geist> yah
<geist> i half expected that to have been included upstream by now, but at least not on the stable-11 branch
<geist> lets see. trying those patches here
<geist> made it past gcc. now failing in gdb, but that's a different thing. this should work!
xenos1984 has joined #osdev
<heat> :O
<geist> now that i'm off travis-ci i should try to get a github action building things and see if they give you enough cpu time for it
<geist> travis-ci would time out if trying to build a toolchain
<heat> they do
<heat> you can even build llvm
<heat> with LTO
<heat> source: I do it
<bslsk05> ​github.com: Update toolchains (LLVM upgrade to 14.0.0) · heatd/Onyx@4cc2376 · GitHub
<geist> noice. yeah travis ci was no bueno
<geist> was free, so i got what i paid for really
<heat> i build 13 toolchains in the worst case
<heat> :D
<geist> yah
<geist> in one job? how long does it take?
<heat> 20-40 minutes for each gcc
<heat> + 2h for non-lto LLVM, 4h for thin-lto LLVM
<geist> ah nice that they have that much time, and that doesn't use up all your credits?
<geist> gdb wants libgmp which i am trying to provide but it's being a pain about it
<heat> geist, no credits for public projects
<geist> i mean i use it for LK but it's pretty lightweight
<geist> ah cool
<geist> the toolchains script tries to build mostly hermetic binaries, so it includes local versions of libs, but in the case of gdb it reaches a bit farther into the host
<heat> i dont build gdb
<heat> i just use whatever -linux target with the same arch
<geist> yah but trying to build for non linux as well
<heat> i've also looked at gdb's code and it's damn scary
<heat> everything's scattered in tiny, poorly organised files
<heat> exactly what you want when porting something D:
<geist> yeah gdb is usually problematic
<heat> lldb looked better in that regard
<geist> there, finally had to just hack in to look in /opt/local/* for gmp stuff
<geist> which i dont want since i dont want the output of this to require macports but so it goes
<geist> yah predictably that picked up a bunch of crap out of macports
<geist> well will work for now
<geist> otool -L <binary> is useful in this case FWIW. similar to ldd i think
<geist> but mac specific
doorzan has joined #osdev
GeDaMo has quit [Remote host closed the connection]
doorzan has quit [Ping timeout: 240 seconds]
woky has quit [Quit: Nothing in this world is hopeless!]
woky has joined #osdev
mahmutov has quit [Ping timeout: 276 seconds]
Likorn has quit [Quit: WeeChat 3.4.1]
gog has quit [Ping timeout: 276 seconds]
<nomagno> Greetings! If I have unsigned integers which are composite types (u8 high byte, u16 two low bytes, total 24 bits), and I have to implement multiplication of two such 'CompositeUInt24' using only non-composite addition, subtraction, and multiplication, and also being able to use pre-existing implementations for subtraction and addition between CU24s, is it faster to do iterated addition or is there some low-level multiplication-with-carry algorithm that is better?
<nomagno> I know this sounds like X Y, but I genuinely want to do this specifically.
<nomagno> And, well, I'd like a machine independent answer, but with the obvious fact that relying on native multiplication is faster than iterated addition in all cases
Burgundy has quit [Ping timeout: 276 seconds]
heat has quit [Remote host closed the connection]
theruran has quit [Quit: Connection closed for inactivity]
<moon-child> nomagno: I am curious what motivates this question
<moon-child> nomagno: I guess you are on a platform without a native 32-bit multiply? Do you at least have 16 x 16 -> 32?
<moon-child> (if not, then I would probably do it with repeated 16-bit multiplication on each 8-bit limb)
<nomagno> moon-child: I'm sticking to uint8_t and uint16_t, rationale being that making slow interfaces which can be replaced by native uint32_t or whatever on a platform-by-platform basis is more portable than going straight for uint32_t. The reason I want specifically uint24_t is that it's a type able to hold 16 million millimeters, i.e. about 16 km, which is the perfect level of precision to hold a race track in internal physics calculations
<moon-child> well, optimising without a concrete target and a concrete cost model is a lost cause
<mrvn> nomagno: this is about implementation, not interface
<bslsk05> ​en.wikipedia.org: Karatsuba algorithm - Wikipedia
<nomagno> mrvn: yeah
<mrvn> On 64bit cpus and many 32bit cpus you can simply do 24 bit * 24 bit = 48 bit and then shift the result.
<mrvn> For add/sub you can do 16 bit + 16 bit = 32bit to get the carry or do some tricks.
<mrvn> Worst case you deal with a number as three 8bit values. All the advanced multiplication algorithms won't be worth it for that small a number.
<nomagno> I'll stick to iterated addition I guess
<mrvn> Check the produced code. The compiler will probably unroll it fully if you make loops over 3 bytes.
<klys> three 8bit values is what I recommend starting with
<nomagno> All this suffering now will allow me to trivially maintain an embedded version and a fast version of the game engine in the future
<mrvn> But really, will you ever have this on something with less than 32bit?
<nomagno> mrvn: Hm, good point about optimization I guess
<nomagno> mrvn: I want it to compile to a GBA, so yes.
<klys> and just revise the algorithm to replace base10 with base256
<mrvn> You might want to provide functions for A*B+C. That's a quite frequent operation and you get the C basically for free.
<mrvn> A*A can also be useful but at 3 bytes it doesn't save much yet.
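klys's base-256 schoolbook multiplication and mrvn's A*B+C suggestion combine naturally into one routine. The sketch below assumes the layout nomagno describes (one u8 high byte plus a u16 low half) and assumes the result should wrap modulo 2^24; the names `struct cu24`, `cu24_mul_add` and `cu24_mul` are invented for illustration. It only ever needs an 8x8 -> 16 bit multiply.

```c
#include <stdint.h>

/* Layout taken from the question: a high byte plus a 16-bit low half. */
struct cu24 { uint8_t hi; uint16_t lo; };

/* Returns (a * b + c) mod 2^24 using three 8-bit limbs per operand.
 * The addend c costs nothing extra: it simply pre-loads the result limbs. */
struct cu24 cu24_mul_add(struct cu24 a, struct cu24 b, struct cu24 c)
{
    const uint8_t al[3] = { (uint8_t)(a.lo & 0xFF), (uint8_t)(a.lo >> 8), a.hi };
    const uint8_t bl[3] = { (uint8_t)(b.lo & 0xFF), (uint8_t)(b.lo >> 8), b.hi };
    uint8_t r[3]        = { (uint8_t)(c.lo & 0xFF), (uint8_t)(c.lo >> 8), c.hi };

    for (int i = 0; i < 3; i++) {
        uint8_t carry = 0;
        /* Limbs at index 3 and above fall outside 24 bits, so skip them. */
        for (int j = 0; i + j < 3; j++) {
            /* 255*255 + 255 + 255 == 65535, so t always fits in 16 bits. */
            uint16_t t = (uint16_t)((uint16_t)al[i] * bl[j] + r[i + j] + carry);
            r[i + j] = (uint8_t)(t & 0xFF);
            carry    = (uint8_t)(t >> 8);
        }
        /* The carry out of limb 2 is dropped: the result wraps mod 2^24. */
    }

    struct cu24 out = { r[2], (uint16_t)(r[0] | ((uint16_t)r[1] << 8)) };
    return out;
}

/* Plain multiplication is the multiply-add with a zero addend. */
struct cu24 cu24_mul(struct cu24 a, struct cu24 b)
{
    const struct cu24 zero = { 0, 0 };
    return cu24_mul_add(a, b, zero);
}
```

That is at most six 8x8 multiplies per product, against up to 2^24 additions for the iterated approach, and both loops have fixed trip counts, so a compiler can unroll them completely.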
<mrvn> By the way, is this to be 24bit integer or 8.16 fixed point?
<nomagno> mrvn: former
<mrvn> Might be useful to define it as 16.8 fixed point or something. So the cars can be size 1.
Likorn has joined #osdev
<nomagno> mrvn: I purposefully express all internal state in different units to make the most of each type, car weight is measured in hectograms, for instance. The cars are a simple millimeter u16
<nomagno> So you can store cars that measure from a few millimeters to 65 meters, which is a reasonable scale
<mrvn> problem is when you multiply stuff. You quickly get too big numbers. With fixed point they stay smaller.
<mrvn> Like how much damage does a car of mass m1 at speed v1 produce colliding with a car of mass m2 and speed v2?
<mrvn> Kinetic energy for 256 hectograms going at 256 millimeter per millisecond = 1/2 m v^2; even before the 1/2, m v^2 = 256 * 256 * 256 = 2^24 = overflow in 24 bits.
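mrvn's numbers are easy to check: 256 * 256 * 256 is exactly 2^24, one more than the largest 24-bit value. The sketch below computes the product in 64 bits so the overflow can be inspected, and shows a 16.8 fixed-point multiply for comparison. The helper names are hypothetical, and the use of 32/64-bit host types is purely for demonstration, not something nomagno's u8/u16-only design would do.

```c
#include <stdint.h>

#define U24_MAX 0xFFFFFFu

/* 1 if v is representable in 24 unsigned bits, 0 if it would wrap. */
int fits_u24(uint64_t v)
{
    return v <= U24_MAX;
}

/* m * v^2 computed exactly, so the caller can see how many bits it needs. */
uint64_t mv2_exact(uint32_t m, uint32_t v)
{
    return (uint64_t)m * v * v;
}

/* 16.8 fixed point: the raw value 256 represents 1.0. The raw product
 * carries 16 fractional bits, so shift right by 8 to get back to 8,
 * then truncate to 24 bits. */
uint32_t fx16_8_mul(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 8) & U24_MAX;
}
```

With plain integers, `mv2_exact(256, 256)` is 16777216 and does not fit in 24 bits. If the units are chosen so that the same mass and speed are each 1.0 in 16.8 fixed point (raw 256), then `fx16_8_mul(fx16_8_mul(256, 256), 256)` stays at raw 256, i.e. 1.0, which is the sense in which fixed-point values "stay smaller".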