klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
<clever> geist: i assume the tlb shootdowns are fully automatic, and dont require the target core software to co-operate?
<geist> correct
<clever> yeah, no way to map them then
<geist> you basically use a specific form of the instruction that specifies ASID, VA, if it's a global shootdown, and if it's a terminal or inner node
JanC_ has joined #osdev
<clever> and i assume its chip wide, so you cant send a different shootdown to each core
<geist> and then do a DSB to wait for the other core to ack it
JanC is now known as Guest6602
JanC_ is now known as JanC
<geist> right
Guest6602 has quit [Ping timeout: 250 seconds]
<geist> side note: the new amd extensions are functionally identical except they also support flushing by range
<geist> VA + # pages
<mrvn> I don't follow that.
<mrvn> If the TLB entries are per core then on cross core shootdown you only have to translate the ASID from one core to the other.
<mrvn> or is there an opcode to shoot down a TLB on all cores?
<geist> yes
<geist> the latter
<mrvn> Ok, so if a process runs on many cores you want a global ASID so the shootdown is just one op and no IPI.
<geist> in which case the ASID you send through only makes sense if it maps to the same thing on the other core
<geist> right
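A minimal sketch of the broadcast shootdown geist describes, assuming AArch64 (the DSB mention suggests ARM) and an EL1 kernel. The helper names tlbi_va_operand and tlb_shootdown_page are hypothetical; the operand layout (ASID in bits 63:48, VA[55:12] in bits 43:0) is the one used by the TLBI-by-VA instructions.

```c
#include <stdint.h>

/* Build the register operand for a TLBI-by-VA instruction:
 * ASID in bits [63:48], VA[55:12] in bits [43:0]. */
static inline uint64_t tlbi_va_operand(uint16_t asid, uint64_t va)
{
    uint64_t page = (va >> 12) & ((UINT64_C(1) << 44) - 1);
    return ((uint64_t)asid << 48) | page;
}

#if defined(__aarch64__) && defined(__GNUC__)
/* Hypothetical helper: invalidate one page for one ASID on every core in
 * the inner-shareable domain. No IPI is involved; the final DSB waits for
 * the other cores to complete the invalidate. Must run at EL1. */
static inline void tlb_shootdown_page(uint16_t asid, uint64_t va)
{
    uint64_t op = tlbi_va_operand(asid, va);
    __asm__ volatile("dsb ishst\n\t"       /* publish the page-table update first */
                     "tlbi vae1is, %0\n\t" /* all levels; vale1is = last level only */
                     "dsb ish\n\t"         /* wait for every core to finish */
                     "isb"
                     :: "r"(op) : "memory");
}
#endif
```

Because the ASID travels inside the operand, this only works if the ASID means the same address space on every core, which is the point mrvn and geist arrive at above.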
<mrvn> This is so much simpler if you don't have threads. :)
<clever> so you could instead use an IPI and a local TLB clear, to have per-core ASID's
<clever> but that comes at the cost of IPI's having to interrupt every core, and ack the action
<mrvn> I don't have threads so I never have to do a shootdown to another core.
<clever> and even if you have shared memory between procs, the munmap removes it from that procs tables
<mrvn> no shared memory either
<clever> so only after both cores unmap a page, will it be free
<mrvn> My design is based on message passing.
<clever> if only one proc unmaps, only its pagetables need to update
<clever> what about the cost of copying the message between memory regions?
<mrvn> clever: messages are passed and not copied.
<clever> passed how?
<mrvn> the page(s) are mapped into the other address space.
<clever> that sounds like it needs some tlb shootdowns?
<mrvn> sure, locally by the sender to unmap the page
<clever> ah, and with no tlb-miss cache, the receiver will check the pagetables again, and discover it?
<mrvn> that's the idea.
<sonny> what happened before linker scripts?
Starfoxxes has quit [Ping timeout: 240 seconds]
<mrvn> sonny: single compilation units
<sonny> oh
<mrvn> hand calculated jump offsets
<sonny> well that makes sense
<mrvn> punch cards
<sonny> oof
<mrvn> hard wired programming
<gog> hot
<sonny> hand calculated ... D:
<mrvn> People used to program in a hex editor at some point
Starfoxxes has joined #osdev
<sonny> it seems that gcc is responsible for linker scripts?
<clever> mrvn: ive seen a blog on bootstrapping a compiler
<klange> define "responsible for"
bgs has quit [Read error: Connection reset by peer]
<clever> step 1 was making a program that can convert hex to binary, i think using only opcodes within the ascii range
bgs has joined #osdev
<sonny> klange invented
<gog> klange invented linker scripts til
<mrvn> clever: I can't see anyone doing that unless it is the only computer they have access to.
<clever> mrvn: it was more of an example of how to bootstrap assuming you have no tools beyond the ability to run a binary and a text editor
<mrvn> clever: usually you have some medium to transport data that you write on another system. Even if that means making punch cards.
<mrvn> or flashing an eprom
<mrvn> Where did you get the editor from? And if you have an editor why no basic?
<clever> just an arbitrary set of limits the author started from
<mrvn> "run a binary" is usually not a given I think, an interpreter is more likely.
<bslsk05> ​web.archive.org: Wayback Machine
<mrvn> Anyone know what it looked like on mainframes? Did they have some basic or something or did you have to load a binary from punch cards?
<clever> oh, it also relied heavily on a shell with working redirection
<clever> so he didnt have to parse argv and call open()
<mrvn> clever: writing a binary from shell with only 7 bit input is pretty hard but possible.
<mrvn> I mean "cat > mycc" and then type in 7bit opcodes.
<mrvn> or rather ascii only.
<clever> i lost the link right now, but i had seen a youtube vid of somebody bootstrapping an old pdp-11 machine i think
<clever> they had to load some code with both punchcards and a tape emulator i think
<clever> then compiled it, and wrote it to real tape reels
<mrvn> clever: you would implement something like forth or scheme in punch cards and then write your assembler in that.
<clever> yeah
<clever> the bcompiler above, migrated into an assembler that supported ever-increasing levels of complexity
<clever> starting out as just a hex->binary program with label support, so you dont have to count bytes
<clever> and evolving into a macro based compiler that pastes chunks of asm together
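The first bootstrap stage clever mentions, a hex-to-binary converter, could look roughly like this. It is a sketch only: the function names are hypothetical and it does not reproduce the label support or the ASCII-only opcode constraint of the project being described.

```c
#include <stddef.h>
#include <stdint.h>

/* Value of one hex digit, or -1 if the character is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode pairs of hex digits from `text` into `out`, skipping whitespace.
 * Returns the number of bytes written, or -1 on a bad character, an odd
 * number of digits, or an output buffer that is too small. */
static long hex_to_bin(const char *text, uint8_t *out, size_t cap)
{
    size_t n = 0;
    int hi = -1; /* pending high nibble, -1 when none */

    for (; *text; ++text) {
        if (*text == ' ' || *text == '\n' || *text == '\t' || *text == '\r')
            continue;
        int d = hex_digit(*text);
        if (d < 0)
            return -1;
        if (hi < 0) {
            hi = d;
        } else {
            if (n == cap)
                return -1;
            out[n++] = (uint8_t)((hi << 4) | d);
            hi = -1;
        }
    }
    return hi < 0 ? (long)n : -1;
}
```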
<mrvn> those old systems didn't have the immediate inside the opcode like modern cpus.
<clever> like the 6502, yeah
<mrvn> So you can write "hex for jmp" <label>
<clever> so you can put the immediate in more easily
<clever> yep
<mrvn> having a label is a step up from what I had on my C64. I had to hand count bytes.
<mrvn> On the other hand the C64 has basic so you can write a better assembler in that.
<clever> yeah
<mrvn> I never got around to writing a compiler on the C64.
sonny has quit [Quit: Client closed]
<clever> mrvn: ah, found it: https://www.youtube.com/watch?v=uFQ3sajIdaM
<bslsk05> ​'The IBM 1401 compiles and runs FORTRAN II' by CuriousMarc (00:23:40)
blockhead has joined #osdev
<mrvn> WTF? The first example in the handbook is inverting a matrix? Not hello, world?
adder has quit [Read error: Connection reset by peer]
adder has joined #osdev
<cb> mothers milk for mathematicians
adder has quit [Read error: Connection reset by peer]
adder has joined #osdev
[itchyjunk] has quit [Ping timeout: 240 seconds]
sdfgsdfg has quit [Quit: ayo yoyo ayo yoyo hololo, hololo.]
[itchyjunk] has joined #osdev
masoudd has joined #osdev
[itchyjunk] has quit [Ping timeout: 240 seconds]
<klange> interestingly, I've been running this serial console now since last night with no freezes
<gog> aaay
<gog> nice
<gog> did you change adapters?
<klange> no, but I moved some cables around, maybe it was interference from a power cord ;)
<gog> aha
<klange> could also be my host desktop's USB situation just being an absolute mess; I have a hub here that crashes regularly, my Wacom tablet flakes out if I plug in my iPhone _to a wall adapter_ (cables run alongside each other)...
<clever> my ftdi uart disconnects randomly
<clever> [155288.462774] usb usb13-port2: disabled by hub (EMI?), re-enabling...
<klange> [6391889.064768] pl2303 ttyUSB1: usb_serial_generic_read_bulk_callback - urb stopped: -32
<klange> These probably align with my freezes
<clever> i keep a `dmesg -w&` open most of the time
<clever> and that makes it easier to see things lining up
[itchyjunk] has joined #osdev
wxwisiasdf has joined #osdev
<wxwisiasdf> hi
wxwisiasdf has quit [Client Quit]
wxwisiasdf has joined #osdev
srjek has quit [Ping timeout: 240 seconds]
<Mutabah> wxwisiasdf: Hello.
not_not has quit [Ping timeout: 272 seconds]
sdfgsdfg has joined #osdev
sdfgsdfg has quit [Read error: Connection reset by peer]
sdfgsdfg has joined #osdev
sdfgsdfg has quit [Quit: ayo yoyo ayo yoyo hololo, hololo.]
sonny has joined #osdev
pretty_d1 has quit [Quit: WeeChat 3.4]
Belxjander has joined #osdev
netbsduser has quit [Read error: Connection reset by peer]
netbsduser has joined #osdev
<geist> hi
<kazinsal> aloha
Belxjander has quit [Ping timeout: 245 seconds]
<wxwisiasdf>  im making an os
<klange> well you're in the right channel
<wxwisiasdf> heh
sdfgsdfg has joined #osdev
gog has quit [Ping timeout: 250 seconds]
wxwisiasdf has quit [Quit: Client closed]
wxwisiasdf has joined #osdev
wxwisiasdf has quit [Client Quit]
wxwisiasdf has joined #osdev
Maka_Albarn has joined #osdev
<Maka_Albarn> do any of you know if gcc freestanding supports bit fields? and if so, how to use them?
<moon-child> freestanding should be no different than hosted
<moon-child> in that respect
<wxwisiasdf> it does support bitfields
wxwisiasdf has quit [Client Quit]
epony has quit [Ping timeout: 240 seconds]
<Maka_Albarn> so it seems like I just suck at using them.
<Maka_Albarn> any tips or examples?
<moon-child> struct foo { int y: 4; }; makes 'foo' a struct which contains a bitfield 'y' taking up 4 bits
<moon-child> if you find them confusing, you do not have to use them
<moon-child> extracting bits directly from ordinarily addressable fields is equivalent
<moon-child> bitfields just provide an alternate syntax for doing the same thing
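A small sketch of the two equivalent styles moon-child describes. The struct and helper names are illustrative only. Note that the signedness of a plain `int` bitfield and the placement of fields within the storage unit are implementation-defined, so `unsigned` is used here.

```c
#include <stdint.h>

/* Bitfield style: the compiler picks the storage layout. */
struct foo {
    unsigned int y : 4; /* holds 0..15 */
    unsigned int z : 4;
    _Bool flag : 1;     /* a one-bit boolean field is allowed too */
};

/* Manual style: a 4-bit field at bit position `shift` of an ordinary word.
 * Here the layout is explicit and the same on every compiler. */
static inline uint32_t get_field(uint32_t word, unsigned shift)
{
    return (word >> shift) & 0xFu;
}

static inline uint32_t set_field(uint32_t word, unsigned shift, uint32_t v)
{
    return (word & ~(0xFu << shift)) | ((v & 0xFu) << shift);
}
```

The manual form is usually the safer choice for hardware registers, since the bit positions are spelled out rather than left to the compiler.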
<clever> the main complaint ive heard about bitfields, is that gcc and ms's compiler, put the fields in a different order i think
<clever> so you run into compatibility problems if you want to mix compilers or talk to hw
<moon-child> who compiles kernels with msvc though? :P
<clever> heathens that write nodejs in vscode, and still use file->save to save, even after being told about ctrl+s :P
<moon-child> sounds like an accurate depiction of the windows kernel team
<clever> another issue i can see with bitfields, is mmio passwords
<clever> a large chunk of sensitive registers in the rpi only have 24 usable bits, and you must `0x5a000000 | x` every value you write to the register
<clever> if the 5a is missing, the write is silently ignored, and those bits read back as 0
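The password scheme clever describes can be wrapped so that every write is one aligned 32-bit store. This is a sketch: the macro and function names are made up, and only the 0x5a top byte and the 24 usable bits come from the discussion.

```c
#include <stdint.h>

#define PM_PASSWORD   UINT32_C(0x5a000000) /* must be in the top byte of every write */
#define PM_VALUE_MASK UINT32_C(0x00ffffff) /* the 24 usable bits */

/* Combine a 24-bit value with the password byte. */
static inline uint32_t pm_with_password(uint32_t value)
{
    return PM_PASSWORD | (value & PM_VALUE_MASK);
}

/* One volatile 32-bit store, so password and payload reach the bus together. */
static inline void pm_write(volatile uint32_t *reg, uint32_t value)
{
    *reg = pm_with_password(value);
}
```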
<Maka_Albarn> clever: I use vscode, but compile with GCC through WSL Ubuntu
<Maka_Albarn> muahahahaha
<clever> Maka_Albarn: you better at least use ctrl+s !
<Maka_Albarn> i do
<Maka_Albarn> file -> save is too slow
<sonny> llvm is available btw
<clever> and dont do right click -> copy, right click->paste!!
<sonny> dang, guess I'm a heathen then
* moon-child just uses :w
<Maka_Albarn> can you bit field a boolean?
<sonny> is this a cpp question?
* Maka_Albarn shrugs
<sonny> sounds like a trick question xD
<moon-child> clever: what would be wrong with that? struct { char x; unsigned y; } __attribute__((packed)) *foo; foo->x = 0x5a; foo->y = whatever
<clever> moon-child: but will the compiler merge the x+y writes together, and issue it to the hw as a single 32bit store?
<moon-child> oh you mean it might do one 1-byte store and then 1 2-byte store
<clever> yeah
<kazinsal> unlikely. it will in fact probably jam up the works as it attempts to do an unaligned access on the ->y
<moon-child> instead of reading back the x, oring with it, and writing back the result?
<moon-child> yeah makes sense. Though ^; depends on platform alignment restrictions
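Why the packed struct is a problem for MMIO can be checked at compile time. A sketch assuming GCC or Clang (for `__attribute__((packed))`) and C11; the struct name is hypothetical.

```c
#include <stddef.h>

/* moon-child's layout: a 1-byte field followed by an unsigned, packed. */
struct pw_reg {
    char x;
    unsigned y;
} __attribute__((packed));

/* With packing, y starts at byte offset 1, so it is never naturally
 * aligned, and x and y cannot be covered by one aligned 32-bit store. */
_Static_assert(offsetof(struct pw_reg, y) == 1, "y is misaligned by design");
_Static_assert(sizeof(struct pw_reg) == 1 + sizeof(unsigned), "no padding");
```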
<clever> kazinsal: yep, un-aligned MMIO will misbehave in all kinds of fun ways
<clever> but with the PW missing, it will do less harm than usual
<moon-child> :D
<clever> basically, the bus coming out of the cpu is 32 bits
<clever> and if you do a mis-aligned 8bit load/store to aligned(32bit)+8bit, then it will present valid data on bits 8:15, with a bus-valid flag for that 8bit section
<clever> the other bits, may be anything
<clever> the far end must then match on the address, and map the whole 32bit bus to something
<clever> for example, if the address doesnt match any valid register in the gpio, then the literal string "GPIO" is presented on that 32bit bus
<clever> but the cpu was expecting a result on bits 8:15, so only 'P' comes back
<clever> so, if you do a 32bit read of an invalid register, that is 32bit aligned, you get back "GPIO"
<clever> if you do a mis-aligned 8bit read, you get 'P' 'I' or 'O', depending of the mis-alignment
<clever> and an 8bit read with 32bit alignment, gives you an 8bit slice of a real register
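The byte-lane behaviour clever describes can be modelled in a few lines. This is a software model only, assuming the peripheral always drives the full 32-bit word and the CPU keeps just the lane selected by the low address bits; the names are hypothetical.

```c
#include <stdint.h>

/* The word reportedly driven for unmapped GPIO registers: "GPIO", with
 * 'G' on bits 0:7 and 'P' on bits 8:15, as in the description above. */
#define GPIO_FILL ((uint32_t)'G' | (uint32_t)'P' << 8 | \
                   (uint32_t)'I' << 16 | (uint32_t)'O' << 24)

/* What an 8-bit load sees: the lane picked by address bits 1:0. */
static inline uint8_t byte_lane(uint32_t bus_word, uint32_t addr)
{
    return (uint8_t)(bus_word >> (8 * (addr & 3u)));
}
```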
<moon-child> there was one chip that couldn't handle misaligned reads. But it didn't fault when you issued one either; rather, it used the low bits of the address to permute the result
<clever> but i have also seen other reports, that VPU side of things, is using a 64bit bus
<clever> somebody used the "load many" opcode, to load 4x32bit registers from MMIO
<clever> the first 2 cpu registers, got the same MMIO value, from the starting addr
<clever> and the next 2 cpu registers, got another MMIO value, from a 64bit offset ahead
<clever> which implies a 64bit bus, and then 8 bits of bus masking, to select what 8bit chunks to obey
<clever> and when you hit a 32bit MMIO reg, it just matches on the raw addr, and duplicates the reply to fill out the bus
Maka_Albarn has quit [Ping timeout: 240 seconds]
joe9 has joined #osdev
<clever> which now has me wondering
<clever> if i do a vector load, of 64 bytes (16 x 32bit), what will happen......
<clever> how will the hw malfunction when abused that hard!
<clever> > `v32ld HY(0++,0),(r1+=r2) REPx, 11 cycle startup (for L1 hit), plus 2*x, given that (r2%64)==0
<clever> from my notes
<clever> so there is a fixed overhead at the start, and then it can load 64 bytes in 2 clock cycles
<clever> that somehow implies that the bus is 32 bytes(256 bits) wide!?
k8yun has joined #osdev
<clever> oh!, its only able to hit those params when reading from the L1 cache
<clever> is there maybe a 256bit bus between l1 and the cpu, but then narrower going to L2 and dram?
dormito has quit [Ping timeout: 240 seconds]
dormito has joined #osdev
Vercas has quit [Ping timeout: 240 seconds]
Vercas has joined #osdev
rcvalle has quit [Quit: Leaving]
vdamewood has quit [Read error: Connection reset by peer]
vdamewood has joined #osdev
ElectronApps has joined #osdev
k8yun_ has joined #osdev
<geist> wouldn't be surprised
k8yun has quit [Ping timeout: 240 seconds]
<clever> geist: do you think the whole axi bus is 256bit, or just the cpu<->L1 part?
<geist> probably the cpu->L1. axi being that wide would be probably only in super high end things
<geist> since... lemme guess... you're talking about raspberry pi
<clever> yep
<geist> i am going to guess it doesn't have a bus that wide
<clever> and given the way scalar load-many opcode winds up repeating a 32bit chunk, i think the axi bus may be 64bit
<geist> but having a 16 or 32 byte fetch from L1 i would think would be pretty standard
<geist> the cache line size is probably that wide anyway
<clever> that also gives me another thought, what if the cache-line yeah, is 32 bytes
<clever> so the cpu just puts part of the addr onto the bus, and the L1 gives you the whole damn cache-line at once
<geist> pretty standard, though this is probably the VPU you're talking about?
<clever> and the cpu can then detect the right bits it wants
<clever> yeah
<geist> if it was the a53 or a72 you can simply look it up in the manual
<clever> i can only dream of it being that easy :P
<clever> assuming a 256bit cacheline, and a 64bit axi bus, that means a burst of 4 transfers is needed to fill the cache-line
<clever> but i cant access MMIO thru L1, because the cpu knows what MMIO is
<clever> i'll have to just write some silly asm, abusing MMIO in ways it wasnt meant to, and see what it does
<clever> its about amd microcode
<clever> and now i can see that the https://www.bigmessowires.com/bmow1/ was using horizontal encoding for its microcode
<bslsk05> ​www.bigmessowires.com: BMOW 1 Computer | Big Mess o' Wires
<clever> and vertical encoding is what ive heard elsewhere, where the microcode just translates cisc into risc
eroux has joined #osdev
<geist> in general there's a hierarchy of speed as you go from L1 out, so makes sense that the pipe gets narrower
<clever> yeah
<geist> though could also be clock rate, etc
<clever> and if its both narrower and slower, thats more of an exponential speed loss
<clever> so with my above example, it may take 4 transfers to fill a 256bit cacheline with a 64bit bus
<clever> but if that 64bit bus is running at half the clock rate (edges aligned), it would take 8 clock cycles
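clever's cache-line arithmetic as a sketch, assuming an edge-aligned bus with no clock-domain-crossing penalty. The function names are illustrative.

```c
/* Bus beats needed to move one cache line (rounded up). */
static inline unsigned fill_transfers(unsigned line_bits, unsigned bus_bits)
{
    return (line_bits + bus_bits - 1) / bus_bits;
}

/* CPU cycles for the fill when one bus clock lasts `cpu_per_bus` CPU clocks. */
static inline unsigned fill_cpu_cycles(unsigned line_bits, unsigned bus_bits,
                                       unsigned cpu_per_bus)
{
    return fill_transfers(line_bits, bus_bits) * cpu_per_bus;
}
```

With a 256-bit line and a 64-bit bus this gives 4 transfers, and 8 CPU cycles when the bus runs at half the CPU clock.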
Electron has joined #osdev
Electron has quit [Remote host closed the connection]
<clever> un-aligned, longer, clock domain crossing is hard
Electron has joined #osdev
ElectronApps has quit [Ping timeout: 240 seconds]
epony has joined #osdev
<geist> at some point you get to the memory bus which is probably 32bit at best on that device
<clever> *looks*
<clever> if i decode the ddr2 identification registers, i can see signs of a 32bit bus, you're right
<clever> and the pi3 with 1gig of ram, is running a pair of 16bit 512mb chips in parallel, each on half the bus
<clever> geist: oh, and i'm also wondering how ddr2 on desktop differs, with all of those slots, and so many chips on each module!
<geist> well you can look up the width of it. i think in general DDR2 was probably 32bit wide per chip?
<clever> just how fat is the bus on the controller?
<geist> er i mean 32bit per package
<clever> are any slots or chips sharing data lines, and needing chip-selects
<clever> yeah, every rpi ddr2 ram package has 32bit bus exposed to the BGA, but some (1gig) have a pair of 16bit die's inside a single package
<geist> right
<geist> that's one of the big things thats different about apple M1s
<geist> they have super wide busses. the M1 pro is 256 bit wide, M1 max 512
<geist> so it has crazy bandwidth (for a cpu) though more comparable to a high end gpu
<geist> which also have wide busses
<clever> if we switch gears for a moment, and think about a ddr2 x86 desktop
<clever> lets start by assuming i only have 1 memory module in the motherboard
<moon-child> ehhh
<moon-child> nice cpu w/4 memory channels will be like 100gb/s
<moon-child> gpu can get up to 1tb/s (tho probably more like 500-600gb/s)
<moon-child> m1 is in the middle
<clever> is every chip on the module accessible in parallel? or are they sharing the data bus, and you use a combination of chip-select and addr to select one row in a single chip?
<moon-child> definitely can't compete with nice gpus
<geist> clever: the latter. the module itself has like a 32bit bus, but each chip is providing 4 bits or so
<clever> ahh
<geist> depending on the layout of the module
<clever> so its doing the exact same thing as the 1gig rpi's
<geist> i think it changed with DDR5 or so though. the width may have gone up
<clever> just reduce the bus width, and raid-stripe the 32bit bus over all of the chips
<clever> ok, so lets say i have 2 identical modules in the motherboard
<clever> i assume the tricky rules about matched modules, are because the controller is going into a sort of 64bit ddr2 mode? and driving both modules at once?
<clever> and they need to have the same cas timings?
<clever> and if you fail to meet those rules, it degrades into 2 32bit busses, possibly only allowing one active at a time?
<geist> ah looks like DDR2 dimms are 64bits wide. makes more sense because it has 240 pins
<clever> ah, but the same stripe thing
<geist> yah
<geist> 8 x 8 probably. also i'm sure if you have ECC version it's really 72 or so
<clever> and then if i have a 4 slot motherboard, is that just a 256 bit bus into the ram controller?
<clever> and depending on configuration, it will operate in different modes (4 x 64, or 2x128)
<clever> ?
<geist> depends on how the motherboard does it. if it has two channels and 4 slots (common) it's two 64bit busses in this case
<geist> with two dimms per bus
<clever> yep
<clever> but if i mis-matched the dimms, it may degrade into 64 + 32 + 32
<clever> or even give up and just 32+32+32+32
<geist> doesn't mean the memory controller can't be interleaving things and selecting rows on one rank while another one is bursting etc
<clever> yeah
<geist> but that does mean when it's transferring data it's only pulling data off one dimm in a channel at a time
<geist> i think what M1 does that gives it more flexibility is it just has 4 or 8 separate controllers at the same time
<clever> ive also heard rumors that the ddr4 in the pi4 has transparent ecc
<geist> so it's not that it has a really wide channel, it has a lot of channels
<geist> and it can stripe/etc however it wants
<clever> where the dram chip is internally doing ecc, and sending already repaired data down the bus
<clever> so the host controller isnt even aware of the ecc
<geist> that reminds me, now that i have a M1 pro i should try to write a memory benchmark
<geist> i think you can get close to 200GB/sec with it? or was it 100?
<geist> more so than the original M1 which was still respectably north of 50GB/sec
<clever> checking some other random numbers, even though the vector<->L1 is a 256bit bus, there is an 11 cycle overhead at the start of a vector op
<clever> so when moving max sized blocks, thats 235 bits/clock on avg
<clever> which comes out to about 13.67 gig/sec
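The 235 bits/clock figure can be reproduced from the "11 cycle startup plus 2*x" note. The sketch below assumes a maximum repeat count of 64 rows of 64 bytes (4 KiB per instruction). That count is an assumption, though it matches the 4kb loads clever mentions later in the log; the function names are hypothetical.

```c
#include <stdint.h>

/* Cycles for a repeated 64-byte vector load that hits L1:
 * 11 cycles of startup plus 2 cycles per row. */
static inline uint32_t vload_cycles(uint32_t rows)
{
    return 11u + 2u * rows;
}

/* Average bits moved per clock, rounded down. */
static inline uint32_t vload_bits_per_clock(uint32_t rows)
{
    return (rows * 64u * 8u) / vload_cycles(rows);
}
```

For 64 rows this is 32768 bits in 139 cycles, or 235 bits/clock; a single row manages only 39 because the startup cost dominates.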
<geist> that's generally what i see in mid range single chip ARM devices
<geist> usually just a bit over 10GB/sec
<clever> thats at only 500mhz, the arm can get up to 1ghz, but i dont know its configuration as well
<geist> yah but DDR2 cant be clocked that high
<clever> yeah, so at some point, youll L1-miss, and performance will tank
<clever> L2 and uncached reads, are much harder to measure, because of clock domain crossing
<clever> the numbers are never the same, and depend on the clock ratio
<clever> geist: and one last question (i think), if ddr4 is clocked at 400mhz, is that 400 million 32bit transfers per second, or 800million (the ddr)?
<geist> the latter
<clever> ddr2 i mean
<geist> so that'd be probably 800 megatransfers * whatever the width is
<clever> yeah, thats what i thought
<geist> to get to 13GB/sec seems like you need 64bits there
<clever> but ive also heard that some companies now put the megatransfers on the spec sheet
<clever> because bigger numbers are better :P
pretty_dumm_guy has joined #osdev
<geist> well, actually makes sense, because its the rate at which its clocking bits off the bus, then you multiply the bus width
<clever> if one computer has 1600 ram, and another has 3200 ram, which would you buy?
<geist> actually no 400mhz would work
<geist> since that'd be 800MT/sec * 32 bits
<clever> > (800 * 1000 * 1000 * 32) / 8 / 1024 / 1024 / 1024
<clever> 2.9802322387695312
<clever> thats 2.9gig/sec?
<clever> vs the 13.6gig/sec the L1 cache can do
<geist> should be around 25GB/sec?
<kazinsal> 25.6 Gbps, yeah
<clever> > (800 * 1000 * 1000 * 32) / 1024 / 1024 / 1024
<clever> 23.84185791015625
<geist> though that's bits, so... anyway
<clever> ah, 2.9gigbyte, aka 23.8 gigabit
<geist> hmm, that doesn't add up how you can get 13GB/sec out of it
<clever> > (500000000 * 235) / 8 / 1024 / 1024 / 1024
<clever> 13.67880031466484
<clever> 500mhz bus, 235 bits/clock on avg (overheads), to bytes, to gig
<kazinsal> should also be noted that MT/s and Gbps are always SI-prefixed
<clever> > (800 * 1000 * 1000 * 32) / 1000 / 1000 / 1000
<clever> 25.6
<clever> which makes the ddr2 400mhz bus capable of 25.6 gigabit/sec
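The bits-versus-bytes confusion above goes away if the peak-bandwidth formula is written out once. A sketch with SI prefixes throughout, as kazinsal suggests; the function name is illustrative.

```c
#include <stdint.h>

/* Peak DDR bandwidth in bits per second: two transfers per clock (the
 * "double data rate"), each as wide as the data bus. */
static inline uint64_t ddr_bits_per_sec(uint64_t clock_hz, unsigned bus_bits)
{
    return clock_hz * 2u * bus_bits;
}
```

For a 400 MHz, 32-bit bus this is 800 MT/s x 32 bits = 25.6 Gbit/s, which is 3.2 GB/s, or about 2.98 GiB/s when divided by 1024 three times.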
<geist> all this aside raspberry pi 4 uses newer stuff
sonny has quit [Quit: Client closed]
<clever> > (500 * 1000 * 1000 * 235) / 1000 / 1000 / 1000
<clever> 117.5
<clever> and the L1 cache is 117.5 gigabit/sec
<clever> yeah, the pi4 has a ddr4 controller, running at much higher clocks, and support for up to 16gig of ram
<geist> looks like LPDDR4 at 3200 MT/s which comes out about 12.8GB/sec (not GiB) which is basically what you observed
<geist> 3200 * 32 / 8
<clever> and given the major clock gating changes they did, they arent just pasting a pre-laidout set of gates for anything
<clever> like ive heard about with some esp? chips
<clever> i found a thread on twitter, where somebody was talking about a line of tiny MCU's, that were designed back before cad was as heavily involved
<clever> and you could visibly see where they just cut the cpu out of the chip design, and routed the bus over to a modern arm core
<clever> leaving a giant void in the middle, lol
<clever> oh, there was also a big forum thread a few years back, where people were trying to benchmark the pi4's ram, and they were getting rather poor numbers, and the rpi engineers basically could only say a few things
<clever> 1: you're testing it wrong, its way faster
<clever> 2: due to NDA, we cant say how fast
* geist rolls eyes
<geist> just say it you dipshits your stuff is not that good
<geist> s/due to NDA/due to embarrassment/
<clever> yeah, i dont get what all this secrecy is for
<clever> who really benefits?
<geist> lawyers run the show
<bslsk05> ​twitter: <whitequark> here's what happens if you upload the SiFive FU740 SoC manual somewhere. does this behavior remind you of someone? https://pbs.twimg.com/media/EsR1g2oW4AIdlVO.jpg
<geist> same thing with qualcomm. the default is to make stuff secret unless you expend energy to make it otherwise
<geist> and energy == money
<clever> ah, and here is the twitter thread i mentioned
<clever> wait no
<clever> wrong one
<clever> not sure where it went
<kazinsal> the rpi guys don't want to give concrete performance numbers because they know their whole architecture is lowest-bidder stuff held together with digital duct tape
<clever> there was also something similar recently
<clever> kazinsal: the graphics guy did admit on the forums, that the vc6 core is an unfinished and unreleased product
<clever> and they basically just shoe-horned the new 3d core into the old vc4 design
<clever> and that is where the bcm2711 came from
rcvalle has joined #osdev
<clever> geist: the new pi4 beta firmware, has a new bootmain.elf component, that is entirely accessing hw blocks that have open source drivers, so there arent any secrets left for them to hide, but this is the response to asking for source: https://forums.raspberrypi.com/viewtopic.php?p=1975352#p1975352
<bslsk05> ​forums.raspberrypi.com: Network install beta test feedback - Page 5 - Raspberry Pi Forums
<clever> > Unfortunately, software license agreements don't work like that. .....
<kazinsal> it's amazing what happens when you accidentally create an extremely successful low-cost linux-compatible appliance board using whatever chip you could buy 100,000 of for the cheapest off of Digikey
<kazinsal> (you have to keep making newer and better low-cost linux-compatible appliance boards using cheap chips you can buy by the crateload from Digikey)
<clever> kazinsal: there was a time when small companies could buy bcm2835 directly from broadcom
<clever> but the first group to do that, violated the rpi firmware license, by running it on a non-rpi board
<clever> and the doors have been sealed shut ever since
<clever> but, with my understanding of the hw and the open firmware, a board could be designed that isnt compatible with the rpi firmware
<clever> so its then impossible to do that again
<clever> > (216 * 1000 * 1000 * 109) / 1000 / 1000 / 1000
<clever> 23.544
<clever> geist: oooo, interesting!, in past testing, i was getting about 109bits/clock from uncached ram, at 216mhz, which comes out to 23.54gbit/sec, and 400mhz ddr2 clocks in at 25.6gbit!
<clever> so i was getting the ddr2 bus to 91% of max load then
the_lanetly_052 has joined #osdev
<clever> from memory, i was loading the same 4kb array in a tight loop
<geist> seems about right
<clever> and if i raise the vpu to 432mhz, it now takes 593 clocks to do a 4kb load, avg of 55 bits/clock, which is 23.76gbit/sec
<clever> right in the same ballpark
<clever> confirming that the ddr2 was 400mhz/800mt, helps confirm those numbers
<geist> but what rpi was this?
<geist> one of the earlier ones because DDR2
<clever> probably a pi3
<clever> but the entire pi0-pi3 range has nearly identical performance, if you ignore the arm core
<clever> the same dram init code works on every model
Mutabah has quit [Ping timeout: 240 seconds]
Mutabah has joined #osdev
<clever> oh right, but its not 400mhz perfect, one min
<clever> its 398.4mhz ddr2 ram
<clever> so 25.4976gigbit, not 25.6gigbit, not that big of a loss
<kazinsal> yeah bus base clocks are usually somewhat variable between 99.6 and 100.4 MHz
<clever> > (19.2 * 0x53)/4
<clever> 398.4
<geist> 19.2 is a common crystal, so yeah makes sense
<clever> 19.2mhz crystal, 0x53 divisor in the driver source, /4 found by experimentation
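The clock derivation as integer arithmetic in kHz, so the result is exact. A sketch: the function name is made up, and the fixed /4 is the experimentally found factor clever mentions, not something taken from documentation.

```c
#include <stdint.h>

/* SDRAM clock in kHz from the crystal frequency and the PLL value found
 * in the driver source; the /4 was determined by experiment. */
static inline uint32_t sdram_khz(uint32_t xtal_khz, uint32_t pll_value)
{
    return xtal_khz * pll_value / 4u;
}
```

With a 19200 kHz crystal, 0x53 gives 398400 kHz (398.4 MHz), and the next step up, 0x54, gives 403200 kHz (403.2 MHz).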
<clever> but, now that i can compare vectorloads and expected ddr2 bandwidth, i could just raise the ram to 403.2mhz, and see if it still works, and if i get the expected performance increase
nitrix has quit [Ping timeout: 256 seconds]
nitrix has joined #osdev
theruran has quit [Quit: Connection closed for inactivity]
<clever> checking a random banana pi r1 (allwinner a20) board, i see it has a pair of 4Gb 256Mx16 1600Mbps modules from samsung
<clever> 1gig of ram total i believe
<geist> DDR2 is also pretty old
<bslsk05> ​semiconductor.samsung.com: K4B4G1646D-BYK0(4Gb) | DRAM | Samsung Semiconductor Global
<clever> yeah
<clever> oh, and youve mentioned seeing the bcm2711's pci-e core in other soc's, any chance its got a common name or public docs?
Payam has joined #osdev
<geist> design ware i think
<geist> DWC pci
<clever> ah, them again!
<geist> i've seen it over and over again. they seem to implement the basic PCIe goop that vendors pick up
<geist> yah xhci controllers tend to be DWC too
<clever> which reminds me, is there a vendor string in xhci by chance?
<geist> good question, dunno
<geist> someone here that's written an xhci controller might know
<clever> given the root hub shows up in lspci, that string must come from somewhere...
<clever> 01:00.0 USB controller: VIA Technologies, Inc. VL805 USB 3.0 Host Controller (rev 01)
<clever> a pci-e xhci controller shows up as this
<clever> wait doh
<geist> that's probably just pci
<clever> Bus 001 Device 002: ID 2109:3431 VIA Labs, Inc. Hub
<clever> yeah, that first one was lspci, *doh*
<clever> the 2nd is lsusb
<clever> now, if i flip on the 2nd xhci controller...
xenos1984 has quit [Read error: Connection reset by peer]
<klange> lspci uses a database, no vendor/device description strings in pci
<clever> i now see an extra `Linux Foundation ?.0 root hub` that wasnt there before
xenos1984 has joined #osdev
<clever> this 2nd xhci isnt on a pci bus
<klange> isn't 001.002 on USB the hub that connects the USB2 ports?
<geist> probably the generic name the latter
<clever> i'm confirming which is which now
<clever> ah, there is both a `Linux Foundation 3.0 root hub` and a `Linux Foundation 2.0 root hub`, neither was there before!
<clever> let me try to exclude the pcie one from the mess
<clever> root@pi400:/sys/class/pci_bus/0000:01/device# echo 1 > remove
<clever> bit of a big hammer, the entire pci-e bridge vanished from lspci, lol
<klange> i have done that before
<clever> but this 2nd xhci and the gigabit are both non-pcie
<clever> so its fine, lol
<klange> as is the sd, wifi, bluetooth
<klange> video, of course...
<clever> yep
<clever> https://gist.github.com/cleverca22/ccf79400c6dc15098a6e5b7259dd193b this would be the 2nd xhci controller
<bslsk05> ​gist.github.com: gist:ccf79400c6dc15098a6e5b7259dd193b · GitHub
<klange> gods for a chipset that's promoted for being cheap, the hodgepodge of interconnects and buses is _hilarious_
<clever> ive been referring to it as the broadcom xhci, because its directly in the soc, and its not the via-labs one
<clever> oh, you dont even know :P
<clever> there are even interconnects on the usb ph
<clever> phy*
<clever> 2 entirely different usb controllers, are sharing 1 usb phy!!
<klange> hm, cart-horse as it's not like I have an xhci stack yet anyway, but would my Apple C dongle work to provide power while giving me a USB-A port so I can poke that controller instead...
<clever> probably
Electron has quit [Remote host closed the connection]
Electron has joined #osdev
rcvalle_ has joined #osdev
rcvalle has quit [Ping timeout: 256 seconds]
[itchyjunk] has quit [Read error: Connection reset by peer]
k8yun_ has quit [Quit: Leaving]
zaquest has quit [Remote host closed the connection]
zaquest has joined #osdev
_xor has quit [Quit: brb]
Electron has quit [Remote host closed the connection]
ravan has joined #osdev
<gorgonical> I have just learned about intel's protection keys scheme. Stumbled upon it from a paper hacking the mechanism to allegedly improve ipc vs security tensions
GeDaMo has joined #osdev
Jari-- has quit [Ping timeout: 272 seconds]
gog has joined #osdev
_xor has joined #osdev
pretty_dumm_guy has quit [Ping timeout: 240 seconds]
<moon-child> gorgonical: have a link? Sounds interesting
<gorgonical> it's 4.6.2 in the sdm vol 3. kernel docs have a link here: https://www.kernel.org/doc/html/latest/core-api/protection-keys.html
<bslsk05> ​www.kernel.org: Memory Protection Keys — The Linux Kernel documentation
<gorgonical> basically introduces 16 regions that give read and write enable/disable in userspace by jamming a few bits into the page table entries
<gorgonical> something like a hack that made it into hardware
<gorgonical> i'm very tired so that explanation may not make any sense
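A minimal sketch of the bit layout gorgonical is describing, assuming the encoding given in the Intel SDM: the 4-bit key sits in page table entry bits 62:59, and the per-thread PKRU register holds two bits per key (bit 2k is access-disable, bit 2k+1 is write-disable). The helper names are hypothetical and only model the bits; actually loading PKRU needs the WRPKRU instruction on supporting hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: protection key in PTE bits 62:59. */
#define PTE_PKEY_SHIFT 59
#define PTE_PKEY_MASK  (UINT64_C(0xF) << PTE_PKEY_SHIFT)

/* Tag a page table entry with one of the 16 keys. */
static inline uint64_t pte_set_pkey(uint64_t pte, unsigned key)
{
    assert(key < 16);
    return (pte & ~PTE_PKEY_MASK) | ((uint64_t)key << PTE_PKEY_SHIFT);
}

/* Read the key back out of a page table entry. */
static inline unsigned pte_get_pkey(uint64_t pte)
{
    return (unsigned)((pte & PTE_PKEY_MASK) >> PTE_PKEY_SHIFT);
}

/* Set the write-disable bit (bit 2k+1) for a key in a PKRU value. */
static inline uint32_t pkru_deny_write(uint32_t pkru, unsigned key)
{
    assert(key < 16);
    return pkru | (UINT32_C(2) << (2 * key));
}

/* Reads are allowed when the access-disable bit (bit 2k) is clear. */
static inline int pkru_may_read(uint32_t pkru, unsigned key)
{
    assert(key < 16);
    return !((pkru >> (2 * key)) & 1u);
}

/* Writes need both the access-disable and write-disable bits clear. */
static inline int pkru_may_write(uint32_t pkru, unsigned key)
{
    assert(key < 16);
    return !((pkru >> (2 * key)) & 3u);
}
```

The point of the scheme is that changing permissions for a whole key is a register write in userspace, with no page table edits and no TLB shootdown.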
<moon-child> doesn't quite make sense, but I'm also tired, so math checks out :P
ElectronApps has joined #osdev
Payam has quit [Ping timeout: 256 seconds]
MrBonkers has quit [Quit: ZNC 1.7.5+deb4 - https://znc.in]
Bonstra has quit [Ping timeout: 250 seconds]
Bonstra has joined #osdev
not_not has joined #osdev
<not_not> Hi
<g1n> hi not_not
<klange> okay it's arguably terrible and not at all the fancy thing NTP defines, but I can at least set the clock on the RPi after booting... after discovering some terribly wrong math in 'mktime' that I guess I just wasn't using
<klange> also my CD FAT images weren't building _again_, and for some reason I am not catching the 'disk full' error the tool spits out so everything happily continues and I get an empty ramdisk...
<klange> because of course it's already put everything else in there and it's the 16MB ramdisk that is too much...
dennis95 has joined #osdev
DonRichie has quit [Quit: bye]
freakazoid333 has quit [Ping timeout: 245 seconds]
<not_not> Hi dennis95
<not_not> Ur name is Dennis?
<dennis95> yes
<not_not> Mine too
<not_not> I am dennis91
<not_not> Maybe people called Dennis Are more prone to os development like Dennis ritchie
not_not has quit [Ping timeout: 272 seconds]
ymwm has joined #osdev
Payam has joined #osdev
not_not has joined #osdev
Payam has quit [Quit: Client closed]
Payam has joined #osdev
gildasio has quit [Remote host closed the connection]
gildasio has joined #osdev
dude12312414 has joined #osdev
nyah has joined #osdev
sdfgsdfg has quit [Quit: ayo yoyo ayo yoyo hololo, hololo.]
<mrvn> moon-child: the layout of bitfields is implementation defined, more specifically it's in the C calling conventions. But generally bits are packed either high to low or low to high. Members of the struct will use the bit width of the type you use but get packed as long as they don't cross a boundary of the given type. The used type(s) determine the alignment and size. So { char a:5; char b:5; char c:5; } will
<mrvn> be 3 bytes, alignment 1. { short a:5; short b:5; short c:5; } will be 2 bytes alignment 2. At least for any sane compiler/abi.
<mrvn> moon-child: also never use packed. On ARM a packed means the compiler MUST create byte access and your MMIO register will totally break.
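The two layouts mrvn describes can be checked at compile time. This is a sketch assuming a GCC- or Clang-style ABI (x86-64 or AArch64); `char` and `short` as bitfield types are themselves implementation defined, and the struct names are made up for illustration.

```c
#include <assert.h>

/* Three 5-bit fields of type char: none may straddle a char boundary,
 * so each field lands in its own byte. */
struct bf_bytes {
    unsigned char a : 5;
    unsigned char b : 5;
    unsigned char c : 5;
};

/* Three 5-bit fields of type short: 15 bits fit inside one short. */
struct bf_shorts {
    unsigned short a : 5;
    unsigned short b : 5;
    unsigned short c : 5;
};

/* These hold on the assumed ABIs; another compiler may legally differ. */
_Static_assert(sizeof(struct bf_bytes) == 3, "char bitfields: 3 bytes");
_Static_assert(_Alignof(struct bf_bytes) == 1, "char bitfields: align 1");
_Static_assert(sizeof(struct bf_shorts) == 2, "short bitfields: 2 bytes");
_Static_assert(_Alignof(struct bf_shorts) == 2, "short bitfields: align 2");
```

If a port to a new target trips one of these assertions, that is the signal that the bitfield layout needs a second definition for that ABI.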
<mrvn> I'm not quite sure about what access the compiler is allowed to do with bitfields, e.g. { int a:8; int b:8; int c:8; int d:8; } Is the compiler allowed to do byte access or does it have to read/write int and mask?
<mrvn> https://godbolt.org/z/ns1MK5Gfd apparently gcc does byte access.
<bslsk05> ​godbolt.org: Compiler Explorer
<not_not> Nice always wondered that
<mrvn> moon-child: another point to your bitfield. You hope the compiler will merge multiple writes to bits into a single 32bit write. But MMIO registers need to be volatile. The compiler is not allowed to merge the writes as you declared that each has an observable effect.
srjek has joined #osdev
<mrvn> moon-child: One thing I played with is a union of uint32_t and bitfield as temporary object. The read/write use the uint32_t of the union, the user uses the bitfield. So you read the register, toggle a few bits and then write it back, which you can do with RAII.
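A rough C sketch of the temporary-union idea, without the RAII wrapper mrvn mentions (that part needs C++). The register layout and the names `ctrl_reg` and `ctrl_update` are hypothetical, and the field order assumes a little-endian ABI that allocates bitfields from the low bit upward.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit control register, viewed raw or as fields. */
union ctrl_reg {
    uint32_t raw;
    struct {
        uint32_t enable  : 1;   /* bit 0 */
        uint32_t mode    : 3;   /* bits 3:1 */
        uint32_t divisor : 12;  /* bits 15:4 */
        uint32_t         : 16;  /* bits 31:16, left untouched */
    } f;
};

/* Read-modify-write through a non-volatile temporary: the hardware
 * sees one 32-bit read and one 32-bit write, while the field updates
 * in between happen on a local copy the compiler may freely merge. */
static inline void ctrl_update(volatile uint32_t *reg,
                               unsigned mode, unsigned divisor)
{
    assert(mode < 8 && divisor < 4096);

    union ctrl_reg tmp;
    tmp.raw = *reg;          /* single volatile read */
    tmp.f.enable = 1;
    tmp.f.mode = mode;
    tmp.f.divisor = divisor;
    *reg = tmp.raw;          /* single volatile write */
}
```

Only the pointer dereferences are volatile, so the access width on the bus is fixed by the `uint32_t` type rather than by whatever the compiler picks for each bitfield.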
the_lanetly_052_ has joined #osdev
the_lanetly_052 has quit [Ping timeout: 256 seconds]
gildasio has quit [Ping timeout: 240 seconds]
gildasio has joined #osdev
not_not has quit [Read error: Connection reset by peer]
gwizon has joined #osdev
zaquest has quit [Read error: Connection reset by peer]
zaquest has joined #osdev
[itchyjunk] has joined #osdev
blockhead has quit []
ymwm has quit [Ping timeout: 272 seconds]
gxt has quit [Remote host closed the connection]
gxt has joined #osdev
freakazoid343 has joined #osdev
[itchyjunk] has quit [Read error: Connection reset by peer]
ElectronApps has quit [Remote host closed the connection]
[itchyjunk] has joined #osdev
sonny has joined #osdev
ymwm has joined #osdev
mctpyt has joined #osdev
freakazoid343 has quit [Read error: Connection reset by peer]
ymwm has quit [Quit: Leaving]
sonny has quit [Ping timeout: 256 seconds]
sonny has joined #osdev
sonny has quit [Quit: Client closed]
pretty_dumm_guy has joined #osdev
k8yun has joined #osdev
eroux has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
not_not has joined #osdev
<not_not> Wow
isaacwoods has joined #osdev
dennis95 has quit [Quit: Leaving]
xenos1984 has quit [Remote host closed the connection]
xenos1984 has joined #osdev
scoobydoob has joined #osdev
scoobydoo has quit [Ping timeout: 256 seconds]
scoobydoob is now known as scoobydoo
mctpyt has quit [Ping timeout: 256 seconds]
vdamewood has quit [Read error: Connection reset by peer]
vdamewood has joined #osdev
mctpyt has joined #osdev
not_not has quit [Read error: Connection reset by peer]
<geist> mrvn: yeah i've fiddled with that too. works pretty good *if* you can be sure that on all the arches you use the bitfields line up
<geist> depending on what arches you support, etc
pretty_dumm_guy has quit [Ping timeout: 240 seconds]
<mrvn> geist: doing this for the 16650 uart might be tricky and need a #ifdef around both orders to pick the right one.
gorgonical_ has joined #osdev
<mrvn> Hmm, kind of breaks freestanding. It has to pick an abi for the bitfields.
* mrvn likes: auto & avl() { return Bits<5,6,7>; } sort of things.
srjek has quit [Ping timeout: 240 seconds]
wootehfoot has joined #osdev
k8yun has quit [Quit: Leaving]
gorgonical_ has quit [Read error: Connection reset by peer]
the_lanetly_052_ has quit [Ping timeout: 260 seconds]
Teukka has quit [Read error: Connection reset by peer]
Teukka has joined #osdev
simpl_e has joined #osdev
gildasio has quit [Quit: WeeChat 3.4]
<moon-child> mrvn: I see. I haven't actually used bitfields in practice (never seemed worthwhile over manually fudging bits, and seemed rather twiddly). Guess I made the right choice!
<moon-child> though temp union with integer and bitfield is clever
<mrvn> not quite legal C though. You are only allowed to read the type out of a union you wrote to it. Using it to convert between uint32 and bitfield is not legal.
<mrvn> but it's such a nice way to access bits.
<mrvn> Sadly you also meet stuff like the x86 page tables where an address is split into multiple parts.
<kingoffrance> there's legal, lawful, grace, alchemy -- in that order ;D
<mrvn> you forgot magic
Payam has quit [Quit: Client closed]
xenos1984 has quit [Remote host closed the connection]
xenos1984 has joined #osdev
xenos1984 has quit [Remote host closed the connection]
xenos1984 has joined #osdev
<moon-child> mrvn: that's not true
<moon-child> it's only illegal in c++
<moon-child> it's fine in c
chigorin is now known as australopithecus
australopithecus is now known as chigoringrigorin
chigoringrigorin is now known as australopithecus
<sham1> > 23:07 <mrvn> not quite legal C though. You are only allowed to read the type out of a union you wrote to it.
<sham1> Not ever since C1999, you can do type punning with unions
GeDaMo has quit [Remote host closed the connection]
<mrvn> sham1: iirc if you have structs in the union that start the same then you can access the start. But as soon as they diverge you can only access the type you wrote.
<mrvn> Before C99 a union didn't even have to use the same address for all its members.
<mrvn> anyway, the bitfield is already implementation defined and the implementation also says what happens when you type pun. So we are good.
<sham1> mrvn: oh but you can, although it's only on a footnote: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
<sham1> 82) If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
srjek has joined #osdev
<mrvn> sham1: "the appropriate part of the object representation" is what?
<sham1> So for example, `union foo { uint32_t bits; float f; } f = { .f = 3.14195f };` and you can use `f.bits` to get the bit representation of the float, and that's completely valid in C1999 and beyond. At least that's how the footnote is interpreted
<sham1> Although I do feel that `float f = 3.14195f; uint32_t bits; memcpy(&bits, &f, sizeof(f));` feels nicer
<mrvn> sham1: assuming sizeof() is the same.
<sham1> Sure
<sham1> But we can essentially assume IEEE 754
<mrvn> nope.
<sham1> Especially in the context of osdev of course
<mrvn> It's implementation defined
<mrvn> The only thing you actually know is that uint32_t is 32 bit if it exists.
<sham1> Yes, standards sense it is indeed implementation defined, but you see what I'm driving at
<mrvn> so not even that strictly speaking.
<mrvn> sham1: If you accept implementation defined behavior then it's all good. Which we do.
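Both approaches from the exchange above, side by side, as a small sketch. The function names are made up, the size check covers mrvn's sizeof() caveat, and the actual bit pattern you get back is only meaningful if `float` is IEEE 754 binary32, which the C standard does not promise.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Refuse to compile where the two types differ in size. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "float and uint32_t must be the same size");

/* Union punning: write one member, read the other (C99 and later). */
static inline uint32_t float_bits_union(float x)
{
    union { float f; uint32_t bits; } u = { .f = x };
    return u.bits;
}

/* memcpy punning: copy the object representation between two
 * distinct objects; this form is also valid in C++. */
static inline uint32_t float_bits_memcpy(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

Compilers generally reduce both functions to a plain register move, so the choice is about readability and about whether the code also has to build as C++.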
<mrvn> sham1: My point was that if you have "union { struct Base { enum type; } base; struct Foo { enum type; ... } foo; struct Bar { enum type; ...} bar;} blub" then blub.base.type is always defined.
<mrvn> Before C99 that was implementation defined too.
<sham1> Yes, and it was in ANSI-C. It's just that now after 1999 that's possible for other stuff as well
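The common-initial-sequence case mrvn is describing, spelled out as a sketch. The type names mirror his `Base`/`Foo`/`Bar` example, while the payload members and the `blub_as_int` helper are invented for illustration.

```c
#include <assert.h>

enum kind { KIND_FOO, KIND_BAR };

/* All three structs start with the same member, so they share a
 * common initial sequence. */
struct base { enum kind type; };
struct foo  { enum kind type; int value; };
struct bar  { enum kind type; double ratio; };

union blub {
    struct base base;
    struct foo  foo;
    struct bar  bar;
};

/* Inspecting base.type is allowed no matter which member was stored
 * last; after that, only the matching member is read. */
static inline int blub_as_int(const union blub *b)
{
    switch (b->base.type) {
    case KIND_FOO: return b->foo.value;
    case KIND_BAR: return (int)b->bar.ratio;
    }
    assert(!"unknown kind");
    return 0;
}
```

This is the usual tagged-union pattern: the tag is read through the shared prefix, and the payload is read only through the member that was actually written.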
<mrvn> In Ansi-C you could do #define union struct
[itchyjunk] has quit [Remote host closed the connection]
adder has quit [Read error: Connection reset by peer]
blockhead has joined #osdev
<mrvn> sham1: oh, one more nit-picking: memcpy(&bits, &f, sizeof(f)) is UB: The memory areas must not overlap.
wootehfoot has quit [Quit: Leaving]
<sham1> Yes, although I'd like to see an implementation which places two automatic storage duration variables in such a way that they'd overlap
<mrvn> oh, you didn't mean copying the members of the union, then never mind.
sdfgsdfg has joined #osdev
<sham1> Yeah. I do consider using memcpy with variables and such nicer than doing weird type punning with unions. My point was just that it's possible and in the standard
<mrvn> I wonder if memmove(&f.bits, &f.f, sizeof(f)); becomes a nop
dmh has quit [Quit: rip]
<bslsk05> ​godbolt.org: Compiler Explorer
<mrvn> and now I know
sdfgsdfg has quit [Quit: ayo yoyo ayo yoyo hololo, hololo.]
pretty_dumm_guy has joined #osdev