<bslsk05>
playlist 'Making an 8 Bit pipelined CPU' by James Sharman
wgrant has joined #osdev
<geist>
also looks like magic-1.org is running today
<geist>
he's had that thing alive for i think 15 years now
mahmutov has quit [Ping timeout: 256 seconds]
LostFrog has quit [Quit: ZNC 1.8.2+deb2 - https://znc.in]
PapaFrog has joined #osdev
FreeFull has quit [Ping timeout: 256 seconds]
Oli has quit [Quit: leaving]
dormito has quit [Quit: WeeChat 3.3]
FreeFull has joined #osdev
<gorgonical>
I agree with Bill here: somehow FPGAs don't feel like the same thing
<gorgonical>
Like it's cheating or something
<gog>
how to program fpga to be perfect girlfriend
<gog>
not cheating
<gog>
:|
<kazinsal>
field-programmable waifu
* gog
pets kazinsal
* kazinsal
nyaas unexpectedly
<kazinsal>
uh oh. I've become a catboy
<gog>
:3
GeDaMo has quit [Remote host closed the connection]
<gorgonical>
so how does an FPGA actually work? like, are you just ultimately providing the truth-table values for each gate?
<zid>
they don't want you to know
<gorgonical>
Is that what's happening? You program the lookup tables and the block references that table?
<zid>
but they're basically a giant shift register and you clock it all in, and yea, the bits determine if it's an and/or/xor etc
<zid>
in a big grid
<gorgonical>
So surely there's trade-offs between having per-block tables vs a big central table or something
<kazinsal>
basically each logic block has a 4-input LUT, an adder, and a flip-flop
<gorgonical>
And a given block may not use all of these components, depending on its purpose
<gorgonical>
But the general idea being any gate configuration you want you can achieve with these pieces
<kazinsal>
yeah FPGA optimization is magic
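A minimal sketch (in Python, with an invented class name and made-up mask values) of the logic block zid and kazinsal are describing: the 16 configuration bits are the 4-input truth table itself, and a flag stands in for the optional flip-flop.

```python
# Toy model of one FPGA logic block: a 4-input LUT plus an optional flip-flop.
# The 16 config bits ARE the truth table; programming the FPGA amounts to
# shifting masks like these into every block (names/values are illustrative).

class LogicBlock:
    def __init__(self, lut_mask: int, registered: bool = False):
        self.lut_mask = lut_mask & 0xFFFF  # 16 bits: one output bit per input combo
        self.registered = registered
        self.ff = 0                        # flip-flop state

    def eval(self, a: int, b: int, c: int, d: int) -> int:
        index = (d << 3) | (c << 2) | (b << 1) | a   # inputs select a truth-table row
        combo = (self.lut_mask >> index) & 1
        if not self.registered:
            return combo
        out, self.ff = self.ff, combo                # registered: output lags one clock
        return out

# A mask of 0x8000 is only set for index 15 (a=b=c=d=1), i.e. a 4-input AND:
and4 = LogicBlock(0x8000)
assert and4.eval(1, 1, 1, 1) == 1 and and4.eval(1, 1, 1, 0) == 0
```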
sortie has quit [Ping timeout: 260 seconds]
<gog>
i'd really like to play with one
ahalaney has quit [Quit: Leaving]
<gog>
i should look into some mini dev boards or something
<gorgonical>
So probably there's a lot more happening under the hood. Cause the blocks have to be routed, etc. So assumedly there's almost no FPGA where you can take gate logic and just apply it?
<gorgonical>
As in a "really see what's happening" approach?
<gorgonical>
My understanding is that you take something like verilog and push it through various tools that transform it into the FPGA magic configuration you need, which will not resemble in any way the verilog you put in
sortie has joined #osdev
<gog>
i have no idea how any of it works lol
<gog>
i just know i want one as a toy
heat has joined #osdev
<zid>
I just want a microcontroller and an spi thingy
heat has quit [Remote host closed the connection]
<bauen1>
zid: a digispark maybe ? it's an attiny85 with a very hackish usb port and enough free wires for spi :D
biblio_ has joined #osdev
biblio has quit [Ping timeout: 240 seconds]
<zid>
by microcontroller I basically mean 'controller', not that micro :P
gog has quit [Ping timeout: 250 seconds]
srjek|home has joined #osdev
biblio_ is now known as biblio
srjek has joined #osdev
srjek|home has quit [Quit: Leaving]
gog` has joined #osdev
Oli has joined #osdev
<pie_>
i patiently await your lecture on monday :3 <gorgonical> so how does an FPGA actually work? like, are you just ultimately providing the truth-table values for each gate?
<geist>
gorgonical: basically it's a series of LUTs yes
<geist>
may be a huge array of, say, 5-in 2-out LUTs with a lot of interconnecting traces
<pie_>
i think that might be CPLDs but I'm not sure <gorgonical> So probably there's a lot more happening under the hood. Cause the blocks have to be routed, etc. So assumedly there's almost no FPGA where you can take gate logic and just apply it?
<geist>
also each LUT may have some additional features like a 1 or 2 bit latch, or a dedicated add circuit, or an inverter on every input/output
<geist>
plus some dedicated SRAM blocks spread around the FPGA, some PLLs and some pin drivers
<geist>
but the LUTs do the bulk of the lifting
<geist>
what's fascinating is to look at what the fpga compiler comes up with
<geist>
you can usually get it to visualize how it decided to flatten your logic
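An illustrative sketch of the "flattening" step geist mentions. A real compiler also does technology mapping, placement, and routing, but the truth-table part amounts to exhaustively evaluating a chunk of logic for every input combination; the function names and the example function below are invented.

```python
# Turn an n-input boolean function into a LUT mask by exhaustive evaluation.
# This only shows the truth-table part of what an FPGA compiler does.

def flatten_to_lut(fn, n_inputs: int) -> int:
    """Build the LUT mask for an n-input boolean function."""
    mask = 0
    for index in range(1 << n_inputs):
        bits = [(index >> i) & 1 for i in range(n_inputs)]
        if fn(*bits):
            mask |= 1 << index
    return mask

# e.g. a 5-input function, matching the 5-in LUTs geist describes
# (only one of the two outputs shown here):
majority = lambda a, b, c, d, e: (a + b + c + d + e) >= 3
print(hex(flatten_to_lut(majority, 5)))   # 32-bit mask, one bit per input combo
```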
<bslsk05>
en.wikipedia.org: Complex programmable logic device - Wikipedia
<geist>
yah CPLDs and FPGAS are pretty similar nowadays
<geist>
they used to have more of a difference, but now it's kinda like cpu vs microcontroller. similar things, largely scale and how they're used
<geist>
my experience is modern CPLDs are usually smaller, lower power, and have built in flash so you can program them and they stay that way
<geist>
FPGAs usually have an external flash chip and reload their configuration on powerup
<geist>
but are usually bigger
<geist>
(more LUTs)
<clever>
would a CPLD just have an internal flash array, and still load the config, or is it more that the flash is spread over the whole chip, and each config element IS a flash cell?
<geist>
good question. i'm guessing the former?
<geist>
but could be the latter. it's my understanding that the config that an fpga loads is largely sram cells spread all over the luts
<geist>
so it could be you could embed the flash or eeprom in the luts themselves. maybe slower, but requires no load time
<geist>
and thus a cpld is born
<clever>
you could probably figure that out by looking at how quick of a "boot time" the datasheet claims
<pie_>
there have been some fpga reverse engineering efforts
<pie_>
not sure if they mainly worked on the bitstreams, or if that actually yielded much hardware info
<pie_>
maybe the datasheets do say enough
<pie_>
or does anyone have access to TechInsights? :p
<geist>
i dunno, both xilinx and altera document their fpgas pretty well
<geist>
you can find good descriptions of precisely how the luts work, how they're laid out, etc
<geist>
the hard part is figuring out how the bitstream maps to them, but it doesn't look *tremendously* hard. if you look at an uncompressed bitstream it really does look a lot like a gigantic bitmap
<geist>
but that being said the lattice ones are well understood, such that there's an open source fpga compiler for them
<geist>
i think the problem is fpga compilers are ridiculously complicated
<vancz>
i would be curious to find out what they do one of these days
<vancz>
and what makes the IDEs start at 30 gigs or whatever
<vancz>
at least for xilinx
<vancz>
though im suspicious a lot of that is having like 10 copies of toolchains in them and maybe lots of IP? :p
<vancz>
All FPGA tools take a massive amount of space because each supported part needs its own model, which specifies not only the features and bitstream format but also all the detailed timing info needed to run routing, synthesis, etc. Thus the higher-end the supported parts, the bigger the models, and you have probably hundreds of them (a model might be "only" a few hundred megs of data, however you…)
<clever>
vancz: was talking about that timing stuff over in #cpudev, how the tooling basically needs to compute the entire propagation time from the input flipflops to the output flipflops, and then compute what max freq the design can handle
<clever>
if the clock is over that number, the signal won't have time to propagate thru every gate, and cross whatever wacky distances the router picked
<vancz>
That makes sense.
<clever>
and then you may need to modify your design to pipeline things, so it does less work in a given clock cycle
<clever>
if you split the job into 2 halves, then the propogation time is halved, so you can run at twice the freq
<clever>
if there are no other bottlenecks
<clever>
but now it takes 2 clock cycles to do the job
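A back-of-the-envelope version of the timing math clever is describing, with invented delay numbers: the minimum clock period is the slowest flip-flop-to-flip-flop path, and splitting the logic into two pipeline stages roughly doubles the achievable clock (not exactly, because of the fixed register overhead) while adding a cycle of latency.

```python
# Max clock is set by the slowest register-to-register path; pipelining splits
# that path so each stage does less per cycle. All delays below are made up.

clk_to_q = 0.5   # ns, flip-flop output delay
setup    = 0.5   # ns, flip-flop setup time
logic    = 9.0   # ns, combinational logic + routing between the flip-flops

def f_max_mhz(combinational_ns: float) -> float:
    period = clk_to_q + combinational_ns + setup      # minimum clock period
    return 1000.0 / period                            # ns -> MHz

print(f_max_mhz(logic))          # single stage:  ~100 MHz, result in 1 cycle
print(f_max_mhz(logic / 2))      # 2-stage pipeline: ~182 MHz, but 2 cycles of latency
```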
<clever>
with an asic, you're not limited by how the fpga laid out its gates and LUTs, so you can make things more compact
<clever>
but you still have other issues
<clever>
the fab-house will have a set of rules, on how close gates on the silicon can safely be packed, and your router needs to follow those rules
<vancz>
You're not limited by how the fpga is laid out, you're limited by how you laid it out :p
<clever>
but depending on what resources you're using, you may run out of something like blockram in a given area
<clever>
so the router has to wander over to the other half of the chip, and steal some from there
<clever>
and now you're getting bonus round-trips to the other side of the chip and back again
<clever>
simplest way i can see to cause that, is to just shove all of the fpga block ram into a single array in my verilog
<clever>
then the tooling has to generate an addr decoder, that routes things to the right region of the chip, based on the index i used
<clever>
and which address i access changes the access latency
<clever>
but to hide that, the tooling just takes the worst possible latency and declares that to be the speed limit
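An illustrative model of that worst-case rule, with invented delays: the generated address decoder sends each index to some block RAM region, each region has its own routing delay, and static timing has to assume the slowest one.

```python
# Toy version of one big logical array spilling across several block RAMs.
# The decoder picks a region by address; each region has a different routing
# delay; the timing tools must budget for the worst. Numbers are made up.

BLOCK_WORDS = 512
region_delay_ns = [2.0, 2.2, 4.5, 4.8]   # e.g. two nearby BRAMs, two across the chip

def access_delay(addr: int) -> float:
    region = addr // BLOCK_WORDS          # what the generated address decoder does
    return region_delay_ns[region]

print(access_delay(10))       # 2.0 ns: lands in a nearby block RAM
print(access_delay(1800))     # 4.8 ns: round-trip to the far side of the chip
print(max(region_delay_ns))   # 4.8 ns: the figure static timing actually uses
```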