<whitequark[cis]>
esden: I think we should do a limited run of ram-paks after this
<whitequark[cis]>
maybe not a full run before we run a bigger battery of tests, including on a PLL
<_whitenotifier-6>
[glasgow] colinoflynn opened pull request #545: device.hardware.py: Add skip_on_error flag to improve Windows experience with USB discovery - https://github.com/GlasgowEmbedded/glasgow/pull/545
<_whitenotifier-6>
[glasgow] colinoflynn synchronize pull request #545: device.hardware.py: Add skip_on_error flag to improve Windows experience with USB discovery - https://github.com/GlasgowEmbedded/glasgow/pull/545
<tpw_rules>
how many copies of the bee movie script can it support?
<tpw_rules>
48? or one but written and read 48 times?
<whitequark[cis]>
at least 96 per chip, actually, i just wrote it in UTF-16
<tpw_rules>
that's even more cursed
<tpw_rules>
(hopefully you tested the odd bytes too)
<_whitenotifier-5>
[glasgow] colinoflynn opened pull request #547: applets: Remove a few unicode characters that caused problems on Windows mingw terminal - https://github.com/GlasgowEmbedded/glasgow/pull/547
<_whitenotifier-6>
[glasgow] colinoflynn synchronize pull request #545: device.hardware.py: Add skip_on_error flag to improve Windows experience with USB discovery - https://github.com/GlasgowEmbedded/glasgow/pull/545
<_whitenotifier-6>
[glasgow] whitequark closed pull request #545: device.hardware.py: Add skip_on_error flag to improve Windows experience with USB discovery - https://github.com/GlasgowEmbedded/glasgow/pull/545
<josHua[m]>
(well, filling, and reading. this is what I did for my nand-flash peripheral a while ago)
redstarcomrade has quit [Read error: Connection reset by peer]
ari has quit [Ping timeout: 260 seconds]
redstarcomrade has joined #glasgow
redstarcomrade has quit [Changing host]
redstarcomrade has joined #glasgow
p1onk[m] has quit [Quit: Idle timeout reached: 172800s]
ari has joined #glasgow
<ewenmcneill[m]>
Catherine: having read through the code (having a vague idea how HyperRAM works, and an approximate idea how Amaranth is written) I agree, it looks very elegant (and it's extremely well commented). Well done!
<jamie3456[m]>
hello! just assembled my first glasgow by hand – any tips for testing that it all works? after some qfn surgery I no longer get errors setting pullups on port A, which I was getting initially (I lifted the pad for ENVA)
<jamie3456[m]>
(intentionally leaving the remaining bridge there because it's two GND pads and I don't wanna dislodge the wire)
<jamie3456[m]>
*two 3v3 pads
<jamie3456[m]>
I see that the selftest applet errors with "mode pins-int is broken on device revision C3"
redstarcomrade has quit [Read error: Connection reset by peer]
redstarcomrade has joined #glasgow
redstarcomrade has quit [Changing host]
redstarcomrade has joined #glasgow
<jamie3456[m]>
oh i see, there are other modes :)
ar-jan has joined #glasgow
<jamie3456[m]>
damn, pins-loop and voltage fail badly (with the connections in place, i think…)
<jamie3456[m]>
at least i have leds :)
ar-jan has quit [Ping timeout: 260 seconds]
<rcombs>
update: it works, turns out I can write FPGA code
<rcombs>
currently only works with a TAS file hardcoded in at build-time; streaming in input over USB *should* work but doesn't
<whitequark[cis]>
nice
redstarcomrade has quit [Read error: Connection reset by peer]
<jamie3456[m]>
looks like i somehow mixed up all the resistor values around the pad A voltage regulator hehe
bvernoux has joined #glasgow
<jamie3456[m]>
s/pad/port/
<jamie3456[m]>
btw are discord message edits transmitted to the bridged chat platforms somehow, or should i do irc-style corrections?
Daniel[m]1 has joined #glasgow
<Daniel[m]1>
i can see the edits in matrix
<jamie3456[m]>
oh haha i can see in whitelogger, i edited my message to change "pad" to "port" and it transmits to irc as "s/pad/port/"
<jn>
it's a very nice translation of the edit
<gsuberland>
jamie3456[m]: working on getting this into main docs (been busy last few days) but this may be of use to you for getting an understanding of how all the hardware components fit together and what depends on what: https://gist.github.com/gsuberland/0affffae190386e0ae2c5c8c5974104c
<gsuberland>
e.g. if you're seeing stuff not coming up or not replying on addresses you expect, there's info in there to help figure that out
q3k[cis] has quit [Quit: Idle timeout reached: 172800s]
novakov[m] has joined #glasgow
<novakov[m]>
I might be blind and don't see something obvious - is there an applet that allows me to just set pins of glasgow to high/low?
czero64[m] has quit [Quit: Idle timeout reached: 172800s]
notgull has joined #glasgow
notgull has quit [Ping timeout: 252 seconds]
notgull has joined #glasgow
bvernoux has quit [Read error: Connection reset by peer]
notgull has quit [Ping timeout: 260 seconds]
redstarcomrade has joined #glasgow
redstarcomrade has quit [Changing host]
redstarcomrade has joined #glasgow
bvernoux has joined #glasgow
redstarcomrade has quit [Read error: Connection reset by peer]
dos has quit [Read error: Connection reset by peer]
dos has joined #glasgow
<Wanda[cis]>
<novakov[m]> "I might be blind and don't see..." <- unfortunately not; this feature has been requested quite a few times though
<Wanda[cis]>
we're definitely open to setting up something like this
<gsuberland>
would be useful for testing that all pins are functioning as expected when folks build their own boards, too.
<Wanda[cis]>
we do have the selftest applet for that
<whitequark[cis]>
rcombs: to add a bit more context: sending a PR to Glasgow is essentially a request that whatever it is be maintained forever, with substantial changes often being done by people who lack the original context
<whitequark[cis]>
so the bar for merging a PR is vastly higher than "it works"
<rcombs[m]>
yeah, honestly this would probably make more sense as an out-of-tree applet, those just don't exist yet
<whitequark[cis]>
they do, behind a feature flag
<rcombs[m]>
oh nice
<whitequark[cis]>
there isn't really a problem with having a TAS applet in-tree
<whitequark[cis]>
it just needs to fit into the existing architecture (for example, we separate format parsers and gateware), be written with slow FPGAs in mind (the decision trees you have there that are doing the parsing are rather complex and the timing on them is probably pretty bad), and be structured in a way that keeps it maintainable
<whitequark[cis]>
which is all not very well compatible with just quickly hacking something together
<rcombs[m]>
mm, just has to be Actually Competent instead of Baby's First 1-day FPGA Project
<whitequark[cis]>
correct
K900 has joined #glasgow
<K900>
Is the bridge broken again
<whitequark[cis]>
which of the two
<K900>
I'm only seeing messages from whitequark
<K900>
Matrix
<rcombs[m]>
but yeah, if there's no bandwidth for review+feedback I'm happy to just have this sit in my fork for the foreseeable future, and then maybe one day I learn enough about this stuff to clean it up for merge
<whitequark[cis]>
I'm on Matrix and I see rcombs just fine
<whitequark[cis]>
it's your Matrix client
<whitequark[cis]>
rcombs[m]: the way I would structure the gateware is by splitting it into an N64 controller module and a buffer replay module where the latter streams commands to the former
<rcombs[m]>
mmm, that'd be a good idea; the buffer replay stuff is probably gonna get reused for [S]NES stuff
<whitequark[cis]>
yes
<whitequark[cis]>
the "log" thing is pretty weird and it sounds like you should be using a memory there? not sure
<whitequark[cis]>
extracting `m.d.<anything>` or any other calls on `m` into functions is unsupported
<whitequark[cis]>
modifying self within elaborate is not something you should ever do in production code
<rcombs[m]>
yeah that was hacked together just to be able to observe state
<rcombs[m]>
is passing m down into function calls okay?
<K900>
Weird
<whitequark[cis]>
rcombs[m]: nope
<K900>
I'll have to double check the logs later
<rcombs[m]>
ah
<whitequark[cis]>
it results in extremely odd and difficult to understand behavior in some cases because it's not composable in the way lexical scoping there suggests
<rcombs[m]>
how about a function that returns an array of assignments, and the caller adds that to self.m.d.sync
<whitequark[cis]>
that's fine
<whitequark[cis]>
well not self.m.d.sync but just m.d.sync
<rcombs[m]>
erm yes that
<rcombs[m]>
alrighty, that's pretty straightforward to do
<whitequark[cis]>
whitequark[cis]: we'll eventually fix the underlying scoping issue but that requires major year-long internal refactoring that we're about in the middle of
<rcombs[m]>
really the log thing probably should be an amaranth fifo
<whitequark[cis]>
so, again, I haven't read it in detail, but something to keep in mind is
<whitequark[cis]>
have you run `glasgow run benchmark latency` yet?
<rcombs[m]>
but _really_-really iirc you were pondering some nicer logging functionality and it'd be better to use that long-term
<whitequark[cis]>
the logging functionality I was thinking of is host-only
<whitequark[cis]>
oh, is the "log" a mini-ILA?
<whitequark[cis]>
you should try glasgow run --analyze applet
<whitequark[cis]>
that gives you a FIFO in/out trace correlated with pin changes
<rcombs[m]>
uhhhh conceptually sorta lol, during development I had way more logging in there, dumped out a lot more state, logged on a decent number of state transitions
<rcombs[m]>
I hadn't but did now, no idea what to make of its output though
<whitequark[cis]>
feed it to gtkwave
<rcombs[m]>
[looks up gtkwave]
<whitequark[cis]>
I think you don't need to do almost all of the run-side UART stuff?
<whitequark[cis]>
it could be literally just await in_fifo.write(parse_tas_file(args.as))
<whitequark[cis]>
in interact
<rcombs[m]>
I don't have that arg
<whitequark[cis]>
whatever the file argument is
<whitequark[cis]>
s//`/, s/as/file/, s//`/
<whitequark[cis]>
the tty action isn't to read from stdin, it talks to a PTY
<whitequark[cis]>
and all of the forwarding stuff is of completely no use here since it's just one way
<rcombs[m]>
well, and dumps log output to stdout
<whitequark[cis]>
in general I don't want to merge debugging functionality unless it's well-integrated and necessary for users of the applet; hence --analyze being useful, as it requires no modification to the code
<whitequark[cis]>
things like got_data = Signal(8 * 37) are exceptionally inefficient in gateware
<whitequark[cis]>
if you do a bit_select on that you create a mux tree of size 296+(296/2)+(296/4)+(296/8)+(296/16)+...
<rcombs[m]>
the logging that remains isn't primarily for debugging; it's there to report events (console pairing, frame timings, unsupported commands...)
<whitequark[cis]>
which is like one tenth of the FPGA just for that bit_select
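The series quoted above for a 296-bit `bit_select` can be added up directly. The sketch below only evaluates that sum with integer halving; the function name is made up for illustration, and it makes no claim about how a real toolchain would map the mux tree onto logic cells.

```python
def mux_tree_size(width: int) -> int:
    """Sum width + width/2 + width/4 + ... using integer halving.

    This mirrors the rough estimate from the chat; it is not a
    synthesis result.
    """
    total = 0
    while width >= 1:
        total += width
        width //= 2
    return total


# got_data = Signal(8 * 37) is 296 bits wide.
print(mux_tree_size(8 * 37))  # 589
```

The sum always stays just under twice the signal width, so the cost grows linearly with the size of the register being indexed.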
<whitequark[cis]>
ok I see; so it's bidirectional. the frontend is still definitely wrong in that case; I'd probably spawn a background task reading from in_fifo and logging to the usual glasgow log, instead of the _forward thing from the UART applet that doesn't even fully work on Windows
<whitequark[cis]>
rcombs[m]: or a shift register; doing `m.d.sync += got_data.eq(Cat(foo, got_data))` is very cheap
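The effect of `got_data.eq(Cat(foo, got_data))` can be modelled in plain Python, since `Cat` places its first argument in the least significant bits and the assignment truncates to the width of `got_data`. This is a behavioural sketch only, not Amaranth code, and it assumes `foo` is one byte wide, which the chat does not state.

```python
WIDTH = 8 * 37              # width of got_data in the chat
MASK = (1 << WIDTH) - 1


def shift_in(got_data: int, foo: int) -> int:
    """Model one clock edge: new byte enters at the bottom, old data moves up.

    Assumes foo is 8 bits wide; the bits shifted past WIDTH are dropped,
    as they would be when assigning to a fixed-width signal.
    """
    return ((got_data << 8) | (foo & 0xFF)) & MASK


state = 0
for byte in b"\x01\x02\x03":
    state = shift_in(state, byte)
print(hex(state))  # 0x10203
```

Every bit has exactly one source on each cycle, which is why this is cheap compared with indexing an arbitrary position in a wide register.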
<rcombs>
oh huh
<rcombs>
I'd still need to read the first byte to branch on later, but I guess that could just be a register on its own
<whitequark[cis]>
for memories, you have 32 of them, which can be configured as 25616, 5128, 10244, or 20482
<whitequark[cis]>
oh fucking matrix
<whitequark[cis]>
* for memories, you have 32 of them, which can be configured as 256x16, 512x8, 1024x4, or 2048x2
<rcombs[m]>
oh hah, and by sheer coincidence I have a 1024x4
<whitequark[cis]>
the toolchain will attempt to satisfy any memory request you give to it
<whitequark[cis]>
by combining these 32 memories (btw, each FIFO is using one of them, so you're down to 30 right away)
<whitequark[cis]>
so your 1024x32 memory is consuming 8 of the BRAMs (a quarter of the FPGA again)
<rcombs[m]>
oh, x4 bits
<whitequark[cis]>
correct
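The block RAM arithmetic above can be sketched as a naive tiling estimate: pick one of the four shapes listed in the chat and count how many tiles cover the requested depth and width. The helper name is hypothetical, and the real toolchain may map a memory differently from this estimate.

```python
from math import ceil

# The four block RAM shapes listed in the chat (depth, width).
BRAM_SHAPES = [(256, 16), (512, 8), (1024, 4), (2048, 2)]


def brams_needed(depth: int, width: int) -> int:
    """Smallest tile count over the four shapes for a depth x width memory."""
    return min(ceil(depth / d) * ceil(width / w) for d, w in BRAM_SHAPES)


print(brams_needed(1024, 32))  # 8, a quarter of the 32 available
print(brams_needed(512, 8))    # 1, the size of one FIFO
```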
<whitequark[cis]>
in general, I think you seriously overindexed on "this needs to be low-latency" and added buffering everywhere despite it being almost certainly unnecessary
<rcombs[m]>
I'm too used to operating on bytes
<whitequark[cis]>
try running `glasgow run benchmark latency`
<whitequark[cis]>
you'll see that the end-to-end Python-to-Python latency is something like 150 micros, with stddev of as much
<whitequark[cis]>
this means that you could literally feed it commands one by one from the Python harness and it would almost, but not quite, work
<rcombs[m]>
that's too long, iirc N64 requires a response to start no later than 62.5µs after a command
<whitequark[cis]>
that's your "I have literally not even tried to design this for realtime" case
<whitequark[cis]>
I'm using it as an example for emphasis, not to suggest you should do that (you shouldn't)
<whitequark[cis]>
it would be a terrible design and a bad idea, but the fact that it's almost low enough gives you an indication of how low the typical latency is
<rcombs[m]>
but yeah this is basically a direct port of my C++ code for ARM boards
<whitequark[cis]>
the existing OUT FIFO buffer (512x8) is probably entirely sufficient for your buffering needs, provided you give the Glasgow stack the entire TAS file on the host so it can shove that; you could also give get_out_fifo a bigger depth=
redstarcomrade has joined #glasgow
redstarcomrade has quit [Changing host]
redstarcomrade has joined #glasgow
<rcombs[m]>
though that reads from a microSD card instead of over USB
<whitequark[cis]>
your parser runs at 20.83ns per iteration
<whitequark[cis]>
so unless it takes a few thousand cycles to get a response it's fine to do it with any level of inefficiency
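The "few thousand cycles" figure follows from the two numbers already in the conversation. The sketch below assumes a 48 MHz clock, which is inferred from the 20.83 ns period, and takes the 62.5 µs response deadline from the earlier "iirc" remark rather than from a spec.

```python
CLOCK_HZ = 48_000_000          # assumption: inferred from 20.83 ns per iteration
RESPONSE_DEADLINE_S = 62.5e-6  # deadline as recalled in the chat, not verified

cycle_ns = 1e9 / CLOCK_HZ
budget_cycles = round(RESPONSE_DEADLINE_S * CLOCK_HZ)

print(f"{cycle_ns:.2f} ns per cycle, {budget_cycles} cycles before the deadline")
```

Under those assumptions the gateware has about 3000 clock cycles to start its response, so a parser that takes tens or even hundreds of cycles is still comfortably inside the budget.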
<rcombs[m]>
I reflexively add buffers as large as I can get away with because I don't trust OS schedulers as far as I can throw them
<whitequark[cis]>
this is my result from a busy machine that's running an active Electron instance and what not
<rcombs[m]>
yeah, 17119.54µs was worst here
<whitequark[cis]>
rcombs[m]: the only scheduler here that you care about is the XHCI controller's microframe scheduler (you're using a USB 3 downstream facing port for this, right?)
<whitequark[cis]>
(you REALLY should be using a USB3 DFP for any low latency or high bandwidth USB work, even with USB HS)
<rcombs[m]>
I'm doing this on a recent MBP, all ports are USB3
<whitequark[cis]>
this is because XHCI controllers will often schedule three BULK OUT or BULK IN per microframe, while EHCI controllers like to do fuck all
<rcombs[m]>
huh
<rcombs[m]>
neat
<whitequark[cis]>
so both the bandwidth and the latency are significantly improved on XHCI hosts compared to EHCI
<rcombs[m]>
that's extremely good to know
<whitequark[cis]>
bandwidth by several MB/s (that's megabytes), latency by an order of magnitude if not more
<whitequark[cis]>
rcombs[m]: about right for a macbook
<vegard_e[m]>
three?
<vegard_e[m]>
oh, you mean URBs/transfers, not packets, I guess
<whitequark[cis]>
three... scheduling units or something
<whitequark[cis]>
I don't quite recall, it's been about 5y
<whitequark[cis]>
basically, XHCI controllers schedule a lot more aggressively if you're giving them full sized BULK packets and they're accepted
<whitequark[cis]>
and I remember seeing three in a row of something
<whitequark[cis]>
<rcombs[m]> "mm, a bit over 4 seconds" <- but yes, with worst case latency of 17 ms you don't need even that much
<whitequark[cis]>
and I think a lot of the complexity in your gateware is stemming from the excessive buffering
<whitequark[cis]>
as well as resource use
<rcombs[m]>
the ARM version of this (with input read from a microSD card into a 2048-byte buffer) is _extremely_ solid; I once had a stream run continuously for a bit over 39 days in perfect sync, while I generally don't trust a full desktop machine not to, like, hang for an entire minute at some point over that duration
<vegard_e[m]>
a microframe could theoretically fit 13 full 512B packets, but last I benchmarked it I mostly topped out at 12
<rcombs[m]>
particularly if it's also running OBS and streaming to twitch
<whitequark[cis]>
rcombs[m]: ok, so you have two general solutions here
<vegard_e[m]>
but I benchmarked throughput, not latency, so I don't have experience with how many individual URBs you can get it to schedule
<rcombs[m]>
big part of the work to get it that stable was to have it absolutely not care what happens to the host machine
<whitequark[cis]>
the first one is to extend the 512 deep FIFO to be the max size you can do, which is 31x512=15872, which should handle a full 2 hours of hang
<whitequark[cis]>
but in your particular case, I would suggest a much simpler solution
<rcombs[m]>
there were some issues with the logging stack blocking early on, esp if the USB connection dropped and came back
<whitequark[cis]>
connect an SPI flash, shove the entire thing into an SPI flash, do a Fast Read and stream directly to your stream processor
<rcombs[m]>
yeah that'd be pretty good
<whitequark[cis]>
rcombs[m]: USB connection drop in glasgow should result in a crash
<whitequark[cis]>
it's fundamentally designed as a tethered device, though it also has an untethered mode where it can load and start a bitstream you program into its NVM
<whitequark[cis]>
note that the untethered mode is currently incomplete in that you can't set the IO buffer voltage in, uh, any way
<whitequark[cis]>
why? we don't have a boot flow that ensures you don't accidentally get 5V where you're not expecting it
<whitequark[cis]>
and no one wanted it that much (until now i guess)
<whitequark[cis]>
whitequark[cis]: if you do this, you can ditch the PC entirely
<whitequark[cis]>
hotwire an SPI flash, connect an N64, plug it into a stable 5V source, it should cheerfully run until the heat death of the universe
<rcombs[m]>
isn't this 2 minutes
<whitequark[cis]>
oh, oops
<whitequark[cis]>
i can't into units
<rcombs[m]>
yeah, I think if I end up using this for long-running streams, this'll be the way to go
<whitequark[cis]>
though then it also ties up a $200 device for 40 days
<rcombs[m]>
heh, vs a Teensy
<whitequark[cis]>
although it's not formally supported (in the "we do not provide support" sense) you can also pluck various bits of the Glasgow framework and shove the result into any FPGA you like, even from another family
<rcombs[m]>
like, given that the Teensy implementation exists and works reliably, this whole project isn't really _essential_ to anyone, I just think it's neat
<whitequark[cis]>
it is definitely possible (I do this occasionally) to grab an applet and make it run in a completely foreign environment on some random devboard
<whitequark[cis]>
which is another reason I insist applet gateware is modular
<whitequark[cis]>
* which is another reason I insist applet gateware must be modular
<rcombs[m]>
good way to learn the basics of FPGA gateware development in a day, and it makes N64 TAS playback very accessible to folks with glasgows (the Teensy version requires a little more fiddling with the hardware and configuration)
<whitequark[cis]>
yes
<rcombs[m]>
10/10 project, would recommend to anyone
<whitequark[cis]>
anyway, to summarize, the buffering you do within the applet can never save you for more than 2 minutes due to the amount of onboard RAM
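The buffering figures in this conversation can be cross-checked with a few lines of arithmetic. The 4 bytes per frame comes from the chat; the rate of 30 input frames per second is an assumption, inferred from the "30KB uncompressed, bit over 4 minutes" figure quoted for one TAS.

```python
BYTES_PER_FRAME = 4     # "4 bytes per frame", from the chat
FRAMES_PER_SECOND = 30  # assumption: inferred from ~30KB lasting a bit over 4 minutes


def buffer_seconds(buffer_bytes: int) -> float:
    """How long a buffer of raw TAS input lasts with no refill from the host."""
    return buffer_bytes / BYTES_PER_FRAME / FRAMES_PER_SECOND


print(f"{buffer_seconds(512):.1f} s")       # default 512-byte FIFO
print(f"{buffer_seconds(31 * 512):.1f} s")  # FIFO extended to 31 block RAMs
```

With those assumptions the default FIFO covers a bit over 4 seconds and the maximum 15872-byte FIFO covers a bit over 2 minutes, which matches the corrected figure above.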
<whitequark[cis]>
thank you :)
<whitequark[cis]>
whitequark[cis]: ... except if you buy Ram-Pak, which has 16 MB of RAM
<whitequark[cis]>
the early 90s called
<rcombs[m]>
oh yeah that might be an interesting way to go
<rcombs[m]>
16MB is plenty for the vast majority of TASes, of course
<rcombs[m]>
that's at a level where I'd be comfortable using it on the GDQ stage
<whitequark[cis]>
the 16MB is organized as 2 chips * 2 dice * 4 MB
<whitequark[cis]>
and it has a really weird DDR interface (but not the DDR you know)
<whitequark[cis]>
I did write a memory controller just yesterday though
<sorear>
something tells me you could stretch the FIFO memory quite a bit with RLE
<whitequark[cis]>
I assumed TAS already includes RLE
<rcombs[m]>
someone did suggest compression
<whitequark[cis]>
oh
<tpw_rules>
i had committed Crimes in my other ice40 stuff but i was basically saturating the bandwidth of the snes controller ports
<whitequark[cis]>
is it just completely uncompressed?
<tpw_rules>
usually
<whitequark[cis]>
oh.
<rcombs[m]>
nope, the file format is literally just "4 bytes per frame, exactly what gets sent to the console over the controller port"
<whitequark[cis]>
well there's your problem ™
<whitequark[cis]>
yeah I would compress on the host and decompress in gateware ofc
<whitequark[cis]>
this is how --analyze works
<whitequark[cis]>
FPGA based RLE compressor
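A minimal sketch of what host-side run-length encoding of a raw TAS stream could look like, assuming the "4 bytes per frame" layout described above. The record format here (a one-byte repeat count followed by the frame) is hypothetical; it is not the format used by Glasgow's analyzer or by any existing applet.

```python
FRAME_SIZE = 4  # bytes per frame, per the chat


def rle_encode(data: bytes) -> bytes:
    """Encode as records of (count: 1 byte, frame: 4 bytes), count in 1..255."""
    if len(data) % FRAME_SIZE:
        raise ValueError("input must be a whole number of 4-byte frames")
    out = bytearray()
    i = 0
    while i < len(data):
        frame = data[i:i + FRAME_SIZE]
        run = 1
        # Extend the run while the next frame is identical and the count fits.
        while (run < 255 and
               data[i + run * FRAME_SIZE:i + (run + 1) * FRAME_SIZE] == frame):
            run += 1
        out.append(run)
        out += frame
        i += run * FRAME_SIZE
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    """Inverse of rle_encode; the gateware side would do the equivalent."""
    out = bytearray()
    for i in range(0, len(data), FRAME_SIZE + 1):
        out += data[i + 1:i + 1 + FRAME_SIZE] * data[i]
    return bytes(out)


# 100 idle frames followed by the same button state held for 2 frames.
raw = bytes(4) * 100 + b"\x80\x00\x00\x00" * 2
packed = rle_encode(raw)
assert rle_decode(packed) == raw
print(len(raw), len(packed))  # 408 10
```

A fixed-size record keeps the decoder trivial: it reads five bytes, then emits the same frame `count` times, which needs only a small counter and one frame register.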
<rcombs[m]>
could be very fun to play with
<rcombs[m]>
wonder if that'd let me hardcode in TASes long enough to be useful
<rcombs[m]>
I built that functionality to make testing easier (like, it let me test the console output portion separately from the fifo input)
<tpw_rules>
the way TASes work i think you could squeeze in a simple backreferencing encoder for even more savings
<tpw_rules>
and would justify more RAM :P
<rcombs[m]>
and it _does_ play what's probably the single most useful TAS in existence: clear.m64 (it just deletes file A on the cartridge, I run it before every playback of anything else)
<rcombs[m]>
(possibly the most-played TAS out there lol)
<whitequark[cis]>
I think the built-in TAS functionality is kind of weird but given clear.m64 that actually seems genuinely useful in production
<whitequark[cis]>
so a stream interface for the TAS source + a mux from internal to external source would work well
<rcombs[m]>
I was hoping I'd be able to fit 1key, which is somewhere in the neighborhood of 30KB uncompressed
<rcombs[m]>
(bit over 4 minutes)
<whitequark[cis]>
(they're not implemented yet, which is trivial, but what's not trivial is writing a methodology for development with streams that beginners can understand and apply, which is the hard part that's on me)
<rcombs[m]>
and then have it work in standalone mode with no additional peripherals
<tpw_rules>
the glasgow fpga does not have SPRAM, correct?
<whitequark[cis]>
tpw_rules: only revA/revB ones do
<rcombs[m]>
new memory expansion addon for glasgow coming soon
<rcombs[m]>
(it's perfect that it's even called a Pak)
<esden[m]>
If I am assembling a few more ram-pak prototypes I might have one to go to rcombs if it helps with the TAS development. I find that a very good cause considering I watch GDQ quite regularly 😉
<tpw_rules>
i'd like one too, i can be a guinea pig with that and also help my own tas work
<esden[m]>
Guess why I called it a RAM-Pak, rcombs
<rcombs[m]>
lol, fantastic
<tpw_rules>
is hyperram worse or better than rdram
<esden[m]>
I have to see how many parts I have. The HyperRAM might be a limiting factor. The manufacturers like to obsolete them quite quickly in favor of new models. I need to do some checking.
<rcombs[m]>
I'd be happy to play around with this kind of thing
<tpw_rules>
rcombs[m]: do you remember how the snes controller video encoding worked?
<rcombs[m]>
the snes controller protocol is just a shift register, and games have full control over it, so they can just clock out >16 bits and go as fast as the CPU is capable of
<tpw_rules>
yes, i mean the specific sgdq demo with streaming video over same
<tpw_rules>
s/s//
<rcombs[m]>
like, the way it encodes and draws the actual video data? not sure of all the details there
<rcombs[m]>
lots of funny PPU tomfoolery
<rcombs[m]>
my philosophy with TAS stuff is mostly centered around making stuff accessible to beginners; most playback device designs require custom hardware or the like, and I've focused on making stuff work with off-the-shelf equipment you can just buy and start playing around with
<rcombs[m]>
glasgow isn't as cheap as a teensy, but it's more approachable imo, and it's slightly cheaper than a flipper zero (which this stuff _sorta_ works on but not reliably)
<whitequark[cis]>
rcombs[m]: oh wow, flippers are so expensive
<SnoopJ>
I don't think I had previously clocked that but yea wow
<rcombs[m]>
yeah, glasgow is pricier when you include the aluminum case but not on its own (and if you include the _flipper's_ case they end up within spitting distance of each other)
<tpw_rules>
where does one get the right plugs
bvernoux has quit [Quit: Leaving]
<tpw_rules>
that is the problem i have, maybe because i have too many dev boards and soldering irons already lol
<jamie3456[m]>
the flipper really isn’t a great dev environment either :/
<rcombs[m]>
ufbt's made things a bit better but yeah still not quite as nice as I'd like
<jamie3456[m]>
especially since all google searches in search of documentation are just flooded with clickbait videos
<jamie3456[m]>
cheers
<jamie3456[m]>
i have a good understanding of my issues now though i think: i messed up resistor values for the port A voltage regulator, and there’s a short after the 33ohm resistors between A6&A7
<jamie3456[m]>
* i have a good understanding of my issues now though i think: i messed up resistor values for the port A voltage regulator, and there’s a short after the 33ohm resistor pack between A6&A7
dne has quit [Remote host closed the connection]
dne has joined #glasgow
<jamie3456[m]>
the fact that the 2.4GHz radio peripheral is only accessible to an undebuggable second cortex core running encrypted firmware is a real stretch for a hackable device lol