<jimwilson>
SpaceCoaster, SiFive doesn't sell chips, it sells IP. Intel did license SiFive IP for the Horse Creek chip, so if Intel sells the chips commercially then SiFive might be getting royalties from the chip sales
<dh`>
when will the hardware world finally move to IPv6
* dh`
hides in a very deep corner
riff-IRC has joined #riscv
ahs3 has joined #riscv
Sofia has quit [Ping timeout: 276 seconds]
riff-IRC has quit [Quit: PROTO-IRC v0.73a (C) 1988 NetSoft - Built on 11-13-1988 on AT&T System V]
Sofia has joined #riscv
EchelonX has joined #riscv
BOKALDO has joined #riscv
jack_lsw has joined #riscv
jacklsw has quit [Ping timeout: 250 seconds]
riff-IRC has joined #riscv
jack_lsw has quit [Quit: Back to the real world]
jacklsw has joined #riscv
`join_subline has quit [Ping timeout: 256 seconds]
riff-IRC has quit [Remote host closed the connection]
riff-IRC has joined #riscv
jack_lsw has joined #riscv
jacklsw has quit [Ping timeout: 240 seconds]
jack_lsw has quit [Read error: Connection reset by peer]
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
<gordonDrogon>
solrize, I've found RISC-V a complete joy after the ugliness of the 65816. The 6502 I do like, as I grew up with it, so I know and understand its limitations (compared to today's systems), but the '816 was just harder than it should have been for a c.1983 CPU ...
wingsorc has quit [Quit: Leaving]
<[exa]>
gordonDrogon: btw did you manage to find any better algorithm for the muldiv thing?
<gordonDrogon>
[exa], not put more thought into it yet - got diverted with some home DIY work...
<gordonDrogon>
however that looks very good, thanks.
<gordonDrogon>
guessing mips has a delayed branch slot thingy.
jjido has joined #riscv
<[exa]>
yeah.
<solrize>
gordonDrogon, i feel like the lack of int overflow detection is a big error. other than that it is mostly nice
jmdaemon has quit [Ping timeout: 240 seconds]
<gordonDrogon>
solrize, and shifting into a "carry" bit - things that are not impossible to overcome, but would help save a cycle or 2 in places.
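A sketch of what the missing flags cost in practice: RISC-V has no carry or overflow flag, so the carry out of an unsigned add is recovered with an unsigned compare (`add` then `sltu`/`bltu`), and signed overflow with the `slti`/`slt`/`bne` idiom given in the unprivileged spec. The Python below only emulates those instruction sequences on 32-bit values; the function names are illustrative, not from any real toolchain.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def add_with_carry(a, b):
    """Emulate `add t0, a, b; sltu t1, t0, a`: t1 is the unsigned carry out."""
    s = (a + b) & MASK32
    carry = 1 if s < (a & MASK32) else 0  # sltu: unsigned less-than
    return s, carry


def add_signed_overflow(a, b):
    """Emulate `add t0, t1, t2; slti t3, t2, 0; slt t4, t0, t1; bne t3, t4, ovf`."""
    s = (a + b) & MASK32
    b_negative = to_signed32(b) < 0                # slti t3, t2, 0
    sum_below_a = to_signed32(s) < to_signed32(a)  # slt t4, t0, t1
    return s, b_negative != sum_below_a            # bne t3, t4 -> overflow
```

So each carry or overflow check costs one to three extra instructions rather than a free flag, which is the "cycle or 2" being discussed.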
<gordonDrogon>
the code density is better than the 65816. My little bytecode VM is 1/3 the size (in bytes) of the one on the '816 ...
seninha has joined #riscv
<muurkha>
yeah, I agree with the complete joy
seninha has quit [Quit: Leaving]
<gordonDrogon>
and some years back I said I'd never program in assembler again... Lost far too many brain cells to it!!!
<muurkha>
heh
<[exa]>
this is actually one of the reasons I was asking about small dev tools here some time ago
<[exa]>
like, even writing a simple compiler+assembler (and even a simple optimizer) for rv is a toy project for a few days at most, doable with students
<[exa]>
quite incomparable with the last time I tried to explain one certain other history-ridden ISA
<muurkha>
chuck moore said the i386 was the worst architecture he ever worked with
<[exa]>
how did you know which arch I'm talking about? :D
<muurkha>
I wonder how much of the modern viewpoint that writing an operating system is inevitably horrifically difficult comes from using overcomplicated hardware
<la_mettrie>
so he didn't work with 286?
<muurkha>
well he said "the PC"
<gordonDrogon>
one day I'll write a "proper" compiler but it's so-far eluded me.
<muurkha>
my friend Ann wrote an operating system that was her company's main cash cow for a few years, by herself
EchelonX has quit [Quit: Leaving]
<muurkha>
for the SDS 940 I think
<gordonDrogon>
I've written a little OS in BCPL.. If only I could turn that into a cash cow ;-)
<[exa]>
muurkha: overcomplicated hardware and totally misdirected programming languages -- yeah that makes 99% :D
<[exa]>
and, yeah, shooting too high
<muurkha>
with memory protection?
<muurkha>
I think writing an OS that is a cash cow is maybe more difficult now than it was in 01968
<gordonDrogon>
mine? no - not yet, anyway. the current BCPL compiler puts static data after functions - no concept of text/data sections...
<gordonDrogon>
there is essentially no linker either.
<muurkha>
neat
<muurkha>
is the 65816 as much of a pain for reentrant code as the 6502?
<gordonDrogon>
however an entire program could be prevented from touching memory outside its own footprint, so at worst it stomps all over its own code, but nothing else ...
<gordonDrogon>
muurkha, it has a 16-bit stack pointer, so if you use the stack then it can be easier.
<muurkha>
yeah, I think that kind of fault isolation is pretty convenient
<muurkha>
does it have stack-pointer-relative addressing modes?
<gordonDrogon>
I think so, but I've not had the need to use them.
<gordonDrogon>
my entire bytecode interpreter runs without a subroutine call... BCPL keeps its own stack.
<muurkha>
oh, naturally
<gordonDrogon>
if I were compiling bcpl to native then it is something I'd look at as BCPL uses stack data extensively.
<muurkha>
having reasonably cheap SP-relative or FP-relative addressing helps a lot with reentrant code
<muurkha>
I was complaining the other day about the 8080's inability to handle PIC or FP-relative addressing and a friend said
<muurkha>
<friend> calling the 8080's decision to not add more transistors to connect an adder on jump when its 4500 transistors was already a lot is amusing
<gordonDrogon>
well - it was the mid 70's ...
<muurkha>
but the PDP-8, which was only about 4000 transistors when Intersil made a CMOS version of it, did support PIC
<gordonDrogon>
bcpl bumps the stack pointer before a function call to 'hide' all its local variables, so the called function can use as much stack as it wants.
<gordonDrogon>
and accessing the first 14 stack variables is efficient at the code level as there are 14 one-byte instructions to load/store them. The compiler sorts the local variables based on usage to make sure the commonly used ones are hopefully in that first 14 ...
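A rough model of why sorting locals by usage helps density, under assumed encodings: slots 3 to 16 get dedicated one-byte load opcodes, and any other slot costs two bytes (opcode plus operand). `LP_BASE = 128` is a made-up base picked only so that LP6 comes out as 134, as mentioned later in the chat; the real Cintcode opcode table may be laid out differently, and this sketch ignores where arguments must live.

```python
# Assumed encoding (not the real Cintcode table): slots 3..16 have one-byte
# LPn opcodes; every other slot needs an opcode byte plus an operand byte.
LP_BASE = 128                    # hypothetical: chosen so that LP6 == 134
FIRST_SHORT, LAST_SHORT = 3, 16  # 14 slots with one-byte loads


def lp_opcode(slot):
    """One-byte opcode for loading `slot`, or None if it needs a longer form."""
    if FIRST_SHORT <= slot <= LAST_SHORT:
        return LP_BASE + slot
    return None


def assign_slots(usage):
    """Give the most frequently used locals the lowest slots, starting at 3."""
    ordered = sorted(usage, key=usage.get, reverse=True)  # stable for ties
    return {name: FIRST_SHORT + i for i, name in enumerate(ordered)}


def load_bytes(usage, slots):
    """Total bytes spent on loads: 1 per short-slot use, 2 otherwise."""
    return sum(count * (1 if lp_opcode(slots[name]) is not None else 2)
               for name, count in usage.items())
```

With 14 locals used 5 times each plus one used once, the rarely used one lands in slot 17 and only its single load pays the two-byte price.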
<muurkha>
which it managed by having jump offsets *replace* the low 7 bits of the PC rather than *adding* to it. so you could relocate a page of code to any other page
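For the PDP-8 comparison: its memory-reference instructions carry a 7-bit offset plus a page bit, and the effective address is formed by splicing the offset into either page zero or the instruction's own 128-word page. No adder is needed, and code moved by a whole page still works. The sketch contrasts that with PC-relative addition (as RISC-V branches and `jal` do); it ignores indirection and extended memory, and the function names are illustrative.

```python
ADDR_MASK = 0o7777  # 12-bit addresses: 4096 words
PAGE_MASK = 0o177   # 7-bit offset within a 128-word page


def ea_page(instr_addr, offset, current_page=True):
    """PDP-8 style: the offset replaces the low 7 bits of the page address.

    Page bit clear -> page zero; page bit set -> the instruction's own page.
    This is only wiring (masking and OR), no carry chain.
    """
    page = (instr_addr & ADDR_MASK & ~PAGE_MASK) if current_page else 0
    return page | (offset & PAGE_MASK)


def ea_pc_relative(pc, displacement):
    """PC-relative style (like RISC-V branches): needs a full adder."""
    return (pc + displacement) & ADDR_MASK
```

Moving an instruction up by one page (0o200 words) moves its current-page target by exactly the same amount, which is what makes page-sized chunks of code relocatable without patching.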
<gordonDrogon>
I have a PDP8 here :)
<muurkha>
cool! what kind?
<gordonDrogon>
It's an 8/a, so not the nice flappy switch ones but I have a couple of PiDP8's too.
<muurkha>
neat!
<muurkha>
are those 14 one-byte instructions in the bytecode your interpreter interprets?
<muurkha>
interesting, it's an instruction set implemented with gas macros?
<muurkha>
in what sense is LP6 "one byte" then?
<muurkha>
I was thinking that if you wanted to save on transistors and still efficiently support reentrant code you could use a PDP-8-like approach using in-RAM stack register windows, where you bump your stack pointer by 8 or 16 words on entry to a subroutine, and there are load and store instructions with a 3- or 4-bit field that replaces some of the low-order bits of the stack pointer
<muurkha>
that way you don't need an extra adder or a delay during microcode addition to calculate the effective address for a local variable access
<muurkha>
SPARC-style register windows where the register window moves on entry and exit, but not by an entire window size, would still require an adder for the effective address calculation, but those aren't really necessary
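The aligned-window idea above can be sketched like this, assuming 16-word frames and a 4-bit local-variable field (both hypothetical parameters). Because SP is always a multiple of the window size, replacing its low bits gives the same address as adding the field, so the hardware needs no carry chain. Once frames may start at arbitrary offsets, as with windows that slide by less than a full window, the two diverge and a real adder is required.

```python
WINDOW = 16               # hypothetical frame size in words (chat: 8 or 16)
FIELD_MASK = WINDOW - 1   # 4-bit local-variable field in the instruction


def enter(sp):
    """On subroutine entry, bump SP by a whole window so it stays aligned."""
    return sp + WINDOW


def ea_replace(sp, field):
    """Aligned windows: the field replaces SP's low bits (no carry chain)."""
    return (sp & ~FIELD_MASK) | (field & FIELD_MASK)


def ea_add(sp, field):
    """General frames: the effective address needs a real addition."""
    return sp + (field & FIELD_MASK)
```

For SP = 20 (not window-aligned) and field 12, replacement gives 28 while addition gives 32, which is exactly the case that forces an adder.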
<muurkha>
having one-byte instructions for local variable access makes a significant difference in code density
<gordonDrogon>
LP6 has the value 134 ...
<gordonDrogon>
it's just a number .. could be anything. there are 254 opcodes in the bytecode..
<gordonDrogon>
I used a few macros as that's what I used in the '816 version - pushAB in RV land equates to mv regB,regA - in '816 land it's 4 lines of code.
seninha has joined #riscv
<muurkha>
oh, this is the RISC-V implementation of some of the bytecodes in the interpreter, which is interpreting the bytecode you compile BCPL to?
BOKALDO_ has joined #riscv
BOKALDO has quit [Read error: Connection reset by peer]
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
<joev>
(a question from above: the '816 does have a stack-relative addressing mode, so you can do things like LDA (5,S),Y)
<muurkha>
oh nice! thanks joev!
<muurkha>
that seems like it would make a huge difference for compiling C and Pascal
<joev>
Yeah, passing arguments via the stack is a lot easier.
<gordonDrogon>
muurkha, yes!
<gordonDrogon>
I think the (one of the) issues with the '816 is that it was just in the too little, too late category - so there was little incentive for people to write/adapt existing C/Pascal/etc. compilers for it.
<sorear>
the 6809 looks quite nice although I've never been in a position to use it seriously
<gordonDrogon>
another johnny come lately though...
<muurkha>
yeah, I guess that although the Super Nintendo was very widely deployed, not many people got the chance to develop for it
<sorear>
those 24 bit pointers will still be fun for C
<muurkha>
I used a 6809 a lot when I was a kid, but unfortunately never learned its assembly
<muurkha>
CoCo at my daycare center
<gordonDrogon>
and half my struggle with bcpl on the '816 has been the 64KB banks of RAM. BCPL wants linear addressing.
<muurkha>
I've wondered if that kind of thing could be dealt with usefully with an actors language
<gordonDrogon>
I treated everything as 32-bit in bcpl. Not quite sure how the WDC C compiler does it. (or the other VBCC?)
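On the '816 a 24-bit address is a bank byte plus a 16-bit offset, so an implementation that treats BCPL pointers as flat values has to split them and watch for accesses that straddle a 64KB bank. A minimal sketch with hypothetical helper names; it says nothing about how WDC's compiler or VBCC actually handle this.

```python
def split_linear(addr):
    """Split a linear address into a 65816-style (bank, 16-bit offset) pair."""
    if not 0 <= addr < (1 << 24):
        raise ValueError("the 65816 address space is 24 bits")
    return addr >> 16, addr & 0xFFFF


def crosses_bank(addr, length):
    """True if the byte range [addr, addr + length) spans two or more banks."""
    return length > 0 and (addr >> 16) != ((addr + length - 1) >> 16)
```

A 32-bit word fetched at 0x01FFFE, for example, has two bytes in bank 1 and two in bank 2.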
<muurkha>
treating different structs as almost different processes which you context-switch between on function calls
<gordonDrogon>
one thing I'm liking about my RV implementation is that I can keep all state/data for the bytecode VM entirely in registers - which is sort of the same as using 64 bytes of zero page in the '816 but a billion times faster for the same clock speed...
epony has quit [Read error: Connection reset by peer]
<muurkha>
well, that depends on your RAM, doesn't it?
<gordonDrogon>
well - it's only 8-bits wide in the '816 - that doesn't help at all.
<gordonDrogon>
another frustration is that when in 16-bit mode you can't load an 8-bit value from RAM - it loads a 16-bit value which you then have to mask out, so you lose a cycle reading RAM and lose 3 more with the AND #$00FF )-:
<muurkha>
interesting!
<gordonDrogon>
that is (sadly) the core of the bytecode fetcher, so wasting 4 cycles is a big thing at 16MHz ...
<gordonDrogon>
I still like the 6502 though, but now wishing I'd just gone directly to RV, however building my own '816 board and implementing BCPL, overcoming some of the interesting issues was fun, so there is that.
wingsorc has joined #riscv
<sorear>
that's what you get when your opcode space is full
<gordonDrogon>
it wasn't aimed at home users, so it was sort of obscure.
<muurkha>
ah
<gordonDrogon>
they're quite rare and sought after now.
<muurkha>
so's your old man
<gordonDrogon>
er, he's somewhat dead and scattered locally ...
<muurkha>
condolences
<gordonDrogon>
it was some time ago.
<gordonDrogon>
I'm not exactly a "millennial" ;-)
jjido has joined #riscv
jjido has quit [Client Quit]
<muurkha>
don't be silly, millennials don't exist
<gordonDrogon>
I did get one of the early Acorn Archimedes computers - and it was amazingly fast, even for an 8MHz CPU at that time.
<gordonDrogon>
that was an interesting bit of editing for me - to let me compare the 2 bits of code side by side. Interestingly, 20 bytes ('816) vs. 24 bytes (RV)
<gordonDrogon>
it's in-lined at the end of every bytecode instruction handler....
<gordonDrogon>
on the '816 I was sacrificing any amount of RAM just to save a cycle or 2 ...
epony has joined #riscv
<muurkha>
does the 29 instructions include the instruction fetch time?
<gordonDrogon>
it's 29 clock cycles on the '816.
<muurkha>
sorry, I meant 29 clock cycles
<gordonDrogon>
and yes, that's the total time.
<gordonDrogon>
So lda [] is a 2-byte instruction that takes 7 clock cycles to execute.
<muurkha>
I wonder how far you could get with speeding up software on small processors by loading inner loops into a small writable microcode store so you don't have to fetch the instructions from RAM again on every iteration
<muurkha>
like, you could think of the 64 18-bit words of RAM on the x18 core in the GA4 and GA144 as being such a microcode store
<muurkha>
that's about 7000 transistors
<muurkha>
you could probably get a MISC "microcode engine" and a bytecode interpreter in it into another 7000 transistors or so and another 64 words (=256 instructions) of ROM
<muurkha>
then you could compile most of your program into the bytecode, but selectively rewrite inner loops in the microcode and load them into a small writable microcode store on loop entry
<muurkha>
that might be a lot more approachable than writing your entire program in ArrayForth or whatever the fuck it's called this year
<gordonDrogon>
and the 6502 is only 2500 transistors to start with ;-)
<gordonDrogon>
oops, 3500...
<muurkha>
if you don't count the pullups, yeah
<muurkha>
but you probably should count the pullups
<muurkha>
especially for purposes of comparison with the 8080 and the Intersil PDP-8 chip (what was that called again?)
<muurkha>
since those are CMOS, so a 2-input NAND or NOR is 4 transistors
<sorear>
you're perilously close to reinventing caches
<muurkha>
I don't think you can build a CPU with a useful cache in 15000 transistors, sorear
<sorear>
or more of an instruction scratchpad/tightly-coupled-memory in this case
<muurkha>
yeah, it's like a TCM alternative to i$
<gordonDrogon>
intersil 6100..
<muurkha>
right, thanks!
<muurkha>
4000 transistors but in NMOS probably would've been 3000
epony has quit [Ping timeout: 240 seconds]
<gordonDrogon>
if only I knew more about the internals of chip design...
<gordonDrogon>
I spent the best part of a year writing 8080 code though (using a Z80-syntax assembler, apparently), then I maintained it on/off for a couple of years after that, but I really do not have the same feel for the 8080 as I do for the 6502. I've not touched it since that project (ended about '83)
<muurkha>
I feel like the 6502 is a better design?
<muurkha>
(than 8080 or than RL78, not than RISC-V)
<muurkha>
256 8-bit registers instead of 7, two index registers instead of one, and effectively a several times higher clock speed despite more primitive fabrication technology
<muurkha>
well, 259 if you include A, X, and Y
<gordonDrogon>
:)
[ has quit [Ping timeout: 240 seconds]
<gordonDrogon>
the usual 6502 vs Z80 'wars' of the early 80s more or less said that a 4MHz Z80 could just about keep up with a 1MHz 6502...
<muurkha>
yeah, and the 8080 couldn't run at 4MHz
<muurkha>
the 6502 had less compact code though
<gordonDrogon>
so with the Z80 being an improvement over the 8080, that leaves the 8080 quite a bit further behind..
<gordonDrogon>
the project I worked on with an 8080 (actually an 8085) was a real-time blood analysis machine.
<muurkha>
neat
<gordonDrogon>
today an arduino would run the lot, but then it took an 8085 with a big IO board to make it all go - and 16KB of hand-written assembler.
<muurkha>
yup
<gordonDrogon>
at the same place I made up little 6502 boards to act as an IO controller for an automated factory test-bed we were doing a lot of research into. 2 boards: one CPU with 128 bytes of RAM and 2K EPROM, the other with opto-isolated outputs/inputs + relays to control 24v solenoids...
<gordonDrogon>
arduino of the day - eurocard cpu board - the industrial IO wouldn't be much different now though.
Noisytoot has joined #riscv
<muurkha>
not familiar with eurocard
<muurkha>
Noisytoot: you've de-[ed
<muurkha>
I wonder if now more of the industrial I/O would have an RS-485 or CAN bus interface on it instead of just 24 volts
<muurkha>
or even Ethernet
<muurkha>
apparently General Instrument introduced the PIC16 in 01975 and Intel introduced the 8048 in 01976
<muurkha>
those might also be "the arduino of the day"
<muurkha>
my dad worked on a project around that time that used I think an HP 9825 desk calculator as the "arduino"
<muurkha>
01981 I mean
<muurkha>
which was maybe a more reasonable choice when they started the project in 01976 or so than when they finished it
<muurkha>
I think you could get an 8748 with 2K EPROM onboard?
<gordonDrogon>
there was a "microcontroller" with a eprom window that I remember - but I wasn't the one programming it - just the programmer using it - it was part of the timing control for a high resolution (at the time) video generator...
<gordonDrogon>
1280x1024x32bpp...
<gordonDrogon>
or x8bpp, but we could gang 3 cards together for 24-bit output, with a CPU per colour to speed stuff up. late 80s.
<gordonDrogon>
(transputer was the CPU)
<muurkha>
that's insanely high resolution, yeah
<muurkha>
must've been expensive
<gordonDrogon>
yea, they were - at the time.
<gordonDrogon>
but in terms of the rest of the system, just a small part - supercomputing with the transputer (and latterly i860)
<muurkha>
a lot of people were still using "calligraphic" (i.e. vector) displays when they needed such high resolutions
<gordonDrogon>
one of our biggest non-military customers was Toyota - they used a system to do high resolution rendering of their cars with realistic colour, shading, etc.
<muurkha>
it's insane to me that you can do real-time raymarching in a dynamically-typed language now
<muurkha>
the whole thing is only 51 lines of Lua
<gordonDrogon>
it was raytracing back then, but I wasn't on the applications side - back-end hardware testing / device drivers and working with the hardware guys.
<muurkha>
pretty jerky in a 1280×1024 window tho
<gordonDrogon>
I did a raycasting trial a few years ago.. in BASIC ..
<muurkha>
have you seen rossum's NTSC generation on an ARM?
<gordonDrogon>
I almost re-coded it in C to add into my basic interpreter - with the aim of having a basic version of doom...
<gordonDrogon>
er, no, but I've generated PAL on an ATmega.
<muurkha>
yes but this is with color
<gordonDrogon>
never twice same colo(u)r ;-)
<muurkha>
heh
ivii has quit [Remote host closed the connection]
<gordonDrogon>
can't be any worse than the apple II ..
<muurkha>
that was a pretty funky palette
<gordonDrogon>
although here in the uk we needed an extra card plugged in to get colour ...
<gordonDrogon>
and different clock xtal for PAL.
<muurkha>
huh, I didn't know that
<gordonDrogon>
the apple II europlus ...
vagrantc has joined #riscv
<gordonDrogon>
looking at the Gowin FPGA in the Tang 9K - looks like it's capable of HDMI video output which will be nice to have a little stand-alone RISC-V system with video in a retro sort of way...
<gordonDrogon>
at least that's my hope but they only made 300 in the first batch, so who knows for the future...
<muurkha>
neat, I hadn't heard of the Tang 9K
<muurkha>
what do you think of the Raspberry Pi Pico pioasm coprocessor?
<gordonDrogon>
I have one on order - with the LCD display.
<gordonDrogon>
don't know - I don't have one. A bit anti-Pi right now, so I've not even looked at them.
<gordonDrogon>
the anti-Pi is also sort of anti-arm, which is why I'm here.
EchelonX has joined #riscv
<muurkha>
it seems like an interesting alternative to the "FPPA" multithreading approach taken for programmable real-time peripherals in the 3¢ Padauk processors
<muurkha>
well I guess the multithreading doesn't show up until the high-end 12¢ processors
<muurkha>
ARM is okay but it's definitely not the breath of fresh air RISC-V is
<gordonDrogon>
well yes.
<gordonDrogon>
I remember looking at ARM assembler when they first came out (Acorn Archimedes computers) and not doing a lot with it - then moving to SPARC and i860 and thinking wow - much nicer, and I've never looked at ARM assembler since.
<gordonDrogon>
I was heavily into Pi until about 2.5 years back, but things change.
cwebber has joined #riscv
<muurkha>
SPARC is pretty nice, but I still think RISC-V is nicer
<muurkha>
what were you doing with the Pi?
<gordonDrogon>
I don't remember too much SPARC now - and I shudder at the braincell loss of the i860 with its dual-instruction mode ...
<gordonDrogon>
I wrote/maintained a GPIO library called wiringPi.
<muurkha>
never looked at the i860
<gordonDrogon>
^don't
<muurkha>
oh! I didn't realize you were the one who wrote wiringPi
<gordonDrogon>
:)
<gordonDrogon>
so the first thing I'll do when I get my BCPL going on some 'real' risc-v hardware is write a bcpl version of wiringPi, er, wiringRV for it ;-)
<muurkha>
gordonDrogon: yeah, there's a whole distribution of jerkiness in the public, and whenever you do something that is accessible to a lot of people that right tail of jerkiness gets multiplied by the volume of people
<muurkha>
very jerky people are especially likely to contact you or do something that otherwise impacts you
<muurkha>
I first learned about this working in retail, at Taco Bell
<muurkha>
condolences
<muurkha>
gordonDrogon: how would you design a CPU instruction set to be ideal for bytecode interpreters?
<gordonDrogon>
hi - sorry been away for tea time.
<gordonDrogon>
muurkha, interestingly the Inmos Transputer is close - the designers of that were BCPL programmers ...
<gordonDrogon>
but how many different bytecode interpreters are there ... the bcpl/cintcode one is sort of stack (register) based, in that loading from the stack into regA pushes the old A into B, then you add, then store back to the stack ...
<gordonDrogon>
although it has a concept of a fusion operation in that there is an add stack variable N to regA, then you might follow that with a store back to the stack.
<gordonDrogon>
but only add/sub not mul/div.
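A toy model of the A/B behaviour described above, using mnemonics that loosely follow the chat (`LP`, `SP`) plus a hypothetical `AP` for the fused "add stack variable N to A" form and a plain `ADD` that combines B and A. It shows why the fused form saves one bytecode on `x := x + y`; it does not model 32-bit wraparound or Cintcode's real encodings.

```python
def run(program, frame):
    """Interpret (op, n) pairs against `frame` (a list of words) with A/B."""
    a = b = 0
    for op, n in program:
        if op == "LP":     # load local n: the old A is pushed into B
            b, a = a, frame[n]
        elif op == "L":    # load constant n, same push behaviour
            b, a = a, n
        elif op == "ADD":  # A := B + A (n unused)
            a = b + a
        elif op == "AP":   # hypothetical fused form: A := A + local n
            a = a + frame[n]
        elif op == "SP":   # store A into local n
            frame[n] = a
        else:
            raise ValueError(f"unknown op {op!r}")
    return a, b


# x lives in slot 3, y in slot 4; both programs compute x := x + y
plain = [("LP", 3), ("LP", 4), ("ADD", None), ("SP", 3)]
fused = [("LP", 3), ("AP", 4), ("SP", 3)]
```

The two-deep A/B "stack" is just enough for binary operators, and the fused form avoids even the push into B.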
<gordonDrogon>
https://unicorn.drogon.net/z.b.txt is simple FOR loop with the bytecode the compiler outputs with my comments in the bytecode.
<gordonDrogon>
the SP8 might seem odd, but that's SP4 in the called function which is the 2nd argument to writef()
aerkiaga has quit [Remote host closed the connection]
<gordonDrogon>
The Transputer has 3 registers in its stack and works in a similar way - all data to/from a local stack.
<gordonDrogon>
one issue that I think could be made better here is if the compiler output word-aligned data - e.g. there is a load word opcode but the next 4 bytes are in-line, so they might not always be word-aligned. if the compiler were to word-align them by inserting padding, then yes, code size would increase, but at that minimal expense you'd get a faster word (or halfword) load.
<gordonDrogon>
my current load word code is 14 instructions long...
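A sketch of the trade-off in the alignment suggestion, assuming a little-endian 32-bit operand that follows a one-byte opcode (the opcode value and helper names are made up). Without padding the interpreter must assemble the word byte by byte, roughly what a multi-instruction `lbu`/`slli`/`or` sequence does; if the compiler pads to a 4-byte boundary, rounding the PC up plus a single aligned load suffices, at the cost of up to 3 filler bytes.

```python
LW_OPCODE = 0x2A  # hypothetical opcode value, for illustration only


def load_word_bytewise(code, pc):
    """Assemble an unaligned little-endian word one byte at a time."""
    value = 0
    for i in range(4):
        value |= code[pc + i] << (8 * i)
    return value, pc + 4


def align_pc(pc):
    """Round pc up to the next multiple of 4."""
    return (pc + 3) & ~3


def load_word_aligned(code, pc):
    """With padding, the operand starts at align_pc(pc): one aligned load."""
    start = align_pc(pc)
    return int.from_bytes(code[start:start + 4], "little"), start + 4


def emit_lw(out, value, pad):
    """Emit the opcode and its inline operand, optionally padded to 4 bytes."""
    out.append(LW_OPCODE)
    if pad:
        while len(out) % 4:
            out.append(0)  # filler bytes, skipped by the interpreter
    out += value.to_bytes(4, "little")
```

Here the unpadded form takes 5 bytes and the padded one 8, which is the code-size cost being weighed against the shorter load sequence.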
KombuchaKip has joined #riscv
wingsorc has quit [Quit: Leaving]
mauz has quit [Quit: Leaving...]
riff-IRC has quit [Read error: Connection reset by peer]
riff-IRC has joined #riscv
jjido has joined #riscv
aburgess has joined #riscv
epony has quit [Ping timeout: 240 seconds]
jmdaemon has joined #riscv
epony has joined #riscv
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
<muurkha>
I thought you said C was "only used for byte indirection though (picking/placing bytes)."
<muurkha>
oh!
<muurkha>
PBYT implicitly uses C as the source
<gordonDrogon>
I get confused. the manual says: "The registers A and B are used for expression evaluation, and C is used in byte subscription."
epony has joined #riscv
<gordonDrogon>
oh, looking a few lines up: % fetches/stores a byte and ! fetches/stores words.
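To make the `!`/`%` distinction concrete, here is a small model assuming Cintcode-style word-addressed pointers, 32-bit words and little-endian byte order: `v!i` addresses word `v+i`, while `v%i` addresses byte `i` counted from the start of word `v`. A byte store needs three values (vector, index and the byte itself), which is presumably why a third register such as C gets involved in `PBYT`. The class and method names are illustrative.

```python
BYTES_PER_WORD = 4  # assumption: 32-bit BCPL words


class Store:
    """Word-addressed memory with BCPL-style word (!) and byte (%) access."""

    def __init__(self, words):
        self.mem = bytearray(words * BYTES_PER_WORD)

    def word_get(self, v, i):  # v!i
        a = (v + i) * BYTES_PER_WORD
        return int.from_bytes(self.mem[a:a + BYTES_PER_WORD], "little")

    def word_put(self, v, i, x):  # v!i := x
        a = (v + i) * BYTES_PER_WORD
        self.mem[a:a + BYTES_PER_WORD] = (x & 0xFFFFFFFF).to_bytes(4, "little")

    def byte_get(self, v, i):  # v%i
        return self.mem[v * BYTES_PER_WORD + i]

    def byte_put(self, v, i, x):  # v%i := x (needs vector, index and value)
        self.mem[v * BYTES_PER_WORD + i] = x & 0xFF
```

So `v%4` and `(v+1)%0` name the same byte, while `v!1` names the whole word containing it.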
vagrantc has quit [Quit: leaving]
epony has quit [Ping timeout: 240 seconds]
seninha has quit [Quit: Leaving]
seninha has joined #riscv
<muurkha>
I wonder what the reason for Cintcode's 2-register stack is
<muurkha>
I used a 2-register stack in https://github.com/kragen/calculusvaporis in order to save on transistors, but I don't think that reasoning is applicable to Cintcode
<muurkha>
Richards designed it, right? maybe he's explained the rationale
cwebber` has joined #riscv
cwebber has quit [Ping timeout: 250 seconds]
aerkiaga has joined #riscv
prabhakarlad has quit [Quit: Client closed]
jmdaemon has quit [Ping timeout: 256 seconds]
Sofia has quit [Ping timeout: 276 seconds]
GenTooMan has quit [Ping timeout: 240 seconds]
Sofia has joined #riscv
Bluefoxicy has quit [Ping timeout: 256 seconds]
Bluefoxicy has joined #riscv
mahmutov has joined #riscv
epony has joined #riscv
cwebber` has quit [Ping timeout: 256 seconds]
jjido has joined #riscv
jmdaemon has joined #riscv
<gordonDrogon>
Yes. Martin Richards.
<gordonDrogon>
We have exchanged a few emails, but he's not replied to the last one - he is in his 80s now, I think.