ChanServ changed the topic of #armlinux to: ARM kernel talk [Upstream kernel, find your vendor forums for questions about their kernels] | https://libera.irclog.whitequark.org/armlinux
mraynal has quit [Remote host closed the connection]
mraynal has joined #armlinux
<marex>
hanetzer: like openwrt does ?
<hanetzer>
marex: yeah. and we've determined the issue. and it's (probably) not my fault :D
<hanetzer>
it's reading the squashfs magic as 0x73717360, which is wrong. should be 0x73717368
<marex>
hanetzer: you're missing a bit there
<hanetzer>
aye
<hanetzer>
(it truly is 0x73717368, however)
<hanetzer>
dirty patching it to 60 gives a new error, so I think their driver is borked.
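The single missing bit marex points out can be found by XOR-ing the expected and observed magic values. This is only an illustrative sketch: the helper name `differing_bits` is made up for this note, and the two constants are the values quoted in the messages above.

```python
# Values quoted in the conversation above.
EXPECTED_MAGIC = 0x73717368  # squashfs magic as it should read
OBSERVED_MAGIC = 0x73717360  # what the kernel reported reading


def differing_bits(a: int, b: int, width: int = 32) -> list[int]:
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [bit for bit in range(width) if (diff >> bit) & 1]


print(differing_bits(EXPECTED_MAGIC, OBSERVED_MAGIC))  # [3]
```

Bit 3 is the "8" in the low nibble, which matches the "missing 8" described later in the log.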
apritzel_ has quit [Ping timeout: 246 seconds]
<marex>
er yes, the vendor bsp method of disabling error checking often doesn't work
<marex>
are you using /dev/mtdX or /dev/mtdblockX ?
<hanetzer>
the latter.
<marex>
hum
<marex>
hanetzer: that's some QSPI NOR, isn't it
<marex>
reduce the bus frequency for starters ?
<hanetzer>
yeh.
<marex>
(and width maybe ?)
<hanetzer>
hrm. I know how to do the former.
<hanetzer>
I don't know how to do the latter.
<marex>
git grep spi-max-frequency = <...hz...>; , it is often used for flash@ nodes
<marex>
the latter ... spi-rx-bus-width / spi-tx-bus-width
<marex>
and spi-cpol / spi-cpha in case of severe oddity
<marex>
that changes clock polarity / phase, but that's unlikely to be the issue here
<hanetzer>
yeah. I know the spi-max-freq bit.
<hanetzer>
heh. for this flash controller on this device, there are only three valid rates. 24, 83, and 150mhz
<hanetzer>
ok, 83 gets the same results
<marex>
24 ?
<hanetzer>
24 crashes it and sends it back to bootrom. trying 150
<marex>
that's some bsp kernel fork ?
<hanetzer>
nope. mainline. I'm writing it. but the sfc driver is mainline and untouched by me. (well, aside from adding some printk to check things)
<marex>
hanetzer: is there some specific pattern in those missing bits ?
<hanetzer>
I only see the missing 8
<hanetzer>
so the last half byte, I guess.
<marex>
did you write the data into the SPI NOR yourself ?
<marex>
(and are you sure the data in the SPI NOR are something else than what you're reading out ? could it be you are reading out the right thing, but you wrote into it the wrong thing ? )
<hanetzer>
yeah, I did write it myself. I tftp'd into memory using u-boot and write it with sf
<hanetzer>
reading it back in u-boot produces data one would expect.
<marex>
hanetzer: maybe you can compare the controller configuration (and pinmux ?) between u-boot and linux ?
<marex>
that might provide some hint
<marex>
try md in u-boot and devmem in linux
<hanetzer>
well, can't get to devmem in linux yet, good point however.
<marex>
hanetzer: add pr_err("%x\n", readl(...)); into the driver ?
<hanetzer>
doable, sure. lemme check the pinm configs in the u-boot first eh.
<hanetzer>
hrm. the sfc driver doesn't have pinctrl support :)
<marex>
it doesn't have to, the pinctrl driver should be separate
<marex>
or is the hisi IO directly routed to the SoC edge ?
<hanetzer>
yeah, they have a separate pinctrl thing. I was just looking at the amba serial, thought it may be like that.
<hanetzer>
two blocks of 0x800, one handles the pinfunc and the other the pin's drive stuff.
<hanetzer>
and I don't know what you mean by routed directly to the soc edge; as in, you can't turn these pins into other things?
<hanetzer>
pinmux setup transferred. no joy.
Pali has quit [Ping timeout: 240 seconds]
<hanetzer>
it seems their driver doesn't respect spi-{tx,rx}-bus-width
<hanetzer>
fuck it. booting a vendor kernel to do some poking.
<hanetzer>
yeah, they're using the 150mhz clock... and my pinconf regs are correct
silkroadrunner has joined #armlinux
CounterPillow has quit [Ping timeout: 245 seconds]
CounterPillow has joined #armlinux
silkroadrunner has quit [Remote host closed the connection]
<marex>
hanetzer: check clock, controller register programming, ... you might spot a difference
apritzel_ has joined #armlinux
archetech has joined #armlinux
apritzel_ has quit [Ping timeout: 240 seconds]
headless has joined #armlinux
iivanov has joined #armlinux
iivanov__ has joined #armlinux
iivanov__ has quit [Client Quit]
iivanov has quit [Ping timeout: 246 seconds]
iivanov has joined #armlinux
iivanov has quit [Quit: Leaving...]
<hanetzer>
marex: tbf, I don't. comparing across all the mainline'd chips code and datasheet, it's 'the same'. a number of 'root' fixed rate clocks, a char * array naming the clocks the fmc can choose from, a u32 array showing the bit values for choosing those clocks, and an entry of 'hisi_mux_clocks' binding those together, along with the clock register offset, width of the bitfield, and the shift of the
headless has quit [Quit: Konversation terminated!]
<marex>
hanetzer: I think mtdblockX read triggers 512 Byte read , right ?
<marex>
I assume your mtdblockX placement is at least so much aligned (check cat /proc/mtd)
<marex>
what happens if you write some pattern into this block (probably use 64k alignment, because that's the usual erase block size of SPI NOR) and then read it back in Linux, do you spot some pattern in the missing bits ?
<hanetzer>
well, can't check proc/mtd as it currently stands, but the dmesg log shows them as being at least 1k alignment, yes
<marex>
1k is weird
<marex>
SPI NOR erase block size is usually 64k or if it supports small pages it is 4k
<marex>
hanetzer: which SPI NOR do you have on that board ?
<marex>
Look at CONFIG_SQUASHFS_4K_DEVBLK_SIZE and CONFIG_MTD_SPI_NOR_USE_4K_SECTORS
<hanetzer>
macronix,mx25l25635e
<hanetzer>
yeah, 4k at least. 0x5000 is u-boot, 0x20000 is u-boot-env, 0x700000 is kernel, 0x300000 is rootfs, 0x1500000 is currently unused
<marex>
20000 is 128k aligned
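The alignment being discussed here can be checked mechanically. A minimal sketch, assuming the figures hanetzer quoted are to be checked as-is (the log does not say whether they are offsets or sizes); `natural_alignment` is a name invented for this note.

```python
def natural_alignment(value: int) -> int:
    """Largest power of two that divides value (value must be > 0)."""
    if value <= 0:
        raise ValueError("value must be positive")
    return value & -value  # isolates the lowest set bit


# Figures quoted above; whether they are offsets or sizes is not stated.
quoted = {
    "u-boot": 0x5000,
    "u-boot-env": 0x20000,
    "kernel": 0x700000,
    "rootfs": 0x300000,
    "unused": 0x1500000,
}

for name, value in quoted.items():
    align = natural_alignment(value)
    print(f"{name:<11} {value:#09x} aligned to {align // 1024}K, "
          f"64K-aligned: {value % 0x10000 == 0}")
```

With these numbers, 0x20000 comes out as 128K aligned, and 0x5000 is the only value that is 4K aligned but not 64K aligned.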
<marex>
which way is CONFIG_MTD_SPI_NOR_USE_4K_SECTORS set in your kernel config ?
<hanetzer>
yes, which is *at least* 4k
<hanetzer>
=y
<marex>
turn it off , just for a test
<hanetzer>
k
<marex>
hanetzer: CONFIG_SQUASHFS_4K_DEVBLK_SIZE also =y ?
<hanetzer>
yes, turn off?
<marex>
try it
<hanetzer>
k, bootin'
<hanetzer>
(also, on the bsp kernel it uses the 150mhz clock)
<hanetzer>
same result :P
<marex>
hanetzer: write a test pattern into the SPI NOR and read it back in Linux, compare
<marex>
hanetzer: maybe you can spot some pattern in the missing bits, like only the 4 LSbits of each 32bit WORD are missing or some such
<marex>
that could mean you have the wrong amount of ... errr ... those bits between command/address and data cycle
<marex>
there are a few clock cycles which are empty
<marex>
"dummy cycles"
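The pattern test marex suggests could be evaluated with something like the sketch below. It assumes the written pattern and the data read back in Linux are both available as byte strings and that words are little-endian 32-bit; the function name and the sample data are invented for illustration.

```python
def dropped_bit_counts(written: bytes, read_back: bytes) -> list[int]:
    """Count, per bit position of each little-endian 32-bit word, how often
    a bit that was written as 1 came back as 0."""
    counts = [0] * 32
    usable = min(len(written), len(read_back)) // 4 * 4
    for off in range(0, usable, 4):
        w = int.from_bytes(written[off:off + 4], "little")
        r = int.from_bytes(read_back[off:off + 4], "little")
        dropped = w & ~r  # set in the written word, clear in the read word
        for bit in range(32):
            if (dropped >> bit) & 1:
                counts[bit] += 1
    return counts


# Made-up data: every bit set, and a read-back copy that loses bit 3 of each word.
pattern = b"\xff" * 16
damaged = bytes.fromhex("f7ffffff") * 4
counts = dropped_bit_counts(pattern, damaged)
print([bit for bit, n in enumerate(counts) if n])  # [3]
```

A histogram concentrated on a few bit positions would point towards a framing problem such as a wrong dummy cycle count, while scattered losses would look more like signal integrity.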
<hanetzer>
possible. unfortunately I can't get live access to the system with serial/network :/
<marex>
how so ?
<hanetzer>
entire reason I'm trying to do flash shiz is because no amount of coercion gets me serial access in a tftp booted rootfs.cpio.gz with the mainline kernel
<hanetzer>
(same rootfs.cpio.gz works fine with a bsp kernel)
<marex>
so you do get output from the kernel, but not in the initramfs ?
<hanetzer>
well, I get normal bootup messages with kernel+initramfs, up until 'freeing init mem' or so
<marex>
add "init=/bin/sh rdinit=/bin/sh" to kernel bootargs and make sure your initrd contains /dev/console (mknod /path/to/initrd/dev/console c 5 1) ?
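Whether an unpacked initramfs tree already has the console node marex describes (character device, major 5, minor 1) can be checked on the build host. A rough sketch; the function names and the idea of pointing it at an unpacked tree are assumptions of this note, not something from the log.

```python
import os
import stat


def is_console_node(st_mode: int, st_rdev: int) -> bool:
    """True if the stat fields describe a character device with major 5, minor 1."""
    return (stat.S_ISCHR(st_mode)
            and os.major(st_rdev) == 5
            and os.minor(st_rdev) == 1)


def check_console(initrd_root: str) -> bool:
    """Check <initrd_root>/dev/console in an unpacked initramfs tree."""
    path = os.path.join(initrd_root, "dev", "console")
    try:
        st = os.lstat(path)
    except OSError:
        return False  # missing or unreadable counts as "not there"
    return is_console_node(st.st_mode, st.st_rdev)
```

For example, `check_console("/path/to/initrd")` returns False when `dev/console` is absent or is a regular file, which is the situation marex suspects.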
<hanetzer>
yeah, one sec.
<hanetzer>
same results. strange thing is, normally when you have a complete kpanic/hang/whatever, hitting 'enter' in the serial window does nothing. but I can input into it just fine
<Xogium>
I'm thinking that some part of your kernel freezes but not others
<Xogium>
or will reboot as soon as there's a panic
<marex>
that's panic=-1
<hanetzer>
'normal' bootargs when trying flash is root=/dev/mtdblock3 console=ttyAMA0,115200 slub_debug=FZP debug dyndbg="module squashfs +pf"
<hanetzer>
trying with panic=1 now
<Xogium>
marex: uh really ? I didn't know. I thought -1 would be infinite timeout like not setting it
<Xogium>
my bad then
<marex>
Xogium: =0 is infinite wait, see Documentation/admin-guide/kernel-parameters.txt
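The three cases of the `panic=` parameter being discussed can be summarised as a small lookup. This only restates the behaviour described in Documentation/admin-guide/kernel-parameters.txt; the function name is made up.

```python
def panic_behaviour(timeout: int) -> str:
    """Describe what the kernel does on panic for a given panic=<timeout>."""
    if timeout > 0:
        return f"reboot after {timeout} seconds"
    if timeout == 0:
        return "wait forever"
    return "reboot immediately"


for value in (-1, 0, 1):
    print(f"panic={value}: {panic_behaviour(value)}")
```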
<Xogium>
but… yeah, sometimes you don't even see that the kernel panicked, but it did
<marex>
Xogium: here I would expect wrong /dev/ content in the initramfs , that does look like a symptom of it
<marex>
(the lack of console that is)
<hanetzer>
'Rebooting in 1 seconds..' and hang :P
<Xogium>
if you don't have early debug uart enabled and it panics sooner than when it initializes the regular console for example… Got bitten by that once several times over
<Xogium>
hanetzer: ouch so it does panic…
<hanetzer>
anywho. gonna try printk'ing some mtd rwops
<marex>
hanetzer: maybe just copy the entire /dev/ content from your host into the initramfs , and make sure your initramfs is not too large btw
<marex>
(do you use built-in initramfs or external separate cpio blob ?)
<hanetzer>
external.
<hanetzer>
u-boot doesn't like the kernel with built in initramfs, says it'll overwrite u-boot
<marex>
don't you need large enough initrd support compiled into kernel for that ?
<marex>
(do you have it ? how large is the unpacked initrd ?)
<hanetzer>
ah, unpacked, one sec.
<hanetzer>
the plain cpio is 14m
<Xogium>
oww that is big
<marex>
hanetzer: grab busybox and crosscompile it in some reduced configuration ...
<marex>
that should be smaller
<Xogium>
I wonder if the initramfs isn't overriding part of the kernel in memory
<hanetzer>
it's the hwdb files. I can remove those and make it smaller.
<marex>
how could it ? the kernel has to allocate the tmpfs for it, so that would be a kernel bug
<marex>
hanetzer: yes
<Xogium>
hmm, yeah you're right
<Xogium>
I was thinking that maybe kernel and initramfs actually overlapped in memory, on the u-boot side
<hanetzer>
doesn't help it takes like three tries for tftp to actually happen on vendorboot :P
<Xogium>
that said, never seen a kernel that doesn't freeze the serial console after a panic
<marex>
if the kernel is compressed, you would most likely get decompressor error
<marex>
with uImage/fitImage , you would surely get error due to checksum mismatch (assuming vendor stuff did not inhibit that)
<Xogium>
ah
<hanetzer>
hopefully not.
<hanetzer>
(vendor u-boot is pretty shit tbh)
<Xogium>
as most of them are ;p
<Xogium>
I'm almost scared of the version number of u-boot, if they picked kernel 2.6
<hanetzer>
anywho. food is about to arrive. ttyl
<Xogium>
ahah, enjoy
Turingtoast has joined #armlinux
<hanetzer>
cpio rebuild finished. much smaller. 4.2m uncompressed.
Turingtoast has quit [Quit: My iMac has gone to sleep. ZZZzzz…]
<hanetzer>
I don't suppose you guys know if there are any mtd tracers in the kernel?
<marex>
hanetzer: maybe you should get that initrd working first
<hanetzer>
it's going to end up being something stupid, I know it.
<marex>
it always is
<marex>
hanetzer: oh, right ... what happens if you take e.g. a code with empty main() function (or with printf() in there) , cross compile it as static binary , and use it for init ?
<marex>
just stick the init which is like a half a MiB file into the initramfs , keep /dev/ content, delete all the rest
<marex>
maybe add for(;;); before the return 0 ...
djrscally has joined #armlinux
<hanetzer>
just did, trying.
<hanetzer>
yeh no output
<marex>
hanetzer: uh
<marex>
Documentation/filesystems/ramfs-rootfs-initramfs.rst "Contents of initramfs:"
<marex>
hanetzer: try 'rootwait=1' ... the kernel would panic if it cannot mount initrd
<marex>
(or root)
<hanetzer>
yeh.
<hanetzer>
whelp. made a small dummy initrd like mentioned in the above docs, just null console zero and ttyAMA0 aside from the init file. embedded it in the kernel, same results as every other initrd boot.
<marex>
hanetzer: no dice ?
<marex>
hanetzer: which ARM core is this hardware, ARM9 or newer ?
<hanetzer>
yep. fine all the way up to 'freeing unused kernel image (initmem) memory 1024K' and then nothing.
<hanetzer>
uh, sec.
<hanetzer>
arm cortex a7
<marex>
oh, so it's likely not that the toolchain uses some unsupported opcode
<hanetzer>
yeah. says it has neon/fpu as well.
<hanetzer>
VFP support v0.3: implementor 41 architecture 2 part 30 variant 7 rev 5
<j`ey>
hanetzer: have you tried: keep_bootcon on the kernel command line?
<hanetzer>
never even heard of it.
<marex>
speaking of bootcon, you can try sysrq over serial (meta-f h in minicom I think)
<marex>
if that prints anything, then the kernel is not dead yet
<marex>
you can then even ask the kernel for backtrace on all cores (read the help that comes out of sysrq-h above)
<hanetzer>
blerg. not familiar with minicom heh. and 'what' is meta again, here?
<marex>
the meta-f is "send break"
<marex>
meta=alt
<hanetzer>
hrm.
<hanetzer>
well, there's a flash of something then no result.
<hanetzer>
(yes, I enabled magic sysrq in .config)
<marex>
flash of something ?
<hanetzer>
yeah. the terminal 'flashes'
<marex>
could it be your terminal swallows the break ?
<hanetzer>
maybe? alacritty minicom.
<hanetzer>
installing xterm.
<hanetzer>
yeh no response.
<marex>
uh
<marex>
hanetzer: err ... boot with clk_ignore_unused pd_ignore_unused in bootargs
<hanetzer>
man, this is getting more and more arcane lmao
<hanetzer>
could misconfigured dma on the uart do it?
<marex>
I would expect that to fail sooner
<hanetzer>
and no, those ignore_unused args don't do anything different than I've seen before.
<marex>
they prevent the kernel from disabling unused clock/power domains when it starts init, so that could've made difference
<marex>
oh well
archetech has quit [Quit: Konversation terminated!]
apritzel_ has joined #armlinux
<hanetzer>
haha, got it !
<hanetzer>
tried a different usb/uart adapter, some (probably) fake ftdi mini board
<marex>
hanetzer: err ... what ? :)
<hanetzer>
marex: the serial adapter I was using was a blackpill turned into a black magic probe (its all I had on hand after my 5+yo usb to serial adapter died), I recently got in 12x 'ftdi' usb/serial boards.
<j`ey>
hanetzer: so you get the login prompt/whatever now?