<f_ridge>
<x2x6_/D> When I prepare a WRITE_MULTIPLE command, I have to set the source address in the DMA control block and then run the DMA. I also do a whole bunch of other things, like resetting the channel
<f_ridge>
<x2x6_/D> why?
<f_ridge>
<clever___/D> writing to flash is a 2 step process, first it has to be erased, then written
<f_ridge>
<clever___/D> behind the scenes, the SD card has its own filesystem, where it keeps track of where each sector lives in flash
<f_ridge>
<clever___/D> and it only has a limited amount of flash pre-erased
<f_ridge>
<GitHub Lines/D> static void bcm2835_emmc_setup_dma_transfer(int dma_channel, int control_block,
<f_ridge>
<clever___/D> if you use fstrim or `CMD38`, you can tell the SD card that you dont need a given block
<f_ridge>
<clever___/D> so the firmware is then free to pre-erase things ahead of time
<f_ridge>
<clever___/D> and then when it comes time to write, it can skip the erase step
<f_ridge>
<x2x6_/D> Hm.
<f_ridge>
<x2x6_/D> I was not aware that it does anything like PREERASE. maybe this is a good hint.
<f_ridge>
<clever___/D> a quick&dirty way to test things from linux, `blkdiscard /dev/sdX1` will just pre-erase the entire partition, scorched earth, nothing will survive
<f_ridge>
<clever___/D> then eject, and re-insert, and confirm its actually erased, some SD cards dont support it, and silently ignore the command
<f_ridge>
<clever___/D> then repeat your write tests, and see if its faster
<f_ridge>
<clever___/D> some USB SD adapters may not support discard, so you may need to use `/dev/mmcblk0p1` in a supported device
<f_ridge>
<x2x6_/D> good advice, lets try it
<f_ridge>
<clever___/D> when i tested one of my cards, i did a discard, and then immediately ejected the card, there was no way it could have written 32gig in that time
<f_ridge>
<clever___/D> and upon re-inserting it, the entire card was "blank"
<f_ridge>
<x2x6_/D> Nah, operation not supported
<f_ridge>
<clever___/D> was that via a usb adapter or mmc?
<f_ridge>
<x2x6_/D> via adapter
<f_ridge>
<clever___/D> got any laptops with proper mmc?
<f_ridge>
<x2x6_/D> I probably need to implement cmd38 first
<f_ridge>
<x2x6_/D> nope
<f_ridge>
<clever___/D> or a pi that can boot from usb
<f_ridge>
<x2x6_/D> Do you have a datasheet at hand for CMD38?
<f_ridge>
<x2x6_/D> I want to check it now, maybe its faster just to code it
<f_ridge>
<clever___/D> > Certainly! CMD38 is a specific command used in the context of SD (Secure Digital) cards, which are commonly used in cameras, smartphones, and other devices for storing data. In the SD card protocol, CMD38 is a command used to initiate an erase operation on the card.
<f_ridge>
<clever___/D> >
<f_ridge>
<clever___/D> > Here’s a more detailed explanation of how CMD38 works and its significance:
<f_ridge>
<clever___/D> >
<f_ridge>
<clever___/D> > 1. **Command Structure**: In the SD card protocol, commands are issued by the host device (e.g., a camera or smartphone) to the SD card. CMD38 is one of these commands.
<f_ridge>
<clever___/D> >
<f_ridge>
<clever___/D> > 2. **Purpose**: CMD38 is used to erase data from the SD card. Specifically, it erases a specified range of sectors. Sectors are the smallest individually addressable units of storage on the SD card.
<f_ridge>
<clever___/D> >
<f_ridge>
<clever___/D> > 3. **Parameters**: When CMD38 is issued, it includes parameters that specify:
<f_ridge>
<clever___/D> > - The starting sector from which erasure should begin.
<f_ridge>
<clever___/D> > - The number of sectors to be erased.
<f_ridge>
<clever___/D> >
<f_ridge>
<clever___/D> > 4. **Response**: After receiving CMD38, the SD card processes the command and provides a response indicating the success or failure of the erase operation.
<f_ridge>
<clever___/D> >
<f_ridge>
<clever___/D> > 5. **Usage**: Applications typically use CMD38 when they need to clear or erase specific data on the SD card. For example, a camera might use CMD38 to erase all photos in a particular folder when instructed by the user to format that folder.
<f_ridge>
<clever___/D> >
<f_ridge>
<clever___/D> > 6. **Security Considerations**: The CMD38 command is significant in terms of security and data management. It allows for efficient erasure of data when needed, ensuring that sensitive information can be securely removed from the SD card.
<f_ridge>
<clever___/D> >
<f_ridge>
<clever___/D> > Overall, CMD38 is part of the set of commands defined by the SD card standard and is essential for managing data on SD cards effectively. Its implementation ensures that devices can perform erase operations efficiently and securely, contributing to reliable data management in SD card-based storage systems.
<f_ridge>
<clever___/D> but nothing you could actually use
<f_ridge>
<clever___/D> the official pdf is more helpful in implementing it
<f_ridge>
<clever___/D> /8 gives 100mbit max, 14mbit on the first write, 2.51mbit on the rest
<f_ridge>
<x2x6_/D> interesting observation.
<f_ridge>
<x2x6_/D> So. I need to implement code that counts amount of data that VCOS sends in compressed h264 packets each second and see how much is that in bitrate.
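A minimal sketch of such a per-second byte counter. The struct, the function name and the microsecond timestamps are hypothetical; nothing here comes from VCOS itself, it only assumes the caller knows each packet's size and has a monotonic clock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITRATE_WINDOW_US 1000000u  /* one-second measurement window */

/* Hypothetical accumulator: zero-initialise it before the first packet. */
struct bitrate_meter {
    uint64_t window_start_us;  /* timestamp at which the current window opened */
    uint64_t bytes_in_window;  /* payload bytes seen since then */
};

/* Call once per encoded packet. Returns the bits per second of the window
 * that just closed, or 0 while the current window is still open. */
static inline uint64_t bitrate_meter_add(struct bitrate_meter *m,
                                         uint64_t now_us,
                                         uint32_t packet_bytes)
{
    uint64_t elapsed_us;
    uint64_t bps = 0;

    assert(m != NULL);
    elapsed_us = now_us - m->window_start_us;
    if (elapsed_us >= BITRATE_WINDOW_US) {
        /* Scale by the real elapsed time, since packets rarely land exactly on the boundary. */
        bps = m->bytes_in_window * 8u * 1000000u / elapsed_us;
        m->window_start_us = now_us;
        m->bytes_in_window = 0;
    }
    m->bytes_in_window += packet_bytes;
    return bps;
}
```

The packet that closes a window is counted towards the next one, so no bytes are dropped between windows.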
<f_ridge>
<clever___/D> to me, it feels like the flash can sustain 2.5mbit, but can do short 14mbit bursts
<f_ridge>
<x2x6_/D> But first I want of course to test erase.
<f_ridge>
<x2x6_/D> So I was not thinking with bursts in mind.
<f_ridge>
<clever___/D> there should also be parameters on the h264 encoder, that let you set the desired bit rate
<f_ridge>
<x2x6_/D> If you write to flash continuously, then it looks like its fifo is not very useful, because it's always stuffed.
<f_ridge>
<x2x6_/D> If you write from time to time - then fifo works.
<f_ridge>
<clever___/D> something you're more likely to see in streaming video, is dynamically changing the encoder bitrate, to suit the link speed
<f_ridge>
<x2x6_/D> I once did a test to check the fifo depth. It was much more than documented - 16k
<f_ridge>
<x2x6_/D> Also, interesting whether it's even possible, there is probably a topic on that on our favourite forum
<f_ridge>
<clever___/D> from my understanding, the pi0-pi4 have no native command queuing, and at least for write-single mode, you transfer the sector, then wait for completion, then transfer another sector, and wait for completion
<f_ridge>
<x2x6_/D> Yes.
<f_ridge>
<clever___/D> for write-multiple, i can see how you can burst sectors over the SD bus, and they could build up in a fifo on the SD card itself
<f_ridge>
<clever___/D> and then a single completion signals the entire batch
<f_ridge>
<clever___/D> but i think there is also a completion between each packet, so you dont overload that FIFO
<f_ridge>
<clever___/D> i'm fuzzy on what tricks a card might play to speed that all up
<f_ridge>
<clever___/D> > For block oriented write data transfer, the CRC
<f_ridge>
<clever___/D> > check bits are added to each data block. The card performs 1 or 4 bits CRC parity check (See Section
<f_ridge>
<clever___/D> > 4.5) for each received data block prior to the write operation. By this mechanism, writing of erroneously
<f_ridge>
<clever___/D> > transferred data can be prevented.
<f_ridge>
<clever___/D> this line from the pdf, implies that the SD card will buffer the entire sector before doing a write, to confirm the transfer was good
<f_ridge>
<clever___/D> but i can picture a way to cheese that, and write early with the option for rollback
<f_ridge>
<x2x6_/D> Ok, I've implemented CMD38
<f_ridge>
<x2x6_/D> actually block_erase is a sequence of CMD32, CMD33 and CMD38
<f_ridge>
<clever___/D> yep
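The CMD32, CMD33, CMD38 sequence can be sketched roughly as below. The `sd_send_cmd_fn` hook and the recording stand-in are hypothetical, not the driver's real API. The addresses are block numbers on SDHC/SDXC cards (byte addresses on older standard-capacity cards), and a CMD38 argument of 0 is assumed to mean a plain erase.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    SD_CMD32_ERASE_WR_BLK_START = 32,  /* first block of the range */
    SD_CMD33_ERASE_WR_BLK_END   = 33,  /* last block of the range (inclusive) */
    SD_CMD38_ERASE              = 38,  /* erase the range set up above */
};

/* Hypothetical transport hook: sends one command, returns 0 on success. */
typedef int (*sd_send_cmd_fn)(void *ctx, uint32_t cmd_index, uint32_t arg);

static inline int sd_erase_range(sd_send_cmd_fn send, void *ctx,
                                 uint32_t first, uint32_t last)
{
    int err;

    assert(send != NULL);
    if (last < first)
        return -1;
    err = send(ctx, SD_CMD32_ERASE_WR_BLK_START, first);
    if (err)
        return err;
    err = send(ctx, SD_CMD33_ERASE_WR_BLK_END, last);
    if (err)
        return err;
    /* The card holds DAT0 busy while it works, so the real hook must wait for that. */
    return send(ctx, SD_CMD38_ERASE, 0);
}

/* Recording stand-in for the real host controller, for demonstration only. */
struct sd_cmd_record {
    uint32_t cmd[8];
    uint32_t arg[8];
    int count;
};

static inline int sd_record_cmd(void *ctx, uint32_t cmd_index, uint32_t arg)
{
    struct sd_cmd_record *rec = (struct sd_cmd_record *)ctx;

    if (rec->count >= 8)
        return -1;
    rec->cmd[rec->count] = cmd_index;
    rec->arg[rec->count] = arg;
    rec->count++;
    return 0;
}
```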
<f_ridge>
<x2x6_/D> The write itself is blazing fast now
<f_ridge>
<x2x6_/D> the second screen is the actual dev->ops.write operation
<f_ridge>
<clever___/D> ideally, you would run an erase command on the entire partition (or at least, all free space), and then leave the card powered for some unknown amount of time, with no writes
<f_ridge>
<clever___/D> the erase command has 2 parts, first is just updating the metadata to declare the blocks as "erased", much like deleting a file on a pc
<f_ridge>
<clever___/D> and the second is actually erasing the blocks, which still takes time to do
<f_ridge>
<x2x6_/D> Yes, but to try this out quickly today, I just put a 300ms delay after the erase
<f_ridge>
<x2x6_/D> But tomorrow I will wipe the whole partition at start
<f_ridge>
<clever___/D> i notice you have a `is_blocking_mode` flag in there
<f_ridge>
<clever___/D> so is the stop time actually waiting for the write to complete?
<f_ridge>
<x2x6_/D> Yes, its for early use after boot while scheduler is not activated
<f_ridge>
<x2x6_/D> stop time?
<f_ridge>
<clever___/D> the `ts2` variable
<f_ridge>
<clever___/D> is that after the write has transferred all data, or just after the command started
<f_ridge>
<x2x6_/D> this is the place we are actually waiting for irq
<f_ridge>
<x2x6_/D> and then when we wake up after the irq, we go up a couple of levels to the bcm2835_emmc_data_io function and log ts2 - ts1
<f_ridge>
<clever___/D> do you have any dma completion IRQ's?
<f_ridge>
<x2x6_/D> Ok, I will check it tomorrow then.
<f_ridge>
<x2x6_/D> It would be very interesting to speed up writes to emmc, but yes, already now the numbers are strange.
<f_ridge>
<x2x6_/D> The numbers go almost directly from ST_CLO
<f_ridge>
<x2x6_/D> YES, DMA completion IRQ is also in the same file
<f_ridge>
<clever___/D> can you print the timestamp for dma completion as well?
<f_ridge>
<clever___/D> i can see there being 4 different times when you can get an irq
<f_ridge>
<clever___/D> first, when the SD controller has finished sending the command over the CMD pin, thats something like 48bits, send 1 bit per clock, over the CMD pin
<f_ridge>
<clever___/D> second, is when the busy signal goes away, and data transfer can begin over the DAT pins, but dma will have pre-filled the host FIFO some by then
<f_ridge>
<clever___/D> third, is when dma finishes writing to the host FIFO
<f_ridge>
<clever___/D> and fourth, is when the host FIFO runs dry, the emmc finishes sending data over DAT, and the card signals completion
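One way to see which of those points a measurement really ends on is to sample the timer at each of them and compare the phases. This is only a sketch: the struct and function names are made up, and it assumes a free-running 32-bit microsecond counter (such as ST_CLO, if it ticks at 1 MHz).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one write; every field is a raw 32-bit counter sample. */
struct write_timing {
    uint32_t t_start;       /* just before the command is issued */
    uint32_t t_cmd_done;    /* command fully sent over the CMD pin */
    uint32_t t_data_start;  /* busy released, data starts on the DAT pins */
    uint32_t t_dma_done;    /* DMA finished filling the host FIFO */
    uint32_t t_card_done;   /* FIFO drained and the card signalled completion */
};

/* Unsigned subtraction stays correct across one wrap of the 32-bit counter. */
static inline uint32_t elapsed_ticks(uint32_t from, uint32_t to)
{
    return to - from;
}

/* Full duration of the write, the number a throughput figure should be based on. */
static inline uint32_t write_total_ticks(const struct write_timing *t)
{
    assert(t != NULL);
    return elapsed_ticks(t->t_start, t->t_card_done);
}

/* Time spent after DMA completion, invisible if the stop time is taken too early. */
static inline uint32_t write_drain_ticks(const struct write_timing *t)
{
    assert(t != NULL);
    return elapsed_ticks(t->t_dma_done, t->t_card_done);
}
```

If `write_drain_ticks` is large compared to the total, a stop timestamp taken at DMA completion would make the write look much faster than it is.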
<f_ridge>
<x2x6_/D> I have printed DMA completion timestamps ( but not included them in the math)
<f_ridge>
<x2x6_/D> So you were right to guess, that I don't wait for dma completion
<f_ridge>
<clever___/D> 5403 uSec would come out to 48mbit
<f_ridge>
<x2x6_/D> I don't remember why though. I remember that I have solved something this way. But forgot what exactly)
<f_ridge>
<clever___/D> i think the reason it works currently, is because when you go to start the 2nd write, half the hardware is busy with the 1st one, so it stalls before even starting
<f_ridge>
<clever___/D> and then things even out and each write takes the proper time
<f_ridge>
<clever___/D> having that erase and sleep hides it in a place you're not measuring
<f_ridge>
<x2x6_/D> ok
<f_ridge>
<x2x6_/D> for the second one, you mean the data transfer is slow because the fifo is filled with something else?
<f_ridge>
<clever___/D> according to this math, sending a 48bit command at 25mhz, would take 1.92 uSec
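That figure is simply the bit count divided by the clock, since the CMD pin carries one bit per clock. A small helper (the name is made up) reproduces it:

```c
#include <assert.h>
#include <stdint.h>

/* Time in nanoseconds to shift `bits` over a single line at `clock_hz`,
 * assuming one bit per clock. The result is truncated to whole nanoseconds. */
static inline uint64_t sd_bits_time_ns(uint32_t bits, uint32_t clock_hz)
{
    assert(clock_hz > 0);
    return (uint64_t)bits * 1000000000u / clock_hz;
}
```

`sd_bits_time_ns(48, 25000000)` gives 1920 ns, i.e. the 1.92 uSec quoted above.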
<f_ridge>
<clever___/D> if you're using the same dma channel for every write, then you must wait for the dma to be idle before starting another dma operation? and then you're capturing the 1st write in the time the 2nd write took
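A bounded wait for the channel to go idle could look roughly like this. It assumes the channel's control/status register exposes an ACTIVE flag in bit 0 that clears when the transfer completes (as I understand the BCM2835 DMA CS register); check that against the datasheet. The function name and the poll limit are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_CS_ACTIVE (1u << 0)  /* assumed bit position, see the note above */

/* Polls the channel's CS register until ACTIVE clears or the poll budget runs out.
 * Returns true if the channel went idle, false on timeout. */
static inline bool dma_wait_idle(const volatile uint32_t *cs, uint32_t max_polls)
{
    assert(cs != NULL);
    while (max_polls--) {
        if ((*cs & DMA_CS_ACTIVE) == 0)
            return true;
    }
    return false;
}
```

Taking the start timestamp only after this returns keeps the tail of the previous write out of the next measurement.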
<f_ridge>
<x2x6_/D> Thanks a lot for getting involved.
<f_ridge>
<x2x6_/D> I have to go to sleep, tomorrow will rework this part with clear head.
<f_ridge>
<clever___/D> sure
<f_ridge>
<clever___/D> i need to get more into SD as well, i dont even have write working on my stack
<f_ridge>
<x2x6_/D> )
<f_ridge>
<clever___/D> interesting, i removed the sd sniffer from the loop, and my pi1 still cant init the card, when using this custom firmware
<f_ridge>
<clever___/D> ```
<f_ridge>
<clever___/D> [6266619.011693] mmc0: new ultra high speed SDR104 SDXC card at address 59b4
<f_ridge>
<clever___/D> signal voltage: 0 (3.30 V)
<f_ridge>
<clever___/D> driver type: 0 (driver type B)
<f_ridge>
<clever___/D> ```
<f_ridge>
<clever___/D> and this is the old as dirt card that i was testing earlier
<f_ridge>
<clever___/D> it runs at 50mhz 4bit SDR, with 3.3v IO, that should get around 200mbit, and thats also the limit i believe pi0-pi3 hit, which translates to 25MB/s
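The arithmetic behind that, as a tiny helper with a made-up name. It gives the raw single-data-rate bus ceiling and ignores start bits, CRC and busy time, so real throughput is lower.

```c
#include <assert.h>
#include <stdint.h>

/* Raw peak rate in bits per second: one bit per data line per clock (SDR). */
static inline uint64_t sd_bus_peak_bits_per_sec(uint32_t clock_hz, uint32_t bus_width_bits)
{
    assert(bus_width_bits > 0);
    return (uint64_t)clock_hz * bus_width_bits;
}
```

For 50 MHz and 4 data lines this is 200000000 bits per second, and dividing by 8 gives 25000000 bytes per second.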
<f_ridge>
<clever___/D> this seems to imply, that during write-multiple, the card can cheat, and claim not busy (done) quickly on a block, and buffer/batch things internally
<f_ridge>
<clever___/D> and it just has to fully flush everything (within 500ms), when claiming the final block is done (or when CMD12 says to halt a write-multiple)
<f_ridge>
<clever___/D> so it may falsely claim some sectors are written quickly, then the last sector was slow, but then internally do a bigger write of all sectors at once
<f_ridge>
<clever___/D> and with some more debug, i can see that in both cases, i send CMD0 then CMD8
<f_ridge>
<clever___/D> but the lexar responds to 8, while the sandisk? just gives a timeout error, and retrying doesnt help
<f_ridge>
<clever___/D> and now i realize, CMD8 includes the voltage ranges the host supports
<f_ridge>
<clever___/D> and if the card doesnt accept them, it should just ignore the command!
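For reference, a sketch of how the CMD8 argument is laid out, as far as I recall the spec: the supplied-voltage field (VHS) sits in bits 11:8 and an 8-bit check pattern in bits 7:0, and a card that accepts the voltage echoes both back. The helper names are made up, and 0xAA is only the commonly used pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SD_CMD8_VHS_2V7_3V6   0x1u   /* host supplies 2.7-3.6 V */
#define SD_CMD8_CHECK_PATTERN 0xAAu  /* commonly used check pattern */

/* Builds the 32-bit CMD8 argument from the voltage field and check pattern. */
static inline uint32_t sd_cmd8_arg(uint32_t vhs, uint32_t check_pattern)
{
    assert(vhs <= 0xFu && check_pattern <= 0xFFu);
    return (vhs << 8) | check_pattern;
}

/* A valid response echoes the low 12 bits of the argument unchanged. */
static inline bool sd_cmd8_response_ok(uint32_t arg, uint32_t response)
{
    return (response & 0xFFFu) == (arg & 0xFFFu);
}
```

A timeout, as with the sandisk card above, never reaches the echo check at all, so the driver has to treat "no response" as its own case.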
<f_ridge>
<clever___/D> a 5 (b101) turned into a 7 (b111)
<f_ridge>
<clever___/D> a 1 turned into a 3
<f_ridge>
<clever___/D> the card is clearly upset by the overclock, but i can freely run this lexar card anywhere from 1mhz to 50mhz, and switch dynamically
<f_ridge>
<clever___/D> but the bigger problem, is that the checksum errors didnt fire, so this resulted in silent corruption
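XOR-ing the expected and observed values shows which bit positions changed. For the two samples above, 5 ^ 7 and 1 ^ 3 are both 2, so in each case only bit 1 went from 0 to 1. A small helper (hypothetical name) that does the same over a whole buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ORs together expected[i] ^ observed[i] for the whole buffer. Each set bit in
 * the result is a bit position that was corrupted in at least one byte. */
static inline uint8_t flipped_bit_mask(const uint8_t *expected,
                                       const uint8_t *observed,
                                       size_t len)
{
    uint8_t mask = 0;

    assert(expected != NULL && observed != NULL);
    for (size_t i = 0; i < len; i++)
        mask |= (uint8_t)(expected[i] ^ observed[i]);
    return mask;
}
```

A mask with a single bit set across a large read would hint at one marginal data line rather than random noise, though two samples are too few to conclude that here.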
<f_ridge>
<clever___/D> ```
<f_ridge>
<clever___/D> ] sdhost_div 250
<f_ridge>
<clever___/D> # 1mhz
<f_ridge>
<clever___/D> ] sdhost_bench
<f_ridge>
<clever___/D> 9268798 uSec to read 1MB
<f_ridge>
<clever___/D> 0.905 mbits/sec
<f_ridge>
<clever___/D> # 2mhz
<f_ridge>
<clever___/D> 4793175 uSec to read 1MB
<f_ridge>
<clever___/D> 1.7501 mbits/sec
<f_ridge>
<clever___/D> # 5mhz
<f_ridge>
<clever___/D> 2097183 uSec to read 1MB
<f_ridge>
<clever___/D> 3.999 mbits/sec
<f_ridge>
<clever___/D> # 10mhz
<f_ridge>
<clever___/D> 1193253 uSec to read 1MB
<f_ridge>
<clever___/D> 7 mbits/sec
<f_ridge>
<clever___/D> # 25mhz
<f_ridge>
<clever___/D> 674121 uSec to read 1MB
<f_ridge>
<clever___/D> 12 mbits/sec
<f_ridge>
<clever___/D> # 50mhz
<f_ridge>
<clever___/D> 643383 uSec to read 1MB
<f_ridge>
<clever___/D> 13 mbits/sec
<f_ridge>
<clever___/D> ```
<f_ridge>
<clever___/D> this implementation isnt using dma, and you can see that it clearly tops out around 12-13mbit
<f_ridge>
<clever___/D> its also not using read-multiple correctly
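The mbits/sec figures in that log can be reproduced from the uSec values if "1MB" is taken as 1048576 bytes and "mbit" as 10^6 bits, which is an inference from the numbers rather than something the tool states. A sketch with a made-up function name:

```c
#include <assert.h>
#include <stdint.h>

/* Throughput in megabits per second. Bits per microsecond is numerically the
 * same as megabits (10^6 bits) per second, so no further scaling is needed. */
static inline double mbits_per_sec(uint64_t bytes, uint64_t elapsed_us)
{
    assert(elapsed_us > 0);
    return (double)bytes * 8.0 / (double)elapsed_us;
}
```

With 1048576 bytes, 9268798 uSec comes out at about 0.905 and 643383 uSec at about 13.04, matching the 1mhz and 50mhz rows.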