sedevidi wrote: ↑09 Apr 2022, 11:57
From Wikipedia, Flash and EEPROM are nowadays very similar, except that Flash are erasable by whole pages, which enables high speed and high density, while EEPROM are erasable byte-by-byte, which leads to much lower density and speed, but much better endurance.
Dug up the ATmegaXXU2 datasheet, and apparently the EEPROM on this series does actually consist of 4-byte pages, though provisions are made for single-byte read/write through a page buffer. So, agreed I'm barking up the wrong tree here.
sedevidi wrote: ↑09 Apr 2022, 11:57
Reading « Manual page dfu-programmer(1) » (0.6.1-1+b1 on Debian), I can find a "flash-eeprom" command, but no "--eeprom" option to the "flash" command. flash-eeprom "Writes to eeprom memory."
There is also an "erase" (but no documented "--force" option), which "Erases all the flash memory. This is required before the bootloader will perform other commands."
The manual also says "You will normally need to start by issuing the "erase" command; the default security policies prevent extracting firmware, to prevent reverse engineering of what is usually proprietary code."
This might be the reason for the "erase" before anything else. Thus the written EEPROM byte is much probably a "first run boolean" that the QMK firmware uses to initialise all its EEPROM book-keeping.
FYI, active development on dfu-programmer appears to have come to a standstill; even so, 0.6.1 is extremely old (April 2013), so kind of nuts that the most current Debian package is still on this version. The last published version from current maintainers was 0.7.2 from February 2015, which is the version that QMK Toolbox ships with. An HTMLized version of the man page from 0.7.2 is available straight from their GitHub repo, and reveals that the command-line parameters were reworked (likely in 0.7.0), explaining the discrepancy.
But I suspect your conclusion is correct. The man page mentions that "for AT90 and ATmega type devices a chip erase must be performed before other commands become available," so even if you want just to write to EEPROM through the Atmel DFU, apparently you still have to do a complete flash erase. Seems bonkers to me, but...
Reading between the lines, this maybe isn't a 100% hard-and-fast rule, and likely has something to do with the security fuse bits Atmel has made available on this product. The man page goes on later to describe dfu-programmer's "setsecure" option:
"setsecure: Sets the security bit on AVR32 chips. This prevents the content being read back from the chip, except in the same session in which it was programmed. When the security fuse is set, almost nothing will work without first executing the erase command. The only way to clear the security fuse once set is to use a JTAG chip erase, which will also erase the bootloader."
...and then in the Known Issues section near the end:
"To remove any write or read protection from any chips, a full chip erasure is required. For AVR32 chips an erase operation over USB will remove protection until the device is rebooted. To remove the protection more permanently requires a JTAG erase (which will also erase the bootloader)."
[emphasis added]
Some Googling around and perusing of other forums (e.g. AVRFreaks) suggests that if you purchase ATmegaXXUx chips from the channel that have been preloaded with Atmel's DFU bootloader, the security bits are also pre-enabled on these products. The only way to clear those bits would be to use an ISP/JTAG to erase the entirety of flash, which in turn wipes out the stock DFU bootloader, leaving you with a brick. In theory you could re-burn the (not-open-source) Atmel DFU bootloader back to flash using the same interface but without setting the security bits this time, but that would of course require that you possess a copy of said bootloader. I've managed to find where Atmel has released the binary for the DFU bootloader they wrote for the ATmega32U4, but so far have struck out on the 32U2 version, and it seems unlikely they'd be willing to talk to me unless I'm a direct customer of theirs (likely with a support contract of some kind, maybe even enforcing an NDA first...you know how these companies are, *sigh*).
And even if I could dig the bootloader code up, the wcass xwhatsit controller that ships with these keyboards does not (to my knowledge) expose an ISP header. Not that that's a complete impediment to wiring up an ISP to the chip on the board, but it does present an additional challenge.
(In theory, you don't actually "need" the Atmel DFU bootloader to have a working keyboard controller. You could just flash QMK itself to the bootable area of flash, forgoing USB DFU ability entirely. It just means that every time you want to reflash QMK or update the keyboard controller firmware, you have to break out your ISP, which is of course way less convenient than reflashing via the USB side.)
(More and more I'm wishing that xwhatsit had simply spent the additional $5-10 or whatever to use the U4 variant of the microcontroller in his design...)
Anyway, tying all of the loose ends together: the dfu-programmer man page talks in seeming absolutes about this in certain places, but that's probably just because dfu-programmer is specifically designed to talk to the Atmel stock DFU bootloader, so the vast, vast majority of people will be using it to interact with AVR chips that have had the security bits pre-set at the factory. The reality is that, for all intents and purposes, you must first do a complete erase of flash (which then unlocks all read/write functions, but only up until the next reboot) before you can do anything else. And this is why QMK Toolbox performs a flash erase before it can do an "EEPROM wipe".
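To make the ordering concrete, here's a rough sketch in Python that just builds the dfu-programmer command lines in the order described above (it doesn't run anything). The "erase --force" and "flash --eeprom" spellings are the ones discussed earlier in this thread for the reworked 0.7.x command line; the target name and the file names (reset.eep, firmware.hex) are placeholders of mine, so check them against your own man page before trusting any of it.

```python
import shlex


def eeprom_wipe_commands(target="atmega32u2", eep_file="reset.eep", firmware_hex=None):
    """Return dfu-programmer invocations, in order, as argument lists.

    Assumption: 0.7.x-style options ("erase --force", "flash --eeprom").
    The file names are hypothetical placeholders.
    """
    commands = [
        # 1. Full flash erase first: this is what unlocks the other commands.
        ["dfu-programmer", target, "erase", "--force"],
        # 2. Only now can the EEPROM image be written.
        ["dfu-programmer", target, "flash", "--eeprom", eep_file],
    ]
    if firmware_hex is not None:
        # 3. The erase wiped the firmware, so it has to be flashed again.
        commands.append(["dfu-programmer", target, "flash", firmware_hex])
    return commands


if __name__ == "__main__":
    for cmd in eeprom_wipe_commands(firmware_hex="firmware.hex"):
        print(shlex.join(cmd))
```

The point is only the sequence: the erase comes first, and because it takes the firmware with it, an "EEPROM wipe" in practice also means a reflash.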
And circling back around to your theory: it's very likely not writing all 0s (or all FFs) to the whole 1K worth of EEPROM in order to "wipe" it, because QMK itself probably just checks the contents of the first byte in order to determine whether to trust the contents of the rest of EEPROM or not ("bookkeeping", as you say). That way it can avoid unnecessarily exercising/wearing out the EEPROM prematurely. (Of course, scrutiny of the QMK source should be able to confirm or disprove this theory.)
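Purely to illustrate the theory (this is not QMK's actual code, and I haven't checked it against the QMK source), here's a toy Python model of a "first byte says whether the rest is valid" scheme. The marker value 0xA5, the function names and the default settings are all made up for the example; the only things taken from the discussion above are the 1K EEPROM size and the idea that a single byte decides whether the rest is trusted.

```python
EEPROM_SIZE = 1024   # 1K of EEPROM, as discussed above
ERASED = 0xFF        # value of an erased EEPROM cell
MAGIC = 0xA5         # hypothetical "contents are valid" marker


def is_initialised(eeprom):
    """Firmware trusts the rest of EEPROM only if byte 0 holds the marker."""
    return eeprom[0] == MAGIC


def init_defaults(eeprom, defaults):
    """First-run path: write default settings, then mark EEPROM as valid."""
    eeprom[1:1 + len(defaults)] = defaults
    eeprom[0] = MAGIC


def wipe(eeprom):
    """A "wipe" that costs one byte write instead of rewriting all 1024."""
    eeprom[0] = ERASED


eeprom = bytearray([ERASED] * EEPROM_SIZE)
defaults = bytes([0x01, 0x02, 0x03])

if not is_initialised(eeprom):
    init_defaults(eeprom, defaults)   # what would happen on first boot

wipe(eeprom)                          # the "EEPROM wipe"
# The stale settings are still physically there, but are no longer trusted,
# so the next boot would take the first-run path again.
```

Under this scheme the old data never needs to be overwritten, which is the wear-saving argument above.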
sedevidi wrote: ↑09 Apr 2022, 11:57
All this leads me to think that there might be a bug in the VIAL EEPROM book-keeping, which then needs the full Flash and EEPROM erase then reflash to work correctly. Missing the EEPROM erase migh lead to VIAL behaving like CoolPenguin1 described.
Or it's just like the man page says: "erase (Flash + EEPROM) is required before any other command"... or else too many possibility of bugs might arise from misaligned program in Flash vs. data in EEPROM.
Since I suspect that the requirement to do an erase before "any other command" is related to the ATmega flash security bits issue & not some kind of data "misalignment", your first proposed theory strikes me as the more likely one. This actually makes all the more sense when you consider that QMK itself, I believe, hardly uses EEPROM for anything...if you run straight-up QMK on your controller without any VIA or Vial bits, the entire keyboard layout/map is hard-coded in flash: if you want to change it, you have to re-compile and re-flash. So likely Vial is either failing to take into account what little EEPROM "bookkeeping" QMK does, failing to properly do any of its own, or both.
This is hardly the only Vial-related memory-adjacent bug I've run into, if so. I'm still running up against what seems to be some sort of memory (SRAM) corruption issue in later releases of Vial that to this day is preventing me from making working builds of Vial firmware for the New Model Fs if I use a source tree checkout dated past September 2021 and also enable all of the various dynamic Vial features (Combos + Tapdance + Mousekeys etc.). But this seems to only be a problem when I build recent Vial code specifically for xwhatsit controllers using the pandrew QMK driver, and isn't a problem for other QMK-supported keyboard controllers designed around the ATmega32U2, which suggests that there is a bad interaction specifically between pandrew's code and recent Vial code. But this is another discussion for another time (hopefully soon)...