This is the next post in my series about upgrading my old Chumby 8’s kernel. Here are links to part 1 and part 2 if you missed them. As a quick summary, I got U-Boot working in part 1 and then got the SD card working in part 2. In this part I’ll describe the complicated process of how I got Wi-Fi working.
The Chumby 8/Insignia Infocast 8 has a built-in AzureWave AW-GH321 802.11g module. This is a pretty old module that doesn’t even support 802.11n, so it maxes out at 54 Mbps. The link is an archive.org link because it’s nowhere to be found these days. The module makes use of the Marvell 88W8686 chipset, which connects through the SDIO bus. SDIO is basically the same as SD, except it’s for I/O devices like Wi-Fi and Bluetooth modules instead of SD cards. This wireless chipset is supported by a Linux driver called libertas.
I knew this was going to be a challenge because I never quite succeeded at getting Wi-Fi working during my first attempt at upgrading the kernel in 2013. At the time, I ran into weird issues that went away when I had debugging output enabled. I decided to start over from scratch. Here’s a high-level list of things I knew I needed/wanted to accomplish:
- Get USB working in order to use a USB-Ethernet adapter with an NFS root filesystem for convenience
- Enable a second SD host controller in the device tree
- Enable the libertas driver in the kernel config
- Add the 88W8686 firmware to my root filesystem in /lib/firmware
- Get wpa_supplicant working
- Get NetworkManager working
The reason I wanted to boot from NFS was so that I could easily make changes to kernel modules and programs without having to swap the SD card out. The YMODEM solution I used in U-Boot would be far too slow for this.
It turns out that getting USB working was almost too easy. There are already drivers in the kernel for the host controller (ehci-mv) and PHY (phy-pxa-usb). All I had to do was hook them up in the device tree:
usb_host_phy: usb-phy@d4206000 {
    compatible = "marvell,pxa168-usb-phy";
    reg = <0xd4206000 0x40>;
    #phy-cells = <0>;
    status = "okay";
};

usb_host: usb@d4209000 {
    compatible = "marvell,pxau2o-ehci";
    reg = <0xd4209000 0x200>;
    interrupts = <51>;
    clocks = <&soc_clocks PXA168_CLK_SPH>;
    resets = <&soc_clocks PXA168_CLK_SPH>;
    clock-names = "USBCLK";
    phys = <&usb_host_phy>;
    phy-names = "usb";
    status = "okay";
};
This is a perfect example of how this process is theoretically supposed to work with well-supported SoCs and peripherals. You just turn on the relevant drivers, add the correct hardware description to the device tree, and boom, it works. In this case, everything went as planned and I immediately had USB support. I turned on drivers for mass storage and ASIX USB-Ethernet adapters. After enabling all of the necessary networking subsystems, NFS filesystem, NFS root support, and all the rest of that fun stuff, I was able to boot to a root filesystem on my development computer. I knew this would speed me up going forward.
It would have been cool to get NFS working in U-Boot too, but that would have required figuring out how to get the USB host controller working in U-Boot and I didn’t want to spend time on that. It’s okay though; since I had a working kernel, I could copy newer test kernels and device tree blobs to the SD card from inside of Linux and then load them from the SD card in U-Boot instead.
An NFS root filesystem is set up by passing a few arguments to the kernel. Here’s what I used:
root=/dev/nfs nfsroot=192.168.10.1:/chumbynfsroot,nfsvers=3,nolock ip=192.168.10.2:192.168.10.1::255.255.255.0::eth0 rootwait
This tells the kernel to boot using NFSv3 from an NFS server at 192.168.10.1 using the export called /chumbynfsroot. It sets the Chumby’s IP address to 192.168.10.2 with a subnet mask of 255.255.255.0. I’ve always had good luck with the “nolock” option when doing development testing with NFS booting. My experience has been that NFSv4 doesn’t work very well for this application so I stuck with NFSv3. With NFSv4 I was running into hangs on the host system when I tried to change things. Maybe someone more experienced with NFSv4 will see this post and can explain to me what I’ve been doing wrong. In the meantime I’m just going to stick with NFSv3.
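To make the colon-separated fields easier to read, here is a small Python sketch that splits the same arguments into named pieces. The field order for ip= follows the kernel's nfsroot documentation; the helper names (parse_ip_param, parse_nfsroot_param) are made up for this illustration and are not part of any kernel tooling.

```python
# Field order of the kernel's ip= parameter (first seven fields).
IP_FIELDS = ("client_ip", "server_ip", "gateway_ip", "netmask",
             "hostname", "device", "autoconf")


def parse_ip_param(value: str) -> dict:
    """Split an ip= value into named fields; empty fields stay as ''."""
    parts = value.split(":")
    # Pad so that omitted trailing fields show up as empty strings.
    parts += [""] * (len(IP_FIELDS) - len(parts))
    return dict(zip(IP_FIELDS, parts))


def parse_nfsroot_param(value: str) -> dict:
    """Split nfsroot=<server>:<path>[,<options>]; assumes the server is given."""
    location, _, options = value.partition(",")
    server, _, path = location.partition(":")
    return {
        "server": server,
        "path": path,
        "options": options.split(",") if options else [],
    }


cmdline = ("root=/dev/nfs nfsroot=192.168.10.1:/chumbynfsroot,nfsvers=3,nolock "
           "ip=192.168.10.2:192.168.10.1::255.255.255.0::eth0 rootwait")

# Turn "key=value" tokens into a dict; flags like rootwait map to ''.
args = dict(tok.partition("=")[::2] for tok in cmdline.split())

ip = parse_ip_param(args["ip"])
nfs = parse_nfsroot_param(args["nfsroot"])
print(ip)
print(nfs)
```

The two empty fields in the ip= value are the gateway and the hostname, which are simply left unset here.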
Anyway, with USB and NFS booting working, I turned my focus to the main topic of this post: the Wi-Fi module. In the schematic, I saw that it was connected to the SDHCI2 peripheral. I started out by enabling the host controller in the device tree by adding a section like this (exactly the same as the SDHCI3 peripheral from the last post, with a different offset, interrupt, and clock):
sdhci2: sdhci@d4281000 {
    compatible = "mrvl,pxav2-mmc";
    reg = <0xd4281000 0x1000>;
    interrupts = <39>;
    clocks = <&soc_clocks PXA168_CLK_SDH1>;
    resets = <&soc_clocks PXA168_CLK_SDH1>;
    clock-names = "PXA-SDHCLK";
    mrvl,clk-delay-cycles = <0x1F>;
    non-removable;
    no-1-8-v;
    bus-width = <4>;
    max-frequency = <24000000>;
    cap-sd-highspeed;
    status = "okay";
};
I also assigned the correct pin functions for the SDHC controller pins in U-Boot, although later on I did figure out how to set up pin function assignments in the device tree in Linux instead.
After I added this second SDHC controller to the device tree, the kernel started printing a weird error. Of course it did. The entire error output would be crazy to share in this post, but here is the relevant snippet:
[ 2.312800] sdhci: Secure Digital Host Controller Interface driver
[ 2.312924] sdhci: Copyright(c) Pierre Ossman
[ 2.336182] 8<--- cut here ---
[ 2.336206] Unhandled fault: imprecise external abort (0x406) at 0xc886f0fe
[ 2.339415] pgd = (ptrval)
[ 2.346425] [c886f0fe] *pgd=01431811, *pte=d4281653, *ppte=d4281552
[ 2.349236] Internal error: : 406 [#1] PREEMPT ARM
[ 2.355534] Modules linked in:
[ 2.360352] CPU: 0 PID: 26 Comm: kworker/u2:2 Not tainted 5.15.33 #10
[ 2.363412] Hardware name: Generic DT based system
[ 2.369877] Workqueue: events_unbound async_run_entry_fn
[ 2.374709] PC is at __sdhci_read_caps+0xd8/0x180
So basically, something bad was happening somewhere in __sdhci_read_caps. That function isn’t too crazy. It basically just resets the controller, reads some device tree properties, reads some registers in the controller, and saves a bitmask of capabilities. Something caught my eye pretty quickly: this function reads the SDHCI_HOST_VERSION register. Why is that special? Well, one of the older PXA168 SDHC patches that I had ignored dealt with that register!
The change in the patch is pretty simple. It detects 16-bit reads of the SDHCI_HOST_VERSION register and silently replaces them with 32-bit reads of the register 2 bytes before it, with some bit shifting to effectively return the content of the version register. I later learned that this is a workaround for a documented erratum in the PXA168 that affects two of the four SDHC controllers: 16-bit reads of the SDHCI_HOST_VERSION register cause an exception, and the 32-bit read is the official workaround that Marvell suggested. The reason I hadn’t run into it earlier is that the host controller used for the internal SD card isn’t affected by the hardware bug. The host controller used for Wi-Fi is, though. So I added the patch to my kernel.
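The idea behind that workaround can be shown with a toy model in Python. This is only a sketch of the technique, not the kernel patch itself: FakeSdhc is a made-up stand-in for the memory-mapped controller, the version value 0x1234 is arbitrary, and the register offsets (0xFC and 0xFE) are the standard SDHCI ones, which matches the faulting address ending in 0xfe in the log above.

```python
import struct

SDHCI_SLOT_INT_STATUS = 0xFC  # 16-bit register just below the version register
SDHCI_HOST_VERSION = 0xFE     # 16-bit register that faults on 16-bit reads


class FakeSdhc:
    """Toy register file standing in for a memory-mapped SDHC controller."""

    def __init__(self, host_version: int, affected: bool):
        self.regs = bytearray(0x100)
        # Registers are little-endian, like the real hardware.
        struct.pack_into("<H", self.regs, SDHCI_HOST_VERSION, host_version)
        self.affected = affected

    def readw(self, offset: int) -> int:
        """16-bit read; fails on the version register of an affected controller."""
        if self.affected and offset == SDHCI_HOST_VERSION:
            # Stand-in for the external abort seen in the kernel log.
            raise RuntimeError("imprecise external abort")
        return struct.unpack_from("<H", self.regs, offset)[0]

    def readl(self, offset: int) -> int:
        """32-bit read; always works in this model."""
        return struct.unpack_from("<I", self.regs, offset)[0]


def readw_with_workaround(ctrl: FakeSdhc, offset: int) -> int:
    """16-bit read that avoids touching the version register directly."""
    if offset == SDHCI_HOST_VERSION:
        # Read the aligned 32-bit word at 0xFC; the version is its upper half.
        return (ctrl.readl(SDHCI_SLOT_INT_STATUS) >> 16) & 0xFFFF
    return ctrl.readw(offset)


ctrl = FakeSdhc(host_version=0x1234, affected=True)
version = readw_with_workaround(ctrl, SDHCI_HOST_VERSION)
print(hex(version))
```

Because the version register occupies the upper 16 bits of the 32-bit word starting at 0xFC, shifting right by 16 recovers it without ever issuing the problematic 16-bit access.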
It was around this point that I finally started working on adding a new compatible string to the sdhci-pxav2 driver so that I could keep the PXA168 quirks isolated since this driver is used by other hardware that isn’t affected by the problems I ran into. As of this writing, my changes to that driver are going through the Linux code review process. I submitted version 5 of the patch series today. You may notice if you clicked on that link that there are some other patches I haven’t described. Don’t worry, we’ll get to those later in this post!
With that problem solved, I was able to boot the kernel without any weird errors, and the SDIO card was detected:
[ 2.345175] mmc0: SDHCI controller on d4281000.sdhci [d4281000.sdhci] using DMA
[ 2.352655] mmc1: SDHCI controller on d427e000.sdhci [d427e000.sdhci] using DMA
...
[ 2.405899] mmc1: new SD card at address f259
[ 2.431421] mmc0: new SDIO card at address 0001
Interestingly, the Wi-Fi module was mmc0 and the SD card was mmc1. This makes sense because SDHCI2 is used for the Wi-Fi module and SDHCI3 is used for the SD card. I wanted them to be swapped though, which was easy to fix in the device tree:
aliases {
    mmc0 = &sdhci3;
    mmc1 = &sdhci2;
};
Awesome! Next, I enabled the libertas driver in my kernel. I had to turn on CONFIG_LIBERTAS and CONFIG_LIBERTAS_SDIO. I also turned on CONFIG_LIBERTAS_DEBUG which turned out to be a very smart decision. I also knew I needed to install the firmware, which ended up being a breeze. In buildroot, I simply had to enable BR2_PACKAGE_LINUX_FIRMWARE_LIBERTAS_SD8686_V9, and it took care of the rest for me.
I was hopeful, but I also was skeptical because I remembered running into problems back in 2013. But I went ahead and installed wpa_supplicant and NetworkManager in buildroot. Then, crossing my fingers, I booted up the Chumby and set up a connection in NetworkManager for my access point.
As I’m sure you can guess, it didn’t work. If it had worked, it probably wouldn’t have been worthy of its own blog post in the series! What I found was that NetworkManager was giving me a list of all available access points, but it couldn’t connect. Since I had introduced a bunch of new stuff, I decided to do baby steps instead. First I would get wpa_supplicant working properly by itself, and then once I verified it was working, I would try NetworkManager again.
buildroot installs a simple default wpa_supplicant config file, so I stopped NetworkManager, killed the instance of wpa_supplicant that it automatically started, and manually started wpa_supplicant myself:
/etc/init.d/S45network-manager stop
killall wpa_supplicant
wpa_supplicant -c /etc/wpa_supplicant.conf -i wlan0
wpa_supplicant immediately started spitting out this error over and over again:
wlan0: CTRL-EVENT-SCAN-FAILED ret=-22 retry=1
wlan0: CTRL-EVENT-SCAN-FAILED ret=-22 retry=1
wlan0: CTRL-EVENT-SCAN-FAILED ret=-22 retry=1
Well, that’s not normal! I did some Googling and it quickly became apparent that other people had also been seeing this issue: in particular, one common combination was people who had upgraded to a newer version of wpa_supplicant and had Broadcom Wi-Fi chipsets.
I went ahead and manually built some older versions of wpa_supplicant. Sure enough, versions 2.5 and 2.6 didn’t have this problem, which was consistent with several forum posts I had found. I was able to scan for networks, add my network, and attempt to connect to it in wpa_cli:
scan
scan_results
add_network
set_network 1 ssid "My Network"
set_network 1 psk "MyPassword"
enable_network 1
However, these older wpa_supplicant versions still didn’t work properly. They seemed to be successfully starting to connect to my network, but soon afterward, these kernel errors popped up:
[ 84.565634] libertas_sdio mmc1:0001:1 wlan0: command 0x0010 timed out
[ 84.572155] libertas_sdio mmc1:0001:1 wlan0: Timeout submitting command 0x0010
[ 84.580047] libertas_sdio: Resetting card...
[ 89.605636] libertas_sdio mmc1:0001:1 wlan0: TX lockup detected
[ 92.645637] libertas_sdio mmc1:0001:1 wlan0: command 0x0024 timed out
[ 92.652154] libertas_sdio mmc1:0001:1 wlan0: Timeout submitting command 0x0024
[ 92.660031] libertas_sdio mmc1:0001:1 wlan0: PREP_CMD: command 0x0024 failed: -110
[ 95.685637] libertas_sdio mmc1:0001:1 wlan0: command 0x0010 timed out
[ 95.692172] libertas_sdio mmc1:0001:1 wlan0: Timeout submitting command 0x0010
[ 95.747407] mmc1: card 0001 removed
Great. Just great. It seemed that I was dealing with two separate issues: a problem preventing modern wpa_supplicant versions from working at all, and a problem that caused the libertas driver to time out when sending commands. Tracking down two different connection failure problems simultaneously isn’t a good recipe for success, so I temporarily stuck with the older wpa_supplicant and focused on the command timeout issue.
I spent a lot of time looking into the libertas driver to try to understand why commands could possibly be timing out. I found that I could set a libertas_debug kernel parameter to spit out more detailed info. So I did:
echo 0x406180 > /sys/module/libertas/parameters/libertas_debug
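The value written to libertas_debug is a bitmask, with each bit turning on one category of messages. The Python sketch below shows how 0x406180 breaks down. The bit values are my recollection of the LBS_DEB_* defines in the libertas driver, so treat them as an assumption and check them against the kernel source; decode_mask is just a helper made up for this example.

```python
# Subset of libertas debug bits. Values are assumed from the driver's
# LBS_DEB_* defines and should be verified against the kernel source.
LBS_DEBUG_BITS = {
    "scan": 0x00000080,
    "assoc": 0x00000100,
    "host": 0x00002000,
    "cmd": 0x00004000,
    "sdio": 0x00400000,
}


def decode_mask(mask: int):
    """Return the names of the known bits set in mask, plus any leftover bits."""
    names = [name for name, bit in LBS_DEBUG_BITS.items() if mask & bit]
    unknown = mask & ~sum(LBS_DEBUG_BITS.values())
    return names, unknown


names, unknown = decode_mask(0x406180)
print(names, hex(unknown))
```

With these values, the five bits add up to exactly 0x406180 and nothing is left over.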
This enabled debugging output for SDIO, commands/responses, communication between host and WLAN chip, association, and scanning. The One Laptop Per Child wiki has a good page about this. The bits are also documented directly in the driver. I tried again after rebooting, and got a lot more output. There was way too much output to put in this post, but here’s a snippet of relevant debug messages:
[ 71.305005] libertas cmd: DNLD_CMD: command 0x001c, seq 37, size 12
[ 71.312176] libertas sdio: interrupt: 0x2
[ 71.316738] libertas sdio: interrupt: 0x1
[ 71.327515] libertas cmd: CMD_RESP: response 0x801c, seq 37, size 12
[ 71.353123] libertas cmd: DNLD_CMD: command 0x0050, seq 38, size 92
[ 71.360299] libertas sdio: interrupt: 0x2
[ 71.368780] libertas sdio: interrupt: 0x1
[ 71.379573] libertas cmd: CMD_RESP: response 0x8012, seq 38, size 58
[ 71.444255] libertas cmd: DNLD_CMD: command 0x0021, seq 39, size 20
[ 71.451360] libertas sdio: interrupt: 0x1
[ 74.485631] libertas_sdio mmc1:0001:1 wlan0: command 0x0021 timed out
[ 74.492149] libertas_sdio mmc1:0001:1 wlan0: Timeout submitting command 0x0021
[ 74.500023] libertas_sdio: Resetting card...
[ 74.548791] libertas sdio: interrupt: 0x3
[ 74.565763] libertas cmd: DNLD_CMD: command 0x0021, seq 40, size 20
[ 74.572390] libertas sdio: interrupt: 0x1
[ 74.577079] libertas cmd: CMD_RESP: response 0x8021, seq 39, size 20
[ 74.583478] libertas_sdio mmc1:0001:1 wlan0: Received CMD_RESP with invalid sequence 39 (expected 40)
In this trace, you can see that commands are sent to the module with a sequence number in order to ensure commands and responses go together. The first two commands in the trace (sequence numbers 37 and 38) work just fine. There’s an interrupt 0x2 (download) and 0x1 (upload), followed by a response received with a matching sequence number. When the third command (sequence number 39) is submitted, something goes wrong. The 0x2 interrupt is missing, and eventually the driver gets impatient and says that the command times out. So it tries to reset the card, and sends out another command (sequence number 40). After this, it gets a response! But the response is for sequence number 39. So basically, the driver thought that command 39 timed out, but it succeeded just fine. I suspected that the missing “interrupt: 0x2” was probably related to this mismatch.
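The same bookkeeping can be done mechanically. Here is a rough Python sketch that pairs each DNLD_CMD with its CMD_RESP by sequence number, using the command and response lines from the trace above (timestamps and interrupt lines stripped). The check_sequence helper and its event names are invented for this illustration; the driver's own logic is more involved.

```python
import re

TRACE = """\
libertas cmd: DNLD_CMD: command 0x001c, seq 37, size 12
libertas cmd: CMD_RESP: response 0x801c, seq 37, size 12
libertas cmd: DNLD_CMD: command 0x0050, seq 38, size 92
libertas cmd: CMD_RESP: response 0x8012, seq 38, size 58
libertas cmd: DNLD_CMD: command 0x0021, seq 39, size 20
libertas_sdio mmc1:0001:1 wlan0: command 0x0021 timed out
libertas cmd: DNLD_CMD: command 0x0021, seq 40, size 20
libertas cmd: CMD_RESP: response 0x8021, seq 39, size 20
"""

SENT = re.compile(r"DNLD_CMD: command (0x[0-9a-f]+), seq (\d+)")
RESP = re.compile(r"CMD_RESP: response (0x[0-9a-f]+), seq (\d+)")


def check_sequence(lines):
    """Return (kind, seq) events describing how each command fared."""
    events = []
    pending = None  # sequence number of the command awaiting a response
    for line in lines:
        if m := SENT.search(line):
            if pending is not None:
                # A new command went out before the previous one was answered.
                events.append(("no_response", pending))
            pending = int(m.group(2))
        elif m := RESP.search(line):
            seq = int(m.group(2))
            if seq == pending:
                events.append(("ok", seq))
                pending = None
            else:
                # A response arrived for a command that is no longer pending.
                events.append(("stale_response", seq))
    return events


events = check_sequence(TRACE.splitlines())
for kind, seq in events:
    print(kind, seq)
```

Sequence 39 shows up twice: once as a command that never got its response in time, and once as a late response that arrives while the driver is already waiting on 40.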
By the way, this result was pretty random. The problem didn’t occur at the exact same place each time, but it was the same type of failure. My first idea was to try increasing the timeout. Maybe the driver doesn’t wait long enough for a response. Nope, that didn’t fix it. It just delayed detection of the timeout. The missing “interrupt: 0x2” was pretty suspicious to me, so I did something really naughty: I hacked the main SDHC driver to pretend that my controller didn’t have support for SDIO IRQs and instead would need to be polled. The PXA168’s controller does have support for SDIO IRQs, but my theory was that if I told Linux that it didn’t, it might help expose an interrupt-related problem.
To do that, I temporarily removed the MMC_CAP_SDIO_IRQ and MMC_CAP2_SDIO_IRQ_NOTHREAD capabilities in sdhci_setup_host(). I rebuilt my kernel and tried again with the old wpa_supplicant.
Success! My theory was correct. With SDIO IRQs handled using polling, wpa_supplicant was able to connect to my network without any problems.
wlan0: Trying to associate with xx:xx:xx:xx:xx:xx (SSID='My Network' freq=2412 MHz)
wlan0: Associated with xx:xx:xx:xx:xx:xx
wlan0: CTRL-EVENT-SUBNET-STATUS-UPDATE status=0
wlan0: WPA: Key negotiation completed with xx:xx:xx:xx:xx:xx [PTK=CCMP GTK=CCMP]
wlan0: CTRL-EVENT-CONNECTED - Connection to xx:xx:xx:xx:xx:xx completed [id=1 id_str=]
wlan0: WPA: Key negotiation completed with xx:xx:xx:xx:xx:xx [PTK=CCMP GTK=CCMP]
I used ifconfig to set an IP address and did some stress testing with downloading and uploading. It worked fine, although the polling was definitely not ideal. This troubleshooting so far sounds like it all happened so easily, but in actuality my brain was fried at this point. I knew I had some kind of interrupt problem to deal with, and I wasn’t really looking forward to messing with it. I was jumping into drivers and subsystems that I knew nothing about, which is a common challenge you deal with when problems arise during board bringup.
To give my brain a break, I left the polling hack in place (don’t worry, we’ll come back to it further below), and decided to go after some lower hanging fruit with the original wpa_supplicant problem. What would it take to get newer wpa_supplicant versions working properly?
I did more Googling and determined that Arch Linux had already applied a patch to their wpa_supplicant to fix an issue with Broadcom drivers. The patch was submitted to wpa_supplicant’s mailing list by David Bauer almost a year ago, but as of this writing it still hasn’t been applied. As a summary, the problem is that Wi-Fi drivers in the Linux kernel specify how many bytes are available for information elements (IEs) to be added to scan requests, but wpa_supplicant doesn’t respect that limit. Modern versions of wpa_supplicant blindly attempt to add some additional scan IEs regardless of the driver’s limit. It fails on drivers that have a maximum scan IE length of 0, such as this driver and the out-of-tree broadcom-wl driver. Looking through the thread on the mailing list, it appears that the patch wasn’t accepted because there were some open questions about determining which IEs to omit if the maximum IE length is nonzero but still too small.
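To illustrate what respecting the limit means, here is a toy Python sketch. An information element is a one-byte ID, a one-byte length, and a payload. The build_scan_ies helper and its "skip whatever doesn't fit" policy are purely hypothetical: they are one possible answer to the open question from the mailing list thread, not what wpa_supplicant or the proposed patch actually does. The payloads are zero-filled placeholders.

```python
def make_ie(element_id: int, payload: bytes) -> bytes:
    """Encode one information element: 1-byte ID, 1-byte length, payload."""
    if len(payload) > 255:
        raise ValueError("IE payload too long")
    return bytes([element_id, len(payload)]) + payload


def build_scan_ies(candidates, max_scan_ie_len: int) -> bytes:
    """Concatenate IEs in order, skipping any that would exceed the limit."""
    out = b""
    for ie in candidates:
        if len(out) + len(ie) <= max_scan_ie_len:
            out += ie
    return out


ext_capab = make_ie(127, bytes(8))   # Extended Capabilities, placeholder payload
vendor = make_ie(221, bytes(10))     # vendor-specific, placeholder payload

no_room = build_scan_ies([ext_capab, vendor], max_scan_ie_len=0)
tight = build_scan_ies([ext_capab, vendor], max_scan_ie_len=12)
roomy = build_scan_ies([ext_capab, vendor], max_scan_ie_len=64)
print(len(no_room), len(tight), len(roomy))
```

With a limit of 0, nothing is added at all, which is the case that matters for this driver. The middle case is where the policy question comes in: something has to decide which IEs get dropped.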
I applied the patch and tried again. I was sure it would work…but it didn’t!
wlan0: Trying to associate with xx:xx:xx:xx:xx:xx (SSID='My Network' freq=2412 MHz)
wlan0: CTRL-EVENT-ASSOC-REJECT bssid=xx:xx:xx:xx:xx:xx status_code=1
wlan0: Trying to associate with xx:xx:xx:xx:xx:xx (SSID='My Network' freq=2412 MHz)
wlan0: CTRL-EVENT-ASSOC-REJECT bssid=xx:xx:xx:xx:xx:xx status_code=1
BSSID xx:xx:xx:xx:xx:xx ignore list count incremented to 2, ignoring for 10 seconds
This was very surprising and incredibly frustrating. It created a completely different error instead! Some Googling pointed out that this can be caused by having random MAC addresses enabled during scanning, but that wasn’t causing my issue. I tried it both ways and nothing changed. It seemed that my access point was rejecting my association attempt. I even tried connecting to my phone’s hotspot instead, and it gave the same error, proving that it wasn’t a problem specific to my access point.
I settled on doing a Git bisect between wpa_supplicant 2.6 and 2.7. 2.7 was the first broken version. It pointed me to this commit, which changes the code around so that a “supported operating classes” IE is always added to association requests, rather than only if CONFIG_MBO is enabled. With the IE removed, wpa_supplicant 2.7 through 2.9 began working. Version 2.10 was still broken though! I performed a second bisect. It blamed this commit, which adds support for 802.11 Mirrored Stream Classification Service request frames. It seemed to be something related to QoS. Anyway, I played around and found that the part of this commit that adds a bit in wpas_ext_capab_byte() was the problem, so I temporarily disabled it.
And after I did that, wpa_supplicant 2.10 still didn’t work! This is the first time I’ve ever had to do three separate git bisects to track down an issue. The third bisect pointed out this commit. This third commit adds support for 802.11 Stream Classification Service request frames. It didn’t take me long to discover that this patch also set bits in wpas_ext_capab_byte() and that’s indeed the problem.
With all three of these commits disabled through silly code hacks, wpa_supplicant 2.10 finally worked. I posted about it on the hostap mailing list, but nobody cared enough to comment. I didn’t understand the problem, but all I knew was that it worked with those changes disabled. I left this problem alone and added my hacky wpa_supplicant patches to my buildroot distribution.
Before we go back to the command timeout problem, I do want to point out that months later, I looked deeper into the wpa_supplicant problem and realized that most of these problems were actually caused by a bug in the libertas driver. I made this discovery by sniffing wireless traffic using a separate Wi-Fi adapter in monitor mode and interpreting the packets with Wireshark. What I found was that the libertas driver had a bug when handling IEs in association requests. The code was written assuming there was only one IE added to association requests (for WPA or WPA2 security). The three wpa_supplicant commits linked above all caused additional association request IEs to be added, which confused the driver and caused it to send out a corrupted association request containing a wildcard SSID instead of the actual SSID. This is what caused the association request to be rejected.
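The single-IE assumption is easy to picture with a small Python sketch. The buffer below is a made-up example with placeholder payloads, using the standard element IDs for RSN (48), Supported Operating Classes (59), and Extended Capabilities (127). The treat_as_single_ie function is only a toy stand-in for that kind of assumption, not the libertas driver's actual code.

```python
def iter_ies(buf: bytes):
    """Yield (element_id, payload) for each IE in a concatenated buffer."""
    pos = 0
    while pos + 2 <= len(buf):
        element_id, length = buf[pos], buf[pos + 1]
        end = pos + 2 + length
        if end > len(buf):
            raise ValueError("truncated IE")
        yield element_id, buf[pos + 2:end]
        pos = end
    if pos != len(buf):
        raise ValueError("trailing byte after last IE")


def treat_as_single_ie(buf: bytes):
    """Toy version of the flawed assumption: the whole buffer is one IE."""
    return buf[0], buf[2:]


# Three IEs back to back, with zero-filled placeholder payloads.
buf = (bytes([48, 4]) + bytes(4)          # RSN
       + bytes([59, 2]) + bytes(2)        # Supported Operating Classes
       + bytes([127, 3]) + bytes(3))      # Extended Capabilities

ids = [element_id for element_id, _ in iter_ies(buf)]
naive_id, naive_payload = treat_as_single_ie(buf)
print(ids, naive_id, len(naive_payload))
```

Walking the buffer finds three separate elements, while the naive version sees one RSN element with a 13-byte payload even though its length byte says 4. Anything that depends on the first interpretation is going to produce garbage once extra IEs show up.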
I’m going through the process of submitting a fix to the libertas driver for this problem. During the review process, Dan Williams, who used to be the maintainer of the driver, also provided me with a suggestion for fixing the original issue with scan IEs. As soon as these patches are finalized and accepted (fingers crossed), the libertas driver will work out of the box with modern wpa_supplicant versions. No crazy patches will be required. As an added bonus, this will also improve support for Wi-Fi Protected Setup (WPS). Here’s an example captured 802.11 frame sent by my Chumby now that I’ve added support for WPS in probe requests:
Anyway, this story is not finished yet! I still needed to figure out the command timeout problem. I thought more about what I had learned. The problem happened inconsistently; the sequence number of the command that timed out was different during each boot attempt. Using polling to handle SDIO interrupts, as opposed to relying on the SDHC controller’s actual SDIO interrupt support, fixed the issue. Those two observations combined to lead me to believe that I was randomly missing some interrupts. I removed my polling hack and did a bunch of register reads and printk debugging to try to confirm the problem. It quickly became obvious that I was correct: the SDHC controller wasn’t dispatching some interrupts, even though the card was definitely signaling that an interrupt was ready. As a long shot, I posted on the libertas-dev mailing list about this problem.
James Cameron of OLPC wrote back to me and gave me a pointer to check out the OLPC kernel. The OLPC used this same wireless chipset in some revisions. Unfortunately, I wasn’t able to find anything relevant to my problem, but I still very much appreciated his info.
I pored through the code and noticed that the libertas driver doesn’t acknowledge SDIO interrupts right away, whereas the original Marvell sd8xxx vendor driver (bundled alongside the stock 2.6.28 Chumby 8 kernel) does. I hacked the libertas driver to acknowledge SDIO interrupts more quickly in the if_sdio_interrupt function, which actually improved the situation quite a bit. I was able to reliably connect to a network, but I found that during stress tests of big downloads, I would still eventually miss an interrupt and the whole driver would again get out of sync with a sequence number mismatch.
It was around this time that a question popped up in my head: was this problem even related to the new kernel at all? Eager to answer it, I went back to the original Chumby 2.6.28 kernel. The Marvell sd8xxx vendor driver didn’t ever miss interrupts, even during stress tests. But what about the libertas driver? It also existed in 2.6.28. I enabled the libertas driver in the old kernel and tried it instead. I was surprised to discover that it failed in the exact same way. This meant that there wasn’t anything special in the original Chumby kernel that made the interrupts work better. It seemed to be some kind of a design difference between the sd8xxx vendor driver and the libertas driver. The PXA168 simply liked the vendor driver better.
With further hackery and trial and error, I was able to observe that if I read back the IF_SDIO_H_INT_STATUS register in the libertas driver after acknowledging the SDIO interrupt, everything worked perfectly. So I had discovered a way to make the libertas driver work correctly in my newer kernel, but it required an ugly hack that I knew wasn’t actually necessary. The same Wi-Fi chipset was used in the OLPC and it didn’t need any hacks in order to work reliably.
I almost gave up at this point. I had a solution that worked. It was a little hacky and would be impossible to get working in the mainline kernel, but it worked. If I had been working on a commercial product, I probably would have had to move on and continue making progress on other stuff. Before I gave up, I did a little more searching and something cool happened: I discovered that the PXA168 errata document was actually public. The Linux kernel documentation even links to it! This is pretty rare for Marvell.
This errata document contains a really interesting erratum: SDIO CARD_INT interrupt may be missed when the DAT1 signal is asserted by the external device. This appeared to be exactly the kind of problem I was seeing! Yes, the weird out-of-sync sequence number and missing interrupts problem I was tracking down was not my fault and it wasn’t the libertas driver’s fault either. It was a hardware bug! I never found any evidence that Marvell submitted a kernel patch to work around this problem, so I was on my own to create one. I removed my hack to the libertas driver and tried to fix the problem the correct way instead.
I followed the process that Marvell suggested to work around the issue. I had to write a register to reset the data port logic, then send a fake SD command to restart the clock afterward. They also suggested temporarily disconnecting the CMD signal from the SD bus while performing the fake command so as not to confuse the connected device, but I ignored that part at first.
I really struggled my way through this. This required a deep dive into the MMC/SDHC subsystem to understand how to execute a command by hand. Some of my first attempts failed miserably. Eventually I figured something out, although it wasn’t the cleanest approach in the world. I’m not going to go into too much detail, but I was also able to figure out how to disconnect the CMD signal from the SD bus through the use of pinctrl states — basically the ability to change assigned pin functions on the fly through a common API. This required getting pinctrl working on the PXA168, but it wasn’t too bad because there were some sample implementations of how to do it on similar platforms using the pinctrl-single driver. I haven’t started working on upstreaming the pinctrl change yet.
Like I said earlier, all of my PXA168 SDHC driver changes are still going through the code review process. Adrian Hunter, who is the SDHC maintainer, told me that the way I was doing the fake SD command was wrong and helpfully provided a suggestion for a better way to handle it by manually writing registers to initiate the command.
I think that’s about enough to talk about in this post. That’s the process of how I got Wi-Fi working. Oh, and after I had wpa_supplicant and the libertas driver figured out, NetworkManager just worked out of the box with no problems whatsoever. I haven’t seen a single command timeout, and all stress tests pass just fine. The driver’s download speeds are just as fast as the original sd8xxx driver, as long as power save is disabled.
If you’ve been following this series, you may be seeing a pattern. Almost everything that I try to get working requires getting my head into some new kernel subsystem or technology that I have no experience with. So far in this series I’ve mentioned clocking, MMC/SDHC, pinctrl, and Wi-Fi. There are more to come. I’m sure what I am doing is child’s play compared to what SoC vendors have to do when they get a new SoC working in the kernel. It has helped me gain a lot of respect for the effort that goes into mainline Linux kernel development.
As an example of the deep dives I’ve been having to do, I had no idea what 802.11 information elements were when I began this project. Now I can open up Wireshark and recognize that “oh yeah, that IE in the probe request is for WPS and was added to the scan IE buffer by wpa_supplicant”. It can be frustrating having to deal with a bunch of kernel subsystems while you’re trying to solve a particular problem, but it’s also fun to look back and see how much you’ve learned when you’re all done. Now I know a little bit more about Wi-Fi than I used to. Same with SDIO.
In the next post, I’m thinking I’ll discuss the new driver I had to write for the Chumby 8 in order to enable the reboot and poweroff commands. This required diving into yet another kernel subsystem. I also handled reset/poweroff in U-Boot, but I kind of hand-waved it away when I mentioned it in part 1. I’ll be sure to discuss the kernel driver in more detail than I did for U-Boot.
Click here to go to part 4, where I get the reboot and poweroff commands working.
Thanks so much for this latest look- it’s really fascinating watching you puzzle through each bit, and slowly assemble a working whole.
Thanks Steve! It’s a lot of fun writing about it too!
I’ve learned much from these postings! I’m looking forward to having some free time to see if I can apply what you’ve learned to two additional PXA16X devices I have (GlobalScale Gplug-D and GTCam).
I’d love to hear how that ends up, Ray!
For everyone’s info, it appears my Wi-Fi patches are likely destined to be included in Linux 6.3.
The SDHC patches have also been accepted for Linux 6.3. I haven’t forgotten about this series by the way; needed to take a break for my iMac chime post!