Update on 2023/05/19: ASUS has publicly acknowledged the issue and provided an explanation and workaround of their own (rebooting, or a hard reset if the reboot doesn’t fix it). The original post is below:
When I woke up today around 6:45 AM PDT, I didn’t seem to have internet service available. My phone told me that I was connected to my Wi-Fi network, but it didn’t have connectivity. “Hmm, that’s weird,” I thought. Maybe a fiber cut in the area or something? I looked at my IRC client on my desktop Windows PC, which is nice because it records timestamps of when I lose my connection:
My connection had been down for over 3 hours at this point. Weird! I figured I would log into my ASUS RT-AC86U router’s web interface and see what was going on. Something happened that I wasn’t expecting at all — the page wouldn’t fully load. Portions of it showed the little “sad page” icon indicating a connection error.
I tried to SSH into the router instead. The first few connection attempts failed, and then finally I got in. What I found, though, was that I couldn’t run any commands. It just spit this error back at me:
-sh: can't fork
OK, so something was really messed up. I decided to power cycle the router at this point. Maybe some weird glitch happened or something. Which would be odd — this router has been pretty rock solid since I’ve had it, aside from 2.4 GHz Wi-Fi issues over time. That’s another story I don’t want to get into today.
Anyway, when the router came back up everything seemed fine. But then, 40 minutes later, my connection dropped again with the same symptoms.
The fact that they were both at exactly 23 seconds is probably just a crazy coincidence. I was starting to panic a bit at this point. I really didn’t think an issue like this could be my ISP’s fault, but I hadn’t changed a single thing about my network setup. I hadn’t updated my router firmware for quite a while either — I had automatic updates turned off, and last I had checked, ASUS hadn’t released a new update for it.
I was able to successfully SSH into the router this time, and I did a few quick diagnostics. I used top to show me what was going on. I sadly didn’t take any screenshots, but I noticed that a process called asd was taking up 50% of my CPU. The CPU is dual-core according to /proc/cpuinfo, so 50% likely means one core was fully pegged.
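Reading "50%" as "one core fully pegged" only works if top reports each process as a share of the whole machine rather than of a single core. That is how it is read here. The Python sketch below shows the arithmetic under that assumption. The helper names `count_cpus` and `cores_busy` are made up for illustration, and the sample text is an abbreviated, hypothetical /proc/cpuinfo:

```python
def count_cpus(cpuinfo_text: str) -> int:
    """Count logical CPUs: /proc/cpuinfo has one 'processor : N' line per CPU."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":", 1)[0].strip() == "processor"
    )


def cores_busy(total_percent: float, ncpus: int) -> float:
    """Convert a whole-machine CPU percentage into a count of fully busy cores.

    Assumes the percentage is relative to all CPUs combined (100% = every core busy).
    """
    return total_percent / 100 * ncpus


# Abbreviated, hypothetical /proc/cpuinfo contents for a dual-core CPU.
sample = "processor\t: 0\n\nprocessor\t: 1\n"
print(cores_busy(50, count_cpus(sample)))  # 1.0
```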
My first instinct was to search for asd (which was difficult with a non-working internet connection) but I found that it’s an ASUS security daemon. This made me feel a little bit better, but I still felt like it had to be involved in the problem. Normally when I SSH into my router, top doesn’t show anything using anywhere close to 50% of the CPU.
I started searching on Reddit and Twitter to see if anyone else had run into anything similar, and that’s when I spotted this tweet by @stevecantsmell:
The way he worded it, it sounds like he works for an ISP. This sounded so similar to my issue, even down to the time frame! That would correspond to 3 AM in my time zone. I followed his advice. I quickly rebooted the router and went right into the firmware update page in its web UI. Sure enough, I was running version 3.0.0.4.386.48260 and there was an update available for 3.0.0.4.386.51529 which was released last month. It turns out I had also missed a firmware release that came out in March. I do like to keep my router up to date, but I had been checking at a slower interval since there hadn’t been an update for about a year.
I was able to install the update. The router rebooted on its own after the update finished and everything has been fine since then. asd is no longer using 50% of the CPU either. In the hours since this problem occurred, I’ve heard of countless other people who ran into this exact same issue with a variety of ASUS routers. More people chimed in in the Twitter thread linked above, and there were several posts on Reddit and SNBForums. In some cases a beta firmware was required to fix the issue. It was comforting to know that I wasn’t alone, but also incredibly frustrating to hear that so many people were affected. I bet ISP tech support employees had a “wonderful” day today.
So…what exactly happened early this morning to set this whole thing off? Did ASUS’s asd program download some kind of faulty file from their servers that caused it to hang up? Was someone attempting a mass exploit on a vulnerability that was recently patched by ASUS? Did updating the firmware really fix the issue or did it just stop a chain of events that will restart itself again soon?
I don’t know, but here’s what I’ve been able to gather so far. It appears that the file /jffs/asd.log (and /jffs/asd.log.1, which I think is the rolled-over version containing previous entries) on my router was being filled with thousands of lines of the following error message:
1684335272[chknvram_action] Invalid string
The number appears to be a UNIX timestamp, corresponding to 7:54 AM PDT this morning, which is probably right around the time that I finally installed the firmware update. I’m guessing this was constantly being written to this log as soon as the problem began at 3:24 AM.
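The timestamp conversion is easy to double-check. The Python sketch below assumes each asd.log line is a decimal UNIX timestamp followed immediately by `[function] message`, which is inferred from the single line quoted above. `parse_asd_line` is a hypothetical helper. PDT is hard-coded as a fixed UTC-7 offset, which is only correct while daylight saving time is in effect:

```python
import re
from datetime import datetime, timedelta, timezone

# Fixed UTC-7 offset: correct for mid-May (daylight saving time), not year-round.
PDT = timezone(timedelta(hours=-7), "PDT")

# Assumed layout: <unix seconds>[<function name>] <message>
ASD_LINE_RE = re.compile(r"^(\d+)\[(\w+)\] (.*)$")


def parse_asd_line(line: str):
    """Return (UTC datetime, function name, message), or None if the line doesn't fit."""
    m = ASD_LINE_RE.match(line)
    if m is None:
        return None
    ts, func, msg = m.groups()
    return datetime.fromtimestamp(int(ts), tz=timezone.utc), func, msg


when, func, msg = parse_asd_line("1684335272[chknvram_action] Invalid string")
print(when.astimezone(PDT).strftime("%Y-%m-%d %H:%M:%S %Z"), func)
# 2023-05-17 07:54:32 PDT chknvram_action
```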
I also found these interesting messages in /jffs/syslog.log-1 at around the time the connection was first lost:
May 17 03:18:14 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7669)]do webs_update
May 17 03:18:21 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7687)]retrieve firmware information
May 17 03:18:21 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7702)]fimrware update check first time
May 17 03:18:21 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7733)]no need to upgrade firmware
May 17 03:18:51 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7669)]do webs_update
May 17 03:18:51 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7687)]retrieve firmware information
May 17 03:18:51 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7707)]fimrware update check once
May 17 03:19:21 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7669)]do webs_update
May 17 03:19:21 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7687)]retrieve firmware information
May 17 03:19:21 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7707)]fimrware update check once
May 17 03:19:51 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7669)]do webs_update
May 17 03:19:51 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7687)]retrieve firmware information
May 17 03:19:51 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7702)]fimrware update check first time
May 17 03:19:51 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7733)]no need to upgrade firmware
May 17 03:22:58 kernel: CPU: 1 PID: 12870 Comm: touch Tainted: P O 4.1.27 #2
May 17 03:22:58 kernel: Hardware name: Broadcom-v8A (DT)
May 17 03:22:58 kernel: task: ffffffc01730eb00 ti: ffffffc0126dc000 task.ti: ffffffc0126dc000
May 17 03:22:58 kernel: PC is at 0xf6dcc90c
May 17 03:22:58 kernel: LR is at 0xf6dccfd4
May 17 03:22:58 kernel: pc : [<00000000f6dcc90c>] lr : [<00000000f6dccfd4>] pstate: 400f0010
May 17 03:22:58 kernel: sp : 00000000fff3d154
May 17 03:22:58 kernel: x12: 00000000fff3d188
May 17 03:22:58 kernel: x11: 00000000fff3d1d4 x10: 00000000f76334c0
May 17 03:22:58 kernel: x9 : 00000000fff3d189 x8 : 00000000fff3d184
May 17 03:22:58 kernel: x7 : 000000000000000b x6 : 0000000000000000
May 17 03:22:58 kernel: x5 : 00000000fff3e8bc x4 : 00000000fff3d420
May 17 03:22:58 kernel: x3 : 000000006e69622f x2 : 00000000fff3e8bc
May 17 03:22:58 kernel: x1 : 00000000fff3d420 x0 : fffffffffffffff2
May 17 03:23:17 kernel: CPU: 1 PID: 12894 Comm: touch Tainted: P O 4.1.27 #2
May 17 03:23:17 kernel: Hardware name: Broadcom-v8A (DT)
May 17 03:23:17 kernel: task: ffffffc01730eb00 ti: ffffffc01151c000 task.ti: ffffffc01151c000
May 17 03:23:17 kernel: PC is at 0xf6dcc90c
May 17 03:23:17 kernel: LR is at 0xf6dccfd4
May 17 03:23:17 kernel: pc : [<00000000f6dcc90c>] lr : [<00000000f6dccfd4>] pstate: 400f0010
May 17 03:23:17 kernel: sp : 00000000fff3d154
May 17 03:23:17 kernel: x12: 00000000fff3d188
May 17 03:23:17 kernel: x11: 00000000fff3d1d4 x10: 00000000f76334c0
May 17 03:23:17 kernel: x9 : 00000000fff3d189 x8 : 00000000fff3d184
May 17 03:23:17 kernel: x7 : 000000000000000b x6 : 0000000000000000
May 17 03:23:17 kernel: x5 : 00000000fff3e8bc x4 : 00000000fff3d420
May 17 03:23:17 kernel: x3 : 000000006e69622f x2 : 00000000fff3e8bc
May 17 03:23:17 kernel: x1 : 00000000fff3d420 x0 : fffffffffffffff4
May 17 03:23:51 watchdog: restart_firewall due DST time changed(1->0)
May 17 03:23:51 rc_service: watchdog 1807:notify_rc restart_firewall
May 17 03:23:51 rc_service: watchdog 1807:notify_rc restart_wan
May 17 03:23:51 rc_service: waitting "restart_firewall" via watchdog ...
May 17 03:23:51 firewall: apply rules error(2857)
May 17 03:23:51 firewall: apply rules error(2892)
May 17 03:23:51 services: apply rules error(17779)
May 17 03:23:51 firewall: apply rules error(4580)
May 17 03:23:52 miniupnpd[5322]: shutting down MiniUPnPd
May 17 03:23:53 DualWAN: skip single wan wan_led_control - WANRED off
May 17 03:23:58 dnsmasq-dhcp[5503]: failed to write /var/lib/misc/dnsmasq.leases: No space left on device (retry in 60s)
So it did an auto firmware update check at 3:18 AM (again, I have auto updates turned off) and then 3 minutes later, the kernel got mad about something. As you can see at the bottom, other things started to fail too. The dnsmasq error clearly indicates that there was no space available in /var/lib/misc. /var is mounted as a tmpfs, so I think this means the router was out of RAM.
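A tmpfs lives in memory, so "No space left on device" on one points at RAM rather than flash. The Python sketch below watches a mount's free space with `os.statvfs` and runs on Linux/Unix only. Stock ASUS firmware doesn't ship Python, so on the router itself BusyBox `df /var` should give the equivalent figures. A tmpfs can also hit its own size limit before RAM is truly exhausted, so treat a full mount as a hint rather than proof. `free_bytes` and `looks_full` are illustrative names, and the 64 KiB threshold is arbitrary:

```python
import os


def free_bytes(path: str) -> int:
    """Bytes available to non-root writers on the filesystem containing `path`."""
    st = os.statvfs(path)
    # f_bavail is counted in units of the fragment size, f_frsize.
    return st.f_bavail * st.f_frsize


def looks_full(path: str, min_free: int = 64 * 1024) -> bool:
    """Heuristic: call the mount 'full' below `min_free` bytes (arbitrary threshold)."""
    return free_bytes(path) < min_free


if __name__ == "__main__":
    for mount in ("/var", "/tmp"):
        if os.path.exists(mount):
            status = "FULL?" if looks_full(mount) else "ok"
            print(mount, free_bytes(mount), "bytes free", status)
```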
It looks like the auto firmware check is pretty common to see every morning in the log, although it did fail on Monday if that’s relevant:
May 15 03:18:05 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7669)]do webs_update
May 15 03:19:17 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7737)]could not retrieve firmware information: webs_state_update = 1, webs_state_error = 1, webs_state_dl_error = 0, webs_state_info.len = 23
May 15 03:19:46 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7669)]do webs_update
May 15 03:21:11 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7737)]could not retrieve firmware information: webs_state_update = 1, webs_state_error = 1, webs_state_dl_error = 0, webs_state_info.len = 23
May 15 03:21:40 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7669)]do webs_update
May 15 03:22:44 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7737)]could not retrieve firmware information: webs_state_update = 1, webs_state_error = 1, webs_state_dl_error = 0, webs_state_info.len = 23
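Comparing timestamps makes the retry cadence easier to see. This Python sketch takes the May 15 lines quoted above and measures the spacing between `do webs_update` attempts. The classic syslog prefix carries no year, so 2023 is supplied as an assumption. `entry_times` and `gaps_seconds` are illustrative names:

```python
from datetime import datetime

# The May 15 entries quoted above.
MAY_15 = [
    "May 15 03:18:05 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7669)]do webs_update",
    "May 15 03:19:17 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7737)]could not retrieve firmware information: webs_state_update = 1, webs_state_error = 1, webs_state_dl_error = 0, webs_state_info.len = 23",
    "May 15 03:19:46 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7669)]do webs_update",
    "May 15 03:21:11 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7737)]could not retrieve firmware information: webs_state_update = 1, webs_state_error = 1, webs_state_dl_error = 0, webs_state_info.len = 23",
    "May 15 03:21:40 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7669)]do webs_update",
    "May 15 03:22:44 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7737)]could not retrieve firmware information: webs_state_update = 1, webs_state_error = 1, webs_state_dl_error = 0, webs_state_info.len = 23",
]


def entry_times(lines, needle, year=2023):
    """Timestamps of syslog lines containing `needle`.

    Assumes the classic 15-character 'Mon DD HH:MM:SS' prefix; syslog omits
    the year, so the caller supplies it.
    """
    return [
        datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        for line in lines
        if needle in line
    ]


def gaps_seconds(times):
    """Whole seconds between consecutive timestamps."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


attempts = entry_times(MAY_15, "do webs_update")
failures = entry_times(MAY_15, "could not retrieve")
print(gaps_seconds(attempts))  # [101, 114]
print([int((nxt - fail).total_seconds()) for fail, nxt in zip(failures, attempts[1:])])  # [29, 29]
```

Each retry starts 29 seconds after the previous failure is logged. This suggests the check waits about half a minute between attempts, and the May 17 excerpt shows a similar ~30-second rhythm.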
It’s unclear to me if the auto firmware check is even related to when the problem first started. Maybe it’s one of several periodic tasks that run at around that time? It looks like typically I see this message about 30-40 minutes after the auto firmware check:
ahs: [read_json]Update ahs JSON file.
This seems to be related to “ASUS Healing System” which I don’t even know if I have enabled or not. I also saw the auto update check and ahs JSON message show up again in the log after my first router reboot, at around 6:47 AM. Not too long after that, the dnsmasq.leases “no space left on device” error happened again, so I think it was out of RAM again — perhaps asd was gobbling up CPU time and RAM.
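/proc/meminfo is the place to check the "out of RAM" theory directly. Kernels older than 3.14 have no MemAvailable line, and that includes the 4.1.27 kernel in the log above. The Python sketch below therefore estimates available memory on those kernels. It subtracts Shmem because tmpfs files are counted under Cached but cannot be dropped. The estimate is rough, the helper names are illustrative, and the numbers in the usage example are made up:

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def available_kb(info: dict) -> int:
    """MemAvailable if the kernel provides it, else a rough estimate."""
    if "MemAvailable" in info:
        return info["MemAvailable"]
    # Pre-3.14 fallback: free memory plus reclaimable page cache.
    # tmpfs pages are counted in Cached *and* Shmem, and cannot be dropped.
    return (info["MemFree"] + info.get("Buffers", 0)
            + info.get("Cached", 0) - info.get("Shmem", 0))


def used_fraction(info: dict) -> float:
    return 1 - available_kb(info) / info["MemTotal"]


# Hypothetical numbers, only to show the calculation.
sample = """MemTotal:       1000 kB
MemFree:         100 kB
Buffers:           0 kB
Cached:          300 kB
Shmem:           200 kB
"""
print(round(used_fraction(parse_meminfo(sample)), 2))  # 0.8
```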
Does anyone have any further info on what happened here? My two theories are: either asd downloaded a bad file from ASUS that caused it to crash, or someone was exploiting a vulnerability that was patched in one of ASUS’s two most recent updates for my RT-AC86U router. If it’s the latter, it’s obviously my bad for not keeping my firmware up to date, but I can’t help but wonder if an automatic file download in the middle of the night caused it. I’m very curious about what happened! Did anyone with an ASUS router not run into a similar problem today?
I had an RT-AX56U with the same/similar issue. I deleted /jffs/asd/chknvram20230516, rebooted, and everything seems to be ok now. That file had been downloaded at 03:11 am, seemingly when the issue arose. Not sure if it’s that particular file or some other underlying issue with asd, but it solved it for me.
Ah, very interesting! That filename seems like a good clue about what happened. The filename I have now is chknvram2023051601. Maybe that original filename you deleted was the bad one I originally had too…
This makes me feel better though; it seems less and less likely that it was a mass exploit.
I like your writing.
I have an Asus AX55 and this EXACT thing was happening to me. I’ve reverted to using an old OpenWrt POS I had lying around. SSH was giving the same fork error, and uptime after a reboot is 5-10 minutes tops. Disconnecting the WAN seems to help. I also can’t upload firmware; looking at the browser’s debug tools, it never uploads any bytes to upload.php (I think). The router will stop answering pings and go unresponsive for a few minutes, then somewhat recover, so I can’t even flash it!
I assumed it was a memory leak, but free memory would hover around 40 MB. CPU was low, but SSH responses got slower and slower until it crashed. I had auto update disabled too and was 2 revisions old.
Same issue with the RT-AC87R or U, but there’s no firmware update to be had. I’m still trying to figure out what can be done, since I don’t have a firmware fix other than downgrading to even older firmware.
My AX89X had the same problem. I rebooted the router at least 4 times before realizing that asd was causing the kernel panic after about ten minutes.
I had to upload a beta firmware RT-AX89U_9.0.0.4_388_32094 someone recommended on Reddit before the CPU and Memory hogging would stop.
Fortunately I had that ten minute window to patch the router with beta firmware and the problem seemed to go away. Hope it does not return tonight haha!
RT-AC68U here. Had to reboot it through the management UI when I got home from work. Same symptoms and log messages. Stayed up for 4-5 hours, then crapped out again. Did not have auto firmware updates enabled. Just finished applying an update after the second reboot. Haven’t remoted in to clear any files, just patch and reboot. I guess I’ll find out if it worked come morning.
Just a few days ago this has been blogged on how a threat actor is exploiting a recently discovered TP-Link exploit:
https://research.checkpoint.com/2023/the-dragon-who-sold-his-camaro-analyzing-custom-router-implant/
Probably there’ll be a more mundane explanation, but just saying… 😉
Similar story here. RT-AX55 which had been a solid workhorse for some time, lost internet connection. Syslog file mentioned /var/lib/misc/dnsmasq.leases and file system full. Reboot seemed to bring internet back for a few minutes. Called Asus support, listened to a jarring message loop for better than an hour, finally got a tech support representative who talked me through a bunch of procedures that did not much help. Finally she sent me an email with a questionnaire and instructions to attach backup of configuration and syslog files. All rather hard to do, as the web UI went down rather quickly when things started falling apart.
From the very long wait for a tech support person I’d say Asus was experiencing a lot of calls this morning. I’ve never had more than a couple of minutes of waiting in the past, and I’ve owned several pieces of Asus equipment.
I hope they get a clue and come up with more directly useful suggestions than I was given.
In the process of multiple resets I managed to get the thing to download and install an upgrade, since which I’ve had no problems. Fingers crossed.
Filesystem full when various processes are trying to produce logs and data files can work pretty much havoc, as I know well from experience. Hoping this one doesn’t recur.
Same thing happened to me, albeit a little differently. I left for a doctor’s appointment yesterday morning and came home 2.5 hours later, and no device in my house could see or join my wireless network. The error codes were similar or the same, but it was like the Wi-Fi module had crashed completely. My hardwired devices were still moving right along. A quick reboot with the button on the back and everything came back online.
[…] Doug Brown ☛ What happened with ASUS routers this morning? […]
Thanks ASUS….woke up this morning and spent 2h resetting devices and reconfigure the Mesh network. Ridiculous!
I can’t get the FW upload to work. How the hell do you get recovery mode to work? The only port that becomes active is the WAN port, but the tool won’t upload to it.
Hey.
I have RT-AC3200 and have no idea what you guys talk about when explaining on how to resolve the issue.
I simply downloaded the 2nd to last firmware release and tried to install it on the router.
It said it was failing, but kept going from 1-100%.
After it hit 100%, I was asked to manually reboot the router, so I pulled the power for 10s.
It works for me now: none of the CPU spikes, and I have 1/3 of RAM available.
I had an RT-AX92U with the same/similar issue this morning 5/18 (IST timezone). Reboot + firmware upgrade solved the issue
RT-AC66U fleet here. No issues observed. These are older MIPS units. I believe the 68 onward are ARM.
its snake
https://youtube.com/shorts/p-MuQhoJpqw?feature=share
RT-AC1750: similar issues as described, starting 5/17/2023. Unable to connect to router UI at 192.168.1.1 or 192.168.50.1. Router stays alive for a few hours then craps out again.
RT-AC87U here. I did *not* encounter this problem this morning (found out about it from HN). I’m running the Merlin firmware, version 384.13_10.
Same problem with Asus routers here in Poland this morning and evening (18.5.2023): AC1500/ZenWifi Mini and RT-AC51. All our Asus routers went down. After several reboots the routers work for a few minutes and then go down again; reboot, a few minutes up, and then they hang again! Different routers, different networks; so far the only thing in common is the Asus router!
I have a mesh of RT-AC68U and DSL-AC68Us sitting behind a virgin media hub in modem-only mode. The RT units (including the gateway) are running stock fw version 386_51255 (not latest) and we were unaffected. I wonder whether that means the VM hub inadvertently filtered any bad traffic.
I got the same problem. RT-AC68U with Firmware Version: 3.0.0.4.386_49703
May 18 13:30:48 kernel: LR is at 0x21708
May 18 13:30:48 kernel: pc : [] lr : [] psr: 20000010
May 18 13:30:48 kernel: sp : beb367f0 ip : 401278d4 fp : 0007c314
May 18 13:30:48 kernel: r10: 0007c2ec r9 : 0007c2ec r8 : 0007bab8
May 18 13:30:48 kernel: r7 : 0000000b r6 : 00069afb r5 : 0007c314 r4 : 0007c2bc
May 18 13:30:48 kernel: r3 : 0007aff4 r2 : 0007c2ec r1 : 0007c2bc r0 : fffffff4
May 18 13:30:48 kernel: Flags: nzCv IRQs on FIQs on Mode USER_32 ISA ARM Segment user
May 18 13:30:48 kernel: Control: 10c53c7d Table: 0729004a DAC: 00000015
May 18 13:31:31 dnsmasq-dhcp[772]: failed to write /var/lib/misc/dnsmasq.leases: No space left on device (retry in 60s)
May 18 13:32:31 dnsmasq-dhcp[772]: failed to write /var/lib/misc/dnsmasq.leases: No space left on device (retry in 60s)
May 18 13:33:31 dnsmasq-dhcp[772]: failed to write /var/lib/misc/dnsmasq.leases: No space left on device (retry in 60s)
May 18 13:34:31 dnsmasq-dhcp[772]: failed to write /var/lib/misc/dnsmasq.leases: No space left on device (retry in 60s)
May 18 13:35:31 dnsmasq-dhcp[772]: failed to write /var/lib/misc/dnsmasq.leases: No space left on device (retry in 60s)
May 18 13:35:46 kernel: brctl/6954: potentially unexpected fatal signal 11.
May 18 13:35:46 kernel: Pid: 6954, comm: brctl
May 18 13:35:46 kernel: CPU: 1 Tainted: P (2.6.36.4brcmarm #1)
May 18 13:35:46 kernel: PC is at 0x401878e0
May 18 13:35:46 kernel: LR is at 0x21708
May 18 13:35:46 kernel: pc : [] lr : [] psr: 20000010
May 18 13:35:46 kernel: sp : bede4810 ip : 401878d4 fp : 0007c2cc
May 18 13:35:46 kernel: r10: 0007c2a4 r9 : 0007c2a4 r8 : 0007bab8
May 18 13:35:46 kernel: r7 : 0000000b r6 : 00069afb r5 : 0007c2cc r4 : 0007c294
May 18 13:35:46 kernel: r3 : 0007aff4 r2 : 0007c2a4 r1 : 0007c294 r0 : fffffff4
May 18 13:35:46 kernel: Flags: nzCv IRQs on FIQs on Mode USER_32 ISA ARM Segment user
May 18 13:35:46 kernel: Control: 10c53c7d Table: 07e6c04a DAC: 00000015
May 18 13:36:31 kernel: ATE/7031: potentially unexpected fatal signal 11.
May 18 13:36:31 kernel: Pid: 7031, comm: ATE
May 18 13:36:31 kernel: CPU: 1 Tainted: P (2.6.36.4brcmarm #1)
May 18 13:36:31 kernel: PC is at 0x400c77a8
May 18 13:36:31 kernel: LR is at 0x4001a4cc
Yes. Same here.
I keep getting that dnsmasq error, RAM is about 95% full, and both CPUs are over 85%. A reboot fixes it, but not long after, it gets bunged up and requires a reboot again.
I’m using lots of features on my router: DNS over TLS, DDNS, VPN IKEv2. I’m wondering if having all these features is causing the issues.
I had the same EXACT issue. Same log errors and all that you published. I flashed the Merlin firmware and that fixed my router. Looking at the syslog after the Merlin update, there was nothing complaining about nvram space or anything, even several hours after. For anyone with this similar issue, that would be my recommended fix. Just google “merlin asuswrt”, find the firmware for your Asus router, and upgrade it through the web GUI.
Glad I just flashed Fresh Tomato last week to my AC68U. I have a second AC68U that is currently not being used (and therefore not updated). I wonder if the issue will present itself.
Same issue started yesterday. I was going to swap out my RT-AC1900P today, but I googled the issue and found that the most recent firmware fixed it, so I upgraded the firmware on my 3 other routers and on a friend’s RT-AC68U that had the same problem. Installing 3.0.0.4.386_51665 seems to have fixed all the routers.
I hope Asus comes clean with what caused the problem, but that might just be wishful thinking.
Oh wow, thank you all for providing so many more data points! Between the comments here and Hacker News and everywhere else, it’s clear that this affected a lot of people.
Looks like this morning at 4:42 AM PDT my RT-AC86U downloaded a new chknvram: chknvram2021083002. Interesting…the date in the filename is almost 2 years ago, whereas the previous ones had a very recent date. So clearly ASUS has been doing something on their end about this too. Anyway, no problems and I’m still up and running with no issues.
Also the auto_firmware_check lines in syslog match up perfectly with that timestamp, so it’s all starting to make sense — it’s looking like that is what automatically downloads the chknvram files.
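The date in these filenames is what makes the 2021 one stand out. The Python sketch below splits names like `chknvram2023051601` into a prefix, a date, and an optional two-digit suffix. Reading the trailing two digits as a same-day revision number is purely an assumption, and `parse_sig_name` is a made-up helper:

```python
import re
from datetime import date

# Assumed layout: lowercase prefix + YYYYMMDD + optional 2-digit suffix.
NAME_RE = re.compile(r"^([a-z]+)(\d{4})(\d{2})(\d{2})(\d{2})?$")


def parse_sig_name(name: str):
    """Return (prefix, date, suffix-or-None), or None if the name doesn't fit."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    prefix, year, month, day, rev = m.groups()
    return prefix, date(int(year), int(month), int(day)), int(rev) if rev else None


for n in ["chknvram20230516", "chknvram2023051601", "chknvram2021083002"]:
    prefix, day, rev = parse_sig_name(n)
    print(f"{n}: {prefix} dated {day.isoformat()}, suffix {rev}")
```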
Wow, I thought I only had this problem. Contacted my ISP and they checked on their end and everything was fine. So I reset the router and it seems to be normal now.
I had the same problem yesterday (5/16). Finally called my ISP and they asked if I had an ASUS router. The tech support rep didn’t directly recommend anything, but mentioned firmware, so I checked and saw that I was a few updates behind. If it ain’t broke…, so I don’t update my firmware all the time. Updated without issue and it was still up this morning. I have an AC86U router as well.
Same issue here since yesterday morning, even running the latest firmware version on an RT-AC5300.
All Wi-Fi SSIDs show up, but most of my equipment, including wired devices, was not receiving IPs.
An electrical reset fixed the issue for a short period of time. The next morning the same thing happened again. I looked further and saw that the error “dnsmasq-dhcp[425]: failed to write /var/lib/misc/dnsmasq.leases: No space left on device (retry in 60s)” appeared several times, and that the CPU and memory were going into frenzy mode.
I did a factory reset and reconfigured. So far so good; I’ve managed to keep my connection for more than 20 minutes.
Of course, upgrading or downgrading will wipe and recreate the nvram file. What I’m afraid of is that it’s not a permanent fix: the exploit could occur again until it’s patched.
I have an RT-AX92U and got similar logs to yours, but focused on the DST issue, and got it working after a firmware update and changing my NTP server from pool.ntp.org (which was having issues for my end anyway) to time.google.com which uses smear for DST.
Same thing here. ASUS technical support is IMPOSSIBLE to contact. Thanks so much for this post. I had a different file, /jffs/asd/blockfile20230510 and I have been having problems a few days ago, but yesterday the router wouldn’t stay up more than an hour or so.
I just downloaded and installed the latest firmware; I hope it holds.
I’ll be looking for a different router, NOT ASUS. This is not the first problem I have had with their router.
The lead developer of Asuswrt-Merlin says on Twitter that older versions of the Merlin firmware could also be affected. He says that stock firmware 388 and recent 386_51xxx versions aren’t affected by the issue. This lines up with my experience: the problem went away as soon as I upgraded to 386_51529.
https://twitter.com/RMerlinDev/status/1659219112780873729
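That rule of thumb can be written as a quick check. The Python sketch below encodes one reading of the tweet: stock 388 builds, or 386 builds numbered 51xxx and up, are reported as not affected. It is not an official compatibility list, and it only understands stock version strings like the ones quoted in this thread. `reported_unaffected` is a hypothetical helper. A `False` result means "not covered by that statement", not "definitely affected":

```python
import re

# Branch/build pair at the end of strings such as
# "3.0.0.4.386.48260", "3.0.0.4.386_51529" or "9.0.0.4_388_32094".
VERSION_RE = re.compile(r"(\d{3})[._](\d{5})(?!\d)")


def branch_and_build(version: str) -> tuple:
    m = VERSION_RE.search(version)
    if m is None:
        raise ValueError(f"unrecognized stock version string: {version!r}")
    return int(m.group(1)), int(m.group(2))


def reported_unaffected(version: str) -> bool:
    """True if the version falls under '388, or recent 386_51xxx' (as read above)."""
    branch, build = branch_and_build(version)
    return branch >= 388 or (branch == 386 and build >= 51000)


for v in ["3.0.0.4.386.48260", "3.0.0.4.386_51529", "3.0.0.4.386_49703", "9.0.0.4_388_32094"]:
    print(v, reported_unaffected(v))
```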
Sounds like I should be damn glad I run FreshTomato on my Asus router. I took a look at the stock Asus f/w when I bought the router and didn’t waste any time putting FreshTomato on it..
Well now I kind of feel like an idiot for having gone out and purchased a new mesh network. Both my 86Us died yesterday morning and I thought I was losing my mind – how do two perfectly good routers die on the SAME DAY?? And now I know. Kind of pissed that I spent money when there was a solution, but at least I’m on wifi6 now, so yay?
Update:
After my initial post it took 20 minutes and back to problems again.
However, around 6h ago something happened with my router:
May 18 13:47:35 rc_service: httpd 636:notify_rc start_webs_update
May 18 13:47:43 rc_service: httpd 636:notify_rc start_sig_check
There’s a ton more in the logs at that time, but does this mean that ASUS sent out a fix for the problem?
Since then, my router has been at 56% RAM usage and 0-5% CPU usage…
About 11:00 p.m. MDT yesterday 5/17/2023, a firmware update 23285 did resolve the issues. (I used ASUS app for 2 mesh XT-8 Zens.)
Like everyone, this is infuriating and to this point, not a whit that I can find from ASUS owning the issue and reassuring folks about some future proofing | what to do | repeating how to save settings. I’m fairly certain I’m done with them. Just a sterling example of HOW NOT TO MANAGE A FAILURE PROBLEM. Own it, explain it, help folks prepare & fix. Basic stuff.
Just got off chat with Asus. They are telling me there was a server configuration change early morning 17 MAY 2023 that caused this issue and impacts nearly all Asus routers. I asked the clarifying question on whether there is a server configuration outside my network that impacts my home router and they said yes there is. My router has auto firmware download/install turned off. Not sure what else can be pushed to the router. I was given a FAQ on how to perform a factory reset. If they pushed something to the router, that would be a way to back that out, I guess. I cannot imagine a server config that doesn’t push something to the device impacting the router like this. They are still working on the issue at 11:00 AM PDT 18 MAY 2023 and no ETA was available. I’ve noticed my router has stabilized. I was one of first to report the issue since I called at 6:00 AM PDT yesterday.
Given the error of “Invalid string” and Ron’s interaction with Asus support, I wonder if Asus pushed/listed a malformed URI for the update check, and the affected firmware version(s) didn’t perform proper validation before storing it to an NVRAM variable where the asd process choked on it later?
I had the same as everyone else. I ended up doing a Factory Reset on the main router and the 2 mesh nodes. Had to set it all up from scratch. I did notice that the signatures were updated at 12:14pm today Central Time. Not sure if they are new or that was just the last Factory Reset time.
Crossing my fingers that I don’t see the issue return in 2-4 hours again…
Same issue as everyone else… root FS is full and no leases are distributed… only a reboot helps, for some hours…
Started 18.05 during the night.
The current ‘fix’ is to downgrade the firmware to the version at this link:
https://dlcdnets.asus.com/pub/ASUS/wireless/RT-AC68U/FW_RT_AC68U_300438651255.zip?model=RT-AC68U
[Per ASUS chat support session ending at 12:32:46 PT on 2023-05-18]
Source: https://news.ycombinator.com/item?id=35993425
I found the solution below, which seems to work until ASUS fixes it with a firmware upgrade…
UPDATE: so far, here is what I did, with positive results for the last hour or so:
1.) From the command line, simply do:
rm /jffs/asd/chknvram20230516
and then type
exit
to close
then reboot
2.) after reboot
free && sync && echo 3 > /proc/sys/vm/drop_caches && free
and finally
3.)
kill -SIGSTOP $(ps | grep '[a]sd$' | awk '{print $1}')
We’ll see how it goes until a fix is pushed by ASUS.
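Here is what step 3's pipeline does. It lists processes with `ps` and keeps the lines ending in `asd`; the `[a]sd` bracket is the usual trick for keeping grep from matching itself. It then takes the PID column and passes it to `kill -SIGSTOP`, which pauses asd rather than killing it. The rough Python equivalent below reads /proc/<pid>/comm instead of parsing `ps`. It is Linux-only, `pids_by_comm` and `pause_processes` are illustrative names, and matching the exact process name is deliberately stricter than grep's suffix match:

```python
import os
import signal


def pids_by_comm(name: str, proc: str = "/proc") -> list:
    """Return PIDs whose /proc/<pid>/comm is exactly `name` (Linux only)."""
    pids = []
    try:
        entries = os.listdir(proc)
    except OSError:
        return pids  # no procfs available (e.g. not Linux)
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited between listdir() and open()
    return sorted(pids)


def pause_processes(name: str) -> list:
    """Send SIGSTOP (like `kill -SIGSTOP`) to every matching process; asd needs root."""
    pids = pids_by_comm(name)
    for pid in pids:
        os.kill(pid, signal.SIGSTOP)
    return pids
```

A stopped process stays frozen until it receives SIGCONT or the router reboots.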
Same problem with RT-N19. Russia. Firmware 3.0.0.4.382_52488 from 2021/01/19. Woke up, noticed phone not connected to wi-fi. Wireless, WAN and power LEDs were on on the router. Network is visible from devices, can’t connect. Rebooted by power. After about 20 minutes lost connection again. Restarted again and it works ok since then.
From logs see router went down at about 2023-05-18 00:05:00 UTC. asd.log is filled with “1684394042[chknvram_action] Invalid string”
It would be really great if ASUS would put a comment up on their support page: https://www.asus.com/support/ with some kind of status, instead of dead air…
“fimrware”? No wonder they’re getting errors about Strings…
Thanks for this post. Thanks to you I got back on line very quickly.
And I REALLY agree with avantdude that ASUS should have put up a comment on their support page. I have a lot of ASUS products, and always believed the higher price tag was worth it. Still do, but my confidence is shaken.
I manage a few ASUS RT-ACRH13’s for people who apparently suffered through the problem in silence. The cool part is, the routers stayed up throughout and recovered spontaneously; they’re still up at the time of writing. It looks like this was a repeated OOM (out-of-memory) kill of the `asd` process, probably caused by an invalid or unexpected response from the firmware update server:
https://gist.github.com/slingamn/01252fe2a74cc89e598149fdc124c652
I guess ASUS fixed the response/files on the server side? I have:
admin@RT-ACRH13:/tmp/home/root# ls /jffs/asd
blockfile20230510 chknvram20230518
Wow, I’m glad I use a router that’s actually well supported by the manufacturer, like a decade old Fritz!box.
I spoke to Asus support today – they told me that resolution should be applied to all routers – only 1 thing has to be performed by user – hard reset of device and all should be fine now
Hard reset, AFTER backing up config and, if you don’t have an ethernet port on your device (iPad and such), taking a picture of the sticker on the bottom so you can log back into the router with the default wireless info, and then uploading the config again. Which is so easy for little old ladies who bought this router and then paid for someone to come in and config it the first time. This is a clusterf of the first degree. For so many companies… ASUS owes bigtime!
I updated the firmware; it worked for a day and is now just completely unresponsive; my desktop on a wired connection doesn’t even recognize that it’s connected to anything, though the router lights up.
I may try a factory reset, but this is making me think that I should be looking at a different brand.
Asus has responded.
Last Update: 2023/05/19 03:23
https://www.asus.com/support/FAQ/1050466
I love that they completely neglect the whole part of attaching to the router after it is reset, which makes it seem so trivial. Truly the definition of 1/2 ass support effort and documentation. Come on ASUS, act like professionals instead of Midnight Engineering!!
For fun, I extracted the asd binary from the RT-AC86U firmware version 86.48260 (the version I was running that experienced the problem) and did some very rudimentary disassembly to try to understand it more. I don’t want to waste too much time looking at it in detail to try to discover the actual bug that caused it to flood that error to the log and eat up RAM, but I can say that the log rotation from /jffs/asd.log to /jffs/asd.log.1 is implemented in the same function that logs messages to asd.log. It simply checks if asd.log is greater than 102400 bytes, and if so, copies it to asd.log.1 (erasing any previous asd.log.1 contents) and then erases asd.log. After the log rotation, it finally appends the new message to asd.log. So in the worst case, asd.log.1 and asd.log would both be just over 102400 bytes.
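The rotation logic described above is simple enough to model. The Python sketch below follows that description: a 102400-byte threshold, a size check before each append, and `.1` overwritten on every rotation. `log_message` and `simulate_flood` are invented names, not functions in asd. Flooding the model with the `chknvram_action` error shows both files levelling off around the threshold instead of growing without bound:

```python
import os
import shutil
import tempfile

ROTATE_THRESHOLD = 102400  # bytes, the limit described above


def log_message(log_path: str, message: str, threshold: int = ROTATE_THRESHOLD) -> None:
    """Append one line, first rotating log_path to log_path + '.1' if it is too big."""
    if os.path.exists(log_path) and os.path.getsize(log_path) > threshold:
        shutil.copyfile(log_path, log_path + ".1")  # replaces any older .1 file
        open(log_path, "w").close()                  # erase the active log
    with open(log_path, "a", newline="\n") as f:
        f.write(message + "\n")


def simulate_flood(message: str, count: int) -> tuple:
    """Write `count` copies of `message`; return final sizes of asd.log and asd.log.1."""
    with tempfile.TemporaryDirectory() as d:
        log_path = os.path.join(d, "asd.log")
        for _ in range(count):
            log_message(log_path, message)
        rotated = log_path + ".1"
        return (os.path.getsize(log_path),
                os.path.getsize(rotated) if os.path.exists(rotated) else 0)


print(simulate_flood("1684335272[chknvram_action] Invalid string", 20000))
```

Because the size check runs before the append, neither file can exceed the threshold by more than one line.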
So overall, I don’t think the asd.log file was filling up the 47 MB JFFS2 partition at all. The log rotation ensures that neither it nor asd.log.1 can grow too big. It definitely would have wasted some of the life of the NAND flash, though.
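A quick back-of-the-envelope bound makes the point concrete. Here m is my own symbol for the length in bytes of the largest single log message, assumed to be small compared to the 102400-byte threshold:

```latex
\text{worst case} = 2\,(102400 + m)\ \text{bytes} \approx 204800\ \text{bytes} \approx 0.2\ \text{MB},
\qquad \frac{0.2\ \text{MB}}{47\ \text{MB}} \approx 0.4\%
```

So the two log files together could occupy well under 1% of the partition.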
The chknvram_action function was probably caught in an infinite loop due to poor error handling logic. Based on a cursory look at a disassembly of that function (I reserve the right to be totally wrong about this), I think it could allocate memory with strdup() that is never free()’d in the error case. Combined with an infinite loop, that would fill up the RAM.
As far as I can tell based on additional hasty disassembly, chknvram_action was completely redone in the new asd in the 86.51xxx firmwares (actually it’s now in libasd.so). It doesn’t even have code that prints out that same error message to the log anymore.
I just followed ASUS’s instructions, and it fixed my ASUS router perfectly. I had issues for 3 days and restarting did not help. My household was getting very agitated because we had to actually communicate with each other. lol.
Interruption in Router Product Connectivity and Urgent Mitigation Measures
Dear Valued Customer,
During routine security maintenance, our technical team discovered an error in the configuration of our server settings file, which could potentially cause an interruption in network connectivity on part of the routers.
• Our technical team has urgently addressed the server issue and impacted routers should return to normal operation. If your device was affected, we recommend the following:
1. Manually reboot your router
2. If rebooting does not resolve the issue, please save the settings file, perform a hard reset (factory default), and then re-upload the settings file (follow the directions in the https://www.asus.com/support/FAQ/1050464)
3. If you cannot access the user interface to save settings or perform a reset, you can press the RESET button for about 5-10 seconds until the power LED indicator on the router starts to blink, which means the reset is completed.
https://www.asus.com/support/FAQ/1000925/#m2
If there are any further developments around this issue, we will immediately update our users.
We deeply apologize for any inconvenience this incident may have caused and are committed to preventing such an incident from happening again.
For any further inquiries about your ASUS router, please contact our customer service for support.
USA & Canada Hotline: 1-812-282-2787
Support site ASUS ROG
Thank you for your understanding and thank you for choosing ASUS.
[…] users were left in the dark and tried to figure out on their own what […]
Thank you for writing this detailed, timely article.
I faced the same issue and upgraded the firmware to 3.0.0.4.388_23285-g5068da5 on all 3 nodes of my mesh network (ASUS ZenWiFi AX, 3 nodes). The router worked fine for a day, and now both satellite nodes just show a yellow status light (Weak backhaul connection to router).
I logged into the router. top looks normal. I’m getting these errors every 3 minutes:
May 20 07:17:31 roamast: determine candidate node [50:EB:F6:73:B2:34](rssi: -31dbm) for client [D2:69:8A:3D:88:CC](rssi: -64dbm from client)(rssi: -75dbm from ap) to roam
May 20 07:17:31 roamast: Roam a client [D2:69:8A:3D:88:CC], status [0]
May 20 07:17:31 wlceventd: wlceventd_proc_event(520): eth5: Disassoc D2:69:8A:3D:88:CC, status: 0, reason: Disassociated because sending station is leaving (or has left) BSS (8), rssi:0
May 20 07:17:31 wlceventd: wlceventd_proc_event(520): eth5: Disassoc D2:69:8A:3D:88:CC, status: 0, reason: Disassociated because sending station is leaving (or has left) BSS (8), rssi:0
May 20 07:17:58 wlceventd: wlceventd_proc_event(503): wl0.1: Deauth_ind 00:35:FF:28:94:6D, status: 0, reason: Deauthenticated because sending station is leaving (or has left) IBSS or ESS (3), rssi:0
May 20 07:18:14 wlceventd: wlceventd_proc_event(539): wl0.1: Auth 00:35:FF:28:94:6D, status: Successful (0), rssi:0
May 20 07:18:14 wlceventd: wlceventd_proc_event(568): wl0.1: Assoc 00:35:FF:28:94:6D, status: Successful (0), rssi:-38
May 20 07:20:13 wlceventd: wlceventd_proc_event(520): eth5: Disassoc BC:D0:74:4E:A0:68, status: 0, reason: Disassociated because sending station is leaving (or has left) BSS (8), rssi:0
May 20 07:20:13 wlceventd: wlceventd_proc_event(520): eth5: Disassoc BC:D0:74:4E:A0:68, status: 0, reason: Disassociated because sending station is leaving (or has left) BSS (8), rssi:0
Like everyone else, I’m having the same issue. I have a ZenWiFi AX Mini wireless mesh connected to my ISP’s modem. Any advice on how to upload the latest firmware?
My ASUS Router App says I have 3.0.0.4.386_49599. Is that the latest version?
RT-AC1900P here. Same problem occurred. I didn’t know about SSHing into the device and thought my provider was playing tricks on my connection.
I did an update, and finally power cycled it. Seems to be working ok now.
[…] BrutePrint attack https://arstechnica.com/information-technology/2023/05/hackers-can-brute-force-fingerprint-authentication-of-android-devices/
Wemo SmartPlug buffer overflow vuln https://sternumiot.com/iot-blog/mini-smart-plug-v2-vulnerability-buffer-overflow/
Ford EVs to adopt Tesla charging standard https://www.caranddriver.com/news/a44016347/ford-tesla-ev-charging-opinion/
ASUS pushes bad config file to millions of routers; breaks internet for two days https://www.asus.com/support/FAQ/1050466 https://www.downtowndougbrown.com/2023/05/what-happened-with-asus-routers-this-morning/ https://www.asuswrt-merlin.net/ […]