Day 25
Ramdump. Telnet. Ramdump.
A keen eye would have noticed I already copied androidboot.ramdump=disable
from my previous port - it was already there from its Lineage build.
That didn’t do much if anything. But the answer needs to be in the kernel.
The question: why does the system reboot in a (special mode](android - How are ramdumps generated on a system crash? - Stack Overflow) when this device crashes?
Searching MAGIC, RAMDUMP etc in defconfig doesn’t do it.
Searching ramdump in kernel sources has some (many) hits. I need to filter them out, since some are RAM dumps from peripheral devices.
asusdebug.c
. This file is full of gems.
First, there’s an asdf
partition. That contains some asdf-logcat.txt
or logcat-crash.txt
but they are from weeks ago.
Idea 1: maybe I should mount it in recovery.
CONFIG_MACH_ASUS seems to guard some of the debugging, but I only have CONFIG_MACH_ASUS_SAKE. However obj-$(CONFIG_MACH_ASUS_SAKE) += asusdebug.o
. Wait, I do have CONFIG_MACH_ASUS=y
output in .config, I just don’t know from where.
Same asusdebug.c. There should be a /proc/asusdebug (it now contains just f2fs-attr:off
)
It seems to accept a command like echo get_asdf_log > /proc/asusdebug
So, mkdir /asdf && mount /dev/block/sda9 /asdf
then that echo.
That doesn’t seem to change the contents of the folder, unftortunately
echo slowlog > /proc/asusdebug
does produce ASUSSlowg-$(date)
logs though they might be from the current running system (something like a backtrace of all kernel threads.)
Breaking: Elros finds something in my dmesg from Day 23 and pings me to point out this change from: https://review.lineageos.org/c/LineageOS/android_device_xiaomi_sm6150-common/+/291494/4
Where ${ro.boot.usbcontroller}
is a600000.dwc3
in my case.
I do it, create the /init_enter_debug2
file, boot hybris-boot and LO AND BEHOLD!
New USB device is detected when I boot, new usb network device and I just need to sudo ifconfig enp62s0u1 192.168.2.14
it (twice, because NetworkManager overrides the first config) and telnet 192.168.2.15 2323
and
Welcome to the Mer/SailfishOS Boat loader debug init system.
\o/
I immediately use this new superpower to spawn another telnet where I run dmesg -w
and instruct the first session to continue the boot halted by the init_enter_debug2
file presence:
# echo "continue" >/init-ctl/stdin
And just as before, I am offered a ramdump screen (because droid-hal-init does this, after I “fixed” its quit because wlan.ko was missing).
The interactive telnet dmesg session has a headstart from the persisted journalctl and includes these lines in addition:
[ 75.080658] [ 75.080658] (CPU:4-pid:1020:systemd-udevd) [05:46:21.316141874] cs35l45 5-0030: cs35l45_set_sysclk: clk_id = 0, freq = 3072000
[ 75.080669] [ 75.080669] (CPU:4-pid:1020:systemd-udevd) [05:46:21.316151978] cs35l45 5-0030: cs35l45_set_sysclk: clk_id = 0, freq = 3072000, update PLL setting!!!
[ 75.084444] [ 75.084444] (CPU:5-pid:1020:systemd-udevd) [05:46:21.319927447] cs35l45 5-0030: Cirrus Logic CS35L45 (35a450), Revision: A0
[ 75.084574] [ 75.084574] (CPU:5-pid:1020:systemd-udevd) [05:46:21.320057030] gpio gpiochip0: (f000000.pinctrl): allocate IRQ 401, hwirq 2
[ 75.084580] [ 75.084580] (CPU:5-pid:1020:systemd-udevd) [05:46:21.320063436] gpio gpiochip0: (f000000.pinctrl): found parent hwirq 117
[ 75.084587] [ 75.084587] (CPU:5-pid:1020:systemd-udevd) [05:46:21.320070051] gpio gpiochip0: (f000000.pinctrl): alloc_irqs_parent for 401 parent hwirq 117
[ 75.084614] [ 75.084614] (CPU:5-pid:1020:systemd-udevd) [05:46:21.320097030] register_speaker_dai_name: register cs35l45 speaker name = cs35l45.5-0031
[ 75.087043] [ 75.087043] (CPU:0-pid:1020:systemd-udevd) [05:46:21.322527134] cs35l45 5-0031: num_fast_switch:3
[ 75.087061] [ 75.087061] (CPU:0-pid:1020:systemd-udevd) [05:46:21.322544114] cs35l45 5-0031: 0:cs35l45-spk-music.txt
[ 75.087070] [ 75.087070] (CPU:0-pid:1020:systemd-udevd) [05:46:21.322553436] cs35l45 5-0031: 1:cs35l45-spk-outdoor.txt
[ 75.087079] [ 75.087079] (CPU:0-pid:1020:systemd-udevd) [05:46:21.322562655] cs35l45 5-0031: 2:cs35l45-spk-voice.txt
[ 75.124684] [ 75.124684] (CPU:6-pid:1020:systemd-udevd) [05:46:21.360167186] cs35l45 5-0031: cs35l45_set_sysclk: clk_id = 0, freq = 3072000
[ 75.124692] [ 75.124692] (CPU:6-pid:1020:systemd-udevd) [05:46:21.360175259] cs35l45 5-0031: cs35l45_set_sysclk: clk_id = 0, freq = 3072000, update PLL setting!!!
[ 75.128460] [ 75.128460] (CPU:5-pid:1020:systemd-udevd) [05:46:21.363944009] cs35l45 5-0031: Cirrus Logic CS35L45 (35a450), Revision: A0
[ 75.170210] [ 75.170210] (CPU:7-pid:54:migration/7) [05:46:21.405693280] IRQ 14: no longer affine to CPU7
[ 75.245474] [ 75.245474] (CPU:2-pid:1035:systemd-udevd) [05:46:21.480958072] [FTS_TS]fts_get_ic_information: Enter
[ 75.246339] [ 75.246339] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481822186] [FTS_TS]fts_get_chip_types:verify id:0x5652
[ 75.246351] [ 75.246351] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481834218] [FTS_TS/I]fts_get_ic_information:get ic information, chip id = 0x5652
[ 75.246359] [ 75.246359] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481842707] [FTS_TS]fts_get_ic_information: Exit(372)
[ 75.246373] [ 75.246373] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481856822] [FTS_TS/I]fts_create_apk_debug_channel:Create proc entry success!
[ 75.246409] [ 75.246409] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481892759] [FTS_TS/I]fts_create_sysfs:[EX]: sysfs_create_group() succeeded!!
[ 75.246422] [ 75.246422] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481905884] [FTS_TS]fts_ex_mode_init:create sysfs(ex_mode) succeedfully
[ 75.246442] [ 75.246442] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481924999] [FTS_TS/I]asus_create_sysfs:[EX]: asus_create_group() succeeded!!
[ 75.246463] [ 75.246463] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481946509] [FTS_TS/I]asus_game_create_sysfs:[EX]: asus_create_group() succeeded!!
[ 75.246471] [ 75.246471] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481954843] [FTS_TS]fts_gesture_init: Enter
[ 75.246484] [ 75.246484] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481967030] [FTS_TS]fts_gesture_init: Exit(462)
[ 75.246491] [ 75.246491] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481974895] [FTS_TS]asus_gesture_init: Enter
[ 75.246505] [ 75.246505] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481988541] [FTS_TS]asus_gesture_init: Exit(577)
[ 75.246513] [ 75.246513] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481996822] [FTS_TS][TEST]fts_test_init: Enter
[ 75.246521] [ 75.246521] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482004791] [FTS_TS/I][TEST]fts_test_func_init:init test function
[ 75.246531] [ 75.246531] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482014270] [FTS_TS/I][TEST]fts_test_func_init:match test function,type:88
[ 75.246541] [ 75.246541] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482024530] [FTS_TS][TEST]fts_test_init:sysfs(test) create successfully
[ 75.246549] [ 75.246549] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482032707] [FTS_TS][TEST]fts_test_init: Exit(2165)
[ 75.246571] [ 75.246571] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482054478] [FTS_TS/I]fts_irq_registration:irq:399, flag:2002
[ 75.246860] [ 75.246860] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482343645] [FTS_TS/I]fts_fwupg_init:fw upgrade init function
[ 75.246884] [ 75.246884] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482367082] [FTS_TS]fts_ts_probe_entry: Exit(1928)
[ 75.246893] [ 75.246893] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482376457] [FTS_TS/I]fts_ts_probe_entry:FOD location 6560 10720 26096 30256
[ 75.246901] [ 75.246901] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482384843] [FTS_TS/I]fts_ts_probe:Touch Screen(I2C BUS) driver prboe successfully
[ 75.247001] [ 75.247001] (CPU:3-pid:354:wk:fts_fwupg_w) [05:46:21.482484686] [FTS_TS/I]fts_fwupg_work:fw upgrade work function
[ 75.247021] [ 75.247021] (CPU:3-pid:354:wk:fts_fwupg_w) [05:46:21.482504791] [FTS_TS]fts_fwupg_get_fw_file:get upgrade fw file
[ 75.247030] [ 75.247030] (CPU:3-pid:354:wk:fts_fwupg_w) [05:46:21.482513489] [FTS_TS]fts_get_fw_file_via_i:fts_get_fw_file_via_i
[ 75.247038] [ 75.247038] (CPU:3-pid:354:wk:fts_fwupg_w) [05:46:21.482521926] [FTS_TS/I]fts_fwupg_get_fw_file:upgrade fw file len:102100
[ 75.247047] [ 75.247047] (CPU:3-pid:354:wk:fts_fwupg_w) [05:46:21.482530884] [FTS_TS/I]fts_fwupg_auto_upgrade:********************FTS enter upgrade********************
[ 75.247054] [ 75.247054] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482538072] [FTS_TS]fts_ts_init: Exit(2240)
[ 75.247065] [ 75.247065] (CPU:3-pid:354:wk:fts_fwupg_w) [05:46:21.482548020] [FTS_TS/I]fts_fwupg_upgrade:fw auto upgrade function
[ 75.248194] [ 75.248194] (CPU:6-pid:354:wk:fts_fwupg_w) [05:46:21.483677186] [FTS_TS/I]fts_wait_tp_to_valid:TP Ready,Device ID:0x5652
[ 75.248201] [ 75.248201] (CPU:6-pid:354:wk:fts_fwupg_w) [05:46:21.483683957] [FTS_TS/I]fts_fwupg_check_fw_valid:tp fw vaild
[ 75.248209] [ 75.248209] (CPU:6-pid:354:wk:fts_fwupg_w) [05:46:21.483692759] [FTS_TS/I]fts_fwupg_get_ver_in_host:fw version offset:0x10e
[ 75.248628] [ 75.248628] (CPU:4-pid:354:wk:fts_fwupg_w) [05:46:21.484111145] [FTS_TS/I]fts_fwupg_need_upgrade:fw version in tp:84, host:84
[ 75.248636] [ 75.248636] (CPU:4-pid:354:wk:fts_fwupg_w) [05:46:21.484119791] [FTS_TS/I]fts_fwupg_upgrade:fw upgrade flag:0
[ 75.248642] [ 75.248642] (CPU:4-pid:354:wk:fts_fwupg_w) [05:46:21.484124999] [FTS_TS/I]fts_fwupg_auto_upgrade:**********tp fw(app/param) no upgrade/upgrade success**********
[ 75.248646] [ 75.248646] (CPU:4-pid:354:wk:fts_fwupg_w) [05:46:21.484129634] [FTS_TS/I]fts_fwupg_auto_upgrade:********************FTS exit upgrade********************
However, still no crash reason per se.
Now I can disable(mask?) droid-hal-init service, and interactively start it from telnet, maybe that gives me more control.
At least I could single it out as being the culprit.
But I would still need the details of the kernel crash.
That may need a better understanding of asusdebug.c
and its related changes to printk.c
or a revert of them.
Also I still don’t know how the ramdump screen is triggered (other than a Stackoverflow post) and how to use it in my advantage.
Let’s go with the disablement idea: in chroot through ADB, I
# systemctl mask droid-hal-init.service
Created symlink /etc/systemd/system/droid-hal-init.service → /dev/null.
I reboot, connect two telnet sessions, one with journalctl -f
(instead of dmesg -w) and the other used to echo “continue” to /init-ctl/stdin
Somehow this time I get disconnected but the device is not rebooted.
Journal last logs show systemd-udevd
activity like link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
.
It is not related to usb0, but that may be because it disconnects by that time. Maybe my udev rules are stepping on my toes here.
Checking the journal on the device (in adb shell chroot) shows some lines after the disconnect:
probably usb-moded
Mar 15 14:56:56 Zenfone8 systemd[1]: Started udev Wait for Complete Device Initialization.
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting usb-moded USB gadget controller...
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting ohm daemon for resource policy management...
Mar 15 14:56:56 Zenfone8 systemd[1]: Reached target System Initialization.
Mar 15 14:56:56 Zenfone8 systemd[1]: Listening on OpenSSH Server Socket.
Mar 15 14:56:56 Zenfone8 systemd[1]: Reached target Sockets.
Mar 15 14:56:56 Zenfone8 usb_moded[4761]: usb_moded 0.86.0+mer56 starting
Mar 15 14:56:56 Zenfone8 systemd[1]: Started Daily Cleanup of Temporary Directories.
Mar 15 14:56:56 Zenfone8 systemd[1]: Started Reclaim memory once per day and on boot.
Mar 15 14:56:56 Zenfone8 systemd[1]: Reached target Timers.
Mar 15 14:56:56 Zenfone8 systemd[1]: Started Wayland path watcher.
Mar 15 14:56:56 Zenfone8 systemd[1]: Reached target Paths.
Mar 15 14:56:56 Zenfone8 systemd[1]: Reached target Basic System.
Mar 15 14:56:56 Zenfone8 systemd[1]: Started droid-late-start.
Mar 15 14:56:56 Zenfone8 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=droid-late-start comm="systemd"
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Indicate boot is done...
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Bluetooth service...
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Application Permission Management Daemon...
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Oneshot stuff for root...
Mar 15 14:56:56 Zenfone8 usb_moded[4761]: CONFIGFS detected
Mar 15 14:56:56 Zenfone8 kernel: [ 75.113388] (CPU:2-pid:4761:usb_moded) [12:56:56.971053020] configfs-gadget a600000.dwc3: unregistering UDC driver
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Login Service...
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Telephony service...
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Nemo device lock daemon...
Mar 15 14:56:56 Zenfone8 systemd[1]: Started Disk quota netlink message daemon.
Mar 15 14:56:56 Zenfone8 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=quota_nld comm="systemd" exe="/
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Reclaim memory...
Mar 15 14:56:56 Zenfone8 usb_moded[4761]: /config/usb_gadget/g1/functions/rndis_bam.rndis: mkdir failed: No such file or directory
Mar 15 14:56:56 Zenfone8 usb_moded[4761]: /config/usb_gadget/g1/functions/rndis_bam.rndis/ethaddr: can't open for writing: No such file or directory
Mar 15 14:56:56 Zenfone8 usb_moded[4761]: /config/usb_gadget/g1/functions/rndis_bam.rndis/wceis: can't open for writing: No such file or directory
Mar 15 14:56:56 Zenfone8 systemd-logind[4773]: New seat seat0.
Mar 15 14:56:56 Zenfone8 kernel: [ 75.122581] (CPU:2-pid:4761:usb_moded) [12:56:56.980245884] Mass Storage Function, version: 2009/09/11
Mar 15 14:56:56 Zenfone8 kernel: [ 75.122598] (CPU:2-pid:4761:usb_moded) [12:56:56.980262238] LUN: removable file: (no medium)
Mar 15 14:56:56 Zenfone8 kernel: [ 75.122648] (CPU:2-pid:4761:usb_moded) [12:56:56.980312342] file system registered
Mar 15 14:56:56 Zenfone8 usb_moded[4761]: Unable to find $power_supply device.
Mar 15 14:56:56 Zenfone8 usb_moded[4761]: hwal init failed
Let’s mask usb-moded too. Great, I don’t get disconnected now, and can inspect the running system.
I notice that odm is mounted as /odm_root. Is there anything useful there that could prevent droid-hal-init from starting?
Also “vendor-vm\x2dsystem.mount: Mount process exited, code=exited status=32”. So does “asusfw”, and “FFS”
pulseaudio[5126]: library “libdl_android.so” not found
This is usually fixed by linking the lib in /odm/lib[64]
And let’s see about that droid-hal-init.
If I didn’t write this till now, droid-hal-init is actually “init”, but from Android. We are executing this as non-0 PID service (with some patches so it ignores that it is not PID=1).
I didn’t peek into its sources before, but I got more familiar with this occasion.
On Android 11 at least, there is a “first stage init” that determines if its’ recovery or not, and if not it loads kernel modules.
Then it executes himself with “selinux_setup” as argument.
If that succeeds, then it executes himself with “second_stage”.
And it actually starts executing /system/etc/init.rc
The hybris patches change a bit of that, but in essence you still get the same steps.
Speaking of selinux, maybe this is why it sends us to the ramdump screen. Or the failed mounts.
So I at have to check:
- failed mounts
- selinux files
- odm links
1.failed mounts.
# systemctl status vendor-vm\\x2dsystem.mount
mount: /vendor/vm-system: wrong fs type, bad option, bad superblock on /dev/sde44, missing codepage or helper program
sde44
, the device previously known as /dev/block/bootdevice/by-name/vm-bootsys_b
. I don’t have notes on /vendor/vm-system being mounted in Lineage. It’s presence in the fstab may be a red herring.
Also a generous xxd -l 5120
shows only zeroes and there is no reference to either name(s) in the ASUS flashing script or list of images.
Same story for vendor/asusfw
aka sda16
.
find /usr/lib/systemd/system -iname \*.mount -exec grep FFS {} \; -print
Description=FFS mount
/usr/lib/systemd/system/dev-mtp.mount
Again, I’ll worry about this later. MTP is media transfer protocol.
2.selinux files.
HADK FAQ says I should copy some files.
They’re from /vendor/etc/selinux.
Unfortunately I don’t have ssh to copy them from the device in the droid-config-sake repo. I’ll make the folder structure on device and I’ll copy it later through ADB (If I don’t forget and boot into Lineage by accident)
As in the example repo, I:
- create
config
with the same contents, create minimum/context/files
folder, create minimum/context/dbus_contexts
with the same contents.
- copy
/vendor/etc/selinux/vendor_file_contexts
to minimum/contexts/files/files_contexts
- create ‘minimum/policy’ folder, copy… what. There’s no precompiled_sepolicy in /vendor/etc.
Fortuntely I soon find one in /odm_root/etc/selinux/precompiled_sepolicy
. Hopefully it plays the same role.
3.odm links
For now I create a link in /odm/etc pointing to /odm_root/etc
There’s another ueventd.rc file in there talking about firmwares for “trustedvm” that are on /vendor/vm-system
but I don’t have that so I don’t link it either.
4?
What about wireless? I am to curious not to check it out.
Not present in ifconfig -a
. It fails to modprobe qca_cld3_wlan
and dmesg says Reject WLAN Driver insmod before CBC
Kernel source above that error says:
/* If enabled Cold Boot Calibration is the 1st step in init sequence.
* CBC is done on file system_ready trigger. Qcacld should be loaded
* from init.target.rc after that. Reject qcacld load from
* vendor_modprobe.sh at early boot to satisfy this requirement.
*/
Wow, so I need some droid init sequence even for wlan driver. Hmm. find / -iname \*.rc -exec grep -i wlan {} \; -print
finds me this nicely commented piece of init:
# Enable WLAN cold boot calibration
write /sys/devices/platform/soc/b0000000.qcom,cnss-qca6490/fs_ready 1
Ok, modprobe again:
DMS QMI connection not established
Direct firmware load for wlan/qca_cld/WCNSS_qcom_cfg.ini failed with error -2
I do have /vendor/firmware/qca_cld/WCNSS_qcom_cfg.ini
QMI is probably Qualcomm MSM Interface. Wikipedia says oFono could help. But why this dependency in the wlan driver? VoWIFI?
The kernel source file with that error references “CNSS_QMI_DMS_CONNECTED” that is missing.
I’ll leave it for another day to see what DMS is.
Ok, now let’s start droid-hal-init.
Prepare three telnet sessions, one with journalctl -f, one ready to launch /usr/libexec/droid-hybris/system/bin/logcat
Aand… ramdump screen. I don’t have the chance to execute logcat.
At least this time I got fuller journal than before, when probably usb-moded kicked in earlier and journalctl didn’t sync to persisted?
There are a couple of ‘sending signal 9’ lines
The last service started is ‘/vendor/bin/vendor_modprobe.sh’.
Probably some .rc services need to be disabled.
Tomorrow.