Livecasting porting notes for Zenfone 8

Day 11:
Asked about gki1 on #sailfishos-porters - suggestion was I should adapt hadk/hybris/hybris-boot/Android.mk - just as I tried to do with ubports build.sh.

Meanwhile:
UBports pad was updated.
There are references to vbmeta emtpy hashes if system reboots into bootloader fast (as I had on Day 7 part 1?)
Also “console=tty0” is a must for cmdline and should not be removed no matter what.

Back to SFOS:
I continue to have a long discussion about gki/vendor_boot on #sailfishos-porters. Folks are very helpful but it is clear that this is new territory. Asus may have used a shiny new thing that was not compulsive for Android 11.

Creating an empty vbmeta meanwhile:

$ avbtool make_vbmeta_image --flags 2 --padding_size 4096 --output vbmeta_disabled.img

Flashing that vbmeta + hybris recovery. No telnet.
Flash asus vendor_boot
Flashing ubports boot/vendor_boot
Flashing back lineage boot/vendor_boot

nothing.

Per porters’ idea, I unpacked Lineage’s boot.img, replaced its ramdisk with the hybris-recovery one LZ4’d → stuck at ASUS logo, no sign of mer rndis/gadget either.
(It’s funny how the hybris-recovery differs from the hybris-boot ramdisk just by one byte, which is 1 as in ALWAYS_DEBUG=1)

Because the device won’t reboot in 60s and I have no telnet, it means that the init script does not reach the sleep 60 line either.

Added a sleep 30, followed by a reboot -f just before hotplug in case that hotplug line is the culprit.
Still stuck at ASUS logo.

It is however a lesson to me that you need to trust that the init shell script could be debugged. With nothing at hand but staggered sleeps and restarts. That is, with no other connection to the device.

Tomorrow I will probably add a restart even earlier…

1 Like

Day 12:
Let’s add a sleep 15 at the beginning of init shellscript.
Flashing hybris-recovery → reboot in about 10 seconds (or 14 if you count the ‘unlocked’ screen) but hardly 15.

Let’s add 25. The same reboot time, approx 14 (depends on how you count).

Wait, I forgot, I shouldn’t be flashing hybris-recovery, but Lineage boot with hybris recovery ramdisk:)
mkbootimg later, fastboot boot → ‘Failed to load/authenticate boot image: Load Error’
Ok, fastboot flash boot. Now, “familiar” stuck-at-asus-logo.

Wait, was it like this all the time when the hybris-recovery booted? 10-14s until reboot?

Let me grab an ‘old’ hybrys-recovery. Yes, it was…

Ok, maybe I should boot twrp just to have adb shell and check last_kmsg or equivalent.

fastboot boot twrp → ‘Failed to load/authenticate boot image: Load Error’
Reboot bootloader → fastboot boot twrp fishes, but … is stuck at asus logo ?
Let’s flash Lineage boot again just for sanity. Works :whew:

Nothing in pstore or last_kmsg though.

Let’s replace the kernel in the ubports boot/vendor_boot pair now, with the Lineage one.
Hmm… restart in under 20 seconds… which is different :slight_smile:

But… The problem remains - boot script does not seem to be followed.
The orignal recovery init is not shell script, but binary.
Searching for “kernel init shell script” finds me a reference to CONFIG_BINFMT_SHELL.
My config does not have this - though it might be enabled by default.
Let’s check with scripts/extract-ikconfig - yup, CONFIG_BINFMT_SHELL it is enabled.

So this is not the problem.

I remembered that elros said something about abootimg. I think he referred to using that to replace the kernel.
abootimg seems to be a software that would allow this, but the last commit is from 2012.
There are forks updated two months ago, like this one that support v1 and v2 images.
Of course, I’m on v3 so they might not apply. Let’s “table” this.

What about Lineage? how come it boots?
What if I replace my kernel in the lineage boot.img, would it still boot?
There is something weird when I do this - boot.img was 98M, the new boot.img is 54M only.
Just to set the record straight, the way I’m unpacking and repacking is:

unpack_bootimg --boot_img boot.img --out k --format mkbootimg > k.txt

then, cat-ing the k.txt

mkbootimg --header_version 3 --os_version 11.0.0 --os_patch_level 2023-02 --kernel k/kernel --ramdisk k/ramdisk --cmdline ‘’ -o boot-k.img

I remember to have seen this difference on another pack/repack, that is, when I did the replacement in the first direction earlier, Lineage kernel in ubports boot image. I did not see this difference (was 46M always) when replacing the Lineage kernel in my hadk recovery boot image though.

Maybe my mkbootimg is different than the one in lineage tree?
That was in lineage/out/soong/host/linux-x86/bin/mkbootimg as I’ve seen on Day 6.
The mkbootimg has no --version param, but just calling it with -h shows differences in output.

lineage/out/soong/host/linux-x86/bin/mkbootimg
usage: mkbootimg [-h] [--kernel KERNEL] [--ramdisk RAMDISK] [--second SECOND]
                 [--dtb DTB]
                 [--recovery_dtbo RECOVERY_DTBO | --recovery_acpio RECOVERY_ACPIO]
                 [--cmdline CMDLINE] [--vendor_cmdline VENDOR_CMDLINE]
                 [--base BASE] [--kernel_offset KERNEL_OFFSET]
                 [--ramdisk_offset RAMDISK_OFFSET]
                 [--second_offset SECOND_OFFSET] [--dtb_offset DTB_OFFSET]
                 [--os_version OS_VERSION] [--os_patch_level OS_PATCH_LEVEL]
                 [--tags_offset TAGS_OFFSET] [--board BOARD]
                 [--pagesize {2048,4096,8192,16384}] [--id]
                 [--header_version HEADER_VERSION] [--dt DT] [-o OUTPUT]
                 [--vendor_boot VENDOR_BOOT] [--vendor_ramdisk VENDOR_RAMDISK]
/usr/bin/mkbootimg
usage: mkbootimg [-h] [--kernel KERNEL] [--ramdisk RAMDISK] [--second SECOND] [--dtb DTB] [--recovery_dtbo RECOVERY_DTBO | --recovery_acpio RECOVERY_ACPIO]
                 [--cmdline CMDLINE] [--vendor_cmdline VENDOR_CMDLINE] [--base BASE] [--kernel_offset KERNEL_OFFSET] [--ramdisk_offset RAMDISK_OFFSET]
                 [--second_offset SECOND_OFFSET] [--dtb_offset DTB_OFFSET] [--os_version OS_VERSION] [--os_patch_level OS_PATCH_LEVEL] [--tags_offset TAGS_OFFSET]
                 [--board BOARD] [--pagesize {2048,4096,8192,16384}] [--id] [--header_version HEADER_VERSION] [-o OUTPUT] [--vendor_boot VENDOR_BOOT]
                 [--vendor_ramdisk VENDOR_RAMDISK] [--vendor_bootconfig VENDOR_BOOTCONFIG] [--gki_signing_algorithm GKI_SIGNING_ALGORITHM]
                 [--gki_signing_key GKI_SIGNING_KEY] [--gki_signing_signature_args GKI_SIGNING_SIGNATURE_ARGS] [--gki_signing_avbtool_path GKI_SIGNING_AVBTOOL_PATH]

But I just stubbed with a script the soong one with my host’s and it didn’t generate a smaller image for lineage.

Let’s check parameters too:
Intercepted command:

mkbootimg --kernel $kernel --ramdisk $ramdisk-recovery --os_version 11 --os_patch_level 2023-02-05 --header_version 3 --output boot.img
Unpackbootimg command:
mkbootimg --header_version 3 --os_version 11.0.0 --os_patch_level 2023-02 --kernel b/kernel --ramdisk b/ramdisk --cmdline ‘’ -o hybris-b.img

Differences are: other mkbootimg binary, missing day in OS patch level, version is dotted, cmdline is specified… weird that any of those would count.

Adapting the command-line for the hybris ramdisk + Lineage boot didn’t make a difference in size (and wasnt’ a difference from the get go).
Adapting the command-line for the ubport ramdisk + Lineage boot outputs the exat file as before, ~50M vs 98M without lineage.
No difference in the Lineage boot.img repacking either (still 54M instead of 98M)

Let’s diff the extracted kernel from a boot.img to the one originally packed by the scripts from ubports or Lineage build.
Using diff <(xxd ../debugging/lineage/boot/kernel) <(xxd out/target/product/sake/kernel) I see that they do differ:

< 00001090: 0200 0014 885e 1453 2900 01d0 2941 1c91 …^.S)…)A…

From an early offset, that is.
But remember, after the initial Lineage build, I’ve re-made the build with my mkbootimg spoofing script. Just because I re-made it may make it different,…

There’s also the fact that the kernel in the build tree is marked as executable. Maybe I should do that too?

Anyway, a thoght for the next day, this difference in boot img sizes… what about booting lineage with my hadk kernel?
Let’s try it anyway. (Doing that). Restarts to bootloader in like 40 seconds.
Maybe I forgot vendor_boot from ubports? Let’s flash Lineage’s vendor_boot to make sure.
Restart in about 20s…

Day 13:
Vendor boot partitions

Copy your device fstab into /first_stage_ramdisk in the vendor_boot partition, not the boot partition.

GKI-versioning
Idea: extrack-ikconfig from gki kernel and apply minimal patches to that to boot uboports

Note for self: Xiaomi 12x wouldn’t have used the v3 image since it has a 4.19 kernel, not a 5.4 :slight_smile:

Sanity check: take lineage boot.img, unpack it and repack it.
The image is different (52M vs 98M) but Lineage still boots. The same result with host mkbootimg and soong/out one.

The weird part is that taking commands sourced from Day 6, when I tried to see what mkbootimg commands were there in the Lineage build… they still output 52M. So it’s not the unpack step that makes the images smaller… what can it be?

Next thing to sanity try: a kernel change.
I am going to inspire from xperia 10 IV kernel minimal changes. First resetting to Lineage config, then picking this change and packing it in the HADK kernel then the kernel in the Lineage boot image.

20 mins later - It boots.

Let’s pick the next change now, which implies NotKit has made sure to not break the ABI.
20 mins again:
Hmm fastboot plays tricks again, “fastboot: error: Failed to identify current slot”.
Disconnecting and reconnecting the cable and rebooting to bootload. Flashed.

Lineage boots.
But it says there’s an internal problem with my device. And I should contact the manufacturer.

But it works. Progress! Let’s pick the next change by NotKit.
20 more mins:
Same fastboot dance cable as before.
Lineage logo appears, but the device does not finishes booting.

Actually… it does, but touchscreen does not work, screen flickers and red ligt flashes the notification light :slight_smile:
I would call it a success - my ubports or hadk builds don’t do that :slight_smile:

Now I would like to test this kernel with either hadk or ubports.
However, logically, the Lineage one has not worked before for this test. There is a slight possiblity that some of the changes make this kernel more friendly with our GNU/Linux.

8 mins later:
Making hybris-recovery and flashing that as boot → restart in under 10 seconds.
But I had a sleep 5 followed by restart in that init. Let’s make it 30.
8 mins again:
Fastboot cable/reboot dance.
It still restarts under 10s :frowning: so nothing says that the sleep value was taken into account.
Desperately hoping that I would find a last_kmsg or /sys/fs/pstore message, I try fastboot boot twrp but that locks up stuck at the asus logo too…

So checking my notes, it seems that 10-14s reboot has always been the case for hybris-boot.img. Under 10s is just depending on how you count.
So no progress on the front of “Lineage-like kernel, but with hybris init”.

Let’s see ubports boot/vendor_boot.
That is, the one where ubports ramdisk is on vendor.

This takes at most 20s to reboot, and does go through the ASUS logo - hybris-boot hadnt.
It is also the same result as in the previous day.

trying to boot / flash twrp to boot again:
“Any key to shutdown” says a 8-pt font on a 300+ppi screen (I got do use an actual “search icon” to see that - that is, an actual hand glass)
Flashing doesn’t work until a 2nd reboot, twrp starts, no last_kmsg or /sys/fs/pstore as ever.

So I’m back to square one. I don’t know how to boot the linuxes in this boot/vendor_boot world

Not even trying with my early hack to have halium ramdisk moved to first_stage_ramdisk doesn’t seem to make a difference (didn’t check the build output though [I did in hindsight, it was what I expected])

And now Lineage boot says “Can’t load Android system. Your data may be corrupt. If you continue to get this mssage, you may need to perform a factory reset and erase all user data stored on this device” [Factory Reset/Try again]?.

Hmm. Maybe I should have erased user data between all these experiments.
Try again.
Lineage vibrates, shows loading screen and “internal problem with your device” (that second-to-last kernel…) but does boot on second try.

4 Likes

Day 14:

Let’s read the recovery init sources, the ones which produce a binary file - that work with both kernels, as opposed to the busybox shell script which doesn’t.

Again I’m grepping lineage mk files for the recovery-image target

find . -iname *.mk -exec grep “recovery-image” {} ; -print

No results.

find . -iname *.mk -exec grep “ramdisk-recovery” {} ; -print

Only one out result.
.bp?

Nope.
Just ramdisk then. Many hits. Filering the out/ ones and goldfish/cuttulefish ones (a generic device?/the emulator?)

./vendor/lineage/build/tasks/kernel.mk
./build/make/core/envsetup.mk
./build/make/core/board_config.mk
./build/make/core/main.mk
./build/make/core/config.mk
./build/make/core/tasks/sdk-addon.mk
./build/make/target/product/gsi_keys.mk
./build/make/target/product/virtual_ab_ota.mk
./build/make/target/product/developer_gsi_keys.mk
./build/make/target/product/security/Android.mk
./device/asus/sake/device.mk
./system/core/CleanSpec.mk
./system/core/rootdir/avb/Android.mk
./system/core/rootdir/Android.mk
./system/core/init/Android.mk
./system/sepolicy/Android.mk

The first one I inspect, ./build/make/core/board_config.mk, references a debug_ramdisk thing I also have in out/ and which has this adb_debug.prop with the following interesting contents:

# Note: This file will be loaded with highest priority to override
# other system properties, if a special ramdisk with "/force_debuggable"
# is used and the device is unlocked.
# Disable adb authentication to allow test automation on user build GSI
ro.adb.secure=0
# Allow 'adb root' on user build GSI
ro.debuggable=1
# Introduce this property to indicate that init has loaded adb_debug.prop
ro.force.debuggable=1
Sounds very useful, but is related to adb debugging.

The next one, ./build/make/core/main.mk has some targets I could grep for later:

.PHONY: ramdisk
ramdisk: $(INSTALLED_RAMDISK_TARGET)
.PHONY: ramdisk_debug
ramdisk_debug: $(INSTALLED_DEBUG_RAMDISK_TARGET)
.PHONY: ramdisk_test_harness
ramdisk_test_harness: $(INSTALLED_TEST_HARNESS_RAMDISK_TARGET)
.PHONY: vendor_ramdisk_debug
vendor_ramdisk_debug: $(INSTALLED_VENDOR_DEBUG_RAMDISK_TARGET)

The next one ./build/make/core/config.mk lists ramdisk in dont_bother_goals := :- )

Next one is ./build/make/core/tasks/sdk-addon.mk has some QEMU reference on the same line, I’ll skip it.

$(addon_dir_img):$(INSTALLED_QEMU_RAMDISKIMAGE):images/$(TARGET_CPU_ABI)/ramdisk.img \

I’m losing my patience and I skip to ./system/core/init/Android.mk but there isn’t anything of much more interest other than

# Install adb_debug.prop into debug ramdisk.
# This allows adb root on a user build, when debug ramdisk is used.
LOCAL_REQUIRED_MODULES := \
   adb_debug.prop \
# Set up the directories that first stage init mounts on.
LOCAL_POST_INSTALL_CMD := mkdir -p \
    $(TARGET_RAMDISK_OUT)/debug_ramdisk \
    $(TARGET_RAMDISK_OUT)/dev \
    $(TARGET_RAMDISK_OUT)/mnt \
    $(TARGET_RAMDISK_OUT)/proc \
    $(TARGET_RAMDISK_OUT)/sys \

Buut.
LOCAL_MODULE_PATH := $(TARGET_RAMDISK_OUT)
also

LOCAL_MODULE := init_first_stage
LOCAL_MODULE_STEM := init

Maybe in here there’s the .cpp file that builds init.

Detour thoughts (from Day 13) 1. /first_stage_ramdisk should have a fstab
2. boot img is further post-processed to grow, in the lineage build?

Looking at first_stage_init.cpp patched for hadk, the last lines do an exec of /sbin/droid-hal-init which I dont have (yet) because I have only started to boot the kernel, not the whole thing.

My hunch is that the same init source are used for system and recovery too. And that the recovery one is started first then it switches to the system one.

This “confirms” it, with a question: would it work if recovery is Lineage/Asus and system init would not be?
Let’s look at the same file in Lineage, so “unpatched”:

It just execs “/system/bin/init”. That is, first stage init, which is in boot.img’s recovery-ramdisk at system/bin/init, execs… itself?
Of course not.

The author of that line says himself:

Tom Cherry, 5 years ago (July 21st, 2018 12:57 AM)
split first stage init into a separate executable

In the future, systems with dm-linear will require a ramdisk to set up
the mount for system. In this world, first stage init will be a part
of this ramdisk and handle setting up dm-linear, mounting the
necessary partitions, then pivoting to the system image, which will
become the root partition.
This also enables previous devices without system-as-root, to be
unified with system-as-root devices for all aspects of boot after the
pivot_root.

:thinking:
I think this means that if I want to re-use the recovery ramdisk I have to have my init on system, already mounted by dm-linear. Which of course I don’t have.

But this doesn’t explain why literally having another “first-stage-init” wouldn’t work, so I need to dig into that more (why init busybox shell is not enough?)

1 Like

Day 15

/dev/console

This option connects stdin, stdout, and stderr to the console. It is mutually exclusive with the
stdio_to_kmsg option, which only connects stdout and stderr to kmsg.

Idea:
Set androidboot.selinux=permissive kernel cmdline

system/core/init/main.cpp calls SetupSelinux(argv) if “selinux_setup” is passed in, which is on first_stage_init.cpp last line that execs ‘/sbin/droid-hal-init’ or ‘/system/bin/init’ (originally)

There is a very nice Readme at /system/core/init/README.md:

First stage init has three variations depending on the device configuration: 1) For system-as-root devices, first stage init is part of /system/bin/init and a symlink at /init points to /system/bin/init for backwards compatibility. These devices do not need to do anything to mount system.img, since it is by definition already mounted as the rootfs by the kernel.
  1. For devices with a ramdisk, first stage init is a static executable located at /init. These
    devices mount system.img as /system then perform a switch root operation to move the mount at
    /system to /. The contents of the ramdisk are freed after mounting has completed.

  2. For devices that use recovery as a ramdisk, first stage init it contained within the shared init
    located at /init within the recovery ramdisk. These devices first switch root to
    /first_stage_ramdisk to remove the recovery components from the environment, then proceed the same
    as 2). Note that the decision to boot normally into Android instead of booting
    into recovery mode is made if androidboot.force_normal_boot=1 is present in the
    kernel commandline.

Once first stage init finishes it execs /system/bin/init with the “selinux_setup” argument. This
phase is where SELinux is optionally compiled and loaded onto the system. selinux.cpp contains more
information on the specifics of this process.

Lastly once that phase finishes, it execs /system/bin/init again with the “second_stage”
argument. At this point the main phase of init runs and continues the boot process via the init.rc
scripts.

Back to “why my hadk init does not execute?”:

file busybox from hadk

ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, for GNU/Linux 2.6.32

file init from lineage

ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /system/bin/linker64

[hadk]$ find . -iname busybox -exec file {} ; -print

They are all 32-bit.

In ubports/

ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, for GNU/Linux 3.7.0

CONFIG_COMPAT=y is set in kernel (all of them, asus, lineage, hadk…) so 32-bit execution should work

Let’s still copy busybox from ubports to hadk/external/busybox :slight_smile:
Flashing the 64-bit busybox… still reboots at ~14 seconds :frowning:
Flashing an older ubports boot.img shows ASUS logo for more than 60 seconds though… (24Feb)
Flashing another older ubports boot.img (25Feb) shows the same…
Making a system/bin folder in first_stage… and linking init to ../../init
Making a vendor_boot with that - still reboots in about 20 seconds

Day 16

Let’s see why /sys/fs/pstore/ has no console-ramoops when the device reboots.
All CONFIG_PSTORE-related flags are present in the kernel.
Recovery (at least Lineage one) does mount pstore.

(pstore logging is based on a memory region that is not cleared upon reboot, where the kernel puts the last logs before crashing).

Grepping through the kernel, Documentation/admin-guide/ramoops.rst says that the pstore mem can be configured:

  • as kernel params
  • dts (devicetree) files may indicate pstore support.
  • using a ramoops_platform_data struct reference in C.

Looking at my current SFOS device, I do have ‘ramoops_memreserve=4M’ in /proc/cmdline at runtime.
I don’t have it in BoardConfig.mk though. (The next day I found out that cmdline can be prepended from .dts files and it was there)
Lineage on Zenfone does not have anything “ramoops”-related in /proc/cmdline

My current SFOS device also has printk.devkmsg=on

Let’s add

BOARD_KERNEL_CMDLINE := \
    printk.devkmsg=on \
    ramoops_memreserve=4M \
    androidboot.console=ttyMSM0 \

Building, flashing, rebooting to bootloader - still no /sys/fs/pstore contents

Except the kernel ramoops.rst, I also found a stackoverflow post also referencing dts/devicetree files.
Let’s download the ASUS OSS kernel to see if there is any ramoops region defined in dts[i] files, Lineage doesn’t seem to have.
There is also good advice in postmarketOS wiki:

Make sure that the memory address you reserve and its size is the same in your downstream and mainline configuration

Day 17

SFOS on tucana /proc/cmdline

ramoops_memreserve=4M rcupdate.rcu_expedited=1 rcu_nocbs=0-7 console=ttyMSM0,115200n8 earlycon=msm_geni_serial,0x880000 androidboot.hardware=qcom androidboot.console=ttyMSM0 androidboot.memcg=1 lpm_levels.sleep_disabled=1 video=vfb:640x400,bpp=32,memsize=3072000 msm_rtb.filter=0x237 service_locator.enable=1 swiotlb=1 loop.max_part=7 androidboot.usbcontroller=a600000.dwc3 selinux=1 enforcing=0 audit=0 androidboot.selinux=permissive systemd.legacy_systemd_cgroup_controller=yes androidboot.init_fatal_reboot_target=recovery printk.devkmsg=on androidboot.verifiedbootstate=orange androidboot.keymaster=1 root=PARTUUID=ed7e0db9-1db3-d6bc-273c-2d091d3262b3 androidboot.bootdevice=1d84000.ufshc androidboot.serialno=ab5fb9e6 androidboot.ramdump=disable androidboot.secureboot=1 androidboot.cpuid=0x5670fc70 androidboot.hwversion=6.19.0 androidboot.hwc=GLOBAL androidboot.hwlevel=MP androidboot.baseband=msm msm_drm.dsi_display0=dsi_xiaomi_f4_41_06_0a_fhd_cmd_display: skip_initramfs rootwait ro init=/init androidboot.dtbo_idx=13 androidboot.dtb_idx=4 androidboot.dp=0x0

lineage on sake /proc/cmdline

log_buf_len=256K earlycon=msm_geni_serial,0x98c000 rcupdate.rcu_expedited=1 rcu_nocbs=0-7 kpti=off androidboot.console=ttyMSM0 androidboot.hardware=qcom androidboot.memcg=1 androidboot.usbcontroller=a600000.dwc3 cgroup.memory=nokmem,nosocket console=ttyMSM0,115200n8 ip6table_raw.raw_before_defrag=1 iptable_raw.raw_before_defrag=1 loop.max_part=7 lpm_levels.sleep_disabled=1 msm_rtb.filter=0x237 pcie_ports=compat service_locator.enable=1 swiotlb=0 buildvariant=userdebug androidboot.verifiedbootstate=orange androidboot.keymaster=1 androidboot.bootdevice=1d84000.ufshc androidboot.fstab_suffix=default androidboot.boot_devices=soc/1d84000.ufshc androidboot.serialno=M4AIB76087246RZ androidboot.baseband=msm msm_drm.dsi_display0=qcom,mdss_dsi_samsung_fhd_cmd: androidboot.slot_suffix=_b rootwait ro init=/init androidboot.dtbo_idx=4 androidboot.dtb_idx=0 androidboot.force_normal_boot=1 androidboot.pre-ftm=0 androidboot.id.prj=4 androidboot.id.stage=7 androidboot.id.sku=3 androidboot.id.rf=1 androidboot.id.pcb=4731 androidboot.id.nfc=1 androidboot.country_code=EU androidboot.bootcount=0 androidboot.rawdump_en=0 androidboot.asus.authorized=0 androidboot.cpuid.hash=7facfc3cde1faa8fca410840eaf6adde androidboot.toolid=043f7063abdf40289d70cb1b1ec86413 SB=Y androidboot.fused=1 SBNR=Y androidboot.fused.norpmb=1 androidboot.id.panel=1 androidboot.id.ufs=3 androidboot.factory.crc=0x84A43520 androidboot.ddr.manufacturer_id=FF androidboot.ddr.device_type=8 LCD=96000141

Let’s manually trigger a kernel panic in Lineage
# echo c > /proc/sysrq-trigger

AND WE GET OUR FIRST RAMDUMP REFERNECE! :ghost:

“Waiting for flashing full ramdump” it’s a whole new screen that pops up instead of bootloader / ASUS logo or Lineage one.

On Vol-Dn and Power, Lineage boots - but no console ramoops in sys/fs/pstore…

I again ask for ideas on #sailfishos-porters channel.
Two great ideas:

  1. modify the Lineage ramdisk too (after I tried to replace the kernel) to see if that boots.
  2. adding printk.always_kmsg_dump=y too to my BOARD_KERNEL_CMDLINE and CONFIG_PANIC_TIMEOUT=20 to my kernel would help me again to observe if there is an additional timeout / pstore log.

For 2, I do the changes which involve a 22 minutes build including the kernel.
Unfortunately, the new hybris-recovery image does not play differently, reboots just as fast and doesnt’t write to /sys/fs/pstore.

Some other ideas:
3. modify the hybris boot compression to use lz4 → same behavior as before
4. oops=panic makes kernel oops behave as panic → same behavior as before.

So in conclusion, there is no kernel problem :-S because the 20 seconds from OOPS or PANIC are not added.

Let’s also try adding changes to lineage ramdisk - point 1. above.

cpio ramdisks seem hard to work with since they may pack files owned by root and you can’t just unpack and re-pack them.
For my first test, I’ll just --append a file, to see how that goes.

find ./testonefile | cpio -oA -H newc -F ramdisk-appendone.cpio
and repackaged
mkbootimg --out boot-appendone.img --header_version 3 --os_version 11.0.0 --os_patch_level 2023-02 --kernel boot/kernel --ramdisk boot/ramdisk-appendone.cpio.lz4 --cmdline ''

And. Lineage. Does. Not. Boot. with that.

New error is always progress :wink:

1 Like

“Waiting for flashing full ramdump”

So the phone is stuck in this or you can boot something else?

1 Like

So far it’s just a screen that looks like this, and can be made go away by VolDn + Power.

This post on stackoverflow explains that it might be “just” a kernel panic that, along with bootloader support for MAGIC_CRASH makes the next boot look like this.

My hypothesis is that ASUS bootloader included this MAGIC_CRASH handler in the bootloader so this is why they have reports of the “dreaded ramdump bug” instead of just “dreaded bootloop”.

I am still trying to figure out what tool I can use from host to get the dump :wink:

1 Like

Day 18

(30 seconds).

It seems I missed in Day 13 to actually apply this first change I was talking about.
Let’s do that, since it is called “minimal changes to boot” :).

Hmm… there was a boot.img created with hadk scripts, possibly with the first kernel I changed to pass all mer checker tests and other kernel checkers.
I think that image has the lingeage ramdisk with that kernel. I’m trying to boot that and for a second time I get the
‘Waiting for flashing full ramdump’ message [1].
Since then, I have taken back most of the kernel changes and re-made them ‘incrementally’ based on the sony xperia 10 IV ubports repo.
I re-build make bootimage and I still get that ramdump message though after ASUS logo stays for some time - and then dissappears for some time…
‘Waiting for flashing full ramdump’ again. This may be because I just added oops=panic yesterday?


That file I appended yesterday to the lineage ramdisk… Let’s look into build/make/core/Makefile
After copying a lot of files to $(TARGET_RECOVERY_ROOT_OUT), it executes a command $(BOARD_RECOVERY_IMAGE_PREPARE).
Then it uses mkbootfs instead of cpio, citing the kernel Documentation/driver-api/early-userspace/buffer-format.rst - which just describes cpio…

find . -iname \*.mk -exec grep BOARD_RECOVERY_IMAGE_PREPARE {} \; -print finds nothing (tried .bp and Makefile too)

mkbootfs.c seems to have a list of permissions and ownerships for android filesystem.
Maybe the problem is that the file I appended was created with my local user, which has uid/git of 1000 - that corresponds to system in Android.
However, changing the file ownership to root does not make the device boot.
Neither does… not adding a file at all:). That may be because the way I use LZ4 command line (defaults?)
That same core/Makefile above says $(LZ4) -l -12 --favor-decSpeed
Yup! Victory! one file appended to that ramdisk!

Next test: hybris-recovery with lz4 -l (“Use Legacy format (typically for Linux Kernel compression)” - get it?).
This time, instead of rebooting in ~14 seconds, it gets stuck at bootloader image.
Progress is getting redefined weirder and weirder.

Back to my non-booting hybris-recovery.img:

I’ll change the gzip -9 commands inside hybris-boot/Android.mk to lz4 -l -12 --favor-decSpeed -
The .gz extensions to lz4.

This will allow the bootloader to concatenate the hybris boot.img ramdisk with vendor_boot.img ramdisk.
But I remember, the vendor_boot is the one with the kernel cmdline. Are there any differences between Lineage vendor_boot and HADK boot cmdline? Maybe.

I try to boot a hybris-recovery.img lz4’d with that but the result is still stuck at logo.
Booting the equivalent boot.img (if you remember, hybris-hal also builds one just as lineage does…) - it also does not get me out of the woods, but it does switch to ASUS logo, then blank for a long time, then RAMDUMP screen [1].

Maybe the ramdisk in that boot is different, maybe the kernel cmdline should be left out…
Trying with lineage ramdisk and no cmdline → no sign of life, no ASUS logo.
Trying with lineage ramdisk and cmdline → the same
Non sense, IMO - lineage booted just fine with my previous kernels?
I cannot make it boot now.
And there is a slight difference between the hybris kernel + lineage ramdisk vs the hybris-recovery: the latter displays an android logo over the selected bootloader option, while the latter does not…

Oh my!, I think I just made a header_version = 0 image

mkbootimg --header_version 0 --kernel boot/kernel-lineage --ramdisk boot/ramdisk-lineage --cmdline ‘’ --out boot-lineage-lineage-nocmdline.img

instead of

mkbootimg --header_version 3 --os_version 11.0.0 --os_patch_level 2023-02 --kernel boot/kernel --ramdisk boot/ramdisk --cmdline ‘’ --out boot-appendone.img

Mbbfrp… that was it, lineage boots with my kernel alright :facepalm:

However, with my ramdisk (from boot.img) it gives a RAMDUMP.
[1] I think I know why: that ramdisk has a different system/init/bin. That is Android’s init, which is re-used now in boot, recovery and in SFOS too, but patched.
The one that runs droid-hal-init when SFOS is staring. That may be the cause of my RAMDUMP. So… I can ignore this boot.img for now:)

There is one thing that tricked me before into making a header-0 image, and that is unpack_bootimg output from hybris-recovery.img.
I need to make that header-3 too.

Booting that - wow, new symptoms again - 10 seconds and the phone immediately shuts down. Effectively displays “charging” logo.

Not convinced. one more try:) This time, I use the “unlocked bootloader” feature to ‘press any key to stop booting’, THEN I set up my Timer, then click to resume. Asus logo… 35 seconds or so to… reboot.

If you remember, what was the sleep that I introduced in init[2] shell? 30 seconds.

I back out my changes from init, make hybris-recovery again, unpack it, repack it with image version 3.
And it reboots in about 65 seconds.
Not bright, not terrible - this means that RNDIS didn’t work.


[2] my changes to add sleep 30; reboot -f were wrong, I had to move them after busybox --install a couple of days ago, after elros34’s advice on #sailfishos-porters

1 Like

Day 19

To recap, I replaced my own-built kernels into Lineage’s boot.img and they worked.
I tried to replace Lineage’s ramdisk in hybris boot.img and it didnt.
=> This has lead me to the fact that hybris boot/recovery is built using image header 0 (instead of 3).

I tried to append a file to the Lineage ramdisk.
=> This revealed that there are particular lz4 parameters used for the ramdisk

Now this part of the init script does not work

usb_setup() {
    if [ -d $ANDROID_USB ]; then
        usb_setup_android_usb $1
    elif [ -d $GADGET_DIR ]; then
        usb_setup_configfs $1
    fi
}

where ANDROID_USB=/sys/class/android_usb/android0 and GADGET_DIR=/config/usb_gadget.
Let’s see what devices are available under Lineage recovery:

ASUS_I006D:/ # ls -l /sys/class/android_usb/android0/                                                                                                
power/                   state                    subsystem/               uevent                   waiting_for_supplier
ASUS_I006D:/ # ls /config/usb_gadget/g1/                                                                                                             
UDC           bDeviceProtocol  bMaxPacketSize0  bcdUSB   driver_match_existing_only  idProduct  max_speed  strings
bDeviceClass  bDeviceSubClass  bcdDevice        configs  functions                   idVendor   os_desc

So both are.

  1. Android USB

# This sets up the USB with whatever USB_FUNCTIONS are set to via android_usb

The first command in usb_setup_android_usb() is:

    write $ANDROID_USB/enable          0

Executing that write on Lineage does not work, write: inaccessible or not found

  1. Gadget

The first command in usb_setup_configfs() is:

    write $GADGET_DIR/g1/idVendor                   "0x18D1"

Executing that write on Lineage does not work, write: inaccessible or not found

Grepping #sailfishos-porters logs, I again find an advice from elros34 to see what is my android device using in */usb.rc files.
These .rc files are Android’s init system’s.

Doing find device/asus/sake -iname \*.rc -exec grep usb {} \; -print finds me a configfs reference.

But let’s first fix the build to generate header-3 images; Adding the previous Lineage custom mkbootimg shellscript gives us

MBOOTIMG --ramdisk out/target/product/sake/obj/ROOT/hybris-recovery_intermediates/recovery-initramfs.lz4 --kernel out/target/product/sake/kernel --dtb out/target/product/sake/dtb.img --base 0x00000000 --pagesize 4096 --cmdline printk.devkmsg=on printk.always_kmsg_dump=y ramoops_memreserve=4M androidboot.ramdump=disable androidboot.console=ttyMSM0 androidboot.hardware=qcom androidboot.memcg=1 androidboot.usbcontroller=a600000.dwc3 androidboot.selinux=permissive cgroup.memory=nokmem,nosocket console=ttyMSM0,115200n8 ip6table_raw.raw_before_defrag=1 iptable_raw.raw_before_defrag=1 loop.max_part=7 lpm_levels.sleep_disabled=1 msm_rtb.filter=0x237 pcie_ports=compat service_locator.enable=1 swiotlb=0 bootmode=debug --output out/target/product/sake/obj/ROOT/hybris-recovery_intermediates/hybris-recovery.img

Somehow, the BOARD_MKRECOVERYIMG_ARGS which have --header_version $(BOARD_BOOT_HEADER_VERSION), don’t make it to hybris-boot/Android.mk
I will add the --header_version $(BOARD_BOOT_HEADER_VERSION) manually to the $(MKBOOTIMG) command.

Building a new hybris-recovery & booting it shows some difference from yesterday (ASUS logo instead of stuck on Android one) and reboots in 65+ seconds… into a “Safe mode” Lineage. Didn’t know that was a thing :slight_smile:

Now reordering the usb_setup lines in init script to read

usb_setup() {
    if [ -d $GADGET_DIR ]; then
        usb_setup_configfs $1
    elif [ -d $ANDROID_USB ]; then
        usb_setup_android_usb $1
    fi
}

But I get the same result as above, reboot in 60+ seconds.

Let’s unpack/repack the image as yesterday. Same 65+ seconds reboot, I think the image is packed alright but the kernel still misses some configs to get usb telnet.


Grepping for usb in .rc files as above also found me a reference to /vendor/etc/init/hw/init.qcom.usb.rc which I find only on the device (interesting, I thought this Lineage build was also building vendor…?)

init.qcom.usb.rc seems to be using write /config/usb_gadget/g2/configs/b.1/strings/0x409/configuration "rndis", that is, b.1 instead of c.1 as in the init script.

I ask on #sailfishos-porters since I don’t see by grepping the logs if someone did this at any other point in time.
elros34 is again of much help giving me another advice: if I know that my kernel boots, maybe I should just… boot it. Meaning, use hybris-boot.img instead of hybris-recovery.img.

Then I will get logs and could iterate later in “rootfs”.

What he means is that, if /data is mounted correctly and /data/.stowaways/sailfishos is there, then I can debug by editing that directly (through recovery, or even Lineage). That’s because hybris-boot will not stop/reboot if there is no USB device up. I’d also have logs persisted in init.log.

But I still get that BadDLLP error on my host. Let me try that kernel param again… nope, it just messes up with my flashing (needs me to reboot booatloader?) but usb is not talking to me any more than ever, no telnet.

Next day I’ll create the droid-config-sake, droid-hal-sake and droid-hal-version-sake repos and try a real SFOS build encouraged by

kernel runs init, proof by changing reboot sleep

(as postulated in Day 11)

2 Likes

Day 20

HADK guide,
7.1 Creating Repositories for a New Device

(Skipping sparse/var/lib/environment/compositor/droid-hal-device.conf - it is already present in droid-configs-device/sparse/var/lib/environment/compositor/droid-hal-device.conf, without the touchscreen device part.)

After creating the three repos droid-hal-sake, droid-config-sake, droid-version-sake and I also check in PLATFORM_SDK$ (sfossdk command)

  • cat /etc/os-release gives me 4.3.0.15, which is minimum supported
  • sdk-assistant tooling list contains 4.4.0 (which I want to target) and ‘latest’
  • sdk-assistant target list contains… xiaomi-tucana :slight_smile:

I need to create a new target - I download Sailfish_OS-4.4.0.58-Sailfish_SDK_Target-aarch64.tar.7z and run sdk-assistant target create asus-sake-aarch64.

But how will the next commands know that they should use this new target?
There’s a footnote in the HADK doc that says:

mb2 looks for a directory named .mb2, where it stores some of its state. It is created implicitly by mb2 … build and you can also
create it explicitly with mb2 -t $VENDOR-$DEVICE-$PORT_ARCH build-init

Indeed, I now see that cat .mb2/target has asus-sake-aarch64.default. So I can proceed without breaking my previous target.

Next, execute commands from “7.2.1 Building the droid-hal-device packages” HADK doc.

rpm/dhd/helpers/build_packages.sh --droid-hal

The first command fails while checking CONFIG_DUMMY (should be n), CONFIG_UEVENT_HELPER_PATH and CONFIG_FW_LOADER_USER_HELPER (should be n).

I think my “minimal changes” kernel will probably not work with these checks.
Either I have to a). disable the checks “temporarily” or b). switch back to the initial branch that passed through all the kernel checkers.
Of course I go with b). initially. I always wanted to test hybris-recovery with that anyway:)

The kernel “works” in the same way meaning that it “reboots when I say so”, but still no telnet.
It seems to pass the --droid-hal check, which now complains that cannot stat './out/target/product/sake/hybris-updater-script'

That one seems to be missing from the fact that there is no /boot detected when I make hybris-hal (that is, kernel and boot images):

/boot appears to live on
/data appears to live on /dev/block/bootdevice/by-name/userdata

It definitely doesn’t “Live on”:). That was supposed to be a device path.
The hybris-boot/Android.mk presents me with the challenge to understand Perl.

/usr/bin/perl -w -e '$$fs=shift; if ($$ARGV[0]) { while (<>) { next unless /^$$fs\s|\s$$fs\s/;for (split) {next unless m(^/dev); print "$$_\n"; }}} else { print "ERROR: *fstab* not found\n";}' /boot $HYBRIS_FSTABS | sort -u

It’s one line, how hard can it be ;)). HYBRIS_FSTABS is a list of fstab files.
So, the perl expression is using shift which probably means it read the first argument (which is /boot) then for the ‘next first argument’ ARGV[0] it uses some while spaceship (probably iterating through each fstab path?) where next is called (continue?) unless that $fs is matching at the beginning or between \s some \s and then split and for next unless /dev…

Anyway, enough perl, I didn’t have a /dev path in the fstab that listed /boot.

The build_packages.sh --droid-hal seems to start the real work now.

Meanwhile mal explains what to do with dynamic partitions. Right on time:) Telegram: Contact @sailfishos_porters
So I need to skip generating some systemd mounts in droid-hal spec file.

%define makefstab_skip_entries /odm /product /system /system_ext /vendor

Also I get this complaint from --droid-hal build above:

error: Installed (but unpackaged) file(s) found: /bugreports /cache /d /sdcard

Which I resolve by adding %define straggler_files .. with that list above.
Building of droid-hal-sake finished successfully.

Next is rpm/dhd/helpers/build_packages.sh --configs
That already installs some packages in the target (bluez5 etc) but works.

Next is rpm/dhd/helpers/build_packages.sh --mw
mmmiddleware… La crème.

Nothing to do.
Build libhybris? [Y/n/all]

If I remember correctly it asks for each one of the packages. So I’ll go with “all”
That takes some time but finishes alright

Next is rpm/dhd/helpers/build_packages.sh --gg

Please build droidmedia as per HADK instructions
!! Failed to pack_source_droidmedia-localbuild.sh

Well, let’s TEMPORARY_DISABLE_PATH_RESTRICTIONS=true make -j$(nproc --all) droidmedia one more time, I didn’t since a couple of days.

ninja: no work to do.

Still the same error.

touch external/droidmedia/droidmedia.cpp

Doesn’t build anything either. There is no out/target/product/sake/system/lib*/libdroidmedia* either,
Looking at external/droidmedia/Android.mk I see LOCAL_MODULE := libdroidmedia (along others). Let’s make that…?
Actually, I think I know what’s going on. I checked out the 0.20220929.0 tag for 4.5.0 as in hadk-hot and for 4.4.0 the 0.20211101.0 tag was supposed to be used.

But I get the same error. I need to understand where droidmedia target is.
Grepping the *.mk files again, I find it in … hybris-boot:

droidmedia: $(shell external/droidmedia/detect_build_targets.sh $(PORT_ARCH) $(TARGET_ARCH))

If I manually run that command I get the list of targets libdroidmedia minimediaservice minisfservice libminisf from external/droidmedia.
I’ll just make all of those and assume that me making hybris-hal without droidmedia marked that shell target generation as ‘done’ and it doesn’t get evaluated again \o/

It completes.
Next is rpm/dhd/helpers/build_packages.sh --version

Check /home/vlad/hadk/hybris/droid-hal-version-sake.log for full log.
!! building of package failed

Checking that log reveals:

File /etc/ofono/binder.conf from install of droid-config-sake-1-202303222304.aarch64 (dir:/home/vlad/hadk/hybris/droid-hal-version-sake/.mb2/filtered-output-dir) conflicts with file from package ofono-configs-binder-1.0.2-1.1.1.jolla.aarch64 (@System)

File conflicts happen when two packages attempt to install files with the same name but different contents. If you continue, conflicting files will be replaced losing the previous content.

Which means that I need to make a change to droid-config-sake spec.
Hadk-hot has the solution for that readily available,

Provides: ofono-configs
Obsoletes: ofono-configs-mer

Adding those, re-building --configs, then --version

While building I remembered that Lineage /data is going to get wiped for the next test.
I install Open Camera and notice that if you switch to Camera 2 API, there are settings for Edge Detection and Noise reduction.
Cranking the Edge detection down (off) and Noise reduction to “High Quality” I get similar results to my Mi Note 10 that has a larger sensor but tends to not sharpen but to aggressively remove noise. The Camara 1 API shots on ASUS are however very sharpened by default. It is good to know that just changing the API adds access to those options. Of course, Noise reduction should be minimal too :slight_smile:

Hmm. same error in hybris/droid-hal-version-sake.log.
A quest for tomorrow:)

2 Likes

Day 21

That droidmedia missing build from yesterday - I figured I had no PORT_ARCH defined, that’s why the shellscript describing the targets from hybris-boot was not working.

Normally you have a ~/.hadk.env that is sourced when you run sfossdk which gives you PlatformSDK$ prompt and then from there you run ubu-chroot to build the Android bits. However, I quit that on a build error on Day 10 and directly built the Android bits from outside the sdk.
Exporting PORT_ARCH and the other env vars made it work out of the box.

That droid-hal-version error build from yesterday, about ofono configs errors?
That had a follow-up in hadk-hot: you need to manually install the resulting new package, which interactively asks you if you’re ok to overwrite a file from ofono-configs-binder. This is the reason --configs step was successful but the package did not install in --version step.

I re-run all the --droid-hal/--configs/--gg/--version build commands above (-d, -c, -g, -v).
They pass.
The last one is build_packages.sh --mic - Mer Image Creation I believe.

This one fails with Requires [rpmlib(PayloadIsZstd)-5.4.18-1], which is not provided
I believe this has something to do with changing compression method recently.
mal recommends for other porter encountering the same error to "sdk-foreach-su -ly ssu re 4.5.0.18", "sdk-foreach-su -ly zypper ref" and then "sdk-foreach-su -ly zypper dup"
Well… and I was planning to build 4.4.0…
The changelog for 4.4.0 mentions [rpm] Add zstd support.. The one for 4.5.0 [meego-rpm-config] Change rpm compression to zstd
Not sure but my guess would be that the new compression needs to be supported in 4.4.0 already for 4.5.0 to… install.
So let’s upgrade just what was 4.3 then. I run sdk-foreach-su -ly ssu re 4.4.0.72 because I am stubborn and born to make choices that hit errors.

sdk-foreach-su: Executing in ‘SailfishOS-4.4.0’ tooling…
File ‘/repodata/repomd.xml’ not found on medium ‘https://releases.jolla.com/releases/4.4.0.72/mer-tools/builds/i486/packages/

Looking at Index of /sdk/targets/ actually only 4.4.0.58 is listed. Use that instead :slight_smile:
Well… that doesn’t work either, - it would probably work if I install manually, but the sdk-foreach left me in some half-state I wouldn’t want to debug. Let’s go with 4.5.0.18 thus…
FFS, not even that works \o/

In hindsight, I see that only ‘SailfishOS-latest’ tooling gives error, while ‘SailfishOS-4.4.0’ tooling does not, ‘SailfishOS-latest-armv7hl’ build target and ‘SailfishOS-latest-i486’ build target do, while ‘asus-sake-aarch64’ build target does not.
So it may be something about -latest things, which are the ones installed with SailfishOS IDE…

Buut… getting to the zypper dup step:

Error: Subprocess failed. Error: RPM failed: error: unpacking of archive failed: cpio: Bad magic

This has to be compression. Maybe I skipped 4.3 => 4.5 directly.

I remove the -latest targets/tooling (sdk-assistant remove target .., ..tooling). And try again to update to 4.4.0.72
As stated above, the zypper ref step does not fail (it only failed on -latest thingies).

All these commands are really commands to upgrade SDKs but using the same upgrade system as our phones.
The good thing is, if I break these “phones” I get to remove them and install them afresh :slight_smile:

With 4.4.0.72 and without -latest toolings/targets, zypper does start to instal packages without cpio error.
One of the packages that will get installed will be the rpm with support for zstd :fingerscrossed:

Error message: Could not resolve host: releases.jolla.com

?!?
On porters channel: “you need to exit sdk and re-enter before you can use network in sdk after you have updated it.”

Yes. That zypper dup across the universe takes some time…
Actually it takes so much time I think there’s a bug.
It may have been faster to re-create those SDK/tooling/target
I’m half-heartedly posting this before going to sleep while package 20 out of 573 still installs, instead of pressing Ctrl-C impulsively

Day 22

–mic finishes SailfishOScommunity-release-4.4.0.58-sake-zero/sailfishos-sake-release-4.4.0.58-zero.zip
Which contains a bz2.

Rebooting into Lineage Recovery, through adb root shell, I notice there is no bunzip2
The toybox command that wraps everything doesn’t know it either: toybox bunzip2

toybox: Unknown command bunzip2

So I push my 64-bit busybox (from ubports) to /system/bin and ln -s /system/bin/busybox /system/bin/bunzip2
Next thing, tar --help does show j bzip2 compression - so that might use the bunzip2 command
(The truth is busybox is 1.4M while toybox is under 500k)

Next I use Lineage recovery “Format data/factory reset”.
mount /data/ now works as f2fs and it’s empty. Maybe it should be ext4, but let’s see.

Next I use Lineage recovery “Apply update”/“Apply from adb”
adb sideload SailfishOScommunity-release-4.4.0.58-sake-zero/sailfishos-sake-release-4.4.0.58-zero.zip
Package not signed, install anyway? Yes

E:Error in /sideload/package.zip (killed by signal 11)

Maybe the sideloading is for “update” packages which I am not familiar with.
My zip file has a straightforward updater-unpack.sh however:

FS_ARC="/data/sailfishos-rootfs.tar.bz2"
FS_DST="/data/.stowaways/sailfishos"

rm -rf $FS_DST
mkdir -p $FS_DST
tar --numeric-owner -xvjf $FS_ARC -C $FS_DST
EXIT=$?

rm $FS_ARC

So I just push the .bz2 file and tar --numeric-owner -xvjf /data/sfe-sake-4.4.0.58-zero.tar.bz2 $FS_DST

tar: exec bzip2: No such file or directory

Ah, so I should link that to busybox…

bzip2: short write

Altough the error comes from bzip2, I think tar from toybox may be the culprit here.
Let’s link that to busybox too

tar: /data/.stowaways/sailfishos not found in archive

:facepalm:! I forgot an -C before $FS_DST: tar --numeric-owner -xvjf /data/sfe-sake-4.4.0.58-zero.tar.bz2 -C $FS_DST
Good. it works.

Now we’re going to reboot to bootloader and boot (without flashing) the hybris-boot.img.
Remember, if we flash it, Lineage recovery is gone since it lives on boot.img for this device. And we need that to inspect what happened in /data/.stowaways/saifishos…

After 60 something seconds it reboots.
Entering Lineage Recovery, adb root shell, mounting /data… it’s empty!
Hmm… how could that be?

Let’s re-trace the steps and just reboot into recovery:
HOST: adb push external/busybox/busybox /system/bin/
ADB#: ln -s /system/bin/busybox /system/bin/bzip2
ADB#: ln -sf /system/bin/busybox /system/bin/tar
HOST: adb push sfe-sake-4.4.0.58-zero.tar.bz2 /data/
ADB#: mkdir -p /data/.stowaways/sailfishos
ADB#: tar --numeric-owner -xvjf /data/sfe-sake-4.4.0.58-zero.tar.bz2 -C /data/.stowaways/sailfishos

Reboot to bootloader → Reboot to recovery
ADB# mount /data
ADB# ls /data/.stowaways/sailfishos/

Rootfs is there. Let’s boot hybris-boot again …
Reboots in 60 seconds, let’s hold Vol-up, enter Recovery, enable Adb shell
Now stoways survive.
And I got my first init.log!

4 Likes

Day 23

The problem with the log before is that I didn’t see any of the device changing on my Host (like: “init-debug in real rootfs”)

elros has figured that in my previous log there are two lines output to $GADGET_DIR/g1/UDC and suggests this change to exclude the dummy one.
With that fix, the next boot, while still doesn’t show any new USB device on my host computer, activates an usb0 device (suprised it’s not rndis0) and configures it with an IP. It starts dhcp server even though writing to /etc/udhcpd.conf fails (it’s a link to /run/usb-moded/udhcpd.conf that doesn’t exist?)
Next I grep my vendor/device sources for init.usb.rc files (unfortunately I dont’ find any meaninful ones, which is surprising).
Lineage recovery has a GUI option to mount system. When I do that, it appears on /dev/block/dm-2. I scroll up to Day 10 when I determined by dmctl list devices that vendor_b is on 4th dm device. I mount that in /mnt/vendor
$ find /mnt/vendor -iname *.rc -exec grep gadget {} ; -print
And I find this /etc/init/hw/init.qcom.usb.rc and in another rc file that vendor.usb.rndis.func.name is “gsi”.
Notice that the init script uses b.1 instead of c.1 and gsi.rndis instead of rmnet_bam

I also remembered that my hybris-boot.img is concatenated with a vendor_boot which probably doesn’t have my kernel modules, but Lineages’. Grepping hadk/out/target/product/sake/obj/KERNEL_OBJ I find a lot of *.ko files, even rmnet_ctl.ko and rmnet_core.ko
I descover that while toybox has the command ‘insmod’, busybox doesn’t. I can’t use toybox in Mer init, since that is not a glibc executable.
However in rootfs there seems to be a kmod command which works about the same as [busy|toy]box, in the sense that it can have link to it so it starts another command. ln -sf /usr/bin/kmod /usr/bin/insmod seems to work.
Initially I upload the two modules in root and manually issue

    insmod /rmnet_ctl.ko                                                       
    insmod /rmnet_core.ko

right after do_mount_devprocsys in the init script (I now remembered that I do have all the modules in rootfs /lib/modules :slight_smile:
Also, I create /init_enter_debug2 which grabs the output of some debugging tools too (ps, netstat, mount) and I also add dmesg there (dmesg output was present in my first log because no rndis/usb device was found and it rebooted. This time, usb0 is present and it doesn’t reboot.)

I get something like this as the next log which is showing some errors in dmesg:

RNDIS_IPA@rndis_ipa_init_module@2447@ctx:swapper/0: failed to create IPC log, continue…
gsi_set_inst_name: Err allocating ipc_log_ctxt for prot:gsi.rndis
gsi_bind: ipa is not ready
gsi_bind: ipa ready timeout
configfs-gadget a600000.dwc3: failed to start g1: -110

(-110 is probably timeout, according to perror 110)

Ideas:

  1. search for rmnet commit logs in kernel, something changed from 4.19 where this worked to 5.4
  2. remove vendor_boot kernel cmdline, as I now have duplicate definitions (is there a limit?). That would render Lineage unbootable, maybe better remove the duplicates from hybris-boot…?
  3. replace lineage boot.img kernel with mine so that I can test rndis commands in adb shell?
  4. use modprobe for /lib/modules
bloated cmdline
 log_buf_len=256K earlycon=msm_geni_serial,0x98c000 rcupdate.rcu_expedited=1 rcu_nocbs=0-7 kpti=off androidboot.console=ttyMSM0 androidboot.hardware=qcom androidboot.memcg=1 androidboot.usbcontroller=a600000.dwc3 cgroup.memory=nokmem,nosocket console=ttyMSM0,115200n8 ip6table_raw.raw_before_defrag=1 iptable_raw.raw_before_defrag=1 loop.max_part=7 lpm_levels.sleep_disabled=1 msm_rtb.filter=0x237 pcie_ports=compat service_locator.enable=1 swiotlb=0 buildvariant=userdebug printk.devkmsg=on printk.always_kmsg_dump=y ramoops_memreserve=4M androidboot.ramdump=disable androidboot.console=ttyMSM0 androidboot.hardware=qcom androidboot.memcg=1 androidboot.usbcontroller=a600000.dwc3 androidboot.selinux=permissive cgroup.memory=nokmem,nosocket console=ttyMSM0,115200n8 ip6table_raw.raw_before_defrag=1 iptable_raw.raw_before_defrag=1 loop.max_part=7 lpm_levels.sleep_disabled=1 msm_rtb.filter=0x237 pcie_ports=compat service_locator.enabl

For 4.

# ln -sf /usr/bin/kmod /usr/bin/modprobe`
# modprobe -D rmnet_core -S 5.4.61-qgki-perf-gbface408530e-dirty
insmod /lib/modules/5.4.61-qgki-perf-gbface408530e-dirty/rmnet_ctl.ko 
insmod /lib/modules/5.4.61-qgki-perf-gbface408530e-dirty/rmnet_core.ko 

I have to override the kernel version because in adb shell uname -r is without -dirty.

# ln -sf /usr/bin/kmod /usr/bin/depmod`
# depmod -a -v

New log with depmod -a: Ubuntu Pastebin
Lineage recovery dmesg for comparison: Ubuntu Pastebin

Looking at kernel sources.
rndis_ipa_init_module message “failed to create IPC log, continue…” seems to be a debugging log, we can probably ignore that.
gsi_set_inst_name: “Err allocating ipc_log_ctxt for prot:gsi.rndis” seems to be the same ipc_log_context_create as above.

gsi_bind: “ipa is not ready”. This is call from drivers/usb/gadget/function/f_gsi.c to ipa_register_ipa_ready_cb which is in drivers/platform/msm/ipa_fmwk/ipa_fmwk.c. The ipa_register_ipa_ready_cb could have output “ipa framework hasn’t been initialized yet” but it does not, and a message above does say “IPA framework init”, so initialization was started, but maybe not done, or it didn’t do something.
The call returns non-0, which is -EEXIST or -ENOMEM. The pr_debug is probably missing output my configuration.

The dynamic debug flag is not (yet) present in my configuration, so pr_debug is probably printk(KERN_DEBUG and that can be made visible with loglevel=7 in cmdline.

But before I change the cmdline, I can already si a pr_info("IPA framework init\n"); in the logs but no pr_info("IPA driver is now in ready state\n"); so it’s safe to assume that the error above is triggered by a timeout in this init.

From ipa_clients_manager_init() I get all the logs except the last one, ipa3_notify_clients_registered().
ipa3_notify_clients_registered calls ipa3_register_to_fmwk() if ipa_initialization_complete, but so does ipa3_post_init.

(Btw, the makefile in the ipa folder also references CONFIG_RMNET_IPA3 which I don’t have in the kernel.)

Back to ipa3_post_init, that gets called from ipa3_load_ipa_fw (load firmware) which looks in dts or at a constant for the firmware name, which is probably in my case qcom,firmware-name = "ipa_fws" as in lahaina.dtsi
Searching for that file in the vendor partition does not find anything (good, because that would be hard to mount outside of Android)
Next, looking in the mounts that were present in Lineage I try
/dev/block/sde29 on /vendor/firmware_mnt type vfat - that does lit a ton of firmwares, but not ipa returned by find /mnt/firmware_mnt -iname \*ipa\*.
Let’s stop guessing and read how it finds the path. Something something subsys with ipa_fws param. Indeed there is a device /dev/subsys_ipa_fws but I can’t just ‘cat’ that.
Another IPA error not seen before: minidump-id not found for ipa_fws. This is present in Lineage recovery dmesg too.

Ok, time to give up and make something: adding CONFIG_RMNET_IPA3 to my defconfig and removing duplicate cmdline options that are present in the vendor_boot already and adding loglevel=7
Actually I found that CONFIG_RMNET_IPA3 is set in dataipa_QGKI.conf but so is CONFIG_RNDIS_IPA which I tried to disable… Let’s try to disable it… again? Nah, it doesnt compile if I do so.

Booting - not much change in output.

Now Elros suggestions: set these two CONFIG_USB_CONFIGFS_F_GSI=n and CONFIG_USB_CONFIGFS_ECM=n
Booting - some things changed, like: mkdir gsi.rndis fails, then no IPA errors in the log (but no rndis_bam.rndis either)

New suggestions: DUMMY_HCD=n, IPA=n, ECM=y

DUMMY_HCD=n didn’t help, but for some reason IPA=n compiled now. That doesnt’ help either

Meanwhile, while operating the device, I miss pressing VolUp on one of the boots and Lineage kicks in. I let it display the welcome and reboot to recovery.

Sure thing, my /data is gone :frowning: For some reason, lineage reformats it in dm-style, and I need it bare.
This means I have to redo the changes I did not save from init-debug.
Also, wipe data from recovery, push bz2, push busybox, make links to tar/bzip2

Still no telnet, but boy there’s stuff to debug in this kernel.

1 Like

Day 24

What if I don’t need no telnet?

Usually what happens is that the screen won’t turn on and you need to debug that through telnet.
Very fast iterating possible solutions.

The “screen” in hybris ports is based on hwcomposer service starting, at least.
For that to start, some dependant services need not crash (or need to be disabled)
To find out what happens you probably need to enter telnet as soon as possible and observe logcat logs.

But, even before that, to have the android “init” starting, you need to have mounts set up correctly.
That is, system goes into /system (or /system_root), vendor to /vendor and all else.
Part of those partitions are in fixup-mountpoints.
Another part, in my case, is in dynamic partitions. But so is fp4’s. So I literally copy its mount services. I also check out parse-android-dynparts into hybris/mw/, build it with -b and add it to patterns.

Building --droid-hal, --configs, --version and --mic again.

I then re-do the (push bz2, push busybox, link tar and bz2, untar) steps from day 22.
I also chroot into /data/.stowaways/sailfishos, /usr/bin/vi /etc/systemd/journalctl.conf and change it to persistent (from volatile).
Then I reboot to bootloader, sudo fastboot boot hybris-boot.img wait a couple of minutes, and reboot into recovery.
mount /data, chroot to sfos, jouralctl:

Of course I mess something up. where dmsetup was supposed to be called I get

[E][liblp]Logical partition metadata has invalid geometry magic signature.
Failed to parse metadata from "/dev/sda10"

Fortunately the error was a simple one as in Lineage recovery if I make ls -l /dev/block/by-name/super I get /dev/block/by-name/super -> /dev/block/sda19

Next thing after reboot in journalctl, droid-hal-init is loading modules (since when does it…?) and

droid-hal-init: LoadWithAliases was unable to load wlan

init/first_stage_init.cpp seems to use modules.load
I don’t have in /lib/modules/$MYKERNELVERSION/
Let’s try fixing modules.load by removing wlan.

Ramdump error:(

Nine seconds into the boot, jouralctl shows

Zenfone8 mce[2660]: modules/mempressure.c: mempressure_cgroup_init(): mempressure 'warning' threshold is not defined
Zenfone8 mce[2660]: modules/mempressure.c: g_module_check_init(): mempressure plugin init failed
Zenfone8 DSME[2872]: state: new state: USER
Zenfone8 systemd-udevd[974]: conflicting device node '/dev/dri/card0' found, link to '/dev/dri/card0' will not be created
Zenfone8 systemd[1]: Started Mode Control Entity (MCE).
Zenfone8 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=mce comm="systemd" exe="/usr/li
Zenfone8 mce[2660]: modules/display.c: mdy_stm_set_compositor_availability_changed(): compositor availability change: pending
Zenfone8 systemd-udevd[982]: conflicting device node '/dev/dri/renderD128' found, link to '/dev/dri/renderD128' will not be created
Zenfone8 kernel: [    9.234374] (CPU:5-pid:966:systemd-udevd)cs35l45 5-0030: cs35l45_set_sysclk: clk_id = 0, freq = 3072000
Zenfone8 kernel: [    9.234384] (CPU:5-pid:966:systemd-udevd)cs35l45 5-0030: cs35l45_set_sysclk: clk_id = 0, freq = 3072000, update PLL setting!!!
"update PLL setting!!!" ? But.. that was not the case in the previous boot :hmm: Again?

Yes, ramdump screen again. and again.

How can changing modules.load break the system so hard?
Surely enough, I add back wlan.ko in modules.load and it doesn’t reboot.
The weird thing is that it never even started to load modules when it broke!

Grepping for modules.load finds me vendor/lineage/build/tasks/kernel.mk
In that file, BOARD_VENDOR_KERNEL_MODULES_LOAD is used.
Indeed, that wlan.ko came out of my BoardConfig.mk.

Let’s see where else it’s used, and remove it :slight_smile:
Oh, there’s also /lib/modules/$MYKERNELVERSION/modules.order that includes it. That is the simpler explanation.
drivers/staging/qcacld-3.0/wlan.ko. Removing it.

And blocklist qca_cld3_wlan in modules.blocklist hmm :thinking: first_stage_init.cpp doesn’t seem to use blacklist by default…

But still I get a ramdump, as if something is hardcoded somewhere about that wlan module…\o/

        LOG(FATAL) << "Failed to load kernel modules";

Let’s change this to just LOG(ERROR) till we figure it out. And put back wlan where it was:()

Hmm…Why was wlan in “staging”? Sounds like unfinished stuff.
Let’s copy it manually?

adb push ./out/target/product/sake/obj/KERNEL_OBJ/drivers/staging/qcacld-3.0/wlan.ko /data/.stowaways/sailfishos/lib/modules/5.4.61-qgki-perf-gbface408530e-dirty/

Meanwhile I also rebuild init with FATAL removed and using ERROR instead (There’s also FATAL_WITHOT_ABORT, btw)

And I still get that fatal error about wlan.
Let’s look at that fresh init. Hopefully is this one out/target/product/sake/system/bin/init. It has 929016.
On device, usr/libexec/droid-hybris/system/bin/init has 933136. Kind of a big diff?
Let’s build_packages.sh --droid-hal and unpack that to make sure.
Yup, droid-local-repo/sake/droid-hal-sake-0.0.6-202303262207.aarch64.rpm has exactly the 929016 one now.

Adb pushing it. Reboot, same error, same abort.
So maybe I didn’t build/replace the expected init?
/usr/bin/droid/droid-hal-startup.sh points to /sbin/droid-hal-init which is… 933136 bytes. Why is there a copy…?
cp /usr/libexec/droid-hybris/system/bin/init /sbin/droid-hal-init
Reboot (meaning fastboot boot these days)

Aaand… ramdump screen again

So I got two/three unrelated changes that generate this. Removing wlan references from modules.* files or replacing /sbin/init with a version that doesn’t abort. Or the fact that I did copy a wlan.ko in there…
(I remove the wlan.ko and it’s not that)
The logs I get in the reboot cases are definitely not synced till the end as they don’t end in the same way, nor include a kernel panic message.

So maybe all the things that are in common is that droid-hal-init continues, all these ways of avoiding an “error” become a sure path to making 100% that a next error, which is fatal for the kernel is encountered.

This I don’t need telnet for. I need to think (as always) and read the first_stage_init.cpp carefully and then I would need an actual kernel panic/oops message that generates the ramdump.

1 Like

I follow very closey what you write here, and at the beginning I also had the assumption that I can follow you. But I had to realize that this exceeds my understanding. :smiley:
So I ask: are you making progress and are you confident that you will be able to make a port?

1 Like

The quality of the story degrades as the problems I face are getting harder for me to solve, indeed - not your fault that this became hard to follow. There is a balance of trying many solutions vs writing, and lately that balance was in disfavor of the writing. (Basically I got from “I’ll try a new thing because I know how to port with HADK” to “what’s this is not even booting and they changed the image header and wow now I can’t even get kernel logs or telnet access”).

Hopefully that is just an intermediary state and things will get back to well-known stuff.

To the point: I am making progress. I got init starting, dmesg and journalctl logs and probably I got the mount points right.

2 Likes

Day 25

Ramdump. Telnet. Ramdump.

A keen eye would have noticed I already copied androidboot.ramdump=disable from my previous port - it was already there from its Lineage build.

That didn’t do much if anything. But the answer needs to be in the kernel.
The question: why does the system reboot in a (special mode](android - How are ramdumps generated on a system crash? - Stack Overflow) when this device crashes?

Searching MAGIC, RAMDUMP etc in defconfig doesn’t do it.
Searching ramdump in kernel sources has some (many) hits. I need to filter them out, since some are RAM dumps from peripheral devices.

asusdebug.c. This file is full of gems.
First, there’s an asdf partition. That contains some asdf-logcat.txt or logcat-crash.txt but they are from weeks ago.
Idea 1: maybe I should mount it in recovery.

CONFIG_MACH_ASUS seems to guard some of the debugging, but I only have CONFIG_MACH_ASUS_SAKE. However obj-$(CONFIG_MACH_ASUS_SAKE) += asusdebug.o. Wait, I do have CONFIG_MACH_ASUS=y output in .config, I just don’t know from where.

Same asusdebug.c. There should be a /proc/asusdebug (it now contains just f2fs-attr:off)
It seems to accept a command like echo get_asdf_log > /proc/asusdebug
So, mkdir /asdf && mount /dev/block/sda9 /asdf then that echo.
That doesn’t seem to change the contents of the folder, unftortunately
echo slowlog > /proc/asusdebug does produce ASUSSlowg-$(date) logs though they might be from the current running system (something like a backtrace of all kernel threads.)


Breaking: Elros finds something in my dmesg from Day 23 and pings me to point out this change from: https://review.lineageos.org/c/LineageOS/android_device_xiaomi_sm6150-common/+/291494/4
Where ${ro.boot.usbcontroller} is a600000.dwc3 in my case.
I do it, create the /init_enter_debug2 file, boot hybris-boot and LO AND BEHOLD!
New USB device is detected when I boot, new usb network device and I just need to sudo ifconfig enp62s0u1 192.168.2.14 it (twice, because NetworkManager overrides the first config) and telnet 192.168.2.15 2323 and

Welcome to the Mer/SailfishOS Boat loader debug init system.

\o/

I immediately use this new superpower to spawn another telnet where I run dmesg -w and instruct the first session to continue the boot halted by the init_enter_debug2 file presence:

 # echo "continue" >/init-ctl/stdin 

And just as before, I am offered a ramdump screen (because droid-hal-init does this, after I “fixed” its quit because wlan.ko was missing).
The interactive telnet dmesg session has a headstart from the persisted journalctl and includes these lines in addition:

[   75.080658] [   75.080658] (CPU:4-pid:1020:systemd-udevd) [05:46:21.316141874] cs35l45 5-0030: cs35l45_set_sysclk: clk_id = 0, freq = 3072000
[   75.080669] [   75.080669] (CPU:4-pid:1020:systemd-udevd) [05:46:21.316151978] cs35l45 5-0030: cs35l45_set_sysclk: clk_id = 0, freq = 3072000, update PLL setting!!!
[   75.084444] [   75.084444] (CPU:5-pid:1020:systemd-udevd) [05:46:21.319927447] cs35l45 5-0030: Cirrus Logic CS35L45 (35a450), Revision: A0
[   75.084574] [   75.084574] (CPU:5-pid:1020:systemd-udevd) [05:46:21.320057030] gpio gpiochip0: (f000000.pinctrl): allocate IRQ 401, hwirq 2
[   75.084580] [   75.084580] (CPU:5-pid:1020:systemd-udevd) [05:46:21.320063436] gpio gpiochip0: (f000000.pinctrl): found parent hwirq 117
[   75.084587] [   75.084587] (CPU:5-pid:1020:systemd-udevd) [05:46:21.320070051] gpio gpiochip0: (f000000.pinctrl): alloc_irqs_parent for 401 parent hwirq 117
[   75.084614] [   75.084614] (CPU:5-pid:1020:systemd-udevd) [05:46:21.320097030] register_speaker_dai_name: register cs35l45 speaker name =  cs35l45.5-0031
[   75.087043] [   75.087043] (CPU:0-pid:1020:systemd-udevd) [05:46:21.322527134] cs35l45 5-0031: num_fast_switch:3
[   75.087061] [   75.087061] (CPU:0-pid:1020:systemd-udevd) [05:46:21.322544114] cs35l45 5-0031: 0:cs35l45-spk-music.txt
[   75.087070] [   75.087070] (CPU:0-pid:1020:systemd-udevd) [05:46:21.322553436] cs35l45 5-0031: 1:cs35l45-spk-outdoor.txt
[   75.087079] [   75.087079] (CPU:0-pid:1020:systemd-udevd) [05:46:21.322562655] cs35l45 5-0031: 2:cs35l45-spk-voice.txt
[   75.124684] [   75.124684] (CPU:6-pid:1020:systemd-udevd) [05:46:21.360167186] cs35l45 5-0031: cs35l45_set_sysclk: clk_id = 0, freq = 3072000
[   75.124692] [   75.124692] (CPU:6-pid:1020:systemd-udevd) [05:46:21.360175259] cs35l45 5-0031: cs35l45_set_sysclk: clk_id = 0, freq = 3072000, update PLL setting!!!
[   75.128460] [   75.128460] (CPU:5-pid:1020:systemd-udevd) [05:46:21.363944009] cs35l45 5-0031: Cirrus Logic CS35L45 (35a450), Revision: A0
[   75.170210] [   75.170210] (CPU:7-pid:54:migration/7) [05:46:21.405693280] IRQ 14: no longer affine to CPU7
[   75.245474] [   75.245474] (CPU:2-pid:1035:systemd-udevd) [05:46:21.480958072] [FTS_TS]fts_get_ic_information: Enter
[   75.246339] [   75.246339] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481822186] [FTS_TS]fts_get_chip_types:verify id:0x5652
[   75.246351] [   75.246351] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481834218] [FTS_TS/I]fts_get_ic_information:get ic information, chip id = 0x5652
[   75.246359] [   75.246359] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481842707] [FTS_TS]fts_get_ic_information: Exit(372)
[   75.246373] [   75.246373] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481856822] [FTS_TS/I]fts_create_apk_debug_channel:Create proc entry success!
[   75.246409] [   75.246409] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481892759] [FTS_TS/I]fts_create_sysfs:[EX]: sysfs_create_group() succeeded!!
[   75.246422] [   75.246422] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481905884] [FTS_TS]fts_ex_mode_init:create sysfs(ex_mode) succeedfully
[   75.246442] [   75.246442] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481924999] [FTS_TS/I]asus_create_sysfs:[EX]: asus_create_group() succeeded!!
[   75.246463] [   75.246463] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481946509] [FTS_TS/I]asus_game_create_sysfs:[EX]: asus_create_group() succeeded!!
[   75.246471] [   75.246471] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481954843] [FTS_TS]fts_gesture_init: Enter
[   75.246484] [   75.246484] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481967030] [FTS_TS]fts_gesture_init: Exit(462)
[   75.246491] [   75.246491] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481974895] [FTS_TS]asus_gesture_init: Enter
[   75.246505] [   75.246505] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481988541] [FTS_TS]asus_gesture_init: Exit(577)
[   75.246513] [   75.246513] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481996822] [FTS_TS][TEST]fts_test_init: Enter
[   75.246521] [   75.246521] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482004791] [FTS_TS/I][TEST]fts_test_func_init:init test function
[   75.246531] [   75.246531] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482014270] [FTS_TS/I][TEST]fts_test_func_init:match test function,type:88
[   75.246541] [   75.246541] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482024530] [FTS_TS][TEST]fts_test_init:sysfs(test) create successfully
[   75.246549] [   75.246549] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482032707] [FTS_TS][TEST]fts_test_init: Exit(2165)
[   75.246571] [   75.246571] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482054478] [FTS_TS/I]fts_irq_registration:irq:399, flag:2002
[   75.246860] [   75.246860] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482343645] [FTS_TS/I]fts_fwupg_init:fw upgrade init function
[   75.246884] [   75.246884] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482367082] [FTS_TS]fts_ts_probe_entry: Exit(1928)
[   75.246893] [   75.246893] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482376457] [FTS_TS/I]fts_ts_probe_entry:FOD location 6560 10720 26096 30256
[   75.246901] [   75.246901] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482384843] [FTS_TS/I]fts_ts_probe:Touch Screen(I2C BUS) driver prboe successfully
[   75.247001] [   75.247001] (CPU:3-pid:354:wk:fts_fwupg_w) [05:46:21.482484686] [FTS_TS/I]fts_fwupg_work:fw upgrade work function
[   75.247021] [   75.247021] (CPU:3-pid:354:wk:fts_fwupg_w) [05:46:21.482504791] [FTS_TS]fts_fwupg_get_fw_file:get upgrade fw file
[   75.247030] [   75.247030] (CPU:3-pid:354:wk:fts_fwupg_w) [05:46:21.482513489] [FTS_TS]fts_get_fw_file_via_i:fts_get_fw_file_via_i
[   75.247038] [   75.247038] (CPU:3-pid:354:wk:fts_fwupg_w) [05:46:21.482521926] [FTS_TS/I]fts_fwupg_get_fw_file:upgrade fw file len:102100
[   75.247047] [   75.247047] (CPU:3-pid:354:wk:fts_fwupg_w) [05:46:21.482530884] [FTS_TS/I]fts_fwupg_auto_upgrade:********************FTS enter upgrade********************
[   75.247054] [   75.247054] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482538072] [FTS_TS]fts_ts_init: Exit(2240)
[   75.247065] [   75.247065] (CPU:3-pid:354:wk:fts_fwupg_w) [05:46:21.482548020] [FTS_TS/I]fts_fwupg_upgrade:fw auto upgrade function
[   75.248194] [   75.248194] (CPU:6-pid:354:wk:fts_fwupg_w) [05:46:21.483677186] [FTS_TS/I]fts_wait_tp_to_valid:TP Ready,Device ID:0x5652
[   75.248201] [   75.248201] (CPU:6-pid:354:wk:fts_fwupg_w) [05:46:21.483683957] [FTS_TS/I]fts_fwupg_check_fw_valid:tp fw vaild
[   75.248209] [   75.248209] (CPU:6-pid:354:wk:fts_fwupg_w) [05:46:21.483692759] [FTS_TS/I]fts_fwupg_get_ver_in_host:fw version offset:0x10e
[   75.248628] [   75.248628] (CPU:4-pid:354:wk:fts_fwupg_w) [05:46:21.484111145] [FTS_TS/I]fts_fwupg_need_upgrade:fw version in tp:84, host:84
[   75.248636] [   75.248636] (CPU:4-pid:354:wk:fts_fwupg_w) [05:46:21.484119791] [FTS_TS/I]fts_fwupg_upgrade:fw upgrade flag:0
[   75.248642] [   75.248642] (CPU:4-pid:354:wk:fts_fwupg_w) [05:46:21.484124999] [FTS_TS/I]fts_fwupg_auto_upgrade:**********tp fw(app/param) no upgrade/upgrade success**********
[   75.248646] [   75.248646] (CPU:4-pid:354:wk:fts_fwupg_w) [05:46:21.484129634] [FTS_TS/I]fts_fwupg_auto_upgrade:********************FTS exit upgrade********************

However, still no crash reason per se.

Now I can disable(mask?) droid-hal-init service, and interactively start it from telnet, maybe that gives me more control.
At least I could single it out as being the culprit.

But I would still need the details of the kernel crash.
That may need a better understanding of asusdebug.c and its related changes to printk.c or a revert of them.
Also I still don’t know how the ramdump screen is triggered (other than a Stackoverflow post) and how to use it in my advantage.

Let’s go with the disablement idea: in chroot through ADB, I

# systemctl mask droid-hal-init.service
Created symlink /etc/systemd/system/droid-hal-init.service → /dev/null.

I reboot, connect two telnet sessions, one with journalctl -f (instead of dmesg -w) and the other used to echo “continue” to /init-ctl/stdin
Somehow this time I get disconnected but the device is not rebooted.
Journal last logs show systemd-udevd activity like link_config: autonegotiation is unset or enabled, the speed and duplex are not writable..
It is not related to usb0, but that may be because it disconnects by that time. Maybe my udev rules are stepping on my toes here.

Checking the journal on the device (in adb shell chroot) shows some lines after the disconnect:

probably usb-moded
Mar 15 14:56:56 Zenfone8 systemd[1]: Started udev Wait for Complete Device Initialization.
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting usb-moded USB gadget controller...
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting ohm daemon for resource policy management...
Mar 15 14:56:56 Zenfone8 systemd[1]: Reached target System Initialization.
Mar 15 14:56:56 Zenfone8 systemd[1]: Listening on OpenSSH Server Socket.
Mar 15 14:56:56 Zenfone8 systemd[1]: Reached target Sockets.
Mar 15 14:56:56 Zenfone8 usb_moded[4761]: usb_moded 0.86.0+mer56 starting
Mar 15 14:56:56 Zenfone8 systemd[1]: Started Daily Cleanup of Temporary Directories.
Mar 15 14:56:56 Zenfone8 systemd[1]: Started Reclaim memory once per day and on boot.
Mar 15 14:56:56 Zenfone8 systemd[1]: Reached target Timers.
Mar 15 14:56:56 Zenfone8 systemd[1]: Started Wayland path watcher.
Mar 15 14:56:56 Zenfone8 systemd[1]: Reached target Paths.
Mar 15 14:56:56 Zenfone8 systemd[1]: Reached target Basic System.
Mar 15 14:56:56 Zenfone8 systemd[1]: Started droid-late-start.
Mar 15 14:56:56 Zenfone8 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=droid-late-start comm="systemd"
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Indicate boot is done...
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Bluetooth service...
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Application Permission Management Daemon...
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Oneshot stuff for root...
Mar 15 14:56:56 Zenfone8 usb_moded[4761]: CONFIGFS detected
Mar 15 14:56:56 Zenfone8 kernel: [   75.113388] (CPU:2-pid:4761:usb_moded) [12:56:56.971053020] configfs-gadget a600000.dwc3: unregistering UDC driver
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Login Service...
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Telephony service...
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Nemo device lock daemon...
Mar 15 14:56:56 Zenfone8 systemd[1]: Started Disk quota netlink message daemon.
Mar 15 14:56:56 Zenfone8 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=quota_nld comm="systemd" exe="/
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Reclaim memory...
Mar 15 14:56:56 Zenfone8 usb_moded[4761]: /config/usb_gadget/g1/functions/rndis_bam.rndis: mkdir failed: No such file or directory
Mar 15 14:56:56 Zenfone8 usb_moded[4761]: /config/usb_gadget/g1/functions/rndis_bam.rndis/ethaddr: can't open for writing: No such file or directory
Mar 15 14:56:56 Zenfone8 usb_moded[4761]: /config/usb_gadget/g1/functions/rndis_bam.rndis/wceis: can't open for writing: No such file or directory
Mar 15 14:56:56 Zenfone8 systemd-logind[4773]: New seat seat0.
Mar 15 14:56:56 Zenfone8 kernel: [   75.122581] (CPU:2-pid:4761:usb_moded) [12:56:56.980245884] Mass Storage Function, version: 2009/09/11
Mar 15 14:56:56 Zenfone8 kernel: [   75.122598] (CPU:2-pid:4761:usb_moded) [12:56:56.980262238] LUN: removable file: (no medium)
Mar 15 14:56:56 Zenfone8 kernel: [   75.122648] (CPU:2-pid:4761:usb_moded) [12:56:56.980312342] file system registered
Mar 15 14:56:56 Zenfone8 usb_moded[4761]: Unable to find $power_supply device.
Mar 15 14:56:56 Zenfone8 usb_moded[4761]: hwal init failed

Let’s mask usb-moded too. Great, I don’t get disconnected now, and can inspect the running system.

I notice that odm is mounted as /odm_root. Is there anything useful there that could prevent droid-hal-init from starting?
Also “vendor-vm\x2dsystem.mount: Mount process exited, code=exited status=32”. So does “asusfw”, and “FFS”

pulseaudio[5126]: library “libdl_android.so” not found

This is usually fixed by linking the lib in /odm/lib[64]

And let’s see about that droid-hal-init.
If I didn’t write this till now, droid-hal-init is actually “init”, but from Android. We are executing this as non-0 PID service (with some patches so it ignores that it is not PID=1).
I didn’t peek into its sources before, but I got more familiar with this occasion.
On Android 11 at least, there is a “first stage init” that determines if its’ recovery or not, and if not it loads kernel modules.
Then it executes himself with “selinux_setup” as argument.
If that succeeds, then it executes himself with “second_stage”.
And it actually starts executing /system/etc/init.rc

The hybris patches change a bit of that, but in essence you still get the same steps.

Speaking of selinux, maybe this is why it sends us to the ramdump screen. Or the failed mounts.
So I at have to check:

  1. failed mounts
  2. selinux files
  3. odm links

1.failed mounts.

# systemctl status vendor-vm\\x2dsystem.mount
mount: /vendor/vm-system: wrong fs type, bad option, bad superblock on /dev/sde44, missing codepage or helper program

sde44, the device previously known as /dev/block/bootdevice/by-name/vm-bootsys_b. I don’t have notes on /vendor/vm-system being mounted in Lineage. It’s presence in the fstab may be a red herring.
Also a generous xxd -l 5120 shows only zeroes and there is no reference to either name(s) in the ASUS flashing script or list of images.
Same story for vendor/asusfw aka sda16.
find /usr/lib/systemd/system -iname \*.mount -exec grep FFS {} \; -print

Description=FFS mount
/usr/lib/systemd/system/dev-mtp.mount

Again, I’ll worry about this later. MTP is media transfer protocol.

2.selinux files.
HADK FAQ says I should copy some files.
They’re from /vendor/etc/selinux.
Unfortunately I don’t have ssh to copy them from the device in the droid-config-sake repo. I’ll make the folder structure on device and I’ll copy it later through ADB (If I don’t forget and boot into Lineage by accident)

As in the example repo, I:

  • create config with the same contents, create minimum/context/files folder, create minimum/context/dbus_contexts with the same contents.
  • copy /vendor/etc/selinux/vendor_file_contexts to minimum/contexts/files/files_contexts
  • create ‘minimum/policy’ folder, copy… what. There’s no precompiled_sepolicy in /vendor/etc.
    Fortuntely I soon find one in /odm_root/etc/selinux/precompiled_sepolicy. Hopefully it plays the same role.

3.odm links
For now I create a link in /odm/etc pointing to /odm_root/etc
There’s another ueventd.rc file in there talking about firmwares for “trustedvm” that are on /vendor/vm-system but I don’t have that so I don’t link it either.

4?
What about wireless? I am to curious not to check it out.
Not present in ifconfig -a. It fails to modprobe qca_cld3_wlan and dmesg says Reject WLAN Driver insmod before CBC
Kernel source above that error says:

/* If enabled Cold Boot Calibration is the 1st step in init sequence.
* CBC is done on file system_ready trigger. Qcacld should be loaded
* from init.target.rc after that. Reject qcacld load from
* vendor_modprobe.sh at early boot to satisfy this requirement.
*/

Wow, so I need some droid init sequence even for wlan driver. Hmm. find / -iname \*.rc -exec grep -i wlan {} \; -print finds me this nicely commented piece of init:

# Enable WLAN cold boot calibration
write /sys/devices/platform/soc/b0000000.qcom,cnss-qca6490/fs_ready 1

Ok, modprobe again:

DMS QMI connection not established
Direct firmware load for wlan/qca_cld/WCNSS_qcom_cfg.ini failed with error -2

I do have /vendor/firmware/qca_cld/WCNSS_qcom_cfg.ini
QMI is probably Qualcomm MSM Interface. Wikipedia says oFono could help. But why this dependency in the wlan driver? VoWIFI?
The kernel source file with that error references “CNSS_QMI_DMS_CONNECTED” that is missing.
I’ll leave it for another day to see what DMS is.


Ok, now let’s start droid-hal-init.
Prepare three telnet sessions, one with journalctl -f, one ready to launch /usr/libexec/droid-hybris/system/bin/logcat

Aand… ramdump screen. I don’t have the chance to execute logcat.
At least this time I got fuller journal than before, when probably usb-moded kicked in earlier and journalctl didn’t sync to persisted?

There are a couple of ‘sending signal 9’ lines
The last service started is ‘/vendor/bin/vendor_modprobe.sh’.
Probably some .rc services need to be disabled.
Tomorrow.

3 Likes

Day 26

Let’s see where that is.
processing action (early-init) from (/vendor/etc/init/hw/init.target.rc:33)

Running early-init from that script, line by line, doesn’t crash the kernel.

I need to get back to kernel reading to see how to acutally use the ram dumps.
The asusdebug.c seems to be fully created by asus and it’s history only contains 2 commits in my kernel tree.
Let’s look at the initial integration which is in commit named treewide: Import ASUS changes from 31.1004.0404.81

Scroll, scroll, scroll.

WTF did I just see pubg here.

Scroll, scroll. I cannot finish reading it all as of now.


Playing a funny game: ls /sys/class/leds/

blue/ led:flash_0/ led:flash_2/ led:switch_0/ led:switch_2/ led:torch_1/ led:torch_3/ red/
green/ led:flash_1/ led:flash_3/ led:switch_1/ led:torch_0/ led:torch_2/ mmc0::confused:

Echoing 1 or 0 to red/green/blue path brightness controls the notification led next to the charger port
Echoing 1 to switch_0/brightness makes the camera led flash :wink: Well, at least the first time.
Maybe I will get faster-than-camera-opens flashlight in SFOS


After playing the led game, I again start dmesg -w (for kicks) and /usr/bin/droid/droid-hal-startup.sh
And I got my ‘lucky’ kernel crash - dmesg: Ubuntu Pastebin
Basically lucky because while a kernel thread was bringing the system down some other one found the time to flush the dmesg to telnet…

The crash is an ipa_assert.

Direct firmware load for ipa_fws.mdt failed with error -2
ipa_fws: Failed to locate ipa_fws.mdt(rc:-2)
pil_boot failed for ipa_fws
ipa ipa3_pil_load_ipa_fws:6536 Unable to PIL load FW for sub_sys=ipa_fws
ipa ipa3_load_ipa_fw:6579 IPA FW loading process has failed result=-22
IPA: unrecoverable error has occurred, asserting

While grepping that same log for ipa, the scrollbar jumps above and shows me previously parsed rc files:

droid-hal-init: Parsing file /vendor/etc/init/ipa_fws.rc…
droid-hal-init: Parsing file /vendor/etc/init/ipacm-diag.rc…
droid-hal-init: Parsing file /vendor/etc/init/ipacm.rc…

The first one has one suspicious monologue with the kernel: write /dev/ipa 1
Unfortunately that command is in an on early-boot trigger so I cannot just disable it as I do with a service override.

I’m backing out the IPA=n in the previous kernel defconfig changes (backing out GSI=n does not work, telnet is again absent.)
Boot hybris-boot, and check

  1. /vendor/firmware/ipa_fws.mdt exists alright.
  2. echo 1 > /dev/ipa crashes the system

Let’s chmod a-w /dev/ipa and run droid-hal-startup.sh again. Ramdump.
Hmm… maybe I need to disable ipacm service too.
I add to /usr/libexec/droid-hybris/system/etc/init/disabled_services.rc:

service vendor.ipacm /system/vendor/bin/ipacm_HYBRIS_DISABLED

Unfortunately, that’s not the culprit. The real problem is that root can write to /dev/ipa even without rights.
And that makes the kernel crash - altough I don’t know why it doesn’t find that framework.
I just checked and echo 1 > /dev/ipa crashes recovery too.
So the only “working system” I have that uses that /dev/ipa is Lineage itself…

I am grepping the whole hadk repo for ipa_fws.

For example, kernel ipa_i.h says

/* The relative location in /lib/firmware where the FWs will reside */
#define IPA_FWS_PATH “ipa/ipa_fws.elf”
#define IPA_FWS_PATH_4_0 “ipa/4.0/ipa_fws.elf”

and ipa.c in ipa3_manual_load_ipa_fws() says it’s IPA_HW_v4_0

But that codepath of manual ipa load does not seem to be touched based on the above log.
Another one, ipa3_pil_load_ipa_fws is used. Again, based on log messages grepping, that calls request_firmware eventually which ends up in firmware_loader.c.
The path that are enumerated there are:

static const char * const fw_path[] = {
    fw_path_para,
    "/lib/firmware/updates/" UTS_RELEASE,
    "/lib/firmware/updates",
    "/lib/firmware/" UTS_RELEASE,
    "/lib/firmware"
};

And further grepping fw_path_para reveals this nice explanation:

There is an alternative to customize the path at run time after bootup, you
can use the file:

  • /sys/module/firmware_class/parameters/path

Let’s use it, then:

echo -n “/vendor/firmware” > /sys/module/firmware_class/parameters/path

echo 1 > /dev/ipa seems to survive now \o/
I need to add that above to /usr/bin/droid/droid-hal-early-init.sh as it doesn’t exist probably for this exact usecase.

# systemctl unmask droid-hal-init.service and reboot to bootloader to boot hybris-boot.

After reboot, I don’t run journalctl anymore, but /usr/libexec/droid-hybris/system/bin/logcat because now droid-hal-init starts.

And here is my first logcat: Ubuntu Pastebin. Goodnight.

6 Likes

Oh, it sounds like you took that as a criticism. I had intended the opposite! I’m sorry! I was and am impressed because the whole thing is fascinating but also sometimes beyond my understanding. I stay tuned and always look forward to a new episode of the Livecast!

Nice to hear :slight_smile:

3 Likes

Don’t worry, I didn’t took it as a criticism. But it did trigger some introspection.
Thanks for the kind words.