Livecasting porting notes for Zenfone 8

Day 11:
Asked about gki1 on #sailfishos-porters - suggestion was I should adapt hadk/hybris/hybris-boot/Android.mk - just as I tried to do with ubports build.sh.

Meanwhile:
UBports pad was updated.
There are references to vbmeta empty hashes if system reboots into bootloader fast (as I had on Day 7 part 1?)
Also “console=tty0” is a must for cmdline and should not be removed no matter what.

Back to SFOS:
I continue to have a long discussion about gki/vendor_boot on #sailfishos-porters. Folks are very helpful but it is clear that this is new territory. Asus may have used a shiny new thing that was not compulsory for Android 11.

Creating an empty vbmeta meanwhile:

$ avbtool make_vbmeta_image --flags 2 --padding_size 4096 --output vbmeta_disabled.img

Flashing that vbmeta + hybris recovery. No telnet.
Flash asus vendor_boot
Flashing ubports boot/vendor_boot
Flashing back lineage boot/vendor_boot

nothing.

Per porters’ idea, I unpacked Lineage’s boot.img, replaced its ramdisk with the hybris-recovery one LZ4’d → stuck at ASUS logo, no sign of mer rndis/gadget either.
(It’s funny how the hybris-recovery differs from the hybris-boot ramdisk just by one byte, which is 1 as in ALWAYS_DEBUG=1)

Because the device won’t reboot in 60s and I have no telnet, it means that the init script does not reach the sleep 60 line either.

Added a sleep 30, followed by a reboot -f just before hotplug in case that hotplug line is the culprit.
Still stuck at ASUS logo.

It is, however, a lesson to me: you need to trust that the init shell script can be debugged with nothing at hand but staggered sleeps and restarts, that is, with no other connection to the device.
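
The staggered-sleep reasoning can be written down as a tiny host-side check. This is only a sketch: `sleep_was_reached` is a hypothetical helper name, and it assumes the reboot time is measured by hand with a stopwatch. If the device reboots sooner than the inserted sleep alone would take, the script most likely never got to that line.

```shell
# Hypothetical host-side helper, not part of hadk or hybris-boot.
# Usage: sleep_was_reached SLEEP_SECONDS OBSERVED_SECONDS
#   SLEEP_SECONDS     the "sleep N" placed before "reboot -f" in the init script
#   OBSERVED_SECONDS  time measured by hand from boot start until the device reboots
sleep_was_reached() {
    sleep_s=$1
    observed=$2
    if [ "$observed" -lt "$sleep_s" ]; then
        # The device rebooted sooner than the sleep alone would take,
        # so the script most likely never got to that line.
        echo "not reached"
    else
        # A longer time is consistent with the sleep having run,
        # but does not prove it.
        echo "possibly reached"
    fi
}
```

For example, `sleep_was_reached 30 14` describes a 30-second sleep and a reboot after about 14 seconds.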

Tomorrow I will probably add a restart even earlier…

Day 12:
Let’s add a sleep 15 at the beginning of init shellscript.
Flashing hybris-recovery → reboot in about 10 seconds (or 14 if you count the ‘unlocked’ screen) but hardly 15.

Let’s add 25. The same reboot time, approx 14 (depends on how you count).

Wait, I forgot, I shouldn’t be flashing hybris-recovery, but Lineage boot with hybris recovery ramdisk:)
mkbootimg later, fastboot boot → ‘Failed to load/authenticate boot image: Load Error’
Ok, fastboot flash boot. Now, “familiar” stuck-at-asus-logo.

Wait, was it like this all the time when the hybris-recovery booted? 10-14s until reboot?

Let me grab an ‘old’ hybris-recovery. Yes, it was…

Ok, maybe I should boot twrp just to have adb shell and check last_kmsg or equivalent.

fastboot boot twrp → ‘Failed to load/authenticate boot image: Load Error’
Reboot bootloader → fastboot boot twrp finishes, but … is stuck at asus logo ?
Let’s flash Lineage boot again just for sanity. Works :whew:

Nothing in pstore or last_kmsg though.

Let’s replace the kernel in the ubports boot/vendor_boot pair now, with the Lineage one.
Hmm… restart in under 20 seconds… which is different :slight_smile:

But… The problem remains - boot script does not seem to be followed.
The original recovery init is not a shell script but a binary.
Searching for “kernel init shell script” finds me a reference to CONFIG_BINFMT_SCRIPT.
My config does not have this - though it might be enabled by default.
Let’s check with scripts/extract-ikconfig - yup, CONFIG_BINFMT_SCRIPT is enabled.

So this is not the problem.

I remembered that elros said something about abootimg. I think he referred to using that to replace the kernel.
abootimg seems to be a tool that would allow this, but the last commit is from 2012.
There are forks updated two months ago, like this one that support v1 and v2 images.
Of course, I’m on v3 so they might not apply. Let’s “table” this.

What about Lineage? how come it boots?
What if I replace my kernel in the lineage boot.img, would it still boot?
There is something weird when I do this - boot.img was 98M, the new boot.img is 54M only.
Just to set the record straight, the way I’m unpacking and repacking is:

unpack_bootimg --boot_img boot.img --out k --format mkbootimg > k.txt

then, cat-ing the k.txt

mkbootimg --header_version 3 --os_version 11.0.0 --os_patch_level 2023-02 --kernel k/kernel --ramdisk k/ramdisk --cmdline '' -o boot-k.img

I remember seeing this difference on another unpack/repack, when I did the replacement in the other direction earlier (Lineage kernel in the ubports boot image). I did not see it (the size was always 46M) when replacing the kernel with the Lineage one in my hadk recovery boot image, though.
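
To judge which of the two sizes is the "plain" one, the expected size of a repacked image can be estimated from its parts. This is a sketch under an assumption: a header version 3 image is laid out as one 4096-byte header page followed by the kernel and the ramdisk, each padded up to a multiple of 4096 bytes. The function names `page_align` and `expected_v3_size` are made up for illustration.

```shell
# Round a byte count up to the next multiple of 4096.
page_align() {
    echo $(( ($1 + 4095) / 4096 * 4096 ))
}

# Expected size in bytes of a header v3 boot image that contains only
# the header page, the kernel and the ramdisk (no footer, no extra padding).
# Usage: expected_v3_size KERNEL_BYTES RAMDISK_BYTES
expected_v3_size() {
    kernel_size=$1
    ramdisk_size=$2
    echo $(( 4096 + $(page_align "$kernel_size") + $(page_align "$ramdisk_size") ))
}

# Example with the unpacked files from above (not run here):
#   expected_v3_size "$(wc -c < k/kernel)" "$(wc -c < k/ramdisk)"
```

If the repacked image matches this estimate, then whatever makes the original bigger would have to come from something applied after mkbootimg has run.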

Maybe my mkbootimg is different than the one in lineage tree?
That was in lineage/out/soong/host/linux-x86/bin/mkbootimg as I’ve seen on Day 6.
The mkbootimg has no --version param, but just calling it with -h shows differences in output.

lineage/out/soong/host/linux-x86/bin/mkbootimg
usage: mkbootimg [-h] [--kernel KERNEL] [--ramdisk RAMDISK] [--second SECOND]
                 [--dtb DTB]
                 [--recovery_dtbo RECOVERY_DTBO | --recovery_acpio RECOVERY_ACPIO]
                 [--cmdline CMDLINE] [--vendor_cmdline VENDOR_CMDLINE]
                 [--base BASE] [--kernel_offset KERNEL_OFFSET]
                 [--ramdisk_offset RAMDISK_OFFSET]
                 [--second_offset SECOND_OFFSET] [--dtb_offset DTB_OFFSET]
                 [--os_version OS_VERSION] [--os_patch_level OS_PATCH_LEVEL]
                 [--tags_offset TAGS_OFFSET] [--board BOARD]
                 [--pagesize {2048,4096,8192,16384}] [--id]
                 [--header_version HEADER_VERSION] [--dt DT] [-o OUTPUT]
                 [--vendor_boot VENDOR_BOOT] [--vendor_ramdisk VENDOR_RAMDISK]
/usr/bin/mkbootimg
usage: mkbootimg [-h] [--kernel KERNEL] [--ramdisk RAMDISK] [--second SECOND] [--dtb DTB] [--recovery_dtbo RECOVERY_DTBO | --recovery_acpio RECOVERY_ACPIO]
                 [--cmdline CMDLINE] [--vendor_cmdline VENDOR_CMDLINE] [--base BASE] [--kernel_offset KERNEL_OFFSET] [--ramdisk_offset RAMDISK_OFFSET]
                 [--second_offset SECOND_OFFSET] [--dtb_offset DTB_OFFSET] [--os_version OS_VERSION] [--os_patch_level OS_PATCH_LEVEL] [--tags_offset TAGS_OFFSET]
                 [--board BOARD] [--pagesize {2048,4096,8192,16384}] [--id] [--header_version HEADER_VERSION] [-o OUTPUT] [--vendor_boot VENDOR_BOOT]
                 [--vendor_ramdisk VENDOR_RAMDISK] [--vendor_bootconfig VENDOR_BOOTCONFIG] [--gki_signing_algorithm GKI_SIGNING_ALGORITHM]
                 [--gki_signing_key GKI_SIGNING_KEY] [--gki_signing_signature_args GKI_SIGNING_SIGNATURE_ARGS] [--gki_signing_avbtool_path GKI_SIGNING_AVBTOOL_PATH]

But I just replaced the soong one with a stub script that calls my host’s, and it didn’t generate a smaller image for Lineage.

Let’s check parameters too:
Intercepted command:

mkbootimg --kernel $kernel --ramdisk $ramdisk-recovery --os_version 11 --os_patch_level 2023-02-05 --header_version 3 --output boot.img
Unpackbootimg command:
mkbootimg --header_version 3 --os_version 11.0.0 --os_patch_level 2023-02 --kernel b/kernel --ramdisk b/ramdisk --cmdline '' -o hybris-b.img

Differences are: other mkbootimg binary, missing day in OS patch level, version is dotted, cmdline is specified… weird that any of those would count.

Adapting the command-line for the hybris ramdisk + Lineage boot didn’t make a difference in size (and there wasn’t a difference from the get go).
Adapting the command-line for the ubports ramdisk + Lineage boot outputs the exact file as before, ~50M vs 98M without lineage.
No difference in the Lineage boot.img repacking either (still 54M instead of 98M)

Let’s diff the extracted kernel from a boot.img to the one originally packed by the scripts from ubports or Lineage build.
Using diff <(xxd ../debugging/lineage/boot/kernel) <(xxd out/target/product/sake/kernel) I see that they do differ:

< 00001090: 0200 0014 885e 1453 2900 01d0 2941 1c91 …^.S)…)A…

From an early offset, that is.
But remember, after the initial Lineage build, I’ve re-made the build with my mkbootimg spoofing script. Just because I re-made it may make it different,…

There’s also the fact that the kernel in the build tree is marked as executable. Maybe I should do that too?

Anyway, a thought for the next day, this difference in boot img sizes… what about booting lineage with my hadk kernel?
Let’s try it anyway. (Doing that). Restarts to bootloader in like 40 seconds.
Maybe I forgot vendor_boot from ubports? Let’s flash Lineage’s vendor_boot to make sure.
Restart in about 20s…

Day 13:
Vendor boot partitions

Copy your device fstab into /first_stage_ramdisk in the vendor_boot partition, not the boot partition.

GKI-versioning
Idea: extract-ikconfig from gki kernel and apply minimal patches to that to boot ubports

Note for self: Xiaomi 12x wouldn’t have used the v3 image since it has a 4.19 kernel, not a 5.4 :slight_smile:

Sanity check: take lineage boot.img, unpack it and repack it.
The image is different (52M vs 98M) but Lineage still boots. The same result with host mkbootimg and soong/out one.

The weird part is that taking commands sourced from Day 6, when I tried to see what mkbootimg commands were there in the Lineage build… they still output 52M. So it’s not the unpack step that makes the images smaller… what can it be?

Next thing to sanity try: a kernel change.
I am going to take inspiration from the xperia 10 IV kernel minimal changes. First resetting to Lineage config, then picking this change and packing it in the HADK kernel, then putting the kernel in the Lineage boot image.

20 mins later - It boots.

Let’s pick the next change now, which implies NotKit has made sure to not break the ABI.
20 mins again:
Hmm fastboot plays tricks again, “fastboot: error: Failed to identify current slot”.
Disconnecting and reconnecting the cable and rebooting to bootloader. Flashed.

Lineage boots.
But it says there’s an internal problem with my device. And I should contact the manufacturer.

But it works. Progress! Let’s pick the next change by NotKit.
20 more mins:
Same fastboot cable dance as before.
Lineage logo appears, but the device does not finish booting.

Actually… it does, but the touchscreen does not work, the screen flickers and the notification light flashes red :slight_smile:
I would call it a success - my ubports or hadk builds don’t do that :slight_smile:

Now I would like to test this kernel with either hadk or ubports.
However, logically, the Lineage one has not worked before for this test. There is a slight possibility that some of the changes make this kernel more friendly with our GNU/Linux.

8 mins later:
Making hybris-recovery and flashing that as boot → restart in under 10 seconds.
But I had a sleep 5 followed by restart in that init. Let’s make it 30.
8 mins again:
Fastboot cable/reboot dance.
It still restarts under 10s :frowning: so nothing says that the sleep value was taken into account.
Desperately hoping that I would find a last_kmsg or /sys/fs/pstore message, I try fastboot boot twrp but that locks up stuck at the asus logo too…

So checking my notes, it seems that 10-14s reboot has always been the case for hybris-boot.img. Under 10s is just depending on how you count.
So no progress on the front of “Lineage-like kernel, but with hybris init”.

Let’s see ubports boot/vendor_boot.
That is, the one where ubports ramdisk is on vendor.

This takes at most 20s to reboot, and does go through the ASUS logo - hybris-boot hadn’t.
It is also the same result as in the previous day.

trying to boot / flash twrp to boot again:
“Any key to shutdown” says an 8-pt font on a 300+ppi screen (I had to use an actual “search icon” to see that - that is, an actual magnifying glass)
Flashing doesn’t work until a 2nd reboot, twrp starts, no last_kmsg or /sys/fs/pstore as ever.

So I’m back to square one. I don’t know how to boot the linuxes in this boot/vendor_boot world

Even my early hack of moving the halium ramdisk to first_stage_ramdisk doesn’t seem to make a difference (didn’t check the build output though [I did in hindsight, it was what I expected])

And now Lineage boot says “Can’t load Android system. Your data may be corrupt. If you continue to get this message, you may need to perform a factory reset and erase all user data stored on this device” [Factory Reset/Try again]?.

Hmm. Maybe I should have erased user data between all these experiments.
Try again.
Lineage vibrates, shows loading screen and “internal problem with your device” (that second-to-last kernel…) but does boot on second try.

Day 14:

Let’s read the recovery init sources, the ones which produce a binary file - that work with both kernels, as opposed to the busybox shell script which doesn’t.

Again I’m grepping lineage mk files for the recovery-image target

find . -iname '*.mk' -exec grep "recovery-image" {} \; -print

No results.

find . -iname '*.mk' -exec grep "ramdisk-recovery" {} \; -print

Only one out result.
.bp?

Nope.
Just ramdisk then. Many hits. Filtering out the out/ ones and the goldfish/cuttlefish ones (a generic device?/the emulator?)

./vendor/lineage/build/tasks/kernel.mk
./build/make/core/envsetup.mk
./build/make/core/board_config.mk
./build/make/core/main.mk
./build/make/core/config.mk
./build/make/core/tasks/sdk-addon.mk
./build/make/target/product/gsi_keys.mk
./build/make/target/product/virtual_ab_ota.mk
./build/make/target/product/developer_gsi_keys.mk
./build/make/target/product/security/Android.mk
./device/asus/sake/device.mk
./system/core/CleanSpec.mk
./system/core/rootdir/avb/Android.mk
./system/core/rootdir/Android.mk
./system/core/init/Android.mk
./system/sepolicy/Android.mk
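
The search that produced a list like the one above can be sketched as a single helper. This assumes GNU grep (for `-r` with `--include`); `mk_files_mentioning` is a made-up name, and the exclusion patterns simply mirror the filtering described above.

```shell
# List *.mk files under DIR (default: current directory) that mention PATTERN,
# leaving out build output (out/) and the goldfish/cuttlefish device trees.
# Usage: mk_files_mentioning PATTERN [DIR]
mk_files_mentioning() {
    pattern=$1
    dir=${2:-.}
    grep -rl --include='*.mk' -e "$pattern" "$dir" \
        | grep -v -e '/out/' -e 'goldfish' -e 'cuttlefish'
}

# Example, from the top of the Lineage tree (not run here):
#   mk_files_mentioning ramdisk
```

Printing only file names (`-l`) keeps the output short enough to go through the hits one by one.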

The first one I inspect, ./build/make/core/board_config.mk, references a debug_ramdisk thing I also have in out/ and which has this adb_debug.prop with the following interesting contents:

# Note: This file will be loaded with highest priority to override
# other system properties, if a special ramdisk with "/force_debuggable"
# is used and the device is unlocked.
# Disable adb authentication to allow test automation on user build GSI
ro.adb.secure=0
# Allow 'adb root' on user build GSI
ro.debuggable=1
# Introduce this property to indicate that init has loaded adb_debug.prop
ro.force.debuggable=1
Sounds very useful, but is related to adb debugging.

The next one, ./build/make/core/main.mk has some targets I could grep for later:

.PHONY: ramdisk
ramdisk: $(INSTALLED_RAMDISK_TARGET)
.PHONY: ramdisk_debug
ramdisk_debug: $(INSTALLED_DEBUG_RAMDISK_TARGET)
.PHONY: ramdisk_test_harness
ramdisk_test_harness: $(INSTALLED_TEST_HARNESS_RAMDISK_TARGET)
.PHONY: vendor_ramdisk_debug
vendor_ramdisk_debug: $(INSTALLED_VENDOR_DEBUG_RAMDISK_TARGET)

The next one ./build/make/core/config.mk lists ramdisk in dont_bother_goals := :- )

The next one, ./build/make/core/tasks/sdk-addon.mk, has a QEMU reference on the same line, so I’ll skip it.

$(addon_dir_img):$(INSTALLED_QEMU_RAMDISKIMAGE):images/$(TARGET_CPU_ABI)/ramdisk.img \

I’m losing my patience and I skip to ./system/core/init/Android.mk but there isn’t anything of much more interest other than

# Install adb_debug.prop into debug ramdisk.
# This allows adb root on a user build, when debug ramdisk is used.
LOCAL_REQUIRED_MODULES := \
   adb_debug.prop \
# Set up the directories that first stage init mounts on.
LOCAL_POST_INSTALL_CMD := mkdir -p \
    $(TARGET_RAMDISK_OUT)/debug_ramdisk \
    $(TARGET_RAMDISK_OUT)/dev \
    $(TARGET_RAMDISK_OUT)/mnt \
    $(TARGET_RAMDISK_OUT)/proc \
    $(TARGET_RAMDISK_OUT)/sys \

Buut.
LOCAL_MODULE_PATH := $(TARGET_RAMDISK_OUT)
also

LOCAL_MODULE := init_first_stage
LOCAL_MODULE_STEM := init

Maybe in here there’s the .cpp file that builds init.

Detour thoughts (from Day 13):
1. /first_stage_ramdisk should have an fstab
2. boot img is further post-processed to grow, in the lineage build?

Looking at first_stage_init.cpp patched for hadk, the last lines do an exec of /sbin/droid-hal-init which I don’t have (yet) because I have only started to boot the kernel, not the whole thing.

My hunch is that the same init sources are used for system and recovery too. And that the recovery one is started first then it switches to the system one.

This “confirms” it, with a question: would it work if recovery is Lineage/Asus and system init would not be?
Let’s look at the same file in Lineage, so “unpatched”:

It just execs “/system/bin/init”. That is, first stage init, which is in boot.img’s recovery-ramdisk at system/bin/init, execs… itself?
Of course not.

The author of that line says himself:

Tom Cherry, 5 years ago (July 21st, 2018 12:57 AM)
split first stage init into a separate executable

In the future, systems with dm-linear will require a ramdisk to set up
the mount for system. In this world, first stage init will be a part
of this ramdisk and handle setting up dm-linear, mounting the
necessary partitions, then pivoting to the system image, which will
become the root partition.
This also enables previous devices without system-as-root, to be
unified with system-as-root devices for all aspects of boot after the
pivot_root.

:thinking:
I think this means that if I want to re-use the recovery ramdisk I have to have my init on system, already mounted by dm-linear. Which of course I don’t have.

But this doesn’t explain why literally having another “first-stage-init” wouldn’t work, so I need to dig into that more (why init busybox shell is not enough?)

Day 15

/dev/console

This option connects stdin, stdout, and stderr to the console. It is mutually exclusive with the
stdio_to_kmsg option, which only connects stdout and stderr to kmsg.

Idea:
Set androidboot.selinux=permissive kernel cmdline

system/core/init/main.cpp calls SetupSelinux(argv) if “selinux_setup” is passed in, which is on first_stage_init.cpp last line that execs ‘/sbin/droid-hal-init’ or ‘/system/bin/init’ (originally)

There is a very nice Readme at /system/core/init/README.md:

First stage init has three variations depending on the device configuration:

  1. For system-as-root devices, first stage init is part of /system/bin/init and a symlink at /init
    points to /system/bin/init for backwards compatibility. These devices do not need to do anything to
    mount system.img, since it is by definition already mounted as the rootfs by the kernel.

  2. For devices with a ramdisk, first stage init is a static executable located at /init. These
    devices mount system.img as /system then perform a switch root operation to move the mount at
    /system to /. The contents of the ramdisk are freed after mounting has completed.

  3. For devices that use recovery as a ramdisk, first stage init it contained within the shared init
    located at /init within the recovery ramdisk. These devices first switch root to
    /first_stage_ramdisk to remove the recovery components from the environment, then proceed the same
    as 2). Note that the decision to boot normally into Android instead of booting
    into recovery mode is made if androidboot.force_normal_boot=1 is present in the
    kernel commandline.

Once first stage init finishes it execs /system/bin/init with the “selinux_setup” argument. This
phase is where SELinux is optionally compiled and loaded onto the system. selinux.cpp contains more
information on the specifics of this process.

Lastly once that phase finishes, it execs /system/bin/init again with the “second_stage”
argument. At this point the main phase of init runs and continues the boot process via the init.rc
scripts.

Back to “why my hadk init does not execute?”:

file busybox from hadk

ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, for GNU/Linux 2.6.32

file init from lineage

ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /system/bin/linker64

[hadk]$ find . -iname busybox -exec file {} \; -print

They are all 32-bit.

In ubports/

ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, for GNU/Linux 3.7.0

CONFIG_COMPAT=y is set in kernel (all of them, asus, lineage, hadk…) so 32-bit execution should work
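
When `file` is not at hand (for example inside a recovery shell), the 32-bit/64-bit question can be answered from the ELF identification bytes alone. A minimal sketch, assuming `od` and `tr` are available; `elf_class` is a hypothetical helper name. It relies on the ELF layout: the first four bytes are the magic `0x7f 'E' 'L' 'F'` and the fifth byte is 1 for 32-bit or 2 for 64-bit objects.

```shell
# Print "32-bit", "64-bit", "unknown" or "not ELF" for FILE.
# Usage: elf_class FILE
elf_class() {
    # First four bytes as lowercase hex, whitespace removed.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "7f454c46" ] || { echo "not ELF"; return; }
    # Fifth byte (offset 4) is the ELF class.
    case $(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n') in
        1) echo "32-bit" ;;
        2) echo "64-bit" ;;
        *) echo "unknown" ;;
    esac
}

# Example (not run here):
#   find . -iname busybox | while read -r f; do echo "$f: $(elf_class "$f")"; done
```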

Let’s still copy busybox from ubports to hadk/external/busybox :slight_smile:
Flashing the 64-bit busybox… still reboots at ~14 seconds :frowning:
Flashing an older ubports boot.img shows ASUS logo for more than 60 seconds though… (24Feb)
Flashing another older ubports boot.img (25Feb) shows the same…
Making a system/bin folder in first_stage… and linking init to ../../init
Making a vendor_boot with that - still reboots in about 20 seconds

Day 16

Let’s see why /sys/fs/pstore/ has no console-ramoops when the device reboots.
All CONFIG_PSTORE-related flags are present in the kernel.
Recovery (at least Lineage one) does mount pstore.

(pstore logging is based on a memory region that is not cleared upon reboot, where the kernel puts the last logs before crashing).

Grepping through the kernel, Documentation/admin-guide/ramoops.rst says that the pstore mem can be configured:

  • as kernel params
  • dts (devicetree) files may indicate pstore support.
  • using a ramoops_platform_data struct reference in C.

Looking at my current SFOS device, I do have ‘ramoops_memreserve=4M’ in /proc/cmdline at runtime.
I don’t have it in BoardConfig.mk though. (The next day I found out that cmdline can be prepended from .dts files and it was there)
Lineage on Zenfone does not have anything “ramoops”-related in /proc/cmdline

My current SFOS device also has printk.devkmsg=on

Let’s add

BOARD_KERNEL_CMDLINE := \
    printk.devkmsg=on \
    ramoops_memreserve=4M \
    androidboot.console=ttyMSM0 \

Building, flashing, rebooting to bootloader - still no /sys/fs/pstore contents

Besides the kernel’s ramoops.rst, I also found a stackoverflow post referencing dts/devicetree files.
Let’s download the ASUS OSS kernel to see if there is any ramoops region defined in dts[i] files; Lineage doesn’t seem to have one.
There is also good advice in postmarketOS wiki:

Make sure that the memory address you reserve and its size is the same in your downstream and mainline configuration

Day 17

SFOS on tucana /proc/cmdline

ramoops_memreserve=4M rcupdate.rcu_expedited=1 rcu_nocbs=0-7 console=ttyMSM0,115200n8 earlycon=msm_geni_serial,0x880000 androidboot.hardware=qcom androidboot.console=ttyMSM0 androidboot.memcg=1 lpm_levels.sleep_disabled=1 video=vfb:640x400,bpp=32,memsize=3072000 msm_rtb.filter=0x237 service_locator.enable=1 swiotlb=1 loop.max_part=7 androidboot.usbcontroller=a600000.dwc3 selinux=1 enforcing=0 audit=0 androidboot.selinux=permissive systemd.legacy_systemd_cgroup_controller=yes androidboot.init_fatal_reboot_target=recovery printk.devkmsg=on androidboot.verifiedbootstate=orange androidboot.keymaster=1 root=PARTUUID=ed7e0db9-1db3-d6bc-273c-2d091d3262b3 androidboot.bootdevice=1d84000.ufshc androidboot.serialno=ab5fb9e6 androidboot.ramdump=disable androidboot.secureboot=1 androidboot.cpuid=0x5670fc70 androidboot.hwversion=6.19.0 androidboot.hwc=GLOBAL androidboot.hwlevel=MP androidboot.baseband=msm msm_drm.dsi_display0=dsi_xiaomi_f4_41_06_0a_fhd_cmd_display: skip_initramfs rootwait ro init=/init androidboot.dtbo_idx=13 androidboot.dtb_idx=4 androidboot.dp=0x0

lineage on sake /proc/cmdline

log_buf_len=256K earlycon=msm_geni_serial,0x98c000 rcupdate.rcu_expedited=1 rcu_nocbs=0-7 kpti=off androidboot.console=ttyMSM0 androidboot.hardware=qcom androidboot.memcg=1 androidboot.usbcontroller=a600000.dwc3 cgroup.memory=nokmem,nosocket console=ttyMSM0,115200n8 ip6table_raw.raw_before_defrag=1 iptable_raw.raw_before_defrag=1 loop.max_part=7 lpm_levels.sleep_disabled=1 msm_rtb.filter=0x237 pcie_ports=compat service_locator.enable=1 swiotlb=0 buildvariant=userdebug androidboot.verifiedbootstate=orange androidboot.keymaster=1 androidboot.bootdevice=1d84000.ufshc androidboot.fstab_suffix=default androidboot.boot_devices=soc/1d84000.ufshc androidboot.serialno=M4AIB76087246RZ androidboot.baseband=msm msm_drm.dsi_display0=qcom,mdss_dsi_samsung_fhd_cmd: androidboot.slot_suffix=_b rootwait ro init=/init androidboot.dtbo_idx=4 androidboot.dtb_idx=0 androidboot.force_normal_boot=1 androidboot.pre-ftm=0 androidboot.id.prj=4 androidboot.id.stage=7 androidboot.id.sku=3 androidboot.id.rf=1 androidboot.id.pcb=4731 androidboot.id.nfc=1 androidboot.country_code=EU androidboot.bootcount=0 androidboot.rawdump_en=0 androidboot.asus.authorized=0 androidboot.cpuid.hash=7facfc3cde1faa8fca410840eaf6adde androidboot.toolid=043f7063abdf40289d70cb1b1ec86413 SB=Y androidboot.fused=1 SBNR=Y androidboot.fused.norpmb=1 androidboot.id.panel=1 androidboot.id.ufs=3 androidboot.factory.crc=0x84A43520 androidboot.ddr.manufacturer_id=FF androidboot.ddr.device_type=8 LCD=96000141
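
Two command lines this long are hard to compare by eye, so here is a small sketch that lists the parameter names present in one but missing from the other. The helper names `cmdline_keys` and `cmdline_only_in` are made up, and the splitting is naive: it assumes parameters are separated by single spaces and contain no quoted spaces.

```shell
# Print the parameter names (the part before '=') of a kernel command line,
# one per line, sorted and de-duplicated.
cmdline_keys() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -e '/^$/d' -e 's/=.*//' | sort -u
}

# Print the parameter names that appear in command line A but not in B.
# Usage: cmdline_only_in "CMDLINE_A" "CMDLINE_B"
cmdline_only_in() {
    a=$(mktemp)
    b=$(mktemp)
    cmdline_keys "$1" > "$a"
    cmdline_keys "$2" > "$b"
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Example with the two dumps saved to files (file names are placeholders):
#   cmdline_only_in "$(cat tucana.cmdline)" "$(cat sake.cmdline)"
```

Comparing names only keeps device-specific values such as serial numbers out of the result.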

Let’s manually trigger a kernel panic in Lineage
# echo c > /proc/sysrq-trigger

AND WE GET OUR FIRST RAMDUMP REFERENCE! :ghost:

“Waiting for flashing full ramdump” it’s a whole new screen that pops up instead of bootloader / ASUS logo or Lineage one.

On Vol-Dn and Power, Lineage boots - but no console ramoops in sys/fs/pstore…

I again ask for ideas on #sailfishos-porters channel.
Two great ideas:

  1. modify the Lineage ramdisk too (after I tried to replace the kernel) to see if that boots.
  2. adding printk.always_kmsg_dump=y too to my BOARD_KERNEL_CMDLINE and CONFIG_PANIC_TIMEOUT=20 to my kernel would help me again to observe if there is an additional timeout / pstore log.

For 2, I do the changes which involve a 22-minute build including the kernel.
Unfortunately, the new hybris-recovery image does not behave differently: it reboots just as fast and doesn’t write to /sys/fs/pstore.

Some other ideas:
3. modify the hybris boot compression to use lz4 → same behavior as before
4. oops=panic makes kernel oops behave as panic → same behavior as before.

So in conclusion, there is no kernel problem :-S because the 20 seconds from OOPS or PANIC are not added.

Let’s also try adding changes to lineage ramdisk - point 1. above.

cpio ramdisks seem hard to work with since they may pack files owned by root and you can’t just unpack and re-pack them.
For my first test, I’ll just --append a file, to see how that goes.

find ./testonefile | cpio -oA -H newc -F ramdisk-appendone.cpio
and repackaged
mkbootimg --out boot-appendone.img --header_version 3 --os_version 11.0.0 --os_patch_level 2023-02 --kernel boot/kernel --ramdisk boot/ramdisk-appendone.cpio.lz4 --cmdline ''

And. Lineage. Does. Not. Boot. with that.

New error is always progress :wink:

“Waiting for flashing full ramdump”

So the phone is stuck in this or you can boot something else?

So far it’s just a screen that looks like this, and can be made go away by VolDn + Power.

This post on stackoverflow explains that it might be “just” a kernel panic that, along with bootloader support for MAGIC_CRASH makes the next boot look like this.

My hypothesis is that ASUS included this MAGIC_CRASH handler in the bootloader, which is why they have reports of the “dreaded ramdump bug” instead of just a “dreaded bootloop”.

I am still trying to figure out what tool I can use from host to get the dump :wink:

Day 18

(30 seconds).

It seems I missed in Day 13 to actually apply this first change I was talking about.
Let’s do that, since it is called “minimal changes to boot” :).

Hmm… there was a boot.img created with hadk scripts, possibly with the first kernel I changed to pass all mer checker tests and other kernel checkers.
I think that image has the Lineage ramdisk with that kernel. I’m trying to boot that and for a second time I get the
‘Waiting for flashing full ramdump’ message [1].
Since then, I have taken back most of the kernel changes and re-made them ‘incrementally’ based on the sony xperia 10 IV ubports repo.
I rebuild with make bootimage and I still get that ramdump message, though only after the ASUS logo stays for some time and then disappears for some time…
‘Waiting for flashing full ramdump’ again. This may be because I just added oops=panic yesterday?


That file I appended yesterday to the lineage ramdisk… Let’s look into build/make/core/Makefile
After copying a lot of files to $(TARGET_RECOVERY_ROOT_OUT), it executes a command $(BOARD_RECOVERY_IMAGE_PREPARE).
Then it uses mkbootfs instead of cpio, citing the kernel Documentation/driver-api/early-userspace/buffer-format.rst - which just describes cpio…

find . -iname \*.mk -exec grep BOARD_RECOVERY_IMAGE_PREPARE {} \; -print finds nothing (tried .bp and Makefile too)

mkbootfs.c seems to have a list of permissions and ownerships for android filesystem.
Maybe the problem is that the file I appended was created with my local user, which has uid/gid of 1000 - that corresponds to system in Android.
However, changing the file ownership to root does not make the device boot.
Neither does… not adding a file at all:). That may be because of the way I use the LZ4 command line (defaults?)
That same core/Makefile above says $(LZ4) -l -12 --favor-decSpeed
Yup! Victory! one file appended to that ramdisk!
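
A quick way to tell which LZ4 container a compressed ramdisk uses is to look at its first four bytes. This sketch assumes the magic numbers from the LZ4 format documentation: 0x184C2102 for the legacy format written by `lz4 -l` and 0x184D2204 for the current frame format, both stored little-endian. `lz4_format` is a hypothetical helper name.

```shell
# Print "legacy", "frame" or "unknown" depending on the first four bytes of FILE.
# Usage: lz4_format FILE
lz4_format() {
    case $(od -An -tx1 -N4 "$1" | tr -d ' \n') in
        02214c18) echo "legacy" ;;   # 0x184C2102, what "lz4 -l" writes
        04224d18) echo "frame" ;;    # 0x184D2204, the lz4 default
        *)        echo "unknown" ;;
    esac
}

# Example: compare the original ramdisk with the repacked one (not run here):
#   lz4_format boot/ramdisk; lz4_format boot/ramdisk-appendone.cpio.lz4
```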

Next test: hybris-recovery with lz4 -l (“Use Legacy format (typically for Linux Kernel compression)” - get it?).
This time, instead of rebooting in ~14 seconds, it gets stuck at bootloader image.
Progress is getting redefined weirder and weirder.

Back to my non-booting hybris-recovery.img:

I’ll change the gzip -9 commands inside hybris-boot/Android.mk to lz4 -l -12 --favor-decSpeed, and the .gz extensions to .lz4.

This will allow the bootloader to concatenate the hybris boot.img ramdisk with vendor_boot.img ramdisk.
But I remember, the vendor_boot is the one with the kernel cmdline. Are there any differences between Lineage vendor_boot and HADK boot cmdline? Maybe.

I try to boot a hybris-recovery.img lz4’d with that but the result is still stuck at logo.
Booting the equivalent boot.img (if you remember, hybris-hal also builds one just as lineage does…) - it also does not get me out of the woods, but it does switch to ASUS logo, then blank for a long time, then RAMDUMP screen [1].

Maybe the ramdisk in that boot is different, maybe the kernel cmdline should be left out…
Trying with lineage ramdisk and no cmdline → no sign of life, no ASUS logo.
Trying with lineage ramdisk and cmdline → the same
Nonsense, IMO - lineage booted just fine with my previous kernels?
I cannot make it boot now.
And there is a slight difference between the hybris kernel + lineage ramdisk vs the hybris-recovery: the latter displays an android logo over the selected bootloader option, while the latter does not…

Oh my!, I think I just made a header_version = 0 image

mkbootimg --header_version 0 --kernel boot/kernel-lineage --ramdisk boot/ramdisk-lineage --cmdline '' --out boot-lineage-lineage-nocmdline.img

instead of

mkbootimg --header_version 3 --os_version 11.0.0 --os_patch_level 2023-02 --kernel boot/kernel --ramdisk boot/ramdisk --cmdline '' --out boot-appendone.img

Mbbfrp… that was it, lineage boots with my kernel alright :facepalm:

However, with my ramdisk (from boot.img) it gives a RAMDUMP.
[1] I think I know why: that ramdisk has a different system/bin/init. That is Android’s init, which is now re-used in boot, recovery and in SFOS too, but patched.
The one that runs droid-hal-init when SFOS is starting. That may be the cause of my RAMDUMP. So… I can ignore this boot.img for now:)

There is one thing that tricked me before into making a header-0 image, and that is unpack_bootimg output from hybris-recovery.img.
I need to make that header-3 too.
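
To avoid flashing a header-0 image by accident again, the header version can be read straight from the image before flashing. A sketch under stated assumptions: the image starts with the 8-byte magic `ANDROID!`, and the header version is the 32-bit little-endian field at byte offset 40, which holds for header versions 0 to 2 as well as 3. Only the lowest byte is read, which is enough for small version numbers. `bootimg_header_version` is a made-up name.

```shell
# Print the header version of an Android boot image, or fail if the magic is missing.
# Usage: bootimg_header_version FILE
bootimg_header_version() {
    # "ANDROID!" as hex.
    magic=$(od -An -tx1 -N8 "$1" | tr -d ' \n')
    [ "$magic" = "414e44524f494421" ] || { echo "not a boot image"; return 1; }
    # Lowest byte of the little-endian header_version field at offset 40.
    od -An -tu1 -j40 -N1 "$1" | tr -d ' \n'
    echo
}

# Example (not run here):
#   bootimg_header_version hybris-recovery.img
```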

Booting that - wow, new symptoms again - 10 seconds and the phone immediately shuts down. Effectively displays “charging” logo.

Not convinced. one more try:) This time, I use the “unlocked bootloader” feature to ‘press any key to stop booting’, THEN I set up my Timer, then click to resume. Asus logo… 35 seconds or so to… reboot.

If you remember, what was the sleep that I introduced in init[2] shell? 30 seconds.

I back out my changes from init, make hybris-recovery again, unpack it, repack it with image version 3.
And it reboots in about 65 seconds.
Not bright, not terrible - this means that RNDIS didn’t work.


[2] my changes to add sleep 30; reboot -f were wrong, I had to move them after busybox --install a couple of days ago, after elros34’s advice on #sailfishos-porters


Day 19

To recap, I replaced my own-built kernels into Lineage’s boot.img and they worked.
I tried to replace Lineage’s ramdisk in hybris boot.img and it didn’t work.
=> This has led me to the fact that hybris boot/recovery is built using image header 0 (instead of 3).

I tried to append a file to the Lineage ramdisk.
=> This revealed that there are particular lz4 parameters used for the ramdisk

Now this part of the init script does not work:

usb_setup() {
    if [ -d $ANDROID_USB ]; then
        usb_setup_android_usb $1
    elif [ -d $GADGET_DIR ]; then
        usb_setup_configfs $1
    fi
}

where ANDROID_USB=/sys/class/android_usb/android0 and GADGET_DIR=/config/usb_gadget.
Let’s see what devices are available under Lineage recovery:

ASUS_I006D:/ # ls -l /sys/class/android_usb/android0/
power/                   state                    subsystem/               uevent                   waiting_for_supplier
ASUS_I006D:/ # ls /config/usb_gadget/g1/
UDC           bDeviceProtocol  bMaxPacketSize0  bcdUSB   driver_match_existing_only  idProduct  max_speed  strings
bDeviceClass  bDeviceSubClass  bcdDevice        configs  functions                   idVendor   os_desc

So both are.

  1. Android USB

# This sets up the USB with whatever USB_FUNCTIONS are set to via android_usb

The first command in usb_setup_android_usb() is:

    write $ANDROID_USB/enable          0

Executing that write in a Lineage shell does not work: write: inaccessible or not found

  2. Gadget

The first command in usb_setup_configfs() is:

    write $GADGET_DIR/g1/idVendor                   "0x18D1"

Executing that write in a Lineage shell does not work: write: inaccessible or not found
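
“inaccessible or not found” is what the Android shell prints for an unknown command, so this result may only mean that write does not exist in Lineage’s shell, not that the node rejects the value. Assuming write in the init script is a small helper that puts a value into a file (an assumption, its definition is not quoted in these notes), a stand-in like the one below could be pasted into the Lineage shell to repeat the test. Note that the listing above shows no enable node under android0, so the first write would have nothing to write to even with the helper in place.

```shell
# Hypothetical stand-in for the init script's write helper:
# "write PATH VALUE" puts VALUE into PATH without a trailing newline.
write() {
    printf '%s' "$2" > "$1"
}

# Repeat the two failing steps by hand (paths as in the init script).
# Guarded, so nothing happens where these nodes do not exist.
ANDROID_USB=/sys/class/android_usb/android0
GADGET_DIR=/config/usb_gadget
if [ -w "$ANDROID_USB/enable" ]; then
    write "$ANDROID_USB/enable" 0
fi
if [ -w "$GADGET_DIR/g1/idVendor" ]; then
    write "$GADGET_DIR/g1/idVendor" "0x18D1"
fi
```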

Grepping the #sailfishos-porters logs, I again find advice from elros34: check what my Android device uses in its */usb.rc files.
These .rc files belong to Android’s init system.

Doing find device/asus/sake -iname \*.rc -exec grep usb {} \; -print finds me a configfs reference.
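
A variant of that search which prefixes every hit with its file name and line number can be easier to scan. This is only a sketch: list_usb_rc_hits is a made-up name, and the /dev/null argument is there to make grep print file names even when find hands it a single file.

```shell
# List every line mentioning "usb" in *.rc files below a directory,
# each prefixed with file name and line number.
# list_usb_rc_hits is a hypothetical helper name.
list_usb_rc_hits() {
    find "$1" -iname '*.rc' -exec grep -n usb /dev/null {} +
}
```

It would be called as list_usb_rc_hits device/asus/sake.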

But let’s first fix the build to generate header-3 images. Adding the previous Lineage custom mkbootimg shell script gives us

MBOOTIMG --ramdisk out/target/product/sake/obj/ROOT/hybris-recovery_intermediates/recovery-initramfs.lz4 --kernel out/target/product/sake/kernel --dtb out/target/product/sake/dtb.img --base 0x00000000 --pagesize 4096 --cmdline printk.devkmsg=on printk.always_kmsg_dump=y ramoops_memreserve=4M androidboot.ramdump=disable androidboot.console=ttyMSM0 androidboot.hardware=qcom androidboot.memcg=1 androidboot.usbcontroller=a600000.dwc3 androidboot.selinux=permissive cgroup.memory=nokmem,nosocket console=ttyMSM0,115200n8 ip6table_raw.raw_before_defrag=1 iptable_raw.raw_before_defrag=1 loop.max_part=7 lpm_levels.sleep_disabled=1 msm_rtb.filter=0x237 pcie_ports=compat service_locator.enable=1 swiotlb=0 bootmode=debug --output out/target/product/sake/obj/ROOT/hybris-recovery_intermediates/hybris-recovery.img

Somehow the BOARD_MKRECOVERYIMG_ARGS, which contain --header_version $(BOARD_BOOT_HEADER_VERSION), don’t make it to hybris-boot/Android.mk.
I will add --header_version $(BOARD_BOOT_HEADER_VERSION) manually to the $(MKBOOTIMG) command.

Building a new hybris-recovery & booting it shows some difference from yesterday (ASUS logo instead of stuck on Android one) and reboots in 65+ seconds… into a “Safe mode” Lineage. Didn’t know that was a thing :slight_smile:

Now reordering the usb_setup lines in init script to read

usb_setup() {
    if [ -d $GADGET_DIR ]; then
        usb_setup_configfs $1
    elif [ -d $ANDROID_USB ]; then
        usb_setup_android_usb $1
    fi
}

But I get the same result as above, reboot in 60+ seconds.

Let’s unpack/repack the image as yesterday. Same 65+ seconds reboot. I think the image is packed alright, but the kernel still misses some configs to get USB telnet.


Grepping for usb in .rc files as above also found me a reference to /vendor/etc/init/hw/init.qcom.usb.rc which I find only on the device (interesting, I thought this Lineage build was also building vendor…?)

init.qcom.usb.rc seems to be using write /config/usb_gadget/g2/configs/b.1/strings/0x409/configuration "rndis", that is, b.1 instead of c.1 as in the init script.
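
If the config name turns out to matter, the configfs steps can be parameterised so that c.1 and b.1 are both easy to try. The following is a sketch under assumptions, not the hybris init script: the function name setup_rndis_gadget, the rndis.usb0 function instance and the way the UDC name is passed in are illustrative choices, and this kernel may expect different names (which is exactly what is in question here).

```shell
# Sketch: set up an RNDIS gadget through configfs with a selectable
# config name. Hypothetical helper, not taken from the init script.
#   $1  gadget directory, e.g. /config/usb_gadget/g1
#   $2  config name, e.g. c.1 or b.1
#   $3  UDC name, e.g. an entry from /sys/class/udc (optional)
setup_rndis_gadget() {
    g=$1; cfg=$2; udc=$3
    mkdir -p "$g/strings/0x409" "$g/functions/rndis.usb0" \
             "$g/configs/$cfg/strings/0x409" || return 1
    printf '%s' "0x18D1" > "$g/idVendor"
    printf '%s' "rndis" > "$g/configs/$cfg/strings/0x409/configuration"
    # Bind the function to the chosen configuration.
    ln -s "$g/functions/rndis.usb0" "$g/configs/$cfg/rndis.usb0" || return 1
    # Attaching to a controller is the step that enables the gadget.
    if [ -n "$udc" ]; then
        printf '%s' "$udc" > "$g/UDC"
    fi
}
```

A call such as setup_rndis_gadget /config/usb_gadget/g1 b.1 "$(ls /sys/class/udc | head -n 1)" from a root shell would then show whether the host sees an RNDIS device with that config name.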

I ask on #sailfishos-porters since I don’t see by grepping the logs if someone did this at any other point in time.
elros34 is again of much help, with another piece of advice: if I know that my kernel boots, maybe I should just… boot it. Meaning, use hybris-boot.img instead of hybris-recovery.img.

Then I will get logs and could iterate later in “rootfs”.

What he means is that, if /data is mounted correctly and /data/.stowaways/sailfishos is there, then I can debug by editing that directly (through recovery, or even Lineage). That’s because hybris-boot will not stop/reboot if there is no USB device up. I’d also have logs persisted in init.log.

But I still get that BadDLLP error on my host. Let me try that kernel param again… nope, it just messes up my flashing (needs me to reboot the bootloader?) but USB is not talking to me any more than before, no telnet.

Next day I’ll create the droid-config-sake, droid-hal-sake and droid-hal-version-sake repos and try a real SFOS build encouraged by

kernel runs init, proof by changing reboot sleep

(as postulated in Day 11)


Day 20

HADK guide,
7.1 Creating Repositories for a New Device

(Skipping sparse/var/lib/environment/compositor/droid-hal-device.conf - it is already present in droid-configs-device/sparse/var/lib/environment/compositor/droid-hal-device.conf, without the touchscreen device part.)

After creating the three repos droid-hal-sake, droid-config-sake and droid-hal-version-sake, I also check a few things in PLATFORM_SDK$ (sfossdk command):

  • cat /etc/os-release gives me 4.3.0.15, which is minimum supported
  • sdk-assistant tooling list contains 4.4.0 (which I want to target) and ‘latest’
  • sdk-assistant target list contains… xiaomi-tucana :slight_smile:

I need to create a new target - I download Sailfish_OS-4.4.0.58-Sailfish_SDK_Target-aarch64.tar.7z and run sdk-assistant target create asus-sake-aarch64.

But how will the next commands know that they should use this new target?
There’s a footnote in the HADK doc that says:

mb2 looks for a directory named .mb2, where it stores some of its state. It is created implicitly by mb2 … build and you can also
create it explicitly with mb2 -t $VENDOR-$DEVICE-$PORT_ARCH build-init

Indeed, I now see that cat .mb2/target has asus-sake-aarch64.default. So I can proceed without breaking my previous target.

Next, execute commands from “7.2.1 Building the droid-hal-device packages” HADK doc.

rpm/dhd/helpers/build_packages.sh --droid-hal

The first command fails while checking CONFIG_DUMMY (should be n), CONFIG_UEVENT_HELPER_PATH and CONFIG_FW_LOADER_USER_HELPER (should be n).

I think my “minimal changes” kernel will probably not work with these checks.
Either I have to a) disable the checks “temporarily” or b) switch back to the initial branch that passed all the kernel checkers.
Of course I go with b) initially. I always wanted to test hybris-recovery with that anyway:)
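
To see up front which of the “should be n” options a given kernel config would trip over, a small check can be run against the .config before starting build_packages.sh. This is a sketch only: kconfig_is_off is a hypothetical helper, not part of the HADK tooling, and it treats “is not set” and a missing option as disabled.

```shell
# Report whether a kernel option is disabled in a .config-style file.
# kconfig_is_off is a hypothetical helper; an option counts as off
# unless the file sets it to y or m.
kconfig_is_off() {
    file=$1; opt=$2
    if grep -q "^$opt=[ym]" "$file"; then
        echo "$opt is enabled, should be n"
        return 1
    fi
    echo "$opt is off"
}
```

It would be called once per option, for example kconfig_is_off path/to/.config CONFIG_DUMMY, with the path pointing at wherever the kernel build leaves its .config.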

The kernel “works” in the same way, meaning that it “reboots when I say so”, but still no telnet.
It seems to pass the --droid-hal check, which now complains that it cannot stat './out/target/product/sake/hybris-updater-script'

That one seems to be missing because no /boot is detected when I make hybris-hal (that is, kernel and boot images):

/boot appears to live on
/data appears to live on /dev/block/bootdevice/by-name/userdata

It definitely doesn’t “Live on”:). That was supposed to be a device path.
The hybris-boot/Android.mk presents me with the challenge to understand Perl.

/usr/bin/perl -w -e '$$fs=shift; if ($$ARGV[0]) { while (<>) { next unless /^$$fs\s|\s$$fs\s/;for (split) {next unless m(^/dev); print "$$_\n"; }}} else { print "ERROR: *fstab* not found\n";}' /boot $HYBRIS_FSTABS | sort -u

It’s one line, how hard can it be ;)). HYBRIS_FSTABS is a list of fstab files.
So, the perl expression is using shift, which probably means it reads the first argument (which is /boot). Then, for the ‘next first argument’ ARGV[0], it uses some while spaceship (probably iterating through each fstab path?) where next is called (continue?) unless that $fs is matching at the beginning or between \s some \s, and then split and for next unless /dev…

Anyway, enough perl, I didn’t have a /dev path in the fstab that listed /boot.
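
For reference, here is how the one-liner reads once unrolled, as far as I can tell: shift takes the mount point, while (<>) walks the lines of all remaining files, lines are kept only if the mount point appears as a whitespace-delimited word, and from those lines every field starting with /dev is printed (the doubled $$ is just Makefile escaping). A rough shell/awk equivalent, offered as a sketch with the made-up name devices_for_mount, not as a replacement for what Android.mk does:

```shell
# Rough shell/awk equivalent of the Perl one-liner.
# devices_for_mount is a hypothetical name.
#   $1     mount point to look for, e.g. /boot
#   $2...  fstab files to scan
devices_for_mount() {
    mp=$1; shift
    if [ $# -eq 0 ]; then
        echo "ERROR: *fstab* not found"
        return 0
    fi
    awk -v mp="$mp" '
        {
            hit = 0
            # Keep only lines where some field is exactly the mount point.
            for (i = 1; i <= NF; i++) if ($i == mp) hit = 1
            if (!hit) next
            # From those lines, print every field that starts with /dev.
            for (i = 1; i <= NF; i++) if (index($i, "/dev") == 1) print $i
        }' "$@" | sort -u
}
```

An fstab line that names the mount point but has no /dev path on it produces no output at all, which matches the empty “/boot appears to live on” message.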

The build_packages.sh --droid-hal seems to start the real work now.

Meanwhile mal explains what to do with dynamic partitions. Right on time:) Telegram: Contact @sailfishos_porters
So I need to skip generating some systemd mounts in droid-hal spec file.

%define makefstab_skip_entries /odm /product /system /system_ext /vendor

Also I get this complaint from --droid-hal build above:

error: Installed (but unpackaged) file(s) found: /bugreports /cache /d /sdcard

Which I resolve by adding %define straggler_files .. with that list above.
Building of droid-hal-sake finished successfully.

Next is rpm/dhd/helpers/build_packages.sh --configs
That already installs some packages in the target (bluez5 etc) but works.

Next is rpm/dhd/helpers/build_packages.sh --mw
mmmiddleware… La crème.

Nothing to do.
Build libhybris? [Y/n/all]

If I remember correctly it asks for each one of the packages. So I’ll go with “all”
That takes some time but finishes alright

Next is rpm/dhd/helpers/build_packages.sh --gg

Please build droidmedia as per HADK instructions
!! Failed to pack_source_droidmedia-localbuild.sh

Well, let’s TEMPORARY_DISABLE_PATH_RESTRICTIONS=true make -j$(nproc --all) droidmedia one more time, I haven’t done that for a couple of days.

ninja: no work to do.

Still the same error.

touch external/droidmedia/droidmedia.cpp

Doesn’t build anything either. There is no out/target/product/sake/system/lib*/libdroidmedia* either.
Looking at external/droidmedia/Android.mk I see LOCAL_MODULE := libdroidmedia (along others). Let’s make that…?
Actually, I think I know what’s going on. I checked out the 0.20220929.0 tag for 4.5.0 as in hadk-hot, while for 4.4.0 the 0.20211101.0 tag was supposed to be used.

But I get the same error. I need to understand where droidmedia target is.
Grepping the *.mk files again, I find it in … hybris-boot:

droidmedia: $(shell external/droidmedia/detect_build_targets.sh $(PORT_ARCH) $(TARGET_ARCH))

If I manually run that command I get the list of targets libdroidmedia minimediaservice minisfservice libminisf from external/droidmedia.
I’ll just make all of those and assume that me making hybris-hal without droidmedia marked that shell target generation as ‘done’ and it doesn’t get evaluated again \o/

It completes.
Next is rpm/dhd/helpers/build_packages.sh --version

Check /home/vlad/hadk/hybris/droid-hal-version-sake.log for full log.
!! building of package failed

Checking that log reveals:

File /etc/ofono/binder.conf from install of droid-config-sake-1-202303222304.aarch64 (dir:/home/vlad/hadk/hybris/droid-hal-version-sake/.mb2/filtered-output-dir) conflicts with file from package ofono-configs-binder-1.0.2-1.1.1.jolla.aarch64 (@System)

File conflicts happen when two packages attempt to install files with the same name but different contents. If you continue, conflicting files will be replaced losing the previous content.

Which means that I need to make a change to droid-config-sake spec.
Hadk-hot has the solution for that readily available:

Provides: ofono-configs
Obsoletes: ofono-configs-mer

Adding those, re-building --configs, then --version

While building I remembered that Lineage /data is going to get wiped for the next test.
I install Open Camera and notice that if you switch to Camera 2 API, there are settings for Edge Detection and Noise reduction.
Cranking the Edge detection down (off) and Noise reduction to “High Quality” I get similar results to my Mi Note 10 that has a larger sensor but tends to not sharpen but to aggressively remove noise. The Camera 1 API shots on ASUS are however very sharpened by default. It is good to know that just changing the API adds access to those options. Of course, Noise reduction should be minimal too :slight_smile:

Hmm. Same error in hybris/droid-hal-version-sake.log.
A quest for tomorrow:)