Livecasting porting notes for Zenfone 8

So far it’s just a screen that looks like this, and it can be made to go away with VolDn + Power.

This post on stackoverflow explains that it might be “just” a kernel panic that, along with bootloader support for MAGIC_CRASH, makes the next boot look like this.

My hypothesis is that ASUS included this MAGIC_CRASH handler in the bootloader, which is why there are reports of the “dreaded ramdump bug” instead of just a “dreaded bootloop”.

I am still trying to figure out what tool I can use from the host to get the dump :wink:


Day 18

(30 seconds).

It seems that on Day 13 I forgot to actually apply the first change I was talking about.
Let’s do that, since it is called “minimal changes to boot” :).

Hmm… there was a boot.img created with hadk scripts, possibly with the first kernel I changed to pass all mer checker tests and other kernel checkers.
I think that image has the lineage ramdisk with that kernel. I’m trying to boot that and for a second time I get the
‘Waiting for flashing full ramdump’ message [1].
Since then, I have taken back most of the kernel changes and re-made them ‘incrementally’ based on the sony xperia 10 IV ubports repo.
I rebuild with make bootimage and I still get that ramdump message, though only after the ASUS logo stays for some time and then disappears for some time…
‘Waiting for flashing full ramdump’ again. This may be because I just added oops=panic yesterday?


That file I appended yesterday to the lineage ramdisk… Let’s look into build/make/core/Makefile
After copying a lot of files to $(TARGET_RECOVERY_ROOT_OUT), it executes a command $(BOARD_RECOVERY_IMAGE_PREPARE).
Then it uses mkbootfs instead of cpio, citing the kernel Documentation/driver-api/early-userspace/buffer-format.rst - which just describes cpio…

find . -iname \*.mk -exec grep BOARD_RECOVERY_IMAGE_PREPARE {} \; -print finds nothing (tried .bp and Makefile too)

mkbootfs.c seems to have a list of permissions and ownerships for android filesystem.
Maybe the problem is that the file I appended was created with my local user, which has a uid/gid of 1000 - that corresponds to system in Android.
However, changing the file ownership to root does not make the device boot.
Neither does… not adding a file at all:). That may be because of the way I use the LZ4 command line (defaults?)
That same core/Makefile above says $(LZ4) -l -12 --favor-decSpeed
Yup! Victory! one file appended to that ramdisk!
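
Since the difference between a booting and a non-booting ramdisk came down to the container format, peeking at the first four bytes of a ramdisk can tell which one it uses. A minimal sketch, assuming the commonly documented magic numbers (0x184C2102 for the LZ4 legacy format, 0x184D2204 for the LZ4 frame format, 1f 8b for gzip, ASCII "0707" for a bare cpio archive). ramdisk_format is a hypothetical helper name, not something from the build system:

```shell
#!/bin/sh
# Hypothetical helper: classify a ramdisk by its first four bytes.
# $1: path to the ramdisk file.
ramdisk_format() {
    # od prints the bytes as lowercase hex; tr strips the whitespace.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    case "$magic" in
        02214c18) echo "lz4-legacy" ;;   # legacy format, as written by lz4 -l
        04224d18) echo "lz4-frame" ;;    # frame format, the lz4 default
        1f8b*)    echo "gzip" ;;
        30373037) echo "cpio" ;;         # ASCII "0707", uncompressed archive
        *)        echo "unknown" ;;
    esac
}
```

Running it on both the Lineage ramdisk and the repacked one should show the same format before trying to boot.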

Next test: hybris-recovery with lz4 -l (“Use Legacy format (typically for Linux Kernel compression)” - get it?).
This time, instead of rebooting in ~14 seconds, it gets stuck at bootloader image.
Progress is getting redefined weirder and weirder.

Back to my non-booting hybris-recovery.img:

I’ll change the gzip -9 commands inside hybris-boot/Android.mk to lz4 -l -12 --favor-decSpeed, and the .gz extensions to .lz4.

This will allow the bootloader to concatenate the hybris boot.img ramdisk with vendor_boot.img ramdisk.
But I remember, the vendor_boot is the one with the kernel cmdline. Are there any differences between Lineage vendor_boot and HADK boot cmdline? Maybe.

I try to boot a hybris-recovery.img lz4’d with that but the result is still stuck at logo.
Booting the equivalent boot.img (if you remember, hybris-hal also builds one just as lineage does…) - it also does not get me out of the woods, but it does switch to ASUS logo, then blank for a long time, then RAMDUMP screen [1].

Maybe the ramdisk in that boot is different, maybe the kernel cmdline should be left out…
Trying with lineage ramdisk and no cmdline → no sign of life, no ASUS logo.
Trying with lineage ramdisk and cmdline → the same
Nonsense, IMO - lineage booted just fine with my previous kernels?
I cannot make it boot now.
And there is a slight difference between the hybris kernel + lineage ramdisk vs the hybris-recovery: the latter displays an android logo over the selected bootloader option, while the former does not…

Oh my!, I think I just made a header_version = 0 image

mkbootimg --header_version 0 --kernel boot/kernel-lineage --ramdisk boot/ramdisk-lineage --cmdline '' --out boot-lineage-lineage-nocmdline.img

instead of

mkbootimg --header_version 3 --os_version 11.0.0 --os_patch_level 2023-02 --kernel boot/kernel --ramdisk boot/ramdisk --cmdline '' --out boot-appendone.img

Mbbfrp… that was it, lineage boots with my kernel alright :facepalm:
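
To avoid falling into the header-0 trap again, the header version can be read straight from the image before booting it. A minimal sketch, assuming the AOSP boot image layout where the file starts with the "ANDROID!" magic and header_version is a little-endian 32-bit integer at byte offset 40. bootimg_header_version is a hypothetical helper name, and this does not cover vendor_boot images, which use a different magic:

```shell
#!/bin/sh
# Hypothetical helper: print the header_version field of a boot.img.
# $1: path to the boot image.
bootimg_header_version() {
    img=$1
    magic=$(head -c 8 "$img")
    if [ "$magic" != "ANDROID!" ]; then
        echo "not an Android boot image: $img" >&2
        return 1
    fi
    # Read the four bytes at offset 40 as unsigned decimals and
    # assemble them by hand, so host endianness does not matter.
    set -- $(od -An -tu1 -j40 -N4 "$img")
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}
```

For example, bootimg_header_version hybris-recovery.img should print 3 for an image this bootloader accepts, going by the experiments above.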

However, with my ramdisk (from boot.img) it gives a RAMDUMP.
[1] I think I know why: that ramdisk has a different system/init/bin. That is Android’s init, which is re-used now in boot, recovery and in SFOS too, but patched.
The one that runs droid-hal-init when SFOS is starting. That may be the cause of my RAMDUMP. So… I can ignore this boot.img for now:)

There is one thing that tricked me before into making a header-0 image, and that is unpack_bootimg output from hybris-recovery.img.
I need to make that header-3 too.

Booting that - wow, new symptoms again - 10 seconds and the phone immediately shuts down. Effectively displays “charging” logo.

Not convinced. one more try:) This time, I use the “unlocked bootloader” feature to ‘press any key to stop booting’, THEN I set up my Timer, then click to resume. Asus logo… 35 seconds or so to… reboot.

If you remember, what was the sleep that I introduced in init[2] shell? 30 seconds.

I back out my changes from init, make hybris-recovery again, unpack it, repack it with image version 3.
And it reboots in about 65 seconds.
Not bright, not terrible - this means that RNDIS didn’t work.


[2] my changes to add sleep 30; reboot -f were wrong, I had to move them after busybox --install a couple of days ago, after elros34’s advice on #sailfishos-porters


Day 19

To recap, I put my own kernels into Lineage’s boot.img and they worked.
I tried to put Lineage’s ramdisk into the hybris boot.img and it didn’t.
=> This led me to discover that hybris boot/recovery is built using image header version 0 (instead of 3).

I tried to append a file to the Lineage ramdisk.
=> This revealed that there are particular lz4 parameters used for the ramdisk

Now this part of the init script does not work

usb_setup() {
    if [ -d $ANDROID_USB ]; then
        usb_setup_android_usb $1
    elif [ -d $GADGET_DIR ]; then
        usb_setup_configfs $1
    fi
}

where ANDROID_USB=/sys/class/android_usb/android0 and GADGET_DIR=/config/usb_gadget.
Let’s see what devices are available under Lineage recovery:

ASUS_I006D:/ # ls -l /sys/class/android_usb/android0/                                                                                                
power/                   state                    subsystem/               uevent                   waiting_for_supplier
ASUS_I006D:/ # ls /config/usb_gadget/g1/                                                                                                             
UDC           bDeviceProtocol  bMaxPacketSize0  bcdUSB   driver_match_existing_only  idProduct  max_speed  strings
bDeviceClass  bDeviceSubClass  bcdDevice        configs  functions                   idVendor   os_desc

So both are.

  1. Android USB

# This sets up the USB with whatever USB_FUNCTIONS are set to via android_usb

The first command in usb_setup_android_usb() is:

    write $ANDROID_USB/enable          0

Executing that write on Lineage does not work, write: inaccessible or not found

  2. Gadget

The first command in usb_setup_configfs() is:

    write $GADGET_DIR/g1/idVendor                   "0x18D1"

Executing that write on Lineage does not work, write: inaccessible or not found
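
A likely explanation for both failures is that write is not a standalone command in the recovery shell but a small helper the init script defines for itself. A minimal sketch of such a helper, assuming it simply writes its second argument into the file named by the first (the actual definition in the init script may differ):

```shell
#!/bin/sh
# Assumed shape of the helper; the real init script may define it differently.
write() {
    # $1: file to write to (for example a sysfs or configfs attribute)
    # $2: value to write, without a trailing newline
    printf '%s' "$2" > "$1"
}
```

With this pasted into the adb shell session first, the same write lines from the init script can be retried by hand, which at least separates "command missing" from "kernel rejects the value".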

Grepping the #sailfishos-porters logs, I again find advice from elros34: check what my Android device uses in its */usb.rc files.
These .rc files belong to Android’s init system.

Doing find device/asus/sake -iname \*.rc -exec grep usb {} \; -print finds me a configfs reference.

But let’s first fix the build to generate header-3 images; Adding the previous Lineage custom mkbootimg shellscript gives us

MBOOTIMG --ramdisk out/target/product/sake/obj/ROOT/hybris-recovery_intermediates/recovery-initramfs.lz4 --kernel out/target/product/sake/kernel --dtb out/target/product/sake/dtb.img --base 0x00000000 --pagesize 4096 --cmdline printk.devkmsg=on printk.always_kmsg_dump=y ramoops_memreserve=4M androidboot.ramdump=disable androidboot.console=ttyMSM0 androidboot.hardware=qcom androidboot.memcg=1 androidboot.usbcontroller=a600000.dwc3 androidboot.selinux=permissive cgroup.memory=nokmem,nosocket console=ttyMSM0,115200n8 ip6table_raw.raw_before_defrag=1 iptable_raw.raw_before_defrag=1 loop.max_part=7 lpm_levels.sleep_disabled=1 msm_rtb.filter=0x237 pcie_ports=compat service_locator.enable=1 swiotlb=0 bootmode=debug --output out/target/product/sake/obj/ROOT/hybris-recovery_intermediates/hybris-recovery.img

Somehow, the BOARD_MKRECOVERYIMG_ARGS which have --header_version $(BOARD_BOOT_HEADER_VERSION), don’t make it to hybris-boot/Android.mk
I will add the --header_version $(BOARD_BOOT_HEADER_VERSION) manually to the $(MKBOOTIMG) command.

Building a new hybris-recovery & booting it shows some difference from yesterday (ASUS logo instead of stuck on Android one) and reboots in 65+ seconds… into a “Safe mode” Lineage. Didn’t know that was a thing :slight_smile:

Now reordering the usb_setup lines in init script to read

usb_setup() {
    if [ -d $GADGET_DIR ]; then
        usb_setup_configfs $1
    elif [ -d $ANDROID_USB ]; then
        usb_setup_android_usb $1
    fi
}

But I get the same result as above, reboot in 60+ seconds.

Let’s unpack/repack the image as yesterday. Same 65+ seconds reboot, I think the image is packed alright but the kernel still misses some configs to get usb telnet.


Grepping for usb in .rc files as above also found me a reference to /vendor/etc/init/hw/init.qcom.usb.rc which I find only on the device (interesting, I thought this Lineage build was also building vendor…?)

init.qcom.usb.rc seems to be using write /config/usb_gadget/g2/configs/b.1/strings/0x409/configuration "rndis", that is, b.1 instead of c.1 as in the init script.

I ask on #sailfishos-porters since I don’t see by grepping the logs if someone did this at any other point in time.
elros34 is again of much help, giving me another piece of advice: if I know that my kernel boots, maybe I should just… boot it. Meaning, use hybris-boot.img instead of hybris-recovery.img.

Then I will get logs and could iterate later in “rootfs”.

What he means is that, if /data is mounted correctly and /data/.stowaways/sailfishos is there, then I can debug by editing that directly (through recovery, or even Lineage). That’s because hybris-boot will not stop/reboot if there is no USB device up. I’d also have logs persisted in init.log.

But I still get that BadDLLP error on my host. Let me try that kernel param again… nope, it just messes up my flashing (needs me to reboot the bootloader?) but usb is not talking to me any more than before, no telnet.

Next day I’ll create the droid-config-sake, droid-hal-sake and droid-hal-version-sake repos and try a real SFOS build encouraged by

kernel runs init, proof by changing reboot sleep

(as postulated in Day 11)


Day 20

HADK guide,
7.1 Creating Repositories for a New Device

(Skipping sparse/var/lib/environment/compositor/droid-hal-device.conf - it is already present in droid-configs-device/sparse/var/lib/environment/compositor/droid-hal-device.conf, without the touchscreen device part.)

After creating the three repos droid-hal-sake, droid-config-sake and droid-hal-version-sake, I also check in PLATFORM_SDK$ (sfossdk command):

  • cat /etc/os-release gives me 4.3.0.15, which is minimum supported
  • sdk-assistant tooling list contains 4.4.0 (which I want to target) and ‘latest’
  • sdk-assistant target list contains… xiaomi-tucana :slight_smile:

I need to create a new target - I download Sailfish_OS-4.4.0.58-Sailfish_SDK_Target-aarch64.tar.7z and run sdk-assistant target create asus-sake-aarch64.

But how will the next commands know that they should use this new target?
There’s a footnote in the HADK doc that says:

mb2 looks for a directory named .mb2, where it stores some of its state. It is created implicitly by mb2 … build and you can also
create it explicitly with mb2 -t $VENDOR-$DEVICE-$PORT_ARCH build-init

Indeed, I now see that cat .mb2/target has asus-sake-aarch64.default. So I can proceed without breaking my previous target.

Next, execute commands from “7.2.1 Building the droid-hal-device packages” HADK doc.

rpm/dhd/helpers/build_packages.sh --droid-hal

The first command fails while checking CONFIG_DUMMY (should be n), CONFIG_UEVENT_HELPER_PATH and CONFIG_FW_LOADER_USER_HELPER (should be n).

I think my “minimal changes” kernel will probably not work with these checks.
Either I have to a). disable the checks “temporarily” or b). switch back to the initial branch that passed through all the kernel checkers.
Of course I go with b). initially. I always wanted to test hybris-recovery with that anyway:)

The kernel “works” in the same way meaning that it “reboots when I say so”, but still no telnet.
It seems to pass the --droid-hal check, which now complains that cannot stat './out/target/product/sake/hybris-updater-script'

That one seems to be missing because no /boot is detected when I make hybris-hal (that is, kernel and boot images):

/boot appears to live on
/data appears to live on /dev/block/bootdevice/by-name/userdata

It definitely doesn’t “Live on”:). That was supposed to be a device path.
The hybris-boot/Android.mk presents me with the challenge to understand Perl.

/usr/bin/perl -w -e '$$fs=shift; if ($$ARGV[0]) { while (<>) { next unless /^$$fs\s|\s$$fs\s/;for (split) {next unless m(^/dev); print "$$_\n"; }}} else { print "ERROR: *fstab* not found\n";}' /boot $HYBRIS_FSTABS | sort -u

It’s one line, how hard can it be ;)). HYBRIS_FSTABS is a list of fstab files.
So, the perl expression uses shift, which takes the first argument (/boot) into $fs. Then, if there is a ‘next first argument’ ARGV[0] (that is, at least one fstab path), the while spaceship reads every line of every fstab file. next (perl’s continue) skips a line unless $fs matches at the beginning of the line followed by whitespace (\s), or somewhere between two \s. The lines that remain are split into fields, and only the fields that start with /dev are printed.

Anyway, enough perl, I didn’t have a /dev path in the fstab that listed /boot.
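
For those who read awk more easily than perl, the same lookup can be sketched as below. This is only an approximation of the one-liner: it compares whole whitespace-separated fields against the mount point instead of using a regular expression, and find_dev_for_mount is a hypothetical name, not something from hybris-boot:

```shell
#!/bin/sh
# Hypothetical shell/awk approximation of the Perl one-liner.
# $1: mount point to look for (e.g. /boot); remaining args: fstab files.
find_dev_for_mount() {
    mnt=$1
    shift
    if [ $# -eq 0 ]; then
        echo "ERROR: *fstab* not found"
        return 0
    fi
    awk -v mnt="$mnt" '
        {
            # Does any whitespace-separated field equal the mount point?
            hit = 0
            for (i = 1; i <= NF; i++) if ($i == mnt) hit = 1
            if (!hit) next
            # Print every field of that line that looks like a device path.
            for (i = 1; i <= NF; i++) if ($i ~ /^\/dev/) print $i
        }
    ' "$@" | sort -u
}
```

An empty result for /boot then means exactly what happened here: no fstab line mentions /boot together with a /dev path.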

The build_packages.sh --droid-hal seems to start the real work now.

Meanwhile mal explains what to do with dynamic partitions. Right on time:) Telegram: Contact @sailfishos_porters
So I need to skip generating some systemd mounts in droid-hal spec file.

%define makefstab_skip_entries /odm /product /system /system_ext /vendor

Also I get this complaint from --droid-hal build above:

error: Installed (but unpackaged) file(s) found: /bugreports /cache /d /sdcard

Which I resolve by adding %define straggler_files .. with that list above.
Building of droid-hal-sake finished successfully.

Next is rpm/dhd/helpers/build_packages.sh --configs
That already installs some packages in the target (bluez5 etc) but works.

Next is rpm/dhd/helpers/build_packages.sh --mw
mmmiddleware… La crème.

Nothing to do.
Build libhybris? [Y/n/all]

If I remember correctly it asks for each one of the packages. So I’ll go with “all”
That takes some time but finishes alright

Next is rpm/dhd/helpers/build_packages.sh --gg

Please build droidmedia as per HADK instructions
!! Failed to pack_source_droidmedia-localbuild.sh

Well, let’s TEMPORARY_DISABLE_PATH_RESTRICTIONS=true make -j$(nproc --all) droidmedia one more time, I haven’t done that in a couple of days.

ninja: no work to do.

Still the same error.

touch external/droidmedia/droidmedia.cpp

Doesn’t build anything either. There is no out/target/product/sake/system/lib*/libdroidmedia* either.
Looking at external/droidmedia/Android.mk I see LOCAL_MODULE := libdroidmedia (along others). Let’s make that…?
Actually, I think I know what’s going on. I checked out the 0.20220929.0 tag for 4.5.0 as in hadk-hot and for 4.4.0 the 0.20211101.0 tag was supposed to be used.

But I get the same error. I need to understand where droidmedia target is.
Grepping the *.mk files again, I find it in … hybris-boot:

droidmedia: $(shell external/droidmedia/detect_build_targets.sh $(PORT_ARCH) $(TARGET_ARCH))

If I manually run that command I get the list of targets libdroidmedia minimediaservice minisfservice libminisf from external/droidmedia.
I’ll just make all of those and assume that me making hybris-hal without droidmedia marked that shell target generation as ‘done’ and it doesn’t get evaluated again \o/

It completes.
Next is rpm/dhd/helpers/build_packages.sh --version

Check /home/vlad/hadk/hybris/droid-hal-version-sake.log for full log.
!! building of package failed

Checking that log reveals:

File /etc/ofono/binder.conf from install of droid-config-sake-1-202303222304.aarch64 (dir:/home/vlad/hadk/hybris/droid-hal-version-sake/.mb2/filtered-output-dir) conflicts with file from package ofono-configs-binder-1.0.2-1.1.1.jolla.aarch64 (@System)

File conflicts happen when two packages attempt to install files with the same name but different contents. If you continue, conflicting files will be replaced losing the previous content.

Which means that I need to make a change to droid-config-sake spec.
Hadk-hot has the solution for that readily available,

Provides: ofono-configs
Obsoletes: ofono-configs-mer

Adding those, re-building --configs, then --version

While building I remembered that Lineage /data is going to get wiped for the next test.
I install Open Camera and notice that if you switch to Camera 2 API, there are settings for Edge Detection and Noise reduction.
Cranking the Edge detection down (off) and Noise reduction to “High Quality” I get similar results to my Mi Note 10 that has a larger sensor but tends to not sharpen but to aggressively remove noise. The Camera 1 API shots on ASUS are however very sharpened by default. It is good to know that just changing the API adds access to those options. Of course, Noise reduction should be minimal too :slight_smile:

Hmm. same error in hybris/droid-hal-version-sake.log.
A quest for tomorrow:)


Day 21

That droidmedia missing build from yesterday - I figured I had no PORT_ARCH defined, that’s why the shellscript describing the targets from hybris-boot was not working.

Normally you have a ~/.hadk.env that is sourced when you run sfossdk which gives you PlatformSDK$ prompt and then from there you run ubu-chroot to build the Android bits. However, I quit that on a build error on Day 10 and directly built the Android bits from outside the sdk.
Exporting PORT_ARCH and the other env vars made it work out of the box.

That droid-hal-version build error from yesterday, about the ofono configs conflict?
That had a follow-up in hadk-hot: you need to manually install the resulting new package, which interactively asks you if you’re ok to overwrite a file from ofono-configs-binder. This is the reason --configs step was successful but the package did not install in --version step.

I re-run all the --droid-hal/--configs/--gg/--version build commands above (-d, -c, -g, -v).
They pass.
The last one is build_packages.sh --mic - Mer Image Creation I believe.

This one fails with Requires [rpmlib(PayloadIsZstd)-5.4.18-1], which is not provided
I believe this has something to do with changing compression method recently.
mal recommends that other porters encountering the same error run "sdk-foreach-su -ly ssu re 4.5.0.18", "sdk-foreach-su -ly zypper ref" and then "sdk-foreach-su -ly zypper dup"
Well… and I was planning to build 4.4.0…
The changelog for 4.4.0 mentions [rpm] Add zstd support.. The one for 4.5.0 [meego-rpm-config] Change rpm compression to zstd
Not sure but my guess would be that the new compression needs to be supported in 4.4.0 already for 4.5.0 to… install.
So let’s upgrade just what was 4.3 then. I run sdk-foreach-su -ly ssu re 4.4.0.72 because I am stubborn and born to make choices that hit errors.

sdk-foreach-su: Executing in ‘SailfishOS-4.4.0’ tooling…
File ‘/repodata/repomd.xml’ not found on medium ‘https://releases.jolla.com/releases/4.4.0.72/mer-tools/builds/i486/packages/’

Looking at Index of /sdk/targets/ actually only 4.4.0.58 is listed. Use that instead :slight_smile:
Well… that doesn’t work either. It would probably work if I installed manually, but the sdk-foreach left me in some half-state I wouldn’t want to debug. Let’s go with 4.5.0.18 thus…
FFS, not even that works \o/

In hindsight, I see that only ‘SailfishOS-latest’ tooling gives error, while ‘SailfishOS-4.4.0’ tooling does not, ‘SailfishOS-latest-armv7hl’ build target and ‘SailfishOS-latest-i486’ build target do, while ‘asus-sake-aarch64’ build target does not.
So it may be something about -latest things, which are the ones installed with SailfishOS IDE…

Buut… getting to the zypper dup step:

Error: Subprocess failed. Error: RPM failed: error: unpacking of archive failed: cpio: Bad magic

This has to be compression. Maybe I skipped 4.3 => 4.5 directly.

I remove the -latest targets/tooling (sdk-assistant remove target .., ..tooling). And try again to update to 4.4.0.72
As stated above, the zypper ref step does not fail (it only failed on -latest thingies).

All these commands are really commands to upgrade SDKs but using the same upgrade system as our phones.
The good thing is, if I break these “phones” I get to remove them and install them afresh :slight_smile:

With 4.4.0.72 and without -latest toolings/targets, zypper does start to install packages without the cpio error.
One of the packages that will get installed will be the rpm with support for zstd :fingerscrossed:

Error message: Could not resolve host: releases.jolla.com

?!?
On porters channel: “you need to exit sdk and re-enter before you can use network in sdk after you have updated it.”

Yes. That zypper dup across the universe takes some time…
Actually it takes so much time I think there’s a bug.
It may have been faster to re-create those SDK/tooling/target
I’m half-heartedly posting this before going to sleep while package 20 out of 573 still installs, instead of pressing Ctrl-C impulsively

Day 22

--mic finishes with SailfishOScommunity-release-4.4.0.58-sake-zero/sailfishos-sake-release-4.4.0.58-zero.zip
Which contains a bz2.

Rebooting into Lineage Recovery, through adb root shell, I notice there is no bunzip2
The toybox command that wraps everything doesn’t know it either: toybox bunzip2

toybox: Unknown command bunzip2

So I push my 64-bit busybox (from ubports) to /system/bin and ln -s /system/bin/busybox /system/bin/bunzip2
Next thing, tar --help does show j bzip2 compression - so that might use the bunzip2 command
(The truth is busybox is 1.4M while toybox is under 500k)

Next I use Lineage recovery “Format data/factory reset”.
mount /data/ now works as f2fs and it’s empty. Maybe it should be ext4, but let’s see.

Next I use Lineage recovery “Apply update”/“Apply from adb”
adb sideload SailfishOScommunity-release-4.4.0.58-sake-zero/sailfishos-sake-release-4.4.0.58-zero.zip
Package not signed, install anyway? Yes

E:Error in /sideload/package.zip (killed by signal 11)

Maybe the sideloading is for “update” packages which I am not familiar with.
My zip file has a straightforward updater-unpack.sh however:

FS_ARC="/data/sailfishos-rootfs.tar.bz2"
FS_DST="/data/.stowaways/sailfishos"

rm -rf $FS_DST
mkdir -p $FS_DST
tar --numeric-owner -xvjf $FS_ARC -C $FS_DST
EXIT=$?

rm $FS_ARC

So I just push the .bz2 file and tar --numeric-owner -xvjf /data/sfe-sake-4.4.0.58-zero.tar.bz2 $FS_DST

tar: exec bzip2: No such file or directory

Ah, so I should link that to busybox…

bzip2: short write

Although the error comes from bzip2, I think tar from toybox may be the culprit here.
Let’s link that to busybox too

tar: /data/.stowaways/sailfishos not found in archive

:facepalm:! I forgot an -C before $FS_DST: tar --numeric-owner -xvjf /data/sfe-sake-4.4.0.58-zero.tar.bz2 -C $FS_DST
Good. it works.

Now we’re going to reboot to bootloader and boot (without flashing) the hybris-boot.img.
Remember, if we flash it, Lineage recovery is gone since it lives on boot.img for this device. And we need that to inspect what happened in /data/.stowaways/sailfishos…

After 60 something seconds it reboots.
Entering Lineage Recovery, adb root shell, mounting /data… it’s empty!
Hmm… how could that be?

Let’s re-trace the steps and just reboot into recovery:
HOST: adb push external/busybox/busybox /system/bin/
ADB#: ln -s /system/bin/busybox /system/bin/bzip2
ADB#: ln -sf /system/bin/busybox /system/bin/tar
HOST: adb push sfe-sake-4.4.0.58-zero.tar.bz2 /data/
ADB#: mkdir -p /data/.stowaways/sailfishos
ADB#: tar --numeric-owner -xvjf /data/sfe-sake-4.4.0.58-zero.tar.bz2 -C /data/.stowaways/sailfishos

Reboot to bootloader → Reboot to recovery
ADB# mount /data
ADB# ls /data/.stowaways/sailfishos/

Rootfs is there. Let’s boot hybris-boot again …
Reboots in 60 seconds, let’s hold Vol-up, enter Recovery, enable Adb shell
Now .stowaways survives.
And I got my first init.log!


Day 23

The problem with the previous log is that I didn’t see any device change on my host (like: “init-debug in real rootfs”)

elros has figured out that in my previous log two lines are output to $GADGET_DIR/g1/UDC, and suggests this change to exclude the dummy one.
With that fix, the next boot still doesn’t show any new USB device on my host computer, but it activates a usb0 device (surprised it’s not rndis0) and configures it with an IP. It starts the dhcp server even though writing to /etc/udhcpd.conf fails (it’s a link to /run/usb-moded/udhcpd.conf that doesn’t exist?)
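
The idea behind that fix (write only one controller name to UDC, and never the dummy one) can be sketched roughly as below. This is not the actual patch: pick_udc is a hypothetical helper, and it assumes the dummy controller shows up under a name starting with "dummy" (such as dummy_udc.0), next to the real a600000.dwc3:

```shell
#!/bin/sh
# Hypothetical helper: pick the first UDC that is not the dummy one.
# $1: directory listing the controllers (normally /sys/class/udc).
pick_udc() {
    ls "$1" | grep -v '^dummy' | head -n 1
}
```

In the init script this would be used along the lines of echo "$(pick_udc /sys/class/udc)" > $GADGET_DIR/g1/UDC, so that exactly one line reaches the UDC attribute.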
Next I grep my vendor/device sources for init.usb.rc files (unfortunately I don’t find any meaningful ones, which is surprising).
Lineage recovery has a GUI option to mount system. When I do that, it appears on /dev/block/dm-2. I scroll up to Day 10 when I determined by dmctl list devices that vendor_b is on 4th dm device. I mount that in /mnt/vendor
$ find /mnt/vendor -iname \*.rc -exec grep gadget {} \; -print
And I find this /etc/init/hw/init.qcom.usb.rc and, in another rc file, that vendor.usb.rndis.func.name is “gsi”.
Notice that this rc file uses b.1 instead of c.1 and gsi.rndis instead of rmnet_bam

I also remembered that my hybris-boot.img is concatenated with a vendor_boot which probably doesn’t have my kernel modules, but Lineage’s. Grepping hadk/out/target/product/sake/obj/KERNEL_OBJ I find a lot of *.ko files, even rmnet_ctl.ko and rmnet_core.ko
I discover that while toybox has the command ‘insmod’, busybox doesn’t. I can’t use toybox in Mer init, since that is not a glibc executable.
However in rootfs there seems to be a kmod command which works about the same as [busy|toy]box, in the sense that it can have link to it so it starts another command. ln -sf /usr/bin/kmod /usr/bin/insmod seems to work.
Initially I upload the two modules in root and manually issue

    insmod /rmnet_ctl.ko                                                       
    insmod /rmnet_core.ko

right after do_mount_devprocsys in the init script (I now remembered that I do have all the modules in rootfs /lib/modules :slight_smile:)
Also, I create /init_enter_debug2 which grabs the output of some debugging tools too (ps, netstat, mount) and I also add dmesg there (dmesg output was present in my first log because no rndis/usb device was found and it rebooted. This time, usb0 is present and it doesn’t reboot.)

I get something like this as the next log which is showing some errors in dmesg:

RNDIS_IPA@rndis_ipa_init_module@2447@ctx:swapper/0: failed to create IPC log, continue…
gsi_set_inst_name: Err allocating ipc_log_ctxt for prot:gsi.rndis
gsi_bind: ipa is not ready
gsi_bind: ipa ready timeout
configfs-gadget a600000.dwc3: failed to start g1: -110

(-110 is -ETIMEDOUT, “Connection timed out”, according to perror 110)

Ideas:

  1. search for rmnet commit logs in kernel, something changed from 4.19 where this worked to 5.4
  2. remove vendor_boot kernel cmdline, as I now have duplicate definitions (is there a limit?). That would render Lineage unbootable, maybe better remove the duplicates from hybris-boot…?
  3. replace lineage boot.img kernel with mine so that I can test rndis commands in adb shell?
  4. use modprobe for /lib/modules
bloated cmdline
 log_buf_len=256K earlycon=msm_geni_serial,0x98c000 rcupdate.rcu_expedited=1 rcu_nocbs=0-7 kpti=off androidboot.console=ttyMSM0 androidboot.hardware=qcom androidboot.memcg=1 androidboot.usbcontroller=a600000.dwc3 cgroup.memory=nokmem,nosocket console=ttyMSM0,115200n8 ip6table_raw.raw_before_defrag=1 iptable_raw.raw_before_defrag=1 loop.max_part=7 lpm_levels.sleep_disabled=1 msm_rtb.filter=0x237 pcie_ports=compat service_locator.enable=1 swiotlb=0 buildvariant=userdebug printk.devkmsg=on printk.always_kmsg_dump=y ramoops_memreserve=4M androidboot.ramdump=disable androidboot.console=ttyMSM0 androidboot.hardware=qcom androidboot.memcg=1 androidboot.usbcontroller=a600000.dwc3 androidboot.selinux=permissive cgroup.memory=nokmem,nosocket console=ttyMSM0,115200n8 ip6table_raw.raw_before_defrag=1 iptable_raw.raw_before_defrag=1 loop.max_part=7 lpm_levels.sleep_disabled=1 msm_rtb.filter=0x237 pcie_ports=compat service_locator.enabl
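
For idea 2, the duplicates could be found mechanically rather than by eye. A minimal sketch, assuming both command lines are plain space-separated token lists. cmdline_minus is a hypothetical helper that prints the tokens of the first cmdline that do not already appear in the second (for example the one carried by vendor_boot):

```shell
#!/bin/sh
# Hypothetical helper: print the tokens of cmdline $1 that do not
# already appear in cmdline $2. The body runs in a subshell so that
# "set -f" does not leak into the caller.
cmdline_minus() (
    set -f                      # no globbing while word-splitting $1
    out=
    for tok in $1; do
        case " $2 " in
            *" $tok "*) ;;      # already present in $2, drop it
            *) out="${out:+$out }$tok" ;;
        esac
    done
    printf '%s\n' "$out"
)
```

Feeding it the hybris-boot cmdline as the first argument and the vendor_boot one as the second would leave only the options that hybris-boot really needs to add.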

For 4.

# ln -sf /usr/bin/kmod /usr/bin/modprobe
# modprobe -D rmnet_core -S 5.4.61-qgki-perf-gbface408530e-dirty
insmod /lib/modules/5.4.61-qgki-perf-gbface408530e-dirty/rmnet_ctl.ko 
insmod /lib/modules/5.4.61-qgki-perf-gbface408530e-dirty/rmnet_core.ko 

I have to override the kernel version because in adb shell uname -r is without -dirty.

# ln -sf /usr/bin/kmod /usr/bin/depmod
# depmod -a -v

New log with depmod -a: Ubuntu Pastebin
Lineage recovery dmesg for comparison: Ubuntu Pastebin

Looking at kernel sources.
rndis_ipa_init_module message “failed to create IPC log, continue…” seems to be a debugging log, we can probably ignore that.
gsi_set_inst_name: “Err allocating ipc_log_ctxt for prot:gsi.rndis” seems to be the same ipc_log_context_create as above.

gsi_bind: “ipa is not ready”. This is call from drivers/usb/gadget/function/f_gsi.c to ipa_register_ipa_ready_cb which is in drivers/platform/msm/ipa_fmwk/ipa_fmwk.c. The ipa_register_ipa_ready_cb could have output “ipa framework hasn’t been initialized yet” but it does not, and a message above does say “IPA framework init”, so initialization was started, but maybe not done, or it didn’t do something.
The call returns non-0, which is -EEXIST or -ENOMEM. The pr_debug output is probably missing in my configuration.

The dynamic debug flag is not (yet) present in my configuration, so pr_debug is probably a plain printk(KERN_DEBUG ...), and that can be made visible with loglevel=7 in the cmdline.

But before I change the cmdline, I can already see a pr_info("IPA framework init\n"); in the logs but no pr_info("IPA driver is now in ready state\n"); so it’s safe to assume that the error above is triggered by a timeout in this init.

From ipa_clients_manager_init() I get all the logs except the last one, ipa3_notify_clients_registered().
ipa3_notify_clients_registered calls ipa3_register_to_fmwk() if ipa_initialization_complete, but so does ipa3_post_init.

(Btw, the makefile in the ipa folder also references CONFIG_RMNET_IPA3 which I don’t have in the kernel.)

Back to ipa3_post_init, that gets called from ipa3_load_ipa_fw (load firmware) which looks in dts or at a constant for the firmware name, which is probably in my case qcom,firmware-name = "ipa_fws" as in lahaina.dtsi
Searching for that file in the vendor partition does not find anything (good, because that would be hard to mount outside of Android)
Next, looking in the mounts that were present in Lineage I try
/dev/block/sde29 on /vendor/firmware_mnt type vfat - that does list a ton of firmwares, but no ipa is returned by find /mnt/firmware_mnt -iname \*ipa\*.
Let’s stop guessing and read how it finds the path. Something something subsys with ipa_fws param. Indeed there is a device /dev/subsys_ipa_fws but I can’t just ‘cat’ that.
Another IPA error not seen before: minidump-id not found for ipa_fws. This is present in Lineage recovery dmesg too.

Ok, time to give up and make something: adding CONFIG_RMNET_IPA3 to my defconfig and removing duplicate cmdline options that are present in the vendor_boot already and adding loglevel=7
Actually I found that CONFIG_RMNET_IPA3 is set in dataipa_QGKI.conf but so is CONFIG_RNDIS_IPA which I tried to disable… Let’s try to disable it… again? Nah, it doesn’t compile if I do so.
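The cmdline cleanup mentioned above (dropping options that vendor_boot already carries) can be checked mechanically. This is only a sketch: dup_cmdline_opts is a hypothetical helper, and the two cmdline strings have to be pasted in by hand from the boot and vendor_boot images.

```shell
# Print every option from the first cmdline that also appears in the second one.
# Both arguments are plain space-separated strings; options are compared as whole
# tokens, so a=1 does not match a=12.
dup_cmdline_opts() {
    set -f                      # no glob expansion while word-splitting $1
    for opt in $1; do
        case " $2 " in
            *" $opt "*) printf '%s\n' "$opt" ;;
        esac
    done
    set +f
}
```

Calling it as dup_cmdline_opts "$BOOT_CMDLINE" "$VENDOR_BOOT_CMDLINE" (both variable names are placeholders) lists the options that can be dropped from the boot image side.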

Booting - not much change in output.

Now Elros’ suggestions: set these two CONFIG_USB_CONFIGFS_F_GSI=n and CONFIG_USB_CONFIGFS_ECM=n
Booting - some things changed, like: mkdir gsi.rndis fails, then no IPA errors in the log (but no rndis_bam.rndis either)

New suggestions: DUMMY_HCD=n, IPA=n, ECM=y

DUMMY_HCD=n didn’t help, but for some reason IPA=n compiled now. That doesn’t help either.

Meanwhile, while operating the device, I miss pressing VolUp on one of the boots and Lineage kicks in. I let it display the welcome and reboot to recovery.

Sure thing, my /data is gone :frowning: For some reason, lineage reformats it in dm-style, and I need it bare.
This means I have to redo the changes I did not save from init-debug.
Also, wipe data from recovery, push bz2, push busybox, make links to tar/bzip2

Still no telnet, but boy there’s stuff to debug in this kernel.

Day 24

What if I don’t need no telnet?

Usually what happens is that the screen won’t turn on and you need to debug that through telnet.
Very fast iterating possible solutions.

The “screen” in hybris ports is based on hwcomposer service starting, at least.
For that to start, some dependent services need not crash (or need to be disabled).
To find out what happens you probably need to enter telnet as soon as possible and observe logcat logs.

But, even before that, to have the android “init” starting, you need to have mounts set up correctly.
That is, system goes into /system (or /system_root), vendor to /vendor and all else.
Part of those partitions are in fixup-mountpoints.
Another part, in my case, is in dynamic partitions. But so is fp4’s. So I literally copy its mount services. I also check out parse-android-dynparts into hybris/mw/, build it with -b and add it to patterns.

Building --droid-hal, --configs, --version and --mic again.

I then re-do the (push bz2, push busybox, link tar and bz2, untar) steps from day 22.
I also chroot into /data/.stowaways/sailfishos, /usr/bin/vi /etc/systemd/journald.conf and change Storage to persistent (from volatile).
Then I reboot to bootloader, sudo fastboot boot hybris-boot.img wait a couple of minutes, and reboot into recovery.
mount /data, chroot to sfos, journalctl:

Of course I mess something up. Where dmsetup was supposed to be called I get

[E][liblp]Logical partition metadata has invalid geometry magic signature.
Failed to parse metadata from "/dev/sda10"

Fortunately the error was a simple one: in Lineage recovery, if I run ls -l /dev/block/by-name/super I get /dev/block/by-name/super -> /dev/block/sda19
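To avoid hardcoding the wrong sdaNN again, the lookup above can be wrapped in a small helper. A minimal sketch, assuming readlink -f is available (it is in both coreutils and busybox); resolve_partition is a hypothetical name, not something from the hybris scripts.

```shell
# Resolve a partition label to its block device by following the by-name symlink.
# $1: partition label (e.g. super)
# $2: by-name directory (defaults to the path seen in Lineage recovery)
resolve_partition() {
    label=$1
    dir=${2:-/dev/block/by-name}
    if [ ! -e "$dir/$label" ]; then
        echo "no such partition label: $label" >&2
        return 1
    fi
    readlink -f "$dir/$label"
}
```

On this device, resolve_partition super should print /dev/block/sda19, matching the ls -l output above.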

Next thing after reboot in journalctl, droid-hal-init is loading modules (since when does it…?) and

droid-hal-init: LoadWithAliases was unable to load wlan

init/first_stage_init.cpp seems to use modules.load, which I don’t have in /lib/modules/$MYKERNELVERSION/
Let’s try fixing modules.load by removing wlan.

Ramdump error:(

Nine seconds into the boot, journalctl shows

Zenfone8 mce[2660]: modules/mempressure.c: mempressure_cgroup_init(): mempressure 'warning' threshold is not defined
Zenfone8 mce[2660]: modules/mempressure.c: g_module_check_init(): mempressure plugin init failed
Zenfone8 DSME[2872]: state: new state: USER
Zenfone8 systemd-udevd[974]: conflicting device node '/dev/dri/card0' found, link to '/dev/dri/card0' will not be created
Zenfone8 systemd[1]: Started Mode Control Entity (MCE).
Zenfone8 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=mce comm="systemd" exe="/usr/li
Zenfone8 mce[2660]: modules/display.c: mdy_stm_set_compositor_availability_changed(): compositor availability change: pending
Zenfone8 systemd-udevd[982]: conflicting device node '/dev/dri/renderD128' found, link to '/dev/dri/renderD128' will not be created
Zenfone8 kernel: [    9.234374] (CPU:5-pid:966:systemd-udevd)cs35l45 5-0030: cs35l45_set_sysclk: clk_id = 0, freq = 3072000
Zenfone8 kernel: [    9.234384] (CPU:5-pid:966:systemd-udevd)cs35l45 5-0030: cs35l45_set_sysclk: clk_id = 0, freq = 3072000, update PLL setting!!!
"update PLL setting!!!" ? But.. that was not the case in the previous boot :hmm: Again?

Yes, ramdump screen again. and again.

How can changing modules.load break the system so hard?
Surely enough, I add back wlan.ko in modules.load and it doesn’t reboot.
The weird thing is that it never even started to load modules when it broke!

Grepping for modules.load finds me vendor/lineage/build/tasks/kernel.mk
In that file, BOARD_VENDOR_KERNEL_MODULES_LOAD is used.
Indeed, that wlan.ko came out of my BoardConfig.mk.

Let’s see where else it’s used, and remove it :slight_smile:
Oh, there’s also /lib/modules/$MYKERNELVERSION/modules.order that includes it. That is the simpler explanation.
drivers/staging/qcacld-3.0/wlan.ko. Removing it.
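For the record, the removal itself can be done without opening an editor. This is a sketch with a hypothetical helper name; it matches on the file name only, so it works for both a bare wlan.ko line (modules.load) and a full path line (modules.order). It only edits the list, it does not explain the ramdump.

```shell
# Remove every line of a module list whose basename equals the given module file.
# $1: list file (modules.load, modules.order, ...)
# $2: module file name, e.g. wlan.ko
remove_module_entry() {
    list=$1
    module=$2
    awk -v m="$module" '{
        n = $0
        sub(/.*\//, "", n)      # strip the directory part, keep the file name
        if (n != m) print       # keep every line that is not the module
    }' "$list" > "$list.new" && mv "$list.new" "$list"
}
```

For example: remove_module_entry /lib/modules/$MYKERNELVERSION/modules.order wlan.ko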

And blocklist qca_cld3_wlan in modules.blocklist hmm :thinking: first_stage_init.cpp doesn’t seem to use blacklist by default…

But still I get a ramdump, as if something is hardcoded somewhere about that wlan module…\o/

        LOG(FATAL) << "Failed to load kernel modules";

Let’s change this to just LOG(ERROR) till we figure it out. And put back wlan where it was:()

Hmm…Why was wlan in “staging”? Sounds like unfinished stuff.
Let’s copy it manually?

adb push ./out/target/product/sake/obj/KERNEL_OBJ/drivers/staging/qcacld-3.0/wlan.ko /data/.stowaways/sailfishos/lib/modules/5.4.61-qgki-perf-gbface408530e-dirty/

Meanwhile I also rebuild init with FATAL removed and using ERROR instead (there’s also FATAL_WITHOUT_ABORT, btw).

And I still get that fatal error about wlan.
Let’s look at that fresh init. Hopefully it is this one: out/target/product/sake/system/bin/init. It has 929016 bytes.
On device, usr/libexec/droid-hybris/system/bin/init has 933136. Kind of a big diff?
Let’s build_packages.sh --droid-hal and unpack that to make sure.
Yup, droid-local-repo/sake/droid-hal-sake-0.0.6-202303262207.aarch64.rpm has exactly the 929016 one now.
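Comparing byte counts by eye works, but two different builds could in principle have the same size. A small sketch for comparing the candidates properly, assuming sha256sum and cmp are available on both host and device; the helper names are made up.

```shell
# Print "<size in bytes> <sha256>" for a file, to compare builds across host and device.
describe_binary() {
    size=$(wc -c < "$1" | tr -d ' ')
    sum=$(sha256sum "$1" | cut -d' ' -f1)
    printf '%s %s\n' "$size" "$sum"
}

# Succeed only if both files are byte-for-byte identical.
same_binary() {
    cmp -s "$1" "$2"
}
```

Running describe_binary on out/target/product/sake/system/bin/init on the host and on the copies on the device shows quickly which init is the one that is really executed.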

Adb pushing it. Reboot, same error, same abort.
So maybe I didn’t build/replace the expected init?
/usr/bin/droid/droid-hal-startup.sh points to /sbin/droid-hal-init which is… 933136 bytes. Why is there a copy…?
cp /usr/libexec/droid-hybris/system/bin/init /sbin/droid-hal-init
Reboot (meaning fastboot boot these days)

Aaand… ramdump screen again

So I got two/three unrelated changes that generate this: removing wlan references from modules.* files, or replacing /sbin/droid-hal-init with a version that doesn’t abort. Or the fact that I did copy a wlan.ko in there…
(I remove the wlan.ko and it’s not that)
The logs I get in the reboot cases are definitely not synced till the end as they don’t end in the same way, nor include a kernel panic message.

So maybe what all these have in common is that droid-hal-init continues: every way of avoiding an “error” becomes a sure path to hitting the next error, one that is fatal for the kernel.

This I don’t need telnet for. I need to think (as always) and read the first_stage_init.cpp carefully and then I would need an actual kernel panic/oops message that generates the ramdump.

I follow very closely what you write here, and at the beginning I also had the assumption that I can follow you. But I had to realize that this exceeds my understanding. :smiley:
So I ask: are you making progress and are you confident that you will be able to make a port?

The quality of the story degrades as the problems I face are getting harder for me to solve, indeed - not your fault that this became hard to follow. There is a balance of trying many solutions vs writing, and lately that balance was in disfavor of the writing. (Basically I got from “I’ll try a new thing because I know how to port with HADK” to “what’s this is not even booting and they changed the image header and wow now I can’t even get kernel logs or telnet access”).

Hopefully that is just an intermediary state and things will get back to well-known stuff.

To the point: I am making progress. I got init starting, dmesg and journalctl logs and probably I got the mount points right.

Day 25

Ramdump. Telnet. Ramdump.

A keen eye would have noticed I already copied androidboot.ramdump=disable from my previous port - it was already there from its Lineage build.

That didn’t do much if anything. But the answer needs to be in the kernel.
The question: why does the system reboot in a special mode (see “How are ramdumps generated on a system crash?” on Stack Overflow) when this device crashes?

Searching MAGIC, RAMDUMP etc in defconfig doesn’t do it.
Searching ramdump in kernel sources has some (many) hits. I need to filter them out, since some are RAM dumps from peripheral devices.

asusdebug.c. This file is full of gems.
First, there’s an asdf partition. That contains some asdf-logcat.txt or logcat-crash.txt but they are from weeks ago.
Idea 1: maybe I should mount it in recovery.

CONFIG_MACH_ASUS seems to guard some of the debugging, but I only have CONFIG_MACH_ASUS_SAKE. However obj-$(CONFIG_MACH_ASUS_SAKE) += asusdebug.o. Wait, I do have CONFIG_MACH_ASUS=y output in .config, I just don’t know from where.

Same asusdebug.c. There should be a /proc/asusdebug (it now contains just f2fs-attr:off)
It seems to accept a command like echo get_asdf_log > /proc/asusdebug
So, mkdir /asdf && mount /dev/block/sda9 /asdf then that echo.
That doesn’t seem to change the contents of the folder, unfortunately.
echo slowlog > /proc/asusdebug does produce ASUSSlowg-$(date) logs though they might be from the current running system (something like a backtrace of all kernel threads.)


Breaking: Elros finds something in my dmesg from Day 23 and pings me to point out this change from: https://review.lineageos.org/c/LineageOS/android_device_xiaomi_sm6150-common/+/291494/4
Where ${ro.boot.usbcontroller} is a600000.dwc3 in my case.
I do it, create the /init_enter_debug2 file, boot hybris-boot and LO AND BEHOLD!
New USB device is detected when I boot, new usb network device and I just need to sudo ifconfig enp62s0u1 192.168.2.14 it (twice, because NetworkManager overrides the first config) and telnet 192.168.2.15 2323 and

Welcome to the Mer/SailfishOS Boat loader debug init system.

\o/

I immediately use this new superpower to spawn another telnet where I run dmesg -w and instruct the first session to continue the boot halted by the init_enter_debug2 file presence:

 # echo "continue" >/init-ctl/stdin 

And just as before, I am offered a ramdump screen (because droid-hal-init does this, after I “fixed” its quit because wlan.ko was missing).
The interactive telnet dmesg session has a headstart from the persisted journalctl and includes these lines in addition:

[   75.080658] [   75.080658] (CPU:4-pid:1020:systemd-udevd) [05:46:21.316141874] cs35l45 5-0030: cs35l45_set_sysclk: clk_id = 0, freq = 3072000
[   75.080669] [   75.080669] (CPU:4-pid:1020:systemd-udevd) [05:46:21.316151978] cs35l45 5-0030: cs35l45_set_sysclk: clk_id = 0, freq = 3072000, update PLL setting!!!
[   75.084444] [   75.084444] (CPU:5-pid:1020:systemd-udevd) [05:46:21.319927447] cs35l45 5-0030: Cirrus Logic CS35L45 (35a450), Revision: A0
[   75.084574] [   75.084574] (CPU:5-pid:1020:systemd-udevd) [05:46:21.320057030] gpio gpiochip0: (f000000.pinctrl): allocate IRQ 401, hwirq 2
[   75.084580] [   75.084580] (CPU:5-pid:1020:systemd-udevd) [05:46:21.320063436] gpio gpiochip0: (f000000.pinctrl): found parent hwirq 117
[   75.084587] [   75.084587] (CPU:5-pid:1020:systemd-udevd) [05:46:21.320070051] gpio gpiochip0: (f000000.pinctrl): alloc_irqs_parent for 401 parent hwirq 117
[   75.084614] [   75.084614] (CPU:5-pid:1020:systemd-udevd) [05:46:21.320097030] register_speaker_dai_name: register cs35l45 speaker name =  cs35l45.5-0031
[   75.087043] [   75.087043] (CPU:0-pid:1020:systemd-udevd) [05:46:21.322527134] cs35l45 5-0031: num_fast_switch:3
[   75.087061] [   75.087061] (CPU:0-pid:1020:systemd-udevd) [05:46:21.322544114] cs35l45 5-0031: 0:cs35l45-spk-music.txt
[   75.087070] [   75.087070] (CPU:0-pid:1020:systemd-udevd) [05:46:21.322553436] cs35l45 5-0031: 1:cs35l45-spk-outdoor.txt
[   75.087079] [   75.087079] (CPU:0-pid:1020:systemd-udevd) [05:46:21.322562655] cs35l45 5-0031: 2:cs35l45-spk-voice.txt
[   75.124684] [   75.124684] (CPU:6-pid:1020:systemd-udevd) [05:46:21.360167186] cs35l45 5-0031: cs35l45_set_sysclk: clk_id = 0, freq = 3072000
[   75.124692] [   75.124692] (CPU:6-pid:1020:systemd-udevd) [05:46:21.360175259] cs35l45 5-0031: cs35l45_set_sysclk: clk_id = 0, freq = 3072000, update PLL setting!!!
[   75.128460] [   75.128460] (CPU:5-pid:1020:systemd-udevd) [05:46:21.363944009] cs35l45 5-0031: Cirrus Logic CS35L45 (35a450), Revision: A0
[   75.170210] [   75.170210] (CPU:7-pid:54:migration/7) [05:46:21.405693280] IRQ 14: no longer affine to CPU7
[   75.245474] [   75.245474] (CPU:2-pid:1035:systemd-udevd) [05:46:21.480958072] [FTS_TS]fts_get_ic_information: Enter
[   75.246339] [   75.246339] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481822186] [FTS_TS]fts_get_chip_types:verify id:0x5652
[   75.246351] [   75.246351] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481834218] [FTS_TS/I]fts_get_ic_information:get ic information, chip id = 0x5652
[   75.246359] [   75.246359] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481842707] [FTS_TS]fts_get_ic_information: Exit(372)
[   75.246373] [   75.246373] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481856822] [FTS_TS/I]fts_create_apk_debug_channel:Create proc entry success!
[   75.246409] [   75.246409] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481892759] [FTS_TS/I]fts_create_sysfs:[EX]: sysfs_create_group() succeeded!!
[   75.246422] [   75.246422] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481905884] [FTS_TS]fts_ex_mode_init:create sysfs(ex_mode) succeedfully
[   75.246442] [   75.246442] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481924999] [FTS_TS/I]asus_create_sysfs:[EX]: asus_create_group() succeeded!!
[   75.246463] [   75.246463] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481946509] [FTS_TS/I]asus_game_create_sysfs:[EX]: asus_create_group() succeeded!!
[   75.246471] [   75.246471] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481954843] [FTS_TS]fts_gesture_init: Enter
[   75.246484] [   75.246484] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481967030] [FTS_TS]fts_gesture_init: Exit(462)
[   75.246491] [   75.246491] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481974895] [FTS_TS]asus_gesture_init: Enter
[   75.246505] [   75.246505] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481988541] [FTS_TS]asus_gesture_init: Exit(577)
[   75.246513] [   75.246513] (CPU:1-pid:1035:systemd-udevd) [05:46:21.481996822] [FTS_TS][TEST]fts_test_init: Enter
[   75.246521] [   75.246521] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482004791] [FTS_TS/I][TEST]fts_test_func_init:init test function
[   75.246531] [   75.246531] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482014270] [FTS_TS/I][TEST]fts_test_func_init:match test function,type:88
[   75.246541] [   75.246541] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482024530] [FTS_TS][TEST]fts_test_init:sysfs(test) create successfully
[   75.246549] [   75.246549] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482032707] [FTS_TS][TEST]fts_test_init: Exit(2165)
[   75.246571] [   75.246571] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482054478] [FTS_TS/I]fts_irq_registration:irq:399, flag:2002
[   75.246860] [   75.246860] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482343645] [FTS_TS/I]fts_fwupg_init:fw upgrade init function
[   75.246884] [   75.246884] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482367082] [FTS_TS]fts_ts_probe_entry: Exit(1928)
[   75.246893] [   75.246893] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482376457] [FTS_TS/I]fts_ts_probe_entry:FOD location 6560 10720 26096 30256
[   75.246901] [   75.246901] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482384843] [FTS_TS/I]fts_ts_probe:Touch Screen(I2C BUS) driver prboe successfully
[   75.247001] [   75.247001] (CPU:3-pid:354:wk:fts_fwupg_w) [05:46:21.482484686] [FTS_TS/I]fts_fwupg_work:fw upgrade work function
[   75.247021] [   75.247021] (CPU:3-pid:354:wk:fts_fwupg_w) [05:46:21.482504791] [FTS_TS]fts_fwupg_get_fw_file:get upgrade fw file
[   75.247030] [   75.247030] (CPU:3-pid:354:wk:fts_fwupg_w) [05:46:21.482513489] [FTS_TS]fts_get_fw_file_via_i:fts_get_fw_file_via_i
[   75.247038] [   75.247038] (CPU:3-pid:354:wk:fts_fwupg_w) [05:46:21.482521926] [FTS_TS/I]fts_fwupg_get_fw_file:upgrade fw file len:102100
[   75.247047] [   75.247047] (CPU:3-pid:354:wk:fts_fwupg_w) [05:46:21.482530884] [FTS_TS/I]fts_fwupg_auto_upgrade:********************FTS enter upgrade********************
[   75.247054] [   75.247054] (CPU:1-pid:1035:systemd-udevd) [05:46:21.482538072] [FTS_TS]fts_ts_init: Exit(2240)
[   75.247065] [   75.247065] (CPU:3-pid:354:wk:fts_fwupg_w) [05:46:21.482548020] [FTS_TS/I]fts_fwupg_upgrade:fw auto upgrade function
[   75.248194] [   75.248194] (CPU:6-pid:354:wk:fts_fwupg_w) [05:46:21.483677186] [FTS_TS/I]fts_wait_tp_to_valid:TP Ready,Device ID:0x5652
[   75.248201] [   75.248201] (CPU:6-pid:354:wk:fts_fwupg_w) [05:46:21.483683957] [FTS_TS/I]fts_fwupg_check_fw_valid:tp fw vaild
[   75.248209] [   75.248209] (CPU:6-pid:354:wk:fts_fwupg_w) [05:46:21.483692759] [FTS_TS/I]fts_fwupg_get_ver_in_host:fw version offset:0x10e
[   75.248628] [   75.248628] (CPU:4-pid:354:wk:fts_fwupg_w) [05:46:21.484111145] [FTS_TS/I]fts_fwupg_need_upgrade:fw version in tp:84, host:84
[   75.248636] [   75.248636] (CPU:4-pid:354:wk:fts_fwupg_w) [05:46:21.484119791] [FTS_TS/I]fts_fwupg_upgrade:fw upgrade flag:0
[   75.248642] [   75.248642] (CPU:4-pid:354:wk:fts_fwupg_w) [05:46:21.484124999] [FTS_TS/I]fts_fwupg_auto_upgrade:**********tp fw(app/param) no upgrade/upgrade success**********
[   75.248646] [   75.248646] (CPU:4-pid:354:wk:fts_fwupg_w) [05:46:21.484129634] [FTS_TS/I]fts_fwupg_auto_upgrade:********************FTS exit upgrade********************

However, still no crash reason per se.

Now I can disable(mask?) droid-hal-init service, and interactively start it from telnet, maybe that gives me more control.
At least I could single it out as being the culprit.

But I would still need the details of the kernel crash.
That may need a better understanding of asusdebug.c and its related changes to printk.c or a revert of them.
Also I still don’t know how the ramdump screen is triggered (other than from a Stack Overflow post) and how to use it to my advantage.

Let’s go with the disablement idea: in chroot through ADB, I

# systemctl mask droid-hal-init.service
Created symlink /etc/systemd/system/droid-hal-init.service → /dev/null.

I reboot, connect two telnet sessions, one with journalctl -f (instead of dmesg -w) and the other used to echo “continue” to /init-ctl/stdin
Somehow this time I get disconnected but the device is not rebooted.
Journal last logs show systemd-udevd activity like link_config: autonegotiation is unset or enabled, the speed and duplex are not writable..
It is not related to usb0, but that may be because it disconnects by that time. Maybe my udev rules are stepping on my toes here.

Checking the journal on the device (in adb shell chroot) shows some lines after the disconnect:

probably usb-moded
Mar 15 14:56:56 Zenfone8 systemd[1]: Started udev Wait for Complete Device Initialization.
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting usb-moded USB gadget controller...
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting ohm daemon for resource policy management...
Mar 15 14:56:56 Zenfone8 systemd[1]: Reached target System Initialization.
Mar 15 14:56:56 Zenfone8 systemd[1]: Listening on OpenSSH Server Socket.
Mar 15 14:56:56 Zenfone8 systemd[1]: Reached target Sockets.
Mar 15 14:56:56 Zenfone8 usb_moded[4761]: usb_moded 0.86.0+mer56 starting
Mar 15 14:56:56 Zenfone8 systemd[1]: Started Daily Cleanup of Temporary Directories.
Mar 15 14:56:56 Zenfone8 systemd[1]: Started Reclaim memory once per day and on boot.
Mar 15 14:56:56 Zenfone8 systemd[1]: Reached target Timers.
Mar 15 14:56:56 Zenfone8 systemd[1]: Started Wayland path watcher.
Mar 15 14:56:56 Zenfone8 systemd[1]: Reached target Paths.
Mar 15 14:56:56 Zenfone8 systemd[1]: Reached target Basic System.
Mar 15 14:56:56 Zenfone8 systemd[1]: Started droid-late-start.
Mar 15 14:56:56 Zenfone8 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=droid-late-start comm="systemd"
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Indicate boot is done...
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Bluetooth service...
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Application Permission Management Daemon...
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Oneshot stuff for root...
Mar 15 14:56:56 Zenfone8 usb_moded[4761]: CONFIGFS detected
Mar 15 14:56:56 Zenfone8 kernel: [   75.113388] (CPU:2-pid:4761:usb_moded) [12:56:56.971053020] configfs-gadget a600000.dwc3: unregistering UDC driver
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Login Service...
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Telephony service...
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Nemo device lock daemon...
Mar 15 14:56:56 Zenfone8 systemd[1]: Started Disk quota netlink message daemon.
Mar 15 14:56:56 Zenfone8 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=quota_nld comm="systemd" exe="/
Mar 15 14:56:56 Zenfone8 systemd[1]: Starting Reclaim memory...
Mar 15 14:56:56 Zenfone8 usb_moded[4761]: /config/usb_gadget/g1/functions/rndis_bam.rndis: mkdir failed: No such file or directory
Mar 15 14:56:56 Zenfone8 usb_moded[4761]: /config/usb_gadget/g1/functions/rndis_bam.rndis/ethaddr: can't open for writing: No such file or directory
Mar 15 14:56:56 Zenfone8 usb_moded[4761]: /config/usb_gadget/g1/functions/rndis_bam.rndis/wceis: can't open for writing: No such file or directory
Mar 15 14:56:56 Zenfone8 systemd-logind[4773]: New seat seat0.
Mar 15 14:56:56 Zenfone8 kernel: [   75.122581] (CPU:2-pid:4761:usb_moded) [12:56:56.980245884] Mass Storage Function, version: 2009/09/11
Mar 15 14:56:56 Zenfone8 kernel: [   75.122598] (CPU:2-pid:4761:usb_moded) [12:56:56.980262238] LUN: removable file: (no medium)
Mar 15 14:56:56 Zenfone8 kernel: [   75.122648] (CPU:2-pid:4761:usb_moded) [12:56:56.980312342] file system registered
Mar 15 14:56:56 Zenfone8 usb_moded[4761]: Unable to find $power_supply device.
Mar 15 14:56:56 Zenfone8 usb_moded[4761]: hwal init failed

Let’s mask usb-moded too. Great, I don’t get disconnected now, and can inspect the running system.

I notice that odm is mounted as /odm_root. Is there anything useful there that could prevent droid-hal-init from starting?
Also “vendor-vm\x2dsystem.mount: Mount process exited, code=exited status=32”. So does “asusfw”, and “FFS”

pulseaudio[5126]: library “libdl_android.so” not found

This is usually fixed by linking the lib in /odm/lib[64]

And let’s see about that droid-hal-init.
If I didn’t write this till now, droid-hal-init is actually “init”, but from Android. We are executing this as non-0 PID service (with some patches so it ignores that it is not PID=1).
I didn’t peek into its sources before, but I got more familiar with this occasion.
On Android 11 at least, there is a “first stage init” that determines if it’s recovery or not, and if not it loads kernel modules.
Then it executes itself with “selinux_setup” as argument.
If that succeeds, then it executes itself with “second_stage”.
And it actually starts executing /system/etc/init.rc

The hybris patches change a bit of that, but in essence you still get the same steps.

Speaking of selinux, maybe this is why it sends us to the ramdump screen. Or the failed mounts.
So I at least have to check:

  1. failed mounts
  2. selinux files
  3. odm links

1.failed mounts.

# systemctl status vendor-vm\\x2dsystem.mount
mount: /vendor/vm-system: wrong fs type, bad option, bad superblock on /dev/sde44, missing codepage or helper program

sde44, the device previously known as /dev/block/bootdevice/by-name/vm-bootsys_b. I don’t have notes on /vendor/vm-system being mounted in Lineage. Its presence in the fstab may be a red herring.
Also a generous xxd -l 5120 shows only zeroes and there is no reference to either name(s) in the ASUS flashing script or list of images.
Same story for vendor/asusfw aka sda16.
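Instead of eyeballing xxd output, the "only zeroes" check can be scripted. A minimal sketch with a hypothetical helper name; it only looks at the first bytes, so it says nothing about the rest of the partition.

```shell
# Succeed if the first N bytes of a file or block device are all zero.
# $1: file or block device
# $2: number of leading bytes to inspect (defaults to 5120, as in the xxd call above)
is_all_zero() {
    nonzero=$(head -c "${2:-5120}" "$1" | tr -d '\000' | wc -c)
    [ "$nonzero" -eq 0 ]
}
```

For example: is_all_zero /dev/sde44 && echo "looks empty"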
find /usr/lib/systemd/system -iname \*.mount -exec grep FFS {} \; -print

Description=FFS mount
/usr/lib/systemd/system/dev-mtp.mount

Again, I’ll worry about this later. MTP is media transfer protocol.

2.selinux files.
HADK FAQ says I should copy some files.
They’re from /vendor/etc/selinux.
Unfortunately I don’t have ssh to copy them from the device in the droid-config-sake repo. I’ll make the folder structure on device and I’ll copy it later through ADB (If I don’t forget and boot into Lineage by accident)

As in the example repo, I:

  • create config with the same contents, create minimum/contexts/files folder, create minimum/contexts/dbus_contexts with the same contents.
  • copy /vendor/etc/selinux/vendor_file_contexts to minimum/contexts/files/file_contexts
  • create ‘minimum/policy’ folder, copy… what. There’s no precompiled_sepolicy in /vendor/etc.
    Fortunately I soon find one in /odm_root/etc/selinux/precompiled_sepolicy. Hopefully it plays the same role.

3.odm links
For now I create a link in /odm/etc pointing to /odm_root/etc
There’s another ueventd.rc file in there talking about firmwares for “trustedvm” that are on /vendor/vm-system but I don’t have that so I don’t link it either.

4?
What about wireless? I am too curious not to check it out.
Not present in ifconfig -a. It fails to modprobe qca_cld3_wlan and dmesg says Reject WLAN Driver insmod before CBC
Kernel source above that error says:

/* If enabled Cold Boot Calibration is the 1st step in init sequence.
* CBC is done on file system_ready trigger. Qcacld should be loaded
* from init.target.rc after that. Reject qcacld load from
* vendor_modprobe.sh at early boot to satisfy this requirement.
*/

Wow, so I need some droid init sequence even for wlan driver. Hmm. find / -iname \*.rc -exec grep -i wlan {} \; -print finds me this nicely commented piece of init:

# Enable WLAN cold boot calibration
write /sys/devices/platform/soc/b0000000.qcom,cnss-qca6490/fs_ready 1

Ok, modprobe again:

DMS QMI connection not established
Direct firmware load for wlan/qca_cld/WCNSS_qcom_cfg.ini failed with error -2

I do have /vendor/firmware/qca_cld/WCNSS_qcom_cfg.ini
QMI is probably Qualcomm MSM Interface. Wikipedia says oFono could help. But why this dependency in the wlan driver? VoWIFI?
The kernel source file with that error references “CNSS_QMI_DMS_CONNECTED” that is missing.
I’ll leave it for another day to see what DMS is.


Ok, now let’s start droid-hal-init.
Prepare three telnet sessions, one with journalctl -f, one ready to launch /usr/libexec/droid-hybris/system/bin/logcat

Aand… ramdump screen. I don’t have the chance to execute logcat.
At least this time I got fuller journal than before, when probably usb-moded kicked in earlier and journalctl didn’t sync to persisted?

There are a couple of ‘sending signal 9’ lines
The last service started is ‘/vendor/bin/vendor_modprobe.sh’.
Probably some .rc services need to be disabled.
Tomorrow.

Day 26

Let’s see where that is.
processing action (early-init) from (/vendor/etc/init/hw/init.target.rc:33)

Running early-init from that script, line by line, doesn’t crash the kernel.

I need to get back to kernel reading to see how to actually use the ram dumps.
The asusdebug.c seems to be fully created by ASUS and its history only contains 2 commits in my kernel tree.
Let’s look at the initial integration which is in commit named treewide: Import ASUS changes from 31.1004.0404.81

Scroll, scroll, scroll.

WTF did I just see pubg here.

Scroll, scroll. I cannot finish reading it all as of now.


Playing a funny game: ls /sys/class/leds/

blue/ led:flash_0/ led:flash_2/ led:switch_0/ led:switch_2/ led:torch_1/ led:torch_3/ red/
green/ led:flash_1/ led:flash_3/ led:switch_1/ led:torch_0/ led:torch_2/ mmc0::confused:

Echoing 1 or 0 to red/green/blue path brightness controls the notification led next to the charger port
Echoing 1 to switch_0/brightness makes the camera led flash :wink: Well, at least the first time.
Maybe I will get faster-than-camera-opens flashlight in SFOS
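The LED game fits in a tiny helper. A sketch only: set_led is a made-up name, and the third parameter exists just so the function can be tried against a scratch directory instead of the real sysfs.

```shell
# Write a brightness value to an LED class device.
# $1: LED name as listed under /sys/class/leds (e.g. red or led:switch_0)
# $2: brightness value (1 and 0 are the values tried above)
# $3: base directory, defaults to /sys/class/leds
set_led() {
    base=${3:-/sys/class/leds}
    node="$base/$1/brightness"
    if [ ! -e "$node" ]; then
        echo "no such LED: $1" >&2
        return 1
    fi
    printf '%s\n' "$2" > "$node"
}
```

For example, set_led red 1 followed by set_led red 0 blinks the notification LED once.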


After playing the led game, I again start dmesg -w (for kicks) and /usr/bin/droid/droid-hal-startup.sh
And I got my ‘lucky’ kernel crash - dmesg: Ubuntu Pastebin
Basically lucky because while a kernel thread was bringing the system down some other one found the time to flush the dmesg to telnet…

The crash is an ipa_assert.

Direct firmware load for ipa_fws.mdt failed with error -2
ipa_fws: Failed to locate ipa_fws.mdt(rc:-2)
pil_boot failed for ipa_fws
ipa ipa3_pil_load_ipa_fws:6536 Unable to PIL load FW for sub_sys=ipa_fws
ipa ipa3_load_ipa_fw:6579 IPA FW loading process has failed result=-22
IPA: unrecoverable error has occurred, asserting

While grepping that same log for ipa, the scrollbar jumps above and shows me previously parsed rc files:

droid-hal-init: Parsing file /vendor/etc/init/ipa_fws.rc…
droid-hal-init: Parsing file /vendor/etc/init/ipacm-diag.rc…
droid-hal-init: Parsing file /vendor/etc/init/ipacm.rc…

The first one has one suspicious monologue with the kernel: write /dev/ipa 1
Unfortunately that command is in an on early-boot trigger so I cannot just disable it as I do with a service override.
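To see which trigger or service a given write belongs to, without opening each .rc file, something like the following can help. A rough sketch: rc_writes_to is a hypothetical helper, and it assumes the usual init.rc layout where "on ..." and "service ..." lines start in column 0 and the commands are indented below them.

```shell
# List "write <path> ..." commands in Android init .rc files together with the
# "on <trigger>" or "service ..." section they belong to.
# $1: path being written to (e.g. /dev/ipa), remaining arguments: .rc files
rc_writes_to() {
    target=$1
    shift
    awk -v t="$target" '
        /^(on|service)[ \t]/ { section = $0 }     # remember the current section header
        $1 == "write" && $2 == t { print FILENAME ": " section ": " $0 }
    ' "$@"
}
```

For example: rc_writes_to /dev/ipa /vendor/etc/init/*.rc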

I’m backing out the IPA=n in the previous kernel defconfig changes (backing out GSI=n does not work, telnet is again absent.)
Boot hybris-boot, and check

  1. /vendor/firmware/ipa_fws.mdt exists alright.
  2. echo 1 > /dev/ipa crashes the system

Let’s chmod a-w /dev/ipa and run droid-hal-startup.sh again. Ramdump.
Hmm… maybe I need to disable ipacm service too.
I add to /usr/libexec/droid-hybris/system/etc/init/disabled_services.rc:

service vendor.ipacm /system/vendor/bin/ipacm_HYBRIS_DISABLED

Unfortunately, that’s not the culprit. The real problem is that root can write to /dev/ipa even without rights.
And that makes the kernel crash - although I don’t know why it doesn’t find that firmware.
I just checked and echo 1 > /dev/ipa crashes recovery too.
So the only “working system” I have that uses that /dev/ipa is Lineage itself…

I am grepping the whole hadk repo for ipa_fws.

For example, kernel ipa_i.h says

/* The relative location in /lib/firmware where the FWs will reside */
#define IPA_FWS_PATH "ipa/ipa_fws.elf"
#define IPA_FWS_PATH_4_0 "ipa/4.0/ipa_fws.elf"

and ipa.c in ipa3_manual_load_ipa_fws() says it’s IPA_HW_v4_0

But that codepath of manual ipa load does not seem to be touched based on the above log.
Another one, ipa3_pil_load_ipa_fws is used. Again, based on log messages grepping, that calls request_firmware eventually which ends up in firmware_loader.c.
The paths that are enumerated there are:

static const char * const fw_path[] = {
    fw_path_para,
    "/lib/firmware/updates/" UTS_RELEASE,
    "/lib/firmware/updates",
    "/lib/firmware/" UTS_RELEASE,
    "/lib/firmware"
};
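The same search order can be replayed from a shell to see which directory, if any, would satisfy a request. A sketch under assumptions: find_firmware is a made-up helper, UTS_RELEASE is approximated with uname -r, and an empty custom path is skipped, which is how I read the loop in firmware_loader.c.

```shell
# Walk the firmware loader's search order and print the first path that holds the file.
# $1: firmware file name (e.g. ipa_fws.mdt)
# $2: custom path, playing the role of fw_path_para (may be empty)
# $3: kernel release, defaults to uname -r
find_firmware() {
    name=$1
    custom=$2
    release=${3:-$(uname -r)}
    for dir in "$custom" \
               "/lib/firmware/updates/$release" \
               /lib/firmware/updates \
               "/lib/firmware/$release" \
               /lib/firmware; do
        [ -n "$dir" ] || continue          # an unset custom path is skipped
        if [ -f "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}
```

With an empty second argument, find_firmware ipa_fws.mdt "" should fail on this device, since the file only lives in /vendor/firmware.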

And further grepping fw_path_para reveals this nice explanation:

There is an alternative to customize the path at run time after bootup, you
can use the file:

  • /sys/module/firmware_class/parameters/path

Let’s use it, then:

echo -n "/vendor/firmware" > /sys/module/firmware_class/parameters/path

echo 1 > /dev/ipa seems to survive now \o/
I need to add the line above to /usr/bin/droid/droid-hal-early-init.sh; that script doesn’t exist yet, and it is probably meant for exactly this use case.
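What could go into that script is roughly the following. This is a sketch, not the actual file: the function name is made up, and the second parameter exists only so the function can be tried against a scratch file.

```shell
# Point the kernel firmware loader at an extra search directory.
# $1: directory to search first (e.g. /vendor/firmware)
# $2: parameter file, defaults to the firmware_class path parameter
set_firmware_path() {
    param=${2:-/sys/module/firmware_class/parameters/path}
    if [ ! -w "$param" ]; then
        echo "cannot write $param" >&2
        return 1
    fi
    # no trailing newline, same as echo -n
    printf '%s' "$1" > "$param"
}
```

The script would then call set_firmware_path /vendor/firmware before droid-hal-init starts, so that the write to /dev/ipa finds ipa_fws.mdt.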

# systemctl unmask droid-hal-init.service and reboot to bootloader to boot hybris-boot.

After reboot, I don’t run journalctl anymore, but /usr/libexec/droid-hybris/system/bin/logcat because now droid-hal-init starts.

And here is my first logcat: Ubuntu Pastebin. Goodnight.

Oh, it sounds like you took that as a criticism. I had intended the opposite! I’m sorry! I was and am impressed because the whole thing is fascinating but also sometimes beyond my understanding. I stay tuned and always look forward to a new episode of the Livecast!

Nice to hear :slight_smile:

Don’t worry, I didn’t take it as a criticism. But it did trigger some introspection.
Thanks for the kind words.

Day 27

After droid-hal-init started, I test wlan again:

# echo 1 > /sys/devices/platform/soc/b0000000.qcom,cnss-qca6490/fs_ready
# modprobe qca_cld3_wlan
# ifconfig -a
wlan0     Link encap:Ethernet  HWaddr 3C:7C:3F:8E:E5:8A  
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:3000 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

wlp1s0    Link encap:Ethernet  HWaddr 3E:7C:3F:8E:E5:8A  
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:3000 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

Cool.

Hmmm. forgot to # cp /odm_root/etc/selinux/precompiled_sepolicy /etc/selinux/minimum/policy/policy.30 yesterday?
Btw, I can now safely flash instead of boot the hybris-boot.img. I should do so next.


Done. Bye bye Lineage recovery, see you soon.


About yesterday’s logcat: Check failed: selinux_status_open(true) >= 0.
I thought I set up selinux files, but maybe more is needed.
For example, my kernel doesn’t display “SELinux: Initializing” in the 0th second of boot, but “AppArmor: AppArmor initialized”.
The function selinux_status_open(int fallback) is in libselinux/src/sestatus.c.
The comment says something about “/selinux/status”.
I do have /selinux but there’s nothing in there.

Checking mount on the device, there is no selinuxfs mount, but there is one securityfs

securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)

In that folder I do find AppArmor references - but no “status”.
Reading a comparison of SELinux and AppArmor it is clear to me that those are not related at all, and are parallel developments.
I was “hoping” that ASUS used some rebranded thing that would turn out to be SELinux, but it is not.
It may be older… Also, wikipedia hints at “Yama” and “Tomoyo”.

So the question is, then: how come those same services work on Lineage without SELinux and then fail in hybris?
:facepalm: they were from halium check-kernel-config.
Let’s remove most of that.

Next boot my heart skipped a beat because telnet got disconnected and wouldn’t connect again.
Oh, no not some sfos or android service again.
But when pulling out the USB cable and connecting it back I got back access :whew:

I get a large logcat - of which I save the first 60 seconds since that would be enough for a phone to ‘boot’ - and it doesn’t show that selinux error. Also, /sys/fs/selinux is now mounted as selinuxfs.
Lots of “avc: denied” lines there that end up in “permissive=1”, so it’s just noise (would have been denied, but isn’t)

Some notable errors:

vendor.qti.vibrator: open /dev/input/by-path failed, errno = 21

Although I see it is world readable.
When CamX enters the stage, logs start to become unreadable:)

Then there’s

--------- beginning of crash
03-17 17:24:47.367 6400 6400 F linker : CANNOT LINK EXECUTABLE “/usr/libexec/droid-hybris/system/bin/minisfservice”: library “libandroidicu.so” not found: needed by /system/lib/libmedia.so in namespace (default)

That I should know how to fix.
From previous debugging experience, confirmed in porter’s channel, the usual path to look for libraries contains /odm/lib[64], and since I control that, I should be able to link these kinds of libraries.

Then there’s also

03-17 17:24:47.665 6480 6615 E audioadsprpcd: vendor/qcom/proprietary/commonsys-intf/adsprpc/src/apps_std_imp.c:921:Error 0x2: fileExists failed for path /vendor/dsp/adsp/audioadsprpcd.farf, errno is No such file or directory

But /vendor/dsp/adsp/ exists and it is filled with other files.

Good news is that I got one “ACDB-LOADER: ACDB → init done!” which is audio related.

Next, I grep for “Waiting for service” lines. I get

Waiting for service ‘package_native’
Waiting for service ‘statscompanion’

That ‘statscompanion’ is so often waited upon I look for it on the whole device

find / -iname '*.rc' -exec grep statscompanion {} \; -print

I don’t find anything though. The same for package_native.

Hmm. Let’s:

  1. boot an audit=0 cmdline kernel
  2. fix the libandroidicu.so requirement by linking it from /odm/lib
  3. See the new logcat. Maybe share it on porter’s channel…

For 2. I do # ln -s /apex/com.android.art/lib/libandroidicu.so /odm/lib/libandroidicu.so
Wait, no such file or directory - /odm does not exist unfortunately.
I thought I already created it and linked /odm/etc to /odm_root/etc… it is not on the rootfs?

Again:

# mkdir /odm
# ln -s /odm_root/etc /odm/etc
# mkdir /odm/lib
# ln -s /apex/com.android.art/lib/libandroidicu.so /odm/lib/libandroidicu.so

Reboot. Boot new hybris-boot with audit=0

Minimedia this time complains about missing libicuuc.so
# ln -s /apex/com.android.art/lib/libicuuc.so /odm/lib/libicuuc.so

Let’s link all libs from my previous port then.

# ln -s /apex/com.android.vndk.v30/lib/libaudioroute.so /odm/lib/libaudioroute.so
# ln -s /apex/com.android.art/lib/libicui18n.so /odm/lib/libicui18n.so

Reboot.

linker : CANNOT LINK EXECUTABLE “/usr/libexec/droid-hybris/system/bin/minimediaservice”: library “libnativehelper.so” not found: needed by /system/lib/libmediandk.so in namespace (default)

Ok, # ln -s /apex/com.android.art/lib/libnativehelper.so /odm/lib/libnativehelper.so
Reboot.
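
The one-link-per-reboot loop above can be collapsed into a small helper. This is only a sketch: link_libs is a hypothetical name, and the list in the comment is simply the set of links made by hand so far.

```shell
#!/bin/sh
# Sketch: symlink a list of libraries into one directory, skipping
# sources that do not exist and names that are already present.
# link_libs is a hypothetical helper name.
link_libs() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for src in "$@"; do
        name=${src##*/}
        if [ ! -e "$src" ]; then
            echo "missing: $src" >&2
            continue
        fi
        # -e follows symlinks, -L also catches dangling ones.
        if [ -e "$dest/$name" ] || [ -L "$dest/$name" ]; then
            continue
        fi
        ln -s "$src" "$dest/$name"
    done
}

# The links made by hand so far would be:
# link_libs /odm/lib \
#     /apex/com.android.art/lib/libandroidicu.so \
#     /apex/com.android.art/lib/libicuuc.so \
#     /apex/com.android.art/lib/libicui18n.so \
#     /apex/com.android.art/lib/libnativehelper.so \
#     /apex/com.android.vndk.v30/lib/libaudioroute.so
```

Printing the missing sources instead of failing makes it easier to spot a library that lives in a different apex than expected.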

This time I don’t find CANNOT LINK messages in logcat. Let’s see about those services.
# vi /usr/libexec/droid-hybris/system/etc/init/disabled_services.rc

I add them as

service package_native /system/vendor/bin/DISABLED           
                                                  
service statscompanion /system/vendor/bin/DISABLED

and remove the vendor.ipacm disablement line from previous day when I thought that was making me reboot.
And reboot again.

(Btw, each reboot needs cable re-connect to continue telnet. Some service is re-setting something)

02-25 23:16:36.211 985 985 I SurfaceFlinger: Using HWComposer service: ‘default’
02-25 23:16:38.466 985 985 I HWComposer: Switching to generalized multi-display mode
02-25 23:16:38.466 985 985 W DisplayIdentification: Invalid EDID: falling back to serial number due to missing display name.
02-25 23:16:38.466 985 985 W DisplayIdentification: Invalid EDID: falling back to ASCII text due to missing serial number.
02-25 23:16:38.466 985 985 E HWComposer: isConnected failed for display 19261202339590786: Invalid display

Hmm…
I would like to have wlan to test minimer.
But then again…

 # echo 1 > /sys/devices/platform/soc/b0000000.qcom,cnss-qca6490/fs_ready
 # modprobe qca_cld3_wlan
 modprobe: FATAL: Module qca_cld3_wlan not found in directory /lib/modules/5.4.61-qgki-perf-gc8a3515be514
 # uname -r
 5.4.61-qgki-perf-gc8a3515be514

This means that my latest kernel changes changed the magic version too. I need to re-build the mic image… and I was hoping to first make the screen and wlan work, then scp in to transfer the changes.
Maybe I can just copy the /lib/modules though, by booting lineage recovery…
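
A quick check for this mismatch, as a sketch: have_modules_for is a hypothetical helper, and the second argument exists only so it can be tried away from the device.

```shell
#!/bin/sh
# Sketch: succeed if a modules directory exists for the given kernel release.
# have_modules_for is a hypothetical helper name.
have_modules_for() {
    release=$1
    modules_root=${2:-/lib/modules}
    [ -d "$modules_root/$release" ]
}

# On the device, list what is actually installed when the check fails:
# have_modules_for "$(uname -r)" || ls /lib/modules
```

Running it right after booting a new kernel would show the mismatch before modprobe does.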

But how to boot it? fastboot boot will not go into recovery…
Maybe it’s safer to flash it.

Hmm Lineage recovery says that my /data needs wiping/reset. I get away by “using” the back button and do enable “Adb shell”
However I cannot mount /data as before. So I may have broken it somehow:(

I try to boot a previous “AppArmor” kernel.
Nope.
So no, I again have borked my /data probably by booting Lineage.

So. To end on a high note. Tomorrow I will have to do all the changes I did in the previous days (/dev/ipa, links to /odm, selinux) properly, offline, in a way they will get picked up by the next --mic build.
Good thing I made a note of them all :slight_smile:

Also, I need to fix this “accidentally rebooting without holding vol-up starts lineage that wipes /data” somehow. Either by formatting and mounting data some other way, or by including the recovery ramdisk in the hybris-boot image (or vendor boot?)

Day 28

Meeting notes from Jolla community meeting mention ‘three month to get a high quality port’. I’m only halfway there:)

Retracing my steps from previous days: selinux, droid-hal-early-init.sh, /etc links.
Good thing I have them noted.
Takes me some time, but done offline on the laptop.

Trying to flash lineage boot… it ends up in slot ‘a’.
Maybe this is what happened yesterday, the slot was changed? But /data should not be slotted.
Anyway, I get FAILED (remote: ‘Slot Change is not allowed in Lock State’) when trying to switch back to slot B.
I decide to go ahead and boot slot A, and ASUS operating system welcomes me :slight_smile:

I’m pretty sure there was something I hardcoded on slot B though.
So I restart to bootloader and try to --set-active=b again: FAILED (remote: 'unknown command')

Hey, but at least I get access to recovery from slot A since I flashed lineage boot.img.
I can do all my stuff from there, and SFOS is going to pick up B-side, right?

One weird thing is that Lineage recovery has lost the ability to register touches and I need to operate it with vol up/down and power.
Is this a side-A thing?

Anyway, I am booting my hybris-boot.img
And it again restarts in 60 seconds.
So I need to get back to Lineage recovery to see why. There is no usb0 interface anymore?

Hmm I needed to fix local changes, there was one error in the init-debug that on the device was hardcoded.

I also redo all the /odm/etc link to /odm_root/etc and cp instead of ln the selinux file.

Fresh logcat with kernel that can load its modules :slight_smile: Ubuntu Pastebin

Lots of “Waiting for service ‘vold’ on ‘/dev/binder’…”, let’s try to comment out its disablement…
No difference - that only makes it (vold) fail because it doesn’t understand how I mount system_a.

I try test_hwcomposer and EGL_PLATFORM=hwcomposer test_hwcomposer under strace and they don’t find hwcomposer.*.so where ‘*’ is either default, lahaina or qcom.
All these paths are tried and do not exist :hmm:

hwcomposer.*.so
/odm/lib64/hw/hwcomposer.qcom.so
/odm/lib64/hw/hwcomposer.qcom.so
/vendor/lib64/hw/hwcomposer.qcom.so
/system/lib64/hw/hwcomposer.qcom.so
/odm/lib64/hw/hwcomposer.lahaina.so
/odm/lib64/hw/hwcomposer.lahaina.so
/vendor/lib64/hw/hwcomposer.lahaina.so
/system/lib64/hw/hwcomposer.lahaina.so
/odm/lib64/hw/hwcomposer.lahaina.so
/odm/lib64/hw/hwcomposer.lahaina.so
/vendor/lib64/hw/hwcomposer.lahaina.so
/system/lib64/hw/hwcomposer.lahaina.so
/odm/lib64/hw/hwcomposer.default.so
/odm/lib64/hw/hwcomposer.default.so
/vendor/lib64/hw/hwcomposer.default.so
/system/lib64/hw/hwcomposer.default.so

Let’s try minimer. This is a qml file with a jpeg. I got this as a tar.gz file → I echo it as base64, copy it to clipboard, and transfer it in vi by paste on the device.
Because it’s mini. Mini-Mer:)
Oh, but I need to install qt5-qtdeclarative-qmlscene first.
Which means I need to have internet connectivity.

Trying systemctl mask user@100000 (prevents lipstick restart) and restarting - the test_hwcomposer is… the same.

Hmm… so ssh-ing into the device will not work until I enable developer mode and set a password (so: needs GUI)
But maybe I can connect to the internet since I have a what-seems-to-be-working wlan module.
Or maybe I can route the connection through telnet, and HADK has instructions for that.

Actually it is very easy, so I will not use wlan:

$ sudo iptables -t nat -A POSTROUTING -o wlp2s0 -j MASQUERADE
$ echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward

(^^on my laptop)
then

# route add default gw 192.168.2.14 # <- host's usb0 IP
# echo 'nameserver 208.67.222.222' > /etc/resolv.conf

on the device.

Then zypper in qt5-qtdeclarative-qmlscene.
Stracing: EGL_PLATFORM=hwcomposer strace /usr/lib64/qt5/bin/qmlscene -platform hwcomposer main.qml
This also opens /usr/libexec/droid-hybris/system/lib64/android.frameworks.vr.composer@1.0.so, android.hardware.graphics.composer, libfmq.so, /vendor/lib64/libqdMetaData.so, /vendor/lib64/libgrallocutils.so
outputs “EGLFS: Failed to open /dev/fb0” and “/dev/pmsg0” (??)

So I still don’t know what’s missing for hwcomposer access. Probably the main thing for tomorrow - as just solving mounts and droid-hal-init starting did not automagically turn the screen on:)

Day 29

hwcomposer should be running. On my current device ps aux | grep composer is

/vendor/bin/hw/android.hardware.graphics.composer@2.3-service

On Zenfone:

/vendor/bin/hw/vendor.qti.hardware.display.composer-service

Q: Also, maybe “egl” related links need to be done in /odm too?

Looking again at journalctl -f now, instead of just logcat:

droid-hal-init: Could not start service 'ptt_socket_app' as part of class 'main': Cannot find '/system/vendor/bin/ptt_socket_app': No such file or directory
droid-hal-init: Could not start service 'miniaf' as part of class 'main': Cannot find '/usr/libexec/droid-hybris/system/bin/miniafservice': No such file or directory
droid-hal-init: Could not start service 'vendor_flash_recovery' as part of class 'main': Cannot find '/vendor/bin/install-recovery.sh': No such file or directory

(that’s a good thing:)

droid-hal-init: Control message: Could not find 'android.hardware.gatekeeper@1.0::IGatekeeper/default' for ctl.interface_start from pid: 5342 (/system/bin/hwservicemanager)
droid-hal-init: Control message: Could not find 'android.hardware.keymaster@4.0::IKeymasterDevice/default' for ctl.interface_start from pid: 5342 (/system/bin/hwservicemanager)

and

droid-hal-init: updatable process 'vendor.qseecomd' exited 4 times before boot completed
apexd: Native process 'vendor.qseecomd' is crashing. Attempting a revert
apexd: Revert failed : Revert requested, when there are no active sessions.

Q: maybe the side A vendor is ASUS’s?
A: Yes it is (of course, since it booted:)
That’s an opportunity to make it work without flashing Lineage first!:slight_smile:

But I fear I need to re-install lineage, so I am using my shiny telnet internet connection to scp the actual selinux files to the host.

The services that won’t start are

service vendor.keymaster-4-1 /vendor/bin/hw/android.hardware.keymaster@4.1-service-qti
(in /vendor/etc/init/android.hardware.keymaster@4.1-service-qti.rc)
service wait_for_keymaster /system/bin/wait_for_keymaster
(in /system/etc/init/wait_for_keymaster.rc)
service gatekeeper-1-0 /vendor/bin/hw/android.hardware.gatekeeper@1.0-service-qti
(in /vendor/etc/init/android.hardware.gatekeeper@1.0-service-qti.rc)
service vendor.qseecomd /vendor/bin/qseecomd
(in /vendor/etc/init/qseecomd.rc)

Maybe keymaster and gatekeeper wait for vendor.qseecomd.
strace shows it’s crashing

Another class of errors:

droid-hal-init: Sending signal 9 to service ‘boringssl_self_test_apex64’
droid-hal-init: Sending signal 9 to service 'exec 48 (/system/bin/flags_health_check UPDATABLE_CRASHING)
droid-hal-init: Sending signal 9 to service ‘time_daemon’ (pid 8082) process group… HYBRIS: killing PID instead of process group.
droid-hal-init: Sending signal 9 to service ‘diag_mdlog_stop’ (pid 6775) process group… HYBRIS: killing PID instead of process group.

For boringssl Elros seems to have a clue SUSE Paste (source)

To actually see the strace messages for qseecomd I add -s 256 (“limit length of print strings to STRSIZE chars (default 32)”).
I now see :
rpmb_ufs : “could not find the ufs-bsg dev”
DrmLibRpmb: “Error: rpmb_init failed! with ret = -19”
QSEECOMD : “Init rpmb_init_service ret = -19”
QSEECOMD : “Init dlsym(g_FSHandle, rpmb_init_service) fail”
QSEECOMD : “ERROR: RPMB_INIT failed, shall not start listener services”

Hmm there is actually one

 # ls -l /dev/bsg/ufs-bsg0 
crw-------    1 root     root      242,   3 Mar 31 22:09 /dev/bsg/ufs-bsg0

It is only available to root, but /vendor/etc/init/qseecomd.rc runs qseecomd as root.

Maybe qseecomd behaves better with system/vendor/etc from Lineage - remember, since switching slots I actually booted Sailfish “over” the Asus operating system.
Since changing the active slot is still a mystery to me… I will flash the entire Lineage on slot a too.

Large detour:


Reading my notes I see that indeed lineage can be installed by adb sideload-ing the zip, while sailfish could not.
Doing that, success. I will not boot lineage as I will have to re-do the /data dance with busybox/tar etc.
Next I 'fastboot boot hybris-boot.img'. Hmm, ramdump. What did I forget now?

Reboot to recovery, mount /data (works, so sailfish is still there) chroot to .stowaways/sailfishos
journalctl --list-boots then journalctl -b + the last boot id.

linkerconfig.mount: Directory /linkerconfig to mount over is not emp

and

Apr 01 15:03:42 Zenfone8 bash[946]: Skipping zero-length logical partition: system_b
Apr 01 15:03:42 Zenfone8 bash[946]: Skipping zero-length logical partition: system_ext_b
Apr 01 15:03:42 Zenfone8 bash[946]: Skipping zero-length logical partition: product_b
Apr 01 15:03:42 Zenfone8 bash[946]: Skipping zero-length logical partition: vendor_b
Apr 01 15:03:42 Zenfone8 bash[946]: Skipping zero-length logical partition: odm_b

So the b-side is gone ?
No, there is only one /dev/block/by-name/super in recovery, no _a/_b versions.

Let’s mask droid-hal-init, because it is probably that write to /dev/ipa that crashes when the firmware folder is not set to /vendor/firmware…
Indeed I can enter telnet now, the device doesn’t ramdump. And the first thing I do is ls /vendor to find out it’s empty.
Something stopped working, about dynparts, but what?
/usr/bin/parse-android-dynparts /dev/sda19 echoes

dynpart-system_a,ro,0 7010296 linear /dev/sda19 2048;dynpart-system_ext_a,ro,0 470248 linear /dev/sda19 7012352;dynpart-product_a,ro,0 2604744 linear /dev/sda19 7483392;dynpart-vendor_a,ro,0 2521232 linear /dev/sda19 10088448;dynpart-odm_a,ro,0 2176 linear /dev/sda19 12611584
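
To read that one-liner more easily, here is a sketch that prints each entry with its size. It assumes the name is the first comma-separated field and the device-mapper table is the last one; device-mapper lengths are counted in 512-byte sectors, so halving gives KiB. The function name dynparts_sizes is made up.

```shell
#!/bin/sh
# Sketch: list each device-mapper entry with its size in KiB.
# Input: the ';'-separated string printed by parse-android-dynparts.
# dynparts_sizes is a hypothetical helper name.
dynparts_sizes() {
    printf '%s\n' "$1" | tr ';' '\n' | while read -r entry; do
        [ -n "$entry" ] || continue
        name=${entry%%,*}      # first comma-separated field
        table=${entry##*,}     # last field: "start length target args..."
        set -- $table
        [ "$#" -ge 2 ] || continue
        # $2 is the length in 512-byte sectors.
        echo "$name $(( $2 / 2 )) KiB"
    done
}

# On the device:
# dynparts_sizes "$(/usr/bin/parse-android-dynparts /dev/sda19)"
```

For the odm_a entry above this prints dynpart-odm_a 1088 KiB.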

But systemctl status vendor.mount says

Apr 01 15:18:35 Zenfone8 mount[2641]: mount: /vendor: wrong fs type, bad option, bad superblock on /dev/mapper/dynpart-vendor_a, missing codepage or helper program, or

Indeed, xxd -l 5120 /dev/mapper/dynpart-vendor_a shows only zeroes.
Maybe Lineage flashing did not work and I didn’t notice (since I didn’t boot it…?)
Let’s boot it. What could possibly go wrong. It doesn’t boot.
Back to recovery. Your data may be corrupt:)
Ok, I think the Lineage flash did not work because something failed because I have a f2fs /data instead of whatever dm-2 was expected.
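
The all-zeroes check done with xxd above can be scripted, so that every dynpart can be tested before trying to mount it. A sketch, with starts_with_zeroes as a made-up name:

```shell
#!/bin/sh
# Sketch: succeed if the first N bytes of a file or block device are all zero.
# starts_with_zeroes is a hypothetical helper name.
starts_with_zeroes() {
    dev=$1
    count=${2:-5120}
    [ -r "$dev" ] || return 2
    # Delete NUL bytes and count what is left; nothing left means all zeroes.
    # The arithmetic expansion strips any padding that wc may print.
    leftover=$(( $(head -c "$count" "$dev" | tr -d '\000' | wc -c) ))
    [ "$leftover" -eq 0 ]
}

# On the device:
# for p in /dev/mapper/dynpart-*; do
#     starts_with_zeroes "$p" && echo "$p looks empty"
# done
```

Note that an empty file also passes the check, since it has no non-zero bytes either.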

Hmm… the error when mounting /data is

mount: ‘/dev/block/bootdevice/by-name/userdata’->’/data’: Invalid argument

While thinking about this I wiped data with the Lineage recovery GUI. Then it occurred to me that maybe it was just one link that was broken…
After this, ls -l /dev/block/by-name/userdata shows

/dev/block/by-name/userdata → /dev/block/sda23

Sideload Lineage again, this time I look if there are errors. Looks like it worked, too bad I didn’t pay attention last time.

Grepping the local hadk .rc files for userdata I see this in system/etc/init/hw/init.rc

     # We restorecon /data in case the userdata partition has been reset.
    restorecon /data

Hmm… sounds like a selinux thing.

Anyway, after re-flashing lineage I re-do all the busybox/untar/stowaways steps and boot hybris-boot.img
This time there’s a new “device” connected, called “Failed to boot init in real rootfs”.
Telnet works, but this time on port 23.

mount: mounting /data//.stowaways/sailfishos on /target failed: No such file or directory

Wait, what.
Oh snap.
It’s empty. Because I have done all the above busybox/untar/stowaways steps on a /data folder that was not mounted in recovery :facepalm:
Again, starting with mount /data first.


End detour.

After this detour of flashing/reflashing I’m back to qseecomd not starting.
Same error, “could not find the ufs-bsg dev”
The string is actually in strings /vendor/lib64/librpmb.so. There is also the string “/dev/ufs-bsg”
Let’s ln -s /dev/bsg/ufs-bsg0 /dev/ufs-bsg. And try starting qseecomd again → no crash :wink:

Adding that to droid-hal-early-init.sh
This time gatekeeper and keymaster services seem to also start.

Day 30

test_hwcomposer still crashes. minimer still doesn’t start the display. It says

library "libcutils.so" not found
library "libc++.so" not found
library "libz.so" not found
library "libion.so" not found
library "libhardware.so" not found
library "android.hardware.graphics.mapper@3.0.so" not found
library "android.hardware.graphics.mapper@4.0.so" not found
library "android.hardware.graphics.mapper@2.0.so" not found
library "libhidlbase.so" not found
library "libutils.so" not found
library "android.hardware.graphics.common@1.0.so" not found
library "android.hardware.graphics.common@1.1.so" not found
library "android.hardware.graphics.mapper@2.1.so" not found
library "android.hardware.graphics.common@1.2.so" not found
library "libandroidicu.so" not found

Linking libandroidicu to odm/lib64: ln -s /apex/com.android.art/lib64/libandroidicu.so /odm/lib64/libandroidicu.so gives back

library “libandroidicu.so” needed or dlopened by “/usr/libexec/droid-hybris/system/lib64/libmedia.so” is not accessible for the namespace “(default)”

Let’s unmount /linkerconfig :slight_smile:
Maybe /linkerconfig/default needs to be mounted on /linkerconfig, rather than bootstrap?

Also:

ln -s /vendor/lib64/egl/libGLESv2_adreno.so /odm/lib64/egl/libGLESv2_adreno.so
ln -s /vendor/lib64/egl/eglSubDriverAndroid.so /odm/lib64/egl/eglSubDriverAndroid.so

I also find that the droid-config submodule is “too new” according to hadk-hot.
I reset that to the recommended sha and make some corresponding changes to the files on device.

To my surprise, after reboot… Lineage tries to start again. (logo appears, but it doesn’t start)
This may mean that I only get a couple of boots before the slot is changed → my bootctl service does not correctly mark the system as booted.
I force reboot it to recovery and in /data, .stowaways is still there. Flashing hybris-boot:

Writing ‘boot_b’ OKAY [ 0.705s]
Indeed, it switched slot. Let’s hope super is not gone again.

I have a healthcheck process that restarts, from /system/etc/init/flags_health_check.rc

on property:sys.boot_completed=1
    setprop persist.device_config.attempted_boot_count 0

on property:sys.init.updatable_crashing=1
    exec - system system -- /system/bin/flags_health_check UPDATABLE_CRASHING

That last line. But I see on the line above that boot_completed resets some counter.
Indeed, persist.device_config.attempted_boot_count returns 1 (after it just switched the slot).

Back to linkerconfig. There’s a patch by Elros to make the switch from bootstrap to default, linked above.
Before that:

# ls /linkerconfig/ld.config.txt -l
-rw-r--r--    1 root     root         92135 Mar 31 10:50 /linkerconfig/ld.config.txt
# ls /linkerconfig/default/ld.config.txt -l
-rw-r--r--    1 root     root          4907 Mar 31 10:49 /linkerconfig/default/ld.config.txt
# ls /linkerconfig/bootstrap/ld.config.txt -l
-rw-r--r--    1 root     root          4907 Mar 31 10:49 /linkerconfig/bootstrap/ld.config.txt

After: no default or bootstrap subfolders. ld.config.txt is still about 92 kB.

Let’s “fix” that time_daemon: adding to disabled_services.rc

service time_daemon /vendor/bin/time_daemon_HYBRIS_DISABLED
        override
        disabled

Let’s try harder (this is the commit to “revert” on device)

/ # rm /usr/lib/systemd/user/jolla-actdead-charging.service.d/50-compositor.conf
/ # rm /usr/lib/systemd/user/jolla-startupwizard-pre-user-session.service.d/50-compositor.conf
/ # rm /usr/lib/systemd/user/lipstick.service.d/50-compositor.conf

Ok, now I do get bootanimation + surfaceflinger to work (It’s another sanity test - just start these two (disabled) services by hand to see something on screen. Lineage un-smile in this case.)
But not minimer, that doesn’t turn on the screen.

03-31 04:12:01.936  7994  8050 I Adreno  : IsValidNativeBuffer: Buffer has a NULL handle
03-31 04:12:01.936  7994  8050 I Adreno  : DequeueBuffer: Dequeued Buffer is not valid
03-31 04:12:01.936  5980  8001 W SDM     : DisplayBase::SetVSyncState: Can't enable vsync when display 54-0 is powered off or SecureDisplay/TUI in progress

7995 is minimer
8001 is /vendor/bin/hw/vendor.qti.hardware.display.composer-service

Fortunately I have the sources for that part: android_hardware_qcom_display/display_base.cpp at lineage-18.1-caf-sm8350 · LineageOS/android_hardware_qcom_display · GitHub

Also, I can strace -p 8001.

With a similar command, and another PID, I get a “irisConfigureGetIoctl” error.
Strace shows the ioctl, but how to decode it?

ioctl(13, _IOC(_IOC_WRITE, 0x64, 0x90, 0x10), 0x7fe3bc1fe8) = -1 EINVAL (Invalid argument)

Nevermind. This ioctl error happens with surfaceflinger too.
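
For the record, the fields strace printed can at least be read off. With the asm-generic layout the request is built as dir, size, type and number, and type 0x64 is ASCII 'd', the ioctl type used by the DRM subsystem, where numbers 0x40 to 0x9f are reserved for driver-private commands. Whether fd 13 really is a DRM node was not checked here, so treat that as an assumption. A sketch that rebuilds the raw request number (ioc is a made-up helper name):

```shell
#!/bin/sh
# Sketch: rebuild an ioctl request number from the _IOC() fields,
# using the asm-generic layout: dir (2 bits) | size (14) | type (8) | nr (8).
# ioc is a hypothetical helper name.
ioc() {
    dir=$1 type=$2 nr=$3 size=$4
    printf '0x%08x\n' $(( ($dir << 30) | ($size << 16) | ($type << 8) | $nr ))
}

# _IOC_WRITE is 1 in asm-generic/ioctl.h.
ioc 1 0x64 0x90 0x10
```

This prints 0x40106490, which can then be grepped for in the kernel and display HAL headers.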

However, instead of “SetVSyncState” error, when I start bootanim I get some other messages:

03-31 06:00:00.826 5988 5988 E FMQ : grantorIdx must be less than 3
03-31 06:00:00.828 5988 6899 I SDM : HWCColorModeStc::ApplyCurrentColorModeWithRenderIntent: Applying Stc mode (gamut 1 gamma 1 intent 1), curr mode 7, render intent 0, hdr present 0

logcat when minimer connects to composer:

HWCSession::RegisterCallback: Registering callback: Hotplug
HWCSession::RegisterCallback: Hotplugging primary...
HWCSession::RegisterCallback: Handling built-in displays...
HWCSession::RegisterCallback: Handling pluggable displays...
HWCSession::HandlePluggableDisplays: Handling hotplug...
HWCSession::HandlePluggableDisplays: Handling hotplug... Done.
HWCSession::RegisterCallback: Registering callback: Refresh
HWCSession::RegisterCallback: Registering callback: Vsync
DisplayBase::SetDisplayState: Set state = 1, display 54-0, teardown = 0
DisplayBase::SetDisplayState: Same state transition is requested.

logcat when surfaceflinger connects to composer:

HWCSession::RegisterCallback: Registering callback: Hotplug
HWCSession::RegisterCallback: Hotplugging primary...
HWCColorModeStc::GetColorModeCount: Supported color mode count = 4
HWCColorModeStc::GetColorModes: Color mode = 0 is supported
HWCColorModeStc::GetColorModes: Color mode = 7 is supported
HWCColorModeStc::GetColorModes: Color mode = 9 is supported
HWCColorModeStc::GetColorModes: Color mode = 12 is supported
HWCColorModeStc::GetRenderIntentCount: mode: 0 supported rendering intent count = 1
HWCColorModeStc::GetRenderIntents: Color mode = 0 is supported with render intent = 0
HWCColorModeStc::GetRenderIntentCount: mode: 7 supported rendering intent count = 1
HWCColorModeStc::GetRenderIntents: Color mode = 7 is supported with render intent = 0
HWCColorModeStc::GetRenderIntentCount: mode: 9 supported rendering intent count = 2
HWCColorModeStc::GetRenderIntents: Color mode = 9 is supported with render intent = 0
HWCColorModeStc::GetRenderIntents: Color mode = 9 is supported with render intent = 1
HWCColorModeStc::GetRenderIntentCount: mode: 12 supported rendering intent count = 1
HWCColorModeStc::GetRenderIntents: Color mode = 12 is supported with render intent = 2
HWCSession::RegisterCallback: Handling built-in displays...
HWCSession::RegisterCallback: Handling pluggable displays...
HWCSession::HandlePluggableDisplays: Handling hotplug...
HWCSession::HandlePluggableDisplays: Handling hotplug... Done.
HWCSession::RegisterCallback: Registering callback: Refresh
HWCSession::RegisterCallback: Registering callback: Vsync2.4
HWCSession::RegisterCallback: Registering callback: VsyncPeriodTimingChanged
HWCSession::RegisterCallback: Registering callback: SeamlessPossible
DisplayBase::SetDisplayState: Set state = 1, display 54-0, teardown = 0
DisplayBase::SetDisplayState: Same state transition is requested.

Notice Vsync vs Vsync2.4
And other callbacks registered. Plus, getting color modes, and rendering intents (whatever that means)
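
Comparing the two excerpts by eye works, but the difference can also be extracted mechanically. A sketch, assuming each excerpt was saved to its own file (the file names in the comment are placeholders, and both function names are made up):

```shell
#!/bin/sh
# Sketch: print HWC callbacks registered in the second log but not in the first.
# callbacks and extra_callbacks are hypothetical helper names.
callbacks() {
    sed -n 's/.*Registering callback: //p' "$1" | LC_ALL=C sort -u
}

extra_callbacks() {
    a=$(mktemp) b=$(mktemp)
    callbacks "$1" > "$a"
    callbacks "$2" > "$b"
    # comm -13 keeps only the lines unique to the second input.
    LC_ALL=C comm -13 "$a" "$b"
    rm -f "$a" "$b"
}

# With the two excerpts saved as minimer.log and surfaceflinger.log:
# extra_callbacks minimer.log surfaceflinger.log
```

On the excerpts above this would list SeamlessPossible, Vsync2.4 and VsyncPeriodTimingChanged, the callbacks only surfaceflinger registers.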

Surfaceflinger… hhm.
There’s qt5-qpa-surfaceflinger-plugin recently adapted by nephros probably fixing build.
Maybe I should use that? But let’s not give up on hwcomposer yet.

Let’s disable minisfservice → this seems to make lipstick not fail at boot time. No screen changes though.

(’/vendor/bin/thermal-engine’ makes some noise like timed, I need to disable that too)

Bootctl seems to not start
droid-bootctl.service: Main process exited, code=exited, status=70/n/a
Trying this fix: GitHub - mer-hybris/hadk-faq: FAQ for Sailfish OS porting guide (HADK)
It seems to work - hopefully I won’t have reboots that switch slots anytime soon…

Well, still no display, but less time fighting reboots:)

Day 31

Building qt5-qpa-surfaceflinger-plugin
Scp-ing it to my device: from the telnet session over usb0, I ssh back to the laptop.
Doesn’t install because it needs something that provides libsf - it seems that package exists too, libhybris-libsf, same time, so probably built from the same sources.

Next: start

LD_PRELOAD=/usr/libexec/droid-hybris/system/lib64/libsurfaceflinger.so /system/bin/surfaceflinger;

and in another terminal:

EGL_PLATFORM=null QT_QPA_PLATFORM=surfaceflinger /usr/lib64/qt5/bin/qmlscene main.qml

Both commands fail:)

CANNOT LINK EXECUTABLE “/system/bin/surfaceflinger”: library “/usr/libexec/droid-hybris/system/lib64/libsurfaceflinger.so” needed or dlopened by “/system/bin/surfaceflinger” is not accessible for the namespace “(default)”

So I really should get down to the root cause of linker config.

library “libsf_compat_layer.so” not found

The readme said so:

You will need a patched libsurfaceflinger (permission fix), libsf_compat_layer and a recent libhybris.

In the android buildenv make libsf_compat_layer gives many errors however

external/libhybris/libhybris/compat/surface_flinger/surface_flinger_compatibility_layer.cpp:76:45: error: no member named ‘getBuiltInDisplay’ in ‘android::SurfaceComposerClient’
external/libhybris/libhybris/compat/surface_flinger/surface_flinger_compatibility_layer.cpp:77:32: error: no member named ‘eDisplayIdMain’ in ‘android::ISurfaceComposer’
external/libhybris/libhybris/compat/surface_flinger/surface_flinger_compatibility_layer.cpp:102:32: error: no member named ‘eDisplayIdHdmi’ in ‘android::ISurfaceComposer’

And grepping the sailfishos-porters logs I don’t find that the people who had these errors fixed them…

Bisecting through web interface: https://android.googlesource.com/platform/frameworks/native/+log/master/libs/gui/include/gui/SurfaceComposerClient.h?s=838de0622c700345fbfde270c065fdc97f4b9428
88d37dd good
838de06 bad
9f03447 good
74e5377 good
80d94ad good
42d0456 good
dcb38bb bad
So we have the commit: Diff - dcb38bbd32eb96ec46d69658390353a853b3af6d^! - platform/frameworks/native - Git at Google

composerService()->getBuiltInDisplay(ISurfaceComposer::eDisplayIdMain)) becomes composerService()->getInternalDisplayToken()

But errors keep crawling up, I cannot find where was that DisplayInfo.h that was used some 10+ years ago…
Quitting this thread for the time.


Let’s disable some failing mount services:

# systemctl mask vendor-asusfw.mount
# systemctl mask vendor-vm\\x2dsystem.mount

qmlscene:

library “libcutils.so” needed or dlopened by “/vendor/lib64/hw/gralloc.default.so” is not accessible for the namespace “sphal”

/linkerconfig is just mounted from sda23 which is just userdata (/data)
But who writes it? (Look who’s posing smart a year ago)

Trying to edit /linkerconfig/ld.config.txt

namespace.default.permitted.paths += /usr/libexec/droid-hybris/system/${LIB}

In the droid-configs base submodule there is a ld.config.29.txt (from Android 10) but no such config for 30 (Android 11).

It is generated in /system/linkerconfig/main.cc

Adding this to droid-hal-early-init.sh

mount --bind /usr/libexec/droid-hybris/system/lib64/libcutils.so /odm/lib64/libcutils.so

“Fixes” the “sphal” error. The odm/lib64 links were

eglSubDriverAndroid.so -> egl/eglSubDriverAndroid.so
gralloc.default.so -> /vendor/lib64/hw/gralloc.default.so
libGLESv2_adreno.so -> egl/libGLESv2_adreno.so
libcutils.so -> /usr/libexec/droid-hybris/system/lib64/libcutils.so

However I now remember libcutils was especially called out as an example in linker docs of a library that can differ between vndk (sphal) and system…
So maybe I’d better remove the link & bind mount…


Back to another train of thought - test_hwcomposer crashes because of this assert.
But maybe it doesn’t need it. So I remove all the lines and leave just return create_hwcomposer2_window();
Rebuild libhybris (--mw) and scp it - doesn’t crash.
It just gives more of

04-03 05:49:28.184 16858 16858 I Adreno  : IsValidNativeBuffer: Buffer has a NULL handle
04-03 05:49:28.184 16858 16858 I Adreno  : DequeueBuffer: Dequeued Buffer is not valid

Just as qmlscene command does.

Maybe I can work out something from here?
My plan is to read through surfaceflinger sources to see what composer init it does that test_hwcomposer/qt5-qpa-hwcomposer does not.

This will take a while.
And there’s also a school holiday coming, so I won’t hold my breath for some weeks…

A few weeks without a livecast? Geez!
:scream:

Then I wish you a nice holiday, you have earned it!
