Tuning the oom killer / low memory killer

So lmkd isn’t the process name for lowmemorykiller; they are different things altogether? How not confusing at all to an outside tinkerer who likes to assume things :smiley: So lowmemorykiller seems to be an AOSP (and Android?) kernel driver, while lmkd is a userspace daemon running inside the Android AppSupport container.

If that is the case, tinkering with minfree indeed does exactly what I assumed it did, i.e. Browser and Email not getting killed was not placebo! But if it really affects the whole system, then it’s responsible for killing Android software too. Perhaps it even took some DNS or routing thingamajig with it, causing a plethora of other issues?

(I’m not editing the opening post just yet, things seem to move at quite the pace now!)

Yes actually, when there are no good native alternatives. Lipstick crashing is another thing altogether (unless it was killed by oom, ha) and that is luckily very rare for me to run into.

Grayed-out application covers are pretty much the primary thing I’m trying to get rid of here, no matter whether it’s an Android app or a native one. The second annoyance is Bluetooth media controls and the fingerprint sensor not working. Both of those have been non-issues for the last two days of semi-intensive testing for me, so it does look good…! But more testing is needed. I want to rule out placebo, so others’ test results are welcome, too.

I wonder who actually defines the minfree values. I see older devices have them defined by Sony, like the XA2: https://github.com/sonyxperiadev/device-sony-nile/blob/8b8acd6d486cfe4e94bc2206137414d784099572/rootdir/vendor/etc/init/init.nile.rc#L61
Meanwhile the 10 III has no such definition, so maybe it is using AOSP’s defaults, and that is why they are so high?
Oh, and an interesting note: all devices have this line: ro.lmk.use_minfree_levels=true, which probably means that when we change minfree we are also affecting lmkd… It is a lot.
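
Side note for anyone comparing minfree values between devices: the entries in lowmemorykiller's minfree parameter are page counts, not bytes. Below is a minimal Python sketch that turns them into MiB, assuming 4 KiB pages (typical on arm64 Android kernels, but check `getconf PAGESIZE` on your device). The function names are just for illustration.

```python
# Sketch: decode lowmemorykiller's "minfree" parameter into MiB.
# Assumption: each entry is a count of pages and pages are 4 KiB.
MINFREE_PATH = "/sys/module/lowmemorykiller/parameters/minfree"
PAGE_SIZE = 4096


def parse_levels(text: str) -> list[int]:
    """Turn comma-separated text such as '1536,2048' into integers."""
    return [int(part) for part in text.strip().split(",") if part.strip()]


def pages_to_mib(pages: int, page_size: int = PAGE_SIZE) -> float:
    """Convert a page count into mebibytes."""
    return pages * page_size / (1024 * 1024)


def read_minfree(path: str = MINFREE_PATH) -> list[int]:
    """Read the live values from sysfs."""
    with open(path) as f:
        return parse_levels(f.read())
```

On the phone, `for level in read_minfree(): print(level, pages_to_mib(level))` lists each level next to its size; with 4 KiB pages, 16384 pages come out at 64 MiB.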

The missing definition would indeed explain this nonsense! That’s worth a bug report of its own, since this is a general thread. Would you mind filing it? You seem to have a better understanding of this than I do.

Oh, nice! If lmkd is configured to use the same limits, that means minfree really affects both SFOS and Android apps! So no placebo!

I reported it on Sony’s bugtracker, hope someone will actually respond…

Actually, 4 values are the default, in theory! But then the XA2 series should have 6 of them, because that is what Sony has defined… AAAA, it all makes no sense!

I meant a bug report here in FSO, but that’s even better actually :stuck_out_tongue_winking_eye:

Four values with an array of size 6? Uh. Oh, it seems to be fine…? AAAA, indeed!

I created an issue for MCE’s improper reporting of memory pressure, as it is a separate but related problem: MCE is using wrong metric to properly measure RAM usage
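
Which metric MCE actually reads isn't stated here, so this is only an illustration of why the choice matters: MemFree counts only truly free pages, while MemAvailable also includes page cache the kernel can reclaim, so "used" computed from MemFree looks much worse. A minimal sketch over /proc/meminfo text (values there are in kB); the numbers in any test input are made up.

```python
# Sketch: compare "used" memory computed from MemFree vs MemAvailable.
# Input is the text of /proc/meminfo; values there are in kB.


def parse_meminfo(text: str) -> dict[str, int]:
    """Map each /proc/meminfo field to its numeric value."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def used_percent(fields: dict[str, int]) -> dict[str, float]:
    """Percentage of RAM considered 'used' under each metric."""
    total = fields["MemTotal"]
    free = fields["MemFree"]
    # MemAvailable exists since Linux 3.14; fall back to MemFree otherwise.
    available = fields.get("MemAvailable", free)
    return {
        "by_MemFree": 100 * (total - free) / total,
        "by_MemAvailable": 100 * (total - available) / total,
    }
```

Usage: `with open("/proc/meminfo") as f: print(used_percent(parse_meminfo(f.read())))`.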

I have no clue how I missed this until now, but the kernel documentation really explains swappiness well. And it contains this:

For in-memory swap, like zram or zswap, as well as hybrid setups that have swap on faster devices than the filesystem, values beyond 100 can be considered.

So, yeah, moving from 25 to 50 isn’t that radical :smiley:
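
For reference, swappiness lives in /proc/sys/vm/swappiness, and writing it needs root. A small sketch (helper names are hypothetical) that changes it and returns the old value so you can revert. Older kernels cap the value at 100; only kernels that accept up to 200 allow the "beyond 100" values the documentation mentions, and the kernel itself rejects out-of-range writes.

```python
# Sketch: read and change vm.swappiness through procfs (writing needs root).
SWAPPINESS_PATH = "/proc/sys/vm/swappiness"


def read_swappiness(path: str = SWAPPINESS_PATH) -> int:
    with open(path) as f:
        return int(f.read().strip())


def write_swappiness(value: int, path: str = SWAPPINESS_PATH) -> int:
    """Write a new value and return the previous one.

    The kernel enforces the upper limit (100 or 200 depending on version),
    so only obviously invalid values are caught here.
    """
    if value < 0:
        raise ValueError("swappiness cannot be negative")
    previous = read_swappiness(path)
    with open(path, "w") as f:
        f.write(f"{value}\n")
    return previous
```

Like minfree, this is not persistent and goes back to the default after a reboot.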

I can no longer find lowmemorykiller.c in the kernel for the X10 III. So my theory is that Sony no longer configures it, maybe because the newest Android versions dropped it in favor of lmkd?
SFOS, however, still uses it, but didn’t add the configuration that was present on older devices.
If that theory is correct, then it probably is an omission on the SFOS side. But I am waiting for a response on my bug; maybe someone more knowledgeable can answer there.

I finally hit oom with my setup (on purpose). I’m not quite sure what the process was (zygote perhaps?) but it took the Android AppSupport with it… Every Android app ended up grayed out, and once I clicked Deezer, a notification stated that it was getting started.

No application was in the foreground; I had swiped to the app cover view, locked the phone and put it in my pocket. Deezer was playing music via Bluetooth, though. Here’s the list of applications I had open:

  • Deezer (playing music, via Bluetooth)
  • WhatsApp
  • Browser (5 tabs)
  • Fennec (4 tabs)
  • Whisperfish
  • Email
  • Hydrogen
  • ToeTerm (with dbus logger)
  • ToeTerm (nothing running)
  • Clock
  • Settings (on Bluetooth page)
  • Nettiradio (not playing)
  • Föli (local bus ticket and routing app)
  • Slack
  • Battery Buddy

That’s quite a load, I’d say! All was good and dandy until I opened Deezer and let it play for some 10 minutes. Before opening Deezer, I actually did a few rounds of furious switching between all apps, and WhatsApp, Slack and Föli had a 1-2 s delay after clicking the app cover, presumably because parts of them were swapped out. No other issues, however. I’m quite impressed, ignoring the fact that it decided to kill the whole Android subsystem and that I couldn’t figure out which process was killed first…

The logs show a lot going on, but nothing blatantly obvious to me… There are no “expected” OOM messages, but here’s the part I think is relevant (full dmesg output):

[82429.419823] binder: 21166 RLIMIT_NICE not set
[82429.447497] binder: 22505 RLIMIT_NICE not set
[82429.658595] sysrq: Show Blocked State
[82429.658631]   task                        PC stack   pid father
[82429.658852] hdcp_2x         D    0   517      2 0x00000008
[82429.658864] Call trace:
[82429.658882]  __switch_to+0x114/0x120
[82429.658894]  __schedule+0xb50/0xdcc
[82429.658903]  schedule+0x70/0x90
[82429.658914]  sde_hdcp_2x_main+0x8c/0x794
[82429.658923]  kthread+0x13c/0x15c
[82429.658932]  ret_from_fork+0x10/0x18
[82429.658939] dp_hdcp2p2      D    0   518      2 0x00000008
[82429.658948] Call trace:
[82429.658957]  __switch_to+0x114/0x120
[82429.658965]  __schedule+0xb50/0xdcc
[82429.658972]  schedule+0x70/0x90
[82429.658983]  dp_hdcp2p2_main+0x9c/0x5b4
[82429.658990]  kthread+0x13c/0x15c
[82429.658998]  ret_from_fork+0x10/0x18
[82429.662776] sysrq: Show backtrace of all active CPUs
[82429.785985] binder: release 21138:21330 transaction 13261867 out, still active
[82429.785991] binder: undelivered TRANSACTION_COMPLETE
[82429.788700] binder_thread_write: 19 callbacks suppressed
[82429.788705] binder: 21077:21077 BC_DEAD_BINDER_DONE 0000000000000003 not found
[82429.791473] binder: 5754:5754 BC_DEAD_BINDER_DONE 0000000000000001 not found
[82429.792698] binder: 18804:18804 BC_DEAD_BINDER_DONE 0000000000000001 not found
[82429.797283] binder: 20992:20992 BC_DEAD_BINDER_DONE 0000000000000001 not found
[82429.799330] binder: 21077:21077 BC_DEAD_BINDER_DONE 0000000000000001 not found
[82429.800691] binder: 5754:5754 BC_DEAD_BINDER_DONE 0000000000000004 not found
[82429.801869] binder: 5754:5754 BC_DEAD_BINDER_DONE 0000000000000005 not found
[82429.802557] binder: 21077:21077 BC_DEAD_BINDER_DONE 0000000000000002 not found
[82429.804620] binder: 5754:5754 BC_DEAD_BINDER_DONE 0000000000000002 not found
[82429.811647] init: Service 'zygote' (pid 38) received signal 9
[82429.847344] binder: 20907:21059 transaction failed 29189/-22, size 116-8 line 3099
[82429.986802] binder: 21077:21077 BC_DEAD_BINDER_DONE 0000000000000002 not found
[82430.028784] init: Sending signal 9 to service 'zygote' (pid 38) process group...
[82430.103591] libprocessgroup: Successfully killed process cgroup uid 0 pid 38 in 74ms
[82430.108014] binder: 20921:21452 transaction failed 29189/-22, size 64-0 line 3099
[82430.154282] init: Command 'write /sys/power/state on' action=onrestart (<Service 'zygote' onrestart>:2) took 0ms and failed: Unable to write to file '/sys/power/state': open() failed: Read-only file system
[82430.154336] init: Sending signal 9 to service 'audioserver' (pid 49) process group...
[82430.154419] libprocessgroup: Successfully killed process cgroup uid 1041 pid 49 in 0ms
[82430.154503] init: Sending signal 9 to service 'cameraserver' (pid 56) process group...
[82430.154560] libprocessgroup: Successfully killed process cgroup uid 1047 pid 56 in 0ms
[82430.154616] init: Sending signal 9 to service 'media' (pid 65) process group...
[82430.154658] libprocessgroup: Successfully killed process cgroup uid 1013 pid 65 in 0ms
[82430.154708] init: Sending signal 9 to service 'netd' (pid 37) process group...
[82430.154750] libprocessgroup: Successfully killed process cgroup uid 0 pid 37 in 0ms
[82430.156207] init: Could not restart 'wificond': Cannot find '/system/bin/wificond_ALIEN_DISABLED': No such file or directory
[82430.157226] init: Received sys.powerctl='shutdown' from pid: 1 (/system/bin/init)
[82430.157394] init: Clear action queue and start shutdown trigger
[82430.157474] init: processing action (shutdown_done) from (<Builtin Action>:0)
[82430.157487] init: Reboot start, reason: shutdown, rebootTarget: 
[82430.172498] init: Shutdown timeout: 6000 ms
[82430.172847] init: starting service 'blank_screen'...
[82430.174208] libprocessgroup: Failed to make and chown /acct/uid_1000: Read-only file system
[82430.174268] init: createProcessGroup(1000, 2030) failed for service 'blank_screen': Read-only file system
[82430.174558] init: Could not start shutdown critical service 'hwservicemanager': Cannot find '/system/bin/hwservicemanager_ALIEN_DISABLED': No such file or directory
[82430.174618] init: terminating init services
[82430.174669] init: Sending signal 15 to service 'vendor.thermal-hal-1-0_ALIEN' (pid 79) process group...
[82430.174786] init: Sending signal 15 to service 'vendor.drm-hal-1-0_ALIEN' (pid 78) process group...
[82430.174863] init: Sending signal 15 to service 'gatekeeperd' (pid 69) process group...
[82430.174941] init: Sending signal 15 to service 'media.swcodec' (pid 68) process group...
[82430.175009] init: Sending signal 15 to service 'storaged' (pid 67) process group...
[82430.175075] init: Sending signal 15 to service 'statsd' (pid 66) process group...
[82430.175211] init: Sending signal 15 to service 'media' (pid 65) process group...
[82430.175279] init: Sending signal 15 to service 'mediametrics' (pid 64) process group...
[82430.175345] init: Sending signal 15 to service 'mediaextractor' (pid 63) process group...
[82430.175416] init: Sending signal 15 to service 'mediadrm' (pid 62) process group...
[82430.175496] init: Sending signal 15 to service 'keystore' (pid 61) process group...
[82430.175568] init: Sending signal 15 to service 'installd' (pid 60) process group...
[82430.175632] init: Sending signal 15 to service 'incidentd' (pid 59) process group...
[82430.175701] init: Sending signal 15 to service 'idmap2d' (pid 58) process group...
[82430.175766] init: Sending signal 15 to service 'drm' (pid 57) process group...
[82430.175837] init: Sending signal 15 to service 'cameraserver' (pid 56) process group...
[82430.175902] init: Sending signal 15 to service 'mediacodec_ALIEN' (pid 54) process group...
[82430.175967] init: Sending signal 15 to service 'lmkd' (pid 51) process group...
[82430.176036] init: Sending signal 15 to service 'gpu' (pid 50) process group...
[82430.176105] init: Sending signal 15 to service 'audioserver' (pid 49) process group...
[82430.176234] init: Sending signal 15 to service 'ashmemd' (pid 48) process group...
[82430.176314] init: Sending signal 15 to service 'healthd' (pid 47) process group...
[82430.176385] init: Sending signal 15 to service 'system_suspend' (pid 46) process group...
[82430.176455] init: Sending signal 15 to service 'hidl_memory' (pid 45) process group...
[82430.176520] init: Sending signal 15 to service 'vendor.hwcomposer-2-1' (pid 44) process group...
[82430.176586] init: Sending signal 15 to service 'vendor.drm-clearkey-hal-1-2' (pid 43) process group...
[82430.176653] init: Sending signal 15 to service 'vendor.configstore-hal' (pid 42) process group...
[82430.176728] init: Sending signal 15 to service 'vendor.audio-hal-2-0' (pid 41) process group...
[82430.176794] init: Sending signal 15 to service 'zygote_secondary' (pid 39) process group...
[82430.176877] init: Sending signal 15 to service 'netd' (pid 37) process group...
[82430.227320] init: Service 'blank_screen' (pid 2030) exited with status 255
[82430.684057] binder: 21483:21483 transaction failed 29189/-22, size 80-0 line 3099
[82430.715209] binder: 21483:21483 transaction failed 29189/-22, size 2500-8 line 3099
[82430.779603] init: Untracked pid 476 received signal 9
[82433.196284] init: Terminating running services took 3038ms with remaining services:30
[82433.196355] init: Sending signal 9 to service 'vendor.thermal-hal-1-0_ALIEN' (pid 79) process group...
[82433.196587] libprocessgroup: Successfully killed process cgroup uid 1000 pid 79 in 0ms
[82433.196743] init: Sending signal 9 to service 'vendor.drm-hal-1-0_ALIEN' (pid 78) process group...
[82433.196851] libprocessgroup: Successfully killed process cgroup uid 1013 pid 78 in 0ms
[82433.196972] init: Sending signal 9 to service 'gatekeeperd' (pid 69) process group...
[82433.197078] libprocessgroup: Successfully killed process cgroup uid 1000 pid 69 in 0ms
[82433.198267] init: Sending signal 9 to service 'media.swcodec' (pid 68) process group...
[82433.198389] libprocessgroup: Successfully killed process cgroup uid 1046 pid 68 in 0ms
[82433.198508] init: Sending signal 9 to service 'storaged' (pid 67) process group...
[82433.198613] libprocessgroup: Successfully killed process cgroup uid 0 pid 67 in 0ms
[82433.198729] init: Sending signal 9 to service 'statsd' (pid 66) process group...
[82433.198839] libprocessgroup: Successfully killed process cgroup uid 1066 pid 66 in 0ms
[82433.198952] init: Sending signal 9 to service 'media' (pid 65) process group...
[82433.199057] libprocessgroup: Successfully killed process cgroup uid 1013 pid 65 in 0ms
[82433.200069] init: Sending signal 9 to service 'mediametrics' (pid 64) process group...
[82433.200480] libprocessgroup: Successfully killed process cgroup uid 1013 pid 64 in 0ms
[82433.200604] init: Sending signal 9 to service 'mediaextractor' (pid 63) process group...
[82433.200713] libprocessgroup: Successfully killed process cgroup uid 1040 pid 63 in 0ms
[82433.200828] init: Sending signal 9 to service 'mediadrm' (pid 62) process group...
[82433.200933] libprocessgroup: Successfully killed process cgroup uid 1013 pid 62 in 0ms
[82433.201047] init: Sending signal 9 to service 'keystore' (pid 61) process group...
[82433.201851] libprocessgroup: Successfully killed process cgroup uid 1017 pid 61 in 0ms
[82433.201985] init: Sending signal 9 to service 'installd' (pid 60) process group...
[82433.202097] libprocessgroup: Successfully killed process cgroup uid 0 pid 60 in 0ms
[82433.202696] init: Sending signal 9 to service 'incidentd' (pid 59) process group...
[82433.202816] libprocessgroup: Successfully killed process cgroup uid 1067 pid 59 in 0ms
[82433.202931] init: Sending signal 9 to service 'idmap2d' (pid 58) process group...
[82433.203036] libprocessgroup: Successfully killed process cgroup uid 1000 pid 58 in 0ms
[82433.203725] init: Sending signal 9 to service 'drm' (pid 57) process group...
[82433.204089] libprocessgroup: Successfully killed process cgroup uid 1019 pid 57 in 0ms
[82433.204459] init: Sending signal 9 to service 'cameraserver' (pid 56) process group...
[82433.204573] libprocessgroup: Successfully killed process cgroup uid 1047 pid 56 in 0ms
[82433.204688] init: Sending signal 9 to service 'mediacodec_ALIEN' (pid 54) process group...
[82433.204795] libprocessgroup: Successfully killed process cgroup uid 1046 pid 54 in 0ms
[82433.204908] init: Sending signal 9 to service 'lmkd' (pid 51) process group...
[82433.205012] libprocessgroup: Successfully killed process cgroup uid 1069 pid 51 in 0ms
[82433.205129] init: Sending signal 9 to service 'gpu' (pid 50) process group...
[82433.205814] libprocessgroup: Successfully killed process cgroup uid 1072 pid 50 in 0ms
[82433.205860] init: Sending signal 9 to service 'audioserver' (pid 49) process group...
[82433.205902] libprocessgroup: Successfully killed process cgroup uid 1041 pid 49 in 0ms
[82433.205945] init: Sending signal 9 to service 'ashmemd' (pid 48) process group...
[82433.205986] libprocessgroup: Successfully killed process cgroup uid 9999 pid 48 in 0ms
[82433.206029] init: Sending signal 9 to service 'healthd' (pid 47) process group...
[82433.206068] libprocessgroup: Successfully killed process cgroup uid 0 pid 47 in 0ms
[82433.206111] init: Sending signal 9 to service 'system_suspend' (pid 46) process group...
[82433.206504] libprocessgroup: Successfully killed process cgroup uid 1000 pid 46 in 0ms
[82433.206554] init: Sending signal 9 to service 'hidl_memory' (pid 45) process group...
[82433.206595] libprocessgroup: Successfully killed process cgroup uid 1000 pid 45 in 0ms
[82433.206639] init: Sending signal 9 to service 'vendor.hwcomposer-2-1' (pid 44) process group...
[82433.206679] libprocessgroup: Successfully killed process cgroup uid 1000 pid 44 in 0ms
[82433.206732] init: Sending signal 9 to service 'vendor.drm-clearkey-hal-1-2' (pid 43) process group...
[82433.206773] libprocessgroup: Successfully killed process cgroup uid 1013 pid 43 in 0ms
[82433.206818] init: Sending signal 9 to service 'vendor.configstore-hal' (pid 42) process group...
[82433.206858] libprocessgroup: Successfully killed process cgroup uid 1000 pid 42 in 0ms
[82433.206902] init: Sending signal 9 to service 'vendor.audio-hal-2-0' (pid 41) process group...
[82433.206941] libprocessgroup: Successfully killed process cgroup uid 1041 pid 41 in 0ms
[82433.206986] init: Sending signal 9 to service 'zygote_secondary' (pid 39) process group...
[82433.207026] libprocessgroup: Successfully killed process cgroup uid 0 pid 39 in 0ms
[82433.207069] init: Sending signal 9 to service 'netd' (pid 37) process group...
[82433.207108] libprocessgroup: Successfully killed process cgroup uid 0 pid 37 in 0ms
[82433.244585] vdc: Waited 0ms for vold
[82433.244784] binder: 20866:20866 transaction failed 29189/-22, size 92-0 line 3099
[82433.665955] init: Sending signal 9 to service 'vold' (pid 8) process group...
[82433.666099] libprocessgroup: Successfully killed process cgroup uid 0 pid 8 in 0ms
[82433.666729] init: Sending signal 9 to service 'tombstoned' (pid 70) process group...
[82433.666833] libprocessgroup: Successfully killed process cgroup uid 1058 pid 70 in 0ms
[82433.666917] init: Sending signal 9 to service 'logd' (pid 4) process group...
[82433.666969] libprocessgroup: Successfully killed process cgroup uid 1036 pid 4 in 0ms
[82433.667025] init: sync() before umount...
[82433.676232] init: sync() before umount took9ms
[82433.677029] init: sync() after umount...
[82433.678590] init: sync() after umount took1ms
[82433.779276] init: powerctl_shutdown_time_ms:3621:0
[82433.779344] init: Reboot ending, jumping to kernel
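
To answer "which process was first killed" from a dump like this, one heuristic is to look for the earliest init line saying a service "received signal 9" before init itself had sent SIGKILL to that pid (everything after that is init tearing the container down). A sketch matching only the two init message formats that appear in the log above; it cannot tell you *who* sent the signal.

```python
import re

# Formats taken from the log above:
#   init: Service 'zygote' (pid 38) received signal 9
#   init: Sending signal 9 to service 'zygote' (pid 38) process group...
RECEIVED = re.compile(
    r"\[\s*(\d+\.\d+)\] init: Service '([^']+)' \(pid (\d+)\) received signal 9"
)
SENDING = re.compile(
    r"\[\s*(\d+\.\d+)\] init: Sending signal 9 to service '([^']+)' \(pid (\d+)\)"
)


def sigkill_events(lines):
    """Yield (timestamp, kind, service, pid) for each SIGKILL-related init line."""
    for line in lines:
        for kind, pattern in (("received", RECEIVED), ("sent_by_init", SENDING)):
            m = pattern.search(line)
            if m:
                yield float(m.group(1)), kind, m.group(2), int(m.group(3))


def first_unexplained_kill(lines):
    """Earliest 'received signal 9' for a pid init had not yet SIGKILLed itself."""
    killed_by_init = set()
    for ts, kind, name, pid in sigkill_events(lines):
        if kind == "sent_by_init":
            killed_by_init.add(pid)
        elif pid not in killed_by_init:
            return ts, name, pid
    return None
```

Usage: save `dmesg` to a file, then `with open("dmesg.txt") as f: print(first_unexplained_kill(f))`. On the excerpt above it points at zygote (pid 38) at 82429.811647, which matches the zygote guess.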

Bingo! So SFOS is all to blame for those high values!
First they manually increased the minfree values for 10 II, and then they gave 10 III values even 2x higher!
So those really high values were chosen by Jolla on purpose, disregarding the values Sony chose themselves. From my limited testing I just don’t see why; the phone works better with the values halved.
I think it is worth suggesting that those values be reconsidered, since 2x or 3x lower values work better in our testing. It would even make sense: 2x lower values are the same as on the 10 II and have been well tested.
Edit: right now the 10 III has its limits set 12x higher than Android’s defaults, which is kind of silly
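
The "2x", "6x" and "12x" comparisons are per-level ratios between two minfree lists. A tiny sketch with hypothetical helper names; plug in the real values from the device init files rather than the made-up numbers in any example.

```python
# Hypothetical helpers for the "Nx higher / Nx lower" comparisons in this thread.


def ratios(current, reference):
    """How many times larger each minfree level is than the reference level."""
    return [round(c / r, 2) for c, r in zip(current, reference)]


def scale_levels(levels, factor):
    """Divide every level by `factor`, keeping whole, non-zero page counts."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [max(1, round(level / factor)) for level in levels]
```

Note that `zip` silently stops at the shorter list, which matters when one device defines 4 levels and another 6.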

I’ve just rebooted the phone and the governors of all cores were reset to defaults. So were the minfree values in /sys/module/lowmemorykiller/parameters/minfree. Is it supposed to be like that?

Yup, exactly as expected. That is why by default they are set during init, both the governor and minfree.

OK, thanks. Shell script done to quickly re-set them :wink:
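
The original shell script wasn't posted, so here is a hypothetical Python equivalent of the same idea: after each boot (as root), write the minfree list and set one governor on every core. The governor name and minfree numbers you pass in are placeholders, not recommendations.

```python
import glob

# Hypothetical re-apply helper; must run as root after every boot,
# because both sysfs settings are reset on reboot.
MINFREE_PATH = "/sys/module/lowmemorykiller/parameters/minfree"
GOVERNOR_GLOB = "/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor"


def write_value(path: str, value: str) -> None:
    with open(path, "w") as f:
        f.write(value)


def reapply(minfree_levels, governor, minfree_path=MINFREE_PATH,
            governor_glob=GOVERNOR_GLOB):
    """Write minfree, then the same governor for every CPU; return paths written."""
    written = [minfree_path]
    write_value(minfree_path, ",".join(str(level) for level in minfree_levels))
    for path in sorted(glob.glob(governor_glob)):
        write_value(path, governor)
        written.append(path)
    return written
```

Usage, with placeholder values: `reapply([1536, 2048, 4096, 16384], "schedutil")`. The paths are parameters so the function can be tried against a scratch directory first.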

Oh well, that’s not great news… It was set on purpose, so changing it is no longer in the trivial category. But still, 12x the defaults is quite absurd. It could be due to both daemons running, though… In light of the comments, I have now increased the limits a bit.

There is one more thing I am curious about. When we tune the minfree values for lowmemorykiller, they also affect lmkd on the Android side, because ro.lmk.use_minfree_levels=true. Here is the documentation for it: Low Memory Killer Daemon  |  Android Open Source Project
What I am not sure about is whether it reads the configuration only at boot or continuously. That can change a lot: if we are tuning only one part and not the other, we will get different results than if Jolla changed the defaults and set them at boot.
So I edited my init file (nano /vendor/etc/init/init.lena.rc) and will check whether the behaviour is different from or the same as when changing the values with the system running.

Edit: Yeah, testing an extreme scenario (default Sony values), the behaviour seems to be different when setting them during boot. Previously, with memory almost full, Android apps got closed even while I was in them, and now (set at boot) everything works smoothly. So we need to account for that. Or maybe my testing is not repeatable enough and it is all placebo, idk
Buuuut maybe all we need is to restart Aliendalvik for lmkd to start using the new values?
Oh, and changing the values to the Sony defaults and commenting out adj, so that everything is default, seems to work really well
Edit 2: According to Thaodan, Android’s lmkd doesn’t use minfree values. So I was probably wrong, and you don’t need to restart Aliendalvik when changing values.
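
Since it went back and forth whether lmkd honours minfree, it may help to check what the ro.lmk.* properties are actually set to. On a running Android system `getprop` is the usual way; the sketch below instead parses a build.prop-style `key=value` file. Where that file lives inside the App Support image is not stated in the thread, so the path is left to you.

```python
# Sketch: collect ro.lmk.* entries from build.prop-style "key=value" text.


def lmk_properties(text: str) -> dict[str, str]:
    """Return every ro.lmk.* property, skipping blank lines and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip().startswith("ro.lmk."):
            props[key.strip()] = value.strip()
    return props
```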

So the SFOS commits changing the minfree levels hint that Jolla is worried about SFOS processes being killed (understandably). Thus, they set the minfree levels rather high, probably to get lmkd to kill Android applications before the system OOM killer does.
From this perspective it seems to work quite well: every time I watch a video in my Fennec browser, the system kills Android apps / the whole Android subsystem afterwards.
However, I adjusted the minfree levels on my Xperia 10 II (cut them in half) and restarted the Android subsystem. This definitely improved the stability of Android apps (no more killing while watching a video in the browser), but probably at the cost of some system stability (a higher chance of the OOM killer killing a Jolla process).
Whether the minfree values currently used by Jolla were empirically tested or came from a ‘just make sure it’s enough’ approach is hard to tell. They may have been tested to an extreme I won’t reach in daily usage, so it may be OK for me to change them but not for users who need extreme reliability. Only Jolla can tell us that.

True, the comments in the source code must be there for a reason.

However, my fingerprint sensor still works, mpris-proxy is fine, phone calls work and no applications have been killed after three days of uptime. I set the values to roughly 1/3 of the original values and they work for me. Just halving them worked wonders, too! I know my use case is not the general case, but so far everyone who has tried this has had similar results.

Funny, the reason they set it so high is exactly the opposite! The person writing the commit message wrote lmkd when they meant lowmemorykiller:

Set minfree to 6 times higher values, only then lmkd starts killing native
Sailfish OS apps (otherwise it only kills within the App Support container,
and when there are many native apps, Android apps eventually get killed upon
launch).

I tested it myself. If I set the values 12x lower (Sony defaults), native apps would use so much RAM that when you tried to launch an Android app, it would get killed within 2 seconds of launching, because lmkd would see there is not enough RAM and kill it.
So setting the minfree values higher is actually meant to make lowmemorykiller kill native apps sooner, to leave some room for Android apps, not the other way around! So if you don’t use Android apps, you can set it super low.
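
Why higher minfree means native apps get killed sooner follows from how the in-kernel lowmemorykiller picks victims. Roughly (simplified from the staging driver, ignoring reserved pages, shmem and swap-cache accounting, and any conversion of the adj values the kernel may do): it walks the minfree levels in order, stops at the first level where both free and file-backed pages are below the threshold, and then kills the process with the highest oom_score_adj at or above that level's adj value. All thresholds, adj values and process names below are hypothetical.

```python
# Simplified model of the in-kernel lowmemorykiller decision (staging driver).
# Processes are (name, oom_score_adj, rss_pages) tuples; numbers are made up.


def min_score_adj(free_pages, file_pages, minfree, adj):
    """Return the adj floor of the first level whose threshold is crossed, else None."""
    for threshold, floor in zip(minfree, adj):
        if free_pages < threshold and file_pages < threshold:
            return floor
    return None


def pick_victim(processes, floor):
    """Highest oom_score_adj at or above `floor` wins; larger RSS breaks ties."""
    candidates = [p for p in processes if p[1] >= floor]
    if not candidates:
        return None
    return max(candidates, key=lambda p: (p[1], p[2]))


def decide(free_pages, file_pages, minfree, adj, processes):
    """Process the model would kill for this memory state, or None."""
    floor = min_score_adj(free_pages, file_pages, minfree, adj)
    return None if floor is None else pick_victim(processes, floor)
```

With thresholds `[1000, 2000, 3000]` and adj `[0, 500, 900]`, a state of 2500 free and 2500 file pages only crosses the last level (floor 900). Multiply the thresholds by 6 and the same state already crosses the first level (floor 0), so a process with adj 100 that was safe before becomes a candidate. That is the effect the commit message describes, at least under this simplified model.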

Also, I used the super low Sony values for a few days of uptime and didn’t have a single problem with native services, because a low minfree doesn’t badly affect native apps; it can just cause weird interference with lmkd. And I only managed to replicate that problem once, when my browser was using 30% of memory, with a few tabs and a 2-hour YT video playing.

TL;DR The only reason for the high minfree values is Android apps getting killed too often, and I could only replicate that once. I think the current defaults could be way lower.

My experience is closer to @JacekJagosz’s than to @jenix’s. On my Xperia 10 II, I feel that browser, mail, notes and clock (all native applications) are reaped earlier than WhatsApp and Firefox, which are Android apps.

That is why I actually prefer low values and am voting for them. The experience with Sony values is sooo nice; it really feels like the RAM is so much bigger, because native apps don’t get closed, and Android apps deal with being closed much more gracefully, remembering where you were. The only problem is the bug that can sometimes occur where the RAM is so full that you can’t open Android apps.