Tuning the oom killer / low memory killer

The OOM score is set by Lipstick. Instead of changing it, I would recommend tuning the lowmemorykiller parameters /sys/module/lowmemorykiller/parameters/adj and /sys/module/lowmemorykiller/parameters/minfree. Lipstick’s algorithm is fine IMHO.
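
For anyone wondering how those two files relate, here is a minimal Python sketch. It assumes both files hold comma-separated integers, that minfree is counted in pages, and that pages are 4 KiB; the helper names are made up for illustration, so treat it as a reading aid rather than a reference.

```python
from pathlib import Path

# Paths as mentioned in the post above.
PARAM_DIR = Path("/sys/module/lowmemorykiller/parameters")
PAGE_SIZE_KIB = 4  # assumption: 4 KiB pages


def parse_levels(text):
    """Turn a comma-separated sysfs value such as '1,2,3' into a list of ints."""
    return [int(field) for field in text.strip().split(",") if field]


def describe_levels(adj_text, minfree_text, page_kib=PAGE_SIZE_KIB):
    """Pair each adj level with its minfree threshold, converted to MiB."""
    adj = parse_levels(adj_text)
    minfree = parse_levels(minfree_text)
    return [(a, pages * page_kib / 1024) for a, pages in zip(adj, minfree)]


def read_current_levels(param_dir=PARAM_DIR):
    """Read both files from the device (hypothetical helper, not called here)."""
    param_dir = Path(param_dir)
    return describe_levels(
        (param_dir / "adj").read_text(),
        (param_dir / "minfree").read_text(),
    )
```

On a device, `read_current_levels()` would return pairs like (adj level, threshold in MiB): roughly, processes at or above that adj level become kill candidates once free memory drops below the paired threshold.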

MCE reporting is useless in its current form. It is based on a property that tells nothing about memory pressure. (But I may be wrong.)

Yes, the browser is listening to it. It “closes” tabs in the background when memory pressure is reported. Another component is Qt WebView, I believe. And OSM Scout maps :wink:

Not sure. It is a very fuzzy tunable.

As I understand it, increasing swappiness does not cause the system to swap more, but rather to favor “swapping” files instead of non-file-backed data. In that case, this parameter is a horribly complicated heuristic. And it doesn’t influence the OOM killer in any way.

Are you sure? It seems that, even if only for indirect reasons, there must be some relation if you observe different behaviour over a long period of time. Are you still testing with your adjusted values? I have a 10 II, which is largely a paperweight because it performs much worse than the Volla Phone, so I could do some testing…

If I understood correctly, everything you did was actually correct. The minfree parameter affects lowmemorykiller, which is system-wide and affects native apps. So it wasn’t just placebo.
The Android-compatibility-only lmkd actually has different tunables, and we haven’t touched them yet!
So tuning minfree is exactly what we want.
@karry Do you think it would be a good idea to create a separate bug report about changing the behaviour of mce, based on your findings? Or maybe discussing it during a community meeting?

Are you guys using a lot of Android apps? I have been using the Xperia 10 III since day two of official support and have yet to experience all apps closing. From what I can tell, all apps disappearing means Lipstick was restarted (for example, if an app leaks memory and eats up all the RAM; some GIFs/WebMs would cause that, but that bug was fixed some time ago). OOM leaves greyed-out app covers, and that has only happened to me once, when using an Android app, never yet with native SFOS apps. Maybe it’s a popular app that’s misbehaving and causing a Lipstick restart, and not an OOM issue? Asking ‘what apps did you have open when they all disappeared’ could give a hint.

So, lmkd isn’t the process name for lowmemorykiller, but they are different things altogether? How not confusing at all to an outside tinkerer who likes to assume things :smiley: So, lowmemorykiller seems to be an AOSP (and Android?) kernel thing, while lmkd runs inside the Android AppSupport container.

If that is the case, then tinkering with minfree indeed does exactly what I assumed it did, i.e. Browser and Email no longer being killed was not placebo! But then, if it really affects the whole system, it’s responsible for killing Android software too. Perhaps it even took some DNS or routing thingamajig with it, causing a plethora of other issues?

(I’m not editing the opening post just yet, things seem to move at quite the pace now!)

Yes actually, when there are no good native alternatives. Lipstick crashing is another thing altogether (unless it was killed by oom, ha) and that is luckily very rare for me to run into.

Grayed-out application covers are pretty much the primary thing I’m trying to get rid of here, no matter if it’s an Android app or a native one. The second annoyance is Bluetooth media controls and the fingerprint sensor not working. Both of those have been non-issues for the last two days of semi-intensive testing for me, so it does look good…! But more testing is needed. I want to rule out placebo, and I want others’ test results, too.

I wonder who is actually defining minfree values. I see older devices have it defined by Sony, like XA2: https://github.com/sonyxperiadev/device-sony-nile/blob/8b8acd6d486cfe4e94bc2206137414d784099572/rootdir/vendor/etc/init/init.nile.rc#L61
Meanwhile 10 III has no such definition, so maybe it is using AOSP’s defaults, and that is why it is so high?
Oh, and an interesting note is that all devices have this line: ro.lmk.use_minfree_levels=true, which probably means that when we change minfree we are also affecting lmkd… It is a lot.

The missing definition would indeed explain this nonsense! That’s worth a bug report of its own; this is just a general thread. Would you mind doing it? It seems you have a better understanding of this than I do.

Oh, nice! If lmkd is configured to use the same limits, that means minfree really affects both SFOS and Android apps! So no placebo!

I reported it on Sony’s bugtracker, hope someone will actually respond…

Actually, 4 values are a default, in theory! But then for XA2 series there should be 6 of them, because that is what Sony has defined… AAAA, it all makes no sense!

I meant a bug report here in FSO, but that’s even better actually :stuck_out_tongue_winking_eye:

Four values with an array of size 6? Uh. Oh, it seems to be fine…? AAAA, indeed!

Created an issue for MCE’s improper reporting of memory pressure, as it is a separate issue but kind of related: MCE is using wrong metric to properly measure RAM usage
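
As a rough illustration of what a pressure-related figure could look like, here is a small Python sketch that reads MemAvailable and MemTotal from /proc/meminfo. To be clear, this is only my assumption of a more telling number; it is not what MCE does today, and not necessarily what the issue proposes. The function names are invented for the example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a 'Name: value' shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_fraction(meminfo):
    """Share of total RAM the kernel estimates is available without swapping."""
    return meminfo["MemAvailable"] / meminfo["MemTotal"]


def read_available_fraction(path="/proc/meminfo"):
    """Convenience wrapper for a live system (not called in this sketch)."""
    with open(path) as handle:
        return available_fraction(parse_meminfo(handle.read()))
```

A low value from `read_available_fraction()` would hint at pressure, though where to draw the line is a separate tuning question.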

I have no clue how I missed this until now, but the Kernel documentation really breaks down swappiness for me. And it contains this:

For in-memory swap, like zram or zswap, as well as hybrid setups that have swap on faster devices than the filesystem, values beyond 100 can be considered.

So, yeah, moving from 25 to 50 isn’t that radical :smiley:
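
The same kernel documentation gives a worked example: if random IO against the swap device is on average 2x faster than filesystem IO, swappiness should be 133 (x + 2x = 200). A tiny Python sketch of that arithmetic follows. The function name is mine, and it assumes a kernel where swappiness ranges from 0 to 200 as in that documentation; older kernels cap it at 100.

```python
def swappiness_for_speed_ratio(swap_speedup):
    """Suggest a swappiness value from a relative IO speed.

    swap_speedup: how many times faster random IO to the swap device is
    compared with IO from the filesystem (1 means equally fast).
    Follows the 'x + ratio * x = 200' example from the kernel documentation.
    """
    if swap_speedup <= 0:
        raise ValueError("swap_speedup must be positive")
    return round(200 * swap_speedup / (1 + swap_speedup))
```

With equal speeds this gives 100, and with swap twice as fast it gives 133, so for zram the documented reasoning points well above 50.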

I can no longer find lowmemorykiller.c in the kernel for the X10 III. So my theory is that Sony is no longer configuring it, because maybe the newest Android versions got rid of it in favor of lmkd?
SFOS is still using it, but didn’t add the configuration that was present on older devices.
If that theory is correct, then it is probably an omission on the SFOS side. But I am waiting for a response on my bug; maybe someone more knowledgeable can answer there.

I finally hit oom with my setup (on purpose). I’m not quite sure what the process was (zygote perhaps?) but it took the Android AppSupport with it… Every Android app ended up grayed out, and once I clicked Deezer, a notification stated that it was getting started.

No application was in the foreground; I had swiped to the app cover view, locked the phone and put it in my pocket. Deezer was playing music via Bluetooth though. Here’s the list of applications I had open:

  • Deezer (playing music, via Bluetooth)
  • WhatsApp
  • Browser (5 tabs)
  • Fennec (4 tabs)
  • Whisperfish
  • Email
  • Hydrogen
  • ToeTerm (with dbus logger)
  • ToeTerm (nothing running)
  • Clock
  • Settings (on Bluetooth page)
  • Nettiradio (not playing)
  • Föli (local bus ticket and routing app)
  • Slack
  • Battery Buddy

That’s quite a load, I’d say! All was good and dandy until I opened Deezer, and let it play for some 10 minutes. Before opening Deezer, I actually did a furious switching test between all apps a few rounds, and WhatsApp, Slack and Föli had some 1-2s delay after clicking the app cover, presumably because parts of them were swapped out. No other issues, however. I’m quite impressed - ignoring the fact that it decided to kill the whole Android subsystem, and me not being able to figure out which process was first killed…

The logs show a lot going on, but nothing blatantly obvious to me… There are no “expected” oom messages, but here’s the part I think is relevant (full dmesg output):

[82429.419823] binder: 21166 RLIMIT_NICE not set
[82429.447497] binder: 22505 RLIMIT_NICE not set
[82429.658595] sysrq: Show Blocked State
[82429.658631]   task                        PC stack   pid father
[82429.658852] hdcp_2x         D    0   517      2 0x00000008
[82429.658864] Call trace:
[82429.658882]  __switch_to+0x114/0x120
[82429.658894]  __schedule+0xb50/0xdcc
[82429.658903]  schedule+0x70/0x90
[82429.658914]  sde_hdcp_2x_main+0x8c/0x794
[82429.658923]  kthread+0x13c/0x15c
[82429.658932]  ret_from_fork+0x10/0x18
[82429.658939] dp_hdcp2p2      D    0   518      2 0x00000008
[82429.658948] Call trace:
[82429.658957]  __switch_to+0x114/0x120
[82429.658965]  __schedule+0xb50/0xdcc
[82429.658972]  schedule+0x70/0x90
[82429.658983]  dp_hdcp2p2_main+0x9c/0x5b4
[82429.658990]  kthread+0x13c/0x15c
[82429.658998]  ret_from_fork+0x10/0x18
[82429.662776] sysrq: Show backtrace of all active CPUs
[82429.785985] binder: release 21138:21330 transaction 13261867 out, still active
[82429.785991] binder: undelivered TRANSACTION_COMPLETE
[82429.788700] binder_thread_write: 19 callbacks suppressed
[82429.788705] binder: 21077:21077 BC_DEAD_BINDER_DONE 0000000000000003 not found
[82429.791473] binder: 5754:5754 BC_DEAD_BINDER_DONE 0000000000000001 not found
[82429.792698] binder: 18804:18804 BC_DEAD_BINDER_DONE 0000000000000001 not found
[82429.797283] binder: 20992:20992 BC_DEAD_BINDER_DONE 0000000000000001 not found
[82429.799330] binder: 21077:21077 BC_DEAD_BINDER_DONE 0000000000000001 not found
[82429.800691] binder: 5754:5754 BC_DEAD_BINDER_DONE 0000000000000004 not found
[82429.801869] binder: 5754:5754 BC_DEAD_BINDER_DONE 0000000000000005 not found
[82429.802557] binder: 21077:21077 BC_DEAD_BINDER_DONE 0000000000000002 not found
[82429.804620] binder: 5754:5754 BC_DEAD_BINDER_DONE 0000000000000002 not found
[82429.811647] init: Service 'zygote' (pid 38) received signal 9
[82429.847344] binder: 20907:21059 transaction failed 29189/-22, size 116-8 line 3099
[82429.986802] binder: 21077:21077 BC_DEAD_BINDER_DONE 0000000000000002 not found
[82430.028784] init: Sending signal 9 to service 'zygote' (pid 38) process group...
[82430.103591] libprocessgroup: Successfully killed process cgroup uid 0 pid 38 in 74ms
[82430.108014] binder: 20921:21452 transaction failed 29189/-22, size 64-0 line 3099
[82430.154282] init: Command 'write /sys/power/state on' action=onrestart (<Service 'zygote' onrestart>:2) took 0ms and failed: Unable to write to file '/sys/power/state': open() failed: Read-only file system
[82430.154336] init: Sending signal 9 to service 'audioserver' (pid 49) process group...
[82430.154419] libprocessgroup: Successfully killed process cgroup uid 1041 pid 49 in 0ms
[82430.154503] init: Sending signal 9 to service 'cameraserver' (pid 56) process group...
[82430.154560] libprocessgroup: Successfully killed process cgroup uid 1047 pid 56 in 0ms
[82430.154616] init: Sending signal 9 to service 'media' (pid 65) process group...
[82430.154658] libprocessgroup: Successfully killed process cgroup uid 1013 pid 65 in 0ms
[82430.154708] init: Sending signal 9 to service 'netd' (pid 37) process group...
[82430.154750] libprocessgroup: Successfully killed process cgroup uid 0 pid 37 in 0ms
[82430.156207] init: Could not restart 'wificond': Cannot find '/system/bin/wificond_ALIEN_DISABLED': No such file or directory
[82430.157226] init: Received sys.powerctl='shutdown' from pid: 1 (/system/bin/init)
[82430.157394] init: Clear action queue and start shutdown trigger
[82430.157474] init: processing action (shutdown_done) from (<Builtin Action>:0)
[82430.157487] init: Reboot start, reason: shutdown, rebootTarget: 
[82430.172498] init: Shutdown timeout: 6000 ms
[82430.172847] init: starting service 'blank_screen'...
[82430.174208] libprocessgroup: Failed to make and chown /acct/uid_1000: Read-only file system
[82430.174268] init: createProcessGroup(1000, 2030) failed for service 'blank_screen': Read-only file system
[82430.174558] init: Could not start shutdown critical service 'hwservicemanager': Cannot find '/system/bin/hwservicemanager_ALIEN_DISABLED': No such file or directory
[82430.174618] init: terminating init services
[82430.174669] init: Sending signal 15 to service 'vendor.thermal-hal-1-0_ALIEN' (pid 79) process group...
[82430.174786] init: Sending signal 15 to service 'vendor.drm-hal-1-0_ALIEN' (pid 78) process group...
[82430.174863] init: Sending signal 15 to service 'gatekeeperd' (pid 69) process group...
[82430.174941] init: Sending signal 15 to service 'media.swcodec' (pid 68) process group...
[82430.175009] init: Sending signal 15 to service 'storaged' (pid 67) process group...
[82430.175075] init: Sending signal 15 to service 'statsd' (pid 66) process group...
[82430.175211] init: Sending signal 15 to service 'media' (pid 65) process group...
[82430.175279] init: Sending signal 15 to service 'mediametrics' (pid 64) process group...
[82430.175345] init: Sending signal 15 to service 'mediaextractor' (pid 63) process group...
[82430.175416] init: Sending signal 15 to service 'mediadrm' (pid 62) process group...
[82430.175496] init: Sending signal 15 to service 'keystore' (pid 61) process group...
[82430.175568] init: Sending signal 15 to service 'installd' (pid 60) process group...
[82430.175632] init: Sending signal 15 to service 'incidentd' (pid 59) process group...
[82430.175701] init: Sending signal 15 to service 'idmap2d' (pid 58) process group...
[82430.175766] init: Sending signal 15 to service 'drm' (pid 57) process group...
[82430.175837] init: Sending signal 15 to service 'cameraserver' (pid 56) process group...
[82430.175902] init: Sending signal 15 to service 'mediacodec_ALIEN' (pid 54) process group...
[82430.175967] init: Sending signal 15 to service 'lmkd' (pid 51) process group...
[82430.176036] init: Sending signal 15 to service 'gpu' (pid 50) process group...
[82430.176105] init: Sending signal 15 to service 'audioserver' (pid 49) process group...
[82430.176234] init: Sending signal 15 to service 'ashmemd' (pid 48) process group...
[82430.176314] init: Sending signal 15 to service 'healthd' (pid 47) process group...
[82430.176385] init: Sending signal 15 to service 'system_suspend' (pid 46) process group...
[82430.176455] init: Sending signal 15 to service 'hidl_memory' (pid 45) process group...
[82430.176520] init: Sending signal 15 to service 'vendor.hwcomposer-2-1' (pid 44) process group...
[82430.176586] init: Sending signal 15 to service 'vendor.drm-clearkey-hal-1-2' (pid 43) process group...
[82430.176653] init: Sending signal 15 to service 'vendor.configstore-hal' (pid 42) process group...
[82430.176728] init: Sending signal 15 to service 'vendor.audio-hal-2-0' (pid 41) process group...
[82430.176794] init: Sending signal 15 to service 'zygote_secondary' (pid 39) process group...
[82430.176877] init: Sending signal 15 to service 'netd' (pid 37) process group...
[82430.227320] init: Service 'blank_screen' (pid 2030) exited with status 255
[82430.684057] binder: 21483:21483 transaction failed 29189/-22, size 80-0 line 3099
[82430.715209] binder: 21483:21483 transaction failed 29189/-22, size 2500-8 line 3099
[82430.779603] init: Untracked pid 476 received signal 9
[82433.196284] init: Terminating running services took 3038ms with remaining services:30
[82433.196355] init: Sending signal 9 to service 'vendor.thermal-hal-1-0_ALIEN' (pid 79) process group...
[82433.196587] libprocessgroup: Successfully killed process cgroup uid 1000 pid 79 in 0ms
[82433.196743] init: Sending signal 9 to service 'vendor.drm-hal-1-0_ALIEN' (pid 78) process group...
[82433.196851] libprocessgroup: Successfully killed process cgroup uid 1013 pid 78 in 0ms
[82433.196972] init: Sending signal 9 to service 'gatekeeperd' (pid 69) process group...
[82433.197078] libprocessgroup: Successfully killed process cgroup uid 1000 pid 69 in 0ms
[82433.198267] init: Sending signal 9 to service 'media.swcodec' (pid 68) process group...
[82433.198389] libprocessgroup: Successfully killed process cgroup uid 1046 pid 68 in 0ms
[82433.198508] init: Sending signal 9 to service 'storaged' (pid 67) process group...
[82433.198613] libprocessgroup: Successfully killed process cgroup uid 0 pid 67 in 0ms
[82433.198729] init: Sending signal 9 to service 'statsd' (pid 66) process group...
[82433.198839] libprocessgroup: Successfully killed process cgroup uid 1066 pid 66 in 0ms
[82433.198952] init: Sending signal 9 to service 'media' (pid 65) process group...
[82433.199057] libprocessgroup: Successfully killed process cgroup uid 1013 pid 65 in 0ms
[82433.200069] init: Sending signal 9 to service 'mediametrics' (pid 64) process group...
[82433.200480] libprocessgroup: Successfully killed process cgroup uid 1013 pid 64 in 0ms
[82433.200604] init: Sending signal 9 to service 'mediaextractor' (pid 63) process group...
[82433.200713] libprocessgroup: Successfully killed process cgroup uid 1040 pid 63 in 0ms
[82433.200828] init: Sending signal 9 to service 'mediadrm' (pid 62) process group...
[82433.200933] libprocessgroup: Successfully killed process cgroup uid 1013 pid 62 in 0ms
[82433.201047] init: Sending signal 9 to service 'keystore' (pid 61) process group...
[82433.201851] libprocessgroup: Successfully killed process cgroup uid 1017 pid 61 in 0ms
[82433.201985] init: Sending signal 9 to service 'installd' (pid 60) process group...
[82433.202097] libprocessgroup: Successfully killed process cgroup uid 0 pid 60 in 0ms
[82433.202696] init: Sending signal 9 to service 'incidentd' (pid 59) process group...
[82433.202816] libprocessgroup: Successfully killed process cgroup uid 1067 pid 59 in 0ms
[82433.202931] init: Sending signal 9 to service 'idmap2d' (pid 58) process group...
[82433.203036] libprocessgroup: Successfully killed process cgroup uid 1000 pid 58 in 0ms
[82433.203725] init: Sending signal 9 to service 'drm' (pid 57) process group...
[82433.204089] libprocessgroup: Successfully killed process cgroup uid 1019 pid 57 in 0ms
[82433.204459] init: Sending signal 9 to service 'cameraserver' (pid 56) process group...
[82433.204573] libprocessgroup: Successfully killed process cgroup uid 1047 pid 56 in 0ms
[82433.204688] init: Sending signal 9 to service 'mediacodec_ALIEN' (pid 54) process group...
[82433.204795] libprocessgroup: Successfully killed process cgroup uid 1046 pid 54 in 0ms
[82433.204908] init: Sending signal 9 to service 'lmkd' (pid 51) process group...
[82433.205012] libprocessgroup: Successfully killed process cgroup uid 1069 pid 51 in 0ms
[82433.205129] init: Sending signal 9 to service 'gpu' (pid 50) process group...
[82433.205814] libprocessgroup: Successfully killed process cgroup uid 1072 pid 50 in 0ms
[82433.205860] init: Sending signal 9 to service 'audioserver' (pid 49) process group...
[82433.205902] libprocessgroup: Successfully killed process cgroup uid 1041 pid 49 in 0ms
[82433.205945] init: Sending signal 9 to service 'ashmemd' (pid 48) process group...
[82433.205986] libprocessgroup: Successfully killed process cgroup uid 9999 pid 48 in 0ms
[82433.206029] init: Sending signal 9 to service 'healthd' (pid 47) process group...
[82433.206068] libprocessgroup: Successfully killed process cgroup uid 0 pid 47 in 0ms
[82433.206111] init: Sending signal 9 to service 'system_suspend' (pid 46) process group...
[82433.206504] libprocessgroup: Successfully killed process cgroup uid 1000 pid 46 in 0ms
[82433.206554] init: Sending signal 9 to service 'hidl_memory' (pid 45) process group...
[82433.206595] libprocessgroup: Successfully killed process cgroup uid 1000 pid 45 in 0ms
[82433.206639] init: Sending signal 9 to service 'vendor.hwcomposer-2-1' (pid 44) process group...
[82433.206679] libprocessgroup: Successfully killed process cgroup uid 1000 pid 44 in 0ms
[82433.206732] init: Sending signal 9 to service 'vendor.drm-clearkey-hal-1-2' (pid 43) process group...
[82433.206773] libprocessgroup: Successfully killed process cgroup uid 1013 pid 43 in 0ms
[82433.206818] init: Sending signal 9 to service 'vendor.configstore-hal' (pid 42) process group...
[82433.206858] libprocessgroup: Successfully killed process cgroup uid 1000 pid 42 in 0ms
[82433.206902] init: Sending signal 9 to service 'vendor.audio-hal-2-0' (pid 41) process group...
[82433.206941] libprocessgroup: Successfully killed process cgroup uid 1041 pid 41 in 0ms
[82433.206986] init: Sending signal 9 to service 'zygote_secondary' (pid 39) process group...
[82433.207026] libprocessgroup: Successfully killed process cgroup uid 0 pid 39 in 0ms
[82433.207069] init: Sending signal 9 to service 'netd' (pid 37) process group...
[82433.207108] libprocessgroup: Successfully killed process cgroup uid 0 pid 37 in 0ms
[82433.244585] vdc: Waited 0ms for vold
[82433.244784] binder: 20866:20866 transaction failed 29189/-22, size 92-0 line 3099
[82433.665955] init: Sending signal 9 to service 'vold' (pid 8) process group...
[82433.666099] libprocessgroup: Successfully killed process cgroup uid 0 pid 8 in 0ms
[82433.666729] init: Sending signal 9 to service 'tombstoned' (pid 70) process group...
[82433.666833] libprocessgroup: Successfully killed process cgroup uid 1058 pid 70 in 0ms
[82433.666917] init: Sending signal 9 to service 'logd' (pid 4) process group...
[82433.666969] libprocessgroup: Successfully killed process cgroup uid 1036 pid 4 in 0ms
[82433.667025] init: sync() before umount...
[82433.676232] init: sync() before umount took9ms
[82433.677029] init: sync() after umount...
[82433.678590] init: sync() after umount took1ms
[82433.779276] init: powerctl_shutdown_time_ms:3621:0
[82433.779344] init: Reboot ending, jumping to kernel
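
To narrow down which service got hit first in a dump like the one above, a small Python sketch can scan for init's "received signal" lines and pick the earliest one. The regular expression is written against the line format seen in this log only, and the function name is made up. Note that it shows which init-tracked service received the signal, not who sent it.

```python
import re

# Matches lines such as:
# [82429.811647] init: Service 'zygote' (pid 38) received signal 9
SIGNAL_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+init: Service '(?P<name>[^']+)' "
    r"\(pid (?P<pid>\d+)\) received signal (?P<sig>\d+)"
)


def first_killed_service(lines, signal=9):
    """Return (timestamp, service name, pid) of the earliest match, or None."""
    hits = []
    for line in lines:
        match = SIGNAL_RE.match(line.strip())
        if match and int(match.group("sig")) == signal:
            hits.append(
                (float(match.group("ts")), match.group("name"), int(match.group("pid")))
            )
    return min(hits) if hits else None
```

Fed with the lines above, this points at zygote, which fits the guess in the post; what actually delivered the SIGKILL still has to be found elsewhere.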

Bingo! So SFOS is all to blame for those high values!
First they manually increased the minfree values for 10 II, and then they gave 10 III values even 2x higher!
So those really high values were chosen by Jolla on purpose, disregarding the values Sony chose themselves. From my limited testing I just don’t see why: the phone works better with the values halved.
I think it is worth suggesting that those values be reconsidered, since 2x or 3x lower values work better in our testing. It would even make sense, as 2x lower values are the same as on the 10 II and have been well tested.
Edit: Right now the 10 III has limits set 12x higher than the Android defaults, which is kind of silly.

I’ve just rebooted the phone and governors of all cores were reset to defaults. And so were minfree values in /sys/module/lowmemorykiller/parameters/minfree. Is it supposed to be like that?

Yup, exactly as expected. That is why by default they are set during init, both the governor and minfree.

OK, thanks. Shell script done to quickly re-set them :wink:
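
For reference, the same idea as a minimal Python sketch instead of a shell script. It assumes it runs as root and that the file accepts a comma-separated list of page counts; the helper names and any numbers you pass in are placeholders, not recommended values.

```python
from pathlib import Path

# Path as mentioned earlier in the thread.
MINFREE_PATH = Path("/sys/module/lowmemorykiller/parameters/minfree")


def format_minfree(pages):
    """Render page counts as the comma-separated string the sysfs file holds."""
    pages = [int(p) for p in pages]
    # Own sanity check for this sketch: refuse empty or non-positive input.
    if not pages or any(p <= 0 for p in pages):
        raise ValueError("minfree needs at least one positive page count")
    return ",".join(str(p) for p in pages)


def apply_minfree(pages, path=MINFREE_PATH):
    """Write the values and read them back so a silent failure is noticed."""
    path = Path(path)
    path.write_text(format_minfree(pages))
    return path.read_text().strip()
```

Calling `apply_minfree([...])` after each boot restores the chosen thresholds until the next reboot; the `path` argument exists so the helper can be tried against an ordinary file first.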

Oh well, that’s not great news… It was set on purpose, so changing it falls out of the trivial category. But still, 12x the defaults is quite absurd. It could be the double daemons running though… In favor of the comments, I now increased the limits a bit.

There is one more thing I am curious about. When we are tuning minfree values for lowmemorykiller, they are also affecting lmkd on Android side, because ro.lmk.use_minfree_levels=true. Here is documentation for it: Low Memory Killer Daemon  |  Android Open Source Project
What I am not sure about is if it reads configuration at boot, or all the time. That can change a lot, because if we are tuning only one part and not the other, then we will get different results than if Jolla changed the defaults and set them at boot.
So I edited my init file (nano /vendor/etc/init/init.lena.rc) and I will try to check whether the behaviour differs from changing the values with the system running.

Edit: Yeah, testing an extreme scenario (the default Sony values), the behaviour seems to be different when setting them during boot. Previously, with memory almost full, Android apps got closed even when I was in them, and now (with the values set at boot) everything works smoothly. So we need to account for that. Or maybe my testing is not repeatable enough and it is all placebo, idk.
Buuuut maybe all we need is to restart Aliendalvik for lmkd to start using the new values?
Oh, and changing the values to the Sony defaults and commenting out adj, so that it is all default, seems to be working really well.
Edit 2: According to Thaodan, Android’s lmkd doesn’t use the minfree values. So I was probably wrong, and you don’t need to restart Aliendalvik when changing values.