Tuning the oom killer / low memory killer

Those do seem almost unreasonably small… Even the largest portion is only 2.1% of the total available RAM! Try increasing the values instead:

3072,4096,8192,32768

That’s 12 MiB, 16 MiB, 32 MiB and 128 MiB, or 0.4%, 0.5%, 1.0% and 4.2% of total RAM respectively.
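
For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes a 4096-byte page size and 3 GiB of total RAM; the RAM figure is an assumption, picked because it reproduces the percentages quoted above, and the function name is only illustrative.

```python
# Convert lowmemorykiller minfree thresholds (given in pages) to MiB and
# to a percentage of total RAM.
# Assumptions: 4096-byte pages and 3 GiB of total RAM (not stated in the thread).

PAGE_SIZE = 4096            # bytes per page (assumed; verify on the device)
TOTAL_RAM_MIB = 3 * 1024    # assumed total RAM in MiB


def describe_minfree(minfree, page_size=PAGE_SIZE, total_mib=TOTAL_RAM_MIB):
    """Return a list of (pages, MiB, percent of RAM) for a minfree string."""
    rows = []
    for field in minfree.strip().split(","):
        pages = int(field)
        mib = pages * page_size / (1024 * 1024)
        rows.append((pages, mib, 100.0 * mib / total_mib))
    return rows


for pages, mib, pct in describe_minfree("3072,4096,8192,32768"):
    print(f"{pages:>6} pages = {mib:6.1f} MiB = {pct:4.1f}% of RAM")
```

Passing a different `total_mib` or `page_size` adapts the same calculation to another device.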

I also suggest setting swappiness to 50, assuming you have around 512 MiB…1 GiB of zram (check it by running zramctl as root).

NAME       ALGORITHM DISKSIZE DATA  COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram3 lzo         139,8M 636K 279,9K  560K       1 [SWAP]
/dev/zram2 lzo         139,8M 652K 287,8K  576K       1 [SWAP]
/dev/zram1 lzo         139,8M 644K 308,6K  600K       1 [SWAP]
/dev/zram0 lzo         139,8M 624K 298,6K  560K       1 [SWAP]

That’s around half a gigabyte in total (4 × 139.8M), so a swappiness of 50 should be fine.
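
The total can be double-checked by summing the DISKSIZE column. A minimal sketch, assuming the output format shown above (a decimal comma, and K/M/G suffixes read as binary units); the helper names are illustrative.

```python
# Sum the DISKSIZE column of `zramctl` output.
# Assumptions: DISKSIZE is the third column, the locale prints a decimal
# comma, and K/M/G are binary (KiB/MiB/GiB) suffixes.

UNITS_IN_MIB = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}

ZRAMCTL_OUTPUT = """\
NAME ALGORITHM DISKSIZE DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram3 lzo 139,8M 636K 279,9K 560K 1 [SWAP]
/dev/zram2 lzo 139,8M 652K 287,8K 576K 1 [SWAP]
/dev/zram1 lzo 139,8M 644K 308,6K 600K 1 [SWAP]
/dev/zram0 lzo 139,8M 624K 298,6K 560K 1 [SWAP]
"""


def size_to_mib(text):
    """Parse a size such as '139,8M' into MiB."""
    number, unit = text[:-1], text[-1].upper()
    return float(number.replace(",", ".")) * UNITS_IN_MIB[unit]


def total_zram_mib(output):
    """Add up DISKSIZE over all device rows, skipping the header line."""
    rows = output.strip().splitlines()[1:]
    return sum(size_to_mib(row.split()[2]) for row in rows)


print(f"Total zram swap: {total_zram_mib(ZRAMCTL_OUTPUT):.1f} MiB")
```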

[root@XperiaXCompact defaultuser]# cat /proc/sys/vm/swappiness
60

So I set it to 50 like you suggested.

Is a restart mandatory after changing these values?
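
As far as I know, values written under /proc/sys/vm take effect immediately, so a restart should not be needed; they are also not kept across a reboot unless made persistent some other way. Below is a small sketch of reading and setting the value as root. The upper bound of 200 applies to newer kernels (older ones only accept up to 100), and the function names and the `path` parameter are illustrative.

```python
# Read and change vm.swappiness through procfs (writing needs root).
# The `path` parameter exists only so the functions can be tried on a
# scratch file; on a device the default path is the real tunable.

SWAPPINESS_PATH = "/proc/sys/vm/swappiness"


def read_swappiness(path=SWAPPINESS_PATH):
    """Return the current swappiness as an integer."""
    with open(path) as handle:
        return int(handle.read().strip())


def write_swappiness(value, path=SWAPPINESS_PATH, max_value=200):
    """Write a new swappiness value after a basic range check.

    max_value=200 matches newer kernels; older kernels only accept 0..100
    and reject larger values themselves.
    """
    if not 0 <= value <= max_value:
        raise ValueError(f"swappiness must be within 0..{max_value}")
    with open(path, "w") as handle:
        handle.write(f"{value}\n")
```

On the device, calling `write_swappiness(50)` and then `read_swappiness()` should report the new value without a reboot.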

According to your first post, the sizes are in pages. Thus you need to make sure that the page size is the same when you compare those values between phones.

That’s the default value; I think that’s a bit too aggressive, try 40-50 instead…

I started getting Browser crashes (they weren’t lmkd kills), so I’m using a third of each of the original values (X10III w/ 6GB):

55720,63400,71080,78760,118830

Indeed. All four devices I have at hand have the same 4096-byte page size (even my desktop PC has it), but other sizes do exist, so it has to be checked.
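
A quick way to check the page size, and to convert thresholds between devices with different page sizes, could look like this. It is only a sketch: the 16 KiB target page size is a hypothetical example, and the function names are illustrative.

```python
import os

# Query the page size of the running system and rescale minfree values
# between devices whose page sizes differ, keeping the byte amounts equal.


def page_size_bytes():
    """Page size of the running system in bytes (Unix only)."""
    return os.sysconf("SC_PAGE_SIZE")


def rescale_minfree(pages, from_page_size, to_page_size):
    """Convert page counts so they describe the same number of bytes."""
    return [count * from_page_size // to_page_size for count in pages]


print("This system uses", page_size_bytes(), "byte pages")
# The same thresholds expressed for a hypothetical 16 KiB-page device:
print(rescale_minfree([3072, 4096, 8192, 32768], 4096, 16384))
```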

Hi, you may be interested in my older blog post on this topic: Sailfish OS and memory :: karry.cz

Just a few corrections:

  • The Linux kernel contains the standard OOM killer, and Sailfish OS additionally ships the non-standard lowmemorykiller kernel module. That one is the guy causing trouble.
  • lmkd is the Android user-space replacement for the low memory killer. On Sailfish OS, it runs in a container and affects only Android applications.

My conclusion is that memory management in Sailfish is broken and the in-kernel lowmemorykiller should be replaced by some smarter user-space alternative: lmkd, systemd or an SFOS-specific implementation…

Combining this with swappiness of

$ cat /proc/sys/vm/swappiness
25

means that the system is likely to start sniping processes before swapping has a chance to step in.

After reading this I am quite sure this is wrong. As I understand it, you could notice a performance impact by changing the swappiness, but it shouldn’t influence the OOM behavior in any way.

Please edit your first post if you come to the same conclusion. (Or share your knowledge with me to convince me otherwise :wink:)

However, I really appreciate that you are digging into this.

Thank you for tackling this problem! It would be great if Jolla adopted your solution in the next release (trusting that your approach is going in the right direction).

Thank you for the suggestions and corrections, @karry and @thigg! I’ll review and update the first post tomorrow.

About lowmemorykiller:

That was quite an interesting read, karry! It looks like you tackled this before me, and with a more technical approach, too!

It is indeed an Android user-space daemon (that much I did get right), but I assumed it was running on the Sailfish OS side covering the whole system, because SFOS runs on top of AOSP and that made perfect sense to me. I should have checked, though. The lmkd process is running only when the Android app support is running, so that indeed is the case.

That means my impression that setting lower values in the minfree file affected e.g. Browser is 100% placebo. I guess the increased overall responsiveness just carried over. Edit: This also means Email and Browser simply vanishing on me is not because of OOM but because of other bugs altogether. Oh dear.

However, this means that there’s no SFOS-side user-space daemon to handle OOM. (Or is there? I found nothing when I searched…) Having two OOM daemons monitoring the same RAM space doesn’t really sound welcoming either. Perhaps something simple, like only using earlyoom SFOS-side with OOM score adjustment for system applications, could work? The in-kernel OOM killer doesn’t get a lot of praise, from what I’ve read…

About swappiness:

Indeed, I didn’t really understand vm.swappiness; it doesn’t mean what I thought it did. After reading the excellent blog post, I think 25 is a good value for the SFOS configuration. I’m going to keep using 50 still, as I simply prefer swapping over killed processes, and changing swappiness from 25 to 50 isn’t that drastic. The value goes up to 200 after all. There’s a lot of inactive stuff to page out, especially with 6GiB of applications loaded.

Then again, in another thread there’s someone with a swappiness of 60, but it’s a community port, and I guess swappiness value just wasn’t ever checked or considered.

Wow, amazing read! Thanks for writing something so informative, I hope someone from Jolla will look at it.
Meanwhile I wonder if we could tweak oom_score_adj in any way? Or the threshold at which mce reports?
I wonder how many apps actually try to use mce’s memory pressure reporting? Does browser do it? It definitely should.
Also wouldn’t increasing swappiness actually help in this situation? With the same apps running and number seen by mce, there is more free RAM left, so apps should be killed a bit later. Also more chance for MCE to report pressure.
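
Before tweaking anything, it may help to see what oom_score_adj values the running processes currently have. This is only an inspection sketch: it reads the per-process oom_score_adj and comm files under /proc, it changes nothing, and the function name and the `proc_root` parameter are illustrative.

```python
import os

# List processes with their oom_score_adj, lowest (most protected) first.
# `proc_root` is a parameter only so the function can be tried on a fake tree.


def list_oom_score_adj(proc_root="/proc"):
    """Return sorted (oom_score_adj, pid, name) tuples for readable processes."""
    result = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score_adj")) as handle:
                adj = int(handle.read().strip())
            with open(os.path.join(base, "comm")) as handle:
                name = handle.read().strip()
        except (OSError, ValueError):
            continue  # process exited or is not readable
        result.append((adj, int(entry), name))
    return sorted(result)
```

Looping over `list_oom_score_adj()` and printing each tuple gives a quick overview of which processes would be picked first.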

The OOM score is changed by Lipstick. Instead of changing it, I would recommend tuning the lowmemorykiller parameters /sys/module/lowmemorykiller/parameters/adj and /sys/module/lowmemorykiller/parameters/minfree. Lipstick’s algorithm is fine IMHO.
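
To see how the two parameters relate, here is a sketch that pairs them up: as I understand the driver, the i-th minfree threshold (in pages) goes with the i-th adj level. The example strings come from the comment in the kernel driver source, not from any particular phone, and the function names are illustrative.

```python
# Pair the lowmemorykiller `adj` and `minfree` parameters.

PARAM_DIR = "/sys/module/lowmemorykiller/parameters"


def pair_levels(adj_text, minfree_text):
    """Return (min_free_pages, oom_score_adj) pairs from the two strings."""
    adjs = [int(v) for v in adj_text.strip().split(",")]
    minfree = [int(v) for v in minfree_text.strip().split(",")]
    # zip() stops at the shorter list, so only levels present in both count.
    return list(zip(minfree, adjs))


def read_levels(param_dir=PARAM_DIR):
    """Read both parameter files from sysfs and pair them up."""
    with open(f"{param_dir}/adj") as handle:
        adj_text = handle.read()
    with open(f"{param_dir}/minfree") as handle:
        minfree_text = handle.read()
    return pair_levels(adj_text, minfree_text)


# Example strings taken from the kernel driver's documentation comment:
for pages, adj in pair_levels("0,8", "1024,4096"):
    print(f"below {pages} free pages: kill processes with oom_score_adj >= {adj}")
```

On a device, `read_levels()` would show the pairs that are currently in effect.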

MCE reporting is useless in its current form. It is based on a property that tells nothing about memory pressure. (But I may be wrong.)

Yes, the browser listens to it. It “closes” tabs in the background when memory pressure is reported. Another component is Qt webview, I believe. And OSM Scout maps :wink:

Not sure. It is a very fuzzy tunable.

As I understand it, swappiness does not simply make the system swap more or less; it sets the balance between reclaiming file-backed pages and swapping out non-file-backed (anonymous) data, with higher values leaning towards swapping. In that case, this parameter is a horribly complicated heuristic. And it doesn’t influence the OOM killer in any way.

Are you sure? If you observe different behaviour over a long period of time, it seems there is some relation, even if only for indirect reasons. Are you still testing with your adjusted values? I have a 10ii which is largely a paperweight because it is much less performant than the Volla Phone, so I could do some testing…

If I understood correctly, everything you did was actually correct. The minfree parameter affects lowmemorykiller, which is system-wide and affects native apps. So it wasn’t just placebo.
The Android-compatibility-only lmkd actually has different tunables, and we haven’t touched them yet!
So tuning minfree is exactly what we want.
@karry Do you think it would be a good idea to create a separate bug report about changing the behaviour of mce, based on your findings? Or maybe discussing it during a community meeting?

Are you guys using a lot of Android apps? I have been using the Xperia 10 III since day two of official support and have yet to experience all apps closing. From what I can tell, all apps disappearing means Lipstick was restarted (for example if an app leaks memory and eats up all RAM; some GIFs/WebMs would cause that, but that bug was fixed some time ago). OOM leaves greyed-out app covers, and that has only happened to me once, when using an Android app, never yet with native SFOS apps. Maybe it’s a popular app that’s misbehaving and causing a Lipstick restart, and not an OOM issue? Asking ‘what apps did you have open when they all disappeared?’ could give a hint.

So, lmkd isn’t the process name for lowmemorykiller, but they are different things altogether? How not confusing at all to an outside tinkerer who likes to assume things :smiley: So, lowmemorykiller seems to be an AOSP (and Android?) kernel thing, while lmkd runs inside the Android AppSupport container.

If that is the case, tinkering with minfree indeed does exactly what I assumed it did: Browser and Email no longer getting killed was not placebo! But then, if it really affects the whole system, it’s responsible for killing Android software too. Perhaps it even took some DNS or routing thingamajig with it, causing a plethora of other issues?

(I’m not editing the opening post just yet, things seem to move at quite the pace now!)

Yes actually, when there are no good native alternatives. Lipstick crashing is another thing altogether (unless it was killed by oom, ha) and that is luckily very rare for me to run into.

Grayed-out application covers are pretty much the primary thing I’m trying to get rid of here, no matter if it’s an Android app or a native one. The second annoyance is Bluetooth media controls and the fingerprint sensor not working. Both of those have been non-issues for the last two days of semi-intensive testing for me, so it does look good…! But more testing is needed. I want to rule out placebo, and I’d like to see others’ test results, too.

I wonder who is actually defining minfree values. I see older devices have it defined by Sony, like XA2: https://github.com/sonyxperiadev/device-sony-nile/blob/8b8acd6d486cfe4e94bc2206137414d784099572/rootdir/vendor/etc/init/init.nile.rc#L61
Meanwhile, the 10 III has no such definition, so maybe it is using AOSP’s defaults, and that is why the values are so high?
Oh, and an interesting note is that all devices have this line: ro.lmk.use_minfree_levels=true, which probably means that when we change minfree we are also affecting lmkd… It is a lot.

The missing definition would indeed explain this nonsense! That’s worth a bug report of its own, since this is more of a general thread. Would you mind doing it? It seems you have a better understanding of this than I do.

Oh, nice! If lmkd is configured to use the same limits, that means minfree really affects both SFOS and Android apps! So no placebo!

I reported it on Sony’s bugtracker, hope someone will actually respond…

Actually, 4 values are a default, in theory! But then for XA2 series there should be 6 of them, because that is what Sony has defined… AAAA, it all makes no sense!