Dysfunctional timecounter during power saving causes network (TCP/IP) trouble on XA2

REPRODUCIBILITY (% or how often): 100%
BUILD ID = OS VERSION (Settings > About product): Suomenlinna 4.3.0.12
HARDWARE (XA2, X10, X10 II, …): XA2
UI LANGUAGE: en-us
REGRESSION: (compared to previous public release: Yes, No, ?): ?

DESCRIPTION:

SSH connections to the XA2 are annoyingly choppy, and any network-related action seems to suffer from overload situations.

PRECONDITIONS:

XA2 flashed to Suomenlinna; WiFi infrastructure known to work very well with countless other devices, with a very strong S/N ratio.

STEPS TO REPRODUCE:

  1. Power up, connect to WiFi, and make sure to interact with touchscreen to avoid entering any power saving mode for the next minute!
  2. devel-su

date; time ping -4 -c 5 -i 1 google.com ; sleep 61 ; date; time ping -4 -c 5 -i 1 google.com

  3. Wait a minute and note that the result doesn’t show any unexpected numbers.
  4. Wait again, this time without interacting, until the display goes dark.
  5. Repeat exactly the same command as after step 2.
  6. After waiting some minutes, wake up the phone.
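
The steps above can be wrapped in a small helper that reports how far a sleep overshoots its requested duration. This is only a sketch: the function names `overshoot` and `measure_sleep` are made up for illustration, and because it relies on `date +%s` the resolution is one second.

```shell
# overshoot START END REQUESTED: seconds by which the wait exceeded REQUESTED.
overshoot() {
    echo $(( $2 - $1 - $3 ))
}

# measure_sleep SECONDS: sleep, then compare wall-clock time with the request.
# (Hypothetical helper, not part of any Sailfish tool.)
measure_sleep() {
    start=$(date +%s)
    sleep "$1"
    end=$(date +%s)
    echo "requested=${1}s elapsed=$(( end - start ))s overshoot=$(overshoot "$start" "$end" "$1")s"
}
```

Running `measure_sleep 61` once with the display on and once after it has gone dark should make the difference visible: on an affected device the second run would be expected to report a large overshoot.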

EXPECTED RESULT:

Mon Nov 15 11:30:45 CET 2021
PING google.com (142.250.185.78): 56 data bytes
64 bytes from 142.250.185.78: seq=0 ttl=64 time=1.278 ms
64 bytes from 142.250.185.78: seq=1 ttl=64 time=19.334 ms
64 bytes from 142.250.185.78: seq=2 ttl=64 time=20.637 ms
64 bytes from 142.250.185.78: seq=3 ttl=64 time=1.600 ms
64 bytes from 142.250.185.78: seq=4 ttl=64 time=19.146 ms

--- google.com ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 1.278/12.399/20.637 ms
real 0m 4.04s <<<<<<<<<<<<<<<<<<<<<<<<<< This is what I expect
user 0m 0.00s
sys 0m 0.00s
Mon Nov 15 11:31:50 CET 2021 <<<<<<<<<<<<< This is what I expect too (61s+4s after invocation)
PING google.com (142.250.185.78): 56 data bytes
64 bytes from 142.250.185.78: seq=0 ttl=64 time=4.565 ms
64 bytes from 142.250.185.78: seq=1 ttl=64 time=20.215 ms
64 bytes from 142.250.185.78: seq=2 ttl=64 time=20.087 ms
64 bytes from 142.250.185.78: seq=3 ttl=64 time=19.929 ms
64 bytes from 142.250.185.78: seq=4 ttl=64 time=19.880 ms

--- google.com ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 4.565/16.935/20.215 ms
real 0m 4.04s <<<<<<<<<<<<<<<<<<<<<<<<<< With non-sleeping XA2, again I see what I expect
user 0m 0.00s
sys 0m 0.00s

ACTUAL RESULT:

While the screen is still powered on, the actual result matches the expected result shown above.
When XA2 is sleeping, this is the result:

Mon Nov 15 11:33:56 CET 2021 <<<<<<<<<<- Note the time of invocation
PING google.com (142.250.185.78): 56 data bytes
64 bytes from 142.250.185.78: seq=0 ttl=64 time=3.691 ms
64 bytes from 142.250.185.78: seq=1 ttl=64 time=23.232 ms
64 bytes from 142.250.185.78: seq=2 ttl=64 time=20.705 ms
64 bytes from 142.250.185.78: seq=3 ttl=64 time=19.000 ms
64 bytes from 142.250.185.78: seq=4 ttl=64 time=17.515 ms

--- google.com ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 3.691/16.828/23.232 ms
real 0m 30.64s <<<<<<<<<<<<<---------- 30 seconds for 4*1s+latency??? Should be 4s, not 30s!!!
user 0m 0.00s
sys 0m 0.00s
Mon Nov 15 11:41:32 CET 2021 <<<<------!!! only after waking the device up does sleep(1) continue, showing that the XA2 was in a coma for about 7 minutes instead of sleeping 61 s!
PING google.com (142.250.185.78): 56 data bytes
64 bytes from 142.250.185.78: seq=0 ttl=64 time=1.550 ms
64 bytes from 142.250.185.78: seq=1 ttl=64 time=19.505 ms
64 bytes from 142.250.185.78: seq=2 ttl=64 time=21.037 ms
64 bytes from 142.250.185.78: seq=3 ttl=64 time=19.698 ms
64 bytes from 142.250.185.78: seq=4 ttl=64 time=7.333 ms

--- google.com ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 1.550/13.824/21.037 ms
real 0m 4.06s
user 0m 0.00s
sys 0m 0.00s
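
The gap between the two `date` lines can be checked with a little arithmetic. A minimal sketch, assuming both timestamps fall on the same day and use two-digit HH:MM:SS fields; the function names `to_secs` and `elapsed` are made up for illustration.

```shell
# to_secs HH:MM:SS: seconds since midnight.
# ${x#0} strips one leading zero so that 08 and 09 are not parsed as octal.
to_secs() {
    IFS=: read -r h m s <<EOF
$1
EOF
    echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

# elapsed START END: seconds between two same-day timestamps.
elapsed() {
    echo $(( $(to_secs "$2") - $(to_secs "$1") ))
}

elapsed 11:30:45 11:31:50   # awake run from EXPECTED RESULT
elapsed 11:33:56 11:41:32   # suspended run from ACTUAL RESULT
```

The awake run comes out at 65 s (61 s of sleep plus about 4 s of ping). The suspended run comes out at 456 s; subtracting the roughly 31 s ping and the 61 s sleep leaves about 364 s that the device spent not counting.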

ADDITIONAL INFORMATION:

This very simplistic analysis exactly reflects all the network-related problems I observed on the XA2 with s1p (VoIP app), email (synchronizing) and SSH.
(The RTT to google.com above shows WiFi latency only; you will usually see different numbers.)

I have no idea about ARM timecounters (clocks) or about Linux counter implementations. On x86 and Unix I’d simply guess the TSC is in charge and that switching to ACPI or i8254 would work around it…
But this is a really severe issue, which I hope Jolla will address really quickly.

Can others please do the same test on different hardware? And maybe on XA2 with pre-Suomenlinna?

Thanks for your contribution, we need to track that down as quickly as possible!

Conclusion of the numbers from above:
The hardware timecounter apparently suffers from the power-saving state.
This affects at least sleep(1) and most likely any other timecounter (timeout) function used programmatically, evidently even in the TCP/IP stack.
I hope there’s an ARM timecounter that keeps counting during sleep and that the Linux kernel can use as its standard timekeeping clock, so that only a small sysctl/kernel-config tweak is needed to select the ‘correct’ clock. But as mentioned, I’m not the Linux guy…
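
For anyone who wants to check which counter is actually in charge: Linux exposes the active clocksource and the selectable alternatives through sysfs. The sketch below only reads those two files; the function name `show_clocksource` and its optional directory argument are made up for illustration. Whether a different clocksource would change anything during suspend is not established by anything in this report. Keep in mind that, according to clock_gettime(2), CLOCK_MONOTONIC does not count time spent suspended while CLOCK_BOOTTIME does, so timers stalling across a suspend need not mean the counter itself is broken.

```shell
# show_clocksource [DIR]: print the active and the selectable clocksources.
# DIR defaults to the standard Linux sysfs location.
show_clocksource() {
    dir="${1:-/sys/devices/system/clocksource/clocksource0}"
    echo "current:   $(cat "$dir/current_clocksource")"
    echo "available: $(cat "$dir/available_clocksource")"
}
```

Calling `show_clocksource` without arguments on the device prints both values; comparing them between an affected and an unaffected setup might at least rule the clocksource in or out.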

Same behaviour on XA2 with 3.2.1

and I would bet on other devices and even earlier firmwares as well

It may be useful to repeat these tests after setting each of these (well maybe not the last):

mcetool  --set-suspend-policy=<enabled|disabled|early|disable_on_charger>
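
One way to run such a comparison is sketched below. It assumes the four policy names from the usage line above and measures a sleep against wall-clock time; `test_policy`, `POLICIES` and `SLEEP_SECS` are names invented for this example, and nothing here has been verified on a device.

```shell
# Policy names taken from the mcetool usage line above.
POLICIES="enabled disabled early disable_on_charger"

# test_policy POLICY: set the policy, then time a sleep with wall-clock time.
# SLEEP_SECS (default 61) is a hypothetical knob for shorter trial runs.
test_policy() {
    mcetool --set-suspend-policy="$1" || return 1
    start=$(date +%s)
    sleep "${SLEEP_SECS:-61}"
    echo "$1: sleep ${SLEEP_SECS:-61} took $(( $(date +%s) - start ))s"
}

# Usage (on the device, then leave it untouched so the display switches off):
#   for p in $POLICIES; do test_policy "$p"; done
```

With a policy that still suspends, the device will probably have to be woken up before each round finishes, just as in the original report.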

I accidentally found something interesting while hunting a fingerprint reader problem: the reader worked without problems on the latest Pie (9 [50.2.A.3.77]), but after flashing Sailfish X only garbage seemed to reach the daemon.
Re-flashing Android confirmed that the hardware itself works fine.
Re-re-flashing Sailfish afterwards falsified the issue.

I downgraded to Oreo 50.1.(I don’t remember the rest offhand; it is the only 8.x version offered by Emma, which is 8.0.0).
Not only does my fingerprint reader work like a charm after flashing Sailfish X on top of Oreo (8.0.0); the timekeeping problem (and the resulting networking problem) I described initially doesn’t occur either.

One caveat though: 802.11ac (5 GHz) does suffer from weak signal strength (I used the nile17B oem, as I did before).

So we’re left in the woods - at least I am.
I have no idea which of the 70 GUID partitions contains what code, and in which combination.
Some are clear, the majority is not.

Since I don’t have time to investigate further, my concern is that I now run very outdated binary code with plenty of known, exploitable security flaws that have been fixed elsewhere, making Sailfish at best a second-class citizen security-wise.

I can only beg Jolla to release a comprehensive map that shows which code from which partition is running after flashing Sailfish X (for each officially supported device, which we pay money for). Especially the non-Sailfish code; the Sailfish binaries themselves are easy to follow.
And especially about the Nile16 vs. Nile17B code paths!

Will try to find some 8.1 firmware for XA2 and check things out.
What a pity. Why hasn’t this been noticed by Jolla? It seems nobody skilled is testing their own product range. The XA2 is the upper limit in size for me (and for some others too) and, although not as beautiful as the JP1 back then, still the best choice when it comes to aesthetics; for me and maybe for others too :wink:

Unfortunately, my claim that the issue vanishes with the Oreo (instead of Pie) baseband turned out to be wrong!
It must have been some other side effect keeping the device out of suspend mode.
I see the timecounter problem with the Oreo baseband Sailfish setup as well :frowning:
It seems suspend mode is entered a bit later after switching the display off…

mcetool  --set-suspend-policy=disabled

solves the issue (so far only tested on Oreo baseband Sailfish X)
EDIT:
And to my astonishment, setting it to ‘early’ also solves the problem.
After inspecting

mcetool --get-suspend-stats

it’s clear why: the suspend_time seconds don’t increase anymore, so “early” is equivalent to “disabled” on my XA2. If only there were docs…

So please, nobody re-flash because of the timecounter problem!

But there is still a valuable workaround: the fingerprint reader works perfectly after downgrading before re-flashing Sailfish X.