[4.1.0.23] Consistent black-screen(ish) in conjunction with mobile data captive-portal?

REPRODUCIBILITY (% or how often): seemingly 100%
BUILD ID = OS VERSION (Settings > About product): 4.1.0.23 (Kvarken)
HARDWARE (XA2, Xperia 10…): XA2 (Dual Sim)
UI LANGUAGE: German
REGRESSION: (compared to previous public release: Yes, No, ?): ?

DESCRIPTION:

This is a weird one, and I don’t know how to describe it concisely. I think I might have run out of mobile data quota with my provider, which triggered a “captive portal” notification from them pointing me at comviq.se/mersurf (“more surf”) and cut off my mobile data entirely. In this state, when I try to use the music app (possibly coincidental, might just be bad timing), or immediately after the “login to use this network” popup appears on startup after a reboot, the whole screen turns black.

I can’t seem to dismiss this at all with edge gestures or similar. Some experimenting indicates that a Sailfish loading spinner appears for a split second, a few seconds after the screen goes black, horizontally centered and maybe 2cm below the top screen edge. This suggests to me that it’s probably a Qt/Silica overlay gone wrong, rather than more “severe” breakage?

To verify that it is indeed tied to the new captive-portal notification/behaviour or at least the phone network, I took out the SIM card, and after that I can boot the phone just fine, and use it without the cellular connection. After reseating the SIM and rebooting, the black-screen behaviour returns.

PRECONDITIONS:

Use a SIM where the mobile data connection tries to redirect captive-portal-style. Possibly also related to me currently roaming. Might be dependent on specific providers too: for context, I’m on a Comviq (SE) plan, currently roaming on Vodafone (DE).

STEPS TO REPRODUCE:

  1. Trigger the condition for getting a “login to use this network” popup from the mobile data network
  2. Possibly: perform an action that requires internet connectivity (not sure whether this step is needed)
  3. Both directly after, and on each subsequent reboot, get a black screen that can’t be dismissed and blocks phone usage.

EXPECTED RESULT:

Works, I can use my phone, etc.

ACTUAL RESULT:

  1. Phone boots, shows the lockscreen
  2. After a few moments, the statusbar indicates 4G connection with full reception, mobile data symbol
  3. I get the captive-portal “you need to login to use this network” notification
  4. While notification is still up, screen “turns black” (completely black, but backlight still on)

ADDITIONAL INFORMATION:

As mentioned, I’m currently roaming in DE (Vodafone network), and my plan is an SE plan (Comviq). Not sure whether the roaming condition is relevant.

If there’s a setting to adjust or package that can be strategically uninstalled to temporarily disable the captive-portal behaviour introduced in 4.0.1, then I could test if that would let me work around the issue and continue using the phone with the SIM card in it (if that is indeed the source of the problem–but there’s seeming correlation at least). Ideally without fully uninstalling the web browser…

Right now I can’t really use the phone (well, SIM) at all, which is inconvenient to say the least! :smiley:

Some updates on this issue. I reproduced it after reaching half my monthly data quota, which again seemingly triggered a captive-portal-style “you need to log in to use this network” notification over the mobile data connection, and again resulted in a black screen.

I had some energy to do some more digging and investigation. I ssh’d to the phone over the USB network connection and compared the state during “normal operation” (boot with SIM removed → works fine, but no cellular connection obviously) and during “black screen” (boot with SIM inserted → as soon as the “need to login” notification appears, crash to black screen).

First, I diff’d the ps output in both cases. The main notable thing here is that lipstick is missing in the black-screen case, which explains the black backlit screen… it also explains the Silica “loading” spinner that sometimes shows briefly after the black screen happens, since that would be lipstick trying to start up again, I guess.
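For anyone wanting to repeat the comparison, here is a minimal Python sketch of that kind of ps diff. It assumes the two ps listings were saved with one command name per line (for example from `ps -e -o comm`, if the ps on the device supports that). The function names and the sample strings are made up for illustration and are not the actual ps output from the phone.

```python
def process_names(ps_output: str) -> set[str]:
    """Collect command names from ps output with one name per line."""
    names = set()
    for line in ps_output.splitlines():
        name = line.strip()
        # Skip blank lines and the header row printed by ps.
        if name and name != "COMMAND":
            names.add(name)
    return names


def missing_processes(ps_good: str, ps_bad: str) -> set[str]:
    """Names that run in the working state but not in the broken one."""
    return process_names(ps_good) - process_names(ps_bad)


# Illustrative sample data only (hypothetical, not captured from the device).
good = "COMMAND\nsystemd\ndbus-daemon\nlipstick\n"
bad = "COMMAND\nsystemd\ndbus-daemon\n"

print(sorted(missing_processes(good, bad)))
```

With real listings the difference will also contain short-lived processes, so the result still needs a manual look.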

Next, I looked at the journal, in particular for what’s happening lipstick-wise, and indeed it looks like it segfaults and crashes. Here’s an excerpt:

...
Jul 01 22:01:58 reidh systemd[4379]: Started The lipstick UI.
Jul 01 22:01:58 reidh lipstick[4552]: [D] unknown:0 - Registered Bluetooth OBEX agent: "/com/jolla/obexservice/agent"
Jul 01 22:01:58 reidh systemd[4379]: Started The lipstick security prompt UI.
Jul 01 22:01:59 reidh polkitd(authority=local)[3033]: Registered Authentication Agent for unix-session:c1 (system bus name :1.86 [/usr/libexec/lipstick-security-ui], object path /org/sailfishos/Lipstick/SecurityUi/PolkitAgent, locale de_DE.utf8)
Jul 01 22:01:59 reidh dbus-daemon[4427]: dbus-daemon[4427]: [session uid=100000 pid=4427] Activating service name='com.jolla.settings.system.flashlight' requested by ':1.24' (uid=100000 pid=4552 comm="/usr/bin/lipstick -plugin evdevtouch -plugin evdev")
Jul 01 22:01:59 reidh lipstick[4552]: [D] unknown:0 - Specified Desktop file does not exist "/usr/share/applications/sailfish-homescreen-services.desktop"
Jul 01 22:01:59 reidh lipstick[4552]: [W] unknown:0 - No desktop entry for process name: "sailfish-homescreen-services"
Jul 01 22:02:01 reidh lipstick-security-ui[5214]: [W] unknown:0 - The Wayland connection broke. Did the Wayland compositor die?
Jul 01 22:02:01 reidh systemd[4379]: lipstick.service: Main process exited, code=killed, status=11/SEGV
Jul 01 22:02:01 reidh systemd[4379]: lipstick.service: Failed with result 'signal'.
Jul 01 22:02:01 reidh invoker[5244]: invoker: Invoking execution: '/usr/libexec/lipstick-security-ui-launcher'
Jul 01 22:02:01 reidh systemd[4379]: lipstick-security-ui.service: Main process exited, code=exited, status=1/FAILURE
Jul 01 22:02:01 reidh systemd[4379]: lipstick-security-ui.service: Failed with result 'exit-code'.
Jul 01 22:02:02 reidh systemd[4379]: lipstick.service: Service hold-off time over, scheduling restart.
Jul 01 22:02:02 reidh systemd[4379]: lipstick.service: Scheduled restart job, restart counter is at 1.
Jul 01 22:02:02 reidh systemd[4379]: lipstick-security-ui.service: Service hold-off time over, scheduling restart.
Jul 01 22:02:02 reidh systemd[4379]: lipstick-security-ui.service: Scheduled restart job, restart counter is at 1.
Jul 01 22:02:02 reidh systemd[4379]: Stopped The lipstick security prompt UI.
Jul 01 22:02:02 reidh systemd[4379]: Stopped The lipstick UI.
Jul 01 22:02:02 reidh systemd[4379]: Starting The lipstick UI...
Jul 01 22:02:02 reidh lipstick[5687]: == hwcomposer module ==
Jul 01 22:02:02 reidh lipstick[5687]:  * Address: 0xee7a9004
Jul 01 22:02:02 reidh lipstick[5687]:  * Module API Version: 3
Jul 01 22:02:02 reidh lipstick[5687]:  * HAL API Version: 0
Jul 01 22:02:02 reidh lipstick[5687]:  * Identifier: hwcomposer
Jul 01 22:02:02 reidh lipstick[5687]:  * Name: QTI Hardware Composer Module
Jul 01 22:02:02 reidh lipstick[5687]:  * Author: CodeAurora Forum
Jul 01 22:02:02 reidh lipstick[5687]: == hwcomposer module ==
...

(I can share the full 1008 lines of journal output as well, not sure what the best way of doing so would be though.)

And indeed it tries to restart a few more times before systemd gives up because of too many restart attempts in quick succession:

...
Jul 01 22:02:27 reidh systemd[4379]: Failed to start The lipstick UI.
Jul 01 22:02:28 reidh systemd[4379]: lipstick.service: Service hold-off time over, scheduling restart.
Jul 01 22:02:28 reidh systemd[4379]: lipstick.service: Start request repeated too quickly.
Jul 01 22:02:28 reidh systemd[4379]: lipstick.service: Failed with result 'signal'.
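
The crash pattern in these excerpts can also be picked out of a saved journal mechanically. The following is a minimal Python sketch, assuming the journal was exported as plain text in the same format as above. The regular expressions and the `summarize` helper are my own illustration, and the sample lines are copied from the excerpts.

```python
import re
from collections import Counter

# systemd line reporting how a service's main process ended, e.g.
# "... lipstick.service: Main process exited, code=killed, status=11/SEGV"
CRASH_RE = re.compile(
    r"systemd\[\d+\]: (\S+\.service): Main process exited, "
    r"code=(\w+), status=(\S+)"
)
# systemd line reporting that it stopped restarting a service.
GAVE_UP_RE = re.compile(
    r"systemd\[\d+\]: (\S+\.service): Start request repeated too quickly"
)


def summarize(journal_text: str):
    """Count main-process exits per (service, code, status) and collect
    the services that hit systemd's start rate limit."""
    crashes = Counter()
    gave_up = set()
    for line in journal_text.splitlines():
        match = CRASH_RE.search(line)
        if match:
            crashes[match.groups()] += 1
            continue
        match = GAVE_UP_RE.search(line)
        if match:
            gave_up.add(match.group(1))
    return crashes, gave_up


# Three lines taken from the journal excerpts in this post.
sample = """\
Jul 01 22:02:01 reidh systemd[4379]: lipstick.service: Main process exited, code=killed, status=11/SEGV
Jul 01 22:02:01 reidh systemd[4379]: lipstick-security-ui.service: Main process exited, code=exited, status=1/FAILURE
Jul 01 22:02:28 reidh systemd[4379]: lipstick.service: Start request repeated too quickly.
"""

crashes, gave_up = summarize(sample)
for (service, code, status), count in crashes.items():
    print(f"{service}: {count}x code={code} status={status}")
print("rate-limited:", sorted(gave_up))
```

Run over the full journal, this would show how many times lipstick segfaulted before systemd stopped restarting it.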

Not sure what the next step after this would be. I guess launching lipstick from within gdb to get a stack trace, ideally with debugging symbols handy, would be best… if that would be useful and if the symbols are available, I could do that and attach a proper trace. Not sure when I would have time for that right now, as the immediate future got unexpectedly busy for me… but I’m sure I’ll have some time for it Eventually™. It kind of feels like a rather big issue is hiding here though, if it’s taking lipstick down with it… and of course it’s annoying to have phone services break completely after exceeding half my data quota. :stuck_out_tongue:

I have a similar problem with a black screen, but when connecting to a Wi-Fi access point. The only remedy is rebooting and turning off the Wi-Fi connection, and you have to be quick about it, otherwise the screen goes black again.