Reliable crash SfOS through grep command

A fifo would explain it and is not unlikely. I have to admit, I still don’t get the point of the exercise :slight_smile: And I like fifos and all, but will probably concentrate on making them for websockets :slight_smile:

I think the only thing we learned from my end is that it doesn’t cause a kernel panic and a reboot on Volla?

Thank you @nephros for pointing to some reasons why grep may “hang”. Because for me (while testing the original topic) grep does not consume any CPU time when it hangs, it is likely similar to your “named pipe”-case example: The file content is inaccessible.

But as @poetaster repeatedly insisted, we should not get sidetracked and focus on reproducible SailfishOS reboots triggered by grepping or other means of reading files.
Knowing now, that the offender apparently is in a directory after /b*/, but before /sys/, I was able to narrow it down to:

  • fgrep -rsc foo /d/ > /dev/null # Always ends in rebooting the SailfishOS device
  • fgrep -rsc foo /d/* > /dev/null # Always ends in grep hanging

But /d/ does not contain any dot files or directories!?!
So what could cause the difference in behaviour? Note tat the hanging state can be explained by the many special files in subdirectories of /d/ and is not the point: What is triggering a reboot is the question of interest.
Now I am confused, but will stop for today.

Bingo:
cat /d/usb_serial0/readstatus # Triggers reboot of an Xperia X (and likely other Sony devices)

@poetaster, does that really not happen on your Volla phone or does /d/usb_serial0/ only exist when a Sony-derived kernel is in use?

@inte and @nephros, can you confirm that on your devices and denote here, which ones?

3 Likes

Xperia 10 (one) Dual SIM: That device/path does not exist, and at a quick glance I can’t see a similar one.

It makes reliably reboot a JollaC also : /

Congratulation for spotting the culprit file.

For information, the kernel last messages are :

<1>[  391.826287] Unable to handle kernel NULL pointer dereference at virtual address 00000024
<1>[  391.826348] pgd = f1108000
<1>[  391.826372] [00000024] *pgd=00000000
<0>[  391.826410] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
<6>[  391.826444] Modules linked in: radio_iris_transport wlan(O)
<6>[  391.826500] CPU: 3 PID: 8658 Comm: cat Tainted: G        W  O 3.10.49+0.0.99 #1
<6>[  391.826546] task: efcaec00 ti: f3a34000 task.ti: f3a34000
<6>[  391.826589] PC is at debug_read_status+0x38/0x1e0
<6>[  391.826624] LR is at do_readv_writev+0x118/0x20c
<6>[  391.826660] pc : [<c05706d8>]    lr : [<c0210af8>]    psr: 200a0013
<6>[  391.826660] sp : f3a35b78  ip : c05706b8  fp : f3a35bbc
<6>[  391.826719] r10: c05706a0  r9 : f380c000  r8 : f0614688
<6>[  391.826753] r7 : 00000000  r6 : 0000000f  r5 : f28536c0  r4 : 00000000
<6>[  391.826791] r3 : c128db2c  r2 : 00000200  r1 : 000080d0  r0 : f5001d00
<6>[  391.826830] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
<6>[  391.826874] Control: 10c5387d  Table: b110806a  DAC: 00000015

Followed by hexadecimal dump and a backtrace :

<6>[  391.837815] [<c05706d8>] (debug_read_status+0x38/0x1e0) from [<c0210af8>] (do_readv_writev+0x118/0x20c)
<6>[  391.837899] [<c0210af8>] (do_readv_writev+0x118/0x20c) from [<c0210c58>] (vfs_readv+0x6c/0x84)
<6>[  391.837977] [<c0210c58>] (vfs_readv+0x6c/0x84) from [<c0238f10>] (default_file_splice_read+0x198/0x2cc)
<6>[  391.838060] [<c0238f10>] (default_file_splice_read+0x198/0x2cc) from [<c0237ee8>] (do_splice_to+0x8c/0xa0)
<6>[  391.838142] [<c0237ee8>] (do_splice_to+0x8c/0xa0) from [<c0237fac>] (splice_direct_to_actor+0xb0/0x1ac)
<6>[  391.838219] [<c0237fac>] (splice_direct_to_actor+0xb0/0x1ac) from [<c0239400>] (do_splice_direct+0x6c/0x84)
<6>[  391.838297] [<c0239400>] (do_splice_direct+0x6c/0x84) from [<c02103b4>] (do_sendfile+0x17c/0x314)
<6>[  391.838373] [<c02103b4>] (do_sendfile+0x17c/0x314) from [<c02111e8>] (SyS_sendfile64+0xd4/0xdc)
<6>[  391.838449] [<c02111e8>] (SyS_sendfile64+0xd4/0xdc) from [<c0106b00>] (ret_fast_syscall+0x0/0x48)
<0>[  391.838519] Code: e30810d0 e50b2030 e3a02c02 e5930024 (e5948024) 
<4>[  391.846945] ---[ end trace da227214a82491bb ]---
1 Like

Just looking at it a bit blindly :

  • debug_read_status() is from drivers/usb/gadget/u_serial.c in the kernel sources, see Index of /sources/4.0.1.48/ (Jolla where are the sources for 4.3 ?) and the kernel-adaptation tarball.
  • in this function there are various accesses done to members of structures referenced by a pointer, namely ui_dev, tty and gser. We can discard tty since it is protected by a if (tty) whenever used. If I assume that ui_dev is properly defined, I end up with gser that may be NULL. Looking here and there in the file, it seems that it could be the case and its various usages are all protected. But not in here. There is a if (gser->get_dtr) that should have been if (gser && gser->get_dtr).

I’ve no idea where to submit a patch for kernel adaptation. Moreover, I even don’t know how to test this hypothesis because I’ve no idea how to recompile a kernel for my JollaC. And even if I know, I’ve no idea how to put it properly on the phone and recover in case it’s not working. I’ll ask on IRC #sailfishos-porter channel.

2 Likes

This file does not exist on the xperia 10ii, nor does a grep -r bla /usb* trigger a reboot.

Side note / off-topic:

It would, but the root directory ("/") and all its top level directories ("/*") should contain no files and directories starting with a dot (".").

The actual difference is the order in which the directories are processed, exacerbated by the fact that grep freezes when processing certain files; thus it does not reach the critical file triggering the reboot.

XA2 seems not having this file neither.

# cat /d/usb_serial0/readstatus
cat: can’t open ‘/d/usb_serial0/readstatus’: No such file or directory

Also, I couldn’t find it elsewhere.

/usb* sure yields nothing (because there is no such directory IMHO), but what about a
grep -Frsc bla /d/?
If that does not make your device reboot, you may continue looking for the offending file (or a different cause, though I assume that to be unlikely), while the search has ended for dcaliste (Jolla C) and me (Xperia X).

Edit: And inte (Xperia 10 II), plus ric9k (Xperia XA2).

Sorry, my fault. I meant a grep in /d/usb* doesn’t result in a reboot.
A grep issued in /d does indeed trigger the reboot, though

Well, I assume the diverging results mostly originate from everybody is testing with different commands (and thus testing different things).

So @ric9k, @poetaster, @nephros, does a
grep -Frsc bla /d/ # (please use exactly this command)
really not trigger a reboot on your devices?

And if it does not, does grep not return to the command prompt (that is the exact criteria for “to hang”) without using CPU (see top in a second window).

P.S.: @Inte, reading which file does trigger the reboot on your device?
You might use for determining this a

find /d/ -type f,c,b -fprint /filelist -execdir cat {} > /dev/null \;

Correct, no reboot.

But a non-cpu-using hang.

I did grep -Frsc bla /d/ | tee grep-Frsc-bla and both console output and the teed file hang hang for a while at the place below, and a couple of other places, but then continue after a couple of seconds:

nemo@PGXperia10:~ $ tail grep-Frsc-bla
/d/pmic-votable/PD_DISALLOWED_INDIRECT/force_val:0
/d/pmic-votable/PD_DISALLOWED_INDIRECT/status:0
/d/pmic-votable/DC_ICL/force_active:0
/d/pmic-votable/DC_ICL/force_val:0
/d/pmic-votable/DC_ICL/status:0
/d/pmic-votable/DC_SUSPEND/force_active:0
/d/pmic-votable/DC_SUSPEND/force_val:0
/d/pmic-votable/DC_SUSPEND/status:0
/d/pmic-votable/PL_ENABLE_INDIRECT/force_active:0
/d/pmic-votable/PL_ENABLE_Inemo@PGXperia10:~ $

It seems to finally hang forever at

/d/tracing/events/raw_syscalls/enable:0
/d/tracing/events/raw_syscalls/filter:0
/d/tracing/events/enable:0
/d/tracing/events/header_event:0
/d/tracing/events/header_page:0
/d/tracing/set_event:0
/d/tracing/available_events:3
/d/tracing/max_graph_depth:0
/d/tracing/printk_formats:0
/d/tracing/instances/wifi/set_ftrace_pid:0

strace sais:

 strace -p21504
strace: Process 21504 attached
close(9)                                = 0
lstat64("/d/tracing/instances/wifi/per_cpu/cpu5/trace", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
openat(AT_FDCWD, "/d/tracing/instances/wifi/per_cpu/cpu5/trace", O_RDONLY|O_LARGEFILE) = 9
fstat64(9, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
read(9, "# tracer: nop\n#\n# entries-in-buf"..., 4096) = 432
read(9, "", 4096)                       = 0
close(9)                                = 0
lstat64("/d/tracing/instances/wifi/per_cpu/cpu5/trace_pipe", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
openat(AT_FDCWD, "/d/tracing/instances/wifi/per_cpu/cpu5/trace_pipe", O_RDONLY|O_LARGEFILE) = 9
fstat64(9, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(9,

lsof -p sais:

COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
grep    21504 nemo  cwd    DIR  252,2     4096 262145 /home/nemo
grep    21504 nemo  rtd    DIR  252,0     4096      2 /
grep    21504 nemo  txt    REG  252,0   305908   7330 /usr/bin/busybox
grep    21504 nemo  mem    REG  252,0   317128   7307 /usr/lib/libpcre.so.1.2.10
grep    21504 nemo  mem    REG  252,0    34464  32294 /lib/libgcc_s-8.3.0.so.1
grep    21504 nemo  mem    REG  252,0    83900    958 /lib/libpthread-2.30.so
grep    21504 nemo  mem    REG  252,0    13804    944 /lib/libdl-2.30.so
grep    21504 nemo  mem    REG  252,0  1079604    940 /lib/libc-2.30.so
grep    21504 nemo  mem    REG  252,0   101044  25534 /usr/lib/libselinux.so.1
grep    21504 nemo  mem    REG  252,0     9804   8623 /usr/lib/libpreloadpatchmanager.so
grep    21504 nemo  mem    REG  252,0   113904    928 /lib/ld-2.30.so
grep    21504 nemo    0u   CHR  136,0      0t0      3 /dev/pts/0
grep    21504 nemo    1w  FIFO   0,10      0t0 406690 pipe
grep    21504 nemo    2u   CHR  136,0      0t0      3 /dev/pts/0
grep    21504 nemo    3r   DIR    0,7        0      1 /sys/kernel/debug
grep    21504 nemo    4r   DIR    0,9        0      1 /sys/kernel/debug/tracing
grep    21504 nemo    5r   DIR    0,9        0   1134 /sys/kernel/debug/tracing/instances
grep    21504 nemo    6r   DIR    0,9        0  82427 /sys/kernel/debug/tracing/instances/wifi
grep    21504 nemo    7r   DIR    0,9        0  97734 /sys/kernel/debug/tracing/instances/wifi/per_cpu
grep    21504 nemo    8r   DIR    0,9        0  97759 /sys/kernel/debug/tracing/instances/wifi/per_cpu/cpu4
grep    21504 nemo    9r   REG    0,9        0  97762 /sys/kernel/debug/tracing/instances/wifi/per_cpu/cpu4/trace_pipe_raw

That file being called a “pipe” makes me suspicious, is this supposed to act as a kind of fifo, producing data only after enabling some kind of trace? That would explain why grep doesn’t get anything.
The file itself looks like a regular file, but probably debugfs doesn’t have real fifo or other special files.

In general I wonder why the kernel debugfs (so /d/) is mounted at all in “production” devices. It is done by sys-kernel-debug.mount and sys-kernel-debug-tracing.mount, but why?

I get slightly different but comparable output:

Seems to hang without further load:

/d/fpsgo/fstb/fpsgo_status:0
/d/fpsgo/minitop/enable:0
/d/fpsgo/minitop/thrs_heavy:0
/d/fpsgo/minitop/cooldn_order:0
/d/fpsgo/minitop/warmup_order:0
/d/fpsgo/minitop/minitop_n:0
/d/fpsgo/minitop/list:0
/d/fpsgo/xgf/deplist:0
/d/fpsgo/fbt/ultra_rescue:0
/d/fpsgo/fbt/enable_switch_sync_flag:0
/d/fpsgo/fbt/enable_switch_down_throttle:0
/d/fpsgo/fbt/boost_ta:0
/d/fpsgo/fbt/enable_uclamp_boost:0
/d/fpsgo/fbt/table:0
/d/fpsgo/fbt/enable_fteh:0
/d/fpsgo/fbt/switch_idleprefer:0
/d/fpsgo/fbt/fbt_info:0
/d/fpsgo/fbt/light_loading_policy:0

strace -p22634

strace: Process 22634 attached
read(4, “\n<0>,[493993.272568],784052,2005”…, 4096) = 4066
read(4, “, 0, 0, 0, 0,[11”…, 4096) = 4067
read(4, “, 0, 0, 0, 0,[27”…, 4096) = 1816
read(4, “\n<0>,[493994.263862],784052,2005”…, 4096) = 4066
read(4, “, 0, 0, 0, 0,[11”…, 4096) = 4067
read(4, “, 0, 0, 0, 0,[27”…, 4096) = 1816
read(4, “\n<0>,[493995.286057],784060,2005”…, 4096) = 4066
read(4, “, 0, 0, 0, 0,[11”…, 4096) = 4067
read(4, “, 0, 0, 0, 0,[27”…, 4096) = 1816
read(4, “\n<0>,[493996.278636],783928,2005”…, 4096) = 4066
read(4, “, 0, 0, 0, 0,[11”…, 4096) = 4067
read(4, “, 0, 0, 0, 0,[27”…, 4096) = 1816
read(4, “\n<0>,[493997.270119],783928,2005”…, 4096) = 4066

lsof:

grep 22634 defaultuser 1w FIFO 0,10 0t0 2420904 pipe

A pipe is a fifo. And in this case, FIFO == GIGO :slight_smile:

I believe that piggz has / had (was so in 3.4) more kernel debugging on for the volla kernels. ln /sys/kernel/debug/fpsgo/common/fpsgo_enable is 1

Yes, that’s the pipe connecting grep and tee.

But I was referring to the file /sys/kernel/debug/tracing/instances/wifi/per_cpu/cpu4/trace_pipe_raw

@nephros, @poetaster, thank l you for testing. Well, if the (ab-)used utility for triggering the reboot hangs, it cannot get to the file which triggers a reboot.
Thus my strategy was to omit the locations which cause hangs repeatedly (by placing the start of the search after them, again and again), which is how I ultimately determined that file triggering the reboots for the Xperia X and Jolla C.

IMO you both became side-tracked into researching the “hangs” again, which is not the original question, albeit an interesting one.

I do like the question @nephros raises, because if there is no real reason (other than convenience for developers) and we can show that this bears some dangers on most devices (hence my insisting on testing), the logical consequence is to ask Jolla and SailfishOS porters to switch it off:

P.S.: BTW, the “pipe” / “FIFO” file(s) you both observed are just what is commonly called “named pipe” on UNIXes. I can see why one might employ that for debugging purposes.

As another sidetracking side-node, you can’t simply stop the mount service or unmount the tracing fs. But you can compile trace-cmd and use trace-cmd stop or reset to stop all traces, and then umount it.

You need to do more app development :slight_smile: nudge, nudge. And another Gibli Ambience set :slight_smile:

I confirm: reboot (or a shutdown because I wasn’t there to enter the unlock code)
The rest of what you are talking about is to me like latin…no: ancient greek