Strange GUI freeze with polling threads
#1
Hi all,

by making use of an automatic command issuer tool (namely, the xbmc-monkey :-) ), I'm testing the platform stability of XBMC - SVN revision 20182 - upon an Ubuntu 8.04 installation, with kernel 2.6.24-23-generic #1 SMP. This tool sends plenty of commands, separated by a random time sleep.

Every now and then I get an application freeze. In the last of them, I've extracted the xbmc related rows from a top -H command:

Code:
PID     USER      PR  NI  VIRT     RES   SHR  S  %CPU %MEM  TIME+   COMMAND
25533   xbmc      20   0  3039m    556m  43m  T   0.0 36.6  3:38.15  xbmc.bin
25574   xbmc      20   0  3039m    556m  43m  T   0.0 36.6  0:04.16  xbmc.bin
25575   xbmc      20   0  3039m    556m  43m  T   0.0 36.6  0:00.00  xbmc.bin
25576   xbmc      20   0  3039m    556m  43m  T   0.0 36.6  0:00.00  xbmc.bin
25582   xbmc      20   0  3039m    556m  43m  T   0.0 36.6  0:01.16  xbmc.bin
25583   xbmc      20   0  3039m    556m  43m  T   0.0 36.6  0:00.00  xbmc.bin
3789    xbmc      20   0  3039m    556m  43m  S   0.7 36.6  0:11.00  xbmc.bin

Notice that PIDS 25533, 25574, 25582 and 3789 are the only that have been active in this session: here follows a strace -f to the main process pid in order to trace down all the threads syscalls. This trace is a never-ending sequence of rows like the followings:

Code:
25582 <... nanosleep resumed> {0, 10000000}) = 0
25582 poll([{fd=31, events=POLLIN}], 1, 0) = 0
25582 nanosleep({0, 10000000}, {0, 10000000}) = 0
25582 poll([{fd=31, events=POLLIN}], 1, 0) = 0
25582 nanosleep({0, 10000000},  <unfinished ...>
25533 <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
25533 futex(0x103212e0, 0x81 /* FUTEX_??? */, 1) = 0
25533 gettimeofday({1242658914, 840295}, NULL) = 0
25533 gettimeofday({1242658914, 840318}, NULL) = 0
25533 clock_gettime(CLOCK_REALTIME, {1242658914, 840341717}) = 0
25533 futex(0x104c17ac, 0x80 /* FUTEX_??? */, 5457 <unfinished ...>
3789  <... poll resumed> [{fd=23, events=POLLIN|POLLERR|POLLNVAL, revents=POLLIN}], 1, -1) = 1
3789  ioctl(24, 0x4122, 0xbf920fb8)     = 0
3789  ioctl(24, 0x4122, 0x1800ffff)     = 0
3789  ioctl(24, 0x4122, 0xbf920f48)     = 0
3789  ioctl(24, 0x4122, 0x1)            = 0
3789  ioctl(24, 0x4122, 0xbf920fc8)     = 0
3789  ioctl(24, 0x4122, 0xbf920fa8)     = 0
3789  ioctl(24, 0x4122, 0xbf920fb8)     = 0
3789  ioctl(24, 0x4122, 0xbf920fb8)     = 0
3789  ioctl(24, 0x4122, 0)              = 0
3789  ioctl(25, USBDEVFS_IOCTL, 0x104f72a0) = 0
3789  ioctl(24, 0x4122, 0)              = 0
3789  semop(107380736, 0xbf920f0a, 2)   = 0
3789  semop(107380736, 0xbf920f16, 1)   = 0
3789  read(23, "\1\0\0\0\352l\0\0!\365\266\r\1\0\0\0", 64) = 16
3789  gettimeofday({1242658914, 846024}, NULL) = 0
3789  ioctl(24, 0x4122, 0xa)            = 0
3789  ioctl(24, 0x4122, 0xbf920fe0)     = 0
3789  ioctl(24, 0x4122, 0xbf920ff8)     = 0
3789  poll( <unfinished ...>
25582 <... nanosleep resumed> {0, 10000000}) = 0
25582 poll([{fd=31, events=POLLIN}], 1, 0) = 0
25582 nanosleep({0, 10000000}, {0, 10000000}) = 0
25582 poll([{fd=31, events=POLLIN}], 1, 0) = 0
25582 nanosleep({0, 10000000},  <unfinished ...>

We have no news about 25574, but all the others are active and, I guess, stuck for something to happen.

Here are the very last rows of my log (if you need more of them I could upload it somewhere and show the full log output):

Code:
16:50:27 T:2983144336 M:930058240   DEBUG: webs: accept request
16:50:27 T:3053098848 M:930058240  NOTICE: DVDPlayer: Opening: Videos/TV/RAI2.mpg
16:50:27 T:3053098848 M:930058240  NOTICE: CDVDPlayer::CloseFile()
16:50:27 T:3053098848 M:930058240  NOTICE: DVDPlayer: waiting for threads to exit
16:50:27 T:3214027664 M:930058240   DEBUG: CDVDPlayerAudio:: Discontinuty - was:-6970866.738502, should be:48366133.265632, error:55337000.004135
16:50:27 T:3205634960 M:930058240  NOTICE: CDVDPlayer::OnExit()
16:50:27 T:3205634960 M:930058240  NOTICE: DVDPlayer: closing audio stream
16:50:27 T:3205634960 M:930058240  NOTICE: Closing audio stream
16:50:27 T:3205634960 M:930058240  NOTICE: Waiting for audio thread to exit
16:50:27 T:3214027664 M:930058240   ERROR: AddPackets - failed to add leftover bytes to render
16:50:27 T:3214027664 M:930058240  NOTICE: thread end: CDVDPlayerAudio::OnExit()
16:50:27 T:3214027664 M:930058240   DEBUG: Thread 3214027664 terminating
16:50:27 T:3205634960 M:930058240  NOTICE: Closing audio device
16:50:27 T:3205634960 M:930144256  NOTICE: Deleting audio codec
16:50:27 T:3205634960 M:930144256  NOTICE: CDVDPlayer::OnExit() deleting demuxer
16:50:27 T:3205634960 M:930144256  NOTICE: CDVDPlayer::OnExit() deleting input stream
16:50:27 T:3205634960 M:930144256   DEBUG: Thread 3205634960 terminating
16:50:27 T:3053098848 M:930144256  NOTICE: DVDPlayer: finished waiting
16:50:27 T:3053098848 M:930144256   DEBUG: LinuxRendererGL: Cleaning up GL resources
16:50:27 T:3053098848 M:930144256 WARNING: CDVDMessageQueue(player)::Put MSGQ_NOT_INITIALIZED

The only strange thing I see in the log is the
Code:
ERROR: AddPackets - failed to add leftover bytes to render
stuff.

Any ideas about this freeze?

Thanks in advance.
M.
Reply
#2
run through gdb and get us a proper backtrace. then the strace might be useful as extra information. on its own it's rather useless.
Reply
#3
spiff Wrote:run through gdb and get us a proper backtrace. then the strace might be useful as extra information. on its own it's rather useless.

You are right. I was only hoping that someone was aware of some similar situations... but I guess it was a little bit inappropriate for a developer zone.
I'm not sure I will be able to get the very same application freeze, since as I said in first place, it is a totally random phenomenon generated by an exponential random sleep time function with a pseudo-random command sequence generated at a fairly high rate.

I will do my best to let you have a gdb bt.

M.
Reply
#4
I have to say its an interesting approach to fuzz test htpc software! But probably a very good idea.
Reply
#5
Might wanna setup your test to output stuff like the sleep times so you can more easily verify that you're on the right track.

In general, the more times you describe a problem as "random" the less you actually understand it.
Reply
#6
althekiller Wrote:Might wanna setup your test to output stuff like the sleep times so you can more easily verify that you're on the right track.

Yes, I did that. My script outputs the sequence of commands and sleep time in order to walk over a deterministic path if needed. What it changes from session to session is made up by the starting conditions of the system, the reaction rate and the actual time interposed (the computation of random sleep times PLUS the sleep time will be different from the reading of the sleep time from the file PLUS the sleep time itself). This is to say that the very same command and sleep time sequence never takes the application to the very same places...

althekiller Wrote:In general, the more times you describe a problem as "random" the less you actually understand it.

This is true. Consider, though, that this script helped me finding some problems (either with XBMC as in trac tickets 6430 and 6431 or with my python scripts), so I would rather keep on living with a little bit of uncertainty Wink
Reply

Logout Mark Read Team Forum Stats Members Help
Strange GUI freeze with polling threads0