Muffinresearch Labs by Stuart Colville

Ubuntu: BUG: soft lockup – CPU#0 stuck for 11s!

Posted in Code, Linux/Unix on 20th August 2008, 2:51 pm by Stuart

This post falls neatly under the “watching paint dry” category, so if you’re susceptible to rage induced by reading terminally dull posts, avert your eyes now.

I’ve been having some interesting issues with VMs running on VMware Server of late. The problem involved kernel panics and lots of messages in the syslog that look like this:

[19133.298838] BUG: soft lockup - CPU#1 stuck for 11s! [swapper:0]
[19133.298929]
[19133.298932] Pid: 0, comm: swapper Not tainted (2.6.24-16-server #1)
[19133.298936] EIP: 0060:[native_safe_halt+0x2/0x10] EFLAGS: 00000246 CPU: 1
[19133.298942] EIP is at native_safe_halt+0x2/0x10
[19133.298945] EAX: 00000000 EBX: 00000001 ECX: c0106f40 EDX: cdde4000
[19133.298949] ESI: c0495004 EDI: c049b300 EBP: 00000000 ESP: cdde5fa4
[19133.298952]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[19133.298956] CR0: 8005003b CR2: b7f51828 CR3: 0049c000 CR4: 000006b0
[19133.298960] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[19133.298963] DR6: ffff0ff0 DR7: 00000400
[19133.298966]  [default_idle+0x3c/0x60] default_idle+0x3c/0x60
[19133.298972]  [cpu_idle+0x73/0xd0] cpu_idle+0x73/0xd0
[19133.298987]  =======================
[19133.302433] BUG: soft lockup - CPU#1 stuck for 11s! [swapper:0]

I found this bug, which noted that adding the kernel parameter “noapic” would resolve the issue. In case you’re wondering at this point what APIC is (well, you read this far), this wiki page details it quite nicely: http://wiki.linuxquestions.org/wiki/APIC

This worked for two out of the three VMs, but the third was still showing the same error messages.
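For anyone wanting to confirm that the parameter actually took effect after a reboot, here is a minimal sketch. The helper name has_kernel_param is made up for illustration; it only inspects /proc/cmdline, which holds the command line the running kernel was booted with. On a stock Ubuntu 8.04 install using GRUB legacy (an assumption about your setup), the parameter itself is usually appended to the "# kopt=" line in /boot/grub/menu.lst, followed by running sudo update-grub.

```shell
#!/bin/sh
# Report whether a kernel parameter is present on a kernel command line.
# Usage: has_kernel_param PARAM [CMDLINE]
# CMDLINE defaults to the running kernel's command line (/proc/cmdline).
# has_kernel_param is a hypothetical helper, not part of any package.
has_kernel_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline 2>/dev/null)}
    # Parameters are space-separated, so pad with spaces and match whole words.
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_kernel_param noapic; then
    echo "noapic is active on the running kernel"
else
    echo "noapic is not set on the running kernel"
fi
```

Passing a second argument checks an arbitrary command line string instead of the running kernel's, which is handy for testing a line before it goes into the boot configuration.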

I thought I’d update to the newest version of the 2.6.24-19-server kernel with a quick:

sudo apt-get update
sudo apt-get upgrade

After a reboot to pick up the new 2.6.24-19-server kernel, the messages have gone away.
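If you want something more convincing than eyeballing the log, a rough sketch like the following counts the soft lockup messages per CPU. The function name count_soft_lockups is hypothetical, and /var/log/syslog as the log location is an assumption for a default Ubuntu install.

```shell
#!/bin/sh
# Count "BUG: soft lockup" lines per CPU.
# Usage: count_soft_lockups [LOGFILE...]   (reads stdin when no file is given)
# count_soft_lockups is a hypothetical helper written for this example.
count_soft_lockups() {
    awk '/BUG: soft lockup/ {
        # Pull out the "CPU#N" token and tally it.
        if (match($0, /CPU#[0-9]+/))
            count[substr($0, RSTART, RLENGTH)]++
    }
    END {
        for (cpu in count)
            print cpu, count[cpu]
    }' "$@" | sort
}
```

Run it as count_soft_lockups /var/log/syslog, or pipe the output of dmesg into it. Each output line is a CPU label followed by the number of matching messages; no output means no soft lockup messages were found.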

What’s not clear to me is the cause of these issues, as they seemed to appear out of nowhere, though I suspect they may have been related to the CPU frequency changes made to prevent time drift.

Comments

1. On January 4th, 2009 at 8:34 pm BUG: soft lockup - CPU#0 stuck for 11s! | Noah's Randomness said:

[...] Well I started to do some digging. First I just googled the error message and came up with this post. [...]

2. On February 4th, 2009 at 12:57 am Kahn said:

I seem to have run into this bug as well, although I’m running 2.6.24-19-generic on VMware Server 1.0.8. Looks like I’ll give noapic a try and go from there.

3. On February 13th, 2009 at 9:29 am Unclown said:

Not sure if this is relevant:

Feb 13 07:11:31 Eeyore kernel: [    0.000000] Linux version 2.6.20-17-generic (root@terranova) (gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu4)) #2 SMP Wed Aug 20 16:47:34 UTC 2008 (Ubuntu 2.6.20-17.39-generic)

[   24.697338] CPU0: Intel(R) Core(TM)2 CPU          6300  @ 1.86GHz stepping 06
[   24.787921] CPU1: Intel(R) Core(TM)2 CPU          6300  @ 1.86GHz stepping 06

Feb 12 15:53:35 Eeyore kernel: [27003.130001] BUG: soft lockup detected on CPU#1!
Feb 12 15:53:35 Eeyore kernel: [27003.130023]  [softlockup_tick+156/240] softlockup_tick+0x9c/0xf0
Feb 12 15:53:35 Eeyore kernel: [27003.130463]  [run_timer_softirq+303/416] run_timer_softirq+0x12f/0x1a0
Feb 12 15:53:35 Eeyore kernel: [27003.130475]  [__do_softirq+130/256] __do_softirq+0x82/0x100
Feb 12 15:53:35 Eeyore kernel: [27003.130485]  [do_softirq+85/96] do_softirq+0x55/0x60
Feb 12 15:54:21 Eeyore kernel: [27048.573537] BUG: soft lockup detected on CPU#1!

Feb 13 07:11:31 Eeyore kernel: [    2.397143] ACPI (exconfig-0455): Dynamic SSDT Load - OemId [   AMI] OemTableId [  CPU1PM] [20060707]
Feb 13 07:11:31 Eeyore kernel: [    2.397478] ACPI: Processor [CPU1] (supports 8 throttling states)
Feb 13 07:11:31 Eeyore kernel: [    2.397755] ACPI (exconfig-0455): Dynamic SSDT Load - OemId [   AMI] OemTableId [  CPU2PM] [20060707]
Feb 13 07:11:31 Eeyore kernel: [    2.398048] ACPI: Processor [CPU2] (supports 8 throttling states)
Feb 13 07:11:31 Eeyore kernel: [    2.398054] ACPI Exception (acpi_processor-0677): AE_NOT_FOUND, Processor Device is not present [20060707]
Feb 13 07:11:31 Eeyore kernel: [    2.398060] ACPI Exception (acpi_processor-0677): AE_NOT_FOUND, Processor Device is not present [20060707]
Feb 12 15:54:21 Eeyore kernel: [27048.573561]  [softlockup_tick+156/240] softlockup_tick+0x9c/0xf0
Feb 12 15:54:21 Eeyore kernel: [27048.573674]  [__do_softirq+130/256] __do_softirq+0x82/0x100
Feb 12 15:54:21 Eeyore kernel: [27048.573683]  [ksoftirqd+0/240] ksoftirqd+0x0/0xf0
Feb 12 15:54:21 Eeyore kernel: [27048.573687]  [do_softirq+85/96] do_softirq+0x55/0x60
Feb 12 15:54:21 Eeyore kernel: [27048.573693]  [ksoftirqd+115/240] ksoftirqd+0x73/0xf0

Apart from this I also experienced:

My machine appeared to have locked up, although every 15 to 20 seconds it would unfreeze. I could then do things for a few seconds (e.g. switch from the X11 GUI to a console terminal), and then it would freeze again.

I was running GKrellM (http://gkrellm.net), and CPU usage was NIL on Core1, and high on Core0.

Ambient temperature was around 30 Degrees Celsius, and Humidity was approaching 100%.
CPU according to the BIOS was ~62 Degrees Celsius.

From what I can gather on Intel’s website, the CPU hardware has HW thermal protection:

The Intel Core Duo system also implements hardware-based thermal control. Hardware-based thermal management is intended to handle abnormal thermal conditions and to protect the die from transient effects. Hardware-based thermal control ensures that the CPU will always operate within specified conditions. This improves reliability and allows higher performance with tighter control parameters.

Legacy thermal control features implement two externally visible signals:

  • THERMTRIP: a fixed temperature sensor to detect catastrophic thermal conditions and to shut down the system if thermal runaway occurs.
  • PROCHOT: a fixed temperature threshold that provides the DVS with a self-control mechanism that drops frequency and voltage to a new working point (a more detailed description of this mechanism can be found in [1]).

http://www.intel.com/technology/itj/2006/volume10issue02/art03_Power_and_Thermal_Management/p03_power_management.htm

I believe this may be what is causing the syslog messages?

HTH.

4. On August 9th, 2009 at 4:16 pm Natalie said:

I’ve been experiencing the same issues with Ubuntu under Xen at VPS.NET. I run Ubuntu (Hardy) on VMware ESXi in production at work with no issues. I see this post is from January. Have the VMs remained stable since this update?

5. On August 9th, 2009 at 4:33 pm The migration saga continues…. – Multiply By Pi… said:

[...] and found that it may be an Ubuntu thing. Seems that other people have had the same problem with Ubuntu under VMWare.  Stuart suggested a fix which worked for 3 out of 4 of his VMs. The ubuntu forum he links to [...]

6. On February 26th, 2010 at 9:54 pm Phil said:

I’ve been having this problem using the TurnKey Linux Drupal appliance (Ubuntu 8.04 LTS) on VPS.net. It seems to happen every 12 hours or so.

I tried apt-get update / upgrade, but the problem didn’t go away. I’m now trying with noapic added to GRUB, and I’ll post my findings here.

Thanks for the post, it’s a big help.











© Copyright 2004-10 Stuart Colville, all rights reserved. May contain traces of Muffin.