[PlanetCCRMA] CCRMA FC3 smp kernel crashing

Fernando Lopez-Lezcano nando@ccrma.Stanford.EDU
Tue Jul 19 17:05:02 2005


On Tue, 2005-07-19 at 16:56, pirrone wrote:
> The system is a Dell 470 with dual 3.0 GHz Xeons, 1 GB RAM, dual SCSI
> drives in RAID 0, a PCIe nVidia Quadro NVS and an Audigy 2, running a
> previously stable Fedora Core 3.
> 
> I just updated the kernel to smp-2.6.11-0.3.rdt.rhfc3.ccrma, with all
> other updates installed, and have been getting sudden hard lockups during
> ordinary use: browsing the Web, working in apps, in a terminal window
> with MC open, etc. There is no apparent pattern at all. The uniprocessor
> kernels have no such problem.
> 
> I'm attaching an excerpt from /var/log/messages, with some unrelated
> lines removed and blank lines left in their place. These entries do not
> always appear when the system crashes: they have shown up twice, and the
> other times there is nothing related to the crash in the log at all.
> 
> While the system was unresponsive I tried to provoke it in the few ways
> I could, such as plugging in a USB flash drive and unplugging the USB
> modem (I was online the last time it happened). Those events were
> recorded in the log file, but nothing changed on the screen (Fluxbox
> with gKrellM in the corner). The display and input devices were
> completely frozen.
> 
> Fernando, or anyone: have you seen or heard of this happening with SMP,
> this kernel, realtime, nVidia, whatever? Got a fix?

Maybe. SMP with the preempt patch is (or was?) tricky. The latest
"official" released kernel is the one you are trying, but it is very
old. You should try the newer, more experimental kernels that live in
the "planetedge" repository. To access them, add a line to your apt
sources configuration file that matches the planetcore one, except that
"planetcore" is replaced by "planetedge". Run "apt-get update" and then
install planetccrma-core-edge-smp (and then disable the planetedge repo
in your configuration file until you need it again). The latest kernel
there should be 2.6.12-0.21.rdt.

-- Fernando
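The repository change described above amounts to copying the existing planetcore source line with one word changed, and later commenting that copy out again. A minimal Python sketch of that edit, operating on the lines of an apt sources file, is below. It is not part of the original reply: the helper names and the example source line (with its example.invalid URL) are hypothetical, and the real planetcore line and the file holding it (typically /etc/apt/sources.list or a file under /etc/apt/sources.list.d/) depend on how your system was set up.

```python
import re

# Match "planetcore"/"planetedge" only as whole words, so that
# "planetccrma" elsewhere in a URL is left alone.
CORE = re.compile(r"\bplanetcore\b")
EDGE = re.compile(r"\bplanetedge\b")


def is_comment(line: str) -> bool:
    """True for lines apt ignores because they start with '#'."""
    return line.lstrip().startswith("#")


def add_edge_source(lines: list[str]) -> list[str]:
    """Append a planetedge copy of each active planetcore source line."""
    out = list(lines)
    for line in lines:
        if CORE.search(line) and not is_comment(line):
            edge = CORE.sub("planetedge", line)
            if edge not in out:  # do not add the same line twice
                out.append(edge)
    return out


def disable_edge_source(lines: list[str]) -> list[str]:
    """Comment out active planetedge lines once the edge kernel is installed."""
    return ["# " + line if EDGE.search(line) and not is_comment(line) else line
            for line in lines]


if __name__ == "__main__":
    # Hypothetical planetcore line; copy the real one from your own apt config.
    core = "rpm http://example.invalid/planetccrma fedora/3 planetcore"
    enabled = add_edge_source([core])
    print("\n".join(enabled))
    print("\n".join(disable_edge_source(enabled)))
```

Between the two steps come the commands from the reply: "apt-get update" followed by installing planetccrma-core-edge-smp. Commenting the line out, rather than deleting it, makes it easy to re-enable the repository later.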

> Jul 19 14:07:16 dell470 kernel: irq 169: nobody cared!
> Jul 19 14:07:16 dell470 kernel:  [<c0147af4>] __report_bad_irq+0x24/0x80 (8)
> Jul 19 14:07:16 dell470 kernel:  [<c0147c0e>] note_interrupt+0x8e/0xb0 (20)
> Jul 19 14:07:16 dell470 kernel:  [<c0147857>] do_hardirq+0xe7/0x100 (24)
> Jul 19 14:07:16 dell470 kernel:  [<c0147983>] do_irqd+0x113/0x1e0 (28)
> Jul 19 14:07:16 dell470 kernel:  [<c0147870>] do_irqd+0x0/0x1e0 (40)
> Jul 19 14:07:16 dell470 kernel:  [<c013ad25>] kthread+0xa5/0xf0 (4)
> Jul 19 14:07:16 dell470 kernel:  [<c013ac80>] kthread+0x0/0xf0 (16)
> Jul 19 14:07:16 dell470 kernel:  [<c0102355>] kernel_thread_helper+0x5/0x10 (16)
> Jul 19 14:07:16 dell470 kernel: ---------------------------
> Jul 19 14:07:16 dell470 kernel: | preempt count: 00000002 ]
> Jul 19 14:07:16 dell470 kernel: | 2-level deep critical section nesting:
> Jul 19 14:07:16 dell470 kernel: ----------------------------------------
> Jul 19 14:07:16 dell470 kernel: .. [<c0340876>] .... _raw_spin_lock_irqsave+0x16/0x90
> Jul 19 14:07:16 dell470 kernel: .....[<00000000>] ..   ( <= 0x0)
> Jul 19 14:07:16 dell470 kernel: .. [<c013bd5d>] .... print_traces+0xd/0x40
> Jul 19 14:07:16 dell470 kernel: .....[<00000000>] ..   ( <= 0x0)
> Jul 19 14:07:16 dell470 kernel: 
> Jul 19 14:07:16 dell470 kernel: handlers:
> Jul 19 14:07:16 dell470 kernel: [<c02a7d30>] (usb_hcd_irq+0x0/0x70)
> Jul 19 14:07:16 dell470 kernel: [<c02a7d30>] (usb_hcd_irq+0x0/0x70)
> Jul 19 14:07:16 dell470 kernel: [<f93f41ed>] (nv_kern_isr+0x0/0x6a [nvidia])
> Jul 19 14:07:16 dell470 kernel: Disabling IRQ #169
> 
> Jul 19 14:07:24 dell470 kernel: NVRM: Xid: 16, Head 00000000 Count 000346ff
> Jul 19 14:07:28 dell470 kernel: NVRM: Xid: 8, Channel 0000001e
> Jul 19 14:07:32 dell470 kernel: NVRM: Xid: 16, Head 00000000 Count 00034700
> Jul 19 14:07:36 dell470 kernel: NVRM: Xid: 8, Channel 0000001e
> Jul 19 14:07:40 dell470 kernel: NVRM: Xid: 16, Head 00000000 Count 00034701
> Jul 19 14:07:44 dell470 kernel: NVRM: Xid: 8, Channel 0000001e
> Jul 19 14:07:48 dell470 kernel: NVRM: Xid: 16, Head 00000000 Count 00034702
> Jul 19 14:07:52 dell470 kernel: NVRM: Xid: 8, Channel 0000001e
> Jul 19 14:07:56 dell470 kernel: NVRM: Xid: 16, Head 00000000 Count 00034703
> Jul 19 14:08:00 dell470 kernel: NVRM: Xid: 8, Channel 0000001e
> Jul 19 14:08:04 dell470 kernel: NVRM: Xid: 16, Head 00000000 Count 00034704
> Jul 19 14:08:08 dell470 kernel: NVRM: Xid: 8, Channel 0000001e
> Jul 19 14:08:12 dell470 kernel: NVRM: Xid: 16, Head 00000000 Count 00034705
> Jul 19 14:08:16 dell470 kernel: NVRM: Xid: 8, Channel 0000001e
> Jul 19 14:08:20 dell470 kernel: NVRM: Xid: 16, Head 00000000 Count 00034706
> Jul 19 14:08:24 dell470 kernel: NVRM: Xid: 8, Channel 0000001e
> 
> Jul 19 14:10:13 dell470 kernel: BUG: Unable to handle kernel NULL pointer dereference at virtual address 00000000
> Jul 19 14:10:13 dell470 kernel:  printing eip:
> Jul 19 14:10:13 dell470 kernel: c01e33b7
> Jul 19 14:10:13 dell470 kernel: *pde = 34d92001
> Jul 19 14:10:13 dell470 kernel: Oops: 0000 [#1]
> Jul 19 14:10:13 dell470 kernel: PREEMPT SMP 
> Jul 19 14:10:13 dell470 kernel: Modules linked in: usb_storage(U) ipt_TCPMSS(U) ipt_limit(U) ip_nat_irc(U) ip_nat_ftp(U) iptable_mangle(U) ipt_LOG(U) ipt_MASQUERADE(U) iptable_nat(U) ipt_TOS(U) ipt_REJECT(U) ip_conntrack_irc(U) ip_conntrack_ftp(U) ipt_state(U) ip_conntrack(U) ppp_deflate(U) zlib_deflate(U) ppp_async(U) crc_ccitt(U) ppp_generic(U) slhc(U) nvidia(U) snd_seq(U) realtime(U) commoncap(U) md5(U) ipv6(U) parport_pc(U) lp(U) parport(U) autofs4(U) sunrpc(U) iptable_filter(U) ip_tables(U) video(U) container(U) button(U) battery(U) ac(U) cdc_acm(U) ohci1394(U) ieee1394(U) uhci_hcd(U) ehci_hcd(U) snd_emu10k1(U) snd_rawmidi(U) snd_seq_device(U) snd_ac97_codec(U) snd_pcm_oss(U) snd_mixer_oss(U) snd_pcm(U) snd_timer(U) snd_page_alloc(U) snd_util_mem(U) snd_hwdep(U) snd(U) soundcore(U) e1000(U) floppy(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) ext3(U) jbd(U) dm_mod(U) ata_piix(U) libata(U) aacraid(U) sd_mod(U) scsi_mod(U)
> Jul 19 14:10:13 dell470 kernel: CPU:    2
> Jul 19 14:10:13 dell470 kernel: EIP:    0060:[<c01e33b7>]    Tainted: P      VLI
> Jul 19 14:10:13 dell470 kernel: EFLAGS: 00010246   (2.6.11-0.3.rdt.rhfc3.ccrmasmp) 
> Jul 19 14:10:13 dell470 kernel: EIP is at get_kobj_path_length+0x27/0x40
> Jul 19 14:10:13 dell470 kernel: eax: 00000000   ebx: 00000000   ecx: ffffffff   edx: ffffffff
> Jul 19 14:10:13 dell470 kernel: esi: 00000001   edi: 00000000   ebp: f71f47b8   esp: f4927de0
> Jul 19 14:10:13 dell470 kernel: ds: 007b   es: 007b   ss: 0068   preempt: 00000001
> Jul 19 14:10:13 dell470 kernel: Process pppd (pid: 4880, threadinfo=f4926000 task=f773d7b0)
> Jul 19 14:10:13 dell470 kernel: Stack: 000000d0 f71f4794 f703a198 f71f47b8 c01e344c 00000246 c03b1eb0 f71f4794 
> Jul 19 14:10:13 dell470 kernel:        f703a198 f703ad0c c0261604 00000246 ffffffff ffffffff fffffffd 0000000a 
> Jul 19 14:10:13 dell470 kernel:        081a2bd7 f7e5d429 c03607f2 00000000 00000000 00000000 c03b1eb0 f703ad14 
> Jul 19 14:10:13 dell470 kernel: Call Trace:
> Jul 19 14:10:13 dell470 kernel:  [<c01e344c>] kobject_get_path+0x1c/0x70 (20)
> Jul 19 14:10:13 dell470 kernel:  [<c0261604>] class_hotplug+0x64/0x1b0 (24)
> Jul 19 14:10:13 dell470 kernel:  [<c01e40fb>] kobject_hotplug+0x1db/0x2d0 (64)
> Jul 19 14:10:13 dell470 kernel:  [<c01aa0a9>] sysfs_hash_and_remove+0xb9/0xf0 (60)
> Jul 19 14:10:13 dell470 kernel:  [<c01e381d>] kobject_del+0xd/0x20 (20)
> Jul 19 14:10:13 dell470 kernel:  [<c0261ab1>] class_device_del+0xa1/0xc0 (8)
> Jul 19 14:10:13 dell470 kernel:  [<c0261ad8>] class_device_unregister+0x8/0x10 (24)
> Jul 19 14:10:13 dell470 kernel:  [<f8ae34f2>] acm_tty_close+0xb2/0xf0 [cdc_acm] (8)
> Jul 19 14:10:13 dell470 kernel:  [<c022d88f>] release_dev+0x6af/0x7e0 (16)
> Jul 19 14:10:13 dell470 kernel:  [<c0103edd>] handle_signal+0xed/0x120 (16)
> Jul 19 14:10:13 dell470 kernel:  [<c0103fcb>] do_signal+0xbb/0x150 (28)
> Jul 19 14:10:13 dell470 kernel:  [<c0340fb8>] lock_kernel+0x28/0x50 (100)
> Jul 19 14:10:14 dell470 kernel:  [<c022de6f>] tty_release+0xf/0x20 (44)
> Jul 19 14:10:14 dell470 kernel:  [<c016bb16>] __fput+0x156/0x1a0 (8)
> Jul 19 14:10:14 dell470 kernel:  [<c016a168>] filp_close+0x48/0x90 (24)
> Jul 19 14:10:14 dell470 kernel:  [<c0104179>] sysenter_past_esp+0x52/0x75 (20)
> Jul 19 14:10:14 dell470 kernel: ---------------------------
> Jul 19 14:10:14 dell470 kernel: | preempt count: 00000002 ]
> Jul 19 14:10:14 dell470 kernel: | 2-level deep critical section nesting:
> Jul 19 14:10:14 dell470 kernel: ----------------------------------------
> Jul 19 14:10:14 dell470 kernel: .. [<c0340876>] .... _raw_spin_lock_irqsave+0x16/0x90
> Jul 19 14:10:14 dell470 kernel: .....[<00000000>] ..   ( <= 0x0)
> Jul 19 14:10:14 dell470 kernel: .. [<c013bd5d>] .... print_traces+0xd/0x40
> Jul 19 14:10:14 dell470 kernel: .....[<00000000>] ..   ( <= 0x0)
> Jul 19 14:10:14 dell470 kernel: 
> Jul 19 14:10:14 dell470 kernel: Code: 00 00 00 00 55 ba ff ff ff ff 89 c5 57 56 be 01 00 00 00 53 31 db 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 8b 7d 00 89 d1 89 d8 <f2> ae f7 d1 49 8b 6d 24 8d 74 31 01 85 ed 75 e9 5b 89 f0 5e 5f
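The first excerpt shows the kernel reporting that IRQ 169 received interrupts none of its handlers claimed, listing the handlers registered on that line (two usb_hcd_irq entries and the nvidia driver's nv_kern_isr), and then disabling the IRQ. When searching a longer messages file for similar reports, a small helper can pull those blocks out. The Python sketch below is hypothetical and not from the thread; it is written against the log format shown here and only reports which handlers shared a disabled IRQ. It does not identify which device caused the problem.

```python
import re

NOBODY_CARED = re.compile(r"irq (\d+): nobody cared!")
DISABLING = re.compile(r"Disabling IRQ #(\d+)")
# Handler lines look like "[<c02a7d30>] (usb_hcd_irq+0x0/0x70)".
HANDLER = re.compile(r"\((\w+)\+0x")


def disabled_irq_handlers(log_text: str) -> dict[int, list[str]]:
    """Map each IRQ the kernel disabled to the handler names it listed."""
    result: dict[int, list[str]] = {}
    in_report = False   # inside an "irq N: nobody cared!" report
    collecting = False  # past the "handlers:" line of that report
    handlers: list[str] = []
    for line in log_text.splitlines():
        if NOBODY_CARED.search(line):
            # Start of a new report; the call trace that follows is skipped.
            in_report, collecting, handlers = True, False, []
        elif not in_report:
            continue
        elif line.rstrip().endswith("handlers:"):
            collecting = True
        elif (m := DISABLING.search(line)):
            result[int(m.group(1))] = handlers
            in_report = collecting = False
        elif collecting and (h := HANDLER.search(line)):
            handlers.append(h.group(1))
    return result
```

For example, passing the text of a messages file to disabled_irq_handlers would return a mapping from IRQ 169 to the three handler names listed above. The "> " quoting used in this message does not affect the matching, since each pattern is searched anywhere in the line.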