[PlanetCCRMA] FC4 SMP kernels won't boot

Fernando Lopez-Lezcano nando@ccrma.Stanford.EDU
Thu Nov 24 14:09:00 2005


On Wed, 2005-11-23 at 23:55 -0800, Michael Gurevich wrote:
> Hi,
> 
> I just did a fresh install of FC4 and PlanetCCRMA. The machine is a little 
> weird - it has Dual Xeons with HT enabled and a SCSI BIOS/Boot ROM 
> thingie whose purpose I don't really understand. But we get along alright 
> and I don't think this is related to the problem. The SCSI HD has WinXP on 
> it, but I can dual boot from GRUB on an IDE drive which also has the FC4 
> install. 
> 
> The boot seems to start okay, gets to runlevel 5 and freezes on:
> 
> Bringing up interface eth0
> 
> Just now I left it at this stage for about 5 minutes and got:
> BUG: Unable to handle kernel NULL pointer at virtual address 0000008
> then it went all Matrix on me with junk scrolling down the screen. If I 
> do the Fedora interactive startup and skip loading the module the rest of 
> the boot goes normally. I know this isn't very helpful to diagnose what 
> the problem is but I'm not sure where to look. Is it possible to log what 
> happens in the kernel during a failed boot? /var/log/dmesg doesn't contain 
> anything noteworthy and /var/log/boot.log is empty.

Most probably you are hitting a kernel panic while the ethernet kernel
module is loading. That will not be logged to disk, because the kernel
does nothing else after the panic. You can, however, send the kernel
messages to a serial console (i.e. to another machine) and capture the
panic there. 
> 
> This happens with the smp kernels 2.6.13-0.3.rdt.rhfc4.ccrmasmp and 
> 2.6.12-0.21.rdt.rhfc4.ccrmasmp but the uni-processor versions of these 
> kernels boot and run smoothly.

Hmmm, not much to say except that it has to be a kernel bug (oh boy, am
I bright today!). 

What driver is eth0 using? Look in /etc/modprobe.conf; that driver is
most likely the culprit. What happens if you disable HT in the BIOS?
(Most probably the same thing, of course.) 
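For the driver question, a plain `grep eth0 /etc/modprobe.conf` is enough. If you want to list every ethernet interface at once, here is a small Python sketch. It assumes the usual `alias eth0 <module>` line format that FC4 writes into /etc/modprobe.conf; `eth_drivers` is a hypothetical helper for illustration, not part of any Fedora tool.

```python
import re

# Hypothetical helper: assumes interfaces are bound to drivers in
# /etc/modprobe.conf with lines like "alias eth0 <module>".
ALIAS_RE = re.compile(r"^\s*alias\s+(eth\d+)\s+(\S+)")


def eth_drivers(conf_text: str) -> dict:
    """Return {interface: module} for every 'alias ethN module' line."""
    drivers = {}
    for line in conf_text.splitlines():
        # Ignore anything after a '#', so commented-out aliases are skipped.
        line = line.split("#", 1)[0]
        match = ALIAS_RE.match(line)
        if match:
            drivers[match.group(1)] = match.group(2)
    return drivers


if __name__ == "__main__":
    with open("/etc/modprobe.conf") as conf:
        for iface, module in eth_drivers(conf.read()).items():
            print(f"{iface}: {module}")
```

Whatever module name it prints for eth0 is the one to skip in interactive startup, and the one worth mentioning in a bug report.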

I'm getting ready to release a more up-to-date kernel; I just need to
properly patch Jack so that it never uses the TSC for timing, or at
least not on dual core X2 Athlons. Stay tuned (or contact me off the
list if you want to be a guinea pig for it). 

-- Fernando