[PlanetCCRMA] another nvidia driver problem

Fernando Lopez-Lezcano nando at ccrma.Stanford.EDU
Sat Mar 21 17:47:55 PDT 2015


On 03/18/2015 05:29 PM, Fernando Lopez-Lezcano wrote:
> On 03/14/2015 10:43 AM, Juan Reyes wrote:
>>
>> Hi Oded, Chris,
>>
>> Thanks for the tip, Chris!
>>
>> Are you guys able to compile akmod on RT kernels?

Ah well, after two evenings of installing and rebooting, I have some
notes that might prove useful.

The end result: a Lenovo W540 with a functional bumblebee + NVidia binary
driver (argh!) setup on an RT patched kernel. Tested with jack + ardour3 +
ambdec + jconvolver and an RME Multiface on its ExpressCard interface, it
seems to run fine, with no xruns at 128x2 (128 frames per period, 2
periods). The laptop suspends and resumes (although the RME is not happy
after that and needs to be reinserted). HDMI output to a second monitor
works as well.
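The "128x2" setting translates into a nominal buffer latency of frames per period times periods, divided by the sample rate. The post does not say which sample rate was used, so the sketch below assumes 48 kHz purely for illustration; `jack_buffer_latency_ms` is a hypothetical helper, and it counts buffered frames only (converter and driver delays are ignored).

```python
def jack_buffer_latency_ms(frames_per_period: int, periods: int, sample_rate_hz: int) -> float:
    """Nominal buffer latency in milliseconds for a JACK period setting.

    Counts buffered frames only; converter and driver delays are ignored.
    """
    if frames_per_period <= 0 or periods <= 0 or sample_rate_hz <= 0:
        raise ValueError("all arguments must be positive")
    return 1000.0 * frames_per_period * periods / sample_rate_hz


# 128 frames x 2 periods; 48 kHz is an assumption, the post does not say.
print(f"{jack_buffer_latency_ms(128, 2, 48000):.2f} ms")  # prints "5.33 ms"
```

At 44.1 kHz the same setting works out to about 5.8 ms.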

So, here is what I did:

- install fc21 (Fedora 21) from a USB key, with UEFI enabled

- as Chris pointed out, I needed to add "i915.modeset=1
nouveau.modeset=0" to the kernel boot line when booting the installer

- once the install finished, reboot and add those parameters to the
kernel boot line again

- run "yum upgrade" to get the latest kernel and packages, reboot, and add:

----
i915.modeset=1 nouveau.modeset=0 acpi_osi="!Windows 2013"
----

to the kernel boot line (and afterwards to the GRUB configuration, so
they persist). The "acpi_osi" parameter is what lets bbswitch work
properly; otherwise the NVidia card is always ON, even when it is not
used. See:

https://github.com/Bumblebee-Project/Bumblebee/issues/592

- install bumblebee and follow the directions here for the NVidia binary
driver:

----
http://fedoraproject.org/wiki/Bumblebee
----
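Because the parameters have to be retyped until they are in the GRUB configuration, it can help to check what the running kernel actually received. A minimal sketch, assuming the boot line is available verbatim in /proc/cmdline; `parse_cmdline`, `missing_params` and `check_running_kernel` are hypothetical helpers, and when a parameter appears more than once only its last value is kept (a simplification).

```python
import shlex
from pathlib import Path
from typing import Dict, List, Optional

# Parameters from the steps above that should be on the running kernel's boot line.
REQUIRED: Dict[str, str] = {
    "i915.modeset": "1",
    "nouveau.modeset": "0",
    "acpi_osi": "!Windows 2013",
}


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None.

    shlex removes the double quotes around "!Windows 2013". If a parameter
    appears more than once, only its last value is kept.
    """
    params: Dict[str, Optional[str]] = {}
    for token in shlex.split(cmdline):
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(cmdline: str) -> List[str]:
    """Return the required parameters that are absent or have another value."""
    params = parse_cmdline(cmdline)
    return [f"{name}={value}" for name, value in REQUIRED.items()
            if params.get(name) != value]


def check_running_kernel(path: Path = Path("/proc/cmdline")) -> List[str]:
    """Read the running kernel's boot line and report what is missing."""
    return missing_params(path.read_text())
```

On the laptop, `print(check_running_kernel())` should print an empty list once all three parameters are in place.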

After the install I'm able to run "optirun glxgears" and get the higher
frame rate that indicates it is being accelerated by the NVidia GPU (you
can "cat /proc/acpi/bbswitch" to see whether the NVidia GPU is ON or
OFF). This is on the stock Fedora kernel. Then I installed a new
experimental 3.18.9-rt5 RT patched kernel that I built overnight. Booting
into this kernel works (ie: the screen is there), but the NVidia kernel
module is not compiled and bumblebee does nothing.
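Reading /proc/acpi/bbswitch returns a single line with the PCI address of the discrete GPU followed by ON or OFF. A small sketch that parses it; `parse_bbswitch` and `discrete_gpu_is_on` are hypothetical names, and the address in the test line is only an example, not necessarily the one on a W540.

```python
from pathlib import Path
from typing import Tuple

BBSWITCH = Path("/proc/acpi/bbswitch")


def parse_bbswitch(text: str) -> Tuple[str, str]:
    """Parse a bbswitch status line such as "0000:01:00.0 OFF"."""
    fields = text.split()
    if len(fields) != 2 or fields[1] not in ("ON", "OFF"):
        raise ValueError(f"unexpected bbswitch output: {text!r}")
    bus_id, state = fields
    return bus_id, state


def discrete_gpu_is_on(path: Path = BBSWITCH) -> bool:
    """True if bbswitch reports the discrete (NVidia) GPU as powered ON."""
    _, state = parse_bbswitch(path.read_text())
    return state == "ON"
```

Calling `discrete_gpu_is_on()` while idle and again while "optirun glxgears" is running should show the GPU switching from OFF to ON.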

You need to edit the /usr/sbin/bumblebee-install script and add
IGNORE_PREEMPT_RT_PRESENCE=1 at the beginning of the lines that start
with ./nvidia-installer. Otherwise the NVidia installer will complain and
refuse to build a kernel module for an RT patched kernel (and you will
never see the error, as the bumblebee script redirects all messages to
/dev/null).
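The edit can also be scripted. A minimal sketch, assuming the relevant lines in /usr/sbin/bumblebee-install begin (after optional indentation) with ./nvidia-installer as described above; `patch_installer_lines` and `patch_file` are hypothetical helpers, and `patch_file` keeps a .orig backup before rewriting the script (it needs root).

```python
import shutil
from pathlib import Path

# Prefix described above; it must precede each ./nvidia-installer call.
PREFIX = "IGNORE_PREEMPT_RT_PRESENCE=1 "


def patch_installer_lines(script_text: str) -> str:
    """Prefix every line that starts with ./nvidia-installer (after indentation).

    Lines that already carry the prefix no longer start with ./nvidia-installer,
    so running this twice leaves the text unchanged.
    """
    patched = []
    for line in script_text.splitlines(keepends=True):
        body = line.lstrip()
        if body.startswith("./nvidia-installer"):
            indent = line[: len(line) - len(body)]
            line = indent + PREFIX + body
        patched.append(line)
    return "".join(patched)


def patch_file(path: Path = Path("/usr/sbin/bumblebee-install")) -> None:
    """Rewrite the script in place (needs root), keeping a .orig backup first."""
    shutil.copy2(path, path.with_name(path.name + ".orig"))
    path.write_text(patch_installer_lines(path.read_text()))
```

The shell treats the prefix as an environment assignment for that one command, which is why it can simply be placed in front of ./nvidia-installer.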

After that, run /usr/sbin/bumblebee-install while booted into the RT
kernel and the kernel module will be built.
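Since the module is built for whatever kernel is running, it may be worth confirming that the RT kernel is the one booted before running the script. The sketch below assumes that RT patched kernels expose /sys/kernel/realtime containing 1, and otherwise falls back to looking for "-rt" in the release string (as in 3.18.9-rt5); both are heuristics, and the function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional

# Assumption: RT patched kernels normally provide this file containing "1".
REALTIME_FLAG = Path("/sys/kernel/realtime")


def is_rt_kernel(release: str, realtime_flag: Optional[str]) -> bool:
    """Guess whether a kernel is RT patched.

    Uses the sysfs flag when it is available; otherwise falls back to a
    naming heuristic (an "-rt" tag in the release string, as in "3.18.9-rt5").
    """
    if realtime_flag is not None:
        return realtime_flag.strip() == "1"
    return "-rt" in release


def running_kernel_is_rt() -> bool:
    """Apply is_rt_kernel() to the kernel that is currently booted."""
    flag = REALTIME_FLAG.read_text() if REALTIME_FLAG.exists() else None
    return is_rt_kernel(os.uname().release, flag)
```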

It seems to run fine in light testing.

Is it worth it? I don't know. In any case, it properly turns off the
NVidia GPU when it is not used, so battery life should be much better.
Actually, the older RT kernels I'm running on fc20 as I type this do not
turn it ON at all, which is why the hand rest to the left of the
trackpad stayed cool with those kernels. Newer kernels apparently
initialize more of the GPU, but that is useless without bumblebee and
optirun: the GPU is NOT physically connected to the display and will do
nothing but eat power.

I don't know which sound applications would benefit from being run with
optirun (ie: hardware accelerated). I tried ardour3 but did not see any
difference; at least it runs. In any case, the ability to run with
hardware acceleration is there if needed.

We originally got the W540 because it was pretty much the only laptop we
could find that had both an ExpressCard slot, which lets us keep running
RME cards, and the latest generation CPUs. But the internal architecture
is not nice for Linux, and the BIOS, well... there was an early version
that could BRICK the MOTHERBOARD[1] if you reinstalled from scratch (or
installed Linux). There are many things I don't like in this latest
generation of Lenovo laptops; I'll look elsewhere next time I upgrade.

Enjoy!
-- Fernando

[1] https://forums.lenovo.com/t5/W-Series-ThinkPad-Laptops/HOWTO-Brick-a-W540-in-easy-steps/td-p/1400393

----
We found the BIOS bug that causes this problem.  The bug is in BIOS code 
that implements Intel Rapid Startup Technology.  The bug causes BIOS to 
read the wrong portion of SSD and treat it as a memory size.  Because 
the SSD data being read in is actually random data, and not a memory 
size, by bad luck the data may cause a memory allocation that is too big 
and leads to a POST hang.  The other part of the bug is that the too-big
memory size is stored in BIOS NVRAM, so a power cycle (and even
disconnecting/reconnecting the CMOS battery) doesn't clear it.
Unfortunately the only solution is to replace the planar (motherboard),
but if you try to reuse the same SSD with the same partitions and the
same data, the problem will happen again.
----

PS: things I tried and could not make work... (MANY!)

== bumblebee + nouveau

I could not make this work. Trying to run optirun gives me an error;
specifying the card name and PCI id in a custom xorg.conf gets me
further, but then there is another error. Bumblebee wants to have a
driver configured, but configuring one does not solve the problem:
optirun still fails with something like "secondary GPU card not
detected". I gave up.

== nvidia + xrandr

Apparently there is also a way to install just the NVidia driver and use
it to render without installing bumblebee. You must make sure the proper
driver is started by X (ie: the Intel driver and NOT the NVidia driver).
Then there is an X configuration trick that uses xrandr to have the
NVidia card render and the Intel card show the results on the screen.
A pain.

http://us.download.nvidia.com/XFree86/Linux-x86/331.79/README/optimus.html

And then the trick:

http://us.download.nvidia.com/XFree86/Linux-x86/331.79/README/randr14.html
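The trick in the second README ends with a couple of xrandr calls made inside the running X session, after X has been configured as that README describes (the xorg.conf part is not reproduced here). The sketch only wraps those calls; the provider names "modesetting" and "NVIDIA-0" follow that README's example and may differ on a given machine ("xrandr --listproviders" shows the real ones), and the function names are hypothetical.

```python
import subprocess
from typing import List

# Provider names follow the example in NVIDIA's RandR 1.4 README; check
# "xrandr --listproviders" for the names on an actual machine.
SINK = "modesetting"   # Intel GPU, whose outputs show the image
SOURCE = "NVIDIA-0"    # NVidia GPU, which does the rendering


def offload_commands(sink: str = SINK, source: str = SOURCE) -> List[List[str]]:
    """xrandr calls that send the NVidia GPU's images to the Intel outputs."""
    return [
        ["xrandr", "--setprovideroutputsource", sink, source],
        ["xrandr", "--auto"],
    ]


def run_offload_setup() -> None:
    """Run the calls in order inside the X session; stop on the first failure."""
    for cmd in offload_commands():
        subprocess.run(cmd, check=True)
```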

Not worth the pain, I think.

== nvidia alone

And somebody claims he is able to run the NVidia driver on a W540:
http://askubuntu.com/questions/504078/w540-14-04-nvidia-driver-no-work
(apparently just the driver, no bumblebee or anything else)

He says he managed to make things work on one machine, tried to
replicate this on another, and failed. Weird. So I went down that rabbit
hole with no result to show. Maybe a different hardware revision of the
W540 that has a mux for the GPU? A different BIOS? (The BIOS on the
newer Lenovos is apparently a never-ending source of problems for
"exotic" operating systems like Linux.)


