Package Details: nvidia-l-pa 352.09-2

Package Base: nvidia-l-pa
Description: NVIDIA driver for linux (for Linux-l-pa)
Upstream URL: http://www.nvidia.com/
Category: modules
Licenses: custom:NVIDIA
Conflicts: nvidia-173xx, nvidia-96xx
Submitter: Ninez
Maintainer: Ninez
Last Packager: Ninez
Votes: 1
First Submitted: 2013-12-04 00:56
Last Updated: 2015-05-29 16:19

Dependencies (6)

Required by (0)

Sources

Latest Comments

Comment by unknown78

2015-05-30 01:32

@Ninez thanks a lot for taking the time. Just updated. It compiled fine and seems to run very well. At least I can't see any issues on my 970 right now, but I need to test longer.

Comment by Ninez

2015-05-29 16:22

@unknown78 - I've updated the package and enabled both patches again.

wbinvd just needed an adjustment. seems to compile fine now

nvidia-rt_mutexes needed a bunch of rebasing.

Can you please let me know how it works for you... my nvidia box currently isn't in use, so I can't really test it just now :\

if you have any problems, try disabling the patches and rebuilding. ttyl

Comment by unknown78

2015-04-26 16:09

some feedback: without the wbinvd patch everything is fine. With the wbinvd patch it currently doesn't compile:

-> apply nvidia-rt_explicit.patch
checking file nv-linux.h
Hunk #1 succeeded at 323 (offset 11 lines).
Hunk #2 succeeded at 911 (offset 23 lines).
-> apply nvidia-rt-no-wbinvd.patch
patching file nv-linux.h
Hunk #1 succeeded at 423 with fuzz 2 (offset 13 lines).
patching file nv-pat.c
Hunk #1 FAILED at 34.
Hunk #2 succeeded at 43 with fuzz 1.
1 out of 2 hunks FAILED -- saving rejects to file nv-pat.c.rej

Thanx so much and great work as always.
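
For anyone hitting the same kind of reject: a rough sketch (not part of the PKGBUILD) that boils a patch log like the one above down to just the failed hunks. The function name failed_hunks is made up for illustration, and it only assumes the "patching file ..." / "Hunk #N FAILED at ..." line format shown in the log.

```shell
# Summarize failed hunks from `patch` output read on stdin.
# Prints one line per failure: "<file>: hunk <n> FAILED".
# (failed_hunks is a hypothetical helper, not something the package ships.)
failed_hunks() {
    awk '
        # Remember the file that the following hunk lines belong to.
        /^(patching|checking) file / { file = $3 }
        # Report every hunk that patch could not apply.
        /^Hunk #[0-9]+ FAILED/ {
            hunk = $2
            sub(/^#/, "", hunk)
            print file ": hunk " hunk " FAILED"
        }
    '
}
```

Something like "patch -Np1 --dry-run -i nvidia-rt-no-wbinvd.patch | failed_hunks" should then show only the hunks that need rebasing, without touching the sources.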

Comment by Ninez

2015-04-25 01:32

[finally] updated for 3.18-l-pa [which i've just made available].

the wbinvd patch is disabled [may cause problems, but gets rid of high latency spikes], so is rt-mutexes patch [currently broken]...

Comment by unknown78

2015-01-24 21:56

Thanx alot ninez. I love your kernel :)

Comment by Ninez

2015-01-23 20:31

updated to 346.35

@unknown78 - I plan on fixing all of those patches on sunday. just updated for now, for a working module. [as of late i haven't been using mutex and kbuild patch doesn't affect runtime]...



Comment by unknown78

2015-01-19 21:43

:) for the last update i could fix the pkgbuild myself.
For 346.35 the patch fails. (nv-linux.h) Hunk #4 fails, and a rebase with new offsets might be useful too?

Comment by Ninez

2014-12-31 02:07

updated to 343.36 ... sorry for the really late update. I'll be speedy on the next one.

Comment by Ninez

2014-10-13 12:53

you're welcome. Sorry i was a bit late with the update! ;)

Comment by unknown78

2014-10-13 10:36

Thanx alot ninez.

Comment by Ninez

2014-10-12 01:24

updated to 343.22

Comment by unknown78

2014-10-06 21:26

Thanks a lot ninez for your continued work on this wonderful kernel and the environment around it.

Comment by Ninez

2014-08-15 17:38

got home early from work - updated to 340.32 :)

Comment by Ninez

2014-08-15 03:07

I won't have time to update this until tomorrow [to 340.32]. It's late here and i have to be up early, but as soon as i am home tomorrow, from work ~ i will get this updated :) thx

Comment by Ninez

2014-08-03 00:23

@pavlinux;

1. "provide no difference" to what?! - please speak more clearly, I have no idea what you are trying to say... If you are talking in terms of latency [?] only wbinvd patch will affect that [on PREEMPT_RT_FULL].

2. For someone who was asking me for more detail, just yesterday - you have given me ZERO details. I have no idea what nvidia card you have, or any other H/W details - regarding X failing to start, you haven't even provided any logs, etc. So i am not sure how you expect me to help you... that being said - you are the 1st person i know of that X has failed for - so i am very curious about your setup. [because if that patch is problematic on some h/w, i won't make it default] ...but note: nvidia-l-pa works on all of my machines, just fine [and other people that i know of]... This isn't a old/semi-borked box is it?

3. Which kernel are you using? [uname -a] You say 3.14.15 - but i don't support any kernel package but linux-3.14.12-rt9-l-pa [and older versions]. soo...? .. not only that but i don't see an incremental -rt patch for that kernel either [and i don't support mainline].

Comment by pavlinux

2014-08-02 22:58

... remaining patches provide no difference

Comment by pavlinux

2014-08-02 22:57

My x86_64, 3.14.15 kernel, does not start Xorg with nvidia-rt_mutexes.patch

Comment by Ninez

2014-08-01 03:34

@ pavlinux - sorry, that was a small mistake and i thought i had already updated to fix it, but i guess i didn't. [anyway, i just sat down in front of the AMD/Nvidia box, had a look. so..]

As far as semaphores vs. mutexes. Yes, I am aware of the differences, but I am also not the 1st [nor the last] to be using mutexes over semaphores in the nv driver. In my case, It solves all of the scheduling while atomic bugs on all of my nvidia/blob machines. I was also getting random hangs on one machine without it too. If you prefer not to use that patch - then comment it out. As far as writing you docs/descriptions. I don't see a problem with the ones that are there.

'nvidia-rt_explicit.patch' - sets CONFIG_PREEMPT_RT_FULL
'nvidia-rt_no_wbinvd.patch' - disables/noop's wbinvd() - which is a source of huge latency on nvidia/rt. [disabled/warning though, since it's a no-op!]
'nvidia-rt_mutexes.patch' - swaps out semaphore code with mutex code, which solves "scheduling while atomic" bugs in the driver [...possibly at the cost of a little more overhead.]
'nvidia_kbuildCommands_verbose.patch' - makes the kbuild commands/output more verbose.
'nvidia_useLDbfd_avoid_LDgold.patch' - avoids the ld.gold linker and uses ld.bfd instead.

[+more in the pkgbuild].

I don't know what more you want?

Comment by pavlinux

2014-08-01 01:23

Where is a full description of what each patch does, not just what it replaces?

Comment by pavlinux

2014-08-01 01:08

1. Did you check your patches?

# patch -Np1 --dry-run -i /usr/src/NVIDIA/nvidia-l-pa/nvidia-rt_mutexes.patch
patching file nv.c
Hunk #1 succeeded at 147 (offset 2 lines).
Hunk #2 succeeded at 1282 (offset 4 lines).
Hunk #3 succeeded at 1486 (offset 4 lines).
Hunk #4 succeeded at 1535 (offset 4 lines).
Hunk #5 succeeded at 1580 (offset 4 lines).
Hunk #6 succeeded at 1652 (offset 4 lines).
Hunk #7 succeeded at 1806 (offset 4 lines).
Hunk #8 succeeded at 1914 (offset 4 lines).
Hunk #9 succeeded at 1934 (offset 4 lines).
Hunk #10 succeeded at 1959 (offset 4 lines).
Hunk #11 succeeded at 1971 (offset 4 lines).
Hunk #12 succeeded at 2346 (offset 1 line).
Hunk #13 succeeded at 2928 (offset 20 lines).
patching file nv-linux.h
Hunk #1 succeeded at 137 (offset -2 lines).
Hunk #2 FAILED at 942.
Hunk #3 succeeded at 1579 with fuzz 2 (offset 78 lines).
Hunk #4 succeeded at 1645 (offset 81 lines).
1 out of 4 hunks FAILED -- saving rejects to file nv-linux.h.rej
patching file nv-mmap.c
Hunk #1 succeeded at 217 (offset -16 lines).
Hunk #2 succeeded at 357 (offset -9 lines).
patching file os-interface.c
patching file nv-frontend.c
patching file nv-procfs.c
---

2. Are you 100% sure that mutexes and semaphores are interchangeable?
RTFM: http://www.geeksforgeeks.org/mutex-vs-semaphore/

Comment by Ninez

2014-07-13 13:37

updated to 340.xx series.

Comment by Ninez

2014-07-13 12:36

@unknown78 - hey man. i ended up passing out / not getting to update. However, I am updating my system right now, so i will rebuild nvidia / update nvidia-l-pa, in the next little while

Comment by unknown78

2014-07-13 12:06

:) :) :) should i put more smilies Ninez. Man you doing a wonderfull job :)

Comment by Ninez

2014-07-12 11:17

Ill have this updated later this afternoon. [I have to work first though]

Comment by Ninez

2014-06-03 12:07

-2 just fixes a mistake in the pkgbuild. [i forgot to disable no_wbinvd patch]

Comment by Ninez

2014-06-03 11:52

updated to 337.25

Comment by Ninez

2014-06-02 21:43

I'll have this updated to match Arch repo in the next couple of hours

Comment by Ninez

2014-05-21 01:35

no changes, just updating to match nvidia in Arch repos. [and a linux-l-pa update is also sure to be coming very soon too].

Comment by unknown78

2014-04-21 07:28

Not sure when these messages first appeared. I usually only have journalctl running when i test, and all the delay messages pushed out the older logs.
Anyway i rebuilt your new kernel and need more testing, but maybe your softirq changes helped (at least it was all stable all morning). Need to test longer.

Comment by Ninez

2014-04-20 18:40

okay, so there is something funny going on in linux 3.14 in general, not just linux-l-pa, by the look of it... I'm not sure there is much that i can do about it. the device + kernel combo is definitely buggy [you didn't have this problem on linux-3.12 right?].

Comment by unknown78

2014-04-19 21:17

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1191603 << something i found

Ninez, that's a good question, whether it improves the situation. The problem is that i can't reproduce it. Overall the mainline kernel doesn't seem to produce the complete kernel trace, but it does log the following messages:

Apr 19 20:45:44 archbox pulseaudio[3672]: [alsa-sink-USB Audio] alsa-sink.c: ALSA woke us up to write new data to the device, but there was nothing to write!
Apr 19 20:45:44 archbox pulseaudio[3672]: [alsa-sink-USB Audio] alsa-sink.c: This is most likely a bug in the ALSA driver 'snd_usb_audio'. Please report this bug to the ALSA developers.
Apr 19 20:45:44 archbox pulseaudio[3672]: [alsa-sink-USB Audio] alsa-sink.c: We were woken up by the POLLOUT set, however a subsequent snd_pcm_avail() returned 0 or another

^^ translated from German; here is a bug report in English which could be related: https://bugs.launchpad.net/ubuntu/+source/pulseaudio/+bug/1284415.

Comment by Ninez

2014-04-19 20:28

hmmm... i would see if mainline/Archlinux kernel does the same thing (?)

did disabling NO_HZ_FULL at least improve the situation - or no change at all??

I would google your errors a bit and see if you can turn up anything of interest - i'll probably have a look around too, in a bit.

Comment by unknown78

2014-04-19 18:24

Sad news :/
I was working today with qt-creator, and the problems with the soundcard occurred again. But i got more information in the log. http://sprunge.us/dHcE

After this the delay line keeps repeating.

Comment by Ninez

2014-04-18 18:31

@unknown78 - thx. I am glad everything is working :) - on major/new kernel releases [especially on PREEMPT_RT_FULL] there are always issues that crop up like this, I'm just glad that we were able to identify NO_HZ_FULL as the problem, as it really seems to be in bad shape for 3.14-rt. [so thank you for your help!]...and yeah, my one machine was at just under 2 days uptime before rebooting today.

That is also good to know that nvidia is serving you well [since i enabled the mutexes patch in 337.12]. also, very helpful info... at some point, if you like, try out the nowbinvd patch and let me know whether it works or not. [it's worthwhile to check, as if it works - nvidia won't throw determinism out the window (adding 1000ns delays in cyclictest. with the patch applied, that number drops to double-digits)]...

Comment by unknown78

2014-04-18 18:17

Ok linux-l-pa & nvidia-l-pa (all default) runs stable since hours now.
So thanks again ninez, you did an awesome job (also i bookmarked the patch in case it is ever needed).

Comment by Ninez

2014-04-18 03:03

@unknown78 - i did a quick dig, about your USB problem.

In linux-3.13 they removed nrpacks, this is the patch; http://pastebin.com/P3NRwriZ you could reverse the patch on your own sources to test and see if the problem goes away. [?] i don't know, it sounds like, from reading the commit - that the patch was supposed to have more correct/better behavior - maybe yours is a corner-case still to be ironed out?

Comment by Ninez

2014-04-18 02:09

you may think it changed something - but i *know* it did bro! ;)

The new no_hz code is dicey on PREEMPT_RT_FULL, at best. also, by re-introducing the threadsirq stuff - i have fundamentally changed how our systems are running on -rt... But do test more and let me know - as i am having some interactions relating to this that span a few important mailing lists - and you, like myself and others - have been affected by this stuff - so it's important that upstream is aware [even if i have already made some good choices, as i have been weeding out problems, making improvements / different design choices]...

so ya, test more - let me know what blows up ;) [and good news too... but fyi; this should be the best -rt kernel in a long while, like 2.6.33-rt good :) ...which this last release is very similar to].

Comment by unknown78

2014-04-18 02:01

Ninez you get a big thumbs up. I think removing all the NO-HZ stuff did change something.

Usually once i get delay messages they don't stop. Now i got only 1. Need to use it longer and test more :)

Comment by Ninez

2014-04-18 00:30

I'm not sure. snd_usb_audio driver does get a lot of use, maybe the device just has some quirkiness to it... yeah, i think nrpacks disappeared recently [i used to use it].

fyi - I pushed my kernel to AUR. I'm sick of/done with NO_HZ_FULL and this kernel is just better.

Comment by unknown78

2014-04-18 00:09

Here are the information of the soundcard http://sprunge.us/KFPJ.

I had hope nrpacks=1 still would work, but modinfo doesn't show this param anymore.

modinfo snd_usb_audio | grep parm
parm: index:Index value for the USB audio adapter. (array of int)
parm: id:ID string for the USB audio adapter. (array of charp)
parm: enable:Enable USB audio adapter. (array of bool)
parm: vid:Vendor ID for the USB audio device. (array of int)
parm: pid:Product ID for the USB audio device. (array of int)
parm: device_setup:Specific device setup (if needed). (array of int)
parm: ignore_ctl_error:Ignore errors from USB controller for mixer interfaces. (bool)
parm: autoclock:Enable auto-clock selection for UAC2 devices (default: yes). (bool)
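
A small sketch for checking this without reading the list by eye. It assumes the "parm: name:description" line format that modinfo prints above; the helper name has_parm is made up for illustration.

```shell
# Succeeds if the `modinfo <module>` output on stdin lists parameter $1.
# (has_parm is a hypothetical helper name.)
has_parm() {
    # Match "parm:", optional whitespace, then the exact parameter name and a colon.
    grep -q "^parm:[[:space:]]*$1:"
}
```

For example, "modinfo snd_usb_audio | has_parm nrpacks || echo 'nrpacks is gone'" should report the missing parameter on a kernel where it was removed.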

Comment by unknown78

2014-04-17 23:57

I'm not sure about NOHZ on a desktop multimedia machine either, as long as it is 'kinda' experimental.

Just to give you an idea what i tested today:
- with NOHZ=full kernel param (still had a hardlock)
- with tsched=0 pulseaudio (more stable but had another hardlock)
- funnily enough, each time nvidia was doing more than just painting the 2D UI.
- maybe it's a combination of both high usb workload (usb sounddevice) and longer graphical tasks

I think i really need to give the official Arch kernel a try and see if it hardlocks too. I might also set up an ssh login and PuTTY on my laptop. But it's 2AM over here ... need a little rest. So i will go on tomorrow

Comment by Ninez

2014-04-17 23:53

interesting. What is the [usb] soundcard? - maybe there is a documented module parameter that might help or something?

Regardless, I probably am still going to go, as planned and ditch NO_HZ_FULL + include the threadsirq patch, as it has some usefulness, for those [like myself] who want it... + going back to old tick should be more stable and deterministic than the flaky no_hz stuff anyway...

Comment by unknown78

2014-04-17 23:38

(cat /proc/meminfo ; cat /proc/meminfo) | grep KernelStack
KernelStack: 4512 kB
KernelStack: 4512 kB

Ok i think i know what might produce the hardlocks. Just no solution to avoid cause i think it's a broken kernel driver.

Basically my external usb soundcard seems to create the problems when i keep the line input connected to the output using pulseaudio loopback.

I now use my AC97 onboard input and it seems to be stable ... but small crackles.

So overall i wouldn't put it towards your kernel, but i would be in need to test much more.

Comment by Ninez

2014-04-17 23:13

@unknown78 - can you execute;

(cat /proc/meminfo ; cat /proc/meminfo) | grep KernelStack

...and post the results to me, when you get a chance?

Comment by Ninez

2014-04-17 22:46

1) I haven't updated PA yet [and probably won't till gnome-3.12 settles down... so i can avoid a bunch of breakage - i use gnome with compiz, so IME, it's best to wait... PA 5 could be part of the problem. I won't be testing it for another week or so.].

2) I doubt the cause is BFQ [although, i am not excluding it totally, either].

3) nvidia-l-pa [the now default 'mutexified' nvidia.ko _should_ be good. I've tested it for a very long time]. the no_wbinvd patch, as i warn, may cause instability on some H/W - i know of one person... but i imagine there is some cuda 6.0 UVM stuff that requires that call being there...but i don't know [yet ;)].

4). I am seriously considering disabling NO_HZ_FULL in my kernel. Even today, someone is on the linux-rt list, who reverted the same patches, based on reading my posting and saw some improvements - i don't think NO_HZ_FULL is ready for prime-time yet... In fact, 18 hours ago - i built a new kernel that i am testing with NO_HZ_FULL disabled and switched back to the more reliable "old tick" method. It may use slightly more overhead/power, but it seems more reliable and it's code paths don't seem tied into the new method[s] that seem to be a good part of the problems with linux-rt 3.10+... Not only that, but there had been a patch floating around to bring back sirq threads on 3.12-rt that i've been dying to test it out - so i'm doing just that ;) [it's optional/but builtin, you just pass 'threadsirq' to the kernel commandline to enable].

it allows the user to be able to modify softirqs with more granular control- and depending on workload, may be useful. here's what it looks like; http://pastebin.com/M70QQip5 ... [you can run that 'ps' command, on your system to compare layouts]. the patch suggests that it's possible to reduce jitter in some cases to boot... So with all of this in mind - If this new kernel works out - I will probably ditch NO_HZ in favor of this.

Also, I experienced one lockup yesterday [although, i suspect using another machine i could have ssh'd into the box], i just didn't have the time to do so. My lockup happened from video on youtube [flash, H/W accelerated] but I read in either nv dev forums, or Archlinux forums that other people were having some similar issues on linux 3.14. - this lockup was also what prompted me to rethink NO_HZ_FULL [and the NO_HZ messages, and noise on the RT list, etc]... Let me know, if you figure out something that is reproducible and/or culprit - but barring no problems, i suspect I will be changing a few things for the next update to work out stability and performance on 3.14-rt...

let me know what you think? [and i could probably push an update in a couple of hours, if you want it, specifically - since your current running kernel isn't working all that well... it's up to you]
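
Since 'threadsirq' is only active when passed on the kernel commandline, here is a quick sketch for confirming what the running kernel was actually booted with. The helper name cmdline_has_flag is made up for illustration; it just reads /proc/cmdline (or another file given as the second argument).

```shell
# Succeeds if the bare flag $1 appears on the kernel command line.
# $2 is the file to read and defaults to /proc/cmdline.
# (cmdline_has_flag is a hypothetical helper name.)
cmdline_has_flag() {
    # Put each space-separated option on its own line, then match the whole line literally.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

e.g. "cmdline_has_flag threadsirq && echo 'threaded softirqs requested'".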

Comment by unknown78

2014-04-17 17:58

Ok, it wasn't the scheduler; CFQ crashes too. So now checking if the arch main kernel is stable.

Comment by unknown78

2014-04-17 16:31

Hi Ninez here some feedback towards linux-l-pa and nvidia-l-pa. I'm still in testing since i experienced hardlocks.

So far what i got:
1.) I changed back to pulseaudio 4.0-6, because with pulseaudio 5.0-1 and default-fragment-size-msec = 5 in daemon.conf (i mainly use a usb headset, https://bugs.archlinux.org/task/39186) i get the errors mentioned in the bug and a huge amount of the following lines "kernel: delay: estimated 265, actual 45". I'm fairly sure this is not what produces the hardlocks, but i wanted to fix it to be 100% sure.

2.)I enabled BFQ Scheduler. Whenever i experience a hardlock, even journald doesn't write anything down. I have the feeling that this might be the reason for the hardlocks. I changed probably 30 Minutes ago to CFQ Scheduler and i will watch if that makes it stable again.

3.) I did not notice any problems with nvidia-l-pa in my logfiles (even though they were screwed up because of point 1). So i think the package and default settings are fine. Now testing if it is stable with the wbinvd patch too.

4.) I don't think i saw this NOHZ warning. But after i experienced drop outs while using loopback module on the usb device (capture my laptop pc's out and mix it with my pc's audio stream) i reorganised all my usb devices (lsusb -t) so my usb audio device has a full usb bus to itself.

Sidenote yeah you could have remembered my config.

Comment by Ninez

2014-04-16 05:06

Archlinux pushed lib32-nvidia-utils and friends into [extra]... so we are good now [no need for links to packages] ;) [ i just noticed now ]...

Comment by Ninez

2014-04-16 05:02

okay, thanks. correct me if i am wrong - but i already knew this, no? [i was double-checking...] Also, with the AMD problems on newer linux-rt, i did want to hear some feedback from AMD users, even if to just verify that reverting those patches allows their machines to boot...

but yeah, let me know about stability, for sure. i sometimes see the "NOHZ: local_softirq_pending 40" but otherwise things seem pretty good. [similar to 3.12-l-pa, although i haven't done huge benchmarks on it yet... I can however compare nvidia-337.12, as i first tested it on 3.12-l-pa].

Comment by unknown78

2014-04-16 04:15

Ninez this is my cpu -> AMD Phenom(tm) II X6 1090T Processor on a
GA-990FXA-UD5 with IOMMU enabled. Now need some time to play around with the new kernel and nvidia module to check if it is stable.

Comment by Ninez

2014-04-16 01:59

@unknown78 -lol. you confused me with this comment; "besides it seems that 337.12 has landed in main repository too." ... so i hit the main page, but my eyes played tricks on me - it's NOT in main, and yes, is in testing [still].

I guess i will need to link to those binaries, on this page - but still i am glad we got it worked out for you - as I wouldn't want you to be stuck on the commandline / no Xorg...

I'm not sure why you were using -Rd [pacman?], but you shouldn't need to do that. With nvidia-l-pa - you can just uninstall [ -R ], then install nvidia-utils [on update] - i do this all of the time, since i test against/switch between both [main] and nvidia-beta drivers. [you will also note in my pkgbuild that depends() has an option for nvidia-beta commented out.]...

I just gotta dig up the links to 337.12 nvidia stuff now, but thanks for confirming that 337.12 is working on linux-l-pa 3.14-rt1-2. that is helpful. btw, what is your CPU? [i'm curious].

Comment by unknown78

2014-04-15 23:03

Ok 337.12 builds now fine with nvidia-utils from testing. So all seems to be fine.

Comment by unknown78

2014-04-15 22:30

I think removing with -Rd and removing all former PA directories, then reinstalling did the trick. Now waiting for all the nvidia packages to show up. It's crazy, nvidia is already at 337.12 but all the other packages aren't (lib32 aren't even in testing). So i think the best is to just relax.

Comment by Ninez

2014-04-15 21:37

k, i just quickly built nvidia-l-pa using yaourt/AUR and it worked fine.

I'll be home in a while.

Comment by Ninez

2014-04-15 21:32

@unknown78 - no, it doesn't mean i have found something, but it means i can actually see what you are showing me. [ showing it truncated is harder to tell where it failed ].

Comment by Ninez

2014-04-15 21:30

@unknown78 - I am heading out the door [ i actually delayed leaving to try to help sort you out, but it'll take you a while to recompile, etc and i got to get some stuff done before stores close ]. I can check my email while i am gone - so drop me a line either way [if it fails or works wonderfully!].

i shouldn't be out more than an hour or so - so if there is a problem, i should be able to address it quickly. lates

Comment by unknown78

2014-04-15 21:24

Sure you can delete and yeah i always forget about pastebin :/ my fault.
When you say "thanks for posting the entire output ..." means you have found something?

Comment by Ninez

2014-04-15 21:17

@unknown78 - yeah, i would do that, as that is the only thing i can think of... you see, release -1 of linux-l-pa didn't have the sed bit, which was handled in nvidia-l-pa -> but after some thought, i decided to switch them [as the module symbol problem exists in linux kernel], which may have led to some confusion here. which is sort of my fault, so sorry for that.

btw, thanks for posting the entire output - that was helpful. - but i am going to delete it to keep my post near the top [maybe nexttime use pastebin website].

anyway, it _should_ work, as i tested on several machines, both 334 and 337 on linux-l-pa 3.14.0_rt1-2, before updating. [plus 24hour test run periods... been using 337 since april 8th].


Comment by Ninez

2014-04-15 21:04

@unknown78 - I've updated to 337.12, but first i obviously re-built the package, installed it, reloaded nvidia, etc to test...

give it a try now. nvidia-l-pa should install fine, as i am using it on several machines of my own. [both 334 and 337 work fine here, all revisions].

Comment by unknown78

2014-04-15 20:46

1.) i got the build directory (linux-l-pa) ... uncommented the sed lines .. makepkg .. pacman -U packages
2.) reboot
3.) build nvidia-l-pa -> and i got the error

Comment by Ninez

2014-04-15 20:38

unknown78 - give me a minute - i will update to 337.12 - as it wasn't in Archlinux repo yet. Then we will revisit your issue...

but off hand;

1). don't you mean; uncomment the sed lines and THEN build linux-l-pa?

2). your 'make' command looks different than what the package does [?]

anyway, let me push a 337.12 update, first.

cheerz

Comment by unknown78

2014-04-15 20:29

1.) made linux_l_pa and uncommented the sed lines
2.) tried to install the new nvidia-l-pa and get an error
make "CC=cc" NV_MODULE_SUFFIX= KBUILD_VERBOSE=1 -C /usr/lib/modules/3.14.0-rt1-2-l-pa/build SUBDIRS=/tmp/yaourt-tmp-mgnad/aur-nvidia-l-pa/src/NVIDIA-Linux-x86_64-334.21-no-compat32/kernel ARCH=x86_64 modules
make[1]: Entering directory '/usr/src/linux-3.14.0-rt1-2-l-pa'
make[1]: *** No rule to make target 'modules'. Stop.
make[1]: Leaving directory '/usr/src/linux-3.14.0-rt1-2-l-pa'
Makefile:170: recipe for target 'nvidia.ko' failed
make: *** [nvidia.ko] Error 2

besides it seems that 337.12 has landed in main repository too.

Comment by Ninez

2014-04-15 17:29

Updated to 334.21-5

- I've reverted the sed command in the PKGBUILD. This is now handled in linux-l-pa [thus, update linux-l-pa before this package], where nvidia users *must* uncomment a sed command that will change __rt_mutex_init module export to EXPORT_SYMBOL [which is what it should be! ...which i have also noted in linux-l-pa comments]. This allows nvidia-uvm.ko [which arrives in 337.12] to install on PREEMPT_RT_FULL and allows my 'mutexified' version of nvidia.ko to install on PREEMPT_RT_FULL, as well - while allowing nvidia to stay a 'tainted module'.

- In this release nvidia-rt_mutexes.patch is now default. [I've been using it for months on several different H/W configurations, I've tortured those systems, etc ... and using mutexes always seems to work better + no scheduling bugs in the ring buffer.]

note: now that i have fixed linux-l-pa to handle the module export change - we are ready for the 337.12 nvidia update, when it hits Archlinux :)

enjoy.
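
The exact sed line from the linux-l-pa PKGBUILD isn't quoted in this thread, so the following is only a sketch of what such a substitution could look like. The helper name ungpl_export is made up, GNU sed is assumed for -i, and the kernel source file that holds the export has to be looked up in your own tree and passed in.

```shell
# Rewrite EXPORT_SYMBOL_GPL(<symbol>) to EXPORT_SYMBOL(<symbol>) in one file.
# $1 = symbol name, $2 = path to the kernel source file containing the export.
# Assumptions: GNU sed (for -i); ungpl_export is a hypothetical helper name.
ungpl_export() {
    # Parentheses are literal in sed's basic regex syntax, so no escaping is needed.
    sed -i "s/EXPORT_SYMBOL_GPL($1)/EXPORT_SYMBOL($1)/" "$2"
}
```

Called as "ungpl_export __rt_mutex_init path/to/file.c", it only touches that one symbol and leaves every other EXPORT_SYMBOL_GPL line alone.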

Comment by Ninez

2014-04-14 17:09

updated for linux-l-pa 3.14 ...

In this release you cannot use the nvidia-rt_mutexes.patch ... I will [hopefully] have this remedied for 337.xx drivers - just waiting to hear back from some linux-devs on an issue - also am reporting to nvidia the issue. [**as it stands you can't compile 337.xx against an RT kernel].

***IMPORTANT NOTE: you probably will want to avoid nvidia updates from Archlinux for a few days, in fact, wait until i ship nvidia-l-pa 337.12 [ as 337.xx is in testing ] which we cannot use/package until some module symbol issues are resolved. [described below].

Since, i can't package 337.xx but 334.xx is workable for 3.14-l-pa - i thought it best to update now, since it will allow nvidia users to upgrade, where if i had waited for 337.xx [nvidia-utils/libgl] to hit the repos - you would have no upgrade path for nvidia... It's a compromise, but a worthwhile one i think :)

EDIT: I have notified nvidia; https://devtalk.nvidia.com/default/topic/729914/linux/__rt_mutex_init-has-changed-to-export_symbol_gpl-causing-337-12-to-not-compile-on-preempt_rt_full/

and here to linux-rt devs; http://www.spinics.net/lists/linux-rt-users/msg11639.html

Comment by Ninez

2014-04-14 00:51

just another head's up; I've got 3.14.0-l-pa up and running;

[ninez@localhost ~]$ uname -a
Linux localhost 3.14.0-rt1-2-l-pa #1 SMP PREEMPT RT Sun Apr 13 15:28:48 EDT 2014 x86_64 GNU/Linux

...coupled with nvidia-l-pa 337.12 driver. However, this has led to an issue, as linux developers changed __rt_mutex_init from EXPORT_SYMBOL (which nvidia can use) into EXPORT_SYMBOL_GPL (which nvidia can't use), as that symbol is for GPL only modules...

One easy workaround is to change nvidia's module license in the sources to GPL - however, this isn't a proper solution / and likely couldn't be used to package the driver. - I've gone ahead and brought it up on the linux-rt user list, but may hit LKML too, as this issue needs to be resolved.

so for now, i can't update linux-l-pa or nvidia-l-pa -> but i will as soon as i can / once i can get some resolution on this.



Comment by Ninez

2014-04-09 00:19

Just a heads up - I am already using/testing the 337.xx nvidia driver (currently 337.12 released today). I've rebased all of my patches against this latest driver to prepare for when 337.xx hits a Stable release /makes it into Archlinux repo...

assuming that this driver works flawlessly for me - I will likely enable the nvidia-rt_mutexes.patch, since it has been reliable for months now for myself and a few others that have been using it. (again, without the mutex code - i get the occasional noize/backtrace. with mutexes (instead of semaphores) i experience ZERO backtraces from nvidia...so i think it is time to enable it :)

Comment by Ninez

2014-03-05 01:25

lulz. In my haste to update [and some confusion due to not enabling the mutexes patch in AUR], i forgot to transfer over [from my own nvidia-l-pa-beta] the revised (and thanks to nvidia -> simplified) nvidia-rt_mutexes.patch over to the stable series...

fixed on this 2nd, quick update ( for those using that patch...which btw, i am thinking about enabling by default soon - so if you do use it, or are willing to test - please give me some feedback! ).

Comment by Ninez

2014-03-04 02:34

a new stable release is out (334.21). I'll update when Arch is shipping a new nvidia-utils... I've been using 334.18 [beta driver] since it's release (aside from testing 331.49), so 334.21 should be a good release / work well.

cheerz

Comment by Ninez

2014-03-02 12:54

updated :)

@unknown78 - no problem. For me, it is very important that i don't push any updates that may break someone's system [and in the case of newer -rt patches, i was well-aware that it could cause problems for users, so I won't update until it is resolved]... The last thing anyone wants (including myself) is to update their system, only to reboot and discover a big problem; like an oops, boot failure, deadlock, etc. For me, a bunk kernel isn't a big deal; as i keep around binaries of my previous kernels + nvidia - so i just boot into Arch's kernel and cd into that directory, then issue "sudo pacman -U *.pkg.tar.xz", installing my previous running kernel.

but obviously, i can't be sure what other users do in that sort of circumstance - so it is better to proceed with caution :) ... Anyway, i am getting ready for work - but when i get home i will test my built kernel of the latest -rt, hopefully it will boot - if not, i will continue with git-bisect on -rt8 to -rt9 and see what happens... cheerz

Comment by unknown78

2014-03-02 11:53

Thanx a lot for the info and for being that careful about the runnable state of the kernel.

Comment by Ninez

2014-03-02 04:52

@unknown78 - i just got in (a bit later than planned). I just installed nvidia's latest blob. Even though i am sure this release is fine, i am going to run it overnight before pushing to AUR.

as far as the next -l-pa release goes: I had hit the linux-rt user list about boot problems, and Sebastian had said that it may have been a mainline commit that broke it on some AMD cpus, but i have not been able to determine that, and certainly not within the range of commits where it should have appeared.

now, i am left testing -rt9 and doing a git-bisect. Although i am positive -rt is at fault here, i haven't sorted it out. [I'm also compiling the latest linux-rt patch right now, to see if anything has changed... tomorrow night i plan on doing the git bisect if the latest -rt still fails]...

as far as linux-3.13, i won't be releasing one. My kernels follow -rt, which is: 3.4, 3.6, 3.8, 3.10, 3.12, etc... so i won't have a new major release until 3.14 is released and likewise BFQ, UKSM and -rt for 3.14 are released.

Comment by unknown78

2014-03-01 14:34

thanx much for the information and have a good time (besides the work) with your brother.

Maybe when you are back you could also give a hint whether you might update the kernel with a new rt patchset or maybe a 3.13 kernel?

Again thanx for all the effort.

Comment by Ninez

2014-03-01 14:19

Hey man,

It's totally possible and i will do it later tonight - just heading out the door to help my brother move into his new house [otherwise i would do it now, last i checked Arch hadn't updated but i see they have now]... cheers

Comment by unknown78

2014-03-01 14:17

Hi Ninez

please update to 331.49 if possible. The same problem might be with nvidia-rt.

Comment by Ninez

2014-02-21 14:21

that's funny, I am not sure why this has gotten flagged - until Arch is shipping a 334.x driver, i am not updating this package. [locally, i use it against -beta drivers, so i am already using 334.18]...

I'll update when Archlinux ships newer nvidia blob.

Comment by Ninez

2014-01-29 02:31

Sorry for the late reply!

hmmm... I'm not really sure - i do keep linux-rt around for periodic testing/benchmarking against linux-l-pa. but I've never bumped into that kind of thing before - so i am at a bit of a loss too?!

I don't see how it's possible that a non-running kernel, could affect the running one / or produce those logs - it's really odd.

I'm gonna free up some time this weekend, and see what i can do / via some comparisons / testing.

Comment by unknown78

2014-01-25 23:18

Ok, first: my last report was under the linux-rt kernel. I booted it 100%, uname shows it.

The question for me was how l-pa errors could get into this kernel's log (as kern.log suggested).

I never had any issues with rt7-l-pa (if i boot this kernel).

The problems always occurred on kernel-rt with nvidia-rt.
They never occurred with kernel-l-pa and nvidia-l-pa.

Comment by Ninez

2014-01-25 15:09

@unknown78 - it occurred to me this morning, that -rt9 may have fixed the problem you've hit on -rt7-l-pa. I am in the process of compiling -rt10-l-pa (rt10 was released this morning). I've also picked up a few Arch/upstream RPC-related patches...

Maybe test -rt10-l-pa when i upload it and then see if you can reproduce any hangs. (i should have it up on AUR, in a couple of hours).

cheerz

Comment by Ninez

2014-01-25 03:26

@unknown78 - After a bit of research, it does appear that the warning (that you posted) should NOT have been fatal, but is a sign of a little sketchiness with CONFIG_NO_HZ_FULL... I would tend to think that you should be able to produce the same error on linux-rt. (ie: this shouldn't only happen on -l-pa, as it is a 3.12/-rt related bug, nothing specific to linux-l-pa)... That is also why i am wondering how you produced it and if it can be reproduced (?).

regarding the graphical glitches, if it was in fact nvidia-l-pa where you saw that - i can only assume that _if_ no_wbinvd.patch was enabled, you may have had problems due to that. but obviously, you were testing different patches, so i can't be sure - that is something that you will have to figure out / try to reproduce on your end.

* but remember; none of those patches (mutexes or no_wbinvd.patch) are intended to be enabled by default. The no_wbinvd patch is really just there to show where the latency spikes happen in the nv blob, while i consider the mutex code to be experimental, for now. ~ so if you find either of those patches to be the source of your pain - you can disable them, letting nvidia call wbinvd() and use semaphores on -rt, rather than the mutex code.



Comment by Ninez

2014-01-24 21:43

you must not be running in -l-pa, as uname suggests to me that you're in linux-rt kernel (3.12.6-rt9-1-rt). ie: (my machine says:)

uname -a
Linux localhost 3.12.5-rt7-3-l-pa #1 SMP PREEMPT RT Sun Jan 5 19:24:17 EST 2014 x86_64 GNU/Linux

I haven't released an -rt9 based kernel. anyway, it does look like your machine went south on you on -l-pa, according to that log. Can you reproduce this at all? ...and which version/patch of nvidia did this happen under? <as you were testing different setups, so i have no idea.>

you said before; "linux-l-pa + nvidia-l-pa works fine while linux-rt and nvidia-rt give me xid errors and gfx glitches? Do you think turning on IOMMU should make problems"... but your logs show -l-pa as the problematic one? (I'm confused)... We'll need to know what patches were in use too.

I've seen those XID errors before, but not strictly on -rt. If i go back far enough, in my logs i can find;

Jul 17 20:12:45 localhost kernel: [208157.088802] NVRM: Xid (0000:02:00): 8, Channel 00000001

.. but note: that it is July/2013 - i know that i was using the nvidia(-rt) driver of that era, on older l-pa/rt kernel, which didn't use any of my current patches. (it used 3.0-rt era patch).

If i were you, I would try to figure out the above (where/with what patches it happened) and see if you can reproduce it.


Comment by unknown78

2014-01-24 19:46

@Ninez --> here is the part from the kernel log with what happened http://sprunge.us/FFNJ

Interesting is the following ->
Jan 24 16:18:38 localhost kernel: CPU: 3 PID: 49 Comm: ksoftirqd/3 Tainted: P O 3.12.5-rt7-2-l-pa #1
but uname -a
Linux archbox 3.12.6-rt9-1-rt #1 SMP PREEMPT RT Sat Jan 11 18:18:36 CET 2014 x86_64 GNU/Linux

And the xid errors @ the end.

Also if there are gfx glitches it looked like the complete geometry of textures was broken (like a dark triangle through my desktop)

Hope that helps a little.
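
One way to make this kind of mismatch obvious is to pull the kernel release out of the log line and compare it with the running kernel. A minimal sketch, assuming the release is always the field right before the "#<build number>" token, as in the line quoted above; the function name is made up for illustration.

```shell
#!/bin/sh
# Print the kernel release found in an oops/warning header line: the field
# immediately before the "#<number>" build token. Prints nothing if absent.
log_kernel_release() {
    printf '%s\n' "$1" | awk '{
        for (i = 2; i <= NF; i++)
            if ($i ~ /^#[0-9]+$/) { print $(i - 1); exit }
    }'
}

line='Jan 24 16:18:38 localhost kernel: CPU: 3 PID: 49 Comm: ksoftirqd/3 Tainted: P O 3.12.5-rt7-2-l-pa #1'
logged=$(log_kernel_release "$line")
running=$(uname -r)
if [ "$logged" != "$running" ]; then
    echo "log entry is from $logged, but the running kernel is $running"
fi
```

If the two differ, the log entry most likely comes from an earlier boot into another kernel rather than from the current session.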

Comment by Ninez

2014-01-22 00:27

@unknown78 - any examples of these xid errors? ... Last time i used linux-rt + nvidia-rt they both worked fine for me. (although, i get better performance out of linux-l-pa/nvidia-l-pa, lower latencies, better mem usage, etc). Also, could you be more specific about what 'gfx glitches' means? - maybe you could post a screenshot if it is 'graphical artifacts' - like blue dots, or discolored areas... or do you mean like judder? (when it's vsynced, but images skip - like when moving windows, etc)... anyway, more details on that would help..

IOMMU is enabled on my system/kernel/bios, it shouldn't cause you any issues. (if you do happen to be running multiple linux VMs (at the same time), you should also see a nice reduction in mem usage, over stock kernels due to UKSM). I have no experience with KVM but use VMware Workstation.

regarding 2 gpu systems, I don't know much about them - but i do know people who have had problems with bumblebee on -rt... I also don't do a lot of gaming (aside from some retro games, here and there - ie: NES, snes, etc). I do run benchmarking stuff like unigine, play with CUDA a little, etc though.

Comment by unknown78

2014-01-21 22:55

@Ninez Maybe an explanation why i ask .. -> lets imagine a second nvidia card + a kvm windows machine and steam in-house streaming over the virtual network :) That might be a solution to run linux as desktop os and have an option for games which don't run with wine or have no native port. Most people will soon have two cards anyway: 1 performance card and 1 integrated in the cpu. The integrated one should be fine to play most old games and, to a decent level, even new games.

Comment by unknown78

2014-01-21 22:51

If you need more tests from me let me know. I'm in to help make this as stable as possible. Also do you have an idea why linux-l-pa + nvidia-l-pa works fine while linux-rt and nvidia-rt give me xid errors and gfx glitches? Do you think turning on IOMMU should make problems ?

Comment by Ninez

2014-01-21 21:39

@Unknown78 - Yes, OpenGL/vdpau/and sometimes cuda is typically where you will notice the spikes. (virtualization too) ... that is exactly why i wanted you to do that kind of test - to 'poke' your machine/nvidia. - so i could see / show you.

In regards to your 1) test; that is more in line with what i would expect to have seen. Regarding your 2) test; Yes, your higher latency on test "2) a." was <most likely> because your system was probably still booting/initializing some stuff. * your very last test is what i would expect to see on my machines, with mutexes + no_wbinvd.patch. ~ ie: for the most part under 100us, rather than puking in the 1000's or more.

My next goal/step is to try replacing those wbinvd() calls with a linux interface / counterpart to see if there is a viable alternative that doesn't cause those latency spikes / ruin determinism on linux-rt... I gotta read up on it though, + find the time to sit down and experiment, test, etc.

Comment by unknown78

2014-01-21 20:32

1.) with mutex patch: cyclictest --smp -p 85 (root) / a few minutes with 11 video tabs in chromium (sorry, my download speed doesn't allow more tabs, so i can't get my cpu to 100%)
twitter - hotot-kde / system monitoring applet / steam companion and a self written plasmoid for twitch followers

run a (right after boot, starting the browser after cyclictest)
# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 3.51 2.37 1.24 4/719 11838

T: 0 ( 7494) P:85 I:1000 C: 274877 Min: 1 Act: 16 Avg: 9 Max: 1904
T: 1 ( 7495) P:85 I:1500 C: 183251 Min: 2 Act: 13 Avg: 23 Max: 1293
T: 2 ( 7496) P:85 I:2000 C: 137438 Min: 2 Act: 29 Avg: 19 Max: 1641
T: 3 ( 7497) P:85 I:2500 C: 109951 Min: 2 Act: 11 Avg: 24 Max: 1066
T: 4 ( 7498) P:85 I:3000 C: 91625 Min: 2 Act: 27 Avg: 15 Max: 1279
T: 5 ( 7499) P:85 I:3500 C: 78536 Min: 2 Act: 22 Avg: 24 Max: 1833

2.) with mutex & wbinvd patch: cyclictest --smp -p 85 (root) / a few minutes with 11 video tabs in chromium (sorry, my download speed doesn't allow more tabs, so i can't get my cpu to 100%)
twitter - hotot-kde / system monitoring applet / steam companion and a self written plasmoid for twitch followers

run a (right after boot, starting the browser after cyclictest)
# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 3.83 2.52 1.21 3/701 9130

T: 0 ( 5647) P:85 I:1000 C: 221883 Min: 1 Act: 2 Avg: 15 Max: 2050
T: 1 ( 5648) P:85 I:1500 C: 147922 Min: 2 Act: 19 Avg: 20 Max: 2103
T: 2 ( 5649) P:85 I:2000 C: 110941 Min: 2 Act: 24 Avg: 18 Max: 254
T: 3 ( 5650) P:85 I:2500 C: 88753 Min: 2 Act: 11 Avg: 17 Max: 364
T: 4 ( 5651) P:85 I:3000 C: 73961 Min: 2 Act: 25 Avg: 18 Max: 544
T: 5 ( 5652) P:85 I:3500 C: 63395 Min: 2 Act: 21 Avg: 22 Max: 610

run b (with 11 videos running and a second chromium instance opened on hitbox.tv)
# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 4.37 3.83 2.01 1/444 11002

T: 0 ( 9132) P:85 I:1000 C: 203030 Min: 1 Act: 38 Avg: 12 Max: 83
T: 1 ( 9133) P:85 I:1500 C: 135353 Min: 2 Act: 82 Avg: 16 Max: 263
T: 2 ( 9134) P:85 I:2000 C: 101515 Min: 2 Act: 44 Avg: 16 Max: 97
T: 3 ( 9135) P:85 I:2500 C: 81212 Min: 2 Act: 14 Avg: 15 Max: 266
T: 4 ( 9136) P:85 I:3000 C: 67676 Min: 2 Act: 42 Avg: 16 Max: 89
T: 5 ( 9137) P:85 I:3500 C: 58008 Min: 2 Act: 43 Avg: 16 Max: 95

Interesting it seems whenever pepperflash is initialising it spikes a little (the first run could be still the end of my boot) << maybe latency spikes happen when an opengl / vdpau context is created ?
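
For comparing runs like the ones above, the number that matters most is the worst "Max:" across all threads. Here is a small sketch that extracts it from saved cyclictest output; the function name is made up, and it assumes the per-thread line format shown above (cyclictest reports these values in microseconds unless told otherwise).

```shell
#!/bin/sh
# Read cyclictest output on stdin and print the largest "Max:" value seen
# on any per-thread ("T:") line. Prints 0 if there are no such lines.
cyclictest_worst_max() {
    awk '/^T:/ {
        for (i = 1; i < NF; i++)
            if ($i == "Max:" && $(i + 1) + 0 > worst)
                worst = $(i + 1) + 0
    }
    END { print worst + 0 }'
}

# Two lines taken from "run b" above; this prints 266.
cyclictest_worst_max <<'EOF'
T: 1 ( 9133) P:85 I:1500 C: 135353 Min: 2 Act: 82 Avg: 16 Max: 263
T: 3 ( 9135) P:85 I:2500 C: 81212 Min: 2 Act: 14 Avg: 15 Max: 266
EOF
```

Running it over the full output of each run gives one number per configuration, which is easier to compare than six columns of per-thread values.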

Comment by Ninez

2014-01-21 14:46

@Unknown78 - lulz -> splashtop streamer was running @ home, (i just noticed on my tablet) So I've uploaded the updated version.

checked it against the nvidia-rt package, as well; essentially i just gave nvidia-l-pa its own/unique nvidia-l-pa.conf && I also added the enabling of MSI for nvidia + its config file.

there's no conflicts, as i also just re-installed plain nvidia, etc. So we should be fine now... ya, when i switched to using nvidia-beta+rt <locally>, i overlooked that bit. (for the last year or so, i was using modded nvidia-rt - which looking at those packages has the renamed configs.... my bad!). thx again.

Comment by Ninez

2014-01-21 13:52

@Unknown78 - thx for the info.

Wow, what a difference just using the mutexes patch makes on your system! - on mine it is less-pronounced than on yours (but still shows benefit + doesn't dirty-up the kernel ring buffer with scheduling bugs). I would say for sure, you should be using that patch ;)

If you could do one last comparison between mutex + wbinvd patch VS. mutex without the wbinvd patch - i would 1. set up cyclictest as <cyclictest --smp -p 85> then 2. open your browser to youtube and start opening multiple tabs of video... This would give us an idea of how your higher priority threads / apps are affected...

For example, on my system my MAX (in cyclictest for -p 85) tends to be below 60-70us even when launching multiple videos in the browser ~ this is with the wbinvd() call removed from nvidia - but with wbinvd() + nvidia, i can expect much much higher latencies / negative impact on determinism. (As a side note: with 100%/full load, that 60-70 is cut in half.).

* i have renamed nvidia.conf to nvidia-l-pa.conf locally, I'll update it when i get home - as i am on the way to work right now. ( meant to upload it last night but got distracted).

Comment by unknown78

2014-01-20 22:46

Ok first let me say thanx for the detailed explanation.

Second, removing the conflicts might not be enough, since you will still get an error with usr/lib/modprobe.d/nvidia.conf, which is used by both packages (nvidia and nvidia-l-pa). So maybe naming this file nvidia-l-pa.conf should do it. Even that has the disadvantage that nouveau might stay disabled even if you remove nvidia... not sure how to fix something like this.
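
A file clash like this can be spotted before installing by comparing the file lists of the two packages. The following is just a sketch: the function name and list file names are made up, and the pacman invocation in the usage comment is an assumption about how one might produce the lists.

```shell
#!/bin/sh
# Print every regular-file path that appears in both file lists.
# Directory entries (ending in "/") are ignored, since packages share them.
# Assumes neither list contains duplicate lines of its own.
common_files() {
    sort "$1" "$2" | grep -v '/$' | uniq -d
}

# Hypothetical usage, listing the contents of two built package files:
#   pacman -Qlpq nvidia-*.pkg.tar.xz      > nvidia.list
#   pacman -Qlpq nvidia-l-pa-*.pkg.tar.xz > nvidia-l-pa.list
#   common_files nvidia.list nvidia-l-pa.list
```

Any path printed is owned by both packages, so pacman would refuse to install them side by side until one of the files is renamed.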

Now towards your question:
My configuration:

uname -a
Linux archbox 3.12.5-rt7-2-l-pa #1 SMP PREEMPT RT Fri Jan 17 00:12:36 CET 2014 x86_64 GNU/Linux

CPU
model name : AMD Phenom(tm) II X6 1090T Processor

cat /proc/meminfo
MemTotal: 8112632 kB
MemFree: 4183252 kB

Mainboard = GA-990FXA-UD5 (rev 1.0)
Desktop Environment = KDE
IOMMU is disabled / Virtualisation is enabled

Here are the results of the cyclictest runs:

1.) without both patches cyclictest -t all (root) / a few minutes with my normal background processes
(vdpau - video - twitch / twitter - hotot-kde / system monitoring applet / steam companion and a self written plasmoid for twitch followers)

# /dev/cpu_dma_latency set to 0us
policy: other/other: loadavg: 0.75 1.01 0.73 1/448 16916

T: 0 ( 8580) P: 0 I:1000 C: 276399 Min: 5 Act: 113 Avg: 85 Max: 1870
T: 1 ( 8581) P: 0 I:1500 C: 184268 Min: 5 Act: 120 Avg: 80 Max: 3174
T: 2 ( 8582) P: 0 I:2000 C: 138200 Min: 5 Act: 56 Avg: 62 Max: 4505
T: 3 ( 8583) P: 0 I:2500 C: 110561 Min: 7 Act: 71 Avg: 72 Max: 1554
T: 4 ( 8584) P: 0 I:3000 C: 92134 Min: 6 Act: 93 Avg: 77 Max: 1343
T: 5 ( 8585) P: 0 I:3500 C: 78972 Min: 6 Act: 54 Avg: 80 Max: 1288

2.) with mutex patch cyclictest -t all (root) / a few minutes with my normal background processes
(vdpau - video - twitch / twitter - hotot-kde / system monitoring applet / steam companion and a self written plasmoid for twitch followers)
# /dev/cpu_dma_latency set to 0us
policy: other/other: loadavg: 0.70 0.82 0.53 2/446 14758

T: 0 ( 6894) P: 0 I:1000 C: 277005 Min: 6 Act: 91 Avg: 66 Max: 960
T: 1 ( 6895) P: 0 I:1500 C: 184670 Min: 5 Act: 153 Avg: 72 Max: 868
T: 2 ( 6896) P: 0 I:2000 C: 138502 Min: 6 Act: 69 Avg: 57 Max: 586
T: 3 ( 6897) P: 0 I:2500 C: 110802 Min: 6 Act: 56 Avg: 74 Max: 573
T: 4 ( 6898) P: 0 I:3000 C: 92335 Min: 6 Act: 42 Avg: 70 Max: 556
T: 5 ( 6899) P: 0 I:3500 C: 79144 Min: 7 Act: 93 Avg: 85 Max: 518

3.) with mutex & wbinvd patch cyclictest -t all (root) / a few minutes with my normal background processes
(vdpau - video - twitch / twitter - hotot-kde / system monitoring applet / steam companion and a self written plasmoid for twitch followers)
# /dev/cpu_dma_latency set to 0us
policy: other/other: loadavg: 1.20 1.02 0.73 1/452 21252

T: 0 (13447) P: 0 I:1000 C: 277330 Min: 11 Act: 66 Avg: 86 Max: 1023
T: 1 (13448) P: 0 I:1500 C: 184886 Min: 10 Act: 84 Avg: 67 Max: 955
T: 2 (13449) P: 0 I:2000 C: 138665 Min: 10 Act: 22 Avg: 58 Max: 575
T: 3 (13450) P: 0 I:2500 C: 110932 Min: 10 Act: 66 Avg: 69 Max: 647
T: 4 (13451) P: 0 I:3000 C: 92443 Min: 11 Act: 68 Avg: 94 Max: 491
T: 5 (13452) P: 0 I:3500 C: 79237 Min: 9 Act: 120 Avg: 78 Max: 527

Sidenote: i use the kmod-roccat and a usb soundcard for sound output .. so the max > 1000 could come from a different module too.
My experience while measuring was that the wbinvd patch gave me a smaller starting max value and kept it for a while, but it fluctuates much more. So maybe going with the mutex patch alone might be the overall better choice for me, but not sure.
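
To put a single number on each of the three runs above, one rough option is the mean of the per-thread "Avg:" column. This is only a sketch with a made-up function name; for realtime work the Max column is usually the more important figure, so treat the mean as a quick sanity check rather than a verdict.

```shell
#!/bin/sh
# Read cyclictest output on stdin and print the mean of the per-thread
# "Avg:" values, to one decimal place. Prints nothing if no "T:" lines exist.
cyclictest_mean_avg() {
    awk '/^T:/ {
        for (i = 1; i < NF; i++)
            if ($i == "Avg:") { sum += $(i + 1); n++ }
    }
    END { if (n) printf "%.1f\n", sum / n }'
}

# Hypothetical usage, with the output of one run saved to a file:
#   cyclictest_mean_avg < run1.txt
```

Saving each run to its own file and feeding them through the same function keeps the comparison consistent between configurations.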



Comment by Ninez

2014-01-19 15:51

@unknown78 - regarding nvidia's kthread rtprio of ff50 - there is nothing wrong with the default necessarily. For my systems i tend to set it higher (ff65), as any (threaded) interrupt that isn't handled by rtirq, ends up with ff50 rtprio - which may or may not be an issue (depending what you are wanting out of your h/w)... For me, I do a lot of virtualization and some other GFX intensive stuff, so for example; i want nvidia to not be interrupted by things like networking, etc...

as for adjusting, how to check; '/usr/bin/rtirq status' will show you (assuming you have rtirq installed/running), or to see all FF threads (not just kthreads); 'ps -eLo rtprio,cls,pid,pri,cmd | grep "FF" | sort -r'
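
A slightly stricter variant of the ps pipeline above, as a sketch: it matches only on the scheduling class column, so a command that merely contains "FF" in its name is not listed, and it sorts numerically by rtprio. The function name is made up.

```shell
#!/bin/sh
# Keep only threads whose scheduling class column is exactly "FF"
# (SCHED_FIFO), sorted by rtprio, highest first. Reads ps output on stdin.
fifo_threads() {
    awk '$2 == "FF"' | sort -k1,1nr
}

# Usage on a live system:
#   ps -eLo rtprio,cls,pid,pri,cmd | fifo_threads
```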

you can manage rtprios with 'rtirq' (including nvidia, if you like) or you could use a bash script (which is what i do). In a script you basically just use the chrt command. For example;

chrt -f -p 65 `pgrep nvidia` (in this case, i use pgrep, rather than using the PID)

chrt -f -p 48 68 (using the PID, as in this case 'pgrep' doesn't work for i8042, it has multiple kthreads, so i find the PID using the ps command, written above).
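
A script along those lines might look like the sketch below. The function name and the DRY_RUN switch are made up for illustration; the loop is there because chrt -p takes a single PID per call, while pgrep can return several. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Apply one SCHED_FIFO priority to each PID given, one chrt call per PID.
# DRY_RUN defaults to 1: the commands are printed, not executed.
set_fifo_prio() {
    prio=$1
    shift
    for pid in "$@"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "chrt -f -p $prio $pid"
        else
            chrt -f -p "$prio" "$pid"
        fi
    done
}

# Hypothetical usage (needs root once DRY_RUN=0):
#   set_fifo_prio 65 $(pgrep nvidia)
#   set_fifo_prio 48 68
```

Checking the printed commands against the ps output first avoids changing the priority of the wrong thread.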
___

lastly, regarding no_wbinvd.patch, you may or may not notice issues - but removing that call could lead to cache-aliasing, coherency or consistency problems... I use that patch and through rigorous tests haven't found any issues on my H/W - but i know someone who did experience some graphical artifacts... anyway, you've been warned ;)

also, could you tell me a little about your H/W / DE, etc??? also, have you run cyclictest -t all with/without that patch applied?? if so, what kind of latencies is cyclictest reporting?

(it would be helpful to know, if possible).

Comment by Ninez

2014-01-19 15:30

Updated to fix nvidia being in conflicts(), as pointed out by unknown78 - thx.

Comment by unknown78

2014-01-19 00:05

no need to hurry just should be fixed. Btw what is the disadvantage of keeping rtprios 50 fifo? And where should i adjust it / to what should it be set ?

I use both patches and i have no problems right of now.

Comment by Ninez

2014-01-18 22:27

@unknown78 - yes i can, but i am not in front of that machine / not @ home right now. I think i accidentally forgot to remove nvidia from conflicts (as i rebased this PKGBUILD when I originally uploaded it to AUR).. sorry about that - I'll update this later tonight when i am home, k?

cheerz

Comment by unknown78

2014-01-18 20:50

Question: Can you change the package so it doesn't conflict with the official nvidia package for the official kernel ? I would like to keep both and this module doesn't work with the official kernel.

Comment by Ninez

2014-01-18 04:06

updated to nvidia-l-pa-331.38-2

Changes:

- Added nvidia-libgl to depends.
- update to mutexes patch.

(again) please note:

-rt_mutexes.patch && no_wbinvd.patch are disabled by default.

* the mutexes patch replaces the semaphore code (in nvidia), with mutexes. ~ Doing so fixes scheduling bugs, that would normally appear in the kernel ring buffer / dmesg / logs. It works quite well, but you may want to adjust nvidia's kthread rtprios (from the default of 50 fifo). nvidia-rt_mutexes.patch should be safe to use (I've been using it on several machines for quite some time), but i am still keeping it disabled by default... feel free to test and report back.

* no_wbinvd.patch makes the Intel instruction wbinvd a no-op (for nvidia), to avoid massive (1000us) latency spikes which one would normally experience with nvidia-rt / linux-rt. However, enabling this patch is not recommended / use at own risk - as it has mainly been used to test/find the source of latency issues in nvidia, while experimentation/alternatives to wbinvd() are being investigated, by myself and others... (it was also helpful in reporting to nvidia).

Comment by Ninez

2014-01-14 03:21

updated to match Archlinux / X.org update + rebased mutexes patch.

(regarding nvidia/wbinvd - I'm making some progress, as in I (and others) originally thought that the spikes were originating from within the nv-vm.c code; however, after some experimentation/tests, allowing that code to use the wbinvd() call doesn't appear to be the cause of the worst spikes. - this weekend i am going to poke around nv's code more + get in touch with the other people helping/working on the problem)..

PS: 331.38 gets lower latency in cyclictest on my system (than 331.20). (well, not that it matters, unless wbinvd() is disabled, since it adds 1000us or more.. but with 331.20 i would average 40-60us, whereas now i am sitting around 30us).

Comment by Ninez

2014-01-06 14:30

As of yesterday, I have finally heard from nvidia. Hopefully, in the coming weeks i can get more info on the wbinvd() situation. The nvidia dev that i am talking with is going to refer my issue to a more experienced nvidia kernel developer. From there, i hope we can (potentially) address the severe latency spikes caused by that particular call in nvidia... regardless, it's a start ;) ...I've also put my contact from the linux-rt-user-list in the loop, as well.

cheers

Comment by Ninez

2013-12-18 19:43

Okay, after a bit of feedback - i have decided to disable the no_wbinvd.patch. for two reasons;

- While i experience zero issues using this patch, i know of one person using an older nv video card that gets bits of corruption (in the form of bluish dots, apparently visible in KDE's login window and XBMC).

- I have someone off of the linux-rt user list, who (hopefully) next week / over the holidays is going to have some time to help me sort out nvidia. (they have infinitely more experience with the internals of nvidia's driver than I do).

so, i don't recommend re-enabling it, as it may or may not cause you issues. (and i obviously cannot test any configurations, but those of my own machines, where it doesn't cause issue). Hopefully, we can get this sorted out - as I am really enjoying the nvidia driver NOT giving me extremely high scheduling latencies. :)

Comment by Ninez

2013-12-17 21:39

Updated to nvidia-l-pa 331.20-4

changes:

+ 2 new patches, picked up from Debian/SteamOS;

1. nvidia_useLDbfd_avoid_LDgold.patch - use ld.bfd over ld.gold. Apparently, ld.gold can have some subtle issues with nvidia, so ld.bfd is a better choice.

2. nvidia_kbuildCommands_verbose.patch - this patch is used to allow more verbose messages from kbuild (when building the nvidia kernel module).

note: the kbuild_verbose patch is disabled by default, the ld.bfd patch is enabled.

lastly, a small fixup to nvidia-rt_mutexes.patch.

I've still yet to experience any issues with either the no_wbinvd.patch or the nvidia-rt_mutexes.patch - in fact, they work great. (for now, the mutexes patch is still disabled by default). cheerz

Comment by Ninez

2013-12-07 04:32

Just some tips here:

1. If you experience any problems using the default package (which is patched with nvidia-rt_no_wbinvd.patch), try disabling that patch (by commenting it out in the PKGBUILD), then compare to see differences (and report back).

2. if the nvidia driver locks up X or your machine (on -rt) - try using nvidia-rt_mutexes.patch. (while it may choke nvidia a little -> you will not get any scheduling bugs reported in the kernel ring buffer). You should be able to throttle or choke the nvidia blob without any issue <as i have tested to various extremes>.

* note: all of these patches are still considered experimental until i get feedback from nVidia and other users.
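
As an illustration of switching patches on and off, here is a sketch of how a PKGBUILD could select them. The two switch variables and the function name are hypothetical, not taken from the actual PKGBUILD (which may simply have patch lines to comment out by hand); the patch file names are the ones used in the patch breakdown in this thread.

```shell
#!/bin/sh
# Hypothetical on/off switches, both off unless set to 1 beforehand.
_use_mutexes=${_use_mutexes:-0}
_use_no_wbinvd=${_use_no_wbinvd:-0}

# Print the patches that would be applied, in order.
selected_patches() {
    echo nvidia-rt_explicit.patch   # always on: the other patches rely on it
    if [ "$_use_mutexes" = 1 ]; then echo nvidia-rt_mutexes.patch; fi
    if [ "$_use_no_wbinvd" = 1 ]; then echo nvidia-rt_no_wbinvd.patch; fi
}

# Inside prepare() one could then loop over the result, for example
# (the -p level is an assumption):
#   for p in $(selected_patches); do patch -p1 -i "$srcdir/$p"; done
```

Keeping the selection in one place makes it easier to rebuild with a single patch toggled when trying to narrow down which one causes a problem.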

Comment by Ninez

2013-12-04 07:05

Break down of patches:

- nvidia-rt_explicit.patch [enabled by default] = sets nvidia to PREEMPT_RT_FULL. ~ this doesn't seem to be needed for nvidia driver itself, but the other patches rely on this patch being enabled.

- nvidia-rt_mutexes.patch [disabled by default] = converts all semaphores into (regular) mutexes. ~ This is experimental, but you can get very deterministic performance out of nvidia with this patch. Afaict, more so than when using semaphores. (that being said, i don't think it's as suitable for desktop usage).

- nvidia-rt_no_wbinvd.patch [enabled by default] = this patch disables nvidia calling "wbinvd();" - which is a part of the Intel Instruction Set (486+). This solves a long time issue of high latency spikes when using the nvidia binary driver patched for -rt. wbinvd() is the cause of such pain and shouldn't be used on RT. to explain further;

WBINVD flushes the internal cache, then signals the external cache to write back current data, followed by a signal to flush the external cache. When nvidia calls the wbinvd instruction, that invalidates the caches of *ALL* CPUs, forcing them to flush the caches and read everything again. ~ this literally *stalls all of the cpus* -> leading to fairly substantial latencies / poor performance, in a system that otherwise should be quite deterministic.