summarylogtreecommitdiffstats
path: root/fix-race-in-PRT-wait-for-completion-simple-wait-code_Nvidia-RT-160319.patch
blob: 7e9e9c825def5912b4361ba68a251d16a497be6f (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
From: Joakim Hernberg <jhernberg@alchemy.lu>
Date: Sat, 19 Mar 2016 13:03:55 +0100
Subject: [PATCH] Fix a race in the PRT wait for completion simple wait code
 for NVIDIA on the rt patch

-Note refactored again 160319.

-NOTE: this patch is a rebase of John Blackwood's patch. On his kernel, he must be using 
-an older simple wait patch - as his applies to kernel/sched/core.c, while the simple wait
-completion code lives in kernel/sched/completion.c ... I have ported this to test with 
-nvidia, as i would like to see if it fixes the semaphore issues i have seen. 

-I've kept the original patch comment in tact;

I'm not 100% sure that the patch below will fix your problem, but we
saw something that sounds pretty familiar to your issue involving the
nvidia driver and the preempt-rt patch.  The nvidia driver uses the
completion support to create their own driver's notion of an internally
used semaphore.

Fix a race in the PRT wait for completion simple wait code. 

A wait_for_completion() waiter task can be awoken by a task calling
complete(), but fail to consume the 'done' completion resource if it
looses a race with another task calling wait_for_completion() just as
it is waking up.

In this case, the awoken task will call schedule_timeout() again
without being in the simple wait queue.

So if the awoken task is unable to claim the 'done' completion resource,
check to see if it needs to be re-inserted into the wait list before
waiting again in schedule_timeout().

Fix-by: John Blackwood <john.blackwood@ccur.com>

---
 kernel/sched/completion.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index b62cf64..c01bbd8 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -61,11 +61,19 @@ static inline long __sched
 do_wait_for_common(struct completion *x,
 		   long (*action)(long), long timeout, int state)
 {
+	int again = 0;
+
 	if (!x->done) {
 		DECLARE_SWAITQUEUE(wait);
 
 		__prepare_to_swait(&x->wait, &wait);
 		do {
+			/* Check to see if we lost race for 'done' and are
+			* no longer in the wait list.
+			*/
+			if (unlikely(again) && list_empty(&wait.task_list))
+				__prepare_to_swait(&x->wait, &wait);
+
 			if (signal_pending_state(state, current)) {
 				timeout = -ERESTARTSYS;
 				break;
@@ -74,6 +82,7 @@ do_wait_for_common(struct completion *x,
 			raw_spin_unlock_irq(&x->wait.lock);
 			timeout = action(timeout);
 			raw_spin_lock_irq(&x->wait.lock);
+			again = 1;
 		} while (!x->done && timeout);
 		__finish_swait(&x->wait, &wait);
 		if (!x->done)
-- 
2.7.2