aboutsummarylogtreecommitdiffstats
path: root/zstd-udpate-fixes.patch
blob: 70310ffd8409ec668eec1b161ff843dfd609116d (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
From 9a8cd176f6ee90a3605d5dab75a4ab778e73f6f0 Mon Sep 17 00:00:00 2001
From: Scott B <arglebargle@arglebargle.dev>
Date: Sat, 20 Nov 2021 12:14:56 -0800
Subject: [PATCH] zstd udpate fixes

Squashed commit of the following:

commit 82290fbdaffe7519c8451d5a8fe20af6faddd665
Author: Nick Terrell <terrelln@fb.com>
Date:   Tue Nov 16 15:11:39 2021 -0800

    lib: zstd: Don't add -O3 to cflags

    After the update to zstd-1.4.10 passing -O3 is no longer necessary to
    get good performance from zstd. Using the default optimization level -O2
    is sufficient to get good performance.

    I've measured no significant change to compression speed, and a ~1%
    decompression speed loss, which is acceptable.

    This fixes the reported parisc -Wframe-larger-than=1536 errors [0]. The
    gcc-8-hppa-linux-gnu compiler performed very poorly with -O3, generating
    stacks that are ~3KB. With -O2 these same functions generate stacks in
    the < 100B, completely fixing the problem. Function size deltas are
    listed below:

    ZSTD_compressBlock_fast_extDict_generic: 3800 -> 68
    ZSTD_compressBlock_fast: 2216 -> 40
    ZSTD_compressBlock_fast_dictMatchState: 1848 ->  64
    ZSTD_compressBlock_doubleFast_extDict_generic: 3744 -> 76
    ZSTD_fillDoubleHashTable: 3252 -> 0
    ZSTD_compressBlock_doubleFast: 5856 -> 36
    ZSTD_compressBlock_doubleFast_dictMatchState: 5380 -> 84
    ZSTD_copmressBlock_lazy2: 2420 -> 72

    Additionally, this improves the reported code bloat [1]. With gcc-11
    bloat-o-meter shows an 80KB code size improvement:

    ```
    > ../scripts/bloat-o-meter vmlinux.old vmlinux
    add/remove: 31/8 grow/shrink: 24/155 up/down: 25734/-107924 (-82190)
    Total: Before=6418562, After=6336372, chg -1.28%
    ```

    Compared to before the zstd-1.4.10 update we see a total code size
    regression of 105KB, down from 374KB at v5.16-rc1:

    ```
    > ../scripts/bloat-o-meter vmlinux.old vmlinux
    add/remove: 292/62 grow/shrink: 56/88 up/down: 235009/-127487 (107522)
    Total: Before=6228850, After=6336372, chg +1.73%
    ```

    [0] https://lkml.org/lkml/2021/11/15/710
    [1] https://lkml.org/lkml/2021/11/14/189

    Link: https://lore.kernel.org/r/20211117014949.1169186-4-nickrterrell@gmail.com/
    Link: https://lore.kernel.org/r/20211117201459.1194876-4-nickrterrell@gmail.com/

    Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
    Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
    Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
    Signed-off-by: Nick Terrell <terrelln@fb.com>

commit e764e672bf43ad4a3a3a85e79b267daaad406990
Author: Nick Terrell <terrelln@fb.com>
Date:   Mon Nov 15 20:33:08 2021 -0800

    lib: zstd: Don't inline functions in zstd_opt.c

    `zstd_opt.c` contains the match finder for the highest compression
    levels. These levels are already very slow, and are unlikely to be used
    in the kernel. If they are used, they shouldn't be used in latency
    sensitive workloads, so slowing them down shouldn't be a big deal.

    This saves 188 KB of the 288 KB regression reported by Geert Uytterhoeven [0].
    I've also opened an issue upstream [1] so that we can properly tackle
    the code size issue in `zstd_opt.c` for all users, and can hopefully
    remove this hack in the next zstd version we import.

    Bloat-o-meter output on x86-64:

    ```
    > ../scripts/bloat-o-meter vmlinux.old vmlinux
    add/remove: 6/5 grow/shrink: 1/9 up/down: 16673/-209939 (-193266)
    Function                                     old     new   delta
    ZSTD_compressBlock_opt_generic.constprop       -    7559   +7559
    ZSTD_insertBtAndGetAllMatches                  -    6304   +6304
    ZSTD_insertBt1                                 -    1731   +1731
    ZSTD_storeSeq                                  -     693    +693
    ZSTD_BtGetAllMatches                           -     255    +255
    ZSTD_updateRep                                 -     128    +128
    ZSTD_updateTree                               96      99      +3
    ZSTD_insertAndFindFirstIndexHash3             81       -     -81
    ZSTD_setBasePrices.constprop                  98       -     -98
    ZSTD_litLengthPrice.constprop                138       -    -138
    ZSTD_count                                   362     181    -181
    ZSTD_count_2segments                        1407     938    -469
    ZSTD_insertBt1.constprop                    2689       -   -2689
    ZSTD_compressBlock_btultra2                19990     423  -19567
    ZSTD_compressBlock_btultra                 19633      15  -19618
    ZSTD_initStats_ultra                       19825       -  -19825
    ZSTD_compressBlock_btopt                   20374      12  -20362
    ZSTD_compressBlock_btopt_extDict           29984      12  -29972
    ZSTD_compressBlock_btultra_extDict         30718      15  -30703
    ZSTD_compressBlock_btopt_dictMatchState    32689      12  -32677
    ZSTD_compressBlock_btultra_dictMatchState   33574      15  -33559
    Total: Before=6611828, After=6418562, chg -2.92%
    ```

    [0] https://lkml.org/lkml/2021/11/14/189
    [1] https://github.com/facebook/zstd/issues/2862

    Link: https://lore.kernel.org/r/20211117014949.1169186-3-nickrterrell@gmail.com/
    Link: https://lore.kernel.org/r/20211117201459.1194876-3-nickrterrell@gmail.com/

    Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
    Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
    Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
    Signed-off-by: Nick Terrell <terrelln@fb.com>

commit e5fa8f8334f188f496e02c0c5dcb1f9234fec2fb
Author: Nick Terrell <terrelln@fb.com>
Date:   Mon Nov 15 19:08:19 2021 -0800

    lib: zstd: Fix unused variable warning

    The variable `litLengthSum` is only used by an `assert()`, so when
    asserts are disabled the compiler doesn't see any usage and warns.

    This issue is already fixed upstream by PR #2838 [0]. It was reported
    by the Kernel test robot in [1].

    Another approach would be to change zstd's disabled `assert()`
    definition to use the argument in a disabled branch, instead of
    ignoring the argument. I've avoided this approach because there are
    some small changes necessary to get zstd to build, and I would
    want to thoroughly re-test for performance, since that is slightly
    changing the code in every function in zstd. It seems like a
    trivial change, but some functions are pretty sensitive to small
    changes. However, I think it is a valid approach that I would
    like to see upstream take, so I've opened Issue #2868 to attempt
    this upstream.

    Lastly, I've chosen not to use __maybe_unused because all code
    in lib/zstd/ must eventually be upstreamed. Upstream zstd can't
    use __maybe_unused because it isn't portable across all compilers.

    [0] https://github.com/facebook/zstd/pull/2838
    [1] https://lore.kernel.org/linux-mm/202111120312.833wII4i-lkp@intel.com/T/
    [2] https://github.com/facebook/zstd/issues/2868

    Link: https://lore.kernel.org/r/20211117014949.1169186-2-nickrterrell@gmail.com/
    Link: https://lore.kernel.org/r/20211117201459.1194876-2-nickrterrell@gmail.com/

    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Nick Terrell <terrelln@fb.com>
---
 lib/zstd/Makefile                            |  2 --
 lib/zstd/common/compiler.h                   |  7 +++++++
 lib/zstd/compress/zstd_compress_superblock.c |  2 ++
 lib/zstd/compress/zstd_opt.c                 | 12 ++++++++++++
 4 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/lib/zstd/Makefile b/lib/zstd/Makefile
index 65218ec5b8f2..fc45339fc3a3 100644
--- a/lib/zstd/Makefile
+++ b/lib/zstd/Makefile
@@ -11,8 +11,6 @@
 obj-$(CONFIG_ZSTD_COMPRESS) += zstd_compress.o
 obj-$(CONFIG_ZSTD_DECOMPRESS) += zstd_decompress.o
 
-ccflags-y += -O3
-
 zstd_compress-y := \
 		zstd_compress_module.o \
 		common/debug.o \
diff --git a/lib/zstd/common/compiler.h b/lib/zstd/common/compiler.h
index a1a051e4bce6..f5a9c70a228a 100644
--- a/lib/zstd/common/compiler.h
+++ b/lib/zstd/common/compiler.h
@@ -16,6 +16,7 @@
 *********************************************************/
 /* force inlining */
 
+#if !defined(ZSTD_NO_INLINE)
 #if (defined(__GNUC__) && !defined(__STRICT_ANSI__)) || defined(__cplusplus) || defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L   /* C99 */
 #  define INLINE_KEYWORD inline
 #else
@@ -24,6 +25,12 @@
 
 #define FORCE_INLINE_ATTR __attribute__((always_inline))
 
+#else
+
+#define INLINE_KEYWORD
+#define FORCE_INLINE_ATTR
+
+#endif
 
 /*
   On MSVC qsort requires that functions passed into it use the __cdecl calling conversion(CC).
diff --git a/lib/zstd/compress/zstd_compress_superblock.c b/lib/zstd/compress/zstd_compress_superblock.c
index ee03e0aedb03..b0610b255653 100644
--- a/lib/zstd/compress/zstd_compress_superblock.c
+++ b/lib/zstd/compress/zstd_compress_superblock.c
@@ -411,6 +411,8 @@ static size_t ZSTD_seqDecompressedSize(seqStore_t const* seqStore, const seqDef*
     const seqDef* sp = sstart;
     size_t matchLengthSum = 0;
     size_t litLengthSum = 0;
+    /* Only used by assert(), suppress unused variable warnings in production. */
+    (void)litLengthSum;
     while (send-sp > 0) {
         ZSTD_sequenceLength const seqLen = ZSTD_getSequenceLength(seqStore, sp);
         litLengthSum += seqLen.litLength;
diff --git a/lib/zstd/compress/zstd_opt.c b/lib/zstd/compress/zstd_opt.c
index 70adc7c2cc5e..09483f518dc3 100644
--- a/lib/zstd/compress/zstd_opt.c
+++ b/lib/zstd/compress/zstd_opt.c
@@ -8,6 +8,18 @@
  * You may select, at your option, one of the above-listed licenses.
  */
 
+/*
+ * Disable inlining for the optimal parser for the kernel build.
+ * It is unlikely to be used in the kernel, and where it is used
+ * latency shouldn't matter because it is very slow to begin with.
+ * We prefer a ~180KB binary size win over faster optimal parsing.
+ *
+ * TODO(https://github.com/facebook/zstd/issues/2862):
+ * Improve the code size of the optimal parser in general, so we
+ * don't need this hack for the kernel build.
+ */
+#define ZSTD_NO_INLINE 1
+
 #include "zstd_compress_internal.h"
 #include "hist.h"
 #include "zstd_opt.h"
-- 
2.34.0