@rekman Is there a real problem that this is intended to solve? Why aren't the following sufficient?
- Build in a clean chroot.
- `makepkg -C`
- Whatever command your AUR helper uses to clear its cache.
Git Clone URL: | https://aur.archlinux.org/python-tokenizers.git |
---|---|
Package Base: | python-tokenizers |
Description: | Fast State-of-the-Art Tokenizers optimized for Research and Production |
Upstream URL: | https://github.com/huggingface/tokenizers |
Keywords: | huggingface |
Licenses: | Apache-2.0 |
Submitter: | filipg |
Maintainer: | xiota (daskol) |
Last Packager: | xiota |
Votes: | 8 |
Popularity: | 0.96 |
First Submitted: | 2021-10-23 11:17 (UTC) |
Last Updated: | 2024-12-21 18:21 (UTC) |
Recommend adding `git -C "${srcdir}/${pkgname}" clean -dfx` to `prepare()` to clean out stale wheels.
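A minimal sketch of that suggestion, assuming the source checkout is a git working tree under `${srcdir}`. `clean_checkout` is a hypothetical helper name used only for illustration, and this is not the package's actual PKGBUILD:

```shell
# Hypothetical helper; in a real PKGBUILD the git call could sit directly in prepare().
clean_checkout() {
  # $1: path to the git checkout
  # -d: also remove untracked directories
  # -f: force the removal
  # -x: also remove files matched by .gitignore (such as leftover build output)
  git -C "$1" clean -dfx
}

prepare() {
  # Directory as given in the comment above; adjust if the checkout is named differently.
  clean_checkout "${srcdir}/${pkgname}"
}
```

Tracked files are left alone; only untracked and ignored files, which is where stale wheels from earlier builds would live, are deleted.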
@dreieck Thank you for finding a solution to the oniguruma issue.
I am not removing the `--locked` option because this package builds fine in a clean chroot. It is also recommended by the Rust package guidelines.
I am not adding the `--offline` option because there is no benefit. It would only break the package if cargo needs to download something.
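For readers unfamiliar with the two flags, here is a minimal fragment modeled on the Arch Rust package guidelines. It is a sketch, not this package's actual PKGBUILD, and the directory is an assumption based on the `Cargo.lock` path reported elsewhere in this thread:

```shell
prepare() {
  # Assumed location of Cargo.lock inside the source tree.
  cd "${srcdir}/tokenizers/bindings/python"
  # --locked: fail instead of rewriting Cargo.lock when it is out of date,
  # so the build uses exactly the dependency versions recorded upstream.
  cargo fetch --locked --target "$(rustc -vV | sed -n 's/host: //p')"
}

# Once prepare() has fetched everything, a later cargo call in build() may pass
# --frozen (equivalent to --locked plus --offline) to forbid network access.
```

With `--offline` alone, cargo would be allowed to regenerate the lock file from whatever is cached locally, which is the opposite of what `--locked` guarantees.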
When I try to build the up-to-date version 0.15.1 (your package is now out of date), it fails with:
==> Starting prepare()...
Updating crates.io index
error: the lock file /tmp/makepkg/build/python-tokenizers/src/tokenizers/bindings/python/Cargo.lock needs to be updated but --locked was passed to prevent this
If you want to try to generate the lock file without accessing the network, remove the --locked flag and use --offline instead.
Remove `--locked` in `prepare()` (and maybe replace `--locked` with `--offline` in `build()`).
Regards and thanks for maintaining!
{{bc|<nowiki>
[sirus@neuromancer ontherag]$ python ./myrag.py
Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1382, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1126, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/lib/python3.11/site-packages/transformers/models/__init__.py", line 15, in <module>
    from . import (
  File "/usr/lib/python3.11/site-packages/transformers/models/mt5/__init__.py", line 29, in <module>
    from ..t5.tokenization_t5 import T5Tokenizer
  File "/usr/lib/python3.11/site-packages/transformers/models/t5/tokenization_t5.py", line 26, in <module>
    from ...convert_slow_tokenizer import import_protobuf
  File "/usr/lib/python3.11/site-packages/transformers/convert_slow_tokenizer.py", line 26, in <module>
    from tokenizers import AddedToken, Regex, Tokenizer, decoders, normalizers, pre_tokenizers, processors
  File "/usr/lib/python3.11/site-packages/tokenizers/__init__.py", line 78, in <module>
    from .tokenizers import (
ImportError: /usr/lib/python3.11/site-packages/tokenizers/tokenizers.cpython-311-x86_64-linux-gnu.so: undefined symbol: OnigDefaultSyntax

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/code/git/ontherag/./myrag.py", line 2, in <module>
    from transformers import AutoTokenizer
  File "<frozen importlib._bootstrap>", line 1229, in _handle_fromlist
  File "/usr/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1372, in __getattr__
    module = self._get_module(self._class_to_module[name])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1384, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.auto because of the following error (look up to see its traceback):
/usr/lib/python3.11/site-packages/tokenizers/tokenizers.cpython-311-x86_64-linux-gnu.so: undefined symbol: OnigDefaultSyntax
</nowiki>}}
I haven't figured out how to resolve issues with oniguruma yet.
I'm experiencing the following issue:
$ python -c 'import tokenizers'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/lib/python3.11/site-packages/tokenizers/__init__.py", line 78, in <module>
from .tokenizers import (
ImportError: /usr/lib/python3.11/site-packages/tokenizers/tokenizers.cpython-311-x86_64-linux-gnu.so: undefined symbol: OnigDefaultSyntax
Any ideas? Thanks!
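One way to narrow down an error like this is to check which `Onig*` symbols the extension module expects from outside and whether it is dynamically linked against an oniguruma library at all. The sketch below assumes `nm` (binutils), `awk`, and `ldd` are installed; `filter_undefined` is a hypothetical helper name, and the path is copied from the traceback above. It only diagnoses and does not fix anything:

```shell
# Hypothetical helper: reads `nm -D` output on stdin and prints undefined
# symbols (type "U") whose name starts with the given prefix.
filter_undefined() {
  awk -v prefix="$1" '$1 == "U" && index($2, prefix) == 1 { print $2 }'
}

# Path copied from the traceback in this thread.
so=/usr/lib/python3.11/site-packages/tokenizers/tokenizers.cpython-311-x86_64-linux-gnu.so

if [ -e "$so" ]; then
  # Undefined Onig* symbols must be provided by some other library at load time.
  nm -D "$so" | filter_undefined Onig
  # Show whether the module is linked against an oniguruma shared library.
  ldd "$so" | grep -i onig || echo "no oniguruma library in the ldd output"
fi
```

If the symbol is listed as undefined and no matching library shows up in the `ldd` output, nothing supplies the symbol when Python loads the module, which matches the `ImportError` above.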
@xiota The glob ignore pattern breaks the build. Check the discussion about `maturin` and `gitignore` in the parallel thread.
Manual intervention required (I'll leave this here for whoever encounters the same problem). To solve this error:
==> ERROR: /home/user/.cache/yay/python-tokenizers/tokenizers is not a clone of https://github.com/huggingface/tokenizers.git
Aborting...
it suffices to remove `/home/user/.cache/yay/python-tokenizers/tokenizers` and install again with `makepkg` + `pacman -U`, or with whatever AUR helper you prefer.
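The check behind that error can be approximated by hand before deleting anything. This is a rough sketch, assuming the cached directory is a git clone with an `origin` remote; `is_clone_of` and `remove_if_stale` are hypothetical helper names, not part of makepkg or yay:

```shell
# Hypothetical helpers, not part of makepkg or any AUR helper.
is_clone_of() {
  # $1: cached source directory, $2: expected remote URL
  [ "$(git -C "$1" config --get remote.origin.url 2>/dev/null)" = "$2" ]
}

remove_if_stale() {
  # Delete the cached directory only when its origin does not match the URL.
  if [ -d "$1" ] && ! is_clone_of "$1" "$2"; then
    rm -rf -- "$1"
  fi
}
```

With the path and URL from the error message, the call would be `remove_if_stale /home/user/.cache/yay/python-tokenizers/tokenizers https://github.com/huggingface/tokenizers.git`, after which the next build clones the source afresh.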
Pinned Comments
xiota commented on 2024-08-30 16:15 (UTC) (edited on 2024-08-30 16:59 (UTC) by xiota)
Problems: