Package Details: python-tokenizers 0.20.0-1

Git Clone URL: https://aur.archlinux.org/python-tokenizers.git (read-only)
Package Base: python-tokenizers
Description: Fast State-of-the-Art Tokenizers optimized for Research and Production
Upstream URL: https://github.com/huggingface/tokenizers
Keywords: huggingface
Licenses: Apache-2.0
Submitter: filipg
Maintainer: xiota (daskol)
Last Packager: xiota
Votes: 6
Popularity: 0.44
First Submitted: 2021-10-23 11:17 (UTC)
Last Updated: 2024-08-30 21:10 (UTC)

Pinned Comments

xiota commented on 2024-08-30 16:15 (UTC) (edited on 2024-08-30 16:59 (UTC) by xiota)

Problems:

Latest Comments

rekman commented on 2024-02-10 00:45 (UTC)

Recommend adding git -C "${srcdir}/${pkgname}" clean -dfx to prepare() to clean out stale wheels.
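A minimal sketch of how that suggestion could look inside the PKGBUILD. It assumes the git checkout really lives at "${srcdir}/${pkgname}" as written in the command above; the directory name used by this package may differ.

```shell
# Hypothetical prepare() fragment; only the git clean call comes from the comment above.
prepare() {
  # -d: also remove untracked directories
  # -f: force removal
  # -x: include files ignored by .gitignore (build output is usually ignored)
  git -C "${srcdir}/${pkgname}" clean -dfx
}
```

Because -x also deletes ignored files, anything cached inside the checkout between builds is removed as well.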

xiota commented on 2024-02-09 02:14 (UTC)

@dreieck Thank you for finding a solution to the oniguruma issue.

I am not removing the --locked option because this package builds fine in a clean chroot. It is also recommended by the Rust package guidelines.

I am not adding the --offline option because there is no benefit. It would only break the package if cargo needs to download something.
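For illustration, a prepare() along the lines of what the Arch Rust package guidelines describe. This is a sketch, not the actual PKGBUILD of this package; the bindings/python path is an assumption based on the Cargo.lock path in the error message quoted elsewhere in this thread.

```shell
# Sketch only; not the maintainer's actual PKGBUILD.
prepare() {
  # Assumed layout: the Python bindings sit in bindings/python of the checkout.
  cd "${srcdir}/tokenizers/bindings/python"
  # --locked makes cargo fail instead of rewriting Cargo.lock,
  # while still allowing it to download the pinned crates.
  cargo fetch --locked --target "$(rustc -vV | sed -n 's/host: //p')"
}
```

With --offline in place of --locked, the same step would fail whenever a crate is not already in the local cargo cache.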

dreieck commented on 2024-02-08 15:58 (UTC) (edited on 2024-02-08 16:20 (UTC) by dreieck)

Adding options+=('!lto') fixes the

ImportError: /usr/lib/python3.11/site-packages/tokenizers/tokenizers.cpython-311-x86_64-linux-gnu.so: undefined symbol: OnigDefaultSyntax

issue for me.

Regards!
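A sketch of how that line could sit in a PKGBUILD. The empty initial array is an assumption for illustration; a real PKGBUILD may already set other options.

```shell
# Sketch of the relevant PKGBUILD lines; makepkg reads the options array.
options=()          # assumed starting point, a real PKGBUILD may set more
options+=('!lto')   # ask makepkg not to apply its link-time optimization flags
```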

dreieck commented on 2024-02-08 15:45 (UTC) (edited on 2024-02-08 15:47 (UTC) by dreieck)

When trying to build the current version, 0.15.1 (your package is now out of date), the build fails with

==> Starting prepare()...
    Updating crates.io index
error: the lock file /tmp/makepkg/build/python-tokenizers/src/tokenizers/bindings/python/Cargo.lock needs to be updated but --locked was passed to prevent this
If you want to try to generate the lock file without accessing the network, remove the --locked flag and use --offline instead.

Remove --locked in prepare() (and maybe replace --locked with --offline in build()).

Regards and thanks for maintaining!

sirus20x6 commented on 2024-02-03 06:36 (UTC) (edited on 2024-02-03 06:41 (UTC) by sirus20x6)

[sirus@neuromancer ontherag]$ python ./myrag.py
Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1382, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1126, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/lib/python3.11/site-packages/transformers/models/__init__.py", line 15, in <module>
    from . import (
  File "/usr/lib/python3.11/site-packages/transformers/models/mt5/__init__.py", line 29, in <module>
    from ..t5.tokenization_t5 import T5Tokenizer
  File "/usr/lib/python3.11/site-packages/transformers/models/t5/tokenization_t5.py", line 26, in <module>
    from ...convert_slow_tokenizer import import_protobuf
  File "/usr/lib/python3.11/site-packages/transformers/convert_slow_tokenizer.py", line 26, in <module>
    from tokenizers import AddedToken, Regex, Tokenizer, decoders, normalizers, pre_tokenizers, processors
  File "/usr/lib/python3.11/site-packages/tokenizers/__init__.py", line 78, in <module>
    from .tokenizers import (
ImportError: /usr/lib/python3.11/site-packages/tokenizers/tokenizers.cpython-311-x86_64-linux-gnu.so: undefined symbol: OnigDefaultSyntax

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/code/git/ontherag/./myrag.py", line 2, in <module>
    from transformers import AutoTokenizer
  File "<frozen importlib._bootstrap>", line 1229, in _handle_fromlist
  File "/usr/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1372, in __getattr__
    module = self._get_module(self._class_to_module[name])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1384, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.auto because of the following error (look up to see its traceback):
/usr/lib/python3.11/site-packages/tokenizers/tokenizers.cpython-311-x86_64-linux-gnu.so: undefined symbol: OnigDefaultSyntax

xiota commented on 2024-01-24 19:52 (UTC) (edited on 2024-01-27 13:35 (UTC) by xiota)

I haven't figured out how to resolve issues with oniguruma yet.

carsme commented on 2024-01-10 16:34 (UTC)

I'm experiencing the following issue:

$ python -c 'import tokenizers'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.11/site-packages/tokenizers/__init__.py", line 78, in <module>
    from .tokenizers import (
ImportError: /usr/lib/python3.11/site-packages/tokenizers/tokenizers.cpython-311-x86_64-linux-gnu.so: undefined symbol: OnigDefaultSyntax

Any ideas? Thanks!
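One way to look at such an error more closely is to list the symbols the extension module expects another library to provide. The helper below is a hypothetical sketch (the name check_onig is made up for illustration) and assumes nm from binutils is installed.

```shell
# Hypothetical helper: list undefined Onig* symbols in a shared object.
# Usage: check_onig /usr/lib/python3.11/site-packages/tokenizers/tokenizers.cpython-311-x86_64-linux-gnu.so
check_onig() {
  so="$1"
  if [ ! -f "$so" ]; then
    echo "not found: $so" >&2
    return 2
  fi
  # nm -D prints the dynamic symbol table; --undefined-only keeps only
  # symbols that must be resolved by some other library at load time.
  nm -D --undefined-only "$so" | grep Onig
}
```

If the function prints OnigDefaultSyntax, the module expects that symbol from an external library at load time; the exit status is that of grep, so 1 means no such symbol was listed.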

daskol commented on 2023-12-13 11:40 (UTC) (edited on 2023-12-13 11:41 (UTC) by daskol)

@xiota The glob ignore pattern breaks the build. See the discussion about maturin and gitignore in the parallel thread.

mane.andrea commented on 2023-11-23 17:42 (UTC) (edited on 2023-11-23 17:43 (UTC) by mane.andrea)

Manual intervention required (I'll leave it here for whoever encounters the same problem). To solve:

==> ERROR: /home/user/.cache/yay/python-tokenizers/tokenizers is not a clone of https://github.com/huggingface/tokenizers.git
    Aborting...

it suffices to remove /home/user/.cache/yay/python-tokenizers/tokenizers and install again with makepkg + pacman -U or your AUR helper of choice.
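The manual step above can be sketched as a small shell helper. The function name drop_stale_clone is hypothetical and not part of makepkg or yay; it removes the cached clone only when its origin URL differs from the expected one.

```shell
# Hypothetical helper, not part of makepkg or yay.
# Removes a cached source clone whose origin does not match the expected URL.
drop_stale_clone() {
  dir="$1"
  expected="$2"
  # Nothing to do if the cache directory does not exist.
  [ -d "$dir" ] || return 0
  # Read the origin URL; empty if the directory has no origin configured.
  actual="$(git -C "$dir" config --get remote.origin.url || true)"
  if [ "$actual" != "$expected" ]; then
    echo "removing $dir (origin: ${actual:-none})"
    rm -rf -- "$dir"
  fi
}

# Example call with the paths from the error message above:
# drop_stale_clone /home/user/.cache/yay/python-tokenizers/tokenizers \
#   https://github.com/huggingface/tokenizers.git
```

After the stale directory is gone, the next build clones the source again from the URL in the PKGBUILD.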

xiota commented on 2023-10-29 21:32 (UTC)

I do not know the reasons for the build errors. I was able to build this in a clean chroot immediately before typing this.