I do not know the reasons for the build errors. I was able to build this in a clean chroot immediately before typing this.
Package Details: python-tokenizers 0.21.0-1
Git Clone URL: | https://aur.archlinux.org/python-tokenizers.git (read-only) |
---|---|
Package Base: | python-tokenizers |
Description: | Fast State-of-the-Art Tokenizers optimized for Research and Production |
Upstream URL: | https://github.com/huggingface/tokenizers |
Keywords: | huggingface |
Licenses: | Apache-2.0 |
Submitter: | filipg |
Maintainer: | xiota (daskol) |
Last Packager: | xiota |
Votes: | 8 |
Popularity: | 0.66 |
First Submitted: | 2021-10-23 11:17 (UTC) |
Last Updated: | 2024-12-21 18:21 (UTC) |
Dependencies (14)
- python (python37 [AUR], python311 [AUR], python310 [AUR])
- clang (llvm-git [AUR], clang-minimal-git [AUR], clang17-bin [AUR]) (make)
- python-build (make)
- python-installer (make)
- python-maturin (python-maturin-git [AUR]) (make)
- python-setuptools-rust (make)
- python-wheel (make)
- rust-bindgen (make)
- python-datasets [AUR] (check)
- python-numpy (python-numpy-git [AUR], python-numpy-mkl-tbb [AUR], python-numpy-mkl [AUR], python-numpy1 [AUR], python-numpy-mkl-bin [AUR]) (check)
- python-pyarrow (check)
- python-pytest (check)
- python-requests (check)
- python-setuptools-rust (check)
Required by (11)
Sources (3)
xiota commented on 2023-10-29 21:32 (UTC)
hashworks commented on 2023-10-18 18:51 (UTC)
Build fails for me with aurutils:
==> Making package: python-tokenizers 0.14.1-1 (Wed 18 Oct 2023 08:50:59 PM CEST)
==> Checking runtime dependencies...
==> Checking buildtime dependencies...
==> Retrieving sources...
-> Updating tokenizers git repo...
remote: Enumerating objects: 494, done.
remote: Counting objects: 100% (87/87), done.
remote: Compressing objects: 100% (37/37), done.
remote: Total 494 (delta 58), reused 61 (delta 50), pack-reused 407
Receiving objects: 100% (494/494), 202.20 KiB | 8.42 MiB/s, done.
Resolving deltas: 100% (323/323), completed with 3 local objects.
From https://github.com/huggingface/tokenizers
* [new branch] Pierrci-patch-1 -> Pierrci-patch-1
+ 52037045...628f4e70 refs/pull/1144/merge -> refs/pull/1144/merge (forced update)
+ 2cab5258...41fc5357 refs/pull/1203/merge -> refs/pull/1203/merge (forced update)
* [new ref] refs/pull/1367/head -> refs/pull/1367/head
* [new ref] refs/pull/1367/merge -> refs/pull/1367/merge
+ 882a169a...8e0b8141 refs/pull/559/merge -> refs/pull/559/merge (forced update)
+ c73f1a10...7e638d1d refs/pull/716/merge -> refs/pull/716/merge (forced update)
+ df77d0cd...10a18343 refs/pull/842/merge -> refs/pull/842/merge (forced update)
+ 30ecc0f1...860bab23 refs/pull/992/merge -> refs/pull/992/merge (forced update)
==> Validating source files with sha256sums...
tokenizers ... Skipped
==> Extracting sources...
-> Creating working copy of tokenizers git repo...
Cloning into 'tokenizers'...
done.
Switched to a new branch 'makepkg'
==> Starting prepare()...
==> Starting build()...
error: package `clap_builder v4.4.6` cannot be built because it requires rustc 1.70.0 or newer, while the currently active rustc version is 1.65.0
Either upgrade to rustc 1.70.0 or newer, or use
cargo update -p clap_builder@4.4.6 --precise ver
where `ver` is the latest version of `clap_builder` supporting rustc 1.65.0
==> ERROR: A failure occurred in build().
Aborting...
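The error above says clap_builder 4.4.6 needs rustc 1.70.0 or newer while 1.65.0 was the active toolchain. As a rough sketch (not part of the PKGBUILD), a check like the following could confirm the active rustc version before building. The helper name `version_ge` is made up for illustration, and the comparison relies on GNU `sort -V`.

```shell
# Hypothetical helper, not part of the actual PKGBUILD.
# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
  # sort -V orders version strings; if the minimum sorts first
  # (or both are equal), then $1 is new enough.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Minimum taken from the error message above (clap_builder 4.4.6).
min_rustc="1.70.0"

# Only run the check when rustc is available on PATH.
if command -v rustc >/dev/null 2>&1; then
  # `rustc --version` prints "rustc <version> (...)"; field 2 is the version.
  active="$(rustc --version | awk '{print $2}')"
  if ! version_ge "$active" "$min_rustc"; then
    echo "rustc $active is older than the required $min_rustc" >&2
  fi
fi
```

If the check reports an older compiler, the usual cause is a stale toolchain in the build environment rather than the package itself.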
intelfx commented on 2023-10-07 20:59 (UTC)
v0.14.0 does not build anymore with rust 1.73:
Compiling tokenizers v0.14.0 (/build/python-tokenizers/src/tokenizers/tokenizers)
error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell`
--> /build/python-tokenizers/src/tokenizers/tokenizers/src/models/bpe/trainer.rs:541:47
|
537 | let w = &words[*i] as *const _ as *mut _;
| -------------------------------- casting happend here
...
541 | let word: &mut Word = &mut (*w);
| ^^^^^^^^^
|
= note: `#[deny(invalid_reference_casting)]` on by default
error: could not compile `tokenizers` (lib) due to previous error
Please bump to 0.14.1. I'm flagging the package, as it no longer builds on an up-to-date Arch system.
piotroxp commented on 2023-09-28 14:17 (UTC) (edited on 2023-09-28 14:23 (UTC) by piotroxp)
Latest logs from pacaur -S python-tokenizers @dreieck
==> Extracting sources...
-> Creating working copy of tokenizers git repo...
Reset branch 'makepkg'
==> Starting prepare()...
Updating crates.io index
error: failed to select a version for the requirement `env_logger = "=0.10.0"`
candidate versions found which didn't match: 0.9.3, 0.9.2, 0.9.1, ...
location searched: crates.io index
required by package `tokenizers-python v0.14.0 (/nvme/Homes/developer/.cache/pacaur/python-tokenizers/src/tokenizers/bindings/python)`
==> ERROR: A failure occurred in prepare().
Aborting...
:: failed to verify integrity or prepare python-tokenizers package
xiota commented on 2023-05-30 05:02 (UTC) (edited on 2023-09-08 07:28 (UTC) by xiota)
@dreieck Updated so that some crates are downloaded in prepare(). More crates are downloaded midway through the build process. This is out of my control because the build process is controlled by Python scripts. You'll have to work with upstream if you want this changed.
dreieck commented on 2023-05-04 09:54 (UTC)
This PKGBUILD downloads stuff during build() and stores that in the user's $HOME directory ($HOME/.cargo/).
Can you
- make sure that all the Rust dependency downloads take place in prepare(), so that build() and package() do not need an internet connection, and
- make sure that the download goes into some subdirectory of $srcdir, to not clutter the user's home directory (I think $CARGO_HOME is the environment variable that controls this, but please cross-check for yourself, also with python-setuptools-rust specifics)?
cargo rustc --lib --message-format=json-render-diagnostics --manifest-path Cargo.toml --release -v --features pyo3/extension-module -- --crate-type cdylib
Updating crates.io index
Fetch [ ] 1.81%, 295.81KiB/s
Regards and
thanks for maintaining!
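For illustration, the request above might look roughly like this in a PKGBUILD. This is a sketch under assumptions, not the maintainer's actual code: the helper `_set_cargo_home` and the `cargo-home` directory name are made up, the Cargo.toml location is taken from the path that appears in the build logs on this page, and whether the Python-driven build step honors `CARGO_NET_OFFLINE` would need to be checked.

```shell
# Sketch only, not the package's actual PKGBUILD.
# makepkg normally sets srcdir; the fallback just lets the snippet be sourced standalone.
srcdir="${srcdir:-$PWD/src}"

# Hypothetical helper: point cargo's registry and download cache at a
# directory under $srcdir instead of the default ~/.cargo.
_set_cargo_home() {
  export CARGO_HOME="$srcdir/cargo-home"
}

prepare() {
  _set_cargo_home
  # Assumed location of the bindings' Cargo.toml, based on the build logs on this page.
  cd "$srcdir/tokenizers/bindings/python" || return 1
  # Download all crate dependencies up front.
  cargo fetch --manifest-path Cargo.toml
}

build() {
  _set_cargo_home
  # Ask cargo not to touch the network, so it has to use what prepare() fetched.
  export CARGO_NET_OFFLINE=true
  cd "$srcdir/tokenizers/bindings/python" || return 1
  # The package's existing build command would go here.
  :
}
```

With this layout nothing is written to $HOME/.cargo/, and a missing crate shows up as an offline error in build() instead of a silent download.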
taba commented on 2023-02-27 20:07 (UTC)
Ignore what I said. I think I was being pedantic. Sorry for the notification.
xiota commented on 2023-02-27 18:45 (UTC) (edited on 2023-02-27 18:46 (UTC) by xiota)
Why? The release is tagged. This way it is easier to update versions or switch to a git build.
taba commented on 2023-02-27 18:08 (UTC)
Pin the source release hash in PKGBUILD. Use https://github.com/huggingface/tokenizers/archive/refs/tags/v0.13.2.tar.gz.
Pinned Comments
xiota commented on 2024-08-30 16:15 (UTC) (edited on 2024-08-30 16:59 (UTC) by xiota)
Problems: