Package Details: python-tokenizers 0.21.0-1

Git Clone URL: https://aur.archlinux.org/python-tokenizers.git (read-only)
Package Base: python-tokenizers
Description: Fast State-of-the-Art Tokenizers optimized for Research and Production
Upstream URL: https://github.com/huggingface/tokenizers
Keywords: huggingface
Licenses: Apache-2.0
Submitter: filipg
Maintainer: xiota (daskol)
Last Packager: xiota
Votes: 8
Popularity: 0.66
First Submitted: 2021-10-23 11:17 (UTC)
Last Updated: 2024-12-21 18:21 (UTC)

Pinned Comments

xiota commented on 2024-08-30 16:15 (UTC) (edited on 2024-08-30 16:59 (UTC) by xiota)

Problems:

Latest Comments


xiota commented on 2023-10-29 21:32 (UTC)

I do not know the reasons for the build errors. I was able to build this in a clean chroot immediately before typing this.
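For anyone unfamiliar with the clean-chroot workflow mentioned above, a rough sketch using Arch's devtools follows. The chroot path is arbitrary, and tool names have shifted across devtools releases, so treat this as a sketch rather than the maintainer's exact procedure.

```shell
# One-time setup: create a base chroot (CHROOT is any directory you choose).
CHROOT="$HOME/chroot"
mkdir -p "$CHROOT"
mkarchroot "$CHROOT/root" base-devel

# Per build: from the directory containing the PKGBUILD, run makepkg inside
# a fresh copy of the chroot (-c cleans the working copy first).
makechrootpkg -c -r "$CHROOT"
```

A clean chroot rules out stale packages and leftover state in the user's environment, which is why it is the usual first step when a build fails locally but not for the maintainer.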

hashworks commented on 2023-10-18 18:51 (UTC)

Build fails for me with aurutils:

==> Making package: python-tokenizers 0.14.1-1 (Wed 18 Oct 2023 08:50:59 PM CEST)
==> Checking runtime dependencies...
==> Checking buildtime dependencies...
==> Retrieving sources...
  -> Updating tokenizers git repo...
remote: Enumerating objects: 494, done.
remote: Counting objects: 100% (87/87), done.
remote: Compressing objects: 100% (37/37), done.
remote: Total 494 (delta 58), reused 61 (delta 50), pack-reused 407
Receiving objects: 100% (494/494), 202.20 KiB | 8.42 MiB/s, done.
Resolving deltas: 100% (323/323), completed with 3 local objects.
From https://github.com/huggingface/tokenizers
 * [new branch]        Pierrci-patch-1      -> Pierrci-patch-1
 + 52037045...628f4e70 refs/pull/1144/merge -> refs/pull/1144/merge  (forced update)
 + 2cab5258...41fc5357 refs/pull/1203/merge -> refs/pull/1203/merge  (forced update)
 * [new ref]           refs/pull/1367/head  -> refs/pull/1367/head
 * [new ref]           refs/pull/1367/merge -> refs/pull/1367/merge
 + 882a169a...8e0b8141 refs/pull/559/merge  -> refs/pull/559/merge  (forced update)
 + c73f1a10...7e638d1d refs/pull/716/merge  -> refs/pull/716/merge  (forced update)
 + df77d0cd...10a18343 refs/pull/842/merge  -> refs/pull/842/merge  (forced update)
 + 30ecc0f1...860bab23 refs/pull/992/merge  -> refs/pull/992/merge  (forced update)
==> Validating source files with sha256sums...
    tokenizers ... Skipped
==> Extracting sources...
  -> Creating working copy of tokenizers git repo...
Cloning into 'tokenizers'...
done.
Switched to a new branch 'makepkg'
==> Starting prepare()...
==> Starting build()...
error: package `clap_builder v4.4.6` cannot be built because it requires rustc 1.70.0 or newer, while the currently active rustc version is 1.65.0
Either upgrade to rustc 1.70.0 or newer, or use
cargo update -p clap_builder@4.4.6 --precise ver
where `ver` is the latest version of `clap_builder` supporting rustc 1.65.0
==> ERROR: A failure occurred in build().
    Aborting...
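The error above means the system rustc is older than the minimum version the clap_builder crate requires. A minimal shell sketch of the version comparison cargo is performing, using sort -V (assumption: a POSIX-ish shell with coreutils; in practice you would substitute the output of rustc --version for the hardcoded value):

```shell
# True when $1 >= $2 under version-number ordering.
version_ge() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Values from the log: active rustc 1.65.0 vs. required 1.70.0.
if version_ge 1.65.0 1.70.0; then
    echo "rustc is new enough"
else
    echo "rustc too old: update Rust (pacman -Syu rust) or pin clap_builder"
fi
```

On an up-to-date Arch system the check passes; the failure here suggests the build environment's rust package was behind the repositories.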

intelfx commented on 2023-10-07 20:59 (UTC)

v0.14.0 no longer builds with Rust 1.73:

   Compiling tokenizers v0.14.0 (/build/python-tokenizers/src/tokenizers/tokenizers)
error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell`
   --> /build/python-tokenizers/src/tokenizers/tokenizers/src/models/bpe/trainer.rs:541:47
    |
537 |                     let w = &words[*i] as *const _ as *mut _;
    |                             -------------------------------- casting happend here
...
541 |                         let word: &mut Word = &mut (*w);
    |                                               ^^^^^^^^^
    |
    = note: `#[deny(invalid_reference_casting)]` on by default

error: could not compile `tokenizers` (lib) due to previous error

Please bump to 0.14.1. I'm flagging the package, as it no longer builds on an up-to-date Arch.

piotroxp commented on 2023-09-28 14:17 (UTC) (edited on 2023-09-28 14:23 (UTC) by piotroxp)

@dreieck Latest logs from pacaur -S python-tokenizers:

==> Extracting sources...
  -> Creating working copy of tokenizers git repo...
Reset branch 'makepkg'
==> Starting prepare()...
    Updating crates.io index
error: failed to select a version for the requirement `env_logger = "=0.10.0"`
candidate versions found which didn't match: 0.9.3, 0.9.2, 0.9.1, ...
location searched: crates.io index
required by package `tokenizers-python v0.14.0 (/nvme/Homes/developer/.cache/pacaur/python-tokenizers/src/tokenizers/bindings/python)`
==> ERROR: A failure occurred in prepare().
    Aborting...
:: failed to verify integrity or prepare python-tokenizers package

xiota commented on 2023-05-30 05:02 (UTC) (edited on 2023-09-08 07:28 (UTC) by xiota)

@dreieck Updated so that some crates are downloaded in prepare(). More crates are downloaded midway through the build process. This is out of my control because the build process is controlled by Python scripts. You'll have to work with upstream if you want this changed.

dreieck commented on 2023-05-04 09:54 (UTC)

This PKGBUILD downloads stuff during build() and stores that in the user's $HOME directory ($HOME/.cargo/).

Can you

  1. make sure that all the Rust dependency downloads take place in prepare(), so that build() and package() do not need an internet connection, and
  2. make sure that the downloads go into some subdirectory of $srcdir, to not clutter the user's home directory (I think $CARGO_HOME is the environment variable that controls this, but please cross-check for yourself, also against python-setuptools-rust specifics)?
cargo rustc --lib --message-format=json-render-diagnostics --manifest-path Cargo.toml --release -v --features pyo3/extension-module -- --crate-type cdylib
    Updating crates.io index
       Fetch [                         ]   1.81%, 295.81KiB/s     

Regards and
thanks for maintaining!
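A hypothetical PKGBUILD fragment illustrating both requests (this is a sketch, not the actual recipe): CARGO_HOME is redirected under $srcdir so nothing lands in ~/.cargo, prepare() pre-fetches crates against the shipped Cargo.lock, and CARGO_NET_OFFLINE makes build() fail loudly if it still reaches for the network.

```shell
prepare() {
    export CARGO_HOME="$srcdir/cargo-home"   # keep cargo's cache out of $HOME
    cd "$srcdir/tokenizers/bindings/python"
    # Fetch crates pinned by Cargo.lock for the host target only.
    cargo fetch --locked --target "$(rustc -vV | sed -n 's/host: //p')"
}

build() {
    export CARGO_HOME="$srcdir/cargo-home"
    export CARGO_NET_OFFLINE=true            # error out instead of downloading
    cd "$srcdir/tokenizers/bindings/python"
    python -m build --wheel --no-isolation
}
```

As the maintainer notes in the reply above, the Python-driven build downloads additional crates midway, so offline mode may simply surface that upstream behavior rather than fix it.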

taba commented on 2023-02-27 20:07 (UTC)

Ignore what I said. I think I was being pedantic. Sorry for the notification.

xiota commented on 2023-02-27 18:45 (UTC) (edited on 2023-02-27 18:46 (UTC) by xiota)

Why? The release is tagged, so the source is already pinned. This way, it's easier to update versions or switch to a git build.

taba commented on 2023-02-27 18:08 (UTC)

Pin the source release hash in the PKGBUILD. Use https://github.com/huggingface/tokenizers/archive/refs/tags/v0.13.2.tar.gz.
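For reference, a hypothetical PKGBUILD fragment of what taba is suggesting: fetch the tagged release tarball and pin its checksum. The sha256 below is a placeholder, not the real hash of the tarball.

```shell
pkgver=0.13.2
source=("tokenizers-$pkgver.tar.gz::https://github.com/huggingface/tokenizers/archive/refs/tags/v$pkgver.tar.gz")
sha256sums=('0000000000000000000000000000000000000000000000000000000000000000')
```

The maintainer's counterpoint above is that a git source checked out at the release tag also pins the exact commit, while making version bumps and git-build variants a one-line change.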