39 packages found. Page 1 of 1.

| Name | Version | Votes | Popularity | Description | Maintainer | Last Updated (UTC) |
|---|---|---|---|---|---|---|
| sentencepiece | 0.2.0-2 | 2 | 0.49 | Unsupervised text tokenizer for Neural Network-based text generation | Henry-ZHR | 2024-03-18 04:19 |
| php56-tokenizer | 5.6.40-11 | 48 | 0.25 | tokenizer module for php56 | el_aur | 2024-03-27 14:51 |
| python-tokenizers | 0.15.2-1 | 3 | 0.21 | Fast State-of-the-Art Tokenizers optimized for Research and Production | xiota | 2024-02-17 00:58 |
| php83-tokenizer | 8.3.4-1 | 2 | 0.20 | tokenizer module for php83 | el_aur | 2024-03-27 13:15 |
| php74-tokenizer | 7.4.33-5 | 11 | 0.03 | tokenizer module for php74 | el_aur | 2024-03-27 14:57 |
| php82-tokenizer | 8.2.17-1 | 3 | 0.03 | tokenizer module for php82 | el_aur | 2024-03-27 13:09 |
| python-rtf_tokenize | 1.0.0-8 | 1 | 0.01 | Simple RTF tokenizer package for Python | AlphaJack | 2023-02-06 11:04 |
| php81-tokenizer | 8.1.27-1 | 10 | 0.01 | tokenizer module for php81 | el_aur | 2024-01-03 11:55 |
| jsmn | 1.1.0-1 | 1 | 0.01 | JSON parser/tokenizer | orphan | 2023-08-01 23:12 |
| php73-tokenizer | 7.3.33-11 | 11 | 0.00 | tokenizer module for php73 | matth | 2024-03-27 14:58 |
| mailparser | 3.15.0-3 | 3 | 0.00 | Tokenizer for raw mails | kleintux | 2023-07-09 19:09 |
| php80-tokenizer | 8.0.30-1 | 15 | 0.00 | tokenizer module for php80 | muhviehstarr | 2023-11-23 15:15 |
| r-tokenizers | 0.3.0-1 | 1 | 0.00 | Fast, Consistent Tokenization of Natural Language Text | BioArchLinuxBot | 2022-12-22 12:02 |
| python-crossandra | 1.3.0-1 | 1 | 0.00 | A simple tokenizer operating on enums with a decent amount of configuration | MithicSpirit | 2023-03-09 15:58 |
| php72-tokenizer | 7.2.34-15 | 11 | 0.00 | tokenizer module for php72 | el_aur | 2024-03-27 14:58 |
| python-sacremoses | 0.1.1-1 | 1 | 0.00 | Python port of Moses tokenizer, truecaser and normalizer | hottea | 2023-10-30 22:03 |
| uctodata | 0.9.1-1 | 0 | 0.00 | An advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages. Tokenization is an essential first step in any NLP pipeline. This package contains the necessary data. | proycon | 2022-07-22 09:43 |
| ucto | 0.32.1-1 | 1 | 0.00 | An advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages. Tokenization is an essential first step in any NLP pipeline. | proycon | 2024-03-21 11:50 |
| tokenizer-git | r113.8b0c4e2-1 | 0 | 0.00 | Convert source code into numerical tokens | aksr | 2023-04-22 07:03 |
| sentences-bin | 1.0.0-2 | 0 | 0.00 | A multilingual command line sentence tokenizer | neurosnap | 2023-02-06 14:27 |
| sentencepiece-git | r492.ffa2c82-1 | 0 | 0.00 | Unsupervised text tokenizer for Neural Network-based text generation | panosk | 2019-05-14 09:18 |
| r-hunspell | 3.0.3-2 | 0 | 0.00 | High-Performance Stemmer, Tokenizer, and Spell Checker | BioArchLinuxBot | 2023-10-06 18:04 |
| python2-ucto-git | 10-1 | 1 | 0.00 | Python binding for Ucto, an advanced tokenizer (for NLP) | MarsSeed | 2015-06-21 10:53 |
| python-sentencepiece-git | 0.2.0-1 | 0 | 0.00 | Sentencepiece text tokenizer (Python version) | lumaku | 2023-07-23 18:28 |
| python-sacremoses-git | 0.0.35.r41.gb94654f-1 | 0 | 0.00 | Python port of Moses tokenizer, truecaser and normalizer | orphan | 2020-07-23 10:33 |
| python-html5lib-git | 1.1.r9.gf7cab6f-1 | 0 | 0.00 | A Python HTML parser/tokenizer based on the WHATWG HTML5 spec | robertfoster | 2022-06-09 23:17 |
| python-gruut-lang-en | 2.0.0-00 | 0 | 0.00 | English language files for gruut tokenizer/phonemizer | skeilnet | 2022-11-14 15:41 |
| python-gruut | 2.3.4-00 | 0 | 0.00 | A tokenizer, text cleaner, and phonemizer for many human languages. | skeilnet | 2022-11-14 16:42 |
| php71-tokenizer | 7.1.33-11 | 12 | 0.00 | tokenizer module for php71 | wget | 2024-03-27 15:00 |
| php70-tokenizer | 7.0.33-13 | 11 | 0.00 | tokenizer module for php70 | wget | 2024-03-27 15:00 |
| php55-tokenizer | 5.5.38-15 | 4 | 0.00 | tokenizer module for php55 | el_aur | 2024-03-27 14:52 |
| perl-string-tokenizer | 0.05-1 | 0 | 0.00 | A simple string tokenizer. | jnbek | 2015-06-16 21:54 |
| perl-perl-tokenizer | 0.10-2 | 1 | 0.00 | Perl::Tokenizer - a tiny Perl code tokenizer. | trizen | 2019-02-09 05:27 |
| miteiru-bin | 2.2.0-7 | 0 | 0.00 | An open source Electron video player to learn Japanese. It has main language dictionary and tokenizer (morphological analyzer), heavily based on External software MeCab | zxp19821005 | 2024-03-22 02:30 |
| miteiru | 4.2.1-2 | 0 | 0.00 | An open source Electron video player to learn Japanese. It has main language dictionary and tokenizer (morphological analyzer), heavily based on External software MeCab | zxp19821005 | 2024-03-22 02:39 |
| groonga-tokenizer-friso | 1.1.0-1 | 0 | 0.00 | Friso tokenizer for Groonga. | cosmo0920 | 2020-12-17 08:42 |
| gposttl-git | r34.4d19dda-2 | 0 | 0.00 | Brill's Parts-of-Speech Tagger, with built-in Tokenizer and Lemmatizer | m3thodic | 2022-05-05 01:40 |
| frog | 0.32-1 | 1 | 0.00 | Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. It includes a tokenizer, part-of-speech tagger, lemmatizer, morphological analyser, named entity recognition, shallow parser and dependency parser. | proycon | 2023-12-05 14:52 |
| friso | 1.6.4-2 | 0 | 0.00 | An opensource tokenizer for Chinese. | cosmo0920 | 2020-12-20 13:54 |