sentencepiece | 0.2.0-2 | 2 | 0.49 | Unsupervised text tokenizer for Neural Network-based text generation | Henry-ZHR | 2024-03-18 04:19 (UTC) |
php56-tokenizer | 5.6.40-11 | 48 | 0.25 | tokenizer module for php56 | el_aur | 2024-03-27 14:51 (UTC) |
python-tokenizers | 0.15.2-1 | 3 | 0.21 | Fast State-of-the-Art Tokenizers optimized for Research and Production | xiota | 2024-02-17 00:58 (UTC) |
php83-tokenizer | 8.3.4-1 | 2 | 0.20 | tokenizer module for php83 | el_aur | 2024-03-27 13:15 (UTC) |
php74-tokenizer | 7.4.33-5 | 11 | 0.03 | tokenizer module for php74 | el_aur | 2024-03-27 14:57 (UTC) |
php82-tokenizer | 8.2.17-1 | 3 | 0.03 | tokenizer module for php82 | el_aur | 2024-03-27 13:09 (UTC) |
python-rtf_tokenize | 1.0.0-8 | 1 | 0.01 | Simple RTF tokenizer package for Python | AlphaJack | 2023-02-06 11:04 (UTC) |
php81-tokenizer | 8.1.27-1 | 10 | 0.01 | tokenizer module for php81 | el_aur | 2024-01-03 11:55 (UTC) |
jsmn | 1.1.0-1 | 1 | 0.01 | JSON parser/tokenizer | orphan | 2023-08-01 23:12 (UTC) |
php73-tokenizer | 7.3.33-11 | 11 | 0.00 | tokenizer module for php73 | matth | 2024-03-27 14:58 (UTC) |
mailparser | 3.15.0-3 | 3 | 0.00 | Tokenizer for raw mails | kleintux | 2023-07-09 19:09 (UTC) |
php80-tokenizer | 8.0.30-1 | 15 | 0.00 | tokenizer module for php80 | muhviehstarr | 2023-11-23 15:15 (UTC) |
r-tokenizers | 0.3.0-1 | 1 | 0.00 | Fast, Consistent Tokenization of Natural Language Text | BioArchLinuxBot | 2022-12-22 12:02 (UTC) |
python-crossandra | 1.3.0-1 | 1 | 0.00 | A simple tokenizer operating on enums with a decent amount of configuration | MithicSpirit | 2023-03-09 15:58 (UTC) |
php72-tokenizer | 7.2.34-15 | 11 | 0.00 | tokenizer module for php72 | el_aur | 2024-03-27 14:58 (UTC) |
python-sacremoses | 0.1.1-1 | 1 | 0.00 | Python port of Moses tokenizer, truecaser and normalizer | hottea | 2023-10-30 22:03 (UTC) |
uctodata | 0.9.1-1 | 0 | 0.00 | An advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages. Tokenization is an essential first step in any NLP pipeline. This package contains the necessary data. | proycon | 2022-07-22 09:43 (UTC) |
ucto | 0.32.1-1 | 1 | 0.00 | An advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages. Tokenization is an essential first step in any NLP pipeline. | proycon | 2024-03-21 11:50 (UTC) |
tokenizer-git | r113.8b0c4e2-1 | 0 | 0.00 | Convert source code into numerical tokens | aksr | 2023-04-22 07:03 (UTC) |
sentences-bin | 1.0.0-2 | 0 | 0.00 | A multilingual command line sentence tokenizer | neurosnap | 2023-02-06 14:27 (UTC) |
sentencepiece-git | r492.ffa2c82-1 | 0 | 0.00 | Unsupervised text tokenizer for Neural Network-based text generation | panosk | 2019-05-14 09:18 (UTC) |
r-hunspell | 3.0.3-2 | 0 | 0.00 | High-Performance Stemmer, Tokenizer, and Spell Checker | BioArchLinuxBot | 2023-10-06 18:04 (UTC) |
python2-ucto-git | 10-1 | 1 | 0.00 | Python binding for Ucto, an advanced tokenizer (for NLP) | MarsSeed | 2015-06-21 10:53 (UTC) |
python-sentencepiece-git | 0.2.0-1 | 0 | 0.00 | Sentencepiece text tokenizer (Python version) | lumaku | 2023-07-23 18:28 (UTC) |
python-sacremoses-git | 0.0.35.r41.gb94654f-1 | 0 | 0.00 | Python port of Moses tokenizer, truecaser and normalizer | orphan | 2020-07-23 10:33 (UTC) |
python-html5lib-git | 1.1.r9.gf7cab6f-1 | 0 | 0.00 | A Python HTML parser/tokenizer based on the WHATWG HTML5 spec | robertfoster | 2022-06-09 23:17 (UTC) |
python-gruut-lang-en | 2.0.0-00 | 0 | 0.00 | English language files for gruut tokenizer/phonemizer | skeilnet | 2022-11-14 15:41 (UTC) |
python-gruut | 2.3.4-00 | 0 | 0.00 | A tokenizer, text cleaner, and phonemizer for many human languages. | skeilnet | 2022-11-14 16:42 (UTC) |
php71-tokenizer | 7.1.33-11 | 12 | 0.00 | tokenizer module for php71 | wget | 2024-03-27 15:00 (UTC) |
php70-tokenizer | 7.0.33-13 | 11 | 0.00 | tokenizer module for php70 | wget | 2024-03-27 15:00 (UTC) |
php55-tokenizer | 5.5.38-15 | 4 | 0.00 | tokenizer module for php55 | el_aur | 2024-03-27 14:52 (UTC) |
perl-string-tokenizer | 0.05-1 | 0 | 0.00 | A simple string tokenizer. | jnbek | 2015-06-16 21:54 (UTC) |
perl-perl-tokenizer | 0.10-2 | 1 | 0.00 | Perl::Tokenizer - a tiny Perl code tokenizer. | trizen | 2019-02-09 05:27 (UTC) |
miteiru-bin | 2.2.0-7 | 0 | 0.00 | An open source Electron video player to learn Japanese. It has main language dictionary and tokenizer (morphological analyzer), heavily based on External software MeCab | zxp19821005 | 2024-03-22 02:30 (UTC) |
miteiru | 4.2.1-2 | 0 | 0.00 | An open source Electron video player to learn Japanese. It has main language dictionary and tokenizer (morphological analyzer), heavily based on External software MeCab | zxp19821005 | 2024-03-22 02:39 (UTC) |
groonga-tokenizer-friso | 1.1.0-1 | 0 | 0.00 | Friso tokenizer for Groonga. | cosmo0920 | 2020-12-17 08:42 (UTC) |
gposttl-git | r34.4d19dda-2 | 0 | 0.00 | Brill's Parts-of-Speech Tagger, with built-in Tokenizer and Lemmatizer | m3thodic | 2022-05-05 01:40 (UTC) |
frog | 0.32-1 | 1 | 0.00 | Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. It includes a tokenizer, part-of-speech tagger, lemmatizer, morphological analyser, named entity recognition, shallow parser and dependency parser. | proycon | 2023-12-05 14:52 (UTC) |
friso | 1.6.4-2 | 0 | 0.00 | An opensource tokenizer for Chinese. | cosmo0920 | 2020-12-20 13:54 (UTC) |