NeverSight/skills_feed/sentencepiece
Language-independent tokenizer that treats text as a raw Unicode sequence. Supports the BPE and Unigram algorithms. Fast (roughly 50k sentences/sec), lightweight (roughly 6MB memory), deterministic vocabulary. Used by T5, ALBERT, XLNet, and mBART. Trains on raw text with no pre-tokenization step. Use it when you need multilingual support, CJK languages, or reproducible tokenization.
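One way to see why no pre-tokenization is needed: SentencePiece marks whitespace with the meta symbol "▁" (U+2581), so spaces become ordinary characters and the original text can be rebuilt by concatenating pieces. The sketch below illustrates only that idea in plain Python. It is not the library's implementation. The function names are hypothetical, and `toy_segment` is a fixed-width stand-in for a learned BPE or Unigram model. It also assumes the input does not already contain "▁".

```python
# Illustrative sketch only; NOT the SentencePiece implementation.
META = "\u2581"  # the meta symbol SentencePiece uses to mark whitespace


def escape_whitespace(text):
    """Turn spaces into the meta symbol and add a leading marker."""
    return META + text.replace(" ", META)


def toy_segment(escaped, piece_len=3):
    """Hypothetical stand-in for a trained model: fixed-width chunks."""
    return [escaped[i:i + piece_len] for i in range(0, len(escaped), piece_len)]


def detokenize(pieces):
    """Join pieces, map the meta symbol back to spaces, drop the leading marker."""
    text = "".join(pieces).replace(META, " ")
    return text[1:] if text.startswith(" ") else text


text = "Hello world こんにちは"
pieces = toy_segment(escape_whitespace(text))
restored = detokenize(pieces)  # equals the original text
```

Because whitespace survives as a symbol inside the pieces, the same round trip works for languages that do not separate words with spaces, which is why a word-level pre-tokenizer is unnecessary.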
Risk Score: 50 out of 100
Popularity: 10 stars, 2 forks
Updated: Feb 12, 2026