Prepare text for word analysis
Count word frequencies after stripping punctuation, so that punctuation marks do not create noisy tokens (for example, so that "word," and "word" are counted as the same word).
Remove every punctuation and symbol character from the input using the Unicode general categories P (Punctuation) and S (Symbol). Letters, digits, combining marks, and whitespace remain intact.
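A minimal Python sketch of the rule above. It assumes the tool checks each character's general category for a P or S prefix and deletes matches without inserting a space. `strip_punct_and_symbols` and `word_frequencies` are illustrative names, and lowercasing before counting is an added assumption:

```python
import unicodedata
from collections import Counter


def strip_punct_and_symbols(text: str) -> str:
    """Drop every character whose Unicode general category starts with P or S."""
    return "".join(ch for ch in text if unicodedata.category(ch)[0] not in "PS")


def word_frequencies(text: str) -> Counter:
    """Strip punctuation and symbols, lowercase (assumption), then count whitespace-separated words."""
    return Counter(strip_punct_and_symbols(text).lower().split())


if __name__ == "__main__":
    sample = "Hello, world! Hello... world? (hello)"
    print(strip_punct_and_symbols(sample))  # Hello world Hello world hello
    print(word_frequencies(sample))         # Counter({'hello': 3, 'world': 2})
```

Because matches are deleted outright, this sketch joins words across apostrophes and hyphens: "don't" becomes "dont" and "e-mail" becomes "email", since both marks are in category P.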
Are emoji removed?
Yes. Emoji pictographs belong to the Unicode Symbol category (S), so they are removed.
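This holds for the pictographs themselves. However, a single visible emoji is often a sequence of several code points. The Python sketch below uses sample strings chosen for illustration. It lists each code point's general category and what a strict P/S filter would leave behind. Skin-tone modifiers are Sk and are removed. The zero-width joiner (Cf) and variation selector-16 (Mn) are neither P nor S, so they would survive as invisible characters, assuming the tool applies only the category test:

```python
import unicodedata


def survivors(text: str) -> list:
    """Return the code points (as U+XXXX) that are neither P* nor S* and would therefore remain."""
    return [f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch)[0] not in "PS"]


samples = {
    # MAN + ZERO WIDTH JOINER + WOMAN + ZERO WIDTH JOINER + GIRL
    "family": "\U0001F468\u200D\U0001F469\u200D\U0001F467",
    # THUMBS UP SIGN + EMOJI MODIFIER FITZPATRICK TYPE-4 (medium skin tone)
    "thumbs up": "\U0001F44D\U0001F3FD",
    # HEAVY BLACK HEART + VARIATION SELECTOR-16 (emoji presentation)
    "red heart": "\u2764\uFE0F",
}

for name, emoji in samples.items():
    categories = [unicodedata.category(ch) for ch in emoji]
    # family leaves two ZWJs, thumbs up leaves nothing, red heart leaves VS16
    print(name, categories, survivors(emoji))
```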
Are Vietnamese diacritics affected?
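No. Vietnamese diacritics are either part of a precomposed letter (category L) or combining marks (category M), not punctuation or symbols, so they are preserved. To strip them, use Remove Accents.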
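Vietnamese text may arrive precomposed (NFC, where "ế" is a single letter in category Ll) or decomposed (NFD, a base letter followed by combining marks in category Mn). The following Python check uses an illustrative sample sentence and a P/S filter written for this example. It confirms that both forms keep their diacritics and that only the comma and the exclamation mark are removed:

```python
import unicodedata


def strip_punct_and_symbols(text: str) -> str:
    """Drop characters whose Unicode general category starts with P or S."""
    return "".join(ch for ch in text if unicodedata.category(ch)[0] not in "PS")


sample = "Tiếng Việt, có dấu!"    # illustrative sentence
expected = "Tiếng Việt có dấu"

for form in ("NFC", "NFD"):
    text = unicodedata.normalize(form, sample)
    # Categories of the non-ASCII characters: Ll when precomposed, Mn when decomposed.
    accent_categories = sorted({unicodedata.category(ch) for ch in text if not ch.isascii()})
    stripped = strip_punct_and_symbols(text)
    print(form, accent_categories, stripped == unicodedata.normalize(form, expected))
```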