How do I regex non-Latin alphabets? (4)

1 Name: 4n0n4ym0u5 h4xx0r : 2008-03-29 17:19 ID:6t89q39/

If I wanted to, say, look for katakana in a block of Japanese text, or look for men and women of the same family in a Cyrillic passage (because Russian surnames change with gender), how would I do that?

2 Name: 4n0n4ym0u5 h4xx0r : 2008-03-30 11:35 ID:Heaven

It probably depends on the regular expression engine, but with Java's you can use \p{InKatakana} to match a single character in the Katakana block. Substitute the name of whatever block you need for Katakana; the block name seems to be case-insensitive too.
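
A minimal sketch of that in Java, assuming the goal is to pull out every run of katakana from a string. The class name KatakanaFinder and the method findKatakanaRuns are made up for illustration; only \p{InKatakana} comes from the post above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class KatakanaFinder {
    // \p{InKatakana} matches one character from the Unicode Katakana block
    // (U+30A0 to U+30FF); the trailing + extends the match to a whole run.
    private static final Pattern KATAKANA_RUN = Pattern.compile("\\p{InKatakana}+");

    // Returns every maximal run of Katakana-block characters, in order of appearance.
    static List<String> findKatakanaRuns(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = KATAKANA_RUN.matcher(text);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        // Mixed kanji, hiragana and katakana; only the katakana runs are printed.
        System.out.println(findKatakanaRuns("私はコーヒーとパンが好きです"));
    }
}
```

Note that the Katakana block also contains the long-vowel mark ー and the middle dot ・, so those count as matches here. Half-width katakana live in a different block and would need their own class.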

3 Name: 4n0n4ym0u5 h4xx0r : 2008-03-30 19:13 ID:LfNbRF7m

It doesn't matter; you can just use the Unicode characters directly.
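
One reading of this: the pattern can contain the non-Latin characters literally. A rough sketch for the surname case, assuming the feminine form is simply the masculine form plus "а" (true for names like Иванов/Иванова, not for ones like Достоевский/Достоевская) and ignoring grammatical case endings. SurnameMatcher and findFamily are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class SurnameMatcher {
    // Finds the masculine surname and, under the stated assumption,
    // its feminine form (the same word with a trailing "а").
    static List<String> findFamily(String masculineSurname, String text) {
        // Pattern.quote keeps the surname literal; "а?" is an optional Cyrillic а.
        Pattern p = Pattern.compile(Pattern.quote(masculineSurname) + "а?");
        List<String> hits = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findFamily("Иванов", "Иванов и Иванова пришли, а Петров нет."));
    }
}
```

The source file has to be saved and compiled as UTF-8 for the literal characters to survive. Declined forms such as Иванову would still match on their first six letters, so real text needs more than this.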

4 Name: 4n0n4ym0u5 h4xx0r : 2008-04-01 13:04 ID:PttmDpY7

Certainly, for simple tests you can just iterate over the characters, yes. For anything more complicated, regular expressions are useful.
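
For the simple-test case, iterating over the characters might look something like this in Java. This is only a sketch: KatakanaScan and containsKatakana are hypothetical names, and it checks the Katakana block only.

```java
// Hypothetical helper class, not part of any library.
class KatakanaScan {
    // Walks the string code point by code point and reports whether
    // any of them belongs to the Unicode Katakana block.
    static boolean containsKatakana(String text) {
        return text.codePoints()
                   .anyMatch(cp -> Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.KATAKANA);
    }

    public static void main(String[] args) {
        System.out.println(containsKatakana("これはテストです"));
        System.out.println(containsKatakana("ひらがなだけ"));
    }
}
```

This is fine for a yes/no check. Once you need runs, positions, or context around the match, the regex version is less work.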
