Habachen: Unlocking the Power of Japanese Character Conversion

February 14, 2024

Are you tired of manually converting Japanese text between full-width and half-width characters? Have you struggled with cumbersome methods to convert hiragana to katakana and vice versa? Look no further! Introducing Habachen, a high-speed, memory-efficient Python module for Japanese character conversion that will simplify your text processing tasks.

Installation

Before we dive into the amazing capabilities of Habachen, let’s quickly install it:

```bash
pip install habachen
```

Usage

Habachen provides several convenient functions for character conversion. Let’s explore a few of them:

Converting Half-width to Full-width

```python
import habachen

text = 'abc!?012ﾊﾝｶｸﾓｼﾞ'
converted_text = habachen.han_to_zen(text)
print(converted_text)
# Output: 'ａｂｃ！？０１２ハンカクモジ'
```
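The ASCII part of this conversion comes down to a fixed offset. Unicode places the full-width forms of printable ASCII (U+0021 to U+007E) at U+FF01 to U+FF5E, which is 0xFEE0 higher. The pure-Python sketch below illustrates only that idea. `ascii_han_to_zen` is a hypothetical helper and is not part of Habachen's API or implementation. It leaves half-width katakana untouched on purpose, because those characters need a lookup table and voiced-mark merging.

```python
# Hypothetical helper, not part of habachen: handles printable ASCII only.
# U+0021..U+007E map to U+FF01..U+FF5E by adding a fixed offset.
FULLWIDTH_OFFSET = 0xFEE0

def ascii_han_to_zen(text: str) -> str:
    """Convert printable ASCII characters to their full-width forms."""
    return "".join(
        chr(ord(ch) + FULLWIDTH_OFFSET) if 0x21 <= ord(ch) <= 0x7E else ch
        for ch in text
    )

print(ascii_han_to_zen("abc!?012"))  # ａｂｃ！？０１２
print(ascii_han_to_zen("ﾊﾝｶｸﾓｼﾞ"))   # ﾊﾝｶｸﾓｼﾞ (kana are outside the range, so unchanged)
```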

Converting Half-width Katakana Only

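```python
import habachen

text = 'abc!?012ﾊﾝｶｸﾓｼﾞ'
# Turn off ASCII (letters and symbols) and digit conversion so only half-width katakana changes
converted_text = habachen.han_to_zen(text, ascii=False, digit=False, kana=True)
print(converted_text)
# Output: 'abc!?012ハンカクモジ'
```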
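For this particular input, Python's standard library gives the same result through NFKC normalization, which makes a useful point of comparison. The sketch below uses only `unicodedata` and does not describe how Habachen works. NFKC also does much more than a kana-only switch, as the last line shows.

```python
import unicodedata

text = "abc!?012ﾊﾝｶｸﾓｼﾞ"
# NFKC maps half-width katakana to full-width katakana and merges a trailing
# half-width voiced mark into the preceding kana (ｼﾞ -> ジ). Plain ASCII
# letters, digits and symbols are already in NFKC form, so they stay as-is.
normalized = unicodedata.normalize("NFKC", text)
print(normalized)  # abc!?012ハンカクモジ

# Caveat: NFKC also folds other compatibility characters (full-width
# letters, circled digits, ...), so it is not a kana-only conversion.
print(unicodedata.normalize("NFKC", "ａ①"))  # a1
```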

Converting Full-width to Half-width

```python
import habachen

text = 'ａｂｃ！？０１２ゼンカクモジ'
converted_text = habachen.zen_to_han(text)
print(converted_text)
# Output: 'abc!?012ｾﾞﾝｶｸﾓｼﾞ'
```
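For ASCII, the reverse direction just subtracts the same offset. Katakana is the harder part. Voiced kana such as ゼ have no single half-width code point, so one input character becomes two output characters (ｾﾞ). That is why 'ゼンカクモジ' (6 characters) comes out as 'ｾﾞﾝｶｸﾓｼﾞ' (8 characters). The sketch below illustrates both points. `ascii_zen_to_han` is a hypothetical helper and is not part of Habachen.

```python
import unicodedata

FULLWIDTH_OFFSET = 0xFEE0

def ascii_zen_to_han(text: str) -> str:
    """Hypothetical helper: map full-width ASCII (U+FF01..U+FF5E) back to ASCII."""
    return "".join(
        chr(ord(ch) - FULLWIDTH_OFFSET) if 0xFF01 <= ord(ch) <= 0xFF5E else ch
        for ch in text
    )

print(ascii_zen_to_han("ａｂｃ！？０１２"))  # abc!?012

# Canonical decomposition splits ゼ into its base kana plus a combining
# voiced mark, so a converter must emit two half-width characters (ｾ + ﾞ).
print([hex(ord(c)) for c in unicodedata.normalize("NFD", "ゼ")])  # ['0x30bb', '0x3099']
```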

Converting Full-width Katakana to Hiragana

```python
import habachen

text = 'モジレツノ変換'
converted_text = habachen.to_hiragana(text)
print(converted_text)
# Output: 'もじれつの変換'
```

Converting Hiragana to Katakana

```python
import habachen

text = 'もじれつの変換'
converted_text = habachen.to_katakana(text)
print(converted_text)
# Output: 'モジレツノ変換'
```
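Hiragana/katakana conversion is also mostly arithmetic. The hiragana letters U+3041 to U+3096 and the katakana letters U+30A1 to U+30F6 appear in the same order and sit 0x60 code points apart, so each direction is a shift applied only inside the matching range. The helpers below are hypothetical illustrations of that mapping and are not Habachen's implementation. Kanji such as 変換 and shared marks such as ー fall outside both ranges and pass through unchanged.

```python
# Hypothetical helpers, not habachen's implementation.
# Hiragana U+3041..U+3096 and katakana U+30A1..U+30F6 are 0x60 apart.
KANA_OFFSET = 0x60

def hira_to_kata(text: str) -> str:
    """Shift hiragana letters into the katakana block; leave everything else."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if 0x3041 <= ord(ch) <= 0x3096 else ch
        for ch in text
    )

def kata_to_hira(text: str) -> str:
    """Shift katakana letters into the hiragana block; leave everything else."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if 0x30A1 <= ord(ch) <= 0x30F6 else ch
        for ch in text
    )

print(hira_to_kata("もじれつの変換"))  # モジレツノ変換
print(kata_to_hira("モジレツノ変換"))  # もじれつの変換
```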

Benchmarks

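To measure Habachen's performance, we benchmarked each conversion task against two other Python libraries, mojimoji and jaconv, on a short and a long input. Times are per conversion call, so lower is better. mojimoji does not provide hiragana/katakana conversion, so it has no results for those rows. Here are the results: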

Short Text (140 characters)

| Conversion Task | Habachen | mojimoji | jaconv |
|---|---|---|---|
| Full-width to Half-width | 1.319 µs | 11.92 µs | 11.22 µs |
| Half-width to Full-width | 1.147 µs | 10.15 µs | 26.49 µs |
| Hiragana to Katakana | 0.3674 µs | n/a | 11.22 µs |
| Katakana to Hiragana | 0.3542 µs | n/a | 10.97 µs |

Long Text (468,996 characters)

| Conversion Task | Habachen | mojimoji | jaconv |
|---|---|---|---|
| Full-width to Half-width | 2.607 ms | 55.07 ms | 40.36 ms |
| Half-width to Full-width | 1.832 ms | 33.89 ms | 57.16 ms |
| Hiragana to Katakana | 0.711 ms | n/a | 38.72 ms |
| Katakana to Hiragana | 0.755 ms | n/a | 40.36 ms |

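Impressive, right? In these runs, Habachen was roughly 8 to 55 times faster than mojimoji and jaconv on every task. Its lead grew on the long input, where the hiragana/katakana conversions ran more than 50 times faster than jaconv.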
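Timings like these depend heavily on hardware and input, so it is worth measuring on your own data. Below is a minimal harness that assumes only Python's `timeit` module. `time_per_call` is a hypothetical helper. The harness times a stand-in converter (`str.upper`) so it runs without third-party packages, using a 140-character sample to match the short-text size above. To compare libraries, pass `habachen.zen_to_han`, `mojimoji.zen_to_han` or `jaconv.z2h` instead. Check each library's options first so that all three convert the same character classes.

```python
import timeit
from typing import Callable

def time_per_call(func: Callable[[str], str], text: str, number: int = 10_000) -> float:
    """Return the best average time per call, in microseconds, over 5 repeats."""
    totals = timeit.repeat(lambda: func(text), number=number, repeat=5)
    # Taking the minimum total reduces noise from other processes.
    return min(totals) / number * 1e6

# 14 characters x 10 = 140 characters, like the short-text benchmark.
sample = "ａｂｃ！？０１２ゼンカクモジ" * 10
# Stand-in converter; replace str.upper with the library function under test.
print(f"{time_per_call(str.upper, sample):.3f} µs per call")
```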

Conclusion

Habachen is a game-changer for anyone working with Japanese text processing. Its high-speed and memory-efficient character conversion capabilities make it a must-have tool. Whether you need to convert full-width characters to half-width, hiragana to katakana, or vice versa, Habachen is your go-to module.

Now go ahead and unleash the power of Habachen in your Japanese text processing tasks!

For more details, documentation, and an in-depth performance analysis, visit the Habachen GitHub Repository and the Habachen Documentation.

Also, check out the fascinating Qiita article (in Japanese) by the creator of Habachen.

Happy text processing with Habachen!
