Open Source Chinese Conversion in WebAssembly

Blake Bradford

Introducing wasm-opencc: Open Source Chinese Conversion in WebAssembly

Have you ever needed to convert Chinese text between simplified and traditional characters in a browser environment? wasm-opencc is an open-source project that uses WebAssembly to bring OpenCC to the web.

What is wasm-opencc?

wasm-opencc is a WebAssembly build of OpenCC, the popular Chinese conversion library developed by BYVoid. The project builds on OpenCC and compiles it with Emscripten, so Chinese text conversion can run directly in the web browser.

Key Features

  • Direct execution in the browser environment using WebAssembly (wasm)
  • Seamless integration in Node.js and Electron applications without requiring addon compilation
  • Potential compatibility with React Native and Web Workers (untested)
  • Efficient loading of necessary dictionary data in the browser and support for custom data loading methods from a CDN

Installation

To run wasm-opencc in a browser environment, simply copy the files from the “dist” folder into your project and load them in your HTML file. Don’t forget to include the “.mem” file, as it is necessary for the code to run.

For Node.js environments, you can install wasm-opencc using npm:

$ npm install wasm-opencc

After installation, you can use wasm-opencc as follows:

```javascript
const { DictSource, Converter } = require('wasm-opencc')
const dictSource = new DictSource('s2t.json')

dictSource.get().then((args) => {
  const converter = new Converter(...args)
  console.log(converter.convert('繁体'))
  // Call delete() manually once the converter is no longer needed, to release its memory
  converter.delete()
})
```
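
Because the converter has to be released by hand, it is easy to leak it when the code between creation and delete() throws. The following is a minimal sketch of one way to guard against that, not something wasm-opencc ships: withConverter and createFakeConverter are hypothetical names introduced here, and the fake converter only stands in for the real one so the snippet runs without the library installed.

```javascript
// Hypothetical helper (not part of wasm-opencc): runs `use(converter)` and
// always calls converter.delete() afterwards, even if `use` throws or rejects.
async function withConverter(createConverter, use) {
  const converter = await createConverter()
  try {
    return await use(converter)
  } finally {
    converter.delete()
  }
}

// Stand-in converter so the sketch runs on its own. With wasm-opencc,
// createConverter would instead look something like:
//   () => new DictSource('s2t.json').get().then((args) => new Converter(...args))
function createFakeConverter() {
  const fake = {
    deleted: false,
    convert: (text) => text.toUpperCase(), // placeholder, not a real Chinese conversion
    delete: () => { fake.deleted = true }
  }
  return Promise.resolve(fake)
}

withConverter(createFakeConverter, (converter) => converter.convert('abc'))
  .then((result) => console.log(result)) // logs "ABC"
```

The try/finally keeps the cleanup in one place, so callers only write the conversion logic and cannot forget the delete() call.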

Please note that OpenCC itself also has a Node Addon version, so choose the appropriate version based on your needs.

API

The wasm-opencc API consists of two main classes: DictSource and Converter.

class DictSource

The DictSource class is responsible for loading the dictionary data required to initialize the Converter. You can either use the default data sources or provide your own dictionary data.

```javascript
const dictSource = new DictSource('s2t.json')

dictSource.setDictProxy((dictName) => {
  // The proxy must return a promise
  return Promise.resolve('僞\t偽\n')
})

dictSource.get() // Calls the proxy function
```
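
Since the proxy is just a function from a dictionary name to a promise of dictionary text, it can wrap any loader, such as a CDN fetch. The sketch below shows one possible wrapper that caches each dictionary so it is loaded at most once. makeCachedDictProxy, loadInline, and the 'custom.txt' name are assumptions made for illustration; the names wasm-opencc actually passes to the proxy are not covered here.

```javascript
// Hypothetical helper (not part of wasm-opencc): wraps any loader so that
// each dictionary name triggers at most one load.
function makeCachedDictProxy(loadText) {
  const cache = new Map() // dictName -> Promise<string>
  return function dictProxy(dictName) {
    if (!cache.has(dictName)) {
      const pending = Promise.resolve()
        .then(() => loadText(dictName))
        .catch((err) => {
          cache.delete(dictName) // do not cache failures, so a later retry can succeed
          throw err
        })
      cache.set(dictName, pending)
    }
    return cache.get(dictName)
  }
}

// Example loader: serves a tiny in-memory dictionary in the same
// "key<TAB>value<NEWLINE>" text form as the snippet above.
// A real loader might fetch the text from a CDN instead.
const inlineDicts = { 'custom.txt': '僞\t偽\n' }
function loadInline(dictName) {
  if (!(dictName in inlineDicts)) {
    return Promise.reject(new Error(`unknown dictionary: ${dictName}`))
  }
  return Promise.resolve(inlineDicts[dictName])
}

const cachedProxy = makeCachedDictProxy(loadInline)
// With the real library this would be registered as:
//   dictSource.setDictProxy(cachedProxy)
```

Caching the promise rather than the resolved text means that concurrent requests for the same dictionary share a single in-flight load.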

class Converter

The Converter class is used to perform text conversion.

```javascript
const { DictSource, Converter } = OpenCCWasm_
OpenCCWasm_.ready().then(() => {
  const dictSource = new DictSource('s2t.json')
  return dictSource.get()
}).then((args) => {
  const converter = new Converter(...args)
  console.log(converter.convert('繁体'))
  // Call delete() manually once the converter is no longer needed, to release its memory
  converter.delete()
})
```

Known Issues

  • Cannot handle the “.ocd” type of dictionary data
  • Compatibility issues with UglifyJS: Uglifying the wasm-compiled code may cause it to fail on Chrome but work on Safari. This may be a Chrome-specific issue.
  • Due to the above reasons and other code-related issues, the author has customized the bundled files for wasm-opencc running in the browser environment. It is not recommended to bundle wasm-opencc yourself if your project is intended for the browser environment.

Future Plans

The wasm-opencc project has exciting plans for future development, including:

  • Benchmarking to compare performance differences between the C++ version, the Node Addon version, and the wasm version
  • Providing a WebAssembly version for even smaller file sizes
  • Exploring compilation with Closure to improve efficiency

Contribute

If you’re interested in contributing to the wasm-opencc project, follow these steps:

  1. Install the necessary dependencies for Emscripten and OpenCC
  2. Compile the code using Emscripten: make -f WasmMakefile
  3. Build the JavaScript-related code: cd wasm && npm run build
  4. Generate documentation: cd wasm && npm run docs
  5. Run tests: cd wasm && npm run test

License

The wasm-opencc project is based on OpenCC and includes additional code in the /src/wasm directory. Some CMakeLists.txt files from the original project have been modified. The wasm-related JavaScript code is mainly located in the /wasm directory and is considered an addition to the original code. The /docs folder contains code for the gh-pages documentation.

wasm-opencc is licensed under the Apache License 2.0.

