Simplifying Text Complexity Analysis with spaCy

Blake Bradford

TRUNAJOD: Simplifying Text Complexity Analysis with spaCy

Text complexity analysis plays a crucial role in various domains, including education, content creation, and natural language processing (NLP) applications. Gathering meaningful insights from texts requires the ability to extract and measure numerous linguistic aspects. That’s where TRUNAJOD comes in.

TRUNAJOD is a Python library for text complexity analysis built on the popular spaCy library. It gives software engineers and solution architects a range of measurements for extracting linguistic features from text data.

Key Features of TRUNAJOD

TRUNAJOD offers a rich set of features that simplify the analysis of text complexity:

  1. Text Processing Utilities: TRUNAJOD provides utilities for text processing, including lemmatization and part-of-speech checking.

  2. Semantic Measurements: Measure the average coherence between sentences and the average synonym overlap in a text.

  3. Givenness Measurements: Calculate metrics such as pronoun density and pronoun-noun ratio.

  4. Emotion Calculations: Use a built-in emotion lexicon to compute emotion scores from the words in the text.

  5. Lexico-Semantic Norm Dataset: Access a lexico-semantic norm dataset to compute lexico-semantic variables from text.

  6. Type-Token Ratio (TTR) Metrics: Explore TTR-based metrics and tunable options.

  7. Syllabizer: Employ a built-in syllabizer (currently only for Spanish) for syllable analysis.

  8. Discourse Markers: Measure connectivity within text using discourse markers.

  9. Surface Proxies of Readability: Compute various surface proxies of text readability.

  10. Parse Tree Similarity: Measure parse tree similarity as an approximation of syntactic complexity.

  11. Parse Tree Correction: Correct parse trees to account for periphrasis, and apply heuristics for clause counting.

  12. Entity Grid and Entity Graphs: Utilize entity grid and entity graph models as measures of coherence.

  13. User-Friendly API: TRUNAJOD offers an easy-to-use API for integration into existing pipelines.
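To make a couple of these measurements concrete, here is a minimal plain-Python sketch of a type-token ratio and two givenness-style metrics. It does not use TRUNAJOD's API: the function names and the exact definitions (for example, pronoun density as pronouns divided by all tokens) are assumptions chosen for illustration, and the library's own formulas may differ. The part-of-speech tags are hand-written here; in practice they would come from a spaCy model.

```python
from typing import List, Tuple


def type_token_ratio(tokens: List[str]) -> float:
    """Distinct word forms divided by total tokens (case-insensitive)."""
    if not tokens:
        return 0.0
    lowered = [t.lower() for t in tokens]
    return len(set(lowered)) / len(lowered)


def pronoun_density(tagged: List[Tuple[str, str]]) -> float:
    """Share of tokens tagged PRON (assumed definition, for illustration)."""
    if not tagged:
        return 0.0
    pronouns = sum(1 for _, pos in tagged if pos == "PRON")
    return pronouns / len(tagged)


def pronoun_noun_ratio(tagged: List[Tuple[str, str]]) -> float:
    """Pronouns divided by nouns; returns 0.0 when the text has no nouns."""
    pronouns = sum(1 for _, pos in tagged if pos == "PRON")
    nouns = sum(1 for _, pos in tagged if pos == "NOUN")
    return pronouns / nouns if nouns else 0.0


# Hand-tagged toy sentence: "Ella compró el libro y lo leyó"
tagged = [
    ("Ella", "PRON"), ("compró", "VERB"), ("el", "DET"), ("libro", "NOUN"),
    ("y", "CCONJ"), ("lo", "PRON"), ("leyó", "VERB"),
]

# Toy token list with repeated words, so the ratio falls below 1
tokens = ["el", "gato", "vio", "el", "otro", "gato"]

print(type_token_ratio(tokens))
print(pronoun_density(tagged))
print(pronoun_noun_ratio(tagged))
```

A plain TTR like this falls as texts get longer, which is one reason libraries offer several TTR-based variants with tunable options.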

Installation and Getting Started

To get started with TRUNAJOD, simply run pip install trunajod to install the library. Note that TRUNAJOD requires Python 3.6.2+ to run successfully.

Before using TRUNAJOD, set up a spaCy model. For Spanish text, for example, download one with python -m spacy download es_core_news_sm. For other models, refer to the spaCy documentation.
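If you want to confirm these prerequisites from Python before running any analysis, a small check along the following lines may help. It is only a sketch: check_environment is a hypothetical helper, not part of TRUNAJOD or spaCy, and the upper-case import name TRUNAJOD is an assumption that you should verify against your installation. It relies on spaCy models downloaded this way being installed as importable Python packages.

```python
import importlib.util
import sys


def check_environment(model_name="es_core_news_sm", min_python=(3, 6, 2)):
    """Report whether the prerequisites described above appear to be present.

    Hypothetical helper for illustration. It only looks packages up by name
    and never imports them, so it is cheap and has no side effects.
    """
    return {
        # Compare (major, minor, micro) against the stated minimum version
        "python_ok": sys.version_info[:3] >= min_python,
        "spacy_installed": importlib.util.find_spec("spacy") is not None,
        # Assumption: the package is imported as TRUNAJOD (upper case)
        "trunajod_installed": importlib.util.find_spec("TRUNAJOD") is not None,
        "model_installed": importlib.util.find_spec(model_name) is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

Any entry reported as "no" points to the corresponding install step above.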

Next, see the quick-start snippet in the README. It shows how to load the TRUNAJOD models and the spaCy model, and how to compute several text complexity measures: lexico-semantic norms, the frequency index, the clause count, and the entity grid.

A Real-World Example

TRUNAJOD has been successfully applied in real-world scenarios. One notable example is the creation of the TRUNAJOD web app, which assesses text complexity and checks text adequacy for particular school levels. The app utilizes various TRUNAJOD indices and latent features derived from analyzing multiple Chilean school system texts, making it a valuable tool for educators and curriculum developers.

To learn more about the TRUNAJOD web app and see it in action, watch the demo video provided in the README documentation.

Contributing and References

Contributions to TRUNAJOD are welcome! If you encounter any bugs or have feature requests, please file an issue on the GitHub repository. Refer to the contributing guidelines for more details on how to contribute.

If you find TRUNAJOD useful, consider citing the following papers for reference:

  1. Palma, D., & Atkinson, J. (2018). “Coherence-based automatic essay assessment.” IEEE Intelligent Systems, 33(5), 26-36.
  2. Palma, D., Soto, C., Veliz, M., Riffo, B., & Gutiérrez, A. (2019). “A Data-Driven Methodology to Assess Text Complexity Based on Syntactic and Semantic Measurements.” In International Conference on Human Interaction and Emerging Technologies.

For more information and additional references, consult the README documentation.

In conclusion, TRUNAJOD offers a comprehensive set of tools and measurements for simplifying text complexity analysis. Leveraging spaCy’s capabilities, TRUNAJOD empowers software engineers and solution architects to gain meaningful insights from text data and enhance NLP applications. Start exploring TRUNAJOD today and unlock a new level of text analysis and understanding.

If you have any questions, please feel free to ask.
