This repository contains the code and resources for the Master's thesis "Exploring different Chinese segmentation approaches: Benefits of radical-based segmentation in low-resource text classification" (winter semester 2022/2023, Eberhard-Karls-Universität Tübingen).
- Data for the TNews experiments: https://metatext.io/datasets/toutiao-text-classification-for-news-titles-(tnews)-(clue-benchmark)
- Data for the ChnSentiCorp experiments: https://ieee-dataport.org/open-access/chnsenticorp
- Data for the WU3D experiments: https://github.com/aidenwang9867/Weibo-User-Depression-Detection-Dataset
- Data for the SWSR experiments: https://zenodo.org/record/4773875
- The radical list: https://github.com/hankcs/sub-character-cws/blob/master/data/radical/radical.txt