Introduction
For a comprehensive overview of my open-source projects, please visit my GitHub.
I am Ayaka, a 24-year-old researcher in computer science, historical linguistics, and mathematics.
NLP
I am driven by my lifelong goal of designing powerful artificial intelligence that can understand and process a diverse range of languages, both widely spoken and under-resourced. My strong background in computer science has allowed me to gain a thorough understanding of the architecture of state-of-the-art NLP models.
I have made significant contributions to NLP, including the development of TransCan, an English-Cantonese machine translation model that outperforms state-of-the-art commercial products by 11.8 BLEU. I implemented the BART model from scratch using JAX, establishing a versatile codebase for future deep learning model architecture research. Moreover, I created the LIHKG Cantonese Dataset through a scraper that bypassed many layers of Cloudflare’s protection, resulting in a corpus of 172 million unique sentences.
Recent advancements in NLP technology have been particularly exciting, including the release of ChatGPT, a large language model with strong language understanding and reasoning abilities. As one of the earliest users of ChatGPT Plus, I am thrilled to be at the forefront of these developments. I am working on various prompt and dialogue engineering techniques, and I am fine-tuning ChatGPT-like language models such as LLaMA for various NLP tasks and applications.
Nevertheless, even the best NLP models today are unable to process low-resource languages, even though humans can adapt to these languages through appropriate methods. My long-term objective is to find a method that enables AI to pick up low-resource languages without human supervision.
Linguistics
As a historian of languages, I am deeply interested in exploring the evolution of language and how ancient languages from different geographical regions and ethnic groups have developed into what they are today and what they may become in the future.
To explore these questions, I believe it is essential to have a deep understanding of the history of various languages. With regard to the Chinese language, I have utilised my mastery of the Qieyun phonological system to develop the widely used qieyun-js programming library. I also possess a deep knowledge of the history of Japanese kanji pronunciation, with the ability to read any text written in kanji using the old Japanese pronunciation of on'yomi.
To attain an intuitive sense of language evolution, I have been researching groups of similar languages and dialects, drawing on anthropology and ethnology. For example, in addition to my proficiency in Mandarin and Cantonese, I am also familiar with the Hakka, Hokkien and Teochew dialects. I am simultaneously studying the North Germanic languages: Norwegian, Danish, Swedish, Icelandic, Faroese and Old Norse. Furthermore, I am studying Malay and the historical development of the Austronesian languages.
By leveraging my extensive knowledge of language evolution, I am poised to make valuable contributions to the field of historical linguistics and shape our understanding of the future of world languages.
Mathematics
Mathematics has always been a passion of mine and I have dedicated myself to exploring a wide range of areas including group theory, number theory, category theory and type theory.
In my research, I am particularly interested in formalising mathematical theorems through theorem provers, specifically using the Lean proof assistant. I believe that this approach not only deepens our understanding of mathematical concepts but also opens up new avenues for discovery in mathematics.
Unfortunately, due to limited time, I am currently unable to provide individual supervision. However, I am always eager to collaborate with other researchers who share my passion for mathematics.
If you would like to learn more about my work or discuss potential collaboration opportunities, please feel free to reach out to me. I look forward to hearing from you!
Articles
Miscellaneous
Aside from my professional pursuits, I am passionate about anthropology, ethnology, mythology, typography, and astronomy. In my leisure time, I engage in a range of diverse hobbies that help me stay grounded and inspired. These include singing, hiking, plane-spotting and stargazing.
In my use of technology, I have made several unique choices. I use the PWBRHK keyboard layout on my mobile phone, instead of the conventional QWERTY layout. This layout, which I designed specifically with linguistics in mind, is optimised for typing Cantonese and Nordic languages.
I use my self-designed Nya calendar alongside the Gregorian calendar. The Nya calendar takes into account not only the orbit of the Earth, but also that of the Moon and Mercury, and boasts several advantageous features.
Additionally, I use Arch Linux as my daily desktop operating system. As a lightweight and highly customisable Linux distribution, Arch Linux is known for its rolling-release model, up-to-date software packages, and minimalistic approach. By maintaining only one copy of a software package, which is shared among all packages that require it, Arch Linux keeps my system uncluttered and reduces redundancy.
As a Baroness of the Principality of Sealand, I am honoured to hold a noble title and play a crucial role in shaping its future. My role is a privilege that enables me to represent its interests, promote its sovereignty, and embody its values and principles. I am proud to stand at the forefront of this self-determined entity, paving the way for a new era of independence and self-expression.