
Sonia Ben Hedia Twomey
Linguistics - Data - Research - Education
A mixture second to none
Who am I?
I am first and foremost someone who likes to bring order to chaos, solve problems, and find answers.
The domains in which I do so have changed over the years, and so have the ways in which I do it.
I am a computational linguist, a scientist, a data analyst. I am a writer, a teacher, a critical thinker.
I love planning, organizing, efficiency, and structure. I am eager to learn, motivated, and never give up. I am genuine and not afraid to admit when I do not know something - I am the one person who asks.


My Story
I grew up in Siegen, Germany. After high school, I decided to become a teacher because I a) wanted to work in the social field and give back, b) loved languages and maths and did not know another way to combine them, and c) did not really know what else to do. I started a five-year program at Siegen University in 2008. During my studies, I fell in love with my choice - studying psychology, didactics, and pedagogy was beyond interesting, and teaching was really fun. I did, however, discover something else I loved - linguistics and empirical research.
I enjoyed the structure of language and fell in love with the analytics of the scientific method. It all just made so much sense. I therefore decided to write my Master's thesis in linguistics, and it turned out pretty well. In fact, it was so good that I was offered a PhD position in a linguistic research project on the morpho-phonological interface (no worries if you don't know what that is - I did not at the time). I took the opportunity and moved to Düsseldorf, where I started the project.
During my time in Düsseldorf, I learned more than I could ever have imagined. I designed studies, worked with databases, annotated data, learned how to code in Python, modeled data statistically, taught university seminars, conducted experiments in Cambridge, UK, went to a number of international conferences, presented my project numerous times, and published several research papers in international journals. In April 2018, I handed in my PhD thesis, and I defended it successfully on September 6th, 2018. I stayed in Düsseldorf for one more year, working as a postdoctoral researcher on a wonderful project that addressed career-specific aspects of undergraduate and graduate classes, as well as the relation between the business and the academic world.
While I loved my job, my research, and my environment, I decided to leave Düsseldorf and Germany and move to the US. I moved to Texas in the summer of 2019 and, after receiving my work authorization at the beginning of 2020, first started working as a freelance linguist. In addition to creating utterances and providing localization expertise, I established quality control procedures for utterances, conducted quality control analyses, and helped manage a team of over 100 vendors. In May 2020, I started a position as a Search Language Specialist with QualiTest in Austin. What I loved about this position was that it gave me the opportunity to pursue many of my professional interests, such as project management, linguistics, and data science. While my main task was to analyze data sets to help optimize software for specific locales and languages and ultimately deliver the right product for each locale, I was also involved in several other projects, such as a linguistic talk series and the establishment of a learning platform. Furthermore, I got the chance to work with a Fortune 100 company, which allowed me to gain a great deal of valuable industry experience.
In September 2020, I started a new position as a Senior Computational Linguist at Roku. I am part of the Roku Voice team and have contributed to several big initiatives over the last year. My first big project was the launch of German Voice support. I was involved in customizing the ASR model, created a German rule-based grammar for the supported intents, created training data for the ML model used in our system, and put together test sets. I especially enjoyed fine-tuning the ML models, where I provided quantitative and qualitative analyses that pointed us to areas where we could improve the models.
After the German launch, my main focus area was improving the processes for training and testing NLU systems by standardizing, automating, and organizing data processing pipelines. To be more precise, I spearheaded a project that used rule-based systems and ML models to automatically annotate data, decreasing the amount of data that needed to be annotated manually. This project reduced annotation costs by 88%. Furthermore, I led the initiative to design and implement a database for the processed data, making it easily and readily available in a standardized format. This sped up the creation of training and test data, which eventually led to improvements in the production models, e.g., a 10% improvement in the Spanish model. I also developed frontend apps using Streamlit to monitor the status of the database (dashboards) and to analyze the data in quick, standardized ways.
Apart from this, I have worked on several other initiatives, such as adding Voice support for new domains like news and sports. Furthermore, I have collaborated with other teams, applying linguistic knowledge to data analyses in order to develop and improve existing algorithms. This work is leading to improvements in areas such as genre searches, ASR error correction, and confirmation HUDs. Lately, I have also been working with large language models, investigating their usefulness for language generation and ASR error correction.