SEBASTIN
SANTY

I am a PhD student in Computer Science at the University of Washington, advised by Sewoong Oh.

My previous work explored how to make language systems more usable for people.

A significant part of my research studies the inclusivity of NLP systems. For example, we found that NLP systems have positionalities that make them work poorly for the broader population and end up marginalizing communities such as speakers of low-resource languages. In our recent studies, we found that inclusion not only benefits society but can also improve the performance of ML systems, ultimately contributing to AI progress.

The other major thread I've been interested in is designing seamless human-computer interactions. This has previously taken the form of user interfaces, some of which are now part of Firefox, Bugzilla, and other open-source projects, have been published as research papers, or simply ended up as fun side projects. I believe we still have a long way to go in figuring out naturalistic interactions. Some of my thoughts on this topic are covered in a recent tutorial we gave at EMNLP 2023.

RESEARCH

Linguistic Diversity

Multilingual Diversity Improves Vision-Language Representations
Thao Nguyen, Matthew Wallingford, Sebastin Santy, Wei-Chiu Ma, Sewoong Oh, Ludwig Schmidt,
Pang Wei Koh, Ranjay Krishna
NeurIPS 2024 SPOTLIGHT PDF ABS
Vision Datasets and Models Exhibit Cultural and Linguistic Diversity in Perception
Andre Ye, Sebastin Santy, Jena D. Hwang, Amy X. Zhang, Ranjay Krishna PDF ABS
NLPositionality: Characterizing Design Biases of Datasets and Models
Sebastin Santy*, Jenny Liang*, Ronan Le Bras, Katharina Reinecke, Maarten Sap
ACL 2023 OUTSTANDING PAPER CMU ML Blog WEB PDF ABS
The State and Fate of Linguistic Diversity and Inclusion in the NLP World
Pratik Joshi*, Sebastin Santy*, Amar Budhiraja*, Kalika Bali, Monojit Choudhury
ACL 2020 US FTC NLP News NLP Beyond English Quartz Underrated ML WEB TALK PDF ABS

Language System Deployments

Language Translation as a Socio-Technical System
Sebastin Santy, Kalika Bali, Monojit Choudhury, Sandipan Dandapat, Tanuja Ganu, Anurag Shukla,
Jahanvi Shah, Vivek Seshadri
COMPASS 2021 Mint Lounge SLIDES PDF ABS
Learnings from Technological Interventions in a Low Resource Language
Devansh Mehta*, Sebastin Santy*, Ramaravind Kommiya Mothilal, Brij Mohan Lal Srivastava,
Alok Sharma, Anurag Shukla, Vishnu Prasad, Venkanna U, Amit Sharma, Kalika Bali

Unsung Challenges of Building and Deploying Language Technologies for LRL Communities
Pratik Joshi, Christain Barnes, Sebastin Santy, Simran Khanuja, Sanket Shah, Anirudh Srinivasan,
Satwik Bhattamishra, Sunayana Sitaram, Monojit Choudhury, Kalika Bali

User Interfaces

BLIP: Facilitating the Exploration of Undesirable Consequences of Digital Technologies
Rock Yuren Pang, Sebastin Santy, René Just, Katharina Reinecke
CHI 2024 PDF ABS
LeetPrompt: Leveraging Collective Human Intelligence to Study Large Language Models
Sebastin Santy, Ayana Bharadwaj, Sahith Dambekodi, Alex Albert, Cathy Yuan, Ranjay Krishna
AI & HCI @ ICML 2023 PDF PLAY
INMT: Interactive Neural Machine Translation
Sebastin Santy, Sandipan Dandapat, Monojit Choudhury, Kalika Bali
EMNLP 2019 Demo Slate WEB CODE POSTER PDF ABS
CoSSAT: Code-Switched Speech Annotation Tool
Sanket Shah, Pratik Joshi, Sebastin Santy, Sunayana Sitaram
AnnoNLP @ EMNLP 2019 SLIDES PDF ABS
BERTologiCoMix: How does Code-Mixing interact with Multilingual BERT?
Sebastin Santy*, Anirudh Srinivasan*, Monojit Choudhury
AdaptNLP@EACL 2021 POSTER PDF ABS
Towards Task Understanding in Visual Settings
Sebastin Santy, Wazeer Zulfikar, Rishabh Mehrotra, Emine Yilmaz
AAAI 2019 (Student Abstract) POSTER PDF ABS

TALKS

Designing, Evaluating, and Learning from Humans Interacting with NLP Models
A conference tutorial on research in the intersection of NLP and HCI
EMNLP 2023 - Singapore
WEB SLIDES
The State and Fate of Linguistic Diversity in the NLP world
A casual talk about our ACL 2020 paper on the same topic
NLP with Friends - Remote
ABS TALK
Repeatable Data Setup for Repeatable Science
A talk about DataDepsGenerators.jl and reproducible AI
PyData 2018 - New York City, USA
ABS TALK

MISCELLANEOUS

Resources: My PhD Statement, CV Template