Workshop on Multilingual Search

The Web Conference, April 19-23, 2021 - multilingual.workshop@gmail.com

Search engines are the workhorses of the World Wide Web, returning billions of responses to billions of queries every day. With the explosion of information on the internet, every website, from social media to newsrooms to shopping portals, relies on a search engine to help users quickly and easily find information of interest without wading through numerous irrelevant web pages. With the advent of smart assistants like Alexa and Siri, search technology is no longer restricted to a written interface, and more and more users interact with their devices through voice and gestures. As users interact with the internet in their natural method of communication, it has become important for search engines to understand the different languages of their users. In a global setting, the challenges of multilingual data are also faced by the backend systems that support search engines, such as catalog systems, ad servers, cloud services, IoT devices and more.

English is a widely used language on the internet. Understandably, then, a large proportion of the research in search technologies, such as language and query understanding systems, has focused on English. These models can be adapted to other languages via transfer learning and domain adaptation. However, it is not scalable to relearn the model for each new language.

Ensuring that search works equally well in all languages poses several major challenges: How can we properly scale language and query understanding systems to languages that are significantly less widespread than English? Can we build universal query understanding models for all languages? How do we serve customers searching in languages with little or no annotated data? How can we leverage state-of-the-art deep learning research in multilingual language understanding? How can we improve the experience of users searching in a variety of languages? State-of-the-art NLP research has shown promising progress in building multilingual language understanding models with deep learning and massive amounts of data. We aim to bring together experts from across the globe to share their knowledge and experiences on how to leverage state-of-the-art science in NLP and deep learning, thus helping achieve an improved search experience in a multilingual setting.

Call For Contributions

In this first Multilingual Search workshop, we aim to bring together researchers and practitioners from across the world, and, in particular, from different disciplines, such as information retrieval, data mining, machine learning, data science, NLP/NLU, machine translation, transfer learning and other related areas to share their ideas and research achievements in providing a seamless search experience in a multilingual setting.

This workshop will cover the challenges in providing a seamless search experience in a multilingual setting. We welcome contributions dealing with all aspects of multilingual search, including but not limited to:

  • Cross-lingual representations
  • Multilingual query understanding engines
  • Transfer learning, domain adaptation and label propagation techniques
  • Applications to multilingual web search, e-commerce search and social networks
  • Construction of cross-lingual knowledge bases
  • Backend systems, such as catalogs of shopping portals, or storage and indexing of, say, news articles in different languages
  • Advances in Machine Translation
  • Challenges for IoT devices that interact with users in multiple languages
  • Tackling lack of behavioral data for non-dominant language queries
  • Matching, ranking and query understanding for cold-start and multiple languages
  • Zero-shot and few-shot learning
  • Learning from monolingual datasets
  • Role of uncertainty in learning multilingual embeddings
  • ... and related areas

Paper Submission

Authors are invited to submit papers of 4-8 pages in length. Papers should be submitted electronically in PDF format, using the ACM SIG Proceedings format, with a font size no smaller than 10pt. Submit papers through EasyChair. All submissions will be peer-reviewed in a single-blind process. All accepted papers will be presented at the workshop. In addition, accepted papers will be published in the companion proceedings of the WWW conference and the ACM digital library, unless the authors choose to opt out of publishing their papers. We encourage both academic and industry submissions.

Important Dates

Paper Submission Deadline: March 07, 2021
Acceptance notification: March 26, 2021
Camera-ready due: April 09, 2021
Workshop date: April 15, 2021
Conference dates: April 19-23, 2021

Organizers

  • Ashutosh Joshi, Amazon
  • Shailendra Agarwal, Amazon
  • Vaclav Petricek, Amazon
  • Atul Saroop, Amazon
  • Rahul Bhagat, Amazon

Invited Speakers and Panelists

  • Douglas Oard is a Professor at the University of Maryland, College Park (USA), with joint appointments in the College of Information Studies (the iSchool) and the University of Maryland Institute for Advanced Computer Studies (UMIACS). He is an electrical engineer, with research interests that center around the use of emerging technologies to support information seeking by end users. Additional information is available at http://terpconnect.umd.edu/~oard/.
  • Francisco (Paco) Guzman is a Research Scientist Manager at Facebook AI working on Translation. His research has focused on several aspects of Machine Translation, including low-resource translation, translation mining, evaluation, and quality estimation. Before joining Facebook in 2016, Paco was a Research Scientist at Qatar Computing Research Institute in Qatar from 2012 to 2016. He obtained his PhD in 2011 from ITESM (Monterrey Tech) in Mexico.
  • Rahul Bhagat is the Head of Global and Multilingual Search Quality in Amazon Search. Rahul holds a PhD in Computer Science from the University of Southern California (USC) and has over 18 years of experience leading research and development teams in industry and academia. His primary interests lie in solving real-world problems through advances in Natural Language Processing (NLP), Information Retrieval (IR), and Personalization.
  • Alessandro Moschitti is a Principal Applied Research Scientist at Amazon Alexa, leading the research on retrieval-based QA systems, and a professor in the CS Dept. of the University of Trento, Italy. His expertise concerns theoretical and applied machine learning in the areas of NLP, IR and Data Mining. He has devised innovative structural kernels and neural networks for advanced syntactic/semantic processing and inference over text, documented by about 300 scientific articles. He has received four IBM Faculty Awards, one Google Faculty Award, and five best paper awards. He was the General Chair of EMNLP 2014, a PC co-chair of CoNLL 2015, has held a chair role in more than 50 conferences and workshops, and has been an editor of several journals.

Program

| Session | Talk | Presenter | Duration (min) | PDT (Los Angeles) | EDT (New York) | CEST (Ljubljana) | IST (Mumbai) |
|---|---|---|---|---|---|---|---|
| | Introduction | Ashutosh Joshi | 5 | 7:00 am | 10:00 am | 4:00 pm | 7:30 pm |
| Session 1 | Invited Talk: Using Translation to Connect People: Low Resource Challenges | Francisco Guzman | 30 | 7:05 am | 10:05 am | 4:05 pm | 7:35 pm |
| | Instance Based Transfer Learning for Multilingual Deep Retrieval | Andrew Arnold | 20 | 7:35 am | 10:35 am | 4:35 pm | 8:05 pm |
| | Query Language Identification with Weak Supervision and Noisy Label Pruning | Sweta Sharma, Vijay Huddar | 15 | 7:55 am | 10:55 am | 4:55 pm | 8:25 pm |
| | Break | | 10 | 8:10 am | 11:10 am | 5:10 pm | 8:40 pm |
| Session 2 | Invited Talk: Multilingual Answer Sentence Ranking via Automatically Translated Data | Alessandro Moschitti | 30 | 8:20 am | 11:20 am | 5:20 pm | 8:50 pm |
| | Towards Zero-Shot Learning for Image Retrieval and Tagging | Pranav Agarwal, Ritiz Tambi | 20 | 8:50 am | 11:50 am | 5:50 pm | 9:20 pm |
| | Leveraging Multilingual Neural Language Models for On-Device Natural Language Understanding | Huy Tu | 15 | 9:10 am | 12:10 pm | 6:10 pm | 9:40 pm |
| | Break | | 10 | 9:25 am | 12:25 pm | 6:25 pm | 9:55 pm |
| Session 3 | Invited Talk: Cross Language Speech Retrieval | Doug Oard | 30 | 9:35 am | 12:35 pm | 6:35 pm | 10:05 pm |
| | IndicSOUNDEX Algorithm for Text Matching | Christopher DiPersio | 15 | 10:05 am | 1:05 pm | 7:05 pm | 10:35 pm |
| | Panel Discussion | Rahul Bhagat, Francisco Guzman, Alessandro Moschitti, Doug Oard | 30 | 10:20 am | 1:20 pm | 7:20 pm | 10:50 pm |
| | Closing Remarks | Ashutosh Joshi | 5 | 10:50 am | 1:50 pm | 7:50 pm | 11:20 pm |