Welcome to The Second DISPLACE Challenge @ Interspeech 2024
We are looking forward to seeing everyone!


Latest Updates (as of 13/06/2024):


About


In multilingual communities, social conversations often involve code-mixed and code-switched speech. Code-mixing refers to the scenario where words or morphemes from one language (secondary) are used within a sentence of another language (primary). Code-switching, in contrast, refers to switching languages at the sentence or phrase level, where the conversational language itself shifts. In such cases, extracting various analytics for speech-based systems, such as speaker and language information, or generating rich transcriptions with automatic speech recognition (ASR), becomes highly challenging. Current speaker diarization systems are simply not equipped to deal with multilingual conversations, where the same talker speaks in multiple code-mixed languages.

Focusing on the Interspeech 2024 theme, "Speech and Beyond", the DISPLACE-2024 challenge aims to address research issues related to speaker and language diarization, along with automatic speech recognition (ASR), in an inclusive manner. The goal of the challenge is to establish new benchmarks for speaker diarization (SD) in multilingual settings, language diarization (LD) in multi-speaker settings, and ASR in multi-accent settings, using the same underlying dataset. Previous works have addressed speaker diarization, language diarization, and ASR, but in isolation; a collective effort from researchers worldwide is required to address the associated research issues. We look forward to your participation in reaching a new milestone in the speaker and language diarization and ASR areas. We also encourage general submissions in the fields of speaker diarization, language diarization, and ASR under the DISPLACE-2024 challenge / special session at Interspeech 2024.


The summary of the DISPLACE Challenge 2023 -- DIarization of SPeaker and LAnguage in Conversational Environments (click here for the full paper) -- has been accepted in Speech Communication.



Timeline

Registration Opens:
15 Dec 2023  
Data Release (Dev):
10 Jan 2024
Baseline System Release:
20 Jan 2024
Leaderboard Active:
1 Feb 2024
Phase-I Evaluation Data Release:
1 Feb 2024
Registration Closes:
1 Feb 2024 Extended till 15 Feb 2024
Phase-I Evaluation Closes:
28 Feb 2024 Extended till 4 March 2024
System Report submission:
28 Feb 2024 Extended till 4 March 2024
INTERSPEECH Paper Submission Deadline:
2 Mar 2024
INTERSPEECH Paper Update Deadline:
11 Mar 2024
Phase-II Evaluation Opens:
1 Apr 2024
Phase-II Evaluation Closes:
20 Apr 2024, extended till 20 May 2024 (Closed)

Tracks

The challenge comprises three tracks; you can participate in one, two, or all of them.
Track-1 is dedicated to speaker diarization (SD).
Track-2 focuses on language diarization (LD).
Track-3 is dedicated to automatic speech recognition (ASR).

  • You are encouraged to submit your experimental findings and observations to the DISPLACE-2024 Challenge at Interspeech 2024 for peer review and subsequent consideration for presentation (and publication) at the conference. To do so, you must participate in at least one of the tracks.

Track-1:
Speaker Diarization in multilingual scenarios.
  • a. The goal is to perform speaker diarization (who spoke when) on multilingual conversational audio data, where the same speaker speaks in multiple code-mixed and/or code-switched languages.
  • b. You will be provided with a dev set (far-field recordings) and a baseline system to enable the design of your own models.
  • c. Subsequently, a blind evaluation set (far-field recordings) will be provided to all participants. You will need to submit your model predictions (in RTTM format) on the blind set to a leaderboard interface (set up on CodaLab). The leaderboard will feature the performance of other teams on the same dataset.
  • d. The performance metric for evaluation will be the Diarization Error Rate (DER).
  • e. All participants will be required to submit a system description report (2-4 pages) to the organizers (Submission Deadline: click here). All participants are also encouraged to submit their findings to the DISPLACE-2024 challenge, Interspeech 2024 for peer review (Submission page).
  • f. The participating teams are encouraged to use any open datasets for training and developing the diarization systems.
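Diarization predictions are submitted as RTTM files. As a rough illustration, the sketch below writes segments in the ten-field layout of the NIST Rich Transcription RTTM convention; the file id, segment times, and speaker labels are made-up placeholders, not challenge data.

```python
# Minimal RTTM writer sketch. Each line has ten space-separated fields;
# for diarization, only the file id, onset, duration, and speaker label
# carry information, and the remaining fields are "<NA>" placeholders.
def write_rttm(path, file_id, segments):
    """segments: iterable of (onset_sec, duration_sec, speaker_label)."""
    with open(path, "w") as f:
        for onset, dur, spk in segments:
            f.write(f"SPEAKER {file_id} 1 {onset:.3f} {dur:.3f} "
                    f"<NA> <NA> {spk} <NA> <NA>\n")

# Hypothetical example: two non-overlapping segments in one recording.
write_rttm("pred.rttm", "rec_001", [(0.0, 4.2, "spk1"), (4.2, 3.1, "spk2")])
```

One file typically holds all segments for a recording, one line per speaker turn.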
Track-2:
Language Diarization in multi-speaker settings.
  • a. The goal is to perform language diarization on multi-speaker conversational audio data, recorded in far-field settings.
  • b. You will be provided with a dev audio dataset and a baseline system to enable the design of your own models.
  • c. Subsequently, a blind evaluation dataset will be provided to all participants. You will need to submit your model predictions (in RTTM format) on the blind set to a leaderboard interface (set up on CodaLab). The leaderboard will feature the performance of other teams on the same dataset.
  • d. The performance metric for evaluation will be the Diarization Error Rate (DER).
  • e. All participants will be required to submit a system description report (2-4 pages) to the organizers (Submission Deadline: click here). All participants are also encouraged to submit their findings to the DISPLACE-2024 challenge, Interspeech 2024 for peer review (Submission page).
  • f. The participating teams are encouraged to use any open datasets for training and developing the diarization systems.
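Both diarization tracks are scored with DER, which sums missed speech, false-alarm speech, and speaker-confusion time over the total reference speech time. A deliberately simplified, frame-level sketch follows; it assumes at most one speaker per frame and that hypothesis labels are already mapped to reference labels, whereas the official scoring additionally handles overlapped speech and finds the optimal speaker mapping.

```python
def frame_der(ref, hyp):
    """Toy frame-level DER. ref/hyp: per-frame speaker labels, "" = silence.
    Assumes hypothesis labels are already mapped to reference labels."""
    assert len(ref) == len(hyp)
    miss = fa = conf = speech = 0
    for r, h in zip(ref, hyp):
        if r:                  # reference frame contains speech
            speech += 1
            if not h:
                miss += 1      # missed speech
            elif h != r:
                conf += 1      # speaker confusion
        elif h:
            fa += 1            # false-alarm speech
    return (miss + fa + conf) / speech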
Track-3:
Automatic Speech Recognition in multi-accent settings.
  • a. The goal is to perform automatic speech recognition on multi-accent conversational audio data, recorded in far-field settings.
  • b. You will be provided with a dev audio dataset and a baseline system to enable the design of your own models.
  • c. Subsequently, a blind evaluation dataset will be provided to all participants. You will need to submit your model predictions (in text format) on the blind set to a leaderboard interface (set up on CodaLab). The leaderboard will feature the performance of other teams on the same dataset.
  • d. The performance metric for evaluation will be the Word Error Rate (WER).
  • e. All participants will be required to submit a system description report (2-4 pages) to the organizers (Submission Deadline: click here). All participants are also encouraged to submit their findings to the DISPLACE-2024 challenge, Interspeech 2024 for peer review (Submission page).
  • f. The participating teams are encouraged to use any open datasets for training and developing the ASR systems.
For Track-1 and Track-2, the overall evaluation of submissions will be done in terms of Diarization Error Rate (DER), computed with overlapped speech included and without a collar. A baseline system for both tracks will be provided to the registered teams. For Track-3, the overall evaluation of submissions will be in terms of Word Error Rate (WER). The evaluation results of submissions will be displayed on a leaderboard for continuous monitoring of progress.
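WER is the word-level edit distance between the reference and hypothesis transcripts (substitutions, deletions, and insertions), normalised by the reference length. A minimal sketch, for illustration only (the official scoring pipeline may additionally apply its own text normalisation before scoring):

```python
def wer(ref, hyp):
    """Word error rate via Levenshtein distance over word lists."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i            # deleting all reference words
    for j in range(m + 1):
        d[0][j] = j            # inserting all hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[n][m] / n
```

For example, one inserted word against a three-word reference gives a WER of 1/3.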

Registration  

Thank you for your interest! Below are the two quick steps involved in registering your participation and getting started in the challenge.
Step-1:
One representative of the participating team fills in the form at: click here
Step-2:
Subsequently, you need to send a signed Terms & Conditions (please, save it as "Terms_and_Conditions_DISPLACE_2024_<team_name>.pdf") document to us at displace2024@gmail.com.
After a quick verification on our side, we will confirm your registration and send you the access details for the dataset. That's it!

Resources  

Evaluation Plan:
The evaluation plan for this challenge is available here.
DISPLACE Leaderboard:
Click here.
Baseline Systems:
Click here.
Dataset:
Click here (Password Protected)
Web Demo for Speaker Diarization and Language Diarization:
Click here

Organizers

Dr. Kalluri Shareef Babu
Post Doctoral Researcher, Indian Institute of Science, Bangalore, India
Dr. Shikha Baghel
Assistant Professor, National Institute of Technology Karnataka Surathkal, India
Prof. Sriram Ganapathy
Associate Professor, Indian Institute of Science, Bangalore, India
Prof. Deepu Vijayasenan
Associate Professor, National Institute of Technology Karnataka Surathkal, India
Prof. S. R. Mahadeva Prasanna
Professor, Dept of Electrical Engineering, IIT Dharwad, India
Dr. K. T. Deepak
Assistant Professor, Indian Institute of Information Technology Dharwad (IIIT-DWD), India

Contributors

Dr. Kalluri Shareef Babu
Post Doctoral Researcher, Indian Institute of Science, Bangalore, India
Prachi Singh
Research Scholar, Indian Institute of Science, Bangalore, India
Dr. Shikha Baghel
Assistant Professor, National Institute of Technology Karnataka Surathkal, India
Pratik Roy Chowdhuri
Research Scholar, National Institute of Technology Karnataka Surathkal, India
Prof. Sriram Ganapathy
Associate Professor, Indian Institute of Science, Bangalore, India
Prof. Deepu Vijayasenan
Associate Professor, National Institute of Technology Karnataka Surathkal, India
Prof. S. R. Mahadeva Prasanna
Professor, Dept of Electrical Engineering, IIT Dharwad, India
Dr. K. T. Deepak
Assistant Professor, Indian Institute of Information Technology Dharwad (IIIT-DWD), India
Apoorva Kulkarni
Intern at Leap Lab, Indian Institute of Science, Bangalore, India
Udyat Jain
Intern at Leap Lab, Indian Institute of Science, Bangalore, India
Pradyoth Hegde
Research Scholar, Indian Institute of Information Technology Dharwad (IIIT-DWD), India
Swapnil Sontakke
Research Scholar, Indian Institute of Information Technology Dharwad (IIIT-DWD), India
Prashant Bannulmath
Research Scholar, Indian Institute of Information Technology Dharwad (IIIT-DWD), India
Rishith Sadashiv T N
Research Scholar, Indian Institute of Technology Dharwad, India
Kumar Kaustubh
Research Scholar, Indian Institute of Technology Dharwad, India
Lokesh Kumar
M.Tech Student, Indian Institute of Technology Dharwad, India
Devesh Kumar
B.Tech Student, Indian Institute of Technology Dharwad, India

Frequently Asked Questions

Q. Which programming languages can I use?

A. You are free to use any programming language you like. For system evaluation, we require the output decisions as a Rich Transcription Time Marked (RTTM) file for the diarization tracks (Track-1 and Track-2), and as a text file for the ASR track (Track-3).
Q. How do I get the DISPLACE audio dataset?

A. It is simple - by registering for the challenge. Please see the registration section on this webpage (above).
Q. Can I re-distribute the data?

A. No, you cannot re-distribute the data even if you have participated in the challenge. However, you can use it for research purposes with proper citations.
Q. In which format, do I need to submit the output?

A. For the diarization tracks, the output should be a plain-text file in the Rich Transcription Time Marked (RTTM) format; for the ASR track, the output should be submitted in text format.
Q. How do I submit my findings obtained by participating in this challenge to Interspeech 2024?

A. That's great! You can follow the Interspeech 2024 paper submission portal here. Remember to select "DISPLACE Challenge" while uploading your paper there.

Contact Us

Do you have more questions? Feel free to contact us at:

displace2024@gmail.com.