audiblelight.synthesize.generate_dcase2024_metadata

audiblelight.synthesize.generate_dcase2024_metadata(scene, temporal_resolution=0.1)

Given a Scene, generate metadata for each microphone in the DCASE 2024 format.

The output is a dictionary with one entry for every microphone added to the scene, mapping each microphone alias to a pandas DataFrame, e.g. {"mic_alias_0": <pd.DataFrame>, "mic_alias_1": <pd.DataFrame>}. The exact specification of the metadata can be found on the [DCASE 2024 challenge website](https://dcase.community/challenge2024/task-audio-and-audiovisual-sound-event-localization-and-detection-with-source-distance-estimation).

In particular, the columns of each DataFrame are as follows:

- frame number (int): the index of the frame.
- active class index (int): the index of the sound event class; see audiblelight.event.DCASE_SOUND_EVENT_CLASSES for a complete mapping.
- source number index (int): an integer identifier for each event; numbering restarts for each class (see the note on source numbering below).
- azimuth (int): the azimuth angle in degrees, increasing counter-clockwise (ϕ = 90° at the left, ϕ = 0° at the front).
- elevation (int): the elevation angle in degrees (θ = 0° at the front).
- distance (int): the distance from the microphone, measured in centimeters.
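To make the angle and distance conventions above concrete, the following is a minimal sketch of how a source position relative to the microphone could be converted into the azimuth, elevation, and distance columns. It assumes a hypothetical right-handed frame with +x pointing to the front, +y to the left, and +z upwards, in metres; the function name `cartesian_to_dcase_polar` is illustrative and not part of audiblelight.

```python
import math


def cartesian_to_dcase_polar(x: float, y: float, z: float) -> tuple[int, int, int]:
    """Convert a mic-relative position into DCASE-style (azimuth, elevation, distance).

    Hypothetical convention: +x = front, +y = left, +z = up, coordinates in metres.
    Returns integer degrees for the angles and integer centimetres for the distance.
    """
    horizontal = math.hypot(x, y)
    # atan2(y, x): 0 deg at the front, +90 deg at the left (counter-clockwise)
    azimuth = math.degrees(math.atan2(y, x))
    # Elevation measured from the horizontal plane: 0 deg at the front
    elevation = math.degrees(math.atan2(z, horizontal))
    # Euclidean distance, converted from metres to centimetres
    distance_cm = math.sqrt(x * x + y * y + z * z) * 100.0
    return round(azimuth), round(elevation), round(distance_cm)


print(cartesian_to_dcase_polar(1.0, 0.0, 0.0))  # (0, 0, 100)
print(cartesian_to_dcase_polar(0.0, 2.0, 0.0))  # (90, 0, 200)
```

Using `atan2` rather than `atan` keeps the azimuth in the full (-180°, 180°] range, so sources behind or to the right of the microphone are not folded onto the front half.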

By default (temporal_resolution=0.1), the metadata is quantised to 10 frames per second (i.e., a frame length of 100 ms); the quantisation can be changed by adjusting the temporal_resolution argument. For moving trajectories, the position of each IR is linearly interpolated across the duration of the audio file, so that azimuth, elevation, and distance values can be estimated for every frame.
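The interpolation step can be sketched as follows. This is a minimal illustration, not audiblelight's implementation: it assumes a hypothetical input where the IR positions of a moving event are spaced evenly in time from 0 to `duration` seconds, and `interpolate_trajectory` is an illustrative name.

```python
import math


def interpolate_trajectory(positions, duration, temporal_resolution=0.1):
    """Linearly interpolate a trajectory onto a regular frame grid.

    positions: list of (x, y, z) tuples, assumed (hypothetically) to be evenly
    spaced in time from 0 to `duration` seconds.
    Returns one interpolated (x, y, z) tuple per frame.
    """
    # Round before ceil to absorb small floating-point error in the division
    n_frames = math.ceil(round(duration / temporal_resolution, 6))
    if len(positions) == 1:
        # Static source: the same position applies to every frame
        return [tuple(positions[0])] * n_frames
    # Time between consecutive known positions
    step = duration / (len(positions) - 1)
    out = []
    for frame in range(n_frames):
        t = frame * temporal_resolution
        # Segment containing t, clamped to the last segment
        i = min(int(t // step), len(positions) - 2)
        # Fractional position of t within that segment
        alpha = (t - i * step) / step
        p0, p1 = positions[i], positions[i + 1]
        out.append(tuple(a + alpha * (b - a) for a, b in zip(p0, p1)))
    return out


track = interpolate_trajectory([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], duration=1.0, temporal_resolution=0.25)
print(track[2])  # (0.5, 0.0, 0.0)
```

Interpolating the Cartesian positions first and converting each frame to polar coordinates afterwards avoids the wrap-around problems that arise when interpolating azimuth angles directly near ±180°.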

Note that the source number index is assigned separately for each class (as in the STARSS format): for example, with two telephone events and one femaleSpeech event, we would expect source number indices of 0 and 1 for the two telephone instances and only 0 for the femaleSpeech instance. Events that share the same audio file are always assigned the same source ID every time they occur.
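The numbering rule above can be sketched with a per-class lookup table. This is a minimal illustration under an assumed input format of (class, audio file) pairs in scene order; `assign_source_ids` is a hypothetical helper, not an audiblelight function.

```python
from collections import defaultdict


def assign_source_ids(events):
    """Assign STARSS-style source IDs, counted separately for each class.

    events: iterable of (class_label, audio_filepath) pairs in scene order
    (a hypothetical input format). Events reusing the same audio file within
    a class receive the same ID.
    Returns a list of IDs aligned with `events`.
    """
    # class_label -> {audio_filepath: source_id}
    seen = defaultdict(dict)
    ids = []
    for class_label, filepath in events:
        per_class = seen[class_label]
        if filepath not in per_class:
            # Next free ID within this class: 0, 1, 2, ...
            per_class[filepath] = len(per_class)
        ids.append(per_class[filepath])
    return ids


ids = assign_source_ids([
    ("telephone", "phone_a.wav"),
    ("telephone", "phone_b.wav"),
    ("femaleSpeech", "speech.wav"),
    ("telephone", "phone_a.wav"),
])
print(ids)  # [0, 1, 0, 0]
```

Keying the inner table on the audio file is what makes a repeated file map back to its original ID, while the outer key restarts the count for each class.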

Finally, note that frames without sound events are omitted from the output.
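Because empty frames are omitted, each DataFrame is sparse: a frame appears once per active event and not at all when nothing is sounding. The sketch below illustrates this row layout under a hypothetical, simplified event format with static positions; `build_rows` and its dictionary keys are illustrative only.

```python
import math


def build_rows(events, duration, temporal_resolution=0.1):
    """Build sparse DCASE-style rows: one per active event per frame.

    events: list of dicts with hypothetical keys "class", "source", "start",
    "end" (seconds) and "doa" = (azimuth, elevation, distance_cm).
    Each row is [frame, class, source, azimuth, elevation, distance].
    """
    # Round before ceil to absorb small floating-point error in the division
    n_frames = math.ceil(round(duration / temporal_resolution, 6))
    rows = []
    for frame in range(n_frames):
        t = frame * temporal_resolution
        for ev in events:
            # An event is active in a frame if the frame start lies in [start, end)
            if ev["start"] <= t < ev["end"]:
                rows.append([frame, ev["class"], ev["source"], *ev["doa"]])
    # Frames with no active event never produce a row, so they are absent
    return rows


rows = build_rows(
    [{"class": 3, "source": 0, "start": 0.0, "end": 0.2, "doa": (90, 0, 150)}],
    duration=0.5,
)
print(rows)  # [[0, 3, 0, 90, 0, 150], [1, 3, 0, 90, 0, 150]]
```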

Parameters:

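scene (Scene) – the scene to generate metadata for.
temporal_resolution (int | float | complex | integer | floating) – the frame length in seconds (default 0.1, i.e. 10 frames per second).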

Return type:

dict[str, DataFrame]
