audiblelight.core.Scene#

class audiblelight.core.Scene(duration, backend, sample_rate=44100, fg_path=None, bg_path=None, image_path=None, allow_duplicate_audios=True, allow_same_class_events=True, ref_db=-65, scene_start_dist=None, event_start_dist=None, event_duration_dist=None, event_velocity_dist=None, event_resolution_dist=None, snr_dist=None, max_overlap=2, event_augmentations=None, backend_kwargs=None, class_mapping='DCASE2023Task3', video_fps=10, video_res=(1920, 960), video_low_power=True, video_overlay_distance_scale_factor=1.0, video_overlay_base_size=0.5)#

Bases: object

Initializes a Scene.

The Scene object is the highest level object within AudibleLight. It holds information relating to the current WorldState (including a 3D mesh, alongside listeners and sound sources) and any sound Event objects within it.

Parameters:
  • duration (int | float | complex | integer | floating)

  • backend (str | WorldState)

  • sample_rate (int | float | complex | integer | floating | None)

  • fg_path (str | Path | None)

  • bg_path (str | Path | None)

  • image_path (str | Path | None)

  • allow_duplicate_audios (bool)

  • allow_same_class_events (bool)

  • ref_db (int | float | complex | integer | floating | None)

  • scene_start_dist (DistributionLike | None)

  • event_start_dist (DistributionLike | None)

  • event_duration_dist (DistributionLike | None)

  • event_velocity_dist (DistributionLike | None)

  • event_resolution_dist (DistributionLike | None)

  • snr_dist (DistributionLike | None)

  • max_overlap (int | float | complex | integer | floating | None)

  • event_augmentations (Iterable[Type[EventAugmentation]] | Iterable[tuple[Type[EventAugmentation], dict]] | Type[EventAugmentation] | None)

  • backend_kwargs (dict | None)

  • class_mapping (TClassMapping | dict | str | None)

  • video_fps (int | float | complex | integer | floating | None)

  • video_res (tuple[int | float | complex | integer | floating, int | float | complex | integer | floating] | None)

  • video_low_power (bool | None)

  • video_overlay_distance_scale_factor (int | float | complex | integer | floating | None)

  • video_overlay_base_size (int | float | complex | integer | floating | None)

__init__(duration, backend, sample_rate=44100, fg_path=None, bg_path=None, image_path=None, allow_duplicate_audios=True, allow_same_class_events=True, ref_db=-65, scene_start_dist=None, event_start_dist=None, event_duration_dist=None, event_velocity_dist=None, event_resolution_dist=None, snr_dist=None, max_overlap=2, event_augmentations=None, backend_kwargs=None, class_mapping='DCASE2023Task3', video_fps=10, video_res=(1920, 960), video_low_power=True, video_overlay_distance_scale_factor=1.0, video_overlay_base_size=0.5)#

Initializes the Scene with a given duration and mesh.

Parameters:
  • duration (int | float | complex | integer | floating) – the length of time the scene audio should last for.

  • backend (str | WorldState) – the backend to use: either ‘rlr’, ‘sofa’, or ‘shoebox’. Alternatively, an existing WorldState instance can be passed directly.

  • fg_path (str | Path | None) – a directory (or list of directories) pointing to foreground audio. Note that directories will be introspected recursively, such that audio files within any subdirectories will be detected also.

  • bg_path (str | Path | None) – a directory (or list of directories) pointing to background audio. Note that directories will be introspected recursively, such that audio files within any subdirectories will be detected also.

  • image_path (str | Path | None) – a directory (or list of directories) pointing to Event images. Note that directories will be introspected recursively, such that image files within any subdirectories will be detected also.

  • allow_duplicate_audios (bool) – if True (default), the same audio file can appear multiple times in the Scene.

  • allow_same_class_events (bool) – if True (default), multiple Events from the same class may be added to the Scene.

  • ref_db (int | float | complex | integer | floating | None) – reference decibel level for scene noise floor, defaults to -65 dB

  • scene_start_dist (DistributionLike | None) – distribution-like object or callable used to sample starting times for any Event objects applied to the scene. If not provided, will be a uniform distribution between 0 and duration

  • event_start_dist (DistributionLike | None) – distribution-like object used to sample starting (offset) times for Event audio files. If not provided, Event audio files will always start at 0 seconds. Note that this can be overridden by passing a value into Scene.add_event(event_start=…)

  • event_duration_dist (DistributionLike | None) – distribution-like object used to sample Event audio duration times. If not provided, Event audio files will always use their full duration. Note that this can be overridden by passing a value into Scene.add_event(duration=…)

  • event_velocity_dist (DistributionLike | None) – distribution-like object used to sample Event spatial velocities. If not provided, a uniform distribution between 0.5 and 2.0 metres-per-second will be used.

  • event_resolution_dist (DistributionLike | None) – distribution-like object used to sample Event spatial resolutions. If not provided, a uniform distribution between 1.0 and 4.0 Hz (i.e., IRs-per-second) will be used.

  • snr_dist (DistributionLike | None) – distribution-like object used to sample Event signal-to-noise ratios. If not provided, a uniform distribution between 5 and 30 will be used.

  • max_overlap (int | float | complex | integer | floating | None) – the maximum number of overlapping audio Events allowed in the Scene, defaults to 2.

  • event_augmentations (Iterable[Type[EventAugmentation]] | Iterable[tuple[Type[EventAugmentation], dict]] | Type[EventAugmentation] | None) – an iterable of audiblelight.EventAugmentation objects that can be applied to Event objects. The number of augmentations sampled from this list can be controlled by setting the value of augmentations when calling Scene.add_event; e.g., Scene.add_event(augmentations=3) will sample 3 random augmentations from event_augmentations and apply them to the Event.

  • backend_kwargs (dict | None) – keyword arguments passed to audiblelight.WorldState.

  • class_mapping (TClassMapping | dict | str | None) – a mapping used to map class names to indices, and vice versa. Can be a subclass of audiblelight.class_mapping.ClassMapping, dict, or str. Defaults to DCASE 2023, task 3 mapping

  • video_fps (int | float | complex | integer | floating | None) – The number of frames-per-second to use when creating a video, defaults to 10

  • video_res (tuple[int | float | complex | integer | floating, int | float | complex | integer | floating] | None) – The resolution of generated video files, defaults to (1920 x 960). Note that height must be exactly half of width for an equirectangular video.

  • video_low_power (bool | None) – Applies a variety of adjustments to improve video performance on weaker hardware.

  • video_overlay_distance_scale_factor (int | float | complex | integer | floating | None) – Scales the size of overlaid images depending on their proximity to the camera. With a larger scaling factor, images closer to the camera will appear smaller. Defaults to 1.0.

  • video_overlay_base_size (int | float | complex | integer | floating | None) – The base size of overlaid images on the video, independent of distance. Defaults to 0.5.

  • sample_rate (int | float | complex | integer | floating | None)

  • image_path (str | Path | None)
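Examples

A minimal construction sketch. The paths, backend choice, and keyword values here are purely illustrative:

>>> from audiblelight.core import Scene
>>> scene = Scene(
...     duration=30.0,
...     backend="rlr",
...     fg_path="path/to/foreground_audio",
...     bg_path="path/to/background_audio",
... )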

Methods

__init__(duration, backend[, sample_rate, ...])

Initializes the Scene with a given duration and mesh.

add_ambience([filepath, noise, channels, ...])

Add ambient noise to the WorldState.

add_emitter(**kwargs)

Add an emitter to the WorldState.

add_emitters(**kwargs)

Add emitters to the WorldState.

add_event([event_type, filepath, alias, ...])

Add an event to the foreground, either "static", "moving", or "predefined".

add_event_moving([filepath, alias, ...])

Add a moving event to the foreground with optional overrides.

add_event_predefined([filepath, trajectory, ...])

Add a moving event to the foreground that follows a predefined path.

add_event_static([filepath, alias, ...])

Add a static event to the foreground with optional overrides.

add_microphone(**kwargs)

Add a microphone to the WorldState.

add_microphone_and_emitter(**kwargs)

Add both a microphone and emitter with specified relationship.

add_microphones(**kwargs)

Add microphones to the WorldState.

clear_ambience()

Removes all current ambience events.

clear_emitter(alias)

Alias for WorldState.clear_emitter.

clear_emitters()

Alias for WorldState.clear_emitters.

clear_event(alias)

Given an alias for an event, clears the event and updates the state.

clear_events()

Removes all current events and emitters from the state

clear_microphone(alias)

Alias for WorldState.clear_microphone.

clear_microphones()

Alias for WorldState.clear_microphones.

from_dict(input_dict)

Instantiate a Scene from a dictionary.

from_json(json_fpath)

Instantiate a Scene from a JSON file.

generate([output_dir, audio, metadata_json, ...])

Render scene to disk.

generate_acoustic_image([output_dir, t_sti, ...])

Generate acoustic image and associated metadata for each microphone array added to the Scene.

get_ambience(alias)

Given a valid alias, get an associated ambience event, as in self.ambience[alias]

get_ambiences()

Get all ambience objects, as in self.ambience.values()

get_class_mapping()

Alias for ClassMapping.mapping

get_emitter(alias[, emitter_idx])

Alias for WorldState.get_emitter

get_emitters(alias)

Alias for WorldState.get_emitters

get_event(alias_or_idx)

Given a valid alias, get an associated event either by alias (string) or idx (int).

get_events()

Return a list of all events for this scene, as in self.events.values()

get_microphone(alias)

Alias for WorldState.get_microphone

get_microphones()

Alias for WorldState.get_microphones

to_dict()

Returns metadata for this object as a dictionary

__eq__(other)#

Compare two Scene objects for equality.

Internally, we convert both objects to dictionaries and then use the deepdiff package to compare them, with some additional logic to account for, e.g., significant digits and values that will always differ (such as creation time).

Parameters:

other (Any) – the object to compare the current Scene object against

Returns:

True if the Scene objects are equivalent, False otherwise

Return type:

bool

__getitem__(alias_or_idx)#

An alternative to self.get_event(alias) or self.events[alias]

Parameters:

alias_or_idx (str | int)

Return type:

Event

__iter__()#

Yields an iterator of Event objects from the current scene

Examples

>>> test_scene = Scene(...)
>>> for n in range(9):
...     test_scene.add_event_static(...)
>>> for ev in test_scene:
...     assert isinstance(ev, Event)
Return type:

Iterator[Event]

__len__()#

Returns the number of events in the scene

Return type:

int

__repr__()#

Returns a JSON representation of the scene

Return type:

str

__str__()#

Returns a string representation of the scene

Return type:

str

add_ambience(filepath=None, noise=None, channels=None, ref_db=None, alias=None, **kwargs)#

Add ambient noise to the WorldState.

The ambience can be either a file on the disk (in which case filepath must not be None) or a type of noise “color” such as white, red, or blue (in which case noise must not be None). The number of channels can be provided directly or will be inferred from the microphones added to the state, when this is possible.

Parameters:
  • channels (int) – the number of channels to generate noise for. If None, will be inferred from available mics.

  • filepath (str or Path) – a path to an audio file on the disk. If None (and noise is None), will try and sample a random audio file from Scene.bg_audios.

  • noise (str) – either the type of noise to generate, e.g. “white”, “red”, or an arbitrary numeric exponent to use when generating noise with powerlaw_psd_gaussian.

  • ref_db (Numeric) – the noise floor, in decibels

  • alias (str) – string reference to refer to this Ambience object inside Scene.ambience

  • kwargs – additional keyword arguments passed to audiblelight.ambience.powerlaw_psd_gaussian
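Examples

A brief sketch of both usages (the filepath and aliases are illustrative):

>>> scene = Scene(...)
>>> scene.add_ambience(noise="white", alias="hiss")
>>> scene.add_ambience(filepath="path/to/rain.wav", alias="rain")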

add_emitter(**kwargs)#

Add an emitter to the WorldState.

An alias for WorldState.add_emitter: see that method for a full description.

add_emitters(**kwargs)#

Add emitters to the WorldState.

An alias for WorldState.add_emitters: see that method for a full description.

add_event(event_type='static', filepath=None, alias=None, augmentations=None, position=None, trajectory=None, mic=None, polar=False, ensure_direct_path=False, scene_start=None, event_start=None, duration=None, snr=None, class_id=None, class_label=None, shape=None, spatial_resolution=None, spatial_velocity=None, max_place_attempts=1000, image_filepath=None, **event_kwargs)#

Add an event to the foreground, either “static”, “moving”, or “predefined”.

Note that the arguments “scene_start”, “event_start”, “duration”, “snr”, “spatial_velocity”, & “spatial_resolution” will (by default) sample from their respective distributions, provided in Scene.__init__. If a numeric value is provided, this will be treated as an override and used instead of random sampling.

Parameters:
  • event_type (str) – the type of event to add, must be either “static”, “moving”, or “predefined”.

  • filepath (str | Path | None) – a path to a foreground event to use. If not provided, a foreground event will be sampled from fg_category_paths, if this is provided inside __init__; otherwise, an error will be raised.

  • alias (str | None) – the string alias used to index this event inside the events dictionary

  • augmentations (Iterable[Type[EventAugmentation]] | Type[EventAugmentation] | int | float | complex | integer | floating | None) – augmentation objects to associate with the Event. If a list of EventAugmentation objects or a single EventAugmentation object, these will be passed directly. If a number, this many augmentations will be sampled from either Scene.event_augmentations or a master list of valid augmentations (defined inside audiblelight.augmentations). If not provided, EventAugmentations can be registered later by calling register_augmentations on the Event.

  • position (list | ndarray | None) – Location to add the event. When event_type==”static”, this will be the position of the Event. When event_type==”moving”, this will be the starting position of the Event. When not provided, a random point inside the mesh will be chosen.

  • trajectory (ndarray | None) – The trajectory the moving event will follow, given in Cartesian coordinates inside the mesh. Only used when event_type==”predefined”. If not provided, will attempt to infer from state.waypoints.

  • mic (str | None) – String reference to a microphone inside self.state.microphones; when provided, position is interpreted as RELATIVE to the center of this microphone

  • polar (bool | None) – When True, expects position to be provided in [azimuth, elevation, radius] form; otherwise, units are [x, y, z] in absolute, cartesian terms.

  • ensure_direct_path (bool | list | str | None) – Whether to ensure a direct line exists between the emitter and given microphone(s). If True, a direct line will be ensured between the emitter and ALL microphone objects. If a list of strings, these should correspond to microphone aliases inside microphones; a direct line will be ensured with all of these microphones. If False, no direct line is required for the emitter.

  • scene_start (int | float | complex | integer | floating | None) – Time to start the Event within the Scene, in seconds. Must be a positive number.

  • event_start (int | float | complex | integer | floating | None) – Time to start the Event audio from, in seconds. Must be a positive number.

  • duration (int | float | complex | integer | floating | None) – Time the Event audio lasts in seconds. Must be a positive number.

  • snr (int | float | complex | integer | floating | None) – Signal to noise ratio for the audio file with respect to the noise floor

  • class_label (str | None) – Optional label to use for sound event class. If not provided, will attempt to infer label from filepath using the DCASE sound event classes.

  • class_id (int | None) – Optional ID to use for sound event class. If not provided, will attempt to infer ID from filepath using the DCASE sound event classes.

  • spatial_velocity (int | float | complex | integer | floating | None) – Speed of a moving sound event in metres-per-second

  • spatial_resolution (int | float | complex | integer | floating | None) – Resolution of a moving sound event in Hz (i.e., number of IRs created per second)

  • shape (str | None) – the shape of a moving event trajectory; one of “linear”, “semicircular”, “random”, “sine”, “sawtooth”, “predefined”

  • max_place_attempts (Numeric) – the number of times to try and place an Event before giving up.

  • image_filepath (str | Path | None) – A path to an image file, used when generating visual representations of a scene. Must be provided in order to generate videos from a Scene.

  • event_kwargs – additional keyword arguments passed to Event.__init__

Returns:

the Event object added to the Scene

Return type:

Event

Examples

Creating an event with random sampling of parameters. Here, note that “scene_start”, “event_start”, “duration”, “snr” will be sampled at random from the distributions defined when initialising the Scene.

>>> scene = Scene(...)
>>> scene.add_event(
...     event_type="static",
...     filepath="some/path.wav"
... )

Creating an event with a predefined position:

>>> scene = Scene(...)
>>> scene.add_event(
...     event_type="static",
...     filepath="some/path.wav",
...     alias="tester",
...     position=[-0.5, -0.5, 0.5],
...     polar=False,
...     ensure_direct_path=False
... )

Creating an event with overrides:

>>> scene = Scene(...)
>>> scene.add_event(
...     event_type="moving",
...     filepath="some/path.wav",
...     alias="tester",
...     event_start=5.0,
...     duration=5.0,
...     snr=0.0,
... )

Creating an event with an image:

>>> scene = Scene(...)
>>> scene.add_event(
...     event_type="moving",
...     filepath="some/path.wav",
...     image_filepath="some/image.jpg"
... )
add_event_moving(filepath=None, alias=None, augmentations=None, position=None, mic=None, polar=False, shape=None, scene_start=None, event_start=None, duration=None, snr=None, class_id=None, class_label=None, spatial_resolution=None, spatial_velocity=None, ensure_direct_path=False, max_place_attempts=1000, image_filepath=None, **event_kwargs)#

Add a moving event to the foreground with optional overrides.

Note that the arguments “scene_start”, “event_start”, “duration”, “snr”, “spatial_velocity”, & “spatial_resolution” will (by default) sample from their respective distributions, provided in Scene.__init__. If a numeric value is provided, this will be treated as an override and used instead of random sampling.

Parameters:
  • filepath (str | Path | None) – a path to a foreground event to use. If not provided, a foreground event will be sampled from fg_category_paths, if this is provided inside __init__; otherwise, an error will be raised.

  • alias (str | None) – the string alias used to index this event inside the events dictionary

  • augmentations (Iterable[Type[EventAugmentation]] | Type[EventAugmentation] | int | float | complex | integer | floating | None) – augmentation objects to associate with the Event. If a list of EventAugmentation objects or a single EventAugmentation object, these will be passed directly. If a number, this many augmentations will be sampled from either Scene.event_augmentations or a master list of valid augmentations (defined inside audiblelight.augmentations). If not provided, EventAugmentations can be registered later by calling register_augmentations on the Event.

  • position (list | ndarray | None) – Starting point for the event. When not provided, a random point inside the mesh will be chosen.

  • mic (str | None) – String reference to a microphone inside self.state.microphones; when provided, position is interpreted as RELATIVE to the center of this microphone

  • polar (bool | None) – When True, expects position to be provided in [azimuth, elevation, radius] form; otherwise, units are [x, y, z] in absolute, cartesian terms.

  • scene_start (int | float | complex | integer | floating | None) – Time to start the Event within the Scene, in seconds. Must be a positive number.

  • event_start (int | float | complex | integer | floating | None) – Time to start the Event audio from, in seconds. Must be a positive number.

  • duration (int | float | complex | integer | floating | None) – Time the Event audio lasts in seconds. Must be a positive number.

  • snr (int | float | complex | integer | floating | None) – Signal to noise ratio for the audio file with respect to the noise floor

  • class_label (str | None) – Optional label to use for sound event class. If not provided, will attempt to infer label from filepath using the DCASE sound event classes.

  • class_id (int | None) – Optional ID to use for sound event class. If not provided, will attempt to infer ID from filepath using the DCASE sound event classes.

  • spatial_velocity (int | float | complex | integer | floating | None) – Speed of a moving sound event in metres-per-second

  • spatial_resolution (int | float | complex | integer | floating | None) – Resolution of a moving sound event in Hz (i.e., number of IRs created per second)

  • shape (str | None) – the shape of a moving event trajectory; one of “linear”, “semicircular”, “random”, “sine”, “sawtooth”

  • ensure_direct_path (bool | list | str | None) – Whether to ensure a direct line exists between the emitter and given microphone(s). If True, a direct line will be ensured between the emitter and ALL microphone objects. If a list of strings, these should correspond to microphone aliases inside microphones; a direct line will be ensured with all of these microphones. If False, no direct line is required for the emitter.

  • max_place_attempts (Numeric) – the number of times to try and place an Event before giving up.

  • image_filepath (str | Path | None) – A path to an image file, used when generating visual representations of a scene. Must be provided in order to generate videos from a Scene.

  • event_kwargs – additional keyword arguments passed to Event.__init__

Returns:

the Event object added to the Scene

Return type:

Event
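Examples

A sketch of a moving event with overrides (the filepath and numeric values are illustrative):

>>> scene = Scene(...)
>>> scene.add_event_moving(
...     filepath="some/path.wav",
...     shape="linear",
...     spatial_velocity=1.0,
...     spatial_resolution=2.0,
... )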

add_event_predefined(filepath=None, trajectory=None, alias=None, augmentations=None, scene_start=None, event_start=None, duration=None, snr=None, class_id=None, class_label=None, ensure_direct_path=False, max_place_attempts=1000, image_filepath=None)#

Add a moving event to the foreground that follows a predefined path.

The spatial velocity and resolution of the event will be inferred from the trajectory itself, in combination with the duration (which may be provided or randomly sampled).

Parameters:
  • filepath (str | Path | None)

  • trajectory (ndarray | None)

  • alias (str | None)

  • augmentations (Iterable[Type[EventAugmentation]] | Type[EventAugmentation] | int | float | complex | integer | floating | None)

  • scene_start (int | float | complex | integer | floating | None)

  • event_start (int | float | complex | integer | floating | None)

  • duration (int | float | complex | integer | floating | None)

  • snr (int | float | complex | integer | floating | None)

  • class_id (int | None)

  • class_label (str | None)

  • ensure_direct_path (bool | list | str | None)

  • max_place_attempts (int | float | complex | integer | floating | None)

  • image_filepath (str | Path | None)
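Examples

A sketch with a hypothetical three-point trajectory; the waypoints are illustrative and must lie inside the mesh:

>>> import numpy as np
>>> scene = Scene(...)
>>> trajectory = np.array([
...     [0.0, 0.0, 1.5],
...     [0.5, 0.0, 1.5],
...     [1.0, 0.5, 1.5],
... ])
>>> scene.add_event_predefined(filepath="some/path.wav", trajectory=trajectory)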

add_event_static(filepath=None, alias=None, augmentations=None, position=None, mic=None, polar=False, ensure_direct_path=False, scene_start=None, event_start=None, duration=None, snr=None, class_id=None, class_label=None, max_place_attempts=1000, image_filepath=None, **event_kwargs)#

Add a static event to the foreground with optional overrides.

Note that the arguments “scene_start”, “event_start”, “duration”, & “snr” will (by default) sample from their respective distributions, provided in Scene.__init__. If a numeric value is provided, this will be treated as an override and used instead of random sampling.

Parameters:
  • filepath (str | Path | None) – a path to a foreground event to use. If not provided, a foreground event will be sampled from fg_category_paths, if this is provided inside __init__; otherwise, an error will be raised.

  • alias (str | None) – the string alias used to index this event inside the events dictionary

  • augmentations (Iterable[Type[EventAugmentation]] | Type[EventAugmentation] | int | float | complex | integer | floating | None) – augmentation objects to associate with the Event. If a list of EventAugmentation objects or a single EventAugmentation object, these will be passed directly. If a number, this many augmentations will be sampled from either Scene.event_augmentations or a master list of valid augmentations (defined inside audiblelight.augmentations). If not provided, EventAugmentations can be registered later by calling register_augmentations on the Event.

  • position (list | ndarray | None) – Location to add the event. When not provided, a random point inside the mesh will be chosen.

  • mic (str | None) – String reference to a microphone inside self.state.microphones; when provided, position is interpreted as RELATIVE to the center of this microphone

  • polar (bool | None) – When True, expects position to be provided in [azimuth, elevation, radius] form; otherwise, units are [x, y, z] in absolute, cartesian terms.

  • ensure_direct_path (bool | list | str | None) – Whether to ensure a direct line exists between the emitter and given microphone(s). If True, a direct line will be ensured between the emitter and ALL microphone objects. If a list of strings, these should correspond to microphone aliases inside microphones; a direct line will be ensured with all of these microphones. If False, no direct line is required for the emitter.

  • scene_start (int | float | complex | integer | floating | None) – Time to start the Event within the Scene, in seconds. Must be a positive number.

  • event_start (int | float | complex | integer | floating | None) – Time to start the Event audio from, in seconds. Must be a positive number.

  • duration (int | float | complex | integer | floating | None) – Time the Event audio lasts in seconds. Must be a positive number.

  • snr (int | float | complex | integer | floating | None) – Signal to noise ratio for the audio file with respect to the noise floor

  • class_label (str | None) – Optional label to use for sound event class. If not provided, will attempt to infer label from filepath using the DCASE sound event classes.

  • class_id (int | None) – Optional ID to use for sound event class. If not provided, will attempt to infer ID from filepath using the DCASE sound event classes.

  • max_place_attempts (Numeric) – the number of times to try and place an Event before giving up.

  • image_filepath (str | Path | None) – A path to an image file, used when generating visual representations of a scene. Must be provided in order to generate videos from a Scene.

  • event_kwargs – additional keyword arguments passed to Event.__init__

Returns:

the Event object added to the Scene

Return type:

Event

add_microphone(**kwargs)#

Add a microphone to the WorldState.

An alias for WorldState.add_microphone: see that method for a full description.

Return type:

None

add_microphone_and_emitter(**kwargs)#

Add both a microphone and emitter with specified relationship.

An alias for WorldState.add_microphone_and_emitter: see that method for a full description.

Return type:

None

add_microphones(**kwargs)#

Add microphones to the WorldState.

An alias for WorldState.add_microphones: see that method for a full description.

Return type:

None

clear_ambience()#

Removes all current ambience events.

Return type:

None

clear_emitter(alias)#

Alias for WorldState.clear_emitter.

Parameters:

alias (str)

Return type:

None

clear_emitters()#

Alias for WorldState.clear_emitters.

Return type:

None

clear_event(alias)#

Given an alias for an event, clears the event and updates the state.

Note: simply calling del self.events[alias] is not enough; we also need to remove the source from the ray-tracing engine by updating the state.emitters dictionary and calling state._update.

Parameters:

alias (str)

Return type:

None

clear_events()#

Removes all current events and emitters from the state

Return type:

None

clear_microphone(alias)#

Alias for WorldState.clear_microphone.

Parameters:

alias (str)

Return type:

None

clear_microphones()#

Alias for WorldState.clear_microphones.

Return type:

None

classmethod from_dict(input_dict)#

Instantiate a Scene from a dictionary.

The new Scene will have the same WorldState, Emitters, Events, and Microphones as the original, serialised dictionary created from to_dict. Ensure that any necessary files (e.g. meshes, audio files) are located in the same places as specified in the dictionary.

Note that, currently, distribution objects (e.g., Scene.event_start_dist) cannot be loaded from a dictionary.

Parameters:

input_dict (dict[str, Any]) – Dictionary that will be used to instantiate the Scene.

Returns:

Scene instance.
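Examples

A round-trip sketch via to_dict (note the caveat above regarding distribution objects):

>>> scene = Scene(...)
>>> serialised = scene.to_dict()
>>> restored = Scene.from_dict(serialised)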

classmethod from_json(json_fpath)#

Instantiate a Scene from a JSON file.

Parameters:

json_fpath (str | Path) – Path to the JSON file to load.

Returns:

Scene instance.

generate(output_dir=None, audio=True, metadata_json=True, metadata_dcase=True, audio_fname='audio_out', metadata_fname='metadata_out', video=False, video_fname='video_out')#

Render scene to disk, generating audio, metadata, and (optionally) a video representation.

Parameters:
  • output_dir (str | Path | None) – directory to save the output, defaults to current working directory

  • audio (bool) – whether to save audio as an output, default to True

  • metadata_json (bool) – whether to save metadata JSON file, default to True

  • metadata_dcase (bool) – whether to save metadata CSVs in DCASE format, default to True

  • audio_fname (str | Path | None) – name to use for the output audio file, default to “audio_out”

  • metadata_fname (str | Path | None) – name to use for the output metadata, default to “metadata_out”

  • video (bool) – whether to save video as an output, default to False

  • video_fname (str | Path | None) – name to use for the output video, default to “video_out”

Returns:

None

Return type:

None
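Examples

A sketch of a typical render (the output directory and filenames are illustrative):

>>> scene = Scene(...)
>>> scene.generate(
...     output_dir="renders",
...     audio_fname="scene_001",
...     metadata_fname="scene_001",
... )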

generate_acoustic_image(output_dir=None, t_sti=0.01, scale='linear', nbands=9, frame_cap=None, fmin=1500, fmax=4500, bw=50.0, sh_order=10, polygon_mask_threshold=4e-05, resolution=(360, 180), circle_radius=20, json_fname='acoustic_image_metadata', hdf_fname='acoustic_image', standardise=True, n_jobs=-1, verbosity=50)#

Generate acoustic image and associated metadata for each microphone array added to the Scene.

Acoustic images are produced in the form (tessellation, bands, frames). These are produced from the synthesised audio using the Accelerated Proximal Gradient Descent (APGD) method.

Metadata for the acoustic images consists of the pixel coordinates of associated “segmentations” (or “blobs”) extracted from the acoustic image. These segmentations can be treated similarly to the bounding boxes often found in computer vision, and can be used for tasks like sound event localisation and detection. The method used to obtain the metadata is as follows:

  1. Take the median energy for each band in the acoustic image: gives (tesselation, frames)

  2. Iterate over all frames with an active annotation in the metadata array:

    2a. Interpolate the corresponding acoustic image frame to an image with shape (height, width)

    2b. Iterate over all annotations for the current frame:

      2bi. Create a 2D Gaussian centered at the X and Y pixel coordinates of the annotation, with radius set to span 2 SD of all pixel values

      2bii. Scale the acoustic image frame by multiplying by the Gaussian

      2biii. Mask all values in the scaled acoustic image frame that are below polygon_mask_threshold

      2biv. Apply contour detection to grab the edges of each “blob” in the image

    2c. Append all “blobs” for the frame: each has the format [x_pixel, y_pixel, amplitude]

  3. If standardise:

    3a. Z-score the pixel amplitude values using the mean/SD of the distribution of max pixel values per mask, according to the training data in the STARSS23 dataset

    3b. Add 0.5 to the Z-scored values

    3c. Clip the results between 0.01 and 1.0

  4. Return a full dictionary containing annotations for every frame
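The Gaussian scaling and masking in step 2b can be sketched with NumPy as follows. The shapes and values are toy data, and the simple coordinate collection at the end stands in for the contour detection used in the real implementation:

```python
import numpy as np

def mask_annotation(frame, x0, y0, sigma, threshold=4e-05):
    """Scale an acoustic image frame by a 2D Gaussian centred at (x0, y0),
    then zero every pixel below the threshold (cf. polygon_mask_threshold)."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    gaussian = np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2.0 * sigma ** 2))
    masked = np.where(frame * gaussian >= threshold, frame * gaussian, 0.0)
    # Collect surviving pixels as [x_pixel, y_pixel, amplitude] triplets,
    # matching the "segmentation" format described below; the real
    # implementation applies contour detection at this point instead.
    ys_nz, xs_nz = np.nonzero(masked)
    return [[int(x), int(y), float(masked[y, x])] for y, x in zip(ys_nz, xs_nz)]

# Toy frame with uniform energy: only pixels near the annotation at (5, 5)
# survive the Gaussian mask.
frame = np.full((10, 10), 0.01)
blobs = mask_annotation(frame, x0=5, y0=5, sigma=1.0)
```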

The dictionaries contain the following keys:
  • “metadata_frame_index”: the index of the frame within the acoustic image

  • “instance_id”: a unique integer identifier for each event in the scene

  • “category_id”: the category index of the sound event

  • “distance”: the distance of the sound event

  • “segmentation”: a list of [x_pixel, y_pixel, amplitude] values for every segmentation in that frame.

The resulting JSON dictionaries are dumped at json_fname: one JSON per microphone added to the Scene. Note that the amplitude values are expected to be scaled across all the JSON files that constitute a dataset, e.g. by Z-scoring, scaling between 0 and 1, etc. As this process relies on summary statistics that cannot easily be known when computing individual JSONs, it must be performed after calling this function.
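That post-hoc, dataset-wide scaling might look like the following sketch, here using Z-scoring; the amplitude lists are toy data standing in for values gathered from the per-microphone JSONs:

```python
import numpy as np

# Toy amplitude lists standing in for values gathered from several
# per-microphone JSON files (the real values come from "segmentation" entries).
amps_per_file = [[0.2, 0.5, 0.9], [0.1, 0.4], [0.8, 0.3, 0.6, 0.7]]

# Pool every file before computing statistics: the scaling must use
# dataset-wide statistics, not per-JSON ones.
all_amps = np.concatenate([np.asarray(a) for a in amps_per_file])
mean, std = all_amps.mean(), all_amps.std()

# Z-score each file's amplitudes using the pooled statistics.
scaled_per_file = [((np.asarray(a) - mean) / std).tolist() for a in amps_per_file]
```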

Note that this functionality is separated from generate due to the number of optional arguments.

Parameters:
  • output_dir (str | Path | None) – directory to save the output, defaults to current working directory

  • t_sti (Numeric) – frame length, defaults to 100 ms (same as DCASE label resolution)

  • scale (str) – scaling to use for nbands frequency bands, must be either “linear” or “log”

  • nbands (Numeric) – number of frequency bands

  • frame_cap (Numeric) – maximum number of frames to compute: set to None to use all frames

  • fmin (Numeric) – minimum frequency for nbands frequency bands

  • fmax (Numeric) – maximum frequency for nbands frequency bands

  • bw (Numeric) – bandwidth for nbands frequency bands

  • sh_order (Numeric) – spherical harmonic order that determines sampling density

  • resolution (tuple) – the resolution to interpolate the image to: must be equirectangular, in form (width, height)

  • polygon_mask_threshold (Numeric) – after scaling the acoustic image according to the 2D Gaussian, values below this threshold will be set to 0. This value should be tweaked based on looking at the images.

  • circle_radius (Numeric) – the radius of the circle placed at ground-truth azimuth and elevation points when calculating the 2D Gaussian

  • json_fname (str) – name to use for the output JSON, defaults to “acoustic_image_metadata”

  • hdf_fname (str) – name to use for the output HDF file, defaults to “acoustic_image”

  • standardise (bool) – whether to standardise the results according to the distribution of pixel values within the STARSS23 training set, defaults to True.

  • n_jobs (Numeric) – number of multiprocessing jobs, set to 1 to disable multiprocessing. Note that the number of workers will be dynamically reduced on out-of-memory or CPU errors: thus, it is recommended to set n_jobs=-1 to take advantage of all CPU cores for small audio files, falling back automatically to n_jobs=1 (no multiprocessing) for larger files.

  • verbosity (Numeric) – verbosity level to use when multiprocessing: higher prints more frequently

Returns:

None

Return type:

None

get_ambience(alias)#

Given a valid alias, get an associated ambience event, as in self.ambience[alias]

Return type:

Ambience

get_ambiences()#

Get all ambience objects, as in self.ambience.values()

Return type:

list[Ambience]

get_class_mapping()#

Alias for ClassMapping.mapping

Return type:

Type[TClassMapping]

get_emitter(alias, emitter_idx=0)#

Alias for WorldState.get_emitter

Parameters:
  • alias (str)

  • emitter_idx (int)

Return type:

Emitter

get_emitters(alias)#

Alias for WorldState.get_emitters

Parameters:

alias (str)

Return type:

list[Emitter]

get_event(alias_or_idx)#

Get an associated event, either by alias (string) or index (int).

Parameters:

alias_or_idx (str | int)

Return type:

Event

get_events()#

Return a list of all events for this scene, as in self.events.values()

Return type:

list[Event]

get_microphone(alias)#

Alias for WorldState.get_microphone

Parameters:

alias (str)

Return type:

Type[MicArray]

get_microphones()#

Alias for WorldState.get_microphones

Return type:

list[Type[MicArray]]

to_dict()#

Returns metadata for this object as a dictionary

Return type:

dict