audiblelight.core.Scene#
- class audiblelight.core.Scene(duration, backend, sample_rate=44100, fg_path=None, bg_path=None, image_path=None, allow_duplicate_audios=True, allow_same_class_events=True, ref_db=-65, scene_start_dist=None, event_start_dist=None, event_duration_dist=None, event_velocity_dist=None, event_resolution_dist=None, snr_dist=None, max_overlap=2, event_augmentations=None, backend_kwargs=None, class_mapping='DCASE2023Task3', video_fps=10, video_res=(1920, 960), video_low_power=True, video_overlay_distance_scale_factor=1.0, video_overlay_base_size=0.5)#
Bases: object
Initializes a Scene.
The Scene object is the highest level object within AudibleLight. It holds information relating to the current WorldState (including a 3D mesh, alongside listeners and sound sources) and any sound Event objects within it.
- Parameters:
duration (int | float | complex | integer | floating)
backend (str | WorldState)
sample_rate (int | float | complex | integer | floating | None)
fg_path (str | Path | None)
bg_path (str | Path | None)
image_path (str | Path | None)
allow_duplicate_audios (bool)
allow_same_class_events (bool)
ref_db (int | float | complex | integer | floating | None)
scene_start_dist (DistributionLike | None)
event_start_dist (DistributionLike | None)
event_duration_dist (DistributionLike | None)
event_velocity_dist (DistributionLike | None)
event_resolution_dist (DistributionLike | None)
snr_dist (DistributionLike | None)
max_overlap (int | float | complex | integer | floating | None)
event_augmentations (Iterable[Type[EventAugmentation]] | Iterable[tuple[Type[EventAugmentation], dict]] | Type[EventAugmentation] | None)
backend_kwargs (dict | None)
class_mapping (TClassMapping | dict | str | None)
video_fps (int | float | complex | integer | floating | None)
video_res (tuple[int | float | complex | integer | floating, int | float | complex | integer | floating] | None)
video_low_power (bool | None)
video_overlay_distance_scale_factor (int | float | complex | integer | floating | None)
video_overlay_base_size (int | float | complex | integer | floating | None)
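The class_mapping argument accepts a plain dict as well as a ClassMapping subclass or string. A minimal sketch of a dict-based mapping; the class names below are illustrative placeholders, not the actual DCASE2023Task3 labels:

```python
# Hypothetical class mapping: label -> integer index.
# The real DCASE2023Task3 mapping ships with audiblelight;
# these names are placeholders for illustration only.
class_mapping = {
    "femaleSpeech": 0,
    "maleSpeech": 1,
    "clapping": 2,
}

# Inverse lookup (index -> label), useful when decoding model output.
inverse_mapping = {idx: label for label, idx in class_mapping.items()}

print(inverse_mapping[2])  # -> clapping
```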
- __init__(duration, backend, sample_rate=44100, fg_path=None, bg_path=None, image_path=None, allow_duplicate_audios=True, allow_same_class_events=True, ref_db=-65, scene_start_dist=None, event_start_dist=None, event_duration_dist=None, event_velocity_dist=None, event_resolution_dist=None, snr_dist=None, max_overlap=2, event_augmentations=None, backend_kwargs=None, class_mapping='DCASE2023Task3', video_fps=10, video_res=(1920, 960), video_low_power=True, video_overlay_distance_scale_factor=1.0, video_overlay_base_size=0.5)#
Initializes the Scene with a given duration and mesh.
- Parameters:
duration (int | float | complex | integer | floating) – the length of time the scene audio should last for.
backend (str | WorldState) – the name of the backend to use. Either ‘rlr’, ‘sofa’, or ‘shoebox’ are supported.
fg_path (str | Path | None) – a directory (or list of directories) pointing to foreground audio. Note that directories will be introspected recursively, such that audio files within any subdirectories will be detected also.
bg_path (str | Path | None) – a directory (or list of directories) pointing to background audio. Note that directories will be introspected recursively, such that audio files within any subdirectories will be detected also.
image_path (str | Path | None) – a directory (or list of directories) pointing to Event images. Note that directories will be introspected recursively, such that image files within any subdirectories will be detected also.
allow_duplicate_audios (bool) – if True (default), the same audio file can appear multiple times in the Scene.
allow_same_class_events (bool) – if True (default), multiple Events from the same class may be added to the Scene.
ref_db (int | float | complex | integer | floating | None) – reference decibel level for scene noise floor, defaults to -65 dB
scene_start_dist (DistributionLike | None) – distribution-like object or callable used to sample starting times for any Event objects applied to the scene. If not provided, will be a uniform distribution between 0 and duration
event_start_dist (DistributionLike | None) – distribution-like object used to sample starting (offset) times for Event audio files. If not provided, Event audio files will always start at 0 seconds. Note that this can be overridden by passing a value into Scene.add_event(event_start=…)
event_duration_dist (DistributionLike | None) – distribution-like object used to sample Event audio duration times. If not provided, Event audio files will always use their full duration. Note that this can be overridden by passing a value into Scene.add_event(duration=…)
event_velocity_dist (DistributionLike | None) – distribution-like object used to sample Event spatial velocities. If not provided, a uniform distribution between 0.5 and 2.0 metres-per-second will be used.
event_resolution_dist (DistributionLike | None) – distribution-like object used to sample Event spatial resolutions. If not provided, a uniform distribution between 1.0 and 4.0 Hz (i.e., IRs-per-second) will be used.
snr_dist (DistributionLike | None) – distribution-like object used to sample Event signal-to-noise ratios. If not provided, a uniform distribution between 5 and 30 will be used.
max_overlap (int | float | complex | integer | floating | None) – the maximum number of overlapping audio Events allowed in the Scene, defaults to 2.
event_augmentations (Iterable[Type[EventAugmentation]] | Iterable[tuple[Type[EventAugmentation], dict]] | Type[EventAugmentation] | None) – an iterable of audiblelight.EventAugmentation objects that can be applied to Event objects. The number of augmentations sampled from this list can be controlled by setting the value of augmentations when calling Scene.add_event, i.e. Scene.add_event(augmentations=3) will sample 3 random augmentations from event_augmentations and apply them to the Event.
backend_kwargs (dict | None) – keyword arguments passed to audiblelight.WorldState.
class_mapping (TClassMapping | dict | str | None) – a mapping used to map class names to indices, and vice versa. Can be a subclass of audiblelight.class_mapping.ClassMapping, dict, or str. Defaults to DCASE 2023, task 3 mapping
video_fps (int | float | complex | integer | floating | None) – The number of frames-per-second to use when creating a video, defaults to 10
video_res (tuple[int | float | complex | integer | floating, int | float | complex | integer | floating] | None) – The resolution of generated video files, defaults to (1920 x 960). Note that height must be exactly half of width for an equirectangular video.
video_low_power (bool | None) – Applies a variety of adjustments to improve video performance on weaker hardware.
video_overlay_distance_scale_factor (int | float | complex | integer | floating | None) – Scales the size of overlaid images depending on proximity to the camera: a larger scaling factor means that images closer to the camera will appear smaller than with a lower scaling factor. Defaults to 1.0.
video_overlay_base_size (int | float | complex | integer | floating | None) – The base size of overlaid images on the video, independent of distance. Defaults to 0.5.
sample_rate (int | float | complex | integer | floating | None) – the sample rate of the output audio, defaults to 44100 Hz
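Each of the *_dist parameters accepts a “DistributionLike”: in the simplest case, a zero-argument callable that returns one sample (the exact protocol audiblelight accepts is an assumption here). A minimal sketch mirroring the documented default for snr_dist:

```python
import random

# A distribution-like object can be as simple as a zero-argument
# callable returning one sample. Here, event SNRs are drawn uniformly
# from [5, 30] dB, mirroring the documented default for snr_dist.
def snr_dist():
    return random.uniform(5.0, 30.0)

samples = [snr_dist() for _ in range(1000)]
print(min(samples) >= 5.0 and max(samples) <= 30.0)  # -> True
```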
Methods
__init__(duration, backend[, sample_rate, ...]) – Initializes the Scene with a given duration and mesh.
add_ambience([filepath, noise, channels, ...]) – Add ambient noise to the WorldState.
add_emitter(**kwargs) – Add an emitter to the WorldState.
add_emitters(**kwargs) – Add emitters to the WorldState.
add_event([event_type, filepath, alias, ...]) – Add an event to the foreground, either "static", "moving", or "predefined".
add_event_moving([filepath, alias, ...]) – Add a moving event to the foreground with optional overrides.
add_event_predefined([filepath, trajectory, ...]) – Add a moving event to the foreground that follows a predefined path.
add_event_static([filepath, alias, ...]) – Add a static event to the foreground with optional overrides.
add_microphone(**kwargs) – Add a microphone to the WorldState.
add_microphone_and_emitter(**kwargs) – Add both a microphone and emitter with specified relationship.
add_microphones(**kwargs) – Add microphones to the WorldState.
clear_ambience() – Removes all current ambience events.
clear_emitter(alias) – Alias for WorldState.clear_emitter.
clear_emitters() – Alias for WorldState.clear_emitters.
clear_event(alias) – Given an alias for an event, clears the event and updates the state.
clear_events() – Removes all current events and emitters from the state.
clear_microphone(alias) – Alias for WorldState.clear_microphone.
clear_microphones() – Alias for WorldState.clear_microphones.
from_dict(input_dict) – Instantiate a Scene from a dictionary.
from_json(json_fpath) – Instantiate a Scene from a JSON file.
generate([output_dir, audio, metadata_json, ...]) – Render scene to disk.
generate_acoustic_image([output_dir, t_sti, ...]) – Generate acoustic image and associated metadata for each microphone array added to the Scene.
get_ambience(alias) – Given a valid alias, get an associated ambience event, as in self.ambience[alias]
get_ambiences() – Get all ambience objects, as in self.ambience.values()
get_class_mapping() – Alias for ClassMapping.mapping
get_emitter(alias[, emitter_idx]) – Alias for WorldState.get_emitter
get_emitters(alias) – Alias for WorldState.get_emitters
get_event(alias_or_idx) – Given a valid alias, get an associated event either by alias (string) or idx (int).
get_events() – Return a list of all events for this scene, as in self.events.values()
get_microphone(alias) – Alias for WorldState.get_microphone
get_microphones() – Alias for WorldState.get_microphones
to_dict() – Returns metadata for this object as a dictionary
- __eq__(other)#
Compare two Scene objects for equality.
Internally, we convert both objects to a dictionary and then use the deepdiff package to compare them, with some additional logic to account for, e.g., significant digits and values that will always be different (e.g., creation time).
- Parameters:
other (Any) – the object to compare the current Scene object against
- Returns:
True if the Scene objects are equivalent, False otherwise
- Return type:
bool
- __getitem__(alias_or_idx)#
An alternative to self.get_event(alias_or_idx) or self.events[alias_or_idx].
- Parameters:
alias_or_idx (str | int)
- Return type:
Event
- __iter__()#
Yields an iterator of Event objects from the current scene
Examples
>>> test_scene = Scene(...)
>>> for n in range(9):
...     test_scene.add_event_static(...)
>>> for ev in test_scene:
...     assert isinstance(ev, Event)
- Return type:
Iterator[Event]
- __len__()#
Returns the number of events in the scene
- Return type:
int
- __repr__()#
Returns a representation of the scene as a JSON string
- Return type:
str
- __str__()#
Returns a string representation of the scene
- Return type:
str
- add_ambience(filepath=None, noise=None, channels=None, ref_db=None, alias=None, **kwargs)#
Add ambient noise to the WorldState.
The ambience can be either a file on the disk (in which case filepath must not be None) or a type of noise “color” such as white, red, or blue (in which case noise must not be None). The number of channels can be provided directly or will be inferred from the microphones added to the state, when this is possible.
- Parameters:
channels (int) – the number of channels to generate noise for. If None, will be inferred from available mics.
filepath (str or Path) – a path to an audio file on the disk. If None (and noise is None), will try to sample a random audio file from Scene.bg_audios.
noise (str) – either the type of noise to generate, e.g. “white”, “red”, or an arbitrary numeric exponent to use when generating noise with powerlaw_psd_gaussian.
ref_db (Numeric) – the noise floor, in decibels
alias (str) – string reference to refer to this Ambience object inside Scene.ambience
kwargs – additional keyword arguments passed to audiblelight.ambience.powerlaw_psd_gaussian
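The noise colours correspond to exponents of a 1/f**beta power spectrum (white = 0, pink = 1, red/brown = 2). A self-contained sketch of generating such noise by spectral shaping; this illustrates the idea and is not audiblelight's powerlaw_psd_gaussian implementation:

```python
import numpy as np

def powerlaw_noise(n_samples: int, beta: float, seed: int = 0) -> np.ndarray:
    """Generate noise whose power spectrum follows 1/f**beta.

    beta = 0 -> white, 1 -> pink, 2 -> red/brown noise.
    """
    rng = np.random.default_rng(seed)
    white = rng.standard_normal(n_samples)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n_samples)
    freqs[0] = freqs[1]  # avoid division by zero at DC
    spectrum *= freqs ** (-beta / 2.0)  # amplitude scales as f^(-beta/2)
    noise = np.fft.irfft(spectrum, n=n_samples)
    return noise / np.abs(noise).max()  # normalise to [-1, 1]

white = powerlaw_noise(44100, beta=0.0)  # flat spectrum
red = powerlaw_noise(44100, beta=2.0)    # energy concentrated at low frequencies
```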
- add_emitter(**kwargs)#
Add an emitter to the WorldState.
An alias for WorldState.add_emitter: see that method for a full description.
- add_emitters(**kwargs)#
Add emitters to the WorldState.
An alias for WorldState.add_emitters: see that method for a full description.
- add_event(event_type='static', filepath=None, alias=None, augmentations=None, position=None, trajectory=None, mic=None, polar=False, ensure_direct_path=False, scene_start=None, event_start=None, duration=None, snr=None, class_id=None, class_label=None, shape=None, spatial_resolution=None, spatial_velocity=None, max_place_attempts=1000, image_filepath=None, **event_kwargs)#
Add an event to the foreground, either “static”, “moving”, or “predefined”.
Note that the arguments “scene_start”, “event_start”, “duration”, “snr”, “spatial_velocity”, & “spatial_resolution” will (by default) sample from their respective distributions, provided in Scene.__init__. If a numeric value is provided, this will be treated as an override and used instead of random sampling.
- Parameters:
event_type (str) – the type of event to add, must be either “static”, “moving”, or “predefined”.
filepath (str | Path | None) – a path to a foreground event to use. If not provided, a foreground event will be sampled from fg_category_paths, if this is provided inside __init__; otherwise, an error will be raised.
alias (str | None) – the string alias used to index this event inside the events dictionary
augmentations (Iterable[Type[EventAugmentation]] | Type[EventAugmentation] | int | float | complex | integer | floating | None) – augmentation objects to associate with the Event. If a list of EventAugmentation objects or a single EventAugmentation object, these will be passed directly. If a number, this many augmentations will be sampled from either Scene.event_augmentations or a master list of valid augmentations (defined inside audiblelight.augmentations). If not provided, EventAugmentations can be registered later by calling register_augmentations on the Event.
position (list | ndarray | None) – Location to add the event. When event_type==”static”, this will be the position of the Event. When event_type==”moving”, this will be the starting position of the Event. When not provided, a random point inside the mesh will be chosen.
trajectory (ndarray | None) – The trajectory the moving event will follow, given in Cartesian coordinates inside the mesh. Only used when event_type==”predefined”. If not provided, will attempt to infer from state.waypoints.
mic (str | None) – String reference to a microphone inside self.state.microphones; when provided, position is interpreted as RELATIVE to the center of this microphone
polar (bool | None) – When True, expects position to be provided in [azimuth, elevation, radius] form; otherwise, units are [x, y, z] in absolute, cartesian terms.
ensure_direct_path (bool | list | str | None) – Whether to ensure a direct line exists between the emitter and given microphone(s). If True, will ensure a direct line exists between the emitter and ALL microphone objects. If a list of strings, these should correspond to microphone aliases inside microphones; a direct line will be ensured with all of these microphones. If False, no direct line is required for an emitter.
scene_start (int | float | complex | integer | floating | None) – Time to start the Event within the Scene, in seconds. Must be a positive number.
event_start (int | float | complex | integer | floating | None) – Time to start the Event audio from, in seconds. Must be a positive number.
duration (int | float | complex | integer | floating | None) – Time the Event audio lasts in seconds. Must be a positive number.
snr (int | float | complex | integer | floating | None) – Signal to noise ratio for the audio file with respect to the noise floor
class_label (str | None) – Optional label to use for sound event class. If not provided, will attempt to infer label from filepath using the DCASE sound event classes.
class_id (int | None) – Optional ID to use for sound event class. If not provided, will attempt to infer ID from filepath using the DCASE sound event classes.
spatial_velocity (int | float | complex | integer | floating | None) – Speed of a moving sound event in metres-per-second
spatial_resolution (int | float | complex | integer | floating | None) – Resolution of a moving sound event in Hz (i.e., number of IRs created per second)
shape (str | None) – the shape of a moving event trajectory; one of “linear”, “semicircular”, “random”, “sine”, “sawtooth”, “predefined”
max_place_attempts (Numeric) – the number of times to try and place an Event before giving up.
image_filepath (str | Path | None) – A path to an image file, used when generating visual representations of a scene. Must be provided in order to generate videos from a Scene.
event_kwargs – additional keyword arguments passed to Event.__init__
- Returns:
the Event object added to the Scene
- Return type:
Event
Examples
Creating an event with random sampling of parameters. Here, note that “scene_start”, “event_start”, “duration”, “snr” will be sampled at random from the distributions defined when initialising the Scene.
>>> scene = Scene(...)
>>> scene.add_event(
...     event_type="static",
...     filepath="some/path.wav"
... )
Creating an event with a predefined position:
>>> scene = Scene(...)
>>> scene.add_event(
...     event_type="static",
...     filepath="some/path.wav",
...     alias="tester",
...     position=[-0.5, -0.5, 0.5],
...     polar=False,
...     ensure_direct_path=False
... )
Creating an event with overrides:
>>> scene = Scene(...)
>>> scene.add_event(
...     event_type="moving",
...     filepath="some/path.wav",
...     alias="tester",
...     event_start=5.0,
...     duration=5.0,
...     snr=0.0,
... )
Creating an event with an image:
>>> scene = Scene(...)
>>> scene.add_event(
...     event_type="moving",
...     filepath="some/path.wav",
...     image_filepath="some/image.jpg"
... )
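When polar=True, position is interpreted as [azimuth, elevation, radius] relative to the given microphone. A sketch of the conversion to cartesian offsets; the angle conventions below are an assumption and may differ from audiblelight's:

```python
import math

def polar_to_cartesian(azimuth_deg: float, elevation_deg: float, radius: float):
    """Convert [azimuth, elevation, radius] to an [x, y, z] offset.

    Assumes azimuth is measured in the XY plane from +X and elevation
    from the horizontal plane, both in degrees; conventions may differ.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = radius * math.cos(el) * math.cos(az)
    y = radius * math.cos(el) * math.sin(az)
    z = radius * math.sin(el)
    return [x, y, z]

# A source 2 m away, 90 degrees to the left, at the same height:
offset = polar_to_cartesian(90.0, 0.0, 2.0)  # approximately [0.0, 2.0, 0.0]
```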
- add_event_moving(filepath=None, alias=None, augmentations=None, position=None, mic=None, polar=False, shape=None, scene_start=None, event_start=None, duration=None, snr=None, class_id=None, class_label=None, spatial_resolution=None, spatial_velocity=None, ensure_direct_path=False, max_place_attempts=1000, image_filepath=None, **event_kwargs)#
Add a moving event to the foreground with optional overrides.
Note that the arguments “scene_start”, “event_start”, “duration”, “snr”, “spatial_velocity”, & “spatial_resolution” will (by default) sample from their respective distributions, provided in Scene.__init__. If a numeric value is provided, this will be treated as an override and used instead of random sampling.
- Parameters:
filepath (str | Path | None) – a path to a foreground event to use. If not provided, a foreground event will be sampled from fg_category_paths, if this is provided inside __init__; otherwise, an error will be raised.
alias (str | None) – the string alias used to index this event inside the events dictionary
augmentations (Iterable[Type[EventAugmentation]] | Type[EventAugmentation] | int | float | complex | integer | floating | None) – augmentation objects to associate with the Event. If a list of EventAugmentation objects or a single EventAugmentation object, these will be passed directly. If a number, this many augmentations will be sampled from either Scene.event_augmentations or a master list of valid augmentations (defined inside audiblelight.augmentations). If not provided, EventAugmentations can be registered later by calling register_augmentations on the Event.
position (list | ndarray | None) – Starting point for the event. When not provided, a random point inside the mesh will be chosen.
mic (str | None) – String reference to a microphone inside self.state.microphones; when provided, position is interpreted as RELATIVE to the center of this microphone
polar (bool | None) – When True, expects position to be provided in [azimuth, elevation, radius] form; otherwise, units are [x, y, z] in absolute, cartesian terms.
scene_start (int | float | complex | integer | floating | None) – Time to start the Event within the Scene, in seconds. Must be a positive number.
event_start (int | float | complex | integer | floating | None) – Time to start the Event audio from, in seconds. Must be a positive number.
duration (int | float | complex | integer | floating | None) – Time the Event audio lasts in seconds. Must be a positive number.
snr (int | float | complex | integer | floating | None) – Signal to noise ratio for the audio file with respect to the noise floor
class_label (str | None) – Optional label to use for sound event class. If not provided, will attempt to infer label from filepath using the DCASE sound event classes.
class_id (int | None) – Optional ID to use for sound event class. If not provided, will attempt to infer ID from filepath using the DCASE sound event classes.
spatial_velocity (int | float | complex | integer | floating | None) – Speed of a moving sound event in metres-per-second
spatial_resolution (int | float | complex | integer | floating | None) – Resolution of a moving sound event in Hz (i.e., number of IRs created per second)
shape (str | None) – the shape of a moving event trajectory; one of “linear”, “semicircular”, “random”, “sine”, “sawtooth”
ensure_direct_path (bool | list | str | None) – Whether to ensure a direct line exists between the emitter and given microphone(s). If True, will ensure a direct line exists between the emitter and ALL microphone objects. If a list of strings, these should correspond to microphone aliases inside microphones; a direct line will be ensured with all of these microphones. If False, no direct line is required for an emitter.
max_place_attempts (Numeric) – the number of times to try and place an Event before giving up.
image_filepath (str | Path | None) – A path to an image file, used when generating visual representations of a scene. Must be provided in order to generate videos from a Scene.
event_kwargs – additional keyword arguments passed to Event.__init__
- Returns:
the Event object added to the Scene
- Return type:
Event
- add_event_predefined(filepath=None, trajectory=None, alias=None, augmentations=None, scene_start=None, event_start=None, duration=None, snr=None, class_id=None, class_label=None, ensure_direct_path=False, max_place_attempts=1000, image_filepath=None)#
Add a moving event to the foreground that follows a predefined path.
The spatial velocity and resolution of the event will be inferred from the trajectory itself, in combination with the duration (which may be provided or randomly sampled).
- Parameters:
filepath (str | Path | None)
trajectory (ndarray | None)
alias (str | None)
augmentations (Iterable[Type[EventAugmentation]] | Type[EventAugmentation] | int | float | complex | integer | floating | None)
scene_start (int | float | complex | integer | floating | None)
event_start (int | float | complex | integer | floating | None)
duration (int | float | complex | integer | floating | None)
snr (int | float | complex | integer | floating | None)
class_id (int | None)
class_label (str | None)
ensure_direct_path (bool | list | str | None)
max_place_attempts (int | float | complex | integer | floating | None)
image_filepath (str | Path | None)
- add_event_static(filepath=None, alias=None, augmentations=None, position=None, mic=None, polar=False, ensure_direct_path=False, scene_start=None, event_start=None, duration=None, snr=None, class_id=None, class_label=None, max_place_attempts=1000, image_filepath=None, **event_kwargs)#
Add a static event to the foreground with optional overrides.
Note that the arguments “scene_start”, “event_start”, “duration”, & “snr” will (by default) sample from their respective distributions, provided in Scene.__init__. If a numeric value is provided, this will be treated as an override and used instead of random sampling.
- Parameters:
filepath (str | Path | None) – a path to a foreground event to use. If not provided, a foreground event will be sampled from fg_category_paths, if this is provided inside __init__; otherwise, an error will be raised.
alias (str | None) – the string alias used to index this event inside the events dictionary
augmentations (Iterable[Type[EventAugmentation]] | Type[EventAugmentation] | int | float | complex | integer | floating | None) – augmentation objects to associate with the Event. If a list of EventAugmentation objects or a single EventAugmentation object, these will be passed directly. If a number, this many augmentations will be sampled from either Scene.event_augmentations or a master list of valid augmentations (defined inside audiblelight.augmentations). If not provided, EventAugmentations can be registered later by calling register_augmentations on the Event.
position (list | ndarray | None) – Location to add the event. When not provided, a random point inside the mesh will be chosen.
mic (str | None) – String reference to a microphone inside self.state.microphones; when provided, position is interpreted as RELATIVE to the center of this microphone
polar (bool | None) – When True, expects position to be provided in [azimuth, elevation, radius] form; otherwise, units are [x, y, z] in absolute, cartesian terms.
ensure_direct_path (bool | list | str | None) – Whether to ensure a direct line exists between the emitter and given microphone(s). If True, will ensure a direct line exists between the emitter and ALL microphone objects. If a list of strings, these should correspond to microphone aliases inside microphones; a direct line will be ensured with all of these microphones. If False, no direct line is required for an emitter.
scene_start (int | float | complex | integer | floating | None) – Time to start the Event within the Scene, in seconds. Must be a positive number.
event_start (int | float | complex | integer | floating | None) – Time to start the Event audio from, in seconds. Must be a positive number.
duration (int | float | complex | integer | floating | None) – Time the Event audio lasts in seconds. Must be a positive number.
snr (int | float | complex | integer | floating | None) – Signal to noise ratio for the audio file with respect to the noise floor
class_label (str | None) – Optional label to use for sound event class. If not provided, will attempt to infer label from filepath using the DCASE sound event classes.
class_id (int | None) – Optional ID to use for sound event class. If not provided, will attempt to infer ID from filepath using the DCASE sound event classes.
max_place_attempts (Numeric) – the number of times to try and place an Event before giving up.
image_filepath (str | Path | None) – A path to an image file, used when generating visual representations of a scene. Must be provided in order to generate videos from a Scene.
event_kwargs – additional keyword arguments passed to Event.__init__
- Returns:
the Event object added to the Scene
- Return type:
Event
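The snr argument is specified relative to the scene noise floor (ref_db). A sketch of the underlying decibel arithmetic only; this is not audiblelight's internal gain logic:

```python
def target_gain(signal_db: float, ref_db: float, snr: float) -> float:
    """Linear gain that places a signal `snr` dB above the noise floor.

    signal_db: measured level of the event audio, in dB.
    ref_db: the scene noise floor (e.g. the default -65 dB).
    snr: desired signal-to-noise ratio, in dB.
    """
    target_db = ref_db + snr          # desired absolute level
    gain_db = target_db - signal_db   # correction to apply
    return 10.0 ** (gain_db / 20.0)   # dB -> linear amplitude gain

# An event measured at -20 dB, noise floor -65 dB, desired SNR 30 dB:
gain = target_gain(-20.0, -65.0, 30.0)  # 10 ** (-15 / 20), roughly 0.178
```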
- add_microphone(**kwargs)#
Add a microphone to the WorldState.
An alias for WorldState.add_microphone: see that method for a full description.
- Return type:
None
- add_microphone_and_emitter(**kwargs)#
Add both a microphone and emitter with specified relationship.
An alias for WorldState.add_microphone_and_emitter: see that method for a full description.
- Return type:
None
- add_microphones(**kwargs)#
Add microphones to the WorldState.
An alias for WorldState.add_microphones: see that method for a full description.
- Return type:
None
- clear_ambience()#
Removes all current ambience events.
- Return type:
None
- clear_emitter(alias)#
Alias for WorldState.clear_emitter.
- Parameters:
alias (str)
- Return type:
None
- clear_emitters()#
Alias for WorldState.clear_emitters.
- Return type:
None
- clear_event(alias)#
Given an alias for an event, clears the event and updates the state.
Note: simply calling del self.events[alias] is not enough; we also need to remove the source from the ray-tracing engine by updating the state.emitters dictionary and calling state._update.
- Parameters:
alias (str)
- Return type:
None
- clear_events()#
Removes all current events and emitters from the state
- Return type:
None
- clear_microphone(alias)#
Alias for WorldState.clear_microphone.
- Parameters:
alias (str)
- Return type:
None
- clear_microphones()#
Alias for WorldState.clear_microphones.
- Return type:
None
- classmethod from_dict(input_dict)#
Instantiate a Scene from a dictionary.
The new Scene will have the same WorldState, Emitters, Events, and Microphones as the original, serialised dictionary created from to_dict. Ensure that any necessary files (e.g. meshes, audio files) are located in the same places as specified in the dictionary.
Note that, currently, distribution objects (e.g., Scene.event_start_dist) cannot be loaded from a dictionary.
- Parameters:
input_dict (dict[str, Any]) – Dictionary that will be used to instantiate the Scene.
- Returns:
Scene instance.
- classmethod from_json(json_fpath)#
Instantiate a Scene from a JSON file.
- Parameters:
json_fpath (str | Path) – Path to the JSON file to load.
- Returns:
Scene instance.
- generate(output_dir=None, audio=True, metadata_json=True, metadata_dcase=True, audio_fname='audio_out', metadata_fname='metadata_out', video=False, video_fname='video_out')#
Render scene to disk, generating audio, metadata, and (optionally) a video representation.
- Parameters:
output_dir (str | Path | None) – directory to save the output, defaults to current working directory
audio (bool) – whether to save audio as an output, defaults to True
metadata_json (bool) – whether to save metadata JSON file, defaults to True
metadata_dcase (bool) – whether to save metadata CSVs in DCASE format, defaults to True
audio_fname (str | Path | None) – name to use for the output audio file, defaults to “audio_out”
metadata_fname (str | Path | None) – name to use for the output metadata, defaults to “metadata_out”
video (bool) – whether to save video as an output, defaults to False
video_fname (str | Path | None) – name to use for the output video, defaults to “video_out”
- Returns:
None
- Return type:
None
- generate_acoustic_image(output_dir=None, t_sti=0.01, scale='linear', nbands=9, frame_cap=None, fmin=1500, fmax=4500, bw=50.0, sh_order=10, polygon_mask_threshold=4e-05, resolution=(360, 180), circle_radius=20, json_fname='acoustic_image_metadata', hdf_fname='acoustic_image', standardise=True, n_jobs=-1, verbosity=50)#
Generate acoustic image and associated metadata for each microphone array added to the Scene.
Acoustic images are produced in the form (tesselation, bands, frames). These are produced from synthesised audio using the Accelerated Proximal Gradient Descent (APGD) method.
Metadata for the acoustic images consists of the pixel coordinates of associated “segmentations” (or “blobs”!) extracted from the acoustic image. These segmentations can be treated similarly to bounding boxes often found in computer vision, and can be used for tasks like sound event localisation and detection. The method used to obtain the metadata is as follows:
1. Take the median energy for each band in the acoustic image: gives (tesselation, frames)
2. Iterate over all frames with an active annotation in the metadata array:
   2a. Interpolate the corresponding acoustic image frame to an image with shape (height, width)
   2b. Iterate over all annotations for the current frame:
       2bi. Create a 2D Gaussian centered at the X and Y pixel coordinates of the annotation, with radius set to span 2 SD of all pixel values
       2bii. Scale the acoustic image frame by multiplying by the Gaussian
       2biii. Mask all values in the scaled acoustic image frame that are below polygon_mask_threshold
       2biv. Apply contour detection to grab the edges of each “blob” in the image
   2c. Append all “blobs” for the frame: each of these has the format [x_pixel, y_pixel, amplitude]
3. If standardise:
   3a. The pixel amplitude values are Z-scored using the mean/SD of the distribution of max pixel values per mask, according to the training data in the STARSS23 dataset
   3b. 0.5 is added to the Z-scored values
   3c. The results are then clipped between 0.01 and 1.0
4. Return a full dictionary containing annotations of every frame
- The dictionaries contain the following keys:
“metadata_frame_index”: the index of the frame within the acoustic image
“instance_id”: a unique integer identifier for each event in the scene
“category_id”: the index of the sound event
“distance”: the distance of the sound event
“segmentation”: a list of [x_pixel, y_pixel, amplitude] values for every segmentation in that frame.
The resulting JSON dictionaries are dumped at json_fname: one JSON per microphone added to the Scene. It is also assumed that the amplitude values should be scaled across multiple JSON files that constitute an entire dataset, e.g. by Z-scoring, scaling between 0 and 1, etc. As this process relies on summary statistics that cannot easily be known when computing individual JSONs, this must be accomplished after calling this function.
Note that this functionality is separated from generate due to the number of optional arguments.
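Steps 3a–3c above amount to a shifted, clipped Z-score. A sketch of that standardisation; the mean/SD values used in the example call are placeholders, not the real STARSS23 statistics:

```python
import numpy as np

def standardise_amplitudes(amps, mean, sd):
    """Z-score, shift by 0.5, then clip to [0.01, 1.0] (steps 3a-3c).

    mean and sd come from the distribution of max pixel values per mask
    in the STARSS23 training data; the values in the example call below
    are placeholders only.
    """
    z = (np.asarray(amps, dtype=float) - mean) / sd   # 3a: Z-score
    shifted = z + 0.5                                 # 3b: add 0.5
    return np.clip(shifted, 0.01, 1.0)                # 3c: clip

out = standardise_amplitudes([0.2, 0.8, 5.0], mean=0.5, sd=0.25)
# low values clip to 0.01, high values clip to 1.0
```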
- Parameters:
output_dir (str | Path | None) – directory to save the output, defaults to current working directory
t_sti (Numeric) – frame length in seconds, defaults to 0.01
scale (str) – scaling to use for nbands frequency bands, must be either “linear” or “log”
nbands (Numeric) – number of frequency bands
frame_cap (Numeric) – maximum number of frames to compute: set to None to use all frames
fmin (Numeric) – minimum frequency for nbands frequency bands
fmax (Numeric) – maximum frequency for nbands frequency bands
bw (Numeric) – bandwidth for nbands frequency bands
sh_order (Numeric) – spherical harmonic order that determines sampling density
resolution (tuple) – the resolution to interpolate the image to: must be equirectangular, in form (width, height)
polygon_mask_threshold (Numeric) – after scaling the acoustic image according to the 2D Gaussian, values below this threshold will be set to 0. This value should be tweaked based on looking at the images.
circle_radius (Numeric) – the radius of the circle placed at ground-truth azimuth and elevation points when calculating the 2D Gaussian
json_fname (str) – name to use for the output JSON, default to “acoustic_image_metadata”
hdf_fname (str) – name to use for the output HDF file, default to “acoustic_image”
standardise (bool) – whether to standardise the results according to the distribution of pixel values within the STARSS23 training set, defaults to True.
n_jobs (Numeric) – number of multiprocessing jobs, set to 1 to disable multiprocessing. Note that the number of workers will be dynamically reduced on out-of-memory or CPU errors: thus, it is recommended to set n_jobs=-1 to take advantage of all CPU cores for small audio files, falling back automatically to n_jobs=1 (no multiprocessing) for larger files.
verbosity (Numeric) – verbosity level to use when multiprocessing: higher prints more frequently
- Returns:
None
- Return type:
None
- get_ambience(alias)#
Given a valid alias, get an associated ambience event, as in self.ambience[alias]
- Return type:
Ambience
- get_class_mapping()#
Alias for ClassMapping.mapping
- Return type:
Type[TClassMapping]
- get_emitter(alias, emitter_idx=0)#
Alias for WorldState.get_emitter
- Parameters:
alias (str)
emitter_idx (int)
- Return type:
Emitter
- get_emitters(alias)#
Alias for WorldState.get_emitters
- Parameters:
alias (str)
- Return type:
list[Emitter]
- get_event(alias_or_idx)#
Get an associated event, either by alias (string) or index (int).
- Parameters:
alias_or_idx (str | int)
- Return type:
Event
- get_events()#
Return a list of all events for this scene, as in self.events.values()
- Return type:
list[Event]
- get_microphone(alias)#
Alias for WorldState.get_microphone
- Parameters:
alias (str)
- Return type:
Type[MicArray]
- to_dict()#
Returns metadata for this object as a dictionary
- Return type:
dict