obs_utils
VideoLoader
Source code in OmniGibson/omnigibson/learning/utils/obs_utils.py
frames
property
Return all frames at once.
__init__(*args, path, batch_size=None, stride=1, output_size=None, start_idx=0, end_idx=None, start_idx_is_keyframe=False, fps=30, downsample_factor=1, **kwargs)
Sequentially load RGB video with robust frame extraction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Path to the video file | required |
| `batch_size` | `int` | Number of frames to load into memory per batch. If None, load the entire video into memory. | `None` |
| `stride` | `int` | Stride between consecutive batches, e.g. with batch_size=3 and stride=1, iteration yields [0, 1, 2], [1, 2, 3], [2, 3, 4], ... | `1` |
| `output_size` | `Tuple[int, int]` | Output size to resize the video frames to. | `None` |
| `start_idx` | `int` | Frame to start loading the video from. | `0` |
| `end_idx` | `Optional[int]` | Frame to stop loading the video at. If None, load until the end of the video. NOTE: end_idx is not inclusive, i.e. if end_idx=10, the last frame loaded is 9. | `None` |
| `start_idx_is_keyframe` | `bool` | Whether the start index is a keyframe. Set this to True if you know the start index is a keyframe, which allows faster seeking to it. | `False` |
| `fps` | `int` | Frames per second of the video. | `30` |
| `downsample_factor` | `int` | Factor to downsample the video frames by (1 means no downsampling). | `1` |
Returns: th.Tensor: (T, H, W, 3) RGB video tensor
Source code in OmniGibson/omnigibson/learning/utils/obs_utils.py
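A minimal usage sketch, assuming only the signature above and the batched iteration behavior described for `stride`; the video path is a placeholder:

```python
# Hypothetical usage of VideoLoader, based on the parameter docs above;
# "demo_rgb.mp4" is a placeholder path.
from omnigibson.learning.utils.obs_utils import VideoLoader

loader = VideoLoader(
    path="demo_rgb.mp4",      # placeholder video file
    batch_size=8,             # load 8 frames per batch
    stride=8,                 # non-overlapping batches
    output_size=(224, 224),   # resize each frame to 224x224
)
for batch in loader:
    # each batch is expected to be a (T, H, W, 3) RGB tensor, per Returns above
    print(batch.shape)
```

Alternatively, the `frames` property above returns all frames at once.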
create_video_writer(fpath, resolution, codec_name='libx264', rate=30, pix_fmt='yuv420p', stream_options=None, context_options=None)
Creates a PyAV-based video writer for writing video frames when playing back the dataset
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `fpath` | `str` | Absolute path that the generated video writer will write to. Should end in .mp4 or .mkv | required |
| `resolution` | `tuple` | Resolution of the video frames to write (height, width) | required |
| `codec_name` | `str` | Codec to use for the video writer | `'libx264'` |
| `rate` | `int` | Frame rate of the video writer | `30` |
| `pix_fmt` | `str` | Pixel format to use for the video writer | `'yuv420p'` |
| `stream_options` | `dict` | Additional stream options to pass to the video writer | `None` |
| `context_options` | `dict` | Additional context options to pass to the video writer | `None` |
Returns:
- `av.Container`: PyAV container object that can be used to write video frames
- `av.Stream`: PyAV stream object that can be used to write video frames
Source code in OmniGibson/omnigibson/learning/utils/obs_utils.py
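A minimal open/close sketch, assuming only the signature above plus standard PyAV teardown; the output path is a placeholder:

```python
# Hedged sketch: open a writer, then flush and close it with standard
# PyAV calls. Frames are typically written via write_video (see below).
from omnigibson.learning.utils.obs_utils import create_video_writer

container, stream = create_video_writer(
    fpath="/tmp/output.mp4",     # placeholder output path
    resolution=(720, 1280),      # (height, width)
    rate=30,
)
# ... write frames here, e.g. via write_video(obs, (container, stream)) ...
for packet in stream.encode():   # flush any buffered packets
    container.mux(packet)
container.close()
```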
depth_to_pcd(depth, rel_pose, K)
Convert depth images to point clouds with batch processing support.

Args:
- obs `depth` (th.Tensor): (B, H, W) depth tensor
- `rel_pose` (th.Tensor): (B, 7) relative pose from camera to base frame [pos, quat]
- `K` (th.Tensor): (3, 3) camera intrinsics tensor

Returns:
- `pc` (th.Tensor): (B, H, W, 3) point cloud tensor in base frame
Source code in OmniGibson/omnigibson/learning/utils/obs_utils.py
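A hedged sketch with dummy tensors; it assumes `th` is PyTorch (as used throughout `obs_utils`) and an (x, y, z, w) quaternion ordering, which the docstring does not state:

```python
import torch as th
from omnigibson.learning.utils.obs_utils import depth_to_pcd

depth = th.rand(2, 480, 640)                     # (B, H, W) depth in meters
rel_pose = th.tensor([[0.0, 0.0, 1.0,            # camera position in base frame
                       0.0, 0.0, 0.0, 1.0]] * 2) # identity quaternion (assumed xyzw)
K = th.tensor([[600.0,   0.0, 320.0],
               [  0.0, 600.0, 240.0],
               [  0.0,   0.0,   1.0]])           # (3, 3) pinhole intrinsics
pc = depth_to_pcd(depth, rel_pose, K)            # (B, H, W, 3) points in base frame
```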
dequantize_depth(quantized_depth, min_depth=MIN_DEPTH, max_depth=MAX_DEPTH, shift=DEPTH_SHIFT)
Dequantizes a 14-bit depth tensor back to the original depth values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `quantized_depth` | `ndarray` | Quantized depth tensor. | required |
| `min_depth` | `float` | Minimum depth value. | `MIN_DEPTH` |
| `max_depth` | `float` | Maximum depth value. | `MAX_DEPTH` |
| `shift` | `float` | Small value to shift depth by to avoid log(0). | `DEPTH_SHIFT` |
Returns: np.ndarray: Dequantized depth tensor.
Source code in OmniGibson/omnigibson/learning/utils/obs_utils.py
downsample_pcd(color_pcd, num_points, use_fps=True)
Downsample point clouds with batch FPS processing or random sampling.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `color_pcd` | `Tensor` | (B, [T], N, 6) point cloud tensor [rgb, xyz] for each batch | required |
| `num_points` | `int` | Target number of points | required |
| `use_fps` | `bool` | Whether to use farthest point sampling; if False, random sampling is used | `True` |
Returns:
- `color_pcd`: (B, num_points, 6) downsampled point cloud
- `sampled_idx`: (B, num_points) sampled indices
Source code in OmniGibson/omnigibson/learning/utils/obs_utils.py
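A short sketch on random data, assuming the shapes given in the table above:

```python
import torch as th
from omnigibson.learning.utils.obs_utils import downsample_pcd

color_pcd = th.rand(2, 8192, 6)   # (B, N, 6) [rgb, xyz] point cloud
pcd, idx = downsample_pcd(color_pcd, num_points=4096, use_fps=True)
print(pcd.shape, idx.shape)       # expected: (2, 4096, 6) and (2, 4096)
```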
find_non_overlapping_text_position(x1, y1, x2, y2, text_size, occupied_regions, img_height, img_width)
Find a text position that doesn't overlap with existing text.
Source code in OmniGibson/omnigibson/learning/utils/obs_utils.py
generate_yuv_palette(num_ids)
Generate num_ids equidistant YUV colors in the valid YUV space.
Source code in OmniGibson/omnigibson/learning/utils/obs_utils.py
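A one-line sketch; a palette like this is useful, for instance, to assign visually distinct colors to segmentation ids:

```python
from omnigibson.learning.utils.obs_utils import generate_yuv_palette

palette = generate_yuv_palette(16)  # 16 equidistant colors in YUV space
```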
instance_id_to_instance(obs, instance_id_mapping, unique_ins_ids)
Instance_id segmentation maps each unique visual mesh of an object (e.g. /World/scene_name/object_name/visual_mesh_0) to its own id. This function merges all visual meshes of the same object instance into a single instance id.

Args:
- `obs` (th.Tensor): (N, H, W) instance_id segmentation
- `instance_id_mapping` (Dict[int, str]): Dict mapping instance_id ids to instance names
- `unique_ins_ids` (List[int]): List of unique instance_id ids

Returns:
- `instance_seg` (th.Tensor): (N, H, W) instance segmentation
- `instance_mapping` (Dict[int, str]): Dict mapping instance ids to instance names
Source code in OmniGibson/omnigibson/learning/utils/obs_utils.py
instance_to_bbox(obs, instance_mapping, unique_ins_ids)
Convert instance segmentation to bounding boxes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `obs` | `Tensor` | (N, H, W) tensor of instance IDs | required |
| `instance_mapping` | `Dict[int, str]` | Dict mapping instance IDs to instance names. Note: this does not need to include all instance IDs, only the ones to generate bboxes for. | required |
| `unique_ins_ids` | `List[int]` | List of unique instance IDs | required |
Returns: List of N lists, each containing tuples (x_min, y_min, x_max, y_max, instance_id) for each instance
Source code in OmniGibson/omnigibson/learning/utils/obs_utils.py
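A toy sketch on a synthetic segmentation map; the ids and names are made up, and the exact inclusive/exclusive pixel convention of the returned boxes is not stated above:

```python
import torch as th
from omnigibson.learning.utils.obs_utils import instance_to_bbox

obs = th.zeros(1, 64, 64, dtype=th.int64)
obs[0, 10:20, 10:30] = 5              # one fake instance (rows 10-19, cols 10-29)
instance_mapping = {5: "apple_1"}     # only the ids we want bboxes for
bboxes = instance_to_bbox(obs, instance_mapping, unique_ins_ids=[0, 5])
# bboxes[0] should contain roughly (10, 10, 29, 19, 5), up to the
# min/max pixel convention used internally
```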
instance_to_semantic(obs, instance_mapping, unique_ins_ids, is_instance_id=True)
Convert instance / instance_id segmentation to semantic segmentation.

Args:
- `obs` (th.Tensor): (N, H, W) instance / instance_id segmentation
- `instance_mapping` (Dict[int, str]): Dict mapping instance IDs to instance names
- `unique_ins_ids` (List[int]): List of unique instance IDs
- `is_instance_id` (bool): Whether the input is instance_id segmentation

Returns:
- `semantic_seg` (th.Tensor): (N, H, W) semantic segmentation
Source code in OmniGibson/omnigibson/learning/utils/obs_utils.py
overlay_bboxes_with_names(img, bbox_2d_data, instance_mapping, task_relevant_objects)
Overlays bounding boxes with object names on the given image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `img` | `ndarray` | The input image (RGB) to overlay on. | required |
| `bbox_2d_data` | `List[Tuple[int, int, int, int, int]]` | Bounding box data with format (x1, y1, x2, y2, instance_id) | required |
| `instance_mapping` | `Dict[int, str]` | Mapping from instance ID to object name | required |
| `task_relevant_objects` | `List[str]` | List of task-relevant objects | required |
Returns: np.ndarray: The image with bounding boxes and object names overlaid.
Source code in OmniGibson/omnigibson/learning/utils/obs_utils.py
process_fused_point_cloud(obs, camera_intrinsics, pcd_range, pcd_num_points=None, use_fps=True, verbose=False)
Given a dictionary of observations, process the fused point cloud from all cameras and return the final point cloud tensor in the robot base frame.

Args:
- `obs` (dict): Dictionary of observations containing point cloud data from different cameras.
- `camera_intrinsics` (Dict[str, th.Tensor]): Dictionary of camera intrinsics for each camera.
- `pcd_range` (Tuple[float, float, float, float, float, float]): Range of the point cloud to filter [x_min, x_max, y_min, y_max, z_min, z_max].
- `pcd_num_points` (Optional[int]): Number of points to sample from the point cloud. If None, no downsampling is performed.
- `use_fps` (bool): Whether to use farthest point sampling for point cloud downsampling. Default is True.
- `verbose` (bool): Whether to print verbose output during processing. Default is False.
Source code in OmniGibson/omnigibson/learning/utils/obs_utils.py
quantize_depth(depth, min_depth=MIN_DEPTH, max_depth=MAX_DEPTH, shift=DEPTH_SHIFT)
Quantizes depth values to a 14-bit range (0 to 16383) based on the specified min and max depth.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `depth` | `ndarray` | Depth tensor. | required |
| `min_depth` | `float` | Minimum depth value. | `MIN_DEPTH` |
| `max_depth` | `float` | Maximum depth value. | `MAX_DEPTH` |
| `shift` | `float` | Small value to shift depth by to avoid log(0). | `DEPTH_SHIFT` |
Returns: np.ndarray: Quantized depth tensor.
Source code in OmniGibson/omnigibson/learning/utils/obs_utils.py
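A round-trip sketch pairing `quantize_depth` with `dequantize_depth` (documented earlier); the array size and value range are illustrative, and depths outside [MIN_DEPTH, MAX_DEPTH] would be clipped or distorted:

```python
import numpy as np
from omnigibson.learning.utils.obs_utils import quantize_depth, dequantize_depth

depth = np.random.uniform(0.1, 3.0, size=(480, 640)).astype(np.float32)
q = quantize_depth(depth)          # 14-bit codes in [0, 16383]
d = dequantize_depth(q)            # approximate reconstruction
print(np.abs(d - depth).max())     # small quantization error expected
```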
rgbd_vid_to_pcd(data_folder, task_id, demo_id, episode_id, robot_camera_names=ROBOT_CAMERA_NAMES['R1Pro'], downsample_ratio=4, pcd_range=(-0.2, 1.0, -1.0, 1.0, -0.2, 1.5), pcd_num_points=4096, batch_size=500, use_fps=False)
Generate point cloud data from compressed RGBD data (mp4) in the specified task folder.

Args:
- `data_folder` (str): Path to the data folder containing RGBD data.
- `task_id` (int): Task ID for the task being processed.
- `demo_id` (int): Demo ID for the episode being processed.
- `episode_id` (int): Episode ID for the episode being processed.
- `robot_camera_names` (dict): Dict of camera names to process.
- `downsample_ratio` (int): Downsample ratio for the camera resolution.
- `pcd_range` (tuple): Range of the point cloud.
- `pcd_num_points` (Optional[int]): Number of points to sample from the point cloud. If None, no downsampling is performed.
- `batch_size` (int): Number of frames to process in each batch.
- `use_fps` (bool): Whether to use farthest point sampling for point cloud downsampling.
Source code in OmniGibson/omnigibson/learning/utils/obs_utils.py
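An invocation sketch; the dataset root and ids below are placeholders that must match a locally available dataset laid out as this function expects:

```python
from omnigibson.learning.utils.obs_utils import rgbd_vid_to_pcd

rgbd_vid_to_pcd(
    data_folder="/data/behavior",  # placeholder dataset root
    task_id=0,
    demo_id=0,
    episode_id=0,
    pcd_num_points=4096,
    batch_size=500,
)
```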
write_video(obs, video_writer, mode='rgb', batch_size=None, **kwargs)
Writes video frames to the specified video writer using the current trajectory history
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `obs` | `ndarray` | Observation data | required |
| `video_writer` | `(container, stream)` | PyAV container and stream objects to write video frames to | required |
| `mode` | `str` | Type of observation data to encode. Only "rgb", "depth" and "seg" are supported. | `'rgb'` |
| `batch_size` | `int` | Number of frames to encode at a time. If None, encode all frames at once. | `None` |
| `kwargs` | `dict` | Additional keyword arguments to pass to the video writer. | `{}` |
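An end-to-end sketch tying `create_video_writer` and `write_video` together; the frame data is random RGB and the output path is a placeholder:

```python
import numpy as np
from omnigibson.learning.utils.obs_utils import create_video_writer, write_video

container, stream = create_video_writer("/tmp/demo.mp4", resolution=(720, 1280))
obs = np.random.randint(0, 256, size=(90, 720, 1280, 3), dtype=np.uint8)  # 3 s at 30 fps
write_video(obs, (container, stream), mode="rgb")
for packet in stream.encode():  # standard PyAV flush
    container.mux(packet)
container.close()
```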