dataset_utils
assign_test_instances(task_ws, ws_misc, misc_values)
For a given task worksheet and the misc spreadsheet:
1. Get task_id and task_name from the worksheet title, which has the form "{id} - {name}".
2. Collect the unique integers in column A and compute the missing IDs in {1..300}.
3. Sample up to 20 of the missing IDs.
4. Write the sampled group into column C of the matching row in the Test Instances tab.
Source code in OmniGibson/omnigibson/learning/utils/dataset_utils.py
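Steps 1 to 3 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the module's implementation: the helper names `parse_task_title` and `sample_missing_ids` are hypothetical, column A is assumed to arrive as a list of cell strings (for example from a Sheets client), and step 4 is left out because the docstring does not say how the sampled group is serialized into the cell.

```python
import random
import re
from typing import Iterable, List, Optional, Tuple

# Matches "{id} - {name}"; the name itself may contain " - ".
TITLE_RE = re.compile(r"^\s*(\d+)\s*-\s*(.+?)\s*$")


def parse_task_title(title: str) -> Tuple[int, str]:
    """Hypothetical helper: split a worksheet title into (task_id, task_name)."""
    match = TITLE_RE.match(title)
    if match is None:
        raise ValueError(f"unexpected worksheet title: {title!r}")
    return int(match.group(1)), match.group(2)


def sample_missing_ids(
    column_a: Iterable[str], total: int = 300, k: int = 20, seed: Optional[int] = None
) -> List[int]:
    """Hypothetical helper: sample up to k IDs from 1..total that are absent from column A."""
    present = set()
    for cell in column_a:
        try:
            present.add(int(cell))
        except (TypeError, ValueError):
            continue  # skip the header and blank or non-integer cells
    missing = [i for i in range(1, total + 1) if i not in present]
    rng = random.Random(seed)
    # random.sample draws without replacement; min() handles fewer than k missing IDs.
    return sorted(rng.sample(missing, min(k, len(missing))))
```

Seeding the generator makes the sample reproducible; whether the real function does so is not stated.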
check_leaf_folders_have_n(data_dir, n=200)
Recursively find all leaf folders under data_dir. A leaf folder is one that contains only files (no subdirectories). For each leaf folder, check that it has exactly n files.
Args:
    data_dir (str): The root directory to start searching from.
    n (int): The exact number of files each leaf folder should have.
Returns:
    Tuple[dict, int]: A tuple containing:
    - A dictionary mapping leaf folder paths to their file counts.
    - The total file count across all leaf folders.
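A minimal sketch of the leaf-folder check with `os.walk`. It assumes that an empty directory counts as a leaf with zero files, that mismatches are only reported (printed here) rather than raised, and that the returned dictionary covers every leaf folder; the function name `check_leaf_folders_sketch` is hypothetical.

```python
import os
from typing import Dict, Tuple


def check_leaf_folders_sketch(data_dir: str, n: int = 200) -> Tuple[Dict[str, int], int]:
    """Hypothetical re-implementation: count files in every leaf folder under data_dir."""
    leaf_counts: Dict[str, int] = {}
    total = 0
    for dirpath, dirnames, filenames in os.walk(data_dir):
        if dirnames:
            continue  # has subdirectories, so this is not a leaf folder
        count = len(filenames)
        leaf_counts[dirpath] = count
        total += count
        if count != n:
            # Assumption: report the mismatch instead of raising.
            print(f"{dirpath}: expected {n} files, found {count}")
    return leaf_counts, total
```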
download_and_extract_data(url, data_dir, task_name, instance_id, traj_id)
[Internal use only] Download and extract data from a Lightwheel API URL.
Args:
    url (str): The download URL.
    data_dir (str): The directory to save the data in.
    task_name (str): The name of the task.
    instance_id (int): The instance ID.
    traj_id (int): The trajectory ID.
extract_annotations(data_dir, annotation_data_dir, credentials_path='~/Documents/credentials', remove_memory_prefix=False)
Extract annotations from annotation_data_dir and store them in data_dir. If remove_memory_prefix is True, remove the "memory_prefix" field from skill annotations.
fix_permissions(root_dir)
Recursively set the permissions rw-rw-r-- (0o664) on all files under root_dir that are owned by the current user.
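A minimal POSIX-only sketch of the permission fix. The function name is hypothetical, and skipping symlinks and special files is an assumption made here so that `os.chmod` never follows a link to a file the user does not own; the docstring only says "all files".

```python
import os
import stat


def fix_permissions_sketch(root_dir: str) -> int:
    """Hypothetical re-implementation: chmod the current user's regular files to 0o664."""
    uid = os.getuid()  # POSIX only
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if st.st_uid != uid or not stat.S_ISREG(st.st_mode):
                continue  # skip other users' files, symlinks and special files
            if stat.S_IMODE(st.st_mode) != 0o664:
                os.chmod(path, 0o664)  # rw-rw-r--
                changed += 1
    return changed
```

Returning the number of changed files makes the function easy to re-run idempotently: a second call should report 0.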
get_all_instance_id_for_task(lw_token, lightwheel_api_credentials, task_name)
[Internal use only] Given a task name, fetch all instance IDs for that task.
Args:
    lw_token (str): Lightwheel API token.
    lightwheel_api_credentials (dict): Lightwheel API credentials.
    task_name (str): Name of the task.
Returns:
    Tuple[int, str]: instance_id and resourceUuid.
get_credentials(credentials_path='~/Documents/credentials')
[Internal use only] Get Google Sheets and Lightwheel API credentials.
Args:
    credentials_path (str): Path to the credentials directory.
Returns:
    Tuple[gspread.Client, dict, str]: The Google Sheets client, the Lightwheel API credentials, and the Lightwheel API token.
get_timestamp_from_lightwheel(urls)
[Internal use only] Given a list of download URLs, fetch from the Lightwheel API the timestamp encoded in each file's name.
Args:
    urls (List[str]): List of download URLs.
Returns:
    List[str]: List of timestamps.
get_urls_from_lightwheel(uuids, lightwheel_api_credentials, lw_token)
[Internal use only] Given a list of UUIDs, fetch their download URLs from the Lightwheel API.
Args:
    uuids (List[str]): List of version UUIDs.
    lightwheel_api_credentials (dict): Lightwheel API credentials.
    lw_token (str): Lightwheel API token.
Returns:
    List[str]: List of download URLs.
makedirs_with_mode(path, mode=1533)
Recursively create directories, applying the specified mode to every newly created directory.
Args:
    path (str): The directory path to create.
    mode (int): The mode to apply to newly created directories. The default, 1533, is 0o2775 in octal (setgid bit plus rwxrwxr-x).
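A minimal sketch of the idea, assuming the mode is applied with an explicit `os.chmod` after each `os.mkdir`: the mode argument of `mkdir` is masked by the process umask, whereas `chmod` sets the bits exactly. The function name and the choice to return the list of created paths are hypothetical; pre-existing directories are not touched.

```python
import os
from typing import List


def makedirs_with_mode_sketch(path: str, mode: int = 0o2775) -> List[str]:
    """Hypothetical re-implementation: create missing dirs and chmod only those."""
    path = os.path.abspath(path)
    # Walk upward to collect every missing component of the path.
    missing: List[str] = []
    current = path
    while not os.path.exists(current):
        missing.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    created: List[str] = []
    for d in reversed(missing):  # create from the outermost missing ancestor inward
        os.mkdir(d)
        os.chmod(d, mode)  # chmod is not affected by the umask, unlike mkdir's mode
        created.append(d)
    return created
```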
remove_failed_episodes(worksheet, data_dir)
For the given worksheet and data_dir:
0. Ignore the first row (header).
1. Extract task_id from worksheet.title, which has the form "{task_id} - {task_name}".
2. For each row with column B == -1:
   - take demo_id = int(column A)
   - construct episode_name = f"episode_{task_id:04d}{demo_id:04d}"
   - remove the corresponding files from all subfolders of data_dir
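The steps above can be sketched with the sheet rows passed in as a list of string lists (as a Sheets client would return them). Both helper names are hypothetical, and matching files by the part of the filename before the first dot (so `episode_00120003.parquet` and `episode_00120003.mp4` both match) is an assumption about how "corresponding files" are identified.

```python
import os
from typing import List, Sequence


def failed_episode_names(title: str, rows: Sequence[Sequence[str]]) -> List[str]:
    """Hypothetical helper: episode names for rows whose column B is -1."""
    task_id = int(title.split(" - ", 1)[0])  # title is "{task_id} - {task_name}"
    names = []
    for row in rows[1:]:  # row 0 is the header
        if len(row) < 2:
            continue
        try:
            status = int(row[1])
        except ValueError:
            continue  # blank or non-integer column B
        if status == -1:
            demo_id = int(row[0])
            names.append(f"episode_{task_id:04d}{demo_id:04d}")
    return names


def remove_episode_files(data_dir: str, episode_names: List[str]) -> List[str]:
    """Hypothetical helper: delete files under data_dir whose stem is a failed episode."""
    targets = set(episode_names)
    removed = []
    for dirpath, _dirnames, filenames in os.walk(data_dir):
        for name in filenames:
            if name.split(".", 1)[0] in targets:
                path = os.path.join(dirpath, name)
                os.remove(path)
                removed.append(path)
    return removed
```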
remove_grasp_state(root_dir)
For every parquet file named episode_XXXXXXXX.parquet under root_dir: if observation.state has 258 dimensions, remove dimensions 193 and 233 (grasp_left and grasp_right), leaving 256, and save the parquet file back to disk.
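The per-row transformation can be sketched on a single state vector. This deliberately leaves out parquet I/O (which would need a library such as pandas or pyarrow); the helper name `drop_grasp_dims` is hypothetical. Note that removing index 193 shifts every later index down by one, which is why the indices are filtered in a single pass rather than deleted one at a time.

```python
from typing import List, Sequence

GRASP_DIMS = (193, 233)  # grasp_left, grasp_right, per the docstring
STATE_DIM_WITH_GRASP = 258


def drop_grasp_dims(state: Sequence[float]) -> List[float]:
    """Hypothetical helper: drop the two grasp dimensions from a 258-dim state."""
    if len(state) != STATE_DIM_WITH_GRASP:
        return list(state)  # only 258-dim states are modified
    # Filter by original index in one pass, so index shifts cannot cause mistakes.
    return [v for i, v in enumerate(state) if i not in GRASP_DIMS]
```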
reorder_sheet(worksheet)
Reorder rows in the worksheet based on column B and column A.
Rules:
0. The first row is the header row → kept as-is.
1. Rows with B == 0 → first group, sorted by A.
2. Rows with B not in {0, -1} → second group, sorted by A.
3. Rows with B == -1 → last group, sorted by A.
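The ordering rules map naturally onto a single sort key of (group, A). A minimal sketch, assuming the worksheet has already been read into a list of string rows and that columns A and B hold integers; the function name `reorder_rows` is hypothetical and writing the rows back to the sheet is omitted.

```python
from typing import List, Sequence


def reorder_rows(rows: Sequence[Sequence[str]]) -> List[List[str]]:
    """Hypothetical helper: header first, then B == 0, other B, and B == -1 groups."""
    header, body = list(rows[0]), [list(r) for r in rows[1:]]

    def group(b: int) -> int:
        if b == 0:
            return 0  # first group
        if b == -1:
            return 2  # last group
        return 1  # everything else

    # Sort by group, then numerically by column A within each group.
    body.sort(key=lambda r: (group(int(r[1])), int(r[0])))
    return [header] + body
```

Comparing column A as integers (rather than strings) keeps "10" after "9".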
update_google_sheet(credentials_path, task_name, row_idx)
[Internal use only] Update the internal data replay tracking sheet.
Args:
    credentials_path (str): Path to the credentials directory.
    task_name (str): Name of the task to update.
    row_idx (int): Row index to update.
update_parquet_indices(root_dir)
For every parquet file named episode_XXXXXXXX.parquet under root_dir, update its episode_index and task_index.
update_sheet_counts(worksheet)
[Internal use only] Updates the worksheet:
1. For rows with B != 0:
   - set E = "ignored"
   - set F = ""
2. Replace column B with the number of occurrences of the row's column-A value in previous rows.
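A minimal sketch of the two steps on in-memory rows. It assumes the rows exclude the header, that columns A to F map to indices 0 to 5, and that column B holds integers; following the listed order, step 1 tests B's value before step 2 overwrites it. The function name `update_counts` is hypothetical and the sheet write-back is omitted.

```python
from collections import Counter
from typing import List


def update_counts(rows: List[List[str]]) -> List[List[str]]:
    """Hypothetical helper: rows are data rows only; columns A..F are indices 0..5."""
    seen = Counter()  # occurrences of each column-A value so far
    out = []
    for row in rows:
        row = list(row) + [""] * (6 - len(row))  # pad short rows to columns A..F
        if int(row[1]) != 0:  # step 1 uses the original column B value
            row[4] = "ignored"  # column E
            row[5] = ""  # column F
        row[1] = str(seen[row[0]])  # step 2: prior occurrences of this column-A value
        seen[row[0]] += 1
        out.append(row)
    return out
```

With this rule, B becomes 0 exactly for the first occurrence of each column-A value.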
update_tracking_sheet(credentials_path='~/Documents/credentials', max_entries_per_task=None)
[Internal use only] Updates the tracking sheet with the latest information from Lightwheel.
Args:
    credentials_path (str): The path to the credentials file.
    max_entries_per_task (Optional[int]): The maximum number of entries to process per task.