# Bulk Operations

Functions for operating on multiple datasets efficiently.
## Overview

Bulk operations provide significant performance improvements when working with many datasets:
| Datasets | Sequential | Bulk (Async) | Speedup |
|---|---|---|---|
| 10 datasets | ~2s | ~0.2s | 10x |
| 100 datasets | ~20s | ~0.5s | 40x |
| 1000 datasets | ~200s | ~2s | 100x |
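The speedups above come from overlapping network round-trips, not from faster individual requests. A minimal, library-independent sketch of the effect (the 50 ms latency and the `fetch_one` helper are illustrative stand-ins, not part of `huwise_utils_py`):

```python
import asyncio
import time

async def fetch_one(dataset_id: str) -> str:
    # Stand-in for one HTTP round-trip with ~50 ms of latency.
    await asyncio.sleep(0.05)
    return f"metadata-for-{dataset_id}"

async def fetch_sequential(ids: list[str]) -> list[str]:
    # One request at a time: total time ~= len(ids) * latency.
    return [await fetch_one(i) for i in ids]

async def fetch_concurrent(ids: list[str]) -> list[str]:
    # All requests in flight at once: total time ~= a single latency.
    return list(await asyncio.gather(*(fetch_one(i) for i in ids)))

if __name__ == "__main__":
    ids = [str(100000 + n) for n in range(10)]
    start = time.perf_counter()
    asyncio.run(fetch_sequential(ids))
    print(f"sequential: {time.perf_counter() - start:.2f}s")  # roughly 0.5 s
    start = time.perf_counter()
    asyncio.run(fetch_concurrent(ids))
    print(f"concurrent: {time.perf_counter() - start:.2f}s")  # roughly 0.05 s
```

With real HTTP the gains flatten out once server rate limits or connection pools saturate, which is why the table's speedup grows sublinearly in the dataset count.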
## Available Functions

### Synchronous

```python
from huwise_utils_py import bulk_get_metadata, bulk_update_metadata, bulk_get_dataset_ids

# Get metadata for multiple datasets
metadata = bulk_get_metadata(dataset_ids=["100123", "100456", "100789"])

# Update multiple datasets
updates = [
    {"dataset_id": "100123", "title": "New Title 1"},
    {"dataset_id": "100456", "title": "New Title 2"},
]
results = bulk_update_metadata(updates)

# Get all dataset IDs
ids = bulk_get_dataset_ids()
```
### Asynchronous

```python
import asyncio

from huwise_utils_py import (
    bulk_get_metadata_async,
    bulk_update_metadata_async,
    bulk_get_dataset_ids_async,
)

async def main():
    # Fetch metadata concurrently
    metadata = await bulk_get_metadata_async(dataset_ids=["100123", "100456", "100789"])

    # Update concurrently
    updates = [
        {"dataset_id": "100123", "title": "New Title 1"},
        {"dataset_id": "100456", "title": "New Title 2"},
    ]
    results = await bulk_update_metadata_async(updates)

    # Get all IDs
    ids = await bulk_get_dataset_ids_async()

asyncio.run(main())
```
## Usage Examples

### Bulk Metadata Fetch

```python
from huwise_utils_py import bulk_get_metadata

dataset_ids = ["100123", "100456", "100789"]
metadata = bulk_get_metadata(dataset_ids=dataset_ids)

for dataset_id, meta in metadata.items():
    title = meta.get("default", {}).get("title", {}).get("value", "No title")
    print(f"{dataset_id}: {title}")
```
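The chained `.get(...)` calls guard against missing keys at every level of the metadata dict. If that pattern recurs, a small helper keeps call sites readable; `dig` is a hypothetical utility sketched here, not part of `huwise_utils_py`:

```python
from typing import Any

def dig(mapping: dict[str, Any], *keys: str, default: Any = None) -> Any:
    """Walk nested dicts key by key, returning `default` if any key is missing."""
    current: Any = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

# Equivalent to the chained .get() calls above:
# title = dig(meta, "default", "title", "value", default="No title")
```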
### Bulk Update with Error Handling

```python
from huwise_utils_py import bulk_update_metadata

updates = [
    {"dataset_id": "100123", "title": "Title 1", "description": "Desc 1"},
    {"dataset_id": "100456", "title": "Title 2", "description": "Desc 2"},
    {"dataset_id": "100789", "title": "Will fail"},
]

results = bulk_update_metadata(updates, publish=True)

for dataset_id, result in results.items():
    if result["status"] == "success":
        print(f"{dataset_id}: Updated {result['fields_updated']}")
    else:
        print(f"{dataset_id}: Failed - {result['error']}")
```
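Because each entry in `results` carries its own status, a partial failure does not abort the batch. A sketch of summarizing such a result dict (the `status`/`error` keys mirror the shape shown above; adapt the key names if your version of the library differs):

```python
from typing import Any

def summarize(results: dict[str, dict[str, Any]]) -> tuple[list[str], dict[str, str]]:
    """Split a bulk-update result dict into succeeded IDs and per-ID error messages."""
    succeeded = [ds for ds, r in results.items() if r.get("status") == "success"]
    failed = {
        ds: r.get("error", "unknown error")
        for ds, r in results.items()
        if r.get("status") != "success"
    }
    return succeeded, failed

# succeeded, failed = summarize(results)
# print(f"{len(succeeded)} updated, {len(failed)} failed")
```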
### Get All Dataset IDs with Filtering

```python
from huwise_utils_py import bulk_get_dataset_ids

# Get all public datasets (exclude restricted)
public_ids = bulk_get_dataset_ids(include_restricted=False)

# Get the first 100 datasets
limited_ids = bulk_get_dataset_ids(max_datasets=100)
```
### Async with Custom Configuration

```python
import asyncio

from huwise_utils_py import HuwiseConfig, bulk_get_metadata_async

async def fetch_from_multiple_domains():
    # Separate configs for two Huwise domains
    config_a = HuwiseConfig(api_key="key-a", domain="domain-a.com")
    config_b = HuwiseConfig(api_key="key-b", domain="domain-b.com")

    # Fetch concurrently from both domains
    metadata_a, metadata_b = await asyncio.gather(
        bulk_get_metadata_async(dataset_ids=["100123", "100456"], config=config_a),
        bulk_get_metadata_async(dataset_ids=["100789"], config=config_b),
    )
    return {**metadata_a, **metadata_b}

asyncio.run(fetch_from_multiple_domains())
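One caveat with a plain `asyncio.gather`: if one domain is unreachable, the whole call raises and the other domain's results are discarded. A defensive variant, sketched generically (the coroutines passed in can be any per-domain fetch, such as `bulk_get_metadata_async` calls), collects successes and failures separately via `return_exceptions=True`:

```python
import asyncio
from typing import Any

async def gather_domains(
    tasks: dict[str, Any],
) -> tuple[dict[str, Any], dict[str, BaseException]]:
    """Run named coroutines concurrently; keep results even if some fail."""
    names = list(tasks)
    # return_exceptions=True turns a raised exception into a return value
    # instead of cancelling the sibling tasks.
    outcomes = await asyncio.gather(*tasks.values(), return_exceptions=True)
    results = {n: o for n, o in zip(names, outcomes) if not isinstance(o, BaseException)}
    errors = {n: o for n, o in zip(names, outcomes) if isinstance(o, BaseException)}
    return results, errors
```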
## API Reference

Bulk operations for Huwise datasets.

This module provides both synchronous and asynchronous functions for performing bulk operations on multiple datasets efficiently.
### `bulk_get_metadata`

```python
bulk_get_metadata(
    dataset_ids: list[str] | None = None,
    dataset_uids: list[str] | None = None,
    config: HuwiseConfig | None = None,
) -> dict[str, dict[str, Any]]
```

Fetch metadata for multiple datasets synchronously.

Uses sequential HTTP requests. For better performance with many datasets, use `bulk_get_metadata_async` instead.

Either `dataset_ids` or `dataset_uids` must be provided, but not both.

| Parameter | Type | Description |
|---|---|---|
| `dataset_ids` | `list[str] \| None` | List of numeric dataset IDs to fetch metadata for. |
| `dataset_uids` | `list[str] \| None` | List of dataset UIDs to fetch metadata for. |
| `config` | `HuwiseConfig \| None` | Optional `HuwiseConfig` instance. |

| Returns | Description |
|---|---|
| `dict[str, dict[str, Any]]` | Dictionary mapping dataset ID to its metadata. |

| Raises | Description |
|---|---|
| `ValueError` | If both or neither identifier lists are provided. |

Source code in `src/huwise_utils_py/bulk.py`
### `bulk_get_metadata_async` *(async)*

```python
bulk_get_metadata_async(
    dataset_ids: list[str] | None = None,
    dataset_uids: list[str] | None = None,
    config: HuwiseConfig | None = None,
) -> dict[str, dict[str, Any]]
```

Fetch metadata for multiple datasets concurrently.

Uses async HTTP requests to fetch metadata in parallel, providing significant performance improvements over sequential requests.

Either `dataset_ids` or `dataset_uids` must be provided, but not both.

| Parameter | Type | Description |
|---|---|---|
| `dataset_ids` | `list[str] \| None` | List of numeric dataset IDs to fetch metadata for. |
| `dataset_uids` | `list[str] \| None` | List of dataset UIDs to fetch metadata for. |
| `config` | `HuwiseConfig \| None` | Optional `HuwiseConfig` instance. |

| Returns | Description |
|---|---|
| `dict[str, dict[str, Any]]` | Dictionary mapping dataset ID to its metadata. |

| Raises | Description |
|---|---|
| `ValueError` | If both or neither identifier lists are provided. |

Source code in `src/huwise_utils_py/bulk.py`
### `bulk_update_metadata`

```python
bulk_update_metadata(
    updates: list[dict[str, Any]],
    config: HuwiseConfig | None = None,
    *,
    publish: bool = True,
) -> dict[str, dict[str, Any]]
```

Update metadata for multiple datasets synchronously.

Each update dict must contain either `dataset_uid` or `dataset_id` along with the metadata fields to update.

| Parameter | Type | Description |
|---|---|---|
| `updates` | `list[dict[str, Any]]` | List of update dictionaries. |
| `config` | `HuwiseConfig \| None` | Optional `HuwiseConfig` instance. |
| `publish` | `bool` | Whether to publish datasets after updating. |

| Returns | Description |
|---|---|
| `dict[str, dict[str, Any]]` | Dictionary mapping dataset UID to update result. |

Source code in `src/huwise_utils_py/bulk.py`
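The either-or identifier rule can be checked up front, before any requests are issued. A sketch of that validation (the exact error message the library raises is not specified here, so this helper is illustrative):

```python
from typing import Any

def validate_update(update: dict[str, Any]) -> str:
    """Return the identifier key present, or raise if zero or both are given."""
    has_id = "dataset_id" in update
    has_uid = "dataset_uid" in update
    if has_id == has_uid:  # both present, or both missing
        raise ValueError(
            "update must contain exactly one of 'dataset_id' or 'dataset_uid'"
        )
    return "dataset_id" if has_id else "dataset_uid"
```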
### `bulk_update_metadata_async` *(async)*

```python
bulk_update_metadata_async(
    updates: list[dict[str, Any]],
    config: HuwiseConfig | None = None,
    *,
    publish: bool = True,
) -> dict[str, dict[str, Any]]
```

Update metadata for multiple datasets concurrently.

Each update dict must contain either `dataset_uid` or `dataset_id` along with the metadata fields to update.

| Parameter | Type | Description |
|---|---|---|
| `updates` | `list[dict[str, Any]]` | List of update dictionaries, each containing `dataset_uid` or `dataset_id` plus the metadata fields to update (e.g. `title`, `description`). |
| `config` | `HuwiseConfig \| None` | Optional `HuwiseConfig` instance. |
| `publish` | `bool` | Whether to publish datasets after updating. |

| Returns | Description |
|---|---|
| `dict[str, dict[str, Any]]` | Dictionary mapping dataset UID to update result. |

| Raises | Description |
|---|---|
| `ValueError` | If an update dict contains both or neither identifier. |

Source code in `src/huwise_utils_py/bulk.py`
### `bulk_get_dataset_ids`

```python
bulk_get_dataset_ids(
    config: HuwiseConfig | None = None,
    *,
    include_restricted: bool = True,
    max_datasets: int | None = None,
) -> list[str]
```

Retrieve all dataset IDs synchronously.

Uses sequential HTTP requests with pagination.

| Parameter | Type | Description |
|---|---|---|
| `config` | `HuwiseConfig \| None` | Optional `HuwiseConfig` instance. |
| `include_restricted` | `bool` | Include restricted datasets. |
| `max_datasets` | `int \| None` | Maximum number of datasets to return. |

| Returns | Description |
|---|---|
| `list[str]` | Sorted list of dataset IDs. |

Source code in `src/huwise_utils_py/bulk.py`
### `bulk_get_dataset_ids_async` *(async)*

```python
bulk_get_dataset_ids_async(
    config: HuwiseConfig | None = None,
    *,
    include_restricted: bool = True,
    max_datasets: int | None = None,
) -> list[str]
```

Retrieve all dataset IDs asynchronously.

| Parameter | Type | Description |
|---|---|---|
| `config` | `HuwiseConfig \| None` | Optional `HuwiseConfig` instance. |
| `include_restricted` | `bool` | Include restricted datasets. |
| `max_datasets` | `int \| None` | Maximum number of datasets to return. |

| Returns | Description |
|---|---|
| `list[str]` | Sorted list of dataset IDs. |