Adding New Datasets
To add a new dataset to BackdoorMBTI, follow the steps outlined below. The example provided uses CIFAR-100, but the same process can be applied to other datasets, including custom ones.
Dataset Loading Example
In Backdoor, we use the function load_dataset (in utils/data) as our unified dataset loading interface. Here’s how to load a dataset (e.g., CIFAR-100):
def load_dataset(args, train=True):
match args.dataset:
case "cifar100":
from torchvision.datasets import CIFAR100
ds_dir = DATA_DIR / "CIFAR100"
dataset = CIFAR100(
root=ds_dir,
train=train,
download=True,
transform=transforms.ToTensor(),
)
Import the Dataset Module
If you are using a customed dataset, put it to the datasets folder, or the dataset is available in torchvision.datasets, import it. For example:
from torchvision.datasets import CIFAR100
Set the Dataset Path
Specify the storage path for the dataset. By default, BackdoorMBTI uses the ./data directory:
ds_dir = DATA_DIR / "CIFAR100"
Replace “CIFAR100” with your new dataset name.
Create the Dataset Instance
Use the imported dataset class to create the dataset instance, specifying whether it’s for training or testing, and ensure the data is downloaded if necessary:
dataset = CIFAR100(
root=ds_dir,
train=train,
download=True,
transform=transforms.ToTensor(),
)
Make sure the dataset outputs data in tensor format, as this is required for attack processing in BackdoorMBTI.
Call the Dataset
You can now call the dataset instance in your code to load the training or test data.
from utils.data import load_dataset
# or
from backdoormbti.utils.data import load_dataset
dataset = load_dataset(args, train=True)
Following this structure, you can easily add new datasets to the project.