Getting Started

This section provides a quick guide to help you get started with BackdoorMBTI, including downloading data, running backdoor attacks, and executing defense experiments.

Backdoor Attack

Here is an example that quickly starts an attack experiment to reproduce the BadNets backdoor attack. It poisons CIFAR-10 with a trigger patch and saves the poisoned training and test sets.

import argparse
from pathlib import Path

from torchvision import transforms
from torchvision.datasets import CIFAR10


try:
   # if you installed from PyPI
   from backdoormbti.attacks.image import BadNet
except ImportError:
   # if you cloned the repository
   from attacks.image import BadNet

# prepare dataset
transform = transforms.Compose([transforms.ToTensor()])
trainset = CIFAR10(
   root="./data/cifar10", download=True, train=True, transform=transform
)
testset = CIFAR10(
   root="./data/cifar10", download=True, train=False, transform=transform
)

# load args
parser = argparse.ArgumentParser()
args = parser.parse_args()
args.data_type = "image"
args.dataset = "cifar10"
args.attack_name = "badnet"
args.pratio = 0.1
args.attack_target = 0
args.random_seed = 0
args.input_size = (32, 32, 3)
args.patch_mask_path = "resources/badnet/trigger_image.png"

# create attack instance
poison_trainset = BadNet(trainset, args=args, mode="train", pop=False)
poison_testset = BadNet(testset, args=args, mode="test", pop=False)

# make and save poison data
poison_trainset.make_and_save_dataset(save_dir=Path("./"))
poison_testset.make_and_save_dataset(save_dir=Path("./"))

After the code above runs, the BadNets attack has been applied, and the poisoned training and test sets are saved in the current directory as image_badnet_poison_train_set.pt and image_badnet_poison_test_set.pt. The following images show a benign image and a poison image generated by the BadNets attack. The poison image (right) has the trigger pattern added to its bottom-right corner.

Figure: benign image (left) and BadNets poison image (right).
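Conceptually, BadNets stamps a fixed trigger onto a fraction pratio of the samples and relabels them to attack_target. The following is a minimal NumPy sketch of that idea, not BackdoorMBTI's implementation. It assumes images are stored as an (N, C, H, W) array with values in [0, 1], which is the per-image layout ToTensor produces. A plain white square stands in for the trigger image read from args.patch_mask_path. The helper badnets_poison is hypothetical. It returns the same four fields the poison datasets carry: inputs, labels, is_poison, pre_labels.

```python
import numpy as np


def badnets_poison(images, labels, pratio=0.1, target=0, patch_size=3, seed=0):
    """Hypothetical BadNets-style poisoning sketch (not BackdoorMBTI's code).

    images: float array of shape (N, C, H, W) with values in [0, 1]
    labels: int array of shape (N,)
    Returns (poisoned_images, poisoned_labels, is_poison, pre_labels).
    """
    rng = np.random.default_rng(seed)
    n = len(images)
    num_poison = int(n * pratio)
    # choose which samples receive the trigger, without repetition
    poison_idx = rng.choice(n, size=num_poison, replace=False)

    poisoned_images = images.copy()
    poisoned_labels = labels.copy()
    # stamp a white square (stand-in trigger) in the bottom-right corner of every channel
    poisoned_images[poison_idx, :, -patch_size:, -patch_size:] = 1.0
    # relabel the triggered samples to the attack target
    poisoned_labels[poison_idx] = target

    is_poison = np.zeros(n, dtype=bool)
    is_poison[poison_idx] = True
    # pre_labels keeps the original labels, as in the poison dataset tuples
    return poisoned_images, poisoned_labels, is_poison, labels.copy()
```

The function copies its inputs, so the clean arrays stay untouched. This mirrors how the example keeps trainset and poison_trainset as separate objects.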

Backdoor Training via a Custom Training Pipeline

Once the poison dataset is generated, you can use it to train a backdoor model in your own code. To customize the training pipeline, start from the following code snippet:

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import models
from tqdm import tqdm

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
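# weights_only=False lets torch.load unpickle arbitrary objects such as a saved dataset; only load files you trust
poisonset = torch.load("image_badnet_poison_train_set.pt", weights_only=False)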
backdoor_trainloader = torch.utils.data.DataLoader(
   poisonset, batch_size=32, shuffle=True
)

# define your model
model = models.resnet18(weights=None)  # randomly initialized; the pretrained= argument is deprecated
model.fc = nn.Linear(model.fc.in_features, 10)
model.to(device)

# define your criterion and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# train the model
num_epochs = 10
for epoch in range(num_epochs):
   model.train()
   running_loss = 0.0
   # the data format in the poison dataset: (inputs, labels, is_poison, pre_labels)
   for inputs, labels, is_poison, pre_labels in tqdm(
      backdoor_trainloader, desc="training"
   ):
      inputs, labels = inputs.to(device), labels.to(device)

      optimizer.zero_grad()

      outputs = model(inputs)
      loss = criterion(outputs, labels)
      loss.backward()
      optimizer.step()

      running_loss += loss.item()

   print(
      f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(backdoor_trainloader):.4f}"
   )

torch.save(model.state_dict(), "backdoor_model.pth")
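You may also want to confirm that the saved training set matches the requested poison ratio. The following minimal sketch counts the is_poison flags. It assumes each item is an (inputs, labels, is_poison, pre_labels) tuple, as in the training loop above, and that is_poison is truthy for poisoned samples. The helper poison_stats is hypothetical.

```python
def poison_stats(dataset):
    """Hypothetical helper: summarize poisoning in a dataset of 4-tuples
    (inputs, labels, is_poison, pre_labels)."""
    total = 0
    poisoned = 0
    # poisoned samples whose label actually changed; samples already in the
    # target class keep their label even after the trigger is added
    flipped = 0
    for _inputs, label, is_poison, pre_label in dataset:
        total += 1
        if is_poison:
            poisoned += 1
            if label != pre_label:
                flipped += 1
    ratio = poisoned / total if total else 0.0
    return {"total": total, "poisoned": poisoned, "flipped": flipped, "ratio": ratio}
```

For example, calling poison_stats(poisonset) on the 50,000-image CIFAR-10 training set with pratio 0.1 should report a ratio close to 0.1 (roughly 5,000 poisoned samples), assuming the attack poisons exactly a pratio fraction of the data.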

Backdoor Attack Evaluation

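After training the backdoor model, you can evaluate its attack success rate (ASR) and robust accuracy (RA) on the poison test set. ASR is the fraction of poisoned inputs classified as their poisoned (target) label. RA is the fraction still classified as their original label (pre_labels).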

import torch
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# load the backdoor model
state_dict = torch.load("backdoor_model.pth", map_location=device)
backdoor_model = models.resnet18(weights=None)
backdoor_model.fc = torch.nn.Linear(backdoor_model.fc.in_features, 10)
backdoor_model.load_state_dict(state_dict)
backdoor_model.to(device)

# load poison test set
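# weights_only=False lets torch.load unpickle arbitrary objects such as a saved dataset; only load files you trust
poison_testset = torch.load("image_badnet_poison_test_set.pt", weights_only=False)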
testloader = torch.utils.data.DataLoader(poison_testset, batch_size=32, shuffle=False)

backdoor_model.eval()
robustness = 0
success = 0
total = 0
with torch.no_grad():
   for inputs, labels, is_poison, pre_labels in testloader:
      inputs, labels, pre_labels = (
            inputs.to(device),
            labels.to(device),
            pre_labels.to(device),
      )
      outputs = backdoor_model(inputs)
      _, predicted = torch.max(outputs.data, 1)
      total += labels.size(0)
      robustness += (predicted == pre_labels).sum().item()
      success += (predicted == labels).sum().item()

print(
   f"Robust Accuracy of the model on the test images: {100 * robustness / total:.2f}%"
)
print(
   f"Attack Success Rate of the model on the test images: {100 * success / total:.2f}%"
)
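You can also compute the metrics separately from the model loop. The following minimal plain-Python sketch computes ACC, ASR, and RA from lists of predicted and true labels. The snippet above does not compute ACC, which is taken here to be accuracy on the clean test set. The helper backdoor_metrics is hypothetical. Its optional target argument drops poisoned samples whose original label already equals the attack target. This is a common convention in the backdoor literature but an assumption here: the evaluation code above counts every sample in the poison test set.

```python
def backdoor_metrics(clean_preds, clean_labels, poison_preds, poison_labels,
                     poison_pre_labels, target=None):
    """Hypothetical helper returning ACC, ASR and RA in percent.

    ACC: accuracy on the clean test set.
    ASR: share of poisoned inputs predicted as their poisoned (target) label.
    RA:  share of poisoned inputs still predicted as their original label.
    If target is given, poisoned samples whose original label already equals
    the target are dropped before computing ASR and RA.
    """
    def pct(hits, n):
        return 100.0 * hits / n if n else 0.0

    if target is not None:
        keep = [i for i, y0 in enumerate(poison_pre_labels) if y0 != target]
        poison_preds = [poison_preds[i] for i in keep]
        poison_labels = [poison_labels[i] for i in keep]
        poison_pre_labels = [poison_pre_labels[i] for i in keep]

    acc = pct(sum(p == y for p, y in zip(clean_preds, clean_labels)), len(clean_labels))
    asr = pct(sum(p == y for p, y in zip(poison_preds, poison_labels)), len(poison_labels))
    ra = pct(sum(p == y for p, y in zip(poison_preds, poison_pre_labels)), len(poison_pre_labels))
    return {"ACC": acc, "ASR": asr, "RA": ra}
```

To use it, collect predicted.tolist(), labels.tolist(), and pre_labels.tolist() inside the evaluation loop. Running the same loop over the clean CIFAR-10 test set gives the lists needed for ACC.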

Backdoor Defense

After evaluating the backdoor attack, you can start a defense experiment. Here is an example that quickly runs the STRIP defense against the backdoor model trained above.

import argparse

import torch
from torchvision import models
from torchvision.datasets import CIFAR10
from torchvision import transforms

# the imports below use the source-tree layout (as when you clone the repository)
from defenses.image import STRIP
from models.wrapper import ImageModelWrapper as ModelWrapper
from utils.data import CleanDatasetWrapper as DatasetWrapper
from utils.eval import eval_def_acc
from utils.io import save_results

# init args
parser = argparse.ArgumentParser()
args = parser.parse_args()
args.fast_dev = False
args.random_seed = 0
args.batch_size = 32
args.num_workers = 4
args.num_devices = 1
args.num_classes = 10
args.collate_fn = None
# defense args
args.repeat = 5
args.pertub_ratio = 0.8
args.frr = 0.05
args.use_oppsite_set = False
# training args
args.client_optimizer = "sgd"
args.lr = 0.01
args.lr_scheduler = "CosineAnnealingLR"
args.weight_decay = 0.0005
args.freqency_save = 10


# set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# prepare dataset
transform = transforms.Compose([transforms.ToTensor()])
trainset = CIFAR10(
   root="./data/cifar10", download=True, train=True, transform=transform
)
args.train_set = DatasetWrapper(trainset)

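# weights_only=False lets torch.load unpickle arbitrary objects such as a saved dataset; only load files you trust
poison_trainset = torch.load("image_badnet_poison_train_set.pt", weights_only=False)
testset = CIFAR10(
   root="./data/cifar10", download=True, train=False, transform=transform
)
poison_testset = torch.load("image_badnet_poison_test_set.pt", weights_only=False)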

# load backdoor model
state_dict = torch.load("backdoor_model.pth", map_location=device)
backdoor_model = models.resnet18(weights=None)
backdoor_model.fc = torch.nn.Linear(backdoor_model.fc.in_features, 10)
backdoor_model.load_state_dict(state_dict)
backdoor_model.to(device)
bkd_lit_model = ModelWrapper(backdoor_model, args)

# switch the backdoor model to evaluation mode
backdoor_model.eval()

# initialize the defense
defense = STRIP(args)
defense.setup(
   clean_train_set=DatasetWrapper(trainset),
   clean_test_set=DatasetWrapper(testset),
   poison_train_set=DatasetWrapper(poison_trainset),
   poison_test_set=DatasetWrapper(poison_testset),
   model=bkd_lit_model,
   collate_fn=None,
)

is_clean_lst = defense.get_sanitized_lst(poison_trainset)
results = eval_def_acc(is_clean_lst, poison_trainset)
save_results("results.json", results)

STRIP is a sample-detection defense: it flags training samples that it judges to be poisoned. After the code above runs, the detection results computed by eval_def_acc are saved to results.json, and the sanitized dataset can be used to retrain the model. After retraining, the ACC, ASR, and RA metrics can be collected for further evaluation.
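STRIP's core idea is that a trigger keeps forcing the target class even when the input is blended with unrelated clean samples. Poisoned inputs therefore produce low-entropy predictions under such perturbations. The following is a minimal NumPy sketch of that test, not BackdoorMBTI's implementation. Mapping repeat, alpha, and frr onto args.repeat, args.pertub_ratio, and args.frr is an assumption. predict_proba is a hypothetical callable that maps a batch to an (N, num_classes) array of class probabilities.

```python
import numpy as np


def entropy(probs, eps=1e-12):
    # Shannon entropy of each row of a (batch, num_classes) probability matrix
    return -np.sum(probs * np.log(probs + eps), axis=1)


def strip_entropy(predict_proba, x, clean_pool, repeat=5, alpha=0.8, seed=0):
    """Average prediction entropy of x blended with `repeat` random clean samples.

    alpha is the weight of the clean sample in each blend (assumed meaning).
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(clean_pool), size=repeat, replace=False)
    blended = (1 - alpha) * x[None] + alpha * clean_pool[idx]
    return float(entropy(predict_proba(blended)).mean())


def strip_threshold(clean_entropies, frr=0.05):
    # cut-off chosen so that about `frr` of clean inputs fall below it (false rejections)
    return float(np.quantile(clean_entropies, frr))


def strip_is_clean(sample_entropies, threshold):
    # low entropy: the prediction ignores the perturbation, so the sample is likely triggered
    return [e >= threshold for e in sample_entropies]
```

The false rejection rate (frr) sets the trade-off. A larger value raises the threshold, so more poisoned samples are caught but more clean samples are discarded from the sanitized set.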

Backdoor Attack Training via Command Line

We use ResNet-18 as the default model and a poison ratio of 0.1. For users installing from PyPI, you can run the following commands directly in the terminal:

atk_train --data_type image --dataset cifar10 --attack_name badnet --model_name resnet18 --pratio 0.1 --num_workers 4 --epochs 100
atk_train --data_type audio --dataset speechcommands --attack_name blend --model_name audiocnn --pratio 0.1 --num_workers 4 --epochs 100 --add_noise true
atk_train --data_type text --dataset sst2 --attack_name addsent --model_name bert --pratio 0.1 --num_workers 4 --epochs 100 --mislabel true

For users installing from source code, use the following command structure:

cd backdoormbti
cd training_pipeline
python atk_train.py --data_type image --dataset cifar10 --attack_name badnet --model_name resnet18 --pratio 0.1 --num_workers 4 --epochs 100
python atk_train.py --data_type audio --dataset speechcommands --attack_name blend --model_name audiocnn --pratio 0.1 --num_workers 4 --epochs 100 --add_noise true
python atk_train.py --data_type text --dataset sst2 --attack_name addsent --model_name bert --pratio 0.1 --num_workers 4 --epochs 100 --mislabel true

To introduce noise or mislabeled samples, add the --add_noise true or --mislabel true argument, as in the audio and text examples above. After the experiment, the attack phase collects ACC (accuracy), ASR (attack success rate), and RA (robust accuracy).

For the full list of command options, run the command that matches your installation (PyPI or source, respectively):

atk_train -h
python atk_train.py -h

Backdoor Defense Training via Command Line

Defense experiments depend on the backdoor model generated in the attack phase, so complete the corresponding attack experiment before running a defense.
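Because of this dependency, a script that sweeps several settings has to run each attack before its matching defense. The following minimal sketch builds the command lines in that order. It uses only flags that appear in the example commands in this section and assumes def_train locates the backdoor model from the matching --attack_name and --pratio values, as those examples suggest. sweep_commands and the pratio values are illustrative, and the script prints the commands rather than executing them.

```python
import shlex

# flags copied from the image examples in this section
BASE = ["--data_type", "image", "--dataset", "cifar10", "--attack_name", "badnet",
        "--num_workers", "4"]


def sweep_commands(pratios, defense_name="finetune", atk_epochs=100, def_epochs=10):
    """Return (attack, defense) command pairs; run each attack before its defense."""
    commands = []
    for pratio in pratios:
        common = BASE + ["--pratio", str(pratio)]
        atk = ["atk_train", *common, "--model_name", "resnet18", "--epochs", str(atk_epochs)]
        dfn = ["def_train", *common, "--defense_name", defense_name, "--epochs", str(def_epochs)]
        commands.append((atk, dfn))
    return commands


for atk, dfn in sweep_commands([0.05, 0.1]):
    # replace print with subprocess.run(cmd, check=True) to actually execute them
    print(shlex.join(atk))
    print(shlex.join(dfn))
```

Keeping each pair together, rather than running all attacks first, means a failed attack can stop the sweep before its defense starts.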

For users installing from PyPI, use the following commands:

def_train --data_type image --dataset cifar10 --attack_name badnet --pratio 0.1 --defense_name finetune --num_workers 4 --epochs 10
def_train --data_type audio --dataset speechcommands --attack_name blend --model_name audiocnn --pratio 0.1 --defense_name fineprune --num_workers 4 --epochs 1 --add_noise true
def_train --data_type text --dataset sst2 --attack_name addsent --model_name bert --pratio 0.1 --defense_name strip --num_workers 4 --epochs 1 --mislabel true

For users installing from source code, use the following command structure:

cd backdoormbti
cd training_pipeline
python def_train.py --data_type image --dataset cifar10 --attack_name badnet --pratio 0.1 --defense_name finetune --num_workers 4 --epochs 10
python def_train.py --data_type audio --dataset speechcommands --attack_name blend --model_name audiocnn --pratio 0.1 --defense_name fineprune --num_workers 4 --epochs 1 --add_noise true
python def_train.py --data_type text --dataset sst2 --attack_name addsent --model_name bert --pratio 0.1 --defense_name strip --num_workers 4 --epochs 1 --mislabel true

For more details on defense commands, run the command that matches your installation (PyPI or source, respectively):

def_train -h
python def_train.py -h

In the defense phase, if the defense is a detection method (such as STRIP), its detection accuracy is recorded and the sanitized dataset is used to retrain the model. ACC, ASR, and RA are then collected for further evaluation.
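You can also reproduce the sanitize-and-retrain step by hand from the Python STRIP example: keep only the samples the defense marked clean, then train on them. The following minimal sketch uses hypothetical helpers. It assumes is_clean_lst holds one boolean per training sample (True meaning the sample is kept), as its use with get_sanitized_lst suggests. The detection metrics are a generic confusion-matrix summary, not necessarily what eval_def_acc reports.

```python
def sanitize_indices(is_clean_lst):
    # indices of the samples the defense kept as clean
    return [i for i, keep in enumerate(is_clean_lst) if keep]


def detection_metrics(is_clean_lst, is_poison_lst):
    """Compare a defense's clean/poison verdicts with ground-truth poison flags."""
    pairs = list(zip(is_clean_lst, is_poison_lst))
    tp = sum(1 for c, p in pairs if not c and p)      # poisoned sample removed
    fn = sum(1 for c, p in pairs if c and p)          # poisoned sample missed
    fp = sum(1 for c, p in pairs if not c and not p)  # clean sample removed
    tn = sum(1 for c, p in pairs if c and not p)      # clean sample kept
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "tpr": tp / (tp + fn) if tp + fn else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }
```

For instance, torch.utils.data.Subset(poison_trainset, sanitize_indices(is_clean_lst)) yields a dataset you can pass to the custom training loop. You can read the ground-truth flags as [bool(item[2]) for item in poison_trainset].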