Segmentation Models Package
Introduction
I feel that it is valuable to be able to define model architectures from scratch by subclassing nn.Module. However, there are many architectures available for different use cases and problem types, and you may want additional flexibility, such as the ability to change backbones or initialize using parameters obtained via pre-training. Further, some modern architectures are very complicated and difficult to build from scratch. So, it is desirable to have access to libraries that provide a wide variety of architectures and offer flexibility. For semantic segmentation tasks, the Segmentation Models package provides these functionalities. This package is available at https://github.com/qubvel/segmentation_models.pytorch. We will explore the PyTorch implementation of this package specifically; however, there is also a Keras/TensorFlow version available: https://github.com/qubvel/segmentation_models.
Segmentation Models provides access to a wide variety of semantic segmentation architectures including UNet, UNet++, MANet, LinkNet, FPN, PSPNet, PAN, DeepLabv3, and DeepLabv3+. In this module, the demonstration will use DeepLabv3+. This package also provides a large number of CNN architectures that can be used as the encoder or backbone component of the semantic segmentation architecture including ResNet, ResNeXt, DenseNet, Inception, MobileNet, VGG, and Mix Vision Transformer. Pre-trained weights generated from different datasets are available, and the number of pre-trained weight sets varies between the available CNN backbone architectures.
The user also has the ability to customize the architecture, and the available settings vary by model. For the DeepLabv3+ model, the user can specify which encoder to use, the number of stages or blocks in the encoder, if and which pre-trained weights to use, the atrous rates, the number of input channels, and the number of classes being differentiated. There are also auxiliary parameters associated with the classification head auxiliary output component of the model. Interestingly, you can use pre-trained weights even if the number of input channels differs from that of the data used to develop the pre-trained weights.
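For example, many of these settings can be passed as arguments when instantiating the model. The following is a minimal sketch; the argument values shown are illustrative rather than recommendations, and the full set of options is described in the package documentation.

# Sketch of instantiating a customized DeepLabv3+; values are illustrative
import segmentation_models_pytorch as smp

model = smp.DeepLabV3Plus(
    encoder_name="resnet34",            # CNN backbone used as the encoder
    encoder_depth=5,                    # number of stages in the encoder
    encoder_weights="imagenet",         # pre-trained weights, or None for random initialization
    decoder_atrous_rates=(12, 24, 36),  # atrous/dilation rates in the ASPP module
    in_channels=3,                      # number of input channels
    classes=2,                          # number of classes being differentiated
)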
Outside of models, encoders, and pre-trained weights, this package also provides additional loss functions and assessment metrics applicable to semantic segmentation, along with utilities that simplify the training loop.
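For instance, the snippet below sketches the package's loss and metric utilities using random tensors; treat it as a hedged illustration and consult the documentation for the full set of options.

# Sketch of package-provided loss and metric utilities using fake data
import torch
import segmentation_models_pytorch as smp

criterion = smp.losses.DiceLoss(mode="multiclass")   # Dice loss computed from raw logits
preds = torch.randn(2, 2, 64, 64)                    # fake logits: (B, C, H, W)
targets = torch.randint(0, 2, (2, 64, 64))           # fake mask: (B, H, W)
loss = criterion(preds, targets)

# Metrics are derived from true/false positive/negative counts
tp, fp, fn, tn = smp.metrics.get_stats(preds.argmax(dim=1), targets,
                                       mode="multiclass", num_classes=2)
iou = smp.metrics.iou_score(tp, fp, fn, tn, reduction="micro")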
Preparation
In this example, we will explore the binary classification problem of extracting surface mine extents from topographic maps. We will use the training, validation, and testing data developed in the prior module.
I import the standard data science packages: numpy, pandas, and matplotlib. I also import os for working with file paths, math, and OpenCV (cv2) for reading in the image data. For the deep learning-specific tasks, I import torch, albumentations, torchmetrics, and Segmentation Models. Rasterio will be used to read the geospatial raster data for inference at the end of the training process. I also set the device to my available GPU. As in the prior CNN-based modules, the training process presented here is too computationally intensive for CPU-based computation.
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import os
import math
import cv2
import torch
import torch.nn as nn
from torch.utils.data.dataset import Dataset
from torch.utils.data import DataLoader
import albumentations as A
import segmentation_models_pytorch as smp
import torchmetrics as tm
import rasterio as rio
from rasterio.plot import show
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
cuda:0
I next read in the CSV files that were created in the prior module and contain information for each chip in the training, testing, and validation datasets as pandas dataframes. Printing the lengths of these dataframes provides the number of available chips or samples in each partition. Checking the column names, the path to the image chip is housed in the 3rd column while the path to the mask is housed in the 4th column. The “division” column notes whether the chip was a background-only sample or if it contained some pixels mapped to the positive class.
= "C:/myFiles/work/topo_dl_data/topo_dl_data/processing/"
folder = pd.read_csv(folder + "trainDF.csv")
train = pd.read_csv(folder + "valDF.csv")
val = pd.read_csv(folder + "testDF.csv") test
len(train)
49842
len(val)
11826
len(test)
10692
train.columns
Index(['Unnamed: 0', 'chpN', 'chpPath', 'mskPth', 'division'], dtype='object')
One issue with the training set is that it contains a large number of background-only chips. To deal with this class imbalance, I will only use a subsample of the available background-only chips. This is accomplished by separating the background-only and presence samples into separate dataframes, sampling 2,000 background chips from the larger set without replacement, and merging the presence samples and the subset of background-only chips into a new dataframe. This subset will be used to train the model.
train.groupby(['division'])['division'].count()
division
Background 42071
Positive 7771
Name: division, dtype: int64
= train.query('division == "Positive"')
trainP = train.query("division == 'Background'")
trainB = trainB.sample(n=2000, replace=False)
trainB = pd.concat([trainP, trainB]) train2
train2.groupby(['division'])['division'].count()
division
Background 2000
Positive 7771
Name: division, dtype: int64
I next define a DataSet subclass to read in and convert the image chips and associated masks to tensors. Here are a few notes about the DataSet subclass.
- The path to the image chip is read from the third column (index 2).
- The path to the mask is read from the fourth column (index 3).
- Since cv2 reads images using a channels-last configuration, I must use permute() to convert the tensors to channels-first.
- The model will expect the image mini-batches to have a shape of (Mini-Batch, Channels, Height, Width), a 32-bit float data type, and be scaled from 0 to 1. To rescale the 8-bit data, the values are divided by 255.
- The model will expect the mask mini-batches to have a shape of (Mini-Batch, Height, Width), a long integer data type, and have unique codes for each class. In this case, the original data assigns 255 to the surface disturbance class and 0 to the background. Dividing by 255 rescales the data from 0 to 1 where 1 indicates the positive case and 0 indicates the background.
# Subclass and define custom dataset ===========================
class SegData(Dataset):
    def __init__(self, df, transform=None):
        self.df = df
        self.transform = transform

    def __getitem__(self, idx):
        image_name = self.df.iloc[idx, 2]
        mask_name = self.df.iloc[idx, 3]
        image = cv2.imread(image_name)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        mask = cv2.imread(mask_name, cv2.IMREAD_UNCHANGED)
        image = image.astype('uint8')
        mask = mask.astype('uint8')
        mask = np.expand_dims(mask, axis=2)

        if(self.transform is not None):
            transformed = self.transform(image=image, mask=mask)
            image = transformed["image"]
            mask = transformed["mask"]
            image = torch.from_numpy(image)
            mask = torch.from_numpy(mask)
            image = image.permute(2, 0, 1)
            image = image.float()/255
            mask = mask.permute(2, 0, 1)
            mask = mask.float()/255
            mask = mask.squeeze().long()
        else:
            image = torch.from_numpy(image)
            mask = torch.from_numpy(mask)
            image = image.permute(2, 0, 1)
            image = image.float()/255
            mask = mask.permute(2, 0, 1)
            mask = mask.float()/255
            mask = mask.squeeze().long()
        return image, mask

    def __len__(self):
        return len(self.df)
The albumentations package is used to define data augmentations to potentially reduce overfitting. Specifically, random changes to brightness and contrast are applied along with random horizontal and vertical flips and median blurring.
train_transform = A.Compose(
    [
        A.RandomBrightnessContrast(brightness_limit=0.3, contrast_limit=0.3, p=0.5),
        A.HorizontalFlip(p=0.5),
        A.VerticalFlip(p=0.5),
        A.MedianBlur(blur_limit=3, always_apply=False, p=0.1),
    ]
)
I next instantiate the training and validation DataSet instances, with the data augmentations applied to only the training data. The DataLoaders are defined using a mini-batch size of 16, which worked given my hardware. You may need to change this based on your resources.
trainDS = SegData(train2, transform=train_transform)
valDS = SegData(val, transform=None)
print("Number of Training Samples: " + str(len(trainDS)) + " Number of Validation Samples: " + str(len(valDS)))
Number of Training Samples: 9771 Number of Validation Samples: 11826
trainDL = torch.utils.data.DataLoader(trainDS, batch_size=16, shuffle=True, num_workers=0, drop_last=True)
valDL = torch.utils.data.DataLoader(valDS, batch_size=16, shuffle=False, num_workers=0, drop_last=True)
As checks, I print information about a mini-batch and about an individual image and its associated mask. The generated mini-batches and individual images have the correct shapes and data types.

I then plot an image and associated mask. I must permute the dimensions since matplotlib expects the data to be in a channels-last as opposed to channels-first configuration. It looks like these data are being read and converted to tensors correctly.
batch = next(iter(trainDL))
images, labels = batch
print(images.shape, labels.shape, type(images), type(labels), images.dtype, labels.dtype)
torch.Size([16, 3, 256, 256]) torch.Size([16, 256, 256]) <class 'torch.Tensor'> <class 'torch.Tensor'> torch.float32 torch.int64
testImg = images[1]
testMsk = labels[1]
print(testImg.shape, testImg.dtype, type(testImg), testMsk.shape,
      testMsk.dtype, type(testMsk), testImg.min(),
      testImg.max(), testMsk.min(), testMsk.max())
torch.Size([3, 256, 256]) torch.float32 <class 'torch.Tensor'> torch.Size([256, 256]) torch.int64 <class 'torch.Tensor'> tensor(0.) tensor(1.) tensor(0) tensor(1)
trainDSP = SegData(trainP, transform=None)
trainDLP = torch.utils.data.DataLoader(trainDSP, batch_size=16, shuffle=True, num_workers=0, drop_last=True)
batch = next(iter(trainDLP))
images, labels = batch
testImg = images[1]
testMsk = labels[1]

# Plot example image =====================================
plt.imshow(testImg.permute(1,2,0))
plt.show()

# Plot example mask ======================================
plt.imshow(testMsk.unsqueeze(dim=0).permute(1,2,0))
plt.show()
DeepLabv3+
The DeepLabv3+ semantic segmentation architecture is instantiated using the DeepLabV3Plus() class from Segmentation Models. Since the Segmentation Models package includes this architecture, this class can be used to instantiate an instance of DeepLabv3+ as opposed to building it on your own by subclassing nn.Module. I specify the name of the encoder ("resnet18"), and I do not instantiate using pre-trained weights since this is a very different problem than the ImageNet classification problem. No activation function (i.e., softmax) is used so that raw logits are delivered.
Since this is a binary classification problem, I could define the model to return only a logit for the positive class or logits for both the positive and background classes. In this example, I have chosen to treat the problem the same as a multiclass classification problem, so logits will be returned for both the positive and background classes.
= "resnet18"
encoder = None
encoder_weights = None activation
model = smp.DeepLabV3Plus(
    encoder_name=encoder,
    encoder_weights=encoder_weights,
    classes=2,
    activation=activation,
).to(device)
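For reference, the single-logit alternative would look something like the hypothetical sketch below; this is not the configuration used in this module, and the variable names are illustrative.

# Hypothetical single-logit setup (not used here): one output channel paired
# with a binary loss rather than two channels with cross entropy
modelB = smp.DeepLabV3Plus(
    encoder_name="resnet18",
    encoder_weights=None,
    classes=1,          # single logit for the positive class
    activation=None,    # raw logits; the sigmoid is folded into the loss
).to(device)
criterionB = nn.BCEWithLogitsLoss()
# The loss would then expect float targets shaped like the (B, 1, H, W) logits:
# loss = criterionB(modelB(inputs), targets.unsqueeze(1).float())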
Train Model
The Segmentation Models package provides some tools to simplify the training loop. However, here I will make use of our standard training loop, which has been discussed and implemented multiple times in prior modules. You can consult the package documentation if you are interested in learning more about the TrainEpoch() and ValidEpoch() classes provided by Segmentation Models.
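As a rough illustration only, and assuming a release of the package in which the smp.utils module is included, those helper classes are used along the following lines; consult the documentation for the exact API of your installed version.

# Illustrative sketch of the package's epoch helpers; availability and exact
# arguments depend on the installed version of Segmentation Models
train_epoch = smp.utils.train.TrainEpoch(
    model,
    loss=smp.utils.losses.DiceLoss(),                # loss from the package's utils
    metrics=[smp.utils.metrics.IoU(threshold=0.5)],  # metrics tracked per epoch
    optimizer=torch.optim.AdamW(model.parameters()),
    device=device,
    verbose=True,
)
train_logs = train_epoch.run(trainDL)                # one pass over the training data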
I first initialize an instance of the cross entropy loss as implemented in PyTorch. I will use the AdamW optimizer with the default learning rate. I will monitor the overall accuracy, class-aggregated, macro-averaged F1-score, and Kappa statistic as implemented with torchmetrics. The model will run for a total of 50 epochs, and the saved model will only be overwritten if the class-aggregated F1-score improves for the validation data.
In this example, all parameters are trainable since none have been frozen. If you initialize the model using pre-trained weights, the default behavior is that all parameters will also be updateable. If you want to freeze or unfreeze the backbone or encoder weights, you can use the functions provided below and obtained at the commented-out URL.
If you choose to run the training loop, it will take some time. It took roughly 5 hours to run on my machine with a single GPU. We have provided a trained model if you want to run the later code without training the model.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters())

acc = tm.Accuracy(task="multiclass", average="micro", num_classes=2).to(device)
f1 = tm.F1Score(task="multiclass", average="macro", num_classes=2).to(device)
kappa = tm.CohenKappa(task="multiclass", num_classes=2).to(device)

epochs = 50
saveFolder = "C:/myFiles/work/dl/topoDL_models_2/"
#https://github.com/qubvel/segmentation_models.pytorch/issues/79
def freeze_encoder(model):
    for child in model.encoder.children():
        for param in child.parameters():
            param.requires_grad = False
    return

def unfreeze(model):
    for child in model.children():
        for param in child.parameters():
            param.requires_grad = True
    return
eNum = []
t_loss = []
t_acc = []
t_f1 = []
t_kappa = []
v_loss = []
v_acc = []
v_f1 = []
v_kappa = []

f1VMax = 0.0

model.train()

# Loop over epochs
for epoch in range(1, epochs+1):
    # Initiate running loss for epoch
    running_loss = 0.0
    # Loop over training batches
    for batch_idx, (inputs, targets) in enumerate(trainDL):
        # Get data and move to device
        inputs, targets = inputs.to(device), targets.to(device)

        # Clear gradients
        optimizer.zero_grad()
        # Predict data
        outputs = model(inputs)
        # Calculate loss
        loss = criterion(outputs, targets)

        # Calculate metrics
        accT = acc(outputs, targets)
        f1T = f1(outputs, targets)
        kappaT = kappa(outputs, targets)

        # Backpropagate
        loss.backward()

        # Update parameters
        optimizer.step()

        # Update running loss with batch results
        running_loss += loss.item()

    # Accumulate loss and metrics at end of training epoch
    epoch_loss = running_loss/len(trainDL)
    accT = acc.compute()
    f1T = f1.compute()
    kappaT = kappa.compute()

    # Print losses and metrics at end of each training epoch
    print(f'Epoch: {epoch}, Training Loss: {epoch_loss:.4f}, Training Accuracy: {accT:.4f}, Training F1: {f1T:.4f}, Training Kappa: {kappaT:.4f}')

    # Append results
    eNum.append(epoch)
    t_loss.append(epoch_loss)
    t_acc.append(accT.detach().cpu().numpy())
    t_f1.append(f1T.detach().cpu().numpy())
    t_kappa.append(kappaT.detach().cpu().numpy())

    # Reset metrics
    acc.reset()
    f1.reset()
    kappa.reset()

    # Loop over validation batches
    with torch.no_grad():
        # Initialize running validation loss
        running_loss_v = 0.0
        for batch_idx, (inputs, targets) in enumerate(valDL):
            # Get data and move to device
            inputs, targets = inputs.to(device), targets.to(device)

            # Predict data
            outputs = model(inputs)
            # Calculate validation loss
            loss_v = criterion(outputs, targets)

            # Update running loss with batch results
            running_loss_v += loss_v.item()

            # Calculate metrics
            accV = acc(outputs, targets)
            f1V = f1(outputs, targets)
            kappaV = kappa(outputs, targets)

    # Accumulate loss and metrics at end of validation epoch
    epoch_loss_v = running_loss_v/len(valDL)
    accV = acc.compute()
    f1V = f1.compute()
    kappaV = kappa.compute()

    # Print validation loss and metrics
    print(f'Validation Loss: {epoch_loss_v:.4f}, Validation Accuracy: {accV:.4f}, Validation F1: {f1V:.4f}, Validation Kappa: {kappaV:.4f}')

    # Append results
    v_loss.append(epoch_loss_v)
    v_acc.append(accV.detach().cpu().numpy())
    v_f1.append(f1V.detach().cpu().numpy())
    v_kappa.append(kappaV.detach().cpu().numpy())

    # Reset metrics
    acc.reset()
    f1.reset()
    kappa.reset()

    # Save model if validation F1-score improves
    f1V2 = f1V.detach().cpu().numpy()
    if f1V2 > f1VMax:
        f1VMax = f1V2
        torch.save(model.state_dict(), saveFolder + 'topoDL_dlv3p_model.pt')
        print(f'Model saved for epoch {epoch}.')
SeNum = pd.Series(eNum, name="epoch")
St_loss = pd.Series(t_loss, name="training_loss")
St_acc = pd.Series(t_acc, name="training_accuracy")
St_f1 = pd.Series(t_f1, name="training_f1")
St_kappa = pd.Series(t_kappa, name="training_kappa")
Sv_loss = pd.Series(v_loss, name="val_loss")
Sv_acc = pd.Series(v_acc, name="val_accuracy")
Sv_f1 = pd.Series(v_f1, name="val_f1")
Sv_kappa = pd.Series(v_kappa, name="val_kappa")
resultsDF = pd.concat([SeNum, St_loss, St_acc, St_f1, St_kappa, Sv_loss, Sv_acc, Sv_f1, Sv_kappa], axis=1)

resultsDF.to_csv(saveFolder + "resultsTopoDL.csv")
= pd.read_csv("data/models/resultsTopoDL.csv") resultsDF
Once the model training is completed, I explore the training process by plotting the training and validation losses over the training epochs and the training and validation F1-scores.
plt.rcParams['figure.figsize'] = [10, 10]
firstPlot = resultsDF.plot(x='epoch', y="training_loss")
resultsDF.plot(x='epoch', y="val_loss", ax=firstPlot)
plt.show()
plt.rcParams['figure.figsize'] = [10, 10]
firstPlot = resultsDF.plot(x="epoch", y="training_f1")
resultsDF.plot(x='epoch', y="val_f1", ax=firstPlot)
plt.show()
Assess Model
Once the model is trained, it can be used to perform inference on new data. I will demonstrate assessing the model using the withheld testing data. This involves the following steps:
- Redefine the model and load in the learned parameters from disk
- Instantiate a DataSet for the testing data
- Define a DataLoader for the testing data
- Initialize assessment metrics provided by torchmetrics
- Execute a loop to predict the testing samples and calculate and accumulate the assessment metrics over the mini-batches
model = smp.DeepLabV3Plus(
    encoder_name=encoder,
    encoder_weights=encoder_weights,
    classes=2,
    activation=activation,
).to(device)

model.load_state_dict(torch.load("data/models/topoDL_dlv3p_model.pt"))
<All keys matched successfully>
testDS = SegData(test, transform=None)
testDL = torch.utils.data.DataLoader(testDS, batch_size=16, shuffle=False, num_workers=0, drop_last=True)
acc = tm.Accuracy(task="multiclass", num_classes=2).to(device)
f1 = tm.F1Score(task="multiclass", num_classes=2, average='none').to(device)
recall = tm.Recall(task="multiclass", num_classes=2, average='none').to(device)
precision = tm.Precision(task="multiclass", num_classes=2, average='none').to(device)
kappa = tm.CohenKappa(task="multiclass", num_classes=2).to(device)
cm = tm.ConfusionMatrix(task="multiclass", num_classes=2).to(device)

model.eval()
DeepLabV3Plus(
(encoder): ResNetEncoder(
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(layer1): Sequential(
(0): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(1): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer2): Sequential(
(0): BasicBlock(
(conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer3): Sequential(
(0): BasicBlock(
(conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer4): Sequential(
(0): BasicBlock(
(conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), dilation=(2, 2), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
(decoder): DeepLabV3PlusDecoder(
(aspp): Sequential(
(0): ASPP(
(convs): ModuleList(
(0): Sequential(
(0): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): ASPPSeparableConv(
(0): SeparableConv2d(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(12, 12), dilation=(12, 12), groups=512, bias=False)
(1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(2): ASPPSeparableConv(
(0): SeparableConv2d(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(24, 24), dilation=(24, 24), groups=512, bias=False)
(1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(3): ASPPSeparableConv(
(0): SeparableConv2d(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(36, 36), dilation=(36, 36), groups=512, bias=False)
(1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(4): ASPPPooling(
(0): AdaptiveAvgPool2d(output_size=1)
(1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU()
)
)
(project): Sequential(
(0): Conv2d(1280, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): Dropout(p=0.5, inplace=False)
)
)
(1): SeparableConv2d(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
(1): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU()
)
(up): UpsamplingBilinear2d(scale_factor=4.0, mode=bilinear)
(block1): Sequential(
(0): Conv2d(64, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(block2): Sequential(
(0): SeparableConv2d(
(0): Conv2d(304, 304, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=304, bias=False)
(1): Conv2d(304, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
)
(segmentation_head): SegmentationHead(
(0): Conv2d(256, 2, kernel_size=(1, 1), stride=(1, 1))
(1): UpsamplingBilinear2d(scale_factor=4.0, mode=bilinear)
(2): Activation(
(activation): Identity()
)
)
)
with torch.no_grad():
    for batch_idx, (inputs, targets) in enumerate(testDL):
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)
        #loss_v = criterion(outputs, targets)
        accV = acc(outputs, targets)
        f1V = f1(outputs, targets)
        rV = recall(outputs, targets)
        pV = precision(outputs, targets)
        kappaV = kappa(outputs, targets)
        cmV = cm(outputs, targets)

accV = acc.compute()
f1V = f1.compute()
rV = recall.compute()
pV = precision.compute()
kappaV = kappa.compute()
cmV = cm.compute()

acc.reset()
f1.reset()
recall.reset()
precision.reset()
kappa.reset()
cm.reset()
Generally, the assessment suggests strong model performance. This is expected since the pattern representing surface mining is distinct and easily differentiated from the background features.
print(accV)
tensor(0.9972, device='cuda:0')
print(f1V)
tensor([0.9986, 0.9129], device='cuda:0')
print(rV)
tensor([0.9985, 0.9168], device='cuda:0')
print(pV)
tensor([0.9986, 0.9089], device='cuda:0')
print(kappaV)
tensor(0.9114, device='cuda:0')
print(cmV)
tensor([[688103214, 1038740],
[ 940327, 10366487]], device='cuda:0')
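As a quick sanity check, the class-level precision, recall, and F1-score for the mining class can be recomputed directly from the printed confusion matrix, where rows represent the reference classes and columns the predictions.

# Recompute mining-class metrics from the confusion matrix above
tp = 10366487   # mining predicted as mining
fn = 940327     # mining predicted as background
fp = 1038740    # background predicted as mining
precisionM = tp/(tp + fp)                          # ~0.9089
recallM = tp/(tp + fn)                             # ~0.9168
f1M = 2*precisionM*recallM/(precisionM + recallM)  # ~0.9129
print(precisionM, recallM, f1M)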
Spatial Prediction
Once a model is trained, it can be used to infer back to entire image extents as opposed to individual image chips. We explored making spatial predictions using the landcover.ai dataset in a prior module. Here, I will demonstrate the same process for this classification problem. I first define a topographic map, included in the testing dataset, to which I want to predict. Plotting the topographic map, you can see that some surface disturbance extents are present.
= "data/files/KY_Dorton_708542_1954_24000_geo.tif" testImg
= rio.open(testImg)
src rio.plot.show(src)
src.close()
I next redefine the geoInfer() function that was explored in the prior module, which makes predictions on overlapping image chips, crops off the outer rows and columns of pixels, and merges the predictions back into a single raster file that is georeferenced and has an associated coordinate reference system. I then use this function and the trained model to predict the example topographic map.
def geoInfer(image_in, pred_out, chip_size, stride_x, stride_y, crop, n_channels):
    #Read in topo map and convert to tensor===========================
    image1 = cv2.imread(image_in)
    image1 = cv2.cvtColor(image1, cv2.COLOR_BGR2RGB)
    image1 = image1.astype('uint8')
    image1 = torch.from_numpy(image1)
    image1 = image1.permute(2, 0, 1)
    image1 = image1.float()/255
    t_arr = image1

    #Make blank grid to write predictions to with same height and width as topo====
    image2 = cv2.imread(image_in)
    image2 = cv2.cvtColor(image2, cv2.COLOR_BGR2RGB)
    image2 = image2.astype('uint8')
    image2 = torch.from_numpy(image2)
    image2 = image2.permute(2, 0, 1)
    image2 = image2.float()/255
    p_arr = image2[0, :, :]
    p_arr[:,:] = 0

    #Predict to entire topo using overlapping chips, merge back to original extent=
    size = chip_size
    across_cnt = t_arr.shape[2]
    down_cnt = t_arr.shape[1]
    tile_size_across = size
    tile_size_down = size
    overlap_across = stride_x
    overlap_down = stride_y
    across = math.ceil(across_cnt/overlap_across)
    down = math.ceil(down_cnt/overlap_down)
    across_seq = list(range(0, across, 1))
    down_seq = list(range(0, down, 1))
    across_seq2 = [(x*overlap_across) for x in across_seq]
    down_seq2 = [(x*overlap_down) for x in down_seq]

    #Loop through row/column combinations to make predictions for entire image
    for c in across_seq2:
        for r in down_seq2:
            c1 = c
            r1 = r
            c2 = c + size
            r2 = r + size
            #Default
            if c2 <= across_cnt and r2 <= down_cnt:
                r1b = r1
                r2b = r2
                c1b = c1
                c2b = c2
            #Last column
            elif c2 > across_cnt and r2 <= down_cnt:
                r1b = r1
                r2b = r2
                c1b = across_cnt - size
                c2b = across_cnt + 1
            #Last row
            elif c2 <= across_cnt and r2 > down_cnt:
                r1b = down_cnt - size
                r2b = down_cnt + 1
                c1b = c1
                c2b = c2
            #Last row, last column
            else:
                c1b = across_cnt - size
                c2b = across_cnt + 1
                r1b = down_cnt - size
                r2b = down_cnt + 1
            ten1 = t_arr[0:n_channels, r1b:r2b, c1b:c2b]
            ten1 = ten1.to(device).unsqueeze(0)
            model.eval()
            with torch.no_grad():
                ten2 = model(ten1)
            m = nn.Softmax(dim=1)
            pr_probs = m(ten2)
            ten_p = torch.argmax(pr_probs, dim=1).squeeze(1)
            ten_p = ten_p.squeeze()
            #print("executed for " + str(r1) + ", " + str(c1))
            if(r1b == 0 and c1b == 0): #Write first row, first column
                p_arr[r1b:r2b-crop, c1b:c2b-crop] = ten_p[0:size-crop, 0:size-crop]
            elif(r1b == 0 and c2b == across_cnt+1): #Write first row, last column
                p_arr[r1b:r2b-crop, c1b+crop:c2b] = ten_p[0:size-crop, 0+crop:size]
            elif(r2b == down_cnt+1 and c1b == 0): #Write last row, first column
                p_arr[r1b+crop:r2b, c1b:c2b-crop] = ten_p[crop:size+1, 0:size-crop]
            elif(r2b == down_cnt+1 and c2b == across_cnt+1): #Write last row, last column
                p_arr[r1b+crop:r2b, c1b+crop:c2b] = ten_p[crop:size, 0+crop:size+1]
            elif((r1b == 0 and c1b != 0) or (r1b == 0 and c2b != across_cnt+1)): #Write first row
                p_arr[r1b:r2b-crop, c1b+crop:c2b-crop] = ten_p[0:size-crop, 0+crop:size-crop]
            elif((r2b == down_cnt+1 and c1b != 0) or (r2b == down_cnt+1 and c2b != across_cnt+1)): #Write last row
                p_arr[r1b+crop:r2b, c1b+crop:c2b-crop] = ten_p[crop:size, 0+crop:size-crop]
            elif((c1b == 0 and r1b != 0) or (c1b == 0 and r2b != down_cnt+1)): #Write first column
                p_arr[r1b+crop:r2b-crop, c1b:c2b-crop] = ten_p[crop:size-crop, 0:size-crop]
            elif((c2b == across_cnt+1 and r1b != 0) or (c2b == across_cnt+1 and r2b != down_cnt+1)): #Write last column
                p_arr[r1b+crop:r2b-crop, c1b+crop:c2b] = ten_p[crop:size-crop, 0+crop:size]
            else: #Write middle chips
                p_arr[r1b+crop:r2b-crop, c1b+crop:c2b-crop] = ten_p[crop:size-crop, crop:size-crop]

    #Read in a GeoTIFF to get CRS info=======================================
    image3 = rio.open(image_in)
    profile1 = image3.profile.copy()
    image3.close()
    profile1["driver"] = "GTiff"
    profile1["dtype"] = "uint8"
    profile1["count"] = 1
    profile1["PHOTOMETRIC"] = "MINISBLACK"
    profile1["COMPRESS"] = "NONE"

    pr_out = p_arr.cpu().numpy().round().astype('uint8')

    #Write out result========================================================
    with rio.open(pred_out, "w", **profile1) as f:
        f.write(pr_out, 1)

    torch.cuda.empty_cache()
geoInfer(image_in=testImg,
         pred_out="data/files/topo_prediction.tif",
         chip_size=256, stride_x=128, stride_y=128, crop=50, n_channels=3)
Lastly, I read the prediction back in and display it using matplotlib and earthpy. The results look pretty good. If you want to explore the output in more detail and compare it to the input topographic map, I would suggest using GIS software, such as ArcGIS Pro or QGIS.
#https://www.earthdatascience.org/courses/use-data-open-source-python/intro-raster-data-python/raster-data-processing/classify-plot-raster-data-in-python/
import earthpy.plot as ep
from matplotlib.colors import ListedColormap, BoundaryNorm
= rio.open("data/files/topo_prediction.tif", )
clsOut = clsOut.read().squeeze()
clsOutA
src.close()print(np.max(clsOutA.shape))
6827
classes = list(np.unique(clsOutA).astype('int'))
classColors = ['gray', 'red']
classNames = ['background', 'mining']
clsBounds = BoundaryNorm([-.5, .5, 1.5], 2)
colorMap = ListedColormap(classColors)

f, ax = plt.subplots(figsize=(10,5))
im = ax.imshow(clsOutA, cmap=colorMap, norm=clsBounds)
ax.set(title="Mining")
ep.draw_legend(im, titles=classNames, classes=classes)
ax.set_axis_off()
plt.show()
Using Variable Learning Rates
Before we end this module, I want to describe how to use different base learning rates for different components of the architecture. When using a pre-trained backbone as an encoder but not freezing the backbone during training, it has generally been suggested that using a lower learning rate for the encoder components relative to the decoder components and classification head can be beneficial. The idea here is that, since the encoder was initialized using non-random parameters, it will require less dramatic parameter updates in comparison to the decoder, which was initialized using random parameters. In my own experimentation, when using the same learning rate for all components of a model, I have generally struggled with unstable training processes when fine-tuning a pre-trained encoder, as evidenced by loss and/or assessment metric results that vary greatly between epochs for the validation data. Using lower learning rates in the pre-trained but unfrozen encoder components has generally helped stabilize my training processes. In short, I highly recommend using variable learning rates for different components of the architecture if you are using pre-trained parameters.
Different learning rates can be implemented as follows:
- Instantiate a model.
- Build a list of dictionaries where each dictionary lists a set of parameters followed by the associated learning rate. If you are not sure how the modules are named, this can generally be determined by exploring the properties and classes associated with the instantiated model. In the example, I am using a learning rate of 1e-6 in the 1st and 2nd encoder blocks, a learning rate of 1e-5 in the 3rd and 4th encoder blocks, and a learning rate of 1e-4 in the decoder and classification head.
- Once the list of dictionaries is defined, pass it to the optimizer during instantiation.
When using different learning rates, it is still possible to use optimizers that implement adaptive learning rates, such as RMSProp and Adam, momentum, weight decay, and learning rate schedulers. The stated learning rates just serve as a base learning rate but can still be augmented.
model = smp.Unet(
    encoder_name="resnet34",
    encoder_weights="imagenet",
    classes=5,
    activation=None).to(device)

model_parameters = [
    {'params': model.encoder.layer1.parameters(), 'lr': 1e-6},
    {'params': model.encoder.layer2.parameters(), 'lr': 1e-6},
    {'params': model.encoder.layer3.parameters(), 'lr': 1e-5},
    {'params': model.encoder.layer4.parameters(), 'lr': 1e-5},
    {'params': model.decoder.parameters(), 'lr': 1e-4},
    {'params': model.segmentation_head.parameters(), 'lr': 1e-4},
]

optimizer = torch.optim.AdamW(model_parameters)
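As noted above, the group-specific rates serve as base rates that schedulers can still adjust. As a minimal sketch, and assuming a standard PyTorch scheduler, something like StepLR decays every group's learning rate by the same factor, preserving the relative differences between the encoder and decoder rates.

# Sketch: decay each parameter group's base learning rate by the same factor
# every 10 epochs, maintaining the encoder/decoder ratio defined above
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
# Within a training loop, call scheduler.step() once per epoch after the optimizer updates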
Concluding Remarks
We highly encourage you to explore the Segmentation Models library. It offers an easy and intuitive means to implement a variety of semantic segmentation architectures and pre-trained CNN encoder backbones. In a later module and also using Segmentation Models, we will explore the use of a transformer-based encoder, SegFormer, for a UNet architecture.