The project uses Semantic Change Detection (SCD) to monitor and analyze changes in urban environments through multi-temporal satellite imagery. By applying modern machine learning models, it aims to provide a more efficient and accurate system for urban monitoring, reducing reliance on traditional, labor-intensive methods. The system identifies and classifies changes in terrain, buildings, water bodies, and other urban features at the pixel level.
Key Contributions
Benchmarking a New Dataset: The project uses the new SECOND dataset, which comprises 4,662 pairs of multi-temporal satellite images from cities such as Hangzhou, Chengdu, and Shanghai. Because this dataset has not been widely used in prior studies, one key contribution of this project is to evaluate and compare model performance on it.
Leveraging State-of-the-Art Architectures: The project applies seven state-of-the-art (SOTA) models for semantic change detection, including BiSRNet, HRNet, Twin Siamese Network, and others. These models are compared to establish a new benchmark for urban monitoring using satellite imagery.
Improvement in Evaluation Metrics: The project implements advanced evaluation metrics tailored to SCD tasks, such as mean Intersection over Union (mIoU), Separated Kappa (SeK), and overall accuracy (OA), to assess each model's ability to identify and categorize urban changes.
Generation of Semantic Change Maps: Unlike traditional methods that generate binary change maps, this project provides semantic change maps, which not only highlight the areas of change but also categorize the type of change (e.g., new buildings, water bodies, vegetation).
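The difference between a binary change map and a semantic change map can be sketched in a few lines of NumPy. The class codes and the "from → to" encoding below are purely illustrative assumptions, not the project's actual label scheme:

```python
import numpy as np

# Toy per-date land-cover maps (class codes are illustrative):
# 0 = background, 1 = building, 2 = water, 3 = vegetation
before = np.array([[0, 0, 3],
                   [0, 1, 3],
                   [2, 2, 3]])
after  = np.array([[1, 1, 3],
                   [1, 1, 3],
                   [2, 3, 3]])

# A binary change map only says WHERE pixels changed.
binary_change = (before != after).astype(np.uint8)

# A semantic change map also encodes the "from -> to" transition,
# here packed into a single code: from_class * n_classes + to_class.
n_classes = 4
semantic_change = np.where(binary_change == 1,
                           before * n_classes + after,
                           0)  # 0 = no change

print(binary_change)
print(semantic_change)
```

Here the pixel that changed from water (2) to vegetation (3) receives transition code 2 * 4 + 3 = 11, so the map preserves both where and what kind of change occurred.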
AI Implementation in MINDSETS
MINDSETS implements various machine learning models optimized for multi-temporal satellite image analysis:
DualStreamFCN: This model processes two input images in parallel using ResNet-34 as a backbone. The outputs of both streams are combined and passed to a classifier to detect and classify changes. The dual-stream architecture is ideal for capturing differences between temporal images.
HRNet: A high-resolution network that preserves pixel-level details throughout the architecture, making it suitable for urban monitoring tasks requiring accurate localization. Its ability to handle high-resolution images improves the segmentation and classification of urban structures.
BiSRNet: The project’s best-performing model, BiSRNet (Bi-Temporal Semantic Reasoning Network), uses co-attention mechanisms to improve the reasoning between two temporal images. The model detects subtle changes in urban landscapes and achieves the highest performance across key metrics.
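The dual-stream idea behind DualStreamFCN can be sketched without a deep-learning framework. The toy "backbone" below (one linear layer plus ReLU) is a stand-in assumption for the ResNet-34 encoder; the point is the structure: both images pass through the same shared weights, and the resulting features are concatenated before classification:

```python
import numpy as np

rng = np.random.default_rng(0)

def backbone(img, w):
    """Toy stand-in for the shared encoder: one linear layer + ReLU."""
    return np.maximum(img @ w, 0.0)

# Two 8x8 single-channel "images", flattened into feature vectors.
img_t1 = rng.random((1, 64))
img_t2 = rng.random((1, 64))

w_backbone = rng.standard_normal((64, 16)) * 0.1   # shared by both streams
w_cls = rng.standard_normal((32, 5)) * 0.1         # 5 change classes (toy)

# Each stream runs the SAME backbone (weight sharing), then the
# two feature vectors are concatenated and fed to a classifier head.
f1 = backbone(img_t1, w_backbone)
f2 = backbone(img_t2, w_backbone)
fused = np.concatenate([f1, f2], axis=1)           # shape (1, 32)
logits = fused @ w_cls                             # shape (1, 5)
pred = int(np.argmax(logits, axis=1)[0])
print(pred)
```

In the real model this happens per pixel across convolutional feature maps rather than per flattened vector, but the fuse-then-classify pattern is the same.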
Methodology
1. Data Preparation
The dataset was divided into training, validation, and testing sets. Each image was resized to 512x512 pixels, and the split ensures consistent representation of the various classes across sets. Each split contains the older and newer satellite images, along with their respective labels for semantic segmentation and change detection.
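A minimal split of the 4,662 image pairs might look like the following. The directory names and the 70/15/15 ratio are assumptions for illustration; the report does not state the exact layout or ratios:

```python
import random

# Hypothetical list of (older_image, newer_image, label) path triples.
pairs = [(f"im1/{i:04d}.png", f"im2/{i:04d}.png", f"label/{i:04d}.png")
         for i in range(4662)]

# Shuffle once with a fixed seed so the split is reproducible.
random.seed(42)
random.shuffle(pairs)

# Illustrative 70/15/15 train/val/test split.
n = len(pairs)
n_train, n_val = int(0.70 * n), int(0.15 * n)
train = pairs[:n_train]
val = pairs[n_train:n_train + n_val]
test = pairs[n_train + n_val:]

print(len(train), len(val), len(test))
```

In practice a stratified split (balancing change classes across the three sets, as the text describes) would replace the plain random shuffle.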
2. Data Augmentation
Data augmentation techniques such as random flips, brightness and contrast adjustments, and Gaussian noise were applied to improve model generalization and reduce overfitting. Augmentations particularly relevant to satellite imagery, such as random fog and rain, were also introduced to simulate real-world atmospheric conditions.
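One subtlety of augmenting image pairs is that geometric transforms must be applied identically to both dates and the label, while photometric ones can vary per image. The following NumPy sketch (parameter ranges are assumptions, not the project's actual settings) illustrates this:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img1, img2, label, rng):
    """Illustrative paired augmentation. Geometric transforms are applied
    identically to both dates AND the label; photometric transforms
    (brightness, noise) are sampled independently per image."""
    if rng.random() < 0.5:                                 # horizontal flip
        img1, img2, label = img1[:, ::-1], img2[:, ::-1], label[:, ::-1]
    img1 = np.clip(img1 * rng.uniform(0.8, 1.2), 0.0, 1.0)  # brightness
    img2 = np.clip(img2 * rng.uniform(0.8, 1.2), 0.0, 1.0)
    img1 = np.clip(img1 + rng.normal(0, 0.02, img1.shape), 0.0, 1.0)  # noise
    img2 = np.clip(img2 + rng.normal(0, 0.02, img2.shape), 0.0, 1.0)
    return img1, img2, label

a = rng.random((4, 4))
b = rng.random((4, 4))
lbl = rng.integers(0, 5, (4, 4))
a2, b2, lbl2 = augment(a, b, lbl, rng)
print(a2.shape, lbl2.shape)
```

Weather effects such as fog and rain would slot into the photometric branch, since they alter appearance without moving pixels.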
3. Model Building and Training
Seven different model architectures were implemented and trained on the multi-temporal satellite image pairs:
DualStreamFCN: Two parallel streams of input images are combined and passed to a classifier.
HRNet: A high-resolution model that preserves spatial details across multiple layers for accurate change detection.
Twin Siamese Network: Processes and compares two input images using shared weights with a cosine similarity measure to identify differences.
HRSCD4: A multi-task network that produces both binary change maps and semantic segmentation maps for each image.
SSCD-e and SSCD-l: These architectures apply early and late fusion of the input images, respectively, to detect changes and classify the terrain type.
BiSRNet: The best-performing model, utilizing co-attention mechanisms to improve semantic reasoning and change detection.
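The Twin Siamese idea above — shared weights plus a cosine-similarity comparison — can be sketched as follows. The toy embedding and the decision threshold are assumptions for illustration, not the project's actual encoder or tuned value:

```python
import numpy as np

rng = np.random.default_rng(1)

def embed(img, w):
    """Shared-weight embedding (toy stand-in for the Siamese encoder)."""
    return np.maximum(img @ w, 0.0)

def cosine_sim(a, b, eps=1e-8):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

w = rng.standard_normal((64, 16)) * 0.1  # ONE weight matrix for both inputs

patch_t1 = rng.random(64)
patch_t2_same = patch_t1 + rng.normal(0, 0.01, 64)  # nearly unchanged patch
patch_t2_diff = rng.random(64)                       # different content

sim_same = cosine_sim(embed(patch_t1, w), embed(patch_t2_same, w))
sim_diff = cosine_sim(embed(patch_t1, w), embed(patch_t2_diff, w))

# A low similarity score flags a changed patch; thresholding the score
# (threshold value below is hypothetical) yields a binary change decision.
threshold = 0.9
print(sim_same, sim_diff, sim_same < threshold)
```

Because the two inputs share one set of weights, the network cannot "cheat" by encoding the two dates differently, which is what makes the similarity score a meaningful change signal.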
During training, hyperparameters such as batch size, learning rate, weight decay, and number of epochs were fine-tuned using the validation set. Cross-validation was employed to ensure robust performance evaluation.
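Validation-driven tuning of the kind described above amounts to a search over a hyperparameter grid. The grid values and the scoring function below are placeholders (a deterministic toy score stands in for "train the model and return validation mIoU") so the sketch runs without a GPU:

```python
import itertools

# Hypothetical grid; the report does not list the exact values searched.
grid = {
    "lr": [1e-3, 1e-4],
    "batch_size": [8, 16],
    "weight_decay": [0.0, 1e-5],
}

def validate(lr, batch_size, weight_decay):
    """Stand-in for 'train with these settings, return validation mIoU'."""
    return 0.7 - abs(lr - 1e-4) - 0.001 * batch_size - weight_decay

best_score, best_cfg = float("-inf"), None
for lr, bs, wd in itertools.product(*grid.values()):
    score = validate(lr, bs, wd)
    if score > best_score:
        best_score, best_cfg = score, (lr, bs, wd)

print(best_cfg, best_score)
```

With cross-validation, `validate` would average the score over several train/validation folds instead of using a single held-out set.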
4. Evaluation Metrics
The models were evaluated using:
Overall Accuracy (OA): Measures the proportion of correctly classified pixels.
Mean Intersection over Union (mIoU): A key metric for semantic segmentation that calculates the overlap between predicted and true labels.
Separated Kappa (SeK): A statistical measure of agreement for categorical labels, excluding unchanged pixels to provide a more accurate measure of change detection performance.
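All three metrics can be computed from a single pixel-level confusion matrix. The OA and mIoU below follow their standard definitions; the SeK implementation is one common formulation (kappa with the no-change/no-change cell zeroed, scaled by an exponential IoU term) and should be treated as an assumption rather than the report's exact definition:

```python
import numpy as np

def metrics(conf):
    """conf[i, j] = number of pixels with true class i predicted as class j.
    Class 0 is taken to be 'no change'."""
    total = conf.sum()
    oa = np.trace(conf) / total                       # overall accuracy

    inter = np.diag(conf).astype(float)
    union = conf.sum(0) + conf.sum(1) - inter
    iou = np.where(union > 0, inter / union, 0.0)
    miou = iou.mean()                                 # mean IoU

    # Separated Kappa (assumed formulation): Cohen's kappa on the matrix
    # with the no-change/no-change cell removed, scaled by e^(IoU_chg - 1)
    # where IoU_chg averages the changed classes only.
    c = conf.astype(float).copy()
    c[0, 0] = 0.0
    s = c.sum()
    po = np.trace(c) / s
    pe = (c.sum(0) * c.sum(1)).sum() / (s * s)
    kappa = (po - pe) / (1.0 - pe)
    sek = np.exp(iou[1:].mean() - 1.0) * kappa
    return oa, miou, sek

conf = np.array([[80, 5, 3],
                 [4, 60, 6],
                 [2, 7, 33]])
print(metrics(conf))
```

Zeroing the no-change cell is what makes SeK "separated": a model that predicts "no change" everywhere scores well on OA but collapses on SeK, which is why the text calls it the more accurate measure of change-detection performance.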
Results
BiSRNet Performance: BiSRNet achieved the best results, with an accuracy of 87.55%, an mIoU of 71.68%, and a SeK of 20.64%. The model demonstrated balanced precision and recall, making it highly effective at detecting and categorizing urban changes.
SSCD-e and SSCD-l: These models also showed significant improvements over traditional models, with SSCD-l achieving an accuracy of 86.36% and an mIoU of 70.95%, demonstrating the importance of late fusion for change detection tasks.
Qualitative Results
The project generated qualitative results by comparing ground truth and predicted masks for semantic change detection and segmentation. The BiSRNet model successfully identified large-scale changes, such as new buildings and water bodies, although it struggled with smaller or more subtle features like vegetation or playgrounds.
Discussion
The BiSRNet model outperformed other architectures due to its use of co-attention mechanisms, which enabled the model to focus on relevant changes between two temporal images. Other models, such as HRNet and Twin Siamese, also performed well, particularly for larger and more distinct features. However, improvements could be made in handling more nuanced changes, which may require additional data or fine-tuning.
Conclusion
The project successfully demonstrates how state-of-the-art machine learning models can enhance semantic change detection for urban satellite imagery. The project sets a new benchmark for urban monitoring by using multi-temporal imagery and generating semantic change maps, improving upon traditional binary change detection methods. With BiSRNet achieving top performance, this study highlights the potential of advanced AI techniques for real-world applications in urban planning, environmental monitoring, and disaster management.






