Vision Transformer PIM Simulation
This tutorial demonstrates how to apply Processing-in-Memory (PIM) transformations to Vision Transformer (ViT) models for PIM-aware fine-tuning.
Overview
- PIM-aware fine-tuning takes a pretrained ViT model, applies PIM transformation, and fine-tunes the model on downstream vision datasets.
- The entry point is at
experiments/vit-pim/run_vit.py.
- The entry point is at
The PIM transformation simulates the effect of processing-in-memory architectures, with both digital and analog support.
Evaluation of PIM-aware Fine-tuning
Environment Setup
If you have not set up environments, please follow the guidelines in Environment Setup.
MASE Dependency
The PIM simulation uses quantization primitives from the MASE submodule. Ensure MASE is installed: pip install -e ./submodules/mase.
We provide scripts to apply PIM-aware fine-tuning on Vision Transformer models and evaluate their performance on standard vision benchmarks.
PIM-aware Fine-tuning & Evaluate on Vision Tasks
git clone https://github.com/AICrossSim/NewComputeBench.git
cd NewComputeBench
model_name="google/vit-base-patch16-224" # HuggingFace ViT model
dataset_name="imagenet" # Vision dataset for evaluation
pim_config_path="./experiments/llm-pim/configs/sram.yaml" # PIM transformation configuration
output_dir="./log_eval_results" # Output directory for results
python experiments/vit-pim/run_vit.py \
--model_name_or_path ${model_name} \
--dataset_name ${dataset_name} \
--pim_config_path ${pim_config_path} \
--output_dir ${output_dir} \
--per_device_eval_batch_size 32 \
--enable_pim_transform \
--do_eval
PIM Configuration
The PIM configuration file defines the noise characteristics, quantization levels, and other parameters that simulate the analog processing-in-memory effects. See experiments/llm-pim/configs/ for example configurations.
PIM Configuration Examples
Typical SRAM PIM Configuration
# experiments/llm-pim/configs/sram.yaml
by: "type"
conv2d:
config:
tile_type: "digital"
core_size: 16
rescale_dim: "vector"
x_quant_type: "e4m3"
weight_quant_type: "e4m3"
linear:
config:
tile_type: "digital"
core_size: 64
rescale_dim: "vector"
x_quant_type: "e4m3"
weight_quant_type: "e4m3"
Supported Models and Datasets
Models
Support ViT model family from huggingface:
e.g.google/vit-base-patch16-224
Datasets
- ImageNet: 1000-class image classification (requires custom path)
Performance Metrics
The evaluation provides comprehensive metrics:
- Accuracy: Top-1 and Top-5 accuracy