Vision Transformer CIM Simulation

This tutorial demonstrates how to apply Compute-in-Memory (CIM) transformations to Vision Transformer (ViT) models for CIM-aware fine-tuning.

Overview

  • CIM-aware fine-tuning takes a pretrained ViT model, applies the CIM transformation, and fine-tunes the transformed model on downstream vision datasets.

The CIM transformation simulates the effects of compute-in-memory architectures and supports both digital and analog tiles.
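To make the idea concrete, the sketch below shows how such a transformation could look in PyTorch. This is a minimal illustration, not the repository's actual implementation; CIMLinear, apply_cim_transform, and the noise model are hypothetical names and choices.

import torch
import torch.nn as nn

class CIMLinear(nn.Module):
    """Illustrative stand-in for a CIM-simulated linear layer (hypothetical API)."""
    def __init__(self, linear: nn.Linear, noise_std: float = 0.01):
        super().__init__()
        self.weight = linear.weight
        self.bias = linear.bias
        self.noise_std = noise_std  # models analog read/compute noise

    def forward(self, x):
        y = nn.functional.linear(x, self.weight, self.bias)
        # An analog tile perturbs the result; a digital tile would skip this step.
        return y + self.noise_std * y.detach().abs().mean() * torch.randn_like(y)

def apply_cim_transform(model: nn.Module) -> nn.Module:
    # Recursively replace every nn.Linear with its CIM-simulating counterpart.
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            setattr(model, name, CIMLinear(child))
        else:
            apply_cim_transform(child)
    return model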

Evaluation of CIM-aware Fine-tuning

Environment Setup

If you have not set up your environment yet, please follow the guidelines in Environment Setup.

We provide scripts to apply CIM-aware fine-tuning on Vision Transformer models and evaluate their performance on standard vision benchmarks.

CIM-aware Fine-tuning & Evaluation on Vision Tasks

git clone https://github.com/AICrossSim/NewComputeBench.git
cd NewComputeBench

model_name="google/vit-base-patch16-224"    # HuggingFace ViT model
dataset_name="imagenet"                      # Vision dataset for evaluation
cim_config_path="./experiments/llm-cim/configs/sram.yaml" # CIM transformation configuration
output_dir="./log_eval_results"             # Output directory for results

python experiments/vit-cim/run_vit.py \
    --model_name_or_path ${model_name} \
    --dataset_name ${dataset_name} \
    --cim_config_path ${cim_config_path} \
    --output_dir ${output_dir} \
    --per_device_eval_batch_size 32 \
    --enable_cim_transform \
    --do_eval
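Omitting --enable_cim_transform should evaluate the unmodified pretrained model, which gives a baseline for measuring the accuracy cost of the CIM simulation.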

CIM Configuration

The CIM configuration file defines the noise characteristics, quantization formats, and other parameters that simulate compute-in-memory effects (digital or analog). See experiments/llm-cim/configs/ for example configurations.

CIM Configuration Examples

Typical SRAM CIM Configuration

# experiments/llm-cim/configs/sram.yaml
by: "type"                    # match modules by layer type (conv2d, linear)
conv2d:
  config:
    tile_type: "digital"      # digital CIM tile
    core_size: 16             # CIM core (tile) size
    rescale_dim: "vector"     # per-vector rescaling of quantized values
    x_quant_type: "e4m3"      # activation quantization: FP8 E4M3
    weight_quant_type: "e4m3" # weight quantization: FP8 E4M3

linear:
  config:
    tile_type: "digital"
    core_size: 64             # larger core for linear layers
    rescale_dim: "vector"
    x_quant_type: "e4m3"
    weight_quant_type: "e4m3"
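The e4m3 entries denote the FP8 E4M3 format (4 exponent bits, 3 mantissa bits, maximum representable magnitude 448). The sketch below shows one way such fake quantization can be emulated in recent PyTorch (2.1+ exposes torch.float8_e4m3fn); the per-row scaling is an assumption about how rescale_dim: "vector" might be interpreted, not a statement about the repository's implementation.

import torch

def fake_quant_e4m3(x: torch.Tensor) -> torch.Tensor:
    # Per-row (vector) scale so values fit within the E4M3 range before casting.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12) / 448.0
    x_scaled = x / scale
    # Round-trip through FP8 E4M3 to emulate the quantization error.
    x_q = x_scaled.to(torch.float8_e4m3fn).to(x.dtype)
    return x_q * scale

x = torch.randn(4, 64)
print((x - fake_quant_e4m3(x)).abs().max())  # small quantization error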

Supported Models and Datasets

Models

The ViT model family from HuggingFace is supported, e.g. google/vit-base-patch16-224.

Datasets

  • ImageNet: 1000-class image classification (requires a custom dataset path)

Performance Metrics

The evaluation reports the following metrics:

  • Accuracy: Top-1 and Top-5 accuracy
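Top-k accuracy counts a sample as correct when the true label appears among the k highest-scoring classes. A minimal, self-contained way to compute it from logits, independent of the evaluation script's internals:

import torch

def topk_accuracy(logits: torch.Tensor, labels: torch.Tensor, k: int) -> float:
    # A prediction is correct if the true label is among the k highest logits.
    topk = logits.topk(k, dim=-1).indices             # (batch, k)
    correct = (topk == labels.unsqueeze(-1)).any(-1)  # (batch,)
    return correct.float().mean().item()

logits = torch.randn(8, 1000)  # e.g. ImageNet's 1000 classes
labels = torch.randint(0, 1000, (8,))
print(topk_accuracy(logits, labels, 1), topk_accuracy(logits, labels, 5))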