This topic uses ResNet50 image classification model training as an example to show how KSpeed accelerates image data loading in CV workloads. The ResNet50 model is based on the implementation in NVIDIA's official open source DeepLearningExamples repository. Using KSpeed requires a few changes to the original code; these changes can be applied to the ResNet50 model as a git patch, and they are briefly described in the "Key modules for KSpeed integration" section at the end of this topic.
Code preparation
Base code repository and commit:
https://github.com/NVIDIA/DeepLearningExamples/commit/174b3d40bfc26f2adcf252676d38d6d5ffa7cbdc
git clone https://github.com/NVIDIA/DeepLearningExamples.git
cd DeepLearningExamples
git checkout master
git reset --hard 174b3d40bfc26f2adcf252676d38d6d5ffa7cbdc
Integrate the KSpeed code
# Stay in the DeepLearningExamples directory
wget http://kspeed-release.oss-cn-beijing.aliyuncs.com/kspeed_resnet50.patch
git apply kspeed_resnet50.patch
Configure the runtime environment
Start the training container with the following command:
docker run -it --gpus all --name=resnet50_kspeed_test --net=host --ipc host --device=/dev/infiniband/ --ulimit memlock=-1:-1 -v /{path-to-imagenet}:/{path-to-imagenet-in-docker} -v /{path-to-DeepLearningExamples}:/{path-to-DeepLearningExamples-in-docker} eflo-registry.cn-beijing.cr.aliyuncs.com/eflo/ngc-pytorch-kspeed-22.05-py38:v2.2.0
In the command above:
{path-to-imagenet} is the path to the imagenet dataset on the host machine;
{path-to-imagenet-in-docker} is the path inside the container to which the dataset is mounted;
{path-to-DeepLearningExamples} is the path to the model training code on the host machine;
{path-to-DeepLearningExamples-in-docker} is the path inside the container to which the training code is mounted.
Set these paths according to your own environment.
The imagenet dataset directory structure is as follows:
imagenet
├── train
│ ├── n01440764
│ │ ├── n01440764_10026.JPEG
│ │ ├── n01440764_10027.JPEG
│ │ └── ......
│ ├── n01443537
│ └── ......
└── val
├── n01440764
│ ├── ILSVRC2012_val_00000293.JPEG
│ ├── ILSVRC2012_val_00002138.JPEG
│ └── ......
├── n01443537
└── ......
Run model training
# Stay in the DeepLearningExamples directory
cd ./PyTorch/Classification/ConvNets
# Single node, 8 GPUs: baseline
bash ./resnet50v1.5/training/AMP/DGXA100_resnet50_AMP_multi.sh pytorch {path-to-imagenet-in-docker}
# Single node, 8 GPUs: kspeed
bash ./resnet50v1.5/training/AMP/DGXA100_resnet50_AMP_multi.sh kspeed {path-to-imagenet-in-docker}
# Single node, 8 GPUs: dali + kspeed
bash ./resnet50v1.5/training/AMP/DGXA100_resnet50_AMP_multi.sh dali-kspeed {path-to-imagenet-in-docker}
In the commands above, {path-to-imagenet-in-docker} is the path to the imagenet dataset inside the container; it must be consistent with the path set when the container was started.
Before running the KSpeed tests, make sure the KSpeed service has already been deployed.
Key modules for KSpeed integration
Add the kspeeddataloader module
A new file, DeepLearningExamples/PyTorch/Classification/ConvNets/image_classification/kspeeddataloader.py, is added; it implements both a KSpeed-based PyTorch Dataloader and a KSpeed-based DALI Dataloader.
KSpeed-based PyTorch Dataloader
To implement the KSpeed-based PyTorch Dataloader, only the Dataset needs to be modified; it is then combined with PyTorch's native Sampler and DataLoader. The core code is as follows:
Import the kspeeddataset module:
import kspeed.utils.data.kspeeddataset as KSpeedDataset
Replace torchvision.datasets.ImageFolder with KSpeedDataset.KSpeedImageFolder to use KSpeed's accelerated data loading:
train_dataset = KSpeedDataset.KSpeedImageFolder(
    traindir, None, workers, kspeed_iplist,
    "admin", "admin", transforms.Compose(transforms_list),
)
val_dataset = KSpeedDataset.KSpeedImageFolder(
    valdir, None, workers, kspeed_iplist,
    "admin", "admin",
    transforms.Compose(
        [
            transforms.Resize(
                image_size + crop_padding, interpolation=interpolation
            ),
            transforms.CenterCrop(image_size),
        ]
    ),
)
Implement the get_kspeed_train_loader and get_kspeed_val_loader methods; see lines 16-72 and 74-128 of kspeeddataloader.py for details.
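For orientation, the sketch below shows the general pattern such a loader getter can follow: the KSpeedImageFolder dataset is paired with PyTorch's native DistributedSampler and DataLoader. The function signature, the transform list, and the DataLoader options here are assumptions for illustration; the authoritative implementation is the one in kspeeddataloader.py.

# Illustrative sketch only -- the real implementation lives in kspeeddataloader.py.
# The signature and DataLoader options are assumptions.
import torch
import torchvision.transforms as transforms
import kspeed.utils.data.kspeeddataset as KSpeedDataset


def get_kspeed_train_loader(traindir, batch_size, workers, kspeed_iplist, distributed=False):
    # Only the Dataset is KSpeed-specific; everything else is native PyTorch.
    train_dataset = KSpeedDataset.KSpeedImageFolder(
        traindir, None, workers, kspeed_iplist,
        "admin", "admin",
        transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.RandomHorizontalFlip(),
        ]),
    )

    train_sampler = (
        torch.utils.data.distributed.DistributedSampler(train_dataset)
        if distributed else None
    )

    train_loader = torch.utils.data.DataLoader(
        train_dataset,
        batch_size=batch_size,
        shuffle=(train_sampler is None),
        sampler=train_sampler,
        num_workers=workers,
        pin_memory=True,
        drop_last=True,
    )
    return train_loader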
KSpeed-based DALI Dataloader
To implement the KSpeed-based DALI Dataloader, only the input data source of the DALI pipeline needs to be changed to an external source, KSpeedCallable. The core code is as follows:
KSpeedCallable
The KSpeedCallable object inherits from KSpeedDataset.KSpeedFolder. In lines 164-179 of kspeeddataloader.py (line 176 specifically), it reads imagenet dataset samples through self.dataset.getBIN(path).
def __call__(self, sample_info):
    if self.dataset is None:
        self.load()
    if sample_info.iteration >= self.full_iters:
        raise StopIteration()
    if self.last_seen_epoch != sample_info.epoch_idx:
        self.last_seen_epoch = sample_info.epoch_idx
        self.perm = np.random.default_rng(seed=42 + sample_info.epoch_idx).permutation(len(self.files))
    idx = self.perm[sample_info.idx_in_epoch + self.shard_offset]
    path = os.path.join(self.root, self.files[idx])
    dout = self.dataset.getBIN(path)
    sample = np.frombuffer(dout, dtype=np.uint8)
    label = np.int32([self.labels[idx]])
    return sample, label
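The __call__ method above relies on per-shard bookkeeping that is set up when the callable is constructed. The sketch below is an assumption about how that state might be initialized: the field names are taken from the __call__ body, but the constructor arguments and the shard arithmetic are illustrative, and the real class additionally subclasses KSpeedDataset.KSpeedFolder and defines load().

# Illustrative sketch of the per-shard state that KSpeedCallable.__call__ depends on.
# Constructor arguments and shard arithmetic are assumptions, not the real code.
class KSpeedCallable:
    def __init__(self, root, files, labels, batch_size, shard_id, num_shards):
        self.root = root              # dataset root, joined with each relative sample path
        self.files = files            # relative paths of all samples
        self.labels = labels          # integer label for each sample
        self.dataset = None           # KSpeed handle; __call__ calls load() lazily to create it
        self.last_seen_epoch = None   # detects epoch changes so perm is reshuffled once per epoch
        self.perm = None              # per-epoch permutation of sample indices
        # Each DALI shard reads a contiguous slice of the permuted index list.
        shard_size = len(files) // num_shards
        self.shard_offset = shard_size * shard_id
        self.full_iters = shard_size // batch_size   # iterations per epoch before StopIteration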
DALI pipeline based on KSpeedCallable
In lines 223-229 of kspeeddataloader.py, KSpeedCallable is used as the external data source of the DALI pipeline to fetch dataset samples.
if kspeed:
    images, labels = fn.external_source(source=kscallable,
                                        num_outputs=2,
                                        batch=False,
                                        parallel=True,
                                        dtype=[types.UINT8, types.INT32],
                                        device='cpu')
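For context, the sketch below shows one way the external source above could be wired into a complete DALI pipeline and exposed to PyTorch. The decode, resize, and normalize operators and the iterator setup follow common DALI usage and are assumptions, not necessarily the exact contents of kspeeddataloader.py.

# Illustrative DALI pipeline built around the KSpeedCallable external source.
# Operator choices and parameters are assumptions based on standard DALI image pipelines.
from nvidia.dali import fn, types
from nvidia.dali.pipeline import pipeline_def
from nvidia.dali.plugin.pytorch import DALIGenericIterator


@pipeline_def
def kspeed_train_pipeline(kscallable, dali_cpu=False):
    images, labels = fn.external_source(
        source=kscallable,
        num_outputs=2,
        batch=False,
        parallel=True,
        dtype=[types.UINT8, types.INT32],
        device="cpu",
    )
    # Decode the raw JPEG bytes returned by getBIN(); "mixed" offloads decoding to the GPU.
    decoder_device = "cpu" if dali_cpu else "mixed"
    images = fn.decoders.image(images, device=decoder_device, output_type=types.RGB)
    images = fn.random_resized_crop(images, size=224)
    images = fn.crop_mirror_normalize(
        images,
        dtype=types.FLOAT,
        output_layout="CHW",
        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
        std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
        mirror=fn.random.coin_flip(),
    )
    return images, labels


# Usage sketch: parallel external_source requires Python worker processes.
# pipe = kspeed_train_pipeline(kscallable, batch_size=256, num_threads=8, device_id=0,
#                              py_num_workers=4, py_start_method="spawn")
# pipe.build()
# train_loader = DALIGenericIterator(pipe, ["data", "label"])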
Add the DATA_BACKEND_CHOICES options
At line 40 of DeepLearningExamples/PyTorch/Classification/ConvNets/image_classification/dataloaders.py, change the original DATA_BACKEND_CHOICES = ["pytorch", "syntetic"] to the following:
DATA_BACKEND_CHOICES = ["pytorch", "syntetic", "kspeed", "dali-kspeed", "dali"]
Add the args.data_backend options
In lines 512-520 of DeepLearningExamples/PyTorch/Classification/ConvNets/main.py, add the following code to the args.data_backend branches:
elif args.data_backend == "kspeed":
    get_train_loader = get_kspeed_train_loader
    get_val_loader = get_kspeed_val_loader
elif args.data_backend == "dali":
    get_train_loader = get_dali_kspeed_train_loader(dali_cpu=True, kspeed=False)
    get_val_loader = get_dali_kspeed_val_loader(dali_cpu=True, kspeed=False)
elif args.data_backend == "dali-kspeed":
    get_train_loader = get_dali_kspeed_train_loader(dali_cpu=True)
    get_val_loader = get_dali_kspeed_val_loader(dali_cpu=True)
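Note that get_dali_kspeed_train_loader and get_dali_kspeed_val_loader are called with arguments in the branches above, while get_kspeed_train_loader is assigned directly. This suggests the DALI variants are factories that bind dali_cpu and kspeed and return the actual loader getter. A minimal sketch of that assumed pattern (names and signatures are illustrative, not the real kspeeddataloader.py code):

# Assumed factory pattern: bind the backend options up front and return the
# function that main.py later calls to construct the actual data loader.
def get_dali_kspeed_train_loader(dali_cpu=False, kspeed=True):
    def _get_train_loader(data_path, image_size, batch_size, workers, **kwargs):
        # When kspeed=True, feed the DALI pipeline from a KSpeedCallable external source;
        # when kspeed=False, fall back to DALI's regular file reader on data_path.
        raise NotImplementedError("see kspeeddataloader.py for the real loader")
    return _get_train_loader


# In main.py the factory is invoked once, and the returned function is then used
# exactly like get_train_loader for the other backends.
get_train_loader = get_dali_kspeed_train_loader(dali_cpu=True)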