
My Own Analysis-template (PyTorch)

28 Aug 2020

Reading time ~2 minutes

2020.08.31

  • Loading data from AWS S3: to be covered in a separate blog post (a minimal sketch follows this list)
  • Pytorch template version 1: added model_fit, model_test, fit_cross_validation, test_cross_validation, early_stopping, save_ckpt_file, tensorboard, save_jupyter_ipynb_file, training_validation_log, hyperparameter_setting_list, data_loader, auto_learning_rate_finding, oversampling
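
Since S3 loading deserves its own post, here is only a minimal boto3 sketch for reference (the bucket and key names are hypothetical):

import boto3

# minimal sketch: download one object from S3 (bucket/key names are hypothetical)
s3 = boto3.client('s3')
s3.download_file('my-bucket', 'data/train.csv', './data/train.csv')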

Planned additions

  • sklearn template
  • data EDA tool
  • hyperparameter optimization
  • AWS db load

You can use it right away by downloading the Analysis template from the GitHub repository below!

sangyeop-kim/Analysis-template
https://github.com/sangyeop-kim/Analysis-template.git

The training output looks a bit broken when viewed on GitHub, but it actually renders much more nicely in practice.


Sample code

Just modify the parts marked (sample) and you're ready to use it.

Requirements

boto3==1.14.51
easydict==1.9
numpy==1.19.1
pandas==0.25.3
pytorch-lightning==0.9.0
scikit-learn==0.23.1
torch==1.6.0
torchvision==0.7.0

import package

# required
import os
from glob import glob
from easydict import EasyDict
from Analysis_template import AWS_s3
from Analysis_template import Model_template

import torch
from torch import nn, optim
import pytorch_lightning as pl
from torch.nn import functional as F
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import Dataset, DataLoader

# optional: add imports to match your preprocessing
from torchvision import datasets
import torchvision.transforms as transforms

data preprocessing (sample)

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

train_data = datasets.CIFAR10('./data', train = True, download=True, transform = transform)
test_data = datasets.CIFAR10('./data', train = False, download=True, transform = transform)

model define (sample)

class Model(Model_template):
    def __init__(self, hyperparameters):
        super().__init__(hyperparameters)
        
        self.loss = nn.CrossEntropyLoss()

        self.conv1 = nn.Conv2d(3, 32, 3)
        self.max_pool = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(32, 16, 3)
        self.linear = nn.Linear(16 * 6 * 6, 10)
    
    def forward(self, x) :
        x = self.conv1(x)
        x = F.relu(self.max_pool(x))
        x = self.conv2(x)
        x = F.relu(self.max_pool(x))
        x = F.relu(self.linear(x.view(x.size(0), -1)))       
        
        return x
    
    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=self.hparams.lr)
        scheduler = StepLR(optimizer, step_size=self.hparams.step_size, 
                           gamma=self.hparams.gamma)
        
        return [optimizer], [scheduler]

hyperparameters (sample)

hyperparameters = EasyDict({'lr' : 0.007,
                            'max_epochs' : 1,
                            'step_size' : 3, # scheduler
                            'gamma' : 0.9, # scheduler
                            'batch_size' : 256, # train batch size
                            'test_batch_size' : -1, # test batch size
                            'gpus' : [0],
                            'num_workers' : 16,
                            'auto_lr_find' : False,
                            'save_top_k' : 3,
                            'folder' : 'best_model',
                            'early_stopping' : True,
                            'patience' : 5
                            })


if not os.path.isdir(hyperparameters['folder']) :
    os.mkdir(hyperparameters['folder'])

model training & test

model = Model(hyperparameters)
model.fit(train_data, test_data)

# a newly defined loss or accuracy metric can also be used
def accuracy(y_hat, y): # manual metric example
    return torch.sum(torch.max(y_hat, dim=1)[1] == y).item() / len(y_hat)

model.test(test_data, 'accuracy', accuracy)

# use the original loss
model.test(test_data, 'loss')

model load

saved_folder_name = glob('./best_model/*')[0].split('/')[-1] # change to the folder name you want

model = model.load_model(saved_folder_name)
model.test(test_data, 'accuracy', accuracy)
model.test(test_data, 'loss')

cross_validation train & test

model.fit_cross_validation(train_data, 5, 0) # 5fold, random_state = 0

saved_folder_name = glob('./best_model/*')[0].split('/')[-1] # change to the folder name you want

model.test_cross_validation(train_data, 'loss', saved_folder_name)

saved file list

A folder is created, named after the time the code was run.
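
A plausible layout for one run folder, based on the artifacts listed below (the timestamp format itself is an assumption):

best_model/
└── 2020-08-28_12-00-00/   (hypothetical timestamp folder name)
    ├── epoch=3_val_loss=0.1234.ckpt
    ├── hparams.yaml
    ├── saved_ipynb_file.html
    └── tensorboard/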

ckpt

Only the save_top_k best-performing ckpt files are kept.

e.g. epoch=n_val_loss=x.xxxx.ckpt
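
To pick the best checkpoint programmatically, a small helper like the following works, assuming filenames follow the epoch=..._val_loss=....ckpt pattern above (this helper is not part of the template):

from glob import glob

# hypothetical helper: return the ckpt with the lowest val_loss in a run folder
def best_ckpt(folder):
    ckpts = glob(folder + '/*.ckpt')
    return min(ckpts, key=lambda p: float(p.split('val_loss=')[-1].replace('.ckpt', '')))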

hparams.yaml

Stores the hyperparameter values set for the run, together with the training and validation logs.
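
Since it is plain YAML, it can be inspected directly with PyYAML, which pytorch-lightning already depends on (the run folder name below is a placeholder):

import yaml

# sketch: print the saved hyperparameters and logs (replace the folder name)
with open('best_model/<run_folder>/hparams.yaml') as f:
    print(yaml.safe_load(f))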

saved_ipynb_file.html

Saves the ipynb file as it was at training time.

tensorboard

All training metrics are logged.

Validation metrics are logged once per epoch.
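
These are ordinary TensorBoard event files, so (assuming the folder layout above) they can be viewed with tensorboard --logdir best_model in a terminal.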

cross_validation

Saved in per-fold (n_fold) form.

All folds can also be viewed at once.


