動作環境

Mac OS X Mojave 10.14.6
Python 3.7.3
jupyter notebook

サンプルHSI (Pavia University)

今回はPavia UniversityのHSIを使用。 www.ehu.eus

【参考】

sjsy.hatenablog.com

ディレクトリ構成

.
├── HSI_classification.ipynb # ここにコード書く
└── dataset
    └── PaviaU
        ├── PaviaU.mat
        └── PaviaU_gt.mat

コード

パッケージのインポート

%matplotlib inline
import numpy as np
import scipy
import matplotlib.pyplot as plt

import os
import scipy.io

from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

データの読み込み

data_path = os.path.join(os.getcwd(),'dataset/PaviaU')
# HSIデータ
X = scipy.io.loadmat(os.path.join(data_path, 'PaviaU.mat'))['paviaU']
# ラベルデータ
y =scipy.io.loadmat(os.path.join(data_path, 'PaviaU_gt.mat'))['paviaU_gt']

import spectral

# RGB画像を表示
color_img = spectral.imshow(X,(65,33,13),figsize =(5,5))

f:id:SJSY:20191006142807p:plain

前処理

# HSIのプロパティを取得
height = X.shape[0]
width = X.shape[1]
bands =  X.shape[2]

def createPatches(X, y, removeZeroLabels = True):
    height = X.shape[0]
    width = X.shape[1]
    bands =  X.shape[2]
    
    patchesData = np.zeros((height * width, bands))
    patchesLabels = np.zeros((height * width))
    patchIndex = 0
    
    for r in range(height):
        for c in range(width):
            patchesData[patchIndex, :] = X[r, c]
            patchesLabels[patchIndex] = y[r, c]
            patchIndex = patchIndex + 1
    if removeZeroLabels:
        patchesData = patchesData[patchesLabels>0,:]
        patchesLabels = patchesLabels[patchesLabels>0]
        patchesLabels -= 1
    return patchesData, patchesLabels

def splitTrainTestSet(X, y, testRatio=0.10):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=testRatio, random_state=0,stratify=y)
    return X_train, X_test, y_train, y_test

# 2次元画像を1次元ベクトルに変換
XPatches, yPatches = createPatches(X, y)

# 訓練用データとテスト用データに分割
testRatio = 0.5
X_train, X_test, y_train, y_test = splitTrainTestSet(XPatches, yPatches, testRatio)

学習

# 分類器はランダムフォレスト
clf = RandomForestClassifier()
clf.fit(X_train, y_train)

評価

class_name = ['Asphalt', 'Meadows', 'Gravel', 
              'Trees', 'Painted metal sheets', 'Bare Soil', 
              'Bitumen', 'Self-Blocking Bricks','Shadows']

pre = clf.predict(X_test)

ac_score = metrics.accuracy_score(y_test, pre)
print('正解率:{0:.1f}%'.format(ac_score * 100))

出力

正解率:91.3%

完全ではないですが、単純なコードでも分類できていることがわかります。

各クラスでの評価指標を見てみましょう。

import pandas as pd
d = metrics.classification_report(y_test, pre,target_names=class_name,
                              output_dict=True)
df = pd.DataFrame(d)
df

	Asphalt	Meadows	Gravel	Trees	Painted metal sheets	Bare Soil	Bitumen	Self-Blocking Bricks	Shadows	accuracy	macro avg	weighted avg
precision	0.923716	0.917186	0.797820	0.936658	0.997024	0.926680	0.890785	0.854864	1.0	0.912942	0.916081	0.913013
recall	0.938480	0.978764	0.767398	0.907311	0.995542	0.723658	0.784962	0.873438	1.0	0.912942	0.885506	0.912942
f1-score	0.931040	0.946975	0.782313	0.921751	0.996283	0.812681	0.834532	0.864052	1.0	0.912942	0.898847	0.910920
support	3316.000000	9324.000000	1049.000000	1532.000000	673.000000	2515.000000	665.000000	1841.000000	473.0	0.912942	21388.000000	21388.000000

Gravelの正解率が他と比べて小さいことがわかります。

正解率を上げるには前処理の方法や他の分類器を検討する必要があります。

分類の結果を画像で表示

# 画像の用意
outputs = np.zeros((height,width))
for i in range(height):
    for j in range(width):
        target = y[i,j]
        if target == 0 :
            continue
        else :
            prediction = clf.predict(X[i,j,:].reshape(1, -1))                       
            outputs[i,j] = prediction+1

# 正解画像
ground_truth = spectral.imshow(classes = y,figsize =(10,10))
# 機械学習画像
predict_image = spectral.imshow(classes = outputs.astype(int),figsize =(10,10))

正解画像

f:id:SJSY:20191006142814p:plain