CLeVer: a Feature Subset Selection Technique for Multivariate Time Series* (Full Version)

Kiyoung Yang, Hyunjin Yoon, and Cyrus Shahabi
Computer Science Department
University of Southern California
Los Angeles, CA 90089, U.S.A.
{kiyoungy,hjy,shahabi}@usc.edu

* A short version of this paper is to appear in the proceedings of the 9th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Hanoi, Vietnam, 2005.
Abstract. Feature subset selection (FSS) is one of the techniques used to preprocess data before performing any data mining tasks, e.g., classification and clustering. FSS provides both cost-effective predictors and a better understanding of the underlying process that generated the data. We propose a novel method of FSS for Multivariate Time Series (MTS) based on Common Principal Component Analysis, termed CLeVer. Traditional FSS techniques, such as Recursive Feature Elimination (RFE) and Fisher Criterion (FC), have been applied to MTS datasets, e.g., Electroencephalogram (EEG) datasets. However, these techniques may lose the correlation information among features, while our proposed technique utilizes the properties of the principal component analysis to retain that information. In order to evaluate the effectiveness of our selected subset of features, we employ classification as the target data mining task. Our exhaustive sets of experiments show that CLeVer outperforms RFE and FC by up to 100% in terms of classification accuracy, while requiring significantly less processing time (up to 2 orders of magnitude) than RFE and FC.

Keywords: Feature Subset Selection, Multivariate Time Series, Principal Component Analysis, Common Principal Component Analysis, K-means Clustering
1 Introduction

Feature subset selection (FSS) is one of the techniques used to preprocess data before performing any data mining tasks, e.g., classification and clustering. FSS identifies a subset of the original features of a given dataset while removing irrelevant and/or redundant features [1]. The objectives of FSS are [2]:

- to improve the prediction performance of the predictors
- to provide faster and more cost-effective predictors
- to provide a better understanding of the underlying process that generated the data
The FSS methods choose a subset of the original features to be used for the subsequent processes. Hence, only the data generated from those features need to be collected. The differences between feature extraction and FSS are:

- Feature subset selection maintains information on the original features, while this information is usually lost when feature extraction is used.
- After identifying the subset of original features, only those features need to be measured and collected, ignoring all the other features. However, feature extraction in general requires measuring all the original features.
A time series is a series of observations, x_i(t), i = 1, ..., n, t = 1, ..., m, made sequentially through time, where i indexes the measurements made at each time point t [3]. It is called a univariate time series when n is equal to 1, and a multivariate time series (MTS) when n is equal to, or greater than, 2.
MTS datasets are common in various fields, such as multimedia and medicine. For example, in multimedia, Cybergloves used in Human and Computer Interface applications have around 20 sensors, each of which generates 50–100 values per second [4, 5]. In [6], 22 markers are spread over the human body to measure the movements of body parts while people are walking. The dataset collected is then used to recognize and identify a person by how he or she walks. In the neuro-rehabilitation domain, kinematics datasets generated from sensors are collected and analyzed to evaluate the functional behavior (i.e., the movement of the upper extremity) of post-stroke patients.
The size of an MTS dataset can become very large quickly. For example, the EEG dataset in [7] utilizes tens of electrodes and a sampling rate of 256 Hz. In order to process MTS datasets efficiently, it is therefore inevitable to preprocess the datasets to obtain the relevant subset of features, which will subsequently be employed for further processing. In the field of Brain Computer Interfaces (BCIs), the selection of relevant features is considered absolutely necessary for the EEG dataset, since the neural correlates are not known in such detail [7].
An MTS item is naturally represented as an m × n matrix, where m is the number of observations and n is the number of variables, e.g., sensors. However, state-of-the-art feature subset selection techniques, such as Recursive Feature Elimination (RFE) [2], require each item to be represented as one row. Consequently, to utilize these techniques on MTS datasets, each MTS item needs to first be transformed into one row or column vector. For example, in [7], where an EEG dataset with 39 channels is used, an autoregressive (AR) model of order 3 is utilized to represent each channel. Hence, each 39-channel EEG time series is transformed into a 117-dimensional vector. However, if each channel of the EEG is considered separately, we will lose the correlation information among the variables.
In this paper, we propose a novel feature subset selection method for multivariate time series (MTS)¹ based on common principal component analysis (CPCA), named CLeVer². In order to perform feature subset selection on an MTS dataset, CLeVer first obtains the descriptive common principal components (DCPCs), which agree most closely with the principal components of all the MTS items [8]. Note that the DCPC loadings represent how much each variable contributes to each of the DCPCs (see Section 2 for a brief review of PCA and DCPC). The intuition behind using the DCPCs as a basis for variable subset selection is that they keep the most compact overview of the MTS items in a dramatically reduced space, while retaining both the correspondence to the original variables and the correlation among the variables. CLeVer subsequently clusters the DCPC loadings to identify the variables that make similar contributions to each of the DCPCs. For each cluster, we obtain the centroid variable, eliminating all the similar variables within the cluster. These centroid variables form the selected subset of variables. Our experiments show that the variable subsets obtained by CLeVer perform up to about 100% better than those of other feature subset selection methods, such as Recursive Feature Elimination (RFE) and Fisher Criterion (FC), in terms of classification performance in most cases. Moreover, CLeVer takes up to 2 orders of magnitude less time than RFE, which is a wrapper method [9].

¹ For multivariate time series, each variable is regarded as a feature [7]. Hence, the terms feature and variable are used interchangeably throughout this paper when there is no ambiguity.
² CLeVer is an abbreviation of descriptive Common principal component Loading based Variable subset selection method.
The remainder of this paper is organized as follows. Section 2 discusses the background. Our proposed method is described in Section 3, which is followed by the experiments and results in Section 4. Conclusions and future work are presented in Section 5.
2 Background

In this section, we briefly review principal component analysis and common principal component analysis. For more details, please refer to [10, 8].
2.1 Principal Component Analysis

Principal Component Analysis (PCA) has been widely used for multivariate data analysis and dimension reduction [10]. Intuitively, PCA is a process to identify the directions, i.e., principal components (PCs), where the variances of scores (orthogonal projections of data points onto the directions) are maximized and the residual errors are minimized, assuming the least square distance. These directions, in non-increasing order, explain the variations underlying the original data points. That is, the first principal component describes the largest variation, the subsequent direction explains the next largest variance, and so on. Figure 1(a) illustrates the principal components obtained from a very simple (though unrealistic) multivariate data item with only two variables (X1, X2) measured on 30 observations.

[Figure 1: panel (a) plots the data in the (X1, X2) plane and shows PC1 = (cos α1)X1 + (cos β1)X2 and PC2, with the score of a point given by its orthogonal projection onto PC1; panel (b) shows the first principal components PC1_A and PC1_B of two data items A and B, and the common component CPC1_AB that bisects the angle between them.]

Fig. 1. (a) Two principal components obtained for one multivariate data item with two variables X1 and X2 measured on 30 observations. (b) A common principal component of two multivariate data items with the same variables X1 and X2 measured on 20 and 30 observations, respectively.

Geometrically, a principal component is a linear transformation of the original variables. The coefficients defining this transformation are called loadings. For example, the first principal component (PC1) in Figure 1(a) can be described as a linear combination of the original variables X1 and X2, and the two coefficients (loadings) defining PC1 are the cosines of the angles between PC1 and the variables X1 and X2, respectively. In addition, the higher loading value of variable X2 implies that X2 is more dominant in PC1 than X1. The loadings are thus interpreted as the contributions or weights of the variables in determining the directions.
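To make the loading interpretation concrete, here is a minimal sketch in Python with NumPy (our stand-in for the paper's Matlab implementation; all names are ours) that obtains the loadings of one data item from the SVD of its correlation matrix, which is also how Algorithm 1 in Section 3.1 computes them:

```python
import numpy as np

def pca_loadings(X):
    """PCs of one data item (m observations x n variables) via the SVD
    of its n x n correlation matrix. Row i of the returned loadings
    holds the weights of the original variables on the ith PC."""
    R = np.corrcoef(X, rowvar=False)      # n x n correlation matrix
    U, S, Ut = np.linalg.svd(R)           # R = U diag(S) U^T (R symmetric)
    percent_var = 100.0 * S / S.sum()     # variation explained per PC
    return Ut, percent_var

# Toy example echoing Figure 1(a): 30 observations of two correlated variables.
rng = np.random.default_rng(0)
x1 = rng.normal(size=30)
X = np.column_stack([x1, 0.8 * x1 + 0.2 * rng.normal(size=30)])
loadings, pv = pca_loadings(X)
print(loadings[0], pv[0])                 # PC1 loadings and its variation share
```

Since the correlation matrix is symmetric positive semidefinite, its SVD coincides with its eigendecomposition, so the rows of U^T are its eigenvectors and S holds the variances explained by the PCs.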
2.2 Common Principal Component Analysis

Common Principal Component Analysis (CPCA) is a generalization of PCA for N (≥ 2) multivariate data items, where the ith data item (i = 1, ..., N) is represented as an m_i × n matrix [8, 11]. CPCA is based on the assumption that there exists a common subspace across all multivariate data items and that this subspace should be spanned by orthogonal components. One approach proposed in [8] obtains the Common Principal Components (CPCs) by bisecting the angles between the items' principal components after each multivariate data item undergoes PCA. That is, first, each of the multivariate data items is described by its first p principal components. Then, the CPCs are obtained by successively bisecting the angles between their jth (j = 1, ..., p) principal components. These CPCs define the common subspace that agrees most closely with every subspace of the multivariate data items. Figure 1(b) gives a plot of two multivariate data items A and B. Let A and B be denoted as swarms of white and black points, respectively, and have the same variables, i.e., X1 and X2, measured on 20 and 30 observations, respectively. The first principal component of each dataset is obtained using PCA, and the common component is obtained by bisecting the angle between those two first principal components. We will refer to this CPC model as the Descriptive Common Principal Component (DCPC) model.

Each common principal component loading for DCPCs, e.g., the (i, j)th element in a matrix that contains the DCPCs, can be interpreted as the contribution of the jth original variable with respect to the ith common principal component, which is analogous to the way the principal component loadings are interpreted.
3 Proposed Method

We propose a novel variable subset selection method for multivariate time series (MTS) based on common principal component analysis (CPCA), named CLeVer. Figure 2 illustrates the entire process of CLeVer, which involves three phases: (1) principal components (PCs) computation per MTS item, (2) descriptive common principal components (DCPCs) computation per label³ and their concatenation, and (3) variable subset selection using the DCPC loadings of the variables. Each of these phases is described in the subsequent sections. Table 1 lists the notations used in the remainder of this paper, if not specified otherwise.

³ The MTS datasets considered in our analysis are composed of labeled MTS items. See Section 4.1 for details.
Table 1. Notations used in this paper

Symbol  Definition
N       number of MTS items in an MTS dataset
n       number of variables in an MTS item
K       number of clusters for K-means clustering
p       number of PCs for each MTS item to be used for computing DCPCs
C       set of labels, i.e., {C_1, ..., C_|C|}
|C|     number of unique labels
3.1 PC and DCPC Computations

The first and second phases (except the concatenation) of CLeVer are incorporated into Algorithm 1, which obtains the PCs and then the DCPCs consecutively. The required input to Algorithm 1 is a set of MTS items with the same label.

[Figure 2: a flow diagram. An MTS dataset, grouped by label (Label A, Label B), passes through (I) principal components computation per MTS item (PC1, ..., PCn), (II) descriptive common principal components computation per label (DCPC1, ..., DCPCp over the variables V1, ..., Vn) and their concatenation, and (III) K-means clustering on the DCPC loadings and variable selection, which yields the selected variables (one per cluster).]

Fig. 2. The process of CLeVer.

Though there are n PCs for each item, only the first p (< n) PCs, which are adequate for the purpose of representing each MTS item, are taken into consideration. It is commonplace that p is determined based on the ratio of the sum of the variances explained by the first p PCs to the total variance underlying the original MTS item, which should exceed, e.g., at least 0.8. Algorithm 1 takes this sum of variation, i.e., the threshold to determine p, as an input. That is, for each input MTS item, p is determined to be the minimum value such that the total variation explained by its first p PCs exceeds the provided threshold δ for the first time (Lines 3–10). Since the MTS items can have different values for p, p is finally determined as their maximum value (Line 11).
All MTS items are now described by their first p principal components. Let them be denoted as L_i (i = 1, ..., N). Then, the DCPCs that agree most closely with all N sets of p PCs are successively defined by the eigenvectors of the matrix H = Σ_{i=1}^{N} L_i^T L_i [8]:

    SVD(H) = SVD(Σ_{i=1}^{N} L_i^T L_i) = V Λ V^T        (1)

where the rows of V^T are the eigenvectors of H, and the first p of them define the p DCPCs for the N MTS items. Λ is a diagonal matrix whose diagonal elements are the eigenvalues of H and describe the total discrepancy between the DCPCs and the PCs. For example, the first eigenvalue implies the overall closeness of the first DCPC to the first PC of every MTS item (for more details, please refer to [8]). This computation of the DCPCs is captured by Lines 16–17.
Algorithm 1 ComputeDCPC: PC and DCPC Computations

Require: MTS data group with N items, and δ {a predefined threshold}
 1: DCPC ← ∅
 2: H[0] ← ∅
 3: for i = 1 to N do
 4:   X ← the ith MTS item
 5:   [U, S, U^T] ← SVD(correlation matrix of X)
 6:   loading[i] ← U^T
 7:   variance ← diag(S)
 8:   percentVar ← 100 × (variance / Σ_{j=1}^{n} variance_j)
 9:   p_i ← number of the first percentVar elements whose cumulative sum ≥ δ
10: end for
11: p ← max(p_1, p_2, ..., p_N)
12: for i = 1 to N do
13:   L[i] ← the first p rows of loading[i]
14:   H[i] ← H[i−1] + (L[i]^T × L[i])
15: end for
16: [V, S, V^T] ← SVD(H)
17: DCPC ← the first p rows of V^T
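The following is a minimal Python/NumPy sketch of Algorithm 1 under our reading of it; the paper's own implementation is in Matlab, and the function and variable names here are ours:

```python
import numpy as np

def compute_dcpc(items, delta=80.0):
    """Sketch of Algorithm 1. items: list of m_i x n arrays sharing one
    label; delta: percent-of-variation threshold for choosing p."""
    loadings, ps = [], []
    for X in items:                                   # Lines 3-10
        R = np.corrcoef(X, rowvar=False)
        _, S, Ut = np.linalg.svd(R)
        loadings.append(Ut)
        cum = np.cumsum(100.0 * S / S.sum())
        ps.append(int(np.searchsorted(cum, delta)) + 1)   # Line 9
    p = max(ps)                                       # Line 11
    n = loadings[0].shape[0]
    H = np.zeros((n, n))
    for Ut in loadings:                               # Lines 12-15
        L = Ut[:p, :]
        H += L.T @ L
    _, _, Vt = np.linalg.svd(H)                       # Line 16
    return Vt[:p, :]                                  # Line 17: the p DCPCs
```

The result is a p × n matrix; its columns are the per-variable loading vectors that Section 3.2 clusters.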
3.2 CLeVer

CLeVer utilizes a clustering method to group similar variables together and select the least redundant of them, as described in Algorithm 2. It takes the MTS items and their label information as inputs. First, the DCPCs per label are computed by Algorithm 1 and are concatenated if the MTS dataset has more than one label (Lines 3–10). Subsequently, K-means clustering is performed on the columns of the concatenated DCPC loadings. That is, each column becomes one item for clustering. Hence, column vectors with similar patterns of contributions to each of the DCPCs will be clustered together. The intuition behind using a clustering technique for the variable selection is the observation that variables with similar patterns of loading values will be highly correlated and have high mutual information [12]. Since the K-means clustering method can reach local minima, we iterate the K-means clustering 20 times (Lines 11–14).

The next step is the actual variable selection, which involves deciding the representatives of the clusters. Once the clustering is done, the one column vector closest to the centroid vector of each cluster is chosen as the representative of that cluster. The other columns within each cluster can therefore be eliminated. Finally, the original variable corresponding to each selected column is identified; these variables form the selected subset, with the least redundant and possibly the most related information for the given K.
Algorithm 2 CLeVer

Require: MTS dataset, |C| {the number of unique labels}, K {the number of clusters}, and δ {a predefined threshold}
 1: S_best ← ∅
 2: DCPC ← ∅
 3: if |C| ≤ 1 then
 4:   DCPC ← computeDCPC(MTS, δ)
 5: else
 6:   for i = 1 to |C| do
 7:     dc[i] ← computeDCPC(MTS with the label C_i, δ)
 8:     DCPC ← Concatenate(DCPC, dc[i])
 9:   end for
10: end if
11: for i = 1 to 20 do
12:   (cnt[i], idx[i]) ← Kmeans(DCPC loadings, K)
13:   dist[i] ← Σ_{j=1}^{K} ED(cnt[i][j], items in the jth cluster)
14: end for
15: best ← argmin_i(dist[1], ..., dist[20])
16: (cnt_best, idx_best) ← (cnt[best], idx[best])
17: S_best ← extract the K column vectors closest to cnt_best within each cluster and identify the corresponding variables
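A minimal Python sketch of Algorithm 2's clustering and selection phase, using scikit-learn's KMeans with n_init=20 restarts to play the role of Lines 11–14; compute_dcpc is the sketch given after Algorithm 1, and the remaining names are ours:

```python
import numpy as np
from sklearn.cluster import KMeans

def clever(items, labels, K, delta=80.0):
    """Sketch of Algorithm 2: stack per-label DCPCs, cluster the
    per-variable loading columns, keep one variable per cluster."""
    dcpc = np.vstack([
        compute_dcpc([X for X, y in zip(items, labels) if y == c], delta)
        for c in sorted(set(labels))])                 # Lines 3-10
    cols = dcpc.T                                      # one row per variable
    km = KMeans(n_clusters=K, n_init=20).fit(cols)     # Lines 11-14
    selected = []
    for j in range(K):                                 # Lines 15-17
        members = np.where(km.labels_ == j)[0]
        d = np.linalg.norm(cols[members] - km.cluster_centers_[j], axis=1)
        selected.append(int(members[np.argmin(d)]))    # closest to centroid j
    return sorted(selected)
```

Picking the member closest to each centroid, rather than the centroid itself, guarantees that the representatives are actual variables of the dataset.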
4 Performance Evaluation

We evaluate the effectiveness of CLeVer in terms of classification performance and processing time. We conducted several experiments on three real-world datasets. After obtaining a subset of variables using CLeVer, we performed classification using a Support Vector Machine (SVM) with a linear kernel. Subsequently, we compared the performance of CLeVer with those of Recursive Feature Elimination (RFE) [2], Fisher Criterion (FC), Exhaustive Search Selection (ESS), and using all the available variables (ALL). The CLeVer algorithm for the experiments is implemented in Matlab. SVM classification is performed with LIBSVM [13].
4.1 Datasets

The HumanGait dataset [6] has been used for identifying a person by recognizing his/her gait at a distance. In order to capture the gait data, a twelve-camera VICON system was utilized, with 22 reflective markers attached to each subject. 15 subjects, which are the labels assigned to the dataset, participated in the experiments. The total number of data items is 540.

The Motor Behavior and Rehabilitation Laboratory, University of Southern California collected the Brain and Behavior Correlates of Arm Rehabilitation (BCAR) kinematics dataset to study the effect of Constraint-Induced (CI) physical therapy on post-stroke patients' control of the upper extremity [14]. The functional task performed by the subjects was a continuous 3-phase reach-grasp-place action. Four control (i.e., healthy) subjects and three post-stroke subjects experiencing different levels of impairment participated in the experiments. The total number of data items is 39.

The Brain Computer Interface (BCI) dataset from the Max Planck Institute (MPI) [7] was collected to examine the relationship between brain activity and motor imagery, i.e., the imagination of limb movements. 39 electrodes were placed on the scalp to record the EEG signals at a rate of 256 Hz. The total number of items is 2000, i.e., 400 items per subject.

Table 2 shows a summary of the datasets used in the experiments.
Table 2. Summary of datasets used in the experiments

                      HumanGait  BCAR   BCI MPI
# of variables        66         11     39
average length        133        454    1280
# of labels           15         2      2
# of items per label  36         22/17  1000
total # of items      540        39     2000
4.2 Classification Performance

We first evaluated the effectiveness of CLeVer in terms of classification accuracy. A Support Vector Machine (SVM) with a linear kernel was adopted as the classifier. Using SVM, we performed leave-one-out cross validation for the BCAR dataset and 10-fold stratified cross validation [15] for the other datasets, since they have too many items for leave-one-out cross validation to be practical.
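As a sketch of this protocol in Python, with scikit-learn standing in for LIBSVM (each MTS item is assumed to have already been turned into a fixed-length feature vector by one of the vectorizations described next; the function name is ours):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

def cv_accuracy(vectors, y, leave_one_out=False):
    """Mean accuracy of a linear-kernel SVM under leave-one-out or
    10-fold stratified cross validation."""
    cv = LeaveOneOut() if leave_one_out else StratifiedKFold(n_splits=10)
    clf = SVC(kernel="linear", C=1.0)              # default regularization, C = 1
    return cross_val_score(clf, np.asarray(vectors), np.asarray(y), cv=cv).mean()
```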
For an MTS dataset to be fed into SVM, each of the MTS items should be represented as a vector of the same dimension, which we call vectorization. In [16], a correlation matrix is utilized to compute the similarity between two MTS items, which outperformed DTW and Euclidean distance with linear interpolation in terms of precision/recall. Hence, for CLeVer, Exhaustive Search Selection (ESS), and using all the variables (ALL), we vectorized each MTS item using the upper triangle of its correlation matrix.
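A sketch of this vectorization (the function name is ours; we omit the constant unit diagonal and keep the strict upper triangle):

```python
import numpy as np

def vectorize_corr(X, selected):
    """Vectorize one MTS item (m x n) by the upper triangle of the
    correlation matrix over the selected variables, following [16]."""
    R = np.corrcoef(X[:, selected], rowvar=False)
    iu = np.triu_indices_from(R, k=1)      # strict upper triangle
    return R[iu]                           # k*(k-1)/2 values for k variables
```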
For RFE and FC, we vectorized each MTS item as in [7]. That is, each variable is represented by its autoregressive (AR) fit coefficients of order 3, computed using forward-backward linear prediction [17]. Therefore, each MTS item with n variables is represented as a vector of size n × 3. The Spider [18] implementation of FC is subsequently employed. For the small datasets, i.e., BCAR and HumanGait, RFE within The Spider [18] was employed, while for the large dataset, i.e., BCI MPI, one of the LIBSVM tools [13] was modified and utilized. Note that the ESS method was performed only on the BCAR dataset, due to the intractability of ESS for the large datasets.
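A sketch of this AR vectorization, where a plain least-squares AR fit serves as a simple stand-in for the forward-backward linear prediction of [17]:

```python
import numpy as np

def vectorize_ar(X, order=3):
    """Vectorize one MTS item (m x n) by per-variable AR(order)
    coefficients, giving a vector of size n * order, as in [7]."""
    m, n = X.shape
    coeffs = []
    for j in range(n):
        x = X[:, j]
        # Regress x[t] on x[t-1], ..., x[t-order] by least squares.
        A = np.column_stack([x[order - k - 1 : m - k - 1] for k in range(order)])
        a, *_ = np.linalg.lstsq(A, x[order:], rcond=None)
        coeffs.append(a)
    return np.concatenate(coeffs)
```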
[Figure 3: panels (a), (c), and (d) plot classification accuracy (%) against the number of selected variables for the HumanGait, BCAR, and BCI MPI datasets, respectively, comparing CLeVer, FC, and RFE against using all variables (ALL); panel (c) additionally shows the ESS MIN/AVG/MAX curves, and panel (d) the MIC 17 baseline. Panel (b) shows the 22 marker positions on the human body.]

Fig. 3. (a) Classification evaluation for the HumanGait dataset. (b) 22 markers for the HumanGait dataset; the markers with a filled circle represent the 16 markers from which the 27 variables are selected by CLeVer, which yields the same performance accuracy as using all the 66 variables. Classification evaluations for (c) the BCAR dataset and (d) the BCI MPI dataset.

Figure 3(a) presents the generalization performances on the HumanGait dataset. The X axis is the number of selected variables, i.e., the number of clusters K, and the Y axis is the classification accuracy. It shows that a subset of 27 variables selected by CLeVer out of 66 performs the same as using all the variables, i.e., 99.4% accuracy. The 27 variables selected by CLeVer come from only 16 markers (marked with a filled circle in Figure 3(b)) out of 22, which would mean that the values generated by the remaining 6 markers do not contribute much to the identification of the person. From this information we may be able to better understand the characteristics of human walking.

The performances of RFE and FC are much worse than those of CLeVer. Even when using all the variables, the classification accuracy is around 55%. Considering the fact that RFE on 3 AR coefficients performed well in [7], this may indicate that for the HumanGait dataset the correlation information among variables is more important than for the BCI MPI dataset. Hence, each variable should not be taken out separately to compute the autoregressive coefficients, by which the correlation information would be lost. Note that in [7], the order 3
for the autoregressive fit was identified after proper model selection experiments, which would mean that for the HumanGait dataset, the order of the autoregressive fit should again be determined by comparing models of different orders. This shows that it is not a trivial task to transform an MTS item into a vector to which traditional machine learning techniques, such as Support Vector Machines (SVMs), can be applied.

Figure 3(c) shows the classification performance of the selected variables on the BCAR dataset. BCAR is the simplest dataset, with 11 original variables, and the number of MTS items is just 39. Hence, we applied the Exhaustive Search Selection (ESS) method to evaluate all the possible variable combinations, for each of which we performed leave-one-out cross validation. The result of ESS shows that 100% classification accuracy can be achieved with no fewer than 6 variables out of 11. The dotted lines represent the best, the average, and the worst performance obtained by ESS, respectively. The result shows that CLeVer consistently outperforms the RFE and Fisher methods. Figure 3(c) also shows that the 7 variables selected by CLeVer produce about 100% classification accuracy, which is even better than using all the 11 variables, represented as a horizontal solid line. This implies that CLeVer does not eliminate useful information in its variable selection process.
Figure 3(d) presents the performance comparison on the BCI MPI dataset⁴. It shows that when the number of selected variables is less than 10, RFE performs better than CLeVer and the FC technique. When the number of selected variables is greater than 10, however, CLeVer performs far better than RFE. The classification performance using the 17 motor imagery channels (MIC 17) is presented as a dashed line, while the performance using all the variables is shown as a solid horizontal line. Using the 17 variables selected by CLeVer, the classification accuracy is 72.85%, which is very close to the performance of MIC 17, whose accuracy is 73.65%. Note again that even using all the variables, the performance of RFE is worse than that of the 15 variables selected by CLeVer.

⁴ Unlike in [7], where the feature subset selection was performed per subject, the whole set of items from the 5 subjects was utilized in our experiments. Moreover, the regularization parameter C was estimated via 10-fold cross validation from the training datasets in [7], while we used the default value, which is 1.
4.3 Processing Time

Table 3 summarizes the processing time of the 3 feature selection methods employed for the experiments. The processing time for CLeVer includes the time to perform Algorithm 1 and the average time to perform the clustering and obtain the variable subsets while varying K from 2 to the number of all variables for each dataset. For example, K ranges from 2 to 66 for the HumanGait dataset. The processing time for RFE and FC includes the time to obtain the 3 autoregressive fit coefficients and to perform the feature subset selection.

Table 3. Comparison of processing time in seconds for different feature selection methods on 3 different datasets

        HumanGait  BCAR    BCI MPI
CLeVer  6.2186     0.2416  48.0381
RFE     962.063    9.0390  7886.844
FC      113.907    6.4690  7594.9413

RFE is a wrapper feature selection method [15]. That is, RFE utilizes the classifier within the feature selection procedure to select the features which produce the best classification precision. Intuitively, CLeVer utilizes how much the variables contribute to the DCPCs in order to determine their importance and similarity. Since CLeVer does not include a classification procedure, it takes less time to yield the feature subset than wrapper methods. For example, for the HumanGait dataset, CLeVer took less than 6 seconds to compute the DCPCs, and a couple of seconds to perform K-means clustering on the loadings of the DCPCs. Overall, CLeVer takes up to 2 orders of magnitude less time than RFE, while performing up to about 100% better than RFE.
5 Conclusion and Future Work

In this paper, we proposed a novel feature subset selection method for multivariate time series (MTS), based on common principal component analysis (CPCA), termed CLeVer. CLeVer utilizes the properties of the descriptive common principal components (DCPCs) to retain the correlation information among the variables. Subsequently, CLeVer performs clustering on the DCPC loadings to select a subset of variables. Our experiments on three real-world datasets show that CLeVer outperforms other feature selection methods, such as Recursive Feature Elimination (RFE) and Fisher Criterion (FC), in terms of classification performance. Moreover, CLeVer takes up to 2 orders of magnitude less processing time than RFE.

We intend to extend this research in two directions. First, we plan to extend this work to estimate the optimal number of variables, i.e., the optimal number of clusters K, using, e.g., the Gap statistic [19]. We also plan to generalize this research and use k-way PCA [20] to perform PCA on a k-way array.
Acknowledgement

This research has been funded in part by NSF grants EEC-9529152 (IMSC ERC), IIS-0238560 (PECASE) and IIS-0307908, and unrestricted cash gifts from Microsoft. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. The authors would like to thank Dr. Carolee Winstein and Jarugool Tretiluxana for providing us with the BCAR dataset and valuable feedback, and Thomas Navin Lal for providing us with the BCI MPI dataset.
References

1. Liu, H., Yu, L., Dash, M., Motoda, H.: Active feature selection using classes. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. (2003)
2. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. Journal of Machine Learning Research 3 (2003) 1157–1182
3. Tucker, A., Swift, S., Liu, X.: Variable grouping in multivariate time series via correlation. IEEE Trans. on Systems, Man, and Cybernetics, Part B 31 (2001)
4. Kadous, M.W.: Temporal Classification: Extending the Classification Paradigm to Multivariate Time Series. PhD thesis, University of New South Wales (2002)
5. Shahabi, C.: AIMS: An immersidata management system. In: CIDR Biennial Conference on Innovative Data Systems Research. (2003)
6. Tanawongsuwan, R., Bobick, A.: Performance analysis of time-distance gait parameters under different speeds. In: 4th International Conference on Audio- and Video-Based Biometric Person Authentication, Guildford, UK (2003)
7. Lal, T.N., Schröder, M., Hinterberger, T., Weston, J., Bogdan, M., Birbaumer, N., Schölkopf, B.: Support vector channel selection in BCI. IEEE Trans. on Biomedical Engineering 51 (2004)
8. Krzanowski, W.: Between-groups comparison of principal components. Journal of the American Statistical Association 74 (1979)
9. Kohavi, R., John, G.H.: Wrappers for feature subset selection. Artificial Intelligence 97 (1997) 273–324
10. Jolliffe, I.T.: Principal Component Analysis. Springer (2002)
11. Flury, B.N.: Common principal components in k groups. Journal of the American Statistical Association 79 (1984) 892–898
12. Cohen, I., Tian, Q., Zhou, X.S., Huang, T.S.: Feature selection using principal feature analysis. University of Illinois at Urbana-Champaign (2002)
13. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm/ (2004)
14. Winstein, C., Tretriluxana, J.: Motor skill learning after rehabilitative therapy: Kinematics of a reach-grasp task. In: the Society for Neuroscience, San Diego, USA (2004)
15. Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann (2000), Chapter 3, p. 121
16. Yang, K., Shahabi, C.: A PCA-based similarity measure for multivariate time series. In: The Second ACM MMDB. (2004)
17. Moon, T.K., Stirling, W.C.: Mathematical Methods and Algorithms for Signal Processing. Prentice Hall (2000)
18. Weston, J., Elisseeff, A., BakIr, G., Sinz, F.: Spider: object-orientated machine learning library. http://www.kyb.tuebingen.mpg.de/bs/people/spider/ (2004)
19. Tibshirani, R., Walther, G., Hastie, T.: Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63 (2001) 411–423
20. Leibovici, D., Sabatier, R.: A singular value decomposition of a k-way array for a principal component analysis of multiway data, PTA-k. Linear Algebra and its Applications (1998)
Conceptually similar

USC Computer Science Technical Reports, no. 868 (2005)
USC Computer Science Technical Reports, no. 855 (2005)
USC Computer Science Technical Reports, no. 733 (2000)
USC Computer Science Technical Reports, no. 721 (2000)
USC Computer Science Technical Reports, no. 785 (2003)
USC Computer Science Technical Reports, no. 869 (2005)
USC Computer Science Technical Reports, no. 744 (2001)
USC Computer Science Technical Reports, no. 840 (2005)
USC Computer Science Technical Reports, no. 736 (2000)
USC Computer Science Technical Reports, no. 739 (2001)
USC Computer Science Technical Reports, no. 587 (1994)
USC Computer Science Technical Reports, no. 943 (2014)
USC Computer Science Technical Reports, no. 742 (2001)
USC Computer Science Technical Reports, no. 650 (1997)
USC Computer Science Technical Reports, no. 968 (2016)
USC Computer Science Technical Reports, no. 645 (1997)
USC Computer Science Technical Reports, no. 694 (1999)
USC Computer Science Technical Reports, no. 835 (2004)
USC Computer Science Technical Reports, no. 966 (2016)
USC Computer Science Technical Reports, no. 799 (2003)