USC Computer Science Technical Reports, no. 646 (1997)
Analysis and Design of Server Informative WWW-sites

Amir M. Zarkesh†, Jafar Adibi‡, Cyrus Shahabi‡, Reza Sadri††, and Vishal Shah‡

† Quad Design Group, Viewlogic Systems, Camarillo, CA; azarkesh@qdt.com
‡ Integrated Media Systems Center, University of Southern California, Los Angeles, CA; {adibi, shahabi, vishalsh}@usc.edu
†† Computer Science Department, University of California, Los Angeles, CA; reza@cs.ucla.edu
Abstract
The access patterns of the users on a web-site are traditionally investigated in order to improve the user access to the site's information. In this study, however, a systematic approach is introduced in order to analyze the users' navigation path to the advantage of the web-site owner. As users navigate through a web-site, they are transparently filling a questionnaire generated by the web-site owner. We first cluster the users who navigate similar paths, employing the Path Mining algorithm. Next, the correlation between a set of target questions and the structure of the WWW-site is quantified. This has been done by borrowing the concept of channel from information theory. A channel can be considered as an information bridge between the users' path classes and the answers to a questionnaire. By adopting many concepts from information theory, we introduce a natural measure to compute the effectiveness of a WWW-site structure in answering the target questionnaire. Using this measure, we provide a set of design guidelines to make WWW-sites more informative for the server. To find the parameters of a channel, we propose a learning process based on a set of training data and/or inputs from a human expert. Finally, our proposed approach is tested on a sample WWW-site, and the results demonstrate dramatic improvement in the server information passing.

Authors' work supported by NSF grant EEC (IMSC ERC).
1 Introduction
An important task for the marketing department of a business is to capture the characteristics of its potential customers. A traditional way to achieve this is the survey method: a questionnaire is sent to each customer, who is asked to fill it in. This method is usually not very successful, for two reasons. First, filling in a questionnaire is cumbersome. Second, there is no guarantee that the customers' replies correlate closely with their real interests. In this paper, we suggest a method to capture customers' characteristics as they navigate through a business web-site. We demonstrate how this navigation path can be considered as filling in a transparent target questionnaire. This method eliminates both of the above-mentioned problems with the traditional survey method.

In Ref. [ ], we introduced a method for gathering accurate users' navigation paths, employing a Java applet implementation. Moreover, a clustering algorithm termed Path Mining was applied to the users' navigation paths. As a result, users of a web-site can be clustered into a number of classes. The key observation is that people in the same class share the same interests.
This paper is a follow-up study to Ref. [ ], and it provides the techniques needed for the next step in knowledge discovery from WWW users' navigation paths. Hence, in this step we can assume that the classes of users with similar paths are given. The main issue now is to relate these classes to the answers to the target questions. We show how information theory can be used to address this issue. The relation between the answers to the questionnaire and the path classes can be conceptualized as an information channel. A channel has a precise probabilistic definition in information theory, based on conditional probabilities. These channel parameters are termed channel characteristics. Moreover, we use the concept of mutual information of a channel to quantify the channel effectiveness. The higher the capacity of a channel, the richer the information that can be extracted from a WWW-site. The definition and the calculation method of this WWW-site effectiveness measure is one major contribution of this paper.
Using the above, we construct a systematic method to compare the effectiveness of different WWW-site designs in passing information to the server. We also discuss some basic design rules which can improve the efficiency of a WWW-site to the benefit of the server owner. We then address the tradeoffs between using these design rules and using those which make the user's access to the site more efficient.

As a byproduct of constructing the above-mentioned machinery, we introduce an effective procedure for knowledge discovery on users' interests. First, the parameters of a discovery channel are characterized. During the same process, similar users are classified based on our Path Mining algorithm. Subsequently, for each new user, the tendency to become a member of each class is determined. Finally, the probability distribution of the answers of each user to the questionnaire is calculated.

Characterizing the channel is the challenging step in making our formalism useful in practice. We propose two learning methods for this purpose: gathering inputs from a human expert, and automatic learning from a set of training data. In practice, a combination of both yields the best result.
The rest of this paper is organized as follows. Sec. 2 reviews Path Mining and its application to clustering WWW navigation paths, and introduces a sample WWW-site that is used in the rest of the paper. In Sec. 3, a discovery channel between the answers to the target questionnaire and the navigation path classes is constructed, and the mutual information is introduced. Next, Sec. 4 explains a learning process to find the discovery map. A set of general design rules to construct more effective WWW-sites is presented in Sec. 5. In Sec. 6, our WWW-site is used as a test bed to show the merits of our approach. Finally, Sec. 7 concludes the paper.
2 Clusterization of the Navigation Paths
In this section, we briefly review the Path Mining methodology in the context of its application to clustering WWW navigation paths. For more details, please consult the cited papers.

The structure of a WWW-site can be shown by its connectivity graph. A connectivity graph is a directed graph, with possible cycles, where WWW pages are graph nodes and hypertext links are graph edges. To illustrate the idea, a sample site for a hypothetical entertainment center has been developed. This site provides entertainment-related information such as news about the latest music and movies. Its structure covers different issues in a WWW site, such as links with identical source and target pages but appearing in different contexts, links to outside of the site, links from outside of the site, etc. For a more detailed description of our sample site, please consult Ref. [ ]. This sample site is used frequently in the rest of this paper.
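As a minimal sketch of the connectivity graph described above, the structure can be held as an adjacency mapping from each page to the pages it links to. The page names are hypothetical and are not the pages of the sample site, and parallel links that differ only in their context are collapsed into a single edge here, which is a simplification.

```python
# Hypothetical connectivity graph: page -> pages reachable by one hyperlink.
# The real sample site is not reproduced here.
SITE = {
    "home": ["music", "movies"],
    "music": ["home", "news"],
    "movies": ["home", "news"],
    "news": ["home"],
}


def is_valid_path(graph, path):
    """Return True if every consecutive pair of pages in `path` is a link."""
    return all(dst in graph.get(src, ()) for src, dst in zip(path, path[1:]))


if __name__ == "__main__":
    # Cycles are allowed: this path revisits "home".
    print(is_valid_path(SITE, ["home", "music", "home", "movies"]))
    # "home" has no direct link to "news" in this toy graph.
    print(is_valid_path(SITE, ["home", "news"]))
```

A navigation path is then simply a sequence of pages in which each consecutive pair is an edge of the graph.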
A user's navigation path can be shown by a path on the connectivity graph. The main idea is that the user's navigation path is correlated with the user's interests. Moreover, the time spent on each page can be considered as another reasonable sign of the user's interest in the page context. The development of a profiler to accurately capture the navigation paths, including the time spent on each page, is reported in Ref. [ ].
For many applications, including the current paper, it is crucial to be able to classify similar paths. Note that a path can include complex structures such as repeated cycles. Furthermore, paths have different lengths that usually vary over a wide range. Intuitively, we can recognize similarity between two paths even if they have different cycles and lengths. Path Mining provides a quantitative measure for similarity among paths and has been used in diverse types of applications.

Path clustering provides a systematic approach to capture important features of a path while avoiding irrelevant details. In this method, the similarity of two paths is measured based on the number of common subpaths at different length scales. Moreover, similarity in the time spent on each common subpath increases the total similarity measure of the two paths.
The core of the Path Mining algorithm is based on mapping each path to a linear feature space where each dimension corresponds to one of the subpaths. Subsequently, the standard techniques available in a linear vector space can be applied. For example, the inner product and the angle between two paths can be defined. In Ref. [ ], the path angle was used to measure the similarity among navigation paths. Having the similarity measure as a scalar number, the classical clustering approaches can be employed to classify similar paths.
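The mapping to a feature space can be illustrated with a small sketch. It assumes that a feature is a contiguous subpath, that the feature value is a plain occurrence count, and that subpaths up to a hypothetical `max_len` pages are considered; the time spent on pages, which the method also uses, is ignored. The function names are illustrative and are not taken from the Path Mining papers.

```python
from collections import Counter
from math import sqrt


def subpath_features(path, max_len=3):
    """Count every contiguous subpath of length 1..max_len (assumed features)."""
    feats = Counter()
    for length in range(1, max_len + 1):
        for start in range(len(path) - length + 1):
            feats[tuple(path[start:start + length])] += 1
    return feats


def path_cosine(a, b, max_len=3):
    """Cosine of the angle between two paths in the subpath feature space."""
    fa, fb = subpath_features(a, max_len), subpath_features(b, max_len)
    dot = sum(fa[s] * fb[s] for s in fa.keys() & fb.keys())
    norm = sqrt(sum(v * v for v in fa.values())) * sqrt(sum(v * v for v in fb.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    a = ["home", "music", "news"]
    b = ["home", "movies", "news"]
    # The two paths share only the single-page subpaths "home" and "news".
    print(path_cosine(a, b))
```

Because all counts are nonnegative, the cosine lies between 0 (no common subpath) and 1 (identical feature vectors), so it can be fed directly to a standard clustering routine as a similarity score.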
Consider a database $D$ of navigation paths which has already been classified into classes $C_1, \ldots, C_n$ of similar paths:

$$D = \bigcup_{k=1}^{n} C_k, \qquad C_k \cap C_l = \emptyset, \quad k \neq l \in \{1, \ldots, n\}.$$

The path database divided by the equivalence relation of path similarity is shown by $\tilde{D} = D/\!\sim$. Each class can be represented by an average path of the class. We denote the representative paths by $\bar{\rho}_1, \ldots, \bar{\rho}_n$. Therefore,

$$\tilde{D} = \{\bar{\rho}_1, \ldots, \bar{\rho}_n\}.$$

Until now we assumed hard class membership, i.e., each path belongs to one and only one class. In the more general framework of Bayesian clusterization [ ], a path can be a member of more than one class with different probabilities, i.e., soft class membership. Path Mining provides a natural and robust framework to calculate the soft membership of a new path $\rho$ in the $k$-th class, based on the cosine of the angle $\theta_k$ between the new path and the average representative $\bar{\rho}_k$ of each class. For the definition of the path angle, please consult Refs. [ ]. This angle can be used as a measure of path similarity. Consequently, we can use the normalized similarity between the new path and each class to estimate the membership probability, i.e.,

$$P(C_k \mid \rho) = \frac{\cos\theta_k}{\sum_{l=1}^{n} \cos\theta_l}.$$
We have to emphasize the importance of the soft membership measure in Path Mining. The similarity among paths can be due to their complex substructures. It is crucial to keep the degree of similarity of a path with different classes, as opposed to transforming it into a threshold-based hard membership. In this case, to compute the properties of a new path p, we calculate the probabilistic average of the properties of the classes, weighted by the probabilities of p being a member of each class. This is a more robust measure as compared to a direct assignment of the properties of the closest class to a new path.

(Due to their definition, path angles are bounded by zero and $\pi/2$, and hence their cosines are nonnegative.)
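The soft membership and the probabilistic average described above can be sketched as follows. The sketch assumes the cosines between the new path and the class representatives have already been computed and are nonnegative; the function names and the numeric values are illustrative only.

```python
def soft_membership(cosines):
    """Normalize nonnegative cosines into class-membership probabilities."""
    total = sum(cosines)
    if total == 0:
        raise ValueError("path is orthogonal to every class representative")
    return [c / total for c in cosines]


def expected_property(memberships, class_values):
    """Probabilistic average of a per-class property, weighted by membership."""
    return sum(p * v for p, v in zip(memberships, class_values))


if __name__ == "__main__":
    # Hypothetical cosines of a new path against three class representatives.
    memberships = soft_membership([0.8, 0.2, 0.6])
    print(memberships)
    # Hypothetical numeric property attached to each class.
    print(expected_property(memberships, [10.0, 20.0, 30.0]))
```

A hard assignment would keep only the class with the largest cosine and discard the remaining two values; the weighted average keeps their contribution.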
3 Discovery Channels
Our objective is to find the answers to some target questions based on the users' interests. As shown in Ref. [ ], different path classes are correlated with people with different sets of interests. Here, however, we strive to find answers to a specific set of target questions for each user. If the path clusters classify the interests of the users, and if the context of those interests and the context of the target questions are highly correlated, then we expect most of the members of the same class to answer the questions similarly. We can quantify this concept using Bayesian conditional probabilities.
3.1 Channel Definition
First, let us set the notation. Variable $q_i$ assumes values $1$ to $l_i$ as the answer to the $i$-th multiple-choice question, where $l_i$ is the number of possible answers. We show the set of answers to $m$ questions by the vector $q = (q_1, \ldots, q_m)$. A class of similar paths is identified by the variable $k$, which assumes values from $\{1, \ldots, n\}$, where $n$ is the number of path classes. The probability that a user with answer vector $q$ belongs to the path class $C_k$ is $P(k \mid q)$. The conditional probabilities $P(k \mid q)$ encode the knowledge about the correlation between the answers to the target questions and the content of the WWW-site. We term the set of $P(k \mid q)$ the channel characteristics. A schematic of the channel is shown in Fig. 1.

Figure 1. Channel Schematic

If the classes of the navigation paths and the channel characteristics are known, then the probability distribution of the answer vector $q$ for a user with navigation path $\rho$ can be computed as follows. First, the probability that the navigation path belongs to each class is computed, i.e., $P(C_k \mid \rho)$ for $k = 1, \ldots, n$. Subsequently, by applying the total probability law, we obtain

$$P(q \mid \rho) = \sum_{k=1}^{n} P(q \mid k)\, P(C_k \mid \rho).$$

Employing Bayes' relation, we can compute $P(q \mid k)$ based on the channel characteristics as

$$P(q \mid k) = \frac{P(k \mid q)\, P(q)}{\sum_{r} P(k \mid r)\, P(r)},$$

where $P(q) = P(q_1, \ldots, q_m)$ are the a priori probabilities of the answers in the population. We also used the following compact notation for the sums over the vectors of answers:

$$\sum_{r} \equiv \sum_{r_1=1}^{l_1} \cdots \sum_{r_m=1}^{l_m}.$$
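The two steps above, Bayes inversion of the channel followed by the total probability law, can be sketched in a few lines. The sketch assumes the answer vectors have been flattened to a single index, that `channel[q][k]` holds the probability of class k given answer q, and that every class has nonzero probability; the names and numbers are illustrative.

```python
def answers_given_class(channel, prior):
    """Bayes inversion: table[k][q] = P(k|q) P(q) / sum_r P(k|r) P(r)."""
    n_classes = len(channel[0])
    table = []
    for k in range(n_classes):
        joint = [channel[q][k] * prior[q] for q in range(len(prior))]
        evidence = sum(joint)  # P(k); assumed nonzero in this sketch
        table.append([j / evidence for j in joint])
    return table


def answers_given_path(table, membership):
    """Total probability: P(q|path) = sum_k P(q|k) P(C_k|path)."""
    n_answers = len(table[0])
    return [
        sum(table[k][q] * membership[k] for k in range(len(table)))
        for q in range(n_answers)
    ]


if __name__ == "__main__":
    channel = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical P(k|q), rows indexed by q
    prior = [0.5, 0.5]                  # hypothetical P(q)
    table = answers_given_class(channel, prior)
    print(answers_given_path(table, [0.5, 0.5]))
```

With n classes and L flattened answers the cost is of order n times L per user, independent of how many distinct paths the database contains.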
It is crucial to observe the importance of using the navigation path classes instead of summing over all possible paths. The reason is that the path database is very rich and complex. Hence, the complexity of computing $P(q \mid \rho)$ for each path in the database is very high. Moreover, as shown in Ref. [ ], the interest of a user can be recognized by considering the features of the user's navigation over different scales. Therefore, if we want to use a direct mapping between paths and the answers, we have to consider all possible subpaths with different lengths, i.e., different scales. The number of these cases is exponentially large, and finding a reliable map for all those cases with some learning process is harder still by an order of magnitude.

The clusterization based on Path Mining has a fundamental property: it encodes the similarity over different scales, where the details are abstracted out in a very natural way. Therefore, we only need to deal with a manageable number of classes with similar paths. This parallels the fact that although people's interest patterns in general might be very complex, in the context of the target questions only classes of users with similar interests, based on their answers, are important.
3.2 Discovery flow
In Sec. 4, we will show how to extract the path classes $C_k$ and the channel $P(k \mid q)$ by learning from training data. Assuming these parameters are provided, here we show how we can discover the most probable answer of a new user to the questionnaire.

The discovery flow is shown in Fig. 2. Consider a new user who navigates the WWW-site. The profiler gathers the navigation path. Then, using the Path Mining algorithm, we find the angle of the new path with the average path in each class, i.e., $\theta_k$. Now the normalized-cosine membership formula of Sec. 2 can be used to estimate the probability of membership in each class. Finally, using the total probability law and Bayes' relation of Sec. 3.1, we can calculate the probability of the answers to the questionnaire. In Sec. 6, we report on an actual implementation of this flow.
Figure 2. Discovery Flow
3.3 Channel Mutual Information
Assume the discovery channel for a WWW-site is given. The next question is: how can we quantify the effectiveness of a WWW-site in finding the answers to the target questions? To analyze this problem, we look at the relation between the probability distributions over the path classes and the answers to the target questions. This question can be systematically answered by using the notion of channel capacity in the context of information theory [ ].

A discovery channel can be conceptualized as a channel that carries information from the answer to a target question to the path classes. The answer vector $q$ can be considered as the value of a random vector $Q$ consisting of $m$ random variables $Q_1, \ldots, Q_m$, where the value of $Q_i$ is in $\{1, \ldots, l_i\}$. On the other hand, the path class number $k \in \{1, \ldots, n\}$ can be considered as the value of a random variable $K$. The mutual information between the answer vector $q$ and the event that a user is a member of class $k$ is given by

$$I(q; k) = \log \frac{P(q \mid k)}{P(q)},$$

which measures the amount of information about the answer to the question which is gained by the knowledge of the user's path class. In the best case, if the user's path class exactly identifies the answer, then $P(q \mid k) = 1$ and $I(q; k) = -\log P(q)$, which is the information value of the answer to the question. On the other hand, if the user's path class is totally irrelevant to the answer, then $P(q \mid k) = P(q)$ and $I(q; k) = 0$, which suggests that the knowledge of the user's path class does not add any information to answer the question. Averaging over the set of questions and path classes, the average mutual information between the answers to the $m$ questions and the $n$ path classes is given by

$$I(Q; K) = \sum_{k=1}^{n} \sum_{q} P(q, k)\, I(q; k),$$

where $P(q, k) = P(q_1, \ldots, q_m, k)$ is the joint probability of the answers to the questions and the path classes.

Using the definitions of entropy and conditional entropy,

$$H(Q) = -\sum_{q} P(q) \log P(q), \qquad H(Q \mid K) = -\sum_{k=1}^{n} \sum_{q} P(q, k) \log P(q \mid k),$$

we can obtain the mutual information as

$$I(Q; K) = H(Q) - H(Q \mid K).$$

The mutual information quantifies how much the uncertainty about the answer to the target question is reduced by knowing to which path class the user belongs. We consider all logarithms in base 2, and hence the unit of information is bits.
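The average mutual information can be computed directly from the class-conditional answer probabilities and the class probabilities. The following sketch assumes flattened answer indices, with `cond[k][q]` holding the probability of answer q given class k and `prior_k[k]` the probability of class k; both names are illustrative. Terms with zero probability are skipped, following the usual convention that they contribute nothing.

```python
from math import log2


def mutual_information(cond, prior_k):
    """Average mutual information I(Q;K) in bits."""
    n_answers = len(cond[0])
    # Marginal P(q) = sum_k P(k) P(q|k).
    p_q = [
        sum(prior_k[k] * cond[k][q] for k in range(len(cond)))
        for q in range(n_answers)
    ]
    total = 0.0
    for k, row in enumerate(cond):
        for q, p_q_given_k in enumerate(row):
            if p_q_given_k > 0 and prior_k[k] > 0:
                # P(q,k) * log2(P(q|k) / P(q))
                total += prior_k[k] * p_q_given_k * log2(p_q_given_k / p_q[q])
    return total


if __name__ == "__main__":
    # Path class identifies the answer exactly: one full bit is gained.
    print(mutual_information([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5]))
    # Path class is irrelevant to the answer: nothing is gained.
    print(mutual_information([[0.5, 0.5], [0.5, 0.5]], [0.5, 0.5]))
```

The two calls reproduce the best case and the irrelevant case discussed in the text for a single yes/no question with equally likely answers.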
If the target questions are independent and have independent conditional probabilities, i.e.,

$$P(q) = \prod_{i=1}^{m} P(q_i), \qquad P(q \mid k) = \prod_{i=1}^{m} P(q_i \mid k),$$

then the expressions above for the entropies and the mutual information can be simplified to the following form:

$$I(Q; K) = \sum_{i=1}^{m} I(Q_i; K),$$

where $I(Q_i; K)$ is the mutual information for the $i$-th question, and we obtain

$$I(Q_i; K) = H(Q_i) - H(Q_i \mid K),$$

$$H(Q_i) = -\sum_{q_i=1}^{l_i} P(q_i) \log P(q_i), \qquad H(Q_i \mid K) = -\sum_{k=1}^{n} \sum_{q_i=1}^{l_i} P(q_i, k) \log P(q_i \mid k),$$

and hence the mutual information in different channels can be considered independently. However, in practice it is hard to make all questions independent. For example, in our sample WWW-site, the questions "Age" and "Sex Category" are fairly independent. However, the question "Education level" is dependent on the latter one. Therefore, the simplified versions of the equations can be applied only if one can make the questions independent.
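Under the independence assumptions just stated, each question can be scored on its own and the scores added. The sketch below computes the per-question term as an entropy difference; `cond_i[k][a]` is assumed to hold the probability of answer a to question i given class k, and the example distributions are made up.

```python
from math import log2


def entropy(dist):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * log2(p) for p in dist if p > 0)


def question_information(cond_i, prior_k):
    """I(Q_i;K) = H(Q_i) - H(Q_i|K) for one question."""
    n_answers = len(cond_i[0])
    marginal = [
        sum(pk * row[a] for pk, row in zip(prior_k, cond_i))
        for a in range(n_answers)
    ]
    # H(Q_i|K) = sum_k P(k) H(Q_i | K = k)
    conditional = sum(pk * entropy(row) for pk, row in zip(prior_k, cond_i))
    return entropy(marginal) - conditional


def total_information(per_question_cond, prior_k):
    """Sum of per-question terms; valid only under the independence assumptions."""
    return sum(question_information(c, prior_k) for c in per_question_cond)


if __name__ == "__main__":
    prior_k = [0.5, 0.5]
    informative = [[1.0, 0.0], [0.0, 1.0]]    # class fully determines the answer
    uninformative = [[0.5, 0.5], [0.5, 0.5]]  # class says nothing about the answer
    print(total_information([informative, uninformative], prior_k))
```

If the questions are in fact dependent, this sum can differ from the joint quantity, which is why the text restricts the simplified form to independent questions.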
Figure 3. Learning Flow
4 Learning Process
This section determines the parameters needed for measuring the channel mutual information, based on a learning methodology. The channel characteristics $P(k \mid q)$ and the a priori probabilities $P(q)$ are the parameters that should be determined in order to compute the mutual information $I(q; k)$.

The learning procedure consists of two major phases. The first phase, in which we cluster the paths in the database, is an unsupervised learning method. In Ref. [ ], we provided a similarity measure and explained how we can cluster the paths of a given database. As an example, we applied the K-Means clustering algorithm to classify paths. One may use an alternative clustering algorithm for this step. The learning flow is shown in Fig. 3.

The second phase is a supervised learning approach in which we use either a human expert's knowledge or a set of training examples. In the first method, we use the domain knowledge of an expert in the context of the WWW-site and the target questions. We ask the expert to go over the path classes and, based on his/her prior knowledge of the subject, to estimate the probability of a certain answer to the different target questions for each class. We may refer to this method as class interpretation. The second method is to use training data in order to automatically learn the channel characteristics. The training data set should include a navigation path database for a set of users and also their correct answers to a target set of questions. Having this information, the discovery map can be computed based on the frequency of answers to the set of questions.
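The frequency-based learning of the channel from training data can be sketched as plain counting. The sketch assumes each training user has already been assigned to a path class and that the answers are stored as a hashable value such as a tuple; the estimator is the simple relative frequency, without smoothing, and the data are invented for illustration.

```python
from collections import Counter


def learn_channel(training):
    """Estimate P(k|q) and P(q) by relative frequency.

    `training` is an iterable of (class_id, answer) pairs.
    Returns (channel, prior) where channel[(k, q)] estimates P(k|q)
    and prior[q] estimates P(q).
    """
    training = list(training)
    pair_counts = Counter(training)
    answer_counts = Counter(answer for _, answer in training)
    total = len(training)
    prior = {q: c / total for q, c in answer_counts.items()}
    channel = {(k, q): c / answer_counts[q] for (k, q), c in pair_counts.items()}
    return channel, prior


if __name__ == "__main__":
    # Hypothetical training set: (path class, (sex, education)).
    data = [
        (0, ("F", "grad")),
        (0, ("F", "grad")),
        (1, ("F", "grad")),
        (1, ("M", "undergrad")),
    ]
    channel, prior = learn_channel(data)
    print(channel[(0, ("F", "grad"))], prior[("F", "grad")])
```

Pairs that never occur in the training set are simply absent from the result; an expert's estimates could be used to fill or correct such entries, in line with the combined approach described in the text.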
In practice, the best result can be obtained by using a combination of the automatic learning method and the guidance of an expert. The advantage of such an approach is that in case of inconsistency between the expert's prediction and the training data, either the selected population can be corrected or the domain knowledge can be updated. This will increase the system's efficiency and accuracy. For a complete discussion of different learning methods, please consult Ref. [ ].

The learning process can be dynamically improved during the normal operation of the system by employing an adaptive learning algorithm. Once new data is inserted into our database from new users' navigations, we may recluster the paths and rerun the entire procedure. The learning procedure may be applied dynamically or in a batch mode, running periodically. For our current experiments, we update the classes periodically.

5 WWW-site design rules
The channel mutual information defined in Sec. 3.3 is a natural way to quantify the quality of a WWW-site design in passing information to the server. A design with higher mutual information is more efficient. There is no general algorithm to automatically generate an efficient server-informative site; this is a complex design process. However, we can compare different structures of a web site by using our efficiency measure. Furthermore, we propose some general design guidelines and argue why using them yields more efficient server-informative web sites. We also provide a methodology to examine the impact of the design rules. The results of some experiments are reported in the following section.
5.1 Design rules
The source of the information passing from the user to the server is hypertext selection, or clicking. More clicking translates into more potential information passing to the server. However, for an inefficient server site, even many clicks by a user might be useless and produce no valuable information. On the other hand, the reason that a user is clicking through the site is to gather the information she is interested in. Generally, an efficient site for the user is one in which the user can find this information as fast as possible. There have been many works in this direction. In most cases, fewer clicks by a user translate into faster access. In these cases, increasing the efficiency of a site for the server and for the user are conflicting objectives. Designing the web page optimally for both the user and the server, and studying the tradeoffs, can be considered as another research issue and is beyond the scope of this paper. The following is a list of some guidelines that can make a web-site more efficient at detecting users' interests and characteristics:

Short pages.
Extra links to provide information hierarchically.
No links from a page to itself (e.g., a table of contents at the beginning of a page). It is much harder to detect this type of link.
Using links with identical source and target pages but in different contexts.
Search engines are less informative for the server.
Trying to use tree access algorithms.
5.2 Testing the Design Rules
Here we provide a methodology to test different design rules. We first fix the questionnaire and the content material of a WWW-site. Subsequently, we calculate the mutual information while we vary the structure of the site and not its content. To test the impact of $N$ different design rules, we need to have $N+1$ copies of the site, say $S_1, \ldots, S_{N+1}$. We assume each two successive implementations, like $S_i$ and $S_{i+1}$, are identical except for a relatively small part where two different designs are used: $S_i$ without using the $i$-th design rule and $S_{i+1}$ with using the rule. Suppose we can gather navigation paths for all these $N+1$ sites. Subsequently, we execute the Path Mining algorithm and determine the classes $K_i$ for each $i$. Finally, we calculate the channel parameters $P(K_i \mid Q)$ and the mutual information $I(Q; K_i)$. The $i$-th design rule has been successful in improving the efficiency of the site if

$$I(Q; K_{i+1}) > I(Q; K_i).$$

To guarantee that different design rules do not have destructive interaction with each other, i.e., no butterfly effect, we need to have

$$I(Q; K_{N+1}) > I(Q; K_1).$$

The above test methodology can be implemented as follows. We provide slightly different copies of the site via different directories on our server. Using a counter, each new user is redirected into one of the five copies. The profiler collects the navigation paths of each user and stores them in separate databases. Finally, the mutual information for the different implementations of the site can be calculated and compared.
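The counter-based redirection and the final comparison can be sketched as follows. A round-robin assignment (counter modulo the number of copies) is assumed here, since the text only says that a counter is used; the mutual-information values in the example are invented and are not measurements from the paper.

```python
from itertools import count


def make_assigner(n_copies):
    """Return a function that assigns successive visitors to copies in turn."""
    counter = count()

    def assign():
        # Visitor number modulo the number of copies (assumed round-robin).
        return next(counter) % n_copies

    return assign


def successful_rules(infos):
    """Compare each copy with its predecessor.

    `infos` lists the measured mutual information of the copies, ordered so
    that consecutive copies differ by exactly one design rule.
    """
    return [after > before for before, after in zip(infos, infos[1:])]


if __name__ == "__main__":
    assign = make_assigner(5)
    print([assign() for _ in range(7)])
    # Hypothetical mutual-information values (bits) for three copies.
    print(successful_rules([0.10, 0.25, 0.20]))
```

Each entry of the returned list answers whether the corresponding single rule raised the mutual information relative to the copy that lacks it.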
6 Experimental Results
To evaluate our methodology, we conducted two sets of experiments. In the first set, we used our Entertainment web-site to implicitly gather answers to a simple hidden questionnaire from users' navigation paths. In the second set, we compared different structures of a web-site by computing the mutual information for each structure. A thorough set of more elaborate experiments will be reported in the full paper.
6.1 Discovery of Answers to a Hidden Questionnaire
Movie producers usually desire to study the reaction of the audience to their movie prior to its public screening. One popular method to realize this is to prescreen the movie for a sample audience. After the prescreening, a simple survey containing a small set of questions is passed among the audience. A typical questionnaire includes the audience's sex, age, level of education, and their tendency towards watching the movie a second time. We decided to use our sample entertainment site to see if we can discover the answers to these questions from the users' navigation paths. Subsequently, we compared our derived results with the real data gathered from the users. We show an average of accuracy between the derived results and the real data. For the rest of this section, we describe our experiment in more detail.
The survey was conducted based on questions about sex, age, and level of education. Therefore, $Q = (\mathit{sex}, \mathit{age}, \mathit{education})$. During our experiment, the profiler recorded accesses to our sample site. The first accesses were used as a training set. That is, the sex, age, and education level of these users were known in advance. Subsequently, the statistics of the training set were used to calculate the characteristics of the channel, i.e., the conditional probabilities. More specifically, the navigation paths of the training-set users were classified using the Path Mining method. For clustering, we used k-means with five classes, i.e., $k = 5$. We can calculate $P(q \mid k)$ as the frequency of a specific set of answers $q$ given that the user belongs to cluster $k$. This is possible because we know the explicit answer to each question from the users in the training set, and we classified all users in the training set. We made a look-up table (Tab. 1) in which we put all conditional probabilities for further reference.

Columns: Path class ID; Female (Undergrad, Grad); Male (Undergrad, Grad).

Table 1. Channel parameters calculated from the training data
This learning flow is shown in Fig. 3.

Using Tab. 1, we can discover the most probable answer of a new user to the questionnaire. The discovery procedure was explained in Sec. 3.2. To illustrate, consider the navigation path of a new user out of the remaining users. First, using the path angle in Path Mining and the normalized-cosine membership formula of Sec. 2, the probabilities of being a member of each class are calculated. Next, using the total probability law of Sec. 3.1 and the channel characteristics, we calculate the most probable sex, age interval, and education level of the user with that path.

To verify our method, we compared the derived information of the users with their real data (we had already gathered accurate answers to our age, sex, and education questionnaire for each of the users). We used the standard deviation of our probabilistic prediction against the accurate answers to measure the accuracy of our predictions. The accuracy of our predictions for sex, age and education were and respectively.

(A soft classification using the AutoClass algorithm has also been employed, and the result will be reported in the full paper.)

(The numbers for the users with age less than and users with a high school diploma as their highest degree are not reported in Tab. 1.)

6.2 Evaluation of alternative structures of a WWW-site
The second set of experiments was conducted to demonstrate the relation between a WWW-site structure and the channel mutual information for a given survey. We can examine the impact of the structure of a WWW-site on the channel mutual information as explained in Sec. 5.2. In our experiment, users visited alternative structures of a single sample site. We examined the impact of the different rules mentioned in Sec. 5.1 on the value of the mutual information. For example, we tested the impact of splitting a long page into two shorter pages while connecting them via context-sensitive links. Intuitively, the second structure can pass more information, since the user needs to visit more pages. Our intuition was verified by the derived value of mutual information for each design. Table 2 shows the effect of the above strategy in reducing the classification error.

Further studies and experiments are needed to find definite design guidelines. However, the usefulness of mutual information as a natural tool to measure the impact of different design implementations is clear.
Columns: Error; Structure 1; Structure 2. Rows: Gender; Age; Education.

Table 2. Comparison between two alternative structures of a WWW-site
7 Conclusions
The WWW has usually been considered as a source of information for the users; however, there is also valuable information that can be gained by analyzing the access patterns of web-site users. In this paper, we presented a simple and practical method for answering a set of questions about a web-site user's interests and characteristics by observing his/her navigation path. Using the concept of mutual information from information theory, we proposed a well-founded analytic measure to determine how well a server can answer a set of questions. This leads to introducing the concept of a channel between the desired information and the available information from the users' navigation paths. The high accuracy of our flow in predicting users' interests is shown through an experiment on a sample WWW-site.

In addition, we provided a quantitative measure for comparing sites' efficiencies in passing information to the server. A set of design rules for WWW pages, intended to make them more efficient for this purpose, is provided. Some of these rules might conflict with the objective of having an efficient WWW page from the user's perspective. Providing optimal solutions for an efficient bidirectional flow of information needs more investigation.

Although we have shown a mapping between users' navigation paths and a given questionnaire, we still need a systematic Web-site design methodology to create new Web pages or modify existing ones. While this paper has proposed some general and subjective design rules, this area should be considered a rich, unexplored territory for further research.
References
J. Adibi, D. Mefford, H. Alker, and A. M. Zarkesh. Conflict resolution based on case trajectory. In preparation.
J. Adibi, R. Patil, W. Shoemaker, and A. M. Zarkesh. Real-time case-base reasoning system for critical care. Submitted to IJCAI.
P. Cheeseman, J. Kelly, M. Self, J. Stutz, W. Taylor, and D. Freeman. AutoClass: a Bayesian classification system. In Proceedings of the Fifth Int. Conference on Machine Learning.
T. M. Cover and J. A. Thomas. Elements of Information Theory. New York: John Wiley & Sons, Inc.
U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. Knowledge discovery and data mining: toward a unifying framework. In Proceedings of the Second Int. Conference on Knowledge Discovery and Data Mining.
U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press.
D. H. Fisher. Machine Learning. Boston: Kluwer Academic Publishers.
J. Hartigan. Clustering Algorithms. New York: John Wiley & Sons, Inc.
G. Piatetsky-Shapiro, R. Brachman, T. Khabaza, W. Kloesgen, and E. Simoudis. An overview of issues in developing industrial data mining and knowledge discovery applications. In Proceedings of the Second Int. Conference on Knowledge Discovery and Data Mining.
M. Sahami. Learning limited dependencies Bayesian classification. In The Second Int. Conference on Knowledge Discovery and Data Mining, Portland, Oregon, Aug.
C. Shahabi, A. M. Zarkesh, J. Adibi, and V. Shah. Knowledge discovery from users web-page navigation. Accepted for publication in IEEE RIDE.
C. E. Shannon. A mathematical theory of communication. Bell Sys. Tech. Journal.
J. Shavlik and T. G. Dietterich. Readings in Machine Learning. Morgan Kaufmann Publishers.
T. W. Yan, M. Jacobsen, H. Garcia-Molina, and U. Dayal. From user access patterns to dynamic hypertext linking. In Proceedings of the 5th International World-Wide Web Conference, Paris, France, May.
A. M. Zarkesh, J. Adibi, R. Sadri, and C. Shahabi. Path mining. Submitted to KDD.
Description
Amir M. Zarkesh, Jafar Adibi, Cyrus Shahabi, Reza Sadri, and Vishal Shah. "Analysis and design of server informative WWW-sites." Computer Science Technical Reports (Los Angeles, California, USA: University of Southern California. Department of Computer Science) no. 646 (1997).
Asset Metadata
Creator
Adibi, Jafar (author), Sadri, Reza (author), Shah, Vishal (author), Shahabi, Cyrus (author), Zarkesh, Amir M. (author)
Core Title
USC Computer Science Technical Reports, no. 646 (1997)
Alternative Title
Analysis and design of server informative WWW-sites (title)
Publisher
Department of Computer Science, USC Viterbi School of Engineering, University of Southern California, 3650 McClintock Avenue, Los Angeles, California, 90089, USA (publisher)
Format
21 pages (extent), technical reports (aat)
Language
English
Unique identifier
UC16269208
Identifier
97-646 Analysis and Design of Server Informative WWW-sites (filename)
Legacy Identifier
usc-cstr-97-646
Rights
Department of Computer Science (University of Southern California) and the author(s).
Internet Media Type
application/pdf
Copyright
In copyright - Non-commercial use permitted (https://rightsstatements.org/vocab/InC-NC/1.0/)
Source
20180426-rozan-cstechreports-shoaf (batch), Computer Science Technical Report Archive (collection), University of Southern California. Department of Computer Science. Technical Reports (series)
Access Conditions
The author(s) retain rights to their work according to U.S. copyright law. Electronic access is being provided by the USC Libraries, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
USC Viterbi School of Engineering Department of Computer Science
Repository Location
Department of Computer Science. USC Viterbi School of Engineering. Los Angeles, CA, 90089
Repository Email
csdept@usc.edu