CREATION AND INFLUENCE OF VISUAL VERBAL COMMUNICATION:
ANTECEDENTS AND CONSEQUENCES OF PHOTO-TEXT SIMILARITY
IN CONSUMER-GENERATED COMMUNICATION
by
Gizem Ceylan
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(BUSINESS ADMINISTRATION)
May 2022
Copyright 2022 Gizem Ceylan
DEDICATION
This dissertation is dedicated to the memory of my mother – Aynur Ceylan. As the
anchor of our house, her consistent encouragement, support, trust, and love helped me persevere
through the challenges I faced during my doctoral studies. Her brilliance in mathematics
inspired me as a child, and the love for science that she instilled in me has brought me to this
day. While she was not able to see me receive my PhD, nothing fulfills me more than knowing
how proud she would have felt that day.
She has been and always will be my north star…
ACKNOWLEDGMENTS
During my PhD, I have received support and encouragement from a great number of
individuals. No words can adequately express my gratitude for all they have done for me, but I
will do my best to convey as much as I can.
I owe enormous gratitude to the brilliant mentors who have guided, inspired, and
supported me throughout my PhD journey. First to my advisor. Kristin Diehl is the primary
reason I chose to attend USC, one of the best decisions I have ever made. She has instilled the
highest research standards in me. The long conversations we had taught me how to turn messy
ideas into clean studies and ultimately into compelling papers. The training I have received from
her makes me confident and eager to continue my own research and eventually pass down my
knowledge to my own students in the future.
Thanks also to my dissertation committee members. You collectively believed in me and
took a chance on working with me before I was a fully-formed researcher. Deborah MacInnis
was a profound influence on me, a mentor who generously shared her time and ideas and taught
me how to convey ideas clearly and write compelling papers. I am also indebted to her for her
unconditional support throughout the PhD program and during the job market. Norbert Schwarz
was a source of inspiration for my current ideas and an expert whom I would often go to for his
big-picture insights and steadfast guidance. Wendy Wood taught me how to conduct rigorous
and impactful research. Our deep conversations about the job market and academic success will
light my way in my academic pursuit. Finally, Davide Proserpio spent hours with me teaching
me new skills and pushed me out of my comfort zone to become the well-rounded researcher I
am today.
I have been incredibly fortunate to have classmates and friends who have supported me
throughout this long journey. To Arianna Uhalde, who is so incredibly generous with her time
and knowledge, and who has been an essential source of encouragement and understanding over
the years. To Yanyan Li, whose help was invaluable, especially when I was stuck with coding. To
Melissa Kurata, who has become my best friend and running partner, and who made me believe
that even seemingly crazy targets were achievable if one planned and worked for them. To Erin
Cotter, who has become an invaluable friend with her insights into the most pressing issues and
her big-picture perspective. Finally, to Cem Dereli, who has become my steadfast companion and who has
rooted for me through the job market.
Finally, I must express my deep gratitude for the infinite support of my family. None of
this would have been possible without their love, patience, and dedication. My mom, who is not
with us anymore, was my tireless cheerleader who helped me believe in myself and believe that
anything was possible. My dad was my rock, my most reliable sounding board and confidant,
and the person I could count on to drop everything to be there for me. My two cats, Kuzu and
Beefie, have been my unequivocal supporters and love bundles, who were able to make me smile
even in the hardest times. Thank you for believing in me and supporting me all along!
TABLE OF CONTENTS

Dedication ........ ii
Acknowledgments ........ iii
List of Tables ........ vii
List of Figures ........ viii
Introduction ........ 1
Essay 1 - How do People Communicate Their Experiences Visually and Verbally? More Photos and More Words ........ 3
    Abstract ........ 3
    Introduction ........ 4
    Creating Visual Word of Mouth ........ 8
    Conversational Norms in Word of Mouth ........ 9
    Visual Complexity ........ 11
    Impact of Communicators’ Focus ........ 12
    Overview of Studies ........ 13
    Study 1 – People Naturally Communicate as if They Held a Redundancy Goal ........ 14
    Study 2 – Testing Redundant Communication in Naturally Occurring Word of Mouth ........ 18
    Study 3 – Visual Complexity Increases Redundancy ........ 23
    Study 4 – The Impact of Self- Versus Receiver-Focus ........ 27
    General Discussions ........ 31
Essay 2 - Words Meet Photos: When and Why Visual Content Increases Review Helpfulness ........ 38
    Abstract ........ 38
    Introduction ........ 40
    Reviews and Review Helpfulness ........ 42
    The Impact of Photos on Review Helpfulness ........ 43
    The Interplay Between Photos and Review Text ........ 44
    Overview of Studies ........ 46
    Study 1 – Greater Photo-Text Similarity Increases Helpfulness in Yelp Reviews ........ 47
    Study 2 – Greater Photo-Text Similarity Causes Greater Helpfulness ........ 63
    Study 3 – The Effect of Multiple Photos on Helpfulness ........ 70
    General Discussions ........ 76
Bibliography ........ 80
Appendices for Chapter 1 ........ 91
    Appendix 1.1 – Study 1: Stimulus ........ 91
    Appendix 1.2 – Study 1: Goal Manipulation ........ 92
    Appendix 1.3 – Study 1: Custom Dictionary ........ 93
    Appendix 1.4 – Study 2: Yelp and TripAdvisor Datasets ........ 94
    Appendix 1.5 – LIWC See Words Category ........ 97
    Appendix 1.6 – Study 2: Visual Depiction of Relationship between Semantic Similarity and Photo Sharing ........ 98
    Appendix 1.7 – Study 3: Stimuli ........ 99
    Appendix 1.8 – Study 3: Results Without Controlling for Star Rating ........ 100
    Appendix 1.9 – Study 4: Goal Manipulation ........ 101
    Appendix 1.10 – Study 4: Pretest ........ 102
    Appendix 1.11 – Study Mentioned in the General Discussion ........ 103
    Chapter 1 Appendices References ........ 104
Appendices for Chapter 2 ........ 105
    Appendix 2.1 – Study 2: Stimulus ........ 105
    Appendix 2.2 – Study 2: Goal Manipulation ........ 106
    Appendix 2.3 – Study 3: Stimulus ........ 108
    Appendix 2.4 – Study 3: Analysis of Discriminant Validity ........ 110
    Appendix 2.5 – Study 3: Results on Quality Inference ........ 112
List of Tables
CHAPTER 1
Table 1.1A: Study 2 – Yelp Dataset Descriptive Statistics ............................................. 19
Table 1.1B: Study 2 – TripAdvisor Dataset Descriptive Statistics .................................. 19
Table 1.2A: Study 2 – Results of Fixed Effect Model Using Word-Frequency
Method in Yelp Dataset ……..……………………......................................................... 22
Table 1.2B: Study 2 – Results of Fixed Effect Model Using Word-Frequency
Method in TripAdvisor Dataset …..……………............................................................. 22
Table 1.3A: Study 2 – Results of Fixed Effect Model Using Word-Embedding
Method in Yelp Dataset …………………..……............................................................. 23
Table 1.3B: Study 2 – Results of Fixed Effect Model Using Word-Embedding
Method in TripAdvisor Dataset ………………………................................................... 23
CHAPTER 2
Table 2.1: Study 1 – Examples of Photos And Labels Extracted Using Google
Vision API. ……………………...................................................................................... 48
Table 2.2: Study 1 – Examples of Reviews With High And Low Similarity
Between The Photo And The Text.….............................................................................. 50
Table 2.3: Study 1 – Mean (Standard Deviation) of the Yelp Dataset Variables............ 55
Table 2.4: Study 1 – The Effect of Having Photos on Helpful Votes………….............. 57
Table 2.5: Study 1 – The Effect of the Number of Photos and Similarity on
Helpful Votes.................................................................................................................... 59
Table 2.6: Study 1 – Robustness Check: Reviewer FE.................................................... 61
Table 2.7: Study 1 – Robustness Check: Including Topics.............................................. 62
List of Figures
CHAPTER 1
Figure 1.1A: Study 1 – Redundancy as a Function of Communication Goals
(Percentage of Photo-Related Words) ............................................................................. 17
Figure 1.1B: Study 1 – Redundancy as a Function of Communication Goals
(Deliberate Engagement in Redundancy) ........................................................................ 17
Figure 1.2: Study 4 – Percentage of Photo-Related Words as a Function of
Self- Versus Other-Focused Goals.............................…………………………………... 30
CHAPTER 2
Figure 2.1: Study 1 – Visual Depiction of Similarity Assessment between
Review Text and Images in a Review ……………………………………………….. 50
Figure 2.2: Study 1 – Number of Monthly Yelp Reviews (Left) And Monthly
Number of Photos Posted (Right). …………………….……………………….………. 52
Figure 2.3: Study 1 – Fraction of Reviews with At Least One Photo by Star
Rating (Left) and Average Number of Photos per Review by Star-Rating (Right).…..... 53
Figure 2.4: Study 1 – Density of the Similarity Scores between Reviews and Photos…. 54
Figure 2.5: Study 2 – Helpfulness by Photo Condition………………….……………… 66
Figure 2.6: Study 2 – Inferred Quality by Photo Condition………….….……………… 67
Figure 2.7A: Study 2 – Comprehension Ease by Photo Condition…………..…..……… 68
Figure 2.7B: Study 2 – Amount of New Information by Photo Condition….….……..… 68
Figure 2.8: Study 2 – Mediation Model with Three Parallel Mediators……..….…….… 69
Figure 2.9A: Study 3 – Comprehension Ease as a Function of Number of Photos
X Number of Topics………………...…………………………………………………. 73
Figure 2.9B: Study 3 – Amount of New Information as a Function of Number of
Photos X Number of Topics……………………………………………………………. 73
Figure 2.10: Study 3 – Helpfulness as a Function of Number of Photos X
Number of Topics………………………………………………………………………. 74
INTRODUCTION
Human information sharing is both idiosyncratic and pervasive. From vacations and
holidays to funny moments with their children, people often share their experiences with others.
Especially with the advent of camera phones, people constantly take photos of their experiences
and share these photos in one-on-one settings or on social platforms. This year alone, people are
expected to take more than a trillion photos of their experiences (Carrington 2020). Every
minute, people share 41 million photos in messages on WhatsApp (Omnicore 2021) and 147,000
photos on Facebook (Omnicore 2020).
Using solely photos as a communication medium may be efficient. However, it is often
not easy for communicators to describe these experiences and for receivers to understand them
based on the information in photos. Hence, those who share their experiences often use photos
and text to convey their experiences, even on sites like Instagram, where visual information is
primary. Yet, how people share their experiences using both visual and verbal information
remains surprisingly unexplored. In the first essay of my dissertation, I examine how consumers
use photos and words to share their experiences with others. Consumers may use photos as
substitutes for words to communicate efficiently (i.e., “a picture is worth a thousand words”) or use
both photos and text to emphasize certain information by repeating it (i.e., “show and tell”).
Using computational text analyses of two large natural datasets of restaurant reviews (Yelp and
TripAdvisor) and conducting four tightly controlled lab experiments, I find that people’s
perspectives influence how they use visual and verbal information. When people take on others’
perspective and focus on the usefulness of their information for them, they offer similar
information in both photos and text. In contrast, when they fail to take others’ perspective and
focus on themselves instead (e.g., signaling their expertise), people use visual and verbal
information as substitutes. This finding holds when people communicate publicly with unknown
others (e.g., writing a review) and privately with close others (e.g., texting a friend).
In the second essay of my dissertation, I focus on the receiver side of visual-verbal
communication and examine how the similarity between visual and verbal content influences
receivers. Specifically, I examine whether receivers find communication helpful when it includes
visual and verbal content that conveys similar information. From an information theory
perspective, dissimilar information in the photo and in the text provides overall more
information. Receivers should find more information helpful as it alleviates more uncertainty.
From a processing perspective, communicating similar information in photo and in text can help
the receiver process the writer’s experience more easily. Using computational text and photo
analyses of a large dataset (Yelp) combined with controlled lab experiments, I find that when
photos and text convey similar (vs. dissimilar) information, (1) the information becomes more
concrete and easier to process, and (2) creates more positive quality inferences regarding the
focal attribute. At the same time, receivers recognize that this similarity also (3) limits the
amount of new information conveyed. Through the totality of these three distinct processes,
greater similarity (vs. dissimilarity) between photos and text increases the review’s helpfulness.
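The similarity measure at the heart of this essay can be illustrated with a minimal sketch. This is a hypothetical simplification, not the dissertation's actual pipeline (which extracts photo labels via the Google Vision API and uses richer text processing): represent the photo by its machine-generated labels, represent the review by its words, and score their overlap as the cosine similarity of the two term-frequency vectors.

```python
from collections import Counter
import math

def cosine_similarity(a, b):
    """Cosine similarity between two bags of words (term-frequency vectors)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in set(ca) & set(cb))
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

# Hypothetical photo labels (as an image tagger might return) and review words
photo_labels = ["pizza", "cheese", "food", "plate"]
review_words = "the pizza arrived hot and the cheese was perfectly melted".split()

score = cosine_similarity(photo_labels, review_words)
```

A review whose text repeats what the photo shows ("pizza", "cheese") scores higher than one describing unrelated aspects, which is the intuition behind the photo-text similarity variable.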
Every day, millions of people convey their experiences to (close or distant) others or
consult other people before engaging in an experience (e.g., before eating at a restaurant or
traveling to a new city). Across both sharing and learning about experiences, my research
examines the interplay between photos and words and identifies the similarity between text and
photos as an important characteristic of visual-verbal communication. Collectively, my work
sheds light on the previously unexplored relationship between photos and words in consumer-
generated communication.
CHAPTER ONE
ESSAY ONE: MORE PHOTOS AND MORE WORDS: CHOOSING REDUNDANCY IN
VISUAL WORD OF MOUTH
Gizem Ceylan
Kristin Diehl
ABSTRACT
With the ubiquity of camera phones, consumers often communicate their experiences in
both photos and words. This paper asks how consumers use photos and words to share their
experiences. Is a photo indeed worth a thousand words? That is, do people convey different
information in each modality efficiently in line with one of the conversational norms?
Alternatively, do people show and tell, repeating similar information in words and photos
redundantly, following a different norm? Across two large-scale, real-world datasets of 887,318
restaurant reviews and three laboratory experiments (N=1,462), we find that photos do not
words. Instead, to inform receivers, consumers convey information in both photos and words
redundantly. Consumers are more likely to communicate their visually complex experiences
using photos. At the same time, they are more likely to use words emphasizing visual aspects in
text, and thus offer more redundant content in visually complex situations. However, when
communicators are self-focused (e.g., to signal expertise), they become more efficient in their
communication. This effect is robust whether people communicate with close (e.g., texting a friend)
or distant others (e.g., writing a review), and in real and hypothetical settings. Connecting
theories from linguistics and psychology and using a multi-method approach, we examine what
shapes the creation of visual WOM.
Keywords: photos, language, Grice, norms, visual WOM, experiences
INTRODUCTION
People often share information with others who were not originally part of their
experience (e.g., a vacation), and creating word of mouth (WOM) about such experiences is
central to people’s everyday lives. People disclose their experiences with close (e.g., friends) and
distant others (e.g., strangers) in private (e.g., text messages) and public settings (e.g., social
media posts). Indeed, many of these conversations are about marketer-provided experiences
(e.g., dining at a restaurant, visiting a museum). Thus, the content and characteristics of the
information people share with others can significantly impact firms’ economic outcomes (Ghose
and Ipeirotis 2011). A large body of work on WOM has examined the verbal content of these
conversations (Chevalier and Mayzlin 2006; Forman, Ghose, and Wiesenfeld 2008). Yet, with
easy access to camera-enabled phones, consumers increasingly share information with others
using not just words but also photos. The current research investigates this novel and unexplored
aspect of WOM: How do people create visual WOM, especially when they want to be
informative to others?
We define visual WOM as any consumer-initiated communication that includes both
pictorial (i.e., photos) and verbal information (i.e., text) related to their experiences. When
people create visual WOM, they have to decide what to share visually (i.e., in the photo) and
what to share verbally (i.e., in the text). In verbal communication, language theorists (Grice
1975; Zipf 1949) point to communication efficiency as the normative choice, given people's
bounded cognitive capacities and the cost of producing speech. According to this norm,
communicators should be brief, provide just the appropriate amount of information, and avoid
sharing information that has already been conveyed. In visual-verbal communication, do
communicators follow the same norms? Do consumers use photos as substitutes for words to be
efficient communicators, as suggested by the saying “a photo is worth a thousand words”? We
suggest that, because they want to inform the receiver in an increasingly noisy world, people do
not strive for efficiency but rather for redundancy. Thus, they communicate similar
aspects of their experience in both photos and words. However, when people focus less on the
receiver and more on themselves, they prioritize efficiency.
We test these predictions combining analyses of close to a million consumer reviews
from Yelp and TripAdvisor with three carefully designed experiments. We find that people
generally offer redundant content, conveying the same type of information in both photos and
words. This tendency increases when an experience is visually more complex (vs. simple) and
decreases when communicators focus on themselves (vs. others).
Our findings make several important contributions. First, our work contributes to the
burgeoning literature on psychological drivers of word of mouth from the communicator’s
perspective. While much research has examined the impact of reviews on receivers (see Moore
and Lafreniere 2020 for a recent review), little attention has been paid to how consumers
generate word of mouth in the first place. Further, to the best of our knowledge, no work has
examined the interplay between words and photos in visual WOM. To fill this gap in the
literature, we demonstrate that communicator characteristics (e.g., communicators’ focus) and
features of the experience (e.g., visual complexity) can impact the production of visual WOM.
Second, this work deepens our understanding of the role of photos in WOM and
consumer-generated content more generally. In 2021, consumers will have taken 1.4 trillion
photos (Carrington 2020). Yet, little is known about how people use the visual content they
create in social interactions and, particularly, in WOM. We contribute to the literature by
examining the interplay between words and photos and the factors that can impact this
relationship.
Third, our findings extend work on the linguistic content and structure of WOM by
focusing on words communicators use when they generate visual WOM. Prior work identified
antecedents and consequences of using different types of words (e.g., recommendation words,
Packard and Berger 2017). We add to this literature by examining the interplay between photos
and words and suggest that when consumers want to be informative, they use visual words in
addition to photos to convey their experiences.
Fourth, we examine norms of conversation in the context of visual WOM. In verbal
communication, prior research identified norms by which conversational partners establish what
information is mutually available, what level of detail is expected, and what would be redundant
information (Grice 1975). In visual-verbal communication, it has yet to be examined whether
people follow these norms and, if so, how people prioritize contradictory norms when they
want to share their experiences visually and verbally. Our findings suggest that communicators
prioritize different norms depending on their focus (on the receiver or themselves), ultimately
influencing how they create visual WOM.
Fifth, from a methodological perspective, we use a broad range of quantitative text
analysis methods, mixing standard (e.g., frequency-based dictionary) and “nonstandard” (e.g.,
word embeddings) techniques. We use text analysis to gain insights into “what” is being said and
“how” it is said. This method helps us quantify the information contained in the textual data as it
naturally occurs (Berger et al. 2020). We further strengthen our analyses with experiments for
causal inference.
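The standard, frequency-based dictionary approach can be sketched as follows. This is a hedged illustration under simplified assumptions: the actual studies use LIWC's See category plus a custom dictionary (see Appendices 1.3 and 1.5), and real LIWC also matches word stems (e.g., color*), which this toy word list omits.

```python
import re

# Hypothetical mini-dictionary of visual ("see") words; illustrative only.
SEE_WORDS = {"see", "look", "view", "color", "bright", "picture", "beautiful"}

def pct_visual_words(text: str) -> float:
    """Share of a review's tokens found in the visual-word dictionary —
    a simple frequency-based proxy for visual content in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in SEE_WORDS)
    return hits / len(tokens)

share = pct_visual_words("Look at the bright colors in this view!")
```

Word-embedding methods instead place each word in a vector space and score how close the review's words are to visual concepts, capturing semantic similarity that exact dictionary matches miss.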
CREATING VISUAL WORD OF MOUTH
Grice describes several norms (or maxims) of communication (Grice 1975). While these
norms were formulated strictly for verbal communication, they are relatively broad and may be
applicable also to visual-verbal communication. We empirically examine two of these norms in
situations when people communicate using words and photos. The Maxim of Quantity suggests
that the communicator should be brief, provide the appropriate amount of information, and avoid
sharing information that has already been conveyed to the receiver. In a communication that
includes both visual and verbal components, this maxim may predict that each component should
convey a different piece of information to reduce the overlap between the two components. To
that end, the information that can be described visually should be conveyed in the photo and not
in the words. Thus, photos should substitute for words (or vice versa).
Alternatively, people’s visual-verbal communications may follow a different norm:
Grice’s Maxim of Relation. This principle suggests that the communicator should avoid
ambiguity and guide the receiver towards the relevant information. This maxim may predict that
people convey similar information in visual and verbal modalities in a redundant way, to
increase the potential relevance of the information to the receiver and to facilitate processing
(Wilson 1993). According to Grice, by emphasizing the same information visually and verbally,
communicators can guide the receiver towards what is relevant in their decision making.
Redundancy may seem necessary especially in environments where the receivers’ attention is
spread thin (Partan and Marler 1999) and communicators want to ensure their messages will get
through as intended. Given the dearth of attention in today’s world in general (Davenport and
Beck 2001) and the relatively low cost of using words and photos jointly, we expect that, in
visual-verbal communications, people will prioritize redundancy (in line with the Maxim of
Relation) over efficiency (in line with the Maxim of Quantity).
CONVERSATIONAL NORMS IN WORD OF MOUTH
In WOM, people converse with others. These conversations can be one-on-one (e.g.,
chat) or one-to-many (e.g., reviews). Prior research shows that even when consumers engage in
one-to-many communication, factors related to the audience, such as tie strength (Frenzen and
Nakamoto 1993) and audience size (Barasch and Berger 2014) matter, suggesting that
communicators imagine a conversation partner. Thus, we examine whether conversational norms
can inform us about visual WOM and help us predict how people create visual WOM.
In conversations, communicators encode their experiences into utterances, and receivers
decode and construct meaning (H. Clark 1992). Prior research characterizes verbal
communication as a cooperative endeavor between communicators and receivers, governed by
norms (H. H. Clark 1996; Schwarz 1994; Sperber and Wilson 1986). Grice's cooperative
principles (1975) specify maxims (norms) by which conversational partners establish what
information is mutually available, what level of detail is expected, and what would be redundant
information. Two norms of conversation are fundamental in informing our predictions. Grice's
Maxim of Quantity states that communicators should be brief, provide just the appropriate
amount of information, and avoid sharing information that has already been conveyed (Grice
1975). This norm recommends communicators avoid redundant communication, which is
defined as mentioning more than the minimal information necessary to be informative. Avoiding
redundancy may benefit the receiver who is not overwhelmed with processing more than the
necessary information. It also benefits the communicator who does not have to bear the cost of
creating and sharing unnecessary information (Goodman and Frank 2016). In visual WOM, this
principle may predict that communicators should convey different aspects of the experience in
each modality, reducing the information overlap between modalities.
To adhere to this norm, communicators could take advantage of each modality’s capacity
to convey particular types of information. Photos could make visual aspects visible to the
receiver and may be important in providing context. For instance, Barasch, Zauberman, and
Diehl (2018) find that when individuals share photos with others who were not present during
the experience, they are more likely to share photos depicting prototypical elements of the
event—in their case, Christmas trees, stockings, and gifts. Such photos could ease the
communication process between the communicator and the receiver. In general, photos may be
advantageous when consumers want to communicate visual aspects of their experiences.
Therefore, a communicator could convey visual aspects of the experience (e.g., the color of the
sky during sunset) using a photo rather than words, minimizing the information overlap between
modalities (i.e., as if they hold an efficiency goal).
Grice's Maxim of Relation, however, states that communicators should strive to be as
informative as possible and guide the receiver towards the most relevant information. One way
to accomplish this is to reduce ambiguity by re-emphasizing relevant information even when this
information is already available to the receiver (Wilson 1993). In visual WOM, this principle
may suggest that communicators convey information about the same aspect of the experience
both visually and verbally. They may emphasize the aspect of interest (e.g., the color of the sky
during sunset) by repeating information about that aspect in the text, even though this
information is already visible to the receiver in the photo. As they focus on getting their message
across to the receiver, individuals offer redundant content by repeating similar information in
both modalities. Past research suggests that consumers communicate as if they hold a
redundancy goal, particularly in environments where the receivers' attention is spread thin
(Partan and Marler 1999) and communicators want to ensure their messages get through as
intended. Given the dearth of attention in today's world in general (Davenport and Beck 2001)
and the relatively low cost of using words and photos jointly, we expect consumers to prioritize
the Maxim of Relation (redundancy) over the Maxim of Quantity (efficiency). In other words,
we predict that when people focus on informing others, they communicate as if they have a
redundancy goal: individuals refer to the same aspect of a photo in the text, or they add a photo
illustrating a visual aspect presented in the text.
H1: In visual WOM, people offer redundant content, sharing similar information in both
photos and words.
VISUAL COMPLEXITY
Photos should be particularly efficient in capturing visually complex experiences (e.g., a
painting in a museum, the northern lights, etc.). In learning, for example, using visual instead of
verbal information can reduce cognitive load in conveying complex topics (Perkins and Unger
1994). Counterintuitively, however, we predict that, as visual complexity increases,
communicators may feel a greater need to use words in addition to visuals in their
communication. This prediction aligns with recent research in psycholinguistic (Degen et al.
(2020) that found evidence of greater verbal redundancy in more complex environments. In
referential communication (i.e., pointing out a particular object in a given scene), Degen et al.
find that people communicate more redundantly when the object is presented in more complex
environments. For instance, when a banana photo is presented together with other fruits (the
more complex environment), individuals redundantly mention the color of the banana (i.e., they
say “yellow banana”) even when the depiction is also visible to the receiver. Their findings
suggest that, contrary to linguistic theories (e.g., Rational Speech Act models, Goodman and
Frank 2016) and related norms calling for efficiency, redundant communication can be
informative in a more complex environment. Thus, we expect communicators to offer
redundant content when sharing a visually more (vs. less) complex experience. This prediction
may seem surprising since photos should be more important as visual complexity increases. We
suggest that photos are important in conveying complex experiences and hence are used more in
such situations, but so are words.
H2: People will create greater redundancy when visual aspects of the experience are
more (vs. less) complex.
IMPACT OF COMMUNICATORS’ FOCUS
Social interactions typically necessitate at least some level of attention to others
(Cavanaugh, Gino, and Fitzsimons 2015). We suggest that redundancy reflects communicators’
focus on and their intention to be informative to the receiver. However, communicators may not
always focus on the receiver. Instead, they may, at times, focus on themselves. We suggest that
this shift in focus may alter the extent to which individuals offer redundant content. To be
informative, a communicator must produce messages sensitive to a receiver's status, knowledge,
and abilities (Listener Rule; Sonnenschein and Whitehurst 1982). However, when consumers
focus greater attention on themselves, they are less likely to take the receiver’s side into account.
Instead, focusing on themselves may lead them to communicate with the least effort and thus,
prioritize efficiency.
One common goal in interpersonal communication and WOM is to establish one’s
expertise. This goal alters people’s language use, leading them to use more assertive and task-
oriented language (Littlepage et al. 1995). We suggest that this goal may also affect how people
prioritize norms and create visual WOM. Indeed, higher (vs. lower) status individuals in
organizations focus less on others and communicate more efficiently, using less redundant means
of communication (Leonardi, Neeley, and Gerber 2012). Similarly, we predict that people are
less likely to offer redundant content and emphasize relevant aspects in photos and words when
focused on themselves (e.g., to establish their expertise) versus the receiver (i.e., to inform).
H3: When people focus on themselves (e.g., to establish expertise) rather than the
receiver (e.g., to inform), they become less redundant in their communication.
OVERVIEW OF STUDIES
Four studies test these predictions in real and hypothetical settings. Study 1 manipulates
people’s communication goals, demonstrating that people naturally share an experience as if they
hold a redundancy (vs. efficiency) goal. Study 2 tests our theorizing in the field using
two separate datasets (Yelp and TripAdvisor). We analyze naturally occurring consumer
reviews to examine whether the presence of photos is accompanied by a greater visual emphasis
in the review text (i.e., greater redundancy). Controlling for a broad range of review features
(e.g., review length, star rating, device type, restaurant type, review date), we find real-life
evidence for redundant communication. Study 3 manipulates the visual complexity of the
experience and demonstrates that an increase in visual complexity increases not only the usage of
photos but also the usage of words referring to visual aspects, increasing redundant
communication. Further, study 4 tests the role of the communicator’s focus. We theorize and
find that individuals become more efficient and less redundant in their communication when they
focus on themselves rather than the receiver.
STUDY 1 - PEOPLE NATURALLY COMMUNICATE AS IF THEY HELD A
REDUNDANCY GOAL
As the first test of hypothesis 1, we manipulated people’s linguistic goals (i.e.,
redundancy vs. efficiency) in a one-to-one communication context and compared these
benchmarks to how people communicate in the absence of any explicit goal. We expected people
to naturally communicate as if they had an explicit redundancy goal, i.e., they convey visual
content not just in the photo they share but also in the text they write to their communication
partner.
Method
Participants. As preregistered (AsPredicted #59716), study 1 followed a 3-group
between-subjects design (linguistic communication goals: no explicit goal (natural situation) vs.
redundancy goal vs. efficiency goal). We posted the study for a target of 300 participants on
Amazon's Mechanical Turk and received 330 sign-ups. Eight failed the preregistered attention
check, yielding a sample of 322 qualified participants (53.7% female; Mage = 41.3, SDage = 20.9).
This study was approved by the university’s IRB.
Procedure. All participants imagined seeing an interesting donut at a new donut shop.
They further imagined communicating this experience to a friend. As per the cover story, they
decided to share the photo of the donut with their friend. The donut photo was constant across
conditions (Figure S1 in Supplemental Material). They were told that they also decided to send
some text. In the redundancy goal condition, participants were told to convey their experience
with the goal to communicate thoroughly, possibly repeating some of the information that was
already in the photo. In the efficiency goal condition, participants were told to convey their
experience with the goal to communicate efficiently, without repeating information that was
already in the photo. Those in the natural, no explicit goal condition were simply told to convey
their experience to their friend (for the exact wording, see Table S1 in Supplemental Material).
Participants then typed the text they would send (minimum 100 characters ~ 20 words). We also
measured participants' deliberate engagement in redundant communication using two items.
After completing their text, participants indicated the extent to which (1) the information
conveyed in their text was exactly the same as the information conveyed in the photo (1 = not at
all, 7 = very much), and (2) their text verbally repeated the information that was conveyed in the
photo (1 = not at all, 7 = very much). We averaged these two measures (r = .75) and created a
composite score.
Assessing redundancy. To test H1, we measured the extent to which the text participants
wrote included visual content similar to the photo. To measure the visual content related to the
photo, we built a custom dictionary based on two sources. First, we used the see category from
the Linguistic Inquiry and Word Count program (LIWC), a language analysis program
commonly used to study the relationship between language and psychological variables
(Pennebaker, Mehl, and Niederhoffer 2003). LIWC analyzes occurrences of words based on a
dictionary. The LIWC lexicon is a well-validated dictionary of words reflecting various
psychological constructs and has been used in numerous text analysis applications (Humphreys
and Wang 2018). This approach is often recommended for semantic markers (Humphreys and
Wang 2018) and is well-suited to our investigation. The see words category, a sub-category
under perceptual processes, includes words that describe visual features, such as blue, colorful,
round, and dark (see Appendix C in Supplemental Material for all words in that category). Second, we
conducted a pretest, following methods from Edell and Staelin (1983), and showed the donut
photo to 400 participants. Participants wrote down everything the photo conveyed to them about
the donut. We examined the top 100 words from these descriptions. We added 59 unique words
from that list that were not in the see category (such as rainbow, sky; all words are in Appendix
C in Supplemental Material). The final custom dictionary includes 195 visual words related to
the donut photo. As the photo of the donut is constant in all conditions, we assessed the extent to
which communicators offered redundant content by measuring the frequency of visual words in
the text.
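In essence, this measure counts dictionary hits per word of text. A minimal sketch of the computation follows; the word list here is an illustrative subset, not the actual 195-word custom dictionary, and the example message is invented.

```python
import re

# Illustrative subset of the custom visual-word dictionary (hypothetical;
# the actual dictionary combines LIWC "see" words with 59 pretest words).
VISUAL_WORDS = {"blue", "colorful", "round", "dark", "rainbow", "sky", "pink"}

def visual_word_pct(text):
    """Return the percentage of tokens that match the visual-word dictionary."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in VISUAL_WORDS)
    return 100.0 * hits / len(tokens)

# Example message a participant might send alongside the donut photo
msg = "Just tried this rainbow donut, so colorful with pink frosting!"
print(round(visual_word_pct(msg), 1))  # prints 30.0
```

Because the photo is held constant across conditions, a higher percentage directly indexes greater redundancy between the text and the photo.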
Results
Redundant content in the text. We regressed the frequency of visual words on goal
conditions, which revealed differences among the three goal conditions, F(2,319) = 6.82, p =
.001 (figure 1). Pairwise comparisons showed that, as intended, participants in the redundancy
condition referred to the photo more. A larger percentage of the text included photo-related
words in the redundancy (M = 9.1% of total words) than in the efficiency condition (M = 6.4%),
β = -2.66, t(320) = -3.68, p < .001. More importantly, and supporting our prediction (H1),
participants in the natural goal condition also included more photo-related words (M = 7.9%)
than those in the efficiency goal condition, β = -1.50, t(320) = -2.11, p = .04. Further, there was
no significant difference in the frequency of photo-related words between the natural goal and
the redundancy goal conditions, β = 1.15, t(320) = 1.61, p = .11.
Deliberate engagement in redundancy. Paralleling these findings, compared to the
efficiency goal condition (M = 2.53), participants in the redundancy goal condition reported
greater deliberate engagement in redundant communication (M = 3.95), β = -1.41, t(320) = -6.62,
p < .001. Importantly, participants in the natural goal condition (M = 4.02) also reported greater
deliberate engagement in redundancy than those in the efficiency goal condition, β = -1.48,
t(320) = -7.04, p < .001. Further, the reported engagement in redundancy did not differ between
the natural goal and redundancy goal conditions, β = 0.06, t(320) = 0.31, p = .76.
FIGURE 1.1 STUDY 1 – REDUNDANCY AS A FUNCTION OF COMMUNICATION GOALS
A. PERCENTAGE OF PHOTO-RELATED WORDS
B. DELIBERATE ENGAGEMENT IN REDUNDANCY
NOTE. – *** indicates p < .001, * indicates p < .05. Error bars represent standard errors
of the mean.
Discussion
These results provide initial evidence that when people share a photo of their experience,
they also communicate visual content in the text, although this content is already visible to the
receiver in the photo. Thus, when they create visual WOM, people convey similar information in
both photos and words. Our findings provide evidence that individuals deliberately communicate
redundantly and communicate as if they hold a redundancy goal, supporting H1. Next, we aim to
further examine this hypothesis in a more externally valid dataset of restaurant reviews posted on
Yelp and TripAdvisor. Additionally, we aim to generalize our findings beyond one-to-one
communication (e.g., texting a friend in study 1) and test our prediction in a one-to-many setting
(e.g., writing an online review).
STUDY 2 - TESTING REDUNDANT COMMUNICATION IN NATURALLY OCCURRING
WORD OF MOUTH
Datasets
We examined two large, real-world datasets, one from Yelp and one from TripAdvisor. The
datasets included reviews of restaurants in the Los Angeles area (681,526 reviews on Yelp and
205,792 reviews on TripAdvisor) written between 2004 and 2019. We collected the review text
as well as several control variables: (a) the reviewer's rating of the restaurant, (b) any photo the
reviewer uploaded, (c) device type (mobile vs. not, available only on TripAdvisor), (d) reviewer
status (elite vs. not, available only on Yelp), (e) number of total reviews the reviewer has written,
(f) restaurant-related information (e.g., cuisine), and (g) date of the review (available only on
TripAdvisor). We summarize these variables for each dataset in tables 1A and 1B.
TABLE 1.1A STUDY 2 – YELP DATASET: DESCRIPTIVE STATISTICS
N = 681,526 reviews in Yelp                          Mean    SD      Min   Max
Number of words per review                           111     105     0     994
Number of photos per review                          0.6     2.0     0     36
Proportion of reviews with a photo                   21%
Proportion of 5-star reviews                         49%
Proportion of reviews by elite users                 16%
Number of reviews written by the average reviewer    152     325.8   1     12,596
TABLE 1.1B STUDY 2 – TRIPADVISOR DATASET: DESCRIPTIVE STATISTICS
N = 205,792 reviews in TripAdvisor                   Mean    SD      Min   Max
Number of words per review                           77      75      0     2,514
Number of photos per review                          0.6     1.3     0     4
Proportion of reviews with a photo                   22%
Proportion of 5-star reviews                         49%
Number of reviews written by the average reviewer    146     272.6   1     7,037
Proportion of reviews written on a mobile device     31%
Analyses
Word-frequency averaging method. Similar to study 1, we used a word-frequency
averaging method to analyze the relationship between photos and visual content in the reviews.
Given the breadth of experiences (e.g., different cuisines and restaurant types) discussed in the
reviews, we did not build a custom dictionary. Instead, we relied on the see words category from
LIWC.
Embeddings method. While the frequency method is very useful, one limitation is that not
every review contains the words in the pre-defined dictionary. To avoid data sparsity issues and
to ensure the robustness of our results, we also analyzed the relationship between photos and
words in reviews by using a second approach relying on word embeddings methods.
We used the distributed dictionary method (Garten et al. 2018) applied to the set of see
words in the LIWC lexicon. The distributed dictionary method uses word embeddings
representations to measure the semantic distance (i.e., dissimilarity in meaning) between the
review and a given construct characterized by a set of words. Word embeddings are popular tools
in computational linguistics that quantify the meanings of words by describing them as high-
dimensional vectors. Word vectors are derived from the structure of word co-occurrence in
natural language and are useful for more comprehensive text analysis (Bhatia, Richie, and Zou
2019). Here, we used word embeddings to extrapolate visual content as a measure of
redundancy.
Since word embeddings quantify word meaning, reviews closely related to the construct
of interest, i.e., reviews with visual content in our case, will be characterized by vectors closer to
the vectors of the words describing the construct. In contrast, reviews less related to the construct
of interest will have vectors further away from the vectors of the words describing the construct.
Formally, we calculate the degree of visual content in each review by first tokenizing and
vectorizing the review with the word2vec embeddings model (Mikolov et al. 2013). This gives
us, for each review i, a 300-dimensional vector wi. We also used the word2vec model to obtain
word vectors wk for each of the LIWC visual content words and then averaged the vectors for the
entire see word set to obtain a single vector representation v. Finally, for each review i, we
calculated the relative semantic distance between its vector wi and the vector v.
The measure of cosine distance used to compute the distance between visual content
words and each review lies in the range [−1, +1]. Maximum similarity is indicated by cos θ = 1,
and maximum dissimilarity is indicated by cos θ = −1. Reviews with more positive values use
words that are more semantically related to visual content words.
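The distributed dictionary scoring above can be sketched as follows. The tiny 4-dimensional vectors stand in for trained 300-dimensional word2vec embeddings; all numeric values and the three stand-in see words are invented purely for illustration.

```python
import numpy as np

# Toy 4-dimensional embeddings standing in for 300-d word2vec vectors
# (hypothetical values; real vectors come from a trained model).
EMB = {
    "see":      np.array([0.9, 0.1, 0.0, 0.1]),
    "colorful": np.array([0.8, 0.2, 0.1, 0.0]),
    "bright":   np.array([0.7, 0.3, 0.0, 0.2]),
    "donut":    np.array([0.1, 0.9, 0.2, 0.0]),
    "tasty":    np.array([0.0, 0.8, 0.3, 0.1]),
    "sweet":    np.array([0.1, 0.7, 0.4, 0.0]),
}
SEE_WORDS = ["see", "colorful", "bright"]  # stand-in for the LIWC see category

def avg_vector(words):
    """Average the embeddings of the words found in the vocabulary."""
    vecs = [EMB[w] for w in words if w in EMB]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    """Cosine similarity between two vectors, in [-1, +1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Construct vector v for the "visual content" construct, then score reviews
v = avg_vector(SEE_WORDS)

def review_similarity(review):
    return cosine(avg_vector(review.lower().split()), v)

# A visually oriented review scores closer to the see-word vector
print(review_similarity("bright colorful donut") > review_similarity("tasty sweet donut"))
```

With real embeddings, each review is tokenized, its word vectors averaged into a single review vector, and the cosine with v serves as the redundancy score.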
Results
We tested whether the presence of a photo in the review predicts an increase in the degree
of visual content in the review, i.e., evidence of redundancy. For each dataset, we predicted the
visual content in the review from the presence of a photo in the review over and above the
control variables. The presence of a photo was coded as binary (1 = with a photo, 0 = without a photo).
Some restaurants may be inherently more visually attractive, which may contribute to our effect.
To eliminate the possibility of restaurant heterogeneity driving our results, we estimated a fixed
effect model by using the lfe package from R (Gaure 2013). Using OLS, we clustered standard
errors at the restaurant level. Further, because camera-enabled phones became more available
after the iPhone launch in 2007, more recent reviews may be more likely to include photos.
Where available (TripAdvisor dataset), we clustered standard errors at both the restaurant and the
review date levels.
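The core fixed-effects logic (estimated here with R's lfe package) can be illustrated in Python by applying the within-transformation by hand: demeaning the outcome and the photo indicator within each restaurant removes stable restaurant-level heterogeneity before the photo coefficient is estimated. The data below are fabricated for illustration, and clustered standard errors are omitted for brevity.

```python
import numpy as np
from collections import defaultdict

# Hypothetical toy data: (restaurant_id, has_photo, visual_content_score)
rows = [
    ("A", 1, 0.30), ("A", 0, 0.10), ("A", 1, 0.28),
    ("B", 1, 0.50), ("B", 0, 0.32), ("B", 0, 0.30),
]

def demean_within(values, groups):
    """Subtract each group's mean (the fixed-effect 'within' transformation)."""
    by_group = defaultdict(list)
    for g, v in zip(groups, values):
        by_group[g].append(v)
    means = {g: np.mean(vs) for g, vs in by_group.items()}
    return np.array([v - means[g] for g, v in zip(groups, values)])

groups = [r[0] for r in rows]
x = demean_within([r[1] for r in rows], groups)  # photo indicator, demeaned
y = demean_within([r[2] for r in rows], groups)  # visual content, demeaned

# OLS slope on demeaned data = fixed-effects estimate of the photo coefficient
beta = float(x @ y / (x @ x))
print(round(beta, 3))  # prints 0.19
```

A positive beta indicates that, within the same restaurant, reviews with photos carry more visual content in the text, which is the pattern the fixed-effects models test at scale.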
Word-frequency averaging method. Using fixed-effects models for both the Yelp and
TripAdvisor datasets, we found evidence of redundancy: The presence of a photo in the review
predicted the degree of visual content in the review text across both platforms; βYelp = 0.10, t =
28.80, p < .001; βTripAdvisor = 0.12, t = 16.88, p < .001 (Tables 2A and 2B).
TABLE 1.2A STUDY 2 – RESULTS OF FIXED EFFECT MODEL USING WORD-
FREQUENCY METHOD IN YELP DATASET*
                                             Estimate   SE      t-value   p-value
Presence of a photo                          0.095      0.003   28.798    <2e-16
Review rating                                -0.001     0.001   -1.579    .114
Word count (excluding visual words)          0.082      0.001   51.112    <2e-16
Reviewer status (elite)**                    0.018      0.003   5.160     <2e-16
Adjusted R-squared                           4.2%
* Restaurant-specific fixed effects are included in the model but not presented here
** Reviewers’ total number of reviews correlated strongly with reviewer status (r = .47) and
hence was not included in the model
TABLE 1.2B STUDY 2 – RESULTS OF FIXED EFFECT MODEL USING WORD-
FREQUENCY METHOD IN TRIPADVISOR DATASET*
                                             Estimate   SE      t-value   p-value
Presence of a photo                          0.128      0.008   16.882    <2e-16
Review rating                                0.019      0.003   6.494     <2e-16
Word count (excluding visual words)          0.026      0.004   6.006     <2e-16
Device (mobile)                              -0.018     0.006   -2.721    .006
Reviewers’ total reviews (log transformed)   0.005      0.002   2.423     .015
Adjusted R-squared                           10.5%
* Restaurant and time fixed effects are included in the model but not presented here
Word-embedding method. Paralleling our findings from the frequency-based method,
results based on semantic similarity also revealed a significant positive association between the
likelihood to include a photo and the similarity scores between reviews and visual words (βYelp
= 0.001, t = 6.14, p < .001; βTripAdvisor = 0.001, t = 6.05, p < .001; Tables 3A and 3B).
TABLE 1.3A STUDY 2 – RESULTS OF FIXED EFFECT MODEL USING WORD-
EMBEDDING METHOD IN YELP DATASET*
                                             Estimate   SE      t-value   p-value
Presence of a photo                          0.001      0.001   6.136     <2e-16
Review rating                                0.001      0.000   31.521    <2e-16
Word count (excluding visual words)          0.001      0.000   102.97    <2e-16
Reviewer status (elite)                      0.004      0.001   38.209    <2e-16
Adjusted R-squared                           16.8%
* Restaurant-specific fixed effects are included in the model but not presented here
TABLE 1.3B STUDY 2 – RESULTS OF FIXED EFFECT MODEL USING WORD-
EMBEDDING METHOD IN TRIPADVISOR DATASET*
                                             Estimate   SE      t-value   p-value
Presence of a photo                          0.001      0.001   6.046     <2e-16
Review rating                                0.001      0.001   9.925     <2e-16
Word count (excluding visual words)          0.001      0.000   124.594   <2e-16
Device (mobile)                              -0.002     0.002   -10.491   <2e-16
Reviewers’ total reviews (log transformed)   0.001      0.001   20.163    <2e-16
Adjusted R-squared                           25.2%
* Restaurant and time fixed effects are included in the model but not presented here
Discussion
Study 2 further demonstrates that people engage in redundant communication when both
photos and words are at their disposal: When reviewers include a photo with their review, they
also emphasize visual aspects in the review text more. This study provides additional and
externally valid evidence for H1. However, the experiences people shared varied widely in both
review datasets. Some restaurant experiences may be more visually complex than others, which
we predict is part of our effect. Namely, we predict that people offer more redundant
communication as the visual complexity of an experience increases (H2). Because our fixed-
effects models controlled for all stable differences between restaurants, we were not able to
isolate the effect of visual complexity. Next, we examine H2 and assess whether people engage in even
greater redundancy when their experience is more (vs. less) visually complex. Intuitively, photos
should be more helpful and convey the required information better in more complex settings.
However, we predict and test that people become more redundant and also convey more visual
content in the text when sharing their more complex experiences.
STUDY 3 – VISUAL COMPLEXITY INCREASES REDUNDANCY
Study 3 has three goals. First, we test whether people become more redundant in their
communication as a function of visual complexity (H2). Photos can reduce ambiguity more when
experiences are more visually complex. As a result, one would expect that consumers evaluate
photos as more helpful and are more likely to share a photo in a more (vs. less) complex context.
More importantly, though, we predict that, at the same time, people will also create greater
redundancy by sharing photos and emphasizing visual aspects in words when experiences are
more visually complex. Second, study 2 provided evidence for redundancy when people shared
their own experiences. Yet, these experiences took place across a large variety of settings. Here,
we test H2 when all participants have the same focal experience (consuming a donut). Third, we
provide evidence of redundancy when individuals choose both their own words and their own
photos to convey the experience, mimicking how consumers share their experiences in real life.
Thus, this study involves participants’ decisions on each part of their visual-verbal
communication and provides evidence that visual complexity impacts not just photo-sharing but,
more importantly, also the extent of redundancy in individuals’ communication.
Method
Participants. Study 3 followed a 2-group between-subjects design (visual-complexity:
high vs. low). We expected a greater percentage of people would find the photo useful and share
a photo of the donut experience as visual complexity increases. Further, we expected people to
offer more redundant content in the high-complexity compared with the low-complexity
condition. We recruited 203 participants (58.1% female; Mage = 19.9, SDage = 1.55) from a
university subject pool. Participants received course credit for their participation in an hour-long
lab session that also included other studies. This study was approved by the university’s IRB.
Procedure. Participants completed the study on their cell phones. All participants
imagined they went to a new donut shop. After consenting and receiving general instructions, the
study prompted participants to raise their hands and ask for a donut from the research assistant.
Participants were randomly assigned to one of two complexity conditions by virtue of the lab
session they participated in. In the complex condition, participants received a decorated donut
that was more visually complex (i.e., a rainbow-decorated donut similar to the one used in study
1; Appendix G in Supplementary Material). In the less complex condition, participants received
a chocolate-glazed donut. All participants received two plates with their assigned donuts; one
plate featured a donut as the display donut that participants could later photograph if they wanted
to, the other plate included half a donut for participants to eat. Participants ate as much as they
wanted of the donut and then proceeded to the next survey page. To increase the realism of the
study, we asked participants to provide a star rating for the donut (from 1-star to 5-star) as they
would normally do on a review site. Participants, on average, assigned a higher star value in the
complex (M = 3.84, SD = 0.90) than in the simple condition (M = 3.47, SD = 0.97), F(1, 200) =
7.99, p = .005, likely because of the donut’s uniqueness. To establish the effect of visual
complexity on our focal measures over and above its effect on star ratings, we enter star ratings
as a control in all subsequent analyses.
After providing the star rating, participants were asked to write a review. Next,
participants were asked whether they would consider adding a photo to their review (1 = yes, 0 =
no). If they indicated they would add a photo, participants took a photo using their camera-phone
and uploaded it to the survey. We measured redundancy as the extent to which the review text
included see words from the LIWC dictionary among those who chose to take a photo.
We also asked whether reviewers thought their review text and a photo would be useful
to a fellow student looking for a donut shop. They indicated how useful they thought their review
text would be (1 = not at all useful, 7 = very useful). Further, independent of their photo choice,
they also indicated how useful they thought including a photo with the review text would be (1 =
not at all useful, 7 = very useful).
Results
Review and photo helpfulness. Participants rated their review text as marginally more
helpful in the more complex (M = 4.97, SD = 1.35) versus less complex condition (M = 4.55, SD
= 1.39); β = 0.38, t(200) = 1.92, p = .06. They also found that adding a photo would be more
helpful in the more complex (M = 5.94, SD = 1.51) versus less complex (M = 4.64, SD = 1.93)
condition; β = 1.23, t(200) = 4.94, p < .001. These results align with our theorizing that photos
should reduce ambiguity more when experiences are visually more complex.
Sharing a photo. We expected that with greater visual complexity, more participants
would be more likely to share a photo with their text. We regressed the likelihood of sharing a
photo (1 = yes, 0 = no) on condition using logistic regression and controlled for star rating
(results are similar in pattern and statistical significance without this control; Appendix H in
Supplemental Material). As
expected, a greater percentage of participants in the complex condition (93%) indicated that they
would include a photo with the review compared to those in the simple condition (61%), β =
1.00, z = 4.52, p < .001. Still, even in the less complex condition, the majority of participants
shared a photo, suggesting that in today’s world, photos are generally important in telling a story
about one’s experience.
Redundant content in the text. We examined whether participants who shared a photo (N
= 155, 76% of original sample) also included visual words in their text, i.e., whether they offered
redundant content. With this objective, we regressed visual content words in the review on
condition, controlling for star ratings. This analysis revealed a significant effect of condition;
β = 3.69, t(153) = 4.85, p < .001. Even though everybody in this subsample shared a photo, those
in the more complex condition created greater redundancy by referring to visual aspects more
(M = 3.92, SD = 5.96) than those in the less complex condition (M = 0.14, SD = 0.77).
Discussion
Photos inherently capture visual aspects of an experience better than words. Thus, photos
should be even more important and better able to replace words when consumers convey more
visually complex experiences. Indeed, we find that people rate photos as more helpful as visual
complexity increases and are more likely to include photos when sharing a more complex versus
less complex experience. However, even when they share a photo, people also create greater
redundancy in their communication as visual complexity increases in line with H2. As one would
expect, photos become more helpful in conveying complex experiences, but not at the expense of
words. Rather, photos and words go hand in hand.
So far, we have found evidence for redundant communication across a range of contexts.
We suggest that people create redundancy in their communication because they want to inform
the receiver. As a result, if their focus shifts from the receiver to themselves, we expect this shift
to alter the extent to which they create redundant communication. Next, we will examine this
prediction (H3).
STUDY 4 – THE IMPACT OF SELF- VERSUS RECEIVER-FOCUS
Study 4 examines whether consumers’ focus on themselves versus the receiver alters
their redundant use of photos and words. If people’s engagement in redundancy reflects their
focus on the receiver, any deviation from that focus should alter their redundant content. To test
this prediction (H3), we manipulated consumers’ communication goals (i.e., to inform others vs.
to establish one’s own expertise) to shift their focus from the receiver to themselves. We also
examined how people communicated naturally, i.e., in the absence of any explicit
communication goal. We expected that people would naturally communicate as if they focused
on the receiver of their communication.
Method
Participants. As preregistered (AsPredicted #66817), Study 4 followed a 3-group
between-subjects design (goals: no-goal (natural) vs. information-goal vs. expertise-goal; Table
S2 in Supplemental Material). We expected people to offer more redundant content in the
information- and no-goal conditions than in the expertise-goal condition. We posted the study for
400 participants on Amazon's Mechanical Turk with an intended final sample of 360 participants
(i.e., 120 participants per condition). As 48 participants failed the preregistered attention check,
the final sample included 352 qualified participants (56.3% female; Mage = 37.9, SDage = 12.9).
This study was approved by the university’s IRB.
Procedure. Participants were randomly assigned to one of the three goal conditions. All
participants imagined seeing the same donut from study 1. The photo of the donut was constant
across conditions (Figure S1 in Supplemental Material). They further imagined sending this
photo and some text in a text message to their friend. In the information-goal condition,
participants were told that they wanted to provide information about new openings in the
neighborhood and to keep their friend up to date. In the expertise-goal condition, participants
were told that they wanted to show their expertise in finding out about new openings in the
neighborhood, and to impress their friend with their discovery. A pretest had confirmed that the
expertise-goal condition created greater self-focused concerns about one’s own credibility
compared with the information-goal condition (see Appendix J in Supplemental Material).
Further, similar to study 1, those in the no-goal (natural) condition were simply told they wanted
to convey their experience to their friend. Participants then typed the text they would send along
with the donut photo (minimum 100 characters ~ 20 words). To assess redundancy, we assessed
visual content in the text using the same custom dictionary as in study 1.
Results
Manipulation check. We examined whether people’s language changed as a function of
the expertise- or information-goal manipulations. In line with prior research (Littlepage et al.
1995), we found that people in the expertise-goal condition (M = 3.55) used fewer tentative
words than in the information-goal (M = 5.20), β = -1.65, t(351) = -3.07, p = .002, and the no-
goal conditions (M = 4.29), β = -1.25, t(351) = -2.29, p = .02. We did not find any difference
between the information- and no-goal conditions, β = -.40, t(351) = -.77, p = .44.
Redundant content in the text. Next, we regressed the frequency of photo-related words
on the goal conditions, which revealed a significant difference among the goal conditions,
F(2,349) = 4.28, p = .01 (Figure 2). Pairwise comparisons showed that participants in the
information-goal condition used more photo-related words (M = 7.5% of total words) than
participants in the expertise-goal condition (M = 6.1%), β = -1.42, t(351) = -2.44, p = .02,
supporting H3. Further supporting our prediction, we find participants in the no-goal condition
also used more photo-related words (M = 7.7%) than those in the expertise-goal condition,
β = -1.57, t(351) = -2.67, p = .008. There was no significant difference in the frequency of
photo-related words between the no-goal and information-goal conditions, β = .14, t(351) = .26,
p = .79.
FIGURE 1.2
STUDY 4 – PERCENTAGE OF PHOTO-RELATED WORDS AS A FUNCTION OF SELF-
VERSUS OTHER-FOCUSED GOALS
NOTE. – ** indicates p < .01, * indicates p < .05. Error bars represent standard errors of
the mean.
Discussion
These results provide direct evidence that when people simply want to share their
experience, their focus is on the receiver. As a result, they offer redundant content, referring to
the information that is already in the photo in their text. This tendency is reduced when they
focus on themselves to establish their expertise. When they focus on themselves, individuals
become more efficient in their communication. This result is in line with the idea that people
would ideally like to communicate following the principle of least effort (Zipf 1949). In
redundant communication, communicators incur additional costs by repeating similar
information in the photos and the words. When communicators focus on themselves, however,
their communication becomes more efficient and less costly because they no longer repeat
similar information across modalities. Further, these findings provide evidence for the
psychological process that explains why people create redundancy in their communication in the
first place (i.e., to inform the receiver) when one modality (i.e., the photo) would seem sufficient
to convey an experience.
GENERAL DISCUSSION
Researchers have long been interested in the psychological drivers of social transmission.
In social interactions, photos have become more prevalent due to the popularity and availability
of camera-enabled phones. This paper examines how people convey their experiences to others
when both visual and verbal means of communication are at their disposal. In this context, we
ask: is the saying "a photo is worth a thousand words" accurate in visual WOM? When sharing
experiences, do photos replace words (or vice versa) for efficient communication?
Across two large, real-world datasets and three laboratory experiments, we consistently
find evidence that people communicate similar information in photos and words redundantly.
Indeed, we demonstrate that people communicate as if they have a redundancy (vs. efficiency)
goal. Our findings suggest that people convey their experiences in line with Grice's Maxim of
Relation (to provide relevant information) and less with Grice’s Maxim of Quantity (to avoid
information that is already known). We also demonstrate that the visual complexity of the
experience increases people’s engagement in redundancy. In conveying visually complex (vs.
simple) experiences, people consider a photo helpful to the receiver and are more likely to
include it in their communication. However, verbally, they also refer to the visual aspects of this
experience more, even when these aspects are already visible in the photo. Notably, repeating the
same aspects in both photos and words makes the communication redundant, increasing the cost
to the communicator. Still, people seem to willingly incur these additional costs and offer
redundant content when they focus on the receiver (i.e., to inform) but less so when they focus
on themselves (e.g., to signal expertise). This pattern holds for both public communication with
unknown others (e.g., writing a review) and private conversations with close others (e.g., texting
a friend).
Contributions
The current work makes several theoretical contributions. Most importantly, our findings
extend prior research on how consumers create WOM. While prior work has shown how people
create verbal content they share with others (Berger and Milkman 2012; Packard and Berger
2017), the current work reveals insights into how people create visual WOM. Our findings
demonstrate that consumers offer redundant content using both photos and words when they
create visual WOM.
We also identify individuals' focus as a factor contributing to redundant communication.
We find that consumers naturally intend to be informative and focus on the receiver. This focus
impacts how they use photos and words in their communication. As a result, consumers seem to
incur additional costs and repeat similar information redundantly. Prior work suggests that
individuals are often concerned with self-presentational goals and, thus, focus on themselves in
their WOM communication (Barasch and Berger 2014; Rocklage, Rucker, and Nordgren 2018).
Relatedly, we show that when consumers shift their focus from the receiver to themselves, their
communication becomes more efficient (less costly) and less redundant.
Our findings also relate to and extend earlier findings on how languages accommodate
the need to reduce complexity versus ambiguity (Zipf 1949). For example, a recent article found
evidence that the syntactic structure of many languages is optimized for efficiency (Hahn,
Jurafsky, and Futrell 2020). Often, redundant communication is assumed to be poor
communication. At the same time, other research in psycholinguistics suggests that in certain
contexts, people forego efficiency to be useful in their verbal communication (Degen et al.
2020). Our findings align with the latter and extend them to the broader context of visual-verbal
communication. We find that individuals offer redundant content that includes both photos and
words when they intend to inform and communicate clearly with others. Emojis, written
manifestations of nonverbal visual elements (Luangrath, Peck, and Barger 2017), may replace
certain words (e.g., a happy face can replace the word "happy"). However, when it comes to
sharing experiences, people do not seem to simply replace words with photos of their
experiences. Indeed, to tell their complex stories, along with photos, people seem to require
more, rather than fewer, words.
We also contribute to the literature in advertising and journalism that has provided initial
investigations into the interplay between photos and words. Earlier work in this literature
suggests that advertisers or journalists should avoid redundant communication. Instead, they
should share a photo that offers different information than the verbal content. In journalistic
communication, incongruence between the text and the photo could encourage readers to read
the text to make sense of the photo (McIntyre, Lough, and Manzanares 2018). In marketing,
conveying different content between modalities may encourage more elaborate processing, thus
increasing the effectiveness of the advertisement (Houston, Childers, and Heckler 1987).
Contrary to these considerations, we find that, in consumer-initiated interactions, communicators
prioritize informing the receivers over other concerns such as maintaining attention for the article
or fostering elaboration of the ad. Hence, people communicate their experiences using photos
and text that offer similar information.
Managerial Implications and Directions for Future Research
The research presented in this paper investigates the production of visual WOM. This
topic is important for marketers who aim to motivate consumers to share their experiences and
platforms that intend to generate constant consumer engagement. It has been a challenge for
consumers to decode the formula of a helpful review. On Google, the query of “how do I write a
good review” returns over 2 billion search results, suggesting consumers are eager to create
informative reviews. At the same time, platforms aim to guide their contributors to create
informative reviews. Our work suggests that platforms may encourage consumers to create
reviews that include both photos and text and to focus on the receivers when creating content.
Traditionally, platforms have focused on either verbal (e.g., Twitter) or visual (e.g.,
Instagram) modalities as the primary form of communication. More and more, however, people
choose to communicate both visually and verbally. For example, on Instagram, the caption
length has been steadily increasing (Chacon 2020), suggesting that people use both photos and
words to share their experiences. Still, at least procedurally, platforms often prioritize one
modality over the other. For example, on Instagram, users first pick a photo before adding text.
On Amazon, reviewers first write text and are then prompted to add a photo. These varying
practices may pose the question: Does the order of modality in which content creators convey
information impact the extent to which they offer redundant content? We tested this possibility in
a separate study (Appendix K in Supplemental Material). Our results suggest that the order of
modality of information that creators choose (i.e., text first or photos first) does not impact how
people use words and photos. Starting with either words or photos, people offer redundant
content to others. Because both orders generate redundant visual-verbal communication, our
findings suggest platforms may want to facilitate joint visual-verbal communication.
Consumers perusing reviews ask for visual-verbal content (Bazaarvoice 2021), and
communicators themselves deem both helpful. However, platform design (e.g., order,
prominence of text box size vs. other elements) may inadvertently favor one or the other
modality, hampering consumers' natural communication to inform the receiver.
Frequent contributors to review platforms (e.g., elite users on Yelp) seem less affected by
such platform design choices. They are generally more likely to include photos in their reviews
than less frequent contributors. Further, they are more inclined to offer redundant content by
including similar information in their text. However, salient social context cues can set norms for
information sharing on a review platform (Constant, Kiesler, and Sproull 1994). By highlighting
the contributors’ expert status or the need to earn badges, such reputational distinctions may
focus communicators on themselves rather than on the receiver. Thus, this shift in focus may
reduce the extent to which the receiver’s needs are taken into account and limit the redundant
content created, similar to what we found in study 4. As such, our findings suggest platforms
may want to highlight expertise less and instead focus reviewers on informing the receiver.
A timely problem that platforms face is fighting fake reviews that individuals may be
incentivized to create (Mayzlin, Dover, and Chevalier 2014; He et al. 2021). Creating falsely
positive reviews may motivate consumers to repeat the same information in both photos and
words to increase the perceived validity of their messages. As such, platforms may need
additional tools to separate fake from authentic reviews intended to be informative.
Marketers have long valued WOM, and online WOM has uniquely allowed them to
quantify WOM via brand mentions. However, purely visual consumer-generated content requires
marketers to use image detection to identify their brand via logos (e.g., using marketing research
tools such as Visual Listening). Still, these methods are not as common or as easily adopted as
other social listening tools. Reminding communicators about the receivers may increase
redundant content, making the tracking of WOM easier for marketers.
This paper provides a starting point to understand how people create visual WOM.
However, many interesting questions remain. For example, we focused on communication with
others who were not part of the same experience. In object identification, Rubio-Fernandez
(2016) showed that communicators often mentioned colors redundantly even when they shared
the same context with the receivers (i.e., looked at the same objects). Similarly, when the original
context of an experience is shared (e.g., a joint vacation), people may engage in even greater
redundancy than our results suggest. Alternatively, they may assume their shared experience
does not require redundancy, which future research may examine.
Further, we investigated drivers of visual WOM and identified communicators’ focus as
one factor that increased redundant communication. However, other factors may also affect the
extent to which communicators offer redundant content to receivers. For example, consumers
may be more likely to offer redundant content when they intend to persuade people. Those who
hold a persuasion goal may believe that showing and telling others may be more convincing than
either modality alone. Conversely, individuals’ self-enhancement goals may reduce the
redundant content. Communicators may be concerned that they could come across as too
boastful or braggy by emphasizing the same information in both photos and words. Future
research may want to investigate these and other communication goals.
Technology has long been shaping social interactions. Today, people have unprecedented
opportunities to share information and interact with others through diverse communication
channels and modalities (from writing to snapping). In an age when multi-modal communication
is part of many interactions, we set out to understand an important aspect of this behavior: how
people convey experiences using photos and words. By investigating this novel phenomenon, we
extend our understanding of a vital aspect of fundamental human behavior (i.e., sharing
experiences) and how that behavior may have changed with the relatively novel ability to
communicate visually and verbally at the same time. Though many interesting questions remain,
we hope our investigation signifies a critical starting point towards understanding visual-verbal
communication.
CHAPTER TWO
ESSAY TWO: WORDS MEET PHOTOS: WHEN AND WHY VISUAL CONTENT
INCREASES REVIEW HELPFULNESS
Gizem Ceylan
Kristin Diehl
Davide Proserpio
ABSTRACT
Is visual-verbal word-of-mouth more persuasive? If so, do consumers find
communication more helpful when photos and text convey similar or different information? This
paper examines the effect of photo-text similarity on review helpfulness and its underlying
drivers. Using a dataset of 6.8M reviews including 3.3M photos from Yelp and applying state-of-
the-art machine learning algorithms, we quantify the similarity of the content between text and
photos. We find that it is not only the mere presence of a photo that increases helpfulness but
also the similarity between the photo content and the review text. We replicate our main findings
and examine the underlying drivers in two laboratory experiments. When the photos and text
convey similar (vs. dissimilar) information, consumers find the review more helpful because 1)
the information in the review becomes easier to process, and 2) quality inferences of the focal
attribute are heightened. However, greater similarity 3) limits the total amount of information
conveyed, reducing helpfulness. We find that drivers (1) and (2) outweigh (3). Therefore, the
totality of these three distinct processes allows greater photo-text similarity to heighten
persuasion. These findings provide novel insights into the persuasiveness of visual-verbal word-
of-mouth and its underlying psychological drivers.
Keywords: photos, language, natural language processing, reviews, helpfulness, word-of-mouth
INTRODUCTION
People learn from others about a range of things: from places to visit to products to buy.
The ability to provide and access reviews online has systematically changed this discovery
process. Instead of asking a friend or an expert agent for recommendations, today, most people
consult review platforms before eating at a restaurant or traveling to a new city. Prior research in
marketing and computer science has examined the effect of structured (e.g., star ratings) and
unstructured review characteristics (e.g., the valence of the text) as well as reviewer
characteristics (e.g., prior experience) on review helpfulness. Review helpfulness is not just
important to the person perusing the review but also critical to businesses. Helpful reviews are
more likely to affect consumers’ attitudes and behaviors, driving economic outcomes such as
sales (Ghose and Ipeirotis 2011). In addition to text, though, consumers increasingly include
visual information (photos) in their reviews. Indeed, in surveys, consumers state they particularly
value those reviews that feature user-generated visuals (Bazaarvoice 2021). In this paper, we
investigate when and how the presence of photos in a review and the content they display may
impact the helpfulness of that review.
Our specific research question is centered around the interplay between what people
communicate in text and in photos and its downstream consequences on persuasion (i.e., review
helpfulness). Review platforms often suggest that adding a photo to a review could increase that
review’s helpfulness (Schwartz 2019). However, it is unclear whether any photo can increase a
review’s helpfulness or whether some photos are more effective than others. Outside the review
context, the mere presence of a photo increases engagement with the platform (Li and Xie 2020).
Further, on platforms such as Airbnb, platform-provided photos of the property can increase
demand (Zhang et al. 2017). In this paper, we focus on consumer-generated rather than platform-
generated photos. Further, we examine whether receivers find communication helpful when it
includes visual and verbal content that conveys similar information. Using a large-scale data set
of 6.8M restaurant reviews that include both text and photos, we find that the presence of a photo
in a review and the presence of more photos heighten helpfulness. Moreover, using image
identification algorithms and a representation learning algorithm, Doc2Vec (Le and Mikolov
2014), we assess the extent to which the content of the photo is similar to the content of the text.
Controlling for review and reviewer characteristics, we find that greater similarity between
photos and review text heightens helpfulness. We replicate these findings in two laboratory
experiments that allow us to establish the causal link between photo-text similarity and review
helpfulness. In addition, these experiments allow us to identify and distinguish three different
reasons for why a photo can be helpful: two that favor greater similarity between the photo and
text (ease of comprehension, quality inference) and one that favors greater dissimilarity between
photo and text (amount of new information).
Jointly, this paper examines a previously unexplored aspect of online reviews and
persuasive communication (i.e., word-of-mouth) that includes visual-verbal consumer-generated
content. This paper makes several contributions. First, these findings shed light on the emerging
literature on visual-verbal word-of-mouth (WOM). We identify novel insights into when and
why some reviews that include photos are more helpful than others. Second, we contribute to the
literature on WOM in general. We identify three distinct psychological mechanisms that
collectively impact consumers’ helpfulness judgments. Third, we identify the similarity of the
content between photos and text as a novel antecedent that impacts a review’s helpfulness.
Fourth, we examine real consumer reviews to demonstrate the effect of photo-text similarity on
helpfulness and further conduct two laboratory experiments to causally test for the mechanism
underlying this effect. By leveraging multiple methods and state-of-the-art tools available in machine
learning, we show robust evidence for our predictions. The findings we present in this paper are
important for both people and businesses who aim to create helpful content. Further, using these
findings, platforms could automatically identify helpful reviews by incorporating photo-text
similarity into their algorithms to improve review rankings.
REVIEWS AND REVIEW HELPFULNESS
Consumers often consult reviews to reduce uncertainty and aid their decision-making
processes. Earlier work on reviews suggests that reviews can have a significant impact on firms’
economic outcomes. For example, research has demonstrated an association between how
positively a product such as a book or a movie is rated by consumers and subsequent sales of the
product (Dellarocas, Zhang, and Awad 2007; Chevalier and Mayzlin 2006) and between review
volume and sales (Duan, Gu, and Whinston 2008).
Helpful reviews are more likely to influence consumers’ attitudes and behavior. As a
result, helpful reviews have a larger economic impact on firms’ sales (Ghose and Ipeirotis 2011).
What makes a review helpful? Prior research identified reviewer characteristics and aspects of
the review and review text that improve helpfulness: identity of the reviewer (Forman, Ghose,
and Wiesenfeld 2008), valence based on star ratings (Kim et al. 2006), rating extremity
(Mudambi and Schuff 2010), semantic and stylistic aspects of the review text (Ghose and
Ipeirotis 2011; Kim et al. 2006), and text readability and informativeness (Ghose and Ipeirotis
2011) can all improve helpfulness. Our paper adds to this literature by examining the effect of
photos associated with reviews on helpfulness. Specifically, we examine whether the presence of
photos and the interplay between the content of review text and photos impact helpfulness.
THE IMPACT OF PHOTOS ON REVIEW HELPFULNESS
With unprecedented access to smartphones, people can take and share photos of almost
everything they encounter. Indeed, people share 4.5 billion photos daily on WhatsApp and 4
billion snaps on Snapchat (Business Today 2017). Furthermore, review sites often prompt
consumers to add a photo when they share their experiences. For instance, Amazon now reminds
reviewers to add a photo claiming that shoppers find reviews that include images more helpful
than text alone. We ask whether including a photo in a review indeed increases review
helpfulness.
Prior research suggests potential reasons for why photos may be helpful. First, a photo
carries information that may be important for the consumer but hard to convey otherwise (e.g.,
the restaurant's decor). Second, photos are processed faster than words (Paivio 1969), which may
enhance the comprehension of the review text. Third, visuals evoke more intense emotional
reactions (Rossiter and Percy 1980), which may be particularly important in instances when
hedonic reactions are important. While we focus on user-generated photos and reviews, the
literature in advertising also examined the effect of photos on comprehension, recall, and
attitudes and found that photos indeed improve these outcomes (Edell and Staelin 1983). For
these reasons, we posit that the mere presence of a photo (or multiple photos) in a review should
increase its perceived helpfulness.
H1: Reviews with one or more photos in addition to text are perceived to be more helpful than reviews
with text alone.
THE INTERPLAY BETWEEN PHOTOS AND REVIEW TEXT
While photos per se may be helpful, we are more interested in assessing when photos
increase the helpfulness of a review more. Recent research has focused on the content of photos
and found that depicting the product user increases review helpfulness for hedonic products
while depicting the product itself (without the user) increases review helpfulness for utilitarian
products (Ding et al. 2021). While the content of photos by itself is important, consumers process
both the information in the photo and the information in the text. Hence, we suggest that the
interplay between the text and the photo, not just the photo per se, affects review helpfulness.
Understanding the interplay between text and photo content allows us to predict when visual-
verbal WOM will be more persuasive.
In a review, the text and the photo can either convey similar or different information. For
instance, consumers can depict their entrée in a photo and also write about the entrée in their
review (i.e., conveying similar information). Alternatively, they can show the entrée in a photo
but write about the atmosphere of the restaurant in review text (i.e., conveying dissimilar
information). Overall, we propose that reviews are more helpful to consumers when photo and
text convey similar information. We propose three mechanisms that can explain this effect. First,
the greater similarity between text and photos can increase helpfulness because photos can
enhance comprehension (Levin and Lesgold 1978) and help readers perceive and understand the
essence of the textual information more easily (Levie and Lentz 1982). Photos can enhance the
clarity of the information in the text and make it more concrete for the reader, ultimately
facilitating the comprehension of the textual information (Levin and Lesgold 1978). Further,
photos conveying similar information as the text may help comprehension by giving readers a
source of information they can use to verify their understanding of the text (Levie and Lentz
1982). Finally, dissimilar photos may create processing difficulty, which can interfere with
comprehension and reduce learning (Peeck 1993).
Second, similarity can increase helpfulness because text and photo jointly emphasize the
focal attribute and heighten the perceived quality of this attribute. In print advertising, Houston,
Childers, and Heckler (1987) found that product attribute information was recalled more
accurately when the same information was presented both in words and photos than when words
and photos conveyed different information. Further, dissimilar photos may evoke counterarguing
as to why the review writer included different information in the photo than in the text,
diminishing the message's credibility (Petty and Cacioppo 1986). We suggest that the inferred
quality of the focal attribute is greater when the review text and the photo convey similar (vs.
dissimilar) information.
Third, greater photo-text similarity can also decrease helpfulness. Helpfulness should be
affected by the total amount of information a review provides. When photo and text content is
similar, the message conveys less information than if each modality conveyed different content.
For review text, prior research found that the length of the review and the depth of information
transmitted (such as identifying pros and cons of a product or service) increase review
helpfulness (Ghose and Ipeirotis 2011). In a visual-verbal review, when the photo and the text
focus on the same aspect of an experience, this limits the total amount of information a reader
can obtain from a review. Thus, when photos and text convey similar information, this may
reduce a review’s helpfulness.
We predict that these three distinct processes jointly influence helpfulness. We suggest
that the advantages that conveying similar information in photos and text has for comprehension
and quality inferences will outweigh its limitation in terms of the total amount of information
conveyed. Thus, overall, greater photo-text similarity (vs. dissimilarity) will be perceived as
more helpful. Formally:
H2: Greater similarity between the text and the photo(s) in a review increases the perceived
helpfulness of a review.
H3: The effect of similarity on helpfulness is due to the positive effects of greater
comprehension ease and quality inferences, despite the negative effect of the amount of
information conveyed.
We also examine under which conditions multiple photos increase review helpfulness. If
the similarity between photos and the text is essential for review helpfulness, as we suggest
above, additional photos should increase helpfulness only to the extent that the review text
conveys similar information.
H4: Additional photos increase review helpfulness only when photos and text convey similar
topics.
OVERVIEW OF STUDIES
To ensure external and internal validity, we test our predictions in a dataset that includes
naturally occurring consumer reviews (from Yelp) and two carefully designed laboratory
experiments. Study 1 provides real-life evidence of our core prediction: photos increase
helpfulness in a review and, even more importantly, greater photo-text similarity increases
helpfulness. In study 2, we manipulate photo-text similarity and examine psychological drivers
that render reviews with greater photo-text similarity more helpful. Study 3 further manipulates
photo-text similarity and identifies when and why multiple photos can increase helpfulness.
STUDY 1 – GREATER PHOTO-TEXT SIMILARITY INCREASES HELPFULNESS IN
YELP REVIEWS
Data
To test our predictions, we collected Yelp reviews for the complete set of restaurants
located in Los Angeles County that received a review between 2004 and 2020. The final dataset
contained 22,678 restaurants listed on Yelp and 6.8M reviews, associated with 3.28M photos,
written by 1.96M reviewers. For every review, we obtained its text, star rating, number of useful
votes, and all the photos, if any, that the reviewer uploaded along with the review. For every
reviewer,
elite status, and the total number of reviews the reviewer had written so far.
Measuring Review-Photos Similarity
We created a measure of similarity between the content of the text and the photos in two
steps. First, we used Google Cloud Platform Vision API and the “Detect Labels” function. The
Vision API can detect and extract information about entities in an image across a broad group of
categories. Labels can identify general objects, locations, activities, animal species, products, and
more.2 We provide a few examples of images and the labels extracted in table 2.1.

2 For more information, see: https://cloud.google.com/vision/docs/labels
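As an illustration, the label-extraction step might look like the sketch below. The helper assumes only that the response exposes `label_annotations` entries with a `description` field, which is the shape documented for the Vision API; the stub stands in for a real API response, which would require credentials and network access.

```python
from types import SimpleNamespace

def extract_labels(response):
    """Collect label descriptions from a Vision API label_detection
    response (any object exposing .label_annotations[i].description)."""
    return [ann.description for ann in response.label_annotations]

# Stub mimicking the shape of a google-cloud-vision response, so the
# helper can be exercised without credentials or network access.
stub = SimpleNamespace(label_annotations=[
    SimpleNamespace(description="Food"),
    SimpleNamespace(description="Fried egg"),
    SimpleNamespace(description="Egg yolk"),
])
print(extract_labels(stub))  # ['Food', 'Fried egg', 'Egg yolk']

# With the real client (assumed standard google-cloud-vision usage):
#   from google.cloud import vision
#   client = vision.ImageAnnotatorClient()
#   response = client.label_detection(image=vision.Image(content=photo_bytes))
#   labels = extract_labels(response)
```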
TABLE 2.1 STUDY 1 – EXAMPLES OF PHOTOS AND LABELS EXTRACTED USING
GOOGLE VISION API.

[Photo 1] Labels: Food, Fried egg, Egg yolk, Ingredient, Tableware, Recipe, Fast food, Staple
food, Cuisine, Egg white
[Photo 2] Labels: Food, Ingredient, Recipe, Al dente, Cuisine, Dish, Pasta, Noodle, Produce,
Bigoli
[Photo 3] Labels: Tableware, Drinkware, Stemware, Wine glass, Wine, Table, Barware, Fluid,
Alcoholic beverage, Champagne stemware

Next, we applied Doc2Vec – a representation learning algorithm that converts text
documents to low-dimensional vectors – to obtain vectors for both reviews and photo labels (Le
and Mikolov 2014). An important property of these document vectors is that they preserve
semantic information about the text content such that documents close together in the vector
space have similar meanings and documents distant from each other in the vector space have
differing meanings. In line with prior research, we measured semantic similarity between
reviews and photo labels by computing the cosine similarity between their vectors. The
maximum similarity between the vector of the review text and the vector of the photo labels is
indicated by cos θ = 1, and the maximum dissimilarity is indicated by cos θ = -1.
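The similarity computation itself is a standard cosine between two vectors; a minimal sketch, with toy three-dimensional vectors standing in for the 64-dimensional Doc2Vec vectors used here:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two document vectors:
    1 = maximal similarity, -1 = maximal dissimilarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional vectors; the paper's Doc2Vec vectors have 64 dimensions.
review_vec = [1.0, 2.0, 0.0]
photos_vec = [2.0, 4.0, 0.0]   # same direction as review_vec
print(round(cosine_similarity(review_vec, photos_vec), 3))  # 1.0
```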
We trained Doc2Vec using 80% of the entire corpus of Yelp reviews and the image labels
extracted from the photos associated with each review. Since a review can be associated with
multiple photos, we created one “photos document” for each review by concatenating the labels
extracted from each photo associated with the same review. Additionally, a few parameters need
to be set in this type of analysis: the size of the vectors, i.e., the number of dimensions of each
vector; the window size, i.e., the maximum distance (in words) between the current and the
predicted word within a document; and the number of iterations carried out over the training
corpus. After testing different configurations and assessing each model by its ability to find
similar documents, we set these values to 64 vector dimensions, a window size of 8 words, and
120 iterations, because this configuration performed slightly better than all others.
3
We provide a visual description of our approach in figure 1 and examples of
reviews with high and low photo-text similarity in table 2.
3
Across all models we tested, we found that inferred documents are found to be most similar to themselves in >
95% of the cases suggesting that the models behaved in a consistent manner. In assessing Doc2Vec models we
followed procedures outlined at https://radimrehurek.com/gensim/auto_examples/tutorials/run_doc2vec_lee.html
FIGURE 2.1 STUDY 1 – VISUAL DEPICTION OF SIMILARITY ASSESSMENT
BETWEEN REVIEW TEXT AND IMAGES IN A REVIEW
TABLE 2.2 STUDY 1 – EXAMPLES OF REVIEWS WITH HIGH AND LOW SIMILARITY BETWEEN THE PHOTO AND THE TEXT.

High Similarity (similarity score = 0.70): “Delicious food. Great service. Highly recommend. We had the chicken tikka masala and chicken vindaloo. Both were great. The garlic naan bread was delicious as well!”

Low Similarity (similarity score = -0.20): “Happy hour is FABULOUS at RFD! Jackfruit street tacos, Pinot Grigio for 10 dollars! Happy little Friday (Thursday) to us! I'm coming back for Taco Tuesday!”

Measuring Image Quality
Besides controlling for observable variables such as review and reviewer characteristics, we also controlled for the quality of the photos in our dataset. We measured image quality using the Neural Image Assessment (NIMA) model described in Talebi and Milanfar (2018), which uses convolutional neural networks to compute measures of both aesthetic and technical image quality.4 Higher values for these variables imply higher estimated quality.

4 We use the implementation and pre-trained model available at: https://idealo.github.io/image-quality-assessment/#datasets
Descriptive Statistics
Uploading photos in addition to the review text became popular in the last decade, likely due to the diffusion of smartphones. This is visible in figure 2, where we plot the number of monthly reviews (left panel) and the monthly number of photos posted on Yelp (right panel).5 Out of the 6.8M reviews, about 1.33M are associated with at least one photo, and among these reviews, the average number of images per review is 2.46.

5 In addition to showing the popularity of photos, figure 2 shows the effect that COVID-19 had on the number of reviews and photos uploaded to Yelp. All results presented in the empirical analysis section are robust to the exclusion of the year 2020.

FIGURE 2.2 STUDY 1 – NUMBER OF MONTHLY YELP REVIEWS (LEFT) AND MONTHLY NUMBER OF PHOTOS POSTED (RIGHT).

As figure 3 shows, photos are more likely to be associated with reviews with positive ratings (3 stars or above) than with negative ratings (1 or 2 stars). Turning to the similarity
between review text and photos, in figure 4, we plot the distribution of this variable. The mean
similarity is 0.21 (95% CI [0.2131, 0.2135]), suggesting that, on average, there is some overlap
between the content of the review text and what is displayed in the photos. Finally, the average
number of useful votes per review is 1.1, and this number doubles when considering only
reviews with at least one image. Table 3 reports the dataset summary statistics.
FIGURE 2.3 STUDY 1 – FRACTION OF REVIEWS WITH AT LEAST ONE PHOTO BY
STAR RATING (LEFT) AND AVERAGE NUMBER OF PHOTOS PER REVIEW BY STAR-
RATING (RIGHT).
FIGURE 2.4 STUDY 1 – DENSITY OF THE SIMILARITY SCORES BETWEEN REVIEWS AND PHOTOS. THE RED DASHED LINE REPRESENTS THE MEAN.
TABLE 2.3 STUDY 1 – MEAN (STANDARD DEVIATION) OF THE YELP
DATASET VARIABLES
Model and Results
We started by assessing the effect of the presence of photos in a review on the number of helpful votes the review received, in order to test H1. We did so by estimating the following model:

$\log \mathrm{HelpfulVotes}_{ijt} = \beta\, \mathrm{hasPhotos}_{ijt} + X_{ijt}'\gamma + \alpha_j + \tau_t + \epsilon_{ijt}$  (1)

where the dependent variable is the log of the number of helpful votes that review i of restaurant j received at year-month t.6 Has photos, the variable of interest, is a binary indicator of whether review i is associated with any photos (1) or not (0); $X_{ijt}'$ is a vector of time-varying controls in which we included the rating of the review, the log of review length (in characters), whether the reviewer is local (i.e., whether the reviewer wrote a review for a restaurant in the same city as the reviewer's location), whether the reviewer had elite status, and the log of the number of reviews written by the reviewer of review i. Finally, we included restaurant and year-month fixed effects to account for time-invariant unobservable restaurant characteristics and time-varying shocks (e.g., COVID-19) common to all restaurants that can affect the number of helpful votes a review receives. Because we include restaurant fixed effects, our specification exploits within-restaurant variation to estimate the effect of a review including at least one photo on the helpful votes it receives. We estimated Equation 1 using OLS and, following standard practice (Bertrand, Duflo, and Mullainathan 2004), clustered standard errors at the restaurant level. We report the estimates in Table 4.

6 Every time we take the log of a variable, we add one to avoid taking the log of zero.
TABLE 2.4 STUDY 1 – THE EFFECT OF HAVING PHOTOS ON HELPFUL VOTES.
Without any time-varying controls, the coefficient of interest has photos was positive and
significant, suggesting that adding photos to reviews increases the review's helpful votes, as
predicted by H1. The results were qualitatively similar when we included the wide array of controls discussed above. Using these estimates, we found that reviews with at least one photo experienced, on average, a 19% (exp(0.175) – 1 ≈ .19) increase in helpful votes compared with reviews without photos.
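The fixed-effects logic of Equation 1 can be sketched on simulated data: demeaning each variable within restaurant (the within transformation) is equivalent to including restaurant fixed effects. All variable names and the data below are illustrative, not the paper's actual estimation code:

```python
import numpy as np

rng = np.random.default_rng(0)
n_restaurants, reviews_per = 200, 30
restaurant = np.repeat(np.arange(n_restaurants), reviews_per)

# Simulated panel with a true photo effect of 0.175 log points;
# photo posting is allowed to correlate with the restaurant effect.
alpha = rng.normal(0, 1, n_restaurants)[restaurant]
has_photos = rng.binomial(1, 0.2 + 0.1 * (alpha > 0))
log_votes = 0.175 * has_photos + alpha + rng.normal(0, 0.5, restaurant.size)

def within_demean(x, group):
    """Subtract the group (restaurant) mean: the within transformation."""
    means = np.bincount(group, weights=x) / np.bincount(group)
    return x - means[group]

y = within_demean(log_votes, restaurant)
x = within_demean(has_photos.astype(float), restaurant)
beta_hat = (x @ y) / (x @ x)  # OLS slope on the demeaned data

print(round(beta_hat, 3))  # close to the true 0.175
# A coefficient of 0.175 on a log outcome implies roughly a 19% increase:
print(round(np.exp(0.175) - 1, 3))
```

A naive regression without the within transformation would here pick up the correlation between photos and restaurant quality; demeaning removes that time-invariant component, which is the point of the fixed-effects specification.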
Of course, reviews with photos could be different from reviews without photos, and these differences, if not captured by the controls included in Equation 1, could drive the observed effect of photos on helpful votes. To reduce these concerns, we investigated the effect of having photos on helpful votes, conditional on having at least one photo.7 To do so, we limited the dataset to those reviews with at least one photo and estimated the following model:

$\log \mathrm{HelpfulVotes}_{ijt} = \beta \log \mathrm{Photos}_{ijt} + X_{ijt}'\gamma + \alpha_j + \tau_t + \epsilon_{ijt}$,  (2)

where everything is as in Equation 1, but the main independent variable is the log of the number of photos associated with review i. In addition, we controlled for the average quality of the photos associated with the review by computing the average of the technical quality scores we obtained using the Neural Image Assessment model proposed by Talebi and Milanfar (2018).8 As before, we estimated Equation 2 via OLS, clustering standard errors at the restaurant level. We report these results in table 5, without and with controls. With these controls, the estimates suggest that a 1% increase in the number of photos is associated with a 0.15% increase in the number of helpful votes. Again, these results point to a positive relationship between including photos in a review and the review's helpful votes.

Next, we estimated the effect of similarity between review text and photos on helpful votes. We did so by re-estimating Equation 2 but including the similarity score between review i and the photos associated with this review. We report these results in column 3 of table 5. We observe that the coefficient of the similarity variable is positive and significant, suggesting that increasing similarity by 1 percentage point increases helpful votes by 0.13%.

7 In Appendix A, we show that results are similar when including all reviews and interacting the variables that exist only for reviews with photos with the variable has Photos.

8 In all analyses reported in the paper, we control for the technical quality measure. However, we obtain similar results controlling for the aesthetic quality measure. We avoid including both quality measures because they are extremely correlated (r = 0.99).
TABLE 2.5 STUDY 1 – THE EFFECT OF THE NUMBER OF PHOTOS AND SIMILARITY
ON HELPFUL VOTES
Robustness Checks
Reviewer FE. So far, we relied on comparing helpfulness votes across reviews written for the same restaurants, thereby controlling for unobserved restaurant heterogeneity. However, one could argue that reviewer heterogeneity could be driving the results discussed above. While we control for reviewers' experience by including in the model whether the reviewer is an elite reviewer, the number of reviews written by the reviewer, and whether the reviewer is local, it is still possible that some reviewers are simply better than others at creating helpful reviews and that these reviews are more likely to be associated with photos. To reduce this concern, we re-estimated Equation 2 but replaced the restaurant fixed effect with the reviewer fixed effect. In doing so, we limited our analysis to reviewers with at least two reviews. We report these results in table 6. Overall, the results are consistent with those previously reported in table 5, suggesting that unobserved reviewer heterogeneity is unlikely to drive our results.
TABLE 2.6 STUDY 1 – ROBUSTNESS CHECK: REVIEWER FE
Review topics. One may also wonder whether some reviews are more helpful as a function of the topics discussed in the review. We therefore assessed the robustness of the results presented over and above the topics discussed in the review. We estimated review topics using the Latent Dirichlet Allocation (LDA) algorithm discussed in Blei et al. (2003), relying on the parallelized implementation provided by the Python Gensim library. We set the parameter α to 1/K (where K is the number of topics) and the parameter η to ‘auto’ (i.e., the model learns the asymmetric prior from the data). We varied the number of topics K between 5 and 8 and retained the model with the highest coherence score (K = 5; Syed and Spruit 2017).9 We then included the topic weights of each review as controls in Equation 2. We report these results in table 7. We continue to observe positive and significant effects of photos and similarity, suggesting that the topics discussed in the review do not drive the results reported in table 5.

TABLE 2.7 STUDY 1 – ROBUSTNESS CHECK: INCLUDING TOPICS

9 Results are not sensitive to the inclusion of more topics.
Discussion
Overall, the results presented so far suggest that including photos in addition to the
review text increases the helpfulness of the review (H1) and, more importantly, that choosing
photos that more closely relate to the content of the review further increases the helpfulness of
that review (H2). To reduce endogeneity concerns and establish a causal link between similarity
and helpfulness, in the next section, we replicate these findings and study the drivers behind the
effects we observe using two laboratory experiments.
STUDY 2 – GREATER PHOTO-TEXT SIMILARITY CAUSES GREATER HELPFULNESS
Study 2 had two objectives. First, we wanted to experimentally measure the causal effects
of the mere presence of a photo (H1) and the similarity between review text and photo on
helpfulness (H2). Second, we wanted to examine the psychological processes underlying this
effect (H3). We propose that the effect of greater similarity on helpfulness is mediated by the
totality of three distinct processes: (1) the information becomes more concrete and easier to
process and (2) evaluations of focal attributes are heightened (both (1) and (2) affecting
helpfulness positively), and (3) there is limited new information (affecting helpfulness
negatively).
Method
Participants and exclusions. As preregistered (#62601), we recruited 450 U.S.
participants on Amazon’s Mechanical Turk (Mage = 40.65; 52.2% female). We did not exclude
anyone from the study.
Procedure. We randomly assigned participants to one of three conditions (similar-photo,
dissimilar-photo, no-photo) in a between-subjects design. All participants read the same review
text about the latte art design a coffee shop offered on its coffees (for stimuli, see Supplemental
Material Appendix A). In the similar-photo condition, participants saw a photo of the latte art. In
the dissimilar-photo condition, participants saw a photo of an avocado toast. Finally, in the no-
photo condition, participants did not see a photo. After reading the review, participants rated the review's helpfulness, usefulness, and value on 9-point scales, with higher scores corresponding to greater helpfulness, usefulness, and value (1 = not at all, 9 = very). As these items cohered well (α = .98), we averaged them to create a composite perceived helpfulness score. Next, we used three items to assess inferred quality of the focal attribute (i.e., visual appeal of the coffee) on a 5-point scale (1 = not at all, 5 = very much; α = .87): (1) the look of the coffee would be appealing to me, (2) the coffee would look striking, (3) the way the coffee looks would be attractive. In the photo conditions (i.e., similar- and dissimilar-photo), we also assessed comprehension ease and amount of new information. We used three items to assess comprehension ease on a 7-point scale (1 = not at all, 7 = very much; α = .96): (1) the photo makes the writer's experience very concrete, (2) the photo helps me process information faster, (3) the photo communicates the gist of the experience. We also used three items to assess the amount of new information on a 7-point scale (1 = not at all, 7 = very much; α = .93): (1) the photo provides evidence on a different aspect of the coffee shop than the text, (2) the photo documents a different aspect of the coffee shop than what was discussed in the text, (3) the text and photo provide information on different aspects of the coffee shop.
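Item reliabilities of the kind reported above (e.g., α = .98 for the three helpfulness items) are Cronbach's alpha; a minimal NumPy version, with made-up ratings for illustration:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) rating matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical 9-point ratings of helpfulness, usefulness, and value
ratings = np.array([
    [9, 8, 9],
    [4, 5, 4],
    [7, 7, 8],
    [2, 3, 2],
    [6, 5, 6],
])
print(round(cronbach_alpha(ratings), 2))  # high alpha: items cohere well
```

When alpha is high, as here, averaging the items into a single composite score (as done for perceived helpfulness) is justified.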
Discriminant validity was assessed using the criterion suggested by Fornell and Larcker (1981). According to this criterion, to establish discriminant validity between two constructs, the average variance extracted (AVE) of each construct must be greater than the variance shared by the two (i.e., the squared correlation between the constructs). This condition was met by all variable pairs, establishing discriminant validity between all potential mediators and the dependent variable (see Supplemental Material Appendix B).
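The Fornell-Larcker check reduces to a simple comparison: for every pair of constructs, each AVE must exceed the pair's squared correlation. A sketch with hypothetical AVE and correlation values (not the study's actual estimates):

```python
# Hypothetical AVEs for four constructs and their pairwise correlations
ave = {"helpfulness": 0.90, "quality": 0.70, "comprehension": 0.85, "new_info": 0.80}
corr = {
    ("helpfulness", "quality"): 0.40,
    ("helpfulness", "comprehension"): 0.55,
    ("helpfulness", "new_info"): -0.35,
    ("quality", "comprehension"): 0.30,
    ("quality", "new_info"): -0.20,
    ("comprehension", "new_info"): -0.60,
}

def discriminant_valid(ave, corr):
    """Fornell-Larcker criterion: the AVE of each construct in a pair must
    exceed the shared variance (squared correlation) of that pair."""
    for (a, b), r in corr.items():
        shared = r ** 2
        if ave[a] <= shared or ave[b] <= shared:
            return False
    return True

print(discriminant_valid(ave, corr))  # every AVE exceeds every shared variance
```

If any squared correlation exceeded one of the two AVEs, the pair would fail the criterion and the constructs could not be treated as distinct.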
Results
Perceived helpfulness. To test H1, we compared the two photo conditions (similar and dissimilar) to the no-photo condition. As predicted in H1, review helpfulness was greater in the photo conditions (Mphoto = 5.63, SD = 2.55) than in the no-photo condition (Mno-photo = 4.87, SD = 2.59), F(1, 448) = 8.82, p = .003. Further, we also found a significant effect of photo type on helpfulness, F(2, 447) = 11.23, p < .001 (see figure 5). Supporting H2, participants in the similar-photo condition (Msimilar = 6.06, SD = 2.47) rated the review as more helpful than those in the dissimilar-photo condition (Mdissimilar = 5.05, SD = 2.44; t(447) = -3.53, p < .001) and in the no-photo condition (Mno-photo = 4.77, SD = 2.54; t(447) = -4.60, p < .001). There was no significant difference between the dissimilar-photo and the no-photo conditions (t(447) = .97, p = .33).
FIGURE 2.5 STUDY 2 – HELPFULNESS BY CONDITION
Quality inference. Turning to the underlying processes, as predicted, we found a significant effect of condition on quality inference, F(2, 447) = 4.51, p = .01 (see figure 6). Specifically, participants in the similar-photo condition (Msimilar = 4.43, SD = .75) expected the visual appeal of the coffee to be of higher quality than those in the dissimilar-photo condition (Mdissimilar = 4.16, SD = .88; t(447) = -2.99, p = .003). Also, participants' expectations were marginally higher in the no-photo condition (Mno-photo = 4.32, SD = .76) than in the dissimilar-photo condition (t(447) = -1.72, p = .09). There was no significant difference between the similar-photo and the no-photo conditions (t(447) = -1.26, p = .21).
FIGURE 2.6 STUDY 2 – INFERRED QUALITY BY CONDITION
Comprehension ease. Focusing on comprehension as part of the underlying process, as
predicted, a similar photo (Msimilar = 5.90, SD = 1.29) was rated as making the textual
information easier to comprehend than a dissimilar photo (Mdissimilar = 2.73, SD = 1.84; t(447) = -
17.35, p < .001; see figure 7A).
Amount of new information. Finally, focusing on the amount of new information as part
of the underlying process, as predicted, a similar photo (Msimilar = 2.77, SD = 1.93) was more
limited in the amount of new information it conveyed, compared to the dissimilar photo
(Mdissimilar = 6.08, SD = 1.20; t(447) = 17.89, p < .001; see figure 7B).
FIGURE 2.7 STUDY 2 – COMPREHENSION EASE (PANEL A) AND AMOUNT OF
NEW INFORMATION (PANEL B) BY PHOTO CONDITION
A. COMPREHENSION EASE B. AMOUNT OF NEW
INFORMATION
Mediation. To test H3, we estimated a mediation model using Model 4 (using 10,000
bootstrap samples; Hayes and Preacher 2014). We focused on the two photo conditions (N =
361), with photo condition (similar = 1 vs. dissimilar = 0) as the independent variable and
perceived helpfulness score as the dependent variable (see figure 8). We examined three parallel
mediators: inferred quality, comprehension ease, and amount of new information. Supporting
H3, we found that all three processes significantly mediated the effect of photo type on perceived
helpfulness in the predicted direction (Bindirect-quality = .27, SE = .10, 95% confidence interval [CI] = [.09, .49]; Bindirect-comprehension = 1.73, SE = .27, 95% CI = [1.22, 2.29]; Bindirect-newinformation = -.98, SE = .22, 95% CI = [-1.43, -.55]; see figure 8).
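The parallel-mediation logic (each indirect effect is the product of an a-path and a b-path, tested with bootstrap confidence intervals) can be sketched on simulated data with a single mediator. This is an illustration of the method, not the PROCESS Model 4 implementation used in the paper, and all data below are simulated:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400

# Simulated data: photo condition -> comprehension ease -> helpfulness
photo = rng.integers(0, 2, n)                    # 1 = similar, 0 = dissimilar
ease = 1.5 * photo + rng.normal(0, 1, n)         # a-path (true a = 1.5)
helpful = 0.6 * ease + 0.1 * photo + rng.normal(0, 1, n)  # b-path (true b = 0.6)

def ab_path(photo, ease, helpful):
    """Indirect effect a*b from two OLS regressions."""
    a = np.polyfit(photo, ease, 1)[0]            # ease ~ photo
    X = np.column_stack([np.ones(len(photo)), ease, photo])
    coef, *_ = np.linalg.lstsq(X, helpful, rcond=None)
    b = coef[1]                                  # helpful ~ ease + photo
    return a * b

# Percentile bootstrap CI for the indirect effect (the paper uses 10,000 resamples)
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, n)
    boot.append(ab_path(photo[idx], ease[idx], helpful[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(round(ab_path(photo, ease, helpful), 2), round(lo, 2), round(hi, 2))
```

An indirect effect is deemed significant when the bootstrap confidence interval excludes zero, which is the criterion applied to the three mediators above.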
FIGURE 2.8 STUDY 2 – MEDIATION MODEL WITH 3 PARALLEL MEDIATORS. Path coefficients, picture condition coded similar = 1, dissimilar = -1: picture condition → comprehension ease: 3.17***, comprehension ease → helpfulness: .54*** (ab = 1.73***); picture condition → amount of new information: -3.31***, amount of new information → helpfulness: .30*** (ab = -.98**); picture condition → inferred quality: .27**, inferred quality → helpfulness: .98*** (ab = .27**); direct effect c' = -.08.

Discussion
Study 2 provided further support for the results obtained in study 1 and allowed us to establish a causal link between photo-text similarity and the perceived helpfulness of the review. As predicted, greater photo-text similarity leads to greater helpfulness. However, a dissimilar photo did not significantly improve helpfulness compared to a review without a photo. While study 1 and also findings from this study suggested that adding a photo in general helps, additional analyses provide a more nuanced perspective, i.e., adding similar photos helps more than adding dissimilar photos. In addition, study 2 allowed us to understand the different processes through which greater photo-text similarity heightens helpfulness. As predicted, the
effect of photo-text similarity is due to three distinct processes: the positive effect of both
comprehension ease and inferred quality and the negative effect of the limited amount of new
information that greater photo-text similarity can convey. Jointly, these three processes
contribute to greater photo-text similarity leading to greater helpfulness.
STUDY 3 – THE EFFECT OF MULTIPLE PHOTOS ON HELPFULNESS
In study 1, we found that more photos increase helpfulness. If the similarity between text
and photo is critical, as shown in study 2, this should also apply to situations involving multiple
photos. Hence, one would predict that more photos only increase helpfulness if both photos'
contents align with the review text. Such an increase in helpfulness should be because greater
similarity between all photos and the text allows for easier comprehension, even though no new
information is provided. We test this hypothesis (H4) in this study.
Method
Participants and exclusions. In line with our pre-registration (#64569), we recruited
1,040 U.S. participants on Amazon’s Mechanical Turk (Mage = 38.39 years, SD = 12.45 years;
57.0% female). We did not exclude anyone from the study.
Procedure. In a 2 photos (one vs. two) x 2 topics (one vs. two) between-subjects design,
we randomly assigned participants to one of four conditions. The one-photo one-topic condition
was equivalent to our similar-photo condition in study 2. Participants read the review text about
the latte art the coffee shop offered and saw a latte art photo. In the two-photos one-topic
condition, participants read the same review text, but this time saw two photos: one depicting the
latte art and another one depicting an avocado toast (for stimuli, see Supplemental Material
Appendix C). The avocado toast photo was the same as the dissimilar photo in study 2 and
reflects that, in real-life, when multiple photos are present, such photos generally are not repeats
of each other but convey different information from each other. Participants in the one-photo
two-topics condition read a review text discussing both the coffee shop’s latte art and the
avocado toast but saw only the latte art photo. Finally, participants in the two-photos two-topics
condition read the review text about both the latte art and the avocado toast and saw both the
latte art and the avocado toast photo. After examining the review (text and photo/s) for at least
five seconds, participants rated the review's helpfulness (α = .98), comprehension ease (α = .86), quality inference of the visual appeal of the coffee (α = .85), and amount of new information (α = .84) using the same items as in study 2. To assess Fornell and Larcker's (1981) criterion for
discriminant validity, we confirmed that each of the four measures is discriminable from one
another (see Supplemental Material Appendix D). For completeness, we also measured and
examined the impact of our manipulations on quality inference. However, since all conditions
included a photo that showed the latte art, quality perceptions of the visual appeal should not and
indeed did not vary significantly between conditions, nor did they mediate helpfulness (see
Supplemental Material Appendix E).
Results
Comprehension ease. An ANOVA on ease of comprehension revealed a main effect of
number of photos (F(1, 1037) = 8.50, p = .004) and a topic by photo interaction (F(1, 1037) = 9.68, p = .002; see figure 9A). When the review text included only one topic, comprehension
ease did not differ between the two-photos (one of which was unrelated) and the one photo
condition (M2photos = 5.96, SD = 1.11, M1photo = 5.97, SD = 1.19; F(1, 518) = .02, p = .89).
However, as predicted, when the review included two topics, two photos (both conveying information also conveyed in the text) heightened comprehension compared to the one-photo condition (M2photos = 6.26, SD = .91; M1photo = 5.84, SD = 1.22; F(1, 519) = 19.49, p < .001).
Amount of new information. An ANOVA on the amount of new information also revealed a main effect of number of photos (F(1, 1037) = 38.10, p < .001) and a topic by photo interaction (F(1, 1037) = 43.50, p < .001; see figure 9B). In this case, when the review included only one topic, two photos (M2photos = 3.64, SD = 1.16) provided more new information than one photo (M1photo = 2.29, SD = 1.66; F(1, 518) = 83.32, p < .001). Receivers were able to obtain new information from the additional (unrelated) photo that was not conveyed in the text. When the review included two topics, however, two photos (M2photos = 2.89, SD = 1.77) did not provide additional information compared to the one-photo condition (M1photo = 2.94, SD = 1.66; F(1, 519) = .09, p = .76).
FIGURE 2.9 STUDY 3 – COMPREHENSION EASE (PANEL A) AND AMOUNT OF NEW
INFORMATION (PANEL B) AS A FUNCTION OF NUMBER OF PHOTOS X NUMBER OF
TOPICS
A: COMPREHENSION EASE B: AMOUNT OF NEW INFORMATION
Perceived helpfulness. As shown above, the number of photos affects the two focal
processes (i.e., comprehension ease and amount of information) differently. As the effect on
helpfulness depends on the relative strength of these two underlying processes, we did not
necessarily expect the interaction between the number of photos and the number of topics to be
significant and hence preregistered planned comparisons. Indeed, an ANOVA of helpfulness
revealed a main effect of number of topics (F(1, 1037) = 56.82, p < .001), a main effect of
number of photos (F(1, 1037) = 4.60, p = .03), but not a significant interaction (F(1, 1037) =
2.44, p = .12). As more information is helpful, reviews with two-topics (M2topics = 7.12, SD =
2.07) were more helpful than reviews with one-topic (M1topic = 6.07, SD = 2.41). Similarly,
reviews with two-photos (M2photos = 6.74, SD = 2.25) were more helpful than reviews with one-
photo (M1photo = 6.45, SD = 2.35), in line with our findings in study 1. The planned comparisons
revealed that, like earlier analyses for comprehension ease, when the review included one topic,
the number of photos did not impact review helpfulness (see figure 10). However, as predicted
and supporting H4, when the review text included two-topics, two-photos increased helpfulness
(M2photos = 7.33, SD = 1.88) compared to one photo (M1photo = 6.83, SD = 2.18; F(1, 519) = 8.01,
p = .005).
FIGURE 2.10 STUDY 3 – HELPFULNESS AS A FUNCTION OF THE NUMBER OF
PHOTOS AND NUMBER OF TOPICS
Mediation. To test H3, we conducted a moderated mediation analysis using Model 8
(using 10,000 bootstrap samples; Hayes 2015). We entered number of photos as the independent
variable (-1 = one-photo, 1 = two-photos) and overall helpfulness as the dependent variable. We
treated comprehension ease and amount of new information as parallel mediators. We identified
number of topics as the moderator (-1 = one topic, 1 = two topics), as it is predicted to change the relationship between number of photos and the mediating processes. Two significant indices of moderated mediation indicated that the mediation depended on the number of topics (Bindirect-comprehension = .22, SE = .07, 95% CI = [.08, .36]; Bindirect-newinformation = -.17, SE = .03, 95% CI = [-.24, -.11]; Model 8). Comprehension ease mediated the effect of multiple photos on helpfulness when there were two topics (Bindirect = .22, SE = .05, 95% CI = [.12, .31]), but not when there was only one topic (Bindirect = -.01, SE = .05, 95% CI = [-.11, .10]). Amount of new information, however, mediated the effect of multiple photos on helpfulness when there was one topic (Bindirect = .17, SE = .03, 95% CI = [.12, .23]), but not when there were two topics (Bindirect = -.01, SE = .02, 95% CI = [-.04, .03]).
Discussion
Similar to study 1, we find that more photos increase helpfulness. However, study 3 also
adds a more nuanced perspective. Paralleling the findings of study 2 that examined the effect of a
single photo, our findings in study 3 suggest that it is not only the presence of multiple photos
that increases helpfulness, but that greater similarity between multiple photos and the review text
increases helpfulness. When the content of the photos is similar to the content of the text, two
photos help receivers comprehend the textual information more easily even though these photos
do not provide any new information. As with study 2, comprehension ease seems to be more
central in influencing helpfulness than the amount of new information (or lack thereof). As a
result, multiple photos increase helpfulness only when they convey similar information to the
review text.
GENERAL DISCUSSION
This paper examines the role of photos in persuasive communication. Our findings from
naturally occurring reviews on Yelp suggest that photos increase helpfulness, but more so when
they convey similar content as the text. Since the content of reviews and photos is necessarily
self-selected, we validate these findings experimentally. Our findings provide a more nuanced
understanding of when and why more similar photos may be more persuasive in consumer-
generated communication. We show that photos that are more similar to the content of the
review text increase helpfulness of that review. Greater similarity can heighten the inferred
quality of a focal attribute and help the reader comprehend the essence of the review more easily.
While similar photos and text are limited in the amount of new information they convey to the
reader, the totality of these three processes leads to a more similar photo-text combination,
increasing review helpfulness.
Given the prevalence and importance of photos in online reviews, these findings provide
several theoretical contributions. We add to the online WOM literature by examining visual
WOM. Prior literature has focused on structured (e.g., star rating) and unstructured components
(e.g., textual content) of WOM. However, consumers frequently use visuals alongside verbal
information in their communication. Hence, expanding our understanding of visual WOM is both
timely and important. Further, in addition to the mere presence of photos, we examine the
interplay between the text and the photos and how they jointly affect review helpfulness.
Notably, we identify photo-text similarity as an important determinant of perceived helpfulness.
Relatedly, we also contribute to the literature examining visuals in marketing
communication in general. In the advertising literature, the effect of visual information is
predominantly studied through small-sample laboratory experiments (e.g., Edell and Staelin
1983). However, this literature generally lagged in analyzing visual-verbal information in
naturally occurring settings (e.g., on online platforms) and typically focused only on a single
aspect of the visuals (e.g., shape, Lutz and Lutz 1978; size, Rossiter and Percy 1980), rather than
the interplay between words and visuals. Our approach using machine learning to parse visual
content and to connect visual to verbal content may also be used to examine the visual-verbal
interplay in the advertising context. Our findings may provide insights into how to improve
advertisements' effectiveness and conduct more focused and theoretically grounded randomized
control trials (e.g., Gordon et al. 2019).
Further, we identify and document three distinct processes through which visual WOM
can affect persuasion. As such, our theoretical account makes a novel contribution in uncovering
unexamined drivers of review helpfulness. Identifying these processes also contributes to
existing literatures on information processing, imagery, and linguistics. Prior research in
linguistics shows that repeating verbal information (e.g., providing different verbal accounts of
the same event) increases comprehension and learning (Berlyne 1970). We add to this literature
by suggesting that repeating the same information in different modalities (i.e., in words and
photos) promotes comprehension. In addition, prior research in information processing
showed that message repetition leads to a greater liking for the focal object through repeated
processing (Petty and Cacioppo 1986). Similarly, we show that in messages that include text and
photos, the repetition of the same focal attribute in words and photos may lead to a more
favorable evaluation of the focal attribute.
Lastly, we extend the rare examination of photo-text consistency on memory to
helpfulness judgment. Houston, Childers, and Heckler (1987) predicted that photo-text
inconsistency (not consistency) should lead to greater elaboration and increase attribute recall.
However, they found the opposite, namely that recall of the focal attribute is superior in
consistent (vs. inconsistent) ads. We contribute to their findings and suggest that photo-text
similarity increases attribute evaluations and subsequently review helpfulness through the
processing of the focal attribute in two modalities.
Managerial Implications and Future Research
Our paper has clear implications for product websites and review platforms. First, our
findings can guide platforms in ranking more helpful reviews without needing to wait for
reviews to accumulate helpful votes, thereby improving the customer experience. Today,
consumers have access to a myriad of reviews. However, consumers can only process a fraction
of these. To overcome the processing burden on consumers, platforms try new ways to identify
the most helpful reviews in an automated manner. With our findings on what drives review
helpfulness, we offer relatively easy-to-compute metrics: the presence of photos and the
similarity between visual and verbal content. These metrics can help platforms identify reviews
more likely to be helpful and rank them accordingly, even when explicit helpfulness ratings are
not yet available.
Further, our findings allow websites to provide guidance regarding the type of photos that
may be more helpful and that they want customers to upload. Many review platforms now
encourage review writers to add photos to their reviews likely because consumers state they find
consumer-provided photos very helpful (Bazaarvoice 2021). However, our findings suggest that
not every photo will necessarily be helpful and hence that platforms should guide reviewers
towards the type of photo that may be most helpful to readers. Our findings suggest that photos
that convey similar information as the review text facilitate comprehension and increase review
helpfulness. Hence, websites may want to prompt reviewers to upload photos that convey similar
information as the review text or to write a review that refers to the photo content.
These findings also give rise to several interesting and important questions that could be
the focus of future work. While we focus on static visuals (i.e., photos), consumers also
increasingly share videos related to their experiences. Videos provide a short slice of one’s
experience. Thus, on the one hand, they may be more self-explanatory and easier to understand
than photos that often only show a single moment in the experience. On the other hand, many
videos also include verbal explanations by the sharer, and closed captioning of verbal explanations
heightens engagement (Vrountas 2020), a possible indication that even videos may benefit from
visual-verbal similarity. Future research could address whether the greater similarity between a
video and either the review text or any in-video words also increases a review’s helpfulness.
We examine consumers’ content choices in settings where consumers share their
opinions about restaurant experiences. Recent research in the context of product reviews finds
that consumers are less likely to rely on reviews for experiential purchases than for material
purchases (Dai, Chan, and Mogilner 2020). This effect is driven by beliefs that reviews for
experiential purchases are more idiosyncratic and less reflective of the purchase’s objective
quality than reviews for material purchases. Extrapolating from these findings, repeating the
same information in the text and the photos could be even more important for experiences than
for material goods. By the same token, if the product
is essentially the same for each reviewer, as is the case for many material goods (e.g.,
electronics), this information is typically already available on the website. Repeating the same
product-related information in the text and the photo may be overly redundant. Thus, it may
reduce the helpfulness of (later) reviews to include such photos, which future research may
examine.
Our paper examines situations when consumers share a photo of their experiences. These
situations may naturally skew toward those experiences that are visually interesting and
atypical. However, visuals may also be helpful in situations where other sensory inputs are more
important. For example, Latour and Deighton (2019) find that creating visuals of taste
experiences (i.e., wine) allows people to be transported more into their own taste experience. It is
possible that learning about an experience by reading a visual-verbal review may also increase
transportation. Hence, the similarity effect we found in our research may extend to non-visual
dimensions and experiences, which can be addressed in future research.
People increasingly communicate using visual information. In 2021, people will have
taken 1.4 trillion photos of their experiences (Carrington 2020). Based on these ongoing trends,
visual information will continue to be central to the WOM consumers will generate. Our paper is
one of the first to examine the role of photos in online WOM, and we hope our findings will
open many novel avenues for future research on visual-verbal WOM.
REFERENCES
Barasch, Alixandra, and Jonah Berger. 2014. “Broadcasting and Narrowcasting: How Audience
Size Affects What People Share.” Journal of Marketing Research 51 (3): 286–99.
https://doi.org/10.1509/jmr.13.0238.
Barasch, Alixandra, Gal Zauberman, and Kristin Diehl. 2018. “How the Intention to Share Can
Undermine Enjoyment: Photo-Taking Goals and Evaluation of Experiences.” Journal of
Consumer Research 44 (6): 1220–37. https://doi.org/10.1093/jcr/ucx112.
Bazaarvoice. 2021. “The 101 on UGC | Bazaarvoice.” Bazaarvoice. 2021.
https://www.bazaarvoice.com/resources/the-101-on-ugc/.
Berger, Jonah, Ashlee Humphreys, Stephan Ludwig, Wendy W Moe, Oded Netzer, and David A
Schweidel. 2020. “Uniting the Tribes: Using Text for Marketing Insight.” Journal of
Marketing 84 (1): 1–25. https://doi.org/10.1177/0022242919873106.
Berger, Jonah, and Katherine L. Milkman. 2012. “What Makes Online Content Viral?” Journal
of Marketing Research 49 (2): 192–205. https://doi.org/10.1509/jmr.10.0353.
Berlyne, D. E. 1970. “Novelty, Complexity, and Hedonic Value.” Perception & Psychophysics 8
(5): 279–86. https://doi.org/10.3758/BF03212593.
Bertrand, Marianne, Esther Duflo, and Sendhil Mullainathan. 2004. “How Much Should We
Trust Differences-in-Differences Estimates?” Quarterly Journal of Economics 119 (1):
249–75. https://doi.org/10.2139/ssrn.288970.
Bhatia, Sudeep, Russell Richie, and Wanling Zou. 2019. “Distributed Semantic Representations
for Modeling Human Judgment.” Current Opinion in Behavioral Sciences 29: 31–36.
https://doi.org/10.1016/j.cobeha.2019.01.020.
Blei, David M, Andrew Y Ng, and Michael I. Jordan. 2003. “Latent Dirichlet Allocation.”
Journal of Machine Learning Research 3 (4–5): 993–1022. https://doi.org/10.1016/b978-0-
12-411519-4.00006-9.
Business Today. 2017. “WhatsApp Users Share 55 Billion Texts, 4.5 Billion Photos, 1 Billion
Videos Daily.” Business Today, 2017.
Carrington, David. 2020. “How Many Photos Will Be Taken in 2020.” Mylio, 2020.
https://blog.mylio.com/how-many-photos-will-be-taken-in-2021/.
Cavanaugh, Lisa A., Francesca Gino, and Gavan J. Fitzsimons. 2015. “When Doing Good Is Bad
in Gift Giving: Mis-Predicting Appreciation of Socially Responsible Gifts.” Organizational
Behavior and Human Decision Processes 131: 178–89.
https://doi.org/10.1016/j.obhdp.2015.07.002.
Chacon, Benjamin. 2020. “This Is the Best Instagram Caption Length in 2021.” Later Blog.
Chevalier, Judith A., and Dina Mayzlin. 2006. “The Effect of Word of Mouth on Sales: Online
Book Reviews.” Journal of Marketing Research 43 (3): 345–54.
https://doi.org/10.1509/jmkr.43.3.345.
Clark, Herbert H. 1992. Arenas of Language Use. Chicago: University of Chicago Press.
Clark, Herbert H. 1996. Using Language. Cambridge: Cambridge University Press.
Constant, David, Sara Kiesler, and Lee Sproull. 1994. “What’s Mine Is Ours, or Is It? A Study of
Attitudes about Information Sharing.” Information Systems Research 5 (4): 400–421.
https://doi.org/10.1287/ISRE.5.4.400.
Dai, Hengchen, Cindy Chan, and Cassie Mogilner. 2020. “People Rely Less on Consumer
Reviews for Experiential than Material Purchases.” Journal of Consumer Research 46 (6):
1052–75. https://doi.org/10.1093/jcr/ucz042.
Davenport, Thomas H., and John C. Beck. 2001. “The Attention Economy.” Ubiquity 2001
(May): 1. https://doi.org/10.1145/376625.376626.
Degen, Judith, Robert D. Hawkins, Caroline Graf, Elisa Kreiss, and Noah D. Goodman. 2020.
“When Redundancy Is Useful: A Bayesian Approach to ‘Overinformative’ Referring
Expressions.” Psychological Review 127 (4): 591–621. https://doi.org/10.1037/rev0000186.
Dellarocas, Chrysanthos, Xiaoquan Zhang, and Neveen F. Awad. 2007. “Exploring the Value of
Online Product Reviews in Forecasting Sales: The Case of Motion Pictures.” Journal of
Interactive Marketing 21 (4): 23–45. https://doi.org/10.1002/dir.20087.
Ding, Mengqi, Shirley Chen, Xin Wang, and Neil Bendle. 2021. “Show Me You or the Goods?
Effect of Image Content on Review Helpfulness.”
Duan, Wenjing, Bin Gu, and Andrew B. Whinston. 2008. “The Dynamics of Online Word-of-
Mouth and Product Sales-An Empirical Investigation of the Movie Industry.” Journal of
Retailing 84 (2): 233–42. https://doi.org/10.1016/j.jretai.2008.04.005.
Edell, Julie A., and Richard Staelin. 1983. “The Information Processing of Pictures in Print
Advertisements.” Journal of Consumer Research 10 (1): 45.
https://doi.org/10.1086/208944.
Forman, Chris, Anindya Ghose, and Batia Wiesenfeld. 2008. “Examining the Relationship
Between Reviews and Sales: The Role of Reviewer Identity Disclosure in Electronic
Markets.” Information Systems Research 19 (3): 291–313.
https://doi.org/10.1287/isre.1080.0193.
Frenzen, Jonathan, and Kent Nakamoto. 1993. “Structure, Cooperation, and the Flow of Market
Information.” Journal of Consumer Research 20 (3): 360. https://doi.org/10.1086/209355.
Gaure, Simen. 2013. “OLS with Multiple High Dimensional Category Variables.”
Computational Statistics & Data Analysis 66 (October): 8–18.
https://doi.org/10.1016/j.csda.2013.03.024.
Ghose, Anindya, and Panagiotis G. Ipeirotis. 2011. “Estimating the Helpfulness and Economic
Impact of Product Reviews: Mining Text and Reviewer Characteristics.” IEEE
Transactions on Knowledge and Data Engineering 23 (10): 1498–1512.
https://doi.org/10.1109/TKDE.2010.188.
Goodman, Noah D., and Michael C. Frank. 2016. “Pragmatic Language Interpretation as
Probabilistic Inference.” Trends in Cognitive Sciences 20 (11): 818–29.
https://doi.org/10.1016/J.TICS.2016.08.005.
Gordon, Brett R, Florian Zettelmeyer, Neha Bhargava, and Dan Chapsky. 2019. “A Comparison
of Approaches to Advertising Measurement: Evidence from Big Field Experiments at
Facebook.” Marketing Science 38 (2): 193–205. https://doi.org/10.1287/mksc.2018.1135.
Grice, H. P. 1975. “Logic and Conversation.” In Syntax and Semantics 3: Speech Acts.
Hahn, Michael, Dan Jurafsky, and Richard Futrell. 2020. “Universals of Word Order Reflect
Optimization of Grammars for Efficient Communication.” PNAS 117 (5): 2347–53.
https://doi.org/10.1073/pnas.1910923117/-/DCSupplemental.y.
Hayes, Andrew F. 2015. “An Index and Test of Linear Moderated Mediation.” Multivariate
Behavioral Research 50 (1): 1–22. https://doi.org/10.1080/00273171.2014.962683.
Hayes, Andrew F., and Kristopher J. Preacher. 2014. “Statistical Mediation Analysis with a
Multicategorical Independent Variable.” British Journal of Mathematical and Statistical
Psychology 67 (3): 451–70. https://doi.org/10.1111/BMSP.12028.
He, Sherry, Brett Hollenbeck, and Davide Proserpio. 2021. “The Market for Fake Reviews.”
SSRN Working Paper.
Houston, Michael J., Terry L. Childers, and Susan E. Heckler. 1987. “Picture-Word Consistency
and the Elaborative Processing of Advertisements.” Journal of Marketing Research 24 (4):
359–69. https://doi.org/10.1177/002224378702400403.
Humphreys, Ashlee, and Rebecca Jen Hui Wang. 2018. “Automated Text Analysis for Consumer
Research.” Journal of Consumer Research 44 (6): 1274–1306.
https://doi.org/10.1093/jcr/ucx104.
Kim, Soo Min, Patrick Pantel, Tim Chklovski, and Marco Pennacchiotti. 2006. “Automatically
Assessing Review Helpfulness.” In 2006 Conference on Empirical Methods in Natural
Language Processing, Proceedings of the Conference, 423–30.
https://doi.org/10.3115/1610075.1610135.
Latour, Kathryn A., and John A. Deighton. 2019. “Learning to Become a Taste Expert.” Journal
of Consumer Research 46 (1): 1–19. https://doi.org/10.1093/jcr/ucy054.
Le, Quoc, and Tomas Mikolov. 2014. “Distributed Representations of Sentences and
Documents.” In 31st International Conference on Machine Learning, ICML 2014, 4:2931–
39. http://proceedings.mlr.press/v32/le14.html.
Leonardi, Paul M, Tsedal B Neeley, and Elizabeth M Gerber. 2012. “How Managers Use
Multiple Media: Discrepant Events, Power, and Timing in Redundant Communication.”
Organization Science 23 (1): 98–117. https://doi.org/10.1287/orsc.1110.0638.
Levie, W. Howard, and Richard Lentz. 1982. “Effects of Text Illustrations: A Review of
Research.” Educational Communication & Technology 30 (4): 195–232.
https://doi.org/10.1007/BF02765184.
Levin, Joel R., and Alan M. Lesgold. 1978. “On Pictures in Prose.” Educational Communication
and Technology 26 (3): 233–43. https://doi.org/10.1007/BF02766607.
Li, Yiyi, and Ying Xie. 2020. “Is a Picture Worth a Thousand Words? An Empirical Study of
Image Content and Social Media Engagement.” Journal of Marketing Research 57 (1): 1–
19. https://doi.org/10.1177/0022243719881113.
Littlepage, Glenn E., Greg W. Schmidt, Eric W. Whisler, and Alan G. Frost. 1995. “An Input-
Process-Output Analysis of Influence and Performance in Problem-Solving Groups.”
Journal of Personality and Social Psychology 69 (5): 877–89. https://doi.org/10.1037/0022-
3514.69.5.877.
Luangrath, Andrea Webb, Joann Peck, and Victor A. Barger. 2017. “Textual Paralanguage and
Its Implications for Marketing Communications.” Journal of Consumer Psychology 27 (1):
98–107. https://doi.org/10.1016/j.jcps.2016.05.002.
Lutz, Kathy A., and Richard J. Lutz. 1978. “Imagery-Eliciting Strategies: Review and
Implications of Research.” In Advances in Consumer Research, 611–20.
Mayzlin, Dina, Yaniv Dover, and Judith Chevalier. 2014. “Promotional Reviews: An Empirical
Investigation of Online Review Manipulation.” American Economic Review 104 (8): 2421–
55. https://doi.org/10.1257/AER.104.8.2421.
McIntyre, Karen, Kyser Lough, and Keyris Manzanares. 2018. “Solutions in the Shadows: The
Effects of Photo and Text Congruency in Solutions Journalism News Stories.” Journalism
and Mass Communication Quarterly 95 (4): 971–89.
https://doi.org/10.1177/1077699018767643.
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. “Distributed Representations
of Words and Phrases and Their Compositionality.” Advances in Neural Information
Processing Systems, 3111–19.
Moore, Sarah G., and Katherine C. Lafreniere. 2020. “How Online Word‐of‐mouth Impacts
Receivers.” Consumer Psychology Review 3 (1): 34–59. https://doi.org/10.1002/arcp.1055.
Mudambi, Susan M., and David Schuff. 2010. “Research Note: What Makes a Helpful Online
Review? A Study of Customer Reviews on Amazon.Com.” MIS Quarterly 34 (1): 185.
https://doi.org/10.2307/20721420.
Omnicore. 2020. “Facebook by the Numbers (2020): Stats, Demographics & Fun Facts.” 2020.
https://www.omnicoreagency.com/facebook-statistics/.
———. 2021. “Snapchat by the Numbers (2021): Stats, Demographics & Fun Facts.” 2021.
https://www.omnicoreagency.com/snapchat-statistics/.
Packard, Grant, and Jonah Berger. 2017. “How Language Shapes Word of Mouth’s Impact.”
Journal of Marketing Research 54 (4): 572–88. https://doi.org/10.1509/jmr.15.0248.
Paivio, Allan. 1969. “Mental Imagery in Associative Learning and Memory.” Psychological
Review 76 (3): 241. https://doi.org/10.1037/h0027272.
Partan, Sarah, and Peter Marler. 1999. “Communication Goes Multimodal.” Science 283 (5406):
1272–73. https://doi.org/10.1126/science.283.5406.1272.
Peeck, Joan. 1993. “Increasing Picture Effects in Learning from Illustrated Text.” Learning and
Instruction 3 (3): 227–38. https://doi.org/10.1016/0959-4752(93)90006-L.
Pennebaker, James W., Matthias R. Mehl, and Kate G. Niederhoffer. 2003. “Psychological
Aspects of Natural Language Use: Our Words, Our Selves.” Annual Review of Psychology
54 (1): 547–77. https://doi.org/10.1146/annurev.psych.54.101601.145041.
Perkins, David N, and Chris Unger. 1994. “A New Look in Representations for Mathematics and
Science Learning.” Instructional Science 22 (1): 1–37. https://doi.org/10.1007/BF00889521.
Petty, Richard E., and John T. Cacioppo. 1986. “The Elaboration Likelihood Model of
Persuasion.” In Advances in Experimental Social Psychology, edited by L. Berkowitz,
19:123–205. New York: Academic Press.
Rocklage, Matthew D., Derek D. Rucker, and Loran F. Nordgren. 2018. “Persuasion, Emotion,
and Language: The Intent to Persuade Transforms Language via Emotionality.”
Psychological Science 29 (5): 749–60. https://doi.org/10.1177/0956797617744797.
Rossiter, John R., and Larry Percy. 1980. “Attitude Change through Visual Imagery in
Advertising.” Journal of Advertising 9 (2): 10–16.
https://doi.org/10.1080/00913367.1980.10673313.
Rubio-Fernández, Paula. 2016. “How Redundant Are Redundant Color Adjectives? An
Efficiency-Based Analysis of Color Overspecification.” Frontiers in Psychology 7
(February): 153. https://doi.org/10.3389/fpsyg.2016.00153.
Schwartz, Ali. 2019. “Why It’s Important to Upload Quality Photos - Yelp.” Yelp. 2019.
https://blog.yelp.com/2019/07/businesses-upload-photos-yelp.
Schwarz, Norbert. 1994. “Judgment in a Social Context: Biases, Shortcomings, and the Logic of
Conversation.” Advances in Experimental Social Psychology 26 (C): 123–62.
https://doi.org/10.1016/S0065-2601(08)60153-7.
Sonnenschein, Susan, and Grover J. Whitehurst. 1982. “The Effects of Redundant
Communications on the Behavior of Listeners: Does a Picture Need a Thousand Words?”
Journal of Psycholinguistic Research 11 (2): 115–25. https://doi.org/10.1007/BF01068215.
Sperber, Dan, and Deirdre Wilson. 1986. Relevance: Communication and Cognition.
Cambridge, MA: Harvard University Press.
Talebi, Hossein, and Peyman Milanfar. 2018. “NIMA: Neural Image Assessment.” IEEE
Transactions on Image Processing 27 (8): 3998–4011.
https://doi.org/10.1109/TIP.2018.2831899.
Vrountas, Ted. 2020. “How Closed Captioning Facebook Videos Can Improve Viewership.”
2020. https://instapage.com/blog/closed-captioning-mute-videos.
Wilson, Deirdre. 1993. “Relevance and Understanding.” Pragmalinguistica, no. 1: 335–66.
Yarkoni, Tal. 2010. “Personality in 100,000 Words: A Large-Scale Analysis of Personality and
Word Use among Bloggers.” Journal of Research in Personality 44 (3): 363–73.
https://doi.org/10.1016/j.jrp.2010.04.001.
Zhang, Shunyuan, Dokyun Lee, Param Vir Singh, and Kannan Srinivasan. 2017. “How Much Is
an Image Worth? Airbnb Property Demand Analytics Leveraging A Scalable Image
Classification Algorithm.” SSRN Electronic Journal.
Zipf, George Kingsley. 1949. Human Behavior and the Principle of Least Effort: An
Introduction to Human Ecology. Cambridge, MA: Addison-Wesley.
APPENDICES FOR CHAPTER ONE
Appendix 1.1 – Study 1: Stimulus
Figure S1. Stimulus used in Study 1
Appendix 1.2 – Study 1: Goal Manipulation
Table S1.2. Three goal conditions in Study 1
Redundancy Goal Using the photo and additional text, you want to
convey your experience to your friend with the goal
to communicate thoroughly. What this means is that
you describe your experience thoroughly, possibly
repeating some of the information that the photo
already communicates.
Efficiency Goal Using the photo and additional text, you want to
convey your experience to your friend with the goal
to communicate efficiently. What this means is that
you describe your experience efficiently, not
repeating any information that the photo already
communicates.
Natural Goal Using the photo and additional text, you want to
convey your experience to your friend.
Appendix 1.3 – Study 1: Custom Dictionary
We created a custom dictionary. Starting with the see words category in LIWC, we added words
(highlighted in red) that can describe or modify the specific object (i.e., donut) in the photo.
aesthetic, aesthetically, aesthetics, appear, appearance, appeared, appearing, appears, art,
beautiful, beautify, beauty, belt, black, blind, blonde, blue, bottom, bridge, bright, brighter,
brightest, brightness, brown, candle, circle, clear, cloud, clouds, cloudy, color, colored, colorful,
coloring, colors, colour, colourful, colours, creative, cute, dark, darken, darker, darkest, darkness,
decorate, decorated, decorating, decoration, decorations, decorative, depict, design, designs,
disappear, disappeared, disappearing, disappears, display, dollop, dye, eye, eying, gaz*, glanc*,
gloom, gloomier, gloomily, gloominess, gloomy, glow, graphic, gray, green, grey, hazy, hidden,
image, light, lights, lit, look, looked, looker, looking, looks, monitor, multicolor, multicolored,
muted, orange, outlook, photo, photograph, photos, pic, photo, photos, pink, pop, popping, pops,
portrayal, present, presentation, presents, pretty, purple, rainbow, rectang*, red, redden, reveal,
rope, round, rounder, roundest, saw, scan, scan*, scans, screen, search, searched, searches,
searching, see, seeing, seen, sees, selfie, shadow, sheen, shine, shining, shiny, show, showed,
showing, shows, sight, sky, sought, sprinkle, sprinkles, square, stare, staring, strip, structure,
sunli*, sunnier, sunniest, sunny, sunshin*, top, triang*, vibrant, vid, video, view, viewer,
viewing, views, violet, visible, visibly, vivid, vividly, watch, watched, watcher, watches,
watchful, watching, white, yellow.
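The dictionary above follows the LIWC matching convention: plain entries match a token exactly, while entries ending in `*` (e.g., `scan*`, `gaz*`) are stems that match any continuation. A minimal sketch of this matching in Python is shown below; the dictionary excerpt and function names are illustrative and not the code used in the dissertation:

```python
import re

# Small excerpt of the custom "see words" dictionary; entries ending in "*"
# are word stems that match any continuation (the LIWC convention).
SEE_WORDS = ["color", "colorful", "pink", "sprinkle", "sprinkles", "scan*", "gaz*", "look"]

def compile_dictionary(entries):
    """Compile literal entries and '*' stems into whole-token regexes."""
    patterns = []
    for entry in entries:
        if entry.endswith("*"):
            # Stem: match the stem followed by any word characters.
            patterns.append(re.compile(re.escape(entry[:-1]) + r"\w*\Z"))
        else:
            # Literal: match the whole token exactly.
            patterns.append(re.compile(re.escape(entry) + r"\Z"))
    return patterns

def count_visual_words(text, patterns):
    """Count tokens in `text` that match any dictionary pattern."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(1 for tok in tokens if any(p.match(tok) for p in patterns))

patterns = compile_dictionary(SEE_WORDS)
print(count_visual_words("The pink donut had colorful sprinkles on top", patterns))  # prints 3
```

Note that the literal entry `color` does not match `colorful` (no stem marker), which is why LIWC lists inflected forms separately unless a `*` stem covers them.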
Appendix 1.4 – Study 2: Yelp and TripAdvisor Datasets
Yelp Dataset. Our analysis investigates reviews from Yelp, a popular consumer review platform.
Yelp includes more than 210 million reviews for local businesses such as restaurants, bars, hair
salons, and many other services. At the time of data collection, Yelp received approximately 61.8
million unique visitors via desktop computers and 76.7 million unique visitors from mobile users
on a monthly average basis (Tankovska, 2019). In this analysis, we focused on restaurant
reviews in the Los Angeles area. We included in our dataset every Yelp review written for
restaurants in Los Angeles from the founding of Yelp in 2004 through 2019. In total, our dataset
contains 681,526 reviews for 1,699 restaurants. 20.7% of reviews included at least one photo.
TripAdvisor Dataset. Our analysis also investigates reviews from TripAdvisor, the world's
largest travel platform (Similarweb 2021). TripAdvisor includes more than 860 million reviews
on 8.7 million accommodations, restaurants, experiences, airlines, and cruises. At the time of
data collection, TripAdvisor received approximately 463 million unique visitors in a month. In
this analysis, we focused on restaurant reviews in the Los Angeles area. We included in our
dataset every TripAdvisor review written in English for restaurants in Los Angeles from 2004
through 2019. In total, our dataset contains 205,754 reviews for 7,142 restaurants. 22.1% of the
reviews included at least one photo.
Review controls. We extracted the following information about each review: (1) Star rating, (2)
Number of words, (3) Device type (dummy coded as mobile = 1, otherwise = 0; available only
on TripAdvisor), (4) Reviewer status (dummy coded as elite = 1, otherwise = 0; available only
on Yelp), (5) Number of total reviews the reviewer has written, (6) Restaurant ID, and (7) date of
the review (available only on TripAdvisor). The distribution of each variable is presented in
figures S1.4a and S1.4b.
Figure S1.4a. Distributions of variables in Yelp dataset
Figure S1.4b. Distributions of variables in TripAdvisor dataset
Appendix 1.5 – LIWC See Words Category
Linguistic Inquiry and Word Count (LIWC) is a language analysis program commonly used to
study relationships between language and psychological variables (Pennebaker et al., 2003;
Yarkoni, 2010). LIWC includes 33 semantic word categories. To test our hypothesis, we used the
see words category that contains the following words/word stems:
appear, appearance*, appeared, appearing, appears, beautiful, beautify, beauty, black, blind*,
blonde, blue, bright, brighter, brightest, brightness, brown, candle*, circle, clear, color*,
colour*, dark, darken, darker, darkest, darkness, depict, disappear, disappeared, disappearing,
disappears, display, eye, eying, gaz*, glanc*, gloom, gloomier, gloomily, gloominess, gloomy,
glow*, graphic*, gray, green, grey, hazy, hidden, image*, light, lights, lit, look, looked, looker*,
looking, looks, monitor, orange, photo, photograph, photos, pic, photo*, pink, purple, rectang*,
red, redden, reveal, searches, searching, see, seeing, seen, sees, selfie*, shadow, shine, shini*,
shiny, show, showed, showing, shows, sight*, sought, square, stare*, staring, sunli*, sunnier,
sunniest, sunny, sunshin*, triang*, vid, video*, view, viewer*, viewing*, views, violet, visible,
visibly, vivid*, watch, watched, watcher, watches, watchful, watching, white, yellow.
Appendix 1.6 – Visual Depiction of Relationship between Semantic Similarity and Photo
Sharing
Figure S1.6a. Evidence of redundancy: Greater semantic similarity and photo sharing are
positively correlated independent of review rating (Yelp dataset)
Figure S1.6b. Evidence of redundancy: Greater semantic similarity and photo sharing are
positively correlated independent of review rating (TripAdvisor dataset)
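The semantic-similarity measure underlying these figures is built on distributed semantic representations (cf. Le and Mikolov 2014; Mikolov et al. 2013). The sketch below illustrates only the core computation — averaging word vectors for the photo-related and text-related tokens and taking their cosine similarity. The four-dimensional vectors and all token names are made up for illustration; they stand in for trained embeddings and are not the dissertation's actual model:

```python
import math

# Toy 4-dimensional embeddings standing in for trained word vectors
# (hypothetical values chosen so food words cluster together).
EMBEDDINGS = {
    "donut":     [0.90, 0.10, 0.00, 0.20],
    "sprinkles": [0.80, 0.20, 0.10, 0.10],
    "pastry":    [0.85, 0.15, 0.05, 0.20],
    "parking":   [0.00, 0.10, 0.90, 0.30],
}

def embed(tokens):
    """Average the vectors of known tokens (a simple document-embedding baseline)."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

photo_labels = ["donut", "sprinkles"]   # e.g., labels produced by an image tagger
similar_text = ["pastry"]               # review text about the pictured food
unrelated_text = ["parking"]            # review text about something else

print(cosine(embed(photo_labels), embed(similar_text)))    # high (close to 1)
print(cosine(embed(photo_labels), embed(unrelated_text)))  # low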
Appendix 1.7 – Study 3: Stimuli
High Visual Complexity (Rainbow-decorated donut)
Low Visual Complexity (Chocolate-glazed donut)
Appendix 1.8 – Study 3: Results Without Controlling For Star Rating
Review and photo helpfulness. Participants rated their review text as marginally more
helpful in the more complex (M = 4.97, SD = 1.35) versus less complex condition (M = 4.55, SD
= 1.39); β = 0.21, t(200) = 2.16, p = .03. They also found that adding a photo would be more
helpful in the more complex (M = 5.94, SD = 1.51) versus less complex (M = 4.64, SD = 1.93)
condition; β = 0.65, t(200) = 5.30, p < .001.
Sharing a photo. We expected that with greater visual complexity, more participants
would share a photo with their text. We regressed the likelihood of sharing a photo (1 = yes, 0 =
no) on condition using logistic regression, without controlling for star rating. As expected, a
greater percentage of participants in the complex condition (93%) indicated that they would
include a photo with the review compared to those in the simple condition (61%), β = 2.12, z =
4.81, p < .001. Still, even in the less complex condition, the majority of participants shared a
photo, suggesting that in today’s world photos are generally important in telling a story about
one’s experience.
Redundancy. We examined whether participants who shared a photo (N = 155, 76% of
original sample) also included visual words in their text, i.e., whether they offered redundant
content. With this objective, we regressed visual content words in the review on condition,
without controlling for star rating. This analysis revealed a significant effect of condition; β = 3.77,
t(200) = 5.00, p < .001. Even though everyone in this subsample shared a photo, those in the more
complex condition created greater redundancy by referring to visual aspects more (M = 3.92, SD
= 5.96) than those in the less complex condition (M = 0.14, SD = 0.77).
Appendix 1.9 – Study 4: Goal Manipulation
Table S1.9. Three goal conditions in Study 4
Redundancy Goal Using the photo and additional text, you want to
convey your experience to your friend with the goal
to communicate thoroughly. What this means is that
you describe your experience thoroughly, possibly
repeating some of the information that the photo
already communicates.
Efficiency Goal Using the photo and additional text, you want to
convey your experience to your friend with the goal
to communicate efficiently. What this means is that
you describe your experience efficiently, not
repeating any information that the photo already
communicates.
No Goal Using the photo and additional text, you want to
convey your experience to your friend.
Appendix 1.10 – Study 4: Pretest
We conducted a pretest to ensure each goal manipulation (information-goal and expertise-goal)
induced the intended goal.
Methods
Participants. This study followed a two-group between-subjects design. We posted the
study on Amazon's Mechanical Turk for 150 respondents with a target sample of 75 respondents
per condition. We did not exclude any participant and the final sample included 150 respondents
(48.6% female; Mage = 38.4, SDage = 10.4). Participants were paid $0.5. Participation was
restricted to those over the age of 18 and located in the U.S.
Procedure. All participants imagined seeing an interesting donut at a new donut shop and
communicating this experience to a friend. They saw a picture of the donut (same as in study 4
and constant across conditions). They imagined sending this picture along with some text in a
text message to their friend. In the information-goal condition, we told participants that their goal
was to provide information about new openings in the neighborhood and to keep their friend up
to date. In the expertise-goal condition, we told participants that their goal was to show their
expertise in finding out about new openings in the neighborhood, and to impress their friend with
their discovery. Next, participants answered three questions that aimed to measure their self-
focused concern about their own credibility on a 7-point scale (1 = not at all, 7 = very much): 1)
"To what extent are you concerned that your friend will find your message credible?", 2) "To
what extent are you concerned that your friend will find your evaluation compelling?", and 3) "To
what extent are you concerned that your friend will find your evaluation of the donut
informative?". Participants lastly reported their gender and age.
Results
As intended, participants reported greater self-focused concern regarding their credibility in the
expertise-goal (M = 5.72) than in the information-goal condition (M = 4.51), b = 1.20, t(148) =
2.79, p = .006. Participants also reported greater concern on whether their friend would find their
evaluation compelling in the expertise-goal (M = 6.31) than in the information-goal condition (M
= 4.83), b = 1.48, t(148) = 3.87, p < .001. Finally, participants further reported greater concern
on whether their friend would find their evaluation informative in the expertise-goal (M = 6.36)
than in the information-goal condition (M = 5.20), b = 1.16, t(148) = 2.96, p = .004. Based on
these results, we concluded that our expertise manipulation focused participants more on
themselves and their own message than the receiver.
Appendix 1.11 – Study Mentioned in the General Discussion
Methods
In the data from Yelp and TripAdvisor, we found consistent evidence that people convey their
experiences using both visual words and pictures concurrently, engaging in redundancy.
However, one may argue that our findings could be driven by platform design. Many platforms
(e.g., Yelp and TripAdvisor) prompt users to first write about their experiences and then prompt
users to include pictures. Our studies followed a similar sequence. However, other platforms like
Instagram or Snapchat prioritize pictures over text, i.e., people are prompted to first select a
picture before they can add text. Next, we examined whether the order in which people write
about the experience versus choose the picture changes people’s preference for redundancy. We
examined this question in the context of a slightly longer one-to-one email message recounting a
self-chosen experience (a past restaurant experience the respondent had).
Participants. This study followed a two-group between-subjects design. We recruited 360
undergraduate participants (48.1% female; Mage = 19.7, SDage = 1.3) from the departmental
subject pool. We did not exclude anyone from the study. All participants were compensated by
course credit. This study was approved by the University’s IRB.
Procedure. In this experiment, we manipulated the order in which participants conveyed
their experiences using words and pictures. All participants imagined telling a friend about a past
restaurant experience they had via email (i.e., they recounted their own experience). In the text-
first condition, participants first composed an email about their experience and then chose a
picture to send along with the email. When they wrote the email, participants did not know that
they would also choose a picture. This order was similar to that of study 3 in the main text and
the secondary data. In the picture-first condition, however, participants first chose a picture and
then were prompted to compose an email about their experience. When they chose the picture,
participants did not know that they would also write an email. In order to test for a preference for
redundancy, we examined the frequency of visual words as the dependent variable, measured via
the LIWC "see" word category. Regardless of condition, participants were told to write around 100
words. In line with this instruction, participants used a similar number of words to describe their
experience in both conditions (Mtext-first = 95, Mpicture-first = 94.3; ns).
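The dependent measure can be illustrated with a short sketch. The actual LIWC "see" dictionary is proprietary, so the word list below is a tiny hypothetical stand-in; the sketch only shows how a per-response category frequency of this kind is computed.

```python
import re

# Hypothetical stand-in for LIWC's proprietary "see" category.
VISUAL_WORDS = {"see", "saw", "look", "looked", "view", "bright", "colorful"}

def visual_word_pct(text: str) -> float:
    """Percentage of tokens matching the visual-word list,
    mirroring how LIWC reports category frequencies per response."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return 100 * sum(t in VISUAL_WORDS for t in tokens) / len(tokens)

print(visual_word_pct("We saw a bright, colorful mural at dinner."))  # 37.5
```

The condition means reported below (0.7 vs. 0.4) are percentages of this kind, averaged over responses within each condition.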
Results
We examined the effect of order (text-first vs. picture-first) on the frequency of visual words. If
our previous findings were driven solely by the order in which text and pictures were elicited, we
would find fewer visual words in the picture-first condition if pictures replaced words (i.e., the
opposite of redundancy). Instead, we found that participants used visual words with greater
frequency in the picture-first (M = 0.7) than in the text-first condition (M = 0.4), b = -0.24,
t(358) = -2.96, p = .003. This finding provided further evidence that words and pictures are used
to communicate similar information, resulting in redundancy. Further, it suggests that this
relationship does not depend on whether the communication starts with words or with pictures.
APPENDICES FOR CHAPTER TWO
Appendix 2.1 – Study 2: Stimulus
Review Text. Best latte art ever! They elevated the usual latte art to an exceptional level. Upon
getting served, my eyes were delighted with the most beautiful foam art on my latte. They
delicately drew a colorful flower with the foam. This flower had yellow petals with a little pink
in the center. Each center petal was connected to larger turquoise and blue petals. I enjoyed my
experience at this coffee shop and will definitely come back here.
Similar Condition.
Dissimilar Condition.
Appendix 2.2 – Study 2: Analysis of Discriminant Validity
Confirmatory factor analyses including the items measuring review helpfulness and,
separately, those measuring quality inferences, comprehension ease, and amount of new
information support discriminant validity between mediators and dependent variable as well as
between the mediators. The Fornell-Larcker (1981) criterion for discriminant validity requires
the average variance extracted (AVE) of both constructs to be greater than the squared
correlation between the two constructs.
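For concreteness, the Fornell-Larcker check applied throughout this appendix reduces to a simple comparison. This minimal sketch (ours, not the dissertation's analysis code) applies it to the first construct pair reported below.

```python
def fornell_larcker_ok(ave_a: float, ave_b: float, r_squared: float) -> bool:
    """Discriminant validity holds when each construct's AVE exceeds the
    squared correlation between the two constructs (Fornell & Larcker 1981)."""
    return ave_a > r_squared and ave_b > r_squared

# Helpfulness (AVE = .95) vs. quality inference (AVE = .69), squared r = .22:
print(fornell_larcker_ok(0.95, 0.69, 0.22))  # True
```

Each paragraph that follows is one call of this form, with the corresponding AVEs and squared correlation.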
Between Dependent Variable and Mediators
The AVE is 0.95 for helpfulness and 0.69 for quality inference and the squared
correlation between helpfulness and quality inference is 0.22, meeting this criterion and
suggesting discriminant validity of the measures.
The AVE is 0.95 for helpfulness and 0.90 for comprehension ease and the squared
correlation between helpfulness and comprehension ease is 0.19, meeting this criterion and
suggesting discriminant validity of the measures.
The AVE is 0.95 for helpfulness and 0.82 for amount of new information and the squared
correlation between helpfulness and amount of new information is 0.0001, meeting this criterion
and suggesting discriminant validity of the measures.
Between Mediators
The AVE is 0.69 for quality inference and 0.90 for comprehension ease and the squared
correlation between quality inference and comprehension ease is 0.30, meeting this criterion and
suggesting discriminant validity of the measures.
The AVE is 0.69 for quality inference and 0.82 for amount of new information and the
squared correlation between quality inference and amount of new information is 0.01, meeting
this criterion and suggesting discriminant validity of the measures.
The AVE is 0.90 for comprehension ease and 0.82 for amount of new information and the
squared correlation between comprehension ease and amount of new information is 0.34, meeting
this criterion and suggesting discriminant validity of the measures.
Appendix 2.3 – Study 3: Stimulus
Review Text in One-Topic Condition
Review about Meno's Coffee:
Best latte art ever! This place elevated the usual latte art to an exceptional level. Upon getting
served, my eyes were delighted with the most beautiful foam art on my latte. With the foam, they
delicately drew a colorful flower with yellow petals that had a dark red in the center. Each center
petal was brilliantly connected to larger turquoise and blue petals. I enjoyed my experience at
this coffee shop and will definitely come back here.
Review Text in Two-Topics Condition
Review about Meno's Coffee:
Best latte art ever! My eyes were delighted with the most beautiful foam art on my latte. With
the foam, they delicately drew a colorful flower with yellow petals that had a dark red in the
center. I also ordered the smashed avocado toast topped with fresh radish and served on a slice of
crusty sourdough bread! The whole thing looked exceptionally beautiful. I enjoyed my
experience at this coffee shop and will definitely come back here.
Photos Shown in One-Picture Condition
Photos Shown in Two-Pictures Conditions
Appendix 2.4 – Study 3: Analysis of Discriminant Validity
Confirmatory factor analyses including the items measuring review helpfulness and,
separately, those measuring quality inference, comprehension ease, and amount of new
information support discriminant validity between mediators and dependent variable and between
the mediators. The Fornell-Larcker (1981) criterion for discriminant validity requires the average
variance extracted (AVE) of both constructs to be greater than the squared correlation between
the two constructs. To test discriminant validity, we conducted the same analyses as for Study 2.
Between Dependent Variable and Mediators
The AVE is 0.94 for helpfulness and 0.68 for quality inference and the squared
correlation between helpfulness and quality inference is 0.22, meeting this criterion and
suggesting discriminant validity of the measures.
The AVE is 0.94 for helpfulness and 0.67 for comprehension ease and the squared
correlation between helpfulness and comprehension ease is 0.32, meeting this criterion and
suggesting discriminant validity of the measures.
The AVE is 0.94 for helpfulness and 0.65 for amount of new information and the squared
correlation between helpfulness and amount of new information is 0.03, meeting this criterion
and suggesting discriminant validity of the measures.
Between Mediators
The AVE is 0.68 for quality inference and 0.67 for comprehension ease and the squared
correlation between quality inference and comprehension ease is 0.29, meeting this criterion and
suggesting discriminant validity of the measures.
The AVE is 0.68 for quality inference and 0.65 for amount of new information and the
squared correlation between quality inference and amount of new information is 0.004, meeting
this criterion and suggesting discriminant validity of the measures.
The AVE is 0.67 for comprehension ease and 0.65 for amount of new information and the
squared correlation between comprehension ease and amount of new information is 0.0004,
meeting this criterion and suggesting discriminant validity of the measures.
Appendix 2.5 – Study 3: Results on Quality Inference
The results revealed a marginally significant effect of picture condition (F(1, 1037) = 2.98,
p = .09; M1photo = 4.46, SD = .76; M2photos = 4.54, SD = .68), a non-significant effect of topic
condition (F(1, 1037) = 1.94, p = .16), and a marginally significant topic-by-picture interaction
(F(1, 1037) = 3.35, p = .07). When the review text included only one topic, two photos (one of
which was unrelated; M2photos = 4.47, SD = .75) did not create any difference in quality
inference compared to one photo (M1photo = 4.47, SD = .74; b = .005, t(518) = .07, p = .94).
When the review included two topics, two photos (both related to the text; M2photos = 4.61, SD =
.60) led to a greater quality inference than one photo (M1photo = 4.45, SD = .77; b = .16, t(519) =
2.62, p = .009). However, quality inference did not mediate the process between photo type and
perceived helpfulness (bindex-quality = .07, BootSE = .04, 95% BootCI = [-.01, .15]), likely because
each condition included a photo of the focal attribute (i.e., latte art).
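The mediation index reported above comes from a PROCESS-style bootstrap. As a rough illustration of the logic only (not the study's model or data), a percentile-bootstrap confidence interval for an indirect effect a×b can be sketched as follows, with simulated variables standing in for photo condition, quality inference, and helpfulness.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.integers(0, 2, n).astype(float)      # condition dummy (simulated)
m = 0.3 * x + rng.normal(size=n)             # mediator (simulated)
y = 0.5 * m + 0.2 * x + rng.normal(size=n)   # outcome (simulated)

def indirect_effect(x, m, y):
    a = np.polyfit(x, m, 1)[0]                   # path a: m regressed on x
    X = np.column_stack([np.ones_like(x), x, m])
    b = np.linalg.lstsq(X, y, rcond=None)[0][2]  # path b: y on m, controlling x
    return a * b

boot = []
for _ in range(2000):
    idx = rng.integers(0, n, n)              # resample cases with replacement
    boot.append(indirect_effect(x[idx], m[idx], y[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])    # percentile bootstrap 95% CI
```

A 95% interval that excludes zero would support mediation; the interval reported above ([-.01, .15]) narrowly includes zero, consistent with the null mediation result.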
Abstract
Human information sharing is both idiosyncratic and pervasive. Especially with the advent of camera phones, people constantly take photos of their experiences and share these photos in one-on-one settings or on social platforms. People often show their experience in the photo and explain it further in the caption. In my dissertation, I examine antecedents and consequences of sharing one's experiences in photos and words. In the first essay, I examine how consumers use photos and words to share their experiences with others. Consumers may use photos as substitutes for words to communicate efficiently (i.e., "a picture is worth a thousand words") or use both photos and text to emphasize certain information by repeating it (i.e., "show and tell"). Using computational text analyses of two large natural datasets of restaurant reviews (Yelp and TripAdvisor) and four tightly controlled lab experiments, I find that people's perspectives influence how they use visual and verbal information. When people take others' perspective and focus on the usefulness of their information for the audience, they offer similar information in both photos and text. In contrast, when they fail to take others' perspective and focus on themselves instead (e.g., signaling their expertise), people use visual and verbal information as substitutes. This finding holds when people communicate publicly with unknown others (e.g., writing a review) and privately with close others (e.g., texting a friend). In the second essay of my dissertation, I focus on the receiver side of visual-verbal communication and examine how the similarity between visual and verbal content influences receivers. Specifically, I examine whether receivers find communication helpful when it includes visual and verbal content that conveys similar information. From an information theory perspective, dissimilar information in the photo and in the text provides more information overall.
Receivers should find more information helpful, as it alleviates more uncertainty. From a processing perspective, communicating similar information in the photo and in the text can help the receiver process the writer's experience more easily. Using computational text and photo analyses of a large dataset (Yelp) combined with controlled lab experiments, I find that when photos and text convey similar (vs. dissimilar) information, (1) the information becomes more concrete and easier to process, and (2) receivers form more positive quality inferences regarding the focal attribute. At the same time, receivers recognize that this similarity also (3) limits the amount of new information conveyed. Through the totality of these three distinct processes, greater similarity (vs. dissimilarity) between photos and text increases the review's helpfulness.