INSIDER EDITING ON WIKIPEDIA
by
Stacey L. Ritter
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
BUSINESS ADMINISTRATION
August 2021
Copyright 2021 Stacey L. Ritter
Dedication
I dedicate this dissertation to my family and my friends. Most especially, I dedicate
this work to my husband. I am endlessly grateful for your genuine curiosity in my
research, your countless hours of bearing more than your share of our parenting
responsibilities, and your unending love and encouragement throughout this journey. I
also dedicate this work to our three beloved children, Watson, Annelise, and Barrett;
without your small acts of joy and your moments of pure patience this endeavor would
have never been possible. I hope that you remember fondly these years and that you
never give up chasing after your own dreams! I also express a special feeling of gratitude
to my loving parents, Wayne and Milly Watson, who have provided a critical support
system for our family, especially during this extraordinary pandemic-year.
I also dedicate this work to my friends in California and Texas. I will always
appreciate everything you have done for me and our family over these years. I especially
appreciate all of my amazing PhD colleagues whom I have had the opportunity to work
with and learn from, especially Allison Kays, Satish Sahoo, and Tina Lang because we
share a unique bond in our journey together.
Acknowledgements
I wish to thank my committee members, who were more than generous with their expertise and precious time. I am especially thankful to Dr. Clive Lennox, my committee chair, for his countless hours of reflecting, reading, and guidance, and most of all for his patience throughout this entire process. Special thanks to Dr. Mark Soliman for inviting me to join the PhD program and for cheering me along the way, as this would not have been possible without your calming sense of humor and ability to navigate through rough waters. Thank you, Dr. Patty Dechow and Dr. Jerry Hoberg, for agreeing to serve on my committee and for your guidance and encouragement along the way.
I would like to thank the Leventhal School of Accounting and Dean Holder for
allowing me to conduct my research and providing all of the assistance I requested over
the years. Special thanks to Danielle Galvan, Audrena Goodie, Julie Phaneuf, and Lori
Smith for your unending kindness, words of wisdom, and support over the years. I
sincerely thank all of the USC faculty, especially those from whom I had the opportunity
to learn in PhD seminars, when teaching, and while researching.
Finally, I would like to thank everyone who helped me along the way and made
this project possible. I appreciate the technical support from the Marshall IT team and the
programming expertise of Amir and Prateek. I am also grateful to my fellow USC PhD
students for their time and feedback throughout the development of this project. I
especially would like to recognize Tina Lang, for her countless hours of feedback, and
Vivek Pandey, Satish Sahoo, and Allison Kays for their detailed reviews of this
manuscript. Thank you also to Jungkoo Kang, Jonathan Craske, Katherine Bruere, and
Taylor James, for your helpful comments and support throughout. Fight On!
Table of Contents
Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Introduction
Chapter 1: Wikipedia Setting, Literature Review, and Hypotheses Development
1.1 Wikipedia's Information Policies
1.2 Biased Edits on Wikipedia Pages
1.3 The WikiScanner Shock
1.4 Accounting Misstatements and Insider Edits on Wikipedia
Chapter 2: Data Collection Process
2.1 Overview
2.2 Identification of Firms with Wikipedia Pages
2.3 Revisions to Firm Pages
2.4 Word Changes from Revisions
2.5 User Revisions
Chapter 3: Identification, Research Design, Sample, and Descriptive Statistics
3.1 Identification of Insider Editing
3.2 Research Design
3.3 Sample Selection
3.4 Descriptive Statistics
Chapter 4: Empirical Analyses
4.1 Univariate Results
4.2 Testing H1: The Effects of Insider Editing on the Deletion of Negative Words
4.3 Testing H2: The Effects of the WikiScanner Shock
4.4 Testing H3: The Association with Firm Misreporting
Chapter 5: Additional Analyses
5.1 Sensitivity Analyses for the Firm Focus Measure
5.2 Alternative Model Selection
Chapter 6: Conclusions and Future Research
References
Appendix A: User deletes "backdating" section
Appendix B: User accuses another user of biased incentives
Appendix C: Excerpts from lists of tone words
Appendix D: Variable definitions
Data Appendix
List of Tables
Table 1: The sample selection
Table 2: Descriptive statistics
Table 3: Univariate tests of predictions
Table 4: Multivariate tests of H1
Table 5: Multivariate tests of H2a
Table 6: Multivariate tests of H2b
Table 7: Multivariate tests of H3a & H3b
Table 8: Sensitivity analyses for the firm focus measure
Table 9: Alternative model selection
List of Figures
Figure 1: Histogram of users
Figure 2: Editing across the sample period
Figure 3: Editing among business sectors
Figure DA.1: Example view of a current Wikipedia page
Figure DA.2: Summary of firms with Wikipedia pages
Figure DA.3: Example view of an "Editing" screen
Figure DA.4: Example view of a "Revision history" log
Figure DA.5: Summary of revisions on firms' Wikipedia pages
Figure DA.6: Editing activity over the sample period
Figure DA.7: Firm page growth over the sample period
Figure DA.8: Firm page activity across business sectors
Figure DA.9: Example view of a "Difference between revisions" screen
Figure DA.10: Summary of words changed on firms' Wikipedia pages
Figure DA.11: Example view of a "User contributions" log
Figure DA.12: Summary of users' complete editing history of Wikipedia pages
Figure DA.13: Number of users by number of revisions users make
Figure DA.14: Characteristics of users among different editing frequencies
Abstract
I assemble a unique dataset covering 19 years and over 30 million edits to the Wikipedia pages of firms in the Russell 3000 index. I create a measure that indirectly detects the edits that are most likely to have been made by firm insiders (e.g., employees maintaining a firm's social media presence). Using this measure, I find insiders bias their firms' Wikipedia pages by systematically removing negative words about their firms. I further find that bias and insider editing are attenuated after an exogenous shock which reduced editing anonymity and increased the reputational costs of insiders making self-serving edits.[1] Finally, I find that insider editing occurs more often in periods when firms are materially misstating their financial reports. This study contributes to the strategic disclosure literature by providing the first large-sample evidence of biased editing on Wikipedia, as well as being the first to examine an important mechanism for disciplining firm voluntary disclosures: crowd monitoring. More broadly, my dissertation speaks to the concern that self-interested parties are able to spread misinformation on social media.
[1] As an exogenous shock, I exploit the sudden and unanticipated introduction of an independently developed WikiScanner tool that was able to identify whether an IP address was sourced from a firm being edited.
Introduction
Wikipedia is a social media platform created in 2001 as "the free encyclopedia that anyone can edit" (Wikipedia 2021a). With over 24 billion views per month (Wikimedia Statistics 2021), this online encyclopedia has consistently ranked among the world's most popular websites (Alexa 2020). Not only is Wikipedia considered one of the most influential websites of all time (named #3 by Time Magazine; Fitzpatrick, Eadicicco, and Peckham 2017), but recent studies also suggest it is increasingly important in capital markets, enhancing firms' information environments (Rubin and Rubin 2010; Xu and Zhang 2013) and influencing investors' decisions (Moat et al. 2013; Comprend 2015; Zhu 2019). Wikipedia has policies in place prohibiting firms from editing their own Wikipedia pages (Wikipedia 2021b), but user anonymity has made it difficult to deter insider editing. In this study, I examine whether firm insiders attempt to influence readers' perceptions by removing negative content on their firms' Wikipedia pages.[2] While it has long been suspected that insiders do bias Wikipedia pages (e.g., Feinberg 2019a), this is the first study to systematically identify inside editors and their biased edits.
Despite Wikipedia's enormous readership, little is known about the individuals who edit Wikipedia pages because the platform prioritizes anonymity over credibility. Some social media platforms (e.g., LinkedIn) require users to reveal their true identities, but Wikipedia explicitly cautions against this (Wikipedia 2021c), relying instead on the monitoring efforts of anonymous users to seek out and remediate biased editing across the platform. Therefore, it is difficult to identify insiders who make biased edits to their firms' pages.

[2] For example, on August 11, 2008, two days before settling a shareholder lawsuit for fourteen million dollars (Chen 2009), a single Wikipedia user discreetly deleted an entire section of Apple Inc.'s Wikipedia page called "Stock option backdating investigation" (see Appendix A). While this issue is still described in detail on the current Wikipedia pages of other firms that were implicated in this corporate governance scandal, the word "backdating" has never again appeared on Apple's Wikipedia page.
In this study, I develop a novel strategy for detecting revisions to firm pages that are likely made by firm insiders.[3] Using the revision histories of over 250,000 Wikipedia users, I identify insiders based on their editing focus because I expect firm insiders to concentrate their editing efforts on a single firm (i.e., their own). This rationale is consistent with anecdotes observed in the Wikipedia archives, as Wikipedia editors sometimes accuse other users of having biased incentives due to their pattern of focused editing behavior.[4] Because blatant accusations of insider editing are rare, unverifiable, and noisy to measure, I instead lean into the intuition underscored in these examples by designing a large-scale empirical study to systematically detect and examine focused editing.
A unique strength of my empirical design is that I have multiple revisions by different Wikipedia users for any given firm-year, which allows me to control for all economic events and issues affecting a firm through the inclusion of Firm × Year fixed effects. This design enhances my ability to draw causal inferences about differences in editing behaviors across users, while holding constant any time-varying events that could affect the firm (and thus the content of its Wikipedia page) in any given year.

[3] A user edits Wikipedia by publishing a revised page that contains their edits (e.g., similar to editing an MS Word document and then submitting the entire document as one revision). I use the word "revision" to describe a single instance in which a user submits an edited revision to a Wikipedia page. For narrative purposes, I describe the act of revising pages as "editing" or "making edits."

[4] For example, in 2014 a Wikipedia user named ThurnerRupert accused another user (Elpablo69) of bias when removing an entire section of text titled "Criticism" on Costco's Wikipedia page. In the comments, ThurnerRupert complained that Elpablo69 only edits on Costco's page (see Appendix B). For context, at the time of the complaint, 66% of Elpablo69's 218 Wikipedia revisions had been made to Costco's page.
I perform several validation tests to assess the construct validity of my measure. First, I expect that editors with a heavy focus on one firm are more likely to systematically remove negative words about the firm. This prediction is not without tension, as the reputational costs of being caught making biased edits may outweigh the perceived benefits of impression management. Consistent with my hypothesis, I find a strong and significant association between my measure of firm-focused users and revisions that delete negative words on firms' Wikipedia pages.
Next, I examine whether there is a change in the propensity for firm-focused editors to remove negative words about their firms following an exogenous shock that increased the reputation costs of biased editing by firm insiders. On August 13, 2007, there was a sudden and unanticipated introduction of an independently developed WikiScanner tool, which exacted swift and steep reputational costs on insiders caught making self-serving edits. Specifically, this tool reduced the anonymity of firm insiders by tying the IP addresses of Wikipedia users to the internal networks of their organizations. Thus, anyone with internet access could investigate and unmask the true identities of users who defied Wikipedia's "Conflict of Interest" policy by making biased edits from their firms' networks. Within days of WikiScanner's launch, The New York Times ran a front-page article titled "Seeing Corporate Fingerprints in Wikipedia Edits" (Hafner 2007).[5]
I expect the WikiScanner shock to reduce the incidence of biased editing by firm insiders due to the increased reputational costs. Notably, the WikiScanner shock affects firm insiders, but not other Wikipedia users who do not edit from firms' own computer networks. Thus, I predict that the WikiScanner shock attenuates the positive association between firm-focused editors and self-serving bias in their edits. Consistent with this prediction, I find the introduction of the WikiScanner tool has a significant negative impact on the propensity of users with a heavy firm focus to remove negative words about their firms.
While the WikiScanner tool was remarkably successful in drawing attention to biased editing by firm insiders, it had a number of limitations. Most notably, the tool was only able to scrutinize the identity of about half of all editors because the tool required users' IP addresses to be visible in the Wikipedia archives.[6] Thus, there was an increased threat of detection for firm insiders who were previously editing with visible IP addresses, but not for firm insiders who were using registered Wikipedia user accounts. Further, the utility of the WikiScanner tool as a monitoring device diminished shortly after it was introduced because firm insiders could easily evade detection by choosing to edit through registered Wikipedia user accounts rather than through user accounts with visible IP addresses. Given this opportunity to avoid scrutiny, I predict and find that firm-focused users are less likely to edit with a visible IP address after the WikiScanner shock.

[5] This article detailed particularly egregious examples of biased editing: SeaWorld's removal of its animal cruelty criticisms, Wal-Mart's downplaying of its employee compensation controversies, PepsiCo's deletion of concerns regarding the health impacts of its products, and ExxonMobil's self-serving reframing of the 1989 Exxon Valdez oil spill in Alaska.

[6] The IP addresses of users who edit through registered Wikipedia user accounts are not visible in the Wikipedia archives. In contrast, the IP addresses of users who edit Wikipedia directly, without first creating and logging into a registered Wikipedia account, are visible in the Wikipedia archives because their IP addresses are stored as usernames in the archives.
Finally, I explore how edits by firm insiders relate to another form of misreporting: accounting misstatements. I expect the type of firm prepared to misstate its financial statements would also be the type of firm that is more likely to contravene Wikipedia policy by having insiders edit its own Wikipedia page. This prediction has tension because the insiders who edit Wikipedia pages are unlikely to be the senior executives who are responsible for materially misstating their firms' financial reports. Yet, an unethical corporate culture may pervade all levels of employees who edit Wikipedia. Thus, I predict a positive association between material accounting misstatements and revisions by firm insiders. Consistent with this prediction, I find that revisions by firm-focused users occur more often when firms are concurrently misstating their financial reports. However, I do not find that firm-focused users make more biased revisions when their firms are misstating their financial reports. A potential explanation for this non-result is that the motives for providing biased accounting information (e.g., higher managerial compensation, meeting earnings targets) are not related to the reasons why firm insiders remove negative words on their firms' Wikipedia pages.
This study is the first to empirically document how firm insiders use Wikipedia to strategically disclose biased information. Prior studies examine how Wikipedia pages enhance a firm's overall information environment (Rubin and Rubin 2010; Xu and Zhang 2013), but my study is the first to document the firm's role in making edits to its own page. Further, while prior research examines firm disclosures via social media (Jung, Naughton, Tahoun, and Wang 2018), I extend this line of research by examining a social media setting in which user anonymity makes it difficult to identify whether firms are the information source.
Second, I contribute to the literature by being the first to empirically examine the effects of crowd monitoring. In particular, I exploit an exogenous shock that directly increased the ability of outsiders to monitor biased edits by firm insiders. I find that when the source of editing cannot be verified, crowd monitoring is unable to restrain firms from making biased edits. However, when firms are specifically identified as the source of editing following the WikiScanner shock, firms respond by reducing their biased edits. I also document that firms attempt to avoid crowd monitoring by reducing their edits from IP addresses that could be traced to the firms' internal networks. These findings have implications for both academics and practitioners because Wikipedia's information environment is unregulated, yet the site is often used by capital market participants (Bradshaw 2008; Comprend 2015).
This study is also the first to exploit systematic differences in editing patterns to identify the edits of firm insiders. Prior to this study, only anecdotal accusations have existed to link suspicious editing behavior to specific Wikipedia users (Lehmann 2006; Borland 2007; Hafner 2007; Kenber and Ahmed 2012; Cieply 2015; Craver 2015; Feinberg 2019a; Feinberg 2019b). The methodology developed in this study can be employed to examine any of the other 53 million Wikipedia pages in over 300 languages that cover thousands of topics over the past two decades. For example, the methodology could be employed to investigate insider editing on Wikipedia pages related to politics, public health, and news media (see Schroeder and Taylor 2015 for a review of Wikipedia research).
More broadly, my dissertation adds to concerns that social media offers the ability to spread false facts and misinformation. A growing concern is that people rely on news sources that are not unbiased but are based on opinions that align with the users' views (e.g., Facebook groups). My paper provides strong and clear evidence that social media can deliver new and important information but can also have the unintended consequence of giving self-interested parties the ability to bias and exclude information (i.e., the absence of which results in misinformation).
Chapter 1 discusses the setting and prior research related to Wikipedia and develops the study's hypotheses. Chapter 2 describes how the data are collected using Wikipedia's logs and archives. Chapter 3 explains how I construct the measure of inside editors, describes the research design for testing the empirical predictions, and presents the sample and descriptive statistics. Chapter 4 discusses the empirical findings. Chapter 5 reports the results of sensitivity analyses and robustness tests. Chapter 6 concludes by discussing the study's limitations and opportunities for future research.
Chapter 1: Wikipedia Setting, Literature Review, and Hypotheses Development
1.1 Wikipedia's Information Policies
Created in 2001, Wikipedia has become one of the most influential websites in the world (Fitzpatrick et al. 2017).[7] Unlike other news and social media platforms that contain perishable information (e.g., published articles or commentary about current events), Wikipedia is contemporaneously updated and is considered an unparalleled repository of enduring human knowledge (Keller 2011). However, unlike traditional encyclopedias, where information was previously vetted and stowed, anyone with access to the Internet can anonymously create and edit the content of Wikipedia pages by adding or deleting information.
Given the open and anonymous features of the Wikipedia site, one might expect
biased editing in this continuously updated information environment to be a severe
problem. Yet, researchers have found that Wikipedia pages are surprisingly reliable
across numerous disciplines, including medicine, philosophy, history, and politics
(Devgan, Powe, Blakey, and Makary 2007; Clauson, Polen, Boulos, and Dzenowagis 2008;
Bragues 2007; Holman Rector 2008; Brown 2011). For example, a 2005 study in Nature finds that scientific Wikipedia pages are nearly as accurate as the corresponding entries in the Encyclopedia Britannica (Giles 2005). A recent examination of Covid-19 information disseminated across 4,500 Wikipedia pages within the first few months of the pandemic shows that Wikipedia editors rely on highly cited and peer-reviewed sources (Colavizza 2020).

[7] In 2006, Wikipedia ranked 12th for most popular websites in the US, with over thirty million unique visitors per month to its 900K pages. By 2012, Wikipedia had grown in popularity and size to rank 6th in the world in terms of total web traffic, with almost ninety million unique visitors per month and 3.8M pages of English language content. Today, Wikipedia attracts over 23 billion visits per month to its 53M pages, which are available in over 300 languages (Sterling 2012; Alexa Internet 2012; Wikimedia Statistics 2021; Wikipedia 2021d; Wikipedia 2021e).
The accuracy of Wikipedia content is widely attributed to Wikipedia's "Conflict of Interest" and "Neutral Point of View" policies (e.g., Shi, Teplitskiy, Duede, and Evans 2019; Pasternack 2020). These policies encourage sincere Wikipedia editors to monitor and eliminate biased and inaccurate information.[8] This unique self-policing editorial process is believed to be highly effective at combating misinformation (e.g., Kumar, West, and Leskovec 2016; Shi et al. 2019; Pasternack 2020). For example, one study finds that 90% of all hoaxes containing misinformation are flagged by Wikipedia editors within one hour and then removed entirely within the same day (Kumar et al. 2016). Not without some justification, Ryan Merkley, the Chief of Staff of the Wikimedia Foundation, recently claimed that Wikipedia is "one of the most trusted sites on the internet" (Pasternack 2020).

[8] In Wikipedia's "Conflict of Interest" policy, Wikipedia specifies that users with a conflict of interest should not edit Wikipedia pages. Conflict of interest is defined as contributing to Wikipedia about "yourself, family, friends, clients, employers, or your financial and other relationships" (Wikipedia 2021b). Further, in Wikipedia's "Neutral Point of View" policy, Wikipedia specifies that content on Wikipedia "must be written from a neutral point of view, which means representing fairly, proportionately, and, as far as possible, without editorial bias, all the significant views that have been published by reliable sources on a topic" (Wikipedia 2021f).
Nevertheless, there remains the possibility that bias exists on individual Wikipedia pages. A critic of Wikipedia and the former editor-in-chief of the Encyclopedia Britannica has even quipped:
The user who visits Wikipedia ... is rather in the position of a visitor to a public restroom.
It may be obviously dirty so that he knows to exercise great care, or it may seem fairly
clean, so that he may be lulled into a false sense of security. What he certainly does not
know is who has used the facilities before him. (McHenry 2004)
Anecdotally, investigative journalists have uncovered examples of such bias, including
high-profile celebrities downplaying their failed business attempts (Craver 2015; Cieply
2015), international oligarchs replacing their Soviet-era criminal convictions with
philanthropic activities (Kenber and Ahmed 2012), and prominent politicians removing
remarks about their broken promises (Lehmann 2006). However, this form of biased
editing remains anecdotal and has not yet been examined in a large-scale empirical study.
Despite the anecdotal examples of bias, multiple surveys indicate that people generally trust Wikipedia content (Dooley 2010; Flanagin and Metzger 2011; Jordan 2014; Mothe and Sahut 2018). For example, surveys find that 50% of adults use Wikipedia "sometimes," "often," or "very often" to look up information on the internet (Flanagin and Metzger 2011) and that they perceive Wikipedia content to be credible (Chesney 2006). Another recent study of young people (aged 11 to 25) finds that the vast majority use Wikipedia every month, with 40% (13%) using it weekly (daily). Further, 93% perceive the information on Wikipedia to be "useful and accurate" (Mothe and Sahut 2018). A survey by YouGov.org found that around two thirds of British people trust the authors of Wikipedia entries to tell the truth, many more than trust newspaper journalists (Jordan 2014).
Wikipedia is not only influential among the general public; there is growing evidence that it is also used regularly by financial intermediaries. For example, an industry survey finds that 78% of buy-side analysts "very often" use Wikipedia to locate information pertaining to a company (Comprend 2015). There is evidence that investors use Wikipedia to gather information before making investment decisions (Moat et al. 2013). Zhu (2019) finds that web traffic to firms' Wikipedia pages increases around key informational events, particularly when information from other sources is more difficult to interpret. Taken together, these studies advance the view that business users may be drawn to Wikipedia's unique encyclopedic style to contextualize information about firms.
1.2 Biased Edits on Wikipedia Pages
User anonymity in the Wikipedia setting makes it relatively difficult to systematically identify the individuals who make edits to Wikipedia pages. In other settings, such as Twitter, where firms are not anonymous, there is evidence that firms send fewer tweets about their earnings announcements when the news is bad (Jung et al. 2018). The Wikipedia setting is different from Twitter because Wikipedia policies make clear that firms should not edit their own pages (Wikipedia 2021b). Thus, firms may perceive the reputational consequences of being caught making biased edits to their own pages as too costly. Nevertheless, there is some anecdotal evidence that firms do edit their own Wikipedia pages in a self-serving manner. For example, investigative journalists have uncovered several firms, including Walmart, Exxon, Facebook, Axios, and NBC, that allocated significant resources and attention to managing readers' impressions on their Wikipedia pages (Borland 2007; Feinberg 2019a). I expect that such insider editing will lead to biased information content. Thus, my first hypothesis is as follows:
H1. Firm insiders are more likely to delete a negative word on their firm's Wikipedia page than other editors.
1.3 The WikiScanner Shock
A limitation of the first hypothesis is that firm insiders cannot be directly identified and are proxied using a measure of the user's firm focus. To address the concern that this proxy may be capturing something else, I employ the WikiScanner shock because this shock increases the reputational costs of insider editing. Therefore, the WikiScanner shock can help alleviate concerns that my measure of the user's firm focus is capturing something other than insider edits.
There was no way to identify inside editors until a programmer named Virgil Griffith created a tool for this purpose. Griffith believed that allowing the public to trace the identity of Wikipedia users would help make Wikipedia more reliable, particularly for controversial topics (Griffith 2007). On August 13, 2007, Wired magazine was the first to report on the availability of the new WikiScanner tool that Griffith invented and made publicly available (Borland 2007). The tool worked by tying the IP addresses of individual Wikipedia users to the internal networks of large organizations. Using this tool, anyone with internet access could investigate and unmask the identity of users who had defied Wikipedia's "Conflict of Interest" policy by making edits from their own firm's computer network.[9] Within days of WikiScanner's launch, The New York Times ran a front-page article titled "Seeing Corporate Fingerprints in Wikipedia Edits" (Hafner 2007). The article detailed egregious examples of biased editing by firms (see Footnote 5 for examples), as well as the admission by some firms that their employees had indeed made these controversial edits.

[9] I am unable to use the data from WikiScanner in this paper because, as of April 2013, WikiScanner and its IP address data were made unavailable (Wikipedia 2021g). I have attempted to contact Griffith and his former colleagues from Caltech to obtain data from the WikiScanner tool and was informed that the data are not retrievable. Griffith is currently awaiting trial in the U.S. District Court for the Southern District of New York, charged with violating North Korea sanctions in connection with a crypto-currency conference held in Pyongyang in April 2019 (Lee 2021).

Importantly, the WikiScanner tool increased the risk of reputation damage for firm insiders, but it did not have the same effect for other Wikipedia users whose edits could not be traced to an IP address at the firm's internal computer network. Because of the negative attention that biased insider editing received following the introduction of the WikiScanner tool, I expect that firms responded by reducing the extent of their self-serving edits.

H2a. There is a reduction in the propensity of firm insiders to delete a negative word on their firm's Wikipedia page following the WikiScanner shock.
While the WikiScanner tool was remarkably successful in drawing attention to biased editing by firm insiders, it was limited in a number of important respects. Most notably, WikiScanner required users' IP addresses to be visible in the Wikipedia archives. Thus, the tool increased the threat of detection for firm insiders who were making edits with visible IP addresses, but not for firm insiders who were using registered Wikipedia user accounts. Further, firm insiders who had been editing with visible IP addresses before the WikiScanner shock could immediately switch to a registered user account in order to evade detection. Thus, I expect that some firm insiders tried to avoid scrutiny by reducing their edits from visible IP addresses and switching instead to registered Wikipedia user accounts.

H2b. Firm insiders have a larger decrease in the likelihood of editing with visible IP addresses after the WikiScanner shock, compared to other editors.
1.4 Accounting Misstatements and Insider Edits on Wikipedia
Finally, I investigate whether insider edits relate to another form of misreporting: accounting misstatements. Wikipedia policies explicitly prohibit firms from editing their own Wikipedia pages. I expect the type of firm with a corporate culture prepared to misstate its financial statements may also be the type of firm willing to violate Wikipedia policy by editing its own Wikipedia page. Nevertheless, this prediction has tension because firm insiders who edit Wikipedia are unlikely to be the same individuals who materially misstate their firms' financial reports. Moreover, Wikipedia is not a source of accounting information, and the motives for providing biased accounting information (e.g., higher managerial compensation, meeting earnings targets) may not apply to the non-accounting information typically found on Wikipedia. It is therefore an open question whether insider edits occur more often and are more biased during periods in which firms are materially misstating their financial statements.
H3a. Edits by firm insiders occur more often in periods when firms are materially
misstating their financial reports.
H3b. Firm insiders are more likely to delete negative words in periods when firms are
materially misstating their financial reports.
Chapter 2: Data Collection Process
2.1 Overview
Every edit to a Wikipedia page can be traced historically through Wikipedia logs
and archives. Using a web scraping algorithm, I collect and analyze Wikipedia records to
create a dataset that tracks the editing behavior of over 250 thousand Wikipedia users
who made 30 million revisions on 3.7 million unique pages between 2001 and 2019. To
empirically test the predictions in this study, I focus specifically on 925,410 revisions to
1,849 pages of firms listed in the Russell 3000 Index. In this section, I offer a detailed
description of the data collection process because this study is the first to provide a
comprehensive examination of editing behavior on Wikipedia. I expect that the methods
used in this study can also be exploited in future research. All figures referred to in this
section are found in the Data Appendix.
2.2 Identification of Firms with Wikipedia Pages
I begin the data collection process by identifying firms with Wikipedia pages. I
first search Wikipedia for the names of 2,925 firms listed in the Russell 3000 Index, as of
March 26, 2019. This process yields 1,923 firm-page matches. To validate these matches,
I search each firm's current Wikipedia page for its ticker symbol (see Figure DA.1 in the
Data Appendix). I exclude 74 matches that do not contain the ticker symbol, leaving a
sample of 1,849 firms with Wikipedia pages (see Figure DA.2, Panel A). Figure DA.2,
Panel B compares the financial characteristics of firms in the Russell 3000 Index with and
17
without Wikipedia pages. I find the type of firm that has a Wikipedia page differs from
the type of firm without a page. Firms with Wikipedia pages are on average larger (t =
25.5), more profitable (t = 11.8), and have larger operating cash flows (t = 24.9) compared
to firms without Wikipedia pages.
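To make the ticker-validation step concrete, the sketch below shows how a candidate name-to-page match could be checked against the public MediaWiki API. This is a minimal illustration under my own assumptions (the function name and the simple substring match are mine), not the dissertation's actual scraper.

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def page_contains_ticker(page_title: str, ticker: str) -> bool:
    """Fetch the plain-text extract of a Wikipedia page and check
    whether the firm's ticker symbol appears anywhere in it."""
    params = {
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,   # strip markup, return plain text
        "format": "json",
        "titles": page_title,
    }
    pages = requests.get(API, params=params).json()["query"]["pages"]
    # Results are keyed by internal page id; take the single page returned.
    text = next(iter(pages.values())).get("extract", "")
    return ticker.upper() in text.upper()

# Example: keep a candidate match only if the page mentions the ticker.
print(page_contains_ticker("Apple Inc.", "AAPL"))
```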
2.3 Revisions to Firm Pages
Next, I collect information about the revisions to each firm's page. Each revision includes all edits a user makes to a page before publishing the changes (see Figure DA.3). Every time a revised page is published, an entry is recorded in the page's "Revision history" log (see Figure DA.4). These logs contain detailed information about each
revision, including: the username of the person who made the revision, the date and time
of the revision, the size of the revision (in bytes), and any comments the user may have
noted when making the revision. I collect this log for each of the 1,849 firm pages in the
sample, yielding a total of 925,410 revisions between 2001 and 2019.
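The dissertation gathers these logs with a web scraping algorithm; an equivalent way to retrieve the same fields (username, timestamp, size in bytes, and comment) is the MediaWiki API's prop=revisions endpoint. The sketch below is an illustrative reconstruction, not the author's code; the function name is mine.

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def fetch_revision_history(page_title: str):
    """Yield every entry in a page's "Revision history" log."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "ids|timestamp|user|size|comment",
        "rvlimit": "max",   # up to 500 revisions per request
        "format": "json",
        "titles": page_title,
    }
    while True:
        data = requests.get(API, params=params).json()
        page = next(iter(data["query"]["pages"].values()))
        yield from page.get("revisions", [])
        if "continue" not in data:       # no more batches to fetch
            break
        params.update(data["continue"])  # carry the continuation token

revisions = list(fetch_revision_history("Costco"))
print(len(revisions), revisions[0]["user"], revisions[0]["size"])
```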
Summary statistics for the revisions to firm pages are provided in Figure DA.5. Panel A shows there are 251,647 unique users that have edited a firm's Wikipedia page; firm pages are revised an average of 501 times by 243 Wikipedia users. The average firm page has existed for 10.9 years and is 11.4 thousand bytes in size, which is roughly equivalent to 6 to 14 paragraphs of text.[10] Panel B provides details on how the revisions impact page content (measured in net bytes). Overall, an average revision increases firm page size by 36.6 bytes (which is roughly equivalent to 4.5 words).[11] While revisions are more likely to increase content (58%) than decrease content (35%), the average size of revisions that decrease content (-1,189.8 bytes) is larger than the revisions that increase content (792.3 bytes). These statistics make intuitive sense given that less effort is required to delete existing content than add new content.

[10] I estimate this based on an average of 8.3 bytes per word and 100-200 words per paragraph.

[11] Changes in bytes can also be the result of other types of revisions, e.g., adding or removing blank spaces, graphics, etc.
Figure DA.6 tracks editing activity over the sample period. Editing activity
increased sharply in the early years of the sample, and then decreased and remained
relatively steady in the second half of the sample period. Figure DA.7 reports firm page
growth. While the average size of individual firm pages has grown steadily over the
sample period, the number of firm pages created is concentrated in the early years (with
almost 600 new firm pages created in 2005/06) and has remained relatively constant at
less than 100 new pages created per year since 2009. Taken together, the figures report a
dramatic increase in the popularity of Wikipedia in its early years and then a levelling off
as the platform matured.
Figure DA.8 reports editing activity across different business sectors. There is
considerable variation among sectors in the number of revisions and users, suggesting
that some companies have greater consumer and investor recognition than others. The
sectors with the most editing activity are Consumer (Non-Essentials), Information
Technology, Communication, and Industrials. The average size of firm pages is relatively
consistent across business sectors, except Communications, which is noticeably larger.
Certain business sectors have a larger proportion of firms with Wikipedia pages than
others. For example, 82% of firms listed in the Utilities sector have Wikipedia pages
compared to only 42% of firms in the Healthcare sector.
2.4 Word Changes from Revisions
To better understand the type of information that is deleted or added, I collect and
analyze the words that are changed in each revision. To collect this data, I navigate to a
view in Wikipedia that displays the before and after text, called the "Difference between revisions" view (see Figure DA.9). The panel on the left-hand side of this view is the
source code for the content on the page before the revision is made, and the panel on the
right-hand side is the source code after the revision is made. I scrape the text on this view
for each firm page revision and collect 217M words. I then analyze these words for
variation in tone using the 2011 word lists provided by Loughran and McDonald (refer to Appendix C for excerpts from these lists).
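As a rough illustration of this tone coding, the sketch below extracts the unique negative words that appear in the pre-revision text but not in the post-revision text. The five-word list is a tiny stand-in for the Loughran and McDonald lists (which contain thousands of words), and the tokenization and coding rules are my assumptions rather than the author's algorithm.

```python
import re

# Illustrative stand-in for the Loughran and McDonald (2011) negative list.
NEGATIVE_WORDS = {"loss", "litigation", "fraud", "backdating", "violation"}

def deleted_negative_words(before_text: str, after_text: str) -> set:
    """Return unique negative words present before a revision but absent after."""
    def tokenize(s):
        return set(re.findall(r"[a-z]+", s.lower()))
    return (tokenize(before_text) - tokenize(after_text)) & NEGATIVE_WORDS

before = "The company faced litigation over option backdating."
after = "The company sells consumer electronics."
print(deleted_negative_words(before, after))  # {'litigation', 'backdating'}
```

Under this illustrative coding, the DEL_NEG indicator used in Chapter 3 would equal one for a revision whenever this set is non-empty.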
Figure DA.10 summarizes and describes the words changed on firm pages. I create
an algorithm to count the number of words edited by comparing the total number of
words before and after each revision. Panel A shows that a total of 88M words are edited
in the sample. Panel B summarizes the number and types of words that were deleted and inserted. While there are more revisions that insert words (43%) than delete words (31%), the number of deleted words per revision (151.6) exceeds the number of inserted words per revision (113.4).[12] This is consistent with the previously documented change in net bytes reported in the revision history logs (see Figure DA.5, Panel B). Within the sample of revisions that change words, there are slightly more deletions of tonal words (27%) than insertions of tonal words (25%). I also examine variation in the types of tonal words edited. I find that negative words are deleted and inserted more often than other tonal words. For example, an average of 1.25 unique negative words are removed per deletion revision, whereas only 0.57 positive words are deleted in each revision.

[12] Note that the remainder of revisions in this sample (26%) do not change the number of words on firm pages. Because I only code words that contain alphabetical characters, this remainder suggests that many revisions in the sample are related to other types of edits, e.g., editing citations, references, numerical values, graphics, font, formatting, etc.
2.5 User Revisions
I analyze the behavior of individual editors using the "User contributions" logs. Wikipedia updates these logs every time a user makes a revision (see Figure DA.11). I download this log for each user in my sample, resulting in 29,967,963 revisions by 251,647 users.[13] Using this data, I am able to assess whether a user is highly focused on a single firm (i.e., firm-focused) or edits many different pages across the Wikipedia platform.

[13] For 8% of users, I collect only their most recent one thousand revisions, as this is a reasonable number to assess whether each user has a heavy firm focus.
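A "User contributions" log can likewise be pulled through the MediaWiki API's list=usercontribs endpoint. The sketch below is a plausible reconstruction, not the author's scraper; the function names are mine, and the 1,000-revision cap mirrors the collection rule in Footnote 13. It also shows how visible-IP users can be recognized: anonymous edits are logged under the editor's IP address, so a username that parses as an IP address marks such a user.

```python
import ipaddress
import requests

API = "https://en.wikipedia.org/w/api.php"

def fetch_user_contributions(username: str, limit: int = 1000) -> list:
    """Return up to `limit` entries from a user's "User contributions" log."""
    params = {
        "action": "query",
        "list": "usercontribs",
        "ucuser": username,
        "ucprop": "title|timestamp",
        "uclimit": "max",
        "format": "json",
    }
    contribs = []
    while len(contribs) < limit:
        data = requests.get(API, params=params).json()
        contribs += data["query"]["usercontribs"]
        if "continue" not in data:
            break
        params.update(data["continue"])
    return contribs[:limit]

def is_ip_user(username: str) -> bool:
    """Anonymous edits are attributed to an IP address, so a username
    that parses as an IP address indicates a visible-IP user."""
    try:
        ipaddress.ip_address(username)
        return True
    except ValueError:
        return False
```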
Panel A of Figure DA.12 provides descriptive information on users' complete editing histories in the full sample of 30 million revisions (hereafter, the "all-edits" sample) versus the subsample of revisions made specifically to firm pages (hereafter, the "firm page sub-sample"). I find that users in the all-edits sample edit an average of 137.8 times on 75.8 different Wikipedia pages. Looking specifically at the revisions of firm pages, the same user edits an average of 3.7 times on 1.8 different firm pages.
Panel B examines variation in the use of IP addresses and the frequency of revisions in the all-edits sample and the firm page sub-sample.[14] There is a larger proportion of revisions by visible IP address users in the firm page sub-sample (35%) as opposed to the all-edits sample (28%), and there is also a substantially larger proportion of revisions by low-frequency users (i.e., users editing less than 5 times on Wikipedia) in the firm page sub-sample (16%) compared to the all-edits sample (0.58%). These differences suggest that users who edit primarily on firm pages edit disproportionately from IP addresses and tend to edit less on Wikipedia in general. A majority of users edit with a visible IP address (64%) in both samples; however, revisions by users with visible IP addresses represent only 28% (35%) of revisions in the all-edits sample (firm page sub-sample), suggesting users who edit with visible IP addresses edit less than registered Wikipedia users overall.

[14] Users have two options when editing Wikipedia. In one option their IP address is visible, whereas in the other option their registered username is visible.
I also explore the wide range in the frequency of user revisions. Figure DA.13 shows that the vast majority of users make a relatively small number of revisions. Figure DA.14 charts characteristics of users by their frequency of revisions. I find that, relative to other users, the average low-frequency user (making fewer than 5 revisions on any Wikipedia page) decreases the content of pages and has a relatively short editing period (in days). Users who edit with a greater frequency tend to increase the content of pages and edit over a longer duration.
Chapter 3: Identification, Research Design, Sample, and Descriptive Statistics
3.1 Identification of Insider Editing
Insider editing cannot be directly observed because Wikipedia users are
anonymous. To empirically assess the editing behavior of firm insiders on Wikipedia, I
create a measure that indirectly detects insider editing by Wikipedia users. I expect firm
insiders to concentrate their editing efforts on a single firm (i.e., their own). Therefore,
using the Wikipedia logs, I identify inside editors based on their degree of editing focus.
This rationale is consistent with anecdotes observed from the Wikipedia archives, in
which Wikipedia users accuse other users of having biased incentives due to their highly
focused editing on a single subject or single firm (see the example in Appendix B). I lean into
this intuition instead of attempting to identify these salient, yet rare, instances of direct
accusations.
To construct a measure of firm focus, I calculate the revisions a user makes on a firm page as a percentage of the total number of revisions the same user makes on all Wikipedia pages.[15] While I expect inside editors to be highly focused, I do not require them to focus exclusively on their own firm's page because it is possible that firm insiders would also edit other related pages (e.g., subsidiaries, product lines, executive profiles, or the pages of competitors).

[15] For example, if a user makes 100 total edits on Wikipedia, and 10 of these edits are on Firm A's page and 5 of these edits are on Firm B's page, this percentage is 10% for Firm A and 5% for Firm B.
A histogram of the distribution of users shows that most users make only a very small proportion of their total revisions on a single firm's Wikipedia page (see Figure 1). The next largest group of users edits 100% on one firm page. However, many of these single-page users make only one or two revisions in total. To reduce the noise in my identification of firm-focused users, I exclude any users that do not make at least three revisions in total because such users do not edit enough to establish a reliable pattern of behavior. Among the rest, I code users as firm-focused (FOCUS = 1) if the user concentrates at least 65% of their total revisions on a single firm's page. All other users are coded as non-focused (FOCUS = 0). All revisions by a focused user on other firm pages are coded as non-focused revisions (FOCUS = 0). I acknowledge that the 65% threshold is somewhat arbitrary. In robustness tests, I check that my results are not sensitive to this specific threshold. I report that all empirical results remain unchanged if I instead use cutoffs of 50% and 80% (in lieu of 65%). Likewise, I report that my results remain robust if I exclude users that do not make at least five revisions in total, or if I include all users regardless of their total number of revisions.
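The coding rule above reduces to a short pandas routine. The sketch below is a schematic reconstruction from the text, assuming a revision-level dataframe with one row per revision and columns for the user and the page edited; it is not the author's code.

```python
import pandas as pd

def code_focus(contribs: pd.DataFrame, threshold: float = 0.65,
               min_revisions: int = 3) -> pd.DataFrame:
    """Code each (user, page) pair as firm-focused (FOCUS = 1) when the user
    devotes at least `threshold` of their total Wikipedia revisions to that
    single page. `contribs` spans each user's full editing history."""
    totals = contribs.groupby("user").size().rename("total_revs").reset_index()
    per_page = (contribs.groupby(["user", "page"]).size()
                        .rename("page_revs").reset_index()
                        .merge(totals, on="user"))
    # Users with fewer than `min_revisions` total edits cannot establish a
    # reliable pattern of behavior and are excluded.
    per_page = per_page[per_page["total_revs"] >= min_revisions].copy()
    per_page["pct"] = per_page["page_revs"] / per_page["total_revs"]
    per_page["FOCUS"] = (per_page["pct"] >= threshold).astype(int)
    return per_page
```

Because the percentage is computed per (user, page) pair, a focused user's revisions to other firm pages automatically receive FOCUS = 0, matching the coding described above.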
3.2 Research Design
My first hypothesis (H1) predicts that firm insiders are more likely to delete
negative words than other users. I test H1 by estimating the following model:
DEL_NEG = β0 + β1 FOCUS + Control Variables + Firm × Year fixed effects + ε    (1)
The dependent variable (DEL_NEG) is a dummy that indicates whether the revision deletes a negative word, where negative words are identified following Loughran and McDonald (2011). I concentrate my analyses on negative words for two reasons. First, they are the most frequently edited of all the tonal words (see Data Appendix, Figure DA.10, Panel B). Second, prior literature cautions against using other tonal words (Loughran and McDonald 2011), such as positive words that are often negated in context (e.g., "did not benefit").
The variable of interest is FOCUS, which captures whether the user has a heavy focus on a single firm's page. H1 predicts that firm-focused users are more likely to delete a negative word when editing the firm's page, i.e., a positive coefficient on FOCUS. The control variables include user- and firm-level characteristics that may also affect the decision to delete a negative word. The user-level controls capture whether users edit with a visible IP address (USER_IP) and users' total editing activity on Wikipedia (USER_WIKI_EDITS). I also include quarterly firm-level characteristics that capture firm size (FIRM_SIZE), profitability (FIRM_PROFIT), and liquidity (FIRM_CF). All continuous variables are winsorized at 1% and 99% to minimize the influence of extreme values.
Each firm-year in my sample has multiple revisions. Given that eq. (1) is estimated using revisions as the unit of observation, I am able to include Firm × Year fixed effects, which control for all economic factors affecting any firm in any given year. The estimated coefficients are identified from the variation in editing behavior across different editors within a given firm-year. This fixed effects specification greatly improves my ability to draw causal inferences, as it subsumes both time-varying and time-invariant characteristics of the firm. This is important because the number and nature of user revisions are likely to vary over time as firms become subject to different types of newsworthy events. Thus, it is highly unlikely that a correlated firm characteristic is influencing my results. I find that logit models are unable to converge with the inclusion of Firm × Year fixed effects, so I utilize linear probability models instead.[16] In robustness tests, I use logit models with firm and year fixed effects (rather than Firm × Year fixed effects) and find that the results are not sensitive to the choice of a logit model or linear probability model.

[16] This incidental parameter problem is common to logit and probit models (Lancaster 2000).
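As a concrete but hypothetical illustration, eq. (1) could be estimated as a linear probability model in Python as follows. The input file and dataframe layout are assumptions; C(firm_year) spells out the Firm × Year dummies explicitly, whereas in practice an estimator that absorbs high-dimensional fixed effects would be used for a sample of this size.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical revision-level dataset containing the variables defined
# above plus a firm_year key (e.g., "AAPL-2008") for the fixed effects.
revs = pd.read_csv("firm_page_revisions.csv")

# Eq. (1): C(firm_year) absorbs every firm-year intercept, so the FOCUS
# coefficient is identified from variation across editors within a firm-year.
result = smf.ols(
    "DEL_NEG ~ FOCUS + USER_IP + USER_WIKI_EDITS"
    " + FIRM_SIZE + FIRM_PROFIT + FIRM_CF + C(firm_year)",
    data=revs,
).fit()
print(result.params["FOCUS"])  # H1 predicts a positive coefficient
```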
My second set of hypotheses relates to the impact of the WikiScanner shock on the biased editing behavior of firm-focused users. In H2a, I examine how the shock affects the likelihood of focused users removing a negative word from the firm's Wikipedia page:

DEL_NEG = β0 + β1 FOCUS + β2 SHOCK + β3 FOCUS × SHOCK + Control Variables + Firm × Year fixed effects + ε    (2)
The SHOCK variable equals one if a revision was made on or after the WikiScanner shock (August 13, 2007), and zero otherwise. The interaction term (FOCUS × SHOCK) captures the differential effect of the WikiScanner shock on firm-focused users. I expect firm-focused users are less likely to delete negative words from firm pages following the shock. Thus, I predict a negative coefficient on FOCUS × SHOCK.[17] This model and all subsequent models in this section utilize the same control variables, fixed effect structure, and model selection as eq. (1). For brevity, I will hereafter omit discussion pertaining to these design choices unless there are differences.

[17] In robustness tests (untabulated), I examine the parallel trends assumption by including in eq. (2) variables capturing three 90-day periods before the WikiScanner shock on August 13, 2007. My results for β3 remain statistically equivalent, whereas the coefficients for the interactions between FOCUS and these 90-day pre-shock periods are not statistically significant.
In H2b, I examine whether the shock impacts the tendency of firm-focused users
to edit using IP addresses:
USER_IP = β0 + β1 FOCUS + β2 SHOCK + β3 FOCUS × SHOCK + Control Variables + Firm × Year fixed effects + ε    (3)
The dependent variable in eq. (3) indicates whether revisions are made with a visible IP
address (USER_IP = 1), rather than a registered Wikipedia account (USER_IP = 0). I
expect firm insiders gravitate away from using visible IP addresses after the WikiScanner
shock. This would result in a negative coefficient on FOCUS × SHOCK. The Wikipedia
shock occurred on August 13, 2007 and, to address the potential concern that there may
be fewer revisions by users with visible IP addresses in the latter part of each year, eq. (3)
includes an additional control variable that identifies revisions occurring after August 13
in any sample year (EDIT_AF_AUG_13). This control variable is also included in eq. (2)
17
In robustness tests (untabulated), I examine the parallel trends assumption by including in eq. (2)
variables capturing three 90-day periods before the WikiScanner shock on August 13, 2007. My results for
3 remain statistically equivalent, whereas the coefficients for the interactions between FOCUS × these 90-
day pre-shock periods are not statistically significant.
28
to control for the possibility that there may be more deletions of negative words in the
latter part of each year.
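As an illustration, the shock and calendar-control indicators could be coded from revision timestamps as in the following sketch (the column names are assumptions, not the dissertation's code):

```python
# Minimal sketch: construct SHOCK and EDIT_AF_AUG_13 from a datetime
# column rev_timestamp (an illustrative name).
import pandas as pd

SHOCK_DATE = pd.Timestamp("2007-08-13")
df["SHOCK"] = (df["rev_timestamp"] >= SHOCK_DATE).astype(int)

# Strictly after August 13 of *any* sample year, separating the shock
# effect from a within-year seasonal pattern.
after_aug_13 = (df["rev_timestamp"].dt.month > 8) | (
    (df["rev_timestamp"].dt.month == 8) & (df["rev_timestamp"].dt.day > 13)
)
df["EDIT_AF_AUG_13"] = after_aug_13.astype(int)
```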
My third set of hypotheses relates to insider editing during firms' misstatement periods. H3a predicts that revisions by firm insiders are more likely to occur in periods when firms are materially misstating their financial reports:

FOCUS = β0 + β1 MISSTATE + Control Variables + Firm × Year fixed effects + ε    (4)

The MISSTATE variable equals one if a firm is currently in a misstatement period, and zero otherwise. I gather non-reliance restatements from Audit Analytics and define misstatement periods as the quarters covered by firms' financial statements that are later restated. A positive coefficient on MISSTATE would indicate that a revision is more likely to be made by a firm-focused user if the firm is currently in a misstatement period.

H3b predicts that firm-focused users are more likely to delete negative words from their firm pages when their firms are materially misstating their financial reports:

DEL_NEG = β0 + β1 FOCUS + β2 MISSTATE + β3 FOCUS × MISSTATE + Control Variables + Firm × Year fixed effects + ε    (5)

Under H3b, I predict a positive coefficient on the interaction term (FOCUS × MISSTATE).
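To illustrate, MISSTATE could be coded by matching revision dates to restatement windows, as in this sketch (the DataFrames and column names are assumptions):

```python
# Minimal sketch: MISSTATE = 1 when a revision falls inside a firm's
# misstated period (begin/end dates from Audit Analytics non-reliance
# restatements); all names here are illustrative.
import pandas as pd

merged = revisions.merge(restatements, on="firm_id", how="left")
in_window = (merged["rev_date"] >= merged["misstate_begin"]) & (
    merged["rev_date"] <= merged["misstate_end"]
)
merged["MISSTATE"] = in_window.astype(int)

# A firm can have several restatement windows, so collapse back to one
# row per revision, flagging it if it falls inside any window.
misstate_by_rev = merged.groupby("rev_id")["MISSTATE"].max()
```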
3.3 Sample Selection
Table 1 describes the sample selection process. The initial sample comprises 925,410 revisions to 1,849 pages between the years 2001 and 2019. From this initial sample, revisions are excluded based on the following two restrictions. First, I exclude revisions related to vandalism and the remediation of vandalism. In the Wikipedia setting, vandalism is defined as editing in an intentionally disruptive or malicious manner (Wikipedia 2021h). While vandalism is technically prohibited, it is a known problem. One of the more noticeable forms of vandalism occurs when a user deletes all or most of the content on a firm's page and another user remediates the vandalism by simply reverting the page to its original content with the undo button. This vandalism-remediation pattern can go back and forth numerous times before the vandal eventually surrenders or is blocked by an administrator. Because vandalism of this kind is ultimately futile, it produces exceptionally large deletion revisions that are unrelated to biased editing. To minimize the noise created by such vandalism, I exclude 11,459 revisions that result in the deletion or insertion of a large amount of content. Specifically, I measure the average size (in bytes) of each firm's page for each day in the sample and I classify a revision as vandalism-related if the absolute value of the size of the revision is at least 90% of the average daily size of the firm's page. Second, I drop 216,217 revisions because financial information from COMPUSTAT is unavailable for the firm-quarter. In untabulated sensitivity analyses, I estimate all my models with the initial sample (N = 925,410) and find that the results are not sensitive to these sample screening criteria.
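The vandalism screen can be summarized in a short sketch (a minimal illustration, assuming columns page_id, rev_date, page_bytes, and rev_delta_bytes, which are not the dissertation's actual names):

```python
# Minimal sketch of the vandalism screen: flag revisions whose byte-size
# change is at least 90% of the firm page's average size that day.
import pandas as pd

avg_daily_size = df.groupby(["page_id", "rev_date"])["page_bytes"].transform("mean")
df["vandalism_related"] = df["rev_delta_bytes"].abs() >= 0.9 * avg_daily_size
clean = df[~df["vandalism_related"]]  # 11,459 revisions are dropped in Table 1
```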
3.4 Descriptive Statistics
Table 2 reports summary statistics for the variables used in the regression analyses. Variable definitions are listed in Appendix D. Panel A of Table 2 reports that 9.6% of revisions are made by firm-focused users (FOCUS) and 5.2% of revisions delete a negative word from a firm page (DEL_NEG). (The remaining 95% of revisions relate to other types of editing, such as adding and deleting other words, correcting spelling mistakes, re-arranging existing content, or making changes to numeric characters or graphics.) Panel B reports pairwise correlations among the variables used in the regressions. I observe a positive correlation between FOCUS and both DEL_NEG and MISSTATE, suggesting that focused-user editing is associated with deleting a negative word and with editing during firms' misstatement periods. These correlations provide preliminary support for H1 and H3a. I also observe that focused editing occurs more often with visible IP addresses and by users who make fewer total revisions on Wikipedia, as FOCUS is positively (negatively) correlated with USER_IP (USER_WIKI_EDITS). Focused editing is also negatively associated with FIRM_SIZE and positively associated with FIRM_PROFIT, implying that focused editing occurs more often on the Wikipedia pages of smaller and more profitable firms.

I also describe firm-focused editing and the deletion of negative words over the sample period and across business sectors. Figure 2 reports that while the frequency of edits that delete a negative word is relatively stable across the sample period, firm-focused editing increases sharply in the early years of the sample (from 1% of revisions in 2001 to a high of 11% in 2010) and then decreases slowly in the second half of the sample period. Among business sectors, Figure 3 shows that focused editing is most prevalent in the Real Estate, Financial, and Materials sectors, whereas the Communication and Utilities sectors have a notably smaller proportion of their revisions by firm-focused users. In my empirical analyses, I control for these observed variations across time and among firms by employing a Firm × Year fixed effect structure.
Chapter 4: Empirical Analyses
4.1 Univariate Results
Table 3 reports the univariate tests for each hypothesis. Consistent with H1, Panel A indicates that firm-focused users are significantly more likely to delete a negative word from their firm's Wikipedia page (z = 8.38). Indeed, focused users delete negative words on firm pages 33% more often than non-focused users do (calculated as 33% = (.0673 - .0505)/.0505 × 100). This univariate test supports my first prediction (H1) that revisions by firm insiders are more likely to delete negative words than revisions by firm outsiders.

Panel B reports the change in edits that delete negative words following the WikiScanner shock (H2a). While all users are less likely to delete a negative word after the introduction of the WikiScanner tool, firm-focused users are affected significantly more (z = 4.68), such that the shock has a larger negative effect on the likelihood a revision deletes a negative word for focused users. On average, focused users reduce their deletion of negative words by 39%, whereas the decrease is only 12.5% for other users. Thus, the percentage decrease for focused users following the WikiScanner shock is more than three times the percentage decrease observed for other users (calculated as 3.12 = ((.0974 - .0594)/.0974)/((.0559 - .0489)/.0559)). This is consistent with the WikiScanner tool increasing the risk of reputational damage for firm insiders making revisions that reduce negative content on their own Wikipedia pages.
Panel C examines how users change the way they edit Wikipedia pages following the WikiScanner shock. While all users reduce their editing with visible IP addresses, the reduction is significantly larger for firm-focused users than for other users (z = 3.13). Focused users decrease editing with visible IP addresses by 31.6%, whereas other users decrease by 21.6% (calculated as 31.6% = (.6190 - .4236)/.6190 and 21.6% = (.3855 - .3023)/.3855). This supports the prediction in H2b that firm insiders have a larger decrease in the likelihood of editing with a visible IP address after the WikiScanner shock.

Consistent with H3a, Panel D finds that firm-focused users are more likely to edit their firm's Wikipedia page during periods in which the firm is concurrently misstating its financial statements (z = 2.15). However, Panel E does not find support for H3b, which predicts that insiders are more likely to delete negative content during these periods. Instead, focused users are more likely to delete negative words during both misstatement (z = 3.53) and non-misstatement periods (z = 7.68). Taken together, the univariate results in Panels D and E suggest that firms are more likely to violate Wikipedia policies prohibiting users from editing their own pages while they are misstating their financial reports. However, I do not find that the revisions themselves are more likely to delete negative words.
4.2 Testing H1: The Effects of Insider Editing on the Deletion of Negative Words
Table 4 reports the results for eq. (1). The dependent variable DEL_NEG captures the likelihood a revision deletes a negative word. Consistent with H1 and the univariate test (Table 3, Panel A), Column (1) shows that the coefficient on FOCUS is positive and highly significant (t = 4.559). Thus, focused users are significantly more likely to delete negative words from firm pages, relative to non-focused users.
Column (2) reports results after controlling for user characteristics (USER_IP and USER_WIKI_EDITS). The positive and significant coefficient on USER_WIKI_EDITS suggests that users who frequently edit Wikipedia are more likely to delete negative words. Column (3) reports results after controlling for a firm's financial characteristics in each fiscal quarter (i.e., FIRM_SIZE, FIRM_PROFIT, and FIRM_CF). The coefficients on these control variables suggest that revisions are more likely to delete a negative word when firms are larger and less profitable. The coefficient on FOCUS remains large and highly significant after explicitly controlling for these user and firm characteristics.

In the remaining sections, I present user- and firm-level control variables in the same format as Columns (2) and (3) of Table 4 to demonstrate the robustness of the coefficients of interest for each model. For brevity, I omit discussion pertaining to these specific control variables unless there are differences.
4.3 Testing H2: The Effects of the WikiScanner Shock
In the second set of hypotheses, I test whether the WikiScanner shock deters firm-focused users from removing negative words about their firms. Table 5 reports results from the difference-in-differences research design in eq. (2). As predicted in H2a and consistent with the univariate results (Table 3, Panel B), Columns (1) through (3) show that the coefficients on FOCUS × SHOCK are negative and significant (t-statistics ranging from -4.052 to -4.201). This suggests that firm-focused users reduce their deletion of negative content in response to the WikiScanner shock. To address the possibility that revisions in the later part of each year merely happen to delete fewer negative words, I include in Column (3) an additional control variable capturing revisions taking place after August 13 of every year in the sample (EDIT_AF_AUG_13). While not significant, this control variable ensures that the estimated coefficients are identified only from variation in editing behavior before and after the shock across years, rather than from the time of year. The remaining time-varying and time-invariant firm characteristics are subsumed by the Firm × Year fixed effects. The results for H2a remain significant in these specifications.
Table 6 reports the results from estimating eq. (3). Consistent with H2b and the univariate test (Table 3, Panel C), Columns (1) through (3) show that the coefficients on the treatment variable, FOCUS × SHOCK, are negative and highly significant (t-statistics range from -2.460 to -7.586). The control variable USER_WIKI_EDITS has a noticeably large and significant negative coefficient (t = -187.3), reflecting the fact that users who frequently edit Wikipedia are much less likely to edit with a visible IP address than with a Wikipedia account. Despite this highly influential control variable, the estimated coefficient on the predicted interaction FOCUS × SHOCK remains robust.
Taken together, my results indicate that firm-focused users respond to the
WikiScanner shock as predicted. Not only do focused users reduce their systematic
deletion of negative content when editing (H2a), but they are also less likely to edit with
visible IP addresses following the introduction of the WikiScanner tool (H2b).
4.4 Testing H3: The Association with Firm Misreporting
For the third set of hypotheses, I examine how insider editing behavior differs during periods of firms' financial misreporting (see Table 7). Panel A reports results from eq. (4). As predicted in H3a and consistent with the univariate test (Table 3, Panel D), Columns (1) through (3) report that the coefficients on MISSTATE are positive and highly significant (t-statistics range from 2.665 to 2.864). Thus, more revisions are made by firm-focused users during misstatement periods than during non-misstatement periods.
Panel B, however, does not support the prediction in H3b that firm insiders are more likely to delete negative words during periods of financial misreporting. In this panel, I estimate the likelihood a revision deletes a negative word (DEL_NEG) using the difference-in-differences research design in eq. (5). While I find strong and significant coefficients on FOCUS (consistent with H1), the interaction term (FOCUS × MISSTATE) is not significant. This is consistent with the univariate test in Panel E of Table 3. I conclude that firm-focused users are more likely to delete negative words from their firms' pages, but their propensity to do so is not greater when their firm is concurrently misstating its financial statements.

Taken together, the results from Table 7 support the notion that misreporting firms are more likely to edit their own Wikipedia pages. However, the incentives that motivate firm insiders to bias their financial reports (e.g., stock-based compensation) do not appear to heighten managers' motivation to remove negative words from their firms' Wikipedia pages.
Chapter 5: Additional Analyses
5.1 Sensitivity Analyses for the Firm Focus Measure
Given the novelty of the Wikipedia setting, the 65% threshold used to code the FOCUS measure is somewhat arbitrary. This section evaluates whether the results are robust to this research design choice. To assess this, I re-estimate all models using four new measures that vary the thresholds used to capture firm-focused editing.

While the original FOCUS measure is coded as one if a user makes at least 65% of their total revisions on a single firm's page, I instead use alternative cutoffs of 50% (FOCUS_WEAK) and 80% (FOCUS_STRONG). Further, the original measure excludes users who do not make at least three revisions on any Wikipedia page. For the new measures, the FOCUS_HIGH variable excludes users who do not make at least five revisions, while FOCUS_LOW includes all users regardless of how many times they edit Wikipedia.
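The thresholds map onto the measure's two parameters, the single-firm share of a user's revisions and the user's overall activity floor, as in this sketch (a per-user summary DataFrame users with illustrative columns total_edits and max_firm_page_edits is assumed, not the dissertation's actual code):

```python
# Minimal sketch of the firm-focus coding and its threshold variants.
import pandas as pd

share = users["max_firm_page_edits"] / users["total_edits"]

def focus_flag(min_share, min_edits):
    # 1 if the user's single-firm share exceeds min_share and the user
    # has made at least min_edits revisions anywhere on Wikipedia.
    active = users["total_edits"] >= min_edits
    return ((share > min_share) & active).astype(int)

users["FOCUS"] = focus_flag(0.65, 3)
users["FOCUS_WEAK"] = focus_flag(0.50, 3)
users["FOCUS_STRONG"] = focus_flag(0.80, 3)
users["FOCUS_LOW"] = focus_flag(0.65, 0)   # no activity floor
users["FOCUS_HIGH"] = focus_flag(0.65, 5)
```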
Table 8 reports the results of these sensitivity analyses. In each panel, I estimate equations (1) through (5) under the tightest empirical specifications (i.e., including all control variables and Firm × Year fixed effects). Panel A finds that two of the alternative measures (FOCUS_STRONG and FOCUS_HIGH) produce slightly larger coefficients than the original FOCUS measure. More importantly, the coefficient for each new measure is statistically significant at the 1% level, confirming that the inferences for H1 are qualitatively unchanged.

Panel B reports the results for H2a. I find that each coefficient on the interaction term of interest (e.g., FOCUS_WEAK × SHOCK) is negative and statistically significant, yielding results that are qualitatively unchanged from those using the original FOCUS measure. Similarly, Panel C shows that the H2b results are qualitatively unchanged from the results reported in Table 6. Panels D and E report the results for eq. (4) and eq. (5), respectively. The coefficients of interest in both models are consistent with those observed in Table 7, suggesting that these results are also qualitatively unchanged for H3a and H3b.
5.2 Alternative Model Selection
A strength of my research design is that I am able to employ Firm × Year fixed effects. This allows me to draw stronger causal inferences because the number and types of revisions are likely to vary over time as firms become subject to newsworthy events. The dependent variables in each of my equations are dichotomous, making logit and probit models preferable to OLS. However, because of the incidental parameter problem, which arises when the number of fixed effects in a non-linear model makes the variance-covariance matrix too large to converge (Lancaster 2000), I am unable to estimate logit or probit models with Firm × Year fixed effects. Thus, I estimate OLS models in my main analyses. In this section, I examine the results from logit regressions with firm and year fixed effects included separately, in order to evaluate whether my inferences are robust to the estimation method.
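A minimal sketch of this robustness specification, under the same illustrative setup as the earlier sketches, might look as follows:

```python
# Minimal sketch: logit with separate firm and year fixed effects (the
# incidental parameter problem rules out Firm x Year dummies here).
# Note: estimating ~1,700 firm dummies by maximum likelihood is memory-
# and compute-intensive at this sample size.
import statsmodels.formula.api as smf

logit_fit = smf.logit("DEL_NEG ~ FOCUS + C(firm) + C(year)", data=df).fit(
    disp=False, maxiter=200
)
print(logit_fit.params["FOCUS"])
```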
Table 9 reports the results. Consistent with the H1 results in Table 4, Column (1) of Panel A finds that focused users are significantly more likely to delete negative words when editing a firm's Wikipedia page (z = 8.453). Likewise, in Columns (2) and (3) of Panel A, I find strong support for H2a and H2b, consistent with the results presented in Tables 5 and 6, respectively. In Panel B, the results are consistent with Table 7 for H3a and H3b. In particular, I find that focused users increase their editing during periods of misreporting, but they do not increase their deletions of negative words.
Chapter 6: Conclusions and Future Research
In this study, I utilize detailed data on Wikipedia editing activity to document a novel form of biased corporate disclosure, namely the making of self-serving revisions to a firm's own Wikipedia page. Specifically, I collect and analyze 30 million user edits to create a measure that indirectly identifies insider editing based on the user's firm focus. Using this measure, my findings suggest that firm insiders are more likely to remove negative language relating to their firms. This editing behavior is attenuated when user anonymity is compromised, although firm insiders attempt to avoid scrutiny of their editing when monitoring tools become available. In addition, I find that insider editing occurs more often when firms are materially misstating their financial statements. Overall, the study contributes to the strategic disclosure literature by providing the first large-sample evidence of biased editing on Wikipedia, as well as being the first to examine an important mechanism for disciplining firm behavior: crowd monitoring.
My study has some limitations. As currently constructed, the measure of firm focus does not capture editing activity on other firm-related Wikipedia pages (e.g., pages for firms' products, management profiles, competitors, etc.). Future research may develop a more precise measure by expanding the number of pages identified as firm pages. In addition, I use the firm focus measure to study the removal of negative words on firm pages, but there are other methods of strategic editing. Future research could examine edits that tout managers' accomplishments, sabotage competitors, or leak information. A natural extension of this study would investigate the determinants and consequences of insider editing to further understand the motives behind, and the efficacy of, biased edits. For example, biased insider editing may be related to the personal characteristics of firm managers (e.g., narcissism) or specific governance features of the firm (e.g., captured boards). Further, the timing of editing could be related to firms' impression management motives during key information events (e.g., mergers and acquisitions, equity issuances, etc.). Future studies could also explore how outsiders constrain biased editing and how other financial information intermediaries (e.g., analysts) interact with crowd monitoring mechanisms.
More generally, this study is one of the first papers in any discipline to explore the vast amounts of granular data available in Wikipedia. Relative to other forms of social media, Wikipedia data logs are unique in that they capture dynamic exchanges of information across thousands of interconnected topical areas. For example, empirical studies could be designed to capture network effects of firm insiders (e.g., political affiliations) or firm outsiders (e.g., activist groups). Further, the multidimensionality of the Wikipedia data allows researchers to examine subtleties of firms' information environments that are unobservable in other settings. For example, the data can be amassed in such a way as to capture and examine stakeholders' perceptions of specific firm disclosures (e.g., credibility) and their exposure to those disclosures (e.g., web traffic).

Wikipedia data logs and archives exist for all 53 million Wikipedia pages, which cover thousands of topical areas in over 300 languages and have been constructed over the past two decades. Future research can use these data to examine the impression management activities of organizations about which we currently have very little public information, especially non-listed entities such as audit firms, law firms, hospitals, political parties, etc. Not only is the identification of biased editing particularly valuable in settings where constituents are highly motivated to persuade (e.g., political science, public health issues, etc.), but the intuition used to construct this measure might also prove useful in other areas of research related to Wikipedia.
References
Alexa Internet. 2012. Alexa Top 500 Global Sites. Available at: https://en.wikipedia.org/w/index.php?title=List_of_most_popular_websites&oldid=524404301#cite_note-Alexa-2 (last accessed May 8, 2021).
Alexa Internet. 2020. Alexa Top 500 Global Sites. Available at:
http://www.alexa.com/topsites (last accessed March 7, 2020).
Borland, J. 2007. See Who's Editing Wikipedia - Diebold, the CIA, a Campaign. Wired (August 14). Available at: https://www.wired.com/2007/08/wiki-tracker/?currentPage=all (last accessed May 10, 2021).
Bradshaw, T. 2008. Companies woo investors via social websites. Financial Times
(December 1): 18.
Bragues, G. 2007. Wiki-philosophizing in a marketplace of ideas: Evaluating
Wikipedia's entries on seven great minds. Working paper, University of Guelph-
Humber.
Brown, A. 2011. Wikipedia as a Data Source for Political Scientists: Accuracy and Completeness of Coverage. PS: Political Science & Politics, 44(2), 339-343. https://doi.org/10.1017/S1049096511000199.
Chen, B. 2009. Forbes Uncovers Jobs' Deposition From Backdating Scandal. Wired (April
22). Available at: https://www.wired.com/2009/04/forbes-uncovers/ (last
accessed June 6, 2021).
Chesney, T. 2006. An empirical examination of Wikipedia's credibility. First Monday, vol. 11, no. 11. Available at: https://firstmonday.org/ojs/index.php/fm/article/view/1413 (last accessed May 10, 2021).
Cieply, M. 2015. Wikipedia Pages of Star Clients Altered by P.R. Firm. The New York
Times (June 23), Section B, Page 3. Available at:
https://www.nytimes.com/2015/06/23/business/media/a-pr-firm-alters-the-
wiki-reality-of-its-star-clients.html (last accessed May 10, 2021).
Clauson, K. A., H. H. Polen, M. N. K. Boulos, and J. H. Dzenowagis. 2008. Scope, completeness, and accuracy of drug information in Wikipedia. Annals of Pharmacotherapy 42, no. 12: 1814-1821.
Colavizza, G. 2020. COVID-19 Research in Wikipedia. Quantitative Science Studies, pp. 1-32.
Comprend. 2015. Comprend's Capital Market Report 2015. Available at: https://comprend.com/spotlight/2015/capital-market/capital-market-report-2015.pdf (last accessed October 15, 2020).
Craver, J. 2015. PR firm covertly edits the Wikipedia entries of its celebrity clients: How a big Hollywood firm altered Naomi Campbell's entry. Wiki Strategies. Available at: https://wikistrategies.net/sunshine-sachs/ (last accessed May 10, 2021).
Devgan, L., N. Powe, B. Blakey, and M. Makary. 2007. Wiki-surgery? Internal validity of
Wikipedia as a medical and surgical reference. Journal of the American College of
Surgeons 205, no. 3: S76-77.
Dooley, P.L. 2010. Wikipedia and the two-faced professoriate. In Proceedings of the 6th
International Symposium on Wikis and Open Collaboration. New York, NY,
USA: ACM.
Feinberg, A. 2019a. Facebook, Axios And NBC Paid This Guy To Whitewash Wikipedia
Pages. Huffpost. Available at: https://www.huffpost.com/entry/wikipedia-paid-
editing-pr-facebook-nbc-axios_n_5c63321be4b03de942967225 (last accessed May
10, 2021).
Feinberg, A. 2019b. Pete Buttigieg's Campaign Says This Wikipedia User Is Not Pete. So Who Is It? Slate. Available at: https://slate.com/news-and-politics/2019/12/pete-buttigieg-wikipedia-page-editor.html (last accessed May 10, 2021).
Fitzpatrick, A., L. Eadicicco, and M. Peckham. 2017. The 15 Most Influential Websites of
All Time. Time Magazine. Available at: https://time.com/4960202/most-
influential-websites/ (last accessed May 10, 2021).
Flanagin, J., & M. Metzger. 2011. From Encyclopedia Britannica to Wikipedia. Information, Communication & Society, 14:3, 355-374.
Giles, J. 2005. Internet encyclopedias go head to head. Nature, 438, 900-901.
Griffith, V. 2007. WikiScanner FAQ. Available at:
https://web.archive.org/web/20070830003427/http://virgil.gr/31.html (last
accessed May 10, 2021).
Hafner, K. 2007. Seeing Corporate Fingerprints in Wikipedia Edits. The New York Times
(August 19). p. A1. Available at:
https://www.nytimes.com/2007/08/19/technology/19wikipedia.html?ei=5124
&en=786d0a243046f262&ex=1345262400&partner=permalink&exprod=permalin
k&pagewanted=print (last accessed May 10, 2021).
Holman Rector, L. 2008. Comparison of Wikipedia and other encyclopedias for
accuracy, breadth, and depth in historical articles. Reference Services Review 36, no.
1: 7-22.
Jordan, W. 2014. British people trust Wikipedia more than the news. YouGov. Available
at: https://yougov.co.uk/topics/politics/articles-reports/2014/08/09/more-
british-people-trust-wikipedia-trust-news (last accessed May 10, 2021).
Jung, M., J. Naughton, A. Tahoun, & C. Wang. 2018. Do Firms Strategically Disseminate? Evidence from Corporate Use of Social Media. The Accounting Review, 93(4), 225-252.
Keller, J. 2011. Is Wikipedia a World Cultural Repository? The Atlantic (May 23, 2011).
Available at: https://www.theatlantic.com/technology/archive/2011/05/is-
wikipedia-a-world-cultural-repository/239274/ (last accessed May 10, 2021).
Kenber, B. and M. Ahmed. 2012. Oligarch's PR firm cleaned up his entry in Wikipedia.
The Times, 21 November, p.1.
Kumar, S., R. West, and J. Leskovec. 2016. Disinformation on the Web: Impact,
characteristics, and detection of Wikipedia hoaxes. Proceedings of the 25th
International Conference on World Wide Web. International World Wide Web
Conferences Steering Committee.
Lancaster, T. 2000. The incidental parameter problem since 1948. Journal of Econometrics, 95(2), 391-413. https://doi.org/10.1016/S0304-4076(99)00044-5
Lee, M. R. 2021. Griffith NoKo Crypto Trial Delay to Sept 13 Sought by US Also
Pushing Back CIPA Motion. Inner City Press. Available at:
http://www.innercitypress.com/sdnysealed30nkgriffithcyptoicp040921.html
(last accessed May 10, 2021).
Lehmann, E. 2006. Rewriting history under the dome. Lowell Sun (January 27). Available
at: https://www.lowellsun.com/2006/01/27/rewriting-history-under-the-
dome-2/ (last accessed May 10, 2021).
Loughran, T. and B. McDonald. 2011. When is a Liability not a Liability? Textual Analysis, Dictionaries, and 10-Ks. The Journal of Finance, 66(1), 35-65. https://doi.org/10.1111/j.1540-6261.2010.01625.x
McHenry, R. 2004. The Faith-Based Encyclopedia. TCS Daily (November 15). Available at: https://web.archive.org/web/20060719003313/http://www.tcsdaily.com:80/article.aspx?id=111504A (last accessed May 10, 2021).
Moat, H., C. Curme, A. Avakian, D. Kenett, H. Stanley, & T. Preis. 2013. Quantifying Wikipedia Usage Patterns Before Stock Market Moves. Scientific Reports, 3(1), 1801. https://doi.org/10.1038/srep01801
Mothe, J., & G. Sahut. 2018. How trust in Wikipedia evolves: a survey of students aged 11 to 25. University of Borås, Sweden. Vol. 23, No. 1 (March 2018). Available at: https://files.eric.ed.gov/fulltext/EJ1174243.pdf (last accessed May 10, 2021).
Pasternack, A. 2020. How Wikipedia's volunteers became the web's best weapon against misinformation. Fast Company (March 7). Available at: https://www.fastcompany.com/90471667/how-wikipedia-volunteers-became-the-webs-best-weapon-against-misinformation (last accessed May 10, 2021).
Rubin, A., & E. Rubin. 2010. Informed investors and the internet. Journal of Business Finance & Accounting, 37(7-8), 841-865.
Schroeder, R., & L. Taylor. 2015. Big data and Wikipedia research: social science knowledge across disciplinary divides. Information, Communication & Society, 18(9), 1039-1056.
Shi, F., M. Teplitskiy, E. Duede, & J. Evans. 2019. The wisdom of polarized crowds. Nature Human Behaviour, 3(4), 329-336. https://doi.org/10.1038/s41562-019-0541-6
Sterling, G. 2012. Comparing comScore's Top 50 Sites: 2006 And 2012. CMO Zone. Available at: https://marketingland.com/comparing-comscores-top-50-sites-2006-and-2012-29269 (last accessed May 10, 2021).
Wikimedia Statistics. 2021. Monthly Overview. Available at:
https://stats.wikimedia.org/#/all-projects (last accessed May 10, 2021).
Wikipedia, 2021a. Wikipedia: The Free Encyclopedia. Available at:
https://en.wikipedia.org/wiki/Wikipedia:The_Free_Encyclopedia.
(last accessed May 10, 2021).
Wikipedia, 2021b. Wikipedia: Conflict of Interest. Available at:
https://en.wikipedia.org/wiki/Wikipedia:Conflict_of_interest
(last accessed May 10, 2021).
Wikipedia, 2021c. Wikipedia: How to not get outed on Wikipedia. Available at:
https://en.wikipedia.org/wiki/Wikipedia:How_to_not_get_outed_on_Wikipedi
a (last accessed May 10, 2021).
Wikipedia, 2021d. Wikipedia: Size of Wikipedia. Available at:
https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia
(last accessed May 10, 2021).
Wikipedia, 2021e. Wikipedia. Available at: https://en.wikipedia.org/wiki/Wikipedia
(last accessed May 10, 2021).
Wikipedia, 2021f. Wikipedia: Neutral point of view. Available at:
https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view
(last accessed May 10, 2021).
Wikipedia, 2021g. WikiScanner. Available at:
https://en.wikipedia.org/wiki/WikiScanner
(last accessed May 10, 2021).
Wikipedia, 2021h. Vandalism on Wikipedia. Available at:
https://en.wikipedia.org/wiki/Vandalism_on_Wikipedia
(last accessed May 10, 2021).
Xu, S. X., & X. M. Zhang. 2013. Impact of Wikipedia on Market Information Environment: Evidence on Management Disclosure and Investor Reaction. MIS Quarterly, 37(4), 1043-1068.
Zhu, C. 2019. Investor Demand for Contextual Information: Evidence from Wikipedia.
Working paper as of February 2, 2019.
Appendix A: User deletes backdating section
Below is a screenshot in which the user Adderz91 removed a section of text from Apple Inc.'s Wikipedia page called "Stock option backdating investigation."
[Screenshot callout: user deleted the section of the page called "Stock option backdating investigation."]
Appendix B: User accuses another user of biased incentives
Below is a screenshot in which the user ThurnerRupert accuses another user (Elpablo69) of bias when removing an entire section of text titled "Criticism" on Costco's Wikipedia page. In the comments, ThurnerRupert says Elpablo69 "only edits costco, conflict of interest by Elpablo69."
[Screenshot callout: ThurnerRupert complains that Elpablo69 edited Costco's page.]
Appendix C: Excerpts from lists of tone words
In this study, edited words were classified by tone using the Loughran and McDonald (2011) dictionaries. Below are excerpts from these word lists.
Negative tone words (excerpt: 170 of 2,355 total words):
ABANDON ABSENCES ADVERSARIAL ANNOYED AVERSELY
ABANDONED ABSENTEEISM ADVERSARIES ANNOYING BACKDATING
ABANDONING ABUSE ADVERSARY ANNOYS BAD
ABANDONMENT ABUSED ADVERSE ANNUL BAIL
ABANDONMENTS ABUSES ADVERSELY ANNULLED BAILOUT
ABANDONS ABUSING ADVERSITIES ANNULLING BALK
ABDICATED ABUSIVE ADVERSITY ANNULMENT BALKED
ABDICATES ABUSIVELY AFTERMATH ANNULMENTS BANKRUPT
ABDICATING ABUSIVENESS AFTERMATHS ANNULS BANKRUPTCIES
ABDICATION ACCIDENT AGAINST ANOMALIES BANKRUPTCY
ABDICATIONS ACCIDENTAL AGGRAVATE ANOMALOUS BANKRUPTED
ABERRANT ACCIDENTALLY AGGRAVATED ANOMALOUSLY BANKRUPTING
ABERRATION ACCIDENTS AGGRAVATES ANOMALY BANKRUPTS
ABERRATIONAL ACCUSATION AGGRAVATING ANTICOMPETITIVE BANS
ABERRATIONS ACCUSATIONS AGGRAVATION ANTITRUST BARRED
ABETTING ACCUSE AGGRAVATIONS ARGUE BARRIER
ABNORMAL ACCUSED ALERTED ARGUED BARRIERS
ABNORMALITIES ACCUSES ALERTING ARGUING BOTTLENECK
ABNORMALITY ACCUSING ALIENATE ARGUMENT BOTTLENECKS
ABNORMALLY ACQUIESCE ALIENATED ARGUMENTATIVE BOYCOTT
ABOLISH ACQUIESCED ALIENATES ARGUMENTS BOYCOTTED
ABOLISHED ACQUIESCES ALIENATING ARREARAGE BOYCOTTING
ABOLISHES ACQUIESCING ALIENATION ARREARAGES BOYCOTTS
ABOLISHING ACQUIT ALIENATIONS ARREARS BREACH
ABROGATE ACQUITS ALLEGATION ARREST BREACHED
ABROGATED ACQUITTAL ALLEGATIONS ARRESTED BREACHES
ABROGATES ACQUITTALS ALLEGE ARRESTS BREACHING
ABROGATING ACQUITTED ALLEGED ARTIFICIALLY BREAK
ABROGATION ACQUITTING ALLEGEDLY ASSAULT BREAKAGE
ABROGATIONS ADULTERATE ALLEGES ASSAULTED BREAKAGES
ABRUPT ADULTERATED ALLEGING ASSAULTING BREAKDOWN
ABRUPTLY ADULTERATING ANNOY ASSAULTS BREAKDOWNS
ABRUPTNESS ADULTERATION ANNOYANCE ASSERTIONS BREAKING
ABSENCE ADULTERATIONS ANNOYANCES ATTRITION BREAKS
Positive tone words (excerpt: 85 of 354 total words):
ABLE ADVANCEMENT ATTAINING BOLSTERED COLLABORATIVE
ABUNDANCE ADVANCEMENTS ATTAINMENT BOLSTERING COLLABORATOR
ABUNDANT ADVANCES ATTAINMENTS BOLSTERS COLLABORATORS
ACCLAIMED ADVANCING ATTAINS BOOM COMPLIMENT
ACCOMPLISH ADVANTAGE ATTRACTIVE BOOMING COMPLIMENTARY
ACCOMPLISHED ADVANTAGED ATTRACTIVENESS BOOST COMPLIMENTED
ACCOMPLISHES ADVANTAGEOUS BEAUTIFUL BOOSTED COMPLIMENTING
ACCOMPLISHING ADVANTAGEOUSLY BEAUTIFULLY BREAKTHROUGH COMPLIMENTS
ACCOMPLISHMENT ADVANTAGES BENEFICIAL BREAKTHROUGHS CONCLUSIVE
ACCOMPLISHMENTS ALLIANCE BENEFICIALLY BRILLIANT CONCLUSIVELY
ACHIEVE ALLIANCES BENEFIT CHARITABLE CONDUCIVE
ACHIEVED ASSURE BENEFITED COLLABORATE CONFIDENT
ACHIEVEMENT ASSURED BENEFITING COLLABORATED CONSTRUCTIVE
ACHIEVEMENTS ASSURES BENEFITTED COLLABORATES CONSTRUCTIVELY
ACHIEVES ASSURING BENEFITTING COLLABORATING COURTEOUS
ACHIEVING ATTAIN BEST COLLABORATION CREATIVE
ADEQUATELY ATTAINED BETTER COLLABORATIONS CREATIVELY
Uncertainty tone words (excerpt: 85 of 297 total words):
ABEYANCE ANTICIPATIONS ASSUME CONDITIONAL DEPENDENCIES
ABEYANCES APPARENT ASSUMED CONDITIONALLY DEPENDENCY
ALMOST APPARENTLY ASSUMES CONFUSES DEPENDENT
ALTERATION APPEAR ASSUMING CONFUSING DEPENDING
ALTERATIONS APPEARED ASSUMPTION CONFUSINGLY DEPENDS
AMBIGUITIES APPEARING ASSUMPTIONS CONFUSION DESTABILIZING
AMBIGUITY APPEARS BELIEVE CONTINGENCIES DEVIATE
AMBIGUOUS APPROXIMATE BELIEVED CONTINGENCY DEVIATED
ANOMALIES APPROXIMATED BELIEVES CONTINGENT DEVIATES
ANOMALOUS APPROXIMATELY BELIEVING CONTINGENTLY DEVIATING
ANOMALOUSLY APPROXIMATES CAUTIOUS CONTINGENTS DEVIATION
ANOMALY APPROXIMATING CAUTIOUSLY COULD DEVIATIONS
ANTICIPATE APPROXIMATION CAUTIOUSNESS CROSSROAD DIFFER
ANTICIPATED APPROXIMATIONS CLARIFICATION CROSSROADS DIFFERED
ANTICIPATES ARBITRARILY CLARIFICATIONS DEPEND DIFFERING
ANTICIPATING ARBITRARINESS CONCEIVABLE DEPENDED DIFFERS
ANTICIPATION ARBITRARY CONCEIVABLY DEPENDENCE DOUBT
Litigious tone words (excerpt: 85 of 297 total words):
ABOVEMENTIONED ACQUITTAL ADJUDICATE AFFIRMANCE AMENDED
ABROGATE ACQUITTALS ADJUDICATED AFFREIGHTMENT AMENDING
ABROGATED ACQUITTANCE ADJUDICATES AFOREDESCRIBED AMENDMENT
ABROGATES ACQUITTANCES ADJUDICATING AFOREMENTIONED AMENDMENTS
ABROGATING ACQUITTED ADJUDICATION AFORESAID AMENDS
ABROGATION ACQUITTING ADJUDICATIONS AFORESTATED ANTECEDENT
ABROGATIONS ADDENDUMS ADJUDICATIVE AGGRIEVED ANTECEDENTS
ABSOLVE ADJOURN ADJUDICATOR ALLEGATION ANTICORRUPTION
ABSOLVED ADJOURNED ADJUDICATORS ALLEGATIONS ANTITRUST
ABSOLVES ADJOURNING ADJUDICATORY ALLEGE ANYWISE
ABSOLVING ADJOURNMENT ADMISSIBILITY ALLEGED APPEAL
ACCESSION ADJOURNMENTS ADMISSIBLE ALLEGEDLY APPEALABLE
ACCESSIONS ADJOURNS ADMISSIBLY ALLEGES APPEALED
ACQUIREES ADJUDGE ADMISSION ALLEGING APPEALING
ACQUIRORS ADJUDGED ADMISSIONS AMEND APPEALS
ACQUIT ADJUDGES AFFIDAVIT AMENDABLE APPELLANT
ACQUITS ADJUDGING AFFIDAVITS AMENDATORY APPELLANTS
Strong tone words (excerpt: 19 of 19 total words):
ALWAYS DEFINITIVELY NEVER UNDISPUTED UNPARALLELED
BEST HIGHEST STRONGLY UNDOUBTEDLY UNSURPASSED
CLEARLY LOWEST UNAMBIGUOUSLY UNEQUIVOCAL WILL
DEFINITELY MUST UNCOMPROMISING UNEQUIVOCALLY
Weak tone words (excerpt: 27 of 27 total words):
ALMOST COULD MAYBE POSSIBLY SUGGESTS
APPARENTLY DEPEND MIGHT SELDOM UNCERTAIN
APPEARED DEPENDED NEARLY SELDOMLY UNCERTAINLY
APPEARING DEPENDING OCCASIONALLY SOMETIMES
APPEARS DEPENDS PERHAPS SOMEWHAT
CONCEIVABLE MAY POSSIBLE SUGGEST
Constraining tone word list (excerpt: 85 of 184 total words):
ABIDE CONFINE DEPENDANCES ENCUMBERED FORBIDDING
ABIDING CONFINED DEPENDANT ENCUMBERING FORBIDS
BOUND CONFINEMENT DEPENDENCIES ENCUMBERS IMPAIR
BOUNDED CONFINES DEPENDENT ENCUMBRANCE IMPAIRED
COMMIT CONFINING DEPENDING ENCUMBRANCES IMPAIRING
COMMITMENT CONSTRAIN DEPENDS ENTAIL IMPAIRMENT
COMMITMENTS CONSTRAINED DICTATE ENTAILED IMPAIRMENTS
COMMITS CONSTRAINING DICTATED ENTAILING IMPAIRS
COMMITTED CONSTRAINS DICTATES ENTAILS IMPOSE
COMMITTING CONSTRAINT DICTATING ENTRENCH IMPOSED
COMPEL CONSTRAINTS DIRECTIVE ENTRENCHED IMPOSES
COMPELLED COVENANT DIRECTIVES ESCROW IMPOSING
COMPELLING COVENANTED EARMARK ESCROWED IMPOSITION
COMPELS COVENANTING EARMARKED ESCROWS IMPOSITIONS
COMPLY COVENANTS EARMARKING FORBADE INDEBTED
COMPULSION DEPEND EARMARKS FORBID INHIBIT
COMPULSORY DEPENDANCE ENCUMBER FORBIDDEN INHIBITED
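To illustrate how these lists are applied, below is a minimal sketch of checking a revision's deleted words against the negative list (lm_negative and deleted_words are illustrative names, and only a tiny excerpt of the dictionary is shown):

```python
# Minimal sketch: DEL_NEG-style check of deleted words against the
# Loughran-McDonald negative word list.
def deletes_negative_word(deleted_words, lm_negative):
    # 1 if one or more deleted words appear in the negative list
    return int(any(w.upper() in lm_negative for w in deleted_words))

lm_negative = {"ABANDON", "ADVERSE", "BANKRUPTCY"}  # tiny demo excerpt
print(deletes_negative_word(["the", "adverse", "ruling"], lm_negative))  # 1
```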
Appendix D: Variable definitions
Variable name Definition
Dependent Variables
DEL_NEG = An indicator variable that equals 1 if a revision resulted in one or more negative words being deleted on the firm's page, and 0 otherwise. Specifically, I use the Loughran & McDonald (2011) dictionary to classify the negative words deleted in each revision.
USER_IP =
An indicator variable that equals 1 if a user has a visible IP
address in the Wikipedia archives, and 0 otherwise.
Explanatory Variables of Interest
FOCUS = An indicator variable that equals 1 if a user is classified as a
firm-focused user for a firm. This value is 1 if the user makes
more than 65% of all their Wikipedia revisions on the page of
a single firm and the user has made at least 3 revisions on any
Wikipedia page, and 0 otherwise.
FOCUS_WEAK = An indicator variable that equals 1 if a user is classified as a
firm-focused user for a firm. This value is 1 if a user makes
more than 50% of all their Wikipedia revisions on the page of
a single firm and the user has made at least 3 revisions on any
Wikipedia page, and 0 otherwise.
FOCUS_STRONG = An indicator variable that equals 1 if a user is classified as a
firm-focused user for a firm. This value is 1 if a user makes
more than 80% of all their Wikipedia revisions on the page of
a single firm and the user has made at least 3 revisions on any
Wikipedia page, and 0 otherwise.
FOCUS_LOW = An indicator variable that equals 1 if a user is classified as a
firm-focused user for a firm. This value is 1 if a user makes
more than 65% of all their Wikipedia revisions on the page of
a single firm, and 0 otherwise.
FOCUS_HIGH = An indicator variable that equals 1 if a user is classified as a
firm-focused user for a firm. This value is 1 if a user makes
more than 65% of all their Wikipedia revisions on the page of
a single firm and the user has made at least 5 revisions on any
Wikipedia page, and 0 otherwise.
55
SHOCK =
An indicator variable that equals 1 if a revision was made on
or after the WikiScanner shock on August 13, 2007, and 0
otherwise.
MISSTATE =
An indicator variable that equals 1 if a firm is in a
misstatement period, and 0 otherwise. A misstatement period
is defined as the period during which the financial statements
of a firm are later discovered to have been materially
misstated. Non-reliance restatements were collected from
Audit Analytics.
Control Variables
USER_WIKI_EDITS = The natural logarithm of 1 plus the number of revisions a user
makes on any Wikipedia page.
FIRM_SIZE = The natural logarithm of 1 plus the size of the quarterly total
asset account (i.e., atq) from Compustat. This value represents
the current view of a company as reported in the latest
quarterly report to the SEC (Bogue and Bailey 2001) (i.e.,
these quarterly values have been adjusted by Compustat for
prior material misstatements that have been restated).
FIRM_PROFIT = The quarterly reported net income amount (i.e., niq) divided
by the size of the quarterly reported total asset account (i.e.,
atq) from Compustat. This value represents the current view
of a company as reported in the latest quarterly report to the
SEC (Bogue and Bailey 2001) (i.e., these quarterly values
have been adjusted by Compustat for prior material
misstatements that have been restated).
FIRM_CF = The natural logarithm of 1 plus the year-to-date reported value of Operating Activities - Net Cash Flow (i.e., oancfy) from Compustat. Because this value is reported annually, it is my understanding that this value has not been adjusted by Compustat for prior material misstatements that have been restated.
EDIT_AF_AUG_13 = An indicator variable that equals 1 if a revision was made after August 13th of any year in the sample, and 0 otherwise.
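For concreteness, the logarithmic and ratio transformations above might be coded as in this sketch (the DataFrames users and fund, and the column total_edits, are illustrative assumptions; atq, niq, and oancfy are the Compustat items named above):

```python
# Minimal sketch of the Appendix D variable transformations.
import numpy as np

users["USER_WIKI_EDITS"] = np.log1p(users["total_edits"])  # ln(1 + edits)
fund["FIRM_SIZE"] = np.log1p(fund["atq"])
fund["FIRM_PROFIT"] = fund["niq"] / fund["atq"]
fund["FIRM_CF"] = np.log1p(fund["oancfy"])
```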
Figure 1: Histogram of users
Note: Figure 1 reports a histogram in which the vertical axis measures the percentage of users, and the horizontal axis measures the ratio of a user's firm page edits to the user's edits on all Wikipedia pages. The sample consists of all edits to Wikipedia pages from 2001 to 2019 of firms listed in the Russell 3000 index with available COMPUSTAT financial information (i.e., 697,734 revision observations).
Figure 2: Editing across the sample period
Note: In Figure 2, the vertical axis measures the percentage of edits that are made by
firm-focused users (FOCUS = 1) and the percentage of edits in which a negative word is
removed (DEL_NEG = 1). The sample consists of all edits to Wikipedia pages from 2001
to 2019 of firms listed in the Russell 3000 index with available COMPUSTAT financial
information (i.e., 697,734 revision observations).
Figure 3: Editing among business sectors
Note: In Figure 3, the vertical axis measures the percentage of edits that are made by
firm-focused users (FOCUS = 1) and the percentage of edits in which a negative word is
removed (DEL_NEG = 1). The sample consists of all edits to Wikipedia pages from 2001
to 2019 of firms listed in the Russell 3000 index with available COMPUSTAT financial
information (i.e., 697,734 revision observations).
Table 1: The sample selection
Panel A: Sample derivation

Description of Wikipedia revisions in sample            (Deleted)   No. of revisions   No. of firms   No. of users
Number of revisions to Wikipedia pages from
2001-2019 of firms listed in the Russell 3000,
as of March 27, 2019                                                925,410            1,849          251,647
(a) Keep revisions unrelated to vandalism and
    the remediation of vandalism                        (11,459)    913,951            1,849          248,850
(b) Keep revisions in which COMPUSTAT financial
    information is available for the firm-quarter       (216,217)   697,734            1,726          200,258
Final sample used in tests of predictions                           697,734            1,726          200,258

Table 1 reports the sample of revisions used in testing predictions. The sample consists of all revisions to Wikipedia pages from 2001 to 2019 of firms listed in the Russell 3000 index with available COMPUSTAT financial information (i.e., 697,734 revision observations). Panel A describes how the sample is derived.
Table 2: Descriptive statistics
Panel A: Descriptive statistics for variables used in Equations (1) through (5)
                 N        Mean   S.D.   1%      25%    50%    75%    99%
FOCUS            697,734  0.096  0.294  0       0      0      0      1
DEL_NEG          697,734  0.052  0.222  0       0      0      0      1
USER_IP          697,734  0.335  0.472  0       0      0      1      1
SHOCK            697,734  0.777  0.416  0       1      1      1      1
MISSTATE         697,734  0.090  0.287  0       0      0      0      1
USER_WIKI_EDITS  697,734  4.85   2.317  0.693   2.565  6.064  6.909  6.909
FIRM_SIZE        697,734  9.28   2.021  4.913   7.821  9.348  10.7   14.31
FIRM_PROFIT      697,734  0.018  0.032  -0.063  0.005  0.017  0.030  0.088
FIRM_CF          697,734  6.346  2.124  1.212   4.874  6.436  7.932  10.57
EDIT_AF_AUG_13   697,734  0.381  0.486  0       0      0      1      1
Panel B: Pairwise correlations among variables used in Equations (1) through (5)
(1) (2) (3) (4) (5) (6) (7) (8) (9)
(1) FOCUS 1.00
(2) DEL_NEG 0.02 1.00
(3) USER_IP 0.09 0.00 1.00
(4) SHOCK 0.01 -0.02 -0.08 1.00
(5) MISSTATE 0.01 0.00 0.00 -0.02 1.00
(6) USER_WIKI_EDITS -0.33 0.00 -0.58 0.01 0.00 1.00
(7) FIRM_SIZE -0.07 0.01 0.02 -0.04 -0.07 0.03 1.00
(8) FIRM_PROFIT 0.01 0.01 0.00 -0.07 -0.02 -0.01 -0.08 1.00
(9) FIRM_CF -0.07 0.01 0.02 -0.05 -0.08 0.02 0.86 0.07 1.00
(10) EDIT_AF_AUG_13 0.00 0.00 0.00 0.06 -0.01 0.01 0.00 0.01 0.02
Table 2 reports descriptive statistics for variables used in the regression analyses. Variable definitions are
in Appendix D. The sample consists of all revisions to Wikipedia pages from 2001 to 2019 of firms listed
in the Russell 3000 index with available COMPUSTAT financial information (i.e., 697,734 revision
observations). Panel A provides descriptive statistics for variables used in Equations (1) through (5). Panel
B provides pairwise correlation coefficients and signifies statistically significant coefficients at the 1%
significance level in bold font.
Table 3: Univariate tests of prediction
Panel A: Prediction H1 - Likelihood revisions delete negative words (DEL_NEG)
                                             FOCUS = 1   FOCUS = 0   z-stat.
DEL_NEG                                      .0673       .0505       8.38***

Panel B: Prediction H2a - Likelihood revisions delete negative words (DEL_NEG)
                                             FOCUS = 1   FOCUS = 0   z-stat.
Pre-Shock period (SHOCK = 0)                 .0974       .0559       7.84***
Post-Shock period (SHOCK = 1)                .0594       .0489       5.20***
z-stat.                                      -6.18***    -3.13***
Difference-in-differences test: [.0974 - .0594] - [.0559 - .0489] = 0.031    4.68***

Panel C: Prediction H2b - Likelihood users edit with a visible IP address (USER_IP)
                                             FOCUS = 1   FOCUS = 0   z-stat.
Pre-Shock period (SHOCK = 0)                 .6190       .3855       7.28***
Post-Shock period (SHOCK = 1)                .4236       .3023       14.39***
z-stat.                                      -6.16***    -11.41***
Difference-in-differences test: [.6190 - .4236] - [.3855 - .3023] = 0.1122   3.13***

Panel D: Prediction H3a - Likelihood revisions are in a misstatement period (MISSTATE)
                                             FOCUS = 1   FOCUS = 0   z-stat.
MISSTATE                                     .1005       .0892       2.15**

Panel E: Prediction H3b - Likelihood revisions delete negative words (DEL_NEG)
                                             FOCUS = 1   FOCUS = 0   z-stat.
Period without misstatement (MISSTATE = 0)   .0667       .0507       7.68***
Period with misstatement (MISSTATE = 1)      .0730       .0476       3.53***
z-stat.                                      0.71        -1.55
Difference-in-differences test: [.0667 - .0730] - [.0507 - .0476] = -0.0094  1.22

Table 3 presents the results of univariate tests (logit regression models are used to compare means in each group). Panels A, B, and E report the likelihood that revisions delete negative words. Panel C reports the likelihood that a user edits with an IP address, and Panel D reports the likelihood that a revision is made during a firm's misstatement period. The sample consists of all revisions to Wikipedia pages from 2001 to 2019 of firms listed in the Russell 3000 index with available COMPUSTAT financial information (i.e., 697,734 revision observations). Variable definitions are in Appendix D. SHOCK is coded as 1 if a revision was made on or after the introduction of the WikiScanner tool on August 13, 2007, and 0 otherwise. Two-tailed z-statistics are presented in the last column and calculated with standard errors adjusted for clustering at the Firm-Year level. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.
Table 4: Multivariate tests of H1
Regression investigating whether firm-focused users are more likely to delete negative words
                       (1)          (2)          (3)
VARIABLES              DEL_NEG      DEL_NEG      DEL_NEG
FOCUS                  0.0102***    0.0139***    0.0139***
                       (4.559)      (6.044)      (6.041)
USER_IP                             -0.000272    -0.000282
                                    (-0.221)     (-0.228)
USER_WIKI_EDITS                     0.00138***   0.00138***
                                    (5.798)      (5.790)
FIRM_SIZE                                        0.0104**
                                                 (2.117)
FIRM_PROFIT                                      -0.0302*
                                                 (-1.719)
FIRM_CF                                          0.000364
                                                 (0.603)
Constant               0.0511***    0.0442***    -0.0539
                       (238.1)      (31.19)      (-1.186)
Firm-Year FE           Yes          Yes          Yes
Number of Firm-Years   17,757       17,757       17,757
Observations           697,734      697,734      697,734
R-squared              0.000        0.000        0.000
Table 4 presents the results of linear probability models from eq. (1). The sample
consists of all revisions to Wikipedia pages from 2001 to 2019 of firms listed in
the Russell 3000 index with available COMPUSTAT financial information (i.e.,
697,734 revision observations). Variable definitions are in Appendix D. Two-
tailed t-statistics are presented in parentheses and calculated with standard errors
adjusted for clustering at the Firm-Year level. ***, **, and * indicate significance
at the 1%, 5%, and 10% levels, respectively. Please refer to Table 9 for robustness
testing of this model, using a logit regression with Firm fixed effects and Year
fixed effects.
Table 5: Multivariate tests of H2a
Regression investigating whether the positive association between firm-focused users and the
deletion of negative words is weaker following the WikiScanner shock
                       (1)          (2)          (3)
VARIABLES              DEL_NEG      DEL_NEG      DEL_NEG
FOCUS                  0.0311***    0.0342***    0.0342***
                       (5.110)      (5.355)      (5.346)
SHOCK                  0.0115***    0.0114***    0.0101***
                       (3.168)      (3.103)      (2.677)
FOCUS × SHOCK          -0.0274***   -0.0269***   -0.0269***
                       (-4.201)     (-4.062)     (-4.052)
USER_IP                             -0.000609    -0.000617
                                    (-0.494)     (-0.500)
USER_WIKI_EDITS                     0.00130***   0.00130***
                                    (5.430)      (5.426)
FIRM_SIZE                                        0.00659
                                                 (1.290)
FIRM_PROFIT                                      -0.0277
                                                 (-1.547)
FIRM_CF                                          0.000220
                                                 (0.357)
EDIT_AF_AUG_13                                   0.000705
                                                 (0.763)
Constant               0.0422***    0.0359***    -0.0254
                       (14.70)      (11.01)      (-0.541)
Firm-Year FE           Yes          Yes          Yes
Number of Firm-Years   17,757       17,757       17,757
Observations           697,734      697,734      697,734
R-squared              0.000        0.001        0.001
Table 5 presents the results of linear probability models from eq. (2). The
sample consists of all revisions to Wikipedia pages from 2001 to 2019 of firms
listed in the Russell 3000 index with available COMPUSTAT financial
information (i.e., 697,734 revision observations). Variable definitions are in
Appendix D. SHOCK is coded as 1 if a revision was made on or after the
introduction of the WikiScanner tool on August 13, 2007, and 0 otherwise. Two-
tailed t-statistics are presented in parentheses and calculated with standard errors
adjusted for clustering at the Firm-Year level. ***, **, and * indicate
significance at the 1%, 5%, and 10% levels, respectively. Please refer to Table
9 for robustness testing of this model, using a logit regression with Firm fixed
effects and Year fixed effects.
Table 6: Multivariate tests of H2b
Regression investigating whether firm-focused users are less likely to edit with
visible IP addresses after the WikiScanner shock
                       (1)          (2)          (3)
VARIABLES              USER_IP      USER_IP      USER_IP
FOCUS                  0.231***     -0.0520***   -0.0520***
                       (7.100)      (-3.230)     (-3.229)
SHOCK                  -0.0155      0.000659     -8.40e-05
                       (-1.123)     (0.0706)     (-0.00871)
FOCUS × SHOCK          -0.0819**    -0.133***    -0.133***
                       (-2.460)     (-7.586)     (-7.571)
USER_WIKI_EDITS                     -0.123***    -0.123***
                                    (-187.3)     (-187.2)
FIRM_SIZE                                        0.0123
                                                 (0.874)
FIRM_PROFIT                                      -0.0609
                                                 (-1.303)
FIRM_CF                                          0.00174
                                                 (1.092)
EDIT_AF_AUG_13                                   -0.000682
                                                 (-0.284)
Constant               0.331***     0.944***     0.821***
                       (30.35)      (125.9)      (6.323)
Firm-Year FE           Yes          Yes          Yes
Number of Firm-Years   17,757       17,757       17,757
Observations           697,734      697,734      697,734
R-squared              0.011        0.346        0.346
Table 6 presents the results of linear probability models for eq. (3). The sample
consists of all revisions to Wikipedia pages from 2001 to 2019 of firms listed in
the Russell 3000 index with available COMPUSTAT financial information (i.e.,
697,734 revision observations). Variable definitions are in Appendix D. SHOCK
is coded as 1 if a revision was made on or after the introduction of the WikiScanner
tool on August 13, 2007, and 0 otherwise. Two-tailed t-statistics are presented in
parentheses and calculated with standard errors adjusted for clustering at the Firm-
Year level. ***, **, and * indicate significance at the 1%, 5%, and 10% levels,
respectively. Please refer to Table 9 for robustness testing of this model, using a
logit regression with Firm fixed effects and Year fixed effects.
Table 7: Multivariate tests of H3a & H3b
Regressions investigating whether firm-focused users are more likely to
revise (and remove negative words)
during a period in which the firm is misstating its financial statements
Panel A: Association between firms' misstatement periods and editing by firm-focused users

                       (1)          (2)          (3)
VARIABLES              FOCUS        FOCUS        FOCUS
MISSTATE               0.0213***    0.0203***    0.0207***
                       (2.665)      (2.795)      (2.864)
USER_IP                             -0.0738***   -0.0738***
                                    (-21.93)     (-21.92)
USER_WIKI_EDITS                     -0.0462***   -0.0462***
                                    (-70.01)     (-70.04)
FIRM_SIZE                                        0.0126
                                                 (1.087)
FIRM_PROFIT                                      0.00621
                                                 (0.135)
FIRM_CF                                          -0.000522
                                                 (-0.398)
Constant               0.0939***    0.343***     0.229**
                       (130.1)      (87.52)      (2.118)
Firm-Year FE           Yes          Yes          Yes
Number of Firm-Years   17,757       17,757       17,757
Observations           697,734      697,734      697,734
R-squared              0.000        0.109        0.109

Panel B: Differential effects of misstatement periods on firm-focused users deleting negative words when editing their firm's Wikipedia page

                       (1)          (2)          (3)
VARIABLES              DEL_NEG      DEL_NEG      DEL_NEG
FOCUS                  0.0103***    0.0140***    0.0140***
                       (4.352)      (5.769)      (5.767)
MISSTATE               -0.00600*    -0.00609*    -0.00543
                       (-1.735)     (-1.754)     (-1.555)
FOCUS × MISSTATE       -0.00120     -0.00117     -0.00119
                       (-0.164)     (-0.160)     (-0.162)
USER_IP                             -0.000279    -0.000287
                                    (-0.226)     (-0.233)
USER_WIKI_EDITS                     0.00138***   0.00138***
                                    (5.796)      (5.788)
FIRM_SIZE                                        0.00995**
                                                 (2.024)
FIRM_PROFIT                                      -0.0293*
                                                 (-1.671)
FIRM_CF                                          0.000355
                                                 (0.588)
Constant               0.0516***    0.0447***    -0.0494
                       (137.6)      (30.51)      (-1.083)
Firm-Year FE           Yes          Yes          Yes
Number of Firm-Years   17,757       17,757       17,757
Observations           697,734      697,734      697,734
R-squared              0.000        0.000        0.000
Table 7 presents the results of linear probability models from Equations (4) and (5). The
sample consists of all revisions to Wikipedia pages from 2001 to 2019 of firms listed in
the Russell 3000 index with available COMPUSTAT financial information (i.e., 697,734
revision observations). Variable definitions are in Appendix D. Two-tailed t-statistics
are presented in parentheses and calculated with standard errors adjusted for clustering
at the Firm-Year level. ***, **, and * indicate significance at the 1%, 5%, and 10%
levels, respectively. Please refer to Table 9 for robustness testing of these models, using
logit regressions with Firm fixed effects and Year fixed effects.
Table 8: Sensitivity analyses for the firm focus measure
Regressions investigating whether inferences are unaffected
by varying the thresholds used to code the FOCUS variable
Panel A: Sensitivity analyses for H1 in Table 4
                        (1)         (2)         (3)         (4)
                        DEL_NEG     DEL_NEG     DEL_NEG     DEL_NEG
FOCUS_WEAK              0.0128***
                        (5.97)
FOCUS_STRONG                        0.0154***
                                    (5.83)
FOCUS_LOW                                       0.0132***
                                                (7.88)
FOCUS_HIGH                                                  0.0150***
                                                            (4.67)
All control variables   Included    Included    Included    Included
Firm-Year FE            Yes         Yes         Yes         Yes
Panel B: Sensitivity analyses for H2a in Table 5
                        (1)          (2)          (3)          (4)
                        DEL_NEG      DEL_NEG      DEL_NEG      DEL_NEG
FOCUS_WEAK              0.0318***
                        (5.74)
FOCUS_WEAK × SHOCK      -0.0253***
                        (-4.36)
FOCUS_STRONG                         0.0357***
                                     (4.86)
FOCUS_STRONG × SHOCK                 -0.0263***
                                     (-3.40)
FOCUS_LOW                                         0.0253***
                                                  (7.50)
FOCUS_LOW × SHOCK                                 -0.0160***
                                                  (-4.72)
FOCUS_HIGH                                                     0.0375***
                                                               (3.36)
FOCUS_HIGH × SHOCK                                             -0.0298***
                                                               (-2.80)
All control variables   Included     Included     Included     Included
Firm-Year FE            Yes          Yes          Yes          Yes
Panel C: Sensitivity analyses for H2b in Table 6
                        (1)          (2)          (3)          (4)
                        USER_IP      USER_IP      USER_IP      USER_IP
FOCUS_WEAK              -0.0495***
                        (-3.52)
FOCUS_WEAK × SHOCK      -0.121***
                        (-7.72)
FOCUS_STRONG                         -0.0931***
                                     (-4.50)
FOCUS_STRONG × SHOCK                 -0.130***
                                     (-5.82)
FOCUS_LOW                                         -0.1374***
                                                  (-13.24)
FOCUS_LOW × SHOCK                                 -0.1206***
                                                  (-11.33)
FOCUS_HIGH                                                     -0.0444***
                                                               (-1.91)
FOCUS_HIGH × SHOCK                                             -0.1549***
                                                               (-6.12)
All control variables   Included     Included     Included     Included
Firm-Year FE            Yes          Yes          Yes          Yes
Panel D: Sensitivity analyses for H3a in Table 7
                        (1)           (2)            (3)          (4)
                        FOCUS_WEAK    FOCUS_STRONG   FOCUS_LOW    FOCUS_HIGH
MISSTATE                0.0222***     0.0176**       0.0137**     0.0139**
                        (2.95)        (2.38)         (2.06)       (2.06)
All control variables   Included      Included       Included     Included
Firm-Year FE            Yes           Yes            Yes          Yes
Panel E: Sensitivity analyses for H3b in Table 7
                          (1)          (2)          (3)          (4)
                          DEL_NEG      DEL_NEG      DEL_NEG      DEL_NEG
MISSTATE                  -0.00507     -0.00574     -0.00572     -0.00503
                          (-1.44)      (-1.63)      (-1.16)      (-1.45)
FOCUS_WEAK                0.0133***
                          (5.86)
FOCUS_WEAK × MISSTATE     -0.00450
                          (-0.66)
FOCUS_STRONG                           0.0151***
                                       (5.50)
FOCUS_STRONG × MISSTATE                0.00318
                                       (0.35)
FOCUS_LOW                                           0.0131***
                                                    (7.54)
FOCUS_LOW × MISSTATE                                0.00158
                                                    (0.35)
FOCUS_HIGH                                                       0.0158***
                                                                 (4.60)
FOCUS_HIGH × MISSTATE                                            -0.00699
                                                                 (-0.72)
All control variables     Included     Included     Included     Included
Firm-Year FE              Yes          Yes          Yes          Yes
Table 8 presents the results of sensitivity analyses for the firm focus measure using linear probability
models to estimate Equations (1) through (5). The sample consists of all revisions to Wikipedia pages from
2001 to 2019 of firms listed in the Russell 3000 index with available COMPUSTAT financial information
(i.e., 697,734 revision observations). Variable definitions are in Appendix D. FOCUS_WEAK
(FOCUS_STONG) is 1 if a user makes more than 50% (80%) of all their Wikipedia revisions on the page
of a single firm and the user has made at least 3 revisions on any Wikipedia page, and 0 otherwise.
FOCUS_LOW is 1 if a user makes more than 65% of all their Wikipedia revisions on the page of a single
firm, and 0 otherwise. FOCUS_HIGH is 1 if a user makes more than 65% of all their Wikipedia revisions
on the page of a single firm and the user has made at least 5 revisions on any Wikipedia page, and 0
otherwise. Two-tailed t-statistics are presented in parentheses and calculated with standard errors adjusted
for clustering at the Firm-Year level. ***, **, and * indicate significance at the 1%, 5%, and 10% levels,
respectively.
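As a concrete illustration of these thresholds, here is a hedged sketch of how the four alternative FOCUS codings could be computed from a per-revision edit history. The DataFrame `edits` and its columns (`user`, `page`, `is_firm_page`) are hypothetical names, not the author's.

```python
# A sketch of the Table 8 threshold codings, assuming a DataFrame `edits`
# with one row per revision and columns: user, page, is_firm_page.
import pandas as pd

total_revs = edits.groupby("user").size()            # all revisions by user

# Share of each user's total Wikipedia revisions made on the single firm
# page they edit most; 0 for users who never edit a firm page.
max_firm_share = (
    edits[edits["is_firm_page"]]
    .groupby(["user", "page"]).size()                # revisions per firm page
    .groupby(level="user").max()                     # most-edited firm page
    .div(total_revs)
    .reindex(total_revs.index, fill_value=0)
)

flags = pd.DataFrame({
    # >50% (>80%) on one firm page, with at least 3 revisions anywhere
    "FOCUS_WEAK":   (max_firm_share > 0.50) & (total_revs >= 3),
    "FOCUS_STRONG": (max_firm_share > 0.80) & (total_revs >= 3),
    # >65% on one firm page, with no minimum revision count
    "FOCUS_LOW":    (max_firm_share > 0.65),
    # >65% on one firm page and at least 5 revisions anywhere
    "FOCUS_HIGH":   (max_firm_share > 0.65) & (total_revs >= 5),
}).astype(int)
```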
Table 9: Alternative model selection
Regressions investigating whether inferences are unaffected
when using logit regression models in lieu of linear probability models
Panel A: Sensitivity analyses for H1 in Table 4, H2a in Table 5, and H2b in Table 6
(1) (2) (3)
VARIABLES DELNEG DELNEG USERIP
FOCUS 0.312*** 0.542*** -0.501***
(8.453) (6.879) (-5.706)
SHOCK 0.154*** -0.0232
(2.684) (-0.460)
FOCUS × SHOCK -0.323*** -0.473***
(-3.894) (-5.184)
All control variables Included Included Included
Firm FE Yes Yes Yes
Year FE Yes Yes Yes
Observations 687,528 687,528 692,603
Panel B: Sensitivity analyses for H3a and H3b in Table 7
(1) (2)
VARIABLES FOCUS DELNEG
FOCUS 0.312***
(8.921)
MISSTATE 0.143*** -0.0278
(2.696) (-0.882)
FOCUS × MISSTATE 0.00453
(0.0448)
All control variables Included Included
Firm FE Yes Yes
Year FE Yes Yes
Observations 679,281 687,493
Table 9 presents the results of logit regression models for Equations (1) through (5).
The sample consists of all revisions to Wikipedia pages from 2001 to 2019 of firms
listed in the Russell 3000 index with available COMPUSTAT financial information
(i.e., 697,734 revision observations). Variable definitions are in Appendix D. Two-
tailed z-statistics are presented in parentheses and calculated with standard errors
adjusted with bootstrapping. ***, **, and * indicate significance at the 1%, 5%, and
10% levels, respectively.
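A minimal sketch of the Table 9 design, again under the hypothetical `revisions` DataFrame used earlier: a logit with Firm and Year fixed effects whose standard errors come from a cluster bootstrap. The resampling scheme and number of draws here are illustrative assumptions, not a description of the paper's exact procedure.

```python
# A sketch, assuming `revisions` has DELNEG, FOCUS, firm, year, firm_year.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

formula = "DELNEG ~ FOCUS + C(firm) + C(year)"        # Firm FE and Year FE
fit = smf.logit(formula, data=revisions).fit(disp=0)  # high-dimensional FE:
                                                      # heavy in practice

# Bootstrap: resample Firm-Year clusters with replacement and refit.
rng = np.random.default_rng(0)
clusters = revisions["firm_year"].unique()
draws = []
for _ in range(200):                                  # draw count is illustrative
    picked = rng.choice(clusters, size=len(clusters), replace=True)
    boot = pd.concat(revisions[revisions["firm_year"] == c] for c in picked)
    draws.append(smf.logit(formula, data=boot).fit(disp=0).params["FOCUS"])

z_stat = fit.params["FOCUS"] / np.std(draws, ddof=1)  # bootstrapped z-statistic
```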
Data Appendix
Figure DA.1: Example view of a current Wikipedia page
[Screenshot: a current Wikipedia firm page, annotated to highlight the firm ticker symbol ("F") and the "Select to edit page" link.]
Figure DA.2: Summary of firms with Wikipedia pages
Panel A: Number of firms in Sample
                                                       (Less)     No. of firms
Russell 3000 firms listed in index (as of 3/26/2019)              2,925
Less: Firms without Wikipedia page                     (1,002)    1,923
Less: Wikipedia pages without firm ticker listed       (74)       1,849
Sample of firm pages                                              1,849
Panel B: Size, profitability, and cash flows of firms, as of year-end 2018
                               No. of firms   Size (lnTA)   Profitability (ROA)   Operating CF (lnCF)
Firms with Wikipedia page      1,849          8.30          .026                  5.83
Firms without Wikipedia page   1,076          6.69          -.081                 4.10
t-stat.                                       25.5***       11.8***               24.9***
The sample consists of 2,925 firms in the Russell 3000 index as of 3/26/2019, of which 1,849 firms have a
Wikipedia page with the firm's ticker symbol listed on the most current Wikipedia page in 2019. Panel A describes
the sample selection process for firms with a Wikipedia page as of 3/26/2019. Panel B describes fundamental
differences between firms with and without a Wikipedia page. T-statistics comparing the mean values are listed in
italics and *** denotes a significance level of 1%, based on two-tailed t-statistics.
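The t-statistics in Panel B are standard two-sample mean comparisons; a sketch under assumed names (a firm-level DataFrame `firms` with a `has_wiki_page` flag and the three fundamentals):

```python
# A sketch of Panel B's mean comparisons between firms with and without a
# Wikipedia page; DataFrame and column names are assumptions.
from scipy import stats

with_page = firms[firms["has_wiki_page"]]
without_page = firms[~firms["has_wiki_page"]]
for var in ["lnTA", "ROA", "lnCF"]:
    t, p = stats.ttest_ind(with_page[var], without_page[var], nan_policy="omit")
    print(f"{var}: t = {t:.1f} (p = {p:.4f})")
```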
Figure DA.3: Example view of an Editing screen
[Screenshot: the Wikipedia editing screen, annotated to highlight (i) the option to edit the page in a view similar to an MS Word document, (ii) the link to view the revision history ("Revision history" log), and (iii) the button to publish the edited revision of the Wikipedia page.]
Figure DA.4: Example view of a Revision history log
[Screenshot: a revision history log, annotated to highlight the link to view the words impacted by each revision (the "Difference between revisions" screen).]
Figure DA.5: Summary of revisions on firms' Wikipedia pages
Panel A: Number of revisions and users, and the life and size of pages (for the sample of 1,849 firm pages)
                                   No.       Mean   SD     1st   25th   50th   75th   99th
Revisions                          925,410   501    1256   12    73     167    398    6759
Users                              251,647   243    553    6     37     83     208    3079
Life of page (in years)            1,849     10.9   4.2    0.6   8.0    12.5   14.0   17.4
Size of page (in thousand bytes)   1,849     11.4   12.4   1.2   4.5    7.7    13.2   66.2
Panel B: Size and directional impacts of revisions on firm pages
                                     No.       Mean     SD       Min     25th   50th   75th   Max
Size of revisions (in net bytes)     925,410   36.6     15,317   -6.4M   -9     4      50     6.4M
No. of revisions that add bytes      925,410   .58      .49      0       0      1      1      1
Size of bytes added                  533,892   792      14,210   1       10     36     148    6.4M
No. of revisions that delete bytes   925,410   .35      .48      0       0      0      1      1
Size of bytes deleted                327,049   -1,190   18,216   -64M    -135   -30    -6     -1
The sample consists of 925,410 revisions to 1,849 firms in the Russell 3000 index as of 3/26/2019. Panel A
describes the number of revisions and users, and the life and size of pages for the average firm in the sample of
1,849 firm pages. Panel B describes the size and directional impacts of revisions on firm pages.
Figure DA.6: Editing activity over the sample period
[Bar chart: total revisions and users per combined two-year period over the sample period; vertical axis 0 to 200,000. Series: Revisions; Users.]
Figure DA.7: Firm page growth over the sample period
[Chart: number of firm pages created (axis 0 to 700) and page size in bytes (axis 0 to 20,000) over the sample period. Series: No. of Pages; Page Size.]
Figure DA.8: Firm page activity across business sectors
[Chart: firm page activity across business sectors. Axes: % of Total (0% to 30%) and % of Russell Firms (0% to 100%). Series: Revisions; Users; Page Size; Firms with Pages.]
Figure DA.9: Example view of a Difference between revisions screen
[Screenshot: a "Difference between revisions" screen, annotated to highlight the link to view the "User contributions" log; yellow highlighting identifies deleted content and blue identifies inserted content.]
Figure DA.10: Summary of words changed on firms' Wikipedia pages
Panel A: Number of words collected and edited (for the sample of 1,849 firms)
                  No.    Mean   SD     1st   25th   50th     75th     99th
Words collected   217M   117K   51K    546   7215   21,576   55,339   2.2M
Words edited      88M    48K    329K   56    1133   3778     10,850   1.1M
Panel B: Number of all words and types of tonal words deleted and inserted
Delete words (no. of revisions):
                      No.   Mean   SD   Min   25th   50th   75th   Max
All words deleted 925,410 .31 .46 0 0 0 1 1
Tonal words deleted 283,120 .27 .44 0 0 0 1 1
No. of words deleted:
All words deleted 283,120 151.6 2,177 1 2 5 20 769K
Tonal words deleted:
Negative words 283,120 1.25 6.92 0 0 0 0 219
Positive words 283,120 .57 2.94 0 0 0 0 65
Litigious words 283,120 .30 1.74 0 0 0 0 38
Uncertainty words 283,120 .32 1.68 0 0 0 0 48
Strong words 283,120 .13 .56 0 0 0 0 9
Weak words 283,120 .16 .80 0 0 0 0 13
Constraining words 283,120 .12 .78 0 0 0 0 22
Insert words (no. of revisions):
All words inserted 925,410 .43 .50 0 0 0 1 1
Tonal words inserted 398,091 .25 .43 0 0 0 1 1
No. of words inserted:
All words inserted 398,091 113.4 1824.7 1 2 6 21 769K
Tonal words inserted:
Negative words 398,091 .95 5.88 0 0 0 0 219
Positive words 398,091 .43 2.50 0 0 0 0 65
Litigious words 398,091 .23 1.49 0 0 0 0 38
Uncertainty words 398,091 .24 1.43 0 0 0 0 48
Strong words 398,091 .10 .49 0 0 0 0 8
Weak words 398,091 .12 .69 0 0 0 0 13
Constraining words 398,091 .09 .67 0 0 0 0 22
The sample consists of 217 million words, identified by collecting text from the 925,410 revisions to 1,849 firm
pages in the sample for the period 2001 to 2019. Panel A describes the number of words collected and edited for
the sample of 1,849 firms. Panel B describes the number of all words and types of tonal words deleted and inserted.
Words edited were classified by word search tools, using the Loughran & McDonald (2011) dictionaries.
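To make the classification step concrete, here is a hedged sketch of counting Loughran & McDonald tonal words in an edited span of text. The set `lm_negative` stands in for a word list loaded from the published LM dictionaries, and the helper name is mine, not the author's.

```python
# A sketch of tonal-word counting with the Loughran & McDonald (2011) lists.
# `lm_negative` is an assumed Python set of the LM negative words (the lists
# are distributed in uppercase, so tokens are uppercased before matching).
import re

def count_tonal(text, word_list):
    """Count tokens in `text` that appear in one LM word list."""
    tokens = re.findall(r"[A-Za-z]+", text.upper())
    return sum(token in word_list for token in tokens)

# e.g., count_tonal(deleted_text, lm_negative) gives the number of negative
# words removed by a revision; repeating over the seven LM categories yields
# the tonal counts tabulated in Panel B above.
```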
Figure DA.11: Example view of a User contributions log
Figure DA.12: Summary of users' complete editing history of Wikipedia pages
Panel A: Number of revisions and pages, total size of revisions, and editing periods by user for the sample of
30M revisions to all Wikipedia pages versus the subsample of 925K revisions to only firm pages
Revisions:
                       No.          Mean     SD       Min     25th   50th   75th   Max
All-edits sample       29,967,963   137.8    291.4    1       2      7      73     1000
Firm page sub-sample   925,410      3.7      28.8     1       1      1      2      5,864
Pages:
All-edits sample       3,655,638    75.8     173.0    1       1      4      40     1000
Firm page sub-sample   1,849        1.8      10.8     1       1      1      1      1691
Size of revisions (in bytes):
All-edits sample       29,967,963   75       8450     -4.7M   -6     5      49     10M
Firm page sub-sample   925,410      -339.2   17,071   -1.2M   -2     9      62.5   6.5M
Editing period (in days):
All-edits sample       29,967,963   776.9    1231.0   1       1      115    1063   6564
Firm page sub-sample   925,410      119.7    507.4    1       1      1      1      6303
Panel B: Total number of users and revisions by type of user and sample
                                       No. of    % of    No. of revisions     % of    No. of revisions         % of
                                       Users     total   (All-edits sample)   total   (Firm page sub-sample)   total
Users:                                 251,647   100%    29,967,963           100%    925,410                  100%
Visible IP address users               162,188   64%     8,304,848            28%     321,332                  35%
Registered Wikipedia users             89,459    36%     21,663,115           72%     604,078                  65%
Low-frequency users (<5 revisions)     110,791   44%     172,335              .58%    150,356                  16%
Medium-frequency users                 121,996   48%     12,434,628           41%     361,564                  39%
High-frequency users (>1K revisions)   18,860    8%      17,361,000           58%     413,490                  45%
The sample consists of 29,967,963 revisions, identified by collecting the "User contributions" logs of the 251,647
users in the sample of 925,410 revisions to 1,849 firm pages for the period 2001 to 2019. Panel A describes the
number of revisions and pages, total size of revisions, and editing periods by user for the sample of 30M revisions
to all Wikipedia pages (All-edits sample) versus the sample of 925K revisions to only firm pages (Firm page
sub-sample). Panel B describes the total number of users and revisions by type of user and sample.
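The "User contributions" logs described here are also exposed programmatically. As one possible collection route (the dissertation's exact method is not shown here), the MediaWiki API's usercontribs list returns a user's edit history:

```python
# A sketch of pulling one user's contributions via the MediaWiki API.
import requests

API = "https://en.wikipedia.org/w/api.php"

def user_contribs(username, limit=500):
    """Fetch up to `limit` recent contributions for one Wikipedia user."""
    params = {
        "action": "query",
        "list": "usercontribs",
        "ucuser": username,
        "uclimit": limit,
        "ucprop": "ids|title|timestamp|sizediff",  # sizediff = net bytes changed
        "format": "json",
    }
    resp = requests.get(API, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["query"]["usercontribs"]

# Paging through heavier editors requires following the API's continuation
# token ("uccontinue") across repeated requests.
```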
Figure DA.13: Number of users by number of revisions users make
Figure DA.14: Characteristics of users among different editing frequencies
Note: In Figure DA.14, the vertical axis on the left-hand side measures the number of
days a user edits Wikipedia (i.e., between the first and last edit on all Wikipedia pages).
The vertical axis on the right-hand side measures the average size of revisions a user
makes on all Wikipedia pages (in bytes added or deleted). The horizontal axis groups
users into three groups based on their frequency of edits to all Wikipedia pages. Low-
frequency users edit fewer than 5 total times on Wikipedia. Medium-frequency users
edit between 5 and 1000 total times on Wikipedia. High-frequency users edit more than
1000 total times on Wikipedia.
[Chart: Editing Period (number of days, one axis) and Size of Revisions (average bytes added or deleted, other axis) for low-frequency, medium-frequency, and high-frequency users.]
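A sketch of the frequency bucketing behind Figures DA.12 through DA.14, again under the hypothetical `edits` DataFrame used earlier:

```python
# A sketch of grouping users by total Wikipedia revisions, matching the
# cutoffs in the note above (<5 low, 5-1000 medium, >1000 high).
import pandas as pd

n_revs = edits.groupby("user").size()
freq_group = pd.cut(
    n_revs,
    bins=[0, 4, 1000, float("inf")],   # (0,4] low, (4,1000] medium, >1000 high
    labels=["Low-frequency", "Medium-frequency", "High-frequency"],
)
```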
Abstract
I assemble a unique dataset covering 19 years and over 30 million edits to the Wikipedia pages of firms in the Russell 3000 index. I create a measure that indirectly detects the edits that are most likely to have been made by firm insiders (e.g., employees maintaining a firm's social media presence). Using this measure, I find insiders bias their firms' Wikipedia pages by systematically removing negative words about their firms. I further find that bias and insider editing are attenuated after an exogenous shock which reduced editing anonymity and increased the reputational costs of insiders making self-serving edits. Finally, I find that insider editing occurs more often in periods when firms are materially misstating their financial reports. This study contributes to the strategic disclosure literature by providing the first large-sample evidence of biased editing on Wikipedia, as well as being the first to examine an important mechanism for disciplining firm voluntary disclosures: "crowd monitoring." More broadly, my dissertation speaks to the concern that self-interested parties are able to spread misinformation on social media.