Essays on Information and Financial Economics
by
Yuan (Bruce) Li
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(BUSINESS ADMINISTRATION)
May 2019
Copyright 2019 Yuan (Bruce) Li
Acknowledgements
I am indebted and extremely grateful to my advisor, Gerard Hoberg, for his guidance, motivation,
and support throughout my PhD study. This dissertation also benefited greatly from my dissertation
committee: Oguzhan Ozbas, Kenneth Ahern, Yongxiang Wang, and TJ Wong. Their support is what
made this dissertation possible. I also thank Heitor Almeida, William Cong, Lily Fang (discussant),
Itay Goldstein, Zhiguo He, Christopher Jones, E Han Kim, John Matsusaka, Gordon Phillips, Sheri-
dan Titman, Song Ma, Emily Nix, Christopher Parsons, Andrei Shleifer, Ben Zhang, and participants
at the seminars at University of Southern California, University of Georgia, Cornerstone Research,
AQR Capital, and AFA Conference 2019 for helpful comments and suggestions. All remaining errors
are my own.
Table of Contents
Acknowledgements
List Of Tables
List Of Figures
Abstract
Chapter 1: Beyond Attention: The Causal Effect of Media on Information Production
1.1 Introduction
1.2 Empirical setup
1.2.1 Institution background
1.2.2 Data sources
1.3 Identification strategy
1.3.1 Press release clustering
1.3.2 Exogenous variation of on-screen time
1.3.3 On-screen time and media coverage
1.3.4 Exclusion restriction requirement
1.4 Main effect on information production
1.5 Mechanism
1.5.1 Possible mechanisms
1.5.2 Information producers with different skills
1.5.3 Results by different firm characteristics
1.5.4 Market outcomes
1.6 Robustness
1.6.1 Are the effects through media coverage?
1.6.2 Pre-scheduled press releases
1.6.3 Endogenous disclosure
1.7 Conclusion
Chapter 2: U.S. Innovation and Chinese Competition for Innovation Production
2.1 Introduction
2.2 Literature and Hypotheses
2.3 Data and Methods
2.3.1 Sample Selection and Panel Structure
2.3.2 Patent Data
2.3.3 Internet Penetration
2.4 Summary Statistics and Validation
2.4.1 Summary Statistics
2.4.2 Validation Test: EDGAR Downloads by Chinese Internet Users
2.4.3 Validation Test: Complaints about Chinese Competition
2.4.4 Placebo Tests using Other Major Economies
2.5 Competition and Innovation
2.5.1 Impact on U.S. Firms
2.5.2 Impact on Chinese Firms
2.5.3 Impact on Firms in Placebo Economies
2.5.4 Competition and Asset Composition
2.5.4.1 High versus Low Growth Options
2.5.4.2 Trapped Assets
2.6 Conclusions
Appendix A: Figures
A.1 Figures for Chapter 1
A.2 Figures for Chapter 2
Appendix B: Tables
B.1 Tables for Chapter 1
B.2 Appendix Tables for Chapter 1
B.3 Tables for Chapter 2
B.4 Appendix Tables for Chapter 2
List Of Tables
B.1 Summary statistics
B.2 Covariate Balancing Test of The On-screen Time
B.3 Press Releases’ On-screen Time and Media Coverage
B.4 Falsification tests
B.5 Media Coverage and Information Production
B.6 The Effect of Media Coverage and Information Producer Characteristics
B.7 The Effect of Media Coverage and Firm Characteristics
B.8 Media coverage and market reaction
B.9 Media coverage and delayed response ratio
B.10 Robustness tests
B.11 Variable definitions
B.12 Data cleaning process to generate the main sample
B.13 Robustness of Table B.3
B.14 Summary Statistics
B.15 Summary Statistics at the firm level
B.16 EDGAR searches and Chinese internet penetration
B.17 Competition complaints and Chinese internet penetration
B.18 Placebo tests - Competition from other countries and Chinese internet penetration
B.19 Innovation activities and Chinese internet penetration
B.20 Innovation activities and Chinese internet penetration - Poisson Regression
B.21 Patent citations and Chinese internet penetration
B.22 Placebo tests - patent citations from other countries and Chinese internet penetration
B.23 Subsample analysis - by Q
B.24 Subsample analysis - by Asset Tangibility
B.25 Variable definitions
B.26 Robustness - Weights from Macro Data
B.27 Robustness - Top 1 provinces
B.28 Robustness of Table B.19 Excluding Zero R&D Firms
List Of Figures
A.1 Information Production after Press Releases
A.2 Publication Time of Press Releases
A.3 Coefficient Estimates for Different Event Days
A.4 Coefficient Estimates for Cumulative Effects
A.5 Complaints about Chinese competition
A.6 Internet penetration growth variation
A.7 Number of industries (SIC2) covered by Chinese public firms
A.8 Weight loadings by Province-Industry
Abstract
This thesis consists of two essays that study how the flow of information affects trading activities and corporate actions. Chapter 1 shows that financial media can have causal effects on market participants. I exploit random variation in the visual salience of corporate press releases to financial journalists as an instrument for media coverage. Doubling the amount of media coverage increases the number of EDGAR searches by 31% and the number of analysts issuing earnings forecasts by 78% in a two-day period. The evidence is most consistent with the theory of rational attention allocation: sophisticated investors acquire more information about media-covered events because media coverage signals higher return variance, and thus higher payoffs from having more precise information. Analysts cater to the increased information demand from institutional investors by responding to media-reported events. Chapter 2 uses the growth of information flow via the Internet over the past decades as a setting and examines how competitive shocks from China affect U.S. innovation through two margins: the market for innovation and the market for existing products. Using Chinese data, we map each industry to provincial Internet penetration levels using geographic agglomeration data. The resulting industry-year database captures the ability of Chinese firms to acquire knowledge globally and to compete in the market for intellectual property production. Increases in provincial Chinese Internet penetration are followed by sharp reductions in R&D investment and subsequent patenting by U.S. firms, and by increased patenting by Chinese firms. The new Chinese patents also cite the patents of treated U.S. firms at a high rate, consistent with increased intellectual property competition. In contrast, U.S. firms with fewer growth options and more tangible assets tend to increase R&D and patenting activity. Overall, both intellectual property competition from Chinese firms and the asset composition of U.S. firms influence U.S. firm innovation.
Chapter 1
Beyond Attention: The Causal Effect of Media on Information Production
1.1 Introduction
A well-functioning financial market requires information to flow efficiently between corporations and investors. Such information flows are often facilitated by a variety of information intermediaries, such as financial media and analysts, as well as by investors’ own information acquisition. While the literature has studied how each of these channels individually affects the information environment and stock trading, little is known about whether and how they interact. Uncovering potential interactions helps us better understand the micro-process through which stock prices incorporate new information. In this paper, I study how media coverage affects the information acquisition of investors and the issuance of earnings forecasts by analysts.
Whether financial media complement, substitute, or do not affect other information intermedi-
aries is ex-ante unclear. On the one hand, investors and analysts might focus on less salient events
because these events are often associated with delayed market responses and possible abnormal
returns (Cohen and Frazzini, 2008; Menzly and Ozbas, 2010). On the other hand, media attention
could attract more retail traders and increase return volatility, thus increasing the reward of having more precise signals. Alternatively, media coverage may simply be a sideshow for more sophisticated market
participants like analysts and institutional investors, who often have alternative and better access
to information. Ultimately, the relationship between media and other information intermediaries is
an open empirical question.
To answer this question, this paper studies whether media coverage of a corporate announcement affects the amount of investors’ information acquisition and of analysts’ earnings forecasts about the firm. To construct a comprehensive sample of corporate announcements, this paper uses a dataset that includes almost all the press releases issued by U.S. public firms from 2004 to 2017. Using an event-study approach based on these press releases, I find that when media articles from Dow Jones Newswire cover an announcement, investors acquire more information about the firm through the SEC EDGAR system, and more analysts issue earnings forecasts for that firm.
One challenge in answering this question, however, is that endogenous event- or firm-level characteristics can affect the coverage decisions of both media and other information intermediaries. For example, an earnings surprise would attract media coverage but would also induce analysts to update their earnings forecasts, creating a positive correlation between the responses of different information intermediaries.
This paper proposes a novel identification strategy based on the unique way that wire journalists process information and produce news. For a newswire journalist, the typical workflow is to monitor a real-time press release feed, select newsworthy events, and quickly relay the main points to subscribers. Because newswires compete on speed, newswire journalists need
to produce news articles almost in real-time, and in many cases these news articles contain only a
headline summarizing the event.
A particular challenge for these journalists is that press releases cluster at specific times within a day. For investors or analysts who follow only a handful of firms, such clustering is innocuous. However, newswire journalists typically do not specialize in any industry and thus cover events for the entire market. As a result, when many unrelated firms issue press releases at the same time, the amount of information newswire journalists face can quickly add up and impose a cognitive burden.
Key to my identification is that within a busy cluster of press releases, some press releases are more visually salient than similar others and thus receive more media coverage. The real-time systems that journalists monitor follow a “first-in-first-out” rule: new content always appears at the top of the user interface, pushing existing content down and onto later pages. Thus, the length of time for which a press release stays in a prominent position (e.g., the first page or the top of the screen) is determined by how quickly new releases replace it. Consider a cluster of press releases. A press release queued near the beginning of the cluster has a short “on-screen” time because the rest of the cluster follows and quickly replaces it. In comparison, a press release queued near the end of the cluster is followed by far fewer releases, so it stays on screen longer than the releases queued near the beginning. In this paper, I define each busy cluster as the first 10 seconds of an hour. Within such a tight time frame, firms have neither the incentive nor, in many cases, the ability to precisely control their order within the queue.[1] Moreover, it is the number of press releases from other firms that determines the size of the cluster, so the ultimate on-screen time is unlikely to be determined by the actions of any single firm. Indeed, using covariate balancing tests, I find that my measure of on-screen time is uncorrelated with observable firm or event characteristics, while the same characteristics significantly predict the amount of media coverage.

[1] For example, the user interface in PR Newswire only allows users to schedule press releases to the minute.
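The queueing argument above can be illustrated with a toy model in which the screen shows only the `screen_slots` newest releases (`screen_slots` and `on_screen_seconds` are hypothetical names; actual interface capacities are not given here). A release leaves the screen when the `screen_slots`-th later release arrives, so releases early in a burst are displaced within seconds while those at the end of the burst stay visible until the next burst.

```python
from typing import List, Optional


def on_screen_seconds(arrivals: List[float], screen_slots: int) -> List[Optional[float]]:
    """Seconds each release stays among the `screen_slots` newest items.

    `arrivals` are release times in seconds, sorted ascending. New items are
    inserted at the top, so release i is pushed off the screen when release
    i + screen_slots arrives. None means the release is still on screen at
    the end of the observed window.
    """
    out: List[Optional[float]] = []
    for i, t in enumerate(arrivals):
        j = i + screen_slots  # the arrival that pushes release i below the last slot
        out.append(arrivals[j] - t if j < len(arrivals) else None)
    return out


# A burst at seconds 0 to 5, then a second small burst about a minute later.
print(on_screen_seconds([0, 1, 2, 3, 4, 5, 60, 61, 62], screen_slots=3))
```

With three slots, the first three releases of the burst stay on screen for 3 seconds, while the last three stay for 57 seconds, even though all six arrive within the same few seconds.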
I find evidence that on-screen time is a valid instrument for media coverage. On the relevance condition, I find that a shorter on-screen time leads to less media coverage. The effect remains large and significant even with a stringent set of controls, including firm-by-year, date-by-hour, and detailed topic fixed effects. On the exclusion restriction, I find that the same results obtain when the shorter on-screen time is caused by press releases from private firms, which likely affect only wire journalists and not analysts and investors.
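The logic of the instrument can be illustrated with a small simulation in which everything is hypothetical. An unobserved "surprise" drives both media coverage and EDGAR requests (the confounding described above), while a stand-in count of releases arriving right after (`npra`) shifts media coverage only. Ordinary least squares then overstates the effect, whereas the ratio cov(z, y) / cov(z, x), which is what two-stage least squares reduces to with one instrument and an intercept, recovers the true coefficient. The sketch omits the fixed effects and controls used in the paper and computes no standard errors.

```python
import random
from typing import List, Tuple


def cov(a: List[float], b: List[float]) -> float:
    """Sample covariance of two equal-length sequences."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n: int = 200_000, beta: float = 0.4,
             seed: int = 0) -> Tuple[List[float], List[float], List[float]]:
    """Hypothetical data: z = instrument, x = (log) media coverage, y = (log) EDGAR requests."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        surprise = rng.gauss(0.0, 1.0)  # unobserved event characteristic (confounder)
        npra = rng.randint(0, 10)       # more releases right after = shorter on-screen time
        media = 1.0 - 0.1 * npra + 0.8 * surprise + rng.gauss(0.0, 1.0)
        # npra enters EDGAR requests only through media (exclusion restriction holds by construction)
        edgar = beta * media + 0.8 * surprise + rng.gauss(0.0, 1.0)
        z.append(float(npra))
        x.append(media)
        y.append(edgar)
    return z, x, y


def ols_slope(x: List[float], y: List[float]) -> float:
    # Biased upward here because `surprise` moves both x and y.
    return cov(x, y) / cov(x, x)


def iv_slope(z: List[float], x: List[float], y: List[float]) -> float:
    # With one instrument and an intercept, 2SLS equals this ratio.
    return cov(z, y) / cov(z, x)


if __name__ == "__main__":
    z, x, y = simulate()
    print(f"OLS: {ols_slope(x, y):.3f}  IV: {iv_slope(z, x, y):.3f}  true: 0.4")
```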
Using an instrumental variable approach, I find that media coverage of an event significantly in-
creases the number of requests on the SEC EDGAR system. Doubling the amount of media coverage
on the press release day increases the number of abnormal EDGAR requests by 31% in a two-day period. Perhaps more surprisingly, I also find that more analysts issue earnings forecasts for media-covered events. Doubling the amount of media coverage leads to a 78% increase in the number of analysts
issuing earnings forecasts. The effects are short-lived: the effect on analysts disappears after two
days, and the effect on EDGAR searches disappears after five days. As a placebo test, I find no effect
of media coverage on the day prior to the press release.
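The "doubling" statements can be translated into regression coefficients with simple arithmetic, but only under an assumed functional form. The sketch below assumes a log-log specification (log outcome on log media coverage); this section does not state the exact form, and specifications in log(1 + x) would give slightly different mappings. Under that assumption, a coefficient beta implies a 2^beta − 1 proportional change when coverage doubles. The function names are illustrative.

```python
import math


def effect_of_doubling(beta: float) -> float:
    """Proportional change in the outcome when media coverage doubles,
    assuming ln(outcome) = beta * ln(coverage) + other terms."""
    return 2.0 ** beta - 1.0


def implied_beta(change_from_doubling: float) -> float:
    """Inverse mapping: the log-log coefficient implied by a stated
    response to doubling (e.g., 0.31 for a 31% increase)."""
    return math.log(1.0 + change_from_doubling) / math.log(2.0)


for label, change in [("EDGAR requests", 0.31), ("analysts issuing forecasts", 0.78)]:
    print(f"{label}: implied beta = {implied_beta(change):.2f}")
```

Under this assumption, the stated 31% and 78% responses correspond to coefficients of roughly 0.39 and 0.83; these are back-of-the-envelope conversions, not estimates reported in the paper.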
The effect of media coverage could operate through two channels. First, media coverage may improve the information production technology of investors and analysts. One can think of such an improvement as an additional signal that complements the research of investors or analysts (Goldstein and Yang, 2015), or as extra attention that makes investors and analysts aware of the events, reducing their search problem. Second, media coverage might increase the reward to having a precise signal. Sophisticated investors would rationally allocate more resources to “learn” about firms that have higher ex-ante payoff variance (Kacperczyk, Van Nieuwerburgh, and Veldkamp, 2016). The increased information demand from investors also affects the payoff to analysts, for whom client demand is one of the most important determinants of whether to issue earnings forecasts. As Brown, Call, Clement, and Sharp (2015) conclude from a survey of 182 analysts, “Demand from their clients is analysts’ most important motivation for making profitable stock recommendations and their second most important motivation for issuing accurate earnings forecasts”.
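The reward-side channel rests on a standard Bayesian fact: a signal of fixed precision removes more uncertainty when prior uncertainty is larger. The sketch assumes normally distributed payoffs and signals, an illustrative simplification rather than the model of Kacperczyk, Van Nieuwerburgh, and Veldkamp (2016). In that setting, the variance removed by one signal equals prior_var² / (prior_var + signal_noise_var), which increases with prior_var; the function names are hypothetical.

```python
def posterior_variance(prior_var: float, signal_noise_var: float) -> float:
    """Normal prior and normal signal: posterior precision is the sum of precisions."""
    return 1.0 / (1.0 / prior_var + 1.0 / signal_noise_var)


def variance_reduction(prior_var: float, signal_noise_var: float) -> float:
    """Uncertainty removed by one signal; equals prior_var**2 / (prior_var + signal_noise_var)."""
    return prior_var - posterior_variance(prior_var, signal_noise_var)


# Same signal quality, increasing prior payoff variance.
for v in (0.5, 1.0, 2.0):
    print(v, round(variance_reduction(v, signal_noise_var=1.0), 3))
```

If media coverage signals a higher prior_var, the same research effort buys a larger reduction in uncertainty, which is the sense in which the reward to precise information rises.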
The empirical evidence is more consistent with the reward-side explanation than with the production-technology explanations. First, if media coverage works by improving information production technology, we would expect lower-skilled investors or analysts to benefit more and to exhibit larger effects. However, I find that institutional investors actually react more strongly to media coverage. Analysts with higher skills, measured by accuracy or experience, also show either indistinguishable or larger responses to media coverage. Second, the attention channel is also unlikely. I find that investors who searched the same firm in the previous month, and who therefore already know about the firm and face less of a “search problem” (Barber and Odean, 2008), also show larger responses to media coverage. Furthermore, it is hard to imagine that analysts would be unaware of an event without media coverage of it. After all, maintaining an information advantage is the analyst’s job.
In contrast, I find evidence consistent with the reward-side explanation. First, the effect of media on analysts is stronger in the subsample of firms with higher institutional ownership. Second, the market outcomes are consistent with a tug of war between two types of investors: one attracted by media through attention, and the other consciously trading more in media-covered firms and profiting by trading against these potentially uninformed traders. On the event day, media coverage significantly widens the effective spread, suggesting a relative increase in informed traders. In the next two days, while media coverage does not affect absolute returns, it significantly increases intraday price ranges. Peress (2014) attributes similar effects to “less price-sensitive traders who transact at less favorable prices”. I also find that media coverage neither improves nor deteriorates price efficiency, measured by the delayed response ratio of DellaVigna and Pollet (2009). This result echoes Blankespoor, deHaan, and Zhu (2018), who also find no effect of media on price efficiency, and is consistent with sophisticated and less sophisticated investors having opposite effects on price efficiency.
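For readers less familiar with the two market-quality measures, the sketch below computes a proportional effective spread for a single trade and a simplified, per-event delayed response ratio. Both are textbook-style definitions assumed for illustration: DellaVigna and Pollet (2009) estimate the ratio from regressions across many announcements, and the two-day immediate window (`immediate_days`, a hypothetical parameter) is not necessarily the window used in this paper.

```python
from typing import Sequence


def effective_spread(trade_price: float, bid: float, ask: float) -> float:
    """Proportional effective spread: 2 * |price - midpoint| / midpoint."""
    mid = (bid + ask) / 2.0
    return 2.0 * abs(trade_price - mid) / mid


def delayed_response_ratio(abnormal_returns: Sequence[float], immediate_days: int = 2) -> float:
    """Share of the total abnormal return realized after the first
    `immediate_days` event days (index 0 is the announcement day).
    Values near 0 mean prices absorb the news immediately."""
    total = sum(abnormal_returns)
    if total == 0:
        raise ValueError("total abnormal return is zero; ratio undefined")
    return sum(abnormal_returns[immediate_days:]) / total


print(effective_spread(10.02, bid=9.99, ask=10.01))
print(delayed_response_ratio([0.02, 0.01, 0.005, 0.005]))
```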
The most important contribution of the paper is to cleanly identify that media coverage has a
positive effect on the information produced by sophisticated market participants. Existing empiri-
cal evidence shows that media coverage increases the volatility of expected returns (Peress, 2014;
Blankespoor, deHaan, and Zhu, 2018), changes the investor base (Barber and Odean, 2008), and
possibly increases mispricing (Hillert, Jacobs, and Müller, 2014; Ahern and Sosyura, 2015). As
a result, sophisticated investors may interpret media coverage as a reliable signal which suggests
a higher reward to their information production. The increased information demand from insti-
tutional investors also impacts analysts, who would cater to such demand by also shifting their
information production to media-covered events.
This paper first contributes to the literature that studies the interaction between different in-
formation channels in the financial market. Existing literature mostly focuses on the interaction
between sell-side analysts and sophisticated investors. Kacperczyk and Seru (2007) find that fund
managers with higher skills rely less on the public information from stock analysts. Chen, Cohen,
Gurun, Lou, and Malloy (2017) find that when the information production from analysts exoge-
nously decreases due to the closures and mergers of brokerage firms, sophisticated investors scale
up their information acquisition. This paper provides novel evidence that media could also impact
the information production of investors and analysts. The results challenge the conventional wis-
dom that media does not matter for sophisticated market participants who have better information
access and higher information processing skills.
This paper also contributes to the literature that studies the role of media in the financial market.
I refer readers to Tetlock (2014) for an excellent review of the literature. In particular, this paper
adds to the work that studies the causal effect of media coverage. Engelberg and Parsons (2011)
use extreme local weather events as exogenous shocks to news delivery. Peress (2014) uses a set of
newspaper strike events as exogenous shocks to news production. Blankespoor, deHaan, and Zhu
(2018) use the staggered implementation of robo-journalism to study the causal impacts of synthe-
sizing information from analysts and other sources. Fedyk (2018) uses the random positioning of
news on Bloomberg terminals to study the effects of being on the front page. This paper introduces
a novel identification strategy that stems from the inefficiency in media production. Compared with
previous work, the strategy also applies to a more representative and larger sample.
1.2 Empirical setup
1.2.1 Institution background
The empirical setup of the paper centers on corporate disclosures through press releases. Since Regulation Fair Disclosure (Reg FD), press releases have become an increasingly popular method for corporate disclosure, given their fast delivery and broad reach.[2] Typically, public firms choose one of the top four wires, namely PR Newswire, Business Wire, Marketwired, and GlobeNewswire, to publish their announcements (Solomon and Soltes, 2012). These press releases cover a wide range of topics; in my sample, the top three topics are “earnings”, “products-services” (e.g., new product releases), and “labor-issues” (e.g., executive appointments).

[2] Reg FD implicitly encourages the use of press releases because of their fast dissemination and wide reach to investors. Reg FD states that “technological developments have made it much easier for issuers to disseminate information broadly. Whereas issuers once may have had to rely on analysts to serve as information intermediaries, issuers now can use a variety of methods to communicate directly with the market. In addition to press releases, these methods include, among others, Internet webcasting and teleconferencing”. A similar argument can be found in Neuhierl, Scherbina, and Schlusche (2013). The full content of Reg FD can be found at https://www.sec.gov/rules/final/33-7881.htm
These corporate disclosures, many of which contain new information with large market impact, also spur much follow-on information production by journalists, analysts, and investors. To illustrate this point, in Figure A.1, I plot the percentages of all news articles, analysts’ earnings forecasts, and EDGAR web requests that are produced on different days following corporate press releases.[3] Over 35% of all news articles are published on days when firms issue new press releases. Most analysts’ earnings forecasts are issued immediately after corporate disclosures, with almost 50% of all forecasts published within two days of press release issuance. The evidence on analysts is consistent with the findings of Altınkılıç, Balashov, and Hansen (2013), who show that over 50% of analysts’ forecasts are issued following earnings or guidance reports. Even EDGAR requests peak after corporate disclosures; about 20% of all requests are made within two days of press releases. Therefore, even though this paper uses an event-study approach based on press releases, it still captures a significant portion of all information production activities.
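The shares plotted in Figure A.1 can be computed by matching each activity record (a news article, forecast, or EDGAR request) to the most recent press release by the same firm. The sketch below is one way to do this; it assumes integer trading-day indices and treats "within two days" as day 0 or day +1 (`k=1`), both illustrative assumptions rather than the paper's stated conventions, and `share_within_k_days` is a hypothetical name.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def share_within_k_days(release_days: Iterable[Tuple[str, int]],
                        activity_days: Iterable[Tuple[str, int]],
                        k: int = 1) -> float:
    """Fraction of activity records dated 0..k trading days after a press
    release by the same firm. Each record is (firm_id, trading_day_index)."""
    releases: Dict[str, List[int]] = defaultdict(list)
    for firm, day in release_days:
        releases[firm].append(day)
    for days in releases.values():
        days.sort()

    hits = total = 0
    for firm, day in activity_days:
        total += 1
        days = releases.get(firm, [])
        i = bisect_right(days, day)  # releases[firm][i - 1] is the latest release on or before `day`
        if i > 0 and day - days[i - 1] <= k:
            hits += 1
    return hits / total if total else 0.0


print(share_within_k_days([("A", 10), ("B", 5)],
                          [("A", 10), ("A", 11), ("A", 13), ("B", 4), ("C", 1)]))
```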
[Figure A.1 here]
While media coverage, like earnings forecasts and EDGAR requests, responds strongly to corporate disclosures, financial journalists, especially newswire journalists, have drastically different workflows from stock analysts and investors (for simplicity, I refer to analysts and investors as finance professionals hereafter) for monitoring and processing press releases.

[3] The sample covers all news articles and analysts’ forecasts covering the firms in my press release sample during 2004-2017. The EDGAR data span January 2004 to June 2017. The press release data include all press releases on the top four press release wires.
There are three major differences. First, logistically, journalists and finance professionals use different software. In the newsroom, a common tool is a press release feed aggregator that shows press releases from a variety of press release wires in real time. In the trading room, by contrast, traders and analysts often use professional services like Bloomberg or Thomson Reuters to gather real-time information. Typically, the information stream on press release wires is quite “noisy” in the sense that anyone can pay to publish a press release, so many releases are promotional or advertisement-like. It is exactly the job of wire journalists to filter out these “noisy” events and pick material ones. Second, and as a result, the “information inputs” of journalists and finance professionals differ greatly. Wire journalists monitor a much larger set of firms and events than finance professionals. The job of wire journalists is often not to conduct in-depth analysis; in many cases, journalists produce media articles that contain only a headline summarizing the event. Because these tasks require mostly general skills, wire journalists typically do not specialize in certain industries and thus need to monitor events for the whole market. In comparison, investors and analysts tend to focus on a much smaller set of firms. For example, Chen, Cohen, Gurun, Lou, and Malloy (2017) find that fund managers tend to consistently acquire insider trading forms from a small set of firms. Similarly, analysts tend to be industry-focused and produce research for only a handful of firms. The third difference is that wire journalists monitor a passively determined and continuously updating event stream, while finance professionals often rely on active search, notification pushes, or custom event filters to focus on specific information. In summary, the unique workflow of wire journalists is illustrated by the job description of a Dow Jones Newswire journalist:[4]
Ian currently manages the U.K. companies desk, overseeing corporate news flashes and
quick fire fills for both the Dow Jones Newswire and The Wall Street Journal’s website.
The desk covers all stocks from the largest FTSE100s to the smallest AIM companies
across the whole range of industries and subject matters.

[4] See the journalist’s bio at http://www.wsj.com/news/author/8056
1.2.2 Data sources
This section introduces the sources of different datasets used in this paper.
Press release data
I start by compiling a comprehensive set of press releases as the main sample of this paper. The
press release data come from the RavenPack PR (press release) Edition. The data contains press
releases published in over 10 press release wires from 2004 to 2017. Importantly, the data includes
all the top 4 press release wires (PR Newswire, Business Wire, Globe Newswire, and Marketwired).
RavenPack also links these press releases to public firms and provides the CUSIP number(s) of
associated firms, allowing me to link the press release data to other datasets of the paper.
I impose a set of standard data filters to generate the final sample.

First, I require the RELEVANCE score of each press release-firm pair to be 100. This filter, which the RavenPack user manual recommends, ensures that the firm is the main subject of the press release.

Second, I include only the top 4 press release wires, because the other major wires in the data mostly cover non-US regions, such as Canadian Newswire or LSE Regulatory News Service.

Third, I remove duplicated releases by requiring the ENS score to be 100.

Fourth, I require the firm to be in the Compustat/CRSP universe.

Fifth, I require the press release to be issued on a trading day.

Finally, to better associate later responses with specific press releases, I keep only observations where the firm issues exactly one press release on that day. Table B.12 in the Appendix shows the detailed data cleaning process and how the number of observations changes after each step.
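The filter sequence above can be sketched as follows. This is a minimal illustration, not the author's code. The record layout and field names (`relevance`, `wire`, `ens`, `firm`, `date`) are hypothetical stand-ins for the RavenPack fields named in the text. The sketch also assumes the one-release-per-day rule counts only releases that survive the earlier filters; the text does not settle this point.

```python
from collections import Counter

# The four wires named in the text.
TOP4_WIRES = {"PR Newswire", "Business Wire", "Globe Newswire", "Marketwired"}


def apply_filters(releases, sample_firms, trading_days):
    """Apply the sample filters to a list of release dicts (hypothetical keys)."""
    kept = [
        r for r in releases
        if r["relevance"] == 100          # firm is the main subject
        and r["wire"] in TOP4_WIRES       # top 4 wires only
        and r["ens"] == 100               # drop duplicated releases
        and r["firm"] in sample_firms     # Compustat/CRSP universe
        and r["date"] in trading_days     # issued on a trading day
    ]
    # Assumption: count surviving releases per firm-day and keep unique ones.
    per_firm_day = Counter((r["firm"], r["date"]) for r in kept)
    return [r for r in kept if per_firm_day[(r["firm"], r["date"])] == 1]
```

Because `Counter` is built after the row-level filters, a firm whose second release that day was dropped (say, a duplicate) keeps its remaining release. Counting over all raw releases would be the stricter alternative.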
A key measure in this paper is the “on-screen” time of each press release. More formally, the “on-screen” time is the length of time a press release stays in a prominent position on the software interface. Measuring on-screen time precisely is impossible without introducing many assumptions.

Instead, I construct a proxy for on-screen time based on the following observation: the on-screen time of a press release is determined by the speed at which new press releases replace it. Therefore, for each press release, I use the number of New Press Releases After it in the next 30 seconds as an (inverse) proxy for its on-screen time. For parsimony, I refer to this measure as NPRA hereafter. As the NPRA of a press release increases, its on-screen time shortens.

Although RavenPack PR Edition starts in 2004, its timestamp variable does not record seconds until April 1, 2006. To measure NPRA correctly, the final sample of the paper therefore starts on April 1, 2006.
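The NPRA definition can be illustrated with a short sketch. It assumes that timestamps arrive already in feed order. Releases stamped with the same second are then ordered by queue position, so only releases later in the feed count as "new". The 30-second window is treated as inclusive. Both choices are assumptions; the text does not specify tie-breaking or window boundaries.

```python
from bisect import bisect_right
from datetime import datetime


def npra(timestamps, window_seconds=30):
    """For each release, count later releases published within window_seconds.

    Assumes `timestamps` (datetimes) are in feed order, so same-second
    releases are ordered by their queue position.
    """
    if not timestamps:
        return []
    if any(b < a for a, b in zip(timestamps, timestamps[1:])):
        raise ValueError("timestamps must be in non-decreasing feed order")
    t0 = timestamps[0]
    secs = [(t - t0).total_seconds() for t in timestamps]
    counts = []
    for i, s in enumerate(secs):
        # Index one past the last release stamped no later than s + window.
        end = bisect_right(secs, s + window_seconds)
        # Releases after position i, up to that index.
        counts.append(end - i - 1)
    return counts


if __name__ == "__main__":
    feed = [datetime(2010, 1, 4, 16, 0, s) for s in (0, 0, 5, 31)]
    print(npra(feed))  # [2, 1, 1, 0]
```

In the usage example, the first of two releases stamped 16:00:00 gets a higher NPRA than the second. This mirrors the text's point that position within a cluster drives on-screen time. The regressions would then use `math.log1p(n)` as log(NPRA + 1).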
Media coverage data
The media coverage data come from RavenPack DJ Edition, which includes all news articles published on the Dow Jones Newswire from 2000 to 2017. I apply several filters to generate the media coverage measures.

First, Dow Jones Newswire also carries press releases, which are redistributions of the original content by automated algorithms. I therefore drop these observations from the news measure by excluding news articles whose NEWS-TYPE is PRESS-RELEASE.

Second, I require the RELEVANCE score of each news-firm pair to be 100.

Third, to avoid reverse causality, I drop news articles covering analyst opinions and stock market reactions. These are articles whose topic variable GROUP is ‘analyst-ratings’, ‘credit-ratings’, ‘order-imbalances’, ‘technical-analysis’, ‘stock-prices’, or ‘price-targets’.
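For firm i on day t, I define abnormal media coverage, $\mathrm{AbnNews}_{it}$, as follows. It is the log of 1 plus the number of Dow Jones Newswire articles covering firm i on day t, minus the log of 1 plus the average daily number of articles covering firm i over days [t-70, t-11]:
$$\mathrm{AbnNews}_{it} = \log\left(1 + N_{it}\right) - \log\left(1 + \bar{N}_{i,[t-70,\,t-11]}\right),$$
where $N_{it}$ is the article count and $\bar{N}_{i,[t-70,\,t-11]}$ is its daily average over the benchmark window. The definitions of other media coverage variables can be found in Table B.11.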
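The abnormal-count definition can be written as a small function. This is a sketch under stated assumptions. Days are indexed by position in a list, for example consecutive trading days; the text does not say whether the window counts trading or calendar days. The benchmark window [t-70, t-11] is treated as inclusive. The function name is hypothetical.

```python
import math


def abnormal_count(daily_counts, t, start_lag=70, end_lag=11):
    """log(1 + N_t) - log(1 + mean N over days [t - start_lag, t - end_lag]).

    `daily_counts` holds one firm's daily counts indexed by day position;
    `t` is the event-day index.
    """
    if t < start_lag or t >= len(daily_counts):
        raise ValueError("day t needs a full benchmark window and must be in range")
    window = daily_counts[t - start_lag : t - end_lag + 1]  # inclusive window
    baseline = sum(window) / len(window)
    return math.log1p(daily_counts[t]) - math.log1p(baseline)
```

The same function would apply unchanged to daily EDGAR request counts or unique-analyst counts, since those measures share the log(1 + count) minus log(1 + benchmark average) form.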
EDGAR log data
To measure information acquisition, I use the EDGAR server log files from the SEC.[5] Whenever someone accesses a web page or a filing on the EDGAR system, the log records an observation with:
(1) the IP address of the visitor;
(2) the CIK number, a unique firm identifier used by EDGAR;
(3) the accession number, a unique filing document ID; and
(4) the timestamp of the web request.
The data contain all web traffic records from February 2003 to June 2017 and provide rich information about how investors access information through the EDGAR system. The data are also massive: 2016 alone contains over 6.6 billion observations.
Following Loughran and McDonald (2017), I apply several filters to clean the data.

First, I omit index page requests (idx = 1) to avoid double counting.

Second, I drop requests whose server status code is 300 or higher, as these are either redirection requests or error requests.

Third, I drop web requests from web crawlers, which access and download SEC filings through automated algorithms. I start by dropping web requests from IP addresses that explicitly identify themselves as web crawlers (crawler = 1). However, not all web crawlers reveal themselves in the user agent, and the existing literature proposes two methods for identifying them:
- Drake, Roulstone, and Thornock (2015) classify an IP address as a web crawler if it makes more than five web requests in a minute or more than a thousand web requests in a day.
- Lee, Ma, and Wang (2015a) classify an IP address as a crawler if it requests information on more than 50 firms in a day.
The results of this paper are robust to using either method. For simplicity, the main text shows only results that use the first method to identify web crawlers.[6]
[5] Details on the data can be found at https://www.sec.gov/dera/data/edgar-log-file-data-set.html. See also Loughran and McDonald (2017) for a general discussion of the data. Papers using the same dataset include Lee, Ma, and Wang (2015a) and Drake, Roulstone, and Thornock (2015), among others.
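Both crawler rules described above can be sketched in a few lines. The input layout, (ip, date, minute, cik) tuples, is hypothetical; the actual log has more fields and finer timestamps. The sketch applies each rule per IP-day and returns the flagged (ip, date) pairs. Whether a flagged IP is dropped only for that day or for the whole sample is not specified in the text, so that step is left to the caller.

```python
from collections import defaultdict


def flag_crawlers(requests, per_minute=5, per_day=1000, max_firms=50):
    """Return (drt_flags, lmw_flags): sets of (ip, date) pairs flagged by each rule.

    requests: iterable of (ip, date, minute, cik) tuples (hypothetical layout).
    """
    minute_counts = defaultdict(int)
    day_counts = defaultdict(int)
    day_firms = defaultdict(set)
    for ip, day, minute, cik in requests:
        minute_counts[(ip, day, minute)] += 1
        day_counts[(ip, day)] += 1
        day_firms[(ip, day)].add(cik)

    # Rule 1 (Drake, Roulstone, and Thornock): > 5 per minute or > 1,000 per day.
    drt = {(ip, day) for (ip, day, _), n in minute_counts.items() if n > per_minute}
    drt |= {key for key, n in day_counts.items() if n > per_day}

    # Rule 2 (Lee, Ma, and Wang): requests about more than 50 distinct firms in a day.
    lmw = {key for key, firms in day_firms.items() if len(firms) > max_firms}
    return drt, lmw
```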
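Similarly, for firm i on day t, I define the abnormal number of EDGAR requests, $\mathrm{AbnEdgar}_{it}$. It is the log of 1 plus the number of EDGAR requests for filings from firm i on day t, minus the log of 1 plus the average daily number of EDGAR requests about firm i over days [t-70, t-11]:
$$\mathrm{AbnEdgar}_{it} = \log\left(1 + E_{it}\right) - \log\left(1 + \bar{E}_{i,[t-70,\,t-11]}\right),$$
where $E_{it}$ is the request count and $\bar{E}_{i,[t-70,\,t-11]}$ is its daily average over the benchmark window.

I augment the EDGAR database by matching the first three octets of IP addresses to the IP ranges of different institutions provided by MaxMind.[7]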
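The octet matching can be illustrated with Python's standard `ipaddress` module. In the public EDGAR log files, the last octet of each IP address is masked with letters, which is consistent with matching on the first three octets. The sketch treats each masked address as a /24 block and returns the institution whose range contains that block.

Several pieces are hypothetical:
- The function name, the example ranges, and the institution names.
- The tie-breaking rule: when several ranges contain the block, the sketch picks the most specific one. This is a design choice, not something the text states.
- The time-varying Wayback Machine mapping, which is not modeled.

```python
import ipaddress


def match_institution(masked_ip, ranges):
    """Map a masked IPv4 address like '10.1.2.jja' to an institution name.

    ranges: list of (cidr_string, institution_name) pairs (hypothetical data).
    Returns the name of the most specific range containing the /24 block, or None.
    """
    prefix = ".".join(masked_ip.split(".")[:3])
    block = ipaddress.ip_network(prefix + ".0/24")
    best = None
    for cidr, name in ranges:
        net = ipaddress.ip_network(cidr)
        if block.version == net.version and block.subnet_of(net):
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, name)
    return best[1] if best else None
```

A /24 block that only partially overlaps a smaller range (for example, a /28) is left unmatched here. The conservative alternative would be `block.overlaps(net)`.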
Analyst data
Analysts’ earnings forecast data come from the unadjusted detail file in I/B/E/S. I keep observations with non-missing EPS, announcement date, analyst ID, and broker firm ID.

On a single day, an analyst may issue multiple forecasts with different period ends for the same firm, and these forecasts appear as separate records in the I/B/E/S data. To avoid double counting, for each firm-day I count the number of unique analysts who issue any EPS forecast. Thus, multiple forecasts from the same analyst are counted only once per firm-day.

I define the abnormal number of analyst forecasts, $\mathrm{AbnAnalyst}_{it}$, as the log of 1 plus the number of unique analysts issuing earnings forecasts for firm i on day t, minus the log of 1 plus the average daily number of analysts issuing earnings forecasts for firm i over days [t-70, t-11].
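The unique-analyst count described above can be sketched as follows. The dict keys are hypothetical stand-ins for the I/B/E/S fields named in the text (EPS, announcement date, analyst ID, broker ID). Records with any missing required field are dropped before counting.

```python
from collections import defaultdict

# Hypothetical field names for the required I/B/E/S variables.
REQUIRED = ("eps", "date", "analyst", "broker")


def unique_analysts_per_firm_day(forecasts):
    """Count distinct analysts issuing any EPS forecast per (firm, date)."""
    analysts = defaultdict(set)
    for f in forecasts:
        if any(f.get(k) is None for k in REQUIRED):
            continue  # drop records with missing required fields
        analysts[(f["firm"], f["date"])].add(f["analyst"])
    return {key: len(ids) for key, ids in analysts.items()}
```

Collecting IDs in a set means that an analyst who issues forecasts for several period ends on the same day contributes one count, which is the double-counting correction described in the text.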
Other data
The remaining data come from a variety of sources:
- Stock return, trading volume, and price data come from CRSP.
- Firm characteristics come from Compustat.
- Institutional ownership and institution names come from Thomson Reuters.
- The effective spread is calculated using trade and quote data from TAQ.
[6] These methods lead to conservative measures of EDGAR requests from human users, because they may have large type I errors in identifying web crawlers. Many institutions have more users than IP addresses, so they use network address translation (NAT) to route web traffic. As a result, many users might share a single outbound IP address, and the above methods might falsely tag these IPs as web crawlers.
[7] The data can be downloaded from the MaxMind ASN (autonomous system number) data file. To generate a time-varying mapping between IP blocks and institutions, I use the Wayback Machine to extract historical versions of the ASN file. Details of the data construction process can be found in Online Appendix B.
Summary statistics
Table B.1 shows the summary statistics of the variables. To limit the influence of outliers, I winsorize all variables at the 1% and 99% levels.[8]

My main regression sample contains 131,683 press releases from 7,503 unique firms. On average, a firm issues about 12.50 press releases per year, 3.81 of which are issued in the first 10 seconds of an hour. On average, 48% of the press releases are covered by Dow Jones Newswire, and each press release receives an average of 1.75 media articles.
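Winsorizing at the 1% and 99% levels can be sketched with NumPy. The sketch uses NumPy's default linear-interpolation percentiles, ignores missing values when computing the cutoffs, and winsorizes over the pooled sample. The thesis does not state the software or whether winsorization is done within years, so other tools (for example, Stata routines) may produce slightly different cutoffs.

```python
import numpy as np


def winsorize(x, lower=1.0, upper=99.0):
    """Clip values outside the [lower, upper] percentiles to those percentiles.

    NaNs are ignored when computing the cutoffs and are left as NaN.
    """
    x = np.asarray(x, dtype=float)
    lo, hi = np.nanpercentile(x, [lower, upper])
    return np.clip(x, lo, hi)
```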
1.3 Identification strategy
This section introduces the identification strategy of the paper, which proceeds in four steps:
1. I first show that press releases cluster and become overcrowded at specific times within a day.
2. Next, and most importantly, I show that within these busy clusters, some press releases randomly stay longer than similar releases in the prominent position of the computer screen that journalists monitor (their “on-screen” time).
3. I then show that longer on-screen time leads to significantly more media coverage from Dow Jones Newswire.
4. Finally, I test for possible violations of the exclusion restriction, which requires that on-screen time be exogenous to the actions of analysts and investors.
[8] All of the results in this paper are robust to removing the winsorization.
1.3.1 Press release clustering
Using 738,196 press releases from 8,756 firms from April 1, 2006 to December 31, 2017, I show that firms issue most of their press releases outside trading hours. Moreover, within each hour, press releases cluster heavily at the exact-hour and half-hour points.
[Figure A.2 here]
Figure A.2(a) shows that press releases cluster in non-trading hours. The figure is constructed in three steps:
1. I split a 24-hour day into 288 5-minute intervals.
2. For each firm, I calculate the percentage of its press releases published in each 5-minute bin.
3. For each 5-minute interval, I average these percentages across the 8,756 firms.
Each bar represents the average percentage of press releases published in that 5-minute interval, and the dashed lines represent the 95% confidence intervals of the group means.

The first observation from Figure A.2(a) is the contrast between trading and non-trading hours. The pre-market period (7-9AM) and the post-market period (4PM) contain the majority of the press releases, while trading hours contain far fewer.
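The averaging behind Figure A.2(a) can be sketched as follows. Each firm's percentage is computed first, and the percentages are then averaged across firms, so every firm gets equal weight regardless of how many releases it issues. The (firm, hour, minute) input layout and the function name are hypothetical, and the confidence intervals are not computed here.

```python
from collections import Counter, defaultdict


def average_bin_shares(releases, bin_minutes=5):
    """Average across firms of each firm's percentage of releases in each bin.

    releases: iterable of (firm, hour, minute) tuples (hypothetical layout).
    Returns one average percentage per bin (288 bins for 5-minute intervals).
    """
    n_bins = 24 * 60 // bin_minutes
    per_firm = defaultdict(Counter)
    for firm, hour, minute in releases:
        per_firm[firm][(hour * 60 + minute) // bin_minutes] += 1

    totals = [0.0] * n_bins
    if not per_firm:
        return totals
    for counts in per_firm.values():
        n_firm = sum(counts.values())
        for b, c in counts.items():
            totals[b] += 100.0 * c / n_firm  # this firm's percentage in bin b
    # Bins where a firm has no releases contribute 0% for that firm.
    return [s / len(per_firm) for s in totals]
```

Equal firm weighting is what the text describes ("the average percentage of 8,756 firms"). Pooling all releases instead would let prolific issuers dominate the figure.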
Moreover, the number of press releases also varies considerably within each hour. For example, while 8AM is a busy hour, most of its press releases are issued in the first five minutes (8:00-8:04) and in the five minutes after the half-hour (8:30-8:34). To make the pattern easier to see, I use a darker shade for the two five-minute bins following the exact-hour and half-hour points, and the darker bars stand out in almost every hour.

The clustering at exact-hour and half-hour points is even more evident in Figure A.2(b), which plots the percentage of press releases by minute of publication. On average, over 25% of a firm’s press releases are issued in the first minute of an hour, and almost 15% in the thirty-first minute. Together, these two minutes account for over 40% of all press releases.
The preference for making announcements at these “integral” points is easy to understand. When we make appointments, we naturally tend to schedule events at integral points like the exact hour or the half-hour. The same social convention applies when managers choose disclosure times. Consider a management team discussing when to disclose new earnings results: the plan will very likely be “to disclose at 8 o’clock sharp” rather than “let’s do 8:03”.

The benefit of this convention is that, knowing important releases will happen at the exact- or half-hour points, typical investors and analysts only need to pay close attention at these specific times. The convention thus greatly reduces the “idle” time spent waiting for announcements and frees investors and analysts to perform other tasks at other times.
However, such social conventions could negatively impact wire journalists. Investors and analysts typically monitor disclosures from only a few firms, so the probability that these firms issue press releases at the same time is low; even if they do, the volume is easy to manage. Wire journalists, in contrast, cover events for the entire market, so at busy times like 4PM sharp the number of firms issuing press releases can add up quickly. As Table B.1 shows, press releases issued in the first 10 seconds of 7-9AM or 4PM are followed, on average, by 51 new press releases in the next 30 seconds.

In the next section, I show that a shorter on-screen time significantly reduces media coverage even after controlling for a stringent set of firm and event characteristics. More importantly, the variation in on-screen time is likely exogenous.
1.3.2 Exogenous variation of on-screen time
The on-screen time of a press release affects its media coverage because of the unique way wire journalists produce news from press releases. A typical workflow for wire journalists is to monitor a real-time press release feed, select newsworthy events, summarize the content, and distribute the article to their subscribers. The real-time feed updates constantly, and the volume of new press releases determines how quickly the user interface updates. Therefore, during busy times like 8:00 or 16:00, the screen in front of journalists updates rapidly as new releases flood in, creating cognitive challenges.
The most important building block of the identification strategy is that, during these busy times, some press releases stay in a prominent position on the user interface longer than similar press releases. The real-time systems that journalists monitor follow a “first-in-first-out” rule: new content always appears at the top of the user interface, pushing current content down and then onto later pages. The length of time a press release stays in a prominent position (e.g., the first page or the top of the screen) is therefore determined by the speed at which new releases replace it.

Now consider a cluster of press releases all issued at 4 o’clock sharp. A press release queued near the beginning of the cluster has a short on-screen time, because the rest of the cluster following it quickly replaces it. In comparison, a press release queued near the end of the cluster is followed by far fewer releases, so it gets a longer on-screen time than the releases queued near the beginning.

In this paper, I define each cluster as the first 10 seconds of each hour. The queuing of press releases within such a tight time frame is likely exogenous, as firms have neither the incentive nor, in many cases, the ability to precisely control their order within the queue.[9] Moreover, the size of the cluster is shifted by the number of press releases from other firms, so the ultimate on-screen time is unlikely to be determined by the actions of any single firm.
[9] For example, the user interface in PR Newswire only allows users to schedule press releases to the minute.
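The first-in-first-out argument can be made concrete with a toy calculation. The sketch computes how long each release stays among the top `slots` items of a feed, assuming a release leaves the prominent area once `slots` newer releases have arrived. The screen capacity `slots` and the function name are hypothetical. The thesis does not observe on-screen time directly, which is why it relies on NPRA as an inverse proxy.

```python
def on_screen_times(arrivals, slots):
    """Seconds each release spends among the `slots` newest items of a FIFO feed.

    arrivals: release times in seconds, in feed order (non-decreasing).
    slots: hypothetical number of items visible in the prominent area.
    Returns None for releases not yet pushed off by the end of the data.
    """
    times = []
    for i, t in enumerate(arrivals):
        j = i + slots  # the `slots`-th newer release pushes release i off
        times.append(arrivals[j] - t if j < len(arrivals) else None)
    return times


if __name__ == "__main__":
    # Three releases at 4:00:00 sharp, then a trickle of later releases.
    print(on_screen_times([0, 0, 0, 1, 2, 10, 40], slots=3))
```

In the example, the three releases stamped at second 0 stay prominent for 1, 2, and 10 seconds respectively. The later a release sits in the tied cluster, the longer it survives, as the text argues.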
To support its randomness, I show in covariate balancing tests that on-screen time is uncorrelated with many firm and event characteristics. More importantly, the same characteristics significantly predict the amount of media coverage from Dow Jones Newswire.

To conduct the covariate balancing test, I first construct a sample of busy clusters. Figure A.2(b) shows that the first minute of an hour is the busiest, so I define the busy clusters as the first 10 seconds of each hour. Using a tight time frame to define busy clusters has two additional benefits. First, the press releases in these clusters are similar in nature. Second, the tight time frame greatly reduces the concern about strategic press release timing discussed in DellaVigna and Pollet (2009) and Michaely, Rubin, and Vedrashko (2016).

As a proxy for on-screen time, I construct NPRA, the number of new press releases issued in the next 30 seconds. As the NPRA of a press release increases, its on-screen time decreases. I then estimate the following regression:
$$\log(\mathrm{NPRA}_{ij} + 1) = \beta X_{ij} + \delta + \varepsilon_{ij} \qquad (1.1)$$
In the regression, the dependent variable is the log of 1 plus NPRA. The key independent variable, X, represents firm characteristics, including the market-to-book ratio (Q), total assets, and firm age. Solomon and Soltes (2012) show that these characteristics are strongly correlated with the amount of media coverage. I also use event characteristics, such as event sentiment or title length,[10] as the independent variable X.

RavenPack combines sophisticated natural language processing techniques with expert reviews to generate a set of sentiment scores. The goal of these sentiment scores is to create informative event summaries and to assist in trading. For example, the press release in which Sanofi announced positive results for a trial study on June 6th, 2015[11] receives an event sentiment score (ESS) of 87. In comparison, the press release in which Micron disclosed decreases in demand receives an ESS of 17.[12]

The regression also controls for a moderate set of fixed effects, including firm, date, hour, and broad topic fixed effects. The regression sample contains 131,683 press releases published in the first 10 seconds of each hour from April 2006 to December 2017.
[10] While a better measure would be the length of the full press release, RavenPack does not provide this or any similar measure.
[Table B.2 here]
Table B.2 shows that none of these characteristics significantly predicts NPRA: the individual coefficient estimates are all insignificant. Furthermore, in Column (7), I use all the firm and event characteristics as independent variables, and the joint F-statistic is only 0.39, with a p-value of 0.886.

However, the same set of characteristics significantly predicts the amount of media coverage. In Column (8) of Table B.2, I use the log of 1 plus the number of media articles on the event day as the dependent variable. In sharp contrast to the previous columns, all the coefficient estimates are significant, and their joint F-statistic is 147.7. The significant estimates in Column (8) show that these variables are highly relevant to newsworthiness. Yet none of them is significantly correlated with the proxy for on-screen time. This is consistent with the conjecture that, in these tightly defined busy clusters, the variation in on-screen time is largely random.
In the next section, I show that a shorter on-screen time leads to less media coverage, satisfying the relevance condition for using on-screen time as an instrument for media coverage. In the final section, I discuss possible violations of the exclusion restriction and find no evidence that it fails.
[11] See http://mediaroom.sanofi.com/sanofi-announces-positive-results-for-toujeo-in-phase-iii-study-extension-in-japanese-people-with-uncontrolled-diabetes-2/ for the press release.
[12] See http://investors.micron.com/releasedetail.cfm?ReleaseID=440412 for the press release.
1.3.3 On-screen time and media coverage
I test whether the on-screen time of a press release affects its media coverage by estimating the following regression:
$$\mathrm{AbnNews}_{ijt} = \beta \log(\mathrm{NPRA}_{ijt} + 1) + \delta + \varepsilon_{ijt} \qquad (1.2)$$
In the regression, $\mathrm{AbnNews}_{ijt}$ is the abnormal media coverage measure defined in Section 1.2. The key independent variable is log(NPRA + 1). NPRA measures the number of new press releases issued in the 30 seconds immediately after press release j and is an inverse proxy for its on-screen time. δ represents the set of fixed effects included in the regression. As in Table B.2, the regression sample contains all press releases issued in the first 10 seconds of an hour.
[Table B.3 here]
I find that a larger NPRA, and thus a shorter on-screen time, significantly decreases the amount of media coverage, even after controlling for a stringent set of firm and event characteristics. Table B.3 shows the results.

As a benchmark, Column (1) includes only date-hour fixed effects to control for differences across clusters. The coefficient estimate is highly significant and economically large. Doubling NPRA,[13] which shortens the on-screen time of a press release, decreases abnormal news by 12.8%.

I incrementally introduce more fixed effects to control for possible omitted variables at the firm and press release levels. Column (2) further includes firm fixed effects to control for time-invariant firm characteristics. Compared with Column (1), the estimate slightly increases in magnitude (from -0.128 to -0.132). Column (3) introduces firm-year fixed effects, so the analysis essentially compares two press releases issued by the same firm in the same year. The more stringent firm-level controls actually increase the magnitude of the coefficient estimate, to -0.163. These results suggest that endogenous factors at the firm level likely work against my finding the effect.
[13] The standard deviation of log(NPRA+1) is 1.11; thus doubling NPRA is close to an increase of one standard deviation.
To control for the characteristics of different press releases, I use the topic measures developed by RavenPack, which applies sophisticated textual analysis to categorize press releases into topic groups. These topic measures essentially summarize the underlying events.

In Column (4) of Table B.3, I further include fixed effects for broad topic classifications (28 unique groups). With these fixed effects, Column (4) still shows a significant estimate of -0.100. In Column (5), I control for a more detailed topic classification (143 topic groups). The richer fixed effects only slightly decrease the magnitude of the coefficient estimate, which becomes -0.093 with a t-statistic of -7.99. Overall, these results show that as on-screen time decreases, or NPRA increases, the firm receives less media coverage.
These results are robust to alternative news measures, sample selection, and the functional form of the dependent variable:
- News measures: I find consistent and significant results with different news measures, including non-duplicated news, “flash” news that includes only a title, and “full” articles that have at least a paragraph.
- Sample selection: I find almost identical results when I use different samples to address potential sample selection issues. Section 1.6 discusses these issues in more detail.
- Functional form: the results are robust to replacing the log measure of media coverage with raw counts or dummy variables as the dependent variable. Table B.13 in the Appendix shows the results.
In the next section, I discuss possible violations of the exclusion condition.
1.3.4 Exclusion restriction requirement
To use on-screen time as an instrument for media coverage, the exclusion restriction requires that on-screen time not correlate with the information production of investors and analysts except through media coverage. There are two possible cases in which this condition fails.

First, some omitted variable might affect both on-screen time and the information production of investors and analysts. Such a case is unlikely. Previous tests show that on-screen time does not correlate with common observables, and the regressions include stringent fixed effects that absorb much of the firm-level and event-level variation. Instead of adding more controls and showing that the results hold, in the first part of this section I perform falsification tests. These tests show that it is indeed the limited cognitive resources of journalists, rather than unobserved factors, that cause the effects.

The second concern is that investors and analysts might be similarly and directly affected by on-screen time. This concern is also unlikely, given the drastically different event streams that wire journalists and other finance professionals monitor. In the second part of this section, I provide evidence that on-screen time does not correlate with information exposure at the industry level, which is the relevant level for investors and particularly analysts. In Section 1.6, I provide further empirical support by exploiting the fact that journalists and finance professionals respond differently to press releases from private firms.
Falsification test
In this section, I conduct a series of falsification tests to show that it is indeed the limited
cognitive capacity of journalists, instead of unobserved omitted variables, that drives the effects
documented in Table B.3.
[Table B.4 here]
First, in parallel to human journalists, automated algorithms at Dow Jones Newswire also redistribute some press releases. These algorithms are typically based on editorial judgments about which firms or event types are relevant to the market. Under the exclusion condition, NPRA should not correlate with these economic factors, so we would expect NPRA to have little impact on such automated coverage. Indeed, Column (1) of Table B.4 shows an insignificant estimate of -0.005 when the dependent variable, DJPR, is a dummy that equals 1 if a press release is redistributed by the automated algorithms.

Another natural placebo test is whether media coverage before the press release day, which could reflect existing interest in the upcoming disclosure, changes with NPRA. Column (2) of Table B.4 also shows an insignificant estimate of -0.003.
If limited cognitive capacity is the cause, then the effect of on-screen time should be stronger at busier times and disappear at times that are not busy. Consistent with this hypothesis, I find similar effects for press releases published in the 31st minute of each hour. As shown in Figure A.2(b), the 31st minute, which is the half-hour point, also holds clusters of press releases. Column (3) of Table B.4 shows a significant estimate of -0.085, indicating a similar effect in the 31st minute.
The results change drastically when I estimate the same regression using press releases from all other minutes (excluding the 1st and 31st minutes). Column (4) of Table B.4 in fact shows a positive coefficient estimate of 0.051. This result suggests that endogenous factors very likely work against my finding the effect. When issuing important press releases, firms likely make more effort to ensure that the releases go out on time. Such releases are therefore more likely to sit at the early part of a cluster and have higher NPRA. These endogenous forces work against finding a negative effect of NPRA on media coverage.
I also split the sample into busy hours (7-9AM and 4PM) and all other, non-busy hours, based on the pattern in Figure A.2(a). Again, we would expect the effect to be stronger in the busy hours and insignificant in the non-busy hours. Columns (5) and (6) of Table B.4 show consistent results: the coefficient estimate is -0.112 in the busy hours and becomes insignificant in all other hours.

Finally, I sort the sample into quintiles by NPRA and test how the effect changes across quintiles. Column (7) of Table B.4 shows that the effects are insignificant in the first two quintiles (lower NPRA) and become significant and stronger in the quintiles with higher NPRA.
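The quintile sort can be sketched with NumPy. The sketch assumes breakpoints are computed over the pooled regression sample. The text does not say whether the sort is pooled or done within clusters or years, and because NPRA is a count with many ties, the resulting quintiles may differ in size. The function name is hypothetical.

```python
import numpy as np


def npra_quintiles(npra):
    """Assign quintile 1 (lowest NPRA) through 5 (highest) using sample breakpoints."""
    npra = np.asarray(npra, dtype=float)
    cuts = np.percentile(npra, [20, 40, 60, 80])
    # side="right" puts values equal to a breakpoint in the lower quintile.
    return np.searchsorted(cuts, npra, side="right") + 1
```

The quintile labels could then enter the regression as indicators interacted with log(NPRA + 1). That specification detail is not given in this section.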
Effect of on-screen time on investors and analysts?
First, Table B.4 shows that the effect arises only when the amount of new information is so large that cognitive capacity is stretched to its limit. Compared with wire journalists, investors and analysts follow a smaller set of firms. Almost all real-time systems allow for information filters, so investors and analysts can monitor an individually customized event stream. As a result, even when the amount of information for the whole market is too much to handle, the information exposure of investors and analysts remains easily manageable at the industry level. As supporting evidence, Column (6) of Table B.2 shows that on-screen time is uncorrelated with the total number of press releases issued by firms in the same industry (2-digit SIC[14]).
[14] The result is robust to using 1-digit SIC industries or the text-based industry classification of Hoberg and Phillips (2016).
The second necessary condition for the effect is that the user monitors the event stream in real time. Wire journalists do not directly control the information inflow, because their job is to screen information as it arrives. Investors and analysts, on the other hand, may adopt other information acquisition methods. They could actively search (Da, Engelberg, and Gao, 2011; Ben-Rephael, Da, and Israelsen, 2017) or set up alerts only for the firms or events they intend to follow. In these modes, the on-screen time of a press release is irrelevant.
In Section 1.6, I provide additional empirical evidence supporting the hypothesis that on-screen time does not directly affect investors or analysts. The tests rest on the assumption that wire journalists follow press releases from private firms while investors and analysts do not. If all the effects indeed operate through the media, a decrease in on-screen time caused by new press releases from private firms should have an equally strong effect. I find evidence consistent with this hypothesis, and Section 1.6 presents the detailed results.
1.4 Main effect on information production
In this section, I use on-screen time, inversely measured by NPRA, as an instrument to study the effect of media on investors and analysts. For most of the analyses, I use two-stage least squares (2SLS) regressions. In the first stage, I estimate Equation 1.2, and in the second stage, I estimate the following regression:
$$Y_{ijt} = \beta \, \widehat{\mathrm{AbnNews}}_{ij0} + \delta + \varepsilon_{ijt} \qquad (1.3)$$
In the regression, the dependent variable is the information production of investors or analysts on day t, where day 0 is the event day. The key independent variable, $\widehat{\mathrm{AbnNews}}_{ij0}$, is the predicted abnormal media coverage on day 0 from the first-stage regression. The regression includes the same set of fixed effects, namely firm-year, date-hour, and detailed topic fixed effects.

Table B.4 reveals that the blink effect is only significant in busy hours (7-9AM and 4PM). The following analyses therefore use only the press releases published in the first 10 seconds of these busy hours to show sharper results, though all my results are robust to using all hours.
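The 2SLS logic can be sketched in NumPy for the just-identified case with one instrument, log(NPRA + 1), and one endogenous regressor, AbnNews. The sketch assumes the fixed effects (firm-year, date-hour, topic) have already been partialled out of all variables, for example by a within transformation. The function name and the simulated data are hypothetical.

The sketch returns only the second-stage point estimate and a homoskedastic first-stage F:
- The thesis's reported F-statistic (25.5) and standard errors may use clustered or robust variance, which this sketch does not reproduce.
- Naive OLS standard errors from the second stage would be wrong, so none are returned.

```python
import numpy as np


def two_stage_least_squares(y, x, z):
    """Just-identified 2SLS: one endogenous regressor x, one instrument z.

    Assumes fixed effects were already partialled out of y, x, and z.
    Returns (second-stage slope, first-stage F under homoskedasticity).
    """
    y, x, z = (np.asarray(a, dtype=float) for a in (y, x, z))
    n = len(y)
    Z = np.column_stack([np.ones(n), z])

    # First stage: regress the endogenous variable on the instrument.
    gamma, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ gamma

    # With a single instrument, the first-stage F equals the squared t-statistic.
    resid = x - x_hat
    sigma2 = resid @ resid / (n - Z.shape[1])
    var_gamma = sigma2 * np.linalg.inv(Z.T @ Z)
    first_stage_f = gamma[1] ** 2 / var_gamma[1, 1]

    # Second stage: regress the outcome on the first-stage fitted values.
    X_hat = np.column_stack([np.ones(n), x_hat])
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta[1], first_stage_f


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 20_000
    z = rng.normal(size=n)                 # instrument, standing in for log(NPRA + 1)
    u = rng.normal(size=n)                 # unobserved confounder
    x = -0.5 * z + u + rng.normal(size=n)  # endogenous regressor, e.g. AbnNews
    y = 0.3 * x + u + rng.normal(size=n)   # outcome with a true effect of 0.3
    print(two_stage_least_squares(y, x, z))
```

In the simulation, OLS of y on x would be biased upward by the confounder u, while the 2SLS slope recovers roughly 0.3. With one instrument and an intercept, the slope also equals the ratio cov(z, y) / cov(z, x).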
Information acquisition through EDGAR searches
I first show that media coverage increases information acquisition by investors, measured by the abnormal number of EDGAR requests made by human users, AbnEdgar. The columns of Table B.5 proceed as follows:
- Column (1) regresses AbnEdgar on AbnNews, both measured on the press release publication day (day 0). The coefficient estimate is highly significant: doubling the amount of media coverage increases the abnormal number of EDGAR requests by 19%. Note that the regression controls for the same stringent set of fixed effects, which absorb all annual firm-level characteristics.
- Column (2) regresses day-0 AbnEdgar directly on the instrument, log(NPRA + 1). The coefficient estimate is -0.034: when NPRA increases, and thus the on-screen time of the press release decreases, EDGAR searches also decrease.
- Column (3) shows the second-stage result. The first-stage result is the same as that reported in Column (5) of Table B.4,[15] and the F-statistic of the instrument is 25.5, well above the common threshold of 10. Consistent with Column (2), I find a positive effect of media coverage on EDGAR searches: the 2SLS estimate implies that doubling media coverage on the event day leads to 29% more EDGAR searches on the same day.
- Column (4) replaces the dependent variable with the cumulative abnormal number of EDGAR searches on days 0 and 1 and finds a slightly larger estimate of 0.313.
Overall, the results show that media coverage significantly increases EDGAR searches.
[15] The regressions in Columns (1)-(4) of Table B.5 have slightly fewer observations because the EDGAR log data stop in June 2017. Thus, the exact first-stage result differs slightly from the result reported in Column (5) of Table B.4.
Figure A.3(b) plots the coefficient estimates from Equation 1.3 for other event days. To obtain them, I re-estimate Equation 1.3, replacing the dependent variable with the abnormal number of EDGAR searches on each of the other event days. The figure yields three interesting observations. First, the coefficient estimate for the day before the press release is insignificant and close to 0. This result is consistent with the falsification test in Table B.4 and further supports the exclusion restriction. Second, the coefficient estimates on days 1 to 3 are also significant at the 95% level. Because these EDGAR searches happen after the event day, the direction of causation is clearer. Third, by day 5, the coefficient estimate has reverted almost completely to 0, consistent with an effect driven by a transient shock.
There are two caveats to using EDGAR requests as a measure of investors' information acquisition. First, EDGAR requests represent a lower bound on overall information acquisition. Investors may acquire information from alternative sources, including professional services such as Bloomberg or online services such as Yahoo! Finance. However, since I document a positive effect of media on EDGAR searches, I am also estimating a lower bound on the positive effect. Even so, the magnitudes of the coefficient estimates are quite large (31% over a two-day period). The second caveat is that not all EDGAR requests come from investors. To partly address this issue, in Section 1.5 I identify a set of IP addresses that belong to institutional investors, and importantly, their EDGAR searches also respond positively to media coverage.
Earnings forecasts of analysts
I next test how media coverage changes the information production of analysts. In Columns (5) to (8) of Table B.5, the dependent variable is the abnormal number of analysts issuing earnings forecasts for the firm. Following the same sequence of analyses, I first estimate an OLS regression of AbnAnalyst on AbnNews in Column (5). The coefficient estimate is both large and highly significant, even after controlling for the strict set of fixed effects: doubling the amount of media coverage increases the number of analysts issuing earnings forecasts for the firm by 28.8%. In Column (6), I regress AbnAnalyst directly on the instrumental variable, and in Column (7), I estimate the effect using 2SLS. Both columns show consistent results: doubling the amount of media coverage increases the number of analyst forecasts by 46.9%, and the t-statistic for the second-stage estimate exceeds 6. In Column (8), I find an even larger effect for the cumulative abnormal number of analyst forecasts.
Figure A.3(c) plots the coefficient estimates from Equation 1.3 for other event days. First, as with EDGAR searches, the coefficient estimate is not significantly different from 0 on the day before the press release. Second, the coefficient estimate is even larger on day 1 than on the event day, a pattern consistent with Figure A.1. Third, the coefficient estimate quickly becomes insignificant from day 2 onward. Indeed, analysts often produce highly time-sensitive reports, and they constantly trade off speed against precision (Beyer, Cohen, Lys, and Walther, 2010).
The large effect of media coverage on stock analysts is surprising, given the conventional wisdom that analysts are among the best-informed participants in the financial market. It is the job of analysts to follow corporate disclosures closely and provide timely analysis. Moreover, analysts typically follow a small number of firms and have superior access to information (e.g., direct communication with management). It is therefore hard to see how media coverage might affect the production function of analysts. However, media coverage might affect the demand for earnings forecasts. As noted in a survey of 182 analysts by Brown, Call, Clement, and Sharp (2015), “Demand from their clients is analysts’ most important motivation for making profitable stock recommendations and their second most important motivation for issuing accurate earnings forecasts”. To better understand the complementary relationship between media on the one hand and investors and analysts on the other, I explore possible mechanisms and test their relative strength in explaining the empirical evidence.
1.5 Mechanism
This section discusses possible mechanisms behind the complementary relationship between media coverage on the one hand and investors and analysts on the other. I first introduce the possible mechanisms. Next, I show how the effects vary across analysts and investors with different information production skills. Then, I show how the effects vary across subsamples of firms. Finally, I present the effect of media coverage on market outcomes to shed light on the underlying mechanism.
1.5.1 Possible mechanisms
To guide the later analyses, I start with a general framework for categorizing the possible mechanisms. Suppose an information producer, whether an investor or an analyst, can choose to produce information about one firm from a pool of candidate firms. The information producer chooses the target firm to maximize his expected payoff. As with the payoff to producers of physical goods, the payoff to this information producer has two parts: the reward part, which is determined by the demand for information about the chosen firm, and the cost part, which is determined by the producer's own production technology. The empirical results show that when the media covers a specific firm, the information producer is more likely to switch his optimal choice to the media-covered firm. The goal of this section is to understand why.
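The two-part payoff can be read as a simple choice rule: pick the firm with the largest reward minus cost. The toy sketch below uses hypothetical firms and numbers, and the `Candidate` class is invented for illustration. It shows that a demand-side shift and a cost-side shift can produce the same switch to the media-covered firm, which is why the choice alone cannot reveal the channel.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    reward: float  # demand-driven value of a signal about this firm (hypothetical units)
    cost: float    # cost of producing that signal with the producer's technology


def best_target(candidates):
    """Return the name of the firm that maximizes reward minus cost."""
    return max(candidates, key=lambda c: c.reward - c.cost).name


firm_a = Candidate("A", reward=1.0, cost=0.4)
firm_b = Candidate("B", reward=0.9, cost=0.4)
baseline = best_target([firm_a, firm_b])

# Demand side: media coverage of B raises the reward to learning about B.
demand_shift = best_target([firm_a, Candidate("B", reward=1.2, cost=0.4)])

# Cost side: media coverage of B lowers the cost of producing a signal about B.
cost_shift = best_target([firm_a, Candidate("B", reward=0.9, cost=0.2)])

print(baseline, demand_shift, cost_shift)
```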
The effect might work through the cost part, by changing the production technology. First, there could be a learning mechanism, in which media coverage provides additional information that lowers the information producer's cost of generating more precise signals. Goldstein and Yang (2015) present a model in which two fundamentals affect the security payoff. They show that as the signal about one fundamental becomes more precise, investors have incentives to acquire more information about the other fundamental. One type of information that media articles can provide is sentiment. For example, Tetlock (2007) documents that the sentiment expressed in a Wall Street Journal column can predict even aggregate trading on the next day. If media coverage helps investors better gauge investor sentiment, then investors may also extend their research on the fundamental value of the firm. However, news articles produced by wire journalists are supposed to be factual and precise. In many cases, the article is simply a headline and contains little information beyond the original announcement, so the learning mechanism is unlikely in the context of this paper.
In addition to the learning mechanism, media coverage could also affect the production side through an attention mechanism. Investors face a fundamental search problem when picking stocks from thousands of candidates. It is thus possible that investors would not have known about the underlying event if not for its media coverage. The existing literature on media has shown that media coverage has a significant impact on investors' attention, particularly that of retail investors. Yet, again, it is hard to imagine that analysts rely on the media to learn about the existence of corporate disclosures. After all, analysts are compensated for having an information advantage. Investors, however, might be subject to inattention, and if that is indeed the case, we would expect the effect to be weaker for less inattentive investors.
Alternatively, the effect might work through the reward part of the information producer's payoff. For the same signal generated by the information producer, media coverage could make the signal more valuable by shifting the demand for it. I call this the demand-side mechanism. Existing empirical evidence shows that media coverage increases the volatility of expected returns (Peress, 2014; Blankespoor, deHaan, and Zhu, 2018), changes the investor base (Barber and Odean, 2008), and possibly increases mispricing (Hillert, Jacobs, and Müller, 2014; Ahern and Sosyura, 2015). As a result, sophisticated investors may interpret media coverage as a reliable signal of a higher reward to their information production. Indeed, in the framework of Kacperczyk, Van Nieuwerburgh, and Veldkamp (2016), investors first choose a subset of firms that they will later “learn” about. One prediction of the model is that more resources are allocated to assets with high prior payoff variance, which is precisely the effect of media documented in prior papers. As investors increase their information demand for media-covered events, this increase in demand might also shift the payoff to stock analysts, whose clients are mostly institutional investors.
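Why higher prior payoff variance raises the value of learning can be seen with standard normal-normal Bayesian updating. A signal with noise variance s² lowers the posterior variance of a payoff with prior variance σ² by σ² - (1/σ² + 1/s²)⁻¹ = σ⁴/(σ² + s²), which increases in σ². The sketch below is a generic illustration under these assumptions, with hypothetical variance levels. It is not the Kacperczyk, Van Nieuwerburgh, and Veldkamp (2016) model itself.

```python
def variance_reduction(prior_var, signal_noise_var):
    """Drop in posterior variance after observing one normal signal about a normal payoff."""
    posterior_var = 1.0 / (1.0 / prior_var + 1.0 / signal_noise_var)
    return prior_var - posterior_var


# Same signal quality, three hypothetical levels of prior payoff variance.
prior_vars = (0.5, 1.0, 2.0)
reductions = [variance_reduction(v, signal_noise_var=1.0) for v in prior_vars]
for v, r in zip(prior_vars, reductions):
    print(f"prior variance {v}: variance reduction {r:.3f}")
```

With identical signal quality, the same research effort removes more uncertainty about the high-variance asset. This is the sense in which learning capacity is more valuable there.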
To shed light on the above mechanisms, I conduct three sets of analyses. First, I test how the results change with the skills of the information producer. Second, I test how the results change with the characteristics of the underlying firm. Finally, I test the effect of media coverage on market outcomes.
It is important to note that these mechanisms need not be mutually exclusive. In fact, they likely coexist, as the market contains information producers with different levels of sophistication. It is also beyond the scope of this paper to reject a mechanism or to attribute the observed effects fully to a single mechanism. What I try to achieve here is to evaluate the relative prevalence of the mechanisms and to test whether the data support their predictions. This line of inquiry would benefit from future research that can better distinguish among these channels.
1.5.2 Information producers with different skills
In this section, I test whether the effects of media differ across information producers with different skills. If the effect of media works mainly through changes in production technology, we would expect investors and analysts with different skills to react differently. Moreover, because the media articles in the context of this paper are often simple headlines and contain little additional information, we would expect the results to be stronger for lower-skilled or less-resourced information producers, who lack alternative channels and might rely more on media articles to gather information.
However, I find contradictory evidence for EDGAR searches, as higher-skilled institutional investors exhibit even larger responses to media coverage. Institutional investors are among the most sophisticated participants in the financial market, and relative to retail investors, they should suffer less from inattention. To identify searches from institutional investors, I first match IP addresses to known institutions that have an autonomous system number (ASN). The IP-block and ASN link file comes from MaxMind (https://dev.maxmind.com/geoip/geoip2/geolite2/). The link file specifies the ranges of IP addresses assigned to different institutions. Because these institutions are not limited to the finance industry, the second step is to identify which of them are actually financial institutions. I use two methods to do so. First, I directly search for finance-related words like “bank” or “fund” in the institutions' names. Second, I compile a list of the names of all 13F institutions and use a name-matching algorithm (based on code written by Jim Bessen, available at http://goo.gl/m4AdZ) to identify institutions that are in the universe of 13F institutions. Online Appendix B describes these two methods in further detail.

Columns (1) and (2) of Table B.6 show the regression results for EDGAR searches from financial institutions. The dependent variable is the cumulative abnormal number of EDGAR requests on days 0 and 1. In Column (1), I identify institutional investors by searching for finance-related words, while in Column (2), I identify them by matching to the names of 13F institutions. Both columns show significant coefficient estimates that are larger in magnitude (0.374 and 0.385) than the baseline estimate (0.313).
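The two identification steps can be sketched as follows: map an IP address to the institution whose block contains it, then flag that institution by keyword or by fuzzy matching against 13F filer names. The IP blocks, institution names, 13F names, keyword list beyond “bank” and “fund”, and the 0.9 cutoff are all hypothetical. The matcher uses Python's standard `difflib` as a stand-in and is not the Bessen algorithm used in the text.

```python
import bisect
import difflib
import ipaddress
import re

# Hypothetical IP blocks and institution names (in practice, from an IP-block/ASN link file).
IP_BLOCKS = [
    ("192.0.2.0/24", "Example Fund Advisors LLC"),
    ("198.51.100.0/24", "Example State University"),
    ("203.0.113.0/24", "Sample Asset Mgmt"),
]
# Hypothetical 13F filer names.
FILERS_13F = ["EXAMPLE FUND ADVISORS, LLC", "SAMPLE ASSET MANAGEMENT INC"]
# Hypothetical keyword list; the text gives "bank" and "fund" as examples.
FINANCE_WORDS = re.compile(r"\b(bank|fund|capital|asset|investment)s?\b", re.IGNORECASE)

_NETWORKS = sorted(
    ((ipaddress.ip_network(cidr), name) for cidr, name in IP_BLOCKS),
    key=lambda pair: int(pair[0].network_address),
)
_STARTS = [int(net.network_address) for net, _ in _NETWORKS]


def institution_for_ip(ip):
    """Binary-search the sorted, non-overlapping blocks for the one containing ip."""
    addr = ipaddress.ip_address(ip)
    i = bisect.bisect_right(_STARTS, int(addr)) - 1
    if i >= 0 and addr in _NETWORKS[i][0]:
        return _NETWORKS[i][1]
    return None


def is_finance_by_keyword(name):
    return bool(FINANCE_WORDS.search(name))


def normalize(name):
    """Lowercase, drop punctuation and legal suffixes, expand one common abbreviation."""
    name = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    name = re.sub(r"\b(llc|inc|lp|ltd|corp|co)\b", " ", name)
    name = re.sub(r"\bmgmt\b", "management", name)
    return " ".join(name.split())


def matches_13f(name, filer_names, cutoff=0.9):
    """Fuzzy-match a name against 13F filer names using difflib (a stand-in matcher)."""
    candidates = [normalize(f) for f in filer_names]
    return bool(difflib.get_close_matches(normalize(name), candidates, n=1, cutoff=cutoff))


name = institution_for_ip("203.0.113.57")
print(name, is_finance_by_keyword(name), matches_13f(name, FILERS_13F))
```

Binary search over block start addresses keeps each lookup logarithmic in the number of blocks. That matters when the log contains millions of requests.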
I also find little empirical support for the attention mechanism. Investors who already follow the firm on the EDGAR system also significantly increase their EDGAR searches after media coverage. As shown by Chen, Cohen, Gurun, Lou, and Malloy (2017), investors (mutual fund managers) exhibit very persistent search activity on the EDGAR system. Moreover, investors who have already searched for a firm's filings in the past suffer less from the search problem associated with inattention (Barber and Odean, 2008), because they already know about the firm. If inattention were the only cause of the effect of media on EDGAR searches, we would expect a weaker effect for these existing EDGAR users. However, I find a slightly stronger effect for these investors. In Column (3) of Table B.6, I include only human EDGAR searches for which the same IP address accessed any document of the same firm in the previous month. The coefficient is 0.402, larger than the baseline effect of 0.313.
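The “existing user” filter can be sketched as a single pass over date-sorted requests. It assumes that “the previous month” means a 30-day lookback and that only accesses on strictly earlier days count. Both are assumptions, as are the hypothetical IP and firm identifiers in the example.

```python
from datetime import date, timedelta
from itertools import groupby


def flag_existing_users(requests, lookback_days=30):
    """Flag requests whose IP accessed the same firm's filings within the lookback window.

    requests: iterable of (ip, firm_id, day). Only accesses on strictly earlier
    days count, so several requests on the same day do not flag each other.
    """
    last_seen = {}  # (ip, firm_id) -> most recent earlier day with an access
    flagged = []
    ordered = sorted(requests, key=lambda r: r[2])
    for day, group in groupby(ordered, key=lambda r: r[2]):
        group = list(group)
        for ip, firm, _ in group:
            prev = last_seen.get((ip, firm))
            is_existing = prev is not None and day - prev <= timedelta(days=lookback_days)
            flagged.append((ip, firm, day, is_existing))
        # Update only after the whole day is processed.
        for ip, firm, _ in group:
            last_seen[(ip, firm)] = day
    return flagged


REQUESTS = [  # hypothetical (ip, firm_id, day) log rows
    ("ip_a", "firm_1", date(2016, 1, 5)),
    ("ip_a", "firm_1", date(2016, 1, 20)),
    ("ip_b", "firm_1", date(2016, 1, 20)),
    ("ip_a", "firm_1", date(2016, 4, 1)),
]
flags = flag_existing_users(REQUESTS)
```

Here ip_a's Jan. 20 request counts as an existing user (15 days after its first access). Its Apr. 1 request does not, because the gap exceeds 30 days.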
The evidence from analysts also offers little support for the production technology channels, as higher-skilled analysts react equally or perhaps more strongly to media coverage. I first measure analysts' skills by their relative forecast accuracy in the previous year, calculated following the procedures in Hong and Kubik (2003) and Ljungqvist, Marston, Starks, Wei, and Yan (2007). The procedure exactly follows footnote 6 in Ljungqvist, Marston, Starks, Wei, and Yan (2007). For analyst i covering firm k in year t, I first calculate the absolute forecast error in three steps: (1) take the analyst's most recent forecast of year-end EPS issued between Jan. 1 and Jun. 30, (2) calculate its difference from the subsequently realized earnings, and (3) scale the difference by the previous year-end price. Then, for all the analysts covering firm k in year t, I rescale the absolute forecast errors so that the most and least accurate analysts score one and zero, respectively. Finally, analyst i's relative forecast accuracy in year t is his or her average score across the stocks he or she covers over years t-2 to t.

In Column (4) of Table B.6, the dependent variable is the abnormal number of analyst forecasts from analysts whose relative accuracy is above the median accuracy in the previous year. The dependent variables in Columns (4) to (9) are the cumulative abnormal number of analyst forecasts on days 0 and 1. Column (4) shows a coefficient estimate of . In Column (5), I test for the analysts who have below-median accuracy in the previous year, and the estimated effect, , is close to the estimate in Column (4). In Column (6), the dependent variable is the difference between the dependent variables in Columns (4) and (5), and the coefficient estimate is insignificant. In Columns (7) to (9), I alternatively measure the skills of analysts by their experience with the firm, defined as the number of years in the past five years that the analyst has covered the same firm. I find that analysts with longer experience with the firm, as shown in Column (7), react even more strongly to media coverage than less experienced analysts, as shown in Column (8).
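The relative-accuracy steps above translate into three small functions: scaled absolute errors, within-firm-year rescaling, and a three-year average. This is a sketch with hypothetical inputs and function names. It also rests on three assumptions not stated in the text: fiscal years end in December, firm-years where all analysts have identical errors are skipped, and each analyst's accuracy is an equal-weighted average over firm-years.

```python
from collections import defaultdict
from datetime import date


def absolute_errors(forecasts, actual_eps, prior_price):
    """Scaled absolute error of each analyst's last Jan. 1 to Jun. 30 forecast, by firm-year.

    forecasts: iterable of (analyst, firm, year, issue_date, eps_forecast).
    actual_eps, prior_price: dicts keyed by (firm, year); prior_price is the
    previous year-end price. Assumes fiscal years end in December.
    """
    latest = {}
    for analyst, firm, year, issued, eps in forecasts:
        if not (date(year, 1, 1) <= issued <= date(year, 6, 30)):
            continue
        key = (analyst, firm, year)
        if key not in latest or issued > latest[key][0]:
            latest[key] = (issued, eps)
    return {
        key: abs(eps - actual_eps[key[1:]]) / prior_price[key[1:]]
        for key, (_, eps) in latest.items()
    }


def firm_year_scores(errors):
    """Rescale errors within each firm-year: most accurate analyst 1, least accurate 0."""
    by_firm_year = defaultdict(dict)
    for (analyst, firm, year), err in errors.items():
        by_firm_year[(firm, year)][analyst] = err
    scores = {}
    for (firm, year), errs in by_firm_year.items():
        lo, hi = min(errs.values()), max(errs.values())
        if hi == lo:
            continue  # assumption: firm-years with no dispersion are skipped
        for analyst, err in errs.items():
            scores[(analyst, firm, year)] = (hi - err) / (hi - lo)
    return scores


def relative_accuracy(scores, analyst, year):
    """Average score across the analyst's firm-years in years year-2 to year."""
    vals = [s for (a, _, y), s in scores.items() if a == analyst and year - 2 <= y <= year]
    return sum(vals) / len(vals) if vals else None


FORECASTS = [  # hypothetical forecasts
    ("A1", "F1", 2015, date(2015, 2, 1), 1.00),
    ("A1", "F1", 2015, date(2015, 5, 1), 1.10),  # latest forecast inside the window
    ("A1", "F1", 2015, date(2015, 8, 1), 2.00),  # issued after Jun. 30: ignored
    ("A2", "F1", 2015, date(2015, 3, 1), 1.50),
    ("A3", "F1", 2015, date(2015, 4, 1), 1.40),
]
ACTUAL_EPS = {("F1", 2015): 1.20}
PRIOR_PRICE = {("F1", 2015): 10.0}
scores = firm_year_scores(absolute_errors(FORECASTS, ACTUAL_EPS, PRIOR_PRICE))
```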
1.5.3 Results by different firm characteristics
In this section, I test whether the effects of media coverage differ across subsamples of firms.
I first split the firms into two groups based on their average institutional ownership in the previous year. I then estimate Equation 1.3 separately for each subsample, and Table B.7 reports the second-stage regression results. The dependent variables are all cumulative abnormal measures on days 0 and 1. In Columns (1) and (2), EDGAR searches show a slightly larger and much more significant response in the subsample with above-median institutional ownership. This result again lends little support to the attention mechanism. If media coverage mainly helped to solve the search problem, firms that are already owned by many institutional holders should be “easily searched”, yet this subsample shows even larger responses to media coverage.
Columns (5) and (6) show strong support for the demand-side mechanism behind the effects of media on analysts. In Column (5), I find that the effect of media (1.01) is much stronger in the subsample with higher institutional ownership than in Column (6) (0.51) or in the baseline model (0.78). Indeed, higher institutional ownership suggests that the increased information demand is more likely to extend to analysts. In untabulated results, I find consistent results when I use the number of institutional investors, rather than the level of ownership, to split the sample.
I also split the sample into two groups based on idiosyncratic volatility (IVOL). In Columns (3) and (4) of Table B.7, I find that the effect of media coverage is larger and more significant in the low-IVOL subsample (0.395) than in the high-IVOL subsample (0.275). High-IVOL firms typically have higher costs of arbitrage, so investors, especially arbitrageurs, might choose not to trade the stock in the first place. In comparison, analyst forecasts show similar responses in the two subsamples. Since high-IVOL and low-IVOL firms might differ in how difficult their earnings are to predict, the results further suggest that the effect of media is unlikely to come from the production side.
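The subsample splits can be sketched as an IVOL calculation followed by a median split. The text does not state how IVOL is computed, so the one-factor market-model residual volatility below is an assumption. The simulated returns and noise levels are hypothetical. The same `above_median` helper would apply to prior-year institutional ownership.

```python
import numpy as np


def idiosyncratic_vol(stock_ret, mkt_ret):
    """Residual standard deviation from a one-factor market-model regression (assumed IVOL)."""
    X = np.column_stack([np.ones_like(mkt_ret), mkt_ret])
    coef, *_ = np.linalg.lstsq(X, stock_ret, rcond=None)
    resid = stock_ret - X @ coef
    return resid.std(ddof=2)  # two estimated parameters


def above_median(values):
    """True for observations strictly above the cross-sectional median."""
    values = np.asarray(values, dtype=float)
    return values > np.median(values)


# Simulated daily returns for five hypothetical firms over one year.
rng = np.random.default_rng(1)
mkt = rng.normal(0.0, 0.01, 250)
noise_sd = [0.005, 0.01, 0.02, 0.03, 0.04]
ivols = [idiosyncratic_vol(mkt + rng.normal(0.0, sd, 250), mkt) for sd in noise_sd]
high_ivol = above_median(ivols)  # defines the high- and low-IVOL subsamples
```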
Overall, the results are consistent with the demand-side explanation, particularly for stock analysts. Analysts show a stronger response to media coverage in firms with larger institutional holdings, consistent with a greater likelihood that institutional clients request information from analysts. Note that while the results for EDGAR requests are generally consistent with the demand-side mechanism, they cannot definitively rule out the production-side mechanisms. In the next section, I provide additional evidence from market outcomes to shed further light on the underlying channels.
1.5.4 Market outcomes
The rational attention allocation hypothesis in this paper suggests that sophisticated investors focus on media-covered events to obtain higher returns. This section shows that media coverage attracts trades from both potentially informed traders and noise traders. In addition, while informed investors enter the market on the event day itself, more potential noise traders arrive on the following two days. Consistent with a tug of war between these two types of investors, I find that while trading volume increases significantly, overall price efficiency is not affected. The evidence in this section lends strong support to the demand-side mechanism.
The market outcome variables come from a variety of sources. The trading volume, intraday price range, and daily stock return data come from CRSP. I calculate the abnormal daily turnover following Tetlock (2010) as the log of 1 plus the turnover minus the log of 1 plus the average daily turnover in the past 60 trading days. The effective spread variables, both equal-weighted and value-weighted, are generated from TAQ data following Goyenko, Holden, and Trzcinka (2009).
More specifically, it is calculated as $2\,|\log(P_k) - \log(M_k)|$, where $P_k$ is the price of trade $k$ and $M_k$ is the midpoint of the consolidated BBO at the time of the trade. I define abnormal spread
following Blankespoor, deHaan, and Zhu (2018) as the effective spread over the average daily effective spread in the past 60 trading days. This definition differs slightly from that of Blankespoor, deHaan, and Zhu (2018), who use the window [-40, -11] to calculate the average effective spread. I use the window of the past 60 trading days for consistency with other tests, and my results are robust to using the window [-40, -11] instead. Following Peress (2014), I define the daily range as the log of the daily high price minus the log of the daily low price. Finally, I calculate abnormal returns by subtracting the CRSP value-weighted index return from the daily raw return. Because the sample contains press releases from both pre- and post-market hours, I now define the event day, or day 0, as the first trading day after the press release is issued.
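The market outcome definitions above can be written as small functions. This sketch works on hypothetical inputs and does no CRSP or TAQ data handling. It reads “effective spread over the average” as a ratio and uses trade dollar volume as the value weights. Both readings are assumptions.

```python
import math


def abnormal_turnover(turnover, past_turnover):
    """log(1 + turnover) minus log(1 + mean daily turnover over the prior window)."""
    return math.log1p(turnover) - math.log1p(sum(past_turnover) / len(past_turnover))


def effective_spread(trade_prices, midpoints, weights=None):
    """Weighted average of 2|log(P_k) - log(M_k)| across trades (equal weights by default)."""
    spreads = [2 * abs(math.log(p) - math.log(m)) for p, m in zip(trade_prices, midpoints)]
    if weights is None:
        weights = [1.0] * len(spreads)
    return sum(w * s for w, s in zip(weights, spreads)) / sum(weights)


def abnormal_spread(spread, past_spreads):
    """Effective spread divided by the mean daily effective spread over the prior window."""
    return spread / (sum(past_spreads) / len(past_spreads))


def daily_range(high, low):
    return math.log(high) - math.log(low)


def abnormal_return(raw_return, vw_index_return):
    return raw_return - vw_index_return


# Hypothetical trades: two prints around a $10.00 midpoint.
prices, mids, shares = [10.02, 9.99], [10.0, 10.0], [100, 300]
ew_spread = effective_spread(prices, mids)
vw_spread = effective_spread(prices, mids, [p * s for p, s in zip(prices, shares)])
```

Because the larger trade in this example has the smaller spread, the value-weighted spread falls below the equal-weighted one.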
[Table B.8 here]
I start by showing that media coverage significantly increases trading volume. I estimate Equation 1.3 with abnormal trading volume as the dependent variable. Column (1) of Table B.8 shows the regression results where the dependent variable is the abnormal turnover on the event day. Doubling the amount of media coverage increases abnormal turnover by 49.2%. The large effect on trading volume is consistent with a growing literature that finds similar effects. Barber and Odean (2008) find that news coverage significantly increases the buying activity of retail investors. Engelberg and Parsons (2011) find that investors trade a stock more when local newspapers cover the firm's earnings. Peress (2014) documents a 12% decrease in trading volume on strike days when newspapers could not be produced or delivered. Blankespoor, deHaan, and Zhu (2018) find that trading volume increases by approximately 11% when the firm is covered by machine-generated news articles. Fedyk (2018) shows that front-page news articles induce 280% higher trading volume in the next 10 minutes than similar news in less prominent positions. The large effect documented in this paper confirms that my unique setting is highly relevant to the financial market.
As evidence of the participation of informed traders, I find that the effective spread widens on the event day with more media coverage. In Columns (2) and (3), I use the equal-weighted and value-weighted effective spreads on the event day as the dependent variables. The coefficient estimates in both columns are significant. Doubling the amount of media coverage increases the equal-weighted (value-weighted) abnormal effective spread by 23.9% (26.6%), suggesting an increase in the relative proportion of informed trades. The result is consistent with the increased information production of institutional traders documented earlier. The effect on the effective spread disappears in the next two trading days, as shown in Columns (7) and (8) of Table B.8. The coefficient estimate for the average daily abnormal effective spread is insignificant and quantitatively tiny (-0.036 for equal-weighted and -0.014 for value-weighted). Indeed, as analysts' earnings forecasts increase the amount of public information and more retail traders participate in trading, the information asymmetry faced by market makers should decrease.
The evidence also shows that media coverage attracts more price-insensitive traders on the days after the press release day. More specifically, on days [1, 2], media coverage does not change the overall level of returns, but it increases the intraday trading price range. Column (9) of Table B.8 shows the regression result with the absolute cumulative abnormal return on days [1, 2] as the dependent variable. The coefficient estimate is only 0.003 and insignificant. In contrast, Column (10), where the dependent variable is the average daily price range on days [1, 2], shows a significant coefficient estimate. These results are similar to the findings in Peress (2014), who finds that the absence of newspapers decreases the intraday price range while the aggregate level of returns is unchanged. Peress (2014) attributes similar effects to “less price-sensitive traders who transact at less favorable prices”.
Collectively, the results suggest a tug of war between two types of investors: one attracted to the stock by media-driven attention, and the other consciously trading more in media-covered firms and profiting by trading against these potentially uninformed traders. The lead-lag pattern, in which informed trading happens on the event day while less price-sensitive trading happens on the next two days, is possibly caused by slow information diffusion to retail traders, who might rely on mass media for information (Blankespoor, deHaan, and Zhu, 2018). The pattern also echoes the findings in Ben-Rephael, Da, and Israelsen (2017), who document a similar lead-lag relationship in information searches. These two types of traders are likely to have opposite effects on price efficiency. Consistent with this, I find that overall price efficiency is unchanged by media coverage.
To measure price efficiency, I use the delayed response ratio from DellaVigna and Pollet (2009). While DellaVigna and Pollet (2009) use a period of 75 days to measure the total cumulative return, I construct the delayed response ratio over a range of periods, because my sample is not restricted to earnings announcements, which are the focus of DellaVigna and Pollet (2009). As Table B.9 shows, none of the measures yields significant results. The results are in line with the findings of Blankespoor, deHaan, and Zhu (2018), who find that news neither improves nor impedes price efficiency.
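A delayed response ratio over a chosen horizon can be sketched as the share of the cumulative abnormal return over days [0, T] that arrives from day 2 onward. The day-2 start of the delayed window, the horizons, and the abnormal-return path below are assumptions; the text specifies only the 75-day total window. Because a single firm's ratio is very noisy, the sketch applies it to a hypothetical averaged abnormal-return series.

```python
def delayed_response_ratio(abn_returns, horizon, delay_start=2):
    """CAR[delay_start, horizon] / CAR[0, horizon].

    abn_returns[d] is the abnormal return on event day d (d = 0, 1, 2, ...).
    Returns None when the total response is zero and the ratio is undefined.
    """
    if len(abn_returns) < horizon + 1:
        raise ValueError("need abnormal returns through the horizon day")
    total = sum(abn_returns[: horizon + 1])
    if total == 0:
        return None
    return sum(abn_returns[delay_start : horizon + 1]) / total


# Hypothetical average abnormal-return path for days 0 to 75.
path = [0.020, 0.010, 0.005, 0.005] + [0.0] * 72
ratios = {h: delayed_response_ratio(path, h) for h in (5, 20, 75)}
```

A lower ratio means more of the price response happens immediately, so comparing ratios across high- and low-coverage events is one way to ask whether media coverage speeds up price discovery.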
[Table B.9 here]
1.6 Robustness
This section addresses several concerns that might bias the results of the paper.
1.6.1 Are the effects through media coverage?
This paper argues that all the changes in EDGAR searches and analyst forecasts are causally driven by media coverage. This interpretation might not be valid if variation in on-screen time directly affects investors and analysts. In that case, the observed effects would not necessarily operate through journalists but could instead reflect a common shock to all participants. However, this paper argues that the measured on-screen time is specific to wire journalists because of the way they monitor and process information, and the measure is uncorrelated with the amount of information exposure at the industry level. In this section, I provide additional empirical support that the effects most likely operate through journalists.
An ideal experiment to test whether the effects operate through journalists is to find another shock that affects only journalists. If the effects do not operate through journalists, such an exogenous shock should cause no changes in analysts and investors. One possible candidate for such a shock is the number of press releases from private firms. While some investors might also care about news from private firms, it is unlikely that they monitor the press releases from these firms in real time, because such information is generally not tradable. In comparison, wire journalists also cover news events from private firms, and press releases from private firms affect wire journalists in the same way as releases from public firms do. Therefore, variation in the number of press releases from private firms is likely a shock that affects only wire journalists.
Given this assumption, I first test whether the composition of NPRA, that is, whether the following press releases come from private or public firms, changes the effect on analysts and investors. If the effects do not operate through journalists, then for a given press release, as the proportion of the following press releases that come from private firms increases, the effects on investors and analysts should decrease (as the absolute number of press releases from public firms decreases), while the effect on journalists should stay unchanged (as the total number of press releases does not change). I test this hypothesis by including an interaction term, log(NPRA + 1) x % Priv, where % Priv is the percentage of press releases in NPRA that are issued by private firms. The dependent variables come from the main tests: the abnormal news coverage, the two-day cumulative abnormal number of EDGAR searches, the two-day cumulative abnormal number of analyst forecasts, the abnormal turnover on the event day, the absolute abnormal return on the event day, and the price range on the event day. Panel (A) of Table B.10 shows that the coefficient estimate for the interaction term is insignificant in all the tests, while the coefficient estimates for log(NPRA + 1) are almost identical to those in my baseline results.
[Table B.10 here]
The insignificant effect could also be due to low power of the test. I therefore next test whether variation in the number of press releases from private firms can generate effects similar to those documented in my main tests. In Panel (B) of Table B.10, I directly use log(NPRPriv + 1) as the instrument, where NPRPriv is the number of press releases from private firms in the next 30 seconds following the press release. The results show that all the coefficient estimates for log(NPRPriv + 1) are significant, and their magnitudes are similar to the effects in previous tables.
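The crowding measures used in these tests can be computed from a time-ordered feed of press releases. This minimal sketch assumes each release carries a timestamp in seconds and a private-firm flag. It also assumes NPRA uses the same 30-second window that defines NPRPriv; that window is not stated for NPRA in this section, so it is left as a parameter. Setting % Priv to 0 when no releases follow is a further assumption, though it does not affect the interaction term, which is then 0 anyway.

```python
import bisect
import math


def crowding_measures(releases, window=30.0):
    """releases: list of (timestamp_seconds, is_private), sorted by publication order.

    For each release, count the releases published after it within `window` seconds
    and build log(NPRA + 1), log(NPRPriv + 1), % Priv, and their interaction.
    """
    times = [t for t, _ in releases]
    private_cum = [0]  # private_cum[j] = number of private releases among the first j
    for _, is_private in releases:
        private_cum.append(private_cum[-1] + int(is_private))
    out = []
    for i, (t, _) in enumerate(releases):
        j = bisect.bisect_right(times, t + window)  # first index beyond the window
        n_after = j - (i + 1)
        n_priv = private_cum[j] - private_cum[i + 1]
        pct_priv = n_priv / n_after if n_after else 0.0
        log_npra = math.log(n_after + 1)
        out.append({
            "log_npra": log_npra,
            "log_nprpriv": math.log(n_priv + 1),
            "pct_priv": pct_priv,
            "interaction": log_npra * pct_priv,
        })
    return out


# Hypothetical feed: (seconds after the hour, issued by a private firm?)
feed = [(0, False), (5, True), (10, False), (50, True)]
measures = crowding_measures(feed)
```

Prefix sums of the private flag make each private-firm count a constant-time subtraction. This keeps the whole pass near-linear even on a busy hour with many releases.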
Altogether, the results show that investors and analysts do not react differently when the following press releases are issued by private firms. The evidence is most consistent with the hypothesis that the effects in this paper operate through journalists, which supports interpreting the results as the causal effects of media.
1.6.2 Pre-scheduled press releases
One limitation of the paper is that I cannot distinguish press releases that are pre-scheduled from those that are submitted manually on the spot. A valid concern is that most pre-scheduled press releases show up in the first few seconds and thus typically have higher NPRA and shorter on-screen time. If pre-scheduled press releases were very different from manually submitted releases, the estimates might be biased. To evaluate this concern, I rerun the main tests using a sample that excludes all the press releases published in the first second of an hour. The logic is that the first second contains a much larger proportion of pre-scheduled press releases, so if such a bias exists, we would obtain very different estimates with this new sample. Panel (C) of Table B.10 shows the regression results. All the coefficient estimates are very close to the baseline results, suggesting that pre-scheduled press releases are unlikely to drive the results.
1.6.3 Endogenous disclosure
Another limitation of the paper is that I cannot address firms' endogenous incentives to issue or withhold press releases, so there could be some sample selection bias. However, since this paper uses a very tight time frame to construct the sample and includes a rigid set of fixed effects, it is not clear whether and how endogenous press release issuance might bias the results. To partially address the issue, I focus on the subset of press releases that are likely required by regulation, for which the concern about endogenous disclosure is smaller. More specifically, I include press releases that are about earnings or are accompanied by a new EDGAR filing on the same day. Panel (D) of Table B.10 shows similar results using this subsample.
1.7 Conclusion
This paper documents that wire journalists may inefficiently select and report corporate events
when they reach their cognitive limits. The problem arises because corporate press releases over-
crowd at specific times, forcing journalists to process a lot of information quickly. During these
busy times, the exogenous variation of how visually salient a press release stays on the journalists’
computer screen affects the amount of media coverage.
This paper finds strong support for the view that the salience of press releases is affected by exogenous shocks
that most directly impact wire journalists. This unique setting thus allows me to study the causal
effect of media coverage on other market participants. I find that media coverage significantly
increases the information acquisition of investors, and the effect also exists among investors who are
less likely to be inattentive. The increased information demand from institutional investors also affects
stock analysts, who issue more earnings forecasts for media-covered events. The effect is most
prominent for resource-constrained analysts and becomes stronger in firms with higher institutional
ownership.
The results of the paper are consistent with the rational attention framework in Kacperczyk,
Van Nieuwerburgh, and Veldkamp (2016). Sophisticated investors may rationally allocate their
learning to media-covered events, which are empirically associated with higher price volatility and
possibly more mispricing. Their increased information demand also affects analysts, especially
the resource-constrained analysts who would crowdsource their coverage decisions based on their
clients’ needs. The market outcomes are consistent with the tug of war between these two types of
investors. The effective spread increases with media coverage only on the event day. On the next
two days, the intra-day price range increases with more media coverage while the overall absolute
return is unchanged.
As new data and technology bring more information to the financial market, a division of labor
between different information intermediaries, including media, analysts, and investors themselves,
could greatly improve the efficiency of processing new information. However, as different information
intermediaries become more intertwined, the inefficiency of one member could now affect others
as well. This paper provides novel evidence that inefficiency in the wire media could also impact
other information channels and cause large market reactions. Moreover, I find that equilibrium
forces do not necessarily reverse such exogenous shifts in attention. Even more sophisticated
market participants, who suffer less from inattention, might find it more rewarding to chase
media-covered events.
Chapter 2
U.S. Innovation and Chinese Competition for Innovation Production
China now has the wealth, commercial sophistication and technical expertise to make its pursuit of technological
leadership work. The fundamental issue for the U.S. and other western nations, and the IT sector is how to
respond ...
Office of the United States Trade Representative, March 28, 2018 report
2.1 Introduction
A growing body of research focuses on China’s meteoric rise as an economic power and
its impact on the innovation spending of established firms in the United States. This growing body
of research has been matched by a growing interest in the same issue among policy makers, politicians,
and the popular press. Issues at stake include job loss, the incentives to innovate, and intellectual
property protections. Yet the existing literature disagrees even on the most basic question: does
an increase in foreign competition have a positive or negative impact on the intensity of innovative
investment in the U.S.?
On the surface, the answer might seem obvious: increased competition is a negative shock, and
afflicted firms should reduce investment in R&D if this competition is in the form of strategic
substitutes, as is true in many markets. Yet this prediction is not a given even if firms compete under
strategic substitutes. For example, Aghion, Bloom, Blundell, Griffith, and Howitt (2005) suggest
that firms might increase R&D following increased competition, as this might facilitate “escaping
competition” through increased product differentiation. Bloom, Draca, and Van Reenen (2016)
further predict that when firms have “trapped assets” that are difficult to redeploy, or high adjustment
costs, these incentives to increase innovative spending become even stronger. In particular, these firms
will maintain high ex ante production levels despite lower prices, as curtailing production is too
costly. The increased innovative spending then restores some pricing power through
differentiation. It has thus become an empirical question whether increased competition leads to increases or
decreases in innovation spending.
The existing empirical evidence is also mixed. Autor, Dorn, Hanson, Pisano, and Shu (2018)
find a negative relation between competition shocks measured using trade data and R&D in the U.S.
However, Bloom, Draca, and Van Reenen (2016) find that competition shocks (measured using
trade data) lead to increased R&D spending in a sample of European firms. Hombert and Matray
(2018) also examine U.S. firms, and find that firms that are ex ante R&D intensive experience more
positive outcomes due to their greater ability to use R&D to escape competition. We take
a new approach to this question that examines how innovation responses differ by
the type of competition and by firms’ asset composition. We also consider the direct channel through
which Chinese firms acquire information over the internet.
We propose that global competition influences innovation through at least two competitive
margins, each having different implications for innovation spending in the U.S. The first is the margin
covered by the existing studies: direct import competition in the market for existing products.
These existing studies use tariffs and import data, reinforcing their focus on the margin of existing
products. The second margin, which has not been studied in the U.S.-China innovation literature, is
direct competition in the market for innovation and intellectual property itself. Importantly, shocks
to tariffs and imports cannot be used as direct shocks to this margin, as both relate to products
that already exist, and hence their impact on intellectual property (IP) competition would only be
indirect and observed with delay.
We study the competitive impact of Chinese innovation on U.S. innovation using
direct measures of Chinese firms’ ability to access innovation in the U.S. over the internet. Traditional
instruments such as tariffs and direct imports apply to existing product competition, and not to
competition in the race to create new technologies. We propose that industry agglomeration and internet
penetration at the province level in China can be used to generate plausibly exogenous variation in
the capacity of Chinese firms to challenge U.S. firm innovation in particular industries. The rationale
is that intellectual property itself is a form of information, and the internet has proven to be an
efficient means of accumulating knowledge, especially when the knowledge to be gathered resides
overseas and is in electronic form.
In our main analysis, we examine how U.S. firms change their innovative investment in the
face of plausibly exogenous changes in intellectual property (IP) competition from China. We
find that treated U.S. firms significantly reduce R&D spending over a lengthy three-year period
after treatment. These same firms obtain fewer patents over the same horizon, and at the same
time, there is a material increase in Chinese patents in these same intellectual property production
markets. In particular, there is a strong increase in new patents by Chinese inventors that directly
cite the existing technology of the treated U.S. firms. This crowding-out effect is unique to China:
Chinese internet penetration is not associated with analogous complaints about IP competition from
other major international competitors.
Competition in the market for intellectual property likely has a strong industry-specific
component. Relying on the agglomeration literature, we use industry production locations at the province
level in China to identify geographic regions where the most skilled and specialized human capital
for a given industry resides. We then build industry-specific measures of Chinese internet
penetration by mapping province-level data on internet penetration to the primary industry
locations in each province. Because internet penetration in different geographic regions depends on
the ability of unrelated utility companies (internet service providers) to provide digital
infrastructure, variation in this internet penetration is plausibly exogenous. Intuitively, the provision of
high-quality internet depends in part on the distribution of population in that region, geographic
features, and the relative efficiency of ISPs in different regions. Province-level penetration thus varies
substantially across provinces and over time.
1
This framework allows us to create an industry-year
panel of instruments for China’s capacity to access innovation information that can plausibly
challenge U.S. firms. In turn, this panel data approach gives us adequate power and variation to test
our key hypotheses even in the presence of rigid firm and year fixed effects.
Although we are careful to note limitations in our ability to fully establish causal conclusions,
we conduct a number of tests that at least partially establish the validity of our instrument. First,
we find that our industry-year measures of Chinese internet penetration predict a higher ex post
incidence of U.S. firms complaining about competition from China, specifically complaints about
Chinese competition related directly to technology and intellectual property in their 10-K
documents filed with the SEC. Finding increased complaints about Chinese access to their technological
and intellectual property indicates successful identification of the second competitive margin of
innovation noted above. Moreover, placebo tests indicate no evidence of similar complaints about
competition from other regions of the world, including Japan, Europe, and neighboring countries such
as Canada and Mexico. This test is a powerful placebo, as complaints about competition from
these other regions are more common unconditionally than are complaints about competition from
China.
1
Roberts and Whited (2013) suggest that variation along geographic dimensions has good properties regarding
identification.
We thus expect significant increases in competition in the market for intellectual property
coming directly from Chinese firms, but no increases in competition coming from firms
in other parts of the world. We find that our internet penetration measure strongly predicts higher
rates of patent citations by Chinese inventors citing the patents of the treated U.S. firms in our
sample. We observe no changes in citation rates by inventors from the other regions of the world.
Finally, we also find higher rates of patents applied for in China itself that cite these same U.S. firm
patents.
The strong results specifically for China (and not elsewhere) illustrate the mechanism driving
intellectual property competition and indicate that omitted economic state variables, such as
worldwide industry supply or demand factors, likely cannot explain our results. Our framework, which
includes region, firm, and time fixed effects, also ensures that identification comes from specific
Chinese provinces (mapped using industry agglomeration), and not from changes in China that
are nationwide in scope. These findings support the validity of the exclusion restriction, as our
instrument only measures shocks to innovative potential in China itself, and we observe a strong
impact on the specific U.S. firms that should be impacted.
Our hypothesis regarding the competitive margin of IP production predicts that our results
should be stronger in specific subsamples. We first examine whether our results are stronger in
industries with stronger growth options as measured by the market-to-book ratios of the treated
U.S. firms. As predicted, we find that firms with above-median market-to-book ratios experience
more extreme ex post reductions in innovative investment and patents following competitive IP
shocks from China.
A second prediction is that trapped assets will moderate these findings, as hypothesized by
Bloom, Draca, and Van Reenen (2016). U.S. firms with more tangible or “trapped” assets have
incentives to maintain high levels of innovation given high adjustment costs, and hence they should
reduce innovation less when these competitive shocks materialize. We use the asset tangibility
of U.S. firms as our measure of trapped assets and find that firms with more tangible, and
thus more “trapped,” assets do increase their relative R&D spending and patents in the face of
increased competition to differentiate their existing products. However, consistent with the literature,
we view this as driven by competition in existing products rather than competition in intellectual
property production.
Our findings regarding growth options and trapped assets provide deeper insights into the
importance of an industry’s initial conditions, and how they shape the predictions regarding the impact
of increased IP competition. These competing forces can help to explain much of the disagreement
in the existing empirical literature, where both positive and negative competitive effects on
innovation have been found. Key to our conclusion is that at least two margins of competition need to
be separately explored. We find that direct competition in the market for intellectual property itself
has a sharp negative impact on treated firms due to the intuitive crowding-out effect.
In contrast, if competition only increases in the market for existing products rather than in the
market for IP production, it is more plausible that treated U.S. firms might increase innovation in
order to escape competition. Such a strategy is most likely to be optimal when the Chinese
competitors do not have the innovative capacity to compete on this second margin. For example,
in such a market, ceding market share in the lowest-quality existing products to the entrants, while
increasing innovation so that incumbents can claim the higher-quality segments of the market,
can form the basis for the post-shock equilibrium. This approach can restore some pricing power
for incumbents, while accommodating the entering rivals in the segment where their lower labor
costs are most advantageous.
Although our focus is on competitive intensity in the market for innovation, it is natural to ask
if our results inform the more controversial issue of intellectual property theft. We believe that our
study does not directly address IP theft, although it helps to motivate future research on the topic. A
starting point is that IP theft and fair competition should have a similar impact on treated U.S. firms.
Both will crowd out innovative spending as the foreign entrants claim a fraction of the rents for
themselves. On the surface, the increase in patents we find suggests that IP theft is less likely, as the
foreign innovators are securing legally defensible patent protection. However, this alone does not
rule out IP theft, as the new patents might have roots in stolen trade secrets or
other intellectual property.
In order to at least partly inform whether our results relate to IP theft, we examine the extent
to which U.S. firms complain directly about IP theft in their 10-Ks. First, we note that power is
relatively low in such tests given that such direct statements are less prevalent than are comments
directly referencing competition. Nevertheless, we find suggestive evidence that our internet
penetration instrument predicts a higher incidence of complaints about IP theft by the treated U.S.
firms. This evidence suggests that IP theft, or “perceived IP theft,” might explain part of the
increased competition in these IP markets. Yet we caution readers not to draw strong conclusions
from this analysis because power is limited and statements by firms about IP theft do not
constitute direct proof that IP theft has in fact occurred. The underlying question of potential IP theft
is important for future research to consider, as policy implications differ for IP theft versus fair
competition.
2.2 Literature and Hypotheses
We examine product market globalization, and the competitive impact of foreign product market
competition on innovation outcomes in a domestic market. We thus focus our discussion on the
existing literature on U.S.-China competition and how it relates to innovation spending, and to
papers that specifically cover competition in innovative markets.
The existing global innovation literature typically focuses on the impact of shocks to foreign
competition in the market for existing products. We propose that foreign competition plays out on
more than one competitive margin and that foreign competitors can challenge domestic firms both
on pricing existing products, and also by entering the competitive race for innovation.
The concept of competition in the market for innovation in the domestic U.S. market has been
studied extensively.
2
In an international context, Hombert and Matray (2018),
Bloom, Draca, and Van Reenen (2016), and Autor, Dorn, Hanson, Pisano, and Shu (2018) study the
impact of competition from international trade on innovation. However, no study to our knowledge
has examined the impact of product market globalization on the dual margins of competition in the
existing product markets and in the market for innovation.
2
Early work on innovation and competition has been summarized in the survey by Reinganum (1989), with more recent
contributions by Phillips and Zhdanov (2013) and Bena and Li (2014).
Globalization of product markets results in the opening of borders, and the impact on any nation
can be modeled using theories of entry in markets with existing incumbents. In classical models of
competition with strategic substitutes, such as the Cournot model, the central prediction is that an
entrant will cause existing firms to downsize as the new competitor absorbs a fraction of the market
share and applies upward pressure on total quantity produced and downward pressure on prices. If the
value of growth options in such a market is proportional to the scale of the firm, a natural
follow-on prediction regarding innovation (our setting) is that such competitive shocks will also lead to
reductions in ex post innovation by incumbents as they reduce scale.
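To make the classical prediction concrete, consider a textbook symmetric Cournot model with linear inverse demand $P = a - bQ$, constant marginal cost $c < a$, and $N$ identical firms. This is an illustrative assumption only, not a model estimated in this chapter. Imposing symmetry $Q_{-i} = (N-1)q$ in the first-order condition yields the equilibrium, and comparing $N$ with $N+1$ firms shows the effect of one entrant:

```latex
\begin{aligned}
&\max_{q_i}\ \big(a - b(q_i + Q_{-i}) - c\big)\, q_i
  \;\Rightarrow\; a - bQ_{-i} - 2b\,q_i - c = 0, \\
&q^*_N = \frac{a-c}{b(N+1)}, \qquad
 Q^*_N = \frac{N(a-c)}{b(N+1)}, \qquad
 P^*_N = \frac{a + Nc}{N+1}, \\
&q^*_{N+1} - q^*_N = -\frac{a-c}{b(N+1)(N+2)} < 0, \qquad
 P^*_{N+1} - P^*_N = -\frac{a-c}{(N+1)(N+2)} < 0.
\end{aligned}
```

Each incumbent’s output and the market price fall with entry while total output $Q^*_N$ rises in $N$; if the value of growth options scales with $q_i$, it falls as well, which is the downsizing channel described above.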
More recent research has challenged this classical view. Aghion, Bloom, Blundell, Griffith, and
Howitt (2005) suggest that a shock to competition could result in increases in innovation as firms
rush to differentiate their products in order to rebuild lost market power. This is the “escape
competition” hypothesis. The validity of this alternative hypothesis depends at least in part on incumbent
firms having a technological advantage relative to the new entrants, as only then would they
additionally be able to defend their differentiated products from entrants.
The classical theory and the escape competition theory thus have opposite predictions. It is
therefore perhaps not surprising that existing studies find mixed evidence regarding the impact of
Chinese competition on the innovation intensity of domestic firms. These studies, however, only
examine one competitive margin: competition in the market for existing products. Indeed, on this
margin, it is quite plausible that the ideal conditions for the escape competition strategy might hold
in some markets.
How do these predictions change if the entrants are also adept at producing innovation?
Examining this issue is our main contribution. We propose that the overall effect of Chinese competition
and internet penetration on a domestic incumbent’s innovation spending has two parts: (1)
increased competition from the foreign rivals in the market for existing products and (2) increased
competition from the foreign rivals in the market for innovation itself. The existing literature
illustrates the ambiguous predictions regarding the former, whereas it is largely silent on
competition in intellectual property.
Our first hypothesis relates to the margin of competition for innovation, where we predict that
increased competition from entrants on this same margin should crowd out domestic firm
innovation.
Hypothesis H1: Increased foreign competition will reduce the value of growth options and
reduce incumbent domestic firm innovation spending in R&D and patenting. We also expect more
patenting by the entering foreign firms, especially in technologies strongly related to the incumbent
domestic firm’s technologies.
Because H1 pertains to an increase in competition on the same margin that we are trying to
predict (innovation), H1 intuitively predicts that the classic model’s predictions of crowding out
should dominate. In contrast, the scenario is more complex for the other margin, competition in
the market for existing products, where two potential forces compete.
Hypothesis H2a: Increased foreign competition in existing product markets leads domestic
incumbents to downsize. We thus predict decreased innovation spending by these incumbent
domestic firms.
Hypothesis H2b: Increased foreign competition in existing product markets leads to reduced
prices for the existing products. To recapture pricing power, incumbent domestic firms will increase
innovation spending in order to escape competition.
Because predictions regarding the impact of competition in the market for existing products on innovation
are ambiguous, it is natural to ask which hypothesis is more likely under different sets of initial
conditions: H2a or H2b? We follow Bloom, Draca, and Van Reenen (2016) and propose that the
existence of trapped assets by the domestic incumbents favors H2b. In particular, if a firm has
assets that are not redeployable and adjustment costs are high, it follows that the firm has strong
incentives to maintain high production levels. By increasing innovation, such a firm can preserve
some pricing power despite its high production rate. This leads to our final hypothesis.
Hypothesis H3: When the domestic incumbent firms have high levels of existing non-redeployable
assets, these firms will increase innovation spending, all else equal, to exploit their existing assets.
2.3 Data and Methods
2.3.1 Sample Selection and Panel Structure
Our sample begins with the universe of Compustat firm-years with available 10-K filings on the
EDGAR system. We exclude financial firms and regulated utilities (SIC 6000-6999 and 4900-4949,
respectively) and limit the sample to firm-years with sales and assets of at least $1 million.
Since the Chinese internet penetration measures do not exhibit enough industry-province coverage
until 2000, our final sample starts in 2001 and ends in 2016, with 61,930 firm-years from 8,474
unique firms. This panel is the base for our analyses.
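The sample screens above can be sketched as a single predicate applied to each Compustat firm-year. This is an illustrative sketch, not the authors’ code; it assumes Compustat’s convention of reporting `sale` and `at` in millions of dollars, and `has_10k` is a hypothetical flag for an available EDGAR 10-K.

```python
def keep_firm_year(sic: int, sale: float, at: float, fyear: int, has_10k: bool) -> bool:
    """Apply the sample screens to one firm-year.

    Assumes sale and at are in millions of USD (Compustat convention),
    so 1.0 corresponds to $1 million. `has_10k` is a hypothetical flag.
    """
    is_financial = 6000 <= sic <= 6999
    is_utility = 4900 <= sic <= 4949
    return (
        has_10k
        and not is_financial
        and not is_utility
        and sale >= 1.0
        and at >= 1.0
        and 2001 <= fyear <= 2016
    )
```

Note that the screen excludes only SIC 4900-4949, so other 49xx codes (e.g., 4950-4999) remain in the sample.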
We construct a set of country-specific competition complaint measures using the text of 10-K
filings. For convenience, we use software from metaHeuristica LLC to process our queries. To
measure complaints about competition from China, we search for paragraphs that contain at least
one word from both the country name list (“China” or “Chinese”) and the competition word list
(“compete” or “competition” or “competing”). We then take the number of matched paragraphs and
normalize it by the total number of paragraphs in the 10-K document as our measure, CNComp.
In addition to this generic competition measure, we construct three additional competition
measures by requiring the paragraph to contain a word from a third word list. First, to measure
the intensity of competition, we construct the high competition measure, CNCompHi, by requiring
the paragraph to additionally contain one of the following words: (high OR intense OR significant
OR face OR faces OR substantial OR continued OR vigorous OR strong OR
aggressive OR fierce OR stiff OR extensive OR severe). Second, we measure competition in
intellectual property, CNIntComp, by requiring the paragraph to additionally contain both “intellectual”
and “property”. Finally, we measure complaints about intellectual property theft,
CNIntTheft, by counting the number of paragraphs that match the country list, contain both
“intellectual” and “property”, and match one of the following words: (protect* OR infringe*
OR theft*). In addition to these ratios to the total number of paragraphs, we also
construct dummy variables that equal one if any paragraph matches. Similarly, we
also construct these measures for three other major economies in the world, namely Europe, North
America (Canada and Mexico), and Japan, by changing the words in the country list. Details of
these measures can be found in Table B.25.
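A minimal Python sketch of the paragraph-matching logic, written as a plain reimplementation rather than the metaHeuristica queries actually used. It assumes case-insensitive exact word matching (with prefix matching only for the starred words), and, following the text literally, CNIntTheft requires the country list but not the competition list. The tokenizer and function names are hypothetical.

```python
import re

COUNTRY = {"china", "chinese"}
COMPETE = {"compete", "competition", "competing"}
HIGH = {"high", "intense", "significant", "face", "faces", "substantial",
        "continued", "vigorous", "strong", "aggressive", "fierce", "stiff",
        "extensive", "severe"}
THEFT_PREFIXES = ("protect", "infringe", "theft")  # protect*, infringe*, theft*


def tokens(paragraph):
    """Lower-cased alphabetic words in a paragraph (a simplifying assumption)."""
    return set(re.findall(r"[a-z]+", paragraph.lower()))


def complaint_measures(paragraphs):
    """Return (ratios, dummies) for the four China complaint measures of one 10-K."""
    counts = {"CNComp": 0, "CNCompHi": 0, "CNIntComp": 0, "CNIntTheft": 0}
    for p in paragraphs:
        w = tokens(p)
        country = bool(w & COUNTRY)
        compete = bool(w & COMPETE)
        ip = {"intellectual", "property"} <= w
        theft = any(t.startswith(THEFT_PREFIXES) for t in w)
        if country and compete:
            counts["CNComp"] += 1
            if w & HIGH:
                counts["CNCompHi"] += 1
            if ip:
                counts["CNIntComp"] += 1
        if country and ip and theft:
            counts["CNIntTheft"] += 1
    n = len(paragraphs)
    ratios = {k: (v / n if n else 0.0) for k, v in counts.items()}
    dummies = {k: int(v > 0) for k, v in counts.items()}
    return ratios, dummies
```

Swapping `COUNTRY` for another word list (e.g., Japan-related words) would produce the placebo measures for other regions in the same way.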
Other firm characteristics come from Compustat. We measure firms’ R&D intensities
by normalizing R&D expenses (xrd) by sales. Following Koh and Reeb
(2015), we replace missing R&D intensities with the industry average (2-digit SIC) if the firm has
applied for any patents in the past three years, and replace other missing values with 0. Definitions
of other variables can be found in Table B.25. Finally, we winsorize all ratio variables at the 1% and
99% levels to limit the influence of outliers.
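A sketch of the R&D intensity construction and the winsorization step, under stated assumptions: the 2-digit SIC industry average is assumed to be precomputed by the caller from non-missing intensities, and percentiles use Python’s `statistics.quantiles` with the inclusive method, which may differ slightly from the interpolation rule in the authors’ software.

```python
import statistics


def rd_intensity(xrd, sale, industry_avg, patented_past_3y):
    """R&D/sales with Koh and Reeb (2015)-style handling of missing xrd.

    xrd is None when missing; industry_avg is the (assumed precomputed)
    2-digit SIC average intensity. sale is positive given the sample screens.
    """
    if xrd is not None:
        return xrd / sale
    return industry_avg if patented_past_3y else 0.0


def winsorize(values, lower_pct=1, upper_pct=99):
    """Clamp values to the [lower_pct, upper_pct] percentiles of the sample."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")  # 99 cut points
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, lo), hi) for v in values]
```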
2.3.2 Patent Data
We generate our patent measures from two sources. The first source is Google Patents. Since Oct.
31, 2017, Google, in collaboration with IFI Claims, a global patent research company, has made
a set of structured and queryable patent datasets available to the public.
3
The core part of
the datasets contains over 90 million patent publications from the patent offices of 18 countries,
including both the U.S. and China. The same datasets support the searches made
through patents.google.com, and to our knowledge represent one of the highest-quality sources for
patent research. We also obtain patent data from Kogan, Papanikolaou, Seru, and Stoffman (2016)
(KPSS hereafter), who kindly share the data on their website. The key advantage of the KPSS data
is that these authors have invested considerable effort in linking patents to U.S. public firms.
However, the KPSS data end in 2010, so we combine the two data sources to generate our patent
variables.
We first use patent applications to measure firms’ innovation activities, extending the KPSS
data with Google patent data. To link the new Google patent data to public firms, we use the
links already developed by KPSS. First, we take the overlap between the Google data
and the KPSS data
4
and generate links between PERMNOs (from the KPSS data) and (first)
assignee names (from the Google data). Next, we select from the Google data all utility patents filed with the USPTO
and granted after Nov. 1, 2010. We then assign PERMNOs to these patents through their first
assignee, using the link file generated in the previous step. In this step, we are able to match 77.4% of
all the new patents.
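The linking procedure can be sketched as follows. This is a hypothetical simplification: assignee names are normalized only by upper-casing and collapsing whitespace, and when one name maps to several PERMNOs in the overlap sample, the most frequent PERMNO wins. The paper does not state either rule, so both are assumptions. The utility-patent and grant-date filters are assumed to have been applied upstream.

```python
from collections import Counter, defaultdict


def normalize(name):
    """Upper-case and collapse whitespace (a deliberately simple, hypothetical rule)."""
    return " ".join(name.upper().split())


def build_links(overlap):
    """Map first-assignee names to PERMNOs.

    overlap: iterable of (permno, first_assignee_name) pairs for patents that
    appear in both the KPSS and Google data (matched on patent number).
    """
    votes = defaultdict(Counter)
    for permno, assignee in overlap:
        votes[normalize(assignee)][permno] += 1
    return {name: counter.most_common(1)[0][0] for name, counter in votes.items()}


def match_new_patents(new_patents, links):
    """Assign PERMNOs to post-KPSS patents via the first assignee; return matches and match rate.

    new_patents: list of (patent_id, first_assignee_name) pairs.
    """
    matched = {}
    for patent_id, assignee in new_patents:
        permno = links.get(normalize(assignee))
        if permno is not None:
            matched[patent_id] = permno
    rate = len(matched) / len(new_patents) if new_patents else 0.0
    return matched, rate
```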
3
More about this announcement at https://cloud.google.com/blog/products/gcp/google-patents-public-datasets-connecting-public-paid-and-private-patent-data. One can access the datasets through Google’s BigQuery service.
4
The Google Patents data cover 99.95% of the patents in the KPSS data matched by patent number, and
99.59% of patents matched by both patent number and grant date.
Google data also provide the country information of the assignee.
5
Thus we are able to see
patents that are assigned to foreign entities but filed with the USPTO. We use this information to
measure the number of new Chinese patents that cite the existing patents of U.S. firms, providing
direct evidence on the intensity of learning by Chinese firms. We also construct similar measures
for other major economies, namely Japan, Europe, and Canada and Mexico. We later use these measures
in placebo tests to show that our internet penetration variable is not picking up
omitted factors that attract general international competition.
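A sketch of the citation-based measure: for one U.S. firm, count new patents by assignee region that cite at least one of the firm’s existing patents. The record fields (`country`, `cites`) are hypothetical, and `REGION_OF` is only an illustrative subset of country codes, not the paper’s full mapping of Europe or the other regions.

```python
from collections import Counter

# Illustrative subset only; a full mapping would list every European country code, etc.
REGION_OF = {
    "CN": "China",
    "JP": "Japan",
    "CA": "North America", "MX": "North America",
    "DE": "Europe", "FR": "Europe", "GB": "Europe",
}


def citing_counts_by_region(new_patents, firm_patents):
    """Count new patents per assignee region that cite any of a U.S. firm's existing patents.

    new_patents: iterable of dicts with hypothetical keys 'country' (two-letter
    assignee country code) and 'cites' (iterable of cited patent numbers).
    firm_patents: set of the U.S. firm's existing patent numbers.
    """
    counts = Counter()
    for p in new_patents:
        region = REGION_OF.get(p["country"])
        if region is not None and firm_patents.intersection(p["cites"]):
            counts[region] += 1
    return counts
```

The China count is the main measure; the counts for the other regions serve as placebos.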
Finally, Google data also include all the patents filed with China’s patent office, known as SIPO
(State Intellectual Property Office of the People’s Republic of China). Therefore, we are also able to
check whether patents filed with SIPO (by Chinese firms) also cite patents of U.S. firms, which
complements our previous measure based only on patents filed in the U.S.
2.3.3 Internet Penetration
The quality and coverage of internet access in China have changed dramatically in the last two
decades. While in the early 2000s fewer than 1% of the population in China had access to
the internet, by 2018 the number of internet users in China had surpassed 800 million, and the
internet penetration rate had reached 57.7%. The internet has become the most important medium
through which information is exchanged. For innovation activities, the internet enables inventors
to collect information much more efficiently, and it is almost a necessary component of any
modern-day research.
To measure the internet penetration rate in China, we hand-collect the number of internet users
from the reports issued by the China Internet Network Information Center (CNNIC). CNNIC is the
official administrator of the internet infrastructure in China, and since 1998 it has published
semi-annual reports describing the recent development of internet infrastructure and the
demographics of internet users in China. To our advantage, these reports also provide the number
of internet users separately for each province in China.
6
We then collect the population numbers
for each province from China Data Online
7
and compute the internet penetration ratio for each
province in each year.
5
The corresponding variable is assignee_harmonized.country_code in the dataset.
Note that the internet infrastructure has not grown at similar rates for all the provinces in each
year. As one example, Figure A.6 plots the year in which each province experienced its largest
increase of the internet penetration ratio. The scattering pattern shows that the development of
internet infrastructure is not always in sync for the whole nation. The landscape of the telecom-
munication industry in China has gone through drastic changes in the past two decades. Prior to
1994, China had one government department that provided all the phone and internet services:
the Directorate General of Telecommunications, which was later registered as China Telecom. That
monopolistic structure was changed in 1994 when China introduced China Unicom to compete with
China Telecom. The deregulation continued in the 1990s as China Telecom was further broken up
into two companies, and other new internet service providers like China Net and China Railnet
were also established. By the end of 2001, China had seven companies in the telecommunication
industry, and these companies tended to focus on different business areas and different regions.
For example, China Net, an internet service provider, mostly operated in the 10 provinces in the
northern part of China. The drastic changes continued in the 2000s, as the industry went through a
round of complicated consolidation, and by the end of 2008 only three companies, each of which
now covers all telecommunication businesses, were left, namely China Telecom, China Mobile,
and China Unicom. These industry changes could have a direct impact on internet services.
For example, we see from Figure A.6 that after China Net was acquired by China Unicom in 2008,
three northern provinces (Liaoning, Shandong, and Jilin) experienced their largest increase in the
internet penetration rate in 2009.
⁶ The statistics do not include data for Hong Kong or Macau.
⁷ Unfortunately, the China Data Center at the University of Michigan terminated this service as of September
2018. However, similar data can easily be downloaded from alternative sources such as http://data.stats.gov.cn/english/
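The year plotted for each province in Figure A.6 is the year with the largest one-year rise in its penetration ratio. A minimal Python sketch of that calculation follows, assuming the ratios are held in a hypothetical mapping from province to {year: ratio}; the function name and the toy numbers in the usage example are illustrative only, and the sketch compares consecutive observed years.

```python
def year_of_largest_increase(penetration):
    """Return, for each province, the year with the largest one-year rise
    in the internet penetration ratio (None if fewer than two years).

    penetration: hypothetical dict mapping province -> {year: ratio}.
    """
    result = {}
    for province, series in penetration.items():
        years = sorted(series)
        best_year, best_change = None, float("-inf")
        # Compare each observed year with the immediately preceding observed year.
        for prev, curr in zip(years, years[1:]):
            change = series[curr] - series[prev]
            if change > best_change:  # ties keep the earlier year
                best_year, best_change = curr, change
        result[province] = best_year
    return result


# Usage with made-up numbers:
toy = {"A": {2007: 0.10, 2008: 0.12, 2009: 0.20},
       "B": {2007: 0.05, 2008: 0.15, 2009: 0.18}}
print(year_of_largest_increase(toy))
```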
For each U.S. firm, we want to measure internet penetration for its potential peer firms in
China. To do so, we use a weighted-average measure of the internet penetration of the provinces
where the U.S. firm's industry is important. Indeed, a large literature has documented that
industries tend to cluster geographically,⁸ and China is no exception. Ideally, we would use the
total assets of all firms in each industry and province. However, such detailed census data is not
publicly available, so we retreat to the second best: using data from Chinese public firms. To help
address the endogeneity of the industry-province links, we fix these links as of year 2000.
This choice is justified by our observation that the number of industries spanned by Chinese public
firms becomes sufficiently high and stable in year 2000, as shown in Figure A.7. We select all
the Chinese public firms that have non-missing headquarters and asset information in 2000. Our
final sample includes 864 firms listed in mainland China (A-share), 74 firms listed in Hong Kong,
and 5 firms listed in the US.⁹ We then assign each firm to the province of its headquarters. To
generate the weights, for each 2-digit SIC industry, we first calculate the weight of each province
using the total assets of all public firms in that industry. We then exclude provinces whose
weights are below 10%, and finally recalculate the weights using the remaining provinces. Figure
A.8 shows the weight loadings for all industry-province pairs.
⁸ See Florence (1948); Hoover (1948); Fuchs (1962); Krugman (1993); Ellison and Glaeser (1997); Duranton and
Overman (2005, 2008).
⁹ For firms that are dual-listed, we count each firm only once, using its primary exchange.
Using the weights for each industry, we finally calculate the internet penetration measure as the
weighted average across the retained provinces. In the next section, we show that our internet penetration
measure significantly predicts complaints from U.S. firms about competition in intellectual
property. We also find that the measure positively predicts the number of Chinese patents that cite the
U.S. firms' patents. In placebo tests, we find that the internet measure does not predict complaints
about competition from, or patent citations by, other economies, suggesting that our internet measure
is not capturing endogenous factors that affect the overall level of international competition.
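The weighting procedure above (asset-based province weights per 2-digit SIC industry, a 10% cutoff, renormalization, then a weighted average of provincial penetration ratios) can be sketched in Python as follows. This is a minimal sketch, not the authors' code: it assumes each Chinese public firm has already been mapped to a 2-digit SIC code and a headquarters province, and the function names, data layout, and the choice to skip an industry in which no province reaches the cutoff are all hypothetical.

```python
from collections import defaultdict


def province_weights(firms, cutoff=0.10):
    """Asset-based province weights for each 2-digit SIC industry.

    firms: iterable of (sic2, province, total_assets) tuples for Chinese
    public firms in the base year. Provinces whose weight is below `cutoff`
    are dropped and the remaining weights are rescaled to sum to one.
    """
    assets = defaultdict(lambda: defaultdict(float))
    for sic2, province, total_assets in firms:
        assets[sic2][province] += total_assets

    weights = {}
    for sic2, by_province in assets.items():
        total = sum(by_province.values())
        if total <= 0:
            continue
        kept = {p: a / total for p, a in by_province.items() if a / total >= cutoff}
        kept_total = sum(kept.values())
        if kept_total == 0:
            continue  # hypothetical handling: no province reaches the cutoff
        weights[sic2] = {p: w / kept_total for p, w in kept.items()}
    return weights


def industry_internet_penetration(sic2, year, weights, ratio):
    """Weighted average of provincial penetration ratios for one industry-year.

    ratio: dict mapping (province, year) -> internet penetration ratio.
    Returns None if the industry has no weights.
    """
    w = weights.get(sic2)
    if not w:
        return None
    return sum(share * ratio[(province, year)] for province, share in w.items())
```

Fixing the weights at the base year, as the text describes, means only the provincial ratios vary over time; the industry-province links themselves do not respond to later industry conditions.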
2.4 Summary Statistics and Validation
2.4.1 Summary Statistics
Table B.14 presents summary statistics for our 2001 to 2016 panel of 61,930 firm-year observations
having machine-readable 10-K filings. On average across firm-years, the weighted internet penetration
ratio is 36%. We see that about 5% of sample firms explicitly complain about competition
from China, and 40% of these specifically mention intellectual property in their complaints. The
incidence of U.S. firms complaining about competition, and especially competition in the market
for intellectual property, has also been rising. Figure A.5 plots the time series of the general Chinese
competition complaint measure and the complaint measure about IP competition. Both measures
increase sharply over the sample period.
Table B.14 not only indicates we have ample power to examine the impact of Chinese innovative
capacity on U.S. firms, it also indicates that we have even more power to run placebo tests. For
example, sample-wide, U.S. firms complain about European and North American (Canada and
Mexico) competition at even higher rates. As shown in Table B.14, the Chinese competition variable
(scaled by document size and multiplied by 1,000) averages 0.15, whereas the analogous variable for
Europe is 0.31 and that for North America is 0.24. Because we use activity in other parts of the world as placebo
tests, this indicates that there is ample power to detect deviations from the exclusion requirement
using these other regions of the world as placebos. However, this variable is just 0.04 for Japan,
indicating less relative power for Japan in this capacity.
When we consider other regions of the world in our placebo tests for patent citation activity,
the average intensity of Chinese firms citing U.S. patents is 2.39, while the corresponding averages for
European, Japanese, and North American citations of U.S. firms are 26.85, 23.88, and 5.06, respectively.
Our identifying assumption is that Chinese internet penetration is driven, to a first order, by the
capacity of unrelated internet service providers in China and their ability to expand. If so, our main
results should not be driven by underlying state
variables such as time-varying industry demand shocks.
Because demand shocks have a global component to them, it follows that if our identifying
assumptions are violated, our Chinese internet penetration variable should also predict growth in
European, Japanese, and North American firms citing the same U.S. firms. Hence we use these
regional activities as placebo tests. Because the data is much richer for these regions than it is for
China, it follows that these placebo tests should be particularly strong in terms of the power to
detect violations of the exclusion requirement. As we document later, we find strong results for
Chinese companies and no results for placebo tests using the other regions of the world.
Table B.15 displays summary statistics at the firm level rather than at the firm-year panel level
(Table B.14). In particular, we first calculate the mean value of each variable for each firm, and
the table represents the statistics for the resulting firm averages. The primary motive for reporting
summary statistics in both dimensions is to examine the distributions of our key variables, especially
the more extreme values. As we will include firm and year fixed effects, for example, major outliers
could sway our findings.
As is well known in the innovation literature, many variables measuring R&D and patenting
activity have right-skewed distributions. Consistent with the literature, we therefore winsorize all
of our key variables at the 1%/99% level. Overall, the resulting distributions are similar to those in
other studies. We nonetheless conduct robustness tests to determine whether our results hold in key
subsamples, including the set of firms with positive R&D activity and the subsample with above-median
patenting activity. Our results remain highly robust.
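Winsorizing at the 1%/99% level replaces values below the 1st percentile with the 1st percentile and values above the 99th percentile with the 99th percentile. A minimal Python sketch follows; it assumes cutoffs are computed on the pooled sample (the text does not say whether they are computed by year) and uses linear interpolation between order statistics, the same convention as NumPy's default `percentile` method. The function names are illustrative.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an ascending list, q in [0, 100]."""
    if not sorted_vals:
        raise ValueError("need at least one value")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)  # floor, since pos is non-negative
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


def winsorize(values, lower=1.0, upper=99.0):
    """Clip values to the [lower, upper] percentiles of the pooled sample,
    returning them in their original order."""
    s = sorted(values)
    lo_cut, hi_cut = percentile(s, lower), percentile(s, upper)
    return [min(max(v, lo_cut), hi_cut) for v in values]


# Usage: the extremes of 0..100 are pulled in to 1 and 99.
w = winsorize(list(range(101)))
print(w[0], w[50], w[100])
```

Unlike trimming, winsorizing keeps every observation, which matters here because firm fixed effects are estimated from each firm's full time series.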
2.4.2 Validation Test: EDGAR Downloads by Chinese Internet Users
In this section, we examine the informativeness and relevance of our measure of industry-specific
Chinese internet penetration. In particular, we test whether this measure predicts higher observed
rates of Chinese internet users downloading information about U.S. firms in specific industries (and
in specific years). For example, if internet penetration increases in a Chinese province that focuses
on electronics production in 2006, we predict that U.S. firms in the electronics industry will experience
increased downloads by Chinese internet users specifically in that year. If, in addition, the evolution
of internet penetration in China is plausibly exogenous relative to industry conditions, we further
predict no relationship with downloads by internet users in other (placebo) nations. Alternatively, if
internet penetration were endogenously driven by industry conditions, we would instead predict a strong
link to internet downloads from many parts of the world, as industry conditions are highly correlated
across nations.
We test these predictions using the EDGAR internet log files from the U.S. Securities and Exchange
Commission. We use the IP address of each visitor to identify the nation the visitor is from,
and we then tabulate the number of visitors from each nation to each individual U.S. public firm
in each year from 2004 to 2015. We exclude IP addresses that are likely web crawlers. Following
Lee, Ma, and Wang (2015b), we tag an IP address as a web crawler if it has downloaded files from
50 or more firms in a day.¹⁰ As larger firms will have more visitors, we
scale the total web visits by each firm's sales to create our key dependent variable: # of EDGAR
searches/sales. For ease of interpretation, we also standardize this variable to have unit standard
deviation in each year. We regress this variable on our internet penetration variable plus a standard
set of controls, firm fixed effects, and year fixed effects.
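The construction of # of EDGAR searches/sales can be sketched in Python as below. This is a minimal sketch under several stated assumptions: each log record is a hypothetical dict that already carries a `country` field (the IP-to-nation lookup is not shown), a crawler flag removes only the requests from that IP on that day, the variable counts requests rather than unique visitors, and "unit standard deviation in each year" is implemented by dividing by the population standard deviation across firms within the year. All field and function names are illustrative.

```python
from collections import defaultdict
from statistics import pstdev

CRAWLER_FIRMS_PER_DAY = 50  # threshold described in the text


def edgar_searches_per_sales(log, sales, country="CN"):
    """log: iterable of dicts with keys ip, date, year, firm, country,
    status (HTTP status code), index_page (bool).
    sales: dict mapping (firm, year) -> sales.
    Returns dict (firm, year) -> requests/sales, rescaled within each year."""
    log = list(log)

    # 1. Flag (IP, day) pairs that download filings of 50 or more distinct firms.
    firms_by_ip_day = defaultdict(set)
    for r in log:
        firms_by_ip_day[(r["ip"], r["date"])].add(r["firm"])
    crawler_ip_days = {k for k, firms in firms_by_ip_day.items()
                       if len(firms) >= CRAWLER_FIRMS_PER_DAY}

    # 2. Count remaining requests from the chosen country per firm-year.
    counts = defaultdict(int)
    for r in log:
        if (r["ip"], r["date"]) in crawler_ip_days:
            continue
        if r["status"] > 300 or r["index_page"]:
            continue  # additional request filters described in the text
        if r["country"] == country:
            counts[(r["firm"], r["year"])] += 1

    # 3. Scale by sales; firm-years with no requests get zero.
    scaled = {k: counts.get(k, 0) / s for k, s in sales.items() if s > 0}

    # 4. Divide by the cross-sectional standard deviation within each year.
    by_year = defaultdict(list)
    for (_, year), v in scaled.items():
        by_year[year].append(v)
    sd = {year: pstdev(vs) for year, vs in by_year.items()}
    return {(f, y): v / sd[y] for (f, y), v in scaled.items() if sd[y] > 0}
```

Running the same function with `country` set to a placebo nation produces the analogous dependent variables for the placebo regressions.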
Panel A of Table B.16 shows that our measure of industry-specific Chinese internet penetra-
tion significantly predicts the intensity of EDGAR downloads for U.S. firms in the treated industry
by Chinese internet users. The inclusion of firm fixed effects absorbs all firm-specific unobserv-
able characteristics, and allows us to focus on the most rigorous within-firm effects. These results
provide strong evidence of our proposed mechanism: internet usage is a major tool for rapid in-
formation gathering of knowledge capital by overseas firms. This, in turn, exposes treated firms to
increased competition from abroad specifically in the market for innovation and knowledge itself.
These findings also indicate an unintended consequence of mandatory disclosure. Such disclosure
can strengthen competition from overseas, likely at the expense of domestic firms.
Panel A of Table B.16 also reports the results of our placebo tests, where we consider EDGAR
searches from other major economies. As predicted, we find no significant link to our measure of
Chinese internet penetration for the European Union, Japan, or Canada and Mexico. These results
are consistent with Chinese internet penetration being driven by factors that are plausibly exogenous
relative to industry state variables. In particular, if internet penetration were correlated with
industry demand or expected growth, which have a common global component, we would expect
these placebo tests to fail. Our findings thus suggest that any link between internet penetration and
industry conditions is likely small or negligible in magnitude.
¹⁰ In addition to excluding requests from web crawlers, we also exclude web requests that have a server
status code larger than 300, as well as requests for index pages.
Panel B of Table B.16 reproduces the tests in Panel A using our measure of internet penetration
based only on Chinese corporations that only trade via domestic A-shares (we thus exclude firms
listed in Hong Kong or in the United States). Although this restriction can reduce power, we also
note that these purely domestic firms likely have fewer options for gathering information outside
of the internet. The results in Panel B are similar to, but slightly stronger than, those in Panel A,
indicating that our results are robust to using either approach.
2.4.3 Validation Test: Complaints about Chinese Competition
In this section, we ask whether elevated levels of industry-specific Chinese internet penetration are
followed by higher ex-post complaints by U.S. firms about competition specifically from Chinese firms,
which would indicate that these U.S. firms face more intense competition from Chinese rivals. As
explained earlier, we use textual analysis of 10-Ks filed by U.S. firms during our sample period.
As our hypothesis is that internet penetration specifically shifts competitive intensity in the market
for intellectual property production, we go one step further and also measure the intensity of U.S.
firm complaints about competition that appear specifically in paragraphs where the company is
discussing innovation. We predict positive results, and such results would validate the economic
content of our primary internet penetration variable.
An analogous framework for other major economies (excluding China) allows us to further ex-
amine the exclusion requirement using placebo tests. We examine if our Chinese internet penetra-
tion variable also predicts higher rates of complaints by U.S. firms about competition from Europe,
North America (Canada and Mexico) and Japan. If the exclusion requirement holds, and if our
internet penetration variable is not related to underlying state variables relating to industry supply
or demand shocks, then we predict that these placebo tests should produce insignificant results.
As noted earlier, these placebo tests have high power due to the fact that these other economic
regions are large in scale and hence U.S. firms frequently summarize the intensity of competition
from these regions. The key empirical question is whether these complaints are also related to Chinese
internet penetration.
We estimate the following regression:

$$Y_{it} = \beta\, CNInternet_{i,t-1} + \gamma' Z_{i,t-1} + \alpha_i + \delta_t + \varepsilon_{it} \tag{2.1}$$
The dependent variable is one of the complaint measures noted above, all of which are generated using
textual information from U.S. firm 10-K filings. Detailed definitions of these variables can be found
in Section 3.1 or Table B.25. CNInternet is our key measure of competition and is the weighted-average
internet penetration across provinces where Chinese firms in the same industry agglomerate. Z
represents the control variables, which include CNSalesGR, the sales growth of the same industry in
China; log(10kSize), the log of the total number of paragraphs in each 10-K filing; firm age; size
(total assets); and Q. All independent variables are lagged one year relative to the dependent
variable and hence are measurable ex ante. We also include firm and year fixed effects in all
regressions, and standard errors are clustered by firm.
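The estimator described above can be sketched with the Frisch-Waugh-Lovell theorem: sweep firm and year means out of both the outcome and the lagged internet variable, then regress one residual on the other. The Python sketch below is illustrative only. It keeps a single regressor (the controls Z are omitted for brevity), assumes the regressor has already been lagged one year, and reports a firm-clustered standard error without the small-sample or degrees-of-freedom corrections that statistical packages typically apply. The function names are hypothetical.

```python
from collections import defaultdict


def demean_two_way(values, firms, years, tol=1e-12, max_iter=1000):
    """Sweep out firm and year means by alternating projections.

    Exact after one pass for a balanced panel; for an unbalanced panel it
    iterates until all firm and year means are numerically zero.
    """
    v = list(values)
    for _ in range(max_iter):
        largest_mean = 0.0
        for groups in (firms, years):
            sums, counts = defaultdict(float), defaultdict(int)
            for g, x in zip(groups, v):
                sums[g] += x
                counts[g] += 1
            means = {g: sums[g] / counts[g] for g in sums}
            v = [x - means[g] for g, x in zip(groups, v)]
            largest_mean = max(largest_mean, max(abs(m) for m in means.values()))
        if largest_mean < tol:
            break
    return v


def twfe_beta_clustered(y, x, firms, years):
    """Slope of y on x with firm and year fixed effects, plus a
    firm-clustered standard error (no small-sample correction)."""
    y_t = demean_two_way(y, firms, years)
    x_t = demean_two_way(x, firms, years)
    sxx = sum(a * a for a in x_t)
    beta = sum(a * b for a, b in zip(x_t, y_t)) / sxx
    resid = [b - beta * a for a, b in zip(x_t, y_t)]
    # Sum the score x_tilde * residual within each firm (the cluster).
    score = defaultdict(float)
    for f, a, e in zip(firms, x_t, resid):
        score[f] += a * e
    se = (sum(s * s for s in score.values()) ** 0.5) / sxx
    return beta, se
```

Clustering by firm allows residuals to be correlated across years within a firm, which is why the sandwich sums the score within each firm before squaring.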
Table B.17 shows the results. In the first two columns, we find that the Internet penetration
significantly predicts the rate at which treated U.S. firms in the same industry complain about
competition specifically from Chinese firms. A one standard deviation increase of the internet
penetration ratio leads to a 0.124 standard deviation increase, or a 64% increase from the sample
mean of the Chinese competition complaint measure. We obtain similar estimates if the dependent
variable is a dummy equal to one if the given U.S. firm has at least one complaint in its 10-K.
Columns (3) and (4) of Table B.17 show that the high competition measure is also significantly
predicted by the internet penetration measure.
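The link between the standardized coefficient and the 64% figure is one line of arithmetic. Assuming (our reading; the text does not spell this out) that the percentage equals the standardized coefficient times the ratio of the dependent variable's sample standard deviation to its sample mean, the reported numbers imply a standard deviation roughly five times the mean:

```latex
\frac{\Delta Y}{\bar{Y}} = \hat{\beta}_{\text{std}} \times \frac{\sigma_Y}{\bar{Y}}
\quad\Longrightarrow\quad
0.64 = 0.124 \times \frac{\sigma_Y}{\bar{Y}}
\quad\Longrightarrow\quad
\frac{\sigma_Y}{\bar{Y}} \approx 5.2
```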
Our most direct tests are in the last four columns of Table B.17. We find that internet penetration
also significantly predicts U.S. firm complaints about competition that are specifically related to
intellectual property (IP) (see Columns (5) and (6)). In Columns (7) and (8), instead of focusing
on competition, we consider instances where U.S. firms discuss IP theft. This reflects the fact that
IP theft, in an economic sense, is a form of competition and U.S. firm complaints should thus follow
similar patterns. We find that indeed they do.
The possibility of IP theft has been a centerpiece of public and political debates about
recent trade conflicts between the U.S. and China. Although we do not draw any strong conclusions
with respect to IP theft, our finding that internet penetration significantly predicts IP theft
complaints from U.S. firms is suggestive. IP theft, or “perceived IP theft,” might thus explain part
of the increased competition observed in these IP markets. However, we caution that complaints
in 10-Ks do not constitute proof that any IP theft has, in fact, occurred. Moreover, we do document
increased patenting by Chinese firms (discussed later), which is not a form of theft given
that patents are both transparent and legal. Yet IP theft could be a precursor to such patents, as
younger firms in China might use trade secret theft to catch up on the overall knowledge capital
that patenting requires. Overall, our evidence on IP theft is not decisive; given this suggestive
evidence and the importance of the question, future research examining this issue would be
invaluable.
Overall, Table B.17 shows that industry-specific internet penetration in China strongly predicts
ex-post complaints about competition from U.S. firms, especially competition on the margin of inno-
vation itself. This validation test indicates that the economic content of our key internet penetration
variable is in line with our predictions.
2.4.4 Placebo Tests using Other Major Economies
Although the validation test documented in the preceding section supports the economic content of our
measure, other economic forces might also affect firm innovation and be correlated with Chinese
internet penetration. For example, industry-specific internet penetration might be correlated with
global supply or demand shocks in the given industry, or it might relate to global competition
more than just Chinese competition alone. In order for our experiment to be ideal, this variable
should only identify shifts in the capacity of Chinese firms alone to challenge firms globally on the
competitive margin of innovation.
To further examine the exclusion requirement, we construct analogous competition complaint
measures for other major economies, namely Japan, Europe, and neighboring countries in North
America (Canada and Mexico). If the internet penetration variable contains information about the
industry’s state, thus violating exclusion, we would expect that complaints about competition from
these other economies would show similar positive signs. Table B.18 shows the results. In Panel
(A), we run similar regressions based on Equation 2.1, but replace the dependent variable with the
complaint measures from other countries. For brevity we focus on complaints about competition
and intellectual property theft.
Columns (1) - (4) of Table B.18 show that Chinese internet penetration is not significantly
related to complaints about competition from Japan or North America. However, Columns (5) and
(6) show weakly significant results for the European Union, indicating some potential concerns
about exclusion. We examined this issue in depth, and our analysis indicates that this result is likely
spurious. First, the significance of the European Union results is driven fully by the first year of our
sample, likely indicating an outlier perhaps relating to the formation of the European Union. If we
exclude the first year, the results for the European Union are insignificant whereas our results for
China are highly robust.
A second key issue is that our primary measures of industry agglomeration at the province
level in China use geographic headquarters location data from all publicly traded Chinese firms,
including those listed in China, Hong Kong, and in the United States. If there is a violation of the
exclusion requirement, the most likely source would be that Chinese firms that list in Hong Kong or
in the United States have better access to information about innovation in their industries, creating
channels for information transmission outside of internet penetration.
We test this issue in Panel (B) of Table B.18. In particular, we re-define Internet penetration
using industry-specific agglomeration data based only on Chinese firms listed in Mainland China
(those having A-shares). This more narrow definition of internet penetration does not load on com-
panies having listings outside of China, and hence limits any alternative channels for information
transmission beyond the internet. The results in Panel B lend support to this explanation of the
results in Panel A. In particular, all of the placebo tests from all three major economies are insignif-
icantly related to Chinese internet penetration. Although the results in Panel A for the European
Union might be spurious and thus less relevant, the results in Panel B indicate a very conservative
strategy for our main tests in the paper.
In particular, we run all tests in the paper using our main internet penetration variable and
also separately using our mainland-China-only internet penetration variable. We note that all of
our results are robust in both specifications. Moreover, our results are actually stronger using
the mainland-China-only measure. For this reason, we report results using the complete internet
penetration measure in order to be conservative, although all results are robust to either model.
We briefly note that we run an additional placebo test later in the paper when we consider
patenting activity. We find even stronger support for the exclusion requirement in all of these tests.
In particular, our main result is that Chinese firms increase their patenting activity in the markets
of the treated U.S. firms after episodes where internet penetration increases. They also greatly
increase cites to the treated U.S. firms in their same industry. The key placebo test we consider later
is whether European, North American or Japanese firms do the same. If the exclusion requirement
did not hold, we would expect similar results as explained above. As we explain later, we find no
significant results for these other economies, and these placebo tests hold regardless of whether
we define Internet penetration using all Chinese firms or just those listed in mainland China. As
discussed in our summary statistics section, these tests are particularly strong placebo tests due
to the fact that patenting activity overall is more intense for firms from Europe, Japan and North
America relative to China.
Collectively, these placebo tests suggest that it is unlikely that our internet penetration vari-
able is contaminated by a global factor or by an omitted industry state variable relating to supply
or demand shocks. These findings lend support to the possibility that our results are consistent
with internet penetration causing reductions in innovative activities of treated U.S. firms due to a
crowding-out effect of increased foreign competition in the market for innovative technologies.
2.5 Competition and Innovation
In this section, we examine how competition from China, as measured by our industry-specific
Chinese internet penetration variable, affects the innovation activities of U.S. firms.
2.5.1 Impact on U.S. Firms
We first examine how ex ante industry-specific Chinese internet penetration impacts ex post invest-
ment in R&D expenses by treated U.S. firms. We do so by estimating a regression model as specified
in Equation 2.1. Our key dependent variables are the R&D/sales and the number of patents/sales
of our U.S. firms.
Table B.19 shows the results. Column (1), which uses the one-year ahead R&D expense ratio
as the dependent variable, shows that internet penetration significantly negatively predicts ex-post
R&D. The coefficient estimate of -0.182 is significant at the 1% level, and indicates that the R&D
expense ratio decreases by 0.182 standard deviations when Chinese internet penetration increases
by one standard deviation. The coefficient remains significant when we examine the two-year
ahead R&D activities in Column (2) and three-year ahead R&D in Column (3).
We find a similar result for the ex post patenting activities of the treated U.S. firms. In Columns
(4) - (6) of Table B.19, we use the number of patent applications in each of the next three years,
divided by sales, as the dependent variable. Column (4) shows a highly significant coefficient estimate of
-0.108, indicating a decrease of 0.108 standard deviations of patenting activities when Chinese
internet penetration increases by one standard deviation. In years two and three, we continue to
observe significant and negative coefficients.
To ensure that our results are not driven by the skewed distribution of R&D and patents, we
re-estimate the model using Poisson regressions. Table B.20 displays the results. To facilitate the
Poisson regressions, we drop the firm fixed effects and instead we control for the lagged dependent
variable. Overall the negative effects we find for internet penetration on ex post U.S. firm innovation
are analogous to those in Table B.19.
We thus conclude that plausibly exogenous shocks to the ability of Chinese firms to compete in
the market for innovation production are associated with sharp reductions in the ex-post innovation
rates of treated U.S. firms. This first main result in our paper is new to the literature, which instead
focuses on the margin of competition in the production of existing products.
2.5.2 Impact on Chinese Firms
We now examine the relationship between ex ante industry-specific internet penetration and ex
post increases in the number of new Chinese patents that directly cite the existing patents of the
impacted U.S. firms. We utilize the country information of the first assignee for each patent to
identify patents that are assigned to a Chinese entity. For each firm i in year t, we then count the
number of new patents that are (1) applied for through the USPTO, (2) assigned to a Chinese entity,
and (3) cite any existing patents of firm i. Following our standard conventions, we then scale this
count ($PatCiteUS_{CN}$) by firm sales.
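The three conditions above translate into a simple filter-and-count. The Python sketch below is a minimal illustration under stated assumptions: each new patent is a hypothetical dict with its filing office, application year, first-assignee country, and list of cited patent IDs; a separate hypothetical mapping assigns existing patents to U.S. firms; year t is taken to be the application year; and a new patent that cites several patents of the same firm is counted once for that firm. None of these names come from the Google patent data itself.

```python
from collections import defaultdict


def chinese_citing_patents_per_sales(patents, owner, sales):
    """patents: iterable of dicts with keys office, app_year,
    first_assignee_country, cites (list of cited patent ids).
    owner: dict mapping an existing patent id -> U.S. firm id.
    sales: dict mapping (firm, year) -> sales.
    Returns dict (firm, year) -> PatCiteUS_CN / sales."""
    counts = defaultdict(int)
    for p in patents:
        # Conditions (1) and (2): filed with the USPTO, Chinese first assignee.
        if p["office"] != "USPTO" or p["first_assignee_country"] != "CN":
            continue
        # Condition (3): cites any existing patent of the firm; count once per firm.
        cited_firms = {owner[c] for c in p["cites"] if c in owner}
        for firm in cited_firms:
            counts[(firm, p["app_year"])] += 1
    return {k: counts.get(k, 0) / s for k, s in sales.items() if s > 0}
```

Changing the `office` condition to SIPO gives the analogous count for Chinese-filed patents.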
We use this measure of new Chinese patents (that cite pre-existing same-industry U.S. firm
patents) as the dependent variable in our next set of tests. The results are displayed in Table
B.21. Columns (1) - (3) of Table B.21 show that ex ante internet penetration predicts increases
in the number of Chinese patents citing these U.S. firms' patents in each of the next three years. Results
are significant at the 1% level in each of the three ex post years. The effects are also large as a
one standard deviation increase in internet penetration is followed by a 0.285 standard deviation
increase in the number of citing patents by Chinese firms in the following year.
To ensure that our tests are not driven by changes in the overall intensity of citations to a given
U.S. firm's existing patents, we consider an alternative scaling that accounts for the cites to these
same patents by other U.S. firms. In particular, we define $PatCiteUS_{US}$ as the number of cites to the
focal firm's patents by U.S. firms. Columns (4) - (6) of Table B.21 show the results of regressions
where the dependent variable is $PatCiteUS_{CN} / (PatCiteUS_{CN} + PatCiteUS_{US} + 1)$. The added one
in the denominator avoids division by zero, and this construction ensures that the variable is bounded in
[0,1] and thus does not have outliers. We find that the results in Columns (4) to (6) are very similar
to our baseline results in Columns (1) to (3). Our results are thus not driven by broad increases in
patent cites, but rather are unique to the Chinese firms citing these patents.
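The bounded share used in Columns (4) to (6) is a one-line formula; the Python sketch below (function name illustrative) makes the boundedness explicit. Because the denominator exceeds the numerator by at least one, the share always lies in [0, 1), and a firm with no citations at all receives zero rather than an undefined value.

```python
def chinese_citation_share(cites_cn, cites_us):
    """PatCiteUS_CN / (PatCiteUS_CN + PatCiteUS_US + 1).

    The +1 in the denominator avoids division by zero and keeps the share
    strictly below one."""
    if cites_cn < 0 or cites_us < 0:
        raise ValueError("citation counts must be non-negative")
    return cites_cn / (cites_cn + cites_us + 1)


# Usage: no citations, only Chinese citations, and a mix.
print(chinese_citation_share(0, 0), chinese_citation_share(3, 0), chinese_citation_share(1, 2))
```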
The Google patent database also includes all patents filed with SIPO, the Chinese Patent Office.
We thus construct a similar measure of Chinese patents that cite the U.S. firm patents, but that are
filed in China. The dependent variable for Columns (7) - (9) of Table B.21 is PatCiteCN, which
is the number of new patents filed with SIPO that cite the existing patents of the U.S.
firm, and we scale this quantity by the focal firm’s sales. We find that the coefficient estimates
for internet penetration once again are highly significant and economically large. A one standard
deviation increase in internet penetration is associated with an increase of 0.180 to 0.250 standard
deviations of these SIPO patents over the three ex post years.
Columns (10) - (12) of Table B.21 repeat this exercise using the same scaling convention dis-
cussed above for Columns (4) to (6), where the goal is to ensure our results are not explained
by broad-based increases in cites to the focal U.S. firm’s patents. Our results remain significantly
positive in all three years.
Overall, we find consistent evidence that the internet penetration predicts strong ex post patent-
ing activity by Chinese firms, and that these new patents are directly in the technological areas
previously covered by the treated U.S. firms. These results suggest that high quality internet ac-
cess facilitates increased learning by Chinese firms about the existing technologies used by U.S.
firms in their industry. Put together with our finding that U.S. firms decrease patenting in these
same technological markets, our results suggest that internet penetration is followed by a strong
crowding-out effect. As Chinese firms enter these markets for innovation, they absorb a fraction of
the associated rents, and thus crowd-out the treated U.S. firms.
2.5.3 Impact on Firms in Placebo Economies
Analogous to our earlier placebo tests in Table B.18 that examined complaints by U.S. firms about
competition from rivals in various economic centers, we perform a similar set of placebo tests
regarding the ex post patenting results we found for Chinese firms in the previous section.
If the exclusion requirement is strongly violated, we would expect to see significant increases in
patenting activity that cites these same U.S. firms by other firms in other major economies including
Europe, North America and Japan. As noted earlier in our summary statistics section, these placebo
tests are strong due to the fact that patenting activity by firms in these other regions is more active
in our sample overall than is patenting activity by Chinese firms. Even if relatively modest industry
supply and demand effects were driving our results, these placebo tests should produce significant
links to our Chinese internet penetration variable for firms in these economies.
We therefore consider regressions analogous to those in Table B.21, except that we replace
the dependent variable with patenting activity associated with firms in each of these alternative
economies. Table B.22 displays the results. In Columns (1) to (3), the dependent variable is based
on patents filed with USPTO by assigned entities in Japan. Columns (4) to (6) are based on North
American entities and (7) to (12) are based on European Union entities.
The results in Table B.22 show that, across all columns and thus all economic regions, we find
no evidence that our Chinese internet penetration variable predicts ex post patenting activity by
firms in these regions. The absence of results also holds uniformly over the first, second and third
years following the increases in internet penetration.
Furthermore, the coefficients are much smaller in economic magnitude than those for Chinese
patents documented earlier. In fact, six of the nine regressions show a negative sign, whereas the
results for China are positive and highly significant. Especially when combined with our results
for Table B.18, these placebo tests indicate that our internet penetration measure rather cleanly
measures the ability of Chinese firms uniquely to compete in the market for innovation at a global
level. We find no impact for firms in other nations, suggesting that the exclusion requirement likely
holds to a first-order approximation.
2.5.4 Competition and Asset Composition
As we noted in our discussion of hypotheses, the impact of foreign competition on the innovation
activities of U.S. firms can vary based on the specific threats posed by the foreign entrants, and
also based on the asset composition of the affected U.S. firms. For example, theory suggests that
competition in the market for existing products can either increase or decrease innovation activities
by affected U.S. firms. Moreover, U.S. firms having non-redeployable assets might have particularly
strong incentives to increase innovation spending on the margin. In particular, innovation can help
firms “escape competition” and serve higher quality market segments while conceding low quality
segments to the entrants.
2.5.4.1 High versus Low Growth Options
Because our primary focus is on competition in the market for innovation, it also follows that
our predictions should be particularly strong for U.S. firms that have stronger growth options, as
innovation is a large fraction of firm value for these firms. Analogously, firms with few growth
options are likely more impacted by competition on the other margin: competition in the market
for existing products.
In this section, we thus consider the two subsample tests motivated by these hypotheses: high
versus low growth option value and high versus low asset tangibility (indicating assets that have a
higher likelihood of being costly to redeploy).
We first examine whether our results are stronger for U.S. firms with high versus low growth
options as measured by each firm’s market-to-book ratio. To do so, we start with the models we ran
in prior sections of this study, but add an interaction between the internet dummy and an additional
dummy variable, HighQ, which equals one if the firm has an above-median market-to-book ratio
in the prior year. We also include the HighQ dummy itself in the model. The dependent variables
include the complaint measures from Table B.17, and the innovation measures from Table B.19.
Table B.23 shows the results.
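As a schematic illustration only, the HighQ test can be written as the generic interaction model below. The notation is ours rather than the study's: $Y_{i,t+1}$ stands for any of the complaint or innovation outcomes, $\mathrm{Internet}_{i,t}$ for the internet penetration exposure measure, $X_{i,t}$ for the controls, and $\alpha$ for the fixed effects of the prior sections, whose exact form is not restated here.

```latex
Y_{i,t+1} = \beta_1\,\mathrm{Internet}_{i,t}
          + \beta_2\,\mathrm{Internet}_{i,t}\times\mathrm{HighQ}_{i,t-1}
          + \beta_3\,\mathrm{HighQ}_{i,t-1}
          + X_{i,t}'\delta + \alpha + \varepsilon_{i,t+1}
\qquad\Longrightarrow\qquad
\frac{\partial Y_{i,t+1}}{\partial\,\mathrm{Internet}_{i,t}}
  = \beta_1 + \beta_2\,\mathrm{HighQ}_{i,t-1}
```

Under this form, $\beta_1$ is the sensitivity for low-Q firms, $\beta_1 + \beta_2$ the sensitivity for high-Q firms, and $\beta_2$ the difference between the two groups.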
Columns (1) to (3) show that higher market-to-book firms complain more about competition
from China, especially such complaints that occur in the context of paragraphs discussing innova-
tion. As documented in the existing literature, these high valuation firms tend to have more growth
options and are more innovative. As a result, their overall valuations load heavily on their ability
to control markets for innovation in their sectors, so direct competition from Chinese peers on
the margin of innovation production should affect them particularly strongly. The coefficient of the
interaction term is generally about one-third as large as the coefficient of the internet penetration
level alone, suggesting an economically large difference between the high Q and low Q firms.
We also find that the innovation activities of these high-value firms are more sensitive to
Chinese internet penetration. As shown in Columns (4) to (7), these high market-to-book firms
scale back their R&D expenses and patenting activities more severely when internet penetration
is high. The coefficient of the interaction term for R&D in Column (4) is -0.063, almost half the
size of the coefficient of the internet penetration variable itself, which is -0.147. The effect is also
economically large for patenting activities.
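Assuming the linear interaction form used in these tests, the implied R&D sensitivity for high-Q firms is the sum of the main and interaction coefficients. A small worked check using the two Column (4) estimates quoted above (it says nothing about the standard error of the sum, which would require the coefficient covariance):

```python
# Column (4) estimates for R&D quoted in the text (Table B.23).
beta_internet = -0.147     # internet penetration, main effect (low-Q firms)
beta_interaction = -0.063  # internet penetration x HighQ

# Under a linear interaction model, the high-Q sensitivity is the sum.
effect_low_q = beta_internet
effect_high_q = beta_internet + beta_interaction
relative_size = beta_interaction / beta_internet

print(f"low-Q effect:  {effect_low_q:.3f}")               # -0.147
print(f"high-Q effect: {effect_high_q:.3f}")              # -0.210
print(f"interaction / main effect: {relative_size:.2f}")  # 0.43
```

The ratio of about 0.43 is what "almost half" refers to; the implied high-Q sensitivity is roughly 1.4 times the low-Q sensitivity.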
We conclude that our results for competition in the market for innovation are stronger for U.S.
firms that have more valuable growth options and thus more exposure to competitive threats
unique to the market for innovation production.
2.5.4.2 Trapped Assets
The theory of Bloom, Draca, and Van Reenen (2016) suggests that firms with more trapped assets
(assets with a high adjustment cost to redeploy) will have stronger incentives to increase innovation
following shocks to competition. This is due to the possibility that innovation can facilitate an
escape from competition into higher-quality market segments (Aghion, Bloom, Blundell, Griffith,
and Howitt (2005)). We note that this prediction is also squarely about irreversibility in production
and adjustment costs. This prediction thus pertains specifically to shocks to competition in the
market for existing products. When such competition increases, the affected firms become more
innovative even if they were not highly innovative before the shock’s arrival. The prediction is that
U.S. firms will increase innovation following such competitive shocks.
A materially related fact emerges from the results of the previous section. Just as our results are
stronger when U.S. firms have more growth options, they are weaker in the diametrically opposite
subsample: firms with few growth options. We find in the previous section that such firms react
less negatively to the competitive shocks than do firms with high growth options. These results
suggest that the effects of competition in the market for existing products should differ, and might
be more positive for U.S. firm innovation activities.
We now test whether the likely existence of trapped assets also favors higher innovation levels
for the affected U.S. firms as the aforementioned theories predict. We measure the likely existence
of trapped assets using the level of asset tangibility of the U.S. firms. We then consider regressions
similar to those in the previous section, but we interact internet penetration with a dummy indicat-
ing above-median asset tangibility in the prior year (instead of a high market-to-book dummy).
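The HighQ dummy and the high-tangibility dummy share one construction: a firm is flagged when its characteristic was above the median in the prior year. A minimal pandas sketch, assuming a firm-year panel with hypothetical column names (`firm`, `year`, and the characteristic column) and a cross-sectional median taken within each calendar year, which the text does not state explicitly:

```python
import numpy as np
import pandas as pd


def lagged_above_median(df: pd.DataFrame, var: str,
                        firm: str = "firm", year: str = "year") -> pd.Series:
    """1.0 if `var` was above its cross-sectional median in the firm's prior
    year, 0.0 if not, NaN if the prior year is missing.

    Hypothetical column names; medians are assumed to be taken within each
    calendar year over all firms with non-missing `var`.
    """
    out = df[[firm, year, var]].copy()
    # Cross-sectional median of the characteristic in each year
    median_by_year = out.groupby(year)[var].transform("median")
    above = (out[var] > median_by_year).astype(float)
    above[out[var].isna()] = np.nan  # missing characteristic -> missing flag
    out["above"] = above

    # Lag by one year within firm, requiring the previous row to be year t-1
    out = out.sort_values([firm, year])
    lagged = out.groupby(firm)["above"].shift(1)
    prev_year = out.groupby(firm)[year].shift(1)
    lagged[prev_year != out[year] - 1] = np.nan
    return lagged.reindex(df.index)
```

Usage would be, for example, `panel["HighQ"] = lagged_above_median(panel, "mtb")` and `panel["HighTang"] = lagged_above_median(panel, "tangibility")`, with both column names hypothetical.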
Table B.24 displays the results. Columns (1) to (3) show that firms with higher asset tangibility
complain more about Chinese competition. This supports the notion that these firms have fewer
options for adapting to the increased competition, because they cannot easily downsize even when
some theories would predict downsizing to be optimal. These results are consistent with the
existence of trapped assets. Moreover, despite these additional complaints, we find that high asset
tangibility firms increase innovation relative to firms with lower asset tangibility, as the cross terms
in Columns (4) to (7) are all positive and significant at the 1% or 5% level. These findings are
consistent with increased innovation as a plausible means of escaping competition.
Although these results support the theories of Bloom, Draca, and Van Reenen (2016) and
Aghion, Bloom, Blundell, Griffith, and Howitt (2005) for this well-motivated subsample, we note
that our broader results show that this outcome is not observed in all situations. In particular, the
sample-wide results strongly favor downsizing of innovative activities when the competitive shock
is in the market for innovation production. By contrast, the results here echo the differences in
outcomes when the competitive shock is primarily in the market for existing products, where the
predictions (and the empirical results) are more ambiguous.
Overall, our analysis of two competitive margins—competition in innovation and competition in
existing products—plus accounting for the role of asset composition of the treated U.S. firms, helps
to explain much of the disagreement in the literature regarding the impact of foreign competition
on U.S. firms’ innovative activities. Collectively, our results stress the importance of analyzing
competition on multiple margins when the competitive threats are more nuanced. Our results
also indicate the importance of initial conditions such as asset composition, as these characteristics
strongly moderate the incentives to increase or decrease innovation.
2.6 Conclusions
We examine the impact of increased Chinese competition in innovation on U.S. firms’ R&D and
patents. We use Chinese province-level data on internet penetration and geographic industry-
specific agglomeration data to generate plausibly exogenous variation in the capacity of Chinese
firms to challenge U.S. firm innovation. We find that U.S. firms complain more often ex post
about intense competition from Chinese firms, especially when discussing their innovation. Moreover,
we find direct evidence of realized competition as Chinese firms apply for more patents that cite
the patents of the U.S. firms that are exposed to the internet penetration. In placebo tests, we find
limited evidence that the Chinese internet penetration impacts R&D and patenting for firms from
other major economies.
Our main conclusion is that increased intellectual property competition has a strong and robust
negative impact on U.S. firm R&D spending and realized patents. This indicates a crowding-out
effect, as foreign rivals capture some of the rents from innovation. This result contrasts with the
existing empirical literature, which focuses on competition for existing products.
Our results also show variation between firms with high growth options and firms with high
assets-in-place. The impact of foreign competition in innovation on U.S. firm innovation is
particularly negative for firms that have higher-valued growth options as measured by their
market-to-book ratios. In contrast, the impact is less severe when U.S. firms have assets-in-place
with high adjustment costs. As predicted by existing theories, our results are consistent with firms
with high assets-in-place attempting to differentiate their existing products and thus investing more
in R&D and patents. Overall, our
results help to reconcile disagreement in the literature on whether competition leads to increases
or decreases in domestic firm innovation. Given the importance of these issues in political and
regulatory circles, we believe more work examining multiple competitive margins and potential
intellectual property theft would be invaluable.
Bibliography
Aghion, Philippe, Nicholas Bloom, Richard Blundell, Rachel Griffith, and Peter Howitt, 2005, Competition and innovation: An inverted-U relationship, Quarterly Journal of Economics 120, 701–728.
Ahern, Kenneth R, and Denis Sosyura, 2015, Rumor has it: Sensationalism in financial media, Review of Financial Studies 28, 2050–2093.
Altınkılıç, Oya, Vadim S Balashov, and Robert S Hansen, 2013, Are analysts' forecasts informative to the general public?, Management Science 59, 2550–2565.
Autor, David, David Dorn, Gordon H Hanson, Gary Pisano, and Pian Shu, 2018, Foreign competition and domestic innovation: Evidence from U.S. patents, MIT working paper.
Barber, Brad M, and Terrance Odean, 2008, All that glitters: The effect of attention and news on the buying behavior of individual and institutional investors, Review of Financial Studies 21, 785–818.
Ben-Rephael, Azi, Zhi Da, and Ryan D Israelsen, 2017, It depends on where you search: Institutional investor attention and underreaction to news, Review of Financial Studies 30, 3009–3047.
Bena, Jan, and Kai Li, 2014, Corporate innovations and mergers and acquisitions, Journal of Finance 69, 1923–1960.
Beyer, Anne, Daniel A Cohen, Thomas Z Lys, and Beverly R Walther, 2010, The financial reporting environment: Review of the recent literature, Journal of Accounting and Economics 50, 296–343.
Blankespoor, Elizabeth, Ed deHaan, and Christina Zhu, 2018, Capital market effects of media synthesis and dissemination: Evidence from robo-journalism, Review of Accounting Studies 23, 1–36.
Bloom, Nicholas, Mirko Draca, and John Van Reenen, 2016, Trade induced technical change? The impact of Chinese imports on innovation, IT and productivity, Review of Economic Studies 83, 87–117.
Brown, Lawrence D, Andrew C Call, Michael B Clement, and Nathan Y Sharp, 2015, Inside the "black box" of sell-side financial analysts, Journal of Accounting Research 53, 1–47.
Chen, Huaizhi, Lauren Cohen, Umit Gurun, Dong Lou, and Christopher Malloy, 2017, IQ from IP: Simplifying search in portfolio choice, Working paper.
Cohen, Lauren, and Andrea Frazzini, 2008, Economic links and predictable returns, Journal of Finance 63, 1977–2011.
Da, Zhi, Joseph Engelberg, and Pengjie Gao, 2011, In search of attention, Journal of Finance 66, 1461–1499.
DellaVigna, Stefano, and Joshua M Pollet, 2009, Investor inattention and Friday earnings announcements, Journal of Finance 64, 709–749.
Drake, Michael S, Darren T Roulstone, and Jacob R Thornock, 2015, The determinants and consequences of information acquisition via EDGAR, Contemporary Accounting Research 32, 1128–1161.
Duranton, Gilles, and Henry G Overman, 2005, Testing for localization using micro-geographic data, Review of Economic Studies 72, 1077–1106.
Duranton, Gilles, and Henry G Overman, 2008, Exploring the detailed location patterns of UK manufacturing industries using microgeographic data, Journal of Regional Science 48, 213–243.
Ellison, Glenn, and Edward L Glaeser, 1997, Geographic concentration in US manufacturing industries: A dartboard approach, Journal of Political Economy 105, 889–927.
Engelberg, Joseph E, and Christopher A Parsons, 2011, The causal impact of media in financial markets, Journal of Finance 66, 67–97.
Fedyk, Anastassia, 2018, Front page news: The effect of news positioning on financial markets, Working paper.
Florence, Philip Sargant, 1948, Investment, location, and size of plant, vol. 7 (CUP Archive).
Fuchs, Victor R, 1962, The determinants of the redistribution of manufacturing in the United States since 1929, Review of Economics and Statistics 44, 167–177.
Goldstein, Itay, and Liyan Yang, 2015, Information diversity and complementarities in trading and information acquisition, Journal of Finance 70, 1723–1765.
Goyenko, Ruslan Y, Craig W Holden, and Charles A Trzcinka, 2009, Do liquidity measures measure liquidity?, Journal of Financial Economics 92, 153–181.
Hillert, Alexander, Heiko Jacobs, and Sebastian Müller, 2014, Media makes momentum, Review of Financial Studies 27, 3467–3501.
Hoberg, Gerard, and Gordon Phillips, 2016, Text-based network industries and endogenous product differentiation, Journal of Political Economy 124, 1423–1465.
Hombert, Johan, and Adrien Matray, 2018, Can innovation help U.S. manufacturing firms escape import competition from China?, Journal of Finance 73, 2003–2039.
Hong, Harrison, and Jeffrey D Kubik, 2003, Analyzing the analysts: Career concerns and biased earnings forecasts, Journal of Finance 58, 313–351.
Hoover, Edgar M, 1948, Location of economic activity (McGraw-Hill Book Company, Inc., New York).
Kacperczyk, Marcin, and Amit Seru, 2007, Fund manager use of public information: New evidence on managerial skills, Journal of Finance 62, 485–528.
Kacperczyk, Marcin, Stijn Van Nieuwerburgh, and Laura Veldkamp, 2016, A rational theory of mutual funds' attention allocation, Econometrica 84, 571–626.
Kogan, Leonid, Dimitris Papanikolaou, Amit Seru, and Noah Stoffman, 2016, Technological innovation, resource allocation and growth, Quarterly Journal of Economics forthcoming.
Koh, Ping-Sheng, and David Reeb, 2015, Missing R&D, Journal of Accounting and Economics, 73–94.
Krugman, Paul R, 1993, Geography and trade (MIT Press).
Lee, Charles M C, Paul Ma, and Charles C Y Wang, 2015a, Search-based peer firms: Aggregating investor perceptions through internet co-searches, Journal of Financial Economics 116, 410–431.
Lee, Charles M C, Paul Ma, and Charles C Y Wang, 2015b, Search-based peer firms: Aggregating investor perceptions through internet co-searches, Journal of Financial Economics 116, 410–431.
Ljungqvist, Alexander, Felicia Marston, Laura T Starks, Kelsey D Wei, and Hong Yan, 2007, Conflicts of interest in sell-side research and the moderating role of institutional investors, Journal of Financial Economics 85, 420–456.
Loughran, Tim, and Bill McDonald, 2017, The use of EDGAR filings by investors, Journal of Behavioral Finance 18, 231–248.
Menzly, Lior, and Oguzhan Ozbas, 2010, Market segmentation and cross-predictability of returns, Journal of Finance 65, 1555–1580.
Michaely, Roni, Amir Rubin, and Alexander Vedrashko, 2016, Further evidence on the strategic timing of earnings news: Joint analysis of weekdays and times of day, Journal of Accounting and Economics 62, 24–45.
Neuhierl, Andreas, Anna Scherbina, and Bernd Schlusche, 2013, Market reaction to corporate press releases, Journal of Financial and Quantitative Analysis 48, 1207–1240.
Peress, Joel, 2014, The media and the diffusion of information in financial markets: Evidence from newspaper strikes, Journal of Finance 69, 2007–2043.
Phillips, Gordon M., and Alexei Zhdanov, 2013, R&D and the incentives from merger and acquisition activity, Review of Financial Studies 26, 34–78.
Reinganum, Jennifer F, 1989, The timing of innovation: Research, development, and diffusion, Handbook of Industrial Organization 1, 849–908.
Roberts, Michael R, and Toni M Whited, 2013, Endogeneity in empirical corporate finance, in Handbook of the Economics of Finance, vol. 2, pp. 493–572 (Elsevier).
Solomon, David H, and Eugene F Soltes, 2012, Managerial control of business press coverage, Working paper.
Tetlock, Paul C, 2007, Giving content to investor sentiment: The role of media in the stock market, Journal of Finance 62, 1139–1168.
Tetlock, Paul C, 2010, Does public financial news resolve asymmetric information?, Review of Financial Studies 23, 3520–3557.
Tetlock, Paul C, 2014, Information transmission in finance, Annual Review of Financial Economics 6, 365–384.
Appendix A
Figures
A.1 Figures for Chapter 1
Figure A.1: Information Production after Press Releases
The figure plots the percentage of media articles, analyst estimates, and EDGAR searches that are produced on different
days after the most recent press release from the related firm. The sample includes all news articles from the Dow Jones
Newswire, analyst estimates from I/B/E/S, and EDGAR searches from 2004 to 2017 (the EDGAR data end in June 2017).
The sample only includes firms that are in both the RavenPack database and the CRSP/Compustat universe. The x-axis
represents the number of days after the most recent press release, and the y-axis represents the percentage of
observations.
Figure A.2: Publication Time of Press Releases
The figures below plot the distribution of press release publication times within a day or within an hour. Panel (a) plots the average
percentage of press releases that are published in different 5-minute intervals within a day. I first split a 24-hour day into 288 5-minute
intervals. Then for each firm, I calculate the percentage of press releases that are published in each 5-minute bin. Finally, for each 5-minute
interval, I calculate the average percentage across the 8,756 firms. The x-axis denotes the publication time, and the y-axis denotes the
percentage. Each bar represents the average percentage of press releases published in that 5-minute interval, and the dashed lines represent
the 95% confidence intervals of the group means. For exposition, I only plot the confidence intervals for hours between 6AM and 8PM. The
darker bars denote the minutes [0, 5) or [30, 35) of each hour, that is, the first five minutes after the exact hour or half-hour point. To show
the distribution of publication times within an hour, Panel (b) plots the average percentage of press releases that are published in each minute
across firms. The calculation method is similar to that in Panel (a). I first calculate the percentage of press releases published in each minute
for each firm, and then calculate the average across the 8,756 firms. The bars denote the average percentage of press releases of all firms, and
the error bars represent the 95% confidence intervals of the group means.
(a) distribution within a day
(b) distribution within an hour
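The Panel (a) calculation (per-firm shares by 5-minute bin, then a cross-firm average) can be sketched as follows. This is a minimal illustration assuming a press-release table with hypothetical columns `firm` and `timestamp` (a pandas datetime), firms with no releases in a bin counting as 0%, and a normal-approximation 95% interval (mean plus or minus 1.96 standard errors); the caption does not say how its intervals were computed.

```python
import numpy as np
import pandas as pd


def intraday_bin_shares(pr: pd.DataFrame, firm: str = "firm",
                        ts: str = "timestamp", bin_minutes: int = 5) -> pd.DataFrame:
    """Average (across firms) percentage of each firm's press releases that
    fall in each intraday bin, with a normal-approximation 95% CI."""
    minute_of_day = pr[ts].dt.hour * 60 + pr[ts].dt.minute
    n_bins = 24 * 60 // bin_minutes  # 288 bins for 5-minute intervals
    bins = minute_of_day // bin_minutes
    # Firm x bin counts; bins with no releases count as 0% for that firm
    counts = pd.crosstab(pr[firm], bins).reindex(columns=range(n_bins), fill_value=0)
    pct = counts.div(counts.sum(axis=1), axis=0) * 100
    mean = pct.mean(axis=0)
    se = pct.std(axis=0, ddof=1) / np.sqrt(len(pct))
    return pd.DataFrame({"mean_pct": mean,
                         "ci_low": mean - 1.96 * se,
                         "ci_high": mean + 1.96 * se})
```

The within-hour version in Panel (b) would bin on `pr[ts].dt.minute` with 60 bins instead of minute of day with 288.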
Figure A.3: Coefficient Estimates for Different Event Days
This figure plots the coefficient estimates for β in the following two regressions using different event days:
$$\mathrm{AbnNews}_{ijt} = \beta \log(\mathrm{NPRA}_j + 1) + \gamma + \varepsilon_{ijt}$$
$$Y_{ijt} = \beta\, \widehat{\mathrm{AbnNews}}_{ij0} + \gamma + \varepsilon_{ijt}, \qquad Y \in \{\mathrm{AbnEdgar}, \mathrm{AbnAnalyst}\}$$
The dependent variables, AbnNews, AbnEdgar, and AbnAnalyst, measure the abnormal amount of media coverage, the abnormal number of EDGAR requests, and the abnormal number of analysts
issuing earnings forecasts, respectively. Detailed variable definitions can be found in Table B.11. The subscripts i, j, and t denote the firm, the press release, and the event time, respectively.
For press release j, NPRA measures the number of subsequent press releases that are published in the next 30 seconds. $\widehat{\mathrm{AbnNews}}_{ij0}$ is the predicted value of the abnormal
media coverage on the event day (day 0) from the first equation. γ includes the firm-year fixed effects, hour-date fixed effects, and detailed press release topic fixed effects. The regression
sample contains all the press releases that are published in the first 10 seconds of 7-9AM and 4PM. Figure (a) plots the estimates for the first equation, where AbnNews is the dependent
variable. Figures (b) and (c) plot the estimates from the second equation, where the dependent variables are AbnEdgar and AbnAnalyst, respectively. The triangle points denote the coefficient
estimates, and the error bars represent the 95% confidence intervals. The x-axis denotes the event time t, where t = 0 is the press release publication day. All standard errors are clustered
by firm and date.
(a) AbnNews (b) AbnEdgar (c) AbnAnalyst
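The two-step design in these captions (instrumenting media coverage with the on-screen-time proxy, then regressing information production on fitted coverage) is a just-identified two-stage least squares. The sketch below illustrates the mechanics on simulated data only: it omits the fixed effects and the firm-and-date clustered standard errors used in the study, and all variable names and parameter values are hypothetical. Standard errors should come from dedicated 2SLS software, since those from a naive second-stage OLS are incorrect.

```python
import numpy as np


def ols_slope(y, x):
    """Slope from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


def two_stage_least_squares(y, x_endog, z):
    """Just-identified 2SLS point estimate (no fixed effects, no SEs)."""
    Z = np.column_stack([np.ones(len(z)), z])
    pi, *_ = np.linalg.lstsq(Z, x_endog, rcond=None)  # first stage
    x_hat = Z @ pi                                    # fitted AbnNews
    return pi[1], ols_slope(y, x_hat)                 # second stage


# Simulated data: a common shock u drives both news and EDGAR activity,
# so naive OLS is biased; the instrument z shifts news but not u.
rng = np.random.default_rng(0)
n = 50_000
z = rng.normal(size=n)                    # stand-in for log(NPRA + 1)
u = rng.normal(size=n)                    # unobserved confounder
abn_news = 0.5 * z + u + rng.normal(size=n)
abn_edgar = 0.3 * abn_news + u + rng.normal(size=n)

first_stage, beta_2sls = two_stage_least_squares(abn_edgar, abn_news, z)
beta_ols = ols_slope(abn_edgar, abn_news)
reduced_form = ols_slope(abn_edgar, z)
print(first_stage, beta_2sls, beta_ols, reduced_form / first_stage)
```

With a single instrument, the 2SLS estimate equals the reduced-form coefficient divided by the first-stage coefficient, which the last printed value confirms; the naive OLS slope is pulled upward by the confounder.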
Figure A.4: Coefficient Estimates for Cumulative Effects
This figure plots the coefficient estimates for β in the following two regressions using different cumulative periods:
$$\mathrm{AbnNews}^{-1\to t}_{ij} = \beta \log(\mathrm{NPRA}_j + 1) + \gamma + \varepsilon_{ijt}$$
$$Y^{-1\to t}_{ij} = \beta\, \widehat{\mathrm{AbnNews}}_{ij0} + \gamma + \varepsilon_{ijt}, \qquad Y \in \{\mathrm{AbnEdgar}, \mathrm{AbnAnalyst}\}$$
The dependent variables, AbnNews, AbnEdgar, and AbnAnalyst, measure the cumulative abnormal amount of media coverage, number of EDGAR requests, and number of analysts issuing
earnings forecasts from day -1 to day t, respectively. Detailed variable definitions can be found in Table B.11. The subscripts i, j, and t denote the firm, the press release, and the event
time, respectively. For press release j, NPRA measures the number of subsequent press releases that are published in the next 30 seconds. $\widehat{\mathrm{AbnNews}}_{ij0}$ is the predicted
value of the abnormal media coverage on the event day (day 0). γ includes the firm-year fixed effects, hour-date fixed effects, and detailed press release topic fixed effects. The regression
sample contains all the press releases that are published in the first 10 seconds of 7-9AM and 4PM. Figure (a) plots the estimates for the first equation, where cumulative AbnNews is the
dependent variable. Figures (b) and (c) plot the estimates from the second equation, where the dependent variables are cumulative AbnEdgar and cumulative AbnAnalyst, respectively. The
triangle points denote the coefficient estimates, and the error bars represent the 95% confidence intervals. The x-axis denotes the event time t, where t = 0 is the press release publication
day. All standard errors are clustered by firm and date.
(a) AbnNews (b) AbnEdgar (c) AbnAnalyst
A.2 Figures for Chapter 2
Figure A.5: Complaints about Chinese competition
Figure A.6: Internet penetration growth variation
Figure A.7: Number of industries (SIC2) covered by Chinese public firms
Figure A.8: Weight loadings by Province-Industry
Appendix B
Tables
B.1 Tables for Chapter 1
Table B.1: Summary statistics
N Mean Std Dev 10th 50th 90th
Panel A: Press release issuance
Full sample
# of PRs 738,196
# of unique firms 8,756
# of PRs per firm-year 59,035 12.50 9.08 3.00 11.00 23.00
# of PRs per day 2,958 242.00 113.16 113.70 226.00 400.00
Issued in the first 10 seconds of each hour
# of PRs 131,683
# of unique firms 7,503
# of PRs per firm-year 34,532 3.81 3.97 1.00 2.00 8.00
# of PRs per day 2,958 43.04 21.97 19.00 39.00 72.00
Issued in the first 10 seconds of 7-9AM and 4PM
# of PRs 80,246
# of unique firms 6,560
# of PRs per firm-year 24,766 3.24 3.43 1.00 2.00 7.00
# of PRs per day 2,949 26.35 15.93 10.00 23.00 48.00
Panel B: Sample summary statistics
Issued in the first 10 seconds of each hour
Age 131376 20.54 16.06 5.00 15.00 48.00
Q 125557 2.05 1.52 0.97 1.52 3.81
AT 130417 21006.78 86222.38 56.95 1093.07 32476.00
ESS 80246 55.35 7.90 50.00 50.00 69.00
NWord 131683 11.56 4.30 7.00 11.00 17.00
NPRA 131683 36.49 31.95 4.00 27.00 84.00
log(NPRA + 1) 131683 3.17 1.11 1.61 3.33 4.44
News 131683 1.75 2.96 0.00 0.00 5.00
AbnNews 131683 0.34 0.82 -0.46 -0.03 1.59
News Dummy 131683 0.48 0.50 0.00 0.00 1.00
DJPR 131683 0.85 0.36 0.00 1.00 1.00
Issued in the first 10 seconds of 7-9AM and 4PM
NPRA 80246 51.07 32.24 13.00 46.00 96.00
log(NPRA + 1) 80246 3.71 0.79 2.64 3.85 4.57
AbnNews 80246 0.41 0.83 -0.43 0.00 1.67
AbnEdgar Human 75673 0.35 0.75 -0.34 0.03 1.41
AbnEdgar Exist 75673 -0.48 0.99 -1.83 -0.16 0.51
AbnEdgar Ins 75673 -0.85 1.22 -2.53 -0.64 0.20
AbnEdgar 13f 75673 -0.69 1.20 -2.35 -0.28 0.48
AbnAnalyst 80246 0.18 0.62 -0.30 -0.03 1.20
AbnAnalyst MoreAccu 80246 0.04 0.59 -0.53 -0.06 0.94
AbnAnalyst LessAccu 80246 0.01 0.56 -0.57 -0.06 0.89
AbnAnalyst MoreExp 80246 -
AbnAnalyst LessExp 80246 -
AbnTurnover 80117 0.02 0.80 -0.79 0.00 0.96
AbnSpread EW 41832 1.11 11.48 0.64 0.97 1.54
AbnSpread VW 41832 1.04 0.51 0.58 0.92 1.63
|CAR| 80142 2.84 5.35 0.21 1.36 6.65
Range 80142 5.09 5.62 1.33 3.53 10.35
% Priv 80246 0.04 0.04 0.00 0.03 0.09
log(NPRPriv + 1) 80246 0.83 0.71 0.00 0.69 1.79
Table B.2: Covariate Balancing Test of The On-screen Time
This table shows that the observable firm and event characteristics do not correlate with NPRA, which measures the number of new press
releases published in the top 4 press release wires in the 30 seconds after press release j. The dependent variable for Columns (1)-(7)
is the log of 1 plus NPRA, and for Column (8) is the log of 1 plus the number of media articles on the Dow Jones Newswire covering the firm
on the event day. The regression sample includes all the press releases published in the first 10 seconds of each hour. Standard errors are
clustered by firm and day, and t-statistics are reported in parentheses. Coefficients marked with ***, **, and * are significant at the 1%,
5%, and 10% level, respectively. Detailed variable definitions can be found in Table B.11.
log(NPRA + 1) News
(1) (2) (3) (4) (5) (6) (7) (8)
Q 0.001 0.001 0.009
(0.409) (0.299) (3.503)
log(AT) 0.000 0.001 0.014
(0.065) (0.151) (1.899)
log(Age + 1) 0.002 0.013 0.039
(0.091) (0.524) (1.774)
log(ESS) 0.025 0.022 0.496
(1.403) (1.201) (17.274)
NWords 0.000 0.000 0.019
(0.191) (0.276) (22.174)
log(NPRInd + 1) 0.005 0.005 0.027
(0.658) (0.620) (3.295)
Firm FE Y Y Y Y Y Y Y Y
Date FE Y Y Y Y Y Y Y Y
Hour FE Y Y Y Y Y Y Y Y
Broad Topic FE Y Y Y Y Y Y Y Y
F-stat 0.3906 147.7
p-value 0.886 <0.001
Observations 127,844 130,407 130,407 131,683 131,683 131,683 127,844 127,844
Adjusted R² 0.778 0.778 0.778 0.777 0.777 0.777 0.778 0.464
Table B.3: Press Releases’ On-screen Time and Media Coverage
This table shows that if a press release stays on the screen for a shorter time because new press releases replace it, the issuing
firm receives less media coverage. I estimate the following regression, where γ collects the fixed effects:
$$\mathrm{AbnNews}_{ijt} = \beta \log(\mathrm{NPRA}_j + 1) + \gamma + \varepsilon_{ijt}$$
The dependent variable is the abnormal media coverage, defined as the log of 1 plus the number of news articles that cover the
press-release-issuing firm on the issuing day, minus the log of 1 plus the average number of news articles per day for the firm in
the past 60 days, skipping 10 days. The independent variable, NPRA, measures the number of new press releases published
in the top 4 press release wires in the 30 seconds after press release j. This is the proxy for the on-screen time of
each press release. The regression controls for fixed effects including date-hour, firm, firm-year, broad topic, and/or
detailed topic. The regression sample contains all the press releases published in the first 10 seconds of an hour. Standard
errors are clustered by firm and day, and t-statistics are reported in parentheses. Coefficients marked with ***, **, and
* are significant at the 1%, 5%, and 10% level, respectively. Detailed variable definitions can be found in Table B.11 in the
Appendix.
AbnNews
(1) (2) (3) (4) (5)
log(NPRA + 1) 0.128 0.132 0.163 0.100 0.093
(12.252) (12.877) (11.263) (8.332) (7.990)
Date-Hour FE Y Y Y Y Y
Firm FE Y
Firm-Year FE Y Y Y
Broad Topic FE Y
Detailed Topic FE Y
Observations 131,683 131,683 131,683 131,683 131,683
Adjusted R² 0.174 0.303 0.327 0.473 0.491
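The abnormal-count measure defined in the Table B.3 caption (and reused for EDGAR requests and analysts in Table B.5) can be sketched for a single firm's daily series. This assumes a series on consecutive calendar days with zeros filled in, and reads "the past 60 days, skipping 10 days" as the benchmark window from day t-70 through day t-11; both the calendar-day convention and the exact window boundaries are our assumptions.

```python
import numpy as np
import pandas as pd


def abnormal_log_count(daily: pd.Series, window: int = 60, skip: int = 10) -> pd.Series:
    """log(1 + count_t) - log(1 + mean daily count over days t-skip-window .. t-skip-1).

    `daily` holds one firm's counts on consecutive days (zeros filled in).
    The benchmark window [t-70, t-11] for window=60, skip=10 is an assumed
    reading of "past 60 days, skipping 10 days".
    """
    # Shift so day t sees count_{t-skip-1}, then average the trailing `window` values
    benchmark = daily.shift(skip + 1).rolling(window, min_periods=window).mean()
    return np.log1p(daily) - np.log1p(benchmark)
```

On a panel, this could be applied firm by firm, e.g. `panel.groupby("firm")["news"].transform(abnormal_log_count)`, with hypothetical column names; days without a full 60-day benchmark come out as missing.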
Table B.4: Falsification tests
This table shows falsification tests of the effect of on-screen time on media coverage. The dependent variable in Column
(1) is a dummy variable that equals 1 if the automated algorithm of the Dow Jones Newswire republishes the press release.
The dependent variable in Column (2) is the abnormal media measure, AbnNews, on the day before the event (press release
publication) day, and the dependent variables in Columns (3)-(7) are the same abnormal media measure on the event day.
Columns (3) and (4) show regression results using press releases that are published in the first 10 seconds of minutes other
than the first minute. Column (3) includes all the press releases published in the first 10 seconds of the 31st minute of each
hour. Column (4) includes all the press releases published in the first 10 seconds of all minutes except the 1st and the 31st
minutes. Columns (5) and (6) show regression results using press releases that are published in different hours. Column
(5) includes all the press releases published in the first 10 seconds of 7-9AM and 4PM, and Column (6) includes all the
press releases published in the first 10 seconds of all other hours. In Column (7), I first sort the press releases by NPRA into
quintiles, and Q2-Q5 in the independent variables are quintile dummies, where Q5 represents the highest-NPRA quintile.
Standard errors are clustered by firm and day, and t-statistics are reported in parentheses. Coefficients marked with ***,
**, and * are significant at the 1%, 5%, and 10% level, respectively. Detailed variable definitions can be found in Table B.11.
                DJPR     AbnNews
                         t-1      31st minute  Other minutes  7-9AM & 4PM  Other hours
                (1)      (2)      (3)          (4)            (5)          (6)          (7)
log(NPRA+1)     0.006    0.006    0.081        0.047          0.104        0.047        0.021
                (0.774)  (0.910)  (3.867)      (17.207)       (8.523)      (0.841)      (0.889)
log(NPRA+1) x Q2 0.034
(1.168)
log(NPRA+1) x Q3 0.062
(2.200)
log(NPRA+1) x Q4 0.097
(3.340)
log(NPRA+1) x Q5 0.101
(3.113)
Date-Hour FE Y Y Y Y Y Y Y
Firm-Year FE Y Y Y Y Y Y Y
Detailed Topic FE Y Y Y Y Y Y Y
Second FE Y Y Y Y Y Y Y
Observations 131,683 131,683 69,185 252,532 80,246 51,437 131,683
Adjusted R² 0.284 0.084 0.455 0.488 0.539 0.105 0.496
Table B.5: Media Coverage and Information Production
This table shows that media coverage increases the number of EDGAR web requests and the number of analysts issuing
earnings forecasts. The dependent variables in Columns (1)-(4) are the abnormal number of EDGAR requests, defined as the
log of 1 plus the number of requests made on a day minus the log of 1 plus the average number of EDGAR requests per day
in the past 60 days, skipping 10 days. I exclude EDGAR searches that are on an index page, have a server status code above 300,
or come from IP addresses that have made more than 5 requests in a minute or 1,000 requests in a day. The dependent variables in
Columns (5)-(8) are the abnormal number of earnings forecasts, defined as the log of 1 plus the number of analysts who issued
any forecasts for the firm on a day minus the log of 1 plus the average daily number of analysts issuing forecasts for the
firm in the past 60 days, skipping 10 days. The dependent variables in Columns (1)-(3) and (5)-(7) are measured on day
0, while the dependent variables in Columns (4) and (8) are cumulative measures over days 0 to 1, where the press release
day is day 0. Columns (1) and (5) show OLS regressions where the abnormal media coverage is the independent variable.
Columns (2) and (6) show OLS regressions where the instrument, log(NPRA+1), is the independent variable. Columns
(3), (4), (7), and (8) show the second-stage results of two-stage least squares regressions, where the first stage is shown in
Column (5) of Table B.4. The regression sample includes all the press releases published in the first 10 seconds of 7-9AM
and 4PM. Standard errors are clustered by firm and day, and t-statistics are reported in parentheses. Coefficients marked
with ***, **, and * are significant at the 1%, 5%, and 10% level, respectively. Detailed variable definitions can be found in
Table B.11.
                AbnEdgar                                AbnAnalyst
                OLS       IV       2SLS     2SLS        OLS       IV       2SLS     2SLS
                day 0     day 0    day 0    days 0-1    day 0     day 0    day 0    days 0-1
                (1)       (2)      (3)      (4)         (5)       (6)      (7)      (8)
AbnNews         0.195                                   0.288
                (22.370)                                (29.663)
log(NPRA+1)               0.034                                   0.053
                          (2.821)                                 (5.486)
Fitted AbnNews                     0.290    0.313                          0.469    0.784
                                   (2.870)  (3.374)                        (6.136)  (7.609)
Date-hour FE Y Y Y Y Y Y Y Y
Firm-Year FE Y Y Y Y Y Y Y Y
Detailed Topic FE Y Y Y Y Y Y Y Y
Observations 75,667 75,667 75,667 75,667 80,246 80,246 80,246 80,246
Adjusted R² 0.300 0.272 0.294 0.328 0.479 0.398 0.446 0.495
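The EDGAR screening rules in the Table B.5 caption can be sketched as a pandas filter. This is an illustration under stated assumptions: the column names (`ip`, `date`, `time`, `code`, `idx`) are assumed to mirror the SEC EDGAR log file fields, with `date` and `time` as strings and `idx` equal to 1 for index-page hits; the request-rate thresholds are applied per IP-day and counted on all raw requests, which the caption does not specify.

```python
import pandas as pd


def filter_edgar_requests(log: pd.DataFrame, max_per_minute: int = 5,
                          max_per_day: int = 1000) -> pd.DataFrame:
    """Drop index-page hits, status codes above 300, and requests from
    IP-days that look automated (more than `max_per_minute` requests in any
    minute or more than `max_per_day` requests that day)."""
    df = log.copy()
    ts = pd.to_datetime(df["date"] + " " + df["time"])
    df["_minute"] = ts.dt.floor("min")
    df["_one"] = 1
    # Requests per IP-minute, then the busiest minute within each IP-day
    df["_n_minute"] = df.groupby(["ip", "_minute"])["_one"].transform("sum")
    df["_max_minute"] = df.groupby(["ip", "date"])["_n_minute"].transform("max")
    df["_n_day"] = df.groupby(["ip", "date"])["_one"].transform("sum")

    robot = (df["_max_minute"] > max_per_minute) | (df["_n_day"] > max_per_day)
    keep = (df["idx"] == 0) & (df["code"] <= 300) & ~robot
    return log.loc[keep]
```

Flagging whole IP-days, rather than only the offending minutes, is one design choice among several; the stricter per-IP version would group by `ip` alone.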
Table B.6: The Effect of Media Coverage and Information Producer Characteristics
This table shows the effect of media coverage on information producers with different characteristics. The table reports the
second-stage regression coefficients from Equation 1.3, where the first-stage regression result is shown in Column (5)
of Table B.4. The dependent variables in Columns (1)-(3) are the cumulative abnormal number of EDGAR requests on
days 0 and 1, and the dependent variables in Columns (4)-(9) are the cumulative abnormal number of analysts issuing
earnings forecasts on days 0 and 1. Columns (1) and (2) include EDGAR requests that are from financial institutions. I identify
financial institutions by matching IP addresses to known institutions registered under autonomous system numbers
(ASNs). Then, using the institution names provided by the ASN registration list, I search for finance-related words
in the names (Column 1) or match the names to 13F institutions (Column 2) to identify financial institutions. Column
(3) only includes human web requests from IP addresses that have searched for the same firm in the previous month.
For more details about the EDGAR log data and the institution matching process, please refer to Online Appendix B.
Columns (4)-(9) show the effect of media on analysts with different characteristics. In Columns (4)-(6), I group analysts
by their accuracy, defined as the average relative accuracy in the previous three years. Column (4) only includes analysts
who have above-median accuracy, and Column (5) includes analysts who have below-median accuracy. Column (6) uses
the difference of the dependent variables in Columns (4) and (5) as the dependent variable. In Columns (7)-(9), I sort
analyst-firm pairs into two groups based on experience, defined as the number of years that the analyst has covered
the same firm in the past five years. Column (7) includes analysts who have above-median experience in a year,
and Column (8) includes analysts who have below-median experience. Column (9) uses the difference of the dependent
variables in Columns (7) and (8) as the dependent variable. The regression sample includes all the press releases published
in the first 10 seconds of 7-9AM and 4PM. Standard errors are clustered by firm and day, and t-statistics are reported in the
parentheses. Coefficients marked with ***, **, and * are significant at the 1%, 5%, and 10% level, respectively. Detailed
variable definitions can be found in Table B.11.
                        AbnEdgar                      AbnAnalyst
                        Ins       13F       Existing  Accuracy                      Experience
                                            follower  >50th     <50th     Diff      >50th     <50th     Diff
                        (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)       (9)
AbnNews (fitted)        0.374     0.385     0.402     0.398     0.344     0.060     0.653     0.439     0.222
                        (2.496)   (2.503)   (2.920)   (5.179)   (5.215)   (0.915)   (7.161)   (6.532)   (2.837)
Date-hour FE            Y         Y         Y         Y         Y         Y         Y         Y         Y
Firm-Year FE            Y         Y         Y         Y         Y         Y         Y         Y         Y
Detailed Topic FE       Y         Y         Y         Y         Y         Y         Y         Y         Y
Observations            75,667    75,667    75,667    80,246    80,246    80,246    80,246    80,246    80,246
Adjusted R²             0.242     0.608     0.516     0.320     0.369     0.097     0.452     0.383     0.224
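Columns (4)-(6) sort analysts on relative forecast accuracy, which Table B.11 defines following Ljungqvist, Marston, Starks, Wei, and Yan (2007). A minimal sketch of the rescaling and averaging steps, assuming the price-scaled absolute forecast errors have already been computed for each firm-year. Two choices are assumptions rather than details from the text: a firm-year in which every analyst has the same error (including a single-analyst firm-year) is skipped because the 0-to-1 rescaling is undefined, and the averaging window is years t-2 to t as stated in Table B.11. The tickers and analyst labels are hypothetical.

```python
from statistics import fmean


def scaled_abs_error(forecast, actual, prior_price):
    """Absolute forecast error scaled by the previous year-end price."""
    return abs(forecast - actual) / prior_price


def relative_scores(errors):
    """Rescale one firm-year's errors: most accurate analyst -> 1, least accurate -> 0."""
    lo, hi = min(errors.values()), max(errors.values())
    if hi == lo:
        # Assumption: with no dispersion the score is undefined, so the firm-year is skipped.
        return {}
    return {analyst: (hi - e) / (hi - lo) for analyst, e in errors.items()}


def relative_accuracy(firm_year_errors, analyst, year):
    """Average score of `analyst` across covered firms over years year-2 .. year."""
    scores = []
    for (_firm, y), errors in firm_year_errors.items():
        if year - 2 <= y <= year:
            score = relative_scores(errors).get(analyst)
            if score is not None:
                scores.append(score)
    return fmean(scores) if scores else None


errors = {  # hypothetical (firm, year) -> {analyst: scaled absolute error}
    ("AAA", 2010): {"x": 0.01, "y": 0.03, "z": 0.02},
    ("BBB", 2009): {"x": 0.05, "y": 0.01},
    ("CCC", 2006): {"x": 0.00, "y": 0.10},  # outside the three-year window for 2010
}
acc_x = relative_accuracy(errors, "x", 2010)  # average of 1.0 (AAA) and 0.0 (BBB)
```

Analysts would then be split at the median of this score, as in Columns (4) and (5).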
Table B.7: The Effect of Media Coverage and Firm Characteristics
This table shows how the effect of media coverage changes with firm characteristics. The table reports the second-stage
regression coefficients from Equation 1.3. The dependent variables in Columns (1)-(4) are the cumulative abnormal number
of EDGAR requests on days 0 and 1, and the dependent variables in Columns (5)-(8) are the cumulative abnormal number
of analysts issuing earnings forecasts on days 0 and 1. In Columns (1), (2), (5), and (6), I first sort firms into two groups
based on their average institutional ownership in the previous year. Columns (1) and (5) include observations where
institutional ownership was above the median in the previous year, while Columns (2) and (6) use the below-median sample.
Similarly, in Columns (3), (4), (7), and (8), I sort firms into two groups based on their average monthly idiosyncratic
volatility (IVOL) over the last three months. Columns (3) and (7) include observations where IVOL is above the median
measured over the same three months, while Columns (4) and (8) use the below-median sample. The regression sample includes
all the press releases published in the first 10 seconds of 7-9AM and 4PM. Standard errors are clustered by firm and day,
and t-statistics are reported in parentheses. Coefficients marked with ***, **, and * are significant at the 1%, 5%, and
10% levels, respectively. Detailed variable definitions can be found in Table B.11.
                        AbnEdgar                                AbnAnalyst
                        Institution Holding IVOL                Institution Holding IVOL
                        >50th     <50th     >50th     <50th     >50th     <50th     >50th     <50th
                        (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)
AbnNews (fitted)        0.395     0.325     0.275     0.395     1.009     0.506     0.920     0.927
                        (2.707)   (1.405)   (1.263)   (2.091)   (4.971)   (2.993)   (4.269)   (3.793)
Date-hour FE            Y         Y         Y         Y         Y         Y         Y         Y
Firm-Year FE            Y         Y         Y         Y         Y         Y         Y         Y
Detailed Topic FE       Y         Y         Y         Y         Y         Y         Y         Y
Observations            36,220    36,223    30,420    30,429    38,258    38,304    30,420    30,429
Adjusted R²             0.340     0.286     0.300     0.337     0.486     0.476     0.378     0.450
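The sample splits in Table B.7 assign each observation to an above- or below-median group on a lagged firm characteristic. A minimal sketch, assuming the median is taken cross-sectionally within each period and that observations exactly at the median fall in the below-median group; the text does not specify either choice, and the tickers and ownership values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_split(values_by_firm_period):
    """Label each (firm, period) observation 'high' or 'low' relative to that period's median.

    values_by_firm_period maps (firm, period) -> a lagged characteristic, such as average
    institutional ownership over the prior year. Ties at the median are labeled 'low'
    (an assumption).
    """
    by_period = defaultdict(list)
    for (_firm, period), value in values_by_firm_period.items():
        by_period[period].append(value)
    medians = {period: median(vals) for period, vals in by_period.items()}
    return {
        (firm, period): "high" if value > medians[period] else "low"
        for (firm, period), value in values_by_firm_period.items()
    }


ownership = {  # hypothetical prior-year institutional ownership shares
    ("AAA", 2010): 0.8, ("BBB", 2010): 0.2, ("CCC", 2010): 0.5,
    ("AAA", 2011): 0.1, ("BBB", 2011): 0.3,
}
groups = median_split(ownership)
```

The same function applies to the IVOL split if the period key is the three-month window over which IVOL is measured.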
Table B.8: Media coverage and market reaction
This table shows that media coverage significantly affects trading volume, announcement returns, effective spreads, and price ranges. The table reports the
second-stage regression coefficients from Equation 1.3. The dependent variables in Columns (1)-(5) are from the first trading day after the press releases, and
the dependent variables in Columns (6)-(10) are average measures (except for CAR) or cumulative measures (CAR) over days 1-2, the second and third trading
days after the press release. The dependent variables in Columns (1) and (6) are abnormal turnover, defined as the log of 1 plus the turnover minus the log of 1 plus
the average daily turnover in the previous 60 trading days. The dependent variables in Columns (2), (3), (7), and (8) are abnormal effective spread measures,
equal-weighted in Columns (2) and (7) and weighted by dollar value in Columns (3) and (8). The dependent variables in
Columns (4) and (9) are the absolute values of cumulative abnormal returns, where abnormal returns are calculated by subtracting the CRSP value-weighted
index return on the same day. The dependent variables in Columns (5) and (10) are price ranges, defined as the log of the daily highest price minus the
log of the daily lowest price. The regression sample includes all the press releases published in the first 10 seconds of 7-9AM and 4PM. Standard errors are
clustered by firm and day, and t-statistics are reported in parentheses. Coefficients marked with ***, **, and * are significant at the 1%, 5%, and 10% levels,
respectively. Detailed variable definitions can be found in Table B.11.
                        Day 0                                                       Days 1-2
                        AbnTurnover AbnSpread   AbnSpread   |CAR|       Range       AbnTurnover AbnSpread   AbnSpread   |CAR|       Range
                                    EW          VW                                              EW          VW
                        (1)         (2)         (3)         (4)         (5)         (6)         (7)         (8)         (9)         (10)
AbnNews (fitted)        0.492       0.239       0.266       0.021       0.023       0.222       0.036       0.014       0.003       0.007
                        (3.519)     (3.006)     (2.636)     (4.349)     (4.353)     (1.932)     (0.544)     (0.171)     (0.699)     (2.063)
Firm-Year FE            Y           Y           Y           Y           Y           Y           Y           Y           Y           Y
Date-hour FE            Y           Y           Y           Y           Y           Y           Y           Y           Y           Y
Second FE               Y           Y           Y           Y           Y           Y           Y           Y           Y           Y
Detailed Topic FE       Y           Y           Y           Y           Y           Y           Y           Y           Y           Y
Observations            80,142      41,798      41,798      80,142      80,142      80,173      41,924      41,924      80,160      80,173
Adjusted R²             0.372       0.197       0.153       0.310       0.517       0.352       0.245       0.203       0.269       0.609
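Columns (2), (3), (7), and (8) of Table B.8 use the effective-spread measures defined in Table B.11 (AbnSpread EW and AbnSpread VW). A minimal sketch of the daily calculation for one stock, assuming each trade has already been matched to the prevailing consolidated BBO and that the condition-code and corrected-trade filters were applied upstream; only the zero-price, zero-size, and quote-width filters are shown. The (price, size, bid, ask) input layout is a hypothetical choice.

```python
import math
from statistics import fmean


def effective_spread(price, bid, ask):
    """Proportional effective spread 2*|log(P_k) - log(M_k)|, with M_k the BBO midpoint."""
    mid = (bid + ask) / 2
    return 2 * abs(math.log(price) - math.log(mid))


def daily_effective_spreads(trades):
    """Return (equal-weighted, dollar-value-weighted) effective spread for one stock-day.

    trades: list of (price, size, bid, ask), where bid/ask are the consolidated BBO
    prevailing at the trade (matching and condition-code filters assumed done upstream).
    """
    kept = []  # (spread, dollar value) pairs that survive the filters
    for price, size, bid, ask in trades:
        if price <= 0 or size <= 0:
            continue  # zero-price or zero-size trade
        mid = (bid + ask) / 2
        if mid <= 0 or ask - bid < 0 or ask - bid > 0.5 * mid:
            continue  # crossed quote or spread above 50% of the midpoint
        kept.append((effective_spread(price, bid, ask), price * size))
    if not kept:
        return None, None
    equal_weighted = fmean([s for s, _ in kept])
    value_weighted = sum(s * w for s, w in kept) / sum(w for _, w in kept)
    return equal_weighted, value_weighted


def abnormal_spread(spread_today, spreads_past_60_days):
    """Ratio of today's spread to the average daily spread over the prior 60 trading days."""
    return spread_today / fmean(spreads_past_60_days)


trades = [
    (10.10, 100, 10.00, 10.10),  # trade at the ask
    (10.05, 300, 10.00, 10.10),  # trade at the midpoint
    (0.00, 100, 10.00, 10.10),   # dropped: zero price
    (10.00, 100, 10.20, 10.00),  # dropped: crossed quote
]
ew, vw = daily_effective_spreads(trades)
```

The value-weighted version puts more weight on the large midpoint trade, so here it is smaller than the equal-weighted average.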
Table B.9: Media coverage and delayed response ratio
This table shows that media coverage has no significant effect on price efficiency. The table reports the second-stage
regression coefficients from Equation 1.3. The dependent variables are delayed response ratios, as used in DellaVigna and
Pollet (2009). The delayed response ratio over days [0, X] is calculated as $R_{(2,X)} / R_{(0,X)}$, where $R_{(2,X)}$ is the cumulative
abnormal return over days [2, X] and $R_{(0,X)}$ is the cumulative abnormal return over days [0, X]. Columns (1)-(6) show
results for different window lengths. The regression sample includes all the press releases published in the first 10
seconds of 7-9AM and 4PM. Standard errors are clustered by firm and day, and t-statistics are reported in parentheses.
Coefficients marked with ***, **, and * are significant at the 1%, 5%, and 10% levels, respectively. Detailed variable
definitions can be found in Table B.11.
                        Delayed response ratio
                        [2,5]     [2,15]    [2,30]    [2,45]    [2,60]    [2,75]
                        (1)       (2)       (3)       (4)       (5)       (6)
AbnNews (fitted)        0.097     0.032     1.958     0.346     0.329     0.452
                        (0.158)   (0.061)   (0.429)   (0.979)   (1.140)   (1.614)
Firm-Year FE            Y         Y         Y         Y         Y         Y
Date-hour FE            Y         Y         Y         Y         Y         Y
Detailed Topic FE       Y         Y         Y         Y         Y         Y
Observations            80,127    80,127    80,127    80,127    80,127    80,127
Adjusted R²             0.019     0.004     0.135     0.009     0.005     0.023
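The delayed response ratio in Table B.9 divides the cumulative abnormal return over days [2, X] by the cumulative abnormal return over days [0, X]. A minimal sketch, assuming cumulative abnormal returns are simple sums of daily abnormal returns and that the ratio is left undefined (NaN) when the full-window CAR is exactly zero; the return series is hypothetical.

```python
import math


def delayed_response_ratio(abnormal_returns, x):
    """CAR over days [2, X] divided by CAR over days [0, X].

    abnormal_returns[d] is the abnormal return on event day d (day 0 = press release day).
    Assumption: CARs are simple sums of daily abnormal returns.
    """
    if x < 2 or len(abnormal_returns) <= x:
        raise ValueError("need abnormal returns for days 0..X with X >= 2")
    car_0_x = sum(abnormal_returns[: x + 1])
    if car_0_x == 0:
        return math.nan  # undefined when the full-window CAR is exactly zero
    return sum(abnormal_returns[2 : x + 1]) / car_0_x


ar = [0.02, 0.01, 0.005, 0.005]        # hypothetical abnormal returns for days 0..3
ratio = delayed_response_ratio(ar, 3)  # 0.01 / 0.04
```

Because the denominator can be close to zero for individual events, single ratios can be very large in magnitude.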
Table B.10: Robustness tests
This table shows that investors and analysts are unlikely to experience a similar blink effect. I estimate regressions similar to those in previous tables.
Panels (A) and (B) both use all the press releases published in the first 10 seconds of 7-9AM and 4PM. Column (1) uses the abnormal media coverage
on day 0 as the dependent variable. Column (2) uses the cumulative abnormal number of EDGAR searches on days 0 and 1 as the dependent
variable. Column (3) uses the cumulative abnormal number of analyst forecasts on days 0 and 1 as the dependent variable. Column (4) uses
the abnormal turnover on the event day as the dependent variable. Column (5) uses the abnormal spread on the event day as the dependent
variable. Column (6) uses the absolute abnormal return on the event day as the dependent variable. Column (7) uses the intraday price
range on the event day as the dependent variable. In Panel (A), I include an interaction term, log(NPRA + 1) x % Priv, where % Priv is the
percentage of press releases in NPRA that are issued by private firms. In Panel (B), I directly regress the dependent variables on log(NPRPriv + 1),
where NPRPriv is the number of press releases issued by private firms in the next 30 seconds. I also control for log(NPRInd + 1), where
NPRInd is the number of press releases issued by firms in the same 2-digit SIC industry on the same day. In Panel (C), the regression sample
includes all the press releases issued in the first 10 seconds of 7-9AM and 4PM, excluding the first second. In Panel (D), the regression sample
only includes press releases that are about earnings or are accompanied by an EDGAR filing on the same day. Standard errors are clustered by
firm and day, and t-statistics are reported in parentheses. Coefficients marked with ***, **, and * are significant at the 1%, 5%, and 10%
levels, respectively. Detailed variable definitions can be found in Table B.11.
                        AbnNews     AbnEdgar    AbnAnalyst  AbnTurnover AbnSpread
                        (1)         (2)         (3)         (4)         (5)
Panel A: Interaction with the ratio of press releases from private firms
log(NPRA+1)             0.127       0.045       0.100       0.065       0.038
                        (8.482)     (3.792)     (7.138)     (3.537)     (3.175)
log(NPRA+1) x % Priv    0.007       0.019       0.016       0.004       0.008
                        (0.286)     (0.998)     (0.767)     (0.138)     (0.305)
% Priv                  0.012       0.036       0.004       0.032       0.041
                        (0.284)     (1.287)     (0.133)     (0.666)     (0.863)
FE                      Y           Y           Y           Y           Y
Observations            80,108      75,536      80,108      80,004      41,760
Adjusted R²             0.540       0.316       0.465       0.356       0.241
Panel B: Direct test using press releases from private firms
log(NPRA+1)             0.126
                        (9.170)
AbnNews (fitted)                    0.269       0.867       0.569       0.249
                                    (2.707)     (7.658)     (3.673)     (2.456)
FE                      Y           Y           Y           Y           Y
Observations            80,246      75,667      80,246      80,142      41,798
Adjusted R²             0.539       0.335       0.462       0.365       0.191
Panel C: Drop the first second
log(NPRA+1)             0.120
                        (6.944)
AbnNews (fitted)                    0.355       0.787       0.552       0.242
                                    (3.095)     (5.764)     (2.895)     (2.281)
FE                      Y           Y           Y           Y           Y
Observations            62,809      59,420      62,809      62,725      33,879
Adjusted R²             0.548       0.301       0.504       0.376       0.187
Panel D: Required disclosures
log(NPRA+1)             0.159
                        (3.992)
AbnNews (fitted)                    0.445       0.808       0.716       0.329
                                    (2.205)     (3.821)     (2.403)     (1.167)
FE                      Y           Y           Y           Y           Y
Observations            33,780      31,949      33,780      33,758      17,812
Adjusted R²             0.527       0.347       0.661       0.314       0.058
B.2 Appendix Tables for Chapter 1
Table B.11: Variable definitions
Variable Definition Source
NPRA_j: The number of new press releases published in the 30 seconds immediately after press release j. I only include press releases published on the top 4 press release wires, namely PRNewswire, BusinessWire, GlobeNewswire, and Marketwired. Source: RavenPack PR Edition.
News_it: The total number of news articles covering firm i on day t. The relevance score needs to be 100, meaning that firm i is the main subject of the news article. Source: RavenPack DJ Edition.
AbnNews_it: Abnormal news coverage, calculated by subtracting the average daily number of media articles covering firm i in the previous 60 days from News_it. Source: RavenPack DJ Edition.
AbnNews Novel_it: Same as AbnNews, but only including news articles whose RavenPack novelty score (ENS) is 100. Source: RavenPack DJ Edition.
AbnNews Flash_it: Same as AbnNews, but only including news articles whose RavenPack news type is NEWS-FLASH. Source: RavenPack DJ Edition.
AbnNews Full_it: Same as AbnNews, but only including news articles whose RavenPack news type is FULL-ARTICLE. Source: RavenPack DJ Edition.
News Dummy_it: A dummy variable that equals 1 if there is any media coverage on that day. Source: RavenPack DJ Edition.
AbnEdgar_it: Abnormal number of EDGAR requests about firm i on day t. I exclude requests where idx equals 1 (searches on the index page) or the server code is above 300. I also exclude web requests from possible web crawlers; an IP address qualifies as a web crawler if it makes more than 5 requests in a minute or 1000 requests in a day. The abnormal measure is calculated as the log of 1 plus the number of searches on day t minus the average daily number of searches in the past 60 days. Source: SEC EDGAR log.
AbnEdgar Exist_it: Abnormal number of human EDGAR searches from IP addresses that accessed filings from the same firm in the previous month. Source: SEC EDGAR log.
AbnEdgar Ins_it: Abnormal number of EDGAR searches from institutional investors. I identify institutional investors by first matching IP addresses to known institutions that have autonomous system numbers (ASNs); the IP-ASN organization link file comes from MaxMind. I then search for finance-related words in the names of the institutions. Details can be found in Online Appendix B. Source: SEC EDGAR log.
AbnEdgar 13F_it: Abnormal number of EDGAR searches from institutional investors. I identify institutional investors by first matching IP addresses to known institutions that have autonomous system numbers (ASNs); the IP-ASN organization link file comes from MaxMind. I then match the names of these institutions to the names of all 13F institutions. Details can be found in Online Appendix B. Source: SEC EDGAR log.
AbnAnalyst_it: Abnormal number of analyst forecasts issued on day t. For each firm-day, I count the unique number of analysts who issue any earnings forecasts for firm i. The abnormal measure is then calculated as the log of 1 plus the number of analysts issuing any earnings forecast for firm i on day t, minus the log of 1 plus the average number of analysts issuing forecasts per day in the past 60 calendar days. Source: IBES.
AbnAnalyst MoreAccu_it: The abnormal number of analyst forecasts from analysts whose relative forecast accuracy is above the median in the previous year. The relative forecast accuracy is constructed following Ljungqvist, Marston, Starks, Wei, and Yan (2007). For analyst i covering firm k in year t, I first calculate the absolute forecast error in three steps: (1) take the analyst's most recent forecast of year-end EPS issued between Jan. 1 and Jun. 30; (2) calculate its difference from the subsequently realized earnings; (3) scale the difference by the previous year-end price. Then, across all the analysts covering firm k in year t, I rescale the absolute forecast errors so that the most and least accurate analysts score one and zero, respectively. Finally, analyst i's relative forecast accuracy in year t is his or her average score across the stocks he or she covers over years t-2 to t. Source: IBES.
AbnAnalyst LessAccu_it: The abnormal number of analyst forecasts from analysts whose relative forecast accuracy is below the median in the previous year. Source: IBES.
AbnAnalyst MoreExp_it: The abnormal number of analyst forecasts from experienced analysts. Experienced analysts are defined as analysts who have covered the firm for an above-median number of years in the past five years. Source: IBES.
AbnAnalyst LessExp_it: The abnormal number of analyst forecasts from analysts who are not experienced. Experienced analysts are defined as analysts who have covered the firm for an above-median number of years in the past five years. Source: IBES.
AbnTurnover: Abnormal turnover, calculated as the log of 1 plus the turnover minus the log of 1 plus the average daily turnover in the past 60 trading days. Source: CRSP.
AbnSpread EW: Equal-weighted abnormal effective spread. For each trade, I calculate the effective spread as $2\,|\log(P_k) - \log(M_k)|$, where $P_k$ is the price of trade k and $M_k$ is the midpoint of the consolidated BBO at the time of the trade. I exclude corrected orders, trades with zero price or zero size, trades with condition codes B, G, J, K, L, O, T, W, or Z, and quotes in which the bid-ask spread is negative or greater than 50% of the quote midpoint. Trades on all exchanges are included. Then, for each day, I calculate the equal-weighted average effective spread over all trades. I define the abnormal spread as the effective spread on day t divided by the average daily effective spread in the past 60 trading days. Source: TAQ.
AbnSpread VW: Value-weighted abnormal effective spread, calculated using the dollar-value-weighted average of all effective spreads within the day. Source: TAQ.
|CAR|: Absolute value of the abnormal return. I calculate abnormal returns by subtracting the CRSP value-weighted index return from the daily raw returns. Source: CRSP.
Range: Daily price range, defined as the log of the daily high price minus the log of the daily low price. Source: CRSP.
% Priv: The percentage of press releases in NPRA that are issued by private firms. Source: RavenPack PR Edition.
NPRPriv_it: The number of press releases that are from private firms and published in the 30 seconds after press release j. Source: RavenPack PR Edition.
NPRInd_it: The total number of press releases issued by firms from the same 2-digit SIC industry as firm i, minus 1 (the press release from firm i itself). Source: RavenPack PR Edition.
ESS: Event sentiment score, generated by RavenPack. A group of experts first read and score a sample of stories to determine the direction of impact (positive or negative) and the degree of different event types; in total there are over 2,000 event types. New articles are then compared to these tagged events to calculate the score. Stories with scores above 50 are positive, and those below 50 are negative. Source: RavenPack PR Edition.
AT: Total assets. Source: Compustat.
Q: Tobin's Q, defined as (at + csho x prcc_f - ceq) / at. Source: Compustat.
Age: The number of years since publication. Source: Compustat, CRSP.
NWord: The number of words in the title of a press release. Source: RavenPack PR Edition.
DJPR: A dummy variable that equals one if the automated algorithm from Dow Jones Newswire republishes the press release. Source: RavenPack PR/DJ Edition.
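Table B.11 defines AbnEdgar with three filters (index-page requests, server codes above 300, and crawler-like IPs) followed by an abnormal log measure. A minimal sketch, assuming the log has been parsed into dicts with hypothetical keys ip, time, idx, and code; that crawler status is assessed per IP-day using clock minutes; and that the abnormal measure takes the form given for AbnAnalyst (log of 1 plus the day-t count minus the log of 1 plus the baseline average). The `skip` parameter reflects the "skipping 10 days" wording in Table B.5; treating it as a gap between the baseline window and day t is an assumption.

```python
import math
from collections import Counter
from datetime import datetime


def human_requests(requests, per_minute=5, per_day=1000):
    """Keep EDGAR log requests that survive the index-page, server-code, and crawler filters.

    requests: list of dicts with keys "ip", "time" (datetime), "idx", and "code".
    Assumption: an IP is treated as a crawler on a given day if it makes more than
    `per_minute` requests within any clock minute or more than `per_day` requests that day.
    """
    minute_counts = Counter((r["ip"], r["time"].replace(second=0, microsecond=0)) for r in requests)
    day_counts = Counter((r["ip"], r["time"].date()) for r in requests)
    crawler_days = {(ip, minute.date()) for (ip, minute), n in minute_counts.items() if n > per_minute}
    crawler_days |= {key for key, n in day_counts.items() if n > per_day}
    return [
        r for r in requests
        if (r["ip"], r["time"].date()) not in crawler_days
        and r["idx"] != 1      # drop index-page hits
        and r["code"] <= 300   # drop server codes above 300
    ]


def abnormal_log_count(daily_counts, t, window=60, skip=0):
    """log(1 + N_t) - log(1 + mean N over `window` days ending `skip` days before day t)."""
    start, end = t - skip - window, t - skip
    if start < 0:
        raise ValueError("not enough history before day t")
    baseline = sum(daily_counts[start:end]) / window
    return math.log1p(daily_counts[t]) - math.log1p(baseline)


base = datetime(2010, 1, 4, 9, 30)
log_rows = (
    [{"ip": "a", "time": base.replace(second=s), "idx": 0, "code": 200} for s in range(6)]
    + [
        {"ip": "b", "time": base, "idx": 1, "code": 200},  # index page
        {"ip": "b", "time": base, "idx": 0, "code": 200},  # kept
        {"ip": "c", "time": base, "idx": 0, "code": 404},  # server code above 300
    ]
)
kept = human_requests(log_rows)  # IP "a" is flagged: 6 requests in one minute
```

Daily counts of the surviving requests per firm would then feed `abnormal_log_count`.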
Table B.12: Data cleaning process to generate the main sample
The table below shows the steps I take to compile the sample of press releases used in this paper.
Filtering criteria    # of press releases    # of unique firms
Keep if RELEVANCE = 100 and in CRSP/Compustat 1,620,046 9,406
Keep Top 4 press release wires 1,502,021 9,386
Remove duplicated releases (ENS = 100) 1,068,148 9,373
Keep only one press release per firm-day 909,874 9,368
Keep only trading days 901,774 9,366
Keep if after April 1, 2006 738,196 8,756
Keep if issued in the first 30 seconds of an hour 188,981 7,911
Keep if issued in the first 10 seconds of an hour 131,683 7,503
Keep if issued in 7AM-9AM or 4PM 80,246 6,560
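The last three filters in Table B.12 are timing rules. A minimal sketch of that timing screen, assuming second-resolution timestamps in U.S. Eastern time and reading "7AM-9AM or 4PM" as the hours beginning at 7:00, 8:00, 9:00, and 16:00; both readings are assumptions, and the earlier filters (relevance, wires, duplicates, one release per firm-day, trading days, start date) are not shown.

```python
from datetime import datetime

# Assumption: "7AM-9AM or 4PM" means the hours starting at 7:00, 8:00, 9:00, and 16:00.
EVENT_HOURS = frozenset({7, 8, 9, 16})


def in_sample_window(timestamp, first_seconds=10, hours=EVENT_HOURS):
    """True if a release falls in the first `first_seconds` seconds of an event hour."""
    return timestamp.hour in hours and timestamp.minute == 0 and timestamp.second < first_seconds


releases = [
    datetime(2010, 3, 1, 8, 0, 9),    # kept
    datetime(2010, 3, 1, 8, 0, 10),   # outside the first 10 seconds
    datetime(2010, 3, 1, 10, 0, 0),   # 10AM is not an event hour
    datetime(2010, 3, 1, 16, 0, 0),   # kept
]
kept = [ts for ts in releases if in_sample_window(ts)]
```

Setting `first_seconds=30` reproduces the broader "first 30 seconds" screen used for the 30-second sample.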
B.3 Tables for Chapter 2
Table B.13: Robustness of Table B.3
This table shows that the results in Table B.3 are robust to the news measure, sample selection, and the functional form of the dependent variable. Columns (1)-(3)
re-estimate Equation 1.2 using different news measures. Column (1) uses non-duplicated news only, whose ENS score is 100 in RavenPack; Column (2) uses flash
news, which only contains a headline; Column (3) uses full news, which contains a headline and at least one paragraph. Columns (4)-(8) re-estimate Equation
1.2 using different samples. Dependent variables in Columns (4)-(8) are the abnormal number of media articles covering the firm on the press release day.
Starting from all the press releases published in the first 10 seconds of each hour: Column (4) excludes the releases published in the first second, Column (5) excludes
firm-years where the total number of press releases is above the 75th percentile, Column (6) excludes firm-years with only one release, and Column (7)
excludes date-hours with only one release. Column (8) includes all the press releases published in the first 30 seconds of each hour. Columns (9)-(11)
change the functional form of the dependent variable. Column (9) uses log(News + 1) as the dependent variable, rather than the abnormal news measure.
Column (10) uses the raw level of abnormal news, rather than the log change measure, as the dependent variable. Column (11) uses as the dependent variable a dummy
that equals 1 if there is any news coverage on the event day. The independent variable, NPRA, measures the number of new press
releases published on the top 4 press release wires in the 30 seconds after press release j. The standard errors in all regressions are clustered by firm
and date, and t-statistics are reported in parentheses. Coefficients marked with ***, **, and * are significant at the 1%, 5%, and 10% levels, respectively.
Detailed variable definitions can be found in Table B.11.
Robustness to news measures: (1) novel, (2) flash, (3) full
Robustness to sample: (4) excl. 1st second, (5) excl. top 25% of PR issuers, (6) excl. if only 1 PR per firm-year, (7) excl. if only 1 PR per date-hour, (8) first 30 seconds
Robustness to functional form: (9) no abnormal, (10) no log, (11) dummy
                        (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)       (9)       (10)      (11)
log(NPRA + 1)           0.085     0.039     0.040     0.094     0.088     0.094     0.094     0.046     0.093     0.365     0.045
                        (8.688)   (4.802)   (4.978)   (6.023)   (6.043)   (8.620)   (8.751)   (6.986)   (8.278)   (7.897)   (6.218)
Date-Hour FE            Y         Y         Y         Y         Y         Y         Y         Y         Y         Y         Y
Firm-Year FE            Y         Y         Y         Y         Y         Y         Y         Y         Y         Y         Y
Detailed Topic FE       Y         Y         Y         Y         Y         Y         Y         Y         Y         Y         Y
Observations            131,683   131,683   131,683   101,115   103,673   120,424   116,450   188,981   131,683   131,683   131,683
Adjusted R²             0.501     0.157     0.155     0.492     0.492     0.510     0.517     0.501     0.490     0.412     0.416
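Table B.13's caption defines the instrument NPRA as the number of new press releases on the top four wires in the 30 seconds after press release j. A minimal sketch of that count, assuming second-resolution timestamps, reading "the next 30 seconds" as the interval (t, t + 30], and not counting releases stamped in the same second as release j; the tie rule and the wire label strings are assumptions.

```python
import math
from bisect import bisect_right

# Labels for the four wires named in Table B.11 (the string values are hypothetical).
TOP4_WIRES = frozenset({"PRNewswire", "BusinessWire", "GlobeNewswire", "Marketwired"})


def npra_counts(releases, horizon=30):
    """For each release, count top-4-wire releases time-stamped in (t, t + horizon] seconds.

    releases: list of (timestamp_in_seconds, wire). Releases in the same second as
    release j are not counted (an assumption about ties).
    """
    times = sorted(t for t, wire in releases if wire in TOP4_WIRES)
    counts = []
    for t, _wire in releases:
        first_after = bisect_right(times, t)            # first release strictly after t
        last_within = bisect_right(times, t + horizon)  # one past the last release at or before t + horizon
        counts.append(last_within - first_after)
    return counts


releases = [(0, "BusinessWire"), (5, "PRNewswire"), (30, "GlobeNewswire"),
            (31, "Marketwired"), (10, "OtherWire")]
npra = npra_counts(releases)                # [2, 2, 1, 0, 2]
instrument = [math.log1p(n) for n in npra]  # log(NPRA + 1)
```

Sorting once and using binary search keeps the count at O(log n) per release, which matters when the sample contains hundreds of thousands of releases.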
Table B.14: Summary Statistics
This table shows the summary statistics of the variables used in our analyses. Detailed variable definitions can be found in Table B.25.
Variable N Mean Std. Dev. Median 75th 95th 99th
CNInternet 61930 0.36 0.23 0.31 0.56 0.75 0.77
CNComp % x 1000 61930 0.15 0.77 0.00 0.00 0.00 5.64
CNComp Dummy 61930 0.05 0.21 0.00 0.00 0.00 1.00
CNCompHi % x 1000 61930 0.09 0.51 0.00 0.00 0.00 3.86
CNCompHi Dummy 61930 0.03 0.18 0.00 0.00 0.00 1.00
CNIntTheft % x 1000 61930 0.04 0.27 0.00 0.00 0.00 2.2
CNIntTheft Dummy 61930 0.02 0.14 0.00 0.00 0.00 1.00
CNIntComp % x 1000 61930 0.05 0.32 0.00 0.00 0.00 2.52
CNIntComp Dummy 61930 0.02 0.15 0.00 0.00 0.00 1.00
EUComp % x 1000 61930 0.31 1.08 0.00 0.00 2.46 6.58
EUCompHi % x 1000 61930 0.16 0.69 0.00 0.00 1.37 4.62
EUIntComp % x 1000 61930 0.11 0.55 0.00 0.00 0.00 3.88
JPComp % x 1000 61930 0.04 0.30 0.00 0.00 0.00 2.39
JPCompHi % x 1000 61930 0.01 0.13 0.00 0.00 0.00 1.28
JPIntComp % x 1000 61930 0.03 0.35 0.00 0.00 0.00 0.00
NAComp % x 1000 61930 0.24 0.93 0.00 0.00 1.96 6.15
NACompHi % x 1000 61930 0.10 0.53 0.00 0.00 0.00 3.85
NAIntComp % x 1000 61930 0.05 0.32 0.00 0.00 0.00 2.53
XRD/Sales 61930 0.14 0.54 0.00 0.06 0.48 4.24
NPatent/Sales 61930 0.03 0.12 0.00 0.00 0.10 0.99
PatCiteCN 61930 3.33 35.78 0.00 0.00 6.00 67.00
PatCiteUS_CN 61930 2.39 32.09 0.00 0.00 4.00 41.00
PatCiteUS_EU 61930 26.85 237.32 0.00 1.00 57.00 549.00
PatCiteUS_JP 61930 23.88 286.82 0.00 0.00 34.00 357.71
PatCiteUS_NA 61930 5.06 53.76 0.00 0.00 11.00 93.00
PatCiteUS_US 61930 226.84 2118.64 0.00 14.00 499.00 4558.55
Age 61884 18.06 13.48 14.00 24.00 47.00 53.00
CNSalesGR 61930 0.08 0.29 0.09 0.25 0.57 0.86
Q 61831 1.95 1.78 1.36 2.09 5.03 11.19
Sales 59849 2702.18 12607.52 283.5 1236.07 10519.42 43890.60
log(TA) 61790 6.20 2.18 6.24 7.70 9.88 11.42
AssetTangibility 59483 0.16 0.20 0.07 0.22 0.62 0.92
CNInternet Macro 61930 0.27 0.20 0.23 0.46 0.62 0.70
CNInternet Top1 61930 0.35 0.24 0.29 0.55 0.75 0.78
Table B.15: Summary Statistics at the firm level
We first calculate the mean value of each variable for each firm, and the table shows the summary statistics of these firm averages. Detailed
variable definitions can be found in Table B.25.
Variable N Mean Std. Dev. Median 75th 95th 99th
CNInternet 8474 0.33 0.20 0.31 0.48 0.70 0.76
CNComp % x 1000 8474 0.16 0.69 0.00 0.00 0.98 4.47
CNComp Dummy 8474 0.05 0.18 0.00 0.00 0.36 1.00
CNCompHi % x 1000 8474 0.09 0.44 0.00 0.00 0.48 2.70
CNCompHi Dummy 8474 0.03 0.15 0.00 0.00 0.20 1.00
CNIntTheft % x 1000 8474 0.04 0.25 0.00 0.00 0.00 1.73
CNIntTheft Dummy 8474 0.02 0.13 0.00 0.00 0.00 1.00
CNIntComp % x 1000 8474 0.05 0.26 0.00 0.00 0.17 1.56
CNIntComp Dummy 8474 0.02 0.12 0.00 0.00 0.08 0.83
EUComp % x 1000 8474 0.30 0.85 0.00 0.00 2.06 4.36
EUCompHi % x 1000 8474 0.15 0.54 0.00 0.00 1.13 2.82
EUIntComp % x 1000 8474 0.11 0.43 0.00 0.00 0.79 2.31
JPComp % x 1000 8474 0.04 0.23 0.00 0.00 0.12 1.35
JPCompHi % x 1000 8474 0.01 0.10 0.00 0.00 0.00 0.49
JPIntComp % x 1000 8474 0.03 0.26 0.00 0.00 0.00 0.91
NAComp % x 1000 8474 0.23 0.76 0.00 0.00 1.56 4.18
NACompHi % x 1000 8474 0.10 0.42 0.00 0.00 0.65 2.32
NAIntComp % x 1000 8474 0.05 0.24 0.00 0.00 0.20 1.27
XRD 8474 35.92 279.66 0.00 9.42 89.83 564.53
NPatent 8474 15.21 220.09 0.00 0.27 16.11 211.06
PatCiteCN 8474 1.79 21.34 0.00 0.00 2.21 33.74
PatCiteUS_CN 8474 1.30 18.51 0.00 0.00 1.58 21.12
PatCiteUS_EU 8474 15.53 164.67 0.00 0.50 26.19 308.52
PatCiteUS_JP 8474 13.44 193.68 0.00 0.14 15.72 191.35
PatCiteUS_NA 8474 2.91 34.42 0.00 0.00 5.10 50.60
PatCiteUS_US 8474 132.00 1485.49 0.00 5.90 228.06 2467.89
Age 8465 13.97 12.10 9.50 17.50 44.00 48.00
CNSalesGR 8474 0.06 0.18 0.06 0.13 0.32 0.47
Q 8459 2.00 1.70 1.42 2.18 5.17 9.84
Sales 8158 1811.31 9251.19 177.30 764.93 6705.79 30456.79
log(TA) 8443 5.73 2.15 5.75 7.22 9.38 10.94
AssetTangibility 8215 0.16 0.20 0.07 0.22 0.61 0.81
CNInternet Macro 8474 0.25 0.16 0.26 0.37 0.54 0.66
CNInternet Top1 8474 0.32 0.21 0.30 0.48 0.72 0.77
Table B.16: EDGAR searches and Chinese internet penetration
The table displays OLS regressions in which the dependent variable is the number of EDGAR searches scaled by sales. For ease of
interpretation, we standardize this variable to have unit variance in each year. Column (1) tabulates EDGAR searches whose IP addresses are
from China; Column (2) tabulates European IP addresses, Column (3) counts Japanese IP addresses, and Column (4) counts Canadian and
Mexican IP addresses. Following Lee, Ma, and Wang (2015b), we exclude EDGAR searches by web crawlers. In Panel A, the internet
penetration variable is constructed using the weights from all Chinese-domiciled public firms listed in mainland China, Hong Kong, and the
U.S. In Panel B, the internet penetration variable is constructed using the weights from the subset of these firms that are listed only via
mainland China A-shares. All RHS variables are also standardized to have unit variance for ease of interpretation. The sample includes all
Compustat firms from 2004 to 2015 with available 10K filings on the EDGAR system as the EDGAR server log starts in February of 2003. We
exclude all observations where the total assets or sales are smaller than one million dollars. Robust standard errors clustered by firms are
reported in the parentheses. Detailed definitions of the variables can be found in Table B.25 in the Appendix.
Panel A: Weights from A-share, HK-, and US-listed firms
                        # of EDGAR searches / Sales
                        CN        EU        JP        NA
                        (1)       (2)       (3)       (4)
CNInternet              0.100     0.005     0.067     0.002
                        (0.048)   (0.047)   (0.047)   (0.045)
CNSalesGR               0.008     0.0003    0.004     0.0002
                        (0.005)   (0.005)   (0.005)   (0.004)
log(10kSize)            0.012     0.015     0.012     0.023
                        (0.007)   (0.006)   (0.007)   (0.005)
log(Age + 1)            0.114     0.143     0.102     0.115
                        (0.032)   (0.025)   (0.026)   (0.026)
log(TA)                 0.255     0.453     0.231     0.516
                        (0.042)   (0.045)   (0.043)   (0.035)
Q                       0.007     0.005     0.037     0.027
                        (0.015)   (0.019)   (0.021)   (0.014)
Firm FE                 Y         Y         Y         Y
Year FE                 Y         Y         Y         Y
N                       48,183    48,183    48,183    48,183
Panel B: Weights from A-share firms only
                        # of EDGAR searches / Sales
                        CN        EU        JP        NA
                        (1)       (2)       (3)       (4)
CNInternet              0.110     0.022     0.054     0.005
                        (0.044)   (0.049)   (0.044)   (0.039)
CNSalesGR               0.008     0.0003    0.004     0.0002
                        (0.005)   (0.005)   (0.005)   (0.004)
log(10kSize)            0.012     0.015     0.012     0.023
                        (0.007)   (0.006)   (0.007)   (0.005)
log(Age + 1)            0.115     0.143     0.102     0.115
                        (0.032)   (0.026)   (0.026)   (0.026)
log(TA)                 0.254     0.453     0.230     0.516
                        (0.042)   (0.045)   (0.043)   (0.036)
Q                       0.007     0.005     0.037     0.027
                        (0.015)   (0.019)   (0.021)   (0.014)
Firm FE                 Y         Y         Y         Y
Year FE                 Y         Y         Y         Y
N                       48,183    48,183    48,183    48,183
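Table B.16 scales the dependent variable to have unit variance within each year. A minimal sketch, assuming the scaling divides by the within-year sample standard deviation without demeaning (the caption mentions only unit variance) and that each year has at least two observations with some variation; the values are hypothetical.

```python
from collections import defaultdict
from statistics import stdev


def scale_to_unit_variance_by_year(values, years):
    """Divide each value by the sample standard deviation of its year's values.

    Assumption: values are rescaled but not demeaned, and every year has at least
    two observations that are not all identical.
    """
    by_year = defaultdict(list)
    for value, year in zip(values, years):
        by_year[year].append(value)
    sd = {year: stdev(vals) for year, vals in by_year.items()}
    return [value / sd[year] for value, year in zip(values, years)]


searches_per_sales = [1.0, 3.0, 10.0, 30.0]  # hypothetical
years = [2004, 2004, 2005, 2005]
scaled = scale_to_unit_variance_by_year(searches_per_sales, years)
```

Scaling within each year removes differences in overall EDGAR traffic across years, so a coefficient reads as standard deviations of that year's distribution.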
Table B.17: Competition complaints and Chinese internet penetration
The table displays OLS regressions in which the dependent variables are textual measures of competition complaints in 10K filings. We
search for four types of complaints in the 10K filings. CNComp measures competition in general; CNCompHi measures high-intensity
competition; CNIntComp measures intellectual property competition; CNIntTheft measures intellectual property theft. All these
competition measures are China-specific, meaning the words "China" or "Chinese" appear in the same paragraph as the competition
complaint phrases. We exclude instances where other countries are mentioned in the same paragraph to ensure the competition discussion is truly about
China. More detailed variable construction procedures can be found in Table B.25 in the Appendix. In Columns (1), (3), (5), and (7), the
dependent variables are the number of paragraphs containing the above search instances divided by the total number of paragraphs in the
10K filing. In Columns (2), (4), (6), and (8), the dependent variables are dummies that equal 1 if we find any of the phrases in the
search. The key independent variable CNInternet is the Chinese internet penetration ratio. All independent variables, except for
log(10kSize), are lagged one year relative to the dependent variables. All the variables are normalized by their standard deviations for easier
interpretation. The sample covers all Compustat firms from 2001 to 2015 with 10K filings. We exclude all observations where total assets
or sales are smaller than one million dollars. Robust standard errors clustered by firm are reported in parentheses. Detailed definitions
of the variables can be found in Table B.25 in the Appendix.
                      CNComp            CNCompHi          CNIntComp         CNIntTheft
                      %        dummy    %        dummy    %        dummy    %        dummy
                      (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)
CNInternet            0.124    0.158    0.119    0.137    0.127    0.148    0.126    0.137
                      (0.041)  (0.044)  (0.039)  (0.042)  (0.040)  (0.041)  (0.039)  (0.041)
CNSalesGR             0.001    0.006    0.001    0.002    0.001    0.001    0.004    0.004
                      (0.003)  (0.003)  (0.003)  (0.003)  (0.003)  (0.003)  (0.002)  (0.002)
log(10kSize)          0.108    0.033    0.111    0.052    0.097    0.063    0.042    0.023
                      (0.011)  (0.008)  (0.011)  (0.009)  (0.011)  (0.010)  (0.008)  (0.007)
log(Age + 1)          0.059    0.059    0.061    0.061    0.032    0.030    0.008    0.004
                      (0.024)  (0.024)  (0.025)  (0.025)  (0.027)  (0.027)  (0.024)  (0.024)
log(TA)               0.050    0.030    0.067    0.045    0.037    0.029    0.046    0.044
                      (0.029)  (0.028)  (0.028)  (0.027)  (0.031)  (0.030)  (0.031)  (0.030)
Q                     0.015    0.015    0.013    0.013    0.015    0.015    0.004    0.005
                      (0.006)  (0.006)  (0.006)  (0.006)  (0.007)  (0.007)  (0.008)  (0.007)
Firm FE               Y        Y        Y        Y        Y        Y        Y        Y
Year FE               Y        Y        Y        Y        Y        Y        Y        Y
N                     61,930   61,930   61,930   61,930   61,930   61,930   61,930   61,930
Adjusted R²           0.586    0.519    0.523    0.489    0.472    0.473    0.605    0.604
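The paragraph-share measures behind Table B.17 (defined in Table B.25) can be illustrated with a short sketch. This is not the author's code. It makes several assumptions: paragraphs are already split, matching is case-insensitive and whole-word, and a paragraph counts only if it contains at least one word from each list. The `OTHER_COUNTRIES` list used for the exclusion rule is a hypothetical placeholder, because the caption does not say which countries trigger the exclusion. Only CNComp and CNCompHi are shown. CNIntComp and CNIntTheft would be built the same way by adding their word lists.

```python
import re
from typing import Iterable, List, Sequence

# Word lists for CNComp and CNCompHi as given in Table B.25 (lower-cased).
CN_WORDS = ["china", "chinese"]
COMPETE_WORDS = ["compete", "competition", "competing"]
INTENSITY_WORDS = ["high", "intense", "significant", "face", "faces",
                   "substantial", "continued", "vigorous", "strong",
                   "aggressive", "fierce", "stiff", "extensive", "severe"]

# Hypothetical placeholder: the source does not enumerate which other
# countries cause a paragraph to be excluded.
OTHER_COUNTRIES = ["japan", "japanese", "germany", "german", "india", "indian"]


def tokens(paragraph: str) -> set:
    """Lower-cased alphabetic word tokens (simple whole-word matching)."""
    return set(re.findall(r"[a-z]+", paragraph.lower()))


def matches(paragraph: str, word_lists: Sequence[Iterable[str]],
            exclude: Iterable[str] = OTHER_COUNTRIES) -> bool:
    """True if the paragraph has at least one word from EACH list and
    mentions none of the excluded countries."""
    toks = tokens(paragraph)
    if toks & set(exclude):
        return False
    return all(toks & set(wl) for wl in word_lists)


def paragraph_share(paragraphs: List[str], word_lists) -> float:
    """Matched paragraphs divided by total paragraphs (the '%' measures)."""
    if not paragraphs:
        return 0.0
    hits = sum(matches(p, word_lists) for p in paragraphs)
    return hits / len(paragraphs)


def cn_measures(paragraphs: List[str]) -> dict:
    """CNComp and CNCompHi shares plus their 0/1 dummies for one filing."""
    cn_comp = paragraph_share(paragraphs, [CN_WORDS, COMPETE_WORDS])
    cn_comp_hi = paragraph_share(
        paragraphs, [CN_WORDS, COMPETE_WORDS, INTENSITY_WORDS])
    return {
        "CNComp %": cn_comp,
        "CNComp Dummy": int(cn_comp > 0),
        "CNCompHi %": cn_comp_hi,
        "CNCompHi Dummy": int(cn_comp_hi > 0),
    }
```

For example, a four-paragraph filing with one paragraph about "intense competition from Chinese manufacturers" yields a CNComp % of 0.25. A paragraph that names both China and Japan would be dropped by the exclusion rule.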
Table B.18: Placebo tests - Competition from other countries and Chinese internet penetration
The table displays OLS regressions in which the dependent variables are textual measures of competition complaints from 10-K filings. The dependent variables are constructed as in Table B.17, except that instead of measuring China-related competition complaints, we search for competition complaints about other regions of the world. Columns (1)-(2) report searches using Japan, Columns (3)-(4) using Canada and Mexico, and Columns (5)-(6) using European Union countries. All dependent variables are the count of matched paragraphs divided by the total number of paragraphs in the 10-K filing. The key independent variable CNInternet is the Chinese internet penetration ratio. Panel A computes the provincial weights from A-share, Hong Kong-, and US-listed Chinese firms; Panel B uses weights from A-share-listed firms only. All independent variables, except for log(10kSize), are lagged one year relative to the dependent variables. All variables are normalized by their standard deviations for easier interpretation. The sample covers all Compustat firms with 10-K filings from 2001 to 2015. We exclude all observations where total assets or sales are smaller than one million dollars. Robust standard errors clustered by firm are reported in parentheses. Detailed definitions of the variables can be found in Table B.25 in the Appendix.
Panel A: Weights from A-share, HK-, and US-listed firms
                      JP                  NA                  EU
                      IntComp   IntTheft  IntComp   IntTheft  IntComp   IntTheft
                      (1)       (2)       (3)       (4)       (5)       (6)
CNInternet            0.021     0.009     0.059     0.005     0.085     0.052
                      (0.038)   (0.008)   (0.044)   (0.005)   (0.048)   (0.025)
CNSalesGR             0.002     0.0001    0.001     0.00001   0.0002    0.001
                      (0.003)   (0.001)   (0.005)   (0.0003)  (0.004)   (0.002)
log(10kSize)          0.081     0.016     0.156     0.006     0.251     0.081
                      (0.014)   (0.004)   (0.014)   (0.001)   (0.020)   (0.009)
log(Age + 1)          0.061     0.004     0.035     0.001     0.059     0.013
                      (0.023)   (0.006)   (0.026)   (0.003)   (0.027)   (0.014)
log(TA)               0.024     0.006     0.126     0.008     0.227     0.085
                      (0.046)   (0.008)   (0.033)   (0.004)   (0.041)   (0.018)
Q                     0.007     0.003     0.014     0.001     0.008     0.0004
                      (0.008)   (0.003)   (0.008)   (0.001)   (0.011)   (0.006)
Firm FE               Y         Y         Y         Y         Y         Y
Year FE               Y         Y         Y         Y         Y         Y
N                     61,930    61,930    61,930    61,930    61,930    61,930
Adjusted R²           0.379     0.298     0.345     0.437     0.326     0.399

Panel B: Weights from A-share listed firms only
                      JP                  NA                  EU
                      IntComp   IntTheft  IntComp   IntTheft  IntComp   IntTheft
                      (1)       (2)       (3)       (4)       (5)       (6)
CNInternet            0.032     0.009     0.053     0.005     0.020     0.027
                      (0.036)   (0.008)   (0.038)   (0.005)   (0.049)   (0.025)
CNSalesGR             0.002     0.0002    0.001     0.00002   0.0003    0.001
                      (0.003)   (0.001)   (0.005)   (0.0003)  (0.004)   (0.002)
log(10kSize)          0.081     0.016     0.156     0.006     0.251     0.081
                      (0.014)   (0.004)   (0.014)   (0.001)   (0.020)   (0.009)
log(Age + 1)          0.061     0.004     0.035     0.001     0.059     0.013
                      (0.023)   (0.006)   (0.026)   (0.003)   (0.027)   (0.014)
log(TA)               0.024     0.006     0.126     0.008     0.226     0.085
                      (0.046)   (0.008)   (0.033)   (0.004)   (0.041)   (0.018)
Q                     0.007     0.003     0.014     0.001     0.008     0.0005
                      (0.008)   (0.003)   (0.008)   (0.001)   (0.011)   (0.006)
Firm FE               Y         Y         Y         Y         Y         Y
Year FE               Y         Y         Y         Y         Y         Y
N                     61,930    61,930    61,930    61,930    61,930    61,930
Adjusted R²           0.379     0.298     0.345     0.437     0.326     0.399
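The specification shared by Tables B.17 through B.19 has four ingredients: variables scaled by their standard deviations, firm and year fixed effects, OLS, and standard errors clustered by firm. The stdlib-only sketch below illustrates these for a single regressor. It is not the estimation code used here. Controls are omitted. Both sets of fixed effects are absorbed by alternately sweeping out firm and year means. The cluster-robust variance uses the basic sandwich formula with no small-sample degrees-of-freedom correction.

```python
from collections import defaultdict
from statistics import pstdev
from typing import Hashable, List, Sequence, Tuple


def scale_by_sd(values: Sequence[float]) -> List[float]:
    """Divide a variable by its (population) standard deviation."""
    sd = pstdev(values)
    return [v / sd for v in values]


def demean_two_way(values: Sequence[float], firms: Sequence[Hashable],
                   years: Sequence[Hashable], tol: float = 1e-12,
                   max_iter: int = 1000) -> List[float]:
    """Alternately sweep out firm and year means until both are ~0."""
    v = list(values)
    for _ in range(max_iter):
        largest_mean = 0.0
        for groups in (firms, years):
            sums, counts = defaultdict(float), defaultdict(int)
            for x, g in zip(v, groups):
                sums[g] += x
                counts[g] += 1
            means = {g: sums[g] / counts[g] for g in sums}
            largest_mean = max(largest_mean, max(abs(m) for m in means.values()))
            v = [x - means[g] for x, g in zip(v, groups)]
        if largest_mean < tol:
            break
    return v


def fe_ols_clustered(y, x, firms, years) -> Tuple[float, float]:
    """Slope and firm-clustered SE for y on x with firm and year FE."""
    yt = demean_two_way(scale_by_sd(y), firms, years)
    xt = demean_two_way(scale_by_sd(x), firms, years)
    sxx = sum(a * a for a in xt)
    beta = sum(a * b for a, b in zip(xt, yt)) / sxx
    # Sandwich variance with firms as clusters:
    # sum over firms of (sum_i x_i * e_i)^2, divided by sxx^2.
    scores = defaultdict(float)
    for a, b, f in zip(xt, yt, firms):
        scores[f] += a * (b - beta * a)
    var = sum(s * s for s in scores.values()) / sxx ** 2
    return beta, var ** 0.5
```

Because both variables are divided by their standard deviations, the slope reads as "standard deviations of y per standard deviation of x". This is the interpretation the captions refer to.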
Table B.19: Innovation activities and Chinese internet penetration
The table displays OLS regressions in which the dependent variables are firms' innovation activities. The dependent variable in Columns (1)-(3) is R&D expenses over sales. For missing R&D, we follow Koh and Reeb (2015) and replace missing values with the industry average if the firm has filed any patent applications in the past three years (including the current year), and with 0 otherwise. The dependent variable in Columns (4)-(6) is the total number of patent applications each year (by filing date) divided by sales. The patent data come from Google Patents, and we match the patents to Compustat firms using the links from Kogan, Papanikolaou, Seru, and Stoffman (2016). In both sets of columns, the dependent variables are measured 1, 2, or 3 years in the future. The key independent variable CNInternet is the Chinese internet penetration ratio. All independent variables are lagged one year relative to the dependent variables. All variables are normalized by their standard deviations for easier interpretation. The sample covers all Compustat firms from 2003 to 2015. We exclude all observations where total assets or sales are smaller than one million dollars. Robust standard errors clustered by firm are reported in parentheses. Detailed definitions of the variables can be found in Table B.25 in the Appendix.
                      XRD/Sales                     NPatent/Sales
                      t+1       t+2       t+3       t+1       t+2       t+3
                      (1)       (2)       (3)       (4)       (5)       (6)
CNInternet            0.182     0.181     0.175     0.108     0.086     0.065
                      (0.042)   (0.042)   (0.040)   (0.041)   (0.041)   (0.038)
CNSalesGR             0.001     0.003     0.002     0.002     0.001     0.001
                      (0.002)   (0.002)   (0.002)   (0.002)   (0.002)   (0.002)
log(Age + 1)          0.101     0.096     0.084     0.106     0.087     0.067
                      (0.017)   (0.019)   (0.021)   (0.021)   (0.020)   (0.021)
log(TA)               0.034     0.048     0.007     0.058     0.033     0.031
                      (0.027)   (0.028)   (0.029)   (0.029)   (0.028)   (0.026)
Q                     0.002     0.011     0.001     0.016     0.007     0.003
                      (0.014)   (0.014)   (0.014)   (0.014)   (0.013)   (0.014)
Firm FE               Y         Y         Y         Y         Y         Y
Year FE               Y         Y         Y         Y         Y         Y
N                     61,768    53,799    46,726    61,768    53,799    46,726
Adjusted R²           0.736     0.736     0.740     0.724     0.752     0.785
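The missing-R&D rule described in the caption (following Koh and Reeb, 2015) can be sketched as follows. This is an illustrative reading, not the author's code. The record fields are hypothetical. The industry average is taken over non-missing XRD/Sales in the same industry-year. "The past three years (including the current year)" is read as years t-2 through t.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def impute_rd(rows: List[dict]) -> List[float]:
    """Fill missing XRD/Sales: industry average if the firm patented in
    t-2..t, otherwise 0. Rows use hypothetical keys 'firm', 'year',
    'industry', 'xrd_sales' (float or None), and 'n_patents' (int)."""
    # Industry-year average of the non-missing XRD/Sales values.
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        if r["xrd_sales"] is not None:
            key = (r["industry"], r["year"])
            sums[key] += r["xrd_sales"]
            counts[key] += 1
    ind_avg = {k: sums[k] / counts[k] for k in sums}

    # Patent applications by firm-year.
    patents: Dict[Tuple[str, int], int] = defaultdict(int)
    for r in rows:
        patents[(r["firm"], r["year"])] += r["n_patents"]

    out: List[float] = []
    for r in rows:
        if r["xrd_sales"] is not None:
            out.append(r["xrd_sales"])
            continue
        recent = sum(patents.get((r["firm"], r["year"] - k), 0) for k in range(3))
        avg: Optional[float] = ind_avg.get((r["industry"], r["year"]))
        out.append(avg if recent > 0 and avg is not None else 0.0)
    return out
```

Under this reading, a firm with missing R&D that filed a patent two years earlier receives its industry's average ratio. A firm with no recent patents is treated as having zero R&D.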
Table B.20: Innovation activities and Chinese internet penetration - Poisson Regression
The table displays Poisson regressions in which the dependent variables are firms' innovation activities. The dependent variable in Columns (1)-(3) is R&D expenses over sales. For missing R&D, we follow Koh and Reeb (2015) and replace missing values with the industry average if the firm has filed any patent applications in the past three years (including the current year), and with 0 otherwise. The dependent variable in Columns (4)-(6) is the total number of patent applications each year (by filing date) divided by sales. The patent data come from Google Patents, and we match the patents to Compustat firms using the links from Kogan, Papanikolaou, Seru, and Stoffman (2016). In both sets of columns, the dependent variables are measured 1, 2, or 3 years in the future. The key independent variable CNInternet is the Chinese internet penetration ratio. All specifications include year fixed effects and the lagged dependent variable. All independent variables are lagged one year relative to the dependent variables. All variables are normalized by their standard deviations for easier interpretation. The sample covers all Compustat firms from 2003 to 2015. We exclude all observations where total assets or sales are smaller than one million dollars. Robust standard errors clustered by firm are reported in parentheses. Detailed definitions of the variables can be found in Table B.25 in the Appendix.
                      XRD/Sales                               NPatent/Sales
                      t+1         t+2         t+3         t+1         t+2         t+3
                      (1)         (2)         (3)         (4)         (5)         (6)
CNInternet            -0.348***   -0.474***   -0.542***   -0.427***   -0.512***   -0.518***
                      (0.0574)    (0.0551)    (0.0606)    (0.101)     (0.105)     (0.114)
CNSalesGR             -0.0555***  -0.0189     -0.0420**   -0.0601***  -0.0244     -0.0358*
                      (0.0174)    (0.0144)    (0.0190)    (0.0203)    (0.0168)    (0.0187)
log(Age + 1)          -0.189***   -0.187***   -0.154***   0.0433      0.00330     -0.0136
                      (0.0207)    (0.0214)    (0.0215)    (0.0326)    (0.0305)    (0.0328)
log(TA)               -0.568***   -0.537***   -0.555***   -0.385***   -0.325***   -0.337***
                      (0.0277)    (0.0283)    (0.0277)    (0.0377)    (0.0388)    (0.0372)
Q                     0.0287      0.0575***   0.0525***   0.0402*     0.0671***   0.0616**
                      (0.0184)    (0.0177)    (0.0188)    (0.0215)    (0.0223)    (0.0241)
Lagged XRD/Sales      0.242***    0.242***    0.246***
                      (0.0261)    (0.0265)    (0.0285)
Lagged NPatent/Sales                                      0.206***    0.216***    0.219***
                                                          (0.0223)    (0.0230)    (0.0268)
Year FE               Y           Y           Y           Y           Y           Y
Observations          56,032      53,400      46,404      56,032      53,400      46,404
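Table B.20 replaces OLS with Poisson regressions. The sketch below shows what a Poisson (pseudo-)maximum-likelihood fit does: it fits $E[y \mid x] = \exp(a + bx)$ by Newton-Raphson for a single regressor. It is illustrative only. The specifications in the table also include year fixed effects, controls, the lagged dependent variable, and firm-clustered standard errors, all of which are omitted here.

```python
import math
from typing import List, Tuple


def poisson_fit(y: List[float], x: List[float], iters: int = 50) -> Tuple[float, float]:
    """Fit log E[y|x] = a + b*x by Newton-Raphson on the Poisson log-likelihood.

    y can be any non-negative outcome (e.g. a ratio); the Poisson score
    equations remain valid as a pseudo-likelihood.
    """
    a, b = math.log(sum(y) / len(y)), 0.0  # start from the constant-only fit
    for _ in range(iters):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score (gradient of the log-likelihood).
        ga = sum(yi - mi for yi, mi in zip(y, mu))
        gb = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Information matrix (negative Hessian) entries.
        haa = sum(mu)
        hab = sum(mi * xi for mi, xi in zip(mu, x))
        hbb = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = haa * hbb - hab * hab
        # Newton step: solve [[haa, hab], [hab, hbb]] @ step = [ga, gb].
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-12:
            break
    return a, b
```

In a Poisson model, a coefficient is read as a proportional change in the expected outcome per unit of the regressor. This is why the Table B.20 estimates are not directly comparable in magnitude to the OLS estimates in Table B.19.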
Table B.21: Patent citations and Chinese internet penetration
The table displays OLS regressions in which the dependent variables are the annual number of citations by Chinese firms to the firm's existing patents. In Columns (1)-(3), for each firm we count the number of new patents that cite the firm's existing patents in each year. We further require that the first assignee of the citing patent be a Chinese company and that the patent be filed in the US with the USPTO. The dependent variables in Columns (1)-(3) are this total count, $PatCite^{US}_{CN}$, divided by sales, in each of the next three years. In Columns (4)-(6), we further compare $PatCite^{US}_{CN}$ to the number of citations from new patents that are filed with the USPTO and assigned to US firms, $PatCite^{US}_{US}$. The dependent variables in Columns (4)-(6) are $PatCite^{US}_{CN}/(PatCite^{US}_{CN} + PatCite^{US}_{US} + 1)$ in each of the next three years. In Columns (7)-(9), PatCiteCN counts the number of new patents filed with the Chinese patent office (SIPO) that cite the firm's existing patents, and the dependent variables are PatCiteCN divided by sales. We exclude patents that are filed with SIPO but assigned to US companies. In Columns (10)-(12), we use PatCiteCN/(PatCiteCN + PatCiteUS + 1) as the dependent variables, where PatCiteUS is the total count of new citing patents filed in the US. The key independent variable CNInternet is the Chinese internet penetration ratio. All independent variables are lagged one year relative to the dependent variables. All variables are normalized by their standard deviations for easier interpretation. The sample covers all Compustat firms from 2003 to 2015. We exclude all observations where total assets or sales are smaller than one million dollars. Robust standard errors clustered by firm are reported in parentheses. Detailed definitions of the variables can be found in Table B.25 in the Appendix.
Dependent variables: Columns (1)-(3), $PatCite^{US}_{CN}/Sales$; Columns (4)-(6), $PatCite^{US}_{CN}/(PatCite^{US}_{CN} + PatCite^{US}_{US} + 1)$; Columns (7)-(9), $PatCiteCN/Sales$; Columns (10)-(12), $PatCiteCN/(PatCiteCN + PatCiteUS + 1)$.
                      t+1      t+2      t+3      t+1      t+2      t+3      t+1      t+2      t+3      t+1      t+2      t+3
                      (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)      (10)     (11)     (12)
CNInternet            0.285    0.256    0.227    0.262    0.264    0.224    0.250    0.181    0.180    0.336    0.294    0.298
                      (0.053)  (0.053)  (0.054)  (0.045)  (0.046)  (0.047)  (0.046)  (0.046)  (0.047)  (0.048)  (0.052)  (0.056)
CNSalesGR             0.0003   0.002    0.004    0.001    0.006    0.015    0.002    0.005    0.008    0.008    0.002    0.012
                      (0.003)  (0.003)  (0.004)  (0.004)  (0.004)  (0.005)  (0.003)  (0.003)  (0.003)  (0.004)  (0.004)  (0.005)
log(Age + 1)          0.050    0.051    0.026    0.073    0.056    0.030    0.006    0.015    0.030    0.467    0.447    0.426
                      (0.022)  (0.023)  (0.024)  (0.023)  (0.023)  (0.024)  (0.020)  (0.021)  (0.021)  (0.031)  (0.031)  (0.031)
log(TA)               0.298    0.251    0.166    0.116    0.104    0.090    0.345    0.293    0.208    0.027    0.050    0.051
                      (0.035)  (0.036)  (0.037)  (0.027)  (0.029)  (0.031)  (0.034)  (0.036)  (0.036)  (0.029)  (0.030)  (0.032)
Q                     0.059    0.052    0.051    0.050    0.048    0.039    0.037    0.046    0.039    0.008    0.009    0.005
                      (0.013)  (0.014)  (0.015)  (0.009)  (0.010)  (0.012)  (0.013)  (0.013)  (0.015)  (0.008)  (0.008)  (0.008)
Firm FE               Y        Y        Y        Y        Y        Y        Y        Y        Y        Y        Y        Y
Year FE               Y        Y        Y        Y        Y        Y        Y        Y        Y        Y        Y        Y
N                     61,768   53,799   46,726   61,768   53,799   46,726   61,768   53,799   46,726   61,768   53,799   46,726
Adjusted R²           0.401    0.435    0.470    0.245    0.268    0.283    0.456    0.493    0.521    0.322    0.341    0.364
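The four citation-based dependent variables in Table B.21 can be illustrated with a small sketch for one firm-year. The citing-patent record below is a hypothetical simplification of the Google Patents data, with the patent office, first-assignee country, cited firm, and filing year. The office labels "USPTO" and "SIPO" are assumptions for this illustration. This is not the author's pipeline.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class CitingPatent:
    office: str            # "USPTO" or "SIPO" (hypothetical labels)
    assignee_country: str  # first assignee's country, e.g. "CN" or "US"
    cited_firm: str        # firm whose existing patent is cited
    year: int              # filing year of the citing patent


def citation_measures(patents: Iterable[CitingPatent], firm: str, year: int,
                      sales: float) -> Dict[str, float]:
    """Table B.21-style measures for one firm-year (hypothetical layout)."""
    counts: Counter = Counter()
    for p in patents:
        if p.cited_firm != firm or p.year != year:
            continue
        if p.office == "USPTO":
            counts["US_total"] += 1          # PatCiteUS: all US-filed citers
            if p.assignee_country == "CN":
                counts["US_CN"] += 1         # PatCite^US_CN
            elif p.assignee_country == "US":
                counts["US_US"] += 1         # PatCite^US_US
        elif p.office == "SIPO" and p.assignee_country == "CN":
            counts["CN"] += 1                # PatCiteCN (SIPO, US assignees dropped)
    us_cn, us_us = counts["US_CN"], counts["US_US"]
    cn, us_all = counts["CN"], counts["US_total"]
    return {
        "PatCiteUS_CN/Sales": us_cn / sales,
        "PatCiteUS_CN share": us_cn / (us_cn + us_us + 1),
        "PatCiteCN/Sales": cn / sales,
        "PatCiteCN share": cn / (cn + us_all + 1),
    }
```

The "+1" in each denominator keeps the share defined for firms that receive no citations in a given year.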
Table B.22: Placebo tests - patent citations from other countries and Chinese internet penetration
The table displays OLS regressions in which the dependent variables are the annual number of citations by firms in other economies to the firm's existing patents. We define the dependent variables as in Columns (1)-(3) of Table B.21. $PatCite^{US}_{JP,it}$ is the number of patents filed by Japanese firms with the USPTO in year $t$ that cite firm $i$'s existing patents. Similarly, $PatCite^{US}_{NA,it}$ is the corresponding count for patents filed by firms from Canada or Mexico, and $PatCite^{US}_{EU,it}$ for patents filed by firms from the European Union. The key independent variable CNInternet is the Chinese internet penetration ratio. All independent variables are lagged one year relative to the dependent variables. All variables are normalized by their standard deviations for easier interpretation. The sample covers all Compustat firms from 2003 to 2015. We exclude all observations where total assets or sales are smaller than one million dollars. Robust standard errors clustered by firm are reported in parentheses. Detailed definitions of the variables can be found in Table B.25 in the Appendix.
Dependent variables: Columns (1)-(3), $PatCite^{US}_{JP}/Sales$; Columns (4)-(6), $PatCite^{US}_{NA}/Sales$; Columns (7)-(9), $PatCite^{US}_{EU}/Sales$.
                      t+1      t+2      t+3      t+1      t+2      t+3      t+1      t+2      t+3
                      (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)
CNInternet            0.041    0.062    0.068    0.024    0.030    0.039    0.014    0.029    0.043
                      (0.044)  (0.046)  (0.049)  (0.044)  (0.046)  (0.046)  (0.045)  (0.046)  (0.048)
CNSalesGR             0.001    0.002    0.002    0.004    0.001    0.002    0.0001   0.002    0.0003
                      (0.002)  (0.002)  (0.003)  (0.002)  (0.003)  (0.003)  (0.002)  (0.002)  (0.002)
log(Age + 1)          0.066    0.035    0.030    0.075    0.065    0.065    0.062    0.036    0.026
                      (0.017)  (0.020)  (0.022)  (0.019)  (0.021)  (0.022)  (0.017)  (0.019)  (0.020)
log(TA)               0.198    0.093    0.029    0.222    0.157    0.107    0.207    0.123    0.062
                      (0.030)  (0.032)  (0.036)  (0.030)  (0.032)  (0.032)  (0.030)  (0.032)  (0.033)
Q                     0.037    0.024    0.002    0.032    0.014    0.002    0.040    0.028    0.006
                      (0.012)  (0.013)  (0.014)  (0.015)  (0.014)  (0.015)  (0.014)  (0.014)  (0.014)
Firm FE               Y        Y        Y        Y        Y        Y        Y        Y        Y
Year FE               Y        Y        Y        Y        Y        Y        Y        Y        Y
N                     61,768   53,799   46,726   61,768   53,799   46,726   61,768   53,799   46,726
Adjusted R²           0.672    0.682    0.687    0.524    0.542    0.563    0.697    0.706    0.719
Table B.23: Subsample analysis - by Q
This table re-estimates the regressions in Tables B.17 and B.19 with an additional variable, HighQ, which equals 1 if a firm's Q is higher than the median Q in that year, and 0 otherwise. We interact the HighQ dummy with the Chinese internet penetration variable and test whether high- and low-Q firms respond differently to Chinese competition in their innovation activities. All independent variables are lagged one year relative to the dependent variables. All variables are normalized by their standard deviations for easier interpretation. The sample construction follows the same procedure as in previous tables. We exclude all observations where total assets or sales are smaller than one million dollars. Robust standard errors clustered by firm are reported in parentheses. Detailed definitions of the variables can be found in Table B.25 in the Appendix.
                      CNComp    CNCompHi  CNIntComp XRD/Sales           NPatent/Sales
                      t+1       t+1       t+1       t+1       t+3       t+1       t+3
                      (1)       (2)       (3)       (4)       (5)       (6)       (7)
CNInternet x HighQ    0.036     0.030     0.034     0.063     0.056     0.046     0.046
                      (0.012)   (0.012)   (0.013)   (0.010)   (0.012)   (0.011)   (0.012)
CNInternet            0.100     0.098     0.095     0.147     0.147     0.083     0.043
                      (0.043)   (0.041)   (0.040)   (0.038)   (0.036)   (0.038)   (0.035)
CNSalesGR x HighQ     0.005     0.005     0.002     0.001     0.004     0.001     0.008
                      (0.005)   (0.005)   (0.006)   (0.004)   (0.004)   (0.004)   (0.004)
CNSalesGR             0.004     0.002     0.002     0.001     0.004     0.001     0.003
                      (0.004)   (0.004)   (0.004)   (0.002)   (0.002)   (0.002)   (0.002)
HighQ                 0.050     0.047     0.068     0.085     0.077     0.043     0.049
                      (0.018)   (0.019)   (0.020)   (0.019)   (0.020)   (0.021)   (0.020)
log(10kSize)          0.051     0.057     0.046
                      (0.008)   (0.009)   (0.009)
log(Age + 1)          0.057     0.060     0.034     0.104     0.086     0.108     0.068
                      (0.024)   (0.025)   (0.026)   (0.017)   (0.021)   (0.021)   (0.021)
log(TA)               0.037     0.055     0.022     0.041     0.014     0.054     0.026
                      (0.029)   (0.028)   (0.032)   (0.026)   (0.029)   (0.029)   (0.026)
Q                     0.014     0.012     0.011     0.006     0.004     0.015     0.006
                      (0.006)   (0.006)   (0.007)   (0.015)   (0.015)   (0.015)   (0.015)
Firm FE               Y         Y         Y         Y         Y         Y         Y
Year FE               Y         Y         Y         Y         Y         Y         Y
N                     60,730    60,730    60,730    61,768    46,726    61,768    46,726
Adjusted R²           0.587     0.526     0.478     0.737     0.740     0.725     0.785
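The HighQ split used in this table amounts to a year-by-year median comparison followed by an interaction with CNInternet. The HighT split in Table B.24 works the same way, with asset tangibility (PP&E over total assets) in place of Q. The minimal sketch below assumes a strict "higher than the median" comparison. The function names are hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Hashable, List, Sequence


def high_dummy(values: Sequence[float], years: Sequence[Hashable]) -> List[int]:
    """1 if the value is strictly above that year's cross-sectional median."""
    by_year = defaultdict(list)
    for v, t in zip(values, years):
        by_year[t].append(v)
    med = {t: median(vs) for t, vs in by_year.items()}
    return [int(v > med[t]) for v, t in zip(values, years)]


def interaction(a: Sequence[float], dummy: Sequence[int]) -> List[float]:
    """Element-wise product, e.g. the CNInternet x HighQ regressor."""
    return [x * d for x, d in zip(a, dummy)]
```

Because the median is recomputed each year, roughly half of each year's firms are classified as HighQ, whatever the overall level of Q is in that year.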
Table B.24: Subsample analysis - by Asset Tangibility
This table re-estimates the regressions in Tables B.17 and B.19 with an additional variable, HighT, which equals 1 if a firm's asset tangibility is higher than the median asset tangibility in that year, and 0 otherwise. We interact the HighT dummy with the Chinese internet penetration variable and test whether high- and low-asset-tangibility firms respond differently to Chinese competition in their innovation activities. All independent variables are lagged one year relative to the dependent variables. All variables are normalized by their standard deviations for easier interpretation. The sample construction follows the same procedure as in previous tables. We exclude all observations where total assets or sales are smaller than one million dollars. Robust standard errors clustered by firm are reported in parentheses. Detailed definitions of the variables can be found in Table B.25 in the Appendix.
                      CNComp    CNCompHi  CNIntComp XRD/Sales           NPatent/Sales
                      t+1       t+1       t+1       t+1       t+3       t+1       t+3
                      (1)       (2)       (3)       (4)       (5)       (6)       (7)
CNInternet x HighT    0.035     0.028     0.039     0.061     0.040     0.045     0.032
                      (0.014)   (0.015)   (0.016)   (0.012)   (0.014)   (0.013)   (0.013)
CNInternet            0.090     0.092     0.083     0.223     0.204     0.136     0.083
                      (0.044)   (0.044)   (0.042)   (0.049)   (0.048)   (0.047)   (0.044)
CNSalesGR x HighT     0.006     0.006     0.006     0.003     0.008     0.007     0.004
                      (0.005)   (0.005)   (0.006)   (0.004)   (0.005)   (0.004)   (0.004)
CNSalesGR             0.002     0.004     0.004     0.001     0.003     0.006     0.001
                      (0.004)   (0.004)   (0.004)   (0.003)   (0.004)   (0.003)   (0.004)
HighT                 0.004     0.0003    0.021     0.092     0.040     0.048     0.014
                      (0.023)   (0.024)   (0.026)   (0.022)   (0.023)   (0.027)   (0.025)
log(10kSize)          0.053     0.059     0.048
                      (0.009)   (0.009)   (0.010)
log(Age + 1)          0.047     0.052     0.022     0.084     0.075     0.095     0.060
                      (0.026)   (0.027)   (0.028)   (0.017)   (0.021)   (0.021)   (0.022)
log(TA)               0.045     0.063     0.034     0.032     0.006     0.062     0.032
                      (0.030)   (0.029)   (0.033)   (0.028)   (0.030)   (0.030)   (0.027)
Q                     0.012     0.011     0.009     0.006     0.0004    0.017     0.002
                      (0.006)   (0.006)   (0.007)   (0.015)   (0.014)   (0.015)   (0.014)
Firm FE               Y         Y         Y         Y         Y         Y         Y
Year FE               Y         Y         Y         Y         Y         Y         Y
N                     58,415    58,415    58,415    59,396    44,907    59,396    44,907
Adjusted R²           0.587     0.526     0.478     0.736     0.739     0.724     0.785
B.4 Appendix Tables for Chapter 2
Table B.25: Variable definitions
Variable: Definition (Source)
CNInternet: The weighted-average internet penetration ratio across provinces in China. We first collect the number of internet users from the CNNIC annual reports. We then obtain the population of each province-year from China Data Online and calculate the internet penetration ratio. Next, for each industry, we calculate the weights across provinces using the total assets of all Chinese public firms (mainland A-share, Hong Kong-, and US-listed) in 2000, and the same weights are used in all later years. We assign each public firm to the province of its headquarters. In calculating the weights for each industry, we keep only provinces whose weights are above 10%, and then calculate CNInternet as the weighted average of the internet penetration ratio, where the weights are the total assets of the industry's public firms in each province. (Source: CNNIC Reports; CSMAR; Capital IQ; China Data Online)
CNComp %: Number of paragraphs that contain at least one word from each of the following word lists, divided by the total number of paragraphs in the 10-K filing. List 1: [China, Chinese]; List 2: [compete, competition, competing]. (Source: 10-K filing)
CNComp Dummy: A dummy variable that equals 1 if CNComp % is larger than 0, and 0 otherwise. (Source: 10-K filing)
CNCompHi %: Number of paragraphs that contain at least one word from each of the following word lists, divided by the total number of paragraphs in the 10-K filing. List 1: [China, Chinese]; List 2: [compete, competition, competing]; List 3: [high, intense, significant, face, faces, substantial, continued, vigorous, strong, aggressive, fierce, stiff, extensive, severe]. (Source: 10-K filing)
CNCompHi Dummy: A dummy variable that equals 1 if CNCompHi % is larger than 0, and 0 otherwise. (Source: 10-K filing)
CNIntComp %: Number of paragraphs that contain at least one word from each of the following word lists, divided by the total number of paragraphs in the 10-K filing. List 1: [China, Chinese]; List 2: [compete, competition, competing]; List 3: [intellectual]; List 4: [property]. (Source: 10-K filing)
CNIntComp Dummy: A dummy variable that equals 1 if CNIntComp % is larger than 0, and 0 otherwise. (Source: 10-K filing)
CNIntTheft %: Number of paragraphs that contain at least one word from each of the following word lists, divided by the total number of paragraphs in the 10-K filing. List 1: [China, Chinese]; List 2: [protect, infringe, theft]; List 3: [intellectual]; List 4: [property]. (Source: 10-K filing)
CNIntTheft Dummy: A dummy variable that equals 1 if CNIntTheft % is larger than 0, and 0 otherwise. (Source: 10-K filing)
EUIntComp %: Number of paragraphs that contain at least one word from each of the following word lists, divided by the total number of paragraphs in the 10-K filing. List 1: [Europe, European]; List 2: [compete, competition, competing]; List 3: [intellectual]; List 4: [property]. (Source: 10-K filing)
EUIntTheft %: Number of paragraphs that contain at least one word from each of the following word lists, divided by the total number of paragraphs in the 10-K filing. List 1: [Europe, European]; List 2: [protect, infringe, theft]; List 3: [intellectual]; List 4: [property]. (Source: 10-K filing)
JPIntComp %: Number of paragraphs that contain at least one word from each of the following word lists, divided by the total number of paragraphs in the 10-K filing. List 1: [Japan, Japanese]; List 2: [compete, competition, competing]; List 3: [intellectual]; List 4: [property]. (Source: 10-K filing)
JPIntTheft %: Number of paragraphs that contain at least one word from each of the following word lists, divided by the total number of paragraphs in the 10-K filing. List 1: [Japan, Japanese]; List 2: [protect, infringe, theft]; List 3: [intellectual]; List 4: [property]. (Source: 10-K filing)
NAIntComp %: Number of paragraphs that contain at least one word from each of the following word lists, divided by the total number of paragraphs in the 10-K filing. List 1: [Mexico, Mexican, Canada, Canadian]; List 2: [compete, competition, competing]; List 3: [intellectual]; List 4: [property]. (Source: 10-K filing)
NAIntTheft %: Number of paragraphs that contain at least one word from each of the following word lists, divided by the total number of paragraphs in the 10-K filing. List 1: [Mexico, Mexican, Canada, Canadian]; List 2: [protect, infringe, theft]; List 3: [intellectual]; List 4: [property]. (Source: 10-K filing)
XRD: R&D expenses from Compustat. We replace a missing R&D expense ratio (over sales) with the industry average if the firm has applied for any patents in the past three years, and set the remaining missing values to 0. (Source: Compustat)
NPatent: The number of patents that the firm applies for in a year. For patents granted before Nov. 1, 2010, we use the KPSS data (Kogan, Papanikolaou, Seru, and Stoffman, 2016); for patents granted after Nov. 1, 2010, we use patent data from Google Patents. (Source: Google Patents; Kogan, Papanikolaou, Seru, and Stoffman (2016))
PatCiteCN: The total number of new patents that (1) are filed with SIPO (China Patent Office), (2) are assigned to a Chinese firm, and (3) cite any existing patents of the firm. (Source: Google Patents)
$PatCite^{US}_{CN}$: The total number of new patents that (1) are filed with the USPTO, (2) are assigned to a Chinese firm, and (3) cite any existing patents of the firm. (Source: Google Patents)
$PatCite^{US}_{EU}$: The total number of new patents that (1) are filed with the USPTO, (2) are assigned to a European firm, and (3) cite any existing patents of the firm. (Source: Google Patents)
$PatCite^{US}_{JP}$: The total number of new patents that (1) are filed with the USPTO, (2) are assigned to a Japanese firm, and (3) cite any existing patents of the firm. (Source: Google Patents)
$PatCite^{US}_{NA}$: The total number of new patents that (1) are filed with the USPTO, (2) are assigned to a Mexican or Canadian firm, and (3) cite any existing patents of the firm. (Source: Google Patents)
$PatCite^{US}_{US}$: The total number of new patents that (1) are filed with the USPTO, (2) are assigned to an American firm, and (3) cite any existing patents of the firm. (Source: Google Patents)
Age: Number of years that the firm has been public. (Source: Compustat)
CNSalesGR: The average sales growth of Chinese public companies in the same 2-digit SIC industry. (Source: CSMAR; Capital IQ)
Q: Market-to-book ratio. (Source: Compustat)
Sales: Sales of the firm. (Source: Compustat)
TA: Total assets of the firm. (Source: Compustat)
AssetTangibility: Property, plant, and equipment over total assets. (Source: Compustat)
CNInternet Macro: Constructed like CNInternet, except that instead of using weights from public firms, we compute the industry weights from the total-asset data in China Data Online. We hand-match each industry to 2-digit SIC industries. (Source: CNNIC Reports; China Data Online)
CNInternet Top1: Constructed like CNInternet, except that instead of the value-weighted measure over all provinces whose weights are above 10%, we put 100% weight on the province with the highest total assets in the industry. (Source: CNNIC Reports; Capital IQ; China Data Online)
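The CNInternet construction described in Table B.25 can be sketched for one industry as follows. The top-province rule behind CNInternet Top1 is included as well. This is an illustration, not the author's code, and it rests on a few assumptions. The inputs are plain province-keyed dictionaries with made-up names. "Above 10%" is read as a strict inequality. The "weighted average" is read as renormalizing the weights over the provinces that pass the cutoff.

```python
from typing import Dict


def penetration_ratio(users: Dict[str, float],
                      population: Dict[str, float]) -> Dict[str, float]:
    """Internet users divided by population, province by province."""
    return {p: users[p] / population[p] for p in users}


def province_weights(assets_2000: Dict[str, float],
                     cutoff: float = 0.10) -> Dict[str, float]:
    """Base-year (2000) asset shares by headquarters province, keeping only
    provinces whose share exceeds the cutoff, renormalized to sum to one."""
    total = sum(assets_2000.values())
    kept = {p: a for p, a in assets_2000.items() if a / total > cutoff}
    kept_total = sum(kept.values())
    return {p: a / kept_total for p, a in kept.items()}


def cn_internet(assets_2000: Dict[str, float],
                penetration: Dict[str, float]) -> float:
    """Weighted-average penetration for one industry in one year."""
    w = province_weights(assets_2000)
    return sum(wp * penetration[p] for p, wp in w.items())


def cn_internet_top1(assets_2000: Dict[str, float],
                     penetration: Dict[str, float]) -> float:
    """Top1 variant: all weight on the province with the largest assets."""
    top = max(assets_2000, key=assets_2000.get)
    return penetration[top]
```

Because the weights are fixed at their 2000 values, year-to-year variation in CNInternet within an industry comes only from changes in provincial internet penetration.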
Table B.26: Robustness - Weights from Macro Data
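This table examines the robustness of our main results to using CNInternet Macro, which is constructed like CNInternet but with the provincial weights computed from the industry total-asset data in China Data Online (see Table B.25).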
Dependent variables: (1) CNComp; (2) CNCompHi; (3) CNIntComp; (4) XRD/Sales; (5) NPatent/Sales; (6) $PatCite^{US}_{CN}/Sales$; (7) PatCiteCN/Sales.
                      (1)       (2)       (3)       (4)       (5)       (6)       (7)
CNInternet Macro      0.198     0.159     0.162     0.069     0.092     0.343     0.351
                      (0.044)   (0.044)   (0.043)   (0.021)   (0.032)   (0.048)   (0.042)
log(10kSize)          0.107     0.110     0.097
                      (0.011)   (0.011)   (0.011)
log(Age + 1)          0.044     0.049     0.020     0.106     0.113     0.075     0.021
                      (0.024)   (0.025)   (0.027)   (0.018)   (0.021)   (0.023)   (0.020)
log(TA)               0.053     0.069     0.039     0.034     0.059     0.292     0.339
                      (0.029)   (0.027)   (0.031)   (0.027)   (0.029)   (0.035)   (0.034)
Q                     0.013     0.012     0.014     0.003     0.017     0.057     0.034
                      (0.006)   (0.006)   (0.007)   (0.014)   (0.014)   (0.013)   (0.013)
CNSalesGR             0.001     0.002     0.001     0.001     0.002     0.002     0.0005
                      (0.003)   (0.003)   (0.003)   (0.002)   (0.002)   (0.003)   (0.003)
Firm FE               Y         Y         Y         Y         Y         Y         Y
Year FE               Y         Y         Y         Y         Y         Y         Y
N                     61,930    61,930    61,930    61,768    61,768    61,768    61,768
Adjusted R²           0.586     0.523     0.473     0.735     0.724     0.401     0.457
Table B.27: Robustness - Top 1 provinces
Dependent variables: (1) CNComp; (2) CNCompHi; (3) CNIntComp; (4) XRD/Sales; (5) NPatent/Sales; (6) $PatCite^{US}_{CN}/Sales$; (7) PatCiteCN/Sales.
                      (1)       (2)       (3)       (4)       (5)       (6)       (7)
CNInternet Top1       0.116     0.105     0.112     0.107     0.111     0.260     0.245
                      (0.034)   (0.033)   (0.032)   (0.027)   (0.030)   (0.040)   (0.036)
CNSalesGR             0.001     0.002     0.001     0.001     0.001     0.002     0.0001
                      (0.003)   (0.003)   (0.003)   (0.002)   (0.002)   (0.003)   (0.003)
log(10kSize)          0.107     0.110     0.097
                      (0.011)   (0.011)   (0.011)
log(Age + 1)          0.061     0.063     0.034     0.100     0.104     0.046     0.009
                      (0.024)   (0.025)   (0.027)   (0.017)   (0.020)   (0.022)   (0.020)
log(TA)               0.050     0.066     0.037     0.035     0.058     0.298     0.345
                      (0.029)   (0.028)   (0.031)   (0.027)   (0.029)   (0.035)   (0.034)
Q                     0.014     0.012     0.015     0.003     0.017     0.058     0.035
                      (0.006)   (0.006)   (0.007)   (0.014)   (0.014)   (0.013)   (0.013)
Firm FE               Y         Y         Y         Y         Y         Y         Y
Year FE               Y         Y         Y         Y         Y         Y         Y
N                     61,930    61,930    61,930    61,768    61,768    61,768    61,768
Adjusted R²           0.586     0.523     0.472     0.736     0.725     0.401     0.456
Table B.28: Robustness of Table B.19 Excluding Zero R&D Firms
This table tests the robustness of Table B.19 using the subsample that excludes observations where XRD/Sales equals 0. The dependent variable in Columns (1)-(3) is R&D expenses over sales. For missing R&D, we follow Koh and Reeb (2015) and replace missing values with the industry average if the firm has filed any patent applications in the past three years (including the current year), and with 0 otherwise. The dependent variable in Columns (4)-(6) is the total number of patent applications each year (by filing date) divided by sales. The patent data come from Google Patents, and we match the patents to Compustat firms using the links from Kogan, Papanikolaou, Seru, and Stoffman (2016). In both sets of columns, the dependent variables are measured 1, 2, or 3 years in the future. The key independent variable CNInternet is the Chinese internet penetration ratio. All independent variables are lagged one year relative to the dependent variables. All variables are normalized by their standard deviations for easier interpretation. The sample covers all Compustat firms from 2003 to 2015. We exclude all observations where total assets or sales are smaller than one million dollars. Robust standard errors clustered by firm are reported in parentheses. Detailed definitions of the variables can be found in Table B.25 in the Appendix.
                      XRD/Sales                     NPatent/Sales
                      t+1       t+2       t+3       t+1       t+2       t+3
                      (1)       (2)       (3)       (4)       (5)       (6)
CNInternet            0.354     0.325     0.268     0.156     0.133     0.092
                      (0.082)   (0.076)   (0.071)   (0.084)   (0.080)   (0.078)
CNSalesGR             0.0004    0.008     0.004     0.007     0.0003    0.0002
                      (0.005)   (0.006)   (0.007)   (0.006)   (0.006)   (0.007)
log(Age + 1)          0.271     0.213     0.163     0.310     0.258     0.210
                      (0.046)   (0.046)   (0.049)   (0.055)   (0.055)   (0.057)
log(TA)               0.014     0.052     0.001     0.158     0.093     0.095
                      (0.054)   (0.052)   (0.053)   (0.058)   (0.055)   (0.051)
Q                     0.015     0.004     0.007     0.037     0.018     0.017
                      (0.020)   (0.017)   (0.015)   (0.019)   (0.018)   (0.018)
Firm FE               Y         Y         Y         Y         Y         Y
Year FE               Y         Y         Y         Y         Y         Y
N                     27,950    24,131    20,913    27,950    24,131    20,913
Adjusted R²           0.738     0.731     0.733     0.709     0.751     0.785
Abstract
This thesis consists of two essays that study how the flow of information affects trading activities and corporate actions. Chapter 1 shows that financial media can have causal effects on market participants. I exploit random variation in the visual salience of corporate press releases to financial journalists as an instrument for media coverage. Doubling the amount of media coverage increases the number of EDGAR searches by 31% and the number of analysts issuing earnings forecasts by 78% in a two-day period. The evidence is most consistent with the theory of rational attention allocation: sophisticated investors acquire more information about media-covered events because media coverage signals higher return variance, and thus higher payoffs from having more precise information. Analysts cater to the increased information demand from institutional investors by responding to media-reported events. Chapter 2 uses the growth of information flow via the Internet over the past decades as a setting and examines how competitive shocks from China affect U.S. innovation through two margins: the markets for innovation and for existing products. Using Chinese data, we map each industry to provincial Internet penetration levels based on geographic agglomeration. The resulting industry-year database captures the ability of Chinese firms to acquire knowledge globally and compete in the market for intellectual property production. Increases in provincial Chinese Internet penetration are followed by sharp reductions in R&D investment and subsequent patenting by U.S. firms, and by increased patenting by Chinese firms. The new Chinese patents also cite the patents of treated U.S. firms at a high rate, consistent with increased intellectual property competition. In contrast, U.S. firms with fewer growth options and more tangible assets tend to increase R&D and patenting activity. Overall, both intellectual property competition from Chinese firms and the asset competition of U.S. firms influence U.S. firm innovation.
Asset Metadata
Creator: Li, Yuan (Bruce) (author)
Core Title: Essays on information and financial economics
School: Marshall School of Business
Degree: Doctor of Philosophy
Degree Program: Business Administration
Publication Date: 04/28/2019
Defense Date: 03/20/2019
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: China, competition, financial market, Information, journalist, media, OAI-PMH Harvest
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Hoberg, Gerard (committee chair); Ahern, Kenneth (committee member); Ozbas, Oguzhan (committee member); Wong, TJ (committee member)
Creator Email: yli268@usc.edu, yuan@bruceyli.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c89-150904
Unique identifier: UC11660530
Identifier: etd-LiYuanBruc-7298.pdf (filename), usctheses-c89-150904 (legacy record id)
Legacy Identifier: etd-LiYuanBruc-7298.pdf
Dmrecord: 150904
Document Type: Dissertation
Rights: Li, Yuan (Bruce)
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Tags
financial market
media