Picky Customers and Expected Returns

by Louis Yang

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (Business Administration)

August 2021

Contents

List of Tables
List of Figures
Abstract
1 Introduction
2 Background
3 Data
   3.1 Amazon Review Datasets
   3.2 Matching
   3.3 Firm-Level Picky Customer Measure
   3.4 Product Market Competition Measures
4 Results
   4.1 Verifying the Measure
   4.2 Asset Pricing Tests
   4.3 Product Market Competition
5 Conclusion
Bibliography
Figures and Tables
Appendix
   A1 Combining AWS and Julian McAuley Data
   A2 Supplementary Figures and Tables

List of Tables

Table 1: Sample Comparison
Table 2: Brand Loyalty Percentiles: 1-Year Ahead
Table 3: Customer Loyalty: 1-Year Later
Table 4: Customer Loyalty by Star Rating
Table 5: Review Length Dummy Variable Analysis
Table 6: Portfolio Sorting
Table 7: Portfolio Sorting: Alternatives
Table 8: Cross-Sectional Regressions
Table 9: Category Median Cross-Sectional Regressions
Table 10: Real Beta
Table 11: Factor Loadings
Table A1: Brand Loyalty Terciles: 1-Year Ahead
Table A2: Brand Loyalty Percentiles: 2-Years Ahead
Table A3: Brand Loyalty Terciles: 2-Years Ahead
Table A4: Brand Loyalty Percentiles: 3-Years Ahead
Table A5: Brand Loyalty Terciles: 3-Years Ahead
Table A6: Subtract Brand Loyalty Percentiles: 1-Year Ahead
Table A7: Subtract Brand Loyalty Terciles: 1-Year Ahead
Table A8: Subtract Brand Loyalty Percentiles: 2-Years Ahead
Table A9: Subtract Brand Loyalty Terciles: 2-Years Ahead
Table A10: Subtract Brand Loyalty Percentiles: 3-Years Ahead
Table A11: Subtract Brand Loyalty Terciles: 3-Years Ahead
Table A12: Customer Loyalty: 2-Years Later
Table A13: Customer Loyalty: 3-Years Later
Table A14: Subtract Customer Loyalty: 1-Year Later
Table A15: Subtract Customer Loyalty: 2-Years Later
Table A16: Subtract Customer Loyalty: 3-Years Later
Table A17: Exclusive Customer Loyalty by Star Rating
Table A18: Subtract Customer Loyalty by Star Rating
Table A19: Subtract Exclusive Customer Loyalty by Star Rating
Table A20: Weighted Cross-Sectional Regressions
Table A21: Cross-Sectional Regressions with Product Characteristics
Table A22: Cross-Sectional Regressions Controlling for Product Turnover
Table A23: Real Beta: Subtract
Table A24: Factor Loadings: Subtract

List of Figures

Figure 1: Measures of Competition
Figure 2: Picky vs Non-Picky Number of Brands
Figure 3: Review Length and Review/Product Characteristics
Figure A1: Histogram of Helpful Votes Proportion

Abstract

I examine the impact of having a picky customer base on a firm’s risk and return. Using Amazon.com product review lengths, I construct a proxy for customer pickiness. Picky customers have narrow product tastes and are more likely to be repeat purchasers. Thus, firms with picky customers are less vulnerable to product market competition. They experience smaller decreases in profitability when competition increases, and they earn higher returns (0.53% per month). Moreover, the returns of firms with picky customers correlate positively with the returns of other firms that are insulated from product market competition.

1. Introduction

Like physical capital, a firm’s customer base requires costly investment to accumulate and generates profits in the future. Thus, the customer base is an important type of intangible capital. Like other types of capital, a firm’s customer capital can also contain important information about its risk and return. Moreover, just as other sources of capital heterogeneity, such as capital age or labor skill, can change the capital-risk relationship, different types of customers can also impact the relationship between risk and customer capital.

In this paper, I study how one important characteristic of firms’ customer bases, customer pickiness, can impact their expected returns. Similar to Cheng et al. (2021), I define picky customers as those with more specific product tastes.
These picky customers tend to be more loyal, as it is more difficult for them to find competing products that also fit their tastes. Thus, firms with picky customers have a more loyal customer base, which reduces the firms’ vulnerability to product market competition. In other words, firms with picky customers are less likely to lose customers to new entrants when product market competition increases, and thus are relatively more profitable in high competition environments. Combined with a positive product market competition risk premium, firms with picky customers should have higher expected returns.

Empirically, firms with picky customers earn 0.53% higher returns per month (6.24% annualized). [1] Moreover, their profits are less sensitive to product market competition. I find that, on average, firm profits decrease in high competition environments and increase in low competition environments, but this effect is mitigated among firms with pickier customers.

[1] Picky customer firms are more exposed to product market competition insofar as they are relatively more profitable in the cross-section when competition is high. Their return spread is positive, which is consistent with a positive product market competition risk premium.

I measure customer pickiness by combining two large datasets of online product reviews on Amazon.com from Amazon Web Services and Prof. Julian McAuley. [2] The overall dataset contains the text of 80,156,983 reviews for 9,430,088 products and 110,796 brands. As online product reviews are primarily used to inform other potential customers about the reviewer’s experience concerning the product, pickier customers with more specific product requirements are more likely to leave longer, more detailed reviews. Thus, I use review lengths (word count) to proxy for reviewer pickiness. This measure is also supported by survey and experimental evidence from Cheng et al. (2021), who find that picky customers have more specific product tastes and leave longer, more detailed product descriptions when asked to describe their ideal product. Additionally, because the type of product may impact the average review length, I also use detailed product category metadata in the Amazon dataset to adjust each review’s length for its product category’s median length (by subtracting or dividing by the median). I hand-match the Amazon brands to Compustat firms to obtain a matched dataset of 4,608,424 reviews for 337 firms spanning July 2005 to June 2017. Finally, I take the mean category-adjusted review lengths as the firm-level measure of customer pickiness.

[2] The Amazon Web Services data is publicly available in the AWS research data repository. Prof. Julian McAuley’s data is also available upon request and was originally used in a series of papers on image recognition and retrieval (McAuley et al. (2015) and He and McAuley (2016)).

I begin the empirical analysis by validating the review length measure. I find that customers who write longer reviews buy a smaller assortment of brands compared to customers who write shorter reviews, indicating that they have narrower product tastes. I also find that the adjusted length measure predicts customer loyalty at the brand and customer level. More specifically, Amazon brands with longer average reviews have more repeat reviewers, that is, more reviewers who return to the same brand, in future years. To confirm that this is due to long-review customers returning rather than longer reviews influencing more repeat buying in general, I check that the loyalty is also present at the individual customer/reviewer level. Accordingly, I find that customers who leave long reviews are more likely to be repeat reviewers in future years.

I also find evidence that the category-adjustment procedure removes product-type-specific variation in review lengths.
Regressing the average length for each firm-year on industry-year dummies yields an adjusted $R^2$ of 51% for raw review lengths, but only 13% and 6% for division- and subtraction-adjusted review lengths, respectively.

I perform standard asset pricing tests to show that picky customers predict higher expected returns. A portfolio that buys the quartile of firms with the longest average adjusted reviews and sells the quartile with the shortest reviews earns a return spread of 0.53% per month. This return spread persists after controlling for standard asset pricing factors. Cross-sectional regressions with a variety of firm-level fundamental controls and product-level controls constructed from the Amazon product metadata yield similar results: longer reviews predict higher returns.

The last part of the paper explores the connection between product market competition and picky customers. If picky customers are more loyal, then this should insulate firms with picky customers from product market competition. Thus, we should expect profits from firms with picky customers to be less sensitive to competition, and their returns should be correlated with other portfolios that are similarly insulated from product market competition. To this end, I construct two aggregated macroeconomic proxies of product market competition from the product market textual similarity matrix of Hoberg and Phillips (2016) and the industry markup measure of Bustamante and Donangelo (2017). When firms have more textually similar product and business descriptions, they are offering more similar products, which indicates higher levels of product market competition. Similarly, a lower average industry markup indicates increasing competitiveness. I perform real beta tests by regressing measures of firm profitability on average adjusted review lengths, the macroeconomic measures of product market competition, and their interaction terms.
The analysis shows that higher levels of product market competition decrease firm profits, as expected, and that having customers who write longer reviews buffers firms from increased competition. In other words, the profits of firms with picky customers are less sensitive to varying levels of product market competition. Finally, I show that the return of the picky customer portfolio is positively correlated with the returns of other portfolios that are also insulated from competition.

This paper is related to a growing literature in finance and economics studying the effects of customer capital on firms. Gourio and Rudanko (2014) first proposed that customers may be a type of intangible capital for firms. They embed product market search frictions and costly accumulation of customers (advertising and marketing expenses) into a firm investment model and show that these features lead to customers being a state variable for firms in a similar manner as standard capital. They argue that customer capital is a missing form of capital not captured in traditional book values that may help explain weak investment-q sensitivities. In asset pricing, Belo et al. (2014b) use perpetual inventory methods to infer the stock of brand capital from advertising expenses and find that firms with more brand capital stock and investment have higher expected returns.

My paper is similar in spirit to Dou et al. (2021), who find that firms that rely more on key talent to maintain customer capital have higher expected returns. They deem customers that are not reliant on key talent “inalienable customer capital,” and show that a firm’s proportion of inalienable customer capital negatively predicts expected returns. Their mechanism is different from mine, however, as they argue that the higher risk of “alienable” customer capital is due to the higher likelihood of key talent leaving financially constrained firms for higher compensation elsewhere.
Thus, their mechanism is centered on financial constraints. In contrast, I argue that picky customers, because of their specific tastes, are less likely to leave when product market competition increases.

This paper also draws on a large literature in marketing and operations management studying the microfoundations of customer behavior. Cachon et al. (2005) study optimal product assortments for customers who strategically search among vendors. Bernstein and Martínez-de-Albéniz (2017) analyze optimal timing of new product introductions when customers anticipate future releases. Cheng et al. (2021) use survey and experimental evidence to characterize the shopping habits of picky customers. This paper draws on this literature by considering the impact of customer behavior on firm risk.

More broadly, this paper is related to the literatures on the impact of intangible capital and capital heterogeneity on firm risk. Belo et al. (2014a) show that hiring rates are correlated with future stock returns, and use labor market frictions to characterize labor stock as a productive asset for firms, similar to physical capital. Belo et al. (2017) go further to show that different types of labor capital (skilled or unskilled) can alter the relationship between hiring and risk. They find that the negative relationship between hiring and returns is stronger among firms with more high-skilled workers, as these workers have higher labor market frictions. Zhang (2019) explores another channel through which unskilled labor may reduce firm risk. He finds that firms with unskilled workers possess a real option to automate unskilled labor whose value is countercyclical. Therefore, firms with more unskilled workers are relatively more valuable in downturns and thus are hedged against productivity shocks.
Kilic (2017) shows that firms with younger workers are riskier than firms with older workers, as younger workers increase the firm’s exposure to technology shocks. Eisfeldt and Papanikolaou (2013) show that organizational capital, the talent embedded in key employees, increases risk, as the bargaining power of key talent over firms’ cash flows varies systematically. Finally, Lin et al. (2020) argue that old capital vintages are riskier than young vintages, as old capital increases a firm’s technology adoption costs and its exposure to technology shocks. This paper contributes to these literatures by exploring how an understudied source of intangible capital heterogeneity, customer tastes and loyalty, can impact firms’ risk.

Finally, this paper contributes to the ongoing discussion on the product market competition risk premium. The literature has proposed many different mechanisms justifying either a positive or a negative risk premium for product market competition. This paper does not propose a theoretical mechanism, but finds empirical evidence consistent with a positive sign for the product market competition risk premium. Hou and Robinson (2006) find that firms in low HHI (high competition) industries earn higher returns than firms in highly concentrated (low competition) industries and argue that this is due to competitive firms having higher distress risk. However, Ali et al. (2009) point out that the Compustat-based concentration measures used in Hou and Robinson (2006) ignore private firms. They find that industry concentration measures including private and public firms are not correlated with higher returns. Bustamante and Donangelo (2017) argue that the relationship between competition and risk includes more than distress risk, as firms in more concentrated industries with higher barriers to entry are also hedged against new entrants.
If new entrants are procyclical, this means that low concentration industries are actually hedged against macroeconomic shocks and thus are less risky than highly concentrated industries. They find that the new entrant channel dominates the distress risk channel for an overall positive relationship between industry concentration and expected returns, consistent with a positive competition risk premium. Dou et al. (2021) argue that firms’ incentives to collude erode when discount rates are high, as the discounted benefits of future collusion decrease. Therefore, competition increases when discount rates are high for firms in oligopolistic (concentrated) industries. These firms are relatively less profitable during high discount rate (high competition) environments compared to firms in non-collusive industries. Thus, they argue that firms in oligopolistic industries should earn higher returns and are negatively exposed to competition, implying a negative competition risk premium. This paper is most consistent with the findings of Bustamante and Donangelo (2017), as picky customer firms are analogous to firms operating in concentrated industries. Both are less vulnerable to increased competition and thus are relatively more profitable in high competition regimes. Moreover, both have higher expected returns, which, when combined with their positive real betas with respect to competition, implies a positive competition risk premium.

The interaction of strategic customers and firm behavior is an important but lightly explored topic in finance research. As much attention has already been paid to the nuances of “sell-side” intangible capital like skilled vs. unskilled labor, key talent, and brand capital, extending the analysis to customers, the most prominent “buy-side” intangible capital, is a natural next step. This paper sheds light on one important aspect of customer heterogeneity and how it impacts firms’ risks and competitive environments.
The paper proceeds as follows. Section 2 discusses the conceptual background relating picky customers to competition and firm risk in more detail. Section 3 describes the Amazon review data. Section 4 presents the empirical results on expected returns and product market competition. Section 5 concludes.

2. Background

In this section, I describe the economic mechanism relating customer pickiness, product market competition, and expected returns. I use a small, qualitative model to help clarify the economic logic and produce some hypotheses to guide the empirical analysis.

Cheng et al. (2021) define a picky customer as one who has narrow preferences, so he or she is willing to consume a smaller set of products. Therefore, if picky customers find a satisfactory product, they are less likely to switch to a different product when the product assortment changes, e.g. when new firms and products enter the market.

Thus, the sales of firms with a large proportion of picky customers compared to non-picky customers will be better insulated from changes in product market competition. More specifically, when the level of competition increases, firms with picky customers will have smaller decreases in profits compared to firms with non-picky customers.

Whether having pickier customers makes firms riskier or not depends on the risk premium of product market competition. Because firms with picky customers have relatively high cash flows when competition increases and relatively low cash flows when it decreases, they should have a positive cash flow beta with respect to overall competitiveness in the economy. If product market competition has a positive risk premium, then firms with picky customers should be riskier, as their payouts are positively exposed to competition and competition has a positive premium. Conversely, if product market competition has a negative risk premium, then firms with picky customers should be less risky.
I illustrate this economic mechanism with a very simple qualitative model. I consider a simple firm with only customers and abstract away from capital, labor, and most other considerations to focus on the interaction between picky customers, competition, and risk.

Consider a 1-period all-equity firm with an exogenously determined endowment of picky customers $P_i$. For simplicity, the picky customers in the model are extremely picky, so the $P_i$ customers will only purchase firm $i$’s products. Additionally, suppose there is a measure $N$ of non-picky customers who will evenly allocate themselves across the $F_t$ firms in the economy, where $t$ indexes time. These non-picky customers have no product preferences whatsoever and will simply purchase the same amount from each firm. Thus, each firm will receive an $N/F_t$ share of non-picky purchases. To further simplify, suppose that each customer always buys one item, and firms earn a uniform, positive profit (normalized to one) from each item sold (they have no fixed costs and do not adjust pricing with respect to demand). So, firm profits are given by

\[ \pi_{i,t} = P_i + \frac{N}{F_t} = \left[ \mathcal{P}_{i,t} + (1 - \mathcal{P}_{i,t}) \frac{F_{t-1}}{F_t} \right] \frac{P_i}{\mathcal{P}_{i,t}}. \tag{1} \]

The proportion of picky customers for firm $i$ is defined as $\mathcal{P}_{i,t} = P_i / (P_i + N/F_{t-1})$. [3] At $t=0$, there is an initial number of firms in the economy denoted by $F_0$; then at $t=1$, the final number of firms, $F_1$, is realized. Profits are paid out to equity holders and all firms liquidate. The number of firms is the only stochastic element in this simple economy and the source of all risk. If $F_1 > F_0$, the amount of product market competition increases and all firms obtain a smaller portion of the non-picky customer pool, decreasing profits. If $F_1 < F_0$, competition decreases and firms obtain a larger number of non-picky customers. [4] The distribution of $F_t$ is unimportant, as long as it has non-negative support.
[3] Profit levels are dependent on firm size (the sum of picky and non-picky customers), but final results concerning profitability and risk do not depend on size.
[4] I abstract away bankruptcy effects when $F_1 < F_0$.

Finally, assume that the stochastic discount factor is linear:

\[ M_t = a - b\, G(F_t / F_{t-1}). \tag{2} \]

Here $G(\cdot)$ is any increasing function and $b$ determines the sign of the competition risk premium. When $b > 0$, competition is high in low marginal utility states and the competition risk premium is positive. When $b < 0$, competition is high in high marginal utility states and the competition risk premium is negative.

Now, I examine how increasing competition affects profits and how this relationship changes when the proportion of picky customers, $\mathcal{P}_{i,t}$, varies. First, to more closely mirror the empirics, I scale the profits $\pi_{i,t}$ from equation (1) to obtain an ROA- or ROE-like profitability measure:

\[ \tilde{\pi}_{i,t} = \frac{\pi_{i,t}}{P_i + N/F_{t-1}} = \mathcal{P}_{i,t} + (1 - \mathcal{P}_{i,t}) \frac{F_{t-1}}{F_t}. \tag{3} \]

This is scaled by the total number of customers because, in this simple model, the only source of “book” assets are firms’ customer bases. The overall sensitivity of $t = 1$ profits to competition is given by

\[ \frac{\partial \tilde{\pi}_{i,1}}{\partial (F_1/F_0)} = \frac{\mathcal{P}_{i,1} - 1}{(F_1/F_0)^2} < 0. \tag{4} \]

Because $\mathcal{P}_{i,1} \in [0,1)$, this is always negative. This is intuitive given the model set-up, where more competition does not affect sales from picky customers but reduces each firm’s share of non-picky customers. Then, the cross-derivative is simply

\[ \frac{\partial^2 \tilde{\pi}_{i,1}}{\partial (F_1/F_0)\, \partial \mathcal{P}_{i,1}} = \frac{1}{(F_1/F_0)^2}. \tag{5} \]

This is positive, the opposite sign of the direct effect of competition on profits, so firms with more picky customers will be less sensitive to changes in competition.

Hypothesis 1. Product market competition is negatively correlated with firm profitability. However, the profitability of firms with more picky customers is less sensitive to changes in competition.
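As a numerical illustration of Hypothesis 1, the following minimal Python sketch evaluates equations (1), (3), and (4) for firms with different endowments of picky customers. The function names and all parameter values (N, F0, F1, and the grid of P values) are hypothetical choices made only for this example; none of them come from the data.

```python
def picky_share(P, N, F_prev):
    """Proportion of picky customers: P / (P + N / F_prev)."""
    return P / (P + N / F_prev)


def profit(P, N, F_now):
    """Equation (1): picky customers plus an equal share of non-picky customers."""
    return P + N / F_now


def scaled_profit(share, growth):
    """Equation (3): share + (1 - share) / growth, with growth = F_1 / F_0."""
    return share + (1.0 - share) / growth


def sensitivity(share, growth):
    """Equation (4): derivative of scaled profit with respect to growth."""
    return (share - 1.0) / growth ** 2


# Hypothetical values: 1000 non-picky customers, firm count rising from 10 to 12.5.
N, F0, F1 = 1000.0, 10.0, 12.5
growth = F1 / F0

for P in (0.0, 50.0, 400.0):
    share = picky_share(P, N, F0)
    # Scaling equation (1) by the initial customer base should reproduce equation (3).
    direct = profit(P, N, F1) / (P + N / F0)
    print(
        f"P={P:6.1f}  share={share:.3f}  "
        f"scaled profit={scaled_profit(share, growth):.3f}  direct={direct:.3f}  "
        f"sensitivity={sensitivity(share, growth):+.3f}"
    )
```

In this sketch the sensitivity is negative for every firm and moves toward zero as the picky share rises, which is the pattern that equation (5) describes.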
The expected return of firms at $t = 0$ in this 1-period model is simply the expected profit at $t = 1$ divided by the discounted value of the profit, where $\pi_{i,1}$ denotes firm $i$’s profit:

\[ E[R_{i,1}] = \frac{E[\pi_{i,1}]}{E[M_1 \pi_{i,1}]}. \tag{6} \]

The effect of the proportion of picky customers, $\mathcal{P}_{i,1}$, on firm risk is given, up to a positive scaling factor, by:

\[ \frac{\partial E[R_{i,1}]}{\partial \mathcal{P}_{i,1}} \propto -b\, \mathrm{cov}\!\left( G(F_1/F_0),\; F_0/F_1 \right). \tag{7} \]

$G(\cdot)$ is an increasing function, so the covariance term is negative and the sign of the derivative depends on the sign of $b$. If $b$ is positive (positive risk premium for competition), then firms with a higher proportion of picky customers are riskier. If $b$ is negative, firms with a higher proportion of picky customers are less risky. The evidence in this paper is consistent with a positive risk premium for competition ($b > 0$), as I find firms with pickier customers have higher returns. Finding the sign of the premium is outside the scope of this paper, but my evidence is consistent with a number of other papers arguing that the premium is positive. I can summarize the prediction of this simple framework concerning firm risk as follows:

Hypothesis 2. If the competition risk premium is positive, then firms with more picky customers should have higher expected returns.

3. Data

This section describes the features of the Amazon review datasets, the matching and cleaning procedures connecting them to CRSP and Compustat, and the construction of the product market competition measures.

3.1 Amazon Review Datasets

The Amazon reviews dataset used in this paper is a union of the reviews data provided by Amazon Web Services (AWS) and by Prof. Julian McAuley from UCSD. The AWS data contains 107,368,073 reviews for 15,103,480 products, and Julian McAuley’s data contains 82,456,878 reviews for 9,857,242 products and 110,796 brands. The primary data in both are timestamped review texts and star ratings linked to products and reviewers. However, the datasets span different time periods and contain slightly different metadata.
The AWS data contains reviews from November 1995 to September 2015 and has explicit unique reviewer IDs. It contains each product’s ASIN (Amazon Standard Identification Number), used to uniquely identify products, but it does not have other product or brand metadata such as price or the brand name. Julian McAuley’s data contains reviews from May 1996 to July 2014. It has reviewer usernames, but not reviewer IDs. It also contains detailed product and brand metadata including the price of the product at the time the data was collected, the brand name, and product categorization (e.g. “Electronics > Cell Phones > Cell Phone Chargers”). I remove reviews for books, movies, and music and take the union of reviews in both datasets to obtain 80,156,983 reviews for 6,289,617 products and 106,785 brands in the combined data. [5]

[5] The full merging process is described in the appendix (Section A1).

After combining both datasets, I compute review lengths as the word count and use this as the measure of customer pickiness. As shown through survey evidence by Cheng et al. (2021), pickier customers with more specific tastes leave more detailed product feedback and write longer product descriptions of their ideal products. However, one issue with using review lengths is that average lengths may vary between different types of products in ways unrelated to customer pickiness. For example, more complicated products with larger feature sets may have longer reviews. To account for variation across product types, I adjust each review length by its product category median length. To do this, I first use the product category metadata to group all products into category-years that contain at least 5 products in the year. For each unique category-year, I start with the most granular categorization (e.g. “Electronics > Cell Phones > Cell Phone Chargers > iPhone Chargers” in 2010). If this category contains fewer than 5 products, I make the categorization one level coarser (e.g. “Electronics > Cell Phones > Cell Phone Chargers”). I repeat until the category has at least 5 products, or I reach the broadest categorization. Then, within each category-year, I compute the median review length in the category. Finally, I compute two category-adjusted review lengths for each review by dividing by the category median (division-adjusted) or subtracting the category median (subtraction-adjusted).

3.2 Matching

I hand-match the combined Amazon dataset with CRSP and Compustat by brand name. I start by using a list of CRSP/Compustat companies matchable to Amazon from Huang (2018), who also hand-matches Amazon brands to CRSP. In his study, he shows that Amazon review star ratings are incorporated into equity prices with a delay, so a strategy of buying firms with high ratings and shorting firms with low ratings earns an abnormal profit. For each company on this list, I manually search their company website for a list of brands owned by the company and find matching brands in the Amazon dataset.

I also use the USPTO trademark database to match brands to companies. I first perform an initial fuzzy string match between Amazon brand names and trademark names and between trademark owner names and CRSP company names. Then, I manually check each brand-company pair in the fuzzy-matched list for correctness.

Checking for correctness mainly entails distinguishing similar brand names on Amazon. I do this by manually checking each brand’s storefront on Amazon. For example, if both “Red Foods” and “Red Store” brands exist on Amazon, and “Red” is a trademark owned by a consumer food products firm, I check both storefronts on Amazon to see which one sells consumer food products. If multiple brands sell the same products on Amazon, I include all of them in the final matched dataset.

After merging to CRSP, I filter out firm-years with fewer than 5 reviews and only use reviews from 2004 through 2015 to remove noise from the dataset.
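The review-level processing up to this point (the Section 3.1 category-median adjustment followed by the firm-year filter just described) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the paper: the record layout (`firm`, `product`, `category`, `year`, `words`), the function names, and the toy data are all hypothetical, medians are taken over the reviews assigned to each category-year, and word counts are assumed to be positive so that the division is well defined.

```python
from collections import defaultdict
from statistics import median


def assign_category_years(reviews, min_products=5):
    """For each review, pick the finest (category prefix, year) with enough products."""
    products = defaultdict(set)  # (category prefix, year) -> distinct product ids
    for r in reviews:
        path = tuple(r["category"])
        for depth in range(1, len(path) + 1):
            products[(path[:depth], r["year"])].add(r["product"])
    keys = []
    for r in reviews:
        path = tuple(r["category"])
        depth = len(path)
        # Coarsen one level at a time until the category-year holds enough
        # products or the broadest categorization is reached.
        while depth > 1 and len(products[(path[:depth], r["year"])]) < min_products:
            depth -= 1
        keys.append((path[:depth], r["year"]))
    return keys


def adjust_lengths(reviews, min_products=5):
    """Attach division- and subtraction-adjusted lengths to every review."""
    keys = assign_category_years(reviews, min_products)
    lengths = defaultdict(list)
    for r, key in zip(reviews, keys):
        lengths[key].append(r["words"])
    medians = {key: median(vals) for key, vals in lengths.items()}
    return [
        {**r, "div_adj": r["words"] / medians[key], "sub_adj": r["words"] - medians[key]}
        for r, key in zip(reviews, keys)
    ]


def filter_firm_years(reviews, min_reviews=5, first_year=2004, last_year=2015):
    """Keep reviews inside the sample window whose firm-year has enough reviews."""
    in_window = [r for r in reviews if first_year <= r["year"] <= last_year]
    counts = defaultdict(int)
    for r in in_window:
        counts[(r["firm"], r["year"])] += 1
    return [r for r in in_window if counts[(r["firm"], r["year"])] >= min_reviews]


# Toy data: five charger products plus one product in a finer sub-category
# that is too small on its own and therefore falls back one level.
chargers = ("Electronics", "Cell Phones", "Cell Phone Chargers")
demo = [
    {"firm": "A", "product": f"p{k}", "category": chargers, "year": 2010, "words": w}
    for k, w in enumerate([10, 20, 30, 40, 50])
]
demo.append({"firm": "A", "product": "p5", "category": chargers + ("iPhone Chargers",),
             "year": 2010, "words": 60})

adjusted = adjust_lengths(demo)
kept = filter_firm_years(adjusted)
```

In the toy data, the single "iPhone Chargers" product is pooled with the five products one level up, so all six reviews share one category-year median.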
I begin the sample in 2004 because, as argued by Huang (2018), review data is much more reliable from 2004 onward, as (1) Amazon disallowed anonymous reviews and required a valid credit card on file in June 2004, and (2) there are very few reviews for products from publicly traded companies beforehand. The sample ends in 2015 due to data availability. I also implement standard asset pricing filters and only include common stocks trading on AMEX, NASDAQ, or NYSE. The hand-matching and filtering yields a final dataset of 4,608,424 reviews for 337 firms. As a point of comparison, Huang (2018) is able to match 346 firms.

3.3 Firm-Level Picky Customer Measure

I collapse the individual reviews’ (category-adjusted) lengths to the firm-year level by taking the mean review length for all reviews in each firm-year. Then, I use the standard Fama and French (1992) timing convention, and merge the average adjusted lengths of year $t-1$ with monthly returns in July of year $t$ through June of year $t+1$. [6] In cross-sectional regression analysis, I take logs of the mean adjusted review lengths as they are positive and right-skewed.

[6] The measure equal-weights all reviews, but, ideally, each review would be weighted by the relative importance of the reviewer or product to the company’s profits. However, as pointed out by Huang (2018), if the equal-weighted measure overlooks important product/reviewer heterogeneity, this should bias against finding results.

Table 1 displays summary statistics for the final sample and also compares it to the overall Compustat sample, which includes all firms without requiring a match to Amazon. Panel A shows summary statistics for the Amazon-matched sample, and panel B shows summary statistics for the overall Compustat sample. The Amazon sample contains, on average, 175.83 firms each year. As a point of comparison, the sample in Huang (2018) contains around 150 unique firms in each month. The adjusted length measures also show that reviews for the matched firms are longer than their category median lengths. More specifically, they are 40% longer, or 9.77 words longer on average, than the category median. The Amazon sample firms are much larger than the average Compustat firm in terms of total assets (17,305.45 Amazon mean vs. 6,204.66 Compustat mean) and market capitalization (22,350.11 Amazon mean vs. 4,224.96 Compustat mean). Finally, they tend to be more growth-oriented, with an average book-to-market ratio of 0.57 compared to the overall Compustat average of 0.75.

3.4 Product Market Competition Measures

I use two separate measures of product market competition: (1) product market textual centrality from Hoberg and Phillips (2016), and (2) industry markups from Bustamante and Donangelo (2017). For both measures, I construct firm-level measures of exposure to product market competition as well as macroeconomic time series measuring the overall level of product market competition in the economy.

For each year, Hoberg and Phillips (2016) construct a similarity matrix by computing all pairwise cosine similarities of the text of firms’ Form 10-K business descriptions. High similarity between two firms indicates that they sell similar products or services. Hoberg and Phillips (2016) use their similarity measures to construct new industry classifications and show that they outperform standard classifications like SIC in terms of explaining between-industry variance in profitability, sales growth, and risk. Thus, these measures more accurately capture which firms are competing in similar product markets and also quantify how closely firms compete.

I collapse the firm-firm-year similarity matrices into firm-year vectors to capture the overall product similarity of each firm to the entire economy.
This measure distinguishes firms that sell highly differentiated products (low overall similarity to the rest of the economy) from firms that sell relatively commoditized products (high overall similarity to the rest of the economy). I do this by computing the eigenvector centrality of each similarity matrix, i.e. the leading eigenvector of each matrix.[7] Firms with higher eigencentrality have more similar product descriptions to all other firms and thus sell commodity products. Meanwhile, firms with low eigencentrality sell more differentiated products.

Firms that sell commodity products are more vulnerable to product market competition, as they operate in product markets with lower barriers to entry. Therefore, the centrality of the firm is another proxy for its vulnerability to product market competition that should have the opposite effect of having picky customers: firms with higher centrality are more vulnerable to competition, while firms with more picky customers are less vulnerable to competition. In section 4.3, I will show evidence consistent with this hypothesis.

Following Bustamante and Donangelo (2017), I compute industry markups in year t as total industry sales minus total industry COGS, all divided by total industry sales, for each SIC 4-digit code. I assign the overall industry markup to each firm inside the industry to create a firm-year measure. A high markup indicates a lower competition environment, while a low markup indicates high competition. Bustamante and Donangelo (2017) also compute other competition proxies measuring industry concentration like HHI, but these other measures require Census data to construct accurately, as Ali et al. (2009) show that only using public firm data in Compustat incorrectly ignores private companies. However, Bustamante and Donangelo (2017) find that the industry markup proxy does not suffer from the same problem.
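The two firm-level competition measures described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in the thesis: the leading eigenvector is approximated by plain power iteration (the text does not say which algorithm is used), the 3-firm similarity matrix and the sales and COGS figures are made up, and the function names `eigencentrality` and `industry_markup` are hypothetical.

```python
def eigencentrality(sim, iters=200):
    """Approximate the leading eigenvector of a nonnegative similarity
    matrix by power iteration, normalized so the entries sum to one."""
    n = len(sim)
    v = [1.0 / n] * n
    for _ in range(iters):
        w = [sum(sim[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) for x in w)
        v = [x / norm for x in w]
    return v


def industry_markup(sales, cogs):
    """(total industry sales - total industry COGS) / total industry sales."""
    total_sales = sum(sales)
    return (total_sales - sum(cogs)) / total_sales


# Toy data (hypothetical): firms 0 and 1 have very similar product text,
# while firm 2 is differentiated from both.
similarity = [
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.1],
    [0.1, 0.1, 0.0],
]
centrality = eigencentrality(similarity)

# Two firms in one hypothetical SIC 4-digit industry.
markup = industry_markup(sales=[100.0, 50.0], cogs=[60.0, 30.0])
```

In the toy matrix, the differentiated firm 2 receives the lowest centrality, and the industry markup is (150 - 90) / 150 = 0.4, which would be assigned to every firm in that industry-year.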
More specifically, they find that the Compustat-only industry markup is 0.64 correlated with the industry markup including both private and public firms, while the Compustat-only HHI is only 0.09 correlated with the HHI including both private and public firms.

[7] Eigenvector centrality is conceptually similar to the stationary distribution of Markov processes. High centrality indicates that the firm is “more connected” to other firms, while a high stationary probability indicates a state is well-connected to other states in the Markov process. See Borgatti (2005) for a discussion of various centrality measures.

For each measure, I also compute macroeconomic time series capturing the overall level of product market competition in the economy by computing the cross-sectional means of the firm-level measures in each year. If the mean centrality is high, firms are, on average, less differentiated from each other, and the overall level of product market competition should be higher. In contrast, when the mean centrality is low, firms are, on average, more differentiated from each other, and the overall level of product market competition should be lower. Similarly, high mean industry markups indicate low competition and low mean industry markups indicate high competition.

It is difficult to directly confirm whether each time series captures overall product market competition. However, if both are proxies for overall product market competition in the economy, they should do so in opposite directions and should be negatively correlated. Figure 1 shows that this is indeed the case. The measures have a correlation of -0.96 when unadjusted and -0.86 after removing quadratic time trends.

4. Results

4.1 Verifying the Measure

I first verify a few properties of the review length measure: (1) adjusted review length correlates with other indicators of customer pickiness, (2) it does not correlate with other product characteristics, and (3) it predicts customer loyalty.

Figure 2 plots the difference between the number of brands purchased by customers who write long reviews versus customers who write short reviews. If long reviews are indicative of customer pickiness, then customers who write long reviews should purchase fewer unique brands, on average. To construct the plot, I first account for the mechanical relationship between number of reviews left and number of brands purchased by comparing long vs. short reviewers among customers who write the same number of reviews. For each number of reviews left, I sort customers by average division-adjusted review lengths into quintiles. Then, I compute the difference between the log mean number of brands purchased in the top review length quintile and the bottom quintile for each number of reviews left. The plot shows that, on average, customers who write longer reviews purchase a smaller number of brands compared with customers who write shorter reviews and have the same total number of reviews. The difference is less noticeable for customers who write a small number of reviews, as they may have only left reviews for a small fraction of their total purchases. However, for customers who leave many reviews (and thus should have a larger proportion of their purchasing behavior reflected in their reviewing behavior), the difference is stark, with long-reviewers purchasing significantly fewer brands than short-reviewers.

Figure 3 plots the relationship between the log division-adjusted review lengths and various review and product characteristics for the 4,608,424 reviews in the matched review dataset.
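The quintile comparison behind Figure 2 can be sketched roughly as follows. The customer records, field names (`n_reviews`, `avg_adj_len`, `n_brands`), and the function name are hypothetical; the sketch only illustrates the within-group sort and the top-minus-bottom difference in log mean brands, under the assumption that quintiles are formed by simple rank cutoffs.

```python
import math
from collections import defaultdict


def top_minus_bottom_log_brands(customers, n_bins=5):
    """For each number of reviews, sort customers by average adjusted review
    length and return log(mean brands, top bin) - log(mean brands, bottom bin)."""
    groups = defaultdict(list)
    for c in customers:
        groups[c["n_reviews"]].append(c)

    gaps = {}
    for n_reviews, members in groups.items():
        size = len(members) // n_bins
        if size == 0:
            continue  # too few customers to form the bins
        ranked = sorted(members, key=lambda c: c["avg_adj_len"])
        bottom, top = ranked[:size], ranked[-size:]
        mean_top = sum(c["n_brands"] for c in top) / len(top)
        mean_bottom = sum(c["n_brands"] for c in bottom) / len(bottom)
        gaps[n_reviews] = math.log(mean_top) - math.log(mean_bottom)
    return gaps


# Toy data (hypothetical): ten customers who each wrote 10 reviews; those who
# write longer reviews have reviewed fewer distinct brands.
customers = [
    {"n_reviews": 10, "avg_adj_len": float(i + 1), "n_brands": 10 - i}
    for i in range(10)
]
gaps = top_minus_bottom_log_brands(customers)
```

A negative value for a given number of reviews means that the long-review customers in that group reviewed fewer distinct brands than the short-review customers.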
The adjusted length is positively related to the number of helpful votes received by the review and negatively related to the review’s star rating. This is consistent with the findings of Cheng et al. (2021), who find that picky customers leave more informative feedback (more helpful reviews) and more critical feedback (lower star ratings). The adjusted length is also not monotonically related to the product price or the length of the product descriptions. This indicates that the adjusted reviews are not uniformly longer or shorter for more expensive or more complicated products.

I also establish that the picky customer measure predicts customer loyalty. If purchasing activity were observable, the ideal test would check whether picky customers are more likely to make repeat purchases of the same brands. In other words, if a customer leaves a long review for a brand, they are a picky customer and should be more likely to purchase from the same brand in the future. I only observe reviews, so I use repeat reviews as a proxy for repeat purchases, as repeat reviews are likely due to repeat purchases. Many repeat purchasers will not leave a repeat review, so measuring repeat reviews may understate the true amount of repeat purchasing. Therefore, using repeat reviews may actually give conservative measurements of customer loyalty.

Table 2 shows panel regressions of year-ahead repeat review measures on division-adjusted review lengths using the entire combined reviews dataset, including reviews that are not matched to CRSP firms. The dependent variables are repeat review measures for each brand-year that are created by tracking the future reviewing behavior of all customers that leave a review for the brand in year t. F1 Has Loyal is a dummy for whether at least one of the reviewers in year t also leaves a review for the same brand in year t+1. F1 Has Excl.
Loyal is a dummy for whether at least one of the reviewers in year t leaves a review for the same brand and also does not leave a review for any other brand in year t+1. F1 Perc. Loyal is the proportion of reviewers in year t that leave a review for the same brand in year t+1. F1 Perc. Excl. Loyal is the proportion of reviewers in year t that leave a review for the same brand and also do not leave a review for any other brand in year t+1. I include the “exclusive loyalty” variables to account for the possibility that customers who write long reviews may be more active reviewers that write more reviews in general rather than being loyal to specific brands. The key independent variable is the year t mean division-adjusted review length converted into percentile rankings. The regressions also include control variables for average division-adjusted product prices and star ratings, the brand-year’s number of reviews, number of products, and number of unique reviewers. All regressions also contain brand and year fixed effects, and standard errors are clustered by brand and year. Finally, all dependent variables are multiplied by 100 to aid interpretation.

The key result is that the adjusted review length positively predicts all measures of year-ahead repeat reviewing, indicating that brands with pickier customers tend to have more customer loyalty in the future. The magnitudes are also economically significant. Columns (1) and (2) show that brands in the top percentile of average adjusted length have a 2.08% to 2.29% higher probability of having a loyal reviewer in year t+1 compared to the firms in the lowest percentile of length. The overall sample mean of the “has loyal” dummy is 11.10%. Columns (3) and (4) show similar results for exclusive loyalty. The top percentile is 1.64% to 1.85% more likely to have an exclusively loyal reviewer than the lowest percentile, compared to an overall sample mean of 10.32%.
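To make the four brand-year loyalty variables concrete, the sketch below computes them from a toy list of (reviewer, brand, year) records. It is only an illustration under assumed data structures; the function name `loyalty_measures`, the dictionary keys, and the records themselves are hypothetical. As in the text, each measure is multiplied by 100.

```python
from collections import defaultdict


def loyalty_measures(reviews):
    """Compute year-ahead repeat-review measures for each (brand, year).

    `reviews` is an iterable of (reviewer, brand, year) tuples.
    """
    brands_by_reviewer_year = defaultdict(set)
    reviewers_by_brand_year = defaultdict(set)
    for reviewer, brand, year in reviews:
        brands_by_reviewer_year[(reviewer, year)].add(brand)
        reviewers_by_brand_year[(brand, year)].add(reviewer)

    out = {}
    for (brand, year), reviewers in reviewers_by_brand_year.items():
        # Loyal: reviews the same brand again in year t+1.
        loyal = [
            r for r in reviewers
            if brand in brands_by_reviewer_year.get((r, year + 1), set())
        ]
        # Exclusively loyal: reviews only that brand in year t+1.
        exclusive = [
            r for r in loyal if brands_by_reviewer_year[(r, year + 1)] == {brand}
        ]
        n = len(reviewers)
        out[(brand, year)] = {
            "has_loyal": 100 * int(bool(loyal)),
            "has_excl_loyal": 100 * int(bool(exclusive)),
            "perc_loyal": 100 * len(loyal) / n,
            "perc_excl_loyal": 100 * len(exclusive) / n,
        }
    return out


# Toy data (hypothetical): reviewers a, b, c review brand X in 2010.
# In 2011, a reviews only X, b reviews X and Y, and c reviews nothing.
toy = [
    ("a", "X", 2010), ("b", "X", 2010), ("c", "X", 2010),
    ("a", "X", 2011), ("b", "X", 2011), ("b", "Y", 2011),
]
measures = loyalty_measures(toy)
```

For brand X in 2010, two of three reviewers are loyal and one of three is exclusively loyal, which shows how the exclusive variables screen out reviewers who are simply active across many brands.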
Columns (5) and (6) show that top percentile firms have a 0.14% to 0.15% higher proportion of loyal reviewers than the lowest percentile, compared to an overall mean of 0.34% for the proportion of loyal reviewers. Columns (7) and (8) show that top percentile firms have a 0.10% higher proportion of exclusively loyal reviewers than the lowest percentile, compared to an overall mean of 0.29% for the proportion of exclusively loyal reviewers.

The appendix contains a battery of robustness checks testing whether the brand loyalty results hold when using length tercile dummies instead of percentiles, subtraction-adjusted lengths instead of division-adjusted, and two- and three-year ahead repeat reviewer measures. The results are similar for all variations.

One drawback of the preceding loyalty analysis is that it is conducted at the brand-year granularity level. Thus, it shows that brands with pickier customers tend to have more loyalty. However, it does not directly show that picky customers themselves are more loyal. It is conceivable that the predictive power of picky customers on brand loyalty could be due to indirect effects, e.g., long and informative reviews reduce the information gathering costs of other customers and thus increase all customers’ propensity to buy the brand’s products.

To address this, I conduct a similar analysis at the brand-reviewer-year granularity level. This will show whether picky customers are themselves more likely to be repeat customers. If the loyalty effect is entirely indirect, then there should be no difference between reviewers who leave long reviews and other reviewers for the same brand.

Table 3 shows panel regressions of year-ahead repeat review measures on division-adjusted average review lengths at the brand-reviewer-year granularity level. The dependent variables are F1 Loyal, which tracks whether the reviewer at year t leaves another review for the same brand in year t+1, and F1 Excl.
Loyal, which tracks whether the reviewer at year t leaves another review for the same brand - and only that brand - in year t+1. The key independent variables are Len. Ptile, the average review length for the reviewer-brand-year in year t converted into percentiles, and Len. T2 and Len. T3, which are the same average review lengths converted into tercile dummies. All regressions contain brand and year fixed effects, and standard errors are clustered by brand and year. The results show that customers who leave longer reviews are more likely to be repeat reviewers, even for the same brand. Both the length percentile and tercile dummies significantly positively predict loyalty and exclusive loyalty. Moreover, the tercile results show that the average adjusted review length predicts monotonically higher loyalty. The results are also economically significant, as the overall means of F1 Loyal and F1 Excl. Loyal are 1.24% and 1.10%, respectively. Again, various robustness tests found in the appendix show the customer loyalty results hold when varying the adjustment procedure and the number of years in the future.

Tables 2 and 3 also show that the star rating is a consistent positive predictor of customer loyalty. This is unsurprising as customer satisfaction should drive repeat business, but it raises the concern of how pickiness and customer satisfaction interact to drive customer loyalty. In other words, linearly controlling for star rating could be inadequate. To address these concerns, I repeat the customer loyalty analysis subsampling by category-adjusted star quintiles. Table 4 shows that the loyalty results hold even within star quintiles. Again, robustness tests in the appendix show similar subsample results for subtraction-adjusted measures and for exclusive loyalty.

Finally, I verify that the category-adjustment procedure removes industry effects in the final asset pricing sample. There are two points that I will make.
First, the raw review lengths largely capture differences in product types that manifest in cross-industry variance. This means using raw review lengths would muddle customer characteristics with product/industry characteristics. Second, dividing by or subtracting the category median lengths removes most of the product-type and industry variance.

Table 5 reports R² and adjusted R² values from regressing the raw and adjusted firm-year review length measures on SIC2-year dummies (Panel A) and firm dummies (Panel B). The first row in Panel A shows that 61% of the variance in the raw unadjusted length is explained by industry-year effects. In contrast, industry-year effects only explain 28% and 22% of the variance of division- and subtraction-adjusted measures, respectively. The last row of panel A shows that the average of the category median review length for each firm-year is almost entirely explained by industry-year effects, with the regression of the average category median lengths on SIC2-year dummies yielding an R² of 0.86. These results show that adjusting the raw measure is necessary, as the majority of the variance of the raw lengths and almost all of the variance of the category median lengths are explained by industry effects. Second, the results show that the division- and subtraction-adjustment procedures remove most of the industry-related variance. This is especially stark when viewing the adjusted R² values, which are, respectively, 0.13 and 0.06 for the division- and subtraction-adjusted measures.

4.2 Asset Pricing Tests

In this section, I perform some standard asset pricing tests to show that the picky customer measure is positively correlated with expected returns. Namely, I perform time-series portfolio tests and cross-sectional Fama-MacBeth regression tests (Fama and MacBeth (1973)). Both show that firms with longer reviews earn higher expected returns.
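As a rough sketch of the time-series portfolio test introduced in this section, the code below forms one month's long-short return from a quartile sort on a signal (here, the adjusted review length), with value weighting inside each leg. The field names (`signal`, `mktcap`, `ret`), the function names, and the toy numbers are assumptions for illustration only. The simple 12x annualization is the convention implied by the figures reported in the thesis (for example, 0.47% monthly corresponds to 5.64% annualized).

```python
def value_weighted_return(group):
    """Market-cap-weighted average return of a list of stocks."""
    total_cap = sum(s["mktcap"] for s in group)
    return sum(s["mktcap"] * s["ret"] for s in group) / total_cap


def long_short_return(stocks, n_bins=4):
    """Return of buying the top signal bin and shorting the bottom bin."""
    ranked = sorted(stocks, key=lambda s: s["signal"])
    size = len(ranked) // n_bins
    if size == 0:
        raise ValueError("not enough stocks to form the requested bins")
    return value_weighted_return(ranked[-size:]) - value_weighted_return(ranked[:size])


def annualize_simple(monthly_pct):
    """Simple (non-compounded) annualization of a monthly percentage return."""
    return 12 * monthly_pct


# Toy cross-section (hypothetical) for a single month.
toy_stocks = [
    {"signal": 1, "mktcap": 100.0, "ret": 0.01},
    {"signal": 2, "mktcap": 300.0, "ret": 0.02},
    {"signal": 3, "mktcap": 150.0, "ret": 0.00},
    {"signal": 4, "mktcap": 250.0, "ret": 0.01},
    {"signal": 5, "mktcap": 120.0, "ret": 0.02},
    {"signal": 6, "mktcap": 180.0, "ret": 0.01},
    {"signal": 7, "mktcap": 200.0, "ret": 0.03},
    {"signal": 8, "mktcap": 200.0, "ret": 0.05},
]
spread = long_short_return(toy_stocks)
```

In the toy month, the short leg earns 1.75% and the long leg earns 4.00%, so the spread is 2.25%. A full test would repeat this each month after re-sorting every June and then average the resulting return series.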
Table 6 displays average monthly returns of a value-weighted portfolio that buys the quartile of stocks with the longest review lengths and shorts the quartile of stocks with the shortest review lengths from July 2005 to June 2017.[8] The long-short portfolio sorted by division-adjusted length has a mean monthly return of 0.47% (5.64% annualized), which is statistically significant at the 5% level. Moreover, the portfolio has positive monthly alphas ranging from 0.49% (5.88% annualized) to 0.65% (7.80% annualized) for a variety of factor models, indicating that the portfolio returns are not subsumed by standard risk factors.[9] All of the alphas are statistically significant at the 5% level except for the Carhart four factor model including the momentum factor, which is only significant at the 10% level. Long-short quartile portfolios sorted by the subtraction-adjusted length yield similar results. The mean monthly long-short return is 0.45% (5.40% annualized) and the factor model alphas range from 0.44% (5.28% annualized) to 0.52% (6.24% annualized). All alphas are significant at the 5% level except for the Fama-French five factor model, which is significant at the 10% level.

[8] The portfolio is constructed using the standard Fama-French timing convention. Each year t at the end of June, stocks are ranked based on the average length of their reviews from calendar year t-1. I sort stocks into quartiles rather than deciles due to the small cross-section of around 175 stocks, on average. Tercile and quintile sorting yields similar, albeit noisier, results.

[9] The models used are the CAPM, the Fama-French three factor model (Fama and French (1993)), the Carhart four factor model including momentum factor (Carhart (1997)), the Fama-French five factor model (Fama and French (2015)), and Lu Zhang’s q-factor model (Hou et al. (2015)).

Table 7 repeats the portfolio analysis with mean raw unadjusted lengths and product category median lengths.
If the category medians are related to product and/or industry effects and not related to customer pickiness, then the portfolios sorted by category medians should have no significant return spreads and those sorted by unadjusted reviews should have smaller spreads as well. The results are consistent with this hypothesis. The average returns for portfolios sorted by unadjusted review lengths are not significant at the 5% level, and neither are most of the factor model alphas, aside from the CAPM and q-factor model. Sorting by category medians yields statistically insignificant spreads and alphas, even at the 10% level.

I also perform cross-sectional regression tests, which allow me to control for a variety of fundamental and Amazon-specific firm characteristics. Moreover, cross-sectional regression results are not beholden to portfolio weighting and sorting choices. Table 8 shows the results for cross-sectional regressions of returns on log mean division-adjusted review lengths and subtraction-adjusted review lengths. Columns (1) and (2) show results for univariate regressions of returns on the length measures. Columns (3) and (4) add firm-level controls including log market cap, log book-to-market, ROE, asset growth, and R&D-to-market. Columns (5) and (6) are univariate regressions with industry-month dummies. Columns (7) and (8) contain both industry-month dummies and firm-level controls. The results for all specifications show that the review length measure positively predicts returns. A 100% increase in the division-adjusted review length predicts 0.74% to 0.88% higher monthly returns, and one additional word over the median length predicts 0.02% to 0.026% higher monthly returns. The results hold for both adjustment procedures, including/excluding controls, and including industry-month dummies.
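For readers unfamiliar with the Fama-MacBeth procedure, a stripped-down univariate version can be sketched as below: run one cross-sectional regression of returns on the length measure per month, then average the monthly slopes and compute a t-statistic from their time-series variation. This is a minimal sketch with made-up data and hypothetical function names; it omits the controls, industry-month dummies, and any standard-error adjustments that the actual regressions may use.

```python
import math
from statistics import mean, stdev


def ols_slope(x, y):
    """Slope from a univariate OLS regression of y on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def fama_macbeth(panel):
    """`panel` maps a month label to a list of (signal, return) pairs.

    Returns the time-series average of the monthly cross-sectional slopes
    and its t-statistic (average divided by its standard error).
    """
    slopes = []
    for _, obs in sorted(panel.items()):
        x = [signal for signal, _ in obs]
        y = [ret for _, ret in obs]
        slopes.append(ols_slope(x, y))
    avg = mean(slopes)
    se = stdev(slopes) / math.sqrt(len(slopes))
    return avg, avg / se


# Toy panel (hypothetical): returns are exactly linear in the signal,
# with a slope of 2 in the first month and 1 in the second.
toy_panel = {
    "2005-07": [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)],
    "2005-08": [(0.0, 0.5), (1.0, 1.5), (2.0, 2.5)],
}
avg_slope, t_stat = fama_macbeth(toy_panel)
```

With a log signal, the average slope is read as the change in monthly return associated with a one-unit change in the log adjusted length, which is how the text interprets a 100% increase in the division-adjusted review length.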
In the appendix, tables A20, A21, and A22 show the results continue to hold when weighting the cross-sectional regressions by June market cap; including controls for product and review characteristics like helpful votes, star ratings, price, and product description lengths; and including controls for product turnover like the number and proportion of new products.

Table 9 shows that the category median length does not have predictive power for returns, reinforcing the need to perform the category adjustment with cross-sectional regression evidence. In this table, I repeat the cross-sectional regressions, replacing the adjusted length with the category median length. The only specifications that indicate predictive power for the category median are when weighting by market cap and including industry-month dummies. This is consistent with the product type and industry-related variance in review lengths being unrelated to customer pickiness, as including industry-month dummies is an indirect method of removing the category effects.

4.3 Product Market Competition

In this section, I explore the product market competition mechanism linking customer pickiness to firm risk. If firms with picky customers are indeed insulated from product market competition, then (1) their profitability should be less sensitive to increasing product market competition (i.e. their real beta with respect to proxies of product market competition should be lower than firms with less picky customers), and (2) the long-short returns of buying picky customer firms and shorting non-picky customer firms should be correlated with other portfolios/factors that are insulated from product market competition.

I perform real beta tests by regressing two measures of firm profitability - ROE and ROA - on the two macroeconomic time series proxying overall intensity of product market competition described in section 3.4.
The two proxies are the time series of the cross-sectional means of product similarity eigencentrality and of industry markups.[10] When the average product similarity is high in the economy, this indicates higher levels of competition. Inversely, low average industry markups are indicative of higher competition. The regression specification is as follows:

$$Prof_{i,t+1} = \beta\, Len_{i,t} + \gamma\, Comp_{t+1} + \delta\, (Len_{i,t} \times Comp_{t+1}) + F_i + \theta_1 t + \theta_2 t^2 + \varepsilon_{i,t}. \tag{8}$$

$Prof_{i,t+1}$ is firm profitability, $Len_{i,t}$ is the log mean division-adjusted length, $Comp_{t+1}$ is the macroeconomic time series proxy of product market competition, $F_i$ are firm fixed effects, and $t$ and $t^2$ control for quadratic trends.[11] Standard errors are clustered by firm and year.

[10] Both profitability and industry markups are conceptually similar, so there is a concern of finding a mechanical relationship between the two. However, there are two layers of data aggregation that mitigate this mechanical relationship. First, each firm is assigned its overall industry markup constructed from all the firms in the industry. Second, the macroeconomic time series in each year is the cross-sectional mean of firm-level industry markups.

[11] Table A23 gives similar results using subtraction-adjusted review lengths.

First, if higher competition decreases profitability, $\gamma$ should be negative when $Comp = EigCent$ (higher $EigCent$ means higher competition) and positive when $Comp = Markup$ (higher $Markup$ means lower competition). If this is true, then this indicates the measures are indeed proxying for competition in the expected directions. Second, if having picky customers reduces exposure to competition, then $\delta$ should be positive when $Comp = EigCent$ and negative when $Comp = Markup$.

Table 10 presents the real beta results. Columns (1), (2), (4), and (5) show results using the competition time series. Columns (3) and (6) replace competition with GDP growth to test if picky customers insulate firms from GDP growth shocks. First, we see that the coefficient describing the direct effect of competition on profits has the expected sign: lower centrality and higher markups are associated with higher profits. A 100% increase in centrality is correlated with a 1.04 (3.30) lower ROA (ROE), and a 100% increase in markups is correlated with a 1.70 (5.30) higher ROA (ROE). The coefficients for the interaction terms are also consistent with the hypothesis that pickier customers insulate firms from product market competition. The interaction term coefficients have opposite signs compared to the direct effect, indicating that firms with longer reviews have profits that are less sensitive to product market competition. The magnitudes of the interaction terms are also economically significant. The estimated sensitivity of a firm’s ROA to EigCent ($\partial ROA / \partial EigCent$) is -1.04 for a firm with review lengths equal to the category medians, on average. In contrast, a firm with reviews twice the length of category medians has an estimated sensitivity of -0.093, nearly zero. The magnitudes for markup sensitivity are similar: 1.70 for a category median firm, and 0.30 for a firm with reviews twice as long as the category median.

Finally, I test whether picky firms load positively on competition risk. To this end, I regress the returns of the long-short review length portfolio on CAPM, Fama-French three factor, and Fama-French five factor models where I have added an additional factor proxying for competition risk. Because there is not an agreed upon mimicking portfolio for competition risk in the literature, I construct two mimicking portfolios from the firm-level measures of product centrality and industry markups used previously. I construct both factors such that they both have positive loadings on competition risk.
Because firms with highly differentiated products are less vulnerable to product market competition, the centrality factor buys the lowest decile of firms by centrality (most differentiated products) and shorts the highest centrality decile (most commoditized products). Similarly, high markup industries should have high barriers to entry, so the markup factor buys the highest industry markup decile and shorts the lowest industry markup decile. Both of these variables, like having picky customers, should insulate firms from new competition. Therefore, firms with low centrality and high industry markups should also have relatively high profits during high competition regimes. If the review length portfolio positively loads on competition risk, then it should have positive loadings on these two other portfolios.

The results in table 11 are consistent with the hypothesis that picky firms load positively on competition risk. Columns (1), (2) and (3) regress division-adjusted review length portfolio returns on factor models including an industry markup sorted portfolio. The review length portfolio loads positively on industry markup portfolios even after controlling for CAPM and Fama-French factors. Columns (4), (5) and (6) display similar results for factor models including centrality-sorted portfolios. However, the other competition factors do not fully explain the review length portfolio’s returns. This could be because the review length portfolio loads on other factors, or it could be because the competition portfolios do not adequately capture product market competition risks. Table A24 in the appendix shows similar results for portfolios sorted on subtraction-adjusted review lengths.

5. Conclusion

In this paper, I study the impact of customer capital heterogeneity on firm risk and exposure to product market competition, highlighting an understudied type of intangible capital.
Namely, I investigate the effect of picky customers: customers with narrow product tastes. I propose that picky customers should also be more loyal customers, as it is more difficult for competitors to match their highly specific product preferences. Therefore, firms with picky customers should be insulated from product market competition.

I use hand-matched Amazon.com online product reviews data to construct a proxy for customer pickiness based on the length of reviews. As shown by prior literature in marketing and consumer psychology, the length of reviews, or, more generally, the amount of detail in product feedback, is a key characteristic of picky customers. I verify that customers who write long reviews are more loyal, and that firms with picky customers are less vulnerable to product market competition. Moreover, I find that picky customer firms have higher expected returns.

Another contribution of this paper is to provide additional evidence on the sign for the risk premium associated with product market competition. Currently in the literature, there are countervailing views, with some arguing that firms in more competitive industries are riskier due to higher operating leverage, and others arguing that competition tends to increase during prosperous periods of low marginal utility and increased ease of starting profitable firms. Because I find that picky customer firms are relatively more profitable in high competition regimes and have high returns, this paper’s evidence is consistent with a positive risk premium for product market competition.

More broadly, the interaction of strategic customers and firm behavior is an important but lightly explored topic in finance research. As there has already been much attention paid to nuances of “sell-side” intangible capital like skilled vs.
unskilled labor, key talent, and brand capital, extending the analysis to customers, the most prominent “buy-side” intangible capital, is a natural next step.

Bibliography

Ali, Ashiq, Sandy Klasa, and Eric Yeung, 2009, The limitations of industry concentration measures constructed with Compustat data: Implications for finance research, Rev. Financ. Stud. 22, 3839–3871.
Belo, Frederico, Jun Li, Xiaoji Lin, and Xiaofei Zhao, 2017, Labor-Force heterogeneity and asset prices: The importance of skilled labor, Rev. Financ. Stud. 30, 3669–3709.
Belo, Frederico, Xiaoji Lin, and Santiago Bazdresch, 2014a, Labor hiring, investment, and stock return predictability in the cross section, J. Polit. Econ. 122, 129–177.
Belo, Frederico, Xiaoji Lin, and Maria Ana Vitorino, 2014b, Brand capital and firm value, Rev. Econ. Dyn. 17, 150–169.
Bernstein, Fernando, and Victor Martínez-de Albéniz, 2017, Dynamic product rotation in the presence of strategic customers, Manage. Sci. 63, 2092–2107.
Borgatti, Stephen P, 2005, Centrality and network flow, Social Networks 27, 55–71.
Bustamante, M Cecilia, and Andres Donangelo, 2017, Product market competition and industry returns, Rev. Financ. Stud. 30, 4216–4266.
Cachon, Gérard P, Christian Terwiesch, and Yi Xu, 2005, Retail assortment planning in the presence of consumer search, M&SOM 7, 330–346.
Carhart, Mark M, 1997, On persistence in mutual fund performance, J. Finance 52, 57–82.
Cheng, Andong, Hans Baumgartner, and Margaret G Meloy, 2021, Identifying picky shoppers: Who they are and how to spot them, J. Consum. Psychol.
Dou, Winston Wei, Yan Ji, David Reibstein, and Wei Wu, 2021, Inalienable customer capital, corporate liquidity, and stock returns, J. Finance 76, 211–265.
Eisfeldt, Andrea L, and Dimitris Papanikolaou, 2013, Organization capital and the Cross-Section of expected returns, J. Finance 68, 1365–1406.
Fama, Eugene F, and Kenneth R French, 1992, The Cross-Section of expected stock returns, J. Finance 47, 427–465.
Fama, Eugene F, and Kenneth R French, 1993, Common risk factors in the returns on stocks and bonds, J. financ. econ. 33, 3–56.
Fama, Eugene F, and Kenneth R French, 2015, A five-factor asset pricing model, J. financ. econ. 116, 1–22.
Fama, Eugene F, and James D MacBeth, 1973, Risk, return, and equilibrium: Empirical tests, J. Polit. Econ. 81, 607–636.
Gourio, Francois, and Leena Rudanko, 2014, Customer capital, Review of Economic Studies.
He, Ruining, and Julian McAuley, 2016, Ups and downs: Modeling the visual evolution of fashion trends with One-Class collaborative filtering, in Proceedings of the 25th International Conference on World Wide Web, WWW ’16, 507–517 (International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE).
Hoberg, Gerard, and Gordon Phillips, 2016, Text-Based network industries and endogenous product differentiation, J. Polit. Econ. 124, 1423–1465.
Hou, Kewei, and David T Robinson, 2006, Industry concentration and average stock returns, J. Finance 61, 1927–1956.
Hou, Kewei, Chen Xue, and Lu Zhang, 2015, Digesting anomalies: An investment approach, Rev. Financ. Stud. 28, 650–705.
Huang, Jiekun, 2018, The customer knows best: The investment value of consumer opinions, J. financ. econ. 128, 164–182.
Kilic, Mete, 2017, Asset pricing implications of hiring demographics.
Lin, Xiaoji, Berardino Palazzo, and Fan Yang, 2020, The risks of old capital age: Asset pricing implications of technology adoption, J. Monet. Econ. 115, 145–161.
McAuley, Julian, Christopher Targett, Qinfeng Shi, and Anton van den Hengel, 2015, Image-Based recommendations on styles and substitutes, in Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’15, 43–52 (Association for Computing Machinery, New York, NY, USA).
Zhang, Miao Ben, 2019, Labor-technology substitution: Implications for asset pricing, J. Finance 74, 1793–1839.

Figure 1: Measures of Competition
This figure shows the time-series relationship of the raw eigencentrality and industry markup measures (top) and the time-series relationship of the detrended measures (bottom). Eigencentrality is denoted by the solid line and industry markup is denoted by the dotted line.

Figure 2: Picky vs Non-Picky Number of Brands
This figure shows the difference between the number of unique brands purchased by reviewers in the top quintile of adjusted review length compared to the number of unique brands purchased by customers in the bottom quintile. For each number of reviews, all customers are sorted by their average division-adjusted review length into quintiles. The number of unique brands purchased is averaged across customers within each (quintile, number of reviews) bucket. Each point on the plot is the average number of brands purchased by customers in the top quintile divided by the average number of brands purchased by customers in the bottom quintile for all customers who left the number of reviews stated on the x-axis.

Figure 3: Review Length and Review/Product Characteristics
This figure shows the relationship between category-adjusted review length and various review and product characteristics. The entire review sample is partitioned into 100 buckets by category-adjusted review length. Each point plots the mean of the characteristic in each bucket against the mean log division-adjusted review length in each bucket.

Table 1: Sample Comparison
Panel A reports the time-series averages of cross-sectional annual summary statistics for the sample of firms with valid Amazon reviews data used in the final analysis. Panel B reports the summary statistics for the entire CRSP-Compustat sample of firms. Both samples only include common stocks trading on NYSE/AMEX/NASDAQ from July 2005 to June 2017. Adj. Len. Divide (Subtract) is computed by category-adjusting each review length by dividing by (subtracting) its product-category median review length and then averaging the category-adjusted lengths for each firm-year. ME is market capitalization. BM is the book-to-market ratio. ROE is return-on-equity. AT is total book assets. AG is asset growth. See Appendix ?? for detailed variable definitions and sample selection criteria.

Panel A: Amazon Reviews Sample
                         Obs       Mean         SD     p25       p50        p75
Adj. Len. Divide      175.83       1.40       0.36    1.22      1.36       1.50
Adj. Len. Subtract    175.83       9.77       9.34    4.80      8.84      12.76
ME                    175.25  22,350.11  51,367.23  680.24  2,957.36  13,631.46
BM                    171.00       0.57       0.51    0.27      0.44       0.70
ROE                   171.75       0.15       0.47    0.06      0.14       0.22
AT                    175.83  17,305.45  56,631.59  686.54  2,423.45  10,643.60
AG                    175.83       0.09       0.27    0.03      0.04       0.12

Panel B: Full Sample
                         Obs       Mean         SD     p25       p50        p75
ME                  3,417.42   4,224.96  14,454.13  117.35    508.52   2,155.07
BM                  3,319.08       0.75       0.69    0.33      0.58       0.94
ROE                 3,441.33       0.03       0.25    0.02      0.02       0.06
AT                  3,442.00   6,204.66  24,570.61  169.13    691.69   2,724.04
AG                  3,441.58       0.12       0.37    0.03      0.05       0.16

Table 2: Brand Loyalty Percentiles: 1-Year Ahead
This table reports panel regressions of brand loyalty on review length. The unit of observation is brand-year. In columns (1)-(2), the dependent variable, F1 Has Loyal, is a dummy for whether the brand has at least one person who reviews in both year t and t+1. In columns (3)-(4), F1 Has Excl. Loyal is a dummy for whether the brand has at least one person who reviews in year t and year t+1 and no other brand in t+1. In columns (5)-(6), F1 Perc. Loyal is the percentage of reviewers in t who review the brand in t+1. In columns (7)-(8), F1 Perc. Excl. Loyal is the percentage of reviewers in t who review the brand in t+1 and no other brand in t+1. Len. Ptile is the mean category-adjusted review length per brand-year as a percentile. The control variables are other brand characteristics in year t. Lengths are category-adjusted by dividing by the category median length in year t.
Price is the mean category-adjusted product price, Stars is the mean category-adjusted star rating, # Rev. is the number of reviews, # Prod. is the number of unique products sold, and # Cust. is the number of unique reviewers. All regressions contain brand and year fixed effects and are clustered by brand and year.

F1 Has Loyal   F1 Has Excl. Loyal   F1 Perc. Loyal   F1 Perc. Excl. Loyal
(1) (2) (3) (4) (5) (6) (7) (8)
Len. Ptile 2.078 2.285 1.635 1.849 0.144 0.148 0.101 0.104
(0.402) (0.451) (0.360) (0.396) (0.025) (0.024) (0.021) (0.021)
Price 0.117 0.115 0.003 0.001
(0.097) (0.086) (0.006) (0.006)
Stars 1.537 1.492 0.115 0.101
(0.424) (0.396) (0.029) (0.028)
# Rev. 0.020 0.021 0.00002 0.0001
(0.007) (0.006) (0.0001) (0.0001)
# Prod. 0.335 0.334 0.005 0.004
(0.079) (0.076) (0.003) (0.002)
# Cust. 0.024 0.026 0.0002 0.0003
(0.007) (0.006) (0.0001) (0.0001)
N 223,988 223,565 223,988 223,565 223,988 223,565 223,988 223,565
Adj. R² 0.303 0.358 0.285 0.346 0.072 0.074 0.047 0.050

Table 3: Customer Loyalty: 1-Year Later
This table reports panel regressions of customer loyalty on review length. The unit of observation is reviewer-brand-year. In columns (1)-(4), the dependent variable, F1 Loyal, is a dummy for whether the reviewer leaves at least one review for the same brand in year t+1. In columns (5)-(8), F1 Excl. Loyal is a dummy for whether the reviewer reviews the same brand and does not review any other brand in year t+1. Len. Ptile is the mean category-adjusted review length per reviewer-brand-year in year t as a percentile. Len. T2 and Len. T3 are tercile dummies. Lengths are category-adjusted by dividing by the category median length in year t. The control variables are other brand characteristics in year t. Price is the mean category-adjusted product price, Stars is the mean category-adjusted star rating, # Rev. is the number of reviews, # Prod. is the number of products sold, and # Cust. is the number of unique reviewers.
All regressions contain brand and year fixed effects and are clustered by brand and year.

F1 Loyal   F1 Excl. Loyal
(1) (2) (3) (4) (5) (6) (7) (8)
Len. Ptile 1.295 0.909 1.054 0.760
(0.163) (0.121) (0.127) (0.097)
Len. T2 0.268 0.170 0.225 0.150
(0.024) (0.024) (0.019) (0.020)
Len. T3 0.809 0.563 0.660 0.472
(0.101) (0.076) (0.080) (0.062)
Price 0.024 0.025 0.017 0.018
(0.034) (0.034) (0.032) (0.032)
Stars 0.618 0.607 0.578 0.569
(0.118) (0.115) (0.105) (0.103)
# Rev. 0.123 0.124 0.097 0.097
(0.031) (0.031) (0.024) (0.024)
# Prod. 0.0001 0.0001 0.0001 0.0001
(0.0001) (0.0001) (0.0001) (0.0001)
# Cust. 0.00001 0.00001 0.00001 0.00001
(0.00001) (0.00001) (0.00001) (0.00001)
N 13,158,834 13,157,332 13,158,834 13,157,332 13,158,834 13,157,332 13,158,834 13,157,332
Adj. R² 0.012 0.020 0.011 0.020 0.009 0.015 0.009 0.015

Table 4: Customer Loyalty by Star Rating
This table reports panel regressions of customer loyalty on review length by star rating quintile. The unit of observation is reviewer-brand-year. Columns represent subsamples by mean category-adjusted star rating of the brand. The dependent variable, F1 Loyal, is a dummy for whether the reviewer leaves at least one review for the same brand in year t+1. Len. T2 and Len. T3 are tercile dummies of the brand’s mean category-adjusted review lengths in year t. Lengths are category-adjusted by dividing by the category median length in year t. The control variables are other brand characteristics in year t. Price is the mean category-adjusted product price, Stars is the mean category-adjusted star rating, # Rev. is the number of reviews, # Prod. is the number of unique products sold, and # Cust. is the number of unique reviewers. All regressions contain brand and year fixed effects and are clustered by brand and year.

F1 Loyal
(1) (2) (3) (4) (5)
Len. T2 0.058 0.230 0.183 0.202 0.173
(0.009) (0.059) (0.023) (0.037) (0.017)
Len. T3 0.275 0.805 0.515 0.659 0.523
(0.026) (0.133) (0.086) (0.146) (0.066)
Price 0.013 0.020 0.133 0.030 0.025
(0.029) (0.068) (0.069) (0.055) (0.050)
Stars 0.671 3.789 12.403 1.609
(0.154) (0.479) (0.015) (0.367)
# Rev. 0.102 0.152 0.112 0.123 0.115
(0.025) (0.036) (0.028) (0.039) (0.030)
# Prod. 0.0001 0.0002 0.0003 0.0001 0.0002
(0.0001) (0.0002) (0.0003) (0.0003) (0.0002)
# Cust. 0.00001 0.00001 0.00002 0.00002 0.00001
(0.00001) (0.00001) (0.00001) (0.00001) (0.00001)
N 2,630,024 2,631,994 2,631,789 2,631,989 2,631,536
Adj. R² 0.002 0.026 0.013 0.014 0.013

Table 5: Review Length Dummy Variable Analysis
This table reports the R² and adjusted R² from regressing various measures of review length on sets of dummy variables. Panel A reports the values from regressing the raw review lengths, the division and subtraction category-adjusted review lengths, and the category median review lengths on SIC2 by Year dummies. The two rightmost columns contain the number of dummy variables and average number of observations in each SIC2 by year group. Panel B contains the same statistics except using firm dummies.

Panel A: SIC2 x Year Dummies
                        R²  Adj. R²  # Groups  # Obs. in Group
Length                0.61     0.53       356             5.93
Adj. Len. Divide      0.28     0.13
Adj. Len. Subtract    0.22     0.06
Category Length       0.86     0.83

Panel B: Firm Dummies
                        R²  Adj. R²  # Groups  # Obs. in Group
Length                0.46     0.35       337             6.26
Adj. Len. Divide      0.53     0.44
Adj. Len. Subtract    0.51     0.42
Category Length       0.36     0.24

Table 6: Portfolio Sorting
This table reports average monthly stock returns for value-weighted quartile portfolios sorted by category-adjusted review length. In the top row, review lengths are adjusted by dividing by product category median lengths. In the bottom row, review lengths are adjusted by subtracting product category median lengths. Excess portfolio returns are shown in columns 1 through 4. Long-short returns are shown in column 5. The remaining columns show the long-short portfolio’s alphas from various factor models.
The sample spans July 2005 to June 2017. t-statistics are in parentheses and are computed using Newey-West standard errors with 4 month lags.

               Short       2       3    Long     L-S    CAPM     FF3     FF4     FF5      Q4
Divide          0.43    0.64    1.07    0.96    0.53    0.53    0.49    0.49    0.65    0.58
              (1.06)  (1.49)  (2.66)  (2.42)  (2.17)  (2.16)  (1.96)  (1.95)  (2.57)  (2.37)
Subtract        0.50    0.70    0.91    0.95    0.45    0.48    0.44    0.44    0.44    0.52
              (1.21)  (1.57)  (2.24)  (2.33)  (2.07)  (2.23)  (2.04)  (2.06)  (1.85)  (2.43)

Table 7: Portfolio Sorting: Alternatives
This table reports average monthly stock returns for value-weighted quartile portfolios sorted by unadjusted review length and the product category median review length. Excess portfolio returns are shown in columns 1 through 4. Long-short returns are shown in column 5. The remaining columns show the long-short portfolio’s alphas from various factor models. The sample spans July 2005 to June 2017. t-statistics are in parentheses and are computed using Newey-West standard errors with 4 month lags.

               Short       2       3    Long     L-S    CAPM     FF3     FF4     FF5      Q4
Unadjusted      0.54    0.67    0.75    1.01    0.47    0.48    0.42    0.42    0.39    0.51
              (1.27)  (1.54)  (2.01)  (2.49)  (1.91)  (1.97)  (1.78)  (1.77)  (1.56)  (2.18)
Cat. Median     0.58    0.68    0.63    1.06    0.48    0.47    0.40    0.40    0.26    0.47
              (1.31)  (1.62)  (1.95)  (2.33)  (1.46)  (1.35)  (1.24)  (1.22)  (0.83)  (1.30)

Table 8: Cross-Sectional Regressions
This table reports Fama-MacBeth cross-sectional regression results. Divide is the log of the mean division-adjusted review length where each length is adjusted by dividing by the category median length. Subtract is the mean subtraction-adjusted review length where each length is adjusted by subtracting the category median length. Controls include log market cap, log book-to-market, ROE, asset growth, and R&D-to-market. Columns (5) through (8) include SIC2-Month dummies. The sample spans July 2005 to June 2017. Newey-West standard errors with 4 months of lag are shown in parentheses.

(1) (2) (3) (4) (5) (6) (7) (8)
Constant 0.743 0.805 1.143 1.181 1.244 1.295 1.590 1.634
(0.541) (0.546) (0.990) (0.995) (1.112) (1.110) (1.493) (1.496)
Divide 0.789 0.824 0.743 0.876
(0.314) (0.343) (0.433) (0.418)
Subtract 0.024 0.026 0.020 0.022
(0.009) (0.009) (0.010) (0.009)
Controls No No Yes Yes No No Yes Yes
SIC2-Month No No No No Yes Yes Yes Yes
N 24,793 24,793 24,036 24,036 24,793 24,793 24,036 24,036

Table 9: Category Median Cross-Sectional Regressions
This table reports Fama-MacBeth cross-sectional regression results. Cat. Med. is the log of mean category median review length. Controls include log market cap, log book-to-market, ROE, asset growth, and R&D-to-market. Columns (3), (4), (7), and (8) include SIC2-Month dummies. Columns (5) through (8) are weighted by June market cap. The sample spans July 2005 to June 2017. Newey-West standard errors with 4 months of lag are shown in parentheses.

(1) (2) (3) (4) (5) (6) (7) (8)
Constant 0.031 0.115 0.682 0.907 2.974 0.724 4.324 2.109
(2.013) (2.096) (2.452) (2.618) (2.575) (2.297) (3.146) (3.205)
Cat. Med. 0.294 0.444 0.084 0.178 1.022 0.639 1.558 1.320
(0.506) (0.474) (0.499) (0.501) (0.741) (0.577) (0.722) (0.649)
Controls No Yes No Yes No Yes No Yes
SIC2-Month No No Yes Yes No No Yes Yes
Weighted No No No No Yes Yes Yes Yes
N 24,793 24,036 24,793 24,036 24,793 24,036 24,793 24,036

Table 10: Real Beta
This table reports panel regressions of profitability on macroeconomic measures of competition. ROA is firm-level net income divided by total assets in year t+1. ROE is firm-level net income divided by book equity in t+1 where book equity is defined following Fama and French (1992). Divide is log firm-level average category-adjusted review length (adjusted by dividing by product category median lengths). EigCent is the cross-sectional mean of all firms’ eigencentrality in t+1. Markup is the cross-sectional mean of all firms’ industry markup. GDP is percent growth of real GDP in t+1.
All regressions include firm fixed effects and control for quadratic time trends. The sample spans 2005 through 2017. Firm- and year-clustered standard errors are in parentheses.

ROA   ROE
(1) (2) (3) (4) (5) (6)
Divide 0.181 0.024 0.030 0.385 0.021 0.293
(0.057) (0.017) (0.027) (0.225) (0.064) (0.184)
EigCent 1.036 3.302
(0.268) (1.621)
Markup 0.017 0.053
(0.004) (0.027)
GDP 0.973 5.845
(0.292) (2.913)
Divide:EigCent 0.943 2.104
(0.281) (1.132)
Divide:Markup 0.014 0.024
(0.004) (0.019)
Divide:GDP 0.506 6.912
(0.653) (4.613)
N 2,074 2,074 2,074 2,020 2,020 2,020
Adj. R² 0.476 0.476 0.487 0.373 0.373 0.380

Table 11: Factor Loadings
This table reports factor loadings for monthly value-weighted long-short quartile portfolio returns sorted by average category-adjusted review length. MARK is a portfolio that buys firms in the highest decile by industry markup and shorts firms in the lowest decile by industry markup. HP is a portfolio that buys firms in the lowest decile of Hoberg-Phillips eigencentrality and shorts firms in the highest decile of eigencentrality. The other factors are the standard Fama-French five factors. Review lengths are adjusted by dividing by the product category median length. The sample spans July 2005 to June 2017. Newey-West standard errors with 4-month lags are reported in parentheses.

(1) (2) (3) (4) (5) (6)
Constant 0.496 0.454 0.582 0.465 0.435 0.620
(0.247) (0.246) (0.256) (0.256) (0.263) (0.256)
MARK 0.225 0.241 0.203
(0.090) (0.099) (0.101)
HP 0.118 0.125 0.169
(0.054) (0.061) (0.078)
MKT 0.057 0.006 0.038 0.053
(0.075) (0.077) (0.075) (0.070)
SMB 0.036 0.009
(0.145) (0.134)
HML 0.080 0.064
(0.181) (0.175)
RMW 0.331 0.509
(0.166) (0.189)
CMA 0.105 0.293
(0.260) (0.287)
N 144 144 144 144 144 144

Appendix

A1 Combining AWS and Julian McAuley Data

This section describes the procedure for taking the union of the AWS reviews data and Julian McAuley’s review data.
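
The fuzzy duplicate check that the rest of this section spells out can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the code used for the dissertation: the `Review` record and the function names are hypothetical, the edit distance is a plain Levenshtein implementation, and basing the 25% tolerance on the length of the AWS review (rather than the matched review) is an assumption, since the text does not say which review's length is used.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Review:
    # Hypothetical record layout; field names are illustrative only.
    asin: str
    stars: int
    day: date
    title: str
    body: str


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def is_candidate(aws: Review, jm: Review, window_days: int = 10) -> bool:
    """Same ASIN and star rating, with timestamps within the date window."""
    return (aws.asin == jm.asin
            and aws.stars == jm.stars
            and abs((aws.day - jm.day).days) <= window_days)


def is_duplicate(aws: Review, jm: Review) -> bool:
    """Exact match, or all three fuzzy flags true, for a candidate pair."""
    if not is_candidate(aws, jm):
        return False
    if aws.title == jm.title and aws.body == jm.body:
        return True
    # Assumption: the tolerance uses the AWS review's character count.
    tol = min(4, 0.25 * len(aws.body))
    title_flag = edit_distance(aws.title, jm.title) <= 3
    head_flag = edit_distance(aws.body[:20], jm.body[:20]) <= tol
    tail_flag = edit_distance(aws.body[-20:], jm.body[-20:]) <= tol
    return title_flag and head_flag and tail_flag
```

In such a sketch, an AWS review would be dropped whenever `is_duplicate` holds for at least one review in the other dataset, after which the two datasets are concatenated.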
I essentially perform an outer join on the datasets, but there are some idiosyncrasies that complicate the task. The primary difficulties are that the timestamps for the same review can differ by up to one week between the two datasets, and the review texts can differ slightly due to leading/trailing spaces, one-character corruptions, etc. To account for these issues, I implement a fuzzy-join procedure to find and remove reviews that are in both datasets. First, I match all reviews for the same ASIN and star rating with timestamps within 10 days of each other. Among these reviews, some have exactly matching review bodies and titles, so these are easily marked as duplicates. When the same review exists in both Julian’s data and the AWS data, I discard the AWS review and use Julian’s review. For the remaining reviews in AWS, I compute three flags. First, I flag the review if its title is within an edit distance of 3 characters or fewer of the title of any (rating, ASIN, date window) matched review in Julian’s data. Then, I flag the review if the edit distance between its first 20 characters and the first 20 characters of a matched review in Julian’s data is within min(4, 0.25 × (# characters in review)). I compute the same flag for the last 20 characters.¹² If all three flags are true, then I mark the review as a duplicate and remove it. After removing all duplicate reviews in the AWS data, I simply concatenate the two datasets.

¹² I use the first and last twenty characters instead of the entire review to reduce computation time.

A2 Supplementary Figures and Tables

Figure A1: Histogram of Helpful Votes Proportion
This figure shows the histogram of the proportion of helpful votes (number of helpful votes divided by total number of votes). The vertical axis denotes the proportion of the entire sample of reviews falling into each histogram bucket.

Table A1: Brand Loyalty Terciles: 1-Year Ahead
This table reports panel regressions of brand loyalty on review length. The unit of observation is brand-year.
In columns (1)-(2), the dependent variable, F1 Has Loyal, is a dummy for whether the brand has at least one person who reviews in both year t and t+1. In columns (3)-(4), F1 Has Excl. Loyal is a dummy for whether the brand has at least one person who reviews in year t and year t+1 and no other brand in t+1. In columns (5)-(6), F1 Perc. Loyal is the percentage of reviewers in t who review the brand in t+1. In columns (7)-(8), F1 Perc. Excl. Loyal is the percentage of reviewers in t who review the brand in t+1 and no other brand in t+1. Len. T2 and Len. T3 are tercile dummies of the brand’s mean category-adjusted review lengths in year t. Lengths are category-adjusted by dividing by the category median length in year t. The control variables are other brand characteristics in year t. Price is the mean category-adjusted product price, Stars is the mean category-adjusted star rating, # Rev. is the number of reviews, # Prod. is the number of unique products sold, and # Cust. is the number of unique reviewers. All regressions contain brand and year fixed effects and are clustered by brand and year.

F1 Has Loyal   F1 Has Excl. Loyal   F1 Perc. Loyal   F1 Perc. Excl. Loyal
(1) (2) (3) (4) (5) (6) (7) (8)
Len. T2 2.576 2.223 2.343 1.994 0.071 0.066 0.063 0.059
(0.468) (0.456) (0.460) (0.445) (0.014) (0.014) (0.014) (0.014)
Len. T3 1.785 1.839 1.479 1.538 0.100 0.102 0.073 0.074
(0.359) (0.376) (0.323) (0.332) (0.018) (0.017) (0.015) (0.014)
Price 0.110 0.109 0.002 0.0003
(0.096) (0.085) (0.006) (0.006)
Stars 1.584 1.539 0.115 0.101
(0.433) (0.405) (0.029) (0.028)
# Rev. 0.020 0.021 0.00003 0.0001
(0.007) (0.006) (0.0001) (0.0001)
# Prod. 0.333 0.333 0.005 0.004
(0.079) (0.076) (0.003) (0.002)
# Cust. 0.024 0.026 0.0002 0.0003
(0.007) (0.006) (0.0001) (0.0001)
N 223,988 223,565 223,988 223,565 223,988 223,565 223,988 223,565
Adj. R² 0.304 0.359 0.285 0.346 0.072 0.074 0.047 0.050

Table A2: Brand Loyalty Percentiles: 2-Years Ahead
This table reports panel regressions of brand loyalty on review length. The unit of observation is brand-year. In columns (1)-(2), the dependent variable, F2 Has Loyal, is a dummy for whether the brand has at least one person who reviews in both year t and t+2. In columns (3)-(4), F2 Has Excl. Loyal is a dummy for whether the brand has at least one person who reviews in year t and year t+2 and no other brand in t+2. In columns (5)-(6), F2 Perc. Loyal is the percentage of reviewers in t who review the brand in t+2. In columns (7)-(8), F2 Perc. Excl. Loyal is the percentage of reviewers in t who review the brand in t+2 and no other brand in t+2. Len. Ptile is the mean category-adjusted review length per brand-year as a percentile. Lengths are category-adjusted by dividing by the category median length in year t. The control variables are other brand characteristics in year t. Price is the mean category-adjusted product price, Stars is the mean category-adjusted star rating, # Rev. is the number of reviews, # Prod. is the number of unique products sold, and # Cust. is the number of unique reviewers. All regressions contain brand and year fixed effects and are clustered by brand and year.

F2 Has Loyal   F2 Has Excl. Loyal   F2 Perc. Loyal   F2 Perc. Excl. Loyal
(1) (2) (3) (4) (5) (6) (7) (8)
Len. Ptile 1.223 1.344 1.003 1.128 0.077 0.079 0.058 0.060
(0.229) (0.191) (0.193) (0.146) (0.013) (0.012) (0.010) (0.009)
Price 0.127 0.155 0.003 0.005
(0.104) (0.105) (0.005) (0.005)
Stars 0.521 0.519 0.022 0.019
(0.401) (0.450) (0.017) (0.015)
# Rev. 0.029 0.017 0.0001 0.0001
(0.006) (0.006) (0.0002) (0.0001)
# Prod. 0.339 0.321 0.004 0.004
(0.068) (0.061) (0.002) (0.002)
# Cust. 0.059 0.051 0.00005 0.00004
(0.007) (0.006) (0.0002) (0.0001)
N 159,188 158,902 159,188 158,902 159,188 158,902 159,188 158,902
Adj. R² 0.221 0.313 0.203 0.299 0.013 0.016 0.001 0.004

Table A3: Brand Loyalty Terciles: 2-Years Ahead
This table reports panel regressions of brand loyalty on review length. The unit of observation is brand-year. In columns (1)-(2), the dependent variable, F2 Has Loyal, is a dummy for whether the brand has at least one person who reviews in both year t and t+2. In columns (3)-(4), F2 Has Excl. Loyal is a dummy for whether the brand has at least one person who reviews in year t and year t+2 and no other brand in t+2. In columns (5)-(6), F2 Perc. Loyal is the percentage of reviewers in t who review the brand in t+2. In columns (7)-(8), F2 Perc. Excl. Loyal is the percentage of reviewers in t who review the brand in t+2 and no other brand in t+2. Len. T2 and Len. T3 are tercile dummies of the brand’s mean category-adjusted review lengths in year t. Lengths are category-adjusted by dividing by the category median length in year t. The control variables are other brand characteristics in year t. Price is the mean category-adjusted product price, Stars is the mean category-adjusted star rating, # Rev. is the number of reviews, # Prod. is the number of unique products sold, and # Cust. is the number of unique reviewers. All regressions contain brand and year fixed effects and are clustered by brand and year.

F2 Has Loyal   F2 Has Excl. Loyal   F2 Perc. Loyal   F2 Perc. Excl. Loyal
(1) (2) (3) (4) (5) (6) (7) (8)
Len. T2 1.432 0.905 1.364 0.863 0.036 0.030 0.034 0.029
(0.291) (0.210) (0.279) (0.196) (0.010) (0.009) (0.009) (0.008)
Len. T3 0.961 0.992 0.824 0.860 0.052 0.052 0.042 0.042
(0.197) (0.177) (0.193) (0.171) (0.010) (0.010) (0.008) (0.008)
Price 0.123 0.152 0.003 0.005
(0.104) (0.105) (0.005) (0.005)
Stars 0.530 0.531 0.022 0.019
(0.406) (0.455) (0.017) (0.015)
# Rev. 0.029 0.017 0.0001 0.0001
(0.006) (0.006) (0.0002) (0.0001)
# Prod. 0.338 0.320 0.004 0.004
(0.068) (0.060) (0.002) (0.002)
# Cust. 0.059 0.051 0.0001 0.00004
(0.007) (0.006) (0.0002) (0.0001)
N 159,188 158,902 159,188 158,902 159,188 158,902 159,188 158,902
Adj. R² 0.222 0.313 0.203 0.299 0.013 0.016 0.002 0.004

Table A4: Brand Loyalty Percentiles: 3-Years Ahead
This table reports panel regressions of brand loyalty on review length. The unit of observation is brand-year. In columns (1)-(2), the dependent variable, F3 Has Loyal, is a dummy for whether the brand has at least one person who reviews in both year t and t+3. In columns (3)-(4), F3 Has Excl. Loyal is a dummy for whether the brand has at least one person who reviews in year t and year t+3 and no other brand in t+3. In columns (5)-(6), F3 Perc. Loyal is the percentage of reviewers in t who review the brand in t+3. In columns (7)-(8), F3 Perc. Excl. Loyal is the percentage of reviewers in t who review the brand in t+3 and no other brand in t+3. Len. Ptile is the mean category-adjusted review length per brand-year as a percentile. Lengths are category-adjusted by dividing by the category median length in year t. The control variables are other brand characteristics in year t. Price is the mean category-adjusted product price, Stars is the mean category-adjusted star rating, # Rev. is the number of reviews, # Prod. is the number of unique products sold, and # Cust. is the number of unique reviewers. All regressions contain brand and year fixed effects and are clustered by brand and year.

F3 Has Loyal   F3 Has Excl. Loyal   F3 Perc. Loyal   F3 Perc. Excl. Loyal
(1) (2) (3) (4) (5) (6) (7) (8)
Len. Ptile 0.480 0.430 0.392 0.348 0.037 0.036 0.027 0.027
(0.201) (0.164) (0.170) (0.165) (0.015) (0.015) (0.011) (0.010)
Price 0.001 0.012 0.001 0.001
(0.070) (0.076) (0.003) (0.003)
Stars 0.378 0.296 0.042 0.034
(0.230) (0.243) (0.014) (0.015)
# Rev. 0.002 0.007 0.0001 0.0001
(0.010) (0.010) (0.0003) (0.0002)
# Prod. 0.413 0.375 0.005 0.005
(0.101) (0.097) (0.003) (0.002)
# Cust. 0.043 0.038 0.0004 0.0003
(0.013) (0.007) (0.0003) (0.0002)
N 111,382 111,186 111,382 111,186 111,382 111,186 111,382 111,186
Adj. R² 0.169 0.271 0.151 0.255 0.046 0.042 0.054 0.050

Table A5: Brand Loyalty Terciles: 3-Years Ahead
This table reports panel regressions of brand loyalty on review length. The unit of observation is brand-year. In columns (1)-(2), the dependent variable, F3 Has Loyal, is a dummy for whether the brand has at least one person who reviews in both year t and t+3. In columns (3)-(4), F3 Has Excl. Loyal is a dummy for whether the brand has at least one person who reviews in year t and year t+3 and no other brand in t+3. In columns (5)-(6), F3 Perc. Loyal is the percentage of reviewers in t who review the brand in t+3. In columns (7)-(8), F3 Perc. Excl. Loyal is the percentage of reviewers in t who review the brand in t+3 and no other brand in t+3. Len. T2 and Len. T3 are tercile dummies of the brand’s mean category-adjusted review lengths in year t. Lengths are category-adjusted by dividing by the category median length in year t. The control variables are other brand characteristics in year t. Price is the mean category-adjusted product price, Stars is the mean category-adjusted star rating, # Rev. is the number of reviews, # Prod. is the number of unique products sold, and # Cust. is the number of unique reviewers. All regressions contain brand and year fixed effects and are clustered by brand and year.

F3 Has Loyal   F3 Has Excl. Loyal   F3 Perc. Loyal   F3 Perc. Excl. Loyal
(1) (2) (3) (4) (5) (6) (7) (8)
Len. T2 1.090 0.449 1.021 0.417 0.023 0.018 0.021 0.016
(0.249) (0.282) (0.228) (0.267) (0.008) (0.009) (0.006) (0.007)
Len. T3 0.358 0.282 0.298 0.230 0.025 0.024 0.020 0.019
(0.148) (0.100) (0.127) (0.090) (0.010) (0.010) (0.007) (0.007)
Price 0.003 0.014 0.001 0.001
(0.071) (0.077) (0.003) (0.003)
Stars 0.382 0.300 0.042 0.034
(0.232) (0.245) (0.014) (0.015)
# Rev. 0.001 0.007 0.0001 0.0001
(0.010) (0.010) (0.0003) (0.0002)
# Prod. 0.412 0.374 0.005 0.004
(0.102) (0.097) (0.003) (0.002)
# Cust. 0.043 0.038 0.0004 0.0003
(0.013) (0.007) (0.0003) (0.0002)
N 111,382 111,186 111,382 111,186 111,382 111,186 111,382 111,186
Adj. R² 0.170 0.271 0.152 0.255 0.046 0.042 0.054 0.050

Table A6: Subtract Brand Loyalty Percentiles: 1-Year Ahead
This table reports panel regressions of brand loyalty on review length. The unit of observation is brand-year. In columns (1)-(2), the dependent variable, F1 Has Loyal, is a dummy for whether the brand has at least one person who reviews in both year t and t+1. In columns (3)-(4), F1 Has Excl. Loyal is a dummy for whether the brand has at least one person who reviews in year t and year t+1 and no other brand in t+1. In columns (5)-(6), F1 Perc. Loyal is the percentage of reviewers in t who review the brand in t+1. In columns (7)-(8), F1 Perc. Excl. Loyal is the percentage of reviewers in t who review the brand in t+1 and no other brand in t+1. Len. Ptile is the mean category-adjusted review length per brand-year as a percentile. The control variables are other brand characteristics in year t. Lengths are category-adjusted by subtracting the category median length in year t. Price is the mean category-adjusted product price, Stars is the mean category-adjusted star rating, # Rev. is the number of reviews, # Prod. is the number of unique products sold, and # Cust. is the number of unique reviewers. All regressions contain brand and year fixed effects and are clustered by brand and year.

F1 Has Loyal   F1 Has Excl. Loyal   F1 Perc. Loyal   F1 Perc. Excl. Loyal
(1) (2) (3) (4) (5) (6) (7) (8)
Len. Ptile 2.594 2.718 2.145 2.269 0.153 0.159 0.109 0.113
(0.485) (0.506) (0.447) (0.455) (0.025) (0.025) (0.022) (0.021)
Price 0.002 0.002 0.0002 0.0001
(0.002) (0.002) (0.0001) (0.0001)
Stars 0.372 0.358 0.027 0.023
(0.105) (0.101) (0.006) (0.006)
# Prod. 0.326 0.325 0.005 0.004
(0.078) (0.075) (0.003) (0.002)
# Cust. 0.006 0.007 0.0002 0.0002
(0.002) (0.002) (0.0001) (0.0001)
N 223,988 223,565 223,988 223,565 223,988 223,565 223,988 223,565
Adj. R² 0.304 0.358 0.285 0.345 0.072 0.074 0.047 0.050

Table A7: Subtract Brand Loyalty Terciles: 1-Year Ahead
This table reports panel regressions of brand loyalty on review length. The unit of observation is brand-year. In columns (1)-(2), the dependent variable, F1 Has Loyal, is a dummy for whether the brand has at least one person who reviews in both year t and t+1. In columns (3)-(4), F1 Has Excl. Loyal is a dummy for whether the brand has at least one person who reviews in year t and year t+1 and no other brand in t+1. In columns (5)-(6), F1 Perc. Loyal is the percentage of reviewers in t who review the brand in t+1. In columns (7)-(8), F1 Perc. Excl. Loyal is the percentage of reviewers in t who review the brand in t+1 and no other brand in t+1. Len. T2 and Len. T3 are tercile dummies of the brand’s mean category-adjusted review lengths in year t. Lengths are category-adjusted by subtracting the category median length in year t. The control variables are other brand characteristics in year t. Price is the mean category-adjusted product price, Stars is the mean category-adjusted star rating, # Rev. is the number of reviews, # Prod. is the number of unique products sold, and # Cust. is the number of unique reviewers. All regressions contain brand and year fixed effects and are clustered by brand and year.

F1 Has Loyal   F1 Has Excl. Loyal   F1 Perc. Loyal   F1 Perc. Excl. Loyal
(1) (2) (3) (4) (5) (6) (7) (8)
Len. T2 2.292 2.114 2.069 1.900 0.066 0.062 0.059 0.055
(0.441) (0.458) (0.436) (0.449) (0.014) (0.014) (0.014) (0.014)
Len. T3 2.065 2.035 1.781 1.750 0.106 0.108 0.079 0.080
(0.365) (0.366) (0.338) (0.331) (0.016) (0.016) (0.015) (0.014)
Price 0.002 0.002 0.0002 0.0001
(0.002) (0.002) (0.0001) (0.0001)
Stars 0.379 0.366 0.026 0.023
(0.106) (0.102) (0.006) (0.006)
# Prod. 0.325 0.323 0.005 0.004
(0.078) (0.075) (0.003) (0.002)
# Cust. 0.006 0.007 0.0002 0.0002
(0.002) (0.002) (0.0001) (0.0001)
N 223,988 223,565 223,988 223,565 223,988 223,565 223,988 223,565
Adj. R² 0.304 0.358 0.285 0.345 0.072 0.074 0.047 0.050

Table A8: Subtract Brand Loyalty Percentiles: 2-Years Ahead
This table reports panel regressions of brand loyalty on review length. The unit of observation is brand-year. In columns (1)-(2), the dependent variable, F2 Has Loyal, is a dummy for whether the brand has at least one person who reviews in both year t and t+2. In columns (3)-(4), F2 Has Excl. Loyal is a dummy for whether the brand has at least one person who reviews in year t and year t+2 and no other brand in t+2. In columns (5)-(6), F2 Perc. Loyal is the percentage of reviewers in t who review the brand in t+2. In columns (7)-(8), F2 Perc. Excl. Loyal is the percentage of reviewers in t who review the brand in t+2 and no other brand in t+2. Len. Ptile is the mean category-adjusted review length per brand-year as a percentile. The control variables are other brand characteristics in year t. Lengths are category-adjusted by subtracting the category median length in year t. Price is the mean category-adjusted product price, Stars is the mean category-adjusted star rating, # Rev. is the number of reviews, # Prod. is the number of unique products sold, and # Cust. is the number of unique reviewers. All regressions contain brand and year fixed effects and are clustered by brand and year.

F2 Has Loyal   F2 Has Excl. Loyal   F2 Perc. Loyal   F2 Perc. Excl. Loyal
(1) (2) (3) (4) (5) (6) (7) (8)
Len. Ptile 1.628 1.590 1.421 1.387 0.079 0.080 0.061 0.062
(0.277) (0.264) (0.233) (0.214) (0.014) (0.014) (0.011) (0.010)
Price 0.001 0.002 0.00000 0.0001
(0.003) (0.003) (0.0002) (0.0002)
Stars 0.133 0.132 0.005 0.005
(0.105) (0.115) (0.004) (0.003)
# Prod. 0.328 0.315 0.004 0.004
(0.069) (0.062) (0.002) (0.002)
# Cust. 0.034 0.035 0.0002 0.0001
(0.006) (0.006) (0.0001) (0.0001)
N 159,188 158,902 159,188 158,902 159,188 158,902 159,188 158,902
Adj. R² 0.221 0.312 0.203 0.299 0.013 0.016 0.001 0.004

Table A9: Subtract Brand Loyalty Terciles: 2-Years Ahead
This table reports panel regressions of brand loyalty on review length. The unit of observation is brand-year. In columns (1)-(2), the dependent variable, F2 Has Loyal, is a dummy for whether the brand has at least one person who reviews in both year t and t+2. In columns (3)-(4), F2 Has Excl. Loyal is a dummy for whether the brand has at least one person who reviews in year t and year t+2 and no other brand in t+2. In columns (5)-(6), F2 Perc. Loyal is the percentage of reviewers in t who review the brand in t+2. In columns (7)-(8), F2 Perc. Excl. Loyal is the percentage of reviewers in t who review the brand in t+2 and no other brand in t+2. Len. T2 and Len. T3 are tercile dummies of the brand’s mean category-adjusted review lengths in year t. Lengths are category-adjusted by subtracting the category median length in year t. The control variables are other brand characteristics in year t. Price is the mean category-adjusted product price, Stars is the mean category-adjusted star rating, # Rev. is the number of reviews, # Prod. is the number of unique products sold, and # Cust. is the number of unique reviewers. All regressions contain brand and year fixed effects and are clustered by brand and year.

F2 Has Loyal   F2 Has Excl. Loyal   F2 Perc. Loyal   F2 Perc. Excl. Loyal
(1) (2) (3) (4) (5) (6) (7) (8)
Len. T2 1.182 0.784 1.116 0.735 0.031 0.025 0.029 0.024
(0.289) (0.182) (0.276) (0.171) (0.011) (0.010) (0.010) (0.009)
Len. T3 1.337 1.166 1.179 1.013 0.057 0.056 0.045 0.045
(0.245) (0.213) (0.240) (0.209) (0.011) (0.011) (0.009) (0.008)
Price 0.001 0.001 0.00001 0.0001
(0.003) (0.003) (0.0002) (0.0002)
Stars 0.134 0.133 0.005 0.005
(0.105) (0.115) (0.004) (0.003)
# Prod. 0.327 0.314 0.004 0.004
(0.069) (0.062) (0.002) (0.002)
# Cust. 0.034 0.036 0.0002 0.0001
(0.006) (0.006) (0.0001) (0.0001)
N 159,188 158,902 159,188 158,902 159,188 158,902 159,188 158,902
Adj.
R 2 0.222 0.312 0.203 0.299 0.013 0.016 0.002 0.004 55 Table A10: SubtractBrandLoyaltyPercentiles:3-YearsAhead This table reports panel regressions of brand loyalty on review length. The unit of observation is brand-year. In columns (1)-(2), the dependent variable, F3 Has Loyal, is a dummy for whether the brand has at least one person who reviews in both yeart and t+3. In columns (3)-(4), F3 Has Excl. Loyal is a dummy for whether the brand has at least one person who reviews in yeart and yeart+3 and no other brand int+3. In columns (5)-(6), F3 Perc. Loyal is the percentage of reviewers int who review the brand int + 3. In columns (7)-(8), F3 Perc. Excl. Loyal is the percentage of reviewers int who review the brand int + 3 and no other brand int + 3. Len. Ptile is the mean category-adjusted review length per brand-year as a percentile. The control variables are other brand characteristics in yeart. Lengths are category-adjusted by subtracting the category median length in yeart. Price is the mean category-adjusted product price, Stars is the mean category-adjusted star rating, # Rev. is the number of reviews, # Prod. is the number of unique products sold, and # Cust. is the number of unique reviewers. All regressions contain brand and year fixed effects and are clustered by brand and year. F3 Has Loyal F3 Has Excl. Loyal F3 Perc. Loyal F3 Perc. Excl. Loyal (1) (2) (3) (4) (5) (6) (7) (8) Len. Ptile 0.888 0.672 0.795 0.592 0.040 0.040 0.031 0.030 (0.194) (0.165) (0.174) (0.176) (0.015) (0.014) (0.010) (0.010) Price 0.002 0.001 0.00005 0.00002 (0.002) (0.002) (0.0001) (0.0001) Stars 0.070 0.052 0.009 0.007 (0.045) (0.051) (0.003) (0.003) # Prod. 0.413 0.378 0.005 0.005 (0.099) (0.097) (0.003) (0.002) # Cust. 0.041 0.044 0.0003 0.0002 (0.007) (0.007) (0.0001) (0.0001) N 111,382 111,186 111,382 111,186 111,382 111,186 111,382 111,186 Adj. 
R 2 0.169 0.271 0.151 0.255 0.046 0.042 0.054 0.050 56 Table A11: SubtractBrandLoyaltyTerciles:3-YearsAhead This table reports panel regressions of brand loyalty on review length. The unit of observation is brand-year. In columns (1)-(2), the dependent variable, F3 Has Loyal, is a dummy for whether the brand has at least one person who reviews in both yeart and t+3. In columns (3)-(4), F3 Has Excl. Loyal is a dummy for whether the brand has at least one person who reviews in yeart and yeart+3 and no other brand int+3. In columns (5)-(6), F3 Perc. Loyal is the percentage of reviewers int who review the brand int + 3. In columns (7)-(8), F3 Perc. Excl. Loyal is the percentage of reviewers int who review the brand int + 3 and no other brand int+3. Len. T2 and Len. T3 are tercile dummies of the brand’s mean category-adjusted review lengths in yeart. Lengths are category-adjusted by subtracting the category median length in yeart. The control variables are other brand characteristics in yeart. Price is the mean category-adjusted product price, Stars is the mean category-adjusted star rating, # Rev. is the number of reviews, # Prod. is the number of unique products sold, and # Cust. is the number of unique reviewers. All regressions contain brand and year fixed effects and are clustered by brand and year. F3 Has Loyal F3 Has Excl. Loyal F3 Perc. Loyal F3 Perc. Excl. Loyal (1) (2) (3) (4) (5) (6) (7) (8) Len. T2 0.785 0.244 0.710 0.205 0.018 0.013 0.016 0.011 (0.183) (0.220) (0.150) (0.192) (0.007) (0.007) (0.006) (0.006) Len. T3 0.619 0.383 0.551 0.327 0.027 0.026 0.022 0.021 (0.139) (0.109) (0.130) (0.126) (0.009) (0.009) (0.006) (0.006) Price 0.002 0.001 0.0001 0.00002 (0.002) (0.002) (0.0001) (0.0001) Stars 0.068 0.050 0.009 0.007 (0.046) (0.051) (0.003) (0.003) # Prod. 0.412 0.378 0.005 0.005 (0.099) (0.097) (0.003) (0.002) # Cust. 0.041 0.044 0.0003 0.0002 (0.007) (0.007) (0.0001) (0.0001) N 111,382 111,186 111,382 111,186 111,382 111,186 111,382 111,186 Adj. 
R 2 0.169 0.271 0.151 0.255 0.046 0.042 0.054 0.050 57 Table A12: CustomerLoyalty:2-YearsLater This table reports panel regressions of customer loyalty on review length. The unit of observation is reviewer-brand-year. In columns (1)-(4), the dependent variable, F2 Loyal, is a dummy for whether the reviewer leaves at least one review for the same brand in yeart+2. In columns (5)-(8), F2 Excl. Loyal is a dummy for whether the reviewer reviews the same brand and does not review any other brand in yeart+2. Len. Ptile is the mean category-adjusted review length per reviewer-brand-year in yeart as a percentile. Len. T2 and Len. T3 are tercile dummies. Lengths are category-adjusted by dividing by the category median length in yeart. The control variables are other brand characteristics in yeart. Price is the mean category-adjusted product price, Stars is the mean category-adjusted star rating, # Rev. is the number of reviews, # Prod. is the number or products sold, and # Cust. is the number of unique reviewers. All regressions contain brand and year fixed effects and are clustered by brand and year. F2 Loyal F2 Excl. Loyal (1) (2) (3) (4) (5) (6) (7) (8) Len. Ptile 1.000 0.705 0.839 0.601 (0.139) (0.075) (0.110) (0.062) Len. T2 0.197 0.142 0.174 0.131 (0.023) (0.012) (0.017) (0.009) Len. T3 0.631 0.443 0.531 0.380 (0.091) (0.049) (0.073) (0.042) Price 0.009 0.008 0.007 0.006 (0.023) (0.023) (0.020) (0.020) Stars 0.416 0.406 0.393 0.385 (0.067) (0.067) (0.059) (0.059) # Rev. 0.131 0.132 0.107 0.108 (0.039) (0.039) (0.031) (0.031) # Prod. 0.0001 0.0001 0.0001 0.0001 (0.0001) (0.0001) (0.0001) (0.0001) # Cust. 0.00001 0.00001 0.00001 0.00001 (0.00001) (0.00001) (0.00001) (0.00001) N 5,879,327 5,878,334 5,879,327 5,878,334 5,879,327 5,878,334 5,879,327 5,878,334 Adj. R 2 0.005 0.012 0.005 0.012 0.004 0.009 0.004 0.009 58 Table A13: CustomerLoyalty:3-YearsLater This table reports panel regressions of customer loyalty on review length. 
The unit of observation is reviewer-brand-year. In columns (1)-(4), the dependent variable, F3 Loyal, is a dummy for whether the reviewer leaves at least one review for the same brand in year t+3. In columns (5)-(8), F3 Excl. Loyal is a dummy for whether the reviewer reviews the same brand and does not review any other brand in year t+3. Len. Ptile is the mean category-adjusted review length per reviewer-brand-year in year t as a percentile. Len. T2 and Len. T3 are tercile dummies. Lengths are category-adjusted by dividing by the category median length in year t. The control variables are other brand characteristics in year t. Price is the mean category-adjusted product price, Stars is the mean category-adjusted star rating, # Rev. is the number of reviews, # Prod. is the number of products sold, and # Cust. is the number of unique reviewers. All regressions contain brand and year fixed effects, and standard errors are clustered by brand and year.

            F3 Loyal                                        F3 Excl. Loyal
            (1)         (2)         (3)         (4)         (5)         (6)         (7)         (8)
Len. Ptile  0.829       0.530                               0.707       0.458
            (0.127)     (0.078)                             (0.098)     (0.065)
Len. T2                             0.130       0.078                               0.114       0.071
                                    (0.026)     (0.021)                             (0.024)     (0.021)
Len. T3                             0.522       0.331                               0.444       0.286
                                    (0.082)     (0.051)                             (0.063)     (0.042)
Price                   0.0003                  0.0002                  0.002                   0.003
                        (0.019)                 (0.019)                 (0.019)                 (0.019)
Stars                   0.263                   0.255                   0.248                   0.241
                        (0.021)                 (0.021)                 (0.021)                 (0.021)
# Rev.                  0.159                   0.159                   0.133                   0.133
                        (0.047)                 (0.047)                 (0.036)                 (0.036)
# Prod.                 0.0003                  0.0003                  0.0002                  0.0002
                        (0.0002)                (0.0002)                (0.0001)                (0.0001)
# Cust.                 0.00002                 0.00002                 0.00002                 0.00002
                        (0.00003)               (0.00003)               (0.00003)               (0.00003)
N           3,102,122   3,101,470   3,102,122   3,101,470   3,102,122   3,101,470   3,102,122   3,101,470
Adj. R²     0.0004      0.008       0.0003      0.008       0.001       0.005       0.001       0.005

Table A14: Subtract Customer Loyalty: 1-Year Later

This table reports panel regressions of customer loyalty on review length. The unit of observation is reviewer-brand-year. In columns (1)-(4), the dependent variable, F1 Loyal, is a dummy for whether the reviewer leaves at least one review for the same brand in year t+1.
In columns (5)-(8), F1 Excl. Loyal is a dummy for whether the reviewer reviews the same brand and does not review any other brand in year t+1. Len. Ptile is the mean category-adjusted review length per reviewer-brand-year in year t as a percentile. Len. T2 and Len. T3 are tercile dummies. Lengths are category-adjusted by subtracting the category median length in year t. The control variables are other brand characteristics in year t. Price is the mean category-adjusted product price, Stars is the mean category-adjusted star rating, # Rev. is the number of reviews, # Prod. is the number of products sold, and # Cust. is the number of unique reviewers. All regressions contain brand and year fixed effects, and standard errors are clustered by brand and year.

            F1 Loyal                                        F1 Excl. Loyal
            (1)         (2)         (3)         (4)         (5)         (6)         (7)         (8)
Len. Ptile  1.285       0.907                               1.047       0.759
            (0.167)     (0.124)                             (0.131)     (0.100)
Len. T2                             0.260       0.164                               0.217       0.143
                                    (0.025)     (0.024)                             (0.019)     (0.019)
Len. T3                             0.805       0.561                               0.658       0.472
                                    (0.101)     (0.077)                             (0.080)     (0.062)
Price                   0.001                   0.001                   0.001                   0.001
                        (0.001)                 (0.001)                 (0.001)                 (0.001)
Stars                   0.131                   0.128                   0.122                   0.120
                        (0.026)                 (0.025)                 (0.023)                 (0.023)
# Rev.                  0.123                   0.124                   0.097                   0.097
                        (0.031)                 (0.031)                 (0.024)                 (0.024)
# Prod.                 0.0001                  0.0001                  0.0001                  0.0001
                        (0.0001)                (0.0001)                (0.0001)                (0.0001)
# Cust.                 0.00001                 0.00001                 0.00001                 0.00001
                        (0.00001)               (0.00001)               (0.00001)               (0.00001)
N           13,158,834  13,157,332  13,158,834  13,157,332  13,158,834  13,157,332  13,158,834  13,157,332
Adj. R²     0.012       0.020       0.011       0.020       0.009       0.015       0.009       0.015

Table A15: Subtract Customer Loyalty: 2-Years Later

This table reports panel regressions of customer loyalty on review length. The unit of observation is reviewer-brand-year. In columns (1)-(4), the dependent variable, F2 Loyal, is a dummy for whether the reviewer leaves at least one review for the same brand in year t+2. In columns (5)-(8), F2 Excl. Loyal is a dummy for whether the reviewer reviews the same brand and does not review any other brand in year t+2. Len. Ptile is the mean category-adjusted review length per reviewer-brand-year in year t as a percentile. Len. T2 and Len. T3 are tercile dummies. Lengths are category-adjusted by subtracting the category median length in year t. The control variables are other brand characteristics in year t. Price is the mean category-adjusted product price, Stars is the mean category-adjusted star rating, # Rev. is the number of reviews, # Prod. is the number of products sold, and # Cust. is the number of unique reviewers. All regressions contain brand and year fixed effects, and standard errors are clustered by brand and year.

            F2 Loyal                                        F2 Excl. Loyal
            (1)         (2)         (3)         (4)         (5)         (6)         (7)         (8)
Len. Ptile  1.004       0.712                               0.844       0.608
            (0.144)     (0.079)                             (0.114)     (0.066)
Len. T2                             0.189       0.134                               0.167       0.123
                                    (0.020)     (0.010)                             (0.014)     (0.008)
Len. T3                             0.624       0.439                               0.527       0.377
                                    (0.094)     (0.052)                             (0.075)     (0.044)
Price                   0.0002                  0.0002                  0.0001                  0.0001
                        (0.001)                 (0.001)                 (0.001)                 (0.001)
Stars                   0.090                   0.087                   0.085                   0.083
                        (0.015)                 (0.015)                 (0.013)                 (0.013)
# Rev.                  0.131                   0.132                   0.107                   0.108
                        (0.039)                 (0.039)                 (0.030)                 (0.031)
# Prod.                 0.0001                  0.0001                  0.0001                  0.0001
                        (0.0001)                (0.0001)                (0.0001)                (0.0001)
# Cust.                 0.00001                 0.00001                 0.00001                 0.00001
                        (0.00001)               (0.00001)               (0.00001)               (0.00001)
N           5,879,327   5,878,334   5,879,327   5,878,334   5,879,327   5,878,334   5,879,327   5,878,334
Adj. R²     0.005       0.012       0.005       0.012       0.004       0.009       0.004       0.009

Table A16: Subtract Customer Loyalty: 3-Years Later

This table reports panel regressions of customer loyalty on review length. The unit of observation is reviewer-brand-year. In columns (1)-(4), the dependent variable, F3 Loyal, is a dummy for whether the reviewer leaves at least one review for the same brand in year t+3. In columns (5)-(8), F3 Excl. Loyal is a dummy for whether the reviewer reviews the same brand and does not review any other brand in year t+3. Len. Ptile is the mean category-adjusted review length per reviewer-brand-year in year t as a percentile. Len. T2 and Len. T3 are tercile dummies.
Lengths are category-adjusted by subtracting the category median length in year t. The control variables are other brand characteristics in year t. Price is the mean category-adjusted product price, Stars is the mean category-adjusted star rating, # Rev. is the number of reviews, # Prod. is the number of products sold, and # Cust. is the number of unique reviewers. All regressions contain brand and year fixed effects, and standard errors are clustered by brand and year.

            F3 Loyal                                        F3 Excl. Loyal
            (1)         (2)         (3)         (4)         (5)         (6)         (7)         (8)
Len. Ptile  0.842       0.547                               0.719       0.473
            (0.130)     (0.081)                             (0.100)     (0.068)
Len. T2                             0.125       0.073                               0.109       0.067
                                    (0.024)     (0.019)                             (0.023)     (0.021)
Len. T3                             0.525       0.336                               0.448       0.291
                                    (0.081)     (0.049)                             (0.063)     (0.042)
Price                   0.001                   0.001                   0.0003                  0.0003
                        (0.0005)                (0.0005)                (0.0004)                (0.0004)
Stars                   0.058                   0.057                   0.055                   0.054
                        (0.005)                 (0.005)                 (0.005)                 (0.005)
# Rev.                  0.158                   0.159                   0.133                   0.133
                        (0.047)                 (0.047)                 (0.036)                 (0.036)
# Prod.                 0.0003                  0.0003                  0.0002                  0.0002
                        (0.0002)                (0.0002)                (0.0001)                (0.0001)
# Cust.                 0.00002                 0.00002                 0.00002                 0.00002
                        (0.00003)               (0.00003)               (0.00003)               (0.00003)
N           3,102,122   3,101,470   3,102,122   3,101,470   3,102,122   3,101,470   3,102,122   3,101,470
Adj. R²     0.0005      0.008       0.0003      0.008       0.001       0.005       0.001       0.005

Table A17: Exclusive Customer Loyalty by Star Rating

This table reports panel regressions of exclusive customer loyalty on review length by star rating quintile. The unit of observation is reviewer-brand-year. Columns represent subsamples by mean category-adjusted star rating of the brand. The dependent variable, Excl. F1 Loyal, is a dummy for whether the reviewer leaves at least one review for the same brand and does not review any other brand in year t+1. Len. T2 and Len. T3 are tercile dummies of the brand's mean category-adjusted review lengths in year t. Lengths are category-adjusted by dividing by the category median length in year t. The control variables are other brand characteristics in year t.
Price is the mean category-adjusted product price, Stars is the mean category-adjusted star rating, # Rev. is the number of reviews, # Prod. is the number of unique products sold, and # Cust. is the number of unique reviewers. All regressions contain brand and year fixed effects, and standard errors are clustered by brand and year.

F1 Excl. Loyal
(1) (2) (3) (4) (5)
Len. T2 0.052 0.204 0.166 0.173 0.150
(0.006) (0.050) (0.020) (0.031) (0.015)
Len. T3 0.240 0.651 0.446 0.565 0.451
(0.022) (0.104) (0.074) (0.170) (0.053)
Price 0.016 0.023 0.102 0.015 0.035
(0.029) (0.061) (0.070) (0.056) (0.046)
Stars 0.581 2.602 7.232 1.166
(0.122) (0.354) (0.025) (0.254)
# Rev. 0.083 0.110 0.091 0.100 0.093
(0.020) (0.026) (0.022) (0.039) (0.024)
# Prod. 0.0001 0.0001 0.0002 0.00003 0.0002
(0.0001) (0.0002) (0.0002) (0.0002) (0.0002)
# Cust. 0.00001 0.00001 0.00002 0.00002 0.00001
(0.00000) (0.00001) (0.00001) (0.00002) (0.00001)
N 2,630,024 2,631,994 2,631,789 2,631,989 2,631,536
Adj. R² 0.001 0.014 0.010 0.011 0.009

Table A18: Subtract Customer Loyalty by Star Rating

This table reports panel regressions of customer loyalty on review length by star rating quintile. The unit of observation is reviewer-brand-year. Columns represent subsamples by mean category-adjusted star rating of the brand. The dependent variable, F1 Loyal, is a dummy for whether the reviewer leaves at least one review for the same brand in year t+1. Len. T2 and Len. T3 are tercile dummies of the brand's mean category-adjusted review lengths in year t. Lengths are category-adjusted by subtracting the category median length in year t. The control variables are other brand characteristics in year t. Price is the mean category-adjusted product price, Stars is the mean category-adjusted star rating, # Rev. is the number of reviews, # Prod. is the number of unique products sold, and # Cust. is the number of unique reviewers. All regressions contain brand and year fixed effects, and standard errors are clustered by brand and year.
(1) (2) (3) (4) (5)
Len. T2 0.064 0.203 0.167 0.192 0.187
(0.009) (0.052) (0.021) (0.032) (0.026)
Len. T3 0.287 0.781 0.514 0.663 0.519
(0.033) (0.125) (0.089) (0.098) (0.068)
Price 0.001 0.001 0.004 0.0004 0.001
(0.001) (0.002) (0.002) (0.002) (0.001)
Stars 0.129 0.778 0.655 0.344
(0.032) (0.095) (0.335) (0.082)
# Rev. 0.105 0.151 0.113 0.123 0.113
(0.027) (0.035) (0.028) (0.032) (0.029)
# Prod. 0.0001 0.0001 0.0003 0.0001 0.0002
(0.0001) (0.0002) (0.0003) (0.0002) (0.0002)
# Cust. 0.00001 0.00001 0.00002 0.00002 0.00001
(0.00000) (0.00001) (0.00001) (0.00001) (0.00001)
N 2,630,077 2,631,941 2,631,795 2,631,984 2,631,535
Adj. R² 0.003 0.026 0.014 0.015 0.012

Table A19: Subtract Exclusive Customer Loyalty by Star Rating

This table reports panel regressions of exclusive customer loyalty on review length by star rating quintile. The unit of observation is reviewer-brand-year. Columns represent subsamples by mean category-adjusted star rating of the brand. The dependent variable, Excl. F1 Loyal, is a dummy for whether the reviewer leaves at least one review for the same brand and does not review any other brand in year t+1. Len. T2 and Len. T3 are tercile dummies of the brand's mean category-adjusted review lengths in year t. Lengths are category-adjusted by subtracting the category median length in year t. The control variables are other brand characteristics in year t. Price is the mean category-adjusted product price, Stars is the mean category-adjusted star rating, # Rev. is the number of reviews, # Prod. is the number of unique products sold, and # Cust. is the number of unique reviewers. All regressions contain brand and year fixed effects, and standard errors are clustered by brand and year.

(1) (2) (3) (4) (5)
Len. T2 0.055 0.182 0.147 0.169 0.162
(0.008) (0.043) (0.020) (0.024) (0.023)
Len. T3 0.250 0.634 0.444 0.570 0.449
(0.026) (0.099) (0.076) (0.085) (0.054)
Price 0.0003 0.001 0.003 0.0004 0.001
(0.001) (0.002) (0.002) (0.002) (0.001)
Stars 0.111 0.535 0.731 0.246
(0.024) (0.070) (0.326) (0.054)
# Rev. 0.085 0.109 0.091 0.100 0.092
(0.022) (0.025) (0.022) (0.026) (0.023)
# Prod. 0.0001 0.0001 0.0002 0.00003 0.0002
(0.0001) (0.0002) (0.0003) (0.0002) (0.0002)
# Cust. 0.00001 0.00001 0.00002 0.00003 0.00001
(0.00000) (0.00001) (0.00001) (0.00001) (0.00001)
N 2,630,077 2,631,941 2,631,795 2,631,984 2,631,535
Adj. R² 0.001 0.014 0.010 0.011 0.009

Table A20: Weighted Cross-Sectional Regressions

This table reports Fama-MacBeth cross-sectional regression results weighted by June market cap. Divide is the log of the mean division-adjusted review length where each length is adjusted by dividing by the category median length. Subtract is the mean subtraction-adjusted review length where each length is adjusted by subtracting the category median length. Controls include log market cap, log book-to-market, ROE, asset growth, and R&D-to-market. Columns (5) through (8) include SIC2-Month dummies. The sample spans July 2005 to June 2017. Newey-West standard errors with 4 months of lag are shown in parentheses.

            (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)
Constant    0.485     0.543     1.715     1.730     1.200     1.274     2.935     3.004
            (0.385)   (0.374)   (1.137)   (1.133)   (1.129)   (1.129)   (1.653)   (1.667)
Divide      0.776               0.665               0.753               0.539
            (0.359)             (0.314)             (0.411)             (0.394)
Subtract              0.027               0.021               0.022               0.016
                      (0.012)             (0.010)             (0.010)             (0.010)
Controls    No        No        Yes       Yes       No        No        Yes       Yes
SIC2-Month  No        No        No        No        Yes       Yes       Yes       Yes
N           24,793    24,793    24,036    24,036    24,793    24,793    24,036    24,036

Table A21: Cross-Sectional Regressions with Product Characteristics

This table reports Fama-MacBeth cross-sectional regression results controlling for review and product characteristics. Divide is the log of mean category-adjusted review lengths where each length is adjusted by dividing by the category median length.
All regressions include additional firm-level controls: market cap, log book-to-market, ROE, asset growth, and R&D-to-market. The sample spans July 2005 to June 2017. Newey-West standard errors with 4 months of lag are shown in parentheses.

                      (1)       (2)       (3)       (4)
Constant              3.375     1.602     2.362     1.055
                      (3.565)   (2.297)   (1.503)   (0.913)
Divide                0.736     0.831     0.727     0.850
                      (0.362)   (0.375)   (0.362)   (0.372)
Prop. Helpful         11.616
                      (9.316)
(Prop. Helpful)²      7.245
                      (6.158)
Stars                           1.281
                                (1.206)
(Stars)²                        0.147
                                (0.158)
Log Price                                 0.728
                                          (0.606)
(Log Price)²                              0.100
                                          (0.075)
Prod. Desc. Len                                     0.001
                                                    (0.008)
(Prod. Desc. Len)²                                  0.00002
                                                    (0.00005)
N                     23,904    24,036    23,988    23,731

Table A22: Cross-Sectional Regressions Controlling for Product Turnover

This table reports Fama-MacBeth cross-sectional regression results controlling for product turnover. Divide is the log of mean category-adjusted review lengths where each length is adjusted by dividing by the category median length. Columns (2) and (4) contain firm-level controls: market cap, log book-to-market, ROE, asset growth, and R&D-to-market. The sample spans July 2005 to June 2017. Newey-West standard errors with 4 months of lag are shown in parentheses.

                            (1)        (2)        (3)        (4)
Constant                    0.639      1.050      0.779      1.137
                            (0.536)    (0.953)    (0.517)    (0.932)
Divide                      1.187      1.053      0.747      0.784
                            (0.485)    (0.539)    (0.354)    (0.358)
Prop. New Prod.             144.714    145.749
                            (152.849)  (132.897)
Divide x (Prop. New Prod.)  186.493    183.627
                            (221.094)  (189.422)
Num. New Prod.                                    0.239      0.221
                                                  (0.278)    (0.256)
Divide x (Num. New Prod.)                         0.368      0.346
                                                  (0.435)    (0.415)
Controls                    No         Yes        No         Yes
N                           24,286     23,529     24,286     23,529

Table A23: Real Beta: Subtract

This table reports panel regressions of profitability on macroeconomic measures of competition. ROA is firm-level net income divided by total assets in year t+1. ROE is firm-level net income divided by book equity in t+1, where book equity is defined following Fama and French (1992).
Subtract is firm-level average category-adjusted review length (adjusted by subtracting product category median lengths). EigCent is the cross-sectional mean of all firms' eigencentrality in t+1. Markup is the cross-sectional mean of all firms' industry markup. GDP is percent growth of real GDP in t+1. All regressions include firm fixed effects and control for quadratic time trends. The sample spans 2005 through 2017. Firm- and year-clustered standard errors are in parentheses.

                  ROA                           ROE
                  (1)       (2)       (3)       (4)       (5)       (6)
Subtract          0.007     0.0004    0.0002    0.011     0.001     0.005
                  (0.003)   (0.0004)  (0.001)   (0.011)   (0.002)   (0.004)
EigCent           0.739                         2.397
                  (0.242)                       (1.523)
Markup                      0.012                         0.040
                            (0.004)                       (0.024)
GDP                                   0.839                         4.960
                                      (0.256)                       (2.513)
Subtract:EigCent  0.042                         0.070
                  (0.018)                       (0.063)
Subtract:Markup             0.001                         0.001
                            (0.0003)                      (0.001)
Subtract:GDP                          0.003                         0.116
                                      (0.015)                       (0.095)
N                 2,074     2,074     2,074     2,020     2,020     2,020
Adj. R²           0.476     0.475     0.486     0.373     0.373     0.379

Table A24: Factor Loadings: Subtract

This table reports factor loadings for monthly value-weighted long-short quartile portfolio returns sorted by average category-adjusted review length. MARK is a portfolio that buys firms in the highest decile by industry markup and shorts firms in the lowest decile by industry markup. HP is a portfolio that buys firms in the lowest decile of Hoberg-Phillips eigencentrality and shorts firms in the highest decile of eigencentrality. The other factors are the standard Fama-French five factors. Review lengths are adjusted by subtracting the product category median length. The sample spans July 2005 to June 2017. Newey-West standard errors with 4-month lags are reported in parentheses.
(1) (2) (3) (4) (5) (6)
Constant 0.432 0.434 0.407 0.393 0.397 0.425
(0.220) (0.216) (0.240) (0.222) (0.221) (0.239)
MARK 0.139 0.138 0.106
(0.077) (0.084) (0.084)
HP 0.108 0.107 0.097
(0.041) (0.045) (0.060)
MKT 0.004 0.014 0.004 0.011
(0.054) (0.062) (0.054) (0.055)
SMB 0.013 0.011
(0.130) (0.124)
HML 0.100 0.011
(0.148) (0.137)
RMW 0.060 0.040
(0.148) (0.160)
CMA 0.047 0.151
(0.247) (0.269)
N 144 144 144 144 144 144
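The asset pricing table notes above report Newey-West standard errors with four monthly lags. As a rough illustration only, and not the author's code, the simplest case (the standard error of a portfolio's mean monthly return, i.e. an intercept-only regression) can be sketched in plain Python. The function name, the Bartlett weights 1 - l/(L+1), and the absence of any small-sample correction are assumptions made for this sketch.

```python
import math


def newey_west_se_of_mean(returns, lags=4):
    """Newey-West (Bartlett kernel) standard error of the sample mean.

    Hypothetical helper for illustration; no small-sample correction.
    """
    t = len(returns)
    mean = sum(returns) / t
    dev = [r - mean for r in returns]

    def autocov(lag):
        # Sample autocovariance at the given lag, normalised by T.
        return sum(dev[i] * dev[i - lag] for i in range(lag, t)) / t

    long_run_var = autocov(0)
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)  # Bartlett weight
        long_run_var += 2.0 * weight * autocov(lag)
    return math.sqrt(long_run_var / t)
```

Dividing a long-short portfolio's mean monthly return by this standard error gives a t-statistic that allows for autocorrelation up to the chosen lag. With factor regressors, the same weighting is applied to the full sandwich covariance matrix rather than to a scalar.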
Abstract
I examine the impact of having a picky customer base on a firm's risk and return. Using Amazon.com product review lengths, I construct a proxy for customer pickiness. Picky customers have narrow product tastes and are more likely to be repeat purchasers. Thus, firms with picky customers are less vulnerable to product market competition. They experience smaller decreases in profitability when competition increases, and they earn higher returns (0.53% per month). Moreover, the returns of firms with picky customers correlate positively with the returns of other firms that are insulated from product market competition.
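The review-length proxy and the repeat-purchase measure described in the abstract and table notes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the tuple layout (reviewer, brand, category, year, length), the function names, and the choice to return a fraction rather than a percentage are all hypothetical, and the unit of review length is left unspecified.

```python
from collections import defaultdict
from statistics import median


def mean_adjusted_length(reviews, how="divide"):
    """Mean category-adjusted review length per (brand, year).

    Each length is divided by (how="divide") or reduced by (how="subtract")
    the median length of its (category, year) cell. Assumes positive
    medians when dividing.
    """
    by_cell = defaultdict(list)
    for _, _, category, year, length in reviews:
        by_cell[(category, year)].append(length)
    cell_median = {cell: median(vals) for cell, vals in by_cell.items()}

    totals = defaultdict(lambda: [0.0, 0])  # (brand, year) -> [sum, count]
    for _, brand, category, year, length in reviews:
        m = cell_median[(category, year)]
        adjusted = length / m if how == "divide" else length - m
        totals[(brand, year)][0] += adjusted
        totals[(brand, year)][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def share_loyal(reviews, brand, year, horizon=1):
    """Fraction of a brand's reviewers in `year` who review it again
    in `year + horizon`; None if the brand has no reviewers in `year`."""
    now = {r for r, b, _, y, _ in reviews if b == brand and y == year}
    later = {r for r, b, _, y, _ in reviews
             if b == brand and y == year + horizon}
    return len(now & later) / len(now) if now else None
```

Ranking the brand-year means into percentiles or terciles, as the table notes describe, would be a further step on top of the dictionary returned by the first function.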
Conceptually similar
Three essays on macro and labor finance
Workplace organization and asset pricing
Disclosure distance and earnings announcement returns
Accrual quality and expected returns: the importance of controlling for cash flow shocks
Internal capital markets and competitive threats
Essays in corporate finance
Changing fundamental analysis in the new economy: the case of DuPont analysis and STEM firms
Essays on digital platforms
Expectation dynamics and stock returns
Essays on artificial intelligience and new media
Three essays on behavioral finance with rational foundations
Managing product variety in a competitive environment
Essays on the firm and stakeholders relationships: evidence from mergers & acquisitions and labor negotiations
Understanding the disclosure practices of firms affected by a natural disaster: the case of hurricanes
Share repurchases: how important is market timing?
Essays on the effect of cognitive constraints on financial decision-making
Quality investment and advertising: an empirical analysis of the auto industry
Insider editing on Wikipedia
Essays on consumer product evaluation and online shopping intermediaries
Essays on revenue management with choice modeling
Asset Metadata
Creator
Yang, Louis (author)
Core Title
Picky customers and expected returns
School
Marshall School of Business
Degree
Doctor of Philosophy
Degree Program
Business Administration
Degree Conferral Date
2021-08
Publication Date
07/23/2021
Defense Date
04/26/2021
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
Amazon, competition, customer reviews, empirical asset pricing, OAI-PMH Harvest
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Zhang, Miao (committee chair), Jones, Chris (committee member), Kilic, Mete (committee member), Ogneva, Maria (committee member), Tuzel, Selale (committee member)
Creator Email
louis.yang321@gmail.com,louiszya@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC15616760
Unique identifier
UC15616760
Legacy Identifier
etd-YangLouis-9832
Document Type
Dissertation
Rights
Yang, Louis
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu