Text-mining in macroeconomics: the wealth of words

Azqueta Gavaldon, Andres (2020) Text-mining in macroeconomics: the wealth of words. PhD thesis, University of Glasgow.


The founding of the Royal Society in 1660 was an important milestone in the history of science, not least for Economics. Yet its founding motto, "Nullius in verba", could be somewhat misleading: words may in fact play an important role in Economics. To extract the relevant information that words provide, this thesis relies on state-of-the-art methods from the information retrieval and computer science communities.

Chapter 1 shows how policy uncertainty indices can be constructed with unsupervised machine learning models. Unsupervised algorithms reduce the time and resources needed to compute these indices. The algorithm used, Latent Dirichlet Allocation (LDA), recovers the different themes in a set of documents without any prior information about their context. Because this algorithm is used throughout the thesis, the chapter offers a detailed yet intuitive description of its underlying mechanics.
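
For orientation, the generative model that LDA assumes can be stated compactly. The formulation below is the standard one from the topic-modelling literature; the notation (K topics, D documents, N_d words in document d, Dirichlet hyperparameters alpha and beta) is an assumption made here and is not taken from the thesis.

```latex
\begin{align*}
\phi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1,\dots,K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d), & n &= 1,\dots,N_d \\
w_{d,n} \mid z_{d,n}, \phi &\sim \mathrm{Categorical}(\phi_{z_{d,n}}), & n &= 1,\dots,N_d
\end{align*}
```

Here \phi_k is the word distribution of topic k, \theta_d is the topic mixture of document d, z_{d,n} is the topic assigned to the n-th word of document d, and w_{d,n} is the word itself. Only the words are observed; estimation infers the topics and mixtures from them, which is why no labelled examples are needed.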

Chapter 2 uses the LDA algorithm to categorize the political uncertainty embedded in the Scottish media. In particular, it models the uncertainty regarding Brexit and the Scottish independence referendum. These referendum-related indices are compared with the Google search queries "Scottish independence" and "Brexit" and show strong similarities to them. The second part of the chapter examines the relationship between these indices and investment in a longitudinal panel dataset of 2,589 Scottish firms over the period 2008-2017. It presents evidence of greater sensitivity for firms that are financially constrained or whose investment is to a greater degree irreversible. Additionally, investment by Scottish companies located on the border with England is more negatively correlated with Scottish political uncertainty than investment by those operating in the rest of the country. Contrary to expectations, investment by manufacturing companies appears less sensitive to political uncertainty.

Chapter 3 builds eight different policy-related uncertainty indicators for the four largest euro area countries using press media in German, French, Italian and Spanish from January 2000 until May 2019. This is done in two steps. First, a continuous bag-of-words model is used to obtain words semantically similar to "economy" and "uncertainty" across the four languages and contexts. This allows for the retrieval of all news articles relevant to economic uncertainty. Second, LDA is again employed to model the different sources of uncertainty for each country, highlighting how easily LDA can adapt to different languages and contexts. A Bayesian Structural Vector Autoregressive (BSVAR) setup then documents strong heterogeneity in the relationship between uncertainty and investment in machinery and equipment. For example, while investment in France, Italy and Spain reacts strongly to political uncertainty shocks, in Germany it is more sensitive to trade uncertainty shocks.
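
The first of the two steps can be illustrated with a minimal sketch. Once a word-embedding model has assigned each word a vector, the words closest to a seed term are those with the highest cosine similarity to it. The three-dimensional vectors below are made-up toy values for illustration only, not output from the thesis, and `most_similar` is a hypothetical helper name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def most_similar(seed, vectors, top_n=2):
    """Return the top_n words whose vectors are closest to the seed word."""
    scores = [
        (word, cosine(vectors[seed], vec))
        for word, vec in vectors.items()
        if word != seed
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scores[:top_n]]


# Toy embeddings (assumed values, chosen only to make the example readable).
TOY_VECTORS = {
    "uncertainty": [0.9, 0.1, 0.0],
    "unclear": [0.8, 0.2, 0.1],
    "doubt": [0.7, 0.3, 0.0],
    "football": [0.0, 0.1, 0.9],
}

print(most_similar("uncertainty", TOY_VECTORS))  # ['unclear', 'doubt']
```

In a setting like the one described, the returned neighbours would serve as search terms for retrieving articles about economic uncertainty in each language.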

Finally, Chapter 4 analyses English-language media from Europe, India and the United States, augmented by sentiment analysis, to study how different narratives concerning cryptocurrencies influence their prices. The time span ranges from April 2013 to December 2018, a period in which cryptocurrency prices exhibited parabolic behaviour. This case study is also motivated by Shiller's belief that narratives around cryptocurrencies might have led to this price behaviour. Nonetheless, the relationship between narratives and prices is likely to be driven by complex interactions. For example, articles written in the media about a specific phenomenon will attract or deter new investors depending on their content and tone (sentiment). Moreover, the press might also react to price changes by increasing the coverage of a given topic. For this reason, a recent causal inference method, Convergent Cross Mapping (CCM), which is suited to discovering causal relationships in complex dynamical ecosystems, is used. I find bidirectional causal relationships between prices and narratives concerning investment and regulation, while a mild unidirectional causal association links narratives about technology and security to prices.
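
The core idea of cross mapping can be sketched in a few lines. If x influences y, the history of y carries information about x, so x can be estimated from nearest neighbours on the delay-embedded "shadow manifold" of y. The code below is a simplified illustration under stated assumptions, not the implementation used in the thesis: the function names are hypothetical, the embedding dimension `E=2` and lag `tau=1` are arbitrary defaults, and the data come from a toy pair of coupled logistic maps rather than from media or price series.

```python
import math


def delay_embed(series, E, tau):
    """Lagged vectors (s[t], s[t-tau], ..., s[t-(E-1)*tau]) and their time indices."""
    start = (E - 1) * tau
    times = list(range(start, len(series)))
    vectors = [[series[t - j * tau] for j in range(E)] for t in times]
    return vectors, times


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def cross_map_skill(source, target, E=2, tau=1):
    """Estimate `target` from the shadow manifold of `source`.

    Returns the correlation between the estimates and the actual target values.
    """
    vectors, times = delay_embed(source, E, tau)
    estimates, actual = [], []
    for i, v in enumerate(vectors):
        # E + 1 nearest neighbours of v, excluding v itself.
        neighbours = sorted(
            (math.dist(v, w), times[j]) for j, w in enumerate(vectors) if j != i
        )[: E + 1]
        d_min = max(neighbours[0][0], 1e-12)  # guard against division by zero
        weights = [math.exp(-d / d_min) for d, _ in neighbours]
        total = sum(weights)
        estimate = sum(wt * target[t] for wt, (_, t) in zip(weights, neighbours))
        estimates.append(estimate / total)
        actual.append(target[times[i]])
    return pearson(estimates, actual)


def simulate(n, x0=0.4, y0=0.2):
    """Toy coupled logistic maps: x affects y more strongly than y affects x."""
    xs, ys = [x0], [y0]
    for _ in range(n - 1):
        x, y = xs[-1], ys[-1]
        xs.append(x * (3.8 - 3.8 * x - 0.02 * y))
        ys.append(y * (3.5 - 3.5 * y - 0.1 * x))
    return xs, ys


xs, ys = simulate(400)
for length in (50, 100, 200, 400):
    # Skill of recovering x from the manifold of y, using the first `length` points.
    print(length, round(cross_map_skill(ys[:length], xs[:length]), 3))
```

In CCM, an influence of x on y is inferred when this skill rises and converges as the library length grows; the loop prints the skill at a few lengths so that the pattern can be inspected. Running the function in both directions is what allows unidirectional and bidirectional relationships to be distinguished.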

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Keywords: macroeconomics, text-mining, uncertainty, machine learning.
Subjects: H Social Sciences > HA Statistics
H Social Sciences > HB Economic Theory
H Social Sciences > HG Finance
Colleges/Schools: College of Social Sciences > Adam Smith Business School > Economics
Supervisor's Name: Nolan, Professor Charles and Leith, Professor Campbell
Date of Award: 2020
Depositing User: Dr. Andres Azqueta Gavaldon
Unique ID: glathesis:2020-81641
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 08 Sep 2020 11:49
Last Modified: 31 Aug 2022 10:10
Thesis DOI: 10.5525/gla.thesis.81641
URI: http://theses.gla.ac.uk/id/eprint/81641