NLP for analysis and forecasting of crude oil prices

Gifuni, Luigi (2023) NLP for analysis and forecasting of crude oil prices. PhD thesis, University of Glasgow.



The past fifteen years have seen a growing number of unexpected economic and political changes that have interrupted a prolonged period of low market volatility. This has challenged the predictive power of macroeconomic leading indicators, because the underlying data become available only with a delay. As a result, economists have become increasingly interested in using contemporaneous written documents produced by the media and government institutions to develop new variables that provide additional economic insight in real time.

This PhD thesis takes this literature as its starting point and contributes to enhancing the use of text as a valuable source of information for studying the behaviour of monthly crude oil prices. The work comprises three core chapters, organised as follows.

In the first study I develop a set of text-based indexes capturing human sentiment and economic uncertainty in the oil market. The text analysis covers the titles and full articles of 138,797 oil-related news items that featured in The Financial Times, Thomson Reuters and The Independent. Empirical experiments show that sentiment indicators react readily to economic and geopolitical events affecting the price of oil, which enables them to predict real oil prices accurately. In contrast, measures of uncertainty hide structural weaknesses and thus yield unreliable oil price forecasts. This work results in a new text-based index that significantly improves real oil price point forecasts, especially in periods of financial stress, when forecasting matters the most.

In the second essay I investigate the predictability of monthly real oil prices when daily and weekly text data are combined with oil market fundamentals. Text data are retrieved from 140,096 full oil-related articles that featured in The Financial Times, Thomson Reuters and The Independent. I show that models containing high-frequency financial and commodity variables do not yield significant improvements on the no-change forecast. In contrast, when text data are used along with commodity variables and oil market fundamentals, the preferred models reduce the mean squared prediction errors (MSPEs) by 18%. These gains are nevertheless modest: the corresponding models with variables observed at a homogeneous frequency generate out-of-sample forecasts of similar accuracy. I thus conclude that variables sampled at different frequencies do not significantly improve the predictability of monthly real oil prices. This holds for both point and density forecasts.
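
For illustration, the comparison of a model's MSPE with that of the no-change forecast can be sketched as below. This is a minimal sketch and not the thesis's code: the function names, the one-month horizon and the toy numbers are assumptions made here, and it assumes the reduction is measured relative to the no-change benchmark. A ratio below 1 means the model beats the benchmark; a ratio of 0.82 would correspond to an 18% reduction.

```python
def mspe(actual, forecast):
    """Mean squared prediction error between two equally long sequences."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mspe_ratio_vs_no_change(prices, model_forecasts, horizon=1):
    """Ratio of the model MSPE to the no-change MSPE.

    prices: observed price series p_0, ..., p_T (hypothetical data).
    model_forecasts[i]: forecast of prices[i + horizon] made at time i.
    The no-change forecast of prices[i + horizon] is simply prices[i].
    """
    actual = prices[horizon:]
    no_change = prices[:-horizon]
    return mspe(actual, model_forecasts) / mspe(actual, no_change)


# Toy example with made-up numbers (not taken from the thesis).
prices = [50.0, 52.0, 51.0, 55.0, 54.0]
model_forecasts = [51.0, 52.0, 54.0, 54.0]
ratio = mspe_ratio_vs_no_change(prices, model_forecasts)
print(f"MSPE ratio: {ratio:.3f}, reduction: {100 * (1 - ratio):.1f}%")
```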

In the final empirical chapter I highlight how oil studies typically assume that the model is correctly specified and thus ignore the problem of estimating overly optimistic confidence sets. This implies that model uncertainty is pervasive in the empirical results. Relaxing this specification assumption, I revisit the role of (i) oil supply, (ii) aggregate demand and (iii) oil-specific demand shocks, proposing Information Criterion model averaging as a strategy to address the problem of informational deficiency. In this analysis I consider a large macroeconomic panel, modelled with a structural vector autoregression model. The analysis is implemented with real and artificial data, and the non-orthogonalized impulse-response matrix shows that, in contrast to Kilian [2009], the oil price response is less persistent after an aggregate demand shock and more persistent following an oil-specific demand shock.
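
As a rough illustration of how information-criterion model averaging can work, the sketch below turns information criterion values into weights and averages impulse responses across candidate models. The abstract does not state which criterion or weighting scheme the thesis uses, so the exponential weighting shown here (the common Akaike-type weights), the function names and all numbers are assumptions made purely for illustration.

```python
import math


def ic_weights(ic_values):
    """Weights proportional to exp(-0.5 * (IC_i - min IC)); lower IC gets more weight."""
    best = min(ic_values)
    raw = [math.exp(-0.5 * (ic - best)) for ic in ic_values]
    total = sum(raw)
    return [r / total for r in raw]


def average_responses(responses, weights):
    """Weighted average of impulse responses, one list per model, by horizon."""
    horizons = len(responses[0])
    return [
        sum(w * r[h] for w, r in zip(weights, responses))
        for h in range(horizons)
    ]


# Hypothetical information criterion values for three candidate models.
weights = ic_weights([100.0, 102.0, 110.0])

# Hypothetical oil price responses at horizons 0, 1, 2 for each model.
responses = [
    [1.0, 0.8, 0.5],
    [1.2, 0.9, 0.7],
    [0.9, 0.4, 0.1],
]
averaged = average_responses(responses, weights)
print([round(w, 3) for w in weights])
print([round(x, 3) for x in averaged])
```

The averaged response leans towards the model with the lowest criterion value without discarding the others, which is the sense in which averaging reflects model uncertainty rather than conditioning on a single specification.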

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Subjects: H Social Sciences > HG Finance
Colleges/Schools: College of Social Sciences > Adam Smith Business School
Supervisor's Name: Korobilis, Professor Dimitris, Tsoukalas, Professor John and Phella, Professor Anthoulla
Date of Award: 2023
Depositing User: Theses Team
Unique ID: glathesis:2023-83938
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 14 Nov 2023 09:51
Last Modified: 14 Nov 2023 11:48
Thesis DOI: 10.5525/gla.thesis.83938
