Page 304 - Special Topic Session (STS) - Volume 1
STS441 David B. et al.
sources provide different perspectives on the stock market. In this paper, we
show the potential and caveats for integrating heterogeneous statistical and
regulatory data on stock markets.
We focus on two data sources, German Securities Holdings Statistics (SHS)
and data that are collected on the basis of the Markets in Financial Instruments
Directive (MiFID). The datasets differ in three respects. First, whereas the MiFID
data record every trade on German stock exchanges, the SHS data provide a
monthly snapshot of the composition of investments in securities in Germany.
Second, the datasets differ in the granularity of the information on the
investor: In the SHS dataset, investor positions are
aggregated to the level of the economic sector of the investor. Exceptions to
this rule are the reporting banks’ own investments in securities, which are not
aggregated with investments of other actors in the financial sector. The MiFID
data, on the other hand, provide granular information on trades on a
counterparty-by-counterparty level. The third and most fundamental
difference between the datasets is that they provide different perspectives on
the stock market. Whereas the SHS data show aggregated portfolios of
investors and thus provide a portfolio perspective, MiFID data show flows
between market participants. The datasets are similar in the sense that they
both contain security-by-security information, i.e. the level of granularity is the
same regarding the issuer and the unique identifier of the security.
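The contrast between the portfolio (stock) perspective and the trade (flow) perspective can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layouts, field names, bank labels and the placeholder ISIN are all assumptions made for illustration. The sketch nets trades into a monthly flow per bank and security, which can then be compared with the change between two month-end holdings of the same security.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Holding:
    """Month-end position (portfolio perspective); hypothetical fields."""
    bank: str
    isin: str
    month_end: date
    quantity: float


@dataclass(frozen=True)
class Trade:
    """Single transaction (flow perspective); hypothetical fields."""
    buyer: str
    seller: str
    isin: str
    trade_date: date
    quantity: float


def monthly_net_flows(trades):
    """Sum signed trade quantities per (bank, ISIN, year, month)."""
    flows = defaultdict(float)
    for t in trades:
        key = (t.isin, t.trade_date.year, t.trade_date.month)
        flows[(t.buyer, *key)] += t.quantity   # purchases raise the position
        flows[(t.seller, *key)] -= t.quantity  # sales lower the position
    return dict(flows)


def holding_change(previous, current):
    """Change in a position between two consecutive month-end snapshots."""
    return current.quantity - previous.quantity


# Invented example data; "DE0000000001" is a placeholder, not a real ISIN.
trades = [
    Trade("BANK_A", "BANK_B", "DE0000000001", date(2015, 2, 3), 100.0),
    Trade("BANK_C", "BANK_A", "DE0000000001", date(2015, 2, 17), 40.0),
]
flows = monthly_net_flows(trades)
jan = Holding("BANK_A", "DE0000000001", date(2015, 1, 31), 500.0)
feb = Holding("BANK_A", "DE0000000001", date(2015, 2, 28), 560.0)

# In this toy case the net flow of BANK_A equals its change in holdings.
net_flow_a = flows[("BANK_A", "DE0000000001", 2015, 2)]
delta_a = holding_change(jan, feb)
```

Both sides are keyed by the security identifier, which is the level of granularity the two datasets share. In real data the two figures need not coincide, for example when positions change through channels that the trade records do not capture.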
An integration of the datasets allows us to exploit the strengths of both
data sources: Whereas the SHS data provide a complete picture of banks’
securities portfolios, MiFID data provide more timely and more granular
information on investment decisions. To integrate the datasets, we proceed in
two steps. In the first step, we use machine learning methods for a data-driven
development of a matching algorithm. Specifically, we combine supervised
segmentation of the data with unsupervised association rule discovery. In the
second step, we refine the discovered rules with expert heuristics to develop
a comprehensive set of rules for matching the data. We show that this
approach performs well for the integration of a large subset
of data points. For this subsample, the MiFID data allow us to continuously
update the end-of-month portfolio composition that banks report to SHS and to
analyse portfolio risks between the monthly SHS reporting dates.
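The rule-discovery procedure is not specified in detail at this point, so the following Python sketch only illustrates the generic idea behind association rule discovery: scoring a candidate rule by its support and confidence. The function `rule_stats`, the item names and the example records are hypothetical and are not taken from the paper.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    records is a list of item sets; antecedent and consequent are item sets.
    """
    n = len(records)
    has_antecedent = [r for r in records if antecedent <= r]
    has_both = [r for r in has_antecedent if consequent <= r]
    support = len(has_both) / n if n else 0.0
    confidence = len(has_both) / len(has_antecedent) if has_antecedent else 0.0
    return support, confidence


# Each record describes one candidate pairing of an SHS position change with
# aggregated MiFID flows; all item names are invented for illustration.
candidates = [
    {"same_isin", "same_bank", "amounts_agree"},
    {"same_isin", "same_bank", "amounts_agree"},
    {"same_isin", "same_bank"},
    {"same_isin"},
]

support, confidence = rule_stats(
    candidates, {"same_isin", "same_bank"}, {"amounts_agree"}
)
```

In this toy example the rule covers half of all records and holds for two of the three records that satisfy its antecedent. Rules with high support and confidence would be natural candidates for the expert refinement described in the second step.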
The paper is organized as follows. Section 2 outlines our methodology.
Section 3 analyses our results. Section 4 discusses our results and concludes.
2. Data
In this paper, we use data on banks’ own investments in equity securities
(SHS) and banks’ transactions of equity securities (MiFID) from 2014 to 2015.
For a matching between the datasets to be feasible, we first have to transform
the data so that both datasets show the same perspective on the stock market
293 | ISI WSC 2019