Page 304 - Special Topic Session (STS) - Volume 1

STS441 David B. et al.
                  sources provide different perspectives on the stock market. In this paper, we
                  show the potential and the caveats of integrating heterogeneous statistical
                  and regulatory data on stock markets.
                      We focus on two data sources: the German Securities Holdings Statistics
                  (SHS) and data collected under the Markets in Financial Instruments Directive
                  (MiFID). The datasets differ in three respects. First, whereas the MiFID data
                  record every trade on German stock exchanges, the SHS data provide a monthly
                  snapshot of the composition of investments in securities in Germany. Second,
                  they differ in the granularity of the information on the investor: in the SHS
                  dataset, investor positions are aggregated to the level of the investor’s
                  economic sector. The exception is the reporting banks’ own investments in
                  securities, which are not aggregated with the investments of other actors in
                  the financial sector. The MiFID data, by contrast, provide granular information
                  on trades at the counterparty-by-counterparty level. The third and most
                  fundamental difference is that the datasets offer different perspectives on
                  the stock market. Whereas the SHS data show investors’ aggregated portfolios
                  and thus provide a portfolio perspective, the MiFID data show flows between
                  market participants and thus provide a flow perspective. The datasets are
                  similar in that both contain security-by-security information, i.e. the level
                  of granularity is the same with respect to the issuer and the unique
                  identifier of the security.
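The difference between the portfolio (stock) perspective of the SHS and the flow perspective of MiFID can be sketched in Python. The sketch is illustrative only: the record layouts, field names, placeholder identifiers and the functions `net_flows` and `roll_forward` are hypothetical, not the actual SHS or MiFID reporting formats. It also assumes that a bank's holdings change only through recorded trades, ignoring corporate actions, off-exchange trades and reporting errors. Under that assumption, netting a bank's trades per security turns flows into position changes, and adding them to the last SHS month-end snapshot rolls the portfolio forward between reporting dates.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ShsPosition:
    """Hypothetical SHS-style record: a bank's own holding of one security at a month-end."""
    bank_id: str
    isin: str          # security identifier shared with the MiFID data
    month_end: date
    quantity: float    # number of shares held


@dataclass(frozen=True)
class MifidTrade:
    """Hypothetical MiFID-style record: one trade between two counterparties."""
    trade_date: date
    isin: str
    buyer_id: str
    seller_id: str
    quantity: float    # number of shares traded


def net_flows(trades, bank_id, start, end):
    """Net shares bought by bank_id per ISIN for trades with start < trade_date <= end."""
    flows = defaultdict(float)
    for t in trades:
        if not (start < t.trade_date <= end):
            continue
        if t.buyer_id == bank_id:
            flows[t.isin] += t.quantity
        if t.seller_id == bank_id:
            flows[t.isin] -= t.quantity
    return dict(flows)


def roll_forward(positions, trades, bank_id, as_of):
    """Estimate holdings at as_of: latest SHS snapshot plus net MiFID flows since then.

    Assumes (hypothetically) that holdings change only through recorded trades.
    """
    own = [p for p in positions if p.bank_id == bank_id and p.month_end <= as_of]
    if not own:
        raise ValueError("no SHS snapshot on or before as_of")
    base_date = max(p.month_end for p in own)
    holdings = defaultdict(float)
    for p in own:
        if p.month_end == base_date:
            holdings[p.isin] += p.quantity
    for isin, flow in net_flows(trades, bank_id, base_date, as_of).items():
        holdings[isin] += flow
    return dict(holdings)


# Toy data with placeholder identifiers (not real banks or ISINs)
positions = [ShsPosition("BANK_A", "ISIN_1", date(2015, 1, 31), 500.0)]
trades = [
    MifidTrade(date(2015, 2, 3), "ISIN_1", "BANK_A", "BANK_B", 100.0),   # BANK_A buys
    MifidTrade(date(2015, 2, 10), "ISIN_1", "BANK_C", "BANK_A", 30.0),   # BANK_A sells
]
print(roll_forward(positions, trades, "BANK_A", date(2015, 2, 15)))  # {'ISIN_1': 570.0}
```

Keying both record types on the security identifier mirrors the one dimension on which the two datasets share the same granularity.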
                      Integrating the datasets allows us to exploit the strengths of both data
                  sources: whereas the SHS data provide a complete picture of banks’ securities
                  portfolios, the MiFID data provide more timely and more granular information
                  on investment decisions. To integrate the datasets, we proceed in two steps.
                  In the first step, we use machine learning methods to develop a matching
                  algorithm in a data-driven way. Specifically, we combine supervised
                  segmentation of the data with unsupervised association rule discovery. In the
                  second step, we refine the discovered rules with expert heuristics to obtain a
                  comprehensive set of rules for matching the data. We show that this approach
                  performs well in integrating a large subset of the data points. For this
                  subsample, the MiFID data allow us to continuously update the end-of-month
                  portfolio composition that banks report to the SHS and to analyse portfolio
                  risks between the monthly SHS reporting dates.
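Association rule discovery can be illustrated with a deliberately small sketch. It is not the authors' algorithm: it brute-forces only rules with one item on each side (a full implementation would use a level-wise method such as Apriori or FP-growth), it assumes the rules are mined separately within each segment produced by the supervised step, and the items, which stand for hypothetical features of candidate SHS/MiFID record pairs, are invented for illustration. A rule `a => b` is kept if its support (the share of transactions containing both a and b) and its confidence (the count of a and b together divided by the count of a) reach the given thresholds; rules that survive would then be candidates for refinement with expert heuristics.

```python
from collections import Counter
from itertools import permutations


def mine_pair_rules(transactions, min_support, min_confidence):
    """Return rules (a, b, support, confidence) with a single-item antecedent and consequent.

    support(a => b)    = share of transactions containing both a and b
    confidence(a => b) = count(a and b) / count(a)
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        # Ordered pairs, so a => b and b => a are counted as separate rules
        pair_counts.update(permutations(items, 2))
    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n
        confidence = both / item_counts[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    # Strongest rules first; ties broken alphabetically for a stable order
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical features of candidate record pairs within one segment:
#   qty_equal  - net MiFID flow equals the SHS position change
#   same_month - trade falls in the month between the two SHS snapshots
#   match      - illustrative label for pairs that were linked
segment = [
    {"qty_equal", "same_month", "match"},
    {"qty_equal", "same_month", "match"},
    {"same_month"},
    {"qty_equal", "match"},
]
for a, b, s, c in mine_pair_rules(segment, min_support=0.5, min_confidence=0.9):
    print(f"{a} => {b}: support={s:.2f}, confidence={c:.2f}")
# match => qty_equal: support=0.75, confidence=1.00
# qty_equal => match: support=0.75, confidence=1.00
```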
                      The paper is organized as follows. Section 2 describes the data and outlines
                  our methodology. Section 3 analyses our results. Section 4 discusses our
                  results and concludes.

                  2. Data
                      In this paper, we use data on banks’ own investments in equity securities
                  (SHS) and banks’ transactions in equity securities (MiFID) from 2014 to 2015.
                  To make matching the two datasets feasible, we first have to transform the
                  data so that both datasets show the same perspective on the stock market

                                                                     293 | ISI WSC 2019