Page 305 - Special Topic Session (STS) - Volume 1

STS441 David B. et al.
(either flows or stocks). To this end, we derive transactions – a flow-perspective
– from the SHS data by taking the first difference between the monthly
reported stocks. To adjust for the difference in reporting frequencies between
the datasets, we then aggregate transactions in the MiFID data to their
monthly sum, netting purchase and selling transactions. The result is two
transformed datasets (SHS* and MiFID*) that show aggregated monthly
transactions of banks on a security-by-security basis. This leaves us with
764,713 data points. Figure 3 shows the distribution of the differences
between the datasets (red bars and upper axis) and the distribution of
normalized differences² (blue bars and lower axis). Because of the symmetry
of the distribution around zero, on an aggregate level, the positive and
negative deviations cancel each other out. Thus, both data sources show the
same change in banks’ aggregate stock of equity securities. Turning to the
more granular security-by-security level, we find an exact match of the
transactions for 26% of the data points. For 81% of the data points, the
absolute difference is below EUR 10,000 (the average volume of a transaction
in the SHS* data is EUR 42,685).
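
The transformation and comparison described above can be sketched as follows. This is a minimal illustration only, not the code used for the analysis: the function names, the record layouts, the toy figures, and the sign convention (SHS* minus MiFID*) are all assumptions, and the sketch presumes that every month is present in each stock series.

```python
from collections import defaultdict


def stocks_to_transactions(stocks):
    """First difference of monthly stocks (flow perspective).

    `stocks` maps (bank, isin) -> {"YYYY-MM": stock}. Assumes no missing
    months, so consecutive entries are one month apart.
    Returns {(bank, isin, month): transaction}.
    """
    flows = {}
    for key, series in stocks.items():
        months = sorted(series)
        for prev, cur in zip(months, months[1:]):
            flows[(*key, cur)] = series[cur] - series[prev]
    return flows


def net_monthly_transactions(trades):
    """Aggregate single trades to their monthly net sum.

    `trades` is an iterable of (bank, isin, "YYYY-MM-DD", side, amount),
    where side is "buy" or "sell" (hypothetical layout). Sales enter
    with a negative sign, so purchases and sales are netted.
    """
    net = defaultdict(int)
    for bank, isin, date, side, amount in trades:
        sign = 1 if side == "buy" else -1
        net[(bank, isin, date[:7])] += sign * amount
    return dict(net)


def compare(shs_star, mifid_star, threshold=10_000):
    """Differences on common data points, plus two summary shares."""
    keys = shs_star.keys() & mifid_star.keys()
    if not keys:
        raise ValueError("no common data points")
    diffs = {k: shs_star[k] - mifid_star[k] for k in keys}
    n = len(diffs)
    share_exact = sum(1 for d in diffs.values() if d == 0) / n
    share_below = sum(1 for d in diffs.values() if abs(d) < threshold) / n
    return diffs, share_exact, share_below


# Toy data (invented for illustration, amounts in EUR).
stocks = {("BANK_A", "XS0001"): {"2019-01": 100_000,
                                 "2019-02": 150_000,
                                 "2019-03": 120_000}}
trades = [
    ("BANK_A", "XS0001", "2019-02-05", "buy", 70_000),
    ("BANK_A", "XS0001", "2019-02-20", "sell", 20_000),
    ("BANK_A", "XS0001", "2019-03-11", "sell", 35_000),
]

shs_star = stocks_to_transactions(stocks)
mifid_star = net_monthly_transactions(trades)
diffs, share_exact, share_below = compare(shs_star, mifid_star)
```

In this toy example the February transactions match exactly, while March deviates by EUR 5,000 and therefore still falls below the EUR 10,000 threshold.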

Figure 3: Distribution of Differences between the Datasets

Note: The figure shows the distribution of the differences between the
datasets (red and upper axis) and the distribution of normalized differences
(blue and lower axis). For the normalized differences, we exclude differences
of zero.
If the mismatches in Figure 3 have a structural underpinning, we can use
machine learning methods to mine rules for isolating transactions that can be

2 For the normalization, we use the inverse hyperbolic sine scaling function.
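
A minimal sketch of this normalization, assuming the inverse hyperbolic sine is applied directly to each non-zero difference without an additional scale parameter (the text does not state this); the helper names are hypothetical.

```python
import math


def ihs(x):
    """Inverse hyperbolic sine: asinh(x) = ln(x + sqrt(x**2 + 1)).

    Defined for negative values and zero, unlike the logarithm, and
    close to sign(x) * ln(2 * |x|) for large |x|.
    """
    return math.asinh(x)


def normalized_nonzero(differences):
    """Normalize differences, excluding zeros as in the figure note."""
    return [ihs(d) for d in differences if d != 0]


# Invented example values (EUR): one exact match and four deviations.
example = [0, 5_000, -5_000, 250, -1_000_000]
normalized = normalized_nonzero(example)
```

The transformation is symmetric around zero, so positive and negative deviations of equal size are mapped to values of equal magnitude and opposite sign.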
294 | ISI WSC 2019
