Page 399 - Contributed Paper Session (CPS) - Volume 4

CPS2449 Louisa Nolan et al.
                Figure 2: Location of road traffic sensors in England (red dots) and of the ports analysed (blue pins)

            2.2 Understanding the characteristics of high-growth companies using
            novel data sources
                The goal of this project was to use novel data sources to explore the
            characteristics of firms with high growth. We used four data sources: the UK’s
            statistical business register (the Inter-Departmental Business Register, IDBR); a
            high-growth flag constructed by the Department for Business, Energy and
            Industrial Strategy (BEIS) from HMRC VAT data; a random sample of data from
            30,000 UK company websites, shared by the start-up GlassAI, including company
            descriptions, sectors, mentions, news articles, job adverts and bios; and
            geolocations of UK retail clusters from the Ordnance Survey.
                The IDBR, high-growth flag and GlassAI data were linked, giving a total
            sample of 5,500 companies, of which 8.6% were high-growth.
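
                The linkage step can be pictured with a small sketch. It assumes, purely
            for illustration, that all three sources carry a shared business reference; the
            paper does not say how the records were matched, and matching websites to
            register entries may in practice need fuzzy matching on names or addresses.
            The field names and every record below are invented.

```python
# Toy records keyed by a hypothetical shared business reference.
# Every value here is invented for illustration only.
idbr = {
    "B001": {"employees": 12, "sector": "retail"},
    "B002": {"employees": 250, "sector": "manufacturing"},
    "B003": {"employees": 40, "sector": "software"},
    "B004": {"employees": 8, "sector": "retail"},
    "B005": {"employees": 95, "sector": "logistics"},
}
high_growth_flag = {"B001": False, "B002": False, "B003": True, "B004": False}
website_sample = {
    "B001": {"job_adverts": 0},
    "B002": {"job_adverts": 3},
    "B003": {"job_adverts": 6},
    "B004": {"job_adverts": 1},
    "B005": {"job_adverts": 2},
}


def link_sources(register, flags, websites):
    """Inner-join the three sources on the shared business reference."""
    shared_refs = register.keys() & flags.keys() & websites.keys()
    return {
        ref: {**register[ref], **websites[ref], "high_growth": flags[ref]}
        for ref in sorted(shared_refs)
    }


linked = link_sources(idbr, high_growth_flag, website_sample)

# Share of linked firms that carry the high-growth flag.
high_growth_share = sum(rec["high_growth"] for rec in linked.values()) / len(linked)
print(len(linked), f"{high_growth_share:.1%}")
```

                Only firms present in all three sources survive the join, which is one
            reason a linked sample can be much smaller than any of its inputs.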
                A supervised gradient boosted classifier (GBC) was used to identify the
            features of high-growth firms, first from the IDBR data alone and then from
            the IDBR data linked to the GlassAI data.
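
                To show what a gradient boosted classifier does, the sketch below
            implements a deliberately small version using only the Python standard library.
            It is not the authors' implementation: the weak learners are one-split decision
            stumps, each leaf simply takes the mean residual, and the feature names, the
            data and the rule that generates the labels are all hypothetical. Each round
            fits a stump to the residuals of the log-loss (label minus predicted
            probability) and adds it to the running score; the reduction in squared error
            credited to each feature serves as a rough importance measure.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_stump(X, residuals):
    """Find the one-feature threshold split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            err = sse(left) + sse(right)
            if best is None or err < best["err"]:
                best = {"err": err, "feature": j, "threshold": t,
                        "left": sum(left) / len(left),
                        "right": sum(right) / len(right)}
    return best


def stump_value(stump, row):
    side = "left" if row[stump["feature"]] <= stump["threshold"] else "right"
    return stump[side]


def fit_gbc(X, y, n_rounds=20, learning_rate=0.3):
    """Boost stumps on the log-loss residuals; return model and importances."""
    base_rate = sum(y) / len(y)
    intercept = math.log(base_rate / (1.0 - base_rate))  # prior log-odds
    scores = [intercept] * len(y)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        importance[stump["feature"]] += sse(residuals) - stump["err"]
        stumps.append(stump)
        scores = [s + learning_rate * stump_value(stump, row)
                  for s, row in zip(scores, X)]
    model = {"intercept": intercept, "stumps": stumps, "rate": learning_rate}
    return model, importance


def predict(model, row):
    score = model["intercept"] + model["rate"] * sum(
        stump_value(s, row) for s in model["stumps"])
    return sigmoid(score) >= 0.5


# Hypothetical features: job adverts, firm age band, retail-cluster indicator.
feature_names = ["job_adverts", "firm_age_band", "in_retail_cluster"]
X = [[i % 10, (i * 7) % 13, (i // 10) % 2] for i in range(200)]
# Invented labelling rule, used only so the toy data has some structure.
y = [1 if row[0] > 5 else 0 for row in X]

model, importance = fit_gbc(X, y)
accuracy = sum(predict(model, row) == bool(label) for row, label in zip(X, y)) / len(y)
top_feature = feature_names[importance.index(max(importance))]
print(top_feature, accuracy)
```

                Fitting once on register-style features and once with website-derived
            features added, then comparing the importances, mirrors the two-stage comparison
            described above. A production analysis would more likely rely on an established
            library and held-out data than on a hand-written booster.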
                Spatial analysis was carried out to investigate whether high growth is
            related to location within retail clusters, where retail clusters may be seen
            as a broad proxy for density of economic activity. Finally, topic analysis was
            carried out on the GlassAI textual data.
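
                One simple way to frame the spatial question is to compare the share of
            high-growth firms inside and outside retail clusters. The sketch below assumes
            each cluster can be approximated by a centre point and a radius, which is an
            illustrative simplification rather than the Ordnance Survey representation or
            the authors' method. All coordinates, radii and flags are invented.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_any_cluster(lat, lon, clusters):
    """True if the point lies within the radius of at least one cluster."""
    return any(haversine_km(lat, lon, c_lat, c_lon) <= radius_km
               for c_lat, c_lon, radius_km in clusters)


def high_growth_rates(firms, clusters):
    """Return (rate inside clusters, rate outside clusters)."""
    inside = [hg for lat, lon, hg in firms if in_any_cluster(lat, lon, clusters)]
    outside = [hg for lat, lon, hg in firms if not in_any_cluster(lat, lon, clusters)]

    def rate(flags):
        return sum(flags) / len(flags) if flags else float("nan")

    return rate(inside), rate(outside)


# Hypothetical cluster: (latitude, longitude, radius in km).
clusters = [(51.5074, -0.1278, 5.0)]
# Hypothetical firms: (latitude, longitude, high-growth flag).
firms = [
    (51.5100, -0.1300, True),
    (51.5000, -0.1200, False),
    (53.4808, -2.2426, False),
    (52.2053, 0.1218, False),
]

rate_inside, rate_outside = high_growth_rates(firms, clusters)
print(rate_inside, rate_outside)
```

                A difference between the two rates on real data would still need a
            significance test and controls for sector and firm size before it could be read
            as a location effect.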

            2.3 Optimus - a tool to turn free text into hierarchical datasets
                Many datasets contain variables that consist of short free-text descriptions
            of items or products. Optimus is a tool developed with the Department for
            Environment, Food and Rural Affairs (DEFRA) to understand shipping
            manifests of ferry journeys. The manifests are short, messy text descriptions of
            cargo on lorries boarding ferries. The huge variation in detail, scale of
            description and how items are recorded (such as incorrect spellings or


                                                               388 | ISI WSC 2019