Page 285 - Invited Paper Session (IPS) - Volume 1
IPS153 Christine B. et al.
Until recently, migration flows were largely based on the responses
provided by travellers on their passenger cards when crossing the border. The
intention to stay in NZ (or leave NZ) for 12 months or more, as reported by
the traveller, was used to determine whether the border-crossing was a
migrant crossing. This measure allowed New Zealand to produce some of the
most timely migration statistics in the world. However, inter-censal
discrepancies and analysis of total traveller in-flows and out-flows
showed that the measure was generating inaccuracies in the migration
estimates, particularly between 2001 and 2006, when net migration was likely
under-estimated by approximately 50,000 over the five years (Stats NZ 2017b).
This led to the development of an outcomes-based measure of migration,
where the amount of time a traveller spends in or out of New Zealand, rather
than their stated intentions, is used to determine migration status. Using
passport data acquired as travellers cross the New Zealand border, we create
a travel history for almost every traveller. This gives us a longitudinal ‘register’
of travel histories for individuals. By applying a classification rule to these travel
histories, we can classify the migrant status of any given border-crossing. In
New Zealand, the particular classification method is called the ‘12/16 month
rule’ (Stats NZ 2017b). For example, an arrival who has been out of the country
for at least 12 of the 16 months prior to their border-crossing is
classified as a migrant arrival if they spend 12 months or more in New Zealand
over the 16 months following the border-crossing. This method estimates
migration more accurately, as we are no longer reliant on travellers’ self-reported
intentions.
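As an illustration only, the arrival case of the rule described above might be sketched as follows. The sketch assumes that "12 months" and "16 months" can be approximated as 365 and 487 days, and that a travel history is a date-sorted list of (date, kind) pairs; the names `days_in_nz` and `is_migrant_arrival`, the day counts, and the data layout are all hypothetical and are not taken from the Stats NZ implementation.

```python
from datetime import date, timedelta

WINDOW_DAYS = 487      # assumed day-count approximation of "16 months"
THRESHOLD_DAYS = 365   # assumed day-count approximation of "12 months"


def days_in_nz(history, start, end):
    """Count days spent in New Zealand within the half-open window [start, end).

    history: list of (date, kind) tuples sorted by date, where kind is
    "arrival" or "departure" (a hypothetical layout for a travel history).
    """
    if not history:
        return 0
    # Before the first recorded crossing, the traveller is assumed to be
    # wherever that crossing started from (in NZ if it is a departure).
    in_nz = history[0][1] == "departure"
    total = 0
    cursor = start
    for when, kind in history:
        if when <= start:
            # Crossings up to the window start only set the initial state.
            in_nz = kind == "arrival"
            continue
        if when >= end:
            break
        if in_nz:
            total += (when - cursor).days
        cursor = when
        in_nz = kind == "arrival"
    if in_nz:
        total += (end - cursor).days
    return total


def is_migrant_arrival(history, arrival_date):
    """Apply the 12/16-style test to one arrival in a travel history."""
    window = timedelta(days=WINDOW_DAYS)
    days_in_before = days_in_nz(history, arrival_date - window, arrival_date)
    days_in_after = days_in_nz(history, arrival_date, arrival_date + window)
    days_out_before = WINDOW_DAYS - days_in_before
    return days_out_before >= THRESHOLD_DAYS and days_in_after >= THRESHOLD_DAYS


# A single arrival with no later departure, and a one-month visit.
settler = [(date(2015, 1, 10), "arrival")]
visitor = [(date(2015, 1, 10), "arrival"), (date(2015, 2, 10), "departure")]
print(is_migrant_arrival(settler, date(2015, 1, 10)))
print(is_migrant_arrival(visitor, date(2015, 1, 10)))
```

Under these assumptions the first history is classified as a migrant arrival and the second is not, because the visitor spends only 31 days in the country during the follow-up window. A departure rule would mirror this logic with the roles of time in and time out of the country reversed.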
As with any administrative data, identity resolution is a challenge, especially
given the number of border-crossings that we must resolve. Currently, our
longitudinal administrative dataset extends back to 2013 and contains over 65 million
border-crossings that require identity resolution to identify crossings made
by the same traveller. On average, 40,000 border-crossings are added to this
dataset each day. Based on the passport records available since 2013, we estimate that
these border-crossings represent over 17 million unique travellers. The
resolution of these individuals provides the basis of a register of travellers.
While the outcomes-based measure provides more accurate estimates of
migration, it requires a wait of 16 months before migration levels are known
with certainty. Official external migration estimates have been published
monthly, within a month of the reference date. To maintain timeliness with the
12/16 method, we have recently developed a predictive classification model,
which provides provisional estimates of migration for the latest 16 months.
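To make the timing constraint concrete, the following minimal sketch separates border-crossings whose 16-month follow-up window has fully elapsed (so the rule can be applied with certainty) from those that still need a provisional, model-based classification. It assumes "16 months" is approximated as 487 days; the function name `estimate_status` and the status labels are hypothetical.

```python
from datetime import date, timedelta

WINDOW_DAYS = 487  # assumed day-count approximation of "16 months"


def estimate_status(crossing_date, as_of):
    """Return "final" if the full follow-up window has elapsed, else "provisional"."""
    window_end = crossing_date + timedelta(days=WINDOW_DAYS)
    return "final" if as_of >= window_end else "provisional"


# Crossings observed as of a hypothetical reference date.
as_of = date(2019, 1, 1)
for crossing in (date(2017, 1, 1), date(2018, 12, 1)):
    print(crossing, estimate_status(crossing, as_of))
```

With this reference date, the 2017 crossing can be classified from its observed outcome, while the December 2018 crossing falls within the latest 16 months and would rely on the predictive model.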
This is a difficult prediction problem because of the large class
imbalance between migrant and non-migrant border-crossings: migrants make up less
than 2% of all border-crossings. We have employed a machine learning
approach, learning at the unit record level, combined with a multiple
274 | ISI WSC 2019