Page 240 - Invited Paper Session (IPS) - Volume 1

IPS 151 Rubén C.

                  The execution times of the different implementations are shown in Tables 5
                  and 6.

                  Table 5: Execution Time of Sequential Version and pthread Version

                      Version                  Time (Seconds)
                      Sequential                     4
                      Parallel - 2 Threads           7
                      Parallel - 3 Threads           8
                      Parallel - 4 Threads           8
                      Parallel - 5 Threads           7
                      Parallel - 6 Threads           8
                      Parallel - 7 Threads           8

                  Table 6: Execution Time of Distributed and GPU Version

                      Version                        Time (Seconds)
                      Distributed - 96 Processes         0.047
                      GPU - 9,999 Processes V1           0.14
                      GPU - 9,999 Processes V2           0.000013

                      Version V1 of the GPU implementation includes the time to transfer the
                  Pearson coefficients from the card to the computer memory, whereas version
                  V2 considers only the processing time on the card. In the distributed version,
                  the maximum number of bootstrap iterations performed by any processor is 15,
                  and the reported execution time covers only local processing.

                  4.  Discussion and Conclusion
                      We have presented parallel and distributed implementations of the
                  bootstrap method for the calculation of Pearson correlation coefficients,
                  and we observe that the execution times obtained depend strongly on the
                  available infrastructure, which in this case is of accessible cost.
                      The lowest performance is that of the pthread implementation, due to
                  the limited number of available cores and to the bottleneck of the interface
                  with the computer memory. The distributed implementation provides adequate
                  performance and can process much larger problems. The GPU implementation
                  presents the best performance, but is limited by the memory capacity of the
                  graphics card.
                      Today it is possible to build hybrid architectures with computers holding
                  more than one graphics card and/or clusters of such machines, making it
                  possible to solve larger problem instances with adequate execution times.




                                                                     229 | ISI WSC 2019