Abstract:
Over the past 10 years, the peak floating-point execution rate (floating-point operations per second, or FLOPS) of parallel computers has increased by several orders of magnitude. Large-scale scientific applications rely on ever larger supercomputer configurations, some comprising more than a hundred thousand processors. Developing parallel software is widely regarded as time- and effort-intensive. Parallel algorithm design involves more than just using multiple processors; it must also address efficiency and scalability. A load-balanced assignment of tasks commonly speeds up execution. The GRAPES (Global and Regional Assimilation and PrEdiction System) global model is a semi-implicit, semi-Lagrangian numerical prediction model formulated in spherical coordinates. Parallelizing the GRAPES software system requires solving two problems: how to gather the data needed to compute the upstream points near the south and north poles along the Lagrangian trajectory, and how to set the values of the periodic boundary latitudinally and of the symmetry boundary longitudinally. Because the meridians converge, the longitudinal grid size decreases toward zero as the poles are approached, which makes parallelization near the poles difficult. Achieving a shorter running time on high-performance computer systems is a key issue for GRAPES software development. In this paper, a new method based on MPI group message passing is proposed. A more efficient load-balanced task assignment is also discussed. Experiments on the IBM Cluster 1600 of the China Meteorological Administration (CMA) show that the parallel algorithm is scalable, efficient, and portable. Its computing time is acceptable for real-time weather forecasting.
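The two geometric points above, the shrinking longitudinal grid size near the poles and the benefit of an even task assignment, can be illustrated with a minimal sketch. It is not the GRAPES implementation or the method proposed in this paper: the function names, the mean Earth radius value, and the simple contiguous block split are assumptions chosen only for illustration. The spacing follows the standard relation for a regular latitude-longitude grid, in which the east-west distance between neighbouring points is proportional to the cosine of the latitude.

```python
import math

# Mean Earth radius in metres (assumed value, used only for illustration).
EARTH_RADIUS_M = 6.371e6


def zonal_spacing_m(lat_deg, dlon_deg):
    """East-west distance between neighbouring grid points at a given latitude.

    On a regular latitude-longitude grid this is a * cos(lat) * dlon,
    so it shrinks toward zero as the poles are approached.
    """
    return EARTH_RADIUS_M * math.cos(math.radians(lat_deg)) * math.radians(dlon_deg)


def balanced_blocks(n_items, n_procs):
    """Split n_items into n_procs contiguous blocks whose sizes differ by at most one.

    Returns a list of (start, stop) half-open index ranges, one per process.
    This is a generic block decomposition, not the assignment used in GRAPES.
    """
    base, extra = divmod(n_items, n_procs)
    bounds = []
    start = 0
    for rank in range(n_procs):
        # The first `extra` ranks take one additional item each.
        size = base + (1 if rank < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


if __name__ == "__main__":
    for lat in (0, 60, 89):
        print(lat, zonal_spacing_m(lat, 1.0))
    print(balanced_blocks(10, 3))
```

With a 1-degree longitudinal increment, the spacing at 60 degrees latitude is half the equatorial value and it continues to shrink toward the pole, which is why equal index ranges near the poles cover very small physical distances. The block split keeps the number of grid rows per process within one of each other; a real model would also weigh the per-row cost, which this sketch does not attempt.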