Tuesday, November 5, 2013

WPAC 2013 -- GFS-based Models v ECMWF

 TC Performance of GFS-based US Models v ECMWF 

2013 WPAC season

Mike Fiorino, NOAA ESRL, Boulder CO 
05 November 2013





Now that the 2013 season in the western North Pacific (WPAC) basin is nearly over, I've prepared preliminary error statistics for the tropical cyclone (TC) forecasts of three GFS-based US models v the ECMWF HRES (the 10-d, High-RESolution deterministic run of the IFS).

These results are preliminary for two reasons:
  1. as of the time of this writing, two systems are active: 30W and 31W (Haiyan).  31W is the 7th typhoon to undergo 'ED' - explosive deepening, which is defined as a 50 kt change in 24 h.  When you add in the RI - rapid intensification (30 kt / 24 h) storms, 11 of 31 typhoons have made large intensity changes.  Further, with the recent promotion of 31W, there have been 5 'supertyphoons' -- max winds >= 130 kt (a small classification sketch follows this list).
  2. I use the 'working' best track for verification vice the 'final' best track.  The final, post-season reanalysis of the fix data that produces the true or final best track is typically not available until around Feb/Mar for NHC, whereas JTWC finalizes the best track during the season.
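
For reference, the ED/RI bookkeeping above is just a threshold check on 24-h Vmax changes in the (working) best track. A minimal Python sketch, assuming a best track sampled every 6 h and that the 'change' is an increase in Vmax; the function name is illustrative, not part of any operational code:

def classify_intensity_change(vmax_kt, dt_h=6):
    """Strongest 24-h intensification category in a best track.

    vmax_kt: sequence of best-track Vmax values [kt], one every dt_h hours.
    RI (rapid intensification): >= 30 kt increase in 24 h.
    ED (explosive deepening):   >= 50 kt increase in 24 h.
    Returns 'ED', 'RI', or None.
    """
    steps = 24 // dt_h              # best-track points spanning 24 h
    label = None
    for i in range(len(vmax_kt) - steps):
        dv = vmax_kt[i + steps] - vmax_kt[i]
        if dv >= 50:
            return 'ED'             # ED implies RI, so return immediately
        if dv >= 30:
            label = 'RI'
    return label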
The models considered are (ATCF 4-char name in caps):
  • HWRF - a three-grid, high-resolution limited-area model that uses the GFS for initial and lateral boundary conditions. The grids have a horizontal resolution (dx) of 27:9:3 km
  • AVNO - the GFS global model, dx~21 km (a spectral model formerly known as the 'aviation' run)
  • FIM9 - the ESRL global model dx=15 km (finite-volume, flow-following icosahedral dynamical core)
  • EDET - ECMWF HRES, dx~15 km (a spectral model with 129 layers)


First consider the mean forecast (track) error, defined as the great-circle distance between the best-track and forecast positions.  Before looking at the stats, however, the verification system needs to be reviewed.

There were cases of tracker failures (inability to accurately locate the model TC) that had to be eliminated to avoid skewing the statistics.  These failures are obvious when plotting the tracks, as is done in operations, and are few in number in WPAC.  Details are given in appendix A.
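
For reference, here is a minimal Python sketch of the track-error computation and the large-error filter; the 200-nmi short-range threshold is illustrative only (the actual tossed cases are listed in appendix A), and the function names are mine, not the operational verification code:

import math

def track_error_nmi(lat_bt, lon_bt, lat_fc, lon_fc):
    """Great-circle distance [nmi] between best-track and forecast positions (deg)."""
    r_nmi = 3440.065                # mean Earth radius in nautical miles
    phi1, phi2 = math.radians(lat_bt), math.radians(lat_fc)
    dphi = math.radians(lat_fc - lat_bt)
    dlam = math.radians(lon_fc - lon_bt)
    a = math.sin(dphi / 2)**2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2)**2
    return 2.0 * r_nmi * math.asin(math.sqrt(a))

def keep_case(error_nmi, tau_h, max_short_range_error_nmi=200.0):
    """Toss obvious tracker failures: very large errors at the short-range taus."""
    return not (tau_h <= 24 and error_nmi > max_short_range_error_nmi)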

Figure 1.  WPAC 2013 mean forecast error [nmi]




There are no clear winners, in that no single model has lower numbers at all forecast times or 'taus'.  For the longer taus of 96 and 120 h (days 4 and 5, D4 & D5) there is greater separation because fewer storms go into the mean -- one bad storm can ruin the mean.  We'll dig into the details later...

A second measure of forecast performance is storm 'intensity', i.e., a comparison of the model max surface wind speed (Vmax) to the 'observed' value from the (working) best track.  The standard metric for intensity is the mean absolute error (no directionality), but from a modeling perspective the mean error itself, i.e., the bias, may be more important.  Although bias is not commonly displayed, I show it below:


Figure 2.  WPAC 2013 intensity or Vmax error: lines are the mean absolute error and the bars the mean error or bias
The bars show bias and the lines the mean absolute error.  In some ways the numbers are typical, but in other ways not:
  • HWRF has the highest spatial resolution (3 km) and takes great care in initializing the model vortex.  Consequently, the model has near-zero initial bias and shows only a slight over-intensification bias out to 120 h (5 kt). More remarkable is the mean abs error of 15 kt at 72 h.  One of the best statistical-dynamical intensity aids is LGEM (Logistic Growth Equation Model), and this aid serves as a baseline for performance.  In a head-to-head comparison at 72 h: HWRF: 16|1 kt v LGEM: 19|-12 kt [MM|BB, where MM is the mean abs error and BB is the bias].  A 15 kt mean abs error is simply excellent, and these statistics are among the best I have ever seen for any model, numerical or statistical, in WPAC.
  • The initial ECMWF mean|bias is 20|-18 kt.  These statistics imply a very poor analysis of the cyclone intensity.  The bias does decrease in time and is only -3 kt at tau 120 h.  This drop in bias is typical for the ECMWF model, and in the past there were cases where the bias went positive at 120 h.  The basic conclusion is that the ECMWF data assimilation system does a (very) poor job of analyzing the TC vortex.
  • The two GFS-based global models, AVNO and FIM9, both under-analyze and under-predict TC intensity.  The mean abs error is about 5 kt higher than HWRF at all taus, and most of the error comes from a negative (weak-storm) bias.
  • Large intensity errors do not imply large track errors.  The key result of my PhD work at the Naval Postgraduate School (Fiorino and Elsberry 1989) was that the TC inner core, where the intensity change occurs, does not affect the dynamics of vortex motion, i.e., the scales that dominate the motion process are much larger than the TC inner core (~50 km).  Nonetheless, the large intensity error in the ECMWF model may reflect vortex structure errors on larger scales.  We will explore this relationship in a separate blog...
  • FIM9, with higher horizontal resolution than the GFS (AVNO), has less bias than AVNO despite the tracker using the same grid (0.5 deg global).  The higher model resolution does permit more intense cyclones, and if we tracked at the native resolution, the bias would probably be even smaller...
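
As a concrete reference, the intensity statistics in Figure 2 reduce to a few lines of code.  A minimal Python sketch, assuming paired model/best-track Vmax values [kt] at a single tau (the function name is illustrative):

def intensity_stats(vmax_model, vmax_best_track):
    """Mean absolute error and bias (mean error) of Vmax [kt] at one tau."""
    errors = [m - o for m, o in zip(vmax_model, vmax_best_track)]
    mae  = sum(abs(e) for e in errors) / len(errors)
    bias = sum(errors) / len(errors)
    return mae, bias                # e.g., HWRF at 72 h: ~16 kt | ~1 kt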
One way to compensate for large intensity errors in the numerical models when forecasting is to apply statistical post-processing.  The scheme used at both JTWC and NHC is to calculate an initial 'offset' and then add a portion of the initial offset to the forecast.  The offset is defined as the intensity analyzed in operations minus the model initial intensity.  The operational intensity is generally not the same as the working best track, but since all intensities are analyzed to the nearest 5 kt, this does not make much of a practical difference, although it does leave some initial intensity error.  For global models, I apply 100% of the offset at tau 0 and 0% at tau 72 h, with a linear variation between 0 and 72 h.  For the limited-area models, the 0% point is set at tau 24 h.
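
A minimal Python sketch of that offset correction, with the 0-72 h (global) and 0-24 h (limited-area) linear ramps described above; the function name and arguments are illustrative:

def bias_corrected_vmax(vmax_model, tau_h, offset_kt, is_global=True):
    """Add a linearly decaying fraction of the initial intensity offset.

    offset_kt = operational initial intensity - model initial intensity.
    Global models: 100% of the offset at tau 0, 0% at tau 72 h.
    Limited-area models (e.g., HWRF): 0% is reached at tau 24 h.
    """
    tau_zero = 72.0 if is_global else 24.0
    weight = max(0.0, 1.0 - tau_h / tau_zero)
    return vmax_model + weight * offset_kt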

Here are the bias-corrected statistics:

Figure 3. WPAC 2013 intensity or Vmax error with bias correction: lines are the mean absolute error and the bars the mean error or bias
The big initial weak bias for ECMWF has been eliminated, and the correction has greatly reduced both the size of the bias from tau 0-48 h and the mean abs errors. Still, the global model errors are higher than HWRF (and LGEM, not shown).














Appendix A - details of the stats




  
Figure 1 plots these numbers:

                      000    012    024    036    048    072    096    120  
           HWRF      10.1   31.0   43.7   58.0   74.9  106.2  165.4  276.9
           AVNO      13.5   27.0   39.5   54.8   76.2  113.5  172.6  237.5
           FIM9      14.7   25.5   40.6   56.8   74.9  110.9  165.1  212.3
           EDET      22.0   27.2   39.2   55.5   68.4  108.5  178.2  247.3
           CONW       7.5   28.3   45.0   63.5   78.8  118.6  155.9  220.2
         #CASES       206    189    172    151    132    95     60     36    
 #Tossed( HWRF)       2      1      1      1     
 #Tossed( AVNO)       1      1      1      1     
 #Tossed( FIM9)       1      1     
 #Tossed( EDET)       1      2      2     

#Tossed is the number of cases removed at that forecast tau

model runs that were filtered:

BE filter Cases for: HWRF
stmid     dtg         tau     BE[nmi]
02W.2013  2013022018   12     221
14W.2013  2013083100   12     253

BE filter Cases for: AVNO
stmid     dtg         tau     BE[nmi]
14W.2013  2013082912   12     203

BE filter Cases for: FIM9
stmid     dtg         tau     BE[nmi]
02W.2013  2013022100    0     173
02W.2013  2013022100   12     335

BE filter Cases for: EDET
stmid     dtg         tau     BE[nmi]
08W.2013  2013071700    0      190
08W.2013  2013071612   12     217


Friday, November 1, 2013

FIM9 2013 LANT/EPAC

FIM9 Track Performance during 2013 LANT/EPAC seasons

Mike Fiorino, NOAA ESRL, Boulder CO 




Craig Mattocks, the new TECHDEV lead at NHC (I was in this position from 2006-2009), made an interesting comment in an email of 25 Oct 2013:

... "I have heard good things from the Hurricane Specialists this year about the FIM(9), especially the track accuracy."

To see if the 'good things' can be seen in the standard metric of track skill, I looked at the errors in the atLANTic (LANT) and Eastern north PACific (EPAC) basins using the working best tracks.
2013 atLANTic mean forecast ('track') error for HWRF, GFS, FIM9

For forecast times (or, as they say in .mil, 'taus') of 0-72 h, FIM9 did have slightly higher errors than the GFS.  Note the considerably poorer performance of HWRF, which is telling because all three models used the exact same initial (and, for HWRF, lateral boundary) conditions.  The errors at tau 120 h only represent the performance for one storm (09L - HU Humberto).

At tau 72 h there are more storms:
2013 LANT tau 72 storm-by-storm mean forecast error [nmi]


Most of the GFS improvement over the FIM9 at tau 72 h comes from storm 11L - TS Jerry (7 cases), whereas the FIM9 was better for storms 04L, 05L and 10L, with 1, 3, and 4 cases, respectively.

The statistics don't show a clear advantage of FIM9 over the GFS in the LANT, yet the hurricane specialists were impressed(?).  Do we need a better way to measure 'good things'?

The results in the EPAC are different:

2013 EPAC mean forecast error [nmi] for HWRF, GFS, FIM9


Here we find that FIM9 has lower errors than both HWRF and GFS at all taus.  Also note how the GFS errors grow faster than in FIM9.  One speculation is that while both GFS and FIM9 share a common set of physics, their dynamical cores are substantially different. The precipitation fields on my TCgen site http://ruc.noaa.gov/hfip/tcgen look different, with the GFS generally having more intense and smaller-scale precipitation events.

It's also impressive that HWRF has lower errors at the longer taus (72-120 h) compared to its host global model, the GFS.  This error growth is counterintuitive in that the lateral boundaries in HWRF should cause greater errors.  Further investigation is needed, but given the better errors for FIM9, maybe it is in the EPAC that the specialists are impressed...

For completeness, here are the mean tau 72 h errors by storm:
2013 EPAC storm-by-storm tau 72 mean forecast errors [nmi]



Storms 03E (5 cases) and 05E (3 cases) are noteworthy in that FIM9 has the lowest errors for 03E but higher errors for 05E.

Let's combine the basins for the EPAC & LANT stats:

EPAC & LANT 2013 mean forecast errors [nmi] HWRF v GFS v FIM9

No comparison is complete without including the 'gold standard' of TC track prediction - ECMWF HRES (this is how ECMWF refers to their hi-resolution deterministic run).  There were 8 cases of serious errors in the ECMWF tracker - failure to track the observed storm initially and large track changes in the first 24 h (with an implied speed of motion > 100 kt).  The main problem storm was 02L, in which the tracker jumped into the EPAC from the Bay of Campeche.  These errors will be the subject of an upcoming post... but my verification code had to be improved to toss out these cases.  Fortunately, no cases beyond 48 h were removed and most of the bad tracks were for taus 0-24 h.
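
A minimal Python sketch of that sanity check; the 100-kt implied-motion threshold is the one mentioned above, while the helper names are mine and not the actual verification code:

import math

def great_circle_nmi(lat1, lon1, lat2, lon2):
    """Great-circle distance [nmi] between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2)**2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2)**2
    return 2.0 * 3440.065 * math.asin(math.sqrt(a))

def plausible_track(positions, dt_h=12.0, max_speed_kt=100.0):
    """Reject forecast tracks whose implied speed of motion between successive
    (lat, lon) positions, dt_h hours apart, exceeds max_speed_kt -- e.g., a
    tracker jumping from the Bay of Campeche into the EPAC."""
    for (lat1, lon1), (lat2, lon2) in zip(positions, positions[1:]):
        if great_circle_nmi(lat1, lon1, lat2, lon2) / dt_h > max_speed_kt:
            return False
    return True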

For EPAC/LANT combined:
EPAC & LANT 2013 mean forecast error HWRF v GFS v FIM v ECMWF


ECMWF is clearly the gold standard at taus 48-120 h.

The results in the LANT are in some ways more dramatic:

LANT 2013 mean forecast error [nmi] HWRF v GFS v FIM9 v ECMWF
The large initial position errors and the relatively poorer performance for taus 0-24 h are explained by deficiencies in the ECMWF tracker and by the lack of any special treatment of TCs in their data assimilation system.  However, the stats at taus 48 and 72 h are hugely better than those of the US models.  A mean forecast error of 90 nmi may be a record low.  When I started in this business in the late 1970s, the typical 72-h mean error was around 350 nmi!

In the EPAC we find a similar pattern: ECMWF is better at the medium- and long-range taus (72-120 h).
EPAC 2013 mean forecast error [nmi] HWRF v GFS v FIM9 v ECMWF