Some Verification Trends at the Albany Forecast Office

National Weather Service Forecast Office
Albany, New York

ABSTRACT

During the past several years, there has been an increasing call for the automation of public forecasts issued by the National Weather Service (NWS). This call is the result of national verification statistics which show that, when all forecasts are averaged together, the improvement of NWS forecasters over computer-generated forecasts of maximum/minimum temperature and probability of precipitation is small. However, grouping all forecasts in such a manner can hide certain trends which have been evident to field forecasters for many years: computer-generated forecasts are excellent and hard to beat when the weather is seasonably normal, but field forecasters do much better when the weather is unusual. This paper presents the results of a verification study which shows a significant correlation between abnormal temperature patterns and the ability of field forecasters at the Albany, New York forecast office to improve upon computer-generated forecasts. The forecasters improve not only upon the computer-generated temperature forecasts, but upon the probability of precipitation forecasts as well.

1. INTRODUCTION

The National Weather Service (NWS) has produced a Model Output Statistics (MOS) guidance package (Glahn and Lowry 1972) since the early 1970's. For more than two decades, MOS guidance was based on output from the Limited Fine-Mesh (LFM) model (Newell and Deaven 1981) and was known as the FPC guidance (National Weather Service 1983). The FPC guidance quickly became the standard used to measure local forecast performance. Overall, most local forecasters had little difficulty improving upon the FPC forecasts, as was shown by the initial verification results and by NWS AFOS era Verification (AEV) results (Dagostaro 1985). Since the late 1980's, another MOS guidance package has been produced based on output from the Nested-Grid Model (NGM; Hoke et al. 1989) and it is known as the FWC guidance (National Weather Service 1992). Overall, the NGM has been much better than the LFM model, and this resulted in the FWC forecasts being better than the FPC forecasts, once there was a sufficient data base of NGM data to use for MOS equation development. For many years the FPC and FWC guidance packages were produced simultaneously. During much of this time, the FPC guidance remained the standard used to measure local forecast performance. However, in 1993, the FWC guidance became the standard for comparison and the FPC guidance was discontinued shortly thereafter.

Since 1993, AEV results indicate, overall, that local forecasts of probability of precipitation (PoP) have been about the same as the FWC forecasts, and the local 12-hr maximum/minimum temperature (TEMP) forecasts have been a little better than the FWC forecasts. These verification results might appear to suggest that local forecasters add very little additional value to the 6- to 60-hr general public forecasts, and that these forecasts could now be automated through the use of computer worded forecasts (Glahn 1979) based on MOS. However, a verification study was carried out at the Albany, New York forecast office, which shows that local forecasters at Albany significantly improve upon the FWC guidance when large temperature anomalies occur. Of course, the weather is of considerable interest to the general public during periods when the regime is anomalous compared to the average conditions expected at a given time of the year. The public's attention to weather information increases greatly during periods of unusually cold or hot conditions, unusually wet or dry periods, or when major storms approach. This study will show that the NWS forecasters at Albany were able to add considerable value to public forecast products during those periods when unusually cold or hot conditions occurred. In contrast, the ability of local forecasters to make significant improvements to MOS guidance during unusually wet or dry periods was not conclusive, and no effort was made in this study to quantify forecaster improvement over guidance for individual major storm events.

2. DEFINITIONS

PoP and TEMP forecasts were examined for Albany, New York and for Burlington, Vermont by using AEV data for the period of July 1993 through March 1997. For each month, the local forecast improvements over FWC forecasts were determined for PoP, TEMP, and PoP/TEMP forecasts combined. The Frequently and Effectively Departs Significantly (FEDS) score (Maglaras 1991) was used to determine the local forecast improvement over MOS for TEMP, PoP, and TEMP/PoP forecasts combined. This score is based on the premise that one of the most desirable overall verification measures is to determine how frequently local forecasters deviate substantially from MOS, and how effective they are when they do so. Thus, the FEDS score is calculated by multiplying the frequency of significant changes (F), by the improvement over MOS (I) when significant changes are made, and then dividing by ten. To this total, the overall percent improvement over MOS (OI) is then added. Hence:

FEDS = (( F x I ) / 10) + OI

For TEMP forecasts, a significant change is defined as a case in which the local forecast deviated from MOS by 3°F or more. For PoP forecasts, a significant change is defined as a case in which the local forecast deviated from MOS by 20% or more. Forecasters who frequently deviate significantly from MOS guidance, and who are also effective when they do so, will have the highest FEDS scores. Forecasters who do not deviate frequently or who are not effective when they do so, or both, will have lower FEDS scores.
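The FEDS calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: F, I, and OI are all taken to be percentages, and the improvement is measured as the percent reduction in total absolute error, which the text does not specify (a PoP application would more likely use a probability score). The function names and the sample numbers are hypothetical.

```python
def percent_improvement(local, mos, obs):
    """Percent reduction in total absolute error of local forecasts relative to MOS.

    Assumption: absolute error is the accuracy measure; the text does not
    say which measure underlies I and OI.
    """
    mos_err = sum(abs(m - o) for m, o in zip(mos, obs))
    local_err = sum(abs(f - o) for f, o in zip(local, obs))
    if mos_err == 0:
        return 0.0  # MOS had no error; percent improvement is undefined, report 0
    return 100.0 * (mos_err - local_err) / mos_err


def feds_components(local, mos, obs, threshold):
    """Return (F, I, OI) in percent for paired local, MOS, and observed values."""
    # A significant change is a local forecast at least `threshold` away from MOS.
    significant = [abs(f - m) >= threshold for f, m in zip(local, mos)]
    n_sig = sum(significant)
    f_pct = 100.0 * n_sig / len(local)
    sig_local = [f for f, s in zip(local, significant) if s]
    sig_mos = [m for m, s in zip(mos, significant) if s]
    sig_obs = [o for o, s in zip(obs, significant) if s]
    i_pct = percent_improvement(sig_local, sig_mos, sig_obs) if n_sig else 0.0
    oi_pct = percent_improvement(local, mos, obs)
    return f_pct, i_pct, oi_pct


def feds_score(f_pct, i_pct, oi_pct):
    """FEDS = ((F x I) / 10) + OI."""
    return (f_pct * i_pct) / 10.0 + oi_pct


# Hypothetical monthly values: significant changes 20% of the time, a 30%
# improvement when they are made, and a 5% improvement overall.
print(feds_score(20.0, 30.0, 5.0))  # 65.0

# Hypothetical temperatures (deg F) for four forecast periods, threshold 3 deg F.
local = [50, 60, 70, 80]
mos = [50, 64, 70, 75]
obs = [52, 60, 69, 80]
print(feds_components(local, mos, obs, threshold=3))
```

With only a handful of cases the components become very large, so the sketch is meaningful only when applied to a full month of forecasts.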

In order to determine how much the weather deviated from normal in terms of temperature, the Average Daily Temperature Departure (ADTD) was calculated for each month. The ADTD is defined as the sum of the absolute values of the daily temperature departures, divided by the number of days in the month,

$$\mathrm{ADTD} = \frac{1}{m} \sum_{n=1}^{m} \left| T_O - T_A \right|$$

where the sum runs over the days n of the month, m = number of days in the month, T_O = the observed daily average temperature, and T_A = the climatological daily average temperature.
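The ADTD definition above translates directly into code. The following sketch compares it with the ordinary monthly mean departure for a hypothetical 30-day month that is cold in its first half and warm in its second half; the temperatures are invented for illustration and are not data from this study.

```python
def adtd(observed, normal):
    """Average Daily Temperature Departure: mean absolute daily departure."""
    if len(observed) != len(normal):
        raise ValueError("observed and normal must cover the same days")
    return sum(abs(o - a) for o, a in zip(observed, normal)) / len(observed)


def mean_departure(observed, normal):
    """Ordinary (signed) monthly mean departure from normal."""
    return sum(o - a for o, a in zip(observed, normal)) / len(observed)


# Hypothetical daily average temperatures (deg F): 15 days at 12 below normal
# followed by 15 days at 12 above normal.
normal = [22.0] * 30
observed = [10.0] * 15 + [34.0] * 15

print(mean_departure(observed, normal))  # 0.0
print(adtd(observed, normal))            # 12.0
```

The signed departures cancel, while the absolute departures do not, which is why the ADTD can flag a month as anomalous even when its mean departure is near zero.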

The temperature departure from normal for each month can sometimes mask extreme weather changes that occur during the course of the month. For example, at both Albany and Burlington, on a daily basis, the temperature averaged 10 to 15°F below normal for about the first half of January 1996, and the weather also featured major snowstorms. For the last half of January 1996, the temperature averaged 10 to 15°F above normal and there were heavy rainfalls accompanied by record, or near-record breaking, flooding. The temperature departure from normal at Albany (Burlington) for the month was 0°F (+1.2°F), and masked the extreme nature of the weather that month. On the other hand, the ADTD was 12.4, which confirmed that January 1996 was the second most anomalous month of the 45-month sample in terms of temperature.

One example of the improvement over MOS temperature forecasts by Albany local forecasters during the cold part of the month is the set of forecasts made from the 1200 GMT cycle on January 5, 1996. For Albany, MOS forecasts were -3°F, 20°F, -1°F, and 20°F for the first, second, third, and fourth periods, respectively. Local forecasts were -10°F, 14°F, -5°F, and 17°F. The observed temperatures were -19°F, 6°F, -6°F, and 6°F. For Burlington, MOS forecasts were -10°F, 13°F, -7°F, and 19°F. The local forecasts were -19°F, 7°F, -10°F, and 15°F. The observed temperatures at Burlington were -23°F, 2°F, -16°F, and 3°F. The forecaster on duty this day was able to improve on MOS guidance by a total of 43°F, or an average of more than 5°F per forecast period. Even so, an additional 60°F of MOS error remained uncorrected for this set of forecasts, out of a total MOS error of 103°F. The observed temperatures during this period were about 25°F to 30°F below normal.

Large improvements over MOS were also made during the warm part of the month. For example, on January 19, 1996 the high temperature at Albany reached 60°F, and at Burlington the high reached 65°F. These temperatures were 30°F to 35°F above normal. The second period MOS forecasts that were made for this day from the 1200 GMT cycle on January 18, 1996, were 44°F and 45°F, respectively. The corresponding local forecasts were 54°F and 60°F. This resulted in an improvement over MOS of 10°F at Albany and 15°F at Burlington. The improved local temperature forecasts were significant not only because they let the public know how unusually warm it would be that day, but also because they showed that the forecasters on duty were anticipating that snow melt would be a significant factor in the record-breaking flooding that would eventually occur on January 19 and 20.
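The arithmetic behind the January 19 improvements can be checked with a short sketch. It assumes that improvement is the reduction in absolute error relative to MOS; the function name and data layout are illustrative only, while the temperatures are those quoted above.

```python
def improvement_over_mos(mos, local, observed):
    """Reduction in absolute error (deg F) of the local forecast relative to MOS."""
    return abs(mos - observed) - abs(local - observed)


# Second-period forecasts from the 1200 GMT cycle on January 18, 1996,
# verifying against the January 19 high temperatures quoted in the text.
cases = {
    "Albany": {"mos": 44, "local": 54, "observed": 60},
    "Burlington": {"mos": 45, "local": 60, "observed": 65},
}

for station, c in cases.items():
    gain = improvement_over_mos(c["mos"], c["local"], c["observed"])
    print(station, gain)  # Albany 10, Burlington 15
```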

Precipitation anomaly measures were also calculated. For each of the 45 months that comprised the data sample, the Monthly Precipitation Amount Departure from normal (MPAD) was calculated (in percent), and was used to determine the anomaly for precipitation amount. In order to determine how anomalous each month was in terms of the frequency of measurable (≥ 0.01 inch) precipitation events, the Monthly Precipitation Frequency Departure (MPFD) was also calculated (in percent).
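A minimal sketch of the two precipitation measures follows. The text states only that both are expressed in percent, so the formula used here (departure from normal divided by the normal value) is an assumption, as are the function names and the invented daily amounts.

```python
def percent_departure(observed, normal):
    """Departure from normal as a percentage of the normal value (assumed form)."""
    return 100.0 * (observed - normal) / normal


def mpad(daily_precip_in, normal_total_in):
    """Monthly Precipitation Amount Departure, in percent."""
    return percent_departure(sum(daily_precip_in), normal_total_in)


def mpfd(daily_precip_in, normal_wet_days, threshold_in=0.01):
    """Monthly Precipitation Frequency Departure, in percent.

    A day counts as a precipitation event when its total is at least
    `threshold_in` inches (measurable precipitation).
    """
    wet_days = sum(1 for p in daily_precip_in if p >= threshold_in)
    return percent_departure(wet_days, normal_wet_days)


# Hypothetical month: 15 days with a light 0.10 inch each and 15 dry days,
# compared with hypothetical normals of 3.00 inches and 12 wet days.
daily = [0.10] * 15 + [0.0] * 15
print(round(mpad(daily, normal_total_in=3.0), 1))   # amount well below normal
print(round(mpfd(daily, normal_wet_days=12), 1))    # frequency above normal
```

In this invented month the amount is about half of normal while the frequency is above normal, so the two measures can point in opposite directions.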

3. RESULTS

Fig. 1 is a scatter diagram of the combined TEMP/PoP FEDS score and the ADTD for each month in the verification data sample. These results reveal a correspondence between the improvement of Albany local forecasts over FWC guidance and the departure of temperature from normal. As the ADTD increases (greater temperature anomalies), the local forecast improvement over guidance increases considerably. Past experience with the combined TEMP/PoP FEDS score has shown that a value over 50 was a good score, and a value of 100 or more was an outstanding score. Closer examination of the scatter diagram in Fig. 1 reveals that when the ADTD was six or less, the TEMP/PoP FEDS score averaged slightly more than zero, which indicated little or no overall improvement over MOS. When the ADTD was between six and 10, the TEMP/PoP FEDS score averaged around 35, which indicated a solid improvement over MOS. In addition, with only a few exceptions, forecasters did better than MOS. Finally, when the ADTD was 10 or more, the TEMP/PoP FEDS score averaged around 110, which indicated an outstanding improvement over MOS.

Figs. 2 and 3 are the same as Fig. 1, except they show the relationship of the ADTD to the TEMP FEDS score and the PoP FEDS score, respectively. These results also reveal a correspondence between the improvement of Albany local forecasts over FWC guidance and the departure of temperature from normal. As the ADTD increases, the local TEMP and PoP forecast improvement over guidance increases. However, the increase in the improvement over guidance for TEMP and PoP forecasts, individually, was not as great as the increase for TEMP/PoP forecasts combined.

Fig. 2 also reveals that Albany local TEMP forecasts were as good as or better than MOS nearly all the time, while Fig. 3 shows that local PoP forecasts were better than MOS only about half of the time.

Based on Figs. 1, 2, and 3, when large temperature anomalies occur, forecasters make significant improvements not only over MOS TEMP guidance, but over MOS PoP guidance as well. This is not surprising since temperature, precipitation, and other meteorological variables are not independent. For example, even if the air mass is not very cold, a cloudy and rainy day in the summer will usually result in a daytime maximum temperature that is 10 to 20°F below normal. On the other hand, a calm, clear, precipitation-free night in January when there is snow cover on the ground could result in a nighttime minimum temperature that is 10 to 20°F below normal. In these scenarios, the abnormal surface temperature readings are the result of interactions with other meteorological variables. Frequently, especially when MOS has not adequately taken into account precipitation, clouds, and wind, MOS TEMP forecasts will not do well in such scenarios and local forecasters have a good opportunity to make significant improvements over MOS forecasts. However, before large improvements can be made, local forecasters must also correctly forecast precipitation, clouds, and wind. The combined TEMP/PoP FEDS score was used in this paper as an overall measure of forecaster performance. The fact that the combined TEMP/PoP FEDS score is higher than for the individual elements reflects the fact that, overall, when large temperature deviations occur, local forecasters not only provide better forecasts of temperature than does MOS, but, in order to do so, they must also provide better forecasts of other meteorological variables. The verification data used in this study support this hypothesis. On a monthly basis, most of the time, when the local TEMP FEDS score was high and the ADTD was large, the local PoP FEDS score was also high. Conversely, when the local TEMP FEDS score was low and the ADTD was low, the local PoP FEDS score was low.

An analysis of the FEDS score and its relationship to the ADTD, MPAD, and MPFD anomaly measures was done by using the Statistical COrrelation and REgression program (SCORE) (Wooldridge and Burrus 1997). The results of this analysis are shown in Table 1. For temperature departures, Table 1 reveals that the correlations of the TEMP, PoP, and combined TEMP/PoP FEDS scores to the ADTD were higher than for any other measure and were 54.6%, 48.2%, and 64.9%, respectively. Also, the statistical correlation between the combined TEMP/PoP FEDS score and the ADTD, which was 64.9% and is shown graphically in Fig. 1, was considerably higher than for the TEMP and PoP FEDS scores, individually. This adds support to the hypothesis discussed in the previous paragraph, that when large temperature deviations occur, local forecasters will usually make significant improvements over MOS for both elements, resulting in a higher combined FEDS score.

The correlations in Table 1 were tested for significance using the F-test. The results of the F-test showed that the correlations of the TEMP, PoP, and TEMP/PoP FEDS scores, respectively, to the ADTD were all significant at the 99% level.

For precipitation amount departures, Table 1 shows that the correlations of the TEMP, PoP, and the TEMP/PoP FEDS scores to the MPAD were 27.3%, 12.5%, and 26.0%, respectively. For precipitation frequency departures, the correlations of the TEMP, PoP, and the TEMP/PoP FEDS scores to the MPFD were 8.0%, -34.1%, and -13.3%, respectively. The F-test showed that the correlation of the PoP FEDS score to the MPFD was significant at the 95% level, while all the other correlations were not significant.
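The significance levels quoted above can be reproduced approximately with a short sketch. It assumes that each correlation comes from a simple linear regression on the 45 monthly pairs, so that F = r²(n - 2)/(1 - r²) with 1 and n - 2 degrees of freedom; the exact procedure used by SCORE is not described in the text. The p-value is obtained from the equivalence between F(1, n - 2) and the square of Student's t, integrated numerically so that only the standard library is needed. All function names are illustrative.

```python
import math

N_MONTHS = 45  # July 1993 through March 1997


def f_statistic(r, n):
    """F statistic (1 and n - 2 degrees of freedom) for a linear correlation r."""
    r2 = r * r
    return r2 * (n - 2) / (1.0 - r2)


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def p_value(f_stat, df, steps=2000):
    """P(F(1, df) > f_stat), using F(1, df) = t(df)**2 and Simpson's rule."""
    t = math.sqrt(f_stat)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df)
    area = total * h / 3.0  # P(0 < T < t)
    return max(0.0, 1.0 - 2.0 * area)


def significance(r, n=N_MONTHS):
    """Label a correlation as significant at 99%, at 95%, or not significant."""
    p = p_value(f_statistic(r, n), n - 2)
    if p < 0.01:
        return "99%"
    if p < 0.05:
        return "95%"
    return "not significant"


# Correlations quoted in the text, expressed as fractions.
correlations = {
    ("ADTD", "TEMP"): 0.546, ("ADTD", "PoP"): 0.482, ("ADTD", "TEMP/PoP"): 0.649,
    ("MPAD", "TEMP"): 0.273, ("MPAD", "PoP"): 0.125, ("MPAD", "TEMP/PoP"): 0.260,
    ("MPFD", "TEMP"): 0.080, ("MPFD", "PoP"): -0.341, ("MPFD", "TEMP/PoP"): -0.133,
}

for (measure, score), r in correlations.items():
    print(measure, score, significance(r))
```

Under these assumptions the sketch gives the same pattern as reported: the three ADTD correlations pass at the 99% level, the PoP correlation with the MPFD passes at the 95% level, and the remaining correlations are not significant.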

The correlation of forecaster improvement over MOS guidance to "abnormal" long-term precipitation patterns was not conclusive in this study. One reason might be that, unlike temperature, precipitation is not a linear variable. As a result, on a daily basis or for a specific forecast period, daily precipitation departures from normal have little meaning and were not calculated. Even on a longer term basis, such as the monthly basis used in this study, the precipitation anomaly measures could produce mixed results. For example, below (above) normal precipitation amounts can occur even when the precipitation frequency is above (below) normal, if most of the precipitation events during the month were light (heavy). Perhaps the use of another precipitation anomaly measure might have produced more conclusive results. However, it is difficult to conceive of any measure that would quantify the true "abnormal" nature of the precipitation pattern on a daily basis, as does the ADTD for the temperature pattern.

4. DISCUSSION

Based on the results of this study, it can be concluded that, overall, local forecasters at the Albany forecast office have little difficulty making significant changes to, and improving on, MOS forecasts of both PoP and TEMP during periods of "abnormal" temperature conditions, and that this improvement over guidance increases rapidly with increasing temperature departures from normal. This should come as no surprise, since it is well known that MOS guidance has difficulty with rare events or with weather patterns that deviate substantially from climatological normals (Lowry 1980; Murphy and Dallavalle 1984; Maglaras and Carter 1986; Carter et al. 1989). Conversely, during periods of "normal" temperature conditions, or during the warm season when deviations from normal generally are much less, local forecast improvements over MOS guidance are reduced.

Lowry (1980), Murphy and Dallavalle (1984), Maglaras and Carter (1986), and Carter et al. (1989) indicated that MOS guidance usually performs well within the range of the average conditions which occurred in the developmental sample. The guidance will show a decreasing trend in accuracy as the weather conditions deviate further and further from this "normal range." Also, this decreasing trend will be more pronounced at later forecast periods. (As noted in the previous section, for a specific day or forecast period, the idea of "normal range" for a non-linear variable, such as precipitation, is not relevant.) These characteristics of MOS will not change, even when future MOS developments occur based on more accurate numerical forecast models.

The findings of this study and the inherent characteristics of MOS guidance leave the meteorological community with a dilemma. On the one hand, we have an automated system for forecasting (MOS guidance and computer worded forecasts) which performs very well during periods of near normal temperatures and during much of the warm season, and to which local forecasters add only a little value, overall, before the TEMP and PoP forecasts are issued to the general public. On the other hand, local forecasters perform much better during periods when the temperature deviates significantly from normal, and they add substantial value to the TEMP and PoP forecasts before they are issued to the public.

From one point of view, overall, on a day to day basis, MOS guidance and computer worded forecasts will serve the public well. However, for those periods when the temperature is "unusual" or "extreme", local forecasts will serve them the best. But it is during periods of "unusual" or "extreme" weather that the public's awareness of the forecast is greatly heightened. Thus, local forecasters add significant value to the forecast during those periods when it is most needed by the public. In addition, in order to maintain their proficiency at making large improvements over guidance, local forecasters need to produce PoP and temperature forecasts on a daily basis. If the forecasts for routine situations were delegated exclusively to MOS and computer worded forecasts, the likelihood of forecaster success for periods with anomalous temperature regimes would be diminished considerably. Hence, the apparent trend to migrate towards the automatic generation of most products might need to be reexamined and modified in an appropriate manner.

Future verification work at the Albany forecast office will involve trying to quantify local forecaster improvement over guidance for individual major storm events.

ACKNOWLEDGMENTS

I would like to thank the staff of the National Weather Service office at Burlington, Vermont, for providing much of the Burlington climatological data used in this paper. I would also like to thank Gary Carter of the NWS Eastern Region Headquarters for his many suggestions for improving this paper. Finally, I would like to thank all the forecasters (past and present) at the Albany forecast office who have allowed me, without complaint, to conduct research related to the accuracy of their forecasts for the past several years.



REFERENCES

Carter, G. M., J. P. Dallavalle, and H. R. Glahn, 1989: Statistical forecasts based on the National Meteorological Center's numerical weather prediction system. Wea. Forecasting, 4, 401-412.

Dagostaro, V. J., 1985: The national AFOS era verification processing system. TDL Office Note 85-9. National Weather Service, NOAA, U.S. Department of Commerce, 47 pp.

Glahn, H. R., and D. A. Lowry, 1972: The use of Model Output Statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203-1211.

_____, 1979: Computer worded forecasts. Bull. Amer. Meteor. Soc., 60, 4-11.

Hoke, J. E., N. A. Phillips, G. J. Dimego, J. J. Tuccillo, and J. G. Sela, 1989: The Regional Analysis and Forecast System of the National Meteorological Center. Wea. Forecasting, 4, 323-334.

Lowry, D. A., 1980: How to use and not use MOS guidance. Preprints Eighth Conference on Weather Forecasting and Analysis, Denver, Amer. Meteor. Soc., 11-12.

Maglaras, G. J., and G. M. Carter, 1986: How to use MOS guidance effectively. Preprints Eleventh Conference on Weather Forecasting and Analysis, Kansas City, Amer. Meteor. Soc., 17-22.

Maglaras, G. J., 1991: A new verification scheme. Eastern Region Technical Attachment No. 91-7B, National Weather Service, NOAA, U.S. Department of Commerce, 5 pp.

Murphy, M. C., and J. P. Dallavalle, 1984: An investigation of MOS minimum temperature errors in North and South Dakota during December 1982. TDL Office Note 84-16, National Weather Service, NOAA, U.S. Department of Commerce, 14 pp.

National Weather Service, 1983: The FOUS12(FO12) bulletin. NWS Technical Procedures Bulletin No. 325, NOAA, U.S. Department of Commerce, 12 pp.

National Weather Service, 1992: NGM-based MOS guidance - the FOUS14/FWC message. NWS Technical Procedures Bulletin No. 408, NOAA, U.S. Department of Commerce, 8 pp.

Newell, J. E., and D. G. Deaven, 1981: The LFM-II Model-1980. NOAA Tech Memorandum NWS NMC-66, NOAA, U.S. Department of Commerce, 20 pp.

Wooldridge, M., and S. Burrus, 1997: SCORE. NOAA Eastern Region Computer Programs NWS ERCP-27MC, National Weather Service, NOAA, U.S. Department of Commerce, 61 pp.