An Introduction to Parameterization Options in WRF

This post is the first in a series about parameterization schemes in WRF. We hope the series will help demystify the role these schemes play in the model, so that you can make a more informed decision about which schemes to use for your own run. This post provides a very brief overview; future posts will take a more detailed look at specific scheme categories.

To understand parameterizations, we first must discuss what a model does. According to Schwarz et al. (2009), models are representations that explain and predict a natural phenomenon. In the atmospheric sciences, the accuracy of a model's prediction is directly related to how well it represents atmospheric processes, and that representation can be adjusted by choosing the parameterization schemes best suited to your research question. Every choice made changes the outcome of the simulation. As a result, parameterization schemes are one of the most important aspects to consider when setting up a computer model.

Unfortunately, choosing the best parameterization schemes is also one of the most difficult steps of the model setup procedure, because there is no single "best combination"; it depends entirely on your research question and on the location, size, and resolution of your domain. Even after considering these factors, some choices may still not be clear.

It is not hopeless, though! It is possible to make an informed decision about which parameterizations to use if you understand how the different schemes influence the model.

What do parameterization schemes do?

In very general terms, here’s how parameterization schemes in WRF work:

From our current understanding of atmospheric processes, atmospheric scientists have derived several equations that describe dynamic and physical processes. In a basic sense, weather is created by uneven heating, which produces regions of relatively warm and cool air and causes air to move (or be transported). On a spherical planet with no water or vegetation, this simple description would probably be fairly representative of how the atmosphere behaves. On a planet like Earth, however, more equations are needed to take into account things like oceans, mountains, ice, plants, and animals.

In fact, there are so many different influences and interactions to consider that many equations are needed, so parameterization options are generally broken down into categories. In general, parameterizations use mathematical formulas (derived from theoretical understanding of atmospheric processes) to calculate values for variables of interest. Stensrud (2007) groups most schemes into the following categories: land surface-atmosphere interaction, water-atmosphere interaction, planetary boundary layer and turbulence, convection, microphysics, and radiation. WRF uses similar distinctions, but organizes its parameterization schemes using the following structure:

  1. Physics Options

    1. Microphysics (mp_physics)

    2. Longwave Radiation (ra_lw_physics)

    3. Shortwave Radiation (ra_sw_physics)

    4. Cloud fraction option

    5. Surface Layer (sf_sfclay_physics)

    6. Land Surface (sf_surface_physics)

    7. Urban Surface (sf_urban_physics)

    8. Lake Physics (sf_lake_physics)

    9. Planetary Boundary Layer (bl_pbl_physics)

    10. Cumulus Parameterization (cu_physics)

  2. Diffusion and Damping Options

  3. Advection Options

  4. Lateral Boundary Conditions

As you can see, there are many categories, with at least two or three scheme options per category (and often many more). The result is hundreds of thousands of potential combinations to choose from. In practice, these options are selected in WRF's namelist.input file, as sketched below.
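Most of these choices live in the &physics section of namelist.input, where each category is assigned an integer that selects a particular scheme (one value per domain when nesting is used). The sketch below is purely illustrative: the option names come from the list above, but the numeric values, and the schemes they correspond to, are assumptions that should be checked against the documentation for your WRF version.

   &physics
    mp_physics         = 8,     ! microphysics (e.g., 8 = Thompson in recent versions)
    ra_lw_physics      = 4,     ! longwave radiation (e.g., 4 = RRTMG)
    ra_sw_physics      = 4,     ! shortwave radiation (e.g., 4 = RRTMG)
    sf_sfclay_physics  = 1,     ! surface layer
    sf_surface_physics = 2,     ! land surface (e.g., 2 = Noah LSM)
    bl_pbl_physics     = 1,     ! planetary boundary layer (e.g., 1 = YSU)
    cu_physics         = 1,     ! cumulus (e.g., 1 = Kain-Fritsch)
   /

(The inline comments are annotations for the reader; an actual namelist.input normally lists only the values, with one column per domain.)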

Advantages and disadvantages of parameterization schemes

The equations that make up parameterization schemes in WRF range from very simple to very complex. In general, the more complex options are more comprehensive and provide more precise and accurate results. This raises the question: why not use the most complex, in-depth parameterizations every time, if they yield more precise results? There are several reasons, but the most important is that computation time is expensive.

To get an accurate representation of the region you are modeling, you want to create a domain that is as large as possible with the highest resolution possible, which means as many grid points as possible. The problem is that the model solves all of the equations at every grid point for every time step, so a long simulation period, a large domain, or a high resolution means the computer is doing a great many calculations. This can take a very long time, or require a large number of processors (which are expensive). Therefore, sacrifices have to be made based on the question being researched.
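As a rough rule of thumb (a simple scaling argument, not a WRF-specific figure), the cost of a run grows with the number of grid points in each direction and the number of time steps:

   \mathrm{cost} \propto N_x \, N_y \, N_z \, N_t

Halving the horizontal grid spacing doubles both N_x and N_y, and the stable time step must roughly halve as well (to satisfy the CFL condition), so the total work grows by roughly a factor of 2 x 2 x 2 = 8.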

Figure 1. Visual representation of grid spaces in atmospheric models. Image from UCAR Digital Library.

Climate, weather, and regional climate models

Consider, for example, a climate model versus a weather model. Weather models tend to have complex parameterizations that focus on small-scale processes, and therefore also tend to have high resolutions. The goal of a weather model is to precisely and accurately forecast day-to-day weather. Since the resolution is high and the parameterizations are complex, the model is run over a relatively small domain so that the run time stays reasonably short. After all, what good is a 24-hour forecast if the model takes 48 hours to run?

Climate models, however, have a different goal: to accurately indicate trends (rather than specific values) over a very long period of time and a very large domain (often the entire planet). Their parameterization schemes therefore focus on large-scale processes, and the resolution is generally quite low. In the case of climate models, it is OK if a run takes several days to complete, since the model is usually looking several decades into the future.

Regional climate modeling (as is frequently done with WRF) requires some combination of both approaches, which is why nesting is such an important part of setting up the model domain. The larger, coarser nest calculates the climate influence on the region (using large-scale, less complex processes), and the smaller, high-resolution nest calculates the weather as influenced by that climate (using small-scale, more complex processes). Parameterization schemes must therefore be chosen with both of these ideas in mind.

Figure 2. Visual representation of Earth’s Climate System. Image from UCAR Digital Library.

Choosing parameterizations based on research questions

Next, consider the following hypothetical research question: does urban development impact the wind over a wind turbine farm in my domain? To choose the best parameterizations for the model run, you have to think about the question being asked. First, the dynamic processes that govern small-scale wind speed and direction, and their response to radiative heating, are important, so complex, in-depth parameterizations should be chosen for them. Similarly, small-scale dynamical processes and an accurate representation of orography and urban structures require small grid spacings, so a high resolution is also required. However, the small-scale microphysical processes that determine the concentrations of various water particles in the atmosphere are not an important component of this question, so using a less complex microphysics scheme will save computing power and likely will not affect the answer to your research question.

Final Thoughts

While choosing the most complex and comprehensive parameterization option may seem like the best choice, it is important to consider the resolution and size of your domain as well as the time period of your model run. When you consider your research question, be sure to ask yourself:

  • What length of time am I interested in studying: long range or short range time scales?

  • What variables am I interested in looking at: are they small-scale or large-scale?

  • Based on my variables of interest and the size of my region, do I need to use a fine resolution or is a coarse resolution ok? Is nesting an appropriate option?

  • Are my variables of interest the result of primarily dynamic processes or microphysical processes?

  • Which parameterization category options are most important? Which are least important?

This is by no means a comprehensive list of questions to consider, but it is a nice place to start. In future posts, we will take a much closer look at several microphysics, cumulus, radiation, and boundary layer options in WRF. At the end of this post series, we hope you will have a much better idea of how these schemes will impact your WRF run.

Until next time,

-Morgan

References

Schwarz, C. V., Reiser, B. J., Davis, E. A., Kenyon, L., Acher, A., Fortus, D., et al. (2009). Developing a learning progression for scientific modeling: Making scientific modeling accessible and meaningful for learners. Journal of Research in Science Teaching, 632-654.

Skamarock, W. C., Klemp, J. B., Dudhia, J., et al. (2005). A Description of the Advanced Research WRF Version 2. NCAR Technical Note NCAR/TN-468+STR.

Stensrud, D. J. (2007). Parameterization Schemes: Keys to Understanding Numerical Weather Prediction Models. Cambridge University Press, New York.

The End of p Values?

This is the second of a two-part post discussing the controversial decision by the editors of the scientific journal Basic and Applied Social Psychology to ban the use of p values in its publications. Part one can be found here.

The Royal Statistical Society (RSS) 2015 Conference featured a debate on the use of p values, with Dr. David Colquhoun of University College London as the discussant. But why was there such a debate at the RSS conference? Earlier this year, the journal Basic and Applied Social Psychology (BASP) decided to ban the use of p values. The reason? Well, one of the reasons was that statistics might be used to support 'lower-quality research' (Woolston, 2015). So, the RSS debate was organized to promote discussion around the strict BASP decision and to rethink the use of p values.

The RSS 2015 debate was arguably one of the most effective scientific discussions of the year, because it showcased both sides of the argument (for and against) in light of sound reasoning, and because it featured a room packed with some of the best statisticians in the world. An important outcome of the debate was that it highlighted the need for university degrees to teach students to be more critical of what they learn. This idea makes sense: the goal of science is to ensure all claims are thoroughly critiqued, so it is important that all evidence is considered. p values are just one piece of evidence in any story, and hence should not be the be-all and end-all for the results of a study. For example, were you taught at your university to consider the possibility that the use of p values might be flawed for a given study? Have you learned alternatives to hypothesis testing? This is a more constructive approach to teaching statistics in the science classroom.

What is a p value?

In hypothesis testing, a p value represents the probability of obtaining a result at least as extreme as the one observed, given a specific (null) hypothesis. One also uses a threshold (the significance level): if the p value falls above it, one does not reject the null hypothesis, and if it falls below it, one rejects the null hypothesis in favor of the alternative. A threshold of 5% is often used. The use of p values started in the 1770s, and their use in statistics was later popularized by Ronald Fisher (Woolston, 2015).
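In notation (a minimal sketch for a one-sided test, where T is the test statistic, t_obs its observed value, and H_0 the null hypothesis):

   p = \Pr\left( T \ge t_{\mathrm{obs}} \mid H_0 \right), \qquad \text{reject } H_0 \text{ if } p < \alpha \quad (\text{e.g., } \alpha = 0.05)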

p values are best used with a large, already existing set of data, because the goal is generally to determine whether one specific set of observations is unusually different from what normally happens. This can be done within either inductive or deductive processes.

Inductive and deductive

In all forms of scientific inquiry, there are two ways to draw reasonable conclusions about a topic: through a process of either induction or deduction. One of the first and most commonly cited definitions of inductive and deductive reasoning comes from John Dewey (1910), who states that "…building up the idea is known as inductive discovery; the movement toward developing, applying, and testing, as deductive proof" (p. 244).

In other words, induction is about taking evidence already known and attempting to make sense of it. The example Dewey gives is to think of someone who leaves a room for a period of time and comes back to find the objects in it scattered about in disarray. Inductive logic would be to consider the presence of a burglar. No burglar was observed directly, but given the brief evidence available, it seems to be the best conclusion. Using meteorology as an example, perhaps you use p values to determine that one particular rain event was statistically more intense than is usual for the area. You may suspect that something influenced this heavy rain event, but your p value does not tell you what that could be. At this point, you are using your information to establish evidence that something happened, but more research needs to be done to figure out what.
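As a concrete sketch of the rain-event example (with entirely hypothetical numbers), one could compute a one-sided empirical p value by asking how often past events were at least as intense as the one observed:

   import numpy as np

   # Hypothetical record of event rainfall totals (mm) for this area, plus the
   # single event we suspect was unusually intense.
   historical_rainfall = np.array([12.0, 8.5, 20.1, 15.3, 9.8, 30.2, 11.4, 18.7])
   observed_event = 42.0

   # One-sided empirical p value: the fraction of events at least as intense as
   # the observed one. Adding 1 to numerator and denominator avoids reporting
   # exactly zero from a finite sample.
   more_extreme = np.sum(historical_rainfall >= observed_event)
   p_value = (more_extreme + 1) / (historical_rainfall.size + 1)
   print(f"empirical p value: {p_value:.3f}")

A small p value here only says that the event looks unusual relative to this record; it says nothing about why, which is exactly the inductive point made above.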

Deduction involves testing already existing theories. In Dewey's example, therefore, deduction begins once data collection and evidence derivation begin. The person may search their valuables, check windows and doors for entry marks, or look for anything unusual that the burglar may have left behind in the room. The idea during deductive logic is to confirm the original theory that a burglar was in the room and (potentially) identify who the burglar was. In the meteorology example, this is equivalent to using several p values to narrow down the reason for your heavy rain event. Perhaps you use p values to explore other atmospheric phenomena leading up to your intense precipitation event. Maybe wind direction, wind speed, and temperature values were within expected ranges, but dewpoint values were higher than normal. What caused the higher dewpoint values? Further research would have to be done. This process of continually testing and hypothesizing with p values builds evidence, and is itself a form of inductive reasoning: each result becomes new evidence from which further hypotheses are built.

The problem with relying too heavily on p values, therefore, arises when they are used inductively (as one piece of evidence suggesting that something is different) without follow-up research to determine, for example, why a precipitation event was more intense than usual. We can end up with conclusions drawn from a poor use of scientific inquiry, which is not how p values were intended to be used. When conclusions are drawn prematurely from p values, 'false discoveries' are inevitable, which can lead to a whole host of new problems.

False discovery rate

One of the major issues with the use of p values was pointed out by Colquhoun in his address at RSS 2015: a threshold of 5% leads to a number of false positive tests (or false discoveries), giving a false discovery rate that could be as high as 30% or more (Colquhoun, 2014). So, what threshold should one use? In his talk, Colquhoun emphasized that a p value of 0.04 does not mean one has discovered something; it only means that 'it might be worth another look' (Colquhoun, 2015). To say one has discovered something, a threshold of 0.5% or 0.1% (a p value of 0.005 or 0.001, respectively) should be used (Johnson, 2013). A p value of 0.001 (or less) gives a false discovery rate of less than 2% (Colquhoun, 2015).
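To see where a figure like 30% comes from, here is a small worked example in the spirit of Colquhoun (2014); the prevalence and power values are illustrative assumptions, not measurements:

   # Illustrative false discovery rate arithmetic (after Colquhoun, 2014).
   n_tests = 1000       # hypothetical number of hypothesis tests carried out
   prevalence = 0.10    # assumed fraction of hypotheses that are real effects
   power = 0.80         # assumed probability of detecting a real effect
   alpha = 0.05         # significance threshold

   real_effects = n_tests * prevalence                  # 100 real effects
   true_positives = real_effects * power                # 80 detected
   false_positives = (n_tests - real_effects) * alpha   # 45 false alarms

   fdr = false_positives / (false_positives + true_positives)
   print(f"false discovery rate: {fdr:.0%}")            # about 36%

In other words, even though each individual test uses a 5% threshold, roughly a third of the 'significant' results in this scenario are false discoveries.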

Concluding remarks

The p value discussion is still ongoing, but one thing is certain: we should not take what we learn at university for granted. One has to critically evaluate what one learns and keep pace with new ways of thinking. Most importantly, p values should be used within deductive reasoning, as one way of deriving a variety of evidence to support a conclusion, not as a be-all and end-all for drawing conclusions. Finally, when working with hypothesis testing, one needs to think about the following points:

  • If you still decide to use p values, use them with caution. Make sure they are one piece among a variety of evidence, not the only piece of evidence.
  • Remind yourself that a 5% threshold has a high false discovery rate, so you would be "wrong at least 30% of the time" (Colquhoun, 2014).
  • Conduct tests at the 0.005 or 0.001 level of significance (Johnson, 2013), and, as Colquhoun (2014) advises, "never use the word 'significant'": it is too easily confused with "meaningful".

– Michel and Morgan

References:
Colquhoun, D. (2015). P-values debate. Retrieved 28 October 2015, from https://rss.conference-services.net/programme.asp?conferenceID=4494&action=prog_list&session=33652
Colquhoun, D. (2014). An investigation of the false discovery rate and the misinterpretation of p-values. Royal Society Open Science, 1(3).
Dewey, J. (1910). Systematic inference: Induction and deduction. In How We Think. D.C. Heath & Company.
Johnson, V. E. (2013). Revised standards for statistical evidence. Proceedings of the National Academy of Sciences, 110(48), 19313-19317.
Woolston, C. (2015). Psychology journal bans P values. Nature, 519, 9.