July 1, 2016
This guide is a step-by-step procedure for the generation of hourly meteorological data at a desired geographical location through the setup, running and post-processing of a mesoscale weather numerical model.
The two specific goals of this project were:
This guide relies on a number of key requirements:
The software referred to in this procedure is under continuous development. It is expected that the instructions contained here are subject to change and relevant to WRF simulations circa 2016.
Neither Klimaat nor ASHRAE warrants that the information in this guide is free of errors. This information is provided as is, without warranty of any kind, either expressed or implied. The entire risk as to the quality and performance of the program and data is with you. In no event will ASHRAE or the program developer be liable to you for any damages, including without limitation any lost profits, lost savings, or other incidental or consequential damages arising out of the use of or inability to use this information.
With all that out of the way, let's get the software installed.
In this section we describe the installation of the required software.
A recent version of the Java runtime environment, e.g. version 7, is required for some of the graphical applications such as
Domain Wizard and
Panoply. It is typically easily installed through a given Linux distribution's package manager. For example, on Debian-based systems (e.g. Ubuntu), it can be installed via
sudo apt-get install default-jre
Your mileage may vary.
The crucial element of this entire process is the STRC UEMS software package. UEMS contains all the functionality of the advanced mesoscale modelling software WRF, yet with a much simplified installation, configuration, and execution. It is maintained by the good people at UCAR, particularly Robert Rozumalski.
Clear and colourful instructions on how to install and use UEMS can be found at the UEMS website.
The four main steps are:
Registering in order to receive a Perl installation script
For the impatient, this involves opening a terminal window and typing:
perl uems_install.pl --install
The main decision will be where to install (e.g.
/home/jeeves/). Make sure you provide an absolute path, i.e. one starting with /.
uems_install.pl will then proceed to download and install approximately 20GB, so ensure sufficient space is available in the chosen installation directory. The largest portion is reserved for world-wide static high-resolution (30 seconds of arc) data such as topography and land-use classes. Coffee time!
UEMS adds some code to your
~/.bash_profile which sets a few environment variables and adds the UEMS scripts to your path. Note that the commands in
~/.bash_profile are not executed until you are running a login terminal, and most new terminals opened on the desktop are not in fact login terminals. Thus, I find it more useful to move the relevant lines into
~/.bashrc, which is typically always executed. The relevant lines in
~/.bash_profile are:
if [ -f /home/jeeves/uems/etc/EMS.profile ] ; then
    source /home/jeeves/uems/etc/EMS.profile
fi
Move them over to
~/.bashrc and open a new terminal which should activate your environment variables. You can confirm proper configuration by typing the following at the command line:
This should result in something like:
Your mileage may vary.
Confirming installation by running a benchmark simulation. Full instructions can be found in Appendix B (PDF). The steps are essentially
cd $EMS_UTIL/benchmark/27april2011
ems_prep --benchmark --domain 2
ems_run --domain 2
ems_prep will set up the benchmark simulation, indicating success with something like "Your awesome EMS Prep party is complete".
ems_run will run the benchmark, taking a few hours to do so.
Panoply is a Java-based application that plots geo-gridded and other arrays from netCDF, HDF, GRIB, and other datasets. GRIB is the file format of input files to WRF while netCDF is the file format for WRF output files.
An older version of Panoply ships with UEMS and is used by
Domain Wizard when viewing your domains. However, the new Panoply has a number of features that make it worthwhile to download separately. Instructions can be found here.
To be able to run Panoply from anywhere, add an
alias in your shell startup file. Open a new terminal and type
panoply to test your installation. If successful, you should be able to load the files from your benchmark run.
To view the WRF netCDF output files generated, open one of the files located in
$EMS_UTIL/benchmark/27april2011/wrfprd. Note that you will not see any files listed until you choose to show files of type
All Files; unfortunately, WRF generates files with an empty extension rather than the expected netCDF
.nc extension. Select one of the available Datasets (e.g. T2), click
Create Plot, click
Create to accept the default plot type, and you should see something like:
The postage-stamp-size rectangle in the above image represents the largest domain (d01) that was solved in the benchmark simulation. You can adjust map properties and projection to zoom into the domain.
git is a
distributed version control system. Installing git will allow you to download or
clone the source code from github for some of the packages below, keep them up to date, or
fork the source code if you want to make personal changes.
If you do not have it installed, use your favourite Linux package installation method. e.g. on Debian (Ubuntu) systems:
sudo apt-get install git
We mark this installation as optional since, for every repository on github, there is an option to kick it ol' school by downloading a ZIP file containing the source.
In order to read the WRF data files, we need to install the netcdf4-python package. You have two paths: install a pre-built package, or build from source. On Debian-based systems, the pre-built route is:
sudo apt-get install python-netcdf4
If that is successful, your
netcdf installation is complete.
If not, you will need to build from source.
python-netcdf4 itself has a number of prerequisites; we need to install:
numpy: This is a Python package offering many of the features and delights of Matlab®
cython: This is a Python-to-C compiler used to make things go fast, fast, fast
libnetcdf4-dev: This provides the API library to read and write the NetCDF files used to store WRF data
To install these packages, on recent Debian-based systems
sudo apt-get install python-numpy cython libnetcdf-dev libhdf5-dev python-dev
Finally, to download the source:
git clone https://github.com/Unidata/netcdf4-python.git
or grab the source code by clicking on
Download ZIP after clicking on
Clone or download at the github repository and unzipping into a suitable directory.
Then installation is via
cd netcdf4-python
python setup.py build
sudo python setup.py install
The big benefit of using git: when the package is updated, you can simply go to the directory and
pull all the changes
cd netcdf4-python
git pull
python setup.py build
sudo python setup.py install
As a bonus,
python-netcdf4 provides a script called
nc3tonc4 which will compress existing WRF output files by about a third.
UEMS vastly simplifies WRF modelling. We take this one (baby) step further by providing Python code to aid in splitting up large simulations into multiple
chunks, and the subsequent collation or
dechunking of the results into a coherent time series.
This software, called
emspy, is hosted in a github repository, which allows us to respond as new versions of UEMS are released (and to fix the resulting many bugs).
As with python-netcdf4, you can either
git clone https://github.com/klimaat/emspy.git
Or, just click
Download ZIP at github.
emspy has been simplified into just two executable Python scripts. You have two ways to operate:
emspy. That is, add the following to your
emspy is now ready to use. Test installation by typing
This should return the usage and command-line switches for
Warning: This code was written by engineers, not professional coders. It is likely to be brimming with bugs. As
emspy evolves, it can be updated via
cd /directory/where/emspy/exists
git pull
We also encourage you to peruse and modify the code as you see fit. Suggestions are welcome.
Congratulations, you are done installing the software. Now let's talk about
Your WRF model must be coupled with the output of another three-dimensional model dataset, one with greater area coverage:
For historical simulations, there are a number of re-analysis datasets: large-scale weather models that have assimilated observations and dynamically interpolated them to a regular grid. Two that we recommend, and that are pre-configured for use with WRF-EMS, are:
UEMS will automatically download files as needed. However, these datasets are enormous: e.g. a single year of NARR or CFSR requires approximately 50GB. This, even after the extraneous variables were culled by the good people at UCAR.
UEMS thus provides an attractive option: the so-called
personal tile method. In this method only a small subset (tile) of the data is downloaded, bounded by the extents of your largest domain and containing only the variables needed, resulting in files that are about 1% of the size. These datasets will be referred to as
Okay, let's start setting up your simulation.
By default, all your
run files are stored in a directory pointed to by the environment variable
$EMS_RUN, which is set up by default to point at a
runs directory under the main UEMS installation directory. However, you may want to point this environment variable to another location, perhaps on a large network drive. To do this, add a line exporting $EMS_RUN to the
~/.bashrc script you modified previously.
The first step in getting a simulation set up is establishing a series of grids of increasing resolution and decreasing size centered on your chosen latitude/longitude, and then interpolating static quantities such as topography, monthly vegetation, and land-use onto them. This is all accomplished through a graphical tool called Domain Wizard.
First start the Domain Wizard:
You will be presented with a window dialog. Click
Create New Domain
Enter a domain
Description. The Name will be used as a directory name, so keep it simple (e.g.
new_york); use _ in place of a space. Click
Drag a rectangle, centered around your location, approximately 30°×30°. Size doesn't matter as we will be adjusting it in the next steps.
Type will now be highlighted. Select
Enter the longitude and latitude of your desired center point under
Centerpoint Lon and
Centerpoint Lat, respectively. At this point you should see something like this:
Click Update Map and then enter:
Horizontal Dimension X: 70
Horizontal Dimension Y: 70
Grid points distance (km): 36
Geographic data resolution: 10m
This will establish a 70×70 or a 2520km×2520km parent domain.
At this point you can configure your
nested domains, d02, d03, etc.
Select the Nests tab and click
New. Leave all the
Nest Properties at their default settings but enter the following under
This establishes where in the parent, d01, the new nested domain, d02, sits.
The above configuration centers d02 within d01, has the same cell dimension (70×70), but has three times the resolution (12km×12km).
We repeat the above step to generate another domain, d03, of resolution 4km×4km.
Note: At this point, another domain, d04, could be formed of size 1.3km×1.3km. However, each subsequent nested domain, even though of the same 70×70 dimension, requires 3 times as much computational effort. This is due to the requirement that the discrete time step on the finer grids is approximately three times smaller in order to maintain stability (see CFL for more information). If you have the computational power, go for it. As 500-1000m is the limit of conventional WRF modelling, we do not recommend any more than 4 domains in total.
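The spacings quoted above follow directly from the fixed 3:1 nesting ratio; a quick check of the arithmetic:

```python
# Each nest refines its parent's grid spacing by the 3:1 nesting ratio used above.
parent_dx_km = 36.0
ratio = 3
spacings = [parent_dx_km / ratio**k for k in range(4)]
print(spacings)  # d01..d04 spacings in km: 36, 12, 4, ~1.33

# The time step must shrink along with the spacing (CFL), which is why each
# additional nest roughly triples the total computational effort.
```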
At this point, we click
Next and then
Localize Domain to interpolate all of the static fields, such as albedo and green fraction, onto our domains.
Clicking Next at this point brings up the ability to view your domains in Panoply.
Note that the version of Panoply hard-coded into UEMS is a much older version, so the interface is slightly different.
Click Exit to leave Domain Wizard. You now have a base directory in the
runs directory named after your domain (e.g.
atlanta). You are now ready to prep and simulate your domain: time to start actually putting the computers to work.
At this point, everything is configured to run; we simply need to split your simulation into a number of
chunks, adjust the default UEMS parameters, prep, and finally run. Thankfully,
emspy provides a script,
ems_chunk.py to make this easy.
To get familiar with this script, type
ems_chunk.py simply requires the domain you want to run and the start and end dates, e.g.
ems_chunk.py atlanta 20000101 20000131
The above will run the Atlanta simulation you setup previously for the month of January 2000. It will split the run into a series of chunks of 3 days in length, with a suitable
spin-up period of 12 hours, and run the chunks in sequence.
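The chunking scheme can be sketched in a few lines of Python (a simplification of what ems_chunk.py does; the actual bookkeeping may differ):

```python
from datetime import datetime, timedelta

def make_chunks(start, end, chunk_days=3, spinup_hours=12):
    """Split [start, end) into chunk_days-long pieces; each piece begins
    spinup_hours early so the initial hours can be discarded as spin-up."""
    chunks, t = [], start
    while t < end:
        t_next = min(t + timedelta(days=chunk_days), end)
        chunks.append((t - timedelta(hours=spinup_hours), t_next))
        t = t_next
    return chunks

jan = make_chunks(datetime(2000, 1, 1), datetime(2000, 2, 1))
# 31 days in 3-day pieces: 11 chunks, the first starting 1999-12-31 12:00 UTC
```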
By default, it will use the tiled version of the CFSR dataset, but it can use NARR instead. This is selected with the -d switch:
ems_chunk.py -d narrpt atlanta 20000101 20000131
By default, it will run all the nested domains you have configured. If you have set up, say, d01, d02, d03, and d04, but only want to solve up to d03, you can pass the -n switch:
ems_chunk.py -n 3 -d narrpt atlanta 20000101 20000131
ems_chunk.py can also be run in
prep-only mode, i.e. skipping the actual run of each chunk, in case you want to run them all later, perhaps after you've downloaded all the required tiles. Just pass the -s switch:
ems_chunk.py -s -d narrpt atlanta 20000101 20000131
Finally, if you try to re-run,
ems_chunk.py will first check the directories to see whether a chunk has already been prep'd and run, and will skip it if so. This allows you to repeatedly start and stop
ems_chunk.py, and it will pick up approximately where you left off. If, however, you want to force it to prep and run, overwriting whatever work has been done, just add the -f switch:
ems_chunk.py -f -d narrpt atlanta 20000101 20000131
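The resume behaviour amounts to a simple per-chunk check; roughly (a sketch of the idea with an assumed directory layout, not emspy's exact test):

```python
import os

def chunk_already_run(chunk_dir):
    """Treat a chunk as done if its wrfprd output directory exists and is non-empty."""
    wrfprd = os.path.join(chunk_dir, "wrfprd")
    return os.path.isdir(wrfprd) and bool(os.listdir(wrfprd))
```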
You can use Panoply to view your wrfout files. They can be found in the
wrfprd directory of each chunk's run directory, e.g.
Also, a log file with some of the gory details is created in the
$EMS_RUN directory labelled e.g.
If you want to start
ems_chunk.py in a terminal (perhaps on a remote machine) and later close that terminal, prepend your command with
nohup and append an
&. This will ensure that your simulation is not interrupted. All the typical text output will go to a file called nohup.out:
nohup ems_chunk.py atlanta 20000101 20000131 &
A typical simulation with three domains (d01/36km, d02/12km, d03/4km) will take on order 2 hours to run a single three-day chunk on a standard eight-core linux machine. As there are 122 chunks in a year, that means a single year simulation will take approximately 12 days. If an additional nest is added, (d04/1km), you will see approximately a three-fold increase in total simulation time, requiring approximately a month to complete a year of simulation.
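Those wall-clock figures follow from simple arithmetic (pure compute only; prep and download overheads push the total toward the quoted ~12 days):

```python
chunk_days = 3
hours_per_chunk = 2.0                    # three-domain chunk on an 8-core machine
chunks_per_year = 365 // chunk_days + 1  # 122 chunks in a year
wall_days = chunks_per_year * hours_per_chunk / 24.0
print(round(wall_days, 1))  # ~10 days of compute before overheads
```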
The simulation time can be reduced by throwing more computers at it via a cluster or simple network of computers. While this feature has not been coded in
ems_chunk.py at the moment, it is on the wish list.
The code has been set up with a number of fixed, robust configuration parameters selecting, for example, the radiation schemes; there are more parameterization schemes than you can shake a stick at. UEMS provides well-documented configuration files, stored in e.g.
atlanta/conf/ems_run. Explore the various files and options available.
If you find that you need a certain configuration option implemented, you have two options:
Before running ems_chunk.py, manually edit the relevant files. However, you will have to do this every time you create a new domain.
Modify ems_chunk.py itself to automatically change the desired config files. See the relevant lines in the code.
At this point you have run a simulation for a period of time and want to
de-chunk. The counterpart to ems_chunk.py is
ems_dechunk.py. This tool allows you to repeatedly mine your simulation to generate a time series (in CSV format) at various locations in your domain.
The key parameters to
ems_dechunk.py are the domain to operate on, e.g. atlanta, and the location in the domain that you are interested in. Location is specified either through grid indices (i,j), switch
-ij, or through explicit geographical coordinates (latitude, longitude), switch
-ll. If geographical coordinates are chosen, the latitude and longitude are snapped to the nearest grid point.
For example, the center of the 70×70 Atlanta grid could be generated via
ems_dechunk.py atlanta -ij 35 35
Or via geographical coordinates
ems_dechunk.py atlanta -ll 33.834 -84.329
As WRF outputs data at the centers of cells, a 70×70 grid of points yields 69×69 cells.
The indexing convention is that of a
flipped matrix: (i,j)=(1,1) is the southwest point of the grid, (i,j)=(69,1) the northwest, (i,j)=(1,69) the southeast, and (i,j)=(69,69) the northeast.
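The snap is just a nearest-grid-point search over the wrfout coordinate arrays (XLAT/XLONG). A sketch, assuming the row index increases northward and the column index eastward; the mapping to emspy's flipped (i,j) convention should be verified against the code:

```python
import numpy as np

def nearest_ij(xlat, xlong, lat0, lon0):
    """Return 1-based (row, col) of the grid cell nearest (lat0, lon0).

    xlat/xlong are 2-D coordinate arrays from a wrfout file. A planar
    distance with a cos(latitude) longitude scaling is used; adequate
    for snapping, not for navigation."""
    d2 = (xlat - lat0) ** 2 + (np.cos(np.radians(lat0)) * (xlong - lon0)) ** 2
    row, col = np.unravel_index(np.argmin(d2), xlat.shape)
    return row + 1, col + 1
```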
By default, the finest nested domain will be extracted. If a coarser grid is desired, perhaps for comparing 1km results to 4km results, simply specify the desired nest with the -n switch:
ems_dechunk.py atlanta -n 3 -ll 33.834 -84.329
The resulting CSV file, e.g.
atlanta_i35_j35.csv will be found in the
$EMS_RUN directory. As it stands, a subset of the full UEMS list of variables is exported (see Table 1), though this list is easily expanded (see code). Also, note that all dates and times are in Coordinated Universal Time (UTC).
Table 1. Exported variables

Quantity                            Units              WRF field(s)
Screen (2m) drybulb temperature     °C                 T2
Screen (2m) humidity ratio          g/kg of dry air    Q2
Screen (2m) relative humidity       %                  RH02
Wind speed (10m)                    m/s                U10, V10
Wind direction (10m)                °                  U10, V10
Global horizontal shortwave down    W·hr/m²            SWDOWN
Snowfall (liquid equiv.)            mm                 TACC_SNOW
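The wind speed and direction rows in Table 1 are derived from the U10/V10 components. The standard conversion, as a sketch (note that for exact directions WRF's grid-relative winds should first be rotated to earth-relative, a step omitted here):

```python
import math

def wind_from_uv(u, v):
    """Convert u/v wind components (m/s) to speed (m/s) and meteorological
    direction (degrees clockwise from north, the direction the wind blows FROM)."""
    speed = math.hypot(u, v)
    direction = (270.0 - math.degrees(math.atan2(v, u))) % 360.0
    return speed, direction

# e.g. u=0, v=-5 is a 5 m/s wind from due north
```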