Set Up Your Own ERDDAP
Why? | Initial Setup | Update | Need To Know | Don’t Need To Know |
Programmer’s Guide | List of Changes | Credits | License | Contact
And see the related document Working with the datasets.xml File
ERDDAP is an all-open source, all-Java (servlet), web application that runs in a web application server (for example, Tomcat). This web page is mostly for people (“ERDDAP administrators”) who want to set up their own ERDDAP installation at their own web site.
Why use ERDDAP to distribute your data?
Because the small effort to set up ERDDAP brings many benefits.
-
If you already have a web service for distributing your data,
you can set up ERDDAP to access your data via the existing service.
Or, you can set up ERDDAP to access your data directly from local files.
-
For each dataset, you only have to write a small chunk of XML to tell ERDDAP how to access the dataset.
-
Once you have ERDDAP serving your data, end users can:
-
Request the data in various ways (DAP, WMS, and more in the future).
-
Get the data response in various file formats. (That’s probably the biggest reason!)
-
Make graphs and maps. (Everyone likes pretty pictures.)
You can customize your ERDDAP’s appearance so ERDDAP fits in with the rest of your web site.
Is the installation procedure hard? Can I do it?
The initial installation takes some time, but it isn’t very hard. You can do it. If you get stuck, email me at bob dot simons at noaa dot gov . I will help you.
How To Do the Initial Setup of ERDDAP on Your Server
ERDDAP can run on any server that supports Java and Tomcat (and perhaps other application servers). ERDDAP has been tested on Linux, Mac, and Windows computers.
-
Set up Java.
Type “java -version” from your server’s command line to make sure you have Java (JRE or JDK) version 1.7 installed. For security reasons, it is almost always best to use the latest version of Java. ERDDAP works with 32 bit or 64 bit Java. 64 bit is preferred for 64-bit operating systems. On Linux, we recommend that you download and install Java even if your computer came with Java installed. This lets you be in control of which Java you have and where it is (/usr/local?). To install Java on Linux, see these instructions.
This version of ERDDAP will work with Java 1.6 (AKA 6); however, we recommend against using Java 1.6. Java 1.6 is past its official end of life and so is no longer supported by Oracle. As Oracle’s Java SE 6 Downloads web page says: “WARNING: These older versions of the JRE and JDK are provided to help developers debug issues in older systems. They are not updated with the latest security patches and are not recommended for use in production.”
-
Set up Tomcat.
For security reasons, it is almost always best to use the latest version of Tomcat. Below, the Tomcat directory will be referred to as tomcat. Warning! If you already have a Tomcat running some other web application (especially THREDDS), we recommend that you install ERDDAP in a second Tomcat, because ERDDAP may need different settings and shouldn’t have to contend with other applications for memory.
-
Follow the instructions at http://tomcat.apache.org/ to set up Tomcat on your server.
On Linux, we recommend installing it in /usr/local.
-
Security recommendation: you can remove some possible avenues for attackers by following these instructions (especially the article’s second recommendation) to disable certain HTTP methods (TRACE, PUT, OPTIONS, DELETE) in Tomcat. (background information)
-
On Linux and Macs, Tomcat is often set up as belonging to user “tomcat”. Set up that account now. From the parent of the apache-tomcat directory, type
chown -R tomcat apache-tomcat-7.0.32
chgrp -R tomcat apache-tomcat-7.0.32
(but substitute the actual name of your tomcat directory) right after unpacking Tomcat. Do most of the rest of the setup instructions as user “tomcat”. Later, run the startup.sh and shutdown.sh scripts as user “tomcat” so that Tomcat has permission to write to its log files.
-
On Linux and Mac, create a file tomcat/bin/setenv.sh (or in Red Hat Enterprise Linux (RHEL), edit ~tomcat/conf/tomcat7.conf) to set environment variables. This file will be used by tomcat/bin/startup.sh and shutdown.sh. The file should contain
export JAVA_HOME=/usr/local/jre1.7.0_09
export JAVA_OPTS='-server -Djava.awt.headless=true -Xmx1500M -Xms1500M'
export TOMCAT_HOME=/usr/local/apache-tomcat-7.0.32
export CATALINA_HOME=/usr/local/apache-tomcat-7.0.32
(but substitute the directory names from your computer).
(If you previously set JRE_HOME, you can remove that.)
On Macs, you probably don’t need to set JAVA_HOME.
The -Xmx and -Xms memory settings are important because ERDDAP works better with more memory. Always set -Xms to the same value as -Xmx.
-
For 32 bit Operating Systems and 32 bit Java:
The more physical memory in the server the better: 4+ GB is really good, 2 GB is okay, less is not recommended. With 32 bit Java, even with abundant physical memory, Tomcat and Java won’t run if you try to set -Xmx much above 1500M (1200M on some computers). If your server has less than 2GB of memory, reduce the -Xmx value (in ‘M’egaBytes) to 1/2 of the computer’s physical memory.
-
For 64 bit Operating Systems and 64 bit Java:
64 bit Java will only work on a 64 bit operating system. To enable 64 bit Java, add -d64 to the list of JAVA_OPTS in startup.sh and shutdown.sh, for example,
export JAVA_OPTS='-server -Djava.awt.headless=true -Xmx8000M -Xms8000M -d64'
With 64 bit Java, Tomcat and Java can use very high -Xmx and -Xms settings. The more physical memory in the server the better. As a simplistic suggestion: we recommend you set -Xmx and -Xms (in ‘M’egaBytes) to 1/2 (or less) of the computer’s physical memory. You can see if Tomcat, Java, and ERDDAP are indeed running in 64 bit mode by searching for ” bit,” in ERDDAP’s Daily Report email or in the bigParentDirectory/logs/log.txt file (bigParentDirectory is specified in setup.xml).
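As a rough illustration of the 1/2-of-physical-memory suggestion above, here is a minimal sketch that works out the -Xmx and -Xms value and prints a matching JAVA_OPTS line for 64 bit Java. The function names (suggest_xmx_mb, physical_mb, print_java_opts) are hypothetical and not part of ERDDAP or Tomcat, and reading /proc/meminfo is a Linux-only assumption.

```shell
#!/bin/sh
# Hypothetical helpers (not part of ERDDAP or Tomcat).

# Given the computer's physical memory in megabytes, print half of it,
# following the "1/2 (or less)" suggestion above.
suggest_xmx_mb() {
    echo $(( $1 / 2 ))
}

# Assumption: Linux, where /proc/meminfo reports MemTotal in kB.
physical_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

# Print a JAVA_OPTS line for 64 bit Java, with -Xms equal to -Xmx.
#   $1 = physical memory in megabytes
print_java_opts() {
    mb=$(suggest_xmx_mb "$1")
    echo "export JAVA_OPTS='-server -Djava.awt.headless=true -Xmx${mb}M -Xms${mb}M -d64'"
}
```

For example, print_java_opts "$(physical_mb)" prints a line you could review and then paste into your environment file. With 32 bit Java, you would still need to keep the value at or below about 1500M and leave out -d64.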
-
On Linux and Macs, change the permissions of all *.sh files in tomcat/bin/ to be executable by the owner, e.g., with
chmod +x *.sh
-
Fonts for images: We strongly prefer the free Vera Sans fonts to the standard Linux/Java fonts. Installing these fonts isn’t required.
If you don’t install these fonts, you need to change the fontFamily setting in setup.xml to <fontFamily>SansSerif</fontFamily> .
To install the fonts, please download BitstreamVeraSans.zip (344,753 bytes, MD5=E16AF0C9838FD2443434F6E0E9FD0A0D) and unzip the font files to a temporary directory.
-
On Linux (as the root user) and Windows XP (as the administrator), copy the font files into JAVA_HOME/lib/fonts so Java can find the fonts. Remember: if/when you later upgrade to a newer version of Java, you need to reinstall these fonts.
-
Test your Tomcat installation.
-
Linux:
As user “tomcat”, run tomcat/bin/startup.sh
View your URL + “:8080/” in your browser (e.g., http://coastwatch.pfeg.noaa.gov:8080/).
You should see the Tomcat “Congratulations” page.
If there is trouble, see the Tomcat log file tomcat/logs/catalina.out.
-
Troubles with the Tomcat installation?
-
See the Tomcat log file tomcat/logs/catalina.out. Tomcat problems are almost always indicated there.
-
See the Tomcat web site or search the web for help, but please let us know the problems you had and the solutions you found.
-
Email me at bob dot simons at noaa dot gov . I will try to help you.
-
Set up the tomcat/content/erddap configuration files
On Linux, Mac, and Windows, download erddapContent.zip (version 1.46, size=22,268 bytes, MD5=FE827B9C411ECAC535F8C65392A5CDFD) and unzip it into tomcat, creating tomcat/content/erddap .
For Red Hat Enterprise Linux (RHEL), unzip it into ~tomcat and set the system property erddapContentDirectory=~tomcat/content/erddap in ~tomcat/conf/tomcat7.conf so ERDDAP can find the directory.
Then,
-
Read the comments in tomcat/content/erddap/setup.xml and make the requested changes. setup.xml is the file with all of the settings which specify how your ERDDAP behaves.
-
Read the comments in Working with the datasets.xml File, then modify the XML in tomcat/content/erddap/datasets.xml to specify all of the datasets you want your ERDDAP to serve.
-
(Unlikely) Now or (slightly more likely) in the future, if you want to modify erddap’s CSS file, make a copy of tomcat/content/erddap/images/erddapStart.css called erddap.css and then make changes to it. Changes to erddap.css only take effect when ERDDAP is restarted and often also require the user to clear the browser’s cached files.
After you edit the .xml files, it is a good idea to verify that the result is well-formed XML by pasting the XML text into an XML checker like RUWF.
In the unusual situation where you aren’t allowed to modify the Tomcat directory, you can put the ERDDAP content directory somewhere else (e.g., /usr/local/erddap). To let ERDDAP know where it is, set the system property erddapContentDirectory=/usr/local/erddap/content/erddap/ (or wherever it is). If you aren’t allowed to set this property in startup.sh, perhaps you can set it in Tomcat’s context.xml.
-
Install the erddap.war file.
On Linux, Mac, and Windows, download erddap.war (version 1.46, size=482,115,063 bytes, MD5=26957D4A6866F5DBAF3238E7A39BD0FE) into tomcat/webapps . The .war file is big because it contains high resolution coastline, boundary, and elevation data needed to create maps.
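Because the download is large, it can be worth checking it against the MD5 value listed above before you start Tomcat. The following is a minimal sketch, assuming the md5sum command (GNU coreutils, common on Linux) is available; on a Mac you would probably use the md5 command instead. The function name md5_matches is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file's MD5 matches the expected value.
#   $1 = file to check, $2 = expected MD5 in hex (upper or lower case)
# Assumption: the md5sum command is installed.
md5_matches() {
    file=$1
    # md5sum prints lower-case hex, so lower-case the expected value too.
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    actual=$(md5sum "$file" | awk '{ print $1 }')
    [ "$actual" = "$expected" ]
}
```

For example, from tomcat/webapps: md5_matches erddap.war 26957D4A6866F5DBAF3238E7A39BD0FE && echo "download looks intact"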
-
Use ProxyPass so users don’t have to put :8080 in the URL.
On Linux computers, if Tomcat is running in Apache, you need to modify the /etc/httpd/conf/httpd.conf file to allow HTTP traffic to/from ERDDAP:
-
Add the following lines right before #<IfModule mod_proxy.c> :
ProxyPass /erddap http://www.YourServer.org:8080/erddap
ProxyPassReverse /erddap http://www.YourServer.org:8080/erddap
-
Then restart Apache: /usr/sbin/apachectl restart (but sometimes it is in a different directory).
-
Start Tomcat.
If Tomcat is already running, use tomcat/bin/shutdown.sh to shut down Tomcat.
Use tomcat/bin/startup.sh to start Tomcat.
Or, if you use the Tomcat Web Application Manager:
-
Download erddap.war into a temporary directory on your computer.
-
Use “Select WAR file to upload” to pick the erddap.war file.
-
Click on “Deploy”.
-
Is ERDDAP running?
Use a browser to try to view http://www.YourServer.org/erddap/
ERDDAP starts up without any datasets loaded. Datasets are loaded in a background thread and so become available one-by-one.
Trouble? Tomcat has to do a lot of work the first time an application like ERDDAP is started. On some servers, the first attempt to view ERDDAP stalls (30 seconds?) until this work is finished. On other servers, the first attempt will fail immediately. But if you wait 30 seconds and try again, it will succeed if ERDDAP was installed correctly.
If your second attempt to view ERDDAP fails, look for error messages in
Tomcat’s log file (tomcat/logs/catalina.out) and
ERDDAP’s log file (bigParentDirectory/logs/log.txt).
-
In the future, to shut down (and restart) ERDDAP, see
How to Shut Down and Restart Tomcat and ERDDAP.
-
Troubles installing Tomcat or ERDDAP?
Email me at bob dot simons at noaa dot gov . I will help you.
-
Email Notification of Updates
If you want to receive an email whenever a new version of ERDDAP is available, send an email to bob dot simons at noaa dot gov requesting this.
How To Do an Update of an Existing ERDDAP on Your Server
-
Make the changes listed in Changes in the section entitled “Things ERDDAP Administrators Need to Know and Do” for all of the ERDDAP versions since the version you were using.
-
Download erddap.war (version 1.46, size=482,115,063 bytes, MD5=26957D4A6866F5DBAF3238E7A39BD0FE) into a temporary directory.
-
messages.xml
-
If you are upgrading from ERDDAP version 1.44 (or below),
you MUST delete the old messages.xml file:
tomcat/content/erddap/messages.xml .
-
Common: If you just use the standard messages,
the new messages will be installed automatically. The standard messages are now stored in the erddap.war file (not a separate messages.xml file).
-
Rare: If you always make changes to the standard messages,
you need to make those changes to the new messages.xml file (which is
WEB-INF/classes/gov/noaa/pfel/erddap/util/messages.xml after erddap.war is decompressed by Tomcat).
-
Rare: If you maintain a custom messages.xml file
in tomcat/content/erddap/, you need to figure out (via diff) what changes have been made to the default messages.xml (which is in the new erddap.war as
WEB-INF/classes/gov/noaa/pfel/erddap/util/messages.xml) and modify your custom messages.xml file accordingly.
-
If you use Tomcat Manager:
-
“Undeploy” ERDDAP.
-
“Deploy” the erddap.war file (it will ask you for the location of the erddap.war file).
If you use a Linux or Mac command line:
-
Follow the first part of these instructions to shut down Tomcat and ERDDAP
(but don’t restart it yet).
-
In tomcat/webapps, use rm -r erddap
-
In tomcat/webapps, use rm erddap.war
-
Copy the new erddap.war file from the temporary directory to tomcat/webapps .
-
In tomcat/bin, use ./startup.sh to restart Tomcat and ERDDAP.
-
View ERDDAP in your browser to check that the restart succeeded. (Sometimes, you need to try to load ERDDAP in your browser a few times before it succeeds.)
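The file-replacement steps in the command line procedure above can be sketched as one small function. This is only an illustration under stated assumptions: the function name replace_erddap_war is hypothetical, it must be run only after Tomcat has fully shut down, and it uses rm -rf (rather than rm -r) so that it doesn't stop if the old files are already gone.

```shell
#!/bin/sh
# Hypothetical helper: swap in a newly downloaded erddap.war.
# Run this only after Tomcat and ERDDAP have fully shut down.
#   $1 = the tomcat directory, $2 = path to the new erddap.war
replace_erddap_war() {
    tomcat_dir=$1
    new_war=$2
    # Refuse to delete anything unless both paths look right.
    [ -d "$tomcat_dir/webapps" ] || { echo "no webapps directory in: $tomcat_dir" >&2; return 1; }
    [ -f "$new_war" ] || { echo "no such file: $new_war" >&2; return 1; }
    rm -rf "$tomcat_dir/webapps/erddap"       # the old decompressed application
    rm -f "$tomcat_dir/webapps/erddap.war"    # the old .war file
    cp "$new_war" "$tomcat_dir/webapps/erddap.war"
}
```

After it succeeds, restart with ./startup.sh in tomcat/bin as described above.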
-
Troubles updating ERDDAP?
Email me at bob dot simons at noaa dot gov . I will help you.
Things You Need To Know
-
If ERDDAP Is Responding Slowly
you may be able to figure out if the slowness is reasonable and temporary (e.g., because of lots of requests from scripts or WMS users), or if ERDDAP is having trouble and you need to shut down and restart Tomcat and ERDDAP.
If ERDDAP is responding slowly, here are some things you can do to determine the cause:
-
Look for clues (e.g., error messages) in
ERDDAP’s log file (bigParentDirectory/logs/log.txt) and
Tomcat’s log file (tomcat/logs/catalina.out).
-
If just one dataset is responding slowly (perhaps only for certain types of requests), the problem may be:
-
ERDDAP’s access to the dataset’s source data may be temporarily or permanently slow. Try to check the source’s speed independent of ERDDAP. If it is slow, perhaps you can improve it.
-
The dataset may not be optimally configured. You may be able to make changes to the dataset’s datasets.xml chunk to help ERDDAP handle the dataset better. For example, the configuration of the <subsetVariables> tag has a huge influence on how ERDDAP handles EDDTable datasets. Similarly, many factors influence the speed of an EDDTableFromDatabase dataset. If you want help speeding up a specific dataset, email a description of the problem and the dataset’s chunk of datasets.xml to bob dot simons at noaa dot gov.
-
If everything in ERDDAP is always slow, the problem may be:
-
The computer that is running ERDDAP may not have enough memory or processing power. It is good to run ERDDAP on a modern, multi-core server. For heavy use, the server should have a 64-bit operating system and 8 GB or more of memory.
-
The computer that is running ERDDAP may also be running other applications that are consuming lots of system resources. If so, can you get a dedicated server for ERDDAP? For example (this is not an endorsement), you can get a quad-core Mac Mini Server with 8 GB of memory for ~$1100.
-
If everything in ERDDAP is temporarily slow, view your ERDDAP’s
/erddap/status.html page in your browser.
-
Does the ERDDAP status page fail to load?
If so, restart ERDDAP.
-
Did the ERDDAP status page load slowly (e.g., >5 seconds)?
That is a sign that everything in ERDDAP is running slowly, but it isn’t necessarily trouble.
-
For “Response Failed Time (since last major LoadDatasets)”, is n= a large number?
That indicates there have been lots of failed requests recently. That may be trouble or the start of trouble. The median time for the failures is often large (e.g., 210000 ms),
which means that there were (are?) lots of active threads,
which were tying up lots of resources (like memory, open files, open sockets, …),
which is not good.
-
For “Response Succeeded Time (since last major LoadDatasets)”, is n= a large number?
That indicates there have been lots of successful requests recently. This isn’t trouble. It just means your ERDDAP is getting heavy use.
-
Is the “Number of non-Tomcat-waiting threads” double a typical value?
This is often serious trouble that will cause ERDDAP to slow down and eventually freeze. If this persists for hours, you may want to proactively restart ERDDAP.
-
At the bottom of the “Memory Use Summary” list, is the last “Memory: currently using” value very high?
That may just indicate high usage, or it may be a sign of trouble.
-
Is your internet connection currently slow?
Search the internet for “internet speed test” and use one of the free online tests, such as http://www.speakeasy.net/speedtest//. If your internet connection is slow, it slows down ERDDAP. Sometimes, you can solve this by stopping unnecessary internet use (e.g., people watching streaming videos or on video conference calls).
-
Use the “Server Status” link in your Tomcat Manager to check on the status of all of the response threads.
-
How To Shut Down and Restart Tomcat and ERDDAP
You don’t need to shut down and restart Tomcat and ERDDAP if ERDDAP is temporarily slow, or slow for some known reason (like lots of requests from scripts or WMS users).
You do need to shut down and restart Tomcat and ERDDAP if you need to apply changes to the setup.xml file, or if ERDDAP freezes, hangs, or locks up.
To shut down and restart Tomcat and ERDDAP:
-
If you use Linux or a Mac:
-
Use cd tomcat/bin
-
Use ps -u tomcatUser | grep java to find the java/tomcat processID (hopefully, just one process will be listed), which we’ll call javaProcessID below.
-
If ERDDAP is frozen/hung/locked up, use kill -3 javaProcessID to tell Java (which is running Tomcat) to do a thread dump to the Tomcat log file
tomcat/logs/catalina.out. After you reboot, you can diagnose the problem by finding the thread dump information (and any other useful information above it) in tomcat/logs/catalina.out and also by reading relevant parts of the ERDDAP log archive. If you want, you can email that information to bob dot simons at noaa dot gov so I can see what went wrong.
-
Use ./shutdown.sh
-
Use ps -u tomcatUser | grep java repeatedly until the java/tomcat process isn’t listed.
Sometimes, the java/tomcat process will take up to two minutes to shut down. The reason is: ERDDAP sends a message to its background threads to tell them to stop, but sometimes it takes these threads up to two minutes to get to a good stopping place.
-
If you don’t want to wait for java/tomcat to stop by itself, you can use
kill -9 javaProcessID
to force the java/tomcat process to stop immediately. If possible, use this only as a last resort. The -9 switch is powerful, but it may cause various problems.
-
To restart ERDDAP, use ./startup.sh
-
View ERDDAP in your browser to check that the restart succeeded. (Sometimes, you need to wait 30 seconds and try to load ERDDAP again in your browser for it to succeed.)
-
If you use Windows:
-
Use cd tomcat/bin
-
Use shutdown
-
You may want/need to use the Windows Task Manager (accessible via Ctrl Alt Del) to ensure that the Java/Tomcat/ERDDAP process/application has fully stopped.
Sometimes, the process/application will take up to two minutes to shut down. The reason is: ERDDAP sends a message to its background threads to tell them to stop, but sometimes it takes these threads up to two minutes to get to a good stopping place.
-
To restart ERDDAP, use startup
-
View ERDDAP in your browser to check that the restart succeeded. (Sometimes, you need to wait 30 seconds and try to load ERDDAP again in your browser for it to succeed.)
-
Frequent Crashes – If ERDDAP freezes/stops twice or more in one day, you probably have a troublesome user running several scripts at once and/or someone making a large number of invalid requests. If this happens, you should probably blacklist that user. When a blacklisted user makes a request, the error message in the response encourages them to email you to work out the problems. Then, you can encourage them to run just one script at a time and to fix the problems in their script (e.g., requesting data from a remote dataset that can’t respond before timing out). See <requestBlacklist> in your datasets.xml file.
-
Monitor ERDDAP – My experience is that ERDDAP can run for months without needing a restart. But sometimes, if there is heavy use (e.g., scripts or WMS users) and especially if many of the requests are failing in odd ways, then ERDDAP freezes (and needs to be restarted) every few days or even every day. I suspect that the problem is related to the program stopping lower level processing threads that froze or otherwise failed in an odd way. Java only offers one way for a program to stop these threads and that method is not good (and deprecated) because it is highly likely that it causes memory and other resource leaks. Those leaks accumulate. After thousands of leaks, ERDDAP becomes very slow or freezes. Unfortunately, I haven’t found which resource leak is causing the problem. But if you look at your ERDDAP’s /erddap/status.html page, when the number on the line
Number of non-Tomcat-waiting threads in this JVM
increases significantly (e.g., doubles) and permanently (more than an hour or two), that is usually a sign that trouble is here or is coming soon.
If you find that you frequently need to restart ERDDAP, you might try using Monit to monitor your ERDDAP and restart it if needed. Or, you could make a cron job to restart ERDDAP (proactively) periodically. It may be a little challenging to write a script to automate monitoring and restarting ERDDAP. Some tips that might help:
-
You can simplify testing if the Tomcat process is still running by using the -c switch with grep:
ps -u tomcatUser | grep -c java
That will reduce the output to “1” if the tomcat process is still alive, or “0” if the process has stopped.
-
If you are good with gawk, you can extract the processID from the results of
ps -u tomcatUser | grep java, and use the processID in other lines of the script.
If you do set up Monit or a cron job, please email the details to bob dot simons at noaa dot gov .
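The second tip above can be sketched with plain awk instead of gawk. This is a minimal sketch, assuming the default ps layout (process ID in the first column, command name in the last); the function name java_pids is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "ps -u tomcatUser" output on standard input and
# print the process ID of every line whose command (last column) is "java".
# Assumption: default ps layout, with PID first and the command name last.
java_pids() {
    awk '$NF == "java" { print $1 }'
}
```

In a monitoring script you might use it as javaProcessID=$(ps -u tomcatUser | java_pids) and then test whether the result is empty. If more than one java process is listed, it prints one process ID per line, so the script should handle that case.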
-
PermGen – If you repeatedly use Tomcat Manager to Reload (or Stop and Start) ERDDAP, ERDDAP may fail to start up and throw java.lang.OutOfMemoryError: PermGen. The solution is to periodically (or every time?) shut down and restart Tomcat and ERDDAP, instead of just reloading ERDDAP.
[Update: This problem should be greatly minimized or fixed in ERDDAP version 1.24.]
-
log.txt – If ERDDAP doesn’t start up or if something isn’t working as expected, it is very useful to look at the error and diagnostic messages in the ERDDAP log file.
-
The log file is bigParentDirectory/logs/log.txt
(bigParentDirectory is specified in setup.xml). If there is no log.txt file or if the log.txt file hasn’t been updated since you restarted ERDDAP, look in the Tomcat Log Files to see if there is an error message there.
-
Types of diagnostic messages in the log file:
-
The word “error” is used when something went so wrong that the procedure failed to complete. Although it is annoying to get an error, the error forces you to deal with the problem. Our thinking is that it is better to throw an error, than to have ERDDAP hobble along, working in a way you didn’t expect.
-
The word “warning” is used when something went wrong, but the procedure was able to complete. These are pretty rare.
-
Anything else is just an informative message. You can control how much information is logged with <logLevel> in setup.xml.
-
When the log.txt file gets to 20 MB,
the file is renamed log.txt.previous and a new log.txt file is created. So log files don’t accumulate.
-
Whenever you restart ERDDAP,
ERDDAP makes an archive copy of the log.txt and log.txt.previous files with a time stamp in the file’s name. If there was trouble before the restart, it may be useful to analyze these archived files for clues as to what the trouble was. You can delete the archive files if they are no longer needed.
-
Tomcat Log Files – If ERDDAP doesn’t start up because an error occurred very early in ERDDAP’s startup, the error message will show up in Tomcat’s log files (tomcat/logs/catalina.today.log or tomcat/logs/catalina.out), not in ERDDAP’s log.txt file.
-
emailLogYEAR-MM-DD.txt – ERDDAP always writes the text of all out-going email messages in the current day’s emailLogYEAR-MM-DD.txt file in bigParentDirectory/logs (bigParentDirectory is specified in setup.xml).
-
If the server can’t send out email messages, or if you have configured ERDDAP not to send out email messages, or if you are just curious, this file is a convenient way to see all of the email messages that have been sent out.
-
You can delete previous days’ email log files if they are no longer needed.
-
Daily Report – The Daily Report has lots of useful information — all of the information from your ERDDAP’s /erddap/status.html page and more.
-
It is the most complete summary of your ERDDAP’s status.
-
Among other statistics, it includes a list of datasets that didn’t load and the exceptions they generated.
-
It is generated when you start up ERDDAP (just after ERDDAP finishes trying to load all of the datasets) and generated soon after 7 am local time every morning.
-
Whenever it is generated, it is written to ERDDAP’s log.txt file.
-
Whenever it is generated, it is emailed to <emailDailyReportsTo> and <emailEverythingTo> (which are specified in setup.xml) provided you have set up the email system (in setup.xml).
-
Status Page – You can view the status of your ERDDAP from any browser by going to <baseUrl>/erddap/status.html
-
This page is generated dynamically, so it always has up-to-the-moment statistics for your ERDDAP.
-
It includes statistics regarding the number of requests, memory usage, thread stack traces, the taskThread, etc.
-
Because the Status Page can be viewed by anyone, it doesn’t include quite as much information as the Daily Report.
-
Adding/Changing Datasets – ERDDAP usually rereads datasets.xml every loadDatasetsMinMinutes (specified in setup.xml). So you can make changes to datasets.xml any time, even while ERDDAP is running.
A new dataset will be detected soon, usually within loadDatasetsMinMinutes.
A changed dataset will be reloaded when it is reloadEveryNMinutes old (as specified in datasets.xml).
-
A Flag File Tells ERDDAP to Try to Reload a Dataset As Soon As Possible
-
ERDDAP won’t notice any changes to a dataset’s setup in datasets.xml until ERDDAP reloads the dataset.
-
If a dataset is active in ERDDAP and you want to force ERDDAP to reload it as soon as possible (before the dataset’s <reloadEveryNMinutes> would cause it to be reloaded), put a file in bigParentDirectory/flag (bigParentDirectory is specified in setup.xml) that has the same name as the dataset’s datasetID.
-
The contents of the flag file are irrelevant.
-
In between major dataset reloads, ERDDAP looks continuously for flag files.
-
When ERDDAP finds a flag file, it deletes the file and tries to reload the dataset very soon (usually within seconds).
-
Note that when a dataset is reloaded, all files in the bigParentDirectory/cache/datasetID directory are deleted.
-
Note that if the dataset’s xml includes active="false", a flag will cause the dataset to be made inactive (if it is active), and in any case, not reloaded.
ERDDAP has a web service so that flags can be set via URLs.
-
For example,
http://coastwatch.pfeg.noaa.gov/erddap/setDatasetFlag.txt?datasetID=rPmelTao&flagKey=31415926
(that’s a fake flagKey) will set a flag for the rPmelTao dataset.
-
There is a different flagKey for each datasetID.
-
Administrators can see a list of flag URLs for all datasets by looking at the bottom of their Daily Report email.
-
Administrators should treat these URLs as confidential, since they give someone the right to reset a dataset at will.
-
If you think the flagKeys have fallen into the hands of someone who is abusing them, you can change <flagKeyKey> in setup.xml and restart ERDDAP to force ERDDAP to generate and use a different set of flagKeys.
-
If you change <flagKeyKey>, delete all of the old subscriptions (see the list in your Daily Report) and remember to send the new URLs to the people who you do want to have them.
The flag system can serve as the basis for a more efficient mechanism for telling ERDDAP when to reload a dataset. For example, you could set a dataset’s <reloadEveryNMinutes> to a large number (e.g., 10080 = 1 week). Then, when you know the dataset has changed (perhaps because you added a file to the dataset’s data directory), set a flag so that the dataset is reloaded as soon as possible. Flags are usually seen quickly. But if the LoadDatasets thread is already busy, it may be a while before it is available to act on the flag. But the flag system is much more responsive and much more efficient than setting <reloadEveryNMinutes> to a small number.
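For the file-based approach described above, setting a flag from a script can be as simple as creating an empty file. This is a minimal sketch; the function name set_dataset_flag is hypothetical, and it assumes the script runs as a user with permission to write to the flag directory.

```shell
#!/bin/sh
# Hypothetical helper: ask ERDDAP to reload a dataset as soon as possible by
# creating a flag file named after the dataset's datasetID.
#   $1 = bigParentDirectory (as specified in setup.xml), $2 = datasetID
set_dataset_flag() {
    flag_dir=$1/flag
    [ -d "$flag_dir" ] || { echo "no flag directory: $flag_dir" >&2; return 1; }
    # The contents of the flag file are irrelevant, so an empty file is enough.
    : > "$flag_dir/$2"
}
```

For example, a script that adds a new data file to a dataset’s data directory could finish by calling set_dataset_flag with your bigParentDirectory and the dataset’s datasetID.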
-
Force Dataset Removal – If a dataset is active in ERDDAP, and you want to deactivate it as soon as possible, add active="false" to the dataset tag and set a flag. Flags are usually seen quickly. But if the LoadDatasets thread is already busy, it may be a while before it is available to act on the flag. Once the dataset is not active (i.e., not visible in ERDDAP’s list of datasets), you can remove the dataset’s description from the datasets.xml file if you want to.
-
When Are Datasets Reloaded? A thread called RunLoadDatasets is the master thread that controls when datasets are reloaded. RunLoadDatasets loops forever:
-
RunLoadDatasets notes the current time.
-
RunLoadDatasets starts a LoadDatasets thread to do a “majorLoad”. You can see information about the current/previous majorLoad at the top of your ERDDAP’s
/erddap/status.html page (status page example).
-
LoadDatasets makes a copy of datasets.xml.
-
LoadDatasets reads through the copy of datasets.xml and, for each dataset, sees if the dataset needs to be (re)loaded or removed.
-
If a flag file exists for this dataset, the file is deleted and the dataset is removed if active="false" or (re)loaded if active="true" (regardless of the dataset’s age).
-
If the dataset’s datasets.xml chunk has active="false" and the dataset is currently loaded (active), it is unloaded (removed).
-
If the dataset has active="true" and the dataset isn’t already loaded, it is loaded.
-
If the dataset has active="true" and the dataset is already loaded, the dataset is reloaded if the dataset’s age (time since last load) is greater than its <reloadEveryNMinutes> (default = 10080 minutes), otherwise, the dataset is left alone.
-
LoadDatasets finishes.
The RunLoadDatasets thread waits for the LoadDatasets thread to finish. If LoadDatasets takes longer than loadDatasetsMinMinutes (as specified in setup.xml), RunLoadDatasets interrupts the LoadDatasets thread. Ideally, LoadDatasets notices the interrupt and finishes. But if it doesn’t notice the interrupt within a minute, RunLoadDatasets calls loadDatasets.stop(), which is undesirable.
-
While the time since the start of the last majorLoad is less than loadDatasetsMinMinutes (as specified in setup.xml, e.g., 15 minutes), RunLoadDatasets repeatedly looks for flag files in the bigParentDirectory/flag directory. If one or more flag files are found, they are deleted, and RunLoadDatasets starts a LoadDatasets thread to do a “minorLoad” (majorLoad=false). You can’t see minorLoad information on your ERDDAP’s /erddap/status.html page.
-
LoadDatasets makes a copy of datasets.xml.
-
LoadDatasets reads through the copy of datasets.xml and, for each dataset for which there was a flag file:
-
If the dataset’s datasets.xml chunk has active="false" and the dataset is currently loaded (active), it is unloaded (removed).
-
If the dataset has active="true", the dataset is (re)loaded, regardless of its age.
Non-flagged datasets are ignored.
-
LoadDatasets finishes.
-
RunLoadDatasets goes back to step 1.
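The per-dataset decisions in the majorLoad and minorLoad steps above can be summarized in a few lines of Java. This is only a minimal sketch of the rules as described here, not ERDDAP’s actual code: the class name, the method names, and the Action enum are hypothetical and exist only for illustration.

```java
/** Illustrative only: restates the (re)load rules described above. Not ERDDAP's code. */
public class ReloadDecisionSketch {

    /** Hypothetical names for the possible outcomes for one dataset. */
    public enum Action { LOAD, RELOAD, UNLOAD, LEAVE_ALONE }

    /** The default <reloadEveryNMinutes> mentioned above (10080 minutes = 7 days). */
    public static final long DEFAULT_RELOAD_EVERY_N_MINUTES = 10080;

    /** majorLoad: every dataset in datasets.xml is considered. */
    public static Action majorLoadAction(boolean flagFound, boolean active, boolean loaded,
                                         long ageMinutes, long reloadEveryNMinutes) {
        if (!active) {
            // active="false": remove the dataset if it is currently loaded
            return loaded ? Action.UNLOAD : Action.LEAVE_ALONE;
        }
        if (!loaded) {
            return Action.LOAD;            // active="true" but not yet loaded
        }
        if (flagFound) {
            return Action.RELOAD;          // a flag forces a reload regardless of age
        }
        // otherwise reload only if the dataset is older than its reload interval
        return ageMinutes > reloadEveryNMinutes ? Action.RELOAD : Action.LEAVE_ALONE;
    }

    /** minorLoad: only flagged datasets are considered; the rest are ignored. */
    public static Action minorLoadAction(boolean flagFound, boolean active, boolean loaded) {
        if (!flagFound) {
            return Action.LEAVE_ALONE;
        }
        if (!active) {
            return loaded ? Action.UNLOAD : Action.LEAVE_ALONE;
        }
        return loaded ? Action.RELOAD : Action.LOAD;
    }

    public static void main(String[] args) {
        // A loaded dataset that is 8 days old with the default reload interval
        System.out.println(majorLoadAction(false, true, true, 8 * 1440,
                DEFAULT_RELOAD_EVERY_N_MINUTES));
        // A flagged, loaded dataset during a minorLoad
        System.out.println(minorLoadAction(true, true, true));
    }
}
```

Writing the rules as pure functions of a few booleans makes it easy to check an expectation (for example, “will my flag be honored if active is false?”) before looking in log.txt.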
Notes:
-
Startup
When you restart ERDDAP, every dataset with active="true" is loaded.
-
Cache
When a dataset is (re)loaded, its cache (including any data response files and/or image files) is emptied.
-
Lots of Datasets
If you have a lot of datasets and/or one or more datasets are slow to (re)load, a LoadDatasets thread may take a long time to finish its work, perhaps even longer than loadDatasetsMinMinutes.
-
One LoadDatasets Thread
There is never more than one LoadDatasets thread running at once. If a flag is set when LoadDatasets is already running, the flag probably won’t be noticed or acted on until that LoadDatasets thread finishes running. You might say: “That’s stupid. Why don’t you just start a bunch of new threads to load datasets?” But if you have lots of datasets which get data from one remote server, even one LoadDatasets thread will put substantial stress on the remote server. The same is true if you have lots of datasets which get data from files on one RAID. There are rapidly diminishing returns from having more than one LoadDatasets thread.
-
Flag=ASAP
Setting a flag just signals that the dataset should be (re)loaded as soon as possible, not necessarily immediately. If no LoadDatasets thread is currently running, the dataset will start to be reloaded within a few seconds. But if a LoadDatasets thread is currently running, the dataset probably won’t be reloaded until after that LoadDatasets thread is finished.
-
Flag File Deleted
In general, if you put a flag file in the bigParentDirectory/flag directory (by visiting the dataset’s flagUrl or putting an actual file there), the dataset will usually be reloaded very soon after that flag file is deleted.
-
Flag vs. Small reloadEveryNMinutes
If you have some external way of knowing when a dataset needs to be reloaded and if it is convenient for you, the best way to make sure that a dataset is always up-to-date is to set its reloadEveryNMinutes to a large number (10080?) and set a flag (via a script?) whenever it needs to be reloaded. That is the system that EDDGridFromErddap and EDDTableFromErddap use to receive messages that the dataset needs to be reloaded.
-
Look in log.txt
Lots of relevant information is written to the bigParentDirectory/logs/log.txt file. If things aren’t working as you expect, looking at log.txt lets you diagnose the problem by finding out exactly what ERDDAP did.
-
Search for “majorLoad=true” for the start of major LoadDatasets threads.
-
Search for “majorLoad=false” for the start of minor LoadDatasets threads.
-
Search for a given dataset’s datasetID for information about it being (re)loaded or queried.
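The notes above suggest setting a flag from a script whenever a dataset needs to be reloaded. Here is a minimal Java sketch of one way to do that by creating the flag file directly. It assumes that the flag file’s name is the dataset’s datasetID and that you pass in the path of your flag directory; the class and method names are hypothetical, and the datasetID check is just a precaution added for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative only: creates an empty flag file for one dataset. */
public class FlagFileSketch {

    /**
     * Creates (or truncates) an empty file named datasetID in flagDir.
     * Assumption: the flag file is named after the datasetID.
     */
    public static Path setFlag(Path flagDir, String datasetID) {
        // Precaution for this sketch: refuse names that could escape the directory
        if (!datasetID.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("Unexpected datasetID: " + datasetID);
        }
        try {
            Files.createDirectories(flagDir);
            Path flag = flagDir.resolve(datasetID);
            Files.write(flag, new byte[0]);   // an empty file is enough
            return flag;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("Usage: java FlagFileSketch <flagDirectory> <datasetID>");
            return;
        }
        System.out.println("Set flag: " + setFlag(Paths.get(args[0]), args[1]));
    }
}
```

If the script runs on a different computer than ERDDAP, visiting the dataset’s flagUrl is the simpler option, because it doesn’t require access to the server’s file system.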
-
Cached Responses – In general, ERDDAP doesn’t cache (store) responses to user requests. The rationale was that most requests would be slightly different so the cache wouldn’t be very effective. The biggest exceptions are requests for image files (which are cached since browsers and programs like Google Earth often re-request images) and requests for .nc files (because they can’t be created on-the-fly). ERDDAP stores each dataset’s cached files in a different directory: bigParentDirectory/cache/datasetID since a single cache directory might have a huge number of files which might become slow to access.
Files are removed from the cache for one of three reasons:
-
All files in this cache are deleted when ERDDAP is restarted.
-
Periodically, any file more than <cacheMinutes> old (as specified in setup.xml) will be deleted. Removing files in the cache based on age (not Least-Recently-Used) ensures that files won’t stay in the cache very long. Although it might seem like a given request should always return the same response, that isn’t true. For example, a tabledap request which includes &time>someTime will change if new data arrives for the dataset. And a griddap request which includes [last] for the time dimension will change if new data arrives for the dataset.
-
Images showing error conditions are cached, but only for a few minutes (it’s a difficult situation).
-
Every time a dataset is reloaded, all files in that dataset’s cache are deleted. Because requests may be for the “last” index in a gridded dataset, files in the cache may become invalid when a dataset is reloaded.
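The age-based rule above (delete anything older than <cacheMinutes>, regardless of how recently it was used) is simple to express in code. The following Java sketch shows the idea for a single cache directory. It is not ERDDAP’s implementation: the class and method names are hypothetical, and the touch helper exists only so the demonstration can create files with a chosen age.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

/** Illustrative only: age-based (not Least-Recently-Used) cache cleanup. */
public class CachePurgeSketch {

    /** Deletes regular files in cacheDir last modified more than cacheMinutes before nowMillis. */
    public static int purgeOlderThan(Path cacheDir, long cacheMinutes, long nowMillis) {
        long cutoffMillis = nowMillis - cacheMinutes * 60_000L;
        int deleted = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(cacheDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)
                        && Files.getLastModifiedTime(file).toMillis() < cutoffMillis) {
                    Files.delete(file);
                    deleted++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return deleted;
    }

    /** Demo helper: creates an empty file with the given last-modified time. */
    public static Path touch(Path file, long lastModifiedMillis) {
        try {
            Files.createDirectories(file.getParent());
            Files.write(file, new byte[0]);
            Files.setLastModifiedTime(file, FileTime.fromMillis(lastModifiedMillis));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cacheSketch");
        long now = System.currentTimeMillis();
        touch(dir.resolve("old.png"), now - 120 * 60_000L);  // two hours old
        touch(dir.resolve("fresh.png"), now);
        System.out.println("Deleted " + purgeOlderThan(dir, 60, now) + " file(s)");
    }
}
```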
-
Stored Dataset Information –
For all types of datasets, ERDDAP gathers lots of information when a dataset is loaded and keeps that in memory. This allows ERDDAP to respond very quickly to searches, requests for lists of datasets, and requests for information about a dataset.
For a few types of datasets (notably EDDGridCopy, EDDTableCopy, EDDGridFromXxxFiles, and EDDTableFromXxxFiles), ERDDAP stores on disk some information about the dataset that is reused when the dataset is reloaded. This greatly speeds the reloading process.
-
The dataset information files are human-readable .json files and are stored in bigParentDirectory/dataset .
-
ERDDAP only deletes these files in unusual situations.
-
It shouldn’t ever be necessary for you to delete these files because ERDDAP verifies and updates the stored information when the dataset is reloaded.
-
But if you ever do need to delete these files (why?), you can do it when ERDDAP is running. Then set a flag.
-
If you want to encourage ERDDAP to update the stored dataset information (for example, if you just added, removed, or changed some files in the dataset’s data directory), use the flag system.
-
robots.txt – The search engine companies use web crawlers (e.g., GoogleBot) to examine all of the pages on the web to add the content to the search engines. For ERDDAP, that is good. ERDDAP has lots of links between pages, so the crawlers will find all of the web pages and add them to the search engines. Then, users of the search engines will be able to find datasets on your ERDDAP.
Unfortunately, some web crawlers (e.g., GoogleBot) are now filling out and submitting forms in order to find additional content. For web commerce sites, this is great. But this is terrible for ERDDAP because it just leads to an infinite number of undesirable and pointless attempts to crawl the actual data. This can lead to more requests for data than from all other users combined. And it fills the search engine with goofy, pointless subsets of the actual data.
To tell the web crawlers to stop filling out forms, you need to create a text file called robots.txt in the root directory of your web site’s document hierarchy so that it can be viewed by anyone as, e.g., http://www.example.com/robots.txt .
If you are creating a new robots.txt file, put these two lines in it:
User-agent: *
Disallow: /*?
If you already have a robots.txt file, add this line to the Disallow section:
Disallow: /*?
This tells the web crawler not to visit URLs with “?” in the URL (i.e., all form submissions).
It may take a few days for the search engines to notice and for the changes to take effect.
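To see which URLs the “Disallow: /*?” rule covers, here is a minimal Java sketch of how a crawler that supports the “*” wildcard might test a URL path against a rule. It is an illustration under stated assumptions, not any particular crawler’s code: it handles only “*” (not the “$” end anchor), treats a rule as a prefix pattern, and the class name, method name, and example paths are made up for this example.

```java
import java.util.regex.Pattern;

/** Illustrative only: prefix matching of a robots.txt rule that may contain "*". */
public class RobotsRuleSketch {

    /** Returns true if pathAndQuery starts with text matching the rule ("*" = any characters). */
    public static boolean matches(String rule, String pathAndQuery) {
        StringBuilder regex = new StringBuilder();
        String[] literals = rule.split("\\*", -1);   // keep empty pieces at the ends
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*");                  // each "*" matches any run of characters
            }
            regex.append(Pattern.quote(literals[i])); // everything else is literal text
        }
        // lookingAt() anchors at the start only, which gives prefix matching
        return Pattern.compile(regex.toString()).matcher(pathAndQuery).lookingAt();
    }

    public static void main(String[] args) {
        // A form submission (has a "?"): covered by the rule
        System.out.println(matches("/*?", "/erddap/tabledap/someDataset.csv?time,temperature"));
        // An ordinary page (no "?"): not covered by the rule
        System.out.println(matches("/*?", "/erddap/info/index.html"));
    }
}
```

The point of the rule is that ordinary ERDDAP pages stay crawlable, while any URL with a query string (that is, a data request) is off limits.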
-
sitemap.xml – As the www.sitemaps.org web site says:
Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling your site.
Actually, since ERDDAP is RESTful, search engine spiders can easily crawl your ERDDAP. But they tend to do it more often (daily!) than necessary (monthly?).
-
Data Dissemination / Data Distribution Networks: Push and Pull Technology
-
Normally, ERDDAP acts as an intermediary: it takes a request from a user; gets data from a remote data source; reformats the data; and sends it to the user.
-
Pull Technology: ERDDAP also has the ability to actively get all of the available data from a remote data source and store a local copy of the data.
-
Push Technology: By using ERDDAP’s subscription services, other data servers can be notified as soon as new data is available so that they can request the data (by pulling the data).
-
ERDDAP’s EDDGridFromErddap and EDDTableFromErddap use ERDDAP’s subscription services and flag system so that they will be notified immediately when new data is available.
-
You can combine these to great effect: if you wrap an EDDGridCopy around an EDDGridFromErddap dataset (or wrap an EDDTableCopy around an EDDTableFromErddap dataset), ERDDAP will automatically create and maintain a local copy of another ERDDAP’s dataset.
-
Because the subscription services work as soon as new data is available, push technology disseminates data very quickly (within seconds).
This architecture puts each ERDDAP administrator in charge of determining where the data for his/her ERDDAP comes from.
-
Other ERDDAP administrators can do the same. There is no need for coordination between administrators.
-
If many ERDDAP administrators link to each other’s ERDDAPs, a data distribution network is formed.
-
Data will be quickly, efficiently, and automatically disseminated from data sources (ERDDAPs and other servers) to data re-distribution sites (ERDDAPs) anywhere in the network.
-
A given ERDDAP can be both a source of data for some datasets and a re-distribution site for other datasets.
-
The resulting network is roughly similar to data distribution networks set up with programs like Unidata’s IDD/LDM, but less rigidly structured.
-
Security/Authentication/Authorization – By default, ERDDAP runs as an entirely public server (using http) with no login system (authentication) and no restrictions to data access (authorization). If you want to restrict access to some or all datasets to some users, you can use ERDDAP’s built-in security system. When the security system is in use:
-
ERDDAP uses role-based access control.
-
The ERDDAP administrator defines users with the <user> tag in datasets.xml. Each user has a username, a password, and one or more roles.
-
The ERDDAP administrator defines which roles have access to a given dataset via the <accessibleTo> tag in datasets.xml for any dataset that shouldn’t have public access.
-
The user’s log in status (and a link to log in/out) will be shown at the top of every web page. (But a logged in user will appear to ERDDAP to be not logged in if he uses an http URL.) If a user tries and fails to log in 3 times, the user is blocked from trying to log in for 15 minutes. This prevents hackers from simply trying millions of passwords until they find the right one.
-
Users who are not logged in use ERDDAP’s http URLs. Users who are logged in use ERDDAP’s https URLs. This helps prevent session hijacking and sidejacking.
-
Anyone who isn’t logged in can access and use the public datasets. By default, private datasets don’t appear in lists of datasets if a user isn’t logged in. If the administrator has set setup.xml’s <listPrivateDatasets> to true, they will appear. Attempts to request data from private datasets (if the user knows the URL) will be redirected to the login page.
-
Anyone who is logged in will be able to see and request data from any public dataset and any private dataset to which their role allows them access. By default, private datasets to which a user doesn’t have access don’t appear in lists of datasets. If the administrator has set setup.xml’s <listPrivateDatasets> to true, they will appear. Attempts to request data from private datasets to which the user doesn’t have access will be redirected to the login page.
-
The RSS information for all datasets is always available to anyone. (This is not ideal. But RSS readers won’t ever log in, so they need access without logging in.) But, since private datasets that aren’t accessible aren’t advertised, their RSS links are not advertised either.
-
Email subscriptions can only be set up when a user has access to a dataset. Once set up, they continue to function after the user has logged out.
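The role-based access rule described in the list above boils down to a small check: a dataset with no <accessibleTo> restriction is public, and a restricted dataset is available only to a logged-in user who has at least one of the listed roles. The Java sketch below restates that rule. It is not ERDDAP’s code; the class and method names are hypothetical, and using null to mean “public dataset” or “not logged in” is a convention chosen just for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: the role-based access rule described above. */
public class AccessCheckSketch {

    /**
     * @param accessibleTo roles allowed to use the dataset, or null if the dataset is public
     * @param userRoles the logged-in user's roles, or null if the user is not logged in
     */
    public static boolean mayAccess(Set<String> accessibleTo, Set<String> userRoles) {
        if (accessibleTo == null) {
            return true;                 // public dataset: anyone may use it
        }
        if (userRoles == null) {
            return false;                // private dataset, user not logged in
        }
        for (String role : userRoles) {
            if (accessibleTo.contains(role)) {
                return true;             // one matching role is enough
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> datasetRoles = new HashSet<>(Arrays.asList("researcher", "admin"));
        Set<String> userRoles = new HashSet<>(Arrays.asList("student", "researcher"));
        System.out.println(mayAccess(datasetRoles, userRoles)); // shares the "researcher" role
        System.out.println(mayAccess(datasetRoles, null));      // not logged in
    }
}
```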
To set up the security/authentication/authorization system:
-
Do the standard ERDDAP initial setup.
-
In setup.xml,
-
Add/change the <authenticate> value from nothing to custom or openid. See the comments about these options below.
-
Add/change the <baseHttpsUrl> value.
-
Insert/uncomment &loginInfo; in <startBodyHtml> to display the user’s log in/out info at the top of each web page.
-
Configure Tomcat to support SSL (the basis for https connections) by creating a keystore with a digital certificate and by modifying tomcat/conf/server.xml to uncomment the connector for port 8443. It is better to get a digital certificate from a certificate authority than to make a self-signed certificate (instructions), because it gives your clients more assurance that they are indeed connecting to your ERDDAP, not an imposter’s web site. Many vendors sell digital certificates. (Search the web.) They are not expensive. On Windows, you may have to move .keystore from “c:\Documents and Settings\you\.keystore” to “c:\Documents and Settings\Default User\.keystore” or “c:\.keystore” (see tomcat/logs/catalina.today.log if the application doesn’t load or users can’t see the log in page). You can see when the .keystore certificate will expire by examining the certificate when you log in. For additional security, create a signed certificate from a trusted source. (Search the web for more information.)
-
If Tomcat is running in Apache, you need to modify the /etc/httpd/conf/httpd.conf file to allow HTTPS traffic to/from ERDDAP: To the “VirtualHost” tag, add the lines:
ProxyPass /erddap https://www.YourServer.org:8443/erddap
ProxyPassReverse /erddap https://www.YourServer.org:8443/erddap
(This is untested. If you do this and it works or doesn’t work, let us know.)
-
In datasets.xml, create a <user> tag for each user with username, password, and roles information. This is the authorization part of ERDDAP’s security system.
-
In datasets.xml, add an <accessibleTo> tag to each dataset that shouldn’t have public access. <accessibleTo> lets you specify which roles have access to that dataset.
-
Restart Tomcat. Trouble? Check the Tomcat logs.
-
CHECK YOUR WORK! Any mistake could lead to a security flaw.
-
Check that the login page uses https (not http). Attempts to connect via http should be automatically redirected to https and port 8443. You may need to work with your network administrator to allow external web requests to access port 8443 on your server.
-
You can change the <user> and <accessibleTo> tags at any time. The changes will be applied at the next regular reload of any dataset, or immediately if you use a flag.
-
It worked for a few months, now users can’t get to the log in page? Check the Tomcat logs. Your .keystore certificate may have expired and you may need to make a new one. You can see when the .keystore certificate will expire by examining the certificate when you log in.
Authentication (logging in) – Currently, ERDDAP supports custom and openid (recommended) authentication. We strongly recommend OpenID because it frees you from storing and handling users’ passwords. Remember that users often use the same password at different sites, so they may be using the same password for your ERDDAP as they do at their bank. That makes their password very valuable, much more valuable to the user than the data they are requesting. So you need to do as much as you can to keep the passwords private. That is a big responsibility. OpenID takes care of passwords, so you don’t have to gather, store, or work with them, and you are freed from that responsibility.
-
custom is ERDDAP’s custom system for letting users log in by entering their User Name and Password in a form on a web page. This is secure because the User Name and Password are transmitted via https (not http), but OpenID is better because it frees you from having to handle passwords. The custom approach requires you to collect User Names and Passwords (use your phone! email isn’t secure!) and store them in datasets.xml in <user> tags.
The custom approach uses a cookie on the user’s computer, so the user’s browser must be set to allow cookies. If a user is making ERDDAP requests from a computer program (not a browser), cookies are hard to work with. Sorry.
-
openid is an open standard that lets users log in with their password at one web site and then log in without their password at many other web sites, including ERDDAP. OpenID is very convenient for ERDDAP administrators: you don’t ever have to deal with passwords. All you need is a user’s OpenID URL (which is public information) so that you can define the users and their roles in datasets.xml with <user> tags.
OpenID uses a cookie on the user’s computer, so the user’s browser must be set to allow cookies. If a user is making ERDDAP requests from a computer program (not a browser), cookies are hard to work with. Sorry.
ERDDAP doesn’t support BASIC authentication because:
-
BASIC seems geared toward predefined web pages needing secure access or blanket on/off access to the whole site, but ERDDAP allows (restricted access) datasets to be added on-the-fly.
-
BASIC authentication doesn’t offer a way for users to log out!
Secure Data Sources – If a data set is to have restricted access to ERDDAP users, the data source (from where ERDDAP gets the data) should not be publicly accessible. So how can ERDDAP get the data for restricted access datasets? Some options are:
-
ERDDAP can serve data from local files (for example, via EDDTableFromFiles or EDDGridFromFiles).
-
ERDDAP can be in a DMZ and the data source (e.g., an OPeNDAP server or a database) can be behind a firewall, where it is accessible to ERDDAP but not to the public.
-
The data source can be on a public web site, but require a login to get the data. The one type of dataset that ERDDAP can log on to access is EDDTableFromDatabase. These datasets support (and should always use) user names, passwords, SSL connections, and other security measures.
But in general, currently, ERDDAP can’t deal with these data sources because it has no provisions for logging on to the data source. This is the reason why access to EDDGridFromErddap and EDDTableFromErddap datasets can’t be restricted. Currently, the local ERDDAP has no way to log in and access the metadata information from the remote ERDDAP. And putting the remote ERDDAP behind your firewall and removing its dataset’s accessibleTo restrictions doesn’t solve the problem: since user requests for EDDXxxFromErddap data need to be redirected to the remote ERDDAP, the remote ERDDAP can’t be behind a firewall.
Questions? Suggestions? If you have any questions about ERDDAP’s security system or have any questions, doubts, concerns, or suggestions about how it is set up, please email bob dot simons at noaa dot gov.
Things You Don’t Need To Know
These are details that you don’t need to know until a need arises.
-
Setting Up a Second ERDDAP for Testing/Development
If you want to do this, there are two approaches:
-
(Best) Install Tomcat and ERDDAP on a computer other than the computer that has your public ERDDAP. If you use your personal computer:
-
Do the installation one step at a time. Get Tomcat up and running first.
When Tomcat is running, the Tomcat Manager should be at
http://127.0.0.1:8080/manager/html/
-
Install ERDDAP.
-
Don’t use ProxyPass to eliminate the port number from the ERDDAP URL.
-
In setup.xml, set baseUrl to http://127.0.0.1:8080
-
After you start up this ERDDAP, you should be able to see it at
http://127.0.0.1:8080/erddap
-
(Second Best) Install another Tomcat on the same computer as your public ERDDAP.
-
Do the installation one step at a time. Get Tomcat up and running first.
Change all of the port numbers associated with the second Tomcat (e.g., change 8080 to 8081) (see these directions ).
-
Install ERDDAP in the new Tomcat.
-
Don’t use ProxyPass to eliminate the port number from the ERDDAP URL.
-
In setup.xml, set baseUrl to http://www.yourDomainName:8081
-
After you start up this ERDDAP, you should be able to see it at
http://www.yourDomainName:8081/erddap
-
Heavy Loads/Constraints – With heavy use, a standalone ERDDAP may be constrained by various problems. For more information, see the list of constraints and solutions.
-
Grids, Clusters, and Federations – Under very heavy use, a single standalone ERDDAP will run into one or more constraints and even the suggested solutions will be insufficient. For such situations, ERDDAP has features that make it easy to construct scalable grids (also called clusters or federations) of ERDDAPs which allow the system to handle very heavy use (e.g., for a large data center). For more information, see grids, clusters, and federations of ERDDAPs.
-
Cloud Computing – Several companies are starting to offer cloud computing services (e.g., Amazon Web Services). Web hosting companies have offered a range of roughly similar services since the mid-1990s. You can use these services to set up a grid/cluster of ERDDAPs to handle very heavy use. For more information, see cloud computing with ERDDAP.
-
WaitThenTryAgain Exception – In unusual circumstances, a user may get an error message like
WaitThenTryAgainException:
There was a (temporary?) problem. Wait a minute, then try again. (In a browser, click the Reload button.)
Details: GridDataAccessor.increment: partialResults[0]="123542730" was expected to be "123532800".
The explanation is: For each EDDGrid dataset, ERDDAP keeps the axis variable values in memory. They are used, for example, to convert requested axis values that use the “()” format into index numbers. For example, if the axis values are “10, 15, 20, 25”, a request for (20) will be interpreted as a request for index #2 (0-based indices). When ERDDAP gets a request for data and gets the data from the source, it verifies that the axis values that it got from the source match the axis values in memory. Normally, they do. But sometimes the data source has changed in a significant way: for example, index values from the beginning of the axis variable may have been removed (e.g., “10, 15, 20, 25” may have become “20, 25, 30”). If that happens, it is clear that ERDDAP’s interpretation of the request (e.g., “(20)” is index #2) is now wrong. So ERDDAP throws an exception and marks the dataset as needing to be reloaded. ERDDAP will update the dataset soon (often in a few seconds, usually within a minute). Other, similar problems also throw the WaitThenTryAgain exception.
Starting with ERDDAP version 1.14, it became much less likely that a user would actually see this error. Now, when the underlying error occurs, ERDDAP automatically internally tries to reload the dataset and resubmit the request to the reloaded dataset. Often this succeeds. When it does, the user will simply see that a given request took a little longer than usual. If it fails, the user should (as the message says) wait a minute, then try again.
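The “10, 15, 20, 25” example above can be traced in a few lines of Java. This sketch only illustrates the idea of converting a requested axis value to an index and then verifying the source’s axis values against the values in memory. It is not ERDDAP’s GridDataAccessor code: the class and method names are hypothetical, and requiring an exact match for the requested value is a simplification made for this example.

```java
/** Illustrative only: why a changed source axis invalidates a cached index. */
public class AxisCheckSketch {

    /** Returns the 0-based index of value in axis, or -1 if it isn't present (exact match). */
    public static int indexOf(double[] axis, double value) {
        for (int i = 0; i < axis.length; i++) {
            if (axis[i] == value) {
                return i;
            }
        }
        return -1;
    }

    /** Returns true if the source still has the in-memory value at the given index. */
    public static boolean sourceStillMatches(double[] inMemoryAxis, double[] sourceAxis, int index) {
        return index >= 0
                && index < inMemoryAxis.length
                && index < sourceAxis.length
                && inMemoryAxis[index] == sourceAxis[index];
    }

    public static void main(String[] args) {
        double[] inMemory = {10, 15, 20, 25};   // axis values kept in memory
        double[] changedSource = {20, 25, 30};  // the source dropped its first two values

        int index = indexOf(inMemory, 20);      // a request for (20) becomes an index
        System.out.println("index=" + index);

        if (!sourceStillMatches(inMemory, changedSource, index)) {
            // This is the situation in which the dataset needs to be reloaded
            System.out.println("Axis values changed: wait, reload, then try again.");
        }
    }
}
```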
-
Palettes are used to convert a range of numbers into a range of colors when making graphs and maps. ERDDAP comes with several pre-made palettes. You can add additional palettes. ERDDAP’s palettes are defined in GMT-style .cpt (Color Palette Table) files. All ERDDAP .cpt files are valid GMT .cpt files, but the opposite may not be true. ERDDAP may just support a subset of what GMT supports. If there is trouble, ERDDAP will probably throw an error when the .cpt file is parsed (which is better than misusing the information). ERDDAP requires that all .cpt files be stored in tomcat/webapps/erddap/WEB-INF/cptfiles. To get ERDDAP to use a new .cpt file, store the file in that directory and either:
-
If you use the default messages.xml file: add the file name to the <palettes> tag in
tomcat/webapps/erddap/WEB-INF/classes/gov/noaa/pfel/erddap/util/messages.xml.
If you do this, you need to do it every time you upgrade ERDDAP.
-
If you use a custom messages.xml file: add the file name to the <palettes> tag in your custom messages.xml file: tomcat/content/erddap/messages.xml . If you do this, you only need to do it once (but there is other work to maintain a custom messages.xml file).
But don’t remove any of the standard palettes. They are a standard feature of all ERDDAP installations. An advantage of this approach is that you can specify the order of the palettes in the list presented to users.
And for every new ERDDAP release, you need to put your .cpt files in tomcat/webapps/erddap/WEB-INF/cptfiles and restart ERDDAP so ERDDAP finds the files.
-
How does ERDDAP generate colors in a colorbar?
-
The user selects one of the predefined palettes or uses the default, e.g., Rainbow. Palettes are stored/defined in GMT-style .cpt Color Palette Table files. Each of ERDDAP’s predefined palettes has a simple integer range, e.g., 0 to 1 (if there is just one section in the palette), or 0 to 4 (if there are four sections in the palette). Each segment in the file covers n to n+1, starting at n=0.
-
ERDDAP generates a new .cpt file on-the-fly, by scaling the predefined palette’s range (e.g., 0 to 4) to the range of the palette needed by the user (e.g., 0.1 to 50) and then generating a section in the new palette for each section of the new palette (e.g., a log scale with ticks at 0.1, 0.5, 1, 5, 10, 50 will have 5 sections). The color for the end point of each section is generated by finding the relevant section of the palette in the .cpt file, then linearly interpolating the R, G, and B values. (That’s the same as how GMT generates colors from its Color Palette Table files.) This system allows ERDDAP to start with generic palettes (e.g., Rainbow with 8 segments, in total spanning 0 to 8) and create custom palettes on-the-fly (e.g., a custom Rainbow, which maps 0.1 to 50 mg/L to the rainbow colors).
-
ERDDAP then uses that new .cpt file to generate the color for each different colored pixel in the color bar (and later for each data point when plotting data on a graph or map), again by finding the relevant section of the palette in the .cpt file, then linearly interpolating the R, G, and B values.
This process may seem unnecessarily complicated. But it solves problems related to log scales that are hard to solve other ways.
So how can you mimic what ERDDAP is doing? That isn’t easy. Basically you need to duplicate the process that ERDDAP is using. If you are a Java programmer, you can use the same Java class that ERDDAP uses to do all of this:
tomcat/webapps/erddap/WEB-INF/classes/gov/noaa/pfel/coastwatch/sgt/CompoundColorMap.java.
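If you just want to see the arithmetic, the two core operations described above (scaling a predefined palette’s range to the user’s range, and linearly interpolating R, G, and B within one section) can be sketched as follows. This is not CompoundColorMap and doesn’t read .cpt files: the class and method names are hypothetical, the scaling shown is linear only (log scales are not handled), and the colors in the example are arbitrary.

```java
/** Illustrative only: linear scaling and RGB interpolation within one palette section. */
public class PaletteInterpolationSketch {

    /** Maps a position in the predefined palette's range (e.g., 0 to 4) to the user's range. */
    public static double scaleLinear(double position, double paletteMin, double paletteMax,
                                     double userMin, double userMax) {
        double fraction = (position - paletteMin) / (paletteMax - paletteMin);
        return userMin + fraction * (userMax - userMin);
    }

    /**
     * Interpolates a color for value within one section that runs from lo (color rgbLo)
     * to hi (color rgbHi). Values outside the section are clamped to its ends.
     */
    public static int[] interpolate(double lo, int[] rgbLo, double hi, int[] rgbHi, double value) {
        double fraction = (value - lo) / (hi - lo);
        fraction = Math.max(0.0, Math.min(1.0, fraction));
        int[] rgb = new int[3];
        for (int i = 0; i < 3; i++) {
            rgb[i] = (int) Math.round(rgbLo[i] + fraction * (rgbHi[i] - rgbLo[i]));
        }
        return rgb;
    }

    public static void main(String[] args) {
        // Position 2 of a 0-to-4 palette, scaled to a 0-to-100 data range
        System.out.println(scaleLinear(2, 0, 4, 0, 100));

        // One quarter of the way through a section that runs from black to (200, 100, 0)
        int[] rgb = interpolate(0, new int[] {0, 0, 0}, 1, new int[] {200, 100, 0}, 0.25);
        System.out.println(rgb[0] + " " + rgb[1] + " " + rgb[2]);
    }
}
```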
-
Guidelines for Data Distribution Systems – More general opinions about the design and evaluation of data distribution systems can be found here.
Programmer’s Guide
These are things that only a programmer who intends to work with ERDDAP’s Java classes needs to know.
-
Source Code
-
The source code for the current version of ERDDAP is always in the current erddap.war file.
-
The source code for recent public versions and in-development versions is available via GitHub. Please read the Wiki for that project.
-
ERDDAP has a very liberal, open-source license, so you can use the source code for any purpose.
-
Use the Code for Other Projects – While you are welcome to use parts of the ERDDAP code for other projects, be warned that the code can and will change. We don’t promise to support other uses of our code. Our main goal is to make a web application that people can download and use, as is, to distribute data.
For many situations where you might be tempted to use parts of ERDDAP in your project, we think you will find it much easier to install and use ERDDAP as is, and then write other services which use ERDDAP’s services. You can set up your own ERDDAP installation crudely in an hour or two. You can set up your own ERDDAP installation in a polished way in a few days (depending on the number and complexity of your datasets). But hacking out parts of ERDDAP for your own project is likely to take weeks (and months to catch subtleties). We (obviously) think there are many benefits to using ERDDAP as is and making your ERDDAP installation publicly accessible. However, in some circumstances, you might not want to make your ERDDAP installation publicly accessible. Then, your service can access and use your private ERDDAP and your clients needn’t know about ERDDAP.
Half Way – Or, there is another approach which you may find useful which is half way between delving into ERDDAP’s code and using ERDDAP as a stand-alone web service: In the EDD class, there is a static method which lets you make an instance of a dataset (based on the specification in datasets.xml):
oneFromDatasetXml(String tDatasetID)
It returns an instance of an EDDTable or EDDGrid dataset. Given that instance, you can call
makeNewFileForDapQuery(String userDapQuery, String dir, String fileName, String fileTypeName)
to tell the instance to make a data file, of a specific fileType, with the results from a user query. Thus, this is a simple way to use ERDDAP’s methods to request data and get a file in response, just as a client would use the ERDDAP web application. But this approach works within your Java program and bypasses the need for an application server like Tomcat. We use this approach for many of the unit tests of EDDTable and EDDGrid subclasses, so you can see examples of this in the source code for all of those classes.
-
Development Environment
-
Directory Structure
-
If you are installing ERDDAP in a Tomcat (whether or not you will actually use it that way), follow the instructions above.
-
If you aren’t installing ERDDAP in a Tomcat, you still need to make a Tomcat-like directory structure, so that ERDDAP can find the setup files in tomcat/content/erddap .
-
Make a directory somewhere called “tomcat” (it can be something else, but this is easier to explain).
-
As indicated above, unzip erddapContent.zip into tomcat, creating tomcat/content/erddap . Follow the instructions above to modify setup.xml and datasets.xml. Depending on your situation, you may need to specify that directory by adding something like
-DErddapContentDirectory="/someDirectory/content/erddap/"
to the Java command line for your program.
-
Make a tomcat/webapps/erddap directory.
-
.war files are just .zip files that follow a few additional conventions. So you can use an unzip program to unzip erddap.war into tomcat/webapps/erddap . That has all of ERDDAP’s .java classes and many other files.
-
Our development environment is just a programmer’s editor (we’re not saying which one). (No, we don’t use Eclipse, Ant, Maven, or …; nor do we offer ERDDAP-related support for them. If we required you to use Ant and you preferred Maven (or vice versa), you wouldn’t be very happy about it.)
-
We use a batch file which deletes all of the .class files in the source tree.
-
We currently use javac 1.6.0_35 to compile gov.noaa.pfel.coastwatch.TestAll (it has links to a few classes that wouldn’t be compiled otherwise) and java 1.6.0_35 and 1.7.0_09 to run the tests. For security reasons, it is almost always best to use the latest versions of Java and Tomcat.
-
When we run javac or java, the current directory is tomcat/webapps/erddap/WEB-INF .
-
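Our javac and java classpath (including some unnecessary items) is currently as follows. (The entries are separated with “;”, as on Windows. On Linux and Mac, use “:” instead.)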
./classes;../../../lib/servlet-api.jar;lib/activation.jar;lib/axis.jar;lib/commons-compress.jar;lib/commons-discovery.jar;lib/itext-1.3.1.jar;lib/joda-time.jar;lib/joid.jar;lib/lucene-core.jar;lib/mail.jar;lib/netcdfAll-latest.jar;lib/postgresql.jdbc.jar;lib/jaxrpc.jar;lib/saaj.jar;lib/slf4j-jdk14.jar;lib/tsik.jar;lib/wsdl4j.jar
-
So your javac command line will be something like
javac -cp (classpath above) classes/gov/noaa/pfel/coastwatch/TestAll.java
-
And your java command line will be something like
java -cp (classpath above) -Xmx1200M -Xms1200M gov.noaa.pfel.coastwatch.TestAll
Optional: you can add -verbose:gc. It tells Java to print garbage collection statistics.
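The classpath separator differs by platform (; on Windows, : on Linux and Mac), so it can help to assemble the command lines programmatically. The sketch below is only an illustration: the class TestAllCommands is hypothetical, and SAMPLE_ENTRIES lists just three of the classpath entries shown above. The run command names the class by its fully qualified name (gov.noaa.pfel.coastwatch.TestAll), which is the form the java launcher resolves against the classpath.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds the javac and java command lines for TestAll.
class TestAllCommands {

    // Only a few of the classpath entries from the text; add the rest as needed.
    static final List<String> SAMPLE_ENTRIES = Arrays.asList(
        "./classes", "../../../lib/servlet-api.jar", "lib/netcdfAll-latest.jar");

    /** Joins the parts with the given separator. */
    static String join(List<String> parts, String separator) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            if (sb.length() > 0) {
                sb.append(separator);
            }
            sb.append(part);
        }
        return sb.toString();
    }

    /** javac takes the path of the source file. */
    static List<String> javacCommand(String classpath) {
        return Arrays.asList("javac", "-cp", classpath,
            "classes/gov/noaa/pfel/coastwatch/TestAll.java");
    }

    /** java takes the fully qualified class name, not a file path. */
    static List<String> javaCommand(String classpath) {
        return Arrays.asList("java", "-cp", classpath, "-Xmx1200M", "-Xms1200M",
            "gov.noaa.pfel.coastwatch.TestAll");
    }

    public static void main(String[] args) {
        // File.pathSeparator is ";" on Windows and ":" on Linux and Mac.
        String classpath = join(SAMPLE_ENTRIES, File.pathSeparator);
        System.out.println(join(javacCommand(classpath), " "));
        System.out.println(join(javaCommand(classpath), " "));
    }
}
```

Run it from tomcat/webapps/erddap/WEB-INF and copy the printed commands, or pass the returned lists to a ProcessBuilder.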
-
If TestAll compiles, everything ERDDAP needs has been compiled. Lots of classes are compiled that aren’t needed for ERDDAP. If compiling TestAll succeeds but doesn’t compile some class, that class isn’t needed. (There are some unfinished/unused classes.)
-
We use some 3rd party source code instead of .jar files (notably for SSHTools and Dods) and have modified them slightly to avoid problems compiling with Java 1.7. We have often made other slight modifications (notably to Dods) for other reasons.
-
Most classes have test methods. We run lots of tests via TestAll. Unfortunately, many of the tests are specific to our setup. (Sorry.)
-
Important Classes – If you want to look at the source code and try to figure out how ERDDAP works, please do.
-
The code has JavaDoc comments, but the JavaDocs haven’t been generated. Feel free to generate them.
-
The most important classes (including the ones mentioned below) are within gov/noaa/pfel/erddap.
-
The Erddap class has the highest level methods. It extends HttpServlet.
-
Erddap passes requests to instances of subclasses of EDDGrid or EDDTable, which represent individual datasets.
-
EDDGrid and EDDTable subclasses parse the request, get data from subclass-specific methods, then format the data for the response.
-
EDDGrid subclasses push data into GridDataAccessor (the internal data container for gridded data).
-
EDDTable subclasses push data into TableWriter subclasses, which write data to a specific file type on-the-fly.
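To make the division of labor concrete, here is a heavily simplified sketch of the pattern described above for tabular data: a dataset class pushes rows into a writer, and the writer emits one file type on-the-fly. Every class and method name below is a hypothetical stand-in. None of this is ERDDAP's real code or its actual signatures, and the real classes handle requests, constraints, and metadata that are omitted here.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// All names are hypothetical stand-ins, not ERDDAP's real classes or signatures.

/** Plays the role of a TableWriter: receives rows and writes them out immediately. */
abstract class SketchTableWriter {
    protected final Writer out;

    SketchTableWriter(Writer out) {
        this.out = out;
    }

    abstract void writeHeader(String[] columnNames) throws IOException;

    abstract void writeRow(String[] values) throws IOException;

    void finish() throws IOException {
        out.flush();
    }
}

/** One output file type: comma-separated values (no quoting, for brevity). */
class SketchCsvWriter extends SketchTableWriter {
    SketchCsvWriter(Writer out) {
        super(out);
    }

    @Override
    void writeHeader(String[] columnNames) throws IOException {
        writeRow(columnNames);
    }

    @Override
    void writeRow(String[] values) throws IOException {
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                out.write(",");
            }
            out.write(values[i]);
        }
        out.write("\n");
    }
}

/** Plays the role of an EDDTable subclass: knows how to read one kind of data source. */
abstract class SketchTableDataset {
    /** Source-specific part: push the rows into the writer. */
    abstract void pushData(SketchTableWriter writer) throws IOException;

    /** Shared part: answer a request by streaming data through the writer. */
    final void respond(SketchTableWriter writer) throws IOException {
        pushData(writer);
        writer.finish();
    }
}

/** A data source type backed by a fixed in-memory table (made-up sample values). */
class SketchInMemoryDataset extends SketchTableDataset {
    @Override
    void pushData(SketchTableWriter writer) throws IOException {
        writer.writeHeader(new String[] {"time", "temperature"});
        writer.writeRow(new String[] {"2012-01-01", "10.5"});
        writer.writeRow(new String[] {"2012-01-02", "11.0"});
    }
}

class SketchDemo {
    public static void main(String[] args) throws IOException {
        StringWriter buffer = new StringWriter();
        new SketchInMemoryDataset().respond(new SketchCsvWriter(buffer));
        System.out.print(buffer);
    }
}
```

In this arrangement, supporting a new data source means adding a dataset subclass, and supporting a new file type means adding a writer subclass. Neither change touches the other side.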
-
Code Contribution – If you have written some code which would be a nice addition to ERDDAP, please email bob dot simons at noaa dot gov. We’ll work out the details. The two situations that are most likely are:
-
You want to write another subclass of EDDGrid or EDDTable to handle another data source type. If so, we recommend that you find the closest existing subclass and use that code as a starting point.
-
You want to write another saveAsFileType method. If so, we recommend that you find the closest existing saveAsFileType method in EDDGrid or EDDTable and use that code as a starting point.
Both of these situations have the advantage that the code you write is self-contained. You won’t need to know all the details of ERDDAP’s internals. And it will be easy for us to incorporate your code in ERDDAP. Note that if you do submit code, its license will need to be compatible with the ERDDAP license (e.g., Apache, BSD, or MIT-X). You can hold the copyright to your code. We’ll list your contribution in the credits.
List of Changes
The List of Changes for each ERDDAP release is now on a separate web page.
Credits
ERDDAP is a product of the NOAA NMFS ERD.
Bob Simons is the author of ERDDAP (the designer and programmer who wrote the ERDDAP-specific code).
Roy Mendelssohn instigated the project.
The ERDDAP-specific code is licensed as copyrighted open source, with NOAA holding the copyright. See the ERDDAP license.
ERDDAP uses copyrighted open source, Apache, LGPL, MIT/X, Mozilla, and public domain libraries and data.
ERDDAP does not require any GPL code or commercial programs.
Some of the funding for work on ERDDAP has come from the NOAA CoastWatch program, the NOAA IOOS program, and the now defunct POST program.
Thank you all very much.
-
ERDDAP is a Java Servlet program. At ERD, it runs inside of a Tomcat application server (license: Apache), with an Apache web server (license: Apache), running on a computer using the Red Hat Linux operating system (license: GPL).
-
The data sets are from various sources. See the metadata (in particular the “sourceUrl”, “infoUrl”, and “institution”) for each dataset.
-
The com/cohort classes are from CoHort Software (http://www.cohort.com), which makes these classes available with an MIT/X-like license (see classes/com/cohort/util/LICENSE.txt).
-
Data from OPeNDAP servers are read with Java DAP 1.1.7 (license: LGPL).
-
NetCDF files (.nc) and GMT-style NetCDF files (.grd) are read and written with code in the NetCDF Java Library (license: MIT/X-like) from Unidata.
-
The NetCDF Java Library reads GRIB files via the Unidata GRIB decoder (grib-6.0.jar) (license: MIT/X-like).
-
The NetCDF Java Library reads BUFR files via the Unidata BUFR decoder (bufrTables-1.5.jar) (license: MIT/X-like).
-
The NetCDF Java Library requires a logger facade (we chose slf4j-jdk14.jar) from the Simple Logging Facade for Java project (license: MIT/X).
-
The NetCDF Java Library uses code from several .jar files from Apache projects: commons-codec, commons-httpclient, and commons-logging (license: Apache).
-
Other parts of ERDDAP use code from other Apache projects: commons-compress and commons-discovery (license: Apache).
-
The NetCDF Java Library uses XML processing code from JDOM (license: Apache).
-
ERDDAP uses json.org’s Java-based JSON library to parse JSON data (license: copyrighted open source).
-
The graphs and maps are created on-the-fly with a modified version of NOAA’s SGT version 3 (a Java-based Scientific Graphics Toolkit written by Donald Denbo at NOAA PMEL) (license: copyrighted open source).
-
Big, HTML tooltips on ERDDAP’s HTML pages are created with Walter Zorn’s wz_tooltip.js (license: LGPL).
-
Sliders and the drag and drop feature of the Slide Sorter are created with Walter Zorn’s wz_dragdrop.js (license: LGPL).
-
The .pdf files are created with iText (version 1.3.1, which used the Mozilla license), a free Java-PDF library by Bruno Lowagie and Paulo Soares.
-
The shoreline and lake data are from GSHHS – A Global Self-consistent, Hierarchical, High-resolution Shoreline Database (license: GPL), created by Paul Wessel and Walter Smith.
-
The political boundary and river data are from the pscoast program in GMT, which uses data from the CIA World Data Bank II (license: public domain).
-
The bathymetry/topography data used in the background of some maps is the ETOPO1 Global 1-Minute Gridded Elevation Data Set (Ice Surface, grid registered, binary, 2 byte int: etopo1_ice_g_i2.zip) (license: public domain), which is distributed for free by NOAA NGDC.
-
Emails are sent using code in mail.jar from Oracle’s JavaMail API (license: Oracle Binary Code License Agreement for Java EE Technologies).
-
JavaMail uses activation.jar from the JavaBeans Activation Framework (license: Oracle Binary Code License Agreement for Java EE Technologies).
-
For OpenID authentication, ERDDAP uses the joid.jar and tsik.jar libraries from the joid project from Verisign (license: Apache 2.0).
-
ERDDAP uses Joda for some calendar calculations (license: Apache).
-
ERDDAP includes the PostgreSQL JDBC4 driver (license: BSD). The driver is Copyright (c) 1997-2010, PostgreSQL Global Development Group. All rights reserved.
-
For SOAP services, ERDDAP uses:
-
ERDDAP uses code from the CoastWatch Browser project from the NOAA CoastWatch West Coast Regional Node (license: copyrighted open source). That project was initiated and managed by Dave Foley, the Coordinator of the NOAA CoastWatch West Coast Regional Node. The CoastWatch Browser code was written by Bob Simons.
-
The ERDDAP distribution includes code from the J2SSH project which is distributed by SSHTools (version 0.2.2, license: Apache). It is based on j2ssh/examples/SftpConnect.java (license: LGPL) which is Copyright (C) 2002 Lee David Painter (lee@sshtools.com).
License
The ERDDAP-specific code is licensed as copyrighted open source, with NOAA holding the copyright. The license is:
ERDDAP, Copyright 2012, NOAA.
PERMISSION TO USE, COPY, MODIFY, AND DISTRIBUTE THIS SOFTWARE AND ITS DOCUMENTATION FOR ANY PURPOSE AND WITHOUT FEE IS HEREBY GRANTED, PROVIDED THAT THE ABOVE COPYRIGHT NOTICE APPEAR IN ALL COPIES, THAT BOTH THE COPYRIGHT NOTICE AND THIS PERMISSION NOTICE APPEAR IN SUPPORTING DOCUMENTATION, AND THAT REDISTRIBUTIONS OF MODIFIED FORMS OF THE SOURCE OR BINARY CODE CARRY PROMINENT NOTICES STATING THAT THE ORIGINAL CODE WAS CHANGED AND THE DATE OF THE CHANGE. THIS SOFTWARE IS PROVIDED “AS IS” WITHOUT EXPRESS OR IMPLIED WARRANTY.
Contact
Questions, comments, suggestions? Please send an email to bob dot simons at noaa dot gov and include the ERDDAP URL directly related to your question or comment.
ERDDAP, Version 1.46