The old trick:
t = readHTMLTable("http://www.epexspot.com/en/market-data")
DE = t[][,9]
does not work any more. Actually, this data source still works, but some other sources do not. This code reads the latest German electricity prices (all hours) from the page that shows the prices for the latest week in one table. The table is generated in the epexspot back end, so it is already present in the HTML code.
It may be that some sources hide the information intentionally. But whatever the user sees in the browser cannot be hidden.
There are three alternatives to get the information:
I have selected the third approach.
Remote-controlled browsers are widely used for testing. Therefore, there are well-developed systems for programmatically controlling a running browser. One such framework is Selenium.
Relenium is an R library for using Selenium.
To use it, we need an impressive stack of software:
- Relenium, the interface library in R
- rJava, the R interface library to Java, needed because Selenium is written in Java
- Selenium itself
- WebDriver, the part of Selenium that interacts with Firefox (a World Wide Web Consortium standard)
- the Firefox browser
I won't go through all the dependency and version problems I had during installation. Just a couple of them deserve mention.
First of all, the Selenium version currently available in the repositories works only with Firefox 30, while the current Firefox version is 32. With Firefox 32, Selenium starts Firefox but then hangs.
If your application runs on a normal server that is "headless", meaning it has no graphical user interface, read the chapter "making Firefox headless".
Easy installation of relenium requires the R devtools package. On Linux, devtools has a non-trivial dependency on the development libraries of curl: in Debian the package is libcurl4-openssl-dev, in Fedora it is libcurl-devel.
Another dependency is a fully installed proprietary Java SDK, which is painful on most Linux distributions. In the older Debian squeeze, Java 6 is available in non-free, but it has been removed from newer Debian releases due to license restrictions. And even in Debian squeeze, remember to run manually:
sudo update-java-alternatives -s java-6-sun
after installing the package. On other Linux distributions you must update every alternative separately (jar, javac, javah, …) after installing the JDK from the Oracle site. The alternatives must be set correctly to get rJava installed.
The Java environment is configured for R with the command:
sudo R CMD javareconf
After the Linux dependencies are in place, relenium is installed like this:
install.packages("rJava")
install.packages("devtools")
require(devtools)
install_github('seleniumJars', 'LluisRamon')
install_github('relenium', 'LluisRamon')
Selenium is packaged with relenium (‘seleniumJars’), so it does not need separate installation.
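Before starting the browser, it may be worth checking that rJava can actually start a JVM, since a broken alternatives setup tends to show up only at this point. The following is a minimal sketch, not part of relenium: `java_version` is a hypothetical helper name, and it only assumes the standard rJava calls `.jinit` and `.jcall`.

```r
# Hypothetical helper: returns the Java version string seen through rJava,
# or NA if rJava is not installed or the JVM cannot be started.
java_version <- function() {
  if (!requireNamespace("rJava", quietly = TRUE)) {
    return(NA_character_)
  }
  tryCatch({
    rJava::.jinit()  # start (or attach to) the JVM inside the R process
    # Static call: System.getProperty("java.version"), returning a string
    rJava::.jcall("java/lang/System", "S", "getProperty", "java.version")
  }, error = function(e) NA_character_)
}

print(java_version())
```

If this prints NA, the Java setup (alternatives, `R CMD javareconf`) is the first place to look.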
require(relenium)
firefox = firefoxClass$new()
firefox$get("http://www.epexspot.com/en/market-data")
html = firefox$getPageSource()
And the result can be parsed as before:
require(XML)
t = readHTMLTable(html)
DE = t[][,9]
DE.date = as.character(DE)
DE = DE[seq(2, length(DE), 2)]
DE = as.numeric(as.character(DE))
DE = DE / 10.0
The code produces a vector of 24 hourly electricity prices for the latest day in Germany (in c/kWh). For example, today's result is
> DE
2.834 2.924 2.846 2.695 2.579 2.717 4.324 5.278 5.217 4.989 4.545 3.907
3.107 2.833 2.707 2.934 3.163 4.006 4.360 4.591 4.093 2.978 2.551 1.383
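The conversion steps in the parsing code can be tried without a browser. The sketch below uses a made-up stand-in for the table column. It is an assumption that the column arrives as a factor in which every second entry is a price in EUR/MWh; the real layout of the epexspot table may differ.

```r
# Hypothetical stand-in for one table column: a factor whose entries
# alternate between a label and a price (assumed to be in EUR/MWh).
col <- factor(c("00 - 01", "28.34", "01 - 02", "29.24", "02 - 03", "28.46"))

# Keep every second entry (the prices), skipping the label rows.
prices_raw <- col[seq(2, length(col), 2)]

# as.numeric() on a factor returns the internal level codes, not the values,
# so convert to character first.
prices_eur_mwh <- as.numeric(as.character(prices_raw))

# 1 EUR/MWh = 0.1 c/kWh, hence the division by 10.
prices_c_kwh <- prices_eur_mwh / 10
print(prices_c_kwh)
```

With these made-up inputs the result matches the first three values of the output shown above.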
If you are running the code in a normal server environment with no graphical user interface, Firefox cannot find a display to start on. It is possible to define a virtual display that satisfies Firefox.
The virtual X server is Xvfb.
sudo apt-get install xvfb
And to start the server:
Xvfb :10 -ac &
And to start the headless Firefox:
export DISPLAY=:10
firefox
Getting the DISPLAY environment variable to the Firefox started by relenium might seem tricky, but luckily, if DISPLAY is set before starting R, it is passed on to the Firefox started by relenium (shell -> R -> Java -> Firefox).
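To fail early with a readable message rather than a hanging browser, the R session can check that it really inherited DISPLAY. This is only a sketch: `current_display` is a hypothetical helper, and the `:10` in the message simply follows the Xvfb example above.

```r
# Hypothetical helper: return the X display that R inherited from the shell,
# or stop with a hint if none is set.
current_display <- function() {
  display <- Sys.getenv("DISPLAY", unset = "")
  if (!nzchar(display)) {
    stop("DISPLAY is not set; run 'export DISPLAY=:10' before starting R")
  }
  display
}
```

Calling `current_display()` just before `firefoxClass$new()` would catch a forgotten `export` before Selenium is involved.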