Firstly, hats off to the person/people at SEPA who've pushed this through. For anyone who isn't aware, this is a way of retrieving time-series data held by SEPA. Of note is that it gives access to high-resolution time series. In particular, I'm pleased to see the API and browser approach rather than a GUI which, while easier for more people to use, would need to be significantly less flexible. The result looks like a significant boost to hydrology in Scotland.
The web link from SEPA explains clearly how to manually use HTTP requests to retrieve data. However, for the analyst, what matters is getting the data into a useful piece of software, or saved in a way that facilitates analysis.
As I'm aware that many people out there are using R for hydrological analysis, these notes outline how to use R to query the SEPA database in a basic form.
There are many ways to skin a cat. A clunky method is to just request the URL from within R, and then scrape the resulting webpage. This can be done using the xml2 and rvest packages. So taking the most basic request on SEPA's website, the code would look something like:
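library(xml2)   # read_html()
library(rvest)  # html_table(); also re-exports the %>% pipe

# Station list request from SEPA's KiWIS service
webpage_url <- "https://timeseries.sepa.org.uk/KiWIS/KiWIS?service=kisters&type=queryServices&datasource=0&request=getStationList"
webpage <- xml2::read_html(webpage_url)

# Take the first table on the page and convert it to a tibble
data <- rvest::html_table(webpage)[[1]] %>%
  tibble::as_tibble(.name_repair = "unique") # cleans up repeated columns

data %>% dplyr::glimpse(45) # preview the columns, 45 characters wide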
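If you do go down the scraping route, typing those long query strings by hand gets error-prone quickly. As a rough sketch, a small helper can assemble the URL from its parts. Note that `build_kiwis_url()` is a name I've made up for illustration, not part of any package; the default parameters are simply those in the station-list URL above.

```r
# Hypothetical helper (not from any package): assemble a KiWIS request URL.
# The default query parameters are copied from the station-list URL above.
build_kiwis_url <- function(request, ...,
                            base_url = "https://timeseries.sepa.org.uk/KiWIS/KiWIS") {
  # Fixed parameters first, then any extras passed through `...`
  params <- c(
    list(service = "kisters", type = "queryServices",
         datasource = 0, request = request),
    list(...)
  )
  # URL-encode each value so spaces and symbols survive the request
  values <- vapply(
    params,
    function(x) utils::URLencode(as.character(x), reserved = TRUE),
    character(1)
  )
  paste0(base_url, "?", paste0(names(params), "=", values, collapse = "&"))
}

# Reproduces the station-list URL used above
station_list_url <- build_kiwis_url("getStationList")
```

Extra query parameters go in as named arguments, e.g. `build_kiwis_url("getStationList", format = "html")`; check SEPA's documentation for which parameters each request actually accepts, as I'm assuming `format` here.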
However, as mentioned, this is clunky and still needs a bit of tidying up. A neater way is perhaps to use a package specifically developed for querying KISTERS WISKI databases. This package is called kiwisR and can be installed in the usual way in R or RStudio. The user manual is here and the GitHub repository for the code is here. In this case, there's no need to build the request URL by hand. So, taking the example above, returning the list of stations is simply:
library(kiwisR) # loads the package
ki_station_list(hub = "https://timeseries.sepa.org.uk/KiWIS/KiWIS?")
Looks a bit neater! The documentation referenced above shows how to return time series in the same way as the HTTP requests would, by specifying a station or time-series ID. Then you can continue with whatever analysis or modelling you want.
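To give a flavour of that second step, here's a minimal sketch that goes from a station ID to a table of values using kiwisR's `ki_timeseries_list()` and `ki_timeseries_values()`. The wrapper `fetch_station_values()` is my own hypothetical helper, and I'm assuming the time-series list comes back with `ts_id` and `ts_name` columns as described in the kiwisR documentation, so check the output from your own query before relying on it.

```r
# Sketch only: assumes kiwisR is installed and the SEPA hub is reachable.
sepa_hub <- "https://timeseries.sepa.org.uk/KiWIS/KiWIS?"

# Hypothetical helper: fetch values for the first time series at a station
# whose name matches `ts_pattern` (a regular expression).
fetch_station_values <- function(station_id, ts_pattern, start_date, end_date,
                                 hub = sepa_hub) {
  # List every time series held for this station
  ts_list <- kiwisR::ki_timeseries_list(hub = hub, station_id = station_id)

  # Keep only the series whose name matches the pattern
  matches <- ts_list[grepl(ts_pattern, ts_list$ts_name, ignore.case = TRUE), ]
  if (nrow(matches) == 0) {
    stop("No time series matching '", ts_pattern, "' at station ", station_id)
  }

  # Retrieve the values for the first match over the requested period
  kiwisR::ki_timeseries_values(
    hub = hub,
    ts_id = matches$ts_id[1],
    start_date = start_date, # dates as "YYYY-MM-DD" strings
    end_date = end_date
  )
}
```

You'd take the `station_id` from the `ki_station_list()` output, then look through `ki_timeseries_list()` for that station to see which series names are on offer before settling on a pattern.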
While this data's always been available through data requests to the hydrometric teams, I think the ability to easily retrieve datasets will lead to a lot more 'messing', and so I'll be interested to see what applications we see as a result of this service. For my part, there are a few pet projects I can think of which can now more easily be brought to fruition.