The rn layer is based on the rht in France, and the wise layers http://www.mapama.gob.es/es/cartografia-y-sig/ide/directorio_datos_servicios/caracteristicas_wms.aspx in Spain and Portugal https://sniambgeoportal.apambiente.pt/geoportal/catalog/search/resource/details.page?uuid={F199A8DA-ECE8-44FF-B753-37DDD9A70240}. Those have been chained, and the algorythm routing the river from the downstream part has been run.
But to get the surface of the basin upstream, the maximum distance to the source, the shreeve or strahler ranks it has to be routed from the source.
This is done using the BaseEdaRiosRiversegments class which has a method stream_order to run the rivers from the upstream part.
The following code shows examples on how to run the method in Spain :
require("RPostgreSQL")
require("sqldf")
library("sp")
library("rgdal")
library("sf")
source("EDACCM/init.r")
# Creates a class for all rivers in Spain (no basins selected- no emu selected)
rios_surf <- new('BaseEdaRiosRiversegments',
baseODBC = "baseODBCrios",
schema = "dbeel_rivers",
table = "rn",
prkey = "idsegment",
zonegeo = 'Spain',
basin=as.character(NULL),
emu=as.character(NULL))
# Loads all the riversegments from spain including attributes
rios_surf <- loaddb_all(rios_surf)#2.3 min
# run the stream order method
rios_surf<-stream_order(rios_surf,silent=TRUE)
# extract a dataset with the columns idsegment, shreeve, strahler, surfacebvm2 (the cumulated surface upstream), and maximum distance source
export <- rios_surf@data[,c("idsegment","shreeve","strahler","surfacebvm2","distancesourcem")]
# save in a temporary table and update the database
sqldf("drop table if exists spain.c_temp")
sqldf("create table spain.c_temp as select * from export")
sqldf("update spain.rna set (shreeve,strahler, surfacebvm2,distancesourcem)= (c_temp.shreeve,c_temp.strahler,c_temp.surfacebvm2,c_temp.distancesourcem) from spain.c_temp
where c_temp.idsegment=rna.idsegment")
#sqldf("update spain.rna set (shreeve,strahler, surfacebvm2)= (NULL,NULL,NULL)")
The method selects all riversegments and starts from the source. It will stop at a segment untill all ancestors have been found and calculate strahler and shreeve accordingly.
The data have been collected from electofishing data, a first trial of random sampling using google. Even thought this was designated to concentrate on large water bodies, this sampling very often ended on completely dy lang, in a field, or a dry stream. We have thus chosen a different method with systematic sampling using google and going up the river and trying to measure distance in places where there was no (or the least) influence of downstream dams.
The initial plot shows that a lot of data are missing in some basins, this becomes apparent in a map, a lot of the data for river widht in electrofishing are missing in the central part of Spain and Portugal. We have checked in the source of data, this is not a problem with data integration, we have the surface but not the river width…
Finally, we have used the MERIT Hydro wordwide river width computation to add with to the largest part of the rivers http://hydro.iis.u-tokyo.ac.jp/~yamadai/MERIT_Hydro/index.html (Yamazaki et al., 2019). We have projected the raster on the current layer using 10 point interpolation along the riversegment.
CREATE TABLE raster_width.rnsubset as (
WITH dumprn as (
SELECT idsegment,ST_GeometryN(geom,1) as geom FROM spain.rn WHERE ST_NumGeometries(geom) = 1
--and seaidsegment='SP322526'
-- four multi geometries ignored there
),
point0 AS (
SELECT idsegment,
ST_value(raster.rast, ST_StartPoint(st_transform(geom,4326)), TRUE) AS pointvalue
FROM dumprn LEFT JOIN
raster_width.merit_width15x15 AS raster
ON (ST_Intersects(ST_StartPoint(st_transform(geom,4326)), raster.rast))
), point1 AS (
SELECT idsegment,
ST_value(raster.rast, ST_EndPoint(st_transform(geom,4326)), TRUE) AS pointvalue
FROM dumprn LEFT JOIN
raster_width.merit_width15x15 AS raster
ON (ST_Intersects(ST_EndPoint(st_transform(geom,4326)), raster.rast))
), point01 AS (
SELECT idsegment,
ST_value(raster.rast, ST_LineInterpolatePoint(st_transform(geom,4326), 0.1), TRUE) AS pointvalue
FROM dumprn LEFT JOIN
raster_width.merit_width15x15 AS raster
ON (ST_Intersects(ST_LineInterpolatePoint(st_transform(geom,4326), 0.1), raster.rast))
), point02 AS (
SELECT idsegment,
ST_value(raster.rast, ST_LineInterpolatePoint(st_transform(geom,4326), 0.2), TRUE) AS pointvalue
FROM dumprn LEFT JOIN
raster_width.merit_width15x15 AS raster
ON (ST_Intersects(ST_LineInterpolatePoint(st_transform(geom,4326), 0.2), raster.rast))
), point03 AS (
SELECT idsegment,
ST_value(raster.rast, ST_LineInterpolatePoint(st_transform(geom,4326), 0.3), TRUE) AS pointvalue
FROM dumprn LEFT JOIN
raster_width.merit_width15x15 AS raster
ON (ST_Intersects(ST_LineInterpolatePoint(st_transform(geom,4326), 0.3), raster.rast)))
, point04 AS (
SELECT idsegment,
ST_value(raster.rast, ST_LineInterpolatePoint(st_transform(geom,4326), 0.4), TRUE) AS pointvalue
FROM dumprn LEFT JOIN
raster_width.merit_width15x15 AS raster
ON (ST_Intersects(ST_LineInterpolatePoint(st_transform(geom,4326), 0.4), raster.rast)))
, point05 AS (
SELECT idsegment,
ST_value(raster.rast, ST_LineInterpolatePoint(st_transform(geom,4326), 0.5), TRUE) AS pointvalue
FROM dumprn LEFT JOIN
raster_width.merit_width15x15 AS raster
ON (ST_Intersects(ST_LineInterpolatePoint(st_transform(geom,4326), 0.5), raster.rast)))
, point06 AS (
SELECT idsegment,
ST_value(raster.rast, ST_LineInterpolatePoint(st_transform(geom,4326), 0.6), TRUE) AS pointvalue
FROM dumprn LEFT JOIN
raster_width.merit_width15x15 AS raster
ON (ST_Intersects(ST_LineInterpolatePoint(st_transform(geom,4326), 0.6), raster.rast)))
, point07 AS (
SELECT idsegment,
ST_value(raster.rast, ST_LineInterpolatePoint(st_transform(geom,4326), 0.7), TRUE) AS pointvalue
FROM dumprn LEFT JOIN
raster_width.merit_width15x15 AS raster
ON (ST_Intersects(ST_LineInterpolatePoint(st_transform(geom,4326), 0.7), raster.rast)))
, point08 AS (
SELECT idsegment,
ST_value(raster.rast, ST_LineInterpolatePoint(st_transform(geom,4326), 0.8), TRUE) AS pointvalue
FROM dumprn LEFT JOIN
raster_width.merit_width15x15 AS raster
ON (ST_Intersects(ST_LineInterpolatePoint(st_transform(geom,4326), 0.8), raster.rast)))
, point09 AS (
SELECT idsegment,
ST_value(raster.rast, ST_LineInterpolatePoint(st_transform(geom,4326), 0.9), TRUE) AS pointvalue
FROM dumprn LEFT JOIN
raster_width.merit_width15x15 AS raster
ON (ST_Intersects(ST_LineInterpolatePoint(st_transform(geom,4326), 0.9), raster.rast))
),
union_tous as(
SELECT * FROM point0
UNION
SELECT * FROM point01
UNION
SELECT * FROM point02
UNION
SELECT * FROM point03
UNION
SELECT * FROM point04
UNION
SELECT * FROM point05
UNION
SELECT * FROM point06
UNION
SELECT * FROM point07
UNION
SELECT * FROM point08
UNION
SELECT * FROM point09
UNION
SELECT * FROM point1),
grouped as (
SELECT idsegment, avg(pointvalue) as width FROM union_tous where pointvalue>0 group by idsegment)
SELECT * FROM grouped where width>11.25
);
Unlike vector based gis queries, we didn’t use buffer arround lines. For most of large rivers, the main segment went trough the raster part and the calculation of with as the average of the raster part was correct. Some checks with electrofishing data or data collected from google showed that the fit was pretty good except where an Island was present in the channel in which case the total width of the river is underestimated : the projected segment takes the path of one or the other branch and the other is ignored. There was a problem that the width computed for the main stream so a decision was taken to remove the width. The rule was a bit different for portugal and Spain as the river networks do not have the same level of details (we decided to remove a different strahler order, up to three for Spain and one for Portugal).
The MERIT method correctly calculates the width of the river by adjusting the flow direction source Yamazaki et al. (2019) but there is no way the segment passing trough a reservoir can adjust to the path chosen in the merit file which was calculated as centerline pixels by searching convex points in bank distance field.
Calculation of stream width (perpendicular to the flow direction) Yamazaki, de Almeida, and Bates (2013) | Calculation of flow direction Yamazaki, de Almeida, and Bates (2013) |
In some of the reservoirs, the raster will not follow the same path as the river, and some width will be lost, also the removal of low strahler ranks may make less sense, but in our project it is more important to have a correct river width for the downstream parts of rivers, as the reservoirs should be added later as a layer.
UPDATE spain.rna SET riverwidthm=NULL where strahler<=3; -- remove tributaries
UPDATE portugal.rna SET riverwidthm=NULL where strahler=1; -- remove tributaries
A quick analysis of the data shows the necessity of a log transformation to normalize the distributions. The inital plot without grouping shows some basins with very few information.
##
## CUENCAS INTERNAS PAIS VASCO NORTE GALICIA-COSTA
## 936 2453 3424
## EBRO DUERO GUADALQUIVIR - ANDALUCIA - GUADIANA - TAJO
## 1083 91 511
## MIAO-LIMIA-MINHO-VOUGA-MONDEGO JUCAR-SEGURA CUENCAS INTERNAS DE CATALUNA
## 2328 340 306
To choose the grouping of basins, the availability of data has been considered as well as the precipitation level. The runoff is avalaible in the swicca EU project. However as this data is not downloadable as a raster is has been downloaded as an image to help build off diagnostic. SWICCA project
grouped basins | comments |
---|---|
MIAO-LIMIA-MINHO-VOUGA-MONDEGO | Those basins are in the indermediate part of Portugal where there is an intermeditate water runoff. It would have been preferable to split some of those e.g “MIAO-LIMIA-MINHO” on one hand “VOUGA - MONDEGO” on the other hand but we didn’t have enough data to get a correct fit. |
GUADALQUIVIR - ANDALUCIA - GUADIANA - TAJO | correspond to the driest part, thanks to the work of University of Cordoba we have plenty of data available. |
DUERO | It was possible to combine Spanish and Portugese data |
JUCAR-SEGURA | Few electrofishing data there, but a rather similar river width so these are grouped |
EBRO | It’s quite obvious that some of the largest surface correspond to small river width, in that case the width is the width of the electrofished area and not the width of the stream. These points have been removed from the dataset |
As gam may tend to impose local variations due to incomplete data, or data grouped from several basins, the scam package has been used. Here it imposes a monotone increasing and convex P-splines. The river width recorded using google tend to be larger than those recorded during electrofishing, perhaps because the electrofishing width was to the width of the whole stream in the downstream part. The use of raster calculated width for the largest parts will partly solve this problem. But also imposing a convex and increasing trend is probably better adapted in most cases.
The model selected does not use the shreeve index as the latter is largerly correlated by surface upstream.
load(file=str_c(ddatawd,"WW2.Rdata"))
# the best model is obtained without a common intecept
g0 <- glm(lW~lS,data=WW2); AIC(g0)
## [1] 18588.87
g1 <- glm(lW~lS*basin_rn,data=WW2); AIC(g1)# 17579
## [1] 17201.95
g2 <- glm(lW~lS*basin_rn+shreeve,data=WW2);AIC(g2) # 17641
## [1] 16933.38
gm0<-mgcv::gam(lW~basin_rn+s(lS,by=basin_rn,k=4),data=WW2) ; AIC(gm0) # 16765
## [1] 16188.62
# for some reason without shreeve takes ages...
#scam0<-scam(lW~basin+s(lS,k=4,bs="micx",m=2,by=basin),data=WW2);AIC(scam0) #17
# again too long
gm1<-mgcv::gam(lW~basin_rn+ti(lS,lshreeve,by=basin_rn,k=c(3,3)),data=WW2); AIC(gm0)
## [1] 16188.62
# not much better
vis.gam(gm1,view=c("lS","lshreeve"),color="heat",theta=-35,phi=40);title("log(width)~log(S)+log(shreeve)")
vis.gam(gm0,theta=-50,phi=40);title("log(width)~log(S) by basin")
# there is a problem as shreeve does not mean the same in Portugal and Spain and we have had
# to mix up the two countries
cor(WW2$lshreeve,WW2$lS) # 0.97
## [1] 0.9739114
#plot(g1,id.n = 5, labels.id = WW2$id) # much better with new basins
#summary(g0)
# does not fit takes ages, the ggplot works by adding a split strategy
t1<-Sys.time()
scam0<-scam(lW~s(lS,k=4,bs="micx",m=1,by=basin_rn),data=WW2);
paste("time for model (mins)",round(difftime(Sys.time(),t1,units="mins"),2))
## [1] "time for model (mins) 3.34"
summary(scam0)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## lW ~ s(lS, k = 4, bs = "micx", m = 1, by = basin_rn)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.32187 0.03298 -9.759 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(lS):basin_rnCUENCAS INTERNAS PAIS VASCO 2.00 2.000 2146.4 <2e-16 ***
## s(lS):basin_rnNORTE 2.00 2.000 3481.0 <2e-16 ***
## s(lS):basin_rnGALICIA-COSTA 1.00 1.000 4733.9 <2e-16 ***
## s(lS):basin_rnEBRO 2.00 2.000 2484.3 <2e-16 ***
## s(lS):basin_rnDUERO 2.00 2.000 442.9 <2e-16 ***
## s(lS):basin_rnGUADALQUIVIR - ANDALUCIA - GUADIANA - TAJO 2.07 2.135 1828.6 <2e-16 ***
## s(lS):basin_rnMIAO-LIMIA-MINHO-VOUGA-MONDEGO 1.00 1.000 4430.5 <2e-16 ***
## s(lS):basin_rnJUCAR-SEGURA 1.00 1.000 2295.8 <2e-16 ***
## s(lS):basin_rnCUENCAS INTERNAS DE CATALUNA 2.00 2.000 1050.2 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.4949 Deviance explained = 49.6%
## GCV score = 0.25755 Scale est. = 0.25719 n = 11472
##
## BFGS termination condition:
## 1.595285e-05
E<-resid(scam0)
save(scam0,file=str_c(ddatawd,"scam0.Rdata"))
boxplot(E~WW2$basin_rn,main="Residuals per basin")
# model predictions (without the log scale)
newdata <- expand.grid("basin_rn"=levels(WW2$basin_rn),
lS=seq(12.5,20,by=0.1))
newdata$S=exp(newdata$lS)/10^6
newdata$p <- predict(scam0,newdata=newdata)
newdata$pW <-exp(newdata$p)
WW2$S <- WW2$S/10^6
ggplot(WW2)+
geom_point(aes(x=S,y=W,color=source),alpha=0.5) +
geom_density_2d(col="pink", aes(x=S,y=W)) +
geom_line(aes(x=S,y=pW,color=source),alpha=0.5,color="purple",data=newdata)+
facet_wrap(~basin_rn)+
scale_x_continuous(name = "Surface (10^6 m2)", limits=c(0,500))+
scale_y_continuous(name = "River width (m)", limits=c(0,50))
## Warning: Removed 885 rows containing non-finite values (stat_density2d).
## Warning: Removed 885 rows containing missing values (geom_point).
Yamazaki, Dai, Gustavo AM de Almeida, and Paul D. Bates. 2013. “Improving Computational Efficiency in Global River Models by Implementing the Local Inertial Flow Equation and a Vector-Based River Network Map.” Water Resources Research 49 (11): 7221–35.
Yamazaki, Dai, Daiki Ikeshima, Jeison Sosa, Paul D. Bates, George Allen, and Tamlin Pavelsky. 2019. “MERIT Hydro: A High-Resolution Global Hydrography Map Based on Latest Topography Datasets.” Water Resources Research, May. doi:10.1029/2019WR024873.