Modelling of upstream catchment area, shreeve and strahler in Spain and Portugal

The rn layer is based on the rht in France, and the wise layers http://www.mapama.gob.es/es/cartografia-y-sig/ide/directorio_datos_servicios/caracteristicas_wms.aspx in Spain and Portugal https://sniambgeoportal.apambiente.pt/geoportal/catalog/search/resource/details.page?uuid={F199A8DA-ECE8-44FF-B753-37DDD9A70240}. Those have been chained, and the algorythm routing the river from the downstream part has been run.

But to get the surface of the basin upstream, the maximum distance to the source, the shreeve or strahler ranks it has to be routed from the source.

This is done using the BaseEdaRiosRiversegments class which has a method stream_order to run the rivers from the upstream part.

The following code shows examples on how to run the method in Spain :

require("RPostgreSQL")
require("sqldf")
library("sp")
library("rgdal")
library("sf")
source("EDACCM/init.r")
# Creates a class for all rivers in Spain (no basins selected- no emu selected)
rios_surf <- new('BaseEdaRiosRiversegments',
        baseODBC = "baseODBCrios",
        schema = "dbeel_rivers",        
        table = "rn",
        prkey = "idsegment", 
        zonegeo = 'Spain',
        basin=as.character(NULL),
        emu=as.character(NULL))
# Loads all the riversegments from spain including attributes
rios_surf  <- loaddb_all(rios_surf)#2.3 min
#  run the stream order method
rios_surf<-stream_order(rios_surf,silent=TRUE)
# extract a dataset with the columns idsegment, shreeve, strahler, surfacebvm2 (the cumulated surface upstream), and maximum distance source
export <- rios_surf@data[,c("idsegment","shreeve","strahler","surfacebvm2","distancesourcem")]
# save in a temporary table and update the database
sqldf("drop table if exists spain.c_temp")
sqldf("create table spain.c_temp as select * from export")
sqldf("update spain.rna set (shreeve,strahler, surfacebvm2,distancesourcem)= (c_temp.shreeve,c_temp.strahler,c_temp.surfacebvm2,c_temp.distancesourcem) from spain.c_temp
                where c_temp.idsegment=rna.idsegment")
#sqldf("update spain.rna set (shreeve,strahler, surfacebvm2)= (NULL,NULL,NULL)")

The method selects all riversegments and starts from the source. It will stop at a segment untill all ancestors have been found and calculate strahler and shreeve accordingly.

Stahler, shreeve, and distance source

Collection of actual river width data

The data have been collected from electofishing data, a first trial of random sampling using google. Even thought this was designated to concentrate on large water bodies, this sampling very often ended on completely dy lang, in a field, or a dry stream. We have thus chosen a different method with systematic sampling using google and going up the river and trying to measure distance in places where there was no (or the least) influence of downstream dams.

The initial plot shows that a lot of data are missing in some basins, this becomes apparent in a map, a lot of the data for river widht in electrofishing are missing in the central part of Spain and Portugal. We have checked in the source of data, this is not a problem with data integration, we have the surface but not the river width…

Available data

Available data

River width for large streams and reservoirs

Finally, we have used the MERIT Hydro wordwide river width computation to add with to the largest part of the rivers http://hydro.iis.u-tokyo.ac.jp/~yamadai/MERIT_Hydro/index.html (Yamazaki et al., 2019). We have projected the raster on the current layer using 10 point interpolation along the riversegment.

CREATE TABLE raster_width.rnsubset as (
WITH dumprn as (
SELECT idsegment,ST_GeometryN(geom,1) as geom FROM spain.rn WHERE ST_NumGeometries(geom) = 1
--and seaidsegment='SP322526'
 -- four multi geometries ignored there
),
 point0 AS (
  SELECT idsegment, 
         ST_value(raster.rast, ST_StartPoint(st_transform(geom,4326)), TRUE) AS pointvalue
  FROM dumprn LEFT JOIN 
       raster_width.merit_width15x15 AS raster 
       ON (ST_Intersects(ST_StartPoint(st_transform(geom,4326)), raster.rast))
), point1 AS (
  SELECT idsegment, 
         ST_value(raster.rast, ST_EndPoint(st_transform(geom,4326)), TRUE) AS pointvalue
  FROM dumprn LEFT JOIN 
       raster_width.merit_width15x15 AS raster 
       ON (ST_Intersects(ST_EndPoint(st_transform(geom,4326)), raster.rast))
), point01 AS (
    SELECT idsegment, 
         ST_value(raster.rast, ST_LineInterpolatePoint(st_transform(geom,4326), 0.1), TRUE) AS pointvalue
    FROM dumprn LEFT JOIN 
       raster_width.merit_width15x15 AS raster 
       ON (ST_Intersects(ST_LineInterpolatePoint(st_transform(geom,4326), 0.1), raster.rast))
), point02 AS (
    SELECT idsegment, 
         ST_value(raster.rast, ST_LineInterpolatePoint(st_transform(geom,4326), 0.2), TRUE) AS pointvalue
    FROM dumprn LEFT JOIN 
       raster_width.merit_width15x15 AS raster 
       ON (ST_Intersects(ST_LineInterpolatePoint(st_transform(geom,4326), 0.2), raster.rast))
), point03 AS (
    SELECT idsegment, 
         ST_value(raster.rast, ST_LineInterpolatePoint(st_transform(geom,4326), 0.3), TRUE) AS pointvalue
    FROM dumprn LEFT JOIN 
       raster_width.merit_width15x15 AS raster 
       ON (ST_Intersects(ST_LineInterpolatePoint(st_transform(geom,4326), 0.3), raster.rast)))
, point04 AS (
    SELECT idsegment, 
         ST_value(raster.rast, ST_LineInterpolatePoint(st_transform(geom,4326), 0.4), TRUE) AS pointvalue
    FROM dumprn LEFT JOIN 
       raster_width.merit_width15x15 AS raster 
       ON (ST_Intersects(ST_LineInterpolatePoint(st_transform(geom,4326), 0.4), raster.rast)))
, point05 AS (
    SELECT idsegment, 
         ST_value(raster.rast, ST_LineInterpolatePoint(st_transform(geom,4326), 0.5), TRUE) AS pointvalue
    FROM dumprn LEFT JOIN 
       raster_width.merit_width15x15 AS raster 
       ON (ST_Intersects(ST_LineInterpolatePoint(st_transform(geom,4326), 0.5), raster.rast)))
, point06 AS (
    SELECT idsegment, 
         ST_value(raster.rast, ST_LineInterpolatePoint(st_transform(geom,4326), 0.6), TRUE) AS pointvalue
    FROM dumprn LEFT JOIN 
       raster_width.merit_width15x15 AS raster 
       ON (ST_Intersects(ST_LineInterpolatePoint(st_transform(geom,4326), 0.6), raster.rast)))
, point07 AS (
    SELECT idsegment, 
         ST_value(raster.rast, ST_LineInterpolatePoint(st_transform(geom,4326), 0.7), TRUE) AS pointvalue
    FROM dumprn LEFT JOIN 
       raster_width.merit_width15x15 AS raster 
       ON (ST_Intersects(ST_LineInterpolatePoint(st_transform(geom,4326), 0.7), raster.rast)))
, point08 AS (
    SELECT idsegment, 
         ST_value(raster.rast, ST_LineInterpolatePoint(st_transform(geom,4326), 0.8), TRUE) AS pointvalue
    FROM dumprn LEFT JOIN 
       raster_width.merit_width15x15 AS raster 
       ON (ST_Intersects(ST_LineInterpolatePoint(st_transform(geom,4326), 0.8), raster.rast)))
, point09 AS (
    SELECT idsegment, 
         ST_value(raster.rast, ST_LineInterpolatePoint(st_transform(geom,4326), 0.9), TRUE) AS pointvalue
         FROM dumprn LEFT JOIN 
       raster_width.merit_width15x15 AS raster 
       ON (ST_Intersects(ST_LineInterpolatePoint(st_transform(geom,4326), 0.9), raster.rast))
       ),
union_tous as(
  SELECT * FROM point0
    UNION
     SELECT * FROM point01
    UNION
     SELECT * FROM point02
     UNION
     SELECT * FROM point03
     UNION
     SELECT * FROM point04
     UNION
     SELECT * FROM point05
     UNION
     SELECT * FROM point06
     UNION
     SELECT * FROM point07
     UNION
    SELECT * FROM  point08
     UNION
     SELECT * FROM point09
     UNION  
     SELECT * FROM point1),
 grouped as (
  SELECT idsegment, avg(pointvalue) as width FROM union_tous where pointvalue>0 group by idsegment)
 SELECT * FROM grouped where width>11.25
  
);

Unlike vector based gis queries, we didn’t use buffer arround lines. For most of large rivers, the main segment went trough the raster part and the calculation of with as the average of the raster part was correct. Some checks with electrofishing data or data collected from google showed that the fit was pretty good except where an Island was present in the channel in which case the total width of the river is underestimated : the projected segment takes the path of one or the other branch and the other is ignored. There was a problem that the width computed for the main stream so a decision was taken to remove the width. The rule was a bit different for portugal and Spain as the river networks do not have the same level of details (we decided to remove a different strahler order, up to three for Spain and one for Portugal).

Width computed in a river, removing low strahler rank (i.e. not applying width to tributaries) makes sense.

Width computed in a river, removing low strahler rank (i.e. not applying width to tributaries) makes sense.

The MERIT method correctly calculates the width of the river by adjusting the flow direction source Yamazaki et al. (2019) but there is no way the segment passing trough a reservoir can adjust to the path chosen in the merit file which was calculated as centerline pixels by searching convex points in bank distance field.

Calculation of stream width (perpendicular to the flow direction) Yamazaki, de Almeida, and Bates (2013) Calculation of flow direction Yamazaki, de Almeida, and Bates (2013)

In some of the reservoirs, the raster will not follow the same path as the river, and some width will be lost, also the removal of low strahler ranks may make less sense, but in our project it is more important to have a correct river width for the downstream parts of rivers, as the reservoirs should be added later as a layer.

Width in reservoir, removing low strahler rank is not always best, but those reservoirs will be added as surface water later in the project.

Width in reservoir, removing low strahler rank is not always best, but those reservoirs will be added as surface water later in the project.

UPDATE spain.rna SET riverwidthm=NULL where strahler<=3; -- remove tributaries
UPDATE portugal.rna SET riverwidthm=NULL where strahler=1; -- remove tributaries

Modelling river width

A quick analysis of the data shows the necessity of a log transformation to normalize the distributions. The inital plot without grouping shows some basins with very few information.

## 
##                CUENCAS INTERNAS PAIS VASCO                                      NORTE                              GALICIA-COSTA 
##                                        936                                       2453                                       3424 
##                                       EBRO                                      DUERO GUADALQUIVIR - ANDALUCIA - GUADIANA - TAJO 
##                                       1083                                         91                                        511 
##             MIAO-LIMIA-MINHO-VOUGA-MONDEGO                               JUCAR-SEGURA               CUENCAS INTERNAS DE CATALUNA 
##                                       2328                                        340                                        306

To choose the grouping of basins, the availability of data has been considered as well as the precipitation level. The runoff is avalaible in the swicca EU project. However as this data is not downloadable as a raster is has been downloaded as an image to help build off diagnostic. SWICCA project

Basins according to runoff (swicca project)

Basins according to runoff (swicca project)

grouped basins comments
MIAO-LIMIA-MINHO-VOUGA-MONDEGO Those basins are in the indermediate part of Portugal where there is an intermeditate water runoff. It would have been preferable to split some of those e.g “MIAO-LIMIA-MINHO” on one hand “VOUGA - MONDEGO” on the other hand but we didn’t have enough data to get a correct fit.
GUADALQUIVIR - ANDALUCIA - GUADIANA - TAJO correspond to the driest part, thanks to the work of University of Cordoba we have plenty of data available.
DUERO It was possible to combine Spanish and Portugese data
JUCAR-SEGURA Few electrofishing data there, but a rather similar river width so these are grouped
EBRO It’s quite obvious that some of the largest surface correspond to small river width, in that case the width is the width of the electrofished area and not the width of the stream. These points have been removed from the dataset

As gam may tend to impose local variations due to incomplete data, or data grouped from several basins, the scam package has been used. Here it imposes a monotone increasing and convex P-splines. The river width recorded using google tend to be larger than those recorded during electrofishing, perhaps because the electrofishing width was to the width of the whole stream in the downstream part. The use of raster calculated width for the largest parts will partly solve this problem. But also imposing a convex and increasing trend is probably better adapted in most cases.

The model selected does not use the shreeve index as the latter is largerly correlated by surface upstream.

load(file=str_c(ddatawd,"WW2.Rdata"))

# the best model is obtained without a common intecept

g0 <- glm(lW~lS,data=WW2); AIC(g0) 
## [1] 18588.87
g1 <- glm(lW~lS*basin_rn,data=WW2); AIC(g1)# 17579
## [1] 17201.95
g2 <- glm(lW~lS*basin_rn+shreeve,data=WW2);AIC(g2) # 17641
## [1] 16933.38
gm0<-mgcv::gam(lW~basin_rn+s(lS,by=basin_rn,k=4),data=WW2) ; AIC(gm0) # 16765
## [1] 16188.62
# for some reason without shreeve takes ages...
#scam0<-scam(lW~basin+s(lS,k=4,bs="micx",m=2,by=basin),data=WW2);AIC(scam0) #17
# again too long
gm1<-mgcv::gam(lW~basin_rn+ti(lS,lshreeve,by=basin_rn,k=c(3,3)),data=WW2); AIC(gm0)
## [1] 16188.62
# not much better
vis.gam(gm1,view=c("lS","lshreeve"),color="heat",theta=-35,phi=40);title("log(width)~log(S)+log(shreeve)")

vis.gam(gm0,theta=-50,phi=40);title("log(width)~log(S) by basin")

# there is a problem as shreeve does not mean the same in Portugal and Spain and we have had
# to mix up the two countries
cor(WW2$lshreeve,WW2$lS) # 0.97
## [1] 0.9739114
#plot(g1,id.n = 5, labels.id = WW2$id) # much better with new basins
#summary(g0)
# does not fit takes ages, the ggplot works by adding a split strategy
t1<-Sys.time()  

scam0<-scam(lW~s(lS,k=4,bs="micx",m=1,by=basin_rn),data=WW2);  
paste("time for model (mins)",round(difftime(Sys.time(),t1,units="mins"),2))
## [1] "time for model (mins) 3.34"
summary(scam0)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## lW ~ s(lS, k = 4, bs = "micx", m = 1, by = basin_rn)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.32187    0.03298  -9.759   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                                           edf Ref.df      F p-value    
## s(lS):basin_rnCUENCAS INTERNAS PAIS VASCO                2.00  2.000 2146.4  <2e-16 ***
## s(lS):basin_rnNORTE                                      2.00  2.000 3481.0  <2e-16 ***
## s(lS):basin_rnGALICIA-COSTA                              1.00  1.000 4733.9  <2e-16 ***
## s(lS):basin_rnEBRO                                       2.00  2.000 2484.3  <2e-16 ***
## s(lS):basin_rnDUERO                                      2.00  2.000  442.9  <2e-16 ***
## s(lS):basin_rnGUADALQUIVIR - ANDALUCIA - GUADIANA - TAJO 2.07  2.135 1828.6  <2e-16 ***
## s(lS):basin_rnMIAO-LIMIA-MINHO-VOUGA-MONDEGO             1.00  1.000 4430.5  <2e-16 ***
## s(lS):basin_rnJUCAR-SEGURA                               1.00  1.000 2295.8  <2e-16 ***
## s(lS):basin_rnCUENCAS INTERNAS DE CATALUNA               2.00  2.000 1050.2  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.4949   Deviance explained = 49.6%
## GCV score = 0.25755  Scale est. = 0.25719   n = 11472
## 
## BFGS termination condition:
## 1.595285e-05
E<-resid(scam0)
save(scam0,file=str_c(ddatawd,"scam0.Rdata"))
boxplot(E~WW2$basin_rn,main="Residuals per basin")

# model predictions (without the log scale)
newdata <- expand.grid("basin_rn"=levels(WW2$basin_rn),
                       lS=seq(12.5,20,by=0.1))
newdata$S=exp(newdata$lS)/10^6
newdata$p <- predict(scam0,newdata=newdata)
newdata$pW <-exp(newdata$p)
WW2$S <- WW2$S/10^6

ggplot(WW2)+
  geom_point(aes(x=S,y=W,color=source),alpha=0.5) +
  geom_density_2d(col="pink", aes(x=S,y=W)) +
  geom_line(aes(x=S,y=pW,color=source),alpha=0.5,color="purple",data=newdata)+
  facet_wrap(~basin_rn)+
  scale_x_continuous(name = "Surface (10^6 m2)", limits=c(0,500))+
  scale_y_continuous(name = "River width (m)", limits=c(0,50))
## Warning: Removed 885 rows containing non-finite values (stat_density2d).
## Warning: Removed 885 rows containing missing values (geom_point).

References

Yamazaki, Dai, Gustavo AM de Almeida, and Paul D. Bates. 2013. “Improving Computational Efficiency in Global River Models by Implementing the Local Inertial Flow Equation and a Vector-Based River Network Map.” Water Resources Research 49 (11): 7221–35.

Yamazaki, Dai, Daiki Ikeshima, Jeison Sosa, Paul D. Bates, George Allen, and Tamlin Pavelsky. 2019. “MERIT Hydro: A High-Resolution Global Hydrography Map Based on Latest Topography Datasets.” Water Resources Research, May. doi:10.1029/2019WR024873.