Multiscale sediment dynamics¶
Nicolas Pucino; PhD Student @ Deakin University, Australia
This notebook covers the following concepts:
- The ProfileDynamics class
- Multiscale volumetric analysis and plotting
- Hotspot analysis
- Multiscale beachface Cluster Dynamics indices and plotting
%matplotlib notebook
import matplotlib.pyplot as plt
import pickle
import pandas as pd
import seaborn as sb
from sandpyper.hotspot import ProfileDynamics
from sandpyper.dynamics import sensitivity_tr_rbcd, plot_sensitivity_rbcds_transects
pd.options.mode.chained_assignment = None # default='warn'
P=pickle.load(open(r"C:\my_packages\sandpyper\tests\test_data\test.p", "rb"))
Introduction¶
In this notebook, we finally get some dynamics data out of our profiles using the ProfileDynamics class.
ProfileDynamics class¶
The ProfileDynamics class is the core object in sandpyper. Its methods allow you to:
- compute multitemporal elevation changes
- compute Limits of Detection (LoDs) statistics
- compute multi-scale volumetrics
- discretise elevation changes into discrete classes of magnitude of change, with a variety of methods (thanks to the Python package pysal)
- compute multi-scale beachface cluster dynamics indices (BCDs, see the paper Pucino et al., 2021)
- use various plotting capabilities
labels=["Undefined", "Small", "Medium", "High", "Extreme"]
appendix=["_deposition", "_erosion"]
Below we instantiate a ProfileDynamics object that discretises the data into 5 bins using the Jenks-Caspall method. We also provide labels for the bins, in order to be able to name them and keep the analysis clear.
D = ProfileDynamics(P, bins=5, method="JenksCaspall", labels=labels)
Data will be partitioned into 5 discrete classes. Labels provided.
Computing multitemporal dataset (dh_df)¶
The method ProfileDynamics.compute_multitemporal allows you to:
- create a pairwise (date from - date to) elevation difference dataset called dh_df
- optionally filter data based on a class, for instance computing elevation changes of sand points only
- optionally provide a dictionary to use full-length location names instead of location codes in some plots
Once run, this will add an attribute called dh_df that stores the dataframe of elevation changes across all the surveys and time periods. If an LoD was previously derived with LoD transects, this method will also compute the lod_dh table, which is used to create an LoD table.
Let's see how this works.
D.compute_multitemporal(loc_full={'mar': 'Marengo',
'leo': 'St. Leonards'}, filter_class='sand')
Filter activated: only ['sand'] points will be retained. working on leo
Calculating dt0, from 20180606 to 20180713 in leo. Calculating dt1, from 20180713 to 20180920 in leo. Calculating dt2, from 20180920 to 20190211 in leo. Calculating dt3, from 20190211 to 20190328 in leo. Calculating dt4, from 20190328 to 20190731 in leo. working on mar
Calculating dt0, from 20180601 to 20180621 in mar. Calculating dt1, from 20180621 to 20180727 in mar. Calculating dt2, from 20180727 to 20180925 in mar. Calculating dt3, from 20180925 to 20181113 in mar. Calculating dt4, from 20181113 to 20181211 in mar. Calculating dt5, from 20181211 to 20190205 in mar. Calculating dt6, from 20190205 to 20190313 in mar. Calculating dt7, from 20190313 to 20190516 in mar. done Computing LoD dh data. working on mar
Calculating dt0, from 20180601 to 20180621 in mar. Calculating dt1, from 20180621 to 20180727 in mar. Calculating dt2, from 20180727 to 20180925 in mar. Calculating dt3, from 20180925 to 20181113 in mar. Calculating dt4, from 20181113 to 20181211 in mar. Calculating dt5, from 20181211 to 20190205 in mar. Calculating dt6, from 20190205 to 20190313 in mar. Calculating dt7, from 20190313 to 20190516 in mar. working on leo
Calculating dt0, from 20180606 to 20180713 in leo. Calculating dt1, from 20180713 to 20180920 in leo. Calculating dt2, from 20180920 to 20190211 in leo. Calculating dt3, from 20190211 to 20190328 in leo. Calculating dt4, from 20190328 to 20190731 in leo. done
We can see that multitemporal statistics have been computed for each time period in all locations, creating the dh_df dataframe.
D.dh_df.head()
geometry | location | tr_id | distance | class_filter | dt | date_pre | date_post | z_pre | z_post | dh | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | POINT (731628.744 5705535.887) | mar | 24 | 22.0 | sand | dt_7 | 20190313 | 20190516 | 1.177169 | 0.517324 | -0.659844 |
1 | POINT (731627.918 5705536.452) | mar | 24 | 23.0 | sand | dt_7 | 20190313 | 20190516 | 1.176551 | 0.679524 | -0.497027 |
2 | POINT (731627.093 5705537.016) | mar | 24 | 24.0 | sand | dt_7 | 20190313 | 20190516 | 1.182770 | 0.768895 | -0.413875 |
3 | POINT (731626.267 5705537.581) | mar | 24 | 25.0 | sand | dt_7 | 20190313 | 20190516 | 1.183118 | 0.863863 | -0.319256 |
4 | POINT (731625.442 5705538.145) | mar | 24 | 26.0 | sand | dt_7 | 20190313 | 20190516 | 1.183118 | 0.906839 | -0.276280 |
The dt column stores a 'time-period ID'. For instance, in St. Leonards, the change from 20180606 to 20180713 is named dt_0, because it is the first time period in this location. The second one, from 20180713 to 20180920, is named dt_1, and so on.
All the information about time periods is stored in a dataframe available as the ProfileDynamics.dh_details attribute.
D.dh_details
dt | date_pre | date_post | location | n_days | loc_full | |
---|---|---|---|---|---|---|
0 | dt_0 | 20180606 | 20180713 | leo | 37 | St. Leonards |
1 | dt_1 | 20180713 | 20180920 | leo | 69 | St. Leonards |
2 | dt_2 | 20180920 | 20190211 | leo | 144 | St. Leonards |
3 | dt_3 | 20190211 | 20190328 | leo | 45 | St. Leonards |
4 | dt_4 | 20190328 | 20190731 | leo | 125 | St. Leonards |
5 | dt_0 | 20180601 | 20180621 | mar | 20 | Marengo |
6 | dt_1 | 20180621 | 20180727 | mar | 36 | Marengo |
7 | dt_2 | 20180727 | 20180925 | mar | 60 | Marengo |
8 | dt_3 | 20180925 | 20181113 | mar | 49 | Marengo |
9 | dt_4 | 20181113 | 20181211 | mar | 28 | Marengo |
10 | dt_5 | 20181211 | 20190205 | mar | 56 | Marengo |
11 | dt_6 | 20190205 | 20190313 | mar | 36 | Marengo |
12 | dt_7 | 20190313 | 20190516 | mar | 64 | Marengo |
In the same way the dh_df dataframe was computed, the lod_dh table is also computed, except that no filters are applied and all points previously extracted from the LoD transects are used.
D.lod_dh.head()
geometry | location | tr_id | distance | class_filter | dt | date_pre | date_post | z_pre | z_post | dh | dh_abs | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | POINT (299959.041 5773365.227) | leo | 7 | 0 | no_filters_applied | dt_4 | 20190328 | 20190731 | 1.252971 | 1.221372 | -0.031598 | 0.031598 |
1 | POINT (299959.299 5773364.260) | leo | 7 | 1 | no_filters_applied | dt_4 | 20190328 | 20190731 | 1.250840 | 1.237120 | -0.013720 | 0.013720 |
2 | POINT (299959.556 5773363.294) | leo | 7 | 2 | no_filters_applied | dt_4 | 20190328 | 20190731 | 1.264070 | 1.245647 | -0.018423 | 0.018423 |
3 | POINT (299959.813 5773362.328) | leo | 7 | 3 | no_filters_applied | dt_4 | 20190328 | 20190731 | 1.266122 | 1.241259 | -0.024862 | 0.024862 |
4 | POINT (299960.070 5773361.361) | leo | 7 | 4 | no_filters_applied | dt_4 | 20190328 | 20190731 | 1.262709 | 1.276591 | 0.013882 | 0.013882 |
From this dataset, a set of useful error statistics (i.e., elevation change in calibration areas where a theoretical zero change should occur) is also computed, including:
- mean (column: 'mean')
- median (column: 'med')
- standard deviation (column: 'std')
- normalised median absolute deviation (column: 'nmad')
- 68.3th quantile of absolute error (column: 'a_q683')
- 95th quantile of absolute error (column: 'a_q95')
- robust root mean squared error (column: 'rrmse')
- number of total observations (column: 'n')
- n_outliers using the 3-sigma method (column: 'n_outliers')
- Shapiro–Wilk statistics, p-values and normality evaluation (columns: 'saphiro_stat', 'saphiro_p', 'saphiro_normality')
- D’Agostino–Pearson statistics, p-values and normality evaluation (columns: 'ago_stat', 'ago_p', 'ago_normality')
- the chosen Limit of Detection based on normality check (column: 'lod')
All this information is stored in the lod_df attribute.
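For reference, here is a rough sketch (not sandpyper's internal code) of how two of these statistics relate to the raw LoD differences. Note that in the lod_df table below, where every row is flagged as not normal, the chosen lod equals the nmad.
# Rough sketch only: NMAD and 68.3th quantile of absolute error for one
# location/period of the LoD (calibration) elevation differences.
import numpy as np
dh = D.lod_dh.query("location == 'leo' and dt == 'dt_0'")["dh"].values
nmad = 1.4826 * np.median(np.abs(dh - np.median(dh)))  # normalised median absolute deviation
a_q683 = np.quantile(np.abs(dh), 0.683)                 # 68.3th quantile of absolute error
print(round(nmad, 6), round(a_q683, 6))  # compare with the 'nmad' and 'a_q683' columns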
D.lod_df
location | dt | mean | med | std | nmad | a_q683 | a_q95 | rrmse | n | n_outliers | saphiro_stat | saphiro_p | ago_stat | ago_p | saphiro_normality | ago_normality | lod | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | leo | dt_0 | -0.016272 | -0.064292 | 0.150259 | 0.101219 | 0.112612 | 0.345512 | 0.119911 | 93 | 0 | 0.858607 | 5.912969e-08 | 37.326988 | 7.844181e-09 | not normal | not normal | 0.101219 |
1 | leo | dt_1 | -0.174044 | -0.169029 | 0.219942 | 0.111155 | 0.217434 | 0.587310 | 0.202302 | 93 | 3 | 0.828144 | 5.102937e-09 | 48.123011 | 3.549940e-11 | not normal | not normal | 0.111155 |
2 | leo | dt_2 | 0.055579 | 0.031193 | 0.237583 | 0.071532 | 0.082824 | 0.478929 | 0.078037 | 93 | 0 | 0.621792 | 4.005841e-14 | 83.112916 | 8.959002e-19 | not normal | not normal | 0.071532 |
3 | leo | dt_3 | 0.034389 | 0.070812 | 0.212526 | 0.111180 | 0.142241 | 0.419897 | 0.131816 | 93 | 2 | 0.822265 | 3.284433e-09 | 33.561621 | 5.154500e-08 | not normal | not normal | 0.111180 |
4 | leo | dt_4 | 0.012034 | 0.006063 | 0.178835 | 0.098167 | 0.097641 | 0.393089 | 0.098354 | 93 | 1 | 0.815170 | 1.953578e-09 | 51.212861 | 7.573009e-12 | not normal | not normal | 0.098167 |
5 | mar | dt_0 | 0.052821 | -0.000377 | 0.299611 | 0.030544 | 0.070590 | 0.720866 | 0.030547 | 93 | 0 | 0.563625 | 3.452208e-15 | 84.456100 | 4.577105e-19 | not normal | not normal | 0.030544 |
6 | mar | dt_1 | 0.042716 | -0.020326 | 0.321094 | 0.049026 | 0.099827 | 0.505632 | 0.053072 | 93 | 0 | 0.604403 | 1.874006e-14 | 88.045972 | 7.604313e-20 | not normal | not normal | 0.049026 |
7 | mar | dt_2 | -0.001520 | 0.005103 | 0.153149 | 0.028770 | 0.043764 | 0.270786 | 0.029219 | 93 | 1 | 0.606796 | 2.077573e-14 | 96.624273 | 1.043051e-21 | not normal | not normal | 0.028770 |
8 | mar | dt_3 | -0.132597 | -0.031893 | 0.427761 | 0.091381 | 0.215306 | 1.105026 | 0.096786 | 93 | 3 | 0.805272 | 9.664408e-10 | 37.493918 | 7.216045e-09 | not normal | not normal | 0.091381 |
9 | mar | dt_4 | 0.080227 | 0.018110 | 0.318988 | 0.098746 | 0.265875 | 0.633117 | 0.100393 | 93 | 0 | 0.864506 | 9.857729e-08 | 33.143538 | 6.352901e-08 | not normal | not normal | 0.098746 |
10 | mar | dt_5 | 0.091513 | -0.000182 | 0.283531 | 0.078057 | 0.180398 | 0.744685 | 0.078057 | 93 | 0 | 0.826460 | 4.493375e-09 | 26.286661 | 1.958503e-06 | not normal | not normal | 0.078057 |
11 | mar | dt_6 | 0.039482 | -0.002108 | 0.346209 | 0.044897 | 0.157244 | 0.806286 | 0.044947 | 66 | 1 | 0.806915 | 7.147010e-08 | 17.631138 | 1.484045e-04 | not normal | not normal | 0.044897 |
12 | mar | dt_7 | -0.404197 | -0.200153 | 0.390672 | 0.097589 | 0.447426 | 1.168575 | 0.222676 | 66 | 4 | 0.789516 | 2.540825e-08 | 27.461689 | 1.088353e-06 | not normal | not normal | 0.097589 |
Plot: LoD normality check¶
Even though statistical tests can be useful for identifying whether a distribution is normal or not, especially in the case of photogrammetrically derived products, it is advised to also visually check both the histograms and the Q-Q plots of the absolute error.
Sandpyper allows quick and easy generation of such plots; here is an example for St. Leonards.
D.plot_lod_normality_check(locations=['leo'])
Multiscale volumetrics¶
With the LoD and dh_df tables we are finally ready to observe multi-scale change.
Mean Elevation Change¶
The Mean Elevation Change (MEC) is calculated using all the valid points (filtered based on class, beyond the LoD, within the beachface) that occur in both the pre and post datasets in each location.
MEC is the preferred metric as it allows a robust comparison between locations and over time, given that the number of transects and retained valid points can change drastically from survey to survey or from location to location due to changes in:
- survey extent
- wave run-up and the swash level
- non-sand points across the beachface
Volumetric estimation from profiles¶
The volumes of beachface change are instead estimated using the following procedure (a minimal sketch follows the list):
- Along each transect, the missing points (due to non-sand filters, if applied) are replaced by linearly interpolated values.
- We integrate the change along the cleaned transect to estimate the volumetric change at each transect.
- We multiply the sum of all transect volume changes by the transect spacing, to obtain a location-level beachface change estimate.
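Here is a rough illustrative sketch of these three steps (not sandpyper's internal code; it assumes regular chainage in metres and a hypothetical 20 m transect spacing):
# Illustrative sketch only: integrate elevation change along one cleaned transect
# and scale the summed transect volumes to a beach-wide estimate.
import numpy as np
def transect_volume_change(distance, dh):
    """Integrate elevation change (m) along chainage (m) -> m3 per alongshore metre."""
    distance, dh = np.asarray(distance, dtype=float), np.asarray(dh, dtype=float)
    gaps = np.isnan(dh)
    # step 1: fill points removed by the non-sand filter with linear interpolation
    dh[gaps] = np.interp(distance[gaps], distance[~gaps], dh[~gaps])
    # step 2: integrate change along the transect
    return np.trapz(dh, x=distance)
# step 3: sum all transect integrals and multiply by the (hypothetical) 20 m transect spacing
per_transect = [transect_volume_change([0, 1, 2, 3], [0.10, np.nan, 0.05, -0.02])]
location_net_volume_change = sum(per_transect) * 20  # m3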
This approximation is necessary to go from transect change to beachface change. Sandpyper takes care of it by computing multi-scale MECs and volumetrics in one line, simply by calling the ProfileDynamics.compute_volumetrics method.
This will add two additional dataframes as attributes, named location_volumetrics and transects_volumetrics, which will store a number of statistics.
# compute volumetrics
D.compute_volumetrics(lod=D.lod_df)
Using LoDs.
D.location_volumetrics.head()
location | location_full | dt | date_from | date_to | n_days | abs_in | abs_out | abs_net_change | mec_m | norm_in | norm_out | norm_net_change | tot_vol_depo | tot_vol_ero | net_vol_change | location_m3_m | n_obs_valid | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | leo | St. Leonards | dt_0 | 20180606 | 20180713 | 37 | 9.489928 | -20.506530 | -11.016602 | -0.018994 | 0.041082 | -0.088773 | -0.047691 | 134.334184 | -362.662966 | -228.328782 | -0.393670 | 231 |
1 | leo | St. Leonards | dt_1 | 20180713 | 20180920 | 69 | 18.725955 | -9.525609 | 9.200346 | 0.016429 | 0.085507 | -0.043496 | 0.042011 | 322.445395 | -103.237953 | 219.207442 | 0.391442 | 219 |
2 | leo | St. Leonards | dt_2 | 20180920 | 20190211 | 144 | 15.840791 | -7.481373 | 8.359419 | 0.014928 | 0.076158 | -0.035968 | 0.040190 | 285.138428 | -138.500707 | 146.637721 | 0.261853 | 208 |
3 | leo | St. Leonards | dt_3 | 20190211 | 20190328 | 45 | 10.163696 | -11.862978 | -1.699282 | -0.002930 | 0.040015 | -0.046705 | -0.006690 | 181.204725 | -171.608277 | 9.596448 | 0.016546 | 254 |
4 | leo | St. Leonards | dt_4 | 20190328 | 20190731 | 125 | 6.105440 | -29.049858 | -22.944418 | -0.040972 | 0.026896 | -0.127973 | -0.101077 | 89.599553 | -511.175997 | -421.576444 | -0.752815 | 227 |
D.transects_volumetrics.head()
tr_id | tot_vol_change | m3_m | n_obs_valid | abs_in | abs_out | abs_net_change | mec_m | norm_in | norm_out | norm_net_change | dt | location | location_full | date_from | date_to | n_days | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 18 | 1.418567 | -2.837134 | 10 | 1.409220 | -0.326091 | 1.083129 | -0.108313 | 0.140922 | -0.032609 | 0.108313 | dt_0 | leo | St. Leonards | 20180606 | 20180713 | 37 |
1 | 19 | -0.306168 | 0.874765 | 8 | 0.178991 | -0.555436 | -0.376445 | 0.053778 | 0.022374 | -0.069430 | -0.047056 | dt_0 | leo | St. Leonards | 20180606 | 20180713 | 37 |
2 | 20 | 0.190265 | -0.951327 | 5 | 0.258673 | 0.000000 | 0.258673 | -0.064668 | 0.051735 | 0.000000 | 0.051735 | dt_0 | leo | St. Leonards | 20180606 | 20180713 | 37 |
3 | 21 | -0.036110 | 0.144439 | 6 | 0.128409 | -0.215072 | -0.086663 | 0.017333 | 0.021401 | -0.035845 | -0.014444 | dt_0 | leo | St. Leonards | 20180606 | 20180713 | 37 |
4 | 22 | -1.378348 | 4.594493 | 6 | 0.000000 | -1.248788 | -1.248788 | 0.208131 | 0.000000 | -0.208131 | -0.208131 | dt_0 | leo | St. Leonards | 20180606 | 20180713 | 37 |
Amongst the various statistics computed (check the docs for details), the most important fields are listed below (a quick check follows the list):
- norm_net_change: abs_net_change divided by the total number of points used in the calculation of this time period (n_obs_valid). This is the MEC.
- net_vol_change: net volume change (m3)
- location_m3_m: net volume change / beach length (m3/m)
- cum: cumulative net volume change since the beginning of the monitoring (m3)
- cum_mec: cumulative norm_net_change since the beginning of the monitoring (m)
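As a quick sanity check of the MEC definition, we can reproduce norm_net_change from the first row of the location-level table above:
# MEC (norm_net_change) = abs_net_change / n_obs_valid
row = D.location_volumetrics.iloc[0]
print(row.abs_net_change / row.n_obs_valid)  # ≈ -0.047691, matching norm_net_change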
Let's use sandpyper to visualise these stats.
Plot: multi-scale MECS and volumetrics¶
Transect-level¶
The method ProfileDynamics.plot_transects allows visualising elevation profile changes in selected locations, tr_ids and time periods. Conveniently, if the data has been classified, setting the parameter classified=True also shows which class each point was assigned to.
NOTE: if classified=False, the points that have been filtered out will not be displayed. If classified=True, all points will be plotted. This is done to give a visual understanding of the cleaning procedure applied and to appreciate what is actually retained to compute change statistics.
# let's plot transect ID 10 in Marengo, for dt_2 and dt_3 with no classification applied.
D.plot_transects(location='mar', tr_id=10, dt=['dt_2','dt_3'], classified=False)
# same, with classification applied.
D.plot_transects(location='mar', tr_id=10, dt=['dt_2','dt_3'], classified=True)
In the above example, we can see how a major sand nourishment intervention built a considerable foredune during dt_2. However, as shown in dt_3, the intervention didn't last long, as the sand got eroded and the profile started receding again.
To have a full overview of a transect history all in one place, Sandpyper can plot it for you.
D.plot_transect_mecs(location='mar',tr_id=10)
Statistics are computed based on the following classes only: sand
Alternatively, we can use Sandpyper to plot both alongshore heatmaps of altimetric change and alongshore estimated volume changes, in all locations and all time periods.
D.plot_alongshore_change(mode='subset', # to plot only a subset of locations and dts, rather than all
dt_subset=['dt_2', 'dt_3'],
location_subset=["mar"],
lod=D.lod_df, # LoD will be applied when provided in form of lod dataframe or single (global) value
ax2_y_lims=(-10, 20), # lineplot y axis limits
fig_size=(10,6),
font_scale=.9,
plots_spacing=0, # spacing between the two plots. Try .1, it is also nice
bottom=True,
y_heat_bottom_limit=40, # how far seaward the heatmap will extend.
transect_spacing=D.ProfileSet.transects_spacing,
heat_yticklabels_freq=5,
heat_xticklabels_freq=5,)
Working on mar at dt_2
Working on mar at dt_3
Location-level¶
So far we saw data and plots at the transect level. Let's see a few plots at the location level, starting with the ProfileDynamics.plot_single_loc method.
D.plot_single_loc(["mar"],None)
In the above plot we can appreciate the volume change timeseries of Marengo, with the solid line representing the period-specific change and the dashed line the cumulative change since the start of the monitoring.
Alternatively, we can plot both locations MECs timeseries in order to compare their elevational changes through time, using the method ProfileDynamics.plot_mec_evolution.
# scale_mode can be 'auto', which means the x axis is location-specific, or 'equal', better for comparison.
D.plot_mec_evolution(location_field="location",x_limits=(-.2, .8),
loc_order=["leo","mar"],
x_binning=10,
figure_size=(6, 7),
font_scale=1.2)
Hotspot analysis: Local Indicator of Spatial Association (LISA)¶
In order to discard spatial outliers and obtain a better representation of beachface dynamics, while at the same time reducing the number of points analysed (in real-world cases this can be an important aspect), we use the Local Moran's I index, one of the most widely used spatial autocorrelation measures, available thanks to the great work in the Python package Pysal.
The method ProfileDynamics.LISA_site_level performs a Local Moran's I with False Discovery Rate (fdr) correction analysis for all the elevation change points in each survey in the dh_df dataframe.
This method can use KNN-based, inverse-distance-weighted or binary distance-based spatial weight matrices. In this example, we model spatial relationships with a distance-based, row-standardised binary weight matrix with a neighbourhood radius of 35 m, in order to include two adjacent transects without getting too far from the focal point.
D.LISA_site_level(mode="distance", distance_value=35)
Working on mar
Working on leo
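For context, here is a rough sketch of what such an analysis looks like with libpysal and esda used directly (the subset query, weight settings and number of permutations are illustrative assumptions; LISA_site_level handles all of this internally, including the FDR correction):
# Illustrative sketch only, not sandpyper's internal code.
import libpysal
from esda.moran import Moran_Local
# hypothetical subset: one location and one time period of the dh_df GeoDataFrame
subset = D.dh_df.query("location == 'mar' and dt == 'dt_2'")
# binary distance-band weights with a 35 m neighbourhood radius, row-standardised
w = libpysal.weights.DistanceBand.from_dataframe(subset, threshold=35, binary=True)
w.transform = "r"
# Local Moran's I on the elevation changes, with permutation-based pseudo p-values
lisa = Moran_Local(subset["dh"].values, w, permutations=999)
print(lisa.q[:5], lisa.p_sim[:5])  # scatterplot quadrants and pseudo p-values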
This method adds several hotspot-analysis columns to the dh_df table, most importantly:
- false discovery rate threshold (column: lisa_fdr)
- local Moran's I (column: lisa_I)
- simulated pseudo p and z values (columns: lisa_p_sim, lisa_z_value)
- Moran's scatter plot quadrant (column: lisa_q)
The possible Moran scatterplot quadrants are: High-High cluster, High-Low spatial outlier, Low-Low cluster and Low-High spatial outlier.
Although technically the High-High and Low-Low quadrants represent statistically significant spatial clusters, we commonly refer to these points as hotspots and coldspots.
IMPORTANT NOTE: hotspots versus coldspots do not necessarily reflect positive and negative values; rather, they represent statistically significant changes in the magnitude of the process analysed (elevation change). Thus, in a scenario of extremely high deposition, hotspots represent high values of deposition surrounded by other high values of deposition, while coldspots represent low values of deposition surrounded by other low values of deposition. The choice of Local Moran's I as LISA is motivated by the fact that we want to represent dynamics as well as we can by removing spatial outliers at the location level, and we are not necessarily interested (at this stage) in the actual human interpretation (deposition vs erosion).
Now that we have statistically significant hotspots of change, we can use those points to capture the most interesting areas of beachface change which we use to model behaviour.
ANOTHER NOTE: our beachfaces are narrow, and we use only reliable valid points obtained through multiple levels of filtering (LoD, beachface area, sand-only), which significantly reduces the total number of usable points in each timestep. Because in our next step we will compute the Beachface Cluster Dynamics indices at both the location and transect scales, we apply the hotspot filtering only at the location level and not at the transect level. This is necessary to ensure that we have enough points in each transect to model their behaviours.
D.hotspots.query("location=='mar' and dt=='dt_2' and lisa_q in [1,3]").plot(column='lisa_q', categorical=True,
cmap='RdBu')
<AxesSubplot:>
Multiscale Beachface Cluster Dynamics (BCDs) indices¶
In this section, the Beachface Cluster Dynamics (BCD) indices, which comprise both the empirical and residual BCDs, are computed with sandpyper.
From Pucino et al.,: " The empirical and residual BCDs (e-BCD and r-BCD respectively) are purposefully designed metrics to leverage the very high spatiotemporal resolutions and three-dimensionality of our data for studying subaerial beach landform dynamics (morphodynamics). With elevation change (Δh) magnitude classes as transient states, we used finite discrete Markov chain models to compute first-order stochastic transition matrices and steady-state probability vectors, used to derive e- and r-BCD respectively." For more information about the methods, please refer to the docs.
As stated above, before computing any BCD we need to discretise the magnitude of elevation change that occurred at every time step into bins. When we instantiated the ProfileDynamics object we already provided sandpyper with all it needs to perform the discretisation on the data.
Therefore, it is as simple as calling the ProfileDynamics.discretise method, as shown below.
D.discretise(absolute=True, print_summary=True, lod=D.lod_df)
JenksCaspall
   Lower            Upper              Count
========================================
            x[i] <=  0.108              1799
   0.108 <  x[i] <=  0.247              1515
   0.247 <  x[i] <=  0.453              1082
   0.453 <  x[i] <=  1.096               601
   1.096 <  x[i] <=  3.012               115

Fit of JenksCaspall with 5 bins: 254.46116716298275
By setting absolute=True, the bins are derived from the absolute values of elevation change first; classes are then assigned to both negative and positive values accordingly. This is needed to ensure that erosional and depositional classes represent the same intervals of elevation change magnitude, just with opposite sign.
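A rough sketch of this idea, using mapclassify directly (simplified: no LoD or hotspot filtering, and it works on the raw dh_df column for illustration; the discretise method above does the real work):
# Illustrative sketch of absolute-value discretisation, not sandpyper's internal code.
import numpy as np
import mapclassify
dh = D.dh_df["dh"].dropna().values
jc = mapclassify.JenksCaspall(np.abs(dh), k=5)        # breaks computed on |dh|
labels = ["Undefined", "Small", "Medium", "High", "Extreme"]
tags = [f"{labels[c]}_{'erosion' if d < 0 else 'deposition'}"
        for c, d in zip(jc.yb, dh)]                    # signed class labels
print(tags[:5])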
After this step, a new dataframe named df_labelled is added, which stores the markov_tag field resulting from the discretisation process.
D.df_labelled.head()
geometry | location | tr_id | distance | class_filter | dt | date_pre | date_post | z_pre | z_post | ... | lisa_p_sim | lisa_z_sim | lisa_z | decay | lod | dh_abs | lod_valid | markov_tag | magnitude_class | spatial_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | POINT (731628.744 5705535.887) | mar | 24 | 22.0 | sand | dt_7 | 20190313 | 20190516 | 1.177169 | 0.517324 | ... | 0.001 | -3.263109 | -1.946153 | 0 | 0.097589 | 0.659844 | yes | High_erosion | High | 757792605686565N81T)2.340P508542.852799I58(367... |
1 | POINT (731627.918 5705536.452) | mar | 24 | 23.0 | sand | dt_7 | 20190313 | 20190516 | 1.176551 | 0.679524 | ... | 0.003 | -2.939650 | -1.299948 | 0 | 0.097589 | 0.497027 | yes | High_erosion | High | 103787109670565N48T)2.011P749636.553917I14(357... |
2 | POINT (731627.093 5705537.016) | mar | 24 | 24.0 | sand | dt_7 | 20190313 | 20190516 | 1.182770 | 0.768895 | ... | 0.001 | -2.971367 | -0.969926 | 0 | 0.097589 | 0.413875 | yes | Medium_erosion | Medium | 640761700672535N94T)2.576P580778.152032I30(347... |
3 | POINT (731626.267 5705537.581) | mar | 24 | 25.0 | sand | dt_7 | 20190313 | 20190516 | 1.183118 | 0.863863 | ... | 0.003 | -2.803241 | -0.594391 | 0 | 0.097589 | 0.319256 | yes | Medium_erosion | Medium | 097756203666535N42T)2.137P811761.853250I95(327... |
4 | POINT (731625.442 5705538.145) | mar | 24 | 26.0 | sand | dt_7 | 20190313 | 20190516 | 1.183118 | 0.906839 | ... | 0.003 | -2.812945 | -0.423824 | 0 | 0.097589 | 0.276280 | yes | Medium_erosion | Medium | 433741807650535N09T)2.899P042855.453478I51(317... |
5 rows × 29 columns
Now we are ready to compute the actual indices, but first, a note on the weights needed for the e-BCDs.
e-BCDs weights¶
The e-BCD uses weights to represent the importance of each transition.
A point that transitioned from small erosion to extreme erosion has more impact on the sediment budget than one that transitioned to a medium erosion class.
The weights for each magnitude class can be specified manually using a dictionary. However, it is best to obtain an objective representation of the severity of change. As the magnitude classes are derived from the data (using the Jenks-Caspall optimised natural breaks method in this example), a simple yet robust solution is to use the median of each magnitude class as its weight.
The function infer_weights takes care of this process.
D.infer_weights()
D.weights_dict
{'Undefined_deposition': 0.05, 'Undefined_erosion': 0.05, 'Small_deposition': 0.17, 'Small_erosion': 0.17, 'Medium_deposition': 0.33, 'Medium_erosion': 0.33, 'High_deposition': 0.58, 'High_erosion': 0.58, 'Extreme_deposition': 1.62, 'Extreme_erosion': 1.62}
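For reference, a minimal sketch of that idea, assuming the labelled dataframe exposes the dh_abs and magnitude_class columns shown earlier (infer_weights is the authoritative implementation):
# Illustrative sketch: per-class medians of |dh| used as transition weights.
medians = D.df_labelled.groupby("magnitude_class")["dh_abs"].median().round(2)
weights = {f"{cls}_deposition": w for cls, w in medians.items()}
weights.update({f"{cls}_erosion": w for cls, w in medians.items()})
print(weights)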
Of course, if we want to use our own custom weights, it is as simple as this:
# create the weight dictionary
my_weights_dict={'Undefined_deposition': 1,
'Undefined_erosion': 1,
'Small_deposition': 1,
'Small_erosion': 1,
'Medium_deposition': 1,
'Medium_erosion': 1,
'High_deposition': 3,
'High_erosion': 3,
'Extreme_deposition': 8,
'Extreme_erosion': 8}
# add it to the ProfileDynamics object
D.weights_dict = my_weights_dict
D.weights_dict
{'Undefined_deposition': 1, 'Undefined_erosion': 1, 'Small_deposition': 1, 'Small_erosion': 1, 'Medium_deposition': 1, 'Medium_erosion': 1, 'High_deposition': 3, 'High_erosion': 3, 'Extreme_deposition': 8, 'Extreme_erosion': 8}
However, it makes more sense to use Sandpyper's built-in method, so let's recalculate it.
D.infer_weights()
D.weights_dict
{'Undefined_deposition': 0.05, 'Undefined_erosion': 0.05, 'Small_deposition': 0.17, 'Small_erosion': 0.17, 'Medium_deposition': 0.33, 'Medium_erosion': 0.33, 'High_deposition': 0.58, 'High_erosion': 0.58, 'Extreme_deposition': 1.62, 'Extreme_erosion': 1.62}
Now, we can compute BCDs at the location level at once.
The method ProfileDynamics.BCD_compute_location computes all the stochastic first-order transition matrices of sand dynamics, based on the filters provided (sand-only, hotspots, beyond LoD) across beachfaces, at the site level; these matrices are the basis for the e-BCDs.
It produces two dataframes:
- e-BCDs
- steady-state distribution
D.BCD_compute_location("geometry","all",True, filterit='lod')
LoD filter in use. Only points beyond LoDs will be retained.
After this step, two new dataframes are added as attributes to the ProfileDynamics object:
- location_ebcds : stores the location level e-BCDs of each submatrix, as well as its value, sign and trend.
- location_ss : stores the location level steady-state vector of each magnitude of change class
D.location_ebcds
location | sub_matrix | coastal_markov_idx | trend | sign | states_labels | |
---|---|---|---|---|---|---|
0 | leo | de | 2.096 | 2.625 | + | Vulnerability |
1 | leo | dd | 0.546 | 0.001 | + | Depositional |
2 | leo | ed | 1.731 | -0.159 | - | Recovery |
3 | leo | ee | 0.924 | -0.045 | - | Erosional |
4 | mar | de | 2.080 | 0.098 | + | Vulnerability |
5 | mar | dd | 1.477 | 0.044 | + | Depositional |
6 | mar | ed | 2.570 | -0.253 | - | Recovery |
7 | mar | ee | 1.290 | -0.027 | - | Erosional |
D.location_ss
leo | mar | |
---|---|---|
Extreme_deposition | 0.000000 | 0.017774 |
High_deposition | 0.009499 | 0.063266 |
Medium_deposition | 0.056102 | 0.099950 |
Small_deposition | 0.148219 | 0.106840 |
Undefined_deposition | 0.017016 | 0.043324 |
total_deposition | 0.230837 | 0.331154 |
Undefined_erosion | 0.011688 | 0.027057 |
Small_erosion | 0.143038 | 0.082955 |
Medium_erosion | 0.070255 | 0.065048 |
High_erosion | 0.007023 | 0.045040 |
Extreme_erosion | 0.000897 | 0.003127 |
total_erosion | 0.232901 | 0.223227 |
r_bcds | -0.206427 | 10.792748 |
The residual BCD is simply the difference between the total depositional and erosional probabilities in the steady-state distribution, multiplied by 100 for readability purposes. Note that in this case no weights are applied, as no transitions are represented.
In the table above, the r_bcds row holds the r-BCD index of each location, which we also call its behavioural regime. At the transect level (covered below), the same index is returned as a residual column, which is what you might want to map in QGIS.
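As a quick check, the location-level r-BCDs can be reproduced from the steady-state table above:
# r-BCD = (total deposition - total erosion) * 100, per location
ss = D.location_ss
print((ss.loc["total_deposition"] - ss.loc["total_erosion"]) * 100)
# leo ≈ -0.206, mar ≈ 10.793, matching the r_bcds row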
Plot: first-order transition stochastic matrices¶
Let's use sandpyper to visualise the location-specific first-order stochastic matrices, representing the probability of each transition, which are the basis for the computation of the e-BCDs.
# the following dictionary is used to rename the magnitude of change classes into short codes,
# such as 'me' for Medium Erosion or 'sd' for Small Deposition. This is done purely for graphical purposes.
relabel_dict={'Undefined_deposition': 'ud',
'Undefined_erosion': 'ue',
'Small_deposition': 'sd',
'Small_erosion': 'se',
'Medium_deposition': 'md',
'Medium_erosion': 'me',
'High_deposition': 'hd',
'High_erosion': 'he',
'Extreme_deposition': 'ed',
'Extreme_erosion': 'ee'}
%matplotlib inline
#let's plot those matrices
D.plot_trans_matrices(relabel_dict)
Notes:
These matrices discard all the valid to non-valid transitions. In other words, all transitions going from a valid point (non spatial outlier, classified as sand and beyond the limit of detection) to a non-valid one are labelled as "nnn" and discarded. Moreover, the upper limit of the colour ramp (vmax parameter) is set to a maximum of 3 times the standard deviation of the whole dataset, excluding the nnn state.
How to read transition matrices¶
The title informs about:
- mar/leo = Location codes (leo = St. Leonards; mar = Marengo)
- n = Total number of points in the timeseries (valid and non-valid)
- t = Total number of timesteps
- trans = Total number of valid transitions considered
Rows are the initial state, while the columns are the final (one-step) state. For instance, in Marengo, a point whose starting state is Undefined Erosion (ue) has a 12% probability of becoming a Medium Erosion (me) point and a 15% probability of becoming a Medium Deposition point in the next time period.
Some observations
We chose St. Leonards as an example to demonstrate a few limitations, which are observable in this matrix.
St. Leonards is located within Port Phillip Bay. This narrow beach is not embayed but fetch-limited, meaning that it doesn't receive the highly energetic Southern Ocean swell; rather, its morphodynamics are driven by considerably lower waves generated locally within the bay. Moreover, seagrass meadows and reefs further reduce wave impacts on the subaerial beachface.
Therefore, changes are small and often below the limit of detection, and thus uncertain. If we consider that we filter points based on:
- only sand
- limit of detection
- spatial outliers
from an already narrow beach (short transects), this location is severely decimated in terms of behavioural modelling potential. In fact, the submatrices have many zeroes, which indicate a zero probability for those transitions, because they have not been recorded during the monitoring period.
Moreover, we note how most of the transitions tend to occur between low-magnitude classes.
Marengo tells another story.
Marengo is an open-ocean beach with a small southern section protected by a small headland (tombolo-like). The section used for this example was also affected by sand nourishment, which injected considerable amounts of sediment during the monitoring period.
Given that in Marengo we have roughly four times the number of valid transitions (trans=5,278) compared to St. Leonards (trans=1,072), the submatrices better capture the stochastic behaviour of this beachface.
Plot: e-BCDs (location)¶
Let's plot the e-BCDs, which summarise the information contained in the submatrices.
D.plot_location_ebcds()
Plot: r-BCD (location)¶
We already computed the steady-state probability vectors for each location with the BCD_compute_location method, which stored them in the location_ss attribute.
Below we reorder the data and plot it as a heatmap, where each column is a location and each row a magnitude-of-change class.
order=[i for i in D.tags_order if i !='nnn']
ss=D.location_ss.loc[order]
ss
leo | mar | |
---|---|---|
Extreme_deposition | 0.000000 | 0.017774 |
High_deposition | 0.009499 | 0.063266 |
Medium_deposition | 0.056102 | 0.099950 |
Small_deposition | 0.148219 | 0.106840 |
Undefined_deposition | 0.017016 | 0.043324 |
Undefined_erosion | 0.011688 | 0.027057 |
Small_erosion | 0.143038 | 0.082955 |
Medium_erosion | 0.070255 | 0.065048 |
High_erosion | 0.007023 | 0.045040 |
Extreme_erosion | 0.000897 | 0.003127 |
plt.rcParams['font.sans-serif'] = 'Arial'
plt.rcParams['font.family'] = 'sans-serif'
sb.set_context("paper", font_scale=2)
f,ax=plt.subplots(figsize=(12,10))
sb.heatmap(ss, cmap="Blues",annot=True,
annot_kws={'size':14},linewidths=1,linecolor="white", cbar_kws={'label': 'Lim. Probabilities'});
#f.savefig(r'E:\\path\\to\\save\\picture.png', dpi=600);
Location-level BCDs are good for comparing multiple locations across wide areas. But what about getting a more detailed, spatially explicit view of the behaviour of a beachface system, at the transect level?
Transect-level BCDs¶
If we are looking at the spatial distribution of r-BCDs, it is important that transects are comparable to each other within a location.
So far, the LoD and sand-only filters have ensured that only high-quality data is used in the r-BCD computation. At the transect level we add two additional conditions, so that only the following are retained:
- points that remain valid for at least a certain amount of time periods
- transects that have a number of valid points greater than a determined minimum threshold
Therefore, the choice of the two parameters (thresh and min_points) of the ProfileDynamics.BCD_compute_transects method is very important:
- min_points: the minimum required valid points per transect to consider a transect reliable.
- thresh: the minimum number of timesteps required to retain a transect.
This is done to ensure comparability across time and transects in a determined location.
Therefore, thresh is an important parameter that should ideally be as high as the number of available time periods, in order to ensure that the maximum behavioural variability is captured.
Yet, setting this parameter very high can considerably reduce the number of valid points retained in a single transect, which in turn can fall below min_points, leading to the loss of transects from the final behavioural map.
Let's see how this works.
loc_specs={'mar':{'thresh':6,
'min_points':6}}
D.BCD_compute_transects(loc_specs=loc_specs,reliable_action='keep', dirNameTrans=D.ProfileSet.dirNameTrans)
tr_id 24 has 0 valid points. No threshold and mi_points provided in loc_specs for leo. Using no filters.
D.transects_rbcd.query("location=='leo'").plot(column="residual", cmap='RdBu_r')
<AxesSubplot:>
Done. We now have transect-level r-BCDs computed with the totality of elevation change transitions that occurred during the monitoring period.
Sensitivity analysis¶
However, as previously discussed, the choice of the thresh and min_points parameters is very important. It is also informative to monitor, and try to minimise, the number of transects that switched from a depositional to an erosional behavioural regime (or vice versa) in the last time period (i.e. changed sign from t-1 to t) for any chosen thresh. Such transects may signal a behaviour that only emerged because of the chosen thresh value, indicating potentially lower confidence in their r-BCD values.
This is why sandpyper includes 2 functions to perform a sensitivity analysis:
- sensitivity_tr_rbcd() : to perform a sensitivity analysis
- plot_sensitivity_rbcds_transects() : to visualise the sensitivity analysis results
We start by defining all the combinations of the parameters thresh (t) and min_pts (pt), given ranges that we define.
The thresh parameter can go up to the total number of timesteps available, while for min_pts we decide to test values from 0 to 20, with a step of 2.
ss_tr_big = sensitivity_tr_rbcd(D.df_labelled,
test_thresholds='max',
test_min_pts=[0,20,2])
Working on mar. A total of 90 combinations of thresholds and min_pts will be computed.
Working on threshold 0 and min points 0. errore Working on threshold 1 and min points 0. errore Working on threshold 2 and min points 0. errore Working on threshold 3 and min points 0. errore Working on threshold 4 and min points 0. errore Working on threshold 5 and min points 0. errore Working on threshold 6 and min points 0. errore Working on threshold 7 and min points 0. errore Working on threshold 8 and min points 0. errore Working on threshold 0 and min points 2. errore Working on threshold 1 and min points 2. errore Working on threshold 2 and min points 2. errore Working on threshold 3 and min points 2. errore Working on threshold 4 and min points 2. errore Working on threshold 5 and min points 2. errore Working on threshold 6 and min points 2. errore Working on threshold 7 and min points 2. errore Working on threshold 8 and min points 2. errore Working on threshold 0 and min points 4. errore Working on threshold 1 and min points 4. errore Working on threshold 2 and min points 4. errore Working on threshold 3 and min points 4. errore Working on threshold 4 and min points 4. errore Working on threshold 5 and min points 4. errore Working on threshold 6 and min points 4. errore Working on threshold 7 and min points 4. errore Working on threshold 8 and min points 4. errore Working on threshold 0 and min points 6. errore Working on threshold 1 and min points 6. errore Working on threshold 2 and min points 6. errore Working on threshold 3 and min points 6. errore Working on threshold 4 and min points 6. errore Working on threshold 5 and min points 6. errore Working on threshold 6 and min points 6. errore Working on threshold 7 and min points 6. errore Working on threshold 8 and min points 6. errore Working on threshold 0 and min points 8. errore Working on threshold 1 and min points 8. errore Working on threshold 2 and min points 8. errore Working on threshold 3 and min points 8. errore Working on threshold 4 and min points 8. errore Working on threshold 5 and min points 8. errore Working on threshold 6 and min points 8. errore Working on threshold 7 and min points 8. errore Working on threshold 8 and min points 8. errore Working on threshold 0 and min points 10. errore Working on threshold 1 and min points 10. errore Working on threshold 2 and min points 10. errore Working on threshold 3 and min points 10. errore Working on threshold 4 and min points 10. errore Working on threshold 5 and min points 10. errore Working on threshold 6 and min points 10. errore Working on threshold 7 and min points 10. errore Working on threshold 8 and min points 10. errore Working on threshold 0 and min points 12. errore Working on threshold 1 and min points 12. errore Working on threshold 2 and min points 12. errore Working on threshold 3 and min points 12. errore Working on threshold 4 and min points 12. errore Working on threshold 5 and min points 12. errore Working on threshold 6 and min points 12. errore Working on threshold 7 and min points 12. errore Working on threshold 8 and min points 12. errore Working on threshold 0 and min points 14. errore Working on threshold 1 and min points 14. errore Working on threshold 2 and min points 14. errore Working on threshold 3 and min points 14. errore Working on threshold 4 and min points 14. errore Working on threshold 5 and min points 14. errore Working on threshold 6 and min points 14. errore Working on threshold 7 and min points 14. errore Working on threshold 8 and min points 14. errore Working on threshold 0 and min points 16. 
errore Working on threshold 1 and min points 16. errore Working on threshold 2 and min points 16. errore Working on threshold 3 and min points 16. errore Working on threshold 4 and min points 16. errore Working on threshold 5 and min points 16. errore Working on threshold 6 and min points 16. errore Working on threshold 7 and min points 16. errore Working on threshold 8 and min points 16. errore Working on threshold 0 and min points 18. errore Working on threshold 1 and min points 18. errore Working on threshold 2 and min points 18. errore Working on threshold 3 and min points 18. errore Working on threshold 4 and min points 18. errore Working on threshold 5 and min points 18. errore Working on threshold 6 and min points 18. errore Working on threshold 7 and min points 18. errore Working on threshold 8 and min points 18. errore Working on leo. A total of 60 combinations of thresholds and min_pts will be computed.
Working on threshold 0 and min points 0. errore Working on threshold 1 and min points 0. errore Working on threshold 2 and min points 0. errore Working on threshold 3 and min points 0. errore Working on threshold 4 and min points 0. errore Working on threshold 5 and min points 0. errore Working on threshold 0 and min points 2. errore Working on threshold 1 and min points 2. errore Working on threshold 2 and min points 2. errore Working on threshold 3 and min points 2. errore Working on threshold 4 and min points 2. errore Working on threshold 5 and min points 2. errore Working on threshold 0 and min points 4. errore Working on threshold 1 and min points 4. errore Working on threshold 2 and min points 4. errore Working on threshold 3 and min points 4. errore Working on threshold 4 and min points 4. errore Working on threshold 5 and min points 4. errore Working on threshold 0 and min points 6. errore Working on threshold 1 and min points 6. errore Working on threshold 2 and min points 6. errore Working on threshold 3 and min points 6. errore Working on threshold 4 and min points 6. errore Working on threshold 5 and min points 6. errore Working on threshold 0 and min points 8. errore Working on threshold 1 and min points 8. errore Working on threshold 2 and min points 8. errore Working on threshold 3 and min points 8. errore Working on threshold 4 and min points 8. errore Working on threshold 5 and min points 8. errore Working on threshold 0 and min points 10. errore Working on threshold 1 and min points 10. errore Working on threshold 2 and min points 10. errore Working on threshold 3 and min points 10. errore Working on threshold 4 and min points 10. errore Working on threshold 5 and min points 10. errore Working on threshold 0 and min points 12. errore Working on threshold 1 and min points 12. errore Working on threshold 2 and min points 12. errore Working on threshold 3 and min points 12. errore Working on threshold 4 and min points 12. errore Working on threshold 5 and min points 12. errore Working on threshold 0 and min points 14. errore Working on threshold 1 and min points 14. errore Working on threshold 2 and min points 14. errore Working on threshold 3 and min points 14. errore Working on threshold 4 and min points 14. errore Working on threshold 5 and min points 14. errore Working on threshold 0 and min points 16. errore Working on threshold 1 and min points 16. errore Working on threshold 2 and min points 16. errore Working on threshold 3 and min points 16. errore Working on threshold 4 and min points 16. errore Working on threshold 5 and min points 16. errore Working on threshold 0 and min points 18. errore Working on threshold 1 and min points 18. errore Working on threshold 2 and min points 18. errore Working on threshold 3 and min points 18. errore Working on threshold 4 and min points 18. errore Working on threshold 5 and min points 18. errore
f,ax=plt.subplots(figsize=(10,10))
palette=sb.color_palette( n_colors=ss_tr_big.tr_id.unique().shape[0])
sb.lineplot(data=ss_tr_big, x='thresh',y='residual', hue='tr_id', style='location',
palette=palette, legend=False, **dict(alpha=0.5),
ax=ax
)
ax.set_ylabel("r_bcd")
ax.axhline(y=0, lw=2, c='r');
--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) <ipython-input-36-f2c8a953848a> in <module> 1 f,ax=plt.subplots(figsize=(10,10)) 2 ----> 3 palette=sb.color_palette( n_colors=ss_tr_big.tr_id.unique().shape[0]) 4 sb.lineplot(data=ss_tr_big, x='thresh',y='residual', hue='tr_id', style='location', 5 palette=palette, legend=False, **dict(alpha=0.5), C:\conda3\envs\sandpyper_env\lib\site-packages\pandas\core\generic.py in __getattr__(self, name) 5460 if self._info_axis._can_hold_identifiers_and_holds_name(name): 5461 return self[name] -> 5462 return object.__getattribute__(self, name) 5463 5464 def __setattr__(self, name: str, value) -> None: AttributeError: 'DataFrame' object has no attribute 'tr_id'
As we can see, there are only 3 transects that flipped their r-BCD sign (deposition to erosion or vice versa) as a consequence of an increase in the minimum number of timesteps.
In order to choose a suitable combination of thresh and min_pts that retains the majority of the transects (while keeping an eye on transects that flipped their r-BCD), here below we do some plotting.
plot_sensitivity_rbcds_transects(ss_tr_big, location='mar')
Conclusion¶
This notebook concludes the tutorials for the profile-based monitoring approach offered by sandpyper. The next notebook introduces a paradigm shift: raster-based analysis. However, this part of sandpyper is currently under development, so only a small teaser notebook will illustrate a few useful functions already implemented in sandpyper.
See you there!