Introduction and data preparation¶
Nicolas Pucino; PhD Student @ Deakin University, Australia
This notebook covers the following concepts:
- Demo data explaination.
- Naming conventions and global settings.
- Setting up the folders. </font>
</font>
# import what required
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np
import geopandas as gpd
import rasterio as ras
from rasterio.plot import show
from sandpyper.outils import create_transects
Introduction¶
Sandpyper is an open-source Python package that automates profile-based volumetric and altimetric analysis of sandy beaches. It facilitates the cleaning of the elevation data from unwanted non-sand points or swash areas (where waves run up on the beach slope and 3D reconstruction is inaccurate) and allows to model beachface behavioural regimes using the Beachface Cluster Dynamics (BCD) indices. It combines a file naming convention with some user-provided global settings and keep the analys coherent by containing it into two main classes:
- ProfileSet class: manages the elevation and colour data extraction and LoD analysis
- ProfileDynamics class: manages volumetric, altimetric and BCD analysis
The user only needs to set-up the directories with all the DSMs and/or orthophotos correctly named (a function is provided, see formatting files notebook) and provide a few global settings at the beginning of the analysis, then the rest is fully automated.
I provided some demo data and prepared notebooks to introduce the main functionalities of the package, including:
- demo data extraction and global settings explaination
- profile data extraction
- sand labelling and data cleaning
- multiscale beach volumetric analysis
- multiscale beach cluster dynamics indices (BCDs indices)
- space module (teaser)
These notebooks focus is in the code implementation. If you want to read more about the methods and theory behind, please refer to the Documentation.
For a real world application, refer to the paper "Citizen science for monitoring seasonal-scale beach erosion and behaviour with aerial drones", where we analysed 85 surveys in 10 locations in Victoria, Australia.
Demo data¶
The demo_data.rar archive includes all the data you need to get you started and understand how Sandpyper works. This include:
dsm_1m: 6 DSMs of a small demo area in St.Leonards (leo) beach and 9 in Marengo (mar) beach, both covering their first year of monitoring. These are Geotiffs in two Coordinate Reference Systems (CRS) which have been resampled at 1m resolution (from an original 2.5cm) for demo speed.
orthos_1m: the corresponding orthophotos of the DSMs.
transects: vector files (Geopackages) representing the transects we want to extract information and work with, one for each location.
lod_transects: transects placed on invariant features of the landscape (roads, roofs, ramps, big rocks, etc.) used to derive Limit of Detection (LoDs) statistics.
clean: polygon masks used to fine-tune unsupervised clustering algorithm (label_corrections.gpkg), masking out swash and water (watermasks.gpkg) and shore areas (shoremasks.gpkg).
shorelines: vector shorelines used to demonstrate the creation of transects with Sandpyper.
Feel free to drag and drop these files in your favourite GIS (Qgis highly reccomended!) to have a look and feel on how they look, or, see below.
Here I plot a survey orthophoto in matplotlib interactive window. Zoom in to an area of interest to have a better image. For better visualisation purposes I will only show the orthophoto but remember that Sandpyper extract and work with both elevation (from DSMs) and colour data (from orthophotos) when provided.
Here below I also plot the watermasks, used to mask out the swash and water from the analysis, and the shoremask, used to clip the area of interest within a smaller and more meaningful zone.
%matplotlib notebook
f,axs= plt.subplots(1,2, figsize=(10,8), squeeze=True)
ortho_path=r"C:\my_packages\sandpyper\tests\test_data\orthos_1m\leo_20180920_ortho_resampled_1m.tif"
watermasks_path=r"C:\my_packages\sandpyper\tests\test_data\clean\watermasks.gpkg"
shoremasks_path=r"C:\my_packages\sandpyper\tests\test_data\clean\shoremasks.gpkg"
label_corr_path=r"C:\my_packages\sandpyper\tests\test_data\clean\label_corrections.gpkg"
transect_path=r"C:\my_packages\sandpyper\tests\test_data\transects\leo_transects.gpkg"
transect_lod_path=r"C:\my_packages\sandpyper\tests\test_data\lod_transects\leo_lod_transects.gpkg"
transects=gpd.read_file(transect_path)
watermasks=gpd.read_file(watermasks_path)
shoremasks=gpd.read_file(shoremasks_path)
label_corr=gpd.read_file(label_corr_path)
transect_lod=gpd.read_file(transect_lod_path)
with ras.open(ortho_path,'r') as ortho:
# plot orthos
show(ortho, ax=axs[0])
show(ortho, ax=axs[1])
# plot transects
transects.plot(ax=axs[0], color='r', label='transects')
transects.plot(ax=axs[1], color='r')
# plot shoremask
shoremask_leo_demo=shoremasks.query("location=='leo'")
shoremask_leo_demo.to_crs(32755).boundary.plot(ax=axs[1],color='w',linestyle='--',label='shoremask')
# plot watermask
watermask_leo_demo=watermasks.query("location=='leo' and raw_date==20180920")
watermask_leo_demo.to_crs(32755).plot(ax=axs[1],alpha=.3,label='watermask')
# add tr_id labels
for x, y, label in zip(transects.geometry.centroid.x, transects.geometry.centroid.y, transects.tr_id):
axs[0].annotate(label, xy=(x, y), xytext=(5, 5), textcoords="offset points")
# print coodinate reference systems
print(f"CRS of St.Leonards data:\n transects = {transects.crs}\n ortho = {ortho.crs}\n watermask = {watermask_leo_demo.crs}\
\n shoremask= {shoremask_leo_demo.crs}")
# set up image
f.suptitle("St. Leonards (leo) the 20th September 2018 (20180920)")
axs[0].set_title("overview")
axs[1].set_title("zoom in")
axs[1].set_xlim(299904.9, 299970)
axs[1].set_ylim(5773523, 5773601)
axs[0].legend()
axs[1].legend()
C:\conda3\envs\sandpyper_env\lib\site-packages\geopandas\geodataframe.py:422: RuntimeWarning: Sequential read of iterator was interrupted. Resetting iterator. This can negatively impact the performance. for feature in features_lst:
CRS of St.Leonards data: transects = epsg:32755 ortho = EPSG:32755 watermask = epsg:32754 shoremask= epsg:32754
Notes:
Not all the transects are within the extent of this orthophoto. Sandpyper will automatically assign numpy.NaN to points that are outside the raster extent (transects in the white areas of overview image) and points that are within the extent of the raster, but the raster itself has NoData, if any, pixels (black areas of zoom image).
Watermask and shoremask are in different CRSs. This is due to the fact that in Geopandas GeoDataFrames we can only store one CRS per geodataframe. When we create these masks in a GIS, we have to work with one CRS for all polygons (CRS of epsg: 32754 in this case). Sandpyper uses a user-defined CRS dictionary to keep all the analysis organised and always use the right CRS for each dataset or reproject when needed. This is useful when working in projects involving large datasets across space.
Global settings¶
Sandpyper needs a few global settings to be set at the beginning of the analysis in order to keep the it coherent, which are:
- cordinate reference systems for each location (dictionary)
- location codes (list)
- location search keywords for automatic filename formatting (dictionary)
- transects for beach dynamics and Limit of Detection analysis (vector spatial file, Geopackage)
- correction polygons (vector spatial file, Geopackage)
Let's review each one of them.
Location codes¶
When your analysis involve a multi-site approach, it is convenient to assign each location a small code to easy the analysis and retrieval.
Here are some examples:
- Saint Leonards : leo
- Marengo : mar
# The location codes used troughout the analysis
loc_codes=["mar","leo"]
Coordinate Reference Systems dictionary¶
Working on a wide area, often requires dealing with multiple Coordinate Reference Systems (CRS). Therefore, it is important to assign each location code with its appropriate CRS at the beginning, in order to always take it into account trhoughout the analysis.
We do this with another dictionary, called crs_dict_string
, where as keys we store the location codes and as values we store another dictionary, in this form {'init': 'epsg:32754'}
.
Modify the EPSG code to change CRS. Here is an example of the resulting dictionary:
crs_dict_string = {'wbl': {'init': 'epsg:32754'},
'apo': {'init': 'epsg:32754'},
'prd': {'init': 'epsg:32755'},
'dem': {'init': 'epsg:32755'} }
NOTE: to specify the CRS, use the dictionary format supported by Geopandas 0.6.3. Only projected CRS are supported.
# The Coordinate Reference Systems used troughout this study
crs_dict_string= {
'mar': {'init': 'epsg:32754'},
'leo':{'init': 'epsg:32755'}
}
Naming conventions and spatial data fields¶
Here below are the naming conventions used in Sandpyper.
- DSM image: LocationCode_yyyymmdd_dsm.tif (leo_20180606_dsm.tif)
- orthophoto image: LocationCode_yyyymmdd_ortho.tif (mar_20180727_ortho.tif)
- transect file: LocationCode_transects.gpkg (mar_transects.gpkg)
- lod transect file: LocationCode_lod_transects.gpkg (leo_lod_transects.gpkg)
- shoreline file: LocationCode_shoreline.gpkg (mar_shoreline.gpkg)
As the user needs to create polygons to use to clean the data (watermasks, shoremasks and label correcion masks), here below are some images showing how those polygon attributes need to be provided.
- location (string): location code
Every polygon has its own row and represent the shoremask of each location.
Working with only one geopackage or shapefile implies that only one CRS has to be used for all the locations. This is automatically taken into account in Sandpyper using the crs_dict_string global setting.
- location (string): location code
- raw_date (integer): survey date to apply this mask in raw for (yyyymmdd)
Every polygon has its own row and represent the watermask for each survey in each location.
- location (string): location code
- raw_date (integer): survey date to apply this mask in raw for (yyyymmdd)
- target_label_k (integer): label k to apply the correction of this polygon. 999 to apply to all points within it, regardless of their label k
- new_class (string): class to assign to the points of specified label k within the polygon
Every polygon has its own row and represent the correction polygons for each survey in each location.
Location search dictionary¶
Sometimes, Sandpyper needs to automatically obtain the right location code and date from raster files, either of DSMs or orthophotos, whose filenames contains unformatted versions of the location name and date (e.g. Saint_Leonards_20180923 or Marengo_21_july_2019).
An easy and fast way to do this, is to create a dictionary, where keys are the location codes and the values are lists of possible full names we expect to find in the files. Here are some examples:
loc_search_dict = { 'leo': ['St','Leonards','leonards','leo'],
'mar': ['Marengo','marengo','mar'] }
NOTE: always include the location codes in the list of possible names, in case the original raster filenames are already formatted!
# The terms used in the original filenames.
# These will be used to properly format files, extracting location codes.
loc_search_dict = { 'leo': ['St','Leonards','leonards','leo'],
'mar': ['Marengo','marengo','mar'] }
Input data and folder structure¶
Automatic formatting of filenames¶
from sandpyper.space import form
Transects¶
Any Shapely line or polyline object can be used as input as a transect. In coastal geomeorphometric studies, transects are usually equally spaced alongshore, and place normal to the shoreline. However, any type of line can be used to extract values from both orthophtos and DSMs.
You can construct transects in 2 ways:
- Any GIS
- Using the function create_transects, starting from a shoreline baseline.
If you use your favourite GIS (Qgis preferred), ensure that the output format is geopackage (.gpkg), each transect has its own separate row (geometry) and a transect ID column holds each transect, well, ID.
If you want to use in-built sandpyper function, see below:
# load and display the shoreline
shoreline_path=r"C:\my_packages\sandpyper\tests\test_data\shorelines\leo_shoreline_short.gpkg"
shoreline=gpd.read_file(shoreline_path)
shoreline.plot();
C:\conda3\envs\sandpyper_env\lib\site-packages\geopandas\geodataframe.py:422: RuntimeWarning: Sequential read of iterator was interrupted. Resetting iterator. This can negatively impact the performance. for feature in features_lst:
# create and display the transects (in red), give the shoreline (in blu)
f,ax=plt.subplots(figsize=(5,5)) # Change figsize if you want bigger images
location='leo' # insert the location code for this transect file.
transects=create_transects(shoreline,
sampling_step=20, # alongshore transect spacing in meters
tick_length=50, # transects length in meters
location=location,
crs=crs_dict_string[location],
side='both' # 'both':transect is centered at the interesction with the baseline
)
# Modify the figure by plotting shoreline, transects and transect IDs.
shoreline.plot(ax=ax,color='b')
transects.plot(ax=ax,color='r')
for x, y, label in zip(transects.geometry.centroid.x, transects.geometry.centroid.y, transects.tr_id):
ax.annotate(label, xy=(x, y), xytext=(4, 1), textcoords="offset points")
transects.head()
C:\conda3\envs\sandpyper_env\lib\site-packages\pyproj\crs\crs.py:53: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6 return _prepare_from_string(" ".join(pjargs))
tr_id | geometry | location | |
---|---|---|---|
0 | 0 | LINESTRING (300166.471 5772833.267, 300206.279... | leo |
1 | 1 | LINESTRING (300154.369 5772849.190, 300194.177... | leo |
2 | 2 | LINESTRING (300142.267 5772865.113, 300182.075... | leo |
3 | 3 | LINESTRING (300130.166 5772881.037, 300169.974... | leo |
4 | 4 | LINESTRING (300119.411 5772904.944, 300165.639... | leo |
The following cell saves transect to file and name it as:
locationCode_whateverYouWant.gpkg
(example: leo_transects.gpkg)
and place it to a folder where you store all the transects for each location.
Note: If the saving throws an error "PLE_NotSupported in dataset leo_transects.gpkg does not support layer creation option ENCODING", the file should be created and valid anyway. Double-check by opening it in Qgis.
#transects.to_file(filename=r'C:\save\here\your\leo_transects.gpkg',driver='GPKG')
Conclusion¶
This lenghty notebook showed what Sandpyper needs to operate and what the input data should look like. In the next notebooks I will show how all this information will be analysed in a few lines of code using Sandpyper.