predictions. Verde provides ways of splitting a dataset into one for fitting the gridder
(a training set) and one for comparing to predictions (a testing set). Function
:func:`verde.train_test_split` is based on
:func:`sklearn.model_selection.train_test_split` but is able to handle spatial data as
inputs.
See :ref:`model_evaluation` for more details.
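
The difference between the two splitting strategies used below can be seen in a minimal NumPy sketch. The helpers `random_split` and `block_split` are hypothetical names written for illustration. They are not Verde's implementation, and they assume Cartesian coordinates and square blocks of a given `spacing`.

```python
import numpy as np

# Hypothetical helpers for illustration only; not Verde's implementation.


def random_split(n_points, test_size=0.25, seed=0):
    # Shuffle point indices and take the first chunk as the test set,
    # ignoring where the points are located
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_points)
    n_test = int(round(test_size * n_points))
    return np.sort(indices[n_test:]), np.sort(indices[:n_test])


def block_split(easting, northing, spacing, test_size=0.25, seed=0):
    # Integer block index of every point along each axis
    i = np.floor((easting - easting.min()) / spacing).astype(int)
    j = np.floor((northing - northing.min()) / spacing).astype(int)
    # One integer label per occupied block
    _, labels = np.unique(np.stack([i, j], axis=1), axis=0, return_inverse=True)
    labels = labels.ravel()
    blocks = np.unique(labels)
    # Send whole blocks (never individual points) to the test set
    rng = np.random.default_rng(seed)
    n_test = max(1, int(round(test_size * blocks.size)))
    test_blocks = rng.choice(blocks, size=n_test, replace=False)
    is_test = np.isin(labels, test_blocks)
    return np.flatnonzero(~is_test), np.flatnonzero(is_test)


# Synthetic scattered points in a 10 x 10 square
rng = np.random.default_rng(42)
easting = rng.uniform(0, 10, 500)
northing = rng.uniform(0, 10, 500)

train, test = random_split(easting.size)
block_train, block_test = block_split(easting, northing, spacing=2.0)
print(train.size, test.size)  # prints: 375 125
```

With these synthetic points, `random_split` puts 375 points in training and 125 in testing. `block_split` instead keeps every block entirely on one side, which is the property that limits spatial correlation between the two sets.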
"""
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import pyproj
import numpy as np
import verde as vd
# Load the Baja California shipborne bathymetry data
data = vd.datasets.fetch_baja_bathymetry()
coordinates = (data.longitude.values, data.latitude.values)
region = vd.get_region(coordinates)
# Split the data into a training and testing set by picking points at random
# This is NOT the best way to split spatially correlated data and will cease being the
# default in future versions of Verde.
train, test = vd.train_test_split(coordinates, data.bathymetry_m, random_state=0)
print("Train and test data sizes:")
print(train[0][0].size, test[0][0].size)
# Alternatively, we can split the data into blocks and pick blocks at random.
# The advantage of this approach is that it reduces the spatial correlation between the
# training and testing datasets, which would otherwise bias our model evaluation.
# This will be the default in future versions of Verde.
block_train, block_test = vd.train_test_split(
Mask grid points by distance
============================
Sometimes, data points are unevenly distributed. In such cases, we might not want to
have interpolated grid points that are too far from any data point. Function
:func:`verde.distance_mask` allows us to set such points to NaN or some other value.
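
A distance mask can be sketched with brute-force NumPy. The version below finds the distance from every grid point to its nearest data point and keeps the points within `maxdist`. The function name `distance_mask_sketch` is hypothetical. It assumes Cartesian coordinates and builds the full distance matrix, so it only suits small inputs. A KD-tree is the usual choice for real datasets. The last line repeats the degree-to-meter conversion used in the script: 2 * (10/60) degrees * 111 km is about 37 km.

```python
import numpy as np

# Hypothetical helper for illustration only; not Verde's implementation.


def distance_mask_sketch(data_coordinates, maxdist, grid_coordinates):
    # Flatten the grid so that each entry is one grid point
    grid_e = np.ravel(grid_coordinates[0])
    grid_n = np.ravel(grid_coordinates[1])
    data_e, data_n = (np.asarray(c, dtype=float) for c in data_coordinates)
    # Cartesian distance between every grid point and every data point,
    # shape (n_grid, n_data): fine for small examples only
    distance = np.hypot(
        grid_e[:, None] - data_e[None, :], grid_n[:, None] - data_n[None, :]
    )
    # True where the nearest data point is within maxdist
    mask = distance.min(axis=1) <= maxdist
    return mask.reshape(np.shape(grid_coordinates[0]))


# A 3 x 5 grid with unit spacing and two data points near the lower-left corner
grid = np.meshgrid(np.arange(5.0), np.arange(3.0))
data = (np.array([0.0, 0.2]), np.array([0.0, 0.1]))
mask = distance_mask_sketch(data, maxdist=1.5, grid_coordinates=grid)
# Set grid points that are too far from the data to NaN
values = np.where(mask, 1.0, np.nan)

# Conversion of 2 grid spacings (10 arc-minutes each) to meters,
# assuming 1 degree is roughly 111 km
maxdist_m = (10 / 60) * 2 * 111e3  # about 37,000 m
```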
"""
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import pyproj
import numpy as np
import verde as vd
# The Baja California bathymetry dataset has big gaps on land. We want to mask these
# gaps on a dummy grid that we'll generate over the region.
data = vd.datasets.fetch_baja_bathymetry()
region = vd.get_region((data.longitude, data.latitude))
# Generate the coordinates for a regular grid mask
spacing = 10 / 60
coordinates = vd.grid_coordinates(region, spacing=spacing)
# Generate a mask that is True for grid points within 2 grid spacings of the nearest
# data point and False for points farther away. Distance calculations in the mask are
# Cartesian only, so we can provide a projection function to convert the coordinates
# before distances are calculated (Mercator in this case). The maximum distance is then
# also Cartesian and must be converted from degrees to meters.
mask = vd.distance_mask(
(data.longitude, data.latitude),
maxdist=spacing * 2 * 111e3,
coordinates=coordinates,
"""
Bathymetry data from Baja California
====================================
We provide sample bathymetry data from Baja California to test the gridding
methods. This is the ``@tut_ship.xyz`` sample data from the GMT
tutorial. The data are downloaded to a local
directory if it's not there already.
"""
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import verde as vd
# The data are in a pandas.DataFrame
data = vd.datasets.fetch_baja_bathymetry()
print(data.head())
# Make a Mercator map of the data using Cartopy
plt.figure(figsize=(7, 6))
ax = plt.axes(projection=ccrs.Mercator())
ax.set_title("Bathymetry from Baja California")
# Plot the bathymetry as colored circles. Cartopy requires setting the projection of the
# original data through the transform argument. Use PlateCarree for geographic data.
plt.scatter(
data.longitude,
data.latitude,
c=data.bathymetry_m,
s=0.1,
transform=ccrs.PlateCarree(),
)
plt.colorbar().set_label("meters")
============================
When gridding data that has been highly oversampled in a direction (shipborne
and airborne data, for example), it is important to decimate the data before
interpolation to avoid aliasing. Class :class:`verde.BlockReduce` decimates
data by applying a reduction operation (mean, median, mode, max, etc.) to the
data in blocks. For non-smooth data, like bathymetry, a blocked median filter
is a good choice.
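
A blocked median can be sketched on synthetic, track-like data using only NumPy. `block_median` is a hypothetical helper, not the implementation behind `verde.BlockReduce`. It assumes Cartesian coordinates, and it also reduces the coordinates of each block with the median, which is one reasonable choice among several.

```python
import numpy as np

# Hypothetical helper for illustration only; not Verde's implementation.


def block_median(easting, northing, data, spacing):
    # Integer block index of every point along each axis
    i = np.floor((easting - easting.min()) / spacing).astype(int)
    j = np.floor((northing - northing.min()) / spacing).astype(int)
    # One integer label per occupied block
    keys, labels = np.unique(np.stack([i, j], axis=1), axis=0, return_inverse=True)
    labels = labels.ravel()
    n_blocks = keys.shape[0]
    # Replace everything inside a block by its median (coordinates included)
    out_e = np.array([np.median(easting[labels == k]) for k in range(n_blocks)])
    out_n = np.array([np.median(northing[labels == k]) for k in range(n_blocks)])
    out_d = np.array([np.median(data[labels == k]) for k in range(n_blocks)])
    return (out_e, out_n), out_d


# Two synthetic "ship tracks": dense along easting (0.1 apart) but 1 unit apart
# in northing, mimicking oversampling along the tracks
easting = np.tile(np.linspace(0, 9.9, 100), 2)
northing = np.repeat([0.5, 1.5], 100)
depth = -100.0 - easting  # synthetic values, not real bathymetry
(east_r, north_r), depth_r = block_median(easting, northing, depth, spacing=1.0)
print(easting.size, depth_r.size)  # prints: 200 20
```

The 200 oversampled points collapse to one value per occupied 1 x 1 block, which evens out the sampling along and across the tracks.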
"""
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import numpy as np
import verde as vd
# We'll test this on the Baja California shipborne bathymetry data
data = vd.datasets.fetch_baja_bathymetry()
# Decimate the data using a blocked median with 10 arc-minute blocks
reducer = vd.BlockReduce(reduction=np.median, spacing=10 / 60)
coordinates, bathymetry = reducer.filter(
(data.longitude, data.latitude), data.bathymetry_m
)
lon, lat = coordinates
print("Original data size:", data.bathymetry_m.size)
print("Decimated data size:", bathymetry.size)
# Make a plot of the decimated data using Cartopy
plt.figure(figsize=(7, 6))
ax = plt.axes(projection=ccrs.Mercator())
ax.set_title("10' Block Median Bathymetry")
# Plot the bathymetry as colored circles.
Data Decimation
===============
Raw spatial data are often highly oversampled in one direction. In these cases,
we need to decimate the data before interpolation to avoid aliasing effects.
"""
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import verde as vd
########################################################################################
# For example, our sample shipborne bathymetry data has a higher sampling frequency
# along the tracks than between tracks:
# Load the data as a pandas.DataFrame
data = vd.datasets.fetch_baja_bathymetry()
# Plot it using matplotlib and Cartopy
crs = ccrs.PlateCarree()
plt.figure(figsize=(7, 7))
ax = plt.axes(projection=ccrs.Mercator())
ax.set_title("Locations of bathymetry measurements from Baja California")
# Plot the bathymetry data locations as black dots
plt.plot(data.longitude, data.latitude, ".k", markersize=1, transform=crs)
vd.datasets.setup_baja_bathymetry_map(ax)
plt.tight_layout()
plt.show()
########################################################################################
# Class :class:`verde.BlockReduce` can be used to apply a reduction/aggregation
# operation (mean, median, standard deviation, etc.) to the data in regular blocks. All
# data inside each block are replaced by their aggregated value.
# *namespace*). We'll also need a few other libraries for plotting and
# projecting our data.
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import pyproj
import numpy as np
import verde as vd
###############################################################################
# The first thing to do is load a test data set with which we can work. Verde
# offers functions for loading our packaged test data in :mod:`verde.datasets`.
# In this tutorial, we'll work with some bathymetry data from Baja California.
data = vd.datasets.fetch_baja_bathymetry()
###############################################################################
# The data are stored in a pandas.DataFrame object.
print("Data is of type:", type(data))
print(data.head())
###############################################################################
# Plot the data using matplotlib and Cartopy
crs = ccrs.PlateCarree()
plt.figure(figsize=(7, 6))
ax = plt.axes(projection=ccrs.Mercator())
ax.set_title("Bathymetry data from Baja California", pad=25)
# Plot the land as a solid color
.. note::
    The :class:`~verde.Chain` class was inspired by the
    :class:`sklearn.pipeline.Pipeline` class, which doesn't serve our purposes because
    it only affects the feature matrix, not what we would call *data* (the target
    vector).
For example, let's create a pipeline to grid our sample bathymetry data.
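
The idea of a pipeline that acts on the data vector can be shown with a minimal sketch (not Verde's implementation). In this chain, each step is fit to the residuals left by the previous steps, and the chain's prediction is the sum of its steps' predictions. `RemoveMean`, `LinearTrend`, and `MiniChain` are hypothetical classes written only for this example.

```python
import numpy as np

# Hypothetical classes for illustration only; not Verde's implementation.


class RemoveMean:
    # Fit the mean of the data so it can be subtracted
    def fit(self, coordinates, data):
        self.mean_ = np.mean(data)
        return self

    def predict(self, coordinates):
        return np.full(np.shape(coordinates[0]), self.mean_)


class LinearTrend:
    # Least-squares fit of data = a + b * easting + c * northing
    def fit(self, coordinates, data):
        easting, northing = coordinates
        jacobian = np.column_stack([np.ones_like(easting), easting, northing])
        self.coef_, *_ = np.linalg.lstsq(jacobian, data, rcond=None)
        return self

    def predict(self, coordinates):
        easting, northing = coordinates
        return self.coef_[0] + self.coef_[1] * easting + self.coef_[2] * northing


class MiniChain:
    # Fit each step on the residuals left by the previous steps, so every step
    # transforms the data (target) rather than a feature matrix
    def __init__(self, steps):
        self.steps = steps

    def fit(self, coordinates, data):
        residuals = np.asarray(data, dtype=float)
        for step in self.steps:
            step.fit(coordinates, residuals)
            residuals = residuals - step.predict(coordinates)
        self.residuals_ = residuals
        return self

    def predict(self, coordinates):
        # Sum the contributions of all steps
        return sum(step.predict(coordinates) for step in self.steps)


rng = np.random.default_rng(0)
coordinates = (rng.uniform(0, 10, 50), rng.uniform(0, 10, 50))
data = 5.0 + 2.0 * coordinates[0] - 1.0 * coordinates[1]  # an exact plane
chain = MiniChain([RemoveMean(), LinearTrend()]).fit(coordinates, data)
print(np.allclose(chain.predict(coordinates), data))  # prints: True
```

The synthetic data are an exact plane, so the mean-removal step followed by the linear trend reproduces them and leaves residuals close to zero.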
"""
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import pyproj
import verde as vd
data = vd.datasets.fetch_baja_bathymetry()
region = vd.get_region((data.longitude, data.latitude))
# The desired grid spacing in degrees (converted to meters using 1 degree ~ 111 km)
spacing = 10 / 60
# Use Mercator projection because Spline is a Cartesian gridder
projection = pyproj.Proj(proj="merc", lat_ts=data.latitude.mean())
proj_coords = projection(data.longitude.values, data.latitude.values)
plt.figure(figsize=(7, 6))
ax = plt.axes(projection=ccrs.Mercator())
ax.set_title("Bathymetry from Baja California")
plt.scatter(
data.longitude,
data.latitude,
c=data.bathymetry_m,
s=0.1,
transform=ccrs.PlateCarree(),