IYSE 6420 Birdcall Distributions

Author: Anthony Miyaguchi
Last Modified: 2022-12-14

This website demonstrates the results of building birdcall distribution maps with Bayesian modeling methods. I completed this project for the ISYE 6420: Bayesian Statistics course as part of my Fall 2022 semester in Georgia Tech’s OMSCS program. See the project report and source on GitHub for more details.

We use the geographic metadata from the BirdCLEF 2022 competition dataset to map where birdcall recordings were made. We then fit a Poisson generalized linear model (GLM) to the recording counts to estimate covariate or random effects.
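
For orientation, a Poisson GLM with a log link has the generic form below. This is a textbook sketch, not the exact specification used in the project: the symbols are assumptions made for illustration, with y_i the recording count in grid cell i, x_i that cell's covariates, and phi_i an optional cell-level random effect. The project report defines the actual covariates and priors.

```latex
\begin{aligned}
y_i \mid \lambda_i &\sim \mathrm{Poisson}(\lambda_i), \\
\log \lambda_i &= \beta_0 + \mathbf{x}_i^{\top} \boldsymbol{\beta} + \phi_i .
\end{aligned}
```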

Plots

We split each region into a grid (a regular lattice) and summarize the birdcall recording observations within each grid cell. We define the grid size in degrees of latitude and longitude. Working with discrete cells makes it practical to fit a Bayesian model to the data and lets us incorporate external geographical information derived from Google Earth Engine. The cells are large enough to keep the model computationally tractable but small enough to capture the spatial variation in the data. See the Earth Engine Plots page for more information about the data we use from Google Earth Engine.
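
The gridding step can be sketched in a few lines of Python. This is a minimal illustration rather than the project's own code: the function names are hypothetical, and the cell naming (longitude and latitude of the cell's lower-left corner joined by an underscore) is an assumption based on the name column in the data sample shown in the Data section.

```python
import math
from collections import Counter


def cell_name(longitude, latitude, grid_size=1):
    """Name the grid cell containing a point by its lower-left corner.

    Assumed naming convention: "<lon>_<lat>", e.g. "-125_39".
    """
    lon = math.floor(longitude / grid_size) * grid_size
    lat = math.floor(latitude / grid_size) * grid_size
    return f"{lon}_{lat}"


def count_by_cell(points, grid_size=1):
    """Count (longitude, latitude) observations per grid cell."""
    return Counter(cell_name(lon, lat, grid_size) for lon, lat in points)


# Made-up recording locations, for illustration only.
recordings = [(-124.3, 39.7), (-124.9, 39.1), (-122.4, 37.8)]
counts = count_by_cell(recordings)
print(counts)
```

The first two points share a one-degree cell, so they are summed into a single count, which is the quantity a Poisson model takes as its response.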

The posterior predictive plot shows a point estimate of the number of observations in each grid cell, derived from the posterior distribution of the model parameters.
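
As a rough sketch of how such a point estimate can be computed: under a Poisson likelihood with a log link, the expected count for a cell is its rate, so averaging the rate over posterior draws gives the posterior predictive mean. The single-covariate model, the function name, and the draws below are all assumptions made for illustration, not the project's fitted model.

```python
import math
from statistics import fmean


def posterior_predictive_mean(posterior_samples, x):
    """Average the Poisson rate exp(b0 + b1 * x) over posterior draws.

    posterior_samples: iterable of (intercept, slope) pairs.
    x: covariate value for the grid cell of interest.
    """
    rates = [math.exp(b0 + b1 * x) for b0, b1 in posterior_samples]
    return fmean(rates)


# Made-up posterior draws of (intercept, slope), for illustration only.
draws = [(0.9, 0.10), (1.1, 0.12), (1.0, 0.08)]
estimate = posterior_predictive_mean(draws, x=2.0)
print(f"expected recordings in the cell: {estimate:.2f}")
```

In practice the draws would come from a sampler, and the same averaging would be repeated for every grid cell to produce the map.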

Data

Data for this project can be found in the gs://iyse6420-birdcall-distribution bucket. Here are the direct links to source data:

You can load this data directly into a Python session using pandas and pyarrow:

>>> import pandas as pd
>>> df = pd.read_parquet("https://storage.googleapis.com/iyse6420-birdcall-distribution/ee_v3_ca_1.parquet")
>>> df.head()
      name region  grid_size  population_density  ...  land_cover_14  land_cover_15  land_cover_16  land_cover_17
0  -125_39     ca          1            7.639377  ...              0              0              0            117
1  -125_40     ca          1       172360.641191  ...              0              0              2           1032
2  -125_41     ca          1        48910.677999  ...              0              0              0           1386
3  -125_42     ca          1        39462.664024  ...              0              0              0           1436
4  -124_38     ca          1        32485.186359  ...              0              0              3           1348

[5 rows x 30 columns]
>>> df.shape
(68, 30)