Sharks are an ancient group of cartilaginous fishes in the class Chondrichthyes, subclass Elasmobranchii, and subdivision Selachimorpha. The selachians include over 500 species that are further placed in in two superorders: Squalomorphii and Galeomorphii. Modern sharks originated nearly 200 million years ago (Grogan, Lund, and Greenfest-Allen (2012)) and, since then, have diversified into nearly every major marine ecosystem. This evolutionary success has been attributed to the diversity of their locomotor modes and associated body shape (Lauder and Di Santo (2015),Thomson and Simanek (1977)). A modest body of work devoted to the ecomorphology of sharks has focused largely on caudal find shape as it relates to ecology and locomotor modes (Thomson (1976),Wilga and Lauder (2004)). Sharks have a heterocercal tail in which the caudal part of notochord flexes dorsally and supports a large dorsal lobe of the fin, while, with more than a few exceptions, the ventral lobe is small.
In there seminal work on shark ecomorphology, Thomson and Simanek (1977) put forth that there were four shark morphotypes:
A white shark, Carcharias carchardon, typical of Group 1. |
A bull shark, Carcharhinus leucas, typical of Group 2. |
|
A dogfish of the genus Squalus typical of Group 3. |
A kitefin shark, Dalatias licha, typical of Group 4. |
Sternes and Shimada (2020) used geometric morphometrics and a multivariate analysis to evaluate the groups proposed by Thomson and Simanek (1977) and found that sharks fell into one of two morphotypes (A and B). Their Group A was characterized by an elongate body, posteriorly placed dorsal fin, and a swept caudal while Group B was characterized by a deeper body, anteriorly place dorsal fins, and a less swept caudal fin. Thomson and Simanek (1977) suggested that the shape of species in these two broad ecomorphs evolved due to differences in ecological niche and locomotor behavior. Specifically, they posited that the species of elongate Group A are typically benthic and swim like eels whereas the deeper bodied species of Group B are often pelagic and swim like salmon or tunas.
Although the work of Thomson and Simanek (1977) and later Sternes and Shimada (2020) points to interesting patterns in shark body-shape, neither study explicitly studied the role habitat—that is, pelagic or benthic—had in influencing the evolution body-shape evolution. In particular, Sternes and Shimada (2020) merely observed that their two groups with different body shapes often varied in their habitat preference. In addition, Sternes and Shimada (2020) made no explicit, quantitative phylogenetic test of their patterns. This leaves open an important question: Do pelagic and benthic sharks have significantly different body shape? Subsequently, we might also ask, Has the morphology of pelagic and benthic sharks evolveed at the same rate? These are the questions we will take on in this project.
To answer such a question requires that we essentially recapitulate the Sternes and Shimada (2020) body-shape data set using landmark-based geometric morphometrics. This is a relatively straightforward, if tedious, method of capturing shape variation in a group of interest. For this, you’ll digitize 14 homologous landmarks on images of some 270 species of sharks. Using the best practices in the field of geometric morphometrics to align our landlandmakrs and extract shape information, we’ll assess morphological disparity in a phylogenetic frame work between pelagic and benthic sharks.
A few decades ago, a morphometric project like this would require visiting museums collections from around the world to take images of the specimens and species of interest. In fact, this was a major part of Prof. Kenaley’s PhD thesis, which included trips to natural history collections in France, Denmark, Great Britain, Taiwan, Japan, Monaco, New Zealand, and Australia. Ahhhhhh, the good ol’ days. Fortunately (or unfortunately), travel to such wonderful places may be avoided several resources exist that contain a trove of images of fish-like vertebrates.
We’ll make use of shark images in the FishBase catalog: a clearing house
of all things fish that has amassed tons of data from the scientific
literature. Prof. Kenaley used the rfishbase
package to rip
images of scientific drawings of sharks. This resulted in a cache of
images for the ~200 species included in a recent phylogeny of the
elasmobranchs (Stein et al. (2018)).
Because we’ll be comparing body shape in sharks, this project is by definition a comparative one and we’ll need a phylogeny for the group. Fortunately a relatviely recent phylogenomic analysis of over 500 species of elasmobranchs (sharks + batoids) has been published by Stein et al. (2018). Unfortunately, there is some phylogenetic discordance in this study. Specifically, when researchers analyze phylogenetic relationships, they typically repeat their analysis in a process known as bootstrapping. In phylogenetics, bootstrapping is conducted using the columns of the character matrix. Each pseudoreplicate contains the same number of species (rows) and characters (columns) randomly sampled from the original matrix, with replacement. A phylogeny is reconstructed from each pseudoreplicate, with the same methods used to reconstruct the phylogeny from the original data. This often results in a data set of trees that have conflicting relationships—that is, phylogenetic uncertainty—and this was the case for the Stein et al. (2018) study. We’ll have to account for this in our analysis. But, more later on . . .
In any case, we’ve used the taxa studied in this paper to find images in FishBase and we’ll use a tree file representing phylogeny for our comparative analysis.
As we learned in class and what you’ve been reading, researchers interested in comparing shape change between species often use landmark-based geometric morphometrics. A geometric morphometric study such as ours generally follow these steps:
Our shape analysis will be based on 14 landmarks placed on images of shark bodies of our ~200 species. To landmarks on the body, we’ll follow this generalize workflow below in the image analysis program FIJI (FIJI is just imageJ).
You and your team should will be assigned about 20 of the ~200 species to outline. Images of these species can be accessed through this directory. First choose one of your species to work on. For each species, start with what you think is the best image, that is, it has a scale bar, the wings are not damaged, and the hindwing is most exposed. Open that best image in Fiji.
After opening the photo, follow the steps below
Open a image.
Select the multipoint tool.
Digitize the 14 landmarks you see above.
Measure the XY positions of these points
(cmd/ctrl+m
).
Select the macro tab and run the macro
(cmd/ctrl+r
).
Save the results to the appropriate directory.
Close the results and image tab (don’t save the image).
Note: For the first image you outline, you will have to open the macro editor (“Plugins” \(\rightarrow\) “New” \(\Macro\)). Paste the script command from below into the editor and save this new macro to an appropriate directory, perhaps your cloned repo. You’ll want to access this later (“Plugins” \(\rightarrow\) “Edit”) so that you automate this process.
f = getTitle();
dir=getDirectory("Choose where to save data ");
selectWindow("Results");
saveAs("Results", dir+f+"_.csv");
This script saves the results as a tab-separated text file with filname as the image file name. This is super important!! and how we’ll link the landmarks to a species.
After your team has completed landmarking the assigned species, upload your text files to this directory. Once all the data appear here, Prof. Kenaley will evaluate the quality of the landmarks and do some post-processing if required (i.e., file-name checks, quality, etc.). You can then download these evaluated data for shape analysis.
After digitizing landmarks, we must remove the distracting variations of size, orientation, and position that can vary between specimens and thus add noisey variables in analysis of shape. This can be mitigated by using a superimposition method. Generalized procrustes analysis (GPA) is the superimposition method of choice nowadays. GPA removes the variation of size, orientation, and position by superimposing the landmarks in a common coordinate system. The landmarks for all specimens are optimally translated, rotated, and scaled based on a least-squared estimation. The first step is translation and rotation to minimize the squared and summed differences (squared Procrustes distance) between landmarks on each specimen. Then the landmarks are individually scaled to the same units of centroid size. Centroid size is the square root of the sum of squared distances of the landmarks in configuration to their mean location. The translation, rotation, and scaling bring the landmark configurations for all specimens into a common coordinate system so that the only differing variables are based on shape alone. The new superimposed landmarks can now be analyzed in multivariate statistical analyses
To perform a Procrustes superimposition on our landmark data—along
with many other analyses—we will turn the the geomorph
package. Please install and load geomorph
. But, before we
do, let’s download and read in about 90 or so landmark files that
Prof. Kenaley produceds ahead of this project. You can find these
example files here. You’ll
have to download this compressed directory and unzip it to your project
directory. You’ll also have to download and install the
abind
package as well.
To get started with superimposition, let’s load the packages
geomorph
and abind
. When you unzip the
shark_points.zip
directory to you project directory, you
have a folder named shark_points
. Our first operation will
be to list the files in that directory and then read them in, all at
once.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.1 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
shark_dat <- read_csv("shark_data.csv")
## Rows: 89 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): sp, file_name, habitat
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(geomorph)
## Loading required package: RRPP
## Loading required package: rgl
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
library(abind)
f <- list.files("shark_points",full.names = T,pattern=".csv")
xy <- read_csv(f,id="file") %>%
select(file,X,Y) %>%
mutate(file_name=gsub("*_.csv","",basename(file))) %>%
left_join(shark_dat)
## New names:
## • `` -> `...1`
## Rows: 1246 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (14): ...1, Area, Mean, Mode, X, Y, XM, YM, Perim., Circ., Slice, AR, Ro...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Joining with `by = join_by(file_name)`
Here we used list.files()
with
pattern=pattern=".csv"
to find all the files in the
shark_points
directory. We also set
full.names = T
so that the full file path is saved to the
object f
.
head(f)
## [1] "shark_points/Acnig_u0.gif_.csv" "shark_points/Alpel_u0.gif_.csv"
## [3] "shark_points/Alsup_u0.gif_.csv" "shark_points/Alvul_u0.gif_.csv"
## [5] "shark_points/Apbru_u1.gif_.csv" "shark_points/Aplau_u0.gif_.csv"
This made reading all the files and adding them to one tibble in a
single operation possible with read_csv()
. Notice we added
a file
column to the tibble with id="file"
and
then selected only the columns file
, X
,
Y
. This operation is saved to an object logically named
xy
.
For landmarks to used in geomorph
, they must be stored
as a 3-dimensional array, that is, essentially a list of 2D data.frames,
where the position the \(i^{th}\)
matrix is in the \(i^{th}\) index of
the third dimension, i.e., array[,,i]
. To massage our large
tibble into such an array, we have to split the tibble into a bunch of
lists using dplyr
’s group_split()
function.
Notice how we use a gsub()
operation to create a new column file_name
that removes
“.csv” from the file name data. Splitting by this, will leave used with
a list of lists as long as all our files. We then use
abind
’s abind()
function to bind the list of
data.frames in ldk_l
into an array, with each data.frame
taking a position in the third dimension. This array is saved in the
object ldk
. Lastly, we’ll specify that the names of this
third dimension should be the file_name
minus the “.csv”
extension. Note: this is how we’ll retrieve species
information.
ldk_l <- list()
shark_sp <- xy$sp %>% unique
for(i in shark_sp){
ldk_l[[i]] <- xy %>%
filter(sp==i) %>%
select(X,Y)
data.frame
}
ldk <- abind(ldk_l, along=3)
dimnames(ldk)[[3]] <-names(ldk_l)
ldk[,,1:3] #the first few
## , , Aculeola nigra
##
## X Y
## [1,] 39.5 185.5
## [2,] 260.0 141.5
## [3,] 292.0 136.5
## [4,] 293.5 160.0
## [5,] 464.0 172.5
## [6,] 596.0 162.5
## [7,] 522.0 222.0
## [8,] 454.5 194.5
## [9,] 380.0 198.5
## [10,] 341.0 214.5
## [11,] 209.0 202.5
## [12,] 180.0 194.5
## [13,] 100.5 202.5
## [14,] 74.5 209.0
##
## , , Alopias pelagicus
##
## X Y
## [1,] 38.0 166.000
## [2,] 162.5 141.500
## [3,] 186.5 101.500
## [4,] 192.5 138.500
## [5,] 295.5 152.500
## [6,] 606.5 94.500
## [7,] 321.5 206.500
## [8,] 300.5 181.500
## [9,] 249.5 183.500
## [10,] 228.5 187.500
## [11,] 135.5 185.500
## [12,] 103.5 188.500
## [13,] 69.0 179.833
## [14,] 59.5 180.333
##
## , , Alopias superciliosus
##
## X Y
## [1,] 38.5 173.5
## [2,] 221.5 135.0
## [3,] 257.0 89.5
## [4,] 261.5 136.0
## [5,] 364.0 152.0
## [6,] 604.5 71.5
## [7,] 389.5 205.5
## [8,] 363.0 177.5
## [9,] 310.0 184.0
## [10,] 282.5 197.0
## [11,] 168.5 194.0
## [12,] 138.5 196.5
## [13,] 82.5 190.5
## [14,] 67.5 189.0
With a proper array in place as ldk
, we’re now free to
move on the superimposition with the geomorph
function
gpagen()
. We’ll do this saving the output to
ldk_al
(for landmark alignment) and plot the the 14 aligned
and superimposed landmarks for all the example specimens.
ldk_al <- gpagen(ldk)
##
## Performing GPA
## | | | 0% | |================== | 25% | |=================================== | 50% | |==================================================== | 75% | |======================================================================| 100%
##
## Making projections... Finished!
plot(ldk_al)
In geometric morphometric analysis, researchers first turn to describing the axes of shape variation.Indeed, this is what Sternes and Shimada (2020) did to produce the figure above. This and what we’re after is often termed a morphospace, a representation of shape the study group does and does not have.
To develop such a thing, biologist usually turn to Principal
Components Analysis (PCA) whereby the very complicated suite of
variables is distilled into a set of fewer variables that describe the
important variance among the original variables. The coordinates
contained in our ldk_al
object describe dozens of
variables: scaled distances between all the landmarks. So we have many
variable to consider and this becomes, very quickly, a serious multivariate
problem. PCA is suited to multivariate analysis because it reduces
the complexity of the data, taking many variables and distilling the
into just a few. At its core, PCA produces a series of regressions, or
vectors, that pass through the values of the original variables. This
operation is performed iteratively, with the first vector (the first
principal component, PC1), drawn through the variable space that
accounts for the most variance in the data set. After the combination of
variables forming PC1 is removed, a subsequent vector (PC2) is drawn
through another unique variable space that describes the second most
amount of variance. This continues until most of the variance is
captured, resulting in scores of PCs. The first two or three, however,
are usually enough to describe a health amount of variance, say 90% or
more.
To perform PCA, we’ll use geomorph
s
gm.prcomp()
function like so. Plotting reveals our first
result.
pca <- gm.prcomp(ldk_al$coords)
plot(pca)
Wow, now we’re getting close to the Sternes and Shimada (2020) results and answering, at least in part, one of our questions: Does habitat result in different body shapes. But let’s break down what this plot means first. By default, when we plot a PCA object, the first two PCs are plotted, PC1 on the x axis, PC2 on the y. Each point represents the PC scores for a specimen in the data set. Notice that the amount of variance explained by our PCs is given and, as expected, PC1 contains most of the variance (54%) and PC2 considerably less less (19%). If we plot again, but specify a different combination of PCs, say 1 and 3, we get a different plot, of course. Notice the variance explained by PC3 is predictably less than that of PC2.
plot(pca,1,3)
OK, so based on our PC plots, it looks as if we have two groupings, just like Sternes and Shimada (2020). But, do these groups correspond to largely pelagic vs. benthic species? It’s hard to tell. We don’t really know because we don’t know which species belong to each point in our plots, nor their habitat.
Notice, however, that the names of the coordinates in
ldk
were pegged in a previous operation to the image file
names.
head(dimnames(ldk)[[3]])
## [1] "Aculeola nigra" "Alopias pelagicus" "Alopias superciliosus"
## [4] "Alopias vulpinus" "Apristurus brunneus" "Apristurus laurussonii"
These values were maintained throughout the superimpositon step and
in the PCA operation. The coordinates plotted in the in PCA plots are
contained in a matrix named x
in our pca
object. Notice the row names:
pca$x %>% data.frame %>% select(Comp1,Comp2) %>% head
## Comp1 Comp2
## Aculeola nigra 0.05844921 0.03217813
## Alopias pelagicus 0.07242735 -0.31245808
## Alopias superciliosus 0.01513342 -0.17090354
## Alopias vulpinus 0.07868014 -0.31153641
## Apristurus brunneus -0.13808874 0.00307793
## Apristurus laurussonii -0.13393379 -0.04949571
So, we have the image file names from whence the data, both land mark and PCA, came from. Fortunately, professor Kenaley assembled a table containin of all the the file names used for landmarking, and the species and habitat for each. You can download this file and read in the data.
shark_dat <- read_csv("shark_data.csv")
## Rows: 89 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): sp, file_name, habitat
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
What’s left to do in exploring if benthic and pelagic sharks occupy
different morphospace is to combine this information with the PC scores
in pca$x
. This can be done thusly:
PCA <- pca$x %>%
data.frame %>%
select(Comp1:Comp4) %>%
mutate(sp=rownames(pca$x)) %>%
left_join(shark_dat)
## Joining with `by = join_by(sp)`
head(PCA)
## Comp1 Comp2 Comp3 Comp4 sp
## 1 0.05844921 0.03217813 0.03313839 -0.018461267 Aculeola nigra
## 2 0.07242735 -0.31245808 0.12037744 -0.025605227 Alopias pelagicus
## 3 0.01513342 -0.17090354 0.13645281 0.014632816 Alopias superciliosus
## 4 0.07868014 -0.31153641 0.08396053 -0.006664429 Alopias vulpinus
## 5 -0.13808874 0.00307793 0.21400019 -0.049243871 Apristurus brunneus
## 6 -0.13393379 -0.04949571 0.01343577 0.007815099 Apristurus laurussonii
## file_name habitat
## 1 Acnig_u0.gif benthic
## 2 Alpel_u0.gif pelagic
## 3 Alsup_u0.gif pelagic
## 4 Alvul_u0.gif pelagic
## 5 Apbru_u1.gif benthic
## 6 Aplau_u0.gif benthic
With that over with, we can dispence with the ugly PCA plots above
and use ggplot
to visualize our results with repsect to
habitat.
PCA %>%
ggplot(aes(Comp1,Comp2,col=habitat))+geom_point()
PCA %>%
ggplot(aes(Comp1,Comp3,col=habitat))+geom_point()
Voila! For morphospace described by PCs 1, 2, and 3, it appeears that pelagic species inhabit a high PC1 score and never a low PC1 score.
This is wonderful, but you may be asking, “What is PC1 (or any other) describing?” Our PCs points are abstractions of shape, but they don’t represent explicitly what shape that is. To get a sense of what our PCs represent in terms of shape values, we can turn to thin-plate spline analysis.
max_PC1 <- which.max(pca$x[,1])
min_PC1 <- which.min(pca$x[,1])
max_PC2 <- which.max(pca$x[,2])
min_PC2 <- which.min(pca$x[,2])
M <- ldk_al$consensus
par(mfrow=c(2,2))
plotRefToTarget(M, ldk_al$coords[,,min_PC1], mag = 1,method="vector",label = T)
title("min PC1 warp",cex=0.6)
plotRefToTarget(M, ldk_al$coords[,,max_PC1], mag = 1,method="vector",label = T)
title("max PC1 warp",cex=0.6)
plotRefToTarget(M, ldk_al$coords[,,min_PC2], mag = 1,method="vector",label = T)
title("min PC2 warp",cex=0.6)
plotRefToTarget(M, ldk_al$coords[,,max_PC2], mag = 1,method="vector",label = T)
title("max PC3 warp",cex=0.6)
Morphological disparity
library(ape)
library(phytools)
phy <- readNexus("data/shark_trees.nex")
phy2 <- keep.tip(phy,PCA$sp)
gdf <- geomorph.data.frame(ldk_al,
species = PCA$sp,
habitat=PCA$habitat
)
md <- morphol.disparity(coords ~ 1, groups= ~ habitat, data = gdf, iter = 10000, print.progress = FALSE)
summary(md)
gdf <- geomorph.data.frame(ldk_al,
species = PCA$sp,
habitat=PCA$habitat,
phy=phy2[[1]]
)
setdiff(phy2[[1]]$tip.label,PCA$sp)
shark_pgls <- procD.pgls(coords ~ habitat, phy = phy, data = gdf,
iter = 999)
Evolutionary rates
–>
–>
–>
–> –>
–>
–>
–> –> –> –> –> –> –>
–>
–>
–>
–> –> –>
–>
–> –>
–> –> –>
–> –> –> –>
–>
–> –>
–> –> –>
–>
–> –>
–>
–> –> –>
–> –>
–>
–> –> –>
–>
–>
–> –> –> –>
–> –>
–> –>
–> –> –> –>
–> –>
–> –>
–>
–> –>
–>
–>
–>
–> –> –> –> –> –> –> –> –>
–>
–>
–>
–>
–>
–>
–> –>
–>
–> –>
–>
–>
–>
–> –> –> –>
–>
–> –>
–> –> –> –>
–>
–> –>
–> –> –> –>
–>
–> –>
–>
–> –>
–> –> –> –> –> –>
–> –> –> –> –> –>
–> –> –>
–> –> –>
–>
–> –> –> –> –>
–> –>
–> –> –> –>
–>
–>
–>
–> –>
–>
–> –>
–>
–> –> –> –> –>
–> –> –> –> –>
–>
–> –> –> –> –>
–> –>
–> –>
–> –> –> –>
–> –> –>
–> –> –> –>
–> –> –>
–> –> –> –> –> –> –> –> –>
–> –> –>
–>
–> –> –> –> –>
–> –> –>
–> –> –> –> –>
–>
–> –> –> –> –>
–> –> –> –> –> –>
–> –> –>
–> –> –> –> –> –>
–> –> –>
–>
–> –>
–> –> –>
–> –>
–> –> –>
–> –>
–> –>
–>
–> –> –> –>
–> –> –>
–> –> –> –>
–> –>
–>
–>
–> –> –> –>
–> –> –> –>
–> –>
–>
–>
–> –> –> –>
–> –> –>
–> –> –> –>
–> –> –>
–>
–> –>
–>
–> –> –> –> –>
–> –>
–>
–> –> –> –> –>
–> –>
–> –>
–> –> –> –> –>
–>
–> –>
–> –>
–> –>
–> –>
–> –> –> –>
–>
–> –> –> –>
–>
–> –> –> –> –>
–> –>
–> –> –>
–>
–> –> –> –>
–>
–> –>
–> –> –> –> –> –> –> –>
–> –> –> –> –> –> –>
–> –> –>
–> –>
–>
–>
–>
–> –>
–> –> –>
–> –> –> –>
–> –> –>
–> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –> –>
–>
–>
–> –>
–>
–>
–> –> –> –> –>
–>
–> –> –>
–>
–>
–> –>
–>
–> –> –>
–>