Title: | Functions for Clustering and Testing of Presence-Absence, Abundance and Multilocus Genetic Data |
Version: | 2.3-4 |
Date: | 2024-09-24 |
Depends: | R (≥ 2.10), MASS, mclust |
Suggests: | spdep, spatialreg, bootstrap, foreign, mvtnorm |
Description: | Distance-based parametric bootstrap tests for clustering with spatial neighborhood information. Some distance measures, clustering of presence-absence, abundance and multilocus genetic data for species delimitation, nearest neighbor based noise detection. Genetic distances between communities. Tests whether various distance-based regressions are equal. Try package?prabclus for an overview. |
License: | GPL-2 | GPL-3 [expanded from: GPL] |
URL: | https://www.unibo.it/sitoweb/christian.hennig/en/ |
NeedsCompilation: | no |
Packaged: | 2024-09-24 10:06:15 UTC; chrish |
Author: | Christian Hennig [aut, cre], Bernhard Hausdorf [aut] |
Maintainer: | Christian Hennig <christian.hennig@unibo.it> |
Repository: | CRAN |
Date/Publication: | 2024-09-24 11:10:02 UTC |
prabclus package overview
Description
Here is a list of the main functions in package prabclus. Most other functions are auxiliary functions for these.
Initialisation
- prabinit
Initialises presence/absence-, abundance- and multilocus data with dominant markers for use with most other key prabclus-functions.
- alleleinit
Initialises multilocus data with codominant markers for use with key prabclus-functions.
- alleleconvert
Generates the input format required by alleleinit.
Tests for clustering and nestedness
- prabtest
Computes the tests introduced in Hausdorf and Hennig (2003) and Hennig and Hausdorf (2004; these tests occur in some further publications of ours but this one is the most detailed statistical reference) for presence/absence data. Allows use of the geco-dissimilarity (Hennig and Hausdorf, 2006).
- abundtest
Computes the test introduced in Hausdorf and Hennig (2007) for abundance data.
- homogen.test
A classical distance-based test for homogeneity going back to Erdos and Renyi (1960) and Ling (1973).
Clustering
- prabclust
Species clustering for biotic element analysis (Hausdorf and Hennig, 2007, Hennig and Hausdorf, 2004 and others), clustering of individuals for species delimitation (Hausdorf and Hennig, 2010) based on Gaussian mixture model clustering with noise as implemented in R-package mclust, Fraley and Raftery (1998), on output of multidimensional scaling from distances as computed by prabinit or alleleinit. See also stressvals for help with choosing the number of MDS-dimensions.
- hprabclust
An unpublished alternative to prabclust using hierarchical clustering methods.
- lociplots
Visualisation of clusters of genetic markers vs. clusters of species.
- NNclean
Nearest neighbor based classification of observations as noise/outliers according to Byers and Raftery (1998).
Dissimilarity matrices
- alleledist
Shared allele distance (see the corresponding help pages for references).
- dicedist
Dice distance.
- geco
geco coefficient, taking geographical distance into account.
- jaccard
Jaccard distance.
- kulczynski
Kulczynski dissimilarity.
- qkulczynski
Quantitative Kulczynski dissimilarity for abundance data.
Communities
- communities
Constructs communities from geographical distances between individuals.
- communitydist
chord-, phiPT- and various versions of the shared allele distance between communities.
Tests for equality of dissimilarity-based regression
- regeqdist
Jackknife-based test for equality of two independent regressions between distances (Hausdorf and Hennig 2019).
- regdistbetween
Jackknife-based test for equality of regression involving all distances and regression involving within-group distances only (Hausdorf and Hennig 2019).
- regdistbetweenone
Jackknife-based test for equality of regression involving within-group distances of a reference group only and regression involving between-group distances (Hausdorf and Hennig 2019).
Small conversion functions
- coord2dist
Computes geographical distances from geographical coordinates.
- geo2neighbor
Computes a neighborhood list from geographical distances.
- alleleconvert
A somewhat restricted function for conversion of different file formats used for genetic data with codominant markers.
Data sets
kykladspecreg, siskiyou, veronica, tetragonula.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en/
References
Byers, S. and Raftery, A. E. (1998) Nearest-Neighbor Clutter Removal for Estimating Features in Spatial Point Processes, Journal of the American Statistical Association, 93, 577-584.
Erdos, P. and Renyi, A. (1960) On the evolution of random graphs. Publications of the Mathematical Institute of the Hungarian Academy of Sciences 5, 17-61.
Fraley, C. and Raftery, A. E. (1998) How many clusters? Which clustering method? - Answers via Model-Based Cluster Analysis. Computer Journal 41, 578-588.
Hausdorf, B. and Hennig, C. (2003) Nestedness of north-west European land snail ranges as a consequence of differential immigration from Pleistocene glacial refuges. Oecologia 135, 102-109.
Hausdorf, B. and Hennig, C. (2007) Null model tests of clustering of species, negative co-occurrence patterns and nestedness in meta-communities. Oikos 116, 818-828.
Hausdorf, B. and Hennig, C. (2010) Species Delimitation Using Dominant and Codominant Multilocus Markers. Systematic Biology, 59, 491-503.
Hausdorf, B. and Hennig, C. (2019) Species delimitation and geography. Submitted.
Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896.
Hennig, C. and Hausdorf, B. (2006) A robust distance coefficient between distribution areas incorporating geographic distances. Systematic Biology 55, 170-175.
Ling, R. F. (1973) A probability theory of cluster analysis. Journal of the American Statistical Association 68, 159-164.
Parametric bootstrap test for clustering in abundance matrices
Description
Parametric bootstrap test of a null model of i.i.d., but spatially
autocorrelated species against clustering of the species' population
patterns. Note that most relevant functionality of prabtest (except for
the use of the geco distance) is also included in abundtest, so that
abundtest can also be used on binary presence-absence data.
Despite the large number of parameters, a standard execution (for the
default test statistics, see parameter teststat below) will be
prabmatrix <- prabinit(file="path/abundmatrixfile",
neighborhood="path/neighborhoodfile")
test <- abundtest(prabmatrix)
summary(test)
Note: Data formats are described on the prabinit help page. You may also
consider the example datasets kykladspecreg.dat and nb.dat. Take care of
the parameter rows.are.species of prabinit.
Usage
abundtest(prabobj, teststat = "distratio", tuning = 0.25,
times = 1000, p.nb = NULL,
prange = c(0, 1), nperp = 4, step = 0.1, step2 = 0.01,
twostep = TRUE, species.fixed=TRUE, prab01=NULL,
groupvector=NULL,
sarestimate=prab.sarestimate(prabobj),
dist = prabobj$distance,
n.species = prabobj$n.species)
Arguments
prabobj |
an object of class |
teststat |
string, indicating the test statistics. |
tuning |
integer or (if |
times |
integer. Number of simulation runs. |
p.nb |
numerical between 0 and 1. The probability that a new
region is drawn from the non-neighborhood of the previous regions
belonging to a species under generation. If |
prange |
numerical range vector, lower value not smaller than 0, larger
value not larger than 1. Range where |
nperp |
integer. Number of simulations per |
step |
numerical between 0 and 1. Interval length between
subsequent choices of |
step2 |
numerical between 0 and 1. Interval length between
subsequent choices of |
twostep |
logical. If |
species.fixed |
logical. Indicates if the range sizes of the species
are held fixed
in the test simulation ( |
prab01 |
|
groupvector |
integer vector. For every species, a number
indicating the species' group membership. Needed only if
|
sarestimate |
Estimator of the parameters of a simultaneous
autoregression model corresponding to the null model for abundance
data from Hausdorf and Hennig (2007) as generated by
|
dist |
One of |
n.species |
number of species. By default this is taken from
|
Details
For presence-absence data, the routine is described in prabtest. For
abundance data, the first step under the null model is to simulate
presence-absence patterns as in prabtest. The second step is to fit a
simultaneous autoregression (SAR) model (Ripley 1981, section 5.2) to
the log-abundances, see prab.sarestimate. The simulation from the null
model is implemented in regpop.sar.
For more details see Hennig and Hausdorf (2004) for presence-absence
data and Hausdorf and Hennig (2007) for abundance data and the test
statistics "mean" and "groups", which can also be applied to binary data.
If p.nb=NA was specified, a diagnostic plot for the estimation of pd is
plotted by autoconst. For details see Hennig and Hausdorf (2004) and the
help pages of the cited functions.
Value
An object of class prabtest, which is a list with components
results |
vector of test statistic values for all simulated
populations. For |
p.above |
p-value against an alternative that generates large
values of the test statistic (usually reasonable for
|
p.below |
p-value against an alternative that generates small
values of the test statistic (usually reasonable for
|
datac |
test statistic value for the original
data. ( |
tuning |
see above. |
distance |
|
teststat |
see above. |
pd |
|
abund |
|
sarlambda |
Estimator of the autocorrelation
parameter |
sarestimate |
the output object of |
groupinfo |
list containing information from
|
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Hausdorf, B. and Hennig, C. (2007) Null model tests of clustering of species, negative co-occurrence patterns and nestedness in meta-communities. Oikos 116, 818-828.
Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896. http://stat.ethz.ch/Research-Reports/110.html.
Ripley, B. D. (1981) Spatial Statistics. Wiley.
See Also
prabinit generates objects of class prab.
autoconst estimates pd from such objects.
prabtest (analogous function for presence-absence data).
regpop.sar generates populations from the null model.
prab.sarestimate (parameter estimators for simultaneous autoregression
model). This calls errorsarlm (original estimation function from
package spdep).
Some more information on the test statistics is given in homogen.test,
lcomponent, distratio, nn, incmatrix.
Summary and print methods: summary.prabtest.
Examples
# Note: NOT RUN.
# This needs package spdep and a bunch of packages that are
# called by spdep!
# data(siskiyou)
# set.seed(1234)
# x <- prabinit(prabmatrix=siskiyou, neighborhood=siskiyou.nb,
# distance="logkulczynski")
# a1 <- abundtest(x, times=5, p.nb=0.0465)
# a2 <- abundtest(x, times=5, p.nb=0.0465, teststat="groups",
# groupvector=siskiyou.groups)
# These settings are chosen to make the example execution
# faster; usually you will use abundtest(x).
# summary(a1)
# summary(a2)
Converts alleleobject into binary matrix
Description
Converts alleleobject with codominant markers into a binary matrix with
a column for each marker.
Usage
allele2zeroone(alleleobject)
Arguments
alleleobject |
object of class |
Value
A 0-1-matrix with individuals as rows and markers (alleles) as columns.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
Examples
data(tetragonula)
ta <- alleleconvert(strmatrix=tetragonula[21:50,])
tai <- alleleinit(allelematrix=ta)
allele2zeroone(tai)
Format conversion for codominant marker data
Description
Codominant marker data (which here means: data with several diploid
loci; two alleles per locus) can be represented in various ways. This
function converts the formats "genepop" and "structure" into
"structurama" and "prabclus". "genepop" is a version of the format used
by the package GENEPOP (Rousset, 2008), "structure" is a version of what
is used by STRUCTURE (Pritchard et al., 2000), another one is
"structureb". "structurama" is a version of what is used by STRUCTURAMA
(Huelsenbeck and Andolfatto, 2007) and "prabclus" is required by the
function alleleinit in the present package.
Usage
alleleconvert(file=NULL,strmatrix=NULL, format.in="genepop",
format.out="prabclus",
alength=3,orig.nachar="000",new.nachar="-",
rows.are.individuals=TRUE, firstcolname=FALSE,
aletters=intToUtf8(c(65:90,97:122),multiple=TRUE),
outfile=NULL,skip=0)
Arguments
file |
string. Filename of input file, see details. One of
|
strmatrix |
matrix or data frame of strings, see details. One of
|
format.in |
string. One of |
format.out |
string. One of |
alength |
integer. If |
orig.nachar |
string. Code for missing values in input data. |
new.nachar |
string. Code for missing values in output data. |
rows.are.individuals |
logical. If |
firstcolname |
logical. If |
aletters |
character vector. String of default characters for
alleles if |
outfile |
string. If specified, the output matrix (omitting
quotes) is written to a file of this name (including row names if
|
skip |
number of rows to be skipped when reading data from a
file ( |
Details
The formats are as follows (described is the format within R, i.e.,
for the input, the format of strmatrix; if file is specified, the file
is read with read.table(file,colClasses="character") and should give the
format explained below - note that colClasses="character" implies that
quotes are not needed in the input file):
- genepop
Alleles are coded by strings of length alength and there is no space between the two alleles in a locus, so a value of "258260" means that in the corresponding locus the two alleles have codes 258 and 260.
- structure
Alleles are coded by strings of arbitrary length. Two rows correspond to each individual, the first row containing the first alleles in all loci and the second row containing the second ones.
- structureb
Alleles are coded by strings of arbitrary length. One row corresponds to each individual, containing first and second alleles in all loci (first and second allele of first locus, first and second allele of second locus etc.). This starts in the third row (first two have locus names and other information).
- structurama
Alleles are coded by strings of arbitrary length. The two alleles in each locus are written with brackets around them and a comma in between, so "258260" in "genepop" corresponds to "(258,260)" in "structurama".
- prabclus
Alleles are coded by a single character and there is no space between the two alleles in a locus (e.g., "AC").
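The relation between the "genepop" and "structurama" notations can be sketched in a few lines of base R. This is only an illustration of the string layout described above, not the code used by alleleconvert; the helper name genepop_to_structurama is hypothetical, and the way a missing allele is written inside the brackets is an assumption.

```r
# Hypothetical helper (not part of prabclus): rewrite "genepop"-style locus
# strings in the "structurama" notation described above.
genepop_to_structurama <- function(x, alength = 3, orig.nachar = "000",
                                   new.nachar = "-") {
  # The first 'alength' characters code allele 1, the next 'alength' allele 2.
  a1 <- substr(x, 1, alength)
  a2 <- substr(x, alength + 1, 2 * alength)
  # Recode missing alleles (assumed convention for this sketch).
  a1[a1 == orig.nachar] <- new.nachar
  a2[a2 == orig.nachar] <- new.nachar
  # Brackets around the pair, comma in between.
  paste0("(", a1, ",", a2, ")")
}

genepop_to_structurama(c("258260", "122144"))
```

For real data, alleleconvert should be used, since it also handles files, row names and the other formats.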
Value
A matrix of strings in the format specified as format.out with an
attribute "alevels", a vector of all used allele codes if
format.out=="prabclus", otherwise vector of allele codes of last locus.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Huelsenbeck, J. P., and P. Andolfatto (2007) Inference of population structure under a Dirichlet process model. Genetics 175, 1787-1802.
Pritchard, J. K., M. Stephens, and P. Donnelly (2000) Inference of population structure using multi-locus genotype data. Genetics 155, 945-959.
Rousset, F. (2008) genepop'007: a complete re-implementation of the genepop software for Windows and Linux. Molecular Ecology Resources 8, 103-106.
See Also
Examples
data(tetragonula)
# This uses example data file Heterotrigona_indoFO.dat
str(alleleconvert(strmatrix=tetragonula))
strucmatrix <-
cbind(c("I1","I1","I2","I2","I3","I3"),
c("122","144","122","122","144","144"),c("0","0","21","33","35","44"))
alleleconvert(strmatrix=strucmatrix,format.in="structure",
format.out="prabclus",orig.nachar="0",firstcolname=TRUE)
alleleconvert(strmatrix=strucmatrix,format.in="structure",
format.out="structurama",orig.nachar="0",new.nachar="-9",firstcolname=TRUE)
Shared allele distance for diploid loci
Description
Shared allele distance for codominant markers (Bowcock et al., 1994). One minus proportion of alleles shared by two individuals averaged over loci (loci with missing values for at least one individual are ignored).
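As an illustration of the definition above, the distance between two individuals can be sketched in base R. The helper names shared_alleles and shared_allele_dist are hypothetical and this is not the package's implementation; in particular, counting shared alleles by matching each allele at most once is an assumption of this sketch.

```r
# Hypothetical helpers for illustration only.
# Number of alleles shared by two diploid genotypes; each allele of g2 can
# be matched at most once (assumed counting rule).
shared_alleles <- function(g1, g2) {
  if (anyNA(g1) || anyNA(g2)) return(NA_real_)
  shared <- 0
  for (a in g1) {
    hit <- match(a, g2)
    if (!is.na(hit)) {
      shared <- shared + 1
      g2 <- g2[-hit]
    }
  }
  shared
}

# ind1, ind2: lists over loci, each element a length-2 vector of allele codes.
shared_allele_dist <- function(ind1, ind2) {
  prop <- mapply(function(g1, g2) shared_alleles(g1, g2) / 2, ind1, ind2)
  # Loci with a missing value for at least one individual are ignored.
  1 - mean(prop, na.rm = TRUE)
}

ind1 <- list(c("A", "B"), c("C", "C"), c(NA, NA))
ind2 <- list(c("A", "C"), c("C", "C"), c("A", "D"))
shared_allele_dist(ind1, ind2)
```

In this toy example the first locus shares one of two alleles, the second shares both, and the third is ignored, so the distance is 1 - (0.5 + 1)/2 = 0.25.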
Usage
alleledist(allelelist,ni,np,count=FALSE)
Arguments
allelelist |
a list of lists. In the "outer" list, there are
|
ni |
integer. Number of individuals. |
np |
integer. Number of loci. |
count |
logical. If |
Value
A symmetrical matrix of shared allele distances between individuals.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Bowcock, A. M., Ruiz-Linares, A., Tomfohrde, J., Minch, E., Kidd, J. R., Cavalli-Sforza, L. L. (1994) High resolution of human evolutionary trees with polymorphic microsatellites. Nature 368, 455-457.
See Also
alleleinit, unbuild.charmatrix
Examples
data(tetragonula)
tnb <-
coord2dist(coordmatrix=tetragonula.coord[1:50,],cut=50,file.format="decimal2",neighbors=TRUE)
ta <- alleleconvert(strmatrix=tetragonula[1:50,])
tai <- alleleinit(allelematrix=ta,neighborhood=tnb$nblist,distance="none")
str(alleledist((unbuild.charmatrix(tai$charmatrix,50,13)),50,13))
Diploid loci matrix initialization
Description
alleleinit converts genetic data with diploid loci as generated by
alleleconvert into an object of class alleleobject. print.alleleobject
is a print method for such objects.
Usage
alleleinit(file = NULL, allelematrix=NULL,
rows.are.individuals = TRUE,
neighborhood = "none", distance = "alleledist", namode="variables",
nachar="-", distcount=FALSE)
## S3 method for class 'alleleobject'
print(x, ...)
Arguments
file |
string. File name. File must be in |
allelematrix |
matrix in |
rows.are.individuals |
logical. If |
neighborhood |
A string or a list with a component for
every individual. The
components are vectors of integers indicating
neighboring individuals. An individual without neighbors
should be assigned a vector |
distance |
|
namode |
one of |
nachar |
character denoting missing values. |
distcount |
logical. If |
x |
object of class |
... |
necessary for print method. |
Details
The required input format is the output format "prabclus" of
alleleconvert. Alleles are coded by a single character, so diploid loci
need to be pairs of characters without space between the two alleles
(e.g., "AC"). The input needs to be an individuals*loci matrix or data
frame (or a file that produces such a data frame by
read.table(file,stringsAsFactors=FALSE)).
Value
alleleinit produces an object of class alleleobject (note that this is
similar to class prab; for example both can be used with prabclust),
which is a list with components
distmat |
distance matrix between individuals. |
amatrix |
data frame of input data with string variables in the input format, see details. Note that in the output for an individual the whole locus is declared missing if at least one of its alleles is missing in the input. |
charmatrix |
matrix of characters in which there are two rows for
every individual corresponding to the two alleles in every locus
(column). Entries are allele codes but missing values are coded as
|
nb |
neighborhood list, see above. |
ext.nblist |
a neighborhood list in which for every row in
|
n.variables |
number of loci. |
n.individuals |
number of individuals. |
n.levels |
maximum number of different alleles in a locus. |
n.species |
identical to |
alevels |
character vector with all used allele codes not including missing values. |
leveldist |
matrix in which rows are loci, columns are alleles and entries are frequencies of alleles per locus. |
prab |
useless matrix of number of factor levels corresponding to
|
regperspec |
vector of row-wise sums of |
specperreg |
vector of column-wise sums of |
distance |
string denoting the chosen distance measure, see above. |
namode |
see above. |
naprob |
probability of missing values, numeric or vector, see
documentation of argument |
nasum |
number of missing entries (individual/loci) in
|
nachar |
see above. |
spatial |
logical. |
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
See Also
alleleconvert, alleledist, prabinit.
Examples
# Only 50 observations are used in order to have a fast example.
data(tetragonula)
tnb <-
coord2dist(coordmatrix=tetragonula.coord[1:50,],cut=50,file.format="decimal2",neighbors=TRUE)
ta <- alleleconvert(strmatrix=tetragonula[1:50,])
tai <- alleleinit(allelematrix=ta,neighborhood=tnb$nblist)
print(tai)
Internal: compares two pairs of alleles
Description
Used for computation of the genetic distances alleledist.
Usage
allelepaircomp(allelepair1,allelepair2,method="sum")
Arguments
allelepair1 |
vector of two allele codes (usually characters), or
|
allelepair2 |
vector of two allele codes (usually characters), or
|
method |
one of |
Value
If method=="sum", number of shared alleles (0, 1 or 2), or NA.
If method=="geometrical", 0, 0.5, sqrt(0.5) (in case that one of the
allelepairs is double such as in c("A","B"),c("A","A")) or 1, or NA.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
See Also
Examples
allelepaircomp(c("A","B"),c("A","C"))
Spatial autocorrelation parameter estimation
Description
Monte Carlo estimation of the disjunction/spatial autocorrelation
parameter pd for the simulation model used in randpop.nb, used for tests
for clustering of presence-absence data. autoconst is the main function;
autoreg performs the simulation and is executed within autoconst.
Usage
autoconst(x, prange = c(0, 1), twostep = TRUE, step1 = 0.1,
step2 = 0.01, plot = TRUE, nperp = 4, ejprob = NULL,
species.fixed = TRUE, pdfnb=FALSE, ignore.richness=FALSE)
autoreg(x, probs, ejprob, plot = TRUE, nperp = 4, species.fixed = TRUE,
pdfnb=FALSE, ignore.richness=FALSE)
Arguments
x |
object of class |
prange |
numerical range vector, lower value not smaller than 0, larger value not larger than 1. Range where the parameter is to be found. |
twostep |
logical. If |
step1 |
numerical between 0 and 1. Interval length between
subsequent choices of |
step2 |
numerical between 0 and 1. Interval length between
subsequent choices of |
plot |
logical. If |
nperp |
integer. Number of simulations per |
ejprob |
numerical between 0 and 1. Observed disjunction
probability for data |
species.fixed |
logical. If |
probs |
vector of numericals between 0 and 1. |
pdfnb |
logical. If |
ignore.richness |
logical. If |
Details
The spatial autocorrelation parameter pd of the model for the generation
of presence-absence data sets used by randpop.nb can be estimated by use
of the observed disjunction probability ejprob, which is the sum of all
species' connectivity components minus the number of species, divided by
the number of "presence" entries minus the number of species. This is
done by a simulation of artificial data sets with characteristics of x
and different pd-values, governed by prange, step1, step2 and nperp.
ejprob is then calculated for all simulated populations. A linear
regression of ejprob on pd is performed and the estimator of pd is
determined by computing the inverse of the regression function for the
ejprob-value of x.
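The two ingredients of this procedure, the disjunction probability and the inversion of the fitted regression line, can be sketched in base R. The functions n_components and ejprob_sketch are hypothetical illustrations and not the package's code (the package provides con.comp and autoconst for this purpose); the "simulated" values in sim are invented to keep the example short and do not come from randpop.nb.

```r
# Count connectivity components of the regions occupied by one species.
# present: 0-1 vector over regions; neighbors: list of neighbor indices.
n_components <- function(present, neighbors) {
  seen <- rep(FALSE, length(present))
  ncomp <- 0
  for (r in which(present == 1)) {
    if (seen[r]) next
    ncomp <- ncomp + 1
    queue <- r
    seen[r] <- TRUE
    while (length(queue) > 0) {          # breadth-first search
      cur <- queue[1]
      queue <- queue[-1]
      nb <- neighbors[[cur]]
      nb <- nb[present[nb] == 1 & !seen[nb]]
      seen[nb] <- TRUE
      queue <- c(queue, nb)
    }
  }
  ncomp
}

# Disjunction probability of a regions x species 0-1 matrix, following the
# verbal definition given above.
ejprob_sketch <- function(pa, neighbors) {
  comps <- sum(apply(pa, 2, n_components, neighbors = neighbors))
  (comps - ncol(pa)) / (sum(pa) - ncol(pa))
}

# Toy data: 5 regions on a line (1-2-3-4-5) and 2 species.
nbl <- list(2, c(1, 3), c(2, 4), c(3, 5), 4)
pa <- cbind(c(1, 1, 0, 1, 0),   # components {1,2} and {4}
            c(0, 1, 1, 1, 0))   # one component {2,3,4}
ej_obs <- ejprob_sketch(pa, nbl)   # (3 - 2) / (6 - 2)

# Invented (pd, ejprob) pairs standing in for the simulation output.
sim <- data.frame(pd = c(0, 0.25, 0.5, 0.75, 1),
                  ej = c(0.05, 0.20, 0.35, 0.50, 0.65))
fit <- lm(ej ~ pd, data = sim)
# Invert the regression line at the observed ejprob.
pd_hat <- (ej_obs - coef(fit)[[1]]) / coef(fit)[[2]]
pd_hat
```

The sketch uses a single grid of pd values; it does not reproduce the two-step search controlled by twostep, step1 and step2.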
Value
autoconst produces the same list as autoreg with additional component
ejprob. The components are
pd |
(eventually) estimated parameter |
coef |
(eventually) estimated regression coefficients. |
ejprob |
see above. |
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Hausdorf, B. and Hennig, C. (2003) Biotic Element Analysis in Biogeography. To appear in Systematic Biology.
Hausdorf, B. and Hennig, C. (2003) Nestedness of north-west European land snail ranges as a consequence of differential immigration from Pleistocene glacial refuges. Oecologia 135, 102-109.
Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896.
See Also
randpop.nb, prabinit, con.comp
Examples
options(digits=4)
data(kykladspecreg)
data(nb)
set.seed(1234)
x <- prabinit(prabmatrix=kykladspecreg, neighborhood=nb)
ax <- autoconst(x,nperp=2,step1=0.3,twostep=FALSE)
Internal: create character matrix out of allele list
Description
For use in alleleinit.
Creates a matrix of characters in which there are two rows for
every individual corresponding to the two alleles in every locus
(column) out of a list of lists, such as required by alleledist.
Usage
build.charmatrix(allelelist,n.individuals,n.variables)
Arguments
allelelist |
A list of lists. In the "outer" list, there are
|
n.individuals |
integer. Number of individuals. |
n.variables |
integer. Number of loci. |
Value
A matrix of characters in which there are two rows for every individual corresponding to the two alleles in every locus (column).
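The layout of the result can be illustrated by a small base R re-implementation. The function charmatrix_sketch is hypothetical; the list layout (outer list over loci, inner list over individuals) is inferred from the example on this page, and placing the two alleles of individual i in rows 2i-1 and 2i, as well as keeping missing values as NA, are assumptions of this sketch.

```r
# Hypothetical illustration, not the package's code.
# allelelist[[locus]][[individual]] is a length-2 vector of allele codes.
charmatrix_sketch <- function(allelelist, n.individuals, n.variables) {
  out <- matrix(NA_character_, nrow = 2 * n.individuals, ncol = n.variables)
  for (j in seq_len(n.variables)) {
    for (i in seq_len(n.individuals)) {
      # Rows 2i-1 and 2i hold the two alleles of individual i (assumed order).
      out[c(2 * i - 1, 2 * i), j] <- allelelist[[j]][[i]]
    }
  }
  out
}

alist <- list(
  list(c("A", "A"), c("B", "A"), c(NA, NA)),
  list(c("A", "C"), c("B", "B"), c("A", "D"))
)
m <- charmatrix_sketch(alist, 3, 2)
dim(m)
```

With three individuals and two loci the result has six rows and two columns.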
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
See Also
alleleinit, unbuild.charmatrix
Examples
alist <- list()
alist[[1]] <- list(c("A","A"),c("B","A"),c(NA,NA))
alist[[2]] <- list(c("A","C"),c("B","B"),c("A","D"))
build.charmatrix(alist,3,2)
Internal: generates neighborhood list for diploid loci
Description
This is for use in alleleinit.
Given a neighborhood list of individuals, a new neighborhood list is
generated in which there are two entries for each individual (entry 1
and 2 refer to individual one, 3 and 4 to individual 2 and so
on). Neighborhoods are preserved and additionally the two entries
belonging to the same individual are marked as neighbors.
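The construction can be sketched in base R as follows. The function ext_nblist_sketch is a hypothetical illustration of the description above and not the package's code; the order of the indices within each entry is an assumption.

```r
# Hypothetical illustration: entries 2i-1 and 2i belong to individual i.
ext_nblist_sketch <- function(neighbors, n.individuals = length(neighbors)) {
  out <- vector("list", 2 * n.individuals)
  for (i in seq_len(n.individuals)) {
    nb <- neighbors[[i]]
    # Both entries of every neighboring individual are neighbors.
    ext <- c(2 * nb - 1, 2 * nb)
    # The two entries of the same individual are neighbors of each other.
    out[[2 * i - 1]] <- sort(c(ext, 2 * i))
    out[[2 * i]]     <- sort(c(ext, 2 * i - 1))
  }
  out
}

# Three individuals on a line: 1 - 2 - 3.
nbl <- list(2, c(1, 3), 2)
ext <- ext_nblist_sketch(nbl)
ext[[3]]   # first entry of individual 2
```

Here entry 3 is linked to entries 1 and 2 (individual 1), entries 5 and 6 (individual 3) and entry 4 (its own partner entry).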
Usage
build.ext.nblist(neighbors,n.individuals=length(neighbors))
Arguments
neighbors |
list of integer vectors, where each vector contains the neighbors of an individual. |
n.individuals |
integer. Number of individuals. |
Value
list with 2*n.individuals vectors of integers as described above.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
See Also
Examples
data(veronica)
vnb <- coord2dist(coordmatrix=veronica.coord[1:20,], cut=20,
file.format="decimal2",neighbors=TRUE)
build.ext.nblist(vnb$nblist)
Generate spatial weights from prabclus neighborhood list
Description
This generates a listw-object as needed for estimation of a
simultaneous autoregression model in package spdep from a
neighborhood list of the type generated in prabinit.
Usage
build.nblist(prabobj,prab01=NULL,style="C")
Arguments
prabobj |
object of class |
prab01 |
presence-absence matrix of the same dimensions as the
abundance matrix of |
style |
can take values "W", "B", "C", "U", and "S" though tests
suggest that "C" should be chosen. See |
Value
A 'listw' object with the following members:
style |
see above. |
neighbours |
the neighbours list in |
weights |
the weights for the neighbours and chosen style, with attributes set to report the type of relationships (binary or general, if general the form of the glist argument), and style as above. |
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
See Also
nb2listw (which is called)
Examples
# Not run; requires package spdep
# data(siskiyou)
# x <- prabinit(prabmatrix=siskiyou, neighborhood=siskiyou.nb,
# distance="logkulczynski")
# build.nblist(x)
Simulation of presence-absence matrices (clustered)
Description
Generates a simulated matrix where the rows are interpreted as regions
and the columns as species, 1 means that a species is present in the
region and 0 means that the species is absent. Species are generated
in order to produce 2 clusters of species with similar ranges.
Spatial autocorrelation of a species' presences is governed by
the parameter p.nb and a list of neighbors for each region.
Usage
cluspop.nb(neighbors, p.nb = 0.5, n.species, clus.specs, reg.group,
grouppf = 10, n.regions = length(neighbors),
vector.species = rep(1, n.species), pdf.regions = rep(1/n.regions,
n.regions), count = TRUE, pdfnb = FALSE)
Arguments
neighbors |
A list with a component for every region. The
components are vectors of integers indicating
neighboring regions. A region without neighbors (e.g., an island)
should be assigned a list |
p.nb |
numerical between 0 and 1. The probability that a new
region is drawn from the non-neighborhood of the previous regions
belonging to a species under generation. Note that for a given
presence-absence matrix, this parameter can be estimated by
|
n.species |
integer. Number of species. |
clus.specs |
integer not larger than |
reg.group |
vector of pairwise distinct integers not larger than
|
grouppf |
numerical. The probability of the region of
a clustered species to belong to the corresponding group of regions
is up-weighted by factor |
n.regions |
integer. Number of regions. |
vector.species |
vector of integers. The sizes
(i.e., numbers of regions)
of the species are generated randomly from
the empirical distribution of |
pdf.regions |
numerical vector of length |
count |
logical. If |
pdfnb |
logical. If |
Details
The non-clustered species are generated as explained on the help page
for randpop.nb. The general principle for the clustered species
is the same, but with modified probabilities for the regions. For each
clustered species, one of the two groups of regions is drawn,
distributed according to the sum of its regions' probability given by
pdf.regions. The first region of such a species is only drawn
from the regions of this group.
Value
A 0-1-matrix, rows are regions, columns are species.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896.
See Also
autoconst estimates p.nb from matrices of class prab. These are
generated by prabinit.
Examples
data(nb)
set.seed(888)
cluspop.nb(nb, p.nb=0.1, n.species=10, clus.specs=9, reg.group=1:17,
vector.species=c(10))
Construct communities from individuals
Description
Construct communities from individuals using geographical distance and
hierarchical clustering. Communities are clusters of geographically
close individuals, formed by hclust with specified distance cutoff.
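The idea can be illustrated with base R alone. This sketch assumes that the cutoff acts as the cutting height of the dendrogram (via cutree) and ignores the grouping argument; the coordinates are invented, and communities itself may differ in detail.

```r
# Invented coordinates: two individuals at one site, three at another.
xy <- rbind(c(0, 0), c(0, 0), c(3, 4), c(3, 4), c(3, 4))
geodist <- as.matrix(dist(xy))

# Single-linkage clustering, cut at a very small height, groups individuals
# whose geographical distance is (numerically) zero.
hc <- hclust(as.dist(geodist), method = "single")
com <- cutree(hc, h = 1e-5)   # assumption: cutoff used as cutting height
com
```

The result is a vector of community numbers running from 1 to the number of communities, here two communities of sizes 2 and 3.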
Usage
communities(geodist,grouping=NULL,
cutoff=1e-5,method="single")
Arguments
geodist |
|
grouping |
something that can be coerced into a factor. Different
groups indicated by |
cutoff |
numeric; clustering distance cutoff value, passed on as
parameter |
method |
|
Value
Vector of community memberships for the individuals (integer numbers from 1 to the number of communities without interruption).
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
See Also
Examples
data(veronica)
ver.geo <- coord2dist(coordmatrix=veronica.coord[1:90,],file.format="decimal2")
species <-c(rep(1,64),rep(2,17),rep(3,9))
communities(ver.geo,species)
Distances between communities
Description
Constructs distances between communities: chord- (Cavalli-Sforza and Edwards, 1967), phiPT/phiST (Peakall and Smouse, 2012, Meirmans, 2006), three versions of the shared allele distance between communities, and geographical distance between communities.
Usage
communitydist(alleleobj,comvector="auto",distance="chord",
compute.geodist=TRUE,out.dist=FALSE,
grouping=NULL,geodist=NA,diploid=TRUE,
phiptna=NA,...)
Arguments
alleleobj |
if |
comvector |
either a vector of integers indicating to which
community an individual belongs (these need to be numbered from 1 to
a maximum number without interruption), or |
distance |
one of |
compute.geodist |
logical, indicating whether geographical distances between communities should be generated. |
out.dist |
logical, indicating whether |
grouping |
something that can be coerced into a factor, for
passing on to |
geodist |
matrix or |
diploid |
logical, indicating whether loci are diploid, see
|
phiptna |
if |
... |
optional arguments to be passed on to |
Details
All genetic distances between communities are based on the information
given in alleleobj
; either on the alleles directly or on a genetic
distance (distmat
-component, see alleleinit
).
The possible genetic distance measures between communities are as follows:
- "chord": chord-distance (Cavalli-Sforza and Edwards, 1967).
- "phipt": phiPT-distance implemented according to Peakall and Smouse (2012). This also appears in the literature under the name phiST (Meirmans, 2006), although the definition there is incomplete and we are not sure whether the two are identical.
- "shared.average": average of between-community genetic distances.
- "shared.chakraborty": between-community shared allele distance according to Chakraborty and Jin (1993).
- "shared.problist": this implements the shared allele distance (Bowcock et al., 1994) for individuals directly for communities (one minus proportion of alleles shared by two communities averaged over loci).
Value
list with components
comvector |
integer vector of length of the number of individuals, indicating their community membership. |
dist |
genetic distances between communities. Parameter |
cgeodist |
if |
comgroup |
vector of length of the number of communities. If
|
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Bowcock, A. M., Ruiz-Linares, A., Tomfohrde, J., Minch, E., Kidd, J. R., Cavalli-Sforza, L. L. (1994) High resolution of human evolutionary trees with polymorphic microsatellites. Nature 368, 455-457.
Cavalli-Sforza, L. L. and Edwards, A. W. F. (1967) Phylogenetic Analysis - Models and Estimation Procedures. The American Journal of Human Genetics 19, 233-257.
Chakraborty, R. and Jin, L. (1993) Determination of relatedness between individuals using DNA fingerprinting. Human Biology 65, 875-895.
Meirmans, P. G. (2006) Using the AMOVA framework to estimate a standardized genetic differentiation measure. Evolution 60, 2399-2402.
Peakall, R. and Smouse P.E. (2012) GenAlEx Tutorial 2. https://biology-assets.anu.edu.au/GenAlEx/Tutorials.html
See Also
communities
; refer to phipt
for
computation of distances between specific pairs of communities.
diploidcomlist
produces relative frequencies for all
alleles of all loci in all communities (on which the chord- and the
"shared.problist"
-distances are based).
Examples
options(digits=4)
data(tetragonula)
tnb <-
coord2dist(coordmatrix=tetragonula.coord[83:120,],cut=50,
file.format="decimal2",neighbors=TRUE)
ta <- alleleconvert(strmatrix=tetragonula[83:120,])
tai <- alleleinit(allelematrix=ta,neighborhood=tnb$nblist)
tetraspec <- c(rep(1,11),rep(2,13),rep(3,14))
tetracoms <-
c(rep(1:3,each=3),4,5,rep(6:11,each=2),12,rep(13:19,each=2))
c1 <- communitydist(tai,comvector=tetracoms,distance="chord",
geodist=tnb$distmatrix,grouping=tetraspec)
c2 <- communitydist(tai,comvector=tetracoms,distance="phipt",
geodist=tnb$distmatrix,grouping=tetraspec,compute.geodist=FALSE)
c3 <- communitydist(tai,comvector=tetracoms,distance="shared.average",
geodist=tnb$distmatrix,grouping=tetraspec,compute.geodist=FALSE)
c4 <- communitydist(tai,comvector=tetracoms,distance="shared.chakraborty",
geodist=tnb$distmatrix,grouping=tetraspec,compute.geodist=FALSE)
c5 <- communitydist(tai,comvector=tetracoms,distance="shared.problist",
geodist=tnb$distmatrix,grouping=tetraspec,compute.geodist=FALSE)
round(c1$cgeodist,digits=1)
c1$comvector
c2$comvector
c3$comvector
c4$comvector
c5$comvector
round(c1$dist,digits=2)
round(c2$dist,digits=2)
round(c3$dist,digits=2)
round(c4$dist,digits=2)
round(c5$dist,digits=2)
Compare species clustering and species groups
Description
Tests for independence between a clustering and another grouping of species.
This is simply an interface to chisq.test
.
Usage
comp.test(cl,spg)
Arguments
cl |
a vector of integers. Clustering of species (may be taken
from |
spg |
a vector of integers of the same length, groups of species. |
Details
chisq.test
with simulated p-value is used.
Value
Output of chisq.test
.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Hausdorf, B. and Hennig, C. (2003) Biotic Element Analysis in Biogeography. Systematic Biology 52, 717-723.
See Also
Examples
set.seed(1234)
g1 <- c(rep(1,34),rep(2,12),rep(3,15))
g2 <- sample(3,61,replace=TRUE)
comp.test(g1,g2)
Connectivity components of an undirected graph
Description
Computes the connectivity components of an undirected graph from a matrix giving the edges.
Usage
con.comp(comat)
Arguments
comat |
a symmetric logical or 0-1 matrix, where |
Details
The "depth-first search" algorithm of Cormen, Leiserson and Rivest (1990, p. 477) is used.
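A depth-first search for connectivity components can be sketched in base R as follows. The helper name components_dfs is hypothetical and this is not the code used by con.comp; it only illustrates the technique on a symmetric logical adjacency matrix.

```r
# Connectivity components of an undirected graph by iterative
# depth-first search; 'adj' is a symmetric logical adjacency matrix.
# 'components_dfs' is a hypothetical helper, not a prabclus function.
components_dfs <- function(adj) {
  n <- nrow(adj)
  comp <- integer(n)            # 0 means "not yet visited"
  current <- 0L
  for (start in seq_len(n)) {
    if (comp[start] != 0L) next
    current <- current + 1L     # open a new component
    stack <- start
    while (length(stack) > 0) {
      v <- stack[length(stack)]           # pop the last vertex
      stack <- stack[-length(stack)]
      if (comp[v] != 0L) next
      comp[v] <- current
      # push all unvisited vertices adjacent to v
      stack <- c(stack, which(adj[v, ] & comp == 0L))
    }
  }
  comp
}

set.seed(1000)
x <- rnorm(20)
m <- as.matrix(dist(x))
cc <- components_dfs(m < 0.2)
max(cc)   # number of connectivity components
```

For data without ties at the threshold, the partition agrees with the one obtained from cutting a single linkage tree at the same height.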
Value
An integer vector, giving the number of the connectivity component for each vertex.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Cormen, T. H., Leiserson, C. E. and Rivest, R. L. (1990), Introduction to Algorithms, Cambridge: MIT Press.
See Also
hclust
, cutree
for cut single linkage
trees (often equivalent).
Examples
set.seed(1000)
x <- rnorm(20)
m <- matrix(0,nrow=20,ncol=20)
for(i in 1:20)
for(j in 1:20)
m[i,j] <- abs(x[i]-x[j])
d <- m<0.2
cc <- con.comp(d)
max(cc) # number of connectivity components
plot(x,cc)
# The same should be produced by
# cutree(hclust(as.dist(m),method="single"),h=0.2).
Connected regions per species
Description
Returns a vector of the numbers of connected regions per species for a presence-absence matrix.
Usage
con.regmat(regmat, neighbors, count = FALSE)
Arguments
regmat |
0-1-matrix. Columns are species, rows are regions. |
neighbors |
A list with a component for every region. The
components are vectors of integers indicating
neighboring regions. A region without neighbors (e.g., an island)
should be assigned a list |
count |
logical. If |
Details
Uses con.comp
.
Value
Vector of numbers of connected regions per species.
Note
Designed for use in prabtest
.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
See Also
Examples
data(nb)
set.seed(888)
cp <- cluspop.nb(nb, p.nb=0.1, n.species=10, clus.specs=9,
reg.group=1:17,vector.species=c(10))
con.regmat(cp,nb)
Geographical coordinates to distances
Description
Computes geographical distances from geographical coordinates
Usage
coord2dist(file=NULL, coordmatrix=NULL, cut=NULL,
file.format="degminsec",
output.dist=FALSE, radius=6378.137,
fp=1/298.257223563, neighbors=FALSE)
Arguments
file |
string. A filename for the coordinate file. The file
should have 2, 4 or 6 numeric columns and one row for each location.
See |
coordmatrix |
something that can be coerced into a matrix with
2, 4 or 6 columns. Matrix of coordinates, one row for each
location. See |
cut |
numeric. Only active if |
file.format |
one of
|
output.dist |
logical. If |
radius |
numeric. Radius of the earth in km used in computation (the default is the equatorial radius but this is not the only possible choice). |
fp |
flattening of the earth; the default is from WGS-84. |
neighbors |
logical. If |
Value
If neighbors==TRUE
, a
list with components
distmatrix |
distance matrix between locations. See
|
nblist |
list with a vector for every location containing the
numbers of its neighbors, see |
If neighbors==FALSE
, only the distance matrix.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
German Wikipedia from 29 August 2010: https://de.wikipedia.org/wiki/Orthodrome
See Also
Examples
options(digits=4)
data(veronica)
coord2dist(coordmatrix=veronica.coord[1:20,], cut=20, file.format="decimal2",neighbors=TRUE)
Region-wise cluster membership
Description
Produces a matrix with clusters as rows and regions as columns, indicating how many species present in a region belong to the clusters
Usage
crmatrix(x,xc,percentages=FALSE)
Arguments
x |
object of class |
xc |
object of class |
percentages |
logical. If |
Value
A clusters times regions matrix as explained above.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
Examples
options(digits=3)
data(kykladspecreg)
data(nb)
set.seed(1234)
x <- prabinit(prabmatrix=kykladspecreg, neighborhood=nb)
xc <- prabclust(x)
crmatrix(x,xc)
crmatrix(x,xc, percentages=TRUE)
Dice distance matrix
Description
Computes a distance derived from Dice's coincidence index between the columns of a 0-1-matrix.
Usage
dicedist(regmat)
Arguments
regmat |
0-1-matrix. Columns are species, rows are regions. |
Details
The Dice distance between two species is 1 minus the Coincidence Index, which is (2*number of regions where both species are present)/(2*number of regions where both species are present plus number of regions where exactly one of the two species is present). This is S23 in Shi (1993).
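As a rough illustration, the standard Dice coincidence index can be written as 2*shared/(n1+n2), where shared is the number of regions in which both species are present and n1, n2 are the range sizes of the two species. The helper dice_sketch and the toy matrix are hypothetical; this is a base R sketch, not the code of dicedist.

```r
# Dice distance between the columns of a 0-1 matrix (rows: regions,
# columns: species). 'dice_sketch' is a hypothetical helper.
dice_sketch <- function(regmat) {
  regmat <- as.matrix(regmat)
  both <- crossprod(regmat)      # shared regions for each species pair
  n <- colSums(regmat)           # range size of each species
  1 - 2 * both / outer(n, n, "+")
}

regmat <- cbind(sp1 = c(1, 1, 0, 0),
                sp2 = c(1, 0, 1, 0),
                sp3 = c(1, 1, 0, 0))
d <- dice_sketch(regmat)
d
```

Species with an empty range would give NaN in this sketch, so such columns would have to be removed first.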
Value
A symmetrical matrix of Dice distances.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Shi, G. R. (1993) Multivariate data analysis in palaeoecology and palaeobiogeography - a review. Palaeogeography, Palaeoclimatology, Palaeoecology 105, 199-234.
See Also
Examples
options(digits=4)
data(kykladspecreg)
dicedist(t(kykladspecreg))
Distance ratio test statistics for distance based clustering
Description
Calculates the ratio between the prop
smallest and largest
distances of a distance matrix.
Usage
distratio(distmat, prop = 0.25)
Arguments
distmat |
symmetric distance matrix. |
prop |
numerical. Proportion between 0 and 1. |
Details
Rounding is by floor
for small and ceiling
for large
distances.
Value
A list with components
dr |
ratio of |
lowmean |
mean of |
himean |
mean of |
prop |
see above. |
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896.
See Also
Examples
options(digits=4)
data(kykladspecreg)
j <- jaccard(t(kykladspecreg))
distratio(j)
geco distance matrix
Description
Computes geco distances between the columns of a 0-1-matrix, based on a distance matrix between regions (usually, but not necessarily, this is a geographical distance).
Usage
geco(regmat,geodist=as.dist(matrix(as.integer(!diag(nrow(regmat))))),
transform="piece",
tf=0.1,
countmode=ncol(regmat)+1)
Arguments
regmat |
0-1-matrix. Columns are species, rows are regions. |
geodist |
|
transform |
transformation applied to the distances before
computation of geco coefficient, see details. "piece" means
piecewise linear, namely distance/( |
tf |
tuning constant for transformation. See |
countmode |
optional positive integer. 'geco' shows a message every 'countmode' algorithm runs. |
Details
The geco distance between two species is 0.5*(mean distance
between region where species 1 is present and closest region where
species 2 is present plus mean distance
between region where species 2 is present and closest region where
species 1 is present). 'closest' to a region could be the region
itself.
It is recommended (Hennig and Hausdorf, 2006) to transform the
distances first, because the differences between large distances are
usually not meaningful or at least much less meaningful than
differences between small distances for dissimilarity measurement
between species ranges. See parameter transform
.
If the between-regions distance is 1 for all pairs of
non-equal regions, the geco distance degenerates
to the Kulczynski distance, see kulczynski
.
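The definition above, and the degenerate case with unit distances, can be checked with a small base R sketch. The helper geco_pair is hypothetical, handles a single pair of species, and applies no transformation to the distances; it is not the code of geco.

```r
# geco distance between two species ranges (0-1 vectors over regions),
# given a region-by-region distance matrix 'geodist'.
# 'geco_pair' is a hypothetical helper; no transformation is applied.
geco_pair <- function(a, b, geodist) {
  geodist <- as.matrix(geodist)
  a <- as.logical(a)
  b <- as.logical(b)
  # for every region of one species: distance to the closest region
  # of the other species (0 if the other species is present there too)
  a_to_b <- apply(geodist[a, b, drop = FALSE], 1, min)
  b_to_a <- apply(geodist[b, a, drop = FALSE], 1, min)
  0.5 * (mean(a_to_b) + mean(b_to_a))
}

sp1 <- c(1, 1, 0, 0, 1)
sp2 <- c(1, 0, 1, 0, 0)

# With distance 1 between all distinct regions, the result equals the
# Kulczynski distance 1 - 0.5 * (shared / n1 + shared / n2).
unitdist <- 1 - diag(5)
g <- geco_pair(sp1, sp2, unitdist)
shared <- sum(sp1 * sp2)
k <- 1 - 0.5 * (shared / sum(sp1) + shared / sum(sp2))
c(geco = g, kulczynski = k)
```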
Value
A symmetrical matrix of geco distances.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Hennig, C. and Hausdorf, B. (2006) A robust distance coefficient between distribution areas incorporating geographic distances. Systematic Biology 55, 170-175.
See Also
Examples
options(digits=4)
data(kykladspecreg)
data(waterdist)
geco(t(kykladspecreg),waterdist)
Neighborhood list from geographical distance
Description
Generates a neighborhood list as required by prabinit
from a
matrix of geographical distances.
Usage
geo2neighbor(geodist,cut=0.1*max(geodist))
Arguments
geodist |
|
cut |
non-negative numerical. All pairs of regions with
|
Value
A list of integer vectors, giving the set of neighbors for every region.
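A neighborhood list of this form can be derived from a distance matrix with a few lines of base R. The helper neighbors_from_dist is hypothetical, and treating distances less than or equal to the cutoff as neighbors is an assumption of this sketch.

```r
# Neighborhood list from a distance matrix: region j is a neighbor of
# region i if their distance is at most 'cut' (assumption: "<=").
# 'neighbors_from_dist' is a hypothetical helper.
neighbors_from_dist <- function(geodist, cut = 0.1 * max(geodist)) {
  geodist <- as.matrix(geodist)
  lapply(seq_len(nrow(geodist)), function(i) {
    setdiff(which(geodist[i, ] <= cut), i)   # a region is not its own neighbor
  })
}

x <- c(0, 1, 2, 10)                 # toy locations on a line
nbl <- neighbors_from_dist(as.matrix(dist(x)), cut = 1.5)
nbl
```

Because the distance matrix is symmetric, the resulting list is symmetric as well; a region without neighbors gets an empty vector.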
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
Examples
data(waterdist)
geo2neighbor(waterdist)
Classical distance-based test for homogeneity against clustering
Description
Classical distance-based test for homogeneity against clustering. The test
statistic is the number of isolated vertices in the graph of smallest
distances. The homogeneity model is a random graph model where ne
edges are drawn from all possible edges.
Usage
homogen.test(distmat, ne = ncol(distmat), testdist = "erdos")
Arguments
distmat |
numeric symmetric distance matrix. |
ne |
integer. Number of edges in the data graph, corresponding to smallest distances. |
testdist |
string. If |
Details
The "ling"-test is one-sided (rejection if the number of isolated vertices is too large), the "erdos"-test computes a one-sided as well as a two-sided p-value.
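The test statistic itself, the number of isolated vertices in the graph of the ne smallest distances, can be sketched in base R. The helper isolated_vertices is hypothetical; it does not compute p-values, and including all ties at the cutoff distance is an assumption.

```r
# Number of isolated vertices in the graph whose edges are the 'ne'
# smallest distances. 'isolated_vertices' is a hypothetical helper;
# ties at the cutoff distance are all included (an assumption).
isolated_vertices <- function(distmat, ne = ncol(distmat)) {
  distmat <- as.matrix(distmat)
  d <- sort(distmat[upper.tri(distmat)])
  distcut <- d[ne]                  # largest distance that gets an edge
  adj <- distmat <= distcut
  diag(adj) <- FALSE                # no loops
  list(iv = sum(rowSums(adj) == 0), distcut = distcut)
}

x <- c(0, 1, 2, 3, 100)             # toy data; the last point is far away
res <- isolated_vertices(as.matrix(dist(x)), ne = 3)
res
```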
Value
A list with components
p |
p-value for one-sided test. |
p.twoside |
p-value for two-sided test, only if |
iv |
number of isolated vertices in the data. |
lambda |
parameter of the Poisson test distribution, only if
|
distcut |
largest distance value for which an edge has been drawn. |
ne |
see above. |
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Erdos, P. and Renyi, A. (1960) On the evolution of random graphs. Publications of the Mathematical Institute of the Hungarian Academy of Sciences 5, 17-61.
Godehardt, E. and Horsch, A. (1995) Graph-Theoretic Models for Testing the Homogeneity of Data. In Gaul, W. and Pfeifer, D. (Eds.) From Data to Knowledge, Springer, Berlin, 167-176.
Ling, R. F. (1973) A probability theory of cluster analysis. Journal of the American Statistical Association 68, 159-164.
See Also
Examples
options(digits=4)
data(kykladspecreg)
j <- jaccard(t(kykladspecreg))
homogen.test(j, testdist="erdos")
homogen.test(j, testdist="ling")
Clustering of species ranges from presence-absence matrices (hierarchical methods)
Description
Clusters a presence-absence matrix object by taking the
'h-cut'-partition of a hierarchical clustering and
declaring all members of too small clusters as 'noise' (this gives a
distance-based clustering method, which estimates the number of
clusters and allows for noise/non-clustered points). Note that this
is experimental. Often, the prabclust
-solution
is more convincing due to higher flexibility of that method. However,
hprabclust
may be more stable sometimes.
Note: Data formats are described
on the prabinit
help page. You may also consider the example datasets
kykladspecreg.dat
and nb.dat
. Take care of the
parameter rows.are.species
of prabinit
.
Usage
hprabclust(prabobj, cutdist=0.4, cutout=1,
method="average", nnout=2, mdsplot=TRUE, mdsmethod="classical")
## S3 method for class 'comprabclust'
print(x, ...)
Arguments
prabobj |
object of class |
cutdist |
non-negative numeric. Cutoff distance to determine the
partition, see |
cutout |
non-negative integer. Points that have at most
|
method |
string. Clustering method, see |
nnout |
non-negative integer. Members of clusters with at most |
mdsplot |
logical. If |
mdsmethod |
|
x |
|
... |
necessary for print method. |
Value
hprabclust
generates an object of class comprabclust
. This is a
list with components
clustering |
vector of integers indicating the cluster memberships of
the species ( |
rclustering |
vector of integers indicating the cluster memberships of
the species, noise as described under |
cutdist |
see above. |
method |
see above. |
cutout |
see above. |
nnout |
see above. |
noisen |
number of points minus |
symbols |
vector of characters corresponding to |
points |
numerical matrix. MDS configuration (if |
call |
function call. |
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
See Also
Examples
data(kykladspecreg)
data(nb)
data(waterdist)
x <- prabinit(prabmatrix=kykladspecreg, neighborhood=nb,
geodist=waterdist, distance="geco")
hprabclust(x,mdsplot=FALSE)
Nestedness matrix
Description
Computes species*species nestedness matrix and number of nestings (inclusions) from regions*species presence-absence matrix.
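Counting strict inclusions between species ranges can be illustrated in base R. The helper strict_inclusions and the toy matrix are hypothetical, and the sketch only reproduces the idea of counting nestings, not the full output of incmatrix.

```r
# Count strict inclusions between species ranges in a regions x
# species 0-1 matrix. 'strict_inclusions' is a hypothetical helper.
strict_inclusions <- function(regmat) {
  present <- (as.matrix(regmat) > 0) * 1
  shared <- crossprod(present)          # shared regions per species pair
  n <- colSums(present)                 # range sizes
  size_i <- matrix(n, nrow = length(n), ncol = length(n))  # n[i] in row i
  contained <- shared == size_i         # range of i lies within range of j
  strict <- contained & !t(contained)   # contained, but ranges not equal
  sum(strict)
}

regmat <- cbind(sp1 = c(1, 1, 1, 0),
                sp2 = c(1, 1, 0, 0),
                sp3 = c(1, 1, 0, 0),
                sp4 = c(0, 0, 0, 1))
ninc <- strict_inclusions(regmat)
ninc
```

Here the ranges of sp2 and sp3 are strictly contained in the range of sp1, while sp2 and sp3 are equal and therefore not counted as strict inclusions.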
Usage
incmatrix(regmat)
Arguments
regmat |
0-1-matrix. Columns are species, rows are regions. |
Value
A list with components
m |
0-1-matrix. |
ninc |
integer. Number of strict inclusions. |
neq |
integer. Number of region equalities between species (not including equality between species i and i). |
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Hausdorf, B. and Hennig, C. (2003) Nestedness of north-west European land snail ranges as a consequence of differential immigration from Pleistocene glacial refuges. Oecologia 135, 102-109.
See Also
Examples
data(kykladspecreg)
incmatrix(t(kykladspecreg))$ninc
Jaccard distance matrix
Description
Computes Jaccard distances between the columns of a 0-1-matrix.
Usage
jaccard(regmat)
Arguments
regmat |
0-1-matrix. Columns are species, rows are regions. |
Details
The Jaccard distance between two species is 1-(number of regions where both species are present)/(number of regions where at least one species is present). As a similarity coefficient, this is S22 in Shi (1993).
Thank you to Laurent Buffat for improving this function!
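The formula can be sketched in vectorised base R. The helper jaccard_sketch and the toy matrix are hypothetical; this is an illustration of the definition, not the package's code.

```r
# Jaccard distance between the columns of a 0-1 matrix (rows: regions,
# columns: species). 'jaccard_sketch' is a hypothetical helper.
jaccard_sketch <- function(regmat) {
  regmat <- as.matrix(regmat)
  both <- crossprod(regmat)               # regions with both species
  n <- colSums(regmat)                    # range sizes
  either <- outer(n, n, "+") - both       # regions with at least one species
  1 - both / either
}

regmat <- cbind(sp1 = c(1, 1, 0, 0),
                sp2 = c(1, 0, 1, 0),
                sp3 = c(1, 1, 0, 0))
d <- jaccard_sketch(regmat)
d
```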
Value
A symmetrical matrix of Jaccard distances.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Shi, G. R. (1993) Multivariate data analysis in palaeoecology and palaeobiogeography - a review. Palaeogeography, Palaeoclimatology, Palaeoecology 105, 199-234.
See Also
Examples
options(digits=4)
data(kykladspecreg)
jaccard(t(kykladspecreg))
Kulczynski distance matrix
Description
Computes Kulczynski distances between the columns of a 0-1-matrix.
Usage
kulczynski(regmat)
Arguments
regmat |
0-1-matrix. Columns are species, rows are regions. |
Details
The Kulczynski distance between two species is 1-(mean of (number of regions where both species are present)/(number of regions where species 1 is present) and (number of regions where both species are present)/(number of regions where species 2 is present)). The similarity version of this is S28 in Shi (1993).
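A base R sketch of this definition is given here for illustration. The helper kulczynski_sketch and the toy matrix are hypothetical and not taken from the package.

```r
# Kulczynski distance between the columns of a 0-1 matrix (rows:
# regions, columns: species). 'kulczynski_sketch' is hypothetical.
kulczynski_sketch <- function(regmat) {
  regmat <- as.matrix(regmat)
  both <- crossprod(regmat)       # regions with both species
  n <- colSums(regmat)            # range sizes
  ratio <- both / n               # entry [i, j] is both[i, j] / n[i]
  1 - 0.5 * (ratio + t(ratio))    # t(ratio)[i, j] is both[i, j] / n[j]
}

regmat <- cbind(sp1 = c(1, 1, 0, 0, 1),
                sp2 = c(1, 0, 1, 0, 0))
d <- kulczynski_sketch(regmat)
d
```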
Value
A symmetrical matrix of Kulczynski distances.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Shi, G. R. (1993) Multivariate data analysis in palaeoecology and palaeobiogeography - a review. Palaeogeography, Palaeoclimatology, Palaeoecology 105, 199-234.
See Also
jaccard
, geco
,qkulczynski
,
dicedist
Examples
options(digits=4)
data(kykladspecreg)
kulczynski(t(kykladspecreg))
Snail presence-absence data from Aegean sea
Description
0-1-matrix where rows are snail species and columns are islands in the Aegean sea. An entry of 1 means that the species is present in the region.
Usage
data(kykladspecreg)
Format
A 0-1 matrix with 80 rows and 34 columns.
Details
Reads from example data file kykladspecreg.dat
.
Source
B. Hausdorf and C. Hennig (2005) The influence of recent geography, palaeogeography and climate on the composition of the fauna of the central Aegean Islands. Biological Journal of the Linnean Society 84, 785-795.
See Also
nb
provides neighborhood information about the 34
islands. waterdist
provides a geographical distance
matrix between the islands.
Examples
data(kykladspecreg)
Largest connectivity component
Description
Computes the size of the largest connectivity component of the graph
of ncol(distmat)
vertices with edges defined by the smallest
ne
distances.
Usage
lcomponent(distmat, ne = floor(3*ncol(distmat)/4))
Arguments
distmat |
symmetric distance matrix. |
ne |
integer. |
Value
list with components
lc |
size of the largest connectivity component. |
ne |
see above. |
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896.
See Also
Examples
data(kykladspecreg)
j <- jaccard(t(kykladspecreg))
lcomponent(j)
Visualises clusters of markers vs. species
Description
Given a clustering of individuals from prabclust
(as
generated in species delimitation) and a clustering of markers (for
example dominant markers of genetic loci), lociplots
visualises
the presence of markers against the clustering of individuals and
computes some statistics.
Usage
lociplots(indclust,locclust,locprab,lcluster,
symbols=NULL,brightest.grey=0.8,darkest.grey=0,
mdsdim=1:2)
Arguments
indclust |
|
locclust |
vector of integers. Clustering of markers/loci. |
locprab |
|
lcluster |
integer. Number of cluster in |
symbols |
vector of plot symbols. If |
brightest.grey |
numeric between 0 and 1. Brightest grey value used in plot for individuals with smallest marker percentage, see details. |
darkest.grey |
numeric between 0 and 1. Darkest grey value used in plot for individuals with highest marker percentage, see details. |
mdsdim |
vector of two integers. The two MDS variables taken from
|
Details
Plot and statistics are based on the individual marker percentage,
which is the percentage of markers present in an individual of the
markers belonging to cluster no. lcluster
. In the plot, the
grey value visualises the marker percentage.
Value
list with components
locfreq |
vector of individual marker percentages. |
locfreqmin |
vector of minimum individual marker percentages for
each cluster in |
locfreqmax |
vector of maximum individual marker percentages for
each cluster in |
locfreqmean |
vector of average individual marker percentages for
each cluster in |
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
See Also
Examples
options(digits=4)
data(veronica)
vei <- prabinit(prabmatrix=veronica[1:50,],distance="jaccard")
ppv <- prabclust(vei)
veloci <- prabinit(prabmatrix=veronica[1:50,],rows.are.species=FALSE)
velociclust <- prabclust(veloci,nnk=0)
lociplots(ppv,velociclust$clustering,veloci,lcluster=3)
Missing values statistics for matrix
Description
Computes column-wise and row-wise numbers of missing values.
Usage
nastats(amatrix, nastr="--")
Arguments
amatrix |
(any) matrix. |
nastr |
missing value indicator. |
Value
A list with components
narow |
vector of row-wise numbers of missing values. |
nacol |
vector of column-wise numbers of missing values. |
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
Examples
xx <- cbind(c(1,2,3),c(0,0,1),c(5,3,1))
nastats(xx,nastr=0)
Neighborhood list for Aegean islands
Description
List of neighboring islands for 34 Aegean islands.
Usage
data(nb)
Format
List with 34 components, all being vectors of integers (or
numeric(0)
in case of no neighbors) indicating the neighboring
islands.
Details
Reads from example data file nb.dat
.
Source
B. Hausdorf and C. Hennig (2005) The influence of recent geography, palaeogeography and climate on the composition of the fauna of the central Aegean Islands. Biological Journal of the Linnean Society 84, 785-795.
Examples
data(nb)
# nb <- list()
# for (i in 1:34)
# nb <- c(nb,list(scan(file="(path/)nb.dat",
# skip=i-1,nlines=1)))
Test of neighborhood list
Description
Tests a list of neighboring regions for proper format. Neighborhood is tested for being symmetrical. Causes an error if tests fail.
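The symmetry requirement can be illustrated with a small base R check. The helper is_symmetric_nb is hypothetical; unlike nbtest it returns TRUE or FALSE instead of raising an error, and it does not validate the range of the indices.

```r
# Check that a neighborhood list is symmetric: if j is listed as a
# neighbor of i, then i must be listed as a neighbor of j.
# 'is_symmetric_nb' is a hypothetical helper.
is_symmetric_nb <- function(nblist) {
  for (i in seq_along(nblist)) {
    for (j in nblist[[i]]) {
      if (!(i %in% nblist[[j]])) return(FALSE)
    }
  }
  TRUE
}

nb_ok  <- list(c(2, 3), 1, 1, numeric(0))           # region 4 is isolated
nb_bad <- list(c(2, 3), 1, numeric(0), numeric(0))  # 3 does not list 1
ok <- is_symmetric_nb(nb_ok)
bad <- is_symmetric_nb(nb_bad)
c(ok = ok, bad = bad)
```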
Usage
nbtest(nblist, n.regions=length(nblist))
Arguments
nblist |
A list with a component for
every region. The
components are vectors of integers indicating
neighboring regions. A region without neighbors (e.g., an island)
should be assigned a vector |
n.regions |
Number of regions. |
Value
invisible(TRUE)
.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
See Also
Examples
data(nb)
nbtest(nb)
nb[[1]][1] <- 1
try(nbtest(nb))
Mean distance to kth nearest neighbor
Description
Computes the mean of the distances from each point to its ne
th
nearest neighbor.
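The statistic can be sketched in base R as follows. The helper mean_knn_dist and the toy data are hypothetical and only illustrate the definition.

```r
# Mean distance from each point to its k-th nearest neighbor.
# 'mean_knn_dist' is a hypothetical helper.
mean_knn_dist <- function(distmat, k = 1) {
  distmat <- as.matrix(distmat)
  kth <- sapply(seq_len(nrow(distmat)), function(i) {
    sort(distmat[i, -i])[k]     # drop the distance of point i to itself
  })
  mean(kth)
}

x <- c(0, 1, 3, 7)              # toy data on a line
m1 <- mean_knn_dist(as.matrix(dist(x)), k = 1)
m2 <- mean_knn_dist(as.matrix(dist(x)), k = 2)
c(k1 = m1, k2 = m2)
```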
Usage
nn(distmat, ne = 1)
Arguments
distmat |
symmetric distance matrix (not a |
ne |
integer. |
Value
numerical.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896.
See Also
Examples
data(kykladspecreg)
j <- jaccard(t(kykladspecreg))
nn(j,4)
Nearest neighbor based clutter/noise detection
Description
Detects if data points are noise or part of a cluster, based on a Poisson process model.
Usage
NNclean(data, k, distances = NULL, edge.correct = FALSE, wrap = 0.1,
convergence = 0.001, plot=FALSE, quiet=TRUE)
## S3 method for class 'nnclean'
print(x, ...)
Arguments
data |
numerical matrix or data frame. |
k |
integer. Number of considered nearest neighbors per point. |
distances |
distance matrix object of class |
edge.correct |
logical. If |
wrap |
numerical. If |
convergence |
numerical. Convergence criterion for EM-algorithm. |
plot |
logical. If |
quiet |
logical. If |
x |
object of class |
... |
necessary for print methods. |
Details
The assumption is that the noise is distributed as a homogeneous Poisson process on a certain region and the clusters are distributed as a homogeneous Poisson process with larger intensity on a subregion (disconnected in case of more than one cluster). The distances are then distributed according to a mixture of two transformed Gamma distributions, and this mixture is estimated via the EM-algorithm. The points are assigned to noise or cluster component by use of the estimated a posteriori probabilities.
Value
NNclean
returns a list of class nnclean
with components
z |
0-1-vector of length of the number of data points. 1 means cluster, 0 means noise. |
probs |
vector of estimated a posteriori probabilities for each point to belong to the cluster component. |
k |
see above. |
lambda1 |
intensity parameter of cluster component. |
lambda2 |
intensity parameter of noise component. |
p |
estimated probability of cluster component. |
kthNND |
distance to kth nearest neighbor. |
Note
The software can be freely used for non-commercial purposes, and can be freely distributed for non-commercial purposes only.
Author(s)
R-port by Christian Hennig
christian.hennig@unibo.it
https://www.unibo.it/sitoweb/christian.hennig/en,
original Splus package by S. Byers and A. E. Raftery.
References
Byers, S. and Raftery, A. E. (1998) Nearest-Neighbor Clutter Removal for Estimating Features in Spatial Point Processes, Journal of the American Statistical Association, 93, 577-584.
Examples
library(mclust)
data(chevron)
nnc <- NNclean(chevron[,2:3],15,plot=TRUE)
plot(chevron[,2:3],col=1+nnc$z)
Distances between communities, auxiliary functions
Description
Auxiliary functions for communitydist
. phipt
computes phiPT/phiST (Peakall and Smouse, 2012, Meirmans,
2006) between two communities. cfchord
computes the
chord-distance (Cavalli-Sforza and Edwards, 1967) between two lists of
locus-wise relative allele frequencies. shared.problist
computes a straightforward generalisation of the shared allele
distance (Bowcock et al., 1994) between
individuals for communities, namely the ‘overlap’, i.e., sum of the
minima of the
allele relative frequencies. diploidcomlist
constructs the
input lists for cfchord
and shared.problist
from an
alleleobject
. It provides relative frequencies for all
alleles of all loci in all communities.
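The 'overlap' idea can be illustrated in base R on two lists of locus-wise relative allele frequencies. The helper overlap_distance is hypothetical. Matching alleles by vector names, treating absent alleles as frequency 0, and turning the overlap into a distance by taking one minus its average over loci are assumptions of this sketch, not a description of the internals of shared.problist.

```r
# One minus the locus-wise overlap (sum of minima of relative allele
# frequencies), averaged over loci. 'overlap_distance' is hypothetical;
# alleles are matched by vector names, which is an assumption.
overlap_distance <- function(p1, p2) {
  per_locus <- mapply(function(f1, f2) {
    alleles <- union(names(f1), names(f2))
    g1 <- f1[alleles]; g1[is.na(g1)] <- 0   # absent allele: frequency 0
    g2 <- f2[alleles]; g2[is.na(g2)] <- 0
    sum(pmin(g1, g2))
  }, p1, p2)
  1 - mean(per_locus)
}

# Two communities, two loci each (toy frequencies).
com1 <- list(c(A = 0.5, B = 0.5), c(X = 1))
com2 <- list(c(A = 0.25, C = 0.75), c(X = 0.4, Y = 0.6))
od <- overlap_distance(com1, com2)
od
```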
Usage
phipt(alleleobj,comvector,i,j)
cfchord(p1,p2)
shared.problist(p1,p2)
diploidcomlist(alleleobj,comvector,diploid=TRUE)
Arguments
alleleobj |
if |
comvector |
vector of integers indicating to which community an individual belongs. |
i |
integer. Number of community. |
j |
integer. Number of community. The phiPT-distance is computed
between the communities numbered |
p1 |
list. Every list entry refers to a locus and is a vector of relative frequencies of the alleles present in that locus in a community. |
p2 |
list. Every list entry refers to a locus and is a vector of
relative frequencies of the alleles present in that locus in a
community. The chord or shared allele distance is computed between
the communities encoded by |
diploid |
logical, indicating whether loci are diploid, see
|
Value
cfchord
gives out the value of the chord
distance. shared.problist
gives out the distance
value. diploidcomlist
gives out a two-dimensional list. The
list has one entry for each community, which is itself a list. This
community list has one entry for each locus, which is a vector that
gives the relative frequencies of the different alleles in that locus.
phipt
gives out a list with components phipt, vap, n0,
sst, ssg, msa, msw
. These refer to the notation on p.2.12 and 2.15 of
Peakall and Smouse (2012).
phipt |
value of phiPT. |
vap |
variance among (between) populations (communities). |
n0 |
standardisation factor N0, see p.2.12 of Peakall and Smouse (2012). |
sst |
total distances sum of squares. |
ssg |
vector with two non- |
msa |
mean square between communities. |
msw |
mean square within communities. |
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Bowcock, A. M., Ruiz-Linares, A., Tomfohrde, J., Minch, E., Kidd, J. R., Cavalli-Sforza, L. L. (1994) High resolution of human evolutionary trees with polymorphic microsatellites. Nature 368, 455-457.
Cavalli-Sforza, L. L. and Edwards, A. W. F. (1967) Phylogenetic Analysis - Models and Estimation Procedures. The American Journal of Human Genetics 19, 233-257.
Meirmans, P. G. (2006) Using the AMOVA framework to estimate a standardized genetic differentiation measure. Evolution 60, 2399-2402.
Peakall, R. and Smouse P.E. (2012) GenAlEx Tutorial 2. https://biology-assets.anu.edu.au/GenAlEx/Tutorials.html
See Also
Examples
options(digits=4)
data(tetragonula)
tnb <-
coord2dist(coordmatrix=tetragonula.coord[83:120,],cut=50,file.format="decimal2",neighbors=TRUE)
ta <- alleleconvert(strmatrix=tetragonula[83:120,])
tai <- alleleinit(allelematrix=ta,neighborhood=tnb$nblist)
tetracoms <-
c(rep(1:3,each=3),4,5,rep(6:11,each=2),12,rep(13:19,each=2))
phipt(tai,tetracoms,4,6)
tdip <- diploidcomlist(tai,tetracoms,diploid=TRUE)
cfchord(tdip[[4]],tdip[[6]])
shared.problist(tdip[[4]],tdip[[6]])
Piecewise linear transformation for distance matrices
Description
Piecewise linear transformation for distance matrices, utility
function for geco
.
Usage
piecewiselin(distmatrix, maxdist=0.1*max(distmatrix))
Arguments
distmatrix |
symmetric (non-negative) distance matrix. |
maxdist |
non-negative numeric. Larger distances are transformed to constant 1. |
Details
Transforms large distances to 1 and 0 to 0, and is continuous and linear between 0 and
maxdist
.
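The transformation described above amounts to dividing by maxdist and capping at 1. The helper piecewise_sketch is hypothetical and only illustrates this in base R.

```r
# Piecewise linear transformation: linear from 0 to 'maxdist',
# constant 1 for larger distances. 'piecewise_sketch' is hypothetical.
piecewise_sketch <- function(distmatrix, maxdist = 0.1 * max(distmatrix)) {
  distmatrix <- as.matrix(distmatrix)
  out <- distmatrix / maxdist
  out[out > 1] <- 1             # cap large distances at 1
  out
}

d <- as.matrix(dist(c(0, 1, 2, 10)))   # toy distances
pd <- piecewise_sketch(d, maxdist = 4)
pd
```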
Value
A symmetrical matrix.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
See Also
Examples
options(digits=4)
data(waterdist)
piecewiselin(waterdist)
Plots for within-groups and between-groups distance regression
Description
Visualisation of various regressions on distance (or dissimilarity) data where objects are from two groups.
Usage
plotdistreg(dmx,dmy,grouping,groups=levels(as.factor(grouping))[1:2],
cols=c(1,2,3,4),
pchs=rep(1,3),
ltys=c(1,2,1,2),
individual=TRUE,jointwithin=TRUE,jointall=TRUE,
oneplusjoint=TRUE,jittering=TRUE,bcenterline=TRUE,
xlim=NULL,ylim=NULL,xlab="geographical distance",
ylab="genetic distance",...)
Arguments
dmx |
dissimilarity matrix or object of class
|
dmy |
dissimilarity matrix or object of class
|
grouping |
something that can be coerced into a factor,
defining the grouping of
objects represented by the dissimilarities |
groups |
Vector of two levels. The two groups defining the
regressions to be compared in the test. These can be
factor levels, integer numbers, or strings, depending on the entries
of |
cols |
vector of four colors (or color numbers) to be used for
plotting distances
and regression lines within the first group, within the second group,
distances between groups, and a line marking the center of the
between-groups explanatory distances, see |
pchs |
vector of three plot symbols (or numbers) to be used for
plotting distances within the first group, within the second group,
and distances between groups, see |
ltys |
vector of line type numbers to be used for single group
within-group regression, both groups combined within-group
regression, regression with all distances, and regression combining
within-groups distances of one group with between-groups distances,
see |
individual |
if |
jointwithin |
if |
jointall |
if |
oneplusjoint |
if |
jittering |
if |
bcenterline |
if |
xlim |
to be passed on to |
ylim |
to be passed on to |
xlab |
to be passed on to |
ylab |
to be passed on to |
... |
optional arguments to be passed on to |
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Hausdorf, B. and Hennig, C. (2019) Species delimitation and geography. Submitted.
See Also
regeqdist, regdistbetween, regdistbetweenone, regdistdiffone
Examples
options(digits=4)
data(veronica)
ver.geo <- coord2dist(coordmatrix=veronica.coord[173:207,],file.format="decimal2")
vei <- prabinit(prabmatrix=veronica[173:207,],distance="jaccard")
species <-c(rep(1,13),rep(2,22))
loggeo <- log(ver.geo+quantile(as.vector(as.dist(ver.geo)),0.25))
plotdistreg(dmx=loggeo,dmy=vei$distmat,grouping=species,
jointwithin=FALSE,jointall=FALSE,groups=c(1,2))
legend(5,0.75,c("within species 1",
"within species 2","species 1 and between","species 2 and between"),lty=c(1,1,2,2),col=c(1,2,1,2))
plotdistreg(dmx=loggeo,dmy=vei$distmat,grouping=species,
jointwithin=TRUE,jointall=TRUE,oneplusjoint=FALSE,groups=c(1,2))
legend(5,0.75,c("within species 1",
"within species 2","all distances","all within species"),lty=c(1,1,1,2),col=c(1,2,3,3))
p-value simulation for presence-absence matrices clustering test
Description
Parametric bootstrap simulation of the p-value of a test of a homogeneity hypothesis against clustering (or significant nestedness). Designed for use within prabtest. The null model is defined by randpop.nb.
Usage
pop.sim(regmat, neighbors, h0c = 1, times = 200, dist = "kulczynski",
teststat = "isovertice", testc = NULL, geodist=NULL, gtf=0.1,
n.species = ncol(regmat),
specperreg = NULL, regperspec = NULL, species.fixed=FALSE, pdfnb=FALSE,
ignore.richness=FALSE)
Arguments
regmat |
0-1-matrix. Columns are species, rows are regions. |
neighbors |
A list with a component for every region. The
components are vectors of integers indicating
neighboring regions. A region without neighbors (e.g., an island)
should be assigned a list |
h0c |
numerical. Parameter |
times |
integer. Number of simulation runs. |
dist |
"kulczynski", "jaccard" or "geco", see |
teststat |
"isovertice", "lcomponent", "distratio", "nn" or
"inclusions". See
the corresponding functions, |
testc |
numerical. Tuning constant for the test statistics. |
geodist |
matrix of non-negative reals. Geographical distances
between regions. Only used if |
gtf |
tuning constant for geco-distance if |
n.species |
integer. Number of species. |
specperreg |
vector of integers. Numbers of species per region (is calculated from the data by default). |
regperspec |
vector of integers. Number of regions per species (is calculated from the data by default). |
species.fixed |
logical. If |
pdfnb |
logical. Probability correction in |
ignore.richness |
logical. If |
Value
List with components
results |
vector of test statistic values for the simulated matrices. |
p.above |
p-value if large test statistic leads to rejection. |
p.below |
p-value if small test statistic leads to rejection. |
datac |
test statistic value for the original data. |
testc |
see above. |
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896. http://stat.ethz.ch/Research-Reports/110.html.
Hausdorf, B. and Hennig, C. (2003) Biotic Element Analysis in Biogeography. Systematic Biology 52, 717-723.
Hausdorf, B. and Hennig, C. (2003) Nestedness of north-west European land snail ranges as a consequence of differential immigration from Pleistocene glacial refuges. Oecologia 135, 102-109.
See Also
prabtest, randpop.nb, jaccard, kulczynski, homogen.test, lcomponent, distratio, nn, incmatrix.
Examples
options(digits=4)
data(kykladspecreg)
data(nb)
set.seed(1234)
pop.sim(t(kykladspecreg), nb, times=5, h0c=0.35, teststat="nn", testc=3)
Estimates SAR model from log-abundance matrix of prab-object.
Description
This is either an interface for the function errorsarlm for abundance data stored in an object of class prab, implemented for use in abundtest, or, if spatial information is to be ignored, it estimates a two-way additive unreplicated linear model for log-abundances on the factors species and region.
Usage
prab.sarestimate(abmat, prab01=NULL,sarmethod="eigen",
weightstyle="C",
quiet=TRUE, sar=TRUE,
add.lmobject=TRUE)
Arguments
abmat |
object of class |
prab01 |
presence-absence matrix of the same dimensions as the
abundance matrix of |
sarmethod |
this is passed on as parameter |
weightstyle |
can take values "W", "B", "C", "U", and "S" though tests
suggest that "C" should be chosen. See |
quiet |
this is passed on as parameter |
sar |
logical. If |
add.lmobject |
logical. If |
Value
A list with the following components:
sar |
see above. |
intercept |
numeric. Estimator of the intercept. |
sigma |
numeric. Estimator of error standard deviation. |
regeffects |
numeric vector. Estimator for region effects. |
speceffects |
numeric vector. Estimator for species effects. |
lamda |
numeric. Governs the degree of spatial
autocorrelation. See |
size |
integer. Length of neighborhood list generated by
|
nbweight |
numeric. Average weight of neighbors. |
lmobject |
if |
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
See Also
Examples
options(digits=4)
data(siskiyou)
x <- prabinit(prabmatrix=siskiyou, neighborhood=siskiyou.nb,
distance="none")
# Not run; this needs package spdep
# prab.sarestimate(x)
prab.sarestimate(x, sar=FALSE)
Clustering for biotic elements or for species delimitation (mixture method)
Description
Clusters a presence-absence matrix object (for clustering ranges/finding biotic elements, Hennig and Hausdorf, 2004) or an object of genetic information (for species delimitation, Hausdorf and Hennig, 2010) by calculating an MDS from the distances and applying maximum likelihood Gaussian mixture clustering with "noise" (package mclust) to the MDS points. The solution is plotted. A standard execution (using the default distance of prabinit) will be
prabmatrix <- prabinit(file="path/prabmatrixfile",
neighborhood="path/neighborhoodfile")
clust <- prabclust(prabmatrix)
print(clust)
Examples for species delimitation are given below in the examples section.
Note: Data formats are described on the prabinit and alleleinit help pages. You may also consider the example datasets kykladspecreg.dat, nb.dat, Heterotrigona_indoFO.txt or MartinezOrtega04AFLP.dat.
Note: prabclust calls the function mclustBIC in package mclust. An alternative is the use of hprabclust.
Usage
prabclust(prabobj, mdsmethod = "classical", mdsdim = 4, nnk =
ceiling(prabobj$n.species/40), nclus = 0:9, modelid = "all", permutations=0)
## S3 method for class 'prabclust'
print(x, bic=FALSE, ...)
Arguments
prabobj |
object of class |
mdsmethod |
|
mdsdim |
integer. Dimension of the MDS points. For
|
nnk |
integer. Number of nearest neighbors to determine the
initial noise estimation by |
nclus |
vector of integers. Numbers of clusters to perform the mixture estimation. |
modelid |
string. Model name for |
permutations |
integer. It has been found occasionally that
depending on the order of observations the algorithms |
x |
object of class |
bic |
logical. If |
... |
necessary for print method. |
Details
Note that if mdsmethod!="classical", zero distances between non-identical objects are replaced by the smallest nonzero distance divided by 10 to prevent the MDS methods from producing an error.
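The replacement rule for zero distances can be illustrated with a short base-R sketch. It assumes that every off-diagonal entry refers to a pair of non-identical objects; the helper name replace_zero_dists is hypothetical and does not exist in prabclus.

```r
# Hypothetical helper illustrating the rule described above.
# Assumption: all off-diagonal entries belong to non-identical objects.
replace_zero_dists <- function(distmat) {
  off_diag <- row(distmat) != col(distmat)
  # smallest strictly positive off-diagonal distance
  min_nonzero <- min(distmat[off_diag & distmat > 0])
  distmat[off_diag & distmat == 0] <- min_nonzero / 10
  distmat
}

d <- matrix(c(0,   0,   0.4,
              0,   0,   0.2,
              0.4, 0.2, 0), nrow = 3, byrow = TRUE)   # made-up distances
fixed <- replace_zero_dists(d)
fixed
```

The diagonal stays at zero, and only the off-diagonal zeros are lifted to a small positive value.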
Value
print.prabclust does not produce output.
prabclust generates an object of class prabclust. This is a list with components
clustering |
vector of integers indicating the cluster memberships of
the species. Noise can be recognized by output component |
clustsummary |
output object of |
bicsummary |
output object of |
points |
numerical matrix. MDS configuration. |
nnk |
see above. |
mdsdim |
see above. |
mdsmethod |
see above. |
symbols |
vector of characters, similar to |
permchange |
logical. If |
Note
Note that we used mdsmethod="kruskal" in our publications, but mdsmethod="classical" is now the default, because of occasional numerical instabilities of the isoMDS-implementation for Jaccard, Kulczynski or geco distance matrices.
Sometimes, prabclust produces an error because mclustBIC cannot handle all models properly. In this case we recommend changing the modelid parameter. "noVVV" and "VVV" are reasonable alternative choices (one of these is expected to reproduce the error, but the other one might work).
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Fraley, C. and Raftery, A. E. (1998) How many clusters? Which clustering method? - Answers via Model-Based Cluster Analysis. Computer Journal 41, 578-588.
Hausdorf, B. and Hennig, C. (2010) Species Delimitation Using Dominant and Codominant Multilocus Markers. Systematic Biology, 59, 491-503.
Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896. http://stat.ethz.ch/Research-Reports/110.html.
See Also
mclustBIC, summary.mclustBIC, NNclean, cmdscale, isoMDS, sammon, prabinit, hprabclust, alleleinit, stressvals.
Examples
# Biotic element/range clustering:
data(kykladspecreg)
data(nb)
set.seed(1234)
x <- prabinit(prabmatrix=kykladspecreg, neighborhood=nb)
# If you want to use your own ASCII data files, use
# x <- prabinit(file="path/prabmatrixfile",
# neighborhood="path/neighborhoodfile")
print(prabclust(x))
# Here is an example for species delimitation with codominant markers;
# only 50 individuals were used in order to have a fast example.
data(tetragonula)
ta <- alleleconvert(strmatrix=tetragonula[1:50,])
tai <- alleleinit(allelematrix=ta)
print(prabclust(tai))
# Here is an example for species delimitation with dominant markers;
# only 50 individuals were used in order to have a fast example.
# You may want to use stressvals to choose mdsdim.
data(veronica)
vei <- prabinit(prabmatrix=veronica[1:50,],distance="jaccard")
print(prabclust(vei,mdsmethod="kruskal",mdsdim=3))
Presence-absence/abundance matrix initialization
Description
prabinit converts a matrix into an object of class prab (presence-absence). The matrix may be read from a file or an R-object. It may be a 0-1 matrix or a matrix with non-negative entries (usually abundances). print.prab is a print method for such objects.
Documentation here is in terms of biotic elements analysis (species are to be clustered). For species delimitation with dominant markers (see Hausdorf and Hennig, 2010), individuals take the role of species and loci take the role of regions.
Usage
prabinit(file = NULL, prabmatrix = NULL, rows.are.species = TRUE,
neighborhood = "none", nbbetweenregions=TRUE, geodist=NULL, gtf=0.1,
distance = "kulczynski", toprab = FALSE, toprabp = 0.05, outc = 5.2)
## S3 method for class 'prab'
print(x, ...)
Arguments
file |
string. non-negative matrix ASCII file (such as example dataset
|
prabmatrix |
matrix with non-negative entries. Either |
rows.are.species |
logical. If |
neighborhood |
A string or a list with a component for
every region. The
components are vectors of integers indicating
neighboring regions. A region without neighbors (e.g., an island)
should be assigned a vector |
nbbetweenregions |
logical. If |
geodist |
matrix of non-negative reals. Geographical distances
between regions. Only used if |
gtf |
tuning constant for geco-distance if |
distance |
|
toprab |
logical. If |
toprabp |
numerical between 0 and 1, see |
outc |
numerical. Tuning constant for the outlier identification
associated with |
x |
object of class |
... |
necessary for print method. |
Details
Species that are absent in all regions are omitted.
Value
prabinit produces an object of class prab, which is a list with components
distmat |
distance matrix between species. |
prab |
abundance or presence/absence matrix (if presence/absence, the entries are logical). Rows are regions, columns are species. |
nb |
neighborhood list, see above. |
regperspec |
vector of the number of regions occupied by a species. |
specperreg |
vector of the number of species present in a region. |
n.species |
number of species (in the |
n.regions |
number of regions. |
distance |
string denoting the chosen distance measure. |
geodist |
non-negative matrix. see above. |
gtf |
numeric. see above. |
spatial |
|
nonempty.species |
logical vector. The length is the number of species
in the original file/matrix. If |
nbbetweenregions |
see above. |
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Hausdorf, B. and Hennig, C. (2010) Species Delimitation Using Dominant and Codominant Multilocus Markers. Systematic Biology, 59, 491-503.
See Also
read.table, jaccard, kulczynski, geco, qkulczynski, nbtest, alleleinit
Examples
# If you want to use your own ASCII data files, use
# x <- prabinit(file="path/prabmatrixfile",
# neighborhood="path/neighborhoodfile")
data(kykladspecreg)
data(nb)
prabinit(prabmatrix=kykladspecreg, neighborhood=nb)
Parametric bootstrap test for clustering in presence-absence matrices
Description
Parametric bootstrap test of a null model of i.i.d., but spatially autocorrelated species against clustering of the species' occupied areas (or alternatively nestedness). Despite the many parameters, a standard execution (for the default test statistics, see parameter teststat below) will be
prabmatrix <- prabinit(file="path/prabmatrixfile",
neighborhood="path/neighborhoodfile")
test <- prabtest(prabmatrix)
summary(test)
Note: Data formats are described on the prabinit help page. You may also consider the example datasets kykladspecreg.dat and nb.dat. Pay attention to the parameter rows.are.species of prabinit.
Usage
prabtest(prabobject, teststat = "distratio", tuning = switch(teststat,
distratio = 0.25, lcomponent = floor(3 * ncol(prabobject$distmat)/4),
isovertice = ncol(prabobject$distmat), nn = 4, NA), times = 1000,
pd = NULL, prange = c(0, 1), nperp = 4, step = 0.1, step2=0.01,
twostep = TRUE,
sf.sim = FALSE, sf.const = sf.sim, pdfnb = FALSE, ignore.richness=FALSE)
## S3 method for class 'prabtest'
summary(object, above.p=object$teststat %in%
c("groups","inclusions","mean"),
group.outmean=FALSE,...)
## S3 method for class 'summary.prabtest'
print(x, ...)
Arguments
prabobject |
an object of class |
teststat |
string, indicating the test statistics. |
tuning |
integer or (if |
times |
integer. Number of simulation runs. |
pd |
numerical between 0 and 1. The probability that a new
region is drawn from the non-neighborhood of the previous regions
belonging to a species under generation. If |
prange |
numerical range vector, lower value not smaller than 0, larger
value not larger than 1. Range where |
nperp |
integer. Number of simulations per |
step |
numerical between 0 and 1. Interval length between
subsequent choices of |
step2 |
numerical between 0 and 1. Interval length between
subsequent choices of |
twostep |
logical. If |
sf.sim |
logical. Indicates if the range sizes of the species
are held fixed
in the test simulation ( |
sf.const |
logical. Same as |
pdfnb |
logical. If |
ignore.richness |
logical. If |
object |
object of class |
above.p |
logical. |
group.outmean |
logical. If |
x |
object of class |
... |
no meaning, necessary for print and summary methods. |
Details
From the original data, the distribution of the range sizes of the species, the autocorrelation parameter pd (estimated by autoconst) and the distribution on the regions induced by the relative species numbers are taken. With these parameters, times populations according to the null model implemented in randpop.nb are generated and the test statistic is evaluated. The resulting p-value is (number of simulated statistic values more extreme than the value of the original data + 1) divided by (times + 1). "More extreme" means smaller for "lcomponent", "distratio", "nn", larger for "inclusions", and twice the smaller number between the original statistic value and the "border", i.e., a two-sided test for "isovertice".
If pd=NA was specified, a diagnostic plot for the estimation of pd is plotted by autoconst.
For details see Hennig and Hausdorf (2004) and the help pages of the cited functions.
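As a rough illustration of the one-sided p-value rule described above, the following base-R sketch computes (count + 1)/(times + 1) from a vector of simulated statistic values. The helper name sim_p_value is hypothetical, the numbers are made up, and counting ties as "more extreme" is an assumption that may differ from the package's code.

```r
# Hypothetical helper; ties are counted as "more extreme" (an assumption).
sim_p_value <- function(sim_stats, observed, smaller_is_extreme = TRUE) {
  extreme <- if (smaller_is_extreme) {
    sim_stats <= observed
  } else {
    sim_stats >= observed
  }
  (sum(extreme) + 1) / (length(sim_stats) + 1)
}

sim <- c(0.42, 0.55, 0.61, 0.38, 0.70)   # made-up simulated statistic values
# small values lead to rejection, as for "distratio"
p_small <- sim_p_value(sim, observed = 0.40)
# large values lead to rejection, as for "inclusions"
p_large <- sim_p_value(sim, observed = 0.40, smaller_is_extreme = FALSE)
c(p_small, p_large)
```

Adding 1 to numerator and denominator means the p-value can never be exactly zero, whatever the number of simulation runs.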
Value
prabtest produces an object of class prabtest, which is a list with components
results |
vector of test statistic values for all simulated populations. |
datac |
test statistic value for the original data. |
p.value |
the p-value. |
tuning |
see above. |
pd |
see above. |
reg |
regression coefficients from |
teststat |
see above. |
distance |
the distance measure chosen, see |
gtf |
the geco-distance tuning parameter (only informative if
|
times |
see above. |
pdfnb |
see above. |
ignore.richness |
see above. |
summary.prabtest produces an object of class summary.prabtest, which is a list with components
rrange |
range of the simulation results (test statistic values)
of |
rmean |
mean of the simulation results (test statistic values)
of |
datac , p.value , pd , tuning , teststat , distance , times , pdfnb , abund , sarlambda |
directly
taken from |
groupinfo |
if |
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896. http://stat.ethz.ch/Research-Reports/110.html.
Hausdorf, B. and Hennig, C. (2003) Biotic Element Analysis in Biogeography. Systematic Biology 52, 717-723.
Hausdorf, B. and Hennig, C. (2003) Nestedness of north-west European land snail ranges as a consequence of differential immigration from Pleistocene glacial refuges. Oecologia 135, 102-109.
See Also
prabinit generates objects of class prab.
autoconst estimates pd from such objects.
randpop.nb generates populations from the null model. An alternative model is given by cluspop.nb.
Some more information on the test statistics is given in homogen.test, lcomponent, distratio, nn, incmatrix.
The simulations are computed by pop.sim.
Examples
options(digits=4)
data(kykladspecreg)
data(nb)
set.seed(1234)
x <- prabinit(prabmatrix=kykladspecreg, neighborhood=nb)
# If you want to use your own ASCII data files, use
# x <- prabinit(file="path/prabmatrixfile",
# neighborhood="path/neighborhoodfile")
kpt <- prabtest(x, times=5, pd=0.35)
# These settings are chosen to make the example execution
# a bit faster; usually you will use prabtest(x).
summary(kpt)
Quantitative Kulczynski distance matrix
Description
Computes quantitative Kulczynski distances between the columns of an abundance matrix.
Usage
qkulczynski(regmat, log.distance=FALSE)
Arguments
regmat |
(non-negative) abundance matrix. Columns are species, rows are regions. |
log.distance |
logical. If |
Details
The quantitative Kulczynski distance between two species is 1 minus the mean of the two ratios (sum over regions of the minimum abundance of both species)/(sum of abundances of species 1) and (sum over regions of the minimum abundance of both species)/(sum of abundances of species 2). If the abundance matrix is a 0-1-matrix, this gives the standard Kulczynski distance.
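The definition can be made concrete with a small base-R sketch. It reads the numerator as the sum over regions of the minimum abundance of the two species, which is the reading under which a 0-1-matrix gives the standard Kulczynski distance. The helper name qkulczynski_sketch and the abundances are made up for illustration; this is not the package's implementation.

```r
# Hypothetical helper, not the package's implementation.
# Assumes every species (column) has a positive total abundance.
qkulczynski_sketch <- function(regmat) {
  nspec <- ncol(regmat)
  totals <- colSums(regmat)
  out <- matrix(0, nspec, nspec)
  for (i in seq_len(nspec)) {
    for (j in seq_len(nspec)) {
      # abundance shared by species i and j, summed over regions
      shared <- sum(pmin(regmat[, i], regmat[, j]))
      out[i, j] <- 1 - 0.5 * (shared / totals[i] + shared / totals[j])
    }
  }
  out
}

# rows are regions, columns are species (made-up abundances)
ab <- cbind(c(2, 0, 1), c(1, 1, 0), c(0, 3, 0))
qk <- qkulczynski_sketch(ab)
qk
```

Species that share no region get distance 1, and the distance of a species to itself is 0.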
Value
A symmetrical matrix of quantitative Kulczynski distances.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
D. P. Faith, P. R. Minchin and L. Belbin (1987) Compositional dissimilarity as a robust measure of ecological distance. Vegetatio 69, 57-68.
See Also
Examples
options(digits=4)
data(kykladspecreg)
qkulczynski(t(kykladspecreg))
Simulation of presence-absence matrices (non-clustered)
Description
Generates a simulated matrix where the rows are interpreted as regions and the columns as species, 1 means that a species is present in the region and 0 means that the species is absent. Species are generated i.i.d. Spatial autocorrelation of a species' presences is governed by the parameter p.nb and a list of neighbors for each region.
Usage
randpop.nb(neighbors, p.nb = 0.5, n.species, n.regions =
length(neighbors), vector.species = rep(1, n.species),
species.fixed = FALSE, pdf.regions = rep(1/n.regions, n.regions),
count = TRUE, pdfnb = FALSE)
Arguments
neighbors |
A list with a component for every region. The
components are vectors of integers indicating
neighboring regions. A region without neighbors (e.g., an island)
should be assigned a list |
p.nb |
numerical between 0 and 1. The probability that a new
region is drawn from the non-neighborhood of the previous regions
belonging to a species under generation. Note that for a given
presence-absence matrix, this parameter can be estimated by
|
n.species |
integer. Number of species. |
n.regions |
integer. Number of regions. |
vector.species |
vector of integers. If
|
species.fixed |
logical. See |
pdf.regions |
numerical vector of length |
count |
logical. If |
pdfnb |
logical. If |
Details
Each species of a given size is generated region by region. The first region is drawn according to pdf.regions. For all following regions, a neighbor or non-neighbor of the previous configuration is added (if possible), as explained in the descriptions of pdf.regions and p.nb.
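The region-by-region principle can be sketched for a single species in base R. This is a simplified illustration and not the package's code: the helper name grow_species is hypothetical, candidates are assumed to be drawn with probabilities proportional to pdf.regions, and the other candidate set is assumed to be used whenever the chosen one is empty.

```r
# Hypothetical helper, not the package's implementation.
grow_species <- function(neighbors, size, p.nb = 0.5,
                         pdf.regions = rep(1 / length(neighbors),
                                           length(neighbors))) {
  n.regions <- length(neighbors)
  stopifnot(size >= 1, size <= n.regions)
  # draw one candidate; a length-one candidate set is returned as is
  pick <- function(cands) {
    if (length(cands) == 1) cands else sample(cands, 1, prob = pdf.regions[cands])
  }
  occupied <- pick(seq_len(n.regions))
  while (length(occupied) < size) {
    free <- setdiff(seq_len(n.regions), occupied)
    nbs <- intersect(unique(unlist(neighbors[occupied])), free)
    non_nbs <- setdiff(free, nbs)
    # with probability p.nb the next region is a non-neighbor
    use_non_nb <- runif(1) < p.nb
    cands <- if (use_non_nb) non_nbs else nbs
    # assumed fallback if the chosen candidate set is empty
    if (length(cands) == 0) cands <- if (use_non_nb) nbs else non_nbs
    occupied <- c(occupied, pick(cands))
  }
  sort(occupied)
}

# four regions on a line: 1 - 2 - 3 - 4
line_nb <- list(2, c(1, 3), c(2, 4), 3)
set.seed(1)
res <- grow_species(line_nb, size = 3, p.nb = 0)
res
```

With p.nb = 0 only neighbors are ever added, so the simulated range in this example is always a contiguous block of regions.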
Value
A 0-1-matrix, rows are regions, columns are species.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896. http://stat.ethz.ch/Research-Reports/110.html.
Hausdorf, B. and Hennig, C. (2003) Biotic Element Analysis in Biogeography. Systematic Biology 52, 717-723.
Hausdorf, B. and Hennig, C. (2003) Nestedness of north-west European land snail ranges as a consequence of differential immigration from Pleistocene glacial refuges. Oecologia 135, 102-109.
See Also
autoconst estimates p.nb from matrices of class prab. These are generated by prabinit.
prabtest uses randpop.nb as a null model for tests of clustering. An alternative model is given by cluspop.nb.
Examples
data(nb)
set.seed(2346)
randpop.nb(nb, p.nb=0.1, n.species=5, vector.species=c(1,10,20,30,34))
Regression between subsets of dissimilarity matrices
Description
Given two dissimilarity matrices dmx and dmy and an indicator vector x, this computes a standard least squares regression of the dmy-dissimilarities on the dmx-dissimilarities between the objects indicated in x.
Usage
regdist(x,dmx,dmy,xcenter=0,param)
Arguments
x |
vector of logicals of length of the number of objects on which
dissimilarities |
dmx |
dissimilarity matrix or object of class
|
dmy |
dissimilarity matrix or object of class
|
xcenter |
numeric. Dissimilarities |
param |
1 or 2 or |
Value
If param=NULL, the output object of lm. If param=1 the intercept. If param=2 the slope.
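The following base-R sketch shows one way such a regression on a subset of dissimilarities could be set up. It is an illustration under assumptions, not the package's code: the helper name regdist_sketch is hypothetical, xcenter is assumed to be subtracted from the explanatory dissimilarities before fitting, and the data are made up so that the response is exactly linear in the explanatory dissimilarities.

```r
# Hypothetical helper; assumes xcenter is subtracted from the
# explanatory dissimilarities before the regression is fitted.
regdist_sketch <- function(x, dmx, dmy, xcenter = 0, param = NULL) {
  # lower triangles of the sub-matrices for the selected objects
  dx <- as.vector(as.dist(as.matrix(dmx)[x, x])) - xcenter
  dy <- as.vector(as.dist(as.matrix(dmy)[x, x]))
  fit <- lm(dy ~ dx)
  if (is.null(param)) fit else unname(coef(fit)[param])
}

set.seed(1)
pts <- matrix(rnorm(12), ncol = 2)   # six made-up objects
mx <- as.matrix(dist(pts))
my <- 2 * mx + 1                     # response exactly linear in mx
diag(my) <- 0
keep <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE)
b0 <- regdist_sketch(keep, mx, my, param = 1)   # intercept
b1 <- regdist_sketch(keep, mx, my, param = 2)   # slope
c(b0, b1)
```

Only the dissimilarities among the four selected objects enter the fit, i.e. six pairs in this example.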
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Hausdorf, B. and Hennig, C. (2019) Species delimitation and geography. Submitted.
Examples
options(digits=4)
data(veronica)
ver.geo <- coord2dist(coordmatrix=veronica.coord[1:20,],file.format="decimal2")
vei <- prabinit(prabmatrix=veronica[1:20,],distance="jaccard")
regdist(c(rep(TRUE,10),rep(FALSE,10)),ver.geo,vei$distmat,param=1)
Testing equality of within-groups and between-groups distances regression
Description
Jackknife-based test for equality of two regressions between distances. Given two groups of objects, this tests whether the regression involving all distances is compatible with the regression involving within-group distances only.
Usage
regdistbetween(dmx,dmy,grouping,groups=levels(as.factor(grouping))[1:2])
## S3 method for class 'regdistbetween'
print(x,...)
Arguments
dmx |
dissimilarity matrix or object of class
|
dmy |
dissimilarity matrix or object of class
|
grouping |
something that can be coerced into a factor,
defining the grouping of
objects represented by the dissimilarities |
groups |
Vector of two levels. The two groups defining the
regressions to be compared in the test. These can be
factor levels, integer numbers, or strings, depending on the entries
of |
x |
object of class |
... |
optional arguments for print method. |
Details
The null hypothesis that the regressions based on all distances and based on within-group distances only are equal is tested using jackknife pseudovalues. This assumes that a single regression is appropriate at least for the within-group distances alone. The test statistic is the difference between fitted values with x (explanatory variable) fixed at the center of the between-group distances. The test is run one-sided, i.e., the null hypothesis is only rejected if the between-group distances are larger than expected under the null hypothesis, see below.
The test cannot be run in case that within-group regressions or jackknifed within-group regressions are ill-conditioned.
This was implemented having in mind an application in which the explanatory distances represent geographical distances, the response distances are genetic distances, and groups represent species or species-candidates. In this application, for testing whether the regression patterns are compatible with the two groups behaving like a single species, one would first use regeqdist to test whether a joint regression for the within-group distances of both groups makes sense. If this is not rejected, regdistbetween is run to see whether the between-group distances are compatible with the within-group distances. This is only rejected if the between-group distances are larger than expected under equality of regressions, because if they are smaller, this is not an indication against the groups belonging together genetically.
If a joint regression on within-group distances is rejected by regeqdist, regdistbetweenone can be used to test whether the between-group distances are at least compatible with the within-group distances of one of the groups, which can still be the case within a single species, see Hausdorf and Hennig (2019).
Value
list of class "regdistbetween" with components
pval |
p-value. |
coeffdiff |
difference between regression fits (all distances
minus within-group distances only) at |
condition |
condition numbers of regressions, see |
lmfit |
list. Output objects of |
jr |
output object of |
xcenter |
mean of within-groups distances of explanatory variable, used for centering. |
xcenterbetween |
mean of between-groups distances of explanatory
variable (after centering by |
tstat |
t-statistic. |
tdf |
degrees of freedom of t-statistic. |
jackest |
jackknife-estimator of difference between regression
fitted values at |
jackse |
jackknife-standard error for
|
jackpseudo |
vector of jackknife pseudovalues on which the test is based. |
testname |
title to be printed out when using
|
groups |
see above. |
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Hausdorf, B. and Hennig, C. (2019) Species delimitation and geography. Submitted.
See Also
Examples
options(digits=4)
data(veronica)
ver.geo <- coord2dist(coordmatrix=veronica.coord[173:207,],file.format="decimal2")
vei <- prabinit(prabmatrix=veronica[173:207,],distance="jaccard")
loggeo <- log(ver.geo+quantile(as.vector(as.dist(ver.geo)),0.25))
species <-c(rep(1,13),rep(2,22))
rtest2 <-
regdistbetween(dmx=loggeo,dmy=vei$distmat,grouping=species,groups=c(1,2))
print(rtest2)
Testing equality of one within-group and between-two groups distances regression
Description
Jackknife-based test for equality of two regressions between distances. Given two groups of objects, this tests whether the regression involving the distances within one of the groups is compatible with the regression involving the same within-group distances together with the between group distances.
Usage
regdistbetweenone(dmx,dmy,grouping,groups=levels(as.factor(grouping))[1:2],rgroup)
Arguments
dmx |
dissimilarity matrix or object of class
|
dmy |
dissimilarity matrix or object of class
|
grouping |
something that can be coerced into a factor,
defining the grouping of
objects represented by the dissimilarities |
groups |
vector of two levels. The two groups defining the
regressions to be compared in the test. These can be
factor levels, integer numbers, or strings, depending on the entries
of |
rgroup |
one of the levels in |
Details
The null hypothesis that the regressions based on the distances within group rgroup and based on these distances together with the between-groups distances are equal is tested using jackknife pseudovalues. The test statistic is the difference between fitted values with x (explanatory variable) fixed at the center of the between-group distances. The test is run one-sided, i.e., the null hypothesis is only rejected if the between-group distances are larger than expected under the null hypothesis, see below. For the jackknife, observations from both groups are left out one at a time. However, the roles of the two groups are different (observations from group rgroup are used in both regressions whereas observations from the other group are only used in one of them), and therefore the corresponding jackknife pseudovalues can have different variances. To take this into account, variances are pooled, and the degrees of freedom of the t-test are computed by the Welch-Satterthwaite approximation for aggregation of different variances.
The test cannot be run and many components will be NA in case that within-group regressions or jackknifed within-group regressions are ill-conditioned.
This was implemented having in mind an application in which the explanatory distances represent geographical distances, the response distances are genetic distances, and groups represent species or species-candidates. In this application, for testing whether the regression patterns are compatible with the two groups behaving like a single species, one would first use regeqdist to test whether a joint regression for the within-group distances of both groups makes sense. If this is not rejected, regdistbetween is run to see whether the between-group distances are compatible with the within-group distances.
If a joint regression on within-group distances is rejected by regeqdist, regdistbetweenone can be used to test whether the between-group distances are at least compatible with the within-group distances of one of the groups, which can still be the case within a single species, see Hausdorf and Hennig (2019). This null hypothesis is rejected only if the between-group distances are larger than expected under equality of regressions, because smaller between-group distances are not an indication against the groups belonging together genetically. To this end, regdistbetweenone needs to be run twice, using each of the two groups in turn as species. This will produce two p-values. The null hypothesis that the regressions are compatible for at least one group can be rejected if the maximum of the two p-values is smaller than the chosen significance level.
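The decision rule for the two runs can be written down in a few lines. The helper reject_compatibility below is hypothetical (it is not a prabclus function), and the p-values passed to it are placeholders chosen for illustration, not results from any data set.

```r
# Hypothetical helper, not part of prabclus.
# p1, p2: p-values from the two runs, each group used once as reference group.
# Returns TRUE if "compatible with at least one group" is rejected at level alpha.
reject_compatibility <- function(p1, p2, alpha = 0.05) {
  max(p1, p2) < alpha
}

# Placeholder p-values for illustration only
r1 <- reject_compatibility(0.01, 0.20)   # FALSE: one run does not reject
r2 <- reject_compatibility(0.01, 0.03)   # TRUE: both runs reject
```

In practice p1 and p2 would be the pval components of the two objects returned by the two runs.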
Value
list of class "regdistbetween"
with components
pval |
p-value. |
coeffdiff |
difference between regression fits (within-group
together with between-groups distances
minus within-group distances only) at |
condition |
condition numbers of regressions, see |
lmfit |
list. Output objects of |
jr |
output object of |
xcenter |
mean of within-group distances for group |
xcenterbetween |
mean of between-groups distances of explanatory
variable (after centering by |
tstat |
t-statistic. |
tdf |
degrees of freedom of t-statistic according to Welch-Satterthwaite approximation. |
jackest |
jackknife-estimator of difference between regression
fitted values at |
jackse |
jackknife-standard error for
|
jackpseudo |
vector of jackknife pseudovalues on which the test is based. |
groups |
see above. |
species |
see above. |
testname |
title to be printed out when using
|
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Hausdorf, B. and Hennig, C. (2019) Species delimitation and geography. Submitted.
See Also
Examples
options(digits=4)
data(veronica)
ver.geo <- coord2dist(coordmatrix=veronica.coord[173:207,],file.format="decimal2")
vei <- prabinit(prabmatrix=veronica[173:207,],distance="jaccard")
species <-c(rep(1,13),rep(2,22))
loggeo <- log(ver.geo+quantile(as.vector(as.dist(ver.geo)),0.25))
rtest3 <-
regdistbetweenone(dmx=loggeo,dmy=vei$distmat,grouping=species,groups=c(1,2),rgroup=1)
print(rtest3)
Regression difference between within-group dissimilarities
Description
Given two dissimilarity matrices dmx and dmy, an indicator vector x and a grouping, this computes the difference between standard least squares regression predictions at point xcenterbetween. The regressions are based on the dissimilarities in dmx vs. dmy for objects indicated in x. grouping indicates the two groups, and the difference is computed between regressions based on the within-group distances of the two groups.
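The following base R sketch illustrates the kind of quantity described here: two least squares fits on within-group dissimilarities and the difference of their predictions at one x-value. The simulated matrices, the helper within_fit and the sign convention (first group minus second) are assumptions for illustration; this is not the source code of regdistdiff.

```r
# Illustrative sketch only; data are simulated and within_fit() is hypothetical.
set.seed(1)
n <- 12
coords <- matrix(runif(2 * n), ncol = 2)
dmx <- as.matrix(dist(coords))                        # explanatory dissimilarities
dmy <- 0.5 * dmx + 0.1 * as.matrix(dist(rnorm(n)))    # response dissimilarities
grouping <- rep(1:2, each = n / 2)
xcenter <- 0            # value subtracted from the explanatory dissimilarities
xcenterbetween <- 0.3   # x-value (after centering) at which fits are compared

# Least squares fit on the within-group dissimilarities of group g
within_fit <- function(g) {
  idx <- grouping == g
  x <- as.vector(as.dist(dmx[idx, idx])) - xcenter
  y <- as.vector(as.dist(dmy[idx, idx]))
  lm(y ~ x)
}

fits <- lapply(1:2, within_fit)
preds <- vapply(
  fits,
  function(f) unname(predict(f, newdata = data.frame(x = xcenterbetween))),
  numeric(1)
)
pred_diff <- preds[1] - preds[2]   # assumed sign: first group minus second
```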
Usage
regdistdiff(x,dmx,dmy,grouping,xcenter=0,xcenterbetween=0)
Arguments
x |
vector of logicals of length of the number of objects on which
dissimilarities |
dmx |
dissimilarity matrix or object of class
|
dmy |
dissimilarity matrix or object of class
|
grouping |
vector of length of the number of objects on which
dissimilarities |
xcenter |
numeric. Dissimilarities |
xcenterbetween |
numeric. This specifies the x- (dissimilarity)
value at which predictions from the two regressions are
compared. Note that this is interpreted as after centering by
|
Value
Difference between standard least squares regression predictions for the two groups at point xcenterbetween.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Hausdorf, B. and Hennig, C. (2019) Species delimitation and geography. Submitted.
See Also
Examples
options(digits=4)
data(veronica)
ver.geo <- coord2dist(coordmatrix=veronica.coord[173:207,],file.format="decimal2")
vei <- prabinit(prabmatrix=veronica[173:207,],distance="jaccard")
species <-c(rep(1,13),rep(2,22))
regdistdiff(rep(TRUE,35),ver.geo,vei$distmat,grouping=species,xcenter=0,xcenterbetween=100)
Regression difference within reference group and between-group dissimilarities
Description
Given two dissimilarity matrices dmx and dmy, an indicator vector x and a grouping, this computes the difference between standard least squares regression predictions at point xcenterbetween. The regressions are based on the dissimilarities in dmx vs. dmy for objects indicated in x. grouping indicates the two groups, and the difference is computed between regressions based on (a) the within-group distances of the reference group rgroup and (b) these together with the between-group distances.
Usage
regdistdiffone(x,dmx,dmy,grouping,xcenter=0,xcenterbetween=0,rgroup)
Arguments
x |
vector of logicals of length of the number of objects on which
dissimilarities |
dmx |
dissimilarity matrix or object of class
|
dmy |
dissimilarity matrix or object of class
|
grouping |
vector of length of the number of objects on which
dissimilarities |
xcenter |
numeric. Dissimilarities |
xcenterbetween |
numeric. This specifies the x- (dissimilarity)
value at which predictions from the two regressions are
compared. Note that this is interpreted as after centering by
|
rgroup |
one of the values of |
Value
Difference between standard least squares regression predictions for the two regressions at point xcenterbetween.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Hausdorf, B. and Hennig, C. (2019) Species delimitation and geography. Submitted.
See Also
Examples
options(digits=4)
data(veronica)
ver.geo <- coord2dist(coordmatrix=veronica.coord[173:207,],
file.format="decimal2")
vei <- prabinit(prabmatrix=veronica[173:207,],distance="jaccard")
species <-c(rep(1,13),rep(2,22))
regdistdiffone(rep(TRUE,35),ver.geo,vei$distmat,grouping=species,
xcenter=0,xcenterbetween=100,rgroup=2)
Testing equality of two distance-regressions
Description
Jackknife-based test for equality of two regressions between distance matrices.
Usage
regeqdist(dmx,dmy,grouping,groups=levels(as.factor(grouping))[1:2])
## S3 method for class 'regeqdist'
print(x,...)
Arguments
dmx |
dissimilarity matrix or object of class
|
dmy |
dissimilarity matrix or object of class
|
grouping |
something that can be coerced into a factor,
defining the grouping of
objects represented by the dissimilarities |
groups |
Vector of two, indicating the two groups defining the
regressions to be compared in the test. These can be
factor levels, integer numbers, or strings, depending on the entries
of |
x |
object of class |
... |
optional arguments for print method. |
Details
The null hypothesis that the regressions within the two groups are equal is tested using jackknife pseudovalues independently in both groups allowing for potentially different variances of the pseudovalues, and aggregating as in Welch's t-test. Tests are run separately for intercept and slope and aggregated by Bonferroni's rule.
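Since pval holds separate p-values for intercept and slope, the Bonferroni aggregation mentioned above can be sketched as follows. The helper names and the example p-values are hypothetical and serve only to show the rule.

```r
# Hypothetical helpers, not part of prabclus.
# Bonferroni: reject equality of regressions if the smallest of the m
# p-values is below alpha / m (here m = 2: intercept and slope).
bonferroni_reject <- function(pvals, alpha = 0.05) {
  min(pvals) < alpha / length(pvals)
}

# Equivalent single adjusted p-value
bonferroni_p <- function(pvals) {
  min(1, length(pvals) * min(pvals))
}

pv <- c(0.02, 0.30)            # placeholder p-values for illustration only
rej <- bonferroni_reject(pv)   # TRUE, because 0.02 < 0.025
padj <- bonferroni_p(pv)       # 0.04
```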
The test cannot be run, and many components will be NA, if the within-group regressions or the jackknifed within-group regressions are ill-conditioned.
This was implemented with an application in mind in which the explanatory distances represent geographical distances, the response distances are genetic distances, and groups represent species or species-candidates. In this application, for testing whether the regression patterns are compatible with the two groups behaving like a single species, one would first use regeqdist to test whether a joint regression for the within-group distances of both groups makes sense. If this is not rejected, regdistbetween is run to see whether the between-group distances are compatible with the within-group distances. On the other hand, if a joint regression on within-group distances is rejected, regdistbetweenone can be used to test whether the between-group distances are at least compatible with the within-group distances of one of the groups, which can still be the case within a single species, see Hausdorf and Hennig (2019).
Value
list of class "regeqdist"
with components
pval |
p-values for intercept and slope. |
coeffdiff |
vector of differences between groups (first minus second) for intercept and slope. |
condition |
condition numbers of regressions, see |
lmfit |
list. Output objects of |
jr |
list of two lists of two; output object of
|
xcenter |
mean of |
tstat |
t-statistic. |
tdf |
vector of degrees of freedom of t-statistic according to Welch-Satterthwaite approximation for intercept and slope. |
jackest |
jackknife-estimator of difference between regressions; vector with intercept and slope difference. |
jackse |
vector with jackknife-standard errors for
|
jackpseudo |
list of two lists of vectors; jackknife pseudovalues within both groups for intercept and slope estimators. |
groups |
see above. |
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Hausdorf, B. and Hennig, C. (2019) Species delimitation and geography. Submitted.
See Also
regdistbetween, regdistbetweenone
Examples
options(digits=4)
data(veronica)
ver.geo <- coord2dist(coordmatrix=veronica.coord[173:207,],file.format="decimal2")
vei <- prabinit(prabmatrix=veronica[173:207,],distance="jaccard")
loggeo <- log(ver.geo+quantile(as.vector(as.dist(ver.geo)),0.25))
species <-c(rep(1,13),rep(2,22))
rtest <- regeqdist(dmx=loggeo,dmy=vei$distmat,grouping=species,groups=c(1,2))
print(rtest)
Simulation of abundance matrices (non-clustered)
Description
Generates a simulated matrix where the rows are interpreted as regions and the columns as species, and the entries are abundances. Species are generated i.i.d. in two steps. In the first step, a presence-absence matrix is generated as in randpop.nb. In the second step, conditionally on presence in the first step, abundance values are generated according to a simultaneous autoregression (SAR) model for the log-abundances (see errorsarlm for the model; estimates are provided by the parameter sarestimate). Spatial autocorrelation of a species' presences is governed by the parameters p.nb and sarestimate and a list of neighbors for each region.
Usage
regpop.sar(abmat, prab01=NULL, sarestimate=prab.sarestimate(abmat),
p.nb=NULL,
vector.species=prab01$regperspec,
pdf.regions=prab01$specperreg/(sum(prab01$specperreg)),
count=FALSE)
Arguments
abmat |
object of class |
prab01 |
presence-absence matrix of the same dimensions as the
abundance matrix of |
sarestimate |
Estimator of the parameters of a simultaneous
autoregression model corresponding to the null model for abundance
data from Hausdorf and Hennig (2007) as generated by
|
p.nb |
numeric between 0 and 1. The probability that a new
region is drawn from the non-neighborhood of the previous regions
belonging to a species under generation. If |
vector.species |
vector of integers. |
pdf.regions |
numerical vector of length |
count |
logical. If |
Value
A matrix of abundance values, rows are regions, columns are species.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
References
Hausdorf, B. and Hennig, C. (2007) Null model tests of clustering of species, negative co-occurrence patterns and nestedness in meta-communities. Oikos 116, 818-828.
See Also
autoconst estimates p.nb from matrices of class prab. These are generated by prabinit.
abundtest uses regpop.sar as a null model for tests of clustering.
randpop.nb (analogous function for simulating presence-absence data)
Examples
options(digits=4)
data(siskiyou)
set.seed(1234)
x <- prabinit(prabmatrix=siskiyou, neighborhood=siskiyou.nb,
distance="none")
# Not run; this needs package spdep.
# regpop.sar(x, p.nb=0.046)
regpop.sar(x, p.nb=0.046, sarestimate=prab.sarestimate(x,sar=FALSE))
Herbs of the Siskiyou Mountains
Description
Distributions of species of herbs in relation to elevation on quartz diorite in the central Siskiyou Mountains. All values are per mille frequencies in transects (the number of 1 m² quadrats, among 1000 such quadrats, in which a species was observed, based on 1250 1 m² quadrats in the first 5 transects and 400 1 m² quadrats in the 6th transect). Observed presences in the transect, outside the sampling plots, were coded as 0.2. Rows correspond to species, columns correspond to regions.
Usage
data(siskiyou)
Format
Three objects are generated:
- siskiyou
numeric matrix giving the 144*6 abundance values.
- siskiyou.nb
neighborhood list for the 6 regions.
- siskiyou.groups
integer vector of length 144, giving group memberships for the 144 species.
Details
Reads from example data files LeiMik1.dat, LeiMik1NB.dat, LeiMik1G.dat.
Source
Whittaker, R. H. 1960. Vegetation of the Siskiyou Mountains, Oregon and California. Ecol. Monogr. 30: 279-338 (table 14).
Examples
data(siskiyou)
Average within-group distances for given groups
Description
Generates average within-group distances (overall and group-wise) from a dissimilarity matrix and a given grouping.
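A minimal base R sketch of group-wise and overall average within-group dissimilarities on a toy example follows. It is not the source code of specgroups; in particular, taking the overall value as the mean over all within-group pairs is an assumption.

```r
# Illustrative sketch only; toy data, not the prabclus implementation.
d <- as.matrix(dist(c(0, 1, 3, 10, 14)))   # toy dissimilarity matrix
groups <- c(1, 1, 1, 2, 2)                 # toy group memberships

# Within-group dissimilarities, one vector per group
within <- lapply(
  split(seq_along(groups), groups),
  function(idx) as.vector(as.dist(d[idx, idx, drop = FALSE]))
)

gr <- sapply(within, mean)        # group-wise averages: 2 and 4
overall <- mean(unlist(within))   # assumed: mean over all within-group pairs, 2.5
```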
Usage
specgroups(distmat,groupvector, groupinfo)
Arguments
distmat |
dissimilarity matrix or |
groupvector |
integer vector. For every row of |
groupinfo |
list with components |
Value
A list with parameters
overall |
overall average within-groups dissimilarity. |
gr |
vector of group-wise average within-group dissimilarities
(this will be |
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
Examples
options(digits=4)
data(siskiyou)
x <- prabinit(prabmatrix=siskiyou, neighborhood=siskiyou.nb,
distance="logkulczynski")
groupvector <- as.factor(siskiyou.groups)
ng <- length(levels(groupvector))
lg <- levels(groupvector)
nsg <- numeric(0)
for (i in 1:ng) nsg[i] <- sum(groupvector==lg[i])
groupinfo <- list(lg=lg,ng=ng,nsg=nsg)
specgroups(x$distmat,groupvector,groupinfo)
Stress values for different dimensions of Kruskal's MDS
Description
Computes Kruskal's nonmetric multidimensional scaling isoMDS on alleleobject or prab-objects for different output dimensions in order to compare stress values.
Usage
stressvals(x,mdsdim=1:12,trace=FALSE)
Arguments
x |
object of class |
mdsdim |
integer vector of MDS numbers of dimensions to be tried. |
trace |
logical. |
Details
Note that zero distances between non-identical objects are replaced by the smallest nonzero distance divided by 10 to prevent isoMDS from producing an error.
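The replacement rule can be illustrated on a toy distance matrix in base R. This is a sketch of the rule as described, not the code inside stressvals.

```r
# Illustrative sketch only; toy data.
d <- as.matrix(dist(c(0, 0, 1, 3)))   # objects 1 and 2 have distance 0

off_diag <- row(d) != col(d)              # pairs of different objects
min_nonzero <- min(d[off_diag & d > 0])   # smallest nonzero distance (here 1)

d_fixed <- d
d_fixed[off_diag & d == 0] <- min_nonzero / 10   # zero distances become 0.1
```

The diagonal is left at zero; only zero distances between different objects are changed.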
Value
A list with components
MDSstress |
vector of stress values. |
mdsout |
list of full outputs of |
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
Examples
options(digits=4)
data(tetragonula)
set.seed(112233)
taiselect <- sample(236,40)
# Use data subset to make execution faster.
tnb <-
coord2dist(coordmatrix=tetragonula.coord[taiselect,],
cut=50,file.format="decimal2",neighbors=TRUE)
ta <- alleleconvert(strmatrix=tetragonula[taiselect,])
tai <- alleleinit(allelematrix=ta,neighborhood=tnb$nblist)
stressvals(tai,mdsdim=1:3)$MDSstress
Microsatellite genetic data of Tetragonula bees
Description
Genetic data for 236 Tetragonula (Apidae) bees from Australia and Southeast Asia, see Franck et al. (2004). The data give pairs of alleles (codominant markers) for 13 microsatellite loci.
Usage
data(tetragonula)
Format
Two objects are generated:
- tetragonula
A data frame with 236 observations and 13 string variables. Strings consist of six digits each. The format is derived from the data format used by the software GENEPOP (Rousset 2008). Alleles have a three digit code, so a value of "258260" on variable V10 means that on locus 10 the two alleles have codes 258 and 260. "000" refers to missing values.
- tetragonula.coord
a 236*2 matrix. Coordinates of locations of individuals in decimal format, i.e. the first number is latitude (negative values are South), with minutes and seconds converted to fractions. The second number is longitude (negative values are West).
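To make the six-digit coding concrete, the following base R sketch splits one genotype string into its two three-digit allele codes. The helper split_genotype is hypothetical and only illustrates the format; for actual analyses the package's own alleleconvert and alleleinit should be used.

```r
# Hypothetical helper for illustration; not part of prabclus.
split_genotype <- function(s) {
  a <- c(substr(s, 1, 3), substr(s, 4, 6))   # two three-digit allele codes
  a[a == "000"] <- NA                        # "000" codes a missing value
  a
}

g1 <- split_genotype("258260")   # "258" "260"
g2 <- split_genotype("000000")   # NA NA
```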
Details
Reads from example data file Heterotrigona_indoFO.dat.
Source
Franck, P., E. Cameron, G. Good, J.-Y. Rasplus, and B. P. Oldroyd (2004) Nest architecture and genetic differentiation in a species complex of Australian stingless bees. Mol. Ecol. 13, 2317-2331.
Rousset, F. (2008) genepop'007: a complete re-implementation of the genepop software for Windows and Linux. Molecular Ecology Resources 8, 103-106.
Examples
data(tetragonula)
Convert abundance matrix into presence/absence matrix
Description
Converts abundance matrix into binary (logical) presence/absence matrix (TRUE if abundance>0).
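On a plain numeric matrix the conversion rule amounts to a single comparison. The toy matrix below is made up for illustration; toprab itself takes a prab object rather than a raw matrix.

```r
# Illustrative sketch on a toy abundance matrix (not a prab object).
abund <- matrix(c(0, 3, 0.2, 0, 12, 0), nrow = 2)
presence <- abund > 0   # logical matrix with the same dimensions
```

Any positive entry, including a small value such as 0.2, counts as a presence.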
Usage
toprab(prabobj)
Arguments
prabobj |
object of class |
Value
Logical matrix with the same dimensions as prabobj$prab, as described above.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
Examples
data(siskiyou)
x <- prabinit(prabmatrix=siskiyou, neighborhood=siskiyou.nb,
distance="none")
toprab(x)
Internal: create allele list out of character matrix
Description
Creates a list of lists, such as required by alleledist, from the charmatrix component of an alleleobject.
Usage
unbuild.charmatrix(charmatrix,n.individuals,n.variables)
Arguments
charmatrix |
matrix of characters in which there are two rows for
every individual corresponding to the two alleles in every locus
(column). Entries are allele codes but missing values are coded as
|
n.individuals |
integer. Number of individuals. |
n.variables |
integer. Number of loci. |
Value
A list of lists. In the "outer" list, there are n.variables lists, one for each locus. In the "inner" list, for every individual there is a vector of two codes (typically characters, see alleleinit) for the two alleles in that locus.
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
See Also
Examples
data(tetragonula)
tnb <-
coord2dist(coordmatrix=tetragonula.coord[1:50,],cut=50,file.format="decimal2",neighbors=TRUE)
ta <- alleleconvert(strmatrix=tetragonula[1:50,])
tai <- alleleinit(allelematrix=ta,neighborhood=tnb$nblist,distance="none")
str(unbuild.charmatrix(tai$charmatrix,50,13))
Genetic AFLP data of Veronica plants
Description
0-1 data indicating whether dominant markers are present for 583 different AFLP bands ranging from 61 to 454 bp of 207 plant individuals of Veronica (Pentasepalae) from the Iberian Peninsula and Morocco (Martinez-Ortega et al., 2004).
Usage
data(veronica)
Format
Two objects are generated:
- veronica
0-1 matrix with 207 individuals (rows) and 583 AFLP bands (columns).
- veronica.coord
a 207*2 matrix. Coordinates of locations of individuals in decimal format, i.e. the first number is latitude (negative values are South), with minutes and seconds converted to fractions. The second number is longitude (negative values are West).
Details
Reads from example data files MartinezOrtega04AFLP.dat, MartinezKoord.dat.
Source
Martinez-Ortega, M. M., L. Delgado, D. C. Albach, J. A. Elena-Rossello, and E. Rico (2004). Species boundaries and phylogeographic patterns in cryptic taxa inferred from AFLP markers: Veronica subgen. Pentasepalae (Scrophulariaceae) in the Western Mediterranean. Syst. Bot. 29, 965-986.
Examples
data(veronica)
Overwater distances between islands in the Aegean sea
Description
Distance matrix of overwater distances in km between 34 islands in the Aegean sea.
Usage
data(waterdist)
Format
A symmetric 34*34 distance matrix.
Details
Reads from example data file Waterdist.dat, in which there is a 35th column and line with distances to Turkey mainland.
Source
B. Hausdorf and C. Hennig (2005) The influence of recent geography, palaeogeography and climate on the composition of the fauna of the central Aegean Islands. Biological Journal of the Linnean Society 84, 785-795.
Examples
data(waterdist)