| Title: | Datasets, Functions and Scripts for Semiparametric Regression Supporting Harezlak, Ruppert & Wand (2018) |
|---|---|
| Description: | The book "Semiparametric Regression with R" by J. Harezlak, D. Ruppert & M.P. Wand (2018, Springer; ISBN: 978-1-4939-8851-8) makes use of datasets and scripts to explain semiparametric regression concepts. Each of the book's scripts are contained in this package as well as datasets that are not within other R packages. Functions that aid semiparametric regression analysis are also included. |
| Authors: | Jaroslaw Harezlak [aut], David Ruppert [aut], Matt P. Wand [aut, cre] |
| Maintainer: | Matt P. Wand <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.0-6 |
| Built: | 2026-05-23 08:41:55 UTC |
| Source: | https://github.com/cran/HRW |
The BanglaContrac data frame has multilevel data from the 1988 Bangladesh Fertility Survey. There are data on contraceptive use, number of children, age and urban-dweller status of 1,934 women grouped in 60 districts.
data(BanglaContrac)data(BanglaContrac)
This data frame contains the following columns:
districtIDdistrict identification number.
usingContraceptionindicator that woman is using contraception:
1 = woman using contraception at time of survey,
0 = woman not using contraception at time of survey.
childCodenumerical code for number of living children:
1 = no living children at time of survey,
2 = one living child at time of survey,
3 = two living children at time of survey,
4 = three or more living children at time of survey.
ageMinusMeanage in years of woman at time of survey, with mean age subtracted
isUrbanindicator that woman lives in an urban region:
1 = woman an urban region dweller at time of survey,
0 = woman a rural region dweller at time of survey.
Huq, N.M. and Cleland, J. (1990). Bangladesh Fertility Survey 1989 (Main Report). Dhaka, Bangladesh: National Institute of Population Research and Training.
library(HRW) ; data(BanglaContrac) if (require("lattice")) print(xyplot(jitter(usingContraception) ~ ageMinusMean|factor(districtID), groups = childCode, data = BanglaContrac))library(HRW) ; data(BanglaContrac) if (require("lattice")) print(xyplot(jitter(usingContraception) ~ ageMinusMean|factor(districtID), groups = childCode, data = BanglaContrac))
The BCR data frame has data from a 6-week clinical trial of a drug versus a placebo. The data are subject to measurement error and have been transformed and rescaled from their original form. These data are analyzed in the 2002 ‘Journal of the American Statistical Association’ article by Berry, Carroll and Ruppert (full reference below).
data(BCR)data(BCR)
This data frame contains the following columns:
statuscode for status of patient:
control = patient is in placebo group,
treatment=patient is in treatment group.
wphysician-assessed score of patient's mental health at baseline.
yphysician-assessed score of patient's mental health at end of the study.
Berry, S.M., Carroll, R.J. and Ruppert, D. (2002). Bayesian smoothing and regression splines for measurement error problems.Journal of the American Statistical Association, 97, 160-169.
library(HRW) ; data(BCR) if (require("lattice")) print(xyplot(y ~ w|status,data = BCR))library(HRW) ; data(BCR) if (require("lattice")) print(xyplot(y ~ w|status,data = BCR))
The full dataset is in the data frame 'Hdma' within the R package Ecdat. This data frame is the subset of mortgage applications during the years 1998-1999.
data(BostonMortgages)data(BostonMortgages)
A data frame with 2,380 observations on the following 13 variables:
dirratio of the debt payments to the total income.
hirratio of the housing expenses to the total income.
lvrratio of the loan size to the assessed value of property.
ccsa credit score ranging from 1 to 6, where a low value indicates low credit risk.
mcsa mortgage credit score from 1 to 4, where a low value indicates low credit risk.
pbcrdid the applicant have a public bad credit record?: a factor with levels no and yes.
dmiwas the applicant denied mortgage insurance?: a factor with levels no and yes.
selfwas the applicant self-employed?: a factor with levels no and yes.
singleis the applicant single?: a factor with levels no and yes.
uria1989 Massachusetts unemployment rate in the applicant's industry.
condominiumis the unit a condominium?: a factor with levels no and yes.
blackis the applicant black?: a factor with levels no and yes.
denywas the mortgage denied?: a factor with levels no and yes.
Munnell, A. H., Tootell, G. M. B., Browne, L. E., and McEneaney, J. (1996). Mortgage lending in Boston: Interpreting HDMA data, American Economic Review, 25-53.
library(HRW) ; data(BostonMortgages) BostonMortgages$denyBinary <- as.numeric(BostonMortgages$deny == "yes") fit <- glm(denyBinary ~ black + dir + lvr + pbcr + self + single + as.factor(ccs), family = binomial,data = BostonMortgages) summary(fit)library(HRW) ; data(BostonMortgages) BostonMortgages$denyBinary <- as.numeric(BostonMortgages$deny == "yes") fit <- glm(denyBinary ~ black + dir + lvr + pbcr + self + single + as.factor(ccs), family = binomial,data = BostonMortgages) summary(fit)
The brainImage data frame corresponds a functional magnetic image of a coronal slice of a human brain.
The data are brain activity on a pixel array.
data(brainImage)data(brainImage)
This data frame is a 80 by 37 array. The entries correspond to brain activity in each of the corresponding pixels. The columns names are c1-c37. These names have no meaning, and are present to ensure that this data frame conforms with R data frame conventions.
Landman, B.A., Huang, A.J., Gifford, A., Vikram, D.S., Lim, I.A.L, Farrell, J.A.D., Bogovic, J.A., Hua, J., Chen, M., Jarso, S., Smith, S.A., Joel, S., Mori, S., Pekar, J.J., Barker, P.B., Prince, J.L. and van Zijl, P.C.M. (2010). Multi-parametric neuroimaging reproducibility: A 3T resource study. NeuroImage, 54, 2854-2866.
library(HRW) ; data(brainImage) image(as.matrix(brainImage))library(HRW) ; data(brainImage) image(as.matrix(brainImage))
Daily returns on the Standard & Poors' 500 stock market index, daily rate of the U.S. Treasury bills, and 3 companies' stocks including Microsoft, the General Electric and the Ford Motor Company during the period from November 1, 1993 to March 31, 2003.
data(capm)data(capm)
A data frame with 2363 observations on the following 6 variables:
Close.tbillDaily Treasury bill rate expressed as a percentage.
Close.msftDaily closing price of the Microsoft stock.
Close.sp500Daily closing Standard and Poor's 500 index.
Close.geDaily closing price of the General Electric stock.
Close.fordDaily closing price of the Ford Motor Company stock.
DateDates from November 1, 1993 to March 31, 2003 (d-Mon-yy and dd-Mon-yr formats).
Federal Reserve Bank of St. Louis U.S.A. (Treasury bill rates) and Yahoo Finance (stock prices).
# The Capital Asset Pricing Model (CAPM) states that the excess returns on a stock # have a linear relationship with the returns on the market. This example investigates # the CAPM for General Electric stock: library(HRW) ; data(capm) n <- dim(capm)[1] riskfree <- capm$Close.tbill[2:n]/365 elrGE <- diff(log(capm$Close.ge)) - riskfree elrSP500 <- diff(log(capm$Close.sp500)) - riskfree plot(elrSP500,elrGE,col = "blue",cex = 0.2) fitOLS <- lm(elrGE ~ elrSP500) summary(fitOLS) par(mfrow = c(2,2)) ; plot(fitOLS)# The Capital Asset Pricing Model (CAPM) states that the excess returns on a stock # have a linear relationship with the returns on the market. This example investigates # the CAPM for General Electric stock: library(HRW) ; data(capm) n <- dim(capm)[1] riskfree <- capm$Close.tbill[2:n]/365 elrGE <- diff(log(capm$Close.ge)) - riskfree elrSP500 <- diff(log(capm$Close.sp500)) - riskfree plot(elrSP500,elrGE,col = "blue",cex = 0.2) fitOLS <- lm(elrGE ~ elrSP500) summary(fitOLS) par(mfrow = c(2,2)) ; plot(fitOLS)
The carAuction data frame has data on several variables concerning cars purchased at automobile auctions by automobile dealerships in the United States of America. The origin of these data is a classification competition titled “Don't Get Kicked!” that ran on the ‘kaggle’ platform (https://www.kaggle.com) during 2011-2012. Many of the variables in this data frame have been derived from those in the original data set from https://www.kaggle.com.
data(carAuction)data(carAuction)
This data frame contains the following columns:
RefIdunique number assigned to each vehicles.
IsBadBuyindicator that the vehicle purchased at auction by an automobile dealership has serious problems that hinder or prevent it being sold - a "bad buy":
1 = the vehicle is a "bad buy"
0 = the vehicle is a "good buy".
All other indicator variables are defined in this way.
purchIn2010indicator that vehicle was purchased in 2010.
aucEqAdesaindicator that the auction provider at which the vehicle was purchased was Adesa.
aucEqManheimindicator that the auction provider at which the vehicle was purchased was Manheim.
vehYearEq03indicator that the manufacturer's year of the vehicle is 2003.
vehYearEq04indicator that the manufacturer's year of the vehicle is 2004.
vehYearEq05indicator that the manufacturer's year of the vehicle is 2005.
vehYearEq06indicator that the manufacturer's year of the vehicle is 2006.
vehYearEq07indicator that the manufacturer's year of the vehicle is 2007.
ageAtSaleage of the vehicle in years when sold.
makeEqChevroletindicator that the vehicle's manufacturer is Chevrolet.
makeEqFordindicator that the vehicle's manufacturer is Ford.
makeEqDodgeindicator that the vehicle's manufacturer is Dodge.
makeEqChryslerindicator that the vehicle's manufacturer is Chrysler.
trimEqBasindicator that the trim level of the vehicle is 'Bas'.
trimEqLSindicator that the trim level of the vehicle is 'LS'.
trimEqSEindicator that the trim level of the vehicle is 'SE'.
subModelEq4DSEDANindicator that the submodel of the vehicle is '4DSedan'.
subModelEq4DSEDANLSindicator that the submodel of the vehicle is '4DSedanLS'.
subModelEq4DSEDANSEindicator that the submodel of the vehicle is '4DSedanSE'.
colourEqSilverindicator that the vehicle color is silver.
colourEqWhiteindicator that the vehicle color is white.
colourEqBlueindicator that the vehicle color is blue.
colourEqGreyindicator that the vehicle color is grey.
colourEqBlackindicator that the vehicle color is black.
colourEqRedindicator that the vehicle color is red.
colourEqGoldindicator that the vehicle color is gold.
colourEqOrangeindicator that the vehicle color is orange.
transEqManualindicator that the vehicle has manual transmission.
wheelEqAlloyindicator that the vehicle has alloy wheels.
wheelEqCoversindicator that the vehicle has covered wheels.
odomReadthe vehicle's odometer reading in miles.
AmericanMadeindicator that the vehicle was manufactured in the United States of America.
otherAsianMadeindicator that the vehicle was manuctured in an Asian nation other than Japan or South Korea.
sizeEqTruckindicator that the size category of the vehicle is 'truck'.
sizeEqMediumindicator that the size category of the vehicle is 'medium'.
sizeEqSUVindicator that the size category of the vehicle is 'SUV'.
sizeEqCompactindicator that the size category of the vehicle is 'compact'.
sizeEqVanindicator that the size category of the vehicle is 'van'.
priceacquisition price for this vehicle in average condition at time of purchase in U.S. dollars.
purchInTexasindicator that the vehicle was purchased in Texas.
purchInFloridaindicator that the vehicle was purchased in Florida.
purchInCaliforniaindicator that the vehicle was purchased in California.
purchInNorthCarolinaindicator that the vehicle was purchased in North Carolina.
purchInArizonaindicator that the vehicle was purchased in Arizona.
purchInColoradoindicator that the vehicle was purchased in Colorado.
purchInSouthCarolinaindicator that the vehicle was purchased in South Carolina.
costAtPurchacquisition cost paid for the vehicle at time of purchase.
onlineSaleindicator that the vehicle was purchased online.
warrantyCostwarranty cost in U.S. dollars.
The “Don't Get Kicked” competition, https://www.kaggle.com.
library(HRW) ; data(carAuction) ## Not run: for (colNum in 3:10) { plot(jitter(carAuction[,colNum]),jitter(carAuction$IsBadBuy),pch = ".", xlab = names(carAuction)[colNum],ylab = "is car a bad buy?",col = "blue") readline("Hit Enter to continue.\n") } for (colNum in 11:51) { plot(jitter(carAuction[,colNum]),jitter(carAuction$IsBadBuy),pch = ".", xlab = names(carAuction)[colNum],ylab = "is car a bad buy?",col = "blue") readline("Hit Enter to continue.\n") } ## End(Not run)library(HRW) ; data(carAuction) ## Not run: for (colNum in 3:10) { plot(jitter(carAuction[,colNum]),jitter(carAuction$IsBadBuy),pch = ".", xlab = names(carAuction)[colNum],ylab = "is car a bad buy?",col = "blue") readline("Hit Enter to continue.\n") } for (colNum in 11:51) { plot(jitter(carAuction[,colNum]),jitter(carAuction$IsBadBuy),pch = ".", xlab = names(carAuction)[colNum],ylab = "is car a bad buy?",col = "blue") readline("Hit Enter to continue.\n") } ## End(Not run)
The CHD data frame has data on coronary heart disease status, cholesterol level measurements and age. Further details are given in the 1996 ‘Journal of the American Statistical Association’ article by Roeder, Carroll and Lindsay (full reference below).
data(CHD)data(CHD)
This data frame contains the following columns:
CHDindicator of coronary heart disease status:
0 = patient does not have coronary heart disease,
1 = patient has coronary heart disease.
LDLlow density lipoprotein cholesterol level.
TCtotal cholesterol level.
ageage of patient in years.
Roeder, K., Carroll, R.J. and Lindsay, B.G. (1996). A semiparametric mixture approach to case-control studies with errors in covariables. Journal of the American Statistical Association, 91, 722-732.
Richardson, S., Leblond, L., Jaussent, I. and Green, P.J. (2002). Mixture models in measurement error problems, with reference to epidemiological studies. Journal of the Royal Statistical Society, Series A, 163, 549-566.
library(HRW) ; data(CHD) pairs(CHD)library(HRW) ; data(CHD) pairs(CHD)
The coral data frame has data on initial size, taxonomic identity and alive/death status of coral organisms in French Polynesia.
data(coral)data(coral)
This data frame contains the following columns:
siteDepthPeriodfactor with levels corresponding to a code for the site, depth and time period concerning where and when coral organisms were measured.
taxonfactor corresponding to an abbreviation for taxonomic identity:ACR = Acropora,POC = Pocillopora,POR = Porites.
logInitialSizePlusOneinitial size measurement of coral organism transformed according to the log(initial size + 1).
diedindicator that coral organism has died:
1 = coral organism has died,
0 = coral organism still alive.
Kayal, M., Vercelloni, J., Wand, M.P. and Adjeroud, M. (2015). Searching for the best bet in life-strategy: a quantitative population dynamics approach to life history trade-offs in reef-building corals. Ecological Complexity, 23, 73-84.
library(HRW) ; data(coral) if (require("lattice")) print(xyplot(died ~ logInitialSizePlusOne|siteDepthPeriod*taxon, data = coral,layout = c(15,5)))library(HRW) ; data(coral) if (require("lattice")) print(xyplot(died ~ logInitialSizePlusOne|siteDepthPeriod*taxon, data = coral,layout = c(15,5)))
Create a boundary polygon corresponding nominally to the effective probability density support of a bivariate dataset via an interactive graphical interface and mouse (or, possibly, touchpad) posititionings and button clicks.
createBoundary(x,y)createBoundary(x,y)
x |
vector containing the x-coordinates of a bivariate dataset. |
y |
vector containing the y-coordinates of a bivariate dataset. |
After the bivariate dataset is displayed on the screen a boundary polygon is selected by performing left mouse (or, possibly, touchpad) clicks on the screen to specify vertex positions, and then moving around in a clockwise direction until the polygon is completed. Completion is achieved by clicking inside the red octagon surrounding the starting vertex.
A two-column matrix containing the vertices of the selected boundary polygon.
M.P. Wand [email protected]
library(HRW) x <- c(4,1,9,8,3,9,7) y <- c(5,7,5,4,2,1,1) ## Not run: myBoundary <- createBoundary(x,y)library(HRW) x <- c(4,1,9,8,3,9,7) y <- c(5,7,5,4,2,1,1) ## Not run: myBoundary <- createBoundary(x,y)
The femSBMD data frame has longitudinal data on spinal bone mineral, density, age and ethnicity for female youths from a study on bone mineral acquisition.
data(femSBMD)data(femSBMD)
This data frame contains the following columns:
idnumidentification number unique to each subject.
spnbmdspinal bone mineral density in grams per square centimeter.
ageage of subject in years.
ethnicityfactor corresponding to the subject's
ethnicity with levels Asian, Black, Hispanic and White.
blackindicator of the subject being black:
0 = subject is black
1 = subject is not black.
hispanicindicator of the subject being Hispanic:
0 = subject is Hispanic
1 = subject is not Hispanic.
whiteindicator of the subject being white:
0 = subject is white
1 = subject is not white.
Bachrach, L.K., Hastie, T., Wang, M.-C., Narasimhan, B. and Marcus, R. (1999). Bone mineral acquisition in healthy Asian, Hispanic, Black, and Caucasian youth: a longitudinal study. Journal of Clinical Endocrinology and Metabolism, 84, 4702-4712.
library(HRW) ; data(femSBMD) if (require("lattice")) print(xyplot(spnbmd ~ age|factor(ethnicity),groups = idnum, data = femSBMD,type = "b"))library(HRW) ; data(femSBMD) if (require("lattice")) print(xyplot(spnbmd ~ age|factor(ethnicity),groups = idnum, data = femSBMD,type = "b"))
Data on adolescent somatic growth obtained from a study of the mechanisms of human hypertension development conducted at the Indiana University School of Medicine, Indianapolis, Indiana, U.S.A. The data are restricted to a subset of 216 adolescents in the original study who had at least 9 height measurements. There are a total of 4,123 height measurements taken approximately every 6 months.
data(growthIndiana)data(growthIndiana)
A data frame with 4,123 observations on the following 5 variables:
idnumidentification numbers of the 216 adolescents.
heightheight in centimeters.
ageage in years.
maleindicator of the adolescent being male:
1 = adolescent is male,
0 = adolescent is female.
blackindicator of the adolescent being black:
1 = adolescent is black,
0 = adolescent is not black.
Pratt, J.H., Jones, J.J., Miller, J.Z., Wagner, M.A. and Fineberg, N.S. (1989). Racial differences in aldosterone excretion and plasma aldosterone concentrations in children. New England Journal of Medicine, 321, 1152-1157.
library(HRW) ; data(growthIndiana) growthINblackMales <- growthIndiana[(growthIndiana$male == 1) & (growthIndiana$black == 1),] if (require("lattice")) xyplot(height ~ age|factor(idnum),data = growthINblackMales)library(HRW) ; data(growthIndiana) growthINblackMales <- growthIndiana[(growthIndiana$male == 1) & (growthIndiana$black == 1),] if (require("lattice")) xyplot(height ~ age|factor(idnum),data = growthINblackMales)
Indonesian Children's Health Study of respiratory infections for a cohort of 275 Indonesian children. The data are longitudinal with each child having between 1 and 6 repeated measurements.
data(indonRespir)data(indonRespir)
A data frame with 1200 observations on the following 12 variables:
idnumchild identification number.
respirInfecindicator of presence of resipiratory infection.
ageage of the child in years.
vitAdeficindicator of Vitamin A deficiency:
1 = the child had Vitamin A deficiency,
0 = the child did not have Vitamin A deficiency.
femaleindicator of child being female:
1 = the child is female,
0 = the child is male.
heightheight of the child in centimeters.
stuntedindicator of the child being "short for his/her age":
1 = the child is "short for his/her age",
0 = the child is not "short for his/her age"
visit2indicator that the child had exactly 2 clinical visits:
1 = the exact number of clinical visits was 2,
0 = the exact number of clinical visits was not 2.
visit3indicator that the child had exactly 3 clinical visits:
1 = the exact number of clinical visits was 3,
0 = the exact number of clinical visits was not 3.
visit4indicator that the child had exactly 4 clinical visits:
1 = the exact number of clinical visits was 4,
0 = the exact number of clinical visits was not 4.
visit5indicator that the child had exactly 5 clinical visits:
1 = the exact number of clinical visits was 5,
0 = the exact number of clinical visits was not 5.
visit6indicator that the child had exactly 6 clinical visits:
1 = the exact number of clinical visits was 6,
0 = the exact number of clinical visits was not 6.
Sommer, A. (1982). Nutritional Blindness. New York: Oxford University Press.
Diggle, P., Heagerty, P., Liang, K.-L. and Zeger, S. (2002). Analysis of Longitudinal Data (Second Edition). Oxford: Oxford University Press.
library(HRW) ; data(indonRespir) if (require("mgcv")) { fit <- gamm(respirInfec ~ s(age) + vitAdefic + female + height + stunted + visit2 + visit3 + visit4 + visit5 + visit6, random = list(idnum = ~1),family = binomial,data = indonRespir) summary(fit$gam) }library(HRW) ; data(indonRespir) if (require("mgcv")) { fit <- gamm(respirInfec ~ s(age) + vitAdefic + female + height + stunted + visit2 + visit3 + visit4 + visit5 + visit6, random = list(idnum = ~1),family = binomial,data = indonRespir) summary(fit$gam) }
The lidar data frame has 221 pairs from a LIght Detection And Ranging (LIDAR) experiment.
data(lidar)data(lidar)
This data frame contains the following columns:
rangedistance traveled before the light is reflected back to its source.
logratiologarithm of the ratio of received light from two laser sources.
Sigrist, M. (Ed.) (1994). Air Monitoring by Spectroscopic Techniques (Chemical Analysis Series, vol. 197). New York: Wiley.
library(HRW) ; data(lidar) plot(lidar$range,lidar$logratio)library(HRW) ; data(lidar) plot(lidar$range,lidar$logratio)
This dataset is a subset of the ozone2 dataset in the fields package. It contains the 8-hour average ozone concentration at 147 sites in the midwest region of the U.S.
data(ozoneSub)data(ozoneSub)
A data frame with 147 observations on the following 3 variables:
longitudeobservation longitude.
latitudeobservation latitude.
ozoneozone level.
Aerometric Information Retrieval System, the U.S. Environmental Protection Agency air quality data base.
Nychka, D., Furrer, R., Paige, J. and Sain, S. (2017). fields: Tools for spatial data. R package version 9.0. https://www.r-project.org.
library(HRW) ; data(ozoneSub) if (require("mgcv")) { fit.ozone.mgcv.tp <- gam(ozone ~ s(longitude,latitude,bs = "tp"), data = ozoneSub,method = "REML") plot(fit.ozone.mgcv.tp,scheme = 2, main = "ozone concentration",bty = "l") points(ozoneSub$longitude,ozoneSub$latitude) } if (require("fields")) US(add = TRUE,lwd = 2)library(HRW) ; data(ozoneSub) if (require("mgcv")) { fit.ozone.mgcv.tp <- gam(ozone ~ s(longitude,latitude,bs = "tp"), data = ozoneSub,method = "REML") plot(fit.ozone.mgcv.tp,scheme = 2, main = "ozone concentration",bty = "l") points(ozoneSub$longitude,ozoneSub$latitude) } if (require("fields")) US(add = TRUE,lwd = 2)
The plankton data frame has data on six flow cytometric measurements for 400 plankton organisms categorized into 5 different species. The data are synthetic and were generated to test various machine learning algorithms for plankton species classification. More details are given in the 2001 ‘Cytometry’ article by Boddy, Wilkins and Morris (full reference below).
data(plankton)data(plankton)
This data frame contains the following columns:
timeFlighttime of flight.
forwScattforward-scatter.
sideScattside-scatter.
redFluorBlueLightred fluorescence under blue light.
greenFluorBlueLightgreen fluorescence under blue light.
redFluorRedLightred fluorescence under red light.
speciesname of the plankton species, which is either Dunaliella, Hemiselmis, Isochrysis, Pavlova or Pyramimonas.
Boddy, L., Wilkins, M.F. and Morris, C.W. (2001). Pattern recognition in flow cytometry. Cytometry, 44, 195-209.
library(HRW) ; data(plankton) pointCols <- c("red","blue","green","orange","purple") pairs(plankton[,1:6],col = pointCols[plankton$species],pch = ".")library(HRW) ; data(plankton) pointCols <- c("red","blue","green","orange","purple") pairs(plankton[,1:6],col = pointCols[plankton$species],pch = ".")
Determination of whether each member of a set of bivariate points are inside a polygon.
pointsInPoly(pointsCoords,polygon)pointsInPoly(pointsCoords,polygon)
pointsCoords |
two-column matrix with each row specifying the (x,y) coordinates of a point. |
polygon |
two-column matrix with each row specifying the (x,y) coordinates of a vertex of a polygon. |
Geometric results are used to determine whether each of a set of bivariate points is inside or outside a polygon. A Boolean vector of indicators of whether or not each point is inside the polygon is returned.
A Boolean array with length equal to the number of rows in ‘pointsCoords’ corresponding to whether or not each of the corresponding points in 'pointCoords' are inside 'polygon'.
M.P. Wand [email protected]
library(HRW) myPolygon <- rbind(c(1,9),c(8,8),c(9,3),c(3,2),c(1,9))/10 plot(0:1,0:1,type = "n") ; lines(myPolygon) xyMat <- cbind(runif(10),runif(10)) inPoly <- pointsInPoly(xyMat,myPolygon) ; print(inPoly) points(xyMat[,1],xyMat[,2],col = as.numeric(inPoly) + 2)library(HRW) myPolygon <- rbind(c(1,9),c(8,8),c(9,3),c(3,2),c(1,9))/10 plot(0:1,0:1,type = "n") ; lines(myPolygon) xyMat <- cbind(runif(10),runif(10)) inPoly <- pointsInPoly(xyMat,myPolygon) ; print(inPoly) points(xyMat[,1],xyMat[,2],col = as.numeric(inPoly) + 2)
The protein data frame has longitudinal data on protein intake, body mass index and age of subjects in a dietary study.
data(protein)data(protein)
This data frame contains the following columns:
idnumidentification number unique to each subject.
proteinBioMlogarithm of intake of protein as measured by the biomarker urinary.
ageage of subject in years.
BMIbody mass index.
proteinRecalllogarithm of intake of protein as measured by a 24-hour recall instrument.
femaleindicator that subject is female:
1=subject is female,
0=subject is male.
Kipnis, V., Subar, A.F., Midthune, D., Freedman, L.S., Ballard-Barbash, R., Troiano, R., Bingham, S., Schoeller, D.A., Schatzkin, A. and Carroll, R.J. (2003). The structure of dietary measurement error: results of the OPEN biomarker study. American Journal of Epidemiology, 158, 14-21.
library(HRW) ; data(protein) if (require("lattice")) print(xyplot(proteinBioM ~ BMI|factor(female),groups = idnum, data = protein, type = "b"))library(HRW) ; data(protein) if (require("lattice")) print(xyplot(proteinBioM ~ BMI|factor(female),groups = idnum, data = protein, type = "b"))
The ragweed data frame has data on ragweed levels and meteorological variables for 334 days in Kalamazoo, Michigan, U.S.A.
data(ragweed)data(ragweed)
This data frame contains the following columns:
pollenCountragweed pollen count (grains per cubic metre).
yearone of 1991, 1992, 1993 or 1994.
dayInSeasonday number in the current ragweed pollen season.
temperaturetemperature for the corresponding day (degrees Fahrenheit).
temperatureResidualresidual from fitting a 5 effective degrees of freedom smoothing splines to temperature versus day number for each annual ragweed pollen season (degrees Fahrenheit).
rainindicator of significant rain on the corresponding day:
1=at least 3 hours of steady or brief but intense rain,
0=otherwise.
windSpeedwind speed for the corresponding day (knots).
Stark, P. C., Ryan, L. M., McDonald, J. L. and Burge, H. A. (1997). Using meteorologic data to model and predict daily ragweed pollen levels. Aerobiologia, 13, 177-184.
Ruppert, D., Wand, M.P. and Carroll, R.J. (2003). Semiparametric Regression Cambridge University Press.
library(HRW) ; data(ragweed) pairs(ragweed,pch = ".")library(HRW) ; data(ragweed) pairs(ragweed,pch = ".")
The scallop data frame has 148 triplets concerning scallop abundance; based on a 1990 survey cruise in the Atlantic continental shelf off Long Island, New York, U.S.A.
data(scallop)data(scallop)
This data frame contains the following columns:
latitudedegrees latitude (north of the Equator).
longitudedegrees longitude (west of Greenwich).
totalCatchtotal size of scallop catch at location specified by 'latitude' and 'longitude'.
Ecker, M.D. and Heltshe, J.F. (1994). Geostatistical estimates of scallop abundance. In Case Studies in Biometry. Lange, N., Ryan, L., Billard, L., Brillinger, D., Conquest, L. and Greenhouse, J. (eds.) New York: John Wiley & Sons, 107-124.
Ruppert, D., Wand, M.P. and Carroll, R.J. (2003). Semiparametric Regression. Cambridge University Press.
library(HRW) ; data(scallop) pairs(scallop)library(HRW) ; data(scallop) pairs(scallop)
The schoolResults data frame has multilevel data school results and gender for 1,905 school children from 73 schools in United Kingdom.
data(schoolResults)data(schoolResults)
This data frame contains the following columns:
schoolIDschool identification number.
studentIDstudent identification number.
indicator that child is female:
1=child is female,
0=child is male.
writtenScorescore on traditional written examination paper out of a total of 160.
courseScorescore from projects undertaken during the course and marked by the student's own teacher, out of a total of 108.
Creswell, M. (1991). A multilevel bivariate model. In Data Analysis with ML3 (eds. Prosser, R., Rasbash, J. and Goldstein, H.) London: Institute of Education, London.
library(HRW) ; data(schoolResults) if (require("lattice")) print(xyplot(writtenScore ~ courseScore|factor(schoolID), groups = female,data = schoolResults))library(HRW) ; data(schoolResults) if (require("lattice")) print(xyplot(writtenScore ~ courseScore|factor(schoolID), groups = female,data = schoolResults))
Given a set of MCMC for possibly several parameters the following summaries are produced: 1. a trace (time series) plot of each MCMC, 2. a plot of each MCMC sample against the 1-lagged sample, 3. estimated autocorrelatio function (acf), 4. Brooks-Gelman-Rubin (BGR) diagnostic plot in cases where multiple chains are inputted, 5. kernel ddensity estimate of the posterior density function based on the MCMC sample, 6. numerical summary consisting of the MCMC-based estimates of posterior means and credible sets for each parameter.
summMCMC(xList,EPSfileName,PDFfileName,plotInd = 1,parNames,columnHeadings, colourVersion = TRUE,credLevel = 0.95,columnHeadCex = NULL,paletteNum = 1, numerSummCex = NULL,BGRsttPos = 10,BGRyRange = c(0.95,1.25),BGRtickPos = 1.2, BGRlogTransf,BGRlogitTransf,KDExlim,KDEvertLine = TRUE,KDEvertLineCol = "black", addTruthToKDE = NULL)summMCMC(xList,EPSfileName,PDFfileName,plotInd = 1,parNames,columnHeadings, colourVersion = TRUE,credLevel = 0.95,columnHeadCex = NULL,paletteNum = 1, numerSummCex = NULL,BGRsttPos = 10,BGRyRange = c(0.95,1.25),BGRtickPos = 1.2, BGRlogTransf,BGRlogitTransf,KDExlim,KDEvertLine = TRUE,KDEvertLineCol = "black", addTruthToKDE = NULL)
xList |
list of matrices, where each matrix corresponds to a different chain, and the columns of each matrix correspond to different parameters. The matrices each have dimension "numMCMC" by "numParms"; where "numMCMC" is the size of the MCMC sample and "numParms" is the number of parameters being summarized. |
EPSfileName |
filename if the summary is to be saved as a (encapsulated) Postscript file. If this argument and 'PDFfileName' are both not specified then the summary is printed to the screen. |
PDFfileName |
filename if the summary is to be saved as a PDF file. If this argument and 'EPSfileName' are both not specified then the summary is printed to the screen. |
plotInd |
if "numChains" exceeds 1 then this indicates which chain is summarized in the non-BGR panels. The BGR panels are Brooks-Gelman-Rubin diagnostic plots are use all chains. The default value is 1. |
parNames |
list containing a vector of character strings for the parameter names. The maximum length of the vector is 3. |
columnHeadings |
vector containing column headings. The default is: c("parameter","trace","lag 1","acf","BGR","density","summary"). |
colourVersion |
logical flag indicating if summary should be in colour. The default is TRUE. |
credLevel |
number between 0 and 1 specifying the credible set level. The default is 0.95. |
columnHeadCex |
positive number specifying character expansion factor for the column headings. |
paletteNum |
either 1 or 2 to specify the colour palette to be used. |
numerSummCex |
positive number specifying character expansion factor for the numerical summary (last column). |
BGRsttPos |
starting position for the Brooks-Gelman-Rubin plots. The default value is 10. |
BGRyRange |
vertical axis limits for the Brooks-Gelman-Rubin plots. The default value is c(0.95,1.25). |
BGRtickPos |
position of tick mark on vertical axis for the Brooks-Gelman-Rubin plots. The default value is 1.2. |
BGRlogTransf |
vector containing indices of those parameters for which the Brooks-Gelman-Rubin plots should be done on a logarithmic scale. |
BGRlogitTransf |
vector containing indices of those parameters for which the Brooks-Gelman-Rubin plots should be done on a logit scale. |
KDExlim |
list of vectors of length 2 specifying the horizontal axis limits for the kernel density estimates. |
KDEvertLine |
logical flag indicating if a vertical line at zero should be added to the kernel density estimates. The default value is TRUE. |
KDEvertLineCol |
Colour of the vertical line at zero for kernel density estimates. The default value is "black". |
addTruthToKDE |
Vector indicating ‘true’ values of parameters. The default value is NULL. If 'addTruthToKDE' is non-NULL then dashed vertical lines corresponding to true values (if known from simulation) are added. |
Matt Wand [email protected]
library(HRW) xListSingleChain <- list(cbind(rnorm(100),rnorm(100),rnorm(100),rnorm(100))) summMCMC(xListSingleChain,parNames = list("par1","par2","par3","par4")) xListMultipleChains <- list(chain1 = cbind(rnorm(100),rnorm(100),rnorm(100),rnorm(100)), chain2 = cbind(rnorm(100),rnorm(100),rnorm(100),rnorm(100))) summMCMC(xListMultipleChains,parNames = list("par1","par2","par3","par4"))library(HRW) xListSingleChain <- list(cbind(rnorm(100),rnorm(100),rnorm(100),rnorm(100))) summMCMC(xListSingleChain,parNames = list("par1","par2","par3","par4")) xListMultipleChains <- list(chain1 = cbind(rnorm(100),rnorm(100),rnorm(100),rnorm(100)), chain2 = cbind(rnorm(100),rnorm(100),rnorm(100),rnorm(100))) summMCMC(xListMultipleChains,parNames = list("par1","par2","par3","par4"))
The SydneyRealEstate data frame has data on several variables concerning houses sold in Sydney, Australia,
during 2001.
data(SydneyRealEstate)data(SydneyRealEstate)
This data frame contains the following columns:
logSalePricenatural logarithm of sale price in Australian dollars.
lotSizelot size in square meters but with some imputation.
longitudedegrees longitude of location of house.
latitudedegrees latitude of location of house.
saleDatesale date in dd/mm/yyy format.
saleQtrfinancial quarter in which sale took place.
infRateinflation rate measure as a percentage.
postCodefour-digit post code of the suburb in which the house located.
crimeDensitycrime density measure for the suburb in which the house is located.
crimeRatecrime rate measure for the suburb in which the house is located.
incomeaverage weekly income of the suburb in which the house is located.
distToBusStopdistance from house to the nearest bus stop (kilometers).
distToCoastlinedistance from house to the nearest coastline location (kilometers).
distToNatParkdistance from house to the nearest national park (kilometers).
distToParkdistance from house to the nearest park (kilometers).
distToRailLinedistance from house to the nearest railway line (kilometers).
distToRailStationdistance from house to the nearest railway station (kilometers).
distToHighwaydistance from house to the nearest highway (kilometers).
distToFreewaydistance from house to the nearest freeway (kilometers).
distToTunneldistance from house to the Sydney Harbour Tunnel (kilometers).
distToMainRoaddistance from house to the nearest main road (kilometers).
distToSealedRoaddistance from house to the nearest sealed road (kilometers).
distToUnsealedRoaddistance from house to the nearest sealed road (kilometers).
airNoiseaircraft noise exposure measure.
foreignerRatioproportion of foreigners in the suburb in which the house is located.
distToGPOdistance from the house to the General Post Office in Sydney's central business district (kilometers).
NOnitrous oxide level recorded at the air pollution monitoring station nearest to the house.
NO2nitrogen dioxide level recorded at the air pollution monitoring station nearest to the house.
ozoneozone level recorded at the air pollution monitoring station nearest to the house.
nephnephelometer suspended matter measurement recorded at the air pollution monitoring station nearest to the house.
PM10particulate matter with a diameter of under 10 micrometers level recorded at the air pollution monitoring station nearest to the house.
SO2sulphur dioxide level recorded at the air pollution monitoring station nearest to the house.
distToAmbulancedistance from house to the nearest ambulance station (kilometers).
distToFactorydistance from house to the nearest factory (kilometers).
distToFerrydistance from house to the nearest ferry wharf (kilometers).
distToHospitaldistance from house to the nearest hospital (kilometers).
distToMedicaldistance from house to the nearest medical services (kilometers).
distToSchooldistance from house to the nearest school (kilometers).
distToUniversitydistance from house to the nearest university (kilometers).
Chernih, A. and Sherris, M. (2004). Geoadditive hedonic pricing models. Unpublished manuscript. University of New South Wales, Australia.
library(HRW) ; data(SydneyRealEstate) ## Not run: for (colNum in setdiff((2:39),c(5,8))) { plot(jitter(SydneyRealEstate[,colNum]),SydneyRealEstate$logSalePrice,pch = ".", xlab = names(SydneyRealEstate)[colNum],ylab = "log(sale price)",col = "blue") readline("Hit Enter to continue.\n") } ## End(Not run)library(HRW) ; data(SydneyRealEstate) ## Not run: for (colNum in setdiff((2:39),c(5,8))) { plot(jitter(SydneyRealEstate[,colNum]),SydneyRealEstate$logSalePrice,pch = ".", xlab = names(SydneyRealEstate)[colNum],ylab = "log(sale price)",col = "blue") readline("Hit Enter to continue.\n") } ## End(Not run)
A two-column data frame containing the (longitude,latitude) pairs for the vertices of a 202-sided polygon. The polygon was created manually using the HRW package function createBdry(). The polygon tightly encompasses the majority of the (longitude,latitude) data of the HRW package 'SydneyRealEstate' data frame and approximately corresponds to the residential parts of Sydney, Australia.
data(SydneyRealEstateBdry)data(SydneyRealEstateBdry)
A data frame with 203 observations on the following 2 variables (note that the first vertex is repeated at the end of the data frame):
longitudelongitudinal position of a vertex.
latitudelatitudinal position of the same vertex.
SydneyRealEstate
library(HRW) ; data(SydneyRealEstateBdry) ; data(SydneyRealEstate) plot(SydneyRealEstate$longitude,SydneyRealEstate$latitude,cex = 0.1) lines(SydneyRealEstateBdry,lwd = 5,col = "red")library(HRW) ; data(SydneyRealEstateBdry) ; data(SydneyRealEstate) plot(SydneyRealEstate$longitude,SydneyRealEstate$latitude,cex = 0.1) lines(SydneyRealEstateBdry,lwd = 5,col = "red")
One-month maturity U.S. Treasury rate during the period 2001-2013.
data(TreasuryRate)data(TreasuryRate)
A data frame with 3,117 observations on each of the following 2 variables:
datedays from July 31, 2001 until July 10, 2013 with the occasional missing values due to holidays.
ratedaily one-month maturity U.S. Treasury rate.
Federal Reserve Bank of St. Louis, U.S.A.
library(HRW) ; data(TreasuryRate) TRdate <- as.Date(TreasuryRate$date,"%m/%d/%Y")[!is.na(TreasuryRate$rate)] TRrate <- TreasuryRate$rate[!is.na(TreasuryRate$rate)] plot(TRdate,TRrate,type = "l",bty = "l",xlab = "date",ylab = "U.S. Treasury rate")library(HRW) ; data(TreasuryRate) TRdate <- as.Date(TreasuryRate$date,"%m/%d/%Y")[!is.na(TreasuryRate$rate)] TRrate <- TreasuryRate$rate[!is.na(TreasuryRate$rate)] plot(TRdate,TRrate,type = "l",bty = "l",xlab = "date",ylab = "U.S. Treasury rate")
The UtahPEF data frame data contains longitudinal data on peak expiratory flow, air pollution and temperature for a cohort of 41 schoolchildren in the Utah Valley, U.S.A., during 1990-1991.
data(UtahPEF)data(UtahPEF)
This data frame contains the following columns:
idnumschoolchild identification number.
devPEFdaily peak expiratory flow measurements for each schoolchild minus the overall average for that schoolchild.
PM10withMA55-day moving average of the concentration of particulate matter 10 micrometers or less in diameter.
lowTemplowest temperature in degrees Fahrenheit on the day of recording.
timeTrendday of the study on which the measurements were made.
Pope, C.A., Dockery, D.W., Spengler, J.D. and Raizenne, M.E. (1991). Respiratory health and PM_10 pollution: a daily time series analysis. American Review of Respiratory Disease, 144, 668-674.
library(HRW) ; data(UtahPEF) if (require("lattice")) print(xyplot(devPEF ~ PM10withMA5|as.factor(idnum),data = UtahPEF))library(HRW) ; data(UtahPEF) if (require("lattice")) print(xyplot(devPEF ~ PM10withMA5|as.factor(idnum),data = UtahPEF))
'WarsawApts' is a subset of the data set 'apartments' in the R package PBImisc. This dataset contains the prices of the apartments which were sold in Warsaw, Poland, during the calendar years 2007 to 2009.
data(WarsawApts)data(WarsawApts)
A data frame with 409 observations on the following 6 variables:
surfacearea of the apartment in square meters.
districta factor corresponding to the district of Warsaw with levels Mokotow, Srodmiescie, Wola and Zoliborz.
n.roomsnumber of rooms in the apartment.
floorfloor on which the apartment is located.
construction.dateyear that the apartment was constructed.
areaPerMzlotyarea in square meters per million zloty.
The Polish real estate web-site https://www.oferty.net.
Biecek, P. (2016). PBImisc: A set of datasets in My Classes or in the Book 'Modele Liniowe i Mieszane w R, Wraz z Przykladami w Analizie Danych' 1.0.
library(HRW) ; data(WarsawApts) x <- WarsawApts$construction.date y <- WarsawApts$areaPerMzloty plot(x,y,bty = "l",col = "dodgerblue") if (require("mgcv")) { fitGAMcr <- gam(y ~ s(x,bs = "cr",k = 30)) xg <- seq(min(x),max(x),length = 1001) fHatgGAMcr <- predict(fitGAMcr,newdata = data.frame(x = xg)) lines(xg,fHatgGAMcr,col = "darkgreen") }library(HRW) ; data(WarsawApts) x <- WarsawApts$construction.date y <- WarsawApts$areaPerMzloty plot(x,y,bty = "l",col = "dodgerblue") if (require("mgcv")) { fitGAMcr <- gam(y ~ s(x,bs = "cr",k = 30)) xg <- seq(min(x),max(x),length = 1001) fHatgGAMcr <- predict(fitGAMcr,newdata = data.frame(x = xg)) lines(xg,fHatgGAMcr,col = "darkgreen") }
U.S., European and Japanese yield curves. These are functions of maturity. This dataset has 91 columns. The first column is the date, columns 2 to 31 are European yields at maturities from 1 to 30 years, columns 32 to 61 are Japanese yields at these maturities, and columns 62 to 91 are U.S. yields at the same maturities.
data(yields)data(yields)
A data frame with 1565 observations on the following 91 variables:
datedate when the yield is measured.
EU01European yield at a maturity of 1 year.
EU02European yield at a maturity of 2 years.
EU03European yield at a maturity of 3 years.
EU04European yield at a maturity of 4 years.
EU05European yield at a maturity of 5 years.
EU06European yield at a maturity of 6 years.
EU07European yield at a maturity of 7 years.
EU08European yield at a maturity of 8 years.
EU09European yield at a maturity of 9 years.
EU10European yield at a maturity of 10 years.
EU11European yield at a maturity of 11 years.
EU12European yield at a maturity of 12 years.
EU13European yield at a maturity of 13 years.
EU14European yield at a maturity of 14 years.
EU15European yield at a maturity of 15 years.
EU16European yield at a maturity of 16 years.
EU17European yield at a maturity of 17 years.
EU18European yield at a maturity of 18 years.
EU19European yield at a maturity of 19 years.
EU20European yield at a maturity of 20 years.
EU21European yield at a maturity of 21 years.
EU22European yield at a maturity of 22 years.
EU23European yield at a maturity of 23 years.
EU24European yield at a maturity of 24 years.
EU25European yield at a maturity of 25 years.
EU26European yield at a maturity of 26 years.
EU27European yield at a maturity of 27 years.
EU28European yield at a maturity of 28 years.
EU29European yield at a maturity of 29 years.
EU30European yield at a maturity of 30 years.
JP01Japanese yield at a maturity of 1 year.
JP02Japanese yield at a maturity of 2 years.
JP03Japanese yield at a maturity of 3 years.
JP04Japanese yield at a maturity of 4 years.
JP05Japanese yield at a maturity of 5 years.
JP06Japanese yield at a maturity of 6 years.
JP07Japanese yield at a maturity of 7 years.
JP08Japanese yield at a maturity of 8 years.
JP09Japanese yield at a maturity of 9 years.
JP10Japanese yield at a maturity of 10 years.
JP11Japanese yield at a maturity of 11 years.
JP12Japanese yield at a maturity of 12 years.
JP13Japanese yield at a maturity of 13 years.
JP14Japanese yield at a maturity of 14 years.
JP15Japanese yield at a maturity of 15 years.
JP16Japanese yield at a maturity of 16 years.
JP17Japanese yield at a maturity of 17 years.
JP18Japanese yield at a maturity of 18 years.
JP19Japanese yield at a maturity of 19 years.
JP20Japanese yield at a maturity of 20 years.
JP21Japanese yield at a maturity of 21 years.
JP22Japanese yield at a maturity of 22 years.
JP23Japanese yield at a maturity of 23 years.
JP24Japanese yield at a maturity of 24 years.
JP25Japanese yield at a maturity of 25 years.
JP26Japanese yield at a maturity of 26 years.
JP27Japanese yield at a maturity of 27 years.
JP28Japanese yield at a maturity of 28 years.
JP29Japanese yield at a maturity of 29 years.
JP30Japanese yield at a maturity of 30 years.
US01U.S. yield at a maturity of 1 year.
US02U.S. yield at a maturity of 2 years.
US03U.S. yield at a maturity of 3 years.
US04U.S. yield at a maturity of 4 years.
US05U.S. yield at a maturity of 5 years.
US06U.S. yield at a maturity of 6 years.
US07U.S. yield at a maturity of 7 years.
US08U.S. yield at a maturity of 8 years.
US09U.S. yield at a maturity of 9 years.
US10U.S. yield at a maturity of 10 years.
US11U.S. yield at a maturity of 11 years.
US12U.S. yield at a maturity of 12 years.
US13U.S. yield at a maturity of 13 years.
US14U.S. yield at a maturity of 14 years.
US15U.S. yield at a maturity of 15 years.
US16U.S. yield at a maturity of 16 years.
US17U.S. yield at a maturity of 17 years.
US18U.S. yield at a maturity of 18 years.
US19U.S. yield at a maturity of 19 years.
US20U.S. yield at a maturity of 20 years.
US21U.S. yield at a maturity of 21 years.
US22U.S. yield at a maturity of 22 years.
US23U.S. yield at a maturity of 23 years.
US24U.S. yield at a maturity of 24 years.
US25U.S. yield at a maturity of 25 years.
US26U.S. yield at a maturity of 26 years.
US27U.S. yield at a maturity of 27 years.
US28U.S. yield at a maturity of 28 years.
US29U.S. yield at a maturity of 29 years.
US30U.S. yield at a maturity of 30 years.
European Central Bank (European yields), Japanese Ministry of Finance (Japanese yields) and U.S. Federal Reserve Board (U.S. yields).
library(HRW) ; data(yields) t <- 1:30 ; yieldsCleaned <- na.omit(yields)[,-1] plot(t,yieldsCleaned[1,61:90], type="l",ylim = c(0,6),lwd = 2, xlab = "maturity (years)",ylab = "yield", bty = "l",cex.lab = 1.5,cex.axis = 1.5) for (i in 2:14) lines(t,yieldsCleaned[100*i+1,61:90],col = i,lwd = 2)library(HRW) ; data(yields) t <- 1:30 ; yieldsCleaned <- na.omit(yields)[,-1] plot(t,yieldsCleaned[1,61:90], type="l",ylim = c(0,6),lwd = 2, xlab = "maturity (years)",ylab = "yield", bty = "l",cex.lab = 1.5,cex.axis = 1.5) for (i in 2:14) lines(t,yieldsCleaned[100*i+1,61:90],col = i,lwd = 2)
Constructs a design matrix consisting of O'Sullivan cubic spline functions of an array of abscissae. Typicially the array corresponds to either observed values of a predictor or an abscissa grid for plotting purposes.
ZOSull(x,range.x,intKnots,drv = 0)ZOSull(x,range.x,intKnots,drv = 0)
x |
array of abscissae. |
range.x |
array of length 2 such that range.x[1] <= min(x) and range.x[2] >= max(x). |
intKnots |
ordered array of length smaller than that of x and consisting of unique numbers between min(x) and max(x) that specifies the positions of internal knots, that define the spline basis (see the Wand and Ormerod (2008) reference below for full mathematical details). |
drv |
either 0,1 or 2 with a default value of 0. If drv = 1 then the first derivatives of the O'Sullivan spline basis functions are computed instead. Similarly, if drv = 2 then the second derivatives are computed. |
A matrix of with length(x) rows and (length(intKnots) + 2) columns with each column containing a separate O'Sullivan spline basis function (determined by range.x and intKnots) of the abscissae in x. The values of range.x and intKnots are included as attributes of the fit object.
Matt Wand [email protected]
O'Sullivan, F. (1986). A statistical perspective on ill-posed inverse problems (with discussion). Statistical Science, 1, 505-527.
Wand, M.P. and Ormerod, J.T. (2008). On semiparametric regression with O'Sullivan penalized splines. Australian and New Zealand Journal of Statistics. 50, 179-198.
library(HRW) x <- WarsawApts$construction.date a <- 1.01*min(x) - 0.01*max(x) b <- 1.01*max(x) - 0.01*min(x) ; numIntKnots <- 23 intKnots <- quantile(unique(x),seq(0,1,length = (numIntKnots + 2))[-c(1,(numIntKnots + 2))]) xg <- seq(a,b,length = 1001) Zg <- ZOSull(xg,range.x = c(a,b),intKnots = intKnots) plot(0,type = "n",xlim = range(xg),ylim = range(Zg), bty = "l",xlab = "construction date (year)", ylab = "spline basis function") for (k in 1:ncol(Zg)) lines(xg,Zg[,k],col = k,lwd = 2) lines(c(min(xg),max(xg)),rep(0,2),col = "darkmagenta") for (k in 1:numIntKnots) points(intKnots[k],0,pch = 18,cex = 2,col = "darkmagenta")library(HRW) x <- WarsawApts$construction.date a <- 1.01*min(x) - 0.01*max(x) b <- 1.01*max(x) - 0.01*min(x) ; numIntKnots <- 23 intKnots <- quantile(unique(x),seq(0,1,length = (numIntKnots + 2))[-c(1,(numIntKnots + 2))]) xg <- seq(a,b,length = 1001) Zg <- ZOSull(xg,range.x = c(a,b),intKnots = intKnots) plot(0,type = "n",xlim = range(xg),ylim = range(Zg), bty = "l",xlab = "construction date (year)", ylab = "spline basis function") for (k in 1:ncol(Zg)) lines(xg,Zg[,k],col = k,lwd = 2) lines(c(min(xg),max(xg)),rep(0,2),col = "darkmagenta") for (k in 1:numIntKnots) points(intKnots[k],0,pch = 18,cex = 2,col = "darkmagenta")