Type: Package
Title: Basic Statistics and Data Analysis
Version: 1.2.2
Date: 2023-09-14
LazyData: yes
Maintainer: Alan T. Arnholt <arnholtat@appstate.edu>
Description: Data sets for book "Basic Statistics and Data Analysis" by Larry J. Kitchens.
Depends: lattice, R (≥ 2.10)
Imports: e1071
License: GPL-3
Suggests: ggplot2 (≥ 2.1.0), dplyr, tidyr
RoxygenNote: 7.2.3
Encoding: UTF-8
URL: https://github.com/alanarnholt/BSDA, https://alanarnholt.github.io/BSDA/
BugReports: https://github.com/alanarnholt/BSDA/issues
NeedsCompilation: no
Packaged: 2023-09-18 13:43:50 UTC; arnholtat
Author: Alan T. Arnholt [aut, cre], Ben Evans [aut]
Repository: CRAN
Date/Publication: 2023-09-18 17:50:05 UTC
Daily price returns (in pence) of Abbey National shares between 7/31/91 and 10/8/91
Description
Data used in Exercise 6.39
Usage
Abbey
Format
A data frame/tibble with 50 observations on one variable
- price
daily price returns (in pence) of Abbey National shares
Source
Buckle, D. (1995), Bayesian Inference for Stable Distributions, Journal of the American Statistical Association, 90, 605-613.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
qqnorm(Abbey$price)
qqline(Abbey$price)
t.test(Abbey$price, mu = 300)
hist(Abbey$price, main = "Exercise 6.39",
xlab = "daily price returns (in pence)",
col = "blue")
Three samples to illustrate analysis of variance
Description
Data used in Exercise 10.1
Usage
Abc
Format
A data frame/tibble with 54 observations on two variables
- response
a numeric vector
- group
a character vector with values A, B, and C
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(response ~ group, col=c("red", "blue", "green"), data = Abc )
anova(lm(response ~ group, data = Abc))
Crimes reported in Abilene, Texas
Description
Data used in Exercises 1.23 and 2.79
Usage
Abilene
Format
A data frame/tibble with 16 observations on three variables
- crimetype
a character variable with values Aggravated assault, Arson, Burglary, Forcible rape, Larceny theft, Murder, Robbery, and Vehicle theft
- year
a factor with levels 1992 and 1999
- number
number of reported crimes
Source
Uniform Crime Reports, US Dept. of Justice.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
par(mfrow = c(2, 1))
barplot(Abilene$number[Abilene$year=="1992"],
names.arg = Abilene$crimetype[Abilene$year == "1992"],
main = "1992 Crime Stats", col = "red")
barplot(Abilene$number[Abilene$year=="1999"],
names.arg = Abilene$crimetype[Abilene$year == "1999"],
main = "1999 Crime Stats", col = "blue")
par(mfrow = c(1, 1))
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Abilene, aes(x = crimetype, y = number, fill = year)) +
geom_bar(stat = "identity", position = "dodge") +
theme_bw() +
theme(axis.text.x = element_text(angle = 30, hjust = 1))
## End(Not run)
Perceived math ability for 13-year-olds by gender
Description
Data used in Exercise 8.57
Usage
Ability
Format
A data frame/tibble with 400 observations on two variables
- gender
a factor with levels girls and boys
- ability
a factor with levels hopeless, belowavg, average, aboveavg, and superior
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
CT <- xtabs(~gender + ability, data = Ability)
CT
chisq.test(CT)
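The example above tabulates long-format data (one row per subject) with xtabs() and then runs chisq.test(). As a minimal sketch of what the test compares against, the code below rebuilds the expected counts under independence from the table margins. The data frame toy and its values are made up for illustration and are not part of BSDA.

```r
# Hypothetical toy data in the same long format as Ability (one row per subject).
toy <- data.frame(
  gender  = rep(c("girls", "boys"), times = c(30, 30)),
  ability = c(rep(c("low", "mid", "high"), times = c(10, 12, 8)),
              rep(c("low", "mid", "high"), times = c(6, 12, 12)))
)

# Cross-tabulate the two categorical variables.
CT <- xtabs(~ gender + ability, data = toy)

# Expected counts under independence: (row total * column total) / grand total.
expected <- outer(rowSums(CT), colSums(CT)) / sum(CT)

# chisq.test() stores the same expected counts in its result.
res <- chisq.test(CT)
all.equal(unname(expected), unname(res$expected))
```

Inspecting the expected counts this way is also a quick check that none of them is too small for the chi-squared approximation.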
Abortion rate by region of country
Description
Data used in Exercise 8.51
Usage
Abortion
Format
A data frame/tibble with 51 observations on the following 10 variables:
- state
a character variable with values alabama, alaska, arizona, arkansas, california, colorado, connecticut, delaware, dist of columbia, florida, georgia, hawaii, idaho, illinois, indiana, iowa, kansas, kentucky, louisiana, maine, maryland, massachusetts, michigan, minnesota, mississippi, missouri, montana, nebraska, nevada, new hampshire, new jersey, new mexico, new york, north carolina, north dakota, ohio, oklahoma, oregon, pennsylvania, rhode island, south carolina, south dakota, tennessee, texas, utah, vermont, virginia, washington, west virginia, wisconsin, and wyoming
- region
a character variable with values midwest, northeast, south, and west
- regcode
a numeric vector
- rate1988
a numeric vector
- rate1992
a numeric vector
- rate1996
a numeric vector
- provide1988
a numeric vector
- provide1992
a numeric vector
- lowhigh
a numeric vector
- rate
a factor with levels Low and High
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~region + rate, data = Abortion)
T1
chisq.test(T1)
Number of absent days for 20 employees
Description
Data used in Exercise 1.28
Usage
Absent
Format
A data frame/tibble with 20 observations on one variable
- days
days absent
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
CT <- xtabs(~ days, data = Absent)
CT
barplot(CT, col = "pink", main = "Exercise 1.28")
plot(ecdf(Absent$days), main = "ECDF")
Math achievement test scores by gender for 25 high school students
Description
Data used in Example 7.14 and Exercise 10.7
Usage
Achieve
Format
A data frame/tibble with 25 observations on two variables
- score
mathematics achievement score
- gender
a factor with 2 levels boys and girls
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
anova(lm(score ~ gender, data = Achieve))
t.test(score ~ gender, var.equal = TRUE, data = Achieve)
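The two calls above are closely related: with exactly two groups, the one-way ANOVA F statistic is the square of the pooled-variance t statistic, and the two p-values coincide. The sketch below checks this on the mtcars data that ships with R rather than on Achieve, so it runs without BSDA installed. The object names cars2, tt, and aov_tab are chosen here for illustration only.

```r
# mtcars ships with base R; 'am' (0/1) plays the role of a two-level group.
cars2 <- transform(mtcars, am = factor(am))

# Pooled-variance two-sample t-test.
tt <- t.test(mpg ~ am, var.equal = TRUE, data = cars2)

# One-way ANOVA on the same two groups.
aov_tab <- anova(lm(mpg ~ am, data = cars2))

# The ANOVA F statistic equals the squared t statistic, and the p-values agree.
t_squared <- unname(tt$statistic)^2
f_value   <- aov_tab[["F value"]][1]
all.equal(t_squared, f_value)
all.equal(tt$p.value, aov_tab[["Pr(>F)"]][1])
```

The equivalence holds only for the pooled test (var.equal = TRUE); the default Welch test uses a different standard error and degrees of freedom.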
Number of ads versus number of sales for a retailer of satellite dishes
Description
Data used in Exercise 9.15
Usage
Adsales
Format
A data frame/tibble with six observations on three variables
- month
a character vector listing month
- ads
a numeric vector containing number of ads
- sales
a numeric vector containing number of sales
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(sales ~ ads, data = Adsales, main = "Exercise 9.15")
mod <- lm(sales ~ ads, data = Adsales)
abline(mod, col = "red")
summary(mod)
predict(mod, newdata = data.frame(ads = 6), interval = "conf", level = 0.99)
Aggressive tendency scores for a group of teenage members of a street gang
Description
Data used in Exercises 1.66 and 1.81
Usage
Aggress
Format
A data frame/tibble with 28 observations on one variable
- aggres
measure of aggressive tendency, ranging from 10 to 50
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
with(data = Aggress,
EDA(aggres))
# OR
IQR(Aggress$aggres)
diff(range(Aggress$aggres))
Monthly payments per person for families in the AFDC federal program
Description
Data used in Exercises 1.91 and 3.68
Usage
Aid
Format
A data frame/tibble with 51 observations on two variables
- state
a factor with levels Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, District of Colunbia, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missour, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
- payment
average monthly payment per person in a family
Source
US Department of Health and Human Services, 1993.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
hist(Aid$payment, xlab = "payment", main =
"Average monthly payment per person in a family",
col = "lightblue")
boxplot(Aid$payment, col = "lightblue")
dotplot(state ~ payment, data = Aid)
Incubation times for 295 patients thought to be infected with HIV by a blood transfusion
Description
Data used in Exercise 6.60
Usage
Aids
Format
A data frame/tibble with 295 observations on three variables
- duration
time (in months) from HIV infection to the clinical manifestation of full-blown AIDS
- age
age (in years) of patient
- group
a numeric vector
Source
Kalbfleisch, J. and Lawless, J. (1989), An analysis of the data on transfusion related AIDS, Journal of the American Statistical Association, 84, 360-372.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
with(data = Aids,
EDA(duration)
)
with(data = Aids,
t.test(duration, mu = 30, alternative = "greater")
)
with(data = Aids,
SIGN.test(duration, md = 24, alternative = "greater")
)
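For readers who want to see what a sign test computes, the sketch below reproduces an exact one-sided sign test with base R's binom.test(). It is an illustration under stated assumptions, not the implementation of SIGN.test(), which may differ in how it handles ties and what it reports. The vector x is made-up toy data, not the Aids durations.

```r
# Hypothetical toy durations (months); not taken from the Aids data.
x  <- c(12, 18, 25, 27, 30, 31, 33, 36, 40, 41)
md <- 24  # hypothesized median

# Drop observations equal to the hypothesized median, then count those above it.
n_used  <- sum(x != md)
n_above <- sum(x > md)

# Under H0 the count above the median is Binomial(n_used, 0.5).
res <- binom.test(n_above, n_used, p = 0.5, alternative = "greater")
res$p.value
```

Because the test uses only the signs of the deviations from md, it needs no normality assumption, which is why it is a natural companion to the t-test in the example above.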
Aircraft disasters in five different decades
Description
Data used in Exercise 1.12
Usage
Airdisasters
Format
A data frame/tibble with 141 observations on the following seven variables
- year
a numeric vector indicating the year of an aircraft accident
- deaths
a numeric vector indicating the number of deaths of an aircraft accident
- decade
a character vector indicating the decade of an aircraft accident
Source
2000 World Almanac and Book of Facts.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
par(las = 1)
stripchart(deaths ~ decade, data = Airdisasters,
subset = decade != "1930s" & decade != "1940s",
method = "stack", pch = 19, cex = 0.5, col = "red",
main = "Aircraft Disasters 1950 - 1990",
xlab = "Number of fatalities")
par(las = 0)
Percentage of on-time arrivals and number of complaints for 11 airlines
Description
Data for Example 2.9
Usage
Airline
Format
A data frame/tibble with 11 observations on three variables
- airline
a character variable with values Alaska, Amer West, American, Continental, Delta, Northwest, Pan Am, Southwest, TWA, United, and USAir
- ontime
a numeric vector
- complaints
complaints per 1000 passengers
Source
Transportation Department.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
with(data = Airline,
barplot(complaints, names.arg = airline, col = "lightblue",
las = 2)
)
plot(complaints ~ ontime, data = Airline, pch = 19, col = "red",
xlab = "On time", ylab = "Complaints")
Ages at which 14 female alcoholics began drinking
Description
Data used in Exercise 5.79
Usage
Alcohol
Format
A data frame/tibble with 14 observations on one variable
- age
age when individual started drinking
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
qqnorm(Alcohol$age)
qqline(Alcohol$age)
SIGN.test(Alcohol$age, md = 20, conf.level = 0.99)
Allergy medicines by adverse events
Description
Data used in Exercise 8.22
Usage
Allergy
Format
A data frame/tibble with 406 observations on two variables
- event
a factor with levels insomnia, headache, and drowsiness
- medication
a factor with levels seldane-d, pseudoephedrine, and placebo
Source
Marion Merrell Dow, Inc. Kansas City, Mo. 64114.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~event + medication, data = Allergy)
T1
chisq.test(T1)
Recovery times for anesthetized patients
Description
Data used in Exercise 5.58
Usage
Anesthet
Format
A data frame/tibble with 10 observations on one variable
- recover
recovery time (in hours)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
qqnorm(Anesthet$recover)
qqline(Anesthet$recover)
with(data = Anesthet,
t.test(recover, conf.level = 0.90)$conf
)
Math test scores versus anxiety scores before the test
Description
Data used in Exercise 2.96
Usage
Anxiety
Format
A data frame/tibble with 20 observations on two variables
- anxiety
anxiety score before a major math test
- math
math test score
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(math ~ anxiety, data = Anxiety, ylab = "score",
main = "Exercise 2.96")
with(data = Anxiety,
cor(math, anxiety)
)
linmod <- lm(math ~ anxiety, data = Anxiety)
abline(linmod, col = "purple")
summary(linmod)
Level of apolipoprotein B and number of cups of coffee consumed per day for 15 adult males
Description
Data used in Examples 9.2 and 9.9
Usage
Apolipop
Format
A data frame/tibble with 15 observations on two variables
- coffee
number of cups of coffee per day
- apolipB
level of apolipoprotein B
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(apolipB ~ coffee, data = Apolipop)
linmod <- lm(apolipB ~ coffee, data = Apolipop)
summary(linmod)
summary(linmod)$sigma
anova(linmod)
anova(linmod)[2, 3]^.5
par(mfrow = c(2, 2))
plot(linmod)
par(mfrow = c(1, 1))
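The example above extracts the residual standard error in two ways: summary(linmod)$sigma and the square root of the residual mean square in the ANOVA table. The sketch below confirms that the two routes agree, using the cars data that ships with R in place of Apolipop. The names fit, sigma_hat, and mse are chosen here for illustration.

```r
# 'cars' ships with base R and stands in for any simple regression data.
fit <- lm(dist ~ speed, data = cars)

# Residual standard error as reported by summary().
sigma_hat <- summary(fit)$sigma

# Row 2 of the ANOVA table is the residual row; column 3 is the mean square.
mse <- anova(fit)[2, 3]

# The two routes agree: sigma_hat is the square root of the residual mean square.
all.equal(sigma_hat, sqrt(mse))

# Equivalent computation from the residuals directly.
all.equal(sigma_hat, sqrt(sum(resid(fit)^2) / df.residual(fit)))
```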
Median costs of an appendectomy at 20 hospitals in North Carolina
Description
Data for Exercise 1.119
Usage
Append
Format
A data frame/tibble with 20 observations on one variable
- fee
fees for an appendectomy for a random sample of 20 hospitals in North Carolina
Source
North Carolina Medical Database Commission, August 1994.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
fee <- Append$fee
ll <- mean(fee) - 2*sd(fee)
ul <- mean(fee) + 2*sd(fee)
limits <-c(ll, ul)
limits
fee[fee < ll | fee > ul]
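The steps above flag values more than two standard deviations from the mean. A minimal sketch of the same rule wrapped in a reusable function follows. The function name outside_k_sd and the vector toy_fee are hypothetical and are not part of BSDA or of the Append data.

```r
# Return the values lying more than k standard deviations from the mean.
outside_k_sd <- function(x, k = 2) {
  ll <- mean(x) - k * sd(x)
  ul <- mean(x) + k * sd(x)
  x[x < ll | x > ul]
}

# Hypothetical toy fees (not the Append data): one value is far from the rest.
toy_fee <- c(3000, 3100, 3200, 3300, 3400, 3500, 3600, 9000)
outside_k_sd(toy_fee)
```

Note that an extreme value inflates both the mean and the standard deviation, so this rule can miss outliers in small samples.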
Median costs of appendectomies at three different types of North Carolina hospitals
Description
Data for Exercise 10.60
Usage
Appendec
Format
A data frame/tibble with 59 observations on two variables
- cost
median costs of appendectomies at hospitals across the state of North Carolina in 1992
- region
a vector classifying each hospital as rural, regional, or metropolitan
Source
Consumer's Guide to Hospitalization Charges in North Carolina Hospitals (August 1994), North Carolina Medical Database Commission, Department of Insurance.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(cost ~ region, data = Appendec, col = c("red", "blue", "cyan"))
anova(lm(cost ~ region, data = Appendec))
Aptitude test scores versus productivity in a factory
Description
Data for Exercises 2.1, 2.26, 2.35 and 2.51
Usage
Aptitude
Format
A data frame/tibble with 8 observations on two variables
- aptitude
aptitude test scores
- product
productivity scores
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(product ~ aptitude, data = Aptitude, main = "Exercise 2.1")
model1 <- lm(product ~ aptitude, data = Aptitude)
model1
abline(model1, col = "red", lwd=3)
resid(model1)
fitted(model1)
cor(Aptitude$product, Aptitude$aptitude)
Radiocarbon ages of observations taken from an archaeological site
Description
Data for Exercises 5.120, 10.20 and Example 1.16
Usage
Archaeo
Format
A data frame/tibble with 60 observations on two variables
- age
number of years before 1983 - the year the data were obtained
- phase
Ceramic Phase numbers
Source
Cunliffe, B. (1984) and Naylor and Smith (1988).
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(age ~ phase, data = Archaeo, col = "yellow",
main = "Example 1.16", xlab = "Ceramic Phase", ylab = "Age")
anova(lm(age ~ as.factor(phase), data= Archaeo))
Time of relief for three treatments of arthritis
Description
Data for Exercise 10.58
Usage
Arthriti
Format
A data frame/tibble with 51 observations on two variables
- time
time (measured in days) until an arthritis sufferer experienced relief
- treatment
a factor with levels A, B, and C
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(time ~ treatment, data = Arthriti,
col = c("lightblue", "lightgreen", "yellow"),
ylab = "days")
anova(lm(time ~ treatment, data = Arthriti))
Durations of operation for 15 artificial heart transplants
Description
Data for Exercise 1.107
Usage
Artifici
Format
A data frame/tibble with 15 observations on one variable
- duration
duration (in hours) for transplant
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Artifici$duration, 2)
summary(Artifici$duration)
values <- Artifici$duration[Artifici$duration < 6.5]
values
summary(values)
Dissolving time versus level of impurities in aspirin tablets
Description
Data for Exercise 10.51
Usage
Asprin
Format
A data frame/tibble with 15 observations on two variables
- time
time (in seconds) for aspirin to dissolve
- impurity
impurity of an ingredient with levels 1%, 5%, and 10%
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(time ~ impurity, data = Asprin,
col = c("red", "blue", "green"))
Asthmatic relief index on nine subjects given a drug and a placebo
Description
Data for Exercise 7.52
Usage
Asthmati
Format
A data frame/tibble with nine observations on three variables
- drug
asthmatic relief index for patients given a drug
- placebo
asthmatic relief index for patients given a placebo
- difference
difference between the placebo and drug
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
qqnorm(Asthmati$difference)
qqline(Asthmati$difference)
shapiro.test(Asthmati$difference)
with(data = Asthmati,
t.test(placebo, drug, paired = TRUE, mu = 0, alternative = "greater")
)
Number of convictions reported by U.S. attorney's offices
Description
Data for Example 2.2 and Exercises 2.43 and 2.57
Usage
Attorney
Format
A data frame/tibble with 88 observations on three variables
- staff
U.S. attorneys' office staff per 1 million population
- convict
U.S. attorneys' office convictions per 1 million population
- district
a factor with levels Albuquerque; Alexandria, Va; Anchorage; Asheville, NC; Atlanta; Baltimore; Baton Rouge; Billings, Mt; Birmingham, Al; Boise, Id; Boston; Buffalo; Burlington, Vt; Cedar Rapids; Charleston, WVA; Cheyenne, Wy; Chicago; Cincinnati; Cleveland; Columbia, SC; Concord, NH; Denver; Des Moines; Detroit; East St. Louis; Fargo, ND; Fort Smith, Ark; Fort Worth; Grand Rapids, Mi; Greensboro, NC; Honolulu; Houston; Indianapolis; Jackson, Miss; Kansas City; Knoxville, Tn; Las Vegas; Lexington, Ky; Little Rock; Los Angeles; Louisville; Memphis; Miami; Milwaukee; Minneapolis; Mobile, Ala; Montgomery, Ala; Muskogee, Ok; Nashville; New Haven, Conn; New Orleans; New York (Brooklyn); New York (Manhattan); Newark, NJ; Oklahoma City; Omaha; Oxford, Miss; Pensacola, Fl; Philadelphia; Phoenix; Pittsburgh; Portland, Maine; Portland, Ore; Providence, RI; Raleigh, NC; Roanoke, Va; Sacramento; Salt Lake City; San Antonio; San Diego; San Francisco; Savannah, Ga; Scranton, Pa; Seattle; Shreveport, La; Sioux Falls, SD; South Bend, Ind; Spokane, Wash; Springfield, Ill; St. Louis; Syracuse, NY; Tampa; Topeka, Kan; Tulsa; Tyler, Tex; Washington; Wheeling, WVa; and Wilmington, Del
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
par(mfrow=c(1, 2))
plot(convict ~ staff, data = Attorney, main = "With Washington, D.C.")
plot(convict[-86] ~staff[-86], data = Attorney,
main = "Without Washington, D.C.")
par(mfrow=c(1, 1))
Number of defective auto gears produced by two manufacturers
Description
Data for Exercise 7.46
Usage
Autogear
Format
A data frame/tibble with 20 observations on two variables
- defectives
number of defective gears in the production of 100 gears per day
- manufacturer
a factor with levels A and B
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
t.test(defectives ~ manufacturer, data = Autogear)
wilcox.test(defectives ~ manufacturer, data = Autogear)
t.test(defectives ~ manufacturer, var.equal = TRUE, data = Autogear)
Illustrates inferences based on pooled t-test versus Wilcoxon rank sum test
Description
Data for Exercise 7.40
Usage
Backtoback
Format
A data frame/tibble with 24 observations on two variables
- score
a numeric vector
- group
a numeric vector
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
wilcox.test(score ~ group, data = Backtoback)
t.test(score ~ group, data = Backtoback)
Baseball salaries for members of five major league teams
Description
Data for Exercise 1.11
Usage
Bbsalaries
Format
A data frame/tibble with 142 observations on two variables
- salary
1999 salary for baseball player
- team
a factor with levels Angels, Indians, Orioles, Redsoxs, and Whitesoxs
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stripchart(salary ~ team, data = Bbsalaries, method = "stack",
pch = 19, col = "blue", cex = 0.75)
title(main = "Major League Salaries")
Graduation rates for student athletes and nonathletes in the Big Ten Conference
Description
Data for Exercises 1.124 and 2.94
Usage
Bigten
Format
A data frame/tibble with 44 observations on the following four variables
- school
a factor with levels Illinois, Indiana, Iowa, Michigan, Michigan State, Minnesota, Northwestern, Ohio State, Penn State, Purdue, and Wisconsin
- rate
graduation rate
- year
factor with two levels 1984-1985 and 1993-1994
- status
factor with two levels athlete and student
Source
NCAA Graduation Rates Report, 2000.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(rate ~ status, data = subset(Bigten, year == "1993-1994"),
horizontal = TRUE, main = "Graduation Rates 1993-1994")
with(data = Bigten,
tapply(rate, list(year, status), mean)
)
Test scores on first exam in biology class
Description
Data for Exercise 1.49
Usage
Biology
Format
A data frame/tibble with 30 observations on one variable
- score
test scores on the first test in a beginning biology class
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
hist(Biology$score, breaks = "scott", col = "brown", freq = FALSE,
main = "Problem 1.49", xlab = "Test Score")
lines(density(Biology$score), lwd=3)
Live birth rates in 1990 and 1998 for all states
Description
Data for Example 1.10
Usage
Birth
Format
A data frame/tibble with 51 observations on three variables
- state
a character variable with values Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, District of Colunbia, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missour, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
- rate
live birth rates per 1000 population
- year
a factor with levels 1990 and 1998
Source
National Vital Statistics Report, 48, March 28, 2000, National Center for Health Statistics.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
rate1998 <- subset(Birth, year == "1998", select = rate)
stem(x = rate1998$rate, scale = 2)
hist(rate1998$rate, breaks = seq(10.9, 21.9, 1.0), xlab = "1998 Birth Rate",
main = "Figure 1.14 in BSDA", col = "pink")
hist(rate1998$rate, breaks = seq(10.9, 21.9, 1.0), xlab = "1998 Birth Rate",
main = "Figure 1.16 in BSDA", col = "pink", freq = FALSE)
lines(density(rate1998$rate), lwd = 3)
rm(rate1998)
Education level of blacks by gender
Description
Data for Exercise 8.55
Usage
Blackedu
Format
A data frame/tibble with 3800 observations on two variables
- gender
a factor with levels
Female
andMale
- education
a factor with levels High school dropout, High school graudate, Some college, Bachelor's degree, and Graduate degree
Source
Bureau of Census data.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~gender + education, data = Blackedu)
T1
chisq.test(T1)
Blood pressure of 15 adult males taken by machine and by an expert
Description
Data for Exercise 7.84
Usage
Blood
Format
A data frame/tibble with 15 observations on the following two variables
- machine
blood pressure recorded from an automated blood pressure machine
- expert
blood pressure recorded by an expert using an at-home device
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
DIFF <- Blood$machine - Blood$expert
shapiro.test(DIFF)
qqnorm(DIFF)
qqline(DIFF)
rm(DIFF)
t.test(Blood$machine, Blood$expert, paired = TRUE)
Incomes of board members from three different universities
Description
Data for Exercise 10.14
Usage
Board
Format
A data frame/tibble with 7 observations on three variables
- salary
1999 salary (in $1000) for board directors
- university
a factor with levels A, B, and C
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(salary ~ university, data = Board, col = c("red", "blue", "green"),
ylab = "Income")
tapply(Board$salary, Board$university, summary)
anova(lm(salary ~ university, data = Board))
## Not run:
library(dplyr)
dplyr::group_by(Board, university) %>%
summarize(Average = mean(salary))
## End(Not run)
Bone density measurements of 35 physically active and 35 non-active women
Description
Data for Example 7.22
Usage
Bones
Format
A data frame/tibble with 70 observations on two variables
- density
bone density measurements
- group
a factor with levels active and nonactive
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
t.test(density ~ group, data = Bones, alternative = "greater")
t.test(rank(density) ~ group, data = Bones, alternative = "greater")
wilcox.test(density ~ group, data = Bones, alternative = "greater")
Number of books read and final spelling scores for 17 third graders
Description
Data for Exercise 9.53
Usage
Books
Format
A data frame/tibble with 17 observations on two variables
- book
number of books read
- spelling
spelling score
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(spelling ~ book, data = Books)
mod <- lm(spelling ~ book, data = Books)
summary(mod)
abline(mod, col = "blue", lwd = 2)
Prices paid for used books at three different bookstores
Description
Data for Exercises 10.30 and 10.31
Usage
Bookstor
Format
A data frame/tibble with 72 observations on two variables
- dollars
money obtained for selling textbooks
- store
a factor with levels A, B, and C
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(dollars ~ store, data = Bookstor,
col = c("purple", "lightblue", "cyan"))
kruskal.test(dollars ~ store, data = Bookstor)
Brain weight versus body weight of 28 animals
Description
Data for Exercises 2.15, 2.44, 2.58 and Examples 2.3 and 2.20
Usage
Brain
Format
A data frame/tibble with 28 observations on three variables
- species
a factor with levels African elephant, Asian Elephant, Brachiosaurus, Cat, Chimpanzee, Cow, Diplodocus, Donkey, Giraffe, Goat, Gorilla, Gray wolf, Guinea Pig, Hamster, Horse, Human, Jaguar, Kangaroo, Mole, Mouse, Mt Beaver, Pig, Potar monkey, Rabbit, Rat, Rhesus monkey, Sheep, and Triceratops
- bodyweight
body weight (in kg)
- brainweight
brain weight (in g)
Source
P. Rousseeuw and A. Leroy, Robust Regression and Outlier Detection (New York: Wiley, 1987).
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(log(brainweight) ~ log(bodyweight), data = Brain,
pch = 19, col = "blue", main = "Example 2.3")
mod <- lm(log(brainweight) ~ log(bodyweight), data = Brain)
abline(mod, lty = "dashed", col = "blue")
Repair costs of vehicles crashed into a barrier at 5 miles per hour
Description
Data for Exercise 1.73
Usage
Bumpers
Format
A data frame/tibble with 23 observations on two variables
- car
a factor with levels Buick Century, Buick Skylark, Chevrolet Cavalier, Chevrolet Corsica, Chevrolet Lumina, Dodge Dynasty, Dodge Monaco, Ford Taurus, Ford Tempo, Honda Accord, Hyundai Sonata, Mazda 626, Mitsubishi Galant, Nissan Stanza, Oldsmobile Calais, Oldsmobile Ciere, Plymouth Acclaim, Pontiac 6000, Pontiac Grand Am, Pontiac Sunbird, Saturn SL2, Subaru Legacy, and Toyota Camry
- repair
total repair cost (in dollars) after crashing a car into a barrier four times while the car was traveling at 5 miles per hour
Source
Insurance Institute for Highway Safety.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
EDA(Bumpers$repair)
stripchart(Bumpers$repair, method = "stack", pch = 19, col = "blue")
library(lattice)
dotplot(car ~ repair, data = Bumpers)
Attendance of bus drivers versus shift
Description
Data for Exercise 8.25
Usage
Bus
Format
A data frame/tibble with 29363 observations on two variables
- attendance
a factor with levels absent and present
- shift
a factor with levels am, noon, pm, swing, and split
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~attendance + shift, data = Bus)
T1
chisq.test(T1)
Median charges for coronary bypass at 17 hospitals in North Carolina
Description
Data for Exercises 5.104 and 6.43
Usage
Bypass
Format
A data frame/tibble with 17 observations on two variables
- hospital
a factor with levels Carolinas Med Ct, Duke Med Ct, Durham Regional, Forsyth Memorial, Frye Regional, High Point Regional, Memorial Mission, Mercy, Moore Regional, Moses Cone Memorial, NC Baptist, New Hanover Regional, Pitt Co. Memorial, Presbyterian, Rex, Univ of North Carolina, and Wake County
- charge
median charge for coronary bypass
Source
Consumer's Guide to Hospitalization Charges in North Carolina Hospitals (August 1994), North Carolina Medical Database Commission, Department of Insurance.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
EDA(Bypass$charge)
t.test(Bypass$charge, conf.level=.90)$conf
t.test(Bypass$charge, mu = 35000)
Estimates of costs of kitchen cabinets by two suppliers on 20 prospective homes
Description
Data for Exercise 7.83
Usage
Cabinets
Format
A data frame/tibble with 20 observations on three variables
- home
a numeric vector
- supplA
estimate for kitchen cabinets from supplier A (in dollars)
- supplB
estimate for kitchen cabinets from supplier B (in dollars)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
DIF <- Cabinets$supplA - Cabinets$supplB
qqnorm(DIF)
qqline(DIF)
shapiro.test(DIF)
with(data = Cabinets,
t.test(supplA, supplB, paired = TRUE)
)
with(data = Cabinets,
wilcox.test(supplA, supplB, paired = TRUE)
)
rm(DIF)
Survival times of terminal cancer patients treated with vitamin C
Description
Data for Exercises 6.55 and 6.64
Usage
Cancer
Format
A data frame/tibble with 64 observations on two variables
- survival
survival time (in days) of terminal patients treated with vitamin C
- type
a factor indicating type of cancer with levels breast, bronchus, colon, ovary, and stomach
Source
Cameron, E. and Pauling, L. 1978. “Supplemental Ascorbate in the Supportive Treatment of Cancer.” Proceedings of the National Academy of Sciences, 75, 4538-4542.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(survival ~ type, Cancer, col = "blue")
stomach <- Cancer$survival[Cancer$type == "stomach"]
bronchus <- Cancer$survival[Cancer$type == "bronchus"]
boxplot(stomach, ylab = "Days")
SIGN.test(stomach, md = 100, alternative = "greater")
SIGN.test(bronchus, md = 100, alternative = "greater")
rm(bronchus, stomach)
Carbon monoxide level measured at three industrial sites
Description
Data for Exercises 10.28 and 10.29
Usage
Carbon
Format
A data frame/tibble with 24 observations on two variables
- CO
carbon monoxide measured (in parts per million)
- site
a factor with levels SiteA, SiteB, and SiteC
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(CO ~ site, data = Carbon, col = "lightgreen")
kruskal.test(CO ~ site, data = Carbon)
Reading scores on the California achievement test for a group of 3rd graders
Description
Data for Exercise 1.116
Usage
Cat
Format
A data frame/tibble with 17 observations on one variable
- score
reading score on the California Achievement Test
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Cat$score)
fivenum(Cat$score)
boxplot(Cat$score, main = "Problem 1.116", col = "green")
Entry age and survival time of patients with small cell lung cancer under two different treatments
Description
Data for Exercises 7.34 and 7.48
Usage
Censored
Format
A data frame/tibble with 121 observations on three variables
- survival
survival time (in days) of patients with small cell lung cancer
- treatment
a factor with levels armA and armB indicating the treatment a patient received
- age
the age of the patient
Source
Ying, Z., Jung, S., Wei, L. 1995. “Survival Analysis with Median Regression Models.” Journal of the American Statistical Association, 90, 178-184.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(survival ~ treatment, data = Censored, col = "yellow")
wilcox.test(survival ~ treatment, data = Censored, alternative = "greater")
Temperatures and O-ring failures for the launches of the space shuttle Challenger
Description
Data for Examples 1.11, 1.12, 1.13, 2.11 and 5.1
Usage
Challeng
Format
A data frame/tibble with 25 observations on four variables
- flight
a character variable indicating the flight
- date
date of the flight
- temp
temperature (in degrees Fahrenheit)
- failures
number of failures
Source
Dalal, S. R., Fowlkes, E. B., Hoadley, B. 1989. “Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure.” Journal of the American Statistical Association, 84, No. 408, 945-957.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Challeng$temp)
summary(Challeng$temp)
IQR(Challeng$temp)
quantile(Challeng$temp)
fivenum(Challeng$temp)
stem(sort(Challeng$temp)[-1])
summary(sort(Challeng$temp)[-1])
IQR(sort(Challeng$temp)[-1])
quantile(sort(Challeng$temp)[-1])
fivenum(sort(Challeng$temp)[-1])
par(mfrow=c(1, 2))
qqnorm(Challeng$temp)
qqline(Challeng$temp)
qqnorm(sort(Challeng$temp)[-1])
qqline(sort(Challeng$temp)[-1])
par(mfrow=c(1, 1))
Starting salaries of 50 chemistry majors
Description
Data for Example 5.3
Usage
Chemist
Format
A data frame/tibble with 50 observations on one variable
- salary
starting salary (in dollars) for a chemistry major
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
EDA(Chemist$salary)
Surface salinity measurements taken offshore from Annapolis, Maryland in 1927
Description
Data for Exercise 6.41
Usage
Chesapea
Format
A data frame/tibble with 16 observations on one variable
- salinity
surface salinity measurements (in parts per 1000) for station 11, offshore from Annapolis, Maryland, on July 3-4, 1927.
Source
Davis, J. (1986) Statistics and Data Analysis in Geology, Second Edition. John Wiley and Sons, New York.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
qqnorm(Chesapea$salinity)
qqline(Chesapea$salinity)
shapiro.test(Chesapea$salinity)
t.test(Chesapea$salinity, mu = 7)
Insurance injury ratings of Chevrolet vehicles for 1990 and 1993 models
Description
Data for Exercise 8.35
Usage
Chevy
Format
A data frame/tibble with 67 observations on two variables
- year
a factor with levels 1988-90 and 1991-93
- frequency
a factor with levels much better than average, above average, average, below average, and much worse than average
Source
Insurance Institute for Highway Safety and the Highway Loss Data Institute, 1995.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~year + frequency, data = Chevy)
T1
chisq.test(T1)
rm(T1)
Weight gain of chickens fed three different rations
Description
Data for Exercise 10.15
Usage
Chicken
Format
A data frame/tibble with 13 observations on two variables
- gain
weight gain over a specified period
- feed
a factor with levels ration1, ration2, and ration3
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(gain ~ feed, col = c("red","blue","green"), data = Chicken)
anova(lm(gain ~ feed, data = Chicken))
Measurements of the thickness of the oxide layer of manufactured integrated circuits
Description
Data for Exercises 6.49 and 7.47
Usage
Chipavg
Format
A data frame/tibble with 30 observations on three variables
- wafer1
thickness of the oxide layer for wafer1
- wafer2
thickness of the oxide layer for wafer2
- thickness
average thickness of the oxide layer of the eight measurements obtained from each set of two wafers
Source
Yashchin, E. 1995. “Likelihood Ratio Methods for Monitoring Parameters of a Nested Random Effect Model.” Journal of the American Statistical Association, 90, 729-738.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
EDA(Chipavg$thickness)
t.test(Chipavg$thickness, mu = 1000)
boxplot(Chipavg$wafer1, Chipavg$wafer2, names = c("Wafer 1", "Wafer 2"))
shapiro.test(Chipavg$wafer1)
shapiro.test(Chipavg$wafer2)
t.test(Chipavg$wafer1, Chipavg$wafer2, var.equal = TRUE)
Four measurements on a first wafer and four measurements on a second wafer selected from 30 lots
Description
Data for Exercise 10.9
Usage
Chips
Format
A data frame/tibble with 30 observations on eight variables
- wafer11
first measurement of thickness of the oxide layer for wafer1
- wafer12
second measurement of thickness of the oxide layer for wafer1
- wafer13
third measurement of thickness of the oxide layer for wafer1
- wafer14
fourth measurement of thickness of the oxide layer for wafer1
- wafer21
first measurement of thickness of the oxide layer for wafer2
- wafer22
second measurement of thickness of the oxide layer for wafer2
- wafer23
third measurement of thickness of the oxide layer for wafer2
- wafer24
fourth measurement of thickness of the oxide layer for wafer2
Source
Yashchin, E. 1995. “Likelihood Ratio Methods for Monitoring Parameters of a Nested Random Effect Model.” Journal of the American Statistical Association, 90, 729-738.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
with(data = Chips,
boxplot(wafer11, wafer12, wafer13, wafer14, wafer21,
wafer22, wafer23, wafer24, col = "pink")
)
Milligrams of tar in 25 cigarettes selected randomly from 4 different brands
Description
Data for Example 10.4
Usage
Cigar
Format
A data frame/tibble with 100 observations on two variables
- tar
amount of tar (measured in milligrams)
- brand
a factor indicating cigarette brand with levels brandA, brandB, brandC, and brandD
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(tar ~ brand, data = Cigar, col = "cyan", ylab = "mg tar")
anova(lm(tar ~ brand, data = Cigar))
Effect of mother's smoking on birth weight of newborn
Description
Data for Exercise 2.27
Usage
Cigarett
Format
A data frame/tibble with 16 observations on two variables
- cigarettes
mothers' estimated average number of cigarettes smoked per day
- weight
children's birth weights (in pounds)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(weight ~ cigarettes, data = Cigarett)
model <- lm(weight ~ cigarettes, data = Cigarett)
abline(model, col = "red")
with(data = Cigarett,
cor(weight, cigarettes)
)
rm(model)
Confidence Interval Simulation Program
Description
This program simulates random samples and constructs from them confidence intervals for one of three parameters: the mean (Mu), the variance (Sigma), or the proportion of successes (Pi).
Usage
CIsim(
samples = 100,
n = 30,
mu = 0,
sigma = 1,
conf.level = 0.95,
type = "Mean"
)
Arguments
samples | the number of samples desired. |
n | the size of each sample. |
mu | the population mean when constructing confidence intervals for the population mean or the population variance (i.e., type is either "Mean" or "Var"); the population proportion of successes when type is "Pi". |
sigma | the population standard deviation. |
conf.level | confidence level for the graphed confidence intervals, restricted to lie between zero and one. |
type | character string, one of "Mean", "Var", or "Pi", indicating the parameter for which confidence intervals are constructed. |
Details
Default is to construct confidence intervals for the population mean. Simulated confidence intervals for the population variance or population proportion of successes are possible by selecting the appropriate value in the type argument.
Value
Graph depicts simulated confidence intervals. The number of confidence intervals that do not contain the parameter of interest is counted and reported in the commands window.
Author(s)
Alan T. Arnholt
Examples
CIsim(100, 30, 100, 10)
# Simulates 100 samples of size 30 from
# a normal distribution with mean 100
# and standard deviation 10. From the
# 100 simulated samples, 95% confidence
# intervals for the Mean are constructed
# and depicted in the graph.
CIsim(100, 30, 100, 10, type="Var")
# Simulates 100 samples of size 30 from
# a normal distribution with mean 100
# and standard deviation 10. From the
# 100 simulated samples, 95% confidence
# intervals for the variance are constructed
# and depicted in the graph.
CIsim(100, 50, .5, type="Pi", conf.level=.90)
# Simulates 100 samples of size 50 from
# a binomial distribution where the population
# proportion of successes is 0.5. From the
# 100 simulated samples, 90% confidence
# intervals for Pi are constructed
# and depicted in the graph.
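The idea behind the simulation can be sketched in base R without calling CIsim. This is an illustration of the technique for the mean case only, assuming t-based intervals; it is not the package's implementation, and the names x, lower, upper, and covers are made up for this sketch.

```r
# Sketch only: simulate t-based confidence intervals for a mean.
# Assumption: t intervals are used; CIsim itself may compute them differently.
set.seed(1)                      # arbitrary seed, for reproducibility
samples <- 100
n <- 30
mu <- 100
sigma <- 10
conf.level <- 0.95

# One column per simulated sample of size n
x <- matrix(rnorm(samples * n, mean = mu, sd = sigma), nrow = n)

xbar <- colMeans(x)                              # sample means
se <- apply(x, 2, sd) / sqrt(n)                  # estimated standard errors
crit <- qt(1 - (1 - conf.level) / 2, df = n - 1) # t critical value

lower <- xbar - crit * se
upper <- xbar + crit * se

# An interval "covers" when it contains the true mean
covers <- lower <= mu & mu <= upper
sum(!covers)                     # number of intervals that miss mu
```

With a 95% confidence level, roughly 5 of every 100 intervals are expected to miss the true mean, although the count varies from run to run.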
Percent of peak bone density of different aged children
Description
Data for Exercise 9.7
Usage
Citrus
Format
A data frame/tibble with nine observations on two variables
- age
age of children
- percent
percent peak bone density
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
model <- lm(percent ~ age, data = Citrus)
summary(model)
anova(model)
rm(model)
Residual contaminant following the use of three different cleansing agents
Description
Data for Exercise 10.16
Usage
Clean
Format
A data frame/tibble with 45 observations on two variables
- clean
residual contaminants
- agent
a factor with levels A, B, and C
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(clean ~ agent, col = c("red", "blue", "green"), data = Clean)
anova(lm(clean ~ agent, data = Clean))
Signal loss from three types of coaxial cable
Description
Data for Exercises 10.24 and 10.25
Usage
Coaxial
Format
A data frame/tibble with 45 observations on two variables
- signal
signal loss per 1000 feet
- cable
a factor with three levels of coaxial cable: typeA, typeB, and typeC
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(signal ~ cable, data = Coaxial, col = c("red", "green", "yellow"))
kruskal.test(signal ~ cable, data = Coaxial)
Productivity of workers with and without a coffee break
Description
Data for Exercise 7.55
Usage
Coffee
Format
A data frame/tibble with nine observations on three variables
- without
workers' productivity scores without a coffee break
- with
workers' productivity scores with a coffee break
- differences
with minus without
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
qqnorm(Coffee$differences)
qqline(Coffee$differences)
shapiro.test(Coffee$differences)
t.test(Coffee$with, Coffee$without, paired = TRUE, alternative = "greater")
wilcox.test(Coffee$with, Coffee$without, paired = TRUE,
alternative = "greater")
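The relationship between the differences column and the paired test can be checked directly: a paired t-test gives the same result as a one-sample t-test on the differences. The sketch below uses hypothetical scores invented for illustration (they are not the Coffee data), and the vector names are assumptions.

```r
# Hypothetical productivity scores, made up for illustration only
without <- c(23, 35, 29, 33, 43, 32, 40, 28, 31)
with_break <- c(28, 38, 29, 37, 42, 35, 44, 30, 36)
differences <- with_break - without

# Paired test on the two columns
paired <- t.test(with_break, without, paired = TRUE, alternative = "greater")

# One-sample test on the differences
one_sample <- t.test(differences, mu = 0, alternative = "greater")

# The test statistics and p-values agree
all.equal(unname(paired$statistic), unname(one_sample$statistic))
all.equal(paired$p.value, one_sample$p.value)
```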
Yearly returns on 12 investments
Description
Data for Exercise 5.68
Usage
Coins
Format
A data frame/tibble with 12 observations on one variable
- return
yearly returns on each of 12 possible investments
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
qqnorm(Coins$return)
qqline(Coins$return)
Combinations
Description
Computes all possible combinations of n objects taken k at a time.
Usage
Combinations(n, k)
Arguments
n | a number. |
k | a number less than or equal to n. |
Value
Returns a matrix containing the possible combinations of n objects taken k at a time.
Examples
Combinations(5,2)
# The columns in the matrix list the values of the 10 possible
# combinations of 5 things taken 2 at a time.
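For comparison, base R's combn() from the utils package enumerates combinations in a similar column-per-combination layout. This is a sketch of the same idea with a base function; whether Combinations() orders its columns identically is not confirmed here.

```r
# Base R analogue: each column of the result is one combination.
combs <- combn(5, 2)
dim(combs)                       # 2 rows, choose(5, 2) = 10 columns
ncol(combs) == choose(5, 2)      # the column count matches "5 choose 2"
```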
Commuting times for selected cities in 1980 and 1990
Description
Data for Exercises 1.13 and 7.85
Usage
Commute
Format
A data frame/tibble with 39 observations on three variables
- city
a factor with levels Atlanta, Baltimore, Boston, Buffalo, Charlotte, Chicago, Cincinnati, Cleveland, Columbus, Dallas, Denver, Detroit, Hartford, Houston, Indianapolis, Kansas City, Los Angeles, Miami, Milwaukee, Minneapolis, New Orleans, New York, Norfolk, Orlando, Philadelphia, Phoenix, Pittsburgh, Portland, Providence, Rochester, Sacramento, Salt Lake City, San Antonio, San Diego, San Francisco, Seattle, St. Louis, Tampa, and Washington
- year
year
- time
commute times
Source
Federal Highway Administration.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stripplot(year ~ time, data = Commute, jitter = TRUE)
dotplot(year ~ time, data = Commute)
bwplot(year ~ time, data = Commute)
stripchart(time ~ year, data = Commute, method = "stack", pch = 1,
cex = 2, col = c("red", "blue"),
group.names = c("1980", "1990"),
main = "", xlab = "minutes")
title(main = "Commute Time")
boxplot(time ~ year, data = Commute, names=c("1980", "1990"),
horizontal = TRUE, las = 1)
Tennessee self concept scale scores for a group of teenage boys
Description
Data for Exercises 1.68 and 1.82
Usage
Concept
Format
A data frame/tibble with 28 observations on one variable
- self
Tennessee self concept scores
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
summary(Concept$self)
sd(Concept$self)
diff(range(Concept$self))
IQR(Concept$self)
summary(Concept$self/10)
IQR(Concept$self/10)
sd(Concept$self/10)
diff(range(Concept$self/10))
Compressive strength of concrete blocks made by two different methods
Description
Data for Example 7.17
Usage
Concrete
Format
A data frame/tibble with 20 observations on two variables
- strength
compressive strength (in pounds per square inch)
- method
a factor with levels new and old indicating the method used to construct a concrete block
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
wilcox.test(strength ~ method, data = Concrete, alternative = "greater")
Comparison of the yields of a new variety and a standard variety of corn planted on 12 plots of land
Description
Data for Exercise 7.77
Usage
Corn
Format
A data frame/tibble with 12 observations on three variables
- new
corn yield with new method
- standard
corn yield with standard method
- differences
new minus standard
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(Corn$differences)
qqnorm(Corn$differences)
qqline(Corn$differences)
shapiro.test(Corn$differences)
t.test(Corn$differences, alternative = "greater")
Exercise to illustrate correlation
Description
Data for Exercise 2.23
Usage
Correlat
Format
A data frame/tibble with 13 observations on two variables
- x
a numeric vector
- y
a numeric vector
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(y ~ x, data = Correlat)
model <- lm(y ~ x, data = Correlat)
abline(model)
rm(model)
Scores of 18 volunteers who participated in a counseling process
Description
Data for Exercise 6.96
Usage
Counsel
Format
A data frame/tibble with 18 observations on one variable
- score
standardized psychology scores after a counseling process
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
EDA(Counsel$score)
t.test(Counsel$score, mu = 70)
Consumer price index from 1979 to 1998
Description
Data for Exercise 1.34
Usage
Cpi
Format
A data frame/tibble with 20 observations on two variables
- year
year
- cpi
consumer price index
Source
Bureau of Labor Statistics.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(cpi ~ year, data = Cpi, type = "l", lty = 2, lwd = 2, col = "red")
barplot(Cpi$cpi, col = "pink", las = 2, main = "Problem 1.34")
Violent crime rates for the states in 1983 and 1993
Description
Data for Exercises 1.90, 2.32, 3.64, and 5.113
Usage
Crime
Format
A data frame/tibble with 102 observations on three variables
- state
a factor with levels Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, DC, Delaware, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missour, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
- year
a factor with levels 1983 and 1993
- rate
crime rate per 100,000 inhabitants
Source
U.S. Department of Justice, Bureau of Justice Statistics, Sourcebook of Criminal Justice Statistics, 1993.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(rate ~ year, data = Crime, col = "red")
Charles Darwin's study of cross-fertilized and self-fertilized plants
Description
Data for Exercise 7.62
Usage
Darwin
Format
A data frame/tibble with 15 observations on three variables
- pot
number of pot
- cross
height of plant (in inches) after a fixed period of time when cross-fertilized
- self
height of plant (in inches) after a fixed period of time when self-fertilized
Source
Darwin, C. (1876) The Effect of Cross- and Self-Fertilization in the Vegetable Kingdom, 2nd edition, London.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
differ <- Darwin$cross - Darwin$self
qqnorm(differ)
qqline(differ)
shapiro.test(differ)
wilcox.test(Darwin$cross, Darwin$self, paired = TRUE)
rm(differ)
Automobile dealers classified according to type of dealership and service rendered to customers
Description
Data for Example 2.22
Usage
Dealers
Format
A data frame/tibble with 122 observations on two variables
- type
a factor with levels Honda, Toyota, Mazda, Ford, Dodge, and Saturn
- service
a factor with levels Replaces unnecessarily and Follows manufacturer guidelines
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
xtabs(~type + service, data = Dealers)
T1 <- xtabs(~type + service, data = Dealers)
T1
addmargins(T1)
pt <- prop.table(T1, margin = 1)
pt
barplot(t(pt), col = c("red", "skyblue"), legend = colnames(T1))
rm(T1, pt)
Number of defective items produced by 20 employees
Description
Data for Exercise 1.27
Usage
Defectiv
Format
A data frame/tibble with 20 observations on one variable
- number
number of defective items produced by the employees in a small business firm
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~ number, data = Defectiv)
T1
barplot(T1, col = "pink", ylab = "Frequency",
xlab = "Defective Items Produced by Employees", main = "Problem 1.27")
rm(T1)
Percent of bachelor's degrees awarded to women in 1970 versus 1990
Description
Data for Exercise 2.75
Usage
Degree
Format
A data frame/tibble with 1064 observations on two variables
- field
a factor with levels Health, Education, Foreign Language, Psychology, Fine Arts, Life Sciences, Business, Social Science, Physical Sciences, Engineering, and All Fields
- awarded
a factor with levels 1970 and 1990
Source
U.S. Department of Health and Human Services, National Center for Education Statistics.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~field + awarded, data = Degree)
T1
barplot(t(T1), beside = TRUE, col = c("red", "skyblue"), legend = colnames(T1))
rm(T1)
Delay times on 20 flights from four major air carriers
Description
Data for Exercise 10.55
Usage
Delay
Format
A data frame/tibble with 80 observations on two variables
- delay
the delay time (in minutes) for 80 randomly selected flights
- carrier
a factor with levels A, B, C, and D
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(delay ~ carrier, data = Delay,
main = "Exercise 10.55", ylab = "minutes",
col = "pink")
kruskal.test(delay ~ carrier, data = Delay)
Number of dependent children for 50 families
Description
Data for Exercise 1.26
Usage
Depend
Format
A data frame/tibble with 50 observations on one variable
- number
number of dependent children in a family
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~ number, data = Depend)
T1
barplot(T1, col = "lightblue", main = "Problem 1.26",
xlab = "Number of Dependent Children", ylab = "Frequency")
rm(T1)
Educational levels of a sample of 40 auto workers in Detroit
Description
Data for Exercise 5.21
Usage
Detroit
Format
A data frame/tibble with 40 observations on one variable
- educ
the educational level (in years) of a sample of 40 auto workers in a plant in Detroit
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
EDA(Detroit$educ)
Demographic characteristics of developmental students at 2-year colleges and 4-year colleges
Description
Data used for Exercise 8.50
Usage
Develop
Format
A data frame/tibble with 5656 observations on two variables
- race
a factor with levels African American, American Indian, Asian, Latino, and White
- college
a factor with levels Two-year and Four-year
Source
Research in Developmental Education (1994), V. 11, 2.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~race + college, data = Develop)
T1
chisq.test(T1)
rm(T1)
Test scores for students who failed developmental mathematics in the fall semester 1995
Description
Data for Exercise 6.47
Usage
Devmath
Format
A data frame/tibble with 40 observations on one variable
- score
first exam score
Source
Data provided by Dr. Anita Kitchens.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
EDA(Devmath$score)
t.test(Devmath$score, mu = 80, alternative = "less")
Outcomes and probabilities of the roll of a pair of fair dice
Description
Data for Exercise 3.109
Usage
Dice
Format
A data frame/tibble with 11 observations on two variables
- x
possible outcomes for the sum of two dice
- px
probability for outcome x
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
roll1 <- sample(1:6, 20000, replace = TRUE)
roll2 <- sample(1:6, 20000, replace = TRUE)
outcome <- roll1 + roll2
T1 <- table(outcome)/length(outcome)
remove(roll1, roll2, outcome)
T1
round(t(Dice), 5)
rm(T1)
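The simulated relative frequencies above can be compared with the exact probabilities, which follow from enumerating all 36 equally likely outcomes of two fair dice. The sketch below uses base R only; the names sums and px are chosen for this illustration.

```r
# Exact distribution of the sum of two fair dice
sums <- outer(1:6, 1:6, "+")     # 6 x 6 table of all 36 equally likely sums
px <- table(sums) / 36           # probability of each possible sum, 2 to 12
round(px, 5)
sum(px)                          # the probabilities add to 1
```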
Diesel fuel prices in 1999-2000 in nine regions of the country
Description
Data for Exercise 2.8
Usage
Diesel
Format
A data frame/tibble with 650 observations on three variables
- date
date when price was recorded
- pricepergallon
price per gallon (in dollars)
- location
a factor with levels California, CentralAtlantic, Coast, EastCoast, Gulf, LowerAtlantic, NatAvg, NorthEast, Rocky, and WesternMountain
Source
Energy Information Administration, National Energy Information Center: 1000 Independence Ave., SW, Washington, D.C., 20585.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
par(las = 2)
boxplot(pricepergallon ~ location, data = Diesel)
boxplot(pricepergallon ~ location,
data = droplevels(Diesel[Diesel$location == "EastCoast" |
Diesel$location == "Gulf" | Diesel$location == "NatAvg" |
Diesel$location == "Rocky" | Diesel$location == "California", ]),
col = "pink", main = "Exercise 2.8")
par(las = 1)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Diesel, aes(x = date, y = pricepergallon,
color = location)) +
geom_point() +
geom_smooth(se = FALSE) +
theme_bw() +
labs(y = "Price per Gallon (in dollars)")
## End(Not run)
Parking tickets issued to diplomats
Description
Data for Exercises 1.14 and 1.37
Usage
Diplomat
Format
A data frame/tibble with 10 observations on three variables
- country
a factor with levels Brazil, Bulgaria, Egypt, Indonesia, Israel, Nigeria, Russia, S. Korea, Ukraine, and Venezuela
- number
total number of tickets
- rate
number of tickets per vehicle per month
Source
Time, November 8, 1993. Figures are from January to June 1993.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
par(las = 2, mfrow = c(2, 2))
stripchart(number ~ country, data = Diplomat, pch = 19,
col= "red", vertical = TRUE)
stripchart(rate ~ country, data = Diplomat, pch = 19,
col= "blue", vertical = TRUE)
with(data = Diplomat,
barplot(number, names.arg = country, col = "red"))
with(data = Diplomat,
barplot(rate, names.arg = country, col = "blue"))
par(las = 1, mfrow = c(1, 1))
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Diplomat, aes(x = reorder(country, number),
y = number)) +
geom_bar(stat = "identity", fill = "pink", color = "black") +
theme_bw() + labs(x = "", y = "Total Number of Tickets")
ggplot2::ggplot(data = Diplomat, aes(x = reorder(country, rate),
y = rate)) +
geom_bar(stat = "identity", fill = "pink", color = "black") +
theme_bw() + labs(x = "", y = "Tickets per vehicle per month")
## End(Not run)
Toxic intensity for manufacturing plants producing herbicidal preparations
Description
Data for Exercise 1.127
Usage
Disposal
Format
A data frame/tibble with 29 observations on one variable
- pounds
pounds of toxic waste per $1000 of shipments of its products
Source
Bureau of the Census, Reducing Toxins, Statistical Brief SB/95-3, February 1995.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Disposal$pounds)
fivenum(Disposal$pounds)
EDA(Disposal$pounds)
Rankings of the favorite breeds of dogs
Description
Data for Exercise 2.88
Usage
Dogs
Format
A data frame/tibble with 20 observations on three variables
- breed
a factor with levels Beagle, Boxer, Chihuahua, Chow, Dachshund, Dalmatian, Doberman, Huskie, Labrador, Pomeranian, Poodle, Retriever, Rotweiler, Schnauzer, Shepherd, Shetland, ShihTzu, Spaniel, Springer, and Yorkshire
- ranking
numeric ranking
- year
a factor with levels 1992, 1993, 1997, and 1998
Source
The World Almanac and Book of Facts, 2000.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
cor(Dogs$ranking[Dogs$year == "1992"], Dogs$ranking[Dogs$year == "1993"])
cor(Dogs$ranking[Dogs$year == "1997"], Dogs$ranking[Dogs$year == "1998"])
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Dogs, aes(x = reorder(breed, ranking), y = ranking)) +
geom_bar(stat = "identity") +
facet_grid(year ~. ) +
theme(axis.text.x = element_text(angle = 85, vjust = 0.5))
## End(Not run)
Rates of domestic violence per 1,000 women by age groups
Description
Data for Exercise 1.20
Usage
Domestic
Format
A data frame/tibble with five observations on two variables
- age
a factor with levels 12-19, 20-24, 25-34, 35-49, and 50-64
- rate
rate of domestic violence per 1000 women
Source
U.S. Department of Justice.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
barplot(Domestic$rate, names.arg = Domestic$age)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Domestic, aes(x = age, y = rate)) +
geom_bar(stat = "identity", fill = "purple", color = "black") +
labs(x = "", y = "Domestic violence per 1000 women") +
theme_bw()
## End(Not run)
Dopamine b-hydroxylase activity of schizophrenic patients treated with an antipsychotic drug
Description
Data for Exercises 5.14 and 7.49
Usage
Dopamine
Format
A data frame/tibble with 25 observations on two variables
- dbh
dopamine b-hydroxylase activity (units are nmol/(ml)(h)/(mg) of protein)
- group
a factor with levels nonpsychotic and psychotic
Source
D.E. Sternberg, D.P. Van Kammen, and W.E. Bunney, "Schizophrenia: Dopamine b-Hydroxylase Activity and Treatment Response," Science, 216 (1982), 1423-1425.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(dbh ~ group, data = Dopamine, col = "orange")
t.test(dbh ~ group, data = Dopamine, var.equal = TRUE)
Closing year-end Dow Jones Industrial averages from 1896 through 2000
Description
Data for Exercise 1.35
Usage
Dowjones
Format
A data frame/tibble with 105 observations on three variables
- year
date
- close
Dow Jones closing price
- change
percent change from previous year
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(close ~ year, data = Dowjones, type = "l", main = "Exercise 1.35")
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Dowjones, aes(x = year, y = close)) +
geom_point(size = 0.5) +
geom_line(color = "red") +
theme_bw() +
labs(y = "Dow Jones Closing Price")
## End(Not run)
Opinion on referendum by view on moral issue of selling alcoholic beverages
Description
Data for Exercise 8.53
Usage
Drink
Format
A data frame/tibble with 472 observations on two variables
- drinking
a factor with levels ok, tolerated, and immoral
- referendum
a factor with levels for, against, and undecided
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~drinking + referendum, data = Drink)
T1
chisq.test(T1)
rm(T1)
Number of trials to master a task for a group of 28 subjects assigned to a control and an experimental group
Description
Data for Example 7.15
Usage
Drug
Format
A data frame/tibble with 28 observations on two variables
- trials
number of trials to master a task
- group
a factor with levels control and experimental
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(trials ~ group, data = Drug,
main = "Example 7.15", col = c("yellow", "red"))
wilcox.test(trials ~ group, data = Drug)
t.test(rank(trials) ~ group, data = Drug, var.equal = TRUE)
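The last two example calls compare groups on the ranks of the response rather than on the raw values. The following is a minimal sketch of that idea using hypothetical data (the vectors trials and group below are made up for illustration and are not the Drug data set). It computes the rank-sum statistic by hand, compares it with the value reported by wilcox.test(), and then runs a pooled t-test on the ranks.

```r
# Hypothetical data (NOT the Drug data set): trials needed to master a task.
trials <- c(12, 15, 9, 14, 11, 16, 13, 8, 10, 7, 6, 5)
group  <- factor(rep(c("control", "experimental"), each = 6))

# Rank all observations together (no ties in this made-up sample).
r <- rank(trials)

# Rank-sum statistic for the first factor level, in the form used by
# wilcox.test(): sum of its ranks minus n1 * (n1 + 1) / 2.
n1 <- sum(group == "control")
W_manual <- sum(r[group == "control"]) - n1 * (n1 + 1) / 2

# Built-in rank-sum test and the pooled t-test applied to the ranks.
w_test <- wilcox.test(trials ~ group)
t_rank <- t.test(r ~ group, var.equal = TRUE)

W_manual
w_test$statistic
t_rank$p.value
```

The t-test on ranks is an approximation to the rank-sum test; the two p-values are usually close but need not agree exactly, especially in small samples.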
Data on a group of college students diagnosed with dyslexia
Description
Data for Exercise 2.90
Usage
Dyslexia
Format
A data frame/tibble with eight observations on seven variables
- words
number of words read per minute
- age
age of participant
- gender
a factor with levels female and male
- handed
a factor with levels left and right
- weight
weight of participant (in pounds)
- height
height of participant (in inches)
- children
number of children in family
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(height ~ weight, data = Dyslexia)
plot(words ~ factor(handed), data = Dyslexia,
xlab = "hand", col = "lightblue")
One hundred year record of worldwide seismic activity (1770-1869)
Description
Data for Exercise 6.97
Usage
Earthqk
Format
A data frame/tibble with 100 observations on two variables
- year
year seismic activity was recorded
- severity
annual incidence of severe earthquakes
Source
Quenouille, M.H. (1952), Associated Measurements, Butterworth, London. p 279.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
EDA(Earthqk$severity)
t.test(Earthqk$severity, mu = 100, alternative = "greater")
Exploratory Data Analysis
Description
Function that produces a histogram, density plot, boxplot, and Q-Q plot.
Usage
EDA(x, trim = 0.05)
Arguments
x | numeric vector. |
trim | fraction (between 0 and 0.5, inclusive) of values to be trimmed from each end of the ordered data. If |
Details
The function does not print numerical summaries to the command window for data sets containing more than 5000 observations. It still produces the graphical output for such data sets.
Value
The function returns various measures of center and location. The values returned for the quartiles are based on the definitions provided in BSDA. The boxplot is based on the quartiles printed in the command window.
Note
Requires package e1071.
Author(s)
Alan T. Arnholt
Examples
EDA(rnorm(100))
# Produces four graphs for the 100 randomly
# generated standard normal variates.
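The description of the trim argument matches the wording used for the trim argument of base R's mean(). Assuming EDA() uses trim in that same sense (an assumption, not confirmed by this page), the sketch below shows what trimming a fraction of 0.05 from each end of the ordered data amounts to. The vector x is hypothetical.

```r
# Hypothetical sample with one extreme value.
x <- c(2, 4, 5, 7, 8, 9, 11, 12, 14, 15,
       16, 18, 19, 21, 23, 24, 26, 29, 35, 120)
trim <- 0.05

n <- length(x)
k <- floor(n * trim)            # observations dropped from EACH end

# Trimmed mean by hand: sort, drop k values from each end, then average.
manual <- mean(sort(x)[(k + 1):(n - k)])

# Base R equivalent.
builtin <- mean(x, trim = trim)

c(untrimmed = mean(x), manual = manual, builtin = builtin)
```

With 20 observations and trim = 0.05, one value is removed from each end, so the extreme value 120 no longer affects the result.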
Crime rates versus the percent of the population without a high school degree
Description
Data for Exercise 2.41
Usage
Educat
Format
A data frame/tibble with 51 observations on three variables
- state
a factor with levels Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, DC, Delaware, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missour, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
- nodegree
percent of the population without a high school degree
- crime
violent crimes per 100,000 population
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(crime ~ nodegree, data = Educat,
xlab = "Percent of population without high school degree",
ylab = "Violent Crime Rate per 100,000")
Number of eggs versus amounts of feed supplement
Description
Data for Exercise 9.22
Usage
Eggs
Format
A data frame/tibble with 12 observations on two variables
- feed
amount of feed supplement
- eggs
number of eggs per day for 100 chickens
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(eggs ~ feed, data = Eggs)
model <- lm(eggs ~ feed, data = Eggs)
abline(model, col = "red")
summary(model)
rm(model)
Percent of the population over the age of 65
Description
Data for Exercises 1.92 and 2.61
Usage
Elderly
Format
A data frame/tibble with 51 observations on three variables
- state
a factor with levels Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, District of Colunbia, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missour, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
- percent1985
percent of the population over the age of 65 in 1985
- percent1998
percent of the population over the age of 65 in 1998
Source
U.S. Census Bureau Internet site, February 2000.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
with(data = Elderly,
stripchart(x = list(percent1998, percent1985), method = "stack", pch = 19,
col = c("red","blue"), group.names = c("1998", "1985"))
)
with(data = Elderly, cor(percent1998, percent1985))
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Elderly, aes(x = percent1985, y = percent1998)) +
geom_point() +
theme_bw()
## End(Not run)
Amount of energy consumed by homes versus their sizes
Description
Data for Exercises 2.5, 2.24, and 2.55
Usage
Energy
Format
A data frame/tibble with 12 observations on two variables
- size
size of home (in square feet)
- kilowatt
kilowatt-hours per month
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(kilowatt ~ size, data = Energy)
with(data = Energy, cor(size, kilowatt))
model <- lm(kilowatt ~ size, data = Energy)
plot(Energy$size, resid(model), xlab = "size")
Salaries after 10 years for graduates of three different universities
Description
Data for Example 10.7
Usage
Engineer
Format
A data frame/tibble with 51 observations on two variables
- salary
salary (in $1000) 10 years after graduation
- university
a factor with levels A, B, and C
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(salary ~ university, data = Engineer,
main = "Example 10.7", col = "yellow")
kruskal.test(salary ~ university, data = Engineer)
anova(lm(salary ~ university, data = Engineer))
anova(lm(rank(salary) ~ university, data = Engineer))
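The examples above run the Kruskal-Wallis test next to an analysis of variance on the ranks. The two are closely linked: the Kruskal-Wallis statistic equals (N - 1) times the ratio of the between-group sum of squares to the total sum of squares of the ranks. The sketch below checks this on hypothetical data (the vectors salary and university are made up and are not the Engineer data set).

```r
# Hypothetical salaries (in $1000) for three universities, five graduates each.
salary <- c(41.2, 38.5, 45.1, 39.9, 43.3,
            47.8, 50.2, 44.6, 49.1, 46.4,
            36.7, 40.8, 37.5, 42.0, 35.9)
university <- factor(rep(c("A", "B", "C"), each = 5))

r <- rank(salary)               # ranks of the pooled sample
N <- length(r)

# Sums of squares computed on the ranks.
ss_total    <- sum((r - mean(r))^2)
group_means <- tapply(r, university, mean)
group_sizes <- tapply(r, university, length)
ss_between  <- sum(group_sizes * (group_means - mean(r))^2)

# Kruskal-Wallis statistic from the rank sums of squares.
H_manual  <- (N - 1) * ss_between / ss_total
H_builtin <- unname(kruskal.test(salary ~ university)$statistic)

c(H_manual = H_manual, H_builtin = H_builtin)
```

The F statistic from anova(lm(rank(salary) ~ university)) is built from the same two sums of squares, which is why the two procedures usually lead to similar conclusions.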
College entrance exam scores for 24 high school seniors
Description
Data for Example 1.8
Usage
Entrance
Format
A data frame/tibble with 24 observations on one variable
- score
college entrance exam score
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Entrance$score)
stem(Entrance$score, scale = 2)
Fuel efficiency ratings for compact vehicles in 2001
Description
Data for Exercise 1.65
Usage
Epaminicompact
Format
A data frame/tibble with 22 observations on ten variables
- class
a character variable with value MINICOMPACT CARS
- manufacturer
a character variable with values AUDI, BMW, JAGUAR, MERCEDES-BENZ, MITSUBISHI, and PORSCHE
- carline
a character variable with values 325CI CONVERTIBLE, 330CI CONVERTIBLE, 911 CARRERA 2/4, 911 TURBO, CLK320 (CABRIOLET), CLK430 (CABRIOLET), ECLIPSE SPYDER, JAGUAR XK8 CONVERTIBLE, JAGUAR XKR CONVERTIBLE, M3 CONVERTIBLE, TT COUPE, and TT COUPE QUATTRO
- displ
engine displacement (in liters)
- cyl
number of cylinders
- trans
a factor with levels Auto(L5), Auto(S4), Auto(S5), Manual(M5), and Manual(M6)
- drv
a factor with levels 4 (four wheel drive), F (front wheel drive), and R (rear wheel drive)
- cty
city mpg
- hwy
highway mpg
- cmb
combined city and highway mpg
Source
EPA data.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
summary(Epaminicompact$cty)
plot(hwy ~ cty, data = Epaminicompact)
Fuel efficiency ratings for two-seater vehicles in 2001
Description
Data for Exercise 5.8
Usage
Epatwoseater
Format
A data frame/tibble with 36 observations on ten variables
- class
a character variable with value TWO SEATERS
- manufacturer
a character variable with values ACURA, AUDI, BMW, CHEVROLET, DODGE, FERRARI, HONDA, LAMBORGHINI, MAZDA, MERCEDES-BENZ, PLYMOUTH, PORSCHE, and TOYOTA
- carline
a character variable with values BOXSTER, BOXSTER S, CORVETTE, DB132/144 DIABLO, FERRARI 360 MODENA/SPIDER, FERRARI 550 MARANELLO/BARCHETTA, INSIGHT, MR2, MX-5 MIATA, NSX, PROWLER, S2000, SL500, SL600, SLK230 KOMPRESSOR, SLK320, TT ROADSTER, TT ROADSTER QUATTRO, VIPER CONVERTIBLE, VIPER COUPE, Z3 COUPE, Z3 ROADSTER, and Z8
- displ
engine displacement (in liters)
- cyl
number of cylinders
- trans
a factor with levels Auto(L4), Auto(L5), Auto(S4), Auto(S5), Auto(S6), Manual(M5), and Manual(M6)
- drv
a factor with levels 4 (four wheel drive), F (front wheel drive), and R (rear wheel drive)
- cty
city mpg
- hwy
highway mpg
- cmb
combined city and highway mpg
Source
Environmental Protection Agency.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
summary(Epatwoseater$cty)
plot(hwy ~ cty, data = Epatwoseater)
boxplot(cty ~ drv, data = Epatwoseater, col = "lightgreen")
Ages of 25 executives
Description
Data for Exercise 1.104
Usage
Executiv
Format
A data frame/tibble with 25 observations on one variable
- age
a numeric vector
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
hist(Executiv$age, xlab = "Age of banking executives",
breaks = 5, main = "", col = "gray")
Weight loss for 30 members of an exercise program
Description
Data for Exercise 1.44
Usage
Exercise
Format
A data frame/tibble with 30 observations on one variable
- loss
a numeric vector
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Exercise$loss)
Measures of softness of ten different clothing garments washed with and without a softener
Description
Data for Example 7.21
Usage
Fabric
Format
A data frame/tibble with 20 observations on three variables
- garment
a numeric vector
- softner
a character variable with values with and without
- softness
a numeric vector
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
## Not run:
library(tidyr)
library(dplyr) # provides %>% and mutate() used below
tidyr::spread(Fabric, softner, softness) -> FabricWide
wilcox.test(Pair(with, without)~1, alternative = "greater", data = FabricWide)
T7 <- tidyr::spread(Fabric, softner, softness) %>%
mutate(di = with - without, adi = abs(di), rk = rank(adi),
srk = sign(di)*rk)
T7
t.test(T7$srk, alternative = "greater")
## End(Not run)
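The example above builds signed ranks of the paired differences (columns di, adi, rk, and srk). The sketch below repeats that construction in base R on hypothetical paired data (the vectors with_s and without_s are made up and are not the Fabric data set) and checks that the sum of the positive signed ranks equals the statistic V reported by wilcox.test() for paired data.

```r
# Hypothetical softness scores for ten garments (NOT the Fabric data set).
with_s    <- c(37, 31, 42, 35, 40, 33, 38, 44, 36, 39)
without_s <- c(33, 32, 37, 37, 34, 30, 31, 36, 26, 28)

di  <- with_s - without_s       # paired differences
adi <- abs(di)                  # absolute differences
rk  <- rank(adi)                # ranks of the absolute differences
srk <- sign(di) * rk            # signed ranks

# Wilcoxon signed-rank statistic: sum of the ranks of positive differences.
V_manual <- sum(rk[di > 0])

# Built-in paired test; its statistic V is computed the same way.
w_test <- wilcox.test(with_s, without_s, paired = TRUE,
                      alternative = "greater")

data.frame(di, adi, rk, srk)
V_manual
w_test$statistic
```

A one-sample t-test on srk, as in the example above, is a normal-theory approximation to this signed-rank test.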
Waiting times between successive eruptions of the Old Faithful geyser
Description
Data for Exercises 5.12 and 5.111
Usage
Faithful
Format
A data frame/tibble with 299 observations on two variables
- time
a numeric vector
- eruption
a factor with levels 1 and 2
Source
A. Azzalini and A. Bowman, "A Look at Some Data on the Old Faithful Geyser," Journal of the Royal Statistical Society, Series C, 39 (1990), 357-366.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
t.test(time ~ eruption, data = Faithful)
hist(Faithful$time, xlab = "wait time", main = "", freq = FALSE)
lines(density(Faithful$time))
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Faithful, aes(x = time, y = ..density..)) +
geom_histogram(binwidth = 5, fill = "pink", col = "black") +
geom_density() +
theme_bw() +
labs(x = "wait time")
## End(Not run)
Size of family versus cost per person per week for groceries
Description
Data for Exercise 2.89
Usage
Family
Format
A data frame/tibble with 20 observations on two variables
- number
number in family
- cost
cost per person (in dollars)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(cost ~ number, data = Family)
abline(lm(cost ~ number, data = Family), col = "red")
cor(Family$cost, Family$number)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Family, aes(x = number, y = cost)) +
geom_point() +
geom_smooth(method = "lm") +
theme_bw()
## End(Not run)
Choice of presidential ticket in 1984 by gender
Description
Data for Exercise 8.23
Usage
Ferraro1
Format
A data frame/tibble with 1000 observations on two variables
- gender
a factor with levels Men and Women
- candidate
a character vector of 1984 president and vice-president candidates
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~gender + candidate, data = Ferraro1)
T1
chisq.test(T1)
rm(T1)
Choice of vice presidential candidate in 1984 by gender
Description
Data for Exercise 8.23
Usage
Ferraro2
Format
A data frame/tibble with 1000 observations on two variables
- gender
a factor with levels Men and Women
- candidate
a character vector of 1984 president and vice-president candidates
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~gender + candidate, data = Ferraro2)
T1
chisq.test(T1)
rm(T1)
Fertility rates of all 50 states and DC
Description
Data for Exercise 1.125
Usage
Fertility
Format
A data frame/tibble with 51 observations on two variables
- state
a character variable with values Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, District of Colunbia, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missour, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
- rate
fertility rate (expected number of births during childbearing years)
Source
Population Reference Bureau.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Fertility$rate)
fivenum(Fertility$rate)
EDA(Fertility$rate)
Ages of women at the birth of their first child
Description
Data for Exercise 5.11
Usage
Firstchi
Format
A data frame/tibble with 87 observations on one variable
- age
age of woman at birth of her first child
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
EDA(Firstchi$age)
Length and number of fish caught with small and large mesh codend
Description
Data for Exercises 5.83, 5.119, and 7.29
Usage
Fish
Format
A data frame/tibble with 1534 observations on two variables
- codend
a character variable with values smallmesh and largemesh
- length
length of the fish measured in centimeters
Source
R. Millar, “Estimating the Size-Selectivity of Fishing Gear by Conditioning on the Total Catch,” Journal of the American Statistical Association, 87 (1992), 962-968.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
tapply(Fish$length, Fish$codend, median, na.rm = TRUE)
SIGN.test(Fish$length[Fish$codend == "smallmesh"], conf.level = 0.99)
## Not run:
library(dplyr) # provides %>% and summarize() used below
dplyr::group_by(Fish, codend) %>%
summarize(MEDIAN = median(length, na.rm = TRUE))
## End(Not run)
Number of sit-ups before and after a physical fitness course
Description
Data for Exercise 7.71
Usage
Fitness
Format
A data frame/tibble with 18 observations on three variables
- subject
a character variable indicating subject number
- test
a character variable with values After and Before
- number
a numeric vector recording the number of sit-ups performed in one minute
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
## Not run:
library(dplyr) # provides %>% and mutate() used below
tidyr::spread(Fitness, test, number) -> FitnessWide
t.test(Pair(After, Before)~1, alternative = "greater", data = FitnessWide)
Wide <- tidyr::spread(Fitness, test, number) %>%
mutate(diff = After - Before)
Wide
qqnorm(Wide$diff)
qqline(Wide$diff)
t.test(Wide$diff, alternative = "greater")
## End(Not run)
Florida voter results in the 2000 presidential election
Description
Data for Statistical Insight Chapter 2
Usage
Florida2000
Format
A data frame/tibble with 67 observations on 12 variables
- county
a character variable with values ALACHUA, BAKER, BAY, BRADFORD, BREVARD, BROWARD, CALHOUN, CHARLOTTE, CITRUS, CLAY, COLLIER, COLUMBIA, DADE, DE SOTO, DIXIE, DUVAL, ESCAMBIA, FLAGLER, FRANKLIN, GADSDEN, GILCHRIST, GLADES, GULF, HAMILTON, HARDEE, HENDRY, HERNANDO, HIGHLANDS, HILLSBOROUGH, HOLMES, INDIAN RIVER, JACKSON, JEFFERSON, LAFAYETTE, LAKE, LEE, LEON, LEVY, LIBERTY, MADISON, MANATEE, MARION, MARTIN, MONROE, NASSAU, OKALOOSA, OKEECHOBEE, ORANGE, OSCEOLA, PALM BEACH, PASCO, PINELLAS, POLK, PUTNAM, SANTA ROSA, SARASOTA, SEMINOLE, ST. JOHNS, ST. LUCIE, SUMTER, SUWANNEE, TAYLOR, UNION, VOLUSIA, WAKULLA, WALTON, and WASHINGTON
- gore
number of votes
- bush
number of votes
- buchanan
number of votes
- nader
number of votes
- browne
number of votes
- hagelin
number of votes
- harris
number of votes
- mcreynolds
number of votes
- moorehead
number of votes
- phillips
number of votes
- total
number of votes
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(buchanan ~ total, data = Florida2000,
xlab = "Total votes cast (in thousands)",
ylab = "Votes for Buchanan")
Breakdown times of an insulating fluid under various levels of voltage stress
Description
Data for Exercise 5.76
Usage
Fluid
Format
A data frame/tibble with 76 observations on two variables
- kilovolts
a character variable showing kilovolts
- time
breakdown time (in minutes)
Source
E. Soofi, N. Ebrahimi, and M. Habibullah, 1995.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
DF1 <- Fluid[Fluid$kilovolts == "34kV", ]
DF1
# OR
DF2 <- subset(Fluid, subset = kilovolts == "34kV")
DF2
stem(DF2$time)
SIGN.test(DF2$time)
## Not run:
library(dplyr)
DF3 <- dplyr::filter(Fluid, kilovolts == "34kV")
DF3
## End(Not run)
Annual food expenditures for 40 single households in Ohio
Description
Data for Exercise 5.106
Usage
Food
Format
A data frame/tibble with 40 observations on one variable
- expenditure
a numeric vector recording annual food expenditure (in dollars) in the state of Ohio.
Source
Bureau of Labor Statistics.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
EDA(Food$expenditure)
Cholesterol values of 62 subjects in the Framingham Heart Study
Description
Data for Exercises 1.56, 1.75, 3.69, and 5.60
Usage
Framingh
Format
A data frame/tibble with 62 observations on one variable
- cholest
a numeric vector with cholesterol values
Source
R. D'Agostino, et al. (1990), "A Suggestion for Using Powerful and Informative Tests for Normality," The American Statistician, 44, 316-321.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Framingh$cholest)
boxplot(Framingh$cholest, horizontal = TRUE)
hist(Framingh$cholest, freq = FALSE)
lines(density(Framingh$cholest))
mean(Framingh$cholest > 200 & Framingh$cholest < 240)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Framingh, aes(x = factor(1), y = cholest)) +
geom_boxplot() + # boxplot
labs(x = "") + # no x label
theme_bw() + # black and white theme
geom_jitter(width = 0.2) + # jitter points
coord_flip() # Create horizontal plot
ggplot2::ggplot(data = Framingh, aes(x = cholest, y = ..density..)) +
geom_histogram(fill = "pink", binwidth = 15, color = "black") +
geom_density() +
theme_bw()
## End(Not run)
Ages of a random sample of 30 college freshmen
Description
Data for Exercise 6.53
Usage
Freshman
Format
A data frame/tibble with 30 observations on one variable
- age
a numeric vector of ages
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
SIGN.test(Freshman$age, md = 19)
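SIGN.test() tests a hypothesized median. As a rough illustration of the underlying idea only, the sketch below carries out the textbook sign test with base R's binom.test() on hypothetical ages (the vector age is made up and is not the Freshman data set). Observations equal to the hypothesized median are dropped here; how SIGN.test() treats such ties and what it reports may differ, so this is not a description of its implementation.

```r
# Hypothetical ages of 20 students (NOT the Freshman data set).
age <- c(18, 19, 19, 20, 21, 18, 22, 19, 20, 23,
         18, 18, 20, 21, 19, 24, 20, 18, 21, 20)
md0 <- 19                       # hypothesized median

above <- sum(age > md0)         # observations above the hypothesized median
below <- sum(age < md0)         # observations below it
n_eff <- above + below          # ties with md0 are discarded

# Under H0 each non-tied observation falls above md0 with probability 0.5.
res <- binom.test(above, n_eff, p = 0.5, alternative = "two.sided")

# Same two-sided p-value from the binomial distribution directly.
p_manual <- 2 * pbinom(min(above, below), n_eff, 0.5)

c(above = above, below = below, p_value = res$p.value, p_manual = p_manual)
```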
Cost of funeral by region of country
Description
Data for Exercise 8.54
Usage
Funeral
Format
A data frame/tibble with 400 observations on two variables
- region
a factor with levels Central, East, South, and West
- cost
a factor with levels less than expected, about what expected, and more than expected
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~region + cost, data = Funeral)
T1
chisq.test(T1)
rm(T1)
Velocities of 82 galaxies in the Corona Borealis region
Description
Data for Example 5.2
Usage
Galaxie
Format
A data frame/tibble with 82 observations on one variable
- velocity
velocity measured in kilometers per second
Source
K. Roeder, "Density Estimation with Confidence Sets Explained by Superclusters and Voids in the Galaxies," Journal of the American Statistical Association, 85 (1990), 617-624.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
EDA(Galaxie$velocity)
Results of a Gallup poll on possession of marijuana as a criminal offense conducted in 1980
Description
Data for Exercise 2.76
Usage
Gallup
Format
A data frame/tibble with 1,200 observations on two variables
- demographics
a factor with levels National, Gender: Male, Gender: Female, Education: College, Eduction: High School, Education: Grade School, Age: 18-24, Age: 25-29, Age: 30-49, Age: 50-older, Religion: Protestant, and Religion: Catholic
- opinion
a factor with levels Criminal, Not Criminal, and No Opinion
Source
George H. Gallup The Gallup Opinion Index Report No. 179 (Princeton, NJ: The Gallup Poll, July 1980), p. 15.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~demographics + opinion, data = Gallup)
T1
t(T1[c(2, 3), ])
barplot(t(T1[c(2, 3), ]))
barplot(t(T1[c(2, 3), ]), beside = TRUE)
## Not run:
library(dplyr)
library(ggplot2)
dplyr::filter(Gallup, demographics == "Gender: Male" | demographics == "Gender: Female") %>%
ggplot2::ggplot(aes(x = demographics, fill = opinion)) +
geom_bar(position = "fill") + # stack to proportions so the y-axis matches the "Fraction" label
theme_bw() +
labs(y = "Fraction")
## End(Not run)
Price of regular unleaded gasoline obtained from 25 service stations
Description
Data for Exercise 1.45
Usage
Gasoline
Format
A data frame/tibble with 25 observations on one variable
- price
price for one gallon of gasoline
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Gasoline$price)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Gasoline, aes(x = factor(1), y = price)) +
geom_violin() +
geom_jitter() +
theme_bw()
## End(Not run)
Number of errors in copying a German passage before and after an experimental course in German
Description
Data for Exercise 7.60
Usage
German
Format
A data frame/tibble with ten observations on three variables
- student
a character variable indicating student number
- when
a character variable with values Before and After to indicate when the student received experimental instruction in German
- errors
the number of errors in copying a German passage
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
## Not run:
library(dplyr) # provides %>% and mutate() used below
tidyr::spread(German, when, errors) -> GermanWide
t.test(Pair(After, Before) ~ 1, data = GermanWide)
wilcox.test(Pair(After, Before) ~ 1, data = GermanWide)
T8 <- tidyr::spread(German, when, errors) %>%
mutate(di = After - Before, adi = abs(di), rk = rank(adi), srk = sign(di)*rk)
T8
qqnorm(T8$di)
qqline(T8$di)
t.test(T8$srk)
## End(Not run)
Distances a golf ball can be driven by 20 professional golfers
Description
Data for Exercise 5.24
Usage
Golf
Format
A data frame/tibble with 20 observations on one variable
- yards
distance a golf ball is driven in yards
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Golf$yards)
qqnorm(Golf$yards)
qqline(Golf$yards)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Golf, aes(sample = yards)) +
geom_qq() +
theme_bw()
## End(Not run)
Annual salaries for state governors in 1994 and 1999
Description
Data for Exercise 5.112
Usage
Governor
Format
A data frame/tibble with 50 observations on three variables
- state
a character variable with values Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
- year
a factor indicating year
- salary
a numeric vector with the governor's salary (in dollars)
Source
The 2000 World Almanac and Book of Facts.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(salary ~ year, data = Governor)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Governor, aes(x = salary)) +
geom_density(fill = "pink") +
facet_grid(year ~ .) +
theme_bw()
## End(Not run)
High school GPA versus college GPA
Description
Data for Example 2.13
Usage
Gpa
Format
A data frame/tibble with 10 observations on two variables
- hsgpa
high school gpa
- collgpa
college gpa
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(collgpa ~ hsgpa, data = Gpa)
mod <- lm(collgpa ~ hsgpa, data = Gpa)
abline(mod) # add line
yhat <- predict(mod) # fitted values
e <- resid(mod) # residuals
cbind(Gpa, yhat, e) # Table 2.1
cor(Gpa$hsgpa, Gpa$collgpa)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Gpa, aes(x = hsgpa, y = collgpa)) +
geom_point() +
geom_smooth(method = "lm") +
theme_bw()
## End(Not run)
Test grades in a beginning statistics class
Description
Data for Exercise 1.120
Usage
Grades
Format
A data frame with 29 observations on one variable
- grades
a numeric vector containing test grades
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
hist(Grades$grades, main = "", xlab = "Test grades", right = FALSE)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Grades, aes(x = grades, y = ..density..)) +
geom_histogram(fill = "pink", binwidth = 5, color = "black") +
geom_density(lwd = 2, color = "red") +
theme_bw()
## End(Not run)
Graduation rates for student athletes in the Southeastern Conference
Description
Data for Exercise 1.118
Usage
Graduate
Format
A data frame/tibble with 12 observations on three variables
- school
a character variable with values Alabama, Arkansas, Auburn, Florida, Georgia, Kentucky, Louisiana St, Mississippi, Mississippi St, South Carolina, Tennessee, and Vanderbilt
- code
a character variable with values Al, Ar, Au, Fl, Ge, Ke, LSt, Mi, MSt, SC, Te, and Va
- percent
graduation rate
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
barplot(Graduate$percent, names.arg = Graduate$school,
las = 2, cex.names = 0.7, col = "tomato")
Varve thickness from a sequence through an Eocene lake deposit in the Rocky Mountains
Description
Data for Exercise 6.57
Usage
Greenriv
Format
A data frame/tibble with 37 observations on one variable
- thick
varve thickness in millimeters
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Greenriv$thick)
SIGN.test(Greenriv$thick, md = 7.3, alternative = "greater")
Thickness of a varved section of the Green River oil shale deposit near a major lake in the Rocky Mountains
Description
Data for Exercises 6.45 and 6.98
Usage
Grnriv2
Format
A data frame/tibble with 101 observations on one variable
- thick
varve thickness (in millimeters)
Source
J. Davis, Statistics and Data Analysis in Geology, 2nd Ed., John Wiley and Sons, New York.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Grnriv2$thick)
t.test(Grnriv2$thick, mu = 8, alternative = "less")
Group data to illustrate analysis of variance
Description
Data for Exercise 10.42
Usage
Groupabc
Format
A data frame/tibble with 45 observations on two variables
- group
a factor with levels A, B, and C
- response
a numeric vector
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(response ~ group, data = Groupabc,
col = c("red", "blue", "green"))
anova(lm(response ~ group, data = Groupabc))
An illustration of analysis of variance
Description
Data for Exercise 10.4
Usage
Groups
Format
A data frame/tibble with 78 observations on two variables
- group
a factor with levels A, B, and C
- response
a numeric vector
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(response ~ group, data = Groups, col = c("red", "blue", "green"))
anova(lm(response ~ group, data = Groups))
Children's age versus number of completed gymnastic activities
Description
Data for Exercises 2.21 and 9.14
Usage
Gym
Format
A data frame/tibble with eight observations on three variables
- age
age of child
- number
number of gymnastic activities successfully completed
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(number ~ age, data = Gym)
model <- lm(number ~ age, data = Gym)
abline(model, col = "red")
summary(model)
Study habits of students in two matched school districts
Description
Data for Exercise 7.57
Usage
Habits
Format
A data frame/tibble with 11 observations on four variables
- A
study habit score
- B
study habit score
- differ
B minus A
- signrks
the signed-ranked-differences
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
shapiro.test(Habits$differ)
qqnorm(Habits$differ)
qqline(Habits$differ)
wilcox.test(Pair(B, A) ~ 1, data = Habits, alternative = "less")
t.test(Habits$signrks, alternative = "less")
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Habits, aes(x = differ)) +
geom_dotplot(fill = "blue") +
theme_bw()
## End(Not run)
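The differ and signrks columns described in the Format section can be rebuilt from the two score columns. The following is a minimal sketch using made-up scores (the vectors A and B below are illustrative assumptions, not the values stored in Habits), and it assumes there are no zero differences and no tied absolute differences.

```r
# Hypothetical study habit scores for six matched pairs
# (illustrative values, not the data stored in Habits)
A <- c(105, 109, 115, 112, 124, 107)
B <- c(115, 103, 110, 125, 99, 121)

# differ is B minus A, as in the Format description
differ <- B - A

# Signed ranks: rank the absolute differences, then restore the sign
# (assumes no zero differences and no ties among abs(differ))
signrks <- sign(differ) * rank(abs(differ))

data.frame(A, B, differ, signrks)
```

With real data, zero differences are usually dropped before ranking, so this sketch would need an extra filtering step in that case.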
Haptoglobin concentration in blood serum of 8 healthy adults
Description
Data for Example 6.9
Usage
Haptoglo
Format
A data frame/tibble with eight observations on one variable
- concent
haptoglobin concentration (in grams per liter)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
shapiro.test(Haptoglo$concent)
t.test(Haptoglo$concent, mu = 2, alternative = "less")
Daily receipts for a small hardware store for 31 working days
Description
Daily receipts for a small hardware store for 31 working days
Usage
Hardware
Format
A data frame with 31 observations on one variable
- receipt
a numeric vector of daily receipts (in dollars)
Source
J.C. Miller and J.N. Miller, (1988), Statistics for Analytical Chemistry, 2nd Ed. (New York: Halsted Press).
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Hardware$receipt)
Tensile strength of Kraft paper for different percentages of hardwood in the batches of pulp
Description
Data for Example 2.18 and Exercise 9.34
Usage
Hardwood
Format
A data frame/tibble with 19 observations on two variables
- tensile
tensile strength of kraft paper (in pounds per square inch)
- hardwood
percent of hardwood in the batch of pulp that was used to produce the paper
Source
G. Joglekar, et al., "Lack-of-Fit Testing When Replicates Are Not Available," The American Statistician, 43(3), (1989), 135-143.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(tensile ~ hardwood, data = Hardwood)
model <- lm(tensile ~ hardwood, data = Hardwood)
abline(model, col = "red")
plot(model, which = 1)
Primary heating sources of homes on Indian reservations versus all households
Description
Data for Exercise 1.29
Usage
Heat
Format
A data frame/tibble with 301 observations on two variables
- fuel
a factor with levels Utility gas, LP bottled gas, Electricity, Fuel oil, Wood, and Other
- location
a factor with levels American Indians on reservation, All U.S. households, and American Indians not on reservations
Source
Bureau of the Census, Housing of the American Indians on Reservations, Statistical Brief 95-11, April 1995.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~ fuel + location, data = Heat)
T1
barplot(t(T1), beside = TRUE, legend = TRUE)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Heat, aes(x = fuel, fill = location)) +
geom_bar(position = "dodge") +
labs(y = "percent") +
theme_bw() +
theme(axis.text.x = element_text(angle = 30, hjust = 1))
## End(Not run)
Fuel efficiency ratings for three types of oil heaters
Description
Data for Exercise 10.32
Usage
Heating
Format
A data frame/tibble with 90 observations on two variables
- type
a factor with levels A, B, and C denoting the type of oil heater
- efficiency
heater efficiency rating
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(efficiency ~ type, data = Heating,
col = c("red", "blue", "green"))
kruskal.test(efficiency ~ type, data = Heating)
Results of treatments for Hodgkin's disease
Description
Data for Exercise 2.77
Usage
Hodgkin
Format
A data frame/tibble with 538 observations on two variables
- type
a factor with levels LD, LP, MC, and NS
- response
a factor with levels Positive, Partial, and None
Source
I. Dunsmore, F. Daly, Statistical Methods, Unit 9, Categorical Data, Milton Keynes, The Open University, 18.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~type + response, data = Hodgkin)
T1
barplot(t(T1), legend = TRUE, beside = TRUE)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Hodgkin, aes(x = type, fill = response)) +
geom_bar(position = "dodge") +
theme_bw()
## End(Not run)
Median prices of single-family homes in 65 metropolitan statistical areas
Description
Data for Statistical Insight Chapter 5
Usage
Homes
Format
A data frame/tibble with 65 observations on four variables
- city
a character variable with values Akron OH, Albuquerque NM, Anaheim CA, Atlanta GA, Baltimore MD, Baton Rouge LA, Birmingham AL, Boston MA, Bradenton FL, Buffalo NY, Charleston SC, Chicago IL, Cincinnati OH, Cleveland OH, Columbia SC, Columbus OH, Corpus Christi TX, Dallas TX, Daytona Beach FL, Denver CO, Des Moines IA, Detroit MI, El Paso TX, Grand Rapids MI, Hartford CT, Honolulu HI, Houston TX, Indianapolis IN, Jacksonville FL, Kansas City MO, Knoxville TN, Las Vegas NV, Los Angeles CA, Louisville KY, Madison WI, Memphis TN, Miami FL, Milwaukee WI, Minneapolis MN, Mobile AL, Nashville TN, New Haven CT, New Orleans LA, New York NY, Oklahoma City OK, Omaha NE, Orlando FL, Philadelphia PA, Phoenix AZ, Pittsburgh PA, Portland OR, Providence RI, Sacramento CA, Salt Lake City UT, San Antonio TX, San Diego CA, San Francisco CA, Seattle WA, Spokane WA, St Louis MO, Syracuse NY, Tampa FL, Toledo OH, Tulsa OK, and Washington DC
- region
a character variable with values Midwest, Northeast, South, and West
- year
a factor with levels 1994 and 2000
- price
median house price (in dollars)
Source
National Association of Realtors.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
tapply(Homes$price, Homes$year, mean)
tapply(Homes$price, Homes$region, mean)
p2000 <- subset(Homes, year == "2000")
p1994 <- subset(Homes, year == "1994")
## Not run:
library(dplyr)
library(ggplot2)
dplyr::group_by(Homes, year, region) %>%
summarize(AvgPrice = mean(price))
ggplot2::ggplot(data = Homes, aes(x = region, y = price)) +
geom_boxplot() +
theme_bw() +
facet_grid(year ~ .)
## End(Not run)
Number of hours per week spent on homework for private and public high school students
Description
Data for Exercise 7.78
Usage
Homework
Format
A data frame with 30 observations on two variables
- school
type of school, either private or public
- time
number of hours per week spent on homework
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(time ~ school, data = Homework,
ylab = "Hours per week spent on homework")
#
t.test(time ~ school, data = Homework)
Miles per gallon for a Honda Civic on 35 different occasions
Description
Data for Statistical Insight Chapter 6
Usage
Honda
Format
A data frame/tibble with 35 observations on one variable
- mileage
miles per gallon for a Honda Civic
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
t.test(Honda$mileage, mu = 40, alternative = "less")
Hostility levels of high school students from rural, suburban, and urban areas
Description
Data for Example 10.6
Usage
Hostile
Format
A data frame/tibble with 135 observations on two variables
- location
a factor with the location of the high school student (Rural, Suburban, or Urban)
- hostility
the score from the Hostility Level Test
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(hostility ~ location, data = Hostile,
col = c("red", "blue", "green"))
kruskal.test(hostility ~ location, data = Hostile)
Median home prices for 1984 and 1993 in 37 markets across the U.S.
Description
Data for Exercise 5.82
Usage
Housing
Format
A data frame/tibble with 74 observations on three variables
- city
a character variable with values Albany, Anaheim, Atlanta, Baltimore, Birmingham, Boston, Chicago, Cincinnati, Cleveland, Columbus, Dallas, Denver, Detroit, Ft Lauderdale, Houston, Indianapolis, Kansas City, Los Angeles, Louisville, Memphis, Miami, Milwaukee, Minneapolis, Nashville, New York, Oklahoma City, Philadelphia, Providence, Rochester, Salt Lake City, San Antonio, San Diego, San Francisco, San Jose, St Louis, Tampa, and Washington
- year
a factor with levels 1984 and 1993
- price
median house price (in dollars)
Source
National Association of Realtors.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stripchart(price ~ year, data = Housing, method = "stack",
pch = 1, col = c("red", "blue"))
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Housing, aes(x = price, fill = year)) +
geom_dotplot() +
facet_grid(year ~ .) +
theme_bw()
## End(Not run)
Number of storms, hurricanes and El Nino effects from 1950 through 1995
Description
Data for Exercises 1.38, 10.19, and Example 1.6
Usage
Hurrican
Format
A data frame/tibble with 46 observations on four variables
- year
a numeric vector indicating year
- storms
a numeric vector recording number of storms
- hurrican
a numeric vector recording number of hurricanes
- elnino
a factor with levels cold, neutral, and warm
Source
National Hurricane Center.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~hurrican, data = Hurrican)
T1
barplot(T1, col = "blue", main = "Problem 1.38",
xlab = "Number of hurricanes",
ylab = "Number of seasons")
boxplot(storms ~ elnino, data = Hurrican,
col = c("blue", "yellow", "red"))
anova(lm(storms ~ elnino, data = Hurrican))
rm(T1)
Number of icebergs sighted each month south of Newfoundland and south of the Grand Banks in 1920
Description
Data for Exercises 2.46 and 2.60
Usage
Iceberg
Format
A data frame with 12 observations on three variables
- month
a character variable with abbreviated months of the year
- Newfoundland
number of icebergs sighted south of Newfoundland
- Grand Banks
number of icebergs sighted south of Grand Banks
Source
N. Shaw, Manual of Meteorology, Vol. 2 (London: Cambridge University Press 1942), 7; and F. Mosteller and J. Tukey, Data Analysis and Regression (Reading, MA: Addison - Wesley, 1977).
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(Newfoundland ~ `Grand Banks`, data = Iceberg)
abline(lm(Newfoundland ~ `Grand Banks`, data = Iceberg), col = "blue")
Percent change in personal income from 1st to 2nd quarter in 2000
Description
Data for Exercise 1.33
Usage
Income
Format
A data frame/tibble with 51 observations on two variables
- state
a character variable with values Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, District of Colunbia, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missour, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
- percent_change
percent change in income from first quarter to the second quarter of 2000
Source
US Department of Commerce.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
Income$class <- cut(Income$percent_change,
breaks = c(-Inf, 0.5, 1.0, 1.5, 2.0, Inf))
T1 <- xtabs(~class, data = Income)
T1
barplot(T1, col = "pink")
## Not run:
library(ggplot2)
DF <- as.data.frame(T1)
DF
ggplot2::ggplot(data = DF, aes(x = class, y = Freq)) +
geom_bar(stat = "identity", fill = "purple") +
theme_bw()
## End(Not run)
Illustrates a comparison problem for long-tailed distributions
Description
Data for Exercise 7.41
Usage
Independent
Format
A data frame/tibble with 46 observations on two variables
- score
a numeric vector
- group
a factor with levels A and B
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
qqnorm(Independent$score[Independent$group=="A"])
qqline(Independent$score[Independent$group=="A"])
qqnorm(Independent$score[Independent$group=="B"])
qqline(Independent$score[Independent$group=="B"])
boxplot(score ~ group, data = Independent, col = "blue")
wilcox.test(score ~ group, data = Independent)
Educational attainment versus per capita income and poverty rate for American Indians living on reservations
Description
Data for Exercise 2.95
Usage
Indian
Format
A data frame/tibble with ten observations on four variables
- reservation
a character variable with values Blackfeet, Fort Apache, Gila River, Hopi, Navajo, Papago, Pine Ridge, Rosebud, San Carlos, and Zuni Pueblo
- percent high school
percent who have graduated from high school
- per capita income
per capita income (in dollars)
- poverty rate
percent poverty
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
par(mfrow = c(1, 2))
plot(`per capita income` ~ `percent high school`, data = Indian,
xlab = "Percent high school graduates", ylab = "Per capita income")
plot(`poverty rate` ~ `percent high school`, data = Indian,
xlab = "Percent high school graduates", ylab = "Percent poverty")
par(mfrow = c(1, 1))
Average miles per hour for the winners of the Indianapolis 500 race
Description
Data for Exercise 1.128
Usage
Indiapol
Format
A data frame/tibble with 39 observations on two variables
- year
the year of the race
- speed
the winner's average speed (in mph)
Source
The World Almanac and Book of Facts, 2000, p. 1004.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(speed ~ year, data = Indiapol, type = "b")
Qualifying miles per hour and number of previous starts for drivers in the 79th Indianapolis 500 race
Description
Data for Exercises 7.11 and 7.36
Usage
Indy500
Format
A data frame/tibble with 33 observations on four variables
- driver
a character variable with values andretti, bachelart, boesel, brayton, c.guerrero, cheever, fabi, fernandez, ferran, fittipaldi, fox, goodyear, gordon, gugelmin, herta, james, johansson, jones, lazier, luyendyk, matsuda, matsushita, pruett, r.guerrero, rahal, ribeiro, salazar, sharp, sullivan, tracy, vasser, villeneuve, and zampedri
- qualif
qualifying speed (in mph)
- starts
number of Indianapolis 500 starts
- group
a numeric vector where 1 indicates a driver with 4 or fewer Indianapolis 500 starts and 2 indicates a driver with 5 or more Indianapolis 500 starts
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stripchart(qualif ~ group, data = Indy500, method = "stack",
pch = 19, col = c("red", "blue"))
boxplot(qualif ~ group, data = Indy500)
t.test(qualif ~ group, data = Indy500)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Indy500, aes(sample = qualif)) +
geom_qq() +
facet_grid(group ~ .) +
theme_bw()
## End(Not run)
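The group variable is described as a recoding of starts. As a minimal sketch of that rule, the code below applies the cutoff to a made-up vector of start counts (the values are illustrative assumptions, not the Indy500 data).

```r
# Hypothetical numbers of Indianapolis 500 starts (not the Indy500 data)
starts <- c(0, 2, 4, 5, 9, 13)

# 1 = driver with 4 or fewer starts, 2 = driver with 5 or more starts
group <- ifelse(starts <= 4, 1, 2)

# Count of drivers in each experience group
table(group)
```

Wrapping the result in factor() would keep group-wise functions such as boxplot() and t.test() from treating the codes as quantities.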
Private pay increase of salaried employees versus inflation rate
Description
Data for Exercises 2.12 and 2.29
Usage
Inflatio
Format
A data frame/tibble with 24 observations on four variables
- year
a numeric vector of years
- pay
average hourly wage for salaried employees (in dollars)
- increase
percent increase in hourly wage over previous year
- inflation
percent inflation rate
Source
Bureau of Labor Statistics.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(increase ~ inflation, data = Inflatio)
cor(Inflatio$increase, Inflatio$inflation, use = "complete.obs")
Inlet oil temperature through a valve
Description
Data for Exercises 5.91 and 6.48
Usage
Inletoil
Format
A data frame/tibble with 12 observations on one variable
- temp
inlet oil temperature (Fahrenheit)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
hist(Inletoil$temp, breaks = 3)
qqnorm(Inletoil$temp)
qqline(Inletoil$temp)
t.test(Inletoil$temp)
t.test(Inletoil$temp, mu = 98, alternative = "less")
Type of drug offense by race
Description
Data for Statistical Insight Chapter 8
Usage
Inmate
Format
A data frame/tibble with 28,047 observations on two variables
- race
a factor with levels white, black, and hispanic
- drug
a factor with levels heroin, crack, cocaine, and marijuana
Source
C. Wolf Harlow (1994), Comparing Federal and State Prison Inmates, NCJ-145864, U.S. Department of Justice, Bureau of Justice Statistics.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~race + drug, data = Inmate)
T1
chisq.test(T1)
rm(T1)
Percent of vehicles passing inspection by type of inspection station
Description
Data for Exercise 8.59
Usage
Inspect
Format
A data frame/tibble with 174 observations on two variables
- station
a factor with levels auto inspection, auto repair, car care center, gas station, new car dealer, and tire store
- passed
a factor with levels less than 70%, between 70% and 84%, and more than 85%
Source
The Charlotte Observer, December 13, 1992.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~ station + passed, data = Inspect)
T1
barplot(T1, beside = TRUE, legend = TRUE)
chisq.test(T1)
rm(T1)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Inspect, aes(x = passed, fill = station)) +
geom_bar(position = "dodge") +
theme_bw()
## End(Not run)
Heat loss through a new insulating medium
Description
Data for Exercise 9.50
Usage
Insulate
Format
A data frame/tibble with ten observations on two variables
- temp
outside temperature (in degrees Celsius)
- loss
heat loss (in BTUs)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(loss ~ temp, data = Insulate)
model <- lm(loss ~ temp, data = Insulate)
abline(model, col = "blue")
summary(model)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Insulate, aes(x = temp, y = loss)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
theme_bw()
## End(Not run)
GPA versus IQ for 12 individuals
Description
Data for Exercises 9.51 and 9.52
Usage
Iqgpa
Format
A data frame/tibble with 12 observations on two variables
- iq
IQ scores
- gpa
Grade point average
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(gpa ~ iq, data = Iqgpa, col = "blue", pch = 19)
model <- lm(gpa ~ iq, data = Iqgpa)
summary(model)
rm(model)
R. A. Fisher's famous data on irises
Description
Data for Examples 1.15 and 5.19
Usage
Irises
Format
A data frame/tibble with 150 observations on five variables
- sepal_length
sepal length (in cm)
- sepal_width
sepal width (in cm)
- petal_length
petal length (in cm)
- petal_width
petal width (in cm)
- species
a factor with levels setosa, versicolor, and virginica
Source
Fisher, R. A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, Part II, 179-188.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
tapply(Irises$sepal_length, Irises$species, mean)
t.test(Irises$sepal_length[Irises$species == "setosa"], conf.level = 0.99)
hist(Irises$sepal_length[Irises$species == "setosa"],
main = "Sepal length for\n Iris Setosa",
xlab = "Length (in cm)")
boxplot(sepal_length ~ species, data = Irises)
Number of problems reported per 100 cars in 1994 versus 1995
Description
Data for Exercises 2.14, 2.17, 2.31, 2.33, and 2.40
Usage
Jdpower
Format
A data frame/tibble with 29 observations on three variables
- car
a factor with levels Acura, BMW, Buick, Cadillac, Chevrolet, Dodge, Eagle, Ford, Geo, Honda, Hyundai, Infiniti, Jaguar, Lexus, Lincoln, Mazda, Mercedes-Benz, Mercury, Mitsubishi, Nissan, Oldsmobile, Plymouth, Pontiac, Saab, Saturn, Subaru, Toyota, Volkswagen, and Volvo
- 1994
number of problems per 100 cars in 1994
- 1995
number of problems per 100 cars in 1995
Source
USA Today, May 25, 1995.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
model <- lm(`1995` ~ `1994`, data = Jdpower)
summary(model)
plot(`1995` ~ `1994`, data = Jdpower)
abline(model, col = "red")
rm(model)
Job satisfaction and stress level for 9 school teachers
Description
Data for Exercise 9.60
Usage
Jobsat
Format
A data frame/tibble with nine observations on two variables
- wspt
Wilson Stress Profile score for teachers
- satisfaction
job satisfaction score
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(satisfaction ~ wspt, data = Jobsat)
model <- lm(satisfaction ~ wspt, data = Jobsat)
abline(model, col = "blue")
summary(model)
rm(model)
Smoking habits of boys and girls ages 12 to 18
Description
Data for Exercise 4.85
Usage
Kidsmoke
Format
A data frame/tibble with 1000 observations on two variables
- gender
a character vector with values female and male
- smoke
a character vector with values no and yes
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~smoke + gender, data = Kidsmoke)
T1
prop.table(T1)
prop.table(T1, 1)
prop.table(T1, 2)
Rates per kilowatt-hour for each of the 50 states and DC
Description
Data for Example 5.9
Usage
Kilowatt
Format
A data frame/tibble with 51 observations on two variables
- state
a factor with levels Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, District of Columbia, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missour, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
- rate
a numeric vector of rates per kilowatt-hour
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
EDA(Kilowatt$rate)
Reading scores for first grade children who attended kindergarten versus those who did not
Description
Data for Exercise 7.68
Usage
Kinder
Format
A data frame/tibble with eight observations on three variables
- pair
a numeric indicator of pair
- kinder
reading score of kids who went to kindergarten
- nokinder
reading score of kids who did not go to kindergarten
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(Kinder$kinder, Kinder$nokinder)
diff <- Kinder$kinder - Kinder$nokinder
qqnorm(diff)
qqline(diff)
shapiro.test(diff)
t.test(diff)
rm(diff)
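The example above runs a one-sample t-test on the within-pair differences. As a hedged illustration with made-up reading scores (the vectors below are assumptions, not the Kinder data), the same test statistic and p-value come from a paired call to t.test().

```r
# Hypothetical reading scores for eight matched pairs (not the Kinder data)
kinder   <- c(83, 74, 67, 64, 70, 67, 81, 64)
nokinder <- c(78, 74, 63, 66, 68, 63, 77, 65)

# Within-pair differences (named d to avoid masking base::diff)
d <- kinder - nokinder

# One-sample t-test on the differences
one_sample <- t.test(d)

# Paired t-test on the two columns
paired <- t.test(kinder, nokinder, paired = TRUE)

# The two approaches agree on the statistic and the p-value
all.equal(unname(one_sample$statistic), unname(paired$statistic))
all.equal(one_sample$p.value, paired$p.value)
```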
Median costs of laminectomies at hospitals across North Carolina in 1992
Description
Data for Exercise 10.18
Usage
Laminect
Format
A data frame/tibble with 138 observations on two variables
- area
a character vector indicating the area of the hospital, with values Rural, Regional, and Metropol
- cost
a numeric vector indicating cost of a laminectomy
Source
Consumer's Guide to Hospitalization Charges in North Carolina Hospitals (August 1994), North Carolina Medical Database Commission, Department of Insurance.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(cost ~ area, data = Laminect, col = topo.colors(3))
anova(lm(cost ~ area, data = Laminect))
Lead levels in children's blood whose parents worked in a battery factory
Description
Data for Example 1.17
Usage
Lead
Format
A data frame/tibble with 66 observations on two variables
- group
a character vector with values exposed and control
- lead
a numeric vector indicating the level of lead in children's blood (in micrograms/dl)
Source
Morton, D. et al. (1982), "Lead Absorption in Children of Employees in a Lead-Related Industry," American Journal of Epidemiology, 115, 549-555.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(lead ~ group, data = Lead, col = topo.colors(2))
Leadership exam scores by age for employees at an industrial plant
Description
Data for Exercise 7.31
Usage
Leader
Format
A data frame/tibble with 34 observations on two variables
- age
a character vector indicating age with values under35 and over35
- score
score on a leadership exam
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(score ~ age, data = Leader, col = c("gray", "green"))
t.test(score ~ age, data = Leader)
Survival time of mice injected with an experimental lethal drug
Description
Data for Example 6.12
Usage
Lethal
Format
A data frame/tibble with 30 observations on one variable
- survival
a numeric vector indicating time survived after injection (in seconds)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
SIGN.test(Lethal$survival, md = 45, alternative = "less")
Life expectancy of men and women in the U.S.
Description
Data for Exercise 1.31
Usage
Life
Format
A data frame/tibble with eight observations on three variables
- year
a numeric vector indicating year
- men
life expectancy for men (in years)
- women
life expectancy for women (in years)
Source
National Center for Health Statistics.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(men ~ year, type = "l", ylim = c(min(men, women), max(men, women)),
col = "blue", main = "Life Expectancy vs Year", ylab = "Age",
xlab = "Year", data = Life)
lines(women ~ year, col = "red", data = Life)
text(1955, 65, "Men", col = "blue")
text(1955, 70, "Women", col = "red")
Life span of electronic components used in a spacecraft versus heat
Description
Data for Exercises 2.4, 2.37, and 2.49
Usage
Lifespan
Format
A data frame/tibble with six observations on two variables
- heat
temperature (in degrees Celsius)
- life
lifespan of component (in hours)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(life ~ heat, data = Lifespan)
model <- lm(life ~ heat, data = Lifespan)
abline(model, col = "red")
resid(model)
sum((resid(model))^2)
anova(model)
rm(model)
Relationship between damage reports and deaths caused by lightning
Description
Data for Exercise 2.6
Usage
Ligntmonth
Format
A data frame/tibble with 12 observations on four variables
- month
a factor with levels 1/01/2000, 10/01/2000, 11/01/2000, 12/01/2000, 2/01/2000, 3/01/2000, 4/01/2000, 5/01/2000, 6/01/2000, 7/01/2000, 8/01/2000, and 9/01/2000
- deaths
number of deaths due to lightning strikes
- injuries
number of injuries due to lightning strikes
- damage
damage due to lightning strikes (in dollars)
Source
Lightning Fatalities, Injuries and Damage Reports in the United States, 1959-1994, NOAA Technical Memorandum NWS SR-193, Dept. of Commerce.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(deaths ~ damage, data = Ligntmonth)
model <- lm(deaths ~ damage, data = Ligntmonth)
abline(model, col = "red")
rm(model)
Measured traffic at three prospective locations for a motor lodge
Description
Data for Exercise 10.33
Usage
Lodge
Format
A data frame/tibble with 45 observations on six variables
- traffic
a numeric vector indicating the number of vehicles that passed a site in 1 hour
- site
a numeric vector with values 1, 2, and 3
- ranks
ranks for variable traffic
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(traffic ~ site, data = Lodge, col = cm.colors(3))
anova(lm(traffic ~ factor(site), data = Lodge))
Long-tailed distributions to illustrate the Kruskal-Wallis test
Description
Data for Exercise 10.45
Usage
Longtail
Format
A data frame/tibble with 60 observations on three variables
- score
a numeric vector
- group
a numeric vector with values 1, 2, and 3
- ranks
ranks for variable score
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(score ~ group, data = Longtail, col = heat.colors(3))
kruskal.test(score ~ factor(group), data = Longtail)
anova(lm(score ~ factor(group), data = Longtail))
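The ranks column holds ranks for score, and the Kruskal-Wallis statistic is built from the rank sums within each group. The sketch below uses made-up scores with no ties (illustrative assumptions, not the Longtail data) and assumes the ranks are pooled over all observations. It computes H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1) by hand and compares it with kruskal.test().

```r
# Hypothetical scores for three groups of three (not the Longtail data)
score <- c(12, 15, 11, 20, 22, 19, 30, 28, 35)
group <- rep(1:3, each = 3)

# Pooled ranks across all observations (no ties in this toy example)
ranks <- rank(score)

# Kruskal-Wallis H statistic computed from the rank sum of each group
N <- length(score)
rank_sums <- tapply(ranks, group, sum)
n_i <- tapply(ranks, group, length)
H <- 12 / (N * (N + 1)) * sum(rank_sums^2 / n_i) - 3 * (N + 1)

# Compare with the statistic reported by kruskal.test()
kw <- kruskal.test(score ~ factor(group))
c(by_hand = H, kruskal_test = unname(kw$statistic))
```

When ties are present, kruskal.test() applies a tie correction, so the hand computation above would no longer match exactly.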
Reading skills of 24 matched low ability students
Description
Data for Example 7.18
Usage
Lowabil
Format
A data frame/tibble with 12 observations on three variables
- pair
a numeric indicator of pair
- experiment
score of the child with the experimental method
- control
score of the child with the standard method
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
diff <- Lowabil$experiment - Lowabil$control
qqnorm(diff)
qqline(diff)
shapiro.test(diff)
t.test(diff)
rm(diff)
Magnesium concentration and distances between samples
Description
Data for Exercise 9.9
Usage
Magnesiu
Format
A data frame/tibble with 20 observations on two variables
- distance
distance between samples
- magnesium
concentration of magnesium
Source
Davis, J. (1986), Statistics and Data Analysis in Geology, 2nd Ed., John Wiley and Sons, New York, p. 146.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(magnesium ~ distance, data = Magnesiu)
model = lm(magnesium ~ distance, data = Magnesiu)
abline(model, col = "red")
summary(model)
rm(model)
Amounts awarded in 17 malpractice cases
Description
Data for Exercise 5.73
Usage
Malpract
Format
A data frame/tibble with 17 observations on one variable
- award
malpractice award (in $1000)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
SIGN.test(Malpract$award, conf.level = 0.90)
Advertised salaries offered general managers of major corporations in 1995
Description
Data for Exercise 5.81
Usage
Manager
Format
A data frame/tibble with 26 observations on one variable
- salary
random sample of advertised annual salaries of top executives (in dollars)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Manager$salary)
SIGN.test(Manager$salary)
Percent of marked cars in 65 police departments in Florida
Description
Data for Exercise 6.100
Usage
Marked
Format
A data frame/tibble with 65 observations on one variable
- percent
percentage of marked cars in 65 Florida police departments
Source
Law Enforcement Management and Administrative Statistics, 1993, Bureau of Justice Statistics, NCJ-148825, September 1995, p. 147-148.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
EDA(Marked$percent)
SIGN.test(Marked$percent, md = 60, alternative = "greater")
t.test(Marked$percent, mu = 60, alternative = "greater")
Standardized math test scores for 30 students
Description
Data for Exercise 1.69
Usage
Math
Format
A data frame/tibble with 30 observations on one variable
- score
scores on a standardized test for 30 tenth graders
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Math$score)
hist(Math$score, main = "Math Scores", xlab = "score", freq = FALSE)
lines(density(Math$score), col = "red")
CharlieZ <- (62 - mean(Math$score))/sd(Math$score)
CharlieZ
scale(Math$score)[which(Math$score == 62)]
Standardized math competency for a group of entering freshmen at a small community college
Description
Data for Exercise 5.26
Usage
Mathcomp
Format
A data frame/tibble with 31 observations on one variable
- score
scores of 31 entering freshmen at a community college on a national standardized test
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Mathcomp$score)
EDA(Mathcomp$score)
Math proficiency and SAT scores by states
Description
Data for Exercise 9.24, Example 9.1, and Example 9.6
Usage
Mathpro
Format
A data frame/tibble with 51 observations on four variables
- state
a factor with levels Conn, D.C., Del, Ga, Hawaii, Ind, Maine, Mass, Md, N.C., N.H., N.J., N.Y., Ore, Pa, R.I., S.C., Va, and Vt
- sat_math
SAT math scores for high school seniors
- profic
math proficiency scores for eighth graders
- group
a numeric vector
Source
National Assessment of Educational Progress and The College Board.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
model <- lm(sat_math ~ profic, data = Mathpro)
plot(sat_math ~ profic, data = Mathpro, ylab = "SAT", xlab = "proficiency")
abline(model, col = "red")
summary(model)
rm(model)
Error scores for four groups of experimental animals running a maze
Description
Data for Exercise 10.13
Usage
Maze
Format
A data frame/tibble with 32 observations on two variables
- score
error scores for animals running through a maze under different conditions
- condition
a factor with levels CondA, CondB, CondC, and CondD
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(score ~ condition, data = Maze, col = rainbow(4))
anova(lm(score ~ condition, data = Maze))
Illustrates test of equality of medians with the Kruskal Wallis test
Description
Data for Exercise 10.52
Usage
Median
Format
A data frame/tibble with 45 observations on two variables
- sample
a vector with values Sample1, Sample 2, and Sample 3
- value
a numeric vector
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(value ~ sample, data = Median, col = rainbow(3))
anova(lm(value ~ sample, data = Median))
kruskal.test(value ~ factor(sample), data = Median)
Median mental ages of 16 girls
Description
Data for Exercise 6.52
Usage
Mental
Format
A data frame/tibble with 16 observations on one variable
- age
mental age of 16 girls
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
SIGN.test(Mental$age, md = 100)
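The idea behind a sign test of a hypothesized median can be sketched in base R: after dropping values equal to the hypothesized median, the count of observations above it is Binomial(n, 0.5) under the null hypothesis. The ages below are hypothetical, and the sketch uses binom.test() rather than BSDA's SIGN.test(); that the two report the same p-value is an assumption not verified here.

```r
# Hypothetical mental ages for 16 subjects (not the Mental data)
age <- c(87, 91, 95, 96, 98, 99, 100, 101, 103, 104, 106, 108, 110, 112, 115, 120)
md0 <- 100                       # hypothesized median

n_above <- sum(age > md0)        # observations above the hypothesized median
n_used  <- sum(age != md0)       # values equal to md0 are dropped

# Under H0 the count above md0 is Binomial(n_used, 0.5)
sign_sketch <- binom.test(n_above, n_used, p = 0.5, alternative = "two.sided")
sign_sketch$p.value
```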
Concentration of mercury in 25 lake trout
Description
Data for Example 1.9
Usage
Mercury
Format
A data frame/tibble with 25 observations on one variable
- mercury
a numeric vector measuring mercury (in parts per million)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Mercury$mercury)
Monthly rental costs in metro areas with 1 million or more persons
Description
Data for Exercise 5.117
Usage
Metrent
Format
A data frame/tibble with 46 observations on one variable
- rent
monthly rent in dollars
Source
U.S. Bureau of the Census, Housing in the Metropolitan Areas, Statistical Brief SB/94/19, September 1994.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(Metrent$rent, col = "magenta")
t.test(Metrent$rent, conf.level = 0.99)$conf.int
Miller personality test scores for a group of college students applying for graduate school
Description
Data for Example 5.7
Usage
Miller
Format
A data frame/tibble with 25 observations on one variable
- miller
scores on the Miller Personality test
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Miller$miller)
fivenum(Miller$miller)
boxplot(Miller$miller)
qqnorm(Miller$miller, col = "blue")
qqline(Miller$miller, col = "red")
Twenty scores on the Miller personality test
Description
Data for Exercise 1.41
Usage
Miller1
Format
A data frame/tibble with 20 observations on one variable
- miller
scores on the Miller personality test
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Miller1$miller)
stem(Miller1$miller, scale = 2)
Moisture content and depth of core sample for marine muds in eastern Louisiana
Description
Data for Exercise 9.32
Usage
Moisture
Format
A data frame/tibble with 16 observations on four variables
- depth
a numeric vector
- moisture
g of water per 100 g of dried sediment
- lnmoist
a numeric vector
- depthsq
a numeric vector
Source
Davis, J. C. (1986), Statistics and Data Analysis in Geology, 2d. ed., John Wiley and Sons, New York, pp. 177, 185.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(moisture ~ depth, data = Moisture)
model <- lm(moisture ~ depth, data = Moisture)
abline(model, col = "red")
plot(resid(model) ~ depth, data = Moisture)
rm(model)
Carbon monoxide emitted by smoke stacks of a manufacturer and a competitor
Description
Data for Exercise 7.45
Usage
Monoxide
Format
A data frame/tibble with ten observations on two variables
- company
a vector with values manufacturer and competitor
- emission
carbon monoxide emitted
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(emission ~ company, data = Monoxide, col = topo.colors(2))
t.test(emission ~ company, data = Monoxide)
wilcox.test(emission ~ company, data = Monoxide)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Monoxide, aes(x = company, y = emission)) +
geom_boxplot() +
theme_bw()
## End(Not run)
Moral attitude scale on 15 subjects before and after viewing a movie
Description
Data for Exercise 7.53
Usage
Movie
Format
A data frame/tibble with 12 observations on three variables
- before
moral attitude before viewing the movie
- after
moral attitude after viewing the movie
- differ
a numeric vector
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
qqnorm(Movie$differ)
qqline(Movie$differ)
shapiro.test(Movie$differ)
t.test(Movie$differ, conf.level = 0.99)
wilcox.test(Movie$differ)
Improvement scores for identical twins taught music recognition by two techniques
Description
Data for Exercise 7.59
Usage
Music
Format
A data frame/tibble with 12 observations on three variables
- method1
a numeric vector measuring the improvement scores on a music recognition test
- method2
a numeric vector measuring the improvement scores on a music recognition test
- differ
method1 - method2
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
qqnorm(Music$differ)
qqline(Music$differ)
shapiro.test(Music$differ)
t.test(Music$differ)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Music, aes(x = differ)) +
geom_dotplot() +
theme_bw()
## End(Not run)
Estimated value of a brand name product and the company's revenue
Description
Data for Exercises 2.28, 9.19, and Example 2.8
Usage
Name
Format
A data frame/tibble with 42 observations on three variables
- brand
a factor with levels Band-Aid, Barbie, Birds Eye, Budweiser, Camel, Campbell, Carlsberg, Coca-Cola, Colgate, Del Monte, Fisher-Price, Gordon's, Green Giant, Guinness, Haagen-Dazs, Heineken, Heinz, Hennessy, Hermes, Hershey, Ivory, Jell-o, Johnnie Walker, Kellogg, Kleenex, Kraft, Louis Vuitton, Marlboro, Nescafe, Nestle, Nivea, Oil of Olay, Pampers, Pepsi-Cola, Planters, Quaker, Sara Lee, Schweppes, Smirnoff, Tampax, Winston, and Wrigley's
- value
value in billions of dollars
- revenue
revenue in billions of dollars
Source
Financial World.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(value ~ revenue, data = Name)
model <- lm(value ~ revenue, data = Name)
abline(model, col = "red")
cor(Name$value, Name$revenue)
summary(model)
rm(model)
Efficiency of pit crews for three major NASCAR teams
Description
Data for Exercise 10.53
Usage
Nascar
Format
A data frame/tibble with 36 observations on three variables
- time
duration of pit stop (in seconds)
- team
a numeric vector representing team 1, 2, or 3
- ranks
a numeric vector ranking each pit stop in order of speed
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(time ~ team, data = Nascar, col = rainbow(3))
model <- lm(time ~ factor(team), data = Nascar)
summary(model)
anova(model)
rm(model)
Reaction effects of 4 drugs on 25 subjects with a nervous disorder
Description
Data for Example 10.3
Usage
Nervous
Format
A data frame/tibble with 25 observations on two variables
- react
a numeric vector representing reaction time
- drug
a numeric vector indicating each of the 4 drugs
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(react ~ drug, data = Nervous, col = rainbow(4))
model <- aov(react ~ factor(drug), data = Nervous)
summary(model)
TukeyHSD(model)
plot(TukeyHSD(model), las = 1)
Daily profits for 20 newsstands
Description
Data for Exercise 1.43
Usage
Newsstand
Format
A data frame/tibble with 20 observations on one variable
- profit
profit of each newsstand (in dollars)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Newsstand$profit)
stem(Newsstand$profit, scale = 3)
Rating, time in 40-yard dash, and weight of top defensive linemen in the 1994 NFL draft
Description
Data for Exercise 9.63
Usage
Nfldraf2
Format
A data frame/tibble with 47 observations on three variables
- rating
rating of each player on a scale out of 10
- forty
forty yard dash time (in seconds)
- weight
weight of each player (in pounds)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(rating ~ forty, data = Nfldraf2)
summary(lm(rating ~ forty, data = Nfldraf2))
Rating, time in 40-yard dash, and weight of top offensive linemen in the 1994 NFL draft
Description
Data for Exercises 9.10 and 9.16
Usage
Nfldraft
Format
A data frame/tibble with 29 observations on three variables
- rating
rating of each player on a scale out of 10
- forty
forty yard dash time (in seconds)
- weight
weight of each player (in pounds)
Source
USA Today, April 20, 1994.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(rating ~ forty, data = Nfldraft)
cor(Nfldraft$rating, Nfldraft$forty)
summary(lm(rating ~ forty, data = Nfldraft))
Nicotine content versus sales for eight major brands of cigarettes
Description
Data for Exercise 9.21
Usage
Nicotine
Format
A data frame/tibble with eight observations on two variables
- nicotine
nicotine content (in milligrams)
- sales
sales figures (in $100,000)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
model <- lm(sales ~ nicotine, data = Nicotine)
plot(sales ~ nicotine, data = Nicotine)
abline(model, col = "red")
summary(model)
predict(model, newdata = data.frame(nicotine = 1),
interval = "confidence", level = 0.99)
Normal Area
Description
Function that computes and draws the area between two user-specified values in a user-specified normal distribution with a given mean and standard deviation.
Usage
normarea(lower = -Inf, upper = Inf, m, sig)
Arguments
lower | the lower value |
upper | the upper value |
m | the mean for the population |
sig | the standard deviation of the population |
Author(s)
Alan T. Arnholt
Examples
normarea(70, 130, 100, 15)
# Computes and draws P(70 < X < 130) given X is N(100, 15).
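As a rough cross-check of the quantity normarea() reports, the same probability can be computed with pnorm() in base R. The helper shade_normal_area() below is hypothetical and only approximates the kind of picture the function draws; it is not the package's implementation.

```r
m <- 100
sig <- 15
lower <- 70
upper <- 130

# P(lower < X < upper) for X ~ N(m, sig)
area <- pnorm(upper, mean = m, sd = sig) - pnorm(lower, mean = m, sd = sig)
area

# Hypothetical helper that shades the same region under the density curve
shade_normal_area <- function(lower, upper, m, sig) {
  x <- seq(m - 4 * sig, m + 4 * sig, length.out = 400)
  plot(x, dnorm(x, m, sig), type = "l", xlab = "x", ylab = "density")
  # Clip infinite limits to the plotted range before shading
  xs <- seq(max(lower, min(x)), min(upper, max(x)), length.out = 200)
  polygon(c(xs[1], xs, xs[length(xs)]), c(0, dnorm(xs, m, sig), 0),
          col = "skyblue")
}
if (interactive()) shade_normal_area(lower, upper, m, sig)
```

Here 70 and 130 sit two standard deviations either side of the mean, so the area is the familiar "about 95%" figure.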
Required Sample Size
Description
Function to determine required sample size to be within a given margin of error.
Usage
nsize(b, sigma = NULL, p = 0.5, conf.level = 0.95, type = "mu")
Arguments
b | the desired bound. |
sigma | population standard deviation. Not required if using type |
p | estimate for the population proportion of successes. Not required if using type |
conf.level | confidence level for the problem, restricted to lie between zero and one. |
type | character string, one of |
Details
Answer is based on a normal approximation when using type "pi".
Value
Returns required sample size.
Author(s)
Alan T. Arnholt
Examples
nsize(b=.03, p=708/1200, conf.level=.90, type="pi")
# Returns the required sample size (n) to estimate the population
# proportion of successes with a 90% confidence interval
# so that the margin of error is no more than 0.03 when the
# estimate of the population proportion of successes is 708/1200.
# This is problem 5.38 on page 257 of Kitchens' BSDA.
nsize(b=.15, sigma=.31, conf.level=.90, type="mu")
# Returns the required sample size (n) to estimate the population
# mean with a 90% confidence interval so that the margin
# of error is no more than 0.15. This is Example 5.17 on page
# 261 of Kitchens' BSDA.
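The calculation behind a required sample size can be sketched with the usual textbook normal-approximation formulas: solve z * sigma / sqrt(n) <= b for a mean, or z * sqrt(p(1 - p)/n) <= b for a proportion, then round up. The function required_n() below is a hypothetical helper written for illustration; it is not the source of nsize(), and whether nsize() rounds or validates its inputs the same way is an assumption.

```r
# Hypothetical helper; it is not the BSDA implementation of nsize()
required_n <- function(b, sigma = NULL, p = 0.5, conf.level = 0.95,
                       type = c("mu", "pi")) {
  type <- match.arg(type)
  z <- qnorm(1 - (1 - conf.level) / 2)   # two-sided critical value
  n <- if (type == "mu") {
    stopifnot(!is.null(sigma))
    (z * sigma / b)^2                    # from z * sigma / sqrt(n) <= b
  } else {
    p * (1 - p) * (z / b)^2              # from z * sqrt(p * (1 - p) / n) <= b
  }
  ceiling(n)                             # round up to a whole observation
}

required_n(b = 0.03, p = 708 / 1200, conf.level = 0.90, type = "pi")
required_n(b = 0.15, sigma = 0.31, conf.level = 0.90, type = "mu")
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the bound b.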
Normality Tester
Description
Q-Q plots of randomly generated normal data of the same size as the tested data are generated and plotted on the perimeter of the graph while a Q-Q plot of the actual data is depicted in the center of the graph.
Usage
ntester(actual.data)
Arguments
actual.data | a numeric vector. Missing and infinite values are allowed, but are ignored in the calculation. The length of |
Details
Q-Q plots of randomly generated normal data of the same size as the tested data are generated and plotted on the perimeter of the graph sheet while a Q-Q plot of the actual data is depicted in the center of the graph. The p-values are calculated from the Shapiro-Wilk W-statistic. The function only works on numeric vectors containing no more than 5000 observations.
Author(s)
Alan T. Arnholt
References
Shapiro, S.S. and Wilk, M.B. (1965). An analysis of variance test for normality (complete samples). Biometrika 52 : 591-611.
Examples
ntester(rexp(50,1))
# Q-Q plot of random exponential data in center plot
# surrounded by 8 Q-Q plots of randomly generated
# standard normal data of size 50.
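The layout described above can be approximated in base R with a 3 x 3 grid in which the center panel holds the actual data and the other eight hold simulated standard normal samples of the same size. The function qq_lineup() below is a hypothetical sketch of that idea, not the source of ntester(); panel titles and margins are illustrative choices.

```r
# Hypothetical sketch of a Q-Q "lineup"; not the BSDA implementation
qq_lineup <- function(x) {
  x <- x[is.finite(x)]                   # drop missing and infinite values
  n <- length(x)
  stopifnot(n >= 3, n <= 5000)           # sample-size limits of shapiro.test()
  old <- par(mfrow = c(3, 3), mar = c(2, 2, 2, 1))
  on.exit(par(old))                      # restore the graphics settings
  pvals <- numeric(9)
  for (i in 1:9) {
    actual <- (i == 5)                   # panel 5 is the center of the grid
    y <- if (actual) x else rnorm(n)
    pvals[i] <- shapiro.test(y)$p.value
    qqnorm(y, main = sprintf("%s (p = %.3f)",
                             if (actual) "actual" else "simulated", pvals[i]))
    qqline(y)
  }
  invisible(pvals)                       # Shapiro-Wilk p-values, panel order
}

if (interactive()) qq_lineup(rexp(50, 1))
```

If the center panel is easy to pick out from its eight neighbours, the departure from normality is larger than sampling variation alone would usually produce.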
Price of oranges versus size of the harvest
Description
Data for Exercise 9.61
Usage
Orange
Format
A data frame/tibble with six observations on two variables
- harvest
harvest in millions of boxes
- price
average price charged by California growers for a 75-pound box of navel oranges
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(price ~ harvest, data = Orange)
model <- lm(price ~ harvest, data = Orange)
abline(model, col = "red")
summary(model)
rm(model)
Salaries of members of the Baltimore Orioles baseball team
Description
Data for Example 1.3
Usage
Orioles
Format
A data frame/tibble with 27 observations on three variables
- first name
a factor with levels Albert, Arthur, B.J., Brady, Cal, Charles, dl-Delino, dl-Scott, Doug, Harold, Heathcliff, Jeff, Jesse, Juan, Lenny, Mike, Rich, Ricky, Scott, Sidney, Will, and Willis
- last name
a factor with levels Amaral, Anderson, Baines, Belle, Bones, Bordick, Clark, Conine, Deshields, Erickson, Fetters, Garcia, Guzman, Johns, Johnson, Kamieniecki, Mussina, Orosco, Otanez, Ponson, Reboulet, Rhodes, Ripken Jr., Slocumb, Surhoff, Timlin, and Webster
- 1999salary
a numeric vector containing each player's salary (in dollars)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stripchart(Orioles$`1999salary`, method = "stack", pch = 19)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Orioles, aes(x = `1999salary`)) +
geom_dotplot(dotsize = 0.5) +
labs(x = "1999 Salary") +
theme_bw()
## End(Not run)
Arterial blood pressure of 11 subjects before and after receiving oxytocin
Description
Data for Exercise 7.86
Usage
Oxytocin
Format
A data frame/tibble with 11 observations on three variables
- subject
a numeric vector indicating each subject
- before
mean arterial blood pressure of subject before receiving oxytocin
- after
mean arterial blood pressure of subject after receiving oxytocin
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
diff = Oxytocin$after - Oxytocin$before
qqnorm(diff)
qqline(diff)
shapiro.test(diff)
t.test(diff)
rm(diff)
Education backgrounds of parents of entering freshmen at a state university
Description
Data for Exercise 1.32
Usage
Parented
Format
A data frame/tibble with 200 observations on two variables
- education
a factor with levels 4yr college degree, Doctoral degree, Grad degree, H.S grad or less, Some college, and Some grad school
- parent
a factor with levels mother and father
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~education + parent, data = Parented)
T1
barplot(t(T1), beside = TRUE, legend = TRUE, col = c("blue", "red"))
rm(T1)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Parented, aes(x = education, fill = parent)) +
geom_bar(position = "dodge") +
theme_bw() +
theme(axis.text.x = element_text(angle = 85, vjust = 0.5)) +
scale_fill_manual(values = c("pink", "blue")) +
labs(x = "", y = "")
## End(Not run)
Years of experience and number of tickets given by patrolpersons in New York City
Description
Data for Example 9.3
Usage
Patrol
Format
A data frame/tibble with ten observations on three variables
- tickets
number of tickets written per week
- years
patrolperson's experience (in years)
- log_tickets
natural log of tickets
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
model <- lm(tickets ~ years, data = Patrol)
summary(model)
confint(model, level = 0.98)
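The manual does not say how log_tickets is meant to be used; one plausible use, assumed here, is to fit a straight line on the log scale when tickets fall off in a curved pattern with experience. The sketch below does this on hypothetical data (patrol_sim is made up and is not the Patrol data).

```r
# Hypothetical data for ten patrolpersons (not the Patrol data)
patrol_sim <- data.frame(
  years   = c(1, 2, 3, 5, 7, 9, 12, 15, 18, 22),
  tickets = c(42, 38, 35, 27, 24, 18, 15, 11, 9, 6)
)
patrol_sim$log_tickets <- log(patrol_sim$tickets)   # natural log of tickets

# Straight-line fit on the log scale
log_model <- lm(log_tickets ~ years, data = patrol_sim)
coef(log_model)

# Back-transform fitted values to the original ticket scale
fitted_tickets <- exp(fitted(log_model))
round(fitted_tickets, 1)

# exp(slope) is the estimated multiplicative change in tickets per extra year
exp(coef(log_model)["years"])
```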
Karl Pearson's data on heights of brothers and sisters
Description
Data for Exercise 2.20
Usage
Pearson
Format
A data frame/tibble with 11 observations on three variables
- family
number indicating family of brother and sister pair
- brother
height of brother (in inches)
- sister
height of sister (in inches)
Source
Pearson, K. and Lee, A. (1902-3), On the Laws of Inheritance in Man, Biometrika, 2, 357.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(brother ~ sister, data = Pearson, col = "lightblue")
cor(Pearson$brother, Pearson$sister)
Length of long-distance phone calls for a small business firm
Description
Data for Exercise 6.95
Usage
Phone
Format
A data frame/tibble with 20 observations on one variable
- time
duration of long distance phone call (in minutes)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
qqnorm(Phone$time)
qqline(Phone$time)
shapiro.test(Phone$time)
SIGN.test(Phone$time, md = 5, alternative = "greater")
Number of poisonings reported to 16 poison control centers
Description
Data for Exercise 1.113
Usage
Poison
Format
A data frame/tibble with 226,361 observations on one variable
- type
a factor with levels Alcohol, Cleaning agent, Cosmetics, Drugs, Insecticides, and Plants
Source
Centers for Disease Control, Atlanta, Georgia.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~type, data = Poison)
T1
par(mar = c(5.1 + 2, 4.1, 4.1, 2.1))
barplot(sort(T1, decreasing = TRUE), las = 2, col = rainbow(6))
par(mar = c(5.1, 4.1, 4.1, 2.1))
rm(T1)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Poison, aes(x = type, fill = type)) +
geom_bar() +
theme_bw() +
theme(axis.text.x = element_text(angle = 85, vjust = 0.5)) +
guides(fill = "none")
## End(Not run)
Political party and gender in a voting district
Description
Data for Example 8.3
Usage
Politic
Format
A data frame/tibble with 250 observations on two variables
- party
a factor with levels republican, democrat, and other
- gender
a factor with levels female and male
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~party + gender, data = Politic)
T1
chisq.test(T1)
rm(T1)
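What chisq.test() computes for a two-way table can be reproduced by hand: expected counts under independence are (row total x column total) / grand total, and the statistic sums (observed - expected)^2 / expected over the cells. The counts below are hypothetical and are not the Politic data; only the 3 x 2 layout and the total of 250 mirror the description above.

```r
# Hypothetical counts for a 3 x 2 table (not the Politic data)
counts <- matrix(c(60, 45,
                   55, 50,
                   20, 20),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(party = c("republican", "democrat", "other"),
                                 gender = c("female", "male")))

# Expected count under independence: (row total * column total) / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson chi-square statistic, degrees of freedom, and p-value
x2 <- sum((counts - expected)^2 / expected)
df <- (nrow(counts) - 1) * (ncol(counts) - 1)
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

# Compare with the built-in test
res <- chisq.test(counts)
c(by_hand = x2, chisq.test = unname(res$statistic))
```

The continuity correction in chisq.test() applies only to 2 x 2 tables, so it plays no role in a 3 x 2 table like this one.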
Air pollution index for 15 randomly selected days for a major western city
Description
Data for Exercise 5.59
Usage
Pollutio
Format
A data frame/tibble with 15 observations on one variable
- inde
air pollution index
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Pollutio$inde)
t.test(Pollutio$inde, conf.level = 0.98)$conf.int
Porosity measurements on 20 samples of Tensleep Sandstone, Pennsylvanian from Bighorn Basin in Wyoming
Description
Data for Exercise 5.86
Usage
Porosity
Format
A data frame/tibble with 20 observations on one variable
- porosity
porosity measurement (percent)
Source
Davis, J. C. (1986), Statistics and Data Analysis in Geology, 2nd edition, pages 63-65.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Porosity$porosity)
fivenum(Porosity$porosity)
boxplot(Porosity$porosity, col = "lightgreen")
Percent poverty and crime rate for selected cities
Description
Data for Exercises 9.11 and 9.17
Usage
Poverty
Format
A data frame/tibble with 20 observations on four variables
- city
a factor with levels Atlanta; Buffalo; Cincinnati; Cleveland; Dayton, O; Detroit; Flint, Mich; Fresno, C; Gary, Ind; Hartford, C; Laredo; Macon, Ga; Miami; Milwaukee; New Orleans; Newark, NJ; Rochester,NY; Shreveport; St. Louis; and Waco, Tx
- poverty
percent of children living in poverty
- crime
crime rate (per 1000 people)
- population
population of city
Source
Children's Defense Fund and the Bureau of Justice Statistics.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(poverty ~ crime, data = Poverty)
model <- lm(poverty ~ crime, data = Poverty)
abline(model, col = "red")
summary(model)
rm(model)
Robbery rates versus percent low income in eight precincts
Description
Data for Exercises 2.2 and 2.38
Usage
Precinct
Format
A data frame/tibble with eight observations on two variables
- rate
robbery rate (per 1000 people)
- income
percent with low income
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(rate ~ income, data = Precinct)
model <- lm(rate ~ income, data = Precinct)
abline(model, col = "red")
rm(model)
Racial prejudice measured on a sample of 25 high school students
Description
Data for Exercises 5.10 and 5.22
Usage
Prejudic
Format
A data frame/tibble with 25 observations on one variable
- prejud
racial prejudice score
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Prejudic$prejud)
EDA(Prejudic$prejud)
Ages at inauguration and death of U.S. presidents
Description
Data for Exercise 1.126
Usage
Presiden
Format
A data frame/tibble with 43 observations on five variables
- first_initial
a factor with levels A., B., C., D., F., G., G. W., H., J., L., M., R., T., U., W., and Z.
- last_name
a factor with levels Adams, Arthur, Buchanan, Bush, Carter, Cleveland, Clinton, Coolidge, Eisenhower, Fillmore, Ford, Garfield, Grant, Harding, Harrison, Hayes, Hoover, Jackson, Jefferson, Johnson, Kennedy, Lincoln, Madison, McKinley, Monroe, Nixon, Pierce, Polk, Reagan, Roosevelt, Taft, Taylor, Truman, Tyler, VanBuren, Washington, and Wilson
- birth_state
a factor with levels ARK, CAL, CONN, GA, IA, ILL, KY, MASS, MO, NC, NEB, NH, NJ, NY, OH, PA, SC, TEX, VA, and VT
- inaugural_age
President's age at inauguration
- death_age
President's age at death
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
pie(xtabs(~birth_state, data = Presiden))
stem(Presiden$inaugural_age)
stem(Presiden$death_age)
par(mar = c(5.1, 4.1 + 3, 4.1, 2.1))
stripchart(x=list(Presiden$inaugural_age, Presiden$death_age),
method = "stack", col = c("green","brown"), pch = 19, las = 1)
par(mar = c(5.1, 4.1, 4.1, 2.1))
Degree of confidence in the press versus education level for 20 randomly selected persons
Description
Data for Exercise 9.55
Usage
Press
Format
A data frame/tibble with 20 observations on two variables
- education_yrs
years of education
- confidence
degree of confidence in the press (the higher the score, the more confidence)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(confidence ~ education_yrs, data = Press)
model <- lm(confidence ~ education_yrs, data = Press)
abline(model, col = "purple")
summary(model)
rm(model)
Klopfer's prognostic rating scale for subjects receiving behavior modification therapy
Description
Data for Exercise 6.61
Usage
Prognost
Format
A data frame/tibble with 15 observations on one variable
- kprs_score
Klopfer's Prognostic Rating Scale score
Source
Newmark, C., et al. (1973), Predictive Validity of the Rorschach Prognostic Rating Scale with Behavior Modification Techniques, Journal of Clinical Psychology, 29, 246-248.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
EDA(Prognost$kprs_score)
t.test(Prognost$kprs_score, mu = 9)
Effects of four different methods of programmed learning for statistics students
Description
Data for Exercise 10.17
Usage
Program
Format
A data frame/tibble with 44 observations on two variables
- method
a character variable with values method1, method2, method3, and method4
- score
standardized test score
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(score ~ method, col = c("red", "blue", "green", "yellow"), data = Program)
anova(lm(score ~ method, data = Program))
TukeyHSD(aov(score ~ method, data = Program))
par(mar = c(5.1, 4.1 + 4, 4.1, 2.1))
plot(TukeyHSD(aov(score ~ method, data = Program)), las = 1)
par(mar = c(5.1, 4.1, 4.1, 2.1))
PSAT scores versus SAT scores
Description
Data for Exercise 2.50
Usage
Psat
Format
A data frame/tibble with seven observations on two variables
- psat
PSAT score
- sat
SAT score
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
model <- lm(sat ~ psat, data = Psat)
par(mfrow = c(1, 2))
plot(Psat$psat, resid(model))
plot(model, which = 1)
rm(model)
par(mfrow = c(1, 1))
Correct responses for 24 students in a psychology experiment
Description
Data for Exercise 1.42
Usage
Psych
Format
A data frame/tibble with 23 observations on one variable
- score
number of correct responses in a psychology experiment
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Psych$score)
EDA(Psych$score)
Weekly incomes of a random sample of 50 Puerto Rican families in Miami
Description
Data for Exercises 5.22 and 5.65
Usage
Puerto
Format
A data frame/tibble with 50 observations on one variable
- income
weekly family income (in dollars)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Puerto$income)
boxplot(Puerto$income, col = "purple")
t.test(Puerto$income, conf.level = 0.90)$conf.int
Plasma LDL levels in two groups of quail
Description
Data for Exercises 1.53, 1.77, 1.88, 5.66, and 7.50
Usage
Quail
Format
A data frame/tibble with 40 observations on two variables
- group
a character variable with values placebo and treatment
- level
low-density lipoprotein (LDL) cholesterol level
Source
J. McKean, and T. Vidmar (1994), "A Comparison of Two Rank-Based Methods for the Analysis of Linear Models," The American Statistician, 48, 220-229.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(level ~ group, data = Quail, horizontal = TRUE, xlab = "LDL Level",
col = c("yellow", "lightblue"))
Quality control test scores on two manufacturing processes
Description
Data for Exercise 7.81
Usage
Quality
Format
A data frame/tibble with 15 observations on two variables
- process
a character variable with values Process1 and Process2
- score
results of a quality control test
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(score ~ process, data = Quality, col = "lightgreen")
t.test(score ~ process, data = Quality)
Rainfall in an area of west central Kansas and four surrounding counties
Description
Data for Exercise 9.8
Usage
Rainks
Format
A data frame/tibble with 35 observations on five variables
- rain
rainfall (in inches)
- x1
rainfall (in inches)
- x2
rainfall (in inches)
- x3
rainfall (in inches)
- x4
rainfall (in inches)
Source
R. Picard, K. Berk (1990), Data Splitting, The American Statistician, 44, (2), 140-147.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
cor(Rainks)
model <- lm(rain ~ x2, data = Rainks)
summary(model)
Research and development expenditures and sales of a large company
Description
Data for Exercise 9.36 and Example 9.8
Usage
Randd
Format
A data frame/tibble with 12 observations on two variables
- rd
research and development expenditures (in million dollars)
- sales
sales (in million dollars)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(sales ~ rd, data = Randd)
model <- lm(sales ~ rd, data = Randd)
abline(model, col = "purple")
summary(model)
plot(model, which = 1)
rm(model)
Survival times of 20 rats exposed to high levels of radiation
Description
Data for Exercises 1.52, 1.76, 5.62, and 6.44
Usage
Rat
Format
A data frame/tibble with 20 observations on one variable
- survival_time
survival time in weeks for rats exposed to a high level of radiation
Source
J. Lawless, Statistical Models and Methods for Lifetime Data (New York: Wiley, 1982).
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
hist(Rat$survival_time)
qqnorm(Rat$survival_time)
qqline(Rat$survival_time)
summary(Rat$survival_time)
t.test(Rat$survival_time)
t.test(Rat$survival_time, mu = 100, alternative = "greater")
Grade point averages versus teacher's ratings
Description
Data for Example 2.6
Usage
Ratings
Format
A data frame/tibble with 250 observations on two variables
- rating
character variable with students' ratings of instructor (A-F)
- gpa
students' grade point average
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(gpa ~ rating, data = Ratings, xlab = "Student rating of instructor",
ylab = "Student GPA")
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Ratings, aes(x = rating, y = gpa, fill = rating)) +
geom_boxplot() +
theme_bw() +
theme(legend.position = "none") +
labs(x = "Student rating of instructor", y = "Student GPA")
## End(Not run)
Threshold reaction time for persons subjected to emotional stress
Description
Data for Example 6.11
Usage
Reaction
Format
A data frame/tibble with 12 observations on one variable
- time
threshold reaction time (in seconds) for persons subjected to emotional stress
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Reaction$time)
SIGN.test(Reaction$time, md = 15, alternative = "less")
Standardized reading scores for 30 fifth graders
Description
Data for Exercises 1.72 and 2.10
Usage
Reading
Format
A data frame/tibble with 30 observations on four variables
- score
standardized reading test score
- sorted
sorted values of score
- trimmed
trimmed values of sorted
- winsoriz
winsorized values of score
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
hist(Reading$score, main = "Exercise 1.72",
col = "lightgreen", xlab = "Standardized reading score")
summary(Reading$score)
sd(Reading$score)
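The trimmed and winsoriz columns hold processed versions of the sorted scores. The sketch below shows one common way such values are built in base R. The data vector and the choice of handling one value at each end (k = 1) are made up for illustration and are not taken from the Reading data.

```r
# Hypothetical scores (not from the Reading data set)
scores <- c(12, 45, 47, 50, 52, 55, 58, 60, 61, 95)
k <- 1                      # assumed number of values handled at each end
sorted <- sort(scores)
n <- length(sorted)

# Trimming drops the k smallest and k largest values
trimmed <- sorted[(k + 1):(n - k)]

# Winsorizing replaces them with the nearest remaining value instead
winsorized <- sorted
winsorized[seq_len(k)] <- sorted[k + 1]
winsorized[(n - k + 1):n] <- sorted[n - k]

mean(trimmed)      # same as mean(scores, trim = k / n)
mean(winsorized)
```

Trimming shortens the vector, while winsorizing keeps its length and only pulls the extreme values inward.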
Reading scores versus IQ scores
Description
Data for Exercises 2.10 and 2.53
Usage
Readiq
Format
A data frame/tibble with 14 observations on two variables
- reading
reading achievement score
- iq
IQ score
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(reading ~ iq, data = Readiq)
model <- lm(reading ~ iq, data = Readiq)
abline(model, col = "purple")
predict(model, newdata = data.frame(iq = c(100, 120)))
residuals(model)[c(6, 7)]
rm(model)
Opinion on referendum by view on freedom of the press
Description
Data for Exercise 8.20
Usage
Referend
Format
A data frame with 237 observations on two variables
- choice
a factor with levels A, B, and C
- response
a factor with levels for, against, and undecided
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~choice + response, data = Referend)
T1
chisq.test(T1)
chisq.test(T1)$expected
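chisq.test(T1)$expected returns the counts expected under independence. As a rough sketch of where those numbers come from, each expected count is (row total * column total) / grand total. The table below is invented for illustration and is not the Referend data.

```r
# Hypothetical 2 x 3 table of counts (not the Referend data)
tab <- matrix(c(30, 20, 10,
                20, 30, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(choice = c("A", "B"),
                              response = c("for", "against", "undecided")))

# Expected count = row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
expected

# Should agree with the values reported by chisq.test()
all.equal(expected, chisq.test(tab)$expected, check.attributes = FALSE)
```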
Pollution index taken in three regions of the country
Description
Data for Exercise 10.26
Usage
Region
Format
A data frame/tibble with 48 observations on three variables
- pollution
pollution index
- region
region of the country (west, central, and east)
- ranks
ranked values of pollution
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(pollution ~ region, data = Region, col = "gray")
anova(lm(pollution ~ region, data = Region))
Maintenance cost versus age of cash registers in a department store
Description
Data for Exercises 2.3, 2.39, and 2.54
Usage
Register
Format
A data frame/tibble with nine observations on two variables
- age
age of cash register (in years)
- cost
maintenance cost of cash register (in dollars)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(cost ~ age, data = Register)
model <- lm(cost ~ age, data = Register)
abline(model, col = "red")
predict(model, newdata = data.frame(age = c(5, 10)))
plot(model, which = 1)
rm(model)
Rehabilitative potential of 20 prison inmates as judged by two psychiatrists
Description
Data for Exercise 7.61
Usage
Rehab
Format
A data frame/tibble with 20 observations on four variables
- inmate
inmate identification number
- psych1
rating from first psychiatrist on the inmate's rehabilitative potential
- psych2
rating from second psychiatrist on the inmate's rehabilitative potential
- differ
psych1 - psych2
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(Rehab$differ)
qqnorm(Rehab$differ)
qqline(Rehab$differ)
t.test(Rehab$differ)
Math placement test score for 35 freshmen females and 42 freshmen males
Description
Data for Exercise 7.43
Usage
Remedial
Format
A data frame/tibble with 84 observations on two variables
- gender
a character variable with values female and male
- score
math placement score
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(score ~ gender, data = Remedial,
col = c("purple", "blue"))
t.test(score ~ gender, data = Remedial, conf.level = 0.98)
t.test(score ~ gender, data = Remedial, conf.level = 0.98)$conf
wilcox.test(score ~ gender, data = Remedial,
conf.int = TRUE, conf.level = 0.98)
Weekly rentals for 45 apartments
Description
Data for Exercise 1.122
Usage
Rentals
Format
A data frame/tibble with 45 observations on one variable
- rent
weekly apartment rental price (in dollars)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Rentals$rent)
sum(Rentals$rent < mean(Rentals$rent) - 3*sd(Rentals$rent) |
Rentals$rent > mean(Rentals$rent) + 3*sd(Rentals$rent))
Recorded times for repairing 22 automobiles involved in wrecks
Description
Data for Exercise 5.77
Usage
Repair
Format
A data frame/tibble with 22 observations on one variable
- time
time to repair a wrecked car (in hours)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Repair$time)
SIGN.test(Repair$time, conf.level = 0.98)
Length of employment versus gross sales for 10 employees of a large retail store
Description
Data for Exercise 9.59
Usage
Retail
Format
A data frame/tibble with 10 observations on two variables
- months
length of employment (in months)
- sales
employee gross sales (in dollars)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(sales ~ months, data = Retail)
model <- lm(sales ~ months, data = Retail)
abline(model, col = "blue")
summary(model)
Oceanography data obtained at site 1 by scientists aboard the ship Ron Brown
Description
Data for Exercise 2.9
Usage
Ronbrown1
Format
A data frame/tibble with 75 observations on two variables
- depth
ocean depth (in meters)
- temperature
ocean temperature (in Celsius)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(temperature ~ depth, data = Ronbrown1, ylab = "Temperature")
Oceanography data obtained at site 2 by scientists aboard the ship Ron Brown
Description
Data for Exercise 2.56 and Example 2.4
Usage
Ronbrown2
Format
A data frame/tibble with 150 observations on three variables
- depth
ocean depth (in meters)
- temperature
ocean temperature (in Celsius)
- salinity
ocean salinity level
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(salinity ~ depth, data = Ronbrown2)
model <- lm(salinity ~ depth, data = Ronbrown2)
summary(model)
plot(model, which = 1)
rm(model)
Social adjustment scores for a rural group and a city group of children
Description
Data for Example 7.16
Usage
Rural
Format
A data frame/tibble with 33 observations on two variables
- score
child's social adjustment score
- area
character variable with values city and rural
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(score ~ area, data = Rural)
wilcox.test(score ~ area, data = Rural)
## Not run:
library(dplyr)
Rural <- dplyr::mutate(Rural, r = rank(score))
Rural
t.test(r ~ area, data = Rural)
## End(Not run)
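The examples above run the Wilcoxon rank sum test directly and then a t-test on the ranks. To make the link between ranks and the reported W statistic explicit, here is a small sketch on invented scores (the toy data frame is hypothetical and is not the Rural data): W is the sum of the ranks in the first group minus n1(n1 + 1)/2.

```r
# Hypothetical scores with no ties (not the Rural data)
toy <- data.frame(
  score = c(42, 55, 61, 48, 70, 39, 52, 66, 58),
  area  = c("city", "city", "city", "city",
            "rural", "rural", "rural", "rural", "rural")
)

# Rank all scores together, ignoring group membership
toy$r <- rank(toy$score)

# Rank sum for the first group (alphabetically, "city"), shifted by n1(n1 + 1)/2
n1 <- sum(toy$area == "city")
w_by_hand <- sum(toy$r[toy$area == "city"]) - n1 * (n1 + 1) / 2

# Compare with the statistic reported by wilcox.test()
w_reported <- unname(wilcox.test(score ~ area, data = toy)$statistic)
c(by_hand = w_by_hand, reported = w_reported)
```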
Starting salaries for 25 new PhD psychologists
Description
Data for Exercise 3.66
Usage
Salary
Format
A data frame/tibble with 25 observations on one variable
- salary
starting salary for Ph.D. psychologists (in dollars)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
qqnorm(Salary$salary, pch = 19, col = "purple")
qqline(Salary$salary, col = "blue")
Surface-water salinity measurements from Whitewater Bay, Florida
Description
Data for Exercises 5.27 and 5.64
Usage
Salinity
Format
A data frame/tibble with 48 observations on one variable
- salinity
surface-water salinity value
Source
J. Davis, Statistics and Data Analysis in Geology, 2nd ed. (New York: John Wiley, 1986).
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Salinity$salinity)
qqnorm(Salinity$salinity, pch = 19, col = "purple")
qqline(Salinity$salinity, col = "blue")
t.test(Salinity$salinity, conf.level = 0.99)
t.test(Salinity$salinity, conf.level = 0.99)$conf
SAT scores, percent taking exam and state funding per student by state for 1994, 1995 and 1999
Description
Data for Statistical Insight Chapter 9
Usage
Sat
Format
A data frame/tibble with 102 observations on seven variables
- state
U.S. state
- verbal
verbal SAT score
- math
math SAT score
- total
combined verbal and math SAT score
- percent
percent of high school seniors taking the SAT
- expend
state expenditure per student (in dollars)
- year
year
Source
The 2000 World Almanac and Book of Facts, Funk and Wagnalls Corporation, New Jersey.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
Sat94 <- Sat[Sat$year == 1994, ]
Sat94
Sat99 <- subset(Sat, year == 1999)
Sat99
stem(Sat99$total)
plot(total ~ percent, data = Sat99)
model <- lm(total ~ percent, data = Sat99)
abline(model, col = "blue")
summary(model)
rm(model)
Problem asset ratio for savings and loan companies in California, New York, and Texas
Description
Data for Exercises 10.34 and 10.49
Usage
Saving
Format
A data frame/tibble with 65 observations on two variables
- par
problem-asset-ratio for Savings & Loans that were listed as being financially troubled in 1992
- state
U.S. state
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(par ~ state, data = Saving, col = "red")
boxplot(par ~ state, data = Saving, log = "y", col = "red")
model <- aov(par ~ state, data = Saving)
summary(model)
plot(TukeyHSD(model))
kruskal.test(par ~ factor(state), data = Saving)
Readings obtained from a 100 pound weight placed on four brands of bathroom scales
Description
Data for Exercise 1.89
Usage
Scales
Format
A data frame/tibble with 20 observations on two variables
- brand
variable indicating brand of bathroom scale (A, B, C, or D)
- reading
recorded value (in pounds) of a 100 pound weight
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(reading ~ brand, data = Scales, col = rainbow(4),
ylab = "Weight (lbs)")
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Scales, aes(x = brand, y = reading, fill = brand)) +
geom_boxplot() +
labs(y = "weight (lbs)") +
theme_bw() +
theme(legend.position = "none")
## End(Not run)
Exam scores for 17 patients to assess the learning ability of schizophrenics after taking a specified dose of a tranquilizer
Description
Data for Exercise 6.99
Usage
Schizop2
Format
A data frame/tibble with 17 observations on one variable
- score
schizophrenic's score on a second standardized exam
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
hist(Schizop2$score, xlab = "score on standardized test after a tranquilizer",
main = "Exercise 6.99", breaks = 10, col = "orange")
EDA(Schizop2$score)
SIGN.test(Schizop2$score, md = 22, alternative = "greater")
Standardized exam scores for 13 patients to investigate the learning ability of schizophrenics after a specified dose of a tranquilizer
Description
Data for Example 6.10
Usage
Schizoph
Format
A data frame/tibble with 13 observations on one variable
- score
schizophrenic's score on a standardized exam one hour after receiving a specified dose of a tranquilizer.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
hist(Schizoph$score, xlab = "score on standardized test",
main = "Example 6.10", breaks = 10, col = "orange")
EDA(Schizoph$score)
t.test(Schizoph$score, mu = 20)
Injury level versus seatbelt usage
Description
Data for Exercise 8.24
Usage
Seatbelt
Format
A data frame/tibble with 86,759 observations on two variables
- seatbelt
a factor with levels No and Yes
- injuries
a factor with levels None, Minimal, Minor, or Major indicating the extent of the driver's injuries
Source
Jobson, J. (1982), Applied Multivariate Data Analysis, Springer-Verlag, New York, p. 18.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~seatbelt + injuries, data = Seatbelt)
T1
chisq.test(T1)
rm(T1)
Self-confidence scores for 9 women before and after instructions on self-defense
Description
Data for Example 7.19
Usage
Selfdefe
Format
A data frame/tibble with nine observations on three variables
- woman
number identifying the woman
- before
before the course self-confidence score
- after
after the course self-confidence score
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
Selfdefe$differ <- Selfdefe$after - Selfdefe$before
Selfdefe
t.test(Selfdefe$differ, alternative = "greater")
Reaction times of 30 senior citizens applying for driver's license renewals
Description
Data for Exercises 1.83 and 3.67
Usage
Senior
Format
A data frame/tibble with 31 observations on one variable
- reaction
reaction time for senior citizens applying for a driver's license renewal
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Senior$reaction)
fivenum(Senior$reaction)
boxplot(Senior$reaction, main = "Problem 1.83, part d",
horizontal = TRUE, col = "purple")
Sentences of 41 prisoners convicted of a homicide offense
Description
Data for Exercise 1.123
Usage
Sentence
Format
A data frame/tibble with 41 observations on one variable
- months
sentence length (in months) for prisoners convicted of homicide
Source
U.S. Department of Justice, Bureau of Justice Statistics, Prison Sentences and Time Served for Violence, NCJ-153858, April 1995.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Sentence$months)
ll <- mean(Sentence$months)-2*sd(Sentence$months)
ul <- mean(Sentence$months)+2*sd(Sentence$months)
limits <- c(ll, ul)
limits
rm(ul, ll, limits)
Effects of a drug and electroshock therapy on the ability to solve simple tasks
Description
Data for Exercises 10.11 and 10.12
Usage
Shkdrug
Format
A data frame/tibble with 64 observations on two variables
- treatment
type of treatment: Drug/NoS, Drug/Shk, NoDg/NoS, or NoDrug/S
- response
number of tasks completed in a 10-minute period
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(response ~ treatment, data = Shkdrug, col = "gray")
model <- lm(response ~ treatment, data = Shkdrug)
anova(model)
rm(model)
Effect of experimental shock on time to complete difficult task
Description
Data for Exercise 10.50
Usage
Shock
Format
A data frame/tibble with 27 observations on two variables
- group
grouping variable with values of Group1 (no shock), Group2 (medium shock), and Group3 (severe shock)
- attempts
number of attempts to complete a task
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(attempts ~ group, data = Shock, col = "violet")
model <- lm(attempts ~ group, data = Shock)
anova(model)
rm(model)
Sales receipts versus shoplifting losses for a department store
Description
Data for Exercise 9.58
Usage
Shoplift
Format
A data frame/tibble with eight observations on two variables
- sales
sales (in 1000 dollars)
- loss
loss (in 100 dollars)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(loss ~ sales, data = Shoplift)
model <- lm(loss ~ sales, data = Shoplift)
summary(model)
rm(model)
James Short's measurements of the parallax of the sun
Description
Data for Exercise 6.65
Usage
Short
Format
A data frame/tibble with 158 observations on two variables
- sample
sample number
- parallax
parallax measurements (seconds of a degree)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
hist(Short$parallax, main = "Problem 6.65",
xlab = "", col = "orange")
SIGN.test(Short$parallax, md = 8.798)
t.test(Short$parallax, mu = 8.798)
Number of people riding shuttle versus number of automobiles in the downtown area
Description
Data for Exercise 9.20
Usage
Shuttle
Format
A data frame/tibble with 15 observations on two variables
- users
number of shuttle riders
- autos
number of automobiles in the downtown area
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(autos ~ users, data = Shuttle)
model <- lm(autos ~ users, data = Shuttle)
summary(model)
rm(model)
Sign Test
Description
This function tests a hypothesis based on the sign test and reports linearly interpolated confidence intervals for one-sample problems.
Usage
SIGN.test(
x,
y = NULL,
md = 0,
alternative = "two.sided",
conf.level = 0.95,
...
)
Arguments
x | numeric vector; |
y | optional numeric vector; |
md | a single number representing the value of the population median specified by the null hypothesis |
alternative | is a character string, one of |
conf.level | confidence level for the returned confidence interval, restricted to lie between zero and one |
... | further arguments to be passed to or from methods |
Details
Computes a “Dependent-samples Sign-Test” if both x and y are provided. If only x is provided, computes the “Sign-Test”.
Value
A list of class htest_S, containing the following components:
statistic | the S-statistic (the number of positive differences between the data and the hypothesized median), with names attribute “S”. |
p.value | the p-value for the test |
conf.int | is a confidence interval (vector of length 2) for the true median based on linear interpolation. The confidence level is recorded in the attribute |
estimate | is a vector of length 1, giving the sample median; this estimates the corresponding population parameter. Component |
null.value | is the value of the median specified by the null hypothesis. This equals the input argument |
alternative | records the value of the input argument alternative: |
data.name | a character string (vector of length 1) containing the actual name of the input vector |
Confidence.Intervals | a 3 by 3 matrix containing the lower achieved confidence interval, the interpolated confidence interval, and the upper achieved confidence interval |
Null Hypothesis
For the one-sample sign-test, the null hypothesis is that the median of the population from which x is drawn is md. For the two-sample dependent case, the null hypothesis is that the median for the differences of the populations from which x and y are drawn is md. The alternative hypothesis indicates the direction of divergence of the population median for x from md (i.e., "greater", "less", "two.sided").
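As a rough illustration of the null hypothesis and the S statistic described above, the one-sided test on the reaction-time values (the same values used in the Examples for this function) can be reproduced with base R alone. This sketch assumes that observations equal to md are dropped before counting, which is the usual convention for the sign test. It is not a copy of the SIGN.test source.

```r
reaction <- c(14.3, 13.7, 15.4, 14.7, 12.4, 13.1, 9.2, 14.2,
              14.4, 15.8, 11.3, 15.0)
md <- 15

# Assumption: values equal to md carry no sign and are dropped
d <- reaction - md
d <- d[d != 0]
n <- length(d)

# S = number of positive differences
S <- sum(d > 0)

# Under H0, S ~ Binomial(n, 0.5); alternative "less" uses the lower tail
p_less <- pbinom(S, size = n, prob = 0.5)
c(S = S, n = n, p.value = p_less)

# The same p-value from the exact binomial test
binom.test(S, n, p = 0.5, alternative = "less")$p.value
```

For alternative = "greater" the upper tail P(B >= S) would be used instead, and a two-sided p-value is commonly taken as twice the smaller tail, capped at 1.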
Note
The reported confidence interval is based on linear interpolation. The lower and upper confidence levels are exact.
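The exact lower and upper confidence levels mentioned in the Note come from intervals whose endpoints are order statistics. A minimal sketch, using the x vector from the Examples: for a continuous population, the interval from the k-th smallest to the k-th largest value covers the median with probability 1 - 2 * P(B <= k - 1), where B ~ Binomial(n, 0.5). The helper name achieved_level is hypothetical, and the interpolation step performed by SIGN.test is not reproduced here.

```r
x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
n <- length(x)
xs <- sort(x)

# Hypothetical helper: exact coverage of the interval (xs[k], xs[n - k + 1])
achieved_level <- function(k, n) 1 - 2 * pbinom(k - 1, size = n, prob = 0.5)

# Candidate values of k and their exact confidence levels
k <- 1:floor(n / 2)
conf_levels <- achieved_level(k, n)

# The two intervals whose levels bracket 95%
k_upper <- max(k[conf_levels >= 0.95])   # wider interval, level above 95%
k_lower <- k_upper + 1                   # narrower interval, level below 95%

rbind(
  lower_achieved = c(level = achieved_level(k_lower, n),
                     L = xs[k_lower], U = xs[n - k_lower + 1]),
  upper_achieved = c(level = achieved_level(k_upper, n),
                     L = xs[k_upper], U = xs[n - k_upper + 1])
)
```

An interpolated 95% interval would fall between these two bracketing intervals.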
Author(s)
Alan T. Arnholt
References
Gibbons, J.D. and Chakraborti, S. (1992). Nonparametric Statistical Inference. Marcel Dekker Inc., New York.
Kitchens, L. J. (2003). Basic Statistics and Data Analysis. Duxbury.
Conover, W. J. (1980). Practical Nonparametric Statistics, 2nd ed. Wiley, New York.
Lehmann, E. L. (1975). Nonparametrics: Statistical Methods Based on Ranks. Holden and Day, San Francisco.
Examples
x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7., 6.4, 7.1, 6.7, 7.6, 6.8)
SIGN.test(x, md = 6.5)
# Computes two-sided sign-test for the null hypothesis
# that the population median for 'x' is 6.5. The alternative
# hypothesis is that the median is not 6.5. An interpolated 95%
# confidence interval for the population median will be computed.
reaction <- c(14.3, 13.7, 15.4, 14.7, 12.4, 13.1, 9.2, 14.2,
14.4, 15.8, 11.3, 15.0)
SIGN.test(reaction, md = 15, alternative = "less")
# Data from Example 6.11 page 330 of Kitchens BSDA.
# Computes one-sided sign-test for the null hypothesis
# that the population median is 15. The alternative
# hypothesis is that the median is less than 15.
# An interpolated 95% upper bound for the population
# median will be computed.
Grade point averages of men and women participating in various sports: an illustration of Simpson's paradox
Description
Data for Example 1.18
Usage
Simpson
Format
A data frame/tibble with 100 observations on three variables
- gpa
grade point average
- sport
sport played (basketball, soccer, or track)
- gender
athlete sex (male, female)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(gpa ~ gender, data = Simpson, col = "violet")
boxplot(gpa ~ sport, data = Simpson, col = "lightgreen")
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Simpson, aes(x = gender, y = gpa, fill = gender)) +
geom_boxplot() +
facet_grid(.~sport) +
theme_bw()
## End(Not run)
Maximum number of situps by participants in an exercise class
Description
Data for Exercise 1.47
Usage
Situp
Format
A data frame/tibble with 20 observations on one variable
- number
maximum number of situps completed in an exercise class after 1 month in the program
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Situp$number)
hist(Situp$number, breaks = seq(0, 70, 10), right = FALSE)
hist(Situp$number, breaks = seq(0, 70, 10), right = FALSE,
freq = FALSE, col = "pink", main = "Problem 1.47",
xlab = "Maximum number of situps")
lines(density(Situp$number), col = "red")
Illustrates the Wilcoxon Rank Sum test
Description
Data for Exercise 7.65
Usage
Skewed
Format
A data frame/tibble with 21 observations on two variables
- C1
values from a sample of size 16 from a particular population
- C2
values from a sample of size 14 from a particular population
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(Skewed$C1, Skewed$C2, col = c("pink", "lightblue"))
wilcox.test(Skewed$C1, Skewed$C2)
Survival times of closely and poorly matched skin grafts on burn patients
Description
Data for Exercise 5.20
Usage
Skin
Format
A data frame/tibble with 11 observations on four variables
- patient
patient identification number
- close
graft survival time in days for a closely matched skin graft on the same burn patient
- poor
graft survival time in days for a poorly matched skin graft on the same burn patient
- differ
difference between close and poor (in days)
Source
R. F. Woolson and P. A. Lachenbruch, "Rank Tests for Censored Matched Pairs," Biometrika, 67(1980), 597-606.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Skin$differ)
boxplot(Skin$differ, col = "pink")
summary(Skin$differ)
Sodium-lithium countertransport activity on 190 individuals from six large English kindred
Description
Data for Exercise 5.116
Usage
Slc
Format
A data frame/tibble with 190 observations on one variable
- slc
Red blood cell sodium-lithium countertransport
Source
Roeder, K. (1994), "A Graphical Technique for Determining the Number of Components in a Mixture of Normals," Journal of the American Statistical Association, 89, 487-495.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
EDA(Slc$slc)
hist(Slc$slc, freq = FALSE, xlab = "sodium lithium countertransport",
main = "", col = "lightblue")
lines(density(Slc$slc), col = "purple")
Water pH levels of 75 water samples taken in the Great Smoky Mountains
Description
Data for Exercises 6.40, 6.59, 7.10, and 7.35
Usage
Smokyph
Format
A data frame/tibble with 75 observations on three variables
- waterph
water sample pH level
- code
character variable with values low (elevation below 0.6 miles) and high (elevation above 0.6 miles)
- elev
elevation in miles
Source
Schmoyer, R. L. (1994), Permutation Tests for Correlation in Regression Errors, Journal of the American Statistical Association, 89, 1507-1516.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
summary(Smokyph$waterph)
tapply(Smokyph$waterph, Smokyph$code, mean)
stripchart(waterph ~ code, data = Smokyph, method = "stack",
pch = 19, col = c("red", "blue"))
t.test(Smokyph$waterph, mu = 7)
SIGN.test(Smokyph$waterph, md = 7)
t.test(waterph ~ code, data = Smokyph, alternative = "less")
t.test(waterph ~ code, data = Smokyph, conf.level = 0.90)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Smokyph, aes(x = waterph, fill = code)) +
geom_dotplot() +
facet_grid(code ~ .) +
guides(fill = "none")
## End(Not run)
Snoring versus heart disease
Description
Data for Exercise 8.21
Usage
Snore
Format
A data frame/tibble with 2,484 observations on two variables
- snore
factor with levels nonsnorer, ocassional snorer, nearly every night, and snores every night
- heartdisease
factor indicating whether the individual has heart disease (no or yes)
Source
Norton, P. and Dunn, E. (1985), Snoring as a Risk Factor for Disease, British Medical Journal, 291, 630-632.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~ heartdisease + snore, data = Snore)
T1
chisq.test(T1)
rm(T1)
Concentration of microparticles in snowfields of Greenland and Antarctica
Description
Data for Exercise 7.87
Usage
Snow
Format
A data frame/tibble with 34 observations on two variables
- concent
concentration of microparticles from melted snow (in parts per billion)
- site
location of snow sample (Antarctica or Greenland)
Source
Davis, J., Statistics and Data Analysis in Geology, John Wiley, New York.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(concent ~ site, data = Snow, col = c("lightblue", "lightgreen"))
Weights of 25 soccer players
Description
Data for Exercise 1.46
Usage
Soccer
Format
A data frame/tibble with 25 observations on one variable
- weight
soccer player's weight (in pounds)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Soccer$weight, scale = 2)
hist(Soccer$weight, breaks = seq(110, 210, 10), col = "orange",
main = "Problem 1.46 \n Weights of Soccer Players",
xlab = "weight (lbs)", right = FALSE)
Median income level for 25 social workers from North Carolina
Description
Data for Exercise 6.63
Usage
Social
Format
A data frame/tibble with 25 observations on one variable
- income
annual income (in dollars) of North Carolina social workers with less than five years experience.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
SIGN.test(Social$income, md = 27500, alternative = "less")
Grade point averages, SAT scores and final grade in college algebra for 20 sophomores
Description
Data for Exercise 2.42
Usage
Sophomor
Format
A data frame/tibble with 20 observations on four variables
- student
identification number
- gpa
grade point average
- sat
SAT math score
- exam
final exam grade in college algebra
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
cor(Sophomor)
plot(exam ~ gpa, data = Sophomor)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Sophomor, aes(x = gpa, y = exam)) +
geom_point()
ggplot2::ggplot(data = Sophomor, aes(x = sat, y = exam)) +
geom_point()
## End(Not run)
Murder rates for 30 cities in the South
Description
Data for Exercise 1.84
Usage
South
Format
A data frame/tibble with 31 observations on one variable
- rate
murder rate per 100,000 people
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(South$rate, col = "gray", ylab = "Murder rate per 100,000 people")
Speed reading scores before and after a course on speed reading
Description
Data for Exercise 7.58
Usage
Speed
Format
A data frame/tibble with 15 observations on four variables
- before
reading comprehension score before taking a speed-reading course
- after
reading comprehension score after taking a speed-reading course
- differ
after - before (reading comprehension scores)
- signranks
signed ranks of the differences
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
t.test(Speed$differ, alternative = "greater")
t.test(Speed$signranks, alternative = "greater")
wilcox.test(Pair(Speed$after, Speed$before) ~ 1, data = Speed, alternative = "greater")
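A column of signed ranks such as signranks can be sketched as follows. This assumes the standard construction (rank the absolute differences, then restore the sign); the data are hypothetical, contain no ties or zero differences, and are not taken from the Speed data set.

```r
# Hypothetical before/after scores (NOT the Speed data set)
before <- c(57, 80, 64, 70, 90, 59)
after  <- c(60, 90, 62, 74, 95, 58)

differ <- after - before                        # after minus before
signranks <- sign(differ) * rank(abs(differ))   # rank |d|, then restore the sign
signranks

# The sum of the positive signed ranks is the Wilcoxon signed-rank statistic V
V <- sum(signranks[signranks > 0])
V
wilcox.test(differ, alternative = "greater")$statistic
```

With ties or zero differences, the ranking step needs a convention (for example mid-ranks), which this sketch does not address.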
Standardized spelling test scores for two fourth grade classes
Description
Data for Exercise 7.82
Usage
Spellers
Format
A data frame/tibble with ten observations on two variables
- teacher
character variable with values Fourth and Colleague
- score
score on a standardized spelling test
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(score ~ teacher, data = Spellers, col = "pink")
t.test(score ~ teacher, data = Spellers)
Spelling scores for 9 eighth graders before and after a 2-week course of instruction
Description
Data for Exercise 7.56
Usage
Spelling
Format
A data frame/tibble with nine observations on three variables
- before
spelling score before a 2-week course of instruction
- after
spelling score after a 2-week course of instruction
- differ
after - before (spelling score)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
qqnorm(Spelling$differ)
qqline(Spelling$differ)
shapiro.test(Spelling$differ)
t.test(Spelling$differ)
Favorite sport by gender
Description
Data for Exercise 8.32
Usage
Sports
Format
A data frame/tibble with 200 observations on two variables
- gender
a factor with levels male and female
- sport
a factor with levels football, basketball, baseball, and tennis
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~gender + sport, data = Sports)
T1
chisq.test(T1)
rm(T1)
Convictions in spouse murder cases by gender
Description
Data for Exercise 8.33
Usage
Spouse
Format
A data frame/tibble with 540 observations on two variables
- result
a factor with levels not prosecuted, pleaded guilty, convicted, and acquited
- spouse
a factor with levels husband and wife
Source
Bureau of Justice Statistics (September 1995), Spouse Murder Defendants in Large Urban Counties, Executive Summary, NCJ-156831.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~result + spouse, data = Spouse)
T1
chisq.test(T1)
rm(T1)
Simple Random Sampling
Description
Computes all possible samples from a given population using simple random sampling.
Usage
SRS(POPvalues, n)
Arguments
POPvalues | vector containing the population values. |
n | the sample size. |
Value
Returns a matrix containing the possible simple random samples of size n taken from a population POPvalues.
Author(s)
Alan T. Arnholt
Examples
SRS(c(5,8,3),2)
# The rows in the matrix list the values for the 3 possible
# simple random samples of size 2 from the population of 5,8, and 3.
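The enumeration can also be sketched with base R alone. This is an illustration using utils::combn, not the BSDA implementation of SRS, so the row order of the result may differ from what SRS returns.

```r
POPvalues <- c(5, 8, 3)
n <- 2

# combn() returns one combination per column; transpose so that each
# row holds one possible sample of size n
samples <- t(utils::combn(POPvalues, n))
samples

# Number of possible samples is choose(N, n)
nrow(samples) == choose(length(POPvalues), n)

# Sampling distribution of the sample mean over all possible samples
xbar <- rowMeans(samples)
xbar
mean(xbar)   # equals the population mean
```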
Times of a 2-year-old stallion on a one mile run
Description
Data for Exercise 6.93
Usage
Stable
Format
A data frame/tibble with nine observations on one variable
- time
time (in seconds) for horse to run 1 mile
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
SIGN.test(Stable$time, md = 98.5, alternative = "greater")
Thicknesses of 1872 Hidalgo stamps issued in Mexico
Description
Data for Statistical Insight Chapter 1 and Exercise 5.110
Usage
Stamp
Format
A data frame/tibble with 485 observations on one variable
- thickness
stamp thickness (in mm)
Source
Izenman, A., Sommer, C. (1988), Philatelic Mixtures and Multimodal Densities, Journal of the American Statistical Association, 83, 941-953.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
hist(Stamp$thickness, freq = FALSE, col = "lightblue",
main = "", xlab = "stamp thickness (mm)")
lines(density(Stamp$thickness), col = "blue")
t.test(Stamp$thickness, conf.level = 0.99)
Grades for two introductory statistics classes
Description
Data for Exercise 7.30
Usage
Statclas
Format
A data frame/tibble with 72 observations on two variables
- class
class meeting time (9am or 2pm)
- score
grade for an introductory statistics class
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
str(Statclas)
boxplot(score ~ class, data = Statclas, col = "red")
t.test(score ~ class, data = Statclas)
Operating expenditures per resident for each of the state law enforcement agencies
Description
Data for Exercise 6.62
Usage
Statelaw
Format
A data frame/tibble with 50 observations on two variables
- state
U.S. state
- cost
dollars spent per resident on law enforcement
Source
Bureau of Justice Statistics, Law Enforcement Management and Administrative Statistics, 1993, NCJ-148825, September 1995, page 84.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
EDA(Statelaw$cost)
SIGN.test(Statelaw$cost, md = 8, alternative = "less")
Test scores for two beginning statistics classes
Description
Data for Exercises 1.70 and 1.87
Usage
Statisti
Format
A data frame/tibble with 62 observations on two variables
- class
character variable with values Class1 and Class2
- score
test score for an introductory statistics test
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(score ~ class, data = Statisti, col = "violet")
tapply(Statisti$score, Statisti$class, summary, na.rm = TRUE)
## Not run:
library(dplyr)
dplyr::group_by(Statisti, class) %>%
summarize(Mean = mean(score, na.rm = TRUE),
Median = median(score, na.rm = TRUE),
SD = sd(score, na.rm = TRUE),
RS = IQR(score, na.rm = TRUE))
## End(Not run)
STEP science test scores for a class of ability-grouped students
Description
Data for Exercise 6.79
Usage
Step
Format
A data frame/tibble with 12 observations on one variable
- score
State test of educational progress (STEP) science test score
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
EDA(Step$score)
t.test(Step$score, mu = 80, alternative = "less")
wilcox.test(Step$score, mu = 80, alternative = "less")
Short-term memory test scores on 12 subjects before and after a stressful situation
Description
Data for Example 7.20
Usage
Stress
Format
A data frame/tibble with 12 observations on two variables
- prestress
short-term memory score before being exposed to a stressful situation
- poststress
short-term memory score after being exposed to a stressful situation
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
diff <- Stress$prestress - Stress$poststress
qqnorm(diff)
qqline(diff)
t.test(diff)
## Not run:
wilcox.test(Pair(Stress$prestress, Stress$poststress)~1, data = Stress)
## End(Not run)
Number of hours studied per week by a sample of 50 freshmen
Description
Data for Exercise 5.25
Usage
Study
Format
A data frame/tibble with 50 observations on one variable
- hours
number of hours a week freshmen reported studying for their courses
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Study$hours)
hist(Study$hours, col = "violet")
summary(Study$hours)
Number of German submarines sunk by U.S. Navy in World War II
Description
Data for Exercises 2.16, 2.45, and 2.59
Usage
Submarin
Format
A data frame/tibble with 16 observations on three variables
- month
month
- reported
number of submarines reported sunk by U.S. Navy
- actual
number of submarines actually sunk by U.S. Navy
Source
F. Mosteller, S. Fienberg, and R. Rourke, Beginning Statistics with Data Analysis (Reading, MA: Addison-Wesley, 1983).
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
model <- lm(actual ~ reported, data = Submarin)
summary(model)
plot(actual ~ reported, data = Submarin)
abline(model, col = "red")
rm(model)
Time it takes a subway to travel from the airport to downtown
Description
Data for Exercise 5.19
Usage
Subway
Format
A data frame/tibble with 30 observations on one variable
- time
time (in minutes) it takes a subway to travel from the airport to downtown
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
hist(Subway$time, main = "Exercise 5.19",
xlab = "Time (in minutes)", col = "purple")
summary(Subway$time)
Wolfer sunspot numbers from 1700 through 2000
Description
Data for Example 1.7
Usage
Sunspot
Format
A data frame/tibble with 301 observations on two variables
- year
year
- sunspots
average number of sunspots for the year
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(sunspots ~ year, data = Sunspot, type = "l")
## Not run:
library(ggplot2)
lattice::xyplot(sunspots ~ year, data = Sunspot,
main = "Yearly sunspots", type = "l")
lattice::xyplot(sunspots ~ year, data = Sunspot, type = "l",
main = "Yearly sunspots", aspect = "xy")
ggplot2::ggplot(data = Sunspot, aes(x = year, y = sunspots)) +
geom_line() +
theme_bw()
## End(Not run)
Margin of victory in Superbowls I to XXXV
Description
Data for Exercise 1.54
Usage
Superbowl
Format
A data frame/tibble with 35 observations on five variables
- winning_team
name of Superbowl winning team
- winner_score
winning score for the Superbowl
- losing_team
name of Superbowl losing team
- loser_score
score of losing team
- victory_margin
winner_score - loser_score
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Superbowl$victory_margin)
Top speeds attained by five makes of supercars
Description
Data for Statistical Insight Chapter 10
Usage
Supercar
Format
A data frame/tibble with 30 observations on two variables
- speed
top speed (in miles per hour) of car without redlining
- car
name of sports car
Source
Car and Driver (July 1995).
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(speed ~ car, data = Supercar, col = rainbow(6),
ylab = "Speed (mph)")
summary(aov(speed ~ car, data = Supercar))
anova(lm(speed ~ car, data = Supercar))
Ozone concentrations at Mt. Mitchell, North Carolina
Description
Data for Exercise 5.63
Usage
Tablrock
Format
A data frame/tibble with 719 observations on the following 17 variables.
- day
date
- hour
time of day
- ozone
ozone concentration
- tmp
temperature (in Celsius)
- vdc
a numeric vector
- wd
a numeric vector
- ws
a numeric vector
- amb
a numeric vector
- dew
a numeric vector
- so2
a numeric vector
- no
a numeric vector
- no2
a numeric vector
- nox
a numeric vector
- co
a numeric vector
- co2
a numeric vector
- gas
a numeric vector
- air
a numeric vector
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
summary(Tablrock$ozone)
boxplot(Tablrock$ozone)
qqnorm(Tablrock$ozone)
qqline(Tablrock$ozone)
par(mar = c(5.1 - 1, 4.1 + 2, 4.1 - 2, 2.1))
boxplot(ozone ~ day, data = Tablrock,
horizontal = TRUE, las = 1, cex.axis = 0.7)
par(mar = c(5.1, 4.1, 4.1, 2.1))
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Tablrock, aes(sample = ozone)) +
geom_qq() +
theme_bw()
ggplot2::ggplot(data = Tablrock, aes(x = as.factor(day), y = ozone)) +
geom_boxplot(fill = "pink") +
coord_flip() +
labs(x = "") +
theme_bw()
## End(Not run)
Average teachers' salaries across the states in the 70s, 80s, and 90s
Description
Data for Exercise 5.114
Usage
Teacher
Format
A data frame/tibble with 51 observations on three variables
- state
U.S. state
- year
academic year
- salary
average salary (in dollars)
Source
National Education Association.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
par(mfrow = c(3, 1))
hist(Teacher$salary[Teacher$year == "1973-74"],
main = "Teacher salary 1973-74", xlab = "salary",
xlim = range(Teacher$salary, na.rm = TRUE))
hist(Teacher$salary[Teacher$year == "1983-84"],
main = "Teacher salary 1983-84", xlab = "salary",
xlim = range(Teacher$salary, na.rm = TRUE))
hist(Teacher$salary[Teacher$year == "1993-94"],
main = "Teacher salary 1993-94", xlab = "salary",
xlim = range(Teacher$salary, na.rm = TRUE))
par(mfrow = c(1, 1))
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Teacher, aes(x = salary)) +
geom_histogram(fill = "purple", color = "black") +
facet_grid(year ~ .) +
theme_bw()
## End(Not run)
Tennessee self concept scores for 20 gifted high school students
Description
Data for Exercise 6.56
Usage
Tenness
Format
A data frame/tibble with 20 observations on one variable
- score
Tennessee Self-Concept Scale score
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
hist(Tenness$score, freq= FALSE, main = "", col = "green",
xlab = "Tennessee Self-Concept Scale score")
lines(density(Tenness$score))
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Tenness, aes(x = score, y = ..density..)) +
geom_histogram(binwidth = 2, fill = "purple", color = "black") +
geom_density(color = "red", fill = "pink", alpha = 0.3) +
theme_bw()
## End(Not run)
Tensile strength of plastic bags from two production runs
Description
Data for Example 7.11
Usage
Tensile
Format
A data frame/tibble with 72 observations on two variables
- tensile
plastic bag tensile strength (pounds per square inch)
- run
factor with run number (1 or 2)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(tensile ~ run, data = Tensile,
col = c("purple", "cyan"))
t.test(tensile ~ run, data = Tensile)
Grades on the first test in a statistics class
Description
Data for Exercise 5.80
Usage
Test1
Format
A data frame/tibble with 25 observations on one variable
- score
score on first statistics exam
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Test1$score)
boxplot(Test1$score, col = "purple")
Heat loss of thermal pane windows versus outside temperature
Description
Data for Example 9.5
Usage
Thermal
Format
A data frame/tibble with 12 observations on two variables
- temp
temperature (degrees Celsius)
- loss
heat loss (BTUs)
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
model <- lm(loss ~ temp, data = Thermal)
summary(model)
plot(loss ~ temp, data = Thermal)
abline(model, col = "red")
rm(model)
1999-2000 closing prices for TIAA-CREF stocks
Description
Data for your enjoyment
Usage
Tiaa
Format
A data frame/tibble with 365 observations on four variables
- crefstk
closing price (in dollars)
- crefgwt
closing price (in dollars)
- tiaa
closing price (in dollars)
- date
day of the year
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
data(Tiaa)
Time to complete an airline ticket reservation
Description
Data for Exercise 5.18
Usage
Ticket
Format
A data frame/tibble with 20 observations on one variable
- time
time (in seconds) to check out a reservation
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
EDA(Ticket$time)
Consumer Reports (Oct 94) rating of toaster ovens versus the cost
Description
Data for Exercise 9.36
Usage
Toaster
Format
A data frame/tibble with 17 observations on three variables
- toaster
name of toaster
- score
Consumer Reports score
- cost
price of toaster (in dollars)
Source
Consumer Reports (October 1994).
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(cost ~ score, data = Toaster)
model <- lm(cost ~ score, data = Toaster)
summary(model)
names(summary(model))
summary(model)$r.squared
plot(model, which = 1)
Size of tonsils collected from 1,398 children
Description
Data for Exercise 2.78
Usage
Tonsils
Format
A data frame/tibble with 1,398 observations on two variables
- size
a factor with levels
Normal, Large, and Very Large
- status
a factor with levels
Carrier and Non-carrier
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~size + status, data = Tonsils)
T1
prop.table(T1, 1)
prop.table(T1, 1)[2, 1]
barplot(t(T1), legend = TRUE, beside = TRUE, col = c("red", "green"))
## Not run:
library(dplyr)
library(ggplot2)
NDF <- dplyr::count(Tonsils, size, status)
ggplot2::ggplot(data = NDF, aes(x = size, y = n, fill = status)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_manual(values = c("red", "green")) +
theme_bw()
## End(Not run)
The number of torts, average number of months to process a tort, and county population from the court files of the nation's largest counties
Description
Data for Exercise 5.13
Usage
Tort
Format
A data frame/tibble with 45 observations on five variables
- county
U.S. county
- months
average number of months to process a tort
- population
population of the county
- torts
number of torts
- rate
rate per 10,000 residents
Source
U.S. Department of Justice, Tort Cases in Large Counties, Bureau of Justice Statistics Special Report, April 1995.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
EDA(Tort$months)
Hazardous waste sites near minority communities
Description
Data for Exercises 1.55, 5.08, 5.109, 8.58, and 10.35
Usage
Toxic
Format
A data frame/tibble with 51 observations on five variables
- state
U.S. state
- region
U.S. region
- sites
number of commercial hazardous waste sites
- minority
percent of minorities living in communities with commercial hazardous waste sites
- percent
a numeric vector
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
hist(Toxic$sites, col = "red")
hist(Toxic$minority, col = "blue")
qqnorm(Toxic$minority)
qqline(Toxic$minority)
boxplot(sites ~ region, data = Toxic, col = "lightgreen")
tapply(Toxic$sites, Toxic$region, median)
kruskal.test(sites ~ factor(region), data = Toxic)
National Olympic records for women in several races
Description
Data for Exercises 2.97, 5.115, and 9.62
Usage
Track
Format
A data frame with 55 observations on eight variables
- country
athlete's country
- 100m
time in seconds for 100 m
- 200m
time in seconds for 200 m
- 400m
time in seconds for 400 m
- 800m
time in minutes for 800 m
- 1500m
time in minutes for 1500 m
- 3000m
time in minutes for 3000 m
- marathon
time in minutes for marathon
Source
Dawkins, B. (1989), "Multivariate Analysis of National Track Records," The American Statistician, 43(2), 110-115.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(`200m` ~ `100m`, data = Track)
plot(`400m` ~ `100m`, data = Track)
plot(`400m` ~ `200m`, data = Track)
cor(Track[, 2:8])
Olympic winning times for the men's 1500-meter run
Description
Data for Exercise 1.36
Usage
Track15
Format
A data frame/tibble with 26 observations on two variables
- year
Olympic year
- time
Olympic winning time (in seconds) for the 1500-meter run
Source
The World Almanac and Book of Facts, 2000.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(time ~ year, data = Track15, type = "b", pch = 19,
ylab = "1500m time in seconds", col = "green")
Illustrates analysis of variance for three treatment groups
Description
Data for Exercise 10.44
Usage
Treatments
Format
A data frame/tibble with 24 observations on two variables
- score
score from an experiment
- group
factor with levels 1, 2, and 3
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(score ~ group, data = Treatments, col = "violet")
summary(aov(score ~ group, data = Treatments))
summary(lm(score ~ group, data = Treatments))
anova(lm(score ~ group, data = Treatments))
Number of trees in 20 grids
Description
Data for Exercise 1.50
Usage
Trees
Format
A data frame/tibble with 20 observations on one variable
- number
number of trees in a grid
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Trees$number)
hist(Trees$number, main = "Exercise 1.50", xlab = "number",
col = "brown")
Miles per gallon for standard 4-wheel drive trucks manufactured by Chevrolet, Dodge and Ford
Description
Data for Example 10.2
Usage
Trucks
Format
A data frame/tibble with 15 observations on two variables
- mpg
miles per gallon
- truck
a factor with levels
chevy, dodge, and ford
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(mpg ~ truck, data = Trucks, horizontal = TRUE, las = 1)
summary(aov(mpg ~ truck, data = Trucks))
Summarized t-test
Description
Performs a one-sample, two-sample, or a Welch modified two-sample t-test
based on user supplied summary information. Output is identical to that
produced with t.test.
Usage
tsum.test(
mean.x,
s.x = NULL,
n.x = NULL,
mean.y = NULL,
s.y = NULL,
n.y = NULL,
alternative = "two.sided",
mu = 0,
var.equal = FALSE,
conf.level = 0.95
)
Arguments
mean.x | a single number representing the sample mean of x. |
s.x | a single number representing the sample standard deviation for x. |
n.x | a single number representing the sample size for x. |
mean.y | a single number representing the sample mean of y. |
s.y | a single number representing the sample standard deviation for y. |
n.y | a single number representing the sample size for y. |
alternative | is a character string, one of "greater", "less", or "two.sided", indicating the specification of the alternative hypothesis. |
mu | is a single number representing the value of the mean or difference in means specified by the null hypothesis. |
var.equal | logical flag: if TRUE, the variances of the parent populations of x and y are assumed equal and the standard two-sample t-test is used; if FALSE, the Welch modified two-sample t-test is used. |
conf.level | is the confidence level for the returned confidence interval; it must lie between zero and one. |
Details
If y is NULL, a one-sample t-test is carried out with x. If y is not NULL, either a standard or Welch modified two-sample t-test is performed, depending on whether var.equal is TRUE or FALSE.
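The two two-sample variants can be worked out by hand from summary statistics. The following is a sketch of the textbook formulas, not the source code of tsum.test; it reuses the x and y vectors from the Examples section of this help page so the results can be compared with t.test.

```r
x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
y <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5.0, 4.1, 5.5)

# Summary information of the kind that would be passed to tsum.test
mx <- mean(x); sx <- sd(x); nx <- length(x)
my <- mean(y); sy <- sd(y); ny <- length(y)

# Welch modified test (var.equal = FALSE): separate variance estimates
# and Satterthwaite degrees of freedom
se_w <- sqrt(sx^2 / nx + sy^2 / ny)
t_w  <- (mx - my) / se_w
df_w <- se_w^4 / ((sx^2 / nx)^2 / (nx - 1) + (sy^2 / ny)^2 / (ny - 1))

# Standard test (var.equal = TRUE): pooled variance estimate
sp2  <- ((nx - 1) * sx^2 + (ny - 1) * sy^2) / (nx + ny - 2)
t_p  <- (mx - my) / sqrt(sp2 * (1 / nx + 1 / ny))
df_p <- nx + ny - 2

c(t_welch = t_w, df_welch = df_w, t_pooled = t_p, df_pooled = df_p)
```

The Welch values should agree with t.test(x, y) and the pooled values with t.test(x, y, var.equal = TRUE).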
Value
A list of class htest, containing the following components:
statistic | the t-statistic, with names attribute "t". |
parameters | is the degrees of freedom of the t-distribution associated with statistic. Component parameters has names attribute "df". |
p.value | the p-value for the test. |
conf.int | is a confidence interval (vector of length 2) for the true mean or difference in means. The confidence level is recorded in the attribute conf.level. |
estimate | vector of length 1 or 2, giving the sample mean(s) or mean of differences; these estimate the corresponding population parameters. Component estimate has a names attribute describing its elements. |
null.value | the value of the mean or difference in means specified by the null hypothesis. This equals the input argument mu. |
alternative | records the value of the input argument alternative: "greater", "less", or "two.sided". |
data.name | a character string (vector of length 1) containing the names x and y for the two summarized samples. |
Null Hypothesis
For the one-sample t-test, the null hypothesis is that the mean of the population from which x is drawn is mu.
For the standard and Welch modified two-sample t-tests, the null hypothesis is that the population mean for x less that for y is mu.
The alternative hypothesis in each case indicates the direction of divergence of the population mean for x (or difference of means for x and y) from mu (i.e., "greater", "less", or "two.sided").
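How the alternative changes the p-value can be sketched for the one-sample case. This uses the summary values from the first example on this help page (sample mean 5.6, s = 2.1, n = 16, mu = 4.9) and the usual t-distribution tail areas; it is an illustration, not the code of tsum.test.

```r
mean_x <- 5.6; s_x <- 2.1; n_x <- 16
mu <- 4.9                                     # value under the null hypothesis

t_stat <- (mean_x - mu) / (s_x / sqrt(n_x))   # one-sample t-statistic
df <- n_x - 1

p_greater <- pt(t_stat, df, lower.tail = FALSE)           # alternative = "greater"
p_less    <- pt(t_stat, df)                               # alternative = "less"
p_two     <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)  # alternative = "two.sided"

c(t = t_stat, greater = p_greater, less = p_less, two.sided = p_two)
```

The two one-sided p-values sum to one, and the two-sided p-value is twice the smaller of them.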
Author(s)
Alan T. Arnholt
References
Kitchens, L.J. (2003). Basic Statistics and Data Analysis. Duxbury.
Hogg, R. V. and Craig, A. T. (1970). Introduction to Mathematical Statistics, 3rd ed. Toronto, Canada: Macmillan.
Mood, A. M., Graybill, F. A. and Boes, D. C. (1974). Introduction to the Theory of Statistics, 3rd ed. New York: McGraw-Hill.
Snedecor, G. W. and Cochran, W. G. (1980). Statistical Methods, 7th ed. Ames, Iowa: Iowa State University Press.
Examples
tsum.test(mean.x=5.6, s.x=2.1, n.x=16, mu=4.9, alternative="greater")
# Problem 6.31 on page 324 of BSDA states: The chamber of commerce
# of a particular city claims that the mean carbon dioxide
# level of air pollution is no greater than 4.9 ppm. A random
# sample of 16 readings resulted in a sample mean of 5.6 ppm,
# and s=2.1 ppm. One-sided one-sample t-test. The null
# hypothesis is that the population mean for 'x' is 4.9.
# The alternative hypothesis states that it is greater than 4.9.
x <- rnorm(12)
tsum.test(mean(x), sd(x), n.x=12)
# Two-sided one-sample t-test. The null hypothesis is that
# the population mean for 'x' is zero. The alternative
# hypothesis states that it is either greater or less
# than zero. A confidence interval for the population mean
# will be computed. Note: above returns same answer as:
t.test(x)
x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
y <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5.0, 4.1, 5.5)
tsum.test(mean(x), s.x=sd(x), n.x=11, mean(y), s.y=sd(y), n.y=8, mu=2)
# Two-sided Welch modified two-sample t-test (var.equal = FALSE by
# default). The null hypothesis is that the population mean for 'x'
# less that for 'y' is 2.
# The alternative hypothesis is that this difference is not 2.
# A confidence interval for the true difference will be computed.
# Note: above returns same answer as:
t.test(x, y, mu=2)
tsum.test(mean(x), s.x=sd(x), n.x=11, mean(y), s.y=sd(y), n.y=8, conf.level=0.90)
# Two-sided Welch modified two-sample t-test. The null hypothesis
# is that the population mean for 'x' less that for 'y' is zero.
# The alternative hypothesis is that this difference is not
# zero. A 90% confidence interval for the true difference will
# be computed. Note: above returns same answer as:
t.test(x, y, conf.level=0.90)
Percent of students who watch more than 6 hours of TV per day versus national math test scores
Description
Data for Examples 2.1 and 2.7
Usage
Tv
Format
A data frame/tibble with 53 observations on three variables
- state
U.S. state
- percent
percent of students who watch more than six hours of TV a day
- test
state average on national math test
Source
Educational Testing Services.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(test ~ percent, data = Tv, col = "blue")
cor(Tv$test, Tv$percent)
Intelligence test scores for identical twins in which one twin is given a drug
Description
Data for Exercise 7.54
Usage
Twin
Format
A data frame/tibble with nine observations on three variables
- twinA
score on intelligence test without drug
- twinB
score on intelligence test after taking drug
- differ
twinA - twinB
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
qqnorm(Twin$differ)
qqline(Twin$differ)
shapiro.test(Twin$differ)
t.test(Twin$differ)
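A one-sample t-test on the differences, as in the last call above, is the same analysis as a paired t-test on the two score columns. The sketch below checks this with made-up scores; they are hypothetical and are not the Twin data set.

```r
# Hypothetical scores (NOT the Twin data set)
twinA <- c(100, 110, 95, 120, 105, 98, 112, 101, 107)
twinB <- c(104, 108, 99, 125, 103, 102, 118, 100, 111)
differ <- twinA - twinB

one_sample <- t.test(differ)                    # test on the differences
paired <- t.test(twinA, twinB, paired = TRUE)   # paired test on the columns

# Same t-statistic, degrees of freedom, and p-value
all.equal(unname(one_sample$statistic), unname(paired$statistic))
all.equal(unname(one_sample$parameter), unname(paired$parameter))
all.equal(one_sample$p.value, paired$p.value)
```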
Data set describing a sample of undergraduate students
Description
Data for Exercise 1.15
Usage
Undergrad
Format
A data frame/tibble with 100 observations on six variables
- gender
character variable with values Female and Male
- major
college major
- class
college year group classification
- gpa
grade point average
- sat
Scholastic Assessment Test score
- drops
number of courses dropped
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stripchart(gpa ~ class, data = Undergrad, method = "stack",
col = c("blue","red","green","lightblue"),
pch = 19, main = "GPA versus Class")
stripchart(gpa ~ gender, data = Undergrad, method = "stack",
col = c("red", "blue"), pch = 19,
main = "GPA versus Gender")
stripchart(sat ~ drops, data = Undergrad, method = "stack",
col = c("blue", "red", "green", "lightblue"),
pch = 19, main = "SAT versus Drops")
stripchart(drops ~ gender, data = Undergrad, method = "stack",
col = c("red", "blue"), pch = 19, main = "Drops versus Gender")
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Undergrad, aes(x = sat, y = drops, fill = factor(drops))) +
facet_grid(drops ~.) +
geom_dotplot() +
guides(fill = "none")
## End(Not run)
Number of days of paid holidays and vacation leave for a sample of 35 textile workers
Description
Data for Exercises 6.46 and 6.98
Usage
Vacation
Format
A data frame/tibble with 35 observations on one variable
- number
number of days of paid holidays and vacation leave taken
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(Vacation$number, col = "violet")
hist(Vacation$number, main = "Exercise 6.46", col = "blue",
xlab = "number of days of paid holidays and vacation leave taken")
t.test(Vacation$number, mu = 24)
Reported serious reactions due to vaccines in 11 southern states
Description
Data for Exercise 1.111
Usage
Vaccine
Format
A data frame/tibble with 11 observations on two variables
- state
U.S. state
- number
number of reported serious reactions per million doses of a vaccine
Source
Center for Disease Control, Atlanta, Georgia.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Vaccine$number, scale = 2)
fn <- fivenum(Vaccine$number)
fn
iqr <- IQR(Vaccine$number)
iqr
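One common use of the five-number summary and the IQR is to flag potential outliers with the 1.5 x IQR rule. The sketch below is an illustration of that rule of thumb on made-up values (not the Vaccine data set); the original example does not apply it, and the variable names lower, upper, and outliers are introduced here.

```r
# Hypothetical values (NOT the Vaccine data set)
number <- c(2, 4, 5, 5, 6, 7, 8, 9, 10, 12, 30)

fn <- fivenum(number)          # min, lower hinge, median, upper hinge, max
iqr <- IQR(number)             # Q3 - Q1 using quantile() defaults
q <- quantile(number, c(0.25, 0.75))

# Fences at 1.5 * IQR beyond the quartiles
lower <- unname(q[1] - 1.5 * iqr)
upper <- unname(q[2] + 1.5 * iqr)

outliers <- number[number < lower | number > upper]
outliers
```

Note that the hinges returned by fivenum can differ slightly from the quartiles returned by quantile for some sample sizes.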
Fatality ratings for foreign and domestic vehicles
Description
Data for Exercise 8.34
Usage
Vehicle
Format
A data frame/tibble with 151 observations on two variables
- make
a factor with levels domestic and foreign
- rating
a factor with levels Much better than average, Above average, Average, Below average, and Much worse than average
Source
Insurance Institute for Highway Safety and the Highway Loss Data Institute, 1995.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~make + rating, data = Vehicle)
T1
chisq.test(T1)
Verbal test scores and number of library books checked out for 15 eighth graders
Description
Data for Exercise 9.30
Usage
Verbal
Format
A data frame/tibble with 15 observations on two variables
- number
number of library books checked out
- verbal
verbal test score
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(verbal ~ number, data = Verbal)
abline(lm(verbal ~ number, data = Verbal), col = "red")
summary(lm(verbal ~ number, data = Verbal))
Number of sunspots versus mean annual level of Lake Victoria Nyanza from 1902 to 1921
Description
Data for Exercise 2.98
Usage
Victoria
Format
A data frame/tibble with 20 observations on three variables
- year
year
- level
mean annual level of Lake Victoria Nyanza
- sunspot
number of sunspots
Source
N. Shaw, Manual of Meteorology, Vol. 1 (London: Cambridge University Press, 1942), p. 284; and F. Mosteller and J. W. Tukey, Data Analysis and Regression (Reading, MA: Addison-Wesley, 1977).
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(level ~ sunspot, data = Victoria)
model <- lm(level ~ sunspot, data = Victoria)
summary(model)
rm(model)
Viscosity measurements of a substance on two different days
Description
Data for Exercise 7.44
Usage
Viscosit
Format
A data frame/tibble with 11 observations on two variables
- first
viscosity measurement for a certain substance on day one
- second
viscosity measurement for a certain substance on day two
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(Viscosit$first, Viscosit$second, col = "blue")
t.test(Viscosit$first, Viscosit$second, var.equal = TRUE)
Visual acuity of a group of subjects tested under a specified dose of a drug
Description
Data for Exercise 5.6
Usage
Visual
Format
A data frame/tibble with 18 observations on one variable
- visual
visual acuity measurement
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
stem(Visual$visual)
boxplot(Visual$visual, col = "purple")
Reading scores before and after vocabulary training for 14 employees who did not complete high school
Description
Data for Exercise 7.80
Usage
Vocab
Format
A data frame/tibble with 14 observations on two variables
- first
reading test score before formal vocabulary training
- second
reading test score after formal vocabulary training
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
t.test(Pair(Vocab$first, Vocab$second) ~ 1)
Volume of injected waste water from Rocky Mountain Arsenal and number of earthquakes near Denver
Description
Data for Exercise 9.18
Usage
Wastewat
Format
A data frame/tibble with 44 observations on two variables
- gallons
injected water (in million gallons)
- number
number of earthquakes detected in Denver
Source
Davis, J. C. (1986), Statistics and Data Analysis in Geology, 2 ed., John Wiley and Sons, New York, p. 228, and Bardwell, G. E. (1970), Some Statistical Features of the Relationship between Rocky Mountain Arsenal Waste Disposal and Frequency of Earthquakes, Geological Society of America, Engineering Geology Case Histories, 8, 33-337.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(number ~ gallons, data = Wastewat)
model <- lm(number ~ gallons, data = Wastewat)
summary(model)
anova(model)
plot(model, which = 2)
Weather casualties in 1994
Description
Data for Exercise 1.30
Usage
Weather94
Format
A data frame/tibble with 388 observations on one variable
- type
factor with levels Extreme Temp, Flash Flood, Fog, High Wind, Hurricane, Lighting, Other, River Flood, Thunderstorm, Tornado, and Winter Weather
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
T1 <- xtabs(~type, data = Weather94)
T1
par(mar = c(5.1 + 2, 4.1 - 1, 4.1 - 2, 2.1))
barplot(sort(T1, decreasing = TRUE), las = 2, col = rainbow(11))
par(mar = c(5.1, 4.1, 4.1, 2.1))
## Not run:
library(ggplot2)
T2 <- as.data.frame(T1)
T2
ggplot2::ggplot(data = T2, aes(x = reorder(type, Freq), y = Freq)) +
geom_bar(stat = "identity", fill = "purple") +
theme_bw() +
theme(axis.text.x = element_text(angle = 55, vjust = 0.5)) +
labs(x = "", y = "count")
## End(Not run)
Price of a bushel of wheat versus the national weekly earnings of production workers
Description
Data for Exercise 2.11
Usage
Wheat
Format
A data frame/tibble with 19 observations on three variables
- year
year
- earnings
national weekly earnings (in dollars) for production workers
- price
price for a bushel of wheat (in dollars)
Source
The World Almanac and Book of Facts, 2000.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
par(mfrow = c(1, 2))
plot(earnings ~ year, data = Wheat)
plot(price ~ year, data = Wheat)
par(mfrow = c(1, 1))
Direct current produced by different wind velocities
Description
Data for Exercise 9.34
Usage
Windmill
Format
A data frame/tibble with 25 observations on two variables
- velocity
wind velocity (miles per hour)
- output
power generated (DC volts)
Source
Joglekar, et al. (1989), Lack of Fit Testing when Replicates Are Not Available, The American Statistician, 43,(3), 135-143.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
summary(lm(output ~ velocity, data = Windmill))
anova(lm(output ~ velocity, data = Windmill))
Wind leakage for storm windows exposed to a 50 mph wind
Description
Data for Exercise 6.54
Usage
Window
Format
A data frame/tibble with nine observations on two variables
- window
window number
- leakage
percent leakage from a 50 mph wind
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
SIGN.test(Window$leakage, md = 0.125, alternative = "greater")
Baseball team wins versus seven independent variables for National League teams in 1990
Description
Data for Exercise 9.23
Usage
Wins
Format
A data frame with 12 observations on nine variables
- team
name of team
- wins
number of wins
- batavg
batting average
- rbi
runs batted in
- stole
bases stolen
- strkout
number of strikeouts
- caught
number of times caught stealing
- errors
number of errors
- era
earned run average
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(wins ~ era, data = Wins)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Wins, aes(x = era, y = wins)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
theme_bw()
## End(Not run)
Strength tests of two types of wool fabric
Description
Data for Exercise 7.42
Usage
Wool
Format
A data frame/tibble with 20 observations on two variables
- type
type of wool (Type I, Type 2)
- strength
strength of wool
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
boxplot(strength ~ type, data = Wool, col = c("blue", "purple"))
t.test(strength ~ type, data = Wool, var.equal = TRUE)
Monthly sunspot activity from 1974 to 2000
Description
Data for Exercise 2.7
Usage
Yearsunspot
Format
A data frame/tibble with 252 observations on two variables
- number
average number of sunspots
- year
date
Source
NASA/Marshall Space Flight Center, Huntsville, AL 35812.
References
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Examples
plot(number ~ year, data = Yearsunspot)
Z-test
Description
This function is based on the standard normal distribution and creates confidence intervals and tests hypotheses for both one and two sample problems.
Usage
z.test(
x,
y = NULL,
alternative = "two.sided",
mu = 0,
sigma.x = NULL,
sigma.y = NULL,
conf.level = 0.95
)
Arguments
x |
numeric vector; |
y |
numeric vector; |
alternative |
character string, one of |
mu |
a single number representing the value of the mean or difference in means specified by the null hypothesis |
sigma.x |
a single number representing the population standard
deviation for |
sigma.y |
a single number representing the population standard
deviation for |
conf.level |
confidence level for the returned confidence interval, restricted to lie between zero and one |
Details
If y is NULL, a one-sample z-test is carried out with x. If y is not NULL, a standard two-sample z-test is performed.
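The one-sample case can be sketched by hand in base R. This is a minimal illustration of the textbook z statistic and confidence interval, not the package's internal code; the null value mu0 = 7 and the variable names are assumptions chosen for the example.

```r
# One-sample z statistic and confidence interval computed by hand (base R only).
x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
sigma_x <- 0.5      # assumed known population standard deviation
mu0 <- 7            # hypothetical null value, chosen for illustration
conf_level <- 0.95

n <- length(x)
se <- sigma_x / sqrt(n)                     # standard error of the sample mean
z <- (mean(x) - mu0) / se                   # z statistic
p_two_sided <- 2 * pnorm(-abs(z))           # two-sided p-value
z_crit <- qnorm(1 - (1 - conf_level) / 2)   # standard normal critical value
ci <- mean(x) + c(-1, 1) * z_crit * se      # confidence interval for the mean

z
p_two_sided
ci
```

If the BSDA package is installed, these values should agree with z.test(x, mu = 7, sigma.x = 0.5).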
Value
A list of class htest, containing the following components:
statistic |
the z-statistic, with names attribute |
p.value |
the p-value for the test |
conf.int |
is a confidence
interval (vector of length 2) for the true mean or difference in means. The
confidence level is recorded in the attribute |
estimate |
vector of
length 1 or 2, giving the sample mean(s) or mean of differences; these
estimate the corresponding population parameters. Component |
null.value |
is the
value of the mean or difference in means specified by the null hypothesis.
This equals the input argument |
alternative |
records the
value of the input argument alternative: |
data.name |
a character string (vector of length
1) containing the actual names of the input vectors |
Null Hypothesis
For the one-sample z-test, the null hypothesis is that the mean of the population from which x is drawn is mu.
For the standard two-sample z-test, the null hypothesis is that the population mean for x minus that for y is mu.
The alternative hypothesis in each case indicates the direction of divergence of the population mean for x (or the difference of means for x and y) from mu (i.e., "greater", "less", or "two.sided").
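As a rough illustration of how the three alternatives relate to the test statistic, the following base R sketch computes the two-sample z statistic for a null difference of 2 and the p-value under each alternative. It assumes the usual standard normal tail conventions; the variable names are hypothetical and the code is not taken from the package source.

```r
# Two-sample z statistic and p-values under each alternative (base R only).
x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
y <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5.0, 4.1, 5.5)
sigma_x <- 0.5   # assumed known population standard deviation for x
sigma_y <- 0.5   # assumed known population standard deviation for y
mu0 <- 2         # hypothesized difference: mean of x minus mean of y

# Standard error of the difference in sample means
se <- sqrt(sigma_x^2 / length(x) + sigma_y^2 / length(y))
z <- (mean(x) - mean(y) - mu0) / se

p_values <- c(
  two.sided = 2 * pnorm(-abs(z)),            # difference is not mu0
  less      = pnorm(z),                      # difference is less than mu0
  greater   = pnorm(z, lower.tail = FALSE)   # difference is greater than mu0
)

z
p_values
```

The one-sided p-values sum to one, and the two-sided p-value is twice the smaller of the two.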
Author(s)
Alan T. Arnholt
References
Kitchens, L. J. (2003). Basic Statistics and Data Analysis. Duxbury.
Hogg, R. V. and Craig, A. T. (1970). Introduction to Mathematical Statistics, 3rd ed. Toronto, Canada: Macmillan.
Mood, A. M., Graybill, F. A. and Boes, D. C. (1974). Introduction to the Theory of Statistics, 3rd ed. New York: McGraw-Hill.
Snedecor, G. W. and Cochran, W. G. (1980). Statistical Methods, 7th ed. Ames, Iowa: Iowa State University Press.
See Also
Examples
x <- rnorm(12)
z.test(x,sigma.x=1)
# Two-sided one-sample z-test where the assumed value for
# sigma.x is one. The null hypothesis is that the population
# mean for 'x' is zero. The alternative hypothesis states
# that it is either greater or less than zero. A confidence
# interval for the population mean will be computed.
x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7., 6.4, 7.1, 6.7, 7.6, 6.8)
y <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5., 4.1, 5.5)
z.test(x, sigma.x=0.5, y, sigma.y=0.5, mu=2)
# Two-sided standard two-sample z-test where sigma.x and
# sigma.y are both assumed to equal 0.5. The null hypothesis
# is that the population mean for 'x' minus that for 'y' is 2.
# The alternative hypothesis is that this difference is not 2.
# A confidence interval for the true difference will be computed.
z.test(x, sigma.x=0.5, y, sigma.y=0.5, conf.level=0.90)
# Two-sided standard two-sample z-test where sigma.x and
# sigma.y are both assumed to equal 0.5. The null hypothesis
# is that the population mean for 'x' minus that for 'y' is zero.
# The alternative hypothesis is that this difference is not
# zero. A 90% confidence interval for the true difference will
# be computed.
rm(x, y)
Summarized z-test
Description
This function is based on the standard normal distribution and creates
confidence intervals and tests hypotheses for both one and two sample
problems based on summarized information the user passes to the function.
Output is identical to that produced with z.test.
Usage
zsum.test(
mean.x,
sigma.x = NULL,
n.x = NULL,
mean.y = NULL,
sigma.y = NULL,
n.y = NULL,
alternative = "two.sided",
mu = 0,
conf.level = 0.95
)
Arguments
mean.x |
a single number representing the sample mean of |
sigma.x |
a single number representing the population standard
deviation for |
n.x |
a single number representing the sample size for |
mean.y |
a single number representing the sample mean of |
sigma.y |
a single number representing the population standard
deviation for |
n.y |
a single number representing the sample size for |
alternative |
is a character string, one of |
mu |
a single number representing the value of the mean or difference in means specified by the null hypothesis |
conf.level |
confidence level for the returned confidence interval, restricted to lie between zero and one |
Details
If y is NULL, a one-sample z-test is carried out with x. If y is not NULL, a standard two-sample z-test is performed.
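To illustrate why summarized input is sufficient, the sketch below computes a one-sample z statistic from a mean, a known standard deviation, and a sample size, and checks that it matches the raw-data calculation. The helper z_from_summary is hypothetical and written only for this example; it is not part of the package.

```r
# z statistic from summary values only (base R; helper name is hypothetical).
z_from_summary <- function(mean_x, sigma_x, n_x, mu = 0) {
  (mean_x - mu) / (sigma_x / sqrt(n_x))
}

x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)

# Same statistic computed from the raw data and from its summary
z_raw <- (mean(x) - 0) / (1 / sqrt(length(x)))
z_sum <- z_from_summary(mean(x), sigma_x = 1, n_x = length(x))
all.equal(z_raw, z_sum)

# Summary values only: mean 56/30, sigma 2, n 30, null value 1.8,
# with the "greater" alternative
z_book <- z_from_summary(56 / 30, sigma_x = 2, n_x = 30, mu = 1.8)
p_greater <- pnorm(z_book, lower.tail = FALSE)
z_book
p_greater
```

Only the sample mean and sample size enter the statistic, so the raw observations are not needed once those values and the population standard deviation are known.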
Value
A list of class htest, containing the following components:
statistic |
the z-statistic, with names attribute |
p.value |
the p-value for the test |
conf.int |
is a confidence
interval (vector of length 2) for the true mean or difference in means. The
confidence level is recorded in the attribute |
estimate |
vector of
length 1 or 2, giving the sample mean(s) or mean of differences; these
estimate the corresponding population parameters. Component |
null.value |
the value
of the mean or difference in means specified by the null hypothesis. This
equals the input argument |
alternative |
records the value of
the input argument alternative: |
data.name |
a character string (vector of length
1) containing the names |
Null Hypothesis
For the one-sample z-test, the null hypothesis is that the mean of the population from which x is drawn is mu.
For the standard two-sample z-test, the null hypothesis is that the population mean for x minus that for y is mu.
The alternative hypothesis in each case indicates the direction of divergence of the population mean for x (or the difference of means of x and y) from mu (i.e., "greater", "less", or "two.sided").
Author(s)
Alan T. Arnholt
References
Kitchens, L. J. (2003). Basic Statistics and Data Analysis. Duxbury.
Hogg, R. V. and Craig, A. T. (1970). Introduction to Mathematical Statistics, 3rd ed. Toronto, Canada: Macmillan.
Mood, A. M., Graybill, F. A. and Boes, D. C. (1974). Introduction to the Theory of Statistics, 3rd ed. New York: McGraw-Hill.
Snedecor, G. W. and Cochran, W. G. (1980). Statistical Methods, 7th ed. Ames, Iowa: Iowa State University Press.
See Also
Examples
zsum.test(mean.x=56/30,sigma.x=2, n.x=30, alternative="greater", mu=1.8)
# Example 9.7 part a. from PASWR.
x <- rnorm(12)
zsum.test(mean(x),sigma.x=1,n.x=12)
# Two-sided one-sample z-test where the assumed value for
# sigma.x is one. The null hypothesis is that the population
# mean for 'x' is zero. The alternative hypothesis states
# that it is either greater or less than zero. A confidence
# interval for the population mean will be computed.
# Note: returns same answer as:
z.test(x,sigma.x=1)
#
x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
y <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5.0, 4.1, 5.5)
zsum.test(mean(x), sigma.x=0.5, n.x=11 ,mean(y), sigma.y=0.5, n.y=8, mu=2)
# Two-sided standard two-sample z-test where sigma.x and
# sigma.y are both assumed to equal 0.5. The null hypothesis
# is that the population mean for 'x' minus that for 'y' is 2.
# The alternative hypothesis is that this difference is not 2.
# A confidence interval for the true difference will be computed.
# Note: returns same answer as:
z.test(x, sigma.x=0.5, y, sigma.y=0.5, mu=2)
#
zsum.test(mean(x), sigma.x=0.5, n.x=11, mean(y), sigma.y=0.5, n.y=8,
conf.level=0.90)
# Two-sided standard two-sample z-test where sigma.x and
# sigma.y are both assumed to equal 0.5. The null hypothesis
# is that the population mean for 'x' minus that for 'y' is zero.
# The alternative hypothesis is that this difference is not
# zero. A 90% confidence interval for the true difference will
# be computed. Note: returns same answer as:
z.test(x, sigma.x=0.5, y, sigma.y=0.5, conf.level=0.90)
rm(x, y)