Help for package BSDA

Type: Package
Title: Basic Statistics and Data Analysis
Version: 1.2.2
Date: 2023-09-14
LazyData: yes
Maintainer: Alan T. Arnholt <arnholtat@appstate.edu>
Description: Data sets for book "Basic Statistics and Data Analysis" by Larry J. Kitchens.
Depends: lattice, R (≥ 2.10)
Imports: e1071
License: GPL-3
Suggests: ggplot2 (≥ 2.1.0), dplyr, tidyr
RoxygenNote: 7.2.3
Encoding: UTF-8
URL: https://github.com/alanarnholt/BSDA, https://alanarnholt.github.io/BSDA/
BugReports: https://github.com/alanarnholt/BSDA/issues
NeedsCompilation:
Packaged: 2023-09-18 13:43:50 UTC; arnholtat
Author: Alan T. Arnholt [aut, cre], Ben Evans [aut]
Repository: CRAN
Date/Publication: 2023-09-18 17:50:05 UTC
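
The data sets documented below become available once the package is attached. The following is a minimal sketch, assuming BSDA has already been installed from CRAN; the guard and the name bsda_available are illustrative and not part of the package.

```r
# Check whether BSDA is installed before attaching it (illustrative guard).
bsda_available <- requireNamespace("BSDA", quietly = TRUE)

if (bsda_available) {
  library(BSDA)
  # Names of the data sets shipped with the package
  datasets <- data(package = "BSDA")$results[, "Item"]
  print(head(datasets))
  # Structure of one data set documented below
  str(Abbey)
} else {
  message("BSDA is not installed; try install.packages(\"BSDA\")")
}
```

Because the package sets LazyData to yes, data sets such as Abbey can be used by name after library(BSDA) without an explicit call to data().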

Daily price returns (in pence) of Abbey National shares between 7/31/91 and 10/8/91

Description

Data used in Exercise 6.39

Usage

Abbey

Format

A data frame/tibble with 50 observations on one variable

price: daily price returns (in pence) of Abbey National shares

Source

Buckle, D. (1995), Bayesian Inference for Stable Distributions, Journal of the American Statistical Association, 90, 605-613.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


qqnorm(Abbey$price)
qqline(Abbey$price)
t.test(Abbey$price, mu = 300)
hist(Abbey$price, main = "Exercise 6.39", 
     xlab = "daily price returns (in pence)",
     col = "blue")

Three samples to illustrate analysis of variance

Description

Data used in Exercise 10.1

Usage

Abc

Format

A data frame/tibble with 54 observations on two variables

response: a numeric vector
group: a character vector with values A, B, and C

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(response ~ group, col=c("red", "blue", "green"), data = Abc )
anova(lm(response ~ group, data = Abc))

Crimes reported in Abilene, Texas

Description

Data used in Exercises 1.23 and 2.79

Usage

Abilene

Format

A data frame/tibble with 16 observations on three variables

crimetype: a character variable with values Aggravated assault, Arson, Burglary, Forcible rape, Larceny theft, Murder, Robbery, and Vehicle theft.
year: a factor with levels 1992 and 1999
number: number of reported crimes

Source

Uniform Crime Reports, US Dept. of Justice.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


par(mfrow = c(2, 1))
barplot(Abilene$number[Abilene$year=="1992"],
names.arg = Abilene$crimetype[Abilene$year == "1992"],
main = "1992 Crime Stats", col = "red")
barplot(Abilene$number[Abilene$year=="1999"],
names.arg = Abilene$crimetype[Abilene$year == "1999"],
main = "1999 Crime Stats", col = "blue")
par(mfrow = c(1, 1))

## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Abilene, aes(x = crimetype, y = number, fill = year)) +
           geom_bar(stat = "identity", position = "dodge") +
           theme_bw() +
           theme(axis.text.x = element_text(angle = 30, hjust = 1))

## End(Not run)

Perceived math ability for 13-year-olds by gender

Description

Data used in Exercise 8.57

Usage

Ability

Format

A data frame/tibble with 400 observations on two variables

gender: a factor with levels girls and boys
ability: a factor with levels hopeless, belowavg, average, aboveavg, and superior

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


CT <- xtabs(~gender + ability, data = Ability)
CT
chisq.test(CT)

Abortion rate by region of country

Description

Data used in Exercise 8.51

Usage

Abortion

Format

A data frame/tibble with 51 observations on the following 10 variables:

state: a character variable with values alabama, alaska, arizona, arkansas, california, colorado, connecticut, delaware, dist of columbia, florida, georgia, hawaii, idaho, illinois, indiana, iowa, kansas, kentucky, louisiana, maine, maryland, massachusetts, michigan, minnesota, mississippi, missouri, montana, nebraska, nevada, new hampshire, new jersey, new mexico, new york, north carolina, north dakota, ohio, oklahoma, oregon, pennsylvania, rhode island, south carolina, south dakota, tennessee, texas, utah, vermont, virginia, washington, west virginia, wisconsin, and wyoming
region: a character variable with values midwest, northeast, south, and west
regcode: a numeric vector
rate1988: a numeric vector
rate1992: a numeric vector
rate1996: a numeric vector
provide1988: a numeric vector
provide1992: a numeric vector
lowhigh: a numeric vector
rate: a factor with levels Low and High

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~region + rate, data = Abortion)
T1
chisq.test(T1)

Number of absent days for 20 employees

Description

Data used in Exercise 1.28

Usage

Absent

Format

A data frame/tibble with 20 observations on one variable

days: days absent

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


CT <- xtabs(~ days, data = Absent)
CT
barplot(CT, col = "pink", main = "Exercise 1.28")
plot(ecdf(Absent$days), main = "ECDF")

Math achievement test scores by gender for 25 high school students

Description

Data used in Example 7.14 and Exercise 10.7

Usage

Achieve

Format

A data frame/tibble with 25 observations on two variables

score: mathematics achievement score
gender: a factor with 2 levels boys and girls

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


anova(lm(score ~ gender, data = Achieve))
t.test(score ~ gender, var.equal = TRUE, data = Achieve)

Number of ads versus number of sales for a retailer of satellite dishes

Description

Data used in Exercise 9.15

Usage

Adsales

Format

A data frame/tibble with six observations on three variables

month: a character vector listing month
ads: a numeric vector containing number of ads
sales: a numeric vector containing number of sales

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(sales ~ ads, data = Adsales, main = "Exercise 9.15")
mod <- lm(sales ~ ads, data = Adsales)
abline(mod, col = "red")
summary(mod)
predict(mod, newdata = data.frame(ads = 6), interval = "conf", level = 0.99)

Aggressive tendency scores for a group of teenage members of a street gang

Description

Data used in Exercises 1.66 and 1.81

Usage

Aggress

Format

A data frame/tibble with 28 observations on one variable

aggres: measure of aggressive tendency, ranging from 10 to 50

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


with(data = Aggress,
     EDA(aggres))
# OR
IQR(Aggress$aggres)
diff(range(Aggress$aggres))

Monthly payments per person for families in the AFDC federal program

Description

Data used in Exercises 1.91 and 3.68

Usage

Aid

Format

A data frame/tibble with 51 observations on two variables

state: a factor with levels Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, District of Colunbia, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missour, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
payment: average monthly payment per person in a family

Source

US Department of Health and Human Services, 1993.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


hist(Aid$payment, xlab = "payment", main = 
"Average monthly payment per person in a family", 
col = "lightblue")
boxplot(Aid$payment, col = "lightblue")
dotplot(state ~ payment, data = Aid)

Incubation times for 295 patients thought to be infected with HIV by a blood transfusion

Description

Data used in Exercise 6.60

Usage

Aids

Format

A data frame/tibble with 295 observations on three variables

duration: time (in months) from HIV infection to the clinical manifestation of full-blown AIDS
age: age (in years) of patient
group: a numeric vector

Source

Kalbfleisch, J. and Lawless, J. (1989), An analysis of the data on transfusion related AIDS, Journal of the American Statistical Association, 84, 360-372.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


with(data = Aids,
EDA(duration)
)
with(data = Aids, 
     t.test(duration, mu = 30, alternative = "greater")
)
with(data = Aids, 
     SIGN.test(duration, md = 24, alternative = "greater")
)

Aircraft disasters in five different decades

Description

Data used in Exercise 1.12

Usage

Airdisasters

Format

A data frame/tibble with 141 observations on the following seven variables

year: a numeric vector indicating the year of an aircraft accident
deaths: a numeric vector indicating the number of deaths of an aircraft accident
decade: a character vector indicating the decade of an aircraft accident

Source

2000 World Almanac and Book of Facts.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


par(las = 1)
stripchart(deaths ~ decade, data = Airdisasters, 
           subset = decade != "1930s" & decade != "1940s", 
           method = "stack", pch = 19, cex = 0.5, col = "red", 
           main = "Aircraft Disasters 1950 - 1990", 
           xlab = "Number of fatalities")
par(las = 0)

Percentage of on-time arrivals and number of complaints for 11 airlines

Description

Data for Example 2.9

Usage

Airline

Format

A data frame/tibble with 11 observations on three variables

airline: a character variable with values Alaska, Amer West, American, Continental, Delta, Northwest, Pan Am, Southwest, TWA, United, and USAir
ontime: a numeric vector
complaints: complaints per 1000 passengers

Source

Transportation Department.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


with(data = Airline, 
     barplot(complaints, names.arg = airline, col = "lightblue", 
     las = 2)
)
plot(complaints ~ ontime, data = Airline, pch = 19, col = "red",
     xlab = "On time", ylab = "Complaints")

Ages at which 14 female alcoholics began drinking

Description

Data used in Exercise 5.79

Usage

Alcohol

Format

A data frame/tibble with 14 observations on one variable

age: age when individual started drinking

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


qqnorm(Alcohol$age)
qqline(Alcohol$age)
SIGN.test(Alcohol$age, md = 20, conf.level = 0.99)

Allergy medicines by adverse events

Description

Data used in Exercise 8.22

Usage

Allergy

Format

A data frame/tibble with 406 observations on two variables

event: a factor with levels insomnia, headache, and drowsiness
medication: a factor with levels seldane-d, pseudoephedrine, and placebo

Source

Marion Merrell Dow, Inc. Kansas City, Mo. 64114.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~event + medication, data = Allergy)
T1
chisq.test(T1)

Recovery times for anesthetized patients

Description

Data used in Exercise 5.58

Usage

Anesthet

Format

A data frame/tibble with 10 observations on one variable

recover: recovery time (in hours)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


qqnorm(Anesthet$recover)
qqline(Anesthet$recover)
with(data = Anesthet,
t.test(recover, conf.level = 0.90)$conf
)

Math test scores versus anxiety scores before the test

Description

Data used in Exercise 2.96

Usage

Anxiety

Format

A data frame/tibble with 20 observations on two variables

anxiety: anxiety score before a major math test
math: math test score

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(math ~ anxiety, data = Anxiety, ylab = "score",
     main = "Exercise 2.96")
with(data = Anxiety,
cor(math, anxiety)
)
linmod <- lm(math ~ anxiety, data = Anxiety)
abline(linmod, col = "purple")
summary(linmod)

Level of apolipoprotein B and number of cups of coffee consumed per day for 15 adult males

Description

Data used in Examples 9.2 and 9.9

Usage

Apolipop

Format

A data frame/tibble with 15 observations on two variables

coffee: number of cups of coffee per day
apolipB: level of apolipoprotein B

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(apolipB ~ coffee, data = Apolipop)
linmod <- lm(apolipB ~ coffee, data = Apolipop)
summary(linmod)
summary(linmod)$sigma
anova(linmod)
anova(linmod)[2, 3]^.5
par(mfrow = c(2, 2))
plot(linmod)
par(mfrow = c(1, 1))
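
The two expressions summary(linmod)$sigma and anova(linmod)[2, 3]^.5 in the example above return the same quantity: the residual standard error is the square root of the residual mean square. The sketch below illustrates this with hypothetical values (not the Apolipop data); the names fit, sigma_hat, and root_mse are made up for illustration.

```r
# Hypothetical values (NOT the Apolipop data), for illustration only
coffee  <- c(0, 1, 1, 2, 3, 3, 4, 5, 5, 6)
apolipB <- c(72, 75, 80, 78, 85, 88, 86, 92, 95, 97)

fit <- lm(apolipB ~ coffee)

# Residual standard error as reported by summary()
sigma_hat <- summary(fit)$sigma

# Square root of the residual mean square from the ANOVA table
root_mse <- sqrt(anova(fit)["Residuals", "Mean Sq"])

# TRUE when the two agree up to numerical tolerance
all.equal(sigma_hat, root_mse)
```

Indexing the ANOVA table by row and column name, as here, may be easier to read than the positional [2, 3] form, though both select the same cell for a simple linear regression.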

Median costs of an appendectomy at 20 hospitals in North Carolina

Description

Data for Exercise 1.119

Usage

Append

Format

A data frame/tibble with 20 observations on one variable

fee: fees for an appendectomy for a random sample of 20 hospitals in North Carolina

Source

North Carolina Medical Database Commission, August 1994.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


fee <- Append$fee
ll <- mean(fee) - 2*sd(fee)
ul <- mean(fee) + 2*sd(fee)
limits <-c(ll, ul)
limits
fee[fee < ll | fee > ul]

Median costs of appendectomies at three different types of North Carolina hospitals

Description

Data for Exercise 10.60

Usage

Appendec

Format

A data frame/tibble with 59 observations on two variables

cost: median costs of appendectomies at hospitals across the state of North Carolina in 1992
region: a vector classifying each hospital as rural, regional, or metropolitan

Source

Consumer's Guide to Hospitalization Charges in North Carolina Hospitals (August 1994), North Carolina Medical Database Commission, Department of Insurance.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(cost ~ region, data = Appendec, col = c("red", "blue", "cyan"))
anova(lm(cost ~ region, data = Appendec))

Aptitude test scores versus productivity in a factory

Description

Data for Exercises 2.1, 2.26, 2.35 and 2.51

Usage

Aptitude

Format

A data frame/tibble with 8 observations on two variables

aptitude: aptitude test scores
product: productivity scores

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(product ~ aptitude, data = Aptitude, main = "Exercise 2.1")
model1 <- lm(product ~ aptitude, data = Aptitude)
model1
abline(model1, col = "red", lwd=3)
resid(model1)
fitted(model1)
cor(Aptitude$product, Aptitude$aptitude)

Radiocarbon ages of observations taken from an archaeological site

Description

Data for Exercises 5.120, 10.20 and Example 1.16

Usage

Archaeo

Format

A data frame/tibble with 60 observations on two variables

age: number of years before 1983 - the year the data were obtained
phase: Ceramic Phase numbers

Source

Cunliffe, B. (1984) and Naylor and Smith (1988).

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(age ~ phase, data = Archaeo, col = "yellow", 
        main = "Example 1.16", xlab = "Ceramic Phase", ylab = "Age")
anova(lm(age ~ as.factor(phase), data= Archaeo))

Time of relief for three treatments of arthritis

Description

Data for Exercise 10.58

Usage

Arthriti

Format

A data frame/tibble with 51 observations on two variables

time: time (measured in days) until an arthritis sufferer experienced relief
treatment: a factor with levels A, B, and C

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(time ~ treatment, data = Arthriti, 
col = c("lightblue", "lightgreen", "yellow"),
ylab = "days")
anova(lm(time ~ treatment, data = Arthriti))

Durations of operation for 15 artificial heart transplants

Description

Data for Exercise 1.107

Usage

Artifici

Format

A data frame/tibble with 15 observations on one variable

duration: duration (in hours) for transplant

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Artifici$duration, 2)
summary(Artifici$duration)
values <- Artifici$duration[Artifici$duration < 6.5]
values
summary(values)

Dissolving time versus level of impurities in aspirin tablets

Description

Data for Exercise 10.51

Usage

Asprin

Format

A data frame/tibble with 15 observations on two variables

time: time (in seconds) for aspirin to dissolve
impurity: impurity of an ingredient with levels 1%, 5%, and 10%

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(time ~ impurity, data = Asprin, 
        col = c("red", "blue", "green"))

Asthmatic relief index on nine subjects given a drug and a placebo

Description

Data for Exercise 7.52

Usage

Asthmati

Format

A data frame/tibble with nine observations on three variables

drug: asthmatic relief index for patients given a drug
placebo: asthmatic relief index for patients given a placebo
difference: difference between the placebo and drug

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


qqnorm(Asthmati$difference)
qqline(Asthmati$difference)
shapiro.test(Asthmati$difference)
with(data = Asthmati,
     t.test(placebo, drug, paired = TRUE, mu = 0, alternative = "greater")
)
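
The paired t-test in the example above is equivalent to a one-sample t-test on the within-subject differences. The sketch below illustrates this with hypothetical relief-index values (not the Asthmati data); the vectors and the names paired_fit and diff_fit are assumptions made for illustration.

```r
# Hypothetical relief-index values (NOT the Asthmati data), for illustration only
placebo <- c(31, 28, 35, 40, 27, 33, 38, 30, 36)
drug    <- c(28, 27, 30, 36, 28, 29, 33, 27, 31)

# Paired test of placebo versus drug
paired_fit <- t.test(placebo, drug, paired = TRUE, mu = 0,
                     alternative = "greater")

# One-sample test on the differences (placebo minus drug)
diff_fit <- t.test(placebo - drug, mu = 0, alternative = "greater")

# Both approaches give the same statistic and p-value
all.equal(unname(paired_fit$statistic), unname(diff_fit$statistic))
all.equal(paired_fit$p.value, diff_fit$p.value)
```

The order of subtraction matters for a one-sided alternative: here the differences are taken as placebo minus drug to match the argument order of the paired call.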

Number of convictions reported by U.S. attorneys' offices

Description

Data for Example 2.2 and Exercises 2.43 and 2.57

Usage

Attorney

Format

A data frame/tibble with 88 observations on three variables

staff: U.S. attorneys' office staff per 1 million population
convict: U.S. attorneys' office convictions per 1 million population
district: a factor with levels Albuquerque, Alexandria, Va, Anchorage, Asheville, NC, Atlanta, Baltimore, Baton Rouge, Billings, Mt, Birmingham, Al, Boise, Id, Boston, Buffalo, Burlington, Vt, Cedar Rapids, Charleston, WVA, Cheyenne, Wy, Chicago, Cincinnati, Cleveland, Columbia, SC, Concord, NH, Denver, Des Moines, Detroit, East St. Louis, Fargo, ND, Fort Smith, Ark, Fort Worth, Grand Rapids, Mi, Greensboro, NC, Honolulu, Houston, Indianapolis, Jackson, Miss, Kansas City, Knoxville, Tn, Las Vegas, Lexington, Ky, Little Rock, Los Angeles, Louisville, Memphis, Miami, Milwaukee, Minneapolis, Mobile, Ala, Montgomery, Ala, Muskogee, Ok, Nashville, New Haven, Conn, New Orleans, New York (Brooklyn), New York (Manhattan), Newark, NJ, Oklahoma City, Omaha, Oxford, Miss, Pensacola, Fl, Philadelphia, Phoenix, Pittsburgh, Portland, Maine, Portland, Ore, Providence, RI, Raleigh, NC, Roanoke, Va, Sacramento, Salt Lake City, San Antonio, San Diego, San Francisco, Savannah, Ga, Scranton, Pa, Seattle, Shreveport, La, Sioux Falls, SD, South Bend, Ind, Spokane, Wash, Springfield, Ill, St. Louis, Syracuse, NY, Tampa, Topeka, Kan, Tulsa, Tyler, Tex, Washington, Wheeling, WVa, and Wilmington, Del

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


par(mfrow=c(1, 2))
plot(convict ~ staff, data = Attorney, main = "With Washington, D.C.")
plot(convict[-86] ~ staff[-86], data = Attorney, 
     main = "Without Washington, D.C.")
par(mfrow=c(1, 1))

Number of defective auto gears produced by two manufacturers

Description

Data for Exercise 7.46

Usage

Autogear

Format

A data frame/tibble with 20 observations on two variables

defectives: number of defective gears in the production of 100 gears per day
manufacturer: a factor with levels A and B

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


t.test(defectives ~ manufacturer, data = Autogear)
wilcox.test(defectives ~ manufacturer, data = Autogear)
t.test(defectives ~ manufacturer, var.equal = TRUE, data = Autogear)

Illustrates inferences based on pooled t-test versus Wilcoxon rank sum test

Description

Data for Exercise 7.40

Usage

Backtoback

Format

A data frame/tibble with 24 observations on two variables

score: a numeric vector
group: a numeric vector

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


wilcox.test(score ~ group, data = Backtoback)
t.test(score ~ group, data = Backtoback)

Baseball salaries for members of five major league teams

Description

Data for Exercise 1.11

Usage

Bbsalaries

Format

A data frame/tibble with 142 observations on two variables

salary: 1999 salary for baseball player
team: a factor with levels Angels, Indians, Orioles, Redsoxs, and Whitesoxs

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stripchart(salary ~ team, data = Bbsalaries, method = "stack", 
           pch = 19, col = "blue", cex = 0.75)
title(main = "Major League Salaries")

Graduation rates for student athletes and nonathletes in the Big Ten Conf.

Description

Data for Exercises 1.124 and 2.94

Usage

Bigten

Format

A data frame/tibble with 44 observations on the following four variables

school: a factor with levels Illinois, Indiana, Iowa, Michigan, Michigan State, Minnesota, Northwestern, Ohio State, Penn State, Purdue, and Wisconsin
rate: graduation rate
year: factor with two levels 1984-1985 and 1993-1994
status: factor with two levels athlete and student

Source

NCAA Graduation Rates Report, 2000.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


# Use == so that only the 1993-1994 rows are kept
boxplot(rate ~ status, data = subset(Bigten, year == "1993-1994"), 
        horizontal = TRUE, main = "Graduation Rates 1993-1994")
with(data = Bigten,
     tapply(rate, list(year, status), mean)
)

Test scores on first exam in biology class

Description

Data for Exercise 1.49

Usage

Biology

Format

A data frame/tibble with 30 observations on one variable

score: test scores on the first test in a beginning biology class

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


hist(Biology$score, breaks = "scott", col = "brown", freq = FALSE, 
main = "Exercise 1.49", xlab = "Test Score")
lines(density(Biology$score), lwd=3)

Live birth rates in 1990 and 1998 for all states

Description

Data for Example 1.10

Usage

Birth

Format

A data frame/tibble with 51 observations on three variables

state: a character variable with values Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, District of Colunbia, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missour, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
rate: live birth rates per 1000 population
year: a factor with levels 1990 and 1998

Source

National Vital Statistics Report, 48, March 28, 2000, National Center for Health Statistics.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


rate1998 <- subset(Birth, year == "1998", select = rate)
stem(x = rate1998$rate, scale = 2)
hist(rate1998$rate, breaks = seq(10.9, 21.9, 1.0), xlab = "1998 Birth Rate",
     main = "Figure 1.14 in BSDA", col = "pink")
hist(rate1998$rate, breaks = seq(10.9, 21.9, 1.0), xlab = "1998 Birth Rate",
     main = "Figure 1.16 in BSDA", col = "pink", freq = FALSE)      
lines(density(rate1998$rate), lwd = 3)
rm(rate1998)

Education level of blacks by gender

Description

Data for Exercise 8.55

Usage

Blackedu

Format

A data frame/tibble with 3800 observations on two variables

gender: a factor with levels Female and Male
education: a factor with levels High school dropout, High school graudate, Some college, Bachelor's degree, and Graduate degree

Source

Bureau of Census data.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~gender + education, data = Blackedu)
T1
chisq.test(T1)

Blood pressure of 15 adult males taken by machine and by an expert

Description

Data for Exercise 7.84

Usage

Blood

Format

A data frame/tibble with 15 observations on the following two variables

machine: blood pressure recorded from an automated blood pressure machine
expert: blood pressure recorded by an expert using an at-home device

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


DIFF <- Blood$machine - Blood$expert
shapiro.test(DIFF)
qqnorm(DIFF)
qqline(DIFF)
rm(DIFF)
t.test(Blood$machine, Blood$expert, paired = TRUE)

Incomes of board members from three different universities

Description

Data for Exercise 10.14

Usage

Board

Format

A data frame/tibble with 7 observations on three variables

salary: 1999 salary (in $1000) for board directors
university: a factor with levels A, B, and C

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(salary ~ university, data = Board, col = c("red", "blue", "green"), 
        ylab = "Income")
tapply(Board$salary, Board$university, summary)
anova(lm(salary ~ university, data = Board))
## Not run: 
library(dplyr)
dplyr::group_by(Board, university) %>%
         summarize(Average = mean(salary))

## End(Not run)

Bone density measurements of 35 physically active and 35 non-active women

Description

Data for Example 7.22

Usage

Bones

Format

A data frame/tibble with 70 observations on two variables

density: bone density measurements
group: a factor with levels active and nonactive

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


t.test(density ~ group, data = Bones, alternative = "greater")
t.test(rank(density) ~ group, data = Bones, alternative = "greater")
wilcox.test(density ~ group, data = Bones, alternative = "greater")

Number of books read and final spelling scores for 17 third graders

Description

Data for Exercise 9.53

Usage

Books

Format

A data frame/tibble with 17 observations on two variables

book: number of books read
spelling: spelling score

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(spelling ~ book, data = Books)
mod <- lm(spelling ~ book, data = Books)
summary(mod)
abline(mod, col = "blue", lwd = 2)

Prices paid for used books at three different bookstores

Description

Data for Exercises 10.30 and 10.31

Usage

Bookstor

Format

A data frame/tibble with 72 observations on two variables

dollars: money obtained for selling textbooks
store: a factor with levels A, B, and C

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(dollars ~ store, data = Bookstor, 
        col = c("purple", "lightblue", "cyan"))
kruskal.test(dollars ~ store, data = Bookstor)

Brain weight versus body weight of 28 animals

Description

Data for Exercises 2.15, 2.44, 2.58 and Examples 2.3 and 2.20

Usage

Brain

Format

A data frame/tibble with 28 observations on three variables

species: a factor with levels African elephant, Asian Elephant, Brachiosaurus, Cat, Chimpanzee, Cow, Diplodocus, Donkey, Giraffe, Goat, Gorilla, Gray wolf, Guinea Pig, Hamster, Horse, Human, Jaguar, Kangaroo, Mole, Mouse, Mt Beaver, Pig, Potar monkey, Rabbit, Rat, Rhesus monkey, Sheep, and Triceratops
bodyweight: body weight (in kg)
brainweight: brain weight (in g)

Source

P. Rousseeuw and A. Leroy, Robust Regression and Outlier Detection (New York: Wiley, 1987).

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(log(brainweight) ~ log(bodyweight), data = Brain, 
     pch = 19, col = "blue", main = "Example 2.3")
mod <- lm(log(brainweight) ~ log(bodyweight), data = Brain)      
abline(mod, lty = "dashed", col = "blue")

Repair costs of vehicles crashed into a barrier at 5 miles per hour

Description

Data for Exercise 1.73

Usage

Bumpers

Format

A data frame/tibble with 23 observations on two variables

car: a factor with levels Buick Century, Buick Skylark, Chevrolet Cavalier, Chevrolet Corsica, Chevrolet Lumina, Dodge Dynasty, Dodge Monaco, Ford Taurus, Ford Tempo, Honda Accord, Hyundai Sonata, Mazda 626, Mitsubishi Galant, Nissan Stanza, Oldsmobile Calais, Oldsmobile Ciere, Plymouth Acclaim, Pontiac 6000, Pontiac Grand Am, Pontiac Sunbird, Saturn SL2, Subaru Legacy, and Toyota Camry
repair: total repair cost (in dollars) after crashing a car into a barrier four times while the car was traveling at 5 miles per hour

Source

Insurance Institute for Highway Safety.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


EDA(Bumpers$repair)
stripchart(Bumpers$repair, method = "stack", pch = 19, col = "blue")
library(lattice)
dotplot(car ~ repair, data = Bumpers)

Attendance of bus drivers versus shift

Description

Data for Exercise 8.25

Usage

Bus

Format

A data frame/tibble with 29363 observations on two variables

attendance: a factor with levels absent and present
shift: a factor with levels am, noon, pm, swing, and split

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~attendance + shift, data = Bus)
T1
chisq.test(T1)

Median charges for coronary bypass at 17 hospitals in North Carolina

Description

Data for Exercises 5.104 and 6.43

Usage

Bypass

Format

A data frame/tibble with 17 observations on two variables

hospital: a factor with levels Carolinas Med Ct, Duke Med Ct, Durham Regional, Forsyth Memorial, Frye Regional, High Point Regional, Memorial Mission, Mercy, Moore Regional, Moses Cone Memorial, NC Baptist, New Hanover Regional, Pitt Co. Memorial, Presbyterian, Rex, Univ of North Carolina, and Wake County
charge: median charge for coronary bypass

Source

Consumer's Guide to Hospitalization Charges in North Carolina Hospitals (August 1994), North Carolina Medical Database Commission, Department of Insurance.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


EDA(Bypass$charge)
t.test(Bypass$charge, conf.level = 0.90)$conf.int
t.test(Bypass$charge, mu = 35000)

Estimates of costs of kitchen cabinets by two suppliers on 20 prospective homes

Description

Data for Exercise 7.83

Usage

Cabinets

Format

A data frame/tibble with 20 observations on three variables

home: a numeric vector
supplA: estimate for kitchen cabinets from supplier A (in dollars)
supplB: estimate for kitchen cabinets from supplier B (in dollars)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


DIF <- Cabinets$supplA - Cabinets$supplB
qqnorm(DIF)
qqline(DIF)
shapiro.test(DIF)
with(data = Cabinets, 
     t.test(supplA, supplB, paired = TRUE)
)
with(data = Cabinets,
     wilcox.test(supplA, supplB, paired = TRUE)
)
rm(DIF)

Survival times of terminal cancer patients treated with vitamin C

Description

Data for Exercises 6.55 and 6.64

Usage

Cancer

Format

A data frame/tibble with 64 observations on two variables

survival: survival time (in days) of terminal patients treated with vitamin C
type: a factor indicating type of cancer with levels breast, bronchus, colon, ovary, and stomach

Source

Cameron, E. and Pauling, L. 1978. “Supplemental Ascorbate in the Supportive Treatment of Cancer.” Proceedings of the National Academy of Sciences, 75, 4538-4542.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(survival ~ type, data = Cancer, col = "blue")
stomach <- Cancer$survival[Cancer$type == "stomach"]
bronchus <- Cancer$survival[Cancer$type == "bronchus"]
boxplot(stomach, ylab = "Days")
SIGN.test(stomach, md = 100, alternative = "greater")
SIGN.test(bronchus, md = 100, alternative = "greater")
rm(bronchus, stomach)

Carbon monoxide level measured at three industrial sites

Description

Data for Exercises 10.28 and 10.29

Usage

Carbon

Format

A data frame/tibble with 24 observations on two variables

CO: carbon monoxide measured (in parts per million)
site: a factor with levels SiteA, SiteB, and SiteC

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(CO ~ site, data = Carbon, col = "lightgreen")
kruskal.test(CO ~ site, data = Carbon)

Reading scores on the California Achievement Test for a group of 3rd graders

Description

Data for Exercise 1.116

Usage

Cat

Format

A data frame/tibble with 17 observations on one variable

score: reading score on the California Achievement Test

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Cat$score)
fivenum(Cat$score)
boxplot(Cat$score, main = "Problem 1.116", col = "green")

Entry age and survival time of patients with small cell lung cancer under two different treatments

Description

Data for Exercises 7.34 and 7.48

Usage

Censored

Format

A data frame/tibble with 121 observations on three variables

survival: survival time (in days) of patients with small cell lung cancer
treatment: a factor with levels armA and armB indicating the treatment a patient received
age: the age of the patient

Source

Ying, Z., Jung, S., Wei, L. 1995. “Survival Analysis with Median Regression Models.” Journal of the American Statistical Association, 90, 178-184.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(survival ~ treatment, data = Censored, col = "yellow")
wilcox.test(survival ~ treatment, data = Censored, alternative = "greater")

Temperatures and O-ring failures for the launches of the space shuttle Challenger

Description

Data for Examples 1.11, 1.12, 1.13, 2.11 and 5.1

Usage

Challeng

Format

A data frame/tibble with 25 observations on four variables

flight: a character variable indicating the flight
date: date of the flight
temp: temperature (in degrees Fahrenheit)
failures: number of failures

Source

Dalal, S. R., Fowlkes, E. B., Hoadley, B. 1989. “Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure.” Journal of the American Statistical Association, 84, No. 408, 945-957.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Challeng$temp)
summary(Challeng$temp)
IQR(Challeng$temp)
quantile(Challeng$temp)
fivenum(Challeng$temp)
stem(sort(Challeng$temp)[-1])
summary(sort(Challeng$temp)[-1])
IQR(sort(Challeng$temp)[-1])
quantile(sort(Challeng$temp)[-1])
fivenum(sort(Challeng$temp)[-1])
par(mfrow=c(1, 2))
qqnorm(Challeng$temp)
qqline(Challeng$temp)
qqnorm(sort(Challeng$temp)[-1])
qqline(sort(Challeng$temp)[-1])
par(mfrow=c(1, 1))

Starting salaries of 50 chemistry majors

Description

Data for Example 5.3

Usage

Chemist

Format

A data frame/tibble with 50 observations on one variable

salary: starting salary (in dollars) for a chemistry major

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


EDA(Chemist$salary)

Surface salinity measurements taken offshore from Annapolis, Maryland in 1927

Description

Data for Exercise 6.41

Usage

Chesapea

Format

A data frame/tibble with 16 observations on one variable

salinity: surface salinity measurements (in parts per 1000) for station 11, offshore from Annapolis, Maryland, on July 3-4, 1927.

Source

Davis, J. (1986) Statistics and Data Analysis in Geology, Second Edition. John Wiley and Sons, New York.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


qqnorm(Chesapea$salinity)
qqline(Chesapea$salinity)
shapiro.test(Chesapea$salinity)
t.test(Chesapea$salinity, mu = 7)

Insurance injury ratings of Chevrolet vehicles for 1990 and 1993 models

Description

Data for Exercise 8.35

Usage

Chevy

Format

A data frame/tibble with 67 observations on two variables

year: a factor with levels 1988-90 and 1991-93
frequency: a factor with levels much better than average, above average, average, below average, and much worse than average

Source

Insurance Institute for Highway Safety and the Highway Loss Data Institute, 1995.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~year + frequency, data = Chevy)
T1
chisq.test(T1)
rm(T1)

Weight gain of chickens fed three different rations

Description

Data for Exercise 10.15

Usage

Chicken

Format

A data frame/tibble with 13 observations on two variables

gain: weight gain over a specified period
feed: a factor with levels ration1, ration2, and ration3

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(gain ~ feed, col = c("red","blue","green"), data = Chicken)
anova(lm(gain ~ feed, data = Chicken))

Measurements of the thickness of the oxide layer of manufactured integrated circuits

Description

Data for Exercises 6.49 and 7.47

Usage

Chipavg

Format

A data frame/tibble with 30 observations on three variables

wafer1: thickness of the oxide layer for wafer1
wafer2: thickness of the oxide layer for wafer2
thickness: average thickness of the oxide layer of the eight measurements obtained from each set of two wafers

Source

Yashchin, E. 1995. “Likelihood Ratio Methods for Monitoring Parameters of a Nested Random Effect Model.” Journal of the American Statistical Association, 90, 729-738.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


EDA(Chipavg$thickness)
t.test(Chipavg$thickness, mu = 1000)
boxplot(Chipavg$wafer1, Chipavg$wafer2, names = c("Wafer 1", "Wafer 2"))
shapiro.test(Chipavg$wafer1)
shapiro.test(Chipavg$wafer2)
t.test(Chipavg$wafer1, Chipavg$wafer2, var.equal = TRUE)

Four measurements on a first wafer and four measurements on a second wafer selected from 30 lots

Description

Data for Exercise 10.9

Usage

Chips

Format

A data frame/tibble with 30 observations on eight variables

wafer11: first measurement of thickness of the oxide layer for wafer1
wafer12: second measurement of thickness of the oxide layer for wafer1
wafer13: third measurement of thickness of the oxide layer for wafer1
wafer14: fourth measurement of thickness of the oxide layer for wafer1
wafer21: first measurement of thickness of the oxide layer for wafer2
wafer22: second measurement of thickness of the oxide layer for wafer2
wafer23: third measurement of thickness of the oxide layer for wafer2
wafer24: fourth measurement of thickness of the oxide layer for wafer2

Source

Yashchin, E. 1995. “Likelihood Ratio Methods for Monitoring Parameters of a Nested Random Effect Model.” Journal of the American Statistical Association, 90, 729-738.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


with(data = Chips, 
     boxplot(wafer11, wafer12, wafer13, wafer14, wafer21, 
             wafer22, wafer23, wafer24, col = "pink")
)

Milligrams of tar in 25 cigarettes selected randomly from 4 different brands

Description

Data for Example 10.4

Usage

Cigar

Format

A data frame/tibble with 100 observations on two variables

tar: amount of tar (measured in milligrams)
brand: a factor indicating cigarette brand with levels brandA, brandB, brandC, and brandD

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(tar ~ brand, data = Cigar, col = "cyan", ylab = "mg tar")
anova(lm(tar ~ brand, data = Cigar))

Effect of mother's smoking on birth weight of newborn

Description

Data for Exercise 2.27

Usage

Cigarett

Format

A data frame/tibble with 16 observations on two variables

cigarettes: mothers' estimated average number of cigarettes smoked per day
weight: children's birth weights (in pounds)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(weight ~ cigarettes, data = Cigarett)
model <- lm(weight ~ cigarettes, data = Cigarett)
abline(model, col = "red")
with(data = Cigarett,
     cor(weight, cigarettes)
)
rm(model)

Confidence Interval Simulation Program

Description

This program simulates random samples from which it constructs confidence intervals for one of the parameters mean (Mu), variance (Sigma), or proportion of successes (Pi).

Usage

CIsim(
  samples = 100,
  n = 30,
  mu = 0,
  sigma = 1,
  conf.level = 0.95,
  type = "Mean"
)

Arguments

samples

the number of samples desired.

n

the size of each sample.

mu

if constructing confidence intervals for the population mean or the population variance (i.e., type is either "Mean" or "Var"), mu is the population mean. If constructing confidence intervals for the population proportion of successes, the value entered for mu represents the population proportion of successes (Pi) and, as such, must be a number between 0 and 1.

sigma

the population standard deviation. sigma is not required if confidence intervals are of type "Pi".

conf.level

confidence level for the graphed confidence intervals, restricted to lie between zero and one.

type

character string, one of "Mean", "Var" or "Pi", or just the initial letter of each, indicating the type of confidence interval simulation to perform.

Details

Default is to construct confidence intervals for the population mean. Simulated confidence intervals for the population variance or population proportion of successes are possible by selecting the appropriate value in the type argument.

Value

Graph depicts simulated confidence intervals. The number of confidence intervals that do not contain the parameter of interest is counted and reported in the R console.

Author(s)

Alan T. Arnholt

Examples


CIsim(100, 30, 100, 10)
    # Simulates 100 samples of size 30 from 
    # a normal distribution with mean 100
    # and standard deviation 10.  From the
    # 100 simulated samples, 95% confidence
    # intervals for the Mean are constructed 
    # and depicted in the graph. 

CIsim(100, 30, 100, 10, type="Var")
    # Simulates 100 samples of size 30 from 
    # a normal distribution with mean 100
    # and standard deviation 10.  From the
    # 100 simulated samples, 95% confidence
    # intervals for the variance are constructed 
    # and depicted in the graph.
    
CIsim(100, 50, .5, type="Pi", conf.level=.90)     
    # Simulates 100 samples of size 50 from 
    # a binomial distribution where the population
    # proportion of successes is 0.5.  From the
    # 100 simulated samples, 90% confidence
    # intervals for Pi are constructed 
    # and depicted in the graph.
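
The simulation described above can be sketched in base R for the "Mean" case. This is only an illustration of the idea, not the code behind CIsim: it assumes t-based intervals, and the seed and object names (draws, half, misses) are hypothetical.

```r
# Minimal base-R sketch of a coverage simulation for the population mean.
# Assumption: t-based intervals; the internals of CIsim() may differ.
set.seed(13)                       # hypothetical seed, for reproducibility only
samples <- 100                     # number of simulated samples
n <- 30                            # size of each sample
mu <- 100                          # population mean
sigma <- 10                        # population standard deviation
conf.level <- 0.95

# One simulated sample per row
draws <- matrix(rnorm(samples * n, mean = mu, sd = sigma), nrow = samples)
xbar <- rowMeans(draws)            # sample means
s <- apply(draws, 1, sd)           # sample standard deviations

# Half-width of each t-based confidence interval
half <- qt(1 - (1 - conf.level) / 2, df = n - 1) * s / sqrt(n)
lower <- xbar - half
upper <- xbar + half

# Count the intervals that do not contain mu
misses <- sum(lower > mu | upper < mu)
misses
```

With conf.level = 0.95 and 100 samples, roughly 5 misses are expected on average, although any single run can differ.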

Percent of peak bone density of different aged children

Description

Data for Exercise 9.7

Usage

Citrus

Format

A data frame/tibble with nine observations on two variables

age: age of children
percent: percent peak bone density

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


model <- lm(percent ~ age, data = Citrus)
summary(model)
anova(model)
rm(model)

Residual contaminant following the use of three different cleansing agents

Description

Data for Exercise 10.16

Usage

Clean

Format

A data frame/tibble with 45 observations on two variables

clean: residual contaminants
agent: a factor with levels A, B, and C

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(clean ~ agent, col = c("red", "blue", "green"), data = Clean)
anova(lm(clean ~ agent, data = Clean))

Signal loss from three types of coaxial cable

Description

Data for Exercises 10.24 and 10.25

Usage

Coaxial

Format

A data frame/tibble with 45 observations on two variables

signal: signal loss per 1000 feet
cable: a factor with levels typeA, typeB, and typeC indicating the type of coaxial cable

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(signal ~ cable, data = Coaxial, col = c("red", "green", "yellow"))
kruskal.test(signal ~ cable, data = Coaxial)

Productivity of workers with and without a coffee break

Description

Data for Exercise 7.55

Usage

Coffee

Format

A data frame/tibble with nine observations on three variables

without: workers' productivity scores without a coffee break
with: workers' productivity scores with a coffee break
differences: with minus without

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


qqnorm(Coffee$differences)
qqline(Coffee$differences)
shapiro.test(Coffee$differences)
t.test(Coffee$with, Coffee$without, paired = TRUE, alternative = "greater")
wilcox.test(Coffee$with, Coffee$without, paired = TRUE, 
            alternative = "greater")

Yearly returns on 12 investments

Description

Data for Exercise 5.68

Usage

Coins

Format

A data frame/tibble with 12 observations on one variable

return: yearly returns on each of 12 possible investments

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


qqnorm(Coins$return)
qqline(Coins$return)

Combinations

Description

Computes all possible combinations of n objects taken k at a time.

Usage

Combinations(n, k)

Arguments

n

a number.

k

a number less than or equal to n.

Value

Returns a matrix containing the possible combinations of n objects taken k at a time.

Examples


Combinations(5,2)
    # The columns in the matrix list the values of the 10 possible
    # combinations of 5 things taken 2 at a time.
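
For comparison, base R offers combn() in the utils package, which also returns one combination per column. The sketch below only shows the shape of such a result; whether Combinations() is implemented the same way is not stated here.

```r
# Base-R comparison using utils::combn(); not the BSDA implementation.
cmb <- combn(5, 2)    # each column is one combination of 2 items out of 5
dim(cmb)              # 2 rows (k) by 10 columns (number of combinations)
choose(5, 2)          # the count of combinations, 5! / (2! * 3!)
```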

Commuting times for selected cities in 1980 and 1990

Description

Data for Exercises 1.13 and 7.85

Usage

Commute

Format

A data frame/tibble with 39 observations on three variables

city: a factor with levels Atlanta, Baltimore, Boston, Buffalo, Charlotte, Chicago, Cincinnati, Cleveland, Columbus, Dallas, Denver, Detroit, Hartford, Houston, Indianapolis, Kansas City, Los Angeles, Miami, Milwaukee, Minneapolis, New Orleans, New York, Norfolk, Orlando, Philadelphia, Phoenix, Pittsburgh, Portland, Providence, Rochester, Sacramento, Salt Lake City, San Antonio, San Diego, San Francisco, Seattle, St. Louis, Tampa, and Washington
year: year
time: commute times

Source

Federal Highway Administration.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stripplot(year ~ time, data = Commute, jitter = TRUE) 
dotplot(year ~ time, data = Commute)
bwplot(year ~ time, data = Commute)
stripchart(time ~ year, data = Commute, method = "stack", pch = 1, 
           cex = 2, col = c("red", "blue"), 
           group.names = c("1980", "1990"), 
           main = "", xlab = "minutes")
title(main = "Commute Time") 
boxplot(time ~ year, data = Commute, names=c("1980", "1990"),
        horizontal = TRUE, las = 1)

Tennessee self concept scale scores for a group of teenage boys

Description

Data for Exercises 1.68 and 1.82

Usage

Concept

Format

A data frame/tibble with 28 observations on one variable

self: Tennessee self concept scores

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


summary(Concept$self)
sd(Concept$self)
diff(range(Concept$self))
IQR(Concept$self)
summary(Concept$self/10)
IQR(Concept$self/10)
sd(Concept$self/10)
diff(range(Concept$self/10))

Compressive strength of concrete blocks made by two different methods

Description

Data for Example 7.17

Usage

Concrete

Format

A data frame/tibble with 20 observations on two variables

strength: compressive strength (in pounds per square inch)
method: factor with levels new and old indicating the method used to construct a concrete block

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


wilcox.test(strength ~ method, data = Concrete, alternative = "greater")

Comparison of the yields of a new variety and a standard variety of corn planted on 12 plots of land

Description

Data for Exercise 7.77

Usage

Corn

Format

A data frame/tibble with 12 observations on three variables

new: corn yield with new method
standard: corn yield with standard method
differences: new minus standard

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(Corn$differences)
qqnorm(Corn$differences)
qqline(Corn$differences)
shapiro.test(Corn$differences)
t.test(Corn$differences, alternative = "greater")

Exercise to illustrate correlation

Description

Data for Exercise 2.23

Usage

Correlat

Format

A data frame/tibble with 13 observations on two variables

x: a numeric vector
y: a numeric vector

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(y ~ x, data = Correlat)
model <- lm(y ~ x, data = Correlat)
abline(model)
rm(model)

Scores of 18 volunteers who participated in a counseling process

Description

Data for Exercise 6.96

Usage

Counsel

Format

A data frame/tibble with 18 observations on one variable

score: standardized psychology scores after a counseling process

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


EDA(Counsel$score)
t.test(Counsel$score, mu = 70)

Consumer price index from 1979 to 1998

Description

Data for Exercise 1.34

Usage

Cpi

Format

A data frame/tibble with 20 observations on two variables

year: year
cpi: consumer price index

Source

Bureau of Labor Statistics.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(cpi ~ year, data = Cpi, type = "l", lty = 2, lwd = 2, col = "red")   
barplot(Cpi$cpi, col = "pink", las = 2, main = "Problem 1.34")

Violent crime rates for the states in 1983 and 1993

Description

Data for Exercises 1.90, 2.32, 3.64, and 5.113

Usage

Crime

Format

A data frame/tibble with 102 observations on three variables

state: a factor with levels Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, DC, Delaware, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missour, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
year: a factor with levels 1983 and 1993
rate: crime rate per 100,000 inhabitants

Source

U.S. Department of Justice, Bureau of Justice Statistics, Sourcebook of Criminal Justice Statistics, 1993.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(rate ~ year, data = Crime, col = "red")

Charles Darwin's study of cross-fertilized and self-fertilized plants

Description

Data for Exercise 7.62

Usage

Darwin

Format

A data frame/tibble with 15 observations on three variables

pot: number of pot
cross: height of plant (in inches) after a fixed period of time when cross-fertilized
self: height of plant (in inches) after a fixed period of time when self-fertilized

Source

Darwin, C. (1876) The Effect of Cross- and Self-Fertilization in the Vegetable Kingdom, 2nd edition, London.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


differ <- Darwin$cross - Darwin$self
qqnorm(differ)
qqline(differ)
shapiro.test(differ)
wilcox.test(Darwin$cross, Darwin$self, paired = TRUE)
rm(differ)

Automobile dealers classified according to type of dealership and service rendered to customers

Description

Data for Example 2.22

Usage

Dealers

Format

A data frame/tibble with 122 observations on two variables

type: a factor with levels Honda, Toyota, Mazda, Ford, Dodge, and Saturn
service: a factor with levels Replaces unnecessarily and Follows manufacturer guidelines

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


xtabs(~type + service, data = Dealers)
T1 <- xtabs(~type + service, data = Dealers)
T1
addmargins(T1)
pt <- prop.table(T1, margin = 1)
pt
barplot(t(pt),  col = c("red", "skyblue"), legend = colnames(T1))
rm(T1, pt)

Number of defective items produced by 20 employees

Description

Data for Exercise 1.27

Usage

Defectiv

Format

A data frame/tibble with 20 observations on one variable

number: number of defective items produced by the employees in a small business firm

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~ number, data = Defectiv)
T1
barplot(T1, col = "pink", ylab = "Frequency",
xlab = "Defective Items Produced by Employees", main = "Problem 1.27")
rm(T1)

Percent of bachelor's degrees awarded to women in 1970 versus 1990

Description

Data for Exercise 2.75

Usage

Degree

Format

A data frame/tibble with 1064 observations on two variables

field: a factor with levels Health, Education, Foreign Language, Psychology, Fine Arts, Life Sciences, Business, Social Science, Physical Sciences, Engineering, and All Fields
awarded: a factor with levels 1970 and 1990

Source

U.S. Department of Health and Human Services, National Center for Education Statistics.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~field + awarded, data = Degree)
T1
barplot(t(T1), beside = TRUE, col = c("red", "skyblue"), legend = colnames(T1))
rm(T1)

Delay times on 20 flights from four major air carriers

Description

Data for Exercise 10.55

Usage

Delay

Format

A data frame/tibble with 80 observations on two variables

delay: the delay time (in minutes) for 80 randomly selected flights
carrier: a factor with levels A, B, C, and D

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(delay ~ carrier, data = Delay, 
        main = "Exercise 10.55", ylab = "minutes",
        col = "pink")
kruskal.test(delay ~carrier, data = Delay)

Number of dependent children for 50 families

Description

Data for Exercise 1.26

Usage

Depend

Format

A data frame/tibble with 50 observations on one variable

number: number of dependent children in a family

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~ number, data = Depend)
T1
barplot(T1, col = "lightblue", main = "Problem 1.26",
xlab = "Number of Dependent Children", ylab = "Frequency")
rm(T1)

Educational levels of a sample of 40 auto workers in Detroit

Description

Data for Exercise 5.21

Usage

Detroit

Format

A data frame/tibble with 40 observations on one variable

educ: the educational level (in years) of a sample of 40 auto workers in a plant in Detroit

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


EDA(Detroit$educ)

Demographic characteristics of developmental students at 2-year colleges and 4-year colleges

Description

Data used for Exercise 8.50

Usage

Develop

Format

A data frame/tibble with 5656 observations on two variables

race: a factor with levels African American, American Indian, Asian, Latino, and White
college: a factor with levels Two-year and Four-year

Source

Research in Development Education (1994), V. 11, 2.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~race + college, data = Develop)
T1
chisq.test(T1)
rm(T1)

Test scores for students who failed developmental mathematics in the fall semester 1995

Description

Data for Exercise 6.47

Usage

Devmath

Format

A data frame/tibble with 40 observations on one variable

score: first exam score

Source

Data provided by Dr. Anita Kitchens.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


EDA(Devmath$score)
t.test(Devmath$score, mu = 80, alternative = "less")

Outcomes and probabilities of the roll of a pair of fair dice

Description

Data for Exercise 3.109

Usage

Dice

Format

A data frame/tibble with 11 observations on two variables

x: possible outcomes for the sum of two dice
px: probability for outcome x

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


roll1 <- sample(1:6, 20000, replace = TRUE)
roll2 <- sample(1:6, 20000, replace = TRUE)
outcome <- roll1 + roll2
T1 <- table(outcome)/length(outcome)
remove(roll1, roll2, outcome)
T1
round(t(Dice), 5)
rm(T1)
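
The simulated proportions above approximate the exact distribution of the sum of two fair dice. As a rough sketch, that distribution can also be computed directly in base R by enumerating all 36 equally likely outcomes; the object names sums and px_exact are hypothetical and not part of the package.

```r
# Exact probabilities for the sum of two fair dice, by enumeration.
sums <- as.vector(outer(1:6, 1:6, "+"))           # all 36 equally likely sums
px_exact <- tabulate(sums, nbins = 12)[2:12] / 36 # counts for sums 2..12
names(px_exact) <- 2:12
round(px_exact, 5)
```

These values can be set beside round(t(Dice), 5) to check the px column.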

Diesel fuel prices in 1999-2000 in nine regions of the country

Description

Data for Exercise 2.8

Usage

Diesel

Format

A data frame/tibble with 650 observations on three variables

date: date when price was recorded
pricepergallon: price per gallon (in dollars)
location: a factor with levels California, CentralAtlantic, Coast, EastCoast, Gulf, LowerAtlantic, NatAvg, NorthEast, Rocky, and WesternMountain

Source

Energy Information Administration, National Energy Information Center: 1000 Independence Ave., SW, Washington, D.C., 20585.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


par(las = 2)
boxplot(pricepergallon ~ location, data = Diesel)
boxplot(pricepergallon ~ location, 
       data = droplevels(Diesel[Diesel$location == "EastCoast" | 
       Diesel$location == "Gulf" | Diesel$location == "NatAvg" | 
       Diesel$location == "Rocky" | Diesel$location == "California", ]), 
       col = "pink", main = "Exercise 2.8")
par(las = 1) 
## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Diesel, aes(x = date, y = pricepergallon, 
           color = location)) + 
           geom_point() + 
           geom_smooth(se = FALSE) + 
           theme_bw() + 
           labs(y = "Price per Gallon (in dollars)")

## End(Not run)

Parking tickets issued to diplomats

Description

Data for Exercises 1.14 and 1.37

Usage

Diplomat

Format

A data frame/tibble with 10 observations on three variables

country: a factor with levels Brazil, Bulgaria, Egypt, Indonesia, Israel, Nigeria, Russia, S. Korea, Ukraine, and Venezuela
number: total number of tickets
rate: number of tickets per vehicle per month

Source

Time, November 8, 1993. Figures are from January to June 1993.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


par(las = 2, mfrow = c(2, 2))
stripchart(number ~ country, data = Diplomat, pch = 19, 
           col= "red", vertical = TRUE)
stripchart(rate ~ country, data = Diplomat, pch = 19, 
           col= "blue", vertical = TRUE) 
with(data = Diplomat, 
     barplot(number, names.arg = country, col = "red"))
with(data = Diplomat, 
     barplot(rate, names.arg = country, col = "blue"))           
par(las = 1, mfrow = c(1, 1))
## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Diplomat, aes(x = reorder(country, number), 
                 y = number)) + 
           geom_bar(stat = "identity", fill = "pink", color = "black") + 
           theme_bw() + labs(x = "", y = "Total Number of Tickets")
ggplot2::ggplot(data = Diplomat, aes(x = reorder(country, rate), 
                 y = rate)) +
           geom_bar(stat = "identity", fill = "pink", color = "black") + 
           theme_bw() + labs(x = "", y = "Tickets per vehicle per month")

## End(Not run)

Toxic intensity for manufacturing plants producing herbicidal preparations

Description

Data for Exercise 1.127

Usage

Disposal

Format

A data frame/tibble with 29 observations on one variable

pounds: pounds of toxic waste per $1000 of shipments of its products

Source

Bureau of the Census, Reducing Toxins, Statistical Brief SB/95-3, February 1995.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Disposal$pounds)
fivenum(Disposal$pounds)
EDA(Disposal$pounds)

Rankings of the favorite breeds of dogs

Description

Data for Exercise 2.88

Usage

Dogs

Format

A data frame/tibble with 20 observations on three variables

breed: a factor with levels Beagle, Boxer, Chihuahua, Chow, Dachshund, Dalmatian, Doberman, Huskie, Labrador, Pomeranian, Poodle, Retriever, Rotweiler, Schnauzer, Shepherd, Shetland, ShihTzu, Spaniel, Springer, and Yorkshire
ranking: numeric ranking
year: a factor with levels 1992, 1993, 1997, and 1998

Source

The World Almanac and Book of Facts, 2000.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


cor(Dogs$ranking[Dogs$year == "1992"], Dogs$ranking[Dogs$year == "1993"])
cor(Dogs$ranking[Dogs$year == "1997"], Dogs$ranking[Dogs$year == "1998"])
## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Dogs, aes(x = reorder(breed, ranking), y = ranking)) + 
           geom_bar(stat = "identity") + 
           facet_grid(year ~. ) + 
           theme(axis.text.x  = element_text(angle = 85, vjust = 0.5)) 

## End(Not run)

Rates of domestic violence per 1,000 women by age groups

Description

Data for Exercise 1.20

Usage

Domestic

Format

A data frame/tibble with five observations on two variables

age: a factor with levels 12-19, 20-24, 25-34, 35-49, and 50-64
rate: rate of domestic violence per 1000 women

Source

U.S. Department of Justice.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


barplot(Domestic$rate, names.arg = Domestic$age)
## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Domestic, aes(x = age, y = rate)) + 
           geom_bar(stat = "identity", fill = "purple", color = "black") + 
           labs(x = "", y = "Domestic violence per 1000 women") + 
           theme_bw()

## End(Not run)

Dopamine b-hydroxylase activity of schizophrenic patients treated with an antipsychotic drug

Description

Data for Exercises 5.14 and 7.49

Usage

Dopamine

Format

A data frame/tibble with 25 observations on two variables

dbh: dopamine b-hydroxylase activity (units are nmol/(ml)(h)/(mg) of protein)
group: a factor with levels nonpsychotic and psychotic

Source

D.E. Sternberg, D.P. Van Kammen, and W.E. Bunney, "Schizophrenia: Dopamine b-Hydroxylase Activity and Treatment Response," Science, 216 (1982), 1423-1425.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(dbh ~ group, data = Dopamine, col = "orange")
t.test(dbh ~ group, data = Dopamine, var.equal = TRUE)
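
The var.equal = TRUE argument above requests the pooled-variance two-sample t test. As a rough sketch of what that statistic involves, the following compares a hand computation with t.test(). The data frame toy and its values are hypothetical and are not the Dopamine data.

```r
# Hypothetical toy data; these are NOT the Dopamine values.
toy <- data.frame(
  y = c(10.2, 11.5, 9.8, 12.1, 10.9, 13.4, 14.0, 12.7, 13.9),
  g = factor(c("a", "a", "a", "a", "a", "b", "b", "b", "b"))
)

# Split the response by group
ya <- toy$y[toy$g == "a"]
yb <- toy$y[toy$g == "b"]

# Pooled variance: weighted average of the two sample variances
sp2 <- ((length(ya) - 1) * var(ya) + (length(yb) - 1) * var(yb)) /
  (length(ya) + length(yb) - 2)

# t statistic for the difference in means (first level minus second)
t_manual <- (mean(ya) - mean(yb)) /
  sqrt(sp2 * (1 / length(ya) + 1 / length(yb)))

# Same statistic from t.test() with var.equal = TRUE
t_builtin <- unname(t.test(y ~ g, data = toy, var.equal = TRUE)$statistic)

c(manual = t_manual, builtin = t_builtin)
```

The two values should agree up to floating-point error; without var.equal = TRUE, t.test() uses the Welch approximation instead.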

Closing year-end Dow Jones Industrial averages from 1896 through 2000

Description

Data for Exercise 1.35

Usage

Dowjones

Format

A data frame/tibble with 105 observations on three variables

year: date
close: Dow Jones closing price
change: percent change from previous year

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(close ~ year, data = Dowjones, type = "l", main = "Exercise 1.35")
## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Dowjones, aes(x = year, y = close)) +
           geom_point(size = 0.5) + 
           geom_line(color = "red") + 
           theme_bw() + 
           labs(y = "Dow Jones Closing Price")

## End(Not run)

Opinion on referendum by view on moral issue of selling alcoholic beverages

Description

Data for Exercise 8.53

Usage

Drink

Format

A data frame/tibble with 472 observations on two variables

drinking: a factor with levels ok, tolerated, and immoral
referendum: a factor with levels for, against, and undecided

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~drinking + referendum, data = Drink)
T1
chisq.test(T1)
rm(T1)
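
To make the chi-square test of independence used above more concrete, the sketch below computes the expected counts and the Pearson statistic by hand and compares them with chisq.test(). The table tab holds made-up counts (the labels are borrowed from the Drink factor levels only for readability); it is not the Drink data.

```r
# Hypothetical 3 x 3 table of counts; NOT the Drink data.
tab <- matrix(c(30, 20, 10,
                25, 25, 15,
                10, 20, 30),
              nrow = 3, byrow = TRUE,
              dimnames = list(view = c("ok", "tolerated", "immoral"),
                              vote = c("for", "against", "undecided")))

# Expected counts under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson chi-square statistic, degrees of freedom, and p-value
x2 <- sum((tab - expected)^2 / expected)
df <- (nrow(tab) - 1) * (ncol(tab) - 1)
p  <- pchisq(x2, df = df, lower.tail = FALSE)

# Built-in test for comparison
res <- chisq.test(tab)
c(manual = x2, builtin = unname(res$statistic))
```

The expected counts are also available as res$expected, which is a convenient way to check that none of them is too small for the chi-square approximation.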

Number of trials to master a task for a group of 28 subjects assigned to a control and an experimental group

Description

Data for Example 7.15

Usage

Drug

Format

A data frame/tibble with 28 observations on two variables

trials: number of trials to master a task
group: a factor with levels control and experimental

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(trials ~ group, data = Drug,
        main = "Example 7.15", col = c("yellow", "red"))
wilcox.test(trials ~ group, data = Drug)
t.test(rank(trials) ~ group, data = Drug, var.equal = TRUE)

Data on a group of college students diagnosed with dyslexia

Description

Data for Exercise 2.90

Usage

Dyslexia

Format

A data frame/tibble with eight observations on seven variables

words: number of words read per minute
age: age of participant
gender: a factor with levels female and male
handed: a factor with levels left and right
weight: weight of participant (in pounds)
height: height of participant (in inches)
children: number of children in family

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(height ~ weight, data = Dyslexia)
plot(words ~ factor(handed), data = Dyslexia,
     xlab = "hand", col = "lightblue")

One hundred year record of worldwide seismic activity (1770-1869)

Description

Data for Exercise 6.97

Usage

Earthqk

Format

A data frame/tibble with 100 observations on two variables

year: year seismic activity was recorded
severity: annual incidence of severe earthquakes

Source

Quenouille, M.H. (1952), Associated Measurements, Butterworth, London. p. 279.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


EDA(Earthqk$severity)
t.test(Earthqk$severity, mu = 100, alternative = "greater")

Exploratory Data Analysis

Description

Function that produces a histogram, density plot, boxplot, and Q-Q plot.

Usage

EDA(x, trim = 0.05)

Arguments

x

numeric vector. NAs and Infs are allowed but will be removed.

trim

fraction (between 0 and 0.5, inclusive) of values to be trimmed from each end of the ordered data. If trim = 0.5, the result is the median.

Details

No command window information is returned for data sets containing more than 5000 observations. Graphical output is still produced for such data sets.

Value

The function returns various measures of center and location. The values returned for the quartiles are based on the definitions provided in BSDA. The boxplot is based on the quartiles returned in the command window.

Note

Requires package e1071.

Author(s)

Alan T. Arnholt

Examples


EDA(rnorm(100))
    # Produces four graphs for the 100 randomly
    # generated standard normal variates.
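
The wording of the x and trim arguments mirrors base R's mean(). The following base R sketch illustrates the two ideas involved, dropping non-finite values and trimming each end of the ordered data. It does not call EDA() and is not meant to reproduce its internal code; the vector x is hypothetical.

```r
# Hypothetical vector with one large value, a missing value,
# and an infinite value
x <- c(1:19, 500, NA, Inf)

# Keep only finite values (drops NA, NaN, Inf, and -Inf)
x_ok <- x[is.finite(x)]
length(x_ok)

# Ordinary mean: pulled upward by the single large value
mean(x_ok)

# Trimmed mean: with 20 values and trim = 0.05, one value is
# removed from each end of the ordered data before averaging
mean(x_ok, trim = 0.05)

# With trim = 0.5 the trimmed mean is the median
mean(x_ok, trim = 0.5)
median(x_ok)
```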

Crime rates versus the percent of the population without a high school degree

Description

Data for Exercise 2.41

Usage

Educat

Format

A data frame/tibble with 51 observations on three variables

state: a factor with levels Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, DC, Delaware, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missour, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
nodegree: percent of the population without a high school degree
crime: violent crimes per 100,000 population

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(crime ~ nodegree, data = Educat, 
     xlab = "Percent of population without high school degree",
     ylab = "Violent Crime Rate per 100,000")

Number of eggs versus amounts of feed supplement

Description

Data for Exercise 9.22

Usage

Eggs

Format

A data frame/tibble with 12 observations on two variables

feed: amount of feed supplement
eggs: number of eggs per day for 100 chickens

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(eggs ~ feed, data = Eggs)
model <- lm(eggs ~ feed, data = Eggs)
abline(model, col = "red")
summary(model)
rm(model)

Percent of the population over the age of 65

Description

Data for Exercises 1.92 and 2.61

Usage

Elderly

Format

A data frame/tibble with 51 observations on three variables

state: a factor with levels Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, District of Colunbia, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missour, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
percent1985: percent of the population over the age of 65 in 1985
percent1998: percent of the population over the age of 65 in 1998

Source

U.S. Census Bureau Internet site, February 2000.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


with(data = Elderly, 
stripchart(x = list(percent1998, percent1985), method = "stack", pch = 19,
           col = c("red","blue"), group.names = c("1998", "1985"))
           )
with(data = Elderly, cor(percent1998, percent1985))
## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Elderly, aes(x = percent1985, y = percent1998)) +
           geom_point() + 
           theme_bw()

## End(Not run)

Amount of energy consumed by homes versus their sizes

Description

Data for Exercises 2.5, 2.24, and 2.55

Usage

Energy

Format

A data frame/tibble with 12 observations on two variables

size: size of home (in square feet)
kilowatt: kilowatt-hours per month

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(kilowatt ~ size, data = Energy)
with(data = Energy, cor(size, kilowatt))
model <- lm(kilowatt ~ size, data = Energy)
plot(Energy$size, resid(model), xlab = "size")

Salaries after 10 years for graduates of three different universities

Description

Data for Example 10.7

Usage

Engineer

Format

A data frame/tibble with 51 observations on two variables

salary: salary (in $1000) 10 years after graduation
university: a factor with levels A, B, and C

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(salary ~ university, data = Engineer,
        main = "Example 10.7", col = "yellow")
kruskal.test(salary ~ university, data = Engineer)
anova(lm(salary ~ university, data = Engineer))
anova(lm(rank(salary) ~ university, data = Engineer))
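
The last two calls above pair the Kruskal-Wallis test with an analysis of variance on the ranks. The two are closely linked: the Kruskal-Wallis statistic equals (N - 1) times the between-group sum of squares of the ranks divided by their total sum of squares. The sketch below checks this on a small made-up data set; toy and its values are hypothetical and are not the Engineer salaries.

```r
# Hypothetical toy data with no ties; NOT the Engineer salaries.
toy <- data.frame(
  y = c(41, 45, 52, 48, 39, 55, 61, 58, 63, 50, 70, 66, 72, 68, 75),
  g = factor(rep(c("A", "B", "C"), each = 5))
)
N <- nrow(toy)

# One-way ANOVA on the ranks of the response
tab <- anova(lm(rank(y) ~ g, data = toy))
ss_between <- tab[["Sum Sq"]][1]    # between-group sum of squares
ss_total   <- sum(tab[["Sum Sq"]])  # between + residual

# Kruskal-Wallis statistic rebuilt from the rank sums of squares
H_from_ranks <- (N - 1) * ss_between / ss_total

# Built-in Kruskal-Wallis statistic for comparison
H_builtin <- unname(kruskal.test(y ~ g, data = toy)$statistic)

c(from_ranks = H_from_ranks, builtin = H_builtin)
```

The two tests still differ in their reference distributions: kruskal.test() uses a chi-square approximation, while the ANOVA on ranks uses an F distribution, so the p-values need not match.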

College entrance exam scores for 24 high school seniors

Description

Data for Example 1.8

Usage

Entrance

Format

A data frame/tibble with 24 observations on one variable

score: college entrance exam score

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Entrance$score)
stem(Entrance$score, scale = 2)

Fuel efficiency ratings for minicompact vehicles in 2001

Description

Data for Exercise 1.65

Usage

Epaminicompact

Format

A data frame/tibble with 22 observations on ten variables

class: a character variable with value MINICOMPACT CARS
manufacturer: a character variable with values AUDI, BMW, JAGUAR, MERCEDES-BENZ, MITSUBISHI, and PORSCHE
carline: a character variable with values 325CI CONVERTIBLE, 330CI CONVERTIBLE, 911 CARRERA 2/4, 911 TURBO, CLK320 (CABRIOLET), CLK430 (CABRIOLET), ECLIPSE SPYDER, JAGUAR XK8 CONVERTIBLE, JAGUAR XKR CONVERTIBLE, M3 CONVERTIBLE, TT COUPE, and TT COUPE QUATTRO
displ: engine displacement (in liters)
cyl: number of cylinders
trans: a factor with levels Auto(L5), Auto(S4), Auto(S5), Manual(M5), and Manual(M6)
drv: a factor with levels 4(four wheel drive), F(front wheel drive), and R(rear wheel drive)
cty: city mpg
hwy: highway mpg
cmb: combined city and highway mpg

Source

EPA data.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


summary(Epaminicompact$cty)
plot(hwy ~ cty, data = Epaminicompact)

Fuel efficiency ratings for two-seater vehicles in 2001

Description

Data for Exercise 5.8

Usage

Epatwoseater

Format

A data frame/tibble with 36 observations on ten variables

class: a character variable with value TWO SEATERS
manufacturer: a character variable with values ACURA, AUDI, BMW, CHEVROLET, DODGE, FERRARI, HONDA, LAMBORGHINI, MAZDA, MERCEDES-BENZ, PLYMOUTH, PORSCHE, and TOYOTA
carline: a character variable with values BOXSTER, BOXSTER S, CORVETTE, DB132/144 DIABLO, FERRARI 360 MODENA/SPIDER, FERRARI 550 MARANELLO/BARCHETTA, INSIGHT, MR2, MX-5 MIATA, NSX, PROWLER, S2000, SL500, SL600, SLK230 KOMPRESSOR, SLK320, TT ROADSTER, TT ROADSTER QUATTRO, VIPER CONVERTIBLE, VIPER COUPE, Z3 COUPE, Z3 ROADSTER, and Z8
displ: engine displacement (in liters)
cyl: number of cylinders
trans: a factor with levels Auto(L4), Auto(L5), Auto(S4), Auto(S5), Auto(S6), Manual(M5), and Manual(M6)
drv: a factor with levels 4(four wheel drive), F(front wheel drive), and R(rear wheel drive)
cty: city mpg
hwy: highway mpg
cmb: combined city and highway mpg

Source

Environmental Protection Agency.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


summary(Epatwoseater$cty)
plot(hwy ~ cty, data = Epatwoseater)
boxplot(cty ~ drv, data = Epatwoseater, col = "lightgreen")

Ages of 25 executives

Description

Data for Exercise 1.104

Usage

Executiv

Format

A data frame/tibble with 25 observations on one variable

age: a numeric vector

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


hist(Executiv$age, xlab = "Age of banking executives", 
breaks = 5, main = "", col = "gray")

Weight loss for 30 members of an exercise program

Description

Data for Exercise 1.44

Usage

Exercise

Format

A data frame/tibble with 30 observations on one variable

loss: a numeric vector

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Exercise$loss)

Measures of softness of ten different clothing garments washed with and without a softener

Description

Data for Example 7.21

Usage

Fabric

Format

A data frame/tibble with 20 observations on three variables

garment: a numeric vector
softner: a character variable with values with and without
softness: a numeric vector

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


## Not run: 
library(tidyr)
library(dplyr)  # provides mutate() and %>% used below
tidyr::spread(Fabric, softner, softness) -> FabricWide
wilcox.test(Pair(with, without)~1, alternative = "greater", data = FabricWide)
T7 <- tidyr::spread(Fabric, softner, softness) %>% 
mutate(di = with - without, adi = abs(di), rk = rank(adi), 
       srk = sign(di)*rk)
T7
t.test(T7$srk, alternative = "greater")

## End(Not run)
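
The example above builds signed ranks of the paired differences by hand. As a small base R sketch of the same idea, the following computes the Wilcoxon signed-rank statistic V (the sum of the ranks of the positive differences) and compares it with wilcox.test(). The vectors with_softener and without_softener are hypothetical and are not the Fabric data.

```r
# Hypothetical paired measurements; NOT the Fabric data.
with_softener    <- c(37, 31, 43, 46, 29, 34, 40, 45)
without_softener <- c(35, 32, 38, 42, 26, 28, 33, 36)

# Paired differences (no zeros and no tied absolute values here)
d <- with_softener - without_softener

# Rank the absolute differences, then attach the signs
rk  <- rank(abs(d))
srk <- sign(d) * rk

# Signed-rank statistic: sum of the ranks of the positive differences
V_manual <- sum(rk[d > 0])

# Built-in paired test for comparison
res <- wilcox.test(with_softener, without_softener,
                   paired = TRUE, alternative = "greater")

c(manual = V_manual, builtin = unname(res$statistic))
```

With ties or zero differences, wilcox.test() switches to a normal approximation and issues a warning, so the toy values were chosen to avoid both.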

Waiting times between successive eruptions of the Old Faithful geyser

Description

Data for Exercises 5.12 and 5.111

Usage

Faithful

Format

A data frame/tibble with 299 observations on two variables

time: a numeric vector
eruption: a factor with levels 1 and 2

Source

A. Azzalini and A. Bowman, "A Look at Some Data on the Old Faithful Geyser," Journal of the Royal Statistical Society, Series C, 39 (1990), 357-366.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


t.test(time ~ eruption, data = Faithful)
hist(Faithful$time, xlab = "wait time", main = "", freq = FALSE)
lines(density(Faithful$time))

## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Faithful, aes(x = time, y = ..density..)) + 
           geom_histogram(binwidth = 5, fill = "pink", col = "black") + 
           geom_density() + 
           theme_bw() + 
           labs(x = "wait time")

## End(Not run)

Size of family versus cost per person per week for groceries

Description

Data for Exercise 2.89

Usage

Family

Format

A data frame/tibble with 20 observations on two variables

number: number in family
cost: cost per person (in dollars)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(cost ~ number, data = Family)
abline(lm(cost ~ number, data = Family), col = "red")
cor(Family$cost, Family$number)

## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Family, aes(x = number, y = cost)) + 
           geom_point() + 
           geom_smooth(method = "lm") + 
           theme_bw()

## End(Not run)

Choice of presidential ticket in 1984 by gender

Description

Data for Exercise 8.23

Usage

Ferraro1

Format

A data frame/tibble with 1000 observations on two variables

gender: a factor with levels Men and Women
candidate: a character vector of 1984 president and vice-president candidates

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~gender + candidate, data = Ferraro1)
T1
chisq.test(T1)  
rm(T1)

Choice of vice presidential candidate in 1984 by gender

Description

Data for Exercise 8.23

Usage

Ferraro2

Format

A data frame/tibble with 1000 observations on two variables

gender: a factor with levels Men and Women
candidate: a character vector of 1984 president and vice-president candidates

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~gender + candidate, data = Ferraro2)
T1
chisq.test(T1)  
rm(T1)

Fertility rates of all 50 states and DC

Description

Data for Exercise 1.125

Usage

Fertility

Format

A data frame/tibble with 51 observations on two variables

state: a character variable with values Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, District of Colunbia, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missour, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
rate: fertility rate (expected number of births during childbearing years)

Source

Population Reference Bureau.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Fertility$rate)
fivenum(Fertility$rate)
EDA(Fertility$rate)

Ages of women at the birth of their first child

Description

Data for Exercise 5.11

Usage

Firstchi

Format

A data frame/tibble with 87 observations on one variable

age: age of woman at birth of her first child

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


EDA(Firstchi$age)

Length and number of fish caught with small and large mesh codend

Description

Data for Exercises 5.83, 5.119, and 7.29

Usage

Fish

Format

A data frame/tibble with 1534 observations on two variables

codend: a character variable with values smallmesh and largemesh
length: length of the fish measured in centimeters

Source

R. Millar, "Estimating the Size-Selectivity of Fishing Gear by Conditioning on the Total Catch," Journal of the American Statistical Association, 87 (1992), 962-968.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


tapply(Fish$length, Fish$codend, median, na.rm = TRUE)
SIGN.test(Fish$length[Fish$codend == "smallmesh"], conf.level = 0.99)
## Not run: 
library(dplyr)  # provides %>% and summarize()
dplyr::group_by(Fish, codend) %>%
         summarize(MEDIAN = median(length, na.rm = TRUE))

## End(Not run)

Number of sit-ups before and after a physical fitness course

Description

Data for Exercise 7.71

Usage

Fitness

Format

A data frame/tibble with 18 observations on three variables

subject: a character variable indicating subject number
test: a character variable with values After and Before
number: a numeric vector recording the number of sit-ups performed in one minute

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


## Not run: 
library(dplyr)  # provides mutate() and %>% used below
tidyr::spread(Fitness, test, number) -> FitnessWide
t.test(Pair(After, Before)~1, alternative = "greater", data = FitnessWide)

Wide <- tidyr::spread(Fitness, test, number) %>%
mutate(diff = After - Before)
Wide
qqnorm(Wide$diff)
qqline(Wide$diff)
t.test(Wide$diff, alternative = "greater")

## End(Not run)

Florida voter results in the 2000 presidential election

Description

Data for Statistical Insight Chapter 2

Usage

Florida2000

Format

A data frame/tibble with 67 observations on 12 variables

county: a character variable with values ALACHUA, BAKER, BAY, BRADFORD, BREVARD, BROWARD, CALHOUN, CHARLOTTE, CITRUS, CLAY, COLLIER, COLUMBIA, DADE, DE SOTO, DIXIE, DUVAL, ESCAMBIA, FLAGLER, FRANKLIN, GADSDEN, GILCHRIST, GLADES, GULF, HAMILTON, HARDEE, HENDRY, HERNANDO, HIGHLANDS, HILLSBOROUGH, HOLMES, INDIAN RIVER, JACKSON, JEFFERSON, LAFAYETTE, LAKE, LEE, LEON, LEVY, LIBERTY, MADISON, MANATEE, MARION, MARTIN, MONROE, NASSAU, OKALOOSA, OKEECHOBEE, ORANGE, OSCEOLA, PALM BEACH, PASCO, PINELLAS, POLK, PUTNAM, SANTA ROSA, SARASOTA, SEMINOLE, ST. JOHNS, ST. LUCIE, SUMTER, SUWANNEE, TAYLOR, UNION, VOLUSIA, WAKULLA, WALTON, and WASHINGTON
gore: number of votes
bush: number of votes
buchanan: number of votes
nader: number of votes
browne: number of votes
hagelin: number of votes
harris: number of votes
mcreynolds: number of votes
moorehead: number of votes
phillips: number of votes
total: number of votes

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(buchanan ~ total, data = Florida2000, 
     xlab = "Total votes cast (in thousands)", 
     ylab = "Votes for Buchanan")

Breakdown times of an insulating fluid under various levels of voltage stress

Description

Data for Exercise 5.76

Usage

Fluid

Format

A data frame/tibble with 76 observations on two variables

kilovolts: a character variable giving the voltage level in kilovolts (for example, 34kV)
time: breakdown time (in minutes)

Source

E. Soofi, N. Ebrahimi, and M. Habibullah, 1995.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


DF1 <- Fluid[Fluid$kilovolts == "34kV", ]
DF1
# OR
DF2 <- subset(Fluid, subset = kilovolts == "34kV")
DF2
stem(DF2$time)
SIGN.test(DF2$time)
## Not run: 
library(dplyr)
DF3 <- dplyr::filter(Fluid, kilovolts == "34kV") 
DF3

## End(Not run)

Annual food expenditures for 40 single households in Ohio

Description

Data for Exercise 5.106

Usage

Food

Format

A data frame/tibble with 40 observations on one variable

expenditure: a numeric vector recording annual food expenditure (in dollars) in the state of Ohio.

Source

Bureau of Labor Statistics.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


EDA(Food$expenditure)

Cholesterol values of 62 subjects in the Framingham Heart Study

Description

Data for Exercises 1.56, 1.75, 3.69, and 5.60

Usage

Framingh

Format

A data frame/tibble with 62 observations on one variable

cholest: a numeric vector with cholesterol values

Source

R. D'Agostino, et al., (1990) "A Suggestion for Using Powerful and Informative Tests for Normality," The American Statistician, 44, 316-321.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Framingh$cholest)
boxplot(Framingh$cholest, horizontal = TRUE)
hist(Framingh$cholest, freq = FALSE)
lines(density(Framingh$cholest))
mean(Framingh$cholest > 200 & Framingh$cholest < 240)

## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Framingh, aes(x = factor(1), y = cholest)) + 
  geom_boxplot() +                 # boxplot
  labs(x = "") +                   # no x label  
  theme_bw() +                     # black and white theme  
  geom_jitter(width = 0.2) +       # jitter points
  coord_flip()                     # Create horizontal plot
ggplot2::ggplot(data = Framingh, aes(x = cholest, y = ..density..)) +
  geom_histogram(fill = "pink", binwidth = 15, color = "black") + 
  geom_density() + 
  theme_bw()

## End(Not run)

Ages of a random sample of 30 college freshmen

Description

Data for Exercise 6.53

Usage

Freshman

Format

A data frame/tibble with 30 observations on one variable

age: a numeric vector of ages

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


SIGN.test(Freshman$age, md = 19)
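
The idea behind a sign test of a hypothesized median can be sketched in base R with binom.test(): count the observations above and below the hypothesized value, drop any that equal it, and test whether the split is consistent with a probability of 0.5. This is the textbook construction, not a description of how SIGN.test() is implemented, and the vector age below is hypothetical rather than the Freshman data.

```r
# Hypothetical ages; NOT the Freshman data.
age <- c(18, 18, 19, 19, 20, 20, 20, 21, 22, 25)
md0 <- 19  # hypothesized median

# Counts above and below the hypothesized median;
# observations equal to md0 carry no sign and are dropped
above  <- sum(age > md0)
below  <- sum(age < md0)
n_used <- above + below

# Two-sided exact binomial test of P(above) = 0.5
res <- binom.test(above, n_used, p = 0.5)

c(above = above, below = below, p.value = res$p.value)
```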

Cost of funeral by region of country

Description

Data for Exercise 8.54

Usage

Funeral

Format

A data frame/tibble with 400 observations on two variables

region: a factor with levels Central, East, South, and West
cost: a factor with levels less than expected, about what expected, and more than expected

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~region + cost, data = Funeral)
T1
chisq.test(T1)  
rm(T1)

Velocities of 82 galaxies in the Corona Borealis region

Description

Data for Example 5.2

Usage

Galaxie

Format

A data frame/tibble with 82 observations on one variable

velocity: velocity measured in kilometers per second

Source

K. Roeder, "Density Estimation with Confidence Sets Explained by Superclusters and Voids in the Galaxies," Journal of the American Statistical Association, 85 (1990), 617-624.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


EDA(Galaxie$velocity)

Results of a Gallup poll on possession of marijuana as a criminal offense conducted in 1980

Description

Data for Exercise 2.76

Usage

Gallup

Format

A data frame/tibble with 1,200 observations on two variables

demographics: a factor with levels National, Gender: Male, Gender: Female, Education: College, Eduction: High School, Education: Grade School, Age: 18-24, Age: 25-29, Age: 30-49, Age: 50-older, Religion: Protestant, and Religion: Catholic
opinion: a factor with levels Criminal, Not Criminal, and No Opinion

Source

George H. Gallup, The Gallup Opinion Index Report No. 179 (Princeton, NJ: The Gallup Poll, July 1980), p. 15.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~demographics + opinion, data = Gallup)
T1
t(T1[c(2, 3), ])
barplot(t(T1[c(2, 3), ]))
barplot(t(T1[c(2, 3), ]), beside = TRUE)

## Not run: 
library(dplyr)
library(ggplot2)
dplyr::filter(Gallup, demographics == "Gender: Male" | demographics == "Gender: Female") %>%
ggplot2::ggplot(aes(x = demographics, fill = opinion)) + 
           geom_bar(position = "fill") +   # stack to proportions so the y axis is a fraction
           theme_bw() + 
           labs(y = "Fraction")

## End(Not run)

Price of regular unleaded gasoline obtained from 25 service stations

Description

Data for Exercise 1.45

Usage

Gasoline

Format

A data frame/tibble with 25 observations on one variable

price: price for one gallon of gasoline

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Gasoline$price)

## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Gasoline, aes(x = factor(1), y = price)) + 
           geom_violin() + 
           geom_jitter() + 
           theme_bw()

## End(Not run)

Number of errors in copying a German passage before and after an experimental course in German

Description

Data for Exercise 7.60

Usage

German

Format

A data frame/tibble with ten observations on three variables

student: a character variable indicating student number
when: a character variable with values Before and After to indicate when the student received experimental instruction in German
errors: the number of errors in copying a German passage

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


## Not run: 
library(dplyr)  # provides mutate() and %>% used below
tidyr::spread(German, when, errors) -> GermanWide
t.test(Pair(After, Before) ~ 1, data = GermanWide)
wilcox.test(Pair(After, Before) ~ 1, data = GermanWide)
T8 <- tidyr::spread(German, when, errors) %>%
mutate(di = After - Before, adi = abs(di), rk = rank(adi), srk = sign(di)*rk)
T8
qqnorm(T8$di)
qqline(T8$di)
t.test(T8$srk)

## End(Not run)

Distances a golf ball can be driven by 20 professional golfers

Description

Data for Exercise 5.24

Usage

Golf

Format

A data frame/tibble with 20 observations on one variable

yards: distance a golf ball is driven in yards

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Golf$yards)
qqnorm(Golf$yards)
qqline(Golf$yards)

## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Golf, aes(sample = yards)) + 
           geom_qq() + 
           theme_bw()

## End(Not run)

Annual salaries for state governors in 1994 and 1999

Description

Data for Exercise 5.112

Usage

Governor

Format

A data frame/tibble with 50 observations on three variables

state: a character variable with values Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
year: a factor indicating year
salary: a numeric vector with the governor's salary (in dollars)

Source

The 2000 World Almanac and Book of Facts.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(salary ~ year, data = Governor)

## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Governor, aes(x = salary)) + 
           geom_density(fill = "pink") + 
           facet_grid(year ~ .) + 
           theme_bw()

## End(Not run)

High school GPA versus college GPA

Description

Data for Example 2.13

Usage

Gpa

Format

A data frame/tibble with 10 observations on two variables

hsgpa: high school GPA
collgpa: college GPA

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(collgpa ~ hsgpa, data = Gpa)
mod <- lm(collgpa ~ hsgpa, data = Gpa)
abline(mod)               # add line
yhat <- predict(mod)      # fitted values
e <- resid(mod)           # residuals
cbind(Gpa, yhat, e)       # Table 2.1
cor(Gpa$hsgpa, Gpa$collgpa)

## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Gpa, aes(x = hsgpa, y = collgpa)) + 
           geom_point() + 
           geom_smooth(method = "lm") + 
           theme_bw()

## End(Not run)

Test grades in a beginning statistics class

Description

Data for Exercise 1.120

Usage

Grades

Format

A data frame with 29 observations on one variable

grades: a numeric vector containing test grades

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


hist(Grades$grades, main = "", xlab = "Test grades", right = FALSE)

## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Grades, aes(x = grades, y = ..density..)) + 
           geom_histogram(fill = "pink", binwidth = 5, color = "black") + 
           geom_density(lwd = 2, color = "red") + 
           theme_bw() 

## End(Not run)

Graduation rates for student athletes in the Southeastern Conference

Description

Data for Exercise 1.118

Usage

Graduate

Format

A data frame/tibble with 12 observations on three variables

school: a character variable with values Alabama, Arkansas, Auburn, Florida, Georgia, Kentucky, Louisiana St, Mississippi, Mississippi St, South Carolina, Tennessee, and Vanderbilt
code: a character variable with values Al, Ar, Au, Fl, Ge, Ke, LSt, Mi, MSt, SC, Te, and Va
percent: graduation rate

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


barplot(Graduate$percent, names.arg = Graduate$school, 
        las = 2, cex.names = 0.7, col = "tomato")

Varve thickness from a sequence through an Eocene lake deposit in the Rocky Mountains

Description

Data for Exercise 6.57

Usage

Greenriv

Format

A data frame/tibble with 37 observations on one variable

thick: varve thickness in millimeters

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Greenriv$thick)
SIGN.test(Greenriv$thick, md = 7.3, alternative = "greater")
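
The one-sided sign test above can be related to a binomial test on the counts of observations above and below the hypothesized median. The following is a minimal sketch in base R using made-up values (x and md0 are hypothetical and are not the Greenriv data); that SIGN.test handles ties and counts in exactly this way is an assumption.

```r
# Hypothetical measurements (NOT the Greenriv data)
x <- c(7.1, 8.4, 9.0, 6.5, 10.2, 7.9, 8.8, 11.3)
md0 <- 7.3                       # hypothesized median

# Count observations above and below the hypothesized median;
# values equal to md0 (none here) would be dropped
above <- sum(x > md0)
below <- sum(x < md0)

# Under H0 the number above md0 is Binomial(n, 0.5)
res <- binom.test(above, above + below, p = 0.5, alternative = "greater")
res$p.value
```

With these values six of the eight observations lie above md0, so the p-value is the binomial probability of six or more successes in eight trials.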

Thickness of a varved section of the Green River oil shale deposit near a major lake in the Rocky Mountains

Description

Data for Exercises 6.45 and 6.98

Usage

Grnriv2

Format

A data frame/tibble with 101 observations on one variable

thick: varve thickness (in millimeters)

Source

J. Davis, Statistics and Data Analysis in Geology, 2nd Ed., John Wiley and Sons, New York.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Grnriv2$thick)
t.test(Grnriv2$thick, mu = 8, alternative = "less")

Group data to illustrate analysis of variance

Description

Data for Exercise 10.42

Usage

Groupabc

Format

A data frame/tibble with 45 observations on two variables

group: a factor with levels A, B, and C
response: a numeric vector

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(response ~ group, data = Groupabc, 
        col = c("red", "blue", "green"))
anova(lm(response ~ group, data = Groupabc))

An illustration of analysis of variance

Description

Data for Exercise 10.4

Usage

Groups

Format

A data frame/tibble with 78 observations on two variables

group: a factor with levels A, B, and C
response: a numeric vector

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(response ~ group, data = Groups, col = c("red", "blue", "green"))
anova(lm(response ~ group, data = Groups))

Children's age versus number of completed gymnastic activities

Description

Data for Exercises 2.21 and 9.14

Usage

Gym

Format

A data frame/tibble with eight observations on two variables

age: age of child
number: number of gymnastic activities successfully completed

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(number ~ age, data = Gym)
model <- lm(number ~ age, data = Gym)
abline(model, col = "red")
summary(model)

Study habits of students in two matched school districts

Description

Data for Exercise 7.57

Usage

Habits

Format

A data frame/tibble with 11 observations on four variables

A: study habit score
B: study habit score
differ: B minus A
signrks: the signed ranks of the differences

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


shapiro.test(Habits$differ)
qqnorm(Habits$differ)
qqline(Habits$differ)
wilcox.test(Pair(B, A) ~ 1, data = Habits, alternative = "less")
t.test(Habits$signrks, alternative = "less")

## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Habits, aes(x = differ)) + 
           geom_dotplot(fill = "blue") + 
           theme_bw()

## End(Not run)
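
The signrks variable stores signed ranks of the differences. As a minimal sketch of how such values can be computed in base R (the vector d is made-up illustration data, not the Habits values, and whether signrks was produced exactly this way is an assumption):

```r
# Hypothetical paired differences (NOT the Habits data)
d <- c(-3, 5, -1, 2, -7, 4)

# Rank the absolute differences, then reattach the sign of each difference
signed_ranks <- sign(d) * rank(abs(d))
signed_ranks

# The sum of the positive signed ranks is the Wilcoxon signed-rank statistic V
V <- sum(signed_ranks[signed_ranks > 0])
w <- wilcox.test(d)
c(V = V, wilcox = unname(w$statistic))
```

With no ties and no zero differences, the hand-computed V matches the statistic reported by wilcox.test.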

Haptoglobin concentration in blood serum of 8 healthy adults

Description

Data for Example 6.9

Usage

Haptoglo

Format

A data frame/tibble with eight observations on one variable

concent: haptoglobin concentration (in grams per liter)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


shapiro.test(Haptoglo$concent)
t.test(Haptoglo$concent, mu = 2, alternative = "less")

Daily receipts for a small hardware store for 31 working days

Description

Daily receipts for a small hardware store for 31 working days

Usage

Hardware

Format

A data frame with 31 observations on one variable

receipt: a numeric vector of daily receipts (in dollars)

Source

J.C. Miller and J.N. Miller, (1988), Statistics for Analytical Chemistry, 2nd Ed. (New York: Halsted Press).

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Hardware$receipt)

Tensile strength of Kraft paper for different percentages of hardwood in the batches of pulp

Description

Data for Example 2.18 and Exercise 9.34

Usage

Hardwood

Format

A data frame/tibble with 19 observations on two variables

tensile: tensile strength of Kraft paper (in pounds per square inch)
hardwood: percent of hardwood in the batch of pulp that was used to produce the paper

Source

G. Joglekar, et al., "Lack-of-Fit Testing When Replicates Are Not Available," The American Statistician, 43(3), (1989), 135-143.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(tensile ~ hardwood, data = Hardwood)
model <- lm(tensile ~ hardwood, data = Hardwood)
abline(model, col = "red")
plot(model, which = 1)

Primary heating sources of homes on Indian reservations versus all households

Description

Data for Exercise 1.29

Usage

Heat

Format

A data frame/tibble with 301 observations on two variables

fuel: a factor with levels Utility gas, LP bottled gas, Electricity, Fuel oil, Wood, and Other
location: a factor with levels American Indians on reservation, All U.S. households, and American Indians not on reservations

Source

Bureau of the Census, Housing of the American Indians on Reservations, Statistical Brief 95-11, April 1995.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~ fuel + location, data = Heat)
T1
barplot(t(T1), beside = TRUE, legend = TRUE)

## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Heat, aes(x = fuel, fill = location)) + 
           geom_bar(position = "dodge") + 
           labs(y = "percent") + 
           theme_bw() + 
           theme(axis.text.x = element_text(angle = 30, hjust = 1)) 

## End(Not run)

Fuel efficiency ratings for three types of oil heaters

Description

Data for Exercise 10.32

Usage

Heating

Format

A data frame/tibble with 90 observations on the two variables

type: a factor with levels A, B, and C denoting the type of oil heater
efficiency: heater efficiency rating

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(efficiency ~ type, data = Heating, 
        col = c("red", "blue", "green"))
kruskal.test(efficiency ~ type, data = Heating)

Results of treatments for Hodgkin's disease

Description

Data for Exercise 2.77

Usage

Hodgkin

Format

A data frame/tibble with 538 observations on two variables

type: a factor with levels LD, LP, MC, and NS
response: a factor with levels Positive, Partial, and None

Source

I. Dunsmore, F. Daly, Statistical Methods, Unit 9, Categorical Data, Milton Keynes, The Open University, 18.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~type + response, data = Hodgkin)
T1
barplot(t(T1), legend = TRUE, beside = TRUE)

## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Hodgkin, aes(x = type, fill = response)) + 
           geom_bar(position = "dodge") + 
           theme_bw()

## End(Not run)

Median prices of single-family homes in 65 metropolitan statistical areas

Description

Data for Statistical Insight Chapter 5

Usage

Homes

Format

A data frame/tibble with 65 observations on the four variables

city: a character variable with values Akron OH, Albuquerque NM, Anaheim CA, Atlanta GA, Baltimore MD, Baton Rouge LA, Birmingham AL, Boston MA, Bradenton FL, Buffalo NY, Charleston SC, Chicago IL, Cincinnati OH, Cleveland OH, Columbia SC, Columbus OH, Corpus Christi TX, Dallas TX, Daytona Beach FL, Denver CO, Des Moines IA, Detroit MI, El Paso TX, Grand Rapids MI, Hartford CT, Honolulu HI, Houston TX, Indianapolis IN, Jacksonville FL, Kansas City MO, Knoxville TN, Las Vegas NV, Los Angeles CA, Louisville KY, Madison WI, Memphis TN, Miami FL, Milwaukee WI, Minneapolis MN, Mobile AL, Nashville TN, New Haven CT, New Orleans LA, New York NY, Oklahoma City OK, Omaha NE, Orlando FL, Philadelphia PA, Phoenix AZ, Pittsburgh PA, Portland OR, Providence RI, Sacramento CA, Salt Lake City UT, San Antonio TX, San Diego CA, San Francisco CA, Seattle WA, Spokane WA, St Louis MO, Syracuse NY, Tampa FL, Toledo OH, Tulsa OK, and Washington DC
region: a character variable with values Midwest, Northeast, South, and West
year: a factor with levels 1994 and 2000
price: median house price (in dollars)

Source

National Association of Realtors.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


tapply(Homes$price, Homes$year, mean)
tapply(Homes$price, Homes$region, mean)
p2000 <- subset(Homes, year == "2000")
p1994 <- subset(Homes, year == "1994")
## Not run: 
library(dplyr)
library(ggplot2)
dplyr::group_by(Homes, year, region) %>%
   summarize(AvgPrice = mean(price))
ggplot2::ggplot(data = Homes, aes(x = region, y = price)) + 
           geom_boxplot() + 
           theme_bw() + 
           facet_grid(year ~ .)

## End(Not run)

Number of hours per week spent on homework for private and public high school students

Description

Data for Exercise 7.78

Usage

Homework

Format

A data frame with 30 observations on two variables

school: type of school, either private or public
time: number of hours per week spent on homework

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(time ~ school, data = Homework, 
        ylab = "Hours per week spent on homework")
t.test(time ~ school, data = Homework)

Miles per gallon for a Honda Civic on 35 different occasions

Description

Data for Statistical Insight Chapter 6

Usage

Honda

Format

A data frame/tibble with 35 observations on one variable

mileage: miles per gallon for a Honda Civic

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


t.test(Honda$mileage, mu = 40, alternative = "less")

Hostility levels of high school students from rural, suburban, and urban areas

Description

Data for Example 10.6

Usage

Hostile

Format

A data frame/tibble with 135 observations on two variables

location: a factor with the location of the high school student (Rural, Suburban, or Urban)
hostility: the score from the Hostility Level Test

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(hostility ~ location, data = Hostile, 
        col = c("red", "blue", "green"))
kruskal.test(hostility ~ location, data = Hostile)

Median home prices for 1984 and 1993 in 37 markets across the U.S.

Description

Data for Exercise 5.82

Usage

Housing

Format

A data frame/tibble with 74 observations on three variables

city: a character variable with values Albany, Anaheim, Atlanta, Baltimore, Birmingham, Boston, Chicago, Cincinnati, Cleveland, Columbus, Dallas, Denver, Detroit, Ft Lauderdale, Houston, Indianapolis, Kansas City, Los Angeles, Louisville, Memphis, Miami, Milwaukee, Minneapolis, Nashville, New York, Oklahoma City, Philadelphia, Providence, Rochester, Salt Lake City, San Antonio, San Diego, San Francisco, San Jose, St Louis, Tampa, and Washington
year: a factor with levels 1984 and 1993
price: median house price (in dollars)

Source

National Association of Realtors.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stripchart(price ~ year, data = Housing, method = "stack", 
           pch = 1, col = c("red", "blue"))
## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Housing, aes(x = price, fill = year)) + 
           geom_dotplot() + 
           facet_grid(year ~ .) + 
           theme_bw()

## End(Not run)

Number of storms, hurricanes and El Nino effects from 1950 through 1995

Description

Data for Exercises 1.38, 10.19, and Example 1.6

Usage

Hurrican

Format

A data frame/tibble with 46 observations on four variables

year: a numeric vector indicating year
storms: a numeric vector recording number of storms
hurrican: a numeric vector recording number of hurricanes
elnino: a factor with levels cold, neutral, and warm

Source

National Hurricane Center.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~hurrican, data = Hurrican)
T1
barplot(T1, col = "blue", main = "Problem 1.38",
        xlab = "Number of hurricanes", 
        ylab = "Number of seasons")
boxplot(storms ~ elnino, data = Hurrican, 
        col = c("blue", "yellow", "red"))
anova(lm(storms ~ elnino, data = Hurrican))
rm(T1)

Number of icebergs sighted each month south of Newfoundland and south of the Grand Banks in 1920

Description

Data for Exercises 2.46 and 2.60

Usage

Iceberg

Format

A data frame with 12 observations on three variables

month: a character variable with abbreviated months of the year
Newfoundland: number of icebergs sighted south of Newfoundland
Grand Banks: number of icebergs sighted south of Grand Banks

Source

N. Shaw, Manual of Meteorology, Vol. 2 (London: Cambridge University Press 1942), 7; and F. Mosteller and J. Tukey, Data Analysis and Regression (Reading, MA: Addison-Wesley, 1977).

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(Newfoundland ~ `Grand Banks`, data = Iceberg)
abline(lm(Newfoundland ~ `Grand Banks`, data = Iceberg), col = "blue")

Percent change in personal income from 1st to 2nd quarter in 2000

Description

Data for Exercise 1.33

Usage

Income

Format

A data frame/tibble with 51 observations on two variables

state: a character variable with values Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, District of Colunbia, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missour, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
percent_change: percent change in income from first quarter to the second quarter of 2000

Source

US Department of Commerce.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


Income$class <- cut(Income$percent_change, 
                    breaks = c(-Inf, 0.5, 1.0, 1.5, 2.0, Inf))
T1 <- xtabs(~class, data = Income)
T1
barplot(T1, col = "pink")   
## Not run: 
library(ggplot2)
DF <- as.data.frame(T1)
DF
ggplot2::ggplot(data = DF,  aes(x = class, y = Freq)) + 
           geom_bar(stat = "identity", fill = "purple") + 
           theme_bw()

## End(Not run)
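
The class variable above is built with cut(), which by default forms intervals that are open on the left and closed on the right. A minimal sketch with made-up values (x is hypothetical, not the Income data) shows where boundary values such as 0.5 and 1.0 land:

```r
# Hypothetical percent changes (NOT the Income data)
x <- c(0.2, 0.5, 0.9, 1.0, 1.7, 2.4)

# Same breaks as in the example above; default is right = TRUE,
# so 0.5 falls in the first class and 1.0 in the second
cls <- cut(x, breaks = c(-Inf, 0.5, 1.0, 1.5, 2.0, Inf))
data.frame(x, cls)
table(cls)
```

Passing right = FALSE to cut() would instead place each boundary value in the class above it.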

Illustrates a comparison problem for long-tailed distributions

Description

Data for Exercise 7.41

Usage

Independent

Format

A data frame/tibble with 46 observations on two variables

score: a numeric vector
group: a factor with levels A and B

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


qqnorm(Independent$score[Independent$group=="A"])
qqline(Independent$score[Independent$group=="A"])
qqnorm(Independent$score[Independent$group=="B"])
qqline(Independent$score[Independent$group=="B"])
boxplot(score ~ group, data = Independent, col = "blue")
wilcox.test(score ~ group, data = Independent)

Educational attainment versus per capita income and poverty rate for American Indians living on reservations

Description

Data for Exercise 2.95

Usage

Indian

Format

A data frame/tibble with ten observations on four variables

reservation: a character variable with values Blackfeet, Fort Apache, Gila River, Hopi, Navajo, Papago, Pine Ridge, Rosebud, San Carlos, and Zuni Pueblo
percent high school: percent who have graduated from high school
per capita income: per capita income (in dollars)
poverty rate: percent poverty

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


par(mfrow = c(1, 2))
plot(`per capita income` ~ `percent high school`, data = Indian, 
     xlab = "Percent high school graduates", ylab = "Per capita income")
plot(`poverty rate` ~ `percent high school`, data = Indian, 
     xlab = "Percent high school graduates", ylab = "Percent poverty")
par(mfrow = c(1, 1))

Average miles per hour for the winners of the Indianapolis 500 race

Description

Data for Exercise 1.128

Usage

Indiapol

Format

A data frame/tibble with 39 observations on two variables

year: the year of the race
speed: the winner's average speed (in mph)

Source

The World Almanac and Book of Facts, 2000, p. 1004.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(speed ~ year, data = Indiapol, type = "b")

Qualifying miles per hour and number of previous starts for drivers in the 79th Indianapolis 500 race

Description

Data for Exercises 7.11 and 7.36

Usage

Indy500

Format

A data frame/tibble with 33 observations on four variables

driver: a character variable with values andretti, bachelart, boesel, brayton, c.guerrero, cheever, fabi, fernandez, ferran, fittipaldi, fox, goodyear, gordon, gugelmin, herta, james, johansson, jones, lazier, luyendyk, matsuda, matsushita, pruett, r.guerrero, rahal, ribeiro, salazar, sharp, sullivan, tracy, vasser, villeneuve, and zampedri
qualif: qualifying speed (in mph)
starts: number of Indianapolis 500 starts
group: a numeric vector where 1 indicates a driver with 4 or fewer Indianapolis 500 starts and 2 indicates a driver with 5 or more Indianapolis 500 starts

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stripchart(qualif ~ group, data = Indy500, method = "stack",
           pch = 19, col = c("red", "blue"))
boxplot(qualif ~ group, data = Indy500)
t.test(qualif ~ group, data = Indy500)
## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Indy500, aes(sample = qualif)) + 
           geom_qq() + 
           facet_grid(group ~ .) + 
           theme_bw()

## End(Not run)
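
The group variable is described as a recoding of starts. A minimal sketch of that rule in base R, using made-up counts (the starts vector below is hypothetical, not the Indy500 data):

```r
# Hypothetical numbers of previous starts (NOT the Indy500 data)
starts <- c(0, 4, 5, 12, 1, 7)

# 1 = four or fewer starts, 2 = five or more starts
group <- ifelse(starts <= 4, 1, 2)
data.frame(starts, group)
table(group)
```

Because group is stored as numeric, wrapping it in factor() before plotting may give clearer axis labels.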

Private pay increase of salaried employees versus inflation rate

Description

Data for Exercises 2.12 and 2.29

Usage

Inflatio

Format

A data frame/tibble with 24 observations on four variables

year: a numeric vector of years
pay: average hourly wage for salaried employees (in dollars)
increase: percent increase in hourly wage over previous year
inflation: percent inflation rate

Source

Bureau of Labor Statistics.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(increase ~ inflation, data = Inflatio)
cor(Inflatio$increase, Inflatio$inflation, use = "complete.obs")

Inlet oil temperature through a valve

Description

Data for Exercises 5.91 and 6.48

Usage

Inletoil

Format

A data frame/tibble with 12 observations on one variable

temp: inlet oil temperature (Fahrenheit)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


hist(Inletoil$temp, breaks = 3)
qqnorm(Inletoil$temp)
qqline(Inletoil$temp)
t.test(Inletoil$temp)
t.test(Inletoil$temp, mu = 98, alternative = "less")

Type of drug offense by race

Description

Data for Statistical Insight Chapter 8

Usage

Inmate

Format

A data frame/tibble with 28,047 observations on two variables

race: a factor with levels white, black, and hispanic
drug: a factor with levels heroin, crack, cocaine, and marijuana

Source

C. Wolf Harlow (1994), Comparing Federal and State Prison Inmates, NCJ-145864, U.S. Department of Justice, Bureau of Justice Statistics.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~race + drug, data = Inmate)
T1
chisq.test(T1)
rm(T1)

Percent of vehicles passing inspection by type of inspection station

Description

Data for Exercise 8.59

Usage

Inspect

Format

A data frame/tibble with 174 observations on two variables

station: a factor with levels auto inspection, auto repair, car care center, gas station, new car dealer, and tire store
passed: a factor with levels less than 70%, between 70% and 84%, and more than 85%

Source

The Charlotte Observer, December 13, 1992.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~ station + passed, data = Inspect)
T1
barplot(T1, beside = TRUE, legend = TRUE)
chisq.test(T1)
rm(T1)

## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Inspect, aes(x = passed, fill = station)) + 
           geom_bar(position = "dodge") + 
           theme_bw()

## End(Not run)

Heat loss through a new insulating medium

Description

Data for Exercise 9.50

Usage

Insulate

Format

A data frame/tibble with ten observations on two variables

temp: outside temperature (in degrees Celsius)
loss: heat loss (in BTUs)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(loss ~ temp, data = Insulate)
model <- lm(loss ~ temp, data = Insulate)
abline(model, col = "blue") 
summary(model)

## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Insulate, aes(x = temp, y = loss)) + 
           geom_point() + 
           geom_smooth(method = "lm", se = FALSE) + 
           theme_bw()

## End(Not run)

GPA versus IQ for 12 individuals

Description

Data for Exercises 9.51 and 9.52

Usage

Iqgpa

Format

A data frame/tibble with 12 observations on two variables

iq: IQ scores
gpa: Grade point average

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(gpa ~ iq, data = Iqgpa, col = "blue", pch = 19)
model <- lm(gpa ~ iq, data = Iqgpa)
summary(model)
rm(model)

R. A. Fisher's famous data on irises

Description

Data for Examples 1.15 and 5.19

Usage

Irises

Format

A data frame/tibble with 150 observations on five variables

sepal_length: sepal length (in cm)
sepal_width: sepal width (in cm)
petal_length: petal length (in cm)
petal_width: petal width (in cm)
species: a factor with levels setosa, versicolor, and virginica

Source

Fisher, R. A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, Part II, 179-188.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


tapply(Irises$sepal_length, Irises$species, mean)
t.test(Irises$sepal_length[Irises$species == "setosa"], conf.level = 0.99)
hist(Irises$sepal_length[Irises$species == "setosa"], 
     main = "Sepal length for\n Iris Setosa",
     xlab = "Length (in cm)")
boxplot(sepal_length ~ species, data = Irises)

Number of problems reported per 100 cars in 1994 versus 1995

Description

Data for Exercises 2.14, 2.17, 2.31, 2.33, and 2.40

Usage

Jdpower

Format

A data frame/tibble with 29 observations on three variables

car: a factor with levels Acura, BMW, Buick, Cadillac, Chevrolet, Dodge, Eagle, Ford, Geo, Honda, Hyundai, Infiniti, Jaguar, Lexus, Lincoln, Mazda, Mercedes-Benz, Mercury, Mitsubishi, Nissan, Oldsmobile, Plymouth, Pontiac, Saab, Saturn, Subaru, Toyota, Volkswagen, and Volvo
1994: number of problems per 100 cars in 1994
1995: number of problems per 100 cars in 1995

Source

USA Today, May 25, 1995.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


model <- lm(`1995` ~ `1994`, data = Jdpower)
summary(model)
plot(`1995` ~ `1994`, data = Jdpower)
abline(model, col = "red")
rm(model)

Job satisfaction and stress level for 9 school teachers

Description

Data for Exercise 9.60

Usage

Jobsat

Format

A data frame/tibble with nine observations on two variables

wspt: Wilson Stress Profile score for teachers
satisfaction: job satisfaction score

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(satisfaction ~ wspt, data = Jobsat)
model <- lm(satisfaction ~ wspt, data = Jobsat)
abline(model, col = "blue")
summary(model)
rm(model)

Smoking habits of boys and girls ages 12 to 18

Description

Data for Exercise 4.85

Usage

Kidsmoke

Format

A data frame/tibble with 1000 observations on two variables

gender: a character vector with values female and male
smoke: a character vector with values no and yes

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~smoke + gender, data = Kidsmoke)
T1
prop.table(T1)
prop.table(T1, 1)
prop.table(T1, 2)
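
The three prop.table() calls above differ only in which total each cell is divided by. A minimal sketch with a made-up 2 x 2 table (the counts in tab are hypothetical, not the Kidsmoke data):

```r
# Hypothetical counts (NOT the Kidsmoke data)
tab <- matrix(c(30, 10, 40, 20), nrow = 2,
              dimnames = list(smoke = c("no", "yes"),
                              gender = c("female", "male")))

p_all <- prop.table(tab)      # each cell divided by the grand total
p_row <- prop.table(tab, 1)   # each row sums to 1
p_col <- prop.table(tab, 2)   # each column sums to 1

sum(p_all)
rowSums(p_row)
colSums(p_col)
```

Row proportions answer "among non-smokers, what share is female", while column proportions answer "among girls, what share smokes".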

Rates per kilowatt-hour for each of the 50 states and DC

Description

Data for Example 5.9

Usage

Kilowatt

Format

A data frame/tibble with 51 observations on two variables

state: a factor with levels Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, District of Columbia, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missour, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
rate: a numeric vector of rates per kilowatt-hour

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


EDA(Kilowatt$rate)

Reading scores for first grade children who attended kindergarten versus those who did not

Description

Data for Exercise 7.68

Usage

Kinder

Format

A data frame/tibble with eight observations on three variables

pair: a numeric indicator of pair
kinder: reading score of kids who went to kindergarten
nokinder: reading score of kids who did not go to kindergarten

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(Kinder$kinder, Kinder$nokinder)
diff <- Kinder$kinder - Kinder$nokinder
qqnorm(diff)
qqline(diff)
shapiro.test(diff)
t.test(diff)
rm(diff)
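
The one-sample t-test on the differences above is equivalent to a paired t-test on the two columns. A minimal sketch with made-up scores (the vectors below are hypothetical, not the Kinder data):

```r
# Hypothetical paired reading scores (NOT the Kinder data)
kinder   <- c(55, 61, 48, 70, 66, 52, 59, 63)
nokinder <- c(50, 61, 44, 72, 64, 48, 55, 64)

# One-sample t-test on the pairwise differences ...
d <- kinder - nokinder
one_sample <- t.test(d)

# ... gives the same statistic and p-value as a paired two-sample test
paired <- t.test(kinder, nokinder, paired = TRUE)

c(one_sample = unname(one_sample$statistic), paired = unname(paired$statistic))
c(one_sample = one_sample$p.value, paired = paired$p.value)
```

Naming the vector d rather than diff avoids masking the base R function diff().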

Median costs of laminectomies at hospitals across North Carolina in 1992

Description

Data for Exercise 10.18

Usage

Laminect

Format

A data frame/tibble with 138 observations on two variables

area: a character vector indicating the area of the hospital with values Rural, Regional, and Metropol
cost: a numeric vector indicating cost of a laminectomy

Source

Consumer's Guide to Hospitalization Charges in North Carolina Hospitals (August 1994), North Carolina Medical Database Commission, Department of Insurance.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(cost ~ area, data = Laminect, col = topo.colors(3))
anova(lm(cost ~ area, data = Laminect))

Lead levels in children's blood whose parents worked in a battery factory

Description

Data for Example 1.17

Usage

Lead

Format

A data frame/tibble with 66 observations on the two variables

group: a character vector with values exposed and control
lead: a numeric vector indicating the level of lead in children's blood (in micrograms/dl)

Source

Morton, D. et al. (1982), "Lead Absorption in Children of Employees in a Lead-Related Industry," American Journal of Epidemiology, 115, 549-555.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(lead ~ group, data = Lead, col = topo.colors(2))

Leadership exam scores by age for employees at an industrial plant

Description

Data for Exercise 7.31

Usage

Leader

Format

A data frame/tibble with 34 observations on two variables

age: a character vector indicating age with values under35 and over35
score: score on a leadership exam

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(score ~ age, data = Leader, col = c("gray", "green"))
t.test(score ~ age, data = Leader)

Survival time of mice injected with an experimental lethal drug

Description

Data for Example 6.12

Usage

Lethal

Format

A data frame/tibble with 30 observations on one variable

survival: a numeric vector indicating time survived after injection (in seconds)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


SIGN.test(Lethal$survival, md = 45, alternative = "less")
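
The sign test used above can be sketched in base R as an exact binomial test on the number of observations above the hypothesized median. This is a minimal illustration with made-up survival times, assuming the standard textbook construction; it is not taken from the SIGN.test source, and all variable names are hypothetical.

```r
# Hypothetical survival times in seconds (NOT the Lethal data).
x  <- c(38, 41, 52, 44, 39, 47, 36, 42, 40, 43)
md <- 45                      # hypothesized median

x       <- x[x != md]         # observations equal to md carry no sign
n       <- length(x)          # number of usable observations
n_above <- sum(x > md)        # count of positive signs

# Under H0 the count of positive signs is Binomial(n, 0.5).
# alternative = "less" matches H1: median < md (few values above md).
sign_p <- binom.test(n_above, n, p = 0.5, alternative = "less")$p.value
sign_p
```

A small p-value here would suggest that the median lies below md, which is the direction tested in the Lethal example.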

Life expectancy of men and women in the U.S.

Description

Data for Exercise 1.31

Usage

Life

Format

A data frame/tibble with eight observations on three variables

year: a numeric vector indicating year
men: life expectancy for men (in years)
women: life expectancy for women (in years)

Source

National Center for Health Statistics.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(men ~ year, type = "l", ylim = c(min(men, women), max(men, women)), 
    col = "blue", main = "Life Expectancy vs Year", ylab = "Age", 
    xlab = "Year", data = Life)
lines(women ~ year, col = "red", data = Life)
text(1955, 65, "Men", col = "blue")
text(1955, 70, "Women", col = "red")

Life span of electronic components used in a spacecraft versus heat

Description

Data for Exercise 2.4, 2.37, and 2.49

Usage

Lifespan

Format

A data frame/tibble with six observations on two variables

heat: temperature (in Celsius)
life: lifespan of component (in hours)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(life ~ heat, data = Lifespan)
model <- lm(life ~ heat, data = Lifespan)
abline(model, col = "red")
resid(model)
sum((resid(model))^2)
anova(model)
rm(model)

Relationship between damage reports and deaths caused by lightning

Description

Data for Exercise 2.6

Usage

Ligntmonth

Format

A data frame/tibble with 12 observations on four variables

month: a factor with levels 1/01/2000, 10/01/2000, 11/01/2000, 12/01/2000, 2/01/2000, 3/01/2000, 4/01/2000, 5/01/2000, 6/01/2000, 7/01/2000, 8/01/2000, and 9/01/2000
deaths: number of deaths due to lightning strikes
injuries: number of injuries due to lightning strikes
damage: damage due to lightning strikes (in dollars)

Source

Lightning Fatalities, Injuries and Damage Reports in the United States, 1959-1994, NOAA Technical Memorandum NWS SR-193, Dept. of Commerce.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(deaths ~ damage, data = Ligntmonth)
model = lm(deaths ~ damage, data = Ligntmonth)
abline(model, col = "red")
rm(model)

Measured traffic at three prospective locations for a motor lodge

Description

Data for Exercise 10.33

Usage

Lodge

Format

A data frame/tibble with 45 observations on six variables

traffic: a numeric vector indicating the number of vehicles that passed a site in 1 hour
site: a numeric vector with values 1, 2, and 3
ranks: ranks for variable traffic

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(traffic ~ site, data = Lodge, col = cm.colors(3))
anova(lm(traffic ~ factor(site), data = Lodge))

Long-tailed distributions to illustrate the Kruskal-Wallis test

Description

Data for Exercise 10.45

Usage

Longtail

Format

A data frame/tibble with 60 observations on three variables

score: a numeric vector
group: a numeric vector with values 1, 2, and 3
ranks: ranks for variable score

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(score ~ group, data = Longtail, col = heat.colors(3))
kruskal.test(score ~ factor(group), data = Longtail)
anova(lm(score ~ factor(group), data = Longtail))
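
The role of a ranks column in the Kruskal-Wallis test can be sketched in base R: the test statistic depends on the data only through the pooled ranks. The example below uses simulated heavy-tailed scores (not the Longtail data) and hypothetical variable names, and assumes no tied values, so the tie correction is not needed.

```r
set.seed(42)                             # arbitrary seed, for reproducibility only
score <- rt(30, df = 2)                  # simulated long-tailed scores
group <- rep(1:3, each = 10)             # three groups of ten

ranks <- rank(score)                     # ranks of the pooled sample
N     <- length(score)
R_j   <- tapply(ranks, group, sum)       # rank sum within each group
n_j   <- tapply(ranks, group, length)    # group sizes

# Kruskal-Wallis H statistic without tie correction.
H_manual <- 12 / (N * (N + 1)) * sum(R_j^2 / n_j) - 3 * (N + 1)

# The same statistic from the built-in test.
H_builtin <- unname(kruskal.test(score ~ factor(group))$statistic)
c(manual = H_manual, builtin = H_builtin)
```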

Reading skills of 24 matched low ability students

Description

Data for Example 7.18

Usage

Lowabil

Format

A data frame/tibble with 12 observations on three variables

pair: a numeric indicator of pair
experiment: score of the child with the experimental method
control: score of the child with the standard method

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


diff = Lowabil$experiment - Lowabil$control
qqnorm(diff)
qqline(diff)
shapiro.test(diff)
t.test(diff)
rm(diff)

Magnesium concentration and distances between samples

Description

Data for Exercise 9.9

Usage

Magnesiu

Format

A data frame/tibble with 20 observations on two variables

distance: distance between samples
magnesium: concentration of magnesium

Source

Davis, J. (1986), Statistics and Data Analysis in Geology, 2d. Ed., John Wiley and Sons, New York, p. 146.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(magnesium ~ distance, data = Magnesiu)
model = lm(magnesium ~ distance, data = Magnesiu)
abline(model, col = "red")
summary(model)
rm(model)

Amounts awarded in 17 malpractice cases

Description

Data for Exercise 5.73

Usage

Malpract

Format

A data frame/tibble with 17 observations on one variable

award: malpractice award (in $1000)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


SIGN.test(Malpract$award, conf.level = 0.90)

Advertised salaries offered general managers of major corporations in 1995

Description

Data for Exercise 5.81

Usage

Manager

Format

A data frame/tibble with 26 observations on one variable

salary: random sample of advertised annual salaries of top executives (in dollars)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Manager$salary)
SIGN.test(Manager$salary)

Percent of marked cars in 65 police departments in Florida

Description

Data for Exercise 6.100

Usage

Marked

Format

A data frame/tibble with 65 observations on one variable

percent: percentage of marked cars in 65 Florida police departments

Source

Law Enforcement Management and Administrative Statistics, 1993, Bureau of Justice Statistics, NCJ-148825, September 1995, p. 147-148.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


EDA(Marked$percent)
SIGN.test(Marked$percent, md = 60, alternative = "greater")
t.test(Marked$percent, mu = 60, alternative = "greater")

Standardized math test scores for 30 students

Description

Data for Exercise 1.69

Usage

Math

Format

A data frame/tibble with 30 observations on one variable

score: scores on a standardized test for 30 tenth graders

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Math$score)
hist(Math$score, main = "Math Scores", xlab = "score", freq = FALSE)
lines(density(Math$score), col = "red")
CharlieZ <- (62 - mean(Math$score))/sd(Math$score)
CharlieZ
scale(Math$score)[which(Math$score == 62)]

Standardized math competency for a group of entering freshmen at a small community college

Description

Data for Exercise 5.26

Usage

Mathcomp

Format

A data frame/tibble with 31 observations on one variable

score: scores of 31 entering freshmen at a community college on a national standardized test

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Mathcomp$score)
EDA(Mathcomp$score)

Math proficiency and SAT scores by states

Description

Data for Exercise 9.24, Example 9.1, and Example 9.6

Usage

Mathpro

Format

A data frame/tibble with 51 observations on four variables

state: a factor with levels Conn, D.C., Del, Ga, Hawaii, Ind, Maine, Mass, Md, N.C., N.H., N.J., N.Y., Ore, Pa, R.I., S.C., Va, and Vt
sat_math: SAT math scores for high school seniors
profic: math proficiency scores for eighth graders
group: a numeric vector

Source

National Assessment of Educational Progress and The College Board.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


model <- lm(sat_math ~ profic, data = Mathpro)
plot(sat_math ~ profic, data = Mathpro, ylab = "SAT", xlab = "proficiency")
abline(model, col = "red")
summary(model)
rm(model)

Error scores for four groups of experimental animals running a maze

Description

Data for Exercise 10.13

Usage

Maze

Format

A data frame/tibble with 32 observations on two variables

score: error scores for animals running through a maze under different conditions
condition: a factor with levels CondA, CondB, CondC, and CondD

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(score ~ condition, data = Maze, col = rainbow(4))
anova(lm(score ~ condition, data = Maze))

Illustrates test of equality of medians with the Kruskal-Wallis test

Description

Data for Exercise 10.52

Usage

Median

Format

A data frame/tibble with 45 observations on two variables

sample: a vector with values Sample1, Sample 2, and Sample 3
value: a numeric vector

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(value ~ sample, data = Median, col = rainbow(3))
anova(lm(value ~ sample, data = Median))
kruskal.test(value ~ factor(sample), data = Median)

Median mental ages of 16 girls

Description

Data for Exercise 6.52

Usage

Mental

Format

A data frame/tibble with 16 observations on one variable

age: mental age of 16 girls

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


SIGN.test(Mental$age, md = 100)

Concentration of mercury in 25 lake trout

Description

Data for Example 1.9

Usage

Mercury

Format

A data frame/tibble with 25 observations on one variable

mercury: a numeric vector measuring mercury (in parts per million)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Mercury$mercury)

Monthly rental costs in metro areas with 1 million or more persons

Description

Data for Exercise 5.117

Usage

Metrent

Format

A data frame/tibble with 46 observations on one variable

rent: monthly rent in dollars

Source

U.S. Bureau of the Census, Housing in the Metropolitan Areas, Statistical Brief SB/94/19, September 1994.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(Metrent$rent, col = "magenta")
t.test(Metrent$rent, conf.level = 0.99)$conf.int

Miller personality test scores for a group of college students applying for graduate school

Description

Data for Example 5.7

Usage

Miller

Format

A data frame/tibble with 25 observations on one variable

miller: scores on the Miller Personality test

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Miller$miller)
fivenum(Miller$miller)
boxplot(Miller$miller)
qqnorm(Miller$miller, col = "blue")
qqline(Miller$miller, col = "red")

Twenty scores on the Miller personality test

Description

Data for Exercise 1.41

Usage

Miller1

Format

A data frame/tibble with 20 observations on one variable

miller: scores on the Miller personality test

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Miller1$miller)
stem(Miller1$miller, scale = 2)

Moisture content and depth of core sample for marine muds in eastern Louisiana

Description

Data for Exercise 9.32

Usage

Moisture

Format

A data frame/tibble with 16 observations on four variables

depth: a numeric vector
moisture: g of water per 100 g of dried sediment
lnmoist: a numeric vector
depthsq: a numeric vector

Source

Davis, J. C. (1986), Statistics and Data Analysis in Geology, 2d. ed., John Wiley and Sons, New York, pp. 177, 185.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(moisture ~ depth, data = Moisture)
model <- lm(moisture ~ depth, data = Moisture)
abline(model, col = "red")
plot(resid(model) ~ depth, data = Moisture)
rm(model)

Carbon monoxide emitted by smoke stacks of a manufacturer and a competitor

Description

Data for Exercise 7.45

Usage

Monoxide

Format

A data frame/tibble with ten observations on two variables

company: a vector with values manufacturer and competitor
emission: carbon monoxide emitted

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(emission ~ company, data = Monoxide, col = topo.colors(2))
t.test(emission ~ company, data = Monoxide)
wilcox.test(emission ~ company, data = Monoxide)
## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Monoxide, aes(x = company, y = emission)) + 
           geom_boxplot() + 
           theme_bw()

## End(Not run)

Moral attitude scale on 15 subjects before and after viewing a movie

Description

Data for Exercise 7.53

Usage

Movie

Format

A data frame/tibble with 12 observations on three variables

before: moral attitude before viewing the movie
after: moral attitude after viewing the movie
differ: a numeric vector

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


qqnorm(Movie$differ)
qqline(Movie$differ)
shapiro.test(Movie$differ)
t.test(Movie$differ, conf.level = 0.99)
wilcox.test(Movie$differ)

Improvement scores for identical twins taught music recognition by two techniques

Description

Data for Exercise 7.59

Usage

Music

Format

A data frame/tibble with 12 observations on three variables

method1: a numeric vector measuring the improvement scores on a music recognition test
method2: a numeric vector measuring the improvement scores on a music recognition test
differ: method1 - method2

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


qqnorm(Music$differ)
qqline(Music$differ)
shapiro.test(Music$differ)
t.test(Music$differ)
## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Music, aes(x = differ)) + 
           geom_dotplot() + 
           theme_bw()

## End(Not run)
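
Because differ is defined as method1 - method2, a one-sample t-test on the differences is equivalent to a paired t-test on the two columns. The sketch below checks this in base R with made-up improvement scores (not the Music data); the vector names are hypothetical.

```r
# Hypothetical improvement scores for eight twin pairs (NOT the Music data).
method1 <- c(7, 5, 9, 6, 8, 4, 7, 6)
method2 <- c(5, 6, 7, 4, 8, 3, 5, 6)
differ  <- method1 - method2

paired     <- t.test(method1, method2, paired = TRUE)  # paired t-test
one_sample <- t.test(differ)                           # t-test on differences

# Both calls give the same t statistic and p-value.
c(paired = unname(paired$statistic), one_sample = unname(one_sample$statistic))
c(paired = paired$p.value, one_sample = one_sample$p.value)
```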

Estimated value of a brand name product and the company's revenue

Description

Data for Exercises 2.28, 9.19, and Example 2.8

Usage

Name

Format

A data frame/tibble with 42 observations on three variables

brand: a factor with levels Band-Aid, Barbie, Birds Eye, Budweiser, Camel, Campbell, Carlsberg, Coca-Cola, Colgate, Del Monte, Fisher-Price, Gordon's, Green Giant, Guinness, Haagen-Dazs, Heineken, Heinz, Hennessy, Hermes, Hershey, Ivory, Jell-o, Johnnie Walker, Kellogg, Kleenex, Kraft, Louis Vuitton, Marlboro, Nescafe, Nestle, Nivea, Oil of Olay, Pampers, Pepsi-Cola, Planters, Quaker, Sara Lee, Schweppes, Smirnoff, Tampax, Winston, and Wrigley's
value: value in billions of dollars
revenue: revenue in billions of dollars

Source

Financial World.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(value ~ revenue, data = Name)
model <- lm(value ~ revenue, data = Name)
abline(model, col = "red")
cor(Name$value, Name$revenue)
summary(model)
rm(model)

Efficiency of pit crews for three major NASCAR teams

Description

Data for Exercise 10.53

Usage

Nascar

Format

A data frame/tibble with 36 observations on six variables

time: duration of pit stop (in seconds)
team: a numeric vector representing team 1, 2, or 3
ranks: a numeric vector ranking each pit stop in order of speed

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(time ~ team, data = Nascar, col = rainbow(3))
model <- lm(time ~ factor(team), data = Nascar)
summary(model)
anova(model)
rm(model)

Reaction effects of 4 drugs on 25 subjects with a nervous disorder

Description

Data for Example 10.3

Usage

Nervous

Format

A data frame/tibble with 25 observations on two variables

react: a numeric vector representing reaction time
drug: a numeric vector indicating each of the 4 drugs

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(react ~ drug, data = Nervous, col = rainbow(4))
model <- aov(react ~ factor(drug), data = Nervous)
summary(model)
TukeyHSD(model)
plot(TukeyHSD(model), las = 1)

Daily profits for 20 newsstands

Description

Data for Exercise 1.43

Usage

Newsstand

Format

A data frame/tibble with 20 observations on one variable

profit: profit of each newsstand (in dollars)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Newsstand$profit)
stem(Newsstand$profit, scale = 3)

Rating, time in 40-yard dash, and weight of top defensive linemen in the 1994 NFL draft

Description

Data for Exercise 9.63

Usage

Nfldraf2

Format

A data frame/tibble with 47 observations on three variables

rating: rating of each player on a scale out of 10
forty: forty yard dash time (in seconds)
weight: weight of each player (in pounds)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(rating ~ forty, data = Nfldraf2)
summary(lm(rating ~ forty, data = Nfldraf2))

Rating, time in 40-yard dash, and weight of top offensive linemen in the 1994 NFL draft

Description

Data for Exercises 9.10 and 9.16

Usage

Nfldraft

Format

A data frame/tibble with 29 observations on three variables

rating: rating of each player on a scale out of 10
forty: forty yard dash time (in seconds)
weight: weight of each player (in pounds)

Source

USA Today, April 20, 1994.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(rating ~ forty, data = Nfldraft)
cor(Nfldraft$rating, Nfldraft$forty)
summary(lm(rating ~ forty, data = Nfldraft))

Nicotine content versus sales for eight major brands of cigarettes

Description

Data for Exercise 9.21

Usage

Nicotine

Format

A data frame/tibble with eight observations on two variables

nicotine: nicotine content (in milligrams)
sales: sales figures (in $100,000)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


model <- lm(sales ~ nicotine, data = Nicotine)
plot(sales ~ nicotine, data = Nicotine)
abline(model, col = "red")
summary(model)
predict(model, newdata = data.frame(nicotine = 1), 
        interval = "confidence", level = 0.99)

Normal Area

Description

Function that computes and draws the area between two user-specified values under a normal distribution with a given mean and standard deviation

Usage

normarea(lower = -Inf, upper = Inf, m, sig)

Arguments

lower

the lower value

upper

the upper value

m

the mean for the population

sig

the standard deviation of the population

Author(s)

Alan T. Arnholt

Examples


normarea(70, 130, 100, 15)
    # Finds and draws P(70 < X < 130) given X is N(100, 15).
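
The probability in the example above can be cross-checked in base R with pnorm, and the shaded region redrawn with polygon. This is an illustrative sketch only, not the implementation of normarea; the variable names are hypothetical.

```r
m     <- 100    # population mean
sig   <- 15     # population standard deviation
lower <- 70
upper <- 130

# Area under the N(m, sig) density between lower and upper.
area <- pnorm(upper, mean = m, sd = sig) - pnorm(lower, mean = m, sd = sig)
area

# Draw the density and shade the region between lower and upper.
x  <- seq(m - 4 * sig, m + 4 * sig, length.out = 400)
plot(x, dnorm(x, m, sig), type = "l", ylab = "density")
xs <- seq(lower, upper, length.out = 200)
polygon(c(lower, xs, upper), c(0, dnorm(xs, m, sig), 0), col = "skyblue")
```

Since 70 and 130 are two standard deviations either side of the mean, the area equals P(-2 < Z < 2) for a standard normal Z.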

Required Sample Size

Description

Function to determine required sample size to be within a given margin of error.

Usage

nsize(b, sigma = NULL, p = 0.5, conf.level = 0.95, type = "mu")

Arguments

b

the desired bound.

sigma

population standard deviation. Not required if using type "pi".

p

estimate for the population proportion of successes. Not required if using type "mu".

conf.level

confidence level for the problem, restricted to lie between zero and one.

type

character string, one of "mu" or "pi", or just the initial letter of each, indicating the appropriate parameter. Default value is "mu".

Details

Answer is based on a normal approximation when using type "pi".

Value

Returns required sample size.

Author(s)

Alan T. Arnholt

Examples


nsize(b=.03, p=708/1200, conf.level=.90, type="pi")
    # Returns the required sample size (n) to estimate the population
    # proportion of successes with a 90% confidence interval
    # so that the margin of error is no more than 0.03 when the
    # estimate of the population proportion of successes is 708/1200.
    # This is problem 5.38 on page 257 of Kitchens' BSDA.
    
nsize(b=.15, sigma=.31, conf.level=.90, type="mu")
    # Returns the required sample size (n) to estimate the population
    # mean with a 90% confidence interval so that the margin
    # of error is no more than 0.15.  This is Example 5.17 on page
    # 261 of Kitchens' BSDA.
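
The two calculations above can be worked by hand with the usual textbook formulas, n = (z * sigma / b)^2 for a mean and n = p * (1 - p) * (z / b)^2 for a proportion, each rounded up. The base R sketch below assumes nsize follows these normal-approximation formulas, which is not confirmed from its source. The helpers n_for_mean and n_for_prop are hypothetical and are not part of BSDA.

```r
# Hypothetical helpers (NOT BSDA functions): textbook sample-size formulas.
n_for_mean <- function(b, sigma, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)   # two-sided normal critical value
  ceiling((z * sigma / b)^2)             # round up to a whole observation
}

n_for_prop <- function(b, p = 0.5, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  ceiling(p * (1 - p) * (z / b)^2)
}

# Same inputs as the two nsize() calls above.
n_mean <- n_for_mean(b = 0.15, sigma = 0.31, conf.level = 0.90)
n_prop <- n_for_prop(b = 0.03, p = 708 / 1200, conf.level = 0.90)
c(mean = n_mean, proportion = n_prop)
```

With no prior estimate of the proportion, p = 0.5 maximizes p * (1 - p) and so gives the most conservative (largest) sample size.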

Normality Tester

Description

Q-Q plots of randomly generated normal data of the same size as the tested data are generated and plotted on the perimeter of the graph, while a Q-Q plot of the actual data is depicted in the center of the graph.

Usage

ntester(actual.data)

Arguments

actual.data

a numeric vector. Missing and infinite values are allowed, but are ignored in the calculation. The length of actual.data must not exceed 5000 after dropping nonfinite values.

Details

Q-Q plots of randomly generated normal data of the same size as the tested data are generated and plotted on the perimeter of the graph sheet, while a Q-Q plot of the actual data is depicted in the center of the graph. The p-values are calculated from the Shapiro-Wilk W-statistic. The function only works on numeric vectors containing at most 5000 observations.

Author(s)

Alan T. Arnholt

References

Shapiro, S.S. and Wilk, M.B. (1965). An analysis of variance test for normality (complete samples). Biometrika 52 : 591-611.

Examples


ntester(rexp(50,1))
    # Q-Q plot of random exponential data in center plot
    # surrounded by 8 Q-Q plots of randomly generated 
    # standard normal data of size 50.
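
The layout described above can be approximated in base R with a 3 x 3 grid in which the center panel holds the actual data and the other eight hold simulated standard normal samples of the same size. This is a rough sketch of the idea, not the source of ntester; the panel titles, margins, and variable names are assumptions.

```r
set.seed(1)                               # arbitrary seed, for reproducibility only
actual <- rexp(50, rate = 1)              # stand-in for the data being assessed
actual <- actual[is.finite(actual)]       # drop missing and infinite values
n      <- length(actual)

old_par <- par(mfrow = c(3, 3), mar = c(2, 2, 2, 1))
for (panel in 1:9) {
  # mfrow fills by row, so panel 5 is the center of the grid.
  is_actual <- panel == 5
  y <- if (is_actual) actual else rnorm(n)
  p_value <- shapiro.test(y)$p.value      # Shapiro-Wilk p-value for this panel
  qqnorm(y, main = sprintf("%s (p = %.3f)",
                           if (is_actual) "actual" else "simulated", p_value))
  qqline(y)
}
par(old_par)                              # restore the previous graphics settings

actual_p <- shapiro.test(actual)$p.value
actual_p
```

If the center panel looks no worse than its eight neighbors, the departure from normality is within what sampling variation alone would produce at that sample size.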

Price of oranges versus size of the harvest

Description

Data for Exercise 9.61

Usage

Orange

Format

A data frame/tibble with six observations on two variables

harvest: harvest in millions of boxes
price: average price charged by California growers for a 75-pound box of navel oranges

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(price ~ harvest, data = Orange)
model <- lm(price ~ harvest, data = Orange)
abline(model, col = "red")
summary(model)
rm(model)

Salaries of members of the Baltimore Orioles baseball team

Description

Data for Example 1.3

Usage

Orioles

Format

A data frame/tibble with 27 observations on three variables

first name: a factor with levels Albert, Arthur, B.J., Brady, Cal, Charles, dl-Delino, dl-Scott, Doug, Harold, Heathcliff, Jeff, Jesse, Juan, Lenny, Mike, Rich, Ricky, Scott, Sidney, Will, and Willis
last name: a factor with levels Amaral, Anderson, Baines, Belle, Bones, Bordick, Clark, Conine, Deshields, Erickson, Fetters, Garcia, Guzman, Johns, Johnson, Kamieniecki, Mussina, Orosco, Otanez, Ponson, Reboulet, Rhodes, Ripken Jr., Slocumb, Surhoff, Timlin, and Webster
1999salary: a numeric vector containing each player's salary (in dollars)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stripchart(Orioles$`1999salary`, method = "stack", pch = 19)
## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Orioles, aes(x = `1999salary`)) + 
           geom_dotplot(dotsize = 0.5) + 
           labs(x = "1999 Salary") +
           theme_bw()

## End(Not run)

Arterial blood pressure of 11 subjects before and after receiving oxytocin

Description

Data for Exercise 7.86

Usage

Oxytocin

Format

A data frame/tibble with 11 observations on three variables

subject: a numeric vector indicating each subject
before: mean arterial blood pressure of subject before receiving oxytocin
after: mean arterial blood pressure of subject after receiving oxytocin

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


diff = Oxytocin$after - Oxytocin$before
qqnorm(diff)
qqline(diff)
shapiro.test(diff)
t.test(diff)
rm(diff)

Education backgrounds of parents of entering freshmen at a state university

Description

Data for Exercise 1.32

Usage

Parented

Format

A data frame/tibble with 200 observations on two variables

education: a factor with levels 4yr college degree, Doctoral degree, Grad degree, H.S grad or less, Some college, and Some grad school
parent: a factor with levels mother and father

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~education + parent, data = Parented)
T1
barplot(t(T1), beside = TRUE, legend = TRUE, col = c("blue", "red"))
rm(T1)
## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Parented, aes(x = education, fill = parent)) + 
    geom_bar(position = "dodge") + 
    theme_bw() +
    theme(axis.text.x  = element_text(angle = 85, vjust = 0.5)) + 
    scale_fill_manual(values = c("pink", "blue")) + 
    labs(x = "", y = "") 

## End(Not run)

Years of experience and number of tickets given by patrolpersons in New York City

Description

Data for Example 9.3

Usage

Patrol

Format

A data frame/tibble with ten observations on three variables

tickets: number of tickets written per week
years: patrolperson's experience (in years)
log_tickets: natural log of tickets

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


model <- lm(tickets ~ years, data = Patrol)
summary(model)
confint(model, level = 0.98)

Karl Pearson's data on heights of brothers and sisters

Description

Data for Exercise 2.20

Usage

Pearson

Format

A data frame/tibble with 11 observations on three variables

family: number indicating family of brother and sister pair
brother: height of brother (in inches)
sister: height of sister (in inches)

Source

Pearson, K. and Lee, A. (1902-3), On the Laws of Inheritance in Man, Biometrika, 2, 357.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(brother ~ sister, data = Pearson, col = "lightblue")
cor(Pearson$brother, Pearson$sister)

Length of long-distance phone calls for a small business firm

Description

Data for Exercise 6.95

Usage

Phone

Format

A data frame/tibble with 20 observations on one variable

time: duration of long distance phone call (in minutes)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


qqnorm(Phone$time)
qqline(Phone$time)
shapiro.test(Phone$time)
SIGN.test(Phone$time, md = 5, alternative = "greater")

Number of poisonings reported to 16 poison control centers

Description

Data for Exercise 1.113

Usage

Poison

Format

A data frame/tibble with 226,361 observations on one variable

type: a factor with levels Alcohol, Cleaning agent, Cosmetics, Drugs, Insecticides, and Plants

Source

Centers for Disease Control, Atlanta, Georgia.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~type, data = Poison)
T1
par(mar = c(5.1 + 2, 4.1, 4.1, 2.1))
barplot(sort(T1, decreasing = TRUE), las = 2, col = rainbow(6))
par(mar = c(5.1, 4.1, 4.1, 2.1))
rm(T1)
## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Poison, aes(x = type, fill = type)) + 
           geom_bar() + 
           theme_bw() + 
           theme(axis.text.x  = element_text(angle = 85, vjust = 0.5)) +
           guides(fill = "none")

## End(Not run)

Political party and gender in a voting district

Description

Data for Example 8.3

Usage

Politic

Format

A data frame/tibble with 250 observations on two variables

party: a factor with levels republican, democrat, and other
gender: a factor with levels female and male

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~party + gender, data = Politic)
T1
chisq.test(T1)
rm(T1)

Air pollution index for 15 randomly selected days for a major western city

Description

Data for Exercise 5.59

Usage

Pollutio

Format

A data frame/tibble with 15 observations on one variable

inde: air pollution index

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Pollutio$inde)
t.test(Pollutio$inde, conf.level = 0.98)$conf.int

Porosity measurements on 20 samples of Tensleep Sandstone, Pennsylvanian from Bighorn Basin in Wyoming

Description

Data for Exercise 5.86

Usage

Porosity

Format

A data frame/tibble with 20 observations on one variable

porosity: porosity measurement (percent)

Source

Davis, J. C. (1986), Statistics and Data Analysis in Geology, 2nd edition, pages 63-65.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Porosity$porosity)
fivenum(Porosity$porosity)
boxplot(Porosity$porosity, col = "lightgreen")

Percent poverty and crime rate for selected cities

Description

Data for Exercise 9.11 and 9.17

Usage

Poverty

Format

A data frame/tibble with 20 observations on four variables

city: a factor with levels Atlanta, Buffalo, Cincinnati, Cleveland, Dayton, O, Detroit, Flint, Mich, Fresno, C, Gary, Ind, Hartford, C, Laredo, Macon, Ga, Miami, Milwaukee, New Orleans, Newark, NJ, Rochester,NY, Shreveport, St. Louis, and Waco, Tx
poverty: percent of children living in poverty
crime: crime rate (per 1000 people)
population: population of city

Source

Children's Defense Fund and the Bureau of Justice Statistics.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(poverty ~ crime, data = Poverty)
model <- lm(poverty ~ crime, data = Poverty)
abline(model, col = "red")
summary(model)
rm(model)

Robbery rates versus percent low income in eight precincts

Description

Data for Exercise 2.2 and 2.38

Usage

Precinct

Format

A data frame/tibble with eight observations on two variables

rate: robbery rate (per 1000 people)
income: percent with low income

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(rate ~ income, data = Precinct)
model <- lm(rate ~ income, data = Precinct)
abline(model, col = "red")
rm(model)

Racial prejudice measured on a sample of 25 high school students

Description

Data for Exercise 5.10 and 5.22

Usage

Prejudic

Format

A data frame with 25 observations on one variable

prejud: racial prejudice score

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Prejudic$prejud)
EDA(Prejudic$prejud)

Ages at inauguration and death of U.S. presidents

Description

Data for Exercise 1.126

Usage

Presiden

Format

A data frame/tibble with 43 observations on five variables

first_initial: a factor with levels A., B., C., D., F., G., G. W., H., J., L., M., R., T., U., W., and Z.
last_name: a factor with levels Adams, Arthur, Buchanan, Bush, Carter, Cleveland, Clinton, Coolidge, Eisenhower, Fillmore, Ford, Garfield, Grant, Harding, Harrison, Hayes, Hoover, Jackson, Jefferson, Johnson, Kennedy, Lincoln, Madison, McKinley, Monroe, Nixon, Pierce, Polk, Reagan, Roosevelt, Taft, Taylor, Truman, Tyler, VanBuren, Washington, and Wilson
birth_state: a factor with levels ARK, CAL, CONN, GA, IA, ILL, KY, MASS, MO, NC, NEB, NH, NJ, NY, OH, PA, SC, TEX, VA, and VT
inaugural_age: President's age at inauguration
death_age: President's age at death

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


pie(xtabs(~birth_state, data = Presiden))
stem(Presiden$inaugural_age)
stem(Presiden$death_age)
par(mar = c(5.1, 4.1 + 3, 4.1, 2.1))
stripchart(x=list(Presiden$inaugural_age, Presiden$death_age), 
           method = "stack", col = c("green","brown"), pch = 19, las = 1)
par(mar = c(5.1, 4.1, 4.1, 2.1))

Degree of confidence in the press versus education level for 20 randomly selected persons

Description

Data for Exercise 9.55

Usage

Press

Format

A data frame/tibble with 20 observations on two variables

education_yrs: years of education
confidence: degree of confidence in the press (the higher the score, the more confidence)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(confidence ~ education_yrs, data = Press)
model <- lm(confidence ~ education_yrs, data = Press)
abline(model, col = "purple")
summary(model)
rm(model)

Klopfer's prognostic rating scale for subjects receiving behavior modification therapy

Description

Data for Exercise 6.61

Usage

Prognost

Format

A data frame/tibble with 15 observations on one variable

kprs_score: Klopfer's Prognostic Rating Scale score

Source

Newmark, C., et al. (1973), Predictive Validity of the Rorschach Prognostic Rating Scale with Behavior Modification Techniques, Journal of Clinical Psychology, 29, 246-248.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


EDA(Prognost$kprs_score)
t.test(Prognost$kprs_score, mu = 9)

Effects of four different methods of programmed learning for statistics students

Description

Data for Exercise 10.17

Usage

Program

Format

A data frame/tibble with 44 observations on two variables

method: a character variable with values method1, method2, method3, and method4
score: standardized test score

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(score ~ method, col = c("red", "blue", "green", "yellow"), data = Program)
anova(lm(score ~ method, data = Program))
TukeyHSD(aov(score ~ method, data = Program))
par(mar = c(5.1, 4.1 + 4, 4.1, 2.1))
plot(TukeyHSD(aov(score ~ method, data = Program)), las = 1)
par(mar = c(5.1, 4.1, 4.1, 2.1))

PSAT scores versus SAT scores

Description

Data for Exercise 2.50

Usage

Psat

Format

A data frame/tibble with seven observations on two variables

psat: PSAT score
sat: SAT score

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


model <- lm(sat ~ psat, data = Psat)
par(mfrow = c(1, 2))
plot(Psat$psat, resid(model))
plot(model, which = 1)
rm(model)
par(mfrow = c(1, 1))

Correct responses for 24 students in a psychology experiment

Description

Data for Exercise 1.42

Usage

Psych

Format

A data frame/tibble with 23 observations on one variable

score: number of correct responses in a psychology experiment

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Psych$score)
EDA(Psych$score)

Weekly incomes of a random sample of 50 Puerto Rican families in Miami

Description

Data for Exercises 5.22 and 5.65

Usage

Puerto

Format

A data frame/tibble with 50 observations on one variable

income: weekly family income (in dollars)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Puerto$income)
boxplot(Puerto$income, col = "purple")
t.test(Puerto$income, conf.level = 0.90)$conf.int

Plasma LDL levels in two groups of quail

Description

Data for Exercises 1.53, 1.77, 1.88, 5.66, and 7.50

Usage

Quail

Format

A data frame/tibble with 40 observations on two variables

group: a character variable with values placebo and treatment
level: low-density lipoprotein (LDL) cholesterol level

Source

J. McKean, and T. Vidmar (1994), "A Comparison of Two Rank-Based Methods for the Analysis of Linear Models," The American Statistician, 48, 220-229.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(level ~ group, data = Quail, horizontal = TRUE, xlab = "LDL Level",
        col = c("yellow", "lightblue"))

Quality control test scores on two manufacturing processes

Description

Data for Exercise 7.81

Usage

Quality

Format

A data frame/tibble with 15 observations on two variables

process: a character variable with values Process1 and Process2
score: results of a quality control test

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(score ~ process, data = Quality, col = "lightgreen")
t.test(score ~ process, data = Quality)

Rainfall in an area of west central Kansas and four surrounding counties

Description

Data for Exercise 9.8

Usage

Rainks

Format

A data frame/tibble with 35 observations on five variables

rain: rainfall (in inches)
x1: rainfall (in inches)
x2: rainfall (in inches)
x3: rainfall (in inches)
x4: rainfall (in inches)

Source

R. Picard, K. Berk (1990), Data Splitting, The American Statistician, 44, (2), 140-147.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


cor(Rainks)
model <- lm(rain ~ x2, data = Rainks)
summary(model)

Research and development expenditures and sales of a large company

Description

Data for Exercise 9.36 and Example 9.8

Usage

Randd

Format

A data frame/tibble with 12 observations on two variables

rd: research and development expenditures (in million dollars)
sales: sales (in million dollars)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(sales ~ rd, data = Randd)
model <- lm(sales ~ rd, data = Randd)
abline(model, col = "purple")
summary(model)
plot(model, which = 1)
rm(model)

Survival times of 20 rats exposed to high levels of radiation

Description

Data for Exercises 1.52, 1.76, 5.62, and 6.44

Usage

Rat

Format

A data frame/tibble with 20 observations on one variable

survival_time: survival time in weeks for rats exposed to a high level of radiation

Source

J. Lawless, Statistical Models and Methods for Lifetime Data (New York: Wiley, 1982).

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


hist(Rat$survival_time)
qqnorm(Rat$survival_time)
qqline(Rat$survival_time)
summary(Rat$survival_time)
t.test(Rat$survival_time)
t.test(Rat$survival_time, mu = 100, alternative = "greater")

Grade point averages versus teacher's ratings

Description

Data for Example 2.6

Usage

Ratings

Format

A data frame/tibble with 250 observations on two variables

rating: character variable with students' ratings of instructor (A-F)
gpa: students' grade point average

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(gpa ~ rating, data = Ratings, xlab = "Student rating of instructor", 
        ylab = "Student GPA")
## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Ratings, aes(x = rating, y = gpa, fill = rating)) +
           geom_boxplot() + 
           theme_bw() + 
           theme(legend.position = "none") + 
           labs(x = "Student rating of instructor", y = "Student GPA")

## End(Not run)

Threshold reaction time for persons subjected to emotional stress

Description

Data for Example 6.11

Usage

Reaction

Format

A data frame/tibble with 12 observations on one variable

time: threshold reaction time (in seconds) for persons subjected to emotional stress

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Reaction$time)
SIGN.test(Reaction$time, md = 15, alternative = "less")

Standardized reading scores for 30 fifth graders

Description

Data for Exercises 1.72 and 2.10

Usage

Reading

Format

A data frame/tibble with 30 observations on four variables

score: standardized reading test score
sorted: sorted values of score
trimmed: trimmed values of sorted
winsoriz: winsorized values of score

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


hist(Reading$score, main = "Exercise 1.72", 
     col = "lightgreen", xlab = "Standardized reading score")
summary(Reading$score)
sd(Reading$score)

Reading scores versus IQ scores

Description

Data for Exercises 2.10 and 2.53

Usage

Readiq

Format

A data frame/tibble with 14 observations on two variables

reading: reading achievement score
iq: IQ score

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(reading ~ iq, data = Readiq)
model <- lm(reading ~ iq, data = Readiq)
abline(model, col = "purple")
predict(model, newdata = data.frame(iq = c(100, 120)))
residuals(model)[c(6, 7)]
rm(model)

Opinion on referendum by view on freedom of the press

Description

Data for Exercise 8.20

Usage

Referend

Format

A data frame with 237 observations on two variables

choice: a factor with levels A, B, and C
response: a factor with levels for, against, and undecided

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~choice + response, data = Referend)
T1
chisq.test(T1)
chisq.test(T1)$expected

Pollution index taken in three regions of the country

Description

Data for Exercise 10.26

Usage

Region

Format

A data frame/tibble with 48 observations on three variables

pollution: pollution index
region: region of a county (west, central, and east)
ranks: ranked values of pollution

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(pollution ~ region, data = Region, col = "gray")
anova(lm(pollution ~ region, data = Region))

Maintenance cost versus age of cash registers in a department store

Description

Data for Exercises 2.3, 2.39, and 2.54

Usage

Register

Format

A data frame/tibble with nine observations on two variables

age: age of cash register (in years)
cost: maintenance cost of cash register (in dollars)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(cost ~ age, data = Register)
model <- lm(cost ~ age, data = Register)
abline(model, col = "red")
predict(model, newdata = data.frame(age = c(5, 10)))
plot(model, which = 1)
rm(model)

Rehabilitative potential of 20 prison inmates as judged by two psychiatrists

Description

Data for Exercise 7.61

Usage

Rehab

Format

A data frame/tibble with 20 observations on four variables

inmate: inmate identification number
psych1: rating from first psychiatrist on the inmate's rehabilitative potential
psych2: rating from second psychiatrist on the inmate's rehabilitative potential
differ: psych1 - psych2

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(Rehab$differ)
qqnorm(Rehab$differ)
qqline(Rehab$differ)
t.test(Rehab$differ)

Math placement test score for 35 freshmen females and 42 freshmen males

Description

Data for Exercise 7.43

Usage

Remedial

Format

A data frame/tibble with 84 observations on two variables

gender: a character variable with values female and male
score: math placement score

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(score ~ gender, data = Remedial,
        col = c("purple", "blue"))
t.test(score ~ gender, data = Remedial, conf.level = 0.98)
t.test(score ~ gender, data = Remedial, conf.level = 0.98)$conf.int
wilcox.test(score ~ gender, data = Remedial, 
            conf.int = TRUE, conf.level = 0.98)

Weekly rentals for 45 apartments

Description

Data for Exercise 1.122

Usage

Rentals

Format

A data frame/tibble with 45 observations on one variable

rent: weekly apartment rental price (in dollars)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Rentals$rent)
sum(Rentals$rent < mean(Rentals$rent) - 3*sd(Rentals$rent) | 
   Rentals$rent > mean(Rentals$rent) + 3*sd(Rentals$rent))

Recorded times for repairing 22 automobiles involved in wrecks

Description

Data for Exercise 5.77

Usage

Repair

Format

A data frame/tibble with 22 observations on one variable

time: time to repair a wrecked car (in hours)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Repair$time)
SIGN.test(Repair$time, conf.level = 0.98)

Length of employment versus gross sales for 10 employees of a large retail store

Description

Data for Exercise 9.59

Usage

Retail

Format

A data frame/tibble with 10 observations on two variables

months: length of employment (in months)
sales: employee gross sales (in dollars)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(sales ~ months, data = Retail)
model <- lm(sales ~ months, data = Retail)
abline(model, col = "blue")
summary(model)

Oceanography data obtained at site 1 by scientists aboard the ship Ron Brown

Description

Data for Exercise 2.9

Usage

Ronbrown1

Format

A data frame/tibble with 75 observations on two variables

depth: ocean depth (in meters)
temperature: ocean temperature (in Celsius)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(temperature ~ depth, data = Ronbrown1, ylab = "Temperature")

Oceanography data obtained at site 2 by scientists aboard the ship Ron Brown

Description

Data for Exercise 2.56 and Example 2.4

Usage

Ronbrown2

Format

A data frame/tibble with 150 observations on three variables

depth: ocean depth (in meters)
temperature: ocean temperature (in Celsius)
salinity: ocean salinity level

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(salinity ~ depth, data = Ronbrown2)
model <- lm(salinity ~ depth, data = Ronbrown2)
summary(model)
plot(model, which = 1)
rm(model)

Social adjustment scores for a rural group and a city group of children

Description

Data for Example 7.16

Usage

Rural

Format

A data frame/tibble with 33 observations on two variables

score: child's social adjustment score
area: character variable with values city and rural

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(score ~ area, data = Rural)
wilcox.test(score ~ area, data = Rural)
## Not run: 
library(dplyr)
Rural <- dplyr::mutate(Rural, r = rank(score))
Rural
t.test(r ~ area, data = Rural)

## End(Not run)

Starting salaries for 25 new PhD psychologists

Description

Data for Exercise 3.66

Usage

Salary

Format

A data frame/tibble with 25 observations on one variable

salary: starting salary for Ph.D. psychologists (in dollars)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


qqnorm(Salary$salary, pch = 19, col = "purple")
qqline(Salary$salary, col = "blue")

Surface-water salinity measurements from Whitewater Bay, Florida

Description

Data for Exercises 5.27 and 5.64

Usage

Salinity

Format

A data frame/tibble with 48 observations on one variable

salinity: surface-water salinity value

Source

J. Davis, Statistics and Data Analysis in Geology, 2nd ed. (New York: John Wiley, 1986).

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Salinity$salinity)
qqnorm(Salinity$salinity, pch = 19, col = "purple")
qqline(Salinity$salinity, col = "blue")
t.test(Salinity$salinity, conf.level = 0.99)
t.test(Salinity$salinity, conf.level = 0.99)$conf.int

SAT scores, percent taking exam and state funding per student by state for 1994, 1995 and 1999

Description

Data for Statistical Insight Chapter 9

Usage

Sat

Format

A data frame/tibble with 102 observations on seven variables

state: U.S. state
verbal: verbal SAT score
math: math SAT score
total: combined verbal and math SAT score
percent: percent of high school seniors taking the SAT
expend: state expenditure per student (in dollars)
year: year

Source

The 2000 World Almanac and Book of Facts, Funk and Wagnalls Corporation, New Jersey.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


Sat94 <- Sat[Sat$year == 1994, ]
Sat94
Sat99 <- subset(Sat, year == 1999)
Sat99
stem(Sat99$total)
plot(total ~ percent, data = Sat99)
model <- lm(total ~ percent, data = Sat99)
abline(model, col = "blue")
summary(model)
rm(model)

Problem asset ratio for savings and loan companies in California, New York, and Texas

Description

Data for Exercises 10.34 and 10.49

Usage

Saving

Format

A data frame/tibble with 65 observations on two variables

par: problem-asset-ratio for Savings & Loans that were listed as being financially troubled in 1992
state: U.S. state

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(par ~ state, data = Saving, col = "red")
boxplot(par ~ state, data = Saving, log = "y", col = "red")
model <- aov(par ~ state, data = Saving)
summary(model)
plot(TukeyHSD(model))
kruskal.test(par ~ factor(state), data = Saving)

Readings obtained from a 100 pound weight placed on four brands of bathroom scales

Description

Data for Exercise 1.89

Usage

Scales

Format

A data frame/tibble with 20 observations on two variables

brand: variable indicating brand of bathroom scale (A, B, C, or D)
reading: recorded value (in pounds) of a 100 pound weight

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(reading ~ brand, data = Scales, col = rainbow(4),
        ylab = "Weight (lbs)")
## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Scales, aes(x = brand, y = reading, fill = brand)) + 
           geom_boxplot() + 
           labs(y = "weight (lbs)") +
           theme_bw() + 
           theme(legend.position = "none") 

## End(Not run)

Exam scores for 17 patients to assess the learning ability of schizophrenics after taking a specified dose of a tranquilizer

Description

Data for Exercise 6.99

Usage

Schizop2

Format

A data frame/tibble with 17 observations on one variable

score: schizophrenic's score on a second standardized exam

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


hist(Schizop2$score, xlab = "score on standardized test after a tranquilizer",
     main = "Exercise 6.99", breaks = 10, col = "orange")
EDA(Schizop2$score)
SIGN.test(Schizop2$score, md = 22, alternative = "greater")

Standardized exam scores for 13 patients to investigate the learning ability of schizophrenics after a specified dose of a tranquilizer

Description

Data for Example 6.10

Usage

Schizoph

Format

A data frame/tibble with 13 observations on one variable

score: schizophrenic's score on a standardized exam one hour after receiving a specified dose of a tranquilizer

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


hist(Schizoph$score, xlab = "score on standardized test",
     main = "Example 6.10", breaks = 10, col = "orange")
EDA(Schizoph$score)
t.test(Schizoph$score, mu = 20)

Injury level versus seatbelt usage

Description

Data for Exercise 8.24

Usage

Seatbelt

Format

A data frame/tibble with 86,759 observations on two variables

seatbelt: a factor with levels No and Yes
injuries: a factor with levels None, Minimal, Minor, or Major indicating the extent of the driver's injuries

Source

Jobson, J. (1982), Applied Multivariate Data Analysis, Springer-Verlag, New York, p. 18.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~seatbelt + injuries, data = Seatbelt)
T1
chisq.test(T1)
rm(T1)

Self-confidence scores for 9 women before and after instructions on self-defense

Description

Data for Example 7.19

Usage

Selfdefe

Format

A data frame/tibble with nine observations on three variables

woman: number identifying the woman
before: before the course self-confidence score
after: after the course self-confidence score

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


Selfdefe$differ <- Selfdefe$after - Selfdefe$before
Selfdefe
t.test(Selfdefe$differ, alternative = "greater")

Reaction times of 30 senior citizens applying for driver's license renewals

Description

Data for Exercises 1.83 and 3.67

Usage

Senior

Format

A data frame/tibble with 31 observations on one variable

reaction: reaction time for senior citizens applying for a driver's license renewal

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Senior$reaction)
fivenum(Senior$reaction)
boxplot(Senior$reaction, main = "Problem 1.83, part d",
        horizontal = TRUE, col = "purple")

Sentences of 41 prisoners convicted of a homicide offense

Description

Data for Exercise 1.123

Usage

Sentence

Format

A data frame/tibble with 41 observations on one variable

months: sentence length (in months) for prisoners convicted of homicide

Source

U.S. Department of Justice, Bureau of Justice Statistics, Prison Sentences and Time Served for Violence, NCJ-153858, April 1995.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Sentence$months)
ll <- mean(Sentence$months)-2*sd(Sentence$months)
ul <- mean(Sentence$months)+2*sd(Sentence$months)
limits <- c(ll, ul)
limits
rm(ul, ll, limits)

Effects of a drug and electroshock therapy on the ability to solve simple tasks

Description

Data for Exercises 10.11 and 10.12

Usage

Shkdrug

Format

A data frame/tibble with 64 observations on two variables

treatment: type of treatment (Drug/NoS, Drug/Shk, NoDg/NoS, or NoDrug/S)
response: number of tasks completed in a 10-minute period

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(response ~ treatment, data = Shkdrug, col = "gray")
model <- lm(response ~ treatment, data = Shkdrug)
anova(model)
rm(model)

Effect of experimental shock on time to complete difficult task

Description

Data for Exercise 10.50

Usage

Shock

Format

A data frame/tibble with 27 observations on two variables

group: grouping variable with values of Group1 (no shock), Group2 (medium shock), and Group3 (severe shock)
attempts: number of attempts to complete a task

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(attempts ~ group, data = Shock, col = "violet")
model <- lm(attempts ~ group, data = Shock)
anova(model)
rm(model)

Sales receipts versus shoplifting losses for a department store

Description

Data for Exercise 9.58

Usage

Shoplift

Format

A data frame/tibble with eight observations on two variables

sales: sales (in 1000 dollars)
loss: loss (in 100 dollars)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(loss ~ sales, data = Shoplift)
model <- lm(loss ~ sales, data = Shoplift)
summary(model)
rm(model)

James Short's measurements of the parallax of the sun

Description

Data for Exercise 6.65

Usage

Short

Format

A data frame/tibble with 158 observations on two variables

sample: sample number
parallax: parallax measurements (seconds of a degree)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


hist(Short$parallax, main = "Problem 6.65",
     xlab = "", col = "orange")
SIGN.test(Short$parallax, md = 8.798)
t.test(Short$parallax, mu = 8.798)

Number of people riding shuttle versus number of automobiles in the downtown area

Description

Data for Exercise 9.20

Usage

Shuttle

Format

A data frame/tibble with 15 observations on two variables

users: number of shuttle riders
autos: number of automobiles in the downtown area

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(autos ~ users, data = Shuttle)
model <- lm(autos ~ users, data = Shuttle)
summary(model)
rm(model)

Sign Test

Description

This function tests a hypothesis using the sign test and reports linearly interpolated confidence intervals for one-sample problems.

Usage

SIGN.test(
  x,
  y = NULL,
  md = 0,
  alternative = "two.sided",
  conf.level = 0.95,
  ...
)

Arguments

x

numeric vector; NAs and Infs are allowed but will be removed.

y

optional numeric vector; NAs and Infs are allowed but will be removed.

md

a single number representing the value of the population median specified by the null hypothesis

alternative

is a character string, one of "greater", "less", or "two.sided", or the initial letter of each, indicating the specification of the alternative hypothesis. For one-sample tests, alternative refers to the true median of the parent population in relation to the hypothesized value of the median.

conf.level

confidence level for the returned confidence interval, restricted to lie between zero and one

...

further arguments to be passed to or from methods

Details

Computes a “Dependent-samples Sign-Test” if both x and y are provided. If only x is provided, computes the “Sign-Test”.
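As a rough illustration of the one-sample case, the sketch below reproduces the sign-test logic with base R only. It assumes that observations equal to md are discarded before counting, a common convention that may not match the internals of SIGN.test. The names d, s_pos, n_eff, and sign_p are illustrative and are not part of the package.

```r
# Base-R sketch of a one-sample, two-sided sign test (not the SIGN.test source).
x  <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
md <- 6.5                       # hypothesized population median

x <- x[is.finite(x)]            # drop NA and Inf values
d <- x - md                     # signed differences from the hypothesized median
d <- d[d != 0]                  # ASSUMPTION: ties with md are discarded

s_pos <- sum(d > 0)             # S: number of positive differences
n_eff <- length(d)              # effective sample size after dropping ties

# Under the null hypothesis, S follows a Binomial(n_eff, 0.5) distribution.
sign_p <- binom.test(s_pos, n_eff, p = 0.5,
                     alternative = "two.sided")$p.value
c(S = s_pos, n = n_eff, p.value = sign_p)
```

With these data, nine of the ten non-tied observations lie above 6.5, so the p-value is twice the upper binomial tail probability at S = 9.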

Value

A list of class htest_S, containing the following components:

statistic

the S-statistic (the number of positive differences between the data and the hypothesized median), with names attribute “S”.

p.value

the p-value for the test

conf.int

is a confidence interval (vector of length 2) for the true median based on linear interpolation. The confidence level is recorded in the attribute conf.level. When the alternative is not "two.sided", the confidence interval will be half-infinite, to reflect the interpretation of a confidence interval as the set of all values k for which one would not reject the null hypothesis that the true median (or the median of the differences) is k. Here infinity will be represented by Inf.

estimate

is a vector of length 1, giving the sample median; this estimates the corresponding population parameter. Component estimate has a names attribute describing its elements.

null.value

is the value of the median specified by the null hypothesis. This equals the input argument md. Component null.value has a names attribute describing its elements.

alternative

records the value of the input argument alternative: "greater", "less", or "two.sided"

data.name

a character string (vector of length 1) containing the actual name of the input vector x

Confidence.Intervals

a 3 by 3 matrix containing the lower achieved confidence interval, the interpolated confidence interval, and the upper achieved confidence interval

Null Hypothesis

For the one-sample sign test, the null hypothesis is that the median of the population from which x is drawn is md. For the two-sample dependent case, the null hypothesis is that the median of the differences between the populations from which x and y are drawn is md. The alternative hypothesis indicates the direction of divergence of the population median from md ("greater", "less", or "two.sided").
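The dependent-samples case can be sketched the same way: the test is applied to the paired differences. The before and after vectors below are hypothetical values invented for illustration, and dropping zero differences is an assumption rather than a documented behavior of SIGN.test.

```r
# Base-R sketch of a dependent-samples sign test on hypothetical paired data.
before <- c(12, 15, 11, 18, 14, 16, 13, 17)   # hypothetical scores
after  <- c(14, 15, 13, 21, 13, 19, 15, 20)   # hypothetical scores
md     <- 0                                    # null: median difference is 0

d <- (after - before) - md     # paired differences relative to md
d <- d[d != 0]                 # ASSUMPTION: zero differences are discarded

s_paired <- sum(d > 0)         # number of positive differences
n_paired <- length(d)          # pairs that still carry a sign

# One-sided alternative: the median difference is greater than md.
p_paired <- binom.test(s_paired, n_paired, p = 0.5,
                       alternative = "greater")$p.value
c(S = s_paired, n = n_paired, p.value = p_paired)
```

Here six of the seven non-zero differences are positive, so the one-sided p-value is the probability of at least six successes in seven fair coin tosses.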

Note

The reported confidence interval is based on linear interpolation. The lower and upper confidence levels are exact.
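To see why interpolation is needed, the following sketch computes the exact confidence levels attained by intervals built from order statistics. For a continuous population, the interval from the k-th smallest to the k-th largest observation covers the median with probability 1 - 2 * P(B <= k - 1), where B is Binomial(n, 0.5). The variable names are illustrative, and the package's interpolation step itself is not reproduced here.

```r
# Achieved confidence levels of order-statistic intervals for the median.
x <- sort(c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8))
n <- length(x)
k <- 1:3                                   # candidate order-statistic ranks

# Exact coverage of (x[k], x[n - k + 1]) for a continuous population.
achieved <- 1 - 2 * pbinom(k - 1, size = n, prob = 0.5)

levels_tab <- data.frame(k        = k,
                         lower    = x[k],
                         upper    = x[n - k + 1],
                         achieved = achieved)
levels_tab
```

For n = 11, no rank gives exactly 95% coverage: k = 2 attains more than 95% and k = 3 attains less, which is the gap the interpolated interval bridges.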

Author(s)

Alan T. Arnholt

References

Gibbons, J.D. and Chakraborti, S. (1992). Nonparametric Statistical Inference. Marcel Dekker Inc., New York.

Kitchens, L. J. (2003). Basic Statistics and Data Analysis. Duxbury.

Conover, W. J. (1980). Practical Nonparametric Statistics, 2nd ed. Wiley, New York.

Lehmann, E. L. (1975). Nonparametrics: Statistical Methods Based on Ranks. Holden and Day, San Francisco.

Examples


x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
SIGN.test(x, md = 6.5)
        # Computes two-sided sign-test for the null hypothesis 
        # that the population median for 'x' is 6.5. The alternative 
        # hypothesis is that the median is not 6.5. An interpolated 95% 
        # confidence interval for the population median will be computed.
        
reaction <- c(14.3, 13.7, 15.4, 14.7, 12.4, 13.1, 9.2, 14.2, 
              14.4, 15.8, 11.3, 15.0)
SIGN.test(reaction, md = 15, alternative = "less")
        # Data from Example 6.11 page 330 of Kitchens BSDA.  
        # Computes one-sided sign-test for the null hypothesis 
        # that the population median is 15.  The alternative 
        # hypothesis is that the median is less than 15.  
        # An interpolated 95% upper bound for the population 
        # median will be computed.

Grade point averages of men and women participating in various sports: an illustration of Simpson's paradox

Description

Data for Example 1.18

Usage

Simpson

Format

A data frame/tibble with 100 observations on three variables

gpa: grade point average
sport: sport played (basketball, soccer, or track)
gender: athlete sex (male, female)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(gpa ~ gender, data = Simpson, col = "violet")
boxplot(gpa ~ sport, data = Simpson, col = "lightgreen")
## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Simpson, aes(x = gender, y = gpa, fill = gender)) +
           geom_boxplot() + 
           facet_grid(.~sport) + 
           theme_bw()

## End(Not run)

Maximum number of situps by participants in an exercise class

Description

Data for Exercise 1.47

Usage

Situp

Format

A data frame/tibble with 20 observations on one variable

number: maximum number of situps completed in an exercise class after 1 month in the program

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Situp$number)
hist(Situp$number, breaks = seq(0, 70, 10), right = FALSE)
hist(Situp$number, breaks = seq(0, 70, 10), right = FALSE, 
     freq = FALSE, col = "pink", main = "Problem 1.47", 
     xlab = "Maximum number of situps")
lines(density(Situp$number), col = "red")

Illustrates the Wilcoxon Rank Sum test

Description

Data for Exercise 7.65

Usage

Skewed

Format

A data frame/tibble with 21 observations on two variables

C1: values from a sample of size 16 from a particular population
C2: values from a sample of size 14 from a particular population

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(Skewed$C1, Skewed$C2, col = c("pink", "lightblue"))
wilcox.test(Skewed$C1, Skewed$C2)

Survival times of closely and poorly matched skin grafts on burn patients

Description

Data for Exercise 5.20

Usage

Skin

Format

A data frame/tibble with 11 observations on four variables

patient: patient identification number
close: graft survival time in days for a closely matched skin graft on the same burn patient
poor: graft survival time in days for a poorly matched skin graft on the same burn patient
differ: difference between close and poor (in days)

Source

R. F. Woolson and P. A. Lachenbruch, "Rank Tests for Censored Matched Pairs," Biometrika, 67(1980), 597-606.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Skin$differ)
boxplot(Skin$differ, col = "pink")
summary(Skin$differ)

Sodium-lithium countertransport activity on 190 individuals from six large English kindred

Description

Data for Exercise 5.116

Usage

Slc

Format

A data frame/tibble with 190 observations on one variable

slc: Red blood cell sodium-lithium countertransport

Source

Roeder, K. (1994), "A Graphical Technique for Determining the Number of Components in a Mixture of Normals," Journal of the American Statistical Association, 89, 487-495.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


EDA(Slc$slc)
hist(Slc$slc, freq = FALSE, xlab = "sodium lithium countertransport",
     main = "", col = "lightblue")
lines(density(Slc$slc), col = "purple")

Water pH levels of 75 water samples taken in the Great Smoky Mountains

Description

Data for Exercises 6.40, 6.59, 7.10, and 7.35

Usage

Smokyph

Format

A data frame/tibble with 75 observations on three variables

waterph: water sample pH level
code: character variable with values low (elevation below 0.6 miles) and high (elevation above 0.6 miles)
elev: elevation in miles

Source

Schmoyer, R. L. (1994), Permutation Tests for Correlation in Regression Errors, Journal of the American Statistical Association, 89, 1507-1516.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


summary(Smokyph$waterph)
tapply(Smokyph$waterph, Smokyph$code, mean)
stripchart(waterph ~ code, data = Smokyph, method = "stack",
           pch = 19, col = c("red", "blue"))
t.test(Smokyph$waterph, mu = 7)
SIGN.test(Smokyph$waterph, md = 7)
t.test(waterph ~ code, data = Smokyph, alternative = "less")
t.test(waterph ~ code, data = Smokyph, conf.level = 0.90)
## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Smokyph, aes(x = waterph, fill = code)) + 
           geom_dotplot() + 
           facet_grid(code ~ .) + 
           guides(fill = "none")

## End(Not run)

Snoring versus heart disease

Description

Data for Exercise 8.21

Usage

Snore

Format

A data frame/tibble with 2,484 observations on two variables

snore: factor with levels nonsnorer, ocassional snorer, nearly every night, and snores every night
heartdisease: factor indicating whether the individual has heart disease (no or yes)

Source

Norton, P. and Dunn, E. (1985), Snoring as a Risk Factor for Disease, British Medical Journal, 291, 630-632.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~ heartdisease + snore, data = Snore)
T1
chisq.test(T1)
rm(T1)

Concentration of microparticles in snowfields of Greenland and Antarctica

Description

Data for Exercise 7.87

Usage

Snow

Format

A data frame/tibble with 34 observations on two variables

concent: concentration of microparticles from melted snow (in parts per billion)
site: location of snow sample (Antarctica or Greenland)

Source

Davis, J., Statistics and Data Analysis in Geology, John Wiley, New York.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(concent ~ site, data = Snow, col = c("lightblue", "lightgreen"))

Weights of 25 soccer players

Description

Data for Exercise 1.46

Usage

Soccer

Format

A data frame/tibble with 25 observations on one variable

weight: soccer player's weight (in pounds)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Soccer$weight, scale = 2)
hist(Soccer$weight, breaks = seq(110, 210, 10), col = "orange",
     main = "Problem 1.46 \n Weights of Soccer Players", 
     xlab = "weight (lbs)", right = FALSE)

Annual income of 25 North Carolina social workers with less than five years of experience

Description

Data for Exercise 6.63

Usage

Social

Format

A data frame/tibble with 25 observations on one variable

income: annual income (in dollars) of North Carolina social workers with less than five years of experience

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


SIGN.test(Social$income, md = 27500, alternative = "less")

Grade point averages, SAT scores and final grade in college algebra for 20 sophomores

Description

Data for Exercise 2.42

Usage

Sophomor

Format

A data frame/tibble with 20 observations on four variables

student: identification number
gpa: grade point average
sat: SAT math score
exam: final exam grade in college algebra

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


cor(Sophomor)
plot(exam ~ gpa, data = Sophomor)
## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Sophomor, aes(x = gpa, y = exam)) + 
           geom_point()
ggplot2::ggplot(data = Sophomor, aes(x = sat, y = exam)) + 
           geom_point()

## End(Not run)

Murder rates for 30 cities in the South

Description

Data for Exercise 1.84

Usage

South

Format

A data frame/tibble with 31 observations on one variable

rate: murder rate per 100,000 people

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(South$rate, col = "gray", ylab = "Murder rate per 100,000 people")

Speed reading scores before and after a course on speed reading

Description

Data for Exercise 7.58

Usage

Speed

Format

A data frame/tibble with 15 observations on four variables

before: reading comprehension score before taking a speed-reading course
after: reading comprehension score after taking a speed-reading course
differ: after - before (comprehension reading scores)
signranks: signed ranked differences

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


t.test(Speed$differ, alternative = "greater")
t.test(Speed$signranks, alternative = "greater")
wilcox.test(Pair(Speed$after, Speed$before) ~ 1, data = Speed, alternative = "greater")
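
The signranks column holds signed ranks of the differences. As a minimal sketch of how such values are conventionally obtained, the code below ranks the absolute differences and reattaches their signs, then checks that the sum of the positive signed ranks matches the V statistic reported by wilcox.test. The vector d is hypothetical and is not taken from Speed; whether signranks was built in exactly this way is an assumption.

```r
# Hypothetical paired differences (after - before); not the Speed data.
d <- c(12, -3, 7, -5, 20, 9)

# Rank the absolute differences, then reattach the original signs.
signed_ranks <- sign(d) * rank(abs(d))

# The Wilcoxon signed-rank statistic V is the sum of the positive signed ranks.
V <- sum(signed_ranks[signed_ranks > 0])

# Cross-check against the statistic computed by wilcox.test().
V_check <- unname(wilcox.test(d, alternative = "greater")$statistic)

signed_ranks
c(V = V, V_check = V_check)
```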

Standardized spelling test scores for two fourth grade classes

Description

Data for Exercise 7.82

Usage

Spellers

Format

A data frame/tibble with ten observations on two variables

teacher: character variable with values Fourth and Colleague
score: score on a standardized spelling test

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(score ~ teacher, data = Spellers, col = "pink")
t.test(score ~ teacher, data = Spellers)

Spelling scores for 9 eighth graders before and after a 2-week course of instruction

Description

Data for Exercise 7.56

Usage

Spelling

Format

A data frame/tibble with nine observations on three variables

before: spelling score before a 2-week course of instruction
after: spelling score after a 2-week course of instruction
differ: after - before (spelling score)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


qqnorm(Spelling$differ)
qqline(Spelling$differ)
shapiro.test(Spelling$differ)
t.test(Spelling$differ)

Favorite sport by gender

Description

Data for Exercise 8.32

Usage

Sports

Format

A data frame/tibble with 200 observations on two variables

gender: a factor with levels male and female
sport: a factor with levels football, basketball, baseball, and tennis

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~gender + sport, data = Sports)
T1
chisq.test(T1)
rm(T1)

Convictions in spouse murder cases by gender

Description

Data for Exercise 8.33

Usage

Spouse

Format

A data frame/tibble with 540 observations on two variables

result: a factor with levels not prosecuted, pleaded guilty, convicted, and acquited
spouse: a factor with levels husband and wife

Source

Bureau of Justice Statistics (September 1995), Spouse Murder Defendants in Large Urban Counties, Executive Summary, NCJ-156831.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~result + spouse, data = Spouse)
T1
chisq.test(T1)
rm(T1)

Simple Random Sampling

Description

Computes all possible samples from a given population using simple random sampling.

Usage

SRS(POPvalues, n)

Arguments

POPvalues

vector containing the population values.

n

the sample size.

Value

Returns a matrix containing the possible simple random samples of size n taken from a population POPvalues.

Author(s)

Alan T. Arnholt

Examples


SRS(c(5,8,3),2)
    # The rows in the matrix list the values for the 3 possible
    # simple random samples of size 2 from the population of 5,8, and 3.
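
For comparison, the same enumeration can be sketched with base R's combn, which lists every subset of a given size. This is an illustration of the idea rather than the source of SRS; the row order of the matrix returned by SRS may differ, and the names pop and samples are illustrative.

```r
pop <- c(5, 8, 3)
n <- 2

# combn() returns one combination per column; transpose so that
# each row holds one possible sample of size n.
samples <- t(combn(pop, n))
samples

# The number of possible samples is choose(N, n), with N the population size.
nrow(samples) == choose(length(pop), n)
```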

Times of a 2-year-old stallion on a one mile run

Description

Data for Exercise 6.93

Usage

Stable

Format

A data frame/tibble with nine observations on one variable

time: time (in seconds) for horse to run 1 mile

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


SIGN.test(Stable$time, md = 98.5, alternative = "greater")

Thicknesses of 1872 Hidalgo stamps issued in Mexico

Description

Data for Statistical Insight Chapter 1 and Exercise 5.110

Usage

Stamp

Format

A data frame/tibble with 485 observations on one variable

thickness: stamp thickness (in mm)

Source

Izenman, A., Sommer, C. (1988), Philatelic Mixtures and Multimodal Densities, Journal of the American Statistical Association, 83, 941-953.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


hist(Stamp$thickness, freq = FALSE, col = "lightblue", 
     main = "", xlab = "stamp thickness (mm)")
lines(density(Stamp$thickness), col = "blue")
t.test(Stamp$thickness, conf.level = 0.99)

Grades for two introductory statistics classes

Description

Data for Exercise 7.30

Usage

Statclas

Format

A data frame/tibble with 72 observations on two variables

class: class meeting time (9am or 2pm)
score: grade for an introductory statistics class

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


str(Statclas)
boxplot(score ~ class, data = Statclas, col = "red")
t.test(score ~ class, data = Statclas)

Operating expenditures per resident for each of the state law enforcement agencies

Description

Data for Exercise 6.62

Usage

Statelaw

Format

A data frame/tibble with 50 observations on two variables

state: U.S. state
cost: dollars spent per resident on law enforcement

Source

Bureau of Justice Statistics, Law Enforcement Management and Administrative Statistics, 1993, NCJ-148825, September 1995, page 84.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


EDA(Statelaw$cost)
SIGN.test(Statelaw$cost, md = 8, alternative = "less")

Test scores for two beginning statistics classes

Description

Data for Exercises 1.70 and 1.87

Usage

Statisti

Format

A data frame/tibble with 62 observations on two variables

class: character variable with values Class1 and Class2
score: test score for an introductory statistics test

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(score ~ class, data = Statisti, col = "violet")
tapply(Statisti$score, Statisti$class, summary, na.rm = TRUE)
## Not run: 
library(dplyr)
dplyr::group_by(Statisti, class) %>%
 summarize(Mean = mean(score, na.rm = TRUE), 
           Median = median(score, na.rm = TRUE), 
           SD = sd(score, na.rm = TRUE),
           RS = IQR(score, na.rm = TRUE))

## End(Not run)

STEP science test scores for a class of ability-grouped students

Description

Data for Exercise 6.79

Usage

Step

Format

A data frame/tibble with 12 observations on one variable

score: State test of educational progress (STEP) science test score

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


EDA(Step$score)
t.test(Step$score, mu = 80, alternative = "less")
wilcox.test(Step$score, mu = 80, alternative = "less")

Short-term memory test scores on 12 subjects before and after a stressful situation

Description

Data for Example 7.20

Usage

Stress

Format

A data frame/tibble with 12 observations on two variables

prestress: short term memory score before being exposed to a stressful situation
poststress: short term memory score after being exposed to a stressful situation

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


diff <- Stress$prestress - Stress$poststress
qqnorm(diff)
qqline(diff)
t.test(diff)
## Not run: 
wilcox.test(Pair(Stress$prestress, Stress$poststress)~1, data = Stress)

## End(Not run)

Number of hours studied per week by a sample of 50 freshmen

Description

Data for Exercise 5.25

Usage

Study

Format

A data frame/tibble with 50 observations on one variable

hours: number of hours a week freshmen reported studying for their courses

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Study$hours)
hist(Study$hours, col = "violet")
summary(Study$hours)

Number of German submarines sunk by U.S. Navy in World War II

Description

Data for Exercises 2.16, 2.45, and 2.59

Usage

Submarin

Format

A data frame/tibble with 16 observations on three variables

month: month
reported: number of submarines reported sunk by U.S. Navy
actual: number of submarines actually sunk by U.S. Navy

Source

F. Mosteller, S. Fienberg, and R. Rourke, Beginning Statistics with Data Analysis (Reading, MA: Addison-Wesley, 1983).

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


model <- lm(actual ~ reported, data = Submarin)
summary(model)
plot(actual ~ reported, data = Submarin)
abline(model, col = "red")
rm(model)

Time it takes a subway to travel from the airport to downtown

Description

Data for Exercise 5.19

Usage

Subway

Format

A data frame/tibble with 30 observations on one variable

time: time (in minutes) it takes a subway to travel from the airport to downtown

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


hist(Subway$time, main = "Exercise 5.19", 
     xlab = "Time (in minutes)", col = "purple")
summary(Subway$time)

Wolfer sunspot numbers from 1700 through 2000

Description

Data for Example 1.7

Usage

Sunspot

Format

A data frame/tibble with 301 observations on two variables

year: year
sunspots: average number of sunspots for the year

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(sunspots ~ year, data = Sunspot, type = "l")
## Not run: 
library(ggplot2)
lattice::xyplot(sunspots ~ year, data = Sunspot, 
                main = "Yearly sunspots", type = "l")
lattice::xyplot(sunspots ~ year, data = Sunspot, type = "l", 
                main = "Yearly sunspots", aspect = "xy")
ggplot2::ggplot(data = Sunspot, aes(x = year, y = sunspots)) + 
           geom_line() + 
           theme_bw()

## End(Not run)

Margin of victory in Superbowls I to XXXV

Description

Data for Exercise 1.54

Usage

Superbowl

Format

A data frame/tibble with 35 observations on five variables

winning_team: name of Superbowl winning team
winner_score: winning score for the Superbowl
losing_team: name of Superbowl losing team
loser_score: score of losing team
victory_margin: winner_score - loser_score

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Superbowl$victory_margin)

Top speeds attained by five makes of supercars

Description

Data for Statistical Insight Chapter 10

Usage

Supercar

Format

A data frame/tibble with 30 observations on two variables

speed: top speed (in miles per hour) of car without redlining
car: name of sports car

Source

Car and Driver (July 1995).

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(speed ~ car, data = Supercar, col = rainbow(6),
        ylab = "Speed (mph)")
summary(aov(speed ~ car, data = Supercar))
anova(lm(speed ~ car, data = Supercar))

Ozone concentrations at Mt. Mitchell, North Carolina

Description

Data for Exercise 5.63

Usage

Tablrock

Format

A data frame/tibble with 719 observations on the following 17 variables.

day: date
hour: time of day
ozone: ozone concentration
tmp: temperature (in Celsius)
vdc: a numeric vector
wd: a numeric vector
ws: a numeric vector
amb: a numeric vector
dew: a numeric vector
so2: a numeric vector
no: a numeric vector
no2: a numeric vector
nox: a numeric vector
co: a numeric vector
co2: a numeric vector
gas: a numeric vector
air: a numeric vector

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


summary(Tablrock$ozone)
boxplot(Tablrock$ozone)
qqnorm(Tablrock$ozone)
qqline(Tablrock$ozone)
par(mar = c(5.1 - 1, 4.1 + 2, 4.1 - 2, 2.1))
boxplot(ozone ~ day, data = Tablrock, 
        horizontal = TRUE, las = 1, cex.axis = 0.7)
par(mar = c(5.1, 4.1, 4.1, 2.1))
## Not run: 
library(ggplot2)
  ggplot2::ggplot(data = Tablrock, aes(sample = ozone)) + 
             geom_qq() + 
             theme_bw()
  ggplot2::ggplot(data = Tablrock, aes(x = as.factor(day), y = ozone)) + 
             geom_boxplot(fill = "pink") + 
             coord_flip() + 
             labs(x = "") + 
             theme_bw()

## End(Not run)

Average teachers' salaries across the states in the 70s, 80s, and 90s

Description

Data for Exercise 5.114

Usage

Teacher

Format

A data frame/tibble with 51 observations on three variables

state: U.S. state
year: academic year
salary: average salary (in dollars)

Source

National Education Association.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


par(mfrow = c(3, 1))
hist(Teacher$salary[Teacher$year == "1973-74"],
     main = "Teacher salary 1973-74", xlab = "salary",
     xlim = range(Teacher$salary, na.rm = TRUE))
hist(Teacher$salary[Teacher$year == "1983-84"],
     main = "Teacher salary 1983-84", xlab = "salary",
     xlim = range(Teacher$salary, na.rm = TRUE))
hist(Teacher$salary[Teacher$year == "1993-94"],
     main = "Teacher salary 1993-94", xlab = "salary",
     xlim = range(Teacher$salary, na.rm = TRUE))
par(mfrow = c(1, 1))
## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Teacher, aes(x = salary)) + 
           geom_histogram(fill = "purple", color = "black") +  
           facet_grid(year ~ .) + 
           theme_bw()

## End(Not run)

Tennessee self concept scores for 20 gifted high school students

Description

Data for Exercise 6.56

Usage

Tenness

Format

A data frame/tibble with 20 observations on one variable

score: Tennessee Self-Concept Scale score

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


hist(Tenness$score, freq = FALSE, main = "", col = "green",
     xlab = "Tennessee Self-Concept Scale score")
lines(density(Tenness$score))
## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Tenness, aes(x = score, y = after_stat(density))) + 
           geom_histogram(binwidth = 2, fill = "purple", color = "black") +
           geom_density(color = "red", fill = "pink", alpha = 0.3) + 
           theme_bw()

## End(Not run)

Tensile strength of plastic bags from two production runs

Description

Data for Example 7.11

Usage

Tensile

Format

A data frame/tibble with 72 observations on two variables

tensile: plastic bag tensile strength (pounds per square inch)
run: factor with run number (1 or 2)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(tensile ~ run, data = Tensile, 
        col = c("purple", "cyan"))
t.test(tensile ~ run, data = Tensile)

Grades on the first test in a statistics class

Description

Data for Exercise 5.80

Usage

Test1

Format

A data frame/tibble with 25 observations on one variable

score: score on first statistics exam

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Test1$score)
boxplot(Test1$score, col = "purple")

Heat loss of thermal pane windows versus outside temperature

Description

Data for Example 9.5

Usage

Thermal

Format

A data frame/tibble with 12 observations on two variables

temp: temperature (degrees Celsius)
loss: heat loss (BTUs)

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


model <- lm(loss ~ temp, data = Thermal)
summary(model)
plot(loss ~ temp, data = Thermal)
abline(model, col = "red")
rm(model)

1999-2000 closing prices for TIAA-CREF stocks

Description

Data for your enjoyment

Usage

Tiaa

Format

A data frame/tibble with 365 observations on four variables

crefstk: closing price (in dollars)
crefgwt: closing price (in dollars)
tiaa: closing price (in dollars)
date: day of the year

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


data(Tiaa)

Time to complete an airline ticket reservation

Description

Data for Exercise 5.18

Usage

Ticket

Format

A data frame/tibble with 20 observations on one variable

time: time (in seconds) to check out a reservation

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


EDA(Ticket$time)

Consumer Reports (Oct 94) rating of toaster ovens versus the cost

Description

Data for Exercise 9.36

Usage

Toaster

Format

A data frame/tibble with 17 observations on three variables

toaster: name of toaster
score: Consumer Reports score
cost: price of toaster (in dollars)

Source

Consumer Reports (October 1994).

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(cost ~ score, data = Toaster)
model <- lm(cost ~ score, data = Toaster)
summary(model)
names(summary(model))
summary(model)$r.squared
plot(model, which = 1)

Size of tonsils collected from 1,398 children

Description

Data for Exercise 2.78

Usage

Tonsils

Format

A data frame/tibble with 1,398 observations on two variables

size: a factor with levels Normal, Large, and Very Large
status: a factor with levels Carrier and Non-carrier

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~size + status, data = Tonsils)
T1
prop.table(T1, 1)
prop.table(T1, 1)[2, 1]
barplot(t(T1), legend = TRUE, beside = TRUE, col = c("red", "green"))
## Not run: 
library(dplyr)
library(ggplot2)
NDF <- dplyr::count(Tonsils, size, status) 
ggplot2::ggplot(data = NDF, aes(x = size, y = n, fill = status)) + 
           geom_bar(stat = "identity", position = "dodge") + 
           scale_fill_manual(values = c("red", "green")) + 
           theme_bw()

## End(Not run)

The number of torts, average number of months to process a tort, and county population from the court files of the nation's largest counties

Description

Data for Exercise 5.13

Usage

Tort

Format

A data frame/tibble with 45 observations on five variables

county: U.S. county
months: average number of months to process a tort
population: population of the county
torts: number of torts
rate: rate per 10,000 residents

Source

U.S. Department of Justice, Tort Cases in Large Counties, Bureau of Justice Statistics Special Report, April 1995.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


EDA(Tort$months)

Hazardous waste sites near minority communities

Description

Data for Exercises 1.55, 5.08, 5.109, 8.58, and 10.35

Usage

Toxic

Format

A data frame/tibble with 51 observations on five variables

state: U.S. state
region: U.S. region
sites: number of commercial hazardous waste sites
minority: percent of minorities living in communities with commercial hazardous waste sites
percent: a numeric vector

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


hist(Toxic$sites, col = "red")
hist(Toxic$minority, col = "blue")
qqnorm(Toxic$minority)
qqline(Toxic$minority)
boxplot(sites ~ region, data = Toxic, col = "lightgreen")
tapply(Toxic$sites, Toxic$region, median)
kruskal.test(sites ~ factor(region), data = Toxic)

National Olympic records for women in several races

Description

Data for Exercises 2.97, 5.115, and 9.62

Usage

Track

Format

A data frame with 55 observations on eight variables

country: athlete's country
100m: time in seconds for 100 m
200m: time in seconds for 200 m
400m: time in seconds for 400 m
800m: time in minutes for 800 m
1500m: time in minutes for 1500 m
3000m: time in minutes for 3000 m
marathon: time in minutes for marathon

Source

Dawkins, B. (1989), "Multivariate Analysis of National Track Records," The American Statistician, 43(2), 110-115.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(`200m` ~ `100m`, data = Track)
plot(`400m` ~ `100m`, data = Track)
plot(`400m` ~ `200m`, data = Track)
cor(Track[, 2:8])

Olympic winning times for the men's 1500-meter run

Description

Data for Exercise 1.36

Usage

Track15

Format

A data frame/tibble with 26 observations on two variables

year: Olympic year
time: Olympic winning time (in seconds) for the 1500-meter run

Source

The World Almanac and Book of Facts, 2000.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(time ~ year, data = Track15, type = "b", pch = 19,
     ylab = "1500m time in seconds", col = "green")

Illustrates analysis of variance for three treatment groups

Description

Data for Exercise 10.44

Usage

Treatments

Format

A data frame/tibble with 24 observations on two variables

score: score from an experiment
group: factor with levels 1, 2, and 3

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(score ~ group, data = Treatments, col = "violet")
summary(aov(score ~ group, data = Treatments))
summary(lm(score ~ group, data = Treatments))
anova(lm(score ~ group, data = Treatments))

Number of trees in 20 grids

Description

Data for Exercise 1.50

Usage

Trees

Format

A data frame/tibble with 20 observations on one variable

number: number of trees in a grid

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Trees$number)
hist(Trees$number, main = "Exercise 1.50", xlab = "number",
     col = "brown")

Miles per gallon for standard 4-wheel drive trucks manufactured by Chevrolet, Dodge and Ford

Description

Data for Example 10.2

Usage

Trucks

Format

A data frame/tibble with 15 observations on two variables

mpg: miles per gallon
truck: a factor with levels chevy, dodge, and ford

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(mpg ~ truck, data = Trucks, horizontal = TRUE, las = 1)
summary(aov(mpg ~ truck, data = Trucks))

Summarized t-test

Description

Performs a one-sample, two-sample, or a Welch modified two-sample t-test based on user-supplied summary information. Output is identical to that produced with t.test.

Usage

tsum.test(
  mean.x,
  s.x = NULL,
  n.x = NULL,
  mean.y = NULL,
  s.y = NULL,
  n.y = NULL,
  alternative = "two.sided",
  mu = 0,
  var.equal = FALSE,
  conf.level = 0.95
)

Arguments

mean.x

a single number representing the sample mean of x

s.x

a single number representing the sample standard deviation for x

n.x

a single number representing the sample size for x

mean.y

a single number representing the sample mean of y

s.y

a single number representing the sample standard deviation for y

n.y

a single number representing the sample size for y

alternative

is a character string, one of "greater", "less" or "two.sided", or just the initial letter of each, indicating the specification of the alternative hypothesis. For one-sample t-tests, alternative refers to the true mean of the parent population in relation to the hypothesized value mu. For the standard and Welch modified two-sample t-tests, alternative refers to the difference between the true population mean for x and that for y, in relation to mu.

mu

is a single number representing the value of the mean or difference in means specified by the null hypothesis.

var.equal

logical flag: if TRUE, the variances of the parent populations of x and y are assumed equal. Argument var.equal should be supplied only for the two-sample tests.

conf.level

is the confidence level for the returned confidence interval; it must lie between zero and one.

Details

If no summary information for y is supplied (mean.y, s.y, and n.y are left as NULL), a one-sample t-test is carried out using the summary information for x. Otherwise, either a standard or Welch modified two-sample t-test is performed, depending on whether var.equal is TRUE or FALSE.
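
The three cases above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's source code: the helper name t_from_summary is hypothetical, and the formulas are the textbook one-sample, pooled-variance, and Welch (Satterthwaite) forms.

```r
# Hypothetical helper (not part of BSDA): t statistic and degrees of freedom
# computed from summary statistics with the textbook formulas.
t_from_summary <- function(mean.x, s.x, n.x,
                           mean.y = NULL, s.y = NULL, n.y = NULL,
                           mu = 0, var.equal = FALSE) {
  if (is.null(mean.y)) {
    # One-sample case: standard error of the mean, n.x - 1 degrees of freedom
    est <- mean.x
    se <- s.x / sqrt(n.x)
    df <- n.x - 1
  } else if (var.equal) {
    # Standard two-sample case: pooled variance estimate
    est <- mean.x - mean.y
    df <- n.x + n.y - 2
    sp2 <- ((n.x - 1) * s.x^2 + (n.y - 1) * s.y^2) / df
    se <- sqrt(sp2 * (1 / n.x + 1 / n.y))
  } else {
    # Welch case: unpooled standard error, Satterthwaite degrees of freedom
    est <- mean.x - mean.y
    vx <- s.x^2 / n.x
    vy <- s.y^2 / n.y
    se <- sqrt(vx + vy)
    df <- (vx + vy)^2 / (vx^2 / (n.x - 1) + vy^2 / (n.y - 1))
  }
  c(t = (est - mu) / se, df = df)
}

x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
y <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5.0, 4.1, 5.5)
welch <- t_from_summary(mean(x), sd(x), length(x), mean(y), sd(y), length(y))
pooled <- t_from_summary(mean(x), sd(x), length(x), mean(y), sd(y), length(y),
                         var.equal = TRUE)
```

Comparing welch and pooled with the statistic and parameter components of t.test(x, y) and t.test(x, y, var.equal = TRUE) is a quick way to check the sketch.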

Value

A list of class htest, containing the following components:

statistic

the t-statistic, with names attribute "t"

parameters

is the degrees of freedom of the t-distribution associated with statistic. Component parameters has names attribute "df".

p.value

the p-value for the test.

conf.int

a confidence interval (vector of length 2) for the true mean or difference in means. The confidence level is recorded in the attribute conf.level. When alternative is not "two.sided", the confidence interval will be half-infinite, to reflect the interpretation of a confidence interval as the set of all values k for which one would not reject the null hypothesis that the true mean or difference in means is k. Here infinity will be represented by Inf.

estimate

vector of length 1 or 2, giving the sample mean(s) or mean of differences; these estimate the corresponding population parameters. Component estimate has a names attribute describing its elements.

null.value

the value of the mean or difference in means specified by the null hypothesis. This equals the input argument mu. Component null.value has a names attribute describing its elements.

alternative

records the value of the input argument alternative: "greater", "less" or "two.sided".

data.name

a character string (vector of length 1) containing the names x and y for the two summarized samples.

Null Hypothesis

For the one-sample t-test, the null hypothesis is that the mean of the population from which x is drawn is mu. For the standard and Welch modified two-sample t-tests, the null hypothesis is that the population mean for x less that for y is mu.

The alternative hypothesis in each case indicates the direction of divergence of the population mean for x (or difference of means for x and y) from mu (i.e., "greater", "less", or "two.sided").
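
How the direction of the alternative changes the p-value and the confidence interval can be sketched in base R for the one-sample case. The summary values are those quoted for Problem 6.31 in the Examples; the variable names are illustrative, and the formulas are the textbook ones rather than code taken from the package.

```r
# Summary values quoted for Problem 6.31; variable names are illustrative.
mean.x <- 5.6
s.x <- 2.1
n.x <- 16
mu <- 4.9
conf.level <- 0.95

se <- s.x / sqrt(n.x)          # standard error of the mean
df <- n.x - 1                  # degrees of freedom
t_stat <- (mean.x - mu) / se   # t statistic

# p-value for each form of the alternative hypothesis
p_greater <- pt(t_stat, df, lower.tail = FALSE)
p_less <- pt(t_stat, df)
p_two_sided <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

# Confidence intervals: one-sided alternatives give half-infinite intervals
ci_greater <- c(mean.x - qt(conf.level, df) * se, Inf)
ci_less <- c(-Inf, mean.x + qt(conf.level, df) * se)
ci_two_sided <- mean.x + c(-1, 1) * qt(1 - (1 - conf.level) / 2, df) * se
```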

Author(s)

Alan T. Arnholt

References

Kitchens, L. J. (2003). Basic Statistics and Data Analysis. Duxbury.

Hogg, R. V. and Craig, A. T. (1970). Introduction to Mathematical Statistics, 3rd ed. Toronto, Canada: Macmillan.

Mood, A. M., Graybill, F. A. and Boes, D. C. (1974). Introduction to the Theory of Statistics, 3rd ed. New York: McGraw-Hill.

Snedecor, G. W. and Cochran, W. G. (1980). Statistical Methods, 7th ed. Ames, Iowa: Iowa State University Press.

Examples


tsum.test(mean.x=5.6, s.x=2.1, n.x=16, mu=4.9, alternative="greater")
        # Problem 6.31 on page 324 of BSDA states:  The chamber of commerce
        # of a particular city claims that the mean carbon dioxide
        # level of air pollution is no greater than 4.9 ppm.  A random
        # sample of 16 readings resulted in a sample mean of 5.6 ppm,
        # and s=2.1 ppm.  One-sided one-sample t-test.  The null 
        # hypothesis is that the population mean for 'x' is 4.9.   
        # The alternative hypothesis states that it is greater than 4.9.  

x <- rnorm(12) 
tsum.test(mean(x), sd(x), n.x=12)
        # Two-sided one-sample t-test. The null hypothesis is that  
        # the population mean for 'x' is zero. The alternative 
        # hypothesis states  that it is either greater or less 
        # than zero. A confidence interval for the population mean 
        # will be computed.  Note: above returns same answer as: 
t.test(x)
   
x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8) 
y <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5.0, 4.1, 5.5) 
tsum.test(mean(x), s.x=sd(x), n.x=11, mean(y), s.y=sd(y), n.y=8, mu=2)
        # Two-sided Welch modified two-sample t-test (var.equal is FALSE
        # by default).  The null hypothesis is that the population mean
        # for 'x' less that for 'y' is 2.
        # The alternative hypothesis is that this difference is not 2.
        # A confidence interval for the true difference will be computed.
        # Note: above returns same answer as:
t.test(x, y, mu=2)
        
tsum.test(mean(x), s.x=sd(x), n.x=11, mean(y), s.y=sd(y), n.y=8, conf.level=0.90)
        # Two-sided Welch modified two-sample t-test.  The null hypothesis
        # is that the population mean for 'x' less that for 'y' is zero.  
        # The alternative hypothesis is that this difference is not
        # zero.  A 90% confidence interval for the true difference will 
        # be computed.  Note: above returns same answer as:
t.test(x, y, conf.level=0.90)

Percent of students that watch more than 6 hours of TV per day versus national math test scores

Description

Data for Examples 2.1 and 2.7

Usage

Tv

Format

A data frame/tibble with 53 observations on three variables

state: U.S. state
percent: percent of students who watch more than six hours of TV a day
test: state average on national math test

Source

Educational Testing Services.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(test ~ percent, data = Tv, col = "blue")
cor(Tv$test, Tv$percent)

Intelligence test scores for identical twins in which one twin is given a drug

Description

Data for Exercise 7.54

Usage

Twin

Format

A data frame/tibble with nine observations on three variables

twinA: score on intelligence test without drug
twinB: score on intelligence test after taking drug
differ: twinA - twinB

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


qqnorm(Twin$differ)
qqline(Twin$differ)
shapiro.test(Twin$differ)
t.test(Twin$differ)

Data set describing a sample of undergraduate students

Description

Data for Exercise 1.15

Usage

Undergrad

Format

A data frame/tibble with 100 observations on six variables

gender: character variable with values Female and Male
major: college major
class: college year group classification
gpa: grade point average
sat: Scholastic Assessment Test score
drops: number of courses dropped

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stripchart(gpa ~ class, data = Undergrad, method = "stack",
           col = c("blue", "red", "green", "lightblue"),
           pch = 19, main = "GPA versus Class")
stripchart(gpa ~ gender, data = Undergrad, method = "stack",
           col = c("red", "blue"), pch = 19,
           main = "GPA versus Gender")
stripchart(sat ~ drops, data = Undergrad, method = "stack",
           col = c("blue", "red", "green", "lightblue"),
           pch = 19, main = "SAT versus Drops")
stripchart(drops ~ gender, data = Undergrad, method = "stack",
           col = c("red", "blue"), pch = 19, main = "Drops versus Gender")
## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Undergrad, aes(x = sat, y = drops, fill = factor(drops))) +
           facet_grid(drops ~ .) +
           geom_dotplot() +
           guides(fill = "none")

## End(Not run)

Number of days of paid holidays and vacation leave for sample of 35 textile workers

Description

Data for Exercises 6.46 and 6.98

Usage

Vacation

Format

A data frame/tibble with 35 observations on one variable

number: number of days of paid holidays and vacation leave taken

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(Vacation$number, col = "violet")
hist(Vacation$number, main = "Exercise 6.46", col = "blue",
     xlab = "number of days of paid holidays and vacation leave taken")
t.test(Vacation$number, mu = 24)

Reported serious reactions due to vaccines in 11 southern states

Description

Data for Exercise 1.111

Usage

Vaccine

Format

A data frame/tibble with 11 observations on two variables

state: U.S. state
number: number of reported serious reactions per million doses of a vaccine

Source

Center for Disease Control, Atlanta, Georgia.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Vaccine$number, scale = 2) 
fn <- fivenum(Vaccine$number)
fn
iqr <- IQR(Vaccine$number)
iqr

Fatality ratings for foreign and domestic vehicles

Description

Data for Exercise 8.34

Usage

Vehicle

Format

A data frame/tibble with 151 observations on two variables

make: a factor with levels domestic and foreign
rating: a factor with levels Much better than average, Above average, Average, Below average, and Much worse than average

Source

Insurance Institute for Highway Safety and the Highway Loss Data Institute, 1995.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~make + rating, data = Vehicle)
T1
chisq.test(T1)

Verbal test scores and number of library books checked out for 15 eighth graders

Description

Data for Exercise 9.30

Usage

Verbal

Format

A data frame/tibble with 15 observations on two variables

number: number of library books checked out
verbal: verbal test score

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(verbal ~ number, data = Verbal)
abline(lm(verbal ~ number, data = Verbal), col = "red")
summary(lm(verbal ~ number, data = Verbal))

Number of sunspots versus mean annual level of Lake Victoria Nyanza from 1902 to 1921

Description

Data for Exercise 2.98

Usage

Victoria

Format

A data frame/tibble with 20 observations on three variables

year: year
level: mean annual level of Lake Victoria Nyanza
sunspot: number of sunspots

Source

N. Shaw, Manual of Meteorology, Vol. 1 (London: Cambridge University Press, 1942), p. 284; and F. Mosteller and J. W. Tukey, Data Analysis and Regression (Reading, MA: Addison-Wesley, 1977).

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(level ~ sunspot, data = Victoria)
model <- lm(level ~ sunspot, data = Victoria)
summary(model)
rm(model)

Viscosity measurements of a substance on two different days

Description

Data for Exercise 7.44

Usage

Viscosit

Format

A data frame/tibble with 11 observations on two variables

first: viscosity measurement for a certain substance on day one
second: viscosity measurement for a certain substance on day two

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(Viscosit$first, Viscosit$second, col = "blue")
t.test(Viscosit$first, Viscosit$second, var.equal = TRUE)

Visual acuity of a group of subjects tested under a specified dose of a drug

Description

Data for Exercise 5.6

Usage

Visual

Format

A data frame/tibble with 18 observations on one variable

visual: visual acuity measurement

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


stem(Visual$visual)
boxplot(Visual$visual, col = "purple")

Reading scores before and after vocabulary training for 14 employees who did not complete high school

Description

Data for Exercise 7.80

Usage

Vocab

Format

A data frame/tibble with 14 observations on two variables

first: reading test score before formal vocabulary training
second: reading test score after formal vocabulary training

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


t.test(Pair(Vocab$first, Vocab$second) ~ 1)

Volume of injected waste water from Rocky Mountain Arsenal and number of earthquakes near Denver

Description

Data for Exercise 9.18

Usage

Wastewat

Format

A data frame/tibble with 44 observations on two variables

gallons: injected water (in million gallons)
number: number of earthquakes detected in Denver

Source

Davis, J. C. (1986), Statistics and Data Analysis in Geology, 2 ed., John Wiley and Sons, New York, p. 228, and Bardwell, G. E. (1970), Some Statistical Features of the Relationship between Rocky Mountain Arsenal Waste Disposal and Frequency of Earthquakes, Geological Society of America, Engineering Geology Case Histories, 8, 33-337.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(number ~ gallons, data = Wastewat)
model <- lm(number ~ gallons, data = Wastewat)
summary(model)
anova(model)
plot(model, which = 2)

Weather casualties in 1994

Description

Data for Exercise 1.30

Usage

Weather94

Format

A data frame/tibble with 388 observations on one variable

type: factor with levels Extreme Temp, Flash Flood, Fog, High Wind, Hurricane, Lighting, Other, River Flood, Thunderstorm, Tornado, and Winter Weather

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


T1 <- xtabs(~type, data = Weather94)
T1
par(mar = c(5.1 + 2, 4.1 - 1, 4.1 - 2, 2.1))
barplot(sort(T1, decreasing = TRUE), las = 2, col = rainbow(11))
par(mar = c(5.1, 4.1, 4.1, 2.1))
## Not run: 
library(ggplot2)
T2 <- as.data.frame(T1)
T2
ggplot2::ggplot(data = T2, aes(x = reorder(type, Freq), y = Freq)) + 
           geom_bar(stat = "identity", fill = "purple") +
           theme_bw() + 
           theme(axis.text.x  = element_text(angle = 55, vjust = 0.5)) + 
           labs(x = "", y = "count")

## End(Not run)

Price of a bushel of wheat versus the national weekly earnings of production workers

Description

Data for Exercise 2.11

Usage

Wheat

Format

A data frame/tibble with 19 observations on three variables

year: year
earnings: national weekly earnings (in dollars) for production workers
price: price for a bushel of wheat (in dollars)

Source

The World Almanac and Book of Facts, 2000.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


par(mfrow = c(1, 2))
plot(earnings ~ year, data = Wheat)
plot(price ~ year, data = Wheat)
par(mfrow = c(1, 1))

Direct current produced by different wind velocities

Description

Data for Exercise 9.34

Usage

Windmill

Format

A data frame/tibble with 25 observations on two variables

velocity: wind velocity (miles per hour)
output: power generated (DC volts)

Source

Joglekar, et al. (1989), Lack of Fit Testing when Replicates Are Not Available, The American Statistician, 43(3), 135-143.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


summary(lm(output ~ velocity, data = Windmill))
anova(lm(output ~ velocity, data = Windmill))

Wind leakage for storm windows exposed to a 50 mph wind

Description

Data for Exercise 6.54

Usage

Window

Format

A data frame/tibble with nine observations on two variables

window: window number
leakage: percent leakage from a 50 mph wind

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


SIGN.test(Window$leakage, md = 0.125, alternative = "greater")

Baseball team wins versus seven independent variables for National League teams in 1990

Description

Data for Exercise 9.23

Usage

Wins

Format

A data frame with 12 observations on nine variables

team: name of team
wins: number of wins
batavg: batting average
rbi: runs batted in
stole: bases stolen
strkout: number of strikeouts
caught: number of times caught stealing
errors: number of errors
era: earned run average

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(wins ~ era, data = Wins)
## Not run: 
library(ggplot2)
ggplot2::ggplot(data = Wins, aes(x = era, y = wins)) + 
           geom_point() + 
           geom_smooth(method = "lm", se = FALSE) + 
           theme_bw()

## End(Not run)

Strength tests of two types of wool fabric

Description

Data for Exercise 7.42

Usage

Wool

Format

A data frame/tibble with 20 observations on two variables

type: type of wool (Type I, Type 2)
strength: strength of wool

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


boxplot(strength ~ type, data = Wool, col = c("blue", "purple"))
t.test(strength ~ type, data = Wool, var.equal = TRUE)

Monthly sunspot activity from 1974 to 2000

Description

Data for Exercise 2.7

Usage

Yearsunspot

Format

A data frame/tibble with 252 observations on two variables

number: average number of sunspots
year: date

Source

NASA/Marshall Space Flight Center, Huntsville, AL 35812.

References

Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.

Examples


plot(number ~ year, data = Yearsunspot)

Z-test

Description

This function is based on the standard normal distribution and creates confidence intervals and tests hypotheses for both one- and two-sample problems.

Usage

z.test(
  x,
  y = NULL,
  alternative = "two.sided",
  mu = 0,
  sigma.x = NULL,
  sigma.y = NULL,
  conf.level = 0.95
)

Arguments

x

numeric vector; NAs and Infs are allowed but will be removed.

y

numeric vector; NAs and Infs are allowed but will be removed.

alternative

character string, one of "greater", "less" or "two.sided", or the initial letter of each, indicating the specification of the alternative hypothesis. For one-sample tests, alternative refers to the true mean of the parent population in relation to the hypothesized value mu. For the standard two-sample tests, alternative refers to the difference between the true population mean for x and that for y, in relation to mu.

mu

a single number representing the value of the mean or difference in means specified by the null hypothesis

sigma.x

a single number representing the population standard deviation for x

sigma.y

a single number representing the population standard deviation for y

conf.level

confidence level for the returned confidence interval, restricted to lie between zero and one

Details

If y is NULL, a one-sample z-test is carried out with x. If y is not NULL, a standard two-sample z-test is performed.
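
The one- and two-sample cases can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's implementation: the helper name z_sketch is hypothetical, the formulas are the textbook z statistics with known population standard deviations, and only the two-sided p-value is computed.

```r
# Hypothetical helper (not part of BSDA): textbook z statistic for one or
# two samples with known population standard deviations.
z_sketch <- function(x, y = NULL, mu = 0, sigma.x, sigma.y = NULL) {
  x <- x[is.finite(x)]                 # drop NA, NaN, and Inf values
  if (is.null(y)) {
    # One-sample case
    est <- mean(x)
    se <- sigma.x / sqrt(length(x))
  } else {
    # Two-sample case: difference in means, variances add
    y <- y[is.finite(y)]
    est <- mean(x) - mean(y)
    se <- sqrt(sigma.x^2 / length(x) + sigma.y^2 / length(y))
  }
  z <- (est - mu) / se
  c(z = z, p.two.sided = 2 * pnorm(abs(z), lower.tail = FALSE))
}

x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
y <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5.0, 4.1, 5.5)
one <- z_sketch(x, mu = 7, sigma.x = 0.5)
two <- z_sketch(x, y, mu = 2, sigma.x = 0.5, sigma.y = 0.5)
```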

Value

A list of class htest, containing the following components:

statistic

the z-statistic, with names attribute "z"

p.value

the p-value for the test

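conf.int

a confidence interval (vector of length 2) for the true mean or difference in means. The confidence level is recorded in the attribute conf.level. When alternative is not "two.sided", the confidence interval will be half-infinite, to reflect the interpretation of a confidence interval as the set of all values k for which one would not reject the null hypothesis that the true mean or difference in means is k. Here infinity will be represented by Inf.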

estimate

vector of length 1 or 2, giving the sample mean(s) or mean of differences; these estimate the corresponding population parameters. Component estimate has a names attribute describing its elements.

null.value

is the value of the mean or difference in means specified by the null hypothesis. This equals the input argument mu. Component null.value has a names attribute describing its elements.

alternative

records the value of the input argument alternative: "greater", "less" or "two.sided".

data.name

a character string (vector of length 1) containing the actual names of the input vectors x and y

Null Hypothesis

For the one-sample z-test, the null hypothesis is that the mean of the population from which x is drawn is mu. For the standard two-sample z-tests, the null hypothesis is that the population mean for x less that for y is mu.

Author(s)

Alan T. Arnholt

References

Kitchens, L. J. (2003). Basic Statistics and Data Analysis. Duxbury.

Hogg, R. V. and Craig, A. T. (1970). Introduction to Mathematical Statistics, 3rd ed. Toronto, Canada: Macmillan.

Mood, A. M., Graybill, F. A. and Boes, D. C. (1974). Introduction to the Theory of Statistics, 3rd ed. New York: McGraw-Hill.

Snedecor, G. W. and Cochran, W. G. (1980). Statistical Methods, 7th ed. Ames, Iowa: Iowa State University Press.

Examples


x <- rnorm(12)
z.test(x,sigma.x=1)
        # Two-sided one-sample z-test where the assumed value for
        # sigma.x is one. The null hypothesis is that the population
        # mean for 'x' is zero. The alternative hypothesis states
        # that it is either greater or less than zero. A confidence
        # interval for the population mean will be computed.

x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7., 6.4, 7.1, 6.7, 7.6, 6.8)
y <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5., 4.1, 5.5)
z.test(x, sigma.x=0.5, y, sigma.y=0.5, mu=2)
        # Two-sided standard two-sample z-test where sigma.x
        # and sigma.y are both assumed to equal 0.5. The null hypothesis
        # is that the population mean for 'x' less that for 'y' is 2.
        # The alternative hypothesis is that this difference is not 2.
        # A confidence interval for the true difference will be computed.

z.test(x, sigma.x=0.5, y, sigma.y=0.5, conf.level=0.90)
        # Two-sided standard two-sample z-test where sigma.x and
        # sigma.y are both assumed to equal 0.5. The null hypothesis
        # is that the population mean for 'x' less that for 'y' is zero.
        # The alternative hypothesis is that this difference is not
        # zero.  A 90% confidence interval for the true difference will
        # be computed.
rm(x, y)

Summarized z-test

Description

This function is based on the standard normal distribution and creates confidence intervals and tests hypotheses for both one- and two-sample problems based on summarized information the user passes to the function. Output is identical to that produced with z.test.

Usage

zsum.test(
  mean.x,
  sigma.x = NULL,
  n.x = NULL,
  mean.y = NULL,
  sigma.y = NULL,
  n.y = NULL,
  alternative = "two.sided",
  mu = 0,
  conf.level = 0.95
)

Arguments

mean.x

a single number representing the sample mean of x

sigma.x

a single number representing the population standard deviation for x

n.x

a single number representing the sample size for x

mean.y

a single number representing the sample mean of y

sigma.y

a single number representing the population standard deviation for y

n.y

a single number representing the sample size for y

alternative

is a character string, one of "greater", "less" or "two.sided", or the initial letter of each, indicating the specification of the alternative hypothesis. For one-sample tests, alternative refers to the true mean of the parent population in relation to the hypothesized value mu. For the standard two-sample tests, alternative refers to the difference between the true population mean for x and that for y, in relation to mu.

mu

a single number representing the value of the mean or difference in means specified by the null hypothesis

conf.level

confidence level for the returned confidence interval, restricted to lie between zero and one

Details

If no summary information for y is supplied (mean.y, sigma.y, and n.y are left as NULL), a one-sample z-test is carried out using the summary information for x. Otherwise, a standard two-sample z-test is performed.
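
A two-sample calculation from summary values alone can be sketched in base R. The numbers below are illustrative assumptions, not values from the book, and the formulas are the textbook ones for known population standard deviations rather than code taken from the package.

```r
# Illustrative summary values (assumptions, not taken from the book)
mean.x <- 7.0
sigma.x <- 0.5
n.x <- 11
mean.y <- 5.3
sigma.y <- 0.5
n.y <- 8
mu <- 0
conf.level <- 0.90

est <- mean.x - mean.y                          # estimated difference in means
se <- sqrt(sigma.x^2 / n.x + sigma.y^2 / n.y)   # standard error of the difference
z_stat <- (est - mu) / se                       # z statistic
p_value <- 2 * pnorm(abs(z_stat), lower.tail = FALSE)  # two-sided p-value

# Two-sided confidence interval for the true difference in means
z_crit <- qnorm(1 - (1 - conf.level) / 2)
conf_int <- est + c(-1, 1) * z_crit * se
```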

Value

A list of class htest, containing the following components:

statistic

the z-statistic, with names attribute "z"

p.value

the p-value for the test

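conf.int

a confidence interval (vector of length 2) for the true mean or difference in means. The confidence level is recorded in the attribute conf.level. When alternative is not "two.sided", the confidence interval will be half-infinite, to reflect the interpretation of a confidence interval as the set of all values k for which one would not reject the null hypothesis that the true mean or difference in means is k. Here infinity will be represented by Inf.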

estimate

vector of length 1 or 2, giving the sample mean(s) or mean of differences; these estimate the corresponding population parameters. Component estimate has a names attribute describing its elements.

null.value

the value of the mean or difference in means specified by the null hypothesis. This equals the input argument mu. Component null.value has a names attribute describing its elements.

alternative

records the value of the input argument alternative: "greater", "less" or "two.sided".

data.name

a character string (vector of length 1) containing the names x and y for the two summarized samples

Null Hypothesis

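For the one-sample z-test, the null hypothesis is that the mean of the population from which x is drawn is mu. For the standard two-sample z-tests, the null hypothesis is that the population mean for x less that for y is mu.

The alternative hypothesis in each case indicates the direction of divergence of the population mean for x (or difference of means of x and y) from mu (i.e., "greater", "less", or "two.sided").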

Author(s)

Alan T. Arnholt

References

Kitchens, L. J. (2003). Basic Statistics and Data Analysis. Duxbury.

Hogg, R. V. and Craig, A. T. (1970). Introduction to Mathematical Statistics, 3rd ed. Toronto, Canada: Macmillan.

Mood, A. M., Graybill, F. A. and Boes, D. C. (1974). Introduction to the Theory of Statistics, 3rd ed. New York: McGraw-Hill.

Snedecor, G. W. and Cochran, W. G. (1980). Statistical Methods, 7th ed. Ames, Iowa: Iowa State University Press.

Examples


zsum.test(mean.x=56/30,sigma.x=2, n.x=30, alternative="greater", mu=1.8)
        # Example 9.7 part a. from PASWR.
x <- rnorm(12)
zsum.test(mean(x),sigma.x=1,n.x=12)
        # Two-sided one-sample z-test where the assumed value for
        # sigma.x is one. The null hypothesis is that the population
        # mean for 'x' is zero. The alternative hypothesis states
        # that it is either greater or less than zero. A confidence
        # interval for the population mean will be computed.
        # Note: returns same answer as:
z.test(x,sigma.x=1)
        #
x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
y <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5.0, 4.1, 5.5)
zsum.test(mean(x), sigma.x=0.5, n.x=11, mean(y), sigma.y=0.5, n.y=8, mu=2)
        # Two-sided standard two-sample z-test where sigma.x
        # and sigma.y are both assumed to equal 0.5. The null hypothesis
        # is that the population mean for 'x' less that for 'y' is 2.
        # The alternative hypothesis is that this difference is not 2.
        # A confidence interval for the true difference will be computed.
        # Note: returns same answer as:
z.test(x, sigma.x=0.5, y, sigma.y=0.5, mu=2)
        #
zsum.test(mean(x), sigma.x=0.5, n.x=11, mean(y), sigma.y=0.5, n.y=8,
          conf.level=0.90)
        # Two-sided standard two-sample z-test where sigma.x and
        # sigma.y are both assumed to equal 0.5. The null hypothesis
        # is that the population mean for 'x' less that for 'y' is zero.
        # The alternative hypothesis is that this difference is not
        # zero.  A 90% confidence interval for the true difference will
        # be computed.  Note: returns same answer as:
z.test(x, sigma.x=0.5, y, sigma.y=0.5, conf.level=0.90)
rm(x, y)