ECON20003 – QUANTITATIVE METHODS 2

One of the major measures of the quality of service provided by any organisation is the
speed with which the organisation responds to customer complaints. Last year the flooring
department of a large family-owned department store received 50 complaints about carpet
installation. The following data represent the number of days between the receipt and
resolution of these complaints.

Days
54 35 29 2 1
11 126 4 35 26
12 165 27 26 74
13 5 29 22 26
33 137 28 123 14
5 110 52 94 20
19 32 152 25 27
4 27 61 36 5
10 31 29 81 13
68 110 30 31 23

a) Is the variable Days qualitative or quantitative? If it is quantitative, is it discrete or
continuous? In addition, determine its level of measurement. Explain your answers.

The observations are numbers of days resulting from a counting process, and the possible
values are non-negative integers. Therefore, Days is a quantitative variable, and it is discrete
(countably infinite). The measurement scale is ratio since there is a unit of measurement
(day) and a genuine zero point (0 days).

b) Launch RStudio and close the Script tab, if it is open at all. Create a new RStudio project
and script, and name both t1e2.

Follow steps similar to those in Exercise 1.

c) Enter the observations from your keyboard to an RStudio spreadsheet and save them
in an RData file. Quit RStudio. When prompted, save only the t1e2.R file.

Follow steps similar to those in Exercise 1.
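
As an alternative to typing the values into the RStudio spreadsheet, the observations can also be entered in a script. The following is only a minimal sketch: the object name Days follows the exercise, while rdata_path and the use of tempdir() are assumptions made for illustration (in the tutorial the file would be saved in the project's working directory).

```r
# The 50 observations on Days, entered row by row from the table above
Days <- c(54,  35,  29,   2,   1,
          11, 126,   4,  35,  26,
          12, 165,  27,  26,  74,
          13,   5,  29,  22,  26,
          33, 137,  28, 123,  14,
           5, 110,  52,  94,  20,
          19,  32, 152,  25,  27,
           4,  27,  61,  36,   5,
          10,  31,  29,  81,  13,
          68, 110,  30,  31,  23)
length(Days)                      # number of observations entered

# Hypothetical location; replace with "t1e2.RData" to save in the working directory
rdata_path <- file.path(tempdir(), "t1e2.RData")
save(Days, file = rdata_path)

# Reload into a separate environment to confirm that the file holds the data
check_env <- new.env()
load(rdata_path, envir = check_env)
identical(check_env$Days, Days)
```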

d) Open your working directory. Capture your screen by taking a screenshot (Alt + Print
Screen) and paste it with your answers for part (a) in a Word document.

ECON20003 – QUANTITATIVE METHODS 2

TUTORIAL 2

Solutions

Exercises for Assessment

Exercise 4

In this exercise you are going to work on the data you saved in Exercise 2 last week.

a) Launch RStudio and close the Script tab, if it is open. Create a new RStudio project and
script, and name both t2e4. Retrieve the t1e2 data set and save it as t2e4.RData.

You can complete these tasks by following steps similar to those in Exercise 2 of Tutorial 1.

The variable of interest, Days, is a discrete quantitative variable. The data set is cross-
sectional and it can be displayed graphically with, for example, a histogram or a boxplot.

b) Use RStudio to illustrate the data on Days with a histogram. Customize your plot as you
did in Exercise 3. Briefly describe what the graph tells you.

A basic histogram is generated by the following command:

hist(Days)

In return, RStudio displays the first plot on the next page. It is black and white and looks a
bit strange because the axes are too short. However, it can be easily improved by adding a
few arguments:

hist(Days,
     xlim = c(0, 200), ylim = c(0, 25),
     col = "yellow")

The new histogram is second on the next page.

These histograms show that the sample distribution of Days is heavily skewed to the right and that
the second class interval, from 20 to 40, has the highest frequency, 21.
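
The class frequencies behind the histogram can be tabulated without drawing the plot. The sketch below assumes class intervals of width 20 from 0 to 180, closed on the right, which matches the interval "from 20 to 40" mentioned above; the breaks are set explicitly rather than left to R's default so that the assumption is visible.

```r
Days <- c(54,  35,  29,   2,   1,
          11, 126,   4,  35,  26,
          12, 165,  27,  26,  74,
          13,   5,  29,  22,  26,
          33, 137,  28, 123,  14,
           5, 110,  52,  94,  20,
          19,  32, 152,  25,  27,
           4,  27,  61,  36,   5,
          10,  31,  29,  81,  13,
          68, 110,  30,  31,  23)

# Assumed class intervals: (0,20], (20,40], ..., (160,180]
h <- hist(Days, breaks = seq(0, 180, by = 20), plot = FALSE)

# One row per class interval with its frequency
freq_table <- data.frame(lower = head(h$breaks, -1),
                         upper = h$breaks[-1],
                         count = h$counts)
freq_table

# Class interval with the highest frequency
freq_table[which.max(freq_table$count), ]
```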

c) Use RStudio to illustrate the data on Days with a boxplot and customize your plot. Briefly
describe what the graph tells you.

Use the boxplot(Days) command to develop a basic boxplot and then add a main title to it,
add the Days label to the vertical axis, and colour the rectangle on the boxplot red.

A basic boxplot is generated by the

boxplot(Days)

command.

To add the required customization, execute

boxplot(Days,
        main = "Boxplot for Days",
        ylab = "Days",
        col = "red")

The new boxplot is on the next page.

It shows that in the sample of Days, (i) the median (Q2) is a bit above 25, (ii) the first quartile
(Q1) is about 14, (iii) the third quartile (Q3) is a bit above 50, (iv) the lower whisker ends close
to zero, at the smallest observation, because Q1 – 1.5 (Q3 – Q1) is negative, (v) Q3 + 1.5 (Q3 – Q1)
is about 110, and (vi) there are a few outliers at the upper end of the range.1

1 Observations that differ greatly from the majority of the data set in the sense that they are either smaller than
Q1 – 1.5 (Q3 – Q1) or larger than Q3 + 1.5 (Q3 – Q1) are considered to be outliers.
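
The quantities read off the boxplot can also be computed directly. This is a sketch rather than part of the original solution; it relies on fivenum() and boxplot.stats(), which use the hinges that boxplot() draws, and the names iqr_h, lower_fence and upper_fence are introduced here for illustration only.

```r
Days <- c(54,  35,  29,   2,   1,
          11, 126,   4,  35,  26,
          12, 165,  27,  26,  74,
          13,   5,  29,  22,  26,
          33, 137,  28, 123,  14,
           5, 110,  52,  94,  20,
          19,  32, 152,  25,  27,
           4,  27,  61,  36,   5,
          10,  31,  29,  81,  13,
          68, 110,  30,  31,  23)

# Minimum, lower hinge (Q1), median (Q2), upper hinge (Q3), maximum
five <- fivenum(Days)
five

# Outlier limits from the footnote: Q1 - 1.5 (Q3 - Q1) and Q3 + 1.5 (Q3 - Q1)
iqr_h <- five[4] - five[2]
lower_fence <- five[2] - 1.5 * iqr_h
upper_fence <- five[4] + 1.5 * iqr_h
c(lower_fence, upper_fence)

# Whisker ends, hinges and median as drawn, plus the points flagged as outliers
bs <- boxplot.stats(Days)
bs$stats
sort(bs$out)
```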

L. Kónya, 2020, Semester 2 ECON20003 – Solutions 2
Exercise 5

The table below details the number of international visitors (aged 15 years and over) to
Australia from its top 10 markets during the 2018/19 financial year by country of residence
(COR).2

Overseas arrivals ('000) by country of residence (COR)

COR            Visitors
China              1331
Hong Kong           284
India               364
Japan               455
Korea               250
Malaysia            344
New Zealand        1276
Singapore           417
UK                  670
US                  771

a) There are two variables: Market and Visitors. Are they qualitative or quantitative,
discrete or continuous? Explain your answers.

COR is a qualitative variable as its possible values are names / labels. Visitors, i.e. the
number of international visitors aged 15 years and over to Australia, is a quantitative variable
because the possible values are numbers resulting from a counting process. Originally this variable
is discrete, and its possible values are non-negative integers, but the actual observations
have been rounded to the nearest thousand.

b) Launch RStudio, create a new RStudio project and script (t2e5), enter the observations
from your keyboard to an RStudio spreadsheet and save it as an RData file.

Follow steps similar to those in Exercise 1 and Exercise 2 of Tutorial 1.

c) Depict the number of visitors by country of residence market with a bar graph.3

Use the barplot(Visitors) command to develop a basic bar graph.

It returns the following plot:

[Figure: basic bar graph of Visitors]

d) Annotate your bar graph with axes labels Country of Residence (x-axis), Visitors to
Australia (y-axis) and with the Bar graph for Visitors to Australia title.

Review the application of the main, ylab and xlab arguments in Exercise 3.

The following command

barplot(Visitors,
        main = "Bar graph of Visitors to Australia",
        xlab = "Country of residence",
        ylab = "Number of visitors")

returns the annotated bar graph.

3 Notice that this time a histogram would be inappropriate because the observations are classified by
categories (countries of origin) rather than adjacent class intervals.

d) Increase the scale on the vertical axis to (0,1400) and colour the bars orange.

Review the application of the ylim and col arguments in Exercise 3.

The following command

barplot(Visitors,
        main = "Bar graph of Visitors to Australia",
        xlab = "Country of residence",
        ylab = "Number of visitors",
        ylim = c(0, 1400),
        col = "orange")

returns the bar graph shown on the next page.

[Figure: Bar graph of Visitors to Australia; x-axis: Country of residence; y-axis: Number of visitors]

e) To make the bar graph more informative, expand the barplot command with the
names.arg = COR and cex.names = 0.5 arguments.

The expanded command is

barplot(Visitors,
        main = "Bar graph of Visitors to Australia",
        xlab = "Country of residence",
        ylab = "Number of visitors",
        ylim = c(0, 1400),
        col = "orange",
        names.arg = COR, cex.names = 0.5)

It returns the bar graph shown on the next page.

f) Briefly describe what the bar graph in part (e) tells you.

This bar graph shows that in 2018/19 the most tourists to Australia arrived from China,
followed by New Zealand, the US and the UK.

[Figure: Bar graph of Visitors to Australia; x-axis: Country of residence; y-axis: Number of visitors, scaled from 0 to 1400]

Although it was not part of Exercise 5, there is one more thing worth mentioning. To make
this bar graph even more informative, it is a good idea to display the bars in descending
order of their heights. Let’s do this in three steps.

First, we set up a data frame called original that consists of COR and Visitors by executing
the

original = data.frame(COR, Visitors)

command.

Second, we rearrange original in the descending order of Visitors and call the new data
frame ordered. The relevant command is

ordered = original[order(-original$Visitors),]

Third, we run the barplot command as in part (e), but on the ordered data frame, i.e.

barplot(ordered$Visitors,
        main = "Bar graph of Visitors to Australia",
        xlab = "Country of residence",
        ylab = "Number of visitors",
        ylim = c(0, 1400),
        col = "orange",
        names.arg = ordered$COR, cex.names = 0.5)

[Figure: Bar graph of Visitors to Australia with country names under the bars; x-axis: Country of residence; y-axis: Number of visitors, scaled from 0 to 1400]

The new bar graph looks like this:

[Figure: Bar graph of Visitors to Australia with the bars in descending order of height; y-axis: Number of visitors, scaled from 0 to 1400]
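
The reordering step can be checked on its own, without plotting. The sketch below types in COR and Visitors from the table in Exercise 5 and follows the original and ordered data frames described above; ordered_alt is an additional name introduced here to show an equivalent call that uses the decreasing argument of order() instead of the minus sign.

```r
# Data from the table in Exercise 5 (visitors in thousands)
COR <- c("China", "Hong Kong", "India", "Japan", "Korea",
         "Malaysia", "New Zealand", "Singapore", "UK", "US")
Visitors <- c(1331, 284, 364, 455, 250, 344, 1276, 417, 670, 771)

original <- data.frame(COR, Visitors)

# order(-x) returns the row indices that sort x from largest to smallest
ordered <- original[order(-original$Visitors), ]
ordered

# Equivalent call, arguably easier to read
ordered_alt <- original[order(original$Visitors, decreasing = TRUE), ]
```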

ECON20003 – QUANTITATIVE METHODS 2

TUTORIAL 3

Solutions

Exercises for Assessment

Exercise 6

A parking officer is conducting an analysis of the amount of time left on parking meters. A
quick survey of 15 cars that have just left their metered parking spaces produced the times
(T, in minutes) saved in the t3e6 Excel file. Assuming that the population of T is normally
distributed, estimate with 95% confidence the mean amount of time left for all the vacant
meters. Do the calculations first manually and then with R.

Since the population of T is said to be normally distributed and the population standard
deviation is unknown, the appropriate confidence interval estimator for the mean is

$$\bar{x} \pm t_{\alpha/2} \, s_{\bar{x}}$$

Using your hand calculator you can obtain the sample mean and the sample standard
deviation:

$$\bar{x} = 18.133 , \quad s = 9.753$$

From the sample standard deviation and the sample size the estimate of standard error of
the sample mean is

$$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{9.753}{\sqrt{15}} = 2.518$$

From the t-table the 97.5th percentile of the t distribution with df = n - 1 = 14 is 2.145.

Putting all these together,

 /2 18.133 2.145 2.518 12.732 ; 23.534xx t s    

Therefore, with 95% confidence, the mean amount of time left for all the vacant meters is
somewhere between 12.732 and 23.534 minutes.
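
The manual calculation can be reproduced in R from the summary statistics alone. This is a minimal sketch: the raw data in the t3e6 file are not used, and the names x_bar, se, t_crit and ci are chosen here for illustration.

```r
# Summary statistics reported above
n     <- 15
x_bar <- 18.133
s     <- 9.753

# Estimated standard error of the sample mean
se <- s / sqrt(n)

# 97.5th percentile of the t distribution with n - 1 degrees of freedom
t_crit <- qt(0.975, df = n - 1)

# 95% confidence interval: x_bar -/+ t_crit * se
ci <- x_bar + c(-1, 1) * t_crit * se

round(c(se = se, t_crit = t_crit), 3)
round(ci, 3)
```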

To obtain this confidence interval in R, import the data to RStudio and execute the

t.test(T, mu = 0, conf.level = 0.95)

command, which returns:

The 95% confidence interval on this printout confirms our manual calculations.

Exercise 7 (Selvanathan, p. 499, ex. 12.41)

In this exercise do all calculations manually.

a) A random sample of eight observations was taken from a normal population. The sample
mean and standard deviation are 75 and 50, respectively. Can we infer at the 10%
significance level that the population mean is less than 100?

Just like in the previous exercise, we are interested in the mean of an allegedly normally
distributed population whose standard deviation is unknown. This time, however, instead of
developing a confidence interval to estimate the population mean, we need to perform a
hypothesis test. Let’s follow the six-step test procedure.

The hypotheses are1

$$H_0: \mu = 100 , \quad H_A: \mu < 100$$

The sample mean is normally distributed, but since its standard error must be estimated
from the sample, the test statistic is

$$T = \frac{\bar{X} - \mu_0}{s_{\bar{X}}}$$

Under the null hypothesis this test statistic has a t distribution with df = n – 1.

The significance level is 10% and the critical value for this left-tail test is

$$-t_{\alpha, df} = -t_{0.10, 7} = -1.415$$

and we reject the null hypothesis if the calculated test statistic happens to be smaller than this
critical value.

The calculated or observed value of the test statistic is

$$t_{obs} = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{75 - 100}{50 / \sqrt{8}} = -1.414$$

1 It is easier to start with the alternative hypothesis because it is implied by the question. This is usually the
case, except when the implied statement takes the form of an equality that must be in the null hypothesis.

Since the observed value of the test statistic is (slightly) larger than the critical value (-1.415),
we maintain H0 and conclude that at the 10% level there is not enough evidence to infer that
the population mean is smaller than 100.
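
Although this exercise asks for manual calculations, the numbers are easy to check in R because the decision hinges on a very small difference. The sketch below works from the summary statistics; the p-value is an addition that is not part of the manual solution, and all object names are illustrative.

```r
# Summary statistics and test settings from part (a)
n     <- 8
x_bar <- 75
s     <- 50
mu0   <- 100
alpha <- 0.10

# Observed test statistic
t_obs <- (x_bar - mu0) / (s / sqrt(n))

# Left-tail critical value: the 10th percentile of t with n - 1 = 7 df
t_crit <- qt(alpha, df = n - 1)

# Left-tail p-value
p_value <- pt(t_obs, df = n - 1)

round(c(t_obs = t_obs, t_crit = t_crit), 4)
p_value

# Reject H0 only if t_obs falls below the critical value
t_obs < t_crit
```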

b) Repeat part (a) assuming that you know that the population standard deviation is 50.

If the population standard deviation is known to be σ = 50, then the test statistic is

$$Z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}}$$

The critical value is

$$-z_{\alpha} = -z_{0.10} = -1.282$$

Although the test statistic is different from that in part (a), its calculated value is the same:

$$z_{obs} = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{75 - 100}{50 / \sqrt{8}} = -1.414$$

Since the observed value of the test statistic is smaller than the critical value (-1.282), we
reject H0 and conclude that at the 10% level there is enough evidence to infer that the
population mean is less than 100.
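
The same check for the known-σ case might look as follows. Again this is only a sketch with illustrative object names, and the p-value is an extra that the manual solution does not report.

```r
# Known population standard deviation and test settings from part (b)
n     <- 8
x_bar <- 75
sigma <- 50
mu0   <- 100
alpha <- 0.10

# Observed test statistic
z_obs <- (x_bar - mu0) / (sigma / sqrt(n))

# Left-tail critical value: the 10th percentile of the standard normal
z_crit <- qnorm(alpha)

# Left-tail p-value
p_value <- pnorm(z_obs)

round(c(z_obs = z_obs, z_crit = z_crit), 4)
p_value

# Reject H0 if z_obs falls below the critical value
z_obs < z_crit
```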

c) Review parts (a) and (b). Explain why the test statistics differ.

The tests in parts (a) and (b) led to different conclusions. This is due to the fact that in part
(a) we had to use the t distribution, while in part (b) we could use the standard normal
distribution. Both distributions are symmetric around zero, but the t distribution is more
dispersed than the standard normal distribution and hence the critical value in part (a) is
further from zero than in part (b).

Exercise 8

Environmental engineers have found that the percentages of active bacteria in sewage
specimens collected at a sewage treatment plant have a non-normal distribution with a
median of 40% when the plant is running properly. If the median is larger than 40%, some
adjustments must be made. The percentages of active bacteria (PAB) in a random sample
of 10 specimens are saved in the t3e8 Excel file. Do the data provide enough evidence (at
 = 0.05) to indicate that adjustments are needed?

a) What are the null and alternative hypotheses?

Unlike the previous exercise, this one is about a population median. The hypotheses are

$$H_0: \eta = 40 , \quad H_A: \eta > 40$$

where η denotes the population median.

b) Which test(s) can be used to answer this question? What are the required conditions?
Do you think that these conditions are likely satisfied this time? Explain your answer.

We learnt about two nonparametric tests that can be used this time, the one-sample sign
test for the median and the one-sample Wilcoxon signed ranks test for the median.

The sign test assumes that (i) the data is a random sample, (ii) the variable of interest is
qualitative or quantitative, and (iii) the measurement scale is at least ordinal. In this case we
are told that the sample at hand is a random sample. The variable of interest, PAB, is a
quantitative variable measured on a ratio scale. Hence, all three requirements are met.

The Wilcoxon signed ranks test assumes that (i) the data is a random sample, (ii) the
variable of interest is quantitative and continuous, (iii) the measurement scale is interval or
ratio, and (iv) the distribution of the sampled population is symmetric. The first three
requirements are clearly met. As for the fourth one, due to the small sample size it is difficult
to verify it. Let’s just assume at this stage that it is satisfied and see whether the Wilcoxon
signed ranks test leads to the same conclusion as the sign test. If it does, then the issue of
symmetry is irrelevant.
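
Both tests are available in base R. The sketch below shows one possible way to run them as right-tail tests of a median of 40. Note that pab_demo is a made-up vector used only to make the code runnable; it is NOT the data in the t3e8 file, so its results say nothing about the exercise. The sign test is carried out with binom.test() on the number of observations above 40, which is one common way of doing it and not necessarily the approach taken in the course.

```r
# HYPOTHETICAL data for illustration only (not the t3e8 observations)
pab_demo <- c(41, 33, 43, 52, 46, 38, 44, 49, 53, 30)
m0 <- 40                      # hypothesised median

# Sign test: count positive differences, dropping any that equal zero
d <- pab_demo - m0
d <- d[d != 0]
s_plus <- sum(d > 0)
sign_res <- binom.test(s_plus, length(d), p = 0.5, alternative = "greater")

# Wilcoxon signed ranks test (assumes a symmetric population distribution)
wsr_res <- wilcox.test(pab_demo, mu = m0, alternative = "greater")

s_plus
sign_res$p.value
wsr_res$statistic             # V, the sum of the ranks of the positive differences
wsr_res$p.value
```

With real data, ties among the absolute differences or differences equal to zero make wilcox.test() fall back on a normal approximation and issue a warning.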

c) Perform the test(s) first manually and then with R. Explain your decision and conclusion.