SCIE4401 Atakelty Hailu University of Western Australia

    Prepare answers for all the questions provided in this assignment and submit your results as a single PDF
    file by the due date indicated in the LMS.

    Presentation quality
    Your submitted file should be easy to mark. Label your answers clearly using the same numbers as those
    included in the assignment instructions. Include in your answers only the content requested. If you have
    been asked to include R script or R output for a question, include the R script/output after proper
    formatting (alignment, spacing, etc.). If your answer is a report, any figures and tables must be appropriately
    formatted, appropriately scaled and labelled sequentially as Figure 1, Figure 2, etc. The main body of the
    report should refer to the figures and tables included.

    Marking matters
    For marking, both the correctness of your answers and the quality of the presentation matter. First, raw
    output from R should never be included in your answers. Second, use well-formatted tables and figures that
    have been refined and are of good quality. Third, written summaries and discussion should be done using
    properly constructed sentences (and paragraphs, where appropriate). The writing should be properly edited,
    easy to read and follow. A mark breakdown is provided for answers requested as reports in Appendix A1.

    Discussion about assignment questions
    If you are not sure what techniques or commands to use, refer to material covered in lectures and lab
    sessions in previous weeks. If you are still not sure, post your questions on the Discussion Board for the unit.
    Students are encouraged to post and respond to questions about commands or methods on the Discussion
    Board. However, students should refrain from posting or sharing actual solutions or answers to problems in
    the assignment. They should also refrain from asking instructors to confirm the correctness of their solutions
    or answers before the due date; but they should feel free to seek advice on material already covered in lectures
    and lab sessions.

    1. Do the following exercises using appropriate non-parametric tests. In each case, provide your answers as
    properly written sentences indicating what your null and alternative hypotheses are, what R methods
    you used to implement the statistical tests, what your p-values are, and the conclusions you drew.
    Unless otherwise stated, assume the level of significance is 0.05 (α = 0.05).
    Keep your answers short. For example, you could just answer as follows: “A Mann-Whitney
    non-parametric test was implemented in R using the wilcox.test(sample1, sample2, alternative="two.sided")
    call to test the null hypothesis that men and women do not differ in the number of cups of coffee they
    drink (the alternative hypothesis is that there is a difference). The p-value for the test is 0.012. Thus,
    we reject the null and conclude that there is a difference in the amount of coffee consumed by men and
    women.” [30 marks]
    a) An analyst has been gathering data on the performance of different equipment brands (A, B, C
    and D). Equipment performance is measured by the number of hours of continuous operation before
    a major breakdown occurs. Using the following data, test if the equipment brands have the same
    performance.
    A: 2600,2379,2614,1906,2050,2474,2593,2520,1694,2566
    B: 2908,2767,2820,3004,2506,2895,2762,2719,2901,2588,2627,2404,3073,2195,3133,2560,2908,2674,2413,2751
    C: 2133,2388,2213,2384,2231,2540,2485,2345,2356,2482,2098,2073,2391,2413
    D: 2742,2821,2723,2465,2448,2491,2303,2212,2578,2634,2630
    b) Yield observations for 3 crop varieties (A, B & C) are as follows:
    Variety A: 914, 1048, 1351, 1353, 1275
    Variety B: 1963, 2052, 2291, 1755, 1674, 2464
    Variety C: 1095, 1811, 1869, 1472, 1143, 1638, 1703
    Test if the three varieties have the same yields.
    c) A clinic has the following information on white blood cell (WBC) counts (in billion cells per litre)
    for a group of Caucasian men and Caucasian women that visited on a particular day.
    Caucasian women: 5.54,5.78,5.89,5.84,6.93,2.43,4.53,3.93,6.05,4.41,6.5,4.29
    Caucasian men: 7.76,8.74,8.26,6.64,7.02,5.24,3.8,7.96
    Use the data to test if the two groups have the same WBC counts.
    d) Suppose you have the following WBC counts for a sample of 6 black men: 7.54,6.76,5.2,4.76,3.27,4.42
    Test if there is a difference between the WBC counts for black and Caucasian men.
    e) Would your approach to the above test change if you knew that the scientific literature shows that
    total WBC counts are lower for black people? Would that knowledge change the way you set up
    your hypothesis and conduct the statistical test? Explain. And if your answer is ‘yes’, do the test
    again with the new setup (new null and alternative hypothesis). (There is an article included in
    the folder on this topic that you can quickly skim through (Coates et al. 2020) to learn about the
    type of WBCs that are lower in one race than another and about possible explanations. You can
    check the medical dictionary here for basic information on WBC types and their functions.)
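
    The example answer in question 1 refers to a wilcox.test() call. As a minimal sketch of how such a call
    might be set up and its p-value extracted for reporting, the snippet below uses made-up coffee consumption
    values; the vectors men and women and their contents are hypothetical and are not part of the assignment data.

```r
# Hypothetical data (made up for illustration only): cups of coffee per day
men   <- c(3.5, 5.0, 2.5, 4.5, 6.0, 5.5, 4.0)
women <- c(1.0, 2.0, 3.0, 1.5, 2.2, 2.8, 3.2)

# Two-sided Mann-Whitney (Wilcoxon rank-sum) test
# H0: the two groups do not differ; H1: the two groups differ
result <- wilcox.test(men, women, alternative = "two.sided")

# Extract the p-value, rounded for reporting in a written sentence
p_value <- round(result$p.value, 3)

# Compare against the significance level used in the assignment (0.05)
reject_null <- p_value < 0.05
```

    Storing the test result in an object makes it possible to report the p-value in a sentence rather than
    pasting raw R output, in line with the presentation requirements above.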

    2. Use the data on human height described in the Appendix (C1) to do the following exercises. Read the
    description of the data in Appendix C1 before you proceed. [30 marks]
    a) Summary of female height increases (15 marks)
    Suppose our interest is in understanding the percent increases in adult female height between 1896
    and 1996 and how these increases vary between continents. The following table presents summary
    statistics on female height and its growth over the century, by continent and for the whole world.
    The data shown in columns are: number of countries included from the continent (n); average adult
    height for females born in 1996 (cm.1996); average percent increase (pct.mean); minimum increase
    (pct.min); maximum increase (pct.max); standard deviation of the increase; and coefficient
    of variation for the increase.
    i) Examine the statistics in the table and summarise your observations about current female
    height and its change between 1896 and 1996. Limit your answer to between 175 and 200
    words. Answers that are longer or shorter will attract penalties.
    ii) Write a script to generate in R a data frame with all the summary data shown in the table. The
    script should start by reading in the file with height data and then proceed to calculate and
    organise all the summary statistics. A stub is provided below for you as guidance on how you
    could set up your script. The marker will copy and paste your script for evaluation. Therefore,
    you should execute it to confirm that it works as intended.

