EES 211 2.0 Data Analysis and Modeling

(Department of Physics, USJ)

Dr Thiyanga Talagala

Department of Statistics, University of Sri Jayewardenepura

What is Statistics?

The science of collecting, analyzing, presenting, and interpreting data.

Tabular data

Audio data

Image data

Video data

Data analysis

ID Gender A B C Weight
1 1 Male 80.0 3.6 2.5 4000
2 2 Female 90.0 2.5 6.3 5000
3 3 Female 110.0 4.0 4.5 6000
4 4 Female 100.0 4.5 3.2 7000
5 5 Female 91.5 3.0 3.5 7550
6 6 Male 92.0 3.9 3.7 4500
7 7 Male 88.0 4.2 3.8 3375
8 8 Male 70.0 4.6 3.9 5500
9 9 Female 100.5 2.9 1.2 2975
10 10 Female 99.8 3.7 4.2 3784
11 11 Female 101.5 2.7 2.2 2766
12 12 Female 99.2 5.1 4.4 5196
13 13 Female 99.6 3.8 3.9 3928
14 14 Female 99.1 2.7 3.1 2779
15 15 Female 99.8 4.0 4.8 4086
16 16 Female 100.4 4.2 2.8 4338
17 17 Female 99.3 4.1 2.9 4175
18 18 Female 100.8 3.2 2.5 3295
19 19 Female 98.8 5.0 2.8 5111
20 20 Female 99.0 3.9 4.4 3990
21 21 Female 101.4 2.9 3.9 2978
22 22 Female 99.0 1.3 4.5 1387
23 23 Female 100.4 4.6 3.1 4726
24 24 Female 99.6 3.5 3.9 3556
25 25 Female 100.4 3.5 3.7 3584
26 26 Female 101.7 4.4 2.1 4546
27 27 Female 101.6 4.3 5.3 4421
28 28 Female 99.7 4.1 3.6 4193
29 29 Female 97.7 4.4 4.3 4520
30 30 Female 102.5 4.3 4.5 4381
31 31 Female 100.7 3.6 3.4 3674
32 32 Female 100.5 1.5 3.2 1610
33 33 Female 100.0 4.1 4.4 4220
34 34 Female 100.5 3.4 2.5 3544
35 35 Female 99.8 3.3 5.5 3445
36 36 Female 100.4 2.0 3.1 2129
37 37 Female 99.6 3.0 5.2 3121
38 38 Female 98.6 3.9 5.0 4019
39 39 Female 101.0 4.9 3.6 4958
40 40 Female 101.5 3.4 4.1 3497
41 41 Female 99.7 3.9 2.5 3988
42 42 Female 98.7 3.4 3.8 3547
43 43 Female 100.6 2.1 4.5 2223
44 44 Female 100.0 3.1 3.6 3185
45 45 Female 98.3 3.1 3.0 3205
46 46 Female 100.0 3.4 2.8 3540
47 47 Female 99.4 4.6 3.5 4700
48 48 Female 99.7 4.3 4.6 4363
49 49 Female 98.8 3.3 3.0 3436
50 50 Female 101.8 3.2 3.4 3345
51 51 Female 99.7 4.2 2.2 4297
52 52 Female 98.4 4.1 4.0 4155
53 53 Female 100.2 2.8 4.8 2911
54 54 Female 100.3 2.8 5.0 2892
55 55 Male 99.0 3.4 4.3 5146
56 56 Male 97.1 3.8 1.6 5753
57 57 Male 99.4 2.9 4.0 4430
58 58 Male 100.6 3.9 4.0 5923
59 59 Male 99.9 3.4 3.1 5195
60 60 Male 99.9 2.4 3.7 3681
61 61 Male 100.6 3.3 2.6 5111
62 62 Male 98.8 1.9 4.2 2905
63 63 Male 101.1 4.4 3.2 6752
64 64 Male 100.0 5.0 1.9 7571
65 65 Male 100.7 2.6 3.1 4048
66 66 Male 101.0 2.0 4.9 3032
67 67 Male 100.2 3.6 3.2 5455
68 68 Male 99.1 2.9 4.2 4397
69 69 Male 101.2 5.4 4.4 8202
70 70 Male 98.0 3.0 3.5 4540
71 71 Male 99.5 3.7 3.1 5633
72 72 Male 99.7 3.0 3.0 4641
73 73 Male 99.8 2.3 4.2 3486
74 74 Male 101.0 3.2 2.4 4883
75 75 Male 100.1 1.2 3.7 1891
76 76 Male 100.4 4.5 3.2 6800
77 77 Male 99.9 3.2 1.2 4830
78 78 Male 99.8 5.2 2.1 7859
79 79 Male 100.7 3.5 4.4 5314
80 80 Male 101.1 2.3 3.3 3536
81 81 Male 97.6 3.6 4.3 5515
82 82 Male 100.6 2.1 5.4 3201
83 83 Male 100.4 1.7 5.0 2719
84 84 Male 99.6 3.3 4.2 5036
85 85 Male 101.0 2.6 3.9 3935
86 86 Male 99.6 3.0 3.3 4602
87 87 Male 99.7 3.1 5.1 4714
88 88 Male 100.9 2.4 4.1 3716
89 89 Male 101.7 2.4 2.3 3747
90 90 Male 100.3 2.9 3.3 4397
91 91 Male 99.6 4.2 1.6 6367
92 92 Male 98.8 1.5 3.3 2315
93 93 Male 99.7 3.6 0.9 5492
94 94 Male 99.1 3.3 4.8 5102
95 95 Male 99.7 4.1 2.9 6196
96 96 Male 100.4 2.7 3.1 4145
97 97 Male 99.1 3.4 3.3 5154
98 98 Male 102.6 3.3 4.1 5002
99 99 Male 100.2 2.5 4.2 3786
100 100 Male 101.1 4.2 4.1 6410
[Figure: Scatterplot of weight (kg) vs B (ft), points coloured by Gender (Female, Male). x-axis: B (ft); y-axis: Weight (kg).]
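
A scatterplot of weight against B can be sketched in base R as follows. This is only a rough sketch: it uses the first eight rows of the table, the object name df_small and the colours are assumptions made for illustration, and the original slide may have used a different plotting package.

```r
# First eight rows of the data shown above (object name is hypothetical)
df_small <- data.frame(
  ID     = 1:8,
  Gender = c("Male", "Female", "Female", "Female",
             "Female", "Male", "Male", "Male"),
  A      = c(80.0, 90.0, 110.0, 100.0, 91.5, 92.0, 88.0, 70.0),
  B      = c(3.6, 2.5, 4.0, 4.5, 3.0, 3.9, 4.2, 4.6),
  C      = c(2.5, 6.3, 4.5, 3.2, 3.5, 3.7, 3.8, 3.9),
  Weight = c(4000, 5000, 6000, 7000, 7550, 4500, 3375, 5500)
)

# One colour per gender (the colour choice is arbitrary)
point_cols <- ifelse(df_small$Gender == "Male", "blue", "red")

plot(df_small$B, df_small$Weight,
     col = point_cols, pch = 19,
     xlab = "B (ft)", ylab = "Weight (kg)",
     main = "Scatterplot of weight (kg) vs B (ft)")
legend("topleft", legend = c("Female", "Male"),
       col = c("red", "blue"), pch = 19, title = "Gender")
```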

Myth-busting Statistics

Popular Myths (false beliefs) about statistics

Myth 1

Statistics is a boring subject.

Statistics is a very interesting and useful subject with so many applications.

Your Turn

Assignment 1

Application of statistics in your field.

Method of evaluation: 1 slide

Time: 3 minutes

Upload your video recording to the LMS

Marks: 10 marks

Myth 2

Statistics hasn't changed much in years. It is just the same old stuff.

The field of statistics has evolved significantly over time. It has also become a main component of many other disciplines.

Ross Ihaka and Robert Gentleman

  • Originators of the R programming language

  • R is a free software environment for statistical computing and graphics

Hadley Wickham: Chief Scientist at RStudio

  • International COPSS Presidents' Award in 2019 for "influential work in statistical computing, visualisation, graphics, and data analysis"

Robert Tibshirani

  • Professor in Statistics and Biomedical Data Science at Stanford University
  • Received the COPSS Presidents' Award in 1996

Trevor Hastie

  • Hastie is known for his contributions to applied statistics, especially in the field of machine learning, data mining, and bioinformatics.

Andrew Ng

  • Undergraduate degree with a triple major in computer science, statistics, and economics

  • Co-founded and led Google Brain, Coursera and deeplearning.ai

Data Science

Statistics

  • Descriptive statistics/ Exploratory Data Analysis (EDA)

Use of graphical and numerical summaries to highlight the key features of data.

  • Inferential statistics

Techniques for drawing conclusions about a population by examining random samples

Example

Objective: Design a new chair for the university lecture halls

Aim: identify the number of right-handed and left-handed students.

Population

A population is a complete collection of individuals/ objects that we are interested in.

Sample

A sample is a subset of a population.

Parameter

A parameter is a descriptive measure (a numerical value) of the population. Parameters are usually denoted by Greek letters.

θ - population proportion

μ - population mean

Statistic

A statistic is a descriptive measure of a sample, for example the sample mean or the sample standard deviation. The notation is discussed later, under estimators and estimates.
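
To make the difference between a parameter and a statistic concrete, here is a minimal sketch based on the lecture-hall chair example. The population size, the proportion of left-handed students (0.1), the sample size, and all object names are made-up values for illustration only.

```r
set.seed(2022)  # for reproducibility; the seed value is arbitrary

# Hypothetical population: 5000 students, about 10% left-handed
population <- sample(c("left", "right"), size = 5000,
                     replace = TRUE, prob = c(0.1, 0.9))

# Parameter: proportion of left-handed students in the whole population
theta <- mean(population == "left")

# Sample: a random subset of 100 students
smp <- sample(population, size = 100)

# Statistic: proportion of left-handed students in the sample
theta_hat <- mean(smp == "left")

theta      # population proportion (usually unknown in practice)
theta_hat  # sample proportion, used to estimate theta
```

In practice only the sample is observed, so the statistic is what we can actually compute.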

Variables

1. Qualitative/ Categorical

  • Nominal, Ordinal

2. Quantitative/ Numerical

  • Discrete, Continuous
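
As a rough sketch, these variable types are often represented in R as shown below. The values and variable names are made up for illustration; factors are used here for categorical data.

```r
# Qualitative (categorical)
blood_group <- factor(c("A", "B", "O", "AB", "O"))   # nominal: no natural order
grade <- factor(c("Low", "High", "Medium", "Low"),
                levels = c("Low", "Medium", "High"),
                ordered = TRUE)                      # ordinal: ordered levels

# Quantitative (numerical)
n_siblings <- c(0L, 2L, 1L, 3L)       # discrete: counts
height_cm  <- c(160.2, 171.5, 158.9)  # continuous: measurements

is.ordered(blood_group)  # FALSE
is.ordered(grade)        # TRUE
grade < "High"           # comparisons are meaningful for ordinal data
typeof(n_siblings)       # "integer"
typeof(height_cm)        # "double"
```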

Your turn

data("mtcars")
mtcars
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2

Descriptive Statistics

  • Data visualization: Graphics

  • Numerical measures

    • We are using R software for data analysis and modeling

What is R?

  • R is a software environment for statistical computing and graphics

  • Language designers: Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand

  • Parent language: S

  • R version 3.6.2 was released on 2019-12-12; newer versions have been released since

Why R?

  • Free

  • Powerful: Over 18000 contributed packages on the main repository (CRAN), as of July 2022, provided by top international researchers and programmers

  • Flexible: It is a language, and thus allows you to create your own solutions

  • Community: A large, friendly, and helpful global community, with lots of resources

Numerical summary measures

summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000

Measures of central tendency

  • Mean

  • Median

  • Mode

Measures of dispersion

  • Range

  • Interquartile range

  • Variance

  • Standard deviation
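
The measures listed above can be computed in R as sketched below, using the mpg and cyl columns of mtcars. Base R has no built-in function for the statistical mode (mode() returns the storage mode of an object instead), so get_mode() is a hypothetical helper written only for this example.

```r
data("mtcars")
mpg <- mtcars$mpg

# Measures of central tendency
mean(mpg)    # arithmetic mean
median(mpg)  # middle value

# Hypothetical helper: returns the most frequent value(s) as character
get_mode <- function(x) {
  counts <- table(x)
  names(counts)[counts == max(counts)]
}
get_mode(mtcars$cyl)  # number of cylinders that occurs most often

# Measures of dispersion
range(mpg)        # minimum and maximum
diff(range(mpg))  # range as a single number (max - min)
IQR(mpg)          # interquartile range (Q3 - Q1)
var(mpg)          # sample variance
sd(mpg)           # sample standard deviation
```

The mean and median agree with the mpg column of the summary(mtcars) output.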

R environment

The RStudio IDE

Image Credit: Clastic Detritus

R and RStudio

Image Credit: Clastic Detritus

"If R were an airplane, RStudio would be the airport, providing many, many supporting services that make it easier for you, the pilot, to take off and go to awesome places. Sure, you can fly an airplane without an airport, but having those runways and supporting infrastructure is a game-changer."

-- Julie Lowndes

Create a new project

R Console

7+1
[1] 8
rnorm(10)
[1] -0.572542604 -1.363291256 -0.388722244 0.277914132 -0.823081122
[6] -0.068840934 -1.167662326 -0.008309014 0.128855402 -0.145875628

Variable assignment

a <- rnorm(10)
a
[1] -0.16391096 1.76355200 0.76258651 1.11143108 -0.92320695 0.16434184
[7] 1.15482519 -0.05652142 -2.12936065 0.34484576
b <- a*100
b
[1] -16.391096 176.355200 76.258651 111.143108 -92.320695 16.434184
[7] 115.482519 -5.652142 -212.936065 34.484576

Data permanency

  • ls() can be used to display the names of the objects which are currently stored within R.

  • The collection of objects currently stored is called the workspace

ls()
[1] "a" "A" "a1" "a2" "b" "B" "b1" "b2"
[9] "b3" "C" "c1" "c2" "df" "g1" "Gender" "ID"
[17] "mtcars" "p1" "w1" "w2" "w3" "Weight"
  • To remove objects the function rm is available

    • remove all objects rm(list=ls())

    • remove specific objects rm(x, y, z)

rm(a)
ls()
[1] "A" "a1" "a2" "b" "B" "b1" "b2" "b3"
[9] "C" "c1" "c2" "df" "g1" "Gender" "ID" "mtcars"
[17] "p1" "w1" "w2" "w3" "Weight"
rm(list=ls())
ls()
character(0)

At the end of an R session, if you choose to save, the objects are written to a file called .RData in the current directory, and the command lines used in the session are saved to a file called .Rhistory.

When R is started at a later time from the same directory, it reloads the associated workspace and command history.
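
Saving and reloading objects can also be done by hand. The sketch below uses save() and load() with a temporary file, so it does not touch the .RData file in your working directory; the object and file names are chosen only for illustration.

```r
# Write to a temporary file instead of the project directory
tmp_file <- tempfile(fileext = ".RData")

a <- 1:5
b <- a * 100

save(a, b, file = tmp_file)  # save selected objects
rm(a, b)                     # remove them from the workspace
exists("a")                  # FALSE

load(tmp_file)               # restore the saved objects
b
```

save.image() works in the same way but writes the whole workspace.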

Comment your code

  • Each line of a comment should begin with the comment symbol and a single space: # .
rnorm(10) # This is a comment
[1] -1.9049554 -0.8111702 1.3240043 0.6156368 1.0916690 0.3066049
[7] -0.1101588 -0.9243128 1.5929138 0.0450106
sum(1:10) # 1 + 2 + ... + 10
[1] 55

Style Guide

  • Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread. -- Hadley Wickham
sum(1:10)#Bad commenting style
[1] 55
sum(1:10) # Good commenting style
[1] 55
  • Also, use commented lines of - and = to break up your file into easily readable sub-sections.
# Read data ----------------
# Plot data ----------------

To learn more read Hadley Wickham's Style guide.

Objects in R

  • R is an object-oriented language.
  • An object in R is anything (data structures, functions, etc.) that can be assigned to a variable.

Let's take a look at some common types of objects.

1. Data structures are the ways of arranging data.

  • You can create objects, using the left pointing arrow <-

2. Functions tell R to do something.

  • A function may be applied to an object.

  • Result of applying a function is usually an object too.

    • To call a function, its name must be followed by parentheses.

Example

a <- 1:20 # data structure
sum(a) # sum is a function applied on a
[1] 210
help.start() # Some functions work on their own.

Getting help with functions and features

  • R has an inbuilt help facility

Method 1

help(rnorm)
  • For a feature specified by special characters or reserved words, such as for, if, and [[, put the name in quotes
help("[[")
  • Search the help files for a word or phrase.
help.search("weighted mean")

Method 2

?rnorm
??rnorm

Data structures

Image Credit: venus.ifca.unican.es

Data structures differ in terms of:

  • Type of data they can hold

  • How they are created

  • Structural complexity

  • Syntax to identify and access individual elements

1. Vectors

Vectors

  • Vectors are one-dimensional arrays that can hold numeric data, character data, or logical data.

  • Combine function c() is used to form the vector.

  • Data in a vector must be of only one type or mode (numeric, character, or logical). You can’t mix modes in the same vector.

Vector assignment

Syntax

vector_name <- c(element1, element2, element3)
x <- c(5, 6, 3, 1, 100)
  • Assignment operator: '<-' ('=' can be used as an alternative).

  • Combine function: c()


What will be the output of the following code?

y <- c(x, 500, 600)

Types and tests with vectors

first_vec <- c(10, 20, 50, 70)
second_vec <- c("Jan", "Feb", "March", "April")
third_vec <- c(TRUE, FALSE, TRUE, TRUE)
fourth_vec <- c(10L, 20L, 50L, 70L)

To check if it is a

  • vector: is.vector()
is.vector(first_vec)
[1] TRUE
  • character vector: is.character()
is.character(first_vec)
[1] FALSE
  • double: is.double()
is.double(first_vec)
[1] TRUE
  • integer: is.integer()
is.integer(first_vec)
[1] FALSE
  • logical: is.logical()
is.logical(first_vec)
[1] FALSE
  • length: length()
length(first_vec)
[1] 4

Coercion

Vectors must be homogeneous. When you attempt to combine different types they will be coerced to the most flexible type so that every element in the vector is of the same type.

Order from least to most flexible

logical --> integer --> double --> character

a <- c(3.1, 2L, 3, 4, "GPA")
typeof(a)
[1] "character"
anew <- c(3.1, 2L, 3, 4)
typeof(anew)
[1] "double"

Explicit coercion

Vectors can be explicitly coerced from one class to another using the as.* functions, if available. For example, as.character, as.numeric, as.integer, and as.logical.

vec1 <- c(TRUE, FALSE, TRUE, TRUE)
typeof(vec1)
[1] "logical"
vec2 <- as.integer(vec1)
typeof(vec2)
[1] "integer"
vec2
[1] 1 0 1 1
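
The other as.* functions behave in a similar way. A few more illustrative calls, with arbitrary input values:

```r
as.character(c(1.5, 2, 10))            # "1.5" "2" "10"
as.numeric(c("3.14", "42"))            # 3.14 42.00
as.integer(c(2.9, -2.9))               # 2 -2 (truncates towards zero, does not round)
as.logical(c(0, 1, 5))                 # FALSE TRUE TRUE (zero is FALSE, non-zero is TRUE)
as.logical(c("TRUE", "false", "yes"))  # TRUE FALSE NA
```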

Why does the code below output NAs?

x <- c("a", "b", "c")
as.numeric(x)
[1] NA NA NA
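
A hint, as a small sketch with made-up values: as.numeric() can only convert strings that look like numbers. Anything else becomes NA, and R issues a warning (silenced here only to keep the output tidy).

```r
x <- c("1", "2.5", "abc")
y <- suppressWarnings(as.numeric(x))
y          # 1.0 2.5  NA
is.na(y)   # FALSE FALSE  TRUE
```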
x1 <- 1:3
x2 <- c(10, 20, 30)
combinedx1x2 <- c(x1, x2)
combinedx1x2
[1] 1 2 3 10 20 30
class(x1)
[1] "integer"
class(x2)
[1] "numeric"
class(combinedx1x2)
[1] "numeric"
  • If you combine a numeric vector and a character vector, the numbers are coerced to character
y1 <- c(1, 2, 3)
y2 <- c("a", "b", "c")
c(y1, y2)
[1] "1" "2" "3" "a" "b" "c"

Name elements in a vector

You can name elements in a vector in different ways. We will learn two of them.

When creating it

x1 <- c(a=1991, b=1992, c=1993)
x1
## a b c
## 1991 1992 1993

Modifying the names of an existing vector

x2 <- c(1, 5, 10)
names(x2) <- c("a", "b", "b")
x2
## a b b
## 1 5 10

Note that the names do not have to be unique.
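
Duplicate names matter when elements are looked up by name. The sketch below uses square-bracket indexing and assumes the same x2 as above; looking up a duplicated name returns only the first match.

```r
x2 <- c(1, 5, 10)
names(x2) <- c("a", "b", "b")

names(x2)             # "a" "b" "b"
x2["a"]               # the element named "a"
x2["b"]               # only the first element named "b" (value 5)
x2[names(x2) == "b"]  # all elements named "b" (values 5 and 10)
```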

To remove names of a vector

Method 1

unname(x1); x1 # unname() returns a copy without names; x1 itself is unchanged
[1] 1991 1992 1993
a b c
1991 1992 1993
Method 2

names(x2) <- NULL; x2
[1] 1 5 10

What will be the output of the following code?

v <- c(1, 2, 3)
names(v) <- c("a")
v

Simplifying vector creation

  • colon (:) produces regularly spaced ascending or descending sequences.
10:16
[1] 10 11 12 13 14 15 16
-0.5:8.5
[1] -0.5 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5
  • sequence: seq(initial_value, final_value, increment)
seq(1,11)
[1] 1 2 3 4 5 6 7 8 9 10 11
seq(1, 11, length.out=5)
[1] 1.0 3.5 6.0 8.5 11.0
seq(0, 11, by=2)
[1] 0 2 4 6 8 10
  • repeat: rep()
rep(9, 5)
[1] 9 9 9 9 9
rep(1:4, 2)
[1] 1 2 3 4 1 2 3 4
rep(1:4, each=2) # each element is repeated twice
[1] 1 1 2 2 3 3 4 4
rep(1:4, times=2) # whole sequence is repeated twice
[1] 1 2 3 4 1 2 3 4
rep(1:4, each=2, times=3)
[1] 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4
rep(1:4, 1:4)
[1] 1 2 2 3 3 3 4 4 4 4
rep(1:4, c(4, 1, 4, 2))
[1] 1 1 1 1 2 3 3 3 3 4 4

Logical operators

c(1, 2, 3) == c(10, 20, 3)
[1] FALSE FALSE TRUE
c(1, 2, 3) != c(10, 20, 3)
[1] TRUE TRUE FALSE
1:5 > 3
[1] FALSE FALSE FALSE TRUE TRUE
1:5 < 3
[1] TRUE TRUE FALSE FALSE FALSE
  • <= less than or equal to

  • >= greater than or equal to

  • | or

  • & and
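
The four operators listed above work element by element, like the comparisons shown earlier. A short sketch with an arbitrary vector:

```r
x <- 1:5
x <= 3          # TRUE TRUE TRUE FALSE FALSE
x >= 3          # FALSE FALSE TRUE TRUE TRUE
x < 2 | x > 4   # TRUE FALSE FALSE FALSE TRUE
x > 1 & x < 5   # FALSE TRUE TRUE TRUE FALSE
```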

Operators: %in% - in the set

a <- c(1, 2, 3)
b <- c(1, 10, 3)
a%in%b
[1] TRUE FALSE TRUE
x <- 1:10
y <- 1:3
x
[1] 1 2 3 4 5 6 7 8 9 10
y
[1] 1 2 3
x %in% y
[1] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
y %in% x
[1] TRUE TRUE TRUE

Vector arithmetic

  • Operations are performed element by element.
c(10, 100, 100) + 2 # two is added to every element in the vector
[1] 12 102 102
  • operations between two vectors
v1 <- c(1, 2, 3); v2 <- c(10, 100, 1000)
v1 + v2
[1] 11 102 1003

Add two vectors of unequal length

longvec <- seq(10, 100, length.out = 10); shortvec <- c(1, 2, 3, 4, 5)
shortvec + longvec
[1] 11 22 33 44 55 61 72 83 94 105
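
What happens here is usually called recycling: the shorter vector is repeated until it matches the length of the longer one. The sketch below checks this, and also shows the case where the longer length is not a multiple of the shorter one (the warning is silenced only to keep the output tidy).

```r
longvec  <- seq(10, 100, length.out = 10)
shortvec <- c(1, 2, 3, 4, 5)

# shortvec is used twice to cover all ten elements of longvec
identical(shortvec + longvec, rep(shortvec, times = 2) + longvec)  # TRUE

# Lengths 3 and 2: R still recycles, but issues a warning
suppressWarnings(c(1, 2, 3) + c(10, 20))  # 11 22 13
```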

What will be the output of the following code?

first <- c(1, 2, 3, 4); second <- c(10, 100)
first * second