
R for Data Science With Sports Applications

2023-10-04


Recap

  • From Lab 1: it is important to keep track of where files are located on your computer.

  • Your hard drive is organized in folders or directories.

  • On macOS, ~/Desktop is your desktop folder.

  • If you have a folder called lab1 on your desktop, ~/Desktop/lab1 is its location.

  • Run setwd("~/Desktop/lab1") to make that the working directory.

  • list.files() prints the contents of the working directory in your console.
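A minimal sketch of the same workflow that runs on any machine. It assumes a throwaway folder under tempdir() stands in for ~/Desktop/lab1; lab_dir and notes.txt are hypothetical names chosen for illustration.

```r
# Remember the current working directory so it can be restored at the end
old_wd <- getwd()

# Hypothetical stand-in for ~/Desktop/lab1: a folder inside this session's temp directory
lab_dir <- file.path(tempdir(), "lab1")
dir.create(lab_dir, showWarnings = FALSE)

setwd(lab_dir)                     # make lab_dir the working directory
writeLines("hello", "notes.txt")   # a relative path, so the file lands in lab_dir
getwd()                            # the folder R now reads from and writes to
files_here <- list.files()         # character vector of file names in lab_dir
files_here

setwd(old_wd)                      # go back to where we started
```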


Recap 2

  • R code is organized into functions.

  • Functions take arguments and return values.

  • Data is stored in objects.

  • Assignment (<-) makes variables point to objects.
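A small illustration of these ideas. win_pct() is a hypothetical helper written only for this example (it is not part of R), and the 65-17 record is Houston's from the nba data used later.

```r
# A user-defined function: two arguments in, one value returned
win_pct <- function(wins, losses) {
  wins / (wins + losses)
}

# Assignment binds the name `rockets` to the value the function returns
rockets <- win_pct(wins = 65, losses = 17)
rockets   # about 0.793
```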


Recap 3

x <- c(1, 2, 3)
mean(x)
## [1] 2
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...

Data frame

  • The standard R object for tabular data: each column is a variable, each row an observation.
# NBA 2017/2018 regular season: team totals over 82 games
df <- readRDS('./nba.rds')
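If nba.rds is not at hand, a data frame can also be built by hand with data.frame(). A sketch using three teams' values copied from the nba data shown below (Team prints as chr in R 4.0 or later, where strings are no longer converted to factors by default):

```r
# Each argument to data.frame() becomes one column; all columns must have the same length
teams <- data.frame(
  Team    = c("Atlanta Hawks", "Boston Celtics", "Houston Rockets"),
  Playoff = factor(c("N", "Y", "Y")),
  W       = c(24L, 55L, 65L),
  L       = c(58L, 27L, 17L)
)

str(teams)   # 3 obs. of 4 variables: chr, Factor, int, int
nrow(teams)  # number of rows (observations)
ncol(teams)  # number of columns (variables)
```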

Inspect df

df
## Team Playoff GP MIN PTS W L P2M P2A P2p P3M
## 1 Atlanta Hawks N 82 3941 8475 24 58 2213 4471 49.49676 917
## 2 Boston Celtics Y 82 3961 8529 55 27 2202 4483 49.11889 939
## 3 Brooklyn Nets N 82 3971 8741 28 54 2095 4190 50.00000 1041
## 4 Charlotte Hornets N 82 3956 8874 36 46 2373 4873 48.69690 824
## 5 Chicago Bulls N 82 3971 8440 27 55 2264 4736 47.80405 906
## 6 Cleveland Cavaliers Y 82 3946 9091 50 32 2330 4314 54.01020 981
## 7 Dallas Mavericks N 82 3961 8390 24 58 2161 4354 49.63252 967
## 8 Denver Nuggets N 82 3976 9020 46 36 2398 4566 52.51862 940
## 9 Detroit Pistons N 82 3961 8509 39 43 2322 4756 48.82254 886
## 10 Golden State Warriors Y 82 3946 9304 58 24 2583 4611 56.01822 926
## 11 Houston Rockets Y 82 3951 9213 65 17 1918 3436 55.82072 1256
## 12 Indiana Pacers Y 82 3951 8656 48 34 2604 5073 51.33057 741
## 13 LA Clippers N 82 3941 8937 42 40 2525 4808 52.51664 777
## 14 Los Angeles Lakers N 82 3981 8862 35 47 2516 4864 51.72697 822
## 15 Memphis Grizzlies N 82 3941 8145 22 60 2255 4636 48.64107 758
## 16 Miami Heat Y 82 3986 8480 44 38 2281 4491 50.79047 903
## 17 Milwaukee Bucks Y 82 3966 8731 44 38 2539 4783 53.08384 718
## 18 Minnesota Timberwolves Y 82 3961 8980 47 35 2707 5218 51.87811 658
## 19 New Orleans Pelicans Y 82 3991 9161 48 34 2663 4929 54.02719 837
## 20 New York Knicks N 82 3966 8566 29 53 2661 5279 50.40727 673
## 21 Oklahoma City Thunder Y 82 3966 8844 48 34 2390 4730 50.52854 881
## 22 Orlando Magic N 82 3946 8479 25 57 2338 4637 50.42053 844
## 23 Philadelphia 76ers Y 82 3956 9004 52 30 2448 4653 52.61122 901
## 24 Phoenix Suns N 82 3941 8522 21 61 2390 4855 49.22760 763
## 25 Portland Trail Blazers Y 82 3951 8661 49 33 2377 4824 49.27446 845
## 26 Sacramento Kings N 82 3951 8104 27 55 2441 5096 47.90031 738
## 27 San Antonio Spurs Y 82 3946 8424 47 35 2506 5022 49.90044 696
## 28 Toronto Raptors Y 82 3966 9156 59 23 2415 4464 54.09946 968
## 29 Utah Jazz Y 82 3951 8540 48 34 2252 4372 51.50961 887
## 30 Washington Wizards Y 82 3971 8742 43 39 2461 4845 50.79463 814
## P3A P3p FTM FTA FTp OREB DREB AST TOV STL BLK PF PM team
## 1 2544 36.04560 1298 1654 78.47642 743 2693 1946 1276 638 348 1606 -447 ATL
## 2 2492 37.68058 1308 1697 77.07720 767 2878 1842 1149 604 373 1618 294 BOS
## 3 2924 35.60192 1428 1850 77.18919 792 2852 1941 1245 512 390 1688 -307 BKN
## 4 2233 36.90103 1656 2216 74.72924 827 2901 1770 1041 559 373 1409 21 CHA
## 5 2549 35.54335 1194 1574 75.85769 790 2873 1923 1147 626 289 1571 -577 CHI
## 6 2636 37.21548 1488 1909 77.94657 694 2761 1916 1126 582 312 1524 77 CLE
## 7 2688 35.97470 1167 1530 76.27451 666 2717 1858 1007 578 310 1578 -249 DAL
## 8 2536 37.06625 1404 1830 76.72131 902 2748 2059 1227 627 404 1533 121 DEN
## 9 2373 37.33670 1207 1621 74.46021 830 2756 1868 1103 628 317 1508 -12 DET
## 10 2369 39.08822 1360 1668 81.53477 691 2877 2402 1265 655 612 1607 490 GSW
## 11 3470 36.19597 1609 2061 78.06890 739 2825 1767 1135 699 392 1597 695 HOU
## 12 2010 36.86567 1225 1573 77.87667 788 2684 1819 1088 721 340 1544 113 IND
## 13 2196 35.38251 1556 2095 74.27208 832 2767 1832 1204 628 373 1638 3 LAC
## 14 2384 34.47987 1364 1910 71.41361 876 2927 1949 1295 633 388 1736 -127 LAL
## 15 2152 35.22305 1361 1732 78.57968 779 2544 1767 1227 612 396 1900 -509 MEM
## 16 2506 36.03352 1209 1601 75.51530 763 2801 1862 1178 620 437 1648 39 MIA
## 17 2024 35.47431 1499 1915 78.27676 688 2579 1905 1135 722 443 1752 -25 MIL
## 18 1845 35.66396 1592 1980 80.40404 848 2593 1861 1021 689 345 1495 183 MIN
## 19 2312 36.20242 1324 1716 77.15618 712 2924 2195 1223 657 485 1570 107 NOP
## 20 1914 35.16196 1225 1557 78.67694 859 2752 1912 1207 552 421 1682 -292 NYK
## 21 2491 35.36732 1421 1985 71.58690 1024 2671 1750 1147 743 412 1653 280 OKC
## 22 2405 35.09356 1271 1678 75.74493 722 2692 1921 1192 622 400 1579 -395 ORL
## 23 2445 36.85072 1405 1868 75.21413 893 2996 2221 1353 682 420 1811 369 PHI
## 24 2286 33.37708 1453 1962 74.05708 842 2776 1743 1289 569 370 1807 -768 PHX
## 25 2308 36.61179 1372 1715 80.00000 835 2893 1599 1109 573 423 1599 213 POR
## 26 1967 37.51906 1008 1371 73.52298 777 2578 1768 1125 643 340 1639 -573 SAC
## 27 1977 35.20486 1324 1715 77.20117 849 2777 1868 1078 628 460 1408 237 SAS
## 28 2705 35.78558 1422 1790 79.44134 800 2807 1995 1095 626 500 1783 638 TOR
## 29 2425 36.57732 1375 1766 77.85957 740 2807 1839 1205 708 420 1608 353 UTA
## 30 2173 37.45973 1378 1786 77.15566 823 2713 2065 1196 645 353 1746 48 WAS
## Conference Division Rank
## 1 E Southeast 15
## 2 E Atlantic 2
## 3 E Atlantic 12
## 4 E Southeast 10
## 5 E Central 13
## 6 E Central 4
## 7 W Southwest 13
## 8 W Northwest 9
## 9 E Central 9
## 10 W Pacific 2
## 11 W Southwest 1
## 12 E Central 5
## 13 W Pacific 10
## 14 W Pacific 11
## 15 W Southwest 14
## 16 E Southeast 6
## 17 E Central 7
## 18 W Northwest 8
## 19 W Southwest 6
## 20 E Atlantic 11
## 21 W Northwest 4
## 22 E Southeast 14
## 23 E Atlantic 3
## 24 W Pacific 15
## 25 W Northwest 3
## 26 W Pacific 12
## 27 W Southwest 7
## 28 E Atlantic 1
## 29 W Northwest 5
## 30 E Southeast 8
library(dplyr)
glimpse(df)
## Rows: 30
## Columns: 28
## $ Team <chr> "Atlanta Hawks", "Boston Celtics", "Brooklyn Nets", "Charlo…
## $ Playoff <fct> N, Y, N, N, N, Y, N, N, N, Y, Y, Y, N, N, N, Y, Y, Y, Y, N,…
## $ GP <int> 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82,…
## $ MIN <int> 3941, 3961, 3971, 3956, 3971, 3946, 3961, 3976, 3961, 3946,…
## $ PTS <int> 8475, 8529, 8741, 8874, 8440, 9091, 8390, 9020, 8509, 9304,…
## $ W <int> 24, 55, 28, 36, 27, 50, 24, 46, 39, 58, 65, 48, 42, 35, 22,…
## $ L <int> 58, 27, 54, 46, 55, 32, 58, 36, 43, 24, 17, 34, 40, 47, 60,…
## $ P2M <int> 2213, 2202, 2095, 2373, 2264, 2330, 2161, 2398, 2322, 2583,…
## $ P2A <int> 4471, 4483, 4190, 4873, 4736, 4314, 4354, 4566, 4756, 4611,…
## $ P2p <dbl> 49.49676, 49.11889, 50.00000, 48.69690, 47.80405, 54.01020,…
## $ P3M <int> 917, 939, 1041, 824, 906, 981, 967, 940, 886, 926, 1256, 74…
## $ P3A <int> 2544, 2492, 2924, 2233, 2549, 2636, 2688, 2536, 2373, 2369,…
## $ P3p <dbl> 36.04560, 37.68058, 35.60192, 36.90103, 35.54335, 37.21548,…
## $ FTM <int> 1298, 1308, 1428, 1656, 1194, 1488, 1167, 1404, 1207, 1360,…
## $ FTA <int> 1654, 1697, 1850, 2216, 1574, 1909, 1530, 1830, 1621, 1668,…
## $ FTp <dbl> 78.47642, 77.07720, 77.18919, 74.72924, 75.85769, 77.94657,…
## $ OREB <int> 743, 767, 792, 827, 790, 694, 666, 902, 830, 691, 739, 788,…
## $ DREB <int> 2693, 2878, 2852, 2901, 2873, 2761, 2717, 2748, 2756, 2877,…
## $ AST <int> 1946, 1842, 1941, 1770, 1923, 1916, 1858, 2059, 1868, 2402,…
## $ TOV <int> 1276, 1149, 1245, 1041, 1147, 1126, 1007, 1227, 1103, 1265,…
## $ STL <int> 638, 604, 512, 559, 626, 582, 578, 627, 628, 655, 699, 721,…
## $ BLK <int> 348, 373, 390, 373, 289, 312, 310, 404, 317, 612, 392, 340,…
## $ PF <int> 1606, 1618, 1688, 1409, 1571, 1524, 1578, 1533, 1508, 1607,…
## $ PM <int> -447, 294, -307, 21, -577, 77, -249, 121, -12, 490, 695, 11…
## $ team <fct> ATL, BOS, BKN, CHA, CHI, CLE, DAL, DEN, DET, GSW, HOU, IND,…
## $ Conference <fct> E, E, E, E, E, E, W, W, E, W, W, E, W, W, W, E, E, W, W, E,…
## $ Division <fct> Southeast, Atlantic, Atlantic, Southeast, Central, Central,…
## $ Rank <int> 15, 2, 12, 10, 13, 4, 13, 9, 9, 2, 1, 5, 10, 11, 14, 6, 7, …

dplyr verbs

The key functions. Each one takes a data frame as its first argument and returns a new data frame.

  • filter
  • select
  • mutate
  • group_by
  • summarize
  • arrange
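Because every verb returns a data frame, verbs can be applied one after another. A sketch on four teams copied by hand from the nba data; teams, step1, step2 and step3 are illustrative names.

```r
library(dplyr)

# Wins and playoff status for four teams, copied from the slide's nba data
teams <- data.frame(
  Team    = c("Atlanta Hawks", "Boston Celtics", "Houston Rockets", "Phoenix Suns"),
  Playoff = c("N", "Y", "Y", "N"),
  W       = c(24L, 55L, 65L, 21L),
  L       = c(58L, 27L, 17L, 61L)
)

# Each verb returns a data frame, so one verb's output can be the next verb's input
step1 <- filter(teams, Playoff == "Y")    # playoff teams only
step2 <- select(step1, Team, W)           # keep two columns
step3 <- arrange(step2, desc(W))          # most wins first
step3
```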

Filter

Keep only the rows of a data frame that satisfy a condition.

playoff_teams <- filter(df, Playoff=='Y')
glimpse(playoff_teams)
## Rows: 16
## Columns: 28
## $ Team <chr> "Boston Celtics", "Cleveland Cavaliers", "Golden State Warr…
## $ Playoff <fct> Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y
## $ GP <int> 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82,…
## $ MIN <int> 3961, 3946, 3946, 3951, 3951, 3986, 3966, 3961, 3991, 3966,…
## $ PTS <int> 8529, 9091, 9304, 9213, 8656, 8480, 8731, 8980, 9161, 8844,…
## $ W <int> 55, 50, 58, 65, 48, 44, 44, 47, 48, 48, 52, 49, 47, 59, 48,…
## $ L <int> 27, 32, 24, 17, 34, 38, 38, 35, 34, 34, 30, 33, 35, 23, 34,…
## $ P2M <int> 2202, 2330, 2583, 1918, 2604, 2281, 2539, 2707, 2663, 2390,…
## $ P2A <int> 4483, 4314, 4611, 3436, 5073, 4491, 4783, 5218, 4929, 4730,…
## $ P2p <dbl> 49.11889, 54.01020, 56.01822, 55.82072, 51.33057, 50.79047,…
## $ P3M <int> 939, 981, 926, 1256, 741, 903, 718, 658, 837, 881, 901, 845…
## $ P3A <int> 2492, 2636, 2369, 3470, 2010, 2506, 2024, 1845, 2312, 2491,…
## $ P3p <dbl> 37.68058, 37.21548, 39.08822, 36.19597, 36.86567, 36.03352,…
## $ FTM <int> 1308, 1488, 1360, 1609, 1225, 1209, 1499, 1592, 1324, 1421,…
## $ FTA <int> 1697, 1909, 1668, 2061, 1573, 1601, 1915, 1980, 1716, 1985,…
## $ FTp <dbl> 77.07720, 77.94657, 81.53477, 78.06890, 77.87667, 75.51530,…
## $ OREB <int> 767, 694, 691, 739, 788, 763, 688, 848, 712, 1024, 893, 835…
## $ DREB <int> 2878, 2761, 2877, 2825, 2684, 2801, 2579, 2593, 2924, 2671,…
## $ AST <int> 1842, 1916, 2402, 1767, 1819, 1862, 1905, 1861, 2195, 1750,…
## $ TOV <int> 1149, 1126, 1265, 1135, 1088, 1178, 1135, 1021, 1223, 1147,…
## $ STL <int> 604, 582, 655, 699, 721, 620, 722, 689, 657, 743, 682, 573,…
## $ BLK <int> 373, 312, 612, 392, 340, 437, 443, 345, 485, 412, 420, 423,…
## $ PF <int> 1618, 1524, 1607, 1597, 1544, 1648, 1752, 1495, 1570, 1653,…
## $ PM <int> 294, 77, 490, 695, 113, 39, -25, 183, 107, 280, 369, 213, 2…
## $ team <fct> BOS, CLE, GSW, HOU, IND, MIA, MIL, MIN, NOP, OKC, PHI, POR,…
## $ Conference <fct> E, E, W, W, E, E, E, W, W, W, E, W, W, E, W, E
## $ Division <fct> Atlantic, Central, Pacific, Southwest, Central, Southeast, …
## $ Rank <int> 2, 4, 2, 1, 5, 6, 7, 8, 6, 4, 3, 3, 7, 1, 5, 8

Returns a new data frame with only the rows where the condition (the second argument) is TRUE; rows where it is FALSE or NA are dropped.
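Conditions can be combined. A sketch on six teams copied by hand from the nba data (not the full 30-team file), assuming Playoff is stored as character here rather than as a factor; == compares the same way for both.

```r
library(dplyr)

# Six teams hand-copied from the slide's nba data
teams <- data.frame(
  Team       = c("Atlanta Hawks", "Boston Celtics", "Golden State Warriors",
                 "Houston Rockets", "Phoenix Suns", "Toronto Raptors"),
  Playoff    = c("N", "Y", "Y", "Y", "N", "Y"),
  Conference = c("E", "E", "W", "W", "W", "E"),
  W          = c(24L, 55L, 58L, 65L, 21L, 59L)
)

# Conditions separated by commas must all be TRUE (like &)
west_playoff <- filter(teams, Playoff == "Y", Conference == "W")

# | keeps a row if either condition is TRUE
extremes <- filter(teams, W >= 60 | W <= 25)

# %in% tests membership in a set of values
some_teams <- filter(teams, Team %in% c("Boston Celtics", "Toronto Raptors"))
```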


Select

Keep only the columns you name; all other columns are dropped.

df_2 <- select(df, Team, Playoff, W, L)
glimpse(df_2)
## Rows: 30
## Columns: 4
## $ Team <chr> "Atlanta Hawks", "Boston Celtics", "Brooklyn Nets", "Charlotte…
## $ Playoff <fct> N, Y, N, N, N, Y, N, N, N, Y, Y, Y, N, N, N, Y, Y, Y, Y, N, Y,…
## $ W <int> 24, 55, 28, 36, 27, 50, 24, 46, 39, 58, 65, 48, 42, 35, 22, 44…
## $ L <int> 58, 27, 54, 46, 55, 32, 58, 36, 43, 24, 17, 34, 40, 47, 60, 38…
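select() also accepts a minus sign to drop columns, helpers such as starts_with(), and new_name = old_name to rename. A sketch on three teams copied from the nba data:

```r
library(dplyr)

# A few columns copied from the slide's nba data for three teams
teams <- data.frame(
  Team = c("Atlanta Hawks", "Boston Celtics", "Houston Rockets"),
  W    = c(24L, 55L, 65L),
  L    = c(58L, 27L, 17L),
  P3M  = c(917L, 939L, 1256L),
  P3A  = c(2544L, 2492L, 3470L)
)

# A minus sign drops a column and keeps all the others
no_losses <- select(teams, -L)

# starts_with() picks columns whose names begin with a given prefix
threes <- select(teams, Team, starts_with("P3"))

# new_name = old_name renames while selecting
renamed <- select(teams, Team, wins = W)
```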

Mutate

Returns a new data frame with one or more columns added:

df_rebs <- mutate(df, REB=OREB+DREB)
glimpse(df_rebs)
## Rows: 30
## Columns: 29
## $ Team <chr> "Atlanta Hawks", "Boston Celtics", "Brooklyn Nets", "Charlo…
## $ Playoff <fct> N, Y, N, N, N, Y, N, N, N, Y, Y, Y, N, N, N, Y, Y, Y, Y, N,…
## $ GP <int> 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82,…
## $ MIN <int> 3941, 3961, 3971, 3956, 3971, 3946, 3961, 3976, 3961, 3946,…
## $ PTS <int> 8475, 8529, 8741, 8874, 8440, 9091, 8390, 9020, 8509, 9304,…
## $ W <int> 24, 55, 28, 36, 27, 50, 24, 46, 39, 58, 65, 48, 42, 35, 22,…
## $ L <int> 58, 27, 54, 46, 55, 32, 58, 36, 43, 24, 17, 34, 40, 47, 60,…
## $ P2M <int> 2213, 2202, 2095, 2373, 2264, 2330, 2161, 2398, 2322, 2583,…
## $ P2A <int> 4471, 4483, 4190, 4873, 4736, 4314, 4354, 4566, 4756, 4611,…
## $ P2p <dbl> 49.49676, 49.11889, 50.00000, 48.69690, 47.80405, 54.01020,…
## $ P3M <int> 917, 939, 1041, 824, 906, 981, 967, 940, 886, 926, 1256, 74…
## $ P3A <int> 2544, 2492, 2924, 2233, 2549, 2636, 2688, 2536, 2373, 2369,…
## $ P3p <dbl> 36.04560, 37.68058, 35.60192, 36.90103, 35.54335, 37.21548,…
## $ FTM <int> 1298, 1308, 1428, 1656, 1194, 1488, 1167, 1404, 1207, 1360,…
## $ FTA <int> 1654, 1697, 1850, 2216, 1574, 1909, 1530, 1830, 1621, 1668,…
## $ FTp <dbl> 78.47642, 77.07720, 77.18919, 74.72924, 75.85769, 77.94657,…
## $ OREB <int> 743, 767, 792, 827, 790, 694, 666, 902, 830, 691, 739, 788,…
## $ DREB <int> 2693, 2878, 2852, 2901, 2873, 2761, 2717, 2748, 2756, 2877,…
## $ AST <int> 1946, 1842, 1941, 1770, 1923, 1916, 1858, 2059, 1868, 2402,…
## $ TOV <int> 1276, 1149, 1245, 1041, 1147, 1126, 1007, 1227, 1103, 1265,…
## $ STL <int> 638, 604, 512, 559, 626, 582, 578, 627, 628, 655, 699, 721,…
## $ BLK <int> 348, 373, 390, 373, 289, 312, 310, 404, 317, 612, 392, 340,…
## $ PF <int> 1606, 1618, 1688, 1409, 1571, 1524, 1578, 1533, 1508, 1607,…
## $ PM <int> -447, 294, -307, 21, -577, 77, -249, 121, -12, 490, 695, 11…
## $ team <fct> ATL, BOS, BKN, CHA, CHI, CLE, DAL, DEN, DET, GSW, HOU, IND,…
## $ Conference <fct> E, E, E, E, E, E, W, W, E, W, W, E, W, W, W, E, E, W, W, E,…
## $ Division <fct> Southeast, Atlantic, Atlantic, Southeast, Central, Central,…
## $ Rank <int> 15, 2, 12, 10, 13, 4, 13, 9, 9, 2, 1, 5, 10, 11, 14, 6, 7, …
## $ REB <int> 3436, 3645, 3644, 3728, 3663, 3455, 3383, 3650, 3586, 3568,…

The first argument is a data frame. Each remaining argument has the form new_name = expression; the expression can refer to existing columns by name and use arithmetic operators (+, -, *, /) or functions.
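A sketch of mutate() with several expressions in one call, on three teams copied from the nba data; win_pct and oreb_sh are illustrative column names, not columns of nba.rds.

```r
library(dplyr)

# Record and rebounds for three teams, copied from the slide's nba data
teams <- data.frame(
  Team = c("Atlanta Hawks", "Boston Celtics", "Houston Rockets"),
  W    = c(24L, 55L, 65L),
  L    = c(58L, 27L, 17L),
  OREB = c(743L, 767L, 739L),
  DREB = c(2693L, 2878L, 2825L)
)

# Several new columns at once; a later expression may use a column created earlier
teams2 <- mutate(teams,
  REB     = OREB + DREB,   # total rebounds
  win_pct = W / (W + L),   # share of games won
  oreb_sh = OREB / REB     # uses REB defined on the line above
)
teams2
```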


Group By

  • Returns a grouped data frame.
  • Does not change the data itself, but later verbs such as summarize then work on each group separately.
df_grouped <- group_by(df, Playoff)

Summarize

Returns a data frame that summarizes its input: one row per group, or a single row if the data is not grouped.

tbl <- summarize(df_grouped, avg_pts=mean(PTS))

Like mutate, it takes one or more name = expression arguments; each expression is evaluated once per group and should return a single value.
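group_by() and summarize() are usually used together. A sketch on six teams copied from the nba data (so the averages differ from what the full file gives); n() is dplyr's count of rows in the current group.

```r
library(dplyr)

# Points and playoff status for six teams, copied from the slide's nba data
teams <- data.frame(
  team    = c("ATL", "BOS", "GSW", "HOU", "PHX", "TOR"),
  Playoff = c("N", "Y", "Y", "Y", "N", "Y"),
  PTS     = c(8475L, 8529L, 9304L, 9213L, 8522L, 9156L)
)

by_playoff <- group_by(teams, Playoff)
group_vars(by_playoff)   # "Playoff": same rows, now tagged with groups

# One row per group
tbl <- summarize(by_playoff,
  n_teams = n(),         # rows in the group
  avg_pts = mean(PTS),
  max_pts = max(PTS)
)
tbl
```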


Arrange

  • Sorts the rows of a data frame.
  • The arguments are the columns to sort by.
  • Sorting is ascending by default; wrap a column in desc() (or, for a numeric column, put a minus sign in front of it) to sort in descending order.

Arrange

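  • Sort by Rank and show the first 5 rows. Rank is the rank within each conference, so values repeat (Houston and Toronto are both 1).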
sorted_df <- arrange(df, Rank)
head(sorted_df, 5)
## Team Playoff GP MIN PTS W L P2M P2A P2p P3M P3A
## 1 Houston Rockets Y 82 3951 9213 65 17 1918 3436 55.82072 1256 3470
## 2 Toronto Raptors Y 82 3966 9156 59 23 2415 4464 54.09946 968 2705
## 3 Boston Celtics Y 82 3961 8529 55 27 2202 4483 49.11889 939 2492
## 4 Golden State Warriors Y 82 3946 9304 58 24 2583 4611 56.01822 926 2369
## 5 Philadelphia 76ers Y 82 3956 9004 52 30 2448 4653 52.61122 901 2445
## P3p FTM FTA FTp OREB DREB AST TOV STL BLK PF PM team
## 1 36.19597 1609 2061 78.06890 739 2825 1767 1135 699 392 1597 695 HOU
## 2 35.78558 1422 1790 79.44134 800 2807 1995 1095 626 500 1783 638 TOR
## 3 37.68058 1308 1697 77.07720 767 2878 1842 1149 604 373 1618 294 BOS
## 4 39.08822 1360 1668 81.53477 691 2877 2402 1265 655 612 1607 490 GSW
## 5 36.85072 1405 1868 75.21413 893 2996 2221 1353 682 420 1811 369 PHI
## Conference Division Rank
## 1 W Southwest 1
## 2 E Atlantic 1
## 3 E Atlantic 2
## 4 W Pacific 2
## 5 E Atlantic 3
  • Additional arguments break ties: rows with equal values in the first column are ordered by the second, and so on.
  • Exercise: how would you print only the team names?

Count

  • count() returns the number of rows (observations) for each value of a variable.
  • With no extra arguments, it counts all the rows.
  • With one or more column names, it counts the rows for each combination of their values.
count(df)
## n
## 1 30
  • How many teams per division?
count(df, Division)
## Division n
## 1 Atlantic 5
## 2 Central 5
## 3 Northwest 5
## 4 Pacific 5
## 5 Southeast 5
## 6 Southwest 5
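count() also takes several columns, and it behaves like grouping followed by summarize(n = n()). A sketch on six teams copied from the nba data:

```r
library(dplyr)

# Conference and playoff status for six teams, copied from the slide's nba data
teams <- data.frame(
  team       = c("ATL", "BOS", "GSW", "HOU", "PHX", "TOR"),
  Conference = c("E", "E", "W", "W", "W", "E"),
  Playoff    = c("N", "Y", "Y", "Y", "N", "Y")
)

# Two variables: one row per combination that occurs
by_conf_playoff <- count(teams, Conference, Playoff)
by_conf_playoff

# The same counts spelled out with group_by() and summarize()
same_thing <- summarize(group_by(teams, Conference, Playoff), n = n(), .groups = "drop")
```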

Remember object types

  • Different functions expect different types of objects.

  • df is a data.frame

  • A data frame is a list of equal-length vectors, one per column.
  • Different columns can hold different types; glimpse() shows them as <chr>, <int>, <dbl> and <fct>.
glimpse(df)
## Rows: 30
## Columns: 28
## $ Team <chr> "Atlanta Hawks", "Boston Celtics", "Brooklyn Nets", "Charlo…
## $ Playoff <fct> N, Y, N, N, N, Y, N, N, N, Y, Y, Y, N, N, N, Y, Y, Y, Y, N,…
## $ GP <int> 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82,…
## $ MIN <int> 3941, 3961, 3971, 3956, 3971, 3946, 3961, 3976, 3961, 3946,…
## $ PTS <int> 8475, 8529, 8741, 8874, 8440, 9091, 8390, 9020, 8509, 9304,…
## $ W <int> 24, 55, 28, 36, 27, 50, 24, 46, 39, 58, 65, 48, 42, 35, 22,…
## $ L <int> 58, 27, 54, 46, 55, 32, 58, 36, 43, 24, 17, 34, 40, 47, 60,…
## $ P2M <int> 2213, 2202, 2095, 2373, 2264, 2330, 2161, 2398, 2322, 2583,…
## $ P2A <int> 4471, 4483, 4190, 4873, 4736, 4314, 4354, 4566, 4756, 4611,…
## $ P2p <dbl> 49.49676, 49.11889, 50.00000, 48.69690, 47.80405, 54.01020,…
## $ P3M <int> 917, 939, 1041, 824, 906, 981, 967, 940, 886, 926, 1256, 74…
## $ P3A <int> 2544, 2492, 2924, 2233, 2549, 2636, 2688, 2536, 2373, 2369,…
## $ P3p <dbl> 36.04560, 37.68058, 35.60192, 36.90103, 35.54335, 37.21548,…
## $ FTM <int> 1298, 1308, 1428, 1656, 1194, 1488, 1167, 1404, 1207, 1360,…
## $ FTA <int> 1654, 1697, 1850, 2216, 1574, 1909, 1530, 1830, 1621, 1668,…
## $ FTp <dbl> 78.47642, 77.07720, 77.18919, 74.72924, 75.85769, 77.94657,…
## $ OREB <int> 743, 767, 792, 827, 790, 694, 666, 902, 830, 691, 739, 788,…
## $ DREB <int> 2693, 2878, 2852, 2901, 2873, 2761, 2717, 2748, 2756, 2877,…
## $ AST <int> 1946, 1842, 1941, 1770, 1923, 1916, 1858, 2059, 1868, 2402,…
## $ TOV <int> 1276, 1149, 1245, 1041, 1147, 1126, 1007, 1227, 1103, 1265,…
## $ STL <int> 638, 604, 512, 559, 626, 582, 578, 627, 628, 655, 699, 721,…
## $ BLK <int> 348, 373, 390, 373, 289, 312, 310, 404, 317, 612, 392, 340,…
## $ PF <int> 1606, 1618, 1688, 1409, 1571, 1524, 1578, 1533, 1508, 1607,…
## $ PM <int> -447, 294, -307, 21, -577, 77, -249, 121, -12, 490, 695, 11…
## $ team <fct> ATL, BOS, BKN, CHA, CHI, CLE, DAL, DEN, DET, GSW, HOU, IND,…
## $ Conference <fct> E, E, E, E, E, E, W, W, E, W, W, E, W, W, W, E, E, W, W, E,…
## $ Division <fct> Southeast, Atlantic, Atlantic, Southeast, Central, Central,…
## $ Rank <int> 15, 2, 12, 10, 13, 4, 13, 9, 9, 2, 1, 5, 10, 11, 14, 6, 7, …

There are several ways to extract a column (a vector) from a data frame. One is the $ operator:

mean(df$PTS)
## [1] 8719.333
  • dplyr verbs let you refer to columns by name directly, without writing df$ each time.
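A sketch of these points on three teams copied from the nba data: the data frame is a list of column vectors, and $, [[ ]] and dplyr's pull() all return the same vector, whereas select() returns a data frame.

```r
library(dplyr)

# Points for three teams, copied from the slide's nba data
teams <- data.frame(
  team = c("ATL", "BOS", "HOU"),
  PTS  = c(8475L, 8529L, 9213L)
)

# A data frame is a list of equal-length vectors, one per column
is.list(teams)            # TRUE
sapply(teams, class)      # class of each column: "character", "integer"

# Three ways to get the PTS column as a plain vector
a <- teams$PTS            # $ with the bare column name
b <- teams[["PTS"]]       # [[ ]] with the name as a string
v <- pull(teams, PTS)     # dplyr's pull()

mean(a)                   # 8739

# select() differs: it returns a one-column data frame, not a vector
one_col <- select(teams, PTS)
```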

Mutate

df_with_mean <- mutate(df, mean_pts=mean(PTS))
  • Think about data types: mean(PTS) is a single number, so mutate recycles it and every row gets the same mean_pts value.
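A sketch of what happens on four teams copied from the nba data: mutate() recycles a length-one result to every row, summarize() collapses to one row, and after group_by() the mean is computed per group. pts_vs_mean and conf_mean_pts are illustrative names.

```r
library(dplyr)

# Points and conference for four teams, copied from the slide's nba data
teams <- data.frame(
  team       = c("ATL", "BOS", "GSW", "HOU"),
  Conference = c("E", "E", "W", "W"),
  PTS        = c(8475L, 8529L, 9304L, 9213L)
)

# mean(PTS) is one number; mutate() repeats it on every row
with_mean <- mutate(teams, mean_pts = mean(PTS), pts_vs_mean = PTS - mean_pts)

# summarize() instead returns just that one number, in a one-row data frame
just_mean <- summarize(teams, mean_pts = mean(PTS))

# After group_by(), the mean is computed per conference and repeated within each group
conf_mean <- mutate(group_by(teams, Conference), conf_mean_pts = mean(PTS))
```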