@@ -36,10 +36,10 @@ So far, you have seen the basics of manipulating data frames with our nordic dat
3636
3737::::::::::::::::::::::::::::::::::::::::: instructor
3838
39- Pay attention to and explain the errors and warnings generated from the
39+ Pay attention to and explain the errors and warnings generated from the
4040examples in this episode.
4141
42- :::::::::::::::::::::::::::::::::::::::::
42+ :::::::::::::::::::::::::::::::::::::::::
4343
4444
4545``` r
@@ -77,7 +77,7 @@ gapminder <- read.csv("https://datacarpentry.org/r-intro-geospatial/data/gapmind
7777
7878- You can read directly from Excel spreadsheets without
7979 converting them to plain text first by using the [readxl](https://cran.r-project.org/package=readxl) package.
80-
80+
8181
8282::::::::::::::::::::::::::::::::::::::::::::::::::
8383
@@ -99,7 +99,8 @@ str(gapminder)
9999 $ gdpPercap: num 779 821 853 836 740 ...
100100```
101101
102- We can also examine individual columns of the data frame with our ` class ` function:
102+ We can also examine individual columns of the data frame with the `class` or
103+ `typeof` functions:
103104
104105
105106``` r
@@ -110,6 +111,14 @@ class(gapminder$year)
110111[1] "integer"
111112```
112113
114+ ``` r
115+ typeof(gapminder$year)
116+ ```
117+
118+ ``` {.output}
119+ [1] "integer"
120+ ```
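
For an integer column, `class` and `typeof` report the same thing, but they can differ for other objects: `class` describes how R treats an object, while `typeof` describes how it is stored. The sketch below uses a small made-up data frame (the name `toy` and its values are assumptions for illustration, not part of the gapminder data):

```r
# Made-up toy values for illustration; not taken from the gapminder data.
years <- c(1952L, 1957L)
life_exp <- c(70.5, 71.2)
toy <- data.frame(year = years, lifeExp = life_exp)

class(years)      # "integer"
typeof(years)     # "integer"
class(life_exp)   # "numeric"
typeof(life_exp)  # "double"
class(toy)        # "data.frame"
typeof(toy)       # "list"
```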
121+
113122``` r
114123class(gapminder$country)
115124```
@@ -424,6 +433,104 @@ tail(gapminder_norway)
424433
425434To understand why R is giving us a warning when we try to add this row, let's learn a little more about factors.
426435
436+
437+ ## Removing columns and rows in data frames
438+
439+ To remove columns from a data frame, we can use the `subset` function.
440+ This function allows us to remove columns using their names:
441+
442+
443+ ``` r
444+ life_expectancy <- subset(gapminder, select = -c(continent, pop, gdpPercap))
445+ head(life_expectancy)
446+ ```
447+
448+ ``` {.output}
449+ country year lifeExp below_average
450+ 1 Afghanistan 1952 28.801 TRUE
451+ 2 Afghanistan 1957 30.332 TRUE
452+ 3 Afghanistan 1962 31.997 TRUE
453+ 4 Afghanistan 1967 34.020 TRUE
454+ 5 Afghanistan 1972 36.088 TRUE
455+ 6 Afghanistan 1977 38.438 TRUE
456+ ```
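
When the names of the columns to drop are stored in a character vector, one possible alternative is to compare them against `names()`. This is a minimal sketch on a made-up data frame; `toy` and `drop_cols` are hypothetical names and the values are invented for illustration:

```r
# Made-up toy data; column names mirror gapminder, values are invented.
toy <- data.frame(
  country = c("A", "B"),
  year    = c(1952L, 1957L),
  pop     = c(100, 200),
  lifeExp = c(50, 60)
)

# Names of the columns we want to remove
drop_cols <- c("pop")

# Keep every column whose name is not listed in drop_cols
kept <- toy[!(names(toy) %in% drop_cols)]
names(kept)
```

Because the comparison is done on names, this approach keeps working if the column order changes.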
457+
458+ We can also use a logical vector to achieve the same result. Make sure the
459+ vector's length matches the number of columns in the data frame (to avoid vector
460+ recycling):
461+
462+
463+ ``` r
464+ life_expectancy <- gapminder[c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)]
465+ head(life_expectancy)
466+ ```
467+
468+ ``` {.output}
469+ country year lifeExp below_average
470+ 1 Afghanistan 1952 28.801 TRUE
471+ 2 Afghanistan 1957 30.332 TRUE
472+ 3 Afghanistan 1962 31.997 TRUE
473+ 4 Afghanistan 1967 34.020 TRUE
474+ 5 Afghanistan 1972 36.088 TRUE
475+ 6 Afghanistan 1977 38.438 TRUE
476+ ```
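
To see why the length of the logical vector matters, here is a small sketch of recycling on a made-up four-column data frame (`toy` and `keep` are hypothetical names used only for illustration). A vector shorter than the number of columns is silently repeated:

```r
# Made-up toy data frame with four columns
toy <- data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)

# Only two values for four columns: R recycles them to
# TRUE, FALSE, TRUE, FALSE without any warning
recycled <- toy[c(TRUE, FALSE)]
names(recycled)

# A defensive check before subsetting avoids this surprise
keep <- c(TRUE, FALSE, FALSE, TRUE)
stopifnot(length(keep) == ncol(toy))
checked <- toy[keep]
names(checked)
```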
477+
478+ Alternatively, we can use column positions:
479+
480+
481+ ``` r
482+ life_expectancy <- gapminder[-c(3, 4, 6)]
483+ head(life_expectancy)
484+ ```
485+
486+ ``` {.output}
487+ country year lifeExp below_average
488+ 1 Afghanistan 1952 28.801 TRUE
489+ 2 Afghanistan 1957 30.332 TRUE
490+ 3 Afghanistan 1962 31.997 TRUE
491+ 4 Afghanistan 1967 34.020 TRUE
492+ 5 Afghanistan 1972 36.088 TRUE
493+ 6 Afghanistan 1977 38.438 TRUE
494+ ```
495+
496+ Note that the easiest way to remove rows from a data frame is usually to select
497+ the rows we want to keep instead.
498+ Still, we can remove rows by their positions using negative indices:
499+
500+
501+ ``` r
502+ # Filter data for Afghanistan after 2000 (the 21st century):
503+ afghanistan_20c <- gapminder[gapminder$country == "Afghanistan" &
504+                              gapminder$year > 2000, ]
505+
506+ # Now remove data for 2002, that is, the first row:
507+ afghanistan_20c[-1, ]
508+ ```
509+
510+ ``` {.output}
511+ country year pop continent lifeExp gdpPercap below_average
512+ 12 Afghanistan 2007 31889923 Asia 43.828 974.5803 TRUE
513+ ```
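
Selecting the rows to keep with a logical condition is often safer than removing rows by position. The sketch below uses a made-up data frame (`toy` and its values are assumptions for illustration) and also shows a pitfall of combining `-` with `which()` when nothing matches:

```r
# Made-up toy data; values are invented for illustration.
toy <- data.frame(year = c(2002L, 2007L), lifeExp = c(40, 45))

# Keep the rows we want instead of removing by position
kept <- toy[toy$year != 2002, ]

# Pitfall: no row has year 1900, so which() returns integer(0),
# and negating it selects zero rows instead of all rows
pitfall <- toy[-which(toy$year == 1900), ]

# Negating the logical condition keeps every row, as intended
safe <- toy[!(toy$year == 1900), ]

nrow(kept)
nrow(pitfall)
nrow(safe)
```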
514+
515+
516+ An interesting case is removing rows containing NAs:
517+
518+
519+ ``` r
520+ # Turn some values into NAs:
521+ afghanistan_20c <- gapminder[gapminder$country == "Afghanistan", ]
522+ afghanistan_20c[afghanistan_20c$year < 2007, "year"] <- NA
523+
524+ # Remove NAs
525+ na.omit(afghanistan_20c)
526+ ```
527+
528+ ``` {.output}
529+ country year pop continent lifeExp gdpPercap below_average
530+ 12 Afghanistan 2007 31889923 Asia 43.828 974.5803 TRUE
531+ ```
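
Keep in mind that `na.omit` drops a row if any of its columns contains an `NA`. If only one column matters, filtering with `is.na` on that column may keep more data. A minimal sketch on a made-up data frame (`toy` and its values are assumptions for illustration):

```r
# Made-up toy data with NAs in different columns
toy <- data.frame(year = c(2002L, NA, 2007L), pop = c(10, 20, NA))

# Drops every row that has an NA in any column
all_complete <- na.omit(toy)

# Drops only the rows where year is NA
year_known <- toy[!is.na(toy$year), ]

# complete.cases() gives the logical vector behind the first approach
complete_rows <- complete.cases(toy)

nrow(all_complete)
nrow(year_known)
complete_rows
```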
532+
533+
427534## Factors
428535
429536Here is another thing to look out for: in a `factor`, each different value