Original source: https://r4ds.had.co.nz/
library(tidyverse)
-- Attaching packages --------------------------------------- tidyverse 1.2.1 --
√ ggplot2 3.1.0 √ purrr 0.3.1
√ tibble 2.0.1 √ dplyr 0.8.0.1
√ tidyr 0.8.3 √ stringr 1.4.0
√ readr 1.3.1 √ forcats 0.4.0
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
Creating tibbles
Converting a regular data frame to a tibble
as_tibble(iris)
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | setosa |
4.9 | 3.0 | 1.4 | 0.2 | setosa |
4.7 | 3.2 | 1.3 | 0.2 | setosa |
4.6 | 3.1 | 1.5 | 0.2 | setosa |
5.0 | 3.6 | 1.4 | 0.2 | setosa |
5.4 | 3.9 | 1.7 | 0.4 | setosa |
4.6 | 3.4 | 1.4 | 0.3 | setosa |
5.0 | 3.4 | 1.5 | 0.2 | setosa |
4.4 | 2.9 | 1.4 | 0.2 | setosa |
4.9 | 3.1 | 1.5 | 0.1 | setosa |
5.4 | 3.7 | 1.5 | 0.2 | setosa |
4.8 | 3.4 | 1.6 | 0.2 | setosa |
4.8 | 3.0 | 1.4 | 0.1 | setosa |
4.3 | 3.0 | 1.1 | 0.1 | setosa |
5.8 | 4.0 | 1.2 | 0.2 | setosa |
5.7 | 4.4 | 1.5 | 0.4 | setosa |
5.4 | 3.9 | 1.3 | 0.4 | setosa |
5.1 | 3.5 | 1.4 | 0.3 | setosa |
5.7 | 3.8 | 1.7 | 0.3 | setosa |
5.1 | 3.8 | 1.5 | 0.3 | setosa |
5.4 | 3.4 | 1.7 | 0.2 | setosa |
5.1 | 3.7 | 1.5 | 0.4 | setosa |
4.6 | 3.6 | 1.0 | 0.2 | setosa |
5.1 | 3.3 | 1.7 | 0.5 | setosa |
4.8 | 3.4 | 1.9 | 0.2 | setosa |
5.0 | 3.0 | 1.6 | 0.2 | setosa |
5.0 | 3.4 | 1.6 | 0.4 | setosa |
5.2 | 3.5 | 1.5 | 0.2 | setosa |
5.2 | 3.4 | 1.4 | 0.2 | setosa |
4.7 | 3.2 | 1.6 | 0.2 | setosa |
... | ... | ... | ... | ... |
6.9 | 3.2 | 5.7 | 2.3 | virginica |
5.6 | 2.8 | 4.9 | 2.0 | virginica |
7.7 | 2.8 | 6.7 | 2.0 | virginica |
6.3 | 2.7 | 4.9 | 1.8 | virginica |
6.7 | 3.3 | 5.7 | 2.1 | virginica |
7.2 | 3.2 | 6.0 | 1.8 | virginica |
6.2 | 2.8 | 4.8 | 1.8 | virginica |
6.1 | 3.0 | 4.9 | 1.8 | virginica |
6.4 | 2.8 | 5.6 | 2.1 | virginica |
7.2 | 3.0 | 5.8 | 1.6 | virginica |
7.4 | 2.8 | 6.1 | 1.9 | virginica |
7.9 | 3.8 | 6.4 | 2.0 | virginica |
6.4 | 2.8 | 5.6 | 2.2 | virginica |
6.3 | 2.8 | 5.1 | 1.5 | virginica |
6.1 | 2.6 | 5.6 | 1.4 | virginica |
7.7 | 3.0 | 6.1 | 2.3 | virginica |
6.3 | 3.4 | 5.6 | 2.4 | virginica |
6.4 | 3.1 | 5.5 | 1.8 | virginica |
6.0 | 3.0 | 4.8 | 1.8 | virginica |
6.9 | 3.1 | 5.4 | 2.1 | virginica |
6.7 | 3.1 | 5.6 | 2.4 | virginica |
6.9 | 3.1 | 5.1 | 2.3 | virginica |
5.8 | 2.7 | 5.1 | 1.9 | virginica |
6.8 | 3.2 | 5.9 | 2.3 | virginica |
6.7 | 3.3 | 5.7 | 2.5 | virginica |
6.7 | 3.0 | 5.2 | 2.3 | virginica |
6.3 | 2.5 | 5.0 | 1.9 | virginica |
6.5 | 3.0 | 5.2 | 2.0 | virginica |
6.2 | 3.4 | 5.4 | 2.3 | virginica |
5.9 | 3.0 | 5.1 | 1.8 | virginica |
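Creating a tibble from vectors
tibble(
x = 1:5,
y = 1,  # length-1 inputs are recycled to the length of the other columns
z = x^2 + y  # later columns can refer to columns defined earlier
)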
x | y | z |
---|---|---|
1 | 1 | 2 |
2 | 1 | 5 |
3 | 1 | 10 |
4 | 1 | 17 |
5 | 1 | 26 |
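tibble() never changes the type of its inputs (for example, it never converts strings to factors!), never changes the names of variables, and never creates row names.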
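The contrast with base R can be sketched as follows. This is only an illustration, not part of the original notes: the column name `my col` is made up, and stringsAsFactors = TRUE is passed explicitly because it was the default for data.frame() only before R 4.0.

```r
library(tibble)

# Base R: with stringsAsFactors = TRUE strings become factors, and
# check.names = TRUE (the default) rewrites the non-syntactic name.
df <- data.frame(`my col` = c("a", "b"), stringsAsFactors = TRUE)
names(df)           # "my.col"
class(df$my.col)    # "factor"

# tibble(): both the name and the character type are left untouched.
tb_chr <- tibble(`my col` = c("a", "b"))
names(tb_chr)           # "my col"
class(tb_chr$`my col`)  # "character"
```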
Creating a tibble with non-syntactic names
To refer to these variables, you need to surround them with backticks (`).
tb <- tibble(
`:)` = "smile",
` ` = "space",
`2000` = "number"
)
tb
`:)` | ` ` | `2000` |
---|---|---|
smile | space | number |
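A small sketch of what referring to these columns looks like in practice. The tibble is rebuilt here so the snippet runs on its own. Backticks are needed after $, while [[ takes the name as an ordinary string.

```r
library(tibble)

tb <- tibble(
  `:)` = "smile",
  ` ` = "space",
  `2000` = "number"
)

names(tb)      # the names are stored exactly as typed
tb$`:)`        # backticks are required after $
tb$`2000`
tb[["2000"]]   # [[ takes a plain string, so no backticks are needed
tb[[" "]]
```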
Creating a tibble with tribble()
This is mainly intended for small amounts of data entered directly in code.
tribble(
~x, ~y, ~z,
"a", 2, 3.6,
"b", 1, 8.5
)
x | y | z |
---|---|---|
a | 2 | 3.6 |
b | 1 | 8.5 |
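For comparison, the same data can be written column by column with tibble(). The sketch below builds both versions and checks that the columns match. The names by_row and by_col are made up for this illustration.

```r
library(tibble)

# Row-wise entry: column names are formulas (~x), values follow row by row.
by_row <- tribble(
  ~x,  ~y, ~z,
  "a", 2,  3.6,
  "b", 1,  8.5
)

# The same data, entered column-wise.
by_col <- tibble(
  x = c("a", "b"),
  y = c(2, 1),
  z = c(3.6, 8.5)
)

# Compare the two column by column.
identical(by_row$x, by_col$x)
identical(by_row$y, by_col$y)
identical(by_row$z, by_col$z)
```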
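Differences between tibbles and data.frame
Printing
Tibbles have a refined print method that shows only the first 10 rows and all the columns that fit on the screen. This makes it much easier to work with large data. (The tables on this page appear to have been rendered by a notebook's own table display rather than the console print method, which is why more than 10 rows are visible.)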
tibble(
a = lubridate::now() + runif(1e3) * 86400,
b = lubridate::today() + runif(1e3) * 30,
c = 1:1e3,
d = runif(1e3),
e = sample(letters, 1e3, replace = TRUE)
)
a | b | c | d | e |
---|---|---|---|---|
2019-03-08 12:03:37 | 2019-03-27 | 1 | 0.9350711 | t |
2019-03-07 19:03:27 | 2019-03-19 | 2 | 0.7748978 | i |
2019-03-08 00:53:08 | 2019-03-30 | 3 | 0.5425871 | e |
2019-03-08 06:48:58 | 2019-03-22 | 4 | 0.3761034 | p |
2019-03-08 01:30:32 | 2019-03-11 | 5 | 0.7802678 | o |
2019-03-07 21:41:48 | 2019-03-13 | 6 | 0.7276809 | x |
2019-03-07 18:32:13 | 2019-03-29 | 7 | 0.1885623 | f |
2019-03-08 01:07:36 | 2019-03-14 | 8 | 0.5235737 | v |
2019-03-08 02:04:10 | 2019-04-03 | 9 | 0.7619842 | u |
2019-03-07 17:31:23 | 2019-03-08 | 10 | 0.9297701 | j |
2019-03-07 18:17:07 | 2019-03-26 | 11 | 0.8558517 | c |
2019-03-08 02:04:13 | 2019-03-23 | 12 | 0.8245066 | r |
2019-03-07 19:58:33 | 2019-04-02 | 13 | 0.7309103 | l |
2019-03-08 06:27:35 | 2019-03-19 | 14 | 0.3011315 | d |
2019-03-08 05:20:41 | 2019-04-04 | 15 | 0.8075490 | y |
2019-03-08 04:07:34 | 2019-03-26 | 16 | 0.2278415 | k |
2019-03-07 22:02:45 | 2019-03-30 | 17 | 0.4392638 | v |
2019-03-07 19:18:02 | 2019-03-18 | 18 | 0.5410472 | i |
2019-03-07 23:08:28 | 2019-04-04 | 19 | 0.4924298 | v |
2019-03-08 14:16:50 | 2019-03-29 | 20 | 0.8851770 | x |
2019-03-07 16:55:58 | 2019-03-28 | 21 | 0.1958053 | l |
2019-03-08 02:40:52 | 2019-03-15 | 22 | 0.9258686 | f |
2019-03-08 13:28:23 | 2019-03-25 | 23 | 0.6027402 | u |
2019-03-08 07:36:26 | 2019-03-28 | 24 | 0.5473824 | b |
2019-03-07 17:36:36 | 2019-03-19 | 25 | 0.6461237 | l |
2019-03-08 06:56:02 | 2019-03-29 | 26 | 0.9940611 | h |
2019-03-07 18:51:23 | 2019-03-21 | 27 | 0.4721487 | o |
2019-03-07 20:43:13 | 2019-03-13 | 28 | 0.4646166 | a |
2019-03-07 19:23:50 | 2019-03-31 | 29 | 0.2749480 | e |
2019-03-08 09:02:11 | 2019-04-05 | 30 | 0.2162153 | h |
... | ... | ... | ... | ... |
2019-03-08 15:14:21 | 2019-04-05 | 971 | 0.27085010 | k |
2019-03-08 05:46:30 | 2019-04-05 | 972 | 0.43354312 | r |
2019-03-08 06:08:11 | 2019-03-16 | 973 | 0.56341744 | s |
2019-03-08 00:09:36 | 2019-04-03 | 974 | 0.35070489 | z |
2019-03-07 19:12:50 | 2019-04-02 | 975 | 0.72674730 | w |
2019-03-08 04:20:00 | 2019-03-09 | 976 | 0.21156357 | h |
2019-03-07 21:10:20 | 2019-03-13 | 977 | 0.30344723 | h |
2019-03-07 17:38:27 | 2019-03-20 | 978 | 0.71715861 | y |
2019-03-08 01:58:09 | 2019-03-15 | 979 | 0.55302497 | c |
2019-03-08 03:59:16 | 2019-03-29 | 980 | 0.69767831 | r |
2019-03-08 06:18:01 | 2019-03-27 | 981 | 0.19649741 | s |
2019-03-07 17:50:31 | 2019-03-17 | 982 | 0.43877830 | t |
2019-03-08 02:28:38 | 2019-03-10 | 983 | 0.25183157 | t |
2019-03-07 19:13:57 | 2019-03-26 | 984 | 0.82745938 | e |
2019-03-07 22:39:07 | 2019-03-11 | 985 | 0.29956544 | o |
2019-03-08 03:20:20 | 2019-03-25 | 986 | 0.17728177 | w |
2019-03-07 16:14:48 | 2019-03-24 | 987 | 0.97755326 | w |
2019-03-07 23:04:04 | 2019-04-01 | 988 | 0.27542162 | h |
2019-03-08 14:39:32 | 2019-03-14 | 989 | 0.59929473 | p |
2019-03-07 16:22:22 | 2019-03-20 | 990 | 0.30857316 | a |
2019-03-08 00:23:25 | 2019-03-09 | 991 | 0.04862165 | g |
2019-03-08 12:24:14 | 2019-03-07 | 992 | 0.18062590 | o |
2019-03-08 12:26:09 | 2019-03-29 | 993 | 0.13784189 | l |
2019-03-07 16:53:37 | 2019-03-14 | 994 | 0.48948494 | v |
2019-03-08 05:14:55 | 2019-03-26 | 995 | 0.24617007 | v |
2019-03-07 22:57:17 | 2019-04-03 | 996 | 0.03077302 | p |
2019-03-08 08:03:03 | 2019-03-23 | 997 | 0.79776461 | t |
2019-03-08 00:22:03 | 2019-03-31 | 998 | 0.28776542 | o |
2019-03-08 06:38:52 | 2019-03-12 | 999 | 0.62346939 | w |
2019-03-07 22:21:26 | 2019-03-25 | 1000 | 0.23807852 | a |
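Controlling the number of rows and columns printed
You can explicitly print() the data frame and control the number of rows (n) and the width of the display. width = Inf shows all columns.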
nycflights13::flights %>%
print(n = 10, width = Inf)
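You can also control the default print behaviour by setting options:
- options(tibble.print_max = n, tibble.print_min = m): if there are more than n rows, print only m rows.
- Use options(tibble.print_min = Inf) to always show all rows.
- Use options(tibble.width = Inf) to always print all columns, regardless of the width of the screen.
You can see the complete list of options in the package help with package?tibble.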
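A minimal sketch of both approaches. It uses the built-in mtcars data (32 rows) instead of nycflights13 so that it runs without extra packages. The values 20 and 5 are arbitrary, and the previous option values are saved and restored at the end.

```r
library(tibble)

cars_tbl <- as_tibble(mtcars)  # 32 rows, 11 columns

# One-off control: print 3 rows and every column.
print(cars_tbl, n = 3, width = Inf)

# Change the defaults: if there are more than 20 rows, print only 5.
# options() returns the previous values, so they can be restored later.
old <- options(tibble.print_max = 20, tibble.print_min = 5)
print(cars_tbl)

# Restore the previous settings.
options(old)
```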
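A final option is to use RStudio's built-in data viewer to get a scrollable view of the complete dataset. This is also often useful at the end of a long chain of manipulations.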
nycflights13::flights %>%
View()
Extracting variables from a tibble
df <- tibble(
x = runif(5),
y = rnorm(5)
)
df
x | y |
---|---|
0.05800016 | -0.4546187 |
0.20734257 | -0.7548791 |
0.15471524 | -1.5633772 |
0.40664643 | -1.0151103 |
0.63954919 | -0.9263165 |
df$x
- 0.0580001603811979
- 0.207342570181936
- 0.154715242562816
- 0.406646434217691
- 0.639549192739651
df[['x']]
- 0.0580001603811979
- 0.207342570181936
- 0.154715242562816
- 0.406646434217691
- 0.639549192739651
df[[1]]
- 0.0580001603811979
- 0.207342570181936
- 0.154715242562816
- 0.406646434217691
- 0.639549192739651
# Use them in a pipe via the . placeholder
df %>% .$x
- 0.0580001603811979
- 0.207342570181936
- 0.154715242562816
- 0.406646434217691
- 0.639549192739651
df %>% .[["x"]]
- 0.0580001603811979
- 0.207342570181936
- 0.154715242562816
- 0.406646434217691
- 0.639549192739651
Some older functions do not work with tibbles. If you need one of them, first convert the tibble to a data.frame with as.data.frame():
class(as.data.frame(tb))
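One commonly cited reason for such incompatibilities is the [ function: on a data.frame it can return either a data frame or a vector, whereas on a tibble it always returns a tibble. A short sketch with a made-up tibble (named tb2 here so that it does not overwrite the tb defined earlier):

```r
library(tibble)

tb2 <- tibble(x = 1:3, y = c("a", "b", "c"))
df2 <- as.data.frame(tb2)

class(df2)                 # "data.frame"

# Selecting a single column with [ behaves differently:
is_tibble(tb2[, "x"])      # TRUE: still a one-column tibble
is.data.frame(df2[, "x"])  # FALSE: simplified to a plain vector

# Convert back to a tibble once the older function has been called.
tb3 <- as_tibble(df2)
```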