Learning from R4DS (1. Tibbles)


Original site: https://r4ds.had.co.nz/

library(tidyverse)
-- Attaching packages --------------------------------------- tidyverse 1.2.1 --
√ ggplot2 3.1.0       √ purrr   0.3.1  
√ tibble  2.0.1       √ dplyr   0.8.0.1
√ tidyr   0.8.3       √ stringr 1.4.0  
√ readr   1.3.1       √ forcats 0.4.0  
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()

Creating tibbles

Converting a regular data frame to a tibble
as_tibble(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
4.6 3.4 1.4 0.3 setosa
5.0 3.4 1.5 0.2 setosa
4.4 2.9 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa
5.4 3.7 1.5 0.2 setosa
4.8 3.4 1.6 0.2 setosa
4.8 3.0 1.4 0.1 setosa
4.3 3.0 1.1 0.1 setosa
5.8 4.0 1.2 0.2 setosa
5.7 4.4 1.5 0.4 setosa
5.4 3.9 1.3 0.4 setosa
5.1 3.5 1.4 0.3 setosa
5.7 3.8 1.7 0.3 setosa
5.1 3.8 1.5 0.3 setosa
5.4 3.4 1.7 0.2 setosa
5.1 3.7 1.5 0.4 setosa
4.6 3.6 1.0 0.2 setosa
5.1 3.3 1.7 0.5 setosa
4.8 3.4 1.9 0.2 setosa
5.0 3.0 1.6 0.2 setosa
5.0 3.4 1.6 0.4 setosa
5.2 3.5 1.5 0.2 setosa
5.2 3.4 1.4 0.2 setosa
4.7 3.2 1.6 0.2 setosa
... ... ... ... ...
6.9 3.2 5.7 2.3 virginica
5.6 2.8 4.9 2.0 virginica
7.7 2.8 6.7 2.0 virginica
6.3 2.7 4.9 1.8 virginica
6.7 3.3 5.7 2.1 virginica
7.2 3.2 6.0 1.8 virginica
6.2 2.8 4.8 1.8 virginica
6.1 3.0 4.9 1.8 virginica
6.4 2.8 5.6 2.1 virginica
7.2 3.0 5.8 1.6 virginica
7.4 2.8 6.1 1.9 virginica
7.9 3.8 6.4 2.0 virginica
6.4 2.8 5.6 2.2 virginica
6.3 2.8 5.1 1.5 virginica
6.1 2.6 5.6 1.4 virginica
7.7 3.0 6.1 2.3 virginica
6.3 3.4 5.6 2.4 virginica
6.4 3.1 5.5 1.8 virginica
6.0 3.0 4.8 1.8 virginica
6.9 3.1 5.4 2.1 virginica
6.7 3.1 5.6 2.4 virginica
6.9 3.1 5.1 2.3 virginica
5.8 2.7 5.1 1.9 virginica
6.8 3.2 5.9 2.3 virginica
6.7 3.3 5.7 2.5 virginica
6.7 3.0 5.2 2.3 virginica
6.3 2.5 5.0 1.9 virginica
6.5 3.0 5.2 2.0 virginica
6.2 3.4 5.4 2.3 virginica
5.9 3.0 5.1 1.8 virginica
Creating a tibble from vectors
tibble(
  x = 1:5,
  y = 1,
  z = x^2 + y
)
x y z
1 1 2
2 1 5
3 1 10
4 1 17
5 1 26

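A tibble never changes the type of its inputs (for example, it never converts strings to factors!), it never changes the names of variables, and it never creates row names.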
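The following is a minimal sketch of those three guarantees, compared against base data.frame(). The object names tb_demo and df_demo and the column name `my var` are made up for illustration; has_rownames() is a helper from the tibble package.

```r
library(tibble)

# Hypothetical example data: one character column with a space in its name.
tb_demo <- tibble(`my var` = c("a", "b"))
df_demo <- data.frame(`my var` = c("a", "b"))

class(tb_demo$`my var`)  # "character": the input type is kept
names(tb_demo)           # "my var": the name is kept as written
names(df_demo)           # "my.var": data.frame() repairs the name by default

# Converting a data frame that has row names drops them by default.
cars_tb <- as_tibble(mtcars)
has_rownames(cars_tb)    # FALSE
```

If the row names carry information, as_tibble() accepts a rownames argument that stores them in a regular column instead.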

Creating a tibble with non-syntactic names

To refer to these variables, you need to surround them with backticks (`).

tb <- tibble(
  `:)` = "smile", 
  ` ` = "space",
  `2000` = "number"
)
tb
:) 2000
smile space number
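As a small sketch of how such columns can be read back, the snippet below rebuilds the same tibble under a different, made-up name (odd_names) and pulls out each column. Backticks are needed with $, while [[ takes the name as an ordinary string.

```r
library(tibble)

# Same columns as above, under a hypothetical object name.
odd_names <- tibble(
  `:)` = "smile",
  ` ` = "space",
  `2000` = "number"
)

smile  <- odd_names$`:)`       # backticks are required with $
space  <- odd_names[[" "]]     # [[ accepts the name as a plain string
number <- odd_names[["2000"]]  # a string, not the number 2000
```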
Creating a tibble with tribble()

tribble() is mainly intended for entering small amounts of data by hand.

tribble(
  ~x, ~y, ~z,
  "a", 2, 3.6,
  "b", 1, 8.5
)
x y z
a 2 3.6
b 1 8.5

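Differences between tibbles and data.frame

Printing

Tibbles have a refined print method that shows only the first 10 rows and all the columns that fit on screen. This makes it much easier to work with large data.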

tibble(
  a = lubridate::now() + runif(1e3) * 86400,
  b = lubridate::today() + runif(1e3) * 30,
  c = 1:1e3,
  d = runif(1e3),
  e = sample(letters, 1e3, replace = TRUE)
)
a b c d e
2019-03-08 12:03:37 2019-03-27 1 0.9350711 t
2019-03-07 19:03:27 2019-03-19 2 0.7748978 i
2019-03-08 00:53:08 2019-03-30 3 0.5425871 e
2019-03-08 06:48:58 2019-03-22 4 0.3761034 p
2019-03-08 01:30:32 2019-03-11 5 0.7802678 o
2019-03-07 21:41:48 2019-03-13 6 0.7276809 x
2019-03-07 18:32:13 2019-03-29 7 0.1885623 f
2019-03-08 01:07:36 2019-03-14 8 0.5235737 v
2019-03-08 02:04:10 2019-04-03 9 0.7619842 u
2019-03-07 17:31:23 2019-03-08 10 0.9297701 j
2019-03-07 18:17:07 2019-03-26 11 0.8558517 c
2019-03-08 02:04:13 2019-03-23 12 0.8245066 r
2019-03-07 19:58:33 2019-04-02 13 0.7309103 l
2019-03-08 06:27:35 2019-03-19 14 0.3011315 d
2019-03-08 05:20:41 2019-04-04 15 0.8075490 y
2019-03-08 04:07:34 2019-03-26 16 0.2278415 k
2019-03-07 22:02:45 2019-03-30 17 0.4392638 v
2019-03-07 19:18:02 2019-03-18 18 0.5410472 i
2019-03-07 23:08:28 2019-04-04 19 0.4924298 v
2019-03-08 14:16:50 2019-03-29 20 0.8851770 x
2019-03-07 16:55:58 2019-03-28 21 0.1958053 l
2019-03-08 02:40:52 2019-03-15 22 0.9258686 f
2019-03-08 13:28:23 2019-03-25 23 0.6027402 u
2019-03-08 07:36:26 2019-03-28 24 0.5473824 b
2019-03-07 17:36:36 2019-03-19 25 0.6461237 l
2019-03-08 06:56:02 2019-03-29 26 0.9940611 h
2019-03-07 18:51:23 2019-03-21 27 0.4721487 o
2019-03-07 20:43:13 2019-03-13 28 0.4646166 a
2019-03-07 19:23:50 2019-03-31 29 0.2749480 e
2019-03-08 09:02:11 2019-04-05 30 0.2162153 h
... ... ... ... ...
2019-03-08 15:14:21 2019-04-05 971 0.27085010 k
2019-03-08 05:46:30 2019-04-05 972 0.43354312 r
2019-03-08 06:08:11 2019-03-16 973 0.56341744 s
2019-03-08 00:09:36 2019-04-03 974 0.35070489 z
2019-03-07 19:12:50 2019-04-02 975 0.72674730 w
2019-03-08 04:20:00 2019-03-09 976 0.21156357 h
2019-03-07 21:10:20 2019-03-13 977 0.30344723 h
2019-03-07 17:38:27 2019-03-20 978 0.71715861 y
2019-03-08 01:58:09 2019-03-15 979 0.55302497 c
2019-03-08 03:59:16 2019-03-29 980 0.69767831 r
2019-03-08 06:18:01 2019-03-27 981 0.19649741 s
2019-03-07 17:50:31 2019-03-17 982 0.43877830 t
2019-03-08 02:28:38 2019-03-10 983 0.25183157 t
2019-03-07 19:13:57 2019-03-26 984 0.82745938 e
2019-03-07 22:39:07 2019-03-11 985 0.29956544 o
2019-03-08 03:20:20 2019-03-25 986 0.17728177 w
2019-03-07 16:14:48 2019-03-24 987 0.97755326 w
2019-03-07 23:04:04 2019-04-01 988 0.27542162 h
2019-03-08 14:39:32 2019-03-14 989 0.59929473 p
2019-03-07 16:22:22 2019-03-20 990 0.30857316 a
2019-03-08 00:23:25 2019-03-09 991 0.04862165 g
2019-03-08 12:24:14 2019-03-07 992 0.18062590 o
2019-03-08 12:26:09 2019-03-29 993 0.13784189 l
2019-03-07 16:53:37 2019-03-14 994 0.48948494 v
2019-03-08 05:14:55 2019-03-26 995 0.24617007 v
2019-03-07 22:57:17 2019-04-03 996 0.03077302 p
2019-03-08 08:03:03 2019-03-23 997 0.79776461 t
2019-03-08 00:22:03 2019-03-31 998 0.28776542 o
2019-03-08 06:38:52 2019-03-12 999 0.62346939 w
2019-03-07 22:21:26 2019-03-25 1000 0.23807852 a
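Controlling the number of rows and columns printed

You can explicitly print() the data frame and control the number of rows (n) and the width of the display. width = Inf will display all columns.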

nycflights13::flights %>% 
  print(n = 10, width = Inf)

You can also control the default print behaviour by setting options:

  • options(tibble.print_max = n, tibble.print_min = m): if there are more than n rows, print only m rows.

  • Use options(tibble.print_min = Inf) to always show all rows.

  • Use options(tibble.width = Inf) to always print all columns, regardless of the width of the screen.

You can see the complete list of options in the package help, with package?tibble.
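One way to try these options without changing a session permanently is sketched below. It relies on options() returning the previous values so they can be restored afterwards; demo_tbl and the specific numbers are made up for illustration, and the exact printed layout may differ between tibble versions.

```r
library(tibble)

# Hypothetical 50-row tibble to print.
demo_tbl <- tibble(id = 1:50, value = sqrt(id))

# options() returns the old values invisibly; keep them for restoring later.
old_opts <- options(
  tibble.print_max = 20,  # more than 20 rows: fall back to print_min
  tibble.print_min = 5,   # ...and show only 5 rows
  tibble.width = Inf      # always show every column
)

print(demo_tbl)

# Put the previous settings back.
options(old_opts)
```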

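A final option is to use RStudio's built-in data viewer to get a scrollable view of the complete dataset. This is also often useful at the end of a long chain of manipulations.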

nycflights13::flights %>% 
  View()

Extracting values from a tibble

df <- tibble(
  x = runif(5),
  y = rnorm(5)
)
df
x y
0.05800016 -0.4546187
0.20734257 -0.7548791
0.15471524 -1.5633772
0.40664643 -1.0151103
0.63954919 -0.9263165
df$x
  1. 0.0580001603811979
  2. 0.207342570181936
  3. 0.154715242562816
  4. 0.406646434217691
  5. 0.639549192739651
df[['x']]
  1. 0.0580001603811979
  2. 0.207342570181936
  3. 0.154715242562816
  4. 0.406646434217691
  5. 0.639549192739651
df[[1]]
  1. 0.0580001603811979
  2. 0.207342570181936
  3. 0.154715242562816
  4. 0.406646434217691
  5. 0.639549192739651
# Using them in a pipe
df %>% .$x
  1. 0.0580001603811979
  2. 0.207342570181936
  3. 0.154715242562816
  4. 0.406646434217691
  5. 0.639549192739651
df %>% .[["x"]]
  1. 0.0580001603811979
  2. 0.207342570181936
  3. 0.154715242562816
  4. 0.406646434217691
  5. 0.639549192739651

Some functions do not work with tibbles; if you need one of them, convert the tibble to a data.frame first with as.data.frame().

class(as.data.frame(tb))
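One reason such functions can break is the behaviour of single-bracket subsetting, which the sketch below illustrates. The names tb_sub and df_sub are made up for this example. With a tibble, [ always returns another tibble, whereas a base data frame drops a single selected column to a plain vector.

```r
library(tibble)

# Hypothetical two-column tibble and its data.frame counterpart.
tb_sub <- tibble(x = 1:3, y = c("a", "b", "c"))
df_sub <- as.data.frame(tb_sub)

col_tb <- tb_sub[, "x"]  # still a tibble with one column
col_df <- df_sub[, "x"]  # dropped to an integer vector

class(df_sub)       # "data.frame"
is_tibble(col_tb)   # TRUE
is.integer(col_df)  # TRUE
```

Code that expects a vector from df[, "x"] therefore needs either as.data.frame() first or an explicit [[ on the tibble.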

Reposted from blog.csdn.net/weixin_41503009/article/details/88312249