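How to do grepl-style matching between two columns
I have two datasets: one has an address column, and the other contains locality names with their corresponding latitude and longitude.
The store dataset: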
+--------------------+-----------+--------------------------------------------------+
| Store name | Postcodes | Address |
+--------------------+-----------+--------------------------------------------------+
| Floral showers | 2000 | Street 45,Level 9,Sydney,New South Wales 2000 |
| Cookie box | 4300 | Shop 3,Queensland 4300 |
| Mango troopers | 2010 | Aberdeen,Bankstown,NSW |
| Building AE44 | 4300 | 778/9 Goulburn Street,QLD |
| Floral showers Co. | 2230 | Steert 47 Cronulla,New South Wales 2230 |
| Vinci supplies | 2560 | West AIRDS,Mayfaille NSW |
+--------------------+-----------+--------------------------------------------------+
The latitude/longitude dataset:
+-------------------+-------+-------------+--------------+
| Locality | State | Latitude | Longitude |
+-------------------+-------+-------------+--------------+
| ABERDARE | NSW | 151.317476 | -32.977861 |
| ABERDEEN | NSW | 151.102917 | -32.14622 |
| ACACIA PLATEAU | NSW | 152.49765 | -28.36456 |
| AIRDS | NSW | 150.768408 | -34.194216 |
| ADAMINABY | NSW | 148.769744 | -35.997349 |
| ABERCROMBIE RIVER | NSW | 149.3476918 | -33.91030648 |
| CRONULLA | NSW | 151.136596 | -34.093213 |
| SYDNEY | NSW | 151.268071 | -33.794883 |
+-------------------+-------+-------------+--------------+
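I want to create a new column that extracts each store's locality from the Address column, and then fill in the latitude and longitude from the other dataset. Because the addresses are not in a fixed format, I know I have to do a string search. However, I am not sure how to compare values across the two columns.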
Here are the dput outputs for the two samples, assigned to df (stores) and df1 (localities) so they can be pasted and used directly:
df <- structure(list(Stores_names = c("Floral showers","Cookie box","Mango troopers","Building AE44","Floral showers Co.","Vinci supplies"
),Postcodes = c("2000","4300","2010","4300","2230","2560"
),Address = c("Street 45,Level 9,Sydney,New South Wales 2000","Shop 3,Queensland 4300","Aberdeen,Bankstown,NSW","778/9 Goulburn Street,QLD","Steert 47 Cronulla,New South Wales 2230","West AIRDS,Mayfaille NSW"
)),class = "data.frame",row.names = c(NA,-6L))
df1 <- structure(list(Localities = c("ABERDARE","ABERDEEN","ACACIA PLATEAU","AIRDS","ADAMINABY","ABERCROMBIE RIVER","CRONULLA","SYDNEY"
),State = c("NSW","NSW","NSW","NSW","NSW","NSW","NSW","NSW"),lat = c("151.317476","151.102917","152.49765","150.768408","148.769744","149.3476918","151.136596","151.268071"),long = c("-32.977861","-32.14622","-28.36456","-34.194216","-35.997349","-33.91030648","-34.093213","-33.794883")),class = "data.frame",row.names = c(NA,-8L))
My final dataset should contain three new columns: Locality, lat, and long.
+--------------------+-----------+--------------------------------------------------+----------+------------+------------+
| Store name | Postcodes | Address | Locality | lat | long |
+--------------------+-----------+--------------------------------------------------+----------+------------+------------+
| Floral showers | 2000 | Street 45,Level 9,Sydney,New South Wales 2000 | Sydney | 151.268071 | -33.794883 |
| Cookie box | 4300 | Shop 3,Queensland 4300 | | | |
| Mango troopers | 2010 | Aberdeen,Bankstown,NSW | Aberdeen | 151.102917 | -32.14622 |
| Building AE44 | 4300 | 778/9 Goulburn Street,QLD | | | |
| Floral showers Co. | 2230 | Steert 47 Cronulla,New South Wales 2230 | Cronulla | 151.136596 | -34.093213 |
| Vinci supplies | 2560 | West AIRDS,Mayfaille NSW | AIRDS | 150.768408 | -34.194216 |
+--------------------+-----------+--------------------------------------------------+----------+------------+------------+
Stores that cannot be found in the lat/long dataset can be left blank, but I need every row from the store dataset.
Thanks for your help!
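Solution
This works:
library(stringr)
library(dplyr)
# Extract the first locality name found in each upper-cased address, then join on it.
# keep = TRUE retains both join keys, so Localities survives after city is dropped.
df %>%
  mutate(city = str_extract(toupper(Address), paste0(df1$Localities, collapse = "|"))) %>%
  left_join(df1, by = c("city" = "Localities"), keep = TRUE) %>%
  select(-c(city, State))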
Stores_names Postcodes Address Localities lat long
1 Floral showers 2000 Street 45,Level 9,Sydney,New South Wales 2000 SYDNEY 151.268071 -33.794883
2 Cookie box 4300 Shop 3,Queensland 4300 <NA> <NA> <NA>
3 Mango troopers 2010 Aberdeen,Bankstown,NSW ABERDEEN 151.102917 -32.14622
4 Building AE44 4300 778/9 Goulburn Street,QLD <NA> <NA> <NA>
5 Floral showers Co. 2230 Steert 47 Cronulla,New South Wales 2230 CRONULLA 151.136596 -34.093213
6 Vinci supplies 2560 West AIRDS,Mayfaille NSW AIRDS 150.768408 -34.194216
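Since the question asks about grepl specifically, here is a minimal base R sketch of the same idea without stringr or dplyr. It uses a cut-down copy of the question's data, and `find_locality` is a hypothetical helper name introduced only for this illustration. It assumes that the first locality (in df1 order) found in an address is the one wanted.

```r
# Small stand-ins for the question's data (a subset of rows, same column names).
df <- data.frame(
  Stores_names = c("Floral showers", "Cookie box", "Vinci supplies"),
  Address = c("Street 45,Level 9,Sydney,New South Wales 2000",
              "Shop 3,Queensland 4300",
              "West AIRDS,Mayfaille NSW"),
  stringsAsFactors = FALSE
)
df1 <- data.frame(
  Localities = c("AIRDS", "SYDNEY"),
  lat = c("150.768408", "151.268071"),
  long = c("-34.194216", "-33.794883"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the row index in `localities` of the first
# locality contained in `address`, or NA if none is found.
find_locality <- function(address, localities) {
  hits <- vapply(localities,
                 function(loc) grepl(loc, toupper(address), fixed = TRUE),
                 logical(1))
  if (any(hits)) unname(which(hits))[1] else NA_integer_
}

idx <- vapply(df$Address, find_locality, integer(1),
              localities = df1$Localities, USE.NAMES = FALSE)

# Indexing with NA yields NA, so unmatched stores keep their row with blanks.
result <- df
result$Localities <- df1$Localities[idx]
result$lat <- df1$lat[idx]
result$long <- df1$long[idx]
```

Every store row is kept, which mirrors the left join in the answer. The loop calls grepl once per address and locality pair, so for large tables the single combined pattern is likely to be faster.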
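One caveat with a plain `A|B|C` pattern is that it can match a locality name inside a longer word, and that a short name can win over a longer name that starts with it. The sketch below shows one possible way to guard against both, using word boundaries and longest-first ordering in base R. The locality names and addresses here are hypothetical, chosen only to trigger the two cases, and the sketch assumes the names contain no regex metacharacters.

```r
# Hypothetical data chosen to show the two pitfalls; not from the question.
localities <- c("AIRDS", "ABERDEEN", "ABERDEEN HEIGHTS")
addresses  <- c("12 Bairdsley Road,QLD",          # "airds" only inside another word
                "3 High St,Aberdeen Heights,NSW",
                "West AIRDS,Mayfaille NSW")

# Put longer names first so "ABERDEEN HEIGHTS" is tried before "ABERDEEN",
# and wrap the alternation in \\b so only whole words match.
ordered <- localities[order(nchar(localities), decreasing = TRUE)]
pattern <- paste0("\\b(", paste(ordered, collapse = "|"), ")\\b")

upper <- toupper(addresses)
# perl = TRUE: alternatives are tried in the order they are listed.
m <- regexpr(pattern, upper, perl = TRUE)
has_match <- as.vector(m) > 0

# regmatches returns only the matched pieces, so fill them into an NA vector.
city <- rep(NA_character_, length(upper))
city[has_match] <- regmatches(upper, m)
```

The resulting `city` vector can then be used as the join key against the locality table in the same way as in the answer.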