In R, how do I loop through csv files and save the linear regression output in a new data frame?
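My script and one of the first three csv files can be found in my GitHub folder.
I have split my list of NDVI and climate data into small csv files. Each file contains 34 years of data.
Each 34-year series should then be split into two parts depending on the conflict year, which is stored in the same table, and a certain time range. This part of the code already works.
Now I want to use multiple linear regression to control the second part of the list with the climate data from the first part.
I basically need a loop that stores all the coefficients from each run of the lm function, one run per csv file, in a new list.
I know I can loop with lapply and get a list as output. But the parts that actually loop over the csv files are still missing.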
#load libraries
library(ggplot2)
library(readr)
library(tidyr)
library(dplyr)
library(ggpubr)
library(plyr)
library(tidyverse)
library(fs)
file_paths <- fs::dir_ls("E:\\PYTHON_ST\\breakCSV_PYTHON\\AIM_2_regions\\Afghanistan")
file_paths
#create empty list and fill with file paths and loop through them
file_contents <- list()
for (i in seq_along(file_paths)) { #seq_along for vectors (list of file paths is a vector)
file_contents[[i]] <- read_csv(file = file_paths[[i]])
for (i in seq_len(file_contents[[i]])){ # redundant?
# do all the following steps in every file
# Step 1)
# Define years to divide table
#select conflict year in df
ConflictYear = file_contents[[i]][1,9]
ConflictYear
# select Start year of regression in df
SlopeYears = file_contents[[i]][1,7] #to get slope years (e.g.17)
BCStartYear = ConflictYear-SlopeYears #to get start year for regression
BCStartYear
#End year of regression
ACEndYear = ConflictYear+(SlopeYears-1) # -1 because the conflict year is included
ACEndYear
# Step 2
#select needed rows from df
#no headers but row numbers. NDVI.Year = [r1-r34,c2]
NDVI.Year <- file_contents[[i]][1:34,2]
NDVI <- file_contents[[i]][1:34,21]
T.annual.max <- file_contents[[i]][1:34,19]
Prec.annual.max <- file_contents[[i]][1:34,20]
soilM.annual.max <- file_contents[[i]][1:34,18]
#Define BeforeConf and AfterConf depending on Slope Year number and Conflict Years
#Go through NDVI.Year till Conflict.Year (-1 year) since the conflict year is not included in bc
BeforeConf1 <- file_contents[[i]][ which(file_contents[[i]]$NDVI.Year >= BCStartYear & file_contents[[i]]$NDVI.Year < ConflictYear),] #eg. 1982 to 1999
BeforeConf2 <- c(NDVI.Year,NDVI,T.annual.max,Prec.annual.max,soilM.annual.max) #which columns to include
BeforeConf <- BeforeConf1[BeforeConf2] #create table
AfterConf1 <- myFiles[ which(file_contents[[i]]$NDVI.Year >= ConflictYear & file_contents[[i]]$NDVI.Year <= ACEndYear),] #eg. 1999 to 2015
AfterConf2 <- c(NDVI.Year,soilM.annual.max)
AfterConf <- AfterConf1[AfterConf2]
#Step 3)a)
#create empty list,to fill with coefficient results from each model results for each csv file and safe in new list
#Create an empty df for the output coefficients
names <- c("(Intercept)","BeforeConf$T.annual.max","BeforeConf$Prec.annual.max","BeforeConf$soilM.annual.max")
coef_df <- data.frame()
for (k in names) coef_df[[k]] <- as.character()
#Apply Multiple Linear Regression
plyrFunc <- function(x){
model <- lm(NDVI ~ T.annual.max + Prec.annual.max + soilM.annual.max,data = BeforeConf)
return(summary(model)$coefficients[1,1:4])
}
coef_df <- ddply(BeforeConf,.(),x)
coef_DF
}}
Solution
Since your code works for a single CSV, consider separating the processing from the loop. Specifically:
- Create a function that takes a single csv path as its input parameter and does everything needed for one file.
get_coeffs <- function(csv_path) {
  df <- read.csv(csv_path)

  ### Step 1
  # select conflict year, start year, and end year in df
  ConflictYear <- df[1, 9]
  SlopeYears   <- df[1, 7]                         # number of slope years (e.g. 17)
  BCStartYear  <- ConflictYear - SlopeYears        # start year for regression
  ACEndYear    <- ConflictYear + (SlopeYears - 1)  # -1 because the conflict year is included

  ### Step 2
  # select the needed columns by position (rows 1 to 34) and name them
  dat <- data.frame(
    NDVI.Year        = df[1:34, 2],
    NDVI             = df[1:34, 21],
    T.annual.max     = df[1:34, 19],
    Prec.annual.max  = df[1:34, 20],
    soilM.annual.max = df[1:34, 18]
  )

  # Define BeforeConf and AfterConf from the slope years and the conflict year.
  # BeforeConf stops one year before the conflict year; AfterConf includes it.
  BeforeConf <- dat[dat$NDVI.Year >= BCStartYear & dat$NDVI.Year < ConflictYear, ]
  AfterConf  <- dat[dat$NDVI.Year >= ConflictYear & dat$NDVI.Year <= ACEndYear,
                    c("NDVI.Year", "soilM.annual.max")]

  ### Step 3
  tryCatch({
    # Run model and return the intercept plus the three slope estimates
    model <- lm(NDVI ~ T.annual.max + Prec.annual.max + soilM.annual.max, data = BeforeConf)
    coef(model)
  }, error = function(e) {
    print(e)
    rep(NA_real_, 4)  # numeric NAs so that vapply(..., numeric(4)) still works
  })
}
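- Loop over the csv paths, pass each file to the function, and build up the results. Use lapply for a list return, or sapply (or vapply, which lets you specify the length and type of each result) for a simplified return such as a vector or a matrix/array where applicable. Note that list.files needs full.names = TRUE so that the returned paths can be read from any working directory.
mypath <- "E:\\PYTHON_ST\\breakCSV_PYTHON\\AIM_2_regions\\Afghanistan"
file_paths <- list.files(path = mypath, pattern = "\\.csv$", full.names = TRUE)

# LIST RETURN
result_list <- lapply(file_paths, get_coeffs)

# MATRIX RETURN
results_matrix <- sapply(file_paths, get_coeffs)
results_matrix <- vapply(file_paths, get_coeffs, numeric(4))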
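Since the question asks for a data frame rather than a list, the list return can be bound row-wise afterwards. The following is only a sketch under stated assumptions: the csv files are synthetic stand-ins written to a temporary folder, their column layout is made up, and demo_coeffs is a hypothetical, simplified substitute for get_coeffs. One file is deliberately malformed to show how the tryCatch fallback ends up as a row of NA values instead of stopping the loop.

```r
# Synthetic stand-ins for the real csv files (hypothetical layout, random data)
tmp_dir <- file.path(tempdir(), "coef_demo")
dir.create(tmp_dir, showWarnings = FALSE)
set.seed(42)
for (region in c("a", "b", "c")) {
  demo <- data.frame(
    NDVI = runif(17), T.annual.max = rnorm(17),
    Prec.annual.max = rnorm(17), soilM.annual.max = rnorm(17)
  )
  if (region == "c") demo$soilM.annual.max <- NULL  # deliberately malformed file
  write.csv(demo, file.path(tmp_dir, paste0(region, ".csv")), row.names = FALSE)
}

coef_names <- c("(Intercept)", "T.annual.max", "Prec.annual.max", "soilM.annual.max")

# Hypothetical simplified version of get_coeffs(): one file in, four numbers out
demo_coeffs <- function(csv_path) {
  dat <- read.csv(csv_path)
  tryCatch(
    coef(lm(NDVI ~ T.annual.max + Prec.annual.max + soilM.annual.max, data = dat)),
    error = function(e) setNames(rep(NA_real_, 4), coef_names)
  )
}

file_paths <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
result_list <- lapply(file_paths, demo_coeffs)
names(result_list) <- basename(file_paths)

# One row per file, one column per coefficient, plus the file name
coef_df <- as.data.frame(do.call(rbind, result_list))
coef_df$file <- rownames(coef_df)
rownames(coef_df) <- NULL
print(coef_df)
```

The matrix returned by sapply or vapply has one column per file, so t(results_matrix) gives the same one-row-per-file orientation if you prefer to start from the matrix.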