In R, how do I loop through CSV files and save the linear regression output in a new data frame?


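My script and one of the first three CSV files can be found in my GitHub folder.

I have split a list of NDVI and climate data into small CSV files. Each file holds 34 years of data.

Depending on the conflict year, each 34-year series should then be divided into two parts, both kept in the same table and each covering a specific time range. That part of the code already works.

Now I want to fit a multiple linear regression on the climate data of the first part and use it to control the second part of the list.

Basically, I need a loop that stores all coefficients from each run of the lm function, one run per CSV file, in a new list.

I know I can loop with lapply and get a list as output. But some pieces are still missing for looping through the CSV files.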

#load libraries
library(ggplot2)
library(readr)
library(tidyr)
library(dplyr)
library(ggpubr)
library(plyr)
library(tidyverse)
library(fs)


file_paths <- fs::dir_ls("E:\\PYTHON_ST\\breakCSV_PYTHON\\AIM_2_regions\\Afghanistan")
file_paths

#create empty list and fill with file paths and loop through them
file_contents <- list()
for (i in seq_along(file_paths)) { #seq_along for vectors (list of file paths is a vector)
  file_contents[[i]] <- read_csv(file = file_paths[[i]])
                  
                for (i in seq_len(file_contents[[i]])){ # redundant?
                  
                 # do all the following steps in every file                                        
                 
                 # Step 1) 
                 # Define years to divide table
                 
                 #select conflict year in df 
                 ConflictYear = file_contents[[i]][1,9]
                 ConflictYear
                 
                 # select Start year of regression in df
                 SlopeYears = file_contents[[i]][1,7] #to get slope years (e.g.17)
                 BCStartYear = ConflictYear-SlopeYears #to get start year for regression
                 BCStartYear
                 
                 #End year of regression
                 ACEndYear = ConflictYear+(SlopeYears-1) # -1 because the conflict year is included
                 ACEndYear
                 
                 
                 # Step 2
                 
                 #select needed rows from df
                 #no headers but row numbers. NDVI.Year = [r1-r34,c2]
                 NDVI.Year <- file_contents[[i]][1:34,2]
                 NDVI <- file_contents[[i]][1:34,21]
                 T.annual.max <- file_contents[[i]][1:34,19]
                 Prec.annual.max <- file_contents[[i]][1:34,20]
                 soilM.annual.max <- file_contents[[i]][1:34,18]
                 
                 #Define BeforeConf and AfterConf depending on Slope Year number and Conflict Years
                 #Go through NDVI.Year till Conflict.Year (-1 year) since the conflict year is not included in bc
                 BeforeConf1 <- file_contents[[i]][ which(file_contents[[i]]$NDVI.Year >= BCStartYear & file_contents[[i]]$NDVI.Year < ConflictYear),] #eg. 1982 to 1999
                 BeforeConf2 <-  c(NDVI.Year,NDVI,T.annual.max,Prec.annual.max,soilM.annual.max) #which columns to include
                 BeforeConf <- BeforeConf1[BeforeConf2] #create table
                 
                 AfterConf1 <- myFiles[ which(file_contents[[i]]$NDVI.Year >= ConflictYear & file_contents[[i]]$NDVI.Year <= ACEndYear),] #eg. 1999 to 2015
                 AfterConf2 <-  c(NDVI.Year,soilM.annual.max)
                 AfterConf <- AfterConf1[AfterConf2]
                 
                 #Step 3)a)
                 #create empty list,to fill with coefficient results from each model results for each csv file and safe in new list
                 
                 #Create an empty df for the output coefficients
                 names <- c("(Intercept)","BeforeConf$T.annual.max","BeforeConf$Prec.annual.max","BeforeConf$soilM.annual.max")
                 coef_df <- data.frame()
                 for (k in names) coef_df[[k]] <- as.character() 
                 
                 #Apply Multiple Linear Regression
                 plyrFunc <- function(x){
                   model <- lm(NDVI ~ T.annual.max + Prec.annual.max + soilM.annual.max,data = BeforeConf)
                   return(summary(model)$coefficients[1,1:4])
                 }
                 
                 coef_df <- ddply(BeforeConf,.(),x)
                 coef_DF
    }}

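Solution

Since your code works for a single CSV, consider separating the per-file processing from the loop. Specifically:

Create a function that takes a single CSV path as its input argument and does everything needed for one file.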

get_coeffs <- function(csv_path) {
  df <- read.csv(csv_path)

  ### Step 1
  # Conflict year, start year, and end year of the regression window
  ConflictYear <- df[1, 9]
  SlopeYears   <- df[1, 7]                         # number of slope years (e.g. 17)
  BCStartYear  <- ConflictYear - SlopeYears        # start year for regression
  ACEndYear    <- ConflictYear + (SlopeYears - 1)  # -1 because the conflict year is included

  ### Step 2
  # Pick the needed columns by position (rows 1-34) and give them explicit names,
  # so the model formula below can refer to them
  dat <- data.frame(
    NDVI.Year        = df[1:34, 2],
    NDVI             = df[1:34, 21],
    T.annual.max     = df[1:34, 19],
    Prec.annual.max  = df[1:34, 20],
    soilM.annual.max = df[1:34, 18]
  )

  # BeforeConf: from the start year up to, but not including, the conflict year
  BeforeConf <- dat[which(dat$NDVI.Year >= BCStartYear & dat$NDVI.Year < ConflictYear), ]

  # AfterConf: from the conflict year through the end year (not used in the model below)
  AfterConf <- dat[which(dat$NDVI.Year >= ConflictYear & dat$NDVI.Year <= ACEndYear),
                   c("NDVI.Year", "soilM.annual.max")]

  ### Step 3
  tryCatch({
      # Run model and return the four estimates (intercept plus three predictors)
      model <- lm(NDVI ~ T.annual.max + Prec.annual.max + soilM.annual.max, data = BeforeConf)
      coef(model)
  }, error = function(e) {
      # Report the problem and return a numeric placeholder of the same length
      message(conditionMessage(e))
      rep(NA_real_, 4)
  })
}
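
As a side note on what Step 3 can return, here is a minimal sketch using R's built-in mtcars data set (an assumption for illustration only; it is not the NDVI data from the question). It suggests that coef() yields one estimate per model term, while row 1 of summary(model)$coefficients holds only the intercept's estimate, standard error, t value, and p value.

```r
# Illustration only: mtcars stands in for one BeforeConf table.
model <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Named numeric vector: one estimate per term, intercept first.
estimates <- coef(model)
names(estimates)

# Matrix with one row per term and four statistic columns.
coef_table <- summary(model)$coefficients
colnames(coef_table)

# Row 1 covers the intercept only, across all four statistic columns.
intercept_row <- coef_table[1, 1:4]
```

Which of the two to return depends on whether the per-file output should be the four estimates or the full statistics for a single term.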

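Loop over the CSV paths, pass each file to the function, and build up the results. You can use lapply for a list return, or sapply (or vapply, which specifies the length and type of each result) for a simplified return such as a vector or a matrix/array (if applicable).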

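mypath <- "E:\\PYTHON_ST\\breakCSV_PYTHON\\AIM_2_regions\\Afghanistan"
# full.names = TRUE returns complete paths, so read.csv works from any working directory
file_paths <- list.files(path = mypath, pattern = "\\.csv$", full.names = TRUE)

# LIST RETURN
result_list <- lapply(file_paths, get_coeffs)

# MATRIX RETURN (two alternatives; vapply also checks the length and type of each result)
results_matrix <- sapply(file_paths, get_coeffs)
results_matrix <- vapply(file_paths, get_coeffs, numeric(4))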
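
Since the question asks for the output in a new data frame, one possible way to get there from the list return is sketched below. Everything in it is an assumption for illustration: the CSV files are synthetic and written to a temporary folder, and fit_one is a hypothetical, simplified stand-in for get_coeffs that expects named columns.

```r
# Illustration only: write three small synthetic CSVs to a temporary folder.
set.seed(1)
demo_dir <- file.path(tempdir(), "demo_csv")
dir.create(demo_dir, showWarnings = FALSE)
for (k in 1:3) {
  demo <- data.frame(
    NDVI             = runif(17),
    T.annual.max     = rnorm(17, mean = 30),
    Prec.annual.max  = rnorm(17, mean = 100, sd = 10),
    soilM.annual.max = runif(17)
  )
  write.csv(demo, file.path(demo_dir, paste0("region_", k, ".csv")), row.names = FALSE)
}

# Hypothetical stand-in for get_coeffs(): fit the model and return the estimates.
fit_one <- function(csv_path) {
  d <- read.csv(csv_path)
  coef(lm(NDVI ~ T.annual.max + Prec.annual.max + soilM.annual.max, data = d))
}

demo_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
coef_list  <- lapply(demo_paths, fit_one)

# Stack the named vectors row-wise: one row per file, one column per coefficient.
coef_df <- as.data.frame(do.call(rbind, coef_list))
coef_df <- cbind(file = basename(demo_paths), coef_df)

# Optionally save the combined table outside the input folder.
write.csv(coef_df, file.path(tempdir(), "coefficients.csv"), row.names = FALSE)
```

Keeping the file name as the first column makes it possible to trace each row of coefficients back to its CSV.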
