In R, how do I loop through CSV files and save the linear regression output in a new data frame?

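My script and one of the first three csv files can be found in my Github folder.

I have split my list of NDVI and climate data into small csv files. Each file holds 34 years of data.

Depending on the conflict year, each 34-year series should then be split into two parts, kept in the same table and limited to a specific time range. That part of the code already works.

Now I want to use multiple linear regression to control the second part of the list with the climate data from the first part.

Basically, I need a loop that stores all the coefficients from each run of the lm function, one run per csv file, in a new list.

I know I can loop with lapply and get a list as output. What I am missing is the part that actually loops through the csv files.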

#load libraries
library(ggplot2)
library(readr)
library(tidyr)
library(dplyr)
library(ggpubr)
library(plyr)
library(tidyverse)
library(fs)


file_paths <- fs::dir_ls("E:\\PYTHON_ST\\breakCSV_PYTHON\\AIM_2_regions\\Afghanistan")
file_paths

#create empty list and fill with file paths and loop through them
file_contents <- list()
for (i in seq_along(file_paths)) { #seq_along for vectors (list of file paths is a vector)
  file_contents[[i]] <- read_csv(file = file_paths[[i]])
                  
                for (i in seq_len(file_contents[[i]])){ # redundant?
                  
                 # do all the following steps in every file                                        
                 
                 # Step 1) 
                 # Define years to divide table
                 
                 #select conflict year in df 
                 ConflictYear = file_contents[[i]][1,9]
                 ConflictYear
                 
                 # select Start year of regression in df
                 SlopeYears = file_contents[[i]][1,7] #to get slope years (e.g.17)
                 BCStartYear = ConflictYear-SlopeYears #to get start year for regression
                 BCStartYear
                 
                 #End year of regression
                 ACEndYear = ConflictYear+(SlopeYears-1) # -1 because the conflict year is included
                 ACEndYear
                 
                 
                 # Step 2
                 
                 #select needed rows from df
                 #no headers but row numbers. NDVI.Year = [r1-r34,c2]
                 NDVI.Year <- file_contents[[i]][1:34,2]
                 NDVI <- file_contents[[i]][1:34,21]
                 T.annual.max <- file_contents[[i]][1:34,19]
                 Prec.annual.max <- file_contents[[i]][1:34,20]
                 soilM.annual.max <- file_contents[[i]][1:34,18]
                 
                 #Define BeforeConf and AfterConf depending on Slope Year number and Conflict Years
                 #Go through NDVI.Year till Conflict.Year (-1 year) since the conflict year is not included in bc
                 BeforeConf1 <- file_contents[[i]][ which(file_contents[[i]]$NDVI.Year >= BCStartYear & file_contents[[i]]$NDVI.Year < ConflictYear),] #eg. 1982 to 1999
                 BeforeConf2 <-  c(NDVI.Year,NDVI,T.annual.max,Prec.annual.max,soilM.annual.max) #which columns to include
                 BeforeConf <- BeforeConf1[BeforeConf2] #create table
                 
                 AfterConf1 <- myFiles[ which(file_contents[[i]]$NDVI.Year >= ConflictYear & file_contents[[i]]$NDVI.Year <= ACEndYear),] #eg. 1999 to 2015
                 AfterConf2 <-  c(NDVI.Year,soilM.annual.max)
                 AfterConf <- AfterConf1[AfterConf2]
                 
                 #Step 3)a)
                 #create empty list,to fill with coefficient results from each model results for each csv file and safe in new list
                 
                 #Create an empty df for the output coefficients
                 names <- c("(Intercept)","BeforeConf$T.annual.max","BeforeConf$Prec.annual.max","BeforeConf$soilM.annual.max")
                 coef_df <- data.frame()
                 for (k in names) coef_df[[k]] <- as.character() 
                 
                 #Apply Multiple Linear Regression
                 plyrFunc <- function(x){
                   model <- lm(NDVI ~ T.annual.max + Prec.annual.max + soilM.annual.max,data = BeforeConf)
                   return(summary(model)$coefficients[1,1:4])
                 }
                 
                 coef_df <- ddply(BeforeConf,.(),x)
                 coef_DF
    }}

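Solution

Since your code works for a single CSV, consider separating the processing from the loop. Specifically:

  1. Create a function that takes a single csv path as its input parameter and does everything that is needed for one file.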

    get_coeffs <- function(csv_path) {
      df <- read.csv(csv_path)
    
      ### Step 1
      # select conflict year, start year, and end year in df
      ConflictYear <- df[1, 9]
      SlopeYears   <- df[1, 7]                         # to get slope years (e.g. 17)
      BCStartYear  <- ConflictYear - SlopeYears        # to get start year for regression
      ACEndYear    <- ConflictYear + (SlopeYears - 1)  # -1 because the conflict year is included
    
      ### Step 2
      # select the needed columns by position (as in the question) and give them
      # the names used in the model formula
      cols <- c(NDVI.Year = 2, NDVI = 21, T.annual.max = 19,
                Prec.annual.max = 20, soilM.annual.max = 18)
      dat <- setNames(df[1:34, cols], names(cols))
    
      # Define BeforeConf and AfterConf depending on slope years and conflict year;
      # the conflict year itself is not included in BeforeConf
      BeforeConf <- dat[which(dat$NDVI.Year >= BCStartYear & dat$NDVI.Year < ConflictYear), ]
      AfterConf  <- dat[which(dat$NDVI.Year >= ConflictYear & dat$NDVI.Year <= ACEndYear),
                        c("NDVI.Year", "soilM.annual.max")]
    
      ### Step 3
      tryCatch({
          # Run model and return the four coefficients (intercept + three predictors)
          model <- lm(NDVI ~ T.annual.max + Prec.annual.max + soilM.annual.max, data = BeforeConf)
          coef(model)
      }, error = function(e) {
          # Report the problem and return NAs so the loop over files keeps going
          message(conditionMessage(e))
          rep(NA_real_, 4)
      })
    }
    
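  2. Loop over the csv paths, passing each file to the function and collecting the results. Use lapply if you want a list back, or sapply (or vapply, which lets you specify the length and type of each result) to simplify the return value to a vector or matrix/array where applicable.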

    mypath <- "E:\\PYTHON_ST\\breakCSV_PYTHON\\AIM_2_regions\\Afghanistan"
    # full.names = TRUE so that read.csv() receives complete paths, not bare file names
    file_paths <- list.files(path = mypath, pattern = "\\.csv$", full.names = TRUE)
    
    # LIST RETURN
    result_list <- lapply(file_paths, get_coeffs)
    
    # MATRIX RETURN (one column per file)
    results_matrix <- sapply(file_paths, get_coeffs)
    results_matrix <- vapply(file_paths, get_coeffs, numeric(4))
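
Since the question asks for a data frame, the matrix returned by vapply can be transposed so that each file becomes one row. The following is only a minimal sketch on made-up data: `fit_one`, `demo_dir`, the `region_*.csv` file names and the random values are hypothetical stand-ins, and the toy files already use the column names from the model formula, which the real files may not.

```r
# Hypothetical stand-in for the per-file function: reads one CSV and returns
# the four fitted coefficients (intercept + three predictors), or NAs on error.
fit_one <- function(csv_path) {
  df <- read.csv(csv_path)
  tryCatch(
    coef(lm(NDVI ~ T.annual.max + Prec.annual.max + soilM.annual.max, data = df)),
    error = function(e) rep(NA_real_, 4)
  )
}

# Write three small synthetic CSV files to a temporary folder (made-up data).
set.seed(1)
demo_dir <- file.path(tempdir(), "ndvi_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  toy <- data.frame(
    NDVI.Year        = 1982:2015,
    NDVI             = runif(34),
    T.annual.max     = rnorm(34, mean = 30, sd = 2),
    Prec.annual.max  = rnorm(34, mean = 100, sd = 20),
    soilM.annual.max = runif(34)
  )
  write.csv(toy, file.path(demo_dir, sprintf("region_%d.csv", i)), row.names = FALSE)
}

file_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# vapply gives a 4 x n matrix (one column per file); transpose it so that
# each file becomes one row of the data frame.
coef_mat <- vapply(file_paths, fit_one, numeric(4))
coef_df <- as.data.frame(t(coef_mat))
names(coef_df) <- c("Intercept", "T.annual.max", "Prec.annual.max", "soilM.annual.max")
coef_df <- cbind(file = basename(file_paths), coef_df)
rownames(coef_df) <- NULL

print(coef_df)
# write.csv(coef_df, "coefficients.csv", row.names = FALSE)  # optional: save to disk
```

Keeping the file name as its own column makes it easy to match each row of coefficients back to its region later on.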
    
