How to fix an IndexError when removing unwanted rows from a CSV
I don't know why I'm getting this error. I'm trying to skip rows that contain '?' as a column value.
Example of the CSV file:
59,Private,109015,HS-grad,9,Divorced,Tech-support,Unmarried,White,Female,40,United-States,<=50K
56,Local-gov,216851,Bachelors,13,Married-civ-spouse,Husband,Male,>50K
19,168294,Never-married,Craft-repair,Own-child,<=50K
54,?,180211,Some-college,10,Asian-Pac-Islander,60,South,>50K
39,367260,Exec-managerial,Not-in-family,80,<=50K
49,193366,<=50K
23,190709,Assoc-acdm,12,Protective-serv,52,<=50K
20,266015,Sales,Black,44,<=50K
45,386940,1408,<=50K
30,Federal-gov,59951,Adm-clerical,<=50K
18,226956,Other-service,30,<=50K
I'm using Python, and this is my code:
# Load the adult dataset
import csv
f = open("./adult_data.csv")
records = csv.reader(f,delimiter = ',')
# We define a header ourselves since the dataset contains only the raw numbers.
dataset = []
header = ['Age','Workclass','Fnlwgt','Education','Education-num','Marital-status',
          'Occupation','Relationship','Race','Sex','Capital-gain','Capital-loss',
          'Hours-per-week','Native-country','Salary']
for line in records:
    question_mark = True
    for i in range(len(header)):
        if (line[i] == ' ?'):
            question_mark = False
    if (question_mark):
        d = dict(zip(header,line))
        d['Age'] = int(d['Age'])
        d['Fnlwgt'] = int(d['Fnlwgt'])
        d['Education-num'] = int(d['Education-num'])
        d['Capital-gain'] = int(d['Capital-gain'])
        d['Capital-loss'] = int(d['Capital-loss'])
        d['Hours-per-week'] = int(d['Hours-per-week'])
        dataset.append(d)
This is my output:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-6-a6f851085aed> in <module>
12 question_mark = True
13 for i in range(len(header)):
---> 14 if(line[i] == ' ?'):
15 question_mark = False
16 if(question_mark):
IndexError: list index out of range
Solution
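The lines
for i in range(len(header)):
    if (line[i] == ' ?'):
will raise an IndexError if, for example, there is an empty line at the end of the file, or if a row doesn't contain the expected number of cells. The loop always visits all 15 indices of header, so any row with fewer than 15 fields fails; in the sample above, the row 49,193366,<=50K has only three fields, so line[3] is already out of range.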
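To see both failure cases concretely, here is a small reproduction that feeds `csv.reader` one of the short rows from the sample plus a blank line, using an in-memory `io.StringIO` instead of the real file. The helper `first_failing_index` is hypothetical; it only performs the same `line[i]` access as the question's loop and reports where that access breaks.

```python
import csv
import io

HEADER_LEN = 15  # the question's header list has 15 names

# Illustrative input: one short row taken from the sample, then a blank line.
rows = list(csv.reader(io.StringIO("49,193366,<=50K\n\n")))
print(rows)  # [['49', '193366', '<=50K'], []]

def first_failing_index(line, width=HEADER_LEN):
    """Run the question's indexing loop and report where it raises IndexError."""
    for i in range(width):
        try:
            _ = line[i]  # same access as line[i] in the original loop
        except IndexError:
            return i
    return None  # the row is long enough for every index

for line in rows:
    print(len(line), first_failing_index(line))
# 3 3
# 0 0
```

`csv.reader` yields an empty list for the blank line, so the original loop fails at `i = 0`, and the three-field row fails at `i = 3`.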
You can fix this by iterating over the row directly instead of accessing items by index (which some would consider poor style anyway):
for cell in line:
    if cell == ' ?':
        ...
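Iterating directly removes the IndexError, but a short row would then fail one step later: `zip` stops at the shorter sequence, so a key such as `'Capital-gain'` would be missing from `d` and `d['Capital-gain']` would raise a KeyError. The sketch below plugs the direct loop into the question's program under a few stated assumptions: it reads from a made-up in-memory sample instead of `./adult_data.csv`, it skips rows whose length differs from the header (an assumption about how malformed rows should be handled), and it swaps in `csv.reader(..., skipinitialspace=True)` so that `'?'` matches whether or not the file puts a space after each comma. The names `load_dataset` and `INT_COLUMNS` are hypothetical.

```python
import csv
import io

HEADER = ['Age', 'Workclass', 'Fnlwgt', 'Education', 'Education-num',
          'Marital-status', 'Occupation', 'Relationship', 'Race', 'Sex',
          'Capital-gain', 'Capital-loss', 'Hours-per-week', 'Native-country',
          'Salary']
# Hypothetical list of the columns the question converts with int().
INT_COLUMNS = ['Age', 'Fnlwgt', 'Education-num',
               'Capital-gain', 'Capital-loss', 'Hours-per-week']

def load_dataset(f):
    """Parse rows from a file-like object, skipping malformed rows and rows with '?'."""
    dataset = []
    # skipinitialspace=True drops the space after each comma, so ' ?' becomes '?'.
    for line in csv.reader(f, delimiter=',', skipinitialspace=True):
        # Blank lines ([]) and rows of the wrong width would break zip() and int().
        if len(line) != len(HEADER):
            continue
        # Iterate over the cells directly instead of indexing with range(len(header)).
        # Here question_mark is True when a '?' was found (the original used the reverse).
        question_mark = False
        for cell in line:
            if cell == '?':
                question_mark = True
                break
        if question_mark:
            continue
        d = dict(zip(HEADER, line))
        for col in INT_COLUMNS:
            d[col] = int(d[col])
        dataset.append(d)
    return dataset

# Made-up sample in the ", "-separated layout: the second row contains '?',
# and the last line is blank.
SAMPLE = (
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "54, ?, 180211, Some-college, 10, Married-civ-spouse, ?, Husband, "
    "Asian-Pac-Islander, Male, 0, 0, 60, South, >50K\n"
    "\n"
)

with io.StringIO(SAMPLE) as f:
    dataset = load_dataset(f)
print(len(dataset))       # 1
print(dataset[0]['Age'])  # 39
```

To use it on the real file, pass the handle from `open("./adult_data.csv")` instead of the `StringIO` object.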
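As Furas pointed out in the comments, the inner loop can be simplified further to
question_mark = (' ?' in line)
Note that this value is True when the row does contain ' ?', the opposite of the original flag (which started as True and was set to False on a match), so the check that follows has to become if not question_mark: for the same rows to be kept.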
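One thing to watch with the `in` test: it compares whole cells, so it matches only a cell that is exactly `' ?'`. It will miss `'?'` with no leading space, which is how the sample rows above are shown. A minimal sketch that accepts either form, using a hypothetical helper `has_question_mark`:

```python
def has_question_mark(line):
    """True if any cell is '?' once surrounding whitespace is removed."""
    return any(cell.strip() == '?' for cell in line)

rows = [
    ['54', '?', '180211'],         # '?' without a leading space
    ['54', ' ?', '180211'],        # '?' after a ", " separator
    ['39', 'State-gov', '77516'],
    [],                            # what csv.reader yields for a blank line
]
for line in rows:
    print(' ?' in line, has_question_mark(line))
# False True
# True True
# False False
# False False
```

A blank row passes both tests, so rows whose length differs from len(header) still need to be skipped separately before the dict(zip(header, line)) step.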