How do I extract a NumPy array from a CSV file?
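For an assignment, I have to use NumPy to extract data from a CSV file. The file contains many rows, but the first row holds the column labels and looks like
label,pixel1,pixel2,pixel3,...,pixel784
so it should be ignored.
Each following row contains a label in its first cell (an integer somewhere between 1 and 10, I believe), and the next 784 cells contain the actual pixel values. These values must be reshaped into a 28x28 array.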
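As a minimal sketch of that reshaping step: the row below is made up (a hypothetical label followed by 784 synthetic pixel strings), and it only illustrates how `reshape` turns a flat sequence of 784 values into a 28x28 array.

```python
import numpy as np

# Hypothetical stand-in for one CSV row: a label followed by 784 pixel values.
row = ["3"] + [str(i % 256) for i in range(784)]

label = row[0]
pixels = np.array(row[1:785])   # 1-D array holding 784 strings
image = pixels.reshape(28, 28)  # same values, arranged as 28 rows of 28

print(image.shape)
```

The values are still strings at this point; converting them to numbers is a separate step.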
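The function should return two np.array objects, one with the labels and one with the images, and the printed shapes should look like this:
(27455, 28, 28)
(27455,)
(7172, 28, 28)
(7172,)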
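The code I have so far is at the end of this post. I managed to get the pixel values into a 28x28 array (I think), but I am not sure where to go from there. The assignment suggests using np.array().astype, because I need to convert the values to floats.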
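For illustration only, here is a small sketch of that conversion on made-up values (the three strings are not from the assignment data). `astype` is a method on the array, and it returns a new array with the requested dtype.

```python
import numpy as np

# Made-up values standing in for strings read from a CSV row.
as_strings = np.array(["0", "127", "255"])

# astype returns a new array; the original string array is left unchanged.
as_floats = as_strings.astype(np.float32)

print(as_strings.dtype)  # a Unicode string dtype
print(as_floats.dtype)   # float32
```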
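I have never used arrays in NumPy before, so I am not sure how to work with them. Did I do the first part correctly? How do I return the images and the labels?
(I am trying to understand all the concepts and tips, so please stay within the scope of the assignment when answering. I am already struggling with this, so I am not looking for other possible solutions!)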
def get_data(filename):
    # You will need to write code that will read the file passed
    # into this function. The first line contains the column headers
    # so you should ignore it
    # Each successive line contains 785 comma separated values between 0 and 255
    # The first value is the label
    # The rest are the pixel values for that picture
    # The function will return 2 np.array types. One with all the labels
    # One with all the images
    #
    # Tips:
    # If you read a full line (as 'row') then row[0] has the label
    # and row[1:785] has the 784 pixel values
    # Take a look at np.array_split to turn the 784 pixels into 28x28
    # You are reading in strings, but need the values to be floats
    # Check out np.array().astype for a conversion
    with open(filename, 'r') as training_file:
        # Your code starts here
        #training_file.readline()
        csv_reader = csv.reader(training_file)
        header = next(csv_reader)
        if header != None:
            for row in csv_reader:
                images = np.array_split(row[1:], 28)
        # Your code ends here
    return images, labels
path_sign_mnist_train = f"{getcwd()}/../tmp2/sign_mnist_train.csv"
path_sign_mnist_test = f"{getcwd()}/../tmp2/sign_mnist_test.csv"
training_images,training_labels = get_data(path_sign_mnist_train)
testing_images,testing_labels = get_data(path_sign_mnist_test)
# Keep these
print(training_images.shape)
print(training_labels.shape)
print(testing_images.shape)
print(testing_labels.shape)
# Their output should be:
# (27455, 28, 28)
# (27455,)
# (7172, 28, 28)
# (7172,)
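A note on the attempt above, as a hedged sketch on made-up data: `np.array_split` returns a plain Python list of sub-arrays, and assigning to `images` inside the loop keeps only the last row that was read. The two synthetic rows below are assumptions for illustration; they show that collecting each row in a list and stacking at the end preserves every image.

```python
import numpy as np

# Two made-up rows of 784 pixel strings each (row 0 is all "0", row 1 all "1").
rows = [[str(r)] * 784 for r in range(2)]

# np.array_split returns a Python list of 28 arrays, each of length 28.
chunks = np.array_split(rows[0], 28)
print(type(chunks), len(chunks), chunks[0].shape)

# Assigning inside the loop overwrites the name, so only the last row survives.
for row in rows:
    images = np.array_split(row, 28)

# Appending to a list keeps every row; np.array then stacks them.
collected = []
for row in rows:
    collected.append(np.array_split(row, 28))
stacked = np.array(collected)
print(stacked.shape)  # (2, 28, 28)
```

The attempt also never assigns `labels`, so the final `return images, labels` would raise a `NameError` even once the images are handled.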
Solution
I think this should solve the problem:
import csv
import numpy as np

def get_data(filename):
    # You will need to write code that will read the file passed
    # into this function. The first line contains the column headers
    # so you should ignore it
    # Each successive line contains 785 comma separated values between 0 and 255
    # The first value is the label
    # The rest are the pixel values for that picture
    # The function will return 2 np.array types. One with all the labels
    # One with all the images
    #
    # Tips:
    # If you read a full line (as 'row') then row[0] has the label
    # and row[1:785] has the 784 pixel values
    # Take a look at np.array_split to turn the 784 pixels into 28x28
    # You are reading in strings, but need the values to be floats
    # Check out np.array().astype for a conversion
    with open(filename, "r") as training_file:
        # Your code starts here
        # training_file.readline()
        csv_reader = csv.reader(training_file)
        # Skip the header row; the None default avoids StopIteration on an empty file
        next(csv_reader, None)
        images = []
        labels = []
        for row in csv_reader:
            # row[0] is the label, row[1:] holds the 784 pixel strings
            images.append(np.array(row[1:]).reshape(28, 28))
            labels.append(row[0])
        # Stack the per-row results and convert the strings to floats
        images = np.array(images).astype(np.float32)
        labels = np.array(labels).astype(np.float32)
        # Your code ends here
    return images, labels
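To check the shapes without the real dataset, the approach above can be run against a small synthetic file. This is only a sketch: `write_fake_csv`, the file name `fake_sign_mnist.csv`, and the generated values are all hypothetical, and `get_data` is repeated in condensed form so the block runs on its own.

```python
import csv
import os
import tempfile

import numpy as np


def get_data(filename):
    """Read a label column plus 784 pixel columns, skipping the header row."""
    with open(filename, "r", newline="") as training_file:
        csv_reader = csv.reader(training_file)
        next(csv_reader, None)  # skip the header row
        images, labels = [], []
        for row in csv_reader:
            labels.append(row[0])
            images.append(np.array(row[1:785]).reshape(28, 28))
    return (np.array(images).astype(np.float32),
            np.array(labels).astype(np.float32))


def write_fake_csv(path, n_rows):
    """Hypothetical helper: write a header and n_rows of made-up data."""
    header = ["label"] + [f"pixel{i}" for i in range(1, 785)]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for r in range(n_rows):
            # Label r, followed by 784 synthetic pixel values in 0..255.
            writer.writerow([r] + [(r + i) % 256 for i in range(784)])


with tempfile.TemporaryDirectory() as tmp_dir:
    fake_path = os.path.join(tmp_dir, "fake_sign_mnist.csv")
    write_fake_csv(fake_path, n_rows=5)
    images, labels = get_data(fake_path)

print(images.shape)  # (5, 28, 28)
print(labels.shape)  # (5,)
```

With the real training and test files, the same function should report 27455 and 7172 as the first dimension, as the assignment expects.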