CSV Cheatsheet
Read
Read from a .csv file:
import csv
reader = csv.reader(open('input.csv', newline=''), delimiter=',')
or
import csv
with open('input.csv', newline='') as f:
    reader = csv.reader(f, delimiter=',')
To skip the first line (header):
next(reader, None)
Print all rows:
for row in reader:
    print(row)
Read each row as a list of key-value pairs:
with open('foo.csv', newline='') as f:
    reader = csv.reader(f, delimiter='|')
    header = next(reader)
    for row in reader:
        print(list(zip(header, row)))
or as a dict:
with open('foo.csv', newline='') as f:
    reader = csv.reader(f, delimiter='|')
    header = next(reader)
    for row in reader:
        print(dict(zip(header, row)))
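The standard library's csv.DictReader does the same header-to-row pairing without the manual zip. A minimal sketch, with made-up in-memory data standing in for foo.csv:

```python
import csv
import io

# In-memory stand-in for a pipe-delimited file such as foo.csv (sample data is made up).
sample = io.StringIO("id|name\n1|alice\n2|bob\n")

# DictReader consumes the first row as the header and yields one dict per data row.
reader = csv.DictReader(sample, delimiter='|')
rows = [dict(row) for row in reader]
print(rows)
```

All values come back as strings; convert them yourself if you need numbers.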
Write
import csv
writer = csv.writer(open('output.csv', 'w', newline=''), delimiter=',')
writer.writerow(['a','b','c'])
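A variant that closes the file reliably and writes several rows at once. This is only a sketch: the temp-directory path and the sample rows are invented for illustration.

```python
import csv
import os
import tempfile

# Hypothetical output location; a temp directory keeps the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), 'output.csv')

# The with block flushes and closes the file when it ends.
with open(path, 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    writer.writerow(['a', 'b', 'c'])          # header row
    writer.writerows([[1, 2, 3], [4, 5, 6]])  # several data rows at once

# Read the file back to check what was written.
with open(path, newline='') as f:
    written = list(csv.reader(f))
print(written)
```

Numbers are written as text, so they read back as strings.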
Troubleshooting
Error message:
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
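Solution:
On Python 2, open the file with mode 'rU' instead of 'rb'.
On Python 3, open the file in text mode with newline='' (the 'U' mode flag was removed in Python 3.11).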
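A small sketch of reading a file with bare carriage-return line endings on Python 3, opened with newline='' so the csv module sees the original line endings. The file name and contents are invented for illustration.

```python
import csv
import os
import tempfile

# Create a sample file whose lines end in a bare '\r' (old Mac style).
path = os.path.join(tempfile.mkdtemp(), 'cr_only.csv')
with open(path, 'wb') as f:
    f.write(b'a,b\r1,2\r')

# newline='' passes line endings through untranslated; the csv reader
# recognises '\r', '\n' and '\r\n' as row terminators on its own.
with open(path, newline='') as f:
    rows = list(csv.reader(f))
print(rows)
```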
Read CSV From HDFS
import csv
import subprocess
result = subprocess.run(['hadoop', 'fs', '-text', '/path/to/data/part*'], stdout=subprocess.PIPE)
lines = result.stdout.decode().strip().split("\n")
reader = csv.reader(lines)
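The decode-and-split step can be tried without a Hadoop cluster. A minimal sketch, assuming the command output is UTF-8 and has no quoted fields containing newlines; rows_from_output is a hypothetical helper name and fake_stdout stands in for result.stdout:

```python
import csv

def rows_from_output(raw: bytes, delimiter: str = ','):
    """Parse the captured stdout of a command into CSV rows (hypothetical helper)."""
    # splitlines() handles \n and \r\n alike and leaves no empty trailing entry.
    lines = raw.decode('utf-8').splitlines()
    return list(csv.reader(lines, delimiter=delimiter))

# Stand-in for result.stdout (sample data is made up).
fake_stdout = b'id,name\r\n1,alice\r\n2,bob\r\n'
rows = rows_from_output(fake_stdout)
print(rows)
```

This holds the whole output in memory, which is fine for small part files but not for large datasets.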