The Python CSV Module and Legacy Data
When you work with CSV files as much as I do, particularly CSV files created by legacy applications, you tend to run into odd problems. Consider the following legacy CSV data (a real example):
"this is","an, example","10","of problem data",20
Python's csv.reader turns this into a list of strings. Every field comes back as a string, including the unquoted 20:
[ "this is", "an, example", "10", "of problem data", "20" ]
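Here is a minimal sketch of the reading step. It assumes the legacy line is held in a string and passed to csv.reader through io.StringIO, rather than read from a file on disk:

```python
import csv
import io

# The legacy line from above, wrapped in an in-memory text file for csv.reader.
legacy = '"this is","an, example","10","of problem data",20\n'

# csv.reader yields one list of strings per record; take the first (only) one.
row = next(csv.reader(io.StringIO(legacy)))
print(row)
# ['this is', 'an, example', '10', 'of problem data', '20']
```

The reader returns only the field values. It keeps no record of which fields were quoted in the input, so "10" and 20 come back looking exactly alike.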
No problem so far. Now let's complete the round trip and use the csv writer to turn this same list back into CSV data. Without taking any special precautions, we get:
this is,"an, example",10,of problem data,20
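The round trip with default settings can be sketched like this. It assumes an in-memory buffer and sets lineterminator="\n" in place of the writer's default "\r\n", which only makes the output easier to compare:

```python
import csv
import io

# The row exactly as csv.reader returned it: all strings.
row = ['this is', 'an, example', '10', 'of problem data', '20']

buf = io.StringIO()
# Default quoting is csv.QUOTE_MINIMAL; lineterminator="\n" is an assumption for readability.
csv.writer(buf, lineterminator="\n").writerow(row)
print(buf.getvalue(), end="")
# this is,"an, example",10,of problem data,20
```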
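What happened here? By default (quoting=csv.QUOTE_MINIMAL), the csv writer quotes a field only when it contains a special character such as the delimiter, the quote character, or a line break. We can get closer to what we want (that is, recreating the original CSV data) by passing quoting=csv.QUOTE_NONNUMERIC to the writer. That setting quotes every field that is not a number, so it only helps once the numeric fields have been converted from strings to numbers (for example, with int()). When we do, we get: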
"this is","an, example",10,"of problem data",20
Closer, but the third field, which was quoted in the original data, is now unquoted. We could try quoting=csv.QUOTE_ALL, which would quote the third field. Unfortunately, it would also quote the fifth field, which was unquoted in the original data.
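The sketch below compares the two quoting modes. It assumes the numeric-looking fields may or may not have been converted to int before writing. render_row is a hypothetical helper, not part of the csv module:

```python
import csv
import io

def render_row(row, quoting):
    """Hypothetical helper: write one row with the given quoting mode, return it as text."""
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting, lineterminator="\n").writerow(row)
    return buf.getvalue()

# As returned by csv.reader: every field is a string.
strings = ['this is', 'an, example', '10', 'of problem data', '20']
# Assumption: the two numeric-looking fields have been converted with int().
numbers = ['this is', 'an, example', 10, 'of problem data', 20]

# QUOTE_NONNUMERIC quotes every non-number, so all-string rows get fully quoted.
print(render_row(strings, csv.QUOTE_NONNUMERIC), end="")
# "this is","an, example","10","of problem data","20"
# With real ints, only the numbers are left bare.
print(render_row(numbers, csv.QUOTE_NONNUMERIC), end="")
# "this is","an, example",10,"of problem data",20
# QUOTE_ALL quotes every field, numbers included.
print(render_row(numbers, csv.QUOTE_ALL), end="")
# "this is","an, example","10","of problem data","20"
```

In every mode the quoting decision is made by one rule applied uniformly across the row. No setting lets you leave the fifth field bare while quoting the third.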
What I need is a way of controlling quoting on a field-by-field basis. Sadly, Python's csv module doesn't give me that level of control: the writer applies the same quoting rule to every field in a row. So when I have to deal with legacy CSV data like the example above, I'm forced to bypass the csv module for writing and roll my own. I can still use the csv module for reading.
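A hand-rolled writer along these lines might look like the following minimal sketch. It is not the author's code. The function names write_legacy_row and quote_field, and the quote_flags list that says which fields to quote, are all hypothetical. The sketch assumes the caller already knows the legacy layout, because csv.reader does not report which fields were quoted. It also assumes a comma delimiter, a double-quote quote character, and "\n" line endings:

```python
import io

def quote_field(value, quote):
    """Render one field, quoting it if requested or if it cannot safely be left bare."""
    text = str(value)
    # A field containing the delimiter, a quote, or a line break must be quoted to parse back.
    must_quote = any(ch in text for ch in ',"\r\n')
    if quote or must_quote:
        # Double embedded quotes, matching the csv module's default doublequote=True.
        return '"' + text.replace('"', '""') + '"'
    return text

def write_legacy_row(out, row, quote_flags):
    """Write row to out, quoting field i when quote_flags[i] is true (hypothetical API)."""
    if len(row) != len(quote_flags):
        raise ValueError("row and quote_flags must have the same length")
    out.write(",".join(quote_field(v, q) for v, q in zip(row, quote_flags)) + "\n")

# Usage: the row as csv.reader returned it, plus an assumed per-field quoting layout.
row = ['this is', 'an, example', '10', 'of problem data', '20']
flags = [True, True, True, True, False]  # quote every field except the last
buf = io.StringIO()
write_legacy_row(buf, row, flags)
print(buf.getvalue(), end="")
# "this is","an, example","10","of problem data",20
```

The must_quote check overrides a False flag only when leaving the field bare would produce unparseable output. Otherwise the flags alone decide, which gives the exact round trip the csv writer could not.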
11:29:11 PM