Trivial Thoughts

Thoughts and discussion on programming projects using the Python language.

Friday, April 23, 2004

The Python CSV Module and Legacy Data

When you work with csv files as much as I do, particularly csv files created by legacy applications, you tend to run into the odd problem. Consider the following legacy csv data (a real example):

"this is","an, example","10","of problem data",20

The reader of Python's csv module will turn this into a list like so (every field comes back as a string, whether or not it was quoted in the file):

['this is', 'an, example', '10', 'of problem data', '20']
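
The read step can be checked with a minimal sketch like the one below. It assumes Python 3 and uses io.StringIO to stand in for the legacy file; the variable names are only for illustration.

```python
import csv
import io

# The legacy line from above, wrapped in StringIO so it behaves like a file.
legacy_line = '"this is","an, example","10","of problem data",20\n'

rows = list(csv.reader(io.StringIO(legacy_line)))
fields = rows[0]

# Every field is a str; the reader does not record which ones were quoted.
print(fields)
# ['this is', 'an, example', '10', 'of problem data', '20']
```

Note that the information about which fields were quoted is already gone at this point, which is what makes the round trip awkward.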

No problem so far. Now, let's use the csv writer to turn this same data back into csv data again, round trip. Without taking any special precautions, we would get:

this is,"an, example",10,of problem data,20
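
A minimal sketch of that default round trip, assuming Python 3. The lineterminator is set to "\n" only to keep the output easy to compare; the writer's own default is "\r\n".

```python
import csv
import io

fields = ["this is", "an, example", "10", "of problem data", "20"]

buffer = io.StringIO()
# No quoting argument, so the writer uses its default, csv.QUOTE_MINIMAL.
csv.writer(buffer, lineterminator="\n").writerow(fields)
default_output = buffer.getvalue()

print(default_output, end="")
# this is,"an, example",10,of problem data,20
```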

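What happened here? Well, by default the csv writer only quotes a field when it contains a special character such as the field separator, the quote character, or a line break. We can get closer to what we want (that is, recreating the original csv data) by passing QUOTE_NONNUMERIC as the quoting parameter to the writer. Provided the numeric fields have been converted to numbers first (anything that is still a string gets quoted), we get: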

"this is","an, example",10,"of problem data",20
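
The effect of QUOTE_NONNUMERIC depends on the Python types handed to the writer, which the following sketch tries to show (Python 3 assumed; write_nonnumeric is a hypothetical helper, not part of the csv module).

```python
import csv
import io


def write_nonnumeric(fields):
    """Write one row with QUOTE_NONNUMERIC and return it as a string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerow(fields)
    return buffer.getvalue()


# Straight from the reader: everything is a str, so everything is quoted.
as_strings = write_nonnumeric(["this is", "an, example", "10", "of problem data", "20"])
# "this is","an, example","10","of problem data","20"

# After converting the numeric fields to int, they are written unquoted.
converted = write_nonnumeric(["this is", "an, example", 10, "of problem data", 20])
# "this is","an, example",10,"of problem data",20
```

Either way the choice is made per type, not per field, so the quoted "10" and the bare 20 cannot both survive.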

Closer, but the third field, which was quoted in the original data, is now unquoted. We could try QUOTE_ALL instead, which would quote the third field, but unfortunately it would also quote the fifth field, which was not quoted in the original data.
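
For completeness, a short sketch of the QUOTE_ALL variant under the same Python 3 assumption:

```python
import csv
import io

fields = ["this is", "an, example", "10", "of problem data", "20"]

buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerow(fields)
quote_all_output = buffer.getvalue()

# Every field is quoted, including the fifth one that was bare in the original.
print(quote_all_output, end="")
# "this is","an, example","10","of problem data","20"
```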

What I need is a way of controlling quoting on a field-by-field basis. Sadly, Python's csv module doesn't give me that level of control. So when I have to deal with legacy csv data like that above, I'm forced to bypass the csv module for writing and roll my own writer. I can still use the csv module for reading.
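
One way such a hand-rolled writer might look is sketched below. This is not the code I actually use, just an illustration in Python 3: format_row and its quote_flags argument are hypothetical names, and the list of flags has to come from knowledge of the legacy format, since the reader does not report which fields were quoted.

```python
import csv


def format_row(fields, quote_flags, delimiter=",", quotechar='"'):
    """Join fields into one csv line, quoting each field where its flag is True."""
    if len(fields) != len(quote_flags):
        raise ValueError("fields and quote_flags must have the same length")
    parts = []
    for value, quoted in zip(fields, quote_flags):
        text = str(value)
        # Quote when asked to, or when a bare field would corrupt the row.
        unsafe = any(ch in text for ch in (delimiter, quotechar, "\r", "\n"))
        if quoted or unsafe:
            # Embedded quote characters are escaped by doubling them.
            text = quotechar + text.replace(quotechar, quotechar * 2) + quotechar
        parts.append(text)
    return delimiter.join(parts)


fields = ["this is", "an, example", "10", "of problem data", "20"]
legacy_flags = [True, True, True, True, False]  # assumed layout of the legacy file

line = format_row(fields, legacy_flags)
print(line)
# "this is","an, example","10","of problem data",20

# The csv reader still parses the hand-written line back to the same fields.
assert next(csv.reader([line])) == fields
```

The sketch quotes a field even when its flag is False if leaving it bare would break the row, which seemed the safer default for data headed back to a legacy application.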

11:29:11 PM
