JSON and CSV: Converting Between Formats for Analysis and Import
JSON and CSV serve different purposes — and converting between them correctly requires handling nested objects, arrays, special characters, and encoding. This guide covers the conversion mechanics, common failure points, and the tools that handle edge cases automatically.
JSON and CSV are the two most common data interchange formats in software development. APIs return JSON. Spreadsheets import CSV. Databases export both. When you need to move data between systems — from an API response to a spreadsheet, from a database export to a data pipeline, from a form submission to a reporting tool — you almost always need to convert between these formats.
Converting between JSON and CSV is conceptually simple and practically tricky. The formats have fundamentally different shapes: JSON is hierarchical, CSV is flat. The conversion always involves a mapping decision — and if you do not make that decision explicitly, tools will make it for you in ways that may silently lose data.
This guide covers how both formats work, how conversion works in both directions, what can go wrong, and how to handle the edge cases that break naive converters.
JSON: Hierarchical, Schema-Optional
JSON (JavaScript Object Notation) represents data as a hierarchy of objects, arrays, strings, numbers, booleans, and nulls:
[
{
"id": "u_123",
"name": "Alice Chen",
"email": "alice@example.com",
"role": "admin",
"profile": {
"timezone": "UTC+8",
"language": "en"
},
"tags": ["beta", "enterprise"]
},
{
"id": "u_456",
"name": "Bob Kowalski",
"email": "bob@example.com",
"role": "viewer",
"profile": {
"timezone": "UTC+1",
"language": "pl"
},
"tags": ["enterprise"]
}
]
JSON arrays of objects — like the example above — are the natural input for CSV conversion. Each object maps to a row, each key maps to a column.
CSV: Flat, Column-Ordered
CSV (Comma-Separated Values) represents tabular data as rows of values separated by delimiters:
id,name,email,role,timezone,language,tags
u_123,Alice Chen,alice@example.com,admin,UTC+8,en,"beta,enterprise"
u_456,Bob Kowalski,bob@example.com,viewer,UTC+1,pl,enterprise
CSV is defined by RFC 4180, though implementations vary significantly. The rules that matter:
- Each record is on a separate line (CRLF line endings in the spec, LF in practice)
- The first record may be a header row (usually is)
- Fields are separated by commas (or semicolons in European locales, tabs in TSV)
- Fields containing commas, double quotes, or line breaks must be enclosed in double quotes
- A double quote inside a quoted field is escaped by doubling it:
"She said ""hello"""
JSON to CSV: The Flattening Problem
Converting JSON to CSV requires mapping a hierarchy to a table. For simple objects (one level deep, no arrays), this is straightforward: each key becomes a column header, each value becomes a cell.
The problem is nested objects and arrays.
Nested objects
{
"id": "u_123",
"profile": {
"timezone": "UTC+8",
"language": "en"
}
}
Option 1: Flatten with dot notation
id,profile.timezone,profile.language
u_123,UTC+8,en
The nested object becomes columns with dot-separated names. This is the most common approach — readable, reversible, and compatible with most spreadsheet tools.
Option 2: JSON-encode the nested object
id,profile
u_123,"{""timezone"":""UTC+8"",""language"":""en""}"
The nested object is serialised as a JSON string in the cell. Preserves structure but the cell is unreadable in a spreadsheet and requires parsing on import.
Option 3: Ignore nested objects
Some converters simply omit nested objects. This silently loses data — a dangerous default.
Arrays
{
"id": "u_123",
"tags": ["beta", "enterprise"]
}
Option 1: JSON-encode the array
id,tags
u_123,"[""beta"",""enterprise""]"
Option 2: Comma-join the values
id,tags
u_123,"beta,enterprise"
The joined value in a quoted cell is readable in a spreadsheet but loses the distinction between a multi-value field and a string that contains a comma.
Option 3: Expand to multiple columns
id,tags[0],tags[1]
u_123,beta,enterprise
Works if all arrays have the same length. Breaks down with variable-length arrays.
Option 4: Expand to multiple rows
id,tag
u_123,beta
u_123,enterprise
Relational normalisation — each tag gets its own row. Requires the consumer to understand the relationship and group by ID.
None of these options is universally correct. The right choice depends on what the target system expects. Most CSV converters default to option 1 or 2. Use the JSON to CSV tool to see exactly how conversion is applied and adjust the output to match your requirements.
CSV to JSON: The Typing Problem
Converting CSV to JSON is simpler in one direction — everything starts as a string, and you always get an array of objects. But CSV gives you no type information, so every value arrives as a string:
id,age,active,score
1,25,true,9.5
Naive conversion:
[
{
"id": "1",
"age": "25",
"active": "true",
"score": "9.5"
}
]
Everything is a string. Code that expects user.age > 18 will fail, because "25" > 18 is unreliable in JavaScript (coercion-dependent) and throws a type error in strongly-typed languages.
Type inference solves this by inspecting values and converting where unambiguous:
[
{
"id": 1,
"age": 25,
"active": true,
"score": 9.5
}
]
Type inference rules that are usually safe:
- If all values in a column parse as integers, convert to integer
- If all values parse as floats, convert to float
- If all values are
trueorfalse(case-insensitive), convert to boolean - If a value is empty, convert to
null(not"") - Everything else stays as string
Use the CSV to JSON tool to inspect how type inference is applied to your specific data before piping it into production code.
Handling Special Characters
Both formats have characters that require escaping:
In CSV
Commas in values — the value must be quoted:
name,address
"Smith, John","123 Main St, Springfield"
Quotes in values — doubled inside a quoted field:
name,quote
Alice,"She said ""Hello, world!"""
Newlines in values — the value must be quoted (the record spans multiple lines):
id,description
1,"Line one
Line two
Line three"
This is valid CSV but breaks parsers that assume one record per line. If you control the data, sanitise newlines before generating CSV.
Special characters to watch:
| Character | Problem | Fix |
|---|---|---|
, |
Delimiter confusion | Quote the field |
" |
Unescaped quote | Double it: "" |
\n, \r\n |
Multi-line record | Quote the field |
; |
Some locales use semicolons as delimiters | Know your delimiter |
BOM () |
Excel adds UTF-8 BOM, some parsers choke | Strip or handle BOM |
In JSON
JSON has its own escaping requirements inside strings:
| Character | Escaped form |
|---|---|
" |
\" |
\ |
\\ |
| Newline | \n |
| Tab | \t |
| Unicode | \uXXXX |
When embedding CSV content as a JSON string value, both layers of escaping apply:
{
"csv": "name,quote\nAlice,\"She said \\\"hello\\\"\""
}
Delimiter Variants
CSV uses commas by default, but real-world files use several delimiters:
| Format | Delimiter | When you see it |
|---|---|---|
| CSV | , |
Most tools, English locales |
| DSV / semicolon-separated | ; |
European locales (comma is decimal separator) |
| TSV | \t (tab) |
Database exports, log files |
| Pipe-separated | | |
Legacy systems, when values contain commas |
Excel in European locales (French, German, Dutch) generates semicolon-separated files when saving as CSV, because the comma is the decimal separator in those locales. If you receive a "CSV" file that looks wrong in a comma-based parser, try semicolon.
Encoding: UTF-8 and the Excel Problem
Modern CSV should be UTF-8. But Excel has a decades-long habit of:
- Not adding a UTF-8 BOM when saving CSV from non-Latin input
- Adding a UTF-8 BOM (
\xEF\xBB\xBF) when explicitly saving as "UTF-8 CSV" - Defaulting to the system encoding (CP-1252 on Windows, Mac Roman on older macOS) when saving from legacy workflows
The result: CSV files with non-Latin characters (Chinese, Arabic, Cyrillic, accented European characters) that look correct in Excel but display as garbage when read with a UTF-8 parser — or vice versa.
Diagnosis: open the file in a hex editor or run file -i yourfile.csv (Linux/macOS). Look for the byte sequence EF BB BF at the start (UTF-8 BOM) or check the declared encoding.
Fix on read:
// Node.js — strip BOM if present
const content = fs.readFileSync('data.csv', 'utf8').replace(/^/, '');
# Python — use utf-8-sig to auto-strip BOM
with open('data.csv', encoding='utf-8-sig') as f:
reader = csv.DictReader(f)
Fix on write (for Excel compatibility):
// Add BOM so Excel opens the file correctly
const csvWithBom = '' + csvContent;
fs.writeFileSync('output.csv', csvWithBom, 'utf8');
Large Files: Streaming vs. Loading
For small files (< 50 MB), loading the entire content into memory and converting is fine. For large files, streaming is necessary.
Streaming CSV to JSON in Node.js
const fs = require('fs');
const csv = require('csv-parser'); // npm install csv-parser
const results = [];
fs.createReadStream('large-file.csv')
.pipe(csv())
.on('data', (row) => {
results.push(row);
// Or process row-by-row without accumulating all rows:
// processRow(row);
})
.on('end', () => {
console.log(`Processed ${results.length} rows`);
});
Streaming CSV to JSON in Python
import csv
import json
with open('large-file.csv', encoding='utf-8-sig') as csvfile:
reader = csv.DictReader(csvfile)
with open('output.ndjson', 'w') as outfile:
for row in reader:
# Write one JSON object per line (NDJSON/JSON Lines format)
outfile.write(json.dumps(row) + '\n')
NDJSON (Newline-Delimited JSON) is a streaming-friendly alternative to JSON arrays — each line is a complete JSON object, so you can process files line-by-line without parsing the entire file.
Common Conversion Failures
Empty arrays produce no columns. If the first object in your JSON array has an empty tags: [], the CSV may have no tags column. Later objects with non-empty arrays have no column to write to.
Inconsistent schemas. If object 1 has {"a": 1, "b": 2} and object 2 has {"a": 1, "c": 3}, the CSV needs columns for both b and c. Some converters only use the keys from the first object and silently drop c on row 2.
Null vs missing. {"name": null} and {} (missing key) produce the same empty cell in CSV. On the return trip, both come back as empty strings. You lose the distinction.
Numbers stored as strings in Excel. Excel applies number formatting to cells that look like numbers. A value like "001" (padded ID) becomes 1 when Excel opens the CSV. Prefix with a single quote in Excel, or encode the value with a non-numeric character.
Date formatting. CSV has no native date type. 2026-05-22 is a string — if you write it as 22/05/2026 in one file and 05/22/2026 in another, any parser that tries to infer dates will fail silently on one or both.
Use the CSV to JSON and JSON to CSV tools to validate conversion with your real data before writing any code — it is much faster to catch these issues interactively than to debug them after the fact.
Programmatic Conversion Reference
JavaScript / Node.js
// JSON array → CSV string
function jsonToCsv(data) {
if (!data.length) return '';
const headers = Object.keys(data[0]);
const rows = data.map(row =>
headers.map(h => {
const val = row[h] ?? '';
const str = typeof val === 'object' ? JSON.stringify(val) : String(val);
return str.includes(',') || str.includes('"') || str.includes('\n')
? `"${str.replace(/"/g, '""')}"` : str;
}).join(',')
);
return [headers.join(','), ...rows].join('\n');
}
// CSV string → JSON array
function csvToJson(csv) {
const lines = csv.trim().split('\n');
const headers = lines[0].split(',');
return lines.slice(1).map(line => {
const values = line.split(',');
return Object.fromEntries(headers.map((h, i) => [h.trim(), values[i]?.trim() ?? '']));
});
}
Python
import csv, json, io
def json_to_csv(data: list[dict]) -> str:
if not data: return ''
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=data[0].keys())
writer.writeheader()
writer.writerows(data)
return output.getvalue()
def csv_to_json(csv_content: str) -> list[dict]:
reader = csv.DictReader(io.StringIO(csv_content))
return list(reader)
The Python csv module handles all RFC 4180 quoting and escaping automatically. For production code, prefer the standard library over manual string splitting.