JSON vs YAML vs CSV vs XML: Pick the Right Format
Each data format exists for a reason. Here's when to reach for each one, how to convert between them, and the gotchas that trip people up.
Every data format is a set of tradeoffs. JSON won the web API wars but is a pain for config files. YAML is great for config but will silently turn your version number into a float. CSV is dead simple until you hit a comma inside a field. XML is verbose but still runs half the enterprise world.
Here's when to reach for each one.
JSON: The Default
JSON (JavaScript Object Notation) is the lingua franca of web APIs. It supports objects, arrays, strings, numbers, booleans, and null. It's native to JavaScript and has first-class parsers in every language.
{
  "name": "deployment-config",
  "replicas": 3,
  "env": {
    "NODE_ENV": "production",
    "LOG_LEVEL": "warn"
  },
  "ports": [8080, 8443]
}
Strengths
- Universal support. Every language, every platform, every API.
- Clear, unambiguous syntax. No indentation-sensitivity surprises.
- Great tooling -- formatters, validators, schema generators everywhere.
- Native to browsers. JSON.parse() is fast.
Weaknesses
- No comments. You literally cannot annotate your config files. This is the biggest complaint about JSON for configuration.
- Verbose for deeply nested structures. Lots of braces and quotes.
- No trailing commas. Adding a line to an array means modifying two lines in your diff.
- No native date type. Dates are strings, and everyone argues about the format.
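Two of these limits are easy to see with Python's standard json module. This is a minimal sketch: the sample payload and the choice of ISO 8601 for the date are assumptions for illustration, not something JSON prescribes.

```python
import json
from datetime import date

# A trailing comma is fine in a Python or JavaScript literal, but not in JSON.
try:
    json.loads('{"ports": [8080, 8443,]}')
    trailing_comma_ok = True
except json.JSONDecodeError:
    trailing_comma_ok = False

# JSON has no date type, so a date has to travel as a string.
# ISO 8601 is an assumed convention here; JSON itself does not mandate one.
payload = json.dumps({"deployed_on": date(2024, 1, 15).isoformat()})
decoded = json.loads(payload)
restored = date.fromisoformat(decoded["deployed_on"])

print(trailing_comma_ok)             # False
print(type(decoded["deployed_on"]))  # <class 'str'>
print(restored)                      # 2024-01-15
```

The parser hands back a plain string, so both sides have to agree on the date format out of band.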
Keep your JSON clean with the JSON formatter and catch syntax errors early with the JSON validator.
YAML: Config Files Done Right (Usually)
YAML is what happens when you take JSON and optimize it for humans writing config files. It supports comments, is less verbose, and reads more naturally. Kubernetes manifests, Docker Compose files, GitHub Actions workflows, CI/CD pipelines -- YAML is everywhere in DevOps.
# Deployment configuration
name: deployment-config
replicas: 3
env:
  NODE_ENV: production
  LOG_LEVEL: warn
ports:
  - 8080
  - 8443
The gotchas that will bite you
YAML has some infamous quirks that catch people off guard:
# The Norway problem
country: NO        # parsed as boolean false, not the string "NO"
country: "NO"      # this is the string "NO"

# Version numbers
version: 3.10      # parsed as float 3.1, not string "3.10"
version: "3.10"    # this keeps the trailing zero

# Yes/no are booleans
feature_flag: yes    # parsed as boolean true
feature_flag: "yes"  # this is the string "yes"

# Indentation matters
parent:
  child: value     # 2-space indent = nested under parent
 child: value      # 1-space indent = YAML parse error
These aren't edge cases -- they're the kind of bugs that show up in production Kubernetes deployments. The YAML 1.2 spec fixed the boolean issue (only true/false), but many parsers still follow YAML 1.1.
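To see why the quotes matter, here is a toy model of how a YAML 1.1 style resolver decides the type of an unquoted scalar. It is a deliberately simplified sketch in plain Python, not a real YAML parser: the names `resolve` and `resolve_plain_scalar` are hypothetical, and real parsers cover many more cases (null, octal, timestamps, and so on).

```python
import re

# Simplified model of YAML 1.1 implicit typing for unquoted (plain) scalars.
_BOOL_TRUE = {"yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
_BOOL_FALSE = {"no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}
_INT = re.compile(r"[-+]?(0|[1-9][0-9]*)")
_FLOAT = re.compile(r"[-+]?([0-9]+\.[0-9]*|\.[0-9]+)")


def resolve_plain_scalar(text: str):
    """Return the value this toy resolver assigns to an unquoted scalar."""
    if text in _BOOL_TRUE:
        return True
    if text in _BOOL_FALSE:
        return False
    if _INT.fullmatch(text):
        return int(text)
    if _FLOAT.fullmatch(text):
        return float(text)
    return text  # anything else stays a string


def resolve(text: str, quoted: bool = False):
    """Quoted scalars skip implicit typing and are always strings."""
    return text if quoted else resolve_plain_scalar(text)


print(resolve("NO"))                 # False
print(resolve("NO", quoted=True))    # NO
print(resolve("3.10"))               # 3.1
print(resolve("3.10", quoted=True))  # 3.10
print(resolve("yes"))                # True
```

The takeaway is that the type is decided from the text alone, before your application ever sees the value. Quoting is the only way to opt out.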
Validate your YAML before deploying with the YAML validator. Need to convert between formats? The JSON to YAML converter handles the translation.
CSV: Flat Data, Maximum Compatibility
CSV is the simplest format here. Rows of data, comma-separated values, one record per line. It's the universal import/export format for spreadsheets, databases, and data pipelines.
name,email,role,start_date
Jane Doe,jane@example.com,admin,2024-01-15
"Smith, Bob",bob@example.com,user,2024-03-01
Alice Johnson,alice@example.com,editor,2024-02-20
Where CSV wins
- Tabular data exports from databases and analytics tools.
- Data interchange with non-technical users (everyone can open it in Excel).
- Streaming large datasets -- you can process line by line without loading everything into memory.
- Log files and time-series data where each row is an event.
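The streaming point can be sketched with Python's csv module, which reads one row at a time from any iterable of lines. The helper name `count_by_role` and the sample rows are made up for illustration.

```python
import csv
import io
from typing import Iterable


def count_by_role(lines: Iterable[str]) -> dict[str, int]:
    """Tally rows per role while holding only one row in memory at a time."""
    counts: dict[str, int] = {}
    for row in csv.DictReader(lines):
        counts[row["role"]] = counts.get(row["role"], 0) + 1
    return counts


# io.StringIO stands in for a large file; with a real file you would pass
# open(path, newline="", encoding="utf-8") instead.
sample = io.StringIO(
    "name,email,role,start_date\n"
    "Jane Doe,jane@example.com,admin,2024-01-15\n"
    "Alice Johnson,alice@example.com,editor,2024-02-20\n"
)
print(count_by_role(sample))  # {'admin': 1, 'editor': 1}
```

Because the reader pulls lines lazily, memory use stays flat no matter how many rows the file has.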
Where CSV falls apart
- No nested data. You can't represent an object with child objects. You either flatten the structure or use a different format.
- No types. Everything is a string. The consumer has to guess whether "42" is a number or a zip code.
- Quoting is a mess. What happens when a field contains a comma? Or a newline? Or a quote? RFC 4180 defines the rules, but not every implementation follows them.
- No standard encoding. On some systems Excel opens a CSV that has no byte-order mark using a legacy code page such as Latin-1 or Windows-1252. Your data pipeline reads the same file as UTF-8. Characters break.
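The quoting and typing problems are easy to reproduce. This sketch compares a naive split with Python's csv.reader on a row that contains a quoted comma; the sample values are invented for the demo.

```python
import csv
import io

line = '"Smith, Bob",user,2024-03-01'

# Naive splitting cuts the quoted field into two pieces.
naive = line.split(",")

# csv.reader applies the usual quoting rules and keeps the field whole.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive))   # 4
print(len(parsed))  # 3
print(parsed[0])    # Smith, Bob

# Every value comes back as a string; converting types is the consumer's job.
zip_row = next(csv.reader(io.StringIO("42,02134")))
print(zip_row)          # ['42', '02134']
print(int(zip_row[1]))  # 2134 (the leading zero is gone once you convert)
```

If you find yourself writing `split(",")` on CSV data, reach for a real CSV parser instead.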
If you need to transform CSV data into something more structured, the CSV to JSON converter handles the common case of turning rows into an array of objects.
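If you would rather script it, the same rows-to-objects conversion is a few lines with Python's standard library. This is a minimal sketch; `csv_to_json` is a hypothetical helper name, and it assumes the first row is a header.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)


csv_text = "name,role\nJane Doe,admin\nAlice Johnson,editor\n"
print(csv_to_json(csv_text))
```

Note that every value in the output is still a JSON string. If a column should be a number or a boolean, you have to convert it yourself before serializing.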
XML: Verbose, Powerful, Still Everywhere
XML gets a bad reputation in the JSON era, but it solves problems JSON doesn't. Namespaces, schemas (XSD), mixed content (text with inline markup), and processing instructions are all built in.
<?xml version="1.0" encoding="UTF-8"?>
<deployment xmlns="http://example.com/deploy">
  <name>deployment-config</name>
  <replicas>3</replicas>
  <env>
    <var name="NODE_ENV">production</var>
    <var name="LOG_LEVEL">warn</var>
  </env>
  <ports>
    <port>8080</port>
    <port>8443</port>
  </ports>
</deployment>
Where XML still dominates
- Enterprise integrations: SOAP APIs, XML-based EDI, healthcare (HL7 v3 and CDA, plus FHIR's XML representation), financial services (FIXML, XBRL). These aren't going away.
- Document formats: XHTML, SVG, RSS/Atom feeds, EPUB, Office Open XML (.docx, .xlsx).
- Configuration with schemas: Maven's pom.xml, Android manifests, Spring config. XSD validation catches errors before deployment.
- Mixed content: When you need text with inline markup (like HTML), XML is the natural fit. JSON can't do this cleanly.
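Mixed content is the least obvious item on this list, so here is a small sketch using Python's xml.etree.ElementTree. The sample paragraph is invented for the demo.

```python
import xml.etree.ElementTree as ET

# Mixed content: text interleaved with inline markup inside one element.
para = ET.fromstring("<p>Set <code>replicas</code> to <b>3</b> before deploying.</p>")

# ElementTree keeps the text before the first child in .text and the text
# that follows each child in that child's .tail.
print(repr(para.text))  # 'Set '
for child in para:
    print(child.tag, repr(child.text), repr(child.tail))
# code 'replicas' ' to '
# b '3' ' before deploying.'

# itertext() walks all the pieces in document order.
print("".join(para.itertext()))  # Set replicas to 3 before deploying.
```

Representing the same sentence in JSON means inventing your own convention, such as an array that alternates strings and objects, which is why markup-heavy documents tend to stay in XML.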
Format messy XML with the XML formatter.
When to Convert Between Formats
Format conversion comes up in predictable situations:
- API response to config file: You fetch JSON from an API and need it in YAML for a Kubernetes manifest. Convert with JSON to YAML.
- Spreadsheet to API: Someone gives you a CSV export and your system ingests JSON. Convert with CSV to JSON.
- Legacy integration: A partner sends XML but your microservice speaks JSON. Parse the XML, map the fields, serialize to JSON.
- Human review: You have a huge JSON blob and need to scan it quickly. YAML's cleaner indentation can make it more readable for review.
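The legacy integration case (parse the XML, map the fields, serialize to JSON) might look like the sketch below. It reuses the deployment document from the XML section; the function name `deployment_to_dict` and the `d` namespace prefix are assumptions made for this example.

```python
import json
import xml.etree.ElementTree as ET

XML_DOC = """
<deployment xmlns="http://example.com/deploy">
  <name>deployment-config</name>
  <replicas>3</replicas>
  <env>
    <var name="NODE_ENV">production</var>
    <var name="LOG_LEVEL">warn</var>
  </env>
  <ports>
    <port>8080</port>
    <port>8443</port>
  </ports>
</deployment>
"""

# Every element inherits the default namespace, so lookups must use it too.
NS = {"d": "http://example.com/deploy"}


def deployment_to_dict(xml_text: str) -> dict:
    """Parse the XML and map its fields onto a JSON-friendly dict."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("d:name", namespaces=NS),
        # XML text is untyped, so numeric fields are converted explicitly.
        "replicas": int(root.findtext("d:replicas", namespaces=NS)),
        "env": {v.get("name"): v.text for v in root.findall("d:env/d:var", NS)},
        "ports": [int(p.text) for p in root.findall("d:ports/d:port", NS)],
    }


print(json.dumps(deployment_to_dict(XML_DOC), indent=2))
```

The mapping step is where the real work happens: XML has no notion of numbers or arrays, so the code decides that `replicas` is an integer and that repeated `port` elements become a list.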
Decision Cheat Sheet
| Use Case | Format | Why |
|---|---|---|
| Web API | JSON | Universal support, native to JS |
| Config file (with comments) | YAML | Readable, supports comments |
| Tabular data / spreadsheet | CSV | Simple, universally importable |
| Enterprise / document markup | XML | Schemas, namespaces, mixed content |
| Config file (no comments needed) | JSON | Simpler parsing, no indent issues |
| Data pipeline / ETL | CSV or JSON Lines | Streamable, line-by-line processing |
There's no universally "best" format. JSON is the safe default for APIs. YAML is the safe default for config. CSV is the safe default for tabular data. XML is the safe default when your enterprise partner says "we use XML." Pick the one that fits your use case and move on.