Raw JSON data often needs reshaping before it can be useful. An API returns nested JSON with unnecessary fields; you need only specific keys. A log file contains JSON records; you need to extract metrics.
A configuration file has default values; you need to merge with overrides. Doing this in code (JavaScript loops, Python dict comprehensions) works but requires writing, testing, and maintaining scripts. Dedicated JSON transformation tools handle this declaratively: you describe the desired output, the tool produces it.
Two tools dominate: jq (a command-line processor written in C, the de facto standard for shell pipelines) and JSONata (an expression language whose reference implementation is JavaScript, so it runs in browsers and in Node.js). Both are worth learning for developers who work with JSON data. jq integrates naturally with shell scripts; JSONata reads more like JavaScript and can run in a web page. This guide covers both with practical examples for common transformation tasks.
jq Basics: Filtering and Selecting Data
jq is a command-line JSON processor written in C, available on all major platforms (brew install jq, apt install jq, choco install jq). Basic usage: pipe JSON into jq 'expression'. Select a key: echo '{"name": "Alice"}' | jq '.name' prints "Alice". Select a nested key: jq '.user.address.city'.

# jq - Command-line JSON transformation examples
# Input data.json:
# {"users":[{"id":1,"name":"Alice","active":true},
#           {"id":2,"name":"Bob","active":false}]}

# Filter active users
jq '.users[] | select(.active == true)' data.json

# Transform: extract id + uppercase name
jq '.users[] | {id, name: (.name | ascii_upcase)}' data.json

# Aggregate: count by status
jq '[.users[] | {status: (if .active then "active" else "inactive" end)}]
    | group_by(.status)
    | map({status: .[0].status, count: length})' data.json

// JSONata - JavaScript expression-based transformation
// Note: in jsonata 2.x, evaluate() returns a Promise, so it must be awaited.
const jsonata = require('jsonata');

const data = {
  order: { id: "ORD-001", items: [
    { sku: "A100", qty: 2, price: 19.99 },
    { sku: "B200", qty: 1, price: 49.50 }
  ]}
};

async function main() {
  // Calculate order total
  const expr = jsonata('order.items.(price * qty) ~> $sum()');
  console.log(await expr.evaluate(data)); // approximately 89.48

  // Filter and reshape items over $20
  const expr2 = jsonata(
    'order.items[price > 20].' +
    '{ "product": sku, "cost": price }');
  console.log(await expr2.evaluate(data));
  // { product: "B200", cost: 49.5 }
  // (a single match comes back as the object itself, not a one-element array)
}

main();

Array access: jq '.[0]' (first element), jq '.[-1]' (last). Array slice: jq '.[0:5]' (first 5). Multiple keys: jq '{name: .name, id: .id}' (object construction). Delete a key: jq 'del(.password)'.
Pipe to the next filter: jq '.users[] | .name' (iterates the array and emits each name). In practice: curl -s https://api.example.com/users | jq '.[] | select(.active==true) | {name: .name, email: .email}'. This fetches users, keeps the active ones, and outputs each name and email. jq expressions are powerful but terse, so keep the jq manual close at hand. jq 1.7, released in 2023, brought performance and documentation improvements; run jq --version to see what your system ships.
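For readers more at home in JavaScript, the curl pipeline above corresponds roughly to a filter followed by a map. The sketch below is plain Node.js with no dependencies; the `users` array is made-up sample data standing in for the API response, and `activeContacts` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical sample standing in for the API response (not real data).
const users = [
  { id: 1, name: "Alice", email: "alice@example.com", active: true },
  { id: 2, name: "Bob", email: "bob@example.com", active: false },
];

// Rough equivalent of:
//   .[] | select(.active == true) | {name: .name, email: .email}
function activeContacts(list) {
  return list
    .filter((u) => u.active === true)                 // select(.active == true)
    .map((u) => ({ name: u.name, email: u.email }));  // {name: .name, email: .email}
}

console.log(activeContacts(users));
```

One difference worth keeping in mind: jq emits a stream of separate objects, while this function returns a single array. Wrapping the jq expression in square brackets gives the array form.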
jq Advanced: Transformations, Aggregations, and Functions
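Beyond filtering, jq handles complex transformations. Object operations: jq 'keys' (get key names, sorted), jq '[.[]]' (get values as an array; note that the builtin named values does something different and simply passes through non-null inputs), jq 'to_entries' (object to array of {key, value}), jq 'from_entries' (array of {key, value} to object).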
String operations: jq '.email | split("@") | .[0]' (split an email address, get the local part), jq '.name | ascii_downcase', jq '.amount | tonumber * 100 | floor'. Array operations: jq 'map(.price)' (extract a field from each element), jq 'map(.price) | add / length' (average price), jq 'group_by(.category)' (group array elements by a field).
Null handling: jq '.name // "unknown"' (falls back to the default when the value is null or false). Conditionals: jq 'if .age >= 18 then "adult" else "minor" end'. jq supports modules (import functions from files), custom functions, and regular expressions through test, match, capture, scan, and sub. Longer jq programs can be saved in .jq files: jq -f transform.jq data.json.
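To make the semantics of a few of these builtins concrete, here is a sketch of what they compute, written as plain JavaScript. The function names and the `products` sample are invented for illustration; this is an approximation of jq's behaviour for simple inputs, not a reimplementation.

```javascript
// Made-up sample records for illustration.
const products = [
  { name: "pen", category: "office", price: 2 },
  { name: "desk", category: "office", price: 198 },
  { name: "mug", category: "kitchen", price: 10 },
];

// map(.price) | add / length
function averagePrice(items) {
  const prices = items.map((p) => p.price);
  return prices.reduce((sum, p) => sum + p, 0) / prices.length;
}

// group_by(.category): an array of groups, ordered by the grouping key
function groupByCategory(items) {
  const groups = new Map();
  for (const item of items) {
    if (!groups.has(item.category)) groups.set(item.category, []);
    groups.get(item.category).push(item);
  }
  return [...groups.keys()].sort().map((key) => groups.get(key));
}

// to_entries and from_entries
function toEntries(obj) {
  return Object.entries(obj).map(([key, value]) => ({ key, value }));
}
function fromEntries(entries) {
  return Object.fromEntries(entries.map((e) => [e.key, e.value]));
}

// .name // "unknown": jq falls back when the value is null OR false,
// which is not the same as JavaScript's ?? operator.
function nameOrDefault(record) {
  const v = record.name;
  return v === null || v === undefined || v === false ? "unknown" : v;
}

console.log(averagePrice(products));         // 70
console.log(groupByCategory(products).length); // 2
console.log(toEntries({ a: 1 }));            // [ { key: 'a', value: 1 } ]
console.log(nameOrDefault({ name: null }));  // unknown
```

Seeing the loops spelled out also shows why the one-line jq forms are attractive for ad hoc work.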
JSONata: A Readable Alternative to jq
JSONata is an expression language designed for querying and transforming JSON, with a syntax that many developers find more readable than jq's. Created at IBM, it is available as a Node.js library (jsonata), can be tried interactively through jsonata.org, and is built into tools such as Node-RED and AWS Step Functions. Basic syntax: $.name (select key), $uppercase($.name) (function call), $sum($.items.price) (aggregate).
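Object construction: {"name": name, "email": email} (keys are quoted strings; values are path expressions). Array iteration: $map(data.items, function($i) {$i.name}) or the shorthand data.items.name. Chaining uses the ~> operator rather than a shell-style pipe: data.items ~> $filter(function($i) {$i.active}), or more simply the predicate form data.items[active].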
JSONata's strength: it reads much like JavaScript, making it accessible to web developers. Its weakness: the reference implementation is a library rather than a command-line tool, and it evaluates whole documents in memory, so it is a weaker fit for shell pipelines and very large inputs. Use JSONata when building browser-based data transformations, working with Node.js data pipelines, or when team members find jq syntax intimidating.
Real-World Use Cases: API Data Shaping
Common scenarios where jq and JSONata shine. Extract specific fields from an API response: curl -s https://api.github.com/repos/jqlang/jq | jq '{stars: .stargazers_count, forks: .forks_count, description: .description}'. Transform array items: echo '[1,2,3]' | jq 'map(. * 2)' prints [2,4,6].
Flatten nested JSON: jq '.. | objects | select(has("id")) | {id, name}' (recursive descent). Merge two JSON objects: jq -s '.[0] * .[1]' config.json overrides.json. Aggregate logs: cat access.log | jq -r '.action' | sort | uniq -c | sort -rn (count log actions).
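The `*` operator in the merge example performs a recursive merge: nested objects are combined key by key, while any other value (including arrays) on the right-hand side replaces the one on the left. A minimal JavaScript sketch of that rule follows; `deepMerge`, `isPlainObject`, and the sample configuration are hypothetical names and data used only for illustration.

```javascript
function isPlainObject(v) {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

// Approximates jq's object multiplication: .[0] * .[1]
function deepMerge(base, overrides) {
  const out = { ...base };
  for (const [key, value] of Object.entries(overrides)) {
    out[key] =
      isPlainObject(out[key]) && isPlainObject(value)
        ? deepMerge(out[key], value) // both sides are objects: recurse
        : value;                     // otherwise the override wins
  }
  return out;
}

// Made-up stand-ins for config.json and overrides.json.
const defaults = { server: { host: "localhost", port: 8080 }, tags: ["a"], debug: false };
const overrides = { server: { port: 9090 }, tags: ["b"], debug: true };

console.log(deepMerge(defaults, overrides));
// server.host survives, server.port is overridden, tags is replaced wholesale
```

If arrays should be concatenated rather than replaced, that needs an explicit rule in either tool; neither jq's `*` nor this sketch does it.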
Filter and format for CSV: jq -r '.[] | [.id, .name, .email] | @csv'. Most of these patterns have JSONata counterparts, though the syntax differs. jq is the standard for shell-based data wrangling; JSONata shows up more often inside low-code and integration tools (Node-RED, for example). Learn both: jq for quick CLI tasks, JSONata for transformations embedded in applications and automation tools.
jq in CI/CD Pipelines and DevOps
jq is invaluable in CI/CD pipelines. Extract values from JSON responses: get a Kubernetes deployment image tag, parse Terraform plan output, extract test coverage from JSON reports. For example, to extract the image from a Kubernetes deployment: kubectl get deployment app -o json | jq -r '.spec.template.spec.containers[0].image'.
Parse a GitHub API response: curl -sH "Authorization: token $GH_TOKEN" https://api.github.com/repos/$REPO/releases/latest | jq -r '.tag_name'. Process JSON logs in a log shipper: tools such as Fluentd can apply jq expressions to log entries through plugins. jq is pre-installed on many CI/CD runners (GitHub Actions ubuntu-latest, for instance); with GitLab CI it depends on the Docker image you choose, so install it in the job if it is missing.
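Common pattern: jq -n --arg var "$ENV_VAR" '{key: $var}' to inject an environment variable into JSON as a string. The -n flag tells jq to use null as its input instead of reading stdin. --arg passes the value as a string and lets jq handle the quoting, while --argjson var "$ENV_VAR" parses the value as JSON (use it when the variable holds a number, boolean, array, or object; it fails if the value is not valid JSON). Either way the value never becomes part of the jq program text or of hand-concatenated JSON, which avoids quoting bugs and injection when variables contain quotes or other special characters.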
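The quoting problem that jq's variable-passing flags sidestep is easy to reproduce. The sketch below, in plain JavaScript, compares hand-concatenated JSON with output from a serializer; the `branch` value and the `isValidJson` helper are invented for the demonstration.

```javascript
// A value as it might arrive from an environment variable (made-up example).
const branch = 'feature/"quoted" name';

// Naive string concatenation: the embedded quotes break the document.
const naive = '{"branch": "' + branch + '"}';

// Letting a serializer do the quoting, which is the role jq plays
// when a value is passed in as a variable instead of pasted into text.
const safe = JSON.stringify({ branch });

function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

console.log(isValidJson(naive)); // false
console.log(isValidJson(safe));  // true
console.log(JSON.parse(safe).branch === branch); // true
```

The same reasoning applies in shell: building JSON with echo and string interpolation works until a value contains a quote, a backslash, or a newline.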
Performance and Large File Handling
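jq and JSONata handle large files differently. By default jq reads its input one top-level JSON value at a time, so a file of newline-delimited records is processed record by record with modest memory use; jq -s (slurp mode) instead reads every input into a single in-memory array. A single huge document is still parsed whole unless you use streaming: jq -c --stream emits [path, value] events as the parser encounters them, enabling incremental processing of files larger than available RAM. JSONata evaluates entire documents in memory, so it is not suitable for GB-sized files without chunking.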
For truly large JSON files (10GB+), use streaming parsers such as ijson (Python) or JSONStream (Node.js), or store the data as newline-delimited JSON (NDJSON) so that each record can be parsed on its own. For most practical use cases (API responses, config files, log files up to 100MB), jq and JSONata are fast enough. Benchmark with your actual data. jq is implemented in C and is usually quick at this kind of work, but measure rather than assume.
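The appeal of newline-delimited JSON is that each line can be parsed and discarded independently. The following sketch counts the values of one field across NDJSON lines while holding only one record in memory at a time. `countByField` and the sample lines are hypothetical; the function accepts any synchronous iterable of strings, and in a real pipeline those lines would be read incrementally from a file or socket.

```javascript
// Count values of one field across newline-delimited JSON,
// parsing a single record at a time.
function countByField(lines, field) {
  const counts = new Map();
  for (const line of lines) {
    if (line.trim() === "") continue;  // skip blank lines
    const record = JSON.parse(line);   // only this record is held in memory
    const key = String(record[field]);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  // Highest count first, similar to `sort | uniq -c | sort -rn`
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Made-up sample log lines.
const sample = [
  '{"action":"login","user":1}',
  '{"action":"view","user":1}',
  '',
  '{"action":"login","user":2}',
];

console.log(countByField(sample, "action"));
// [ [ 'login', 2 ], [ 'view', 1 ] ]
```

Memory use here grows with the number of distinct field values, not with the size of the input.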
JSONata runs inside a JavaScript engine and is typically slower than jq on large inputs. For performance-critical data pipelines, use jq for the transformation step; if the surrounding code is Python, a fast serializer such as orjson can handle JSON encoding and decoding.