JSON (JavaScript Object Notation) has become the de facto standard for data exchange across the web. From configuring applications to transmitting data between APIs, its lightweight and human-readable format makes it incredibly versatile. However, while machines can easily process JSON in its most compact form – often minified into a single line – that efficiency can turn into a significant hurdle for human developers.
This guide will walk you through the fundamentals, starting with Python’s built-in json module. From there, you’ll move beyond the basics to tackle more complex, real-world challenges. You’ll learn how to serialize custom Python objects, process massive JSON files without running out of memory, and leverage high-performance alternative libraries to speed up your applications.
Key Takeaways:

- Use the indent and sort_keys parameters in json.dumps() to instantly make your JSON output readable and consistently structured.
- Fine-tune the output with the separators and ensure_ascii parameters.
- Serialize custom Python objects with a JSONEncoder subclass or by passing a handler function to the default parameter.
- Process massive JSON files with streaming parsers like ijson or by adopting the line-delimited JSON format.
- Replace the built-in json module with faster alternatives like orjson.
- Use the object_hook parameter in json.loads() to intercept and transform the data.
- Use the rich library to pretty-print JSON with syntax highlighting directly in your terminal.

We can use the dumps() method to get a pretty-formatted JSON string.
import json
json_data = '[{"ID":10,"Name":"Pankaj","Role":"CEO"},' \
'{"ID":20,"Name":"David Lee","Role":"Editor"}]'
json_object = json.loads(json_data)
json_formatted_str = json.dumps(json_object, indent=2)
print(json_formatted_str)
This outputs the formatted JSON:
[
  {
    "ID": 10,
    "Name": "Pankaj",
    "Role": "CEO"
  },
  {
    "ID": 20,
    "Name": "David Lee",
    "Role": "Editor"
  }
]
We use json.loads() to create the JSON object from the JSON string. The json.dumps() method takes the JSON object and returns a JSON-formatted string. The indent parameter defines the indentation level for the formatted string.

Let's see what happens when we try to print data read from a JSON file. The file data is saved in a pretty-printed format.
import json

with open('Cars.json', 'r') as json_file:
    json_object = json.load(json_file)
    print(json_object)
    print(json.dumps(json_object))
    print(json.dumps(json_object, indent=1))
Output:
[{'Car Name': 'Honda City', 'Car Model': 'City', 'Car Maker': 'Honda', 'Car Price': '20,000 USD'}, {'Car Name': 'Bugatti Chiron', 'Car Model': 'Chiron', 'Car Maker': 'Bugatti', 'Car Price': '3 Million USD'}]
[{"Car Name": "Honda City", "Car Model": "City", "Car Maker": "Honda", "Car Price": "20,000 USD"}, {"Car Name": "Bugatti Chiron", "Car Model": "Chiron", "Car Maker": "Bugatti", "Car Price": "3 Million USD"}]
[
 {
  "Car Name": "Honda City",
  "Car Model": "City",
  "Car Maker": "Honda",
  "Car Price": "20,000 USD"
 },
 {
  "Car Name": "Bugatti Chiron",
  "Car Model": "Chiron",
  "Car Maker": "Bugatti",
  "Car Price": "3 Million USD"
 }
]
It's clear from the output that we have to pass an indent value to json.dumps() to get the JSON data in a pretty-printed format.
json.dumps() Parameters

While indent and sort_keys are commonly used for creating readable JSON, the json.dumps() function offers several other powerful parameters that give you finer control over the serialization process. Let's explore some of the most useful ones.
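Since sort_keys has not been demonstrated yet, here is a minimal sketch showing it alongside indent; sorting keys alphabetically keeps the output stable and diff-friendly across runs:

import json

data = {"role": "CEO", "id": 10, "name": "Pankaj"}

# sort_keys=True orders object keys alphabetically for consistent output
print(json.dumps(data, indent=2, sort_keys=True))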
separators: Controlling Whitespace for Compact Output

The separators parameter allows you to customize the delimiter characters used in your JSON output. It takes a tuple containing two strings: (item_separator, key_separator). By default, Python uses (', ', ': '), which includes a space after the comma and colon for readability.

You can create a more compact representation by removing this whitespace. This is useful for reducing file size when readability is not the primary concern.
For example, let’s define a simple Python dictionary:
import json

data = {
    "name": "John Doe",
    "age": 30,
    "isStudent": False,
    "courses": [
        {"title": "History", "credits": 3},
        {"title": "Math", "credits": 4}
    ]
}
Now, let’s serialize it using the default separators and a more compact version:
# Default pretty-printed output
print(json.dumps(data, indent=4))
# Compact pretty-printed output
print(json.dumps(data, indent=4, separators=(',', ':')))
The output demonstrates the difference:
Default Output:
{
    "name": "John Doe",
    "age": 30,
    "isStudent": false,
    "courses": [
        {
            "title": "History",
            "credits": 3
        },
        {
            "title": "Math",
            "credits": 4
        }
    ]
}
Compact Output with separators:
{
    "name":"John Doe",
    "age":30,
    "isStudent":false,
    "courses":[
        {
            "title":"History",
            "credits":3
        },
        {
            "title":"Math",
            "credits":4
        }
    ]
}
As you can see, the second version removes the space after the colons within each key-value pair, resulting in a slightly smaller output.
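If you drop indent entirely and keep the tight separators, you get the fully minified form commonly sent over the wire. A minimal sketch, assuming the same data dictionary defined above:

import json

# Minified output: no indentation and no spaces after delimiters
compact = json.dumps(data, separators=(',', ':'))
print(compact)
# {"name":"John Doe","age":30,"isStudent":false,"courses":[{"title":"History","credits":3},{"title":"Math","credits":4}]}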
ensure_ascii: Working with International Characters

By default, json.dumps() escapes all non-ASCII characters. For example, a character like 'é' would be converted to \u00e9. While this guarantees compatibility, it can make the JSON difficult to read if you are working with languages other than English.

By setting ensure_ascii=False, you can instruct json.dumps() to write these characters directly. This is highly recommended when your output destination supports UTF-8, which is standard for modern web APIs and file systems.
Consider this example with a non-ASCII character:
import json
data = {"name": "Søren", "city": "København"}
# Default behavior with ASCII escaping
print(json.dumps(data))
# With ensure_ascii=False for direct output
print(json.dumps(data, ensure_ascii=False))
Output:
{"name": "S\u00f8ren", "city": "K\u00f8benhavn"}
{"name": "Søren", "city": "København"}
The second line is much more readable and is the preferred format for UTF-8 compatible systems.
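When writing to disk, pair ensure_ascii=False with an explicit UTF-8 encoding on the file handle. A minimal sketch (the filename is just an example):

import json

data = {"name": "Søren", "city": "København"}

# Open the file in text mode with UTF-8 encoding so the characters are written as-is
with open("people.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)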
default: Handling Custom Python Objects

A TypeError is raised when you try to serialize a Python object that isn't directly supported by the JSON specification, such as a datetime object or a custom class instance. The default parameter provides an elegant way to handle this.

You can pass a function to default that will be called for any object that the serializer doesn't know how to handle. This function should return a JSON-serializable version of the object.

Let's see how to serialize a datetime object and a custom User object.
import json
from datetime import datetime

class User:
    def __init__(self, name, registered_at):
        self.name = name
        self.registered_at = registered_at

def custom_serializer(obj):
    # Custom JSON serializer for objects not serializable by default.
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, User):
        return {
            "name": obj.name,
            "registered_at": obj.registered_at.isoformat(),
            "__class__": "User"  # Optional: for custom decoding
        }
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

user = User("Jane Doe", datetime.now())

# Use the default parameter to handle the custom User object and datetime
json_string = json.dumps(user, default=custom_serializer, indent=4)
print(json_string)
Output:
{
    "name": "Jane Doe",
    "registered_at": "2025-09-11T15:03:18.673824",
    "__class__": "User"
}
This approach allows you to centrally define serialization logic for any custom types in your application, making your code cleaner and more maintainable.
When working with data, you may encounter JSON files that are too large to fit into your computer's memory. Loading such a file with json.load() would lead to a MemoryError. Fortunately, there are techniques for processing large JSON files without consuming excessive memory.
Streaming with ijson

Streaming parsers read and parse a file incrementally, piece by piece, rather than loading the entire document at once. This approach allows you to process files of any size with a small, constant memory footprint.

A popular library for this in Python is ijson. It can parse a JSON stream and yield items as they are found.
First, install the library:
pip install ijson
Imagine you have a large JSON file large_data.json containing an array of user objects:
[
  {"id": 1, "name": "Alice", "data": "..."},
  {"id": 2, "name": "Bob", "data": "..."},
  ...
]
Instead of loading the whole list, you can iterate over it with ijson:
import ijson

filename = "large_data.json"

with open(filename, 'r') as f:
    users = ijson.items(f, 'item')
    for user in users:
        # Process each user object one by one
        print(f"Processing user: {user['name']}")
In this example, ijson.items(f, 'item') creates a generator that yields each object from the root array. Only one user object is held in memory at a time, which keeps memory usage low even for extremely large files.
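The second argument is a prefix that describes where the repeated items live in the document. If the array were nested under a top-level key (a hypothetical "users" key here), you would point the prefix at it. A minimal sketch:

import ijson

# Hypothetical file shaped like: {"users": [{"id": 1, "name": "Alice"}, ...]}
with open("nested_data.json", "r") as f:
    for user in ijson.items(f, "users.item"):
        print(user["name"])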
Another common format for handling large datasets is Line-Delimited JSON, also known as Newline Delimited JSON (NDJSON). In this format, each line in the file is a complete, valid JSON object.
An example data.ndjson file would look like this:
{"id": 1, "event": "login", "timestamp": "2025-09-11T10:00:00Z"}
{"id": 2, "event": "click", "target": "button_a", "timestamp": "2025-09-11T10:01:15Z"}
{"id": 1, "event": "logout", "timestamp": "2025-09-11T10:05:30Z"}
This format is excellent for streaming because you can process the file line by line. Each line can be parsed independently.
Processing an NDJSON file in Python is straightforward:
import json

filename = "data.ndjson"

with open(filename, 'r') as f:
    for line in f:
        try:
            # Each line is a separate JSON object
            event_data = json.loads(line)
            print(f"Processed event: {event_data['event']} for user {event_data['id']}")
        except json.JSONDecodeError:
            print(f"Skipping malformed line: {line.strip()}")
This method is not only memory-efficient but also robust, as a malformed line doesn’t prevent the rest of the file from being processed.
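Producing NDJSON is just as simple: serialize each record with compact separators and write it on its own line. A minimal sketch (the events list and filename are illustrative):

import json

events = [
    {"id": 1, "event": "login"},
    {"id": 2, "event": "click", "target": "button_a"},
]

# Write one compact JSON object per line
with open("events.ndjson", "w", encoding="utf-8") as f:
    for event in events:
        f.write(json.dumps(event, separators=(',', ':')) + "\n")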
While Python's built-in json module is sufficient for many use cases, several third-party libraries offer improved performance and additional features.
| Library | Key Features | Best For |
|---|---|---|
| orjson | Very high performance, serializes additional types (datetimes, UUIDs, dataclasses), produces compact, UTF-8 binary output. | Performance-critical applications, web APIs. |
| simplejson | The original library json was based on. Often faster and updated more frequently with new features. | A drop-in replacement for json with potential performance gains. |
| rich | Not a parser, but provides beautiful, syntax-highlighted pretty-printing of JSON in the terminal. | Enhancing debuggability and readability during development. |
orjson: The High-Performance Choice

orjson is a fast JSON library for Python, written in Rust and designed for performance; it is significantly faster than the standard json module.
First, install orjson:
pip install orjson
Using orjson is similar to the built-in module, but it serializes to bytes rather than str by default.
import orjson
from datetime import datetime, timezone

data = {
    "name": "Project X",
    "deadline": datetime(2026, 1, 1, tzinfo=timezone.utc),
    "status": "active"
}

# orjson handles datetime objects automatically
json_bytes = orjson.dumps(data)
print(json_bytes)
print(orjson.loads(json_bytes))
Output:
b'{"name":"Project X","deadline":"2026-01-01T00:00:00+00:00","status":"active"}'
{'name': 'Project X', 'deadline': '2026-01-01T00:00:00+00:00', 'status': 'active'}
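orjson can also pretty-print, though it only supports two-space indentation via an option flag. A minimal sketch:

import orjson

data = {"name": "Project X", "status": "active"}

# OPT_INDENT_2 pretty-prints with two-space indentation (the only indent orjson offers)
pretty_bytes = orjson.dumps(data, option=orjson.OPT_INDENT_2)
print(pretty_bytes.decode())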
simplejson: A Feature-Rich Alternative

simplejson is the external library that the json module was originally based on. It is still actively developed and sometimes includes features or performance optimizations before they make it into the standard library.
Install simplejson:
pip install simplejson
Its usage is identical to the json module. You can use it as a drop-in replacement.
import simplejson as json
data = {"key": "value"}
print(json.dumps(data))
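One distinguishing feature is native support for decimal.Decimal, which preserves exact numeric values instead of converting them to floats. A minimal sketch, assuming you want lossless round-tripping:

import simplejson
from decimal import Decimal

price = {"amount": Decimal("19.99"), "currency": "USD"}

# use_decimal=True serializes Decimal values directly and parses numbers back as Decimal
encoded = simplejson.dumps(price, use_decimal=True)
print(encoded)
print(simplejson.loads(encoded, use_decimal=True))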
rich: For Beautiful Terminal Output

When you're debugging or inspecting JSON data in a terminal, readability is key. The rich library excels at producing beautifully formatted and syntax-highlighted output for various data types, including JSON.
Install rich:
pip install rich
To pretty-print JSON with rich, pass a JSON string to the console's print_json() method (rich also provides a JSON renderable class for the same purpose).
import json
from rich.console import Console

data = {
    "name": "John Doe",
    "age": 30,
    "isStudent": False,
    "courses": [
        {"title": "History", "credits": 3},
        {"title": "Math", "credits": 4}
    ]
}

console = Console()
json_string = json.dumps(data)

# Print the JSON with syntax highlighting
console.print_json(json_string)
This will produce a color-coded, indented output in your terminal, making nested structures much easier to read than with the standard print() function.
We previously saw how the default parameter in json.dumps() can help serialize custom objects. For more complex scenarios, especially when you also need custom deserialization logic, creating custom encoder and decoder classes provides a more structured, object-oriented approach.
Creating a Custom JSONEncoder

Subclassing json.JSONEncoder allows you to create a reusable encoder for your custom objects. You only need to override the default() method. This approach encapsulates the serialization logic within a class, which is cleaner than a standalone function if you have multiple custom types.
Let’s refactor our earlier example to use a custom encoder.
import json
from datetime import datetime

class User:
    def __init__(self, name, registered_at):
        self.name = name
        self.registered_at = registered_at

class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, User):
            return {
                "name": obj.name,
                "registered_at": obj.registered_at.isoformat(),
                "__class__": "User"
            }
        # Let the base class default method raise the TypeError
        return super().default(obj)

user = User("Jane Doe", datetime.now())

# Use the cls parameter to specify the custom encoder
json_string = json.dumps(user, cls=CustomEncoder, indent=4)
print(json_string)
This produces the same output as before but packages the logic into a reusable CustomEncoder class.
Creating a Custom JSONDecoder

Deserialization is the process of converting a JSON string back into a Python object. To reconstruct your custom objects, you can use the object_hook parameter in json.loads() or create a custom JSONDecoder subclass.
The object_hook is a function that gets called with the result of any object literal decoded (a dict). It can then transform this dictionary into a different object.

Let's use the __class__ key we added during encoding to identify and reconstruct our User object.
import json
from datetime import datetime

# Assume User and CustomEncoder classes are defined as above

def from_json_object(dct):
    """Object hook to decode custom objects."""
    if "__class__" in dct and dct["__class__"] == "User":
        return User(name=dct["name"], registered_at=datetime.fromisoformat(dct["registered_at"]))
    return dct

json_string = """
{
    "name": "Jane Doe",
    "registered_at": "2025-09-11T15:03:18.673824",
    "__class__": "User"
}
"""

# Use object_hook to deserialize the string into a User object
user_object = json.loads(json_string, object_hook=from_json_object)

print(type(user_object))
print(user_object.name)
print(user_object.registered_at)
Output:
<class '__main__.User'>
Jane Doe
2025-09-11 15:03:18.673824
This demonstrates how object_hook successfully converted the dictionary back into an instance of our User class. Creating a full JSONDecoder subclass is also possible but is often unnecessary, as object_hook handles most use cases with less boilerplate code.
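For completeness, here is a minimal sketch of what such a subclass could look like; it reuses the from_json_object hook and json_string from the previous example:

import json

# Assumes from_json_object and json_string are defined as in the example above
class CustomDecoder(json.JSONDecoder):
    def __init__(self, *args, **kwargs):
        # Route every decoded object through our hook unless one was supplied
        kwargs.setdefault("object_hook", from_json_object)
        super().__init__(*args, **kwargs)

user_object = json.loads(json_string, cls=CustomDecoder)
print(user_object.name)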
When you make an API request, you often get a single, long line of JSON response to save bandwidth. This is incredibly difficult for humans to read and understand, especially for complex or deeply nested data. Pretty-printing transforms this unreadable string into a structured, indented, and human-readable format, making it far easier to identify correct data, missing fields, or unexpected errors.
Here’s an example of how to fetch an API response and pretty-print its content. We’re using the JSONPlaceholder API for testing.
import requests
import json

url = "https://jsonplaceholder.typicode.com/posts/1"
response = requests.get(url)

if response.status_code == 200:
    data = response.json()
    print(json.dumps(data))
    print(json.dumps(data, indent=2))
else:
    print(f"Error: {response.status_code}")
This will print the following output:
{"userId": 1, "id": 1, "title": "sunt aut facere repellat provident occaecati excepturi optio reprehenderit", "body": "quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"}
{
  "userId": 1,
  "id": 1,
  "title": "sunt aut facere repellat provident occaecati excepturi optio reprehenderit",
  "body": "quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"
}
As you can see, the pretty printed JSON is easier to read and debug.
Traditional log messages are often plain text. When you need to log complex events, user actions, or system states that are best represented as JSON (e.g., a full request payload, an error context, or a processed data record), logging it as a single, unformatted string makes logs difficult to read, parse, and analyze. Pretty-printing JSON within your logs makes them immediately understandable, especially when manually sifting through log files or using log aggregation tools that might not automatically format JSON.
This is suitable for local development. In production environments, however, it's better to use a log management system (e.g., ELK, Splunk, DataDog) to parse the data into searchable fields. This is more efficient for large volumes and automated analysis.
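For local development, a simple pattern is to pass a pretty-printed JSON string straight to the standard logging module. A minimal sketch (the payload is illustrative):

import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

payload = {"user_id": 42, "action": "checkout", "items": [{"sku": "A1", "qty": 2}]}

# Indent the payload so nested fields are easy to scan in the log output
logger.info("Processed request payload:\n%s", json.dumps(payload, indent=2))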
Configuration files are the backbone of many applications, defining settings, database connections, API keys, and more. While many are manually written and thus already formatted, pretty-printing JSON becomes invaluable whenever configuration is generated or rewritten programmatically, as in the sketch below.
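Here is a minimal sketch of that workflow, assuming a hypothetical config.json file: load the settings, update them in code, and write them back pretty-printed so the file stays easy to review.

import json

# Hypothetical config file; adjust the path and keys to your application
with open("config.json", "r", encoding="utf-8") as f:
    config = json.load(f)

config["debug"] = False  # programmatic update

# Write the settings back in a readable, consistently ordered form
with open("config.json", "w", encoding="utf-8") as f:
    json.dump(config, f, indent=4, sort_keys=True)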
The best and easiest way to indent JSON output in Python is by using the indent parameter of the json.dumps() function.
import json
data = {"name": "Alice", "age": 30, "hobbies": ["reading", "chess", "hiking"]}
# Indent JSON output by 4 spaces
json_string = json.dumps(data, indent=4)
print(json_string)
What is the difference between json.dumps() and pprint?

json.dumps() converts Python objects (like a dict or list) to a JSON-formatted string. You can use json.dumps() when you want to serialize Python data into a valid JSON string. pprint() pretty-prints any Python data structure for readability; it is usually used for debugging or displaying nested Python objects in a readable format.
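A quick illustration of the difference (a minimal sketch): json.dumps() emits valid JSON, while pprint() keeps Python syntax such as tuples and capitalized booleans.

import json
from pprint import pprint

data = {"name": "Alice", "active": True, "scores": (95, 88)}

# Valid JSON: the tuple becomes an array and True becomes true
print(json.dumps(data, indent=2))

# Python repr: the tuple and boolean keep their Python syntax
pprint(data)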
There are several tools that can automatically pretty-print JSON from the command line or in your editor. Here are a few options:
You can use json.tool in the terminal to pretty-print JSON from a file or standard input:
python -m json.tool input.json
jq is another powerful and fast tool for formatting and querying JSON:
jq . input.json
jq is a standalone command-line tool, so install it with your system's package manager (for example, apt install jq on Debian/Ubuntu or brew install jq on macOS). Note that the jq package on PyPI provides Python bindings rather than the command-line program.
Many editors like VS Code, PyCharm, and Sublime Text have built-in or plugin-based JSON formatters that you can use.
When should I use orjson instead of the built-in json module?

You should switch to orjson when performance is a critical factor. orjson is significantly faster than the standard json library, making it the ideal choice for high-throughput applications like web APIs, data processing pipelines, or any system where serialization and deserialization speed is a bottleneck. Additionally, orjson natively supports types like datetime and dataclasses, which can simplify your code by eliminating the need for custom handlers.
While debugging is the most common use case, pretty-printing JSON is valuable in several other scenarios. It is essential for creating human-readable configuration files (.json), generating clear and understandable API documentation with example responses, and for logging structured data in a way that is easy for developers to inspect and analyze later. Any situation where a human needs to read or verify structured data can benefit from pretty-printing.
Pretty-printing JSON isn't merely about making your JSON data look pretty; it's a powerful technique to improve readability and enhance your debugging capabilities. With Python's json.dumps() function and the pprint module, you can quickly format output for better clarity. You can also use advanced parameters to handle non-ASCII characters, serialize custom Python objects, and fine-tune whitespace for compact output.
This extends beyond simple output, proving valuable for debugging API responses, logging structured data, and improving the readability of config files. You are now equipped to tackle more advanced challenges, from processing massive JSON files with streaming parsers like ijson to creating your own encoder and decoder classes for full control over custom data types. By exploring high-performance alternative libraries like orjson and simplejson, you can optimize your applications for speed and efficiency. It's a small change with a big impact on your development experience, elevating your ability to manage data in any scenario.
For more information on JSON and on working with files in Python, you can explore the related tutorials on those topics.