
JSON (JavaScript Object Notation) has become the de facto standard for data exchange across the web. From configuring applications to transmitting data between APIs, its lightweight and human-readable format makes it incredibly versatile. However, while machines can easily process JSON in its most compact form – often minified into a single line – that efficiency can turn into a significant hurdle for human developers.
This guide will walk you through the fundamentals, starting with Python’s built-in json module. From there, you’ll move beyond the basics to tackle more complex, real-world challenges. You’ll learn how to serialize custom Python objects, process massive JSON files without running out of memory, and leverage high-performance alternative libraries to speed up your applications.
Key Takeaways:
- Use the indent and sort_keys parameters in json.dumps() to instantly make your JSON output readable and consistently structured.
- Fine-tune the output with the separators and ensure_ascii parameters.
- Serialize custom Python objects with a JSONEncoder subclass or by passing a handler function to the default parameter.
- Process massive JSON files without exhausting memory by streaming them with ijson or adopting the line-delimited JSON format.
- Speed up serialization by replacing the built-in json module with faster alternatives like orjson.
- Reconstruct custom objects during deserialization with the object_hook parameter in json.loads() to intercept and transform the data.
- Use the rich library to pretty-print JSON with syntax highlighting directly in your terminal.

We can use the json.dumps() method to get a pretty-formatted JSON string.
import json
json_data = '[{"ID":10,"Name":"Pankaj","Role":"CEO"},' \
'{"ID":20,"Name":"David Lee","Role":"Editor"}]'
json_object = json.loads(json_data)
json_formatted_str = json.dumps(json_object, indent=2)
print(json_formatted_str)
This outputs the formatted JSON:
[
  {
    "ID": 10,
    "Name": "Pankaj",
    "Role": "CEO"
  },
  {
    "ID": 20,
    "Name": "David Lee",
    "Role": "Editor"
  }
]
We use json.loads() to create the JSON object from the JSON string. The json.dumps() method takes the JSON object and returns a JSON formatted string, and the indent parameter defines the indent level for the formatted string.

Now let's see what happens when we print data loaded from a JSON file. The file data is saved in a pretty-printed format.

import json

with open('Cars.json', 'r') as json_file:
    json_object = json.load(json_file)
    print(json_object)
    print(json.dumps(json_object))
    print(json.dumps(json_object, indent=1))
Output:
[{'Car Name': 'Honda City', 'Car Model': 'City', 'Car Maker': 'Honda', 'Car Price': '20,000 USD'}, {'Car Name': 'Bugatti Chiron', 'Car Model': 'Chiron', 'Car Maker': 'Bugatti', 'Car Price': '3 Million USD'}]
[{"Car Name": "Honda City", "Car Model": "City", "Car Maker": "Honda", "Car Price": "20,000 USD"}, {"Car Name": "Bugatti Chiron", "Car Model": "Chiron", "Car Maker": "Bugatti", "Car Price": "3 Million USD"}]
[
 {
  "Car Name": "Honda City",
  "Car Model": "City",
  "Car Maker": "Honda",
  "Car Price": "20,000 USD"
 },
 {
  "Car Name": "Bugatti Chiron",
  "Car Model": "Chiron",
  "Car Maker": "Bugatti",
  "Car Price": "3 Million USD"
 }
]
It's clear from the output that we have to pass an indent value to get the JSON data in a pretty-printed format.
json.dumps() Parameters

While indent and sort_keys are commonly used for creating readable JSON, the json.dumps() function offers several other powerful parameters that give you finer control over the serialization process. Let's explore some of the most useful ones.
separators: Controlling Whitespace for Compact Output

The separators parameter allows you to customize the delimiter characters used in your JSON output. It takes a tuple containing two strings: (item_separator, key_separator). By default, Python uses (', ', ': '), which includes a space after the comma and colon for readability.
You can create a more compact representation by removing this whitespace. This is useful for reducing file size when readability is not the primary concern.
For example, let’s define a simple Python dictionary:
import json

data = {
    "name": "John Doe",
    "age": 30,
    "isStudent": False,
    "courses": [
        {"title": "History", "credits": 3},
        {"title": "Math", "credits": 4}
    ]
}
Now, let’s serialize it using the default separators and a more compact version:
# Default pretty-printed output
print(json.dumps(data, indent=4))

# Pretty-printed output with compact separators
print(json.dumps(data, indent=4, separators=(',', ':')))
The output demonstrates the difference:
Default Output:
{
    "name": "John Doe",
    "age": 30,
    "isStudent": false,
    "courses": [
        {
            "title": "History",
            "credits": 3
        },
        {
            "title": "Math",
            "credits": 4
        }
    ]
}
Compact Output with separators:
{
    "name":"John Doe",
    "age":30,
    "isStudent":false,
    "courses":[
        {
            "title":"History",
            "credits":3
        },
        {
            "title":"Math",
            "credits":4
        }
    ]
}
As you can see, the second version removes the space after the colons within each key-value pair, resulting in a slightly smaller output.
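If you drop the indent parameter entirely and keep the tight separators, you get fully minified output, which is what you typically want when sending JSON over the network. A minimal sketch, reusing the data dictionary from above:

# Minified output: no indentation, no whitespace after separators
minified = json.dumps(data, separators=(',', ':'))
print(minified)
# {"name":"John Doe","age":30,"isStudent":false,"courses":[{"title":"History","credits":3},{"title":"Math","credits":4}]}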
ensure_ascii: Working with International Characters

By default, json.dumps() escapes all non-ASCII characters. For example, a character like 'é' would be converted to \u00e9. While this guarantees compatibility, it can make the JSON difficult to read if you are working with languages other than English.
By setting ensure_ascii=False, you can instruct json.dumps() to write these characters directly. This is highly recommended when your output destination supports UTF-8, which is standard for modern web APIs and file systems.
Consider this example with a non-ASCII character:
import json
data = {"name": "Søren", "city": "København"}
# Default behavior with ASCII escaping
print(json.dumps(data))
# With ensure_ascii=False for direct output
print(json.dumps(data, ensure_ascii=False))
Output:
{"name": "S\u00f8ren", "city": "K\u00f8benhavn"}
{"name": "Søren", "city": "København"}
The second line is much more readable and is the preferred format for UTF-8 compatible systems.
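The same flag applies when writing to a file with json.dump(). A minimal sketch, assuming a hypothetical output file named people.json and a UTF-8 capable filesystem:

import json

data = {"name": "Søren", "city": "København"}

# Write UTF-8 characters directly instead of \uXXXX escapes
with open("people.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)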
default: Handling Custom Python Objects

A TypeError is raised when you try to serialize a Python object that isn't directly supported by the JSON specification, such as a datetime object or a custom class instance. The default parameter provides an elegant way to handle this.
You can pass a function to default that will be called for any object that the serializer doesn’t know how to handle. This function should return a JSON-serializable version of the object.
Let’s see how to serialize a datetime object and a custom User object.
import json
from datetime import datetime

class User:
    def __init__(self, name, registered_at):
        self.name = name
        self.registered_at = registered_at

def custom_serializer(obj):
    # Custom JSON serializer for objects not serializable by default.
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, User):
        return {
            "name": obj.name,
            "registered_at": obj.registered_at.isoformat(),
            "__class__": "User"  # Optional: for custom decoding
        }
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

user = User("Jane Doe", datetime.now())

# Use the default parameter to handle the custom User object and datetime
json_string = json.dumps(user, default=custom_serializer, indent=4)
print(json_string)
Output:
{
    "name": "Jane Doe",
    "registered_at": "2025-09-11T15:03:18.673824",
    "__class__": "User"
}
This approach allows you to centrally define serialization logic for any custom types in your application, making your code cleaner and more maintainable.
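For quick debugging, a common shortcut is to pass default=str, which falls back to each object's string representation instead of raising a TypeError. It is lossy (you cannot cleanly round-trip the data), so treat it as a convenience rather than a serialization strategy:

import json
from datetime import datetime

# Quick fallback: stringify anything json.dumps() cannot handle natively
print(json.dumps({"created": datetime.now()}, default=str, indent=2))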
When working with data, you may encounter JSON files that are too large to fit into your computer’s memory. Loading such a file with json.load() would lead to a MemoryError. Fortunately, there are techniques for processing large JSON files without consuming excessive memory.
Streaming Parsers with ijson

Streaming parsers read and parse a file incrementally, piece by piece, rather than loading the entire document at once. This approach allows you to process files of any size with a small, constant memory footprint.
A popular library for this in Python is ijson. It can parse a JSON stream and yield items as they are found.
First, install the library:
pip install ijson
Imagine you have a large JSON file large_data.json containing an array of user objects:
[
  {"id": 1, "name": "Alice", "data": "..."},
  {"id": 2, "name": "Bob", "data": "..."},
  ...
]
Instead of loading the whole list, you can iterate over it with ijson:
import ijson

filename = "large_data.json"

with open(filename, 'r') as f:
    users = ijson.items(f, 'item')
    for user in users:
        # Process each user object one by one
        print(f"Processing user: {user['name']}")
In this example, ijson.items(f, 'item') creates a generator that yields each object from the root array. Only one user object is held in memory at a time, making it efficient for terabyte-scale files.
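If the array you care about is nested inside a larger document (for example under a top-level "users" key), you point ijson at it with a dotted prefix. A minimal sketch, assuming a hypothetical nested_data.json shaped like {"users": [...]}:

import ijson

# For a document like {"users": [{...}, {...}]}, the array items live
# under the prefix "users.item" rather than the root "item".
with open("nested_data.json", "r") as f:
    for user in ijson.items(f, "users.item"):
        print(f"Processing user: {user['name']}")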
Another common format for handling large datasets is Line-Delimited JSON, also known as Newline Delimited JSON (NDJSON). In this format, each line in the file is a complete, valid JSON object.
An example data.ndjson file would look like this:
{"id": 1, "event": "login", "timestamp": "2025-09-11T10:00:00Z"}
{"id": 2, "event": "click", "target": "button_a", "timestamp": "2025-09-11T10:01:15Z"}
{"id": 1, "event": "logout", "timestamp": "2025-09-11T10:05:30Z"}
This format is excellent for streaming because you can process the file line by line. Each line can be parsed independently.
Processing an NDJSON file in Python is straightforward:
import json

filename = "data.ndjson"

with open(filename, 'r') as f:
    for line in f:
        try:
            # Each line is a separate JSON object
            event_data = json.loads(line)
            print(f"Processed event: {event_data['event']} for user {event_data['id']}")
        except json.JSONDecodeError:
            print(f"Skipping malformed line: {line.strip()}")
This method is not only memory-efficient but also robust, as a malformed line doesn’t prevent the rest of the file from being processed.
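Producing NDJSON is just as simple: serialize each record compactly and write it followed by a newline. A minimal sketch, assuming a hypothetical events list and output file events.ndjson:

import json

events = [
    {"id": 1, "event": "login"},
    {"id": 2, "event": "click", "target": "button_a"},
]

# Write one compact JSON object per line
with open("events.ndjson", "w", encoding="utf-8") as f:
    for event in events:
        f.write(json.dumps(event) + "\n")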
While Python’s built-in json module is sufficient for many use cases, several third-party libraries offer improved performance and additional features.
| Library | Key Features | Best For |
|---|---|---|
| orjson | Very high performance; serializes additional types (datetimes, UUIDs, dataclasses); produces compact, UTF-8 binary output. | Performance-critical applications, web APIs. |
| simplejson | The original library the standard json module was based on; often faster and updated more frequently with new features. | A drop-in replacement for json with potential performance gains. |
| rich | Not a parser, but provides beautiful, syntax-highlighted pretty-printing of JSON in the terminal. | Enhancing debuggability and readability during development. |
orjson: The High-Performance Choice

orjson is a JSON library for Python that is significantly faster than the standard json module. It is written in Rust and designed for performance.
First, install orjson:
pip install orjson
Using orjson is similar to the built-in module, but it serializes to bytes rather than str.
import orjson
from datetime import datetime, timezone

data = {
    "name": "Project X",
    "deadline": datetime(2026, 1, 1, tzinfo=timezone.utc),
    "status": "active"
}

# orjson handles datetime objects automatically
json_bytes = orjson.dumps(data)
print(json_bytes)
print(orjson.loads(json_bytes))
Output:
b'{"name":"Project X","deadline":"2026-01-01T00:00:00+00:00","status":"active"}'
{'name': 'Project X', 'deadline': '2026-01-01T00:00:00+00:00', 'status': 'active'}
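orjson can also pretty-print, though it only supports a fixed two-space indent via an option flag rather than an indent parameter. A minimal sketch reusing the data dictionary above (and assuming a reasonably recent orjson release that exposes orjson.OPT_INDENT_2):

# Pretty-print with two-space indentation (the only indent width orjson supports)
pretty_bytes = orjson.dumps(data, option=orjson.OPT_INDENT_2)
print(pretty_bytes.decode())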
simplejson: A Feature-Rich Alternative

simplejson is the external library that the json module was originally based on. It is still actively developed and sometimes includes features or performance optimizations before they make it into the standard library.
Install simplejson:
pip install simplejson
Its usage is identical to the json module. You can use it as a drop-in replacement.
import simplejson as json
data = {"key": "value"}
print(json.dumps(data))
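One commonly cited example of simplejson's extra features is native Decimal support: it can serialize decimal.Decimal values as exact JSON numbers, which the standard json module rejects with a TypeError unless you supply a default handler. A minimal sketch, assuming a recent simplejson release where use_decimal is enabled by default:

import simplejson
from decimal import Decimal

# Decimal values are emitted as exact JSON numbers, not strings
print(simplejson.dumps({"price": Decimal("19.99")}))  # {"price": 19.99}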
rich: For Beautiful Terminal Output

When you're debugging or inspecting JSON data in a terminal, readability is key. The rich library excels at producing beautifully formatted and syntax-highlighted output for various data types, including JSON.
Install rich:
pip install rich
To pretty-print JSON with rich, pass a JSON string to Console.print_json().
import json
from rich.console import Console

data = {
    "name": "John Doe",
    "age": 30,
    "isStudent": False,
    "courses": [
        {"title": "History", "credits": 3},
        {"title": "Math", "credits": 4}
    ]
}

console = Console()
json_string = json.dumps(data)

# Print the JSON with syntax highlighting
console.print_json(json_string)
This will produce a color-coded, indented output in your terminal, making nested structures much easier to read than with the standard print() function.
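If you already have a Python object rather than a string, you can skip the intermediate json.dumps() call: Console.print_json() also accepts a data keyword argument and serializes the object itself. A minimal sketch (indent is an optional keyword):

from rich.console import Console

console = Console()

# Pass a Python object directly; rich serializes and highlights it
console.print_json(data={"name": "John Doe", "age": 30}, indent=4)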
We previously saw how the default parameter in json.dumps() can help serialize custom objects. For more complex scenarios, especially when you also need custom deserialization logic, creating custom encoder and decoder classes provides a more structured, object-oriented approach.
Subclassing JSONEncoder

Subclassing json.JSONEncoder allows you to create a reusable encoder for your custom objects. You only need to override the default() method. This approach encapsulates the serialization logic within a class, which is cleaner than a standalone function if you have multiple custom types.
Let’s refactor our earlier example to use a custom encoder.
import json
from datetime import datetime

class User:
    def __init__(self, name, registered_at):
        self.name = name
        self.registered_at = registered_at

class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, User):
            return {
                "name": obj.name,
                "registered_at": obj.registered_at.isoformat(),
                "__class__": "User"
            }
        # Let the base class default method raise the TypeError
        return super().default(obj)

user = User("Jane Doe", datetime.now())

# Use the cls parameter to specify the custom encoder
json_string = json.dumps(user, cls=CustomEncoder, indent=4)
print(json_string)
This produces the same output as before but packages the logic into a reusable CustomEncoder class.
Custom Decoding with object_hook and JSONDecoder

Deserialization is the process of converting a JSON string back into a Python object. To reconstruct your custom objects, you can use the object_hook parameter in json.loads() or create a custom JSONDecoder subclass.
The object_hook is a function that gets called with the result of any object literal decoded (a dict). It can then transform this dictionary into a different object.
Let’s use the __class__ key we added during encoding to identify and reconstruct our User object.
import json
from datetime import datetime

# Assume User and CustomEncoder classes are defined as above

def from_json_object(dct):
    """Object hook to decode custom objects."""
    if "__class__" in dct and dct["__class__"] == "User":
        return User(name=dct["name"], registered_at=datetime.fromisoformat(dct["registered_at"]))
    return dct

json_string = """
{
    "name": "Jane Doe",
    "registered_at": "2025-09-11T15:03:18.673824",
    "__class__": "User"
}
"""

# Use object_hook to deserialize the string into a User object
user_object = json.loads(json_string, object_hook=from_json_object)

print(type(user_object))
print(user_object.name)
print(user_object.registered_at)
Output:
<class '__main__.User'>
Jane Doe
2025-09-11 15:03:18.673824
This demonstrates how object_hook successfully converted the dictionary back into an instance of our User class. Creating a full JSONDecoder subclass is also possible but is often unnecessary, as object_hook handles most use cases with less boilerplate code.
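Putting the two halves together, the encoder and the hook give you a full round trip: serialize a User with CustomEncoder, then rebuild an equivalent User from the resulting string with from_json_object. A minimal sketch reusing the classes and functions defined above:

# Round trip: User -> JSON string -> User
original = User("Jane Doe", datetime.now())
encoded = json.dumps(original, cls=CustomEncoder)
decoded = json.loads(encoded, object_hook=from_json_object)

print(decoded.name == original.name)                    # True
print(decoded.registered_at == original.registered_at)  # True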
When you make an API request, you often get a single, long line of JSON response to save bandwidth. This is incredibly difficult for humans to read and understand, especially for complex or deeply nested data. Pretty-printing transforms this unreadable string into a structured, indented, and human-readable format, making it far easier to identify correct data, missing fields, or unexpected errors.
Here’s an example of how to fetch an API response and pretty-print its content. We’re using the JSONPlaceholder API for testing.
import requests
import json

url = "https://jsonplaceholder.typicode.com/posts/1"
response = requests.get(url)

if response.status_code == 200:
    data = response.json()
    print(json.dumps(data))
    print(json.dumps(data, indent=2))
else:
    print(f"Error: {response.status_code}")
This will print the following output:
{"userId": 1, "id": 1, "title": "sunt aut facere repellat provident occaecati excepturi optio reprehenderit", "body": "quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"}
{
  "userId": 1,
  "id": 1,
  "title": "sunt aut facere repellat provident occaecati excepturi optio reprehenderit",
  "body": "quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"
}
As you can see, the pretty printed JSON is easier to read and debug.
Traditional log messages are often plain text. When you need to log complex events, user actions, or system states that are best represented as JSON (e.g., a full request payload, an error context, or a processed data record), logging it as a single, unformatted string makes logs difficult to read, parse, and analyze. Pretty-printing JSON within your logs makes them immediately understandable, especially when manually sifting through log files or using log aggregation tools that might not automatically format JSON.
This approach is suitable for local development. However, in production environments, it's better to use a log management system (e.g., ELK, Splunk, Datadog) to parse the data into searchable fields, which is more efficient for large volumes and automated analysis.
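A lightweight way to do this during development is to log the pretty-printed string with the standard logging module. A minimal sketch with a hypothetical payload dict:

import json
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

payload = {"user_id": 42, "action": "checkout", "items": [{"sku": "A1", "qty": 2}]}

# Pretty-print the payload into the log message for easier manual inspection
logger.debug("Received payload:\n%s", json.dumps(payload, indent=2, sort_keys=True))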
Configuration files are the backbone of many applications, defining settings, database connections, API keys, and more. While many are manually written and thus already formatted, pretty-printing JSON becomes invaluable in several scenarios, for example when configuration is generated or updated programmatically (see the sketch below).
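For instance, when your application writes its own settings back to disk, passing indent and sort_keys to json.dump() keeps the file diff-friendly and easy to review. A minimal sketch with a hypothetical config dict and config.json path:

import json

config = {
    "debug": False,
    "database": {"host": "localhost", "port": 5432},
    "api_keys": {"weather": "CHANGE_ME"},
}

# Sorted keys and indentation keep generated config files stable and readable
with open("config.json", "w", encoding="utf-8") as f:
    json.dump(config, f, indent=4, sort_keys=True)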
The best and easiest way to indent JSON output in Python is by using the indent parameter in the json.dumps() function.
import json
data = {"name": "Alice", "age": 30, "hobbies": ["reading", "chess", "hiking"]}
# Indent JSON output by 4 spaces
json_string = json.dumps(data, indent=4)
print(json_string)
What is the difference between json.dumps() and pprint?

json.dumps() converts Python objects (like a dict or list) to a JSON-formatted string; use it when you want to serialize Python data into a valid JSON string. pprint() pretty-prints any Python data structure for readability and is usually used for debugging or displaying nested Python objects in a readable format.
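The practical difference is easiest to see side by side: pprint keeps Python syntax (single quotes, True/False), while json.dumps() emits valid JSON. A minimal sketch:

import json
from pprint import pprint

data = {"name": "Alice", "active": True, "scores": [1, 2, 3]}

pprint(data)                       # {'active': True, 'name': 'Alice', 'scores': [1, 2, 3]}
print(json.dumps(data, indent=2))  # valid JSON: double quotes, true instead of True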
There are several tools to automatically pretty-print JSON in Python scripts. Here are a few options:
You can use json.tool in the terminal to pretty-print JSON from a file or standard input:
python -m json.tool input.json
jq is another powerful and fast tool for formatting and querying JSON:
jq . input.json
Note that jq is a standalone command-line tool, not a Python library: install it with your system's package manager (for example, apt install jq on Debian/Ubuntu or brew install jq on macOS) rather than with pip.
Many editors like VS Code, PyCharm, and Sublime Text have built-in or plugin-based JSON formatters that you can use.
When should I use orjson instead of the built-in json module?

You should switch to orjson when performance is a critical factor. orjson is significantly faster than the standard json library, making it the ideal choice for high-throughput applications like web APIs, data processing pipelines, or any system where serialization and deserialization speed is a bottleneck. Additionally, orjson natively supports types like datetime and dataclasses, which can simplify your code by eliminating the need for custom handlers.
While debugging is the most common use case, pretty-printing JSON is valuable in several other scenarios. It is essential for creating human-readable configuration files (.json), generating clear and understandable API documentation with example responses, and for logging structured data in a way that is easy for developers to inspect and analyze later. Any situation where a human needs to read or verify structured data can benefit from pretty-printing.
Pretty-printing JSON isn't merely about making your JSON data look pretty; it's a powerful technique to improve readability and enhance your debugging capabilities. With Python's json.dumps() function and the pprint module, you can quickly format output for better clarity. You can also use advanced parameters to handle non-ASCII characters, serialize custom Python objects, and fine-tune whitespace for compact output.
This extends beyond simple output, proving valuable for debugging API responses, logging structured data, and improving readability of config files. You are now equipped to tackle more advanced challenges, from processing massive JSON files with streaming parsers like ijson to creating your own classes for full control over custom data types. By exploring high-performance alternative libraries like orjson and simplejson, you can optimize your applications for speed and efficiency. It’s a small change with a big impact on your development experience, elevating your ability to manage data in any scenario.