At some point in your Python journey, you’re going to need to figure out how big a file is. Whether you’re building a file uploader, managing assets, or just doing a quick check on disk space, it’s a common task that’ll come your way. The good news is that Python makes this incredibly simple with its built-in file handling capabilities. This article will walk you through the best and most common ways to get a file’s size.
We'll start with the classic os.path.getsize() function, the go-to for a quick and direct answer. Then, we'll explore the more modern and elegant pathlib approach, which is a fantastic tool to have in your belt. We'll also cover how to handle errors gracefully when a file is missing, and finally, how to convert raw byte counts into a clean, human-readable format like "KB" or "MB".
Tested with: Python 3.12 on Ubuntu 22.04 LTS. The code samples are OS-agnostic and also work on macOS and Windows unless noted.
Prerequisites: a data/ directory with at least one file for testing. We recommend running the examples inside a Python virtual environment to avoid dependency conflicts. If you're new to venv, see How To Use Python Virtual Environments with venv (Ubuntu 22.04).
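If you'd like to follow along exactly, the short setup snippet below creates the data/ directory and the sample file used in the examples. The file name matches the examples in this article; its contents are arbitrary, so your byte counts will differ from the sample output shown later.

from pathlib import Path

# Create the data/ directory and a small sample file for the examples
Path('data').mkdir(exist_ok=True)
Path('data/my_document.txt').write_text('Hello from the file size tutorial!\n' * 12)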
Key takeaways:

- Use os.path.getsize('path/to/file') for the most direct, standard way to get a file's size in bytes.
- Prefer the modern pathlib module (Path(...).stat().st_size) for readable, object-oriented code.
- Reach for os.stat() when you need additional file metadata, such as the last modification time, in addition to the file's size.
- Wrap size lookups in a try...except block to gracefully handle potential FileNotFoundError and PermissionError.

os.path.getsize(): The Standard Way to Get File Size

For a quick and direct answer, the os.path.getsize() function is the best choice. It's part of Python's standard os module and is the most common way to get a file's size. It does one thing and does it well: it takes a path to a file and returns its size.
It’s important to remember that the function returns the size as an integer representing the number of bytes.
import os
file_path = 'data/my_document.txt'
file_size = os.path.getsize(file_path)
print(f"The file size is: {file_size} bytes")
Output:
The file size is: 437 bytes
pathlib.Path (Modern, Pythonic Approach)

Introduced in Python 3.4, the pathlib module offers a modern, object-oriented way to handle filesystem paths. If you're writing new code, this is often the recommended approach because it makes your code more readable and expressive. Instead of working with plain strings, you create a Path object that has its own methods, including one for getting file stats.
To get the size, you call the .stat() method on your Path object, which returns a result object (similar to os.stat()), and then you access its .st_size attribute.
from pathlib import Path
file_path = Path('data/my_document.txt')
file_size = file_path.stat().st_size
print(f"The file size is: {file_size} bytes")
Output:
The file size is: 437 bytes
When you need more than just the size of a file, the os.stat() function is the tool for the job. While os.path.getsize() is a convenient shortcut, os.stat() is the underlying function that retrieves a full "status" report on the file. This report is an object containing a wealth of metadata.

The file size is available via the st_size attribute of the result object. This method is perfect when you also need to know things like the file's last modification time (st_mtime) or its creation/metadata-change time (st_ctime, whose exact meaning depends on the operating system, as covered later in this guide).
import os
import datetime
file_path = 'data/my_document.txt'
stat_info = os.stat(file_path)
file_size = stat_info.st_size
mod_time_timestamp = stat_info.st_mtime
mod_time = datetime.datetime.fromtimestamp(mod_time_timestamp)
print(f"File Size: {file_size} bytes")
print(f"Last Modified: {mod_time.strftime('%Y-%m-%d %H:%M:%S')}")
Output:
File Size: 437 bytes
Last Modified: 2025-07-16 17:42:05
From the previous examples, you may have noticed that the file sizes are always returned in bytes. While getting the file size in bytes is technically accurate, a number like 1474560 doesn’t mean much to most people at a glance. Is that big? Is it small? For a better user experience, it’s essential to convert this raw byte count into a more familiar format, like kilobytes (KB), megabytes (MB), or gigabytes (GB).
This is easily done with a small helper function. The logic is simple: we repeatedly divide the number of bytes by 1024 (the number of bytes in a kilobyte) and keep track of the unit until the number is small enough to be readable.
Here is a function that handles this conversion gracefully and can be integrated directly into your code.
This function takes the size in bytes and an optional number of decimal places for formatting.
import math

def format_size(size_bytes, decimals=2):
    if size_bytes == 0:
        return "0 Bytes"
    # Define the units and the factor for conversion (1024)
    power = 1024
    units = ["Bytes", "KB", "MB", "GB", "TB", "PB"]
    # Calculate the appropriate unit
    i = int(math.floor(math.log(size_bytes, power)))
    # Format the result
    return f"{size_bytes / (power ** i):.{decimals}f} {units[i]}"
Let’s use it in an example:
import os
file_path = 'data/large_file.zip'
raw_size = os.path.getsize(file_path)
readable_size = format_size(raw_size)
print(f"Raw size: {raw_size} bytes")
print(f"Human-readable size: {readable_size}")
Output:
Raw size: 1474560 bytes
Human-readable size: 1.41 MB
By integrating a simple function like this, you can make your program’s output significantly more intuitive and professional.
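One caveat worth knowing: the helper above uses binary units (1 KB = 1024 bytes), while some tools, such as macOS Finder and most drive manufacturers, report sizes in decimal units (1 kB = 1000 bytes). If you need output that matches those tools, a variant of the same helper works; the function name format_size_si below is just an illustrative sketch, not part of any library.

import math

def format_size_si(size_bytes, decimals=2):
    # Same idea as format_size(), but with decimal (SI) units: 1 kB = 1000 bytes
    if size_bytes == 0:
        return "0 Bytes"
    power = 1000
    units = ["Bytes", "kB", "MB", "GB", "TB", "PB"]
    i = int(math.floor(math.log(size_bytes, power)))
    return f"{size_bytes / (power ** i):.{decimals}f} {units[i]}"

print(format_size_si(1474560))  # 1.47 MB (vs. 1.41 MB with 1024-based units)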
In a perfect world, every file path would be correct and every file accessible. But in reality, things go wrong. Your script might try to access a file that has been moved, or it might not have the permissions to read it. Without proper error handling, these situations will crash your program. A robust script anticipates these issues and handles them gracefully.
Let’s see how to handle the most common errors you’ll encounter when getting a file’s size.
FileNotFoundError (Missing Files)

This is the most common error you'll face. It occurs when you try to get the size of a file that doesn't exist at the specified path. Wrapping your code in a try...except FileNotFoundError block is the standard way to manage this. Learn more about Python exception handling for robust error management.
import os
file_path = 'path/to/non_existent_file.txt'
try:
file_size = os.path.getsize(file_path)
print(f"File size: {file_size} bytes")
except FileNotFoundError:
print(f"Error: The file at '{file_path}' was not found.")
PermissionError (Access Denied)

Sometimes the file exists, but your script doesn't have the necessary operating system permissions to read it or its metadata. This will raise a PermissionError. You can catch this error specifically to give a more informative message to the user.
import os
file_path = '/root/secure_file.dat'
try:
file_size = os.path.getsize(file_path)
print(f"File size: {file_size} bytes")
except FileNotFoundError:
print(f"Error: The file at '{file_path}' was not found.")
except PermissionError:
print(f"Error: Insufficient permissions to access '{file_path}'.")
Symbolic links (or symlinks) are pointers to other files. What happens if a symlink points to a file that has been deleted? The link itself exists, but it's "broken." Calling os.path.getsize() on a broken symlink will raise an OSError.
A good practice is to first check if the path is a link and then resolve its actual path before getting the size. This approach helps avoid issues with broken symbolic links that can cause unexpected errors.
import os
symlink_path = 'data/broken_link.txt'
try:
file_size = os.path.getsize(symlink_path)
print(f"File size: {file_size} bytes")
except FileNotFoundError:
print(f"Error: The file pointed to by '{symlink_path}' was not found.")
except OSError as e:
print(f"OS Error: Could not get size for '{symlink_path}'. It may be a broken link. Details: {e}")
Note: A broken symlink might raise FileNotFoundError on some operating systems.
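If you prefer the check-first approach described above, a minimal sketch looks like this. The link name data/report_link.txt is just an illustrative placeholder.

import os

symlink_path = 'data/report_link.txt'

if os.path.islink(symlink_path):
    target = os.path.realpath(symlink_path)  # resolve the link to its actual target path
    if os.path.exists(target):
        print(f"Target size: {os.path.getsize(target)} bytes")
    else:
        print(f"'{symlink_path}' is a broken symlink (target missing).")
else:
    print(f"'{symlink_path}' is not a symbolic link.")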
By catching these specific exceptions, you make your code more resilient and user-friendly, providing clear feedback when something goes wrong instead of just crashing.
Method | Returns | Best for | Notes |
---|---|---|---|
os.path.getsize(path) | Integer bytes | Fast, minimal call when you only need size | Thin wrapper over stat(); no extra metadata. |
os.stat(path).st_size | Integer bytes (via struct) | Getting size along with other metadata (mtime, mode, etc.) | One system call; exposes full stat_result. |
Path(path).stat().st_size | Integer bytes (via struct) | Modern, readable code using pathlib | Negligible overhead; integrates well with Path APIs. |
Method | Pattern | Best for | Notes |
---|---|---|---|
os.scandir() | Imperative loop with queue/stack | Maximum throughput on large trees | Fewer syscalls via DirEntry; typically faster. |
Path(root).rglob('*') | Iterator over Path objects | Readable, concise traversal | Slight overhead for object creation; very close in practice. |
os.path.getsize() vs os.stat() vs pathlib

When you only need the size of a single file, all three approaches end up calling the same underlying system stat, so they're functionally equivalent. The micro-difference is in the Python overhead: os.path.getsize() is a thin wrapper, os.stat() returns a full struct you then read from, and pathlib.Path.stat() adds a small object-oriented layer. In practice the gap is tiny (microseconds), but it can matter in tight loops.
When you need the total size of a directory tree, the filesystem traversal dominates. Here the choice between os.scandir() (imperative style) and pathlib.Path.rglob() (iterator style) is more impactful than the choice between getsize() and stat().

The snippet below measures each API by calling it many times on the same file (to isolate Python-level overhead). Run it in a directory that has a data/large_file.bin.
import os
from pathlib import Path
import time
TEST_FILE = Path('data/large_file.bin')
N = 200_000 # increase/decrease based on your machine
# Warm-up (prime filesystem caches)
for _ in range(5_000):
os.path.getsize(TEST_FILE)
start = time.perf_counter()
for _ in range(N):
os.path.getsize(TEST_FILE)
getsize_s = time.perf_counter() - start
start = time.perf_counter()
for _ in range(N):
os.stat(TEST_FILE).st_size
stat_s = time.perf_counter() - start
start = time.perf_counter()
for _ in range(N):
TEST_FILE.stat().st_size
pathlib_s = time.perf_counter() - start
print(f"getsize() : {getsize_s:.3f}s for {N:,} calls")
print(f"os.stat() : {stat_s:.3f}s for {N:,} calls")
print(f"Path.stat(): {pathlib_s:.3f}s for {N:,} calls")
Interpretation: Expect getsize() and os.stat() to be neck-and-neck, with Path.stat() close behind. If you're writing new code, prefer pathlib for readability unless you're inside a hot loop where the last few microseconds truly matter.
Tip: You can also use the built-in timeit module for more formal micro-benchmarks:

import timeit, os
from pathlib import Path

p = Path('data/large_file.bin')
print('getsize  :', timeit.timeit(lambda: os.path.getsize(p), number=200_000))
print('os.stat  :', timeit.timeit(lambda: os.stat(p).st_size, number=200_000))
print('Path.stat:', timeit.timeit(lambda: p.stat().st_size, number=200_000))
Below are two equivalent implementations that sum sizes for all regular files under a root directory. This is a more realistic scenario where traversal cost dominates.
Using os.scandir() (fast, imperative):
import os
from collections import deque
def du_scandir(root: str) -> int:
total = 0
dq = deque([root])
while dq:
path = dq.popleft()
with os.scandir(path) as it:
for entry in it:
try:
if entry.is_file(follow_symlinks=False):
total += entry.stat(follow_symlinks=False).st_size
elif entry.is_dir(follow_symlinks=False):
dq.append(entry.path)
except (PermissionError, FileNotFoundError):
# Skip unreadable or concurrently-removed entries
continue
return total
Using pathlib (readable, expressive):
from pathlib import Path
def du_pathlib(root: str) -> int:
p = Path(root)
total = 0
for child in p.rglob('*'):
try:
if child.is_file():
total += child.stat().st_size
except (PermissionError, FileNotFoundError):
continue
return total
Timing the directory methods with timeit:
import timeit
print('scandir:', timeit.timeit(lambda: du_scandir('data'), number=10))
print('pathlib:', timeit.timeit(lambda: du_pathlib('data'), number=10))
Interpretation: On most systems, os.scandir() is often a bit faster because it exposes low-level DirEntry attributes and reduces extra system calls. pathlib typically trades a small amount of speed for clarity. For very large trees or tight SLAs, use os.scandir(); for maintainable application code, pathlib is usually preferred.
- os.path.getsize(): Fast, minimal wrapper; fine for quick scripts and tight loops.
- os.stat(): Use when you also need other metadata (mtime, mode) in the same call.
- pathlib.Path.stat(): Prefer for new code for readability and cross-platform path handling; overhead is negligible outside micro-benchmarks.
- For directory totals, use os.scandir() for maximum throughput; use pathlib when code clarity and consistency matter more than micro-optimizations.

Note on caches: Re-running benchmarks on the same files/directories benefits from OS filesystem caches. If you want to compare cold-cache behavior, vary the dataset or insert unrelated I/O between runs.
While the examples in this guide are portable, there are important platform differences in os.stat() metadata and behavior. For comprehensive cross-platform development guidance:

- st_ctime semantics: on Windows, st_ctime is the file's creation time; on Unix-like systems it is the inode metadata change time.
- Permissions & modes: st_mode encodes rwx bits and file type; stat.S_ISDIR, stat.S_ISREG, etc. are reliable. st_uid/st_gid are present but only meaningful on Unix (Windows reports them as 0).
- Symlink handling: pass follow_symlinks=False to avoid resolving targets; use os.lstat()/Path.lstat() to stat the link itself. A broken link raises FileNotFoundError/OSError when followed. Understanding symbolic link behavior is crucial for cross-platform compatibility.
- Timestamps & precision: timestamps are exposed as floats (st_mtime) and nanoseconds (st_mtime_ns). Filesystems differ: NTFS typically ~100 ns ticks; ext4/APFS often nanosecond resolution; FAT may be coarse.
- Other practical quirks: Windows paths are limited to MAX_PATH (~260 chars) unless long paths are enabled. For sparse or compressed files, the logical size (st_size) can exceed on-disk bytes. Use du/platform APIs if you need physical allocation. For more details, see Python's path handling documentation.

Use platform checks to interpret metadata correctly and pick safe defaults for symlinks:
import os, sys, stat
from pathlib import Path
p = Path('data/example.txt')
info = p.stat() # follows symlinks
if sys.platform.startswith('win'):
created_or_changed = 'created' # st_ctime is creation time on Windows
else:
created_or_changed = 'changed' # inode metadata change time on Unix
print({'size': info.st_size, 'ctime_semantics': created_or_changed})
# If you need to stat a symlink itself (portable):
try:
link_info = os.lstat('link.txt') # or Path('link.txt').lstat()
except FileNotFoundError:
link_info = None
# When traversing trees, avoid following symlinks unless you intend to:
for entry in os.scandir('data'):
if entry.is_symlink():
continue # or handle explicitly
# Use follow_symlinks=False to be explicit:
if entry.is_file(follow_symlinks=False):
size = entry.stat(follow_symlinks=False).st_size
Guideline: Treat st_ctime as creation on Windows and metadata-change on Unix; document the distinction in user-facing output, and avoid logic that assumes a universal "created" timestamp across platforms.
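If you genuinely need a creation timestamp, one hedged approach is to probe for the attributes that may exist on a given platform: st_birthtime is available on macOS/BSD (and on Windows with newer Python versions), while on most Linux filesystems it is absent, so the sketch below falls back to st_mtime rather than pretending st_ctime is a creation time. The helper name best_effort_created is illustrative.

import os
from datetime import datetime

def best_effort_created(path):
    st = os.stat(path)
    # st_birthtime exists on macOS/BSD (and recent Python on Windows); usually missing on Linux
    ts = getattr(st, 'st_birthtime', None)
    if ts is None:
        ts = st.st_mtime  # fall back to modification time rather than misusing st_ctime
    return datetime.fromtimestamp(ts)

print(best_effort_created('data/example.txt'))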
In upload workflows, validating size before accepting a body prevents wasted bandwidth and broken UX. On the client or a pre-processing step, read the file size and reject early if it exceeds policy (e.g., 10 MB for images, 100 MB for PDFs). On the server, verify again using os.stat() or Path.stat() after writing to a temp location. Emit precise errors (limit, actual size, allowed types) and log metrics by route to identify abusive clients or misconfigured mobile apps. This approach helps prevent unrestricted file upload vulnerabilities and ensures better security.
from pathlib import Path
MAX_BYTES = 10 * 1024 * 1024 # 10 MB
p = Path('uploads/tmp/user_image.jpg')
size = p.stat().st_size
if size > MAX_BYTES:
raise ValueError(f"Payload too large: {size} > {MAX_BYTES}")
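To express the per-type policy mentioned above (e.g., 10 MB for images, 100 MB for PDFs), a small mapping keyed by file extension keeps the rule in one place. The limits, extensions, and helper name here are illustrative, not a fixed API.

from pathlib import Path

# Illustrative policy: adjust limits and extensions to your application
SIZE_LIMITS = {
    '.jpg': 10 * 1024 * 1024,   # 10 MB for images
    '.png': 10 * 1024 * 1024,
    '.pdf': 100 * 1024 * 1024,  # 100 MB for PDFs
}

def validate_upload(path: Path) -> None:
    limit = SIZE_LIMITS.get(path.suffix.lower())
    if limit is None:
        raise ValueError(f"Unsupported file type: {path.suffix}")
    size = path.stat().st_size
    if size > limit:
        raise ValueError(f"Payload too large: {size} > {limit} for {path.suffix}")

validate_upload(Path('uploads/tmp/user_image.jpg'))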
Ops teams routinely track growth of logs, caches, and user-generated content. A lightweight cron job can calculate the total size of key directories using os.scandir() for throughput, then alert when thresholds are crossed (e.g., 80% and 95% of volume capacity). Include trend deltas (day-over-day growth) to distinguish spikes from steady leaks, and exclude ephemeral paths (e.g., sockets, tmp). This guards against outages caused by full disks and gives capacity planning signals. Understanding disk space management is crucial for system administrators.
import shutil
from datetime import datetime, timezone
used = shutil.disk_usage('/')
print({
    'ts': datetime.now(timezone.utc).isoformat(),
'total': used.total,
'used': used.used,
'free': used.free,
})
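To turn this into the threshold alerting described above, compare used capacity against the 80% and 95% marks. The thresholds and the plain print statements are placeholders for whatever alerting channel you actually use.

import shutil

WARN, CRIT = 0.80, 0.95  # illustrative thresholds

usage = shutil.disk_usage('/')
used_ratio = usage.used / usage.total

if used_ratio >= CRIT:
    print(f"CRITICAL: disk {used_ratio:.0%} full")
elif used_ratio >= WARN:
    print(f"WARNING: disk {used_ratio:.0%} full")
else:
    print(f"OK: disk {used_ratio:.0%} full")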
In data ingestion, tiny files often signal corrupted shards, incomplete downloads, or unhelpful signal-to-noise. Gatekeeping by size speeds up training and reduces IO. Combine a minimum byte threshold with file-type checks to keep only useful samples. Persist stats (kept vs. skipped counts, total bytes) so pipeline runs are reproducible and auditable. Use Path.rglob() for readability in research code; switch to os.scandir() if throughput becomes a bottleneck.
from pathlib import Path
MIN_BYTES = 8 * 1024 # skip files smaller than 8KB
kept, skipped = 0, 0
for f in Path('data/train').rglob('*.jsonl'):
try:
if f.stat().st_size >= MIN_BYTES:
kept += 1
else:
skipped += 1
except FileNotFoundError:
continue
print({'kept': kept, 'skipped': skipped})
On legacy 32-bit systems, especially with older Python versions or C libraries, file size APIs may incorrectly report sizes above 2GB or 4GB due to integer overflows or filesystem limitations. Modern Python builds usually handle this transparently, but you should verify that st_size returns the full 64-bit value rather than a truncated one. Always test large media (e.g., videos or compressed archives) in deployment environments that still run on 32-bit platforms or embedded devices.
import os
size = os.stat('data/huge_video.mkv').st_size
print(f"Size in GB: {size / (1024 ** 3):.2f} GB")
Getting the total size of a directory (especially with nested folders) is not as simple as getsize(). You must walk the entire directory tree and sum each file. Use os.walk() or pathlib.Path.rglob() depending on whether you want full control or expressive iteration. Consider skipping symbolic links to avoid infinite loops and guard with try/except in case of permission issues.
import os
def get_total_size(path):
total = 0
for dirpath, _, filenames in os.walk(path):
for f in filenames:
try:
fp = os.path.join(dirpath, f)
if not os.path.islink(fp):
total += os.path.getsize(fp)
except (FileNotFoundError, PermissionError):
continue
return total
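Usage, combined with the format_size() helper from earlier in this guide:

total_bytes = get_total_size('data')
print(f"Total size of 'data/': {total_bytes} bytes ({format_size(total_bytes)})")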
When working with files on NFS, SMB, or cloud-mounted volumes (e.g., Dropbox, EFS, or Google Drive FUSE mounts), file metadata calls like stat() can have much higher latency and looser consistency than local disk. File size values may lag behind actual content or fail on disconnected mounts. To make scripts resilient, cache metadata when possible, retry transient failures, and test the mount type before assuming os.path.getsize() will behave like local filesystems.
import os
try:
size = os.path.getsize('/mnt/nfs_share/data.csv')
print(f"Size: {size} bytes")
except (OSError, TimeoutError) as e:
print(f"NFS access failed: {e}")
The table below consolidates the edge cases with descriptive notes, providing a quick yet detailed reference. Each cell includes not only the core idea but also the context of why it matters in production environments.
Edge Case | Description |
---|---|
Large files on 32-bit systems | File size reporting may fail on older 32-bit builds due to integer overflow, causing incorrect values for files larger than 2GB or 4GB. Although modern 64-bit Python generally resolves this, teams maintaining embedded devices or legacy environments must test with realistic datasets like large videos or archives. Always validate that st_size returns 64-bit integers and consider explicit error handling for overflow risks. |
Recursively walking directory size | Calculating total size for a folder is non-trivial because directories only contain entries, not aggregated file sizes. Scripts must walk the entire tree, summing each file with functions like os.walk() or pathlib.Path.rglob() . Recursive walking must also defend against symlink loops, permission errors, and transient missing files. Properly implemented, this method provides accurate metrics for disk usage reports, backups, or user quota enforcement, even at scale. |
Network-mounted files | Files residing on NFS, SMB, or cloud-mounted volumes often show high latency for metadata calls such as stat() . Unlike local disks, results may be inconsistent if synchronization is delayed, and failures may occur if mounts are disconnected. Scripts should be robust by retrying operations, caching metadata when acceptable, and providing clear error feedback. Understanding this distinction is vital when deploying to hybrid environments where both local and remote files coexist. |
In many ML tasks, extremely small or extremely large files can degrade training quality or slow throughput. For example, corrupted JSONL shards may be only a few bytes, while runaway data exports can be multi‑GB and exceed GPU memory budgets at load time. Add a size gate to your dataset loader so that only samples within an expected range are passed to the training job. Persist counters for kept/skipped files and emit Prometheus‑friendly metrics to correlate model performance with data hygiene. During hyperparameter sweeps or A/B runs, log thresholds alongside experiment IDs so results are reproducible.
from pathlib import Path
MIN_B = 4 * 1024 # 4KB: likely non-empty JSONL row/chunk
MAX_B = 200 * 1024**2 # 200MB: cap to protect RAM/VRAM
kept, skipped = 0, 0
valid_paths = []
for f in Path('datasets/train').rglob('*.jsonl'):
try:
s = f.stat().st_size
if MIN_B <= s <= MAX_B:
valid_paths.append(f)
kept += 1
else:
skipped += 1
except (FileNotFoundError, PermissionError):
skipped += 1
print({'kept': kept, 'skipped': skipped, 'ratio': kept / max(1, kept + skipped)})
# pass valid_paths to your DataLoader / tf.data pipeline
Why this matters: Size filters remove low‑signal noise and protect downstream memory use. They’re fast (metadata only) and complement content‑aware validation (schema checks, row counts) without expensive parsing.
Production systems generate logs, traces, and checkpoints that can balloon storage. Use an automation tool such as n8n to orchestrate periodic scans and decisions. A simple workflow: (1) cron trigger in n8n, (2) run a Python script that enumerates log directories and emits a JSON list of files over a threshold (e.g., 500MB), (3) optional LLM step in n8n to classify files into delete, archive, retain based on filename, age, and service, (4) execute delete/move actions with audit logging. Keep the Python side deterministic; keep the “policy” flexible in n8n so ops can adjust thresholds without redeploying code.
# emit JSON for n8n to consume
import os, json, time
THRESHOLD = 500 * 1024**2 # 500 MB
ROOTS = ['/var/log/myapp', '/var/log/nginx']
candidates = []
now = time.time()
for root in ROOTS:
for dirpath, _, files in os.walk(root):
for name in files:
fp = os.path.join(dirpath, name)
try:
st = os.stat(fp)
if st.st_size >= THRESHOLD:
candidates.append({
'path': fp,
'size_bytes': st.st_size,
'mtime': st.st_mtime,
'age_days': (now - st.st_mtime) / 86400,
})
except (FileNotFoundError, PermissionError):
continue
print(json.dumps({'candidates': candidates}))
Why this matters: You separate concerns—Python handles fast file system introspection; n8n handles policy, approvals, and notifications. This reduces toil, prevents full disks, and creates an auditable trail for compliance.
Ingest pipelines (Apache Kafka consumers, S3 batch pulls, BigQuery exports) benefit from pre-parse size checks to short-circuit bad inputs and protect memory. For streaming, apply a size guard per message/blob before deserialization; for batch, annotate manifests with st_size and reject outliers or quarantine them for review. Publish metrics (p50/p95/p99 sizes) to catch regressions when an upstream service starts emitting unexpectedly large payloads. Couple size thresholds with backoff/retry so transient spikes don't trigger false positives.
# Example: pre-parse guard in a consumer loop
import os
def accept(path: str, min_b=1_024, max_b=512 * 1024**2):
try:
s = os.stat(path).st_size
return min_b <= s <= max_b
except FileNotFoundError:
return False
for blob_path in get_next_blobs(): # your iterator
if not accept(blob_path):
quarantine(blob_path) # move aside, alert, and continue
continue
process(blob_path) # safe to parse and load
Why this matters: Early size validation protects parsers, keeps consumer lag under control, and makes capacity predictable. It also produces actionable telemetry so data teams can negotiate contracts and SLAs with upstream producers.
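For the percentile metrics mentioned above (p50/p95/p99), the standard library's statistics module is enough for a simple sketch. The sizes list here is a placeholder for byte counts you would collect in the consumer loop.

import statistics

sizes = [512, 12_288, 48_560, 1_048_576, 3_400_000]  # placeholder byte counts, one per processed blob

quantiles = statistics.quantiles(sizes, n=100)  # 99 cut points
p50, p95, p99 = quantiles[49], quantiles[94], quantiles[98]
print({'p50_bytes': p50, 'p95_bytes': p95, 'p99_bytes': p99})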
The most straightforward way to get the size of a file in Python is by using the os.path.getsize() function. This function is part of the built-in os module and returns the size of the file in bytes. Here's a quick example:
import os
file_size = os.path.getsize('data/example.txt')
print(f"File size: {file_size} bytes")
This method works well for most use cases where you just need a fast and simple byte count.
What is the difference between os.path.getsize() and os.stat()?

While both functions return file size, they serve different purposes. os.path.getsize() is a convenience function that returns only the size in bytes. In contrast, os.stat() provides a full status object (stat_result) that includes various metadata such as:
- st_size: file size in bytes
- st_mtime: last modification time
- st_ctime: creation time (or metadata change time, depending on the OS)

Example:
import os
stat = os.stat('data/example.txt')
print(f"Size: {stat.st_size} bytes, Last Modified: {stat.st_mtime}")
Use os.stat() when you need more than just the size.
Should I use pathlib instead of os.path for file size?

Yes, especially in modern Python code (version 3.4 and above). The pathlib module provides an object-oriented interface for file system operations. It improves readability and is considered more Pythonic.
Instead of working with plain strings, you work with Path objects:
from pathlib import Path
file_path = Path('data/example.txt')
file_size = file_path.stat().st_size
This approach is cross-platform, cleaner, and integrates well with other modern Python features.
Raw byte counts can be hard to interpret, especially for larger files. To display sizes in a human-readable format, you can use a helper function that divides the size by 1024 repeatedly and appends the correct unit:
import math

def format_size(size_bytes, decimals=2):
    if size_bytes == 0:
        return "0 Bytes"
    power = 1024
    units = ["Bytes", "KB", "MB", "GB", "TB", "PB"]
    i = int(math.floor(math.log(size_bytes, power)))
    return f"{size_bytes / (power ** i):.{decimals}f} {units[i]}"
Using this function, 1474560 bytes would become 1.41 MB, which is much more user-friendly.
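For example:

print(format_size(1474560))  # 1.41 MB
print(format_size(532))      # 532.00 Bytes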
If the file path is incorrect or the file doesn't exist, Python will raise a FileNotFoundError. If the file exists but your script doesn't have permission to access it, a PermissionError is raised. To prevent your program from crashing, wrap the operation in a try...except block. For comprehensive error handling, refer to Python's exception hierarchy:
import os

try:
size = os.path.getsize('some/file.txt')
except FileNotFoundError:
print("The file does not exist.")
except PermissionError:
print("You do not have permission to access this file.")
This ensures your program handles errors gracefully and provides helpful feedback. For production applications, consider implementing proper logging to track these errors and monitor system health.
If the symbolic link points to a valid file, os.path.getsize() will return the size of the target file. However, if the symlink is broken (i.e., the target no longer exists), calling this function will raise a FileNotFoundError or OSError, depending on the operating system. Understanding symbolic link behavior is essential for robust file handling.
To avoid this, you can check if the path is a symlink and whether its target exists:
import os
if os.path.islink('link.txt') and os.path.exists('link.txt'):  # exists() follows the link, so it is False if the target is missing
size = os.path.getsize('link.txt')
else:
print("Broken symbolic link or target not found.")
This way, you can handle broken symlinks gracefully.
You now know how to get a file's size in Python using the direct os.path.getsize(), the modern pathlib module, or the more detailed os.stat() function. We also covered how to handle errors and convert byte counts into a human-readable format. While the simpler methods work well, remember that pathlib is the recommended standard for writing robust, maintainable code.
To build on these skills, you can explore how to handle plain text files to read and write data or build a complete command-line utility by handling user arguments.