Tutorial

How to Read Large Text Files in Python

Published on August 4, 2022
author

Pankaj

How to Read Large Text Files in Python

Python File object provides various ways to read a text file. The popular way is to use the readlines() method that returns a list of all the lines in the file. However, it’s not suitable to read a large text file because the whole file content will be loaded into the memory.

Reading Large Text Files in Python

We can use the file object as an iterator. The iterator will return each line one by one, which can be processed. This will not read the whole file into memory and it’s suitable to read large files in Python. Here is the code snippet to read large file in Python by treating it as an iterator.

import resource
import os

file_name = "/Users/pankaj/abcdef.txt"

print(f'File Size is {os.stat(file_name).st_size / (1024 * 1024)} MB')

txt_file = open(file_name)

count = 0

for line in txt_file:
    # we can process file line by line here, for simplicity I am taking count of lines
    count += 1

txt_file.close()

print(f'Number of Lines in the file is {count}')

print('Peak Memory Usage =', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
print('User Mode Time =', resource.getrusage(resource.RUSAGE_SELF).ru_utime)
print('System Mode Time =', resource.getrusage(resource.RUSAGE_SELF).ru_stime)

When we run this program, the output produced is:

File Size is 257.4920654296875 MB
Number of Lines in the file is 60000000
Peak Memory Usage = 5840896
User Mode Time = 11.46692
System Mode Time = 0.09655899999999999
Python Read Large Text File
Python Read Large Text File
  • I am using os module to print the size of the file.
  • The resource module is used to check the memory and CPU time usage of the program.

We can also use with statement to open the file. In this case, we don’t have to explicitly close the file object.

with open(file_name) as txt_file:
    for line in txt_file:
        # process the line
        pass

What if the Large File doesn’t have lines?

The above code will work great when the large file content is divided into many lines. But, if there is a large amount of data in a single line then it will use a lot of memory. In that case, we can read the file content into a buffer and process it.

with open(file_name) as f:
    while True:
        data = f.read(1024)
        if not data:
            break
        print(data)

The above code will read file data into a buffer of 1024 bytes. Then we are printing it to the console. When the whole file is read, the data will become empty and the break statement will terminate the while loop. This method is also useful in reading a binary file such as images, PDF, word documents, etc. Here is a simple code snippet to make a copy of the file.

with open(destination_file_name, 'w') as out_file:
    with open(source_file_name) as in_file:
        for line in in_file:
            out_file.write(line)

Reference: StackOverflow Question

Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.

Learn more about our products

About the authors
Default avatar
Pankaj

author

While we believe that this content benefits our community, we have not yet thoroughly reviewed it. If you have any suggestions for improvements, please let us know by clicking the “report an issue“ button at the bottom of the tutorial.

Still looking for an answer?

Ask a questionSearch for more help

Was this helpful?
 
JournalDev
DigitalOcean Employee
DigitalOcean Employee badge
December 16, 2019

Hi Pankaj, I have requirement to find string in a file size upto 5GB, how can be achieved. Thanks Hari

- hari

    Try DigitalOcean for free

    Click below to sign up and get $200 of credit to try our products over 60 days!

    Sign up

    Join the Tech Talk
    Success! Thank you! Please check your email for further details.

    Please complete your information!

    Become a contributor for community

    Get paid to write technical tutorials and select a tech-focused charity to receive a matching donation.

    DigitalOcean Documentation

    Full documentation for every DigitalOcean product.

    Resources for startups and SMBs

    The Wave has everything you need to know about building a business, from raising funding to marketing your product.

    Get our newsletter

    Stay up to date by signing up for DigitalOcean’s Infrastructure as a Newsletter.

    New accounts only. By submitting your email you agree to our Privacy Policy

    The developer cloud

    Scale up as you grow — whether you're running one virtual machine or ten thousand.

    Get started for free

    Sign up and get $200 in credit for your first 60 days with DigitalOcean.*

    *This promotional offer applies to new accounts only.