An SQL injection takes advantage of lax coding habits. It's an attack whereby a hostile user submits code to one of your web forms, instead of whatever data you were trying to collect. The hostile code either queries your database in a way you don't expect, or breaks out of your web application and performs operations directly on your cloud server. It's surprisingly easy to launch an SQL injection attack against an unsuspecting site. Guarding against such an attack should be one of your top priorities from day one.
SQL attacks always take the form of a user-submitted string that contains two parts. The first part is a guess at how to safely terminate a command your code is trying to perform; the second part is the hostile code that the attacker wishes to run on your VPS. Here's an example of a user-submitted string designed to take advantage of laxness in your code:
x' AND user.email IS NULL; --
This looks almost like something you might have written yourself - and that's the point. The user is hoping that your code will take this string, and use it in an SQL query that looks something like this:
SELECT email, passwd FROM user
WHERE email = 'x' AND user.email IS NULL; --';
This may not seem to do much, but depending on how your application responds, it could inform the attacker that they've correctly guessed the relevant table name. After that, there are further attacks that can start to reveal more information, such as usernames and passwords.
The key detail to understand is that the hostile code attempts to close your SQL quote, or in some other way break out of the constraints of your SQL query, and either request or insert data of its own.
Not all SQL injection attacks require closing quotes. If your code performs a query on a number, you wouldn't put that data in quotes. This leaves you vulnerable to the following type of injection:
2097 OR 1=1
The attacker hopes your application will do something like the following:
SELECT somedata FROM yourtable WHERE userid = 2097 OR 1=1;
Your code intended the above SQL query to return data only if the userid matched that of the correct user. But the SQL injection causes the query to always return the data.
The point is not the exact behavior of any particular SQL injection. The most important thing to take away from this is the common characteristic to all SQL injections: an attempt to guess how to terminate one part of a query, and initiate another part that you didn't expect. That's the signature of all SQL injections. And it's how to fight them.
In your code, the best place to track down all the possible entry-points for an SQL injection attack is not by looking for your HTML forms. Sure, you could catch a bunch that way, but there are other ways the user can input data, including in the URL, or through one of your AJAX interfaces.
No, the best place to look is right at the point of vulnerability - the SQL query itself. Probably all your queries are performed using the same basic command - or perhaps two or three different possible commands. Just search for those in your codebase, and you'll find every point of vulnerability very quickly. For example, if your code uses Perl, all your queries may take a form similar to this:
$result = mysql_query($sql)
In that case, you could quickly find all points of vulnerability from the command-line, using a command similar to this:
$ grep -R mysql_query *
There are a variety of techniques people use to prevent SQL injection attacks, but your front-line defense should be to sanitize all user input. You must never assume that the user will submit data in the format you desire. In fact, you should assume just the opposite - that they want to submit just the right string to tear your code apart.
Sanitizing input means that all user-submitted strings are tested to make sure they contain only safe characters, that could never be used in an attack.
User-submitted strings will be used by your SQL server in one of two ways:
1. As a number, like 2097
2. As a string, such as a username, password, or email address
Your code always expects one of these two forms of data. In the examples given at the top of this tutorial, the first example wanted a string, and the second example wanted a number. It will always be one or the other.
There are also two ways to sanitize data - a good way and a bad way. The bad way involves checking it for possible injections. The reason that's bad is because the number of possible injections is so vast, and the creativity of the attackers is so great. The good way to sanitize data is to identify what a proper input would look like, and exclude everything that doesn't conform to those constraints.
Numerical data is easiest to sanitize, so we'll cover that first. A number has an optional minus sign on the left, followed by some number of numerals, and maybe a decimal point. Nothing else. In Perl, you could test numerical user input like this:
if($numericaluserinput !~ /^-?[0-9.]+$/) { # the user input is not a number }
It's difficult to imagine any hostile injection sneaking past that barrier.
Sanitizing text input is more difficult, because there are so many ways to attempt it. The attacker may use quotes and backslashes in creative ways. In fact, trying to blacklist particular characters in user input is frowned upon, because it's too easy to miss something important.
A better approach, as we did with numerical input, is to restrict user input to a set of whitelisted characters. For example, an email address may be restricted to letters, numbers, dashes, underscores, plus signs, decimals, and the @ symbol. In Perl, you could test email address inputs like this:
if($useremail ~= /^[0-9a-zA-Z\-_+.\@]$/) { # the user's email address is unacceptable }
A similar whitelist could be found for other kinds of user inputs, such as usernames, home addresses, and so on.
Users may chafe against whitelists. The specification for email addresses includes a variety of symbols not included in the test given above. Of course you're free to include additional symbols in your tests, but before doing so you should research each one to make sure it's never used in SQL injections. Given the choice between user comfort and site security, site security must always come first.
Sanitizing inputs with whitelisted character-sets is good because it's easy. If you're willing to put a few simple constraints on the type of data users can submit, and if you're diligent about regularly checking your SQL queries to make sure they use only sanitized inputs, you can virtually eliminate the risk of SQL injections.
What? But we just got through talking about the importance of not accepting risky inputs.
It's true that highly restrictive whitelists are the way to go, if your application can tolerate putting that level of restriction on users. But in some cases, it may be important for your business model to not put any sort of constraint on user inputs.
In this case, the programming language used in your application probably has libraries to protect your code from hostile user inputs. For example, the Perl DBI library has methods to prevent user inputs from breaking out of the specific part of the query for which they're intended:
my $sql = "INSERT INTO user (username, email) VALUES (?, ?)"; my $handle = $dbh->prepare( $sql ); $handle->execute( $untrustedusername, $untrustedemail);
In the above example, the '?' characters are used as placeholders. The Perl DBI library replaces them with untrusted variables gathered from the user. But in doing so, the DBI library explicitly restricts those variables to relating only to the fields that expect them.
Other languages have similar libraries, either to restrict the usage of user data, or to escape the data.
The benefit of this technique is that you can put your faith in the people maintaining the library code, that they will maintain it, and keep it free from bugs and security holes. The drawback of this technique is that it's a little bit less readable, and so it becomes more likely that some of your SQL queries will forget to use the appropriate library calls to protect themselves.
Depending on your business model, you may want to implement a final line of defense - something completely independent from your application developers. After all, maybe one of them used the wrong whitelist somewhere, or failed to call the appropriate library call to isolate user data. A single hole in the code is all it may take to render your entire site vulnerable to SQL injection.
First of all, you should assume that if an attacker breaks your injection defenses, that they have now obtained full root privileges on that VPS. They own the machine, even though you're doing all the care and feeding of it.
To mitigate the fallout from that, the VPS itself should be configured within an isolated part of the network, so that a root user on that system can't see or access any other systems in your infrastructure. That kind of defense is called a DMZ, and it's very cool, and also beyond the scope of this tutorial.
Whatever mechanism you use to secure the cloud server against a hostile user that has already broken in, you should still try to set up some kind of alert system, so that your system administrators will be informed if certain activities are recorded on the VPS. These indicators will tell you that your application code has been penetrated, and that you need to do an immediate review of your entire codebase, to find the bug. Without those alerts, the attacker can take their sweet time penetrating your DMZ, and you might never have a clue anything was amiss until they absconded with all the credit card numbers you thought were isolated on a completely different server, in an 'inaccessible' part of the network.
If you're implementing a web application without the help of a security professional, then you're going to be swimming in very dangerous waters. If for some reason you aren't ready to hire a security professional yet, you should consider a very draconian approach to whitelisting the characters allowed in user inputs. Whitelists are relatively simple to implement and review, and a draconian enough system should be fairly impenetrable. Let your users experience the slight inconvenience of a constrained character set, while you build your business and prepare to do the hiring you need. Once you have someone who understands security issues, you'll be better prepared to guard against SQL injections, cross-site-scripting attacks, and many other dangerous security problems that plague the modern web.
Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.
This textbox defaults to using Markdown to format your answer.
You can type !ref in this text area to quickly search our full set of tutorials, documentation & marketplace offerings and insert the link!
If you’re using Ruby on Rails, check out the Brakeman gem or Code Climate to identify security risks.
Awesome Tutorial on recognizing SQL injections! As states above using a framework is a good idea. It saves you time not having to reinvent the wheel.
I really enjoy working with codeigniter in PHP and Rails for Ruby. I started to dabble in Django. All of them are great and have prebuilt functions for sanitizing and securing you SQL statements.
Step Zero: USE A PROPER FRAMEWORK!!!
Many good open source frameworks have strong input filtering features. I use Drupal and the entire database interactions pass by string processing, to ensure no SQL attacks. But CakePHP, symfony, ZF, and other good frameworks also do it. Also other frameworks from languages like Rails and Django (these I quoted before are all written in PHP).