How Prepared Statement works internally in Java.

How Prepared Statement works internally in Java.


How Prepared Statement works internally in Java. Prepared Statement is part of Java JDBC API. PreparedStatement internally caches query.

This is the famous interview question for the beginners, So Let's see what it is all about.

SQL Injection is code injection technique where SQL is injected by user (as part of user input) into the back end query, and ultimately changes query purpose which upon execution gives harmful result.

Detailed explanation on SQL Injection: What is SQL Injection?


How can SQL Injection happen.


At server side, queries generally by themselves are not complete and require user data to make it complete, meaningful and executable.
"select * from user where username = ' " + username + " ' ";
Above query is not complete as it has dependency on username variable.

Now if username variable is filled by third party, then there are chances that user data contains SQL,

Take an example. Application is asking user to enter user name,
Enter user name:________________________

Enter user name:___jayesh'; delete from user where id='1__

At Server Side,

username = "jayesh'; delete from user where id='1"
Final Query = "select * from user where username = ' jayesh'; delete from user where id='1 ' ";

If you observe final query, upon execution it will delete the record from user table which was never the purpose of original query and this is called SQL Injection attack. 

Because of user data (which can be anything and uncontrolled) involvement in formation of query, SQL Injection attack can happen.

Detailed explanation on: How can SQL Injection happen?

How PreparedStatement in Java prevents SQL Injection?


To understand this, Lets see steps involved in query execution.
1. Compilation Phase.

2. Execution Phase.

Whenever SQL server engine receives a query, It has to pass through below phases,

1. Parsing and Normalization Phase
    In this phase, Query is checked for syntax and semantics. It checks whether references table and
    columns used in query exist or not.

    It also has many other task to do, Let's not go in much detail.

2. Compilation Phase
    In this phase, keywords used in query like select, from, where etc are converted into format
    understandable by machine.

    This is the phase where query is interpreted and corresponding action to take is decided.
    It also has many other task to do, Let's not go in much detail.

3. Query Optimization Plan
     In this phase, Decision Tree is created for finding the ways in which query can be executed.
     It find out the number of ways in which query can be executed and cost associated with each way
     of executing Query. 

     It choose the best plan for executing a query.

4. Cache
    Best plan selected in Query optimization plan is stored in cache, so that whenever next
    time same query come in, it doesn't has to pass through Phase 1, Phase 2 and Phase 3.

    When next time query come in, it will be checked directly in Cache and picked up from there
    and execute.


5. Execution Phase
    In this phase, supplied query gets executed and data is returned to user as ResultSet object.


Behaviour of PreparedStatement API on above steps?


1. PreparedStatement are not complete SQL query and contains placeholder,
    which at run time is replaced by actual user data.

2. Whenever any PreparedStatment containing placeholders is passed in to SQL Server engine,
    It passes through below phases


      1. Parsing and Normalization Phase
      2. Compilation Phase
      3. Query Optimization Plan
      4. Cache (Compiled Query with placeholders are stored in Cache.)
UPDATE user set username=? and password=? WHERE id=?
3. Above query will get parsed, compiled with placeholders as special treatment, optimized and
    get Cached.

    Query at this stage is already compiled and converted in machine understandable format.
    So we can say that Query stored in cache is Pre-Compiled means already compiled and
    only placeholders need to be replaced with user data.


  4. Now at run-time when user data comes in, Pre-Compiled Query is picked up from Cache and 
    placeholders are replaced with user data.
   
    (Remember, after place holders are replaced with user data, final query is not
    compiled/interpreted again and SQL Server engine treats user data as pure data and not a
    SQL that needs to be parsed or compiled again and that is beauty of PreparedStatement.)


     If the query doesn't has to go through compilation phase again, then whatever data is replaced to
     placeholders are treated as pure data and has no meaning to SQL Server engine and it directly
     executes the query.


     Note: It is the compilation phase after parsing phase, that understands/interprets the query 
               structure and give meaningful behavior to it. In case of PreparedStatement, query is
               compiled only once and cached compiled query is picked up all the time to replace
               user data and execute.

               
      Due to one time compilation feature of PreparedStatement, it is free of SQL Injection
      attack.


SUMMARY:

    PrepareStatment query which is incomplete and contain placeholders is first compiled.
    In the compilation phase, actual query is parsed and at this point only, any inner queries or
    any sub queries are checked.
   
    After this phase query is optimized and saved in Cache.
   
     Now, when user data comes in, Pre-compiled Query is picked from Cache and placeholders are  

     replaced with user data.
   
     After user data replacement, final Query formed is not compiled again since it is Pre-Compiled,
     Now, if user data contain any SQL then also it has no effect and will be treated as pure data.
     (Remember if query is compiled again then only it will treat data as SQL if it contains.)


Why to use PreparedStatement instead of Statement API in Java?


1. Whenever Query is executed using Statement in java, then Query contains all the necessary data
    for execution

    Query is supplied to SQL Server engine after placeholder replacement with user data.
"SELECT * FROM user WHERE username = ' " + username + " ' ";
2. Now at run time, if user supplied data contains SQL like,
    username = jayesh'; delete from user where id='1 then query becomes,
select * from user where username='jayesh'; delete from user where id='1';
3. This final query is now supplied to SQL Server engine, which pass through all the below phases,

    1. Parsing and Normalization Phase
    2. Compilation Phase
    3. Query Optimization Plan
    4. Cache (Compiled Query with placeholders are stored in Cache.)

4. In Compilation phase, Query supplied will became,
    select * from user where username='jayesh';
    delete from user where id='1';
    and here the problem starts. So Statement API is prone to SQL Injection attack.

NOTE:
     So problem with Statement API is query is not pre-compiled.

     In original query with placeholders, user data is first replaced and modified query is passed to 
     SQL Server engine which runs compilation phase every time on query.
     In compilation phase, if user data contains SQL then that will be converted into instructions and
     get executed which is not the case with PreparedStatment where compilation of query happens
     only one time.


5. When Compared to PreparedStatement, Statement API are not Pre-Compiled and whenever
    supplied to
SQL Server engine, it passes through compilation phase, which doesn't differentiate
    over user input and SQL and that is where problem starts.

6. In PreparedStatement, SQL is supplied to SQL Server engine with place holders which doesn't
    contain any user data.
SQL Server engine passes the query to all above phases and keep it in ready
    to run state, where only place holder replacement is remaining.

    So whenever user data is supplied, placeholders are replaced with data and final query is executed
    without compiling and that is where user data is treated as pure data and if SQL is supplied in data
    that will be treated as data.


Importatnt benefits of PreparedStatement in Java?


1. Now, from above observation we saw that whenever a query is executed using PreparedStatment,
    It passes through below phases, 
and keep the SQL query in Pre-Compiled state.
    1. Parsing and Normalization Phase
    2. Compilation Phase
    3. Query Optimization Plan
    4. Cache (Compiled Query with placeholders are stored in Cache.)
   
    So now, if we need to run a query 10 times or 100 times, it doesn't has to pass through all the
    above phases
and can run quickly by simply replacing placeholders in Pre-Compiled query.
    This feature of PreparedStatement gives better performance.

6. Since PreparedStatement are compiled only once, it helps SQL Injection attacks while
    Statement is prone to SQL Injection due to its every time compile nature.




You may also like to see


Explain SQL Injection with example?

Enjoy !!!! 

If you find any issue in post or face any error while implementing, Please comment.