Providing Security
against SQL Injection Attack in Web Application
MAMTA SAHARE 44
Algorithm of the Needleman-Wunsch Algorithm
1.
Initialization
C(0, 0) = 0
C(0, i) = −i * d
C(j, 0) = −j * d
2. Main
Iteration
For each i = 1 . . . M
For each j = 1 . . . N
C(i,
j) =
max
{ C(i − 1, j − 1) + s(xi, yj), case 1
C(i − 1, j) − d ,
case2
C(i, j − 1) − d , case3
}
Ptr(i,
j) =
{ DIAG , if case 1
LEFT , if case 2
UP , if case 3
}
Where C is the Score Matrix
and Ptr is the Traceback Matrix.
Suppose there are two sequences
e.g. Normal:- AM EQ (profile amino acid code)
Attacked:- AM EQ AM
The Needleman-Wunsch algorithm consists of three steps
1.
Initialization of the Score Matrix.
Score Matrix is
a matrix which is used in Needleman wunsch algorithm to calculate best score. It
also helps to construct a matrix which in turn helps to deduce best alignment
between two sequences.
2.
Calculation of scores and Filling the Traceback matrix.
Traceback
Matrix is used to deduce a Traceback path. This Traceback path helps to find
best alignment.
3.
Deducing the alignment from the Traceback matrix.
Initialization of
the Score Matrix
The cells of the Score Matrix
are labeled C(i, j) where
i = 1; 2; …..;N and j = 1;
2;….;M
|
A
|
M
|
E
|
Q
|
A
|
M
|
|
C(1,1)
|
C(1,2)
|
C(1,3)
|
C(1,4)
|
C(1,5)
|
C(1,6)
|
C(1,7)
|
|
A
|
C(2,1)
|
C(2,2)
|
C(2,3)
|
C(2,4)
|
C(2,5)
|
C(2,6)
|
C(2,7)
|
M
|
C(3,1)
|
C(3,2)
|
C(3,3)
|
C(3,4)
|
C(3,5)
|
C(3,6)
|
C(3,7)
|
E
|
C(4,1)
|
C(4,2)
|
C(4,3)
|
C(4,4)
|
C(4,5)
|
C(4,6)
|
C(4,7)
|
Q
|
C(5,1)
|
C(5,2)
|
C(5,3)
|
C(5,4)
|
C(5,5)
|
C(5,6)
|
C(5,7)
|
SCORE MATRIX
This is the initialization of first row
and first column of the score matrix. This can be done by any value. Here the
initialization is done with the values 0, -1,-2, …………..
A
|
M
|
E
|
Q
|
A
|
M
|
||
0
|
-1
|
-2
|
-3
|
-4
|
-5
|
-6
|
|
A
|
-1
|
||||||
M
|
-2
|
||||||
E
|
-3
|
||||||
Q
|
-4
|
TRACEBACK MATRIX
This is the initialization of
first row and first column of Traceback Matrix.
First row is initialized with a
value ‘left’ and first column is initialized with a value ‘up’. This will help
to deduce Traceback Path.
A
|
M
|
E
|
Q
|
A
|
M
|
||
done
|
left
|
left
|
left
|
left
|
Left
|
left
|
|
A
|
up
|
||||||
M
|
up
|
||||||
E
|
up
|
||||||
Q
|
up
|
3. Scoring
Formula for calculating the
identity score
The score of any cell C(i , j)
is the maximum of:
qdiag = C(i -1, j - 1) + S(i ; j)
qup = C(i -1,j) + g
qleft = C(i , j - 1) + g ----------------(1)
Where S(i , j) is a score given to whether i-th
element and j-th element of alignment are identical or not. Gap is a cut
in score for insertion of blank, which fills matrix C. The score lower right end of matrix C is the maximum score of global alignment.
The calculation for the cell C(2 , 2):-
i=2, j=2, g=(-1)
qdiag
= C(1, 1) + S(A,A) = 0 + 1 = 1
qup
= C(1, 2) + g = -1 + (-1) = -2
qleft
= C(2, 1) + g = -1 + (-1) = -2
Here the maximum value is 1 so
we put C(2,2) =1 in score matrix and fill diag into the Traceback matrix in the
cell Ptr(2,2)=diag.
SCORE MATRIX
In this matrix the value for the
cell c(2,2) is filled as calculated with the
formula 1.
A
|
M
|
E
|
Q
|
A
|
M
|
||
0
|
-1
|
-2
|
-3
|
-4
|
-5
|
-6
|
|
A
|
-1
|
1
|
|||||
M
|
-2
|
||||||
E
|
-3
|
||||||
Q
|
-4
|
TRACEBACK MATRIX
Since the value calculated for
the cell c(2,2) has come from qleft. Thus ptr(2,2)=diag is filled in the
Traceback Matrix.
A
|
M
|
E
|
Q
|
A
|
M
|
||
done
|
left
|
left
|
left
|
left
|
left
|
Left
|
|
A
|
up
|
diag
|
|||||
M
|
up
|
||||||
E
|
up
|
||||||
Q
|
up
|
This process is continued until
all the blank cells are fulfilled.
SCORE MATRIX
Here the values are calculating for all
the cells of score matrix. The completely filled matrix is given below
A
|
M
|
E
|
Q
|
A
|
M
|
||
0
|
-1
|
-2
|
-3
|
-4
|
-5
|
-6
|
|
A
|
-1
|
1
|
0
|
-1
|
-2
|
-3
|
-4
|
M
|
-2
|
0
|
2
|
1
|
0
|
-1
|
0
|
E
|
-3
|
-2
|
1
|
3
|
2
|
1
|
0
|
Q
|
-4
|
-3
|
0
|
2
|
4
|
3
|
2
|
TRACEBACK MATRIX
After filling Score Matrix,
different cells of Traceback Matrix are to be filled according to their values
in Score Matrix. The final Traceback Matrix obtained is given below
A
|
M
|
E
|
Q
|
A
|
M
|
||
done
|
left
|
left
|
left
|
left
|
left
|
Left
|
|
A
|
up
|
diag
|
left
|
left
|
left
|
left
|
Left
|
M
|
up
|
up
|
diag
|
left
|
left
|
left
|
Diag
|
E
|
up
|
up
|
up
|
diag
|
left
|
left
|
left
|
Q
|
up
|
up
|
up
|
up
|
diag
|
left
|
Left
|
For deducing best alignment, Traceback
path is to be formed. For calculating Traceback path, start from rightmost
lower corner. In the first cell i.e. c(5,7)=left is written, so now move in the
leftward direction. In the next cell again a left is found, so again move in
leftward direction. Now in the next cell a diag is found, so now move in
diagonal direction. Proceeding in the similar manner, traverse until done is
found.
The path formed in this manner
is called Traceback path. The final matrix formed is given below
A
|
M
|
E
|
Q
|
A
|
M
|
||
done
|
left
|
left
|
left
|
left
|
left
|
left
|
|
A
|
up
|
diag
|
left
|
Left
|
left
|
left
|
left
|
M
|
up
|
up
|
diag
|
left
|
left
|
left
|
diag
|
E
|
up
|
up
|
up
|
diag
|
left
|
left
|
left
|
Q
|
up
|
up
|
up
|
up
|
diag
|
left
|
left
|
For deducing best alignment,
start traversing Traceback path. Start again from rightmost lower corner. The
rule to be followed for deducing the alignment is as follows
1. If a left is encountered, place
a gap in leftward sequence.
2. If a up is encountered, place
a gap in upward sequence.
3. If a diag is encountered, then
it means that the two characters are aligned.
With the help of given formulas,
traverse the Traceback path to introduce gap in the sequences.
AM EQ AM
AM EQ
Best score is=2
Step 5 Measure
similarity of Na Code Amino and Aa Code Amino
Alignment identity I for
measuring similarity to detect whether Pa
Code Amino and Aa Code
Amino are injected is computed as shown in formula.
I
= (alignment match code (Pa Code Amino, Aa Code Amino)/
Aa Code Amino length + gap))
%100
The identity for the measurement
of similarity is computed by dividing identity scores of two sequence
alignments by sum of blank length and whole sequence length. Namely, upon
calculating similarity, score is computed in consideration of sequence length
both in the case of long sequence length with high identity score and in the
case of short sequence length with low identity score, which secures balance.
Alignment is compared in order to judge which alignment is superior. If the
code sequences are 100% identical then this query is allowed to be executed
otherwise this is rejected. The identity is calculated for given two sequences
No comments:
Post a Comment