How to split a string depending on a pattern in another column (UNIX environment)

bash unix awk split substr

142 views

3 answers

111 author reputation

I have a tab-separated file like this:

V    I      280     6   -   VRSSAI
N    V      2739    7   -   SAVNATA
A    R      203     5   -   AEERR
Q    A      2517    7   -   AQSTPSP
S    S      1012    5   -   GGGSS
L    A      281    11   -   AAEPALSAGSL

I would like to check the last column against the letters in the 1st and 2nd columns. If the first and last letters of the last column match the 1st and 2nd columns respectively, the line should stay unchanged. If they do not match, I would like to locate the reversed pattern ($2 followed by $1) in the last column, print the string from the 1st-column letter to the end, and then append the start of the string up to the 2nd-column letter. The desired output would be:

V    I      280     6   -   VRSSAI
N    V      2739    7   -   NATASAV
A    R      203     5   -   AEERR
Q    A      2517    7   -   QSTPSPA
S    S      1012    5   -   SGGGS
L    A      281    11   -   LSAGSLAAEPA

I have tried several scripts along these lines, but they do not work correctly and I don't know exactly why.

awk 'BEGIN {FS=OFS="\t"}{gsub(/$2$1/,"\t",$6); print $1$7$6$2}' "input" > "output";

Another attempt:

awk 'BEGIN {FS=OFS="\t"} {len=split($11,arrseq,"$7$6"); for(i=0;i<len;i++){printf "%s ",arrseq[i],arrseq[i+1]}' "input" > "output";

I also tried the substr function, but none of these attempts works correctly. Is it possible to do this in bash? Thanks in advance.

Here is an example to make the question clearer.

$1                 $2                 $6
L                  A                  AAEPALSAGSL (reverse pattern 'AL' $2$1)

Desired output in $6: the string from the $1 letter within the reversed pattern to the end, followed by the start of the string up to the $2 letter within the reversed pattern.

$1                 $2                 $6
L                  A                  LSAGSLAAEPA
Author: Perceval Vellosillo Gonzalez Posted: December 27, 2017

3 answers


2

2035 author reputation

You can try this awk; it's not perfect, but it gives you a starting point.

awk '{i=(match($6,$1));if(i==1)print;else{a=$6;b=substr(a,i);c=substr(a,1,(i-1));$6=b c;print}}' OFS='\t' infile
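
One place where the "not perfect" part shows up: the one-liner rotates at the first occurrence of $1 and never looks at $2. The sketch below wraps the same idea in a shell function and assumes whitespace-separated input, as in the command above. The name `rotate_at_first` is hypothetical and introduced only for illustration.

```shell
# Sketch of the same idea: rotate column 6 at the first occurrence of $1.
# rotate_at_first is a hypothetical name, not part of the original answer.
rotate_at_first() {
    awk -v OFS='\t' '{
        i = match($6, $1)      # 0 if $1 is absent; note that $1 is used as a regex
        if (i > 1)             # leave the row alone when $6 already starts with $1
            $6 = substr($6, i) substr($6, 1, i - 1)
        print
    }' "$@"
}

# The S/S sample row: the first S is at position 4, so column 6 becomes SSGGG.
printf 'S\tS\t1012\t5\t-\tGGGSS\n' | rotate_at_first
```

For that row the requested result is SGGGS, so a complete solution also has to take $2 into account, for example by searching for the adjacent pair $2$1 instead of $1 alone.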
Author: ctac_ Posted: December 27, 2017

5

5965 author reputation

Accepted

If I understood the question correctly, this awk should do it:

awk '( substr($6, 1, 1) != $1 || substr($6, length($6), 1) != $2 ) && i = index($6, $2$1) { $6 = substr($6, i+1) substr($6, 1, i)  }1' OFS=$'\t' data

You basically want to rotate the string so that the beginning of the string matches the char in $1 and the end of the string matches the char in $2. Strings that cannot be rotated to match that condition are left unchanged, for example:

A    B    3    3    -    BCAAB
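
The rotation described above can also be spelled out as a standalone awk function, which may be easier to read and adapt than a one-liner. This is only a sketch under stated assumptions: tab-separated input and single-character values in $1 and $2. The names `rotate_col6` and `rotate` are hypothetical and introduced here for illustration.

```shell
# rotate_col6 and rotate are hypothetical names used for this sketch.
rotate_col6() {
    awk '
    BEGIN { FS = OFS = "\t" }

    # Return s rotated so that it starts with "first" and ends with "last".
    # s is returned unchanged if it already does, or if no rotation can.
    function rotate(s, first, last,    i) {
        if (substr(s, 1, 1) == first && substr(s, length(s), 1) == last)
            return s
        i = index(s, last first)      # position of the adjacent pair last+first
        if (i == 0)
            return s
        # Split between the two letters of the pair and swap the halves.
        return substr(s, i + 1) substr(s, 1, i)
    }

    { $6 = rotate($6, $1, $2); print }
    ' "$@"
}

# One row that can be rotated and one that cannot.
printf 'L\tA\t281\t11\t-\tAAEPALSAGSL\nA\tB\t3\t3\t-\tBCAAB\n' | rotate_col6
```

With these two rows, column 6 of the first becomes LSAGSLAAEPA, while BCAAB is printed unchanged because it contains no adjacent "BA" pair.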
Author: PesaThe Posted: December 27, 2017

1

435 author reputation

gawk '
BEGIN{
    OFS="\t"
}
$6 !~ "^"$1".*"$2"$" {
    $6 = gensub("(.*"$2")("$1".*)", "\\2\\1", 1, $6)
}
{print}
' input.txt

Output

V   I   280     6   -   VRSSAI
N   V   2739    7   -   NATASAV
A   R   203     5   -   AEERR
Q   A   2517    7   -   QSTPSPA
S   S   1012    5   -   SGGGS
L   A   281     11  -   LSAGSLAAEPA
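
A quick way to sanity-check output like this is to confirm that column 6 of every row starts with $1 and ends with $2. The sketch below assumes whitespace-separated fields; the function name `check_rows` is hypothetical and not part of the answer.

```shell
# check_rows is a hypothetical helper name used for this sketch.
check_rows() {
    awk '
    # Report rows whose 6th field does not start with $1 and end with $2.
    substr($6, 1, 1) != $1 || substr($6, length($6), 1) != $2 {
        printf "line %d: %s does not run from %s to %s\n", NR, $6, $1, $2
        bad = 1
    }
    END { exit bad + 0 }              # non-zero exit status if any row failed
    ' "$@"
}

# A row taken from the output above passes silently.
printf 'S\tS\t1012\t5\t-\tSGGGS\n' | check_rows && echo "all rows consistent"
```

Rows that cannot be rotated into the requested shape are reported too, which is usually what you want when checking the data.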
Author: MiniMax Posted: December 27, 2017