Unix: Frequently used Unix commands.

 File Cleanup Commands

My file is a tab delimiter. Move the line which don’t have 11 columns to bad file.
— awk -F”\t” ‘NF-1 != 11 {print $0} ‘

Display a range of columns using Awk

Using the below command one can display a range of columns for example column 5th to 10th etc.,
— awk -F”\t” ‘BEGIN { OFS=”\t” ; ORS=”\t” } { for (i=1; i<=14;i++) printf(“%s%s”, $i,(i==14) ? “\n” : OFS) }’

Replace csv to custom delimiter file 
sed -Ee :1 -e 's/^(([^",]|"[^"]*")*),/\1|#|/;t1'
awk to handle multiple delimiters

awk -F'[|][#][|]' '{print $24}' CLN_STG_Campaign_Insert_filtered.txt | sort | uniq
Replace null columns in csv file to some value

-- awk 'BEGIN { FS = OFS = "^" } {sub(/""/, "\"Other\"", $15);sub(/""/, "\"Other Form\"", $28);sub(/""/, "\"Brand - 0F3\"", $24);print;}'
 Check how many tabs present in each line 

-- awk -F"\t" '{print NF}' FINL_Contracts_Data.csv | sort | uniq
count the tabs in each line and aggregate the value 
awk -F"\t" '{print NF}' FINL_Contracts_Data.csv | sort | uniq -c
Replace newline character to null between double quotes.
awk -v RS='"[^\t]*"' -v ORS= '{gsub(/\r/, " ", RT); print $0  RT}'
Replace newline character to blank between double quotes in a csv file.
-- awk 'FPAT="([^,]+)|(\"[^\"]+\")"{x=$0;while((gsub(/\"/,"\"",x)%2)!=0){getline;x=x " ";x=x $0};$0=x;print}'

another way

-- awk '(NR-1)%2{$1=$1}1' RS=\" ORS=\"
Remove non-ASCII characters from CSV
-- LANG=C sed -i 's/[\d128-\d255]//g' or sed -i 's/[\d128-\d255]//g'
Ignore comma between double quotes in a csv file.
-- awk -v FPAT='([^,]*)|("[^"]+")' -v OFS=, '{print $20}'
Replace invisible "NUL" charter between alphabets.
-- sed -i 's/\x0//g'
Remove all the non ASCII characters from the source file.
-- perl -pi -e 's/[^[:ascii:]]//g'
Remove Blank line what might have white space or newline.
-- sed -i '/^[[:space:]]*$/d;s/[[:space:]]*$//'
Remove the lines in a file that has just quotes (blank line just has Quotes)
-- sed -i '/^\"$/d'   (More robust command would be sed -i '/^.$/d')

To grep and see such line in the file use — grep “^\”$”

Question:- AWK comma as field separator and text inside double quotes as a field

Column1,             Column2,                           Column13
aaa,          “hell world, test text”,bbb,ccc,”  test text”

Find the length of the second column in the above example.

Solution:- awk -F\” ‘{for(i=1;i<=NF;i+=2) {gsub(“,”, “;”, $i)}}1’ OFS= File_nm.txt | awk -F\; ‘{ if (length($2) > max) max = length($2) } END { print max }’

Question:- Find Max Length of a column using AWK.

— awk ‘max==”” {max=length($1)} END{ print max}’ FS=”,”

Question:- Get the length for all the column in the file.

awk -F”,” ‘BEGIN {OFS=”,”} NR==1 {for (b=1;b<=NF;b++) {hdr[b]=$b} } NR > 1 {for (i=1;i<=NF;i++) {if(length($i) > max[i]) max[i] = length($i)} } END {for (i=1;i <= NF;i++) print hdr[i],max[i]+0}’ Csv_file_nm.csv

Note:- If you have “|” pipe delimiter just change -F”,” to -F”|” and (optional) OFS=”,” to OFS=”|”

Question:- Merge lines based on a pattern
for more information on the scenario please refer the below link


— perl -0pe ‘s/\n(?!([0-9]{6}|$))//g’

Question:- Get min value from a column
— awk ‘min==”” || $1 < min {min=$1} END{ print min}’ FS=”\t”

Question:- Lookup in Unix
awk ‘FNR==NR{a[$1]=$0;next}; $1 in a {print a[$1],”|”,$0}’ Actual_File.txt Clean_File.txt

for advance concepts of gawk

Question :- Creating / adding header to a file

first place the header in a variable enclosed in single quotes and then call it in awk as follows

Header='”Description of the following table:”,”vamshi”‘
awk -v var=”$Header” ‘BEGIN{print var}{print $0}’ test.txt > temp.txt


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s