103.2 Process text streams using filters Flashcards
cat
he cat (short for “concatenate“) command is one of the most frequently used command in Linux/Unix like operating systems. cat command allows us to create single or multiple files, view contain of file, concatenate files and redirect output in terminal or files.
cut
The cut command in UNIX is a command for cutting out the sections from each line of files and writing the result to standard output. It can be used to cut parts of a line by byte position, character and field. Basically the cut command slices a line and extracts the text. It is necessary to specify option with command otherwise it gives error. If more than one file name is provided then data from each file is not precedes by its file name.
expand
expand which allows you to convert tabs into spaces in a file and when no file is specified it reads from standard input.
fmt
fmt command in LINUX actually works as a formatter for simplifying and optimizing text files. Formatting of text files can also be done manually, but it can be really time consuming when it comes to large text files, this is where fmt comes to rescue.
head
It is complementary to Tail command. The head command, as the name implies, print the top N number of data of the given input. By default, it prints the first 10 lines of the specified files. If more than one file name is provided then data from each file is preceded by its file name.
od
od command in Linux is used to convert the content of input in different formats with octal format as the default format.This command is especially useful when debugging Linux scripts for unwanted changes or characters. If more than one file is specified, od command concatenates them in the listed order to form the input.It can display output in a variety of other formats, including hexadecimal, decimal, and ASCII. It is useful for visualizing data that is not in a human-readable format, like the executable code of a program.
https://www.geeksforgeeks.org/od-command-linux-example/
join
The join command in UNIX is a command line utility for joining lines of two files on a common field.
Suppose you have two files and there is a need to combine these two files in a way that the output makes even more sense.For example, there could be a file containing names and the other containing ID’s and the requirement is to combine both files in such a way that the names and corresponding ID’s appear in the same line. join command is the tool for it. join command is used to join the two files based on a key field present in both the files. The input file can be separated by white space or any delimiter.
Syntax:
// displaying the contents of first file // $cat file1.txt 1 AAYUSH 2 APAAR 3 HEMANT 4 KARTIK
// displaying contents of second file // $cat file2.txt 1 101 2 102 3 103 4 104
//..using join command...// $join file1.txt file2.txt 1 AAYUSH 101 2 APAAR 102 3 HEMANT 103 4 KARTIK 104
// by default join command takes the first column as the key to join as in the above case //
nl
nl is a linux command to number lines of the files, it copies its files to standard output, prepending line numbers. It’s more flexible than cat with its -n and -b options, providing an almost bizarre amount of control over the numbering.
paste
It is used to join files horizontally (parallel merging) by outputting lines consisting of lines from each file specified, separated by tab as delimiter, to the standard output.
pr
The pr command writes the specified file or files to standard output. If you specify the - (minus sign) parameter instead of the File parameter, or if you specify neither, the pr command reads standard input. A heading that contains the page number, date, time, and name of the file separates the output into pages.
sed
SED command in UNIX is stands for stream editor and it can perform lot’s of function on file like, searching, find and replace, insertion or deletion. Though most common use of SED command in UNIX is for substitution or for find and replace. By using SED you can edit files even without opening it, which is much quicker way to find and replace something in file, than first opening that file in VI Editor and then changing it.
SED is a powerful text stream editor. Can do insertion, deletion, search and replace(substitution).
SED command in unix supports regular expression which allows it perform complex pattern matching.
sort
SORT command is used to sort a file, arranging the records in a particular order. By default, the sort command sorts file assuming the contents are ASCII. Using options in sort command, it can also be used to sort numerically.
SORT command sorts the contents of a text file, line by line.
sort is a standard command line program that prints the lines of its input or concatenation of all files listed in its argument list in sorted order.
The sort command is a command line utility for sorting lines of text files. It supports sorting alphabetically, in reverse order, by number, by month and can also remove duplicates.
The sort command can also sort by items not at the beginning of the line, ignore case sensitivity and return whether a file is sorted or not. Sorting is done based on one or more sort keys extracted from each line of input.
By default, the entire input is taken as sort key. Blank space is the default field separator.
The sort command follows these features as stated below:
Lines starting with a number will appear before lines starting with a letter.
Lines starting with a letter that appears earlier in the alphabet will appear before lines starting with a letter that appears later in the alphabet.
Lines starting with a lowercase letter will appear before lines starting with the same letter in uppercase.
Command : $ cat > mix.txt abc apple BALL Abc bat Now use the sort command
Command : $ sort mix.txt Output : abc Abc apple bat BALL
split
it’ command is used to split or break a file into the pieces in Linux and UNIX systems. Whenever we split a large file with split command then split output file’s default size is 1000 lines and its default prefix would be ‘x’.
tail
It is the complementary of head command.The tail command, as the name implies, print the last N number of data of the given input. By default it prints the last 10 lines of the specified files. If more than one file name is provided then data from each file is precedes by its file name.
tr
The tr command in UNIX is a command line utility for translating or deleting characters. It supports a range of transformations including uppercase to lowercase, squeezing repeating characters, deleting specific characters and basic find and replace. It can be used with UNIX pipes to support more complex translation. tr stands for translate.