University of Oregon
Topic: Unix Part 2: More Advanced Ninja-ry
Unfortunately, as with the rest of this blog, I can’t give you the files we practiced on, but Julian’s slides are quite good. Remember, we learned about pipes and added to our knowledge of the command line. We also finished the first slide set, which covered all of the following commands:
- man [command you are confused about]: the manual for a command, listing all of its options. Some man pages are more helpful than others, but if you stare at one long enough it’ll start to make sense. Many include usage examples, so pay attention to those.
- ls, gunzip, more, cat, head, tail, grep, wc: all things we learned before, so don’t forget them. Today we ended up having to man some of those commands to learn their options.
- sort [filename]: just what it sounds like. The default is alphabetical; for other kinds of sorting you specify an option (e.g. -n for numeric; see man sort).
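- uniq [filename]: like sort, it has lots of options for teasing things out of your file (e.g. -c counts how many times each line occurs). It only compares neighboring lines, so you usually sort first.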
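- cut [filename]: we learned this before too, BUT today I learned that it is really made for column files: -f picks out columns (fields, tab-separated by default; -d sets a different delimiter), and -c picks out character positions.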
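- tr "\t" "," < [filename]: the translate command; this one changes every tab in the file to a comma. tr doesn’t take a filename itself, since it only reads standard input, so redirect the file in with < (or pipe it in with cat).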
- tab = press Ctrl+V and then Tab to type a literal tab at the prompt, or just write \t (tr understands it)
- |: this is a ‘pipe’, which sends one command’s output into the next command; read Julian’s slides, part 1.
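Since the practice files aren’t available, here is a small made-up stand-in to try sort, uniq, cut and tr on. The file name fruit.tsv and its contents are hypothetical; the point is just to see what each command does on its own before we start chaining them with pipes.

```shell
# Work in a throwaway directory so nothing real gets touched
cd "$(mktemp -d)" || exit 1
export LC_ALL=C   # plain byte-order sorting, so results don't depend on your locale

# Hypothetical practice file: fruit<TAB>count (made-up data)
printf 'apple\t10\nbanana\t2\napple\t10\ncherry\t33\n' > fruit.tsv

# sort: alphabetical (text) order by default, -n for numeric order
cut -f2 fruit.tsv | sort      # 10, 10, 2, 33 (as text, "10" comes before "2")
cut -f2 fruit.tsv | sort -n   # 2, 10, 10, 33 (as numbers)

# uniq only collapses duplicates that sit right next to each other...
uniq fruit.tsv                # still 4 lines: the two apple rows aren't neighbors
# ...so sort first, and add -c to count how often each distinct line appears
sort fruit.tsv | uniq -c

# cut: -f picks tab-separated columns, -c picks character positions
cut -f1 fruit.tsv             # just the fruit names
cut -c1-3 fruit.tsv           # first 3 characters of every line

# tr reads standard input, so redirect the file in; every tab becomes a comma
tr '\t' ',' < fruit.tsv
```

The uniq line is the one worth staring at: it only looks at neighbors, which is why sort and uniq almost always travel together.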
- In our pipe example, I grabbed the file with cat
- I piped it to the command ‘tr’, specifying that I wanted all tabs changed to commas
- Then I piped it to another command, ‘grep’ (remember what grep does?)
- With grep, I searched for all lines starting with 96053, using the hat symbol (^), which pins the match to the beginning of the line
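The exact file from class isn’t available, so here is the same cat | tr | grep pipeline run on a made-up file; sample.tsv and its rows are hypothetical.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical tab-separated file: id<TAB>color<TAB>count (invented rows)
printf '96053\tred\t4\n12345\tblue\t7\n960531\tgreen\t1\n' > sample.tsv

# grab the file | turn tabs into commas | keep lines that START with 96053
cat sample.tsv | tr '\t' ',' | grep '^96053'
```

Notice that ^96053 also lets 960531,green,1 through, because ^ only pins where the match starts. Adding the comma, grep '^96053,', keeps only the row whose first column is exactly 96053.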
- Julian’s slides have a table listing what all the symbols in a regular expression mean. A few of them:
- [0-9] any single digit (so 0 or 1 or 2…)
- [0-9]+ one or more digits in a row (this covers two-digit, three-digit, etc. numbers); the + needs extended regular expressions, i.e. grep -E or sed -E
- [a-z] one lowercase letter
- [A-Z] one uppercase letter
- [a-zA-Z] one upper- or lowercase letter
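To see these character classes in action, here is a quick sketch using grep -E on a made-up list of lines. The ^ and $ pin each pattern to the whole line, so you can see exactly what each class matches; LC_ALL=C is set so letter ranges follow plain ASCII order rather than your locale’s.

```shell
cd "$(mktemp -d)" || exit 1
export LC_ALL=C   # ASCII ranges: [a-z] means exactly a through z

# Hypothetical test lines, one per line
printf 'abc\nABC\nAbc9\n42\n7\nx_y\n' > words.txt

grep -E '^[0-9]$' words.txt       # exactly one digit: 7
grep -E '^[0-9]+$' words.txt      # one or more digits: 42, 7
grep -E '^[a-z]+$' words.txt      # only lowercase letters: abc
grep -E '^[A-Z]+$' words.txt      # only uppercase letters: ABC
grep -E '^[a-zA-Z]+$' words.txt   # any letters: abc, ABC
grep -E '^[a-zA-z]+$' words.txt   # typo'd range: also lets x_y through
```

The last line shows why the capital Z matters: in ASCII, the range A-z also spans the six symbols between Z and a ([ \ ] ^ _ and the backtick), so the underscore in x_y slips through.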
- Well… in the next example we are ‘grabbing’ the file record.tsv with cat and sending it (via a pipe) to the command sed…
- What’s sed going to do?
- Well, sed is a ‘find and replace’ type of command. It looks like we want to be ‘general’ in what we look for, so let’s use a regular expression (be sure to put it in quotes). The -E option tells sed to use extended regular expressions, which is what makes + mean ‘one or more’.
- Ok… how general? Let’s match whole words: words can be any length and are separated by spaces. [a-z]+ matches one word (a run of lowercase letters), so [a-z]+ [a-z]+ matches two words with a space between them. There, that’s good…
- What do we want to replace them with? How about the word ‘foo’, ’cause that’s fun.
- In summary… we grabbed the file (cat) and sent it to sed (via a pipe), which uses the regular expression in -E 's/[a-z]+ [a-z]+/foo/' to replace a pair of lowercase words with foo. One catch: without a g at the end, sed only replaces the first match on each line; 's/[a-z]+ [a-z]+/foo/g' replaces every match.
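Here is that sed command run on a stand-in for record.tsv, with and without the g flag, so you can see which words actually turn into foo. The real file isn’t available, so these two lines are invented.

```shell
cd "$(mktemp -d)" || exit 1
export LC_ALL=C   # keep [a-z] to plain lowercase ASCII letters

# Hypothetical stand-in for record.tsv: text<TAB>number (invented lines)
printf 'the quick brown fox jumps\t12\nCall ID\t96053\n' > record.tsv

# As in class: only the FIRST "word space word" on each line becomes foo
cat record.tsv | sed -E 's/[a-z]+ [a-z]+/foo/'
# line 1 -> foo brown fox jumps<TAB>12
# line 2 -> Call ID<TAB>96053 (unchanged: no two lowercase words joined by a space)

# With g at the end, every non-overlapping match on the line is replaced.
# (sed can also read the file directly, no cat needed.)
sed -E 's/[a-z]+ [a-z]+/foo/g' record.tsv
# line 1 -> foo foo jumps<TAB>12
```

The second line is a nice reminder that [a-z] is case-sensitive: ‘Call ID’ has no pair of all-lowercase words, so nothing gets replaced.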
- I grabbed tinkerbell and did a search on her, but I was only interested in one part of her, which was conveniently located right above the @ symbol that I’m sure she was trying to hide from me. Now, she gets confused easily, so I’m going to make sure I give her the simplest, dumbest commands, all nicely separated by a big vertical line (the pipe), and I looked up every command I could give her in the manual, options and all, to be double sure she doesn’t screw up! No offense, tinkerbell.txt. So if you don’t understand a dash option (e.g. -A), go into your terminal and man the command attached to it (e.g. man grep).
- When I yanked what I wanted, she protested by giving me double dashes (--), kind of like narrowed eyes. (Those are the separators grep prints between groups of matches when you ask for extra surrounding lines with an option like -A.) Well, since I don’t like people giving me the stink eye, I told her to get rid of those double dashes.
- Now her @ sign was still there: I had used it to find and yank what I wanted, but I hadn’t asked her to ‘get rid’ of her @ sign. So let’s do that now.
- So now I have only the pieces of information about her that I want… ha ha ha. Wow, each piece is long and I really only want the first 5 bits of it, so let’s surgically remove the rest (don’t worry, I anesthetized her).
- Great, now I have a jumble of nonsense, a lot of which is repeated… I’m interested in what’s repeated but don’t want to count it all one by one, so I’m going to make her do it. Now, let’s say she’s getting fussy and will only compare and count if all the ‘similar’ data is close together (that’s how uniq works), so I’ll have her sort it out, alphabetically first…
- then make her count all the unique entries
- then put them all in order for me again and send me the result…and that’s all I’ll make her do…for now
- ’Cause I’m mean like that… perfect. Evil stepmothers have nothing on me…
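The real tinkerbell.txt isn’t available, so here is a made-up stand-in and one plausible version of the pipeline the story walks through, one command per step. Two things are assumptions: this sketch uses grep -A 1, which keeps each @ line plus the line right after it (if the part you want sits on the line before the @, -B 1 is the matching option), and it reads ‘the first 5 bits’ as the first 5 characters (cut -c1-5).

```shell
cd "$(mktemp -d)" || exit 1
export LC_ALL=C   # predictable sort order

# Hypothetical stand-in for tinkerbell.txt (invented data):
# each @ line is followed by the line we want, then a line we don't
printf '@entry1\nGATTACAGAT\nnote: ignore me\n@entry2\nGATTATTTTT\nnote: ignore me\n@entry3\nCCCCCAAAAA\nnote: ignore me\n' > tinkerbell.txt

grep -A 1 '@' tinkerbell.txt |  # yank each @ line plus the line after it (grep prints -- between groups)
  grep -v '^--$' |              # get rid of the double dashes (the stink eye)
  grep -v '@' |                 # get rid of the @ lines themselves
  cut -c1-5 |                   # keep only the first 5 characters of each line
  sort |                        # put identical entries next to each other
  uniq -c |                     # count each distinct entry
  sort -n                       # order by count, smallest first
```

On this invented file the result is a count of 1 for CCCCC and 2 for GATTA. Swap the last step for sort -rn if you want the most common entries on top. The pattern '^--$' starts with ^ on purpose: a bare '--' would be read by grep as ‘end of options’ rather than as the thing to search for.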