These are the notes I took for MIT's class The Missing Semester of Your CS Education (2020).
https://www.bilibili.com/video/BV1x7411H7wa/?p=1
1. Shell
Whitespace separates arguments. If I need a single argument that contains several words, I can:
1. Quote those words, or
2. Escape each space with a backslash (`\ `).
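A minimal sketch of the difference, assuming bash. The helper `count_args` is a hypothetical name that only prints how many arguments it received:

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper: it prints the number of arguments it got
count_args() { echo "$#"; }

count_args My Photos       # two arguments: "My" and "Photos"
count_args "My Photos"     # one argument, because it is quoted
count_args My\ Photos      # one argument, because the space is escaped
```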
Paths & some related handy commands
Variable `$PATH`
```shell
echo $PATH
```
This shows all the directories (separated by `:`) that the shell searches for a program whose name matches the command I try to run.
```shell
which echo
```
The `which` command tells me which file would run if I ran a program called `echo`, by printing its full path.
Paths are the way to name a location on your computer. On Linux and macOS, path components are separated by `/` (forward slash), while on Windows they are separated by `\` (backslash) instead.
Absolute path
On Linux and macOS, everything lives under a single root namespace, so all absolute paths start with a slash.
On Windows there is one root for every partition, such as `C:\` and `D:\`.
Relative path
A relative path is interpreted relative to the current working directory.
```shell
pwd
```
Print the working directory.
```shell
cd /home
```
Change directory. Some special directory names:
- `.` means the current directory.
- `..` means the parent directory.
- `~` means the home directory.
- `-` means the directory you were previously in.
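A small sketch of these names, assuming a POSIX shell. It works inside a throwaway directory from `mktemp -d`, so the paths `a/b` are made up for illustration:

```shell
# Work inside a throwaway directory so nothing real is touched
base=$(mktemp -d)
mkdir -p "$base/a/b"

cd "$base/a/b" || exit 1
cd ..            # ".." is the parent directory: now in $base/a
cd ./b           # "." is the current directory, so ./b is $base/a/b again
cd ../..         # two levels up: now in $base
cd -             # "-" is the previous directory: back in $base/a/b (cd - also prints it)
echo ~           # "~" expands to the home directory
```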
```shell
ls
```
List all files and directories (in the directory I give it; the default is the current directory).
Most programs take arguments such as flags and options, which usually start with `-`. One of the handiest is `--help`, which most programs implement: it shows how to use the program and lists its other flags and options.
```shell
ls -l
```
Example output:
```
total 0
```
In each line of the long listing:
- The first character is `d` if the entry is a directory, `l` if it is a (symbolic) link, and `-` if it is a regular file.
- The following 9 characters are the permissions set on the file: `r` (read), `w` (write), `x` (execute). The first three are the owner's permissions, the next three the group's, and the last three everyone else's.
- The following number is the number of hard links.
- `gongfeiyang` is the owner of the file, and `staff` is the group the file belongs to.
- The following number is the size of the file in bytes.
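A sketch that sets permissions with `chmod` and reads them back from `ls -l`. The file names `script.sh` and `folder` are hypothetical, and `cut -c1-10` keeps only the type character plus the 9 permission characters (some systems append a `.` or `+` after them):

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

touch script.sh
mkdir folder
chmod 754 script.sh   # owner rwx (7), group r-x (5), others r-- (4)

# Type character followed by the 9 permission characters
ls -ld script.sh folder | cut -c1-10
```

Each octal digit is the sum of read (4), write (2), and execute (1) for one of owner, group, and others.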
Other handy commands: `mv`, `cp`, `mkdir`.
Streams
Every program by default has two primary streams: an input stream and an output stream. By default, both are connected to the terminal.
The shell gives a way to rewire these streams, changing where a program's input comes from and where its output goes.
The most straightforward way to do this is with `<` (read the input from this file), `>` (write the output to this file), and `>>` (append the output to this file):
E.g.:
```shell
echo hello > hello.txt
```
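A short sketch contrasting the three operators, run in a temporary directory (the file name `hello.txt` is just an example):

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

echo hello > hello.txt     # ">" creates (or overwrites) hello.txt
echo world >> hello.txt    # ">>" appends a second line
cat < hello.txt            # "<" feeds the file to cat's input: hello, world
echo hi > hello.txt        # ">" again: the old contents are replaced
cat hello.txt              # hi
```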
Pipe character: `|`
It takes the output of the program on the left and makes it the input of the program on the right. E.g.:
```shell
ls -l / | tail -n1
```
Here I want the output of `ls` to be the input of `tail`.
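Pipes can be chained, and each program only sees the output of the one before it. A sketch with made-up input:

```shell
# Each program reads the previous one's output, without knowing about the others
printf 'apple\nbanana\ncherry\n' | tail -n1              # cherry
printf 'apple\nbanana\ncherry\n' | grep an | tr a-z A-Z   # BANANA
```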
Root user
The root user can do anything to the computer; it is the superuser.
```shell
sudo <command>
```
Run a command as the superuser.
```shell
sudo su
```
Drop into a root shell.
Variable
Assign a variable (no spaces around `=`) and substitute it into a string:
```shell
foo=bar
echo "Value is $foo"
```
Output:
```
Value is bar
```
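A sketch of how quoting affects substitution in bash: double quotes expand variables, single quotes keep the text literally:

```shell
foo=bar                  # no spaces around "="
echo "Value is $foo"     # Value is bar
echo 'Value is $foo'     # Value is $foo
# foo = bar  would fail: the shell would try to run a program named "foo"
```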
- `$0`: the name of the script.
- `$1` to `$9`: the first to the ninth argument given to the script.
- `$_`: the last argument of the previous command.
- `$?`: the exit (error) code of the previous command; 0 means no error.
- `$#`: the number of arguments given to the program.
- `$$`: the process ID of the shell running the script.
- `$@`: all the arguments given to the program.
How to use `$?`:
```shell
false || echo "Oops fail"
true || echo "this will not be printed"
true && echo "Things went well"
false && echo "This will not be printed"
```
- `||`: if the `$?` of the left command is not 0, bash executes the right one.
- `&&`: if the `$?` of the left command is 0, bash executes the right one.
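A sketch that writes a throwaway script (the name `args.sh` is hypothetical) and runs it with three arguments to show several of these variables at once. `grep -q` on an empty input finds nothing, so its exit code is 1:

```shell
dir=$(mktemp -d)
cat > "$dir/args.sh" <<'EOF'
#!/usr/bin/env bash
echo "script: $0"
echo "first: $1"
echo "count: $#"
echo "all:" "$@"
grep -q "needle" /dev/null    # finds nothing, so grep exits with 1
echo "grep exit code: $?"
EOF
out=$(bash "$dir/args.sh" one two three)
echo "$out"
```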
Getting the output of a command into a variable (command substitution):
```shell
foo=$(pwd)
```
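A sketch of command substitution with a pipeline inside: `$( ... )` runs the commands and substitutes their standard output, with trailing newlines stripped:

```shell
upper=$(echo hello | tr a-z A-Z)
echo "upper is $upper"        # upper is HELLO
here=$(pwd)
echo "We are in $here"
```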
Globbing & Expand
```shell
ls *.sh
```
- `*` matches any string (including an empty one).
- `?` matches exactly one character.
```shell
ls image.png image.jpg
```
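A sketch of the two wildcards, using made-up file names in a temporary directory. In bash's default configuration, a pattern that matches nothing is passed through unchanged:

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a.sh b.sh ab.sh notes.txt

echo *.sh     # "*" matches any string: a.sh, ab.sh and b.sh
echo ?.sh     # "?" matches exactly one character: a.sh b.sh
echo *.md     # no match, so the pattern stays as-is: *.md
```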
Running scripts written in other languages
Python:
```python
#!/usr/local/bin/python
```
The first line (the "shebang") tells the system which interpreter to use to run the script.
Python:
```python
#!/usr/bin/env python
```
Here the first line runs `env`, which looks up `python` in `$PATH`, so the script works no matter where Python is installed.
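A sketch of the `env` shebang with bash instead of Python, so it runs without extra installs. The script name `hello.sh` is hypothetical; the file must be executable for the shebang line to be used:

```shell
dir=$(mktemp -d)
cat > "$dir/hello.sh" <<'EOF'
#!/usr/bin/env bash
echo "Running with bash $BASH_VERSION"
EOF
chmod +x "$dir/hello.sh"   # without the execute bit, the shebang is never read
"$dir/hello.sh"
```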
Shell check
`shellcheck` can check your `*.sh` files and point out many common mistakes and bugs.
2. Vim
Vim's interface is itself a kind of programming language: you combine its commands (movements, operators, counts, modifiers) to accomplish a goal.
Tips: `^V`, `Ctrl+V`, and `<C-V>` all mean pressing the Ctrl key and the V key together.
- Normal mode:
  - `h`: left (like the left arrow key). `j`: down. `k`: up. `l`: right.
  - `w`: move the cursor forward by one word. `b`: move backward by one word. `e`: move to the end of a word.
  - `0`: move to the beginning of the line. `$`: move to the end of the line. `^` (Shift+6): move to the first non-blank character of the line.
  - `^U`, `^D`: scroll up; scroll down.
  - `G`: move all the way to the bottom; `gg`: move to the top.
  - `f` + `?`: jump to the next character `?` after the cursor; `F` + `?`: jump to the previous `?` before the cursor.
  - `t` + `?`: jump to just before the next `?` after the cursor; `T` + `?`: jump to just after the previous `?` before the cursor.
  - `o`, `O`: open a new line below / above the current line and enter Insert mode.
  - `d` + movement: delete everything between the cursor and where the movement would take it. `dd`: delete this line.
  - `c` + movement: like `d` + movement, then enter Insert mode. `cc`: delete this line and enter Insert mode.
  - `y` + movement: copy ("yank"). `yy`: copy this line. `p`: paste.
  - `u`: undo the last change. `^R`: redo the last undone change.
    - If you make a lot of changes in Insert mode and then return to Normal mode, all those changes count as ONE change, so `u` undoes all of them.
  - `x`: delete the character under the cursor. `r` + `?`: replace that character with `?`.
  - `.`: repeat the previous change.
  - `/` + `???`: search for `???`.
- Insert mode: press `i` in Normal mode to enter Insert mode; press `Esc` to return to Normal mode.
- Replace mode: press `R` to enter; press `Esc` to leave.
- Visual mode: press `v` to enter; used to select blocks of text.
  - Visual line: press `Shift+V` to enter.
  - Visual block: press `^V` to enter.
- Command-line mode: press `:` to enter.
  - `:w`: write the changes to the file (save).
  - `:q`: quit (close the current window).
  - `:qa`: close all windows.
  - `:help <topic>`: show the documentation for the key or command given after `:help`.
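Counts: a number followed by a command executes the command that many times, e.g. `3w` moves forward three words.
Modifiers: `a` means around (the whole group, including the delimiters) and `i` means inside. They let you act on grouping characters such as parentheses and square brackets, e.g. `ci(` changes the text inside a pair of parentheses.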
3. Data Wrangling
The lecture uses a server's system log as the demonstration data.
```shell
ssh <username>@<remote server name> journalctl
```
This is the raw data which will be used in data wrangling.
```shell
ssh tsp journalctl | grep ssh | grep "Disconnected from" > ssh.log
```
Now I have all the lines that contain `ssh` and `Disconnected from`; they also contain the username of whoever tried to log into the server, their IP address, and so on.
Using `sed` and regular expressions
```shell
sed 's/StringToBeReplaced/NewString/g'
```
- `.`: any single character.
- `-E`: makes `sed` support extended regular expressions.
- `*`: zero or more of the previous item.
- `+`: one or more of the previous item.
- `?`: zero or one of the previous item.
- `[]`: matches one of several characters, e.g. `[ab]` matches `a` or `b`, and `[0-9]` matches any digit from `0` to `9`.
- By default `s` only replaces the first match on each line; to replace every match, add the `g` modifier: `sed 's/[ab]//g'`.
- `()`: a capture group. With `-E`, use `\(` to match a literal `(` in the string; likewise `\[` matches a literal `[`.
- `|`: or, e.g. `(ab|bc)*`.
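A few one-liners exercising the items above, on made-up input strings:

```shell
# Without "g", s/// only replaces the first match on each line
echo 'abcab' | sed 's/[ab]//'                     # bcab
echo 'abcab' | sed 's/[ab]//g'                    # c
# -E enables extended regexes: "+" means one or more of the previous item
echo 'aaa-b' | sed -E 's/a+/A/'                   # A-b
# Capture groups: \1 and \2 refer to what each (...) matched
echo 'John Smith' | sed -E 's/(.*) (.*)/\2, \1/'  # Smith, John
# With -E, \( matches a literal "(" instead of starting a group
echo 'f(x)' | sed -E 's/\(x\)/[y]/'               # f[y]
```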
The ultimate regular expression, which matches the log lines
The log line format:
```
Jan 14 02:00:20 the squarreplanet.com sshd[25663]: Disconnected from authenticating user root 167.99.46.65 port 564692 [preauth]
```
Command:
```shell
cat ssh.log | sed -E 's/^.*Disconnected from (invalid |authenticating )?user (.*) [0-9.]+ port [0-9]+( \[preauth\])?$/\2/' | head -n100
```
- `^`: matches the start of a line; `$`: matches the end of a line.
- `\2`: the second capture group. Here the whole matched line is replaced by the second capture group, the username.
Now we have all the usernames!
You can use https://regex101.com to debug and test your regular expression.
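Besides regex101, the expression can be tried locally on a couple of sample lines before running it on the whole log. The first line below is the sample from the log above; the second is made up to cover the `invalid user` case without `[preauth]`. Note that `from`, `authenticating`, and the spaces around the optional groups must match the log text exactly:

```shell
regex='s/^.*Disconnected from (invalid |authenticating )?user (.*) [0-9.]+ port [0-9]+( \[preauth\])?$/\2/'
users=$(
  {
    echo 'Jan 14 02:00:20 the squarreplanet.com sshd[25663]: Disconnected from authenticating user root 167.99.46.65 port 564692 [preauth]'
    # a made-up second line, without "[preauth]"
    echo 'Jan 14 02:00:21 example.com sshd[25664]: Disconnected from invalid user admin 10.0.0.1 port 22'
  } | sed -E "$regex"
)
echo "$users"    # root, then admin
```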
```shell
cat ssh.log | sed -E 's/^.*Disconnected from (invalid |authenticating )?user (.*) [0-9.]+ port [0-9]+( \[preauth\])?$/\2/' | head -n100 | sort | uniq -c
```
Sort the usernames, then collapse repeated ones into a single line with a count in front. `uniq` only collapses adjacent duplicates, which is why `sort` comes first.
```shell
cat ssh.log | sed -E 's/^.*Disconnected from (invalid |authenticating )?user (.*) [0-9.]+ port [0-9]+( \[preauth\])?$/\2/' | head -n100 | sort | uniq -c | sort -nk1,1 | tail -n10
```
- `-n`: sort numerically.
- `-k1,1`: sort by a key that starts and ends at column 1; columns are separated by whitespace by default.
So this shows the ten most common usernames, with the most common last.
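The counting pipeline on a hypothetical list of usernames, standing in for the `sed` output:

```shell
users='root
admin
root
test
root
admin'
# sort groups equal names together, uniq -c counts each group,
# and sort -nk1,1 orders the lines by that count
printf '%s\n' "$users" | sort | uniq -c | sort -nk1,1
```

The last line of the output is the most common name, `root` with a count of 3.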
`paste`: joins its input lines together in a given format.
`paste -sd,`: join all the input into a single line (`-s`), using `,` as the separator (`-d`).
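Two quick examples. The trailing `-` names standard input explicitly, which some `paste` implementations (e.g. on macOS) require:

```shell
printf 'root\nadmin\ntest\n' | paste -sd, -    # root,admin,test
printf '1\n2\n3\n' | paste -sd+ -              # 1+2+3
```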
`awk`: a column-based string processor, focused on columnar data.
```shell
cat ssh.log | sed -E 's/^.*Disconnected from (invalid |authenticating )?user (.*) [0-9.]+ port [0-9]+( \[preauth\])?$/\2/' | sort | uniq -c | awk '$1 == 1 && $2 ~ /^c.*e$/ {print $0}'
```
Print the whole line if the first column equals `1` and the second column matches the pattern `/^c.*e$/`, which means it starts with a `c` and ends with an `e`.
```shell
cat ssh.log | sed -E 's/^.*Disconnected from (invalid |authenticating )?user (.*) [0-9.]+ port [0-9]+( \[preauth\])?$/\2/' | sort | uniq -c | awk 'BEGIN {rows = 0} $1 == 1 && $2 ~ /^c.*e$/ {rows += 1} END {print rows}'
```
`BEGIN` runs before the first line is read; `END` runs after the last line.
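Both `awk` programs on hypothetical `count username` lines, shaped like the output of `uniq -c`:

```shell
data='1 cake
2 cube
1 core
1 coffee
1 bob'
# Print whole lines whose count is 1 and whose name starts with c and ends with e
matches=$(printf '%s\n' "$data" | awk '$1 == 1 && $2 ~ /^c.*e$/ {print $0}')
echo "$matches"    # 1 cake, 1 core, 1 coffee
# Count those lines instead, using BEGIN and END blocks
count=$(printf '%s\n' "$data" | awk 'BEGIN {rows = 0} $1 == 1 && $2 ~ /^c.*e$/ {rows += 1} END {print rows}')
echo "$count"      # 3
```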
`bc`: an arbitrary-precision calculator. It reads expressions from its input and evaluates them.
```shell
cat ssh.log | sed -E 's/^.*Disconnected from (invalid |authenticating )?user (.*) [0-9.]+ port [0-9]+( \[preauth\])?$/\2/' | head -n100 | sort | uniq -c | awk '$1 != 1 {print $1}' | paste -sd+ | bc -l
```
This adds up the counts of all usernames that appear more than once.
`xargs`: turns the lines of its input into arguments of a command.
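A sketch in a temporary directory; the file names are made up:

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
# Each input line becomes an argument, so this runs: touch a.txt b.txt c.txt
printf 'a.txt\nb.txt\nc.txt\n' | xargs touch
ls
# echo ignores its standard input, but xargs turns that input into arguments
printf 'one\ntwo\n' | xargs echo    # one two
```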
4. Command-line Environment
Job control
`sleep <Integer>`: sleep for `<Integer>` seconds.
Press `Ctrl+C`: sends a `SIGINT` signal, asking the program to stop itself.
Run `man signal` to check out the other signals.
Press `Ctrl+Z`: suspend the job.
```shell
nohup sleep 2000 &
```
`nohup` makes the job ignore the hangup signal (`SIGHUP`); `&` runs the job in the background.
`jobs`: shows the existing jobs.
```shell
jobs
```
Use `bg %<number>` to continue the job with that number running in the background; use `fg %<number>` to bring it to the foreground instead.
`kill` command
Lets you send any sort of Unix signal to a process or job, e.g. pause job 1 with `SIGSTOP`:
```shell
kill -STOP %1
```
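A sketch of sending signals from a script. It uses the PID in `$!` rather than `%1`, since a non-interactive shell runs without job control; the `sleep 30` job is just a placeholder:

```shell
sleep 30 &            # start a job in the background
pid=$!                # $! holds the PID of the most recent background job
jobs                  # lists it as Running
kill -STOP "$pid"     # pause it (SIGSTOP cannot be caught or ignored)
kill -CONT "$pid"     # resume it
kill "$pid"           # the default signal is SIGTERM, asking it to terminate
rc=0
wait "$pid" || rc=$?
echo "exit status: $rc"    # 128 + 15 (SIGTERM) = 143
```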
Terminal multiplexers (`tmux` in this lecture)
The three core concepts of tmux:
- Sessions
- Windows
- Panes
Dotfiles (files whose names start with a `.`)
- `alias <name>="<some commands>"`: bind `<name>` to the commands (no spaces around `=`).
- `alias <name>`: show the current mapping of `<name>`.
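A minimal sketch; `ll` is just a common example name. An alias defined like this only lasts for the current shell, which is why aliases usually go into `~/.bashrc`:

```shell
alias ll='ls -l'    # no spaces around "="
alias ll            # show the current mapping of ll
```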
Run `vim ~/.bashrc` to configure your own bash shell.
You can use symlinks and git to keep your dotfiles neat.
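A sketch of the symlink idea, assuming a layout of my own: the real file lives in a dotfiles directory (which you would put under git), and `~/.bashrc` is only a link to it. To avoid touching the real home directory, both locations here are temporary directories:

```shell
dotfiles=$(mktemp -d)   # stands in for a git-tracked dotfiles directory
fakehome=$(mktemp -d)   # stands in for $HOME
echo "alias ll='ls -l'" > "$dotfiles/bashrc"

# The link points from the expected location to the tracked file
ln -s "$dotfiles/bashrc" "$fakehome/.bashrc"
ls -l "$fakehome/.bashrc"    # shows .bashrc -> .../bashrc
cat "$fakehome/.bashrc"
```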
Remote machines (using the `ssh` command)
SSH (Secure SHell) takes care of reaching wherever we want to go and opening a shell session there.
```shell
ssh [-p port] <username>@<IP address | machine name | DNS name>
```
SSH keys
The `ssh-keygen` command generates a key pair.
Use:
```shell
cat <path to public key> | ssh <username>@<DNS name> 'cat >> ~/.ssh/authorized_keys'
```
to let the server remember your public key (`ssh-copy-id` does the same thing).
`scp` command
Copy files over SSH: `scp <file> <name>@<DNS name>:<file>`
`rsync` command
```shell
rsync -avP <folder name> <name>@<DNS name>:<folder name>
```
`scp` copies the files whether or not they already exist on the other side. `rsync` checks which files are already there and only copies what is missing or changed, and with `-P` it can resume a transfer from where it stopped.
5. Version Control (Git)
Git's data model (logical)
```
type blob = array<byte> // a file
```
Commands
`git init`
`cd` into a directory and initialize a git repository there:
```shell
git init
```
All the git-related data is stored in the `.git` directory.
`git status`
Shows the status, such as which branch I'm currently on and which changes are staged or not yet staged.
`git add`
Stage the changes you've made.
`git commit`
Commit the staged changes.
`git commit -m "<message>"`: commit the staged changes with `<message>` as the commit message.
`git log`
Visualize the version history.
```shell
git log --all --graph --decorate
```
This shows the log as a graph.
`git checkout`
Check out the files of a certain commit or branch.
`git diff`
Show the changes made to a file:
```shell
git diff hello.txt
```
With no revision given, `git diff` compares the working tree to the staging area; `git diff HEAD hello.txt` compares it to `HEAD` (the commit I'm currently looking at).
Branching and Merging
- `git branch`: list all branches.
- `git branch -vv`: list branches with more information.
- `git branch whatever`: create a new branch named `whatever`.
- `git checkout -b dog`: the same as `git branch dog` followed by `git checkout dog`.
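A sketch of branching and a fast-forward merge in a throwaway repository, assuming `git` is installed. The identity passed with `-c` (wrapped in a hypothetical helper `g`) exists only for these commands, and the default branch may be called `master` or `main` depending on your git configuration:

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
# g is a hypothetical helper that supplies a throwaway commit identity
g() { git -c user.name='Example' -c user.email='example@example.com' "$@"; }

echo 'hello' > hello.txt
g add hello.txt
g commit -q -m 'Add hello.txt'
main=$(git rev-parse --abbrev-ref HEAD)   # "master" or "main"

git checkout -q -b dog        # same as: git branch dog; git checkout dog
echo 'woof' > dog.txt
g add dog.txt
g commit -q -m 'Add dog.txt'

git checkout -q "$main"
g merge -q dog                # fast-forward: the main branch now has dog.txt too
git log --all --graph --decorate --oneline
```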
`git merge`
If there are merge conflicts, resolve them manually, `git add` the fixed files, and then run `git merge --continue` to continue the merge. If we can't fix the conflicts, run `git merge --abort` instead.
Remote
`git remote`
Other copies of the same repository that git knows about are called remotes. The `git remote` command lists all the remotes of the current repository.
```shell
git remote add <name> <URL>
```
`git push`
```shell
git push <remote> <local branch>:<remote branch>
```
`git branch --set-upstream-to=origin/master`: set the default remote (upstream) branch for the current branch.
`git clone`
```shell
git clone <URL> <folder name>
```
`git fetch` & `git pull`
```shell
git fetch <remote>
```
`git pull`: `git fetch` followed by `git merge`.
Other useful commands
- `git config`: configure git.
- `git clone --depth 1`: a shallow clone; use it when you don't want the whole git history, only the latest version.
- `git blame`: show who last edited each line of a file, when, and in which commit.
- `git bisect`: binary search the history for the first commit that fails a test.
7. Debugging and Profiling
I skipped a lot of this lecture; you may want to watch it yourself.
Logs
System log
- `log show --last 10s` (macOS): show the last 10 seconds of the system log.
- `logger <message>`: add the message to the system log.
Debugger
Depends on the language you use. The lecture uses Python as the example.
Profiler
Tracing profiler: instruments your code. Every time your code enters a function call, it takes a note; when the program finishes, it reports how much time was spent executing each function.
Sampling profiler: the issue with tracing profilers is that they add a lot of overhead. A sampling profiler instead runs your program and, at regular intervals, halts it and looks at the stack trace to see where the code is executing at that moment. After doing this long enough, it can tell where most of the time is being spent.
`htop` command: an interactive process viewer.
8. Metaprogramming
Build systems
The idea of a build system is to encode the rules for which commands to run to build particular targets into a tool that can do it for you. In particular, you teach the tool the dependencies between the different artifacts you might build. The common core of a build system is:
- targets: the things you want to build;
- dependencies: the things that need to be built before a target can be built;
- rules: how to go from a complete list of dependencies to a given target.
In this lecture, the `make` command is introduced as a build tool.
`Makefile`: where you encode these targets, dependencies, and rules. E.g.:
```makefile
paper.pdf: paper.tex plot-data.png
```
Then run the `make` command.
Tips:
- `%`: a pattern that matches any string. If you reuse it in the dependency part, it stands for the same string there.
- `:`: the things on the left side of `:` are the targets, and those on the right side are the dependencies. The following lines starting with a `<tab>` are the rule for building the target from those dependencies.
- `$*`: a special variable, already defined for you in pattern rules, that holds whatever the `%` matched.
- `$@`: a special variable holding the name of the target.
- `make` only remakes a target when one of its dependencies has changed (i.e. is newer than the target).
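A sketch of a pattern rule, assuming GNU `make` is installed. The Makefile and the file names `plot.dat` / `report-plot.txt` are hypothetical; `printf` turns `\t` into the tab that recipe lines require and `%%` into a literal `%`:

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'some data\n' > plot.dat

# Writes:  report-%.txt: %.dat
#          <tab>cp $*.dat $@
printf 'report-%%.txt: %%.dat\n\tcp $*.dat $@\n' > Makefile

make report-plot.txt    # % = plot, so $* = plot and $@ = report-plot.txt
make report-plot.txt    # plot.dat has not changed, so there is nothing to redo
```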
Semantic versioning
Suppose `8.1.7` is some software's version number: `8` is the major version, `1` is the minor version, and `7` is the patch number.
- If the external interface of the software doesn't change, such as when fixing a security bug, increment the patch number.
- If you add something to the library (in a backward-compatible way), increment the minor version and set the patch number to zero.
- If you make a backward-incompatible change, i.e. software that used to work with your previous version will no longer work (such as when you remove or rename a function), increment the major version and set minor and patch to zero.
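These three rules fit in a small function. `bump_version` is a hypothetical helper, not a standard tool:

```shell
# usage: bump_version <major|minor|patch> <version>
bump_version() {
  IFS=. read -r major minor patch <<EOF
$2
EOF
  case $1 in
    major) echo "$((major + 1)).0.0" ;;
    minor) echo "$major.$((minor + 1)).0" ;;
    patch) echo "$major.$minor.$((patch + 1))" ;;
    *) echo "unknown part: $1" >&2; return 1 ;;
  esac
}
bump_version patch 8.1.7    # 8.1.8
bump_version minor 8.1.7    # 8.2.0
bump_version major 8.1.7    # 9.0.0
```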
Lock file: makes sure that you don't accidentally update something. At its core, a lock file is just a list of your dependencies and the exact versions of them you are currently using.
Continuous integration
9. Security and Cryptography
Entropy
Entropy is a measure of randomness.
$$
\text{Entropy} = \log_{2}(\text{number of possibilities})
$$
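A worked example in the spirit of the lecture's password discussion (the arithmetic is my own): a passphrase of four words chosen uniformly at random from a 100,000-word dictionary, versus eight characters chosen uniformly from the 62 letters and digits:

$$
\log_{2}(100000^{4}) = 4\log_{2}(100000) \approx 4 \times 16.61 \approx 66.4 \text{ bits}
$$

$$
\log_{2}(62^{8}) = 8\log_{2}(62) \approx 8 \times 5.954 \approx 47.6 \text{ bits}
$$

So the four random words carry more entropy than the eight random characters, as long as the words really are picked at random.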
Hash functions
E.g.: SHA-1(bytes) -> 160 bits
```shell
printf "hello" | sha1sum
```
Properties:
- Non-invertible: hard to find an input that produces a given output, even though you have the function.
- Collision resistant: hard to find two inputs that have exactly the same output.
You can use a hash function to check whether two inputs are the same. You can also use it to commit to a piece of information, say the result of a coin flip: you tell someone only the hash, they make their guess, and afterwards they can check the result themselves, so you can't cheat to win.
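A sketch of the coin-flip commitment. My assumption: hashing just `heads` is not enough, because with only two possible values the other person could hash both and compare, so a random nonce is mixed in and revealed later. It uses `sha256sum` from GNU coreutils (on macOS, `shasum -a 256`), since SHA-1 is no longer considered collision resistant:

```shell
choice=heads
nonce=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')   # 16 random bytes as hex
commitment=$(printf '%s %s' "$choice" "$nonce" | sha256sum | cut -d' ' -f1)
echo "commitment (share this first): $commitment"

# Later, reveal $choice and $nonce; anyone can recompute the hash and compare
check=$(printf '%s %s' "$choice" "$nonce" | sha256sum | cut -d' ' -f1)
[ "$check" = "$commitment" ] && echo "commitment verified"
```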
Key derivation functions (KDFs)
E.g.: PBKDF2(…) ->
Properties:
- Slow to compute, deliberately. (Unlike the P vs NP intuition, checking is just as slow as computing: checking a password means computing the KDF again.)
In password authentication, the server stores the KDF output for a password; when someone types a password, the server runs the KDF on it and compares the result. Being slow is fine here, because a legitimate login only needs one computation. But someone who has stolen the website's database and wants to brute-force the passwords has to compute it millions of times, so the attack is slowed down, giving the website time to limit the damage.
Symmetric key cryptography
Some of its functions:
- KeyGeneration() -> key
- Encrypt(plaintext, key) -> ciphertext
- Decrypt(ciphertext, key) -> plaintext
Properties:
- Given the ciphertext, you can't figure out the plaintext (without the key).
- Decrypt(Encrypt(m, k), k) = m
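A demo:
```shell
openssl aes-256-cbc -salt -in README.md -out README.enc.md
```
`aes-256-cbc` is the name of the cipher (AES with a 256-bit key, in CBC mode); `openssl` derives the actual key from the passphrase you type.
`-salt`: rather than just storing the hash of a password, we first generate random data called a salt and store the hash of the password combined with the salt, so that the same password used on different websites is stored differently. Here, `openssl` mixes a random salt into the key derivation in the same way.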
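A sketch of a full round trip, assuming OpenSSL 1.1.1 or newer. `-pbkdf2` asks `openssl` to derive the key with PBKDF2, and `-pass pass:...` supplies a made-up passphrase non-interactively, which is fine for a demo but not for real secrets:

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
echo 'secret notes' > README.md

openssl aes-256-cbc -salt -pbkdf2 -pass pass:demo-passphrase -in README.md -out README.enc.md
openssl aes-256-cbc -d -pbkdf2 -pass pass:demo-passphrase -in README.enc.md -out README.dec.md

# Decrypt(Encrypt(m, k), k) = m
cmp -s README.md README.dec.md && echo "round trip OK"
```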
Asymmetric key cryptography
Some of its functions:
- KeyGeneration() -> (public key, private key)
- Encrypt(plaintext, public key) -> ciphertext
- Decrypt(ciphertext, private key) -> plaintext
- Sign(message, private key) -> signature
- Verify(message, signature, public key) -> ok?
Properties:
- Given the ciphertext, you can't figure out the plaintext (without the private key).
- Decrypt(Encrypt(m, public key), private key) = m
- A signature is hard to forge (without the private key).
- Verify(m, Sign(m, private key), public key) = ok
Hybrid encryption
An example:
Sender's side:
1. Generate a random symmetric key.
2. Encrypt the message with the symmetric key (symmetric encryption) -> ciphertext.
3. Encrypt the symmetric key with the receiver's public key (asymmetric encryption) -> encrypted key.
Receiver's side:
1. Decrypt the encrypted key with the private key (asymmetric decryption) -> symmetric key.
2. Decrypt the ciphertext with the symmetric key (symmetric decryption) -> message.
10. Potpourri
Keyboard remapping
Daemon
Some programs run as background processes: they just execute in the background, waiting for events to happen or providing some functionality on your computer. Daemon names often end with a `d`: `sshd`, `systemd`.
File system
Backup
API
Command-line arguments
- `--help`: print how to run this program.
- `--version`: print the version of the program.
- `--verbose` or `-v`: increase the program's output, making it print more about what it is doing. Very often you can repeat the flag, like `-vvv`.
- Dry run flag: its name differs from tool to tool. The tool runs but does not actually make any changes; instead it just tells you what it would have done without the dry run.
- Interactive mode: for example, `rm` and `mv` both have one, often `-i` (although not always). In interactive mode, the tool prompts you whenever it's about to do something you can't undo.
- Recursive flag: often `-r`. It lets the tool traverse down into a directory tree; you usually need to opt into this behavior.
- Many tools ask for a file name or path. Instead of a file name, you can often give a single `-`, which means standard input or standard output, depending on whether the argument is an input or an output file.
- If you don't want some arguments to be interpreted as flags, use `--`: nothing after it is interpreted as a flag. (`--` is a separate argument, with spaces on both sides.)
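A sketch of the last two conventions with `cat`, `touch`, and `rm`, in a temporary directory. The file named `-v` is deliberately awkward:

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
echo 'from a file' > notes.txt

# "-" as a file name means standard input here: cat prints stdin between the two files
echo 'from stdin' | cat notes.txt - notes.txt

# A file whose name starts with "-" looks like a flag,
# so "--" is needed to create it and to remove it
touch -- -v
rm -- -v       # plain "rm -v" would treat it as the verbose flag
ls
```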
Window managers
VPNs
Markdown
Hammerspoon (macOS)
Use Lua scripts to automate your computer.
Booting process and live USBs
Virtual machines
Vagrant
Notebook programming environments
Jupyter Notebook
GitHub
11. Q&A
Other data wrangling tools
Perl? Use Vim, and pandas in Python.