The missing semester of your CS education

Course overview + the shell

Concepts

Paths. Paths are a way to name the location of a file on your computer.

When running commands:

Options. Everything that does take a value. Ex. tail -n 50. In this case, -n is an option and 50 is the argument.

Flags. Everything that doesn't take a value. Ex. tail -f error.log. In this case, -f is a flag

ls

List the content of the current directory. Use the flag -l to use a long listing format

1missing:~$ ls -l /home
2drwxr-xr-x 1 missing users 4096 Jun 15 2019 missing

This gives us a bunch more information about each file or directory present.

  • d, the d at the beginning of the line tells us that missing is a directory.

Then follow three groups of three characters (rwx). These indicate what permissions the owner of the file (missing), the owning group (users), and everyone else respectively have on the relevant item.

The permissions meaning change from directories and files. The permissions in files work as you would expected, if you have write (w) permission you can modify the file, if you have execution (x) permission and if you have read (r) permissions you can read the content of a file. In directories, it works as follows:

  • w, modify permission (i.e., add/remove files in it)
  • x, execute. To enter a directory, a user must have “search” (represented by “execute”: x) permissions on that directory (and its parents).
  • r, to list its contents, a user must have read (r) permissions on that directory.

Connecting programs

In the shell, programs have two primary “streams” associated with them: their input stream and their output stream. When the program tries to read input, it reads from the input stream, and when it prints something, it prints to its output stream. Normally, a program’s input and output are both your terminal. That is, your keyboard as input and your screen as output. However, we can also rewire those streams!

The simplest form of redirection is < file and > file. These let you rewire the input and output streams of a program to a file respectively:

1missing:~$ echo hello > hello.txt
2missing:~$ cat hello.txt
3hello
4missing:~$ cat < hello.txt
5hello
6missing:~$ cat < hello.txt > hello2.txt
7missing:~$ cat hello2.txt
8hello

You can also use >> to append to a file. Where this kind of input/output redirection really shines is in the use of pipes. The | operator lets you “chain” programs such that the output of one is the input of another.

A versatile and powerful tool

The root user is above (almost) all access restrictions, and can create, read, update, and delete any file in the system.

One thing you need to be root in order to do is writing to the sysfs file system mounted under /sys. sysfs exposes a number of kernel parameters as files, so that you can easily reconfigure the kernel on the fly without specialized tools. Note that sysfs does not exist on Windows or macOS.

Commands

1cd - # it change the directory to the previously you where in
2tee # it takes an input and writes it to a file but also to standard out.

Exercises

  1. Create a new directory called missing under /tmp.
1cd /temp
2mkdir missing
  1. Look up the touch program. The man program is your friend.
  2. Use touch to create a new file called semester in missing.
1touch semester
  1. Write the following into that file, one line at a time:

    #!/bin/sh curl --head --silent https://missing.csail.mit.edu

    The first line might be tricky to get working. It’s helpful to know that # starts a comment in Bash, and ! has a special meaning even within double-quoted (") strings. Bash treats single-quoted strings (') differently: they will do the trick in this case. See the Bash quoting manual page for more information.

    1echo "#!/bin/sh" >> semester # <- this doesn't work
    2echo '#!/bin/sh' >> semester # <- this does.
  2. Try to execute the file, i.e. type the path to the script (./semester) into your shell and press enter. Understand why it doesn’t work by consulting the output of ls (hint: look at the permission bits of the file).

    Because, the owner doesn't have permission to execute it (x)

  3. Run the command by explicitly starting the sh interpreter, and giving it the file semester as the first argument, i.e. sh semester. Why does this work, while ./semester didn’t?

  4. Look up the chmod program (e.g. use man chmod).

  5. Use chmod to make it possible to run the command ./semester rather than having to type sh semester. How does your shell know that the file is supposed to be interpreted using sh? See this page on the shebang line for more information.

    1chmod 744 semester #744 adds a permission to write, read and execute to the owner
  6. Use | and > to write the “last modified” date output by semester into a file called last-modified.txt in your home directory.

    1./semester | grep -i last-modified | cut -d ' ' -f 2- > last-modified.txt
    2# tee works great if you want to write the input into a file and output in the standar output
    3./semester | grep -i last-modified | cut -d ' ' -f 2- | tee last-modified.txt
  7. Write a command that reads out your laptop battery’s power level or your desktop machine’s CPU temperature from /sys. Note: if you’re a macOS user, your OS doesn’t have sysfs, so you can skip this exercise.

Shell Tools and Scripting

Shell scripts are the next step in complexity. Most shells have their own scripting language with variables, control flow, and its own syntax. What makes shell scripting different from other scripting programming language is that it is optimized for performing shell-related tasks. Thus, creating command pipelines, saving results into files, and reading from standard input are primitives in shell scripting, which makes it easier to use than general-purpose scripting languages.

Assigning variables

To assign variables in bash, use the syntax foo=bar and access the value of the variable with $foo. Note that foo = bar will not work since it is interpreted as calling the foo program with arguments = and bar.

Strings in bash can be defined with ' and " delimiters, but they are not equivalent. Strings delimited with ' are literal strings and will not substitute variable values whereas " delimited strings will.

1foo=bar
2echo "$foo"
3# prints bar
4echo '$foo'
5# prints $foo

Relevant variables

  • $0 - Name of the script
  • $1 to $9 - Arguments to the script. $1 is the first argument and so on.
  • $@ - All the arguments
  • $# - Number of arguments
  • $? - Return code of the previous command
  • $$ - Process identification number (PID) for the current script
  • !! - Entire last command, including arguments. A common pattern is to execute a command only for it to fail due to missing permissions; you can quickly re-execute the command with sudo by doing sudo !!
  • $_ - Last argument from the last command. If you are in an interactive shell, you can also quickly get this value by typing Esc followed by .

Another common pattern is wanting to get the output of a command as a variable. This can be done with command substitution. Whenever you place $( CMD ) it will execute CMD, get the output of the command, and substitute it in place. For example, if you do for file in $(ls), the shell will first call ls and then iterate over those values. A lesser-known similar feature is process substitution, <( CMD ) will execute CMD and place the output in a temporary file and substitute the <() with that file’s name. This is useful when commands expect values to be passed by file instead of by STDIN. For example, diff <(ls foo) <(ls bar) will show differences between files in dirs foo and bar.

When launching scripts, you will often want to provide arguments that are similar. Bash has ways of making this easier, expanding expressions by carrying out filename expansion. These techniques are often referred to as shell globbing.

  • Wildcards - Whenever you want to perform some sort of wildcard matching, you can use ? and * to match one or any amount of characters respectively. For instance, given files foofoo1foo2foo10 and bar, the command rm foo? will delete foo1 and foo2 whereas rm foo* will delete all but bar.
  • Curly braces {} - Whenever you have a common substring in a series of commands, you can use curly braces for bash to expand this automatically. This comes in very handy when moving or converting files.
1convert image.{png,jpg}
2# Will expand to
3convert image.png image.jpg
4mv *{.py,.sh} folder

Note that scripts need not necessarily be written in bash to be called from the terminal. For instance, here’s a simple Python script that outputs its arguments in reversed order:

1#!/usr/local/bin/python
2import sys
3for arg in reversed(sys.argv[1:]):
4 print(arg)

Finding files

1# Find all directories named src
2find . -name src -type d
3# Find all python files that have a folder named test in their path
4find . -path '*/test/*.py' -type f
5# Find all files modified in the last day
6find . -mtime -1
7# Find all zip files with size in range 500k to 10M
8find . -size +500k -size -10M -name '*.tar.gz'

fd is a simple, fast, and user-friendly alternative to find. It offers some nice defaults like colorized output, default regex matching, and Unicode support. It also has, in my opinion, a more intuitive syntax. For example, the syntax to find a pattern PATTERN is fd PATTERN.

Finding code

grep -C 5 will print 5 lines before and after the match. When it comes to quickly search through many files, you want to use -R since it will Recursively go into directories and look for files for the matching string.

But grep -R can be improved in many ways, such as ignoring .git folders, using multi CPU support, &c. Many grep alternatives have been developed, including ack, ag and rg.

1# Find all python files where I used the requests library
2rg -t py 'import requests'
3# Find all files (including hidden files) without a shebang line
4rg -u --files-without-match "^#!"
5# Find all matches of foo and print the following 5 lines
6rg foo -A 5
7# Print statistics of matches (# of matched lines and files )
8rg --stats PATTERN

Finding shell commands

The history command will let you access your shell history programmatically. It will print your shell history to the standard output. If we want to search there we can pipe that output to grep and search for patterns. history | grep find will print commands that contain the substring “find”.

In most shells, you can make use of Ctrl+R to perform backward search through your history.

Another cool history-related trick I really enjoy is history-based autosuggestions. First introduced by the fish shell, this feature dynamically autocompletes your current shell command with the most recent command that you typed that shares a common prefix with it. It can be enabled in zsh and it is a great quality of life trick for your shell.

Today I learned how to add history-based autosuggestions to my terminal.

The first thing you need to do is to install Oh My Zsh

After installing Oh My Zsh, you need to install zsh-autosuggestions based on your system

Finally, you clone this repository using

1git clone https://github.com/zsh-users/zsh-autosuggestions ${ZSH_CUSTOM:-~/.oh-my-zsh/custom}/plugins/zsh-autosuggestions

Add the plugin to the list of plugins for Oh My Zsh to load (inside ~/.zshrc):

1plugins=(zsh-autosuggestions)

and to apply the changes run

1exec zsh

Directory Navigation

More complex tools exist to quickly get an overview of a directory structure: tree, broot or even full fledged file managers like nnn or ranger.

Exercises

Read [man ls](https://www.man7.org/linux/man-pages/man1/ls.1.html) and write an ls command that lists files in the following manner

  • Includes all files, including hidden files
  • Sizes are listed in human-readable format (e.g. 454M instead of 454279954)
  • Files are ordered by recency
  • Output is colorized
1ls -lath

Editors (Vim)

Exercises

  1. Complete vimtutor. Note: it looks best in a 80x24 (80 columns by 24 lines) terminal window.
  2. Download our basic vimrc and save it to ~/.vimrc. Read through the well-commented file (using Vim!), and observe how Vim looks and behaves slightly differently with the new config.
  3. Install and configure a plugin: ctrlp.vim.
    1. Create the plugins directory with mkdir -p ~/.vim/pack/vendor/start
    2. Download the plugin: cd ~/.vim/pack/vendor/start; git clone https://github.com/ctrlpvim/ctrlp.vim
    3. Read the documentation for the plugin. Try using CtrlP to locate a file by navigating to a project directory, opening Vim, and using the Vim command-line to start :CtrlP.
    4. Customize CtrlP by adding configuration to your ~/.vimrc to open CtrlP by pressing Ctrl-P.
  4. To practice using Vim, re-do the Demo from lecture on your own machine.
  5. Use Vim for all your text editing for the next month. Whenever something seems inefficient, or when you think “there must be a better way”, try Googling it, there probably is. If you get stuck, come to office hours or send us an email.
  6. Configure your other tools to use Vim bindings (see instructions above).
  7. Further customize your ~/.vimrc and install more plugins.

Data Wrangling

Command-line Environment

Terminal Multiplexers

Terminal multiplexers like tmux allow you to multiplex terminal windows using panes and tabs so you can interact with multiple shells sessions. Moreover, terminal multiplexers let you detach a current terminal session and rettach at some point later in time.

Terminal multiplexer are composed of sessions. Sessions have windows and windows have panes.

Remote Machines

Key based authentication

ssh will look into .ssh/authorized_keys to determine which clients it should let in. To copy a public key over you can use:

1cat .ssh/id_ed25519.pub | ssh foobar@remote 'cat >> ~/.ssh/authorized_keys'

A simpler solution can be achieved with ssh-copy-id where available:

1ssh-copy-id -i .ssh/id_ed25519.pub foobar@remote

SSH Config

This config file is useful because instead of type ssh jjgo@192.168.246.142, you could just write ssh vm

1# ~/.ssh/config
2Host vm
3 User kkgp
4 HostName 192.168.246.142
5 IdentityFile ~/.ssh/id_ed25519
6 RemoteForward 9999 localhost:888

Version Control (Git)

Version control systems (VCSs) are tools used to track changes to source code (or other collections of files and folders).

Git’s data model

Data model, as pseudocode

1// a file is a bunch of bytes
2type blob = array<byte>
3
4// a directory contains named files and directories
5type tree = map<string, tree | blob>
6
7// a commit has parents, metadata, and the top-level tree
8type commit = struct {
9 parent: array<commit>
10 author: string
11 message: string
12 snapshot: tree
13}

Objects and content-addressing

An “object” is a blob, tree, or commit:

1type object = blob | tree | commit

Git command-line interface

Undo

  • git commit --amend: edit a commit’s contents/message
  • git reset HEAD <file>: unstage a file
  • git checkout -- <file>: discard changes

Advanced Git

  • git clone --depth=1: shallow clone, without entire version history
  • git add -p: interactive staging
  • git rebase -i: interactive rebasing
  • git blame: show who last edited which line
  • git stash: temporarily remove modifications to working directory
  • git bisect: binary search history (e.g. for regressions)
  • .gitignorespecify intentionally untracked files to ignore

Debugging and Profiling

Profiling

Profilers

CPU

here are two main types of CPU profilers: tracing and sampling profilers. Tracing profilers keep a record of every function call your program makes whereas sampling profilers probe your program periodically (commonly every millisecond) and record the program’s stack.

A caveat of many profilers is that they display time per function call. That can become unintuitive really fast, especially if you are using third-party libraries in your code since internal function calls are also accounted for. A more intuitive way of displaying profiling information is to include the time taken per line of code, which is what line profilers do.

Visualization

One common way to display CPU profiling information for sampling profilers is to use a Flame Graph, which will display a hierarchy of function calls across the Y axis and time taken proportional to the X axis.

The%20missing%20semester%20of%20your%20CS%20education%202fc7a9213da24bf89335b9c6bb8b99cb/Untitled.png

Call graphs or control flow graphs display the relationships between subroutines within a program by including functions as nodes and functions calls between them as directed edges. When coupled with profiling information such as the number of calls and time taken, call graphs can be quite useful for interpreting the flow of a program.

The%20missing%20semester%20of%20your%20CS%20education%202fc7a9213da24bf89335b9c6bb8b99cb/Untitled%201.png

Metaprogramming

Dependency management

Semantic Versioning

64[major].0[minor].20200324[minor]

  • If a new release does not change the API, increase the patch version.
  • If you add to your API in a backwards-compatible way, increase the minor version.
  • If you change the API in a non-backwards-compatible way, increase the major version.

Continuous integration systems

A brief aside on testing

  • Test suite: a collective term for all the tests
  • Unit test: a “micro-test” that tests a specific feature in isolation
  • Integration test: a “macro-test” that runs a larger part of the system to check that different feature or components work together.
  • Regression test: a test that implements a particular pattern that previously caused a bug to ensure that the bug does not resurface.
  • Mocking: to replace a function, module, or type with a fake implementation to avoid testing unrelated functionality. For example, you might “mock the network” or “mock the disk”.

Security and Cryptography

Entropy

Entropy is a measure of randomness. This is useful, for example, when determining the strength of a password.

Entropy is measured in bits, and when selecting uniformly at random from a set of possible outcomes, the entropy is equal to log_2(# of possibilities).

How many bits of entropy is enough? It depends on your threat model. For online guessing, as the XKCD comic points out, ~40 bits of entropy is pretty good. To be resistant to offline guessing, a stronger password would be necessary (e.g. 80 bits, or more).

Hash functions

A cryptographic hash function maps data of arbitrary size to a fixed size,

A hash function has the following properties:

  • Deterministic: the same input always generates the same output.
  • Non-invertible: it is hard to find an input m such that hash(m) = h for some desired output h.
  • Target collision resistant: given an input m_1, it’s hard to find a different input m_2 such that hash(m_1) = hash(m_2).
  • Collision resistant: it’s hard to find two inputs m_1 and m_2 such that hash(m_1) = hash(m_2) (note that this is a strictly stronger property than target collision resistance).

Key derivation functions

Similar to Hash functions but they are deliberately slow, in order to slow down offline brute-force attacks.

Symmetric cryptography

Hiding message contents is probably the first concept you think about when you think about cryptography. Symmetric cryptography accomplishes this with the following set of functionality:

1keygen() -> key (this function is randomized)
2encrypt(plaintext: array<byte>, key) -> array<byte> (the ciphertext)
3decrypt(ciphertext: array<byte>, key) -> array<byte> (the plaintext)

The encrypt function has the property that given the output (ciphertext), it’s hard to determine the input (plaintext) without the key.

Encrypt files

1openssl enc -aes-256-cbc -md sha512 -pbkdf2 -iter 100000 -salt -in README.md -out README.enc2.md

Decrypt files

The same command from above but you just add -d flag for decrypt and change the input file to match the encrypted file.

1openssl enc -aes-256-cbc -md sha512 -pbkdf2 -iter 100000 -salt -d -in README.enc.md -out README.dec.md

What's salt?

Salt is a random string appended to a hash.

El término hash tiene sentido en las hash functions, por ejemplo, cuando se almacenan contraseñas en una base de datos, no se deben almacenar los contraseña en sí, sino un hash de esta, el problema de hacer esto es que los hackers comenzaron a crear diccionarios, en el que tomaban palabras y su hash value con el fin de que. una vez que tenían el hash de la contraseña, buscaban en el diccionario para ver cual era el input, es decir, la contraseña en sí.

Salt viene a resolver este problema y lo que se hace es que en lugar de almacenar únicamente el hash, se almacena el salt, que si recordamos es un string aleatorio, mas el hash salt, hash(password, salt) y el diccionario se vuelve inservible

Asymmetric cryptography

The term “asymmetric” refers to there being two keys, with two different roles. A private key, as its name implies, is meant to be kept private, while the public key can be publicly shared and it won’t affect security (unlike sharing the key in a symmetric cryptosystem). Asymmetric cryptosystems provide the following set of functionality, to encrypt/decrypt and to sign/verify:

1keygen() -> (public key, private key) (this function is randomized)
2
3encrypt(plaintext: array<byte>, public key) -> array<byte> (the ciphertext)
4decrypt(ciphertext: array<byte>, private key) -> array<byte> (the plaintext)
5
6sign(message: array<byte>, private key) -> array<byte> (the signature)
7verify(message: array<byte>, signature: array<byte>, public key) -> bool (whether or not the signature is valid)

The sign/verify functions have the same properties that you would hope physical signatures would have, in that it’s hard to forge a signature. No matter the message, without the private key, it’s hard to produce a signature such that verify(message, signature, public key) returns true. And of course, the verify function has the obvious correctness property that verify(message, sign(message, private key), public key) = true.

Key distribution

Potpourri

Keyboard remapping

Some software resources to get started on the topic:

Daemons

FUSE

Backups

APIs

Common command-line flags/patterns

Sometimes, you want to pass something that looks like a flag as a normal argument. For example, imagine you wanted to remove a file called -r. Or you want to run one program “through” another, like ssh machine foo, and you want to pass a flag to the “inner” program (foo). The special argument -- makes a program stop processing flags and options (things starting with -) in what follows, letting you pass things that look like flags without them being interpreted as such: rm -- -r or ssh machine --for-ssh -- foo --for-foo.

Window managers

VPNs

Markdown

Hammerspoon (desktop automation on macOS)

Booting + Live USBs

Docker, Vagrant, VMs, Cloud, OpenStack

Notebook programming

GitHub

Q&A