Home Page
Grasshopper
-
Getting Started - What is Linux? Get started with choosing a distribution and installation.
-
Command Line - Learn the fundamentals of the command line, navigating files, directories and more.
-
Text-Fu - Learn basic text manipulation and navigation.
-
Advanced Text-Fu - Navigate text like a Linux spider monkey with vim and emacs.
-
User Management - Learn about user roles and management.
-
Permissions - Learn about permission levels and modifying permissions.
-
Processes - Learn about the running processes on the system.
-
Packages - Learn all about the dpkg, apt-get, rpm and yum package management tools.
Journeyman
-
Devices - Learn about Linux devices and how they interact with the kernel and user space.
-
The Filesystem - Learn about the Linux filesystem, the different types of filesystems, partitioning and more.
-
Boot the System - Learn about the stages of the Linux boot process.
-
Kernel - The most important part of the Linux system, learn about how it works and how to configure it.
-
Init - Learn about the different init systems, SysV, Upstart and systemd.
-
Process Utilization - Learn resource monitoring with top, load averages, iostat and more!
-
Logging - Learn about system logs and the /var/log directory.
Networking Nomad
-
Network Sharing - Learn about network sharing with rsync, scp, nfs and more.
-
Network Basics - Learn about networking basics and the TCP/IP model.
-
Subnetting - Learn about subnets and how to do subnet arithmetic!
-
Routing - Learn how packets are routed across networks!
-
Network Config - Learn about network configuration using Linux tools!
-
Troubleshooting - Learn about common networking tools to help you diagnose and troubleshoot issues!
-
DNS - Everything and more that you wanted to know about DNS.
Getting started section lessons
-
linux-history
-
choosing-a-linux-distribution
-
debian
-
red-hat-enterprise-linux
-
ubuntu
-
fedora
-
linux-mint
-
gentoo
-
arch-linux
-
openSUSE
History
Lesson Content
Hey rookie! So you decided to dive into this wonderful world known as Linux? Well you better strap in, because it’s gonna be a long and hard road. My name is Penguin Pete and I’m here to guide you through this journey. Let’s get started with a little bit of backstory about Linux.
To learn about how Linux came to be, let’s go back to the beginning to 1969 where Ken Thompson and Dennis Ritchie of Bell Laboratories developed the UNIX operating system. It was later rewritten in C to make it more portable and eventually became a widely used operating system.
A decade or so later, Richard Stallman started working on the GNU (GNU is Not UNIX) project, the GNU kernel called Hurd, which unfortunately never came to completion. The GNU General Public License (GPL), a free software license, was also created as a result of this.
The kernel is the most important piece in the operating system. It allows the hardware to talk to the software. It also does a whole lot of other things, but we’ll dig into that in a different course. For now, just know that the kernel controls pretty much everything that happens on your system.
During this time other efforts such as BSD, MINIX, etc were developed to be UNIX like-systems. However, one thing that all these UNIX like-systems had in common was the lack of a unified kernel.
Then in 1991, a young fellow named Linus Torvalds started developing what we now know today as the Linux kernel.
Exercise
Additional reading:
Choosing a Linux Distribution
Lesson Content
In the previous lesson, we learned about the Linux kernel which powers millions of devices a day. One thing before we move forward, the term Linux is actually quite a misnomer, since it actually refers to the Linux kernel. However, many distributions use the Linux kernel so therefore are commonly known as Linux operating systems.
A Linux system is divided into three main parts:
- Hardware - This includes all the hardware that your system runs on as well as memory, CPU, disks, etc.
- Linux Kernel - As we discussed above, the kernel is the core of the operating system. It manages the hardware and tells it how to interact with the system.
- User Space - This is where users like yourself will be directly interacting with the system.
So the first step we’ll need to take is to install Linux on your machine. You have many options to choose from and this course will help inform you and get you started on choosing a Linux distribution.
There are many Linux distributions to choose from, we’ll just go over the most popular options.
Exercise
No exercises for this lesson.
Quiz Question
No questions, skip ahead!
Quiz Answer
Debian
Lesson Content
Overview Debian is an operating system composed entirely of free and open-source software. It’s widely known and has been in development for over 20 years. There are three branches that you can use, Stable, Testing and Unstable.
Stable is an overall good branch to be on. Testing and Unstable are rolling releases. This means that any incremental changes in those branches will eventually become Stable. For example, if you wanted to get to the next update from Windows 8 to Windows 10, you’ll have to do a complete Windows 10 installation. However being on the Testing release, you’ll automatically get updates until it becomes the next operating system release without having to do a full installation.
Package Management Debian also uses Debian package management tools. Every Linux distribution installs and manages packages differently and they use different package management tools. We’ll get more into this in a later course.
Configurability Debian may not get the latest updates, but it's extremely stable. If you want a good "core" operating system, this is the one for you.
Uses Debian is an overall great operating system for any platform.
Exercise
If you're interested in having Debian as your operating system, head over to the installation section and give it a try: https://www.debian.org/
Red Hat Enterprise Linux
Lesson Content
Overview Red Hat Enterprise Linux commonly referred to as RHEL is developed by Red Hat. RHEL has strict rules to restrict free re-distribution although it still provides source code for free.
Package Management RHEL uses a different package manager than Debian, RPM package manager, which we will eventually learn about as well.
Configurability RHEL-based operating systems will differ slightly from the Debian-based operating systems, most noticeably in package management. If you decide to go with RHEL it’s probably best if you know you’ll be working with it.
Uses As described by the name it's mostly used in enterprise, so if you need a solid server OS this would be a good one.
Exercise
If you're interested in having RHEL as your operating system, head over to the installation section and give it a try: https://www.redhat.com/rhel/
Quiz Questions
Ubuntu
Lesson Content
Overview One of the most popular Linux distributions for personal machines is Ubuntu. Ubuntu also releases its own desktop environment manager Unity by default.
Package Management Ubuntu is a Debian-based operating system developed by Canonical. So it uses a core Debian package management system.
Configurability Ubuntu is a great choice for a beginner who wants to get into Linux. Ubuntu offers ease of use and a great user interface experience that has led to its wide adoption. It’s widely used and supported and is most like other operating systems like OSX and Windows in terms of usability.
Uses Great for any platform, desktop, laptop and server.
Exercise
If you're interested in having Ubuntu as your operating system, head over to the installation section and give it a try: http://www.ubuntu.com/
Fedora
Lesson Content
Overview Backed by Red Hat, the Fedora Project is community driven containing open-source and free software. Red Hat Enterprise Linux branches off Fedora, so think of Fedora as an upstream RHEL operating system. Eventually RHEL will get updates from Fedora after thorough testing and quality assurance. Think of Fedora as an Ubuntu equivalent that uses a Red Hat backend instead of Debian.
Package Management Uses Red Hat package manager.
Configurability If you want to use a Red Hat based operating system, this is a user friendly version.
Uses Fedora is great if you want a Red Hat based operating system without the price tag. Recommended for desktop and laptop.
Exercise
If you're interested in having Fedora as your operating system, head over to the installation section and give it a try: https://getfedora.org/
Quiz Questions
What is RHEL branched off of?
Quiz Answer
Fedora
Linux Mint
Lesson Content
Overview Linux Mint is based off of Ubuntu. It uses Ubuntu’s software repositories so the same packages are available on both distributions. Linux Mint is preferred by others over Ubuntu because it doesn’t come with some of the proprietary software that Ubuntu includes such as Unity.
Package Management Since Linux Mint is Ubuntu based, it uses the Debian package manager.
Configurability Great user interface, great for beginners and less bloated than Ubuntu. In this course, I’ll be using Linux Mint, but any other distribution can be used.
Uses Great for desktop and laptop.
Exercise
If you're interested in having Linux Mint as your operating system, head over to the installation section and give it a try: http://linuxmint.com/
Quiz Questions
What is Linux Mint based off of?
Quiz Answer
Ubuntu
Gentoo
Lesson Content
Overview Gentoo offers ridiculous flexibility with the operating system at a price. It’s made for advanced users who don’t mind getting their hands dirty with the system.
Package Management Gentoo uses its own package management, Portage. The Portage package management is very modular and easy to maintain, which plays a big part in the operating system as a whole being very flexible.
Configurability If you’re just getting started with Linux and want to take a more difficult path, I’d choose Gentoo or Arch Linux as your distribution.
Uses Great for desktop and laptop.
Exercise
If you're interested in having Gentoo as your operating system, head over to the installation section and give it a try: https://www.gentoo.org/
Quiz Questions
What package management system does Gentoo use?
Quiz Answer
Portage
Arch Linux
Lesson Content
Overview Arch is a lightweight and flexible Linux distribution driven 100% by the community. Similar to Debian, Arch uses a rolling release model so incremental updates eventually become the Stable release. You really need to get your hands dirty to understand the system and its functions, but in turn you get complete and total control of your system.
Package Management It uses its own package manager, Pacman, to install, update and manage packages.
Configurability If you want a lightweight operating system and really want to understand Linux use Arch! There’s a bit of a learning curve, but for the hardcore Linux users, this is a great choice.
Uses Great for desktop and laptop. If you also have a small device such as a Raspberry Pi and need to stick a lightweight OS on it, you can’t go wrong with Arch.
Exercise
If you're interested in having Arch as your operating system, head over to the installation section and give it a try: https://www.archlinux.org/
Quiz Questions
What package manager does Arch Linux use?
Quiz Answer
Pacman
openSUSE
Lesson Content
Overview openSUSE Linux is created by the openSUSE Project. A community that promotes the use of Linux everywhere, working together in an open, transparent and friendly manner as part of the worldwide Free and Open Source Software community. openSUSE is the second oldest still running Linux Distributions and shares the base system with SUSE's award-winning SUSE Linux Enterprise products.
Package Management Uses RPM package manager.
Configurability openSUSE is a great choice for a new Linux user. It offers an easy to use graphical installer/administration application (YaST) and a tiday base system, easy to tinker with. openSUSE includes everything you need to enjoy the Internet worry free of viruses/spy-ware and to live out your creativity, be it with your photos, videos, music or code.
Uses openSUSE Leap is fully capable of being used on a desktop PC and laptop.
Exercise
If you're interested in having openSUSE as your operating system, head over to the download page and give it a try: software.opensuse.org
Quiz Questions
What is the name of openSUSE's Administration/Installation Tool?
Quiz Answer
yast
The Command Line
01-the-shell.md 02-print-working-directory-pwd-command.md 03-change-directory-cd-command.md 04-list-directories-ls-command.md 05-touch-command.md 06-file-command.md 07-cat-command.md 08-less-command.md 09-history-command.md 10-copy-cp-command.md 11-move-mv-command.md 12-make-directory-mkdir-command.md 13-remove-rm-command.md 14-find-command.md 15-help-command.md 16-man-command.md 17-whatis-command.md 18-alias-command.md 19-exit-command.md
The Shell
Lesson Content
The world is your oyster, or really the shell is your oyster. What is the shell? The shell is basically a program that takes your commands from the keyboard and sends them to the operating system to perform. If you’ve ever used a GUI, you’ve probably seen programs such as “Terminal” or “Console” these are just programs that launch a shell for you. Throughout this entire course we will be learning about the wonders of the shell.
In this course we will use the shell program bash (Bourne Again shell), almost all Linux distributions will default to the bash shell. There are other shells available such as ksh, zsh, tsch, but we won’t get into any of those.
Let’s jump right in! Depending on the distribution your shell prompt might change, but for the most part it should adhere to the following format:
username@hostname:current_directory pete@icebox:/home/pete $
Notice the $ at the end of the prompt? Different shells will have different prompts, in our case the $ is for a normal user using Bash, Bourne or Korn shell, you don't add the prompt symbol when you type the command, just know that it's there.
Let’s start with a simple command, echo. The echo command just prints out the text arguments to the display.
$ echo Hello World
Exercise
Try some other Linux commands and see what they output:
- $ date
- $ whoami
Quiz Question
What should be outputted to the display when you type echo Hello World?
Quiz Answer
Hello World
pwd (Print Working Directory)
Lesson Content
Everything in Linux is a file, as you journey deeper into Linux you’ll understand this, but for now just keep that in mind. Every file is organized in a hierarchical directory tree. The first directory in the filesystem is aptly named the root directory. The root directory has many folders and files which you can store more folders and files, etc. Here is an example of what the directory tree looks like:
/ |-- bin | |-- file1 | |-- file2 |-- etc | |-- file3 | `-- directory1 | |-- file4 | `-- file5 |-- home |-- var
The location of these files and directories are referred to as paths. If you had a folder named home with a folder inside of it named pete and another folder in that folder called Movies, that path would look like this: /home/pete/Movies, pretty simple huh?
Navigation of the filesystem, much like real life is helpful if you know where you are and where you are going. To see where you are, you can use the pwd command, this command means “print working directory” and it just shows you which directory you are in, note the path stems from the root directory.
$ pwd
Where are you? Where am I? Give it a try.
Exercise
No exercises for this lesson.
Quiz Question
How do I find what directory you are currently in?
Quiz Answer
pwd
cd (Change Directory)
Lesson Content
Now that you know where you are, let’s see if we can move around the filesystem a bit. Remember we’ll need to navigate our way using paths. There are two different ways to specify a path, with absolute and relative paths.
- Absolute path: This is the path from the root directory. The root is the head honcho. The root directory is commonly shown as a slash. Every time your path starts with / it means you are starting from the root directory. For example, /home/pete/Desktop.
- Relative path: This is the path from where you are currently in filesystem. If I was in location /home/pete/Documents and wanted to get to a directory inside Documents called taxes, I don’t have to specify the whole path from root like /home/pete/Documents/taxes, I can just go to taxes/ instead.
Now that you know how paths work, we just need something to help us change to the directory we want to. Luckily, we have cd or “change directory” to do that.
$ cd /home/pete/Pictures
So now I've changed my directory location to /home/pete/Pictures.
Now from this directory I have a folder inside called Hawaii, I can navigate to that folder with:
$ cd Hawaii
Notice how I just used the name of the folder? It’s because I was already in /home/pete/Pictures.
It can get pretty tiring navigating with absolute and relative paths all the time, luckily there are some shortcuts to help you out.
- . (current directory). This is the directory you are currently in.
- .. (previous directory). Takes you to the directory above your current.
- ~ (home directory). This directory defaults to your “home directory”. Such as /home/pete.
- - (previous directory). This will take you to the previous directory you were just at.
$ cd . $ cd .. $ cd ~ $ cd -
Give them a try!
Exercise
- Run the cd command without any flags, where does it take you?
Quiz Question
If you are in /home/pete/Pictures and wanted to go to /home/pete, what’s a good shortcut to use?
Quiz Answer
cd ..
list directory contents
touch
Lesson Content
Let’s learn how to make some files. A very simple way is to use the touch command. Touch allows you to the create new empty files.
$ touch mysuperduperfile
And boom, new file!
Touch is also used to change timestamps on existing files and directories. Give it a try, do an ls -l on a file and note the timestamp, then touch that file and it will update the timestamp.
There are many other ways to create files that involve other things like redirection and text editors, but we’ll get to that in the Text Manipulation course.
Exercise
- Create a new file
- Note the timestamp
- Touch the file and check the timestamp once again
Quiz Question
How do you create a file called myfile?
Quiz Answer
touch myfile
file
Lesson Content
In the previous lesson we learned about touch, let’s go back to that for a bit. Did you notice that the filename didn’t conform to standard naming like you’ve probably seen with other operating systems like Windows? Normally you would expect a file called banana.jpeg and expect a JPEG picture file.
In Linux, filenames aren’t required to represent the contents of the file. You can create a file called funny.gif that isn’t actually a GIF.
To find out what kind of file a file is, you can use the file command. It will show you a description of the file’s contents.
$ file banana.jpg
Exercise
Run the file command on a few different directories and files and note the output.
Quiz Question
What command can you use to find the file type of a file?
Quiz Answer
file
cat
Lesson Content
We’re almost done navigating files, but first let’s learn how to read a file. A simple command to use is the cat command, short for concatenate, it not only displays file contents but it can combine multiple files and show you the output of them.
$ cat dogfile birdfile
It’s not great for viewing large files and it’s only meant for short content. There are many other tools that we use to view larger text files that we’ll discuss in the next lesson.
Exercise
Run cat on different files and directories. Then try to cat multiple files.
Quiz Question
What's a good way to see the contents of a file?
Quiz Answer
cat
less
Lesson Content
If you are viewing text files larger than a simple output, less is more. (There is actually a command called more that does something similar, so this is ironic.) The text is displayed in a paged manner, so you can navigate through a text file page by page.
Go ahead and look at the contents of a file with less. Once you’re in the less command, you can actually use other keyboard commands to navigate in the file.
$ less /home/pete/Documents/text1
Use the following command to navigate through less:
- q - Used to quit out of less and go back to your shell.
- Page up, Page down, Up and Down - Navigate using the arrow keys and page keys.
- g - Moves to beginning of the text file.
- G - Moves to the end of the text file.
- /search - You can search for specific text inside the text document. Prefacing the words you want to search with /
- h - If you need a little help about how to use less while you’re in less, use help.
Exercise
Run less on a file, then page up and around the file. Try searching for a specific word. Quickly navigate to the beginning or the end of the file.
Quiz Question
How do you quit out of a less command?
Quiz Answer
q
history
Lesson Content
In your shell, there is a history of the commands that you previously entered, you can actually look through these commands. This is quite useful when you want to find and run a command you used previously without actually typing it again.
$ history
Want to run the same command you did before, just hit the up arrow.
Want to run the previous command without typing it again? Use !!. If you typed cat file1 and want to run it again, you can actually just go !! and it will run the last command you ran.
Another history shortcut is ctrl-R, this is the reverse search command, if you hit ctrl-R and you start typing parts of the command you want it will show you matches and you can just navigate through them by hitting the ctrl-R key again. Once you found the command you want to use again, just hit the Enter key.
Our terminal is getting a little cluttered no? Let’s do a little cleanup, use the clear command to clear up your display.
$ clear
There that looks better doesn’t it?
While we are talking about useful things, one of the most useful features in any command-line environment is tab completion. If you start typing the beginning of a command, file, directory, etc and hit the Tab key, it will autocomplete based on what it finds in the directory you are searching as long as you don’t have any other files that start with those letters. For example if you were trying to run the command chrome, you can type chr and press Tab and it will autocomplete chrome.
Exercise
Navigate through your previous command history with the Up and Down keys. Play around with ctrl-R reverse search.
Quiz Question
What is the command to clear the terminal?
Quiz Answer
clear
cp (Copy)
Lesson Content
Let’s start making some copies of these files. Much like copy and pasting files in other operating systems, the shell gives us an even simpler way of doing that.
$ cp mycoolfile /home/pete/Documents/cooldocs
mycoolfile is the file you want to copy and /home/pete/Documents/cooldocs is where you are copying the file to.
You can copy multiple files and directories as well as use wildcards. A wildcard is a character that can be substituted for a pattern based selection, giving you more flexibility with searches. You can use wildcards in every command for more flexibility.
- * the wildcard of wildcards, it's used to represent all single characters or any string.
- ? used to represent one character
- [] used to represent any character within the brackets
$ cp *.jpg /home/pete/Pictures
This will copy all files with the .jpg extension in your current directory to the Pictures directory.
A useful command is to use the -r flag, this will recursively copy the files and directories within a directory.
Try to do a cp on a directory that contains a couple of files to your Documents directory. Didn’t work did it? Well that’s because you’ll need to copy over the files and directories inside as well with -r command.
$ cp -r Pumpkin/ /home/pete/Documents
One thing to note, if you copy a file over to a directory that has the same filename, the file will be overwritten with whatever you are copying over. This is no bueno if you have a file that you don’t want to get accidentally overwritten. You can use the -i flag (interactive) to prompt you before overwriting a file.
$ cp -i mycoolfile /home/pete/Pictures
Exercise
Copy over a couple of files, be careful not to overwrite anything important.
Quiz Question
What flag do you need to specify to copy over a directory?
Quiz Answer
-r
mv (Move)
Lesson Content
Used for moving files and also renaming them. Quite similar to the cp command in terms of flags and functionality.
You can rename files like this:
$ mv oldfile newfile
Or you can actually move a file to a different directory:
$ mv file2 /home/pete/Documents
And move more than one file:
$ mv file_1 file_2 /somedirectory
You can rename directories as well:
$ mv directory1 directory2
Like cp, if you mv a file or directory it will overwrite anything in the same directory. So you can use the -i flag to prompt you before overwriting anything.
mv -i directory1 directory2
Let’s say you did want to mv a file to overwrite the previous one. You can also make a backup of that file and it will just rename the old version with a ~.
$ mv -b directory1 directory2
Exercise
Rename a file, then move that file to a different directory.
Quiz Question
How do you rename a file called cat to dog?
Quiz Answer
mv cat dog
mkdir (Make Directory)
Lesson Content
We’re gonna need some directories to store all these files we’ve been working on. The mkdir command (Make Directory) is useful for that, it will create a directory if it doesn’t already exist. You can even make multiple directories at the same time.
$ mkdir books paintings
You can also create subdirectories at the same time with the -p (parent flag).
$ mkdir -p books/hemmingway/favorites
Exercise
Make a couple of directories and move some files into that directory.
Quiz Question
What command is use to make a directory?
Quiz Answer
mkdir
rm (Remove)
Lesson Content
Now I think we have too many files, let’s remove some files. To remove files you can use the rm command. The rm (remove) command is used to delete files and directories.
$ rm file1
Take caution when using rm, there is no magical trash can that you can fish out removed files. Once they are gone, they are gone for good, so be careful.
Fortunately there are some safety measures put into place, so the average joe can’t just remove a bunch of important files. Write-protected files will prompt you for confirmation before deleting them. If a directory is write-protected it will also not be easily removed.
Now if you don’t care about any of that, you can absolutely remove a bunch of files.
$ rm -f file1
-f or force option tells rm to remove all files, whether they are write protected or not, without prompting the user (as long as you have the appropriate permissions).
$ rm -i file
Adding the -i flag like many of the other commands, will give you a prompt on whether you want to actually remove the files or directories.
$ rm -r directory
You can’t just rm a directory by default, you’ll need to add the -r flag (recursive) to remove all the files and any subdirectories it may have.
You can remove a directory with the rmdir command.
$ rmdir directory
Exercise
- Create a file called -file (don't forget the dash!).
- Remove that file.
Quiz Question
How do you remove a file called myfile?
Quiz Answer
find
Lesson Content
With all these files we have on the system it can get a little hectic trying to find a specific one. Well there’s a command we can use for that, find!
$ find /home -name puppies.jpg
With find you’ll have to specify the directory you’ll be searching it, what you’re searching for, in this case we are trying to find a file by the name of puppies.jpg.
You can specify what type of file you are trying to find.
$ find /home -type d -name MyFolder
You can see that I set the type of file I’m trying to find as (d) for directory and I’m still searching by the name of MyFolder.
One cool thing to note is that find doesn’t stop at the directory you are searching, it will look inside any subdirectories that directory may have as well.
Exercise
- Find a file from the root directory that has the word net in it.
Quiz Question
What option should I specify for find if I want to search by name?
Quiz Answer
-name
help
Lesson Content
Linux has some great built-in tools to help you how to use a command or check what flags are available for a command. One tool, help, is a built-in bash command that provides help for other bash commands (echo, logout, pwd, etc).
$ help echo
This will give you a description and the options you can use when you want to run echo. For other executable programs, it’s convention to have an option called --help or something similar.
$ echo --help
Not all developers who ship out executables will conform to this standard, but it’s probably your best shot to find some help on a program.
Exercise
Run help on the echo command, logout command and pwd command.
Quiz Question
How do you get quick command line help for built-in bash commands?
Quiz Answer
help
man
Lesson Content
Gee I wish some of these programs had a manual so we can see some more information about them. Well luckily they do! Aptly named man pages, you can see the manuals for a command with the man command.
$ man ls
Man pages are manuals that are by default built into most Linux operating systems. They provide documentation about commands and other aspects of the system.
Try it out on a few commands to get more information about them.
Exercise
Run the man command on the ls command.
Quiz Question
How do you see the manuals for a command?
Quiz Answer
man
whatis
Lesson Content
Whew, we’ve learned quite a bit of commands so far, if you are ever feeling doubtful about what a command does, you can use the whatis command. The whatis command provides a brief description of command line programs.
$ whatis cat
The description gets sourced from the manual page of each command. If you ran whatis cat, you’d see there is a small blurb with a short description.
Exercise
Run the whatis command on the less command.
Quiz Question
What command can you use to see a small description of a command?
Quiz Answer
whatis
alias
Lesson Content
Sometimes typing commands can get really repetitive, or if you need to type a long command many times, it’s best to have an alias you can use for that. To create an alias for a command you simply specify an alias name and set it to the command.
$ alias foobar='ls -la'
Now instead of typing ls -la, you can type foobar and it will execute that command, pretty neat stuff. Keep in mind that this command won't save your alias after reboot, so you'll need to add a permanent alias in:
~/.bashrc
or similar files if you want to have it persist after reboot.
You can remove aliases with the unalias command:
$ unalias foobar
Exercise
Create a couple of aliases then remove them.
Quiz Question
What command is used to make an alias?
Quiz Answer
alias
exit
Lesson Content
Well, you sure did a good job getting through the basics. We’ve only scratched the surface, now that you’ve learned to crawl, in the next set of courses, I’m gonna teach how to walk.
For now, you can pat yourself on the back and take a break. To exit from the shell, you can use the exit command
$ exit
Or the logout command:
$ logout
Or if you are working out of a terminal GUI, you can just close the terminal, see you in the next course!
Exercise
Exit out of the shell and see what happens. Make sure you don't need to do anymore work in that shell.
Quiz Question
How can you exit from the shell?
Quiz Answer
exit
Text fu
stdout (Standard Out)
Lesson Content
By now, we've become familiar with many commands and their output and that brings us to our next subject I/O (input/output) streams. Let's run the following command and we'll discuss how this works.
$ echo Hello World > peanuts.txt
What just happened? Well check the directory where you ran that command and lo and behold you should see a file called peanuts.txt, look inside that file and you should see the text Hello World. Lots of things just happened in one command so let's break it down.
First let's break down the first part:
$ echo Hello World
We know this prints out Hello World to the screen, but how? Processes use I/O streams to receive input and return output. By default the echo command takes the input (standard input or stdin) from the keyboard and returns the output (standard output or stdout) to the screen. So that's why when you type echo Hello World in your shell, you get Hello World on the screen. However, I/O redirection allows us to change this default behavior giving us greater file flexibility.
Let's proceed to the next part of the command:
>
The > is a redirection operator that allows us the change where standard output goes. It allows us to send the output of echo Hello World to a file instead of the screen. If the file does not already exist it will create it for us. However, if it does exist it will overwrite it (you can add a shell flag to prevent this depending on what shell you are using).
And that's basically how stdout redirection works!
Well let's say I didn't want to overwrite my peanuts.txt, luckily there is a redirection operator for that as well, >>:
$ echo Hello World >> peanuts.txt
This will append Hello World to the end of the peanuts.txt file, if the file doesn't already exist it will create it for us like it did with the > redirector!
Exercise
Try a couple of commands:
$ ls -l /var/log > myoutput.txt $ echo Hello World > rm $ > somefile.txt
Quiz Question
What redirector do you use to append output to a file?
Quiz Answer
stdin (Standard In)
Lesson Content
In our previous lesson we learned that we have different stdout streams we can use, such as a file or the screen. Well there are also different standard input (stdin) streams we can use as well. We know that we have stdin from devices like the keyboard, but we can use files, output from other processes and the terminal as well, let's see an example.
Let's use the peanuts.txt file in the previous lesson for this example, remember it had the text Hello World in it.
$ cat < peanuts.txt > banana.txt
Just like we had > for stdout redirection, we can use < for stdin redirection.
Normally in the cat command, you send a file to it and that file becomes the stdin, in this case, we redirected peanuts.txt to be our stdin. Then the output of cat peanuts.txt which would be Hello World gets redirected to another file called banana.txt.
Exercise
Try out a couple of commands:
$ echo < peanuts.txt > banana.txt $ ls < peanuts.txt > banana.txt $ pwd < peanuts.txt > banana.txt
Quiz Question
What redirector do you use to redirect stdin?
Quiz Answer
<
stderr (Standard Error)
Lesson Content
Let's try something a little different now, let's try to list the contents of a directory that doesn't exist on your system and redirect the output to the peanuts.txt file again.
$ ls /fake/directory > peanuts.txt
What you should see is:
ls: cannot access /fake/directory: No such file or directory
Now you're probably thinking, shouldn't that message have been sent to the file? There is actually another I/O stream in play here called standard error (stderr). By default, stderr sends its output to the screen as well, it's a completely different stream than stdout. So you'll need to redirect its output a different way.
Unfortunately the redirector is not as nice as using < or > but it's pretty close. We will have to use file descriptors. A file descriptor is a non-negative number that is used to access a file or stream. We will go in depth about this later, but for now know that the file descriptor for stdin, stdout and stderr is 0, 1, and 2 respectively.
So now if we want to redirect our stderr to the file we can do this:
$ ls /fake/directory 2> peanuts.txt
You should see just the stderr messages in peanuts.txt.
Now what if I wanted to see both stderr and stdout in the peanuts.txt file? It's possible to do this with file descriptors as well:
$ ls /fake/directory > peanuts.txt 2>&1
This sends the results of ls /fake/directory to the peanuts.txt file and then it redirects stderr to the stdout via 2>&1. The order of operations here matters, 2>&1 sends stderr to whatever stdout is pointing to. In this case stdout is pointing to a file, so 2>&1 also sends stderr to a file. So if you open up that peanuts.txt file you should see both stderr and stdout. In our case, the above command only outputs stderr.
There is a shorter way to redirect both stdout and stderr to a file:
$ ls /fake/directory &> peanuts.txt
Now what if I don't want any of that cruft and want to get rid of stderr messages completely? Well you can also redirect output to a special file call /dev/null and it will discard any input.
$ ls /fake/directory 2> /dev/null
Exercise
What is the following command doing?
$ ls /fake/directory >> /dev/null 2>&1
Quiz Question
What is the redirector for stderr?
Quiz Answer
2>
pipe and tee
Lesson Content
Let's get into some plumbing now, not really but kinda. Let's try a command:
$ ls -la /etc
You should see a very long list of items, it's a little hard to read actually. Instead of redirecting this output to a file, wouldn't it be nice if we could just see the output in another command like less? Well we can!
$ ls -la /etc | less
The pipe operator |, represented by a vertical bar, allows us to get the stdout of a command and make that the stdin to another process. In this case, we took the stdout of ls -la /etc and then piped it to the less command. The pipe command is extremely useful and we will continue to use it for all eternity.
Well what if I wanted to write the output of my command to two different streams? That's possible with the tee command:
$ ls | tee peanuts.txt
You should see the output of ls on your screen and if you open up the peanuts.txt file you should see the same information!
Exercise
Try the following commands:
$ ls | tee peanuts.txt banan.txt
Quiz Question
What key represents the pipe operator?
Quiz Answer
env (Environment)
Lesson Content
Run the following command:
$ echo $HOME
You should see the path to your home directory, mine looks like /home/pete.
What about this command?
$ echo $USER
You should see your username!
Where is this information coming from? It's coming from your environment variables. You can view these by typing
$ env
This outputs a whole lot of information about the environment variables you currently have set. These variables contain useful information that the shell and other processes can use.
Here is a short example:
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/bin PWD=/home/user USER=pete
One particularly important variable is the PATH Variable. You can access these variables by sticking a $ infront of the variable name like so:
$ echo $PATH /usr/local/sbin:/usr/local/bin:/usr/sbin:/bin
This returns a list of paths separated by a colon that your system searches when it runs a command. Let's say you manually download and install a package from the internet and put it in to a non standard directory and want to run that command, you type $ coolcommand and the prompt says command not found. Well that's silly you are looking at the binary in a folder and know it exists. What is happening is that $PATH variable doesn't check that directory for this binary so it's throwing an error.
Let's say you had tons of binaries you wanted to run out of that directory, you can just modify the PATH variable to include that directory in your PATH environment variable.
Exercise
What does the following output? Why?
$ echo $HOME
Quiz Question
How do you see your environment variables?
Quiz Answer
env
cut
Lesson Content
We're gonna learn a couple of useful commands that you can use to process text. Before we get started, let's create a file that we'll be working with. Copy and paste the following command, once you do that add a TAB in between lazy and dog (hold down Ctrl-v + TAB).
$ echo 'The quick brown; fox jumps over the lazy dog' > sample.txt
First command we'll be learning about is the cut command. It extracts portions of text from a file.
To extract contents by a list of characters:
$ cut -c 5 sample.txt
This outputs the 5th character in each line of the file. In this case it is "q", note that the space also counts as a character.
To extract the contents by a field, we'll need to do a little modification:
$ cut -f 2 sample.txt
The -f or field flag cuts text based off of fields, by default it uses TABs as delimiters, so everything separated by a TAB is considered a field. You should see "dog" as your output.
You can combine the field flag with the delimiter flag to extract the contents by a custom delimiter:
$ cut -f 1 -d ";" sample.txt
This will change the TAB delimiter to a ";" delimiter and since we are cutting the first field, the result should be "The quick brown".
Exercise
What does the following command do? Why?
$ cut -c 5-10 sample.txt $ cut -c 5- sample.txt $ cut -c -5 sample.txt
Quiz Question
What command would you use to get the first character of every line in a file?
Quiz Answer
cut -c 1
paste
Lesson Content
The paste command is similar to the cat command, it merges lines together in a file. Let's create a new file with the following contents:
sample2.txt The quick brown fox
Let's combine all these lines into one line:
$ paste -s sample2.txt
The default delimiter for paste is TAB, so now there is one line with TABs separating each word.
Let's change this delimiter (-d) to something a little more readable:
$ paste -d ' ' -s sample2.txt
Now everything should be on one line delimited by spaces.
Exercise
Try to paste multiple files together, what happens?
Quiz Question
What flag do you use with paste to make everything go on one line?
Quiz Answer
-s
head
Lesson Content
Let's say we have a very long file, in fact we have many to choose from, go ahead and cat /var/log/syslog. You should see pages upon pages of text. What if I just wanted to see the first couple of lines in this text file? Well we can do that with the head command, by default the head command will show you the first 10 lines in a file.
$ head /var/log/syslog
You can also modify the line count to whatever you choose, let's say I wanted to see the first 15 lines instead.
$ head -n 15 /var/log/syslog
The -n flag stands for number of lines.
Exercise
What does the following command do and why?
$ head -c 15 /var/log/syslog
Quiz Question
What flag would you use to change the number of lines you want to view for the head command?
Quiz Answer
-n
tail
Lesson Content
Similar to the head command, the tail command lets you see the last 10 lines of a file by default.
$ tail /var/log/syslog
Along with head you can change the number of lines you want to see.
$ tail -n 10 /var/log/syslog
Another great option you can use is the -f (follow) flag, this will follow the file as it grows. Give it a try and see what happens.
$ tail -f /var/log/syslog
Your syslog file will be continually changing while you interact with your system and using tail -f you can see everything that is getting added to that file.
Exercise
Look at the man page of tail and read some of the other commands we didn't discuss.
$ man tail
Quiz Question
What is the flag used to follow a file in tail?
Quiz Answer
-f
expand and unexpand
Lesson Content
In our lesson on the cut command, we had our sample.txt file that contained a tab. Normally TABs would usually show a noticeable difference but some text files don't show that well enough. Having TABs in a text file may not be the desired spacing you want. To change your TABs to spaces, use the expand command.
$ expand sample.txt
The command above will print output with each TAB converted into a group of spaces. To save this output in a file, use output redirection like below.
$ expand sample.txt > result.txt
Opposite to expand, we can convert back each group of spaces to a TAB with the unexpand command:
$ unexpand -a result.txt
Exercise
What happens if you just type expand with no file input?
Quiz Question
What command is used to convert TABs to spaces?
Quiz Answer
expand
sort
Lesson Content
The sort command is useful for sorting lines.
file1.txt dog cow cat elephant bird $ sort file1.txt bird cat cow dog elephant
You can also do a reverse sort:
$ sort -r file1.txt elephant dog cow cat bird
And also sort via numerical value:
$ sort -n file1.txt bird cat cow elephant dog
Exercise
The real power of sort comes with its ability to be combined with other commands, try the following command and see what happens?
$ ls /etc | sort -rn
Quiz Question
What flag do you use to do a reverse sort?
Quiz Answer
-r
tr (Translate)
Lesson Content
The tr (translate) command allows you to translate a set of characters into another set of characters. Let's try an example of translating all lower case characters to uppercase characters.
$ tr a-z A-Z hello HELLO
As you can see we made the ranges of a-z into A-Z and all text we type that is lowercase gets uppercased.
Exercise
Try the following command what happens?
$ tr -d ello hello
Quiz Question
What command is used to translate characters?
Quiz Answer
tr
uniq (Unique)
Lesson Content
The uniq (unique) command is another useful tool for parsing text.
Let's say you had a file with lots of duplicates:
reading.txt book book paper paper article article magazine
And you wanted to remove the duplicates, well you can use the uniq command:
$ uniq reading.txt book paper article magazine
Let's get the count of how many occurrences of a line:
$ uniq -c reading.txt 2 book 2 paper 2 article 1 magazine
Let's just get unique values:
$ uniq -u reading.txt magazine
Let's just get duplicate values:
$ uniq -d reading.txt book paper article
Note : uniq does not detect duplicate lines unless they are adjacent. For eg:
Let's say you had a file with duplicates which are not adjacent:
reading.txt book paper book paper article magazine article
$ uniq reading.txt reading.txt book paper book paper article magazine article
The result returned by uniq will contain all the entries unlike the very first example.
To overcome this limitation of uniq we can use sort in combination with uniq:
$ sort reading.txt | uniq article book magazine paper
Exercise
What result would you get if you tried uniq -uc?
Quiz Question
What command would you use to remove duplicates in a file?
Quiz Answer
uniq
wc and nl
Lesson Content
The wc (word count) command shows the total count of words in a file.
$ wc /etc/passwd 96 265 5925 /etc/passwd
It display the number of lines, number of words and number of bytes, respectively.
To just see just the count of a certain field, use the -l, -w, or -c respectively.
$ wc -l /etc/passwd 96
Another command you can use to check the count of lines on a file is the nl (number lines) command.
file1.txt i like turtles
$ nl file1.txt 1. i 2. like 3. turtles
Exercise
How would you get the total count of lines by using the nl file without searching through the entire output? Hint: Use some of the other commands you learned in this course.
Quiz Question
What command would you use to get the total number of words in a file and just the words?
Quiz Answer
wc -w
env (Environment)
Lesson Content
Run the following command:
$ echo $HOME
You should see the path to your home directory, mine looks like /home/pete.
What about this command?
$ echo $USER
You should see your username!
Where is this information coming from? It's coming from your environment variables. You can view these by typing
$ env
This outputs a whole lot of information about the environment variables you currently have set. These variables contain useful information that the shell and other processes can use.
Here is a short example:
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/bin PWD=/home/user USER=pete
One particularly important variable is the PATH Variable. You can access these variables by sticking a $ infront of the variable name like so:
$ echo $PATH /usr/local/sbin:/usr/local/bin:/usr/sbin:/bin
This returns a list of paths separated by a colon that your system searches when it runs a command. Let's say you manually download and install a package from the internet and put it in to a non standard directory and want to run that command, you type $ coolcommand and the prompt says command not found. Well that's silly you are looking at the binary in a folder and know it exists. What is happening is that $PATH variable doesn't check that directory for this binary so it's throwing an error.
Let's say you had tons of binaries you wanted to run out of that directory, you can just modify the PATH variable to include that directory in your PATH environment variable.
Exercise
What does the following output? Why?
$ echo $HOME
Quiz Question
How do you see your environment variables?
Quiz Answer
env
Text fu advanced
Text Editors
Lesson Content
If you get a couple of diehard Linux users in a room and ask them what is the best text editor to use, you'll hear a never ending banter about the godliness of either vim or emacs. Don't even try to bring up using a GUI editor if you value your life.
Vim and emacs are popular text editors that are installed by default on most Linux distributions and they both have their pros and cons. If you want to get around your system like a ninja, you'll need to pick up one of these text editors to use. They are essentially coding, word document processing and basically all in one editors.
Exercise
Take a little tour of vim and emacs:
Quiz Question
No questions move along!
Quiz Answer
Vim (Vi Improved)
Lesson Content
Vim stands for vi (Improved) just like its name it stands for an improved version of the vi text editor command.
It's super lightweight, opening and editing a file with vim is quick and easy. It's also almost always available, if you booted up a random Linux distribution, chances are vim is installed by default.
To fire up vim just type:
vim
Exercise
No exercises for this lesson.
Quiz Question
No questions move along!
Quiz Answer
Vim Search Patterns
Lesson Content
To search for an expression just type the / key and then your search result while you are in a vim session. Once you hit enter, you can press "n" to go forward or "N" to go backward in your search results.
My pretty file is very pretty. /pretty will find the pretty words in the text file.
The ? search command will search the text file backwards, so in the previous example, the last pretty would come up first.
My pretty file is very pretty. ?pretty will find the pretty words in the text file.
Exercise
Play with the search key, open a text file in vim with: vim [textfile] and start searching!
Quiz Question
What key is used to search in vim?
Quiz Answer
/
Vim Navigation
Lesson Content
Now you may notice, the mouse is nowhere is use here. To navigate a text document in vim, use the following keys:
- h or the left arrow - will move you left one character
- j or the up arrow - will move you up one line
- k or the down arrow - will move you down one line
- l or the right arrow - will move you right one character
Exercise
No exercises for this lesson.
Quiz Question
What letter is used to move down?
Quiz Answer
k
Title
Lesson Content
Exercise
Quiz Question
Quiz Answer
Title
Lesson Content
Exercise
Quiz Question
Quiz Answer
Vim Saving and Exiting
Lesson Content
Now that you've done your editing it's time to actually save and quit out of vim:
- :w - writes or saves the file
- :q - quit out of vim
- :wq - write and then quit
- :q! - quit out of vim without saving the file
- ZZ - equivalent of :wq, but one character faster
- u - undo your last action
- Ctrl-r - redo your last action
You may not think ZZ is necessary, but you'll eventually see that your fingers may tend to lean towards this rather than :wq.
Whew that was a lot of information to take about Vim. Now that you know some basic commands and navigation, you can start editing some text files. There are many more options you can use in vim to increase your ability to master this text editor, head on to Vim's online guide to take a look.
Exercise
No exercises for this lesson.
Quiz Question
How do you quit out of vim without saving?
Quiz Answer
:q!
Emacs
Lesson Content
Emacs is for users who want an extremely powerful text editor, which may be an understatement because you essentially live in emacs. You can do all your code editing, file manipulation, etc all within emacs. It's a bit slower to load up and the learning curve is a bit steeper than vim, but if you want a powerful editor that is extremely extensible, this is the one for you. When I say extensible, I literally mean you can write up scripts for emacs that extend its functionality.
To start emacs just use:
emacs
You should be greeted with the default welcome buffer.
Buffers in emacs is what your text resides in. So if you open up a file, a buffer is used to store that file's content. You can have multiple buffers open at the same time and you can easily switch between buffers.
Exercise
No exercises for this lesson.
Quiz Question
No questions move along!
Quiz Answer
Emacs Buffer Navigation
Lesson Content
To move around buffers (or files you're visiting) use the following commands:
Switch buffers
C-x b - switch buffer C-x right arrow - right-cycle through buffer C-x left arrow - left-cycle through buffer
Close the buffer
C-x k
Split the current buffer
C-x 2
This allows you see multiple buffers on one screen. To move between these buffers use: C-x o
Set a single buffer as the current screen
C-x 1
If you ever used a terminal multiplexer like screen and tmux, the buffer commands will feel very familiar.
Exercise
Play around with buffers.
Quiz Question
How do you kill a buffer?
Quiz Answer
C-x k
Emacs Editing
Lesson Content
Text Navigation
C-up arrow : move up one paragraph C-down arrow: move down one paragraph C-left arrow: move one word left C-right arrow: move one word right M-> : move to the end of the buffer
With text navigation, your regular text buttons work as they should, home, end, page up, page down and the arrow keys, etc.
Cutting and Pasting
To cut (kill) or paste (yank) in Emacs you'll need to be able to select text first. To select text, move your cursor to where you want to cut or paste and hit
C-space keythen you can use the navigation keys to select the text you want. Now you can do the cut and paste like so:
C-w : cut C-y : yank
Exercise
Play around with text navigation.
Quiz Question
How do you move to the end of the buffer?
Quiz Answer
M->
Emacs Exiting and Help
Lesson Content
To close out of emacs
C-x C-c
If you have any open buffers, it will ask you to save it before closing out of emacs.
Confused?
C-h C-h : help menu
Undo
C-x u
As you can see Emacs has more moving parts, so the learning curve is a little steeper. In exchange though, you get a very powerful text editor.
Exercise
Visit the Emacs site to learn about more commands. Emacs
Quiz Question
How do you access the help menu?
Quiz Answer
C-h C-h
User Management
Users and Groups
Lesson Content
In any traditional operating system, there are users and groups. They exist solely for access and permissions. When running a process, it will run as the owner of that process whether that is Jane or Bob. File access and ownership is also permission dependent. You wouldn't want Jane to see Bob's documents and vice versa.
Each user has their own home directory where their user specific files get stored, this is usually located in /home/username, but can vary in different distributions.
The system uses user ids (UID) to manage users, usernames are the friendly way to associate users with identification, but the system identifies users by their UID. The system also uses groups to manage permissions, groups are just sets of users with permission set by that group, they are identified by the system with their group ID (GID).
In Linux, you'll have users in addition to the normal humans that use the system. Sometimes these users are system daemons that continuously run processes to keep the system functioning. One of the most important users is root or superuser, root is the most powerful user on the system, root can access any file and start and terminate any process. For that reason, it can be dangerous to operate as root all the time, you could potentially remove system critical files. Luckily, if root access is needed and a user has root access, they can run a command as root instead with the sudo command. The sudo command (superuser do) is used to run a command with root access, we'll go more in depth on how a user receives root access in a later lesson.
Go ahead and try to view a protected file like /etc/shadow:
$ cat /etc/shadow
Notice how you get a permission denied error, look at the permissions with:
$ ls -la /etc/shadow -rw-r----- 1 root shadow 1134 Dec 1 11:45 /etc/shadow
We haven't gone through permissions yet, but what's happening here is that root is the owner of the file and you'll need root access or be part of the shadow group to read the contents. Now run the command with sudo:
$ sudo cat /etc/shadow
Now you'll be able to see the contents of the file!
Exercise
No exercises for this lesson.
Quiz Question
What command do you use to run as root?
Quiz Answer
sudo
root
Lesson Content
We've looked at one way to get superuser access using the sudo command. You can also run commands as the superuser with the su command. This command will "substitute users" and open a root shell if no username is specified. You can use this command to substitute to any user as long as you know the password.
$ su
There are some downsides to using this method: it's much easier to make a critical mistake running everything in root, you won't have records of the commands you use to change system configurations, etc. Basically, if you need to run commands as the superuser, just stick to sudo.
Now that you know what commands to run as the superuser, the question is how do you know who has access to do that? The system doesn't let every single Joe Schmoe run commands as the superuser, so how does it know? There is a file called the /etc/sudoers file, this file lists users who can run sudo. You can edit this file with the visudo command.
Exercise
Open up the /etc/sudoers file and see what superuser permissions other users on the machine have.
Quiz Question
What file shows the users who have access to sudo?
Quiz Answer
/etc/sudoers
/etc/passwd
Lesson Content
Remember that usernames aren't really identifications for users. The system uses a user ID (UID) to identify a user. To find out what users are mapped to what ID, look at the /etc/passwd file.
$ cat /etc/passwd
This file shows you a list of users and detailed information about them. For example, the first line in this file most likely looks like this:
root:x:0:0:root:/root:/bin/bash
Each line displays user information for one user, most commonly you'll see the root user as the first line. There are many fields separated by colons that tell you additional information about the user, let's look at them all:
- Username
- User's password - the password is not really stored in this file, it's usually stored in the /etc/shadow file. We'll discuss more in the next lesson about /etc/shadow, but for now, know that it contains encrypted user passwords. You can see many different symbols that are in this field, if you see an "x" that means the password is stored in the /etc/shadow file, a "*" means the user doesn't have login access and if there is a blank field that means the user doesn't have a password.
- The user ID - as you can see root has the UID of 0
- The group ID
- GECOS field - This is used to generally leave comments about the user or account such as their real name or phone number, it is comma delimited.
- User's home directory
- User's shell - you'll probably see a lot of user's defaulting to bash for their shell
Normally in a user's setting page, you would expect you see just human users. However, you'll notice /etc/passwd contains other users. Remember that users are really only on the system to run processes with different permissions. Sometimes we want to run processes with pre-determined permissions. For example, the daemon user is used for daemon processes.
Also should note that you can edit the /etc/passwd file by hand if you want to add users and modify information with the vipw tool, however things like these are best left to the tools we will discuss in a later lesson such as useradd and userdel.
Exercise
Look at your /etc/passwd file, take a look at some of the users and note the access they have.
Quiz Question
If a user doesn't have login access how is that denoted in /etc/passwd?
Quiz Answer
/etc/shadow
Lesson Content
The /etc/shadow file is used to store information about user authentication. It requires superuser read permissions.
$ sudo cat /etc/shadow root:MyEPTEa$6Nonsense:15000:0:99999:7:::
You'll notice that it looks very similar to the contents of /etc/passwd, however in the password field you'll see an encrypted password. The fields are separated by colons as followed:
- Username
- Encrypted password
- Date of last password changed - expressed as the number of days since Jan 1, 1970. If there is a 0 that means the user should change their password the next time they login
- Minimum password age - Days that a user will have to wait before being able to change their password again
- Maximum password age - Maximum number of days before a user has to change their password
- Password warning period - Number of days before a password is going to expire
- Password inactivity period - Number of days after a password has expired to allow login with their password
- Account expiration date - date that user will not be able to login
- Reserved field for future use
In most distributions today, user authentication doesn't rely on just the /etc/shadow file, there are other mechanisms in place such as PAM (Pluggable Authentication Modules) that replace authentication.
Exercise
Take a look at the /etc/shadow file
Quiz Question
No questions move along!
Quiz Answer
/etc/group
Lesson Content
Another file that is used in user management is the /etc/group file. This file allows for different groups with different permissions.
$ cat /etc/group root:*:0:pete
Very similar to the /etc/password field, the /etc/group fields are as follows:
- Group name
- Group password - there isn't a need to set a group password, using an elevated privilege like sudo is standard. A "*" will be put in place as the default value.
- Group ID (GID)
- List of users - you can manually specify users you want in a specific group
Exercise
Run the command groups. What do you see?
Quiz Question
What is the GID of root?
Quiz Answer
0
User Management Tools
Lesson Content
Most enterprise environments are using management systems to manage users, accounts and passwords. However, on a single machine computer there are useful commands to run to manage users.
Adding Users
You can use the adduser or the useradd command. The adduser command contains more helpful features such as making a home directory and more. There are configuration files for adding new users that can be customized depending on what you want to allocate to a default user.
$ sudo useradd bob
You'll see that the above command creates an entry in /etc/passwd for bob, sets up default groups and adds an entry to the /etc/shadow file.
Removing Users
To remove a user, you can use the userdel command.
$ sudo userdel bob
This basically does its best to undo the file changes by useradd.
Changing Passwords
$ passwd bob
This will allow you to change the password of yourself or another user (if you are root).
Exercise
Create a new user then change their password and login as the new user.
Quiz Question
What command is used to change a password?
Quiz Answer
passwd
Access
File Permissions
Lesson Content
As we learned previously, files have different permissions or file modes. Let's look at an example:
$ ls -l Desktop/ drwxr-xr-x 2 pete penguins 4096 Dec 1 11:45 .
There are four parts to a file's permissions. The first part is the filetype, which is denoted by the first character in the permissions, in our case since we are looking at a directory it shows d for the filetype. Most commonly you will see a - for a regular file.
The next three parts of the file mode are the actual permissions. The permissions are grouped into 3 bits each. The first 3 bits are user permissions, then group permissions and then other permissions. I've added the pipe to make it easier to differentiate.
d | rwx | r-x | r-x
Each character represent a different permission:
- r: readable
- w: writable
- x: executable (basically an executable program)
- -: empty
So in the above example, we see that the user pete has read, write and execute permissions on the file. The group penguins has read and execute permissions. And finally, the other users (everyone else) has read and execute permissions.
Exercise
Use the ls -l command on multiple files and recite their permissions, user and group.
Quiz Question
What permission bit is used for executable?
Quiz Answer
x
Modifying Permissions
Lesson Content
Changing permissions can easily be done with the chmod command.
First, pick which permission set you want to change, user, group or other. You can add or remove permissions with a + or -, let's look at some examples.
Adding permission bit on a file
$ chmod u+x myfile
The above command reads like this: change permission on myfile by adding executable permission bit on the user set. So now the user has executable permission on this file!
Removing permission bit on a file
$ chmod u-x myfile
Adding multiple permission bits on a file
$ chmod ug+w
There is another way to change permissions using numerical format. This method allows you to change permissions all at once. Instead of using r, w, or x to represent permissions, you'll use a numerical representation for a single permission set. So no need to specify the group with g or the user with u.
The numerical representations are seen below:
- 4: read permission
- 2: write permission
- 1: execute permission
Let's look at an example:
$ chmod 755 myfile
Can you guess what permissions we are giving this file? Let's break this down, so now 755 covers the permissions for all sets. The first number (7) represents user permissions, the second number (5) represents group permissions and the last 5 represents other permissions.
Wait a minute, 7 and 5 weren't listed above, where are we getting these numbers? Remember we are combining all the permissions into one number now, so you'll have to get some math involved.
7 = 4 + 2 + 1, so 7 is the user permissions and it has read, write and execute permissions
5 = 4 + 1, the group has read and execute permissions
5 = 4 +1, and all other users have read and execute permissions
One thing to note: it's not a great idea to be changing permissions nilly willy, you could potentially expose a sensitive file for everyone to modify, however many times you legitimately want to change permissions, just take precaution when using the chmod command.
Exercise
Change some basic text file permissions and see the bits changing as you do an ls -l.
Quiz Question
What number represents the read permission when using numerical format?
Quiz Answer
4
Ownership Permissions
Lesson Content
In addition to modifying permissions on files, you can also modify the group and user ownership of the file as well.
Modify user ownership
$ sudo chown patty myfile
This command will set the owner of myfile to patty.
Modify group ownership
$ sudo chgrp whales myfile
This command will set the group of myfile to whales.
Modify both user and group ownership at the same time If you add a colon and groupname after the user you can set both the user and group at the same time.
$ sudo chown patty:whales myfile
Exercise
Modify the group and user of some test files. Afterwards take a look at the permissions with ls -l.
Quiz Question
What command do you use to change user ownership?
Quiz Answer
chown
unmask
Setuid
Lesson Content
There are many cases in which normal users need elevated access to do stuff. The system administrator can't always be there to enter in a root password every time a user needed access to a protected file, so there are special file permission bits to allow this behavior. The Set User ID (SUID) allows a user to run a program as the owner of the program file rather than as themselves.
Let's look at an example:
Let's say I want to change my password, simple right? I just use the passwd command:
$ passwd
What is the password command doing? It's modifying a couple of files, but most importantly it's modifying the /etc/shadow file. Let's look at that file for a second:
$ ls -l /etc/shadow -rw-r----- 1 root shadow 1134 Dec 1 11:45 /etc/shadow
Oh wait a minute here, this file is owned by root? How is it possible that we are able to modify a file owned by root?
Let's look at another permission set, this time of the command we ran:
$ ls -l /usr/bin/passwd -rwsr-xr-x 1 root root 47032 Dec 1 11:45 /usr/bin/passwd
You'll notice a new permission bit here s. This permission bit is the SUID, when a file has this permission set, it allows the users who launched the program to get the file owner's permission as well as execution permission, in this case root. So essentially while a user is running the password command, they are running as root.
That's why we are able to access a protected file like /etc/shadow when we run the passwd command. Now if you removed that bit, you would see that you will not be able to modify /etc/shadow and therefore change your password.
Modifying SUID
Just like regular permissions there are two ways to modify SUID permissions.
Symbolic way:
$ sudo chmod u+s myfile
Numerical way:
sudo chmod 4755 myfile
As you can see the SUID is denoted by a 4 and pre-pended to the permission set. You may see the SUID denoted as a capital S this means that it still does the same thing, but it does not have execute permissions.
Exercise
Look at the permission for /etc/passwd in detail, do you notice anything else? Files with SUID enabled are also easily distinguishable.
Quiz Question
What number represents the SUID?
Quiz Answer
4
Setgid
Lesson Content
Similar to the set user ID permission bit, there is a set group ID (SGID) permission bit. This bit allows a program to run as if it was a member of that group.
Let's look at one example:
$ ls -l /usr/bin/wall -rwxr-sr-x 1 root tty 19024 Dec 14 11:45 /usr/bin/wall
We can see now that the permission bit is in the group permission set.
Modifying SGID
$ sudo chmod g+s myfile $ sudo chmod 2555 myfile
The numerical representation for SGID is 2.
Exercise
No exercises for this lesson.
Quiz Question
What number represents the SGID?
Quiz Answer
2
Process Permissions
Lesson Content
Let's segway into process permissions for a bit, remember how I told you that when you run the passwd command with the SUID permission bit enabled you will run the program as root? That is true, however does that mean since you are temporarily root you can modify other user's passwords? Nope fortunately not!
This is because of the many UIDs that Linux implements. There are three UIDS associated with every process:
When you launch a process, it runs with the same permissions as the user or group that ran it, this is known as an effective user ID. This UID is used to grant access rights to a process. So naturally if Bob ran the touch command, the process would run as him and any files he created would be under his ownership.
There is another UID, called the real user ID this is the ID of the user that launched the process. These are used to track down who the user who launched the process is.
One last UID is the saved user ID, this allows a process to switch between the effective UID and real UID, vice versa. This is useful because we don't want our process to run with elevated privileges all the time, it's just good practice to use special privileges at specific times.
Now let's piece these all together by looking at the passwd command once more.
When running the passwd command, your effective UID is your user ID, let's say its 500 for now. Oh but wait, remember the passwd command has the SUID permission enabled. So when you run it, your effective UID is now 0 (0 is the UID of root). Now this program can access files as root.
Let's say you get a little taste of power and you want to modify Sally's password, Sally has a UID of 600. Well you'll be out of luck, fortunately the process also has your real UID in this case 500. It knows that your UID is 500 and therefore you can't modify the password of UID of 600. (This of course is always bypassed if you are a superuser on a machine and can control and change everything).
Since you ran passwd, it will start the process off using your real UID, and it will save the UID of the owner of the file (effective UID), so you can switch between the two. No need to modify all files with root access if it's not required.
Most of the time the real UID and the effective UID are the same, but in such cases as the passwd command they will change.
Exercise
We haven't discussed processes yet, we can still take a look at this change happening in real time:
- Open one terminal window, and run the command: watch -n 1 "ps aux | grep passwd". This will watch for the passwd process.
- Open a second terminal window and run: passwd
- Look at the first terminal window, you'll see a process come up for passwd. The first column in the process table is the effective user ID, lo and behold it's the root user!
Quiz Question
What UID decides what access to grant?
Quiz Answer
effective
The Sticky Bit
Lesson Content
One last special permission bit I want to talk about is the sticky bit.
This permission bit, "sticks a file/directory" this means that only the owner or the root user can delete or modify the file. This is very useful for shared directories. Take a look at the example below:
$ ls -ld /tmp drwxrwxrwxt 6 root root 4096 Dec 15 11:45 /tmp
You'll see a special permission bit at the end here t, this means everyone can add files, write files, modify files in the /tmp directory, but only root can delete the /tmp directory.
Modify sticky bit
$ sudo chmod +t mydir $ sudo chmod 1755 mydir
The numerical representation for the sticky bit is 1
Exercise
What other files and directories do you think have a sticky bit enabled?
Quiz Question
What symbol represents the sticky bit?
Quiz Answer
t
Processes
ps (Processes)
Lesson Content
Processes are the programs that are running on your machine. They are managed by the kernel and each process has an ID associated with it called the process ID (PID). This PID is assigned in the order that processes are created.
Go ahead and run the ps command to see a list of running processes:
$ ps PID TTY STAT TIME CMD 41230 pts/4 Ss 00:00:00 bash 51224 pts/4 R+ 00:00:00 ps
This shows you a quick snapshot of the current processes:
- PID: Process ID
- TTY: Controlling terminal associated with the process (we'll go in detail about this later)
- STAT: Process status code
- TIME: Total CPU usage time
- CMD: Name of executable/command
If you look at the man page for ps you'll see that there are lots of command options you can pass, they will vary depending on what options you want to use - BSD, GNU or Unix. In my opinion the BSD style is more popular to use, so we're gonna go with that. If you are curious the difference between the styles is the amount of dashes you use and the flags.
$ ps aux
The a displays all processes running, including the ones being ran by other users. The u shows more details about the processes. And finally the x lists all processes that don't have a TTY associated with it, these programs will show a ? in the TTY field, they are most common in daemon processes that launch as part of the system startup.
You'll notice you're seeing a lot more fields now, no need to memorize them all, in a later course on advanced processes, we'll go over some of these again:
- USER: The effective user (the one whose access we are using)
- PID: Process ID
- %CPU: CPU time used divided by the time the process has been running
- %MEM: Ratio of the process's resident set size to the physical memory on the machine
- VSZ: Virtual memory usage of the entire process
- RSS: Resident set size, the non-swapped physical memory that a task has used
- TTY: Controlling terminal associated with the process
- STAT: Process status code
- START: Start time of the process
- TIME: Total CPU usage time
- COMMAND: Name of executable/command
The ps command can get a little messy to look at, for now the fields we will look at the most are PID, STAT and COMMAND.
Another very useful command is the top command, top gives you real time information about the processes running on your system instead of a snapshot. By default you'll get a refresh every 10 seconds. Top is an extremely useful tool to see what processes are taking up a lot of your resources.
$ top
Exercise
Use the ps command with different flags and see how the output changes.
Quiz Question
What ps flag is used to view detailed information about processes?
Quiz Answer
u
Controlling Terminal
Lesson Content
We discussed how there is a TTY field in the ps output. The TTY is the terminal that executed the command.
There are two types of terminals, regular terminal devices and pseudoterminal devices. A regular terminal device is a native terminal device that you can type into and send output to your system, this sounds like the terminal application you've been launching to get to your shell, but it's not.
We're gonna segue so you can see this action, go ahead and type Ctrl-Alt-F1 to get into TTY1 (the first virtual console), you'll notice how you don't have anything except the terminal, no graphics, etc. This is considered a regular terminal device, you can exit this with Ctrl-Alt-F7.
A pseudoterminal is what you've been used to working in, they emulate terminals with the shell terminal window and are denoted by PTS . If you look at ps again, you'll see your shell process under pts/*.
Ok, now circling back to the controlling terminal, processes are usually bound to a controlling terminal. For example, if you were running a program on your shell window such as find and you closed the window, your process would also go with it.
There are processes such as daemon processes, which are special processes that are essentially keeping the system running. They often start at system boot and usually get terminated when the system is shutdown. They run in the background and since we don't want these special processes to get terminated they are not bound to a controlling terminal. In the ps output, the TTY is listed as a ? meaning it does not have a controlling terminal.
Exercise
Look at your ps output and list all the unique TTY values.
Quiz Question
What value is given for a process that does not have a controlling terminal?
Quiz Answer
?
Process Details
Lesson Content
Before we get into more practical applications of processes, we have to first understand what they are and how they work. This part can get confusing since we are diving into the nitty gritty, so feel free to come back to this lesson if you don't want to learn about it now.
A process like we said before is a running program on the system, more precisely it's the system allocating memory, CPU, I/O to make the program run. A process is an instance of a running program, go ahead and open 3 terminal windows, in two windows, run the cat command without passing any options (the cat process will stay open as a process because it expects stdin). Now in the third window run: ps aux | grep cat. You'll see that there are two processes for cat, even though they are calling the same program.
The kernel is in charge of processes, when we run a program the kernel loads up the code of the program in memory, determines and allocates resources and then keeps tabs on each process, it knows:
- The status of the process
- The resources the process is using and receives
- The process owner
- Signal handling (more on that later)
- And basically everything else
All processes are trying to get a taste of that sweet resource pie, it's the kernel's job to make sure that processes get the right amount of resources depending on process demands. When a process ends, the resources it used are now freed up for other processes.
Exercise
No exercises for this lesson.
Quiz Question
What manages and controls processes?
Quiz Answer
kernel
Process Creation
Lesson Content
Again this lesson and the next are purely information to let you see what's under the hood, feel free to circle back to this once you've worked with processes a bit more.
When a new process is created, an existing process basically clones itself using something called the fork system call (system calls will be discussed very far into the future). The fork system call creates a mostly identical child process, this child process takes on a new process ID (PID) and the original process becomes its parent process and has something called a parent process ID PPID. Afterwards, the child process can either continue to use the same program its parent was using before or more often use the execve system call to launch up a new program. This system call destroys the memory management that the kernel put into place for that process and sets up new ones for the new program.
We can see this in action:
$ ps l
The l option gives us a "long format" or even more detailed view of our running processes. You'll see a column labelled PPID, this is the parent ID. Now look at your terminal, you'll see a process running that is your shell, so on my system I have a process running bash. Now remember when you ran the ps l command, you were running it from the process that was running bash. Now you'll see that the PID of the bash shell is the PPID of the ps l command.
So if every process has to have a parent and they are just forks of each other, there must be a mother of all processes right? You are correct, when the system boots up, the kernels creates a process called init, it has a PID of 1. The init process can't be terminated unless the system shuts down. It runs with root privileges and runs many processes that keep the system running. We will take a closer look at init in the system bootup course, for now just know it is the process that spawns all other processes.
Exercise
Take a look at your running processes, can you see what other processes have parents?
Quiz Question
What system call creates a new process?
Quiz Answer
fork
Process Termination
Lesson Content
Now that we know what goes on when a process gets created, what is happening when we don't need it anymore? Be forewarned, sometimes Linux can get a little dark...
A process can exit using the _exit system call, this will free up the resources that process was using for reallocation. So when a process is ready to terminate, it lets the kernel know why it's terminating with something called a termination status. Most commonly a status of 0 means that the process succeeded. However, that's not enough to completely terminate a process. The parent process has to acknowledge the termination of the child process by using the wait system call and what this does is it checks the termination status of the child process. I know it's gruesome to think about, but the wait call is a necessity, after all what parent wouldn't want to know how their child died?
There is another way to terminate a process and that involves using signals, which we will discuss soon.
Orphan Processes
When a parent process dies before a child process, the kernel knows that it's not going to get a wait call, so instead it makes these processes "orphans" and puts them under the care of init (remember mother of all processes). Init will eventually perform the wait system call for these orphans so they can die.
Zombie Processes
What happens when a child terminates and the parent process hasn't called wait yet? We still want to be able to see how a child process terminated, so even though the child process finished, the kernel turns the child process into a zombie process. The resources the child process used are still freed up for other processes, however there is still an entry in the process table for this zombie. Zombie processes also cannot be killed, since they are technically "dead", so you can't use signals to kill them. Eventually if the parent process calls the wait system call, the zombie will disappear, this is known as "reaping". If the parent doesn't perform a wait call, init will adopt the zombie and automatically perform wait and remove the zombie. It can be a bad thing to have too many zombie processes, since they take up space on the process table, if it fills up it will prevent other processes from running.
Exercise
No exercises for this lesson.
Quiz Question
What is the most common termination status for a process succeeding?
Quiz Answer
0
Signals
Lesson Content
A signal is a notification to a process that something has happened.
Why we have signals
They are software interrupts and they have lots of uses:
- A user can type one of the special terminal characters (Ctrl-C) or (Ctrl-Z) to kill, interrupt or suspend processes
- Hardware issues can occur and the kernel wants to notify the process
- Software issues can occur and the kernel wants to notify the process
- They are basically ways processes can communicate
Signal process
When a signal is generated by some event, it's then delivered to a process, it's considered in a pending state until it's delivered. When the process is ran, the signal will be delivered. However, processes have signal masks and they can set signal delivery to be blocked if specified. When a signal is delivered, a process can do a multitude of things:
- Ignore the signal
- "Catch" the signal and perform a specific handler routine
- Process can be terminated, as opposed to the normal exit system call
- Block the signal, depending on the signal mask
Common signals
Each signal is defined by integers with symbolic names that are in the form of SIGxxx. Some of the most common signals are:
- SIGHUP or HUP or 1: Hangup
- SIGINT or INT or 2: Interrupt
- SIGKILL or KILL or 9: Kill
- SIGSEGV or SEGV or 11: Segmentation fault
- SIGTERM or TERM or 15: Software termination
- SIGSTOP or STOP: Stop
Numbers can vary with signals so they are usually referred by their names.
Some signals are unblockable, one example is the SIGKILL signal. The KILL signal destroys the process.
Exercise
No exercises for this lesson.
Quiz Question
What signal is unblockable?
Quiz Answer
SIGKILL
kill (Terminate)
Lesson Content
You can send signals that terminate processes, such a command is aptly named the kill command.
$ kill 12445
The 12445 is the PID of the process you want to kill. By default it sends a TERM signal. The SIGTERM signal is sent to a process to request its termination by allowing it to cleanly release its resources and saving its state.
You can also specify a signal with the kill command:
$ kill -9 12445
This will run the SIGKILL signal and kill the process.
Differences between SIGHUP, SIGINT, SIGTERM, SIGKILL, SIGSTOP?
These signals all sound reasonably similar, but they do have their differences.
- SIGHUP - Hangup, sent to a process when the controlling terminal is closed. For example, if you closed a terminal window that had a process running in it, you would get a SIGHUP signal. So basically you've been hung up on
- SIGINT - Is an interrupt signal, so you can use Ctrl-C and the system will try to gracefully kill the process
- SIGTERM - Kill the process, but allow it to do some cleanup first
- SIGKILL - Kill the process, kill it with fire, doesn't do any cleanup
- SIGSTOP - Stop/suspend a process
Exercise
Kill some processes using different signals.
Quiz Question
What is the signal name for the default kill command?
Quiz Answer
SIGTERM
niceness
Lesson Content
When you run multiple things on your computer, like perhaps Chrome, Microsoft Word or Photoshop at the same time, it may seem like these processes are running at the same time, but that isn't quite true.
Processes use the CPU for a small amount of time called a time slice. Then they pause for milliseconds and another process gets a little time slice. By default, process scheduling happens in this round-robin fashion. Every process gets enough time slices until it's finished processing. The kernel handles all of these switching of processes and it does a pretty good job at it most of the time.
Processes aren't able to decide when and how long they get CPU time, if all processes behaved normally they would each (roughly) get an equal amount of CPU time. However, there is a way to influence the kernel's process scheduling algorithm with a nice value. Niceness is a pretty weird name, but what it means is that processes have a number to determine their priority for the CPU. A high number means the process is nice and has a lower priority for the CPU and a low or negative number means the process is not very nice and it wants to get as much of the CPU as possible.
$ top
You can see a column for NI right now, that is the niceness level of a process.
To change the niceness level you can use the nice and renice commands:
$ nice -n 5 apt upgrade
The nice command is used to set priority for a new process. The renice command is used to set priority on an existing process.
$ renice 10 -p 3245
Exercise
What processes aren't very nice and why?
Quiz Question
If I want a process to get more CPU priority, do I use a lower or higher nice number?
Quiz Answer
lower
Process States
Lesson Content
Let's take a look at the ps aux command again:
$ ps aux
In the STAT column, you'll see lots of values. A linux process can be in a number of different states. The most common state codes you'll see are described below:
- R: running or runnable, it is just waiting for the CPU to process it
- S: Interruptible sleep, waiting for an event to complete, such as input from the terminal
- D: Uninterruptible sleep, processes that cannot be killed or interrupted with a signal, usually to make them go away you have to reboot or fix the issue
- Z: Zombie, we discussed in a previous lesson that zombies are terminated processes that are waiting to have their statuses collected
- T: Stopped, a process that has been suspended/stopped
Exercise
Take a look at the running processes on your system and check out their process states.
Quiz Question
What STAT code is used to represent an uninterruptible sleep?
Quiz Answer
D
/proc filesystem
Lesson Content
Remember everything in Linux is a file, even processes. Process information is stored in a special filesystem known as the /proc filesystem.
$ ls /proc
You should see multiple values in here, there are sub-directories for every PID. If you looked at a PID in the ps output, you would be able to find it in the /proc directory.
Go ahead and enter one of the processes and look at that file:
$ cat /proc/12345/status
You should see process state information and well as more detailed information. The /proc directory is how the kernel is views the system, so there is a lot more information here than what you would see in ps.
Exercise
No exercises for this lesson.
Quiz Question
What filesystem stores process information?
Quiz Answer
/proc
Job Control
Lesson Content
Let's say you're working on a single terminal window and you're running a command that is taking forever. You can't interact with the shell until it is complete, however we want to keep working on our machines, so we need that shell open. Fortunately we can control how our processes run with jobs:
Sending a job to the background
Appending an ampersand (&) to the command will run it in the background so you can still use your shell. Let's see an example:
$ sleep 1000 & $ sleep 1001 & $ sleep 1002 &
View all background jobs
Now you can view the jobs you just sent to the background.
$ jobs [1] Running sleep 1000 & [2]- Running sleep 1001 & [3]+ Running sleep 1002 &
This will show you the job id in the first column, then the status and the command that was run. The + next to the job ID means that it is the most recent background job that started. The job with the - is the second most recent command.
Sending a job to the background on existing job
If you already ran a job and want to send it to the background, you don't have to terminate it and start over again. First suspend the job with Ctrl-Z, then run the bg command to send it to the background.
pete@icebox ~ $ sleep 1003 ^Z [4]+ Stopped sleep 1003 pete@icebox ~ $ bg [4]+ sleep 1003 & pete@icebox ~ $ jobs [1] Running sleep 1000 & [2] Running sleep 1001 & [3]- Running sleep 1002 & [4]+ Running sleep 1003 &
Moving a job from the background to the foreground
To move a job out of the background just specify the job ID you want. If you run fg without any options, it will bring back the most recent background job (the job with the + sign next to it)
$ fg %1
Kill background jobs
Similar to moving jobs out of the background, you can use the same form to kill the processes by using their Job ID.
kill %1
Exercise
Move some jobs between the background and the foreground
Quiz Question
What command is used to list background jobs?
Quiz Answer
jobs
Packages
Software Distribution
Lesson Content
Your system is comprised of many packages such as internet browsers, text editors, media players, etc. These packages are managed via package managers, which install and maintain the software on your system. Not all packages are installed through package managers though, you can commonly install packages directly from their source code (we'll get to that soon). However the majority of the time you will use a package manager to install software, the most common variety of packages are Debian (.deb) and Red Hat (.rpm). Debian style packages are used in distributions such as Debian, Ubuntu, LinuxMint, etc. Red Hat style packages are seen in Red Hat Enterprise Linux, Fedora, CentOS, etc.
What are packages? You may know them as Chrome, Photoshop, etc and they are, but what they really are just lots and lots of files that have been compiled into one. The people (or sometimes a single person) that write this software are known as upstream providers, they compile their code and write up how to get it installed. These upstream providers work on getting out new software and update existing software. When they are ready to release it to the world, they send their package to package maintainers, who handle getting this piece of software in the hands of the users. These package maintainers review, manage and distribute this software in the form of packages.
Exercise
No exercises for this lesson.
Quiz Question
No questions, move along!
Quiz Answer
Package Repositories
Lesson Content
How do packages that get uploaded to the internet somehow end up on our computers? Do you go to the download page of each package you want and click download and install? Well, actually you can do that, but there is something better called package repositories. Repositories are just a central storage location for packages. There are tons of repositories that hold lots of packages and best of all they are all found on the internet, no silly installation disks. Your machine doesn't know where to look for these repositories unless you explicitly tell it where to look.
For example, let's say I want WackyWidgets Software on my machine. Well WackyWidgets manages their own repositories for their widget packages, inside this repository are 10 packages, the CoolWidget package, the SuperWidget package, etc. WackyWidgets hosts this repository at a source link called: http://download.widgets/linux/deb/
Now instead of going to their website to download the package directly, you can tell your machine to find WackyWidgets software from the source link.
Your distribution already comes with pre-approved sources to get packages from and this is how it installs all the base packages you see on your system. On a Debian system, this sources file is the /etc/apt/sources.list file. Your machine will know to look there and check for any source repositories you added.
Exercise
No exercises for this lesson.
Quiz Question
Where is the sources file in a Debian system?
Quiz Answer
/etc/apt/sources.list
tar and gzip
Lesson Content
Before we get into package installation and the different managers, we need to discuss archiving and compressing files, because you will most likely encounter these when you hunt for software on the internet.
You probably already know what a file archive is, you've most likely encountered file types such as .rar and .zip. These are an archive of files, they contain many files inside of them, but they come in this very neat single file known as an archive.
Compressing files with gzip
gzip is program used to compress files in Linux, they end in a .gz extension.
To compress a file down:
$ gzip mycoolfile
To decompress the file:
$ gunzip mycoolfile.gz
Creating archives with tar Unfortunately, gzip can't add multiple files into one archive for us. Luckily we have the tar program which does. When you create an archive using tar, it will have a .tar extension.
$ tar cvf mytarfile.tar mycoolfile1 mycoolfile 2
- c - create
- v - tell the program to be verbose and let us see what it's doing
- f - the filename of the tar file has to come after this option, if you are creating a tar file you'll have to come up with a name
Unpacking archives with tar
To extract the contents of a tar file, use:
$ tar xvf mytarfile.tar
- x - extract
- v - tell the program to be verbose and let us see what it's doing
- f - the file you want to extract
Compressing/uncompressing archives with tar and gzip
Many times you'll see a tar file that has been compressed such as: mycompressedarchive.tar.gz, all you need to do is work outside in, so first remove the compression with gunzip and then you can unpack the tar file. Or you can alternatively use the z option with tar, which just tells it to use the gzip or gunzip utility.
Create a compressed tar file:
$ tar czf myfile.tar.gz
Uncompress and unpack:
$ tar xzf file.tar
If you need help remember this: eXtract all Zee Files!
tar is one of those commands that is so important and yet you never really remember it, relevant xkcd: https://xkcd.com/1168/
Other Utilities
Throughout your journey of Linux, you'll encounter other archive and compression types such as: bzip2, compress, zip, unzip, etc. They are a little less common, but just keep in mind that different utilities will call for different commands.
Exercise
Familiarize yourself with the tar documentation and look at the other options available in the manpage.
Quiz Question
What tar flag is used to create archives?
Quiz Answer
c
Package Dependencies
Lesson Content
Packages very rarely work by themselves, they are most often accompanied by dependencies to help them run. For example, let's say we have a group of restaurants, these restaurants all make different cuisine, however they all get their ingredients from the same farm. Their food is dependent on the farm's supplies, if the farm were to suddenly stop supplying food, well then the restaurants would be in a pretty bad state.
In Linux, these dependencies are often other packages or shared libraries. Shared libraries are libraries of code that other programs want to use and don't want to have to rewrite for themselves. Think of the restaurant again, how much work would it be if every restaurant also farmed their own food? Too much.
We will dig more into shared libraries in the filesystem course, so for now just remember that packages have dependencies to help them run, whether those dependencies are other packages or libraries, if the dependencies aren't there the package will end up in a broken state and most of the time not even install.
Exercise
No exercises for this lesson.
Quiz Question
No questions, move along!
Quiz Answer
rpm and dpkg
Lesson Content
Although most of this course is about package management systems (the Batmans of package management), we mustn't forget about the Robins. Although very useful and reliable, they don't come with that sweet batmobile and utility belt.
Just like .exe is a single executable file, so is .deb and .rpm. You normally wouldn't see these if you use package repositories, but if you directly download packages, you will most likely get them in these popular format. Obviously, they are exclusive to their distributions, .deb for Debian based and .rpm for Red Hat based.
To install these direct packages, you can use the package management commands: rpm and dpkg. These tools are used to install package files, however they will not install the package dependencies, so if your package had 10 dependencies, you would have to install those packages separately and then their dependencies and so on and so forth. As you can see, that was one of the reasons that brought forth the full blown management systems that we will discuss this later.
Keep in mind that there will be countless times when you need to install, query or verify a package with one of these tools, so remember these commands.
Install a package
Debian: $ dpkg -i some_deb_package.deb RPM: $ rpm -i some_rpm_package.rpm
The i stands for install. You can also use the longer format of --install.
Remove a package
Debian: $ dpkg -r some_deb_package.deb RPM: $ rpm -e some_rpm_package.rpm
Debian: r for remove RPM: e for erase
List installed packages
Debian: $ dpkg -l RPM: $ rpm -qa
Debian: l for list RPM: q for query and a for all
Exercise
Find a program that you want to install on your system like Google Chrome and install it using one of these commands.
Quiz Question
What is the package management tool for .deb files?
Quiz Answer
dpkg
yum and apt
Lesson Content
Ah, the Batmans of package management, these systems come with all the fixins to make package installation, removal and changes easier, including installing package dependencies. Two of the most popular management systems is yum and apt. Yum is exclusive to the Red Hat family and apt is exclusively to the Debian family.
Install a package from a repository
Debian: $ apt install package_name RPM: $ yum install package_name
Remove a package
Debian: $ apt remove package_name RPM: $ yum erase package_name
Updating packages for a repository
It's always best practice to update your package repositories so they are up to date before you install and update a package.
Debian: apt update; apt upgrade RPM: yum update
Get information about an installed package
Debian: apt show package_name RPM: yum info package_name
Exercise
Run through each of these package commands and see the output you receive.
Quiz Question
What command is used to show package information on a Debian system?
Quiz Answer
apt show
Compile Source Code
Lesson Content
Often times you will encounter an obscure package that only comes in the form of pure source code. You'll need to use a few commands to get that source code package compiled and installed on your system.
First thing is first, you'll need to have software to install the tools that will allow you to compile source code.
$ sudo apt install build-essential
Once you do that, extract the contents of the package file, most likely a .tar.gz file.
$ tar -xzvf package.tar.gz
Before you do anything, take a look at the README or INSTALL file inside the package. Sometimes there will be specific installation instructions.
Depending on what compile method that the developer used, you'll have to use different commands, such as cmake or something else.
However, most commonly you'll see basic make compilation, so we'll discuss that:
Inside the package contents will be a configure script, this script checks for dependencies on your system and if you are missing anything, you'll see an error and you'll need to fix those dependencies.
$ ./configure
The ./ allows you to execute a script in the current directory.
$ make
Inside of the package contents, there is a file called Makefile that contains rules to building the software. When you run the make command, it looks at this file to build the software.
$ sudo make install
This command actually installs the package, it will copy the correct files to the correct locations on your computer.
If you want to uninstall the package, use:
$ sudo make uninstall
Be wary when using make install, you may not realize how much is actually going on in the background. If you decide to remove this package, you may not actually remove everything because you didn't realize what was added to your system. Instead forget everything about make install that I just explained to you and use the checkinstall command. This command will make a .deb file for you that you can easily install and uninstall.
$ sudo checkinstall
This command will essentially "make install" and build a .deb package and install it. This makes it easier to remove the package later on.
Exercise
Find a source code program (from a trusted site) and install from source.
Quiz Question
What should you use instead of make install ALWAYS?
Quiz Answer
checkinstall
Devices
/dev directory
Lesson Content
When you connect a device to your machine, it generally needs a device driver to function properly. You can interact with device drivers through device files or device nodes, these are special files that look like regular files. Since these device files are just like regular files, you can use programs such as ls, cat, etc to interact with them. These device files are generally stored in the /dev directory. Go ahead and ls the /dev directory on your system, you'll see a large amount of devices files that are on your system.
$ ls /dev
Some of these devices you've already used and interacted with such as /dev/null. Remember when we send output to /dev/null, the kernel knows that this device takes all of our input and just discards it, so nothing gets returned.
In the old days, if you wanted to add a device to your system, you'd add the device file in /dev and then probably forget about it. Well repeat that a couple of times and you can see where there was a problem. The /dev directory would get cluttered with static device files of devices that you've long since upgraded, stopped using, etc. Devices are also assigned device files in the order that the kernel finds them. So if everytime you rebooted your system, the devices could have different device files depending on when they were discovered.
Thankfully we no longer use that method, now we have something that we use to dynamically add and remove devices that are currently being used on the system and we'll be discussing this in the coming lessons.
Exercise
Check out the contents of the /dev directory, do you recognize any familiar devices?
Quiz Question
Where are device files stored on the system?
Quiz Answer
/dev
device types
Lesson Content
Before we chat about how devices are managed, let's actually take a look at some devices.
$ ls -l /dev brw-rw---- 1 root disk 8, 0 Dec 20 20:13 sda crw-rw-rw- 1 root root 1, 3 Dec 20 20:13 null srw-rw-rw- 1 root root 0 Dec 20 20:13 log prw-r--r-- 1 root root 0 Dec 20 20:13 fdata
The columns are as follows from left to right:
- Permissions
- Owner
- Group
- Major Device Number
- Minor Device Number
- Timestamp
- Device Name
Remember in the ls command you can see the type of file with the first bit on each line. Device files are denoted as the following:
- c - character
- b - block
- p - pipe
- s - socket
Character Device
These devices transfer data, but one a character at a time. You'll see a lot of pseudo devices (/dev/null) as character devices, these devices aren't really physically connected to the machine, but they allow the operating system greater functionality.
Block Device
These devices transfer data, but in large fixed-sized blocks. You'll most commonly see devices that utilize data blocks as block devices, such as harddrives, filesystems, etc.
Pipe Device
Named pipes allow two or more processes to communicate with each other, these are similar to character devices, but instead of having output sent to a device, it's sent to another process.
Socket Device
Socket devices facilitate communication between processes, similar to pipe devices but they can communicate with many processes at once.
Device Characterization
Devices are characterized using two numbers, major device number and minor device number. You can see these numbers in the above ls example, they are separated by a comma. For example, let's say a device had the device numbers: 8, 0:
The major device number represents the device driver that is used, in this case 8, which is often the major number for sd block devices. The minor number tells the kernel which unique device it is in this driver class, in this case 0 is used to represent the first device (a).
Exercise
Look at your /dev directory and find out what types of devices you can see.
Quiz Question
What is the symbol for character devices in the ls -l command?
Quiz Answer
c
Device Names
Lesson Content
Here are the most common device names that you will encounter:
SCSI Devices
If you have any sort of mass storage on your machine, chances are it is using the SCSI (pronounced "scuzzy") protocol. SCSI stands for Small Computer System Interface, it is a protocol used for allow communication between disks, printers, scanners and other peripherals to your system. You may have heard of SCSI devices which aren't actually in use in modern systems, however our Linux systems correspond SCSI disks with hard disk drives in /dev. They are represented by a prefix of sd (SCSI disk):
Common SCSI device files:
- /dev/sda - First hard disk
- /dev/sdb - Second hard disk
- /dev/sda3 - Third partition on the first hard disk
Pseudo Devices
As we discussed earlier, pseudo devices aren't really physically connected to your system, most common pseudo devices are character devices:
- /dev/zero - accepts and discards all input, produces a continuous stream of NULL (zero value) bytes
- /dev/null - accepts and discards all input, produces no output
- /dev/random - produces random numbers
PATA Devices
Sometimes in older systems you may see hard drives being referred to with an hd prefix:
- /dev/hda - First hard disk
- /dev/hdd2 - Second partition on 4th hard disk
Exercise
Write to the pseudo devices and see what happens, be careful not to write your disks to those devices!
Quiz Question
What would commonly be the device name for the first partition on the second SCSI disk?
Quiz Answer
sdb1
sysfs
Lesson Content
Sysfs was created long ago to better manage devices on our system that the /dev directory failed to do. Sysfs is a virtual filesystem, most often mounted to the /sys directory. It gives us more detailed information than what we would be able to see in the /dev directory. Both directories /sys and /dev seem to be very similar and they are in some regards, but they do have major differences. Basically, the /dev directory is simple, it allows other programs to access devices themselves, while the /sys filesystem is used to view information and manage the device.
The /sys filesystem basically contains all the information for all devices on your system, such as the manufacturer and model, where the device is plugged in, the state of the device, the hierarchy of devices and more. The files you see here aren't device nodes, so you don't really interact with devices from the /sys directory, rather you are managing devices.
Take a look at the contents of the /sys directory:
pete@icebox:~$ ls /sys/block/sda alignment_offset discard_alignment holders removable sda6 trace bdi events inflight ro size uevent capability events_async power sda1 slaves dev events_poll_msecs queue sda2 stat device ext_range range sda5 subsystem
Exercise
Check out the contents of the /sys directory and see what files are located in there.
Quiz Question
What directory is used to view detailed information on devices?
Quiz Answer
/sys
udev
Lesson Content
Back in the old days and actually today if you really wanted to, you would create device nodes using a command such as:
$ mknod /dev/sdb1 b 8 3
This command will make a device node /dev/sdb1 and it will make it a block device (b) with a major number of 8 and a minor number of 3.
To remove a device, you would simply rm the device file in the /dev directory.
Luckily, we really don't need to do this anymore because of udev. The udev system dynamically creates and removes device files for us depending on whether or not they are connected. There is a udevd daemon that is running on the system and it listens for messages from the kernel about devices connected to the system. Udevd will parse that information and it will match the data with the rules that are specified in /etc/udev/rules.d, depending on those rules it will most likely create device nodes and symbolic links for the devices. You can write your own udev rules, but that is a little out of scope for this lesson. Fortunately, your system already comes with lots of udev rules so you may never need to write your own.
You can also view the udev database and sysfs using the udevadm command. This tool is very useful, but sometimes can get very convoluted, a simple command to view information for a device would be:
$ udevadm info --query=all --name=/dev/sda
Exercise
Run the udevadm command given and check out the input.
Quiz Question
What dynamically adds and removes devices?
Quiz Answer
udev
lsusb, lspci, lssci
Lesson Content
Just like we would use the ls command to list files and directories, we can use similar tools that list information about devices.
Listing USB Devices
$ lsusb
Listing PCI Devices
$ lspci
Listing SCSI Devices
$ lsscsi
Exercise
Try out each of these commands and see the output you receive.
Quiz Question
What command can be used to view usb devices?
Quiz Answer
lsusb
dd
Lesson Content
The dd tool is super useful for converting and copying data. It reads input from a file or data stream and writes it to a file or data stream.
Consider the following command:
$ dd if=/home/pete/backup.img of=/dev/sdb bs=1024
This command is copying the contents of backup.img to /dev/sdb. It will copy the data in blocks of 1024 bytes until there is no more data to be copied.
- if=file - Input file, read from a file instead of standard input
- of=file - Output file, write to a file instead of standard output
- bs=bytes - Block size, it reads and writes this many bytes of data at a time. You can use different size metrics by denoting the size with a k for kilobyte, m for megabyte, etc, so 1024 bytes is 1k
- count=number - Number of blocks to copy.
You will see some dd commands that use the count option, usually with dd if you want to copy a file that is 1 megabyte, you'll usually want to see that file as 1 megabyte when it's done being copied. Let's say you run the following command:
$ dd if=/home/pete/backup.img of=/dev/sdb bs=1M count=2
Our backup.img file is 10M, however, we are saying in this command to copy over 1M 2 times, so only 2M is being copied, leaving our copied data incomplete. Count can come in handy in many situations, but if you are just copying over data, you can pretty much omit count and even bs for that matter. If you really want to optimize your data transfers, then you'll want to start using those options.
dd is extremely powerful, you can use it to make backups of anything, including whole disk drives, restoring disks images, and more. Be careful, that powerful tool can come at a price if you aren't sure what you are doing.
Exercise
Use the dd command to make a backup of your drive and set the output to a .img file.
Quiz Question
What is the dd option for block size?
Quiz Answer
bs
File system
file system heirarchy
file system types
Anatomy of a Disk
Lesson Content
Hard disks can be subdivided into partitions, essentially making multiple block devices. Recall such examples as, /dev/sda1 and /dev/sda2, /dev/sda is the whole disk, but /dev/sda1 is the first partition on that disk. Partitions are extremely useful for separating data and if you need a certain filesystem, you can easily create a partition instead of making the entire disk one filesystem type.
Partition Table
Every disk will have a partition table, this table tells the system how the disk is partitioned. This table tells you where partitions begin and end, which partitions are bootable, what sectors of the disk are allocated to what partition, etc. There are two main partition table schemes used, Master Boot Record (MBR) and GUID Partition Table (GPT).
Partition
Disks are comprised of partitions that help us organize our data. You can have multiple partitions on a disk and they can't overlap each other. If there is space that is not allocated to a partition, then it is known as free space. The types of partitions depend on your partition table. Inside a partition, you can have a filesystem or dedicate a partition to other things like swap (we'll get to that soon).
MBR
- Traditional partition table, was used as the standard
- Can have primary, extended, and logical partitions
- MBR has a limit of four primary partitions
- Additional partitions can be made by making a primary partition into an extended partition (there can only be one extended partition on a disk). Then inside the extended partition you add logical partitions. The logical partitions are used just like any other partition. Silly I know.
- Supports disks up to 2 terabytes
GPT
- GUID Partition Table (GPT) is becoming the new standard for disk partitioning
- Has only one type of partition and you can make many of them
- Each partition has a globally unique ID (GUID)
- Used mostly in conjunction with UEFI based booting (we'll get into details in another course)
Filesystem Structure
We know from our previous lesson that a filesystem is an organized collection of files and directories. In its simplest form, it is comprised of a database to manage files and the actual files themselves, however we're going to go into a little more detail.
- Boot block - This is located in the first few sectors of the filesystem, and it's not really used the by the filesystem. Rather, it contains information used to boot the operating system. Only one boot block is needed by the operating system. If you have multiple partitions, they will have boot blocks, but many of them are unused.
- Super block - This is a single block that comes after the boot block, and it contains information about the filesystem, such as the size of the inode table, size of the logical blocks and the size of the filesystem.
- Inode table - Think of this as the database that manages our files (we have a whole lesson on inodes, so don't worry). Each file or directory has a unique entry in the inode table and it has various information about the file.
- Data blocks - This is the actual data for the files and directories.
Let's take a look at the different partition tables. Below is an example of a partition using the MBR partitioning table (msdos). You can see the primary, extended and logical partitions on the machine.
pete@icebox:~$ sudo parted -l Model: Seagate (scsi) Disk /dev/sda: 21.5GB Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End Size Type File system Flags 1 1049kB 6860MB 6859MB primary ext4 boot 2 6861MB 21.5GB 14.6GB extended 5 6861MB 7380MB 519MB logical linux-swap(v1) 6 7381MB 21.5GB 14.1GB logical xfs
This example is GPT, using just a unique ID for the partitions.
Model: Thumb Drive (scsi) Disk /dev/sdb: 4041MB Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 17.4kB 1000MB 1000MB first 2 1000MB 4040MB 3040MB second
Exercise
Run parted -l on your machine and evaluate your results.
Quiz Question
What partition type is used to create more than 4 partitions in the MBR partitioning scheme?
Quiz Answer
extended
Disk Partitioning
Lesson Content
Let's do some practical stuff with filesytems by working through the process on a USB drive. If you don't have one, no worries, you can still follow along these next couple of lessons.
First we'll need to partition our disk. There are many tools available to do this:
- fdisk - basic command-line partitioning tool, it does not support GPT
- parted - this is a command line tool that supports both MBR and GPT partitioning
- gparted - this is the GUI version of parted
- gdisk - fdisk, but it does not support MBR only GPT
Let's use parted to do our partitioning. Let's say I connect the USB device and we see the device name is /dev/sdb2.
Launch parted
$ sudo parted
You'll be entered in the parted tool, here you can run commands to partition your device.
Select the device
select /dev/sdb2
To select the device you'll be working with, select it by its device name.
View current partition table
(parted) print Model: Seagate (scsi) Disk /dev/sda: 21.5GB Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End Size Type File system Flags 1 1049kB 6860MB 6859MB primary ext4 boot 2 6861MB 21.5GB 14.6GB extended 5 6861MB 7380MB 519MB logical linux-swap(v1) 6 7381MB 21.5GB 14.1GB logical xfs
Here you will see the available partitions on the device. The start and end points are where the partitions take up space on the hard drive, you'll want to find a good start and end location for your partitions.
Partition the device
mkpart primary 123 4567
Now just choose a start and end point and make the partition, you'll need to specify the type of partition depending on what table you used.
Resize a partition
You can also resize a partition if you don't have any space.
resize 2 1245 3456
Select the partition number and then the start and end points of where you want to resize it to.
Parted is a very powerful tool and you should be careful when partitioning your disks.
Exercise
Partition a USB drive with half of the drive as ext4 and the other half as free space.
Quiz Question
What is the parted command to make a partition?
Quiz Answer
mkpart
Creating Filesystems
Lesson Content
Now that you've actually partitioned a disk, let's create a filesystem!
$ sudo mkfs -t ext4 /dev/sdb2
Simple as that! The mkfs (make filesystem) tool allows us to specify the type of filesystem we want and where we want it. You'll only want to create a filesystem on a newly partitioned disk or if you are repartitioning an old one. You'll most likely leave your filesystem in a corrupted state if you try to create one on top of an existing one.
Exercise
Make an ext4 filesystem on the USB drive.
Quiz Question
What command is used to create a filesystem?
Quiz Answer
mkfs
mount and umount
Lesson Content
Before you can view the contents of your filesystem, you will have to mount it. To do that I'll need the device location, the filesystem type and a mount point, the mount point is a directory on the system where the filesystem is going to be attached. So we basically want to mount our device to a mount point.
First create the mount point, in our case mkdir /mydrive
$ sudo mount -t ext4 /dev/sdb2 /mydrive
Simple as that! Now when we go to /mydrive we can see our filesystem contents, the -t specifies the type of filesystem, then we have the device location, then the mount point.
To unmount a device from a mount point:
$ sudo umount /mydrive or $ sudo umount /dev/sdb2
Remember that the kernel names devices in the order it finds them. What if our device name changes for some reason after we mount it? Well fortunately, you can use a device's universally unique ID (UUID) instead of a name.
To view the UUIDS on your system for block devices:
pete@icebox:~$ sudo blkid /dev/sda1: UUID="130b882f-7d79-436d-a096-1e594c92bb76" TYPE="ext4" /dev/sda5: UUID="22c3d34b-467e-467c-b44d-f03803c2c526" TYPE="swap" /dev/sda6: UUID="78d203a0-7c18-49bd-9e07-54f44cdb5726" TYPE="xfs"
We can see our device names, their corresponding filesystem types and their UUIDs. Now when we want to mount something, we can use:
$ sudo mount UUID=130b882f-7d79-436d-a096-1e594c92bb76 /mydrive
Most of the time you won't need to mount devices via their UUIDs, it's much easier to use the device name and often times the operating system will know to mount common devices like USB drives. If you need to automatically mount a filesystem at startup though like if you added a secondary hard drive, you'll want to use the UUID and we'll go over that in the next lesson.
Exercise
Look at the manpage for mount and umount and see what other options you can use.
Quiz Question
What command is used to attach a filesystem?
Quiz Answer
mount
/etc/fstab
Lesson Content
When we want to automatically mount filesystems at startup we can add them to a file called /etc/fstab (pronounced "eff es tab" not "eff stab") short for filesystem table. This file contains a permanent list of filesystems that are mounted.
pete@icebox:~$ cat /etc/fstab UUID=130b882f-7d79-436d-a096-1e594c92bb76 / ext4 relatime,errors=remount-ro 0 1 UUID=78d203a0-7c18-49bd-9e07-54f44cdb5726 /home xfs relatime 0 2 UUID=22c3d34b-467e-467c-b44d-f03803c2c526 none swap sw 0 0
Each line represents one filesystem, the fields are:
- UUID - Device identifier
- Mount point - Directory the filesystem is mounted to
- Filesystem type
- Options - other mount options, see manpage for more details
- Dump - used by the dump utility to decide when to make a backup, you should just default to 0
- Pass - Used by fsck to decide what order filesystems should be checked, if the value is 0, it will not be checked
To add an entry, just directly modify the /etc/fstab file using the entry syntax above. Be careful when modifying this file, you could potentially make your life a little harder if you mess up.
Exercise
Add the USB drive we've been working on as a entry in /etc/fstab, when you reboot you should still see it mounted.
Quiz Question
What file is used to define how filesystems should be mounted?
Quiz Answer
/etc/fstab
swap
Lesson Content
In our previous example, I showed you how to see your partition table, let's revisit that example, more specifically this line:
Number Start End Size Type File system Flags 5 6861MB 7380MB 519MB logical linux-swap(v1)
What is this swap partition? Well swap is what we used to allocate virtual memory to our system. If you are low on memory, the system uses this partition to "swap" pieces of memory of idle processes to the disk, so you're not bogged for memory.
Using a partition for swap space
Let's say we wanted to set our partition of /dev/sdb2 to be used for swap space.
- First make sure we don't have anything on the partition
- Run: mkswap /dev/sdb2 to initialize swap areas
- Run: swapon /dev/sdb2 this will enable the swap device
- If you want the swap partition to persist on bootup, you need to add an entry to the /etc/fstab file. sw is the filesystem type that you'll use.
- To remove swap: swapoff /dev/sdb2
Generally you should allocate about twice as much swap space as you have memory. But modern systems today are usually pretty powerful enough and have enough RAM as it is.
Exercise
Partition the free space in the USB drive for swap space.
Quiz Question
What is the command to enable swap space on a device?
Quiz Answer
swapon
Disk Usage
Lesson Content
There are a few tools you can used to see the utilization of your disks:
pete@icebox:~$ df -h Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda1 6.2G 2.3G 3.6G 40% /
The df command shows you the utilization of your currently mounted filesystems. The -h flag gives you a human readable format. You can see what the device is, and how much capacity is used and available.
Let's say your disk is getting full and you want to know what files or directories are taking up that space, for that you can use the du command.
$ du -h
This shows you the disk usage of the current directory you are in, you can take a peek at the root directory with du -h / but that can get a little cluttered.
Both of these commands are so similar in syntax it can be hard to remember which one to use, to check how much of your disk is free use df. To check disk usage, use du.
Exercise
Look at your disk usage and free space with both du and df.
Quiz Question
What command is use to show how much space is free on your disk?
Quiz Answer
df
Filesystem Repair
Lesson Content
Sometimes our filesystem isn't always in the best condition, if we have a sudden shutdown, our data can become corrupt. It's up to the system to try to get us back in a working state (although we sure can try ourselves).
The fsck (filesystem check) command is used to check the consistency of a filesystem and can even try to repair it for us. Usually when you boot up a disk, fsck will run before your disk is mounted to make sure everything is ok. Sometimes though, your disk is so bad that you'll need to manually do this. However, be sure to do this while you are in a rescue disk or somewhere where you can access your filesystem without it being mounted.
$ sudo fsck /dev/sda
Exercise
Look at the manpage for fsck to see what else it can do.
Quiz Question
What command is used to check the integrity of a filesystem?
Quiz Answer
fsck
Inodes
Lesson Content
Remember how our filesystem is comprised of all our actual files and a database that manages these files? The database is known as the inode table.
What is an inode?
An inode (index node) is an entry in this table and there is one for every file. It describes everything about the file, such as:
- File type - regular file, directory, character device, etc
- Owner
- Group
- Access permissions
- Timestamps - mtime (time of last file modification), ctime (time of last attribute change), atime (time of last access)
- Number of hardlinks to the file
- Size of the file
- Number of blocks allocated to the file
- Pointers to the data blocks of the file - most important!
Basically inodes store everything about the file, except the filename and the file itself!
When are inodes created?
When a filesystem is created, space for inodes is allocated as well. There are algorithms that take place to determine how much inode space you need depending on the volume of the disk and more. You've probably at some point in your life seen errors for out of disk space issues. Well the same can occur for inodes as well (although less common), you can run out of inodes and therefore be unable to create more files. Remember data storage depends on both the data and the database (inodes).
To see how many inodes are left on your system, use the command df -i
Inode information
Inodes are identified by numbers, when a file gets created it is assigned an inode number, the number is assigned in sequential order. However, you may sometimes notice when you create a new file, it gets an inode number that is lower than others, this is because once inodes are deleted, they can be reused by other files. To view inode numbers run ls -li:
pete@icebox:~$ ls -li 140 drwxr-xr-x 2 pete pete 6 Jan 20 20:13 Desktop 141 drwxr-xr-x 2 pete pete 6 Jan 20 20:01 Documents
The first field in this command lists the inode number.
You can also see detailed information about a file with stat, it tells you information about the inode as well.
pete@icebox:~$ stat ~/Desktop/ File: ‘/home/pete/Desktop/’ Size: 6 Blocks: 0 IO Block: 4096 directory Device: 806h/2054d Inode: 140 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 1000/ pete) Gid: ( 1000/ pete) Access: 2016-01-20 20:13:50.647435982 -0800 Modify: 2016-01-20 20:13:06.191675843 -0800 Change: 2016-01-20 20:13:06.191675843 -0800 Birth: -
How do inodes locate files?
We know our data is out there on the disk somewhere, unfortunately it probably wasn't stored sequentially, so we have to use inodes. Inodes point to the actual data blocks of your files. In a typical filesystem (not all work the same), each inode contains 15 pointers, the first 12 pointers point directly to the data blocks. The 13th pointer, points to a block containing pointers to more blocks, the 14th pointer points to another nested block of pointers, and the 15th pointer points yet again to another block of pointers! Confusing, I know! The reason this is done this way is to keep the inode structure the same for every inode, but be able to reference files of different sizes. If you had a small file, you could find it quicker with the first 12 direct pointers, larger files can be found with the nests of pointers. Either way the structure of the inode is the same.
Exercise
Observe some inode numbers for different files, which ones are usually created first?
Quiz Question
How do you see how many inodes are left on your system?
Quiz Answer
df -i
symlinks
Lesson Content
Let's use a previous example of inode information:
pete@icebox:~$ ls -li 140 drwxr-xr-x 2 pete pete 6 Jan 20 20:13 Desktop 141 drwxr-xr-x 2 pete pete 6 Jan 20 20:01 Documents
You may have noticed that we've been glossing over the third field in the ls command, that field is the link count. The link count is the total number of hard links a file has, well that doesn't mean anything to you right now. So let's discuss links first.
Symlinks
In the Windows operating system, there are things known as shortcuts, shortcuts are just aliases to other files. If you do something to the original file, you could potentially break the shortcut. In Linux, the equivalent of shortcuts are symbolic links (or soft links or symlinks). Symlinks allow us to link to another file by its filename. Another type of links found in Linux are hardlinks, these are actually another file with a link to an inode. Let's see what I mean in practice starting with symlinks.
pete@icebox:~/Desktop$ echo 'myfile' > myfile pete@icebox:~/Desktop$ echo 'myfile2' > myfile2 pete@icebox:~/Desktop$ echo 'myfile3' > myfile3 pete@icebox:~/Desktop$ ln -s myfile myfilelink pete@icebox:~/Desktop$ ls -li total 12 151 -rw-rw-r-- 1 pete pete 7 Jan 21 21:36 myfile 93401 -rw-rw-r-- 1 pete pete 8 Jan 21 21:36 myfile2 93402 -rw-rw-r-- 1 pete pete 8 Jan 21 21:36 myfile3 93403 lrwxrwxrwx 1 pete pete 6 Jan 21 21:39 myfilelink -> myfile
You can see that I've made a symbolic link named myfilelink that points to myfile. Symbolic links are denoted by ->. Notice how I got a new inode number though, symlinks are just files that point to filenames. When you modify a symlink, the file also gets modified. Inode numbers are unique to filesystems, you can't have two of the same inode number in a single filesystem, meaning you can't reference a file in a different filesystem by its inode number. However, if you use symlinks they do not use inode numbers, they use filenames, so they can be referenced across different filesystems.
Hardlinks
Let's see an example of a hardlink:
pete@icebox:~/Desktop$ ln myfile2 myhardlink pete@icebox:~/Desktop$ ls -li total 16 151 -rw-rw-r-- 1 pete pete 7 Jan 21 21:36 myfile 93401 -rw-rw-r-- 2 pete pete 8 Jan 21 21:36 myfile2 93402 -rw-rw-r-- 1 pete pete 8 Jan 21 21:36 myfile3 93403 lrwxrwxrwx 1 pete pete 6 Jan 21 21:39 myfilelink -> myfile 93401 -rw-rw-r-- 2 pete pete 8 Jan 21 21:36 myhardlink
A hardlink just creates another file with a link to the same inode. So if I modified the contents of myfile2 or myhardlink, the change would be seen on both, but if I deleted myfile2, the file would still be accessible through myhardlink. Here is where our link count in the ls command comes into play. The link count is the number of hardlinks that an inode has, when you remove a file, it will decrease that link count. The inode only gets deleted when all hardlinks to the inode have been deleted. When you create a file, it's link count is 1 because it is the only file that is pointing to that inode. Unlike symlinks, hardlinks do not span filesystems because inodes are unique to the filesystem.
Creating a symlink
$ ln -s myfile mylink
To create a symbolic link, you use the ln command with -s for symbolic and you specific a target file and then a link name.
Creating a hardlink
$ ln somefile somelink
Similar to a symlink creation, except this time you leave out the -s.
Exercise
Play around with making symlinks and hardlinks, delete a couple and see what happens.
Quiz Question
What is the command used to make a symlink?
Quiz Answer
ln -s
Booting
Boot Process Overview
Lesson Content
Now that we've gotten a pretty good grasp at some of the important components of Linux, let's piece them altogether by learning about how the system boots. When you turn on your machine, it does some neat things like show you the logo screen, run through some different messages and then at the end you're prompted with a login window. Well there is actually a ton of stuff happening between when you push the power button to when you login and we'll discuss those in this course.
The Linux boot process can be broken down in 4 simple stages:
1. BIOS
The BIOS (stands for "Basic Input/Output System") initializes the hardware and makes sure with a Power-on self test (POST) that all the hardware is good to go. The main job of the BIOS is to load up the bootloader.
2. Bootloader
The bootloader loads the kernel into memory and then starts the kernel with a set of kernel parameters. One of the most common bootloaders is GRUB, which is a universal Linux standard.
3. Kernel
When the kernel is loaded, it immediately initializes devices and memory. The main job of the kernel is to load up the init process.
4. Init
Remember the init process is the first process that gets started, init starts and stops essential service process on the system. There are three major implementations of init in Linux distributions. We will go over them briefly and then dive into them in another course.
There it is, the (very) simple explanation of the Linux boot process. We will go into more detail about these stages in the next lessons.
Exercise
Reboot your system and see if you can spot each step as your machine boots up.
Quiz Question
What is the last stage in the Linux boot process?
Quiz Answer
init
Boot Process: BIOS
Lesson Content
BIOS
The first step in the Linux boot process is the BIOS which performs system integrity checks. The BIOS is a firmware that comes most common in IBM PC compatible computers, the dominant type of computers out there today. You've probably used the BIOS firmware to change the boot order of your harddisks, check system time, your machine's mac address, etc. The BIOS's main goal is to find the system bootloader.
So once the BIOS boots up the hard drive, it searches for the boot block to figure out how to boot up the system. Depending on how you partition your disk, it will look to the master boot record (MBR) or GPT. The MBR is located in the first sector of the hard drive, the first 512 bytes. The MBR contains the code to load another program somewhere on the disk, this program in turn actually loads up our bootloader.
Now if you partitioned your disk with GPT, the location of the bootloader changes a bit.
UEFI
There is another way to boot up your system instead of using BIOS and that is with UEFI (stands for "Unified extensible firmware interface"). UEFI was designed to be successor to BIOS, most hardware out there today comes with UEFI firmware built in. Macintosh machines have been using EFI booting for years now and Windows has mostly moved all of their stuff over to UEFI booting. The GPT format was intended for use with EFI. You don't necessarily need EFI if you are booting a GPT disk. The first sector of a GPT disk is reserved for a "protective MBR" to make it possible to boot a BIOS-based machine.
UEFI stores all the information about startup in an .efi file. This file is stored on a special partition called EFI system partition on the hardware. Inside this partition it will contain the bootloader. UEFI comes with many improvements from the traditional BIOS firmware. However, since we are using Linux, the majority of us are using BIOS. So all of these lessons will be going along with that pretense.
Exercise
Go into your BIOS menu and see if you have UEFI booting enabled.
Quiz Question
What does the BIOS load?
Quiz Answer
bootloader
Boot Process: Bootloader
Lesson Content
The bootloader's main responsibilities are:
- Booting into an operating system, it can also be used to boot to non-Linux operating systems
- Select a kernel to use
- Specify kernel parameters
The most common bootloader for Linux is GRUB, you are most likely using it on your system. There are many other bootloaders that you can use such as LILO, efilinux, coreboot, SYSLINUX and more. However, we will just be working with GRUB as our bootloader.
So we know that the bootloader's main goal is to load up the kernel, but where does it find the kernel? To find it, we will need to look at our kernel parameters. The parameters can be found by going into the GRUB menu on startup using the 'e' key. If you don't have GRUB no worries, we'll go through the boot parameters that you will see:
- initrd - Specifies the location of initial RAM disk (we'll talk more about this in the next lesson).
- BOOT_IMAGE - This is where the kernel image is located
- root - The location of the root filesystem, the kernel searches inside this location to find init. It is often represented by it's UUID or the device name such as /dev/sda1.
- ro - This parameter is pretty standard, it mounts the fileystem as read-only mode.
- quiet - This is added so that you don't see display messages that are going on in the background during boot.
- splash - This lets the splash screen be shown.
Exercise
If you have GRUB as your bootloader, go into the GRUB menu with 'e' and take a look at the settings.
Quiz Question
What kernel parameter makes it so you don't see bootup messages?
Quiz Answer
quiet
Boot Process: Kernel
Lesson Content
So now that our bootloader has passed on the necessary parameters, let's see how it get's started:
Initrd vs Initramfs
There is a bit of a chicken and egg problem when we talk about the kernel bootup. The kernel manages our systems hardware, however not all drivers are available to the kernel during bootup. So we depend on a temporary root filesystem that contains just the essential modules that the kernel needs to get to the rest of the hardware. In older versions of Linux, this job was given to the initrd (initial ram disk). The kernel would mount the initrd, get the necessary bootup drivers, then when it was done loading everything it needed, it would replace the initrd with the actual root filesystem. These days, we have something called the initramfs, this is a temporary root filesystem that is built into the kernel itself to load all the necessary drivers for the real root filesystem, so no more locating the initrd file.
Mounting the root filesystem
Now the kernel has all the modules it needs to create a root device and mount the root partition. Before you go any further though, the root partition is actually mounted in read-only mode first so that fsck can run safely and check for system integrity. Afterwards it remounts the root filesystem in read-write mode. Then the kernel locates the init program and executes it.
Exercise
No exercises for this lesson.
Quiz Question
What is used in modern systems to load up a temporary root filesystem?
Quiz Answer
initramfs
Boot Process: Init
Lesson Content
We've discussed init in previous lessons and know that it is the first process that gets started and it starts all the other essential services on our system. But how?
There are actually three major implementations of init in Linux:
System V init (sysv)
This is the traditional init system. It sequentially starts and stops processes, based on startup scripts. The state of the machine is denoted by runlevels, each runlevel starts or stops a machine in a different way.
Upstart
This is the init you'll find on older Ubuntu installations. Upstart uses the idea of jobs and events and works by starting jobs that performs certain actions in response to events.
Systemd
This is the new standard for init, it is goal oriented. Basically you have a goal that you want to achieve and systemd tries to satisfy the goal's dependencies to complete the goal.
We have an entire course on Init systems where we will dive into each of these systems in more detail.
Exercise
No exercises for this lesson.
Quiz Question
What is the newest standard for init?
Quiz Answer
systemd
Kernel
Overview of the Kernel
Lesson Content
As you've learned up to this point, the kernel is the core of the operating system. We've talked about the other parts of the operating system but have yet to show how they all work together. The Linux operating system can be organized into three different levels of abstraction.
The most basic level is hardware, this includes our CPU, memory, hard disks, networking ports, etc. The physical layer that actually computes what our machine is doing.
The next level is the kernel, which handles process and memory management, device communication, system calls, sets up our filesystem, etc. The kernel's job is to talk to the hardware to make sure it does what we want our processes to do.
And the level that you are familiar with is the user space, the user space includes the shell, the programs that you run, the graphics, etc.
In this course, we'll be focusing on the kernel and learning its intricacies.
Exercise
No exercises for this lesson.
Quiz Question
What level of the operating system manages devices?
Quiz Answer
kernel
Privilege Levels
Lesson Content
The next few lessons get pretty theoretical, so if you're looking for some practical stuff you can skip ahead and come back later.
Why do we have different abstraction layers for user space and kernel? Why can't you combine both powers into one layer? Well there is a very good reason why these two layers exist separately. They both operate in different modes, the kernel operates in kernel mode and the user space operates in user mode.
In kernel mode, the kernel has complete access to the hardware, it controls everything. In user space mode, there is a very small amount of safe memory and CPU that you are allowed to access. Basically, when we want to do anything that involves hardware, reading data from our disks, writing data to our disks, controlling our network, etc, it is all done in kernel mode. Why is this necessary? Imagine if your machine was infected with spyware, you wouldn't want it to be able to have direct access to your system's hardware. It can access all your data, your webcam, etc. and that's no good.
These different modes are called privilege levels (aptly named for the levels of privilege you get) and are often described as protection rings. To make this picture easier to paint, let's say you find out that Britney Spears is in town at your local klerb, she's protected by her groupies, then her personal bodyguards, then the bouncer outside the klerb. You want to get her autograph (because why not?), but you can't get to her because she is heavily protected. The rings work the same way, the innermost ring corresponds to the highest privilege level. There are two main levels or modes in an x86 computer architecture. Ring #3 is the privilege that user mode applications run in, Ring #0 is the privilege that the kernel runs in. Ring #0 can execute any system instruction and is given full trust. So now that we know how those privilege levels work, how are we able to write anything to our hardware? Won't we always be in a different mode than the kernel?
The answer is with system calls, system calls allow us to perform a privileged instruction in kernel mode and then switch back to user mode.
Exercise
No exercises for this lesson.
Quiz Question
What ring number has the highest privileges?
Quiz Answer
0
System Calls
Lesson Content
Remember Britney in the previous lesson? Let's say we want to see her and get some drinks together, how do we get from standing outside in the crowds of people to inside her innermost circle? We would use system calls. System calls are like the VIP passes that get you to a secret side door that leads directly to Britney.
System calls (syscall) provide user space processes a way to request the kernel to do something for us. The kernel makes certain services available through the system call API. These services allow us to read or write to a file, modify memory usage, modify our network, etc. The amount of services are fixed, so you can't be adding system calls nilly willy, your system already has a table of what system calls exist and each system call has a unique ID.
I won't get into specifics of system calls, as that will require you to know a bit of C, but the basics is that when you call a program like ls, the code inside this program contains a system call wrapper (so not the actual system call yet). Inside this wrapper it invokes the system call which will execute a trap, this trap then gets caught by the system call handler and then references the system call in the system call table. Let's say we are trying to call the stat() system call, it's identified by a syscall ID and the purpose of the stat() system call is to query the status of a file. Now remember, you were running the ls program in non-privilege mode. So now it sees you're trying to make a syscall, it then switches you over to kernel mode, there it does lots of things but most importantly it looks up your syscall number, finds it in a table based on the syscall ID and then executes the function you wanted to run. Once it's done, it will return back to user mode and your process will receive a return status if it was successful or if it had an error. The inner workings of syscalls get really detailed, I would recommend looking at information online if you want to learn more.
You can actually view the system calls that a process makes with the strace command. The strace command is useful for debugging how a program executed.
$ strace ls
Exercise
No exercises for this lesson.
Quiz Question
What is used to switch from user mode to kernel mode?
Quiz Answer
system call
Kernel Installation
Lesson Content
Ok, now that we've got all that boring stuff out of the way, let's talk about actually installing and modifying kernels. You can install multiple kernels on your system, remember in our lesson on the boot process? In our GRUB menu we can choose which kernel to boot to.
To see what kernel version you have on your system, use the following command:
$ uname -r 3.19.0-43-generic
The uname command prints system information, the -r command will print out all of the kernel release version.
You can install the Linux kernel in different ways, you can download the source package and compile from source or you can install it using package management tools.
$ sudo apt install linux-generic-lts-vivid
and then just reboot into the kernel you installed. Simple right? Kind of, you'll need to also install other linux packages such as the linux-headers, linux-image-generic, etc). You can also specify the version number, so the above command can look like, sudo apt install 3.19.0-43-generic
Alternatively, if you just want the updated kernel version, just use dist-upgrade, it performs upgrades to all package on your system:
$ sudo apt dist-upgrade
There are many different kernel versions, some are used as LTS (long term support), some are the latest and greatest, the compatibility may be very different between kernel versions so you may want to try out different kernels.
Exercise
- Find out what kernel version you have.
- Research the different versions of kernels available
Quiz Question
How do you see the kernel version of your system?
Quiz Answer
uname -r
Kernel Location
Lesson Content
What happens when you install a new kernel? Well it actually adds a couple of files to your system, these files are usually added to the /boot directory.
You will see multiple files for different kernel versions:
- vmlinuz - this is the actual linux kernel
- initrd - as we've discussed before, the initrd is used as a temporary file system, used before loading the kernel
- System.map - symbolic lookup table
- config - kernel configuration settings, if you are compiling your own kernel, you can set which modules can be loaded
If your /boot directory runs out of space, you can always delete old versions of these files or just use a package manager, but be careful when doing maintenance in this directory and don't accidentally delete the kernel you are using.
Exercise
Go into your boot directory and see what files are in there.
Quiz Question
What is the kernel image called in /boot?
Quiz Answer
vmlinuz
Kernel Modules
Lesson Content
Let's say I have a sweet ride, I invest a lot of time and money into it. I add a spoiler, hitch, bike rack and other random things. These components don't actually change the core functionality of the car and I can remove and add them very easily. The kernel uses the same concept with kernel modules.
The kernel in itself is a monolithic piece of software, when we want to add support for a new type of keyboard, we don't write this code directly into the kernel code. Just as we wouldn't meld a bike rack to our car (well maybe some people would do that). Kernel modules are pieces of code that can be loaded and unloaded into the kernel on demand. They allow us to extend the functionality of the kernel without actually adding to the core kernel code. We can also add modules and not have to reboot the system (in most cases).
View a list of currently loaded modules
$ lsmod
Load a module
$ sudo modprobe bluetooth
Modprobe loads tries the module from /lib/modules/(kernel version)/kernel/drivers. Kernel modules may also have dependencies, modprobe loads our module dependencies if they are not already loaded.
Remove a module
$ sudo modprobe -r bluetooth
Load on bootup
You can also load modules during system boot, instead of temporarily loading them with modprobe (which will be unloaded when you reboot). Just modify the /etc/modprobe.d directory and add a configuration file in it like so:
pete@icebox:~$ /etc/modprobe.d/peanutbutter.conf options peanut_butter type=almond
A bit of a outlandish example, but if you had a module named peanut_butter and you wanted to add a kernel parameter for type=almond, you can have it load on startup using this configuration file. Also note that kernel modules have their own kernel parameters so you'll want to read about the module specifically to find out more.
Do not load on bootup
You can also make sure a module does not load on bootup by adding a configuration file like so:
pete@icebox:~$ /etc/modprobe.d/peanutbutter.conf blacklist peanut_butter
Exercise
Unload your bluetooth module with modprobe and see what happens. How will you fix this?
Quiz Question
What command is used to unload a module?
Quiz Answer
modprobe -r
init
System V Overview
Lesson Content
The main purpose of init is to start and stop essential processes on the system. There are three major implementations of init in Linux, System V, Upstart and systemd. In this lesson, we're going to go over the most traditional version of init, System V init or Sys V (pronounced as 'System Five').
To find out if you are using the Sys V init implementation, if you have an /etc/inittab file you are most likely running Sys V.
Sys V starts and stops processes sequentially, so let's say if you wanted to start up a service named foo-a, well before foo-b can work, you have to make sure foo-a is already running. Sys V does that with scripts, these scripts start and stop services for us, we can write our own scripts or most of the time use the ones that are already built in the operating system and are used to load essential services.
The pros of using this implementation of init, is that it's relatively easy to solve dependencies, since you know foo-a comes before foo-b, however performance isn't great because usually one thing is starting or stopping at a time.
When using Sys V, the state of the machine is defined by runlevels which are set from 0 to 6. These different modes will vary depending on the distribution, but most of the time will look like the following:
- 0: Shutdown
- 1: Single User Mode
- 2: Multiuser mode without networking
- 3: Multiuser mode with networking
- 4: Unused
- 5: Multiuser mode with networking and GUI
- 6: Reboot
When your system starts up, it looks to see what runlevel you are in and executes scripts located inside that runlevel configuration. The scripts are located in /etc/rc.d/rc[runlevel number].d/ or /etc/init.d. Scripts that start with S(start) or K(kill) will run on startup and shutdown, respectively. The numbers next to these characters are the sequence they run in.
For example:
pete@icebox:/etc/rc.d/rc0.d$ ls K10updates K80openvpn
We see when we switch to runlevel 0 or shutdown mode, our machine will try to run a script to kill the updates services and then openvpn. To find out what runlevel your machine is booting into, you can see the default runlevel in the /etc/inittab file. You can also change your default runlevel in this file as well.
One thing to note, System V is slowly getting replaced, maybe not today, or even years from now. However, you may see runlevels come up in other init implementations, this is primarily to support those services that are only started or stopped using System V init scripts.
Exercise
If you are running System V, change the default runlevel of your machine to something else and see what happens.
Quiz Question
What runlevel is usually used for shutdown?
Quiz Answer
0
System V Service
Lesson Content
There are many command line tools you can use to manage Sys V services.
List services
$ service --status-all
Start a service
$ sudo service networking start
Stop a service
$ sudo service networking stop
Restart a service
$ sudo service networking restart
These commands aren't specific to Sys V init systems, you can use these commands to manage Upstart services as well. Since Linux is trying to move away from the more traditional Sys V init scripts, there are still things in place to help that transition.
Exercise
Manage a couple of services and change their states, what do you observe?
Quiz Question
What is the command to stop a service named peanut with Sys V?
Quiz Answer
sudo service peanut stop
Upstart Overview
Lesson Content
Upstart was developed by Canonical, so it was the init implementation on Ubuntu for a while, however on modern Ubuntu installations systemd is now used. Upstart was created to improve upon the issues with Sys V, such as the strict startup processes, blocking of tasks, etc. Upstart's event and job driven model allow it to respond to events as they happen.
To find out if you are using Upstart, if you have a /usr/share/upstart directory that's a pretty good indicator.
Jobs are the actions that Upstart performs and events are messages that are received from other processes to trigger jobs. To see a list of jobs and their configuration:
pete@icebox:~$ ls /etc/init acpid.conf mountnfs.sh.conf alsa-restore.conf mtab.sh.conf alsa-state.conf networking.conf alsa-store.conf network-interface.conf anacron.conf network-interface-container.conf
Inside these job configurations, it'll include information on how to start jobs and when to start jobs.
For example, in the networking.conf file, it could say something as simple as:
start on runlevel [235] stop on runlevel [0]
This means that it will start setting up networking on runlevel 2, 3 or 5 and will stop networking on runlevel 0. There are many ways to write the configuration file and you'll discover that when you look at the different job configurations available.
The way that Upstart works is that:
- First, it loads up the job configurations from /etc/init
- Once a startup event occurs, it will run jobs triggered by that event.
- These jobs will make new events and then those events will trigger more jobs
- Upstart continues to do this until it completes all the necessary jobs
Exercise
If you are running Upstart, see if you can make sense of the job configurations in /etc/init.
Quiz Question
What is the init implementation that is used by Ubuntu?
Quiz Answer
upstart
systemd overview
systemd goals
power states
Process utilization
Tracking processes: top
Lesson Content
In this course, we'll go over how to read and analyze the resource utilization on your system, this lesson shows some great tools to use when you need to track what a process is doing.
top
We've discussed top before, but we're going to dig into the specifics of what it's actually displaying. Remember top is the tool we used to get a real time view of the system utilization by our processes:
top - 18:06:26 up 6 days, 4:07, 2 users, load average: 0.92, 0.62, 0.59 Tasks: 389 total, 1 running, 387 sleeping, 0 stopped, 1 zombie %Cpu(s): 1.8 us, 0.4 sy, 0.0 ni, 97.6 id, 0.1 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem: 32870888 total, 27467976 used, 5402912 free, 518808 buffers KiB Swap: 33480700 total, 39892 used, 33440808 free. 19454152 cached Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 6675 patty 20 0 1731472 520960 30876 S 8.3 1.6 160:24.79 chrome 6926 patty 20 0 935888 163456 25576 S 4.3 0.5 5:28.13 chrome
Let's go over what this output means, you don't have to memorize this, but come back to this when you need a reference.
1st line: This is the same information you would see if you ran the uptime command (more to come)
The fields are from left to right:
- Current time
- How long the system has been running
- How many users are currently logged on
- System load average (more to come)
2nd line: Tasks that are running, sleeping, stopped and zombied
3rd line: Cpu information
- us: user CPU time - Percentage of CPU time spent running users’ processes that aren’t niced.
- sy: system CPU time - Percentage of CPU time spent running the kernel and kernel processes
- ni: nice CPU time - Percentage of CPU time spent running niced processes
- id: CPU idle time - Percentage of CPU time that is spent idle
- wa: I/O wait - Percentage of CPU time that is spent waiting for I/O. If this value is low, the problem probably isn’t disk or network I/O
- hi: hardware interrupts - Percentage of CPU time spent serving hardware interrupts
- si: software interrupts - Percentage of CPU time spent serving software interrupts
- st: steal time - If you are running virtual machines, this is the percentage of CPU time that was stolen from you for other tasks
4th and 5th line: Memory Usage and Swap Usage
Processes List that are Currently in Use
- PID: Id of the process
- USER: user that is the owner of the process
- PR: Priority of process
- NI: The nice value
- VIRT: Virtual memory used by the process
- RES: Physical memory used from the process
- SHR: Shared memory of the process
- S: Indicates the status of the process: S=sleep, R=running, Z=zombie,D=uninterruptible,T=stopped
- %CPU - this is the percent of CPU used by this process
- %MEM - percentage of RAM used by this process
- TIME+ - total time of activity of this process
- COMMAND - name of the process
You can also specify a process ID if you just want to track certain processes:
$ top -p 1
Exercise
Play around with the top command and see what processes are using the most resources.
Quiz Question
What command displays the same output as the first line in top?
Quiz Answer
uptime
lsof and fuser
Lesson Content
Let's say you plugged in a USB drive and starting working on some files, once you were done, you go and unmount the USB device and you're getting an error "Device or Resource Busy". How would you find out which files in the USB drive are still in use? There are actually two tools you can use for this:
lsof
Remember files aren't just text files, images, etc, they are everything on the system, disks, pipes, network sockets, devices, etc. To see what is in use by a process, you can use the lsof command (short for "list open files") this will show you a list of all the open files and their associated process.
pete@icebox:~$ lsof . COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME lxsession 1491 pete cwd DIR 8,6 4096 131 . update-no 1796 pete cwd DIR 8,6 4096 131 . nm-applet 1804 pete cwd DIR 8,6 4096 131 . indicator 1809 pete cwd DIR 8,6 4096 131 . xterm 2205 pete cwd DIR 8,6 4096 131 . bash 2207 pete cwd DIR 8,6 4096 131 . lsof 5914 pete cwd DIR 8,6 4096 131 . lsof 5915 pete cwd DIR 8,6 4096 131 .
Now I can see what processes are currently holding the device/file open. In our USB example, you can also kill these processes so we can unmount this pesky drive.
fuser
Another way to track a process is the fuser command (short for "file user"), this will show you information about the process that is using the file or the file user.
pete@icebox:~$ fuser -v . USER PID ACCESS COMMAND /home/pete: pete 1491 ..c.. lxsession pete 1796 ..c.. update-notifier pete 1804 ..c.. nm-applet pete 1809 ..c.. indicator-power pete 2205 ..c.. xterm pete 2207 ..c.. bash
We can see which processes are currently using our /home/pete directory. The lsof and fuser tools are very similar, familiarize yourself with these tools and try using them next time you need to track a file or process down.
Exercise
Read the manpages for lsof and fuser, there is a lot of information that we didn't cover that allows you to have greater flexibility with these tools.
Quiz Question
What command is used to list open files and their process information?
Quiz Answer
lsof
Process Threads
Lesson Content
You may have heard of the terms single-threaded and multi-threaded processes. Threads are very similar to processes, in that they are used to execute the same program, they are often referred to as lightweight processes. If a process has one thread it is single-threaded and if a process has more than one thread it is multi-threaded. However, all processes have at least one thread.
Processes operate with their own isolated system resources, however threads can share these resources among each other easily, making it easier for them to communicate among each other and at times it is more efficient to have a multi-threaded application than a multi-process application.
Basically, let's say you open up LibreOffice Writer and Chrome, each is it's own separate process. Now you go inside Writer and start editing text, when you edit the text it gets automatically saved. These two parallel "lightweight processes" of saving and editing are threads.
To view process threads, you can use:
pete@icebox:~$ ps m PID TTY STAT TIME COMMAND 2207 pts/2 - 0:01 bash - - Ss 0:01 - 5252 pts/2 - 0:00 ps m - - R+ 0:00 -
The processes are denoted with each PID and underneath the processes are their threads (denoted by a --). So you can see that the processes above are both single-threaded.
Exercise
Run the ps m command and see what processes you have running are multi-threaded.
Quiz Question
True or false, all processes start out single-threaded.
Quiz Answer
True
CPU Monitoring
Lesson Content
Let's go over a useful command, uptime.
pete@icebox:~$ uptime 17:23:35 up 1 day, 5:59, 2 users, load average: 0.00, 0.02, 0.05
We talked about uptime in the first lesson of this course, but we haven't gone over the load average field. Load averages are good way to see the CPU load on your system. These numbers represent the average CPU load in 1, 5, and 15 minute intervals. What do I mean by CPU load, the CPU load is the average number of processes that are waiting to be executed by the CPU.
Let's say you have a single-core CPU, think of this core as a single lane in traffic. If it's rush hour on the freeway, this lane is gonna be really busy and traffic is gonna be at 100% or a load of 1. Now the traffic has become so bad, it's backing up the freeway and getting the regular roads busy by twice the amount of cars, we can say that your load is 200% or a load of 2. Now let's say it clears up a bit and there are only half as many cars on the freeway lane, we can say the load of the lane is 0.5. When traffic is non-existent and we can get home quicker, the load should ideally be very low, like 2am traffic low. The cars in this case are processes and these processes are just waiting to get off the freeway and get home.
Now just because you have a load average of 1 doesn't mean your computer is slogging around. Most modern machines these days have multiple cores. If you had a quad core processor (4 cores) and your load average is 1, it's really just affecting 25% of your CPU. Think of each core as a lane in traffic. You can view the amount of cores you have on your system with cat /proc/cpuinfo.
When observing load average, you have to take the number of cores into account, if you find that your machine is always using an above average load, there could something wrong going on.
Exercise
Check the load average of your system and see what it's doing.
Quiz Question
What command can you use to see the load average?
Quiz Answer
uptime
I/O Monitoring
Lesson Content
We can also monitor CPU usage as well as monitor disk usage with a handy tool known as iostat
pete@icebox:~$ iostat Linux 3.13.0-39-lowlatency (icebox) 01/28/2016 _i686_ (1 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 0.13 0.03 0.50 0.01 0.00 99.33 Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn sda 0.17 3.49 1.92 385106 212417
The first part is the CPU information:
- %user - Show the percentage of CPU utilization that occurred while executing at the user level (application)
- %nice - Show the percentage of CPU utilization that occurred while executing at the user level with nice priority.user CPU utilization with nice priorities
- %system - Show the percentage of CPU utilization that occurred while executing at the system level (kernel).
- %iowait - Show the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
- %steal - Show the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
- %idle - Show the percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.
The second part is the disk utilization:
- tps - Indicate the number of transfers per second that were issued to the device. A transfer is an I/O request to the device. Multiple logical requests can be combined into a single I/O request to the device. A transfer is of indeterminate size.
- kB_read/s - Indicate the amount of data read from the device expressed in kilobytes per second.
- kB_wrtn/s - Indicate the amount of data written to the device expressed in kilobytes per second.
- kB_read - The total number of kilobytes read.
- kB_wrtn - The total number of kilobytes written.
Exercise
Use iostat to view your disk usage.
Quiz Question
What command can be used to view I/O and CPU usage?
Quiz Answer
iostat
Memory Monitoring
Lesson Content
In addition to CPU monitoring and I/O monitoring you can monitor your memory usage with vmstat
pete@icebox:~$ vmstat procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 1 0 0 396528 38816 384036 0 0 4 2 38 79 0 0 99 0 0
The fields are as follows:
procs
- r - Number of processes for run time
- b - Number of processes in uninterruptible sleep
memory
- swpd - Amount of virtual memory used
- free - Amount of free memory
- buff - Amount of memory used as buffers
- cache - Amount of memory used as cache
swap
- si - Amount of memory swapped in from disk
- so - Amount of memory swapped out to disk
io
- bi - Amount of blocks received in from a block device
- bo - Amount of blocks sent out to a block device
system
- in - Number of interrupts per second
- cs - Number of context switches per second
cpu
- us - Time spent in user time
- sy - Time spent in kernel time
- id - Time spent idle
- wa - Time spent waiting for IO
Exercise
Look at your memory usage with vmstat.
Quiz Question
What tool is used to view memory utilization?
Quiz Answer
vmstat
Continuous Monitoring
Lesson Content
These monitoring tools are good to look at when your machine is having issues, but what about machines that are having issues when you aren't looking. For those, you'll need to use a continuous monitoring tool, something that will collect, report and save your system activity information. In this lesson we will go over a great tool to use sar.
Installing sar Sar is a tool that is used to do historical analysis on your system, first make sure you have it installed by installing the sysstat package sudo apt install sysstat.
Setting up data collection Usually once you install sysstat, your system will automatically start collecting data, if it doesn't you can enable it by modifying the ENABLED field in /etc/default/sysstat.
Using sar
$ sudo sar -q
This command will list the details from the start of the day.
$ sudo sar -r
This will list the details of memory usage from the start of the day.
$ sudo sar -P
This will list the details of CPU usage.
To see a view of a different day, you can go into /var/log/sysstat/saXX where XX is the day you want to view.
$sar -q /var/log/sysstat/sa02
Exercise
Install sar on your system and start collecting and analyzing your system resource utilization.
Quiz Question
What is a good tool to use for monitoring system resources?
Quiz Answer
sar
Cron Jobs
Lesson Content
Although we have been talking about resource utilization, I think this would be a good point to mention a neat tool in Linux that is used to schedule tasks using cron. There is a service that runs programs for you at whatever time you schedule. This is a really useful if you have a script you want to run once a day that needs to execute something for you.
For example, let's say I have a script located in /home/pete/scripts/change_wallpaper. I use this script every morning to change the picture I use for my wallpaper, but each morning I have to manually execute this script. Instead what I can do is create a cron job that executes my script through cron. I can specify the time I want this cron job to run and execute my script.
30 08 * * * /home/pete/scripts/change_wallpaper
The fields are as follows from left to right:
- Minute - (0-59)
- Hour - (0-23)
- Day of the month - (1-31)
- Month - (1-12)
- Day of the week - (0-7). 0 and 7 are denoted as Sunday
The asterisk in the field means to match every value. So in my above example, I want this to run every day in every month at 8:30am.
To create a cronjob, just edit the crontab file:
crontab -e
Exercise
Create a cronjob that you want to run at a scheduled time.
Quiz Question
What is the command to edit your cronjobs?
Quiz Answer
crontab -e
Logging
System Logging
Lesson Content
The services, kernel, daemons, etc on your system are constantly doing something, this data is actually sent to be saved on your system in the form of logs. This allows us to have a human readable journal of the events that are happening on our system. This data is usually kept in the /var directory, the /var directory is where we keep our variable data, such as logs!
How are these messages even getting received on your system? There is a service called syslog that sends this information to the system logger.
Syslog actually contains many components, one of the important ones is a daemon running called syslogd (newer Linux distributions use rsyslogd), that waits for event messages to occur and filter the ones it wants to know about, and depending on what it's supposed to do with that message, it will send it to a file, your console or do nothing with it.
You would think that this system logger is the centralized place to manage logs, but unfortunately it's not. You'll see many applications that write their own logging rules and generate different log files, however in general the format of logs should include a timestamp and the event details.
Here is an example of a line from syslog:
pete@icebox:~$ less /var/log/syslog Jan 27 07:41:32 icebox anacron[4650]: Job `cron.weekly' started
Here we can see that at Jan 27 07:41:32 our cron service ran the cron.weekly job. You can view all the event messages that syslog collects with in the /var/log/syslog file.
Exercise
Look at your /var/log/syslog file and see what else is happening on your machine.
Quiz Question
What is the daemon that manages log on newer Linux systems?
Quiz Answer
rsyslogd
syslog
Lesson Content
The syslog service manages and sends logs to the system logger. Rsyslog is an advanced version of syslog, most Linux distributions should be using this new version. The output of all the logs the syslog service collects can be found at /var/log/syslog (every message except auth messages).
To find out what files are maintained by our system logger, look at the configuration files in /etc/rsyslog.d:
pete@icebox:~$ less /etc/rsyslog.d/50-default.conf # First some standard log files. Log by facility. # auth,authpriv.* /var/log/auth.log *.*;auth,authpriv.none -/var/log/syslog #cron.* /var/log/cron.log #daemon.* -/var/log/daemon.log kern.* -/var/log/kern.log #lpr.* -/var/log/lpr.log mail.* -/var/log/mail.log #user.* -/var/log/user.log
These rules to log files are denoted by the selector on the left column and the action on the right column. The action tells us where to send the log information, in a file, console, etc. Remember not every application and service uses rsyslog to manage their logs, so if you want to know specifically what is logged you'll have to look inside this directory.
Let's actually see logging in action, you can manually send a log with the logger command:
logger -s Hello
Now look inside your /var/log/syslog and you should see this entry in your logs!
Exercise
Look at your /etc/rsyslog.d configuration file and see what else is being logged via the system logger.
Quiz Question
What command can you use to manually log a message?
Quiz Answer
logger
General Logging
Lesson Content
There are many log files you can view on your system, many important ones can be found under /var/log. We won't go through them all, but we'll discuss a couple of the major ones.
There are two general log files you can view to get a glimpse of what your system is doing:
/var/log/messages
This log contains all non-critical and non-debug messages, includes messages logged during bootup (dmesg), auth, cron, daemon, etc. Very useful to get a glimpse of how your machine is acting.
/var/log/syslog
This logs everything except auth messages, it's extremely useful for debugging errors on your machine.
These two logs should be more than enough when troubleshooting issues with your system, However, if you just want to view a specific log component, there are also separate logs for those as well.
Exercise
Look at your /var/log/messages and /var/log/syslog files and see what the differences are.
Quiz Question
What log file logs everything except auth messages?
Quiz Answer
syslog
Kernel Logging
Lesson Content
/var/log/dmesg On boot-time your system logs information about the kernel ring buffer. This shows us information about hardware drivers, kernel information and status during bootup and more. This log file can be found at /var/log/dmesg and gets reset on every boot, you may not actually see any use in it now, but if you were to ever have issues with something during bootup or a hardware issue, dmesg is the best place to look. You can also view this log using the dmesg command.
/var/log/kern.log Another log you can use to view kernel information is the /var/log/kern.log file, this logs the kernel information and events on your system, it also logs dmesg output.
Exercise
Look at your dmesg and kern logs, what differences do you notice?
Quiz Question
What command can be used to view kernel bootup messages?
Quiz Answer
dmesg
authenticating logging
Managing Log Files
Lesson Content
Log files generate lots of data and they store this data on your hard disks, however there are lots of issues with this, for the most part we just want to be able to see newer logs, we also want to manage our disk space efficiently, so how do we do all of this? The answer is with logrotate.
The logrotate utility does log management for us. It has a configuration file that allows us to specify how many and what logs to keep, how to compress our logs to save space and more. The logrotate tool is usually run out of cron once a day and the configuration files can be found in /etc/logrotate.d.
There are other logrotating tools you can use to manage your logs, but logrotate is the most common one.
Exercise
Look at your logrotate configuration file and see how it manages some of your logs.
Quiz Question
What utility is used to manage logs?
Quiz Answer
logrotate
Network Sharing
File Sharing Overview
Lesson Content
You usually are not the only computer on your network, this is especially the case if you're working in a commercial environment. When we want to transfer data from one machine to another, sometimes it maybe easier to connect a USB drive and manually copy them. But for the most part, if you're working with machines on the same network, the way to transfer data is through network file sharing.
In this course we'll go over a couple of different methods to copy data to and from different machines on your network. We'll discuss some simple file copies, then we'll talk about mounting entire directories on your machine that act as a separate drive.
One simple file sharing tool is the scp command. The scp command stands for secure copy, it works exactly the way the cp command does, but allows you to copy from one host over to another host on the same network. It works via ssh so all your actions are using the same authentication and security as ssh.
To copy a file over from local host to a remote host
$ scp myfile.txt username@remotehost.com:/remote/directory
To copy a file from a remote host to your local host
$ scp username@remotehost.com:/remote/directory/myfile.txt /local/directory
To copy over a directory from your local host to a remote host
$ scp -r mydir username@remotehost.com:/remote/directory
Exercise
Try to copy a file over with scp from one machine to another.
Quiz Question
What command can you use to securely copy files from one host to another?
Quiz Answer
scp
rsync
Lesson Content
Another tool used to copy data from different hosts is rsync (short for remote synchronization). Rsync is very similar to scp, but it does have a major difference. Rsync uses a special algorithm that checks in advanced if there is already data that you are copying to and will only copy over the differences. For example, let's say that you were copying over a file and your network got interrupted, therefore your copy stopped midway. Instead of re-copying everything from the beginning, rsync will only copy over the parts that didn't get copied.
It also verifies the integrity of a file you are copying over with checksums. These small optimizations allow greater file transfer flexibility and makes rsync ideal for directory synchronization remotely and locally, data backups, large data transfers and more.
Some commonly-used rsync options:
- v - verbose output
- r - recursive into directories
- h - human readable output
- z - compressed for easier transfer, great for slow connections
Copy/sync files on the same host
$ rsync -zvr /my/local/directory/one /my/local/directory/two
Copy/sync files to local host from a remote host
$ rsync /local/directory username@remotehost.com:/remote/directory
Copy/sync files to a remote host from a local host
$ rsync username@remotehost.com:/remote/directory /local/directory
Exercise
Use rsync to sync a directory to another directory, be sure not to overwrite an important directory!
Quiz Question
What command would be useful for data backups?
Quiz Answer
rsync
Simple HTTP Server
Lesson Content
Python has a super useful tool for serving files over HTTP. This is great if you just want to create a quick network share that other machines on your network can access. To do that just go to the directory you want to share and run:
$ python -m SimpleHTTPServer
This sets up a basic webserver that you can access via the localhost address. So grab the IP address of the machine you ran this on and then on another machine access it in the browser with: http://IP_ADDRESS:8000. On your own machine, you can view the files available by typing: http://localhost:8000 in your web browser.
You can also do this with node or if you are running Python 3, the syntax will be a little bit different.
Exercise
Try setting up a SimpleHTTPServer!
Quiz Question
What tool can you use to create a simple http server with python?
Quiz Answer
SimpleHTTPServer
NFS
Lesson Content
The most standard network file share for Linux is NFS (Network File System), NFS allows a server to share directories and files with one or more clients over the network.
We won't get into the details of how to create an NFS server as it can get complex, however we will discuss setting up NFS clients.
Setting up NFS client
$ sudo service nfsclient start $ sudo mount server:/directory /mount_directory
Automounting
Let's say you use the NFS server quite often and you want to keep it permanently mounted, normally you think you'd edit the /etc/fstab file, but you may not always get a connection to the server and that can cause issues on bootup. Instead what you want to do is setup automounting so that you can connect to the NFS server when you need to. This is done with the automount tool or in recent versions of Linux amd. When a file is accessed in a specified directory, automount will look up the remote server and automatically mount it.
Exercise
Read the manpage for NFS to learn more.
Quiz Question
What tool is used to manage mount points automatically?
Quiz Answer
automount
Samba
Lesson Content
In the early days of computing, it became necessary for Windows machines to share files with Linux machines, thus the Server Message Block (SMB) protocol was born. SMB was used for sharing files between Windows operating systems (Mac also has file sharing with SMB) and then it was later cleaned up and optimized in the form of the Common Internet File System (CIFS) protocol.
Samba is what we call the Linux utilities to work with CIFS on Linux. In addition to file sharing, you can also share resources like printers.
Create a network share with Samba
Let's go through the basic steps to create a network share that a Windows machine can access:
Install Samba
$ sudo apt update $ sudo apt install samba
Setup smb.conf
The configuration file for Samba is found at /etc/samba/smb.conf, this file should tell the system what directories should be shared, their access permissions, and more options. The default smb.conf comes with lots of commented code already and you can use those as an example to write your own configurations.
$ sudo vi /etc/samba/smb.conf
Setup up a password for Samba
$ sudo smbpasswd -a [username]
Create a shared directory
$ mkdir /my/directory/to/share
Restart the Samba service
$ sudo service smbd restart
Accessing a Samba share via Windows
In Windows, just type in the network connection in the run prompt: \HOST\sharename.
Accessing a Samba/Windows share via Linux
$ smbclient //HOST/directory -U user
The Samba package includes a command line tool called smbclient that you can use to access any Windows or Samba server. Once you're connected to the share you can navigate and transfer files.
Attach a Samba share to your system
Instead of transferring files one by one, you can just mount the network share on your system.
$ sudo mount -t cifs servername:directory mountpount -o user=username,pass=password
Exercise
Setup a Samba share, if you don't have one, open up smb.conf and familiarize yourself with the options in the config file.
Quiz Question
What is the latest protocol used for file transfer between Windows and Linux?
Quiz Answer
CIFS
Network Fundamentals
Network Basics
Lesson Content
Let's look at a typical home network, you have a few different components.
- ISP - Your internet service provider, the company you pay to get Internet at your house.
- Router - The router allows each machine on your network to connect to the Internet. In most modern routers, you can connect via wireless or an Ethernet cable.
- WAN - Wide Area Network, this is what we call the network that encompasses everything between your router and a wider network such the Internet.
- WLAN - Wireless Local Area Network, this is the network between your router and any wireless devices you may have such as laptops.
- LAN - Local Area Network, this is the network between your router and any wired devices such as Desktop PCs.
- Hosts - Each machine on a network is known as a host.
The data and information that gets transmitted through networks are known as packets and by the end of the Networking Nomad section, you'll understand in detail how a packet travels to and from hosts.
Exercise
No exercises for this lesson.
Quiz Question
What is the local area network known as?
Quiz Answer
LAN
OSI Model
Lesson Content
Before we can look at some practical networking stuff, we have to go over some boring jargon that you've probably heard of before. The OSI (Open Systems Interconnection) model is a theoretical model of networking. This model shows us how a packet traverses through a network in seven different layers. I won't get into specifics of this model, since most of these networking courses will be focused on the TCP/IP model, but it should be mentioned that such a theoretical networking model exists and has actually played a large part in the TCP/IP networking model that we use today.
Exercise
Read more about the OSI model: https://en.wikipedia.org/wiki/OSI_model
Quiz Question
What is used as the theoretical model of networking?
Quiz Answer
OSI
TCP/IP Model
Lesson Content
The OSI model gave birth to what eventually became the TCP/IP model and this model is actually what the Internet is based off of. It is the actual implementation of networking. The TCP/IP model uses the TCP/IP protocol suite, which we just commonly refer to as TCP/IP. These protocols work together to specify how data should be gathered, addressed, transmitted and routed through a network. Using the TCP/IP model, we can see how these protocols are used to show the breakdown of how a packet travels through the network.
Application Layer
The top layer of the TCP/IP model. It determines how your computer's programs (such as your web browser) interface with the transport layer services to view the data that gets sent or received.
This layer uses:
- HTTP (Hypertext Transfer Protocol) - used for the webpages on the Internet.
- SMTP (Simple Mail Transfer Protocol) - electronic mail (email) transmission
Transport Layer
How data will be transmitted, includes checking the correct ports, the integrity of the data, and basically delivering our packets.
This layer uses:
- TCP (Transmission Control Protocol) - reliable data delivery
- UDP (User Datagram Protocol) - unreliable data delivery
Network Layer
This layers specifies how to move packets between hosts and across networks.
This layer uses:
- IP (Internet Protocol) - Helps route packets from one machine to another.
- ICMP (Internet Control Message Protocol) - Helps tell us what is going on, such as error messages and debugging information.
Link Layer
This layer specifies how to send data across a physical piece of hardware. Such as data travelling through Ethernet, fiber, etc.
The lists above of protocols each layer uses is not extensive and you'll encounter many other protocols that come into play.
In the following lessons, we will dive through each of these layers and discuss how our packet traverses through the network in the eyes of the TCP/IP model (there are many perspectives on how a packet travels across networks, we won't look at them all, but be aware that they exist).
Exercise
No exercises for this lesson.
Quiz Question
What is the top layer of the TCP/IP model?
Quiz Answer
Application
Network Addressing
Lesson Content
Before we jump into seeing how a packet moves across a network, we have to familiarize ourselves with some terminology. When you mail a letter, you must know who it is being sent to and where it is coming from. Packets need the same information, our hosts and other hosts are identified using MAC (media access control) addresses and IP addresses, to make it easier on us humans we use hostnames to identify a host.
MAC Addresses
A MAC address is a unique identifier used as a hardware address. This address will never change. When you want to get access to the Internet, your machine needs to have a device called a network interface card. This network adapter has its own hardware address that's used to identify your machine. A MAC address for an Ethernet device looks something like this 00:C4:B5:45:B2:43. MAC addresses are given to network adapters when they are manufactured. Each manufacturer has an organizationally unique identifier (OUI) to identify them as the manufacturer. This OUI is denoted by the first 3 bytes of the MAC address. For example, Dell has 00-14-22, so a network adapter from Dell could have a MAC address like: 00-14-22-34-B2-C2.
IP Addresses
An IP Address is used to identify a device on a network, they are hardware independent and can vary in syntax depending on if you are using IPv4 or IPv6 (more on this later). For now we'll assume you are using IPv4, so a typical IP address would look like: 10.24.12.4. IP addresses are used with the software side of networking. Anytime a system is connected to the Internet it should have an IP address. They can also change if your network changes and are unique to the entire Internet (this isn't always the case once we learn about NAT).
Remember it takes both software and hardware to move packets across networks, so we have two identifiers for each, MAC (hardware) and IP (software).
Hostnames
One last way to identify your machines is through hostname. Hostnames take your IP address and allow you to tie that address to a human readable name. Instead of remembering 192.12.41.4 you can just remember myhost.com.
Exercise
No exercises for this lesson.
Quiz Question
How many bytes are in an IPv4 address?
Quiz Answer
4
Application Layer
Lesson Content
Let's say I wanted to send an email to Patty. We'll go through each of the TCP/IP layers to see this in action.
Remember that packets are used to transmit data across networks, a packet consists of a header and payload. The header contains information about where the packet is going and where it came from. The payload is the actual data that is being transferred. As our packet traverses the network, each layer adds a bit of information to the header of the packet. Also keep in mind that different layers use a different term for our "packet". In the transport layer we essentially encapsulate our data in a segment and in the link layer we refer to this as a frame, but just know that packet can be used in regards to the same thing.
First we start off in the application layer. When we send our email through our email client, the application layer will encapsulate this data. The application layer talks to the transport layer through a specified port and through this port it sends its data. We want to send an email through the application layer protocol SMTP (simple mail transfer protocol). The data is sent through our transport protocol which opens a connection to this port (port 25 is used for SMTP), so we get this data sent through this port and that data is sent to the Transport layer to be encapsulated into segments.
Exercise
No exercises for this lesson.
Quiz Question
What layer is used to present the packet data in a user friendly format?
Quiz Answer
Application
Transport Layer
Lesson Content
The transports layer helps us transfer our data in a way networks can read it. It breaks our data into chunks that will be transported and put back together in the correct order. These chunks are known as segments. Segments make it easier to transport data across networks.
Ports
Even though we know where we are sending our data via IP addresses, they aren't specific enough to send our data to a certain processes or services. Services such as HTTP use a communication channel via ports. If we want to send webpage data, we need to send it over the HTTP port (port 80). In addition to forming segments, the transport layer will also attach the source and destination ports to the segment, so when the receiver gets the final packet it will know what port to use.
UDP
There are two popular transport protocols UDP and TCP. We'll briefly discuss UDP and spend most of our time on TCP, since it's the most commonly used.
UDP is not a reliable method of transporting data, in fact it doesn't really care if you get all of your original data. This may sound terrible, but it does have its uses, such as for media streaming, it's ok if you lose some frames in return you get your data a little faster.
TCP
TCP provides a reliable connection-oriented stream of data. TCP uses ports to send data to and from hosts. An application opens up a connection from one port on its host to another port on a remote host. In order to establish the connection, we use the TCP handshake.
- The client (connecting process) sends a SYN segment to the server to request a connection
- Server sends the client a SYN-ACK segment to acknowledge the client's connection request
- Client sends an ACK to the server to acknowledge the server's connection request
Once this connection is established, data can be exchanged over a TCP connection. The data is sent over in different segments and are tracked with TCP sequence numbers so they can be arranged in the correct order when they are delivered. In our email example, the transport layer attaches the destination port (25) to the source port of the source host.
Exercise
No exercises for this lesson.
Quiz Question
What is a reliable transport protocol?
Quiz Answer
TCP
Network Layer
Lesson Content
The Network layer determines the routing of our packets from our source host to a destination host. Fortunately in our example, our packet is only traveling within the same network, but the Internet is made up of many networks. These smaller networks that make up the Internet are known as subnets. All subnets connect to each other in some way, which is why we are able to get to www.google.com even though it's on its own network. I won't go into detail as we have a whole course dedicated to subnets, but for now in regards to our Network layer, know that the IP addresses define the rules to travel to different subnets.
In the network layer, it receives the segment coming from the transport layer and encapsulates this segment in an IP packet then attaches the IP address of the source host and the IP address of the destination host to the packet header. So at this point, our packet has information about where it is going and where it came from. Now it sends our packet to the physical hardware layer.
Exercise
No exercises for this lesson.
Quiz Question
What are smaller networks that make up the Internet called?
Quiz Answer
subnets
Link Layer
Lesson Content
At the bottom of the TCP/IP model sits the Link Layer. This layer is the hardware specific layer.
In the link layer, our packet is encapsulated once more into something called a frame. The frame header attaches the source and destination MAC addresses of our hosts, checksums and packet separators so that the receiver can tell when a packet ends.
Fortunately we are on the same network, so our packet won't have to travel too far. First, the link layer attaches my source MAC address to the frame header, but it needs to know Patty's MAC address as well. How does it know that and how do I find it since it's not on the Internet? We use ARP!
ARP (Address Resolution Protocol)
ARP finds the MAC address associated with an IP address. ARP is used within the same network. If Patty was not on the same network, we would use a routing system to determine the next router that would receive the packet and once we were on the same network, we could use ARP.
Once we are on the same network, systems first use the ARP look-up table that stores information about what IP addresses are associated with what MAC address. If the value is not there, then ARP is used. Then the system will send a broadcast message to the network using the ARP protocol to find out which host has IP 10.10.1.4. A broadcast message is a special message that is sent to all hosts on a network (aptly named for sending a broadcast). Any machine with the requested IP address will reply with an ARP packet containing the IP address and the MAC address.
Now that we have all the necessary data we need, IP address and MAC addresses, our link layer forwards this frame through our network interface card, out to the next device and finds Patty's network. This step is a little more complex than how I just explained it, but we will discuss more details in the Routing course.
And there it is a simple (or not so simple) packet traversal down the TCP/IP layer. Keep in mind that packets don't travel in a one way fashion like this. We haven't even gotten to Patty's network yet! When travelling through networks, it requires going through the TCP/IP model at least twice before any data is sent or received. In reality, the way this packet looks would be something like this:
Packet Traversal
- Pete sends Patty an email: this data gets sent to the transport layer.
- The transport layer encapsulates the data into a TCP or UDP header to form a segment, the segment attaches the destination and source TCP or UDP port, then the segment is sent to the network layer.
- The network layer encapsulates the TCP segment inside an IP packet, it attaches the source and destination IP address. Then routes the packet to the link layer.
- The packet then reaches Pete's physical hardware and gets encapsulated in a frame. The source and destination MAC address get added to the frame.
- Patty's receives this data frame through her physical layer and checks each frame for data integrity, then de-encapsulates the frame contents and sends the IP packet to the network layer.
- The network layer reads the packet to find the source and destination IP that was previously attached. It checks if its IP is the same as the destination IP, which it is! It de-encapsulates the packet and sends the segment to the transport layer.
- The transport layer de-encapsulates the segments, checks the TCP or UDP port numbers and makes a connection to the application layer based on those port numbers.
- The application layer receives the data from the transport layer on the port that was specified and presents it to Patty in the form of the final email message
Exercise
No exercises for this lesson.
Quiz Question
What is used to find the MAC address on the same network?
Quiz Answer
ARP
DHCP Overview
Lesson Content
An important networking concept that we did not go over yet is DHCP (Dynamic Host Configuration Protocol)
DHCP assigns IP addresses, subnet masks and gateways to our machines. For example, let's say you have a cell phone and you want to get a cell phone number to start talking to people. You have to call up your phone carrier and they will give you a number. As long as your pay your bills you can keep using your phone. DHCP is the phone carrier in this case, it gives you an IP address so that you can talk to other IP addresses. You are also leased an IP address, these last for a certain period of time, then will get renewed depending on how you have your lease settings.
DHCP is great for many reasons, it allows a network administrator to not worry about assigning IP addresses and it also prevents them from setting up duplicate IP addresses. Every physical network should have its own DHCP server so that a host can request an IP address. In a regular home setting, the router usually acts as the DHCP server.
The way DHCP gets all your dynamic host information is:
- DHCP DISCOVER - This message is broadcasted to search for a DHCP server.
- DHCP OFFER - The DHCP server in the network replies with an offer message. The offer contains a packet with DHCP lease time, subnet mask, IP address, etc.
- DHCP REQUEST - The client sends out another broadcast to let all DHCP servers know which offer it accepted.
- DHCP ACK - Acknowledgement is sent by the server.
DHCP gets more involved than this, but this is the gist of it.
Exercise
No exercises for this lesson.
Quiz Question
What are the steps in a DHCP request?
Quiz Answer
DISCOVER, OFFER, REQUEST, ACK
subnetting
IPv4
Lesson Content
So we know that network hosts have a unique address they can be found at. These addresses are known as IP addresses. An IPv4 address looks something like this:
204.23.124.23
This address actually contains two parts, the network portion that tells us know network it's on and the host portion that tells us which host on that network it is. For this course we will mostly be discussing IPv4 addresses, which are what you commonly will see when referring to IP addresses.
An IP address is separated into octets by the periods. So there are 4 octets in an IPv4 address. If you know a bit of computer science, an octet is 8 bits and 8 bits actually equal 1 byte, so we also refer to an IPv4 address as having 4 bytes. We use bits frequently when dealing with subnets and IP addresses.
You can view your IP address with the ifconfig -a command:
pete@icebox:~$ ifconfig -a eth0 Link encap:Ethernet HWaddr 1d:3a:32:24:4d:ce inet addr:192.168.1.129 Bcast:192.168.1.255 Mask:255.255.255.0 inet6 addr: fd60::21c:29ff:fe63:5cdc/64 Scope:Link
As you can see my IPv4 address is: 192.168.1.129
Exercise
Find your IP address with ifconfig.
Quiz Question
How many bytes are in an IPv4 address?
Quiz Answer
4
Subnets
Lesson Content
How can I tell if I'm on the same network as Patty? Well we can just look at the subnet short for subnetwork. A subnet is a group of hosts with IP addresses that are similar in a certain way. These hosts usually are in a proximate location from each other and you can easily send data to and from hosts on the same subnet. Think about it as sending mail in the same zip code, it's a lot easier than sending mail to a different state.
For example, all hosts with an IP address that starts with 123.45.67 would be on the same subnet. My host has an IP of 123.45.67.8 and Patty's has an IP of 123.45.67.9. The common numbers are my network prefix and the 8 and 9 are our hosts, therefore my network is the same as Patty's. A subnet is divided into a network prefix, such as 123.45.67.0 and a subnet mask.
Subnet Masks
Subnet masks determine what part of your IP address is the network portion and what part is the host portion.
A typical subnet mask can look something like this:
255.255.255.0
The 255 portion is actually our mask. To make this a little easier to understand, remember how we refer to each octet as 8 bits? In computer science a bit is denoted by a 0 or a 1 in binary form. When binary numbers are used, 1 means on and 0 means off. So what does 8 0's or 1's equal?
Punch into Google "binary to decimal calculator" and convert 11111111 into a decimal form. What do you get? 255! So an octet ranges from 0 to 255. So if we had a subnet mask of 255.255.255.0, and an IP address of 192.168.1.0, how many hosts are on that subnet? We'll find out the answer to that in our subnet math lesson.
Also when we talk about our subnet, we commonly denote it by the network prefix followed by the subnet mask:
123.234.0.0/255.255.0.0
Why?
Why on earth do we make subnets? Subnetting is used to segment networks and control the flow of traffic within that network. So a host on one subnet can’t interact with another host on a different subnet.
But wait a minute, what if I want to connect to other hosts like yahoo.com? Then you need to connect subnets together. To connect subnets you just need to find the hosts that are connected to more than one subnet. For example, if my host at 192.168.1.129 is connected to a local network of 192.168.1.129/24 it can reach any hosts on that network. To reach hosts on the rest of the Internet, it needs to communicate through the router. Traditionally, on most networks with a subnet mask of 255.255.255.0, the router is usually at address 1 of the subnet, so 192.168.1.1. Now that router will have a port that connects it to another subnet (more in the Routing course). Certain IP addresses (private networks) are not visible to the internet, and we have things like NAT in place (more on this later).
Exercise
Use ifconfig to view your subnet mask.
Quiz Question
True or false, a subnet consists of a subnet mask and network prefix.
Quiz Answer
True
Subnet Math
Lesson Content
Ok, we know that subnet masks are important to figure out how many hosts we can have on our subnet. So how many hosts would that be?
Let's say I have an IP address of 192.168.1.0 and a subnet mask of 255.255.255.0, now let's line up these numbers in binary form. For now use an online calculator to convert these values from decimal to binary.
192.168.1.165 = 11000000.10101000.00000001.10100101 255.255.255.0 = 11111111.11111111.11111111.00000000
The IP address is masked by our subnet mask, when you see a 1, it is masked and we pretend like we don't see it. So the only possible hosts we can have are from the 00000000 region. Remember 11111111 in binary form equals 255, we also account 0 as a host number, so there are 256 possible options. However, it may look like we have 256 possible options, but we actually subtract 2 hosts because we have to account for the broadcast address and the subnet address, leaving us with 254 possible hosts on our subnet. So we know that we can have hosts with IP addresses ranging from 192.168.1.1 - 192.168.1.254.
Exercise
No exercises for this lesson.
Quiz Question
What is the binary equivalent of 255?
Quiz Answer
11111111
Subnetting Cheats
Lesson Content
I hate to have to add this section, in the real world you would most likely never have to do subnet math by hand, however if you were getting interviewed on this, you'll have to know how to convert to and from binary form for subnetting. Luckily there are some arithmetic cheats you can memorize.
First memorize your base-2 calculations, just do it:
- 2^1 = 2
- 2^2 = 4
- 2^3 = 8
- 2^4 = 16
- 2^5 = 32
- 2^6 = 64
- 2^7 = 128
- 2^8 = 256
- 2^9 = 512
- 2^10 = 1024
- 2^11 = 2048
- 2^12 = 4096
Decimal to Binary Chart
1 1 1 1 1 1 1 1 128 64 32 16 8 4 2 1
There are lots of reasons why the following chart looks the way it does, if you're curious how it works there are lots of resources online.
Ok, got these memorized? Let's do a quick decimal to binary conversion:
Convert 192.168.23.43 to Binary
Remember: 128 / 64 / 32 / 16 / 8 / 4 / 2 / 1
Let's walk through converting the first octet to binary and you'll understand how the rest works.
- Can you subtract 192 - 128? Yes, so the first bit is 1
- 192 - 128 = 64, the next number in the chart is 64, can you subtract 64 - 64? Yes, so the second bit is 1
- We've run out of numbers to subtract from, so our binary form of 192 is 11000000
Convert Binary 11000000 to Decimal
For binary to decimal conversion you add up the numbers that have a 1, so:
128 + 64 + 0 + 0 + 0 + 0 + 0 + 0 = 192!
Exercise
Look at your IP address and subnet mask and see how many hosts you can have on your subnet.
Quiz Question
What is the binary conversion of 123?
Quiz Answer
1111011
classless inter domain routing cidr
NAT
Lesson Content
We've brought up NAT (network address translation) before but didn't touch upon it, when we are working on our network, does that mean that the Internet can see our IP address? Not quite.
NAT makes a device like our router act as an intermediary between the Internet and private network. So only a single, unique IP address is required to represent an entire group of computers.
Think of NAT is like a receptionist in a large office, if someone wants to contact you, they only know the number to the whole office, the receptionist would then have to look for your extension number and forward the call to you.
How does it work?
A simple case would look like this:
- Patty wants to connect to www.google.com, so her machine sends this request through the router
- The router takes that request and opens its own connection to google.com, then it sends Patty's request once it makes a connection
- The router is the intermediary between Patty and www.google.com. Google doesn't know about Patty instead all it can see is the router.
NAT and packet routing in general can get pretty ugly, but we won't dive into the specifics.
Exercise
No exercises for this lesson.
Quiz Question
What is used to represent a single private address to the Internet?
Quiz Answer
NAT
IPv6
Lesson Content
We've heard the term IPv6 here and there, but what is it? Every device that connects to the Internet gets it's own IP address, well that happens to be a finite number that we are soon approaching in this digital age. IPv6 was created to allow us to connect more hosts to the Internet, it comes with more IP improvements however, it's adoption is quite slow. It isn't meant to replace IPv4, they are meant to complement each other. The two IP protocols are very similar and if you know IPv4 you'll understand IPv6, the major difference is the way the address is written. Here is what a typical IPv6 address looks like:
2dde:1235:1256:3:200:f8ed:fe23:59cf
Exercise
Check ifconfig to see if you have an IPv6 address listed.
Quiz Question
What IP address is used to help increase the number of hosts that can connect to the Internet?
Quiz Answer
IPv6
Routing
What is a router?
Lesson Content
We've used this term router before, hopefully you know what one is, since you probably have one in your home. A router enables machines on a network to communicate with each other as well as other networks. On a typical router, you will have LAN ports, that allow your machines to connect to the same local area network and you will also have an Internet uplink port that connects you to the Internet, sometimes you'll see this port being labelled as WAN, because it is essentially connecting you to a wider network. When we do any sort of networking activity, it has to go through the router. The router decides where our network packets go and which ones come in. It routes our packets between multiple networks to get from it's source host to it's destination host.
How does a router work?
Think about routing the same way as mail delivery, we have an address we want to send a letter to, when we send it off to the post office, they get the letter and see, oh this is going to California, I'll put it on the truck going to California (I honestly have no idea how the postal system works). The letter then gets sent to San Francisco, inside San Francisco there are different zip codes, and then in those zip codes there are smaller address codes, until finally someone is able to deliver your letter to the address you wanted. On the other hand, if you already lived in San Francisco and in the same zipcode, the mail deliverer will probably know exactly where the letter has to go to without handing it off to anyone else.
When we route packets, they use similar address "routes", such as to get to network A, send these packets to network B. When we don't have a route set for that, we have a default route that our packets will use. These routes are set on a routing table that our system uses to navigate us across networks.
Hops
As packets move across networks, they travel in hops, a hop is how we roughly measure the distance that the packet must travel to get from the source to the destination. Let's say to I have two routers connecting host A to host B, so therefore we say there are two hops between host A and host B. Each hop is a intermediate device like the routers that we must pass through.
Understanding the basic difference between Switching, Routing & Flooding? Packet SWITCHING is basically receiving, processing and forwarding data to the destination device. ROUTING is a process of creating the routing table, so that we can do SWITCHING better. Before routing, FLOODING was used. If a router don't know which way to send a packet than every incoming packet is sent through every outgoing link except the one it arrived on.
Exercise
No exercises for this lesson.
Quiz Question
How do packets measure distance?
Quiz Answer
hops
Routing Table
Lesson Content
Look at your machine's routing table:
pete@icebox:~$ sudo route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 192.168.224.2 0.0.0.0 UG 0 0 0 eth0 192.168.224.0 0.0.0.0 255.255.255.0 U 1 0 0 eth0
Destination
In the first field, we have a destination IP address of 192.168.224.0, this says that any packet that tries to go to this network, goes out through my Ethernet cable (eth0). If I was 192.168.224.5 and wanted to get to 192.168.224.7, I would just use the network interface eth0 directly.
Notice that we have addresses of 0.0.0.0 this means that no address is specified or it's unknown. So if for example, I wanted to send a packet to IP address 151.123.43.6, our routing table doesn't know where that goes, so it denotes it as 0.0.0.0 and therefore routes our packet to the Gateway.
Gateway
If we are sending a packet that is not on the same network, it will be sent to this Gateway address. Which is aptly named as being a Gateway to another network.
Genmask
This is the subnet mask, used to figure out what IP addresses match what destination.
Flags
- UG - Network is Up and is a Gateway
- U - Network is Up
Iface
This is the interface that our packet will be going out of, eth0 usually stands for the first Ethernet device on your system.
Exercise
Look at your routing table and see where your packets can go.
Quiz Question
Where are packets routed to if our routing table doesn't know?
Quiz Answer
Gateway
Path of a Packet
Lesson Content
Let's look at how a packet travels within it's local network
- First the local machine will compare the destination IP address to see if it's in the same subnet by looking at its subnet mask.
- When packets are sent they need to have a source MAC address, destination MAC address, source IP address and destination IP address, at this point we do not know the destination MAC address.
- To get to the destination host, we use ARP to broadcast a request on the local network to find the MAC address of the destination host.
- Now the packet can be successfully sent!
Let's see how a packet travels outside it's network
- First the local machine will compare the destination IP address, since it's outside of our network, it does not see the MAC address of the destination host. And we can't use ARP because the ARP request is a broadcast to locally connected hosts.
- So our packet now looks at the routing table, it doesn't know the address of the destination IP, so it sends it out to the default gateway (another router). So now our packet contains our source IP, destination IP and source MAC, however we don't have a destination MAC. Remember MAC addresses are only reached through the same network. So what does it do? It sends an ARP request to get the MAC address of the default gateway.
- The router looks at the packet and confirms the destination MAC address, but it's not the final destination IP address, so it keeps looking at the routing table to forward the packet to another IP address that can help the packet move along to its destination. Everytime the packet moves, it strips the old source and destination MAC address and updates the packet with the new source and destination MAC addresses.
- Once the packet gets forwarded to the same network, we use ARP to find the final destination MAC address
- During this process, our packet doesn't change the source or destination IP address.
Exercise
No exercises for this lesson.
Quiz Question
How do we find the MAC address of an IP address?
Quiz Answer
ARP
Routing Protocols
Lesson Content
It would be a pain to have to manually configure routes on a routing table for every device on your network, so instead we use what are known as routing protocols. Routing protocols are used to help our system adapt to network changes, it learns of different routes, builds them in the routing table and then routes our packets through that way. There are two primary routing protocol types, distance vector protocols and link state protocols.
Convergence
Before we talk about the protocols, we should go over a term using in routing known as convergence. When using routing protocols, routers communicate with other routers to collect and exchange information about the network. When they agree on how a network should look, every routing table maps out the complete topology of the network, thus "converging". When something occurs in the network topology, the convergence will temporarily break until all routers are aware of this change.
Exercise
No exercises for this lesson.
Quiz Question
What is the term used when all routing tables know the network topology?
Quiz Answer
convergence
Distance Vector Protocols
Lesson Content
Distance vector protocols determine the path of other networks using the hop count a packet takes across the network. If network A was 3 hops away and network B was next to network A, then we assume it must be 4 hops away. In distance vector protocols, the next route would be the one with the least amount of hops.
Distance vector protocols are great for small networks, when networks start to scale it takes longer for the routers to converge because it periodically sends the entire routing table out to every router. Another downside to distance vector protocols is efficiency, it chooses routes that are closer in hops, but it may not always choose the most efficient route.
One of the common distance vector protocols is RIP (Routing Information Protocol), it broadcasts the routing table to every router in the network every 30 seconds. For a large network, this can take some serious juice to pull off, because of that RIP limits it's hop count to 15.
Exercise
No exercises for this lesson.
Quiz Question
True or false, distance protocols use the route with the least amount of bandwidth?
Quiz Answer
false
Link State Protocols
Lesson Content
Link state protocols are great for large scale networks, they are more complex than distance vector protocols, however a large upside is their ability to converge quickly, this is because instead of periodically sending out the whole routing table, they only send updates to neighboring routes. They use a different algorithm to calculate the shortest path first and construct their network topology in the form of a graph to show which routers are connected to other routers.
One of the common link state protocols is OSPF (Open Shortest Path First), it only updates the routing tables if there was a network change. It doesn't have a hop limit.
Exercise
No exercises for this lesson.
Quiz Question
What is one of the most common link state protocols?
Quiz Answer
OSPF
Border Gateway Protocol
Lesson Content
The last important protocol we'll discuss is BGP, BGP is basically how the Internet runs. It's used to collect and exchange routing information among autonomous systems. Think of an autonomous system as an Internet service provider, a company, university, any organization, etc. Without BGP, these systems would not know how to talk to each other, they would just be siloed off. Instead of routing inside these autonomous systems, BGP routes between them.
Let's say you were on your home network and I'm working from Starbucks, I want to be able to communicate with you, so I send an email and the network packet travels through Starbuck's network, it bounces around there and goes through the routing tables in Starbuck's network until it finally reaches a point at the border of the Starbucks network and passes it to a Border Gateway router. This router contains the information for my packet to leave the Starbucks network and traverse other networks.
Exercise
No exercises for this lesson.
Quiz Question
What protocol basically makes the Internet work?
Quiz Answer
BGP
Network Configuration
Network Interfaces
Lesson Content
A network interface is how the kernel links up the software side of networking to the hardware side. We've already seen an example of this:
pete@icebox:~$ ifconfig -a eth0 Link encap:Ethernet HWaddr 1d:3a:32:24:4d:ce inet addr:192.168.1.129 Bcast:192.168.1.255 Mask:255.255.255.0 inet6 addr: fd60::21c:29ff:fe63:5cdc/64 Scope:Link
The ifconfig command
The ifconfig tool allows us to configure our network interfaces, if we don't have any network interfaces set up, the kernel's device drivers and the network won't know how to talk to each other. Ifconfig runs on bootup and configures our interfaces through config files, but we can also manually modify them. The output of ifconfig shows the interface name on the left side and the right side shows detailed information. You'll most commonly see interfaces named eth0 (first Ethernet card in the machine), wlan0 (wireless interface), lo (loopback interface). The loopback interface is used to represent your computer, it just loops you back to yourself. This is good for debugging or connecting to servers running locally.
The status of interfaces, can be up or down, as you can guess if you wanted to "turn off" an interface you can set it to go down. The fields you'll probably look at the most in the ifconfig output is the HWaddr (MAC address of the interface), inet address (IPv4 address) and inet6 (IPv6 address). Of course you can see that the subnet mask and broadcast address are there as well. You can also view interface information at /etc/network/interfaces.
To create an interface and bring it up
$ ifconfig eth0 192.168.2.1 netmask 255.255.255.0 up
This assigns an IP address and netmask to the eth0 interface and also turns it up.
To bring up or down an interface
$ ifup eth0 $ ifdown eth0
The ip command
The ip command also allows us to manipulate the networking stack of a system. Depending on the distribution you are using it may be the preferred method of manipulating your network settings.
Here are some examples of its use:
To show interface information for all interfaces
$ ip link show
To show the statistics of an interface
$ ip -s link show eth0
To show ip addresses allocated to interfaces
$ ip address show
To bring interfaces up and down
$ ip link set eth0 up $ ip link set eth0 down
To add an IP address to an interface
$ ip address add 192.168.1.1/24 dev eth0
Exercise
Try changing the state of your network interfaces to either up or down and observe what happens.
Can you change your network interface's with both the ifconfig and ip commands ?
Quiz Question
What is the command to configure our network interfaces?
Quiz Answer
ifconfig
route
Lesson Content
We've already discussed viewing our routing tables with the route command, if you wanted to add or remove routes you can do so manually.
Add a new route
$ sudo route add -net 192.168.2.1/23 gw 10.11.12.3
Delete a route
$ sudo route del -net 192.168.2.1/23
You can also perform these changes with the ip command:
To add a route
$ ip route add 192.168.2.1/23 via 10.11.12.3
To delete a route
$ ip route delete 192.168.2.1/23 via 10.11.12.3 or $ ip route delete 192.168.2.1/23
Exercise
There are no exercises for this lesson but you can read more information on commands discussed here in the man pages
$ man route
$ man ip-route
Quiz Question
What is the command flag to delete a route?
Quiz Answer
del
dhclient
Lesson Content
We've discussed DHCP before and most often you will never need to statically set your IP addresses, subnet masks, etc. Instead you'll be using DHCP! The dhclient starts up on boot and gets a list of network interfaces from the dhclient.conf file. For each interface listed it tries to configure the interface using the DHCP protocol.
In the dhclient.leases file, dhclient keeps track of a list of leases across system reboots, after reading dhclient.conf, the dhclient.leases file is read to let it know what leases it's already assigned.
To obtain a fresh IP
$ sudo dhclient
Exercise
No exercises for this lesson.
Quiz Question
What tries to assign IP addresses with the DHCP protocol?
Quiz Answer
dhclient
Network Manager
Lesson Content
Of course if you wanted to have your system's networking up and running automatically there is something already in place for that. Most distributions utilize the NetworkManager daemon to configure their networks automatically.
You'll notice NetworkManager in the form of an applet somewhere on your desktop taskbar if you are using a GUI. As you can see it manages your network's hardware and connection information. For instance on startup, NetworkManager will gather network hardware information, search for connections to wireless, wired, etc. and then activates it.
There are also command-line tools to interact with NetworkManager:
nm-tool
nm-tools reports NetworkManager's state and it's devices
pete@icebox:/$ nm-tool NetworkManager Tool State: connected (global) - Device: eth0 [Wired connection 1] ------------------------------------------- Type: Wired Driver: pcnet32 State: connected Default: yes HW Address: 12:3D:45:56:7D:CC Capabilities: Carrier Detect: yes Wired Properties Carrier: on IPv4 Settings: Address: 192.168.22.1 Prefix: 24 (255.255.255.0) Gateway: 192.168.22.2 DNS: 192.168.22.2
nmcli
The nmcli command allows you to control and modify NetworkManager, see the manpage for more details.
Exercise
No exercises for this lesson.
Quiz Question
What is the command to view NetworkManager information?
Quiz Answer
nm-tool
arp
Lesson Content
Remember when we lookup a MAC address with ARP, it first checks the locally stored ARP cache on our system, you can actually view this cache:
pete@icebox:~$ arp Address HWtype HWaddress Flags Mask Iface 192.168.22.1 ether 00:12:24:fc:12:cc C eth0 192.168.22.254 ether 00:12:45:f2:84:64 C eth0
The ARP cache is actually empty when a machine boots up, it gets populated as packets are being sent to other hosts. If we send a packet to a destination that isn't in the ARP cache, the following happens:
- The source host creates the Ethernet frame with an ARP request packet
- The source host broadcasts this frame to the entire network
- If one of the hosts on the network knows the correct MAC address, it will send a reply packet and frame containing the MAC address
- The source host adds the IP to MAC address mapping to the ARP cache and then proceeds with sending the packet
You can also view your arp cache via the ip command:
$ ip neighbour show
Exercise
Observe what happens to your ARP cache when you reboot your machine and then do something on the network.
Quiz Question
What command can you use to view your ARP cache?
Quiz Answer
arp
Network troubleshooting
ICMP
Lesson Content
The Internet Control Message Protocol (ICMP) is part of the TCP/IP protocol suite, it used to send updates and error messages and is an extremely useful protocol used for debugging network issues such as a failed packet delivery.
Each ICMP message contains a type, code and checksum field. The type field is the type of ICMP message, the code is a sub-type and describes more information about the message and the checksum is used to detect any issues with the integrity of the message.
Let's look at some common ICMP Types:
- Type 0 - Echo Reply
- Type 3 - Destination Unreachable
- Type 8 - Echo Request
- Type 11 - Time Exceeded
When a packet can't get to a destination, Type 3 ICMP message is generated, within Type 3 there are 16 code values that will further describe why it can't get to the destination:
- Code 0 - Network Unreachable
- Code 1 - Host Unreachable etc..etc..
These messages will make more sense as we use some network troubleshooting tools.
Exercise
No exercises for this lesson.
Quiz Question
What is the ICMP type for echo request?
Quiz Answer
8
ping
Lesson Content
One of the most simplest networking tools ping, it's used to test whether or not a packet can reach a host. It works by sending ICMP echo request (Type 8) packets to the destination host and waits for an ICMP echo reply (Type 0). Ping is successful when a host sends out the request packet and receives a response from the target. Let's look at an example:
pete@icebox:~$ ping -c 3 www.google.com PING www.google.com (74.125.239.112) 56(84) bytes of data. 64 bytes from nuq05s01-in-f16.1e100.net (74.125.239.112): icmp_seq=1 ttl=128 time=29.0 ms 64 bytes from nuq05s01-in-f16.1e100.net (74.125.239.112): icmp_seq=2 ttl=128 time=23.7 ms 64 bytes from nuq05s01-in-f16.1e100.net (74.125.239.112): icmp_seq=3 ttl=128 time=15.1 ms
In this example, we are using ping to check if we can get to www.google.com. The -c flag (count) is used to stop sending echo request packets after the count has been reached.
The first part says that we are sending 64-byte packets to 74.125.239.112 (google.com) and the rest show us the details of the trip. By default it sends a packet per second.
icmp_seq
The icmp_seq field is used to show the sequence number of packets sent, so in this case, I sent out 3 packets and we can see that 3 packets made it back. If you do a ping and you get some sequence numbers missing, that means that some connectivity issue is happening and not all your packets are getting through. If the sequence number is out of order, your connection is probably very slow as your packets are exceeding the one second default.
ttl
The Time To Live (ttl) field is used as a hop counter, as you make hops, it decrements the counter by one and once the hop counter reaches 0, our packet dies. This is meant to give the packet a lifespan, we don't want our packets travelling around forever.
time
The roundtrip time it took from you sending the echo request packet to getting an echo reply.
Exercise
Do a ping on a website and look at the output you receive.
Quiz Question
What is the roundtrip time unit of measurement?
Quiz Answer
ms
traceroute
Lesson Content
The traceroute command is used to see how packets are getting routed. It works by sending packets with increasing TTL values, starting with 1. So the first router gets the packet, and it decrements the TTL value by one, thus dropping the packet. The router sends back an ICMP Time Exceeded message back to us. And then the next packet gets a TTL of 2, so it makes it past the first router, but when it gets to the second router the TTL is 0 and it returns another ICMP Time Exceeded message. Traceroute works this way because as it sends and drops packets it is build a list of routers that the packets traverse, until it finally gets to its destination and gets an ICMP Echo Reply message.
Here's a little snippet of a traceroute:
$ traceroute google.com traceroute to google.com (216.58.216.174), 30 hops max, 60 byte packets 1 192.168.4.254 (192.168.4.254) 0.028 ms 0.009 ms 0.008 ms 2 100.64.1.113 (100.64.1.113) 1.227 ms 1.226 ms 0.920 ms 3 100.64.0.20 (100.64.0.20) 1.501 ms 1.556 ms 0.855 ms
Each line is a router or machine that is between me and my target. It shows the name of the target and its IP address and the last three columns correspond to the round-trip time of a packet to get to that router. By default, we send three packets along the route.
Exercise
Run the traceroute command on your machine and observe the output.
Quiz Question
What gets decremented by one when making hops across the network?
Quiz Answer
ttl
netstat
Lesson Content
Well Known Ports
We've discussed data transmission through ports on our machine, let's look at some well known ports.
You can get a list of well-known ports by looking at the file /etc/services:
ftp 21/tcp ssh 22/tcp smtp 25/tcp domain 53/tcp # DNS http 80/tcp https 443/tcp ..etc..
The first column is the name of the service, then the port number and the transport layer protocol it uses.
netstat
An extremely useful tool to get detailed information about your network is netstat. Netstat displays various network related information such network connections, routing tables, information about network interfaces and more, it's the swiss army knife of networking tools. We will focus mostly on one feature netstat has and that's the status of network connections. Before we look at an example, let's talk about sockets and ports first. A socket is an interface that allows programs to send and receive data while a port is used to identify which application should send or receive data. The socket address is the combination of the IP address and port. Every connection between a host and destination requires a unique socket. For example, HTTP is a service that runs on port 80, however we can have many HTTP connections and to maintain each connection a socket gets created per connection.
pete@icebox:~$ netstat -at Active Internet connections (servers and established) Proto Recv-Q Send-Q Local Address Foreign Address State tcp 0 0 icebox:domain *:* LISTEN tcp 0 0 localhost:ipp *:* LISTEN tcp 0 0 icebox.lan:44468 124.28.28.50:http TIME_WAIT tcp 0 0 icebox.lan:34751 124.28.29.50:http TIME_WAIT tcp 0 0 icebox.lan:34604 economy.canonical.:http TIME_WAIT tcp6 0 0 ip6-localhost:ipp [::]:* LISTEN tcp6 1 0 ip6-localhost:35094 ip6-localhost:ipp CLOSE_WAIT tcp6 0 0 ip6-localhost:ipp ip6-localhost:35094 FIN_WAIT2
The netstat -a command shows the listening and non-listening sockets for network connections, the -t flag shows only tcp connections.
The columns are as follows from left to right:
- Proto: Protocol used, TCP or UDP.
- Recv-Q: Data that is queued to be received
- Send-Q: Data that is queued to be sent
- Local Address: Locally connected host
- Foreign Address: Remotely connected host
- State: The state of the socket
See the manpage for a list of socket states, but here are a few:
- LISTENING: The socket is listening for incoming connections, remember when we make a TCP connection our destination has to be listening for us before we can connect.
- SYN_SENT: The socket is actively attempting to establish a connection.
- ESTABLISHED: The socket has an established connection
- CLOSE_WAIT: The remote host has shutdown and we're waiting for the socket to close
- TIME_WAIT: The socket is waiting after close to handle packets still in the network
Exercise
Look at the manpage for netstat and learn all the features it has to offer.
Quiz Question
What port is used for HTTPS?
Quiz Answer
443
Packet Analysis
Lesson Content
The subject of packet analysis could fill an entire course of its own and there are many books written just on packet analysis. However, today we will just learn the basics. There are two extremely popular packet analyzers, Wireshark and tcpdump. These tools scan your network interfaces, capture the packet activity, parse the packages and output the information for us to see. They allows us to get into the nitty gritty of network analysis and get into the low level stuff. We'll be using tcpdump since it has a simpler interface, however if you were to pick up packet analysis for your toolbelt, I would recommend looking into Wireshark.
Install tcpdump
$ sudo apt install tcpdump
Capture packet data on an interface
pete@icebox:~$ sudo tcpdump -i wlan0 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on wlan0, link-type EN10MB (Ethernet), capture size 65535 bytes 11:28:23.958840 IP icebox.lan > nuq04s29-in-f4.1e100.net: ICMP echo request, id 1901, seq 2, length 64 11:28:23.970928 IP nuq04s29-in-f4.1e100.net > icebox.lan: ICMP echo reply, id 1901, seq 2, length 64 11:28:24.960464 IP icebox.lan > nuq04s29-in-f4.1e100.net: ICMP echo request, id 1901, seq 3, length 64 11:28:24.979299 IP nuq04s29-in-f4.1e100.net > icebox.lan: ICMP echo reply, id 1901, seq 3, length 64 11:28:25.961869 IP icebox.lan > nuq04s29-in-f4.1e100.net: ICMP echo request, id 1901, seq 4, length 64 11:28:25.976176 IP nuq04s29-in-f4.1e100.net > icebox.lan: ICMP echo reply, id 1901, seq 4, length 64 11:28:26.963667 IP icebox.lan > nuq04s29-in-f4.1e100.net: ICMP echo request, id 1901, seq 5, length 64 11:28:26.976137 IP nuq04s29-in-f4.1e100.net > icebox.lan: ICMP echo reply, id 1901, seq 5, length 64 11:28:30.674953 ARP, Request who-has 172.254.1.0 tell ThePickleParty.lan, length 28 11:28:31.190665 IP ThePickleParty.lan.51056 > 192.168.86.255.rfe: UDP, length 306
You'll notice a lot of stuff happening when you run a packet capture, well that's to be expected there's a lot of network activity happening in the background. In my above example, I've taken only a snippet of my capture specifically the time when I decided to ping www.google.com.
Understanding the output
11:28:23.958840 IP icebox.lan > nuq04s29-in-f4.1e100.net: ICMP echo request, id 1901, seq 2, length 64 11:28:23.970928 IP nuq04s29-in-f4.1e100.net > icebox.lan: ICMP echo reply, id 1901, seq 2, length 64
- The first field is a timestamp of the network activity
- IP, this contains the protocol information
- Next, you'll see the source and destination address: icebox.lan > nuq04s29-in-f4.1e100.net
- seq, this is the TCP packets's starting and ending sequence number
- length, length in bytes
As you can see from our tcpdump output, we are sending an ICMP echo request packet to www.google.com and getting an ICMP echo reply packet in return! Also note that different packets will output different information, refer to the manpage to see what those are.
Writing tcpdump output to a file
$ sudo tcpdump -w /some/file
Some final thoughts: we only scraped the surface of the subject of packet analysis. There is so much you can look at and we haven't even touched upon going even deeper with Hex and ASCII output. There are plenty of resources online to help you learn more about packet analyzers and I urge you to find them!
Exercise
Download and install the Wireshark tool and play around with the interface.
Quiz Question
What is the flag to capture a specific interface with tcpdump?
Quiz Answer
-i
DNS
What is DNS?
Lesson Content
Imagine if every time you wanted to do a search on Google you had to type in http://192.78.12.4 instead of www.google.com. Well without DNS ("Domain Name System") that's exactly what would happen. Low level networking only understands the raw IP address to identify a host. DNS allows us humans to keep track of websites and hosts by name instead of an IP address. It's like a contact list for the Internet. If you know someone's name but don’t know their phone number, you can simply look it up in your contacts list.
DNS is fundamentally a distributed database of hostnames to IP addresses, we manage our database so people know how to get to our site/domain, and somewhere else another person is managing their database so others can get to their domain. These domains are then able to talk to each other and build a massive contact list of the Internet.
In this course, we will go over some basics of DNS, but be wary that DNS is an exhaustive topic and if you really want to get down and dirty with it, you'll need to do some additional research.
Exercise
No exercises for this lesson.
Quiz Question
True or false, DNS helps us find MAC addresses for hostnames?
Quiz Answer
false
DNS Components
Lesson Content
The DNS database of the Internet relies on sites and organizations providing part of that database. To do that, they need:
Name Server
We setup DNS via "name servers", the name servers load up our DNS settings and configs and answers any questions from clients or other servers that want to know things like "Who is google.com?". If the name server doesn't know the answer to that query, it will redirect the request to other name servers. Name servers can be "authoritative", meaning they hold the actual DNS records that you're looking for, or "recursive" meaning they would ask other servers and those servers would ask other servers until they found an authoritative server that contained the DNS records. Recursive servers can also have the information we want cached instead of reaching an authoritative server.
Zone File
Inside a name server lives something called zone files. Zone files are how the name server stores information about the domain or how to get to the domain if it doesn't know.
Resource Records
A zone file is comprised of entries of resource records. Each line is a record and contains information about hosts, nameservers, other resources, etc. The fields consist of the following:
- Record name
- TTL - The time after which we discard the record and obtain a new one, in DNS TTL is denoted by time, so records could have a TTL of one hour. The reason we do this is because the Internet is constantly changing, one minute a host can be mapped to X IP address then next it can be at Y IP address
- Class - Namespace of the record information, most commonly IN is used for Internet
- Type - Type of information stored in the record data. We won't get into record types, but you've probably seen common ones like A for address, MX or mail exchanger, etc.
- Data - This field can contain an IP address if it's an A record or something else depending on the record type.
patty IN A 192.168.0.4
Exercise
No exercises for this lesson.
Quiz Question
What resource record type is used for mail exchangers?
Quiz Answer
MX
DNS Process
Lesson Content
Let's look at an example of how your host finds a domain (catzontheinterwebz.com) with DNS. Essentially, we funnel our way down until we reach the DNS server that knows of that domain.
Local DNS Server
First our host asks, "Where is catzontheinterwebz.com?", our local DNS server doesn't know so it goes and starts from the top of the funnel to ask the Root Servers. Keep in mind that our host is not making these requests to find catzontheinterwebz.com directly, most users talk to a recursive DNS server provided by their ISPs and that server is then tasked to find the location of catzontheinterwebz.com.
Root Servers
There are 13 Root Servers for the Internet, they are mirrored and distributed around the world to handle DNS requests for the Internet, so there are really hundreds of servers that are working, they are controlled by different organizations and they contain information about Top-Level Domains. Top-level domains are what you know as .org, .com, .net, etc addresses. So the Root Server doesn't know where catzontheinterwebz.com is, so it tells us ask the .com Top-Level Domain DNS Server at an IP address it gives us.
Top-Level Domain
So now we send another request to the name server that knows about ".com" addresses and asks if it knows where catzontheinterwebz.com is? The TLD doesn't have the catzontheinterwebz.com in their zone files, but it does see a record for the name server for catzontheinterwebz.com. So it gives us the IP address of that name server and tells us to look there.
Authoritative DNS Server
Now we send a final request to the DNS server that actually has the record we want. The name server sees that it has a zone file for catzontheinterwebz.com and there is a resource record for 'www' for this host. It then gives us the IP address of this host and we can finally see some cats on the Internet.
Exercise
No exercises for this lesson.
Quiz Question
What is the abbreviation for the nameservers where .com, .net, .org, etc addresses are found?
Quiz Answer
TLD
/etc/hosts
Lesson Content
Before our machine actually hits DNS to do a query, it first looks locally on our machines.
/etc/hosts
The /etc/hosts file contains mappings of some hostnames to IP addresses. The fields are pretty self explanatory, there is one for the IP address, the hostname and then any alias's for the host.
pete@icebox:~$ cat /etc/hosts 127.0.0.1 localhost 127.0.1.1 icebox
You'll typically see your localhost address listed as a default in this file. You can also manage access to hosts by modifying the /etc/hosts.deny or /etc/hosts.allow files. However, if you were security conscientious, this isn't really the way to go and you should be modifying your firewall rules instead.
Let's see a fun example of /etc/hosts. Modify the file and add a line for:
123.45.6.7 www.google.com
Save the file and now go to www.google.com. Having issues aren't you? Well that's because we just mapped www.google.com to a completely wrong IP address. Since our hosts first look locally for IP address mappings, it never reaches DNS to find google.com.
/etc/resolv.conf
Traditionally, we've used a file called /etc/resolv.conf to map DNS name servers for more efficient lookups, however with the improvements made to DNS this file is quite often irrelevant, in fact, you can see in my example below that /etc/resolv.conf isn't managed manually. Refer to your distribution specific settings to manage DNS name server mappings.
conf(5) file for glibc resolver(3) generated by resolvconf(8) # DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN nameserver 127.0.1.1 search localdomain
Exercise
No exercises for this lesson.
Quiz Question
What file is used to map hostnames to IP addresses on our machines?
Quiz Answer
/etc/hosts
DNS Setup
Lesson Content
We won't got through setting up a DNS server, as that would be quite a lengthy tutorial. Instead here is a quick comparison list of the popular DNS servers to use with Linux.
BIND
The most popular DNS server on the Internet, it's the standard that is used with Linux distributions. It was originally developed at the University of California at Berkeley hence the name BIND (Berkeley Internet Name Domain). If you need full-featured power and flexibility, you can't go wrong with BIND.
DNSmasq
Lightweight and much easier to configure than BIND. If you want simplicity and don't need all the bells and whistles of BIND, use DNSmasq. It comes with all the tools you need to setup DHCP and DNS, recommended for a smaller network.
PowerDNS
Full-featured and similar to BIND, it offers you a little bit more flexibility with options. It reads information from multiple databases such as MySQL, PostgreSQL, etc. for easier administration. Just because BIND has been the way we do things, it doesn't mean it has to stay that way.
This isn't a complete list, but it should give you an idea of where to look if you are setting up your own DNS server.
Exercise
No exercises for this lesson.
Quiz Question
What is the de facto DNS server for Linux?
Quiz Answer
BIND
DNS Tools
Lesson Content
nslookup
The "name server lookup" tool is used to query name servers to find information about resource records. Let's find where the name server for google.com is:
pete@icebox:~$ nslookup www.google.com Server: 127.0.1.1 Address: 127.0.1.1#53 Non-authoritative answer: Name: www.google.com Address: 216.58.192.4
dig
Dig (domain information groper) is a powerful tool for getting information about DNS name servers, it is more flexible than nslookup and great for troubleshooting DNS issues.
pete@icebox:~$ dig www.google.com ; <<>> DiG 9.9.5-3-Ubuntu <<>> www.google.com ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42376 ;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; MBZ: 0005 , udp: 512 ;; QUESTION SECTION: ;www.google.com. IN A ;; ANSWER SECTION: www.google.com. 5 IN A 74.125.239.147 www.google.com. 5 IN A 74.125.239.144 www.google.com. 5 IN A 74.125.239.146 www.google.com. 5 IN A 74.125.239.145 www.google.com. 5 IN A 74.125.239.148 ;; Query time: 27 msec ;; SERVER: 127.0.1.1#53(127.0.1.1) ;; WHEN: Sun Feb 07 10:14:00 PST 2016 ;; MSG SIZE rcvd: 123
Exercise
Read up on the manpage for dig.
Quiz Question
What tool is used to get detailed information about DNS name servers?
Quiz Answer
dig