Part 10 — A complete beginner’s guide to Computer Programming with Clojure: Files.


In Part 8, we touched briefly on file handling; the supplied example required two tools:

(require '[clojure.java.io :as io])
(require '[clojure.edn :as edn])

The first, clojure.java.io, is the main tool used by Clojure to control file input and output.

The second, clojure.edn, handles edn (extensible data notation). This edn tool allowed us to take the Library hash-map created in Part 8 and store it in (or retrieve it from) a text file. In fact, a hash-map such as our Library example is already written in edn format, because edn is essentially the data part of Clojure's own syntax. You can think of edn as a data format similar to JSON or XML. In other words, it is a system for storing and describing data.
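As a reminder of how that worked, here is a minimal sketch of an edn round trip. The library map and the library.edn filename are hypothetical stand-ins, not the exact Part 8 code.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical stand-in for the Library hash-map from Part 8
(def library {:title "Dune" :author "Frank Herbert" :year 1965})

;; pr-str turns the map into edn text; spit stores that text in a file
(spit "library.edn" (pr-str library))

;; slurp reads the text back; edn/read-string turns it into a map again
(def restored (edn/read-string (slurp "library.edn")))

(println restored)              ; prints the same map
(println (= library restored))  ; prints true
```

edn/read-string is generally preferred over clojure.core/read-string for data coming from files, because it only reads data and never evaluates code.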

Spit & Slurp

Clearly, the designers of Clojure have a sense of humor! The function that writes to a file is spit, and its partner slurp reads a file. Both are built-in functions, so no require is needed to use them.

The general format for writing to a file is:

(spit filename content)

If the file does not exist, it will be created on the fly to hold the content. If the file already has content, that content will be overwritten. In programming, overwriting a file is called clobbering. Therefore, if you wish to add to the end of a file instead, you must pass the :append option:

(spit filename content :append true)
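If you would rather never clobber a file by accident, one option is to check whether it exists before writing. The sketch below uses a hypothetical helper, spit-new (not a built-in), and an example filename, notes.txt.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper: write only if the file does not already exist.
;; (A simple check: another program could still create the file
;; between the test and the write.)
(defn spit-new
  "Writes content to filename unless it already exists.
  Returns true if it wrote, false if it refused to clobber."
  [filename content]
  (if (.exists (io/file filename))
    false
    (do (spit filename content)
        true)))

(io/delete-file "notes.txt" true)      ; start clean; true = no error if missing
(println (spit-new "notes.txt" "v1"))  ; prints true
(println (spit-new "notes.txt" "v2"))  ; prints false, file still holds v1
```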

The general format for reading from a file is:

(slurp filename)
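slurp reads the whole file into a single string. For larger files it can be handier to read one line at a time. Here is a minimal sketch using clojure.java.io/reader and line-seq; the log.txt filename is just an example.

```clojure
(require '[clojure.java.io :as io])

;; Start with a fresh file (spit clobbers whatever was there)
(spit "log.txt" "first line\n")
;; :append true adds to the end instead of clobbering
(spit "log.txt" "second line\n" :append true)

;; io/reader + line-seq reads the file one line at a time.
;; with-open closes the file afterwards, so doall forces the
;; (lazy) line-seq to be read completely before that happens.
(def lines
  (with-open [rdr (io/reader "log.txt")]
    (doall (line-seq rdr))))

(println lines) ; prints (first line second line)
```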

The following code demonstrates writing, reading, overwriting, and appending to a file.

(spit "file.txt" "Hello World")
(slurp "file.txt")
=> "Hello World"
(spit "file.txt" "goodbye World")
(slurp "file.txt")
=> "goodbye World"
(spit "file.txt" " and Hello World" :append true)
(slurp "file.txt")
=> "goodbye World and Hello World"

Repetition

Recall from Part 1 that all programs can be built out of three elements: sequence, selection, and repetition. Repetition is the ability to execute a piece of code or function over and over while (or until) some condition is true. It is achieved through looping or recursion.

The classic repetition demonstration counts from 1 to 10, printing each number to the screen:

(loop [x 1]
  (when (< x 11)
    (println x)
    (recur (+ x 1))))
1
2
3
4
5
6
7
8
9
10
=> nil

Let’s explain this code.

The first line, loop [x 1], is referred to as a binding: it binds the value 1 to the name x. The next line performs a test: is the value of x less than 11? If the when test evaluates to true, the rest of the when body is executed; if it evaluates to false, when returns nil and its body is skipped. Consider the following:

(when (< 1 11)
  "moves to the next line of code")
=> "moves to the next line of code"
(when (< 10 11)
  "moves to the next line of code")
=> "moves to the next line of code"
(when (< 11 11)
  "moves to the next line of code")
=> nil

As you can see above, once the test fails (11 is not less than 11), when immediately evaluates to nil; its body is never evaluated.

The next line of our loop program, println, simply prints the value of x. The final line, recur (+ x 1), jumps back to the top of the loop and binds x to the value of (+ x 1), so on the first pass x goes from 1 to 2. Note that x is never changed in place; each pass through the loop gets a fresh binding. The program keeps repeating while x is less than 11. When x reaches 11, the when test fails and returns nil, and that nil becomes the value of the whole loop (the => nil at the end of the output).

We could have also written our code as:

(loop [x 1]
  (when (< x 11)
    (println x)
    (recur (inc x))))

Or with a doseq statement:

(doseq [x (range 1 11)]
  (println x))

This also appears to work with for at the REPL, although for behaves differently, as the next section explains:

(for [x (range 1 11)]
  (println x))

Note the built-in range function: (range 1 11) produces the numbers from 1 up to, but not including, 11.
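range pairs naturally with the repetition patterns above. As a small sketch, here are the common forms of range, followed by the same 1-to-10 count used to build a total instead of printing, once with loop/recur and once with reduce. The names r1, r2, r3, loop-total, and reduce-total are just illustrative.

```clojure
;; range with one, two, or three arguments (the end is always excluded)
(def r1 (range 5))       ; (0 1 2 3 4)
(def r2 (range 1 11))    ; (1 2 3 4 5 6 7 8 9 10)
(def r3 (range 0 10 2))  ; (0 2 4 6 8): the third argument is the step

;; loop/recur building a value: x is the current number, sum the running total
(def loop-total
  (loop [x 1
         sum 0]
    (if (< x 11)
      (recur (inc x) (+ sum x))  ; rebind both names and go round again
      sum)))                     ; test failed: the loop's value is sum

;; reduce walks the range for us, adding as it goes
(def reduce-total (reduce + (range 1 11)))

(println loop-total reduce-total) ; prints 55 55
```

Both give the same answer; reduce is usually the more idiomatic choice when you are simply combining the items of a sequence.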

Lazy vs Eager

However, there is an important difference between for and doseq: for always builds a lazy sequence. Recall from Part 9 that ‘lazy’ means the elements of a sequence are only computed when something actually asks for them. Consider the following:

(for [x ["a" "b" "c"]] x)
=> ("a" "b" "c")
(doseq [x ["a" "b" "c"]] x)
=> nil

The above shows that for returns a sequence, here (“a” “b” “c”), and that this sequence is lazy. doseq, on the other hand, always returns nil: it exists only for its side effects, such as printing, and throws away the value of its body.
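A quick way to remember the split: use for when you want a new sequence back, and doseq when you only want something to happen. A small sketch (the names squares and printed are just illustrative):

```clojure
;; for builds a new sequence from an old one
(def squares (for [x (range 1 6)] (* x x)))
(println squares) ; prints (1 4 9 16 25)

;; doseq runs its body for the side effect; its return value is always nil
(def printed (doseq [x (range 1 4)] (println x))) ; prints 1, 2, 3 on separate lines
(println printed) ; prints nil
```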

Another example:

(def lazy (for [x ["a" "b" "c"]] (println x)))
=> #'user/lazy
(def eager (doseq [x ["a" "b" "c"]] (println x)))
a
b
c

This brings in the opposite programming concept: eager evaluation. A lot has been written about both concepts. Suffice to say, lazy code won’t execute until its result is required, whereas eager code executes immediately. Because doseq is eager, it ran the println function straight away. for, however, held back: nothing is printed until the sequence stored in lazy is actually used, for example by evaluating lazy at the REPL or calling (doall lazy). To reiterate, doseq is eager by design, whereas for is designed to be lazy.
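To see the laziness directly, we can count how many times the body of a for actually runs. This sketch uses an atom (a mutable counter) named calls and a sequence named lazy-letters; both names are just for illustration.

```clojure
;; An atom to count how many times for's body really runs
(def calls (atom 0))

(def lazy-letters
  (for [x ["a" "b" "c"]]
    (do (swap! calls inc)  ; record one run of the body
        x)))

(println @calls)      ; prints 0: def did not run the body
(doall lazy-letters)  ; doall forces every element to be computed
(println @calls)      ; prints 3
```

Using lazy-letters again will not increase the count, because a lazy sequence remembers each element once it has been computed.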

Find & Replace

Now that we’ve dispensed with some theory, let’s use repetition to process a file. In short, we will find specific strings of text and replace each instance with a different string.

First, we will create a text file.

(require '[clojure.java.io :as io])
(spit "file.txt" "Find this
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis sit amet congue magna. Quisque sed quam diam. Quisque interdum nisl eu tristique mattis. Morbi tincidunt ipsum ut lacinia congue. Donec auctor laoreet urna, ut dictum Find this ex hendrerit id. Etiam vestibulum dolor eget urna commodo, nec laoreet est blandit. Donec congue justo ut lorem bibendum consequat.
Find this
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis sit amet congue magna. Quisque sed quam diam. Quisque interdum nisl eu tristique mattis. Morbi tincidunt ipsum ut lacinia congue. Donec auctor laoreet urna, ut dictum
Find this
")
(println (slurp "file.txt"))
Find this
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis sit amet congue magna. Quisque sed quam diam. Quisque interdum nisl eu tristique mattis. Morbi tincidunt ipsum ut lacinia congue. Donec auctor laoreet urna, ut dictum Find this ex hendrerit id. Etiam vestibulum dolor eget urna commodo, nec laoreet est blandit. Donec congue justo ut lorem bibendum consequat.
Find this
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis sit amet congue magna. Quisque sed quam diam. Quisque interdum nisl eu tristique mattis. Morbi tincidunt ipsum ut lacinia congue. Donec auctor laoreet urna, ut dictum
Find this

=> nil

We can see that file.txt contains the string “Find this” four times. We will write a simple function to go through the file and replace each instance of “Find this” with “Replace with this”.

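(defn swapper [& filenames]
  ;; doseq (eager) visits each file in turn
  (doseq [f filenames]
    ;; read the file, replace every "Find this", write the result back
    (spit f (.replace (slurp f)
                      "Find this" "Replace with this"))))

Our function is called swapper. It accepts any number of filenames (that is what & filenames means) and uses doseq to visit each file in turn. For each file it slurps the contents, calls .replace to swap every occurrence of “Find this” for “Replace with this”, and spits the result back into the same file. Note that .replace is the Java String method, called directly on the string; it replaces every literal occurrence. clojure.string/replace does a similar job and also accepts regular expressions. We use doseq rather than for because, as we saw above, for is lazy: at the REPL the printed result forces it to run, but inside a larger program nothing would force it and the files would never be changed.

(swapper "file.txt")
=> nil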
(println (slurp "file.txt"))
Replace with this
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis sit amet congue magna. Quisque sed quam diam. Quisque interdum nisl eu tristique mattis. Morbi tincidunt ipsum ut lacinia congue. Donec auctor laoreet urna, ut dictum Replace with this ex hendrerit id. Etiam vestibulum dolor eget urna commodo, nec laoreet est blandit. Donec congue justo ut lorem bibendum consequat.
Replace with this
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis sit amet congue magna. Quisque sed quam diam. Quisque interdum nisl eu tristique mattis. Morbi tincidunt ipsum ut lacinia congue. Donec auctor laoreet urna, ut dictum
Replace with this

=> nil

As you can see, our file.txt has had its contents altered by our swapper function.
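As a sketch of where this could go next, here is a hypothetical helper, swap-and-count (not part of the original swapper), that uses clojure.string/replace with a regular expression so it can match case-insensitively, and that reports how many replacements it made. The demo.txt file is a throwaway example.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: count the matches, then replace them, in one file
(defn swap-and-count
  "Replaces every match of pattern in file with replacement.
  Returns the number of matches that were replaced."
  [file pattern replacement]
  (let [text (slurp file)
        n    (count (re-seq pattern text))]  ; re-seq gives nil (count 0) if no match
    (spit file (str/replace text pattern replacement))
    n))

;; Demo on a small throwaway file; (?i) makes the regex case-insensitive
(spit "demo.txt" "find this, FIND THIS, Find This")
(def replaced (swap-and-count "demo.txt" #"(?i)find this" "Replace with this"))
(println replaced)           ; prints 3
(println (slurp "demo.txt")) ; prints Replace with this, Replace with this, Replace with this
```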

File Extraction

How about extracting specific content from a file and creating a new file from it? Our file.txt has a lot of Latin filler text. For instance, it contains the word “sit” four times. Let’s use a regular expression (regex) with re-seq, which returns every match, to extract them all and place them in a new file called result.txt.

(spit "result.txt" (re-seq #"sit" (slurp "file.txt")))
=> nil
(println (slurp "result.txt"))
("sit" "sit" "sit" "sit")
=> nil

Could you rewrite the above with a regex designed to extract mobile phone numbers or internet IP addresses?
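One possible starting point for the IP-address exercise, offered as a rough sketch rather than a complete solution: the pattern below matches four groups of one to three digits separated by dots. It does not check that each group is between 0 and 255, and the sample text and the ips.txt filename are made up for illustration.

```clojure
;; Four groups of 1-3 digits separated by dots; \b marks word boundaries.
;; Simplified: it would also accept 999.1.1.1.
(def ip-pattern #"\b(?:\d{1,3}\.){3}\d{1,3}\b")

(def sample "Server 192.168.1.10 replied; backup at 10.0.0.2, port 8080.")

(def ips (re-seq ip-pattern sample))
(println ips) ; prints (192.168.1.10 10.0.0.2)

;; pr-str turns the matches into readable text before saving them
(spit "ips.txt" (pr-str ips))
```

pr-str is the safe way to turn a sequence into text for a file: passing some lazy sequences (such as the result of map) straight to spit can write something like clojure.lang.LazySeq@... instead of their contents.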

Shell

In Part 2, we installed a virtual machine (VM) to run all our Clojure code. Recall that this VM runs Linux. Linux has many built-in system commands for dealing with files and directories, some of which we covered in Part 2. By calling these commands to create directories and files, we don’t have to write our own functions. This is particularly useful if you are already familiar with the Linux command line.

The Linux command-line environment is also called the system shell. To run Linux commands from our REPL, we need the sh function from the clojure.java.shell library:

(require '[clojure.java.shell :refer [sh]])

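sh lets us send Linux commands, written as text strings, and run them from inside our REPL. Strictly speaking, sh starts the program directly rather than through a shell such as bash, so each argument goes in its own string. For example, we can run commands like mkdir (create a directory), touch (create an empty file), and ls (list the contents of a directory). cd (change directory), however, is a shell built-in, so it cannot change the REPL’s directory this way; instead, sh accepts a :dir option naming the directory to run a command in.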

For instance, to create a directory called new:

(sh "mkdir" "new")

To create an empty file called newfile.txt, we could use the Linux touch command:

(sh "touch" "newfile.txt")

Let’s list everything in the current directory in long format (ls -l) and print the result to the screen:

(println (sh "ls" "-l"))
{:exit 0, :out total 28
drwxr-xr-x 1 runner runner 16 Dec 28 18:03 SillyConcernedLoops
-rw-r--r-- 1 runner runner 664 Dec 29 11:01 file.txt
drwxr-xr-x 2 runner runner 4096 Dec 29 11:28 new
-rw-r--r-- 1 runner runner 0 Dec 29 11:29 newfile.txt
-rw-r--r-- 1 runner runner 2447 Jul 20 15:58 pom.xml
-rw-r--r-- 1 root root 425 Jul 20 15:57 project.clj
-rw-r--r-- 1 runner runner 25 Dec 29 11:04 result.txt
drwxr-xr-x 1 runner runner 4096 Dec 29 11:01 target
, :err }
=> nil

Clearly, we can see the new directory (its line begins with a d) and our newfile.txt:

drwxr-xr-x 2 runner runner 4096 Dec 29 11:28 new
-rw-r--r-- 1 runner runner 0 Dec 29 11:29 newfile.txt
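The printed map above shows that sh returns three things: :exit, :out, and :err. Here is a minimal sketch of using them directly, including the :dir option in place of cd. The inside.txt filename is just an example.

```clojure
(require '[clojure.java.shell :refer [sh]])

;; sh returns a map: :exit is the exit code (0 means success),
;; while :out and :err hold the command's output as strings
(def result (sh "echo" "hello"))
(println (:exit result)) ; prints 0
(print (:out result))    ; prints hello

;; Check :exit before trusting :out
(let [{:keys [exit out err]} (sh "ls" "-l")]
  (if (zero? exit)
    (println out)  ; just the listing, without the surrounding map
    (println "ls failed:" err)))

;; There is no cd: pass :dir to run a command inside another directory
(sh "mkdir" "-p" "new")
(sh "touch" "inside.txt" :dir "new")
(println (:out (sh "ls" :dir "new"))) ; the listing includes inside.txt
```

If you run this as a script rather than at the REPL, sh’s background threads can delay the program’s exit by about a minute; calling (shutdown-agents) at the very end avoids that.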

SUMMARY

As usual, we covered a little more than just file handling. We also brought in some computer theory and jargon. For instance, we introduced the term clobbering for overwriting a file, and we discussed lazy vs eager evaluation. The point is that programs can be written in many ways and produce the same result; nevertheless, how they obtain that result can vary considerably. This is why it is important to understand the fundamentals of sequence, selection, and repetition. For example, you can create repetition in many ways, but you don’t need to know them all to write working code.

Next, we looked at two useful file operations: Find & Replace, and File Extraction. The ability to extract specific information, such as telephone numbers, from large files is particularly useful.

Finally, we looked at how we can access and use the host operating system (OS). Running OS commands from inside another program is powerful, and it can also be abused as a form of obfuscation, a technique hackers use to hide malicious code. For example, an attacker may hide a shell call that deletes files and directories on a specific date and time. In short, a seemingly benign program can actually contain a logic bomb!
