Basic Concepts 3 – Data and its Binary Representation

[Image: ASCII code chart, quick reference card]

If you are reading this article, it is likely that you own or have access to a computer. Devices like laptops, tablets and smartphones, but also internet servers are all computers.

Computers can contain different types of data, like texts, pictures, sound files, films, web pages, spreadsheets, etc. For each type of data, there are programs (marketing people have recently coined the term “apps”, but that is the same thing) that can process or display data of that type. For example, web pages are displayed using a browser, and pictures are displayed using graphics programs.

Without these programs, there would be no different data types. In the computer’s storage media, all types of data look the same. You can think of them as sequences of ones and zeros. One could just as well think of these two “digits” as the characters of a simple alphabet that contains only two characters, but normally they are interpreted as binary digits.

Other types of data are represented as sequences of these binary digits or characters. For example, a program that can display a text file as text on your computer’s screen might translate the sequence “01100001” that it finds in a file on a storage device as the lower case letter “a”. It would use a certain code (in this case a code known as ASCII) in which every lower case letter, every upper case letter, every special character (including the empty space “ ”) and every decimal digit is represented by a combination of binary digits, usually stored as a group of 8. There are 256 different combinations of 8 binary digits, so such a code could provide codings for up to 256 different characters (although ASCII itself defines only 128 characters, for which 7 binary digits would suffice; the eighth is normally stored as 0). If you want to represent scripts with larger numbers of characters, like Chinese, you need longer sequences of binary digits. For example, the Unicode standard has room for more than a million characters, each stored as 8 to 32 binary digits depending on the encoding used. That is enough “space” to provide a code for almost any script in use today (and even some scripts no longer in active use, like Egyptian hieroglyphs).
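To make the translation between binary digits and characters concrete, here is a minimal sketch in Python. The function names `text_to_bits` and `bits_to_text` are made up for this illustration; only Python’s built-in ASCII codec is assumed.

```python
def text_to_bits(text):
    """Encode ASCII text as groups of 8 binary digits, one group per character."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


def bits_to_text(bits):
    """Decode space-separated groups of 8 binary digits back into ASCII text."""
    return bytes(int(group, 2) for group in bits.split()).decode("ascii")


print(bits_to_text("01100001"))  # prints: a
print(text_to_bits("Hi"))        # prints: 01001000 01101001
```

Read as a binary number, the sequence “01100001” is 97, and 97 is the position of the lower case “a” in the ASCII table.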

Many kinds of data can be represented as sequences of characters and can be stored inside computers using such codes. Other types of data may be represented as strings of binary digits in another way. A graphics program might take some sequences of ones and zeros as representing binary numbers that in turn represent the color and brightness of the pixels of a picture. Although here the storage format would not use characters coded as binary sequences, one can still think of such data formats as some kind of code. The rules of that code are part of the program used to display or manipulate the data.
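As a rough sketch of how a graphics program might read binary digits as numbers, the following Python snippet treats 24 binary digits as one pixel: 8 digits each for the amounts of red, green and blue. This layout and the name `bits_to_pixel` are assumptions chosen for illustration; real image file formats are more involved.

```python
def bits_to_pixel(bits):
    """Read 24 binary digits as three 8-bit numbers: (red, green, blue)."""
    if len(bits) != 24 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 24 binary digits")
    # Cut the sequence into three groups of 8 and read each as a binary number.
    return tuple(int(bits[i:i + 8], 2) for i in range(0, 24, 8))


print(bits_to_pixel("111111111000000000000000"))  # prints: (255, 128, 0)
```

With this made-up rule, the sequence above stands for full red, half green and no blue, which a screen would show as orange.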

One might think of such a program as something comparable to the grammar of a language. If you hear the sounds of a foreign language that you don’t know, it is “just a sequence of sounds” for you. If you know the language, you can perceive a structure in that sequence. You perceive words, clauses and sentences, and you can understand these sentences. In a comparable way, you can look at the data stored in your computer as “just a sequence of binary digits” or you can look at it as something structured and meaningful. However, you do not need to know the “grammar” of the data type or file format yourself. That “grammatical knowledge” is automated for you in the form of the programs you are using to access the data.

So from your perspective as a user, the computer contains different types of things, like pictures, texts, music, films etc. But one can also look at it from a technical perspective. Viewed this way, all the data objects in your computer have the same form. One can view them as sequences of ones and zeros.
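The two perspectives can be illustrated with a small Python sketch in which one and the same sequence of 24 binary digits is read in three different ways. The three interpretations (text, one whole number, one pixel) are picked for illustration only.

```python
# The same 24 binary digits, written as three groups of 8.
data = bytes([0b01100001, 0b01100010, 0b01100011])

as_text = data.decode("ascii")            # read as ASCII characters
as_number = int.from_bytes(data, "big")   # read as one large binary number
as_pixel = tuple(data)                    # read as (red, green, blue) values

print(as_text)    # prints: abc
print(as_number)  # prints: 6382179
print(as_pixel)   # prints: (97, 98, 99)
```

Nothing in the stored digits says which reading is the right one. That decision is made by the program that accesses them.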

The programs are a kind of data themselves. We are going to look at this important type of data in more detail in later articles.

The important idea to take away from this article is that the different types of things that you can interact with in your computer are all made up of the same “stuff”. The distinctions that turn them into different kinds of things are introduced by the programs running on the computer. So what is going on inside a computer can be described in different ways. These different views do not exclude each other. We may think of these different views as “layers of description”. We can describe a computer from a user’s viewpoint. From this point of view the computer presents a “world” that contains different types of things. On a more technical layer, all the data inside a computer looks the same: sequences of ones and zeros.

(The picture, showing a code table for the ASCII code mentioned in the text, is from https://commons.wikimedia.org/wiki/File:ASCII_Code_Chart-Quick_ref_card.png)

4 comments

  1. The idea behind this article is to pick up people where they are standing. Many people today (and probably all of those reading blogs) have experience with computers and data. I am trying to start with computers as experienced by people and move from there to the way data is encoded internally. This prepares us for the next step, taken in the next article, where I am going to introduce the concept of Gödel numbers.

  2. I strongly suspect this is stuff you understand, but it’s worth noting that even the numbers are themselves interpretations of what’s going on inside the computer, which (in contemporary equipment) is ultimately just micro-transistors in various voltage states, one range of which we interpret to be a bit with a value of 1 and another range to be 0. But the number description layer is indeed a crucial engineering layer because it drives the design that causes the sequence of hardware states.

    I’d also note that programs, which can themselves be considered just sequences of numbers, only manipulate other numbers. It’s the I/O hardware that maps these numbers into the things we think about being there. For instance, a value at a particular memory address in the video processor circuitry translates into a specific color of pixel at a specific location on the screen. Signals interpreted as numbers coming in from a certain device address represent a specific key being pressed on the keyboard, etc.

    1. You are absolutely right with what you are writing here. Thinking of what is inside a computer as numbers or characters etc. is an abstraction, an as-if-construction. And what you write about the role of devices like screens or speakers is absolutely correct.

      The description I am giving here is definitely not complete and a bit simplistic. I want people without much of a maths or technology background to be able to understand the following articles. You are maybe not really a member of the target group of these preliminary articles. 🙂

      One could introduce concepts like computation, Turing machines etc. without any reference to physical computers, as is done in some textbooks on computability theory, where algorithms are treated as mere mathematical objects. However, I prefer here, for didactic reasons, to start with what people know, which is physical computers. This approach then forces me to look at computers through the glasses of abstraction. There is maybe no optimal way of explaining things. But I want to avoid the pitfall into which many Wikipedia articles on scientific and mathematical matters have fallen: they have fallen into the hands of experts. As a result, they become very exact but at the same time completely incomprehensible for the non-expert. In order to be comprehensible, you have to accept a certain level of haziness and then maybe go through the same stuff again a second and third time. You build up some simple pre-understanding and prepare the audience to understand certain things first, then you can provide the details later.
      The question of the ontological status of abstractions such as numbers in computers (where actually there are just voltages etc.) is indeed an interesting one, but it is not in the direction I am moving in at the moment (which is “upwards” towards the more abstract descriptions).

      1. nannus, I understand and agree completely, particularly about the Wikipedia articles, which in many cases have become largely useless to the uninitiated. Sorry, had no intention of disrupting what you’re doing. I just thought those points were interesting. Looking forward to the rest of the series.
