Why TRUE + TRUE = 2: Data Types

In the early days of computing, programmers needed to be very sure about the data they were operating on. If an operation was fed a number when it was expecting a character, at best you might get a garbled response, and at worst you could break the system. Maybe even physically. Now, at the low level of coding, yeah, that's still true. But these days we have programming languages where we don't always need to be so rigorous about defining the data, and we can let the computer figure it out. And for something that seems so technical, this can be controversial.

When you write a computer program, you use variables, which are basically just labelled buckets of memory. Inside that bucket is some data, and you can change it, you can vary it. Hence, "variable". I know it's a massive simplification, but computer memory is a bit like an enormous rack of switches storing ones and zeros that represent other things, like letters and numbers. But if we look into a region of memory (1 0 0 0 1 1 1 1), there is nothing in there to indicate what those ones and zeros are actually representing. So in the code, we declare that the variable is a particular type.

What's contained in that variable, in that bucket? It's an integer.

What’s in that output one? It’s a string of characters.

That tells the computer how to interpret those ones and zeros in memory. The types that you get to use can differ a bit between languages, but in general you will at least have: Integer, or INT. That's a whole number that can't have anything after the decimal point. Integers are extremely useful for storing things like the number of times you have looped through some code, how many points your player's clocked up, or how many pennies there are in someone's account (in binary, 0 0 1 0 1 0 1 0 is 42). Then you have got character, or CHAR.

These are letters, numbers, punctuation, and whitespace, like the space between words, and instructions to start a new line. In most high-level languages, you will probably be using a STRING instead, which is just a string of characters. Then you have got Boolean, or BOOL, named after George Boole, an English mathematician. That's very simple: it's TRUE or FALSE. A boolean only contains either a zero or a one. A yes or a no. A no or a yes. Then there are floating-point numbers, or FLOATs. Floats are complicated and messy and a whole other lecture, but in short, they let you store numbers with decimals, like 42.5, although you might lose a very small bit of precision as you do it. There are other types in a lot of languages, and I know it's more complicated than this: but this is just the basics.
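As a quick sketch of those basic types, here is how JavaScript (the language this article turns to shortly) reports them. Note that JavaScript is a slight outlier: it uses a single `number` type for both integers and floats.

```javascript
// typeof reports how the language is interpreting a value.
console.log(typeof 42);    // "number"  — JavaScript has one number type
console.log(typeof 42.5);  // "number"  — floats are that same type
console.log(typeof "42");  // "string"  — same characters, different type
console.log(typeof true);  // "boolean" — TRUE or FALSE
```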

So. Most languages use "explicit type declaration": when you set up that bucket, you have to also declare its type. So, x is an integer, it can only hold integers, and right now that integer is 2: int x = 2 (0 0 0 0 0 0 1 0). But in some languages, including some popular ones that people tend to get started with, and that I like, you don't need to actually declare that. It just gets figured out from your code. That's called "implicit type declaration". So in JavaScript, you can just type x = 1.5 and it will know: that's a number. Put "1.5" in quotes, and it will go: ah, that's a string. So it's storing 1.5 and "1.5" as different ones and zeros.

Why does that matter?

Well, in JavaScript, the plus sign means two different things. It's the addition operator, for adding two numbers together. But it's also the concatenation operator, for joining two strings together. So if x is the number 1.5, and you ask for x + x, it returns 3. But if either of those xs is the string "1.5", like xa = "1.5", xb = 1.5, then xa + xb returns "1.51.5". Converting from one data type to another like that is called "type casting". Some languages require the programmer to explicitly request the conversion in code. Other languages, like JavaScript there, do it automatically. JavaScript is referred to as having "weak typing", as opposed to "strong typing". And it's weak because, even if that 1.5 is a string, and you ask for it multiplied by 2, it will return 3: x = "1.5", x * 2 is 3. Unlike the plus sign, that asterisk can only mean "multiply", so it can only handle an integer or a floating-point number. Give it a string, though, and it won't throw an error like a strongly-typed language would. It will just convert it for you on the fly. Really convenient. Really easy to program with. Really easy to accidentally screw things up and create a bug that will take you hours to track down. Or worse, create a bug that you don't even notice until much, much later.

In a lot of languages, you can also cast to and from boolean values, which is called "truthiness", and experienced programmers reading this may already be grimacing. Truthiness is a great shorthand. If you convert an empty string to a boolean, it generally comes out as false; anything else comes out as true. So you can just test for an empty string with if(x). But it also means that in JavaScript, you can ask for true + true and it will tell you that the answer is 2, because when you cast true to a number you get 1. In PHP, a language notable for many questionable design decisions, even a string containing just a single zero, "0", will get converted to a boolean false; there is a special case just for that string. Which can cause a lot of unexpected bugs.

Now, there is a workaround for that in loosely-typed languages. Normally, if you want to compare two variables, you use two equals signs, like this: 1.5 == "1.5". You can't use a single one, because that's used for assigning variables. I have been coding for about thirty years and I still absent-mindedly screw that up sometimes.

Now, if you ask whether 1.5 is equal to "1.5" with two equals signs in JavaScript or PHP, you will get true. But if you add a third equals sign, ===, then you are asking for strict equality: if the data types don't match, the comparison will automatically fail.
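Loose versus strict equality in JavaScript:

```javascript
console.log(1.5 == "1.5");   // true  — loose equality coerces the string to a number first
console.log(1.5 === "1.5");  // false — strict equality fails when the types differ
console.log(1.5 === 1.5);    // true  — same type, same value
```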

So why is all this controversial? Well, languages like JavaScript and PHP get a bad reputation because they use weak typing. If you see yourself as a Real Programmer (and I am using that sarcastically), the kind of programmer who is in control of everything, then yeah, you can see weak typing as training wheels: something that introduces sloppy coding practices, shortcuts, and bugs. And that's not unfair. But weak typing also makes programming easier to learn and easier to do; it can reduce frustration and just make programmers' lives easier. It is a trade-off, even if it is a controversial one.

This article is based on a video by Tom Scott.
