
Not my type

Posted on Oct 22

An irreverent yet positively innovative approach to data typing

I love programming. I discovered computers and wrote my first programs when I was a kid, and I've never stopped since.

For some time now, I've been working on a Web App framework, and this led me to think about a lot of things, among them data typing.

It took me quite a long time to realize that the conventional approach to variable types in most programming languages leads developers to write programs that require tremendous effort to deal with the validation, adaptation, storage and rendering of the values they manipulate.

The reason is that traditional data typing is very permissive, tends to prioritize technical needs over human reasoning, often lacks context validation, and is rarely explicit about how values should be represented for humans.

This article presents a solution that aims to tackle these limitations by suggesting a more complete, flexible and unambiguous way to define and handle data.

Then you're using ambiguous typing, and you probably have to deal with validations and conversions manually. This article presents the advantages of explicit typing, and suggests that a simple syntax inspired by the Media Type notation could be used as a supplement to, or even a replacement for, traditional data types.

Example: "A money amount with up to 12 integer digits and 4 decimal digits" (FASAB/GAAP compliant):

As a developer, I deal with variables and types all the time. It is so common that it has become difficult to explain what exactly a variable is. Let's try anyway.

Programming languages rely on data types to define how each piece of data has to be stored in memory, and to guarantee that the operations performed on the data are consistent.

Programs use names to identify the memory locations assigned to store the data.
These names are called "variables". In most cases, the value stored in a variable can change during the execution of the program (which is why it is called a "variable"). A variable can be used in computations, assigned to other variables, or passed as an argument to functions or methods.

Most programming languages use types to classify variables according to the kind of data they represent. The most common types, found in nearly all programming languages, are: bool(ean), int(eger), float(ing-point number) and string.

Whatever the purpose of the processing, when a program handles variables there are 4 key considerations from the developer's perspective:

- Data modeling
- Data storage
- Data exchange
- Data representation

In most cases, programming languages rely on a single concept of "type" to manage all these considerations, which often turns out to be incomplete or even impossible. As a result, some (and sometimes all) of these tasks are left to the discretion of the developer, on a case-by-case basis.

Data modeling consists of stating the kind of data (a human concept) that is stored in a variable (a computer concept), and specifying whether some constraints must be applied to it. The questions it addresses are "how to validate it?" and "what operations can be performed on it?".

In most situations, the type of a variable is used to control the operations that can be performed on it. It is also used to check that a variable passed as a parameter to a function has the type expected by the function signature.

This is because the intended use of a piece of data always comes with some implicit restrictions. As a consequence, some operators may be forbidden for variables of a certain type. For instance, the operator / is valid for numbers (where it means "perform a division") but makes no sense for strings.

Also, depending on the language, the same operator can have a different meaning depending on the type of its operands.
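Python, used here purely for illustration, shows the same effect: the operand types alone decide whether an operator is allowed and what it does. The helper name `describe` is hypothetical and only serves this sketch.

```python
def describe(a, b):
    """Apply + and / to two operands of the same type and report the outcome."""
    results = {"+": a + b}  # addition for numbers, concatenation for strings
    try:
        results["/"] = a / b
    except TypeError as exc:
        # Division is simply not defined for str operands.
        results["/"] = f"TypeError: {exc}"
    return results


print(describe(6, 3))      # {'+': 9, '/': 2.0}
print(describe("6", "3"))  # '+' gives '63'; '/' reports a TypeError
```

Note that Python rejects `6 + "3"` with a TypeError, whereas JavaScript coerces it to "63". Both behaviors are decided by the types alone, without any knowledge of what the values are meant to represent.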
In JS or C++, for instance, the operator + applied to numbers means "perform an arithmetic addition on these values", while applied to strings it means "concatenate these strings".

Observations:
Most languages let developers define variables among the following models: character, number, string, binary value and array.

But these types can describe a lot of things:

The model should not only tell "what it is" but also "what it is intended for". Doing so makes it possible to bind constraints to a variable. The advantage of binding constraints to data types is that, when a submitted value does not comply with what is expected, it is easy to automatically give the user accurate feedback about why the submission was rejected.

Data storage relates to the amount of memory the computer must reserve in order to store the variable. The question it addresses is "what amount of memory must be allocated?".

Types are almost always used to tell the computer how much memory (how many bytes) should be allocated to store the value of a variable.

In the early days of computing, when every byte was valuable, efficient resource allocation was a crucial consideration, and choosing the most appropriate type in terms of memory was decisive to make the most of every available bit. But it also resulted in unintuitive and sometimes ambiguous notations. For instance, in C and C++ the standard only guarantees that int is at least 16 bits wide, so it can be stored on 2 or 4 bytes depending on the environment.

Here are the elementary computer memory units:

The way types are defined depends mostly on the machine architecture (CPU register capacity: 16 bits, 32 bits, 64 bits, ...) and on the ranges of values (min, max) commonly expected at different levels of precision, expressed in multiples of bytes (int32, unsigned int, long long int, ...).

Here are a few examples of variable declarations (as found in common programming languages) that are ambiguous for humans in terms of memory allocation:

Observations:

Data exchange relates to the transfer of data and variables from one environment to another. The question it addresses is "how to convert a piece of data received from the outside into a local variable, and convert it back when sending a response?".

Each environment may store variables using its own types and representations.

When the application receives data from the outside, nothing guarantees that it can be converted from the input format to its native format (that of the programming language) consistently and without exceeding the capacity of the underlying layers (available or allocated memory).

The same goes for sending data: we have no clue how to format the data so that it is correctly interpreted, apart from the notations suggested by official standards such as ISO. And again, nothing guarantees that the system we're communicating with follows these standards.

Observations:

Data representation relates to encoding a value in a way that makes it understandable by humans and valid within a given context. The question it addresses is "how to display it within a given context?".

For some types of variables, outputting data is not trivial and can involve many possible variations (for instance when preferences or user settings are involved). In such cases, it is necessary to be explicit about how the data must be displayed.

An additional difficulty is that some representations vary depending on "locale" settings (e.g. the character used as decimal separator) or on international notation conventions (e.g. the number of digits in a phone number, or digit grouping conventions).

Dates are a good example: various strategies can be found depending on the tools and environments (timestamps, formatted strings, ISO notation), and this matter is still the source of many issues in modern software.

Observations:
Unlike computers, humans know that a real is a number and that a string is a piece of text. On the other hand, unlike humans, computers need to know beforehand how much memory must be allocated to store a real number or a string.

The problem is that the human definitions of "text" and "number" are quite vague (respectively "a coherent set of signs that conveys some kind of informative message" and "symbols that represent an amount"), and they give no clue about storage or rendering.

To enable a computer to handle a value accurately, without requiring the developer to program in every detail how to store and manipulate it according to its intended use, the associated variable should have an explicit type.

In other words, explicit typing describes a variable in a way that covers the 4 aspects presented above: modeling, allocation, adaptation and formatting.

The suggestion made in this article is that explicit typing can be achieved with a single descriptor.

Below is a proposal, in the form of a proof of concept, that uses a notation close to the Media Type (MIME) syntax to build such descriptors.

Using the MIME notation, in conjunction with concepts for which an unambiguous definition or an international standard exists, provides explicit typing.
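Independently of any textual syntax, the idea that one descriptor carries all four aspects listed above (modeling, storage, exchange, representation) can be sketched in Python. `ExplicitType`, its fields and the `PERCENT` example are hypothetical and are not part of the article or of any framework; the storage aspect relies on the standard `struct` module, whose `<` prefix selects standard, platform-independent sizes, which is exactly the kind of explicitness that C's `int` lacks.

```python
import struct
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ExplicitType:
    """Hypothetical descriptor bundling the four aspects of a value."""
    name: str
    validate: Callable[[object], bool]  # modeling: is the value acceptable?
    struct_format: str                  # storage: explicit binary layout
    parse: Callable[[str], object]      # exchange: incoming text -> native value
    render: Callable[[object], str]     # representation: native value -> display text

    def storage_size(self) -> int:
        """Number of bytes needed to store one value."""
        return struct.calcsize(self.struct_format)

    def pack(self, value) -> bytes:
        """Serialize a value using the declared binary layout."""
        return struct.pack(self.struct_format, value)

    def receive(self, raw: str):
        """Convert external input and reject it if it breaks the constraints."""
        value = self.parse(raw)
        if not self.validate(value):
            raise ValueError(f"{raw!r} is not a valid {self.name}")
        return value


# Hypothetical example: a percentage, stored as a single unsigned byte.
PERCENT = ExplicitType(
    name="percentage",
    validate=lambda v: isinstance(v, int) and 0 <= v <= 100,
    struct_format="<B",  # '<' = little-endian, standard sizes; 'B' = 1-byte unsigned int
    parse=lambda s: int(s.strip().rstrip("%")),
    render=lambda v: f"{v}%",
)

value = PERCENT.receive(" 42% ")
print(value, PERCENT.storage_size(), PERCENT.render(value))  # 42 1 42%
```

Calling `PERCENT.receive("150")` raises a `ValueError` naming the rejected input, the kind of automatic and accurate feedback described earlier. The proposal in this article goes one step further: deriving all of this from a compact textual descriptor instead of hand-writing each function.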
As an example of a concept backed by a standard, a "language" can be assumed to be a value that holds 2 or 3 lowercase letters and whose possible values are provided by the ISO 639-1 and ISO 639-2 standards.

The term "usage" is used to distinguish explicit types from primitive types.

Here is the full syntax of a "usage" descriptor:

type [ ["/" subtype[.variation][":" length]] ]["{" min, max "}"]

The primitive types can coexist, but become special cases of the "usage".

The proposition being that:

Below is a non-exhaustive list of descriptors using this notation.

When it comes to numbers, a variable commonly relates to no standard or external convention, but instead has boundaries (i.e. minimum and maximum possible values).

For that, we can use a notation similar to the {n,m} quantifiers of regular expressions (https://docs.oracle.com/javase/tutorial/essential/regex/quant.html).

Here are a few examples:

This technique can advantageously be extended to strings, to define specific lengths:

Array is a kind of super type. It has a length (which can be dynamic) and holds elements of a specific type.

Various notations for arrays can be found among popular programming languages.
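The scalar part of the usage syntax above, including the {min,max} bounds that can also constrain string lengths, lends itself to a parser built on a single regular expression. The sketch below is a minimal Python illustration under stated assumptions (lowercase names, an integer length, numeric bounds); `USAGE_RE`, `Usage`, `parse_usage` and the descriptors used are hypothetical and not taken from the article or from eQual.

```python
import re
from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical regex for the descriptor syntax
#   type [ ["/" subtype[.variation][":" length]] ]["{" min, max "}"]
# Assumptions: lowercase names, an integer length, numeric min and max.
USAGE_RE = re.compile(
    r"(?P<type>[a-z]+)"
    r"(?:/(?P<subtype>[a-z0-9-]+)"
    r"(?:\.(?P<variation>[a-z0-9-]+))?"
    r"(?::(?P<length>\d+))?)?"
    r"(?:\{(?P<min>-?\d+(?:\.\d+)?)\s*,\s*(?P<max>-?\d+(?:\.\d+)?)\})?"
)


@dataclass
class Usage:
    type: str
    subtype: Optional[str] = None
    variation: Optional[str] = None
    length: Optional[int] = None
    min: Optional[float] = None
    max: Optional[float] = None

    def in_bounds(self, value: Union[int, float, str]) -> bool:
        """Check the {min,max} bounds; for strings they apply to the length."""
        measure = len(value) if isinstance(value, str) else value
        if self.min is not None and measure < self.min:
            return False
        if self.max is not None and measure > self.max:
            return False
        return True


def parse_usage(descriptor: str) -> Usage:
    """Parse a descriptor string, raising ValueError if it is malformed."""
    match = USAGE_RE.fullmatch(descriptor)
    if match is None:
        raise ValueError(f"invalid usage descriptor: {descriptor!r}")
    g = match.groupdict()
    return Usage(
        type=g["type"],
        subtype=g["subtype"],
        variation=g["variation"],
        length=int(g["length"]) if g["length"] is not None else None,
        min=float(g["min"]) if g["min"] is not None else None,
        max=float(g["max"]) if g["max"] is not None else None,
    )


month = parse_usage("number/integer{1,12}")  # hypothetical descriptor
print(month.in_bounds(12), month.in_bounds(13))  # True False
print(parse_usage("language/iso-639:2"))
```

Array descriptors are deliberately not handled in this sketch; only the scalar form of the grammar is covered.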
The most usual array notations are:

In order to respect the assumptions made earlier, and to make arrays easy to use in controllers (validation only), we should be able to determine the maximum length of the array, along with the kind of elements it holds.

For arrays as well, a Usage can helpfully complete the information about the handled data:

We know that the variable holds a series of items, and we know how to store, validate, display and convert those items.

There is a special case of non-typed arrays that may be used to accept arbitrary values or maps.

This approach can also be used to identify (or guess) the kind of data that a field stores, which can be useful in various situations:

When using production data in a DEV or staging environment, this approach makes it possible to identify sensitive or privacy-related data.

Having a Usage that tells the kind of value we're dealing with (e.g. name, birthdate, gender, address, email, iban, password) allows developers to identify which values must be obfuscated (removed or randomized) before importing data into the test environment, and/or what sample data they can be replaced with.

Also, for record-based backups, it can be used to tell which columns must be encrypted.

In the same way, coupling properties with Usages makes it easy to generate consistent sample sets for seeding a database with dummy data.

If you're curious about this or would like to see an example implementation, you can have a look at the eQual framework.

Here are the direct links to the involved files in the repository:

References:
- https://en.wikipedia.org/wiki/Data_type
- https://en.wikipedia.org/wiki/Media_type
- https://www3.ntu.edu.sg/home/ehchua/programming/java/datarepresentation.html
- https://www1.icsi.berkeley.edu/~sather/Documentation/EclecticTutorial/node5.html
- https://www.lehigh.edu/~ineng2/notes/datatypes
- https://doc.rust-lang.org/nomicon/repr-rust.html
- https://docs.oracle.com/javase/tutorial/essential/regex/
Posted on DEV Community by Cédric Françoys (cedricfrancoys).



This post first appeared on VedVyas Articles; please read the original post there.
