Architecting Usability

How to conduct heuristic inspections for evaluating software usability

Kevin Matz — Wed, 02 Jan 2013 01:13:00 +0000

Heuristics are “rule-of-thumb” design principles, rules, and characteristics that are stated in broad terms and are often difficult to specify precisely. Assessing whether a product exhibits the qualities embodied in a heuristic is thus a subjective affair.

If you inspect a prototype or product and systematically check whether it adheres to a set of heuristics, you are conducting what is called a heuristic inspection or heuristic evaluation. It is a simple, effective, and inexpensive means of identifying problems and defects and is an excellent first technique to use before moving on to more costly and involved methods such as user observation sessions.

It is usually best when a heuristic evaluation is carried out by an experienced usability specialist, but heuristic evaluations can also be very effective when they are conducted by a team of individuals with diverse backgrounds (for example, domain experts, developers, and users).

To conduct a heuristic evaluation, you should choose several scenarios for various tasks that a user would perform. As you act out each of the steps of the task flows in the scenarios, consult the list of heuristics, and judge whether the interface conforms to each heuristic (if it is applicable).

Jakob Nielsen introduced the idea of heuristic evaluations, and his 1994 list of ten heuristics, reproduced below, is still the most commonly used set of heuristics today (Nielsen, 1994, p. 30):

Visibility of system status	“The system should always keep users informed about what is going on, through appropriate feedback within reasonable time.”
Match between system and the real world	“The system should speak the users’ language, with words, phrases and concepts familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order.”
User control and freedom	“Users often choose system functions by mistake and will need a clearly marked ‘emergency exit’ to leave the unwanted state without having to go through an extended dialog. Support undo and redo.”
Consistency and standards	“Users should not have to wonder whether different words, situations, or actions mean the same thing. Follow platform conventions.”
Error prevention	“Even better than a good error message is a careful design that prevents a problem from occurring in the first place.”
Recognition rather than recall	“Make objects, actions, and options visible. The user should not have to remember information from one part of the dialog to another. Instructions or use of the system should be visible or easily retrievable whenever appropriate.”
Flexibility and efficiency of use	“Accelerators — unseen by the novice user — may often speed up the interaction for the expert user such that the system can cater to both inexperienced and experienced users. Allow users to tailor frequent actions.”
Aesthetic and minimalist design	“Dialogs should not contain information that is irrelevant or rarely needed. Every extra unit of information in a dialog competes with the relevant units of information and diminishes their relative visibility.”
Help users recognize, diagnose, and recover from errors	“Error messages should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution.”
Help and documentation	“Even though it is better if the system can be used without documentation, it may be necessary to provide help and documentation. Any such information should be easy to search, focused on the user’s task, list concrete steps to be carried out, and not be too large.”
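
One way to keep track of the judgments made while stepping through the scenarios is to log each violation against the heuristic it breaks. The following is only a minimal sketch: the scenario names, step descriptions, and the record_violation and violations_by_heuristic helpers are hypothetical, and only the heuristic names come from the list above.

```python
from collections import Counter

# Names of Nielsen's ten heuristics, as listed above
HEURISTICS = [
    "Visibility of system status",
    "Match between system and the real world",
    "User control and freedom",
    "Consistency and standards",
    "Error prevention",
    "Recognition rather than recall",
    "Flexibility and efficiency of use",
    "Aesthetic and minimalist design",
    "Help users recognize, diagnose, and recover from errors",
    "Help and documentation",
]

findings = []  # each entry: (scenario, step, heuristic, note)


def record_violation(scenario, step, heuristic, note):
    """Log one place where the interface fails to conform to a heuristic."""
    if heuristic not in HEURISTICS:
        raise ValueError(f"Unknown heuristic: {heuristic}")
    findings.append((scenario, step, heuristic, note))


def violations_by_heuristic():
    """Count the logged violations per heuristic."""
    return Counter(heuristic for _, _, heuristic, _ in findings)


# Hypothetical findings from acting out one scenario
record_violation("Book a flight", "Choose seat", "Visibility of system status",
                 "No progress indicator while the seat map loads")
record_violation("Book a flight", "Pay", "Error prevention",
                 "Expired card dates can be selected")
record_violation("Book a flight", "Pay", "Visibility of system status",
                 "No confirmation after payment is submitted")

print(violations_by_heuristic().most_common(1))
```

A tally like this makes it easier to see which heuristics the interface violates most often, but the judgments that feed it remain subjective.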

An obvious weakness of the heuristic inspection technique is that the inspectors are usually not the actual users. Biases, pre-existing knowledge, incorrect assumptions about how users go about tasks, and the skill or lack of skill of the inspectors are all factors that can skew the results of a heuristic inspection.

Heuristic inspections can also be combined with standards inspections or checklist inspections, where you inspect the interface and verify that it conforms to documents such as style guides, platform standards guides, or specific checklists devised by your project team. This can help ensure conformity and consistency throughout your application.

Focus groups as a usability evaluation technique

Kevin Matz — Tue, 11 Sep 2012 22:26:12 +0000

A focus group brings together a group of users or other stakeholders to participate in a discussion of prepared questions, led by a facilitator. A focus group could be used as a usability evaluation technique if the group is shown a demonstration of a product or prototype, and then the group’s impressions and opinions are discussed.

Focus groups might appear to be a convenient, time-saving way to get feedback from eight or ten people in a single session. In practice, however, the technique is not particularly reliable. Watching a demonstration is not the same as having the opportunity to interact with the product hands-on. And group dynamics can vary widely; different groups can come up with completely different conclusions.

Focus group discussions often tend to be dominated by one or two loud and opinionated participants, and the quieter participants often say little and go along with the group consensus. There is also the risk that the facilitator may consciously or unconsciously lead the discussion towards a particular outcome. If you choose to use focus groups, you should use them with caution and be aware of the limitations.

Analytics as a usability evaluation technique

Kevin Matz — Tue, 11 Sep 2012 20:59:05 +0000

Once your product has been released, understanding how it is actually being used is very valuable. Analytics refers to the use of instrumentation to record data on users’ activities, followed by the analysis of the collected data to detect trends and patterns. This data can then validate your assumptions as to which functions are being used most frequently and which parts of the product are seldom or never used, and you may be able to identify where users are running into trouble.

Some examples of the type of data that you can collect through analytics include:

Pages or screens visited, and time spent on each
Functions used, buttons and controls pressed, menu options selected, shortcut keystrokes pressed, etc.
Errors and failures
Duration of usage sessions
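
As a rough illustration of how data like this might be recorded and summarized, here is a minimal sketch. The UsageLog class, its method names, and the event names are all hypothetical, and events are kept in memory only; a real product would persist or transmit them.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class UsageLog:
    """Hypothetical in-memory event log, for illustration only."""
    events: list = field(default_factory=list)

    def record(self, kind: str, name: str, seconds: float = 0.0) -> None:
        # kind is an illustrative category: "screen", "function", or "error"
        self.events.append((kind, name, seconds))

    def counts(self, kind: str) -> Counter:
        # How often each named item of the given kind was recorded
        return Counter(name for k, name, _ in self.events if k == kind)

    def time_per_screen(self) -> dict:
        # Total seconds spent on each screen across all visits
        totals = {}
        for kind, name, seconds in self.events:
            if kind == "screen":
                totals[name] = totals.get(name, 0.0) + seconds
        return totals


log = UsageLog()
log.record("screen", "Inbox", 42.0)
log.record("function", "Reply")
log.record("function", "Reply")
log.record("function", "Archive")
log.record("screen", "Inbox", 8.0)
log.record("error", "SendFailed")

most_used = log.counts("function").most_common(1)
print(most_used)               # [('Reply', 2)]
print(log.time_per_screen())   # {'Inbox': 50.0}
```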

Websites and web apps are well suited to logging and tracking user activities. Many web analytics packages and services can provide additional contextual data such as the user’s geographic location, whether they have visited the site before, and what search terms were used to find the site if the user visited via a search engine.

Desktop and mobile apps can also collect usage data, but because of privacy concerns and regulations, it is important to declare to the user what data you intend to collect, and you must gain the user’s permission before transmitting any usage data.

No matter what type of product you offer, privacy concerns are important and you must ensure that your practices and Terms of Service follow the legal regulations appropriate for your jurisdiction. Tracking abstract usage data such as button presses is generally acceptable, but it is usually considered unacceptable to pry into content the user creates with the product.

Quantifying cognitive load and task efficiency

Kevin Matz — Thu, 23 Aug 2012 04:56:15 +0000

If we wanted to attempt to quantify the cognitive load — i.e., the thinking and effort involved — for performing a particular task, we could write out a list of the actions or operations that a user would have to do to carry out that task under normal circumstances. We could then estimate or assign a score, representing our idea of the effort involved, to each individual action, and then sum up all of the scores to get a total effort score for the task.

The KLM-GOMS model, the Keystroke-Level Model for the Goals, Operators, Methods, and Selection Rules analysis approach (Card, Moran, and Newell, 1983), is one analysis technique based on this idea, but instead of assigning scores representing effort, it estimates the time required to do each action. The amount of time it takes to complete a task is a good proxy for physical effort, although it does not accurately measure the intensity of mental effort.

Let’s take a very condensed tour of the KLM-GOMS approach.

To accomplish a goal, the user will break the work into tasks, and for each task unit, the user will take a moment to construct a mental representation and choose a strategy or method for carrying out the task. This preparation time is called the “task acquisition” time, and can be very short — perhaps 1 to 3 seconds — for routine tasks, or much longer, perhaps even extending into several minutes, for creative design and composition tasks.

After the task acquisition, the user carries out the task by means of a sequence of actions or operations. The total time taken to carry out the actions is called the “task execution” time. Thus the total time required to complete a task is the sum of the task acquisition and task execution times.

To estimate the task execution time, KLM-GOMS defines basic operations (we assume here that we are dealing with a keyboard-and-mouse system):

Operation	Name	Description	Suggested average values
K	Keystroking	Pressing a key or mouse button, including the Shift key and other modifier keys	Best typist: 0.08 sec; good typist: 0.12 sec; average typist: 0.20 sec; worst typist: 1.20 sec
P	Pointing	Moving the mouse pointer to a target on the screen	1.1 sec
H	Homing	Moving a hand from the keyboard to the mouse or vice-versa	0.40 sec
M	Mental operation	Preparation	1.35 sec
R	System response operation	Time taken for the system to respond	varies

So to use the mouse to click on a button, we would have a sequence of operations encoded as “HPK”: homing, to move the hand to the mouse; pointing, to point the mouse cursor to the button; and a keystroke, representing the pressing of the mouse button.

In addition to these operators, the KLM-GOMS model also includes a set of heuristic rules governing how the “M” operation, the mental operation, is to be inserted into an encoded sequence. For instance, “M” operations should be placed before most “K” and “P” operations, except for various special cases. So the “HPK” sequence discussed above would become “HMPK”. The rules are fairly arcane and we won’t go into the details here.

As an example, let’s consider the task of finding instances of a search term in a document in a text editor. One possible sequence of actions to accomplish this might be:

Click on the “Search” menu
Click on the “Find text” item
Enter “puppy” as the search term in dialog
Click on the “OK” button

This can be encoded using KLM-GOMS and used to formulate an estimate of the average time required as follows:

Action/Operation	Encoding	Time (s)
Task acquisition	(none)	1.5
Click on the “Search” menu	H[mouse]	0.40
	MP[“Search” menu]	1.35 + 1.1
	K[“Search” menu]	0.20
Click on the “Find Text” item	MP[“Find Text” item]	1.35 + 1.1
	K[“Find Text” item]	0.20
	H[keyboard]	0.40
Enter “puppy” as the search term	5K[p u p p y]	5(0.20)
Click on the “OK” button	H[mouse]	0.40
	MP[OK button]	1.35 + 1.1
	K[OK button]	0.20
Total		11.65 s
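
The arithmetic in this table can be reproduced with a few lines of code. This is only a sketch: the operator times are the suggested averages from the operator table (with the “average typist” value for K), the estimate_seconds helper is a hypothetical name, and the system response operator R is left out because its value varies. The keyboard-shortcut alternative at the end is an assumed encoding for illustration, including its placement of the M operations, since the placement rules are not covered here.

```python
# Operator times in seconds, from the table of suggested average values
OPERATOR_SECONDS = {
    "K": 0.20,  # keystroke or mouse-button press, average typist
    "P": 1.10,  # pointing with the mouse
    "H": 0.40,  # homing between keyboard and mouse
    "M": 1.35,  # mental preparation
}


def estimate_seconds(encoding: str, acquisition: float = 0.0) -> float:
    """Sum the operator times for an encoding such as 'HMPK', plus acquisition time."""
    return acquisition + sum(OPERATOR_SECONDS[op] for op in encoding)


# The "find text" task above: three mouse clicks plus typing five letters
find_via_menu = "H" + "MPK" + "MPK" + "H" + "K" * 5 + "H" + "MPK"
menu_total = round(estimate_seconds(find_via_menu, acquisition=1.5), 2)
print(menu_total)  # 11.65

# Assumed alternative: a two-key shortcut, the five letters, then Enter
find_via_shortcut = "M" + "KK" + "K" * 5 + "M" + "K"
shortcut_total = round(estimate_seconds(find_via_shortcut, acquisition=1.5), 2)
print(shortcut_total)  # 5.8
```

Encoding each design alternative as a string makes it cheap to compare several candidate interactions side by side.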

Of course, we would expect a more skilled user to be able to accomplish the same task in substantially less time by using shortcut keystrokes rather than the mouse, and by typing faster than an “average” user.

There are obviously limitations to this kind of analysis: it provides only a rough estimate, it assumes that users know the right sequence of actions to complete a task, and it does not account for errors and mistakes. But when you are designing an interface and considering how to design an interaction, methods such as the KLM-GOMS model give you a way to compare the efficiency of different alternatives. All other things being equal, the alternative that can be done in the least amount of time is the most convenient to the user, and may involve the least cognitive load.

The impact of hardware devices on software ergonomics

Kevin Matz — Sun, 19 Aug 2012 11:42:12 +0000

A product that is ergonomic is designed in a way that helps reduce physical discomfort, stress, strain, fatigue, and potential injury during operation. While ergonomics is usually associated with physical products, the design of a software application’s interface also influences the way the user physically interacts with the hardware device on which the application runs. And ergonomics also extends to the cognitive realm, as we seek to design software that helps people work more productively and comfortably, by reducing the dependence on memorization, for example.

To create an ergonomically sound software application, it is important to first think about the properties and the context of use of the hardware device on which the application will run. For the majority of consumer and business applications, there are currently three main forms of general-purpose personal computing devices:

Desktop and laptop computers with a screen, keyboard, and a pointing device such as a mouse or trackpad, are comfortable for users sitting at a desk for a long period of time.
Tablet devices with touchscreens have a form factor that is comfortable for sitting and consuming content (reading webpages, watching movies, etc.), but entering information and creating content via touchscreen control is generally not as comfortable and convenient as with a desktop machine.
Mobile phones (and similar devices such as portable music players) are usually used for short bursts of activity while on the go.

For more specialized applications, you might have a combination of software and custom-designed, special-purpose hardware. Examples include a machine that sells subway tickets, an automated teller machine, or an industrial thermostat control. If you are a designer for such a product, you may have responsibility for designing the form of the physical interface in addition to the software.

To give you an idea of some of the practical ergonomic aspects that you should keep in mind when designing for different devices, let’s compare desktop computers with touchscreen tablets:

Tablet devices with multi-touch touchscreens are pleasant and fun to use from an interaction standpoint because you can interact directly with on-screen elements by touching them with your finger. Desktop machines (as of this writing) generally don’t offer touchscreens, as reaching your arm out to the monitor places strain on the arm and shoulder muscles and would quickly become physically tiring. Desktop setups thus rely on pointing devices such as the mouse or trackpad. These pointing devices introduce a level of indirection, however: moving the pointing device moves a cursor on the screen.

On desktop systems, there is a pointing device cursor (mouse arrow), whereas touchscreen devices have no such cursor. Some mouse gestures, like hovering the cursor over a control, thus have no counterpart in touchscreen systems. On both desktop and touchscreen systems, however, a text cursor (caret) appears when a text field receives the focus.

While a mouse may have multiple buttons, and clicks can be combined with holding down modifier keys (Control/Alt/Command/Shift), touchscreens don’t offer as many options. When you drag your finger across the screen, is it to be interpreted as a scrolling gesture, or an attempt to drag and drop an object on the screen? Cut-and-paste and right-clicking to get a context menu are easy on a desktop machine, but on a tablet, such operations require double-touch or touch-and-hold gestures that are not always evident.

Fingers range in size substantially; young children have small, narrow fingertips, whereas some men have very thick, fat fingers. Touchscreen buttons and icons thus must be large enough to accommodate “blunt” presses without triggering other nearby controls. In contrast, the mouse arrow allows pixel-precise pointing, and so buttons and icons can be substantially smaller on desktop applications than on touchscreen devices.

When the user is touching something on the screen, the user’s finger and hand will obscure part of the screen, so you have to be careful about what you display and where, so that important information is not hidden. When pressing an on-screen button, the user’s fingertip will obscure the button being pressed. Because button presses don’t always “register”, users seek visual feedback to see that the button press worked, and so you either need to make the buttons large enough so that the animation of the button being depressed is visible, or you should give some other clue when the user retracts the finger to show that the button was pressed (maybe pressing a Next button makes the application navigate to the next screen, which is very clear feedback that the button press was successful). Auditory feedback, like a clicking sound, can also be useful as a cue that the button was pressed successfully.

Mobile devices and tablet devices are often held by the user in one hand while standing, and so the user has only the other hand free to operate the touchscreen.

When designing a product, understanding the constraints and limitations, as well as the opportunities, of the hardware devices the software will run on will help you design appropriate and comfortable interactions.

Software requirements in a nutshell

Kevin Matz — Mon, 30 Jul 2012 11:59:44 +0000

Requirements are statements of things that your product must achieve for it to be considered successful. If you are building a customized solution for a client, requirements express the wants and needs of your client. If you are building a product for sale to a wider market, requirements express the aggregate wants and needs of potential customers that must be met for the product to sell enough copies to be economically successful.

Functional requirements are the features your product will have, in terms of the functions, actions, and behavior it will support. For example, some functional requirements for a word processor would be that it support bold and italic type, that it allow documents to be printed, and that it allow images to be embedded in documents.

Non-functional requirements are performance or quality constraints that are general or “cross-cutting” in nature. Non-functional requirements address areas such as performance, security, stability, capacity, and scalability. For example, an e-commerce website might be required to serve 100,000 users concurrently and provide a response time of 2.0 seconds or less.

The scope of your product and project is defined by the set of requirements that need to be implemented. Without careful management, additional wishes and demands will continually be added to the project’s scope. This phenomenon, known as scope creep, will threaten your ability to deliver on schedule.

Requirements should be stated in concrete, measurable terms whenever possible. For example, rather than stating that “the product must be easy to learn”, which is subjective and unprovable, you might consider a phrasing that lends itself to objective testing, such as “95 percent of users will be able to successfully process a standard passport application after the two-day training session”.

Some requirements are “definitional”: they result from, and form part of, your definition of what the purpose and market positioning of your product is. Other requirements need to be elicited from your users and other stakeholders.

Designing search systems

Kevin Matz — Mon, 23 Jul 2012 08:33:04 +0000

In document- and content-oriented applications and websites, the quality of the user experience often depends on the user being able to find what she is looking for, and so effective search functionality becomes critical when there is a large repository of content.

Let’s examine search systems and look at what factors you have to consider when designing one.

Searching

In a search scenario, the user enters a search query, and in response, the system retrieves matching items from a repository.

Alternatively, some people prefer to think of searching as a filtering mechanism: The user chooses filter criteria, and the system filters out any items that do not match those criteria.

When designing a search system, you need to think about and decide on the following:

What types of items are in the repository to be searched? Files, documents, images, videos? Can the search return multiple types of items?

What is the form of the search query?

Are there multiple fields, checkboxes, and drop-down lists that act as filter criteria?

If it is a textual search, does it search for an exact phrase match? Is it case sensitive?

Will you provide “basic” and “advanced” search interfaces to cater to different user audiences?

If you support features such as wildcards, regular expressions, and boolean operators (AND, OR, and NOT), how will you communicate to the user that these features are available, and where will you explain the proper syntax? For example, Google’s Advanced Search page offers many options like these, and it simultaneously explains the syntax for the shortcuts.

What is the scope of the search? For example, when searching documents, are matches for the search term sought only in the text of the document, or are metadata such as the filename, document title, document properties, and any tags searched as well?

Does the search attempt to find appropriate variations of the search term? Will the technique of stemming be employed so that a search for “eat” also finds instances of inflected forms like “eats”, “eating”, and “ate”?

How are search results presented? When the user submits a search query, are the results presented on a separate page? Or are the results presented on the same screen and filtered in real-time as the user modifies the search criteria?

If there are many results, are the results broken up across multiple pages?

If the search results are textual documents, is a small snippet of the text surrounding the match presented to provide context? What if there are multiple matches within the same document?

Can users save and retrieve queries that they expect to use frequently?
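
To make a few of these decisions concrete, here is a minimal sketch of an exact-phrase search over an in-memory repository. The Document class, the search function, and its parameters (case_sensitive, include_title, context) are hypothetical names chosen for illustration. They correspond to the questions about case sensitivity, search scope, and result snippets. Stemming, wildcards, and paging are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Document:
    title: str
    body: str


def search(docs, term, case_sensitive=False, include_title=True, context=15):
    """Return (title, snippet) pairs for documents containing the exact phrase."""
    results = []
    needle = term if case_sensitive else term.lower()
    for doc in docs:
        # Lowercasing is assumed to preserve character positions (true for ASCII)
        body = doc.body if case_sensitive else doc.body.lower()
        title = doc.title if case_sensitive else doc.title.lower()
        pos = body.find(needle)
        if pos >= 0:
            # Show a little text around the first match to give context
            start = max(0, pos - context)
            end = min(len(doc.body), pos + len(term) + context)
            results.append((doc.title, doc.body[start:end]))
        elif include_title and needle in title:
            # Matched only in metadata, so there is no body snippet to show
            results.append((doc.title, ""))
    return results


docs = [
    Document("Pets", "Our new puppy loves the park."),
    Document("Puppy care", "Feed twice a day."),
    Document("Cats", "No dogs here."),
]

print([title for title, _ in search(docs, "puppy")])
# ['Pets', 'Puppy care']
print([title for title, _ in search(docs, "puppy", case_sensitive=True)])
# ['Pets']
```

Even in a sketch this small, the defaults are design decisions: whether the title counts as part of the scope, and whether case matters, change which results the user sees.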

Search quality

The perceived quality of search results has a big impact on the user experience. Search quality is a function of the following aspects:

Accuracy of recall: Search results must include all of the items that match the search criteria, and exclude any items that do not match the search criteria. If the search does not locate all of the matching items, or if irrelevant items are presented, then users can become frustrated when their expectations are not met. In some cases, users may also be misled by inaccurate search results.

Relevance: It’s desirable to sort the results to show the most relevant items first, although the definition of “relevance” depends on the application. Recent news articles would usually be presented ahead of older articles, for instance, and an article featuring the phrase “financial crisis” in the headline and containing multiple references to that phrase in the body would be considered more relevant than an article that mentioned that phrase only once. Users can become frustrated when they have to dig through “noisy” results to find the items that they perceive to be relevant.

Performance: Search results should be delivered promptly. But there is often a trade-off involved here; building performant search systems can sometimes be a very tricky and costly technical challenge.
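
The two halves of the accuracy requirement above correspond to the measures that the information retrieval field calls recall (the share of matching items that were found) and precision (the share of returned items that actually match). A minimal sketch of how they could be computed for a single query follows. The function name and the item identifiers are made up for illustration.

```python
def precision_and_recall(returned: set, relevant: set) -> tuple:
    """Compare the items a search returned with the items that truly match."""
    hits = returned & relevant
    # By convention here, an empty set counts as a perfect score
    precision = len(hits) / len(returned) if returned else 1.0
    recall = len(hits) / len(relevant) if relevant else 1.0
    return precision, recall


returned = {"a", "b", "c", "d"}   # what the search showed the user
relevant = {"a", "b", "e"}        # what actually matches the criteria

precision, recall = precision_and_recall(returned, relevant)
print(precision)          # 0.5: half of the results are noise
print(round(recall, 2))   # 0.67: one matching item was missed
```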

Lookups

A specialized form of search functionality is the lookup function associated with some data-entry fields. Some fields have constraints on what values are valid, but there are too many valid values to make a drop-down list practical.

For example, a customer number field might allow the user to enter a customer number directly, but it is rare that users will have memorized the number for a particular customer, and there may be tens of thousands of customers on file. In this case, the field should provide a lookup button (or a shortcut keystroke) that allows the user to search for a customer by name. Upon selection of a customer, the customer number field is then populated with the corresponding customer number.
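
A lookup of this kind might be sketched as follows. The customer data, the lookup_customers function, and its limit parameter are hypothetical. A real system would query a database rather than a dictionary.

```python
# Hypothetical customer file: customer number -> customer name
CUSTOMERS = {
    10482: "Acme Hardware",
    20917: "Baker & Sons",
    31555: "Acme Tools",
}


def lookup_customers(name_fragment: str, limit: int = 10) -> list:
    """Return (number, name) pairs whose name contains the fragment, sorted by name."""
    fragment = name_fragment.strip().lower()
    matches = [
        (number, name)
        for number, name in CUSTOMERS.items()
        if fragment in name.lower()
    ]
    return sorted(matches, key=lambda pair: pair[1])[:limit]


candidates = lookup_customers("acme")
print(candidates)  # [(10482, 'Acme Hardware'), (31555, 'Acme Tools')]

# Once the user picks an entry, its number populates the data-entry field
customer_number_field = candidates[0][0]
```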

Finding text within a document

In document-based applications like word processors and web browsers, users will expect to be able to find all of the instances of a search term within the current document. (The generally-accepted terminology in English-language software is to “search” to locate instances of a term within a repository of documents, and to “find” to locate instances of a term within an individual document.)

For editable documents, the ability to replace instances of the search term with another term will also be expected.

Some applications will highlight all instances of a search term within a document.

Alternatives to searching

While searching is convenient, in most applications, it should not be the only means of navigation.

In many systems, there will be some items that are accessed frequently. You might offer shortcut links, for example, to provide rapid access to the most popular items, the most recently-added items, or the user’s most recently-accessed items. Allowing the user to bookmark locations or search results may be useful as well.

We can also imagine cases where the user might prefer to browse the contents of the repository. For example:

the user doesn’t know what is in the repository; or,
the user is looking for something specific, but doesn’t know the right words to describe it; or,
the user has a vague idea of what they want, but it may not be easily describable in words; it’s a case of “I’ll know it when I see it” (for example, the user wants to find an action photo of an athlete playing baseball, but the specific baseball player doesn’t matter as much as the general appearance and composition of the photo).

Hierarchical menus, keyword indices, and sitemaps can be useful strategies for allowing users to browse the repository and discover content.

Some systems might benefit from allowing users to tag items with keywords. Browsing the list of keywords then becomes another way of getting an overview of the repository contents and accessing items.

Some applications can take advantage of metaphors that simulate real-world situations. For example, a website for a bookstore or library might allow users to view the covers of books in various categories, providing an experience similar to browsing titles on a physical bookshelf.

Designing an interaction framework for your application’s tasks

Kevin Matz — Wed, 18 Jul 2012 04:32:48 +0000

Many applications are centered around a set of features, tasks, actions, operations, business processes, or use cases that share a similar pattern of interaction. For example:

A paint program has a toolbar or palette with various drawing tools. Clicking on a tool selects it, and then the user operates on the canvas with that tool until a different tool is selected.
A game might have a number of levels. Each level has a different map, but all of the levels have essentially the same gameplay, scoring, and success criteria for moving on to the next level.
A workflow-driven human resources management system might have different business processes for business events like scheduling job interviews, hiring an employee, recording employee evaluations, or adjusting employee benefits. Each business process can consist of multiple stages or subtasks that require action and approval by different users. Each business process is started by selecting it from a menu, and a business process will have an “active” status until a terminating condition is reached.

If your application has a set of similar tasks, you first will want to create a list to keep track of them.

You can then design an interaction framework that describes the commonalities of the user interface and behavior for those tasks.

Some of the issues you should consider include:

the means by which the tasks are started or triggered (e.g., selection from a menu);
the authorizations for which tasks can be initiated by which groups of users;
conditions under which the task can be activated, or cases where it may be disabled;
how the task is ended or deemed to be complete;
whether the initiation or end of a task changes any statuses or modes;
whether the end of the task leads to follow-up tasks; and
the effect that the task has on the data in the system; for example, upon task completion, the data may be saved persistently, whereas if the task is abandoned or cancelled, the data will not be saved. (These types of considerations can form part of the transaction/persistence concept.)
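
On the technical side, several of these considerations could be captured once in a shared base class that every task builds on. The sketch below is one possible shape, not a prescribed design: the Task and HireEmployee classes, the role name, the status values, and the two-approval rule are all hypothetical.

```python
from abc import ABC, abstractmethod


class Task(ABC):
    """Hypothetical base class capturing what all tasks in an application share."""

    allowed_roles: frozenset = frozenset()  # which user groups may start the task

    def __init__(self):
        self.status = "inactive"
        self.saved = False

    def can_start(self, role: str) -> bool:
        # Authorization plus the condition under which the task may be activated
        return role in self.allowed_roles and self.status == "inactive"

    def start(self, role: str) -> None:
        if not self.can_start(role):
            raise PermissionError(f"{role} cannot start {type(self).__name__}")
        self.status = "active"

    @abstractmethod
    def is_complete(self) -> bool:
        """Each concrete task defines its own terminating condition."""

    def finish(self) -> None:
        # Data is saved only if the task reached its completion condition
        self.saved = self.is_complete()
        self.status = "completed" if self.saved else "cancelled"


class HireEmployee(Task):
    allowed_roles = frozenset({"hr_manager"})

    def __init__(self):
        super().__init__()
        self.approvals = 0

    def is_complete(self) -> bool:
        return self.approvals >= 2  # illustrative rule: two approvals required


task = HireEmployee()
task.start("hr_manager")
task.approvals = 2
task.finish()
print(task.status, task.saved)  # completed True
```

Because starting, finishing, and saving live in the base class, each new task only has to state what is specific to it, which mirrors the documentation savings an interaction framework provides.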

Designing an interaction framework helps ensure that you really understand how your application fundamentally works. It ensures consistency across similar tasks, which helps users perceive patterns and form correct mental models.

Documenting the commonalities amongst the tasks in an interaction framework also saves you from having to re-document the same aspects for each individual task. The interaction framework will also be critical for helping the development team design and build the technical “platform” on which the various tasks can be implemented.

How to build a visual hierarchy to express relationships between page elements

Kevin Matz — Mon, 16 Jul 2012 06:10:12 +0000

The underlying structure of a page’s layout can be understood as a visual hierarchy, where some visual elements on the page are subordinate to others. The visual hierarchy helps guide the user’s eye through the page, and aids users in interpreting the content of the page by giving clues to the relationships amongst the elements.

Take this page for example:

The banner is the highest element in the hierarchy of this page. The banner and logo tell the viewer that everything on the page is associated with the site named in the banner.

The navigation bar on the left-hand side of the page comes second in the visual hierarchy.

The main content panel’s heading, “Events Calendar”, which describes the contents that follow, forms the third element in the visual hierarchy.

The two subheadings are subordinate to the main heading, so they come next in the visual hierarchy.

Finally, the sections of body text are subordinate to their respective headings. These come last in the visual hierarchy.

When scanning the page, the viewer’s eye will tend to look first at the banner, then move to the navigation sidebar, then the main heading. While the viewer may read the content under the main heading from top to bottom, it is likely that the viewer’s eye will be caught by the subheadings first, and then the viewer’s eye may go back to read the body text.

The visual hierarchy helps the viewer interpret the content on the page in a logical way. Let’s now take a closer look at how to create a visual hierarchy and express relationships between different elements on the page. Our main tools for achieving this are the use of similar or contrasting visual attributes of elements, and the relative positioning of elements.

Attributes

Attributes are the general properties of things on the page. Common attributes are size, shape, color, texture, and direction. For text, attributes include the typeface, weight, spacing, and decorations such as italicization or underlining. In other words, attributes are ways to style a visual element.

Visual elements that are similar or belong to the same category should share the same attributes, whereas elements that are intended to be different should have one or more contrasting attributes. If one element is intended to be stronger or more important than the other element, then the attributes should be chosen to reflect that.

For example, if you have a list or a menu, then all of the entries belong to the same class or category of elements, and so they should be styled consistently with the same attributes. But the heading that sits atop the list serves a different function. It describes or summarizes the contents of the list or menu, and so it should be styled with contrasting attributes that emphasize its dominance. The heading might be larger or bolder, or it may take a different typeface or color.

Contrast is weak when the elements being contrasted are only slightly different. When two elements differ only slightly, it can often look like the difference was made by accident. Strong contrast is produced when the differences are clearly intentional. To create intentional contrast between two elements, the general guideline is to make sure the elements differ in at least two ways; that is, at least two attributes should be different between the elements.

For the purposes of this guideline, surrounding space is often considered to be an attribute as well, so leaving a gap of whitespace between two elements can count as one of the differences.

The following diagram shows some examples of weak contrast and strong contrast between a heading and a list of items:

In example (1), there are no differences between the heading “Commodities” and the entries in the list, so it does not look like a heading at all.

Example (2) is better — the heading is in bold type — but the difference still does not stand out strongly.

Example (3) places a gap between the heading and the list. While this is also better than (1), it is still not satisfying, as the heading is set in the same type as the list entries.

Examples (4) through (6) illustrate how using two differences produces much stronger visual contrast. Example (4) uses a gap and sets the heading in bold type. Example (5) sets the heading in bold type and uses indented bullets to offset the list from the heading. Example (6) increases the size of the heading’s font and sets the heading in a different color.

The latter three examples communicate the relationship between the heading and the list entries much more effectively than do the first three examples.
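The two-differences guideline can be sketched as a quick check. The following is a minimal illustration, not an established tool: the `Style` class, its attribute names, and the `has_strong_contrast` function are all hypothetical names invented here, and the six cases are rough encodings of examples (1) through (6).

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Style:
    """Hypothetical set of visual attributes for one element."""
    size_pt: int = 12
    weight: str = "normal"
    color: str = "black"
    typeface: str = "sans-serif"
    indent: int = 0  # indentation level, e.g. for indented bullets


def differences(a: Style, b: Style, gap_between: bool = False) -> list[str]:
    """Name every attribute that differs; a whitespace gap counts as one."""
    names = [f.name for f in fields(Style)
             if getattr(a, f.name) != getattr(b, f.name)]
    if gap_between:
        names.append("gap")
    return names


def has_strong_contrast(a: Style, b: Style, gap_between: bool = False) -> bool:
    """Apply the guideline: at least two differences between the elements."""
    return len(differences(a, b, gap_between)) >= 2


entry = Style()              # plain list entry
bold = Style(weight="bold")  # heading set in bold type

# (heading style, list entry style, gap between them?)
examples = {
    1: (Style(), entry, False),                           # no differences
    2: (bold, entry, False),                              # bold only
    3: (Style(), entry, True),                            # gap only
    4: (bold, entry, True),                               # gap + bold
    5: (bold, Style(indent=1), False),                    # bold + indented list
    6: (Style(size_pt=16, color="navy"), entry, False),   # size + color
}

for number, (heading, item, gap) in examples.items():
    print(number, has_strong_contrast(heading, item, gap))
```

Note that this check only counts differences. It does not measure how large each difference is, so a heading that is one point larger would still count, even though such a slight difference can look accidental.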

Positioning

In the English-speaking world, and in other left-to-right languages, we read from left to right and from top to bottom. What is at the top of the page is considered to be more important than what is at the bottom of the page, and to a lesser extent, things on the left in a row of things are perceived to come first. (In right-to-left languages like Arabic and Hebrew, the horizontal direction is reversed.)

Thus, the top-left corner of the page is where the eye begins when scanning the page, and so the most important element in the visual hierarchy is usually placed there.

If we have two visual elements A and B, we should ensure that A is positioned either above, or to the left of, element B, when we want to show that:

Element A is more important than element B; or,
Element B is a subelement of A; or,
Element B depends on, logically follows from, or derives from, element A; or,
Element A is the cause and B is the effect; or
Element B naturally follows A in a logical sequence.

As an example, here is a poor design that I’ve encountered. One system had a screen for editing customer details that looked roughly like this:

Users were expected to enter a value in the Customer Number field and then click Retrieve. The other fields on the screen would then be populated with the data on file for that particular customer.

The above design is poor because the relationship between the customer number and the remaining fields is not communicated by the visual design.

The data on this screen is dependent on the customer number, because the customer number is the identifying piece of information, or key, for a customer record. If the user enters a new customer number and clicks Retrieve, new data for the new customer number will be presented.

But because the user will start reading the screen from the top left, the user might assume that the last name and first name are identifying the customer record. Additionally, the fact that the user is expected to locate the Customer Number field first is troubling; it is buried deep in the screen, and there are no visual cues that it is the most important element upon which the others are dependent. If it is the identifying field upon which the other fields depend, then it should be situated in a place that better communicates its importance: the upper left, where the user begins scanning the screen.

And the fact that the user has to jump from the Customer Number field up to the Retrieve button is poor design as well. There are no cues that this is how the interaction flow is supposed to work; because we read from left-to-right and from top-to-bottom, jumping from below to above is counterintuitive. The button should be moved so that there is a left-to-right or top-to-bottom flow from the Customer Number field to the Retrieve button.

Thus, one possibility for an improved layout might be something like the following:

In this design, it is clearer that the details are dependent upon the chosen customer number. There is a left-to-right flow from the Customer Number field to the Retrieve button, and there is top-to-bottom flow that leads towards the finalizing Save and Cancel buttons.

Practical aspects of visual hierarchy for user interface design

While you may not always explicitly design a visual hierarchy when creating a page composition, an awareness of the concept and an understanding of how relationships between elements can be expressed can help you produce better designs.

In large project teams, you can try to ensure some degree of visual design consistency throughout your product by creating a style guide that defines the general look-and-feel of the interface in terms of a visual hierarchy. Writing a style guide is not always easy; it’s not always possible to completely document everything that makes up a consistent set of visual designs. But by specifying rules for the styles and positioning of headings and other visual elements, and by providing page layout templates and examples, a style guide can help communicate your design intentions to your project team.

Interaction design and usability for data persistence and transactions

Kevin Matz — Fri, 13 Jul 2012 20:59:44 +0000

Most applications deal with data that needs to be stored persistently — that is, saved — so that it can be accessed later. A “persistence concept” or “transaction concept” is an explanation, at the user-interface level, of how this works in your application.

To help us understand what’s involved in interaction design for persistence and transactions, let’s first look at how data is saved in two typical classes of applications: document-oriented desktop apps, and multi-user web and client-server apps.

Document-oriented desktop applications

For document-oriented desktop applications, like word processors and spreadsheets, documents are usually saved as individual files on a disk.

When a user is working with such an application, there is a copy of the document stored in the working memory of the user’s computer. As the user edits the document, the copy in working memory is modified, and so it will no longer match the copy on disk. By saving the document, the copy on disk will be updated to match the copy in working memory. If the user makes changes to the document and then closes the document or closes the application without saving the document, then the changes will be lost.

Most people with computing experience are familiar with this model. You can indicate that your application uses this model by following standard conventions (which can vary between platforms). There should be “Save” and “Save As…” commands under the File menu, and (especially on Windows) there may be a “Save” icon in the toolbar. On Mac OS X, a black dot appears in the red “Close Window” button whenever unsaved changes are present, and this dot disappears after the document has been saved. On Windows, some applications place an asterisk next to the document title in the window’s title bar when unsaved changes are present.

The dot in the red "Close Window" button indicates unsaved changes are present
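The working-copy model described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `Document` class and its method names are hypothetical, the "disk" is simulated with a second attribute rather than a real file, and appending the asterisk after the title is an arbitrary choice.

```python
class Document:
    """Hypothetical document with a working copy and a saved copy."""

    def __init__(self, title: str, text: str = "") -> None:
        self.title = title
        self.text = text          # copy in working memory
        self._saved_text = text   # stands in for the copy on disk

    @property
    def has_unsaved_changes(self) -> bool:
        # The working copy no longer matches the saved copy.
        return self.text != self._saved_text

    def edit(self, new_text: str) -> None:
        self.text = new_text

    def save(self) -> None:
        # Bring the saved copy back in line with working memory.
        self._saved_text = self.text

    def window_title(self) -> str:
        # Mark unsaved changes with an asterisk next to the title.
        return self.title + ("*" if self.has_unsaved_changes else "")


doc = Document("report.txt", "First draft")
doc.edit("Second draft")
print(doc.window_title())  # unsaved changes are flagged
doc.save()
print(doc.window_title())  # flag disappears after saving
```

Comparing the two copies, rather than setting a flag on every edit, means that undoing a change back to the saved state also clears the indicator. Whether that behavior is desirable depends on the application.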

Some usability specialists argue that the need to know about the separation between working memory and persistent storage is a “leaky abstraction”: an underlying aspect of the technology that is exposed to the user, creating an unnecessary mental burden. The Canon Cat was a unique word processing system in the 1980s that hid the distinction between working memory and persistent storage. No “Save” command was offered because the system automatically synchronized all changes with the copy on disk. The popular word processing application Scrivener similarly saves all changes automatically every few seconds, meaning that users never have to worry about explicitly saving their work. Deviating from the conventional way of doing things can initially confuse users, though, and so Scrivener still offers a “Save” command in the File menu for convenience, even though it’s never really necessary.

Web and client-server applications

For most web applications and client-server applications, data is usually stored in a database (which in turn stores the data in files on a disk). Database systems allow many different users to access the same data simultaneously.

In applications that are backed by a database, when a user creates, edits, or deletes data in the system, these changes can be accumulated in units called transactions. If other users of the system retrieve data from the database while the first user’s transaction is still in progress, the other users will not see these changes. But when the software issues a “commit” command for the first user’s transaction, the transaction is ended, and the user’s accumulated pending changes are saved “permanently” to the database so that other users of the system can see the user’s changes.

If instead a “rollback” command is issued, the transaction is also ended, but all of the pending changes for that user’s transaction are cancelled, and the database is not updated; other users see no changes in the data in the database.
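The effect of commit and rollback on what other users see can be demonstrated with a small experiment. The sketch below uses Python’s built-in `sqlite3` module with two connections to the same temporary database file standing in for two users. The `customer` table, the `demo` and `read_address` functions, and the sample addresses are assumptions made for illustration, and other database systems may differ in the details of how pending changes are isolated.

```python
import os
import sqlite3
import tempfile


def read_address(conn: sqlite3.Connection) -> str:
    """Return the address currently visible to this connection."""
    rows = conn.execute("SELECT address FROM customer WHERE id = 1").fetchall()
    return rows[0][0]


def demo() -> list[str]:
    """Record what the second user sees at each stage."""
    seen_by_b = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        user_a = sqlite3.connect(path)
        user_b = sqlite3.connect(path)
        try:
            user_a.execute(
                "CREATE TABLE customer (id INTEGER PRIMARY KEY, address TEXT)")
            user_a.execute("INSERT INTO customer VALUES (1, '123 Main Street')")
            user_a.commit()

            # User A makes a change; sqlite3 opens a transaction implicitly.
            user_a.execute(
                "UPDATE customer SET address = '456 First Ave.' WHERE id = 1")
            seen_by_b.append(read_address(user_b))  # pending change is hidden

            # Rollback: the pending change is cancelled.
            user_a.rollback()
            seen_by_b.append(read_address(user_b))

            # Same change again, this time committed.
            user_a.execute(
                "UPDATE customer SET address = '456 First Ave.' WHERE id = 1")
            user_a.commit()
            seen_by_b.append(read_address(user_b))  # now visible to user B
        finally:
            user_a.close()
            user_b.close()
    return seen_by_b


print(demo())
```

User B sees the original address while User A’s transaction is pending and after the rollback, and sees the new address only after the commit.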

Most applications hide the technical concepts of transactions, commits, and rollbacks from the user. This hiding is done by aligning the start and end of transactions with places in the user interface where various events or task flows start and end. Terminology is also used that is more familiar to the user.

For example, we can imagine that when a dialog box such as a Properties dialog is opened, a transaction will be started. If the user closes the dialog or presses the “Cancel” button, then the transaction will be rolled back and any changes the user had made in the dialog will be lost. If the user presses the “OK” button, the user’s changes in the dialog will be committed to the database.
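One way to picture this alignment between dialog events and transaction boundaries is the following sketch. It again uses Python’s `sqlite3` module with an in-memory database; the `PropertiesDialog` class, its methods, and the `customer` table are hypothetical names chosen for illustration, and a real application would tie these calls to actual user interface events.

```python
import sqlite3


class PropertiesDialog:
    """Hypothetical dialog whose lifetime matches one transaction."""

    def __init__(self, conn: sqlite3.Connection, customer_id: int) -> None:
        self.conn = conn
        self.customer_id = customer_id
        self.conn.execute("BEGIN")  # opening the dialog starts the transaction

    def set_address(self, address: str) -> None:
        # Edits accumulate as pending changes inside the transaction.
        self.conn.execute(
            "UPDATE customer SET address = ? WHERE id = ?",
            (address, self.customer_id),
        )

    def ok(self) -> None:
        self.conn.commit()    # "OK" commits the pending changes

    def cancel(self) -> None:
        self.conn.rollback()  # "Cancel" or closing discards them


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, address TEXT)")
    conn.execute("INSERT INTO customer VALUES (1, '123 Main Street')")
    conn.commit()
    return conn


def address_on_file(conn: sqlite3.Connection) -> str:
    return conn.execute(
        "SELECT address FROM customer WHERE id = 1").fetchone()[0]


conn = make_db()

dialog = PropertiesDialog(conn, 1)
dialog.set_address("456 First Ave.")
dialog.cancel()
print(address_on_file(conn))  # change was discarded

dialog = PropertiesDialog(conn, 1)
dialog.set_address("456 First Ave.")
dialog.ok()
print(address_on_file(conn))  # change was saved
```

The user only ever sees “OK” and “Cancel”; the words “commit” and “rollback” stay inside the code.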

For multi-user systems, you also need to think about what happens when two users try to edit the same information records simultaneously.

Imagine a situation where two users are attempting to make changes to the address information on file for a particular customer. The original address on file is “123 Main Street”. User A opens the address record and starts changing “123 Main Street” to “456 First Ave.”, while seconds later, User B opens the same address record and starts changing “123 Main Street” to “789 Second Ave.” If User A presses the “OK” button to save the changes, and then User B presses the “OK” button shortly afterwards, what happens to the data on file? There are a couple of possibilities:

User A’s changes get saved, but then User B’s changes overwrite User A’s changes. So the address on file at the end is “789 Second Ave.”
User A’s changes get saved, but User B’s changes are ignored because User A was first.

Neither of these is particularly satisfying, as both users will think that their changes have been saved, but one user will have had their changes overwritten or lost without their knowledge.

One solution to this issue is to use some form of record locking: When User A opens the customer record, the system locks that record, so that if User B attempts to open the same record, she receives a message that the record is locked and unavailable for editing. When User A commits or rolls back his changes, then the lock is removed and the other users can edit the record again. One problem with locks is that if User A leaves his terminal and goes home, or if User A’s application or operating system crashes, the lock might be “stuck” in place for a long time, requiring an administrator to intervene so that other users can edit the record again.
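The basic locking behavior can be sketched as follows. This is a deliberately minimal, in-memory illustration: the `RecordLocks` class and its method names are hypothetical, and a real multi-user system would need to keep its locks somewhere all users’ sessions can reach, such as in the database itself.

```python
from typing import Optional


class RecordLocks:
    """Hypothetical in-memory table of which user holds which record."""

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}  # record id -> user holding the lock

    def acquire(self, record_id: str, user: str) -> bool:
        """Lock the record for `user`; return False if someone else holds it."""
        owner = self._owners.setdefault(record_id, user)
        return owner == user

    def release(self, record_id: str, user: str) -> None:
        """Remove the lock, but only if `user` is the one holding it."""
        if self._owners.get(record_id) == user:
            del self._owners[record_id]

    def owner(self, record_id: str) -> Optional[str]:
        """Return the current lock holder, e.g. for a 'locked by' message."""
        return self._owners.get(record_id)


locks = RecordLocks()
print(locks.acquire("customer-42", "User A"))  # User A opens the record
print(locks.acquire("customer-42", "User B"))  # User B is refused
print(locks.owner("customer-42"))              # who to name in the message
locks.release("customer-42", "User A")         # User A commits or rolls back
print(locks.acquire("customer-42", "User B"))  # User B can now edit
```

Nothing in this sketch ever releases a lock on its own, so it exhibits exactly the “stuck” lock problem described above if User A never calls `release`.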

In many applications, it makes sense to allow multiple users to view the same record simultaneously, while only one user at a time is able to edit the data. This raises the question of whether users who are viewing a record should be notified when the data they are viewing has been changed by another user. If there is no notification and if the display is not automatically refreshed, the user will be looking at “stale” data that was at one time correct, but no longer matches the current state of the database, and this may or may not be a problem depending on the nature of the application.

Collaborative web-based applications where users work together on editing the same document can present many challenges like these, and it can take some creative thinking to find usable and non-intrusive solutions to avoid or manage simultaneous editing conflicts.

Designing and documenting a transaction and persistence concept

We’ve seen that an application’s learnability and usability can be impacted by how it handles the persistence of data and manages multi-user editing conflicts, and how the persistence model is presented via the user interface and interaction design. Therefore, explicitly designing how these aspects will work from a user’s standpoint is a good idea for applications of significant complexity.

Questions you need to ask and eventually make decisions about include:

What types of data validations take place, and when is the validation performed? How are errors and warnings presented?
Can data or documents be saved when validation errors exist or when mandatory fields are empty?
At what points in the application can data be saved? How and when can any changes be lost (intentionally or unintentionally)?
If the system uses transactions, where do transactions begin and end?
Does the application save data automatically, or does it rely on the user to use some form of “Save” or “Commit” command? Are controls such as “Save” menu options or toolbar buttons prominently visible, and is it clear to the user how and when to use them?
How is the user interface structured to help users understand the persistence or transaction model? Is the state of the data (saved or unsaved) made clear?

You’ll often need to clarify some of these questions with the technical architects and developers in your project, as the technology framework being used can often dictate how some of these aspects will have to work. At the same time, just because the technology requires things to be done in a certain way does not necessarily mean that you have to expose all of the details to the user; technical details can be hidden. Whenever possible, create the design that is clearest and easiest for the user, and then build the system to support that way of working.