As people, when we give instructions to other people, we often use a combination of spatial (I'm not sure that's the right word) and verbal instructions, for example "Here, take this, and go put it over there" or "Hammer this nail in over here". This is possible when the items we're referring to are within reach and the communication is happening in person.
This method of providing instructions is different from how we give instructions to app UIs: we just click on a song to play it. But the instructions we can give are limited by the UI: what we can see or read determines what we can tell the device to do. This method is completely spatial.
This is also different from how we interact with terminals, where we type our instructions much like we would verbalize them. This method is completely textual.
A good question to ask here is whether these two can be combined to produce the same effect as how we give instructions to people. A concept might be to type "play" in a terminal-like interface on Spotify and highlight a group of songs, indicating "play these". Or, type "find album" and then click on a song to find its album. The clicks on the song would mean different things depending on the context provided in the terminal. Ultimately you could even combine actions the same way you can pipe commands in Unix, allowing the user to string together complex actions: "play these songs and then play those songs".
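Here's a minimal sketch of what that could look like under the hood, assuming a made-up Song type, a selection that comes from whatever the user has clicked, and Command functions that can be piped together (none of this is a real Spotify API):

```typescript
// Hypothetical sketch: a command takes whatever the user has spatially
// selected and returns a new selection, so commands can be piped together
// like Unix programs.
type Song = { id: string; title: string; album: string };
type SongSelection = Song[];

type Command = (input: SongSelection) => SongSelection;

// "play" queues the current selection (stubbed with console.log here).
const play: Command = (songs) => {
  songs.forEach((s) => console.log(`queueing ${s.title}`));
  return songs; // pass the selection through so it can be piped further
};

// "find album" maps the selected songs to every song on their albums.
const findAlbum =
  (library: Song[]): Command =>
  (songs) => {
    const albums = new Set(songs.map((s) => s.album));
    return library.filter((s) => albums.has(s.album));
  };

// Piping: run each command on the output of the previous one.
const pipe =
  (...commands: Command[]): Command =>
  (input) =>
    commands.reduce((selection, cmd) => cmd(selection), input);

// Usage: the spatial half (clicking a song) produces `clicked`, the
// textual half (typing "find album | play") produces the pipeline.
const library: Song[] = [
  { id: "1", title: "Song A", album: "Album X" },
  { id: "2", title: "Song B", album: "Album X" },
];
const clicked: SongSelection = [library[0]];
pipe(findAlbum(library), play)(clicked);
```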
This could drastically reduce the UI: despite the addition of the terminal, many redundant options could be removed from the interface. It's potentially also faster. For actions the UI already optimizes for, like playing songs, it would be slower, but for complex actions it could be much faster.
January 4, 2021
So actually, a lot of apps are bringing back this idea of a terminal for performing commands.
This is really interesting because it's for the use case I mentioned before: potentially complex actions can be a lot faster. It's also a way to simplify the user interface; designers can put the most commonly used and direct actions in the actual interface and hide complex ones inside the terminal. For example, in Notion, with their "Quick Find" or whatever it's called, it's just a matter of hitting Cmd+P and typing in the name of a page, no matter how deeply nested it is in my workspace.
I think the real potential of this idea is still much higher, though. Currently, most implementations seem to hardcode the options a user has given their context (are they currently writing an email?). One advancement would be to use type theory and the user's current context to let them string together many different queries, kind of like Excel formulas.
Why type systems matter for UX: an example
Consider the situation proposed at the end of this article:
I click on a comment. A status bar indicates that I have selected something of type Comment. Now that I have a handle to this type, I then ask for functions accepting a List Comment. The delete comment function pops up, and I select it. The UI asks that I fill in the argument to the delete comment function. It knows the function expects a List Comment and populates an autocomplete box with several entries, including an "all comments" choice. I select that, and hit Apply. The comments are all deleted. If I want, I can insert a filter in between the call to "all comments" and the function to delete those comments. Of course, the UI is type directed: it knows that the input type to the filtering function must accept a Comment, and prepopulates an autocomplete with common choices (by person, by date, etc.).
It's not crazy to imagine how a command line interface could be used to perform these actions, along with autocomplete based on type inference.
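To make that concrete, here's a toy sketch of type-directed lookup, assuming a hand-rolled registry of commands tagged with the type they accept (nothing here is a real framework, and the type tags are just strings for illustration):

```typescript
// Hypothetical sketch of type-directed command lookup: each command
// declares the type it accepts, and the UI filters the registry by the
// type of whatever the user has currently selected.
type Comment = { id: string; author: string; date: string; body: string };

interface TypedCommand {
  name: string;
  inputType: string; // runtime type tag used for matching
  run: (input: Comment[]) => Comment[];
}

const registry: TypedCommand[] = [
  {
    name: "delete comments",
    inputType: "List Comment",
    run: () => [], // deleting leaves an empty list
  },
  {
    name: "filter by person",
    inputType: "List Comment",
    run: (comments) => comments, // a real version would prompt for a person
  },
];

// The autocomplete box: given the type of the current selection,
// suggest only the commands that can accept it.
function suggest(selectionType: string): string[] {
  return registry
    .filter((cmd) => cmd.inputType === selectionType)
    .map((cmd) => cmd.name);
}

// Clicking a comment gives the UI a handle of type "List Comment"
// (a one-element list), so both commands above show up.
console.log(suggest("List Comment")); // ["delete comments", "filter by person"]
```

A real implementation would infer the tags from an actual type system rather than hardcoding strings, but the autocomplete logic is the same: filter the available functions by the type of the current selection.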
The real advancement, though, is a true marriage between spatial and text input: going back and forth between writing in the command line and clicking on interface elements within the same query. Imagine asking for something complex in Figma like "move all the objects away from this center point", clicking on a center point, selecting the objects to move, and then dragging an object out to define the magnitude and rotation of the movement.
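As a rough sketch of how the pieces of that query might fit together, assume made-up Point and SceneObject types (not the real Figma plugin API): the textual command names the operation, and each spatial interaction fills in one typed argument. For simplicity it only captures the magnitude of the drag, not the rotation:

```typescript
// Hypothetical sketch: a mixed text/spatial query in a Figma-like editor.
// The typed command names the operation; clicks, selections, and drags
// each supply one of its arguments.
type Point = { x: number; y: number };
type SceneObject = { id: string; position: Point };

// "Move all the objects away from this center point."
function moveAwayFrom(
  center: Point,          // filled by clicking a point on the canvas
  objects: SceneObject[], // filled by selecting objects
  distance: number        // filled by dragging an object outward
): SceneObject[] {
  return objects.map((obj) => {
    const dx = obj.position.x - center.x;
    const dy = obj.position.y - center.y;
    const len = Math.hypot(dx, dy) || 1; // avoid dividing by zero
    return {
      ...obj,
      position: {
        x: obj.position.x + (dx / len) * distance,
        y: obj.position.y + (dy / len) * distance,
      },
    };
  });
}

// Usage: each argument came from a different interaction mode.
const moved = moveAwayFrom(
  { x: 0, y: 0 },                                   // clicked center point
  [{ id: "rect1", position: { x: 3, y: 4 } }],      // selected objects
  10                                                // dragged-out distance
);
console.log(moved[0].position); // pushed 10 units further from the origin
```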
I think that level of control and command is what is needed if we want to make talking with our applications as natural as talking with each other.
February 3, 2021
Tweeted about this today from the perspective that "design is object-oriented". I don't think many people would argue with this; designing visual systems is about what you see and how things act, and what they do comes afterwards. It's related here because command lines, as I mentioned before, and keyboard shortcuts are a kind of declarative action with no visual object or interaction associated with them. Instead, they just happen.