Certainly, there are many approaches to building a home studio depending on goals, stylistic constraints and monetary resources. My aim is to configure a digital tool chain capable of professional grade recording from both digital and analog sources; whether I have the skill to produce a professional grade master is another matter entirely. This article focuses on the tool chain, not the sound engineer’s skill. My personal taste in music is ‘wall of sound’, classically infused, progressive metal. This style, like many others, requires a sophisticated chain that allows an arbitrary number of tracks from a variety of sources and the ability to manipulate those tracks in complex ways. This can all be accomplished at minimal cost using the tool chain outlined here.
I will take a ‘big-picture’ approach and outline how a number of different software applications are used together to achieve a sophisticated tool chain for recording music. I will not cover the operating details of each individual tool. However, I will reference external resources such as documentation and tutorials. Each individual tool has a learning curve of its own and will require time investment.
The foundation is a personal computer running Ubuntu Studio, a ‘mostly free’ distribution of the Linux OS geared toward multimedia production. It features a low-latency kernel that allows real-time monitoring while recording an analog source. Unless otherwise specified, it includes all the software discussed below. A reasonable quality sound interface is also required. Generally, a consumer grade sound card geared toward gaming is not appropriate. A sound interface designed for recording music, with support for multiple balanced line inputs, will save a tremendous amount of hassle and preserve fidelity. Many ‘pro’ and ‘prosumer’ grade interfaces use FireWire to communicate with the PC. For a list of all supported FireWire devices, click here. I personally use a Focusrite Saffire 4-in/10-out. Beware of MOTU interfaces. They have a poor reputation with regard to supporting the Linux community.
Browse the Ubuntu Studio menu and become familiar with its layout and contents. One of the items is ‘jackcontrol’ or ‘qjackctl’, a software version of a studio patch-bay. Consult the Jack Quick Start guide for an overview of basic functionality. Ensure that your sound interface inputs and outputs are visible from within jackcontrol before continuing. This application allows all other components of the studio to communicate with each other. It plays the role of the cables running from box to box in a physical studio.
This is where it all happens. The Ardour application is a multi-track audio mixer/recorder with extensive track editing facilities. It communicates with your sound interface via Jack and can synchronize with all other sound applications, such as drum machines and sequencers, over the Jack protocol. Besides mouse and keyboard, Ardour can be controlled via external surfaces such as Mackie or Tascam mixing consoles. Enough information to get productive with Ardour is available through their online documentation.
Ardour supports LADSPA (Linux Audio Developer’s Simple Plugin API) for effects processing modules. For most users, Steve Harris’s library will be more than enough to meet processing requirements. Effects can be added and removed from individual tracks via Ardour’s mixer console.
If you plan to record mostly ‘real’ audio sources, i.e. non-midi, then your sound interface + Ardour may be all you need. However, most folks don’t have a fully mic’ed drum set handy and many will want to emulate instruments they don’t actually play or own themselves. For these purposes, some additional software is required.
Linuxsampler is capable of playing ‘.gig’ or ‘giga’ format instrument samples which was the format of some of the highest quality sample libraries. I say ‘was’ since Tascam discontinued in-house development of the format. However, development continues within the open source community. Regardless, giga samples are still widely revered in terms of quality today and many libraries remain available. Although the tool is free for non-commercial use, one generally must pay for the instrument libraries, which can run in the hundreds to thousands of dollars. However, a fantastic concert grand piano is available on the Linuxsampler site free of cost. Due to the software license, Linuxsampler is not included with Ubuntu Studio and must be downloaded from the Linuxsampler site.
The caveats aside, it’s the best sampler I’ve been able to find for the platform. Linuxsampler itself is driven from the command line, so I recommend installing JSampler as well (also on the Linuxsampler site). This provides a graphical interface to the sampler engine. Consult the JSampler documentation for installation and usage details.
Alright, we now have a nifty sampler that can be patched to Ardour either via the ‘mixer’ window or Jack directly, but there’s a catch. Ardour does not currently support midi-tracks. That is, the recording of midi data rather than the raw audio coming from the sampler. Capturing the midi data is ideal because Ardour can play back the track, using it as a midi source, into the sampler allowing different instrument setups to be substituted. If the audio is recorded then we’re stuck with that audio. Preserving the original midi-data provides extensive flexibility for re-mixing at a later time.
Enter the application Muse, a midi/audio sequencer. Rather than directly recording the audio output of the sampler with Ardour, we’ll patch the midi controller (i.e. a midi keyboard) to Muse, patch the output of the muse midi track to the sampler, and patch the sampler output to Ardour. Muse is responsible for capturing the midi-data while Ardour optionally records the audio simultaneously. If we’re unsatisfied with the instrument or sampler settings, the midi track can simply be played back in Muse with different sampler settings and re-recorded in Ardour. Muse also provides editing facilities for midi data.
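The routing described above (controller to Muse, Muse to the sampler, sampler to Ardour) can be written down as a small patch list. The sketch below only builds the command lines for the `jack_connect` utility that ships with Jack; it does not run them. Every port name is a made-up placeholder, and it assumes the MIDI ports are exposed through Jack rather than ALSA, so check jackcontrol for the names your own setup reports.

```python
import shlex

# Signal path from the text. All port names are hypothetical placeholders;
# use the names shown in jackcontrol (or printed by `jack_lsp`) instead.
PATCHES = [
    ("keyboard:midi_out", "muse:midi_in"),            # controller -> sequencer
    ("muse:midi_out", "linuxsampler:midi_in"),        # sequencer -> sampler
    ("linuxsampler:out_left", "ardour:piano/in_1"),   # sampler audio -> Ardour track
    ("linuxsampler:out_right", "ardour:piano/in_2"),
]


def jack_connect_commands(patches):
    """Return one `jack_connect SOURCE DESTINATION` argument list per patch."""
    return [["jack_connect", source, destination] for source, destination in patches]


def as_shell_lines(commands):
    """Render the argument lists as copy-and-paste shell lines."""
    return [shlex.join(command) for command in commands]


if __name__ == "__main__":
    for line in as_shell_lines(jack_connect_commands(PATCHES)):
        print(line)
```

Keeping the patch list as data makes it easy to swap the sampler for a different instrument setup and re-record, which is the flexibility the MIDI-first workflow is meant to provide.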
Traditional hardware based drum machines are about as fun as drilling into concrete. To track a sophisticated song takes so long you just end up not doing it; the interfaces just stink. At least from my perspective, this all changed with Hydrogen. I don’t recall even needing documentation to begin using it, but a good manual is available on the site. A number of alternative drum kits are available, including a ‘death metal’ kit. Hydrogen will synchronize with Ardour over Jack. Mixing and panning of each individual drum piece can be performed through Hydrogen’s mixer. Like Ardour, Hydrogen also supports LADSPA plugins. When satisfied with the drum track and mix, just patch the output of Hydrogen to Ardour using jackcontrol and lay down the track.
Personally, I consider ‘mastering’ the art of ‘pulling the tracks together’ to sound like a cohesive performance rather than x-consecutive weekends in front of a computer. Many people would actually classify this as part of the mixing process. Either way, it can be accomplished by ‘mixing down’ tracks in Ardour, possibly with some EQ or reverb plugins.
The other side of mastering involves loudness control and saturation of the dynamic range; ‘filling the sound space’ to produce a sonic perception of ‘fullness’. The tool for this job is JAMin. It provides a suite of filters, limiters, EQs, analyzers and other processors. Tracks are routed from Ardour to JAMin and then back to Ardour. Tutorials are available on the site.
One part of the chain excluded from this discussion is tools to take the final master track from Ardour and publish it to one of many formats, mp3, CD, etc. This is because I’ve not yet reached this phase myself as of this writing. I’ll be sure to report my findings in a future update to this post.
The best of luck and satisfaction on your own musical endeavors!
As I’m no longer a student, the carbonsilicon blog was rarely receiving updates. I decided to roll the content of that blog into this personal blog and let the carbonsilicon domain expire. Most of the carbonsilicon content can be found under the Cognitive Science category of this blog.
Computer science pioneer Edsger Dijkstra published a set of aphorisms that reflected his solid belief that one’s choice of programming language affects the cognitive capabilities of the programmer. The more colorful phrases included, “The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a criminal offense” and “The tools we use have a profound (and devious!) influence on our thinking habits, and, therefore, on our thinking abilities” (Dijkstra 1982). Although never directly referenced by Dijkstra, the comments imply a belief in a “language is thought” hypothesis such as those championed by Benjamin Whorf (Carroll 1997).
The degree to which spoken language influences thought is intensely debated, with Steven Pinker amongst the most outspoken of Whorfian hypothesis critics. Daniel Casasanto (2008) refutes the popular anti-Whorfian stance proclaimed by Pinker on the grounds that Pinker’s claims are too broad in scope. Casasanto argues that one must distinguish between an Orwellian flavor of the hypothesis, where language and thought are formal equivalences, and linguistic relativism where language and thought simply influence one another. Although Pinker’s assertions may successfully challenge an Orwellian interpretation of Whorf, extending the argument to ‘weaker’ forms of the hypothesis, such as linguistic relativism, is logically fallacious (Casasanto 2008).
The degree to which programming languages in particular influence cognition has received little attention. An informal survey conducted by Richard Wexelblat (1980) revealed that some programmers believe certain languages can interfere with abstract forms of reasoning such as data-structure design and high level engineering activities. Wexelblat expressed his personal concern over the popularity of “do it quick and dirty” languages, such as BASIC, and the generation of students indoctrinated to them. Although interesting, the usefulness of such self-reports is extremely limited. Accordingly, Wexelblat concludes with a call for controlled studies on the subject. Unfortunately, to my knowledge no such study has ever been conducted. Indeed, it’s likely that an experimental approach to such research is not yet sufficiently developed.
A strong version of the Whorfian hypothesis is likely an inappropriate foundation for considering the interaction between the mind of a programmer and a programming language. The language is not the programmer’s thought. Minimally, the language is the cumulative thoughts of many contributors under many transformations. A weaker version of the hypothesis, however, may provide a starting point for the study of how programming languages influence the cognition of the programmer. Even so, significant definitional work needs to occur before a Whorfian-like hypothesis can be applied to programming languages. Here are some points I’ve identified.
1. Definition and scope of the term ‘language’. Certainly, programming languages are simpler than natural languages. There are closed definitions for the syntax and semantics for programming languages. One may assert that the set of all closed language definitions is the definition of language in the context of programmer-machine interactions. However, this definition is far too narrow as the programmer rarely interacts with the language specification itself. Rather, the interactive elements include emergent properties of the language, such as graphical and syntactical abstractions. Therefore, a useful definition of language in this domain must include all abstractions that carry meaning for the programmer including atomic elements such as keywords up to highly composite elements like user interface components. Developing a clear and agreeable definition will be challenging.
2. Capturing differences in state or state changes. This involves some method of objectively measuring changes in the programmer’s behavior due to changes in the ‘language’, using the broader definition above. These could be changes in the structure of the language that influence adaptive behavior on the part of the programmer, or changes in the ‘behavior of the language’. Therefore, operationalized metrics must be established that represent an abstract measure of machine/language state and human state.
One could begin with high level programming language qualities such as type system, syntactical style, binding style, etc. and measure how a programmer’s approach to problem solving changes after continued exposure and use of a language that sits at a particular point on this qualitative coordinate system. One method that may be used to examine changes in the programmer’s state, in order to infer changes in problem solving strategy, is examination of their semantic network via lexical decision tasks at different intervals of language exposure time. Such tasks are used to measure word or concept availability in the domain of natural language (McNamara 2005).
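As a rough illustration of what would be measured, a lexical decision task yields reaction times for targets that follow related versus unrelated primes, and the difference between the two means is usually taken as the priming effect. The sketch below computes that effect for two sessions and the shift between them. The function names and the reaction times are invented purely to show the arithmetic; they are not data from any study.

```python
from statistics import mean


def priming_effect(related_rts_ms, unrelated_rts_ms):
    """Mean RT after unrelated primes minus mean RT after related primes (ms).

    A larger value means the related concept was more readily available.
    """
    return mean(unrelated_rts_ms) - mean(related_rts_ms)


def availability_shift(before, after):
    """Change in priming effect between two sessions.

    Each session is a (related_rts_ms, unrelated_rts_ms) pair.
    """
    return priming_effect(*after) - priming_effect(*before)


# Hypothetical numbers: one programmer tested before and after a period of
# exposure to a new language, on prime/target pairs drawn from that language.
session_before = ([600, 620], [640, 660])
session_after = ([560, 580], [640, 660])

shift_ms = availability_shift(session_before, session_after)
```

A positive shift for constructs tied to the new language, and no shift for control pairs, would be the kind of pattern that suggests language-specific change in the semantic network.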
If such experiments demonstrated semantic network changes in the programmer that appeared to be programming language specific, then Dijkstra’s intuition that a programming language influences a programmer’s thought processes would gain validity. It follows that such influences would form a loop between the programmer and the machine/language, forming a cognitive system where the behavior of the programmer influences the language’s behavior, which influences the programmer’s behavior, and so forth. The practical applications of such findings would include renewed appreciation of programming language diversity and help elucidate the relative strengths and weaknesses of language properties with respect to how they combine with the programmer to solve problems.
Carroll J.B. (Ed.). (1997). Language, Thought, and Reality: Selected Writings of Benjamin Lee Whorf. Cambridge, Mass.: MIT Press.
Casasanto D. (2008). Who’s afraid of the big bad Whorf? Crosslingual Differences in Temporal Language and Thought. Language Learning. 58(Suppl. 1), 63-79.
Dijkstra E.W. (1982). Selected Writings on Computing: A Personal Perspective. (pp. 129-131). New York, NY.: Springer-Verlag.
McNamara T.P. (2005). Semantic Priming, perspectives from memory and word recognition. New York, NY.: Taylor & Francis Group.
Wexelblat R.L. (1980). The consequences of one’s first programming language. Proceedings of the 3rd ACM SIGSMALL symposium and the first SIGPC symposium on Small systems. 52-55.
I recently replicated a Shepard and Metzler (1971) style experiment in order to explore OpenGL programming under Matlab. Additionally, I wanted to manipulate certain variables, such as the type (class) of object, to verify predictions made by emerging models of top-down processes of human vision such as those mentioned in the previous post. This replication provided a convenient point to start.
In this implementation, the participant is shown a series of 3D object pairs. On some trials, the two objects are different. On other trials, one object is simply a rotation on the xy plane of the other object by a variable multiple of 20 degrees. The participant responds by pressing ‘1’ to indicate the same object or ‘2’ to indicate different objects while their response time is measured. Previous research shows that response time increases linearly relative to the number of degrees one object is rotated from the other (Shepard, Metzler 1971).
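The linear relationship can be checked by fitting a straight line to response time against rotation angle. The following is a minimal sketch using ordinary least squares; the response times are synthetic values generated from a known line purely to exercise the fit, not results from this experiment or from Shepard and Metzler.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Rotation angles in multiples of 20 degrees, as in the experiment design.
angles_deg = list(range(0, 181, 20))

# Synthetic response times in seconds (assumed: 1.0 s base plus 0.02 s per degree).
synthetic_rts = [1.0 + 0.02 * angle for angle in angles_deg]

slope, intercept = fit_line(angles_deg, synthetic_rts)
```

With real trial data, the slope estimates the time cost per degree of rotation and the intercept estimates the time spent on everything other than the rotation itself.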
The image to the right below shows the stimuli for one trial. The image to the left depicts my own results for 100 trials. My response time is far less than the typical Shepard (1971) subject but I have years of experience working with 3D graphics and, of course, wrote the code for this experiment. I’ve posted the code as I think it provides a decent example of accessing OpenGL from Matlab and a good starting point for related experiments. The code is available for download here.
Shepard R., Metzler J. (1971). Mental Rotation of Three-Dimensional Objects. Science. 171.3972, 701-703.
One of the toughest problems in the domain of biologically inspired models of vision is explaining how one can properly categorize objects while recognizing novel instances of objects within a category. For example, there are an infinite number of things that you could categorize as a rectangle, yet you have no problems deciding if something is rectangular. The problem becomes fiendish when considering more complex objects such as faces.
It is exceedingly unlikely that visual information is stored in memory for all angles of all objects encountered, as this would require virtually incalculable storage space. Therefore, object recognition is probably not based on rote comparison against stored images. An interesting and demonstrable model, described over a series of papers from researchers at MIT, proposes that top-down components of object recognition involve a set of 2D prototype objects from which a set of object transforms are learned. These learned transforms can be applied to novel concrete instances from the same object class (Poggio, Vetter 1997).
The best way to illustrate this is by example. I implemented the Poggio and Vetter (1997) model in Matlab and tested it on a set of cuboid prototypes. One can think of these as representing the cube like objects of prior experience. For this implementation, the 3D cuboid vertices were randomly generated. The 3D representations were tilted by a few degrees to avoid an accidental perspective and then projected to 2D space using a perspective projection. The resulting 2D cuboids are rendered in figure 1.
Of course, one rarely observes an object from just one angle, so we’ll add another angle. In reality, there would be many, but not all, possible transformations of the object class. For simplicity, we’ll consider one rotation about the x-axis by 20 degrees. The resulting object class is rendered in figure 2.
Upon this rotation, the model learns how to transform the set of 2D vertices defining the non-rotated object into the 2D vertices defining the rotated objects. In this implementation, the transform is learned by a single layer neural network. However, an analytic solution also exists if the vertices form a linearly independent set; for details see Poggio & Vetter (1997). At this point, we have a single function that will transform any 2D cuboid into the same, or at least very close (if class is not linearly independent), cuboid rotated by 20 degrees.
This can be demonstrated by generating a new random cuboid, one that is not in the prototype class, and using the learned transformation function to rotate the novel cuboid. Figure 3 depicts the random cuboid in red rendered in the same location as the rotated version in blue. This is intended to assist in recognizing a successful transformation.
It is key to recognize that a traditional rotation in 3D space is not being performed on the novel cuboid. The novel object is projected into 2D space and then rotated via the learned 2D->2D transform.
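The idea can be sketched in a few dozen lines of plain Python. This is not the Matlab implementation described above: instead of training a single layer network, it uses the linear-combination form of the analytic solution (express the novel view as a weighted sum of prototype views, then apply the same weights to the rotated prototype views). It also assumes an orthographic projection and fixed rather than random prototype dimensions, so that the result is exact and reproducible. All function names are illustrative.

```python
import math


def box_vertices(w, h, d):
    """Eight corners of an origin-centred cuboid with the given edge lengths."""
    return [(sx * w / 2, sy * h / 2, sz * d / 2)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]


def rotate_x(p, deg):
    a = math.radians(deg)
    x, y, z = p
    return (x, y * math.cos(a) - z * math.sin(a), y * math.sin(a) + z * math.cos(a))


def rotate_y(p, deg):
    a = math.radians(deg)
    x, y, z = p
    return (x * math.cos(a) + z * math.sin(a), y, -x * math.sin(a) + z * math.cos(a))


def view(dims, extra_x_rotation=0.0):
    """Flattened 2D view: tilt, optional x rotation, orthographic projection."""
    flat = []
    for p in box_vertices(*dims):
        p = rotate_y(rotate_x(p, 15.0), 25.0)  # small tilt avoids an accidental view
        p = rotate_x(p, extra_x_rotation)
        flat.extend(p[:2])                     # drop z (assumed orthographic)
    return flat


def solve(a, b):
    """Solve a small square system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def synthesize_rotated_view(novel_view, proto_views, proto_rotated):
    """Predict the rotated 2D view of a novel object from 2D data only."""
    k = len(proto_views)
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    # Least-squares weights of the novel view over the prototype views.
    gram = [[dot(proto_views[i], proto_views[j]) for j in range(k)] for i in range(k)]
    rhs = [dot(proto_views[i], novel_view) for i in range(k)]
    weights = solve(gram, rhs)
    # Apply the same weights to the rotated prototype views.
    return [sum(w * v[t] for w, v in zip(weights, proto_rotated))
            for t in range(len(novel_view))]


prototypes = [(2.0, 1.0, 1.0), (1.0, 2.0, 1.0), (1.0, 1.0, 2.0)]
novel = (1.5, 0.7, 1.2)  # not one of the prototypes

predicted = synthesize_rotated_view(
    view(novel),
    [view(p) for p in prototypes],
    [view(p, 20.0) for p in prototypes],
)
actual = view(novel, 20.0)  # ground truth via a real 3D rotation, for comparison
max_error = max(abs(p - a) for p, a in zip(predicted, actual))
```

The 3D rotation of the novel cuboid is computed only to check the prediction; the prediction itself never touches the novel object’s 3D coordinates.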
An interesting prediction implied by this model is that when an individual mentally rotates an object, their performance should degrade relative to the number of degrees they are trying to rotate due to repeated applications of learned transforms. For instance, if we wanted to rotate the novel cuboid 40 degrees then we could simply apply the learned transform twice and this should take twice the amount of time. It also follows that the performance should degrade linearly, as the transform is a linear function. In fact, this is exactly what was reported in the classic mental rotation study by Shepard and Metzler (1971). They found a linear increase in reaction time when deciding whether or not two objects observed from different angles were the same object as the degrees of rotation between the two objects increased.
It would be interesting to see if performance holds for rotations of different increments. Shepard and Metzler tested only one increment; all rotations were in increments of 20 degrees. Differences in performance would be expected if individuals stored more than one rotation transform for an object class. For instance, if asked to rotate something by 40 degrees one may apply two iterations of 20 or several iterations of a more granular transform. However, one may be able to jump straight to a 45 or 90 degree rotation, especially for cubes.
Furthermore, experiments investigating the effects of different object classes would provide additional insight. Although Shepard and Metzler argued their objects were novel, they were all composed of cubes. It’s possible that the visual system may be able to apply the cube transforms to the cubes composing the object individually. What would happen if pyramids were used as the atomic component as opposed to cubes? The Poggio and Vetter model examined here couples transforms to an object class. Therefore, differences should be expected when manipulating object class. More specifically, one may not expect performance increases due to learning effects to carry over to a new class.
Poggio T., Vetter T. (1997). Linear Object Classes and Image Synthesis From a Single Example Image. IEEE Transactions on Pattern Analysis and Machine Intelligence. 19.7, 733-741.
Shepard R., Metzler J. (1971). Mental Rotation of Three-Dimensional Objects. Science. 171.3972, 701-703.
Associative network models of semantic memory such as those proposed by Collins and Loftus (1975) have successfully explained complex psychological phenomena such as priming effects. In a nutshell, a given stimulus such as a word activates a node representing the word in long term memory. The activation spreads outward from the starting node to neighboring nodes that are related in meaning or context; ‘semantically related.’ If the neighboring nodes become activated enough, the elements represented by those nodes reach the level of awareness within the individual. For example, in the David Lynch film ‘Wild At Heart’ a character states “My dog barks. [pause]. In your mind you picture a dog even though I have not told you what my dog looks like.” The experience of visualizing a particular dog given only a ‘dog prime’ could be explained, in part, by spreading activation in a semantic network model. Of course, in normal circumstances the activation only spreads so far and then decays.
In researching this topic I ran across a rather original, if not entertaining, paper that found psilocybin, the psychoactive chemical in hallucinogenic mushrooms, changes the behavior of spreading activation as measured by a lexical decision task (Spitzer et al. 1996). In sum, it appears to ‘de-focus’ the spread so that words with less direct relationships begin to benefit from priming. For example, color-red is a direct pair whereas lemon-sweet is indirect. Relative to the placebo group, the psilocybin group showed a greater reduction in reaction time for indirectly related pairs compared with unrelated pairs. The study speculates that subjective effects of ‘mind expansion’ may be due to increased availability of remote semantic nodes, as if spreading activation is potentiated.
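A toy version of spreading activation makes the mechanism concrete. The sketch below is a deliberate simplification and not the Collins and Loftus model or anything taken from Spitzer et al.: the graph, the link strengths, the decay factor and the awareness threshold are all invented for illustration. Activation starts at 1.0 on the primed node and is multiplied by the link strength and a decay factor at every hop.

```python
def spread_activation(graph, source, decay=0.5, threshold=0.1):
    """Return the nodes whose activation reaches the threshold.

    graph maps each node to {neighbor: link_strength}, with strengths in (0, 1].
    """
    activation = {source: 1.0}
    frontier = [source]
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor, strength in graph.get(node, {}).items():
                incoming = activation[node] * strength * decay
                # Keep only the strongest route into each node.
                if incoming >= threshold and incoming > activation.get(neighbor, 0.0):
                    activation[neighbor] = incoming
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return activation


# Hypothetical semantic neighborhood around the prime 'dog'.
semantic_graph = {
    "dog": {"bark": 0.9, "cat": 0.7},
    "bark": {"tree": 0.4},
    "cat": {"mouse": 0.6},
    "mouse": {"cheese": 0.8},
}

focused = spread_activation(semantic_graph, "dog", decay=0.5)
defocused = spread_activation(semantic_graph, "dog", decay=0.8)
```

With the stronger decay only the near neighbors of ‘dog’ cross the threshold, while the weaker decay lets activation reach remote nodes such as ‘cheese’. That is loosely analogous to the ‘de-focused’ spread described above, though choosing decay as the parameter to vary is my own assumption.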
Collins, A.M. & Loftus, E.F. (1975). A spreading activation theory of semantic processing. Psychological Review, 82, 407-428.
Spitzer M., Thimm M., Hermle L., Holzmann P., Kovar K., Heimann H., Gouzoulis-Mayfrank E., Kischka U., Schneider F. (1996). Increased Activation of Indirect Semantic Associations under Psilocybin. Biological Psychiatry, 39, 1055-1057.
Dr. Hunter of the University of Colorado wrote a handy implementation of many statistical functions for Common Lisp. The package is called cl-statistics and is available via Ubuntu’s package manager. However, an inverse F cumulative distribution function is mysteriously absent. That is, a function that will provide an F statistic given two degrees of freedom and a percentile. Luckily, it’s pretty straightforward to modify the source to add it. In hopes of saving someone a couple hours of fumbling around, here’s the modification; just add the following function to the cl-statistics source file and add the function name to the export list near the beginning of the file.
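(defun f-distribution (dof1 dof2 percentile)
  "Return the F statistic at PERCENTILE for DOF1 and DOF2 degrees of freedom."
  (test-variables (dof1 :posint) (dof2 :posint) (percentile :prob))
  ;; Search for the x whose one-tailed significance equals 1 - percentile.
  (find-critical-value
   #'(lambda (x) (f-significance x dof1 dof2 :one-tailed-p))
   (- 1 percentile)))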
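For readers who don’t use Lisp, the general idea of inverting a significance function numerically can be sketched in Python. This is a generic bisection search, not a port of the cl-statistics internals, and the function names are my own. To keep it self-contained it is demonstrated on the upper tail of an exponential distribution, whose inverse is known in closed form; for the F case the stand-in tail function would be replaced by the F distribution’s one-tailed significance, and the target would be 1 minus the percentile.

```python
import math


def invert_tail_probability(tail, target, lo=0.0, hi=1.0, tol=1e-10):
    """Find x >= lo such that tail(x) == target.

    Assumes tail is continuous and decreasing in x, as an upper-tail
    probability is, and that target lies between tail(lo) and 0.
    """
    # Grow the bracket until the tail probability falls to the target or below.
    while tail(hi) > target:
        hi *= 2.0
    # Bisect: the crossing point always stays inside [lo, hi].
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tail(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exponential_tail(x):
    """P(X > x) for a unit-rate exponential; a stand-in for an F tail probability."""
    return math.exp(-x)


# The 95th percentile corresponds to an upper-tail probability of 1 - 0.95.
critical = invert_tail_probability(exponential_tail, 1 - 0.95)
```

For the exponential stand-in the exact answer is -ln(0.05), which gives a convenient check that the search converges to the right point.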
I have long been interested in how the selection and use of a particular programming language by the programmer influences how they represent the problem internally. Ultimately, this would influence how, or even ‘if’, they solve the problem. As part of my studies at the University of Washington, I’ve finally had the opportunity to do some actual research in this area, albeit of very limited scope. The paper below examines the observed differences in the semantic networks of three programmers. An emphasis is placed on differences between users of languages in different paradigms, such as functional versus imperative.
This interest grew back when I was making heavy use of both the functional and interactive features of the Lua language. Over time, programming began to ‘feel’ different and I noticed I was using different types of design patterns as compared to my C++ heavy days. These and many other subjective aspects encouraged me to explore other distinctly different views of software development such as Smalltalk and Qi. I became further convinced that the linguistic homogeneity pervasive across industry is detrimental to innovation.
Unfortunately, on the topic of ‘what language is suited for what’, practitioners and researchers alike have little more than anecdotes. After all, most general purpose languages are Turing complete and therefore computationally equivalent, so why does it matter? It matters when one considers the programmer and the machine as a composite unit. Then, the programmer is involved in a stimulus/response loop with the machine where the programming language and environment are the communication medium. At this level, the way the programmer perceives and responds to the machine state becomes important. I assert that language influences this interaction in an under-appreciated fashion regardless of underlying technical equivalencies.
Semantic Organization of Programming Language Constructs (postscript format)
“To be truly challenging, a voyage, like a life, must rest on a firm foundation of financial unrest. Otherwise, you are doomed to a routine traverse, the kind known to yachtsmen who play with their boats at sea… cruising, it is called. Voyaging belongs to seamen, and to the wanderers of the world who cannot, or will not, fit in. If you are contemplating a voyage and you have the means, abandon the venture until your fortunes change. Only then will you know what the sea is all about. “I’ve always wanted to sail to the south seas, but I can’t afford it.” What these men can’t afford is not to go. They are enmeshed in the cancerous discipline of security. And in the worship of security we fling our lives beneath the wheels of routine – and before we know it our lives are gone. What does a man need – really need? A few pounds of food each day, heat and shelter, six feet to lie down in – and some form of working activity that will yield a sense of accomplishment. That’s all – in the material sense, and we know it. But we are brainwashed by our economic system until we end up in a tomb beneath a pyramid of time payments, mortgages, preposterous gadgetry, playthings that divert our attention for the sheer idiocy of the charade. The years thunder by, the dreams of youth grow dim where they lie caked in dust on the shelves of patience. Before we know it, the tomb is sealed. Where, then, lies the answer? In choice. Which shall it be: bankruptcy of purse or bankruptcy of life?” – Sterling Hayden (General Jack D. Ripper from Dr. Strangelove).
The notion of an urban village has existed for several decades. In contrast to traditional city planning, an urban village attempts to minimize the daily need for personal cars, long-haul transit for foodstuffs and general dependence on a ‘foreign’ source for daily necessities. Foreign in this context may be interpreted as a non-local corporate entity. A casual glance at modern cities in the West reveals the opposite ideology, where most goods are shipped anywhere from hundreds of miles away to the opposite side of the globe. Although this strategy protects some daily needs from the toll of a local disaster, the cost in terms of energy is incalculable. That is, one cannot calculate the true cost of natural resources and, arguably, labor. Traditional city design also incurs social side effects. The consumer is not responsible for, and thus has no direct power over, the production and distribution of daily needs. This has its most obvious impact on the poor, who are beholden to the aloof and invisible master that produces and distributes their food and energy.
Alternatively, an urban village design facilitates some local food production. Certainly, growers will still expect some form of compensation for their crop. However, when production is local, labor may be exchanged with an employer who has a direct interest in the well being of the community as opposed to quarterly profits. In addition, sound planning and policy may allow lower income individuals some direct control over their own necessities. The Greenwood neighborhood of Seattle, Washington recently released the following proposition for the city of Seattle.
“…develop new standards or incentive programs that encourage incorporating food gardens into multi-family developments” and “Directing the Department of Neighborhoods to identify the most suitable City-owned properties for conversion to use for food production, asking for no less than two acres that could be developed in 2009-2010”
It remains to be seen whether these propositions succeed and at what scope they’re implemented. However, to see such ideas being seriously considered by a significant mass of the populace fills one with optimism. I will post updates as they unfold.