Human-centric jobs

“Collocations are hard to pin down, but bloody useful.” (Prof. Dr. Hans-Jörg Schmid, 2014)

Aristotle postulates the unity of “physics” and “logos” and holds that this unity is not accidental but necessary. Fenollosa later remarks: “Nature is not made up of things (nouns) and motions (verbs), but of things which move and a motion which is only that of moving things. … Our tendency to isolate either the motion or the location (verb or noun) reveals the limit of conceptuality already built into Western grammatology.” In addition, the description of ontology is governed by the constraints of language and culture. For example, our eyes can discern millions of colours, but the English language can assign only 99 words or collocations to colours. Possessing “a language is to be continuously involved in trying to extend one’s powers of articulation. … How do we articulate, that is, expand, the domain of the sayable? … New terms don’t arise simply because independently existing phenomena come to our notice and are named. They are generated out of enactment and the discourse of norms and exemplars which arises out of this enactment.” (Charles Taylor) Stephen Hawking adds: “We certainly cannot continue, for long, with the exponential growth of knowledge that we have had in the last 300 years.” (2018) And von Humboldt maintains that there is always “a feeling that there is something which the language does not directly contain, but which the [mind/soul], spurred on by language, must supply; and the [drive], in turn, to couple everything felt by the soul with a sound.”

One possible way out of this dilemma is to form combinations of existing words and then to assign these new collocations to the things, concepts, meanings and feelings that the human mind is discovering every single day.

As a consequence, my collocations dictionary is based on the Platonist theory of mind, insofar as concepts of a natural language that can be equated with a single word are called “lexical concepts”. Meanings, or concepts, are identical with word-usages. Modern neuroscientists may call them “patterns”. In contrast to some experts, however, I assume that a lexical concept can consist of more than a single word. For example, “other creditors including taxation and social security” is one lexical concept that can be defined as precisely as “obligation”. Therefore, collocations or multi-word units are “lexical concepts”.

In addition to this, I support the theory-theory approach, which postulates that lexical concepts are not learned in isolation, but rather as part of the learner’s experiences or templates (Ch. Taylor), together with the cultural environment. This view is backed by recent findings by Wiltgen and colleagues (2014), which show that during memory retrieval the hippocampus is required for the reactivation of cortical activity patterns that occurred during encoding, but that artificial reactivation of the cortical representation of a memory alone is sufficient to drive recall. And one can become aware of the entire concept of a thing or idea when one sees or hears one of its associated features. (Glass and Holyoak 1985)

“Such memory systems are called auto-associative, since they recall by association rather than by address.” (Max Tegmark, Life 3.0, 2017)
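Tegmark’s remark can be made concrete with a toy model. Below is a minimal sketch of an auto-associative memory (a small Hopfield-style network with invented patterns): a corrupted cue is completed by its content, not looked up by an address. The code is purely illustrative and is not part of the ARCS.

```python
# A minimal auto-associative (Hopfield-style) memory: patterns are stored
# with a Hebbian rule and recalled from a partial cue by association,
# not by address. The patterns below are invented for illustration.
import numpy as np

def store(patterns):
    """Build a Hopfield weight matrix from +1/-1 patterns (Hebbian rule)."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)                  # no self-connections
    return W / patterns.shape[0]

def recall(W, cue, steps=10):
    """Iteratively update the cue until it settles on a stored pattern."""
    state = cue.astype(float)
    for _ in range(steps):
        state = np.sign(W @ state)
        state[state == 0] = 1               # break ties deterministically
    return state

patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, -1, -1, 1, 1]])
W = store(patterns)
noisy = np.array([1, -1, 1, -1, -1, -1])    # corrupted copy of pattern 0
print(recall(W, noisy))                     # settles back on pattern 0
```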

“The ‘thing’ is the locus of the full corona of liminal meanings.” (Heidegger) If you read the examples provided in the documents, you can experience, i.e. get a feeling for, the full corona of liminal meanings of the headword.

The semantic intra-connections of the documents in the ARCS simulate the approaches mentioned above. This is also called latent semantic indexing. If one wants to write a report about certain topics, “each document is a combination of certain topics, and each topic is written using a small subset of all possible words.” (Ethem Alpaydin, 2016)
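For readers who wonder what latent semantic indexing looks like in practice, here is a small illustrative sketch built with scikit-learn on an invented four-document corpus; it is not the ARCS’s actual indexing procedure.

```python
# A minimal sketch of latent semantic indexing on an invented toy corpus.
# Each document is reduced to coordinates on a few latent "topics";
# documents about related subjects end up with similar coordinates.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "the debtor owes payment to the creditors",
    "other creditors including taxation and social security",
    "the hippocampus reactivates cortical activity patterns",
    "memory recall depends on cortical patterns and the hippocampus",
]

tfidf = TfidfVectorizer().fit_transform(docs)        # term-document matrix
lsi = TruncatedSVD(n_components=2, random_state=0)   # two latent topics
topic_space = lsi.fit_transform(tfidf)               # one row per document

for doc, coords in zip(docs, topic_space.round(2)):
    print(coords, "|", doc)
```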

I am well aware that many linguists would place the conception of an agreed list of English collocations in the realm of fancy, since the English language is an open-ended system. It seems, then, that the list of English collocations to be counted is indefinitely large. Any attempt to define collocations only by statistical significance or textual vicinity through computers is almost useless.
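To make plain what defining collocations “by statistical significance or textual vicinity” amounts to, here is a rough sketch of one such purely statistical measure, pointwise mutual information, computed over a tiny invented corpus. It finds frequent neighbours, but it cannot judge idiomaticity or acceptability.

```python
# Scoring adjacent word pairs by pointwise mutual information (PMI) over a
# toy corpus. High PMI marks words that co-occur more often than chance,
# which is all a purely statistical definition of collocation can offer.
import math
from collections import Counter

corpus = ("heavy rain fell and heavy rain returned while "
          "strong coffee and heavy traffic delayed the strong team").split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
total_uni = sum(unigrams.values())
total_bi = sum(bigrams.values())

def pmi(w1, w2):
    """log2 of p(w1, w2) / (p(w1) * p(w2)) for an adjacent pair."""
    p_pair = bigrams[(w1, w2)] / total_bi
    p_w1 = unigrams[w1] / total_uni
    p_w2 = unigrams[w2] / total_uni
    return math.log2(p_pair / (p_w1 * p_w2))

for pair, count in bigrams.most_common():
    if count > 1:                       # only pairs seen more than once
        print(pair, round(pmi(*pair), 2))
```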

One cannot rely only on linguistic theories, as they are provisional and incomplete. A dogmatic solution by computers is therefore not only unsatisfactory but impossible. “Condillac belongs to the mode of thought which conceives language as an instrument, a set of connections which we can use to construct and control things. … He wouldn’t have known where Herder was ‘coming from’, just as his heirs today, the proponents of chimp language, ‘talking’ computers, and truth-conditional theories of meaning, find the analogous objections to their views gratuitous and puzzling.” (Charles Taylor, 2016) “These perceptions would not then belong to any experience, consequently would be without an object, merely a blind play of representations, less than a dream.” (I. Kant)

If one wants a solution, there is one option only: one must rely on ‘intuitive observation’ and reflection, or “Besonnenheit” (Herder), in selecting collocations. This means that a heavy burden falls on the author’s training in the art of ‘alert reading’, and of responding to linguistic, structural, and other cues.

As early as 1957, Firth remarked that collocations have to do with “mutual expectancies of words”. Additionally, “even a brief presentation of a word can activate the meaning of a related word”. (Fowler, Wolford, Slade, and Tassinary 1981) When you are deep in conversation, 20 billion cells are directly engaged in information processing, each cell having up to 15,000 connections with other cells. Thus the nervous system overcomes slow interneural transmission by neural parallelism. (Kolb and Whishaw 1980)

“There is a widespread tendency to conceptualize the influence of predictions exclusively in terms of ‘top-down’ processes, whereby the predictions generated in higher-level areas exert their influence on lower-level areas within an information processing hierarchy. However, this excludes from consideration the predictive information embedded in the ‘bottom-up’ stream of information processing which … is critical for the development of the predictive processing framework.” (Ch. Teufel & Paul C. Fletcher, 2020)

During a conversation, native speakers of any language will be able to intuitively predict with a high degree of certainty the occurrence of one word when they hear or read the other. If you want to avoid ‘collocative clashes’, i.e. combinations of words which conflict with native speakers’ expectations, you must resort to the ARCS. “Even a few rough spots in the language and readability of a paper can prevent a journal editor from sending it out to reviewers. … There are too many unwritten rules in English. English speakers don’t really know the rules but we can hear them. … There’s a cadence to the language that you have to hit to get the rhythm right.” (Morgan Tucker)

This dictionary enables translators, writers, and speakers to use faithful, idiomatic English. It is a commonplace that everybody trusts a speaker or writer who chooses word combinations the way he chooses his best friends. Relationships are intrinsic to social behaviour. “Humans spontaneously register a great deal of information when perceiving other people, such as intentions, traits and emotions.” (C. Parkinson, A. Kleinbaum, Th. Wheatley, 2017)

“These days, I don’t know with any certainty of a technology that can cope with the phenomenon of having a conversation with a human, or can achieve more than execute simple commands. Nor do I know of any successful research in this field of interest.” (Ronnie Vuine, 2018)

Where an absolute norm for collocations cannot be relied on, establishing a relative norm can be very useful. Luckily, advanced readers have developed an intuition of their own that enables them to pick the right collocations from the results provided by the ARCS. “Good writers acquire their craft not from memorizing rules but from reading a lot.” (Steven Pinker) Short sequences, such as collocations or idioms, seem to be learned as a result of natural repetition. As ill luck would have it, “language is arbitrary and illogical and must be acquired not by logic but by brute-force memorization.” (Steven Pinker)

In oral communication, the “features” of intonation as well as body language matter more than the main message. Non-native speakers more or less lack this intuition. “Non-native speakers may often be able to interpret the relation underlying such collocations when they encounter them; in language production, however, they are victims of the fact that the choice of the precise words that make up a collocation is to a large extent arbitrary.” (Gläser 1986)

Each concept or word in memory has a certain level of activation. When the activation of a concept exceeds a certain threshold, the concept or word enters consciousness. Words that are semantically associated may play a role as retrieval cues in recall tasks. Typically, a collocation is learned at the conceptual level, so the production of the conceptual representation of each item activates its predecessor. Such sequences are susceptible to retrograde interference from the learning of a new sequence that contains the same items in a different order. (Briggs 1957; McGeoch 1936; Melton and Irwin 1940) Learning collocations usually does not involve something entirely new; instead, it involves adding more detail to a well-developed conceptual network.
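As a toy illustration of this activation-and-threshold idea (the network and weights below are invented and make no claim about the brain or the ARCS), consider the following sketch:

```python
# Spreading activation in a tiny invented word-association network:
# activating one word raises the activation of its associates, and any
# word whose activation exceeds the threshold "enters consciousness".
associations = {
    "coffee": {"strong": 0.6, "cup": 0.5, "morning": 0.4},
    "strong": {"coffee": 0.6, "argument": 0.3},
    "cup":    {"coffee": 0.5, "tea": 0.4},
}

def spread(cue, threshold=0.35):
    activation = {cue: 1.0}                      # the cue is fully active
    for neighbour, weight in associations.get(cue, {}).items():
        activation[neighbour] = activation.get(neighbour, 0.0) + weight
    return {word: a for word, a in activation.items() if a >= threshold}

print(spread("coffee"))   # 'morning' passes the threshold; 'argument' is never activated
```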

Words and concepts are organised as layers or documents where all words in a layer take input from all the words in the previous layer. Computational experts call this a multi-layer perceptron.
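A minimal numpy sketch of such a multi-layer perceptron, with random placeholder weights rather than trained values, might look like this:

```python
# Each unit in a layer takes input from every unit in the previous layer,
# which is the defining property of a multi-layer perceptron. The weights
# here are random placeholders, purely for illustration.
import numpy as np

rng = np.random.default_rng(0)

def layer(inputs, n_out):
    """Fully connected layer: every output depends on every input."""
    W = rng.normal(size=(inputs.shape[0], n_out))   # one weight per connection
    b = np.zeros(n_out)
    return np.tanh(inputs @ W + b)

x = rng.normal(size=6)        # a small invented feature vector for a word
hidden = layer(x, 4)          # first layer: 6 inputs feed 4 units
output = layer(hidden, 2)     # second layer: 4 inputs feed 2 units
print(output)
```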

The recall of a collocation requires an additional processing step that is usually not required in a recognition task. This is where the ARCS comes in useful. The ARCS is an empowerment tool for augmented knowledge. It thus provides the right collocation and helps to avoid clumsy sentences with unusual or unacceptable word combinations. “Despite the absence of glosses, learners can nonetheless select collocations through a culling process whereby familiar words are contemplated as potential candidates and unknown words ignored.” (Nicolas R. Cueto)

In the world of augmented knowledge, the ARCS is a real game-changer. It serves as an intelligence amplifier that uses a computer to make speaking, writing, and translating easier for a human to perform. If one looks at the evolution of speech technologies from a Bell Labs demonstration of speech synthesis in 1961 until today, “the risk that machines will develop and even control the human race is very limited. … The real promise is in helping humans, who will remain the master and set the goals. I don’t see a risk that we will ever be taken over.” (Nils Lenke, Research Director at NUANCE, 2016)

“The identification, interpretation and translation of multi-word units (MWUs, i.e. collocations) still represent open challenges, both from a theoretical and a practical point of view. The low standard of analysis and translation of MWUs in translation technologies suggests that there is the need to invest in further research with the goal of improving the performance of the various translation applications.”

“Multi-word units (MWUs) are a complex linguistic phenomenon, ranging from lexical units with a relatively high degree of internal variability to expressions that are frozen or semi-frozen. Such units are very frequent both in everyday language and in languages for special purposes. Their interpretation and translation sometimes present unexpected obstacles even to human translators, mainly because of intrinsic ambiguities, structural and lexical asymmetries between languages, and, finally, cultural differences.” (Johanna Monti, MT Summit 2013)

One can think only if one has at least one language, and one can only think what one can express linguistically. Translation must faithfully reproduce meaning, purpose, and intention; it is the interlinguistic transfer of the conceptual idiosyncrasies of the translator and a change of cultural perspective. That is why everybody must be familiar with the limitations of translation. “A great deal will be lost in the translation: the new (target) language will be unable to convey the same warmth or spirit of the stories, word-play will be missing, anecdotes and jokes lack a certain punch, ceremonial expressions will not have the same alliterative or rhythmical gravity.” (David Crystal, 2000) Surely every collocation has some associations – emotive, moral, ideological, etc. – in addition to its brute sense. And if translators use unfamiliar collocations, they may even involuntarily create an atmosphere of uneasiness or even distrust.

“Professional translation is an extremely complex process: bilingual translators of the English and German language should not only know how to translate “word for word” or “sentence for sentence” but faithfully and idiomatically as well so that the complete information actually makes sense in the other language. Your English or German should not lack idiomaticity. Very often, machine translation is not even a first beginning.” (H.F.B.) Herder and Schleiermacher would not dignify machine translation with the honorific name of “translation” at all.

One of the results of the TARAXÜ project is: “A noticeable result is that Google performs worst on the WMT corpus.” Types of errors are: missing content word(s), wrong content word(s), incorrect word form(s), incorrect word order, incorrect punctuation, and other errors.

Severe errors made by machine translation engines may have economic consequences.

“Writing abilities are among the most important business skills for a CIO, senior IT manager, or any IT person seeking a promotion.” (Jody Gilbert)

TechRepublic runs a blog about English usage issues that can cost you a job interview or make you look stupid.


Avoid looking stupid and stop losing time: use the ARCS and become more confident and proficient. The ARCS is an indispensable adjunct to language learning and idiomatic translation. There is nothing to equal the ARCS.

“While easily mastered by native speakers, their (i.e. multiword expressions or collocations) interpretation poses a major challenge for computational systems, due to their flexible and heterogeneous nature.” (Marina Santini, 2014)

“The way machines process natural language entails no understanding of it at all.” (Michael Collins, 2013)

The ARCS is a promise kept. It helps you to

sound like a native speaker,

avoid critical translation mistakes, and

write with more confidence.

Human memory has an organization similar to that of my dictionary, although the organizing features are much more general than letters of the alphabet.

“Patterns triggered in the neocortex trigger other patterns. Partially complete patterns send signals down the conceptual hierarchy; completed patterns send signals up the conceptual hierarchy. These neocortical patterns are the language of thought. But our thoughts are not conceived primarily in the elements of language. … The HHMM method can also include probability networks on higher levels of language structure, such as the order of words, the inclusion of phrases, collocations and whole sentences, and so on up the hierarchy of language.” (Ray Kurzweil, in: How to Create a Mind, 2012)

“According to Dr. Kay, who headed up the Iraq Survey Group and acted as a weapons inspector in Iraq after the 2003 U.S. invasion, mistranslations of Arabic to English is what went wrong in 2003 when then Secretary of State Colin Powell on behalf of the Bush administration made the case to go into Iraq at the United Nations. …Very often, as you know, with Google Translate, you get very different meanings. And in the case in 2003, what Colin Powell cited as communication intercepts in the Security Council, indeed, did not mean what we thought they meant. It was a combination of mistranslation and code words they used.” (Marinka Peschmann, 2013)

The design of the ARCS imitates the processes in the brain by intra-connecting all its words semantically and conceptually (see MINDMAP). It is still the neocortex of the human translator that must decide which MWU, collocation, syntax or genre precisely fits the intention of the writer or speaker, at least as long as computers have no consciousness. In October 2014, machine-learning expert and UC Berkeley Professor Michael Jordan stated: “We have no idea how neurons are storing information, how they are computing, what the rules are, what the algorithms are, what the representations are, and the like.”

On the other hand, our human memories will be extended by means of the ARCS to an extreme degree. Would you, for example, be able to retrieve about 800 adjective-noun combinations for the noun “person”? I therefore consider the ARCS a technological extension of our perceptions and memories.

In an associative network, pieces of information are represented at nodes connected by arcs. Arcs that return to the same node can be traversed any number of times.
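A tiny sketch of such a network (the nodes and arcs below are invented for illustration) shows how an arc that returns to a node can be traversed again and again:

```python
# A toy associative network: nodes hold pieces of information, arcs connect
# them, and a random walk may follow arcs that loop back to an earlier node
# any number of times.
import random

arcs = {
    "person": ["reliable person", "person of interest"],
    "reliable person": ["person"],              # arc returning to "person"
    "person of interest": ["interest", "person"],
    "interest": ["person of interest"],
}

def walk(start, steps=6, seed=1):
    random.seed(seed)
    node, path = start, [start]
    for _ in range(steps):
        node = random.choice(arcs[node])        # cycles can be revisited
        path.append(node)
    return path

print(" -> ".join(walk("person")))
```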

With its more than 4,100 pages or 70,000 nodes (headwords), the ARCS is the most comprehensive evidence-based and corpus-based universal English collocation dictionary and thesaurus, and it shows how language is really used. It helps those who want their writing to be lucid, effective, and natural-sounding.

The ARCS is

highly recommended,

the most innovative,

the most comprehensive, and

the most time-saving dictionary of English collocations and their German or phraseological equivalents in the world.

It is highly recommended, as it has a five-star rating in the ELRA catalogue (Keyword: M0013),

it is the most innovative, as it is a relational database,

it is the most comprehensive, as it intra-connects more than one million words and is several times larger than other collocation dictionaries,

it is the most time-saving, as it provides very fast reports of all collocations that help make your English sound natural and up-to-date.

The ARCS can be easily utilized, as it is organized in database format. It is pragmatic, as it provides only part-of-speech tags and avoids extensive annotation schemes such as the SPAADIA system by G. Leech & M. Weisser, or Flickr tags. Extensive algorithms and tagging schemes may be more critical for machine translation.
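Purely as an illustration of what querying a relational collocation database with part-of-speech tags could look like (the table, columns and entries below are invented and are not the actual ARCS schema), consider this sketch:

```python
# A hypothetical relational layout for collocations: one row per
# headword/collocate pair with a part-of-speech tag and an example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE collocation (
                    headword TEXT, collocate TEXT, pos TEXT, example TEXT)""")
conn.executemany(
    "INSERT INTO collocation VALUES (?, ?, ?, ?)",
    [("person", "reliable", "ADJ", "a reliable person"),
     ("person", "missing", "ADJ", "a missing person"),
     ("person", "arrest", "VERB", "to arrest a person")])

# All adjective collocates of "person", the kind of report a user might ask for:
rows = conn.execute(
    "SELECT collocate, example FROM collocation "
    "WHERE headword = ? AND pos = 'ADJ'", ("person",)).fetchall()
for collocate, example in rows:
    print(collocate, "-", example)
```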
The ARCS is a treasure trove of contemporary, corpus-based collocational knowledge and helps you avoid typical and critical translation mistakes.

The ARCS is a single-handed achievement of mine.