Lex'Recap: AI-generated recaps from the Lex Fridman podcast



Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs

Introduction

> It's fascinating how the Piraha people lack words for exact counting, not even for one. It challenges our assumptions about language.

> Their unique linguistic traits really make us rethink the fundamental aspects of communication and human cognition.

Human language

> I found grammar to be like a puzzle, a mathematical puzzle that intrigued me from an engineering perspective, leading me into computational linguistics. It felt like an interesting area with untapped potential for meaningful work.

> I focused on tackling the form of human language first, as the syntax seemed more manageable compared to the complexity of unraveling meaning. While others excel at form, grappling with the essence and deeper meaning of language remains a formidable challenge.

Generalizations in language

> I find it fascinating how word order across languages tends to minimize the distance between dependent words, making meaning easier to convey and understand. This pattern holds in roughly 95% of the 1,000 languages studied.

> The division between verb-initial and verb-final languages, with subjects generally coming first, is almost evenly split worldwide, showing the prevalence of these two fundamental word-order structures across diverse languages.

Dependency grammar

> One key insight is that language has three components: sounds; words, which pair form with meaning; and grammar or syntax, the rules for combining words to create meaning. Linguists represent this with tree structures: everyone agrees that sentences can be broken down into trees, with the root typically being the verb that defines the event.

> How a tree is constructed from a sentence varies with the theoretical framework. Dependency grammar, for instance, draws connections between words to build up a larger meaning, and the process is somewhat automatable. It also involves identifying morphemes, the minimal units of meaning in a language, which can add meaning to words through morphology, such as tense markers. High-frequency irregular verbs in English show how irregularities can become common and "sticky" over time, challenging tidy linguistic rules.
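
As a rough sketch of the dependency-grammar idea described above (the sentence and the head assignments are illustrative, not from the episode), each word points to its head, and the verb sits at the root of the tree:

```python
# A toy dependency analysis: each word points to its head; the verb is the root.
# Sentence and head choices are illustrative, not from the episode.
sentence = ["the", "dog", "chased", "a", "ball"]

# heads[i] is the index of word i's head; -1 marks the root.
heads = [1, 2, -1, 4, 2]  # the->dog, dog->chased, chased=root, a->ball, ball->chased

def root(heads):
    """Return the index of the root word (head == -1)."""
    return heads.index(-1)

def dependents(heads, i):
    """All words whose head is word i."""
    return [j for j, h in enumerate(heads) if h == i]

r = root(heads)
print(sentence[r])                                   # chased
print([sentence[j] for j in dependents(heads, r)])   # ['dog', 'ball']
```

The flat `heads` list is enough to recover the whole tree, which is why dependency parses are convenient to produce and check automatically.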

Morphology

> Morphology, the study of morphemes and their connections to roots, varies across languages with English tending to have one or two morphemes per word, while languages like Finnish can have up to 10 morphemes on a single root, resulting in millions of word forms.
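
The explosion of word forms in agglutinative languages follows from simple combinatorics: each independent morpheme slot multiplies the count. The slot names and sizes below are illustrative stand-ins, not a real analysis of Finnish:

```python
# Hypothetical morpheme slots for an agglutinative noun (numbers are
# illustrative, not real Finnish morphology): each slot multiplies
# the number of possible word forms on a single root.
slots = {
    "case": 15,        # e.g. nominative, genitive, partitive, ...
    "number": 2,       # singular / plural
    "possessive": 7,   # including "none"
    "clitic": 5,       # including "none"
}

forms = 1
for n in slots.values():
    forms *= n
print(forms)  # 1050 forms for one root from just four slots
```

Add a few more slots, or derivational morphology, and the count quickly reaches the millions mentioned above.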


> The evolution of language, like the development of color words, reflects the need for efficient communication. Different cultures may have varying numbers of color terms based on what they find necessary to communicate, highlighting how language evolves from functional requirements and the need to convey distinctions.

Evolution of languages

> I find it fascinating that language evolution is difficult to study because of the lack of historical data: many languages have no writing system, so for long-term evolution we mostly have snapshots of current languages like Mandarin and English.

> The rapid evolution of language on platforms like Reddit, driven by humor and the desire to deviate from mainstream communication, showcases how languages constantly change over time. Even prestige varieties of English, such as the Queen's English, have undergone significant vowel shifts over the years, highlighting the dynamic nature of language evolution. It's intriguing to think about how new languages born in the future could provide high-resolution data for studying linguistic change.

Noam Chomsky

> I prefer dependency grammar over phrase structure grammar because it's more perspicuous in representing the connections between words.

> Noam Chomsky's movement theory suggests that words shift position in sentences, but contrary to this, I propose a lexical copying rule where words have multiple forms for different sentence structures.

> The study of formal language theory, like programming languages, explores the complexity of language structures, with human languages often falling into the category of context-free languages.
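
The textbook way to see what "context-free" buys you is the language aⁿbⁿ, which no regular grammar can recognize but a context-free one can; nested structures in human language (such as center embedding) need at least this power. A minimal recursive check:

```python
# Recognizer for the context-free language a^n b^n (n >= 0), the classic
# pattern that regular grammars cannot capture: matching each 'a' with a
# 'b' requires remembering unbounded nesting, like nested clauses.
def is_anbn(s: str) -> bool:
    if s == "":
        return True
    # must start with 'a', end with 'b', with a^(n-1) b^(n-1) inside
    return s.startswith("a") and s.endswith("b") and is_anbn(s[1:-1])

print(is_anbn("aabb"))   # True
print(is_anbn("aab"))    # False
print(is_anbn("abab"))   # False: pairs, but not nested
```

The recursion mirrors the grammar rule S → a S b | ε, peeling one matched pair off the outside at each step.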

> In studying language evolution, I focus on the ease of language production as a key factor, optimizing for shorter dependencies for efficient communication.
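
Dependency length is easy to make concrete: sum the linear distances between each word and its head. The two toy orders below (my own construction, not data from the episode) show how placing a head centrally shortens total dependency length:

```python
# Total dependency length: sum of |i - head(i)| over all non-root words.
# The two orders below are illustrative: four words, three arcs each.
def total_dep_length(heads):
    return sum(abs(i - h) for i, h in enumerate(heads) if h != -1)

heads_far  = [3, 3, 3, -1]   # all three dependents precede a sentence-final head
heads_near = [1, -1, 1, 1]   # same number of arcs, head placed centrally

print(total_dep_length(heads_far))   # 3 + 2 + 1 = 6
print(total_dep_length(heads_near))  # 1 + 1 + 2 = 4
```

Orders that keep this total small are, on the account above, easier to produce and comprehend, which is the pressure claimed to shape word order cross-linguistically.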

> While form and meaning are intertwined for effective communication, I emphasize controlling for the form to analyze the ease of language production, with the form being a crucial aspect in conveying the intended meaning.

Thinking and language

> Firstly, Ev Fedorenko's research shows that the language network in the brain is stable and distinct. It activates for comprehension of sentences, regardless of whether they are spoken or written, highlighting high-level language comprehension as separate from other cognitive tasks like music or programming.

> Secondly, individuals with left-brain strokes that affect the language network can still perform tasks like playing chess or driving, demonstrating that language is not necessary for general cognition. This challenges the idea that symbolic processing, such as in math, is equivalent to language, further emphasizing the unique nature of language in cognitive processing.

> Lastly, the idea that language is a separate system from general thinking processes is intriguing and has significant implications. While language provides a powerful tool for expression, it is not a prerequisite for thought, suggesting a distinct cognitive path that language follows in the brain.

LLMs

> Large language models are incredibly proficient at predicting the structure of English, making them arguably the best current theories of human language in terms of form. They excel at covering all the data in the language but might lack the simplicity typically associated with theories.

> While large language models demonstrate exceptional form understanding, they might not truly grasp meaning. They can be easily tricked and show limitations in understanding deeper layers of language, unlike humans who have a greater capacity for discerning meaning and logic.

> The abilities of large language models are remarkable when it comes to form, such as nested structures, but their understanding of meaning falls short of human comprehension. Despite excelling at generating true statements based on training data, their reliance on form rather than deeper reasoning points to a gap in their capacity for true understanding.

Center embedding

> Dependency Grammar and Cognitive Cost: Viewing language through a dependency grammar framework reveals the cognitive cost associated with longer distance connections, showing that producing and comprehending connections between non-adjacent words incurs a measurable cost. It's fascinating how dependency distances impact processing, as longer dependencies lead to stronger activation in the language network.

> Center Embedding Complexity: Center embedding poses significant challenges for both production and comprehension because of the cognitive cost it incurs. Nested structures, as in legalese, significantly hinder understanding and retention of information, highlighting the need to simplify language for optimal communication.
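
One rough proxy for the memory cost of center embedding is how many dependency arcs are simultaneously "open" (started but not yet closed) at each word. The sentence and head indices below are my own illustration, not from the conversation:

```python
# Memory-load proxy: at each word position, count dependency arcs that
# span it. Center-embedded structures keep several arcs open at once;
# a right-branching chain keeps at most one. Head indices are illustrative.
def max_open_arcs(heads):
    arcs = [(min(i, h), max(i, h)) for i, h in enumerate(heads) if h != -1]
    return max(sum(1 for lo, hi in arcs if lo <= k < hi)
               for k in range(len(heads)))

# "the rat the cat chased died" (indices 0..5): a center-embedded relative
# clause separates "rat" from its verb "died".
center = [1, 5, 3, 4, 1, -1]
# A right-branching chain of short, adjacent dependencies for comparison.
chain = [1, 2, 3, 4, 5, -1]

print(max_open_arcs(center))  # 3
print(max_open_arcs(chain))   # 1
```

Holding three unresolved dependencies at once versus one is the kind of measurable cost that makes center-embedded legalese hard to process.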

> Legalese Dilemma: Exploring legalese reveals a high prevalence of center embedding and low-frequency words, impairing comprehension. Surprisingly, passive voice doesn't significantly impact understanding. The preference for less complex, un-center embedded versions by both laypeople and lawyers raises questions about the complexities introduced in legal language.

> Noisy Channels and Language Optimization: Claude Shannon's concept of noisy channels in communication theory sheds light on how language may have evolved to optimize signal transmission amid noise. Examining the role of word order and syntax in dealing with noisy channels offers insights into how languages may have adapted to ensure robust communication despite potential signal loss.
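
Shannon's noisy-channel idea can be shown in miniature: adding redundancy lets a receiver recover a message even when the channel corrupts individual symbols. This sketch uses a simple 3x repetition code with majority voting (parameters and the coding scheme are illustrative, not anything discussed in the episode):

```python
# A miniature noisy channel: redundancy (3x repetition) plus majority
# voting recovers bits that the channel flips. Illustrative parameters.
import random

def send(bits, flip_prob, rng):
    """Transmit each bit 3 times; the channel flips each copy independently."""
    return [b ^ (rng.random() < flip_prob) for b in bits for _ in range(3)]

def receive(noisy):
    """Majority-vote each triple of received copies back into one bit."""
    return [int(sum(noisy[i:i + 3]) >= 2) for i in range(0, len(noisy), 3)]

rng = random.Random(0)
message = [1, 0, 1, 1, 0, 0, 1, 0]
decoded = receive(send(message, flip_prob=0.1, rng=rng))
print(decoded == message)  # with light noise, majority voting usually recovers it
```

The analogy to language is that redundant cues, such as word order plus case marking plus agreement, play the role of the repetition code, keeping communication robust when parts of the signal are lost.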

Learning a new language

> When thinking about languages and learning, it's crucial to understand that languages are optimized for communication, not necessarily for ease of learning. Different languages have unique structures that may not make them easy to learn but are effective for communication.

> Despite public perception, there isn't any concrete evidence to suggest that certain languages are inherently harder or easier to learn for babies. By the age of three or four, children are proficient in the language they are exposed to, indicating that all languages are equally learnable.

Nature vs nurture

> I don't see a reason to postulate much innate structure in language. Large language models are successful because they can learn the forms of human language from input. The modularization of language areas doesn't necessarily mean we're born with them; we could have easily learned them.

> The brain's natural experiments, where people develop language abilities despite missing sections of their brain, are a fascinating area of research. Understanding how the brain's areas develop and adapt, even in the face of anomalies, sheds light on the intricate workings of the brain.

Culture and language

> Working with the Chimane and Piraha tribes in the Amazon, we delved into isolate languages with no known connections to other languages, shedding light on how culture influences language. Discovering the Piraha's lack of exact counting words was mind-blowing, revealing how language shapes what we need to communicate, like practical information for survival.

> In tasks where exact counts weren't needed, they excelled, but when precise counting was necessary, their language proved limiting, sparking reflections on the interplay between language, culture, and cognitive capabilities. Such findings hint at how societal needs, like farming, could have spurred the development of counting systems.

Universal language

> Universal languages like Esperanto have not taken off because a language needs a function within a community to survive and be useful. When a language doesn't hold value for the local people, as with the dying language Mosetén, it will not thrive. Languages like English, Mandarin, and Spanish are popular because of their value in big economies, motivating people to learn them for economic reasons.

> There is tension between the convenience of trade and communication, and the importance of language as a symbol of national identity. It's not just about economic value, but also about preserving cultural identity and uniqueness. As a cognitive scientist and language expert, I believe it's crucial to maintain and celebrate the diversity of languages, as they are not only fascinating linguistically but also deeply connected to culture.

Language translation

> I find it fascinating how challenging it is to translate certain concepts between languages due to differences in cultural experiences and worldviews. The nuances and rhythms of language form a crucial aspect of translation, not just the literal content. It's like there's a beat and edge to the form that adds depth to the meaning.

> There is potential to measure and compare the musical aspects of different writing styles, such as the sentence rhythms of authors like Hemingway. It's intriguing to think about analyzing and possibly quantifying these unique characteristics, although how to actually translate them remains uncertain.

Animal communication

> - Human language is not necessarily special or superior to other communication systems in nature. We shouldn't assume our language is the only complex one just because we can't understand others like whales or crows yet.

> - There is potential for communication beyond humans, including with animals and even plants. By being intellectually humble and open to the possibility of diverse communication systems, we may someday bridge the gap between different forms of life on Earth.