Feb 16, 2021
I’d like to start our discussion by examining a human’s daily life. We all know the human brain has been evolving for hundreds of thousands of years, ever since the debut of Homo sapiens, which to a large degree marks the debut of human intelligence. But animals other than humans are also considered intelligent, and that dates back to billions of years of evolution from single cells to mammals, which produced the unbelievably complex biological structure of human beings: physique, vision, nerves, hormones, everything that governs a human’s daily life. IMO, we should avoid over-indulging in the complex world of biology or bionics. Rather, let’s start with the “generic AI” that is of most interest in current research and engineering, and examine how we can build such systems and have them work together in the long run.
At 7:30am, you wake up and get to your grooming routine: shower, brush your teeth, comb your hair. To pull off these delicate hand-eye coordination movements, visual information is relayed through the retina to the visual cortex of your brain, and your brain generates instructions based on these visual “cues”. Say your brain is somehow “programmed” to automatically route the “brushing teeth” routine to the “motor cortex”, which connects to the muscle tissue of your arms and hands. Meanwhile, your “cerebellum” is the control unit that “balances” your movements by fine-tuning the joints of your limbs, so that you keep a steady grip and brush smoothly instead of jerking the brush up and down in your mouth.
To build such an AI system, you need a powerful visual recognition system to mark out the areas of interest and their positions, and a planning system that contains the “program” for each routine: a showering program, a brushing-teeth program, a combing-hair program. The programs can be fairly high-level information that instructs the “motor” system to move. As a human, you prolly don’t know how many degrees to bend your elbow, how many pascals of pressure to hold the brush with, or how many pascals to squeeze out the toothpaste with. You just “generally” think about doing it, and your “motor cortex” figures it out for you. The motor system takes this high-level information passed down from the planner, executes the moves, and handles all the complex robotic control magic. Of course, in reality a robot’s joints are not gonna be biological. Boston Dynamics, for instance, uses hydraulic drives, which apply their own set of physics to move the joints. This is in fact what today’s robotics can achieve: given instructions to walk, jump, or even backflip, the robot programs its complex control system to perform these moves.
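To make the perception, planner, and motor split a bit more concrete, here’s a rough Python sketch. Everything in it is made up for illustration: the `ROUTINES` table, the `perceive`, `plan`, and `execute` functions, and the coordinates are all assumptions, and the “perception” is just a pre-labelled dictionary rather than real visual recognition. The point is only that the planner deals in high-level steps while the motor stage alone produces low-level commands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """An object of interest reported by the (stubbed) perception system."""
    label: str
    position: tuple  # (x, y, z), made-up coordinates for illustration


# Hypothetical routine "programs": high-level steps only, no angles or pressures.
ROUTINES = {
    "brush_teeth": [("grasp", "toothbrush"), ("squeeze", "toothpaste"), ("brush", "toothbrush")],
    "comb_hair": [("grasp", "comb"), ("comb", "comb")],
}


def perceive(scene):
    """Stand-in for visual recognition: the scene arrives already labelled."""
    return {label: Detection(label, pos) for label, pos in scene.items()}


def plan(routine, detections):
    """Attach the perceived position of each target to each high-level step."""
    steps = []
    for action, target in ROUTINES[routine]:
        if target not in detections:
            raise LookupError(f"cannot see {target!r}, so cannot {action} it")
        steps.append((action, detections[target]))
    return steps


def execute(steps):
    """Stand-in for the motor system: only this stage emits low-level commands."""
    commands = []
    for action, detection in steps:
        commands.append(f"move_to{detection.position}")
        commands.append(f"{action}({detection.label})")
    return commands


scene = {"toothbrush": (0.1, 0.2, 0.9), "toothpaste": (0.3, 0.2, 0.9)}
for command in execute(plan("brush_teeth", perceive(scene))):
    print(command)
```

Note that asking for `comb_hair` with this scene raises a `LookupError`, since no comb was perceived. A real system would need to search for the comb or ask for help, which is exactly the kind of flexibility a hard-coded table doesn’t have.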
The gap between current systems and what the future holds is how easily we can reprogram these instructions, and how a robot could learn new movements from ambiguous, high-level thoughts, or even just from mimicking.
As you finish your hygiene routine and go to your office, your brain and body switch from “routine” mode to a more “creative” mode. Assuming your job is not dull, you’ll have to interact with coworkers through verbal and textual communication, process a massive amount of information, and make decisions based on data and past experience. This is what current AI systems excel at: they can learn from massive data and outperform humans, especially in tasks that require heavy memorization. Say you’re a customer manager checking in with your clients by phone. Your auditory cortex receives the audio stream from the phone, converts it into meaningful text, and triggers “thinking”. How do humans “think”? Let’s not try to answer such unanswerable questions, and instead think from the end-goal perspective. You’re checking in with your clients as a customer manager, and you’re supposed to hear feedback on the new feature your team recently developed. So you need to “correlate” your client’s words with the “new feature” of your product and get a sense of your client’s satisfaction level, that is, understand sentiment. In reality the task can be more complicated, such as rerouting a question to the technicians if you can’t answer it, or even brainstorming with your client on an ambiguous topic that neither of you has much experience with.
Today, AI systems do a phenomenal job of processing speech signals: they can convert speech to text within a very narrow error margin. They can also perform preliminary natural language understanding “tasks” such as intent understanding and entity extraction, or slightly heavier tasks like reading comprehension or simple Q&A over a knowledge base. The gap between the current state and an ideal, “generic” language bot is, however, more subtle.
Currently, powerful NLP systems like GPT-3 are trained as “co-occurrence” language models, which means the model is not able to reason about the “causality” between sentences or provide a logical explanation of its output. Think about the same customer manager scenario, with a “customer manager” bot in your place. It’s able to correlate the customer’s feedback with a specific, recently developed product and score the customer’s satisfaction level based on sentiment analysis, but it can’t explain “why” exactly it performed those actions. In visual processing this might be fine, cuz explaining pixels wouldn’t make much sense anyway, but “customer manager” is a serious job, and both your boss and your client need a clear, logical explanation of your actions based on what you hear.
If explainability is a pitfall of current language AI, “inflexibility” is the real gap between today’s AI and a real AI. Think about AI as an employee of a company: it can’t do any of those things unless it’s trained to do just that. If it’s amazing that a motor system can learn and perform moves from high-level ideas, like animals do, then a human is such an extraordinary system that he can proactively learn all by himself. A human working as an employee is able to proactively understand a problem and proactively seek out information to create the logic and process that solve it. If you’re the boss trying to hire a few customer managers, how often do you train them? Perhaps one orientation on the start date? Well, to hire a bunch of AI “customer managers” built today, you’d need to train them at least every day, and that’s assuming your business logic is damn simple. And every time you try to train a new task, many unexpected problems pop up. At the end of the day, it’s not AI at all! Because you end up being the sucker who learns and understands the problems!
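To make the customer-manager bot a bit more tangible, here’s a toy Python sketch of “correlate the feedback with a feature, then score sentiment”. It’s keyword matching, not how a real NLU system or GPT-3 works, and the feature names, word lists, and the `analyze` function are all hypothetical. Notice who wrote the word lists: a human did, which is pretty much the point about who ends up learning the problem.

```python
import re

# Hypothetical, hand-written lexicons: a human had to understand the product first.
FEATURES = {
    "export": {"export", "csv", "download"},
    "dashboard": {"dashboard", "chart", "charts"},
}
POSITIVE = {"love", "great", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "confusing", "hate"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def analyze(utterance):
    """Guess which feature the client means and how they feel about it."""
    tokens = tokenize(utterance)
    token_set = set(tokens)

    # Pick the feature whose keywords overlap most with the utterance.
    feature = max(FEATURES, key=lambda name: len(FEATURES[name] & token_set))
    if not FEATURES[feature] & token_set:
        feature = None  # nothing matched at all

    # Sentiment is just positive hits minus negative hits.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"

    # The only "explanation" available is the list of words that matched.
    evidence = sorted(token_set & (POSITIVE | NEGATIVE))
    return {"feature": feature, "sentiment": sentiment, "evidence": evidence}


print(analyze("The new dashboard charts are great but a bit slow, overall I love it"))
```

The `evidence` field is the closest this toy gets to an explanation, and it only works because the rules are hand-written. Every new product or phrasing means another round of a human editing the lists.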
Well, the benefit you get is “scalability”. Imagine you train one new task and can deploy it to thousands of your customer manager bots! If you’re hiring humans, you’ll mostly be supervising a few dozen of them before you burn out. And since those are bots, you can work them day and night without worrying about minimum wage. Sounds nasty, right? That’s exactly how today’s AI helps build customer service bots and voice assistants like Alexa and Google Home. However, not every business is an Alexa that serves millions. Many small businesses serve only a few customers and need to solve much, much more delicate tasks than “Hi Alexa, what’s the news?”.
In the ideal state of language AI, a customer would say “hey, can you give me a step-by-step guide on how to use feature X?”. The bot would be confused at first. Then it realizes that the semantics of “step-by-step guide” is to provide a series of instructions. It sees that it can respond in two ways, and realizes it should read up on “feature X” in the latest user manual and devise such instructions. So it starts to read the manual and learn. After understanding the general idea of “feature X”, it goes to the product page (assuming it’s a web interface) and starts goofing around, just like a human would. After a few failed trials it has finally collected all the needed steps: click the button in the top right corner, select X from the dropdown list, click button Y, then boom, it’s done. It turns out the client is so ecstatic about this tutorial that he’s convinced he’s speaking to a real human being.
As the daily life of a working person demonstrates, building a generic AI roughly needs these components:
1. A perception system that processes visual/audio signals.
2. A planning system that, knowing what task it needs to do, receives visual/audio signals (and maybe other sensory signals as well), extracts high-level ideas, and passes them down to the motor system.
3. A motor system that receives high-level ideas from the planner. Either it’s already programmed to execute them, or it learns to do so through trial and error or mimicking.
4. A language system that understands the semantics of verbal/textual communication.
5. A truly intelligent “thinker” system. At the beginning it’s your newbie employee who doesn’t know what to do at all, but it’s able to understand your problem and learn. It learns new tasks by interacting with external interfaces, and orchestrates the planning system, motor system, and language system to perform the desired tasks.
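Here’s a rough Python sketch of how the five components above might be wired together. It’s only an illustration under my own assumptions: the stage functions, the `Thinker` class, and its `teach`/`perform` methods are all hypothetical names, and each stage is a trivial stub. In particular, the “thinker” here is nothing more than a lookup table of pipelines that a human has taught it, so it shows the orchestration role without any of the actual intelligence.

```python
from typing import Callable, Dict, List

Stage = Callable[[str], str]


def perceive(raw: str) -> str:
    """Perception stub: clean up the raw signal."""
    return raw.strip().lower()


def understand(text: str) -> str:
    """Language stub: map text to a coarse intent."""
    return "greeting" if "hello" in text else "unknown"


def plan(intent: str) -> str:
    """Planner stub: turn an intent into a high-level step."""
    return "wave" if intent == "greeting" else "idle"


def act(step: str) -> str:
    """Motor stub: pretend to execute the high-level step."""
    return f"motor:{step}"


class Thinker:
    """Orchestrator that only knows the tasks a human has explicitly taught it."""

    def __init__(self) -> None:
        self._tasks: Dict[str, List[Stage]] = {}

    def teach(self, task: str, pipeline: List[Stage]) -> None:
        """A human registers which stages to chain together for a task."""
        self._tasks[task] = list(pipeline)

    def perform(self, task: str, signal: str) -> str:
        """Run the taught pipeline, or give up if nobody taught this task."""
        if task not in self._tasks:
            raise LookupError(f"nobody taught me how to {task!r}")
        for stage in self._tasks[task]:
            signal = stage(signal)
        return signal


thinker = Thinker()
thinker.teach("greet", [perceive, understand, plan, act])
print(thinker.perform("greet", "  Hello there "))
```

Asking this `Thinker` to perform any task it wasn’t taught simply raises a `LookupError`. A real “thinker” would instead go figure the task out by itself, and that’s the part nobody knows how to build.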
As of 2021, machine learning techniques have made significant progress on 1-4, but I still consider them prehistoric AI, since the systems built from state-of-the-art ML are far from robust and explainable. For 5, zero progress has been made so far, according to my observation and work in the AI industry. And I hold a relatively pessimistic view on actually achieving 5: I believe it needs a breakthrough rather than incremental progress. I definitely don’t believe ML defines the future of 5. Perhaps 5 gets done, in the far future, by something completely different from ML, or ML could be just one part of the whole solution, as small next to 5 as a speck of dust in the universe.
Nonetheless, I’m not pessimistic about the AI industry. The current industry trend is to offload 5 entirely to human beings. Safe enough, right? As 1-4 mature, 5 gets more sophisticated too, or rather the human behind 5 does. This brings a big opportunity for engineers working in AI, who get to understand tons of problems and create tons of tasks. The trick is to strike a good balance between cost and return, since building AI systems is gonna be expensive.