hobbit
A few days ago, Tesla apparently had a big gathering at their headquarters
called "autonomy day". It was intended to present the recent work leading
up to Hardware 3.0 and the updated software that runs on it, all having to
do with the endless journey toward full self-driving. There are a couple
of different YouTube videos covering it, most of them very long since they
cover the entire presentation portion of the conference. I watched a bunch of
it the other night, two or three hours' worth, and got a better idea of what
they're after
and how they're working toward it.
The hardware is a distinctly impressive effort -- big custom ASICs
optimized for the neural network they need to run the camera video through
for recognition, and over 20 times faster than the best Nvidia GPUs on
the market for the purpose. One of the lead fellows working on the
software side went into a deep description of the machine learning
process and training the neural net, and I had to do a bunch of ancillary
searching/reading to get a glimmer of what he was actually talking about.
It was all rather enlightening.
While it's a complex and noble effort, it inevitably lacks one key attribute
that HUMANS bring to the table: *intuition*. You can train by rote, using
thousands of pictures of cars, lanes, landscapes, obstacles, crashes
happening in realtime, water distortion and weird lighting effects at
night, or whatever, and that alone will never get you to level 5 autonomy.
There will always be *some* wacky situation that the best-trained neural net
will not know how to respond to, and usually in such situations there is
NOT enough time to alert a human driver and have them wake up and take over
in a competent and perceptive enough fashion. The big problem is that the 99.8
percent "FSD" efficacy which gets most people to work and back most of the
time without incident simply cannot assess things to the depth that an
alert and engaged human driver does, and it breeds insidious complacency --
drivers start believing it's "good enough". When it abruptly turns
out that it's not good enough, those drivers are *not* ready to do their
job in time.
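To put a rough number on that complacency trap, here's a back-of-the-envelope
sketch in Python (the per-trip success rate and commute counts are my own
illustrative assumptions, not anything Tesla has published):

    # Illustrative numbers only -- assumed for the sketch, not published stats.
    p_ok = 0.998              # chance one trip needs no human takeover
    trips_per_year = 2 * 250  # two commute legs, ~250 working days

    p_clean_year = p_ok ** trips_per_year
    print(f"odds of a full year with zero takeovers: {p_clean_year:.1%}")
    # -> roughly 37%. At that rate, most drivers hit at least one
    #    "not good enough" moment per year, with no warning beforehand.

Even a system that sounds nearly perfect per trip leaves the average commuter
facing a takeover moment most years -- exactly when that complacency bites.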
The introductory example that the software guy gave was a picture of an
iguana. An untrained neural network will just as easily decide that it's
a picture of a boat. By showing it hundreds or thousands more pictures of
iguanas in all different positions, with different lighting, coloration,
shadows, occlusion, etc -- gradually the net can build up a pretty solid
model of what an iguana looks like and maybe even how it's moving. We,
humans, can basically do that from ONE picture, and immediately extrapolate
what any iguana will look like after it's rotated, injured, greyscaled,
partially obscured by an object in front, or all of that -- without much
additional help or input. It's just something we do, applying our generalized
intuitive models for how objects move and appear in space. We do the same
for all the traffic dynamics we handle on a daily basis. We don't need to
have seen a thousand different pictures of a semi to have a good sense for
its volume and motion paths. We are able to spot a generic SUV sitting
unmoving near the end of a driveway like any vision system could, but the
fact that its reverse lights are on tells US to anticipate something very
different from a parked car, and it seems unlikely that Tesla's best efforts
would pick up on a subtlety like that until it actually started moving and
intersecting the forward path lines. Too late in some cases.
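For the curious, here's roughly what that "train by rote" loop looks like in
practice -- a minimal sketch in Python with PyTorch, emphatically NOT Tesla's
actual pipeline, using a hypothetical data/train folder of labeled iguana and
boat images. Notice that every tolerance we want (rotation, greyscale, weird
lighting, occlusion) has to be manufactured and shown to the net explicitly:

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, models, transforms

    # Each variation the net should tolerate gets generated explicitly.
    augment = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.RandomRotation(30),          # rotated iguanas
        transforms.RandomGrayscale(p=0.2),      # greyscaled iguanas
        transforms.ColorJitter(0.4, 0.4, 0.4),  # weird lighting
        transforms.ToTensor(),
        transforms.RandomErasing(p=0.3),        # object partially in front
    ])

    # Hypothetical layout: data/train/iguana/*.jpg, data/train/boat/*.jpg
    dataset = datasets.ImageFolder("data/train", transform=augment)
    loader = DataLoader(dataset, batch_size=32, shuffle=True)

    model = models.resnet18(num_classes=2)  # small stock classifier
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(10):  # thousands of augmented views stream past
        for images, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()

A human skips essentially all of that augmentation; one good look at an iguana
and the rotated, greyscaled, half-hidden variants come for free.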
Great example from earlier this week: an empty flatbed had just pulled into
the driveway of a shop, but needed to turn around. It started backing out,
into the road I was on approaching it. The shape that intruded into the
space of the road was completely unlike any car -- it was the narrow wedged
end of the flatbed and the supporting framework under it, visually connected
to the side of the road, sitting fairly low down, not as high as a vehicle
but not lying on the road surface either. I had the briefest moment of
"wtf is *that*" while approaching, and quickly figured it out. If I had
kept going in a straight line I would have slammed into it; but without even
thinking much about it I evaluated the existing data I had already collected
about traffic in the lane to my left, which was clear, and smoothly swerved
into the next lane and around the end of the flatbed [which had stopped, but
was still hanging out into the road, its driver intending to back out the
rest of the way and get straightened out].
My intuition figured out very quickly what the strange shape actually was
and where the rest of that vehicle was and even why it was there and what
its driver was trying to accomplish. I was also fully ready for him to
*not* stop moving farther into the road, with a clear mental picture of all
the surrounding space I had to work with and how much room I'd need to stop
if it came to that. The deepest-trained "passive" neural net under the Tesla model
would still have no idea, and even after such an "incident" got uploaded to
the mothership and supposedly evaluated by a human for further training, I
have my continuing doubts. FSD's path to production will be burdened by
countless "woulda/coulda/shoulda" excuses for avoidable incidents that
weren't avoided.
And I certainly wouldn't pay an extra two grand for the privilege of
beta-testing something whose faults and deficiencies I now understand
at a fairly comprehensive level. Just ... no. *My* neural net's traffic
training has been decades in its tuning, and I'll continue to rely on
that. It has the additional layers to evaluate "what does all this
actually *mean*", which by comparison is priceless. Machine learning
is still a long way from that point.
_H*
called "autonomy day". It was intended to present the recent work leading
up to Hardware 3.0 and the updated software that runs on it, all having to
do with the endless journey toward full self-driving. There are a couple
of different Youtubes covering it, most of them very long as it's the entire
presentation portion of the conference. I watched a bunch of it the other
night, two or three hours worth, and got a better idea of what they're after
and how they're working toward it.
The hardware is a distinctly impressive effort -- big custom ASICs
optimized for the neural-network they need to run the camera video through
for recognition, and over 20 times faster than the best Nvidia GPUs on
the market for the purpose. One of the lead fellows working on the
software side went into a deep description of the machine learning
process and training the neural net, and I had to do a bunch of ancillary
searching/reading to get a glimmer of what he was actually talking about.
It was all rather enlightening.
While it's a complex and noble effort, it inevitably lacks one key attribute
that HUMANS bring to the table: *intuition*. You can train by rote, using
thousands of pictures of cars, lanes, landscapes, obstacles, crashes
happening in realtime, water distortion and weird lighting effects at
night, or whatever, and that alone will never get you to level 5 autonomy.
There will always be *some* wacky situation that the best-trained neural net
will not know how to respond to, and usually in such situations there is
NOT enough time to alert a human driver and have them wake up and take over
in a competent and perceptive enough fashion. The big problem is, that 99.8
percent "FSD" efficacy that gets most people to work and back most of the
time without incident simply cannot assess things to the depth that an
alert and engaged human driver does, and breeds insidious complacency on
the part of the drivers that it's "good enough". When it abruptly turns
out that it's not good enough, those drivers are *not* ready to do their
job in time.
The introductory example that the software guy gave was a picture of an
iguana. An untrained neural network will just as easily decide that it's
a picture of a boat. By showing it hundreds, thousands more pictures of
iguanas in all different positions, with different lighting, coloration,
shadows, occlusion, etc -- gradually the net can build up a pretty solid
model of what an iguana looks like and maybe even how it's moving. We,
humans, can basically do that from ONE picture, and immediately extrapolate
what any iguana will look like after it's rotated, injured, greyscaled,
partially obscured by an object in front, or all of that -- without much
additional help or input. It's just something we do, applying our generalized
intuitive models for how objects move and appear in space. We do the same
for all the traffic dynamics we handle on a daily basis. We don't need to
have seen a thousand different pictures of a semi to have a good sense for
it volume and motion paths. We are able to spot a generic SUV sitting
unmoving near the end of a driveway like any vision system could, but the
fact that its reverse lights are on tells US to anticipate something very
different from a parked car, and it seems unlikely that Tesla's best efforts
would pick up on a subtlety like that until it actually started moving and
intersecting the forward path lines. Too late in some cases.
Great example from earlier this week: an empty flatbed had just pulled into
the driveway of a shop, but needed to turn around. It started backing out,
into the road I was on approaching it. The shape that intruded into the
space of the road was completely unlike any car -- it was the narrow wedged
end of the flatbed and the supporting framework under it, visually connected
to the side of the road, sitting fairly low down, not as high as a vehicle
but not lying on the road surface either. I had the briefest moment of
"wtf is *that*" while approaching, and quickly figured it out. If I had
kept going in a straight line I would have slammed into it; but without even
thinking much about it I evaulated the existing data I had already collected
about traffic in the lane to my left, which was clear, and smoothly swerved
into the next lane and around the end of the flatbed [which had stopped, but
still hanging out into the road, intending to back out the rest of the way
and get himself straightened out].
My intuition figured out very quickly what the strange shape actually was
and where the rest of that vehicle was and even why it was there and what
its driver was trying to accomplish. I was also fully ready for him to
*not* stop moving farther into the road, with a clear mental picture of all
the surrounding space I had to utilize and/or how long I'd need to stop
if needed. The deepest-trained "passive" neural net under the Tesla model
would still have no idea, and even after such an "incident" got uploaded to
the mothership and supposedly evaluated by a human for further training, I
have my continuing doubts. FSD's path to production will be burdened by
countless "woulda/coulda/shoulda" excuses for avoidable incidents that
weren't avoided.
And I certainly wouldn't pay an extra two grand for the privilege of
beta-testing something whose faults and deficiencies I now understand
at a fairly comprehensive level. Just ... no. *My* neural net's traffic
training has been decades in its tuning, and I'll continue to rely on
that. It has the additional layers to evaluate "what does all this
actually *mean*", which by comparison is priceless. Machine learning
is still a long way from that point.
_H*