Nvidia GeForce GTX 1660 Review: The Turing Assault Continues


It was only a matter of time before Nvidia took the TU116 graphics processor from its GeForce GTX 1660 Ti and carved it up a bit to create a cheaper derivative. Not surprisingly, the new GeForce GTX 1660 is very similar to the higher-end model in that it lacks the Turing architecture's signature RT and Tensor cores. Instead, it focuses its on-die resources on accelerating today's rasterized games.

Nvidia did not even cut much from TU116's resource pool when creating the GeForce GTX 1660: a pair of streaming multiprocessors is excised, taking 128 CUDA cores and eight texture units with it. Otherwise, the GPU is fairly complete. The biggest loss for this card is its lack of GDDR6 memory. With 8 Gb/s GDDR5 swapped in instead, bandwidth drops from 288 GB/s on the 1660 Ti to 192 GB/s.

Naturally, the GeForce GTX 1660 is aimed primarily at FHD gaming, where 6GB of slower memory does not hamper performance as much as it would at higher resolutions. Can the $220/£200 card maintain frame rates fast enough to fend off AMD's Radeon RX 590, which has more GDDR5 on a wider bus?

TU116 Recap: Turing Without the RT and Tensor Cores

The GPU at the heart of the GeForce GTX 1660 is specifically named TU116-300-A1. It's a close relative of the GeForce GTX 1660 Ti's TU116-400-A1, trimmed from 24 streaming multiprocessors to 22. We are of course still dealing with a processor devoid of RT and Tensor cores, measuring 284 mm² and composed of 6.6 billion transistors manufactured on TSMC's 12 nm FinFET process.

Despite its smaller transistors, TU116 is 42% larger than the GP106 processor that preceded it. Part of this growth is attributable to the Turing architecture's more sophisticated shaders. Like the higher-end GeForce RTX 20-series cards, the GeForce GTX 1660 supports simultaneous execution of FP32 arithmetic instructions, which make up most shader workloads, and INT32 operations (for addressing and fetching data, floating-point min/max, compare, etc.). When you hear that Turing cores achieve better performance than Pascal at a given clock frequency, this capability largely explains why.

Turing's streaming multiprocessors have fewer CUDA cores than Pascal's, but the design partially compensates by spreading more SMs across each GPU. The new architecture assigns one scheduler to each set of 16 CUDA cores (2x Pascal), along with one dispatch unit per 16 CUDA cores (the same as Pascal). Four of these 16-core groupings make up the SM, along with 96 KB of cache that can be configured as 64 KB L1 / 32 KB shared memory or vice versa, and four texture units. Because Turing doubles the number of schedulers, it only needs to issue an instruction to the CUDA cores every other clock cycle to keep them full. In between, it is free to issue a different instruction to any other unit, including the INT32 cores.

In TU116, Nvidia replaces Turing's Tensor cores with 128 dedicated FP16 cores per SM, which allow the GeForce GTX 1660 to process half-precision operations at twice the FP32 rate. Other Turing-based GPUs also offer double-rate FP16 through their Tensor cores, so TU116's configuration maintains that standard with hardware built specifically for this GPU. The following chart is an updated version of the one published in our GeForce GTX 1660 Ti review. It illustrates the significant improvement in TU116's half-precision throughput compared to the GeForce GTX 1060 and its Pascal-based GP106 chip.

When we run Sandra's Scientific Analysis module, which tests general matrix multiplication, we see how much more FP16 throughput TU106's Tensor cores deliver than TU116 can achieve. The GeForce GTX 1060, which supports FP16 only symbolically, barely registers on the chart.

In addition to the Turing architecture's shaders and unified cache, TU116 also supports a pair of algorithms called Content Adaptive Shading and Motion Adaptive Shading, together known as Variable Rate Shading. We covered this technology in Nvidia's Turing Architecture Explored: Inside the GeForce RTX 2080. That story also introduced Turing's accelerated video encoding and decoding capabilities, which carry over to the GeForce GTX 1660 as well.

Putting It All Together

Nvidia packs 24 SMs into TU116, dividing them among three graphics processing clusters. With 64 FP32 cores per SM, that adds up to 1,536 CUDA cores and 96 texture units across the GPU. By losing two SMs, the GeForce GTX 1660 is left with 1,408 active CUDA cores and 88 usable texture units.
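The core and texture-unit totals follow directly from the per-SM figures quoted in this article. Here is a minimal sketch of that arithmetic, assuming 64 FP32 cores and four texture units per SM as described in the text. The function name `sm_totals` is purely illustrative and not part of any Nvidia tool:

```python
# Per-SM resources for TU116, taken from the article text.
FP32_CORES_PER_SM = 64
TEXTURE_UNITS_PER_SM = 4


def sm_totals(active_sms):
    """Return (CUDA cores, texture units) for a given number of active SMs."""
    return active_sms * FP32_CORES_PER_SM, active_sms * TEXTURE_UNITS_PER_SM


# Full TU116 (GeForce GTX 1660 Ti) versus the cut-down GeForce GTX 1660.
for name, sms in (("GTX 1660 Ti", 24), ("GTX 1660", 22)):
    cores, tmus = sm_totals(sms)
    print(f"{name}: {sms} SMs -> {cores} CUDA cores, {tmus} texture units")
```

Each disabled SM therefore costs 64 CUDA cores and four texture units, which is where the 128-core and eight-unit deficit comes from.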

Board partners will undoubtedly target a range of frequencies to differentiate their cards. However, the official base clock is 1530 MHz with a GPU Boost specification of 1785 MHz. Both figures are slightly higher than the GeForce GTX 1660 Ti's clocks, although they cannot fully compensate for the missing SMs.

Our Gigabyte GeForce GTX 1660 OC 6G sample maintained a steady 1,935 MHz through three runs of Metro: Last Light, running about 90 MHz faster than the 1660 Ti we reviewed a few weeks ago. On paper, then, the GeForce GTX 1660 offers up to 5 TFLOPS of FP32 performance and 10 TFLOPS of FP16 throughput.
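Those paper figures can be approximated from the core count and clock rate. The rough sketch below assumes the usual convention of two floating-point operations (one fused multiply-add) per CUDA core per clock, and it uses the official 1785 MHz GPU Boost specification rather than the higher clock we observed. `peak_tflops` is a hypothetical helper written for this illustration:

```python
def peak_tflops(cuda_cores, clock_mhz, flops_per_core_per_clock=2):
    """Theoretical peak throughput in TFLOPS.

    Assumes each core retires one fused multiply-add (2 FLOPs) per clock,
    the conventional basis for quoted peak FP32 rates.
    """
    return cuda_cores * flops_per_core_per_clock * clock_mhz / 1e6


fp32 = peak_tflops(1408, 1785)  # GeForce GTX 1660 at its rated boost clock
fp16 = 2 * fp32                 # TU116 runs FP16 at twice the FP32 rate
print(f"FP32: {fp32:.2f} TFLOPS, FP16: {fp16:.2f} TFLOPS")
```

These are theoretical ceilings. Sustained throughput in games depends on how well the workload keeps the cores fed.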

Six 32-bit memory controllers give TU116 an aggregate 192-bit bus, populated with 8 Gb/s GDDR5 modules that deliver up to 192 GB/s. That is on par with the GeForce GTX 1060 6GB and a 33% reduction compared to the GeForce GTX 1660 Ti. Combined with the loss of two SMs, the shift from GDDR6 to GDDR5 memory accounts for the GeForce GTX 1660's lower performance compared to the 1660 Ti.
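The bandwidth numbers come from multiplying bus width by the per-pin data rate. A short sketch of that calculation follows. The 12 Gb/s figure for the 1660 Ti's GDDR6 is inferred here from its quoted 288 GB/s on the same 192-bit bus, and `memory_bandwidth_gbs` is an illustrative name:

```python
def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak bandwidth in GB/s: bus width (bits) x per-pin rate (Gb/s) / 8."""
    return bus_width_bits * data_rate_gbps / 8


bus_width = 6 * 32  # six 32-bit controllers -> 192-bit bus

gtx_1660 = memory_bandwidth_gbs(bus_width, 8)      # 8 Gb/s GDDR5
# 12 Gb/s is inferred from the 1660 Ti's quoted 288 GB/s on the same bus.
gtx_1660_ti = memory_bandwidth_gbs(bus_width, 12)

reduction = (gtx_1660_ti - gtx_1660) / gtx_1660_ti
print(f"{gtx_1660:.0f} GB/s vs {gtx_1660_ti:.0f} GB/s ({reduction:.0%} lower)")
```

Because the bus width is unchanged, the entire bandwidth gap between the two cards comes from the slower memory.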

Each memory controller is associated with eight ROPs and a 256 KB slice of L2 cache. In total, TU116 exposes 48 ROPs and 1.5 MB of L2. The GeForce GTX 1660's ROP count compares favorably with the RTX 2060, which also uses 48 render outputs. But TU116's L2 cache slices are half the size of TU106's.
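The back-end totals scale with the number of memory controllers in the same way. A brief sketch using only the per-controller figures given in this article (the constant names are illustrative):

```python
MEMORY_CONTROLLERS = 6      # 32-bit controllers on TU116
ROPS_PER_CONTROLLER = 8
L2_KB_PER_CONTROLLER = 256

total_rops = MEMORY_CONTROLLERS * ROPS_PER_CONTROLLER
total_l2_mb = MEMORY_CONTROLLERS * L2_KB_PER_CONTROLLER / 1024

print(f"{total_rops} ROPs, {total_l2_mb} MB of L2")
```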

Given the similarities to the GeForce GTX 1660 Ti, it is not surprising that the GeForce GTX 1660 has the same 120W power rating. Unfortunately, neither of these graphics cards supports multi-GPU configurations. Nvidia continues to argue that SLI is meant to drive higher absolute performance, rather than giving gamers a way to match single-GPU performance with multiple cheaper cards.

Specification         Gigabyte GeForce GTX 1660 OC 6G  GeForce GTX 1660 Ti  GeForce RTX 2060 FE  GeForce GTX 1060 FE  GeForce GTX 1070 FE
Architecture (GPU)    Turing (TU116)                   Turing (TU116)       Turing (TU106)       Pascal (GP106)       Pascal (GP104)
CUDA Cores            1408                             1536                 1920                 1280                 1920
Peak FP32 Compute     5.0 TFLOPS                       5.4 TFLOPS           6.5 TFLOPS           4.4 TFLOPS           6.5 TFLOPS
Tensor Cores          N/A                              N/A                  240                  N/A                  N/A
RT Cores              N/A                              N/A                  30                   N/A                  N/A
Texture Units         88                               96                   120                  80                   120
Base Clock Rate       1530 MHz                         1500 MHz             1365 MHz             1506 MHz             1506 MHz
GPU Boost Rate        1785 MHz                         1770 MHz             1680 MHz             1708 MHz             1683 MHz
Memory Capacity       6GB GDDR5                        6GB GDDR6            6GB GDDR6            6GB GDDR5            8GB GDDR5
Memory Bus            192-bit                          192-bit              192-bit              192-bit              256-bit
Memory Bandwidth      192 GB/s                         288 GB/s             336 GB/s             192 GB/s             256 GB/s
ROPs                  48                               48                   48                   48                   64
L2 Cache              1.5 MB                           1.5 MB               3 MB                 1.5 MB               2 MB
TDP                   120W                             120W                 160W                 120W                 150W
Transistor Count      6.6 billion                      6.6 billion          10.8 billion         4.4 billion          7.2 billion
Die Size              284 mm²                          284 mm²              445 mm²              200 mm²              314 mm²
SLI Support           No                               No                   No                   No                   Yes (MIO)
