Nvidia GeForce GTX 1660 Review: The Turing Assault Continues

It was only a matter of time before Nvidia took the brand-new TU116 graphics processor from its GeForce GTX 1660 Ti and trimmed it a bit to create a cheaper derivative. Not surprisingly, the new GeForce GTX 1660 closely resembles the higher-end model in that it lacks the signature RT and Tensor cores of the Turing architecture. Instead, it devotes its resources to accelerating today's rasterized games.

Nvidia did not even cut the TU116's resource pool significantly when creating the GeForce GTX 1660: a pair of streaming multiprocessors is excised, taking 128 CUDA cores and eight texture units with them. Otherwise, the GPU remains largely complete. This card's biggest loss is its lack of GDDR6 memory. By substituting 8 Gb/s GDDR5, bandwidth drops from 288 GB/s on the 1660 Ti to 192 GB/s.

Naturally, the GeForce GTX 1660 is aimed primarily at FHD gaming, where 6GB of slower memory does not hurt performance as much as it would at higher resolutions. Can the $220/£200 card deliver frame rates fast enough to fend off AMD's Radeon RX 590, with its more plentiful GDDR5 on a wider bus?

Advantages

Excellent 1080p performance
Attractive entry-level price of $220
Reasonable 120 W power consumption limits heat and noise

Disadvantages

Not ideal for 1440p gaming
Power profile similar to that of the faster GeForce GTX 1660 Ti

Verdict

Based on the same TU116 processor as the GeForce GTX 1660 Ti, Nvidia's GeForce GTX 1660 loses two streaming multiprocessors and swaps GDDR6 memory for slower GDDR5. Even so, it remains an excellent choice for gaming at 1920×1080, though it is not recommended for 2560×1440. Just make sure to compare prices before making your purchase. Good deals on Radeon RX 580 cards may warrant a look, despite their inferior performance.

TU116 Recap: Turing Without the RT and Tensor Cores

The GPU at the heart of the GeForce GTX 1660 is specifically named TU116-300-A1. It's a close relative of the GeForce GTX 1660 Ti's TU116-400-A1, trimmed from 24 streaming multiprocessors to 22. We are of course still dealing with a processor devoid of Nvidia's forward-looking RT and Tensor cores, measuring 284 mm² and composed of 6.6 billion transistors manufactured using TSMC's 12 nm FinFET process.

Despite its smaller transistors, TU116 is 42% larger than the GP106 processor that preceded it. Part of this growth is attributable to the Turing architecture's more sophisticated shaders. Like the higher-end GeForce RTX 20-series cards, the GeForce GTX 1660 supports simultaneous execution of FP32 arithmetic instructions, which make up most shader workloads, and INT32 operations (for addressing/fetching data, floating-point min/max, compare, etc.). When you hear that Turing cores achieve better performance than Pascal at a given clock rate, this capability largely explains why.
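
The 42% figure follows from the die sizes listed in the specification table further down (284 mm² for TU116, 200 mm² for GP106). As a quick sanity check, the short Python sketch below recomputes it and also derives an approximate average transistor density from the transistor counts in the same table. The dictionary and function names are ours and purely illustrative.

```python
# Die area and transistor counts as listed in this review's spec table.
TU116 = {"area_mm2": 284, "transistors": 6.6e9}
GP106 = {"area_mm2": 200, "transistors": 4.4e9}

def area_growth(new, old):
    """Fractional increase in die area going from `old` to `new`."""
    return new["area_mm2"] / old["area_mm2"] - 1

def density_mtr_per_mm2(chip):
    """Average density in millions of transistors per square millimeter."""
    return chip["transistors"] / chip["area_mm2"] / 1e6

print(f"Area growth:   {area_growth(TU116, GP106):.0%}")              # 42%
print(f"TU116 density: {density_mtr_per_mm2(TU116):.1f} MTr/mm^2")   # 23.2
print(f"GP106 density: {density_mtr_per_mm2(GP106):.1f} MTr/mm^2")   # 22.0
```

By these numbers the two densities land within a few percent of each other, so the larger die mostly reflects the higher transistor count rather than looser packing.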

Turing's streaming multiprocessors have fewer CUDA cores than Pascal's, but the design compensates in part by spreading more SMs across each GPU. The new architecture assigns one scheduler to each set of 16 CUDA cores (2x Pascal), along with one dispatch unit per 16 CUDA cores (the same as Pascal). Four of these 16-core groupings make up the SM, along with 96 KB of cache that can be configured as 64 KB L1/32 KB shared memory or vice versa, and four texture units. Because Turing doubles the number of schedulers, it only needs to issue an instruction to the CUDA cores every other clock cycle to keep them full. In between, it is free to issue a different instruction to any other unit, including the INT32 cores.
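
To make the every-other-cycle point concrete, here is a deliberately simplified Python toy model of one scheduler partition. It assumes 32-thread warps (standard on Nvidia GPUs), the 16 FP32 lanes per scheduler described above, and an endless supply of independent FP32 and INT32 instructions. `issue_timeline` and the other names are illustrative, and real hardware scheduling involves far more than this sketch captures.

```python
WARP_SIZE = 32    # threads per warp on Nvidia GPUs
FP32_LANES = 16   # FP32 CUDA cores served by one scheduler (from the text)

def cycles_per_warp_instruction(warp_size=WARP_SIZE, lanes=FP32_LANES):
    """Cycles a full warp occupies a pipe with the given number of lanes."""
    return -(-warp_size // lanes)  # ceiling division

def issue_timeline(n_cycles):
    """Toy model: the scheduler issues one instruction per cycle.

    An FP32 instruction keeps the FP32 lanes busy for two cycles, so the
    scheduler issues FP32 whenever those lanes are free and otherwise
    hands the issue slot to another unit (INT32 in this sketch).
    """
    timeline = []
    fp32_free_at = 0
    for cycle in range(n_cycles):
        if cycle >= fp32_free_at:
            timeline.append("FP32")
            fp32_free_at = cycle + cycles_per_warp_instruction()
        else:
            timeline.append("INT32")
    return timeline

print(cycles_per_warp_instruction())  # 2
print(issue_timeline(6))              # FP32 and INT32 alternate
```

In this model the FP32 lanes never sit idle even though only every second issue slot goes to them, which is the behavior the paragraph above describes.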

In TU116, Nvidia replaces Turing's Tensor cores with 128 dedicated FP16 cores per SM, which allow the GeForce GTX 1660 to process half-precision operations at twice the rate of FP32. Other Turing-based GPUs also offer double-rate FP16 through their Tensor cores; the TU116 configuration maintains that standard with hardware built specifically for this GPU. The following table is an updated version of the one published in our GeForce GTX 1660 Ti review, and it illustrates the significant improvement in half-precision throughput that TU116 delivers compared to the GeForce GTX 1060 and its Pascal-based GP106 chip.

Running Sandra's Scientific Analysis module, which tests general matrix multiplications, we see how much more FP16 throughput TU106's Tensor cores achieve than TU116. The GeForce GTX 1060, which supports FP16 only symbolically, barely registers on the chart.

In addition to the Turing architecture's shaders and unified cache, TU116 also supports a pair of algorithms called Content Adaptive Shading and Motion Adaptive Shading, together known as Variable Rate Shading. We covered this technology in Nvidia's Turing Architecture Explored: Inside the GeForce RTX 2080. That story also introduced Turing's accelerated video encoding and decoding capabilities, which carry over to the GeForce GTX 1660 as well.

Putting It All Together...

Nvidia packs 24 SMs into TU116, dividing them among three graphics processing clusters. With 64 FP32 cores per SM, that adds up to 1,536 CUDA cores and 96 texture units across the GPU. After losing two SMs, the GeForce GTX 1660 is left with 1,408 active CUDA cores and 88 usable texture units.
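
These totals are simple per-SM multiples. The Python sketch below reproduces them from the 64 FP32 cores and four texture units per SM given in this review; `gpu_resources` is an illustrative helper name, not anything from Nvidia's tooling.

```python
FP32_CORES_PER_SM = 64     # per-SM figure stated in the text
TEXTURE_UNITS_PER_SM = 4   # per-SM figure from the architecture recap

def gpu_resources(active_sms):
    """Total CUDA cores and texture units for a given number of active SMs."""
    return {
        "cuda_cores": active_sms * FP32_CORES_PER_SM,
        "texture_units": active_sms * TEXTURE_UNITS_PER_SM,
    }

full_tu116 = gpu_resources(24)  # GeForce GTX 1660 Ti
gtx_1660 = gpu_resources(22)    # two SMs disabled

print(full_tu116)  # {'cuda_cores': 1536, 'texture_units': 96}
print(gtx_1660)    # {'cuda_cores': 1408, 'texture_units': 88}
```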

Board partners will undoubtedly target a range of frequencies to differentiate their cards. However, the official base clock is 1530 MHz, with a GPU Boost specification of 1785 MHz. Both figures are slightly higher than the GeForce GTX 1660 Ti's clocks, although they cannot fully compensate for the missing SMs.

Our Gigabyte GeForce GTX 1660 OC 6G sample held a steady 1,935 MHz through three runs of Metro: Last Light, operating about 90 MHz faster than the 1660 Ti we reviewed a few weeks ago. On paper, then, the GeForce GTX 1660 offers up to 5 TFLOPS of FP32 performance and 10 TFLOPS of FP16 throughput.
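
The 5 TFLOPS figure can be reproduced from the core count and the rated boost clock, assuming the usual convention that each CUDA core retires one fused multiply-add (counted as two floating-point operations) per clock. The sketch below is a back-of-the-envelope calculation with illustrative names; the last line plugs in the 1,935 MHz we observed on the Gigabyte card, which is a theoretical ceiling rather than a measured result.

```python
def peak_tflops(cuda_cores, clock_mhz, ops_per_core_per_clock=2):
    """Theoretical peak throughput in TFLOPS.

    ops_per_core_per_clock=2 assumes one fused multiply-add per core
    per clock, counted as two floating-point operations.
    """
    return cuda_cores * ops_per_core_per_clock * clock_mhz * 1e6 / 1e12

fp32_rated = peak_tflops(1408, 1785)     # at the official GPU Boost clock
fp16_rated = 2 * fp32_rated              # double-rate FP16, per the text
fp32_observed = peak_tflops(1408, 1935)  # at our sample's observed clock

print(f"FP32 at 1785 MHz: {fp32_rated:.2f} TFLOPS")     # 5.03
print(f"FP16 at 1785 MHz: {fp16_rated:.2f} TFLOPS")     # 10.05
print(f"FP32 at 1935 MHz: {fp32_observed:.2f} TFLOPS")  # 5.45
```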

Six 32-bit memory controllers give TU116 an aggregate 192-bit bus, populated with 8 Gb/s GDDR5 modules good for up to 192 GB/s. That matches the GeForce GTX 1060 6GB and is a 33% reduction compared to the GeForce GTX 1660 Ti. Together with the loss of two SMs, the shift from GDDR6 to GDDR5 memory accounts for the GeForce GTX 1660's lower performance compared to the 1660 Ti.
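
Peak memory bandwidth is just bus width times per-pin data rate, divided by eight to convert bits to bytes. The Python sketch below works through the numbers; note that the 12 Gb/s rate used for the GeForce GTX 1660 Ti is inferred here from its 288 GB/s figure on the same 192-bit bus, and the function name is illustrative.

```python
def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak bandwidth in GB/s from bus width (bits) and per-pin rate (Gb/s)."""
    return bus_width_bits * data_rate_gbps / 8

gtx_1660 = memory_bandwidth_gbs(192, 8)      # 8 Gb/s GDDR5
gtx_1660_ti = memory_bandwidth_gbs(192, 12)  # 12 Gb/s GDDR6 (inferred rate)
reduction = 1 - gtx_1660 / gtx_1660_ti

print(f"GTX 1660:    {gtx_1660:.0f} GB/s")     # 192
print(f"GTX 1660 Ti: {gtx_1660_ti:.0f} GB/s")  # 288
print(f"Reduction:   {reduction:.0%}")         # 33%
```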

Each memory controller is associated with eight ROPs and a 256 KB slice of L2 cache. In total, TU116 exposes 48 ROPs and 1.5 MB of L2. The GeForce GTX 1660's ROP count compares favorably with the RTX 2060, which also uses 48 render outputs. But TU116's L2 cache slices are half the size of TU106's.
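
The back-end totals scale with the number of memory controllers, which the short sketch below tallies. The 512 KB slice used for the RTX 2060 is inferred from the "half the size" comparison above rather than stated directly, and `backend_totals` is an illustrative name.

```python
def backend_totals(controllers, rops_per_controller=8,
                   l2_kb_per_controller=256, bus_bits_per_controller=32):
    """Aggregate ROPs, L2 cache (MB), and bus width for a memory back end."""
    return {
        "rops": controllers * rops_per_controller,
        "l2_mb": controllers * l2_kb_per_controller / 1024,
        "bus_bits": controllers * bus_bits_per_controller,
    }

tu116 = backend_totals(6)
# RTX 2060: same six controllers, with L2 slices assumed twice as large.
rtx_2060 = backend_totals(6, l2_kb_per_controller=512)

print(tu116)     # {'rops': 48, 'l2_mb': 1.5, 'bus_bits': 192}
print(rtx_2060)  # {'rops': 48, 'l2_mb': 3.0, 'bus_bits': 192}
```

Both results line up with the ROP, L2 cache, and memory bus rows of the specification table.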

Given the similarities to the GeForce GTX 1660 Ti, it is not surprising that the GeForce GTX 1660 carries the same 120W rating. Unfortunately, neither of these graphics cards supports multi-GPU configurations. Nvidia continues to argue that SLI is meant to drive higher absolute performance, rather than to give gamers a way to match faster single-GPU configurations.

	Gigabyte GeForce GTX 1660 OC 6G	GeForce GTX 1660 Ti	GeForce RTX 2060 FE	GeForce GTX 1060 FE	GeForce GTX 1070 FE
Architecture (GPU)	Turing (TU116)	Turing (TU116)	Turing (TU106)	Pascal (GP106)	Pascal (GP104)
CUDA Cores	1408	1536	1920	1280	1920
Peak FP32 Compute	5 TFLOPS	5.4 TFLOPS	6.45 TFLOPS	4.4 TFLOPS	6.5 TFLOPS
Tensor Cores	N/A	N/A	240	N/A	N/A
RT Cores	N/A	N/A	30	N/A	N/A
Texture units	88	96	120	80	120
Base clock rate	1530 MHz	1500 MHz	1365 MHz	1506 MHz	1506 MHz
GPU boost rate	1785 MHz	1770 MHz	1680 MHz	1708 MHz	1683 MHz
Memory capacity	6 GB GDDR5	6 GB GDDR6	6 GB GDDR6	6 GB GDDR5	8 GB GDDR5
Memory bus	192 bits	192 bits	192 bits	192 bits	256 bits
Memory bandwidth	192 GB/s	288 GB/s	336 GB/s	192 GB/s	256 GB/s
ROPs	48	48	48	48	64
L2 cache	1.5 MB	1.5 MB	3 MB	1.5 MB	2 MB
TDP	120W	120W	160W	120W	150W
Number of transistors	6.6 billion	6.6 billion	10.8 billion	4.4 billion	7.2 billion
Die size	284 mm²	284 mm²	445 mm²	200 mm²	314 mm²
SLI support	No	No	No	No	Yes (MIO)

